Tecton 0.7: Making Batch, Streaming & Real-Time ML Transformations More Powerful & Flexible
We’re excited to introduce Tecton 0.7, our latest product release that makes it easier than ever before to implement high-quality data transformations for ML.
Tecton 0.7 makes Tecton data transformations more powerful and flexible to better support real-time and production machine learning applications such as fraud detection, recommendation systems, real-time pricing, and personalization. This release is an important step towards our long-term objective of helping all data teams build and operate highly optimized ML data pipelines using batch, streaming, and real-time data.
In 0.7, we expanded Tecton’s feature engineering framework to support optimized implementations of Count Distinct and Percentile aggregations, and added support for complex data types including Map, Struct, and multi-dimensional Arrays.
We’ve also added support for popular Python packages, further simplifying the process of implementing Python transformations. We’re introducing the new Stream Ingest API, which allows Tecton to ingest streaming events and write them to the feature store at sub-second latency.
Additionally, data teams can now build streaming features using Tecton’s Serverless Python and Aggregation engines, eliminating the need to use complex stream processing engines like Spark or Flink. Finally, Tecton 0.7 introduces support for data sources managed by the Databricks Unity Catalog, expanding the scope of data sources that can connect directly to Tecton.
While Tecton 0.7 was heavily influenced by feedback from our existing customers, we’re confident that it will also benefit many new users. And we’re not done—we’re working on introducing many additional enhancements to the Tecton framework in future releases.
Below is a detailed list of capabilities introduced in Tecton 0.7:
New Aggregations & Complex Feature Types:
Tecton 0.7 adds Percentile and Count Distinct to the set of built-in aggregations in Tecton’s feature engineering framework. As with the other built-in aggregations, these are performant, simple to write, available for both batch and streaming features, and guaranteed to be consistent across online and offline environments.
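To make the two aggregations concrete, here is a minimal plain-Python sketch of what Count Distinct and Percentile compute over a time window. It is an exact, unoptimized illustration and does not use the Tecton SDK or reflect how Tecton implements these aggregations. The event fields (`ts`, `merchant`, `amount`), the helper names, and the nearest-rank percentile definition are all assumptions chosen for the example.

```python
import math
from datetime import datetime, timedelta


def values_in_window(events, field, now, window):
    """Collect `field` from events whose timestamp falls in [now - window, now)."""
    start = now - window
    return [e[field] for e in events if start <= e["ts"] < now]


def count_distinct(values):
    """Exact number of unique values (hypothetical helper, not a Tecton API)."""
    return len(set(values))


def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100; None for an empty window."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative transaction events for a single user.
now = datetime(2023, 6, 1, 12, 0)
events = [
    {"ts": datetime(2023, 6, 1, 10, 30), "merchant": "d", "amount": 1000},  # outside the window
    {"ts": datetime(2023, 6, 1, 11, 10), "merchant": "a", "amount": 10},
    {"ts": datetime(2023, 6, 1, 11, 20), "merchant": "b", "amount": 20},
    {"ts": datetime(2023, 6, 1, 11, 30), "merchant": "a", "amount": 30},
    {"ts": datetime(2023, 6, 1, 11, 40), "merchant": "c", "amount": 40},
    {"ts": datetime(2023, 6, 1, 11, 50), "merchant": "b", "amount": 50},
]

one_hour = timedelta(hours=1)
merchants = values_in_window(events, "merchant", now, one_hour)
amounts = values_in_window(events, "amount", now, one_hour)

distinct_merchants_1h = count_distinct(merchants)
p50_amount_1h = percentile(amounts, 50)
p90_amount_1h = percentile(amounts, 90)
```

Features such as "distinct merchants in the last hour" or "90th-percentile transaction amount" are typical fraud signals. The sketch recomputes them from raw events on every call, which is the cost a built-in, optimized aggregation is meant to avoid.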
This release also gives customers additional flexibility when defining features with new support for Map and Struct-type feature values, along with support for multi-dimensional Arrays. First-class support for these types will give users more ergonomic and performant feature definitions when working with complex data.
Enhanced Python Environments for On-Demand Feature Views (Public Preview):
With 0.7, developers can now choose different Python Environments for running their on-demand transformations. These Python Environments enable developers to leverage common Data Science packages in their on-demand transformation logic. For example, they can use the fuzzywuzzy package to calculate the similarity between a user’s search terms and a product’s name.
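As a rough illustration of the similarity feature described above, the sketch below scores a search query against a product name on a 0 to 100 scale. To stay self-contained it uses the standard-library `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, and it is plain Python rather than a Tecton On-Demand Feature View. The function names and the `search_terms` and `name` fields are assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(search_terms: str, product_name: str) -> int:
    """Return a 0-100 similarity score between two strings, ignoring case."""
    a = search_terms.lower().strip()
    b = product_name.lower().strip()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def search_match_features(request: dict, product: dict) -> dict:
    """Hypothetical on-demand transformation: request data in, feature values out."""
    return {
        "search_name_similarity": similarity(request["search_terms"], product["name"])
    }


features = search_match_features(
    {"search_terms": "Running Shoes"},
    {"name": "running shoe"},
)
```

Because the score depends on the search terms supplied at request time, it cannot be precomputed, which is why it fits an on-demand transformation.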
Stream Ingest API (Public Preview):
The Stream Ingest API is an endpoint to update features with sub-second latency in the Tecton feature store. Records sent to the Stream Ingest API are immediately written to the Tecton feature store and made available for training and inference. By using the Stream Ingest API to send data to the Tecton feature platform, ML teams can:
- Integrate Tecton with any existing streaming feature pipeline without migrating feature code. The Stream Ingest API lets teams get the data management, serving, governance, monitoring, and other benefits of Tecton’s feature platform on top of their existing feature pipelines, without rewriting any feature code. Teams do not need to migrate pipelines that already work before adopting an enterprise feature platform, which makes it faster and easier for ML and DS teams to get their features centrally managed for trusted and reliable training and serving.
- Easily build powerful streaming features on event data using Python and performant aggregations. Tecton’s Serverless Python and Aggregations Engines enable data scientists and ML engineers to author and manage transformations in familiar Python, allowing you to skip the complicated code and heavy stream processing infrastructure required by other solutions.
- Bring read-after-write consistency to your feature infrastructure. The Stream Ingest API explicitly acknowledges when your input data is fully processed, making it easy for your application to push event data to the feature platform and quickly retrieve up-to-date feature vectors—something very useful for event-driven decisioning applications like loan approvals and fraud monitoring.
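The read-after-write pattern in the last point can be sketched with a small in-memory stand-in. This is not the Stream Ingest API or the Tecton client: the class, its methods, the acknowledgement shape, and the feature names are all hypothetical, chosen only to show the control flow of "push an event, wait for the acknowledgement, then read fresh features".

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryFeatureStore:
    """Hypothetical stand-in for a feature store; not the Tecton client."""

    _rows: dict = field(default_factory=dict)

    def ingest(self, entity_id: str, record: dict) -> dict:
        """Apply the record, then acknowledge.

        The acknowledgement is returned only after the update is visible
        to readers, which is what gives the caller read-after-write consistency.
        """
        row = self._rows.setdefault(entity_id, {"txn_count": 0, "last_amount": None})
        row["txn_count"] += 1
        row["last_amount"] = record["amount"]
        return {"status": "processed", "entity_id": entity_id}

    def get_features(self, entity_id: str) -> dict:
        """Return a copy of the current feature vector for the entity."""
        return dict(self._rows.get(entity_id, {}))


store = InMemoryFeatureStore()

# 1. Push the new event and wait for the acknowledgement.
ack = store.ingest("user_42", {"amount": 125.0})

# 2. Only after the acknowledgement, read the feature vector for the decision.
features = store.get_features("user_42") if ack["status"] == "processed" else {}
```

In an event-driven application such as fraud monitoring, the decision made in step 2 then reflects the event that triggered it rather than a stale view of the entity.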
Support for Databricks Unity Catalog:
Tecton now supports data sources managed by Unity Catalog, Databricks’ new unified data governance solution, which provides a centralized interface for data assets, fine-grained access control, data lineage, improved data sharing, and other new capabilities. In 0.7, Tecton customers can use the new UnityConfig option to connect to Unity data sources.
For more details, please see the Tecton documentation or reach out to our team and we’ll be happy to answer your questions directly.