Tecton 0.6 Enables Data Teams to Improve Iteration Speed When Building Batch, Streaming & Real-Time Features
Today, we released Tecton 0.6, which comes with new capabilities and improvements to enhance feature engineering workflows. The 0.6 release aims to accelerate feature definition and testing with the addition of notebook-driven development.
Workflow improvements
Notebook-driven development for machine learning features
Tecton 0.6 introduces a notebook-centric workflow for developing production-ready machine learning features iteratively, directly in a Jupyter notebook (or any other notebook environment), bridging the gap between development and production. Previously, features could only be tested with Tecton after rolling them out via Tecton’s CLI. Now, data teams can use Tecton’s feature engineering framework in their core modeling workflow without leaving their notebooks.
When it’s time to productionize, the tested feature definitions can be copied into a repository and pushed to production in one step. This approach retains the benefits of a GitOps workflow: features-as-code, version control, and CI/CD. Data teams can now write, run, and revise feature definitions within a notebook, combining the flexibility of notebook development with faster iteration when testing and refining features.
import tecton
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies and define objects in a notebook
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END AS credit_card_issuer
        FROM
            {user_sign_ups}
        """

# Validate objects interactively
user_credit_card_issuer.validate()
Stream Ingest API
Tecton’s new Stream Ingest API makes it easy to publish real-time data to the feature store from any stream or microservice via a simple HTTP API call. In turn, ingested data is made available both for online serving and for offline training data generation. The new API is also fully compatible with Tecton’s aggregation framework, meaning that Tecton can calculate aggregations on top of ingested real-time data.
For example, a microservice could ingest raw transactions into Tecton using the API, and the ML application could then retrieve the 1-minute aggregate transaction count for a given credit card from Tecton.
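As a minimal sketch, an ingestion call from Python could look like the example below. The endpoint path, payload layout, and the push source name transactions_push_source are assumptions for illustration; consult the Stream Ingest API reference for your cluster’s exact schema.

import requests
from datetime import datetime, timezone

# Build a raw transaction record to publish (field names are illustrative).
record = {
    "user_id": "user_469998441571",
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "amt": 135.72,
}

# POST the record to the Stream Ingest API. The endpoint path, payload shape,
# and push source name below are assumptions for this sketch.
response = requests.post(
    "https://<your-cluster>.tecton.ai/ingest",
    headers={"Authorization": "Tecton-key <SERVICE_ACCOUNT_API_KEY>"},
    json={
        "workspace_name": "prod",
        "dry_run": False,
        "records": {
            "transactions_push_source": [{"record": record}],
        },
    },
)
response.raise_for_status()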
Query Debugging Tree
Tecton 0.6 brings new explainability and debugging capabilities to the feature development process with the Query Debugging Tree, which breaks a pipeline down from input data to materialized feature output so that every step can be investigated and debugged. For any interactive query, users can print the query tree with .explain() and step through it to inspect intermediate data, diagnose slow queries, and track down unexpected results, which is especially useful when investigating training data or testing a pipeline.
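As a rough sketch, assuming a workspace with a feature service named fraud_detection and a Spark DataFrame of labeled events called training_events (both hypothetical names), the query tree for a training data query could be inspected like this:

import tecton

ws = tecton.get_workspace("prod")
fs = ws.get_feature_service("fraud_detection")  # hypothetical feature service name

# Generate training data for the labeled events, then print the query tree
# to step through each stage and spot slow or unexpected steps.
training_data = fs.get_historical_features(spine=training_events)
training_data.explain()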
Continuous Mode for all Stream Feature Views
Fresh feature data can meaningfully increase the accuracy of real-time machine learning applications. Tecton 0.6 extends continuous processing mode to all Stream Feature Views, delivering single-digit-second freshness for all streaming features. Ultimately, this capability lets users update feature values from streaming data more quickly.
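As a sketch only: enabling continuous processing is a per-feature-view setting. The parameter and enum names shown below (stream_processing_mode, StreamProcessingMode.CONTINUOUS), as well as the transactions_stream source, are assumptions for this example; consult the 0.6 Stream Feature View reference for the exact option.

import tecton
from tecton import stream_feature_view, FilteredSource, StreamProcessingMode
from datetime import timedelta

ws = tecton.get_workspace("prod")
user = ws.get_entity("user")
transactions_stream = ws.get_data_source("transactions_stream")  # hypothetical stream source

@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="spark_sql",
    stream_processing_mode=StreamProcessingMode.CONTINUOUS,  # assumption: continuous-mode setting
    ttl=timedelta(days=30),
)
def user_last_transaction_amount(transactions_stream):
    return f"""
        SELECT user_id, timestamp, amt
        FROM {transactions_stream}
    """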
New Aggregation functions
Tecton’s feature engineering framework makes it easy to express and efficiently compute time-windowed aggregation features in batch and streaming pipelines, down to single-digit second freshness.
Tecton 0.6 adds First-N, First-N Distinct, and Last-N to a robust set of built-in aggregations that already includes Count, Min, Max, Mean, Sum, StdDev, Variance, and Last-N Distinct. These time-windowed aggregations are guaranteed to be consistent across the offline and online environments. Many companies have built similar aggregation engines in-house; Tecton now makes this available to any organization, out of the box.
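For example, a feature view could combine existing and new aggregations as in the sketch below. The helper names first_distinct and last_distinct under tecton.aggregation_functions, along with the transactions_batch source, are assumptions for illustration; verify the exact names against the 0.6 documentation.

import tecton
from tecton import batch_feature_view, Aggregation, FilteredSource
from tecton.aggregation_functions import first_distinct, last_distinct
from datetime import timedelta

ws = tecton.get_workspace("prod")
user = ws.get_entity("user")
transactions_batch = ws.get_data_source("transactions_batch")  # hypothetical batch source

@batch_feature_view(
    sources=[FilteredSource(transactions_batch)],
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        # Existing aggregation: 7-day mean transaction amount.
        Aggregation(column="amt", function="mean", time_window=timedelta(days=7)),
        # New in 0.6 (names assumed): first and last 5 distinct merchants over 30 days.
        Aggregation(column="merchant", function=first_distinct(5), time_window=timedelta(days=30)),
        Aggregation(column="merchant", function=last_distinct(5), time_window=timedelta(days=30)),
    ],
)
def user_transaction_aggregates(transactions_batch):
    return f"SELECT user_id, timestamp, amt, merchant FROM {transactions_batch}"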
Tecton Access Control and Service Account CLI Commands
Finally, Tecton 0.6 adds ACL CLI commands, enabling users to manage the Service Account lifecycle and to inspect and modify Workspace roles programmatically. The new tecton access-control and tecton service-account commands provide new options for managing Tecton Access Controls through the CLI: you can now view, assign, and remove roles directly from your terminal. This is particularly powerful when combined with CI/CD pipelines.
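For instance, the commands below sketch how a service account could be created and granted a role from the terminal. The exact subcommands and flags are illustrative assumptions; run tecton service-account --help and tecton access-control --help for the authoritative options.

# Create a service account for a CI/CD pipeline (flags are illustrative).
tecton service-account create --name "ci-pipeline" --description "Service account for CI/CD"

# Grant the service account a role in the prod workspace, then review role assignments.
tecton access-control assign-role --role consumer --workspace prod --service-account-id <SERVICE_ACCOUNT_ID>
tecton access-control get-roles --workspace prod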
Additional compatibility in Tecton 0.6
Tecton 0.6 is also compatible with Databricks Runtime 10.4 LTS and Amazon EMR release 6.7.0, giving customers more up-to-date runtime options for processing and analyzing large amounts of data efficiently.
Tecton 0.6 in summary!
With these new capabilities, Tecton makes feature engineering even more efficient, giving companies greater agility and flexibility. They also make it easier for data scientists and engineers to work together, leading to better collaboration and more streamlined feature engineering workflows.
Register for our webinar on notebook-driven development on March 28th at 9:30AM PT/12:30PM ET to learn more!
To learn more about this release and the new capabilities above, check out the following resource: What’s New.