Introducing Tecton SDK 0.6: Notebook-Driven Development
Tecton 0.6 is now available! It comes packed with major new capabilities designed to dramatically speed up feature development, help teams build better features, and make integrating streaming data easier than ever before!
For a full list of updates and instructions on upgrading, refer to the Tecton 0.6 Release Notes. The release includes the following key features:
- Notebook-Driven Development: Tecton 0.6 introduces a notebook-centric workflow for developing production-ready ML features iteratively, directly in a notebook. This workflow bridges the development and production environments. Previously, Tecton feature pipelines could only be tested with Tecton after they were deployed with Tecton’s CLI. Now Data Scientists, Data Engineers, and ML Engineers can use Tecton’s entire feature engineering framework in their core modeling workflow without leaving their notebooks. When it is time to productionize, the tested feature definitions can be copied into a repo and pushed to production in a single step. This approach offers speed and flexibility in feature development while preserving the productionization best practices of a GitOps workflow, such as “features-as-code”, version control, and CI/CD. For example, the following snippet fetches objects from a deployed workspace, uses them to define a new Batch Feature View in a notebook, and validates it interactively:
```python
import tecton
from tecton import batch_feature_view, FilteredSource
from datetime import timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")

# Use those objects as dependencies and define new objects in the notebook
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="spark_sql",
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def user_credit_card_issuer(user_sign_ups):
    # Map the first digit of the card number to its issuer
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '3' THEN 'AmEx'
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END AS credit_card_issuer
        FROM
            {user_sign_ups}
        """

# Validate objects interactively
user_credit_card_issuer.validate()
```
- New Aggregations: Tecton’s feature engineering framework makes it easy to express and efficiently compute time-windowed aggregation features in batch and streaming pipelines, with freshness down to single-digit seconds. Tecton 0.6 adds First-N, First-N Distinct, and Last-N to its existing set of built-in aggregations, which includes Count, Min, Max, Mean, Sum, StdDev, Variance, and Last-N Distinct. Time-window aggregations are guaranteed to be consistent across the offline and online environments.
- Stream Ingest API (Private Preview): Tecton’s new Stream Ingest API makes it easy to publish real-time data to the Feature Store from any stream or microservice with a simple HTTP API call. Tecton makes ingested data available both for online serving and for offline training data generation. The Stream Ingest API is fully serverless and offers sub-100ms ingestion latency into the online store. It is also fully compatible with Tecton’s aggregations framework, which means Tecton can calculate aggregations on top of ingested real-time data.
- Faster data ingestion for Stream Feature Views: Fresh feature data can meaningfully increase the accuracy of real-time machine learning applications. Tecton 0.6 extends continuous processing mode to all Stream Feature Views, delivering single-digit-second freshness for all streaming features.
- Query Debugging Tools: Tecton 0.6 brings new explainability and debugging capabilities to the feature development process. For any query run through Tecton’s SDK, users can print the query tree with .explain() and step through it to inspect data and diagnose slow queries or queries that return unexpected results.
- Tecton Access Control and Service Account CLI Commands: The new tecton access-control and tecton service-account commands let you manage Tecton Access Controls through the CLI. You can now view, assign, and remove roles directly from your terminal, which is particularly powerful when combined with CI/CD pipelines.
- Additional compatibility in Tecton 0.6: This release is also compatible with Databricks Runtime 10.4 LTS (including support for Spark 3.2.1) and Amazon EMR release 6.7.0. The Tecton 0.6 Release Notes offer additional resources and reading for this release.
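To make the semantics of the new First-N, First-N Distinct, and Last-N aggregations (alongside the existing Last-N Distinct) concrete, here is a minimal plain-Python sketch over a single time window. It is not Tecton’s implementation or API: the `Event` class and the function names are hypothetical, and the half-open window `(end - window, end]` and the chronological ordering of results are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, List


@dataclass
class Event:
    """A hypothetical raw event: a timestamp plus the value being aggregated."""
    timestamp: datetime
    value: Any


def in_window(events: List[Event], end: datetime, window: timedelta) -> List[Event]:
    # Assumption: the window is (end - window, end], returned oldest event first.
    start = end - window
    return sorted(
        (e for e in events if start < e.timestamp <= end),
        key=lambda e: e.timestamp,
    )


def first_n(events, end, window, n):
    """The n oldest values in the window."""
    return [e.value for e in in_window(events, end, window)[:n]] if n > 0 else []


def last_n(events, end, window, n):
    """The n most recent values in the window, in chronological order."""
    values = [e.value for e in in_window(events, end, window)]
    return values[-n:] if n > 0 else []


def first_n_distinct(events, end, window, n):
    """The first n distinct values, ordered by first appearance."""
    if n <= 0:
        return []
    seen = []
    for e in in_window(events, end, window):
        if e.value not in seen:
            seen.append(e.value)
            if len(seen) == n:
                break
    return seen


def last_n_distinct(events, end, window, n):
    """The n distinct values seen most recently, ordered by latest appearance."""
    if n <= 0:
        return []
    seen = []
    # Walk newest to oldest, keeping the latest occurrence of each value.
    for e in reversed(in_window(events, end, window)):
        if e.value not in seen:
            seen.append(e.value)
            if len(seen) == n:
                break
    return list(reversed(seen))


t0 = datetime(2023, 1, 1)
# Values "a", "b", "a", "c", "b" arrive one hour apart.
events = [Event(t0 + timedelta(hours=h), v) for h, v in enumerate(["a", "b", "a", "c", "b"])]
end = t0 + timedelta(hours=4)

print(first_n(events, end, timedelta(hours=10), 2))           # ['a', 'b']
print(last_n(events, end, timedelta(hours=10), 2))            # ['c', 'b']
print(first_n_distinct(events, end, timedelta(hours=10), 3))  # ['a', 'b', 'c']
print(last_n_distinct(events, end, timedelta(hours=10), 3))   # ['a', 'c', 'b']
print(first_n(events, end, timedelta(hours=2), 2))            # ['c', 'b']
```

Shrinking the window to two hours drops the three oldest events, which is why the last call only sees "c" and "b". A production system computes these incrementally rather than re-scanning raw events on every request.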
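The Stream Ingest API bullet describes publishing records with a simple HTTP call. The sketch below only shows the general shape of such a call using Python’s standard library; it builds the request but does not send it. Everything Tecton-specific here is an assumption for illustration, not the documented private-preview API: the endpoint URL is a placeholder, and the payload field names (`workspace_name`, `records`, `record`), the push source name `transactions`, and the `Tecton-key` authorization scheme are hypothetical. Check the Stream Ingest API documentation for the real contract.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_ingest_request(url, api_key, workspace, source_name, records):
    """Build (but do not send) a JSON POST carrying records for one source.

    Hypothetical payload shape: the field names are illustrative only.
    """
    payload = {
        "workspace_name": workspace,
        "records": {source_name: [{"record": r} for r in records]},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm against the API docs.
            "Authorization": f"Tecton-key {api_key}",
        },
        method="POST",
    )


# A hypothetical transaction event published from a microservice.
event = {
    "user_id": "user_1",
    "amount": 12.5,
    "timestamp": datetime(2023, 1, 1, tzinfo=timezone.utc).isoformat(),
}
req = build_ingest_request(
    "https://example.invalid/ingest",  # placeholder endpoint
    "MY_API_KEY",
    "prod",
    "transactions",
    [event],
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to unit test without network access.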
With these new capabilities in Tecton 0.6, companies can approach feature engineering even more efficiently and more easily work with the best data source for a given use case, from fraud detection to dynamic pricing to whichever challenge your team is looking to tackle next.