Tecton

How to Integrate With Tecton

Last updated: October 17, 2023

Many of Tecton’s customers are mature or rapidly growing engineering organizations that have spent years building infrastructure and best practices to suit their needs. This often includes established code review workflows, custom data pipelines, deployment processes, or data scientist environments.

Therefore, the most common consideration when adopting Tecton is how seamlessly it will integrate with an organization’s existing infrastructure and processes. 

Tecton’s product ethos has always been to offer simple solutions, complemented by flexible options for more complex requirements. In this post, I’ll walk through tradeoffs to consider when integrating with Tecton, beginning with:

  • Configuring Tecton’s Declarative API
  • Data ingestion patterns
  • Managed and manual orchestration

Note: This post is focused on Tecton’s Spark-based solutions, but these patterns also apply to Tecton’s other data platform integrations, such as Snowflake.

Configuring Tecton’s Declarative API

Tecton is an end-to-end feature platform that uses a Python-based Declarative API to define and iterate on features, from creating feature pipelines to generating training data and serving features in real time. Tecton repositories can be organized into isolated workspaces that allow users to iterate in parallel. They can also be integrated with CI/CD systems for more robust production deployments.

Repositories 

Tecton’s CLI is used to create repositories based on the Declarative API, which can be registered with Tecton using tecton apply. Once repositories are registered, Tecton will orchestrate jobs and pipelines on your existing infrastructure.

It’s easy to get started interactively: install the Tecton CLI, log in, create a repository, and apply it to Tecton.

$ pip install "tecton[pyspark]"
$ tecton login
  Tecton Cluster URL: https://acme.tecton.ai
$ tecton init
$ tecton apply

The CLI workflow is a great way to get started and iterate on feature definitions with Tecton. tecton apply syncs and pushes your changes to a Tecton workspace on demand. In practice, mature ML teams often have dozens of data scientists and engineers iterating in parallel, which requires a process to merge changes safely without affecting production environments.

Workspaces

Tecton workspaces provide isolation between users as well as between environments (e.g., production and development). Users can create and iterate on features in “development” workspaces and eventually promote them to “live” workspaces, which enable real-time feature serving in production.

For example, at an e-commerce company, Alice on the fraud team and Betty on the recommendations team are developing use cases on Tecton. Both teams already have real-time use cases in production. Using workspaces, Alice and Betty can independently iterate in separate development workspaces before promoting their features to isolated live workspaces serving their team’s production data. 

Image showing isolation between users and isolation between development and live environments using workspaces in Tecton.

The next step is deploying changes to production. The simplest option is to run tecton apply interactively to push changes to the production workspace.

CI/CD Integration

Many engineering teams have established processes for production changes. This means ensuring changes are well-tested, reviewed, auditable, and sequenced to prevent merge conflicts.

CI/CD integration with Tecton is the recommended strategy for safely pushing changes to production. CI/CD systems authenticated with a service account can safely merge and apply changes once they land in version control, which allows teams to manage Tecton using their engineering organization’s established code review and testing processes. 

Image showing isolated workspaces updated using CI/CD integrations and service accounts with Tecton.

Tecton’s Declarative API also supports unit testing that can be run on-demand or as a pre-commit hook before pull requests are merged. Now, making changes to Tecton looks a lot like the common software engineering workflow: make some changes, write some unit tests, create a pull request, get it reviewed, merge it, and let your CI/CD system apply changes to Tecton. 
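As a rough illustration of the CI/CD step described above, the sketch below shows how a CI job might choose which Tecton CLI commands to run: previewing changes on a pull request and applying them after a merge. The function name and the "pull_request" and "merge" event labels are hypothetical, and the use of tecton workspace select, tecton plan, and the --no-safety-checks flag is an assumption that should be checked against your Tecton CLI version.

```python
def tecton_ci_commands(event, workspace):
    """Return the Tecton CLI commands a CI job might run for a given event.

    `event` is a hypothetical label supplied by the CI system:
    "pull_request" previews changes, "merge" applies them.
    """
    # Point the CLI at the target workspace before planning or applying.
    select = ["tecton", "workspace", "select", workspace]
    if event == "pull_request":
        # Preview the diff against the workspace without changing it.
        return [select, ["tecton", "plan"]]
    if event == "merge":
        # Assumed flag: skips the interactive confirmation prompt in CI.
        return [select, ["tecton", "apply", "--no-safety-checks"]]
    raise ValueError(f"unsupported CI event: {event!r}")


# A CI script could execute each command in order, for example with
# subprocess.run(cmd, check=True), after authenticating as a service account.
for cmd in tecton_ci_commands("pull_request", "prod"):
    print(" ".join(cmd))
```

Keeping the command selection in a small pure function makes the pipeline logic itself easy to unit test, independent of any Tecton cluster.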

Image showing audit log of changes applied to a workspace in Tecton.

In summary, there is a spectrum of options when configuring Tecton’s Declarative API:

  • Simple option: Install Tecton CLI and apply changes to a live workspace.
  • Flexible option: Use multiple live and development workspaces to isolate users and environments. Integrate with CI/CD to meet an organization’s code review, unit testing, and audit requirements.

Data ingestion patterns

At the heart of any real-time ML application is fresh, highly available feature data derived from one or more data sources. While the industry is converging on a set of batch (e.g., Snowflake, Databricks Delta Lake, BigQuery) and streaming (e.g., Kinesis, Kafka) options, there remains a long tail of storage and data pipeline patterns that need to be supported for a feature platform to be effective.

For ingesting data, Tecton supports two paradigms: pulling or pushing data. 

Pushing data into Tecton

Pushing data into Tecton’s feature store eliminates many prerequisites for reading directly from data sources (e.g., IAM policies, cross-account permissions, etc.). 

Tecton’s options for pushing data are Feature Tables and the Ingest API. Feature Tables are effective for ingesting large volumes of infrequently updated batch data. The Ingest API provides a low-latency, high-availability HTTP API for row-level data ingestion from stream processors or other clients.

Image showing features being written to Tecton using an Ingest API and Feature Tables.

With Tecton’s Ingest API, row-level data is pushed into Tecton using a simple HTTP request:

$ curl -X POST https://acme.tecton.ai/ingest \
     -H "Authorization: Tecton-key $TECTON_API_KEY" \
     -d '{
  "workspace_name": "prod",
  "push_source_name": "click_event_source",
  "push_record_map": {
    "user_id": "C1000262126",
    "clicked": 1,
    "timestamp": "2022-10-27T02:05:01Z"
  }
}'
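For clients written in Python, the same request could be assembled with the standard library. This is a minimal sketch that mirrors the curl call above: the helper name build_ingest_request is hypothetical, the API key is a placeholder, and the Content-Type header is an assumption that does not appear in the original command. The request is built but not sent.

```python
import json
import urllib.request


def build_ingest_request(cluster_url, api_key, workspace, push_source, record):
    """Build (but do not send) an Ingest API request mirroring the curl call."""
    payload = {
        "workspace_name": workspace,
        "push_source_name": push_source,
        "push_record_map": record,
    }
    return urllib.request.Request(
        url=f"{cluster_url.rstrip('/')}/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Tecton-key {api_key}",
            # Assumption: the endpoint accepts JSON with this content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_ingest_request(
    cluster_url="https://acme.tecton.ai",
    api_key="my-api-key",  # placeholder; load the real key from a secret store
    workspace="prod",
    push_source="click_event_source",
    record={
        "user_id": "C1000262126",
        "clicked": 1,
        "timestamp": "2022-10-27T02:05:01Z",
    },
)

# To actually send the row: urllib.request.urlopen(request, timeout=10)
```

Building the payload with json.dumps avoids hand-written JSON mistakes such as trailing commas, which many parsers reject.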

With Feature Tables, DataFrames (Pandas or PySpark) can be pushed into Tecton in batches using the Python SDK:

import pandas
import tecton
from datetime import datetime

df = pandas.DataFrame(
   [
       {
           "user_id": "user_1",
           "timestamp": pandas.Timestamp(datetime.now()),
           "user_login_count_7d": 15,
           "user_login_count_30d": 35,
       }
   ]
)
ws = tecton.get_workspace("prod")
ws.get_feature_table("user_login_counts").ingest(df)

Pulling data into Tecton

To pull data directly into Tecton, several data source integrations are natively supported. Tecton can connect directly to batch sources (S3, Glue, Snowflake, Redshift), as well as stream sources (Kinesis, Kafka).

Image showing Tecton ingesting data from stream and batch data sources.

from tecton import HiveConfig, BatchSource

fraud_users_batch = BatchSource(
   name="users_batch",
   batch_config=HiveConfig(database="fraud", table="fraud_users")
)

For power users on Spark, Data Source Functions provide complete flexibility when connecting to any Spark-compatible batch or streaming source (e.g., Delta tables with live stream updates). In general, if data can be read into a Spark DataFrame, it can be read by Tecton.

from tecton import spark_batch_config, BatchSource

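@spark_batch_config()
def csv_data_source_function(spark):
   from pyspark.sql.functions import col

   # Hypothetical placeholder: replace with the URI of your CSV file.
   csv_uri = "s3://example-bucket/path/to/data.csv"
   ts_column = "created_at"
   df = spark.read.csv(csv_uri, header=True)
   # CSV columns are read as strings, so cast the timestamp column explicitly.
   df = df.withColumn(ts_column, col(ts_column).cast("timestamp"))
   return df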


csv_batch_source = BatchSource(
  name="csv_ds",
  batch_config=csv_data_source_function
)

In summary, there is a range of options for ingesting data into Tecton.

  • Pushing data is simple as it eliminates many data source integration prerequisites while providing flexibility for more complex integrations such as custom stream processors.
  • Pulling data directly from stream and batch data sources offers high performance ingestion that supports a growing list of cloud-native data storage options.

Orchestration

Managed schedules

Tecton manages the schedule of materialization jobs that ingest and transform data into the feature store. Simple schedules such as hourly jobs or daily jobs at a specific time are easy to schedule when creating Feature Views.

Image showing managed orchestration jobs on daily schedule.

Job progress can be monitored in Tecton’s UI. In the event of upstream data source issues, individual jobs can be re-run to overwrite any invalid data time ranges.

Triggered materialization API

For additional flexibility, Tecton also supports a triggered materialization API that enables data teams to set their own materialization schedule or trigger jobs on demand as part of their own orchestration platform such as Airflow, Dagster, or Prefect. The API also supports re-running materialization jobs over previously materialized time windows.

from datetime import datetime, timedelta

import tecton

fv = tecton.get_workspace("prod").get_feature_view("user_fraud")
fv.trigger_materialization_job(
  start_time=datetime.utcnow() - timedelta(hours=1),
  end_time=datetime.utcnow(),
  online=True,
  offline=False,
)

Image showing managed schedule and manually triggered orchestration.
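When an external orchestrator triggers materialization on a schedule, it can help to pass time windows aligned to interval boundaries rather than ones based on the exact moment the task happens to run. The sketch below is one way to compute such a window. The helper name last_complete_window is hypothetical, and aligning to hour boundaries is an assumption about how a team might want to schedule jobs, not a Tecton requirement.

```python
from datetime import datetime, timedelta, timezone


def last_complete_window(now, interval=timedelta(hours=1)):
    """Return (start, end) of the most recent fully elapsed interval before `now`."""
    epoch = datetime(1970, 1, 1, tzinfo=now.tzinfo)
    # Dividing one timedelta by another with // yields a whole number of intervals.
    elapsed_intervals = (now - epoch) // interval
    end = epoch + elapsed_intervals * interval
    return end - interval, end


start, end = last_complete_window(datetime.now(timezone.utc))
print(start.isoformat(), end.isoformat())

# The window could then be passed to the call shown earlier, for example:
# fv.trigger_materialization_job(
#     start_time=start, end_time=end, online=True, offline=False
# )
```

Passing the orchestrator's logical run time as `now`, instead of the wall-clock time, means a retried task computes the same window as the original attempt.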

In summary, there are two options for scheduling in Tecton:

  • Simple approach: Managed schedules.
  • Flexible approach: Schedule jobs manually using the triggered materialization API.

Recap

This post covered a few of the options available when defining features using Tecton and ingesting your data into Tecton’s feature store. We recommend starting simple and incrementally integrating Tecton with your engineering team’s infrastructure and processes as you see fit.

The example below illustrates a simple Tecton integration with a user-managed feature repository that configures batch-only Feature Views orchestrated with managed schedules. Each component (feature repository, data ingestion patterns, orchestration) can be customized to fit the engineering team’s needs.

Example simple Tecton integration: user-managed repository with batch-only Feature Views and managed schedules.

This is the first in a series of posts covering options for integrating with Tecton. In future posts, I’ll cover topics ranging from training data generation to online stores and access controls.

Questions or comments? Join us in our community Slack channel. Or if you’re interested in trying out Tecton for yourself, sign up for a free trial.
