Tecton

Serving 100,000 feature vectors per second with Tecton and DynamoDB

Last updated: October 18, 2023

There is a class of Machine Learning models that require real-time data to make predictions – the process by which these models make predictions is known as Online Inference. Examples of these types of models are models that detect fraudulent credit card activity at the time of purchase or make purchasing recommendations based on a customer’s recent browsing history. The speed with which features are served to the model is of the utmost importance.

Tecton is a feature store that is built to pull in data from a variety of sources (batch and streaming) and serve those features to production models. In this blog post we’ll benchmark Tecton’s online feature serving capabilities, and show how Tecton is able to serve feature values at low latency (<< 100ms) even at very high load (> 3 million DynamoDB requests per second)

Tecton Terminology

As we review the results of our benchmarks there are some Tecton specific terms that need to be described:

Feature View –  A Feature View encapsulates a data source, transformation logic and information about how often to refresh the feature. There are Feature Views for Batch and Streaming data sources as well as Feature Views that are specific to deriving aggregates from the data sources. The features used for online serving are by default stored in a DynamoDB table on AWS (other options are available). More details on Feature Views can be found in our documentation.

On Demand Feature View – A unique type of Feature View is the On Demand Feature View. It’s different from the other feature views in that the end result isn’t precomputed and stored in a DynamoDB table. Rather, as the name implies, this feature is calculated on demand at the time of the feature request.

Feature Service – A grouping of all the Feature Views necessary for a model. Typically there is 1-1 mapping between a Feature Service and a model. The Feature Service is accessed via an HTTPS Rest endpoint. Additional details can be found in our documentation.

How are features served in Tecton

Tecton is deployed in a Hybrid SaaS model. There is a control plane that runs in Tecton’s AWS account, and a data plane that lives in the customer’s AWS account. When a request for a feature service is made it interacts with our control plane which consists of an Nginx ingress layer backed by our feature serving application that is deployed on Kubernetes. This application then makes the request for data that is in DynamoDB (typically in the customer’s account). The Feature Servers are stateless and do not share state. This allows us to scale out the Feature servers horizontally to handle large QPS. The primary responsibility of feature servers is serving data from DynamoDB after filtering and aggregations at low latency.

One important note to make is that one Feature Service request can result in many requests to DynamoDB. As an example if a feature service has 50 Feature Views then the 1 feature service request results in 50 underlying requests to DynamoDB. These details are noted in our results section when we differentiate between feature service queries per second (FS-QPS) and the resulting DynamoDB queries per second (DDB-QPS).

Test Data

As discussed in a previous post, Tecton uses a “tiling” mechanism for computing aggregate features. The size of the aggregation window impacts serving latency.

For the following scenarios we retrieved a variety of features. These features were a combination of sum-aggregations and simple look up features.

Feature TypeNumber of Features
Non- Aggregate1250
28 Day Aggregate1250
7 Day Aggregate1250
1 Day Aggregate1000
356 Day Aggregate250
Total:5000

Results

Scenario 1

The first scenario targeted multiple Feature Services with a varying number of Feature Views and total features.

FS-QPS# of Feature ViewsDynamoDB TablesDDB-QPS# of Features
Feature Service 160,00050503,000,0005000
Feature Service 220,00055100,000500
Feature Service 320,0001010200,0001000
Total:100,00065653,300,0006,500

Latency

P-ValueObserved Latency
p50 Latency14ms
p90 Latency28ms
p95 Latency37ms
p99 Latency55ms

Availability

Scenario 2

Scenario 2 tested our On Demand Feature View. As noted earlier this type of Feature View does not rely on DynamoDB to store the computed value as the value is computed at the time of request.

The On Demand Feature View was written in Python and calculated the Jaccard Similarity between two sets of data. The data in the two sets was the result of two queries to the same datasource with different primary keys. The Jaccard Similarity was then calculated On Demand for these two sets of data. The primary keys were randomly selected to ensure that the same query wasn’t repeated.

# of Feature Views# of FeaturesDynamoDB TablesDDB-QPSFS-QPS
Feature Service 1110040,000

Latency

P-ValueObserved Latency
p50 Latency20ms
p90 Latency35ms
p95 Latency40ms
p99 Latency70ms

Availability

Conclusion

Reviewing the results we can notice that in all use cases latency was under 75ms per request. We can also appreciate the fact that the error rates were < 0.00001% meaning that in a test where we’re doing 100,000 queries per second we can expect a single digit failure rate (~99.9999 availability). With these details we can see that Tecton meets its stated SLA’s (p99 < 100ms and 99.9% uptime) under a variety of high load scenarios.

Tecton is built to scale to the volumes needed by the largest and most sophisticated ML organizations on the planet. At Tecton we continue to push the boundaries of scale and performance to achieve lower latency. We are currently working on adding support for other stores like Redis which will further lower our p99 latency and provide lower total cost of operations. We will be following up with those numbers in a future blog. If you’re curious to try Tecton out for yourself, check out tecton.ai.

Book a Demo

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button

Read what top ML teams are building

Our newsletter shares practical insights on productionizing data for AI.

Contact Sales

Interested in trying Tecton? Leave us your information below and we’ll be in touch.​

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button

Request a free trial

Interested in trying Tecton? Leave us your information below and we’ll be in touch.​

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button