Tecton

How Good Models Go Bad in Production

Published: December 10, 2024

Ultimately, you want your model to make better predictions, which means improving model accuracy. Model accuracy measures how often a model’s predictions align with the actual values. How well can it reflect the current state of the real world?

The key phrase here is “real world,” because the hard part of machine learning is keeping a model effective in real-world situations.

Training an accurate model is one thing. Training a model that stays accurate in production? That’s a whole other ball game. One study found that 91% of ML models degrade over time once they’re in production.

Why do seemingly accurate models stumble in the real world? While many factors affect model accuracy, one of the most fundamental is feature production. Models are only as good as the features that inform them. From the experimentation and training phase to the post-production work of learning and iterating, every step of the feature production process has a significant impact on model accuracy.

So when a good model goes bad, suboptimal feature engineering practices are often the culprit. Let’s look at two different moments in the feature lifecycle that influence how models perform in production. 

Pre-production: Getting the training data right in the first place

The first big challenge hits you right at the beginning: making sure your training data will hold up in production and avoiding training/serving skew. It might seem straightforward – your data scientists write features, and those features go into production. But here’s where things often go sideways:

The feature logic is different: The data engineering team might interpret the feature logic differently from the data scientists, so when they build data pipelines, the features are calculated incorrectly.

The features aren’t historically accurate: Point-in-time accuracy is tricky. If you just queried your data for the latest known feature values, it would fetch present-day values that weren’t available in the past (known as “data leakage”). This creates unrealistically good performance in training, but fails in real-world applications. For instance, if you want to predict customer churn 30 days before it happens, you need to train on what customer data looked like 30 days before churn events. Using data from the day before they churned would be “cheating” because when making the real prediction, you won’t have that data.
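To make the churn example concrete, here is a minimal sketch of a point-in-time-correct join using pandas `merge_asof`. All data, column names, and the 30-day horizon are hypothetical; the point is that each label row only sees feature values that were already known at its cutoff time.

```python
import pandas as pd

# Label events: churn outcomes observed at a given time (hypothetical data).
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2024-03-01", "2024-06-01", "2024-04-15"]),
    "churned": [0, 1, 1],
})

# Feature values, each stamped with when the value became known.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(
        ["2024-01-10", "2024-04-20", "2024-02-01", "2024-04-10"]
    ),
    "logins_last_30d": [12, 3, 8, 1],
})

# Predicting 30 days ahead means only using feature values known at least
# 30 days before the label time; anything later would be leakage.
labels["cutoff"] = labels["label_time"] - pd.Timedelta(days=30)

training = pd.merge_asof(
    labels.sort_values("cutoff"),
    features.sort_values("feature_time"),
    left_on="cutoff",
    right_on="feature_time",
    by="user_id",
    direction="backward",  # latest value at or before the cutoff
)
print(training[["user_id", "label_time", "logins_last_30d", "churned"]])
```

Querying the features table directly at training time would instead return each user’s most recent value, which is exactly the leakage described above.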

The training data isn’t comprehensive enough: It takes complex data engineering to add all relevant features into a single dataset, including joining millions of records from disparate batch, streaming, and real-time sources. And it’s difficult to calculate advanced features you might need, such as time-window aggregations and embedding-based features.
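As a small illustration of the time-window aggregations mentioned above, here is a sketch of per-user trailing 24-hour aggregates over an event stream, using pandas. The events and column names are hypothetical; a production system would compute these continuously over streaming data rather than in a batch DataFrame.

```python
import pandas as pd

# Hypothetical transaction events, e.g. inputs to a fraud model.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_time": pd.to_datetime([
        "2024-05-01 10:00", "2024-05-01 10:30", "2024-05-03 09:00",
        "2024-05-01 12:00", "2024-05-02 12:00",
    ]),
    "amount": [20.0, 35.0, 15.0, 100.0, 50.0],
}).sort_values("event_time")

# Sliding 24-hour aggregations per user, evaluated at each event:
# transaction count and total spend in the trailing day.
windowed = (
    events.set_index("event_time")
    .groupby("user_id")["amount"]
    .rolling("24h")
    .agg(["count", "sum"])
    .rename(columns={"count": "txn_count_24h", "sum": "spend_24h"})
    .reset_index()
)
print(windowed)
```

Even this toy version shows why it gets hard: the same logic must be replayed historically for training and kept fresh online for serving.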

The key is to match your training data with how you will actually use that data in production – from the way you calculate the features, to the prediction horizon, to the data sources needed. 

And even when you get to deployment, your work is just beginning.

Post-production: Maintaining accuracy as the world changes

So you’ve trained your model and deployed your feature pipelines – but now you need to fight off model drift. As one study on model degradation pointed out, “models become inherently dependent on the data as it was at the time of training.” 

The world changes. So your model accuracy inevitably changes, or “drifts”, when deployed in production environments. The data as it was at the time you first trained the model no longer accurately reflects new patterns. There are two types of drift:

Concept drift: The relationship changes between the target variable and the input features. Concept drift is what happens when the patterns that the model has learned are no longer reflective of the world. For example, new fraud techniques emerge, or a user’s movie preferences change.

Data drift: The statistical properties of the input features change over time. The data itself looks different, but the relationship between inputs and outputs remains the same. For instance, you trained a model on housing prices from 2010–2020, but the distribution of house sizes or locations in your new data from 2024 is significantly different.
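One common way to catch data drift is to compare a feature’s production distribution against its training distribution, for example with the Population Stability Index. Below is a sketch using NumPy on synthetic data; the 0.1/0.25 thresholds are a widely used rule of thumb, not a universal standard, and should be tuned per use case.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a production sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (tune per use case).
    """
    # Bin edges come from the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values

    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)

    # Smooth to avoid division by zero in empty bins.
    exp_pct = (exp_counts + 1e-6) / (exp_counts.sum() + 1e-6 * bins)
    act_pct = (act_counts + 1e-6) / (act_counts.sum() + 1e-6 * bins)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
train_sample = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training era
prod_sample = rng.normal(loc=0.8, scale=1.2, size=10_000)   # shifted production data

print(population_stability_index(train_sample, prod_sample))
```

Running a check like this per feature on a schedule is a lightweight first line of defense before investing in full drift-monitoring tooling.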

Stale features are a silent accuracy killer. Without the right intervention, models lose their predictive power as they operate on increasingly outdated patterns and data distributions, potentially leading to costly mistakes in critical business decisions.

How to improve accuracy and prevent your model’s untimely demise

High accuracy might seem unattainable, especially if your ML team doesn’t have a huge budget or a deep bench of ML talent. But model accuracy doesn’t have to be a losing battle. 

There are a few best practices for improving accuracy more efficiently, by optimizing your feature engineering process:

  • Get data scientists and data engineers on the same page with a common language and a single feature definition for training and serving.
  • Combine batch, streaming, and real-time features to create comprehensive training datasets that use all relevant signals.
  • Quickly generate accurate training datasets in a single line of code.
  • Rapidly iterate on features, making improvements as you get real-world feedback.
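The first practice above, a single feature definition shared by training and serving, can be sketched in plain Python. The function and data here are hypothetical, and a feature platform would manage the definition for you, but the idea is that both paths call the exact same logic so they cannot drift apart.

```python
from datetime import datetime, timedelta

def logins_last_30d(login_times: list[datetime], as_of: datetime) -> int:
    """Count logins in the 30 days before `as_of`.

    The same function is used to backfill training data (with a
    historical `as_of`) and to serve features at prediction time.
    """
    window_start = as_of - timedelta(days=30)
    return sum(window_start <= t < as_of for t in login_times)

history = [datetime(2024, 5, 1), datetime(2024, 5, 20), datetime(2024, 6, 2)]

# Training path: evaluate at a historical cutoff.
train_value = logins_last_30d(history, as_of=datetime(2024, 6, 1))

# Serving path: evaluate the identical logic at prediction time.
serve_value = logins_last_30d(history, as_of=datetime(2024, 6, 10))
```

With two independent implementations (say, SQL for training and application code for serving), subtle differences in window boundaries like these are exactly what causes training/serving skew.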

Ready to dive in more? Watch this virtual talk to hear from experts who have seen the same accuracy challenges that your team might be facing.
