Tecton

How Good Models Go Bad in Production

Published: December 10, 2024

Ultimately, you want your model to make better predictions, which means improving model accuracy. Model accuracy measures how often a model’s predictions align with the actual values. How well can it reflect the current state of the real world?

The key phrase here is “real world,” because the hard part of machine learning is keeping a model effective in real-world situations.

Training an accurate model is one thing. Training a model that stays accurate in production? That’s a whole other ball game. One study found that 91% of ML models degrade over time once they’re in production.

Why do seemingly accurate models stumble in the real world? While many factors affect model accuracy, one of the most fundamental is feature production. Models are only as good as the features that inform them. From the experimentation and training phase to the post-production work of learning and iterating, every step of the feature production process has a significant impact on model accuracy.

So when a good model goes bad, suboptimal feature engineering practices are often the culprit. Let’s look at two different moments in the feature lifecycle that influence how models perform in production. 

Pre-production: Getting the training data right in the first place

The first big challenge hits you right at the beginning: making sure your training data will hold up in production and avoiding training/serving skew. It might seem straightforward – your data scientists write features, and those features go into production. But here’s where things often go sideways:

The feature logic is different: The data engineering team might interpret the feature logic differently from the data scientists, so when they build data pipelines, the features are calculated incorrectly.

The features aren’t historically accurate: Point-in-time accuracy is tricky. If you just queried your data for the latest known feature values, it would fetch present-day values that weren’t available in the past (known as “data leakage”). This creates unrealistically good performance in training, but fails in real-world applications. For instance, if you want to predict customer churn 30 days before it happens, you need to train on what customer data looked like 30 days before churn events. Using data from the day before they churned would be “cheating” because when making the real prediction, you won’t have that data.
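To make the churn example concrete, here is a minimal sketch of a point-in-time-correct join using pandas `merge_asof`. All data, column names, and the 30-day horizon are hypothetical; the point is that each label row only sees feature values that were already known at its cutoff time.

```python
import pandas as pd

# Label events: churn outcomes observed at a given time (hypothetical data).
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "label_time": pd.to_datetime(["2024-03-01", "2024-06-01", "2024-04-15"]),
    "churned": [0, 1, 1],
})

# Feature values, each stamped with when the value became known.
features = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "feature_time": pd.to_datetime(
        ["2024-01-10", "2024-04-20", "2024-02-01", "2024-04-10"]
    ),
    "logins_last_30d": [12, 3, 8, 1],
})

# Predicting 30 days ahead means only using feature values known at least
# 30 days before the label time; anything later would be leakage.
labels["cutoff"] = labels["label_time"] - pd.Timedelta(days=30)

training = pd.merge_asof(
    labels.sort_values("cutoff"),
    features.sort_values("feature_time"),
    left_on="cutoff",
    right_on="feature_time",
    by="user_id",
    direction="backward",  # latest value at or before the cutoff
)
print(training[["user_id", "label_time", "logins_last_30d", "churned"]])
```

Querying the features table directly at training time would instead return each user’s most recent value, which is exactly the leakage described above.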

The training data isn’t comprehensive enough: It takes complex data engineering to add all relevant features into a single dataset, including joining millions of records from disparate batch, streaming, and real-time sources. And it’s difficult to calculate advanced features you might need, such as time-window aggregations and embedding-based features.
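As a small illustration of the time-window aggregations mentioned above, here is a sketch of per-user trailing 24-hour aggregates over an event stream, using pandas. The events and column names are hypothetical; a production system would compute these continuously over streaming data rather than in a batch DataFrame.

```python
import pandas as pd

# Hypothetical transaction events, e.g. inputs to a fraud model.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "event_time": pd.to_datetime([
        "2024-05-01 10:00", "2024-05-01 10:30", "2024-05-03 09:00",
        "2024-05-01 12:00", "2024-05-02 12:00",
    ]),
    "amount": [20.0, 35.0, 15.0, 100.0, 50.0],
}).sort_values("event_time")

# Sliding 24-hour aggregations per user, evaluated at each event:
# transaction count and total spend in the trailing day.
windowed = (
    events.set_index("event_time")
    .groupby("user_id")["amount"]
    .rolling("24h")
    .agg(["count", "sum"])
    .rename(columns={"count": "txn_count_24h", "sum": "spend_24h"})
    .reset_index()
)
print(windowed)
```

Even this toy version shows why it gets hard: the same logic must be replayed historically for training and kept fresh online for serving.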

The key is to match your training data with how you will actually use that data in production – from the way you calculate the features, to the prediction horizon, to the data sources needed. 

And even when you get to deployment, your work is just beginning.

Post-production: Maintaining accuracy as the world changes

So you’ve trained your model and deployed your feature pipelines – but now you need to fight off model drift. As one study on model degradation pointed out, “models become inherently dependent on the data as it was at the time of training.” 

The world changes. So your model accuracy inevitably changes, or “drifts”, when deployed in production environments. The data as it was at the time you first trained the model no longer accurately reflects new patterns. There are two types of drift:

Concept drift: The relationship changes between the target variable and the input features. Concept drift is what happens when the patterns that the model has learned are no longer reflective of the world. For example, new fraud techniques emerge, or a user’s movie preferences change.

Data drift: The statistical properties of the input features change over time. The data itself looks different, but the relationship between inputs and outputs remains the same. For instance, you trained a model on housing prices from 2010–2020, but the distribution of house sizes or locations in your new data from 2024 is significantly different.
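One common way to catch data drift is to compare a feature’s production distribution against its training distribution, for example with the Population Stability Index. Below is a sketch using NumPy on synthetic data; the 0.1/0.25 thresholds are a widely used rule of thumb, not a universal standard, and should be tuned per use case.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a production sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (tune per use case).
    """
    # Bin edges come from the training distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values

    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)

    # Smooth to avoid division by zero in empty bins.
    exp_pct = (exp_counts + 1e-6) / (exp_counts.sum() + 1e-6 * bins)
    act_pct = (act_counts + 1e-6) / (act_counts.sum() + 1e-6 * bins)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

rng = np.random.default_rng(0)
train_sample = rng.normal(loc=0.0, scale=1.0, size=10_000)  # training era
prod_sample = rng.normal(loc=0.8, scale=1.2, size=10_000)   # shifted production data

print(population_stability_index(train_sample, prod_sample))
```

Running a check like this per feature on a schedule is a lightweight first line of defense before investing in full drift-monitoring tooling.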

Stale features are a silent accuracy killer. Without the right intervention, models lose their predictive power as they operate on increasingly outdated patterns and data distributions, potentially leading to costly mistakes in critical business decisions.

How to improve accuracy and prevent your model’s untimely demise

High accuracy might seem unattainable, especially if your ML team doesn’t have a huge budget or a deep bench of ML talent. But model accuracy doesn’t have to be a losing battle. 

There are a few best practices for improving accuracy more efficiently, by optimizing your feature engineering process:

  • Get data scientists and data engineers on the same page with a common language and a single feature definition for training and serving.
  • Combine batch, streaming, and real-time features to create comprehensive training datasets that use all relevant signals.
  • Quickly generate accurate training datasets in a single line of code.
  • Rapidly iterate on features, making improvements as you get real-world feedback.
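The first practice above, a single feature definition shared by training and serving, can be sketched in plain Python. The function and data here are hypothetical, and a feature platform would manage the definition for you, but the idea is that both paths call the exact same logic so they cannot drift apart.

```python
from datetime import datetime, timedelta

def logins_last_30d(login_times: list[datetime], as_of: datetime) -> int:
    """Count logins in the 30 days before `as_of`.

    The same function is used to backfill training data (with a
    historical `as_of`) and to serve features at prediction time.
    """
    window_start = as_of - timedelta(days=30)
    return sum(window_start <= t < as_of for t in login_times)

history = [datetime(2024, 5, 1), datetime(2024, 5, 20), datetime(2024, 6, 2)]

# Training path: evaluate at a historical cutoff.
train_value = logins_last_30d(history, as_of=datetime(2024, 6, 1))

# Serving path: evaluate the identical logic at prediction time.
serve_value = logins_last_30d(history, as_of=datetime(2024, 6, 10))
```

With two independent implementations (say, SQL for training and application code for serving), subtle differences in window boundaries like these are exactly what causes training/serving skew.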

Ready to dive in more? Watch this virtual talk to hear from experts who have seen the same accuracy challenges that your team might be facing.
