Rules & Heuristics for Machine Learning
Tecton is a feature platform for production machine learning, built to orchestrate the complete lifecycle of features, from transformation to online serving. Developers use Tecton to power feature development for real-time machine learning because building and maintaining their own platform comes with a number of significant challenges. Because features are created and retrieved as code, the platform can also be used in other ways: we have seen a number of organizations use Tecton to power not only their features, but also rules and heuristics.
An introduction to rules engines & heuristics for ML
Rules engines implement business logic through conditionals. These rules can take the form of policies that an organization must adhere to, or heuristics that guide informed decisions. To draw a clear distinction between the two, we will look at how each can be used to decide whether to approve a loan request.
Rule-based policies
Regulations and restrictions may prohibit an organization from taking certain types of actions, regardless of what an ML model is set up to predict. If we are considering loan approvals, there is a laundry list of rules that could prevent requests from even being considered, such as:
- Minimum and maximum loan amounts
- The requester’s state of residence or age
- The requester’s loan history or credit score
Enforcing policies with a rules engine gives a clear, understandable reason for rejecting these requests and removes the risk of an ML model approving a request that policy forbids (a false positive). Once a set of policies is defined, it can be enforced directly, used as features for models, or incorporated into data pipelines that chain multiple stages of rules, heuristics, features, and models.
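As a minimal, framework-free sketch of the policies listed above, each policy can be written as a named predicate over the request, and a request is rejected if any predicate fails. The thresholds, the covered states, and the helper names (`age_on`, `violated_policies`) are illustrative assumptions rather than real lending policy; the field names mirror the request schema used later in this post, with the date of birth already parsed into a `date`.

```python
from datetime import date

# Hypothetical policy thresholds; real values come from regulation and business policy.
MIN_LOAN, MAX_LOAN = 1_000, 50_000
MIN_CREDIT_SCORE = 600
MIN_AGE = 18
COVERED_STATES = {"CA", "NY"}


def age_on(dob: date, today: date) -> int:
    """Whole years elapsed between dob and today."""
    return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


# Each policy is a (name, predicate) pair; the predicate returns True when the request passes.
POLICIES = [
    ("loan_amount_in_range", lambda r, today: MIN_LOAN <= r["loan_request_amount"] <= MAX_LOAN),
    ("credit_score_minimum", lambda r, today: r["user_credit_score"] >= MIN_CREDIT_SCORE),
    ("age_minimum", lambda r, today: age_on(r["user_dob"], today) >= MIN_AGE),
    ("state_covered", lambda r, today: r["user_state"] in COVERED_STATES),
]


def violated_policies(request: dict, today: date) -> list:
    """Names of every policy the request violates; an empty list means it passes."""
    return [name for name, passes in POLICIES if not passes(request, today)]


if __name__ == "__main__":
    request = {
        "loan_request_amount": 75_000,
        "user_credit_score": 710,
        "user_dob": date(1990, 5, 17),
        "user_state": "TX",
    }
    print(violated_policies(request, today=date(2024, 1, 1)))
    # → ['loan_amount_in_range', 'state_covered']
```

Returning the names of the violated policies, rather than a single boolean, keeps the rejection explainable: the applicant can be told exactly which policy blocked the request.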
Rule-based heuristics
Rules engines can also express more complicated logic. Whereas the policies described above are regulations that must be followed, heuristics are rules and/or rule-based algorithms that do not have to be followed but can inform a decision.
For instance, a heuristic for new loan requests could use historical data to check whether the applicant’s credit score falls in a higher percentile than the requested loan amount. In the graph below, this heuristic would accept every loan request in the green area but reject every request above the dotted blue line.
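The percentile heuristic can be sketched in plain Python. The `percentile_rank` helper below is written to reproduce the default `kind='rank'` behavior of `scipy.stats.percentileofscore` (tied values receive the average of their rankings), the function used in the Tecton example later in this post; the helper names and the sample history are hypothetical. Returning both ranks alongside the decision keeps the outcome explainable.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(history: list, score: float) -> float:
    """Percentile (0-100) of `score` within `history`.

    Intended to match scipy.stats.percentileofscore(kind="rank"):
    tied values get the average of the percentile rankings they span.
    """
    values = sorted(history)
    n = len(values)
    if n == 0:
        raise ValueError("history must contain at least one value")
    below = bisect_left(values, score)         # count of values < score
    at_or_below = bisect_right(values, score)  # count of values <= score
    plus_one = 1 if below < at_or_below else 0
    return (below + at_or_below + plus_one) * 50.0 / n


def heuristic_accept(history_amounts, history_scores, amount, credit_score) -> dict:
    """Accept when the credit score ranks at least as high as the requested amount."""
    loan_rank = percentile_rank(history_amounts, amount)
    credit_rank = percentile_rank(history_scores, credit_score)
    return {
        "loan_rank": loan_rank,
        "credit_rank": credit_rank,
        "accept_loan": loan_rank <= credit_rank,
    }


if __name__ == "__main__":
    amounts = [5_000, 10_000, 20_000, 40_000]  # hypothetical historical loan amounts
    scores = [580, 640, 700, 760]              # hypothetical historical credit scores
    print(heuristic_accept(amounts, scores, amount=10_000, credit_score=700))
    # → {'loan_rank': 50.0, 'credit_rank': 75.0, 'accept_loan': True}
```

With this sample history, a $40,000 request from an applicant with a 640 score would land at the 100th loan percentile but only the 50th credit percentile and be rejected, and the two ranks explain why.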
This example illustrates three characteristics that good heuristics should have:
- Simple. A heuristic should be easy to implement, trivial to update, and able to run efficiently on data and compute resources that are readily available. In the loan approval example, every request should at least include an amount and the applicant’s credit score, and historical approval data should be easy to obtain and work with. Heuristics are a great way to launch a new product before enough data is available for machine learning.
- Explainable. The outcome of a heuristic-based decision should be easy to understand for those it affects. For instance, if a loan request is rejected, both the applicant and the organization using the heuristic should be able to understand why; in our example, either the requested loan amount is too high, the credit score is too low, or both.
- Informed. The heuristics an organization uses should be informed by its own data and opinions. In the example, we use the organization’s approval history to inform new approvals. This provides a clearer path to eventually replacing the heuristics with an ML model, and the heuristics themselves can become features for a decision-making model if the organization so chooses.
Improving ML model performance with rules and heuristics in Tecton
Rules and heuristics can be used in place of machine learning to make a decision, or alongside an ML model to improve its performance. When it is time to make a decision (e.g., approving or denying a loan), we can first check whether the request breaks any rules. If it doesn’t, it can then be sent to both the heuristic and the ML model. The heuristic acts as a performance baseline for the model’s predictions, confirming that the model’s gains justify the additional complexity of building and running it.
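The decision flow described above can be sketched independently of Tecton. In this minimal sketch the rules, heuristic, and model are plain callables passed in by the caller; the `Decision` record and `decide` function are hypothetical names, and the choice to let the model make the final call while logging the heuristic for offline comparison is an assumption, not the only reasonable design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    approved: bool
    stage: str                                  # "rules" or "model"
    violated_rules: List[str] = field(default_factory=list)
    heuristic_approved: Optional[bool] = None
    model_approved: Optional[bool] = None


def decide(
    request: dict,
    rules: Dict[str, Callable[[dict], bool]],
    heuristic: Callable[[dict], bool],
    model: Callable[[dict], bool],
) -> Decision:
    # 1. Policies are hard constraints: any violation rejects without consulting the model.
    violated = [name for name, passes in rules.items() if not passes(request)]
    if violated:
        return Decision(approved=False, stage="rules", violated_rules=violated)
    # 2. Otherwise score with both; the model decides and the heuristic is kept as a baseline.
    h = heuristic(request)
    m = model(request)
    return Decision(approved=m, stage="model", heuristic_approved=h, model_approved=m)


if __name__ == "__main__":
    rules = {"state_covered": lambda r: r["user_state"] in {"CA", "NY"}}
    heuristic = lambda r: r["user_credit_score"] >= 650     # hypothetical heuristic
    model = lambda r: r["loan_request_amount"] < 20_000     # stand-in for a trained model
    print(decide({"user_state": "TX", "user_credit_score": 700, "loan_request_amount": 5_000},
                 rules, heuristic, model))
```

Recording both answers on every request is what later makes it possible to measure whether the model actually beats the heuristic.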
Tecton offers a single platform to manage your rules and features as code, and its tagging system can help you differentiate between them. We’ve created a sample notebook that utilizes Tecton to develop policies, heuristics, and features all working together to solve the single problem of loan approvals and rejections.
Creating rules in Tecton
Tecton’s On-Demand Feature Views (ODFVs) can be used individually or pipelined together to create rules engines. Simple policies need ODFVs that depend only on a RequestSource. Once the rules are created, grouping them with common tags and a Feature Service makes them easier to version-control and manage in production, and lets you check a request against every rule by iterating through the Feature Service.
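from tecton import FeatureService, RequestSource, on_demand_feature_view
from tecton.types import Bool, Field, String

# Request-time inputs for the policy checks (schema assumed here; the
# sample notebook's request source may include more fields)
loan_request = RequestSource(schema=[Field('user_dob', String), Field('user_state', String)])

@on_demand_feature_view(
    sources=[loan_request],
    mode='python',
    schema=[Field('request_age_check', Bool)],
    description='The loan requester is older than 18',
    tags={'feature_type': 'rule'}
)
def request_age_check(loan_request):
    # Imports live inside the function body so the transformation is self-contained
    from datetime import datetime
    dob = datetime.strptime(loan_request['user_dob'], '%m/%d/%Y')
    age_in_years = (datetime.now() - dob).days / 365.2425
    return {'request_age_check': age_in_years > 18}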
@on_demand_feature_view(
    sources=[loan_request],
    mode='python',
    schema=[Field('request_residence_check', Bool)],
    description='The loan request is coming from a coverage area',
    tags={'feature_type': 'rule'}
)
def request_residence_check(loan_request):
    # Only requests from covered states are eligible
    states = ['CA', 'NY']
    return {'request_residence_check': loan_request['user_state'] in states}
# Group every rule in one Feature Service so the rules are versioned and served together
rules_fs = FeatureService(
    name='rules_fs',
    features=[
        request_age_check,
        request_residence_check,
    ],
)
rules_fs.validate()
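# Hypothetical mock request for local testing
mock_loan_request = {'user_dob': '05/17/1990', 'user_state': 'CA'}

# Run each rule's transformation against the mock request and report the result
for rule in rules_fs.features:
    odfv = rule.feature_definition
    passed = list(odfv.run(loan_request=mock_loan_request).values())[0]
    if passed:
        print("Request passes rule " + odfv.info.name)
    else:
        print("Request violates rule " + odfv.info.name)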
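ODFVs can also combine the request sources used for policies with historical data to build heuristics. These heuristics can encode more complex logic and serve as a baseline for an ML model’s performance. In the example below, `loan_percentiles` (not shown here) supplies the historical loan amounts and credit scores that each request is ranked against.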
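from tecton import RequestSource, on_demand_feature_view
from tecton.types import Bool, Field, Float64, Int64, String

# Request-time fields for the heuristic
heuristic_schema = [
    Field("loan_request_amount", Int64),
    Field("user_credit_score", Int64),
    Field("user_dob", String),
    Field("user_state", String),
]
heuristic_request = RequestSource(schema=heuristic_schema)

# Output field names must match the keys returned by the transformation
output_schema = [
    Field("loan_rank", Float64),
    Field("credit_rank", Float64),
    Field("heuristic_accept_loan", Bool),
]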
@on_demand_feature_view(
    sources=[heuristic_request, loan_percentiles],
    mode='python',
    schema=output_schema
)
def loan_odfv(heuristic_request, loan_percentiles):
    from scipy import stats
    # Percentile of the requested amount and of the applicant's credit score within historical data
    loan_rank = stats.percentileofscore(loan_percentiles['loan_request_amount'], heuristic_request['loan_request_amount'])
    credit_rank = stats.percentileofscore(loan_percentiles['user_credit_score'], heuristic_request['user_credit_score'])
    # Cast NumPy scalars to built-in Python types to match the Float64/Bool schema
    return {
        'loan_rank': float(loan_rank),
        'credit_rank': float(credit_rank),
        # Accept when the credit score ranks at least as high as the loan amount
        'heuristic_accept_loan': bool(loan_rank <= credit_rank)
    }

loan_odfv.validate()
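To use the heuristic as a baseline, its decisions and the model’s can be scored against the same labeled historical outcomes, for example by collecting the heuristic decision from `loan_odfv` for each past request. The sketch below uses accuracy as the metric and a hypothetical `min_lift` margin; both are assumptions, and a lending team might prefer precision, recall, or expected loss instead.

```python
def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def compare_to_baseline(heuristic_preds, model_preds, labels, min_lift: float = 0.0) -> dict:
    """Score both approaches and flag whether the model beats the heuristic by more than min_lift."""
    h_acc = accuracy(heuristic_preds, labels)
    m_acc = accuracy(model_preds, labels)
    return {
        "heuristic_accuracy": h_acc,
        "model_accuracy": m_acc,
        "model_justified": m_acc - h_acc > min_lift,
    }


if __name__ == "__main__":
    labels = [True, True, False, False]            # hypothetical: True = should have been approved
    heuristic_preds = [True, False, False, True]
    model_preds = [True, True, False, True]
    print(compare_to_baseline(heuristic_preds, model_preds, labels))
    # → {'heuristic_accuracy': 0.5, 'model_accuracy': 0.75, 'model_justified': True}
```

Raising `min_lift` encodes how much better the model must be before its extra complexity is considered worthwhile.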
Key Takeaways
Traditionally, Tecton powers features for production ML. Used as a rules engine to implement policies and heuristics, however, it extends beyond that role and can help improve the performance of the models that consume Tecton features. As mentioned above, there is a sample notebook that uses Tecton to build and use rules, heuristics, and features. The notebook uses Rift, a Python-based compute engine optimized for AI data workflows. Check out this blog post for more information on Rift, including how to use it to infuse real-time AI decisioning into production applications.