In this section of the notebook, we'll load our fraud detection model from MLflow, take the transaction details we assume were passed in with the request (transaction type and amount), and combine them with features about the user that we retrieve from Tecton.
Ordinarily, this would run behind a REST API so the inference can happen in real time.
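To make that concrete, here's a minimal sketch of such an endpoint using Flask. Everything in it is illustrative rather than part of this notebook: the route, the payload fields, and `build_request_row` (a helper sketched later in this section) stand in for the cells below, and `fs` and `model` are assumed to be initialized at startup the same way we initialize them interactively here.

# A minimal, illustrative scoring endpoint. Assumes `fs` (the Tecton feature
# service) and `model` (the MLflow model) were set up at startup as in the
# cells below; build_request_row is a hypothetical helper sketched later on.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/score', methods=['POST'])
def score():
    payload = request.get_json()  # e.g. {"user_id": ..., "amount": ..., "type": ...}
    vec = fs.get_feature_vector(join_keys={'user_id': payload['user_id']}).to_pandas()
    row = build_request_row(vec, payload['amount'], payload['type'])
    return jsonify({'fraud': int(model.predict(row)[0])})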
# Here we import the tecton library and use its `get_feature_vector` convenience
# function, but we could also skip the SDK and issue an HTTP request directly to
# the Tecton endpoint (see the sketch after this cell)
import tecton
import mlflow
from dotenv import load_dotenv, find_dotenv
import os
import pandas as pd
# Load the TECTON_API_KEY from a .env file - in production you might
# pass this in as an environment variable or use a secrets manager
load_dotenv(find_dotenv())
True
# Authenticate first, then look up the feature service in the prod workspace
tecton_api_key = os.environ['TECTON_API_KEY']
tecton.set_credentials(tecton_api_key)
ws = tecton.get_workspace('prod')
fs = ws.get_feature_service('fraud_prediction_service')
vec = fs.get_feature_vector(join_keys={"user_id": "C1986564990"}).to_pandas()
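As mentioned above, we could also skip the SDK and hit Tecton's HTTP feature-serving API directly. A sketch follows; the cluster URL is a placeholder for your own deployment, and the request shape follows Tecton's documented get-features endpoint at the time of writing, so check the docs for your server version.

# The same lookup without the tecton SDK, via Tecton's HTTP API.
# <your-cluster> is a placeholder; verify the endpoint shape against
# the Tecton docs for your deployment.
import requests

resp = requests.post(
    'https://<your-cluster>.tecton.ai/api/v1/feature-service/get-features',
    headers={'Authorization': f'Tecton-key {tecton_api_key}'},
    json={
        'params': {
            'workspace_name': 'prod',
            'feature_service_name': 'fraud_prediction_service',
            'join_key_map': {'user_id': 'C1986564990'},
        }
    },
)
resp.json()  # feature names and values, analogous to `vec` above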
Here's what we get back from Tecton:
vec.T
|  | 0 |
| --- | --- |
| days_since_last_transaction.days_since_last | 82 |
| transaction_aggregates.amount_mean_12h_1h | None |
| transaction_aggregates.amount_mean_168h_1h | None |
| transaction_aggregates.amount_mean_1h_1h | None |
| transaction_aggregates.amount_mean_24h_1h | None |
| transaction_aggregates.amount_mean_72h_1h | None |
| transaction_aggregates.amount_mean_960h_1h | None |
| transaction_aggregates.transaction_sum_12h_1h | None |
| transaction_aggregates.transaction_sum_168h_1h | None |
| transaction_aggregates.transaction_sum_1h_1h | None |
| transaction_aggregates.transaction_sum_24h_1h | None |
| transaction_aggregates.transaction_sum_72h_1h | None |
| transaction_aggregates.transaction_sum_960h_1h | None |
| user_age_days.age | 10477 |
| users_credit_score.credit_score | None |
# The transaction details we assume arrived with the request: the amount
# and a one-hot encoding of the transaction type
req_cols = pd.DataFrame({
'amount': 123.45,
'type_cash_in': True,
'type_cash_out': False,
'type_debit': False,
'type_payment': False,
'type_transfer': False
}, index=[0])
# Merge the request features with the Tecton features
req = vec.merge(req_cols, left_index=True, right_index=True)
# Reorder the columns to match the order the model was trained on
req_ordered = req[['amount', 'type_cash_in',
'type_cash_out', 'type_debit', 'type_payment', 'type_transfer',
'transaction_aggregates.transaction_sum_1h_1h',
'transaction_aggregates.transaction_sum_12h_1h',
'transaction_aggregates.transaction_sum_24h_1h',
'transaction_aggregates.transaction_sum_72h_1h',
'transaction_aggregates.transaction_sum_168h_1h',
'transaction_aggregates.transaction_sum_960h_1h',
'transaction_aggregates.amount_mean_1h_1h',
'transaction_aggregates.amount_mean_12h_1h',
'transaction_aggregates.amount_mean_24h_1h',
'transaction_aggregates.amount_mean_72h_1h',
'transaction_aggregates.amount_mean_168h_1h',
'transaction_aggregates.amount_mean_960h_1h',
'users_credit_score.credit_score',
'days_since_last_transaction.days_since_last', 'user_age_days.age']]
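If you assemble these rows often (as a real service would), it can be worth wrapping the merge-and-reorder steps in a helper. A sketch; the function name, the `TX_TYPES` list (inferred from the one-hot columns above), and `MODEL_COLUMNS` (which would hold the ordered column list from the cell above) are our own, not part of the notebook:

# Hypothetical helper that reproduces the steps above for any transaction.
# MODEL_COLUMNS would hold the ordered column list used for req_ordered.
TX_TYPES = ['cash_in', 'cash_out', 'debit', 'payment', 'transfer']

def build_request_row(vec, amount, tx_type):
    req_cols = pd.DataFrame(
        {'amount': amount, **{f'type_{t}': t == tx_type for t in TX_TYPES}},
        index=[0],
    )
    return vec.merge(req_cols, left_index=True, right_index=True)[MODEL_COLUMNS]

# build_request_row(vec, 123.45, 'cash_in') reproduces req_ordered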
And here's the final vector we'll pass to the model: the request features (amount and transaction type columns) and the Tecton features about the user, in the column order the model expects.
req_ordered.T
|  | 0 |
| --- | --- |
| amount | 123.45 |
| type_cash_in | True |
| type_cash_out | False |
| type_debit | False |
| type_payment | False |
| type_transfer | False |
| transaction_aggregates.transaction_sum_1h_1h | None |
| transaction_aggregates.transaction_sum_12h_1h | None |
| transaction_aggregates.transaction_sum_24h_1h | None |
| transaction_aggregates.transaction_sum_72h_1h | None |
| transaction_aggregates.transaction_sum_168h_1h | None |
| transaction_aggregates.transaction_sum_960h_1h | None |
| transaction_aggregates.amount_mean_1h_1h | None |
| transaction_aggregates.amount_mean_12h_1h | None |
| transaction_aggregates.amount_mean_24h_1h | None |
| transaction_aggregates.amount_mean_72h_1h | None |
| transaction_aggregates.amount_mean_168h_1h | None |
| transaction_aggregates.amount_mean_960h_1h | None |
| users_credit_score.credit_score | None |
| days_since_last_transaction.days_since_last | 82 |
| user_age_days.age | 10477 |
To load the model from Databricks, you need all of the libraries the model uses installed locally, at the same versions it was trained with. You can see which versions were captured by inspecting the conda.yaml file under "Artifacts" in the model's MLflow run.
# We load the model from MLflow - in Databricks we transitioned the model trained
# above to the "Production" stage, so we can reference that stage in the model URI
os.environ['MLFLOW_TRACKING_URI'] = 'databricks'
model = mlflow.sklearn.load_model('models:/fraud_detection/Production')
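Rather than reading conda.yaml by hand in the UI, newer MLflow versions (roughly 1.25 onward; check your version's docs) can fetch the captured dependencies for you:

# Download the pip requirements MLflow captured with the model; the returned
# path points to a requirements.txt you can diff against your environment
deps_path = mlflow.pyfunc.get_model_dependencies('models:/fraud_detection/Production')
print(open(deps_path).read())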
# predict returns an array of predictions. Since we only passed in one row, res has length 1.
res = model.predict(req_ordered)
# This is the result of the prediction (0 = not fraud, 1 = fraud)
res[0]
0
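If the underlying estimator supports it (an assumption on our part; not every sklearn model does), you can also pull a fraud probability instead of a hard label and pick your own threshold:

# P(fraud) for our single row - assumes the estimator implements predict_proba
proba = model.predict_proba(req_ordered)[0, 1]
# The 0.5 cutoff is illustrative; tune it to your precision/recall tradeoff
is_fraud = proba > 0.5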