Introducing Array Type Features
We’re excited to announce that Tecton now natively supports Array type features. Our customers are now deploying array features in operational machine learning models. In this article, we’ll go through (1) how arrays are commonly used in operational machine learning systems and (2) an example of how a user can compute a similarity score between a product and a query using embeddings in real time with Tecton.
Array Features in Operational ML
Arrays are a feature data type that can be used across a number of applications. Consider a retailer that serves product recommendations to users based on their current search query and purchase history. Our retailer might build the following kinds of features:
- Lists of categorical variables.
  - `product_categories`: a list of the categories a product belongs to, e.g. `[shoes, women, outdoors]` for a pair of women’s hiking boots.
  - `user_last_10_purchased_products`: a list of the last ten product ids purchased by a user (a minimal sketch of computing this feature appears after this list). Using our streaming capabilities, Tecton can keep this feature extremely fresh.
- Dense embeddings.
  - `product_embedding`: a precomputed embedding based on each product’s description and metadata.
  - `search_text_embedding`: a query-time embedding computed from the user’s search text, e.g. `"5-piece knife set"`. This embedding can be provided to the Tecton API to be combined with precomputed features.
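To make the list features concrete, here is a minimal sketch of how user_last_10_purchased_products could be derived from a purchase-events table using plain pandas. The table and its column names (user_id, product_id, timestamp) are illustrative; in Tecton, the equivalent logic would live in a feature view over your purchase stream.

import pandas as pd

# Illustrative purchase-events table; in production this would come from a stream or warehouse.
purchases = pd.DataFrame({
    "user_id":    ["u1", "u1", "u2", "u1"],
    "product_id": ["p9", "p3", "p7", "p5"],
    "timestamp":  pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-02", "2022-01-04"]),
})

# For each user, collect the ids of their (up to) ten most recent purchases into an array feature.
user_last_10_purchased_products = (
    purchases.sort_values("timestamp")
    .groupby("user_id")["product_id"]
    .apply(lambda ids: list(ids)[-10:])
)

print(user_last_10_purchased_products)
# u1    [p9, p3, p5]
# u2            [p7]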
Because embeddings have become such an important part of operational ML systems, we dive deeper into how to use them in Tecton in the following section (see this article for more background on embeddings).
Embeddings
Embeddings are a way to transform text, images, or even arbitrary entities, such as a product id, into a lower-dimensional vector representation that captures most of the meaning in the original data.
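As a concrete illustration of that transformation, here is one common way to embed a search string in Python using the open-source sentence-transformers library. This is a generic sketch, not part of Tecton; the model name is just one popular pretrained choice.

# Illustrative sketch using the open-source sentence-transformers library (not part of Tecton).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a popular pretrained text-embedding model

# Encode a search query into a dense 384-dimensional float vector.
search_text_embedding = model.encode("5-piece knife set")
print(search_text_embedding.shape)  # (384,)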
By natively supporting arrays (including 32-bit float arrays), our customers can now easily bring powerful embedding features into production with a compact online storage format. This matters to our users because it can significantly reduce the infrastructure cost of online storage and serving.
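For a rough sense of the savings, here is a quick back-of-the-envelope in numpy: storing an embedding as 32-bit floats halves its footprint relative to the 64-bit default (actual online-store savings depend on the serialization format).

import numpy as np

embedding = np.random.rand(768)               # numpy defaults to 64-bit floats
print(embedding.nbytes)                       # 6144 bytes

embedding_f32 = embedding.astype(np.float32)  # the compact 32-bit representation
print(embedding_f32.nbytes)                   # 3072 bytes: half the footprint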
A very common use for embeddings is with language inputs, where the outputs of pre-trained embedding models like Word2vec and GloVe can be used directly as model features. Another use case we commonly see is employing embeddings to calculate a similarity score between two items and feeding that score to the model as a feature.
Let’s go back to our example to show how you can compute a similarity score in real time with Tecton. Our customer, the retailer, wants to compare a user’s search to the descriptions of products in the catalogue. Precomputing a similarity score for every possible search query against every product description is impossible, as there are endless combinations. Instead, the score must be computed on the fly between the query-time search embedding and the precomputed product embedding. Tecton lets you do this with sub-100ms latency, and it takes very little code:
import numpy as np
import pandas as pd
from numpy.linalg import norm
from pyspark.sql.types import DoubleType, StructField, StructType
from tecton import Input, on_demand_feature_view

@on_demand_feature_view(
    inputs={
        'product_embedding': Input(product_embedding),
        'search_text_embedding': Input(search_text_embedding)
    },
    output_schema=StructType([StructField('cosine_similarity', DoubleType())]),
    description="Computes the cosine similarity between a search text embedding and a precomputed product embedding."
)
def search_product_similarity(product_embedding: pd.DataFrame, search_text_embedding: pd.DataFrame):
    # np.vectorize applies the function row by row: each element of the
    # embedding columns is one array, so a and b are individual embeddings.
    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        return np.dot(a, b) / (norm(a) * norm(b))

    df = pd.DataFrame()
    df["cosine_similarity"] = cosine_similarity(search_text_embedding["embedding"], product_embedding["embedding"])
    return df
The feature author only needs to declare the inputs and write a simple pandas transformation that produces the similarity score. Tecton then orchestrates the pipelines to compute and serve the feature on demand. Tecton is uniquely built to simplify real-time machine learning applications.
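At serving time, the retailer’s application would fetch this feature from Tecton’s HTTP feature-serving API, passing the freshly computed search-text embedding as request-time context. Below is a rough sketch of such a request; the cluster URL, workspace, feature service name, and join key are all hypothetical placeholders, and the exact request shape may vary by Tecton version.

import os
import requests

# Hypothetical cluster URL; replace with your own Tecton deployment.
url = "https://yourcluster.tecton.ai/api/v1/feature-service/get-features"

request_body = {
    "params": {
        "workspace_name": "prod",                        # placeholder workspace
        "feature_service_name": "product_recs_service",  # placeholder feature service
        "join_key_map": {"product_id": "p5"},            # selects the precomputed product embedding
        # Query-time embedding computed from the user's search text (truncated for readability).
        "request_context_map": {"embedding": [0.12, -0.45, 0.33]},
    }
}

response = requests.post(
    url,
    headers={"Authorization": f"Tecton-key {os.environ['TECTON_API_KEY']}"},
    json=request_body,
)
print(response.json())  # the response includes the on-demand cosine_similarity value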
Conclusion
With the release of native support for array features, our customers can now deploy powerful features to production faster and at lower cost. At Tecton, we continue to add capabilities that let our customers easily put complex features into production. If you are an organization building operational ML models and want to learn more, you can request a free trial here.