Instance Type Configuration and Data Source Previews
Default Stream Cluster Configuration now includes On-demand instances for Driver nodes
Using an on-demand instance for the Spark driver node makes stream processing more reliable, because a spot termination of the driver would otherwise take down the entire cluster. By using a mix of instance types, you can keep that reliability while still taking advantage of cheaper spot instances for additional processing power.
The new first_on_demand parameter for DatabricksClusterConfig and EMRClusterConfig enables configuring a mix of on-demand and spot instances in a single cluster. When configured, the first first_on_demand nodes of the cluster will use on-demand instances. The rest will use the type specified by instance_availability.
If not specified, Tecton will default to first_on_demand=1 for StreamFeatureView and StreamWindowAggregateFeatureView.
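The allocation rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not Tecton's implementation: plan_node_availability and its is_stream_job flag are hypothetical names, the driver is assumed to be the first node, and the default of 0 for non-stream jobs is an assumption of this sketch that the notes above do not state.

```python
from typing import List, Optional


def plan_node_availability(
    num_nodes: int,
    instance_availability: str,
    first_on_demand: Optional[int] = None,
    is_stream_job: bool = False,
) -> List[str]:
    """Return the availability type of each node, driver first (index 0)."""
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node (the driver)")
    if first_on_demand is None:
        # Stream jobs default to 1, as stated in the release note.
        # The fallback of 0 for other jobs is an assumption of this sketch.
        first_on_demand = 1 if is_stream_job else 0
    if first_on_demand < 0:
        raise ValueError("first_on_demand must not be negative")
    # The first `first_on_demand` nodes are on-demand; the rest follow
    # whatever instance_availability specifies.
    on_demand_count = min(first_on_demand, num_nodes)
    remaining = num_nodes - on_demand_count
    return ["on_demand"] * on_demand_count + [instance_availability] * remaining


# A 4-node stream cluster with no explicit first_on_demand:
print(plan_node_availability(4, "spot", is_stream_job=True))
# ['on_demand', 'spot', 'spot', 'spot']
```

With the stream default, only the driver is pinned to an on-demand instance, so most of the cluster still benefits from spot pricing.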
Spot with fallback instance availability for Databricks
Materialization jobs on Databricks can now be configured to use the spot with fallback availability option.
DatabricksClusterConfig.instance_availability now supports the spot_with_fallback option. See the Databricks documentation for more details.
Raw Data Source Preview
You can now use the Tecton SDK to view Data Source inputs before the translator function is applied. Viewing a sample of this raw data can help debug translator and data source issues.
To do so, set apply_translator=False when using the StreamDataSource.start_stream_preview() or BatchDataSource.get_dataframe() methods.
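To show why looking at untranslated input helps with debugging, here is a small stand-alone sketch that does not use the Tecton SDK. preview_rows, parse_event, and the sample rows are all hypothetical stand-ins; only the apply_translator flag name is taken from the note above.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def preview_rows(
    raw_rows: List[Row],
    translator: Callable[[Row], Row],
    apply_translator: bool = True,
    limit: int = 5,
) -> List[Row]:
    """Return a small sample of rows, optionally skipping the translator."""
    sample = raw_rows[:limit]
    if not apply_translator:
        # Raw preview: return copies of the input rows exactly as received.
        return [dict(row) for row in sample]
    return [translator(row) for row in sample]


def parse_event(row: Row) -> Row:
    # Hypothetical translator that expects "amount" to be a numeric string.
    return {"user_id": row["user_id"], "amount": float(row["amount"])}


raw = [
    {"user_id": "u1", "amount": "12.50"},
    {"user_id": "u2", "amount": "n/a"},  # malformed value
]

# The translated preview would raise ValueError on the second row.
# The raw preview shows the input that causes it.
print(preview_rows(raw, parse_event, apply_translator=False))
```

In this toy case the raw sample makes the malformed amount value visible, which is the kind of translator or data source issue the raw preview is meant to surface.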