Basic Example¶
Overview¶
This example demonstrates how Metaxy automatically detects changes in upstream features and triggers recomputation of downstream features. It shows the core value proposition of Metaxy: avoiding unnecessary recomputation while ensuring data consistency.
We will build a simple two-feature pipeline where a child feature depends on a parent feature. When the parent's algorithm changes (signalled by a new code_version), the child feature is automatically recomputed.
The Pipeline¶
Let's define a pipeline with two features:
---
title: Feature Graph
---
flowchart LR
%% Snapshot version: none
%%{init: {'flowchart': {'htmlLabels': true, 'curve': 'basis'}, 'themeVariables': {'fontSize': '14px'}}}%%
examples_parent["<div style="text-align:left"><b>examples/parent</b><br/>7de0f5e8<br/><font color="#999">---</font><br/>- embeddings (05e66510)</div>"]
examples_child["<div style="text-align:left"><b>examples/child</b><br/>b10ea448<br/><font color="#999">---</font><br/>- predictions (9cd1c608)</div>"]
examples_parent --> examples_child
Defining features: "examples/parent"¶
The parent feature represents raw embeddings computed from source data. It has a single field embeddings with a code_version that tracks the algorithm version.
import metaxy as mx


class ParentFeature(
    mx.BaseFeature,
    spec=mx.FeatureSpec(
        key="examples/parent",
        fields=[
            mx.FieldSpec(
                key="embeddings",
                code_version="1",
            ),
        ],
        id_columns=("sample_uid",),
    ),
):
    """Parent feature that generates embeddings from raw data."""

    pass
Defining features: "examples/child"¶
The child feature depends on the parent and produces predictions. The key configuration is deps=[ParentFeature], the FeatureDep declaration stating that "examples/child" depends on "examples/parent".
class ChildFeature(
    mx.BaseFeature,
    spec=mx.FeatureSpec(
        key="examples/child",
        deps=[ParentFeature],
        fields=["predictions"],
        id_columns=("sample_uid",),
    ),
):
    """Child feature that uses parent embeddings to generate predictions."""

    pass
The FeatureDep declaration tells Metaxy:
- "examples/child" depends on "examples/parent"
- When the parent's field provenance changes, the child must be recomputed
- This dependency is tracked automatically, enabling incremental recomputation
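To make the idea of tracked dependencies concrete, here is a minimal sketch of how the set of features affected by a change could be derived from a dependency map. It uses only the standard library; the DEPS dictionary and the downstream_of helper are hypothetical names for illustration and do not reflect Metaxy's internal representation or API.

```python
from collections import deque

# Hypothetical dependency map: feature key -> keys of the features it depends on.
# It mirrors the two-feature graph of this example.
DEPS = {
    "examples/parent": [],
    "examples/child": ["examples/parent"],
}


def downstream_of(changed: str, deps: dict[str, list[str]]) -> list[str]:
    """Return the features that transitively depend on `changed`, breadth-first."""
    # Invert the map: upstream key -> features that depend on it directly.
    dependents: dict[str, list[str]] = {}
    for feature, upstreams in deps.items():
        dependents.setdefault(feature, [])
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(feature)

    affected: list[str] = []
    seen: set[str] = set()
    queue = deque(dependents.get(changed, []))
    while queue:
        feature = queue.popleft()
        if feature in seen:
            continue
        seen.add(feature)
        affected.append(feature)
        queue.extend(dependents[feature])
    return affected


print(downstream_of("examples/parent", DEPS))  # ['examples/child']
print(downstream_of("examples/child", DEPS))   # []
```

A change to the parent therefore reaches the child, while a change to the child affects nothing else in this graph.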
Getting Started¶
Install the example's dependencies:
Walkthrough¶
Step 1: Initial Run¶
Run the pipeline to create parent embeddings and child predictions:
Graph project_version: 490f2c18
Written 3 rows for feature examples/parent
Pipeline
============================================================
[1/2] Computing parent feature...
[2/2] Computing child feature...
Graph project_version: 490f2c18
📊 Computing examples/child...
feature_version: b10ea448
Identified: 3 new samples, 0 samples with new provenance_by_field
✓ Materialized 3 new samples
📋 Child provenance_by_field:
sample_uid=1: {'predictions': '24503967'}
sample_uid=2: {'predictions': '24458329'}
sample_uid=3: {'predictions': '26963083'}
✅ Pipeline complete!
The pipeline materialized 3 samples for the child feature. Each sample has its provenance tracked.
Step 2: Verify Idempotency¶
Run the pipeline again without any changes:
Graph project_version: 490f2c18
Metadata already exists for feature examples/parent (feature_version: 7de0f5e8...)
Skipping write to avoid duplicates
Pipeline
============================================================
[1/2] Computing parent feature...
[2/2] Computing child feature...
Graph project_version: 490f2c18
📊 Computing examples/child...
feature_version: b10ea448
Identified: 0 new samples, 0 samples with new provenance_by_field
📋 Child provenance_by_field:
sample_uid=1: {'predictions': '24503967'}
sample_uid=2: {'predictions': '24458329'}
sample_uid=3: {'predictions': '26963083'}
No changes detected (idempotent)
✅ Pipeline complete!
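Key observation: No recomputation occurred. The parent write was skipped because metadata already exists for feature_version 7de0f5e8, and the child run identified 0 new samples and 0 samples with new provenance_by_field.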
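The "Identified: ..." line in the output can be read as the result of a comparison between the provenance a sample should have and the provenance already stored. The sketch below illustrates that comparison with plain dictionaries. The classify function and the Provenance alias are hypothetical, the stale value "00000000" is made up for the third case, and none of this is Metaxy's actual implementation.

```python
# Provenance maps: sample_uid -> {field name -> provenance hash}.
Provenance = dict[int, dict[str, str]]


def classify(expected: Provenance, stored: Provenance) -> tuple[list[int], list[int]]:
    """Return (new, changed) sample IDs by comparing expected and stored provenance."""
    new = sorted(uid for uid in expected if uid not in stored)
    changed = sorted(
        uid for uid in expected if uid in stored and expected[uid] != stored[uid]
    )
    return new, changed


# Values copied from the Step 1 output above.
expected = {
    1: {"predictions": "24503967"},
    2: {"predictions": "24458329"},
    3: {"predictions": "26963083"},
}

# First run: nothing is stored yet, so every sample is new.
print(classify(expected, stored={}))        # ([1, 2, 3], [])

# Second run: stored provenance matches, so there is nothing to do.
print(classify(expected, stored=expected))  # ([], [])

# Hypothetical stale record for sample 2: only that sample needs recomputation.
stale = {**expected, 2: {"predictions": "00000000"}}
print(classify(expected, stored=stale))     # ([], [2])
```

An empty result in both lists corresponds to the idempotent run shown in this step.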
Step 3: Update Parent Algorithm¶
Now let's simulate an algorithm improvement by changing the parent's code_version from "1" to "2":
---
title: Feature Graph Changes
---
flowchart TB
%% Snapshot version: none
%%{init: {'flowchart': {'htmlLabels': true, 'curve': 'basis'}, 'themeVariables': {'fontSize': '14px'}}}%%
examples_parent["<div style="text-align:left"><b>examples/parent</b><br/><font color="#FF0000">7de0f5e8</font> → <font color="#00FF00">68827f3e</font><br/><font color="#999">---</font><br/>- <font color="#FFAA00">embeddings</font> (<font color="#FF0000">05e66510</font> → <font color="#00FF00">3c8d3e9b</font>)</div>"]
examples_child["<div style="text-align:left"><b>examples/child</b><br/><font color="#FF0000">b10ea448</font> → <font color="#00FF00">e5b92b18</font><br/><font color="#999">---</font><br/>- <font color="#FFAA00">predictions</font> (<font color="#FF0000">9cd1c608</font> → <font color="#00FF00">7cef6acb</font>)</div>"]
examples_parent --> examples_child
style examples_child stroke:#FFAA00,stroke-width:2px
style examples_parent stroke:#FFAA00,stroke-width:2px
This change means that the existing embeddings and the downstream feature have to be recomputed.
Step 4: Observe Automatic Recomputation¶
Run the pipeline again after the algorithm change:
Graph project_version: c423d51a
Written 3 rows for feature examples/parent
Pipeline
============================================================
[1/2] Computing parent feature...
[2/2] Computing child feature...
Graph project_version: c423d51a
📊 Computing examples/child...
feature_version: e5b92b18
Identified: 3 new samples, 0 samples with new provenance_by_field
✓ Materialized 3 new samples
📋 Child provenance_by_field:
sample_uid=1: {'predictions': '24503967'}
sample_uid=2: {'predictions': '24458329'}
sample_uid=3: {'predictions': '26963083'}
✅ Pipeline complete!
Key observation: The child feature was automatically recomputed because:
- The parent's code_version changed from "1" to "2"
- This changed the parent's metaxy_feature_version
- The child's field dependency on embeddings detected the change
- All child samples were marked for recomputation
How It Works¶
Metaxy tracks provenance at the field level using:
- Field Version: A hash combining the field's code_version and provenances of upstream fields
- Feature Version: A hash combining the field versions of all fields in the feature
- Dependency Resolution: When resolving updates, Metaxy computes what the provenance would be and compares it to what's stored
This enables precise, incremental recomputation without re-processing unchanged data.
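The hashing scheme described above can be sketched with the standard library. This is an illustration under stated assumptions, not Metaxy's algorithm: the use of SHA-256, the 8-character truncation, the "|" separator, and the helper names short_hash, field_version, feature_version, and graph_versions are all hypothetical, and the child field's code_version of "1" is assumed. The resulting hashes will not match the ones shown in the graphs on this page.

```python
import hashlib


def short_hash(*parts: str) -> str:
    """Hash the parts with SHA-256 and keep the first 8 hex characters."""
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:8]


def field_version(key: str, code_version: str, upstream: list[str]) -> str:
    """Combine a field's code_version with the versions of its upstream fields."""
    return short_hash(key, code_version, *sorted(upstream))


def feature_version(field_versions: dict[str, str]) -> str:
    """Combine the versions of all fields in a feature."""
    return short_hash(*(f"{k}={v}" for k, v in sorted(field_versions.items())))


def graph_versions(parent_code_version: str) -> tuple[str, str]:
    """Return (parent, child) feature versions for the two-feature graph."""
    embeddings = field_version("embeddings", parent_code_version, [])
    # Assumed child code_version "1"; the child depends on the parent's field.
    predictions = field_version("predictions", "1", [embeddings])
    return (
        feature_version({"embeddings": embeddings}),
        feature_version({"predictions": predictions}),
    )


before = graph_versions("1")
after = graph_versions("2")
print("parent changed:", before[0] != after[0])
print("child changed:", before[1] != after[1])
```

Because the child's field version includes the parent's field version as an input, bumping only the parent's code_version changes both feature versions, which is the propagation seen in Step 3.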
Conclusion¶
Metaxy provides automatic change detection and incremental recomputation through:
- Feature dependency tracking via FeatureDep
- Algorithm versioning via FieldSpec.code_version
- Provenance-based change detection via MetadataStore.resolve_update
This mechanism keeps your pipelines efficient and ensures that the affected data stays up to date.
Related Materials¶
Learn more about: