Skip to content

Concepts

Metaxy is built around a few core ideas that work together to solve the problem of incremental processing in multi-modal pipelines.

The Big Picture

graph LR
    A[Feature Definitions] --> B[Incremental Updates]
    B --> C[Metadata Store]

Feature definitions declare what data you have and how it depends on other data. Metaxy builds a feature graph from these definitions and uses it to track versions at the sample level. When upstream data changes, Metaxy identifies exactly which downstream samples need recomputation and resolves incremental updates. All of this is persisted in a metadata store. Feature definitions may optionally define custom metadata columns (such as file path, size, etc.) which are stored alongside the versioning information.

Core Concepts


Unified interface for storing and retrieving metadata across different backends.


Declarative specifications that define your data schema, (partial) dependencies, and how versions are calculated.


Sample-level version tracking that detects changes and determines what needs recomputation.


Automatic registration and graph building from feature definitions in your codebase.

Dependencies and Lineage


How features relate to upstream dependencies: one-to-one, one-to-many, or many-to-one.


Handle missing upstream data gracefully without blocking downstream processing.


Select subsets of samples for processing based on metadata conditions.

Advanced Topics


Propagate sample deletions through the feature graph correctly.


Reserved columns used internally by Metaxy for versioning and deduplication.


Patterns and utilities for testing Metaxy features.