Concepts

Metaxy is built around a few core ideas that work together to solve the problem of incremental processing in multi-modal pipelines.

The Big Picture

graph LR
    A[Feature Definitions] --> B[Incremental Updates]
    B --> C[Metadata Store]

Feature definitions declare what data you have and how it depends on other data. Metaxy builds a feature graph from these definitions and uses it to track versions at the sample level. When upstream data changes, Metaxy identifies exactly which downstream samples need recomputation, so an incremental update touches only those samples. The versioning information is persisted in a metadata store. Feature definitions may also define custom metadata columns (such as file path or size), which are stored alongside the versioning information.
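To make the idea of sample-level incremental updates concrete, here is a minimal sketch in plain Python. It does not use Metaxy's API: `version_of`, `resolve_increment`, the dictionary standing in for a metadata store, and the sample ids are all hypothetical names chosen for illustration. The assumption is that a sample's version can be modeled as a hash of the feature's code version combined with the version of its upstream data.

```python
import hashlib


def version_of(*parts: str) -> str:
    """Hash the given strings into a short, deterministic version id."""
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:12]


def resolve_increment(upstream, stored, code_version):
    """Compare expected sample versions against previously stored ones.

    upstream: sample id -> current version of the upstream data
    stored:   sample id -> version recorded by an earlier run
              (a plain dict standing in for a metadata store)
    Returns (added, changed, removed, expected).
    """
    # Version each sample would have if it were computed right now.
    expected = {sid: version_of(code_version, v) for sid, v in upstream.items()}

    added = sorted(sid for sid in expected if sid not in stored)
    changed = sorted(
        sid for sid in expected if sid in stored and stored[sid] != expected[sid]
    )
    removed = sorted(sid for sid in stored if sid not in expected)
    return added, changed, removed, expected


# Versions recorded by a previous run (hypothetical sample ids).
stored = {
    "clip_1": version_of("v1", "aaa"),
    "clip_2": version_of("v1", "bbb"),
    "clip_3": version_of("v1", "ccc"),
}

# Current upstream state: clip_2 changed, clip_3 disappeared, clip_4 is new.
upstream = {"clip_1": "aaa", "clip_2": "bbb2", "clip_4": "ddd"}

added, changed, removed, expected = resolve_increment(upstream, stored, "v1")
print(added, changed, removed)  # ['clip_4'] ['clip_2'] ['clip_3']
```

Only `clip_2` and `clip_4` would need computation in this sketch, while `clip_1` is skipped because its expected version matches the stored one. Changing `code_version` would mark every existing sample as changed, which is one simple way to model a feature whose logic has been updated.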

Core Concepts

Metadata Stores

Unified interface for storing and retrieving metadata across different backends.

Feature Definitions

Declarative specifications that define your data schema, its dependencies (which may be partial), and how versions are calculated.

Versioning

Sample-level version tracking that detects changes and determines what needs recomputation.

Feature Discovery

Automatic registration and graph building from feature definitions in your codebase.

Dependencies and Lineage

Lineage Relationship

How features relate to upstream dependencies: one-to-one, one-to-many, or many-to-one.

Optional Dependencies

Handle missing upstream data gracefully without blocking downstream processing.

Filters

Select subsets of samples for processing based on metadata conditions.

Advanced Topics

Deletions

Propagate sample deletions through the feature graph correctly.

System Columns

Reserved columns used internally by Metaxy for versioning and deduplication.

Testing

Patterns and utilities for testing Metaxy features.