The Metaxy Pitch
Here is why we think Metaxy is so cool. Metaxy is...
🧩 Composable
Bring your own... really everything. Metaxy is a universal glue for metadata. Use it with:
- Your database or storage format of choice to keep metadata where you want. DuckDB, ClickHouse and 20+ databases via Ibis (1), lakehouse storage formats such as DeltaLake or DuckLake, and other solutions such as LanceDB. All of this is available through a unified interface.
- Your favorite dataframe library: Polars or Pandas. Thanks to Narwhals, you can even run all Metaxy computations in the DB.
- Orchestrators: see the excellent Dagster integration
- Compute frameworks like Ray. We totally don't care how data (2) is produced or where it is stored.
- Version tracking methods. By default, Metaxy uses Merkle Trees to track changes in metadata. However, users can provide their own data versions and use content-based hashing techniques (3) if needed.
- (1) While we don't (yet) ship native support for all these databases, the base `IbisMetadataStore` can be easily extended to handle additional databases.
- (2) So not the tables but the actual stuff: images, videos, texts, etc.
- (3) From naive `sha256` to more sophisticated semantic hashing.
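To make the Merkle Tree idea concrete, here is a minimal sketch of how a change in one feature can propagate to everything downstream of it. It is a generic illustration of Merkle-style hashing, not Metaxy's actual implementation: the helper names (`combine_hashes`, `feature_version`) and the example feature graph are hypothetical.

```python
import hashlib


def combine_hashes(*parts: str) -> str:
    """Hash an ordered sequence of strings into one hex digest."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode()
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h.update(f"{len(data)}:".encode())
        h.update(data)
    return h.hexdigest()


def feature_version(name: str, code_versions: dict[str, str], deps: dict[str, list[str]]) -> str:
    """Version of a feature: hash of its own code version plus its upstream versions."""
    upstream = [
        feature_version(dep, code_versions, deps)
        for dep in sorted(deps.get(name, []))
    ]
    return combine_hashes(name, code_versions[name], *upstream)


# Hypothetical feature graph: video -> frames -> embeddings
deps = {"frames": ["video"], "embeddings": ["frames"]}
v1 = {"video": "1", "frames": "1", "embeddings": "1"}
v2 = {**v1, "frames": "2"}  # only the middle feature changes

changed = {
    name
    for name in v1
    if feature_version(name, v1, deps) != feature_version(name, v2, deps)
}
print(sorted(changed))  # ['embeddings', 'frames']
```

Bumping `frames` changes its own version and the version of `embeddings`, while `video` stays untouched. That is the property that lets a system recompute only what is actually stale.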
🪨 Reliable
Metaxy is obsessively tested across all supported tabular compute engines. We guarantee that versioning hashes are consistent across databases and local compute engines. We really have tested this very well! (1)
- (1) We have a rich test suite that runs on Linux, Windows and macOS against all supported Python versions.
Metadata is organized in append-only tables. Metaxy never attempts to modify historical metadata (1), which preserves data integrity and keeps historical metadata easy to retrieve.
- (1) It does, however, provide the ability to perform hard and soft deletions.
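The append-only idea can be sketched in a few lines of plain Python. This is an illustration under simple assumptions, not Metaxy's storage layout: the `Row` and `AppendOnlyTable` names are hypothetical, and soft deletion is modeled as appending a tombstone row rather than touching existing ones.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Row:
    seq: int               # monotonically increasing write order
    sample_id: str
    version: str
    deleted: bool = False  # tombstone marker used for soft deletion


class AppendOnlyTable:
    """Rows are only ever appended; existing rows are never modified."""

    def __init__(self) -> None:
        self._rows: list[Row] = []
        self._seq = count()

    def append(self, sample_id: str, version: str, deleted: bool = False) -> None:
        self._rows.append(Row(next(self._seq), sample_id, version, deleted))

    def soft_delete(self, sample_id: str) -> None:
        # A soft deletion is just another appended row marking the sample as deleted.
        current = self.latest().get(sample_id)
        if current is not None:
            self.append(sample_id, current.version, deleted=True)

    def history(self, sample_id: str) -> list[Row]:
        return [r for r in self._rows if r.sample_id == sample_id]

    def latest(self) -> dict[str, Row]:
        # Later rows win; samples whose newest row is a tombstone are hidden.
        newest: dict[str, Row] = {}
        for row in self._rows:
            newest[row.sample_id] = row
        return {k: r for k, r in newest.items() if not r.deleted}


table = AppendOnlyTable()
table.append("img_1", "v1")
table.append("img_1", "v2")
table.append("img_2", "v1")
table.soft_delete("img_2")
```

After these writes, `table.latest()` contains only `img_1` at `v2`, while `table.history("img_1")` and `table.history("img_2")` still return every row ever written for those samples.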
📈 Scalable
Metaxy is built with performance in mind: all operations run in the DB by default, and the storage layout is designed to support parallel writers and bulk insertions.
Feature definitions can be split across independent Python modules and packages and automatically loaded via packaging entry points. This enables collaboration across teams and projects.
We also have a Ray integration which simplifies working with Metaxy from distributed workflows.
🧑‍💻 Developer Friendly
Metaxy provides a clean, intuitive Python API with syntactic sugar that simplifies common feature definitions. The feature discovery system enables effortless feature dependency management.
The library includes comprehensive type hints (1), and uses Pydantic for feature definitions. There's first-class support for local development (2), testing, preview environments, and CI/CD workflows.
- (1) With all the typing shenanigans you would expect from a project as serious as ours.
- (2) The reference local versioning engine is implemented in Polars and `polars-hash`.
The included CLI tool allows easy interaction, inspection and visualization of feature graphs, enriched with real metadata and stats. You can even drop your database in one command! (1)
- (1) That's almost a joke.
Hopefully this was impressive enough and has sparked some interest in Metaxy!
What's Next?
- Learn about Metaxy design choices
- Itching to write some Metaxy code? Jump to Quickstart.
- Learn more about Metaxy concepts
- View complete, end-to-end examples
- Explore Metaxy integrations
- Invoke the `mx` CLI from your terminal
- Learn how to configure Metaxy
- Get lost in our API Reference