About Metaxy¶
Metaxy is...
🧩 Composable¶
Bring your own... really everything. Metaxy is a universal glue for metadata. Use it with:
- Your database or storage format of choice to keep metadata where you want. DuckDB, ClickHouse and 20+ databases via Ibis (1), lakehouse storage formats such as DeltaLake or DuckLake, and other solutions such as LanceDB. All of this is available through a unified interface.
- Your favorite dataframe library: Polars, Pandas, or even run all Metaxy computations in the DB thanks to Narwhals
- Orchestrators: see the excellent Dagster integration
- Compute frameworks like Ray. We totally don't care how is data (2) produced or where is it stored.
- Version tracking methods. By default, Metaxy uses Merkle Trees to track changes in metadata. However, users can provide their own data versions and use content-based hashing techniques (3) if needed.
- while we don't (yet) ship native support for all these databases, the base
IbisMetadataStorecan be easily extended to handle additional databases - so not the tables but the actual stuff: images, videos, texts, etc.
- from naive
sha256to more sophisticated semantic hashing
🪨 Reliable¶
Metaxy is obsessively tested across all supported tabular compute engines. We guarantee to produce versioning hashes that are consistent across DBs and local compute engines. We really have tested this very well! (1)
- At the moment of writing our test suite contains more than 2000 tests executed against Linux, Windows and MacOS on all supported Python versions
Metadata is organized in append-only tables. Metaxy never attempts to modify historical metadata, (1) ensuring that data integrity is maintained and historical metadata can be easily retrieved and analyzed.
- But provides the ability to perform hard and soft deletions
📈 Scalable¶
Metaxy is built with performance in mind: all operations default to run in the DB, the storage layout is designed with the goal of supporting parallel writers and bulk insertions.
Feature definitions can be split across independent Python modules and packages and automatically loaded via packaging entry points. This enables collaboration across teams and projects.
We also have a Ray integration which simplifies working with Metaxy from distributed workflows.
🧑💻 Developer Friendly¶
Metaxy provides a clean, intuitive Python API with syntactic sugar that simplifies common feature definitions. The feature discovery system enables effortless feature dependency management.
The library includes comprehensive type hints (1), and utilizes Pydantic for feature definitions. There's first-class support for local development (2), testing, preview environments, and CI/CD workflows.
- with all the typing shenanigans you would expect from a project as serious as ours
- the reference local versioning engine is implemented in Polars and
polars-hash
The included CLI tool allows easy interaction, inspection and visualization of feature graphs, enriched with real metadata and stats. You can even drop your database in one command! (1)
- that's a joke, it can only be truncated from CLI
Hopefully this was impressive enough and has sparked some interest in Metaxy!
🚀 What's Next?¶
- Itching to write some Metaxy code? Continue to Quickstart (WIP!).
- Learn more about feature definitions or versioning
- View complete, end-to-end examples
- Explore Metaxy integrations
- Use Metaxy from the command line
- Learn how to configure Metaxy
- Get lost in our API Reference