Metadata Stores¶
Metaxy abstracts interactions with metadata stored in external systems, such as databases, files, or object stores, behind a unified interface: MetadataStore. MetadataStore implementations are built to satisfy Metaxy's storage design choices.
All metadata store operations may reference features using any of the supported syntactic sugar alternatives. In practice, it is usually most convenient to use either feature classes or stringified feature keys.
Metadata stores accept Narwhals-compatible dataframes and return Narwhals dataframes. In practice, Metaxy has been tested with Pandas, Polars, and Ibis dataframes.
Instantiation¶
There are generally two ways to create a MetadataStore. We are going to demonstrate both with DeltaLake as an example.
- Using the Python API directly:
- Via Metaxy configuration:
  First, create a metaxy.toml file:
  [stores.dev]
  type = "metaxy.ext.metadata_stores.delta.DeltaMetadataStore"
  root_path = "/path/to/directory"
  Now the metadata store can be constructed from a MetaxyConfig instance.
Now the store is ready to be used. We'll also assume that a MyFeature feature class (1) has been prepared.
- with the "my/feature" key
Writes¶
To save metadata to a metadata store, use the write method:
Subsequent writes logically overwrite the previous metadata, although physically they append to the same table.
Flushing Metadata In The Background
It is usually desirable to write metadata to the metadata store as soon as it becomes available.
This ensures that the pipeline can resume processing after a failure and that no data is lost.
BufferedMetadataWriter can be used to achieve this: it writes metadata in real time from a background thread.
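The general pattern behind background flushing can be sketched with the standard library alone. The class below, ToyBufferedWriter, is a hypothetical illustration and not the BufferedMetadataWriter API: the producer hands records to a queue, and a background thread passes them to a sink callable in small batches (the sink stands in for a store write).

```python
import queue
import threading


class ToyBufferedWriter:
    """Hypothetical sketch: flush records from a background thread in batches."""

    _STOP = object()  # sentinel that tells the worker to finish

    def __init__(self, sink, batch_size=2):
        self._sink = sink  # callable that receives a list of records
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def __enter__(self):
        self._thread.start()
        return self

    def put(self, record):
        # Returns immediately; the actual write happens in the worker thread.
        self._queue.put(record)

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            batch.append(item)
            if len(batch) >= self._batch_size:
                self._sink(batch)
                batch = []
        if batch:
            # Flush whatever is left when the writer is closed.
            self._sink(batch)

    def __exit__(self, exc_type, exc, tb):
        self._queue.put(self._STOP)
        self._thread.join()
        return False


flushed = []
with ToyBufferedWriter(flushed.append, batch_size=2) as writer:
    for i in range(5):
        writer.put({"sample_uid": i})

print([len(batch) for batch in flushed])  # [2, 2, 1]
```

Because records are flushed as they arrive rather than at the end of the run, a crash loses at most the records that were still waiting in the current batch.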
Reads¶
Metadata can be retrieved using the read method:
Example
By default, Metaxy drops historical records with the same feature version, which makes the write-read sequence idempotent for an outside observer.
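The combination of append-only writes and deduplicating reads can be modelled with a toy in-memory table. This is a sketch under stated assumptions and not Metaxy's implementation: ToyAppendOnlyTable, the sample_uid, feature_version, and created_at fields, and the integer counter used in place of timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAppendOnlyTable:
    """Hypothetical model: writes append rows, reads keep the latest row per sample."""

    rows: list = field(default_factory=list)
    _clock: int = 0  # monotonically increasing stand-in for a creation timestamp

    def write(self, records, feature_version):
        for record in records:
            self._clock += 1
            self.rows.append(
                {**record, "feature_version": feature_version, "created_at": self._clock}
            )

    def read(self, feature_version):
        latest = {}
        for row in self.rows:
            if row["feature_version"] != feature_version:
                continue
            current = latest.get(row["sample_uid"])
            # Keep only the most recently written row for each sample.
            if current is None or row["created_at"] > current["created_at"]:
                latest[row["sample_uid"]] = row
        return sorted(latest.values(), key=lambda r: r["sample_uid"])


table = ToyAppendOnlyTable()
table.write([{"sample_uid": "a", "path": "v1.mp4"}], feature_version="v1")
table.write([{"sample_uid": "a", "path": "v1-fixed.mp4"}], feature_version="v1")

print(len(table.rows))                          # 2
print([r["path"] for r in table.read("v1")])    # ['v1-fixed.mp4']
```

Both rows remain in the underlying table, but a reader only ever sees the most recent one, which is what makes repeated writes look like overwrites from the outside.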
Increment Resolution¶
Increments can be computed using the resolve_update method:
The returned Increment (or LazyIncrement) holds fresh samples that haven't been processed yet, stale samples that need to be processed again, and orphaned samples that are no longer present in upstream features and may be deleted.
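The three categories can be illustrated with a small, self-contained sketch. It assumes that each sample carries a single provenance token (for example, a hash of its upstream inputs) and compares the current upstream tokens with the tokens recorded at the last processing run. ToyIncrement and resolve_toy_increment are hypothetical names, and the comparison below is a simplification rather than the logic of resolve_update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIncrement:
    """Hypothetical container mirroring the three categories described above."""

    fresh: list
    stale: list
    orphaned: list


def resolve_toy_increment(upstream, stored):
    """Classify samples.

    upstream: sample id -> provenance token computed from current upstream data
    stored:   sample id -> provenance token recorded when the sample was last processed
    """
    fresh = sorted(uid for uid in upstream if uid not in stored)
    stale = sorted(
        uid for uid in upstream if uid in stored and stored[uid] != upstream[uid]
    )
    orphaned = sorted(uid for uid in stored if uid not in upstream)
    return ToyIncrement(fresh=fresh, stale=stale, orphaned=orphaned)


upstream = {"a": "h1", "b": "h2", "c": "h3"}
stored = {"b": "h2", "c": "old", "d": "h4"}

increment = resolve_toy_increment(upstream, stored)
print(increment.fresh)     # ['a']
print(increment.stale)     # ['c']
print(increment.orphaned)  # ['d']
```

Sample "b" appears in no category because its token is unchanged, so it does not need to be processed again.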
Tip
Root features (1) also require the samples argument to be set, since Metaxy cannot load upstream metadata for them automatically.
- features that do not have upstream features
It is up to the caller to decide how to handle the processing and potential deletion of orphaned samples.
Once processing is complete, the caller is expected to call MetadataStore.write to record metadata about the processed samples.
Where are increments computed?
Learn more here.
How are increments computed?
Learn more here.
Deletes¶
Metadata stores support deletions, which are not required during normal Metaxy operations (1).
- deletions might be necessary when working with expansion lineage relationships.
Here is an example of what a deletion looks like:
from datetime import datetime, timedelta, timezone

import narwhals as nw

# Delete MyFeature metadata records created more than 30 days ago
with store.open("w"):
    store.delete(
        MyFeature,
        filters=[nw.col("metaxy_created_at") < datetime.now(timezone.utc) - timedelta(days=30)],
    )
Learn more about deletions here.
Fallback Stores¶
Metaxy metadata stores can be configured to pull missing metadata from another store. This is very useful for local and testing workflows, because it avoids materializing the entire data pipeline locally. Instead, Metaxy stores can automatically pull missing metadata from production.
Example Metaxy configuration:
[stores.dev]
type = "metaxy.ext.metadata_stores.delta.DeltaMetadataStore"
root_path = "${HOME}/.metaxy/dev"
fallback_stores = ["prod"]
[stores.prod]
type = "metaxy.ext.metadata_stores.delta.DeltaMetadataStore"
root_path = "s3://my-prod-bucket/metaxy"
Warning
Currently, the "missing metadata" detection works by checking whether the feature table exists in the store. This works in conjunction with automatic table creation, but it does not work if empty tables are pre-created, for example by migration tooling or CI/CD workflows. This will be improved in the future.
Metaxy doesn't mix metadata from different stores: either the entire feature is going to be pulled from the fallback store, or the primary store will be used.
Fallback stores can be chained at arbitrary depth.
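The lookup rules described in this section (table-existence check, no mixing between stores, arbitrary chain depth) can be sketched with a toy model. ToyStore and the intermediate "staging" store are hypothetical and exist only for illustration; this is not how Metaxy's stores are implemented.

```python
class ToyStore:
    """Hypothetical model of a store with an ordered list of fallback stores."""

    def __init__(self, name, tables=None, fallback_stores=()):
        self.name = name
        self.tables = dict(tables or {})  # feature key -> list of rows
        self.fallback_stores = list(fallback_stores)

    def read(self, feature_key):
        """Return (store_name, rows) from the first store in the chain that has the table."""
        # Only table existence is checked, so an empty table still "wins".
        if feature_key in self.tables:
            return self.name, self.tables[feature_key]
        for fallback in self.fallback_stores:
            try:
                # Fallbacks may have fallbacks of their own.
                return fallback.read(feature_key)
            except KeyError:
                continue
        raise KeyError(feature_key)


prod = ToyStore("prod", {"my/feature": [{"sample_uid": "a"}]})
staging = ToyStore("staging", fallback_stores=[prod])
dev = ToyStore("dev", fallback_stores=[staging])

print(dev.read("my/feature"))  # ('prod', [{'sample_uid': 'a'}])

# A pre-created empty table in the primary store shadows the fallback entirely.
dev.tables["my/feature"] = []
print(dev.read("my/feature"))  # ('dev', [])
```

The second read shows the caveat from the warning above: once an empty table exists in the primary store, the fallback is never consulted for that feature.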
Metadata Store Implementations¶
Metaxy provides ready-to-use MetadataStore implementations for popular databases and storage systems.