Metaxy + Delta Lake
Delta Lake is an open-source lakehouse storage format with ACID transactions and schema enforcement. To use Metaxy with Delta Lake, configure DeltaMetadataStore. It persists metadata as Delta tables, uses an in-memory Polars engine for versioning computations, and supports both the local filesystem and remote object stores such as S3.
Tip
If Polars 1.37 or later is installed, lazy Polars frames are written with
LazyFrame.sink_delta, which avoids unnecessary materialization.
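The version gate described in the tip can be sketched as follows. This is not Metaxy's code: `supports_sink_delta` and `write_lazy_frame` are hypothetical helpers, the check compares only the major and minor version numbers, and the sketch assumes `LazyFrame.sink_delta` accepts a `mode` argument the way `DataFrame.write_delta` does.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_sink_delta(polars_version: str) -> bool:
    """Return True if the given Polars version is 1.37 or later."""
    # Compare only major.minor; anything after the second dot is ignored.
    major, minor = (int(part) for part in polars_version.split(".")[:2])
    return (major, minor) >= (1, 37)


def write_lazy_frame(lf, target: str) -> None:
    """Append a polars.LazyFrame to a Delta table, streaming when possible."""
    try:
        installed = version("polars")
    except PackageNotFoundError as exc:
        raise RuntimeError("polars is not installed") from exc

    if supports_sink_delta(installed):
        # Stream the lazy query straight into the Delta table.
        lf.sink_delta(target, mode="append")
    else:
        # Older Polars: materialize the whole frame in memory first.
        lf.collect().write_delta(target, mode="append")
```

The fallback branch is exactly the materialization the tip says newer Polars avoids.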
Installation
API Reference
metaxy.ext.metadata_stores.delta
Delta Lake metadata store implemented with delta-rs.
metaxy.ext.metadata_stores.delta.DeltaMetadataStore
```py
DeltaMetadataStore(
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
)
```
Bases: MetadataStore
Delta Lake metadata store backed by delta-rs.
It stores feature metadata in Delta Lake tables located under root_path.
It uses the Polars versioning engine for provenance calculations.
Example:
```py
from metaxy.ext.metadata_stores.delta import DeltaMetadataStore
store = DeltaMetadataStore(
root_path="s3://my-bucket/metaxy",
storage_options={"AWS_REGION": "us-west-2"},
)
```
Parameters:

- `root_path` (`str | Path`): Base directory or URI where feature tables are stored. Supports local paths (`/path/to/dir`), `s3://` URLs, and other object store URIs.
- `storage_options` (`dict[str, Any] | None`, default: `None`): Storage backend options passed to delta-rs, for example `{"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}`. See https://delta-io.github.io/delta-rs/ for details on supported options.
- `fallback_stores` (`list[MetadataStore] | None`, default: `None`): Ordered list of read-only fallback stores.
- `layout` (`Literal['flat', 'nested']`, default: `'nested'`): Directory layout for feature tables. Options:
    - `"nested"`: feature tables are stored in nested directories, `{part1}/{part2}.delta`.
    - `"flat"`: feature tables are stored as `{part1}__{part2}.delta`.
- `delta_write_options` (`dict[str, Any] | None`, default: `None`): Additional options passed to `deltalake.write_deltalake`. Overrides the default `{"schema_mode": "merge"}`. Example: `{"max_workers": 4}`.
- `**kwargs` (`Any`, default: `{}`): Forwarded to `metaxy.metadata_store.base.MetadataStore`.
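The two layouts differ only in how a feature key's parts are joined into a table path. A minimal sketch, assuming a feature key is a sequence of string parts and that keys with more than two parts follow the same pattern; `feature_table_path` is a hypothetical helper, not part of Metaxy's API.

```python
from collections.abc import Sequence
from typing import Literal


def feature_table_path(
    root_uri: str,
    key_parts: Sequence[str],
    layout: Literal["flat", "nested"] = "nested",
) -> str:
    """Hypothetical helper: map a feature key to its Delta table location."""
    if layout not in ("flat", "nested"):
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    if not key_parts:
        raise ValueError("feature key must have at least one part")
    if layout == "nested":
        # Parts become nested path segments, e.g. video/frames.delta
        relative = "/".join(key_parts) + ".delta"
    else:
        # Every table is a direct child of the root, e.g. video__frames.delta
        relative = "__".join(key_parts) + ".delta"
    return f"{root_uri.rstrip('/')}/{relative}"
```

With `root_uri="s3://my-bucket/metaxy"` and the key `["video", "frames"]`, the nested layout yields `s3://my-bucket/metaxy/video/frames.delta` and the flat layout yields `s3://my-bucket/metaxy/video__frames.delta`.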
Source code in src/metaxy/ext/metadata_stores/delta.py
```py
def __init__(
    self,
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
) -> None:
    """
    Initialize Delta Lake metadata store.

    Args:
        root_path: Base directory or URI where feature tables are stored.
            Supports local paths (`/path/to/dir`), `s3://` URLs, and other object store URIs.
        storage_options: Storage backend options passed to delta-rs.
            Example: `{"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}`
            See https://delta-io.github.io/delta-rs/ for details on supported options.
        fallback_stores: Ordered list of read-only fallback stores.
        layout: Directory layout for feature tables. Options:

            - `"nested"`: Feature tables stored in nested directories `{part1}/{part2}.delta`
            - `"flat"`: Feature tables stored as `{part1}__{part2}.delta`

        delta_write_options: Additional options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].
            Overrides default {"schema_mode": "merge"}. Example: {"max_workers": 4}
        **kwargs: Forwarded to [metaxy.metadata_store.base.MetadataStore][metaxy.metadata_store.base.MetadataStore].
    """
    self.storage_options = storage_options or {}

    if layout not in ("flat", "nested"):
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    self.layout = layout
    self.delta_write_options = delta_write_options or {}

    root_str = str(root_path)
    self._is_remote = not is_local_path(root_str)

    if self._is_remote:
        # Remote path (S3, Azure, GCS, etc.)
        self._root_uri = root_str.rstrip("/")
    else:
        # Local path (including file:// and local:// URLs)
        if root_str.startswith("file://"):
            # Strip file:// prefix
            root_str = root_str[7:]
        elif root_str.startswith("local://"):
            # Strip local:// prefix
            root_str = root_str[8:]
        local_path = Path(root_str).expanduser().resolve()
        self._root_uri = str(local_path)

    super().__init__(
        fallback_stores=fallback_stores,
        versioning_engine="polars",
        **kwargs,
    )
```
Configuration
Configuration for DeltaMetadataStore.
JSON schema (its `description` field includes an example `metaxy.toml` entry):

```json
{
  "$defs": {
    "HashAlgorithm": {
      "description": "Supported hash algorithms for field provenance calculation.\n\nThese algorithms are chosen for:\n- Speed (non-cryptographic hashes preferred)\n- Cross-database availability\n- Good collision resistance for field provenance calculation",
      "enum": [
        "xxhash64",
        "xxhash32",
        "wyhash",
        "sha256",
        "md5",
        "farmhash"
      ],
      "title": "HashAlgorithm",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "description": "Configuration for DeltaMetadataStore.\n\nExample:\n ```toml title=\"metaxy.toml\"\n [stores.dev]\n type = \"metaxy.ext.metadata_stores.delta.DeltaMetadataStore\"\n\n [stores.dev.config]\n root_path = \"s3://my-bucket/metaxy\"\n layout = \"nested\"\n\n [stores.dev.config.storage_options]\n AWS_REGION = \"us-west-2\"\n ```",
  "properties": {
    "fallback_stores": {
      "description": "List of fallback store names to search when features are not found in the current store.",
      "items": {
        "type": "string"
      },
      "title": "Fallback Stores",
      "type": "array"
    },
    "hash_algorithm": {
      "anyOf": [
        {
          "$ref": "#/$defs/HashAlgorithm"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash algorithm for versioning. If None, uses store's default."
    },
    "versioning_engine": {
      "default": "auto",
      "description": "Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.",
      "enum": [
        "auto",
        "native",
        "polars"
      ],
      "title": "Versioning Engine",
      "type": "string"
    },
    "root_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "format": "path",
          "type": "string"
        }
      ],
      "description": "Base directory or URI where feature tables are stored.",
      "title": "Root Path"
    },
    "storage_options": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Storage backend options passed to delta-rs.",
      "title": "Storage Options"
    },
    "layout": {
      "default": "nested",
      "description": "Directory layout for feature tables ('nested' or 'flat').",
      "enum": [
        "flat",
        "nested"
      ],
      "title": "Layout",
      "type": "string"
    },
    "delta_write_options": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].",
      "title": "Delta Write Options"
    }
  },
  "required": [
    "root_path"
  ],
  "title": "DeltaMetadataStoreConfig",
  "type": "object"
}
```
Config:

- frozen: True
- extra: forbid
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.fallback_stores
pydantic-field
List of fallback store names to search when features are not found in the current store.
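The fallback behavior amounts to an ordered search. In this hypothetical sketch, stores are plain dictionaries keyed by feature name and the first store containing the feature wins; Metaxy's real lookup operates on `MetadataStore` objects and is not shown here. The sketch never writes to the fallbacks, matching their read-only role.

```python
from collections.abc import Mapping, Sequence


def resolve_feature(
    feature_key: str,
    current: Mapping[str, object],
    fallbacks: Sequence[Mapping[str, object]],
) -> object:
    """Hypothetical lookup: the current store first, then fallbacks in list order."""
    for store in (current, *fallbacks):
        if feature_key in store:
            # The first store that has the feature wins; later stores are skipped.
            return store[feature_key]
    raise KeyError(f"feature {feature_key!r} not found in any store")
```

Because order matters, a store listed earlier in `fallback_stores` shadows any later store that holds the same feature.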
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.hash_algorithm
pydantic-field
hash_algorithm: HashAlgorithm | None = None
Hash algorithm for versioning. If None, uses store's default.
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.versioning_engine
pydantic-field
versioning_engine: Literal["auto", "native", "polars"] = (
"auto"
)
Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.root_path
pydantic-field
Base directory or URI where feature tables are stored.
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.storage_options
pydantic-field
Storage backend options passed to delta-rs.
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.layout
pydantic-field
layout: Literal['flat', 'nested'] = 'nested'
Directory layout for feature tables ('nested' or 'flat').
metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.delta_write_options
pydantic-field
Options passed to deltalake.write_deltalake.