Delta Lake¶
Delta Lake is an open-source lakehouse storage format with ACID transactions and schema enforcement. To use Metaxy with Delta Lake, configure DeltaMetadataStore. It persists metadata as Delta tables and uses an in-memory Polars engine for versioning computations.
It supports the local filesystem and remote object stores.
Tip
If Polars 1.37 or later is installed, lazy Polars frames are written with
LazyFrame.sink_delta, which avoids unnecessary materialization.
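The version gate in the tip can be pictured with a small check. This is only a sketch: `parse_release`, `supports_sink_delta`, and `installed_polars_supports_sink_delta` are hypothetical helper names, not part of Metaxy or Polars, and the store may detect the capability differently.

```python
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.37.0' -> (1, 37, 0)."""
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # stop at a suffix such as "0rc1"
    return tuple(parts)


def supports_sink_delta(polars_version: str) -> bool:
    """Return True when the version is 1.37 or later (threshold taken from the tip)."""
    return parse_release(polars_version)[:2] >= (1, 37)


def installed_polars_supports_sink_delta() -> bool:
    """Check the installed Polars distribution; False when Polars is absent."""
    try:
        return supports_sink_delta(metadata.version("polars"))
    except metadata.PackageNotFoundError:
        return False
```

Comparing parsed tuples rather than raw strings matters here, because a plain string comparison would rank "1.4" above "1.37".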
Installation¶
Using Object Stores¶
Point root_path at any supported URI (s3://, abfss://, gs://, ...) and forward credentials with storage_options.
The dict is passed verbatim to deltalake.
Learn more in the API docs.
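One way to assemble storage_options is to read credentials from the environment instead of hard-coding them. The sketch below assumes an S3 backend; `s3_storage_options_from_env` is a hypothetical helper, and the key names are the AWS-style keys shown elsewhere on this page plus the commonly used `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN`.

```python
import os
from typing import Mapping, Optional


def s3_storage_options_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect the AWS settings that are present and non-empty (hypothetical helper)."""
    source = os.environ if env is None else env
    keys = (
        "AWS_REGION",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
    )
    # Skip unset or empty values so the backend can fall back to its own defaults.
    return {key: source[key] for key in keys if source.get(key)}


# Intended use (requires metaxy to be installed):
# from metaxy.ext.metadata_stores.delta import DeltaMetadataStore
# store = DeltaMetadataStore(
#     root_path="s3://my-bucket/metaxy",
#     storage_options=s3_storage_options_from_env(),
# )
```

Because the dict is passed through unchanged, any key it contains has to be one the underlying deltalake backend understands.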
Storage Layout¶
The layout parameter controls how feature keys map to Delta Lake table locations:
- nested (default) places every feature in its own directory: your/feature/key.delta
- flat stores all of them in the same directory: your__feature__key.delta
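The mapping from a feature key to a table location can be sketched as follows. `table_location` is a hypothetical helper written for illustration, following the `{part1}/{part2}.delta` and `{part1}__{part2}.delta` patterns from the API reference; it is not the store's actual implementation.

```python
from typing import Literal, Sequence


def table_location(
    root: str,
    key_parts: Sequence[str],
    layout: Literal["flat", "nested"] = "nested",
) -> str:
    """Build the table location for a feature key (illustrative only)."""
    if not key_parts:
        raise ValueError("A feature key needs at least one part.")
    if layout == "nested":
        # One directory level per key part.
        relative = "/".join(key_parts)
    elif layout == "flat":
        # All tables side by side, key parts joined with a double underscore.
        relative = "__".join(key_parts)
    else:
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    return f"{root.rstrip('/')}/{relative}.delta"
```

For the key parts `["your", "feature", "key"]` under `s3://my-bucket/metaxy`, this yields `s3://my-bucket/metaxy/your/feature/key.delta` with the nested layout and `s3://my-bucket/metaxy/your__feature__key.delta` with the flat one.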
metaxy.ext.metadata_stores.delta¶
Delta Lake metadata store implemented with delta-rs.
metaxy.ext.metadata_stores.delta.DeltaMetadataStore¶
DeltaMetadataStore(
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
)
Bases: MetadataStore
Delta Lake metadata store backed by delta-rs.
It stores feature metadata in Delta Lake tables located under root_path.
It uses the Polars versioning engine for provenance calculations.
Tip
If Polars 1.37 or later is installed, lazy Polars frames are written with
LazyFrame.sink_delta, which avoids unnecessary materialization.
Example:
```py
from metaxy.ext.metadata_stores.delta import DeltaMetadataStore

store = DeltaMetadataStore(
    root_path="s3://my-bucket/metaxy",
    storage_options={"AWS_REGION": "us-west-2"},
)
```
Parameters:
- root_path (str | Path) – Base directory or URI where feature tables are stored. Supports local paths (/path/to/dir), s3:// URLs, and other object store URIs.
- storage_options (dict[str, Any] | None, default: None) – Storage backend options passed to delta-rs. Example: {"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}. See https://delta-io.github.io/delta-rs/ for details on supported options.
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores.
- layout (Literal['flat', 'nested'], default: 'nested') – Directory layout for feature tables. Options:
  - "nested": Feature tables stored in nested directories {part1}/{part2}.delta
  - "flat": Feature tables stored as {part1}__{part2}.delta
- delta_write_options (dict[str, Any] | None, default: None) – Additional options passed to deltalake.write_deltalake. Overrides the default {"schema_mode": "merge"}. Example: {"max_workers": 4}
- **kwargs (Any, default: {}) – Forwarded to metaxy.metadata_store.base.MetadataStore.
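The interaction between delta_write_options and the default {"schema_mode": "merge"} might look like the sketch below. It assumes a key-by-key merge in which user-supplied values win. The page does not state whether the store merges per key or replaces the defaults wholesale, and `DEFAULT_WRITE_OPTIONS` and `effective_write_options` are hypothetical names.

```python
from typing import Any, Optional

# Default taken from the parameter description above.
DEFAULT_WRITE_OPTIONS: dict[str, Any] = {"schema_mode": "merge"}


def effective_write_options(
    delta_write_options: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Combine defaults with user options; user keys take precedence (assumed behavior)."""
    return {**DEFAULT_WRITE_OPTIONS, **(delta_write_options or {})}
```

Under this assumption, passing `{"max_workers": 4}` keeps `schema_mode` at `"merge"`, while passing an explicit `schema_mode` replaces the default value.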
Source code in src/metaxy/ext/metadata_stores/delta.py
def __init__(
    self,
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
) -> None:
    """
    Initialize Delta Lake metadata store.

    Args:
        root_path: Base directory or URI where feature tables are stored.
            Supports local paths (`/path/to/dir`), `s3://` URLs, and other object store URIs.
        storage_options: Storage backend options passed to delta-rs.
            Example: `{"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}`
            See https://delta-io.github.io/delta-rs/ for details on supported options.
        fallback_stores: Ordered list of read-only fallback stores.
        layout: Directory layout for feature tables. Options:
            - `"nested"`: Feature tables stored in nested directories `{part1}/{part2}.delta`
            - `"flat"`: Feature tables stored as `{part1}__{part2}.delta`
        delta_write_options: Additional options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].
            Overrides default {"schema_mode": "merge"}. Example: {"max_workers": 4}
        **kwargs: Forwarded to [metaxy.metadata_store.base.MetadataStore][metaxy.metadata_store.base.MetadataStore].
    """
    self.storage_options = storage_options or {}
    if layout not in ("flat", "nested"):
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    self.layout = layout
    self.delta_write_options = delta_write_options or {}

    root_str = str(root_path)
    self._is_remote = not is_local_path(root_str)
    if self._is_remote:
        # Remote path (S3, Azure, GCS, etc.)
        self._root_uri = root_str.rstrip("/")
    else:
        # Local path (including file:// and local:// URLs)
        if root_str.startswith("file://"):
            # Strip file:// prefix
            root_str = root_str[7:]
        elif root_str.startswith("local://"):
            # Strip local:// prefix
            root_str = root_str[8:]
        local_path = Path(root_str).expanduser().resolve()
        self._root_uri = str(local_path)

    super().__init__(
        fallback_stores=fallback_stores,
        versioning_engine="polars",
        **kwargs,
    )
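The root-path handling in the constructor can be tried in isolation with a standalone sketch. `normalize_root` and `REMOTE_SCHEMES` are hypothetical: the real code relies on `is_local_path`, whose implementation is not shown here, so the scheme check below is only a stand-in for it.

```python
from pathlib import Path
from typing import Union

# Stand-in for is_local_path: treat these URI schemes as remote (assumption).
REMOTE_SCHEMES = ("s3://", "abfss://", "gs://")


def normalize_root(root_path: Union[str, Path]) -> tuple[str, bool]:
    """Return (root_uri, is_remote), mirroring the constructor's branches."""
    root_str = str(root_path)
    if root_str.startswith(REMOTE_SCHEMES):
        # Remote URIs are kept as-is, minus any trailing slash.
        return root_str.rstrip("/"), True
    # Local paths: drop a file:// or local:// prefix, then resolve to an absolute path.
    for prefix in ("file://", "local://"):
        if root_str.startswith(prefix):
            root_str = root_str[len(prefix):]
            break
    return str(Path(root_str).expanduser().resolve()), False
```

Resolving local paths up front means later table locations do not depend on the process's working directory at the time of each read or write.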
Configuration¶
fallback_stores¶
List of fallback store names to search when features are not found in the current store.
Type: list[str]
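The lookup order implied by fallback_stores can be illustrated with plain dictionaries standing in for stores. This is a conceptual sketch only: `find_feature` is a hypothetical function, and real stores expose a richer API than a mapping.

```python
from typing import Mapping, Optional, Sequence


def find_feature(
    key: str,
    primary: Mapping[str, object],
    fallbacks: Sequence[Mapping[str, object]] = (),
) -> Optional[object]:
    """Search the primary store first, then each fallback in the order given."""
    for store in (primary, *fallbacks):
        if key in store:
            return store[key]
    return None  # not found anywhere
```

The order of the list matters in this model: the first store that contains the feature wins, so earlier fallbacks shadow later ones.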
hash_algorithm¶
Hash algorithm for versioning. If None, uses store's default.
Type: metaxy.versioning.types.HashAlgorithm | None
versioning_engine¶
Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.
Type: Literal['auto', 'native', 'polars'] | Default: "auto"
root_path¶
Base directory or URI where feature tables are stored.
Type: str | pathlib.Path
storage_options¶
Storage backend options passed to delta-rs.
Type: dict[str, Any] | None
layout¶
Directory layout for feature tables ('nested' or 'flat').
Type: Literal['flat', 'nested'] | Default: "nested"
delta_write_options¶
Options passed to deltalake.write_deltalake.
Type: dict[str, Any] | None