Delta Lake

Delta Lake is an open-source lakehouse storage format with ACID transactions and schema enforcement. To use Metaxy with Delta Lake, configure DeltaMetadataStore. It persists metadata as Delta tables and uses an in-memory Polars engine for versioning computations.

It supports the local filesystem and remote object stores.

Tip

If Polars 1.37 or greater is installed, lazy Polars frames are written with LazyFrame.sink_delta, which avoids unnecessary materialization.
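
The version gate described in this tip can be sketched as below. This is a minimal illustration rather than Metaxy's actual check: the helper name `supports_sink_delta` is hypothetical, and only the 1.37 threshold comes from the text above.

```python
import re


def supports_sink_delta(polars_version: str) -> bool:
    """Return True if the given Polars version string is 1.37 or newer.

    Hypothetical helper for illustration; only the (1, 37) threshold
    comes from the documentation.
    """
    # Read the leading "major.minor" integers; patch and pre-release
    # suffixes are ignored.
    match = re.match(r"(\d+)\.(\d+)", polars_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {polars_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 37)
```

The version string itself could be obtained with `importlib.metadata.version("polars")` from the standard library.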

Installation

pip install 'metaxy[delta]'

Using Object Stores

Point root_path at any supported URI (s3://, abfss://, gs://, ...) and forward credentials with storage_options. The dict is passed verbatim to deltalake.

Learn more in the API docs.
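
One way to avoid hard-coding credentials is to build storage_options from environment variables. The sketch below assumes an S3 backend; the helper name `s3_storage_options` and the chosen key list are illustrative assumptions, so check the delta-rs documentation for the options your backend accepts.

```python
import os
from collections.abc import Mapping

# Keys commonly used for S3 access with delta-rs (assumed list; adjust
# for your backend).
S3_OPTION_KEYS = (
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def s3_storage_options(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the S3-related variables that are set and non-empty."""
    return {key: env[key] for key in S3_OPTION_KEYS if env.get(key)}
```

The result can then be passed as `DeltaMetadataStore(root_path="s3://my-bucket/metaxy", storage_options=s3_storage_options())`.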

Storage Layout

The layout parameter controls how feature keys map to Delta Lake table locations:

  • nested (default) places every feature in its own directory: your/feature/key.delta
  • flat stores all features in the same directory: your__feature__key.delta
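
The mapping for the two layouts can be sketched as a small function. This is an illustration of the naming scheme only, not Metaxy's implementation; `table_location` and the example key `("video", "frames")` are hypothetical.

```python
from typing import Literal, Sequence


def table_location(
    root: str,
    key_parts: Sequence[str],
    layout: Literal["flat", "nested"] = "nested",
) -> str:
    """Build the table location for a feature key under the given layout."""
    if not key_parts:
        raise ValueError("Feature key must have at least one part.")
    if layout == "nested":
        # Each key part becomes a directory level; the last part names the table.
        relative = "/".join(key_parts) + ".delta"
    elif layout == "flat":
        # All tables share one directory; key parts are joined with "__".
        relative = "__".join(key_parts) + ".delta"
    else:
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    return f"{root.rstrip('/')}/{relative}"
```

With a root of `s3://my-bucket/metaxy`, the key `("video", "frames")` maps to `video/frames.delta` under the nested layout and to `video__frames.delta` under the flat layout.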

metaxy.ext.metadata_stores.delta

Delta Lake metadata store implemented with delta-rs.

metaxy.ext.metadata_stores.delta.DeltaMetadataStore

DeltaMetadataStore(
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
)

Bases: MetadataStore

Delta Lake metadata store backed by delta-rs.

It stores feature metadata in Delta Lake tables located under root_path. It uses the Polars versioning engine for provenance calculations.

Tip

If Polars 1.37 or greater is installed, lazy Polars frames are written with LazyFrame.sink_delta, which avoids unnecessary materialization.

Example:

```py
from metaxy.ext.metadata_stores.delta import DeltaMetadataStore

store = DeltaMetadataStore(
    root_path="s3://my-bucket/metaxy",
    storage_options={"AWS_REGION": "us-west-2"},
)
```

Parameters:

  • root_path (str | Path) –

    Base directory or URI where feature tables are stored. Supports local paths (/path/to/dir), s3:// URLs, and other object store URIs.

  • storage_options (dict[str, Any] | None, default: None ) –

    Storage backend options passed to delta-rs. Example: {"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}. See https://delta-io.github.io/delta-rs/ for details on supported options.

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • layout (Literal['flat', 'nested'], default: 'nested' ) –

    Directory layout for feature tables. Options:

    • "nested": Feature tables stored in nested directories {part1}/{part2}.delta

    • "flat": Feature tables stored as {part1}__{part2}.delta

  • delta_write_options (dict[str, Any] | None, default: None ) –

    Additional options passed to deltalake.write_deltalake. Overrides the default {"schema_mode": "merge"}. Example: {"max_workers": 4}

  • **kwargs (Any, default: {} ) –

    Forwarded to metaxy.metadata_store.base.MetadataStore.

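
The statement that delta_write_options overrides the default {"schema_mode": "merge"} can be read as a key-wise merge in which user-supplied keys win. That reading is an assumption: the constructor shown on this page only stores the user dict, so the merge step and the names `DEFAULT_WRITE_OPTIONS` and `effective_write_options` below are hypothetical.

```python
from typing import Any, Optional

# Default stated in the parameter description.
DEFAULT_WRITE_OPTIONS: dict[str, Any] = {"schema_mode": "merge"}


def effective_write_options(
    user_options: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Merge user options over the defaults; user-supplied keys take precedence."""
    return {**DEFAULT_WRITE_OPTIONS, **(user_options or {})}
```

Under this reading, passing {"max_workers": 4} keeps schema_mode set to "merge", while passing an explicit schema_mode replaces the default.
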
Source code in src/metaxy/ext/metadata_stores/delta.py
```py
def __init__(
    self,
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
) -> None:
    """
    Initialize Delta Lake metadata store.

    Args:
        root_path: Base directory or URI where feature tables are stored.
            Supports local paths (`/path/to/dir`), `s3://` URLs, and other object store URIs.
        storage_options: Storage backend options passed to delta-rs.
            Example: `{"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}`
            See https://delta-io.github.io/delta-rs/ for details on supported options.
        fallback_stores: Ordered list of read-only fallback stores.
        layout: Directory layout for feature tables. Options:

            - `"nested"`: Feature tables stored in nested directories `{part1}/{part2}.delta`

            - `"flat"`: Feature tables stored as `{part1}__{part2}.delta`

        delta_write_options: Additional options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].
            Overrides default {"schema_mode": "merge"}. Example: {"max_workers": 4}
        **kwargs: Forwarded to [metaxy.metadata_store.base.MetadataStore][metaxy.metadata_store.base.MetadataStore].
    """
    self.storage_options = storage_options or {}
    if layout not in ("flat", "nested"):
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    self.layout = layout
    self.delta_write_options = delta_write_options or {}

    root_str = str(root_path)
    self._is_remote = not is_local_path(root_str)

    if self._is_remote:
        # Remote path (S3, Azure, GCS, etc.)
        self._root_uri = root_str.rstrip("/")
    else:
        # Local path (including file:// and local:// URLs)
        if root_str.startswith("file://"):
            # Strip file:// prefix
            root_str = root_str[7:]
        elif root_str.startswith("local://"):
            # Strip local:// prefix
            root_str = root_str[8:]
        local_path = Path(root_str).expanduser().resolve()
        self._root_uri = str(local_path)

    super().__init__(
        fallback_stores=fallback_stores,
        versioning_engine="polars",
        **kwargs,
    )
```
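
To see what the root-path handling in the constructor produces for different inputs, the same logic can be pulled out into a standalone function. This is a sketch: `normalize_root` is a hypothetical name, and `is_local_path` below is a simplified stand-in for the real helper, whose implementation is not shown on this page.

```python
from pathlib import Path
from typing import Union

LOCAL_SCHEMES = ("file://", "local://")


def is_local_path(path: str) -> bool:
    """Simplified stand-in: local if it has a local scheme or no scheme at all."""
    return path.startswith(LOCAL_SCHEMES) or "://" not in path


def normalize_root(root_path: Union[str, Path]) -> str:
    """Mirror the constructor's root-path handling as a standalone function."""
    root_str = str(root_path)
    if not is_local_path(root_str):
        # Remote URIs keep their scheme; only trailing slashes are dropped.
        return root_str.rstrip("/")
    for scheme in LOCAL_SCHEMES:
        if root_str.startswith(scheme):
            # Strip the file:// or local:// prefix.
            root_str = root_str[len(scheme):]
            break
    # Expand "~" and resolve to an absolute path.
    return str(Path(root_str).expanduser().resolve())
```

For example, `normalize_root("s3://my-bucket/metaxy/")` returns `s3://my-bucket/metaxy`, while local inputs come back as absolute paths. Remote URIs are best passed as strings: wrapping one in `Path` collapses the `//` after the scheme before the constructor ever sees it.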

Configuration

fallback_stores

List of fallback store names to search when features are not found in the current store.

Type: list[str]

[stores.dev.config]
# Optional
# fallback_stores = []
[tool.metaxy.stores.dev.config]
# Optional
# fallback_stores = []
export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES=...
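
The lookup order implied by fallback_stores can be sketched with in-memory stand-ins: the current store is consulted first, then each fallback in order. `InMemoryStore` and `read_with_fallback` are hypothetical and only illustrate the semantics; they are not Metaxy classes or functions.

```python
from typing import Optional, Sequence


class InMemoryStore:
    """Hypothetical stand-in for a metadata store; not a Metaxy class."""

    def __init__(self, name: str, features: dict[str, list[dict]]):
        self.name = name
        self.features = features

    def read(self, feature_key: str) -> Optional[list[dict]]:
        # Return None when the feature is absent from this store.
        return self.features.get(feature_key)


def read_with_fallback(
    primary: InMemoryStore,
    fallbacks: Sequence[InMemoryStore],
    feature_key: str,
) -> tuple[str, list[dict]]:
    """Return (store name, rows) from the first store that has the feature."""
    for store in (primary, *fallbacks):
        rows = store.read(feature_key)
        if rows is not None:
            return store.name, rows
    raise KeyError(f"Feature {feature_key!r} not found in any store.")
```

A feature present in both stores is served from the current store; a feature present only in a fallback is read from that fallback, which is never written to.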

hash_algorithm

Hash algorithm for versioning. If None, the store's default is used.

Type: metaxy.versioning.types.HashAlgorithm | None

[stores.dev.config]
# Optional
# hash_algorithm = null
[tool.metaxy.stores.dev.config]
# Optional
# hash_algorithm = null
export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

versioning_engine

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

Type: Literal['auto', 'native', 'polars'] | Default: "auto"

[stores.dev.config]
versioning_engine = "auto"
[tool.metaxy.stores.dev.config]
versioning_engine = "auto"
export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto

root_path

Base directory or URI where feature tables are stored.

Type: str | pathlib.Path

[stores.dev.config]
# Optional
# root_path = null
[tool.metaxy.stores.dev.config]
# Optional
# root_path = null
export METAXY_STORES__DEV__CONFIG__ROOT_PATH=...

storage_options

Storage backend options passed to delta-rs.

Type: dict[str, Any] | None

[stores.dev.config]
# Optional
# storage_options = {}
[tool.metaxy.stores.dev.config]
# Optional
# storage_options = {}
export METAXY_STORES__DEV__CONFIG__STORAGE_OPTIONS=...

layout

Directory layout for feature tables ('nested' or 'flat').

Type: Literal['flat', 'nested'] | Default: "nested"

[stores.dev.config]
layout = "nested"
[tool.metaxy.stores.dev.config]
layout = "nested"
export METAXY_STORES__DEV__CONFIG__LAYOUT=nested

delta_write_options

Options passed to deltalake.write_deltalake.

Type: dict[str, Any] | None

[stores.dev.config]
# Optional
# delta_write_options = {}
[tool.metaxy.stores.dev.config]
# Optional
# delta_write_options = {}
export METAXY_STORES__DEV__CONFIG__DELTA_WRITE_OPTIONS=...
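
The environment variable names in this section appear to follow one pattern: a METAXY_STORES prefix, the upper-cased store name, CONFIG, and the upper-cased field name, joined by double underscores. The helper below is inferred from the examples on this page only; `env_var_name` is a hypothetical name, and behavior for store names containing other characters is not verified.

```python
def env_var_name(store: str, field: str) -> str:
    """Build the environment variable name for a store config field.

    Pattern inferred from the examples above, e.g. store "dev" and
    field "layout".
    """
    return f"METAXY_STORES__{store.upper()}__CONFIG__{field.upper()}"
```

For the `dev` store used throughout this page, `env_var_name("dev", "layout")` yields `METAXY_STORES__DEV__CONFIG__LAYOUT`.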