Metaxy + Delta Lake

Delta Lake is an open-source lakehouse storage format with ACID transactions and schema enforcement. To use Metaxy with Delta Lake, configure DeltaMetadataStore. It persists metadata as Delta tables and uses an in-memory Polars engine for versioning computations.

DeltaMetadataStore supports both the local filesystem and remote object stores such as S3.

Tip

If Polars 1.37 or later is installed, lazy Polars frames are written with LazyFrame.sink_delta, which avoids unnecessary materialization.
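
One way to check ahead of time which write path applies is to compare the installed Polars version against 1.37. The sketch below uses only the standard library. `supports_sink_delta` and `installed_polars_supports_sink_delta` are hypothetical helper names, not part of Metaxy, and the simple parser assumes a version string that starts with `major.minor`.

```python
from importlib.metadata import PackageNotFoundError, version


def supports_sink_delta(polars_version: str) -> bool:
    """Return True if the given Polars version is 1.37 or later."""
    # Compare only the numeric major and minor components.
    major, minor = (int(part) for part in polars_version.split(".")[:2])
    return (major, minor) >= (1, 37)


def installed_polars_supports_sink_delta() -> bool:
    """Check the locally installed Polars, if any."""
    try:
        return supports_sink_delta(version("polars"))
    except PackageNotFoundError:
        # Polars is not installed in this environment.
        return False
```

Whether Metaxy performs its own check in exactly this way is not stated here; the sketch only mirrors the version threshold named in the tip.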

Installation

pip install 'metaxy[delta]'

API Reference

metaxy.ext.metadata_stores.delta

Delta Lake metadata store implemented with delta-rs.

metaxy.ext.metadata_stores.delta.DeltaMetadataStore

DeltaMetadataStore(
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
)

Bases: MetadataStore

Delta Lake metadata store backed by delta-rs.

It stores feature metadata in Delta Lake tables located under root_path. It uses the Polars versioning engine for provenance calculations.

Example:

```py
from metaxy.ext.metadata_stores.delta import DeltaMetadataStore

store = DeltaMetadataStore(
    root_path="s3://my-bucket/metaxy",
    storage_options={"AWS_REGION": "us-west-2"},
)
```

Parameters:

  • root_path (str | Path) –

    Base directory or URI where feature tables are stored. Supports local paths (/path/to/dir), s3:// URLs, and other object store URIs.

  • storage_options (dict[str, Any] | None, default: None ) –

    Storage backend options passed to delta-rs. Example: {"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}. See https://delta-io.github.io/delta-rs/ for details on supported options.

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • layout (Literal['flat', 'nested'], default: 'nested' ) –

    Directory layout for feature tables. Options:

    • "nested": Feature tables stored in nested directories {part1}/{part2}.delta

    • "flat": Feature tables stored as {part1}__{part2}.delta

  • delta_write_options (dict[str, Any] | None, default: None ) –

    Additional options passed to deltalake.write_deltalake. Overrides default {"schema_mode": "merge"}. Example: {"max_workers": 4}

  • **kwargs (Any, default: {} ) –

    Forwarded to metaxy.metadata_store.base.MetadataStore.

Source code in src/metaxy/ext/metadata_stores/delta.py
def __init__(
    self,
    root_path: str | Path,
    *,
    storage_options: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    layout: Literal["flat", "nested"] = "nested",
    delta_write_options: dict[str, Any] | None = None,
    **kwargs: Any,
) -> None:
    """
    Initialize Delta Lake metadata store.

    Args:
        root_path: Base directory or URI where feature tables are stored.
            Supports local paths (`/path/to/dir`), `s3://` URLs, and other object store URIs.
        storage_options: Storage backend options passed to delta-rs.
            Example: `{"AWS_REGION": "us-west-2", "AWS_ACCESS_KEY_ID": "...", ...}`
            See https://delta-io.github.io/delta-rs/ for details on supported options.
        fallback_stores: Ordered list of read-only fallback stores.
        layout: Directory layout for feature tables. Options:

            - `"nested"`: Feature tables stored in nested directories `{part1}/{part2}.delta`

            - `"flat"`: Feature tables stored as `{part1}__{part2}.delta`

        delta_write_options: Additional options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].
            Overrides default {"schema_mode": "merge"}. Example: {"max_workers": 4}
        **kwargs: Forwarded to [metaxy.metadata_store.base.MetadataStore][metaxy.metadata_store.base.MetadataStore].
    """
    self.storage_options = storage_options or {}
    if layout not in ("flat", "nested"):
        raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")
    self.layout = layout
    self.delta_write_options = delta_write_options or {}

    root_str = str(root_path)
    self._is_remote = not is_local_path(root_str)

    if self._is_remote:
        # Remote path (S3, Azure, GCS, etc.)
        self._root_uri = root_str.rstrip("/")
    else:
        # Local path (including file:// and local:// URLs)
        if root_str.startswith("file://"):
            # Strip file:// prefix
            root_str = root_str[7:]
        elif root_str.startswith("local://"):
            # Strip local:// prefix
            root_str = root_str[8:]
        local_path = Path(root_str).expanduser().resolve()
        self._root_uri = str(local_path)

    super().__init__(
        fallback_stores=fallback_stores,
        versioning_engine="polars",
        **kwargs,
    )
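
The path handling in the constructor can be tried in isolation. The sketch below follows the same two branches (remote URIs keep their scheme and lose trailing slashes; local paths lose a `file://` or `local://` prefix and are resolved). `is_local_path_sketch` is a hypothetical stand-in for the `is_local_path` helper, whose implementation is not shown on this page, and `normalize_root` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def is_local_path_sketch(root: str) -> bool:
    # Hypothetical stand-in for is_local_path: treat file://, local://,
    # and scheme-less strings as local. The real helper may differ.
    if root.startswith(("file://", "local://")):
        return True
    return "://" not in root


def normalize_root(root_path: str | Path) -> tuple[str, bool]:
    """Return (root_uri, is_remote), following the branches in __init__."""
    root_str = str(root_path)
    if not is_local_path_sketch(root_str):
        # Remote URI: only trailing slashes are stripped.
        return root_str.rstrip("/"), True
    # Local path: drop a file:// or local:// prefix, then resolve.
    for prefix in ("file://", "local://"):
        if root_str.startswith(prefix):
            root_str = root_str[len(prefix):]
            break
    return str(Path(root_str).expanduser().resolve()), False


print(normalize_root("s3://my-bucket/metaxy/"))  # ('s3://my-bucket/metaxy', True)
```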

Configuration

Configuration for DeltaMetadataStore.

Example
metaxy.toml
[stores.dev]
type = "metaxy.ext.metadata_stores.delta.DeltaMetadataStore"

[stores.dev.config]
root_path = "s3://my-bucket/metaxy"
layout = "nested"

[stores.dev.config.storage_options]
AWS_REGION = "us-west-2"
JSON schema:
{
  "$defs": {
    "HashAlgorithm": {
      "description": "Supported hash algorithms for field provenance calculation.\n\nThese algorithms are chosen for:\n- Speed (non-cryptographic hashes preferred)\n- Cross-database availability\n- Good collision resistance for field provenance calculation",
      "enum": [
        "xxhash64",
        "xxhash32",
        "wyhash",
        "sha256",
        "md5",
        "farmhash"
      ],
      "title": "HashAlgorithm",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "description": "Configuration for DeltaMetadataStore.\n\nExample:\n    ```toml title=\"metaxy.toml\"\n    [stores.dev]\n    type = \"metaxy.ext.metadata_stores.delta.DeltaMetadataStore\"\n\n    [stores.dev.config]\n    root_path = \"s3://my-bucket/metaxy\"\n    layout = \"nested\"\n\n    [stores.dev.config.storage_options]\n    AWS_REGION = \"us-west-2\"\n    ```",
  "properties": {
    "fallback_stores": {
      "description": "List of fallback store names to search when features are not found in the current store.",
      "items": {
        "type": "string"
      },
      "title": "Fallback Stores",
      "type": "array"
    },
    "hash_algorithm": {
      "anyOf": [
        {
          "$ref": "#/$defs/HashAlgorithm"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash algorithm for versioning. If None, uses store's default."
    },
    "versioning_engine": {
      "default": "auto",
      "description": "Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.",
      "enum": [
        "auto",
        "native",
        "polars"
      ],
      "title": "Versioning Engine",
      "type": "string"
    },
    "root_path": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "format": "path",
          "type": "string"
        }
      ],
      "description": "Base directory or URI where feature tables are stored.",
      "title": "Root Path"
    },
    "storage_options": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Storage backend options passed to delta-rs.",
      "title": "Storage Options"
    },
    "layout": {
      "default": "nested",
      "description": "Directory layout for feature tables ('nested' or 'flat').",
      "enum": [
        "flat",
        "nested"
      ],
      "title": "Layout",
      "type": "string"
    },
    "delta_write_options": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Options passed to [`deltalake.write_deltalake`][deltalake.write_deltalake].",
      "title": "Delta Write Options"
    }
  },
  "required": [
    "root_path"
  ],
  "title": "DeltaMetadataStoreConfig",
  "type": "object"
}

Config:

  • frozen: True
  • extra: forbid
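
These two settings mean that a config instance cannot be modified after creation and that unknown keys are rejected. The sketch below imitates both behaviours with a standard-library frozen dataclass. It is not the pydantic model Metaxy uses: `DeltaConfigSketch` and `from_dict` are hypothetical names, and only a subset of the fields is covered.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, fields
from typing import Any


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned
class DeltaConfigSketch:
    root_path: str
    layout: str = "nested"
    storage_options: dict[str, Any] | None = None

    def __post_init__(self) -> None:
        # Mirror the schema's enum constraint on layout.
        if self.layout not in ("flat", "nested"):
            raise ValueError(f"Invalid layout: {self.layout}")

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> DeltaConfigSketch:
        # extra = forbid: reject keys that are not declared fields.
        known = {f.name for f in fields(cls)}
        unknown = set(data) - known
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        return cls(**data)


config = DeltaConfigSketch.from_dict({"root_path": "s3://my-bucket/metaxy"})

try:
    config.layout = "flat"  # rejected because the instance is frozen
except FrozenInstanceError:
    print("config is read-only")
```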

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.fallback_stores pydantic-field

fallback_stores: list[str]

List of fallback store names to search when features are not found in the current store.

metaxy.toml
[stores.dev.config]
fallback_stores = []

pyproject.toml
[tool.metaxy.stores.dev.config]
fallback_stores = []

Environment variable
export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES='[]'
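
The lookup order implied by this description can be pictured with plain dictionaries standing in for stores. This is a simplified sketch, not Metaxy's implementation: `find_feature` and the dict-based stores are hypothetical, and real stores return metadata tables rather than strings.

```python
from __future__ import annotations


def find_feature(
    feature_key: str,
    current: dict[str, str],
    fallbacks: list[dict[str, str]],
) -> str | None:
    """Return the first match, checking the current store before fallbacks."""
    for store in [current, *fallbacks]:
        if feature_key in store:
            return store[feature_key]
    return None


dev = {"video/frames": "dev metadata"}
staging = {"video/audio": "staging metadata"}
prod = {"video/audio": "prod metadata", "video/raw": "prod metadata"}

print(find_feature("video/audio", dev, [staging, prod]))  # staging metadata
```

Because the fallbacks are consulted in list order, `staging` wins over `prod` for a key that both contain.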

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.hash_algorithm pydantic-field

hash_algorithm: HashAlgorithm | None = None

Hash algorithm for versioning. If None, uses store's default.

metaxy.toml
[stores.dev.config]
hash_algorithm = "..."

pyproject.toml
[tool.metaxy.stores.dev.config]
hash_algorithm = "..."

Environment variable
export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.versioning_engine pydantic-field

versioning_engine: Literal["auto", "native", "polars"] = "auto"

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

metaxy.toml
[stores.dev.config]
versioning_engine = "auto"

pyproject.toml
[tool.metaxy.stores.dev.config]
versioning_engine = "auto"

Environment variable
export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.root_path pydantic-field

root_path: str | Path

Base directory or URI where feature tables are stored.

metaxy.toml
[stores.dev.config]
root_path = "..."

pyproject.toml
[tool.metaxy.stores.dev.config]
root_path = "..."

Environment variable
export METAXY_STORES__DEV__CONFIG__ROOT_PATH=...
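
The environment variable names on this page follow a visible pattern: a `METAXY_` prefix, then the config path in upper case with `__` between levels. A minimal sketch of that mapping follows. `env_var_to_path` is a hypothetical helper, and how Metaxy actually parses these variables is not shown here.

```python
def env_var_to_path(name: str, prefix: str = "METAXY_") -> list[str]:
    """Split METAXY_A__B__C into the lower-cased config path ['a', 'b', 'c']."""
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Double underscores separate levels; single underscores stay in the key.
    return [part.lower() for part in name[len(prefix):].split("__")]


print(env_var_to_path("METAXY_STORES__DEV__CONFIG__ROOT_PATH"))
# ['stores', 'dev', 'config', 'root_path']
```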

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.storage_options pydantic-field

storage_options: dict[str, Any] | None = None

Storage backend options passed to delta-rs.

metaxy.toml
[stores.dev.config]
storage_options = {}

pyproject.toml
[tool.metaxy.stores.dev.config]
storage_options = {}

Environment variable
export METAXY_STORES__DEV__CONFIG__STORAGE_OPTIONS=...

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.layout pydantic-field

layout: Literal['flat', 'nested'] = 'nested'

Directory layout for feature tables ('nested' or 'flat').

metaxy.toml
[stores.dev.config]
layout = "nested"

pyproject.toml
[tool.metaxy.stores.dev.config]
layout = "nested"

Environment variable
export METAXY_STORES__DEV__CONFIG__LAYOUT=nested
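
The constructor documentation describes the two layouts as `{part1}/{part2}.delta` (nested) and `{part1}__{part2}.delta` (flat). The sketch below turns those patterns into a small function. `table_relative_path` is a hypothetical helper, and representing a feature key as a list of string parts is an assumption made for illustration.

```python
def table_relative_path(parts: list[str], layout: str = "nested") -> str:
    """Build a table path relative to root_path for the given layout."""
    if layout == "nested":
        # One directory level per key part, with .delta on the last part.
        return "/".join(parts) + ".delta"
    if layout == "flat":
        # All key parts joined into a single directory name.
        return "__".join(parts) + ".delta"
    raise ValueError(f"Invalid layout: {layout}. Must be 'flat' or 'nested'.")


print(table_relative_path(["video", "frames"], "nested"))  # video/frames.delta
print(table_relative_path(["video", "frames"], "flat"))    # video__frames.delta
```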

metaxy.ext.metadata_stores.delta.DeltaMetadataStoreConfig.delta_write_options pydantic-field

delta_write_options: dict[str, Any] | None = None

Options passed to deltalake.write_deltalake.

metaxy.toml
[stores.dev.config]
delta_write_options = {}

pyproject.toml
[tool.metaxy.stores.dev.config]
delta_write_options = {}

Environment variable
export METAXY_STORES__DEV__CONFIG__DELTA_WRITE_OPTIONS=...
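
According to the constructor documentation, user-supplied options override the default `{"schema_mode": "merge"}`. One plausible reading is a dictionary merge in which user keys win, and the sketch below shows that reading. `effective_write_options` and `DEFAULT_WRITE_OPTIONS` are hypothetical names, and the actual merge happens in code not shown on this page.

```python
from __future__ import annotations

from typing import Any

# Default named in the constructor documentation.
DEFAULT_WRITE_OPTIONS: dict[str, Any] = {"schema_mode": "merge"}


def effective_write_options(user_options: dict[str, Any] | None) -> dict[str, Any]:
    # Later keys win, so user-supplied values replace the defaults.
    return {**DEFAULT_WRITE_OPTIONS, **(user_options or {})}


print(effective_write_options({"max_workers": 4}))
# {'schema_mode': 'merge', 'max_workers': 4}
```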