
Metaxy + LanceDB

Experimental

This functionality is experimental.

LanceDB is a vector database built on the Lance columnar format. To use Metaxy with LanceDB, configure LanceDBMetadataStore. The store uses the in-memory Polars engine for versioning computations, while LanceDB handles schema evolution, transactions, and compaction automatically.

LanceDB runs embedded (in a local directory) or against external storage (object stores, HTTP endpoints, LanceDB Cloud), so you can use the same store type for local development and cloud workloads.

Installation

The backend relies on the lancedb package, which is installed with Metaxy's lancedb extra.

pip install 'metaxy[lancedb]'

Storage Targets

LanceDB supports the local filesystem, S3, GCS, Azure, LanceDB Cloud, and remote HTTP/HTTPS endpoints. Point uri at any supported location (s3://, gs://, az://, db://, ...) and supply credentials through the platform's native mechanism (environment variables, IAM roles, workload identity, etc.).

Storage Layout

All tables are stored within a single LanceDB database at the configured URI location. Each feature gets its own Lance table.

API Reference

metaxy.ext.metadata_stores.lancedb

LanceDB metadata store implementation.

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStore

LanceDBMetadataStore(
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
)

Bases: MetadataStore

LanceDB metadata store for vector and structured data.

LanceDB is a columnar database optimized for vector search and multimodal data. Each feature is stored in its own Lance table within the database directory. The store uses Polars components for data processing (no native SQL execution).

Storage layout:

  • Each feature gets its own table: {namespace}__{feature_name}

  • Tables are stored as Lance format in the directory specified by the URI

  • LanceDB handles schema evolution, transactions, and compaction automatically
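The naming scheme in the first bullet can be sketched as a pair of helpers. The function names and the example namespace and feature are hypothetical; Metaxy's own code for deriving table names may differ, for example in how it handles multi-part feature keys.

```python
SEPARATOR = "__"


def table_name(namespace: str, feature_name: str) -> str:
    """Join namespace and feature name as {namespace}__{feature_name}."""
    return f"{namespace}{SEPARATOR}{feature_name}"


def split_table_name(name: str) -> tuple[str, str]:
    """Split on the first separator; assumes the namespace contains no '__'."""
    namespace, feature_name = name.split(SEPARATOR, 1)
    return namespace, feature_name


name = table_name("video", "embeddings")
print(name)
print(split_table_name(name))
```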

Local Directory

from pathlib import Path
from metaxy.ext.metadata_stores.lancedb import LanceDBMetadataStore

# Local filesystem
store = LanceDBMetadataStore(Path("/path/to/featuregraph"))

Object Storage (S3, GCS, Azure)

# Object store (requires credentials); "my-bucket" is a placeholder bucket name
store = LanceDBMetadataStore("s3://my-bucket/path/to/featuregraph")

LanceDB Cloud

import os

# Option 1: Environment variable
os.environ["LANCEDB_API_KEY"] = "your-api-key"
store = LanceDBMetadataStore("db://my-database")

# Option 2: Explicit credentials
store = LanceDBMetadataStore(
    "db://my-database", connect_kwargs={"api_key": "your-api-key", "region": "us-east-1"}
)

The database directory is created automatically if it doesn't exist (local paths only). Tables are created on-demand when features are first written.

Parameters:

  • uri (str | Path) –

    Directory path or URI for LanceDB tables. Supports:

    • Local path: "./metadata" or Path("/data/metaxy/lancedb")

    • Object stores: s3://, gs://, az:// (requires cloud credentials)

    • LanceDB Cloud: "db://database-name" (requires API key)

    • Remote HTTP/HTTPS: Any URI supported by LanceDB

  • fallback_stores (list[MetadataStore] | None, default: None) –

    Ordered list of read-only fallback stores. When reading features not found in this store, Metaxy searches fallback stores in order. Useful for local dev → staging → production chains.

  • connect_kwargs (dict[str, Any] | None, default: None) –

    Extra keyword arguments passed directly to lancedb.connect(). Useful for LanceDB Cloud credentials (api_key, region) when you cannot rely on environment variables.

  • **kwargs (Any, default: {}) –

    Passed to metaxy.metadata_store.base.MetadataStore (e.g., hash_algorithm, hash_truncation_length, prefer_native).

Note

Unlike SQL stores, LanceDB doesn't require explicit table creation. Tables are created automatically when writing metadata.

Source code in src/metaxy/ext/metadata_stores/lancedb.py
def __init__(
    self,
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
):
    """
    Initialize [LanceDB](https://lancedb.com/docs/) metadata store.

    The database directory is created automatically if it doesn't exist (local paths only).
    Tables are created on-demand when features are first written.

    Args:
        uri: Directory path or URI for LanceDB tables. Supports:

            - **Local path**: `"./metadata"` or `Path("/data/metaxy/lancedb")`

            - **Object stores**: `s3://`, `gs://`, `az://` (requires cloud credentials)

            - **LanceDB Cloud**: `"db://database-name"` (requires API key)

            - **Remote HTTP/HTTPS**: Any URI supported by LanceDB

        fallback_stores: Ordered list of read-only fallback stores.
            When reading features not found in this store, Metaxy searches
            fallback stores in order. Useful for local dev → staging → production chains.
        connect_kwargs: Extra keyword arguments passed directly to
            [lancedb.connect()](https://lancedb.github.io/lancedb/python/python/#lancedb.connect).
            Useful for LanceDB Cloud credentials (api_key, region) when you cannot
            rely on environment variables.
        **kwargs: Passed to [metaxy.metadata_store.base.MetadataStore][]
            (e.g., hash_algorithm, hash_truncation_length, prefer_native)

    Note:
        Unlike SQL stores, LanceDB doesn't require explicit table creation.
        Tables are created automatically when writing metadata.
    """
    self.uri: str = str(uri)
    self._conn: Any | None = None
    self._connect_kwargs = connect_kwargs or {}
    super().__init__(
        fallback_stores=fallback_stores,
        auto_create_tables=True,
        **kwargs,
    )

Configuration

Configuration for LanceDBMetadataStore.

Example
metaxy.toml
[stores.dev]
type = "metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStore"

[stores.dev.config]
uri = "/path/to/featuregraph"

[stores.dev.config.connect_kwargs]
api_key = "your-api-key"
Show JSON schema:
{
  "$defs": {
    "HashAlgorithm": {
      "description": "Supported hash algorithms for field provenance calculation.\n\nThese algorithms are chosen for:\n- Speed (non-cryptographic hashes preferred)\n- Cross-database availability\n- Good collision resistance for field provenance calculation",
      "enum": [
        "xxhash64",
        "xxhash32",
        "wyhash",
        "sha256",
        "md5",
        "farmhash"
      ],
      "title": "HashAlgorithm",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "description": "Configuration for LanceDBMetadataStore.\n\nExample:\n    ```toml title=\"metaxy.toml\"\n    [stores.dev]\n    type = \"metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStore\"\n\n    [stores.dev.config]\n    uri = \"/path/to/featuregraph\"\n\n    [stores.dev.config.connect_kwargs]\n    api_key = \"your-api-key\"\n    ```",
  "properties": {
    "fallback_stores": {
      "description": "List of fallback store names to search when features are not found in the current store.",
      "items": {
        "type": "string"
      },
      "title": "Fallback Stores",
      "type": "array"
    },
    "hash_algorithm": {
      "anyOf": [
        {
          "$ref": "#/$defs/HashAlgorithm"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash algorithm for versioning. If None, uses store's default."
    },
    "versioning_engine": {
      "default": "auto",
      "description": "Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.",
      "enum": [
        "auto",
        "native",
        "polars"
      ],
      "title": "Versioning Engine",
      "type": "string"
    },
    "uri": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "format": "path",
          "type": "string"
        }
      ],
      "description": "Directory path or URI for LanceDB tables.",
      "title": "Uri"
    },
    "connect_kwargs": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Extra keyword arguments passed to lancedb.connect().",
      "title": "Connect Kwargs"
    }
  },
  "required": [
    "uri"
  ],
  "title": "LanceDBMetadataStoreConfig",
  "type": "object"
}
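The main constraints in this schema (a required uri, no unknown keys, and enumerated values for versioning_engine and hash_algorithm) can be checked by hand as a quick sanity test. The check_config function is a hypothetical helper written for this sketch; it is not Metaxy's validation and it ignores value types.

```python
from typing import Any

# Values copied from the JSON schema above.
ALLOWED_KEYS = {"uri", "fallback_stores", "hash_algorithm", "versioning_engine", "connect_kwargs"}
VERSIONING_ENGINES = {"auto", "native", "polars"}
HASH_ALGORITHMS = {"xxhash64", "xxhash32", "wyhash", "sha256", "md5", "farmhash"}


def check_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems: list[str] = []
    if "uri" not in config:
        problems.append("missing required key: uri")
    # additionalProperties is false, so unknown keys are rejected.
    for key in sorted(set(config) - ALLOWED_KEYS):
        problems.append(f"unexpected key: {key}")
    engine = config.get("versioning_engine", "auto")
    if engine not in VERSIONING_ENGINES:
        problems.append(f"invalid versioning_engine: {engine}")
    algorithm = config.get("hash_algorithm")
    if algorithm is not None and algorithm not in HASH_ALGORITHMS:
        problems.append(f"invalid hash_algorithm: {algorithm}")
    return problems


print(check_config({"uri": "./metadata", "versioning_engine": "polars"}))
print(check_config({"versioning_engine": "duckdb"}))
```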

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStoreConfig.fallback_stores pydantic-field

fallback_stores: list[str]

List of fallback store names to search when features are not found in the current store.

metaxy.toml

[stores.dev.config]
fallback_stores = []

pyproject.toml

[tool.metaxy.stores.dev.config]
fallback_stores = []

Environment variable

export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES=[]

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStoreConfig.hash_algorithm pydantic-field

hash_algorithm: HashAlgorithm | None = None

Hash algorithm for versioning. If None, uses store's default.

metaxy.toml

[stores.dev.config]
hash_algorithm = "..."

pyproject.toml

[tool.metaxy.stores.dev.config]
hash_algorithm = "..."

Environment variable

export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStoreConfig.versioning_engine pydantic-field

versioning_engine: Literal["auto", "native", "polars"] = "auto"

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

metaxy.toml

[stores.dev.config]
versioning_engine = "auto"

pyproject.toml

[tool.metaxy.stores.dev.config]
versioning_engine = "auto"

Environment variable

export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStoreConfig.uri pydantic-field

uri: str | Path

Directory path or URI for LanceDB tables.

metaxy.toml

[stores.dev.config]
uri = "..."

pyproject.toml

[tool.metaxy.stores.dev.config]
uri = "..."

Environment variable

export METAXY_STORES__DEV__CONFIG__URI=...

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStoreConfig.connect_kwargs pydantic-field

connect_kwargs: dict[str, Any] | None = None

Extra keyword arguments passed to lancedb.connect().

metaxy.toml

[stores.dev.config]
connect_kwargs = {}

pyproject.toml

[tool.metaxy.stores.dev.config]
connect_kwargs = {}

Environment variable

export METAXY_STORES__DEV__CONFIG__CONNECT_KWARGS=...
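The environment variable names in the examples above follow a visible pattern: the METAXY_STORES prefix, the store name, CONFIG, and the field name, upper-cased and joined with double underscores. The helper below reproduces that pattern as inferred from this page; env_var_name is a hypothetical function, and the sketch does not cover how values are encoded.

```python
def env_var_name(store: str, field: str) -> str:
    """Build the variable name for one config field of one named store."""
    parts = ["METAXY_STORES", store.upper(), "CONFIG", field.upper()]
    return "__".join(parts)


for field in ("uri", "hash_algorithm", "versioning_engine"):
    print(env_var_name("dev", field))
```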