LanceDB

Experimental

This functionality is experimental.

LanceDB is a vector database built on the Lance columnar format. To use Metaxy with LanceDB, configure LanceDBMetadataStore. The store uses the in-memory Polars engine for versioning computations, while LanceDB handles schema evolution, transactions, and compaction automatically.

It runs embedded (local directory) or against external storage (object stores, HTTP endpoints, LanceDB Cloud), so you can use the same store type for local development and cloud workloads.

Installation

The backend relies on the lancedb package, which is installed with Metaxy's lancedb extra.

pip install 'metaxy[lancedb]'

Storage Targets

Point uri at any supported URI (s3://, gs://, az://, db://, ...) and forward credentials with the platform's native mechanism (environment variables, IAM roles, workload identity, etc.). LanceDB supports local filesystem, S3, GCS, Azure, LanceDB Cloud, and remote HTTP/HTTPS endpoints.
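As a rough illustration of the storage targets listed above, the sketch below sorts a URI into one of those categories by its scheme. The helper name classify_uri and the category labels are hypothetical and not part of Metaxy or LanceDB; the schemes come from the text above.

```python
from urllib.parse import urlparse

# Object-store schemes named in the text above.
_OBJECT_STORE_SCHEMES = {"s3", "gs", "az"}


def classify_uri(uri: str) -> str:
    """Return a rough storage category for a URI (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in _OBJECT_STORE_SCHEMES:
        return "object-store"
    if scheme == "db":
        return "lancedb-cloud"
    if scheme in {"http", "https"}:
        return "remote-http"
    # No recognised scheme: treat the value as a local filesystem path.
    return "local"


for example in ("./metadata", "s3://bucket/path", "db://my-database"):
    print(example, "->", classify_uri(example))
```

A check like this can be handy in tests or startup logging to confirm which kind of backend a given configuration points at before any credentials are used.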

Storage Layout

All tables are stored within a single LanceDB database at the configured URI location. Each feature gets its own Lance table.


metaxy.ext.metadata_stores.lancedb

LanceDB metadata store implementation.

metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStore

LanceDBMetadataStore(
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
)

Bases: MetadataStore

LanceDB metadata store for vector and structured data.

LanceDB is a columnar database optimized for vector search and multimodal data. Each feature is stored in its own Lance table within the database directory. Uses Polars components for data processing (no native SQL execution).

Storage layout:

  • Each feature gets its own table: {namespace}__{feature_name}

  • Tables are stored as Lance format in the directory specified by the URI

  • LanceDB handles schema evolution, transactions, and compaction automatically
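The naming pattern in the storage layout above can be sketched as plain string formatting. The helpers below are hypothetical; how Metaxy derives namespace and feature_name from a feature key is not described here, so both are taken as given strings, and the split assumes the namespace itself contains no double underscore.

```python
from typing import Tuple


def table_name(namespace: str, feature_name: str) -> str:
    """Compose a table name following the {namespace}__{feature_name} pattern."""
    return f"{namespace}__{feature_name}"


def split_table_name(name: str) -> Tuple[str, str]:
    """Split at the first double underscore (assumes none inside the namespace)."""
    namespace, separator, feature_name = name.partition("__")
    if not separator:
        raise ValueError(f"not a namespaced table name: {name!r}")
    return namespace, feature_name


name = table_name("video", "embeddings")
print(name)
print(split_table_name(name))
```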

Local Directory
from pathlib import Path
from metaxy.ext.metadata_stores.lancedb import LanceDBMetadataStore

# Local filesystem
store = LanceDBMetadataStore(Path("/path/to/featuregraph"))
Object Storage (S3, GCS, Azure)
# Object store (requires credentials); "my-bucket" is a placeholder bucket name
store = LanceDBMetadataStore("s3://my-bucket/path/to/featuregraph")
LanceDB Cloud
import os

# Option 1: Environment variable
os.environ["LANCEDB_API_KEY"] = "your-api-key"
store = LanceDBMetadataStore("db://my-database")

# Option 2: Explicit credentials
store = LanceDBMetadataStore(
    "db://my-database", connect_kwargs={"api_key": "your-api-key", "region": "us-east-1"}
)
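To avoid hard-coding an API key as in Option 2, one possible approach is to build connect_kwargs from the environment and include only the values that are actually set. This is a minimal sketch: the helper cloud_connect_kwargs is hypothetical, and it produces only the api_key and region keys shown in the example above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def cloud_connect_kwargs(
    env: Mapping[str, str] = os.environ,
    region: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a connect_kwargs dict, skipping values that are not provided."""
    kwargs: Dict[str, Any] = {}
    api_key = env.get("LANCEDB_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    if region:
        kwargs["region"] = region
    return kwargs


# A plain dict stands in for os.environ to keep the example deterministic.
print(cloud_connect_kwargs({"LANCEDB_API_KEY": "your-api-key"}, region="us-east-1"))
```

The resulting dictionary could then be passed as connect_kwargs=... when constructing the store; an empty dictionary leaves LanceDB to fall back on its own defaults.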

The database directory is created automatically if it doesn't exist (local paths only). Tables are created on-demand when features are first written.

Parameters:

  • uri (str | Path) –

    Directory path or URI for LanceDB tables. Supports:

    • Local path: "./metadata" or Path("/data/metaxy/lancedb")

    • Object stores: s3://, gs://, az:// (requires cloud credentials)

    • LanceDB Cloud: "db://database-name" (requires API key)

    • Remote HTTP/HTTPS: Any URI supported by LanceDB

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores. When reading features not found in this store, Metaxy searches fallback stores in order. Useful for local dev → staging → production chains.

  • connect_kwargs (dict[str, Any] | None, default: None ) –

    Extra keyword arguments passed directly to lancedb.connect(). Useful for LanceDB Cloud credentials (api_key, region) when you cannot rely on environment variables.

  • **kwargs (Any, default: {} ) –

    Passed to metaxy.metadata_store.base.MetadataStore (e.g., hash_algorithm, hash_truncation_length, prefer_native)
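The ordered lookup described for fallback_stores can be illustrated with plain dictionaries standing in for stores. This is a conceptual sketch only, not Metaxy's implementation; read_with_fallback and the sample data are made up for illustration.

```python
from typing import Any, Mapping, Optional, Sequence


def read_with_fallback(
    feature: str,
    primary: Mapping[str, Any],
    fallbacks: Sequence[Mapping[str, Any]],
) -> Optional[Any]:
    """Return the feature from the primary store, else from the first fallback that has it."""
    if feature in primary:
        return primary[feature]
    for store in fallbacks:  # searched in the order given, read-only
        if feature in store:
            return store[feature]
    return None


dev = {"video__frames": "dev rows"}
staging = {"video__embeddings": "staging rows"}
production = {"video__embeddings": "production rows", "audio__clips": "production rows"}

print(read_with_fallback("video__frames", dev, [staging, production]))
print(read_with_fallback("video__embeddings", dev, [staging, production]))
print(read_with_fallback("audio__clips", dev, [staging, production]))
```

Because the list is ordered, a feature present in both staging and production is served from staging, which is why the order of the chain matters.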

Note

Unlike SQL stores, LanceDB doesn't require explicit table creation. Tables are created automatically when writing metadata.

Source code in src/metaxy/ext/metadata_stores/lancedb.py
def __init__(
    self,
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
):
    """
    Initialize [LanceDB](https://lancedb.com/docs/) metadata store.

    The database directory is created automatically if it doesn't exist (local paths only).
    Tables are created on-demand when features are first written.

    Args:
        uri: Directory path or URI for LanceDB tables. Supports:

            - **Local path**: `"./metadata"` or `Path("/data/metaxy/lancedb")`

            - **Object stores**: `s3://`, `gs://`, `az://` (requires cloud credentials)

            - **LanceDB Cloud**: `"db://database-name"` (requires API key)

            - **Remote HTTP/HTTPS**: Any URI supported by LanceDB

        fallback_stores: Ordered list of read-only fallback stores.
            When reading features not found in this store, Metaxy searches
            fallback stores in order. Useful for local dev → staging → production chains.
        connect_kwargs: Extra keyword arguments passed directly to
            [lancedb.connect()](https://lancedb.github.io/lancedb/python/python/#lancedb.connect).
            Useful for LanceDB Cloud credentials (api_key, region) when you cannot
            rely on environment variables.
        **kwargs: Passed to [metaxy.metadata_store.base.MetadataStore][]
            (e.g., hash_algorithm, hash_truncation_length, prefer_native)

    Note:
        Unlike SQL stores, LanceDB doesn't require explicit table creation.
        Tables are created automatically when writing metadata.
    """
    self.uri: str = str(uri)
    self._conn: Any | None = None
    self._connect_kwargs = connect_kwargs or {}
    super().__init__(
        fallback_stores=fallback_stores,
        auto_create_tables=True,
        **kwargs,
    )

Configuration

fallback_stores

List of fallback store names to search when features are not found in the current store.

Type: list[str]

[stores.dev.config]
# Optional
# fallback_stores = []
[tool.metaxy.stores.dev.config]
# Optional
# fallback_stores = []
export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES=...

hash_algorithm

Hash algorithm for versioning. If None, uses store's default.

Type: metaxy.versioning.types.HashAlgorithm | None

[stores.dev.config]
# Optional
# hash_algorithm = null
[tool.metaxy.stores.dev.config]
# Optional
# hash_algorithm = null
export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

versioning_engine

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

Type: Literal['auto', 'native', 'polars'] | Default: "auto"

[stores.dev.config]
versioning_engine = "auto"
[tool.metaxy.stores.dev.config]
versioning_engine = "auto"
export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto

uri

Directory path or URI for LanceDB tables.

Type: str | pathlib.Path

[stores.dev.config]
# Optional
# uri = null
[tool.metaxy.stores.dev.config]
# Optional
# uri = null
export METAXY_STORES__DEV__CONFIG__URI=...

connect_kwargs

Extra keyword arguments passed to lancedb.connect().

Type: dict[str, Any] | None

[stores.dev.config]
# Optional
# connect_kwargs = {}
[tool.metaxy.stores.dev.config]
# Optional
# connect_kwargs = {}
export METAXY_STORES__DEV__CONFIG__CONNECT_KWARGS=...
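The environment variable names shown in this section appear to follow one pattern: METAXY_STORES, the store name, CONFIG, and the field name, upper-cased and joined by double underscores. The helper below reproduces that pattern for a store named dev; it is a hypothetical convenience inferred from the examples above, not a Metaxy function.

```python
def config_env_var(store: str, field: str) -> str:
    """Build the environment variable name for a store config field (inferred pattern)."""
    return f"METAXY_STORES__{store.upper()}__CONFIG__{field.upper()}"


for field in ("uri", "versioning_engine", "fallback_stores"):
    print(config_env_var("dev", field))
```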