LanceDB
Experimental
This functionality is experimental.
LanceDB is a vector database built on the Lance columnar format. To use Metaxy with LanceDB, configure LanceDBMetadataStore. The store uses the in-memory Polars engine for versioning computations, while LanceDB itself handles schema evolution, transactions, and compaction automatically.
It runs embedded (local directory) or against external storage (object stores, HTTP endpoints, LanceDB Cloud), so you can use the same store type for local development and cloud workloads.
Installation
The backend relies on the lancedb package, which is installed with Metaxy's lancedb extra (for example, pip install "metaxy[lancedb]").
Storage Targets
LanceDB supports the local filesystem, S3, GCS, Azure, LanceDB Cloud, and remote HTTP/HTTPS endpoints. Point uri at any supported URI (s3://, gs://, az://, db://, ...) and forward credentials through the platform's native mechanism (environment variables, IAM roles, workload identity, etc.).
Storage Layout
All tables are stored within a single LanceDB database at the configured URI location. Each feature gets its own Lance table.
metaxy.ext.metadata_stores.lancedb
LanceDB metadata store implementation.
metaxy.ext.metadata_stores.lancedb.LanceDBMetadataStore
```python
LanceDBMetadataStore(
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
)
```
Bases: MetadataStore
LanceDB metadata store for vector and structured data.
LanceDB is a columnar database optimized for vector search and multimodal data. Each feature is stored in its own Lance table within the database directory. Data processing is done with Polars components; there is no native SQL execution.
Storage layout:
- Each feature gets its own table: {namespace}__{feature_name}
- Tables are stored in Lance format in the directory specified by the URI
- LanceDB handles schema evolution, transactions, and compaction automatically
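The table-naming convention can be illustrated with a small stand-alone sketch. table_name and split_table_name are hypothetical helpers written for this illustration, not Metaxy functions; they only mirror the documented {namespace}__{feature_name} pattern and assume the namespace itself contains no "__".

```python
def table_name(namespace: str, feature_name: str) -> str:
    """Build a table name using the documented {namespace}__{feature_name} pattern."""
    return f"{namespace}__{feature_name}"


def split_table_name(name: str) -> tuple[str, str]:
    """Split a table name back into (namespace, feature_name) at the first "__"."""
    namespace, sep, feature_name = name.partition("__")
    if not sep:
        raise ValueError(f"{name!r} does not match the {{namespace}}__{{feature_name}} pattern")
    return namespace, feature_name


# Example: a feature "embeddings" in the namespace "video".
name = table_name("video", "embeddings")
print(name)  # video__embeddings
print(split_table_name(name))  # ('video', 'embeddings')
```

To inspect a real database, the names returned by lancedb.connect(uri).table_names() could be passed through split_table_name to see which namespaces and features it holds.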
Supported targets include a local directory, object storage (S3, GCS, Azure), and LanceDB Cloud.
The database directory is created automatically if it doesn't exist (local paths only). Tables are created on-demand when features are first written.
Parameters:
- uri (str | Path): Directory path or URI for LanceDB tables. Supports:
  - Local path: "./metadata" or Path("/data/metaxy/lancedb")
  - Object stores: s3://, gs://, az:// (requires cloud credentials)
  - LanceDB Cloud: "db://database-name" (requires API key)
  - Remote HTTP/HTTPS: any URI supported by LanceDB
- fallback_stores (list[MetadataStore] | None, default: None): Ordered list of read-only fallback stores. When reading features not found in this store, Metaxy searches the fallback stores in order. Useful for local dev → staging → production chains.
- connect_kwargs (dict[str, Any] | None, default: None): Extra keyword arguments passed directly to lancedb.connect(). Useful for LanceDB Cloud credentials (api_key, region) when you cannot rely on environment variables.
- **kwargs (Any, default: {}): Passed to metaxy.metadata_store.base.MetadataStore (e.g., hash_algorithm, hash_truncation_length, prefer_native).
Note
Unlike SQL stores, LanceDB doesn't require explicit table creation. Tables are created automatically when writing metadata.
Source code in src/metaxy/ext/metadata_stores/lancedb.py
```python
def __init__(
    self,
    uri: str | Path,
    *,
    fallback_stores: list[MetadataStore] | None = None,
    connect_kwargs: dict[str, Any] | None = None,
    **kwargs: Any,
):
    """
    Initialize [LanceDB](https://lancedb.com/docs/) metadata store.

    The database directory is created automatically if it doesn't exist (local paths only).
    Tables are created on-demand when features are first written.

    Args:
        uri: Directory path or URI for LanceDB tables. Supports:

            - **Local path**: `"./metadata"` or `Path("/data/metaxy/lancedb")`
            - **Object stores**: `s3://`, `gs://`, `az://` (requires cloud credentials)
            - **LanceDB Cloud**: `"db://database-name"` (requires API key)
            - **Remote HTTP/HTTPS**: Any URI supported by LanceDB

        fallback_stores: Ordered list of read-only fallback stores.
            When reading features not found in this store, Metaxy searches
            fallback stores in order. Useful for local dev → staging → production chains.
        connect_kwargs: Extra keyword arguments passed directly to
            [lancedb.connect()](https://lancedb.github.io/lancedb/python/python/#lancedb.connect).
            Useful for LanceDB Cloud credentials (api_key, region) when you cannot
            rely on environment variables.
        **kwargs: Passed to [metaxy.metadata_store.base.MetadataStore][]
            (e.g., hash_algorithm, hash_truncation_length, prefer_native)

    Note:
        Unlike SQL stores, LanceDB doesn't require explicit table creation.
        Tables are created automatically when writing metadata.
    """
    self.uri: str = str(uri)
    self._conn: Any | None = None
    self._connect_kwargs = connect_kwargs or {}
    super().__init__(
        fallback_stores=fallback_stores,
        auto_create_tables=True,
        **kwargs,
    )
```
Configuration
fallback_stores
List of fallback store names to search when features are not found in the current store.
Type: list[str]
hash_algorithm
Hash algorithm for versioning. If None, the store's default is used.
Type: metaxy.versioning.types.HashAlgorithm | None
versioning_engine
Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.
Type: Literal['auto', 'native', 'polars'] | Default: "auto"
uri
Directory path or URI for LanceDB tables.
Type: str | pathlib.Path
connect_kwargs
Extra keyword arguments passed to lancedb.connect().
Type: dict[str, Any] | None