Metaxy + DuckDB
DuckDB is an embedded analytical database. To use Metaxy with DuckDB, configure DuckDBMetadataStore. This runs versioning computations natively in DuckDB.
Warning
File-based DuckDB does not currently support concurrent writes. If you need multiple writers (e.g. with distributed data processing), consider using MotherDuck or DuckLake with a PostgreSQL catalog, or refer to DuckDB's documentation to learn about implementing application-side workarounds.
Tip
The Delta Lake metadata store might be a better alternative for concurrent writes (its Polars-based versioning engine is as fast as DuckDB).
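One common application-side workaround for the single-writer limitation mentioned in the warning above is to retry a write when the database file is busy. The following is only a sketch under stated assumptions: `retry_write`, its parameters, and the use of `OSError` as the retryable error are hypothetical and not part of Metaxy or DuckDB. In practice you would pass whatever exception your DuckDB client raises for a locked file via `retry_on`, and the write itself is represented here by an arbitrary callable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_write(
    write: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `write`, retrying with exponential backoff on `retry_on` errors."""
    for attempt in range(attempts):
        try:
            return write()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")


# Demo with a fake writer that fails twice before succeeding.
calls = {"n": 0}


def flaky_write() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("database file is locked")
    return "written"


# Record the delays instead of actually sleeping.
delays: list[float] = []
result = retry_write(flaky_write, sleep=delays.append)
```

Retrying only narrows the window for conflicts. It does not make a file-based database safe for sustained concurrent writes, which is why the alternatives listed above remain the more robust choice.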
Installation
API Reference
metaxy.ext.metadata_stores.duckdb
DuckDB metadata store - thin wrapper around IbisMetadataStore.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore
DuckDBMetadataStore(
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[str | ExtensionSpec] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    ducklake: DuckLakeConfig | None = None,
    **kwargs,
)
Bases: IbisMetadataStore
DuckDB metadata store using Ibis backend.
With extensions
Parameters:
- database (str | Path) – Database connection string or path.
  - File path: "metadata.db" or Path("metadata.db")
  - In-memory: ":memory:"
  - MotherDuck: "md:my_database" or "md:my_database?motherduck_token=..."
  - S3: "s3://bucket/path/database.duckdb" (read-only via ATTACH)
  - HTTPS: "https://example.com/database.duckdb" (read-only via ATTACH)
  - Any valid DuckDB connection string
- config (dict[str, str] | None, default: None) – Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
- extensions (Sequence[str | ExtensionSpec] | None, default: None) – List of DuckDB extensions to install and load on open. Supports strings (assumes "core" repo) or metaxy.ext.metadata_stores.duckdb.ExtensionSpec instances.
- ducklake (DuckLakeConfig | None, default: None) – Optional DuckLake attachment configuration.
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores.
Source code in src/metaxy/ext/metadata_stores/duckdb.py
def __init__(
    self,
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[str | ExtensionSpec] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    ducklake: DuckLakeConfig | None = None,
    **kwargs,
):
    """
    Initialize [DuckDB](https://duckdb.org/) metadata store.

    Args:
        database: Database connection string or path.
            - File path: `"metadata.db"` or `Path("metadata.db")`
            - In-memory: `":memory:"`
            - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`
            - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)
            - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)
            - Any valid DuckDB connection string
        config: Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
        extensions: List of DuckDB extensions to install and load on open.
            Supports strings (assumes `"core"` repo) or
            [metaxy.ext.metadata_stores.duckdb.ExtensionSpec][] instances.
        ducklake: Optional [DuckLake](https://ducklake.select/) attachment configuration.
            Learn more [here](/integrations/metadata-stores/storage/ducklake.md).
        fallback_stores: Ordered list of read-only fallback stores.
    """
    database_str = str(database)
    connection_params = {"database": database_str}
    if config:
        connection_params.update(config)

    self.database = database_str
    self.extensions: list[ExtensionSpec] = _normalise_extensions(extensions or [])
    self._ducklake_config: DuckLakeConfig | None = None
    self._ducklake_attachment: DuckLakeAttachmentManager | None = None

    if ducklake is not None:
        existing_names = {ext.name for ext in self.extensions}
        if "ducklake" not in existing_names:
            self.extensions.append(ExtensionSpec(name="ducklake"))
        if isinstance(ducklake.catalog, MotherDuckCatalogConfig) and "motherduck" not in existing_names:
            self.extensions.append(ExtensionSpec(name="motherduck"))
        self._ducklake_config = ducklake
        self._ducklake_attachment = DuckLakeAttachmentManager(ducklake, store_name=kwargs.get("name"))

    if "hashfuncs" not in {ext.name for ext in self.extensions}:
        self.extensions.append(ExtensionSpec(name="hashfuncs", repository="community"))

    super().__init__(
        backend="duckdb",
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )
metaxy.ext.metadata_stores.duckdb.ExtensionSpec
pydantic-model
Bases: BaseModel
DuckDB extension specification accepted by DuckDBMetadataStore.
Show JSON schema:
{
"description": "DuckDB extension specification accepted by DuckDBMetadataStore.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"repository": {
"default": "core",
"title": "Repository",
"type": "string"
},
"init_sql": {
"default": [],
"items": {
"type": "string"
},
"title": "Init Sql",
"type": "array"
}
},
"required": [
"name"
],
"title": "ExtensionSpec",
"type": "object"
}
Configuration
Configuration for DuckDBMetadataStore.
Example
Show JSON schema:
{
"$defs": {
"DuckDBCatalogConfig": {
"description": "DuckDB file-based metadata backend for [DuckLake](https://ducklake.select/).",
"properties": {
"type": {
"const": "duckdb",
"default": "duckdb",
"title": "Type",
"type": "string"
},
"uri": {
"title": "Uri",
"type": "string"
}
},
"required": [
"uri"
],
"title": "DuckDBCatalogConfig",
"type": "object"
},
"DuckLakeConfig": {
"description": "[DuckLake](https://ducklake.select/) attachment configuration for a DuckDB connection.",
"properties": {
"catalog": {
"description": "Metadata catalog backend (DuckDB, SQLite, PostgreSQL, or MotherDuck).",
"discriminator": {
"mapping": {
"duckdb": "#/$defs/DuckDBCatalogConfig",
"motherduck": "#/$defs/MotherDuckCatalogConfig",
"postgres": "#/$defs/PostgresCatalogConfig",
"sqlite": "#/$defs/SQLiteCatalogConfig"
},
"propertyName": "type"
},
"oneOf": [
{
"$ref": "#/$defs/DuckDBCatalogConfig"
},
{
"$ref": "#/$defs/SQLiteCatalogConfig"
},
{
"$ref": "#/$defs/PostgresCatalogConfig"
},
{
"$ref": "#/$defs/MotherDuckCatalogConfig"
}
],
"title": "Catalog"
},
"storage": {
"anyOf": [
{
"discriminator": {
"mapping": {
"gcs": "#/$defs/GCSStorageConfig",
"local": "#/$defs/LocalStorageConfig",
"r2": "#/$defs/R2StorageConfig",
"s3": "#/$defs/S3StorageConfig"
},
"propertyName": "type"
},
"oneOf": [
{
"$ref": "#/$defs/LocalStorageConfig"
},
{
"$ref": "#/$defs/S3StorageConfig"
},
{
"$ref": "#/$defs/R2StorageConfig"
},
{
"$ref": "#/$defs/GCSStorageConfig"
}
]
},
{
"type": "null"
}
],
"default": null,
"description": "Data storage backend (local filesystem, S3, R2, or GCS). Not required for MotherDuck.",
"title": "Storage"
},
"alias": {
"default": "ducklake",
"description": "DuckDB catalog alias for the attached DuckLake database.",
"title": "Alias",
"type": "string"
},
"attach_options": {
"additionalProperties": true,
"description": "Extra [DuckLake](https://ducklake.select/) ATTACH options (e.g., api_version, override_data_path).",
"title": "Attach Options",
"type": "object"
},
"data_inlining_row_limit": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null,
"description": "Store inserts smaller than this row count directly in the metadata catalog instead of creating Parquet files.",
"title": "Data Inlining Row Limit"
}
},
"required": [
"catalog"
],
"title": "DuckLakeConfig",
"type": "object"
},
"ExtensionSpec": {
"description": "DuckDB extension specification accepted by DuckDBMetadataStore.",
"properties": {
"name": {
"title": "Name",
"type": "string"
},
"repository": {
"default": "core",
"title": "Repository",
"type": "string"
},
"init_sql": {
"default": [],
"items": {
"type": "string"
},
"title": "Init Sql",
"type": "array"
}
},
"required": [
"name"
],
"title": "ExtensionSpec",
"type": "object"
},
"GCSStorageConfig": {
"description": "Google Cloud Storage backend for [DuckLake](https://ducklake.select/).\n\nUses the DuckDB [`TYPE GCS`](https://duckdb.org/docs/stable/core_extensions/httpfs/s3api#gcs-secrets) secret.",
"properties": {
"type": {
"const": "gcs",
"default": "gcs",
"title": "Type",
"type": "string"
},
"key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Key Id"
},
"secret": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret"
},
"data_path": {
"title": "Data Path",
"type": "string"
},
"secret_name": {
"title": "Secret Name",
"type": "string"
},
"secret_parameters": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret Parameters"
}
},
"required": [
"data_path",
"secret_name"
],
"title": "GCSStorageConfig",
"type": "object"
},
"HashAlgorithm": {
"description": "Supported hash algorithms for field provenance calculation.\n\nThese algorithms are chosen for:\n- Speed (non-cryptographic hashes preferred)\n- Cross-database availability\n- Good collision resistance for field provenance calculation",
"enum": [
"xxhash64",
"xxhash32",
"wyhash",
"sha256",
"md5",
"farmhash"
],
"title": "HashAlgorithm",
"type": "string"
},
"LocalStorageConfig": {
"description": "Local filesystem storage backend for DuckLake.",
"properties": {
"type": {
"const": "local",
"default": "local",
"title": "Type",
"type": "string"
},
"path": {
"title": "Path",
"type": "string"
}
},
"required": [
"path"
],
"title": "LocalStorageConfig",
"type": "object"
},
"MotherDuckCatalogConfig": {
"description": "[MotherDuck](https://motherduck.com/)-managed metadata backend for [DuckLake](https://ducklake.select/).",
"properties": {
"type": {
"const": "motherduck",
"default": "motherduck",
"title": "Type",
"type": "string"
},
"database": {
"title": "Database",
"type": "string"
},
"region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "AWS region of the MotherDuck-managed S3 storage (e.g. 'eu-central-1').",
"title": "Region"
}
},
"required": [
"database"
],
"title": "MotherDuckCatalogConfig",
"type": "object"
},
"PostgresCatalogConfig": {
"description": "PostgreSQL metadata backend for [DuckLake](https://ducklake.select/).",
"properties": {
"type": {
"const": "postgres",
"default": "postgres",
"title": "Type",
"type": "string"
},
"database": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Database"
},
"user": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "User"
},
"password": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Password"
},
"host": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Host"
},
"port": {
"default": 5432,
"title": "Port",
"type": "integer"
},
"secret_name": {
"title": "Secret Name",
"type": "string"
},
"secret_parameters": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret Parameters"
}
},
"required": [
"secret_name"
],
"title": "PostgresCatalogConfig",
"type": "object"
},
"R2StorageConfig": {
"description": "Cloudflare R2 storage backend for [DuckLake](https://ducklake.select/).\n\nUses the DuckDB [`TYPE R2`](https://duckdb.org/docs/stable/core_extensions/httpfs/s3api#r2-secrets) secret.",
"properties": {
"type": {
"const": "r2",
"default": "r2",
"title": "Type",
"type": "string"
},
"key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Key Id"
},
"secret": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret"
},
"account_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Account Id"
},
"data_path": {
"title": "Data Path",
"type": "string"
},
"secret_name": {
"title": "Secret Name",
"type": "string"
},
"secret_parameters": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret Parameters"
}
},
"required": [
"data_path",
"secret_name"
],
"title": "R2StorageConfig",
"type": "object"
},
"S3StorageConfig": {
"description": "[S3 storage](https://duckdb.org/docs/stable/core_extensions/httpfs/s3api) backend for DuckLake.",
"properties": {
"type": {
"const": "s3",
"default": "s3",
"title": "Type",
"type": "string"
},
"key_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Key Id"
},
"secret": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret"
},
"endpoint": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Endpoint"
},
"bucket": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Bucket"
},
"prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Prefix"
},
"region": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Region"
},
"url_style": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Url Style"
},
"use_ssl": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"title": "Use Ssl"
},
"scope": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Scope"
},
"data_path": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Data Path"
},
"secret_name": {
"title": "Secret Name",
"type": "string"
},
"secret_parameters": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"title": "Secret Parameters"
}
},
"required": [
"secret_name"
],
"title": "S3StorageConfig",
"type": "object"
},
"SQLiteCatalogConfig": {
"description": "SQLite file-based metadata backend for [DuckLake](https://ducklake.select/).",
"properties": {
"type": {
"const": "sqlite",
"default": "sqlite",
"title": "Type",
"type": "string"
},
"uri": {
"title": "Uri",
"type": "string"
}
},
"required": [
"uri"
],
"title": "SQLiteCatalogConfig",
"type": "object"
}
},
"additionalProperties": false,
"description": "Configuration for DuckDBMetadataStore.\n\nExample:\n ```toml title=\"metaxy.toml\"\n [stores.dev]\n type = \"metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore\"\n\n [stores.dev.config]\n database = \"metadata.db\"\n hash_algorithm = \"xxhash64\"\n ```",
"properties": {
"fallback_stores": {
"description": "List of fallback store names to search when features are not found in the current store.",
"items": {
"type": "string"
},
"title": "Fallback Stores",
"type": "array"
},
"hash_algorithm": {
"anyOf": [
{
"$ref": "#/$defs/HashAlgorithm"
},
{
"type": "null"
}
],
"default": null,
"description": "Hash algorithm for versioning. If None, uses store's default."
},
"versioning_engine": {
"default": "auto",
"description": "Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.",
"enum": [
"auto",
"native",
"polars"
],
"title": "Versioning Engine",
"type": "string"
},
"connection_string": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Ibis connection string (e.g., 'clickhouse://host:9000/db').",
"title": "Connection String"
},
"backend": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Ibis backend name (e.g., 'clickhouse', 'postgres', 'duckdb').",
"mkdocs_metaxy_hide": true,
"title": "Backend"
},
"connection_params": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "Backend-specific connection parameters.",
"title": "Connection Params"
},
"table_prefix": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"description": "Optional prefix for all table names.",
"title": "Table Prefix"
},
"auto_create_tables": {
"anyOf": [
{
"type": "boolean"
},
{
"type": "null"
}
],
"default": null,
"description": "If True, create tables on open. For development/testing only.",
"title": "Auto Create Tables"
},
"database": {
"anyOf": [
{
"type": "string"
},
{
"format": "path",
"type": "string"
}
],
"description": "Database path (:memory:, file path, or md:database).",
"title": "Database"
},
"config": {
"anyOf": [
{
"additionalProperties": {
"type": "string"
},
"type": "object"
},
{
"type": "null"
}
],
"default": null,
"description": "DuckDB configuration settings (e.g., {'threads': '4'}).",
"title": "Config"
},
"extensions": {
"anyOf": [
{
"items": {
"anyOf": [
{
"type": "string"
},
{
"$ref": "#/$defs/ExtensionSpec"
}
]
},
"type": "array"
},
{
"type": "null"
}
],
"default": null,
"description": "DuckDB extensions to install and load on open. If only a string is provided, the `core` repository is assumed.",
"title": "Extensions"
},
"ducklake": {
"anyOf": [
{
"$ref": "#/$defs/DuckLakeConfig"
},
{
"type": "null"
}
],
"default": null,
"description": "DuckLake attachment configuration. Learn more [here](/integrations/metadata-stores/storage/ducklake.md)."
}
},
"required": [
"database"
],
"title": "DuckDBMetadataStoreConfig",
"type": "object"
}
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.fallback_stores
pydantic-field
List of fallback store names to search when features are not found in the current store.
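The lookup order this description implies can be sketched as follows. This is a toy model, not Metaxy's implementation: stores are plain dictionaries keyed by feature name, and `find_feature` is a hypothetical helper that checks the current store first and then each fallback in the order given.

```python
def find_feature(
    feature: str,
    current: dict[str, object],
    fallbacks: list[dict[str, object]],
) -> tuple[str, object]:
    """Return (where it was found, value) for the first store with `feature`."""
    if feature in current:
        return ("current", current[feature])
    # Fallbacks are consulted in order; the first match wins.
    for index, store in enumerate(fallbacks):
        if feature in store:
            return (f"fallback[{index}]", store[feature])
    raise KeyError(feature)


dev = {"video/frames": "dev rows"}
staging = {"video/audio": "staging rows"}
prod = {"video/audio": "prod rows", "video/raw": "prod rows"}

found_local = find_feature("video/frames", dev, [staging, prod])
found_first = find_feature("video/audio", dev, [staging, prod])
found_last = find_feature("video/raw", dev, [staging, prod])
```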
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.hash_algorithm
pydantic-field
hash_algorithm: HashAlgorithm | None = None
Hash algorithm for versioning. If None, uses store's default.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.versioning_engine
pydantic-field
versioning_engine: Literal["auto", "native", "polars"] = (
"auto"
)
Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.
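One plausible reading of "prefer native" is sketched below. The function `resolve_engine` and its `native_available` flag are hypothetical, and the fallback to Polars when the native engine is unavailable is an assumption drawn from the wording of the description, not confirmed behavior.

```python
from typing import Literal

EngineChoice = Literal["auto", "native", "polars"]


def resolve_engine(requested: EngineChoice, native_available: bool) -> str:
    """Pick a concrete engine for the requested setting."""
    if requested == "auto":
        # Assumed: "auto" uses the native engine when it is available
        # and otherwise falls back to Polars.
        return "native" if native_available else "polars"
    # Explicit choices are passed through unchanged.
    return requested


auto_with_native = resolve_engine("auto", native_available=True)
auto_without_native = resolve_engine("auto", native_available=False)
forced_polars = resolve_engine("polars", native_available=True)
```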
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.connection_string
pydantic-field
connection_string: str | None = None
Ibis connection string (e.g., 'clickhouse://host:9000/db').
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.connection_params
pydantic-field
Backend-specific connection parameters.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.table_prefix
pydantic-field
table_prefix: str | None = None
Optional prefix for all table names.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.auto_create_tables
pydantic-field
auto_create_tables: bool | None = None
If True, create tables on open. For development/testing only.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.database
pydantic-field
Database path (:memory:, file path, or md:database).
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.config
pydantic-field
DuckDB configuration settings (e.g., {'threads': '4'}).
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.extensions
pydantic-field
extensions: Sequence[str | ExtensionSpec] | None = None
DuckDB extensions to install and load on open. If only a string is provided, the core repository is assumed.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStoreConfig.ducklake
pydantic-field
ducklake: DuckLakeConfig | None = None
DuckLake attachment configuration. Learn more here.