DuckDB
DuckDB is an embedded analytical database. To use Metaxy with DuckDB, configure DuckDBMetadataStore. This runs versioning computations natively in DuckDB.
Warning
File-based DuckDB does not (currently) support concurrent writes. If multiple writers are a requirement (e.g. with distributed data processing), consider either using DuckLake with a PostgreSQL catalog, or refer to DuckDB's documentation for application-side workarounds.
Tip
The Delta Lake metadata store might be a better alternative for concurrent writes (its Polars-based versioning engine is as fast as DuckDB).
Installation
Extensions
DuckDB extensions can be installed and loaded automatically when the store is opened:
```python
from metaxy.ext.metadata_stores.duckdb import DuckDBMetadataStore

store = DuckDBMetadataStore(":memory:", extensions=["hashfuncs"])
```
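The versioning engine uses hashfuncs for its default XXHASH64 hash (falling back to MD5 when hashfuncs is unavailable), so the store appends hashfuncs to extensions automatically when it is not already listed.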
metaxy.ext.metadata_stores.duckdb
DuckDB metadata store - thin wrapper around IbisMetadataStore.
metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore
```python
DuckDBMetadataStore(
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
)
```
Bases: IbisMetadataStore
DuckDB metadata store using Ibis backend.
Parameters:
- database (str | Path) – Database connection string or path:
  - File path: "metadata.db" or Path("metadata.db")
  - In-memory: ":memory:"
  - MotherDuck: "md:my_database" or "md:my_database?motherduck_token=..."
  - S3: "s3://bucket/path/database.duckdb" (read-only via ATTACH)
  - HTTPS: "https://example.com/database.duckdb" (read-only via ATTACH)
  - Any valid DuckDB connection string
- config (dict[str, str] | None, default: None) – Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'}).
- extensions (Sequence[ExtensionInput] | None, default: None) – List of DuckDB extensions to install and load on open. Supports strings (community repo), mapping-like objects with name/repository keys, or metaxy.ext.metadata_stores.duckdb.ExtensionSpec instances.
- fallback_stores (list[MetadataStore] | None, default: None) – Ordered list of read-only fallback stores.
- ducklake (DuckLakeConfigInput | None, default: None) – Optional DuckLake attachment configuration. Provide either a mapping with 'metadata_backend' and 'storage_backend' entries or a DuckLakeAttachmentConfig instance. When supplied, the DuckDB connection is configured to ATTACH the DuckLake catalog after open().
- **kwargs – Passed to IbisMetadataStore.
Warning
Parent directories are NOT created automatically. Ensure paths exist before initializing the store.
Source code in src/metaxy/ext/metadata_stores/duckdb.py
```python
def __init__(
    self,
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
):
    """
    Initialize [DuckDB](https://duckdb.org/) metadata store.

    Args:
        database: Database connection string or path.
            - File path: `"metadata.db"` or `Path("metadata.db")`
            - In-memory: `":memory:"`
            - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`
            - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)
            - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)
            - Any valid DuckDB connection string
        config: Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
        extensions: List of DuckDB extensions to install and load on open.
            Supports strings (community repo), mapping-like objects with
            ``name``/``repository`` keys, or [metaxy.ext.metadata_stores.duckdb.ExtensionSpec][] instances.
        ducklake: Optional DuckLake attachment configuration. Provide either a
            mapping with 'metadata_backend' and 'storage_backend' entries or a
            DuckLakeAttachmentConfig instance. When supplied, the DuckDB
            connection is configured to ATTACH the DuckLake catalog after open().
        fallback_stores: Ordered list of read-only fallback stores.
        **kwargs: Passed to [`IbisMetadataStore`][metaxy.metadata_store.ibis.IbisMetadataStore]

    Warning:
        Parent directories are NOT created automatically. Ensure paths exist
        before initializing the store.
    """
    database_str = str(database)

    # Build connection params for Ibis DuckDB backend
    # Ibis DuckDB backend accepts config params directly (not nested under 'config')
    connection_params = {"database": database_str}
    if config:
        connection_params.update(config)

    self.database = database_str
    base_extensions: list[NormalisedExtension] = _normalise_extensions(extensions or [])

    self._ducklake_config: DuckLakeAttachmentConfig | None = None
    self._ducklake_attachment: DuckLakeAttachmentManager | None = None
    if ducklake is not None:
        attachment_config, manager = build_ducklake_attachment(ducklake)
        ensure_extensions_with_plugins(base_extensions, attachment_config.plugins)
        self._ducklake_config = attachment_config
        self._ducklake_attachment = manager

    self.extensions = base_extensions

    # Auto-add hashfuncs extension if not present (needed for default XXHASH64)
    # But we'll fall back to MD5 if hashfuncs is not available
    extension_names: list[str] = []
    for ext in self.extensions:
        if isinstance(ext, str):
            extension_names.append(ext)
        elif isinstance(ext, ExtensionSpec):
            extension_names.append(ext.name)
        else:
            # After _normalise_extensions, this should not happen
            # But keep defensive check for type safety
            raise TypeError(f"Extension must be str or ExtensionSpec after normalization; got {type(ext)}")

    if "hashfuncs" not in extension_names:
        self.extensions.append("hashfuncs")

    # Initialize Ibis store with DuckDB backend
    super().__init__(
        backend="duckdb",
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )
```
metaxy.ext.metadata_stores.duckdb.ExtensionSpec
pydantic-model
Bases: BaseModel
DuckDB extension specification accepted by DuckDBMetadataStore.
Supports additional keys for forward compatibility.
JSON schema:
```json
{
  "additionalProperties": true,
  "description": "DuckDB extension specification accepted by DuckDBMetadataStore.\n\nSupports additional keys for forward compatibility.",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "repository": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Repository"
    }
  },
  "required": [
    "name"
  ],
  "title": "ExtensionSpec",
  "type": "object"
}
```
Config:
- extra: allow

Fields:
- name (str)
- repository (str | None)
metaxy.ext.metadata_stores.duckdb.DuckLakeConfigInput
module-attribute
```python
DuckLakeConfigInput = (
    DuckLakeAttachmentConfig | Mapping[str, Any]
)
```
metaxy.ext.metadata_stores._ducklake_support.DuckLakeAttachmentConfig
pydantic-model
Bases: BaseModel
Configuration payload used to attach DuckLake to a DuckDB connection.
JSON schema:
```json
{
  "additionalProperties": true,
  "description": "Configuration payload used to attach DuckLake to a DuckDB connection.",
  "properties": {
    "metadata_backend": {
      "additionalProperties": true,
      "title": "Metadata Backend",
      "type": "object"
    },
    "storage_backend": {
      "additionalProperties": true,
      "title": "Storage Backend",
      "type": "object"
    },
    "alias": {
      "default": "ducklake",
      "title": "Alias",
      "type": "string"
    },
    "plugins": {
      "items": {
        "type": "string"
      },
      "title": "Plugins",
      "type": "array"
    },
    "attach_options": {
      "additionalProperties": true,
      "title": "Attach Options",
      "type": "object"
    }
  },
  "required": [
    "metadata_backend",
    "storage_backend"
  ],
  "title": "DuckLakeAttachmentConfig",
  "type": "object"
}
```
Config:
- arbitrary_types_allowed: True
- extra: allow

Fields:
- metadata_backend (DuckLakeBackend)
- storage_backend (DuckLakeBackend)
- alias (str)
- plugins (tuple[str, ...])
- attach_options (dict[str, Any])

Validators:
- _coerce_backends → metadata_backend, storage_backend
- _coerce_alias → alias
- _coerce_plugins → plugins
- _coerce_attach_options → attach_options
Configuration
fallback_stores
List of fallback store names to search when features are not found in the current store.
Type: list[str]
hash_algorithm
Hash algorithm for versioning. If None, the store's default is used.
Type: metaxy.versioning.types.HashAlgorithm | None
versioning_engine
Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.
Type: Literal['auto', 'native', 'polars'] | Default: "auto"
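For this store, "native" means the versioning computation runs inside DuckDB, while "polars" runs it in Polars. The hypothetical function below sketches one resolution rule consistent with "'auto' (prefer native)". Two details are assumptions, not documented behavior: that 'auto' falls back to Polars when the native engine cannot be used, and that an explicit 'native' request on an unsupported backend raises an error instead of falling back.

```python
from typing import Literal

Engine = Literal["auto", "native", "polars"]


def resolve_engine(requested: Engine, native_supported: bool) -> Literal["native", "polars"]:
    """Pick the engine that would actually run the versioning computation."""
    if requested == "polars":
        return "polars"
    if requested == "native":
        if not native_supported:
            # Assumption: an explicit "native" request fails loudly.
            raise ValueError("native versioning engine requested but not supported")
        return "native"
    # "auto": prefer native, otherwise fall back to Polars (assumed fallback).
    return "native" if native_supported else "polars"
```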
connection_string
Ibis connection string (e.g., 'clickhouse://host:9000/db').
Type: str | None
connection_params
Backend-specific connection parameters.
Type: dict[str, Any] | None
table_prefix
Optional prefix for all table names.
Type: str | None
auto_create_tables
If True, create tables on open. For development/testing only.
Type: bool | None
database
Database path (:memory:, file path, or md:database).
Type: str | pathlib.Path
config
DuckDB configuration settings (e.g., {'threads': '4'}).
Type: dict[str, str] | None
extensions
DuckDB extensions to install and load on open.
Type: collections.abc.Sequence[str | metaxy.ext.metadata_stores.duckdb.ExtensionSpec | collections.abc.Mapping[str, Any]] | None
ducklake
DuckLake attachment configuration.
metadata_backend
Type: metaxy.ext.metadata_stores._ducklake_support.SupportsDuckLakeParts | dict[str, Any]
storage_backend
Type: metaxy.ext.metadata_stores._ducklake_support.SupportsDuckLakeParts | dict[str, Any]
alias
Type: str | Default: "ducklake"
plugins
Type: tuple[str, ...]
attach_options
Type: dict[str, Any]