
DuckDB

DuckDB is an embedded analytical database. To use Metaxy with DuckDB, configure DuckDBMetadataStore. This runs versioning computations natively in DuckDB.

Warning

File-based DuckDB does not currently support concurrent writes. If you need multiple writers (e.g. for distributed data processing), consider using DuckLake with a PostgreSQL catalog, or see DuckDB's documentation for application-side workarounds.

Tip

The Delta Lake metadata store may be a better fit when concurrent writes are needed; its Polars-based versioning engine is as fast as DuckDB.

Installation

pip install 'metaxy[duckdb]'

Extensions

DuckDB extensions can be loaded automatically:

from metaxy.ext.metadata_stores.duckdb import DuckDBMetadataStore

store = DuckDBMetadataStore(":memory:", extensions=["hashfuncs"])

hashfuncs is used by the versioning engine; the store adds it automatically if it is not listed.


metaxy.ext.metadata_stores.duckdb

DuckDB metadata store - thin wrapper around IbisMetadataStore.

metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore

DuckDBMetadataStore(
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
)

Bases: IbisMetadataStore

DuckDB metadata store using Ibis backend.

from metaxy.ext.metadata_stores.duckdb import DuckDBMetadataStore
from metaxy.versioning.types import HashAlgorithm

# Local file
store = DuckDBMetadataStore("metadata.db")

# In-memory database
store = DuckDBMetadataStore(":memory:")

# MotherDuck
store = DuckDBMetadataStore("md:my_database")

# With extensions
store = DuckDBMetadataStore("metadata.db", hash_algorithm=HashAlgorithm.XXHASH64, extensions=["hashfuncs"])

Parameters:

  • database (str | Path) –

    Database connection string or path:

    • File path: "metadata.db" or Path("metadata.db")

    • In-memory: ":memory:"

    • MotherDuck: "md:my_database" or "md:my_database?motherduck_token=..."

    • S3: "s3://bucket/path/database.duckdb" (read-only via ATTACH)

    • HTTPS: "https://example.com/database.duckdb" (read-only via ATTACH)

    • Any valid DuckDB connection string

  • config (dict[str, str] | None, default: None ) –

    Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})

  • extensions (Sequence[ExtensionInput] | None, default: None ) –

    List of DuckDB extensions to install and load on open. Supports strings (community repo), mapping-like objects with name/repository keys, or metaxy.ext.metadata_stores.duckdb.ExtensionSpec instances.

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • ducklake (DuckLakeConfigInput | None, default: None ) –

    Optional DuckLake attachment configuration. Provide either a mapping with 'metadata_backend' and 'storage_backend' entries or a DuckLakeAttachmentConfig instance. When supplied, the DuckDB connection is configured to ATTACH the DuckLake catalog after open().

  • **kwargs –

    Passed to IbisMetadataStore.

Warning

Parent directories are NOT created automatically. Ensure paths exist before initializing the store.

Source code in src/metaxy/ext/metadata_stores/duckdb.py
def __init__(
    self,
    database: str | Path,
    *,
    config: dict[str, str] | None = None,
    extensions: Sequence[ExtensionInput] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    ducklake: DuckLakeConfigInput | None = None,
    **kwargs,
):
    """
    Initialize [DuckDB](https://duckdb.org/) metadata store.

    Args:
        database: Database connection string or path.
            - File path: `"metadata.db"` or `Path("metadata.db")`

            - In-memory: `":memory:"`

            - MotherDuck: `"md:my_database"` or `"md:my_database?motherduck_token=..."`

            - S3: `"s3://bucket/path/database.duckdb"` (read-only via ATTACH)

            - HTTPS: `"https://example.com/database.duckdb"` (read-only via ATTACH)

            - Any valid DuckDB connection string

        config: Optional DuckDB configuration settings (e.g., {'threads': '4', 'memory_limit': '4GB'})
        extensions: List of DuckDB extensions to install and load on open.
            Supports strings (community repo), mapping-like objects with
            ``name``/``repository`` keys, or [metaxy.ext.metadata_stores.duckdb.ExtensionSpec][] instances.
        ducklake: Optional DuckLake attachment configuration. Provide either a
            mapping with 'metadata_backend' and 'storage_backend' entries or a
            DuckLakeAttachmentConfig instance. When supplied, the DuckDB
            connection is configured to ATTACH the DuckLake catalog after open().
        fallback_stores: Ordered list of read-only fallback stores.
        **kwargs: Passed to [`IbisMetadataStore`][metaxy.metadata_store.ibis.IbisMetadataStore]

    Warning:
        Parent directories are NOT created automatically. Ensure paths exist
        before initializing the store.
    """
    database_str = str(database)

    # Build connection params for Ibis DuckDB backend
    # Ibis DuckDB backend accepts config params directly (not nested under 'config')
    connection_params = {"database": database_str}
    if config:
        connection_params.update(config)

    self.database = database_str
    base_extensions: list[NormalisedExtension] = _normalise_extensions(extensions or [])

    self._ducklake_config: DuckLakeAttachmentConfig | None = None
    self._ducklake_attachment: DuckLakeAttachmentManager | None = None
    if ducklake is not None:
        attachment_config, manager = build_ducklake_attachment(ducklake)
        ensure_extensions_with_plugins(base_extensions, attachment_config.plugins)
        self._ducklake_config = attachment_config
        self._ducklake_attachment = manager

    self.extensions = base_extensions

    # Auto-add hashfuncs extension if not present (needed for default XXHASH64)
    # But we'll fall back to MD5 if hashfuncs is not available
    extension_names: list[str] = []
    for ext in self.extensions:
        if isinstance(ext, str):
            extension_names.append(ext)
        elif isinstance(ext, ExtensionSpec):
            extension_names.append(ext.name)
        else:
            # After _normalise_extensions, this should not happen
            # But keep defensive check for type safety
            raise TypeError(f"Extension must be str or ExtensionSpec after normalization; got {type(ext)}")
    if "hashfuncs" not in extension_names:
        self.extensions.append("hashfuncs")

    # Initialize Ibis store with DuckDB backend
    super().__init__(
        backend="duckdb",
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )

metaxy.ext.metadata_stores.duckdb.ExtensionSpec pydantic-model

Bases: BaseModel

DuckDB extension specification accepted by DuckDBMetadataStore.

Supports additional keys for forward compatibility.

JSON schema:
{
  "additionalProperties": true,
  "description": "DuckDB extension specification accepted by DuckDBMetadataStore.\n\nSupports additional keys for forward compatibility.",
  "properties": {
    "name": {
      "title": "Name",
      "type": "string"
    },
    "repository": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Repository"
    }
  },
  "required": [
    "name"
  ],
  "title": "ExtensionSpec",
  "type": "object"
}

Config:

  • extra: allow

Fields:

  • name (str)
  • repository (str | None)

metaxy.ext.metadata_stores.duckdb.DuckLakeConfigInput module-attribute

DuckLakeConfigInput = (
    DuckLakeAttachmentConfig | Mapping[str, Any]
)

metaxy.ext.metadata_stores._ducklake_support.DuckLakeAttachmentConfig pydantic-model

Bases: BaseModel

Configuration payload used to attach DuckLake to a DuckDB connection.

JSON schema:
{
  "additionalProperties": true,
  "description": "Configuration payload used to attach DuckLake to a DuckDB connection.",
  "properties": {
    "metadata_backend": {
      "additionalProperties": true,
      "title": "Metadata Backend",
      "type": "object"
    },
    "storage_backend": {
      "additionalProperties": true,
      "title": "Storage Backend",
      "type": "object"
    },
    "alias": {
      "default": "ducklake",
      "title": "Alias",
      "type": "string"
    },
    "plugins": {
      "items": {
        "type": "string"
      },
      "title": "Plugins",
      "type": "array"
    },
    "attach_options": {
      "additionalProperties": true,
      "title": "Attach Options",
      "type": "object"
    }
  },
  "required": [
    "metadata_backend",
    "storage_backend"
  ],
  "title": "DuckLakeAttachmentConfig",
  "type": "object"
}

Config:

  • arbitrary_types_allowed: True
  • extra: allow

Fields:

  • metadata_backend (DuckLakeBackend)
  • storage_backend (DuckLakeBackend)
  • alias (str)
  • plugins (tuple[str, ...])
  • attach_options (dict[str, Any])

Validators:

  • _coerce_backends → metadata_backend, storage_backend
  • _coerce_alias → alias
  • _coerce_plugins → plugins
  • _coerce_attach_options → attach_options
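When `ducklake` is given as a plain mapping, the schema above requires `metadata_backend` and `storage_backend` and defaults `alias` to `"ducklake"`. The sketch below fills in those defaults for a mapping input. It assumes `plugins` defaults to an empty tuple and `attach_options` to an empty dict (neither is listed as required); the helper `normalise_ducklake_input` is hypothetical, and the backend dictionaries are left empty because their keys are not documented on this page.

```python
from collections.abc import Mapping
from typing import Any

REQUIRED_KEYS = ("metadata_backend", "storage_backend")


def normalise_ducklake_input(raw: Mapping[str, Any]) -> dict[str, Any]:
    """Apply schema defaults to a mapping-style DuckLake config (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"missing required DuckLake keys: {missing}")
    cfg = dict(raw)  # unknown keys are kept, in the spirit of extra="allow"
    cfg.setdefault("alias", "ducklake")
    # plugins is typed tuple[str, ...]; accept any iterable (or None) and coerce
    cfg["plugins"] = tuple(cfg.get("plugins") or ())
    cfg.setdefault("attach_options", {})
    return cfg


print(normalise_ducklake_input({"metadata_backend": {}, "storage_backend": {}}))
```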

Configuration

fallback_stores

List of fallback store names to search when features are not found in the current store.

Type: list[str]

[stores.dev.config]
# Optional
# fallback_stores = []
[tool.metaxy.stores.dev.config]
# Optional
# fallback_stores = []
export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES=...
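The environment-variable names on this page follow a regular pattern: a `METAXY_` prefix, then the upper-cased path segments joined with double underscores. The sketch below reproduces that pattern as inferred from the examples here; `metaxy_env_var` is a hypothetical helper, and how store names containing other characters are mapped is not covered.

```python
def metaxy_env_var(store: str, *path: str) -> str:
    """Build the env var name for a store setting (pattern inferred from this page)."""
    # "stores", the store name and each nested key are upper-cased and
    # joined with "__"; the prefix uses a single underscore
    return "METAXY_" + "__".join(segment.upper() for segment in ("stores", store, *path))


print(metaxy_env_var("dev", "config", "ducklake", "alias"))
```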

hash_algorithm

Hash algorithm for versioning. If None, the store's default is used.

Type: metaxy.versioning.types.HashAlgorithm | None

[stores.dev.config]
# Optional
# hash_algorithm = null
[tool.metaxy.stores.dev.config]
# Optional
# hash_algorithm = null
export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

versioning_engine

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

Type: Literal['auto', 'native', 'polars'] | Default: "auto"

[stores.dev.config]
versioning_engine = "auto"
[tool.metaxy.stores.dev.config]
versioning_engine = "auto"
export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto
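The `'auto'` setting prefers the native engine, which for this store means computing versions in DuckDB. A minimal sketch of that resolution rule, assuming that `'auto'` falls back to Polars when a native computation is not available and that explicit values are taken as-is; `resolve_versioning_engine` and the `native_supported` flag are hypothetical.

```python
from typing import Literal

Engine = Literal["auto", "native", "polars"]


def resolve_versioning_engine(setting: Engine, native_supported: bool) -> str:
    if setting == "auto":
        # "auto" prefers the backend-native engine when it can be used
        return "native" if native_supported else "polars"
    # Explicit choices are passed through unchanged in this sketch
    return setting


print(resolve_versioning_engine("auto", native_supported=True))
```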

connection_string

Ibis connection string (e.g., 'clickhouse://host:9000/db').

Type: str | None

[stores.dev.config]
# Optional
# connection_string = null
[tool.metaxy.stores.dev.config]
# Optional
# connection_string = null
export METAXY_STORES__DEV__CONFIG__CONNECTION_STRING=...

connection_params

Backend-specific connection parameters.

Type: dict[str, Any] | None

[stores.dev.config]
# Optional
# connection_params = {}
[tool.metaxy.stores.dev.config]
# Optional
# connection_params = {}
export METAXY_STORES__DEV__CONFIG__CONNECTION_PARAMS=...

table_prefix

Optional prefix for all table names.

Type: str | None

[stores.dev.config]
# Optional
# table_prefix = null
[tool.metaxy.stores.dev.config]
# Optional
# table_prefix = null
export METAXY_STORES__DEV__CONFIG__TABLE_PREFIX=...

auto_create_tables

If True, create tables on open. For development/testing only.

Type: bool | None

[stores.dev.config]
# Optional
# auto_create_tables = null
[tool.metaxy.stores.dev.config]
# Optional
# auto_create_tables = null
export METAXY_STORES__DEV__CONFIG__AUTO_CREATE_TABLES=...

database

Database path (:memory:, file path, or md:database).

Type: str | pathlib.Path

[stores.dev.config]
# Optional
# database = null
[tool.metaxy.stores.dev.config]
# Optional
# database = null
export METAXY_STORES__DEV__CONFIG__DATABASE=...

config

DuckDB configuration settings (e.g., {'threads': '4'}).

Type: dict[str, str] | None

[stores.dev.config]
# Optional
# config = {}
[tool.metaxy.stores.dev.config]
# Optional
# config = {}
export METAXY_STORES__DEV__CONFIG__CONFIG=...

extensions

DuckDB extensions to install and load on open.

Type: collections.abc.Sequence[str | metaxy.ext.metadata_stores.duckdb.ExtensionSpec | collections.abc.Mapping[str, Any]] | None

[stores.dev.config]
# Optional
# extensions = null
[tool.metaxy.stores.dev.config]
# Optional
# extensions = null
export METAXY_STORES__DEV__CONFIG__EXTENSIONS=...

ducklake

DuckLake attachment configuration.

metadata_backend

Type: metaxy.ext.metadata_stores._ducklake_support.SupportsDuckLakeParts | dict[str, Any]

[stores.dev.config.ducklake]
# Optional
# metadata_backend = {}
[tool.metaxy.stores.dev.config.ducklake]
# Optional
# metadata_backend = {}
export METAXY_STORES__DEV__CONFIG__DUCKLAKE__METADATA_BACKEND=...

storage_backend

Type: metaxy.ext.metadata_stores._ducklake_support.SupportsDuckLakeParts | dict[str, Any]

[stores.dev.config.ducklake]
# Optional
# storage_backend = {}
[tool.metaxy.stores.dev.config.ducklake]
# Optional
# storage_backend = {}
export METAXY_STORES__DEV__CONFIG__DUCKLAKE__STORAGE_BACKEND=...

alias

Type: str | Default: "ducklake"

[stores.dev.config.ducklake]
alias = "ducklake"
[tool.metaxy.stores.dev.config.ducklake]
alias = "ducklake"
export METAXY_STORES__DEV__CONFIG__DUCKLAKE__ALIAS=ducklake

plugins

Type: tuple[str, ...]

[stores.dev.config.ducklake]
# Optional
# plugins = null
[tool.metaxy.stores.dev.config.ducklake]
# Optional
# plugins = null
export METAXY_STORES__DEV__CONFIG__DUCKLAKE__PLUGINS=...

attach_options

Type: dict[str, Any]

[stores.dev.config.ducklake]
# Optional
# attach_options = {}
[tool.metaxy.stores.dev.config.ducklake]
# Optional
# attach_options = {}
export METAXY_STORES__DEV__CONFIG__DUCKLAKE__ATTACH_OPTIONS=...