Skip to content

DuckLake

Overview

View Source on GitHub

This example demonstrates how to configure DuckLake with the DuckDB metadata store. DuckLake is an open lakehouse format that separates the metadata catalog (table definitions, schema evolution, and transaction history) from data file storage. This lets you choose independent backends for each layer, for example PostgreSQL for the catalog and S3 for data files.

We will set up a DuckLake-backed store via metaxy.toml and preview the SQL statements that DuckLake would execute when attaching to a DuckDB connection.

Getting Started

Install the example's dependencies:

uv sync

Configuration

DuckLake is configured in metaxy.toml with two parts: a metadata backend (where the catalog is stored) and a storage backend (where data files live).

The active configuration below uses SQLite for the metadata catalog and the local filesystem for data storage. Commented-out sections show alternative backends.

metaxy.toml
project = "example_ducklake"
entrypoints = ["example_ducklake.demo"]
auto_create_tables = true # Enable for development/examples

[stores.dev]
type = "metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore"

[stores.dev.config]
database = "/tmp/ducklake_demo.db"

[stores.dev.config.ducklake]
alias = "ducklake"

[stores.dev.config.ducklake.attach_options]
override_data_path = true

# -- Metadata backend: SQLite (local, no extra dependencies) -----------------
[stores.dev.config.ducklake.metadata_backend]
type = "sqlite"
uri = "/tmp/ducklake_meta.db"

# -- Metadata backend: PostgreSQL ---------------------------------------------
# [stores.dev.config.ducklake.metadata_backend]
# type = "postgres"
# secret_name = "my_pg_secret"       # Required: name for the DuckDB secret
# host = "localhost"
# port = 5432
# database = "ducklake_meta"
# user = "ducklake"
# password = "changeme"
# # Extra parameters forwarded to DuckDB's CREATE SECRET (optional):
# # secret_parameters = { sslmode = "require" }
# # Omit inline credentials (host, database, user, password) to reference
# # a pre-existing DuckDB secret named by secret_name.

# -- Metadata backend: MotherDuck (managed, no storage_backend needed) --------
# [stores.dev.config.ducklake.metadata_backend]
# type = "motherduck"
# database = "my_lake"

# -- Metadata backend: MotherDuck BYOB (bring your own bucket) ----------------
# Use MotherDuck for the catalog but store data in your own S3 bucket.
# [stores.dev.config.ducklake.metadata_backend]
# type = "motherduck"
# database = "my_ducklake"
# region = "eu-central-1"
# [stores.dev.config.ducklake.storage_backend]
# type = "s3"
# secret_name = "my_s3_secret"
# key_id = "AKIA..."
# secret = "..."
# region = "eu-central-1"
# scope = "s3://mybucket/"
# bucket = "mybucket"

# -- Storage backend: local filesystem ----------------------------------------
[stores.dev.config.ducklake.storage_backend]
type = "local"
path = "/tmp/ducklake_storage"

# -- Storage backend: S3 -----------------------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "s3"
# secret_name = "my_s3_secret"       # Required: name for the DuckDB secret
# bucket = "my-ducklake-bucket"
# prefix = "data"
# region = "eu-central-1"
# # Provide inline credentials to create a secret, or omit them to reference
# # a pre-existing DuckDB secret named by secret_name.
# # key_id = "AKIA..."
# # secret = "secret"
# # For credential chain (IAM roles, env vars, etc.) instead of static credentials:
# # secret_parameters = { provider = "credential_chain" }
# # Extra parameters forwarded to DuckDB's CREATE SECRET (optional):
# # secret_parameters = { kms_key_id = "arn:aws:kms:..." }

# -- Storage backend: Cloudflare R2 ------------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "r2"
# secret_name = "my_r2_secret"       # Required: name for the DuckDB secret
# account_id = "your-cloudflare-account-id"
# data_path = "r2://my-bucket/ducklake/"
# # key_id = "R2_ACCESS_KEY"
# # secret = "R2_SECRET_KEY"

# -- Storage backend: Google Cloud Storage ------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "gcs"
# secret_name = "my_gcs_secret"      # Required: name for the DuckDB secret
# data_path = "gs://my-bucket/ducklake/"
# # key_id = "HMAC_ACCESS_KEY"
# # secret = "HMAC_SECRET"

Available backend combinations:

Metadata backend Storage backend
DuckDB, SQLite, PostgreSQL local filesystem, S3, Cloudflare R2, Google Cloud Storage
MotherDuck managed (no storage backend needed), or BYOB with S3/R2/GCS

Tip

To use the credential chain (IAM roles, environment variables, etc.) instead of static credentials, set secret_parameters = { provider = "credential_chain" } on S3, R2, or GCS storage backends.

Note

MotherDuck supports a "Bring Your Own Bucket" (BYOB) mode where MotherDuck manages the DuckLake catalog while you provide your own S3-compatible storage. Storage secrets are created IN MOTHERDUCK so that MotherDuck compute can access your bucket.

Walkthrough

The demo script initializes the store from configuration and previews the SQL statements that would be executed:

src/example_ducklake/demo.py
"""Demonstration of configuring the DuckLake metadata store.

This example mirrors the Dagster DuckLake resource behaviour while staying
Narwhals-compatible. No DuckLake installation is required to run this script;
we preview the SQL statements that would be executed when attaching DuckLake.
"""

import metaxy as mx
from metaxy.ext.metadata_stores.duckdb import DuckDBMetadataStore


def preview_attachment_sql(store: DuckDBMetadataStore) -> list[str]:
    """Return the SQL statements DuckLake would execute on open()."""
    return store.preview_ducklake_sql()


if __name__ == "__main__":
    # Initialize metaxy and get config (searches for metaxy.toml)
    config = mx.init()
    ducklake_store = config.get_store()
    assert isinstance(ducklake_store, DuckDBMetadataStore), (
        "DuckLake example misconfigured: expected DuckDBMetadataStore."
    )
    ducklake_store.ducklake_attachment
    print("DuckLake store initialised. Extensions:", ducklake_store.extensions)
    print("\nPreview of DuckLake ATTACH SQL:")
    for line in preview_attachment_sql(ducklake_store):
        print("  ", line)

Run the demo:

uv run python src/example_ducklake/demo.py

The output shows the full sequence of SQL statements: creating secrets for the metadata and storage backends, and attaching the DuckLake database.