# DuckLake
## Overview
This example demonstrates how to configure DuckLake with the DuckDB metadata store. DuckLake is an open lakehouse format that separates the metadata catalog (table definitions, schema evolution, and transaction history) from data file storage. This lets you choose independent backends for each layer, for example PostgreSQL for the catalog and S3 for data files.
We will set up a DuckLake-backed store via `metaxy.toml` and preview the SQL statements that DuckLake would execute when attaching to a DuckDB connection.
## Getting Started
Install the example's dependencies:
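The original install command is not shown here; assuming the example is laid out as a standard Python project, something along these lines should work (adjust to the package manager the repository actually uses):

```shell
# Hypothetical commands -- adapt to how the example is packaged.
# With uv, sync the example's declared dependencies:
uv sync

# Or with pip, install the example package (and its dependencies)
# in editable mode from the example directory:
pip install -e .
```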
## Configuration
DuckLake is configured in `metaxy.toml` with two parts: a metadata backend (where the catalog is stored) and a storage backend (where data files live).
The active configuration below uses SQLite for the metadata catalog and the local filesystem for data storage. Commented-out sections show alternative backends.
```toml
project = "example_ducklake"
entrypoints = ["example_ducklake.demo"]
auto_create_tables = true  # Enable for development/examples

[stores.dev]
type = "metaxy.ext.metadata_stores.duckdb.DuckDBMetadataStore"

[stores.dev.config]
database = "/tmp/ducklake_demo.db"

[stores.dev.config.ducklake]
alias = "ducklake"

[stores.dev.config.ducklake.attach_options]
override_data_path = true

# -- Metadata backend: SQLite (local, no extra dependencies) -----------------
[stores.dev.config.ducklake.metadata_backend]
type = "sqlite"
uri = "/tmp/ducklake_meta.db"

# -- Metadata backend: PostgreSQL ---------------------------------------------
# [stores.dev.config.ducklake.metadata_backend]
# type = "postgres"
# secret_name = "my_pg_secret"  # Required: name for the DuckDB secret
# host = "localhost"
# port = 5432
# database = "ducklake_meta"
# user = "ducklake"
# password = "changeme"
# # Extra parameters forwarded to DuckDB's CREATE SECRET (optional):
# # secret_parameters = { sslmode = "require" }
# # Omit inline credentials (host, database, user, password) to reference
# # a pre-existing DuckDB secret named by secret_name.

# -- Metadata backend: MotherDuck (managed, no storage_backend needed) --------
# [stores.dev.config.ducklake.metadata_backend]
# type = "motherduck"
# database = "my_lake"

# -- Metadata backend: MotherDuck BYOB (bring your own bucket) ----------------
# Use MotherDuck for the catalog but store data in your own S3 bucket.
# [stores.dev.config.ducklake.metadata_backend]
# type = "motherduck"
# database = "my_ducklake"
# region = "eu-central-1"
# [stores.dev.config.ducklake.storage_backend]
# type = "s3"
# secret_name = "my_s3_secret"
# key_id = "AKIA..."
# secret = "..."
# region = "eu-central-1"
# scope = "s3://mybucket/"
# bucket = "mybucket"

# -- Storage backend: local filesystem ----------------------------------------
[stores.dev.config.ducklake.storage_backend]
type = "local"
path = "/tmp/ducklake_storage"

# -- Storage backend: S3 -----------------------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "s3"
# secret_name = "my_s3_secret"  # Required: name for the DuckDB secret
# bucket = "my-ducklake-bucket"
# prefix = "data"
# region = "eu-central-1"
# # Provide inline credentials to create a secret, or omit them to reference
# # a pre-existing DuckDB secret named by secret_name.
# # key_id = "AKIA..."
# # secret = "secret"
# # For credential chain (IAM roles, env vars, etc.) instead of static credentials:
# # secret_parameters = { provider = "credential_chain" }
# # Extra parameters forwarded to DuckDB's CREATE SECRET (optional):
# # secret_parameters = { kms_key_id = "arn:aws:kms:..." }

# -- Storage backend: Cloudflare R2 ------------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "r2"
# secret_name = "my_r2_secret"  # Required: name for the DuckDB secret
# account_id = "your-cloudflare-account-id"
# data_path = "r2://my-bucket/ducklake/"
# # key_id = "R2_ACCESS_KEY"
# # secret = "R2_SECRET_KEY"

# -- Storage backend: Google Cloud Storage ------------------------------------
# [stores.dev.config.ducklake.storage_backend]
# type = "gcs"
# secret_name = "my_gcs_secret"  # Required: name for the DuckDB secret
# data_path = "gs://my-bucket/ducklake/"
# # key_id = "HMAC_ACCESS_KEY"
# # secret = "HMAC_SECRET"
```
Available backend combinations:
| Metadata backend | Storage backend |
|---|---|
| DuckDB, SQLite, PostgreSQL | local filesystem, S3, Cloudflare R2, Google Cloud Storage |
| MotherDuck | managed (no storage backend needed), or BYOB with S3/R2/GCS |
> **Tip:** To use the credential chain (IAM roles, environment variables, etc.) instead of static credentials, set `secret_parameters = { provider = "credential_chain" }` on S3, R2, or GCS storage backends.
> **Note:** MotherDuck supports a "Bring Your Own Bucket" (BYOB) mode where MotherDuck manages the DuckLake catalog while you provide your own S3-compatible storage. Storage secrets are created *in MotherDuck* so that MotherDuck compute can access your bucket.
## Walkthrough
The demo script initializes the store from configuration and previews the SQL statements that would be executed:
"""Demonstration of configuring the DuckLake metadata store.
This example mirrors the Dagster DuckLake resource behaviour while staying
Narwhals-compatible. No DuckLake installation is required to run this script;
we preview the SQL statements that would be executed when attaching DuckLake.
"""
import metaxy as mx
from metaxy.ext.metadata_stores.duckdb import DuckDBMetadataStore
def preview_attachment_sql(store: DuckDBMetadataStore) -> list[str]:
"""Return the SQL statements DuckLake would execute on open()."""
return store.preview_ducklake_sql()
if __name__ == "__main__":
# Initialize metaxy and get config (searches for metaxy.toml)
config = mx.init()
ducklake_store = config.get_store()
assert isinstance(ducklake_store, DuckDBMetadataStore), (
"DuckLake example misconfigured: expected DuckDBMetadataStore."
)
ducklake_store.ducklake_attachment
print("DuckLake store initialised. Extensions:", ducklake_store.extensions)
print("\nPreview of DuckLake ATTACH SQL:")
for line in preview_attachment_sql(ducklake_store):
print(" ", line)
Run the demo:
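The exact invocation is not shown here; assuming the dependencies from the previous step are installed and taking the module path from the `entrypoints` setting in `metaxy.toml`, a plain module run should work:

```shell
# Run the demo module (path taken from entrypoints in metaxy.toml;
# prefix with your environment runner, e.g. `uv run`, if applicable):
python -m example_ducklake.demo
```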
The output shows the full sequence of SQL statements: creating secrets for the metadata and storage backends, and attaching the DuckLake database.
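As a rough illustration only (the literal statements depend on the metaxy version and your configuration), DuckLake's documented attach sequence for the SQLite-plus-local-filesystem setup above looks roughly like:

```sql
INSTALL ducklake;
LOAD ducklake;
ATTACH 'ducklake:sqlite:/tmp/ducklake_meta.db' AS ducklake
    (DATA_PATH '/tmp/ducklake_storage');
```

With the PostgreSQL or S3 variants, the sequence would additionally include `CREATE SECRET` statements for the named secrets before the `ATTACH`.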