
Metaxy + ClickHouse

ClickHouse is an extremely fast, column-oriented OLAP database designed for real-time analytics. To use Metaxy with ClickHouse, configure ClickHouseMetadataStore. Versioning computations run natively in ClickHouse, which makes it well-suited for high-throughput production workloads.

Installation

pip install 'metaxy[clickhouse]'

Metaxy's Versioning Struct Columns

Metaxy uses struct columns (metaxy_provenance_by_field, metaxy_data_version_by_field) to track field-level versioning. In Python, these correspond to dict[str, str]. In ClickHouse, there are several ways to represent them.

How ClickHouse Handles Structs

ClickHouse offers multiple approaches to represent Metaxy's structured versioning columns:

| Type | Description | Use case |
|---|---|---|
| Map(String, String) | Native key-value map | Recommended for Metaxy because keys are dynamic |
| JSON | Native JSON with typed subcolumns | Less performant than Map(String, String), but more flexible than Nested |
| Nested(field_1 String, ...) | Static struct with named fields | More performant than Map(String, String), but keys are static |

Recommended: Map(String, String)

For Metaxy's metaxy_provenance_by_field and metaxy_data_version_by_field columns, use Map(String, String):

  • No migrations required when feature fields change

  • Good performance for key-value lookups
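As a minimal illustration of the first point, the sketch below renders a dict[str, str] as a ClickHouse map(key1, value1, ...) expression. The helper to_clickhouse_map_expr is hypothetical and not part of Metaxy. When a feature gains a field, the value simply carries one more key while the column type stays Map(String, String).

```python
def to_clickhouse_map_expr(values: dict[str, str]) -> str:
    """Render a dict[str, str] as a ClickHouse map(k1, v1, k2, v2, ...) expression.

    Hypothetical helper for illustration only; it is not part of Metaxy.
    """

    def quote(s: str) -> str:
        # Escape backslashes first, then single quotes, as ClickHouse string literals require
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

    # Sort keys so the rendered expression is deterministic
    args = ", ".join(f"{quote(k)}, {quote(v)}" for k, v in sorted(values.items()))
    return f"map({args})"


# A feature with one field...
before = {"audio": "a1b2"}
# ...later gains a second field: only the map's contents change, not the column type
after = {**before, "frames": "c3d4"}

print(to_clickhouse_map_expr(before))  # map('audio', 'a1b2')
print(to_clickhouse_map_expr(after))   # map('audio', 'a1b2', 'frames', 'c3d4')
```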

Special Handling of Map Columns

Metaxy transforms its system columns (metaxy_provenance_by_field, metaxy_data_version_by_field):

  • Reading: System Map columns are converted into Ibis Structs (e.g., Struct[{"field_a": str, "field_b": str}])

  • Writing: If the input is a Polars DataFrame, Polars Structs are converted into the Map format that ClickHouse expects
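Conceptually, the read-side conversion turns a list of key/value entries into a struct with one field per feature field. The plain-Python sketch below assumes the struct's field names come from the feature's known fields and that missing keys become None; the helper map_entries_to_struct is hypothetical and does not reproduce Metaxy's actual Ibis-based implementation.

```python
from typing import Optional

# A ClickHouse Map value as Arrow exposes it: a list of {"key": ..., "value": ...} entries
MapEntries = list[dict[str, str]]


def map_entries_to_struct(entries: MapEntries, fields: list[str]) -> dict[str, Optional[str]]:
    """Hypothetical helper: convert Map entries into a struct-like dict.

    Assumes `fields` lists the feature's field names; keys missing from the map
    become None, and keys not listed in `fields` are dropped.
    """
    lookup = {entry["key"]: entry["value"] for entry in entries}
    return {name: lookup.get(name) for name in fields}


row = [{"key": "audio", "value": "a1b2"}, {"key": "frames", "value": "c3d4"}]
print(map_entries_to_struct(row, ["audio", "frames", "captions"]))
# {'audio': 'a1b2', 'frames': 'c3d4', 'captions': None}
```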

User-defined Map columns are not transformed. They remain as List[Struct[{"key": str, "value": str}]] (Arrow's Map representation). Make sure to use the right format when providing a Polars DataFrame for writing.
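One way to produce that representation for a user-defined Map column from plain Python dictionaries is sketched below. The helper dicts_to_map_column and the column names are hypothetical. Each row becomes a list of {"key": ..., "value": ...} records, which Polars infers as a list of structs with string key and value fields when passed to pl.DataFrame.

```python
def dicts_to_map_column(rows: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Hypothetical helper: convert each dict into Arrow-style Map entries."""
    # Each row becomes a list of {"key": ..., "value": ...} records, in insertion order
    return [[{"key": k, "value": v} for k, v in row.items()] for row in rows]


tags = dicts_to_map_column([{"env": "prod", "team": "ml"}, {}])
print(tags)
# [[{'key': 'env', 'value': 'prod'}, {'key': 'team', 'value': 'ml'}], []]

# With Polars installed, the column could then be built as:
#   import polars as pl
#   df = pl.DataFrame({"sample_uid": ["s1", "s2"], "tags": tags})
```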

SQLAlchemy and Alembic Migrations

For SQLAlchemy and Alembic migrations support, use the clickhouse-sqlalchemy driver with the native protocol:

pip install clickhouse-sqlalchemy

Use Native ClickHouse Protocol

The HTTP protocol has limited reflection support. Always use the native protocol (clickhouse+native://) for full SQLAlchemy/Alembic compatibility:

connection_string = "clickhouse+native://user:pass@localhost:9000/default"
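If the username or password contains characters that are special in URLs (such as @ or :), they need percent-encoding. The sketch below assembles the native URL from its parts; native_sqlalchemy_url is a hypothetical helper, not part of Metaxy or clickhouse-sqlalchemy. The resulting string can be passed to sqlalchemy.create_engine once clickhouse-sqlalchemy is installed.

```python
from urllib.parse import quote


def native_sqlalchemy_url(
    host: str,
    user: str,
    password: str,
    database: str = "default",
    port: int = 9000,  # ClickHouse's default native (TCP) protocol port
) -> str:
    """Hypothetical helper: build a clickhouse+native:// URL for clickhouse-sqlalchemy."""
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"clickhouse+native://{user_enc}:{password_enc}@{host}:{port}/{database}"


print(native_sqlalchemy_url("localhost", "user", "pass"))
# clickhouse+native://user:pass@localhost:9000/default
```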

The ClickHouseMetadataStore.sqlalchemy_url property is adjusted to return the native (clickhouse+native://) variant of the connection string.

Alternative: ClickHouse Connect

Alternatively, use the official clickhouse-connect driver.

Alembic Integration

See the Alembic setup guide for additional instructions on using Alembic with Metaxy.

Performance Optimization

Table Design

For optimal query performance, create your ClickHouse tables with:

  • Partitioning: Partition your tables so that ClickHouse can skip partitions a query does not need.
  • Ordering: A sorting key of (metaxy_feature_version, <id_columns>, metaxy_updated_at) is likely a good default.
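The sketch below renders a CREATE TABLE statement that follows this advice. build_create_table_ddl is a hypothetical helper; the column types other than the Map(String, String) system columns, the plain MergeTree engine, and the example partition key toYYYYMM(metaxy_updated_at) (one partition per month) are assumptions, not Metaxy defaults.

```python
def build_create_table_ddl(
    table: str,
    id_columns: list[str],
    partition_by: str,
) -> str:
    """Hypothetical helper: render a partitioned, ordered ClickHouse table definition.

    Column types other than the Map(String, String) system columns are assumptions,
    and MergeTree is used as a plain example engine.
    """
    columns = [f"{name} String" for name in id_columns] + [
        "metaxy_feature_version String",
        "metaxy_provenance_by_field Map(String, String)",
        "metaxy_data_version_by_field Map(String, String)",
        "metaxy_updated_at DateTime64(6)",
    ]
    # Sorting key: feature version first, then the ID columns, then the update timestamp
    order_by = ", ".join(["metaxy_feature_version", *id_columns, "metaxy_updated_at"])
    column_sql = ",\n    ".join(columns)
    return (
        f"CREATE TABLE {table} (\n    {column_sql}\n)\n"
        "ENGINE = MergeTree\n"
        f"PARTITION BY {partition_by}\n"
        f"ORDER BY ({order_by})"
    )


print(build_create_table_ddl("video_embeddings", ["sample_uid"], "toYYYYMM(metaxy_updated_at)"))
```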

API Reference

metaxy.ext.metadata_stores.clickhouse

This module implements IbisMetadataStore for ClickHouse.

It handles ClickHouse-specific logic, such as converting nw.Struct types to and from ClickHouse types like Map(K, V).

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStore

ClickHouseMetadataStore(
    connection_string: str | None = None,
    *,
    connection_params: dict[str, Any] | None = None,
    fallback_stores: list[MetadataStore] | None = None,
    auto_cast_struct_for_map: bool = True,
    **kwargs: Any,
)

Bases: IbisMetadataStore

ClickHouse metadata store using the Ibis backend.

Connection Parameters
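from metaxy.ext.metadata_stores.clickhouse import ClickHouseMetadataStore

# HashAlgorithm must also be imported from the metaxy package.
# No backend= argument: ClickHouseMetadataStore already passes backend="clickhouse"
# to IbisMetadataStore, so passing it again would raise a TypeError.
store = ClickHouseMetadataStore(
    connection_params={
        "host": "localhost",
        "port": 8443,
        "database": "default",
        "user": "default",
        "password": "",
    },
    hash_algorithm=HashAlgorithm.XXHASH64,
)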

Parameters:

  • connection_string (str | None, default: None ) –

    ClickHouse connection string.

    Format: clickhouse://[user[:password]@]host[:port]/database[?param=value]

    Example:

    "clickhouse://localhost:8443/default"
    

  • connection_params (dict[str, Any] | None, default: None ) –

    Alternative to connection_string, specify params as dict:

    • host: Server host

    • port: Server port (default: 8443)

    • database: Database name

    • user: Username

    • password: Password

    • secure: Use secure connection (default: False)

  • fallback_stores (list[MetadataStore] | None, default: None ) –

    Ordered list of read-only fallback stores.

  • auto_cast_struct_for_map (bool, default: True ) –

    Whether to auto-convert user-defined Struct columns in a DataFrame to Map format on write when the ClickHouse column has a Map type. Metaxy system columns are always converted.

  • **kwargs (Any, default: {} ) –

    Passed to IbisMetadataStore.
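To show how the connection_string and connection_params forms correspond, the sketch below splits a clickhouse:// URL into the documented connection_params keys. connection_string_to_params is a hypothetical helper; it is not how Metaxy or Ibis parse the string, it falls back to the documented default port 8443, and it passes extra ?param=value pairs through as strings.

```python
from typing import Any
from urllib.parse import parse_qsl, unquote, urlsplit


def connection_string_to_params(connection_string: str) -> dict[str, Any]:
    """Hypothetical helper: split a clickhouse:// URL into connection_params-style keys."""
    parts = urlsplit(connection_string)
    if parts.scheme != "clickhouse":
        raise ValueError(f"expected a clickhouse:// URL, got {parts.scheme!r}")

    params: dict[str, Any] = {
        "host": parts.hostname,
        "port": parts.port or 8443,  # documented default port for connection_params
    }
    database = parts.path.lstrip("/")
    if database:
        params["database"] = database
    if parts.username:
        params["user"] = unquote(parts.username)
    if parts.password is not None:
        params["password"] = unquote(parts.password)
    # Extra ?param=value pairs are passed through unchanged, as strings
    params.update(parse_qsl(parts.query))
    return params


print(connection_string_to_params("clickhouse://localhost:8443/default"))
# {'host': 'localhost', 'port': 8443, 'database': 'default'}
```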

Raises:

  • ImportError

    If ibis-clickhouse is not installed

  • ValueError

    If neither connection_string nor connection_params is provided

Source code in src/metaxy/ext/metadata_stores/clickhouse.py
def __init__(
    self,
    connection_string: str | None = None,
    *,
    connection_params: dict[str, Any] | None = None,
    fallback_stores: list["MetadataStore"] | None = None,
    auto_cast_struct_for_map: bool = True,
    **kwargs: Any,
):
    """
    Initialize [ClickHouse](https://clickhouse.com/) metadata store.

    Args:
        connection_string: ClickHouse connection string.

            Format: `clickhouse://[user[:password]@]host[:port]/database[?param=value]`

            Example:
                ```
                "clickhouse://localhost:8443/default"
                ```

        connection_params: Alternative to connection_string, specify params as dict:

            - host: Server host

            - port: Server port (default: `8443`)

            - database: Database name

            - user: Username

            - password: Password

            - secure: Use secure connection (default: `False`)

        fallback_stores: Ordered list of read-only fallback stores.

        auto_cast_struct_for_map: Whether to auto-convert DataFrame user-defined Struct columns to Map format on write when the ClickHouse column is Map type. Metaxy system columns are always converted.

        **kwargs: Passed to [`IbisMetadataStore`][metaxy.metadata_store.ibis.IbisMetadataStore]

    Raises:
        ImportError: If ibis-clickhouse is not installed
        ValueError: If neither connection_string nor connection_params is provided
    """
    if connection_string is None and connection_params is None:
        raise ValueError(
            "Must provide either connection_string or connection_params. "
            "Example: connection_string='clickhouse://localhost:8443/default'"
        )

    # Cache for ClickHouse table schemas (cleared on close)
    self._ch_schema_cache: dict[str, IbisSchema] = {}

    # Store auto_cast_struct_for_map setting
    self.auto_cast_struct_for_map = auto_cast_struct_for_map

    # Initialize Ibis store with ClickHouse backend
    super().__init__(
        connection_string=connection_string,
        backend="clickhouse" if connection_string is None else None,
        connection_params=connection_params,
        fallback_stores=fallback_stores,
        **kwargs,
    )

Configuration

Configuration for ClickHouseMetadataStore.

Example
metaxy.toml
[stores.dev]
type = "metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStore"

[stores.dev.config]
connection_string = "clickhouse://localhost:8443/default"
hash_algorithm = "xxhash64"
JSON schema:
{
  "$defs": {
    "HashAlgorithm": {
      "description": "Supported hash algorithms for field provenance calculation.\n\nThese algorithms are chosen for:\n- Speed (non-cryptographic hashes preferred)\n- Cross-database availability\n- Good collision resistance for field provenance calculation",
      "enum": [
        "xxhash64",
        "xxhash32",
        "wyhash",
        "sha256",
        "md5",
        "farmhash"
      ],
      "title": "HashAlgorithm",
      "type": "string"
    }
  },
  "additionalProperties": false,
  "description": "Configuration for ClickHouseMetadataStore.\n\nExample:\n    ```toml title=\"metaxy.toml\"\n    [stores.dev]\n    type = \"metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStore\"\n\n    [stores.dev.config]\n    connection_string = \"clickhouse://localhost:8443/default\"\n    hash_algorithm = \"xxhash64\"\n    ```",
  "properties": {
    "fallback_stores": {
      "description": "List of fallback store names to search when features are not found in the current store.",
      "items": {
        "type": "string"
      },
      "title": "Fallback Stores",
      "type": "array"
    },
    "hash_algorithm": {
      "anyOf": [
        {
          "$ref": "#/$defs/HashAlgorithm"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Hash algorithm for versioning. If None, uses store's default."
    },
    "versioning_engine": {
      "default": "auto",
      "description": "Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.",
      "enum": [
        "auto",
        "native",
        "polars"
      ],
      "title": "Versioning Engine",
      "type": "string"
    },
    "connection_string": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Ibis connection string (e.g., 'clickhouse://host:9000/db').",
      "title": "Connection String"
    },
    "backend": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Ibis backend name (e.g., 'clickhouse', 'postgres', 'duckdb').",
      "mkdocs_metaxy_hide": true,
      "title": "Backend"
    },
    "connection_params": {
      "anyOf": [
        {
          "additionalProperties": true,
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Backend-specific connection parameters.",
      "title": "Connection Params"
    },
    "table_prefix": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Optional prefix for all table names.",
      "title": "Table Prefix"
    },
    "auto_create_tables": {
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "If True, create tables on open. For development/testing only.",
      "title": "Auto Create Tables"
    },
    "auto_cast_struct_for_map": {
      "default": true,
      "description": "Auto-convert DataFrame Struct columns to Map format on write when the ClickHouse column is Map type. Metaxy system columns are always converted.",
      "title": "Auto Cast Struct For Map",
      "type": "boolean"
    }
  },
  "title": "ClickHouseMetadataStoreConfig",
  "type": "object"
}

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.fallback_stores pydantic-field

fallback_stores: list[str]

List of fallback store names to search when features are not found in the current store.

[stores.dev.config]
fallback_stores = []
[tool.metaxy.stores.dev.config]
fallback_stores = []
export METAXY_STORES__DEV__CONFIG__FALLBACK_STORES=[]
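The environment-variable names in these examples follow a visible pattern: a METAXY_ prefix, the upper-cased config path, and double underscores between nesting levels (similar to the env_nested_delimiter option in pydantic-settings). The sketch below encodes that pattern as inferred from the examples on this page; env_var_name and config_path are hypothetical helpers, not Metaxy's own code.

```python
PREFIX = "METAXY_"


def env_var_name(path: list[str]) -> str:
    """Hypothetical helper: config path -> environment variable name."""
    # Nesting levels are joined with a double underscore; single underscores stay inside names
    return PREFIX + "__".join(part.upper() for part in path)


def config_path(env_var: str) -> list[str]:
    """Hypothetical helper: environment variable name -> config path."""
    if not env_var.startswith(PREFIX):
        raise ValueError(f"{env_var!r} does not start with {PREFIX!r}")
    return [part.lower() for part in env_var[len(PREFIX):].split("__")]


print(env_var_name(["stores", "dev", "config", "fallback_stores"]))
# METAXY_STORES__DEV__CONFIG__FALLBACK_STORES
print(config_path("METAXY_STORES__DEV__CONFIG__TABLE_PREFIX"))
# ['stores', 'dev', 'config', 'table_prefix']
```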

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.hash_algorithm pydantic-field

hash_algorithm: HashAlgorithm | None = None

Hash algorithm for versioning. If None, uses store's default.

[stores.dev.config]
hash_algorithm = "..."
[tool.metaxy.stores.dev.config]
hash_algorithm = "..."
export METAXY_STORES__DEV__CONFIG__HASH_ALGORITHM=...

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.versioning_engine pydantic-field

versioning_engine: Literal["auto", "native", "polars"] = "auto"

Which versioning engine to use: 'auto' (prefer native), 'native', or 'polars'.

[stores.dev.config]
versioning_engine = "auto"
[tool.metaxy.stores.dev.config]
versioning_engine = "auto"
export METAXY_STORES__DEV__CONFIG__VERSIONING_ENGINE=auto

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.connection_string pydantic-field

connection_string: str | None = None

Ibis connection string (e.g., 'clickhouse://host:9000/db').

[stores.dev.config]
connection_string = "..."
[tool.metaxy.stores.dev.config]
connection_string = "..."
export METAXY_STORES__DEV__CONFIG__CONNECTION_STRING=...

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.connection_params pydantic-field

connection_params: dict[str, Any] | None = None

Backend-specific connection parameters.

[stores.dev.config]
connection_params = {}
[tool.metaxy.stores.dev.config]
connection_params = {}
export METAXY_STORES__DEV__CONFIG__CONNECTION_PARAMS=...

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.table_prefix pydantic-field

table_prefix: str | None = None

Optional prefix for all table names.

[stores.dev.config]
table_prefix = "..."
[tool.metaxy.stores.dev.config]
table_prefix = "..."
export METAXY_STORES__DEV__CONFIG__TABLE_PREFIX=...

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.auto_create_tables pydantic-field

auto_create_tables: bool | None = None

If True, create tables on open. For development/testing only.

[stores.dev.config]
auto_create_tables = false
[tool.metaxy.stores.dev.config]
auto_create_tables = false
export METAXY_STORES__DEV__CONFIG__AUTO_CREATE_TABLES=...

metaxy.ext.metadata_stores.clickhouse.ClickHouseMetadataStoreConfig.auto_cast_struct_for_map pydantic-field

auto_cast_struct_for_map: bool = True

Auto-convert DataFrame Struct columns to Map format on write when the ClickHouse column is Map type. Metaxy system columns are always converted.

[stores.dev.config]
auto_cast_struct_for_map = true
[tool.metaxy.stores.dev.config]
auto_cast_struct_for_map = true
export METAXY_STORES__DEV__CONFIG__AUTO_CAST_STRUCT_FOR_MAP=true