
Types

Versioning Engine

metaxy.versioning.types.LazyIncrement dataclass

LazyIncrement(
    *,
    new: LazyFrame[Any],
    stale: LazyFrame[Any],
    orphaned: LazyFrame[Any],
    input: LazyFrame[Any] | None = None,
)

Result of an incremental update containing lazy dataframes.

Attributes:

  • new (LazyFrame[Any]) –

    New samples from upstream not present in current metadata

  • stale (LazyFrame[Any]) –

    Samples whose provenance differs from what was processed before

  • orphaned (LazyFrame[Any]) –

    Samples that have been processed before but are no longer present in upstream

  • input (LazyFrame[Any] | None) –

    Joined upstream metadata with FeatureDep rules applied.
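
The toy classifier below makes the three categories concrete. It is a sketch, not Metaxy's implementation. It assumes each side is a hypothetical plain mapping from sample ID to a provenance hash, whereas LazyIncrement holds lazy dataframes produced by the versioning engine.

```python
from __future__ import annotations

# Illustration only: sample ID -> provenance hash on each side. These dicts are
# hypothetical stand-ins for the upstream and current metadata frames.


def classify_increment(
    upstream: dict[str, str],
    current: dict[str, str],
) -> tuple[set[str], set[str], set[str]]:
    """Return (new, stale, orphaned) sample IDs."""
    # In upstream, but never processed before.
    new = set(upstream.keys() - current.keys())
    # Processed before, but the upstream provenance has changed since.
    stale = {
        sample_id
        for sample_id in upstream.keys() & current.keys()
        if upstream[sample_id] != current[sample_id]
    }
    # Processed before, but no longer present upstream.
    orphaned = set(current.keys() - upstream.keys())
    return new, stale, orphaned


upstream = {"s1": "h1", "s2": "h2-changed", "s3": "h3"}
current = {"s2": "h2", "s3": "h3", "s4": "h4"}
new, stale, orphaned = classify_increment(upstream, current)
print(sorted(new), sorted(stale), sorted(orphaned))  # ['s1'] ['s2'] ['s4']
```

Sample s3 appears in neither output because its provenance is unchanged.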

Functions

metaxy.versioning.types.LazyIncrement.collect

collect(**kwargs: Any) -> Increment

Collect the new, stale, and orphaned lazy frames into eager DataFrames. The input frame is not collected, because Increment has no input field.

Tip

If the new, stale, and orphaned frames are all backed by Polars, this method uses polars.collect_all to collect them together and take advantage of common subplan elimination.

Parameters:

  • **kwargs (Any, default: {} ) –

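    Backend-specific keyword arguments. When all three frames are backed by Polars, they are forwarded to polars.collect_all; otherwise they are passed to each lazy frame's collect method.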

Returns:

  • Increment ( Increment ) –

    The collected increment.

Source code in src/metaxy/versioning/types.py
def collect(self, **kwargs: Any) -> Increment:
    """Collect all lazy frames to eager DataFrames.

    !!! tip
        If all lazy frames are Polars frames, leverages
        [`polars.collect_all`](https://docs.pola.rs/api/python/stable/reference/api/polars.collect_all.html)
        to optimize the collection process and take advantage of common subplan elimination.

    Args:
        **kwargs: backend-specific keyword arguments to pass to the collect method of the lazy frames.

    Returns:
        Increment: The collected increment.
    """
    if (
        self.new.implementation
        == self.stale.implementation
        == self.orphaned.implementation
        == nw.Implementation.POLARS
    ):
        polars_eager_increment = PolarsLazyIncrement(
            new=self.new.to_native(),
            stale=self.stale.to_native(),
            orphaned=self.orphaned.to_native(),
        ).collect(**kwargs)
        return Increment(
            new=nw.from_native(polars_eager_increment.new),
            stale=nw.from_native(polars_eager_increment.stale),
            orphaned=nw.from_native(polars_eager_increment.orphaned),
        )
    else:
        return Increment(
            new=self.new.collect(**kwargs),
            stale=self.stale.collect(**kwargs),
            orphaned=self.orphaned.collect(**kwargs),
        )

metaxy.versioning.types.LazyIncrement.to_polars

to_polars() -> PolarsLazyIncrement

Convert to Polars.

Tip

If the Narwhals lazy frames are already backed by Polars, this is a no-op.

Warning

If the Narwhals lazy frames are not backed by Polars, this will trigger a full materialization for them.

Source code in src/metaxy/versioning/types.py
def to_polars(self) -> PolarsLazyIncrement:
    """Convert to Polars.

    !!! tip
        If the Narwhals lazy frames are already backed by Polars, this is a no-op.

    !!! warning
        If the Narwhals lazy frames are **not** backed by Polars, this will
        trigger a full materialization for them.
    """
    return PolarsLazyIncrement(
        new=lazy_frame_to_polars(self.new),
        stale=lazy_frame_to_polars(self.stale),
        orphaned=lazy_frame_to_polars(self.orphaned),
        input=lazy_frame_to_polars(self.input) if self.input is not None else None,
    )

metaxy.versioning.types.Increment

Bases: NamedTuple

Result of an incremental update containing eager dataframes.

Attributes:

  • new (DataFrame[Any]) –

    New samples from upstream not present in current metadata

  • stale (DataFrame[Any]) –

    Samples whose provenance differs from what was processed before

  • orphaned (DataFrame[Any]) –

    Samples that have been processed before but are no longer present in upstream

Functions

metaxy.versioning.types.Increment.collect

collect() -> Increment

Convenience method that's a no-op.

Source code in src/metaxy/versioning/types.py
def collect(self) -> "Increment":
    """Convenience method that's a no-op."""
    return self
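
A no-op collect() lets code accept either a LazyIncrement or an Increment and call .collect() unconditionally. The sketch below shows the pattern with hypothetical stand-in classes (ToyIncrement, ToyLazyIncrement) that hold plain lists instead of dataframes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, NamedTuple


class ToyIncrement(NamedTuple):
    """Hypothetical eager stand-in for Increment (lists instead of DataFrames)."""

    new: list
    stale: list
    orphaned: list

    def collect(self) -> "ToyIncrement":
        # Already eager, so, like Increment.collect, just return self.
        return self


@dataclass(frozen=True)
class ToyLazyIncrement:
    """Hypothetical lazy stand-in: each field is a thunk evaluated on collect()."""

    new: Callable[[], list]
    stale: Callable[[], list]
    orphaned: Callable[[], list]

    def collect(self) -> ToyIncrement:
        return ToyIncrement(self.new(), self.stale(), self.orphaned())


def pending_count(increment: ToyIncrement | ToyLazyIncrement) -> int:
    # Calling .collect() unconditionally works for both kinds of increment,
    # because the eager version's collect() is a no-op.
    eager = increment.collect()
    return len(eager.new) + len(eager.stale)


eager = ToyIncrement(new=["s1"], stale=["s2"], orphaned=["s4"])
lazy = ToyLazyIncrement(lambda: ["s1"], lambda: ["s2"], lambda: ["s4"])
print(pending_count(eager), pending_count(lazy))  # 2 2
```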

metaxy.versioning.types.Increment.to_polars

to_polars() -> PolarsIncrement

Convert to Polars.

Source code in src/metaxy/versioning/types.py
def to_polars(self) -> PolarsIncrement:
    """Convert to Polars."""
    return PolarsIncrement(
        new=self.new.to_polars(),
        stale=self.stale.to_polars(),
        orphaned=self.orphaned.to_polars(),
    )

metaxy.versioning.types.PolarsIncrement

Bases: NamedTuple

Like Increment, but converted to Polars frames.

Attributes:

  • new (DataFrame) –

    New samples from upstream not present in current metadata

  • stale (DataFrame) –

    Samples whose provenance differs from what was processed before

  • orphaned (DataFrame) –

    Samples that have been processed before but are no longer present in upstream

Attributes

metaxy.versioning.types.PolarsIncrement.new instance-attribute

new: DataFrame

metaxy.versioning.types.PolarsIncrement.stale instance-attribute

stale: DataFrame

metaxy.versioning.types.PolarsIncrement.orphaned instance-attribute

orphaned: DataFrame

metaxy.versioning.types.PolarsLazyIncrement dataclass

PolarsLazyIncrement(
    *,
    new: LazyFrame,
    stale: LazyFrame,
    orphaned: LazyFrame,
    input: LazyFrame | None = None,
)

Like LazyIncrement, but converted to Polars lazy frames.

Attributes:

  • new (LazyFrame) –

    New samples from upstream not present in current metadata

  • stale (LazyFrame) –

    Samples whose provenance differs from what was processed before

  • orphaned (LazyFrame) –

    Samples that have been processed before but are no longer present in upstream

  • input (LazyFrame | None) –

    Joined upstream metadata with FeatureDep rules applied.

Attributes

metaxy.versioning.types.PolarsLazyIncrement.new instance-attribute

new: LazyFrame

metaxy.versioning.types.PolarsLazyIncrement.stale instance-attribute

stale: LazyFrame

metaxy.versioning.types.PolarsLazyIncrement.orphaned instance-attribute

orphaned: LazyFrame

metaxy.versioning.types.PolarsLazyIncrement.input class-attribute instance-attribute

input: LazyFrame | None = None

Functions

metaxy.versioning.types.PolarsLazyIncrement.collect

collect(**kwargs: Any) -> PolarsIncrement

Collect into a PolarsIncrement.

Tip

Leverages polars.collect_all to optimize the collection process and take advantage of common subplan elimination.

Parameters:

  • **kwargs (Any, default: {} ) –

    Keyword arguments forwarded to polars.collect_all.

Returns:

  • PolarsIncrement ( PolarsIncrement ) –

    The collected increment.

Source code in src/metaxy/versioning/types.py
def collect(self, **kwargs: Any) -> PolarsIncrement:
    """Collect into a [`PolarsIncrement`][metaxy.versioning.types.PolarsIncrement].

    !!! tip
        Leverages [`polars.collect_all`](https://docs.pola.rs/api/python/stable/reference/api/polars.collect_all.html)
        to optimize the collection process and take advantage of common subplan elimination.

    Args:
        **kwargs: backend-specific keyword arguments to pass to the collect method of the lazy frames.

    Returns:
        PolarsIncrement: The collected increment.
    """
    added, changed, removed = pl.collect_all([self.new, self.stale, self.orphaned], **kwargs)
    return PolarsIncrement(added, changed, removed)  # ty: ignore[invalid-argument-type]
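
polars.collect_all returns the collected frames in the order they were passed in. The method above hands them to PolarsIncrement positionally, so the list order [new, stale, orphaned] has to match the NamedTuple's field order. The same NamedTuple property lets callers unpack an increment directly. Here is a sketch with a hypothetical list-based stand-in:

```python
from typing import NamedTuple


class ToyPolarsIncrement(NamedTuple):
    """Hypothetical stand-in for PolarsIncrement, with lists instead of DataFrames."""

    new: list
    stale: list
    orphaned: list


# Pretend output of a batched collect, in the order the lazy frames were passed.
collected = [["s1"], ["s2"], ["s4"]]

# Positional construction: element 0 -> new, 1 -> stale, 2 -> orphaned.
increment = ToyPolarsIncrement(*collected)
print(increment.stale)  # ['s2']

# NamedTuples also unpack positionally, in field order.
new, stale, orphaned = increment
print(orphaned)  # ['s4']
```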

metaxy.HashAlgorithm

Bases: Enum

Supported hash algorithms for field provenance calculation.

These algorithms are chosen for:

  • Speed (non-cryptographic hashes preferred)
  • Cross-database availability
  • Good collision resistance for field provenance calculation

Attributes

metaxy.HashAlgorithm.XXHASH64 class-attribute instance-attribute

XXHASH64 = 'xxhash64'

metaxy.HashAlgorithm.XXHASH32 class-attribute instance-attribute

XXHASH32 = 'xxhash32'

metaxy.HashAlgorithm.WYHASH class-attribute instance-attribute

WYHASH = 'wyhash'

metaxy.HashAlgorithm.SHA256 class-attribute instance-attribute

SHA256 = 'sha256'

metaxy.HashAlgorithm.MD5 class-attribute instance-attribute

MD5 = 'md5'

metaxy.HashAlgorithm.FARMHASH class-attribute instance-attribute

FARMHASH = 'farmhash'
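
Of the algorithms above, only SHA256 and MD5 ship with Python's standard hashlib. XXHASH64, XXHASH32, WYHASH, and FARMHASH need third-party libraries or database functions. The sketch below hashes a list of upstream values with a hypothetical ToyHashAlgorithm subset and a hypothetical joining scheme. Metaxy's actual provenance calculation is not described on this page and may combine inputs differently.

```python
from __future__ import annotations

import hashlib
from enum import Enum


class ToyHashAlgorithm(Enum):
    """Hypothetical subset of HashAlgorithm that hashlib supports natively."""

    SHA256 = "sha256"
    MD5 = "md5"


def field_provenance(parts: list[str], algorithm: ToyHashAlgorithm) -> str:
    # Hypothetical scheme: join values with a unit-separator character (unlikely
    # to appear in them) so ["ab", "c"] and ["a", "bc"] hash differently.
    payload = "\x1f".join(parts).encode("utf-8")
    # The enum values double as hashlib algorithm names.
    return hashlib.new(algorithm.value, payload).hexdigest()


print(field_provenance(["v1", "v2"], ToyHashAlgorithm.SHA256))
print(field_provenance(["v1", "v2"], ToyHashAlgorithm.MD5))
```

The same inputs always produce the same digest, which is what makes a hash usable as a provenance fingerprint.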

Keys

Types for working with feature and field keys.

Canonical Keys

metaxy.FeatureKey

FeatureKey(parts: str)
FeatureKey(parts: Sequence[str])
FeatureKey(parts: FeatureKey)

Bases: _Key

Feature key as a sequence of string parts.

Hashable for use as dict keys in registries. Parts cannot contain forward slashes (/) or double underscores (__).

Example:

```py
FeatureKey("a/b/c")  # String format
# FeatureKey(parts=['a', 'b', 'c'])

FeatureKey(["a", "b", "c"])  # List format
# FeatureKey(parts=['a', 'b', 'c'])

FeatureKey(FeatureKey(["a", "b", "c"]))  # FeatureKey copy
# FeatureKey(parts=['a', 'b', 'c'])
```
Source code in src/metaxy/models/types.py
def __init__(
    self,
    parts: str | Sequence[str] | FeatureKey,
) -> None: ...
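
The construction rules above and the hash and equality behaviour documented in this section can be modelled with a small stand-in class. ToyKey is hypothetical. It stores parts as a tuple, so they are hashable as required by the hash(self.parts) call in __hash__, and it raises ValueError on invalid parts. The real FeatureKey may report invalid parts differently.

```python
from __future__ import annotations

from collections.abc import Sequence


class ToyKey:
    """Hypothetical stand-in mirroring the documented FeatureKey rules."""

    def __init__(self, parts: str | Sequence[str] | ToyKey) -> None:
        if isinstance(parts, ToyKey):
            normalized = parts.parts  # copy from an existing key
        elif isinstance(parts, str):
            normalized = tuple(parts.split("/"))  # "a/b/c" -> ("a", "b", "c")
        else:
            normalized = tuple(parts)  # any sequence of strings
        for part in normalized:
            # Documented rule: no slashes or double underscores inside a part.
            if "/" in part or "__" in part:
                raise ValueError(f"invalid key part: {part!r}")
        self.parts: tuple[str, ...] = normalized

    def __hash__(self) -> int:
        return hash(self.parts)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, ToyKey):
            return self.parts == other.parts
        return NotImplemented

    def __repr__(self) -> str:
        return f"ToyKey(parts={list(self.parts)!r})"


registry = {ToyKey("a/b/c"): "registered"}
print(registry[ToyKey(["a", "b", "c"])])  # registered
print(ToyKey(ToyKey("a/b/c")))  # ToyKey(parts=['a', 'b', 'c'])
```

Equal parts give equal hashes, which is what allows a key built from a string to find an entry registered under the list form.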

Functions

metaxy.FeatureKey.model_dump

model_dump(**kwargs: Any) -> Any

Serialize to string format for JSON dict key compatibility.

Source code in src/metaxy/models/types.py
def model_dump(self, **kwargs: Any) -> Any:
    """Serialize to string format for JSON dict key compatibility."""
    return self.to_string()

metaxy.FeatureKey.__hash__

__hash__() -> int

Return hash for use as dict keys.

Source code in src/metaxy/models/types.py
def __hash__(self) -> int:
    """Return hash for use as dict keys."""
    return hash(self.parts)

metaxy.FeatureKey.__eq__

__eq__(other: Any) -> bool

Check equality with another instance.

Source code in src/metaxy/models/types.py
def __eq__(self, other: Any) -> bool:
    """Check equality with another instance."""
    if isinstance(other, self.__class__):
        return self.parts == other.parts
    return super().__eq__(other)

metaxy.FeatureKey.to_column_suffix

to_column_suffix() -> str

Convert to a suffix usable for database column names (typically temporary).

Source code in src/metaxy/models/types.py
def to_column_suffix(self) -> str:
    """Convert to a suffix usable for database column names (typically temporary)."""
    return "__" + "_".join(self.parts)
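
The suffix is the parts joined with single underscores behind a leading double underscore. The sketch below applies the same expression to a bare tuple through a hypothetical to_column_suffix function. The parts rules above forbid only / and __, so single underscores appear to be allowed inside parts. Under that assumption, two distinct keys can produce the same suffix.

```python
def to_column_suffix(parts: tuple[str, ...]) -> str:
    # Same expression as FeatureKey.to_column_suffix, applied to a bare tuple.
    return "__" + "_".join(parts)


print(to_column_suffix(("raw", "video")))  # __raw_video

# Assuming single underscores are allowed inside parts, distinct keys can collide:
print(to_column_suffix(("a_b", "c")) == to_column_suffix(("a", "b_c")))  # True
```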

metaxy.FieldKey

FieldKey(parts: str)
FieldKey(parts: Sequence[str])
FieldKey(parts: FieldKey)

Bases: _Key

Field key as a sequence of string parts.

Hashable for use as dict keys in registries. Parts cannot contain forward slashes (/) or double underscores (__).

Example:

```py
FieldKey("a/b/c")  # String format
# FieldKey(parts=['a', 'b', 'c'])

FieldKey(["a", "b", "c"])  # List format
# FieldKey(parts=['a', 'b', 'c'])

FieldKey(FieldKey(["a", "b", "c"]))  # FieldKey copy
# FieldKey(parts=['a', 'b', 'c'])
```
Source code in src/metaxy/models/types.py
def __init__(
    self,
    parts: str | Sequence[str] | FieldKey,
) -> None: ...

Functions

metaxy.FieldKey.model_dump

model_dump(**kwargs: Any) -> Any

Serialize to string format for JSON dict key compatibility.

Source code in src/metaxy/models/types.py
def model_dump(self, **kwargs: Any) -> Any:
    """Serialize to string format for JSON dict key compatibility."""
    return self.to_string()

metaxy.FieldKey.__hash__

__hash__() -> int

Return hash for use as dict keys.

Source code in src/metaxy/models/types.py
def __hash__(self) -> int:
    """Return hash for use as dict keys."""
    return hash(self.parts)

metaxy.FieldKey.__eq__

__eq__(other: Any) -> bool

Check equality with another instance.

Source code in src/metaxy/models/types.py
def __eq__(self, other: Any) -> bool:
    """Check equality with another instance."""
    if isinstance(other, self.__class__):
        return self.parts == other.parts
    return super().__eq__(other)

Type Annotations

These are typically used to annotate function parameters. Most APIs in Metaxy accept them and coerce them into the canonical types.

metaxy.CoercibleToFeatureKey module-attribute

CoercibleToFeatureKey: TypeAlias = "str | Sequence[str] | FeatureKey | type[BaseFeature] | FeatureDefinition | FeatureSpec"

Type alias for values that can be coerced to a FeatureKey.

Accepted formats:

  • str: Slash-separated string like "raw/video" or "ml/embeddings/v2"
  • Sequence[str]: sequence of parts like ["user", "profile"]
  • FeatureKey: Pass through unchanged
  • type[BaseFeature]: Any BaseFeature subclass - extracts its key via .spec().key
  • FeatureDefinition: Extracts its key via .key
  • FeatureSpec: Extracts its key via .key

Example:

```py
key1 = "raw/video"
key2 = ["raw", "video"]
key3 = mx.FeatureKey("raw/video")
key4 = MyFeatureClass  # where MyFeatureClass is a BaseFeature subclass
key5 = mx.FeatureDefinition("raw/video", ...)
key6 = mx.FeatureSpec(key="raw/video", id_columns=("id",))
```
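
The coercion rules listed above amount to a dispatch on the input's type. The sketch below uses hypothetical stand-ins (Key, Spec, BaseFeature) rather than Metaxy's classes. Spec plays the role of both FeatureSpec and FeatureDefinition, which the list says expose .key. The check order matters: str is itself a Sequence, so it must be handled first.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:  # hypothetical stand-in for FeatureKey
    parts: tuple[str, ...]


@dataclass(frozen=True)
class Spec:  # hypothetical stand-in for FeatureSpec / FeatureDefinition
    key: Key


class BaseFeature:  # hypothetical stand-in; subclasses expose .spec()
    _spec: Spec

    @classmethod
    def spec(cls) -> Spec:
        return cls._spec


def coerce_to_key(value: object) -> Key:
    if isinstance(value, Key):
        return value  # pass through unchanged
    if isinstance(value, str):
        return Key(tuple(value.split("/")))  # must precede the Sequence check
    if isinstance(value, type) and issubclass(value, BaseFeature):
        return value.spec().key  # feature class -> .spec().key
    if isinstance(value, Spec):
        return value.key  # spec/definition -> .key
    if isinstance(value, Sequence):
        return Key(tuple(value))  # e.g. ["raw", "video"]
    raise TypeError(f"cannot coerce {value!r} to a key")


class Video(BaseFeature):
    _spec = Spec(Key(("raw", "video")))


print(coerce_to_key(Video))  # Key(parts=('raw', 'video'))
```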

metaxy.CoercibleToFieldKey module-attribute

CoercibleToFieldKey: TypeAlias = (
    str | Sequence[str] | FieldKey
)

Type alias for values that can be coerced to a FieldKey.

Accepted formats:

  • str: Slash-separated string like "audio/english"
  • Sequence[str]: sequence of parts like ["audio", "english"]
  • FieldKey: Pass through unchanged

Example:

```py
key1 = "audio/english"
key2 = ["audio", "english"]
key3 = mx.FieldKey("audio/english")
```

Pydantic Type Annotations

These types are used for type coercion into canonical types with Pydantic.

metaxy.ValidatedFeatureKey module-attribute

ValidatedFeatureKey: TypeAlias = FeatureKey

metaxy.ValidatedFieldKey module-attribute

ValidatedFieldKey: TypeAlias = FieldKey

metaxy.ValidatedFeatureKeySequence module-attribute

ValidatedFeatureKeySequence: TypeAlias = Sequence[
    ValidatedFeatureKey
]

metaxy.ValidatedFieldKeySequence module-attribute

ValidatedFieldKeySequence: TypeAlias = Sequence[
    ValidatedFieldKey
]

Adapters

These perform type coercion into canonical types in non-Pydantic code.

metaxy.ValidatedFeatureKeyAdapter module-attribute

metaxy.ValidatedFeatureKeySequenceAdapter module-attribute

metaxy.ValidatedFieldKeyAdapter module-attribute

ValidatedFieldKeyAdapter: TypeAdapter[ValidatedFieldKey] = (
    TypeAdapter(ValidatedFieldKey)
)

metaxy.ValidatedFieldKeySequenceAdapter module-attribute

Other Types

metaxy.models.types.PushResult

Bases: NamedTuple

Result of recording a feature graph snapshot.

Attributes:

  • project_version (str) –

    The deterministic hash of the graph's project version

  • already_pushed (bool) –

    True if this project_version had already been pushed

  • updated_features (list[str]) –

    List of feature keys with updated information (changed definition_version)
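
Because PushResult is a NamedTuple, callers can read its fields by name or unpack it. The sketch below mirrors the documented fields in a hypothetical ToyPushResult and formats a short status line. The version string "abc123" is made up, and the message format is illustrative, not something Metaxy prints.

```python
from __future__ import annotations

from typing import NamedTuple


class ToyPushResult(NamedTuple):
    """Hypothetical mirror of PushResult's documented fields."""

    project_version: str
    already_pushed: bool
    updated_features: list[str]


def describe(result: ToyPushResult) -> str:
    status = "already pushed" if result.already_pushed else "newly pushed"
    if result.updated_features:
        updated = ", ".join(result.updated_features)
        return f"{result.project_version}: {status}; updated {updated}"
    return f"{result.project_version}: {status}; no feature updates"


result = ToyPushResult("abc123", False, ["raw/video"])
print(describe(result))  # abc123: newly pushed; updated raw/video

# Fields can also be unpacked positionally.
version, already_pushed, updated_features = result
```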

metaxy.IDColumns module-attribute

IDColumns: TypeAlias = Sequence[str]