Feature Spec

Feature specs act as the source of truth for all metadata related to features: their dependencies, fields, code versions, and so on.

metaxy.FeatureSpec pydantic-model

FeatureSpec(
    *,
    key: CoercibleToFeatureKey,
    id_columns: IDColumns,
    deps: list[FeatureDep] | None = None,
    fields: Sequence[str | FieldSpec] | None = None,
    metadata: dict[str, Any] | None = None,
    description: str | None = None,
)
FeatureSpec(
    *,
    key: CoercibleToFeatureKey,
    id_columns: IDColumns,
    deps: list[CoercibleToFeatureDep] | None = None,
    fields: Sequence[str | FieldSpec] | None = None,
    metadata: dict[str, Any] | None = None,
    description: str | None = None,
)

Bases: FrozenBaseModel

Show JSON schema:
{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nConstruct this relationship via [`LineageRelationship.aggregation`][metaxy.models.lineage.LineageRelationship.aggregation] classmethod.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExample:\n    ```python\n    mx.LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"])\n    ```",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "AllFieldsMapping": {
      "description": "Field mapping that explicitly depends on all upstream fields.",
      "properties": {
        "type": {
          "const": "all",
          "default": "all",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "AllFieldsMapping",
      "type": "object"
    },
    "DefaultFieldsMapping": {
      "description": "Default automatic field mapping configuration.\n\nWhen used, automatically maps fields to matching upstream fields based on field keys.\n\nAttributes:\n    match_suffix: If True, allows suffix matching (e.g., \"french\" matches \"audio/french\")\n    exclude_fields: List of field keys to exclude from auto-mapping",
      "properties": {
        "type": {
          "const": "default",
          "default": "default",
          "title": "Type",
          "type": "string"
        },
        "match_suffix": {
          "default": false,
          "title": "Match Suffix",
          "type": "boolean"
        },
        "exclude_fields": {
          "items": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Exclude Fields",
          "type": "array"
        }
      },
      "title": "DefaultFieldsMapping",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nConstruct this relationship via [`LineageRelationship.expansion`][metaxy.models.lineage.LineageRelationship.expansion] classmethod.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExample:\n    ```python\n    mx.LineageRelationship.expansion(on=[\"video_id\"], id_generation_pattern=\"sequential\")\n    ```",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "FeatureDep": {
      "additionalProperties": false,
      "description": "Feature dependency specification with optional column selection, renaming, and lineage.\n\nAttributes:\n    feature: The feature key to depend on. Accepts string (\"a/b/c\"), list ([\"a\", \"b\", \"c\"]),\n        FeatureKey instance, or BaseFeature class.\n    select: Optional sequence of column names to select from the upstream feature.\n        By default, all columns are selected. System columns are always selected.\n        Uses post-rename names when `rename` is also specified.\n    rename: Optional mapping of old column names to new names.\n        Applied before column selection.\n    fields_mapping: Optional field mapping configuration for automatic field dependency resolution.\n        When provided, fields without explicit deps will automatically map to matching upstream fields.\n        Defaults to using `[FieldsMapping.default()][metaxy.models.fields_mapping.DefaultFieldsMapping]`.\n    filters: Optional SQL-like filter strings applied to this dependency. Automatically parsed into\n        Narwhals expressions (accessible via the `filters` property). Filters are automatically\n        applied by FeatureDepTransformer after renames during all FeatureDep operations (including\n        resolve_update and version computation).\n    lineage: The lineage relationship between this upstream dependency and the downstream feature.\n        - `LineageRelationship.identity()` (default): 1:1 relationship, same cardinality\n        - `LineageRelationship.aggregation(on=...)`: N:1, multiple upstream rows aggregate to one downstream\n        - `LineageRelationship.expansion(on=...)`: 1:N, one upstream row expands to multiple downstream rows\n    optional: Whether individual samples of the downstream feature can be computed without\n        the corresponding samples of the upstream feature. If upstream samples are missing,\n        they are going to be represented as NULL values in the joined upstream metadata.\n        Defaults to False (required dependency).\n\nExample: Basic Usage\n    ```py\n    # Keep all columns with default field mapping (1:1 lineage)\n    mx.FeatureDep(feature=\"upstream\")\n\n    # Keep only specific columns\n    mx.FeatureDep(feature=\"upstream/feature\", select=(\"col1\", \"col2\"))\n\n    # Rename columns to avoid conflicts\n    mx.FeatureDep(feature=\"upstream/feature\", rename={\"old_name\": \"new_name\"})\n\n    # Combined rename + select: select uses post-rename names\n    mx.FeatureDep(\n        feature=\"upstream/feature\",\n        rename={\"old_name\": \"new_name\"},\n        select=(\"new_name\", \"other_col\"),\n    )\n\n    # SQL filters\n    mx.FeatureDep(feature=\"upstream\", filters=[\"age >= 25\", \"status = 'active'\"])\n\n    # Optional dependency (left join - samples preserved even if no match)\n    mx.FeatureDep(feature=\"enrichment/data\", optional=True)\n    ```\n\nExample: Lineage Relationships\n    ```py\n    from metaxy.models.lineage import LineageRelationship\n\n    # Aggregation: many sensor readings aggregate to one hourly stat\n    mx.FeatureDep(feature=\"sensor_readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"]))\n\n    # Expansion: one video expands to many frames\n    mx.FeatureDep(feature=\"video\", lineage=LineageRelationship.expansion(on=[\"video_id\"]))\n\n    # Mixed lineage: aggregate from one parent, identity from another\n    # In FeatureSpec:\n    deps = [\n        mx.FeatureDep(feature=\"readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\"])),\n        mx.FeatureDep(feature=\"sensor_info\", lineage=LineageRelationship.identity()),\n    ]\n    ```",
      "properties": {
        "feature": {
          "$ref": "#/$defs/FeatureKey",
          "description": "Feature key. Accepts a slashed string ('a/b/c'), a sequence of strings, a FeatureKey instance, or a child class of BaseFeature"
        },
        "select": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Select"
        },
        "rename": {
          "anyOf": [
            {
              "additionalProperties": {
                "type": "string"
              },
              "type": "object"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Rename"
        },
        "fields_mapping": {
          "$ref": "#/$defs/FieldsMapping"
        },
        "filters": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "SQL-like filter strings applied to this dependency.",
          "title": "Filters"
        },
        "lineage": {
          "$ref": "#/$defs/LineageRelationship",
          "description": "Lineage relationship between this upstream dependency and the downstream feature."
        },
        "optional": {
          "default": false,
          "description": "Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.",
          "title": "Optional",
          "type": "boolean"
        }
      },
      "required": [
        "feature"
      ],
      "title": "FeatureDep",
      "type": "object"
    },
    "FeatureKey": {
      "description": "Feature key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FeatureKey(\"a/b/c\")  # String format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey([\"a\", \"b\", \"c\"])  # List format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey(FeatureKey([\"a\", \"b\", \"c\"]))  # FeatureKey copy\n    # FeatureKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FeatureKey",
      "type": "array"
    },
    "FieldDep": {
      "additionalProperties": false,
      "properties": {
        "feature": {
          "$ref": "#/$defs/FeatureKey"
        },
        "fields": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/FieldKey"
              },
              "type": "array"
            },
            {
              "const": "__METAXY_ALL_DEP__",
              "type": "string"
            }
          ],
          "default": "__METAXY_ALL_DEP__",
          "title": "Fields"
        }
      },
      "required": [
        "feature"
      ],
      "title": "FieldDep",
      "type": "object"
    },
    "FieldKey": {
      "description": "Field key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FieldKey(\"a/b/c\")  # String format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey([\"a\", \"b\", \"c\"])  # List format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey(FieldKey([\"a\", \"b\", \"c\"]))  # FieldKey copy\n    # FieldKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FieldKey",
      "type": "array"
    },
    "FieldsMapping": {
      "description": "Base class for field mapping configurations.\n\nField mappings define how a field automatically resolves its dependencies\nbased on upstream feature fields. This is separate from explicit field\ndependencies which are defined directly.",
      "properties": {
        "mapping": {
          "discriminator": {
            "mapping": {
              "all": "#/$defs/AllFieldsMapping",
              "default": "#/$defs/DefaultFieldsMapping",
              "none": "#/$defs/NoneFieldsMapping",
              "specific": "#/$defs/SpecificFieldsMapping"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/AllFieldsMapping"
            },
            {
              "$ref": "#/$defs/SpecificFieldsMapping"
            },
            {
              "$ref": "#/$defs/NoneFieldsMapping"
            },
            {
              "$ref": "#/$defs/DefaultFieldsMapping"
            }
          ],
          "title": "Mapping"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "FieldsMapping",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality.\n\nConstruct this relationship via [`LineageRelationship.identity`][metaxy.models.lineage.LineageRelationship.identity] classmethod.\n\nExample:\n    ```python\n    mx.LineageRelationship.identity()\n    ```",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    },
    "LineageRelationship": {
      "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
      "properties": {
        "relationship": {
          "discriminator": {
            "mapping": {
              "1:1": "#/$defs/IdentityRelationship",
              "1:N": "#/$defs/ExpansionRelationship",
              "N:1": "#/$defs/AggregationRelationship"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/IdentityRelationship"
            },
            {
              "$ref": "#/$defs/AggregationRelationship"
            },
            {
              "$ref": "#/$defs/ExpansionRelationship"
            }
          ],
          "title": "Relationship"
        }
      },
      "required": [
        "relationship"
      ],
      "title": "LineageRelationship",
      "type": "object"
    },
    "NoneFieldsMapping": {
      "description": "Field mapping that never matches any upstream fields.",
      "properties": {
        "type": {
          "const": "none",
          "default": "none",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "NoneFieldsMapping",
      "type": "object"
    },
    "SpecialFieldDep": {
      "enum": [
        "__METAXY_ALL_DEP__"
      ],
      "title": "SpecialFieldDep",
      "type": "string"
    },
    "SpecificFieldsMapping": {
      "description": "Field mapping that explicitly depends on specific upstream fields.",
      "properties": {
        "type": {
          "const": "specific",
          "default": "specific",
          "title": "Type",
          "type": "string"
        },
        "mapping": {
          "additionalProperties": {
            "items": {
              "$ref": "#/$defs/FieldKey"
            },
            "type": "array",
            "uniqueItems": true
          },
          "propertyNames": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Mapping",
          "type": "object"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "SpecificFieldsMapping",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "properties": {
    "key": {
      "$ref": "#/$defs/FeatureKey"
    },
    "id_columns": {
      "description": "Columns that uniquely identify a sample in this feature.",
      "items": {
        "type": "string"
      },
      "title": "Id Columns",
      "type": "array"
    },
    "deps": {
      "items": {
        "$ref": "#/$defs/FeatureDep"
      },
      "title": "Deps",
      "type": "array"
    },
    "fields": {
      "items": {
        "additionalProperties": false,
        "properties": {
          "key": {
            "$ref": "#/$defs/FieldKey"
          },
          "code_version": {
            "default": "__metaxy_initial__",
            "title": "Code Version",
            "type": "string"
          },
          "deps": {
            "anyOf": [
              {
                "$ref": "#/$defs/SpecialFieldDep"
              },
              {
                "items": {
                  "$ref": "#/$defs/FieldDep"
                },
                "type": "array"
              }
            ],
            "title": "Deps"
          }
        },
        "title": "FieldSpec",
        "type": "object"
      },
      "title": "Fields",
      "type": "array"
    },
    "metadata": {
      "additionalProperties": true,
      "description": "Metadata attached to this feature.",
      "title": "Metadata",
      "type": "object"
    },
    "description": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Human-readable description of this feature.",
      "title": "Description"
    }
  },
  "required": [
    "key",
    "id_columns"
  ],
  "title": "FeatureSpec",
  "type": "object"
}

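Fields:

  • key
  • id_columns
  • deps
  • fields
  • metadata
  • description

Validators:

  • validate_unique_field_keys
  • validate_id_columns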

Source code in src/metaxy/models/feature_spec.py
def __init__(
    self,
    *,
    key: CoercibleToFeatureKey,
    id_columns: IDColumns,
    deps: list[FeatureDep] | list[CoercibleToFeatureDep] | None = None,
    fields: Sequence[str | FieldSpec] | None = None,
    metadata: dict[str, Any] | None = None,
    description: str | None = None,
) -> None: ...

Attributes

metaxy.FeatureSpec.id_columns pydantic-field

id_columns: tuple[str, ...]

Columns that uniquely identify a sample in this feature.

metaxy.FeatureSpec.metadata pydantic-field

metadata: dict[str, Any]

Metadata attached to this feature.

metaxy.FeatureSpec.description pydantic-field

description: str | None = None

Human-readable description of this feature.

metaxy.FeatureSpec.deps_by_key cached property

deps_by_key: Mapping[FeatureKey, FeatureDep]

Get dependencies indexed by their feature key.

metaxy.FeatureSpec.code_version cached property

code_version: str

Hash of this feature's field code_versions only (no dependencies).

metaxy.FeatureSpec.feature_spec_version property

feature_spec_version: str

Compute SHA256 hash of the complete feature specification.

This property provides a deterministic hash of ALL specification properties, including key, deps, fields, and any metadata/tags. Used for audit trail and tracking specification changes.

Unlike feature_version which only hashes computational properties (for migration triggering), feature_spec_version captures the entire specification for complete reproducibility and audit purposes.

Returns:

  • str –

    SHA256 hex digest of the specification

Example
import metaxy as mx

spec = mx.FeatureSpec(
    key=mx.FeatureKey(["my", "feature"]),
    id_columns=["id"],
)
spec.feature_spec_version
# 'abc123...'  # 64-character hex string
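
The idea of a deterministic digest over the whole specification can be illustrated without the library. The following is a minimal sketch, not metaxy's implementation: the `spec_digest` helper and the dictionary layout are hypothetical, and it assumes the specification can be serialized to canonical JSON before hashing with SHA256.

```python
import hashlib
import json
from typing import Any


def spec_digest(spec: dict[str, Any]) -> str:
    """Return a SHA256 hex digest of a JSON-serializable spec dictionary.

    Hypothetical helper for illustration; not part of metaxy.
    """
    # sort_keys and fixed separators give a canonical byte representation,
    # so the key order of the input dict does not affect the digest.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


spec_a = {"key": ["my", "feature"], "id_columns": ["id"], "metadata": {"owner": "team-a"}}
# Same content as spec_a, different key order.
spec_b = {"metadata": {"owner": "team-a"}, "id_columns": ["id"], "key": ["my", "feature"]}
# Only the metadata differs from spec_a.
spec_c = {**spec_a, "metadata": {"owner": "team-b"}}

digest_a = spec_digest(spec_a)
digest_b = spec_digest(spec_b)
digest_c = spec_digest(spec_c)
```

In this sketch `digest_a` and `digest_b` are equal, while `digest_c` differs because the metadata changed. That mirrors the description above, where metadata is part of the audit hash.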

Functions

metaxy.FeatureSpec.table_name

table_name() -> str

Get SQL-like table name for this feature spec.

Source code in src/metaxy/models/feature_spec.py
def table_name(self) -> str:
    """Get SQL-like table name for this feature spec."""
    return self.key.table_name

metaxy.FeatureSpec.validate_unique_field_keys pydantic-validator

validate_unique_field_keys() -> Self

Validate that all fields have unique keys.

Source code in src/metaxy/models/feature_spec.py
@pydantic.model_validator(mode="after")
def validate_unique_field_keys(self) -> Self:
    """Validate that all fields have unique keys."""
    seen_keys: set[tuple[str, ...]] = set()
    for field in self.fields:
        # Convert to tuple for hashability in case it's a plain list
        key_tuple = tuple(field.key)
        if key_tuple in seen_keys:
            raise ValueError(f"Duplicate field key found: {field.key}. All fields must have unique keys.")
        seen_keys.add(key_tuple)
    return self
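
The duplicate check used by this validator can be tried in isolation. The sketch below is a standalone approximation with a hypothetical `find_duplicate_keys` helper; it uses plain sequences of strings in place of `FieldSpec` objects and collects duplicates instead of raising.

```python
from collections.abc import Sequence


def find_duplicate_keys(keys: Sequence[Sequence[str]]) -> list[tuple[str, ...]]:
    """Return each key that appears more than once, in order of first repeat.

    Hypothetical helper for illustration; not part of metaxy.
    """
    seen: set[tuple[str, ...]] = set()
    duplicates: list[tuple[str, ...]] = []
    for key in keys:
        # Lists are unhashable, so normalize every key to a tuple first.
        key_tuple = tuple(key)
        if key_tuple in seen and key_tuple not in duplicates:
            duplicates.append(key_tuple)
        seen.add(key_tuple)
    return duplicates


unique = find_duplicate_keys([["audio", "french"], ["audio", "english"]])
# The list and tuple forms of the same key count as one key.
repeated = find_duplicate_keys([["audio", "french"], ("audio", "french"), ["video"]])
```

Converting to tuples means a key given as a list and the same key given as a tuple are treated as duplicates, which is the purpose of the `tuple(field.key)` call in the validator.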

metaxy.FeatureSpec.validate_id_columns pydantic-validator

validate_id_columns() -> Self

Validate that id_columns is non-empty if specified.

Source code in src/metaxy/models/feature_spec.py
@pydantic.model_validator(mode="after")
def validate_id_columns(self) -> Self:
    """Validate that id_columns is non-empty if specified."""
    if self.id_columns is not None and len(self.id_columns) == 0:
        raise ValueError("id_columns must be non-empty if specified. Use None for default.")
    return self

Feature Dependencies

metaxy.FeatureDep pydantic-model

FeatureDep(
    *,
    feature: str
    | Sequence[str]
    | FeatureKey
    | type[BaseFeature],
    select: tuple[str, ...] | None = None,
    rename: dict[str, str] | None = None,
    fields_mapping: FieldsMapping | None = None,
    filters: Sequence[str] | None = None,
    lineage: LineageRelationship | None = None,
    optional: bool = False,
)

Bases: BaseModel

Feature dependency specification with optional column selection, renaming, and lineage.

Attributes:

  • feature (ValidatedFeatureKey) –

    The feature key to depend on. Accepts string ("a/b/c"), list (["a", "b", "c"]), FeatureKey instance, or BaseFeature class.

  • select (tuple[str, ...] | None) –

    Optional sequence of column names to select from the upstream feature. By default, all columns are selected. System columns are always selected. Uses post-rename names when rename is also specified.

  • rename (dict[str, str] | None) –

    Optional mapping of old column names to new names. Applied before column selection.

  • fields_mapping (FieldsMapping) –

    Optional field mapping configuration for automatic field dependency resolution. When provided, fields without explicit deps will automatically map to matching upstream fields. Defaults to using FieldsMapping.default().

  • filters (tuple[Expr, ...]) –

    Optional SQL-like filter strings applied to this dependency. Automatically parsed into Narwhals expressions (accessible via the filters property). Filters are automatically applied by FeatureDepTransformer after renames during all FeatureDep operations (including resolve_update and version computation).

  • lineage (LineageRelationship) –

    The lineage relationship between this upstream dependency and the downstream feature:

      - LineageRelationship.identity() (default): 1:1 relationship, same cardinality
      - LineageRelationship.aggregation(on=...): N:1, multiple upstream rows aggregate to one downstream row
      - LineageRelationship.expansion(on=...): 1:N, one upstream row expands to multiple downstream rows

  • optional (bool) –

    Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are represented as NULL values in the joined upstream metadata. Defaults to False (required dependency).
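
The ordering of `rename` and `select` described above (rename first, then select by the new names) can be sketched with plain column lists. This is only an illustration under stated assumptions: `apply_rename_then_select` is a hypothetical helper, and `sys_version` is a made-up stand-in for a system column, not a real metaxy column name.

```python
from typing import Optional


def apply_rename_then_select(
    columns: list[str],
    rename: Optional[dict[str, str]] = None,
    select: Optional[tuple[str, ...]] = None,
    system_columns: frozenset = frozenset(),
) -> list[str]:
    """Return the column names left after renaming, then selecting.

    Hypothetical helper for illustration; not part of metaxy.
    """
    rename = rename or {}
    # Step 1: renames are applied before any selection happens.
    renamed = [rename.get(name, name) for name in columns]
    if select is None:
        return renamed  # default: every column is kept
    # Step 2: selection uses post-rename names; system columns always survive.
    keep = set(select) | set(system_columns)
    return [name for name in renamed if name in keep]


upstream_columns = ["old_name", "other_col", "unused", "sys_version"]

selected = apply_rename_then_select(
    upstream_columns,
    rename={"old_name": "new_name"},
    select=("new_name", "other_col"),
    system_columns=frozenset({"sys_version"}),
)

# Selecting by the pre-rename name matches nothing in this sketch.
stale = apply_rename_then_select(
    ["old_name"],
    rename={"old_name": "new_name"},
    select=("old_name",),
)
```

Here `selected` keeps `new_name`, `other_col`, and the assumed system column, while `stale` is empty because the old name no longer exists once the rename has run.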

Basic Usage
import metaxy as mx

# Keep all columns with default field mapping (1:1 lineage)
mx.FeatureDep(feature="upstream")

# Keep only specific columns
mx.FeatureDep(feature="upstream/feature", select=("col1", "col2"))

# Rename columns to avoid conflicts
mx.FeatureDep(feature="upstream/feature", rename={"old_name": "new_name"})

# Combined rename + select: select uses post-rename names
mx.FeatureDep(
    feature="upstream/feature",
    rename={"old_name": "new_name"},
    select=("new_name", "other_col"),
)

# SQL filters
mx.FeatureDep(feature="upstream", filters=["age >= 25", "status = 'active'"])

# Optional dependency (left join - samples preserved even if no match)
mx.FeatureDep(feature="enrichment/data", optional=True)
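
The effect of `optional=True` can be pictured as the difference between a left join and an inner join. The sketch below works on plain dictionaries, and `join_dependency` is a hypothetical helper. It assumes that a required dependency drops samples without an upstream match, which the description above implies but does not spell out.

```python
from typing import Any, Optional


def join_dependency(
    sample_ids: list[str],
    upstream: dict[str, dict[str, Any]],
    columns: tuple[str, ...],
    optional: bool = False,
) -> list[dict[str, Any]]:
    """Join upstream rows onto sample IDs (hypothetical helper, not metaxy API)."""
    joined: list[dict[str, Any]] = []
    for sample_id in sample_ids:
        row: Optional[dict[str, Any]] = upstream.get(sample_id)
        if row is None and not optional:
            # Assumed behavior for a required dependency: no match, no sample.
            continue
        values = row if row is not None else {}
        # Missing upstream values become None, the analogue of SQL NULL.
        joined.append({"id": sample_id, **{col: values.get(col) for col in columns}})
    return joined


samples = ["a", "b", "c"]
enrichment = {"a": {"score": 0.9}, "c": {"score": 0.4}}

required_rows = join_dependency(samples, enrichment, ("score",))
optional_rows = join_dependency(samples, enrichment, ("score",), optional=True)
```

With the required dependency, sample `b` disappears; with the optional one, it is preserved with `score` set to `None`.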

Lineage Relationships
import metaxy as mx
from metaxy.models.lineage import LineageRelationship

# Aggregation: many sensor readings aggregate to one hourly stat
mx.FeatureDep(feature="sensor_readings", lineage=LineageRelationship.aggregation(on=["sensor_id", "hour"]))

# Expansion: one video expands to many frames
mx.FeatureDep(feature="video", lineage=LineageRelationship.expansion(on=["video_id"]))

# Mixed lineage: aggregate from one parent, identity from another
# In FeatureSpec:
deps = [
    mx.FeatureDep(feature="readings", lineage=LineageRelationship.aggregation(on=["sensor_id"])),
    mx.FeatureDep(feature="sensor_info", lineage=LineageRelationship.identity()),
]
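
To make the role of `on` concrete, the following sketch groups plain row dictionaries by the `on` columns. It is not how metaxy resolves lineage internally. `group_rows` is a hypothetical helper, and the `reading_id` and `frame_id` columns are invented for the example.

```python
from collections import defaultdict
from typing import Any


def group_rows(rows: list[dict[str, Any]], on: list[str]) -> dict[tuple, list[dict[str, Any]]]:
    """Group rows by the values of the `on` columns (hypothetical helper)."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in on)].append(row)
    return dict(groups)


# Aggregation (N:1): many parent readings collapse into one row per (sensor_id, hour).
readings = [
    {"sensor_id": "s1", "hour": 0, "reading_id": 1},
    {"sensor_id": "s1", "hour": 0, "reading_id": 2},
    {"sensor_id": "s1", "hour": 1, "reading_id": 3},
]
hourly_groups = group_rows(readings, on=["sensor_id", "hour"])

# Expansion (1:N): child frames that share a video_id share the same parent.
frames = [
    {"video_id": "v1", "frame_id": 0},
    {"video_id": "v1", "frame_id": 1},
    {"video_id": "v2", "frame_id": 0},
]
frames_by_video = group_rows(frames, on=["video_id"])
```

In the aggregation case each group corresponds to one downstream row. In the expansion case each group is the set of child rows that would share the same upstream provenance.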

Show JSON schema:
{
  "$defs": {
    "AggregationRelationship": {
      "description": "Many-to-one relationship where multiple parent rows aggregate to one child row.\n\nParent features have more granular ID columns than the child. The child aggregates\nmultiple parent rows by grouping on a subset of the parent's ID columns.\n\nConstruct this relationship via [`LineageRelationship.aggregation`][metaxy.models.lineage.LineageRelationship.aggregation] classmethod.\n\nAttributes:\n    on: Columns to group by for aggregation. These should be a subset of the\n        target feature's ID columns. If not specified, uses all target ID columns.\n\nExample:\n    ```python\n    mx.LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"])\n    ```",
      "properties": {
        "type": {
          "const": "N:1",
          "default": "N:1",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "anyOf": [
            {
              "items": {
                "type": "string"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Columns to group by for aggregation. Defaults to all target ID columns.",
          "title": "On"
        }
      },
      "title": "AggregationRelationship",
      "type": "object"
    },
    "AllFieldsMapping": {
      "description": "Field mapping that explicitly depends on all upstream fields.",
      "properties": {
        "type": {
          "const": "all",
          "default": "all",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "AllFieldsMapping",
      "type": "object"
    },
    "DefaultFieldsMapping": {
      "description": "Default automatic field mapping configuration.\n\nWhen used, automatically maps fields to matching upstream fields based on field keys.\n\nAttributes:\n    match_suffix: If True, allows suffix matching (e.g., \"french\" matches \"audio/french\")\n    exclude_fields: List of field keys to exclude from auto-mapping",
      "properties": {
        "type": {
          "const": "default",
          "default": "default",
          "title": "Type",
          "type": "string"
        },
        "match_suffix": {
          "default": false,
          "title": "Match Suffix",
          "type": "boolean"
        },
        "exclude_fields": {
          "items": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Exclude Fields",
          "type": "array"
        }
      },
      "title": "DefaultFieldsMapping",
      "type": "object"
    },
    "ExpansionRelationship": {
      "description": "One-to-many relationship where one parent row expands to multiple child rows.\n\nChild features have more granular ID columns than the parent. Each parent row\ngenerates multiple child rows with additional ID columns.\n\nConstruct this relationship via [`LineageRelationship.expansion`][metaxy.models.lineage.LineageRelationship.expansion] classmethod.\n\nAttributes:\n    on: Parent ID columns that identify the parent record. Child records with\n        the same parent IDs will share the same upstream provenance.\n        If not specified, will be inferred from the available columns.\n    id_generation_pattern: Optional pattern for generating child IDs.\n        Can be \"sequential\", \"hash\", or a custom pattern. If not specified,\n        the feature's load_input() method is responsible for ID generation.\n\nExample:\n    ```python\n    mx.LineageRelationship.expansion(on=[\"video_id\"], id_generation_pattern=\"sequential\")\n    ```",
      "properties": {
        "type": {
          "const": "1:N",
          "default": "1:N",
          "title": "Type",
          "type": "string"
        },
        "on": {
          "description": "Parent ID columns for grouping. Child records with same parent IDs share provenance. Required for expansion relationships.",
          "items": {
            "type": "string"
          },
          "title": "On",
          "type": "array"
        },
        "id_generation_pattern": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "Pattern for generating child IDs. If None, handled by load_input().",
          "title": "Id Generation Pattern"
        }
      },
      "required": [
        "on"
      ],
      "title": "ExpansionRelationship",
      "type": "object"
    },
    "FeatureKey": {
      "description": "Feature key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FeatureKey(\"a/b/c\")  # String format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey([\"a\", \"b\", \"c\"])  # List format\n    # FeatureKey(parts=['a', 'b', 'c'])\n\n    FeatureKey(FeatureKey([\"a\", \"b\", \"c\"]))  # FeatureKey copy\n    # FeatureKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FeatureKey",
      "type": "array"
    },
    "FieldKey": {
      "description": "Field key as a sequence of string parts.\n\nHashable for use as dict keys in registries.\nParts cannot contain forward slashes (/) or double underscores (__).\n\nExample:\n\n    ```py\n    FieldKey(\"a/b/c\")  # String format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey([\"a\", \"b\", \"c\"])  # List format\n    # FieldKey(parts=['a', 'b', 'c'])\n\n    FieldKey(FieldKey([\"a\", \"b\", \"c\"]))  # FieldKey copy\n    # FieldKey(parts=['a', 'b', 'c'])\n    ```",
      "items": {
        "type": "string"
      },
      "title": "FieldKey",
      "type": "array"
    },
    "FieldsMapping": {
      "description": "Base class for field mapping configurations.\n\nField mappings define how a field automatically resolves its dependencies\nbased on upstream feature fields. This is separate from explicit field\ndependencies which are defined directly.",
      "properties": {
        "mapping": {
          "discriminator": {
            "mapping": {
              "all": "#/$defs/AllFieldsMapping",
              "default": "#/$defs/DefaultFieldsMapping",
              "none": "#/$defs/NoneFieldsMapping",
              "specific": "#/$defs/SpecificFieldsMapping"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/AllFieldsMapping"
            },
            {
              "$ref": "#/$defs/SpecificFieldsMapping"
            },
            {
              "$ref": "#/$defs/NoneFieldsMapping"
            },
            {
              "$ref": "#/$defs/DefaultFieldsMapping"
            }
          ],
          "title": "Mapping"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "FieldsMapping",
      "type": "object"
    },
    "IdentityRelationship": {
      "description": "One-to-one relationship where each child row maps to exactly one parent row.\n\nThis is the default relationship type. Parent and child features share the same\nID columns and have the same cardinality.\n\nConstruct this relationship via [`LineageRelationship.identity`][metaxy.models.lineage.LineageRelationship.identity] classmethod.\n\nExample:\n    ```python\n    mx.LineageRelationship.identity()\n    ```",
      "properties": {
        "type": {
          "const": "1:1",
          "default": "1:1",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "IdentityRelationship",
      "type": "object"
    },
    "LineageRelationship": {
      "description": "Wrapper class for lineage relationship configurations with convenient constructors.\n\nThis provides a cleaner API for creating lineage relationships while maintaining\ntype safety through discriminated unions.",
      "properties": {
        "relationship": {
          "discriminator": {
            "mapping": {
              "1:1": "#/$defs/IdentityRelationship",
              "1:N": "#/$defs/ExpansionRelationship",
              "N:1": "#/$defs/AggregationRelationship"
            },
            "propertyName": "type"
          },
          "oneOf": [
            {
              "$ref": "#/$defs/IdentityRelationship"
            },
            {
              "$ref": "#/$defs/AggregationRelationship"
            },
            {
              "$ref": "#/$defs/ExpansionRelationship"
            }
          ],
          "title": "Relationship"
        }
      },
      "required": [
        "relationship"
      ],
      "title": "LineageRelationship",
      "type": "object"
    },
    "NoneFieldsMapping": {
      "description": "Field mapping that never matches any upstream fields.",
      "properties": {
        "type": {
          "const": "none",
          "default": "none",
          "title": "Type",
          "type": "string"
        }
      },
      "title": "NoneFieldsMapping",
      "type": "object"
    },
    "SpecificFieldsMapping": {
      "description": "Field mapping that explicitly depends on specific upstream fields.",
      "properties": {
        "type": {
          "const": "specific",
          "default": "specific",
          "title": "Type",
          "type": "string"
        },
        "mapping": {
          "additionalProperties": {
            "items": {
              "$ref": "#/$defs/FieldKey"
            },
            "type": "array",
            "uniqueItems": true
          },
          "propertyNames": {
            "$ref": "#/$defs/FieldKey"
          },
          "title": "Mapping",
          "type": "object"
        }
      },
      "required": [
        "mapping"
      ],
      "title": "SpecificFieldsMapping",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Feature dependency specification with optional column selection, renaming, and lineage.\n\nAttributes:\n    feature: The feature key to depend on. Accepts string (\"a/b/c\"), list ([\"a\", \"b\", \"c\"]),\n        FeatureKey instance, or BaseFeature class.\n    select: Optional sequence of column names to select from the upstream feature.\n        By default, all columns are selected. System columns are always selected.\n        Uses post-rename names when `rename` is also specified.\n    rename: Optional mapping of old column names to new names.\n        Applied before column selection.\n    fields_mapping: Optional field mapping configuration for automatic field dependency resolution.\n        When provided, fields without explicit deps will automatically map to matching upstream fields.\n        Defaults to using `[FieldsMapping.default()][metaxy.models.fields_mapping.DefaultFieldsMapping]`.\n    filters: Optional SQL-like filter strings applied to this dependency. Automatically parsed into\n        Narwhals expressions (accessible via the `filters` property). Filters are automatically\n        applied by FeatureDepTransformer after renames during all FeatureDep operations (including\n        resolve_update and version computation).\n    lineage: The lineage relationship between this upstream dependency and the downstream feature.\n        - `LineageRelationship.identity()` (default): 1:1 relationship, same cardinality\n        - `LineageRelationship.aggregation(on=...)`: N:1, multiple upstream rows aggregate to one downstream\n        - `LineageRelationship.expansion(on=...)`: 1:N, one upstream row expands to multiple downstream rows\n    optional: Whether individual samples of the downstream feature can be computed without\n        the corresponding samples of the upstream feature. If upstream samples are missing,\n        they are going to be represented as NULL values in the joined upstream metadata.\n        Defaults to False (required dependency).\n\nExample: Basic Usage\n    ```py\n    # Keep all columns with default field mapping (1:1 lineage)\n    mx.FeatureDep(feature=\"upstream\")\n\n    # Keep only specific columns\n    mx.FeatureDep(feature=\"upstream/feature\", select=(\"col1\", \"col2\"))\n\n    # Rename columns to avoid conflicts\n    mx.FeatureDep(feature=\"upstream/feature\", rename={\"old_name\": \"new_name\"})\n\n    # Combined rename + select: select uses post-rename names\n    mx.FeatureDep(\n        feature=\"upstream/feature\",\n        rename={\"old_name\": \"new_name\"},\n        select=(\"new_name\", \"other_col\"),\n    )\n\n    # SQL filters\n    mx.FeatureDep(feature=\"upstream\", filters=[\"age >= 25\", \"status = 'active'\"])\n\n    # Optional dependency (left join - samples preserved even if no match)\n    mx.FeatureDep(feature=\"enrichment/data\", optional=True)\n    ```\n\nExample: Lineage Relationships\n    ```py\n    from metaxy.models.lineage import LineageRelationship\n\n    # Aggregation: many sensor readings aggregate to one hourly stat\n    mx.FeatureDep(feature=\"sensor_readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\", \"hour\"]))\n\n    # Expansion: one video expands to many frames\n    mx.FeatureDep(feature=\"video\", lineage=LineageRelationship.expansion(on=[\"video_id\"]))\n\n    # Mixed lineage: aggregate from one parent, identity from another\n    # In FeatureSpec:\n    deps = [\n        mx.FeatureDep(feature=\"readings\", lineage=LineageRelationship.aggregation(on=[\"sensor_id\"])),\n        mx.FeatureDep(feature=\"sensor_info\", lineage=LineageRelationship.identity()),\n    ]\n    ```",
  "properties": {
    "feature": {
      "$ref": "#/$defs/FeatureKey",
      "description": "Feature key. Accepts a slashed string ('a/b/c'), a sequence of strings, a FeatureKey instance, or a child class of BaseFeature"
    },
    "select": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Select"
    },
    "rename": {
      "anyOf": [
        {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Rename"
    },
    "fields_mapping": {
      "$ref": "#/$defs/FieldsMapping"
    },
    "filters": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "SQL-like filter strings applied to this dependency.",
      "title": "Filters"
    },
    "lineage": {
      "$ref": "#/$defs/LineageRelationship",
      "description": "Lineage relationship between this upstream dependency and the downstream feature."
    },
    "optional": {
      "default": false,
      "description": "Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.",
      "title": "Optional",
      "type": "boolean"
    }
  },
  "required": [
    "feature"
  ],
  "title": "FeatureDep",
  "type": "object"
}

Config:

  • extra: forbid

Fields:

Validators:

  • validate_select_uses_post_rename_names

Source code in src/metaxy/models/feature_spec.py
def __init__(
    self,
    *,
    feature: str | Sequence[str] | FeatureKey | type[BaseFeature],
    select: tuple[str, ...] | None = None,
    rename: dict[str, str] | None = None,
    fields_mapping: FieldsMapping | None = None,
    filters: Sequence[str] | None = None,
    lineage: LineageRelationship | None = None,
    optional: bool = False,
) -> None: ...
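
The schema description states that `rename` is applied before column selection and that `select` uses post-rename names. The ordering can be sketched in plain Python as below. `apply_rename_then_select` is a hypothetical helper written for illustration and is not part of metaxy; it ignores system columns (which are always selected), and the error it raises is only a guess at the kind of check the `validate_select_uses_post_rename_names` validator might perform.

```python
from collections.abc import Mapping, Sequence
from typing import Optional


def apply_rename_then_select(
    columns: Sequence[str],
    rename: Optional[Mapping[str, str]] = None,
    select: Optional[Sequence[str]] = None,
) -> list[str]:
    """Hypothetical illustration: rename first, then select by post-rename names."""
    rename = rename or {}
    # Step 1: apply renames; columns without a mapping keep their names.
    renamed = [rename.get(col, col) for col in columns]
    if select is None:
        # No selection means every (renamed) column is kept.
        return renamed
    # Step 2: the selection must refer to names that exist after renaming.
    unknown = [name for name in select if name not in renamed]
    if unknown:
        raise ValueError(f"select refers to names missing after rename: {unknown}")
    return [name for name in renamed if name in select]


# Made-up upstream columns, mirroring the "Combined rename + select" example.
upstream_columns = ["old_name", "other_col", "unused"]
kept_columns = apply_rename_then_select(
    upstream_columns,
    rename={"old_name": "new_name"},
    select=("new_name", "other_col"),
)
```

Here `kept_columns` holds `new_name` and `other_col`. Passing `select=("old_name",)` with the same rename raises `ValueError` in this sketch, because `old_name` no longer exists once the rename has been applied.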

Attributes

metaxy.FeatureDep.sql_filters pydantic-field

sql_filters: tuple[str, ...] | None = None

SQL-like filter strings applied to this dependency.

metaxy.FeatureDep.lineage pydantic-field

Lineage relationship between this upstream dependency and the downstream feature.

metaxy.FeatureDep.optional pydantic-field

optional: bool = False

Whether individual samples of the downstream feature can be computed without the corresponding samples of the upstream feature. If upstream samples are missing, they are going to be represented as NULL values in the joined upstream metadata.
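
The effect of `optional` resembles the difference between an inner join and a left join (the schema's own example describes it as a left join). The sketch below shows that difference with plain dictionaries keyed by sample ID. `join_upstream` and the sample data are hypothetical; metaxy's actual join logic is not shown in this section.

```python
from typing import Any


def join_upstream(
    sample_ids: list[str],
    upstream: dict[str, dict[str, Any]],
    upstream_columns: list[str],
    optional: bool = False,
) -> list[dict[str, Any]]:
    """Hypothetical illustration of required vs. optional dependencies."""
    rows = []
    for sample_id in sample_ids:
        match = upstream.get(sample_id)
        if match is None:
            if not optional:
                # Required dependency: a sample without upstream data is dropped.
                continue
            # Optional dependency: keep the sample, fill upstream columns with NULLs.
            match = {col: None for col in upstream_columns}
        rows.append({"id": sample_id, **match})
    return rows


# Made-up data: upstream metadata exists for sample "a" only.
upstream_metadata = {"a": {"score": 0.9}}

required_rows = join_upstream(["a", "b"], upstream_metadata, ["score"])
optional_rows = join_upstream(["a", "b"], upstream_metadata, ["score"], optional=True)
```

With the required dependency only sample `a` survives. With `optional=True` both samples are kept, and `b` carries `None` in the `score` column.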

metaxy.FeatureDep.filters cached property

filters: tuple[Expr, ...]

Parse sql_filters into Narwhals expressions.
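
To give a feel for what turning a filter string into an executable expression involves, here is a deliberately small stand-in. It is not metaxy's parser and produces plain Python predicates rather than Narwhals expressions. It only understands `column op literal` strings such as the `"age >= 25"` and `"status = 'active'"` examples from the schema, and combining several filters with a logical AND is an assumption. `parse_filter` is a hypothetical name.

```python
import operator
import re
from typing import Any, Callable

# Comparison operators supported by this toy parser.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def parse_filter(text: str) -> Callable[[dict[str, Any]], bool]:
    """Hypothetical toy parser: 'column op literal' -> row predicate."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported filter: {text!r}")
    column, op, raw = match.groups()
    if len(raw) >= 2 and raw.startswith("'") and raw.endswith("'"):
        literal: Any = raw[1:-1]  # single-quoted string literal
    else:
        literal = float(raw) if "." in raw else int(raw)  # numeric literal
    compare = _OPS[op]
    return lambda row: compare(row[column], literal)


predicates = [parse_filter(f) for f in ["age >= 25", "status = 'active'"]]
rows = [
    {"age": 30, "status": "active"},
    {"age": 20, "status": "active"},
    {"age": 40, "status": "inactive"},
]
# Assumption: multiple filters are combined with AND.
filtered_rows = [row for row in rows if all(p(row) for p in predicates)]
```

Only the first row satisfies both predicates. A real SQL-like parser handles far more (boolean operators, functions, NULL semantics), which this sketch leaves out.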

Functions

metaxy.FeatureDep.table_name

table_name() -> str

Get the SQL-like table name of the upstream feature this dependency refers to.

Source code in src/metaxy/models/feature_spec.py
def table_name(self) -> str:
    """Get the SQL-like table name of the upstream feature this dependency refers to."""
    return self.feature.table_name
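
The `FeatureKey` schema says that key parts cannot contain forward slashes or double underscores. The sketch below shows that rule together with one way a table name could be derived from the parts. Both helper names are hypothetical, and joining the parts with `__` is an assumption suggested by the double-underscore restriction; it is not confirmed by this section.

```python
from collections.abc import Sequence
from typing import Union


def parse_key(value: Union[str, Sequence[str]]) -> list[str]:
    """Hypothetical helper: split 'a/b/c' or accept a sequence, then validate parts."""
    parts = value.split("/") if isinstance(value, str) else list(value)
    for part in parts:
        # Rule stated in the FeatureKey description.
        if "/" in part or "__" in part:
            raise ValueError(f"invalid key part: {part!r}")
    return parts


def assumed_table_name(parts: Sequence[str]) -> str:
    """ASSUMPTION: parts are joined with '__'; the real rule is not shown here."""
    return "__".join(parts)


key_parts = parse_key("a/b/c")
table = assumed_table_name(key_parts)
```

Under that assumption, forbidding `__` inside a part keeps the mapping from key to table name unambiguous, since the separator can never appear within a part.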