fix(curate): fail fast on list-typed label_column (regression) by mdressman · Pull Request #7 · mdressman/dataset-scout

mdressman · 2026-05-11T23:31:01Z

What & why

Catches a regression where the strategy assessor sometimes infers `label_column` from the column name alone. One example is HarmBench's `positive` column, which is actually a list<string> of completion examples, not a scalar label. Before this change, curate silently wrote 0 rows, with only a generic "recipe column mismatch" note in report.md.

After this change, the component is skipped with a clear label_column_type_mismatch failure category surfaced in both lockfile.json and report.md, with the observed type, a sample value, and an actionable hint.
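The skip-and-classify behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ComponentFailure`, `classify_component_failure`, `curate_components`, the `unclassified` fallback category, and the hint wording are all hypothetical, and the exception constructors are assumed. Only the class names `DatasetScoutError` and `LabelColumnTypeMismatch` and the `label_column_type_mismatch` category come from this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable


class DatasetScoutError(Exception):
    """Stand-in for the project's base error; the real definition is not shown here."""


class LabelColumnTypeMismatch(DatasetScoutError):
    """Assumed constructor: the PR only says which details the failure reports."""

    def __init__(self, column: str, observed_type: str, sample: Any) -> None:
        super().__init__(f"{column!r} holds {observed_type} values, e.g. {sample!r}")
        self.column = column
        self.observed_type = observed_type
        self.sample = sample


@dataclass
class ComponentFailure:
    """Hypothetical record shape for what lockfile.json and report.md would surface."""

    component: str
    category: str
    detail: str
    hint: str


def classify_component_failure(component: str, exc: DatasetScoutError) -> ComponentFailure:
    """Map an exception to a failure category (illustrative routing only)."""
    if isinstance(exc, LabelColumnTypeMismatch):
        return ComponentFailure(
            component,
            "label_column_type_mismatch",
            str(exc),
            # Hint wording is invented for this sketch.
            "Point label_column at a scalar column whose values appear in label_value_map.",
        )
    # "unclassified" is a placeholder category, not one taken from the project.
    return ComponentFailure(component, "unclassified", str(exc), "")


def curate_components(
    components: dict[str, Callable[[], int]],
) -> tuple[dict[str, int], list[ComponentFailure]]:
    """Materialise each component; skip and record the ones that fail."""
    written: dict[str, int] = {}
    failures: list[ComponentFailure] = []
    for name, materialise in components.items():
        try:
            written[name] = materialise()
        except DatasetScoutError as exc:
            # One bad component is skipped; the rest of the run continues.
            failures.append(classify_component_failure(name, exc))
    return written, failures
```

The point of the design is that a bad recipe for one component becomes an explicit, categorised entry in the outputs rather than an empty result that looks like success.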

How it works

Probes the first 20 rows of every component whose recipe references a text_column or label_column.
A row is bad if its label_column value is missing, None, a container type (list/dict/tuple/set), or a scalar whose str(...) form isn't a key in label_value_map.
If >50% of probed rows are bad, raises LabelColumnTypeMismatch (subclass of DatasetScoutError) with the column name, observed type, sample value, and bad fraction. _classify_component_failure routes it to the new label_column_type_mismatch category with a recipe-fixing hint.
The probe buffer is re-yielded into the row loop, so no rows are dropped on the happy path.
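The steps above can be sketched as a single helper. This is an assumption-laden illustration rather than the PR's implementation: `is_bad_label`, `probe_rows`, the two constants, and the exception constructor signature are hypothetical names chosen here. The 20-row probe, the bad-row rule, the >50% threshold, and the two exception class names are taken from the description.

```python
from collections.abc import Iterable, Iterator, Mapping
from itertools import chain, islice
from typing import Any

PROBE_ROWS = 20            # probe size stated in the PR
BAD_FRACTION_LIMIT = 0.5   # raise only when MORE than 50% of probed rows are bad


class DatasetScoutError(Exception):
    """Stand-in for the project's base error; the real definition is not shown here."""


class LabelColumnTypeMismatch(DatasetScoutError):
    """Assumed constructor carrying the four details the PR says are reported."""

    def __init__(self, column: str, observed_type: str, sample: Any, bad_fraction: float) -> None:
        super().__init__(
            f"label_column {column!r}: observed {observed_type}, "
            f"sample={sample!r}, bad fraction={bad_fraction:.0%}"
        )
        self.column = column
        self.observed_type = observed_type
        self.sample = sample
        self.bad_fraction = bad_fraction


def is_bad_label(row: Mapping[str, Any], column: str, label_value_map: Mapping[str, Any]) -> bool:
    """Bad-row rule: missing, None, a container, or a scalar not in label_value_map."""
    if column not in row or row[column] is None:
        return True
    value = row[column]
    if isinstance(value, (list, dict, tuple, set)):
        return True
    return str(value) not in label_value_map


def probe_rows(
    rows: Iterable[Mapping[str, Any]],
    column: str,
    label_value_map: Mapping[str, Any],
) -> Iterator[Mapping[str, Any]]:
    """Check the first PROBE_ROWS rows, then hand back every row, probe included."""
    iterator = iter(rows)
    probe = list(islice(iterator, PROBE_ROWS))
    bad = [row for row in probe if is_bad_label(row, column, label_value_map)]
    if probe and len(bad) / len(probe) > BAD_FRACTION_LIMIT:
        sample = bad[0].get(column)
        raise LabelColumnTypeMismatch(
            column, type(sample).__name__, sample, len(bad) / len(probe)
        )
    # Re-yield the buffered rows first so the happy path loses nothing.
    return chain(probe, iterator)
```

Because `probe_rows` is a plain function that returns an iterator, not a generator function, the check runs and raises at call time, before any row is written. A generator version would defer the error until the first row is requested.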

Relationship to #3

Complementary, not redundant. #3 catches column-name hallucinations (the column doesn't exist). This PR catches value-type hallucinations (the column exists but its values are the wrong type). Both validations now run off the same probe buffer:

Column-name existence check on probe[0] (from #3, "FM1+FM5 mitigations: column verification, schema validation, label distribution warnings; Performance Optimizations")
Value-type check across the full probe (this PR)

Rebased the original commit cleanly onto the post-#3 main and merged the two validations into a single probe pass.

Behaviour change?

Yes. The user-visible change: a component that previously wrote 0 rows silently is now skipped with an explicit failure category and a recipe-fix hint.

How tested

`uv run pytest -m unit`: 570 passed locally
`uv run ruff check .`: clean
`uv run mypy`: clean
New regression test mirrors the HarmBench failure mode (test_curate_skips_component_when_label_column_is_list_typed)

Honest-limits impact

None. This narrows a known honest-limit (the assessor mis-inferring schema). No README change is needed: the only effect is the expected reduction in silent failures.

Strategy assessor previously inferred label_column=positive for HarmBench-style datasets where `positive` is a list<string> column of completion examples, not a string label. curate then silently wrote 0 rows.

Adds per-component-skip with a clear `label_column_type_mismatch` failure (surfaced in lockfile + report.md, mirroring the existing failure-classification pattern) and a regression test mirroring the failure mode.

Bug reproduces on main: a recipe with a list-typed label_column materialises 0 rows silently with only a generic 'usually a recipe column mismatch' note in report.md. After this change, the component is skipped with category=label_column_type_mismatch citing the observed type and a sample value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mdressman wants to merge 1 commit into main from users/mdressman/curate-label-column-validation
