Scenario validator does not check literal option compatibility in 'when' expressions #118
Summary
Scenario validation checks expression syntax and unknown attribute names, but does not validate that compared string literals are valid categorical options for the referenced attribute.
This allows case/value mismatches (e.g., `urban_rural == 'urban'` when the option is `Urban`) to pass validation and then silently no-op at runtime.
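A minimal sketch of why such a condition slips through. It assumes `when` clauses are Python-syntax expressions evaluated against an agent's attribute dict. Python's built-in `eval` stands in for the real runtime evaluator, and the agent data are hypothetical. Both existing checks (syntax and attribute names) pass, yet the condition can never be true:

```python
import ast

# Hypothetical data: the categorical option is spelled "Urban".
KNOWN_ATTRIBUTES = {"urban_rural"}
agent = {"urban_rural": "Urban"}
condition = "urban_rural == 'urban'"  # lowercase literal: a case mismatch

# Existing check 1 (syntax): ast.parse raises SyntaxError on malformed input.
tree = ast.parse(condition, mode="eval")

# Existing check 2 (references): every bare name is a known attribute.
names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
assert names <= KNOWN_ATTRIBUTES, f"unknown attributes: {names - KNOWN_ATTRIBUTES}"

# Runtime stand-in: evaluate with no builtins, resolving names from the agent.
matched = eval(compile(tree, "<when>", "eval"), {"__builtins__": {}}, agent)
print(matched)  # False: the rule never fires, and nothing reports why
```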
Why This Matters
This is a major source of "looks valid but behavior is wrong" bugs. Incorrect conditions don’t throw hard errors during simulation; they simply never match, so the affected rules never fire and the dynamics flatten out.
Current Behavior (Code)
In `/Users/adithyasrinivasan/Projects/extropy/extropy/scenario/validator.py`:
- syntax check: `validate_expression_syntax(...)`
- reference check: `extract_names_from_expression(...)` vs. known attributes
- no check that literals used in comparisons are present in the attribute's option domain
By contrast, population semantic validation already has this concept via AST comparison extraction.
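The population validator's own extraction helper is not shown here, so the following is a standalone sketch of the same idea using Python's `ast` module; `extract_string_comparisons` is a hypothetical name. It collects, per attribute, the string literals used in `==`/`!=` comparisons (in either operand order) and in `in`/`not in` membership tests against list, tuple, or set literals:

```python
import ast
from collections import defaultdict


def extract_string_comparisons(expr: str) -> dict[str, set[str]]:
    """Map each attribute name to the string literals it is compared against.

    Handles `attr == 'x'`, `attr != 'x'`, the reversed form `'x' == attr`,
    and membership tests `attr in ['x', 'y']` / `attr not in ('x',)`.
    """
    tree = ast.parse(expr, mode="eval")
    found: dict[str, set[str]] = defaultdict(set)

    for node in ast.walk(tree):
        if not isinstance(node, ast.Compare):
            continue
        # A chained comparison like `a == b == c` is split into pairwise
        # (left, op, right) steps.
        operands = [node.left, *node.comparators]
        for left, op, right in zip(operands, node.ops, operands[1:]):
            if isinstance(op, (ast.Eq, ast.NotEq)):
                # Accept either orientation: attr == 'x' or 'x' == attr.
                for name_node, lit_node in ((left, right), (right, left)):
                    if (
                        isinstance(name_node, ast.Name)
                        and isinstance(lit_node, ast.Constant)
                        and isinstance(lit_node.value, str)
                    ):
                        found[name_node.id].add(lit_node.value)
            elif isinstance(op, (ast.In, ast.NotIn)):
                if isinstance(left, ast.Name) and isinstance(
                    right, (ast.List, ast.Tuple, ast.Set)
                ):
                    for elt in right.elts:
                        if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                            found[left.id].add(elt.value)
    return dict(found)


result = extract_string_comparisons(
    "urban_rural == 'urban' and region in ['North', 'South']"
)
print({attr: sorted(vals) for attr, vals in result.items()})
# {'urban_rural': ['urban'], 'region': ['North', 'South']}
```

Non-string comparisons such as `age > 30` are ignored, which is what lets non-categorical attributes pass through untouched.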
Proposed Fix
- Add AST-based comparison extraction for scenario `when` clauses:
  - `seed_exposure.rules[].when`
  - timeline exposure rules: `timeline[].exposure_rules[].when` (if present)
  - `spread.share_modifiers[].when`
- For each `(attribute, compared_string_values)` pair:
  - if the attribute is categorical with known options, require each literal value to match one of those options
  - report invalid literals as `ERROR` (not a warning), because the rule is effectively broken
- Handle list membership checks (`in [...]`) and single comparisons (`==`, `!=`).
- Add tests covering:
  - exact match passes
  - case mismatch fails
  - nonexistent value fails
  - non-categorical attributes are skipped
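A sketch of the per-pair check, assuming comparisons have already been extracted into an attribute-to-literals mapping and that categorical options are available as a dict of attribute to option list. `LiteralIssue`, `check_literals`, the location string, and the example attributes are hypothetical, not extropy's API. Attributes without known options are skipped rather than flagged, since unknown names are already handled by the existing reference check:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteralIssue:
    severity: str  # "ERROR": a literal outside the domain makes the rule dead
    location: str  # e.g. "seed_exposure.rules[0].when"
    message: str


def check_literals(
    location: str,
    comparisons: dict[str, set[str]],
    categorical_options: dict[str, list[str]],
) -> list[LiteralIssue]:
    """Return one ERROR per compared literal that is not a declared option.

    `comparisons` maps attribute -> string literals compared against it.
    `categorical_options` maps each categorical attribute to its options;
    attributes missing from it are treated as non-categorical and skipped.
    """
    issues: list[LiteralIssue] = []
    for attribute in sorted(comparisons):
        options = categorical_options.get(attribute)
        if options is None:
            continue  # non-categorical or no known options: nothing to check
        # Exact, case-sensitive membership: "urban" does not match "Urban".
        for literal in sorted(set(comparisons[attribute]) - set(options)):
            issues.append(
                LiteralIssue(
                    severity="ERROR",
                    location=location,
                    # Include the valid option set so the fix is obvious.
                    message=(
                        f"{location}: {literal!r} is not a valid option for "
                        f"{attribute!r}; valid options: {options}"
                    ),
                )
            )
    return issues


# Hypothetical domain for illustration.
OPTIONS = {"urban_rural": ["Urban", "Suburban", "Rural"]}
LOC = "seed_exposure.rules[0].when"

issues = check_literals(LOC, {"urban_rural": {"urban"}}, OPTIONS)
print(issues[0].message)
# seed_exposure.rules[0].when: 'urban' is not a valid option for 'urban_rural'; valid options: ['Urban', 'Suburban', 'Rural']
```

Against the four test cases listed above: `{"urban_rural": {"Urban"}}` returns `[]`, `{"urban"}` and `{"Metro"}` each return one `ERROR`, and an attribute with no declared options (e.g. a free-text `county_name`) returns `[]`.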
Acceptance Criteria
- Invalid categorical literals in scenario conditions are caught at validation time.
- Common mismatch classes (case/style/legacy tokens) no longer survive to runtime.
- Validation message includes valid option set for quick fix.
Pipeline Impact
Reduces repeated debug-and-rerun loops by catching mismatches between scenario literals and attribute domains before sampling and simulation.