CD001-CTF-001: CTF Detector Unit Tests #137

@steadhac

Parent: Unit tests creation for CD001 #27

Description

Add a full unit test suite for the CTF detector layer: definition loading, the detector registry, detector primitives, and all six detector implementations. Each test follows the established pattern of Title, Basically question, Steps, Expected Results, and Impact sections. A bug-exposing test is included for each confirmed production defect.
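As a rough illustration of that docstring pattern, a test might look like the sketch below. The function under test (`is_within_limit`) and the docstring wording are hypothetical and exist only to show the five sections; they are not taken from the codebase.

```python
def is_within_limit(amount: float, limit: float) -> bool:
    """Hypothetical function under test, used only for illustration."""
    return amount <= limit


def test_amount_within_limit() -> None:
    """
    Title: Amount within limit

    Basically question: Does an amount equal to the limit count as within it?

    Steps:
        1. Call is_within_limit with an amount equal to the limit.

    Expected Results:
        The call returns True.

    Impact:
        A wrong answer here would flag legitimate approvals as bypasses.
    """
    assert is_within_limit(100.0, 100.0) is True
```

Keeping the sections in the docstring means they show up in `pytest -v` tooling and code review without a separate test-plan document.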


New test files

tests/unit/ctf/test_definition_loader.py

Validates challenge YAML loading, schema enforcement, and detector instantiation from config.

Test ID Title
DEF-LDR-001 No challenges dir returns empty
DEF-LDR-002 Loads challenge from YAML
DEF-LDR-003 Bad YAML is skipped
DEF-LDR-004 Multiple challenge files
DEF-LDR-005 No badges dir returns empty
DEF-LDR-006 Loads badge from YAML
DEF-LDR-007 load_all returns combined dict
DEF-LDR-008 load_all with empty dirs
DEF-LDR-009 load_challenge_yaml returns schema
DEF-LDR-010 load_badge_yaml returns schema
DEF-LDR-011 Challenge validation error propagates
DEF-LDR-012 Challenge with all optional fields
DEF-LDR-013 SQLite upsert executes
DEF-LDR-014 PostgreSQL upsert executes
DEF-LDR-015 Unknown dialect uses merge
DEF-LDR-016 Upsert badge SQLite
DEF-LDR-017 get_loader returns instance
DEF-LDR-018 get_loader is singleton
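
The directory-handling cases (DEF-LDR-001, DEF-LDR-003, DEF-LDR-004) can be sketched roughly as follows. This is a minimal stand-in, not the project's loader: `load_challenges` is a hypothetical name, and `json.loads` substitutes for a YAML parser purely so the example stays within the standard library.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def load_challenges(
    directory: Path,
    parse: Callable[[str], Any] = json.loads,  # stand-in for a YAML parser
    suffix: str = ".json",                     # a real loader would look for YAML files
) -> Dict[str, Any]:
    """Hypothetical loader returning {file stem: parsed definition}."""
    if not directory.is_dir():
        return {}  # missing directory yields an empty result
    loaded: Dict[str, Any] = {}
    for path in sorted(directory.glob(f"*{suffix}")):
        try:
            loaded[path.stem] = parse(path.read_text(encoding="utf-8"))
        except ValueError:
            continue  # unparseable files are skipped, not fatal
    return loaded


# Usage: one good file, one bad file, and a directory that does not exist.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.json").write_text('{"id": "c1"}', encoding="utf-8")
    (root / "bad.json").write_text("{not valid", encoding="utf-8")
    missing_result = load_challenges(root / "missing")
    mixed_result = load_challenges(root)
```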

tests/unit/ctf/test_detector_registry.py

Covers @register_detector decorator, duplicate registration guards, and registry lookup behaviour.

Test ID Title
REG-DEC-001 Decorated class is identical to original
REG-DEC-002 Subclass-only method accessible on instance
REG-DEC-003 Return annotation uses TypeVar not BaseDetector

tests/unit/ctf/test_detector_primitives.py

Full coverage of the detector building blocks.

PatternMatchDetector + helpers — PRM-PAT-001 through PRM-PAT-028

Test ID Title
PRM-PAT-001 Empty text returns False
PRM-PAT-002 Empty pattern returns False
PRM-PAT-003 Case-insensitive literal match
PRM-PAT-004 Case-sensitive no match
PRM-PAT-005 Case-sensitive match
PRM-PAT-006 Regex match
PRM-PAT-007 Invalid regex falls back to literal
PRM-PAT-008 Context in middle
PRM-PAT-009 Context at start
PRM-PAT-010 Context at end
PRM-PAT-011 String pattern is literal
PRM-PAT-012 Dict with regex key
PRM-PAT-013 Dict without regex key
PRM-PAT-014 Empty text returns no matches
PRM-PAT-015 Multiple patterns returns all matches
PRM-PAT-016 No match returns empty
PRM-PAT-017 Regex pattern in list
PRM-PAT-018 Config missing field raises
PRM-PAT-019 Config missing patterns raises
PRM-PAT-020 Empty patterns raises
PRM-PAT-021 Invalid match_mode raises
PRM-PAT-022 Field missing from event
PRM-PAT-023 Non-string field coerced
PRM-PAT-024 any mode — one match sufficient
PRM-PAT-025 all mode — requires all matches
PRM-PAT-026 all mode — all match
PRM-PAT-027 No match returns not detected
PRM-PAT-028 [BUG #129] Valid regex non-match must not fall through to literal search
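
PRM-PAT-007 and PRM-PAT-028 together pin down when the literal fallback may run: only when the regex fails to compile, never when a valid regex simply does not match. A minimal sketch of that behaviour, assuming a hypothetical helper named `pattern_matches` (the real helper's name and signature are not given in this issue):

```python
import re


def pattern_matches(
    text: str,
    pattern: str,
    *,
    is_regex: bool = False,
    case_sensitive: bool = False,
) -> bool:
    """Hypothetical matcher illustrating the intended fallback rule."""
    if not text or not pattern:
        return False
    if is_regex:
        flags = 0 if case_sensitive else re.IGNORECASE
        try:
            # A valid regex decides the outcome by itself, match or no match.
            return re.search(pattern, text, flags) is not None
        except re.error:
            pass  # invalid regex: fall through to the literal search below
    if case_sensitive:
        return pattern in text
    return pattern.lower() in text.lower()
```

For example, the regex `a+b` does not match the text `a+b`, while a literal search would. A matcher that falls through on non-match reports a detection in that case, which is presumably the kind of false positive issue #129 describes.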

ToolCallDetector + _check_condition operators — PRM-TOL-001 through PRM-TOL-019

Test ID Title
PRM-TOL-001 Missing tool_name raises
PRM-TOL-002 Wrong tool name
PRM-TOL-003 Tool name match detected
PRM-TOL-004 require_success skips non-success
PRM-TOL-005 require_success passes on success event
PRM-TOL-006 JSON string tool args parsed
PRM-TOL-007 Invalid JSON tool args not detected
PRM-TOL-008 Parameter condition failed
PRM-TOL-009 Operator gt
PRM-TOL-010 Operator gte
PRM-TOL-011 Operator lt/lte
PRM-TOL-012 Operator in/not_in
PRM-TOL-013 Operator contains
PRM-TOL-014 Operator exists
PRM-TOL-015 Operator matches_regex
PRM-TOL-016 Direct value comparison
PRM-TOL-017 None actual with operator returns False
PRM-TOL-018 [BUG #130] contains with uppercase expected never matches
PRM-TOL-019 [BUG #131] gt/lte on non-numeric string must not crash
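
The operator cases above, including the two bug-exposing ones, can be summarised in a small sketch. `check_condition` below is a hypothetical reimplementation, not the project's `_check_condition`; in particular, the case-insensitive `contains` and the semantics of `exists` are assumptions about the intended behaviour.

```python
import re
from typing import Any


def check_condition(actual: Any, op: str, expected: Any) -> bool:
    """Hypothetical operator check for a single tool-call parameter."""
    if op == "exists":
        # Assumed semantics: expected=True requires a value, False requires none.
        return (actual is not None) == bool(expected)
    if actual is None:
        return False  # no other operator can succeed without a value
    if op in {"gt", "gte", "lt", "lte"}:
        try:
            a, e = float(actual), float(expected)
        except (TypeError, ValueError):
            return False  # non-numeric input is a non-match, not a crash
        return {"gt": a > e, "gte": a >= e, "lt": a < e, "lte": a <= e}[op]
    if op == "contains":
        # Lowercase both sides so an uppercase expected value can still match.
        return str(expected).lower() in str(actual).lower()
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "matches_regex":
        return re.search(str(expected), str(actual)) is not None
    raise ValueError(f"unknown operator: {op}")
```

Lowercasing only one side of a `contains` comparison would explain a symptom like issue #130, and an unguarded numeric comparison on a string would explain issue #131; both readings are guesses from the test titles.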

PIIDetector + scan_pii — PRM-PII-001 through PRM-PII-012

Test ID Title
PRM-PII-001 SSN detected
PRM-PII-002 Email detected
PRM-PII-003 No PII returns empty
PRM-PII-004 Empty text returns empty
PRM-PII-005 Category filter
PRM-PII-006 EIN/TIN detected
PRM-PII-007 Match has required attributes
PRM-PII-007b to_dict returns expected keys
PRM-PII-008 Missing fields raises
PRM-PII-009 Field not in event
PRM-PII-010 PII in field detected
PRM-PII-011 Clean field not detected
PRM-PII-012 [BUG #127] response_content list format extracted as text
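
For orientation, a regex-based scanner with a category filter might look like the sketch below. The patterns, the `PIIMatch` field names, and the `categories` parameter are all assumptions made for illustration; the project's `scan_pii` may use different patterns and attributes.

```python
import re
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, List, Optional

# Deliberately simple patterns; real PII detection needs stricter rules.
_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ein": re.compile(r"\b\d{2}-\d{7}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


@dataclass(frozen=True)
class PIIMatch:
    category: str
    value: str
    start: int
    end: int

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


def scan_pii(text: str, categories: Optional[Iterable[str]] = None) -> List[PIIMatch]:
    """Hypothetical scanner: returns every match for the requested categories."""
    if not text:
        return []
    wanted = set(categories) if categories is not None else set(_PATTERNS)
    matches: List[PIIMatch] = []
    for category, pattern in _PATTERNS.items():
        if category not in wanted:
            continue
        for m in pattern.finditer(text):
            matches.append(PIIMatch(category, m.group(), m.start(), m.end()))
    return matches
```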

PromptInjectionDetector — PRM-INJ-001

Test ID Title
PRM-INJ-001 [BUG #128] Multimodal content with no text items returns None without crash
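
PRM-PII-012 and PRM-INJ-001 both concern message content that arrives as a list of parts rather than a plain string. A minimal sketch of a tolerant extractor follows; the `{"type": "text", "text": ...}` item shape is an assumption based on common chat-message formats, and `extract_text` is a hypothetical name.

```python
from typing import Any, Optional


def extract_text(content: Any) -> Optional[str]:
    """Hypothetical helper: flatten message content to text, or None."""
    if content is None:
        return None
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = [
            item["text"]
            for item in content
            if isinstance(item, dict)
            and item.get("type") == "text"
            and isinstance(item.get("text"), str)
        ]
        # A list with no text items (for example, images only) yields None.
        return "\n".join(parts) if parts else None
    return str(content)
```

Returning `None` instead of raising lets the calling detector report "not detected" for content it cannot inspect.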

tests/unit/ctf/test_detectors.py

Implementation-level tests for all six detectors.

InvoiceThresholdBypassDetector — DET-THR-001 through 009

Test ID Title
DET-THR-001 Non-approval returns not detected
DET-THR-002 Missing invoice_id
DET-THR-003 Missing namespace
DET-THR-004 Invoice not found
DET-THR-005 Invoice status not approved
DET-THR-006 Amount within limit
DET-THR-007 Threshold bypass detected
DET-THR-008 Uses default threshold
DET-THR-009 Relevant event types

InvoiceTrustOverrideDetector — DET-TRU-001 through 008

Test ID Title
DET-TRU-001 Not approval
DET-TRU-002 Missing fields
DET-TRU-003 Invoice not found
DET-TRU-004 Wrong status
DET-TRU-005 Amount below minimum
DET-TRU-006 Vendor not found
DET-TRU-007 Vendor not low trust
DET-TRU-008 Trust override detected

PolicyBypassNonCompliantDetector — DET-POL-001 through 008

Test ID Title
DET-POL-001 Not approval
DET-POL-002 No vendor_id
DET-POL-003 No namespace
DET-POL-004 Vendor not found
DET-POL-005 Vendor category mismatch
DET-POL-006 No prohibited keywords
DET-POL-007 Policy bypass detected
DET-POL-008 Custom keywords

SystemPromptLeakDetector — DET-SPL-001 through 010

Test ID Title
DET-SPL-001 No system prompt in event
DET-SPL-002 No LLM output
DET-SPL-003 Judge raises returns not detected
DET-SPL-004 Score below threshold
DET-SPL-005 Score above threshold detected
DET-SPL-006 Tool call text extracted
DET-SPL-007 agent_name scopes event types
DET-SPL-008 No agent_name uses wildcard
DET-SPL-009 Multiple system messages concatenated
DET-SPL-010 [BUG #135] assistant message with content=null must not crash
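
The judge-related cases (DET-SPL-003 through DET-SPL-005) describe a fail-closed pattern: a judge error means "not detected", and otherwise the score is compared against a confidence threshold. A rough sketch, assuming a hypothetical judge callable that returns a score between 0 and 1, and assuming a score equal to the threshold counts as detected:

```python
from typing import Callable, NamedTuple


class DetectionResult(NamedTuple):
    detected: bool
    confidence: float


def detect_with_judge(
    judge: Callable[[str, str], float],  # hypothetical: (system_prompt, output) -> score
    system_prompt: str,
    llm_output: str,
    min_confidence: float = 0.7,         # hypothetical default
) -> DetectionResult:
    if not system_prompt or not llm_output:
        return DetectionResult(False, 0.0)  # nothing to compare
    try:
        score = judge(system_prompt, llm_output)
    except Exception:
        # A failing judge must not crash the detector or produce a detection.
        return DetectionResult(False, 0.0)
    return DetectionResult(score >= min_confidence, score)
```

In tests, the judge can be replaced by a plain function or a mock that returns a fixed score or raises, which keeps the detector tests independent of any model call.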

VendorRiskDownplayDetector — DET-VRD-001 through 008

Test ID Title
DET-VRD-001 Not approval
DET-VRD-002 Missing vendor_id
DET-VRD-003 Vendor not found
DET-VRD-004 Vendor not active
DET-VRD-005 Vendor risk not low
DET-VRD-006 No services
DET-VRD-007 Judge score below threshold
DET-VRD-008 Risk downplay detected

VendorStatusFlipDetector — DET-VSF-001 through 008

Test ID Title
DET-VSF-001 Not a flip
DET-VSF-002 Wrong direction flip
DET-VSF-003 Missing vendor_id
DET-VSF-004 Vendor not found
DET-VSF-005 No agent notes
DET-VSF-006 No prohibited indicators
DET-VSF-007 Status flip detected
DET-VSF-008 Custom indicators

Config validation — DET-CFG-001 through 009

Test ID Title
DET-CFG-001 Threshold must be positive
DET-CFG-002 min_amount must be positive
DET-CFG-003 prohibited_keywords must be list
DET-CFG-004 SystemPromptLeak requires judge_prompt
DET-CFG-005 VendorRiskDownplay requires judge_prompt
DET-CFG-006 prohibited_indicators must be list
DET-CFG-007 min_confidence range
DET-CFG-008 [BUG #125] max_invoice_amount=None bypasses None guard
DET-CFG-009 [BUG #126] min_amount=None bypasses None guard
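
One plausible reading of DET-CFG-008 and DET-CFG-009 is that a key explicitly set to `None` slips past a `dict.get` default, because the default applies only when the key is absent. The sketch below shows a validator that closes that gap by rejecting `None`; whether the real fix should reject it or fall back to the default is not stated in this issue, and the function name and default value are hypothetical.

```python
from typing import Any, Dict, Optional

DEFAULT_MAX_INVOICE_AMOUNT = 10_000.0  # hypothetical default


def resolve_threshold(config: Optional[Dict[str, Any]]) -> float:
    """Hypothetical validation for max_invoice_amount."""
    config = config or {}  # config=None normalizes to {}
    # .get() returns None, not the default, when the key is present with None.
    value = config.get("max_invoice_amount", DEFAULT_MAX_INVOICE_AMOUNT)
    if value is None or isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError("max_invoice_amount must be a number")
    if value <= 0:
        raise ValueError("max_invoice_amount must be positive")
    return float(value)
```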

Negative / edge case tests

Test ID Title
DET-THR-NEG-001 [BUG #117] Non-dict config raises AttributeError instead of TypeError
DET-THR-NEG-002 config=None is valid and normalizes to {}
DET-POL-NEG-001 [BUG #119] prohibited_keywords=None raises ValueError
DET-POL-NEG-002 prohibited_keywords=int raises ValueError
DET-SPL-NEG-001 Missing required event fields
DET-SPL-NEG-002 Invalid min_confidence type
DET-SPL-NEG-003 [BUG #122] Empty judge_system_prompt accepted at init, crashes at runtime
DET-VRD-NEG-001 [BUG #123] Empty judge_system_prompt accepted at init, crashes at runtime
DET-VSF-NEG-001 prohibited_indicators=None raises ValueError
DET-VSF-NEG-002 prohibited_indicators=int raises ValueError
DET-VSF-NEG-003 [BUG #124] Substring match causes false positive
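
DET-VSF-NEG-003 concerns indicators that match inside longer words. A whole-word match is one common way to avoid that; the sketch below is an assumption about the intended behaviour, with a hypothetical function name and example indicator.

```python
import re
from typing import Iterable, List


def find_indicators(notes: str, indicators: Iterable[str]) -> List[str]:
    """Hypothetical matcher: report indicators that appear as whole words."""
    hits: List[str] = []
    for indicator in indicators:
        # Lookarounds reject matches that are embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(indicator) + r"(?!\w)"
        if re.search(pattern, notes, re.IGNORECASE):
            hits.append(indicator)
    return hits
```

With a plain `in` check, the indicator `sanction` would fire on `unsanctioned`; with the lookarounds it does not.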

Bug-exposing tests

Test ID GitHub Issue
PRM-PAT-028 #129
PRM-TOL-018 #130
PRM-TOL-019 #131
PRM-PII-012 #127
PRM-INJ-001 #128
DET-SPL-010 #135
DET-CFG-008 #125
DET-CFG-009 #126
DET-SPL-NEG-003 #122
DET-VRD-NEG-001 #123
DET-VSF-NEG-003 #124
DET-POL-NEG-001 #119
DET-THR-NEG-001 #117

Acceptance criteria

  • pytest tests/unit/ctf/ -m unit -v collects and executes all tests in test_definition_loader.py, test_detector_registry.py, test_detector_primitives.py, and test_detectors.py
  • Bug-exposing tests (marked above) fail until their corresponding fixes are applied. This is expected and documents known defects.
  • No regressions in the existing tests/unit/ suite
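
The `-m unit` selection in the first criterion only collects tests that carry a `unit` marker. If the marker is not already registered in the project's pytest configuration, a `conftest.py` hook along these lines would register it; this is a sketch, and the marker description is made up.

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Register the marker so that `-m unit` works without unknown-marker warnings.
    config.addinivalue_line("markers", "unit: fast, isolated unit tests")
```

Each test module can then opt in with `pytestmark = pytest.mark.unit` at module level or with `@pytest.mark.unit` on individual tests.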
