Add YARA rule reference extraction #46

@rolandpg

Summary

Add a regex pattern to extract YARA rule references from cyber threat intelligence (CTI) text.

Context

YARA rules are a widely used format for malware detection signatures. CTI reports reference them by name, either as a rule declaration (e.g., rule apt28_beacon { ... }) or with a label (e.g., YARA: win_cobalt_strike).

  • File to edit: src/zettelforge/entity_indexer.py
  • Tests: tests/test_basic.py::TestEntityExtractor

Acceptance Criteria

  • Add yara_rule to REGEX_PATTERNS in entity_indexer.py
  • Pattern matches: rule rule_name, YARA: rule_name, yara:rule_name
  • Does NOT match generic English uses of "rule" (e.g., "the rule of law"): require the rule name to contain at least one underscore joining alphanumeric segments (e.g., apt28_beacon)
  • Add yara_rule to ENTITY_TYPES
  • At least 3 test cases:
    • rule apt28_beacon extracts as yara_rule
    • YARA: win_cobalt_strike_loader extracts as yara_rule
    • the rule of law does NOT extract
  • All existing tests pass: pytest tests/test_basic.py -v
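One possible shape for the pattern, as a minimal standalone sketch rather than the project's actual code. The names YARA_RULE_PATTERN and extract_yara_rules are hypothetical, and how the pattern would be registered in REGEX_PATTERNS depends on that structure, which is not shown in this issue. The sketch assumes rule names start with a letter and contain at least one underscore between alphanumeric segments.

```python
import re
from typing import List

# Hypothetical pattern; not taken from entity_indexer.py.
# Prefix: the keyword "rule" followed by whitespace, or "yara:" with
# optional whitespace around the colon (case-insensitive).
# Name: a letter, optional alphanumerics, then one or more "_segment"
# groups, so plain English words such as "of" are rejected.
YARA_RULE_PATTERN = re.compile(
    r"""
    (?:\brule\s+|\byara\s*:\s*)
    (?P<name>[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+)
    \b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_yara_rules(text: str) -> List[str]:
    """Return rule names found in text, in order, without duplicates."""
    names: List[str] = []
    for match in YARA_RULE_PATTERN.finditer(text):
        name = match.group("name")
        if name not in names:
            names.append(name)
    return names
```

Requiring an underscore keeps false positives low, but it also means single-word rule names (valid in YARA) would be missed; that trade-off follows the acceptance criteria above.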

Example Input/Output

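# Import path assumed from the file to edit (src/zettelforge/entity_indexer.py).
from zettelforge.entity_indexer import EntityExtractor

ext = EntityExtractor()
result = ext.extract_all("Detected by rule apt28_beacon")
assert "apt28_beacon" in result["yara_rule"]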
