Skip to content

fix(yara): tighten CCCS metadata regexes (SEC-6 / SEC-7)#145

Closed
rolandpg wants to merge 1 commit into
masterfrom
fix/cccs-regexes-sec6-sec7
Closed

fix(yara): tighten CCCS metadata regexes (SEC-6 / SEC-7)#145
rolandpg wants to merge 1 commit into
masterfrom
fix/cccs-regexes-sec6-sec7

Conversation

@rolandpg
Copy link
Copy Markdown
Owner

Tightens CCCS metadata validation per security review findings SEC-6 and SEC-7.

  • SEC-6: _regex_match now uses p.fullmatch() instead of p.search() to prevent multiline injection bypass
  • SEC-7: Author regex tightened to safe printable ASCII range; removed permissive ^.*$ catch-all fallback
  • New test file tests/test_cccs_metadata.py with negative test cases for both findings

Closes #73.

SEC-6: _regex_match now uses p.fullmatch() instead of p.search()
to prevent multiline injection bypass in sharing/category/status
regexes. TLP:WHITE\nid: hostile_id no longer matches.

SEC-7: Author regex tightened to safe printable ASCII range.
Removed permissive ^.*$ catch-all fallback.

Added targeted negative test cases for both findings.
Closes #73.
Copilot AI review requested due to automatic review settings April 28, 2026 04:17
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens CCCS YARA metadata validation in zettelforge.yara.cccs_metadata to address security review findings SEC-6 (multiline/substring bypass) and SEC-7 (overly permissive author pattern), and adds targeted regression tests.

Changes:

  • Switch _regex_match from re.search() to re.fullmatch() to prevent multiline/substring bypasses (SEC-6).
  • Tighten _AUTHOR_REGEX to a restricted safe-character set and remove the permissive catch-all fallback (SEC-7).
  • Add tests/test_cccs_metadata.py with negative test cases covering both findings and end-to-end tier behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/zettelforge/yara/cccs_metadata.py Tightens regex matching semantics (fullmatch) and restricts author metadata format.
tests/test_cccs_metadata.py Adds regression tests for multiline injection and author regex tightening across tiers.
Comments suppressed due to low confidence (1)

src/zettelforge/yara/cccs_metadata.py:90

  • Several single-value validators use re.Pattern.match() with ^...$-anchored regexes (_AUTHOR_REGEX/_VERSION_REGEX/_DATE_REGEX/_UUID_REGEX/_FINGERPRINT_REGEX/_MITRE_ATT_REGEX). In Python, $ can match before a terminal newline, so inputs like "1.0\n" or "CCCS\n" can still pass .match(). To fully close the newline edge case (and to align with SEC-6’s full-match intent), switch these checks to .fullmatch() (or change the patterns to use \Z instead of $).
# SEC-7: Tightened from ^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$ to
# only permit safe printable ASCII characters.
_AUTHOR_REGEX = re.compile(r"^[A-Za-z0-9_.@+-]+$")
_VERSION_REGEX = re.compile(r"^\d+\.\d+$")
_DATE_REGEX = re.compile(r"^\d{4}-\d{2}-\d{2}$")
_UUID_REGEX = re.compile(r"^[0-9A-Za-z]{16,}$")  # base62 UUID, generous lower bound.
_FINGERPRINT_REGEX = re.compile(r"^[a-fA-F0-9]{40,64}$")  # SHA-1 / SHA-256-ish.
_MITRE_ATT_REGEX = re.compile(r"^(TA|T|M|G|S)\d{4}(\.\d{3})?$")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# From CCCS_YARA.yml: author's own regexExpression.
_AUTHOR_REGEX = re.compile(r"^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$")
# SEC-7: Tightened from ^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$ to
# only permit safe printable ASCII characters.
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the author regex permits “safe printable ASCII characters”, but the actual character class is a much narrower subset ([A-Za-z0-9_.@+-]). Consider updating the wording to reflect the exact allowed set (or to say “safe ASCII subset”) to avoid over-claiming what’s permitted.

Suggested change
# only permit safe printable ASCII characters.
# only permit a safe ASCII subset: letters, digits, and _ . @ + -.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +24
_MALWARE_TYPE_REGEXES,
_ACTOR_TYPE_REGEXES,
_HASH_REGEXES,
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import block includes several names that are never referenced in this test module (_MALWARE_TYPE_REGEXES, _ACTOR_TYPE_REGEXES, _HASH_REGEXES). With Ruff's F401 enabled for tests, this will fail lint; either remove these imports or add assertions that exercise them.

Suggested change
_MALWARE_TYPE_REGEXES,
_ACTOR_TYPE_REGEXES,
_HASH_REGEXES,

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +137
NEGATIVE = ["", "1", "1.0.0", "1.", ".1", "a.b", "1.0\n"]

def test_positive(self) -> None:
for v in self.POSITIVE:
assert _VERSION_REGEX.match(v), f"Version should accept: {v!r}"
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests use regex.match(...) with patterns ending in $ and expect values with a trailing newline (e.g. "1.0\n") to be rejected. In Python re, $ can match before a final newline, so .match() will still succeed here. Use .fullmatch() in these tests (and ideally in the validator) or change the regex to end with \Z so trailing newlines can’t slip through.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7d4c07e9f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


def test_negative(self) -> None:
for v in self.NEGATIVE:
assert not _VERSION_REGEX.match(v), f"Version should reject: {v!r}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use fullmatch for newline-rejection regex tests

These negative cases are asserting that strings like "1.0\n" do not match, but the test uses re.Pattern.match() and Python's $ anchor matches before a trailing newline, so ^...$ still passes for newline-terminated inputs. This makes the newly added SEC-6/SEC-7 test suite fail deterministically (same issue repeats in Date/UUID/Fingerprint/MITRE classes), so the commit cannot validate as written unless these checks switch to fullmatch() (or the newline cases are removed).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Owner Author

Closing in favor of #146, which is the superset: same SEC-6/SEC-7 CCCS regex hardening, plus the LLM token-budget plumbing (closes #125) and the bulk-ingest path (closes #72). Closes #73 there as well.

If #146 has to be split for any reason, we can re-open this — but right now keeping both open just means whichever lands second will conflict on yara/cccs_metadata.py and tests/test_cccs_metadata.py.


Generated by Claude Code

@rolandpg rolandpg closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tighten CCCS metadata regexes (SEC-6 / SEC-7)

2 participants