fix(yara): tighten CCCS metadata regexes (SEC-6 / SEC-7) by rolandpg · Pull Request #145 · rolandpg/zettelforge

rolandpg · 2026-04-28T04:17:03Z

Tightens CCCS metadata validation per security review findings SEC-6 and SEC-7.

SEC-6: _regex_match now uses p.fullmatch() instead of p.search() to prevent multiline injection bypass
SEC-7: Author regex tightened to safe printable ASCII range; removed permissive ^.*$ catch-all fallback
New test file tests/test_cccs_metadata.py with negative test cases for both findings

Closes #73.

SEC-6: _regex_match now uses p.fullmatch() instead of p.search() to prevent multiline injection bypass in sharing/category/status regexes. TLP:WHITE\nid: hostile_id no longer matches. SEC-7: Author regex tightened to safe printable ASCII range. Removed permissive ^.*$ catch-all fallback. Added targeted negative test cases for both findings. Closes #73.

Copilot

Pull request overview

This PR tightens CCCS YARA metadata validation in zettelforge.yara.cccs_metadata to address security review findings SEC-6 (multiline/substring bypass) and SEC-7 (overly permissive author pattern), and adds targeted regression tests.

Changes:

Switch _regex_match from re.search() to re.fullmatch() to prevent multiline/substring bypasses (SEC-6).
Tighten _AUTHOR_REGEX to a restricted safe-character set and remove the permissive catch-all fallback (SEC-7).
Add tests/test_cccs_metadata.py with negative test cases covering both findings and end-to-end tier behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/zettelforge/yara/cccs_metadata.py`	Tightens regex matching semantics (`fullmatch`) and restricts author metadata format.
`tests/test_cccs_metadata.py`	Adds regression tests for multiline injection and author regex tightening across tiers.

Comments suppressed due to low confidence (1)

src/zettelforge/yara/cccs_metadata.py:90

Several single-value validators use re.Pattern.match() with ^...$-anchored regexes (_AUTHOR_REGEX/_VERSION_REGEX/_DATE_REGEX/_UUID_REGEX/_FINGERPRINT_REGEX/_MITRE_ATT_REGEX). In Python, $ can match before a terminal newline, so inputs like "1.0\n" or "CCCS\n" can still pass .match(). To fully close the newline edge case (and to align with SEC-6’s full-match intent), switch these checks to .fullmatch() (or change the patterns to use \Z instead of $).

# SEC-7: Tightened from ^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$ to
# only permit safe printable ASCII characters.
_AUTHOR_REGEX = re.compile(r"^[A-Za-z0-9_.@+-]+$")
_VERSION_REGEX = re.compile(r"^\d+\.\d+$")
_DATE_REGEX = re.compile(r"^\d{4}-\d{2}-\d{2}$")
_UUID_REGEX = re.compile(r"^[0-9A-Za-z]{16,}$")  # base62 UUID, generous lower bound.
_FINGERPRINT_REGEX = re.compile(r"^[a-fA-F0-9]{40,64}$")  # SHA-1 / SHA-256-ish.
_MITRE_ATT_REGEX = re.compile(r"^(TA|T|M|G|S)\d{4}(\.\d{3})?$")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T04:20:34Z

 # From CCCS_YARA.yml: author's own regexExpression.
-_AUTHOR_REGEX = re.compile(r"^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$")
+# SEC-7: Tightened from ^[a-zA-Z]+\@[A-Z]+$|^[A-Z\s._\-]+$|^.*$ to
+# only permit safe printable ASCII characters.


The comment says the author regex permits “safe printable ASCII characters”, but the actual character class is a much narrower subset ([A-Za-z0-9_.@+-]). Consider updating the wording to reflect the exact allowed set (or to say “safe ASCII subset”) to avoid over-claiming what’s permitted.

Suggested change

# only permit safe printable ASCII characters.

# only permit a safe ASCII subset: letters, digits, and _ . @ + -.

Copilot · 2026-04-28T04:20:34Z

+    _MALWARE_TYPE_REGEXES,
+    _ACTOR_TYPE_REGEXES,
+    _HASH_REGEXES,


This import block includes several names that are never referenced in this test module (_MALWARE_TYPE_REGEXES, _ACTOR_TYPE_REGEXES, _HASH_REGEXES). With Ruff's F401 enabled for tests, this will fail lint; either remove these imports or add assertions that exercise them.

Suggested change

_MALWARE_TYPE_REGEXES,

_ACTOR_TYPE_REGEXES,

_HASH_REGEXES,

Copilot · 2026-04-28T04:20:34Z

+    NEGATIVE = ["", "1", "1.0.0", "1.", ".1", "a.b", "1.0\n"]
+
+    def test_positive(self) -> None:
+        for v in self.POSITIVE:
+            assert _VERSION_REGEX.match(v), f"Version should accept: {v!r}"


These tests use regex.match(...) with patterns ending in $ and expect values with a trailing newline (e.g. "1.0\n") to be rejected. In Python re, $ can match before a final newline, so .match() will still succeed here. Use .fullmatch() in these tests (and ideally in the validator) or change the regex to end with \Z so trailing newlines can’t slip through.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7d4c07e9f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-28T04:21:09Z

+
+    def test_negative(self) -> None:
+        for v in self.NEGATIVE:
+            assert not _VERSION_REGEX.match(v), f"Version should reject: {v!r}"


Use fullmatch for newline-rejection regex tests

These negative cases are asserting that strings like "1.0\n" do not match, but the test uses re.Pattern.match() and Python's $ anchor matches before a trailing newline, so ^...$ still passes for newline-terminated inputs. This makes the newly added SEC-6/SEC-7 test suite fail deterministically (same issue repeats in Date/UUID/Fingerprint/MITRE classes), so the commit cannot validate as written unless these checks switch to fullmatch() (or the newline cases are removed).

Useful? React with 👍 / 👎.

rolandpg · 2026-04-29T00:47:34Z

Closing in favor of #146, which is the superset: same SEC-6/SEC-7 CCCS regex hardening, plus the LLM token-budget plumbing (closes #125) and the bulk-ingest path (closes #72). Closes #73 there as well.

If #146 has to be split for any reason, we can re-open this — but right now keeping both open just means whichever lands second will conflict on yara/cccs_metadata.py and tests/test_cccs_metadata.py.

Generated by Claude Code

Copilot AI review requested due to automatic review settings April 28, 2026 04:17

Copilot started reviewing on behalf of rolandpg April 28, 2026 04:17 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 28, 2026

View reviewed changes

rolandpg closed this Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(yara): tighten CCCS metadata regexes (SEC-6 / SEC-7)#145

fix(yara): tighten CCCS metadata regexes (SEC-6 / SEC-7)#145
rolandpg wants to merge 1 commit into
masterfrom
fix/cccs-regexes-sec6-sec7

rolandpg commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Uh oh!

rolandpg commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	# only permit safe printable ASCII characters.
	# only permit a safe ASCII subset: letters, digits, and _ . @ + -.

Conversation

rolandpg commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

rolandpg commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants