Skip to content

fix(pure-magic): apply binary/text test gate at rule entry only#45

Draft
tnaroska wants to merge 1 commit into
qjerome:mainfrom
tnaroska:fix/rtf-binary-text-skip-only-toplevel
Draft

fix(pure-magic): apply binary/text test gate at rule entry only#45
tnaroska wants to merge 1 commit into
qjerome:mainfrom
tnaroska:fix/rtf-binary-text-skip-only-toplevel

Conversation

@tnaroska

Copy link
Copy Markdown
Contributor

Summary

Fix a binary/text gating bug that silently stripped any rule whose message lives on a scalar/float/pstring/string16/indirect child — most visibly magdir/rtf, which carries its message on a level-2 ubyte qualifier and was reported as plain ASCII text instead of Rich Text Format data.

Problem

Match::matches (pure-magic/src/lib.rs) skipped tests with is_only_binary / is_only_text mismatching the stream kind on every recursion. Once the top-level string {\rtf matched on a text-classified stream, its >5 ubyte and >>5 ubyte children were dropped because ubyte is binary-only — so the rule's message (carried on the level-2 child) never fired.

Fix

Hoist the gate into EntryNode::matches and run it only when self.root. This mirrors upstream libmagic match() in src/softmagic.c:243-262, where the STRING_BINTEST / STRING_TEXTTEST check fires inside the top-level for (magindex = 0; ...) loop and goto flush skips sub-tests — i.e. the binary/text decision is taken once at the outer match loop, not re-applied to children. Top-level perf filtering across the >300 rule files is preserved.

Tests

Two regression tests in pure-magic/src/lib.rs:

  • test_string_parent_with_scalar_children_on_text_stream — RTF-shaped rule emits its message on a text stream (the bug).
  • test_rtf_droid_skip_still_rejects — the upstream level-1 >5 ubyte !0xAB qualifier (added specifically to skip DROID fmt-355-signature-id-522.rtf) still rejects that input now that children run on text streams.

Test plan

  • cargo test -p pure-magic — passes locally (139 tests).
  • Maintainer review.

🤖 Generated with Claude Code

The skip for is_only_binary/is_only_text in Match::matches ran on every
recursion, so child tests of an already-matched parent were dropped on
text streams. This silently stripped any rule whose message lived on a
scalar/float/pstring/string16/indirect child — most visibly the RTF rule
(magic-db/src/magdir/rtf), which carries its message on a level-2 ubyte
qualifier and was reported as plain "ASCII text".

Hoist the gate into EntryNode::matches and run it only when self.root,
mirroring upstream libmagic's softmagic.c::match() semantics: the
binary/text decision is taken once at the outer match loop, not
re-applied to children. Top-level perf filtering for the >300 rule
files is preserved.

Add two regression tests covering (a) the RTF-shaped rule emitting its
message on a text stream and (b) the upstream DROID-skip qualifier
still rejecting fmt-355-signature-id-522.rtf.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant