Skip to content

Simplify regex patterns in magic files to improve performance#10

Merged
sachintu47 merged 3 commits intomainfrom
regexfix
Apr 14, 2026
Merged

Simplify regex patterns in magic files to improve performance#10
sachintu47 merged 3 commits intomainfrom
regexfix

Conversation

@sachintu47
Copy link
Copy Markdown
Member

No description provided.

@sachintu47
Copy link
Copy Markdown
Member Author

augment review

@augmentcode
Copy link
Copy Markdown

augmentcode Bot commented Apr 10, 2026

🤖 Augment PR Summary

Summary: This PR updates the file/libmagic port to reduce regex backtracking risk in upstream magic definitions and to fix GUID handling on big-endian targets.

Changes:

  • Trimmed the STABLE build dependency list in buildenv (now only make and zlib).
  • Added patches/regex.patch to simplify several magic/Magdir/* regex patterns (C++, execline, perl, python, ruby) to avoid redundant (...)*-style constructs.
  • Added patches/endianfix.patch to make GUID parsing/printing independent of host endianness.
  • Centralized byte-order helper macros into src/file.h and removed duplicated definitions from src/softmagic.c.

Technical Notes: The endian fix writes GUID fields in a defined little-endian byte layout and prints them via explicit byte-order reconstruction, which should improve correctness on big-endian platforms.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 2 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread buildenv Outdated
## Required IF ZOPEN_BUILD_LINE='STABLE'
export ZOPEN_STABLE_URL="http://ftp.astron.com/pub/file/file-${FILE_VERSION}.tar.gz" # Specify the stable build URL (either git or tarball)
export ZOPEN_STABLE_DEPS="autoconf automake libtool make check_python zlib gzip tar perl m4 openssl" # Specify the stable build dependencies.
export ZOPEN_STABLE_DEPS="make check_python zlib perl" # Specify the stable build dependencies.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildenv:19 reduces ZOPEN_STABLE_DEPS to make check_python zlib perl, but the stable source is a .tar.gz at ZOPEN_STABLE_URL. If the zopen build environment doesn’t always provide tar/gzip (and any download tooling) implicitly, this change could make stable builds fail during fetch/extract.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Comment thread patches/regex.patch
# class name[(base classes,)]: [pass]
0 search/8192 class
->0 regex \^class\ [_[:alpha:]]+(\\(.*\\))?(\ )*:([\ \t]+pass)?$ Python script text executable
+>0 regex \^class\ [_[:alpha:]]+(\([^)]*\))?(\ )*:([\ \t]+pass)?$ Python script text executable
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches/regex.patch:49 changes the Python class regex from a permissive \(.*\) to \([^)]*\), which will no longer match valid bases containing ) (e.g., class A(factory()):). That’s a behavior change in type detection, so it may be worth confirming the reduced coverage is acceptable for the performance win.

Other locations where this applies: patches/regex.patch:10

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@sachintu47
Copy link
Copy Markdown
Member Author

augment review

Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread patches/regex.patch
# But class alone is reduced to avoid beating php (Jens Schleusener)
0 search/8192 class
->0 regex \^[[:space:]]*class[[:space:]]+[[:digit:][:alpha:]:_]+[[:space:]]*\\{(.*[\n]*)*\\}(;)?$ C++ source text
+>0 regex \^[[:space:]]*class[[:space:]]+[[:digit:][:alpha:]:_]+[[:space:]]*\\{[^}]*\\}(;)?$ C++ source text
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches/regex.patch:10: The new C++ pattern uses [^}]*, but libmagic compiles regexes with REG_NEWLINE, where a negated character class won’t match newlines; this may stop matching typical multi-line class { ... } blocks that the previous (...[\n]*...)* form handled. Consider confirming this doesn’t noticeably reduce C++ source detection coverage in practice.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

@sachintu47 sachintu47 marked this pull request as ready for review April 14, 2026 10:32
Copy link
Copy Markdown

@augmentcode augmentcode Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 1 suggestion posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

Comment thread patches/endianfix.patch
Copy link
Copy Markdown
Member

@IgorTodorovskiIBM IgorTodorovskiIBM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sachintu47 sachintu47 merged commit bb4869c into main Apr 14, 2026
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants