Add plaintext log detection via timestamp and log-level pattern density analysis #158
Open
hemantkumar15438 wants to merge 1 commit into
@hemantkumar15438 Is this PR ready for review?
Overview
This pull request adds an analysis task to `openrelik-worker-analyzer-logs` that detects unparsed, rotated, or extensionless plaintext log files by measuring the density of timestamps and log-level markers in the file content.

Rather than relying on file extensions or paths, the engine evaluates the raw text and calculates the ratio of pattern-matching lines to total lines. To scan raw disk images, the worker handles block device mounting itself and inspects the inner filesystems recursively.
Technical Implementation & Mechanics
Timestamp and Log-Level Ratio Math: The engine samples up to the first 500 lines of a target file and counts the lines that match timestamp or log-level signatures (for example `[INFO]`, `ERROR:`, `WARN`, `DEBUG`).

The evaluation metric is a simple ratio:

$$\text{Density} = \frac{\text{Lines Matching Patterns}}{\text{Total Lines Evaluated}}$$
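The ratio can be illustrated with a minimal Python sketch. This is not the code from the PR: the regular expressions, the function names `pattern_density` and `is_probable_log`, and the rule that a line counts when it matches either a timestamp or a log-level pattern are all assumptions made for illustration. Only the 500-line sample size and the `0.15` default threshold come from this description.

```python
import re
from typing import Iterable

# Hypothetical patterns; the PR's actual regexes are not shown in this description.
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"        # ISO-8601-like timestamp
    r"|[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}"    # syslog-like timestamp
)
LOG_LEVEL_RE = re.compile(r"\[INFO\]|ERROR:|\bWARN\b|\bDEBUG\b")

MAX_LINES = 500           # sample size stated in the description
DEFAULT_THRESHOLD = 0.15  # default threshold stated in the description


def pattern_density(lines: Iterable[str], max_lines: int = MAX_LINES) -> float:
    """Return matching lines divided by total lines evaluated."""
    total = 0
    matched = 0
    for line in lines:
        if total >= max_lines:
            break
        total += 1
        # Assumption: a line counts once if it matches either pattern family.
        if TIMESTAMP_RE.search(line) or LOG_LEVEL_RE.search(line):
            matched += 1
    # An empty input has no evidence of being a log, so report 0.0.
    return matched / total if total else 0.0


def is_probable_log(lines: Iterable[str], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Flag input whose density meets or exceeds the threshold."""
    return pattern_density(lines) >= threshold
```

For a four-line sample in which two lines carry a timestamp or a level marker, `pattern_density` returns `0.5`, which is above the default threshold.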
Files meeting or exceeding the user-defined threshold (default: `0.15`, or 15%) are flagged in the output report.

Block Device Partition Traversal: When processing raw disk images (`.dd`, `.raw`, `.e01`), the task routes execution through OpenRelik's `BlockDevice` infrastructure. The worker handles loop device attachment via `losetup`, maps the partition tables, mounts the underlying filesystems dynamically, and passes the inner file paths directly to the density analysis loop.

Architectural Constraints & Safeguards
Each file is checked for null bytes (`\x00`). If one is detected within the initial block read, the stream is immediately classified as a binary object (compiled executable, media archive, database) and skipped to prevent unnecessary regex processing.

Disk image files (`.dd`, `.img`, etc.) discovered inside an active filesystem mount are skipped to eliminate recursive loop device allocations or kernel lockups.

Mount handling is wrapped in a `try...finally` block. This guarantees that, regardless of processing exceptions, `BlockDevice.umount()` is always executed, preventing unreleased loop devices and mount-point leaks on the host OS.

Verification & Testing
Tested against `.dd` targets, verifying host kernel module utilization (`nbd`), loop device mapping, system tree traversal, and final artifact generation.
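The three safeguards described under Architectural Constraints & Safeguards could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `looks_binary`, `is_nested_image`, `candidate_files`, and `scan_image` are hypothetical names, the 4096-byte sniff size and the extension set are guesses, and `device` is a stand-in object. Only `umount()` is named in this description; a `mount()` method that returns mount points is assumed here.

```python
import os
from typing import Any, Callable, Iterator, List

# Assumed extension set, combining the image types named in this description.
IMAGE_EXTENSIONS = {".dd", ".raw", ".e01", ".img"}
SNIFF_BYTES = 4096  # hypothetical size of the "initial block read"


def looks_binary(path: str, sniff_bytes: int = SNIFF_BYTES) -> bool:
    """Return True if a null byte appears in the first block of the file."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sniff_bytes)


def is_nested_image(path: str) -> bool:
    """Return True for files that look like disk images, judged by extension."""
    return os.path.splitext(path)[1].lower() in IMAGE_EXTENSIONS


def candidate_files(root: str) -> Iterator[str]:
    """Yield regular files under root, skipping nested images and binaries."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # e.g. broken symlinks
                continue
            if is_nested_image(path) or looks_binary(path):
                continue
            yield path


def scan_image(device: Any, analyze: Callable[[str], Any]) -> List[Any]:
    """Run analyze() on each candidate file and always unmount afterwards.

    `device` is a stand-in: mount() is assumed to return an iterable of
    mount points; umount() is the cleanup call named in the description.
    """
    results = []
    try:
        for mountpoint in device.mount():
            for path in candidate_files(mountpoint):
                results.append(analyze(path))
    finally:
        # Runs whether or not mount() or analyze() raised.
        device.umount()
    return results
```

Placing `mount()` inside the `try` block is a design choice in this sketch: a partially completed mount is still cleaned up, at the cost of calling `umount()` even when nothing was mounted.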