Add plaintext log detection via timestamp and log-level pattern density analysis by hemantkumar15438 · Pull Request #158 · openrelik/openrelik-workers

hemantkumar15438 · 2026-05-20T13:53:22Z

Overview

This Pull Request introduces an analysis task to openrelik-worker-analyzer-logs that detects unparsed, rotated, or extensionless plaintext log files by measuring the density of timestamps and log-level markers within the file content.

Rather than relying on file extensions or paths, the engine evaluates the raw text stream to calculate a pattern-to-text ratio. To scan raw storage inputs, the worker automatically handles system-level block device mounting to inspect inner filesystems recursively.

Technical Implementation & Mechanics

Timestamp and Log-Level Ratio Math: The engine samples up to the first 500 lines of a target file and tracks lines matching specific structural logging signatures:
- Timestamps: ISO 8601, RFC 3164/5424 Syslog, and standard date/time string variants.
- Log Levels: Standard severity markers (e.g., [INFO], ERROR:, WARN, DEBUG).
The evaluation metric is calculated using a strict ratio:
$$\text{Density} = \frac{\text{Lines Matching Patterns}}{\text{Total Lines Evaluated}}$$
Files whose density meets or exceeds the user-defined threshold (default: 0.15, i.e. 15% of sampled lines) are flagged in the output report.
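The ratio above can be sketched roughly as follows. This is an illustration, not the code in this PR: the regexes are simplified stand-ins, `pattern_density` and `is_probable_log` are hypothetical names, and it assumes a line counts as a match when it contains either a timestamp or a log-level marker.

```python
import re

# Simplified stand-in patterns (assumptions; the PR's real regexes are not shown here).
TIMESTAMP_RE = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"          # ISO 8601-like, e.g. 2026-05-20T13:53:22
    r"|\b[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}"  # RFC 3164 style, e.g. May 20 13:53:22
)
LEVEL_RE = re.compile(r"\b(?:DEBUG|INFO|WARN(?:ING)?|ERROR|CRITICAL|FATAL)\b")


def pattern_density(lines):
    """Return (lines matching a pattern) / (total lines evaluated)."""
    total = 0
    matched = 0
    for line in lines:
        total += 1
        # Assumption: either signature is enough to count the line.
        if TIMESTAMP_RE.search(line) or LEVEL_RE.search(line):
            matched += 1
    # An empty sample has no evidence either way; report zero density.
    return matched / total if total else 0.0


def is_probable_log(lines, threshold=0.15):
    """Flag the sample when its density meets or exceeds the threshold."""
    return pattern_density(lines) >= threshold
```

With the default threshold, a 20-line sample needs at least 3 matching lines to be flagged.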
Block Device Partition Traversal: When processing raw disk images (.dd, .raw, .e01), the task routes execution through OpenRelik’s BlockDevice infrastructure. The worker handles system-level loop device attachment via losetup, maps the partition tables, mounts the underlying filesystems dynamically, and passes the inner file paths directly to the density analysis loop.

Architectural Constraints & Safeguards

Memory Boundary Management: Line sampling is hard-capped at 500 lines per file, so memory use stays bounded even when the target file is very large.
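One way to enforce such a cap is to read lazily and stop after a fixed number of lines. The sketch below assumes UTF-8 decoding with replacement of undecodable bytes; `sample_lines` and `MAX_SAMPLE_LINES` are hypothetical names, not identifiers from this PR.

```python
from itertools import islice

MAX_SAMPLE_LINES = 500  # cap stated in the PR description


def sample_lines(path, limit=MAX_SAMPLE_LINES):
    """Return at most `limit` lines from the start of the file."""
    # The file object is iterated lazily, so lines past the cap are never read.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, limit)]
```

Note that this bounds the number of lines, not the number of bytes: a file with one extremely long line would still be read in full unless a byte limit is added as well.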
Binary Stream Checks: Implements an early header check for null bytes (\x00). If one is found within the initial block read, the stream is classified as a binary object (compiled executable, media archive, database) and skipped, which avoids unnecessary regex processing.
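A null-byte check of this kind can be sketched in a few lines. The 4096-byte block size and the name `looks_binary` are assumptions for illustration; the PR description does not state the actual block size.

```python
def looks_binary(path, block_size=4096):
    """Return True if the first block of the file contains a null byte."""
    # Read raw bytes so no text decoding happens before the check.
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(block_size)
```

A caller would run this before any line sampling and skip the file when it returns True.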
Nested Mount Prevention: The file tree traversal explicitly skips any matching storage image extensions (.dd, .img, etc.) discovered inside an active filesystem mount to eliminate recursive loop device allocations or kernel lockups.
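The traversal rule can be illustrated with a plain `os.walk` loop. The extension set below is an assumption (the description only lists ".dd, .img, etc."), and `iter_candidate_files` is a hypothetical helper name.

```python
import os

# Assumed set of storage image extensions to skip inside a mounted filesystem.
IMAGE_EXTENSIONS = {".dd", ".img", ".raw", ".e01"}


def iter_candidate_files(mount_root):
    """Yield file paths under mount_root, skipping nested disk images."""
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in IMAGE_EXTENSIONS:
                # Never hand a nested image to the mounting layer.
                continue
            yield os.path.join(dirpath, name)
```

Matching on the lowercased extension means files such as `BACKUP.IMG` are skipped as well.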
Mount Cleanup: The block-device mapping layer is wrapped entirely in a try...finally block. This guarantees that BlockDevice.umount() runs even when processing raises an exception, so no loop devices or host mount points are left behind.
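The cleanup guarantee follows the usual try...finally shape. In the sketch below, `FakeBlockDevice` is a hypothetical stand-in written only for this example; it is not OpenRelik's BlockDevice class, and its `mount()` return value is an assumption. Only the `umount()` call in the `finally` clause reflects what the description states.

```python
class FakeBlockDevice:
    """Hypothetical stand-in for illustration; not OpenRelik's real class."""

    def __init__(self, image_path):
        self.image_path = image_path
        self.mounted = False

    def mount(self):
        self.mounted = True
        return ["/mnt/fake"]  # assumed: a list of mount points

    def umount(self):
        self.mounted = False


def scan_image(device, analyze):
    """Mount, analyze every mount point, and always unmount afterwards."""
    try:
        mountpoints = device.mount()
        return [analyze(mountpoint) for mountpoint in mountpoints]
    finally:
        # Runs on success and on any exception raised above.
        device.umount()
```

If `analyze` raises, the exception still propagates to the caller, but only after `umount()` has run.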

Verification & Testing

Unit Validation: Verified regex parsing accuracy and threshold triggers directly against standard text streams and extensionless UNIX log samples.
Integration Validation: Executed successful end-to-end integration cycles within a local containerized deployment against raw .dd targets, verifying host kernel module utilization (nbd), loop device mapping, system tree traversal, and final artifact generation.


hacktobeer · 2026-06-09T00:23:00Z

@hemantkumar15438 Is this PR ready for review?

Add log discovery analyzer task for extensionless timestamp density scanning

1080261

hemantkumar15438 wants to merge 1 commit into openrelik:main from hemantkumar15438:feature/log-discovery-release
