cwccie/logforge

LogForge

Hybrid log parser (Drain3-inspired + LLM-ready) for production log analysis. Parse, mine templates, detect anomalies, extract entities, search, and compute statistics across multiple log formats.

Architecture

                        +-------------------+
                        |   CLI Interface   |
                        |  (Click commands) |
                        +--------+----------+
                                 |
              +------------------+------------------+
              |                  |                   |
     +--------v-------+  +------v------+  +---------v--------+
     |     Parser      |  |   Miner     |  | Anomaly Detector |
     | (Multi-format)  |  | (Drain3)    |  | (Frequency-based)|
     +--------+--------+  +------+------+  +---------+--------+
              |                  |                    |
              |     +------------+------------+      |
              |     |                         |      |
     +--------v-----v--+            +---------v------v--+
     |    Entity        |            |    Statistics      |
     |   Extraction     |            |    Engine          |
     +--------+---------+            +---------+----------+
              |                                |
     +--------v--------------------------------v--+
     |              Data Models                    |
     |  (ParsedLog, LogTemplate, Anomaly, Stats)   |
     +---------------------------------------------+

  Supported Formats:
  +----------+  +-----------+  +-----+  +------+
  | Syslog   |  | JSON      |  | CEF |  | LEEF |
  | RFC3164  |  | Structured|  |     |  |      |
  | RFC5424  |  |           |  |     |  |      |
  +----------+  +-----------+  +-----+  +------+
  +----------------+  +------------+
  | Windows Event  |  | Plain Text |
  | XML            |  | (regex)    |
  +----------------+  +------------+

Features

  • Multi-format parsing -- Syslog RFC3164/5424, JSON structured, CEF, LEEF, Windows Event XML, and plain text with regex-based extraction
  • Template mining -- Drain3-inspired algorithm using a fixed-depth parse tree for automatic log template extraction; groups similar log lines and replaces variables with <*> wildcards
  • Anomaly detection -- Frequency-based detection of new templates, frequency spikes, rare templates, volume anomalies (z-score), and time-gap anomalies
  • Entity extraction -- Extracts IPv4/IPv6 addresses, hostnames, emails, timestamps, error codes, MAC addresses, ports, usernames, PIDs, file paths, URLs, stack traces, and Java exceptions
  • Search and filter -- Full-text search, severity/source/hostname/time-range/template-id/regex filters, grouping by template/severity/source
  • Statistics -- Log volume over time, severity breakdown, source distribution, top templates, error rate computation

Installation

pip install .

For development:

pip install -e ".[dev]"

Usage

Parse logs

# Parse from file (auto-detects format)
logforge parse /var/log/syslog

# Parse from stdin
cat /var/log/app.log | logforge parse

# Parse with entity extraction
logforge parse --extract-entities /var/log/auth.log

# Force format
logforge parse --format json app-logs.jsonl

# Limit lines
logforge parse --max-lines 1000 huge.log
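
As a rough illustration of what regex-based entity extraction involves, the sketch below pulls IPv4 addresses, email addresses, and PIDs out of a single log line. The patterns and the `extract_entities` helper are assumptions made for this example; they are not LogForge's actual patterns or API.

```python
import re

# Hypothetical patterns for illustration only; LogForge's own patterns may differ.
PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "pid": re.compile(r"\[(\d+)\]"),  # e.g. the 4321 in "sshd[4321]"
}


def extract_entities(line: str) -> dict[str, list[str]]:
    """Return every match of each pattern found in the line."""
    return {name: pattern.findall(line) for name, pattern in PATTERNS.items()}


line = "sshd[4321]: Failed password for bob@example.com from 10.0.0.7 port 22"
entities = extract_entities(line)
print(entities)
```

The IPv4 pattern here is deliberately loose (it would accept 999.999.999.999); a production extractor would likely validate octet ranges.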

Mine templates

# Extract templates from logs
logforge mine /var/log/syslog

# Tune similarity threshold (0.0 - 1.0)
logforge mine --threshold 0.6 /var/log/app.log

# Adjust parse tree depth
logforge mine --depth 5 /var/log/app.log
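
To give a feel for what the similarity threshold controls, here is a minimal sketch of the token-level comparison used in Drain-style miners. Two lines with the same token count are compared position by position; if the fraction of matching tokens meets the threshold, the line joins that template and the differing positions become <*>. The function names are hypothetical, wildcards are counted as matches as a simplifying assumption, and the fixed-depth parse tree that narrows the candidate templates is omitted entirely. This is not LogForge's code.

```python
def similarity(template: list[str], tokens: list[str]) -> float:
    """Fraction of positions that agree; an existing wildcard counts as a match."""
    if len(template) != len(tokens):
        return 0.0
    if not template:
        return 1.0
    same = sum(1 for t, tok in zip(template, tokens) if t == tok or t == "<*>")
    return same / len(template)


def merge(template: list[str], tokens: list[str]) -> list[str]:
    """Keep tokens that agree and replace the rest with the <*> wildcard."""
    return [t if t == tok else "<*>" for t, tok in zip(template, tokens)]


def mine(lines: list[str], threshold: float = 0.6) -> list[str]:
    """Group lines into templates using a linear scan (no parse tree)."""
    templates: list[list[str]] = []
    for line in lines:
        tokens = line.split()
        best = max(templates, key=lambda t: similarity(t, tokens), default=None)
        if best is not None and similarity(best, tokens) >= threshold:
            best[:] = merge(best, tokens)  # generalize the matched template in place
        else:
            templates.append(tokens)  # no close match: start a new template
    return [" ".join(t) for t in templates]


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent",
]
templates = mine(logs, threshold=0.6)
print(templates)
```

In this sketch a higher threshold yields more, narrower templates, while a lower one merges more aggressively and introduces more wildcards.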

Search logs

# Full-text search
logforge search -q "connection refused" /var/log/app.log

# Filter by severity
logforge search --severity error /var/log/syslog

# Filter by minimum severity (WARNING and above)
logforge search --severity-min warning /var/log/syslog

# Regex search
logforge search -r 'E\d{4}' /var/log/app.log

# Group by severity
logforge search --group-by severity /var/log/syslog

# Combine filters
logforge search -q "timeout" --severity error --source nginx --limit 50 access.log

Compute statistics

logforge stats /var/log/syslog
logforge stats --max-lines 10000 /var/log/app.log

Detect anomalies

# Basic anomaly detection
logforge anomaly /var/log/app.log

# Custom time window (seconds)
logforge anomaly --window 600 /var/log/app.log

# Adjust spike sensitivity
logforge anomaly --spike-threshold 2.0 /var/log/app.log
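
The volume check listed under Features is described as z-score based. As a sketch of that idea, assuming log lines have already been counted per time window, a window is flagged when its count lies more than a threshold number of standard deviations from the mean. The function name, the sample counts, and the choice of population standard deviation are assumptions for this example; how the --window and --spike-threshold options map onto LogForge's internal computation is not documented here.

```python
from statistics import mean, pstdev


def volume_anomalies(counts: list[int], threshold: float = 2.0) -> list[int]:
    """Return indices of windows whose count has |z-score| above the threshold."""
    if len(counts) < 2:
        return []  # not enough data to estimate spread
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # all windows identical: nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical per-window line counts; the last window is a burst.
counts = [100, 104, 98, 101, 97, 102, 99, 400]
flagged = volume_anomalies(counts)
print(flagged)
```

A single extreme window inflates both the mean and the standard deviation, so with few windows a z-score can never grow very large; robust alternatives such as the median absolute deviation are a common substitute.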

Output to file

All commands support -o for file output:

logforge parse -o parsed.json /var/log/syslog
logforge mine -o templates.json /var/log/app.log
logforge stats -o report.json /var/log/syslog

Docker

Build and run with Docker:

docker build -t logforge .
docker run --rm -v /var/log:/data/logs logforge parse /data/logs/syslog

Docker Compose (full stack):

docker-compose up -d

Services:

  • parser-engine -- Log parsing service
  • log-store -- Log storage backend
  • api -- API service

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/ tests/

# Run with coverage
pytest --cov=logforge --cov-report=term-missing

Project Structure

logforge/
  src/logforge/
    __init__.py        # Package version
    models.py          # Data classes (ParsedLog, LogTemplate, Anomaly, LogStats)
    parser.py          # Multi-format log parser
    miner.py           # Drain3-inspired template mining
    entities.py        # Entity extraction (IPs, hostnames, errors, etc.)
    anomaly.py         # Anomaly detection engine
    search.py          # Search, filter, and grouping
    stats.py           # Statistics computation
    cli.py             # Click CLI (parse, mine, search, stats, anomaly)
  tests/               # 20+ pytest test cases
  pyproject.toml       # Hatchling build config
  Dockerfile
  docker-compose.yml
  .github/workflows/ci.yml
  LICENSE              # MIT
  README.md

License

MIT License. Copyright (c) 2026 Corey Wade.
