Skip to content

04RR/sievelog

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sievelog

Cut your observability bill by 40%+ in 5 minutes. No migration. No risk.

sievelog is an open-source log filter that sits between your applications and your observability platform (Datadog, Splunk, Grafana, New Relic — anything). It drops the noise before it's ingested, so you pay less without losing signal.

your apps → sievelog → Datadog/Splunk/Grafana
              ↓
         drops 40-60% of noise
         keeps 100% of errors

The problem

You're paying $0.10/GB to ingest logs into Datadog. 90% of those logs are health checks passing, requests completing normally, and cache hits — operational noise nobody looks at unless something breaks. But you pay to ingest, index, and store all of it.

What sievelog does

It reads your log stream, applies a configurable set of rules, and outputs only what matters:

Rule What it drops Typical reduction
drop_levels DEBUG/TRACE in production 10-30%
health_check Health/readiness/liveness probes 5-15%
dedup Identical consecutive lines 5-10%
drop_match Known noise patterns (cache hits, pool stats) 5-15%
field_strip Verbose fields (stack traces on INFO, full headers) 5-10% bytes
rate_limit Per-service line rate caps variable

Errors and warnings always pass through. sievelog never drops ERROR, FATAL, PANIC, or CRITICAL lines regardless of any rule. This is the safety guarantee.

Quick start

# Build from source
go build -o sievelog ./cmd/sievelog/

# Pipe your logs through it
cat your_logs.jsonl | ./sievelog -config sievelog.json > filtered.jsonl

# See what it did
# [sievelog] FINAL lines_in=30 lines_out=15 dropped=15 reduction=43.7%

Example

Input (30 lines of a typical payment service):

{"level":"INFO","msg":"Health check OK","latency_ms":2}
{"level":"INFO","msg":"Request processed","endpoint":"/api/charge","status":200}
{"level":"DEBUG","msg":"Cache hit for customer_id=8827"}
{"level":"INFO","msg":"Health check OK","latency_ms":1}
{"level":"INFO","msg":"Request processed","endpoint":"/api/charge","status":200}
{"level":"ERROR","msg":"DB connection timeout","status":503,"latency_ms":5002}
{"level":"INFO","msg":"Metrics exported successfully"}
...

Output (15 lines — 43.7% reduction):

{"level":"INFO","msg":"Request processed","endpoint":"/api/charge","status":200}
{"level":"ERROR","msg":"DB connection timeout","status":503,"latency_ms":5002}
{"level":"WARN","msg":"Elevated latency detected","latency_ms":312}
...

Dropped: health checks, DEBUG lines, cache hits, metrics noise, pool stats. Kept: all errors, all warnings, all real request traffic.

Config

sievelog uses a JSON config file. Here's the default:

{
  "global": {
    "json_mode": true,
    "level_field": "level",
    "message_field": "msg",
    "passthrough_on_error": true,
    "stats": true
  },
  "rules": [
    {
      "name": "drop_debug",
      "type": "drop_levels",
      "action": "drop",
      "config": { "levels": ["DEBUG", "TRACE"] }
    },
    {
      "name": "drop_health_checks",
      "type": "health_check",
      "action": "drop",
      "config": {
        "patterns": ["health check", "healthcheck", "liveness probe"],
        "endpoints": ["/health", "/healthz", "/ready", "/readyz", "/livez"]
      }
    },
    {
      "name": "dedup",
      "type": "dedup",
      "action": "drop",
      "config": { "window_sec": 60, "min_count": 3 }
    },
    {
      "name": "drop_noise",
      "type": "drop_match",
      "action": "drop",
      "config": {
        "patterns": ["Connection pool stats", "Cache hit for", "Metrics exported"]
      }
    },
    {
      "name": "strip_fields",
      "type": "field_strip",
      "action": "passthrough",
      "config": {
        "fields": ["stack_trace", "full_headers", "request_body"]
      }
    }
  ]
}

Usage with common log shippers

# With Filebeat (output to file, sievelog reads it)
filebeat -e | ./sievelog -config sievelog.json | your-destination

# With Fluentd (pipe through sievelog before forwarding)
<match **>
  @type exec_filter
  command /usr/local/bin/sievelog -config /etc/sievelog.json
</match>

# With Vector (as an external transform)
[transforms.sievelog]
  type = "exec"
  command = ["/usr/local/bin/sievelog", "-config", "/etc/sievelog.json"]

# With Docker
docker logs my-container 2>&1 | ./sievelog -config sievelog.json

# Dry run (process but don't filter — just see stats)
cat production.log | ./sievelog -config sievelog.json -dry-run 2>&1 >/dev/null
# [sievelog] FINAL lines_in=1000000 lines_out=420000 dropped=580000 reduction=58.0%

CLI flags

Flag Default Description
-config sievelog.json Path to config file
-stats false Print reduction stats to stderr
-dry-run false Process input, count stats, but forward everything
-version Print version and exit

Safety guarantees

  1. Errors always pass. Lines with level ERROR, FATAL, PANIC, CRIT, or CRITICAL are forwarded regardless of any rule.
  2. Parse failures pass. If passthrough_on_error is true (default), lines that fail JSON parsing are forwarded as-is.
  3. No data loss by default. sievelog only drops what you explicitly configure it to drop. The default config is conservative.

Performance

sievelog is a single Go binary with zero external dependencies. It processes logs line-by-line with minimal memory allocation.

  • Throughput: 50K+ lines/second on a single core
  • Memory: ~50MB baseline
  • Binary size: ~8MB
  • Startup time: instant

How much will you save?

Run the dry-run mode on a sample of your production logs:

# Grab 1 hour of logs
kubectl logs -l app=your-service --since=1h > sample.log

# See the reduction
cat sample.log | ./sievelog -config sievelog.json -dry-run -stats 2>&1 >/dev/null

If you're sending 1TB/day to Datadog at $0.10/GB and sievelog reduces it by 40%, that's $14,600/month saved.

Roadmap

  • v0.1: Core rule engine (drop, dedup, health check, field strip, rate limit)
  • v0.2: Prometheus metrics endpoint (/metrics) for monitoring sievelog itself
  • v0.3: Helm chart for K8s DaemonSet deployment
  • v0.4: OTel Collector processor plugin
  • v0.5: Statistical summaries (replace N identical lines with one summary)
  • v1.0: ML-based anomaly detection (learn what's "normal" per-service)

License

Apache 2.0

Contributing

Issues and PRs welcome. Start with the sievelog.json config — the best way to help is to add rule patterns that work for your log format.

About

sievelog is an open-source log filter that sits between your applications and your observability platform (Datadog, Splunk, Grafana, New Relic, anything). It drops the noise before it's ingested, so you pay less without losing signal.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages