You're already using Fluent Bit to collect logs. But when you want to query them later, you hit a wall: decompress everything first, or pay for Athena/BigQuery to do it for you.
pfc-fluentbit routes Fluent Bit output into compressed .pfc archives that DuckDB can query directly, decompressing only the blocks inside your time window and leaving everything else untouched on disk.
Free for personal and open-source use — no account, no signup, no daily limits. Commercial use requires a license: info@impossibleforge.com
| | gzip / zstd | PFC-JSONL |
|---|---|---|
| Compressed size | ~12–14% of original | ~9% of original (25–37% smaller) |
| Query one hour from a 30-day archive | Decompress everything | Decompress 1 block, skip 719 |
| Query in DuckDB | ❌ Not possible | ✅ read_pfc_jsonl() |
| Works with Fluent Bit | ✅ | ✅ (this repo) |
A 30-day log archive compressed with gzip means downloading and decompressing the entire thing just to look at one hour of errors. PFC-JSONL stores a block index alongside every archive — DuckDB reads the index, skips irrelevant blocks, and decompresses only what you asked for.
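Conceptually, that skip is just an interval-overlap test against per-block metadata. A minimal sketch in Python, assuming a toy index with one block per hour (the field names and layout below are invented for illustration; the real .pfc.bidx format is produced by pfc_jsonl and read by the DuckDB extension):

```python
# Illustration of index-driven block skipping. The index entries here are a
# made-up structure (offset, length, min/max timestamp per block), not the
# actual .pfc.bidx on-disk format.
def blocks_to_read(index, ts_from, ts_to):
    """Keep only blocks whose time range overlaps the query window."""
    return [b for b in index if b["ts_max"] >= ts_from and b["ts_min"] <= ts_to]

# A 30-day archive with one block per hour (720 blocks total).
index = [
    {"offset": i * 1_000_000, "length": 1_000_000,
     "ts_min": i * 3600, "ts_max": (i + 1) * 3600}
    for i in range(720)
]

# Query a window that falls inside hour 500: one block is read, 719 are skipped.
hits = blocks_to_read(index, ts_from=500 * 3600 + 1, ts_to=501 * 3600 - 1)
print(f"{len(hits)} of {len(index)} blocks decompressed")
```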
The pipeline:
```
Your Services
      |
      v
Fluent Bit (collect, filter, enrich)
      |
      |  TCP json_lines (port 5170)
      v
pfc_forwarder.py                              <-- this repo
      |  buffers records in memory
      |  compresses when threshold reached
      v
/var/log/pfc/logs_20260404_1400.pfc           <-- compressed archive
/var/log/pfc/logs_20260404_1400.pfc.bidx      <-- block index for DuckDB
```
The forwarder is a lightweight Python daemon (standard library only, no pip installs).
It receives JSON log records over TCP, buffers them, and calls the pfc_jsonl binary to compress.
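As a rough sketch of that receive-buffer-rotate loop (illustrative only, not the actual pfc_forwarder.py; the real daemon also handles rotation timers, clean shutdown, and the pfc_jsonl invocation, which is left as a placeholder below):

```python
# Simplified sketch of the forwarder pattern: accept newline-delimited JSON
# over TCP, buffer it in memory, and hand the chunk off once a size threshold
# is reached. Not the actual pfc_forwarder.py implementation.
import socketserver

BUFFER_LIMIT = 32 * 1024 * 1024  # rotate after ~32 MB, matching the default
buffer = bytearray()

def rotate(chunk: bytes) -> None:
    # In the real forwarder, this is where the buffered records would be
    # compressed into a .pfc archive (plus .bidx index) via the pfc_jsonl binary.
    print(f"would compress {len(chunk):,} bytes")

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        global buffer
        for line in self.rfile:            # one JSON record per line (json_lines)
            buffer.extend(line)
            if len(buffer) >= BUFFER_LIMIT:
                rotate(bytes(buffer))
                buffer = bytearray()

if __name__ == "__main__":
    with socketserver.TCPServer(("127.0.0.1", 5170), Handler) as srv:
        srv.serve_forever()
```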
Step 1. Install the pfc_jsonl binary:

```bash
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
```

macOS Intel (x64): binary coming soon. Windows: no native binary; use WSL2 or a Linux machine.
Step 2. Download and run the forwarder:

```bash
# Download
curl -L https://raw.githubusercontent.com/ImpossibleForge/pfc-fluentbit/main/pfc_forwarder.py \
  -o /opt/pfc_forwarder.py

# Run (will create /var/log/pfc/ automatically)
python3 /opt/pfc_forwarder.py
```

Default: listens on port 5170, rotates every 32 MB or 1 hour.
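Once it's running, you can check that it accepts records before wiring up Fluent Bit, for example by pushing a single newline-delimited JSON record at the default port (the record fields below are arbitrary examples):

```python
# Send one test record to the forwarder over TCP, using the same
# newline-delimited JSON framing as Fluent Bit's json_lines format.
import json
import socket
import time

record = {"date": time.time(), "level": "info",
          "service": "smoke-test", "message": "hello from pfc-fluentbit"}

with socket.create_connection(("127.0.0.1", 5170)) as sock:
    sock.sendall((json.dumps(record) + "\n").encode("utf-8"))
```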
Step 3. Add to your Fluent Bit config:

```
[OUTPUT]
    Name   tcp
    Match  *
    Host   127.0.0.1
    Port   5170
    Format json_lines
```

That's it. Archives appear in /var/log/pfc/ automatically.
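To confirm the whole pipeline, check that archives and their .bidx indexes show up after the first rotation, e.g. with a quick Python check:

```python
# List the archives the forwarder has written and whether each has its
# companion .bidx block index.
from pathlib import Path

for archive in sorted(Path("/var/log/pfc").glob("*.pfc")):
    index = Path(str(archive) + ".bidx")
    status = "index present" if index.exists() else "index missing"
    print(f"{archive.name}  {archive.stat().st_size:,} bytes  ({status})")
```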
Forwarder options:

```bash
python3 pfc_forwarder.py \
  --port 5170                              # TCP port (default: 5170)
  --output-dir /var/log/pfc                # Where to write .pfc archives
  --buffer-mb 32                           # Rotate after N MB (default: 32)
  --rotate-secs 3600                       # Rotate after N seconds (default: 3600 = 1h)
  --pfc-binary /usr/local/bin/pfc_jsonl    # Path to pfc_jsonl binary
```

The .pfc.bidx index enables fast time-range queries without reading entire files:
```sql
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Query one hour of logs
SELECT
  line->>'$.level'   AS level,
  line->>'$.service' AS service,
  line->>'$.message' AS message
FROM read_pfc_jsonl(
  '/var/log/pfc/logs_20260404_1400.pfc',
  ts_from = epoch(TIMESTAMPTZ '2026-04-04 14:00:00+00'),
  ts_to   = epoch(TIMESTAMPTZ '2026-04-04 15:00:00+00')
);
```

Only the relevant blocks are decompressed — the rest is never read.
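The same query can be scripted from Python through the duckdb package, assuming the pfc community extension installs there the same way it does from the DuckDB CLI:

```python
# Run the time-range query from Python via DuckDB.
import duckdb

con = duckdb.connect()
con.execute("INSTALL pfc FROM community")
con.execute("LOAD pfc")
con.execute("LOAD json")

rows = con.execute("""
    SELECT line->>'$.level' AS level, count(*) AS n
    FROM read_pfc_jsonl(
        '/var/log/pfc/logs_20260404_1400.pfc',
        ts_from = epoch(TIMESTAMPTZ '2026-04-04 14:00:00+00'),
        ts_to   = epoch(TIMESTAMPTZ '2026-04-04 15:00:00+00')
    )
    GROUP BY level
""").fetchall()
print(rows)
```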
See pfc-duckdb for the full DuckDB extension.
See examples/docker-compose.yml for a complete setup with Fluent Bit collecting container logs.
Common issues:

- ERROR: pfc_jsonl binary not found. Run the Step 1 install command above.
- Address already in use: port 5170. Another process is on that port; use --port 5171 or kill the old process.
- Archives are bigger than input. Buffer size is too small; data compresses poorly at < 1 MB. Use the default --buffer-mb 32 in production.
Already have years of logs compressed with gzip, zstd, or bzip2 — on disk, on S3, on Azure, or on GCS?
pfc-migrate converts them in one command, in-region (no egress charges):
```bash
pip install pfc-migrate[all]

# Local directory
pfc-migrate convert --dir /var/log/old-archive/ --output-dir /var/log/pfc/ -v

# S3 bucket (converts in-region)
pfc-migrate s3 --bucket my-logs --prefix 2025/ --out-bucket my-logs-pfc --out-prefix pfc/
```

After conversion, DuckDB can query them directly via the pfc extension — no decompression needed.
Use the pfc Python package to read or query .pfc archives from Python scripts:
```bash
pip install pfc-jsonl
```

```python
import pfc

# Query the archives written by pfc_forwarder
pfc.query("/var/log/pfc/logs_20260404_1400.pfc",
          from_ts="2026-04-04T14:30:00",
          to_ts="2026-04-04T14:45:00",
          output_path="/tmp/15min.jsonl")
```

Related PFC projects:

- pfc-jsonl — core binary (compress/decompress/query)
- pfc-gateway — HTTP REST gateway — ingest + query, no DuckDB
- pfc-vector — high-performance Rust ingest daemon for Vector.dev and Telegraf
- pfc-migrate — one-shot export and archive conversion
- pfc-py — Python client library for PFC
- pfc-duckdb — DuckDB extension for SQL queries on PFC files
- pfc-otel-collector — OpenTelemetry OTLP/HTTP log exporter
- pfc-kafka-consumer — Kafka / Redpanda consumer → PFC
- pfc-telegraf — Telegraf HTTP output plugin → PFC
- pfc-grafana — Grafana data source plugin for PFC archives
PFC-FluentBit is an independent open-source project and is not affiliated with, endorsed by, or associated with the Fluent Bit project or the Cloud Native Computing Foundation (CNCF).
pfc-fluentbit (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use.
Commercial use requires a license: info@impossibleforge.com
Built by ImpossibleForge