You have compressed log archives on disk. To query them you normally decompress everything first, even if you only need one hour out of thirty days.
This extension changes that. Query `.pfc` files directly from DuckDB SQL. A block index tells the extension exactly which chunks of the file to decompress; the rest stays compressed.
**Requires:** The `pfc_jsonl` binary installed on your machine (Step 1 below). The extension calls it for decompression.

**Platform:** Linux x86_64 and macOS Apple Silicon (ARM64). No native Windows binary; Windows users must use WSL2 or a Linux machine.
```sql
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

SELECT
    line->>'$.level'   AS level,
    line->>'$.message' AS message
FROM read_pfc_jsonl('/var/log/events.pfc')
WHERE line->>'$.level' = 'ERROR';
```

PFC-JSONL is a high-performance compressed log format built for structured (JSONL) data. It achieves better compression than gzip and zstd on real log data while supporting random block access, meaning you can decompress only the time range you need.
Key properties:
- Each file is split into independently compressible blocks
- A `.pfc.bidx` binary index stores the byte offset and timestamp range of every block
- The PFC binary can decompress any subset of blocks in a single call
- Free for personal and open-source use: no account, no signup required
```
┌──────────────────────────────────────────────────────────────┐
│ DuckDB                                                       │
│                                                              │
│   SELECT * FROM read_pfc_jsonl('events.pfc', ts_from=...)    │
│            │                                                 │
│   ┌────────▼─────────┐    reads    ┌─────────────────────┐   │
│   │ pfc extension    │────────────▶│ events.pfc.bidx     │   │
│   │ (MIT, open src)  │ block index │ (block timestamps)  │   │
│   └────────┬─────────┘             └─────────────────────┘   │
│            │ popen() / subprocess                            │
└────────────┼─────────────────────────────────────────────────┘
             │
             ▼
   ┌─────────────────────┐
   │ pfc_jsonl binary    │  ← proprietary, closed source
   │ (v3.4+, local)      │    contains BWT+rANS compression
   └─────────────────────┘
             │
             ▼
   decompressed JSON lines → back to DuckDB
```
The extension is a thin open-source wrapper: it reads the `.bidx` index in C++ to select which blocks are needed, then calls the PFC binary once to decompress only those blocks. The compression algorithm stays closed.
The extension calls the `pfc_jsonl` binary for decompression.
Download the latest release for your platform:
Linux x64:

```bash
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
  -o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help   # verify install
```

macOS (Apple Silicon M1/M2/M3/M4):

```bash
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
  -o /usr/local/bin/pfc_jsonl
chmod +x /usr/local/bin/pfc_jsonl
pfc_jsonl --help   # verify install
```

macOS Intel (x64): Binary coming soon.
Custom path: Set `PFC_JSONL_BINARY=/path/to/pfc_jsonl` in your environment to override the default `/usr/local/bin/pfc_jsonl`.
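The lookup order above can be mirrored in a few lines. This is a sketch of the documented behavior (environment override wins, else the default path), not the extension's actual C++ source:

```python
import os

DEFAULT_BINARY = "/usr/local/bin/pfc_jsonl"

def resolve_pfc_binary(env=None):
    """Return the pfc_jsonl path: PFC_JSONL_BINARY wins, else the default."""
    env = os.environ if env is None else env
    return env.get("PFC_JSONL_BINARY", DEFAULT_BINARY)

print(resolve_pfc_binary({}))                                # /usr/local/bin/pfc_jsonl
print(resolve_pfc_binary({"PFC_JSONL_BINARY": "/opt/pfc"}))  # /opt/pfc
```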
```sql
INSTALL pfc FROM community;
LOAD pfc;
```

To build from source instead:

```bash
git clone --recurse-submodules https://github.com/ImpossibleForge/pfc-duckdb
cd pfc-duckdb
GEN=ninja make release
# Extension at: build/release/extension/pfc/pfc.duckdb_extension
```

Then load and query:

```sql
LOAD pfc;
SELECT line FROM read_pfc_jsonl('/path/to/file.pfc');
```

Each row contains one raw JSON string in the `line` column.
Use the DuckDB json extension to parse fields:
```sql
LOAD json;

SELECT
    line->>'$.timestamp' AS ts,
    line->>'$.level'     AS level,
    line->>'$.message'   AS message,
    line->>'$.service'   AS service
FROM read_pfc_jsonl('/path/to/file.pfc');
```

PFC files include a `.pfc.bidx` index with the timestamp range of each block.
Pass `ts_from` and/or `ts_to` (Unix seconds) to skip entire blocks before decompression:

```sql
-- Only decompress blocks that overlap the given time window
SELECT line
FROM read_pfc_jsonl(
    '/path/to/file.pfc',
    ts_from = 1767225600,  -- 2026-01-01 00:00:00 UTC
    ts_to   = 1767311999   -- 2026-01-01 23:59:59 UTC
);
```

Convert a timestamp string to Unix seconds with `epoch()`:
```sql
SELECT line
FROM read_pfc_jsonl(
    '/path/to/file.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-01-01 00:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-01-02 00:00:00+00')
);
```

`ts_from`/`ts_to` skip entire blocks (coarse, fast).
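To sanity-check Unix-second bounds outside DuckDB, plain Python gives the same values as `epoch()`; this helper is ours, not part of the extension:

```python
from datetime import datetime, timezone

def unix_seconds(y, mo, d, h=0, mi=0, s=0):
    """Unix seconds for a UTC wall-clock time (same value epoch() returns)."""
    return int(datetime(y, mo, d, h, mi, s, tzinfo=timezone.utc).timestamp())

ts_from = unix_seconds(2026, 1, 1)       # 1767225600 -> 2026-01-01 00:00:00 UTC
ts_to   = unix_seconds(2026, 1, 2) - 1   # 1767311999 -> 2026-01-01 23:59:59 UTC
```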
Add a WHERE clause for row-level precision:
```sql
LOAD json;

SELECT line->>'$.message' AS msg
FROM read_pfc_jsonl(
    '/var/log/api.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-03-15 08:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-03-15 10:00:00+00')
)
WHERE line->>'$.level' = 'ERROR';
```

```sql
LOAD json;

-- Error rate per hour
SELECT
    strftime(to_timestamp((line->>'$.ts')::BIGINT), '%Y-%m-%d %H:00') AS hour,
    count(*) FILTER (WHERE line->>'$.level' = 'ERROR') AS errors,
    count(*) AS total
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY hour ORDER BY hour;
```
```sql
-- Top 10 slowest endpoints
SELECT
    line->>'$.path' AS endpoint,
    avg((line->>'$.duration_ms')::DOUBLE) AS avg_ms,
    count(*) AS requests
FROM read_pfc_jsonl('/var/log/api.pfc')
GROUP BY endpoint ORDER BY avg_ms DESC LIMIT 10;
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `path` | `VARCHAR` | (required) | Path to the `.pfc` file. A `.pfc.bidx` index must exist at `path + ".bidx"`. |
| `ts_from` | `BIGINT` | `0` | Lower bound for block selection (Unix seconds). `0` = no lower bound. |
| `ts_to` | `BIGINT` | `0` | Upper bound for block selection (Unix seconds). `0` = no upper bound. |
Returns: a table with a single column `line VARCHAR`, one row per decompressed JSON line.
Block filtering semantics:
- A block is included if its timestamp range `[ts_start, ts_end]` overlaps `[ts_from, ts_to]`.
- Blocks with unknown timestamps are always included.
- If both `ts_from` and `ts_to` are `0`, all blocks are read.
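The rules above amount to an interval-overlap test. A sketch mirroring the documented semantics (not the extension's C++ implementation):

```python
def block_selected(block, ts_from=0, ts_to=0):
    """block is (ts_start, ts_end); None means the block's timestamps are unknown.
    A bound of 0 means 'no bound'. Returns True if the block must be decompressed."""
    ts_start, ts_end = block
    if ts_start is None or ts_end is None:
        return True                      # unknown timestamps: always included
    if ts_from and ts_end < ts_from:
        return False                     # block ends before the window starts
    if ts_to and ts_start > ts_to:
        return False                     # block starts after the window ends
    return True                          # ranges overlap

blocks = [(100, 199), (200, 299), (None, None), (300, 399)]
selected = [b for b in blocks if block_selected(b, ts_from=250, ts_to=320)]
# keeps (200, 299), (None, None), (300, 399)
```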
| File | Required | Description |
|---|---|---|
| `file.pfc` | yes | Compressed PFC-JSONL file |
| `file.pfc.bidx` | yes | Binary block index (requires PFC-JSONL v3.4+) |
Generate both with the PFC binary:
```bash
pfc_jsonl compress input.jsonl output.pfc
# Produces: output.pfc + output.pfc.bidx
```

Note: The Docker image on Docker Hub (`impossibleforge/pfc-jsonl`) is a server-side compression tool. It is not required for using the DuckDB extension; you only need the standalone `pfc_jsonl` binary from GitHub Releases.
Block-level filtering can skip the majority of a file. Example: on a 30-day log file with 720 hourly blocks, a 1-hour query reads 1 block instead of 720.
| Query range | Blocks read | Speedup (720-block file) |
|---|---|---|
| 30 days | 720/720 | 1× |
| 1 day | ~24/720 | ~30× |
| 1 hour | ~1/720 | ~720× |
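The speedup column follows directly from the fraction of blocks read. A back-of-the-envelope helper (ours, assuming decompression dominates query time):

```python
def estimated_speedup(blocks_read, total_blocks=720):
    """Idealized speedup from reading only blocks_read of total_blocks:
    decompression work scales with the number of blocks touched."""
    return total_blocks / blocks_read

print(estimated_speedup(1))    # 720.0 -> the ~720x row
print(estimated_speedup(24))   # 30.0  -> the ~30x row
```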
`Cannot open index file: /path/to/file.pfc.bidx`

The `.pfc.bidx` index is missing. Compress with PFC-JSONL v3.4+:

```bash
pfc_jsonl compress input.jsonl output.pfc
```

`PFC binary not found at '/usr/local/bin/pfc_jsonl'`

The binary is missing or not executable. Re-run the curl install command, or set `PFC_JSONL_BINARY=/path/to/pfc_jsonl`.

`popen() failed - could not start PFC binary subprocess`

The extension uses `popen()` to call the PFC binary. Windows is not supported; use WSL2 or a Linux machine.

`ts_from (...) must be <= ts_to (...)`

You passed an inverted time range. Swap the values so `ts_from` comes before `ts_to`.
| Project | Description |
|---|---|
| pfc-jsonl | The core binary: compress, decompress, query |
| pfc-gateway | HTTP query + ingest API for .pfc archives; enables Grafana and remote access |
| pfc-fluentbit | Stream Fluent Bit logs directly to .pfc archives |
| pfc-migrate | Convert existing gzip/zstd/lz4 archives to PFC: local, S3, Azure, GCS |
| pfc-jsonl (PyPI) | Python package: `pip install pfc-jsonl` |
| pfc-vector | High-performance Rust ingest daemon for Vector.dev |
| pfc-otel-collector | OpenTelemetry OTLP/HTTP log exporter |
| pfc-kafka-consumer | Kafka / Redpanda consumer |
| pfc-telegraf | Telegraf HTTP output plugin → PFC |
| pfc-grafana | Grafana data source plugin for PFC archives |
PFC-DuckDB is an independent open-source project and is not affiliated with, endorsed by, or associated with the DuckDB Foundation or DuckDB Labs.
The pfc DuckDB extension (this repository) is released under the MIT License; see LICENSE.
The PFC-JSONL binary (`pfc_jsonl`) is proprietary software, free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com