Export any JSONL data directly to PFC cold storage — or convert existing compressed JSONL archives from local disk, S3, Azure, or GCS. No intermediate files, no schema changes, no pipelines.
| Command | What it does |
|---|---|
| `pfc-migrate cratedb` | Stream a CrateDB table directly to a `.pfc` archive |
| `pfc-migrate questdb` | Stream a QuestDB table directly to a `.pfc` archive |
| `pfc-migrate convert` | Convert gzip/zstd/bzip2/lz4/JSONL files to PFC |
| `pfc-migrate s3` | Convert JSONL archives in S3 in-place |
| `pfc-migrate glacier` | Restore + convert S3 Glacier archives to PFC |
| `pfc-migrate azure` | Convert JSONL archives in Azure Blob Storage |
| `pfc-migrate gcs` | Convert JSONL archives in Google Cloud Storage |
Once your archives are in PFC format, DuckDB can query them directly — without decompressing the whole file first:
```sql
INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Query just one hour from a 30-day archive
SELECT line->>'$.level' AS level, line->>'$.message' AS message
FROM read_pfc_jsonl(
    '/var/log/pfc/app_2026-03-01.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
);
```

| Tool | 1h query on 30-day archive | Storage vs gzip |
|---|---|---|
| gzip | Decompress full 30-day file | — |
| zstd | Decompress full 30-day file | — |
| PFC-JSONL | Decompress ~1/720 of the file | 25% smaller than gzip |
Typical JSONL log data compresses to roughly 6–11% of its original size, i.e. 25–40% smaller than the equivalent gzip archive.
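As a quick sanity check on those figures, the arithmetic below applies them to the CrateDB export example shown later in this README; the ~8% gzip ratio is an assumed typical value, not a measurement:

```python
# Worked example using the CrateDB export numbers shown later in this README.
jsonl_mib = 43.7   # raw JSONL size of the export
pfc_mib = 2.6      # resulting .pfc archive
print(f"PFC ratio: {pfc_mib / jsonl_mib:.1%}")  # -> 5.9%

# Assuming gzip lands at a typical ~8% ratio on the same data (assumption):
gzip_mib = jsonl_mib * 0.08
print(f"Smaller than gzip by: {1 - pfc_mib / gzip_mib:.0%}")  # -> ~26%
```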
Cloud conversions run in-region (download → convert → upload) without routing data through your laptop or incurring egress charges.
| Format | Extension | Extra dependency |
|---|---|---|
| gzip | `.jsonl.gz` | stdlib ✅ |
| bzip2 | `.jsonl.bz2` | stdlib ✅ |
| zstd | `.jsonl.zst` | `pip install pfc-migrate[zstd]` |
| lz4 | `.jsonl.lz4` | `pip install pfc-migrate[lz4]` |
| Plain JSONL | `.jsonl` | stdlib ✅ |
The `pfc_jsonl` binary must be installed on the machine running the export:

```bash
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
  -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl
```

License note: this tool requires the `pfc_jsonl` binary. `pfc_jsonl` is free for personal and open-source use; commercial use requires a separate license. See pfc-jsonl for details.

macOS Intel (x64): binary coming soon. Windows: no native binary; use WSL2 or a Linux machine.
```bash
pip install pfc-migrate

# With zstd support
pip install pfc-migrate[zstd]

# With S3/Glacier support
pip install pfc-migrate[s3]

# With Azure Blob Storage support
pip install pfc-migrate[azure]

# With Google Cloud Storage support
pip install pfc-migrate[gcs]

# For CrateDB direct export
pip install pfc-migrate[postgres]

# For QuestDB direct export
pip install pfc-migrate[questdb]
```

Stream rows directly from a CrateDB table into a `.pfc` archive. No intermediate files.
```bash
pip install pfc-migrate[postgres]
```

```bash
# Export one week of logs
pfc-migrate cratedb \
  --host crate.example.com \
  --user crate \
  --dbname doc \
  --schema doc \
  --table logs \
  --ts-column ts \
  --from-ts "2026-03-01" --to-ts "2026-03-08" \
  --output logs_2026-03-01.pfc \
  --verbose

# Auto-named output: logs_20260301_20260308.pfc
pfc-migrate cratedb --host localhost --table logs \
  --from-ts "2026-03-01" --to-ts "2026-03-08" --verbose
```

Verbose output:

```text
-> Connecting to CrateDB at localhost:5432 (db: doc) ...
-> Columns (6): ts, level, message, host, service, duration_ms
-> Streaming rows (batch size: 10,000) ...
   100,000 rows (17.4 MiB) ...
   200,000 rows (34.8 MiB) ...
-> Exported 250,000 rows (43.7 MiB JSONL)
-> Compressing with pfc_jsonl ...
✓ 250,000 rows | JSONL 43.7 MiB -> PFC 2.6 MiB (5.9%) -> logs_20260301_20260308.pfc
```
| Option | Default | Description |
|---|---|---|
| `--host` | `localhost` | CrateDB host |
| `--port` | `5432` | PostgreSQL wire port |
| `--user` | `crate` | Username |
| `--password` | (empty) | Password |
| `--dbname` | `doc` | Database name |
| `--schema` | `doc` | Schema name |
| `--table` | required | Table to export |
| `--ts-column` | None | Timestamp column for WHERE filter and ORDER BY |
| `--from-ts` | None | Start of range (inclusive, ISO 8601) |
| `--to-ts` | None | End of range (exclusive, ISO 8601) |
| `--batch-size` | `10000` | Rows per fetch (memory-safe batching) |
| `--output` | (auto) | Output `.pfc` file |
| `--verbose` | `false` | Show row progress and size stats |
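To make the inclusive/exclusive range semantics concrete, here is a hypothetical sketch of how `--ts-column`, `--from-ts`, and `--to-ts` could combine into the export query; this is illustrative only, not the SQL `pfc-migrate` actually issues:

```python
# Hypothetical sketch of export-query construction (illustration, not the
# actual pfc-migrate implementation).
def build_export_query(schema, table, ts_column=None, from_ts=None, to_ts=None):
    query = f'SELECT * FROM "{schema}"."{table}"'
    conditions = []
    if ts_column and from_ts:
        conditions.append(f'"{ts_column}" >= \'{from_ts}\'')  # --from-ts is inclusive
    if ts_column and to_ts:
        conditions.append(f'"{ts_column}" < \'{to_ts}\'')     # --to-ts is exclusive
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    if ts_column:
        query += f' ORDER BY "{ts_column}"'                   # rows stream in time order
    return query

print(build_export_query("doc", "logs", "ts", "2026-03-01", "2026-03-08"))
# SELECT * FROM "doc"."logs" WHERE "ts" >= '2026-03-01' AND "ts" < '2026-03-08' ORDER BY "ts"
```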
Stream rows directly from a QuestDB table into a `.pfc` archive. No intermediate files.

```bash
pip install pfc-migrate[questdb]
```

```bash
# Export one week of trades
pfc-migrate questdb \
  --host quest.example.com \
  --table trades \
  --ts-column timestamp \
  --from-ts "2026-03-01" --to-ts "2026-03-08" \
  --output trades_2026-03-01.pfc \
  --verbose

# Auto-named output: trades_20260301_20260308.pfc
pfc-migrate questdb --host localhost --table trades \
  --from-ts "2026-03-01" --to-ts "2026-03-08" --verbose
```

Verbose output:

```text
-> Connecting to QuestDB at localhost:8812 (db: qdb) ...
-> Columns (5): timestamp, symbol, price, volume, side
-> Streaming rows (batch size: 10,000) ...
   100,000 rows (18.1 MiB) ...
-> Exported 120,000 rows (21.7 MiB JSONL)
-> Compressing with pfc_jsonl ...
✓ 120,000 rows | JSONL 21.7 MiB -> PFC 1.3 MiB (6.0%) -> trades_20260301_20260308.pfc
```
| Option | Default | Description |
|---|---|---|
| `--host` | `localhost` | QuestDB host |
| `--port` | `8812` | PostgreSQL wire port |
| `--user` | `admin` | Username |
| `--password` | `quest` | Password |
| `--dbname` | `qdb` | Database name |
| `--table` | required | Table to export (no schema prefix) |
| `--ts-column` | None | Timestamp column for WHERE filter and ORDER BY |
| `--from-ts` | None | Start of range (inclusive, ISO 8601) |
| `--to-ts` | None | End of range (exclusive, ISO 8601) |
| `--batch-size` | `10000` | Rows per fetch (memory-safe batching) |
| `--output` | (auto) | Output `.pfc` file |
| `--verbose` | `false` | Show row progress and size stats |
Note: QuestDB has no schema concept; tables are referenced by name only, so there is no `--schema` option.
```bash
# Single file (output auto-named: logs.pfc + logs.pfc.bidx)
pfc-migrate convert logs.jsonl.gz

# Explicit output
pfc-migrate convert logs.jsonl.gz logs.pfc

# Entire directory
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/

# Recursive + verbose
pfc-migrate convert --dir /mnt/logs/ -r -v
```
Conversion happens in-region (download to a temp dir → convert → upload), so no egress charges apply; a minimal sketch of this flow follows the examples below.
```bash
# Single object
pfc-migrate s3 \
  --bucket my-logs \
  --key archive/app_2026-03.jsonl.gz \
  --out-bucket my-logs-pfc \
  --out-prefix converted/

# All objects matching a prefix
pfc-migrate s3 \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --out-prefix converted/2026-03/ \
  --format gz \
  --verbose
```
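For a sense of what each object goes through, here is a minimal sketch of the download → convert → upload loop, assuming it runs on a machine in the bucket's region (bucket and key names are taken from the example above):

```python
# Minimal sketch of the in-region flow the s3 subcommand automates.
# Assumes boto3 is installed and credentials are configured.
import subprocess
import tempfile
from pathlib import Path

import boto3

s3 = boto3.client("s3")
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "app_2026-03.jsonl.gz"
    dst = Path(tmp) / "app_2026-03.pfc"
    s3.download_file("my-logs", "archive/app_2026-03.jsonl.gz", str(src))       # download
    subprocess.run(["pfc-migrate", "convert", str(src), str(dst)], check=True)  # convert
    s3.upload_file(str(dst), "my-logs-pfc", "converted/app_2026-03.pfc")        # upload
```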
# Glacier (Expedited retrieval)
pfc-migrate glacier \
--bucket my-glacier-logs \
--prefix 2025/ \
--out-bucket my-glacier-pfc \
--retrieval-tier Expedited# All blobs matching a prefix
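Glacier-class objects cannot be downloaded until they are restored to S3, so a restore step precedes the download → convert → upload loop above. A hedged sketch of what that step presumably looks like (illustrative names; the actual implementation may differ):

```python
# Sketch of an S3 Glacier restore, as the glacier subcommand presumably does
# internally before downloading. Bucket and key names are illustrative.
import time

import boto3

s3 = boto3.client("s3")
bucket, key = "my-glacier-logs", "2025/app_2025-01.jsonl.gz"

# Ask S3 to stage the object (Expedited retrievals typically take minutes)
s3.restore_object(
    Bucket=bucket,
    Key=key,
    RestoreRequest={"Days": 1, "GlacierJobParameters": {"Tier": "Expedited"}},
)

# Poll the Restore header until staging completes
while True:
    status = s3.head_object(Bucket=bucket, Key=key).get("Restore", "")
    if 'ongoing-request="true"' not in status:
        break  # restore finished; the object can now be downloaded normally
    time.sleep(30)
```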
```bash
# All blobs matching a prefix
pfc-migrate azure \
  --container my-logs \
  --prefix archive/2026-03/ \
  --out-container my-logs-pfc \
  --connection-string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"
```
```bash
# All objects matching a prefix
pfc-migrate gcs \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --verbose
```

Query CrateDB live data and cold PFC archives in a single DuckDB SQL statement:
```python
import duckdb
import pandas as pd
import psycopg2

con = duckdb.connect()
con.execute("INSTALL pfc FROM community; LOAD pfc; LOAD json;")

# Pull CrateDB live data into a DataFrame; DuckDB can register a DataFrame
# directly as a queryable view (a raw list of tuples cannot be registered)
cratedb_conn = psycopg2.connect(host="localhost", user="crate", dbname="doc")
cur = cratedb_conn.cursor()
cur.execute("SELECT * FROM logs WHERE ts >= '2026-04-01'")
live_df = pd.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])
con.register("live_logs", live_df)

# Query cold PFC archives + hot live data in one SQL statement
result = con.execute("""
    SELECT ts, level, message
    FROM pfc_scan([
        '/archives/logs_2026-01.pfc',
        '/archives/logs_2026-02.pfc',
        '/archives/logs_2026-03.pfc'
    ])
    UNION ALL
    SELECT ts, level, message FROM live_logs
    ORDER BY ts
""").fetchall()
```

See `examples/cratedb_archive_explorer.py` for a complete demo.
Query QuestDB live data and cold PFC archives in a single DuckDB SQL statement:

```python
import duckdb
import pandas as pd
import psycopg2

con = duckdb.connect()
con.execute("INSTALL pfc FROM community; LOAD pfc; LOAD json;")

# Pull QuestDB live data into a DataFrame and register it as a DuckDB view
questdb_conn = psycopg2.connect(host="localhost", port=8812,
                                user="admin", password="quest", dbname="qdb")
cur = questdb_conn.cursor()
cur.execute("SELECT * FROM trades WHERE timestamp >= '2026-04-01'")
live_df = pd.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])
con.register("live_trades", live_df)

# Query cold PFC archives + hot live data in one SQL statement
result = con.execute("""
    SELECT timestamp, symbol, price, volume
    FROM pfc_scan([
        '/archives/trades_2026-01.pfc',
        '/archives/trades_2026-02.pfc',
        '/archives/trades_2026-03.pfc'
    ])
    UNION ALL
    SELECT timestamp, symbol, price, volume FROM live_trades
    ORDER BY timestamp
""").fetchall()
```

Every conversion is verified by full decompression and an MD5 check before the output is written. If anything doesn't match, the output file is deleted and the error is reported; the original is never modified. For the S3, GCS, and Azure subcommands, `--delete` removes the original cloud object only after successful verification.
| Project | Description |
|---|---|
| pfc-jsonl | Core binary — compress, decompress, query |
| pfc-duckdb | DuckDB Community Extension (INSTALL pfc FROM community) |
| pfc-fluentbit | Fluent Bit -> PFC forwarder for live pipelines |
| pfc-archiver-cratedb | Autonomous daemon: archive old CrateDB partitions automatically |
| pfc-archiver-questdb | Autonomous daemon: archive old QuestDB partitions automatically |
| pfc-vector | High-performance Rust ingest daemon for Vector.dev and Telegraf |
| pfc-otel-collector | OpenTelemetry OTLP/HTTP log exporter |
| pfc-kafka-consumer | Kafka / Redpanda consumer |
| pfc-telegraf | Telegraf HTTP output plugin -> PFC |
| pfc-grafana | Grafana data source plugin for PFC archives |
pfc-migrate (this repository) is released under the MIT License — see LICENSE.
The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com