
pfc-migrate — Move any JSONL log or event data to PFC cold storage


Export any JSONL data directly to PFC cold storage — or convert existing compressed JSONL archives from local disk, S3, Azure, or GCS. No intermediate files, no schema changes, no pipelines.


What this does

| Command | What it does |
| --- | --- |
| pfc-migrate cratedb | Stream a CrateDB table directly to a .pfc archive |
| pfc-migrate questdb | Stream a QuestDB table directly to a .pfc archive |
| pfc-migrate convert | Convert gzip/zstd/bzip2/lz4/JSONL files to PFC |
| pfc-migrate s3 | Convert JSONL archives in S3 in-place |
| pfc-migrate glacier | Restore + convert S3 Glacier archives to PFC |
| pfc-migrate azure | Convert JSONL archives in Azure Blob Storage |
| pfc-migrate gcs | Convert JSONL archives in Google Cloud Storage |

Why convert?

Once your archives are in PFC format, DuckDB can query them directly — without decompressing the whole file first:

INSTALL pfc FROM community;
LOAD pfc;
LOAD json;

-- Query just one hour from a 30-day archive
SELECT line->>'$.level' AS level, line->>'$.message' AS message
FROM read_pfc_jsonl(
    '/var/log/pfc/app_2026-03-01.pfc',
    ts_from = epoch(TIMESTAMPTZ '2026-03-01 14:00:00+00'),
    ts_to   = epoch(TIMESTAMPTZ '2026-03-01 15:00:00+00')
);
| Tool | 1h query on 30-day archive | Storage vs gzip |
| --- | --- | --- |
| gzip | Decompress the full 30-day file | (baseline) |
| zstd | Decompress the full 30-day file | |
| PFC-JSONL | Decompress ~1/720 of the file | 25% smaller than gzip |

A 30-day archive spans 720 hours, so a one-hour window touches roughly 1/720 of the file. Typical JSONL log data compresses to ~6–11% of its original size with PFC (25–40% smaller than the same data gzipped).


Zero egress cost

Run the cloud subcommands on a machine in the same region as your storage and the whole pipeline stays in-region: each object is downloaded to a local temp directory, converted, and uploaded back, so nothing routes through your laptop and no egress is billed.
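
For example, a minimal sketch of an in-region run against S3 (the VM and bucket names are illustrative; the flags are the same ones documented below):

# On a VM in the same region as both buckets (e.g. an EC2 instance)
pip install pfc-migrate[s3]

# Objects are fetched to a local temp dir, converted, and uploaded back;
# since the VM and the buckets share a region, no egress is billed
pfc-migrate s3 \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --out-prefix converted/2026-03/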


Input Formats (file conversion)

| Format | Extension | Extra dependency |
| --- | --- | --- |
| gzip | .jsonl.gz | stdlib ✅ |
| bzip2 | .jsonl.bz2 | stdlib ✅ |
| zstd | .jsonl.zst | pip install pfc-migrate[zstd] |
| lz4 | .jsonl.lz4 | pip install pfc-migrate[lz4] |
| Plain JSONL | .jsonl | stdlib ✅ |

Requirements

The pfc_jsonl binary must be installed on the machine running the export:

# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

License note: This tool requires the pfc_jsonl binary. pfc_jsonl is free for personal and open-source use — commercial use requires a separate license. See pfc-jsonl for details.

macOS Intel (x64): Binary coming soon. Windows: No native binary. Use WSL2 or a Linux machine.


Install

pip install pfc-migrate

# With zstd support
pip install pfc-migrate[zstd]

# With S3/Glacier support
pip install pfc-migrate[s3]

# With Azure Blob Storage support
pip install pfc-migrate[azure]

# With Google Cloud Storage support
pip install pfc-migrate[gcs]

# For CrateDB direct export
pip install pfc-migrate[postgres]

# For QuestDB direct export
pip install pfc-migrate[questdb]
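
Extras combine in the usual pip way, e.g. for zstd-compressed archives stored in S3:

pip install pfc-migrate[s3,zstd]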

Usage — CrateDB direct export

Stream rows directly from a CrateDB table into a .pfc archive. No intermediate files.

pip install pfc-migrate[postgres]

# Export one week of logs
pfc-migrate cratedb \
  --host crate.example.com \
  --user crate \
  --dbname doc \
  --schema doc \
  --table logs \
  --ts-column ts \
  --from-ts "2026-03-01" --to-ts "2026-03-08" \
  --output logs_2026-03-01.pfc \
  --verbose

# Auto-named output: logs_20260301_20260308.pfc
pfc-migrate cratedb --host localhost --table logs \
  --from-ts "2026-03-01" --to-ts "2026-03-08" --verbose

Verbose output:

  -> Connecting to CrateDB at localhost:5432 (db: doc) ...
  -> Columns (6): ts, level, message, host, service, duration_ms
  -> Streaming rows (batch size: 10,000) ...
     100,000 rows  (17.4 MiB) ...
     200,000 rows  (34.8 MiB) ...
  -> Exported 250,000 rows  (43.7 MiB JSONL)
  -> Compressing with pfc_jsonl ...
  ✓ 250,000 rows  |  JSONL 43.7 MiB  ->  PFC 2.6 MiB  (5.9%)  ->  logs_20260301_20260308.pfc

| Option | Default | Description |
| --- | --- | --- |
| --host | localhost | CrateDB host |
| --port | 5432 | PostgreSQL wire port |
| --user | crate | Username |
| --password | (empty) | Password |
| --dbname | doc | Database name |
| --schema | doc | Schema name |
| --table | required | Table to export |
| --ts-column | None | Timestamp column for WHERE filter and ORDER BY |
| --from-ts | None | Start of range (inclusive, ISO 8601) |
| --to-ts | None | End of range (exclusive, ISO 8601) |
| --batch-size | 10000 | Rows per fetch (memory-safe batching) |
| --output | (auto) | Output .pfc file |
| --verbose | false | Show row progress and size stats |
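
The flags above compose into recurring jobs as well. A minimal sketch of a nightly export of yesterday's rows (host, table, and output path are illustrative):

#!/usr/bin/env bash
# Nightly CrateDB archive: export yesterday's rows into one .pfc file
set -euo pipefail

FROM=$(date -u -d yesterday +%F)   # GNU date; on macOS use: date -u -v-1d +%F
TO=$(date -u +%F)

pfc-migrate cratedb \
  --host crate.example.com \
  --table logs \
  --ts-column ts \
  --from-ts "$FROM" --to-ts "$TO" \
  --output "/var/log/pfc/logs_${FROM}.pfc"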

Usage — QuestDB direct export

Stream rows directly from a QuestDB table into a .pfc archive. No intermediate files.

pip install pfc-migrate[questdb]

# Export one week of trades
pfc-migrate questdb \
  --host quest.example.com \
  --table trades \
  --ts-column timestamp \
  --from-ts "2026-03-01" --to-ts "2026-03-08" \
  --output trades_2026-03-01.pfc \
  --verbose

# Auto-named output: trades_20260301_20260308.pfc
pfc-migrate questdb --host localhost --table trades \
  --from-ts "2026-03-01" --to-ts "2026-03-08" --verbose

Verbose output:

  -> Connecting to QuestDB at localhost:8812 (db: qdb) ...
  -> Columns (5): timestamp, symbol, price, volume, side
  -> Streaming rows (batch size: 10,000) ...
     100,000 rows  (18.1 MiB) ...
  -> Exported 120,000 rows  (21.7 MiB JSONL)
  -> Compressing with pfc_jsonl ...
  ✓ 120,000 rows  |  JSONL 21.7 MiB  ->  PFC 1.3 MiB  (6.0%)  ->  trades_20260301_20260308.pfc

| Option | Default | Description |
| --- | --- | --- |
| --host | localhost | QuestDB host |
| --port | 8812 | PostgreSQL wire port |
| --user | admin | Username |
| --password | quest | Password |
| --dbname | qdb | Database name |
| --table | required | Table to export (no schema prefix) |
| --ts-column | None | Timestamp column for WHERE filter and ORDER BY |
| --from-ts | None | Start of range (inclusive, ISO 8601) |
| --to-ts | None | End of range (exclusive, ISO 8601) |
| --batch-size | 10000 | Rows per fetch (memory-safe batching) |
| --output | (auto) | Output .pfc file |
| --verbose | false | Show row progress and size stats |

Note: QuestDB has no schema concept — tables are referenced by name only. There is no --schema option.
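
Since --ts-column, --from-ts, and --to-ts all default to None, omitting them exports the whole table. A sketch (--output is passed explicitly here because auto-naming is derived from the date range):

# Full-table export, no time filter
pfc-migrate questdb --host localhost --table trades --output trades_full.pfc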


Usage — Local filesystem

# Single file (output auto-named: logs.pfc + logs.pfc.bidx)
pfc-migrate convert logs.jsonl.gz

# Explicit output
pfc-migrate convert logs.jsonl.gz logs.pfc

# Entire directory
pfc-migrate convert --dir /var/log/archive/ --output-dir /var/log/pfc/

# Recursive + verbose
pfc-migrate convert --dir /mnt/logs/ -r -v
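
The single-file form also composes with standard shell tools, for example to convert only archives older than 30 days (cutoff and path are illustrative):

# Convert cold archives, leaving recent ones as gzip
find /var/log/archive/ -name '*.jsonl.gz' -mtime +30 \
  -exec pfc-migrate convert {} \;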

Usage — Amazon S3 / S3 Glacier

Conversion happens in-region (download to temp dir → convert → upload). No egress charges.

# Single object
pfc-migrate s3 \
  --bucket my-logs \
  --key archive/app_2026-03.jsonl.gz \
  --out-bucket my-logs-pfc \
  --out-prefix converted/

# All objects matching a prefix
pfc-migrate s3 \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --out-prefix converted/2026-03/ \
  --format gz \
  --verbose

# Glacier (Expedited retrieval)
pfc-migrate glacier \
  --bucket my-glacier-logs \
  --prefix 2025/ \
  --out-bucket my-glacier-pfc \
  --retrieval-tier Expedited

Usage — Azure Blob Storage

# All blobs matching a prefix
pfc-migrate azure \
  --container my-logs \
  --prefix archive/2026-03/ \
  --out-container my-logs-pfc \
  --connection-string "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"

Usage — Google Cloud Storage

# All objects matching a prefix
pfc-migrate gcs \
  --bucket my-logs \
  --prefix archive/2026-03/ \
  --out-bucket my-logs-pfc \
  --verbose

Hybrid queries: CrateDB live + PFC cold storage

Query CrateDB live data and cold PFC archives in a single DuckDB SQL statement:

import duckdb
import pandas as pd
import psycopg2

con = duckdb.connect()
con.execute("INSTALL pfc FROM community; LOAD pfc; LOAD json;")

# Register CrateDB live data with DuckDB (register() needs a DataFrame
# or Arrow table, not a list of tuples, so build one with column names)
cratedb_conn = psycopg2.connect(host="localhost", user="crate", dbname="doc")
cur = cratedb_conn.cursor()
cur.execute("SELECT * FROM logs WHERE ts >= '2026-04-01'")
live_logs = pd.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])
con.register("live_logs", live_logs)

# Query cold PFC archives + hot live data in one SQL
result = con.execute("""
    SELECT ts, level, message
    FROM pfc_scan([
        '/archives/logs_2026-01.pfc',
        '/archives/logs_2026-02.pfc',
        '/archives/logs_2026-03.pfc'
    ])
    UNION ALL
    SELECT ts, level, message FROM live_logs
    ORDER BY ts
""").fetchall()

See examples/cratedb_archive_explorer.py for a complete demo.


Hybrid queries: QuestDB live + PFC cold storage

Query QuestDB live data and cold PFC archives in a single DuckDB SQL statement:

import duckdb
import pandas as pd
import psycopg2

con = duckdb.connect()
con.execute("INSTALL pfc FROM community; LOAD pfc; LOAD json;")

# Register QuestDB live data with DuckDB (register() needs a DataFrame
# or Arrow table, not a list of tuples, so build one with column names)
questdb_conn = psycopg2.connect(host="localhost", port=8812,
                                user="admin", password="quest", dbname="qdb")
cur = questdb_conn.cursor()
cur.execute("SELECT * FROM trades WHERE timestamp >= '2026-04-01'")
live_trades = pd.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])
con.register("live_trades", live_trades)

# Query cold PFC archives + hot live data in one SQL
result = con.execute("""
    SELECT timestamp, symbol, price, volume
    FROM pfc_scan([
        '/archives/trades_2026-01.pfc',
        '/archives/trades_2026-02.pfc',
        '/archives/trades_2026-03.pfc'
    ])
    UNION ALL
    SELECT timestamp, symbol, price, volume FROM live_trades
    ORDER BY timestamp
""").fetchall()

Lossless guarantee

Every conversion is verified by full decompression and MD5 check before output is written. If anything doesn't match, the output file is deleted and the error is reported — the original is never modified. For S3, GCS, and Azure subcommands, --delete removes the original cloud object only after successful verification.
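
If the digest is the MD5 of the decompressed JSONL bytes, as the description suggests, the original side of the comparison can be reproduced by hand (a sketch; pfc-migrate handles the .pfc side internally):

# MD5 of the original archive's decompressed bytes
gzip -dc app_2026-03.jsonl.gz | md5sum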


Related Projects

| Project | Description |
| --- | --- |
| pfc-jsonl | Core binary — compress, decompress, query |
| pfc-duckdb | DuckDB Community Extension (INSTALL pfc FROM community) |
| pfc-fluentbit | Fluent Bit → PFC forwarder for live pipelines |
| pfc-archiver-cratedb | Autonomous daemon: archive old CrateDB partitions automatically |
| pfc-archiver-questdb | Autonomous daemon: archive old QuestDB partitions automatically |
| pfc-vector | High-performance Rust ingest daemon for Vector.dev and Telegraf |
| pfc-otel-collector | OpenTelemetry OTLP/HTTP log exporter |
| pfc-kafka-consumer | Kafka / Redpanda consumer |
| pfc-telegraf | Telegraf HTTP output plugin → PFC |
| pfc-grafana | Grafana data source plugin for PFC archives |

License

pfc-migrate (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary (pfc_jsonl) is proprietary software — free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com
