pfc-gateway

Bidirectional HTTP gateway for PFC cold archives — no DuckDB required.

pfc-gateway makes PFC archives on S3 (or local storage) queryable by any tool — Grafana, Python, curl, PowerBI — through a simple HTTP API. It also receives NDJSON from Fluent Bit, Vector, Telegraf, or any HTTP client and compresses it to .pfc archives automatically.

Part of the PFC Ecosystem.


What it does

[Fluent Bit / Vector / Telegraf / curl]
          │
          ▼  POST /ingest — push NDJSON rows
     pfc-gateway  (this server)  ←─────────── also receives data
          │
          ├── .pfc_buffer.jsonl  (live buffer)
          └── ingest_<ts>.pfc    (auto-rotated on size or time)

[Grafana / Python / curl / PowerBI / your own tools]
          │
          ▼  POST /query — HTTP REST, no client library needed
     pfc-gateway  (this server)  ────────────► serves data
          │
          ▼  pfc_jsonl s3-fetch — HTTP Range requests
     .pfc archives on S3 / local
          │
          ▼  only ~4% of the archive is read per query
     NDJSON stream back to client

One server. Ingest from any tool. Query from any tool. No DuckDB, no custom plugins.


Install

# 1. Install pfc_jsonl binary (required)
# Linux x64:
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-linux-x64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# macOS (Apple Silicon / M1–M4):
curl -L https://github.com/ImpossibleForge/pfc-jsonl/releases/latest/download/pfc_jsonl-macos-arm64 \
     -o /usr/local/bin/pfc_jsonl && chmod +x /usr/local/bin/pfc_jsonl

# 2. Install pfc-gateway
git clone https://github.com/ImpossibleForge/pfc-gateway
cd pfc-gateway
pip install fastapi uvicorn boto3 python-dateutil

# 3. Start the server
PFC_API_KEY=your-secret-key uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765

License note: This tool requires the pfc_jsonl binary. pfc_jsonl is free for personal and open-source use — commercial use requires a separate license. See pfc-jsonl for details.

AWS credentials are read from the standard locations (~/.aws/credentials, environment variables, IAM role). No extra config needed.


Ingest — receive data from any HTTP source

Enable ingest by setting PFC_INGEST_DIR. The gateway appends rows to a buffer file and rotates it to a compressed .pfc file when a size or time threshold is reached.

# Start gateway with ingest enabled
PFC_API_KEY=secret PFC_INGEST_DIR=/data/pfc \
  uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765
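The rotation rule described above amounts to an either-or check on buffer size and buffer age. The following is a minimal sketch of that rule, not the gateway's actual code: `RotationPolicy` and `should_rotate` are hypothetical names, the defaults mirror `PFC_INGEST_ROTATE_MB` and `PFC_INGEST_ROTATE_SEC`, and both the 1 MB = 1024 × 1024 bytes conversion and the "never rotate an empty buffer" rule are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RotationPolicy:
    """Hypothetical container for the two rotation thresholds."""
    rotate_mb: float = 64.0      # mirrors the PFC_INGEST_ROTATE_MB default
    rotate_sec: float = 3600.0   # mirrors the PFC_INGEST_ROTATE_SEC default


def should_rotate(buffer_bytes: int, buffer_age_sec: float,
                  policy: RotationPolicy = RotationPolicy()) -> bool:
    """Return True when either the size or the time threshold is reached."""
    if buffer_bytes == 0:
        return False  # assumption: an empty buffer is never rotated
    size_hit = buffer_bytes >= policy.rotate_mb * 1024 * 1024
    age_hit = buffer_age_sec >= policy.rotate_sec
    return size_hit or age_hit
```

With the defaults, a buffer of a few kilobytes would be rotated once it is an hour old, while a busy buffer would be rotated as soon as it reaches 64 MB.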

Send rows with curl

# JSON array
curl -s -X POST http://localhost:8765/ingest \
  -H "X-API-Key: secret" \
  -H "Content-Type: application/json" \
  -d '[{"ts":"2026-04-21T10:00:00Z","level":"INFO","msg":"server started"}]'

# NDJSON (Fluent Bit json_stream / Vector ndjson)
printf '{"ts":"2026-04-21T10:00:01Z","level":"WARN","msg":"high cpu"}\n' | \
  curl -s -X POST http://localhost:8765/ingest \
  -H "X-API-Key: secret" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @-
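The same NDJSON push can be done from Python with only the standard library. This is a sketch under assumptions: `to_ndjson`, `build_ingest_request`, and `send` are hypothetical helper names, and the base URL and key are the placeholder values used in the curl examples above.

```python
import json
import urllib.request


def to_ndjson(rows):
    """Serialize dict rows as NDJSON: one compact JSON object per line."""
    lines = (json.dumps(row, separators=(",", ":")) + "\n" for row in rows)
    return "".join(lines).encode("utf-8")


def build_ingest_request(base_url, api_key, rows):
    """Build (but do not send) a POST /ingest request carrying NDJSON."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ingest",
        data=to_ndjson(rows),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/x-ndjson",
        },
    )


def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `send(build_ingest_request("http://localhost:8765", "secret", rows))` against a running gateway should return the `{"accepted": N}` object described in the API reference.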

Fluent Bit HTTP output

[OUTPUT]
    Name              http
    Match             *
    Host              your-server
    Port              8765
    URI               /ingest
    # json sends a JSON array, which pfc-gateway auto-detects
    Format            json
    Header            X-API-Key secret

Vector HTTP sink

[sinks.pfc_gateway]
type     = "http"
inputs   = ["your_source"]
uri      = "http://your-server:8765/ingest"
encoding.codec = "ndjson"

[sinks.pfc_gateway.request.headers]
X-API-Key = "secret"

Force-flush the buffer

curl -s -X POST http://localhost:8765/ingest/flush \
  -H "X-API-Key: secret"
# → {"flushed": true, "rows": 4821, "file": "/data/pfc/ingest_20260421T103045.pfc"}

Check buffer status

curl -s http://localhost:8765/ingest/status -H "X-API-Key: secret"
# → {"enabled": true, "buffer_rows": 142, "buffer_mb": 0.021,
#    "last_flush_age_sec": 312.4, "rotate_mb": 64, "rotate_sec": 3600, ...}

Query a PFC archive

curl (S3 file)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -H "Content-Type: application/json" \
  -X POST http://localhost:8765/query \
  -d '{
    "file":    "s3://my-archive/pfc/logs_2026-03.pfc",
    "from_ts": "2026-03-05T10:00:00Z",
    "to_ts":   "2026-03-05T12:00:00Z",
    "filter":  {"level": "ERROR"}
  }'

Response: NDJSON stream

{"ts":"2026-03-05T10:14:32Z","level":"ERROR","message":"connection refused","host":"web-03"}
{"ts":"2026-03-05T11:02:19Z","level":"ERROR","message":"disk full","host":"db-01"}

curl (local file)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -X POST http://localhost:8765/query \
  -H "Content-Type: application/json" \
  -d '{"file":"/data/archive/logs_march.pfc","from_ts":"2026-03-01","to_ts":"2026-04-01"}'

Python

import json

import requests

resp = requests.post(
    "http://localhost:8765/query",
    headers={"X-API-Key": "your-secret-key"},
    json={
        "file":    "s3://my-archive/pfc/logs_2026-03.pfc",
        "from_ts": "2026-03-05T10:00Z",
        "to_ts":   "2026-03-05T12:00Z",
    },
    stream=True,  # read the NDJSON response incrementally
)
resp.raise_for_status()  # stop early on an HTTP error status

for line in resp.iter_lines():
    if not line:  # iter_lines() can yield empty lines; skip them
        continue
    row = json.loads(line)
    print(row["ts"], row.get("level"), row.get("message"))

Query multiple files (multi-month)

curl -s \
  -H "X-API-Key: your-secret-key" \
  -X POST http://localhost:8765/query/batch \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      "s3://my-archive/pfc/logs_2026-01.pfc",
      "s3://my-archive/pfc/logs_2026-02.pfc",
      "s3://my-archive/pfc/logs_2026-03.pfc"
    ],
    "from_ts": "2026-01-15T00:00Z",
    "to_ts":   "2026-03-15T00:00Z"
  }'

Files are queried in order. Results stream back as a single combined NDJSON response.
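A Python client for the batch endpoint can be split into two small pieces: building the request body and decoding the combined NDJSON stream. This is a sketch, assuming the body fields shown in the curl example; `build_batch_payload` and `parse_ndjson` are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Mapping, Optional, Sequence


def build_batch_payload(files: Sequence[str], from_ts: str, to_ts: str,
                        row_filter: Optional[Mapping[str, str]] = None) -> dict:
    """Build the JSON body for POST /query/batch."""
    if not files:
        raise ValueError("at least one archive path is required")
    payload = {"files": list(files), "from_ts": from_ts, "to_ts": to_ts}
    if row_filter:
        payload["filter"] = dict(row_filter)  # optional equality filter
    return payload


def parse_ndjson(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode an NDJSON stream line by line, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

With `requests`, the payload goes into `json=` and the stream comes from `resp.iter_lines()`, so `parse_ndjson(resp.iter_lines())` yields one dict per archived row in file order.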


SQL Query via DuckDB (optional)

If DuckDB with the pfc extension is installed on the gateway server, you can run full SQL queries against .pfc archives:

curl -X POST http://localhost:8765/query/sql \
  -H "X-API-Key: secret" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT json_extract_string(line, '"'"'$.level'"'"') AS level, COUNT(*) AS cnt FROM pfc_scan('"'"'/var/lib/pfc/logs.pfc'"'"') GROUP BY level ORDER BY cnt DESC"
  }'
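The `'"'"'` sequences in the curl command exist only to smuggle single quotes through the shell. When the request is built in Python, `json.dumps` takes care of the JSON escaping and the SQL can be written plainly. This is a sketch: `build_sql_body` is a hypothetical helper, and doubling single quotes in the path is standard SQL string-literal escaping rather than anything specific to pfc-gateway.

```python
import json


def build_sql_body(archive_path: str) -> str:
    """Return the JSON body for POST /query/sql counting rows per level."""
    # Escape single quotes so the path is a valid SQL string literal.
    quoted = archive_path.replace("'", "''")
    sql = (
        "SELECT json_extract_string(line, '$.level') AS level, "
        "COUNT(*) AS cnt "
        f"FROM pfc_scan('{quoted}') "
        "GROUP BY level ORDER BY cnt DESC"
    )
    return json.dumps({"sql": sql})
```

The returned string can be sent as the request body with `Content-Type: application/json`, exactly like the `-d` argument in the curl call.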

Supports any DuckDB SQL — GROUP BY, AVG, JOIN across multiple files, window functions:

-- Average latency per service
SELECT json_extract_string(line, '$.service') AS service,
       ROUND(AVG(json_extract(line, '$.latency_ms')::FLOAT), 1) AS avg_ms
FROM pfc_scan('/var/lib/pfc/logs.pfc')
GROUP BY service ORDER BY avg_ms DESC;

Check if SQL mode is available on your gateway instance:

curl http://localhost:8765/ -H "X-API-Key: secret"
# {"status":"ok","version":"0.3.0","binary":"...","sql_mode":true}

sql_mode: false means DuckDB is not installed — standard /query still works normally.
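A client that wants to use SQL when it is available, and fall back to the standard endpoint otherwise, could branch on the health response. This is a minimal sketch; `pick_query_endpoint` is a hypothetical helper, and it assumes the health payload has the shape shown above.

```python
def pick_query_endpoint(health: dict) -> str:
    """Choose '/query/sql' only when the gateway reports sql_mode: true."""
    # A missing or false sql_mode falls back to the always-available /query.
    return "/query/sql" if health.get("sql_mode") is True else "/query"
```

The `health` argument is the decoded JSON from `GET /`, for example `requests.get(url, headers=...).json()`.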

Setup:

# Install DuckDB
curl -L https://github.com/duckdb/duckdb/releases/latest/download/duckdb_cli-linux-amd64.gz \
  | gunzip > /usr/local/bin/duckdb && chmod +x /usr/local/bin/duckdb
# Install pfc extension
duckdb -c "INSTALL pfc FROM community;"

Grafana Integration

pfc-gateway implements the Grafana SimpleJSON data source protocol.

Setup (takes 2 minutes):

  1. In Grafana → Settings → Data Sources → Add data source
  2. Search for SimpleJSON (install plugin if needed)
  3. URL: http://your-server:8765/grafana
  4. Custom HTTP Header: X-API-Key = your secret key
  5. Save & Test → should show "Data source is working"

In a dashboard panel:

  • Target: s3://my-archive/pfc/logs_2026-03.pfc
  • Optional filter: s3://my-archive/logs.pfc|{"level":"ERROR"}

Grafana's time range picker controls from_ts and to_ts automatically.
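The target syntax is an archive path, optionally followed by `|` and a JSON filter object. The sketch below shows one way such a string could be split; it illustrates the format only and is not the gateway's parser. `split_target` is a hypothetical name, and it assumes the archive path itself contains no `|` character.

```python
import json
from typing import Optional, Tuple


def split_target(target: str) -> Tuple[str, Optional[dict]]:
    """Split 'path|{json filter}' into (path, filter or None)."""
    path, sep, raw_filter = target.partition("|")  # split at the first '|'
    path = path.strip()
    if not path:
        raise ValueError("target has no archive path")
    if not sep or not raw_filter.strip():
        return path, None  # plain target without a filter
    parsed = json.loads(raw_filter)
    if not isinstance(parsed, dict):
        raise ValueError("filter must be a JSON object")
    return path, parsed
```

For the two targets listed above, this returns the path with `None` and the path with `{"level": "ERROR"}` respectively.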


Live DB + cold archives in one dashboard

Without pfc-gateway: only the last 30 days (hot live data) are visible in Grafana.

With pfc-gateway:

  • Panel 1: Live DB data source (last 30 days)
  • Panel 2: pfc-gateway data source (months/years of cold PFC archives)

Both panels in the same Grafana dashboard. No re-import. No DuckDB.


API Reference

Query endpoints

POST /query

| Field | Type | Description |
| --- | --- | --- |
| file | string | Local path or s3://bucket/key.pfc |
| from_ts | string | ISO 8601 start time (inclusive) |
| to_ts | string | ISO 8601 end time (exclusive) |
| filter | object | Optional equality filter {"level": "ERROR"} |
| aws_profile | string | Optional AWS profile name |

Returns: application/x-ndjson stream.
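The time window is half-open (`from_ts` inclusive, `to_ts` exclusive) and the filter is a plain equality match. The following sketch expresses those semantics in Python for illustration; it is not how the gateway evaluates queries. `parse_ts` and `row_matches` are hypothetical names, the timestamp field is assumed to be `ts` as in the examples, and timestamps without a zone are assumed to be UTC.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a date-only value means midnight."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # older fromisoformat() rejects 'Z'
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt


def row_matches(row: Mapping, from_ts: str, to_ts: str,
                row_filter: Optional[Mapping] = None) -> bool:
    """True if row['ts'] is in [from_ts, to_ts) and all filter keys match."""
    ts = parse_ts(row["ts"])
    if not (parse_ts(from_ts) <= ts < parse_ts(to_ts)):
        return False
    return all(row.get(k) == v for k, v in (row_filter or {}).items())
```

Because the end is exclusive, adjacent windows such as 10:00 to 12:00 and 12:00 to 14:00 never return the same row twice.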

POST /query/batch

Same as /query, but takes a files array ([...]) instead of a single file.

GET /

Health check. Returns {"status": "ok", "version": "0.3.0", "binary": "...", "sql_mode": true}. sql_mode is false when DuckDB is not installed.

Ingest endpoints

POST /ingest

Accepts rows in three formats (auto-detected):

  • JSON array: [{...}, {...}]
  • Object with rows key: {"rows": [{...}, ...]}
  • Raw NDJSON: {...}\n{...}\n

Returns: {"accepted": N}

Requires PFC_INGEST_DIR to be set (returns 503 otherwise).
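To make the three accepted shapes concrete, the sketch below normalizes each of them to a list of rows. It illustrates one possible detection order and is not the gateway's implementation; `normalize_rows` is a hypothetical name. Note the inherent ambiguity that a single NDJSON row which itself has a list-valued `rows` key would be read as the wrapper format.

```python
import json


def normalize_rows(body: bytes) -> list:
    """Turn a JSON array, a {"rows": [...]} object, or NDJSON into a list."""
    text = body.decode("utf-8").strip()
    if not text:
        return []
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat as NDJSON, one object per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(parsed, list):
        return parsed                      # JSON array
    if isinstance(parsed, dict) and isinstance(parsed.get("rows"), list):
        return parsed["rows"]              # object with a rows key
    if isinstance(parsed, dict):
        return [parsed]                    # NDJSON with exactly one line
    raise ValueError("body is neither a JSON array, an object, nor NDJSON")
```

All three request bodies carrying the same rows produce the same list, which is what lets Fluent Bit, Vector, and curl share one endpoint.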

POST /ingest/flush

Force-compresses the current buffer to a .pfc file immediately.

Returns: {"flushed": true, "rows": N, "file": "/path/to/ingest_<ts>.pfc"} or {"flushed": false, "reason": "empty"}.

GET /ingest/status

Returns buffer statistics: row count, byte size, age since last flush, last output file, rotation thresholds.


Environment Variables

Query / Auth

| Variable | Default | Description |
| --- | --- | --- |
| PFC_API_KEY | (none; auth off) | API key for X-API-Key header |
| PFC_JSONL_BINARY | /usr/local/bin/pfc_jsonl | Path to pfc_jsonl binary |
| PFC_HOST | 0.0.0.0 | Bind address |
| PFC_PORT | 8765 | Port |
| PFC_PRESIGN_TTL | 3600 | Pre-signed URL TTL in seconds |
| AWS_DEFAULT_REGION | eu-central-1 | AWS region for S3 |

Ingest

| Variable | Default | Description |
| --- | --- | --- |
| PFC_INGEST_DIR | (none; ingest off) | Directory for buffer + output .pfc files |
| PFC_INGEST_ROTATE_MB | 64 | Rotate when buffer reaches this size (MB) |
| PFC_INGEST_ROTATE_SEC | 3600 | Rotate when buffer is older than this (seconds) |
| PFC_INGEST_PREFIX | ingest | Output filename prefix: ingest_<ts>.pfc |

Standard AWS variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_PROFILE) are respected automatically.
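The documented variables and defaults can be summarized as a small configuration loader. This is an illustrative sketch only: `GatewayConfig` and `load_config` are hypothetical names, and the real gateway may read its environment differently. The variable names and defaults are taken from the tables above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayConfig:
    api_key: Optional[str]      # None means authentication is off
    binary: str
    host: str
    port: int
    presign_ttl: int
    ingest_dir: Optional[str]   # None means ingest is off
    rotate_mb: int
    rotate_sec: int
    prefix: str

    @property
    def ingest_enabled(self) -> bool:
        return self.ingest_dir is not None


def load_config(env: Mapping[str, str] = os.environ) -> GatewayConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return GatewayConfig(
        api_key=env.get("PFC_API_KEY") or None,
        binary=env.get("PFC_JSONL_BINARY", "/usr/local/bin/pfc_jsonl"),
        host=env.get("PFC_HOST", "0.0.0.0"),
        port=int(env.get("PFC_PORT", "8765")),
        presign_ttl=int(env.get("PFC_PRESIGN_TTL", "3600")),
        ingest_dir=env.get("PFC_INGEST_DIR") or None,
        rotate_mb=int(env.get("PFC_INGEST_ROTATE_MB", "64")),
        rotate_sec=int(env.get("PFC_INGEST_ROTATE_SEC", "3600")),
        prefix=env.get("PFC_INGEST_PREFIX", "ingest"),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to check which settings a given deployment (systemd unit, Docker `-e` flags) would produce.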


Run as systemd service

# /etc/systemd/system/pfc-gateway.service
[Unit]
Description=pfc-gateway — PFC cold archive bidirectional gateway
After=network.target

[Service]
Type=simple
User=pfc
WorkingDirectory=/opt/pfc-gateway
ExecStart=/usr/bin/uvicorn pfc_gateway:app --host 0.0.0.0 --port 8765
Restart=on-failure
Environment=PFC_API_KEY=your-secret-key
Environment=AWS_DEFAULT_REGION=eu-central-1
# Omit the next line to disable ingest
Environment=PFC_INGEST_DIR=/data/pfc
Environment=PFC_INGEST_ROTATE_MB=64
Environment=PFC_INGEST_ROTATE_SEC=3600

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now pfc-gateway

Run as Docker container

docker run -d \
  -p 8765:8765 \
  -e PFC_API_KEY=your-secret-key \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  --name pfc-gateway \
  impossibleforge/pfc-gateway:latest

Architecture in the full PFC ecosystem

Your data sources
    │
    ├── pfc-migrate     (one-shot export)
    ├── pfc-archiver-*  (autonomous daemon)
    ├── pfc-fluentbit   (live pipeline)
    └── pfc-gateway     (POST /ingest ← NEW)  ← this repo
              │
              ▼
    .pfc archives (local / S3 / Azure / GCS)
              │
    ┌─────────┴──────────┐
    │                    │
    ▼                    ▼
pfc-duckdb          pfc-gateway  ← this repo
SQL queries         HTTP REST
(DuckDB needed)     (no DuckDB)
    │                    │
    ▼                    ▼
Python / CLI        Grafana / PowerBI / curl / own tools
                    Fluent Bit / Vector / Telegraf (ingest)

| Tool | What | DuckDB needed |
| --- | --- | --- |
| pfc-migrate | One-shot export to .pfc | No |
| pfc-archiver-* | Autonomous archive daemon | No |
| pfc-fluentbit | Live pipeline → .pfc | No |
| pfc-duckdb | SQL queries on PFC files | Yes |
| pfc-gateway | HTTP REST, any tool | No |

Related repos


ImpossibleForge — github.com/ImpossibleForge Contact: info@impossibleforge.com


License

pfc-gateway (this repository) is released under the MIT License — see LICENSE.

The PFC-JSONL binary is proprietary software, free for personal and open-source use. Commercial use requires a license: info@impossibleforge.com
