Low-cost Telegram-first PDF translator using configurable provider models for OCR + translation.
- Channel: Telegram-only
- Language pairs: tr,en and en,tr
- Model policy: provider-selectable (openai or openrouter)
- Billing: credits with manual Stars verification
- Retention: 24-hour cleanup
- GET /health
- GET /version
- POST /v1/jobs (multipart: file, source_lang, target_lang, telegram_user_id)
- GET /v1/jobs/{job_id}
- GET /v1/jobs/{job_id}/download?telegram_user_id=...
- GET /v1/credits/{telegram_user_id}
- POST /v1/admin/credits/grant (Header: x-admin-token)
- GET /v1/admin/jobs/stats (Header: x-admin-token)
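
As a rough illustration of the job endpoints listed above, the following sketch builds the URLs and the non-file form fields a client would need. It is not the project's client code: the helper names are hypothetical, the base URL assumes the local port from the uvicorn command in this README, and treating the language pairs as (source, target) tuples is an assumption.

```python
from urllib.parse import quote, urlencode

# Assumption: local API started on port 8900, as in the uvicorn command in this README.
BASE_URL = "http://localhost:8900"

# Assumption: the documented pairs are read as (source_lang, target_lang).
SUPPORTED_PAIRS = {("tr", "en"), ("en", "tr")}


def job_status_url(job_id: str, base_url: str = BASE_URL) -> str:
    """URL for GET /v1/jobs/{job_id}."""
    return f"{base_url}/v1/jobs/{quote(job_id, safe='')}"


def job_download_url(job_id: str, telegram_user_id: int, base_url: str = BASE_URL) -> str:
    """URL for GET /v1/jobs/{job_id}/download?telegram_user_id=..."""
    query = urlencode({"telegram_user_id": telegram_user_id})
    return f"{job_status_url(job_id, base_url)}/download?{query}"


def job_form_fields(source_lang: str, target_lang: str, telegram_user_id: int) -> dict[str, str]:
    """Non-file multipart fields for POST /v1/jobs (the PDF goes in the 'file' part)."""
    if (source_lang, target_lang) not in SUPPORTED_PAIRS:
        raise ValueError(f"unsupported language pair: {source_lang},{target_lang}")
    return {
        "source_lang": source_lang,
        "target_lang": target_lang,
        "telegram_user_id": str(telegram_user_id),
    }
```

The fields returned by `job_form_fields` would be sent together with the PDF as the `file` part of a multipart request. The shape of the JSON returned by the job endpoints is not documented here, so the sketch stops at request construction.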
cp .env.example .env
python -m venv .venv
source .venv/bin/activate
pip install -e .[dev]
python scripts/init_db.py
uvicorn pdf_translator.api.main:app --reload --port 8900

For OpenAI-first testing:
- MODEL_PROVIDER=openai
- OPENAI_API_KEY=...
- Suggested starter models: OPENAI_TRANSLATE_MODEL=gpt-4.1-mini, OPENAI_OCR_MODEL=gpt-4.1-mini
- ./.env.openai.min-cost: lower cost (translate=gpt-4.1-nano, ocr=gpt-4.1-mini)
- ./.env.openai.balanced: safer quality baseline (translate=gpt-4.1-mini, ocr=gpt-4.1-mini)
- ./.env.local: put only your secret key here (OPENAI_API_KEY=...); this file is git-ignored
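
To check which settings would be in effect after combining a profile with the secret-only file, a small script along these lines may help. It is only a sketch: the function names are hypothetical, the parser handles plain KEY=VALUE lines only, and the assumption that values in .env.local take precedence over .env is not confirmed by this README.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comment lines are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def merge_layers(*texts: str) -> dict[str, str]:
    """Merge env file contents in order; later layers override earlier ones."""
    merged: dict[str, str] = {}
    for text in texts:
        merged.update(parse_env_text(text))
    return merged


def load_effective_env(*paths: str) -> dict[str, str]:
    """Read the given files (missing ones are ignored) and merge them in order."""
    texts = [Path(p).read_text(encoding="utf-8") for p in paths if Path(p).exists()]
    return merge_layers(*texts)


if __name__ == "__main__":
    # Assumption: .env.local is layered on top of .env.
    env = load_effective_env(".env", ".env.local")
    for key in ("MODEL_PROVIDER", "OPENAI_TRANSLATE_MODEL", "OPENAI_OCR_MODEL"):
        print(f"{key}={env.get(key, '<unset>')}")
    print("OPENAI_API_KEY set:", bool(env.get("OPENAI_API_KEY")))
```

The script prints only whether the key is set, never its value, so its output is safe to paste into an issue or chat.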
Usage:
cp .env.openai.balanced .env # or .env.openai.min-cost
cp .env.local.example .env.local
# edit .env.local and set OPENAI_API_KEY

Run model-level PDF validation (text-layer + scanned-like):
make smoke-model

This command creates sample PDFs under data/smoke/, opens jobs through the API app, waits for completion, and saves translated outputs under outputs/.
Reference test file:
- data/fixtures/reference_english.pdf
- Source: https://arxiv.org/pdf/1706.03762.pdf
Run fixed-file smoke:
make smoke-reference

Output file:
outputs/reference_english_en_tr.translated.pdf
Tuning (same reference PDF, faster loops):
- SMOKE_REFERENCE_PAGES=5 (default, first 5 pages)
- SMOKE_REFERENCE_PAGES=15 (full file)
- SMOKE_REFERENCE_TIMEOUT_SEC=420
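
The tuning variables above could be read and validated roughly as follows. This is a sketch, not the smoke script's actual code: the names `SmokeReferenceSettings` and `read_smoke_settings` are hypothetical, and using 420 as the default timeout is an assumption, since the README only shows it as an example value.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REFERENCE_TOTAL_PAGES = 15  # the README describes 15 pages as the full reference file


@dataclass(frozen=True)
class SmokeReferenceSettings:
    pages: int
    timeout_sec: int


def read_smoke_settings(env: Optional[Mapping[str, str]] = None) -> SmokeReferenceSettings:
    """Read the smoke-reference tuning variables, falling back to defaults."""
    env = os.environ if env is None else env
    pages = int(env.get("SMOKE_REFERENCE_PAGES", "5"))  # documented default: first 5 pages
    timeout_sec = int(env.get("SMOKE_REFERENCE_TIMEOUT_SEC", "420"))  # assumed default
    if not 1 <= pages <= REFERENCE_TOTAL_PAGES:
        raise ValueError(f"SMOKE_REFERENCE_PAGES must be between 1 and {REFERENCE_TOTAL_PAGES}")
    if timeout_sec <= 0:
        raise ValueError("SMOKE_REFERENCE_TIMEOUT_SEC must be positive")
    return SmokeReferenceSettings(pages=pages, timeout_sec=timeout_sec)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.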
Run a quick layout benchmark on first 2 pages:
python scripts/benchmark_layout_candidates.py

Expected candidate paths:
- outputs/reference_english_en_tr.translated.pdf
- outputs/bench_pdf2zh/reference_english-mono.pdf
- outputs/bench_pdf2zh_babeldoc/reference_english.tr.mono.pdf
export TELEGRAM_BOT_TOKEN=...
python -m pdf_translator.bot.telegram_bot

- User checks /pricing
- User requests /buy and receives reference code
- Admin verifies Stars payment externally
- Admin grants credits via POST /v1/admin/credits/grant
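
The final step of the flow above, the admin credit grant, might be prepared like this using only the standard library. The endpoint path and the x-admin-token header come from this README; the JSON body and its field names (`telegram_user_id`, `credits`) are hypothetical, as is the helper name, so check the API's request schema before relying on it.

```python
import json
from urllib.request import Request


def build_grant_request(base_url: str, admin_token: str, telegram_user_id: int, credits: int) -> Request:
    """Prepare (but do not send) a POST /v1/admin/credits/grant request."""
    # Hypothetical payload: the real field names are not documented in this README.
    payload = {"telegram_user_id": telegram_user_id, "credits": credits}
    return Request(
        url=f"{base_url}/v1/admin/credits/grant",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-admin-token": admin_token,  # header name as listed for the admin endpoints
        },
        method="POST",
    )
```

With the API running, the prepared request could be sent with `urllib.request.urlopen(req)`. Building the request separately from sending it makes it possible to inspect the URL, headers, and body first.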
Run periodic cleanup (cron):
python scripts/cleanup_expired.py

- Page limit per job: 150
- File size limit: 80MB
- Layout preservation is block-based (best effort)
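
A client could check the page and file size limits above before uploading, for example with a helper like the one below. It is a sketch with a hypothetical function name. It assumes "80MB" means 80 × 1024 × 1024 bytes and that both limits are inclusive; the service may count differently.

```python
MAX_PAGES_PER_JOB = 150
MAX_FILE_BYTES = 80 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def limit_violations(file_size_bytes: int, page_count: int) -> list[str]:
    """Return human-readable reasons why a PDF exceeds the documented limits."""
    problems: list[str] = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {file_size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    if page_count > MAX_PAGES_PER_JOB:
        problems.append(f"document has {page_count} pages, limit is {MAX_PAGES_PER_JOB}")
    return problems
```

An empty list means the file passes both checks. The page count has to come from a PDF library of your choice, which is outside the scope of this sketch.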