An end-to-end translation pipeline for manga (Japanese) and manhwa (Korean) pages into Uzbek. The system combines computer vision, OCR, and LLM-based translation into a single pluggable workflow, exposed through both a CLI and a FastAPI web service.
The pipeline automates every step required to publish a translated chapter: detecting text regions, recognizing source text, producing context-aware translations, removing original text from artwork, and rendering the translated lines back onto clean pages.
split → mask → OCR → translate → clean → render
Each stage is implemented as an independent module behind a stable interface, allowing individual components (OCR engine, translator backend, inpainting method) to be swapped without touching the rest of the pipeline.
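As a rough sketch, the stable interface between stages can be modeled as a pair of small protocols. The names and signatures below are illustrative, not the actual classes in `src/manga_pipeline/`:

```python
# Hypothetical sketch of the pluggable-stage contract; real interfaces differ.
from typing import Protocol


class OCREngine(Protocol):
    """Any OCR backend the pipeline can call."""

    def recognize(self, image_path: str) -> list[str]: ...


class Translator(Protocol):
    """Any translation backend (OpenAI, Gemini, Ollama, ...)."""

    def translate(self, lines: list[str]) -> list[str]: ...


def run_stage(engine: OCREngine, translator: Translator, page: str) -> list[str]:
    # Swapping a backend only changes the objects passed in,
    # not this orchestration code.
    return translator.translate(engine.recognize(page))
```

Because callers depend only on the protocol, a new OCR engine or translator can be dropped in without modifying the orchestrator.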
- Multi-engine OCR — five interchangeable backends (manga-ocr, EasyOCR, PaddleOCR, OpenAI Vision, YOLOv8 + Florence-2), selected via a factory based on source language and configuration.
- Three-phase LLM translation — scene analysis, contextual translation, and automated review. Built on LangChain with structured output (Pydantic models) for reliable parsing.
- Multiple LLM providers — OpenAI, Google Gemini, Anthropic, and local Ollama models behind a single `Translator` interface.
- Context preservation — per-manga character glossary, tone, and per-chapter story summaries are persisted and fed into subsequent chapter translations for consistency.
- Text removal — three inpainting strategies: pcleaner, LaMa, and OpenCV, with GPU memory managed through context managers.
- Smart segmentation — row-variance analysis splits tall manhwa strips into renderable segments; YOLO-based speech-bubble detection enables automatic page merging.
- Typesetting — tone-to-font mapping, adaptive font sizing, text wrapping, and background-brightness-aware color selection.
- Web service — FastAPI application with MongoDB persistence, a thread-pool job manager, and WebSocket progress streaming.
- CDN publishing — direct export of rendered chapters and thumbnails to Cloudflare R2.
- Single source of truth — MongoDB holds all pipeline state; the orchestrator does not perform file I/O for results.
- Factory pattern — `create_ocr_engine()`, `create_translator()`, and `create_chat_model()` decouple configuration from implementation.
- Lazy loading — heavy dependencies (OCR models, LLM clients, R2 SDK) are initialized only when first used.
- Immutable domain models — `BBox`, `TextRegion`, `PageResult`, and `PipelineConfig` are frozen dataclasses, eliminating a class of concurrency bugs.
- Thread safety — each translation job receives its own pipeline instance; shared context files are guarded by locks.
- Observability — a structured `[PROGRESS:N]` log format is routed through a custom logging handler into an asyncio queue and streamed to clients over WebSocket.
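The `[PROGRESS:N]` routing can be illustrated with a minimal handler. Only the log format comes from the code above; the class and queue wiring below are hypothetical:

```python
import asyncio
import logging
import re

PROGRESS_RE = re.compile(r"\[PROGRESS:(\d+)\]")


class ProgressQueueHandler(logging.Handler):
    """Forward [PROGRESS:N] log lines into an asyncio queue (sketch)."""

    def __init__(self, queue: asyncio.Queue, loop: asyncio.AbstractEventLoop):
        super().__init__()
        self.queue = queue
        self.loop = loop

    def emit(self, record: logging.LogRecord) -> None:
        match = PROGRESS_RE.search(record.getMessage())
        if match:
            # Thread-safe hand-off from worker threads into the event loop,
            # where a WebSocket task can await the queue.
            self.loop.call_soon_threadsafe(
                self.queue.put_nowait, int(match.group(1))
            )
```

The `call_soon_threadsafe` hop is what lets log records produced on pipeline worker threads reach an asyncio consumer without locking.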
src/manga_pipeline/
├── config.py PipelineConfig, enums for language and backend
├── models.py Frozen domain models
├── pipeline.py MangaPipeline orchestrator
├── conversation.py Translation, validator, and reviewer agents
├── scene_analyzer.py LangChain structured output for scene/character extraction
├── translator.py Translator ABC and batch translation
├── pivot.py KO → EN → UZ pivot translation
├── context.py Character glossary and tone management
├── cleaner.py pcleaner / LaMa / OpenCV inpainting
├── splitter.py Row-variance image segmentation
├── auto_merge.py YOLO-based bubble detection and merging
├── renderer.py Pillow-based page rendering
├── storage.py Cloudflare R2 client
├── ocr/ OCR engine implementations behind a common ABC
└── web/ FastAPI app, routes, and services
1. Scene Analysis — the chapter is analyzed into scenes, characters, and a narrative arc using LangChain structured output.
2. Translation — scenes are translated sequentially with conversation history preserved across chunks.
3. Review — a reviewer agent performs quality checks; a per-chunk validator retries translations scoring below the configured threshold.
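The structured-output contract behind these phases can be sketched with a hypothetical Pydantic model; the real schemas in `scene_analyzer.py` and `conversation.py` will differ, and with LangChain such a model is typically bound via `with_structured_output()`:

```python
from pydantic import BaseModel, Field


class SceneTranslation(BaseModel):
    """Illustrative schema an LLM would be asked to fill per chunk."""

    scene_id: int
    translated_lines: list[str] = Field(
        description="Uzbek lines, in reading order"
    )
    quality_score: float = Field(ge=0.0, le=1.0)


def needs_retry(result: SceneTranslation, threshold: float = 0.8) -> bool:
    # The per-chunk validator retries anything scoring below the threshold.
    return result.quality_score < threshold
```

Parsing into a typed model instead of free text is what makes the validator's threshold check reliable: a malformed response fails Pydantic validation instead of silently degrading the output.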
| Collection | Key | Purpose |
|---|---|---|
| `projects` | `slug` | Chapter registry, settings, metadata |
| `jobs` | `job_id` | Job lifecycle and error state |
| `pages` | `(manga, chapter, page_idx)` | Page results with embedded region array |
| `chapter_usage` | `(manga, chapter, usage_type)` | Token and cost telemetry |
| `chapter_actions` | — | Audit log of user and pipeline actions |
Chapters transition through a state machine: `uploaded → processing → ocr_done → translating → done`, with a terminal `failed` state reachable from any active stage.
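A minimal sketch of these transitions; the actual guard logic lives in the web service and may differ in detail:

```python
# Allowed next states per current state, per the lifecycle described above.
TRANSITIONS: dict[str, set[str]] = {
    "uploaded": {"processing", "failed"},
    "processing": {"ocr_done", "failed"},
    "ocr_done": {"translating", "failed"},
    "translating": {"done", "failed"},
    "done": set(),    # terminal
    "failed": set(),  # terminal
}


def advance(state: str, new_state: str) -> str:
    """Return the new state, rejecting illegal transitions."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

Encoding the machine as data keeps every legal path in one place, so an out-of-order API call (say, translating before OCR finishes) fails loudly.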
Requires Python 3.11 or newer.
python -m venv venv
source venv/bin/activate
pip install -e .

Optional extras:
pip install -e ".[korean]" # EasyOCR for Korean/Russian
pip install -e ".[lama]" # LaMa inpainting
pip install -e ".[yolo_florence]" # YOLOv8 + Florence-2 OCR
pip install -e ".[cdn]"          # Cloudflare R2 publishing

Copy `.env.example` to `.env` and provide the required credentials:
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Required for the OpenAI backend |
| `OPENAI_MODEL` | Translator model (default: `gpt-5-mini`) |
| `GEMINI_API_KEY` | Required for the Gemini backend |
| `OLLAMA_HOST` | Local Ollama server address |
| `MONGODB_URI` | MongoDB connection string |
| `R2_*` | Cloudflare R2 credentials for CDN publishing |
python main.py --lang ja --backend openai
python main.py --lang ko --backend gemini --data-dir data

python run_web.py

Selected HTTP endpoints:
| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/api/jobs` | Run the OCR and cleaning pipeline |
| POST | `/api/translate-chapter` | Translate a single chapter |
| POST | `/api/results/{manga}/{chapter}/retranslate` | Retranslate untranslated regions |
| GET, PATCH | `/api/results/{manga}/{chapter}` | Read or edit results |
| POST | `/api/publish/{manga}` | Publish rendered chapters to R2 |
| WS | `/ws/jobs/{id}` | Stream real-time job progress |
black src/ && isort src/ && ruff check src/
pytest tests/

All formatters and linters use a line length of 88.
Released under the GNU General Public License v3.0 or later.
data/<manga-slug>/
├── context.yaml Extracted characters, glossary, tone
├── story_progress.yml Per-chapter summaries for inter-chapter coherence
├── meta.yml Catalog metadata
└── chapters/<chapter>/
├── input/ Uploaded source images
└── output/
├── _work/ Intermediate split segments
├── masks/ Text region masks
├── clean/ Inpainted images
└── run_info.json Run metadata and timing