rustammdev/manga-ocr
Manga Pipeline

An end-to-end pipeline that translates manga (Japanese) and manhwa (Korean) pages into Uzbek. The system combines computer vision, OCR, and LLM-based translation into a single pluggable workflow, exposed through both a CLI and a FastAPI web service.

Overview

The pipeline automates every step required to publish a translated chapter: detecting text regions, recognizing source text, producing context-aware translations, removing original text from artwork, and rendering the translated lines back onto clean pages.

split → mask → OCR → translate → clean → render

Each stage is implemented as an independent module behind a stable interface, allowing individual components (OCR engine, translator backend, inpainting method) to be swapped without touching the rest of the pipeline.
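The stage boundary can be pictured as a small abstract base class. This is a hypothetical sketch of the pattern, not the repository's actual interface; the names `Stage`, `PageState`, and `MaskStage` are invented for illustration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class PageState:
    """Illustrative immutable payload passed between stages."""
    path: str
    regions: tuple = ()


class Stage(ABC):
    @abstractmethod
    def run(self, state: PageState) -> PageState:
        """Consume the previous stage's output, return the next state."""


class MaskStage(Stage):
    def run(self, state: PageState) -> PageState:
        # A real implementation would detect text regions here.
        return PageState(path=state.path, regions=(("bbox", 0, 0, 10, 10),))


def run_pipeline(stages: list[Stage], state: PageState) -> PageState:
    # The orchestrator simply threads state through each stage in order.
    for stage in stages:
        state = stage.run(state)
    return state
```

Because each stage only sees the previous stage's output, swapping one implementation for another (say, a different inpainting backend) leaves the rest of the chain untouched.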

Key Features

  • Multi-engine OCR — five interchangeable backends (manga-ocr, EasyOCR, PaddleOCR, OpenAI Vision, YOLOv8 + Florence-2), selected via a factory based on source language and configuration.
  • Three-phase LLM translation — scene analysis, contextual translation, and automated review. Built on LangChain with structured output (Pydantic models) for reliable parsing.
  • Multiple LLM providers — OpenAI, Google Gemini, Anthropic, and local Ollama models behind a single Translator interface.
  • Context preservation — per-manga character glossary, tone, and per-chapter story summaries are persisted and fed into subsequent chapter translations for consistency.
  • Text removal — three inpainting strategies: pcleaner, LaMa, and OpenCV, with GPU memory managed through context managers.
  • Smart segmentation — row-variance analysis splits tall manhwa strips into renderable segments; YOLO-based speech-bubble detection enables automatic page merging.
  • Typesetting — tone-to-font mapping, adaptive font sizing, text wrapping, and background-brightness-aware color selection.
  • Web service — FastAPI application with MongoDB persistence, a thread-pool job manager, and WebSocket progress streaming.
  • CDN publishing — direct export of rendered chapters and thumbnails to Cloudflare R2.
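The factory-based backend selection mentioned above can be sketched as a registry keyed on language and backend. The engine classes here are stand-ins; the repository's real `create_ocr_engine()` has its own signature and registry:

```python
# Placeholder engine classes for illustration only.
class MangaOcrEngine: ...
class EasyOcrEngine: ...

# Maps (source language, backend name) to an engine class.
_REGISTRY = {
    ("ja", "manga-ocr"): MangaOcrEngine,
    ("ko", "easyocr"): EasyOcrEngine,
}


def create_ocr_engine(lang: str, backend: str):
    try:
        return _REGISTRY[(lang, backend)]()
    except KeyError:
        raise ValueError(f"no OCR engine for lang={lang!r}, backend={backend!r}")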

Architecture

Design Principles

  • Single source of truth — MongoDB holds all pipeline state; the orchestrator does not perform file I/O for results.
  • Factory pattern — create_ocr_engine(), create_translator(), and create_chat_model() decouple configuration from implementation.
  • Lazy loading — heavy dependencies (OCR models, LLM clients, R2 SDK) are initialized only when first used.
  • Immutable domain models — BBox, TextRegion, PageResult, and PipelineConfig are frozen dataclasses, eliminating a class of concurrency bugs.
  • Thread safety — each translation job receives its own pipeline instance; shared context files are guarded by locks.
  • Observability — a structured [PROGRESS:N] log format is routed through a custom logging handler into an asyncio queue and streamed to clients over WebSocket.
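The frozen-dataclass principle can be shown in a few lines. This is a sketch of the idea; the real BBox in models.py may carry different fields:

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Immutable bounding box: any mutation attempt raises immediately."""
    x1: int
    y1: int
    x2: int
    y2: int


box = BBox(0, 0, 100, 40)
try:
    box.x1 = 5  # rejected: frozen instances cannot be reassigned
except dataclasses.FrozenInstanceError:
    pass
```

Because instances can never change after construction, they can be shared freely across the per-job pipeline threads without locking.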

Module Layout

src/manga_pipeline/
├── config.py           PipelineConfig, enums for language and backend
├── models.py           Frozen domain models
├── pipeline.py         MangaPipeline orchestrator
├── conversation.py     Translation, validator, and reviewer agents
├── scene_analyzer.py   LangChain structured output for scene/character extraction
├── translator.py       Translator ABC and batch translation
├── pivot.py            KO → EN → UZ pivot translation
├── context.py          Character glossary and tone management
├── cleaner.py          pcleaner / LaMa / OpenCV inpainting
├── splitter.py         Row-variance image segmentation
├── auto_merge.py       YOLO-based bubble detection and merging
├── renderer.py         Pillow-based page rendering
├── storage.py          Cloudflare R2 client
├── ocr/                OCR engine implementations behind a common ABC
└── web/                FastAPI app, routes, and services

Translation Flow

  1. Scene Analysis — the chapter is analyzed into scenes, characters, and a narrative arc using LangChain structured output.
  2. Translation — scenes are translated sequentially with conversation history preserved across chunks.
  3. Review — a reviewer agent performs quality checks; a per-chunk validator retries translations scoring below the configured threshold.
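The validate-and-retry behavior in step 3 can be sketched as a loop that keeps the best-scoring attempt. `translate` and `score` are placeholders for the real agents, and the 0-to-1 score scale and threshold are illustrative assumptions:

```python
def translate_with_review(chunk, translate, score, threshold=0.8, max_retries=2):
    """Retry a chunk's translation until it scores at or above threshold,
    keeping the best attempt seen so far as a fallback."""
    best_text, best_score = None, -1.0
    for attempt in range(max_retries + 1):
        text = translate(chunk, attempt)
        s = score(chunk, text)
        if s > best_score:
            best_text, best_score = text, s
        if s >= threshold:
            break
    return best_text, best_score
```

Keeping the best attempt means a chunk that never clears the threshold still yields its strongest candidate rather than failing outright.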

Data Model

Collection       Key                           Purpose
projects         slug                          Chapter registry, settings, metadata
jobs             job_id                        Job lifecycle and error state
pages            (manga, chapter, page_idx)    Page results with embedded region array
chapter_usage    (manga, chapter, usage_type)  Token and cost telemetry
chapter_actions  -                             Audit log of user and pipeline actions

Chapters transition through a state machine: uploaded → processing → ocr_done → translating → done, with a terminal failed state reachable from any active stage.
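The state machine above can be encoded as a transition table. Enforcement like this is an illustration of the documented lifecycle, not the repository's code:

```python
# Allowed next states for each chapter state; "failed" is reachable
# from every active stage, and "done"/"failed" are terminal.
_TRANSITIONS = {
    "uploaded":    {"processing", "failed"},
    "processing":  {"ocr_done", "failed"},
    "ocr_done":    {"translating", "failed"},
    "translating": {"done", "failed"},
    "done":        set(),
    "failed":      set(),
}


def advance(current: str, target: str) -> str:
    """Validate and apply a state transition, rejecting illegal moves."""
    if target not in _TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```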

Installation

Requires Python 3.11 or newer.

python -m venv venv
source venv/bin/activate
pip install -e .

Optional extras:

pip install -e ".[korean]"         # EasyOCR for Korean/Russian
pip install -e ".[lama]"           # LaMa inpainting
pip install -e ".[yolo_florence]"  # YOLOv8 + Florence-2 OCR
pip install -e ".[cdn]"            # Cloudflare R2 publishing

Configuration

Copy .env.example to .env and provide the required credentials:

Variable        Description
OPENAI_API_KEY  Required for the OpenAI backend
OPENAI_MODEL    Translator model (default: gpt-5-mini)
GEMINI_API_KEY  Required for the Gemini backend
OLLAMA_HOST     Local Ollama server address
MONGODB_URI     MongoDB connection string
R2_*            Cloudflare R2 credentials for CDN publishing
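A minimal .env for an OpenAI-backed run might look like the following; all values are placeholders, and only the variables for the backends you actually use are needed:

```shell
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-5-mini
MONGODB_URI=mongodb://localhost:27017/manga
```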

Usage

CLI

python main.py --lang ja --backend openai
python main.py --lang ko --backend gemini --data-dir data

Web Server

python run_web.py

Selected HTTP endpoints:

Method      Endpoint                                    Purpose
POST        /api/jobs                                   Run the OCR and cleaning pipeline
POST        /api/translate-chapter                      Translate a single chapter
POST        /api/results/{manga}/{chapter}/retranslate  Retranslate untranslated regions
GET, PATCH  /api/results/{manga}/{chapter}              Read or edit results
POST        /api/publish/{manga}                        Publish rendered chapters to R2
WS          /ws/jobs/{id}                               Stream real-time job progress

Development

black src/ && isort src/ && ruff check src/
pytest tests/

All formatters and linters use a line length of 88.

License

Released under the GNU General Public License v3.0 or later.

Project Layout on Disk

data/<manga-slug>/
├── context.yaml          Extracted characters, glossary, tone
├── story_progress.yml    Per-chapter summaries for inter-chapter coherence
├── meta.yml              Catalog metadata
└── chapters/<chapter>/
    ├── input/            Uploaded source images
    └── output/
        ├── _work/        Intermediate split segments
        ├── masks/        Text region masks
        ├── clean/        Inpainted images
        └── run_info.json Run metadata and timing
