rustammdev/manga-ocr
Manga Pipeline

An end-to-end pipeline that translates manga (Japanese) and manhwa (Korean) pages into Uzbek. The system combines computer vision, OCR, and LLM-based translation into a single pluggable workflow, exposed through both a CLI and a FastAPI web service.

Overview

The pipeline automates every step required to publish a translated chapter: detecting text regions, recognizing source text, producing context-aware translations, removing original text from artwork, and rendering the translated lines back onto clean pages.

split → mask → OCR → translate → clean → render

Each stage is implemented as an independent module behind a stable interface, allowing individual components (OCR engine, translator backend, inpainting method) to be swapped without touching the rest of the pipeline.
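The stage boundary can be pictured as a small abstract base class. This is a hypothetical sketch of the pattern, not the repository's actual interface; the names `Stage`, `PageState`, and `MaskStage` are invented for illustration:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class PageState:
    """Illustrative immutable payload passed between stages."""
    path: str
    regions: tuple = ()


class Stage(ABC):
    @abstractmethod
    def run(self, state: PageState) -> PageState:
        """Consume the previous stage's output, return the next state."""


class MaskStage(Stage):
    def run(self, state: PageState) -> PageState:
        # A real implementation would detect text regions here.
        return PageState(path=state.path, regions=(("bbox", 0, 0, 10, 10),))


def run_pipeline(stages: list[Stage], state: PageState) -> PageState:
    # The orchestrator simply threads state through each stage in order.
    for stage in stages:
        state = stage.run(state)
    return state
```

Because each stage only sees the previous stage's output, swapping one implementation for another (say, a different inpainting backend) leaves the rest of the chain untouched.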

Key Features

  • Multi-engine OCR — five interchangeable backends (manga-ocr, EasyOCR, PaddleOCR, OpenAI Vision, YOLOv8 + Florence-2), selected via a factory based on source language and configuration.
  • Three-phase LLM translation — scene analysis, contextual translation, and automated review. Built on LangChain with structured output (Pydantic models) for reliable parsing.
  • Multiple LLM providers — OpenAI, Google Gemini, Anthropic, and local Ollama models behind a single Translator interface.
  • Context preservation — per-manga character glossary, tone, and per-chapter story summaries are persisted and fed into subsequent chapter translations for consistency.
  • Text removal — three inpainting strategies: pcleaner, LaMa, and OpenCV, with GPU memory managed through context managers.
  • Smart segmentation — row-variance analysis splits tall manhwa strips into renderable segments; YOLO-based speech-bubble detection enables automatic page merging.
  • Typesetting — tone-to-font mapping, adaptive font sizing, text wrapping, and background-brightness-aware color selection.
  • Web service — FastAPI application with MongoDB persistence, a thread-pool job manager, and WebSocket progress streaming.
  • CDN publishing — direct export of rendered chapters and thumbnails to Cloudflare R2.
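The factory-based backend selection mentioned above can be sketched as a registry keyed on language and backend. The engine classes here are stand-ins; the repository's real `create_ocr_engine()` has its own signature and registry:

```python
# Placeholder engine classes for illustration only.
class MangaOcrEngine: ...
class EasyOcrEngine: ...

# Maps (source language, backend name) to an engine class.
_REGISTRY = {
    ("ja", "manga-ocr"): MangaOcrEngine,
    ("ko", "easyocr"): EasyOcrEngine,
}


def create_ocr_engine(lang: str, backend: str):
    try:
        return _REGISTRY[(lang, backend)]()
    except KeyError:
        raise ValueError(f"no OCR engine for lang={lang!r}, backend={backend!r}")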

Architecture

Design Principles

  • Single source of truth — MongoDB holds all pipeline state; the orchestrator does not perform file I/O for results.
  • Factory pattern — create_ocr_engine(), create_translator(), and create_chat_model() decouple configuration from implementation.
  • Lazy loading — heavy dependencies (OCR models, LLM clients, R2 SDK) are initialized only when first used.
  • Immutable domain models — BBox, TextRegion, PageResult, and PipelineConfig are frozen dataclasses, eliminating a class of concurrency bugs.
  • Thread safety — each translation job receives its own pipeline instance; shared context files are guarded by locks.
  • Observability — a structured [PROGRESS:N] log format is routed through a custom logging handler into an asyncio queue and streamed to clients over WebSocket.
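The frozen-dataclass principle can be shown in a few lines. This is a sketch of the idea; the real BBox in models.py may carry different fields:

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Immutable bounding box: any mutation attempt raises immediately."""
    x1: int
    y1: int
    x2: int
    y2: int


box = BBox(0, 0, 100, 40)
try:
    box.x1 = 5  # rejected: frozen instances cannot be reassigned
except dataclasses.FrozenInstanceError:
    pass
```

Because instances can never change after construction, they can be shared freely across the per-job pipeline threads without locking.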

Module Layout

src/manga_pipeline/
├── config.py           PipelineConfig, enums for language and backend
├── models.py           Frozen domain models
├── pipeline.py         MangaPipeline orchestrator
├── conversation.py     Translation, validator, and reviewer agents
├── scene_analyzer.py   LangChain structured output for scene/character extraction
├── translator.py       Translator ABC and batch translation
├── pivot.py            KO → EN → UZ pivot translation
├── context.py          Character glossary and tone management
├── cleaner.py          pcleaner / LaMa / OpenCV inpainting
├── splitter.py         Row-variance image segmentation
├── auto_merge.py       YOLO-based bubble detection and merging
├── renderer.py         Pillow-based page rendering
├── storage.py          Cloudflare R2 client
├── ocr/                OCR engine implementations behind a common ABC
└── web/                FastAPI app, routes, and services

Translation Flow

  1. Scene Analysis — the chapter is analyzed into scenes, characters, and a narrative arc using LangChain structured output.
  2. Translation — scenes are translated sequentially with conversation history preserved across chunks.
  3. Review — a reviewer agent performs quality checks; a per-chunk validator retries translations scoring below the configured threshold.
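The validate-and-retry behavior in step 3 can be sketched as a loop that keeps the best-scoring attempt. `translate` and `score` are placeholders for the real agents, and the 0-to-1 score scale and threshold are illustrative assumptions:

```python
def translate_with_review(chunk, translate, score, threshold=0.8, max_retries=2):
    """Retry a chunk's translation until it scores at or above threshold,
    keeping the best attempt seen so far as a fallback."""
    best_text, best_score = None, -1.0
    for attempt in range(max_retries + 1):
        text = translate(chunk, attempt)
        s = score(chunk, text)
        if s > best_score:
            best_text, best_score = text, s
        if s >= threshold:
            break
    return best_text, best_score
```

Keeping the best attempt means a chunk that never clears the threshold still yields its strongest candidate rather than failing outright.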

Data Model

Collection       Key                           Purpose
projects         slug                          Chapter registry, settings, metadata
jobs             job_id                        Job lifecycle and error state
pages            (manga, chapter, page_idx)    Page results with embedded region array
chapter_usage    (manga, chapter, usage_type)  Token and cost telemetry
chapter_actions  -                             Audit log of user and pipeline actions

Chapters transition through a state machine: uploaded → processing → ocr_done → translating → done, with a terminal failed state reachable from any active stage.
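The state machine above can be encoded as a transition table. Enforcement like this is an illustration of the documented lifecycle, not the repository's code:

```python
# Allowed next states for each chapter state; "failed" is reachable
# from every active stage, and "done"/"failed" are terminal.
_TRANSITIONS = {
    "uploaded":    {"processing", "failed"},
    "processing":  {"ocr_done", "failed"},
    "ocr_done":    {"translating", "failed"},
    "translating": {"done", "failed"},
    "done":        set(),
    "failed":      set(),
}


def advance(current: str, target: str) -> str:
    """Validate and apply a state transition, rejecting illegal moves."""
    if target not in _TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```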

Installation

Requires Python 3.11 or newer.

python -m venv venv
source venv/bin/activate
pip install -e .

Optional extras:

pip install -e ".[korean]"         # EasyOCR for Korean/Russian
pip install -e ".[lama]"           # LaMa inpainting
pip install -e ".[yolo_florence]"  # YOLOv8 + Florence-2 OCR
pip install -e ".[cdn]"            # Cloudflare R2 publishing

Configuration

Copy .env.example to .env and provide the required credentials:

Variable        Description
OPENAI_API_KEY  Required for the OpenAI backend
OPENAI_MODEL    Translator model (default: gpt-5-mini)
GEMINI_API_KEY  Required for the Gemini backend
OLLAMA_HOST     Local Ollama server address
MONGODB_URI     MongoDB connection string
R2_*            Cloudflare R2 credentials for CDN publishing
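A minimal .env for an OpenAI-backed run might look like the following; all values are placeholders, and only the variables for the backends you actually use are needed:

```shell
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-5-mini
MONGODB_URI=mongodb://localhost:27017/manga
```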

Usage

CLI

python main.py --lang ja --backend openai
python main.py --lang ko --backend gemini --data-dir data

Web Server

python run_web.py

Selected HTTP endpoints:

Method      Endpoint                                    Purpose
POST        /api/jobs                                   Run the OCR and cleaning pipeline
POST        /api/translate-chapter                      Translate a single chapter
POST        /api/results/{manga}/{chapter}/retranslate  Retranslate untranslated regions
GET, PATCH  /api/results/{manga}/{chapter}              Read or edit results
POST        /api/publish/{manga}                        Publish rendered chapters to R2
WS          /ws/jobs/{id}                               Stream real-time job progress

Development

black src/ && isort src/ && ruff check src/
pytest tests/

All formatters and linters use a line length of 88.

License

Released under the GNU General Public License v3.0 or later.

Project Layout on Disk

data/<manga-slug>/
├── context.yaml          Extracted characters, glossary, tone
├── story_progress.yml    Per-chapter summaries for inter-chapter coherence
├── meta.yml              Catalog metadata
└── chapters/<chapter>/
    ├── input/            Uploaded source images
    └── output/
        ├── _work/        Intermediate split segments
        ├── masks/        Text region masks
        ├── clean/        Inpainted images
        └── run_info.json Run metadata and timing
