
BookLLM


BookLLM is a self-hosted book translation app with glossary, review, and polish pipelines.

It supports OpenAI-compatible APIs and local LLM runtimes for long-form reading and iterative translation workflows.

Translation is currently supported between Chinese, English, Japanese, Korean, French, German, Spanish, and Russian. The source language can also be set to auto-detect.

BookLLM Reader

Bilingual long-form reading with live translation output.

Features

OpenAI-compatible
Works with OpenAI-compatible providers and runtimes, including OpenAI, DeepSeek, OpenRouter, Ollama, LM Studio, and OMLX (see the request sketch after this list).
Live Reader
Stream translations directly into a bilingual long-form reading interface.
Multi-stage Pipeline
Translate with optional glossary extraction, review, and polishing stages.
Sidekick Model
Use Sidekick as an optional second model for review, polishing, or quality-focused stages.
Glossary Support
Maintain per-page glossary context, with optional whole-book extraction.
PDF / OCR Support
Extract text from PDFs and use bundled OCR for scanned pages.
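
To make "OpenAI-compatible" concrete, the sketch below sends a minimal chat-completions request to a local Ollama runtime. The base URL, model name, and prompt are examples only; this shows the wire format such providers speak, not BookLLM's internal code.

// Example request against Ollama's OpenAI-compatible endpoint.
// Base URL and model are illustrative; any compatible provider works.
const res = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    messages: [{ role: "user", content: "Translate to English: 你好，世界" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);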

Screenshots

Library
Books, progress, and translation tasks.
Translation Pipeline
Glossary, review, and polishing configuration.

Pipeline Cost Notice

Important

Enabling optional pipeline stages can significantly increase token usage, API cost, and processing time.

These stages may improve translation quality, but the marginal gain is not always large enough to justify the extra cost and latency, especially for everyday translation tasks.

Before using the full pipeline on a long document, test it on a short file or a small excerpt first. Compare output quality, latency, and token usage, then enable only the stages that clearly help your use case.
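
As a rough, illustrative calculation (assumed figures, not measurements): a 200-page book at about 500 tokens per page is roughly 100k input tokens. A translate-only run sends that text through the model once. Enabling glossary extraction, review, and polish re-sends much of the source plus intermediate drafts at each stage, so total usage can plausibly land at 3 to 4 times the single-pass figure.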

Quick Start

Run BookLLM locally with Docker Compose. No .env file is required for single-machine use.

git clone https://github.com/purecodework/bookllm.git
cd bookllm
docker compose up -d --build

Then open:

http://localhost:3000

For later starts, use:

docker compose up -d

Rebuild only after source, dependency, or Dockerfile changes:

docker compose up -d --build

Initial Setup

  1. Open Model Connection from the sidebar.
  2. Configure your OpenAI-compatible endpoint: base URL, API key, and model name. Typical values for common runtimes are sketched after this list.
  3. Optional: if you are using a hosted API provider instead of a local runtime, tune the input token budget and concurrency in Settings to improve throughput.
  4. Upload an EPUB, PDF, or TXT file to begin translation.
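
The exact fields live on the Model Connection screen. The values below are illustrative defaults for common runtimes; the field names here are descriptive, not BookLLM's actual settings schema.

// Illustrative connection values; adjust to your own setup.
const connections = {
  openai:   { baseUrl: "https://api.openai.com/v1", apiKey: "sk-...", model: "gpt-4o-mini" },
  ollama:   { baseUrl: "http://localhost:11434/v1", apiKey: "ollama", model: "llama3.1" },
  lmstudio: { baseUrl: "http://localhost:1234/v1", apiKey: "lm-studio", model: "local-model" },
};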

Architecture

Next.js frontend
  -> NestJS API + SSE
     -> PostgreSQL: books, pages, settings, glossary/context, token usage
     -> Redis: BullMQ queues, job state, event fanout
     -> Translation workers: chunking, glossary/context, translation, review, polish, export
     -> FastAPI OCR service: PyMuPDF + Tesseract
     -> Local asset storage: originals, covers, extracted images
     -> OpenAI-compatible LLM endpoints: primary + optional Sidekick
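
The "NestJS API + SSE" hop above is how translation progress reaches the Live Reader. Below is a minimal browser-side sketch of consuming such a stream, with a hypothetical event URL and payload shape (neither is BookLLM's documented API):

// Hypothetical endpoint and payload, for illustration only.
const events = new EventSource("/api/books/42/events");
events.onmessage = (e) => {
  // Assumed payload shape: { page: number, stage: string, text: string }
  const update = JSON.parse(e.data);
  document.querySelector("#reader")!.textContent += update.text;
};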

Known Limitations

  • Token usage shown in the UI is approximate; provider dashboards remain the billing source of truth (see the sketch after this list).
  • OCR does not preserve the original layout.
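
UI token counts are typically computed from local estimates rather than the provider's billed tokenization, which is one reason they drift. A common rough heuristic, shown purely for illustration (not necessarily what BookLLM uses):

// Crude estimate: ~4 characters per token for English-like text.
// Real tokenizers, and CJK text in particular, can diverge substantially.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

console.log(estimateTokens("BookLLM translates long-form books.")); // 9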

License

MIT
