vmxmy/PDFMathTranslate-next

 
 

PDFMathTranslate


PDF scientific paper translation and bilingual comparison.

Warning

This project is provided "as is" under the AGPL v3 license, and no guarantees are provided for the quality and performance of the program. The entire risk of the program's quality and performance is borne by you. If the program is found to be defective, you will be responsible for all necessary service, repair, or correction costs.

Because the maintainers have limited time, we do not provide usage assistance or troubleshooting of any kind, and issues asking for it will be closed directly. (Pull requests that improve the project documentation are welcome; bug reports and friendly issues that follow the issue template are not affected.)

For details on how to contribute, please consult the Contribution Guide.

Updates

  • [Feb. 12, 2026] The HTTP API now supports asynchronous queued tasks, status queries, and on-demand download of translation results (by @vmxmy)
  • [Jun. 4, 2025] The project was renamed and moved to PDFMathTranslate/PDFMathTranslate-next (by @awwaawwa)
  • [Mar. 3, 2025] The new BabelDOC backend is now available in the WebUI as an experimental option (by @awwaawwa)
  • [Feb. 22, 2025] Better release CI and a well-packaged windows-amd64 exe (by @awwaawwa)
  • [Dec. 24, 2024] The translator now supports local models on Xinference (by @imClumsyPanda)
  • [Dec. 19, 2024] Non-PDF/A documents are now supported using -cp (by @reycn)
  • [Dec. 13, 2024] Additional support for backend by (by @YadominJinta)
  • [Dec. 10, 2024] The translator now supports OpenAI models on Azure (by @yidasanqian)

Online Service 🌟

Note

pdf2zh 2.0 does not currently provide an online demo

You can try our application out using either of the following demos:

Note that the computing resources of the demo are limited, so please avoid abusing them.

Installation and Usage

Installation

  1. Windows EXE (recommended for Windows)

  2. Docker (recommended for Linux)

  3. uv (a Python package manager; recommended for macOS)

    Need a local one-click Docker startup? Run ./script/docker-up.sh from the project root and open http://localhost:7860/.


Usage

  1. Using WebUI
  2. Using Zotero Plugin (Third party program)
  3. Using Commandline

For different use cases, we provide distinct methods to use our program. Check out this page for more information.

Advanced Options

For detailed explanations and a full list of options, please refer to our Advanced Usage documentation.

Secondary Development (APIs)

Run an HTTP service with:

uvicorn pdf2zh_next.api.app:app --host 0.0.0.0 --port 8000

Common REST calls (add Authorization: Bearer <api_key> to the request headers):

  • Create a task:

    curl -X POST \
      -H "Authorization: Bearer <your-user-api-key>" \
      -F "files=@/path/to/paper.pdf" \
      -F "target_language=zh" \
      http://localhost:8000/v1/translations/
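  • Check progress: GET /v1/translations/{task_id}/progress

  • View the result summary: GET /v1/translations/{task_id}/result

  • Download the translation package: GET /v1/translations/{task_id}/files/{file_id}/download

  • Delete a task (also cleans up its artifacts): DELETE /v1/translations/{task_id}

  • Clean up a finished task separately: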

    curl -X POST \
      -H "Authorization: Bearer <your-admin-api-key>" \
      http://localhost:8000/v1/translations/{task_id}/clean
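
The same endpoints can be called from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names (translation_url, build_request, call_json) are hypothetical, the base URL assumes the uvicorn command above, and it assumes that responses are JSON, which this README does not specify.

```python
import json
import urllib.request

# Assumption: the service was started with the uvicorn command shown above.
BASE_URL = "http://localhost:8000"


def translation_url(task_id: str, suffix: str = "", base_url: str = BASE_URL) -> str:
    """Build /v1/translations/{task_id} with an optional trailing path segment."""
    url = f"{base_url.rstrip('/')}/v1/translations/{task_id}"
    return f"{url}/{suffix}" if suffix else url


def build_request(url: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Create a request carrying the Bearer token required by every call."""
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {api_key}"}
    )


def call_json(url: str, api_key: str, method: str = "GET"):
    """Send the request and decode the body (assumed, not documented, to be JSON)."""
    request = build_request(url, api_key, method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, call_json(translation_url("<task_id>", "progress"), "<your-user-api-key>") would query progress, and call_json(translation_url("<task_id>", "clean"), "<your-admin-api-key>", method="POST") would trigger the separate cleanup. Uploading a file needs a multipart form body, which is left to curl here.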

Environment variables:

  • Translation and storage: PDF2ZH_API_SUPPORTED_FORMATS (default .pdf), PDF2ZH_API_MAX_FILE_SIZE (default 104857600), PDF2ZH_API_STORAGE_ROOT, PDF2ZH_API_SECONDS_PER_MB / PDF2ZH_API_ESTIMATE_MIN_SECONDS / PDF2ZH_API_ESTIMATE_MAX_SECONDS, PDF2ZH_API_PREVIEW_CONFIDENCE, PDF2ZH_API_ARTIFACT_EXPIRE_DAYS.
  • Concurrency and lifecycle: PDF2ZH_API_MAX_CONCURRENCY (default 10), PDF2ZH_API_TASK_TIMEOUT (default 3600 seconds), PDF2ZH_API_CLEANUP_INTERVAL (default 300 seconds), PDF2ZH_API_TASK_RETENTION_HOURS (default 24 hours).
  • Authentication templates: PDF2ZH_API_USER_* / PDF2ZH_API_ADMIN_* set the defaults for permissions, quotas, file size, allowed engines, and so on (see .env.example).
  • PDF2ZH_API_USER_KEYS: comma-separated list of regular user keys (required, no built-in default; .env is supported).
  • PDF2ZH_API_ADMIN_KEYS: comma-separated list of administrator keys (required, no built-in default; .env is supported).
  • PDF2ZH_API_MAX_CONCURRENCY: maximum concurrent translations (default 10).
  • PDF2ZH_API_QUEUE_MAXSIZE: optional queue length limit (default unlimited).
  • PDF2ZH_API_EXEC_TIMEOUT: seconds to wait when acquiring a worker slot.
  • PDF2ZH_API_WORKERS: number of background queue workers (defaults to PDF2ZH_API_MAX_CONCURRENCY).
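
To make the defaults above concrete, here is a minimal sketch of how such variables could be parsed. It is not the project's actual configuration loader: the ApiSettings class and load_settings function are hypothetical names, and only the variables whose defaults are stated above are covered.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


def _split_csv(raw: str) -> Tuple[str, ...]:
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return tuple(part.strip() for part in raw.split(",") if part.strip())


@dataclass(frozen=True)
class ApiSettings:  # hypothetical container, for illustration only
    user_keys: Tuple[str, ...]
    admin_keys: Tuple[str, ...]
    supported_formats: Tuple[str, ...]
    max_file_size: int
    max_concurrency: int
    workers: int
    queue_maxsize: Optional[int]
    task_timeout: int
    cleanup_interval: int
    task_retention_hours: int


def load_settings(env: Mapping[str, str]) -> ApiSettings:
    """Read the documented variables from a mapping such as os.environ."""
    user_keys = _split_csv(env.get("PDF2ZH_API_USER_KEYS", ""))
    admin_keys = _split_csv(env.get("PDF2ZH_API_ADMIN_KEYS", ""))
    if not user_keys or not admin_keys:
        # Both key lists are required and have no built-in default.
        raise ValueError("PDF2ZH_API_USER_KEYS and PDF2ZH_API_ADMIN_KEYS must be set")
    max_concurrency = int(env.get("PDF2ZH_API_MAX_CONCURRENCY", "10"))
    queue_raw = env.get("PDF2ZH_API_QUEUE_MAXSIZE")
    return ApiSettings(
        user_keys=user_keys,
        admin_keys=admin_keys,
        supported_formats=_split_csv(env.get("PDF2ZH_API_SUPPORTED_FORMATS", ".pdf")),
        max_file_size=int(env.get("PDF2ZH_API_MAX_FILE_SIZE", "104857600")),
        max_concurrency=max_concurrency,
        # The worker count falls back to the concurrency limit.
        workers=int(env.get("PDF2ZH_API_WORKERS", str(max_concurrency))),
        # None stands for "unlimited" in this sketch.
        queue_maxsize=int(queue_raw) if queue_raw else None,
        task_timeout=int(env.get("PDF2ZH_API_TASK_TIMEOUT", "3600")),
        cleanup_interval=int(env.get("PDF2ZH_API_CLEANUP_INTERVAL", "300")),
        task_retention_hours=int(env.get("PDF2ZH_API_TASK_RETENTION_HOURS", "24")),
    )
```

Calling load_settings(os.environ) after importing os would read the live environment; passing a plain dict makes the function easy to test.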

Concurrent Processing Flow

flowchart LR
    subgraph HTTP_API["FastAPI Server"]
        direction TB
        U[Client Request] --> |upload PDF| TQ[translate_pdf]
        TQ --> |create TaskRecord| Q[TASK_QUEUE]
        subgraph Lifespan
            direction TB
            style Lifespan fill:#f5f5f5,stroke:#ccc,stroke-width:1px
            W1[_task_worker_loop #1]
            Wn[_task_worker_loop #N]
        end
        Q --> |await get| W1
        Q --> |await get| Wn
        W1 --> |asyncio.create_task| RT1["_run_task(task1)"]
        Wn --> |asyncio.create_task| RTn["_run_task(taskN)"]
        RT1 --> |await acquire| SEM["SEMAPHORE (max=PDF2ZH_API_MAX_CONCURRENCY)"]
        RTn --> |await acquire| SEM
        SEM --> |permit| EX1["_execute_task(task)"]
    end

    subgraph TaskLifecycle["Per-Task Execution"]
        direction TB
        EX1 --> |clone settings\nset output| CFG[settings.validate]
        CFG --> |await| STRM["_stream_translation"]
        STRM --> |async for events| HILO["do_translate_async_stream"]
    end

    subgraph Subprocess["Multiprocessing Layer"]
        direction TB
        HILO --> |spawn| SUBP["_translate_in_subprocess"]
        SUBP --> PROC["multiprocessing.Process"]
        PROC --> WRAP["_translate_wrapper"]
        WRAP --> |babeldoc async loop| BABEL[BabelDOC]
        BABEL --> |progress/error events| PIPE{{Pipe/Queue}}
        PIPE --> |events back| STRM
    end

    STRM --> |finish/error| EX1
    EX1 --> |release| SEM
    EX1 --> |set event\nupdate state| STATE[TaskRecord]
    STATE --> RESP[API Response/Result Polling]
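
The queue, worker, and semaphore stages of the diagram can be approximated with plain asyncio. The sketch below is an illustration of that pattern only, not the project's code: run_demo, worker, and execute are hypothetical names, and a short sleep stands in for the translation subprocess.

```python
import asyncio


async def run_demo(num_tasks: int = 6, workers: int = 3, max_concurrency: int = 2) -> int:
    """Push tasks through a queue and return the peak number running at once."""
    queue: asyncio.Queue = asyncio.Queue()
    semaphore = asyncio.Semaphore(max_concurrency)
    background = set()  # keep references so spawned tasks are not garbage collected
    running = 0
    peak = 0

    async def execute(task_id: int) -> None:
        nonlocal running, peak
        try:
            async with semaphore:  # at most max_concurrency tasks pass this point
                running += 1
                peak = max(peak, running)
                await asyncio.sleep(0.01)  # stand-in for the real translation work
                running -= 1
        finally:
            queue.task_done()  # lets queue.join() return once every task finished

    async def worker() -> None:
        while True:
            task_id = await queue.get()
            task = asyncio.create_task(execute(task_id))
            background.add(task)
            task.add_done_callback(background.discard)

    for task_id in range(num_tasks):
        queue.put_nowait(task_id)
    worker_tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()
    for worker_task in worker_tasks:
        worker_task.cancel()
    await asyncio.gather(*worker_tasks, return_exceptions=True)
    return peak
```

Running asyncio.run(run_demo()) shows the key property of this design: workers drain the queue immediately, but the semaphore caps how many tasks execute at the same time, so the number of workers and the concurrency limit can be tuned independently.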

GET /v1/health returns the service status and current queue information. Future API expansions will be documented here.

Language Code

If you are not sure which code to use for your target language, check out this documentation.

Acknowledgements

Before submitting your code

We welcome the active participation of contributors to make pdf2zh better. Before submitting your code, please refer to our Code of Conduct and Contribution Guide.

Contributors


Star History

Star History Chart

About

PDF scientific paper translation with preserved formats. AI-based full-text bilingual translation of PDF documents that fully preserves the layout; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker.
