pptx-translate

A Claude Code skill that translates an editable .pptx from Chinese to English while keeping each text box's color, font size, bold styling, and layout region in place.

It never mutates your source file: every edit (translation + font size + line spacing + box size) is recorded as replayable data keyed by shape identity, then replayed onto a copy of the original.

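In short: a Claude Code skill that translates editable PPTX decks from Chinese to English, preserves color / font size / bold run by run, and keeps the translation as close as possible to the original text region. The original file is never touched, and every change is replayable.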

What it does

Format-segment granularity: consecutive runs in a paragraph that share (size, bold, color) are merged into one semantic unit before translating. This merges IME-fragmented runs (so whole sentences are translated with correct word order) and still preserves mixed-format boundaries inside a single box (e.g. grey body text wrapping a blue quote, a green number next to a black caption).
Edits as replayable data: segments.json (translations + font-size overrides) and adjust.json (line spacing / box size / position) address shapes by identity, so the same edits apply to a working copy for review and are finally replayed onto a copy of the original. Nothing is lost and the process is repeatable.
Visual overflow check: pages are rendered to PNG via LibreOffice so the agent (or you) can visually spot truncation, squished text, and collisions that a numeric estimate misses.
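
The format-segment idea can be sketched in a few lines. This is a minimal illustration, not the code in translate_pptx.py: runs are modeled as plain dicts (a hypothetical shape chosen for the example), and consecutive runs with the same (size, bold, color) key are joined.

```python
from itertools import groupby

def merge_runs(runs):
    """Merge consecutive runs that share (size, bold, color) into segments."""
    segments = []
    # groupby only groups *adjacent* items, which is what keeps
    # mixed-format boundaries inside a paragraph intact.
    for key, group in groupby(runs, key=lambda r: (r["size"], r["bold"], r["color"])):
        size, bold, color = key
        text = "".join(r["text"] for r in group)
        segments.append({"text": text, "size": size, "bold": bold, "color": color})
    return segments

# Two IME-fragmented grey runs followed by a bold green number.
runs = [
    {"text": "多能", "size": 18, "bold": False, "color": "808080"},
    {"text": "互补", "size": 18, "bold": False, "color": "808080"},
    {"text": "30%", "size": 18, "bold": True, "color": "00B050"},
]
segments = merge_runs(runs)
```

Here the two grey fragments collapse into one segment that can be translated as a whole, while the green number stays a separate segment with its own formatting.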

Requirements

pip install python-pptx PyMuPDF

LibreOffice is needed only for the render/screenshot step. render.py locates soffice in this order: SOFFICE env var → soffice on PATH → Windows default install path.

Linux: install LibreOffice; soffice is usually on PATH → zero config.
macOS: export SOFFICE=/Applications/LibreOffice.app/Contents/MacOS/soffice
Windows: winget install --id TheDocumentFoundation.LibreOffice -e
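
The lookup order described above could look roughly like the sketch below. The function name, its parameters, and the Windows default path are assumptions made for illustration; render.py may differ in detail (for example in whether it checks that the SOFFICE path exists).

```python
import os
import shutil

# Assumed default install location on Windows; not taken from render.py.
WINDOWS_DEFAULT = r"C:\Program Files\LibreOffice\program\soffice.exe"

def find_soffice(env=None, which=shutil.which, exists=os.path.isfile):
    """Return a soffice path using the order: SOFFICE env var, PATH, Windows default."""
    env = os.environ if env is None else env
    override = env.get("SOFFICE")
    if override:
        return override
    on_path = which("soffice")
    if on_path:
        return on_path
    if exists(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None  # caller decides how to report a missing LibreOffice
```

The env, which, and exists parameters exist only so the order can be exercised without a real LibreOffice install.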

Install as a Claude Code skill

Drop the folder into your project's (or global) skills directory so Claude Code auto-discovers it:

# project-local
git clone https://github.com/Darkstarrd-dev/pptx-translate.git \
  .claude/skills/pptx-translate

# or global (all projects)
git clone https://github.com/Darkstarrd-dev/pptx-translate.git \
  ~/.claude/skills/pptx-translate

Then in Claude Code just ask: "translate this pptx to English" / "翻译 pptx", or invoke /pptx-translate. Claude reads SKILL.md and drives the pipeline.

Use the scripts directly (without Claude Code)

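The four scripts form a standalone pipeline. Run them from the folder containing your test.pptx. On Windows, prefix each command with PYTHONIOENCODING=utf-8 to avoid garbled console output (the VAR=value prefix works in bash-style shells such as Git Bash; in PowerShell, set $env:PYTHONIOENCODING = "utf-8" first):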

S=path/to/pptx-translate

# 0. inspect structure (find mixed-format boxes)
python $S/probe.py test.pptx 1 3 7 12

# 1. extract Chinese format-segments -> segments.json
python $S/translate_pptx.py extract test.pptx segments.json --pages 1 3 7 12

# 2. translate: write an id->English map (en_map.json), then merge it in.
#    A value can be a string (translation only) or {"en":..,"size":..}
#    to also override the font size for a long title that would overflow.
python $S/translate_pptx.py merge segments.json en_map.json

# 3. apply -> produce the English deck on a COPY (source untouched)
python $S/translate_pptx.py apply test.pptx segments.json test_EN.pptx --adjust adjust.json

# 4. render to PNG and look at the images
python $S/render.py test_EN.pptx 1,3,7,12 renders_en

# 5. (optional) numeric overflow check on fixed-size boxes
python $S/check_overflow.py test_EN.pptx 1 3 7 12

en_map.json example (key = segment id from segments.json):

{
  "s1_sh6_p0_r0": "Multi-Energy Distributed",
  "s1_sh6_p1_r0": { "en": "Green Power Station Solution", "size": 30 }
}
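
Because a value may be either a plain string or an object, any code that reads en_map.json has to normalize the two forms. A minimal sketch of that step follows; normalize_entry is a hypothetical helper, not a function from translate_pptx.py.

```python
import json

def normalize_entry(value):
    """Return (english_text, size_override_or_None) for one en_map value."""
    if isinstance(value, str):
        return value, None
    # Object form: "en" is required, "size" is optional.
    return value["en"], value.get("size")

en_map = json.loads("""
{
  "s1_sh6_p0_r0": "Multi-Energy Distributed",
  "s1_sh6_p1_r0": { "en": "Green Power Station Solution", "size": 30 }
}
""")

normalized = {seg_id: normalize_entry(v) for seg_id, v in en_map.items()}
```

After normalization every entry is an (en, size) pair, with size set to None when the original font size should be kept.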

adjust.json example (box-level tweaks, replayed by shape identity):

{ "boxes": [
  { "slide": 1, "id_path": [6], "line_spacing": 0.95 },
  { "slide": 3, "id_path": [166], "line_spacing": 0.9 }
] }
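
One plausible reading of id_path is a chain of shape ids that walks from a top-level shape down through nested groups. The sketch below illustrates that reading on a plain dict tree; both the interpretation and the data layout are assumptions, so check SKILL.md or translate_pptx.py for the actual semantics.

```python
def resolve(shapes, id_path):
    """Follow id_path through nested shapes; return the target or None."""
    node = None
    for shape_id in id_path:
        node = next((s for s in shapes if s["id"] == shape_id), None)
        if node is None:
            return None  # path does not exist on this slide
        shapes = node.get("children", [])  # descend into a group, if any
    return node

# Hypothetical slide: shape 6 is a text box, shape 10 is a group holding shape 11.
slide_shapes = [
    {"id": 6, "name": "Title"},
    {"id": 10, "name": "Group", "children": [{"id": 11, "name": "Caption"}]},
]

title = resolve(slide_shapes, [6])
caption = resolve(slide_shapes, [10, 11])
missing = resolve(slide_shapes, [99])
```

Addressing shapes by id rather than by position in the shape list is what lets the same adjust.json apply to a working copy and later to a fresh copy of the original.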

Iterate: spot overflow in the PNGs → tune size in en_map.json or line spacing / box size in adjust.json → re-run merge → apply → render. When happy, replay the same segments.json + adjust.json onto a copy of the original.

Scripts

| Script | Role |
| --- | --- |
| `probe.py` | Inspect text boxes / runs / color / size / autofit per page |
| `translate_pptx.py` | `extract` segments, `merge` translations + sizes, `apply` (replay the edits onto a copy: unify fonts to Arial, apply size / box adjustments) |
| `render.py` | LibreOffice → PDF → PyMuPDF → PNG |
| `check_overflow.py` | Numeric overflow estimate for fixed-size boxes |
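
The render chain (LibreOffice → PDF → PyMuPDF → PNG) can be sketched as follows. This is an illustration under assumptions, not render.py itself: the function names, the output file naming, and the dpi default are made up for the example, and it needs both LibreOffice and PyMuPDF installed to actually run the render step.

```python
import subprocess
from pathlib import Path

def parse_pages(spec):
    """Turn a page spec like "1,3,7,12" into a list of 1-based page numbers."""
    return [int(p) for p in spec.split(",") if p.strip()]

def render(pptx_path, pages, out_dir, soffice="soffice", dpi=150):
    """Convert the deck to PDF with LibreOffice, then rasterize selected pages."""
    import fitz  # PyMuPDF; imported lazily so parse_pages works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Headless conversion; the PDF lands in out_dir with the deck's stem.
    subprocess.run(
        [soffice, "--headless", "--convert-to", "pdf",
         "--outdir", str(out), str(pptx_path)],
        check=True,
    )
    pdf_path = out / (Path(pptx_path).stem + ".pdf")
    written = []
    with fitz.open(str(pdf_path)) as doc:
        for page_no in pages:
            pix = doc[page_no - 1].get_pixmap(dpi=dpi)  # PDF pages are 0-based
            png = out / f"page_{page_no:03d}.png"
            pix.save(str(png))
            written.append(png)
    return written

pages = parse_pages("1,3,7,12")
```

A call such as render("test_EN.pptx", pages, "renders_en") would then write one PNG per requested page.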

Gotchas

Kangxi-radical look-alikes: some decks use Unicode Kangxi radical characters (U+2F00+) that look identical to real Hanzi. Never hand-type the source for matching — always go through the id map. probe/extract read the raw characters, so they're always correct.
autofit=SHAPE_TO_FIT_TEXT boxes grow: most boxes resize to fit text, so a longer English string enlarges the box and may collide with neighbors. Use size overrides to pull the line count / width back toward the original.
Keep background media when rendering: cover/divider slides often have white text over dark photos. Don't strip backgrounds or white-on-white becomes invisible and unreviewable.
Output file locked: if PowerPoint has the target open, apply/render fail (PermissionError / soffice exit 1). Close it or use a different name.
LibreOffice ≠ PowerPoint: font substitution and widths differ slightly. Good enough to judge overflow; verify final result in PowerPoint.
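
To see why hand-typed text fails to match, the Kangxi-radical gotcha can be reproduced with the standard library alone. The helpers below are illustrative names, not part of the scripts; they flag characters in the Kangxi Radicals block (U+2F00 to U+2FDF) and fold them to the ordinary ideograph via NFKC for comparison purposes only.

```python
import unicodedata

def _is_kangxi(ch):
    return 0x2F00 <= ord(ch) <= 0x2FDF

def kangxi_radicals(text):
    """Return (index, char) pairs for characters in the Kangxi Radicals block."""
    return [(i, ch) for i, ch in enumerate(text) if _is_kangxi(ch)]

def fold_kangxi(text):
    """Map Kangxi radicals to their look-alike ideographs (for comparison only)."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if _is_kangxi(ch) else ch
        for ch in text
    )

sample = "\u2f00\u4e8c"          # KANGXI RADICAL ONE followed by the Hanzi for "two"
typed = "\u4e00\u4e8c"           # what a keyboard would produce
hits = kangxi_radicals(sample)
folded = fold_kangxi(sample)
```

The two strings render identically but compare unequal until folded, which is why matching should go through the id map rather than retyped source text. Folding is useful for diagnostics; it should not be written back into the deck.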

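Usage guide (Chinese version)

Overview

pptx-translate is a Claude Code skill (it can also be used standalone, without Claude Code) that translates an editable .pptx file from Chinese to English. It preserves the color, font size, bold styling, and layout region of every text box in the original, and the source file is never modified.

Core features:

Format-segment granularity: consecutive runs in the same paragraph that share font size / bold / color are merged into one semantic unit before translation. This merges the fragmented runs left behind by an IME (whole-sentence translation, correct word order) while preserving mixed-format boundaries inside a box (e.g. a blue quote embedded in grey body text, a green number next to a black caption).
Replayable edit data: segments.json (translations + font-size overrides) and adjust.json (line spacing / box size / position) are indexed by shape identity. Iterate on a working copy first, then run the final replay onto a copy of the original once the result is settled. Zero risk, repeatable.
Visual overflow check: pages are rendered to PNG via LibreOffice so truncation, squished text, and collisions can be spotted by eye, covering the blind spots of a purely numeric estimate.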

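Installation

Python dependencies

pip install python-pptx PyMuPDF

LibreOffice (only needed for rendering / inspecting screenshots)

render.py looks for soffice in this order: SOFFICE environment variable → soffice on PATH → Windows default install path.

| Platform | How to install |
| --- | --- |
| Windows | `winget install --id TheDocumentFoundation.LibreOffice -e`; installing to the default path needs no further configuration |
| Linux | `sudo apt install libreoffice` or the equivalent for your package manager; `soffice` ends up on `PATH` automatically, zero config |
| macOS | Download and install from libreoffice.org, then `export SOFFICE=/Applications/LibreOffice.app/Contents/MacOS/soffice` |

Note: translation and write-back alone (steps 1-3) do not need LibreOffice. It is only required when rendering PNGs to look for overflow.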

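Full workflow

The following assumes test.pptx is in the working directory and the script path is stored in the variable S. Windows users should prefix each command with PYTHONIOENCODING=utf-8 to prevent garbled console output.

Step 0: Probe the structure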

python $S/probe.py test.pptx 1 3 7 12

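Prints information about the text boxes on each requested page: content, number of runs, color, font size, and autofit mode. This helps you quickly locate boxes with mixed formatting and boxes that need extra attention.

Step 1: Extract Chinese format-segments

python $S/translate_pptx.py extract test.pptx segments.json --pages 1 3 7 12

Extracts text page by page and box by box, merges runs into segments by format (font size / bold / color), and writes segments.json. Each segment has a unique id (e.g. s1_sh6_p0_r0 = slide 1, shape 6, paragraph 0, segment 0), and its content keeps the original characters.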
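
The id layout above (s1_sh6_p0_r0 = slide 1, shape 6, paragraph 0, segment 0) is regular enough to parse, which is handy when grouping segments by slide or shape while reviewing. The sketch below assumes only the flat form shown in the example; ids for shapes nested in groups may look different, and parse_segment_id is a hypothetical helper.

```python
import re

# Matches the flat form shown in the example: s<slide>_sh<shape>_p<para>_r<segment>
SEGMENT_ID = re.compile(r"^s(\d+)_sh(\d+)_p(\d+)_r(\d+)$")

def parse_segment_id(seg_id):
    """Split a segment id into its slide / shape / paragraph / segment parts."""
    match = SEGMENT_ID.match(seg_id)
    if match is None:
        raise ValueError(f"unrecognized segment id: {seg_id!r}")
    slide, shape, paragraph, segment = (int(g) for g in match.groups())
    return {"slide": slide, "shape": shape, "paragraph": paragraph, "segment": segment}

parts = parse_segment_id("s1_sh6_p1_r0")
```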

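Step 2: Translate and merge

First provide an English translation for every id in segments.json, by hand or by calling an AI, and write the result as en_map.json. Then run the merge:

python $S/translate_pptx.py merge segments.json en_map.json

en_map.json format:

A plain string = replace the translation only
An object {"en": "...", "size": 30} = replace the translation + override the font size (for long titles that would overflow)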

{
  "s1_sh6_p0_r0": "Multi-Energy Distributed",
  "s1_sh6_p1_r0": { "en": "Green Power Station Solution", "size": 30 }
}

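Step 3: Apply to a copy

python $S/translate_pptx.py apply test.pptx segments.json test_EN.pptx --adjust adjust.json

Writes the translations into test_EN.pptx, a copy of test.pptx; the original test.pptx is unaffected. --adjust adjust.json is optional and fine-tunes line spacing and box size.

adjust.json format (indexed by shape identity):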

{
  "boxes": [
    { "slide": 1, "id_path": [6], "line_spacing": 0.95 },
    { "slide": 3, "id_path": [166], "line_spacing": 0.9 }
  ]
}

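Step 4: Render to PNG and inspect

python $S/render.py test_EN.pptx 1,3,7,12 renders_en

Converts the specified pages to PDF via LibreOffice, then rasterizes them to PNG images with PyMuPDF and writes them to the renders_en/ directory.

Step 5 (optional): Numeric overflow check

python $S/check_overflow.py test_EN.pptx 1 3 7 12

Runs a numeric estimate on fixed-size text boxes to judge whether the translation exceeds the box boundary.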
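
A numeric overflow estimate of this kind can be approximated with simple arithmetic. The heuristic below is a rough sketch and not the algorithm in check_overflow.py: the average character width (0.5 em) and the line height factor (1.2) are assumed constants chosen for Latin text in a font like Arial.

```python
import math

def estimate_height_pt(text, box_width_pt, font_size_pt,
                       line_spacing=1.0, avg_char_em=0.5, line_height_factor=1.2):
    """Estimate rendered text height in points for a fixed-width box."""
    # How many average-width characters fit on one line.
    chars_per_line = max(1, int(box_width_pt / (font_size_pt * avg_char_em)))
    lines = max(1, math.ceil(len(text) / chars_per_line))
    return lines * font_size_pt * line_height_factor * line_spacing

def overflows(text, box_width_pt, box_height_pt, font_size_pt, **kwargs):
    """True when the estimated text height exceeds the box height."""
    return estimate_height_pt(text, box_width_pt, font_size_pt, **kwargs) > box_height_pt

title = "Green Power Station Solution for Parks"  # 38 characters
tight = overflows(title, box_width_pt=200, box_height_pt=40, font_size_pt=20)
roomy = overflows(title, box_width_pt=200, box_height_pt=60, font_size_pt=20)
```

With a 200 pt wide box at 20 pt, about 20 characters fit per line, so the title wraps to two lines (roughly 48 pt tall): it overflows a 40 pt box and fits a 60 pt one. An estimate like this ignores kerning, word wrapping, and font substitution, which is why the rendered PNGs remain the primary check.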

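Iteration loop

If the PNGs show truncation or squished text → lower the size value in en_map.json, or adjust line_spacing in adjust.json → re-run merge → apply → render. Once satisfied, run the final replay of the same segments.json + adjust.json onto a copy of the original.

Script roles at a glance

| Script | Function |
| --- | --- |
| `probe.py` | Probe text box structure: run count, color, font size, autofit mode |
| `translate_pptx.py` | `extract` format-segments → `merge` translations + font sizes → `apply` write-back (unify fonts to Arial, adjust font size / box size) |
| `render.py` | LibreOffice → PDF → PyMuPDF → PNG rendering |
| `check_overflow.py` | Numeric overflow estimate for fixed-size boxes |

Notes

Kangxi-radical look-alikes: some decks use Unicode Kangxi radical characters (U+2F00+) that look exactly like ordinary Hanzi. Never hand-type the source text to match against it; always locate segments by id. probe / extract read the raw characters, so they are accurate.
autofit=SHAPE_TO_FIT_TEXT boxes grow: most text boxes are set to auto-fit, so a longer English translation makes the box expand and crowd its neighbors. Use a size override to control the line count / width.
Keep background media when rendering: cover and divider slides often put white text on a dark background image. Do not strip the background, or the white-on-white text cannot be reviewed.
Close the output file: if PowerPoint has the target file open, apply / render fail (PermissionError or soffice exit code 1). Close the file or use a different file name.
LibreOffice ≠ PowerPoint: font substitution and widths differ slightly. The render is good enough to judge overflow, but the final result should be verified by opening the deck in PowerPoint.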

License

MIT
