fix(tui): fix paste & autocomplete corruption with CJK (Chinese) input by yanyihan-xiaomi · Pull Request #1292 · XiaomiMiMo/MiMo-Code

yanyihan-xiaomi · 2026-06-24T09:03:13Z

Summary

Fixes input-corruption bugs in the TUI prompt that share one root cause: mixing the editor's display-width offsets with UTF-16 string indices. The editor (@opentui/core) tracks cursor/extmark offsets in display-width units (a wide CJK char = 2 columns, a newline = 1, a tab = 2), while the plainText we slice in JS is UTF-16 (a CJK char = 1 unit). Mixing them drifts positions to the right whenever a wide char, newline, or tab precedes the offset.
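Here is a minimal TypeScript sketch that makes the drift concrete. charWidth and widthBefore are hypothetical helpers, not code from this PR. The width rule is deliberately simplified: only CJK Unified Ideographs (U+4E00 to U+9FFF) count as wide, and newline and tab use the editor widths of 1 and 2.

```typescript
// Hypothetical, simplified editor width for one code point:
// newline = 1, tab = 2, CJK Unified Ideographs = 2, everything else = 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

// Display width of text[0, end), i.e. the editor-style offset of UTF-16 index `end`.
function widthBefore(text: string, end: number): number {
  let w = 0;
  for (const ch of text.slice(0, end)) w += charWidth(ch); // iterates by code point
  return w;
}

const plainText = "中文@src";
const atIndex = plainText.indexOf("@"); // UTF-16 index of the trigger: 2
const atOffset = widthBefore(plainText, atIndex); // editor offset of the trigger: 4

// Bug pattern: a display-width offset used as a UTF-16 index drifts right.
const wrongTerm = plainText.slice(atOffset + 1);
// Correct: slice with the UTF-16 index.
const rightTerm = plainText.slice(atIndex + 1);

console.log(wrongTerm, rightTerm); // c src
```

Each wide character before the offset pushes it one extra position to the right in the UTF-16 string. Newlines and tabs shift it too whenever the converter's width for them disagrees with the editor's.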

Symptoms

Paste: pasting multi-line content collapses it into a [Pasted ~N lines] placeholder that is expanded on submit. When CJK text preceded the placeholder, part of the placeholder prefix was left behind as residue and trailing content was swallowed. A placeholder on its own line after CJK (e.g. A\n[Pasted]\nB) expanded to A\n[CONTENT\nB, leaving a stray [ behind, because every preceding newline and tab was miscounted.
Autocomplete: trigger detection for @ (files), $ (agents), and / (commands) was misaligned when CJK text preceded the trigger, producing the wrong filter term, a misplaced mention placeholder, and a wrong decision about adding a trailing space.

Root cause detail

The display-width offset is not simply Bun.stringWidth(text): the editor counts a newline as width 1 and a tab as width 2, but Bun.stringWidth returns 0 for both. The converters therefore special-case "\n" (1) and "\t" (2) so they track the editor exactly (verified char-by-char against @opentui/core). Pasted "\r" never reaches the converters — paste input is normalized to "\n" and the editor itself maps "\r" to "\n".
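The two converters that the PR's offset.ts exports, widthToStringIndex and stringIndexToWidth, can be sketched as follows. This is an illustrative version, not the PR's code. According to the commit notes, the real charWidth builds on Bun.stringWidth and special-cases only "\n" and "\t". Here a small, incomplete table of wide Unicode ranges stands in for Bun.stringWidth so that the sketch runs outside Bun. Both converters walk the string by code point and assume their inputs fall on character boundaries.

```typescript
// Stand-in for Bun.stringWidth (assumption): a small, incomplete table of
// wide ranges covering common CJK, Hangul, fullwidth forms, and emoji.
const WIDE_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x1100, 0x115f], // Hangul Jamo
  [0x2e80, 0xa4cf], // CJK radicals through Yi, incl. CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul syllables
  [0xf900, 0xfaff], // CJK compatibility ideographs
  [0xff00, 0xff60], // fullwidth forms
  [0x1f300, 0x1faff], // most emoji (supplementary plane)
  [0x20000, 0x3fffd], // CJK extension planes
];

// Width of one code point as the editor counts it.
function charWidth(ch: string): number {
  if (ch === "\n") return 1; // editor counts newline as 1; Bun.stringWidth returns 0
  if (ch === "\t") return 2; // editor counts tab as 2; Bun.stringWidth returns 0
  const cp = ch.codePointAt(0) ?? 0;
  return WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi) ? 2 : 1;
}

// Display-width offset -> UTF-16 index. `width` must fall on a character boundary.
function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    // for...of yields whole code points, so surrogate pairs stay together.
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length; // 1 or 2 UTF-16 units
  }
  return i;
}

// UTF-16 index -> display-width offset. `index` must fall on a code-point boundary.
function stringIndexToWidth(text: string, index: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (i >= index) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return w;
}

// Every code-point boundary should survive index -> width -> index unchanged.
function roundTrips(text: string): boolean {
  let i = 0;
  for (const ch of [...text, ""]) {
    if (widthToStringIndex(text, stringIndexToWidth(text, i)) !== i) return false;
    i += ch.length;
  }
  return true;
}

const sample = "A\n中\tB";
console.log(stringIndexToWidth(sample, 4)); // 6 = A(1) + \n(1) + 中(2) + \t(2)
console.log(widthToStringIndex(sample, 6)); // 4
console.log(["中文abc", "a😀b", sample, "hello"].every(roundTrips)); // true
```

The roundTrips check mirrors the kind of round-trip test the PR describes (CJK, supplementary-plane emoji, newlines, tabs). For pure ASCII without tabs, both converters reduce to the identity.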

Changes

New offset.ts with the shared coordinate converters widthToStringIndex / stringIndexToWidth / charAfterCursor, reused by both fixes; an internal charWidth handles the newline and tab special cases.
New expandPlaceholders() in part.ts; submit() now uses it to expand placeholders in the correct coordinate system.
New pure detectTrigger() in autocomplete-detect.ts, wired into onInput, plus a coordinate fix for the character-after-cursor lookup in insertPart.

Tests

Followed TDD throughout — a failing test reproduced each bug first, then the fix made it pass.

offset.test.ts: both-direction conversion, round-trip (CJK, supplementary-plane emoji, newlines, tabs), and charAfterCursor.
autocomplete-detect.test.ts: every detectTrigger branch (leading /, @/$ ASCII, CJK before/after the trigger, whitespace in between, non-whitespace before).
prompt-part.test.ts: expandPlaceholders (single, CJK-preceded, multiple, and a placeholder on its own line after CJK).

bun test test/cli/cmd/tui/ → 27 passed; bun typecheck clean.

Regression risk

Low. The dominant ASCII single-line path is unchanged — the converters are identity for pure-ASCII input without newlines/tabs, and detectTrigger maps 1:1 to the original onInput logic. Verified line-by-line by an independent code review with no regressions found.

Extmark offsets are display-width based (a wide CJK char counts as 2 columns) while the editor plainText is a JS UTF-16 string (a CJK char is 1 unit). At submit time the placeholder was expanded with inputText.slice using the width-based offset against the UTF-16 string, so any CJK text before a paste over-counted the start index: the placeholder prefix was left as residue and trailing content got swallowed. Add expandPlaceholders() which converts width offsets to UTF-16 string indices before slicing, applied right-to-left so multiple placeholders stay valid.
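A sketch of how expandPlaceholders() can work. It assumes each placeholder is described by its extmark start and end (display-width offsets) plus the original pasted text. The Placeholder shape and the function signature are hypothetical, and charWidth is simplified so that only CJK Unified Ideographs count as wide. The "buggy" line reproduces the reported symptom: the placeholder prefix is left as residue and the tail is swallowed.

```typescript
// Hypothetical shape of a collapsed paste; offsets are display-width based.
interface Placeholder {
  start: number; // display-width offset of the placeholder's first column
  end: number; // display-width offset just past the placeholder
  content: string; // the original pasted text
}

// Simplified editor width (assumption): newline 1, tab 2, CJK Unified Ideographs 2, else 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

// Display-width offset (on a character boundary) -> UTF-16 index.
function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

function expandPlaceholders(text: string, placeholders: readonly Placeholder[]): string {
  // Right-to-left: expanding one placeholder never changes the text (and so
  // the width-to-index mapping) in front of the placeholders still to be expanded.
  const ordered = [...placeholders].sort((a, b) => b.start - a.start);
  let out = text;
  for (const p of ordered) {
    const from = widthToStringIndex(out, p.start);
    const to = widthToStringIndex(out, p.end);
    out = out.slice(0, from) + p.content + out.slice(to);
  }
  return out;
}

const input = "中文[Pasted ~2 lines]尾"; // placeholder spans columns 4..21
const paste: Placeholder = { start: 4, end: 21, content: "line1\nline2" };

// Buggy: width offsets used directly as UTF-16 indices.
const buggy = input.slice(0, paste.start) + paste.content + input.slice(paste.end);
console.log(JSON.stringify(buggy)); // "中文[Pline1\nline2" ("[P" residue, "尾" swallowed)

console.log(JSON.stringify(expandPlaceholders(input, [paste]))); // "中文line1\nline2尾"
```

Because this charWidth counts "\n" as 1, a placeholder on its own line after CJK (start offset 3 in "中\n[P]\nB") also expands cleanly. If "\n" were counted as 0, as Bun.stringWidth reports, the start index would land one unit too far right and leave the "[" behind.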

The @ / $ / slash autocomplete shared the same width-vs-UTF-16 coordinate bug as paste: onInput sliced plainText (UTF-16) with the display-width cursor offset and stored a UTF-16 trigger index into store.index, which is then consumed by width-based APIs (getTextRange, extmarks.create). When CJK text preceded a trigger (or followed it before the cursor), the filter term and mention placeholder were misaligned. Extract the width<->UTF-16 conversions into offset.ts (reused by part.ts), move the trigger detection into a pure detectTrigger() that works in width coordinates, and fix insertPart's value.at() to convert the cursor offset to a string index first.
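A possible shape for the pure detectTrigger(), working entirely in display-width coordinates: the cursor comes in as a width offset, and the trigger position goes out as one. The signature, the Trigger type, and the exact rules are assumptions modeled on the branches the tests list, not the PR's code. The assumed rules are: "/" fires only at the start of input with no whitespace, and "@" or "$" must be at the start or after whitespace, with no whitespace between it and the cursor. charWidth is simplified to CJK Unified Ideographs.

```typescript
type Trigger = { kind: "/" | "@" | "$"; index: number; term: string };

// Simplified editor width (assumption): newline 1, tab 2, CJK Unified Ideographs 2, else 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

function stringIndexToWidth(text: string, index: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (i >= index) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return w;
}

// Pure trigger detection: `cursorWidth` and the returned `index` are both
// display-width offsets, so the result can feed width-based editor APIs.
function detectTrigger(text: string, cursorWidth: number): Trigger | undefined {
  const cursor = widthToStringIndex(text, cursorWidth); // convert before slicing
  const before = text.slice(0, cursor);

  // "/" only triggers at the very start of the input, before any whitespace.
  if (before.startsWith("/") && !/\s/.test(before)) {
    return { kind: "/", index: 0, term: before.slice(1) };
  }

  // "@" (files) or "$" (agents): take whichever occurs last before the cursor.
  const at = Math.max(before.lastIndexOf("@"), before.lastIndexOf("$"));
  if (at === -1) return undefined;
  const term = before.slice(at + 1);
  if (/\s/.test(term)) return undefined; // whitespace between trigger and cursor
  if (at > 0 && !/\s/.test(before.charAt(at - 1))) return undefined; // e.g. "a@b"
  const kind: "@" | "$" = before.charAt(at) === "@" ? "@" : "$";
  return { kind, index: stringIndexToWidth(text, at), term };
}

const line = "中文 @src 尾";
// Cursor right after "src": 2 + 2 + 1 + 1 + 3 = 9 columns.
console.log(JSON.stringify(detectTrigger(line, 9))); // {"kind":"@","index":5,"term":"src"}
// Old pattern: slicing the UTF-16 string with the width offset also grabs " 尾",
// so the term "src 尾" contains whitespace and the trigger is lost.
console.log(JSON.stringify(line.slice(0, 9))); // "中文 @src 尾"
```

Returning the trigger position as a width offset, rather than a UTF-16 index, keeps store.index in the same coordinate system as getTextRange and extmarks.create.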

- Extract insertPart's needsSpace decision into charAfterCursor() in offset.ts and unit-test it (the value.at coordinate fix was untested).
- Add a $-trigger-preceded-by-CJK case to the detectTrigger tests.
- Add a supplementary-plane (emoji) round-trip case for the converters.
- Document that the converters assume inputs on code-point boundaries.
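A sketch of charAfterCursor() and the needsSpace decision it feeds. Both signatures are hypothetical, and so is the needsSpace rule: add a trailing space after the inserted part unless whitespace already follows the cursor. charWidth is again simplified to CJK Unified Ideographs. The function returns a full code point, so an emoji after the cursor is not split into a lone surrogate.

```typescript
// Simplified editor width (assumption): newline 1, tab 2, CJK Unified Ideographs 2, else 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0) ?? 0;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

// Code point right after a display-width cursor offset, or undefined at the end.
// Assumes the offset lies on a character boundary.
function charAfterCursor(text: string, cursorWidth: number): string | undefined {
  const i = widthToStringIndex(text, cursorWidth); // convert first, then index
  const cp = text.codePointAt(i); // undefined when i === text.length
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

// Hypothetical needsSpace rule: add a space unless whitespace already follows.
function needsSpace(text: string, cursorWidth: number): boolean {
  const next = charAfterCursor(text, cursorWidth);
  return next === undefined || !/\s/.test(next);
}

const draft = "中文 x";
// Cursor right after 文 is display-width offset 4.
console.log(JSON.stringify(charAfterCursor(draft, 4))); // " "
console.log(needsSpace(draft, 4)); // false: a space already follows
```

In this example the old value.at(4) reads past the end of the 4-unit string and returns undefined, so a second space would be added. Converting the offset first finds the existing space.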

The editor advances its display-width offset by 1 per newline and 2 per tab, but Bun.stringWidth returns 0 for both. The converters used Bun.stringWidth directly, so any newline or tab before a placeholder desynced the two coordinate systems, leaving a stray "[" of the placeholder behind and corrupting content (e.g. a multi-line paste on its own line after CJK text expanded to "...\n[CONTENT\n..." instead of "...\nCONTENT \n..."). Special-case "\n" (width 1) and "\t" (width 2) in charWidth, matching the editor exactly (verified char-by-char against @opentui/core). Pasted "\r" never reaches here — paste input is normalized to "\n" and the editor also maps "\r" to "\n".

yanyihan-xiaomi self-assigned this Jun 24, 2026

yanyihan-xiaomi added 3 commits June 24, 2026 18:13

yanyihan-xiaomi force-pushed the fix/paste-chinese-residue branch from f2a95a7 to f21f6df June 24, 2026 10:14

yanyihan-xiaomi merged commit 13dccc6 into main Jun 24, 2026
4 of 6 checks passed

yanyihan-xiaomi deleted the fix/paste-chinese-residue branch June 24, 2026 10:30

yanyihan-xiaomi merged 4 commits into main from fix/paste-chinese-residue
