fix(tui): fix paste & autocomplete corruption with CJK (Chinese) input #1292

Merged
yanyihan-xiaomi merged 4 commits into main from fix/paste-chinese-residue
Jun 24, 2026

yanyihan-xiaomi commented Jun 24, 2026

Collaborator

Summary

Fixes input-corruption bugs in the TUI prompt that share one root cause: mixing the editor's display-width offsets with UTF-16 string indices. The editor (@opentui/core) tracks cursor/extmark offsets in display-width units (a wide CJK char = 2 columns, a newline = 1, a tab = 2), while the plainText we slice in JS is UTF-16 (a CJK char = 1 unit). Mixing them drifts positions to the right whenever a wide char, newline, or tab precedes the offset.
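To make the drift concrete, here is a minimal sketch with hand-computed offsets. The sample text, the `CONTENT` string, and the offsets are illustrative assumptions rather than values captured from the editor; the only rule used is the one stated above (a CJK char is 2 columns but 1 UTF-16 unit).

```typescript
const placeholder = "[Pasted ~3 lines]";
const text = "中文" + placeholder + " ok";
const content = "CONTENT";

// Offsets as a display-width editor would report them (assumed for this sketch):
// two CJK chars occupy 4 columns, every ASCII char occupies 1.
const widthStart = 4;
const widthEnd = widthStart + placeholder.length;

// The same positions expressed as UTF-16 string indices: two CJK chars are 2 units.
const indexStart = 2;
const indexEnd = indexStart + placeholder.length;

// Buggy: width offsets used directly as string indices land too far to the right.
const buggy = text.slice(0, widthStart) + content + text.slice(widthEnd);

// Fixed: slice with string indices.
const fixed = text.slice(0, indexStart) + content + text.slice(indexEnd);

console.log(buggy); // 中文[PCONTENTk  (placeholder prefix left behind, " o" swallowed)
console.log(fixed); // 中文CONTENT ok
```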

Symptoms

  • Paste: pasting multi-line content collapses to a [Pasted ~N lines] placeholder that expands on submit. With CJK text before it, the placeholder prefix was left as residue and trailing content was swallowed. A placeholder sitting on its own line after CJK (e.g. A\n[Pasted]\nB) expanded to A\n[CONTENT\nB — a stray [ left behind — because each preceding newline/tab was mis-counted.

  • Autocomplete: the @ (files) / $ (agents) / / (commands) trigger detection misaligned when CJK preceded the trigger — wrong filter term, misplaced mention placeholder, and wrong trailing-space decision.
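The autocomplete symptom can be illustrated with a rough sketch of trigger detection that takes the cursor in width coordinates. This is not the actual `detectTrigger()` from autocomplete-detect.ts: the `Trigger` shape, the exact rules, and the simplified `charWidth` table are assumptions made for illustration.

```typescript
// Simplified width rule (assumption): newline = 1, tab = 2, common CJK = 2, everything else = 1.
function charWidth(cp: number): number {
  if (cp === 0x0a) return 1;
  if (cp === 0x09) return 2;
  return cp >= 0x2e80 && cp <= 0xa4cf ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let acc = 0;
  let i = 0;
  while (i < text.length && acc < width) {
    const cp = text.codePointAt(i)!;
    acc += charWidth(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return i;
}

function stringIndexToWidth(text: string, index: number): number {
  let width = 0;
  for (let i = 0; i < index && i < text.length; ) {
    const cp = text.codePointAt(i)!;
    width += charWidth(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return width;
}

// Hypothetical result shape: widthIndex is in display-width units, as width-based APIs expect.
type Trigger = { kind: "@" | "$" | "/"; widthIndex: number; filter: string };

function detectTrigger(text: string, cursorWidth: number): Trigger | undefined {
  // Convert the width-based cursor before slicing the UTF-16 string.
  const before = text.slice(0, widthToStringIndex(text, cursorWidth));

  // Slash commands: only at the very start, with no whitespace typed yet.
  if (before.charAt(0) === "/" && !/\s/.test(before)) {
    return { kind: "/", widthIndex: 0, filter: before.slice(1) };
  }

  const at = Math.max(before.lastIndexOf("@"), before.lastIndexOf("$"));
  if (at === -1) return undefined;

  const filter = before.slice(at + 1);
  if (/\s/.test(filter)) return undefined; // whitespace between trigger and cursor
  if (at > 0 && !/\s/.test(before.charAt(at - 1))) return undefined; // non-whitespace before

  const kind = before.charAt(at) as "@" | "$";
  return { kind, widthIndex: stringIndexToWidth(text, at), filter };
}

// Cursor sits right after "sr" (width 8). Slicing with 8 as a string index would
// pull in " 后" and reject the trigger because the filter contains whitespace.
console.log(detectTrigger("中文 @sr 后面", 8)); // { kind: "@", widthIndex: 5, filter: "sr" }
```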

Root cause detail

The display-width offset is not simply Bun.stringWidth(text): the editor counts a newline as width 1 and a tab as width 2, but Bun.stringWidth returns 0 for both. The converters therefore special-case "\n" (1) and "\t" (2) so they track the editor exactly (verified char-by-char against @opentui/core). Pasted "\r" never reaches the converters — paste input is normalized to "\n" and the editor itself maps "\r" to "\n".
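A minimal sketch of what such converters might look like follows. The function names match the ones listed under Changes, but the bodies are an assumption, and the wide-character table is a rough stand-in for `Bun.stringWidth` so the example runs outside Bun.

```typescript
// Width of one code point in editor columns.
// The range table is an approximation (assumption), not the editor's real table.
function charWidth(cp: number): number {
  if (cp === 0x0a) return 1; // newline: the editor counts 1
  if (cp === 0x09) return 2; // tab: the editor counts 2
  const wide =
    (cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) || // CJK radicals through Yi
    (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
    (cp >= 0xff00 && cp <= 0xff60) || // fullwidth forms
    (cp >= 0x1f300 && cp <= 0x1faff) || // most emoji
    (cp >= 0x20000 && cp <= 0x3fffd); // supplementary ideographs
  return wide ? 2 : 1;
}

// UTF-16 index -> display-width offset. Assumes index is on a code-point boundary.
function stringIndexToWidth(text: string, index: number): number {
  let width = 0;
  for (let i = 0; i < index && i < text.length; ) {
    const cp = text.codePointAt(i)!;
    width += charWidth(cp);
    i += cp > 0xffff ? 2 : 1; // supplementary-plane chars take 2 UTF-16 units
  }
  return width;
}

// Display-width offset -> UTF-16 index. Assumes width falls on a character boundary.
function widthToStringIndex(text: string, width: number): number {
  let acc = 0;
  let i = 0;
  while (i < text.length && acc < width) {
    const cp = text.codePointAt(i)!;
    acc += charWidth(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return i;
}

const sample = "中😀\n\tb";
console.log(stringIndexToWidth(sample, 5)); // 7  (2 + 2 + 1 + 2)
console.log(widthToStringIndex(sample, 7)); // 5
```

For pure-ASCII input without newlines or tabs both functions return their argument unchanged, which is why the common single-line path is unaffected.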

Changes

  • New offset.ts with the shared coordinate converters widthToStringIndex / stringIndexToWidth / charAfterCursor, reused by both fixes; an internal charWidth handles the newline/tab special cases.

  • expandPlaceholders() in part.ts, used by submit() to expand placeholders in the correct coordinate system.

  • Pure detectTrigger() in autocomplete-detect.ts, wired into onInput; and a coordinate fix for insertPart's cursor-char lookup.
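As an illustration of the placeholder expansion, the sketch below converts width offsets to string indices and applies the replacements right-to-left, as the commit message describes. The `PastePart` shape and the simplified `charWidth` rule are assumptions; the real `expandPlaceholders()` in part.ts may differ.

```typescript
// Simplified width rule (assumption): newline = 1, tab = 2, common CJK = 2, everything else = 1.
function charWidth(cp: number): number {
  if (cp === 0x0a) return 1;
  if (cp === 0x09) return 2;
  return cp >= 0x2e80 && cp <= 0xa4cf ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let acc = 0;
  let i = 0;
  while (i < text.length && acc < width) {
    const cp = text.codePointAt(i)!;
    acc += charWidth(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return i;
}

// Hypothetical part shape: start/end are display-width offsets of the placeholder.
type PastePart = { start: number; end: number; content: string };

function expandPlaceholders(text: string, parts: PastePart[]): string {
  // Process the right-most placeholder first so earlier offsets stay valid.
  const ordered = parts.slice().sort((a, b) => b.start - a.start);
  let out = text;
  for (const part of ordered) {
    const from = widthToStringIndex(text, part.start);
    const to = widthToStringIndex(text, part.end);
    out = out.slice(0, from) + part.content + out.slice(to);
  }
  return out;
}

// A placeholder on its own line after CJK text: "中文" (4 columns) + "\n" (1 column) = start 5.
const input = "中文\n[Pasted ~2 lines]\nB";
const expanded = expandPlaceholders(input, [{ start: 5, end: 22, content: "x\ny" }]);
console.log(JSON.stringify(expanded)); // "中文\nx\ny\nB"
```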

Tests

Followed TDD throughout — a failing test reproduced each bug first, then the fix made it pass.

  • offset.test.ts: both-direction conversion, round-trip (CJK, supplementary-plane emoji, newlines, tabs), and charAfterCursor.
  • autocomplete-detect.test.ts: every detectTrigger branch (leading /, @/$ ASCII, CJK before/after the trigger, whitespace in between, non-whitespace before).
  • prompt-part.test.ts: expandPlaceholders (single, CJK-preceded, multiple, and a placeholder on its own line after CJK).

bun test test/cli/cmd/tui/ reported 27 passed; bun typecheck was clean.

Regression risk

Low. The dominant ASCII single-line path is unchanged — the converters are identity for pure-ASCII input without newlines/tabs, and detectTrigger maps 1:1 to the original onInput logic. Verified line-by-line by an independent code review with no regressions found.

yanyihan-xiaomi self-assigned this Jun 24, 2026

Extmark offsets are display-width based (a wide CJK char counts as 2
columns) while the editor plainText is a JS UTF-16 string (a CJK char is
1 unit). At submit time the placeholder was expanded with inputText.slice
using the width-based offset against the UTF-16 string, so any CJK text
before a paste over-counted the start index: the placeholder prefix was
left as residue and trailing content got swallowed.

Add expandPlaceholders() which converts width offsets to UTF-16 string
indices before slicing, applied right-to-left so multiple placeholders
stay valid.

The @ / $ / slash autocomplete shared the same width-vs-UTF-16 coordinate
bug as paste: onInput sliced plainText (UTF-16) with the display-width
cursor offset and stored a UTF-16 trigger index into store.index, which is
then consumed by width-based APIs (getTextRange, extmarks.create). When CJK
text preceded a trigger (or followed it before the cursor), the filter term
and mention placeholder were misaligned.

Extract the width<->UTF-16 conversions into offset.ts (reused by part.ts),
move the trigger detection into a pure detectTrigger() that works in width
coordinates, and fix insertPart's value.at() to convert the cursor offset
to a string index first.

- extract insertPart's needsSpace decision into charAfterCursor() in
  offset.ts and unit-test it (the value.at coordinate fix was untested)
- add a $-trigger-preceded-by-CJK case to detectTrigger tests
- add a supplementary-plane (emoji) round-trip case to the converters
- document that the converters assume code-point-boundary inputs
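The cursor-char lookup mentioned in the first bullet could look roughly like the sketch below. The body of `charAfterCursor()` and the `needsSpace` rule shown here are assumptions for illustration; only the function name and the idea of converting the cursor offset before indexing come from the commit message.

```typescript
// Simplified width rule (assumption): newline = 1, tab = 2, common CJK = 2, everything else = 1.
function charWidth(cp: number): number {
  if (cp === 0x0a) return 1;
  if (cp === 0x09) return 2;
  return cp >= 0x2e80 && cp <= 0xa4cf ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let acc = 0;
  let i = 0;
  while (i < text.length && acc < width) {
    const cp = text.codePointAt(i)!;
    acc += charWidth(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return i;
}

// Returns the character right after a width-based cursor, or undefined at end of text.
function charAfterCursor(text: string, cursorWidth: number): string | undefined {
  const i = widthToStringIndex(text, cursorWidth);
  if (i >= text.length) return undefined;
  const cp = text.codePointAt(i)!;
  return text.slice(i, i + (cp > 0xffff ? 2 : 1));
}

// Hypothetical decision rule: add a trailing space unless one already follows the cursor.
function needsSpace(text: string, cursorWidth: number): boolean {
  return charAfterCursor(text, cursorWidth) !== " ";
}

const value = "中文 tail";
// The cursor is after "文", which is width offset 4 but string index 2.
console.log(value.charAt(4)); // "a"  (width offset misused as a string index)
console.log(JSON.stringify(charAfterCursor(value, 4))); // " "
console.log(needsSpace(value, 4)); // false
```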
yanyihan-xiaomi force-pushed the fix/paste-chinese-residue branch from f2a95a7 to f21f6df June 24, 2026 10:14

The editor advances its display-width offset by 1 per newline and 2 per
tab, but Bun.stringWidth returns 0 for both. The converters used
Bun.stringWidth directly, so any newline or tab before a placeholder
desynced the two coordinate systems, leaving a stray "[" of the
placeholder behind and corrupting content (e.g. a multi-line paste on its
own line after CJK text expanded to "...\n[CONTENT\n..." instead of
"...\nCONTENT \n...").

Special-case "\n" (width 1) and "\t" (width 2) in charWidth, matching the
editor exactly (verified char-by-char against @opentui/core). Pasted "\r"
never reaches here — paste input is normalized to "\n" and the editor
also maps "\r" to "\n".

yanyihan-xiaomi merged commit 13dccc6 into main Jun 24, 2026
4 of 6 checks passed
yanyihan-xiaomi deleted the fix/paste-chinese-residue branch June 24, 2026 10:30