fix(tui): fix paste & autocomplete corruption with CJK (Chinese) input #1292
Merged
Extmark offsets are display-width based (a wide CJK char counts as 2 columns) while the editor plainText is a JS UTF-16 string (a CJK char is 1 unit). At submit time the placeholder was expanded with inputText.slice using the width-based offset against the UTF-16 string, so any CJK text before a paste over-counted the start index: the placeholder prefix was left as residue and trailing content got swallowed. Add expandPlaceholders() which converts width offsets to UTF-16 string indices before slicing, applied right-to-left so multiple placeholders stay valid.
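A minimal sketch of the right-to-left expansion described above, assuming a hypothetical placeholder shape ({ start, end, content } with start/end in display-width units) and a simplified width rule in which only CJK Unified Ideographs count as wide. It is an illustration of the technique, not the actual part.ts code.

```typescript
// Hypothetical placeholder shape: start/end are display-width offsets, as extmarks report them.
interface Placeholder {
  start: number;
  end: number;
  content: string;
}

// Simplified width (assumption): CJK Unified Ideographs are 2 columns, "\n" is 1, "\t" is 2, others 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0)!;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

// Advance one code point at a time until the accumulated width reaches `width`.
function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

function expandPlaceholders(text: string, placeholders: Placeholder[]): string {
  // Convert every width range to UTF-16 indices against the original text.
  const ranges = placeholders.map((p) => ({
    s: widthToStringIndex(text, p.start),
    e: widthToStringIndex(text, p.end),
    content: p.content,
  }));
  // Splice from right to left so each splice leaves the indices of ranges to its left valid.
  ranges.sort((a, b) => b.s - a.s);
  let out = text;
  for (const r of ranges) out = out.slice(0, r.s) + r.content + out.slice(r.e);
  return out;
}
```

For example, with "中文[Pasted ~3 lines]尾" and a placeholder at widths 4..21, the converted range is 2..19, so the result is "中文" + content + "尾" with no "[" residue and the trailing "尾" preserved.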
The @ / $ / slash autocomplete shared the same width-vs-UTF-16 coordinate bug as paste: onInput sliced plainText (UTF-16) with the display-width cursor offset and stored a UTF-16 trigger index into store.index, which is then consumed by width-based APIs (getTextRange, extmarks.create). When CJK text preceded a trigger (or followed it before the cursor), the filter term and mention placeholder were misaligned. Extract the width<->UTF-16 conversions into offset.ts (reused by part.ts), move the trigger detection into a pure detectTrigger() that works in width coordinates, and fix insertPart's value.at() to convert the cursor offset to a string index first.
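A sketch of a pure trigger detector that takes the width-based cursor and reports the trigger position in width units. The matching rules here ("/" only at the very start, "@"/"$" at the start or after whitespace, no whitespace between trigger and cursor) are assumptions inferred from the test list in this PR; the Trigger shape and the simplified CJK-only width rule are hypothetical.

```typescript
// Hypothetical result shape: `index` is the trigger position in display-width units.
type Trigger = { kind: "/" | "@" | "$"; index: number; filter: string };

// Simplified width (assumption): CJK Unified Ideographs are 2 columns, "\n" is 1, "\t" is 2, others 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0)!;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

function stringIndexToWidth(text: string, index: number): number {
  let w = 0;
  for (const ch of text.slice(0, index)) w += charWidth(ch);
  return w;
}

function detectTrigger(text: string, cursorWidth: number): Trigger | undefined {
  // Convert the cursor once, then inspect the UTF-16 prefix before it.
  const before = text.slice(0, widthToStringIndex(text, cursorWidth));

  // "/" commands: only at the very start, with no whitespace typed yet.
  if (before.startsWith("/") && !/\s/.test(before)) {
    return { kind: "/", index: 0, filter: before.slice(1) };
  }

  // "@" / "$": scan back from the cursor; whitespace in between cancels the trigger.
  for (let i = before.length - 1; i >= 0; i--) {
    const ch = before.charAt(i);
    if (/\s/.test(ch)) return undefined;
    if (ch === "@" || ch === "$") {
      // The trigger must start the text or follow whitespace ("a@b" is not a mention).
      if (i > 0 && !/\s/.test(before.charAt(i - 1))) return undefined;
      // Report the position in width units, the coordinate system width-based editor APIs expect.
      return { kind: ch as "@" | "$", index: stringIndexToWidth(text, i), filter: before.slice(i + 1) };
    }
  }
  return undefined;
}
```

For "中文 @src" with the cursor at width 9, the "@" sits at UTF-16 index 3 but the sketch returns index 5, which is the value a width-based call such as getTextRange would need.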
- extract insertPart's needsSpace decision into charAfterCursor() in offset.ts and unit-test it (the value.at coordinate fix was untested)
- add a $-trigger-preceded-by-CJK case to detectTrigger tests
- add a supplementary-plane (emoji) round-trip case to the converters
- document that the converters assume code-point-boundary inputs
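The charAfterCursor() extraction can be sketched as follows. The function name follows the commit, but the body, the hypothetical needsSpace policy (skip the trailing space when whitespace already follows the cursor), and the simplified CJK-only width rule are assumptions. Returning a full code point rather than one UTF-16 unit keeps surrogate pairs such as emoji intact.

```typescript
// Simplified width (assumption): CJK Unified Ideographs are 2 columns, "\n" is 1, "\t" is 2, others 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0)!;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

// Whole code point right after a width-based cursor, or undefined at the end of the text.
function charAfterCursor(text: string, cursorWidth: number): string | undefined {
  const cp = text.codePointAt(widthToStringIndex(text, cursorWidth));
  return cp === undefined ? undefined : String.fromCodePoint(cp);
}

// Hypothetical policy: append a space after the inserted part unless whitespace already follows.
function needsSpace(text: string, cursorWidth: number): boolean {
  const next = charAfterCursor(text, cursorWidth);
  return next === undefined || !/\s/.test(next);
}

const text = "中@x 文"; // cursor after "x", at width 2 + 1 + 1 = 4
const buggy = text.charAt(4); // "文": the raw width offset used as a string index
const next = charAfterCursor(text, 4); // " "
```

The buggy lookup sees "文" and would add a second space; the converted lookup sees the existing " " and does not.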
The editor advances its display-width offset by 1 per newline and 2 per tab, but Bun.stringWidth returns 0 for both. The converters used Bun.stringWidth directly, so any newline or tab before a placeholder desynced the two coordinate systems, leaving a stray "[" of the placeholder behind and corrupting content (e.g. a multi-line paste on its own line after CJK text expanded to "...\n[CONTENT\n..." instead of "...\nCONTENT \n..."). Special-case "\n" (width 1) and "\t" (width 2) in charWidth, matching the editor exactly (verified char-by-char against @opentui/core). Pasted "\r" never reaches here — paste input is normalized to "\n" and the editor also maps "\r" to "\n".
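A small reproduction of the stray "[" under stated assumptions: two hypothetical width functions, one returning 0 for "\n" and "\t" (as this commit describes Bun.stringWidth doing) and one following the editor's newline = 1 / tab = 2 rule, fed through the same hypothetical converter. A simplified CJK range stands in for a real width table.

```typescript
// Simplified CJK check (assumption): only CJK Unified Ideographs are wide.
const isCjk = (cp: number): boolean => cp >= 0x4e00 && cp <= 0x9fff;

// Mimics a width function that counts "\n" and "\t" as 0.
function zeroWhitespaceWidth(ch: string): number {
  if (ch === "\n" || ch === "\t") return 0;
  return isCjk(ch.codePointAt(0)!) ? 2 : 1;
}

// Follows the editor rule stated in this commit: "\n" is 1, "\t" is 2.
function editorWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  return isCjk(ch.codePointAt(0)!) ? 2 : 1;
}

function widthToStringIndex(text: string, width: number, charWidth: (ch: string) => number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

// A placeholder on its own line after CJK; the editor puts "[" at width 2 + 2 + 1 = 5.
const text = "中文\n[Pasted ~2 lines]\nB";
const start = 5;

const badPrefix = text.slice(0, widthToStringIndex(text, start, zeroWhitespaceWidth)); // "中文\n[": stray "["
const goodPrefix = text.slice(0, widthToStringIndex(text, start, editorWidth)); // "中文\n"
```

With the zero-width newline the converter has to consume one more code point to reach width 5, so the start index lands one unit too far and the "[" stays in the prefix.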
Summary
Fixes input-corruption bugs in the TUI prompt that share one root cause: mixing the editor's display-width offsets with UTF-16 string indices. The editor (@opentui/core) tracks cursor/extmark offsets in display-width units (a wide CJK char = 2 columns, a newline = 1, a tab = 2), while the plainText we slice in JS is UTF-16 (a CJK char = 1 unit). Mixing them drifts positions to the right whenever a wide char, newline, or tab precedes the offset.
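To make the drift concrete, here is a sketch comparing a naive slice with the width offset against a converted slice. The widthToStringIndex helper is hypothetical, and the width rule is simplified (only CJK Unified Ideographs are wide; an assumption).

```typescript
// Simplified width (assumption): CJK Unified Ideographs are 2 columns, "\n" is 1, "\t" is 2, others 1.
function charWidth(ch: string): number {
  if (ch === "\n") return 1;
  if (ch === "\t") return 2;
  const cp = ch.codePointAt(0)!;
  return cp >= 0x4e00 && cp <= 0x9fff ? 2 : 1;
}

// Hypothetical converter: advance one code point at a time until the width target is reached.
function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

const text = "你好 world";
const cursor = 5; // editor cursor after "你好 ": 2 + 2 + 1 columns

const naive = text.slice(0, cursor); // "你好 wo": two extra units, one per wide char
const fixed = text.slice(0, widthToStringIndex(text, cursor)); // "你好 "
```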
Symptoms
Paste: pasting multi-line content collapses it into a [Pasted ~N lines] placeholder that expands on submit. With CJK text before it, the placeholder prefix was left as residue and trailing content was swallowed. A placeholder sitting on its own line after CJK (e.g. A\n[Pasted]\nB) expanded to A\n[CONTENT\nB, leaving a stray [ behind, because each preceding newline/tab was mis-counted.
Autocomplete: the @ (files) / $ (agents) / / (commands) trigger detection was misaligned when CJK preceded the trigger, producing a wrong filter term, a misplaced mention placeholder, and a wrong trailing-space decision.
Root cause detail
The display-width offset is not simply Bun.stringWidth(text): the editor counts a newline as width 1 and a tab as width 2, but Bun.stringWidth returns 0 for both. The converters therefore special-case "\n" (1) and "\t" (2) so they track the editor exactly (verified char-by-char against @opentui/core). Pasted "\r" never reaches the converters: paste input is normalized to "\n", and the editor itself maps "\r" to "\n".
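A sketch of the two converters and the round-trip property the tests describe. The wide-range table below is a rough approximation of East Asian Wide ranges plus common emoji blocks (an assumption; the PR builds on Bun.stringWidth instead), the newline/tab rules follow the paragraph above, and both functions assume code-point-boundary inputs.

```typescript
// Rough wide ranges (assumption; not Bun.stringWidth's exact table).
function isWide(cp: number): boolean {
  return (
    (cp >= 0x1100 && cp <= 0x115f) || // Hangul Jamo
    (cp >= 0x2e80 && cp <= 0xa4cf) || // CJK radicals through Yi
    (cp >= 0xac00 && cp <= 0xd7a3) || // Hangul syllables
    (cp >= 0xf900 && cp <= 0xfaff) || // CJK compatibility ideographs
    (cp >= 0xff00 && cp <= 0xff60) || // fullwidth forms
    (cp >= 0x1f300 && cp <= 0x1faff) || // emoji blocks (supplementary plane)
    (cp >= 0x20000 && cp <= 0x3fffd) // CJK extensions B and later
  );
}

function charWidth(ch: string): number {
  if (ch === "\n") return 1; // the editor counts a newline as 1
  if (ch === "\t") return 2; // and a tab as 2
  const cp = ch.codePointAt(0)!;
  if (cp < 0x20) return 0; // other control characters
  return isWide(cp) ? 2 : 1;
}

// Width offset -> UTF-16 index, walking whole code points.
function widthToStringIndex(text: string, width: number): number {
  let w = 0;
  let i = 0;
  for (const ch of text) {
    if (w >= width) break;
    w += charWidth(ch);
    i += ch.length;
  }
  return i;
}

// UTF-16 index -> width offset.
function stringIndexToWidth(text: string, index: number): number {
  let w = 0;
  for (const ch of text.slice(0, index)) w += charWidth(ch);
  return w;
}

const sample = "中a😀\tb\n文"; // CJK, ASCII, surrogate-pair emoji, tab, newline
```

Each code point advances both counters together, so every code-point boundary in the sample maps back to itself. In this sketch the round trip only holds when no zero-width code points are involved.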
Changes
- New offset.ts with the shared coordinate converters widthToStringIndex / stringIndexToWidth / charAfterCursor, reused by both fixes; an internal charWidth handles the newline/tab special cases.
- expandPlaceholders() in part.ts, used by submit() to expand placeholders in the correct coordinate system.
- Pure detectTrigger() in autocomplete-detect.ts, wired into onInput; and a coordinate fix for insertPart's cursor-char lookup.
Tests
Followed TDD throughout — a failing test reproduced each bug first, then the fix made it pass.
- offset.test.ts: both-direction conversion, round-trip (CJK, supplementary-plane emoji, newlines, tabs), and charAfterCursor.
- autocomplete-detect.test.ts: every detectTrigger branch (leading /, @ / $ ASCII, CJK before/after the trigger, whitespace in between, non-whitespace before).
- prompt-part.test.ts: expandPlaceholders (single, CJK-preceded, multiple, and a placeholder on its own line after CJK).
- bun test test/cli/cmd/tui/ → 27 passed; bun typecheck clean.
Regression risk
Low. The dominant ASCII single-line path is unchanged: the converters are the identity for pure-ASCII input without newlines/tabs, and detectTrigger maps 1:1 to the original onInput logic. An independent line-by-line code review found no regressions.