Summary
TiktokenTokenizer::decode in the smg gateway panics with "no entry found for key" when it decodes a Kimi-K2.5 token id that lies within the model's vocab range but is not registered by load_from_path. The panic crashes a tokio worker thread and breaks the in-flight /v1/chat/completions connection. It reproduces reliably in long sweeps (BFCL non_live: 16 panics over 1390 requests against nvidia/Kimi-K2.5-NVFP4).
This is an smg-side tokenizer wiring bug: the engine emits a valid id within the model's declared vocab range (vocab_size = 163840), and smg's decoder is what fails to handle it.
Symptom
```
[smg] thread 'tokio-rt-worker' (1724695) panicked at
/root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tiktoken-rs-0.9.1/src/patched_tiktoken.rs:82:64:
no entry found for key
```
Many lossy-UTF-8 warnings precede the panic (normal handling of multi-byte boundaries); then one specific token id hits the unconditional index access.
Root cause
tiktoken-rs-0.9.1/src/patched_tiktoken.rs (_decode_native_and_split):
```rust
let token_bytes = self
    .decoder
    .get(&token)
    .unwrap_or_else(|| &self.special_tokens_decoder[&token]); // panics if missing
```
If token is in neither decoder nor special_tokens_decoder, the [&token] index panics. The crate exposes no safe variant.
smg's lossy-decode fallback in crates/tokenizer/src/tiktoken.rs:473-494 (Decoder for TiktokenTokenizer::decode) is the only caller exposed to this path.
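For comparison, here is a minimal Python sketch of a non-panicking version of the same lookup, in the parity style of the reproducer. The function name decode_native_and_split_safe and the policy of substituting a caller-supplied placeholder (empty bytes by default) for unknown ids are assumptions for illustration, not an existing tiktoken-rs or smg API. In Rust the equivalent shape would be decoder.get(&token).or_else(|| special_tokens_decoder.get(&token)) followed by handling None, rather than indexing with [&token].

```python
from typing import Iterable, Iterator, Mapping, Optional


def decode_native_and_split_safe(
    token_ids: Iterable[int],
    decoder: Mapping[int, bytes],
    special_decoder: Mapping[int, bytes],
    unknown: Optional[bytes] = b"",  # hypothetical policy: placeholder for unmapped ids
) -> Iterator[bytes]:
    """Yield the bytes for each id without ever indexing a map blindly."""
    for t in token_ids:
        # Same lookup order as patched_tiktoken.rs: BPE decoder first, then special tokens.
        token_bytes = decoder.get(t)
        if token_bytes is None:
            token_bytes = special_decoder.get(t)
        if token_bytes is None:
            if unknown is None:
                # Strict mode: report the bad id as an error the caller can handle.
                raise KeyError(t)
            # Lenient mode: substitute the placeholder instead of crashing.
            token_bytes = unknown
        yield token_bytes
```

Returning an error (strict mode) lets the caller fail one request cleanly; substituting a placeholder keeps the stream alive. Either avoids taking down a worker.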
Kimi K2.5 vocab gap
TiktokenTokenizer::load_from_path (crates/tokenizer/src/tiktoken.rs:217-251) builds its token maps (the BPE encoder and special_tokens_encoder) from only two sources:
| Source | IDs covered |
|---|---|
| tiktoken.model BPE encoder | 163584 entries (ids 0..163583) |
| tokenizer_config.json added_tokens_decoder | 23 entries scattered in 163584..163839 |
That leaves 233 ids unmapped: {163589, 163592, 163600, 163608..163837}. These are reserved/extra-id slots in Kimi-K2.5's declared 163840-token vocab (163840 - 163584 = 256 slots above the BPE range, of which only 23 are registered). They are rare in normal sampling but not impossible; over a long sweep they show up and crash a worker.
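The gap can be listed directly from the two sources. This sketch (unmapped_ids and to_ranges are hypothetical helper names) collects every id in [0, vocab_size) that neither source covers and collapses the result into inclusive runs. With ranks and atd loaded as in the reproducer, to_ranges(unmapped_ids(ranks, atd, 163840)) should yield the set above, assuming the model files match the ones this issue was filed against.

```python
from typing import Iterable, List, Tuple


def unmapped_ids(bpe_ids: Iterable[int], added_ids: Iterable[int], vocab_size: int) -> List[int]:
    """Ids in [0, vocab_size) that neither source maps to bytes."""
    covered = set(bpe_ids) | set(added_ids)
    return [i for i in range(vocab_size) if i not in covered]


def to_ranges(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse ids into sorted, inclusive (start, end) runs."""
    runs: List[List[int]] = []
    for i in sorted(ids):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i  # extend the current run
        else:
            runs.append([i, i])  # start a new run
    return [(a, b) for a, b in runs]
```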
Reproducer (Python parity of the panicking lookup)
This loads the same two sources smg loads and runs the same fallback logic as _decode_native_and_split:
```python
import base64
import json

# tiktoken.model: one "<base64 token bytes> <rank>" pair per line (BPE decoder)
ranks = {}
with open('/path/to/Kimi-K2.5-NVFP4/tiktoken.model') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        b64, r = line.split()
        ranks[int(r)] = base64.b64decode(b64)

# tokenizer_config.json added_tokens_decoder (special-token decoder)
with open('/path/to/Kimi-K2.5-NVFP4/tokenizer_config.json') as f:
    cfg = json.load(f)
atd = {int(k): v['content'].encode('utf-8') for k, v in cfg['added_tokens_decoder'].items()}

def decode_native_and_split(token_ids):
    for t in token_ids:
        if t in ranks:
            yield ranks[t]
        else:
            yield atd[t]  # KeyError ≡ Rust panic at patched_tiktoken.rs:82:64

# Each lookup raises on its own; the loop checks both ids in a single run.
for tid in (163600, 163700):
    try:
        next(decode_native_and_split([tid]))
    except KeyError as e:
        print(f'KeyError: {e}')  # prints KeyError: 163600, then KeyError: 163700
```
Both ids are within the model's declared vocab_size (163840) and are valid model output ids.
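One possible load-side mitigation is to register every id in [0, vocab_size) that neither source covers under a placeholder, so the lookup always succeeds. Below is a minimal Python sketch in the same parity style; pad_reserved_ids and the <|reserved_token_{id}|> naming are hypothetical, not names used by smg or by the Kimi-K2.5 tokenizer files.

```python
from typing import Dict, Mapping


def pad_reserved_ids(
    bpe: Mapping[int, bytes],
    added: Mapping[int, bytes],
    vocab_size: int,
    fmt: str = "<|reserved_token_{}|>",  # hypothetical placeholder naming
) -> Dict[int, bytes]:
    """Return an id -> bytes map covering every id in [0, vocab_size)."""
    merged: Dict[int, bytes] = dict(bpe)
    merged.update(added)
    for tid in range(vocab_size):
        if tid not in merged:
            # Synthetic entry for a reserved slot neither source registers.
            merged[tid] = fmt.format(tid).encode("utf-8")
    return merged
```

With the reproducer's maps, pad_reserved_ids(ranks, atd, 163840)[163600] would be b'<|reserved_token_163600|>' instead of a KeyError. A visible placeholder keeps reserved ids observable in output; decoding them to empty bytes is the other reasonable policy.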
Environment
- smg 1.4.2.dev15 (from lightseek.org/whl/cu130/)
- smg-grpc-servicer 0.5.3.dev15, smg-grpc-proto 0.4.8.dev15
- Model: nvidia/Kimi-K2.5-NVFP4
- Workload: BFCL non_live, 1390 prompts, --num-threads 64
- Frequency: 16 panics across 1390 requests; 3 surfaced to the client after BFCL/tenacity retries
Notes
- This is independent of the backend: any backend that ever emits one of the 233 reserved ids will trip the same panic.
- The lossy-UTF-8 fallback itself is fine; the panic is in _decode_native_and_split, which the fallback calls before the warning is logged.
- The panic takes down only one tokio worker; the smg process keeps running, but the affected client request fails with "Server disconnected without sending a response."