tiktoken decode panics on unmapped Kimi K2.5 reserved token IDs #1475

@key4ng

Summary

TiktokenTokenizer::decode in the smg gateway panics with "no entry found for key" when it has to decode a Kimi-K2.5 token id that lies within the model's declared vocab but is not registered by load_from_path. The panic crashes a tokio worker thread and breaks the in-flight /v1/chat/completions connection. It reproduces reliably in long sweeps (BFCL non_live: 16 panics over 1390 requests against nvidia/Kimi-K2.5-NVFP4).

This is a smg-side tokenizer wiring bug — the engine emits a valid id within the model's declared vocab range (vocab_size = 163840), and smg's decoder is the one that fails to handle it.

Symptom

[smg] thread 'tokio-rt-worker' (1724695) panicked at
  /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tiktoken-rs-0.9.1/src/patched_tiktoken.rs:82:64:
  no entry found for key

Many lossy-UTF-8 warnings precede the panic (normal multi-byte boundary handling); then one specific token id hits the unchecked index access.

Root cause

tiktoken-rs-0.9.1/src/patched_tiktoken.rs (_decode_native_and_split):

let token_bytes = self
    .decoder
    .get(&token)
    .unwrap_or_else(|| &self.special_tokens_decoder[&token]);  // panics if missing

If token is in neither decoder nor special_tokens_decoder, the [&token] index panics. The crate exposes no safe variant.

smg's lossy-decode fallback in crates/tokenizer/src/tiktoken.rs:473-494 (Decoder for TiktokenTokenizer::decode) is the only caller exposed to this path.
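
For comparison, here is a minimal sketch of a non-panicking version of the same two-table lookup, written in Python to match the reproducer further down. The function name decode_native_checked and the placeholder policy are hypothetical; this is not an API of tiktoken-rs or smg, only an illustration of checking both tables before indexing.

```python
from typing import Dict, Iterable, Iterator, Optional


def decode_native_checked(
    token_ids: Iterable[int],
    decoder: Dict[int, bytes],
    special_tokens_decoder: Dict[int, bytes],
    placeholder: Optional[bytes] = None,
) -> Iterator[bytes]:
    """Yield the bytes for each token id without raising on unmapped ids.

    Hypothetical policy: an id found in neither table is skipped when
    placeholder is None, otherwise the placeholder bytes are emitted.
    """
    for t in token_ids:
        token_bytes = decoder.get(t)
        if token_bytes is None:
            token_bytes = special_tokens_decoder.get(t)
        if token_bytes is None:
            if placeholder is None:
                continue  # unmapped id: drop it instead of crashing
            token_bytes = placeholder
        yield token_bytes


# Toy tables (not the real Kimi vocab): id 99 is in neither of them.
decoder = {0: b"he", 1: b"llo"}
special = {10: b"<|eos|>"}
print(b"".join(decode_native_checked([0, 1, 99, 10], decoder, special)))
# prints b'hello<|eos|>'
```

Whether an unmapped id should be dropped, replaced, or reported as an error is a design decision this sketch leaves open.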

Kimi K2.5 vocab gap

TiktokenTokenizer::load_from_path (crates/tokenizer/src/tiktoken.rs:217-251) builds special_tokens_encoder from only two sources:

Source | IDs covered
tiktoken.model BPE encoder | 163584 entries (ids 0..163583)
tokenizer_config.json added_tokens_decoder | 23 entries scattered in 163584..163839

That leaves 233 ids unmapped in {163589, 163592, 163600, 163608..163837}. These are reserved/extra-id slots in Kimi-K2.5's declared 163840-token vocab. They're rare in normal sampling but not impossible — over a long sweep they show up and crash a worker.
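
The arithmetic above can be checked with a short sketch. The id sets below are synthetic stand-ins built from the numbers quoted in this issue, not read from the model files; against a real checkout, the mapped ids would presumably be the keys loaded from tiktoken.model and added_tokens_decoder. The helper names unmapped_ids and to_ranges are hypothetical.

```python
from typing import Iterable, List, Tuple


def unmapped_ids(vocab_size: int, mapped: Iterable[int]) -> List[int]:
    """Return every id in [0, vocab_size) that has no decoder entry."""
    known = set(mapped)
    return [i for i in range(vocab_size) if i not in known]


def to_ranges(ids: List[int]) -> List[Tuple[int, int]]:
    """Collapse a sorted id list into inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for i in ids:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current run
        else:
            ranges.append((i, i))  # start a new run
    return ranges


# Synthetic stand-in for the two sources, using the figures from this issue.
VOCAB_SIZE = 163840
bpe_ids = range(163584)  # ids 0..163583
gap = {163589, 163592, 163600, *range(163608, 163838)}
added_ids = [i for i in range(163584, VOCAB_SIZE) if i not in gap]

missing = unmapped_ids(VOCAB_SIZE, list(bpe_ids) + added_ids)
print(len(added_ids), len(missing))  # prints 23 233
print(to_ranges(missing))
```

The 256 ids above the BPE range minus the 23 added tokens gives the 233 unmapped ids, which collapse into three single ids and one run, 163608..163837.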

Reproducer (Python parity of the panicking lookup)

Loads the same two sources smg loads, runs the same fallback _decode_native_and_split logic:

import json, base64
ranks = {}
with open('/path/to/Kimi-K2.5-NVFP4/tiktoken.model') as f:
    for line in f:
        line = line.strip()
        if not line: continue
        b64, r = line.split()
        ranks[int(r)] = base64.b64decode(b64)

with open('/path/to/Kimi-K2.5-NVFP4/tokenizer_config.json') as f:
    cfg = json.load(f)
atd = {int(k): v['content'].encode('utf-8') for k, v in cfg['added_tokens_decoder'].items()}

def decode_native_and_split(token_ids):
    for t in token_ids:
        if t in ranks:
            yield ranks[t]
        else:
            yield atd[t]   # KeyError ≡ Rust panic at patched_tiktoken.rs:82:64

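# Catch each KeyError so both ids are exercised; uncaught, the first one aborts the script.
for tid in (163600, 163700):
    try:
        next(decode_native_and_split([tid]))
    except KeyError as e:
        print(f"KeyError: {e}")  # KeyError: 163600, then KeyError: 163700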

Both ids are within the model's declared vocab_size (163840) and are valid model output ids.

Environment

  • smg 1.4.2.dev15 (from lightseek.org/whl/cu130/)
  • smg-grpc-servicer 0.5.3.dev15, smg-grpc-proto 0.4.8.dev15
  • Model: nvidia/Kimi-K2.5-NVFP4
  • Workload: BFCL non_live, 1390 prompts, --num-threads 64
  • Frequency: 16 panics across 1390 requests; 3 surfaced to the client after BFCL/tenacity retries

Notes

  • This is independent of the backend: any backend that ever emits one of the 233 reserved ids will trip the same panic.
  • The lossy-UTF-8 fallback is fine; the panic is in _decode_native_and_split, called from the fallback before the warn is logged.
  • The panic takes down only one tokio worker; the smg process keeps running, but the affected client request fails with "Server disconnected without sending a response".
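
Since any backend can emit a reserved id, one possible mitigation is to give every id in the reserved range a decoder entry at load time. The sketch below illustrates that idea in Python, matching the reproducer's language. The function fill_reserved_slots and the placeholder template are hypothetical, and whether smg should render a placeholder or drop such tokens is not decided in this issue.

```python
from typing import Dict


def fill_reserved_slots(
    special_tokens: Dict[int, bytes],
    bpe_size: int,
    vocab_size: int,
    template: str = "<|reserved_token_{}|>",  # hypothetical placeholder name
) -> Dict[int, bytes]:
    """Return a copy in which every id in [bpe_size, vocab_size) is mapped.

    Existing entries are kept; only missing ids receive a placeholder.
    """
    filled = dict(special_tokens)
    for token_id in range(bpe_size, vocab_size):
        filled.setdefault(token_id, template.format(token_id).encode("utf-8"))
    return filled


# Toy example: 4 BPE ids, an 8-id vocab, and one real special token at id 4.
filled = fill_reserved_slots({4: b"<|eos|>"}, bpe_size=4, vocab_size=8)
print(sorted(filled))  # prints [4, 5, 6, 7]
print(filled[5])       # prints b'<|reserved_token_5|>'
```

With the real figures this would be called with bpe_size=163584 and vocab_size=163840, so the two-table lookup can no longer miss for an id inside the declared vocab.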
