tiktoken decode panics on unmapped Kimi K2.5 reserved token IDs

## Summary

`TiktokenTokenizer::decode` in the smg gateway panics with `no entry found for key` when it decodes a Kimi-K2.5 token id that lies inside the model's vocab range but was never registered by `load_from_path`. The panic kills a tokio worker thread and drops the in-flight `/v1/chat/completions` connection. It reproduces reliably in long sweeps: a BFCL non_live run against `nvidia/Kimi-K2.5-NVFP4` hit 16 panics over 1390 requests.

This is a tokenizer wiring bug on the smg side: the engine emits a valid id within the model's declared vocab range (`vocab_size = 163840`), and smg's decoder is the component that fails to handle it.

## Symptom

```
[smg] thread 'tokio-rt-worker' (1724695) panicked at
  /root/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tiktoken-rs-0.9.1/src/patched_tiktoken.rs:82:64:
  no entry found for key
```

Many lossy-UTF-8 warnings precede the panic (normal multi-byte boundary handling); then one specific token id triggers the unchecked index access.

## Root cause

`tiktoken-rs-0.9.1/src/patched_tiktoken.rs` (`_decode_native_and_split`):

```rust
let token_bytes = self
    .decoder
    .get(&token)
    .unwrap_or_else(|| &self.special_tokens_decoder[&token]);  // panics if missing
```

If `token` is in neither `decoder` nor `special_tokens_decoder`, the `[&token]` index panics. The crate exposes no safe variant.

smg's lossy-decode fallback in `crates/tokenizer/src/tiktoken.rs:473-494` (`Decoder for TiktokenTokenizer::decode`) is the only caller exposed to this path.
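
One way to avoid the panicking index is to treat a miss in both tables as an ordinary outcome. The sketch below shows that control flow in Python, in the same parity style as the reproducer further down. `lookup_token_bytes` and the toy tables are hypothetical names for illustration; they are not part of tiktoken-rs or smg.

```python
from typing import Mapping, Optional


def lookup_token_bytes(
    token: int,
    decoder: Mapping[int, bytes],
    special_tokens_decoder: Mapping[int, bytes],
) -> Optional[bytes]:
    """Return the bytes for `token`, or None when neither table has it.

    Hypothetical helper: it keeps the two-table lookup order of the Rust
    snippet above, but reports a miss in both tables instead of raising.
    """
    token_bytes = decoder.get(token)
    if token_bytes is None:
        token_bytes = special_tokens_decoder.get(token)
    return token_bytes


# Toy tables for illustration only, not the real Kimi-K2.5 vocab.
decoder = {0: b"he", 1: b"llo"}
special_tokens_decoder = {5: b"<|eos|>"}

print(lookup_token_bytes(1, decoder, special_tokens_decoder))  # b'llo'
print(lookup_token_bytes(5, decoder, special_tokens_decoder))  # b'<|eos|>'
print(lookup_token_bytes(7, decoder, special_tokens_decoder))  # None
```

In Rust the corresponding shape would be an `Option` or `Result` that the caller can turn into a decode error or a replacement character; which of those is preferable is left open here.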

## Kimi K2.5 vocab gap

`TiktokenTokenizer::load_from_path` (`crates/tokenizer/src/tiktoken.rs:217-251`) builds the BPE encoder and `special_tokens_encoder` from only two sources:

| Source | IDs covered |
|---|---|
| `tiktoken.model` BPE encoder | 163584 entries (ids `0..163583`) |
| `tokenizer_config.json` `added_tokens_decoder` | 23 entries scattered in `163584..163839` |

The range `163584..163839` holds 256 ids, of which only 23 are mapped. That leaves **233 ids unmapped** in `{163589, 163592, 163600, 163608..163837}` (all ranges here are inclusive). These are reserved/extra-id slots in Kimi-K2.5's declared 163840-token vocab. The model rarely samples them, but over a long sweep they do show up and crash a worker.
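
The counts above can be cross-checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted in this issue; the constant names are made up for the example.

```python
VOCAB_SIZE = 163840    # declared vocab_size
BPE_ENTRIES = 163584   # ids 0..163583 from tiktoken.model
ADDED_TOKENS = 23      # entries from added_tokens_decoder

# The unmapped set as listed above; range() excludes its end, hence 163838.
unmapped = {163589, 163592, 163600} | set(range(163608, 163838))

special_range = VOCAB_SIZE - BPE_ENTRIES   # ids 163584..163839
print(special_range)                       # 256
print(special_range - ADDED_TOKENS)        # 233
print(len(unmapped))                       # 233
print(BPE_ENTRIES + ADDED_TOKENS + len(unmapped) == VOCAB_SIZE)  # True
```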

## Reproducer (Python parity of the panicking lookup)

The script loads the same two sources smg loads and mirrors the fallback lookup in `_decode_native_and_split`:

```python
import json, base64
ranks = {}
with open('/path/to/Kimi-K2.5-NVFP4/tiktoken.model') as f:
    for line in f:
        line = line.strip()
        if not line: continue
        b64, r = line.split()
        ranks[int(r)] = base64.b64decode(b64)

with open('/path/to/Kimi-K2.5-NVFP4/tokenizer_config.json') as f:
    cfg = json.load(f)
atd = {int(k): v['content'].encode('utf-8') for k, v in cfg['added_tokens_decoder'].items()}

def decode_native_and_split(token_ids):
    for t in token_ids:
        if t in ranks:
            yield ranks[t]
        else:
            yield atd[t]   # KeyError ≡ Rust panic at patched_tiktoken.rs:82:64

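# Try each id separately: the first uncaught KeyError would stop the script.
for tid in (163600, 163700):
    try:
        next(decode_native_and_split([tid]))
    except KeyError as e:
        print(f'KeyError: {e}')  # KeyError: 163600, then KeyError: 163700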
```

Both ids are within the model's declared `vocab_size` (163840) and are valid model output ids.
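
To list every affected id rather than probing two of them, the same tables can be scanned against the declared vocab size. The helpers below are a hypothetical sketch (`unmapped_ids` and `to_ranges` are names invented here) and run on toy data so the block is self-contained.

```python
def unmapped_ids(mapped_ids, vocab_size):
    """Return every id in [0, vocab_size) that is missing from `mapped_ids`."""
    mapped = set(mapped_ids)
    return [t for t in range(vocab_size) if t not in mapped]


def to_ranges(sorted_ids):
    """Collapse sorted ids into inclusive (start, end) runs."""
    runs = []
    for t in sorted_ids:
        if runs and t == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], t)   # extend the current run
        else:
            runs.append((t, t))           # start a new run
    return runs


# Toy stand-ins for `ranks` and `atd` from the reproducer.
toy_ranks = {i: b"x" for i in range(8)}   # ids 0..7
toy_atd = {8: b"<a>", 11: b"<b>"}         # ids 8 and 11
gaps = unmapped_ids(set(toy_ranks) | set(toy_atd), vocab_size=12)
print(gaps)             # [9, 10]
print(to_ranges(gaps))  # [(9, 10)]
```

With the real files, passing `set(ranks) | set(atd)` and `vocab_size=163840` should yield the unmapped set described in the vocab gap section.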

## Environment

- smg `1.4.2.dev15` (from `lightseek.org/whl/cu130/`)
- smg-grpc-servicer `0.5.3.dev15`, smg-grpc-proto `0.4.8.dev15`
- Model: `nvidia/Kimi-K2.5-NVFP4`
- Workload: BFCL non_live, 1390 prompts, `--num-threads 64`
- Frequency: 16 panics across 1390 requests; 3 surfaced to the client after BFCL/tenacity retries

## Notes

- This is independent of the backend: any backend that ever emits one of the 233 reserved ids will trip the same panic.
- The lossy-UTF-8 fallback is fine; the panic is in `_decode_native_and_split`, called from the fallback before the warn is logged.
- The panic takes down only one tokio worker; the smg process keeps running, but the affected client request fails with `Server disconnected without sending a response`.
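
Because the failure does not depend on the backend, one possible mitigation on the smg side is to make sure every id below `vocab_size` decodes to something before any request arrives. The following is a minimal sketch in the same Python-parity style as the reproducer, assuming it is acceptable to give unmapped slots a placeholder string. `fill_reserved_slots` and the `<|reserved_token_N|>` template are assumptions for illustration, not existing smg behavior.

```python
def fill_reserved_slots(ranks, atd, vocab_size, template="<|reserved_token_{}|>"):
    """Return a copy of `atd` with a placeholder for every unmapped id.

    Hypothetical helper; the placeholder template is an assumption.
    """
    filled = dict(atd)
    for t in range(vocab_size):
        if t not in ranks and t not in filled:
            filled[t] = template.format(t).encode("utf-8")
    return filled


# Toy tables, not the real vocab: ids 0..3 are BPE, id 4 is a named special
# token, and ids 5..7 are unmapped reserved slots.
toy_ranks = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
toy_atd = {4: b"<|eos|>"}
filled = fill_reserved_slots(toy_ranks, toy_atd, vocab_size=8)

print(sorted(filled))  # [4, 5, 6, 7]
print(filled[6])       # b'<|reserved_token_6|>'
```

After such a fill, the two-table lookup has an entry for every id in range, so the missing-key path is never reached for in-vocab ids.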

