I noticed that `examples/classification.py` maps labels to hardcoded token IDs:

```
0 -> 1294  ("No")
1 -> 3553  ("Yes")
```
While this works with the current tokenizer, it makes the example brittle if:
- the tokenizer vocabulary changes,
- a different Gemma variant is used, or
- the example is adapted to another model.
Additionally, the file contains minor typos:
- "grammaticaly" -> "grammatically"
- "respectivelly" -> "respectively"
Proposed Solution
- Compute the token IDs for "Yes" and "No" dynamically using the tokenizer instance.
- Add a small safety check to ensure that these labels map to a single token.
- Fix the spelling errors in the prompt template and comments.
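The dynamic lookup and safety check could look roughly like the sketch below. It assumes a SentencePiece-style tokenizer exposing an `encode(text)` method that returns a list of token IDs; the exact signature of the real Gemma tokenizer may differ, and the stand-in tokenizer here exists only to make the snippet self-contained.

```python
def single_token_id(tokenizer, label):
    """Return the token ID for `label`, verifying it encodes to one token."""
    ids = tokenizer.encode(label)
    if len(ids) != 1:
        raise ValueError(
            f"Label {label!r} encodes to {len(ids)} tokens {ids}; "
            "expected exactly one"
        )
    return ids[0]


# Minimal stand-in tokenizer for illustration only; the real example
# would use the tokenizer instance already constructed in the script.
class _FakeTokenizer:
    _vocab = {"No": 1294, "Yes": 3553}

    def encode(self, text):
        return [self._vocab[w] for w in text.split()]


tok = _FakeTokenizer()
no_id = single_token_id(tok, "No")    # 1294 with this stand-in vocab
yes_id = single_token_id(tok, "Yes")  # 3553 with this stand-in vocab
```

With this in place, the hardcoded constants disappear, and a tokenizer change that splits "Yes" or "No" into multiple tokens fails loudly instead of silently scoring the wrong logits.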
This would make the example more robust while keeping its behavior unchanged.
I would be happy to implement this change.