[Examples] Avoid hardcoded label token IDs in examples/classification.py #496

@Ashish-Patnaik

Description

I noticed that "examples/classification.py" maps labels to hardcoded token IDs:

0 -> 1294 ("No")
1 -> 3553 ("Yes")

While this works for the current tokenizer, it makes the example brittle if:

  • the tokenizer vocabulary changes,
  • a different Gemma variant is used, or
  • the example is adapted to another model.

Additionally, the file contains minor typos:

  • "grammaticaly" -> "grammatically"
  • "respectivelly" -> "respectively"

Proposed Solution

  1. Compute the token IDs for "Yes" and "No" dynamically using the tokenizer instance.
  2. Add a small safety check to ensure that these labels map to a single token.
  3. Fix the spelling errors in the prompt template and comments.

This would make the example more robust while keeping its behavior unchanged.
I would be happy to implement this change.
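
As a rough sketch of steps 1 and 2, something like the following could work. It assumes only that the tokenizer exposes an `encode(text)` method returning a sequence of integer IDs with no special tokens added; the helper names (`single_token_id`, `build_label_token_ids`) and the `ToyTokenizer` stand-in are hypothetical and exist only so the snippet runs without loading a real model. The real tokenizer may also encode "Yes" differently depending on leading whitespace, which would need checking against the prompt format.

```python
from typing import Mapping, Protocol, Sequence


class Tokenizer(Protocol):
    """Assumed minimal interface: encode text into a sequence of token IDs."""

    def encode(self, text: str) -> Sequence[int]: ...


def single_token_id(tokenizer: Tokenizer, text: str) -> int:
    """Return the ID for `text`, failing loudly if it is not exactly one token."""
    ids = list(tokenizer.encode(text))
    if len(ids) != 1:
        raise ValueError(
            f"Label {text!r} must map to a single token, got {len(ids)}: {ids}"
        )
    return ids[0]


def build_label_token_ids(
    tokenizer: Tokenizer, label_texts: Mapping[int, str]
) -> dict[int, int]:
    """Map each class label to the token ID of its text, computed at runtime."""
    return {
        label: single_token_id(tokenizer, text)
        for label, text in label_texts.items()
    }


class ToyTokenizer:
    """Hypothetical stand-in for the real tokenizer, for illustration only.

    It reuses the two IDs quoted in this issue and splits any other
    string into one pseudo-token per character.
    """

    _vocab = {"No": 1294, "Yes": 3553}

    def encode(self, text: str) -> list[int]:
        if text in self._vocab:
            return [self._vocab[text]]
        return [ord(ch) for ch in text]


LABEL_TEXTS = {0: "No", 1: "Yes"}
label_to_token = build_label_token_ids(ToyTokenizer(), LABEL_TEXTS)
```

In the example itself, `ToyTokenizer()` would be replaced by the tokenizer instance the script already creates, so the mapping follows whatever vocabulary is loaded instead of fixed constants.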
