A plain script to compare and count tokens for OpenAI models
OpenAI Token Counter is a simple Python script designed to provide insight into how various languages tokenize compared to English.
To run the script, you need the following Python libraries to be properly installed:

- `tiktoken`
- `rich`
- `tabulate`

To install these prerequisites, execute:

```
pip install tiktoken rich tabulate
```
- Clone the Repository
- Execute the Script:

  ```
  python GPTTokencounter.py
  ```

- Language Selection:
  - The program kicks off with a brief overview of its purpose.
  - You'll then be prompted to either:
    - `[a]` Default (English and Bangla)
    - `[b]` Custom

    Choose the default for a comparison between English and Bangla, or opt for a custom language pair.
- Provide Required Inputs:
  - For the custom option, you'll need to specify the ISO language code, an English word or sentence, and its counterpart in the chosen language.
- Examine the Token Comparison:
  - The next step showcases a table contrasting the tokenization of the English phrase with the selected language, leveraging the `gpt-3.5-turbo` model. For example:
| Language | English | Bangla |
|---|---|---|
| Sentence | I speak Bengali | আমি বাংলায় কথা কই |
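Under the hood, the comparison amounts to encoding each sentence and counting the resulting tokens. The snippet below is a minimal sketch of that idea rather than the script's actual code; the variable names, example phrases, and table layout are illustrative, and it assumes `tiktoken` and `tabulate` are installed as described above.

```python
# Minimal sketch: count tokens for an English/Bangla sentence pair with tiktoken.
# Variable names and phrases are illustrative, not the script's actual code.
import tiktoken
from tabulate import tabulate

model_name = "gpt-3.5-turbo"  # same default model the script uses
encoding = tiktoken.encoding_for_model(model_name)

sentences = {
    "English": "I speak Bengali",
    "Bangla": "আমি বাংলায় কথা কই",
}

rows = []
for language, sentence in sentences.items():
    token_ids = encoding.encode(sentence)  # list of integer token ids
    rows.append([language, sentence, len(token_ids)])

print(tabulate(rows, headers=["Language", "Sentence", "Tokens"], tablefmt="github"))
```

Because the underlying vocabulary is heavily weighted toward English text, the Bangla sentence typically splits into noticeably more tokens than its English counterpart, which is exactly the gap the table highlights. Swapping `model_name` for another OpenAI model changes which encoding `tiktoken` selects.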
- The `tiktoken` library is the backbone, facilitating token counts based on the model.
- The default model in use is `gpt-3.5-turbo`. Should you wish to experiment with others, adjust the `model_name` variable within the `main()` function.
- The `display_word_parameters()` function integrates word wrapping, ensuring legibility for lengthier inputs; a rough sketch of the idea follows below.
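For long sentences, that wrapping keeps table cells readable. The fragment below only sketches the general idea using the standard library, assuming a fixed column width; the helper name, width, and example text are illustrative and not taken from `display_word_parameters()` itself.

```python
# Sketch of wrapping a long cell so the comparison table stays legible.
# This approximates the idea behind display_word_parameters(); it is not its actual code.
import textwrap

def wrap_cell(text: str, width: int = 40) -> str:
    """Break a long sentence into lines no wider than `width` characters."""
    return "\n".join(textwrap.wrap(text, width=width))

print(wrap_cell("This is a deliberately long English sentence used to show how a wide table cell can be wrapped."))
```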
We welcome feedback! If you come across any hiccups, do reach out through GitHub.
