Count the number of tokens sent #5

@vincentfretin

Description

Instead of

.slice(-5000)

which limits the text sent to the last 5000 characters (roughly 714 to 1250 words, according to Google), we should probably use GPT3Tokenizer to count tokens and limit what we send to 4000 tokens, or to a configurable value.
I found out about it while reading this article: https://supabase.com/blog/openai-embeddings-postgres-vector
Also, https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them says:
"Token Limits: Depending on the model used (GPT-3), requests can use up to 4097 tokens shared between prompt and completion. If your prompt is 4000 tokens, your completion can be 97 tokens at most."
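
A minimal sketch of what the token-based limit could look like, assuming a tokenizer object that exposes `encode` and `decode`. The names `truncateToLastTokens`, `promptBudget`, and `whitespaceTokenizer` are hypothetical and not part of this project; the whitespace tokenizer is only a stand-in so the example runs without dependencies, and a real BPE tokenizer such as GPT3Tokenizer would need a small adapter and would give different counts.

```javascript
// Hypothetical stand-in tokenizer: one token per whitespace-separated word.
// This is NOT how GPT-3 tokenizes text; it only makes the sketch runnable.
const whitespaceTokenizer = {
  encode: (text) => text.split(/\s+/).filter(Boolean),
  decode: (tokens) => tokens.join(" "),
};

// Tokens left for the prompt once room is reserved for the completion.
// With the figures quoted above: 4097 total - 97 reserved = 4000 for the prompt.
function promptBudget(contextLimit, reservedForCompletion) {
  return Math.max(0, contextLimit - reservedForCompletion);
}

// Keep only the last `maxTokens` tokens of `text`, mirroring how
// `.slice(-5000)` keeps the tail of the string rather than the head.
function truncateToLastTokens(text, maxTokens, tokenizer = whitespaceTokenizer) {
  if (!Number.isInteger(maxTokens) || maxTokens < 0) {
    throw new RangeError("maxTokens must be a non-negative integer");
  }
  // slice(-0) would return every token, so handle zero explicitly.
  if (maxTokens === 0) return "";
  const tokens = tokenizer.encode(text);
  // Return short inputs untouched to avoid a lossy encode/decode round trip.
  if (tokens.length <= maxTokens) return text;
  return tokenizer.decode(tokens.slice(-maxTokens));
}

// Usage: budget the prompt, then trim the text to fit it.
const maxPromptTokens = promptBudget(4097, 97);
const trimmed = truncateToLastTokens("one two three four five", 3);
console.log(maxPromptTokens, trimmed);
```

Passing the tokenizer as a parameter keeps the limit configurable and lets the counting strategy be swapped without touching the truncation logic.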
