Latai is a Terminal UI application designed to evaluate the performance of Generative AI providers using either default prompts or your own custom inputs.
Latai's TUI is structured into three distinct views:
- Table View: the main interface displaying the list of AI models. On startup, `latai` verifies your access keys and loads models from providers that pass verification. If a key is missing, a notification appears in the Events panel.
- Information Panel: provides detailed insights into the current run. It's especially useful when running multiple prompts, as it displays key performance metrics such as jitter, average latency, and min/max latency.
- Events Panel: logs all system activities in real time. Since `latai` executes performance measurements in parallel without blocking the UI, you can monitor ongoing processes and check for any error messages here.
TLDR version (click on links to navigate to documentation sections):
- Install.
- Check your keys in the environment:
  - `OPENAI_API_KEY` for OpenAI models.
  - `GROQ_API_KEY` for Groq models.
  - `AWS_PROFILE` for AWS Bedrock.
- Run `latai`.
Two installation methods are available.
- Select a release from the Releases Page.
- Download the appropriate file for your platform:
  - Mac
  - Linux
  - Windows
- If you're unsure which release to download, always get the latest version.
For this method you need Go and its tooling to be installed.
```sh
# Install
go install github.com/pvlbzn/latai@latest

# Run
latai
```

Latai requires API keys to access LLM providers. Each provider is optional: by default, Latai attempts to load all providers and verifies their keys. If a key is missing, the corresponding provider will not be loaded.
If you don't need a specific provider, simply omit its key.
TLDR:
- Add the following API keys and values to your environment.
- Update your terminal environment to apply the changes.
```sh
# OpenAI API key.
export OPENAI_API_KEY=

# Groq API key.
export GROQ_API_KEY=

# AWS Bedrock. You can specify your AWS profile and region here.
# If you don't, but you have the AWS CLI installed, Latai will use
# the `default` profile and the `us-east-1` region.
export AWS_PROFILE=
export AWS_REGION=
```

Important

Transparency note. Keys never leave your machine. Latai has no telemetry and does not send your data anywhere. Each provider's code has two functions that reach the internet. The first is `VerifyAccess`, which sends a request to list all available models to check API key validity. The second is `Send`, which sends requests to LLMs to measure latency based on default or user prompts.
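For intuition, key verification is typically nothing more than a models-list request that fails on a bad key. Below is a minimal sketch of that idea against the OpenAI models endpoint; it is not the actual Latai implementation, just an illustration of what `VerifyAccess` amounts to.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// verifyAccess reports whether the given API key can list models.
// Illustration only: Latai's real VerifyAccess lives in the provider package.
func verifyAccess(apiKey string) bool {
	req, err := http.NewRequest(http.MethodGet, "https://api.openai.com/v1/models", nil)
	if err != nil {
		return false
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()

	// Any non-200 status (401, 403, ...) means the key is not usable.
	return resp.StatusCode == http.StatusOK
}

func main() {
	fmt.Println("OpenAI key valid:", verifyAccess(os.Getenv("OPENAI_API_KEY")))
}
```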
API key management is provider specific; here are the instructions for each supported provider.
OpenAI and Groq use the same API, so their key management is identical. To set the keys, add these to your environment:
```sh
# OpenAI API key.
export OPENAI_API_KEY=

# Groq API key.
export GROQ_API_KEY=
```

If you don't need Groq, simply don't add its key.
AWS uses its own authentication mechanism, which is based on the AWS CLI. Refer to the AWS documentation for details if you need it.
Latai loads your AWS profile in the following order:
- Read `AWS_PROFILE` and `AWS_REGION` from your environment.
- If not found, fall back to the default values `AWS_PROFILE=default`, `AWS_REGION=us-east-1`.
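In code, this fallback amounts to roughly the following (a sketch of the behaviour described above, not the actual Latai source):

```go
package main

import (
	"fmt"
	"os"
)

// resolveAWS returns the profile and region following the order above:
// environment variables first, then the documented defaults.
func resolveAWS() (profile, region string) {
	if profile = os.Getenv("AWS_PROFILE"); profile == "" {
		profile = "default"
	}
	if region = os.Getenv("AWS_REGION"); region == "" {
		region = "us-east-1"
	}
	return profile, region
}

func main() {
	p, r := resolveAWS()
	fmt.Printf("profile=%s region=%s\n", p, r)
}
```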
To set your profile and region, add:

```sh
export AWS_PROFILE=
export AWS_REGION=
```

Make sure you either launch Latai from the same terminal after running the exports, or add the exports to your shell rc file, e.g. `.bashrc`, `.zshrc`, etc.
Note
Make sure you have access to LLM models from your AWS account. They are not enabled by default. You have to navigate to https://REGION.console.aws.amazon.com/bedrock/home?region=REGION#/modelaccess and enable models from the console. Make sure to replace REGION with your actual region. Here is the link for us-east-1.
To verify your access you can use the AWS CLI.

```sh
aws bedrock \
    list-foundation-models \
    --region REGION \
    --profile PROFILE
```

Substitute REGION and PROFILE with your data. You can optionally pipe the output into `jq` to make it more readable.
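If you prefer to check access from Go instead of the CLI, something along these lines should work with the AWS SDK for Go v2 (the profile and region values below are placeholders to replace with your own):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/bedrock"
)

func main() {
	ctx := context.Background()

	// Load the shared AWS config with an explicit profile and region.
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithSharedConfigProfile("default"),
		config.WithRegion("us-east-1"),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Listing foundation models fails if credentials or permissions are missing.
	out, err := bedrock.NewFromConfig(cfg).ListFoundationModels(ctx, &bedrock.ListFoundationModelsInput{})
	if err != nil {
		log.Fatal(err)
	}

	for _, m := range out.ModelSummaries {
		fmt.Println(aws.ToString(m.ModelId))
	}
}
```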
Latai uses a set of 3 pre-defined prompts by default. They are just good enough to measure latency to a model and back, e.g. `Respond with a single word: "optimistic".` You can find them here. Three pre-defined prompts means that, by default, all sampling happens over 3 runs.
If you need to measure compute time, or performance with your own particular prompts, you can add them to the `~/.latai` directory.
```sh
# Create a directory where prompts are stored.
mkdir -p ~/.latai/prompts

# Create your prompts in there.
cd ~/.latai/prompts
touch p1.prompt

# Or create many.
touch {p1,p2,p3,p4,p5}.prompt
```

You can create any number of prompts you wish; just mind throttling and rate limiting. All prompts must have the `.prompt` suffix; files with other suffixes will be ignored.
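For illustration, discovering custom prompts can be as simple as filtering the directory listing by the `.prompt` suffix. The sketch below is hypothetical and not Latai's actual loader:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadPrompts returns the contents of every *.prompt file in dir,
// ignoring files with any other suffix.
func loadPrompts(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}

	var prompts []string
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".prompt") {
			continue
		}
		data, err := os.ReadFile(filepath.Join(dir, e.Name()))
		if err != nil {
			return nil, err
		}
		prompts = append(prompts, string(data))
	}
	return prompts, nil
}

func main() {
	home, _ := os.UserHomeDir()
	prompts, err := loadPrompts(filepath.Join(home, ".latai", "prompts"))
	if err != nil {
		fmt.Println("no custom prompts:", err)
		return
	}
	fmt.Printf("loaded %d custom prompts\n", len(prompts))
}
```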
Definitions:
- A provider is a service that grants access to models via an API.
- A vendor is the company that owns, develops, trains, and aligns a model.
- A model is an LLM with specific properties such as performance, context length, and supported languages.
Providers serve models through APIs. Models can be grouped into families, such as the Claude family. Some providers are mono-family, like OpenAI, which uses a single unified API for all its models. Others are multi-family, like AWS Bedrock, which has its own API, but the communication format varies depending on the model family.
Vendors may have one or more model families, typically defined by their API. For example, if models A and B share the same API and belong to the same vendor, they are considered part of the same family.
Latai organizes models by provider, as the provider is the core entity that runs the models. However, models can often be available through multiple providers. This distinction creates two API layers:
- The provider API, which handles transport.
- The model API, which defines the data format.
Rate limits are commonly measured in the following metrics:
- RPM: Requests per minute
- RPD: Requests per day
- TPM: Tokens per minute
- TPD: Tokens per day
Verify these with your model provider; the information can be found in the provider's documentation (links below). Keep in mind that rate limits are almost always negotiable with your provider, and limits are generally applied to individual models, not to the provider itself.
Providers:
- Groq Rate Limits
- OpenAI Rate Limits
- AWS Bedrock (read below)
OpenAI uses tiered rate limits, from tier 1 to tier 5. For more details, consult their documentation.
AWS Bedrock serves models from multiple vendors under one name. Before using most of the models you have to request access to them via the AWS console.
Note
Make sure you have access to LLM models from your AWS account. They are not enabled by default. You have to navigate to https://REGION.console.aws.amazon.com/bedrock/home?region=REGION#/modelaccess and enable models from the console. Make sure to replace REGION with your actual region.
To verify your access you can use the AWS CLI.

```sh
aws bedrock \
    list-foundation-models \
    --region REGION \
    --profile PROFILE
```

Substitute REGION and PROFILE with your data. You can optionally pipe the output into `jq` to make it more readable.
Even though AWS Bedrock returns lots of models, not all of them can be accessed "as-is". For example, AWS Bedrock lists more than 20 Claude-family models; however, only 6 of them are available without provisioning. At the moment, Latai doesn't include models that require special access.
You can fork this repository and add the provisioned models you need by adding their IDs to the `NewBedrock` function in this file.
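The shape of such a change is roughly the following. This is purely illustrative: the actual constructor, struct, and field names in the Bedrock provider file may differ, and the model ID shown is only an example.

```go
// Hypothetical sketch of NewBedrock in a fork; check the real source file
// for the actual struct and field names before editing.
func NewBedrock() *Bedrock {
	return &Bedrock{
		models: []*Model{
			// ... models already shipped with Latai ...

			// A provisioned model added manually by its Bedrock model ID.
			{ID: "anthropic.claude-3-opus-20240229-v1:0", Name: "Claude 3 Opus"},
		},
	}
}
```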
All providers reside in the `provider` package. There is one main interface, defined in the `provider.go` file.
```go
// Comments omitted for brevity, check source file
// to see full version.
type Provider interface {
	Name() ModelProvider
	GetLLMModels(filter string) []*Model
	Measure(model *Model, prompt *prompt.Prompt) (*Metric, error)
	Send(message string, to *Model) (*Response, error)
	VerifyAccess() bool
}
```

If a struct satisfies this `Provider` interface, it is ready to be used along with all other providers.
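As a rough illustration, a new provider stub could start like the skeleton below. The method set mirrors the interface above; the struct name, constructor, and bodies are placeholders, and `Model`, `Metric`, `Response`, and `prompt.Prompt` are the existing types from the `provider` and `prompt` packages.

```go
// acme.go - hypothetical provider skeleton inside the provider package.
type Acme struct {
	apiKey string
}

func NewAcme(apiKey string) *Acme {
	return &Acme{apiKey: apiKey}
}

// Name identifies the provider (assuming ModelProvider is a string-based type).
func (a *Acme) Name() ModelProvider { return ModelProvider("Acme") }

// GetLLMModels returns the models this provider exposes, filtered by name.
func (a *Acme) GetLLMModels(filter string) []*Model {
	return nil // TODO: return the provider's model list
}

// Measure sends the prompt, times the round trip, and wraps it in a Metric.
func (a *Acme) Measure(model *Model, p *prompt.Prompt) (*Metric, error) {
	return nil, nil // TODO: call Send and record latency
}

// Send calls the provider's API and translates the reply into a Response.
func (a *Acme) Send(message string, to *Model) (*Response, error) {
	return nil, nil // TODO: perform the HTTP request
}

// VerifyAccess usually makes a cheap "list models" call that fails on a bad key.
func (a *Acme) VerifyAccess() bool {
	return a.apiKey != ""
}
```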
If the provider you are adding is OpenAI API compatible, check the `groq.go` implementation.
Do not forget to add tests. You can see existing test implementations inside the `provider` package.
Read the Events panel of the TUI; it generally explains what went wrong. The most common issues are related to AWS Bedrock model access.
