Chatbot Promptfoo Tests

Promptfoo evaluation suite for testing APIMatic chatbot response quality across 7 SDK languages.

Test Coverage

Each run evaluates 8 prompts per language:

| Test | Description |
|---|---|
| `authentication-method` | Validates the chatbot correctly identifies the API's auth scheme |
| `create-subscription` | Validates code generation for creating a subscription |
| `ask` | Validates guidance for cancelling a subscription |
| `instructions-following` | Validates step-by-step subscription renewal instructions |
| `sdk-infra-schema` | Validates chatbot knowledge of SDK model schemas |
| `sdk-infra-configuration` | Validates chatbot knowledge of SDK configuration options |
| `sdk-infra-codegen-prompt` | Validates complex code generation with logging and response parsing |
| `hallucination` | Validates the chatbot does not hallucinate non-existent API features |

Two additional dependent-question tests in `promptfooconfig-dependent.yaml` check that the chatbot keeps context across a multi-turn conversation.

Supported languages: `http_curl_v1`, `cs_net_standard_lib`, `java_eclipse_jre_lib`, `php_generic_lib_v2`, `python_generic_lib`, `ruby_generic_lib`, `ts_generic_lib`
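If every language is evaluated, the coverage above works out to 8 prompts × 7 languages = 56 single-turn cases, not counting the dependent tests. The TypeScript sketch below enumerates that matrix from the test IDs and language identifiers listed in this README; it only illustrates the arithmetic, and the names `TEST_IDS`, `LANGUAGES`, and `buildMatrix` are hypothetical, not part of this repo.

```typescript
// Hypothetical helper: enumerates the test matrix described in this README.
// Test IDs and language identifiers are copied from the lists above.
const TEST_IDS = [
  "authentication-method",
  "create-subscription",
  "ask",
  "instructions-following",
  "sdk-infra-schema",
  "sdk-infra-configuration",
  "sdk-infra-codegen-prompt",
  "hallucination",
] as const;

const LANGUAGES = [
  "http_curl_v1",
  "cs_net_standard_lib",
  "java_eclipse_jre_lib",
  "php_generic_lib_v2",
  "python_generic_lib",
  "ruby_generic_lib",
  "ts_generic_lib",
] as const;

type TestId = (typeof TEST_IDS)[number];
type Language = (typeof LANGUAGES)[number];

interface TestCase {
  language: Language;
  test: TestId;
}

// Cross product of languages and single-turn tests
// (the two dependent, multi-turn tests are not included).
function buildMatrix(): TestCase[] {
  const cases: TestCase[] = [];
  for (const language of LANGUAGES) {
    for (const test of TEST_IDS) {
      cases.push({ language, test });
    }
  }
  return cases;
}

const matrix = buildMatrix();
console.log(matrix.length); // 56
```

Assuming `http_curl_v1` gets the same 8 prompts, filtering it out (`matrix.filter((c) => c.language !== "http_curl_v1")`) leaves the 48 cases for the other six languages, with the remaining 8 covered by `http.yaml`.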

Setup

Install dependencies
```
npm install
```

Configure environment

Copy `.env.example` to `chatbot/.env` and fill in your values:

```
cp .env.example chatbot/.env
```

| Variable | Description |
|---|---|
| `CHATBOT_URL` | Full chatbot API endpoint URL |
| `CHATBOT_LANGUAGE` | SDK language identifier (see supported languages above) |
| `OPENAI_API_KEY` | OpenAI API key used by the `llm-rubric` judge |
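A missing or mistyped variable tends to show up only as failing test cases, so it can help to check the values before an eval starts. A minimal TypeScript sketch, assuming the variables have already been loaded into a plain object such as `process.env`; `validateEnv` is a hypothetical helper, not something this repo ships.

```typescript
// Hypothetical pre-flight check for chatbot/.env (not part of this repo).
// Language identifiers are the supported ones listed in this README.
const SUPPORTED_LANGUAGES = new Set<string>([
  "http_curl_v1",
  "cs_net_standard_lib",
  "java_eclipse_jre_lib",
  "php_generic_lib_v2",
  "python_generic_lib",
  "ruby_generic_lib",
  "ts_generic_lib",
]);

const REQUIRED_VARS = ["CHATBOT_URL", "CHATBOT_LANGUAGE", "OPENAI_API_KEY"];

type Env = Record<string, string | undefined>;

// Returns human-readable problems; an empty array means the variables look usable.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];

  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }

  const url = env.CHATBOT_URL;
  if (url) {
    try {
      new URL(url); // throws a TypeError for malformed URLs
    } catch {
      problems.push(`CHATBOT_URL is not a valid URL: ${url}`);
    }
  }

  const language = env.CHATBOT_LANGUAGE;
  if (language && !SUPPORTED_LANGUAGES.has(language)) {
    problems.push(`CHATBOT_LANGUAGE "${language}" is not one of the supported languages`);
  }

  return problems;
}
```

Once `.env` has been loaded into `process.env`, a wrapper script could call `validateEnv(process.env)` and stop with the listed problems instead of launching a run that is bound to fail.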

Running Evaluations

Run from the `chatbot/` directory:

```
cd chatbot

# All languages except http_curl_v1 (main + dependent configs)
npx promptfoo eval --config promptfooconfig.yaml --output results-default.json
npx promptfoo eval --config promptfooconfig-dependent.yaml --output results-dependent.json

# HTTP/cURL only
npx promptfoo eval --config http.yaml --output results-http.json
```
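The commands above route by language: `http_curl_v1` uses `http.yaml`, and every other language uses the main and dependent configs. A TypeScript sketch of that routing, which a wrapper script might build on; `planRuns` and `toNpxArgs` are hypothetical and only reproduce the `npx promptfoo eval` arguments shown above, they do not run anything.

```typescript
// Hypothetical helper mirroring the "Running Evaluations" commands above.
interface EvalRun {
  config: string;
  output: string;
}

// Selects the config files (and output files) for one language.
function planRuns(language: string): EvalRun[] {
  if (language === "http_curl_v1") {
    return [{ config: "http.yaml", output: "results-http.json" }];
  }
  return [
    { config: "promptfooconfig.yaml", output: "results-default.json" },
    { config: "promptfooconfig-dependent.yaml", output: "results-dependent.json" },
  ];
}

// Arguments to pass to `npx`, matching the commands in this README.
function toNpxArgs(run: EvalRun): string[] {
  return ["promptfoo", "eval", "--config", run.config, "--output", run.output];
}

for (const run of planRuns("python_generic_lib")) {
  console.log(["npx", ...toNpxArgs(run)].join(" "));
}
```

A wrapper could hand each argument list to Node's `child_process.spawnSync("npx", args, { stdio: "inherit" })` with `chatbot/` as the working directory.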

How It Works

Each test sends a user question to the chatbot API via HTTP POST and evaluates the response with three assertion types:

- `icontains`: checks that required keywords appear in the response (case-insensitive)
- `javascript`: checks that the response contains a fenced code block (triple backticks)
- `llm-rubric`: uses an OpenAI GPT model as a judge, grading the response against a RAG-context prompt (threshold: 0.85)
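To make the three checks concrete, here is a rough TypeScript approximation of what each one decides. It is not promptfoo's implementation: `icontains` is modelled as a case-insensitive substring test, the `javascript` check as a search for triple backticks, and the `llm-rubric` judge is reduced to comparing a score it has already returned against the 0.85 threshold. The function names, and the rule that every applicable check must pass, are assumptions for illustration.

```typescript
// Rough stand-ins for the three assertion types; not promptfoo's own code.

// Triple backtick, built at runtime so this snippet nests cleanly in Markdown.
const FENCE = "`".repeat(3);

// icontains: case-insensitive substring match.
function icontains(output: string, keyword: string): boolean {
  return output.toLowerCase().includes(keyword.toLowerCase());
}

// The javascript check described above: does the response contain a code fence?
function hasCodeBlock(output: string): boolean {
  return output.includes(FENCE);
}

// llm-rubric: the judge's score (0 to 1) must reach the threshold.
const RUBRIC_THRESHOLD = 0.85;

function rubricPasses(score: number, threshold: number = RUBRIC_THRESHOLD): boolean {
  return score >= threshold;
}

interface Checks {
  keywords: string[]; // values for the icontains assertions
  requireCodeBlock: boolean; // whether the code-block check applies to this test
  judgeScore: number; // score already returned by the LLM judge
}

// A test case passes only if every applicable check passes.
function passes(output: string, checks: Checks): boolean {
  const keywordsOk = checks.keywords.every((k) => icontains(output, k));
  const codeOk = !checks.requireCodeBlock || hasCodeBlock(output);
  return keywordsOk && codeOk && rubricPasses(checks.judgeScore);
}
```

Keeping the judge's score as an input separates the cheap, deterministic checks from the LLM call, which is the only step here that needs `OPENAI_API_KEY`.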

RAG context files live in `chatbot/prompts/<language>/`. Each `.txt` file describes what a correct answer should contain and is passed to the LLM judge as the rubric.
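A sketch of how the rubric files for one language could be gathered, assuming only the layout described here (`chatbot/prompts/<language>/*.txt`). How the real configs reference these files is not shown in this README; `rubricDir` and `loadRubrics` are hypothetical helpers built on Node's `fs` and `path` modules.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Directory holding the RAG-context rubrics for one language,
// relative to the chatbot/ root passed in.
function rubricDir(root: string, language: string): string {
  return join(root, "prompts", language);
}

// Reads every .txt file in the language's directory into a map keyed by
// file name without the extension. Returns an empty map if the directory is missing.
function loadRubrics(root: string, language: string): Map<string, string> {
  const dir = rubricDir(root, language);
  const rubrics = new Map<string, string>();
  if (!existsSync(dir)) {
    return rubrics;
  }
  for (const file of readdirSync(dir).sort()) {
    if (file.endsWith(".txt")) {
      rubrics.set(file.slice(0, -".txt".length), readFileSync(join(dir, file), "utf8"));
    }
  }
  return rubrics;
}
```

From the repository root this would be `loadRubrics("chatbot", "python_generic_lib")`; from inside `chatbot/` the root is `"."`.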

CI

The workflow in `.github/workflows/chatbot-promptfoo-tests.yml` runs on manual dispatch. Required GitHub secrets:

| Secret | Description |
|---|---|
| `CHATBOT_URL_DEV` | Dev environment chatbot URL |
| `CHATBOT_URL_PROD` | Prod environment chatbot URL |
| `OPENAI_API_KEY` | OpenAI API key for LLM judge |
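The two URL secrets suggest that each run points `CHATBOT_URL` at either the dev or the prod deployment. How the workflow actually chooses is not described here; the TypeScript sketch below shows one plausible mapping from a hypothetical `environment` input (`dev` or `prod`) to the variables listed under Setup, with the language passed in as another hypothetical input.

```typescript
// Hypothetical mapping from CI secrets to the run-time variables used locally.
type Environment = "dev" | "prod";

type Secrets = Record<string, string | undefined>;

// Which secret supplies CHATBOT_URL for each environment.
const URL_SECRET: Record<Environment, string> = {
  dev: "CHATBOT_URL_DEV",
  prod: "CHATBOT_URL_PROD",
};

// Builds the environment for one eval run, failing fast if a secret is missing.
function runEnv(environment: Environment, language: string, secrets: Secrets): Record<string, string> {
  const secretName = URL_SECRET[environment];
  const url = secrets[secretName];
  const apiKey = secrets.OPENAI_API_KEY;
  if (!url) {
    throw new Error(`Missing secret ${secretName}`);
  }
  if (!apiKey) {
    throw new Error("Missing secret OPENAI_API_KEY");
  }
  return {
    CHATBOT_URL: url,
    CHATBOT_LANGUAGE: language,
    OPENAI_API_KEY: apiKey,
  };
}
```

Mapping the secrets onto the same three variables used locally means the promptfoo configs do not need to know which environment they are testing.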
