Promptfoo evaluation suite for testing APIMatic chatbot response quality across 7 SDK languages.
Each run evaluates 8 prompts per language:
| Test | Description |
|---|---|
| `authentication-method` | Validates the chatbot correctly identifies the API's auth scheme |
| `create-subscription` | Validates code generation for creating a subscription |
| `ask` | Validates guidance for cancelling a subscription |
| `instructions-following` | Validates step-by-step subscription renewal instructions |
| `sdk-infra-schema` | Validates chatbot knowledge of SDK model schemas |
| `sdk-infra-configuration` | Validates chatbot knowledge of SDK configuration options |
| `sdk-infra-codegen-prompt` | Validates complex code generation with logging and response parsing |
| `hallucination` | Validates the chatbot does not hallucinate non-existent API features |

Two additional dependent-question tests (promptfooconfig-dependent.yaml) test multi-turn conversation context.
Supported languages: http_curl_v1, cs_net_standard_lib, java_eclipse_jre_lib, php_generic_lib_v2, python_generic_lib, ruby_generic_lib, ts_generic_lib

1. Install dependencies:

   `npm install`

2. Configure environment. Copy `.env.example` to `.env` and fill in your values:

   `cp .env.example chatbot/.env`

   | Variable | Description |
   |---|---|
   | `CHATBOT_URL` | Full chatbot API endpoint URL |
   | `CHATBOT_LANGUAGE` | SDK language identifier (see supported languages above) |
   | `OPENAI_API_KEY` | OpenAI API key used by the `llm-rubric` judge |

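
Before running an evaluation it can help to confirm that the three variables are present and that `CHATBOT_LANGUAGE` matches one of the supported identifiers. The following is a minimal sketch, not part of the suite: `checkEnv`, `REQUIRED_VARS`, and `SUPPORTED_LANGUAGES` are hypothetical names, and it assumes the values have already been loaded into an object such as `process.env`.

```javascript
// Supported identifiers, copied from the "Supported languages" list.
const SUPPORTED_LANGUAGES = [
  "http_curl_v1",
  "cs_net_standard_lib",
  "java_eclipse_jre_lib",
  "php_generic_lib_v2",
  "python_generic_lib",
  "ruby_generic_lib",
  "ts_generic_lib",
];

// Variables listed in the configuration table.
const REQUIRED_VARS = ["CHATBOT_URL", "CHATBOT_LANGUAGE", "OPENAI_API_KEY"];

// Hypothetical helper: returns a list of problems, empty when the
// environment looks usable.
function checkEnv(env) {
  const problems = [];

  // Every required variable must be a non-blank string.
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }

  // If a language was given, it must be one of the supported identifiers.
  const lang = (env.CHATBOT_LANGUAGE || "").trim();
  if (lang !== "" && !SUPPORTED_LANGUAGES.includes(lang)) {
    problems.push(`CHATBOT_LANGUAGE "${lang}" is not a supported language`);
  }

  return problems;
}
```

Calling `checkEnv(process.env)` and printing any returned messages before invoking `npx promptfoo eval` would surface a missing key early instead of partway through a run.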
Run from the `chatbot/` directory:

    cd chatbot
    # All languages except http_curl_v1 (main + dependent configs)
    npx promptfoo eval --config promptfooconfig.yaml --output results-default.json
    npx promptfoo eval --config promptfooconfig-dependent.yaml --output results-dependent.json
    # HTTP/cURL only
    npx promptfoo eval --config http.yaml --output results-http.json

Each test sends a user question to the chatbot API via HTTP POST and evaluates the response using:

- `icontains`: checks for required keywords in the response
- `javascript`: checks for code blocks (```)
- `llm-rubric`: uses OpenAI GPT as a judge against a RAG-context prompt (threshold: 0.85)

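
To make the three checks concrete, here is a rough sketch of the logic each one represents. It is illustrative only: the function names are hypothetical, the code-block check assumes that a response must contain both an opening and a closing fence (the suite's actual `javascript` expression may be looser), and the rubric comparison assumes a score at or above the threshold counts as a pass.

```javascript
// Case-insensitive substring match, the behaviour an `icontains`
// assertion describes.
function containsKeyword(output, keyword) {
  return output.toLowerCase().includes(keyword.toLowerCase());
}

// Three backticks, built at runtime so this example can itself sit
// inside a fenced block.
const FENCE = "`".repeat(3);

// Assumption: a "code block" means an opening fence followed later by
// a closing fence.
function hasCodeBlock(output) {
  const open = output.indexOf(FENCE);
  if (open === -1) return false;
  return output.indexOf(FENCE, open + FENCE.length) !== -1;
}

// Threshold taken from the list above; the judge is assumed to return
// a score between 0 and 1.
const RUBRIC_THRESHOLD = 0.85;

function rubricPasses(score, threshold = RUBRIC_THRESHOLD) {
  return score >= threshold;
}
```

Under these assumptions, a response that opens a fence but never closes it would fail the code-block check even though it contains three backticks.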
RAG context files live in chatbot/prompts/<language>/. Each .txt file describes what a correct answer should contain, passed to the LLM judge as the rubric.
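
When adding a language or a test, a quick way to see which rubric files exist is to list the `.txt` files in that language's directory. The helper below is a hypothetical sketch, not part of the suite: `listRubricFiles` is an invented name, and the default root assumes the script runs from the repository root.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: return the paths of all rubric (.txt) files for
// one language, sorted by file name. Returns an empty array when the
// language directory does not exist.
function listRubricFiles(language, promptsRoot = path.join("chatbot", "prompts")) {
  const dir = path.join(promptsRoot, language);
  if (!fs.existsSync(dir)) return [];

  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".txt")) // ignore non-rubric files
    .sort()
    .map((name) => path.join(dir, name));
}
```

For example, `listRubricFiles("python_generic_lib")` would return the rubric paths under `chatbot/prompts/python_generic_lib/`, which can then be compared against the test names in the table at the top.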
The workflow in .github/workflows/chatbot-promptfoo-tests.yml runs on manual dispatch. Required GitHub secrets:

| Secret | Description |
|---|---|
| `CHATBOT_URL_DEV` | Dev environment chatbot URL |
| `CHATBOT_URL_PROD` | Prod environment chatbot URL |
| `OPENAI_API_KEY` | OpenAI API key for LLM judge |