WasifIsrar/chatbot-promptfoo-tests

Chatbot Promptfoo Tests

Promptfoo evaluation suite for testing APIMatic chatbot response quality across 7 SDK languages.

Test Coverage

Each run evaluates 8 prompts per language:

| Test | Description |
| --- | --- |
| authentication-method | Validates the chatbot correctly identifies the API's auth scheme |
| create-subscription | Validates code generation for creating a subscription |
| ask | Validates guidance for cancelling a subscription |
| instructions-following | Validates step-by-step subscription renewal instructions |
| sdk-infra-schema | Validates chatbot knowledge of SDK model schemas |
| sdk-infra-configuration | Validates chatbot knowledge of SDK configuration options |
| sdk-infra-codegen-prompt | Validates complex code generation with logging and response parsing |
| hallucination | Validates the chatbot does not hallucinate non-existent API features |

Two additional dependent-question tests (promptfooconfig-dependent.yaml) cover multi-turn conversation context.

Supported languages: http_curl_v1, cs_net_standard_lib, java_eclipse_jre_lib, php_generic_lib_v2, python_generic_lib, ruby_generic_lib, ts_generic_lib

Setup

  1. Install dependencies

    npm install
  2. Configure environment

    Copy .env.example to chatbot/.env and fill in your values:

    cp .env.example chatbot/.env
    | Variable | Description |
    | --- | --- |
    | CHATBOT_URL | Full chatbot API endpoint URL |
    | CHATBOT_LANGUAGE | SDK language identifier (see supported languages above) |
    | OPENAI_API_KEY | OpenAI API key used by the llm-rubric judge |
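
A misconfigured .env is easy to miss until an eval fails partway through. The sketch below shows one way to check the three variables before running anything. It is not part of the repository: `validateEnv`, `REQUIRED_VARS`, and `SUPPORTED_LANGUAGES` are hypothetical names, and the language list is copied from the supported languages above.

```typescript
// Hypothetical pre-flight check for the variables described above.
// Not part of the repository; names here are illustrative only.
const SUPPORTED_LANGUAGES: string[] = [
  "http_curl_v1",
  "cs_net_standard_lib",
  "java_eclipse_jre_lib",
  "php_generic_lib_v2",
  "python_generic_lib",
  "ruby_generic_lib",
  "ts_generic_lib",
];

const REQUIRED_VARS: string[] = ["CHATBOT_URL", "CHATBOT_LANGUAGE", "OPENAI_API_KEY"];

// Returns a list of human-readable problems; an empty list means nothing was flagged.
function validateEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];

  // Every required variable must be present and non-blank.
  for (const name of REQUIRED_VARS) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      problems.push(`${name} is not set`);
    }
  }

  // A language that is set must be one of the supported identifiers.
  const language = env["CHATBOT_LANGUAGE"];
  if (
    language !== undefined &&
    language.trim() !== "" &&
    !SUPPORTED_LANGUAGES.includes(language)
  ) {
    problems.push(`CHATBOT_LANGUAGE "${language}" is not a supported language`);
  }

  return problems;
}
```

Under Node, passing `process.env` to `validateEnv` after the .env file has been loaded would report missing or unsupported values up front. How the repository itself loads the file is not stated here, so treat that step as an assumption.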

Running Evaluations

Run from the chatbot/ directory:

cd chatbot

# All languages except http_curl_v1 (main + dependent configs)
npx promptfoo eval --config promptfooconfig.yaml --output results-default.json
npx promptfoo eval --config promptfooconfig-dependent.yaml --output results-dependent.json

# HTTP/cURL only
npx promptfoo eval --config http.yaml --output results-http.json

How It Works

Each test sends a user question to the chatbot API via HTTP POST and evaluates the response using:

  • icontains — checks for required keywords in the response
  • javascript — checks for code blocks (```)
  • llm-rubric — uses OpenAI GPT as a judge against a RAG-context prompt (threshold: 0.85)
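
To make the three checks concrete, here is a minimal sketch of what each one amounts to. These are illustrative re-implementations, not promptfoo's own code: the function names are hypothetical, and the rule that a judge score at or above the threshold counts as a pass is an assumption about how the 0.85 threshold is applied.

```typescript
// Illustrative versions of the three checks; not promptfoo's implementation.

// Three backticks, built with repeat() so the example is easy to embed in docs.
const FENCE: string = "`".repeat(3);

// icontains: case-insensitive substring match on the response text.
function icontains(output: string, keyword: string): boolean {
  return output.toLowerCase().includes(keyword.toLowerCase());
}

// javascript check: here, simply whether the response contains a code fence.
function hasCodeBlock(output: string): boolean {
  return output.includes(FENCE);
}

// llm-rubric: ASSUMPTION that the judge returns a score in [0, 1]
// and that the test passes when the score is at or above the threshold.
function passesRubric(score: number, threshold: number = 0.85): boolean {
  return score >= threshold;
}
```

For example, `icontains("Use a Bearer token", "bearer")` is true, while `passesRubric(0.8)` is false under the assumed rule. The actual judge score comes from the OpenAI model and cannot be reproduced locally.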

RAG context files live in chatbot/prompts/<language>/. Each .txt file describes what a correct answer should contain and is passed to the LLM judge as the rubric.

CI

The workflow in .github/workflows/chatbot-promptfoo-tests.yml runs on manual dispatch. Required GitHub secrets:

| Secret | Description |
| --- | --- |
| CHATBOT_URL_DEV | Dev environment chatbot URL |
| CHATBOT_URL_PROD | Prod environment chatbot URL |
| OPENAI_API_KEY | OpenAI API key for LLM judge |
