The test suite validates every service end-to-end against the running stack. The stack must be up before running tests.
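Because the tests assume a running stack, it can help to wait for a health endpoint before starting the run. The following is a minimal sketch, not part of the repository: `wait_for` is a hypothetical helper, and `HEALTH_URL` is an assumed variable pointing at whichever health endpoint you choose to probe.

```bash
#!/usr/bin/env bash
# wait_for TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-run COMMAND once per second until it succeeds or the timeout expires.
# Hypothetical helper for illustration; test.sh may already do its own checks.
wait_for() {
  local timeout=$1; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# Example usage (HEALTH_URL is an assumption, set it to a real endpoint):
#   wait_for 120 curl -fsS "$HEALTH_URL" && make test
```

Polling with a deadline avoids a fixed `sleep` that is either too short on a cold start or wastefully long on a warm one.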

```bash
# Run all tests
make test
# or
bash test.sh

# Run specific tests
bash test.sh test_health_endpoints test_mcp_tools_loaded

# List all available tests
bash test.sh --help
```

- Health — all service health endpoints, docker compose service status, browser replica count, cloudflared tunnel reachability (skipped if not running)
- LiteLLM — API endpoints, model registration (base models always, plus optional Ollama/CUDA/Speaches models when enabled), authentication (valid/invalid/missing keys), chat completions, SSE streaming
- Nginx — routing to all backends, root path blocked (404), admin UI basic auth (no creds → 401, bad creds → 401, correct creds → 200, no-auth mode → 200), admin rate limiting (503/429 under rapid fire)
- MCP — all tools loaded across servers, per-server tool counts, specific tool presence, authentication
- MCP Media — image generation tool (via mcp_tools-generate_image), TTS tool (via mcp_tools-generate_tts), tool presence based on enabled providers, invalid model error handling, tool descriptions contain expected model names, sdcpp-cuda image generation through MCP, e2e LLM tool calling (qwen3-8b → tool_call → MCP generate_image → sdcpp-cuda → upload → LLM responds with link)
- sd.cpp (SDCPP=1) — wrapper health, model listing, status fields, load/unload/double-load/double-unload, auto-load on generate, image generation, model swap (sd-turbo ↔ sdxl-turbo), cleanup unload. Both CPU and CUDA variants.
- talkies (TALKIES=1 / TALKIES_CUDA=1) — `/healthz` reachability, `/v1/models` lists the configured model_ids (CPU: 4 ASR + Kokoro; CUDA: 7 ASR + Kokoro), `/api/ps` returns the loaded list, `DELETE /api/ps/{model_id}` accepts URL-encoded slugs (404 on never-loaded model_ids), `POST /unload` returns 200, and end-to-end transcription + TTS through the LiteLLM proxy (asserts non-empty `text` field, MP3 body for `/v1/audio/speech`). The e2e transcription test is gated on the presence of `tests/.fixtures/audio.{wav,mp3,m4a,flac,ogg}` (skipped if missing; audio is ffmpeg-decoded inside the service, so any format works).
- predictalot (PREDICTALOT=1 / PREDICTALOT_CUDA=1) — per-type model listings (`/v1/univariate/models`, `/v1/multivariate/models`; verifies non-members are excluded), bearer auth enforcement, univariate forecast via chronos-2, univariate ensemble (forecast/ensemble), unknown-model 404, non-member-of-type rejection (timesfm-2.5 on `/v1/multivariate/forecast`), and the 26-tool MCP surface (`predictalot-forecast_univariate_chronos_2`, `predictalot-list_univariate_models`, etc.)
- mailbox (MAILBOX=1) — open `/health`, auth enforcement on non-health endpoints, `/mailboxes` returns at least one configured account, MCP tool surface. Optional e2e (gated on `MAILBOX_TEST_MAILBOX_NAME` + `MAILBOX_TEST_ADDRESS`): send → poll IMAP → read → delete → verify gone.
- telethon (TELETHON=1) — `get_me` returns own profile (verifies the string session is authorized), MCP tool surface
- HybridS3 — full CRUD lifecycle (upload, download, list, delete, verify deletion), public read without auth, write rejection without auth, presigned URL generation and download
- Browser — page navigation, interactive element detection, screenshot capture, full automation flow (navigate, find elements, click, type, screenshot)
- Claudebox — chat completion via LiteLLM, direct API access via nginx, file operations (upload, download, list, delete), z.ai instance reachability, OpenAI-compatible models endpoint (both instances)
- Integration — end-to-end workflow: browser navigation → screenshot → upload to storage → verify public URL → LLM summarization
- Security — auth on every endpoint, cross-token isolation, MCP fake token and session hijack, nginx path normalization bypass, HTTP request smuggling (CL.TE/TE.CL), h2c smuggling, hop-by-hop header abuse, SSRF via browser and MCP to internal services, prompt injection key extraction, path traversal on claudebox and hybrids3, S3 presign abuse, stored XSS headers, model name injection, header injection, large payload rejection, docker socket removal verification, Docker Engine API isolation, internal port exposure, health endpoint info leakage
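Many of the checks above come down to asserting an HTTP status code, for example 401 without credentials or either 503 or 429 under rapid fire. The sketch below shows one way such an assertion could be written. It is an illustration, not the code in `test.sh`: `check_status` and `http_status` are hypothetical helpers, and `BASE_URL`, `ADMIN_USER`, `ADMIN_PASS` and the `/admin/` path are assumptions.

```bash
#!/usr/bin/env bash
# check_status LABEL EXPECTED ACTUAL
# EXPECTED may list alternatives separated by "|", e.g. "503|429".
check_status() {
  local label=$1 expected=$2 actual=$3
  if [[ "$actual" =~ ^($expected)$ ]]; then
    echo "PASS $label ($actual)"
  else
    echo "FAIL $label: expected $expected, got $actual"
    return 1
  fi
}

# http_status CURL_ARGS...
# Print only the response status code; the body is discarded.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Example usage (URL, path and credentials are assumptions):
#   check_status "admin, no creds"  401 "$(http_status "$BASE_URL/admin/")"
#   check_status "admin, good creds" 200 \
#     "$(http_status -u "$ADMIN_USER:$ADMIN_PASS" "$BASE_URL/admin/")"
#   check_status "admin rate limit" "503|429" "$(http_status "$BASE_URL/admin/")"
```

Keeping the comparison separate from the request makes the assertion easy to exercise without a running stack.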
Tests are designed for zero or minimal token consumption:
- LiteLLM model list — no tokens, just API metadata
- MCP tool list — no tokens, just tool registration check
- Claudebox — hits `/status` and `/openai/v1/models` directly (no inference), plus one minimal chat completion with `claudebox-haiku` to verify the OAuth token is valid
- Browser, storage, nginx — pure HTTP, no LLM calls
- Integration test — one model call with a short prompt; apart from the minimal Claudebox completion, this is the only test that burns real tokens
To skip the integration test, pass `bash test.sh` an explicit list of test names that leaves out `test_integration_*`.
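Skipping the integration tests amounts to handing every other test name to `test.sh`. A minimal sketch of building that list follows. It assumes that `bash test.sh --help` prints each test name as a whole word starting with `test_`; the real help format is not shown here, so adjust the pattern if it differs. `select_non_integration_tests` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Read arbitrary text on stdin and print the test names it contains,
# one per line, sorted and de-duplicated, minus the integration tests.
# Assumption: names are whole words of the form test_<lowercase, digits, _>.
select_non_integration_tests() {
  grep -owE 'test_[a-z0-9_]+' | grep -v '^test_integration_' | sort -u
}

# Example usage (relies on the assumed --help output format):
#   bash test.sh $(bash test.sh --help | select_non_integration_tests)
```

The command substitution is left unquoted on purpose so that each name becomes a separate argument to `test.sh`.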