feat: add live LLM integration tests (label-triggered CI) by xingyaoww · Pull Request #624 · OpenHands/OpenHands-CLI

xingyaoww · 2026-03-31T13:32:15Z

What changed

Add a new test suite that runs the real CLI against a real LLM provider in `--headless` mode, validating the full end-to-end pipeline. Tests are opt-in and triggered by the `run-live-llm` PR label.

Why

All existing "conversation tests" use mock LLM servers with trajectory replay. This new layer validates that the CLI actually works end-to-end with a real LLM — covering LLM connectivity, tool-call parsing, terminal execution, observation routing, multi-step planning, code generation, and headless output.

New files

| File | Purpose |
|---|---|
| `tests/live_llm/conftest.py` | `--run-live-llm` pytest option, `run_cli` fixture (headless subprocess wrapper), JSON result reporting |
| `tests/live_llm/test_live_llm.py` | 3 test cases (see below) |
| `tests/live_llm/README.md` | Usage docs |
| `.github/workflows/live-llm-tests.yml` | CI workflow: `run-live-llm` label trigger, PR comment with results |
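The opt-in behaviour of the `--run-live-llm` option can be sketched with two standard pytest hooks. This is a minimal illustration, not the contents of the PR's `conftest.py`: the `LIVE_DIR_MARKER` constant, the skip reason, and the help text are assumptions, and the PR may gate the tests differently (for example with a custom marker).

```python
# Sketch of an opt-in gate for live tests (hypothetical; not the PR's actual conftest.py).

# Assumed path fragment used to recognise live tests by their node ID.
LIVE_DIR_MARKER = "tests/live_llm/"


def pytest_addoption(parser):
    """Register the opt-in flag with pytest's command-line parser."""
    parser.addoption(
        "--run-live-llm",
        action="store_true",
        default=False,
        help="Run tests that call a real LLM provider.",
    )


def pytest_collection_modifyitems(config, items):
    """Mark live tests as skipped unless --run-live-llm was passed."""
    if config.getoption("--run-live-llm"):
        return  # opted in: leave the collected items untouched

    import pytest  # imported lazily; only needed to build the skip marker

    skip_live = pytest.mark.skip(reason="needs --run-live-llm")
    for item in items:
        # Node IDs always use forward slashes, e.g. tests/live_llm/test_live_llm.py::...
        if LIVE_DIR_MARKER in item.nodeid:
            item.add_marker(skip_live)
```

With a gate of this shape, a plain `pytest tests/live_llm/` run reports the tests as skipped, and adding the flag runs them. Note that pytest only picks up `pytest_addoption` from a non-root `conftest.py` when that directory is among the paths given on the command line.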

Test cases

Three prompts were chosen to maximize component coverage with a minimal number of LLM calls:

| Test | Prompt | Components covered |
|---|---|---|
| `test_echo_command` | `echo 'hello from openhands'` | LLM connectivity → tool-call parsing → TerminalTool → observation → agent summary → `--override-with-envs` |
| `test_file_create_and_read` | Create check.txt + cat it back | Multi-step planning, file I/O (write → read), cross-turn observation correctness |
| `test_python_code_gen_and_run` | Write calc.py (2**10) + run it | Code generation, file creation, Python execution, numeric output |
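The pattern behind these tests (run a command as a subprocess, capture its output, check for an expected string) might look roughly like the sketch below. The names `CliResult`, `run_headless`, and `assert_cli_output` are hypothetical and are not the PR's `run_cli` fixture. The Python interpreter stands in for the CLI so that the example does not guess at how the real CLI accepts a prompt.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliResult:
    """Captured outcome of one subprocess run (hypothetical helper type)."""

    returncode: int
    stdout: str
    stderr: str


def run_headless(argv, timeout=120.0):
    """Run a command to completion and capture its output as text."""
    proc = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
        check=False,  # the caller inspects the return code itself
    )
    return CliResult(proc.returncode, proc.stdout, proc.stderr)


def assert_cli_output(result, expected):
    """Fail with a readable message if the run failed or lacks the expected text."""
    assert result.returncode == 0, f"exit code {result.returncode}: {result.stderr}"
    assert expected in result.stdout, f"{expected!r} not found in output"


if __name__ == "__main__":
    # Stand-in for the real CLI: compute 2**10, as the calc.py prompt asks the agent to do.
    demo = run_headless([sys.executable, "-c", "print(2**10)"])
    assert_cli_output(demo, "1024")
```

Checking for a substring rather than an exact match is a deliberate choice here: a real LLM's summary wording varies between runs, so only the deterministic part of the output (the echoed string or the computed number) is a stable thing to assert on.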

Verified

All 3 tests pass with anthropic/claude-haiku-4-5-20251001 via LiteLLM proxy:

tests/live_llm/test_live_llm.py::TestEchoCommand::test_echo_command PASSED
tests/live_llm/test_live_llm.py::TestFileCreateAndRead::test_file_create_and_read PASSED
tests/live_llm/test_live_llm.py::TestCodeGenAndExecution::test_python_code_gen_and_run PASSED
3 passed in 30.04s

Existing tests unaffected (1290 passed, 3 skipped):

uv run pytest --ignore=tests/snapshots  # 1290 passed, 3 skipped

Commands run

uv run ruff check tests/live_llm/ ✅
uv run ruff format tests/live_llm/ --check ✅
uv run pytest --ignore=tests/snapshots -q → 1290 passed, 3 skipped ✅
uv run pytest tests/live_llm/ -v → 3 skipped (no flag) ✅
uv run pytest tests/live_llm/ --run-live-llm -v → 3 passed ✅

Other changes

Makefile: add test-live-llm target
.gitignore: add .live-llm-results/
AGENTS.md: document live LLM test layer
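As a rough illustration of JSON result reporting for CI, the sketch below writes a per-run summary to a results directory. It assumes that the ignored `.live-llm-results/` directory is where such reports are written; the `write_results` function, the `results.json` file name, and the record fields are hypothetical and are not taken from the PR.

```python
import json
from pathlib import Path


def write_results(results, out_dir=".live-llm-results", filename="results.json"):
    """Write a pass/fail summary plus per-test records as JSON; return the file path.

    `results` is a list of dicts with at least "name" and "outcome" keys
    (an assumed record shape, chosen for this example).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory on first use

    summary = {
        "passed": sum(1 for r in results if r["outcome"] == "passed"),
        "failed": sum(1 for r in results if r["outcome"] == "failed"),
        "tests": results,
    }

    path = out / filename
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return path
```

A CI step could then read the file back and render it into a PR comment, which keeps the test run and the comment formatting decoupled.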

🚀 Try this PR

uvx --python 3.12 git+https://github.com/OpenHands/OpenHands-CLI.git@feat/live-llm-integration-tests

Add a new test suite that runs the real CLI against a real LLM provider in --headless mode. Tests are opt-in and designed to validate the full end-to-end pipeline with minimal, high-coverage prompts.

New files:
- tests/live_llm/conftest.py: fixtures (run_cli), pytest --run-live-llm option, JSON result reporting for CI
- tests/live_llm/test_live_llm.py: 3 tests:
  1. test_echo_command: LLM → tool-call → terminal → observation → summary
  2. test_file_create_and_read: multi-step planning + file I/O
  3. test_python_code_gen_and_run: code gen + file creation + Python execution
- .github/workflows/live-llm-tests.yml: triggered by 'run-live-llm' PR label or manual dispatch; posts results as sticky PR comment
- tests/live_llm/README.md: usage docs

Also:
- Makefile: add test-live-llm target
- .gitignore: add .live-llm-results/
- AGENTS.md: document the new test layer

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-31T13:34:29Z

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| TOTAL | 6631 | 906 | 86% | |

report-only-changed-files is enabled. No files were changed during this commit :)

xingyaoww wants to merge 1 commit into main from feat/live-llm-integration-tests
