feat: add live LLM integration tests (label-triggered CI) #624
Draft
Add a new test suite that runs the real CLI against a real LLM provider in --headless mode. Tests are opt-in and designed to validate the full end-to-end pipeline with minimal, high-coverage prompts.

New files:
- tests/live_llm/conftest.py: fixtures (run_cli), pytest --run-live-llm option, JSON result reporting for CI
- tests/live_llm/test_live_llm.py: 3 tests:
  1. test_echo_command: LLM → tool-call → terminal → observation → summary
  2. test_file_create_and_read: multi-step planning + file I/O
  3. test_python_code_gen_and_run: code gen + file creation + Python execution
- .github/workflows/live-llm-tests.yml: triggered by 'run-live-llm' PR label or manual dispatch; posts results as sticky PR comment
- tests/live_llm/README.md: usage docs

Also:
- Makefile: add test-live-llm target
- .gitignore: add .live-llm-results/
- AGENTS.md: document the new test layer

Co-authored-by: openhands <openhands@all-hands.dev>
What changed

Add a new test suite that runs the real CLI against a real LLM provider in `--headless` mode, validating the full end-to-end pipeline. Tests are opt-in and triggered by the `run-live-llm` PR label.

Why
All existing "conversation tests" use mock LLM servers with trajectory replay. This new layer validates that the CLI actually works end-to-end with a real LLM — covering LLM connectivity, tool-call parsing, terminal execution, observation routing, multi-step planning, code generation, and headless output.
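The end-to-end checks described above come down to launching a process in headless mode and asserting on what it prints. A minimal sketch of that pattern follows. It assumes only the standard library: the `CliResult` type and this `run_cli` signature are hypothetical and not the fixture from this PR, and a stand-in Python command replaces the real CLI so the sketch runs without an LLM key.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliResult:
    """Captured outcome of one CLI run (hypothetical helper type)."""

    returncode: int
    stdout: str
    stderr: str


def run_cli(argv: list[str], timeout: float = 120.0) -> CliResult:
    """Run a command to completion and capture its output as text.

    A live test would pass the real CLI's argv (including --headless);
    the exact arguments are not shown in this PR description.
    """
    proc = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # let the calling test assert on returncode itself
    )
    return CliResult(proc.returncode, proc.stdout, proc.stderr)


# Stand-in for the real CLI so the sketch runs without an LLM key.
result = run_cli([sys.executable, "-c", "print('hello from openhands')"])
assert result.returncode == 0
assert "hello from openhands" in result.stdout
```

Returning the exit code rather than raising on failure keeps the decision about what counts as a pass inside each test, which is useful when LLM output varies between runs.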
New files

- `tests/live_llm/conftest.py`: `--run-live-llm` pytest option, `run_cli` fixture (headless subprocess wrapper), JSON result reporting
- `tests/live_llm/test_live_llm.py`
- `tests/live_llm/README.md`
- `.github/workflows/live-llm-tests.yml`: `run-live-llm` label trigger, PR comment with results

Test cases
3 prompts chosen to maximize component coverage with minimal LLM calls:
- `test_echo_command`: `echo 'hello from openhands'`, `--override-with-envs`
- `test_file_create_and_read`
- `test_python_code_gen_and_run`

Verified
All 3 tests pass with `anthropic/claude-haiku-4-5-20251001` via LiteLLM proxy.

Existing tests unaffected (1290 passed, 3 skipped).
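For CI to post counts like these in a PR comment, the outcomes have to be written somewhere machine-readable. The sketch below shows one way to do that with the standard library. The `write_results` helper, the `results.json` file name, and the record fields are assumptions for illustration; only the `.live-llm-results/` directory name comes from this PR.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_results(records: list[dict], out_dir: Path) -> Path:
    """Write per-test outcomes plus a summary count to out_dir/results.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    summary = Counter(r["outcome"] for r in records)
    payload = {"summary": dict(summary), "tests": records}
    path = out_dir / "results.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


# Illustrative records only; not real output from this PR's CI run.
records = [
    {"name": "test_echo_command", "outcome": "passed"},
    {"name": "test_file_create_and_read", "outcome": "passed"},
    {"name": "test_python_code_gen_and_run", "outcome": "passed"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = write_results(records, Path(tmp) / ".live-llm-results")
    loaded = json.loads(path.read_text(encoding="utf-8"))
    assert loaded["summary"] == {"passed": 3}
```

A workflow step could then read the summary and render it into the comment body without parsing pytest's terminal output.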
Commands run

- `uv run ruff check tests/live_llm/` ✅
- `uv run ruff format tests/live_llm/ --check` ✅
- `uv run pytest --ignore=tests/snapshots -q` → 1290 passed, 3 skipped ✅
- `uv run pytest tests/live_llm/ -v` → 3 skipped (no flag) ✅
- `uv run pytest tests/live_llm/ --run-live-llm -v` → 3 passed ✅

Other changes
- `Makefile`: add `test-live-llm` target
- `.gitignore`: add `.live-llm-results/`
- `AGENTS.md`: document live LLM test layer
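The opt-in behaviour (live tests are skipped unless `--run-live-llm` is passed) can be sketched with two standard pytest hooks. This is a sketch under assumptions and may differ from this PR's `conftest.py`: the `live_llm` marker used to pick out the live tests is hypothetical, and the help and skip messages are made up for illustration.

```python
import pytest


def pytest_addoption(parser):
    """Register the opt-in flag (flag name taken from this PR)."""
    parser.addoption(
        "--run-live-llm",
        action="store_true",
        default=False,
        help="run tests that call a real LLM provider",
    )


def pytest_collection_modifyitems(config, items):
    """Skip tests marked live_llm unless --run-live-llm was passed."""
    # default=False avoids an error if the option was never registered.
    if config.getoption("--run-live-llm", default=False):
        return
    skip_live = pytest.mark.skip(reason="needs --run-live-llm")
    for item in items:
        # "live_llm" is a hypothetical marker name used for this sketch.
        if item.get_closest_marker("live_llm") is not None:
            item.add_marker(skip_live)
```

Filtering on a marker keeps the hook from skipping unrelated tests, since `pytest_collection_modifyitems` receives every collected item and not just those under the conftest's own directory.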