feat(models): add claude-fable-5 model support by juanmichelini · Pull Request #3592 · OpenHands/software-agent-sdk

juanmichelini · 2026-06-09T18:10:00Z

Description

Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines.

Changes

Added claude-fable-5 model to the MODELS dictionary in resolve_model_config.py
Model configuration includes:
- id: claude-fable-5
- display_name: Claude Fable 5
- llm_config: model = litellm_proxy/anthropic/claude-fable-5
- Note: temperature is omitted because the model rejects it (temperature is deprecated for this model), matching claude-opus-4-7/claude-opus-4-8

References

Testing

All existing tests pass, and the new model configuration validates correctly with Pydantic.

Issue

Fixes #3591

Checklist

I've added appropriate tests to verify the changes
I've run the relevant tests and they pass
I've followed the repository's guidelines for adding new models

@juanmichelini can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.13-nodejs22-slim`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:efa47ce-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-efa47ce-python \
  ghcr.io/openhands/agent-server:efa47ce-python

All tags pushed for this build

ghcr.io/openhands/agent-server:efa47ce-golang-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang-amd64
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:efa47ce-golang-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang-arm64
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:efa47ce-java-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java-amd64
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:efa47ce-java-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java-arm64
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:efa47ce-python-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python-amd64
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:efa47ce-python-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python-arm64
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:efa47ce-golang
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:efa47ce-java
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:efa47ce-python
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

Each variant tag (e.g., efa47ce-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., efa47ce-python-amd64) are also available if needed

Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines. The model is added to the MODELS dictionary in resolve_model_config.py. References: - https://platform.claude.com/docs/en/about-claude/models/overview - https://www.anthropic.com/news/claude-fable-5-mythos-5 This change adds the model without modifying existing entries.

github-actions · 2026-06-09T18:10:26Z

Python API breakage checks — ✅ PASSED

Result: ✅ PASSED

Action log

github-actions · 2026-06-09T18:10:45Z

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: ✅ PASSED

Action log

all-hands-bot

❌ QA Report: FAIL

The new model entry is recognized by the resolver, but a real resolver/preflight run still aborts because the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name.

Does this PR achieve its stated goal?

No. The stated goal is to add usable claude-fable-5 model support for eval runs; the PR does move the behavior from “unknown model ID” to “resolved model,” but exercising the actual eval resolver with MODEL_IDS=claude-fable-5 fails during the required preflight check. As a result, a real user still cannot run an eval with this model from this configuration.

Phase	Result
Environment Setup	✅ `make build` completed and installed the uv environment successfully.
CI Status	🟡 `gh pr checks` reported 20 successful, 1 skipped, 8 pending, and 0 failing checks at QA time.
Functional Verification	❌ Base rejects the model as expected; PR recognizes it but aborts with proxy invalid-model error.

Functional Verification

Test 1: Resolve and preflight `claude-fable-5`

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASELINE_RC:1

This confirms the previous user-facing behavior: the resolver had no claude-fable-5 entry.

Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.

Step 3 — Re-run with the fix in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5...
(2.1s)
  ✗ Claude Fable 5: Bad request - litellm.BadRequestError: Litellm_proxyException - /chat/completions: Invalid model name passed in model=anthropic/claude-fable-5. Call `/v1/models` to view available models for your key.
--------------------------------------------------
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.

ERROR: Preflight LLM check failed
PR_RC:1

This shows the new entry is picked up, but the actual user flow still fails before an eval can start because the proxy does not accept the configured model name.

Issues Found

🔴 Blocker: claude-fable-5 is not usable end-to-end through the eval resolver yet; preflight aborts with Invalid model name passed in model=anthropic/claude-fable-5.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-09T21:37:54Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback

Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.

Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns)

Note on Functional Verification

A separate QA review (PR #4461987132) has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW
This is a straightforward configuration addition with no architectural changes.
No breaking changes to existing functionality.
Follows established patterns in the codebase.

VERDICT:
✅ Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.

KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

github-actions · 2026-06-09T21:40:01Z

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 1
Timestamp: 2026-06-09 21:39:52 UTC

📊 Summary

Model	Overall	Tests Passed	Skipped	Total	Cost	Tokens
litellm_proxy_anthropic_claude_fable_5	0.0%	0/17	2	19	$0.00	0

📋 Detailed Results

litellm_proxy_anthropic_claude_fable_5

Success Rate: 0.0% (0/17)
Total Cost: $0.00
Token Usage: 0
Run Suffix: litellm_proxy_anthropic_claude_fable_5_f2a8f44_claude_fable_5_run_N19_20260609_213908
Skipped Tests: 2

Skipped Tests:

t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
c01_thinking_block_condenser: Model litellm_proxy/anthropic/claude-fable-5 does not support extended thinking (produces reasoning items instead of thinking blocks)

Failed Tests:

t03_jupyter_write_file: Test execution failed: Conversation run failed for id=0612571f-bc59-47f9-ac8f-3fccfe7ac681: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjpxpNwYc8RVmm4DAy"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjpxpNwYc8RVmm4DAy"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t04_git_staging: Test execution failed: Conversation run failed for id=326252bb-b1ea-46fb-808a-e7f70a82b1a0: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjq32fwEkhWro9K2TV"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjq32fwEkhWro9K2TV"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
c05_size_condenser: Test execution failed: Conversation run failed for id=121d661e-1c42-45d2-9f95-1dfba3610ded: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjvg7B7tod42qcAWrT"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjvg7B7tod42qcAWrT"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
c03_delayed_condensation: Test execution failed: Conversation run failed for id=fc6cc9c4-4942-4c57-acd9-3b52d796c4dd: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk1Nt2F4JJBM74Y4Go"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk1Nt2F4JJBM74Y4Go"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t06_github_pr_browsing: Test execution failed: Conversation run failed for id=1e256c55-8d77-47b2-8fdb-c98f44180cf7: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2GxRxudfb8wsAwmH"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2GxRxudfb8wsAwmH"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
b01_no_premature_implementation: Test execution failed: Conversation run failed for id=61bda10a-3690-4129-a17d-cbbabd4eb64a: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2YKyR4RSULiJ5dpn"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2YKyR4RSULiJ5dpn"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t05_simple_browsing: Test execution failed: Conversation run failed for id=d2b4d17a-55b8-4c6f-a501-81cb6ad7b462: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk3YrSisCsW4DfUN1d"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk3YrSisCsW4DfUN1d"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t07_interactive_commands: Test execution failed: Conversation run failed for id=13e4b382-52f1-4128-af08-cf0f255c294e: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk8TmohUG8c5NVdRYx"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk8TmohUG8c5NVdRYx"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t02_add_bash_hello: Test execution failed: Conversation run failed for id=51984e74-7d40-42a8-b7a1-7482c381e49c: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk85hnbP5HpnhrTSga"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk85hnbP5HpnhrTSga"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t01_fix_simple_typo: Test execution failed: Conversation run failed for id=1a48c3a5-9973-4177-8ab4-64b7295ec89b: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkCUdGm98ao35ggee4"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkCUdGm98ao35ggee4"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
c04_token_condenser: Test execution failed: Conversation run failed for id=45103a50-d0a3-4060-8334-b8dbbf3da282: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkFUx3Jd6Yxe2vc6QF"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkFUx3Jd6Yxe2vc6QF"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
b02_no_oververification: Test execution failed: Conversation run failed for id=8d866b19-ed02-4bf7-a7d5-7717c52a78fc: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJ11X6xETxibv3N3X"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJ11X6xETxibv3N3X"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
c02_hard_context_reset: Test execution failed: Conversation run failed for id=37caf95d-6914-454c-9924-2271d323a672: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJpsNWWizNmSLGfqj"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJpsNWWizNmSLGfqj"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
t09_invoke_skill: Test execution failed: Conversation run failed for id=726fe509-2e51-4e84-994f-a8461ae279d6: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPJ15TUcYFeYo7rKs"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPJ15TUcYFeYo7rKs"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
b04_each_tool_call_has_a_concise_explanation: Test execution failed: Conversation run failed for id=bbb30ddd-b3b0-41fa-9781-0ff234204ac9: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPXtKMrVVcbzDRU5y"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPXtKMrVVcbzDRU5y"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
b03_no_useless_backward_compatibility: Test execution failed: Conversation run failed for id=7f16ae9a-e3bc-46a1-82d5-7688813273f1: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkXX74Rw6pwtD27uKd"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkXX74Rw6pwtD27uKd"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
b05_do_not_create_redundant_files: Test execution failed: Conversation run failed for id=2c379472-e38b-4425-a7f2-8a5d7be99b91: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkb9dya21nEugoGuPz"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkb9dya21nEugoGuPz"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)

all-hands-bot

❌ QA Report: FAIL

The new model ID resolves on the PR branch, but the actual resolver preflight rejects the configured Claude Fable 5 request, so the model is not usable end-to-end.

Does this PR achieve its stated goal?

No. The stated goal was to add support for claude-fable-5; compared with main, the PR does make the model ID recognizable, but running the real resolver with MODEL_IDS=claude-fable-5 fails during the LLM preflight call. The proxy reaches Anthropic and returns invalid_request_error because temperature is deprecated for this model, so evaluations would still abort before running.

Phase	Result
Environment Setup	✅ `make build` completed successfully
CI Status	🟡 33 successful, 3 skipped, 1 pending, 0 failing when checked
Functional Verification	❌ New model resolves but fails real preflight execution

Functional Verification

Test 1: Resolve and preflight `claude-fable-5`

Step 1 — Reproduce / establish baseline without the PR:
Ran git switch --detach origin/main && MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

ERROR: Model ID 'claude-fable-5' not found. Available models: ...
EXIT:1

This confirms the baseline behavior: users could not select claude-fable-5 at all.

Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.

Step 3 — Re-run with the PR in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
  Checking Claude Fable 5...
  ✗ Claude Fable 5: Bad request - litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"`temperature` is deprecated for this model."}, ... }}}
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.

ERROR: Preflight LLM check failed
EXIT:1

This shows the PR only fixes lookup. The real user-facing flow still fails because the configured request includes temperature for a model that rejects it.

Issues Found

🟠 Issue: claude-fable-5 is not actually usable through the resolver because preflight fails with Anthropic's temperature is deprecated for this model error.

This review was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-09T21:42:11Z

+        "display_name": "Claude Fable 5",
+        "llm_config": {
+            "model": "litellm_proxy/anthropic/claude-fable-5",
+            "temperature": 0.0,


🟠 Important: I exercised the resolver as a user would with MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py. The PR now resolves the model ID, but the real preflight call fails with Anthropic invalid_request_error: temperature is deprecated for this model. As configured, this model cannot actually be used by the eval resolver; please remove or adjust the temperature override for this model and re-run the preflight.

juanmichelini · 2026-06-09T21:43:22Z

@OpenHands notice that model is now set up in the proxy see image. but it's failing preflight and tests, please fix it.

openhands-ai · 2026-06-09T21:43:37Z

I'm on it! juanmichelini can track my progress at all-hands.dev

The Anthropic API rejects 'temperature' on claude-fable-5 ('temperature is deprecated for this model'), causing preflight and all integration tests to fail with a 400. Match claude-opus-4-7/4-8 by omitting the temperature field. Co-authored-by: openhands <openhands@all-hands.dev>

juanmichelini · 2026-06-09T21:44:42Z

Thanks @juanmichelini — fixed in efa47ce.

The integration-test failures were all the same Anthropic 400: temperature is deprecated for this model. Other newer Claude entries in resolve_model_config.py (claude-opus-4-7, claude-opus-4-8) already drop temperature for the same reason, so I removed the temperature: 0.0 override from the claude-fable-5 config to match. Preflight + tests should clear once the proxy accepts the request.

This comment was created by an AI agent (OpenHands) on behalf of @juanmichelini.

openhands-ai · 2026-06-09T21:46:00Z

Summary

Addressed @juanmichelini's PR comment on #3592 requesting a fix for the failing preflight + integration tests on the claude-fable-5 model.

Checklist

Diagnosed root cause — All 17 failing integration tests returned the same Anthropic 400: temperature is deprecated for this model. The claude-fable-5 entry in .github/run-eval/resolve_model_config.py was passing temperature: 0.0.
Applied minimal fix — Removed only the temperature: 0.0 line from the claude-fable-5 llm_config, matching how the other newer Claude entries (claude-opus-4-7, claude-opus-4-8) are already configured.
Committed and pushed to openhands/add-claude-fable-5-model as commit efa47ce2 with a Co-authored-by trailer.
Replied on the PR explaining the root cause and pointing at the fix commit, with the required AI-agent disclosure.

Conciseness

The diff is a single-line deletion in one file — no extraneous changes, no new tests or unrelated edits. Nothing to revert.

CI (preflight + integration tests) will re-run automatically against the proxy; no further action is needed from me unless the next run surfaces a different failure.

all-hands-bot · 2026-06-09T22:30:56Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback

Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.

Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns)

Note on Functional Verification

A separate QA review has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy may reject anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW
This is a straightforward configuration addition with no architectural changes.
No breaking changes to existing functionality.
Follows established patterns in the codebase.

VERDICT:
✅ Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.

KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot

⚠️ QA Report: PASS WITH ISSUES

Claude Fable 5 is now accepted by the eval model resolver and passed a real proxy preflight, with one minor config-output mismatch against the PR description.

Does this PR achieve its stated goal?

Yes, for the core goal of adding claude-fable-5 model support. On the base branch, resolving MODEL_IDS=claude-fable-5 failed with “Model ID not found”; on the PR branch, the same real resolver invocation succeeded, reached the LLM proxy, and passed preflight for Claude Fable 5. One minor issue remains: the actual generated models_json does not include the explicit temperature: 0.0 that the PR description says was added.

Phase	Result
Environment Setup	✅ `make build` completed and installed the uv-managed environment
CI Status	⚠️ 32 successful, 1 failing review-thread gate, 1 pending QA check, 1 skipped when checked
Functional Verification	✅ New model resolves, emits GitHub Action output, and passes live proxy preflight

Functional Verification

Test 1: New model is rejected before the PR and works after the PR

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main, then ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

=== BASE: MODEL_IDS=claude-fable-5 ===
ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASE_RC=1

This confirms the previous user-facing resolver behavior did not support the requested model ID.

Step 2 — Apply the PR's changes:
Checked out openhands/add-claude-fable-5-model at commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.

Step 3 — Re-run with the fix in place:
Ran tmp=$(mktemp); GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp":

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (4.9s)
  ✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check

HEAD_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]

This shows the new model is now resolved, the live proxy accepts it, and GitHub Action output is emitted for downstream eval jobs.

Test 2: New model works alongside an existing model selection

Ran GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5,claude-sonnet-4-6 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp" on the PR branch:

Resolved 2 model(s): claude-fable-5, claude-sonnet-4-6

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 2 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (5.6s)
  ✓ Claude Fable 5: OK
  Checking Claude Sonnet 4.6... (1.2s)
  ✓ Claude Sonnet 4.6: OK
--------------------------------------------------
✓ All 2 model(s) passed preflight check

MULTI_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}},{"id":"claude-sonnet-4-6","display_name":"Claude Sonnet 4.6","llm_config":{"model":"litellm_proxy/anthropic/claude-sonnet-4-6","temperature":0.0}}]

This verifies the newly added ID can be used in a realistic multi-model eval selection without breaking an existing model path.

Issues Found

🟡 Minor: The PR description says the new model config includes temperature = 0.0, but the actual models_json emitted for claude-fable-5 only contains the model field. Core model support still works.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Verdict: PASS WITH ISSUES

all-hands-bot · 2026-06-09T22:35:37Z

+        "id": "claude-fable-5",
+        "display_name": "Claude Fable 5",
+        "llm_config": {
+            "model": "litellm_proxy/anthropic/claude-fable-5",


🟡 Minor: Functional QA showed the generated models_json for claude-fable-5 contains llm_config with only the model field, while the PR description says this new config includes temperature = 0.0. The model resolves and preflights successfully, so this does not block the core model-support goal, but please either add the explicit temperature or update the PR description if the default is intentional.

This inline QA finding was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot · 2026-06-09T22:42:28Z

✅ Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

all-hands-bot

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback

Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.

Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns exactly)

Verification

✅ Model configuration follows the established pattern (id, display_name, llm_config)
✅ Correctly omits temperature per the PR description (model rejects it)
✅ Matches claude-opus-4-7/claude-opus-4-8 pattern for models without temperature
✅ Includes reference link to Anthropic announcement
✅ CI: 20 checks passing, build artifacts published
✅ QA: Most recent QA report shows PASS WITH ISSUES (preflight succeeds)

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a simple, isolated configuration change with no breaking changes. The implementation follows existing patterns and has passed QA validation.

VERDICT:
✅ Worth merging: Core logic is sound, follows conventions, QA validated

KEY INSIGHT:
Minimal, well-scoped change adding one model entry without side effects — exactly the kind of PR that doesn't need extensive review.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

all-hands-bot

✅ QA Report: PASS

Exercised the eval model resolver as a real workflow user would; claude-fable-5 is now accepted, emitted to GITHUB_OUTPUT, and passes live LLM proxy preflight.

Does this PR achieve its stated goal?

Yes. The goal was to add claude-fable-5 support to the eval model resolver. On base main, resolving MODEL_IDS='claude-fable-5' failed with “Model ID 'claude-fable-5' not found”; on the PR commit, the same resolver run succeeded, contacted the OpenHands LLM proxy, preflighted Claude Fable 5 successfully, and wrote the expected model config without temperature to GITHUB_OUTPUT.

Phase	Result
Environment Setup	✅ `make build` completed successfully and installed the project environment.
CI Status	⚠️ Product/test/build checks shown by `gh pr checks` are green, but `PR Description Check` and `Review Thread Gate` are failing and this QA check was in progress at the time checked.
Functional Verification	✅ Base rejects the model; PR resolves it and live preflight passes.

Functional Verification

Test 1: Resolve `claude-fable-5` through the actual eval resolver

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main at a8dad1b0, then ran:

MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.py

Output:

ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking

This confirms the pre-PR user-facing behavior: the resolver cannot select claude-fable-5 for eval runs.

Step 2 — Apply the PR's changes:
Checked out PR commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.

Step 3 — Re-run with the fix in place:
Ran the same resolver flow, this time with GITHUB_OUTPUT set to emulate the GitHub Actions consumer:

rm -f /tmp/qa-resolve-output.txt
GITHUB_OUTPUT=/tmp/qa-resolve-output.txt MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.py
cat /tmp/qa-resolve-output.txt

Output:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (5.5s)
  ✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check

models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]

This verifies the changed behavior end-to-end: the resolver accepts the new model ID, emits the downstream workflow JSON, omits temperature in the emitted config, and the live proxy accepts the model during preflight.

Issues Found

None from functional QA.

This review was created by an AI agent (OpenHands) on behalf of the user.

juanmichelini requested review from neubig and xingyaoww June 9, 2026 19:53

juanmichelini marked this pull request as ready for review June 9, 2026 19:53

Merge branch 'main' into openhands/add-claude-fable-5-model

f2a8f44

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Comment thread .github/run-eval/resolve_model_config.py

juanmichelini requested review from all-hands-bot and removed request for neubig and xingyaoww June 9, 2026 20:13

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

juanmichelini requested a review from all-hands-bot June 9, 2026 22:30

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

juanmichelini requested a review from all-hands-bot June 9, 2026 22:41

all-hands-bot approved these changes Jun 9, 2026

View reviewed changes

all-hands-bot reviewed Jun 9, 2026

View reviewed changes

Merge branch 'main' into openhands/add-claude-fable-5-model

60f6029

Conversation

juanmichelini commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

References

Testing

Issue

Checklist

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python API breakage checks — ✅ PASSED

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

REST API breakage checks (OpenAPI) — ✅ PASSED

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

❌ QA Report: FAIL

Does this PR achieve its stated goal?

Test 1: Resolve and preflight claude-fable-5

Issues Found

Uh oh!

Uh oh!

all-hands-bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review

Summary

Analysis

Note on Functional Verification

Uh oh!

github-actions Bot commented Jun 9, 2026

🧪 Integration Tests Results

📊 Summary

📋 Detailed Results

litellm_proxy_anthropic_claude_fable_5

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

❌ QA Report: FAIL

Does this PR achieve its stated goal?

Test 1: Resolve and preflight claude-fable-5

Issues Found

Uh oh!

all-hands-bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

openhands-ai Bot commented Jun 9, 2026

Uh oh!

juanmichelini commented Jun 9, 2026

Uh oh!

openhands-ai Bot commented Jun 9, 2026

Summary

Checklist

Conciseness

Uh oh!

all-hands-bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Code Review

Summary

Analysis

Note on Functional Verification

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

⚠️ QA Report: PASS WITH ISSUES

Does this PR achieve its stated goal?

Test 1: New model is rejected before the PR and works after the PR

Test 2: New model works alongside an existing model selection

Issues Found

juanmichelini commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

Test 1: Resolve and preflight `claude-fable-5`

all-hands-bot commented Jun 9, 2026 •

edited

Loading

Test 1: Resolve and preflight `claude-fable-5`

all-hands-bot commented Jun 9, 2026 •

edited

Loading

all-hands-bot commented Jun 9, 2026 •

edited

Loading

Test 1: Resolve `claude-fable-5` through the actual eval resolver