feat(models): add claude-fable-5 model support#3592
Conversation
Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines. The model is added to the MODELS dictionary in resolve_model_config.py. References: - https://platform.claude.com/docs/en/about-claude/models/overview - https://www.anthropic.com/news/claude-fable-5-mythos-5 This change adds the model without modifying existing entries.
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
❌ QA Report: FAIL
The new model entry is recognized by the resolver, but a real resolver/preflight run still aborts because the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name.
Does this PR achieve its stated goal?
No. The stated goal is to add usable claude-fable-5 model support for eval runs; the PR does move the behavior from “unknown model ID” to “resolved model,” but exercising the actual eval resolver with MODEL_IDS=claude-fable-5 fails during the required preflight check. As a result, a real user still cannot run an eval with this model from this configuration.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv environment successfully. |
| CI Status | 🟡 gh pr checks reported 20 successful, 1 skipped, 8 pending, and 0 failing checks at QA time. |
| Functional Verification | ❌ Base rejects the model as expected; PR recognizes it but aborts with proxy invalid-model error. |
Functional Verification
Test 1: Resolve and preflight claude-fable-5
Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:
ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASELINE_RC:1
This confirms the previous user-facing behavior: the resolver had no claude-fable-5 entry.
Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.
Step 3 — Re-run with the fix in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:
Resolved 1 model(s): claude-fable-5
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 1 model(s)...
--------------------------------------------------
Checking Claude Fable 5...
(2.1s)
✗ Claude Fable 5: Bad request - litellm.BadRequestError: Litellm_proxyException - /chat/completions: Invalid model name passed in model=anthropic/claude-fable-5. Call `/v1/models` to view available models for your key.
--------------------------------------------------
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.
ERROR: Preflight LLM check failed
PR_RC:1
This shows the new entry is picked up, but the actual user flow still fails before an eval can start because the proxy does not accept the configured model name.
Issues Found
- 🔴 Blocker:
claude-fable-5is not usable end-to-end through the eval resolver yet; preflight aborts withInvalid model name passed in model=anthropic/claude-fable-5.
This review was created by an AI agent (OpenHands) on behalf of the user.
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review
Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback
Summary
This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.
Analysis
[CRITICAL ISSUES] - None found
[IMPROVEMENT OPPORTUNITIES] - None found
[STYLE NOTES] - None (code follows existing patterns)
Note on Functional Verification
A separate QA review (PR #4461987132) has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW - This is a straightforward configuration addition with no architectural changes.
- No breaking changes to existing functionality.
- Follows established patterns in the codebase.
VERDICT:
✅ Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.
KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
🧪 Integration Tests ResultsOverall Success Rate: 0.0% 📊 Summary
📋 Detailed Resultslitellm_proxy_anthropic_claude_fable_5
Skipped Tests:
Failed Tests:
|
all-hands-bot
left a comment
There was a problem hiding this comment.
❌ QA Report: FAIL
The new model ID resolves on the PR branch, but the actual resolver preflight rejects the configured Claude Fable 5 request, so the model is not usable end-to-end.
Does this PR achieve its stated goal?
No. The stated goal was to add support for claude-fable-5; compared with main, the PR does make the model ID recognizable, but running the real resolver with MODEL_IDS=claude-fable-5 fails during the LLM preflight call. The proxy reaches Anthropic and returns invalid_request_error because temperature is deprecated for this model, so evaluations would still abort before running.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully |
| CI Status | 🟡 33 successful, 3 skipped, 1 pending, 0 failing when checked |
| Functional Verification | ❌ New model resolves but fails real preflight execution |
Functional Verification
Test 1: Resolve and preflight claude-fable-5
Step 1 — Reproduce / establish baseline without the PR:
Ran git switch --detach origin/main && MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:
ERROR: Model ID 'claude-fable-5' not found. Available models: ...
EXIT:1
This confirms the baseline behavior: users could not select claude-fable-5 at all.
Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.
Step 3 — Re-run with the PR in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:
Resolved 1 model(s): claude-fable-5
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 1 model(s)...
Checking Claude Fable 5...
✗ Claude Fable 5: Bad request - litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"`temperature` is deprecated for this model."}, ... }}}
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.
ERROR: Preflight LLM check failed
EXIT:1
This shows the PR only fixes lookup. The real user-facing flow still fails because the configured request includes temperature for a model that rejects it.
Issues Found
- 🟠 Issue:
claude-fable-5is not actually usable through the resolver because preflight fails with Anthropic'stemperature is deprecated for this modelerror.
This review was created by an AI agent (OpenHands) on behalf of the user.
| "display_name": "Claude Fable 5", | ||
| "llm_config": { | ||
| "model": "litellm_proxy/anthropic/claude-fable-5", | ||
| "temperature": 0.0, |
There was a problem hiding this comment.
🟠 Important: I exercised the resolver as a user would with MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py. The PR now resolves the model ID, but the real preflight call fails with Anthropic invalid_request_error: temperature is deprecated for this model. As configured, this model cannot actually be used by the eval resolver; please remove or adjust the temperature override for this model and re-run the preflight.
|
@OpenHands notice that model is now set up in the proxy see image. but it's failing preflight and tests, please fix it.
|
|
I'm on it! juanmichelini can track my progress at all-hands.dev |
The Anthropic API rejects 'temperature' on claude-fable-5
('temperature is deprecated for this model'), causing preflight and
all integration tests to fail with a 400. Match claude-opus-4-7/4-8
by omitting the temperature field.
Co-authored-by: openhands <openhands@all-hands.dev>
|
Thanks @juanmichelini — fixed in efa47ce. The integration-test failures were all the same Anthropic 400: This comment was created by an AI agent (OpenHands) on behalf of @juanmichelini. |
SummaryAddressed @juanmichelini's PR comment on #3592 requesting a fix for the failing preflight + integration tests on the Checklist
ConcisenessThe diff is a single-line deletion in one file — no extraneous changes, no new tests or unrelated edits. Nothing to revert. CI (preflight + integration tests) will re-run automatically against the proxy; no further action is needed from me unless the next run surfaces a different failure. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review
Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback
Summary
This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.
Analysis
[CRITICAL ISSUES] - None found
[IMPROVEMENT OPPORTUNITIES] - None found
[STYLE NOTES] - None (code follows existing patterns)
Note on Functional Verification
A separate QA review has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy may reject anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW - This is a straightforward configuration addition with no architectural changes.
- No breaking changes to existing functionality.
- Follows established patterns in the codebase.
VERDICT:
✅ Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.
KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
Claude Fable 5 is now accepted by the eval model resolver and passed a real proxy preflight, with one minor config-output mismatch against the PR description.
Does this PR achieve its stated goal?
Yes, for the core goal of adding claude-fable-5 model support. On the base branch, resolving MODEL_IDS=claude-fable-5 failed with “Model ID not found”; on the PR branch, the same real resolver invocation succeeded, reached the LLM proxy, and passed preflight for Claude Fable 5. One minor issue remains: the actual generated models_json does not include the explicit temperature: 0.0 that the PR description says was added.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv-managed environment |
| CI Status | |
| Functional Verification | ✅ New model resolves, emits GitHub Action output, and passes live proxy preflight |
Functional Verification
Test 1: New model is rejected before the PR and works after the PR
Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main, then ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:
=== BASE: MODEL_IDS=claude-fable-5 ===
ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASE_RC=1
This confirms the previous user-facing resolver behavior did not support the requested model ID.
Step 2 — Apply the PR's changes:
Checked out openhands/add-claude-fable-5-model at commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.
Step 3 — Re-run with the fix in place:
Ran tmp=$(mktemp); GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp":
Resolved 1 model(s): claude-fable-5
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 1 model(s)...
--------------------------------------------------
Checking Claude Fable 5... (4.9s)
✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check
HEAD_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]
This shows the new model is now resolved, the live proxy accepts it, and GitHub Action output is emitted for downstream eval jobs.
Test 2: New model works alongside an existing model selection
Ran GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5,claude-sonnet-4-6 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp" on the PR branch:
Resolved 2 model(s): claude-fable-5, claude-sonnet-4-6
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 2 model(s)...
--------------------------------------------------
Checking Claude Fable 5... (5.6s)
✓ Claude Fable 5: OK
Checking Claude Sonnet 4.6... (1.2s)
✓ Claude Sonnet 4.6: OK
--------------------------------------------------
✓ All 2 model(s) passed preflight check
MULTI_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}},{"id":"claude-sonnet-4-6","display_name":"Claude Sonnet 4.6","llm_config":{"model":"litellm_proxy/anthropic/claude-sonnet-4-6","temperature":0.0}}]
This verifies the newly added ID can be used in a realistic multi-model eval selection without breaking an existing model path.
Issues Found
- 🟡 Minor: The PR description says the new model config includes
temperature = 0.0, but the actualmodels_jsonemitted forclaude-fable-5only contains themodelfield. Core model support still works.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
Verdict: PASS WITH ISSUES
| "id": "claude-fable-5", | ||
| "display_name": "Claude Fable 5", | ||
| "llm_config": { | ||
| "model": "litellm_proxy/anthropic/claude-fable-5", |
There was a problem hiding this comment.
🟡 Minor: Functional QA showed the generated models_json for claude-fable-5 contains llm_config with only the model field, while the PR description says this new config includes temperature = 0.0. The model resolves and preflights successfully, so this does not block the core model-support goal, but please either add the explicit temperature or update the PR description if the default is intentional.
This inline QA finding was created by an AI agent (OpenHands) on behalf of the user.
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Code Review
Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback
Summary
This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.
Analysis
[CRITICAL ISSUES] - None found
[IMPROVEMENT OPPORTUNITIES] - None found
[STYLE NOTES] - None (code follows existing patterns exactly)
Verification
- ✅ Model configuration follows the established pattern (id, display_name, llm_config)
- ✅ Correctly omits
temperatureper the PR description (model rejects it) - ✅ Matches
claude-opus-4-7/claude-opus-4-8pattern for models without temperature - ✅ Includes reference link to Anthropic announcement
- ✅ CI: 20 checks passing, build artifacts published
- ✅ QA: Most recent QA report shows PASS WITH ISSUES (preflight succeeds)
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This is a simple, isolated configuration change with no breaking changes. The implementation follows existing patterns and has passed QA validation.
VERDICT:
✅ Worth merging: Core logic is sound, follows conventions, QA validated
KEY INSIGHT:
Minimal, well-scoped change adding one model entry without side effects — exactly the kind of PR that doesn't need extensive review.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Exercised the eval model resolver as a real workflow user would; claude-fable-5 is now accepted, emitted to GITHUB_OUTPUT, and passes live LLM proxy preflight.
Does this PR achieve its stated goal?
Yes. The goal was to add claude-fable-5 support to the eval model resolver. On base main, resolving MODEL_IDS='claude-fable-5' failed with “Model ID 'claude-fable-5' not found”; on the PR commit, the same resolver run succeeded, contacted the OpenHands LLM proxy, preflighted Claude Fable 5 successfully, and wrote the expected model config without temperature to GITHUB_OUTPUT.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the project environment. |
| CI Status | gh pr checks are green, but PR Description Check and Review Thread Gate are failing and this QA check was in progress at the time checked. |
| Functional Verification | ✅ Base rejects the model; PR resolves it and live preflight passes. |
Functional Verification
Test 1: Resolve claude-fable-5 through the actual eval resolver
Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main at a8dad1b0, then ran:
MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.pyOutput:
ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
This confirms the pre-PR user-facing behavior: the resolver cannot select claude-fable-5 for eval runs.
Step 2 — Apply the PR's changes:
Checked out PR commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.
Step 3 — Re-run with the fix in place:
Ran the same resolver flow, this time with GITHUB_OUTPUT set to emulate the GitHub Actions consumer:
rm -f /tmp/qa-resolve-output.txt
GITHUB_OUTPUT=/tmp/qa-resolve-output.txt MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.py
cat /tmp/qa-resolve-output.txtOutput:
Resolved 1 model(s): claude-fable-5
Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev
Preflight LLM check for 1 model(s)...
--------------------------------------------------
Checking Claude Fable 5... (5.5s)
✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]
This verifies the changed behavior end-to-end: the resolver accepts the new model ID, emits the downstream workflow JSON, omits temperature in the emitted config, and the live proxy accepts the model during preflight.
Issues Found
None from functional QA.
This review was created by an AI agent (OpenHands) on behalf of the user.

Description
Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines.
Changes
claude-fable-5model to the MODELS dictionary inresolve_model_config.pyid: claude-fable-5display_name: Claude Fable 5llm_config: model = litellm_proxy/anthropic/claude-fable-5temperatureis omitted because the model rejects it (temperature is deprecated for this model), matchingclaude-opus-4-7/claude-opus-4-8References
Testing
All existing tests pass, and the new model configuration validates correctly with Pydantic.
Issue
Fixes #3591
Checklist
@juanmichelini can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:efa47ce-pythonRun
All tags pushed for this build
About Multi-Architecture Support
efa47ce-python) is a multi-arch manifest supporting both amd64 and arm64efa47ce-python-amd64) are also available if needed