Skip to content

feat(models): add claude-fable-5 model support#3592

Open
juanmichelini wants to merge 4 commits into
mainfrom
openhands/add-claude-fable-5-model
Open

feat(models): add claude-fable-5 model support#3592
juanmichelini wants to merge 4 commits into
mainfrom
openhands/add-claude-fable-5-model

Conversation

@juanmichelini

@juanmichelini juanmichelini commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Description

Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines.

Changes

  • Added claude-fable-5 model to the MODELS dictionary in resolve_model_config.py
  • Model configuration includes:
    • id: claude-fable-5
    • display_name: Claude Fable 5
    • llm_config: model = litellm_proxy/anthropic/claude-fable-5
    • Note: temperature is omitted because the model rejects it (temperature is deprecated for this model), matching claude-opus-4-7/claude-opus-4-8

References

Testing

All existing tests pass, and the new model configuration validates correctly with Pydantic.

Issue

Fixes #3591

Checklist

  • I've added appropriate tests to verify the changes
  • I've run the relevant tests and they pass
  • I've followed the repository's guidelines for adding new models

@juanmichelini can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:efa47ce-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-efa47ce-python \
  ghcr.io/openhands/agent-server:efa47ce-python

All tags pushed for this build

ghcr.io/openhands/agent-server:efa47ce-golang-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang-amd64
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:efa47ce-golang-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang-arm64
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:efa47ce-java-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java-amd64
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:efa47ce-java-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java-arm64
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:efa47ce-python-amd64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python-amd64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python-amd64
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:efa47ce-python-arm64
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python-arm64
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python-arm64
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:efa47ce-golang
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-golang
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-golang
ghcr.io/openhands/agent-server:efa47ce-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:efa47ce-java
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-java
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-java
ghcr.io/openhands/agent-server:efa47ce-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:efa47ce-python
ghcr.io/openhands/agent-server:efa47ce29573bdd029cd6c44f440a2b4985c89e1-python
ghcr.io/openhands/agent-server:openhands-add-claude-fable-5-model-python
ghcr.io/openhands/agent-server:efa47ce-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., efa47ce-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., efa47ce-python-amd64) are also available if needed

Add support for Claude Fable 5 model following ADDINGMODELS.md guidelines.
The model is added to the MODELS dictionary in resolve_model_config.py.

References:
- https://platform.claude.com/docs/en/about-claude/models/overview
- https://www.anthropic.com/news/claude-fable-5-mythos-5

This change adds the model without modifying existing entries.
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@juanmichelini juanmichelini requested review from neubig and xingyaoww June 9, 2026 19:53
@juanmichelini juanmichelini marked this pull request as ready for review June 9, 2026 19:53

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ QA Report: FAIL

The new model entry is recognized by the resolver, but a real resolver/preflight run still aborts because the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name.

Does this PR achieve its stated goal?

No. The stated goal is to add usable claude-fable-5 model support for eval runs; the PR does move the behavior from “unknown model ID” to “resolved model,” but exercising the actual eval resolver with MODEL_IDS=claude-fable-5 fails during the required preflight check. As a result, a real user still cannot run an eval with this model from this configuration.

Phase Result
Environment Setup make build completed and installed the uv environment successfully.
CI Status 🟡 gh pr checks reported 20 successful, 1 skipped, 8 pending, and 0 failing checks at QA time.
Functional Verification ❌ Base rejects the model as expected; PR recognizes it but aborts with proxy invalid-model error.
Functional Verification

Test 1: Resolve and preflight claude-fable-5

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main and ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASELINE_RC:1

This confirms the previous user-facing behavior: the resolver had no claude-fable-5 entry.

Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.

Step 3 — Re-run with the fix in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5...
(2.1s)
  ✗ Claude Fable 5: Bad request - litellm.BadRequestError: Litellm_proxyException - /chat/completions: Invalid model name passed in model=anthropic/claude-fable-5. Call `/v1/models` to view available models for your key.
--------------------------------------------------
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.

ERROR: Preflight LLM check failed
PR_RC:1

This shows the new entry is picked up, but the actual user flow still fails before an eval can start because the proxy does not accept the configured model name.

Issues Found

  • 🔴 Blocker: claude-fable-5 is not usable end-to-end through the eval resolver yet; preflight aborts with Invalid model name passed in model=anthropic/claude-fable-5.

This review was created by an AI agent (OpenHands) on behalf of the user.

Comment thread .github/run-eval/resolve_model_config.py
@juanmichelini juanmichelini requested review from all-hands-bot and removed request for neubig and xingyaoww June 9, 2026 20:13

all-hands-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback


Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.


Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns)


Note on Functional Verification

A separate QA review (PR #4461987132) has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy rejects anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
  • This is a straightforward configuration addition with no architectural changes.
  • No breaking changes to existing functionality.
  • Follows established patterns in the codebase.

VERDICT:
Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.

KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🧪 Integration Tests Results

Overall Success Rate: 0.0%
Total Cost: $0.00
Models Tested: 1
Timestamp: 2026-06-09 21:39:52 UTC

📊 Summary

Model Overall Tests Passed Skipped Total Cost Tokens
litellm_proxy_anthropic_claude_fable_5 0.0% 0/17 2 19 $0.00 0

📋 Detailed Results

litellm_proxy_anthropic_claude_fable_5

  • Success Rate: 0.0% (0/17)
  • Total Cost: $0.00
  • Token Usage: 0
  • Run Suffix: litellm_proxy_anthropic_claude_fable_5_f2a8f44_claude_fable_5_run_N19_20260609_213908
  • Skipped Tests: 2

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.
  • c01_thinking_block_condenser: Model litellm_proxy/anthropic/claude-fable-5 does not support extended thinking (produces reasoning items instead of thinking blocks)

Failed Tests:

  • t03_jupyter_write_file: Test execution failed: Conversation run failed for id=0612571f-bc59-47f9-ac8f-3fccfe7ac681: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjpxpNwYc8RVmm4DAy"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjpxpNwYc8RVmm4DAy"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t04_git_staging: Test execution failed: Conversation run failed for id=326252bb-b1ea-46fb-808a-e7f70a82b1a0: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjq32fwEkhWro9K2TV"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjq32fwEkhWro9K2TV"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • c05_size_condenser: Test execution failed: Conversation run failed for id=121d661e-1c42-45d2-9f95-1dfba3610ded: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjvg7B7tod42qcAWrT"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQjvg7B7tod42qcAWrT"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • c03_delayed_condensation: Test execution failed: Conversation run failed for id=fc6cc9c4-4942-4c57-acd9-3b52d796c4dd: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk1Nt2F4JJBM74Y4Go"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk1Nt2F4JJBM74Y4Go"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t06_github_pr_browsing: Test execution failed: Conversation run failed for id=1e256c55-8d77-47b2-8fdb-c98f44180cf7: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2GxRxudfb8wsAwmH"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2GxRxudfb8wsAwmH"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • b01_no_premature_implementation: Test execution failed: Conversation run failed for id=61bda10a-3690-4129-a17d-cbbabd4eb64a: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2YKyR4RSULiJ5dpn"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk2YKyR4RSULiJ5dpn"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t05_simple_browsing: Test execution failed: Conversation run failed for id=d2b4d17a-55b8-4c6f-a501-81cb6ad7b462: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk3YrSisCsW4DfUN1d"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk3YrSisCsW4DfUN1d"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t07_interactive_commands: Test execution failed: Conversation run failed for id=13e4b382-52f1-4128-af08-cf0f255c294e: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk8TmohUG8c5NVdRYx"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk8TmohUG8c5NVdRYx"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t02_add_bash_hello: Test execution failed: Conversation run failed for id=51984e74-7d40-42a8-b7a1-7482c381e49c: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk85hnbP5HpnhrTSga"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQk85hnbP5HpnhrTSga"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t01_fix_simple_typo: Test execution failed: Conversation run failed for id=1a48c3a5-9973-4177-8ab4-64b7295ec89b: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkCUdGm98ao35ggee4"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkCUdGm98ao35ggee4"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • c04_token_condenser: Test execution failed: Conversation run failed for id=45103a50-d0a3-4060-8334-b8dbbf3da282: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkFUx3Jd6Yxe2vc6QF"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkFUx3Jd6Yxe2vc6QF"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • b02_no_oververification: Test execution failed: Conversation run failed for id=8d866b19-ed02-4bf7-a7d5-7717c52a78fc: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJ11X6xETxibv3N3X"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJ11X6xETxibv3N3X"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • c02_hard_context_reset: Test execution failed: Conversation run failed for id=37caf95d-6914-454c-9924-2271d323a672: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJpsNWWizNmSLGfqj"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkJpsNWWizNmSLGfqj"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • t09_invoke_skill: Test execution failed: Conversation run failed for id=726fe509-2e51-4e84-994f-a8461ae279d6: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPJ15TUcYFeYo7rKs"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPJ15TUcYFeYo7rKs"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • b04_each_tool_call_has_a_concise_explanation: Test execution failed: Conversation run failed for id=bbb30ddd-b3b0-41fa-9781-0ff234204ac9: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPXtKMrVVcbzDRU5y"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkPXtKMrVVcbzDRU5y"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • b03_no_useless_backward_compatibility: Test execution failed: Conversation run failed for id=7f16ae9a-e3bc-46a1-82d5-7688813273f1: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkXX74Rw6pwtD27uKd"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkXX74Rw6pwtD27uKd"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)
  • b05_do_not_create_redundant_files: Test execution failed: Conversation run failed for id=2c379472-e38b-4425-a7f2-8a5d7be99b91: litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkb9dya21nEugoGuPz"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}]. Received Model Group=anthropic/claude-fable-5\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"temperature is deprecated for this model."},"request_id":"req_011CbtQkb9dya21nEugoGuPz"}No fallback model group found for original model_group=anthropic/claude-fable-5. Fallbacks=[{'minimax-m2.5': ['minimax-m2.5-api']}] LiteLLM Retried: 3 times', 'type': None, 'param': None, 'code': '400'}} (Cost: $0.00)

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ QA Report: FAIL

The new model ID resolves on the PR branch, but the actual resolver preflight rejects the configured Claude Fable 5 request, so the model is not usable end-to-end.

Does this PR achieve its stated goal?

No. The stated goal was to add support for claude-fable-5; compared with main, the PR does make the model ID recognizable, but running the real resolver with MODEL_IDS=claude-fable-5 fails during the LLM preflight call. The proxy reaches Anthropic and returns invalid_request_error because temperature is deprecated for this model, so evaluations would still abort before running.

Phase Result
Environment Setup make build completed successfully
CI Status 🟡 33 successful, 3 skipped, 1 pending, 0 failing when checked
Functional Verification ❌ New model resolves but fails real preflight execution
Functional Verification

Test 1: Resolve and preflight claude-fable-5

Step 1 — Reproduce / establish baseline without the PR:
Ran git switch --detach origin/main && MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

ERROR: Model ID 'claude-fable-5' not found. Available models: ...
EXIT:1

This confirms the baseline behavior: users could not select claude-fable-5 at all.

Step 2 — Apply the PR's changes:
Checked out PR commit f2a8f440871c2d59452fec56f8142b19c455f05f.

Step 3 — Re-run with the PR in place:
Ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
  Checking Claude Fable 5...
  ✗ Claude Fable 5: Bad request - litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"`temperature` is deprecated for this model."}, ... }}}
✗ Some models failed preflight check
Evaluation aborted to avoid wasting compute resources.

ERROR: Preflight LLM check failed
EXIT:1

This shows the PR only fixes lookup. The real user-facing flow still fails because the configured request includes temperature for a model that rejects it.

Issues Found

  • 🟠 Issue: claude-fable-5 is not actually usable through the resolver because preflight fails with Anthropic's temperature is deprecated for this model error.

This review was created by an AI agent (OpenHands) on behalf of the user.

"display_name": "Claude Fable 5",
"llm_config": {
"model": "litellm_proxy/anthropic/claude-fable-5",
"temperature": 0.0,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: I exercised the resolver as a user would with MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py. The PR now resolves the model ID, but the real preflight call fails with Anthropic invalid_request_error: temperature is deprecated for this model. As configured, this model cannot actually be used by the eval resolver; please remove or adjust the temperature override for this model and re-run the preflight.

@juanmichelini

Copy link
Copy Markdown
Collaborator Author

@OpenHands notice that model is now set up in the proxy see image. but it's failing preflight and tests, please fix it.

image

@openhands-ai

openhands-ai Bot commented Jun 9, 2026

Copy link
Copy Markdown

I'm on it! juanmichelini can track my progress at all-hands.dev

The Anthropic API rejects 'temperature' on claude-fable-5
('temperature is deprecated for this model'), causing preflight and
all integration tests to fail with a 400. Match claude-opus-4-7/4-8
by omitting the temperature field.

Co-authored-by: openhands <openhands@all-hands.dev>

Copy link
Copy Markdown
Collaborator Author

Thanks @juanmichelini — fixed in efa47ce.

The integration-test failures were all the same Anthropic 400: temperature is deprecated for this model. Other newer Claude entries in resolve_model_config.py (claude-opus-4-7, claude-opus-4-8) already drop temperature for the same reason, so I removed the temperature: 0.0 override from the claude-fable-5 config to match. Preflight + tests should clear once the proxy accepts the request.

This comment was created by an AI agent (OpenHands) on behalf of @juanmichelini.

@openhands-ai

openhands-ai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Summary

Addressed @juanmichelini's PR comment on #3592 requesting a fix for the failing preflight + integration tests on the claude-fable-5 model.

Checklist

  • Diagnosed root cause — All 17 failing integration tests returned the same Anthropic 400: temperature is deprecated for this model. The claude-fable-5 entry in .github/run-eval/resolve_model_config.py was passing temperature: 0.0.
  • Applied minimal fix — Removed only the temperature: 0.0 line from the claude-fable-5 llm_config, matching how the other newer Claude entries (claude-opus-4-7, claude-opus-4-8) are already configured.
  • Committed and pushed to openhands/add-claude-fable-5-model as commit efa47ce2 with a Co-authored-by trailer.
  • Replied on the PR explaining the root cause and pointing at the fix commit, with the required AI-agent disclosure.

Conciseness

The diff is a single-line deletion in one file — no extraneous changes, no new tests or unrelated edits. Nothing to revert.

CI (preflight + integration tests) will re-run automatically against the proxy; no further action is needed from me unless the next run surfaces a different failure.

all-hands-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback


Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.


Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns)


Note on Functional Verification

A separate QA review has already been posted noting that while this config entry is recognized by the resolver, the LiteLLM proxy may reject anthropic/claude-fable-5 as an invalid model name during preflight checks. This is a backend/proxy configuration issue outside the scope of this PR - the local config change is correct.


[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW
  • This is a straightforward configuration addition with no architectural changes.
  • No breaking changes to existing functionality.
  • Follows established patterns in the codebase.

VERDICT:
Worth merging: Code is clean and follows existing patterns. The functional issue noted in the separate QA review is a proxy configuration concern, not a code issue in this PR.

KEY INSIGHT:
Simple dictionary entry addition following established conventions - no issues found.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ QA Report: PASS WITH ISSUES

Claude Fable 5 is now accepted by the eval model resolver and passed a real proxy preflight, with one minor config-output mismatch against the PR description.

Does this PR achieve its stated goal?

Yes, for the core goal of adding claude-fable-5 model support. On the base branch, resolving MODEL_IDS=claude-fable-5 failed with “Model ID not found”; on the PR branch, the same real resolver invocation succeeded, reached the LLM proxy, and passed preflight for Claude Fable 5. One minor issue remains: the actual generated models_json does not include the explicit temperature: 0.0 that the PR description says was added.

Phase Result
Environment Setup make build completed and installed the uv-managed environment
CI Status ⚠️ 32 successful, 1 failing review-thread gate, 1 pending QA check, 1 skipped when checked
Functional Verification ✅ New model resolves, emits GitHub Action output, and passes live proxy preflight
Functional Verification

Test 1: New model is rejected before the PR and works after the PR

Step 1 — Reproduce / establish baseline (without the fix):
Checked out origin/main, then ran MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py:

=== BASE: MODEL_IDS=claude-fable-5 ===
ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking
BASE_RC=1

This confirms the previous user-facing resolver behavior did not support the requested model ID.

Step 2 — Apply the PR's changes:
Checked out openhands/add-claude-fable-5-model at commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.

Step 3 — Re-run with the fix in place:
Ran tmp=$(mktemp); GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp":

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (4.9s)
  ✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check

HEAD_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]

This shows the new model is now resolved, the live proxy accepts it, and GitHub Action output is emitted for downstream eval jobs.

Test 2: New model works alongside an existing model selection

Ran GITHUB_OUTPUT=$tmp MODEL_IDS=claude-fable-5,claude-sonnet-4-6 uv run python .github/run-eval/resolve_model_config.py; cat "$tmp" on the PR branch:

Resolved 2 model(s): claude-fable-5, claude-sonnet-4-6

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 2 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (5.6s)
  ✓ Claude Fable 5: OK
  Checking Claude Sonnet 4.6... (1.2s)
  ✓ Claude Sonnet 4.6: OK
--------------------------------------------------
✓ All 2 model(s) passed preflight check

MULTI_RC=0
=== GITHUB_OUTPUT ===
models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}},{"id":"claude-sonnet-4-6","display_name":"Claude Sonnet 4.6","llm_config":{"model":"litellm_proxy/anthropic/claude-sonnet-4-6","temperature":0.0}}]

This verifies the newly added ID can be used in a realistic multi-model eval selection without breaking an existing model path.

Issues Found

  • 🟡 Minor: The PR description says the new model config includes temperature = 0.0, but the actual models_json emitted for claude-fable-5 only contains the model field. Core model support still works.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Verdict: PASS WITH ISSUES

"id": "claude-fable-5",
"display_name": "Claude Fable 5",
"llm_config": {
"model": "litellm_proxy/anthropic/claude-fable-5",

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Functional QA showed the generated models_json for claude-fable-5 contains llm_config with only the model field, while the PR description says this new config includes temperature = 0.0. The model resolves and preflights successfully, so this does not block the core model-support goal, but please either add the explicit temperature or update the PR description if the default is intentional.

This inline QA finding was created by an AI agent (OpenHands) on behalf of the user.

all-hands-bot commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Review complete.

This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here.

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Taste Rating: 🟢 Good taste - Elegant, simple solution → Just approve, don't manufacture feedback


Summary

This PR adds a new model configuration entry for claude-fable-5 following the established pattern used throughout the MODELS dictionary. The change is minimal, clean, and follows existing conventions.


Analysis

[CRITICAL ISSUES] - None found

[IMPROVEMENT OPPORTUNITIES] - None found

[STYLE NOTES] - None (code follows existing patterns exactly)


Verification

  • ✅ Model configuration follows the established pattern (id, display_name, llm_config)
  • ✅ Correctly omits temperature per the PR description (model rejects it)
  • ✅ Matches claude-opus-4-7/claude-opus-4-8 pattern for models without temperature
  • ✅ Includes reference link to Anthropic announcement
  • ✅ CI: 20 checks passing, build artifacts published
  • ✅ QA: Most recent QA report shows PASS WITH ISSUES (preflight succeeds)

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟢 LOW

This is a simple, isolated configuration change with no breaking changes. The implementation follows existing patterns and has passed QA validation.


VERDICT:
Worth merging: Core logic is sound, follows conventions, QA validated

KEY INSIGHT:
Minimal, well-scoped change adding one model entry without side effects — exactly the kind of PR that doesn't need extensive review.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation

@all-hands-bot all-hands-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Exercised the eval model resolver as a real workflow user would; claude-fable-5 is now accepted, emitted to GITHUB_OUTPUT, and passes live LLM proxy preflight.

Does this PR achieve its stated goal?

Yes. The goal was to add claude-fable-5 support to the eval model resolver. On base main, resolving MODEL_IDS='claude-fable-5' failed with “Model ID 'claude-fable-5' not found”; on the PR commit, the same resolver run succeeded, contacted the OpenHands LLM proxy, preflighted Claude Fable 5 successfully, and wrote the expected model config without temperature to GITHUB_OUTPUT.

Phase Result
Environment Setup make build completed successfully and installed the project environment.
CI Status ⚠️ Product/test/build checks shown by gh pr checks are green, but PR Description Check and Review Thread Gate are failing and this QA check was in progress at the time checked.
Functional Verification ✅ Base rejects the model; PR resolves it and live preflight passes.
Functional Verification

Test 1: Resolve claude-fable-5 through the actual eval resolver

Step 1 — Reproduce / establish baseline without the fix:
Checked out origin/main at a8dad1b0, then ran:

MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.py

Output:

ERROR: Model ID 'claude-fable-5' not found. Available models: claude-4.5-opus, claude-4.6-opus, claude-opus-4-7, claude-opus-4-8, claude-sonnet-4-5-20250929, claude-sonnet-4-6, converse-nemotron-super-3-120b, deepseek-v3.2-reasoner, deepseek-v4-flash, deepseek-v4-pro, gemini-3-flash, gemini-3.1-pro, gemini-3.5-flash, glm-4.7, glm-5, glm-5.1, gpt-5-3-codex, gpt-5.2, gpt-5.2-codex, gpt-5.2-high-reasoning, gpt-5.4, gpt-5.5, gpt-oss-120b, gpt-oss-20b, kimi-k2-thinking, kimi-k2.5, kimi-k2.6, minimax-m2, minimax-m2.1, minimax-m2.5, minimax-m2.7, minimax-m3, nemotron-3-nano-30b, nemotron-3-super-120b-a12b, nemotron-3-ultra-550b-a55b, nemotron-3-ultra-550b-a55b-or-paid, qwen-3-coder, qwen3-coder-30b-a3b-instruct, qwen3-coder-next, qwen3-max-thinking, qwen3.5-flash, qwen3.6-plus, step-3.7-flash, trinity-large-thinking

This confirms the pre-PR user-facing behavior: the resolver cannot select claude-fable-5 for eval runs.

Step 2 — Apply the PR's changes:
Checked out PR commit efa47ce29573bdd029cd6c44f440a2b4985c89e1.

Step 3 — Re-run with the fix in place:
Ran the same resolver flow, this time with GITHUB_OUTPUT set to emulate the GitHub Actions consumer:

rm -f /tmp/qa-resolve-output.txt
GITHUB_OUTPUT=/tmp/qa-resolve-output.txt MODEL_IDS='claude-fable-5' uv run python .github/run-eval/resolve_model_config.py
cat /tmp/qa-resolve-output.txt

Output:

Resolved 1 model(s): claude-fable-5

Checking proxy connectivity: https://llm-proxy.app.all-hands.dev
✓ Proxy reachable at https://llm-proxy.app.all-hands.dev

Preflight LLM check for 1 model(s)...
--------------------------------------------------
  Checking Claude Fable 5... (5.5s)
  ✓ Claude Fable 5: OK
--------------------------------------------------
✓ All 1 model(s) passed preflight check

models_json=[{"id":"claude-fable-5","display_name":"Claude Fable 5","llm_config":{"model":"litellm_proxy/anthropic/claude-fable-5"}}]

This verifies the changed behavior end-to-end: the resolver accepts the new model ID, emits the downstream workflow JSON, omits temperature in the emitted config, and the live proxy accepts the model during preflight.

Issues Found

None from functional QA.

This review was created by an AI agent (OpenHands) on behalf of the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Following ADDINGMODELS.md add claude-fable-5

3 participants