Pull request overview
This PR moves LLM configuration and execution concerns “up” into LMI by introducing typed, opinionated config models (ModelSpec, LLMConfig), shifting retry/fallback behavior into LMI (with clearer error semantics), and updating LDP agents/modules/tests to consume the new config shape.
Changes:
- Replace dict-shaped `llm_model` configuration with typed `llm_config` (`LLMConfig`/`ModelSpec`) across agents, graph modules, and tests.
- Introduce LMI-owned retry/fallback policy and new error surfaces (e.g., `AllModelsExhaustedError`, `ModelRefusalError`) with targeted unit tests.
- Update VCR cassettes to reflect new default request parameters (notably `max_tokens=4096`) and updated client/library versions.
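To make the shape of the migration concrete, here is a minimal sketch of what a typed config with a `coerce()` entry point can look like. Everything beyond the names `ModelSpec`, `LLMConfig`, and `LLMConfig.coerce()` (which the PR mentions) is an illustrative assumption, not LMI's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ModelSpec:
    # Field names other than `name` are assumptions for illustration.
    name: str
    timeout: float = 60.0
    num_retries: int = 3

@dataclass
class LLMConfig:
    models: list[ModelSpec] = field(default_factory=list)

    @classmethod
    def coerce(cls, value) -> "LLMConfig":
        # Accept an existing config or a bare model name; the real
        # coerce() also translates legacy dict-shaped configs.
        if isinstance(value, cls):
            return value
        if isinstance(value, str):
            return cls(models=[ModelSpec(name=value)])
        raise TypeError(f"Cannot coerce {type(value).__name__} to LLMConfig")
```

The point of `coerce()` is that call sites like `SimpleAgent(llm_config=...)` can keep accepting loose inputs while everything downstream operates on one typed object.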
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_rollouts.py | Updates fallback behavior assertions to expect AllModelsExhaustedError and uses llm_config legacy-coerce shape. |
| tests/test_optimizer.py | Migrates optimizer tests to LLMConfig.coerce() and llm_config= construction. |
| tests/test_ops.py | Switches op tests to LLMConfig inputs and updates gradient expectations when config becomes a scalar leaf. |
| tests/test_modules.py | Updates module configs/instantiation to use llm_config defaults from ReActAgent. |
| tests/test_envs.py | Updates agent construction to llm_config and error expectations to AllModelsExhaustedError. |
| tests/test_agents.py | Migrates agents to llm_config, updates serialization assertions and gradient assertions. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestSimpleAgent.test_agent_grad[gpt-4o-mini-2024-07-18].yaml | VCR update for new request shape (adds max_tokens) and updated client headers. |
| tests/cassettes/TestSimpleAgent.test_agent_grad[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens, tool schema changes). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_react_dummyenv[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_react_dummyenv[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[True-gpt-4-turbo].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestReActAgent.test_agent_grad[True-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestReActAgent.test_agent_grad[False-claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens) and payload adjustments. |
| tests/cassettes/TestNoToolsSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml | VCR update for new request shape (max_tokens). |
| tests/cassettes/TestMemoryAgent.test_agent_grad.yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent1].yaml | VCR update for new request shape (adds max_tokens). |
| tests/cassettes/TestAgentState.test_no_state_mutation[agent0].yaml | VCR update for new request shape (adds max_tokens). |
| src/ldp/graph/modules/thought.py | Switches module to accept LLMConfig instead of dict config. |
| src/ldp/graph/modules/reflect.py | Replaces dict llm_model field with LLMConfigField and updates wiring. |
| src/ldp/graph/modules/react.py | Uses LLMConfig.with_extra_params() to set stop sequences without mutating dicts. |
| src/ldp/graph/modules/llm_call.py | Updates parsed-call module to use LLMConfig and typed ConfigOp. |
| src/ldp/graph/common_ops.py | Changes LLMCallOp.forward() signature to accept LLMConfig and constructs LiteLLMModel via llm_config. |
| src/ldp/agent/tree_of_thoughts_agent.py | Migrates agent config field to LLMConfigField and updates call sites. |
| src/ldp/agent/simple_agent.py | Migrates agent config field to LLMConfigField and updates internal ConfigOp. |
| src/ldp/agent/react_agent.py | Migrates agent config field to LLMConfigField and updates module construction. |
| packages/lmi/tests/test_retry.py | Adds unit tests for retry/fallback classification and backoff bounds. |
| packages/lmi/tests/test_llms.py | Updates tests for new fallback error semantics, dispatch behavior, and Responses integration toggled per-model. |
| packages/lmi/tests/test_litellm_patches.py | Removes tests for the removed provider-400 retry patch. |
| packages/lmi/tests/test_dispatch.py | Adds end-to-end tests for LMI dispatch + retry/fallback loop (mocking litellm.acompletion). |
| packages/lmi/tests/test_cost_tracking.py | Simplifies tests by removing Router bypass paths and aligning to new config behavior. |
| packages/lmi/tests/test_config.py | Adds comprehensive tests for ModelSpec, legacy config translation, and LLMConfig.coerce / LLMConfigField. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_basic_call.yaml | Updates Responses API VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_multi_turn_stateful.yaml | Updates Responses API multi-turn VCR cassette. |
| packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_responses_api_off_ignores_response_id.yaml | Updates VCR cassette for non-Responses model behavior. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_max_token_truncation.yaml | Adds/updates VCR cassette for truncation behavior with max_tokens. |
| packages/lmi/tests/cassettes/TestLiteLLMModel.test_cost_call_single.yaml | Adds/updates VCR cassette for streaming cost tracking request shape. |
| packages/lmi/src/lmi/retry.py | Introduces centralized retry/fallback policy and jittered exponential backoff. |
| packages/lmi/src/lmi/litellm_patches.py | Removes the provider-400 retry patch and renumbers patch docs accordingly. |
| packages/lmi/src/lmi/exceptions.py | Adds ModelRefusalError and AllModelsExhaustedError. |
| packages/lmi/src/lmi/constants.py | Removes env-flag toggle for Responses API and keeps core constants. |
| packages/lmi/src/lmi/config.py | Adds typed ModelSpec/LLMConfig and legacy/dict coercion utilities. |
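The "jittered exponential backoff" noted for `retry.py` is a standard pattern; a generic full-jitter sketch (not LMI's actual code, and the parameter defaults are assumptions) looks like:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    # Full jitter: draw a uniform delay in [0, min(cap, base * 2**attempt)],
    # which decorrelates retries from concurrent callers hitting the same limit.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Full jitter trades a slightly longer expected delay for much better spread under contention, compared with adding a small jitter term to a fixed exponential schedule.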
> Applies: Gemini default safety settings; `temperature` / `max_tokens`
> defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
> providers (which don't support them). Explicit values in `overrides`
> always win over the defaults.
ModelSpec.from_name() docstring says it will “silently drop” logprobs / top_logprobs for non-OpenAI providers, but the implementation raises a ValueError instead. Please align the docstring with the actual behavior (or change the behavior) so callers know whether to expect coercion or an exception.
Suggested docstring, replacing the "silent drop" wording:

> Applies: Gemini default safety settings and `temperature` /
> `max_tokens` defaults. `logprobs` and `top_logprobs` are treated as
> OpenAI-only parameters: passing them for non-OpenAI providers raises
> `ValueError`. Explicit values in `overrides` always win over the
> defaults.
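If the behavior (rather than the docstring) is kept, the check being described could be as small as the following hypothetical helper (`check_openai_only` and its exact message are assumptions, not the PR's code):

```python
OPENAI_ONLY_PARAMS = frozenset({"logprobs", "top_logprobs"})

def check_openai_only(provider: str, overrides: dict) -> None:
    # Reject OpenAI-only sampling parameters for other providers,
    # matching the raise-ValueError behavior described in the review.
    bad = OPENAI_ONLY_PARAMS & overrides.keys()
    if provider != "openai" and bad:
        raise ValueError(
            f"{sorted(bad)} are OpenAI-only parameters (provider={provider!r})"
        )
```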
```python
params_by_name: dict[str, dict[str, Any]] = {
    m["model_name"]: dict(m.get("litellm_params", {})) for m in model_list
}

primary_name = model_list[0]["model_name"]
ordered: list[str] = [primary_name, *fallback_map.get(primary_name, [])]
for name in params_by_name:
    if name not in ordered:
        ordered.append(name)

router_kwargs = legacy.get("router_kwargs") or {}
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)

return cls(
    models=[
        _spec_from_legacy_params(
            params_by_name.get(name, {}),
            default_timeout=default_timeout,
            default_retries=default_retries,
        )
        for name in ordered
    ]
)
```
LLMConfig.from_legacy_dict() can pass an empty dict into _spec_from_legacy_params() when fallbacks references a model_name that isn’t present in model_list, which will then fail with a KeyError on params['model']. Consider validating that all referenced fallback model names exist (and raising a clear ValueError listing the missing names) before constructing the ordered chain.
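The suggested pre-validation could take roughly this shape (a sketch of the reviewer's suggestion; the helper name and message wording are assumptions):

```python
def validate_fallback_names(
    model_list: list[dict], fallbacks: dict[str, list[str]]
) -> None:
    # Every model name referenced as a fallback target must appear in
    # model_list; otherwise fail fast with the full list of missing names
    # instead of a KeyError deep inside _spec_from_legacy_params().
    known = {m["model_name"] for m in model_list}
    missing = sorted({
        name
        for targets in fallbacks.values()
        for name in targets
        if name not in known
    })
    if missing:
        raise ValueError(f"fallbacks reference unknown model names: {missing}")
```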
Bringing some functionality up from LiteLLM into LMI:

- Typed, opinionated config models (`ModelSpec`, `LLMConfig`), which get translated into litellm config dicts just in time.
- `ModelSpec.responses_api` lets us toggle on the Responses backend selectively. This means we can swap between the backends in a fallback cascade.

Notably, this drops all dependence on `litellm.Router` except for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:

- Configuration split between `model_list` and top-level kwargs.
- Configuration duplicated across the `SimpleAgent` / `LLMCallOp` / `LiteLLMModel` / `litellm` levels. Now, the first 3 all operate on `LLMConfig`.

Some proof that this is backwards compatible: no cassette churn besides from new/dropped tests.
litellm.Routerexcept for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:model_listor top-level kwargsSimpleAgent/LLMCallOp/LiteLLMModel/litellmlevels. Now, the first 3 all operate onLLMConfig.Some proof that this is backwards compatible: no cassette churn besides from new/dropped tests.