Moving things into LMI #446

Open
sidnarayanan wants to merge 26 commits into main from lmi-fallbacks
Conversation

@sidnarayanan
Collaborator

@sidnarayanan sidnarayanan commented Apr 18, 2026

Bringing some functionality up from LiteLLM into LMI:

  • Model configuration uses opinionated pydantic models (ModelSpec, LLMConfig), which are translated into litellm config dicts just in time
  • LMI handles retries and fallbacks. We use slightly more intelligent retries: only on errors that are actually worth retrying, and with jittered exponential backoff
  • Because we own retries/fallbacks, we have better logging
  • ModelSpec.responses_api lets us toggle on the Responses backend selectively. This means we can swap between backends in a fallback cascade
  • Response validation is pushed down from LDP into LMI, so it can use the new LMI retry/fallback logic.

Notably, this drops all dependence on litellm.Router except for embeddings, which we can likely handle in a similar way. It also gets rid of a few things that annoyed me:

  • Having to remember whether parameters go in model_list or top-level kwargs
  • Differently-shaped configuration at the SimpleAgent/LLMCallOp/LiteLLMModel/litellm levels. Now the first three all operate on LLMConfig.

Some proof that this is backwards compatible: no cassette churn besides that from new/dropped tests.
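The retry behavior described above (retrying only errors worth retrying, with jittered exponential backoff) could be sketched roughly as follows. The names here (`is_retryable`, `backoff_delay`, the status-code set) are hypothetical illustrations, not the actual LMI API in `packages/lmi/src/lmi/retry.py`:

```python
import random

# Hypothetical set of transient HTTP statuses worth retrying;
# client errors like 400/401 are not retried.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def is_retryable(status_code: int) -> bool:
    """Classify a failure: only transient errors are worth another attempt."""
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2**attempt))
```

Jitter spreads out retries from concurrent callers so they do not hammer a rate-limited provider in lockstep.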

Copilot AI left a comment (Contributor)

Pull request overview

This PR moves LLM configuration and execution concerns “up” into LMI by introducing typed, opinionated config models (ModelSpec, LLMConfig), shifting retry/fallback behavior into LMI (with clearer error semantics), and updating LDP agents/modules/tests to consume the new config shape.

Changes:

  • Replace dict-shaped llm_model configuration with typed llm_config (LLMConfig / ModelSpec) across agents, graph modules, and tests.
  • Introduce LMI-owned retry/fallback policy and new error surfaces (e.g., AllModelsExhaustedError, ModelRefusalError) with targeted unit tests.
  • Update VCR cassettes to reflect new default request parameters (notably max_tokens=4096) and updated client/library versions.

Reviewed changes

Copilot reviewed 52 out of 54 changed files in this pull request and generated 2 comments.

File Description
tests/test_rollouts.py Updates fallback behavior assertions to expect AllModelsExhaustedError and uses llm_config legacy-coerce shape.
tests/test_optimizer.py Migrates optimizer tests to LLMConfig.coerce() and llm_config= construction.
tests/test_ops.py Switches op tests to LLMConfig inputs and updates gradient expectations when config becomes a scalar leaf.
tests/test_modules.py Updates module configs/instantiation to use llm_config defaults from ReActAgent.
tests/test_envs.py Updates agent construction to llm_config and error expectations to AllModelsExhaustedError.
tests/test_agents.py Migrates agents to llm_config, updates serialization assertions and gradient assertions.
tests/cassettes/TestSimpleAgent.test_dummyenv[gpt-4o-mini-2024-07-18].yaml VCR update for new request shape (adds max_tokens) and updated client headers.
tests/cassettes/TestSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens, tool schema changes).
tests/cassettes/TestSimpleAgent.test_agent_grad[gpt-4o-mini-2024-07-18].yaml VCR update for new request shape (adds max_tokens) and updated client headers.
tests/cassettes/TestSimpleAgent.test_agent_grad[claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens, tool schema changes).
tests/cassettes/TestReActAgent.test_react_dummyenv[True-gpt-4-turbo].yaml VCR update for new request shape (adds max_tokens).
tests/cassettes/TestReActAgent.test_react_dummyenv[True-claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens) and payload adjustments.
tests/cassettes/TestReActAgent.test_react_dummyenv[False-claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens) and payload adjustments.
tests/cassettes/TestReActAgent.test_agent_grad[True-gpt-4-turbo].yaml VCR update for new request shape (adds max_tokens).
tests/cassettes/TestReActAgent.test_agent_grad[True-claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens) and payload adjustments.
tests/cassettes/TestReActAgent.test_agent_grad[False-claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens) and payload adjustments.
tests/cassettes/TestNoToolsSimpleAgent.test_dummyenv[claude-haiku-4-5-20251001].yaml VCR update for new request shape (max_tokens).
tests/cassettes/TestMemoryAgent.test_agent_grad.yaml VCR update for new request shape (adds max_tokens).
tests/cassettes/TestAgentState.test_no_state_mutation[agent1].yaml VCR update for new request shape (adds max_tokens).
tests/cassettes/TestAgentState.test_no_state_mutation[agent0].yaml VCR update for new request shape (adds max_tokens).
src/ldp/graph/modules/thought.py Switches module to accept LLMConfig instead of dict config.
src/ldp/graph/modules/reflect.py Replaces dict llm_model field with LLMConfigField and updates wiring.
src/ldp/graph/modules/react.py Uses LLMConfig.with_extra_params() to set stop sequences without mutating dicts.
src/ldp/graph/modules/llm_call.py Updates parsed-call module to use LLMConfig and typed ConfigOp.
src/ldp/graph/common_ops.py Changes LLMCallOp.forward() signature to accept LLMConfig and constructs LiteLLMModel via llm_config.
src/ldp/agent/tree_of_thoughts_agent.py Migrates agent config field to LLMConfigField and updates call sites.
src/ldp/agent/simple_agent.py Migrates agent config field to LLMConfigField and updates internal ConfigOp.
src/ldp/agent/react_agent.py Migrates agent config field to LLMConfigField and updates module construction.
packages/lmi/tests/test_retry.py Adds unit tests for retry/fallback classification and backoff bounds.
packages/lmi/tests/test_llms.py Updates tests for new fallback error semantics, dispatch behavior, and Responses integration toggled per-model.
packages/lmi/tests/test_litellm_patches.py Removes tests for the removed provider-400 retry patch.
packages/lmi/tests/test_dispatch.py Adds end-to-end tests for LMI dispatch + retry/fallback loop (mocking litellm.acompletion).
packages/lmi/tests/test_cost_tracking.py Simplifies tests by removing Router bypass paths and aligning to new config behavior.
packages/lmi/tests/test_config.py Adds comprehensive tests for ModelSpec, legacy config translation, and LLMConfig.coerce / LLMConfigField.
packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_basic_call.yaml Updates Responses API VCR cassette.
packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_multi_turn_stateful.yaml Updates Responses API multi-turn VCR cassette.
packages/lmi/tests/cassettes/TestResponsesAPIIntegration.test_responses_api_off_ignores_response_id.yaml Updates VCR cassette for non-Responses model behavior.
packages/lmi/tests/cassettes/TestLiteLLMModel.test_max_token_truncation.yaml Adds/updates VCR cassette for truncation behavior with max_tokens.
packages/lmi/tests/cassettes/TestLiteLLMModel.test_cost_call_single.yaml Adds/updates VCR cassette for streaming cost tracking request shape.
packages/lmi/src/lmi/retry.py Introduces centralized retry/fallback policy and jittered exponential backoff.
packages/lmi/src/lmi/litellm_patches.py Removes the provider-400 retry patch and renumbers patch docs accordingly.
packages/lmi/src/lmi/exceptions.py Adds ModelRefusalError and AllModelsExhaustedError.
packages/lmi/src/lmi/constants.py Removes env-flag toggle for Responses API and keeps core constants.
packages/lmi/src/lmi/config.py Adds typed ModelSpec/LLMConfig and legacy/dict coercion utilities.


Comment on lines +65 to +68
Applies: Gemini default safety settings; `temperature` / `max_tokens`
defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
providers (which don't support them). Explicit values in `overrides`
always win over the defaults.

Copilot AI Apr 18, 2026


ModelSpec.from_name() docstring says it will “silently drop” logprobs / top_logprobs for non-OpenAI providers, but the implementation raises a ValueError instead. Please align the docstring with the actual behavior (or change the behavior) so callers know whether to expect coercion or an exception.

Suggested change
Applies: Gemini default safety settings; `temperature` / `max_tokens`
defaults; and silent drop of `logprobs` / `top_logprobs` for non-OpenAI
providers (which don't support them). Explicit values in `overrides`
always win over the defaults.
Applies: Gemini default safety settings and `temperature` /
`max_tokens` defaults. `logprobs` and `top_logprobs` are treated as
OpenAI-only parameters: passing them for non-OpenAI providers raises
`ValueError`. Explicit values in `overrides` always win over the
defaults.
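A minimal sketch of the raising behavior the suggested docstring describes (the helper name and signature are hypothetical; the actual `ModelSpec.from_name()` implementation is not shown in this diff):

```python
# Hypothetical helper: treat logprobs/top_logprobs as OpenAI-only parameters
# and raise rather than silently dropping them for other providers.
OPENAI_ONLY_PARAMS = {"logprobs", "top_logprobs"}


def check_openai_only(provider: str, overrides: dict) -> None:
    """Raise ValueError if OpenAI-only params are passed for another provider."""
    if provider != "openai":
        bad = OPENAI_ONLY_PARAMS & overrides.keys()
        if bad:
            raise ValueError(
                f"{sorted(bad)} are OpenAI-only parameters; got provider {provider!r}"
            )
```

Either raising or coercing is defensible; the point of the review comment is simply that the docstring and implementation must agree so callers know which to expect.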

Comment on lines +194 to +216
params_by_name: dict[str, dict[str, Any]] = {
m["model_name"]: dict(m.get("litellm_params", {})) for m in model_list
}

primary_name = model_list[0]["model_name"]
ordered: list[str] = [primary_name, *fallback_map.get(primary_name, [])]
for name in params_by_name:
if name not in ordered:
ordered.append(name)

router_kwargs = legacy.get("router_kwargs") or {}
default_timeout = router_kwargs.get("timeout", 60.0)
default_retries = router_kwargs.get("num_retries", 3)

return cls(
models=[
_spec_from_legacy_params(
params_by_name.get(name, {}),
default_timeout=default_timeout,
default_retries=default_retries,
)
for name in ordered
        ]
    )

Copilot AI Apr 18, 2026


LLMConfig.from_legacy_dict() can pass an empty dict into _spec_from_legacy_params() when fallbacks references a model_name that isn’t present in model_list, which will then fail with a KeyError on params['model']. Consider validating that all referenced fallback model names exist (and raising a clear ValueError listing the missing names) before constructing the ordered chain.
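The up-front validation the reviewer suggests could look like this sketch, assuming the `params_by_name` / `fallback_map` shapes visible in the excerpt (the function name is hypothetical):

```python
def validate_fallbacks(
    params_by_name: dict[str, dict], fallback_map: dict[str, list[str]]
) -> None:
    """Raise a clear ValueError listing fallback names absent from model_list,
    instead of failing later with a KeyError on params['model']."""
    missing = sorted({
        name
        for targets in fallback_map.values()
        for name in targets
        if name not in params_by_name
    })
    if missing:
        raise ValueError(
            f"fallbacks reference model names not in model_list: {missing}"
        )
```

Calling this before building the ordered chain turns a confusing downstream KeyError into an actionable configuration error.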
