fix: MCP CPU spike by adding timeout to session cleanup #4758
base: main
Conversation
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely due to anyio's _deliver_cancellation loop hanging during session cleanup. This fix adds a configurable timeout (default 5 seconds) to the __aexit__ calls in MCPSessionManager.close_all() using anyio.fail_after(). If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin. Fixes llamastack#4754
mattf left a comment
please provide reproduction steps.
i did the following and still see 100% CPU usage -
10:53:24 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ uv run llama stack run --providers agents=inline::meta-reference,inference=remote::llama-openai-compat,vector_io=inline::faiss,tool_runtime=inline::rag-runtime,files=inline::localfs
...
INFO 2026-01-28 10:53:34,588 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321
(Press CTRL+C to quit)
INFO 2026-01-28 10:53:38,379 uvicorn.access:476 uncategorized: ::1:53190 - "POST /v1/responses HTTP/1.1" 200
10:53:35 in llama-stack on fix/mcp-cpu-spike-timeout [$?] is 📦 0.4.0.dev0 …
➜ curl http://localhost:8321/v1/responses \
-H "Content-Type: application/json" \
-d '{
"model": "llama-openai-compat/Llama-4-Scout-17B-16E-Instruct-FP8",
"input": "Use the provided tool to say something.",
"tools": [
{
"type": "mcp",
"server_label": "local-mcp",
"server_url": "http://localhost:9090"
}
],
"tool_choice": "auto"
}'
Also still seeing a problem
lgtm, CPU spike gone when using MCP
This pull request has merge conflicts that must be resolved before it can be merged. @jwm4 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
Summary
Fixes #4754
When making MCP calls through the responses API, the llama-stack server CPU usage could spike to 100% and remain there indefinitely, even after the request completes.
Root Cause
The issue occurs during MCP session cleanup in
MCPSessionManager.close_all(). When tasks don't respond to cancellation, anyio's_deliver_cancellationloop can spin indefinitely, causing the CPU spike.Solution
Added a configurable timeout (default 5 seconds) to the __aexit__ calls using anyio.fail_after(). If cleanup takes longer than the timeout, it's aborted to prevent the CPU spin.
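For context, a minimal sketch of this approach, assuming illustrative names (the stand-in MCPSessionManager, its _sessions dict, and MCP_CLEANUP_TIMEOUT are hypothetical; the actual llama-stack code may differ):

```python
from contextlib import AbstractAsyncContextManager

import anyio

MCP_CLEANUP_TIMEOUT = 5.0  # assumed default; the PR makes this configurable


class MCPSessionManager:
    """Illustrative stand-in; the real class lives in llama-stack."""

    def __init__(self) -> None:
        # server_label -> async context manager that owns the MCP session
        self._sessions: dict[str, AbstractAsyncContextManager] = {}

    async def close_all(self) -> None:
        for label, session_ctx in list(self._sessions.items()):
            try:
                # Bound cleanup with a deadline: fail_after() raises
                # TimeoutError if __aexit__ does not return in time,
                # instead of letting anyio's cancellation delivery
                # spin at 100% CPU.
                with anyio.fail_after(MCP_CLEANUP_TIMEOUT):
                    await session_ctx.__aexit__(None, None, None)
            except TimeoutError:
                # Cleanup hung; abandon it rather than spin the CPU.
                pass
            finally:
                self._sessions.pop(label, None)
```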
Testing
- Handles TimeoutError from fail_after() gracefully
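The TimeoutError path could be exercised with a test along these lines — a hedged sketch reusing the illustrative MCPSessionManager above, not the PR's actual test code:

```python
import anyio
import pytest


class HangingSession:
    """Async context manager whose cleanup sleeps far past the timeout."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        await anyio.sleep(3600)  # stands in for cleanup that never finishes


@pytest.mark.anyio
async def test_close_all_survives_hung_cleanup():
    manager = MCPSessionManager()  # the sketch class from above
    manager._sessions["hung"] = HangingSession()

    with anyio.fail_after(30):  # safety net so the test itself can't hang
        await manager.close_all()

    # The hung session should be dropped, not retried forever.
    assert "hung" not in manager._sessions
```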