Bug: MCP _call_tool timeout cancels future but leaks subprocess; executor shutdown defeats timeout #139

@tcconnally

Severity: 🟠 High (subprocess + descriptor leak; defeats timeout)

_call_tool in src/perseus/mcp.py:245–251:

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(_call_resolver, spec, args_str, cfg, workspace)
    result = future.result(timeout=timeout)
return result

Two problems:

  1. future.result(timeout=…) only stops waiting for the result; it does not cancel or interrupt the work. The worker thread (and its child subprocess from @query) continues running.
  2. The with block calls executor.shutdown(wait=True) on exit, which blocks the MCP response until the abandoned subprocess completes. This defeats the entire timeout mechanism.
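
Both effects can be reproduced without Perseus. The sketch below mirrors the pattern above with a hypothetical slow_worker standing in for _call_resolver (a time.sleep instead of a real subprocess). The function names and durations are illustrative, not taken from the codebase.

```python
import concurrent.futures
import time
from typing import Tuple


def slow_worker(seconds: float) -> str:
    """Hypothetical stand-in for _call_resolver: blocks like a long subprocess."""
    time.sleep(seconds)
    return "done"


def call_with_context_manager(work_s: float, timeout_s: float) -> Tuple[str, float]:
    """Mirror the current _call_tool pattern and report elapsed wall time."""
    start = time.monotonic()
    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(slow_worker, work_s)
            outcome = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The timeout fires after timeout_s, but the exception only leaves
        # the with block once shutdown(wait=True) has joined the worker.
        outcome = "timed out"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    outcome, elapsed = call_with_context_manager(work_s=1.0, timeout_s=0.1)
    print(outcome, f"{elapsed:.2f}s")
```

With a 0.1s timeout and a 1s worker, the call reports a timeout only after roughly the full second, which is the same shape as the ~600s block in the repro below.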

Repro

perseus mcp serve &
# Send MCP tools/call for perseus_query with { "command": "sleep 600" } and tool_timeout_s=5.
# The response will block for ~600s, not 5s.

Suggested fix

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
try:
    future = executor.submit(_call_resolver, spec, args_str, cfg, workspace)
    return future.result(timeout=timeout)
except concurrent.futures.TimeoutError:
    return f"Error executing {directive_name}: timed out after {timeout}s"
finally:
    # Never block the response on the worker, including when _call_resolver
    # raises. cancel_futures needs Python 3.9+ and only drops queued work;
    # a task that is already running (and its subprocess) keeps going.
    executor.shutdown(wait=False, cancel_futures=True)
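
To see what this buys and what it does not, here is a minimal stand-alone sketch of the same idea, written with finally so the executor is released on every path. slow_worker and worker_finished are hypothetical stand-ins, not Perseus code. The caller gets control back at the timeout, but the worker keeps running in the background.

```python
import concurrent.futures
import threading
import time
from typing import Tuple

# Set by the worker when it completes, so the caller can observe that the
# work carried on after the timeout was reported.
worker_finished = threading.Event()


def slow_worker(seconds: float) -> str:
    """Hypothetical stand-in for _call_resolver."""
    time.sleep(seconds)
    worker_finished.set()
    return "done"


def call_without_waiting(work_s: float, timeout_s: float) -> Tuple[str, float]:
    """Return at the timeout instead of joining the worker thread."""
    start = time.monotonic()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(slow_worker, work_s)
        outcome = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        outcome = f"timed out after {timeout_s}s"
    finally:
        # wait=False returns immediately; the running task is not interrupted.
        executor.shutdown(wait=False, cancel_futures=True)
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    outcome, elapsed = call_without_waiting(work_s=1.0, timeout_s=0.1)
    print(outcome, f"{elapsed:.2f}s", "worker done:", worker_finished.is_set())
    worker_finished.wait(timeout=5)
    print("worker done later:", worker_finished.is_set())
```

So this change alone fixes the blocked response but not the leak: the thread and whatever it spawned still run to completion.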

Better long-term: enforce timeout via subprocess.run(timeout=…) inside directives/query.py only, and have the MCP wrapper pass through (don't double-wrap). Track spawned PIDs in a process group so the wrapper can os.killpg on timeout.
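
A rough sketch of the process-group idea, assuming a POSIX host. run_in_process_group is a hypothetical helper, not an existing Perseus function. It uses Popen instead of subprocess.run because run(timeout=…) only kills the direct child on expiry, so anything that child spawned (for example via a shell) can outlive it.

```python
import os
import signal
import subprocess
from typing import List, Optional, Tuple


def run_in_process_group(command: List[str], timeout_s: float) -> Tuple[Optional[int], str]:
    """Run command in its own session and kill the whole group on timeout.

    Hypothetical helper, POSIX only. Returns (returncode, output); the
    returncode is None when the command timed out.
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        # The child calls setsid(), so it leads a new process group whose
        # id equals proc.pid. Descendants inherit that group by default.
        start_new_session=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.communicate()  # reap the child and close the pipes
        return None, f"timed out after {timeout_s}s"


if __name__ == "__main__":
    print(run_in_process_group(["sleep", "30"], timeout_s=2))
```

SIGKILL is the blunt option here; a gentler variant could send SIGTERM first and escalate after a grace period. A descendant that moves itself into a different session or group would still escape this.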

Acceptance criteria

  • Test: invoke perseus_query via MCP with cmd="sleep 30", tool_timeout_s=2. The response returns within 5s, and pgrep -f "sleep 30" finds no matching process within 7s.
