fix(mcp): evict pooled connections when tool discovery fails by MohanKumar21 · Pull Request #1499 · lightseekorg/smg

MohanKumar21 · 2026-05-15T11:26:57Z

Description

Problem

E2E MCP tool-calling tests (e2e_test/responses/test_tools_call.py) were flaky in CI: sometimes Brave/DeepWiki MCP worked, sometimes the model returned only reasoning + message with no mcp_list_tools or mcp_call.

Root cause: poisoned connection pool entries. When connect_dynamic_server first connected to a dynamic MCP server (e.g. http://brave-search-mcp:8080/mcp), it cached the connection in the pool. If list_all_tools() failed or returned before tools were registered (transient network, server not ready), the code only logged a warning and still treated the connect as successful. Later requests hit an early return because the connection was already pooled, skipped tool discovery entirely, and the tool inventory stayed empty. The model then saw McpListTools(tools=[]) and never invoked MCP tools until gateway restart.

Solution

Treat a pooled connection as valid only when its tools are actually registered. Retry tool discovery when a connection exists but the inventory is empty. When discovery fails, evict the connection and fail the connect so the next request can reconnect cleanly.

Changes

crates/mcp/src/core/orchestrator.rs — connect_dynamic_server_with_tenant:
- Fast path only when connection is pooled and tool_inventory.has_server_tools(url).
- After get_or_create, call list_all_tools() if tools are still missing (re-list on cached client).
- On list_all_tools() failure: remove connection from pool and return McpError::ConnectionFailed instead of succeeding with zero tools.
crates/mcp/src/inventory/index.rs — Add has_server_tools(server_key) to check whether any tools are registered for a server URL.
crates/mcp/src/core/pool.rs — Add remove(key) to evict bad pooled connections.
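
The connect flow described above can be sketched in a few lines. This is a minimal, synchronous illustration and not the code from this PR: `Pool`, `Inventory`, `ConnectError`, and the injected `list_tools` closure are hypothetical stand-ins for the real pool, `ToolInventory`, `McpError::ConnectionFailed`, and `list_all_tools()`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the connection pool (keyed by URL only here).
#[derive(Default)]
struct Pool {
    connections: HashSet<String>,
}

/// Hypothetical stand-in for the tool inventory.
#[derive(Default)]
struct Inventory {
    tools_by_server: HashMap<String, Vec<String>>,
}

impl Inventory {
    fn has_server_tools(&self, url: &str) -> bool {
        self.tools_by_server.get(url).map_or(false, |t| !t.is_empty())
    }
}

#[derive(Debug, PartialEq)]
enum ConnectError {
    ConnectionFailed(String),
}

/// `list_tools` simulates tool discovery so that failures can be injected.
fn connect_dynamic_server(
    pool: &mut Pool,
    inventory: &mut Inventory,
    url: &str,
    list_tools: impl FnOnce() -> Result<Vec<String>, String>,
) -> Result<(), ConnectError> {
    // Fast path: only when the connection is pooled AND tools are registered.
    if pool.connections.contains(url) && inventory.has_server_tools(url) {
        return Ok(());
    }
    // Reuse the pooled connection or create a new one.
    pool.connections.insert(url.to_string());
    match list_tools() {
        Ok(tools) => {
            inventory.tools_by_server.insert(url.to_string(), tools);
            Ok(())
        }
        Err(e) => {
            // Evict so the next request reconnects instead of seeing zero tools.
            pool.connections.remove(url);
            Err(ConnectError::ConnectionFailed(e))
        }
    }
}
```

With this shape, a failed discovery leaves nothing behind in the pool, so a later call goes through discovery again rather than returning early with an empty inventory.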

Test Plan

Re-run the openai-responses job that previously reported 8 failures in TestToolCallingCloud

Checklist

cargo +nightly fmt passes
cargo clippy --all-targets --all-features -- -D warnings passes
(Optional) Documentation updated
(Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

New Features
- Idle connection reaper to evict unused pooled connections.
- Faster pool-key discovery fast path with a short-lived cached "empty discovery" window.
- Inventory deduplicates tool entries and exposes pool-scoped tooling presence checks.
- Pool tracks last-used timestamps and tool-discovery completion state.
Bug Fixes
- Cleaner handling of failed pooled connections and reduced redundant discovery.
Tests
- Updated tests for deduplication, nondeterministic lookup, and discovery state APIs.

coderabbitai · 2026-05-15T11:30:51Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2cd75f4-c91a-405b-8e90-b27916cbcd92

📥 Commits

Reviewing files that changed from the base of the PR and between 9b12527 and 08c3731.

📒 Files selected for processing (2)

crates/mcp/src/core/orchestrator.rs
crates/mcp/src/core/pool.rs

Walkthrough

Adds pool-side tool-discovery state and removal APIs, background idle eviction, inventory simple-name deduplication (HashSet), tenant-typed PoolKey usage, and orchestrator discovery fast-paths with defensive inventory refresh and fail-fast pooled-connection cleanup.

Changes

Dynamic Server Connection Pooling & Tool Discovery

| Layer / File(s) | Summary |
|---|---|
| Orchestrator reaper and TTL `crates/mcp/src/core/orchestrator.rs` | Introduces `EMPTY_DISCOVERY_TTL`, spawns `spawn_idle_reaper` to periodically evict idle pooled connections, and documents eviction behavior. |
| Orchestrator connection flow & discovery fast-paths `crates/mcp/src/core/orchestrator.rs` | `connect_dynamic_server_with_tenant` accepts `Option<TenantId>`, uses inventory-aware pre/post fast paths to avoid redundant list calls, clones `PoolKey` into `get_or_create` closure, clears stale inventory before discovery, repopulates from `list_all_tools`, marks discovery completed on success, and removes pooled entry/returns `ConnectionFailed` on discovery failure when appropriate. |
| Pool key and tenant type `crates/mcp/src/core/pool.rs` | `PoolKey.tenant_id` changed to `Option<TenantId>` and constructors updated to accept `Option<TenantId>`. |
| CachedConnection & last-used refresh `crates/mcp/src/core/pool.rs` | `CachedConnection` gains `tools_discovered: bool` and `last_used` tracking; `get_or_create`/`get` refresh `last_used` on hits. |
| Pool APIs: discovery query/mark, eviction, remove `crates/mcp/src/core/pool.rs` | Adds `tool_discovery_completed()` and `mark_tool_discovery_completed()` to read/set discovery state; updates `evict_idle()` to remove stale keys, decrement `connection_count`, and call `eviction_callback` per removal; adds `remove()` to evict by key and adjust counters. Tests updated for tenant-id usage and discovery APIs. |
| Inventory simple-name deduplication & pool index `crates/mcp/src/inventory/index.rs` | `tools_by_simple_name` now stores `HashSet<QualifiedToolName>` for deduplication; `get_tool()` returns an arbitrary matching qualified tool under collisions; adds `tools_by_pool`, `insert_pool_entry()`, and `has_server_tools(&PoolKey)`; updates clear/insert logic and tests to account for nondeterministic set iteration. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

lightseekorg/smg#621: Introduces the atomic connection_count that remove() relies on.

Suggested labels

tests

Suggested reviewers

CatherineSue
key4ng
slin1237

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main purpose: fixing flaky connection pool behavior by evicting connections when tool discovery fails. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request enhances the MCP orchestrator's connection pooling by ensuring tool discovery is completed before returning a pooled connection and implementing a mechanism to remove failed connections from the pool. Review feedback suggests triggering the eviction callback during connection removal to keep the tool inventory synchronized, clearing existing tools before re-discovery to prevent stale entries, and optimizing performance on hot paths using interned strings.

gemini-code-assist · 2026-05-15T11:33:30Z

+    pub fn remove(&self, key: &PoolKey) -> bool {
+        let mut connections = self.connections.lock();
+        if connections.pop(key).is_some() {
+            self.connection_count.fetch_sub(1, Ordering::Relaxed);
+            true
+        } else {
+            false
+        }


The remove method should trigger the eviction_callback when a connection is successfully removed. The McpOrchestrator relies on this callback to clear associated tools from the ToolInventory. Additionally, ensure that long-lived bidirectional streams in this pool implement an idle timeout to prevent resource exhaustion. Finally, since PoolKey is used on hot paths, consider using an interned string type like Arc<str> for its fields to improve performance.

Suggested change

-    pub fn remove(&self, key: &PoolKey) -> bool {
-        let mut connections = self.connections.lock();
-        if connections.pop(key).is_some() {
-            self.connection_count.fetch_sub(1, Ordering::Relaxed);
-            true
-        } else {
-            false
-        }
+    pub fn remove(&self, key: &PoolKey) -> bool {
+        let mut connections = self.connections.lock();
+        if connections.pop(key).is_some() {
+            self.connection_count.fetch_sub(1, Ordering::Relaxed);
+            if let Some(callback) = &self.eviction_callback {
+                callback(key);
+            }
+            true
+        } else {
+            false
+        }
+    }

References

Implement an idle timeout for long-lived bidirectional streams to prevent inactive clients from indefinitely consuming server-side resources such as tasks and channels.

For types that are frequently cloned on hot paths and represent a small, repeated set of values (e.g., worker IDs or tenant IDs), use an interned string type like Arc<str> to improve performance.

gemini-code-assist · 2026-05-15T11:33:31Z

+        // Load tools from the server.
+        // If this fails, remove the connection from the pool so the next request
+        // retries from scratch instead of perpetually seeing zero tools.
        match client.peer().list_all_tools().await {


Consider calling inventory_clone.clear_server_tools(&server_key) before re-listing tools to prevent stale or partial entries. To prevent vulnerabilities from duplicate entries, ensure the inventory uses a HashSet for tool storage. Additionally, ensure tool names are JSON-escaped before interpolation to handle special characters correctly.

+        inventory_clone.clear_server_tools(&server_key);
         match client.peer().list_all_tools().await {

References

To prevent vulnerabilities from duplicate entries, use data structures that inherently enforce uniqueness, such as HashSet, instead of manually deduplicating collections like Vec.

Tool names should be JSON-escaped before interpolation to correctly handle special characters like quotes, backslashes, and unicode.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/mcp/src/core/orchestrator.rs (1)
1334-1409: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't use URL-scoped inventory to validate tenant/auth-scoped pool entries.

PoolKey is intentionally keyed by url + auth_hash + tenant_id, but both has_server_tools(&pool_key.url) checks only consult URL-scoped inventory. A second connection to the same URL with different credentials or tenant can therefore skip discovery and inherit whatever tool set another pool entry registered. The guard at Line 1405 makes this happen even when the exact PoolKey was just created. Reuse needs to be keyed by the same identity as the pooled connection, or this optimization breaks the isolation the pool key is supposed to provide.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/mcp/src/core/orchestrator.rs` around lines 1334 - 1409, The code
currently checks tools by URL only (has_server_tools(&pool_key.url)), which
breaks PoolKey isolation; update both fast-path and post-get_or_create checks to
consult the tool inventory using the full PoolKey identity
(url+auth_hash+tenant_id) instead of just the URL: introduce or call an
inventory method that accepts the PoolKey (e.g.,
has_tools_for_pool_key(&pool_key) or has_server_tools_with_auth(&pool_key)) and
use that in place of has_server_tools(&pool_key.url) (references: PoolKey,
pool_key, has_server_tools, connection_pool.get_or_create, inventory_clone,
tool_inventory). Ensure the inventory indexing stores/retrieves tools keyed by
the same PoolKey fields (auth_hash and tenant_id included) so pooled connections
with different credentials/tenants do not share tool lists.
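
The isolation concern raised above can be illustrated with a pool-scoped inventory index. This is a minimal sketch under assumptions: the field types of `PoolKey` (a `u64` hash, a `String` tenant) and the `HashMap<PoolKey, HashSet<String>>` layout are hypothetical, chosen only to show why the lookup key must carry the same identity as the pooled connection.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical pool identity: same URL, but distinct credentials or tenant
/// must map to distinct entries.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PoolKey {
    url: String,
    auth_hash: u64,
    tenant_id: Option<String>,
}

/// Hypothetical inventory index keyed by the full pool identity.
#[derive(Default)]
struct Inventory {
    tools_by_pool: HashMap<PoolKey, HashSet<String>>,
}

impl Inventory {
    fn insert_pool_entry(&mut self, key: &PoolKey, tool: &str) {
        self.tools_by_pool
            .entry(key.clone())
            .or_default()
            .insert(tool.to_string());
    }

    /// True only when this exact pool entry has at least one tool registered.
    fn has_server_tools(&self, key: &PoolKey) -> bool {
        self.tools_by_pool.get(key).map_or(false, |t| !t.is_empty())
    }
}
```

Two keys that share a URL but differ in `tenant_id` then answer independently, so one tenant's discovery cannot satisfy another tenant's fast path.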

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e2c27fe5-dd69-4d0d-9e1d-df37db989b58

📥 Commits

Reviewing files that changed from the base of the PR and between 8c11f43 and 27f86d0.

📒 Files selected for processing (3)

crates/mcp/src/core/orchestrator.rs
crates/mcp/src/core/pool.rs
crates/mcp/src/inventory/index.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00ff20dc35

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/mcp/src/core/orchestrator.rs`:
- Around line 1334-1345: The fast path in discover_or_get_server_url (checking
self.connection_pool.tool_discovery_completed(&pool_key) &&
self.tool_inventory.has_server_tools(&pool_key.url)) causes servers that
legitimately have zero tools to always fall through and re-run list_all_tools()
on each request; to fix, add and use a cached "discovered_empty" state (or a
discovered_tools_count option) keyed by pool_key.url in ToolInventory or
ConnectionPool and set it when list_all_tools() returns zero, check that flag
alongside tool_discovery_completed (e.g., treat discovered_empty as a valid
discovered state), and add a TTL or eviction policy so the discovered-empty
marker is refreshed periodically to avoid permanent staleness while preventing
repeated discovery calls for intentionally empty servers (refer to
tool_discovery_completed, has_server_tools, list_all_tools, and the
pool_key.url/ToolInventory APIs).
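
One way to read the "discovered empty with a TTL" suggestion above is sketched here. It is an illustration under assumptions, not the PR's implementation: `DiscoveryState` and `can_skip_discovery` are hypothetical names, and the 60-second value mirrors the `EMPTY_DISCOVERY_TTL` figure quoted later in this thread.

```rust
use std::time::{Duration, Instant};

/// How long an empty discovery result is trusted before re-listing.
const EMPTY_DISCOVERY_TTL: Duration = Duration::from_secs(60);

/// Hypothetical per-connection discovery record.
#[derive(Default)]
struct DiscoveryState {
    completed_at: Option<Instant>,
    tool_count: usize,
}

impl DiscoveryState {
    fn mark_completed(&mut self, tool_count: usize, now: Instant) {
        self.completed_at = Some(now);
        self.tool_count = tool_count;
    }

    fn discovery_fresh_within(&self, ttl: Duration, now: Instant) -> bool {
        self.completed_at
            .map_or(false, |at| now.saturating_duration_since(at) <= ttl)
    }

    /// Skip discovery when tools are known, or when an empty result is still fresh.
    fn can_skip_discovery(&self, now: Instant) -> bool {
        match self.completed_at {
            None => false,
            Some(_) if self.tool_count > 0 => true,
            Some(_) => self.discovery_fresh_within(EMPTY_DISCOVERY_TTL, now),
        }
    }
}
```

Passing `now` explicitly keeps the logic testable without sleeping. A server that legitimately exposes zero tools is re-listed at most once per TTL window instead of on every request.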

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4364485e-4db5-4a1a-b4ae-576ff2dca80f

📥 Commits

Reviewing files that changed from the base of the PR and between 27f86d0 and 6279b39.

📒 Files selected for processing (3)

crates/mcp/src/core/orchestrator.rs
crates/mcp/src/core/pool.rs
crates/mcp/src/inventory/index.rs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6279b39205

chatgpt-codex-connector · 2026-05-15T13:44:20Z

+        if self.connection_pool.tool_discovery_completed(&pool_key)
+            && self.tool_inventory.has_server_tools(&pool_key.url)
+        {


Allow fast path after successful empty tool discovery

mark_tool_discovery_completed is documented to treat empty list_all_tools results as successful discovery, but this fast path also requires has_server_tools(url), so zero-tool servers can never hit the cache path and will re-run discovery on every connect. In environments where a dynamic MCP endpoint intentionally exposes no tools (or temporarily returns none), this causes repeated network calls and avoidable ConnectionFailed errors from transient list_all_tools failures despite a previously successful discovery.

chatgpt-codex-connector · 2026-05-15T13:44:21Z

+                    "Failed to list tools from '{}': {}; removing pooled connection",
+                    server_key, e
+                );
+                self.connection_pool.remove(&pool_key);


Avoid evicting pooled client on concurrent discovery error

Concurrent callers can share the same pooled client and race through list_all_tools because discovery is not serialized per pool key; if one caller succeeds and another transiently fails, this unconditional remove evicts the healthy connection and triggers URL-level inventory clearing, undoing the successful discovery and causing temporary tool loss/flaky failures. The eviction should be conditional on discovery still being incomplete (or protected by a per-key discovery lock) before clearing shared state.
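
The conditional eviction suggested above could look roughly like this. It is a sketch under assumptions: `Entry`, `Pool`, and `remove_if_undiscovered` are hypothetical names, and the check and the removal share one lock acquisition so a concurrent successful discovery cannot slip in between them.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical pooled entry carrying only the discovery flag.
struct Entry {
    tools_discovered: bool,
}

/// Hypothetical pool keyed by a plain string for brevity.
struct Pool {
    connections: Mutex<HashMap<String, Entry>>,
}

impl Pool {
    /// Evict only if discovery has not completed for this key.
    /// Returns true when an entry was actually removed.
    fn remove_if_undiscovered(&self, key: &str) -> bool {
        let mut conns = self.connections.lock().unwrap();
        let undiscovered = conns.get(key).map_or(false, |e| !e.tools_discovered);
        if undiscovered {
            conns.remove(key);
        }
        undiscovered
    }
}
```

A caller whose `list_all_tools` attempt failed would use this in place of an unconditional remove, leaving a connection that another caller already validated untouched.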

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c2b4a7866

chatgpt-codex-connector · 2026-05-18T05:55:30Z

+        let count = evicted.len();
+        if count > 0 {
+            self.connection_count.fetch_sub(count, Ordering::Relaxed);
+            if let Some(callback) = &self.eviction_callback {


Guard idle-eviction counter update against concurrent clear

evict_idle decrements connection_count after releasing the pool lock, so a concurrent clear() can reset the counter to 0 between removal and fetch_sub(count). In that race, fetch_sub underflows the usize counter and len()/stats() report a huge bogus value. Keep the decrement synchronized with the removal (or use saturating/CAS logic) to avoid counter corruption during shutdown or admin clears.
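
The "saturating/CAS logic" option mentioned above can be sketched with `AtomicUsize::fetch_update` from the standard library. The helper name `saturating_fetch_sub` is hypothetical; it only shows how a decrement can be clamped at zero instead of wrapping.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decrement `counter` by `n`, clamping at zero. Returns the new value.
fn saturating_fetch_sub(counter: &AtomicUsize, n: usize) -> usize {
    // fetch_update retries a compare-and-swap loop until the closure's
    // result is stored; the closure always returns Some, so this is Ok.
    let prev = counter
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
            Some(cur.saturating_sub(n))
        })
        .unwrap_or_else(|cur| cur);
    prev.saturating_sub(n)
}
```

By contrast, a plain `fetch_sub` on an atomic wraps around on underflow, which is how a counter that was just reset to 0 ends up reporting a huge value. Clamping hides the symptom but not the race itself, so keeping the decrement under the same lock as the removal remains the more direct fix.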

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a6281e5e9

chatgpt-codex-connector · 2026-05-18T08:01:05Z

+            && (self.tool_inventory.has_server_tools(&pool_key.url)
+                || self
+                    .connection_pool
+                    .discovery_fresh_within(&pool_key, EMPTY_DISCOVERY_TTL))


Don’t bypass empty inventory on fresh discovery timestamps

The new fast-path condition returns early when tool_discovery_completed is true and discovery_fresh_within(...) passes, even if has_server_tools(url) is false. That reintroduces a tool-loss window: if a sibling tenant eviction clears URL-scoped inventory, requests for this pooled entry can skip re-discovery for up to EMPTY_DISCOVERY_TTL (60s) and serve with no MCP tools. Fresh evidence in this revision is the added || discovery_fresh_within(&pool_key, EMPTY_DISCOVERY_TTL) branch, which explicitly allows bypassing the inventory check.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/mcp/src/core/pool.rs`:
- Around line 323-335: remove() currently invokes eviction_callback while
holding connections.lock(), risking deadlock; change it to detect and remove the
connection while holding the lock, record whether a callback exists (and the
key) into local variables, then drop the lock and call the callback afterwards.
Specifically, in the pub fn remove(&self, key: &PoolKey) -> bool function: use
connections.lock() only to call connections.pop(key) and update
connection_count.fetch_sub, capture a cloned/owned reference to
self.eviction_callback (or a boolean flag) and the key if present, release the
mutex, and then invoke the callback outside the lock (matching how evict_idle
does it).

In `@crates/mcp/src/inventory/index.rs`:
- Around line 332-342: The current has_server_tools(&self, server_key: &str) is
too coarse because pooled connections are keyed by (url, auth_hash, tenant_id);
update the inventory lookup to be tenant/auth-aware by changing has_server_tools
to accept the pool identity (e.g. auth_hash and tenant_id or a composite
pool_key) and check tools_by_server against that pool-scoped key (or add a new
tools_by_pool map keyed by the composite). Specifically, modify the
has_server_tools signature and all callers to pass the pool-scoped key, and
ensure the method uses tools_by_server (or the new tools_by_pool) to return true
only when the exact pool-scoped entry exists and is non-empty.
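
The first comment above (run the eviction callback only after the pool lock is released) can be sketched as follows. This is an illustration under assumptions: `Pool` here uses `std::sync::Mutex` and string keys, and the `EvictionCallback` alias is hypothetical; the real pool's types may differ.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical callback type invoked with the evicted key.
type EvictionCallback = Arc<dyn Fn(&str) + Send + Sync>;

/// Hypothetical pool; the map is behind an Arc so a callback can observe it.
struct Pool {
    connections: Arc<Mutex<HashMap<String, u32>>>,
    connection_count: AtomicUsize,
    eviction_callback: Option<EvictionCallback>,
}

impl Pool {
    fn remove(&self, key: &str) -> bool {
        // Mutate the map and the counter while holding the lock...
        let removed = {
            let mut connections = self.connections.lock().unwrap();
            let removed = connections.remove(key).is_some();
            if removed {
                self.connection_count.fetch_sub(1, Ordering::Relaxed);
            }
            removed
        }; // ...and release it here, before any callback runs.

        if removed {
            if let Some(callback) = &self.eviction_callback {
                callback(key);
            }
        }
        removed
    }
}
```

Because the guard is dropped at the end of the inner block, a callback that re-enters the pool (for example to look up another entry) does not deadlock on the same mutex.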

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34ed448f-3bf0-43a2-b560-21176bdf6f0c

📥 Commits

Reviewing files that changed from the base of the PR and between 6279b39 and 1a6281e.

📒 Files selected for processing (3)

crates/mcp/src/core/orchestrator.rs
crates/mcp/src/core/pool.rs
crates/mcp/src/inventory/index.rs

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/mcp/src/core/orchestrator.rs`:
- Around line 1483-1493: The orchestrator is unconditionally calling
metrics.record_connection_opened() for connections returned by
connection_pool.get_or_create(), causing over-counting when cached
Arc<McpClient> instances are reused; change get_or_create (in pool.rs) to return
an indicator of whether the connection was newly created (e.g., return
(Arc<McpClient>, bool is_new) or provide a separate creation-aware API) and
update the orchestrator call sites that currently call
record_connection_opened() (the branches that check
connection_pool.tool_discovery_completed(&pool_key) and subsequent
early-returns) to only call record_connection_opened() when is_new is true, or
alternatively move the metrics increment into the creation code path inside
connection_pool.get_or_create so cached hits do not increment
active_connections.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ae22887-b2b0-446f-9c59-699119be2294

📥 Commits

Reviewing files that changed from the base of the PR and between 1a6281e and 9b12527.

📒 Files selected for processing (3)

crates/mcp/src/core/orchestrator.rs
crates/mcp/src/core/pool.rs
crates/mcp/src/inventory/index.rs

coderabbitai · 2026-05-18T09:33:27Z

+        // Another caller may have completed discovery while we were connecting.
+        // Same combined check as the outer fast path — see note above.
+        if self.connection_pool.tool_discovery_completed(&pool_key)
+            && (inventory_clone.has_server_tools(&pool_key)
+                || self
+                    .connection_pool
+                    .discovery_fresh_within(&pool_key, EMPTY_DISCOVERY_TTL))
+        {
+            self.metrics.record_connection_opened();
+            return Ok(server_key);
+        }


⚠️ Potential issue | 🟠 Major | 💤 Low value

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if get_or_create returns information about whether it was a cache hit
# and how metrics are expected to be recorded
rg -n -A 10 'fn get_or_create' crates/mcp/src/core/pool.rs

Repository: lightseekorg/smg

Length of output: 437

🏁 Script executed:

#!/bin/bash
# Get the full get_or_create implementation
cat -n crates/mcp/src/core/pool.rs | sed -n '172,250p'

Repository: lightseekorg/smg

Length of output: 3150

🏁 Script executed:

#!/bin/bash
# Get the full context around line 1491 and 1542 in orchestrator.rs
wc -l crates/mcp/src/core/orchestrator.rs

Repository: lightseekorg/smg

Length of output: 101

🏁 Script executed:

#!/bin/bash
# Find where record_connection_opened is defined and called
rg -n 'record_connection_opened' crates/mcp/src/

Repository: lightseekorg/smg

Length of output: 672

🏁 Script executed:

#!/bin/bash
# Get context around line 1491 and 1542
cat -n crates/mcp/src/core/orchestrator.rs | sed -n '1420,1555p'

Repository: lightseekorg/smg

Length of output: 7254

🏁 Script executed:

#!/bin/bash
# Check the metrics definition
cat -n crates/mcp/src/core/metrics.rs | sed -n '93,110p'

Repository: lightseekorg/smg

Length of output: 1025

🏁 Script executed:

#!/bin/bash
# Search for record_connection_closed to understand the full lifecycle
rg -n -B 2 -A 2 'record_connection_closed' crates/mcp/src/core/orchestrator.rs

Repository: lightseekorg/smg

Length of output: 226

Metrics may over-count when cached connections trigger record_connection_opened().

get_or_create returns an Arc<McpClient> without indicating whether it was a cache hit (line 186 in pool.rs) or a newly created connection (lines 190–210). The orchestrator then calls record_connection_opened() at lines 1491, 1528, and 1542 regardless of whether the connection was new or cached. Since record_connection_opened() increments active_connections (fetch_add), a reused cached connection will increment the counter multiple times across different requests, creating an accounting mismatch with record_connection_closed() which is only called once during shutdown.
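
One of the options described above, having `get_or_create` report whether the connection is new, might be sketched like this. The types are hypothetical simplifications (`Client`, `Metrics`, a URL-keyed `Pool`, and a synchronous `connect` helper); only the `(Arc<_>, bool)` return shape is the point.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical client handle.
struct Client {
    url: String,
}

#[derive(Default)]
struct Metrics {
    active_connections: AtomicUsize,
}

impl Metrics {
    fn record_connection_opened(&self) {
        self.active_connections.fetch_add(1, Ordering::Relaxed);
    }
}

#[derive(Default)]
struct Pool {
    connections: Mutex<HashMap<String, Arc<Client>>>,
}

impl Pool {
    /// Returns the client plus `true` only when it was created by this call.
    fn get_or_create(&self, url: &str) -> (Arc<Client>, bool) {
        let mut conns = self.connections.lock().unwrap();
        if let Some(existing) = conns.get(url) {
            return (Arc::clone(existing), false);
        }
        let client = Arc::new(Client { url: url.to_string() });
        conns.insert(url.to_string(), Arc::clone(&client));
        (client, true)
    }
}

/// Count a connection as opened only on the creating call, not on cache hits.
fn connect(pool: &Pool, metrics: &Metrics, url: &str) -> Arc<Client> {
    let (client, is_new) = pool.get_or_create(url);
    if is_new {
        metrics.record_connection_opened();
    }
    client
}
```

With this shape the gauge moves once per physical connection, which keeps it balanced against a single close event regardless of how many requests reuse the cached client.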

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08c3731f55

chatgpt-codex-connector · 2026-05-18T09:36:54Z

+
    /// Clear all cached items for a server. Uses server index for O(tools_per_server) removal.
    pub fn clear_server_tools(&self, server_key: &str) {
+        self.tools_by_pool.retain(|key, _| key.url != server_key);


Preserve sibling pool tool markers during URL clears

clear_server_tools now removes tools_by_pool entries for all PoolKeys that share a URL. Since connect_dynamic_server_with_tenant calls this on every successful discovery and then repopulates only the current pool key, discovery by tenant/auth variant A erases variant B’s marker; B then misses the fast path and re-runs list_all_tools on its next request. For multi-tenant traffic on one MCP URL, this introduces repeated rediscovery/network churn and makes tool availability flaky whenever one of those extra discovery calls transiently fails.
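
The difference between a URL-wide clear and a pool-scoped clear can be shown with a small sketch. It assumes a simplified two-field `PoolKey` and a hypothetical `clear_pool_tools` method; only the `retain` line mirrors the hunk quoted above.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified, hypothetical pool identity.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PoolKey {
    url: String,
    tenant_id: Option<String>,
}

#[derive(Default)]
struct Inventory {
    tools_by_pool: HashMap<PoolKey, HashSet<String>>,
}

impl Inventory {
    /// URL-wide clear: drops the markers of every pool entry sharing the URL.
    fn clear_server_tools(&mut self, server_key: &str) {
        self.tools_by_pool.retain(|key, _| key.url != server_key);
    }

    /// Pool-scoped clear (hypothetical): drops only the exact entry.
    fn clear_pool_tools(&mut self, pool_key: &PoolKey) {
        self.tools_by_pool.remove(pool_key);
    }
}
```

Clearing by the exact key before rediscovery for variant A leaves variant B's marker in place, so B still qualifies for the fast path on its next request.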

MohanKumar21 requested review from CatherineSue, key4ng, slin1237, zhaowenzi and zhoug9127 as code owners May 15, 2026 11:26

github-actions Bot added the mcp MCP related changes label May 15, 2026

gemini-code-assist Bot reviewed May 15, 2026

coderabbitai Bot reviewed May 15, 2026

coderabbitai Bot approved these changes May 15, 2026

chatgpt-codex-connector Bot reviewed May 15, 2026

Comment thread crates/mcp/src/core/orchestrator.rs Outdated

MohanKumar21 force-pushed the ksmkumar/fix-intermittent-wf-failure branch from 00ff20d to 6279b39 Compare May 15, 2026 13:39

coderabbitai Bot requested changes May 15, 2026

Comment thread crates/mcp/src/core/orchestrator.rs Outdated

chatgpt-codex-connector Bot reviewed May 15, 2026

MohanKumar21 added 2 commits May 18, 2026 11:21

fix(mcp): evict pooled connections when tool discovery fails

ab84411

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

fix(mcp): track tool discovery on pool to fix flaky dynamic MCP tests

76e0715

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

MohanKumar21 force-pushed the ksmkumar/fix-intermittent-wf-failure branch from 3b75491 to 3c2b4a7 Compare May 18, 2026 05:51

chatgpt-codex-connector Bot reviewed May 18, 2026

MohanKumar21 force-pushed the ksmkumar/fix-intermittent-wf-failure branch from 3c2b4a7 to 1a6281e Compare May 18, 2026 07:56

chatgpt-codex-connector Bot reviewed May 18, 2026

coderabbitai Bot requested changes May 18, 2026

Comment thread crates/mcp/src/core/pool.rs

Comment thread crates/mcp/src/inventory/index.rs Outdated

fix(mcp): harden dynamic connection pool and enforce idle timeout

9b12527

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

MohanKumar21 force-pushed the ksmkumar/fix-intermittent-wf-failure branch from 1a6281e to 9b12527 Compare May 18, 2026 09:28

fix(mcp): Fix MCP dynamic inventory fast-path isolation

08c3731

Signed-off-by: MohanKumar21! <mohanmrm20@gmail.com>

coderabbitai Bot requested changes May 18, 2026

chatgpt-codex-connector Bot reviewed May 18, 2026
