fix(server): unblock per-pod concurrency — uvicorn workers=4 + pymongo native async#237
Currently the production CMD runs `uvicorn ... --port 5003` with no
`--workers` flag, which falls through to uvicorn's default of 1 worker.
Combined with `pymongo` (sync) calls inside `async def` adapter methods,
each pod is effectively single-threaded: every Mongo round-trip blocks
the entire event loop, so the pod can only process one in-flight
Mongo-bound request at a time.
At Cosmos round-trip latencies of 10-30ms, this caps each pod at
~30-100 req/s. Downstream tenants (OneEdge, FDD, Deep Research,
customer-emu-*) have been compensating by horizontally scaling: the
OneEdge user-testing pack runs ~20 agentex pods just to handle a few
hundred concurrent users, even though pods sit at ~12% CPU utilization
(1 worker on 8-core nodes).
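The ~30-100 req/s figure follows from simple arithmetic: if every request holds the event loop for one full round-trip, a worker can finish at most 1/latency requests per second. A minimal sketch of that estimate (the function name is illustrative, and it assumes one Mongo round-trip per request with no other work):

```python
def serialized_ceiling(latency_s: float, workers: int = 1) -> float:
    """Upper bound on req/s when each worker serializes one round-trip at a time.

    Assumption: every request costs exactly one blocking round-trip of
    `latency_s` seconds and nothing else.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return workers / latency_s


if __name__ == "__main__":
    for latency_ms in (10, 30):
        one = serialized_ceiling(latency_ms / 1000)
        four = serialized_ceiling(latency_ms / 1000, workers=4)
        print(f"{latency_ms} ms: 1 worker ~{one:.0f} req/s, 4 workers ~{four:.0f} req/s")
```

Under the same assumption, four workers raise the ceiling roughly fourfold, which is the effect the `--workers` change targets.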
This patch sets `--workers ${UVICORN_WORKERS:-4}` in the production
final-stage CMD. 4 workers is a safe default for the typical 1 CPU /
2Gi pod limits we see in customer-emu-fdd packs — it gives ~4x
concurrency per pod without exhausting memory (each worker is ~150-
200MB resident). Operators with higher CPU/memory limits can bump
UVICORN_WORKERS up to ~cpu_count.
The dev stage CMD (`--reload` mode) intentionally stays at the
implicit 1 worker because `--reload` is incompatible with
`--workers >1`.
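For readers less familiar with the shell form, `${UVICORN_WORKERS:-4}` uses 4 when the variable is unset or empty. A rough Python equivalent is sketched below; `resolve_workers` is a hypothetical helper, not code from this PR, and the integer validation is an extra assumption that the shell expansion itself does not perform.

```python
import os
from typing import Mapping, Optional

DEFAULT_WORKERS = 4  # mirrors the default in ${UVICORN_WORKERS:-4}


def resolve_workers(env: Optional[Mapping[str, str]] = None,
                    default: int = DEFAULT_WORKERS) -> int:
    """Return the worker count, falling back to `default` when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("UVICORN_WORKERS", "")
    if raw == "":
        # ":-" treats an empty value the same as an unset one.
        return default
    workers = int(raw)  # raises ValueError on non-numeric input
    if workers < 1:
        raise ValueError("UVICORN_WORKERS must be >= 1")
    return workers


if __name__ == "__main__":
    print(resolve_workers())
```

An operator override such as `UVICORN_WORKERS=8` then resolves to 8, while an empty or missing value keeps the default of 4.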
Followup work (separate PRs, larger scope):
- Wrap each sync `pymongo` call in `asyncio.to_thread()` so the
event loop isn't blocked at all (this would let a single worker
overlap many concurrent ops, bounded by the default thread-pool
size and `MONGODB_MAX_POOL_SIZE`).
- Migrate the adapter to `motor` (the async MongoDB driver) for
the cleanest end state.
Either of those would deliver another ~10x on top of this change.
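The first follow-up can be illustrated without MongoDB at all. In the sketch below, `blocking_round_trip` is a hypothetical stand-in for a sync `pymongo` call (simulated with `time.sleep`), and the two handlers contrast calling it directly inside `async def` with offloading it through `asyncio.to_thread` (Python 3.9+).

```python
import asyncio
import time


def blocking_round_trip(delay: float = 0.05) -> str:
    """Hypothetical stand-in for a synchronous driver call."""
    time.sleep(delay)
    return "ok"


async def blocked_handler(delay: float) -> str:
    # Sync call inside async def: the event loop is stalled for `delay`.
    return blocking_round_trip(delay)


async def offloaded_handler(delay: float) -> str:
    # Same call on the default thread pool: the loop stays free meanwhile.
    return await asyncio.to_thread(blocking_round_trip, delay)


async def timed(handler, n: int = 4, delay: float = 0.05):
    """Run `n` handlers concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(delay) for _ in range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, blocked = asyncio.run(timed(blocked_handler))
    _, offloaded = asyncio.run(timed(offloaded_handler))
    print(f"blocked: {blocked:.3f}s, offloaded: {offloaded:.3f}s")
```

With four concurrent handlers, the blocked variant takes about four delays end to end, while the offloaded variant overlaps them as long as the thread pool has free threads.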
Verified
- `docker buildx build --check` on the new Dockerfile: no warnings.
- Diff against main is purely additive (a comment + ENV + replaced
CMD); no other behavior changes.
References
- adapter_mongodb.py uses sync pymongo: src/adapters/crud_store/adapter_mongodb.py:8,219,277
- async def wrapping sync calls: src/adapters/crud_store/adapter_mongodb.py:196 etc.
- pymongo.MongoClient init: src/config/dependencies.py
The MongoDB adapter exposes async methods but the underlying driver (pymongo) is synchronous, so every Mongo round-trip blocked the worker's event loop, capping a single worker to one in-flight DB request at a time and starving every other coroutine until the call returned.
Wrap each pymongo call site (find/find_one, insert_one/many, update_one, delete_one/many, and cursor materializations) with asyncio.to_thread so the I/O runs on the default thread pool while the event loop keeps serving other requests. For cursor builders the wrapper closes over the full chain (find().skip().limit().sort()) so the chained calls and the list() materialization happen together in the worker thread.
Combined with the --workers 4 default this gives per-pod concurrency on the Mongo-bound paths instead of one-at-a-time serialization.
Local load test against /agents (which hits the wrapped list() path): workers=1 256 req/s -> workers=4 892 req/s, 0 failures over 5000 reqs.
Builds on the asyncio.to_thread wrap from the parent PR. Switches the
MongoDB adapter from sync pymongo (wrapped in to_thread) to pymongo's
native async API (AsyncMongoClient, AsyncDatabase, AsyncCollection),
which has been GA since pymongo 4.13 and is MongoDB's recommended
replacement for motor (motor deprecated 2025-05).
With native async, each coroutine in the adapter is truly non-blocking
on Mongo I/O. We drop the thread-pool hop entirely:
- asyncio.to_thread(self.collection.insert_one, data)
-> await self.collection.insert_one(data)
- asyncio.to_thread(lambda: list(self.collection.find(q)))
-> await self.collection.find(q).to_list(length=None)
- for cursor paginators (list, find_by_field,
find_by_field_with_cursor) the chained skip/limit/sort still
builds the cursor lazily; only to_list() materializes.
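The paginator pattern described above can be sketched as follows. So that the example runs without a database, `FakeAsyncCollection` and `FakeAsyncCursor` are hypothetical in-memory stand-ins that only mimic the shape of pymongo's async `find()` / `sort()` / `skip()` / `limit()` / `to_list()` chain; `list_page` is likewise an illustrative name, not the adapter's actual method.

```python
import asyncio


class FakeAsyncCursor:
    """Hypothetical stand-in: chain calls only record options, to_list() does the work."""

    def __init__(self, docs):
        self._docs, self._key, self._reverse = docs, None, False
        self._skip, self._limit = 0, 0
        self.materialized = False

    def sort(self, key, direction=1):
        self._key, self._reverse = key, direction < 0
        return self

    def skip(self, n):
        self._skip = n
        return self

    def limit(self, n):
        self._limit = n
        return self

    async def to_list(self, length=None):
        await asyncio.sleep(0)  # yield to the loop, as real network I/O would
        self.materialized = True
        docs = list(self._docs)
        if self._key is not None:
            docs.sort(key=lambda d: d[self._key], reverse=self._reverse)
        docs = docs[self._skip:]
        if self._limit:
            docs = docs[: self._limit]
        return docs if length is None else docs[:length]


class FakeAsyncCollection:
    """Hypothetical stand-in for an async collection with equality-only queries."""

    def __init__(self, docs):
        self._docs = docs

    def find(self, query=None):
        query = query or {}
        return FakeAsyncCursor(
            [d for d in self._docs if all(d.get(k) == v for k, v in query.items())]
        )


async def list_page(collection, query, *, page, page_size, sort_key):
    # Building the cursor is synchronous and lazy; only to_list() is awaited.
    cursor = collection.find(query).sort(sort_key, 1).skip(page * page_size).limit(page_size)
    return await cursor.to_list(length=None)


async def demo():
    coll = FakeAsyncCollection([{"name": n, "kind": "agent"} for n in "dacbe"])
    return await list_page(coll, {"kind": "agent"}, page=1, page_size=2, sort_key="name")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the shape is that no `await` appears until `to_list()`, so there is exactly one suspension point per query and no thread-pool hop.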
Also:
- GlobalDependencies.mongodb_client is now AsyncMongoClient; close()
is now awaited in force_reload and async_shutdown.
- mongodb_indexes.{ensure,drop_all,get_index_stats} are async, since
create_index / drop_indexes / list_indexes are all awaitable on
AsyncCollection.
- Test fixtures (base_mongodb_database, mongodb_database, integration
isolated_test_schema) yield an AsyncDatabase so repositories under
test can `await` collection ops just like in production.
- Bumps pymongo floor to 4.13 (the GA cut for AsyncMongoClient).
Adds tests/unit/adapters/test_mongodb_adapter.py covering create / get
/ update / delete, batch_create / batch_get, list pagination + sort,
find_by_field with paging, find_by_field_with_cursor before/after, and
delete_by_field. All paths exercise the new await collection.* and the
cursor.to_list materialization.
Verified locally (workers=4, /agents endpoint, 5000 reqs at c=100):
- 0 Mongo-path failures (the 5xx in the run were Postgres
TooManyConnectionsError from the local test container, unrelated)
- ~850 req/s sustained, ~1 GB RSS
- 6/6 new adapter unit tests pass
olliestanley
left a comment
mostly looks great from my perspective. just left two small comments in the diff.
outside the diff, one P0 to address:
in src/api/health_interceptor.py we need to fix the readiness check - it currently runs
await asyncio.to_thread(client.admin.command, "ping")
but since we switched to the async client, this will not actually execute the coroutine. it can be replaced with
await client.admin.command("ping")
Three items from @olliestanley's review:
1. health_interceptor._check_mongodb was passing the async client's coroutine-returning command() to asyncio.to_thread, which wraps it in a thread that just creates the coroutine and returns it without awaiting. The healthcheck silently always passed. Replace with direct `await client.admin.command("ping")` now that the client is async.
2. Expand the Dockerfile comment near UVICORN_WORKERS=4 to call out that each worker has its own DB connection pool, and that scaling workers multiplies the per-pod connection count. List the relevant env vars (POSTGRES_POOL_SIZE / POSTGRES_MIDDLEWARE_POOL_SIZE / MONGODB_MAX_POOL_SIZE) so ops knows what to tune in tandem.
3. Rename test_delete_by_field_and_batch_delete -> test_delete_by_field (the test doesn't exercise batch_delete, just delete_by_field).
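The failure mode in item 1 is easy to reproduce with the standard library alone. In this sketch, `FakeAsyncAdmin` is a hypothetical stand-in for the async client's `admin` database (its `command` is `async def`, as on the real async client); `broken_check` and `fixed_check` are illustrative names for the before and after shapes of the readiness check.

```python
import asyncio
import inspect


class FakeAsyncAdmin:
    """Hypothetical stand-in for the async client's admin database."""

    def __init__(self):
        self.pings = 0

    async def command(self, name):
        self.pings += 1
        return {"ok": 1.0}


async def broken_check(admin) -> bool:
    # to_thread calls admin.command("ping") in a worker thread, but calling an
    # async function only creates a coroutine object; nothing executes it.
    result = await asyncio.to_thread(admin.command, "ping")
    is_unawaited_coroutine = inspect.iscoroutine(result)
    result.close()  # suppress the "never awaited" RuntimeWarning in this demo
    return is_unawaited_coroutine


async def fixed_check(admin):
    # Awaiting the coroutine directly actually performs the ping.
    return await admin.command("ping")


if __name__ == "__main__":
    admin = FakeAsyncAdmin()
    print(asyncio.run(broken_check(admin)), admin.pings)
    print(asyncio.run(fixed_check(admin)), admin.pings)
```

Because the broken variant never raises, a check built on it reports healthy regardless of the database state, which matches the "silently always passed" behavior described above.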
Thanks @olliestanley — all three addressed in 5540620:
The readiness test mocked client.admin.command with a plain MagicMock, which worked when the call was wrapped in asyncio.to_thread (sync return value). Now that the production code awaits the call directly on the native async client, the mock must return a coroutine — switch to AsyncMock. This is the exact symmetric case of the P0 we just fixed: the test mock and the production code have to agree on whether the call is sync or async.
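The mock mismatch can be shown with `unittest.mock` alone (Python 3.8+ for `AsyncMock`). Here `check_mongodb` is a hypothetical, simplified version of the readiness check, and `run_with` is a small helper written only for this illustration.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def check_mongodb(client) -> bool:
    """Hypothetical simplified readiness check: await the ping directly."""
    await client.admin.command("ping")
    return True


async def run_with(mock_command) -> bool:
    """Return True if the check succeeds, False if awaiting the mock fails."""
    client = MagicMock()
    client.admin.command = mock_command
    try:
        return await check_mongodb(client)
    except TypeError:
        # A plain MagicMock returns a non-awaitable value, so `await` raises.
        return False


if __name__ == "__main__":
    print(asyncio.run(run_with(MagicMock(return_value={"ok": 1}))))
    print(asyncio.run(run_with(AsyncMock(return_value={"ok": 1}))))
```

`AsyncMock` also records awaits, so a test can assert `mock.assert_awaited_once_with("ping")` to confirm the call was really awaited rather than merely invoked.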
for posterity: the integration test run here failed because it has trouble running on a fork, but I was able to manually kick one off and all the integration tests did pass: https://github.com/scaleapi/scale-agentex/actions/runs/25875798139
in the process of reviewing now. ultimately this is a fix that will benefit all of our Agentex users, so many thanks for attacking this so eagerly.
smoreinis
left a comment
LG, I built a custom image and deployed it to our AWS sgp dev environment so can confirm that this deploys cleanly there.
Problem
A single AgentEx pod was getting bottlenecked on conversational user traffic, forcing us to scale to ~20 pods just for user testing. Two compounding causes:
1. `--workers 1` in production. One worker == one process == one event loop.
2. The MongoDB adapter exposes `async def` methods but uses the synchronous `pymongo` driver. Every Mongo call inside an `async def` blocked the event loop for the full DB round-trip, so even within a single worker the server serialized all in-flight Mongo-bound requests instead of multiplexing them.
Net effect: per-pod throughput on any conversational/persistence path was effectively 1 request at a time.
Changes
This PR lands two commits in order:
1. `fix(server): default to uvicorn --workers 4 in production CMD`
Sets `UVICORN_WORKERS=4` as the production default (overridable via env var). Dev stage stays single-worker because `--reload` is incompatible with `--workers >1`.
2. `fix(server): migrate to pymongo native async (AsyncMongoClient)`
Migrates the MongoDB adapter from sync pymongo to pymongo's native async API (`AsyncMongoClient` / `AsyncDatabase` / `AsyncCollection`), GA since pymongo 4.13. Each coroutine in the adapter is now truly non-blocking on Mongo I/O.
For cursor paginators (`list`, `find_by_field`, `find_by_field_with_cursor`) the chained `.skip().limit().sort()` is still lazy; only `to_list()` materializes.
Why not Motor? Motor was officially deprecated by MongoDB on May 14, 2025 in favor of the pymongo native async API. Final EOL is May 2027. The recommended migration path is straight to `pymongo.AsyncMongoClient`.
Other touches:
- `GlobalDependencies.mongodb_client` is now `AsyncMongoClient`; `close()` is awaited in `force_reload` / `async_shutdown`; startup ping is awaited.
- `mongodb_indexes.{ensure,drop_all,get_index_stats}` are `async def`; `create_index` / `drop_indexes` / `list_indexes` are awaited / async-iterated on `AsyncCollection`.
- `TaskStateRepository.get_by_task_and_agent` awaits `find_one`.
- Test fixtures (`base_mongodb_database`, legacy `mongodb_database`, integration `isolated_test_schema`) yield an `AsyncDatabase` so repositories under test consume the same async API as in production.
- `pymongo` dep floor bumped to `>=4.13` (the GA cut for `AsyncMongoClient`).
Tests
New `agentex/tests/unit/adapters/test_mongodb_adapter.py` covering every path the migration touches: create / get / update / delete, batch_create / batch_get, list with pagination + ordering (asc/desc), find_by_field with paging, find_by_field_with_cursor before/after, delete_by_field. 6/6 tests pass against a real MongoDB testcontainer.
Stress-test results
Tested with two images (workers=4 + to_thread, workers=4 + native async) side-by-side, capped to prod-pod size (1 CPU / 2 GiB), against the same local MongoDB seeded with 15 realistic agent documents.
Reads: 10 000 `find()` calls per run
The to_thread baseline plateaus at ~4400 r/s the moment concurrency exceeds the default ThreadPoolExecutor size; adding more concurrency does nothing because in-flight Mongo ops are bounded by `min(32, cpu+4)` threads per worker. Native async has no such ceiling; the I/O is truly non-blocking.
Writes: 20 000 (`insert_one` + `find_one`) per run
100 000-request soak (c=200, reads)
Both memory-stable across 100 k ops; native async is 31 % faster end-to-end on the same workload.
HTTP layer (`/agents`, c=100, 5 000 reqs)
This PR delivers ~850 req/s at workers=4 on the local stack; the HTTP-layer bottleneck is the Postgres-bound auth middleware and the 1-CPU cap, not the Mongo path. The Mongo-isolated numbers above are the right view of what this PR actually changes: in production with a properly sized Postgres pool, the HTTP-layer ceiling moves to where the Mongo numbers are.
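As a rough illustration of the thread-pool ceiling mentioned in the reads results, the `min(32, cpu+4)` bound can be turned into a per-pod estimate. The helper names below are illustrative. One caveat, stated as an assumption to verify in your environment: the CPU count Python sees may be the host's core count rather than the container's CPU limit, so the effective pool can be larger than the pod size suggests.

```python
def default_pool_threads(cpu_count: int) -> int:
    """Default ThreadPoolExecutor size used by asyncio.to_thread: min(32, cpu + 4)."""
    return min(32, cpu_count + 4)


def max_inflight_ops(workers: int, cpu_count: int) -> int:
    """Upper bound on concurrently running to_thread calls across all workers."""
    return workers * default_pool_threads(cpu_count)


if __name__ == "__main__":
    for cpus in (1, 8):
        print(f"cpu={cpus}: {default_pool_threads(cpus)} threads/worker, "
              f"{max_inflight_ops(4, cpus)} in-flight ops at workers=4")
```

Native async has no equivalent per-worker thread cap; in-flight operations are instead limited by the driver's connection pool settings.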
Per-pod capacity translation
Assuming a typical chat turn fans out into ~15–25 server requests:
Risk
- Adapter methods were already `async def`; only the underlying I/O is now non-blocking.
- `pymongo.errors.*` exception types are shared between sync and async APIs, so the retry decorator (which catches `AutoReconnect`, `NetworkTimeout`, `ServerSelectionTimeoutError`) keeps working.
- [...] `retryWrites=False`, pool sizes, timeouts).
- Dev stage stays single-worker: `--reload` doesn't compose with `--workers >1`.
Rollback
Revert this PR. As an emergency env-only mitigation, setting `UVICORN_WORKERS=1` at runtime still leaves the native-async adapter in place (which is strictly better than the pre-PR sync-blocking behavior).
Greptile Summary
- Migrates the MongoDB adapter from sync `pymongo` to `pymongo`'s native async API (`AsyncMongoClient` / `AsyncCollection`), eliminating event-loop blocking on every Mongo I/O call. All callers that already declared `async def` now truly await the I/O.
- Sets `UVICORN_WORKERS=4` as the production default via `ENV` + a `sh -c "exec ..."` CMD, allowing the value to be overridden at runtime via the `UVICORN_WORKERS` env var while preserving proper PID-1 signal propagation.
- Adds unit tests (`test_mongodb_adapter.py`) exercising the migrated code paths against a real MongoDB testcontainer.
Confidence Score: 5/5
Safe to merge; all async conversions are correct, test coverage is solid, and no P1/P0 issues found.
No P0 or P1 issues identified. The async migration touches every collection call site and all are correctly awaited. Pytest asyncio_mode=auto is configured, so @pytest.fixture async def fixtures work as expected. drop_all_indexes and get_index_stats have no sync callers. The Dockerfile CMD uses exec for proper PID-1 signal handling. New unit tests exercise all migrated paths against a real testcontainer.
No files require special attention.
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant C as Client
    participant U as uvicorn worker (x4)
    participant F as FastAPI handler
    participant R as MongoDBCRUDRepository
    participant M as MongoDB (AsyncMongoClient)
    C->>U: HTTP request
    U->>F: route dispatch
    F->>R: await repo.get / create / list
    Note over R: Previously: sync pymongo blocked event loop here
    R->>M: await collection.find_one / insert_one
    M-->>R: result (non-blocking I/O)
    R-->>F: deserialized model
    F-->>U: JSON response
    U-->>C: HTTP response
    Note over U: Other coroutines run freely while Mongo I/O is in-flight
Reviews (2): Last reviewed commit: "fix(test): switch healthcheck mock to As..."