Fix CLI timeout enforcement, web-fetch cooldown persistence, vertex test assertions#16
Merged
Merged
Conversation
…est assertions Three latent bugs surfaced by 'unused variable' warnings: 1. server/api/cli_tools.rs: execute_command and run_tool computed timeout_secs but never enforced it, so /api/cli-tools/execute and /api/cli-tools/run/:name could hang the request indefinitely on a slow child process. Pulled command execution into run_command_with_timeout that spawns with kill_on_drop, uses tokio::time::timeout, and reports timed_out=true on expiry. 2. server/api/web_fetch.rs: mark_connection_unavailable computed an 'until' timestamp from the cooldown but never wrote it back to the provider connection. As a result the per-connection cooldown was effectively a no-op across requests: select_connection / filter_available_accounts use rate_limited_until to skip cooling-down accounts, but it was always None. Now persist rate_limited_until on failure and clear it in clear_connection_error on success. 3. core/executor/vertex.rs: test_parse_vertex_model_standard and test_parse_vertex_model_partner destructured project_id and (for partner) actual_model but never asserted on them. Added the missing assertions so the parser contract is locked in. Plus four tokio tests for run_command_with_timeout covering happy path, stdout capture, hard timeout (kill within seconds), and the missing-binary error path.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three latent bugs were sitting in the Rust BE behind
unused variable: ...compile warnings. None of them caused a compile/test failure, so CI was green, but each silently broke a real feature. Fixed all three and added regression tests.1.
src/server/api/cli_tools.rs—timeout_secsnever enforcedPOST /api/cli-tools/executeandPOST /api/cli-tools/run/:toolboth accept atimeout_secsfield, capped to 120s. The handlers computedtimeout_secsbut then calledtokio::process::Command::output().awaitdirectly, with no timeout wrapper. A slow or hung child process would block the request forever and returntimed_out: false.Pulled child execution into a new
run_command_with_timeouthelper that:kill_on_drop(true)so the child is reaped when the future is dropped,wait_with_output()intokio::time::timeout(Duration::from_secs(timeout_secs), …),success: false,exit_code: None,stderr: "Command timed out after Ns",timed_out: true.Added 4 tokio tests: happy path, stdout capture, hard timeout (verifies the kill happens within seconds for a
sleep 30child), and the missing-binary error path.2.
src/server/api/web_fetch.rs—rate_limited_untilnever persistedmark_connection_unavailableis the function that gets called when a/v1/web/fetchupstream returns a fallback-worthy error (429 / 5xx). It computed anuntil: DateTime<Utc>from the cooldown duration but never wrote it toc.rate_limited_until. The fallback selector elsewhere (select_connection,filter_available_accounts,is_account_unavailable— seecore/account_fallback/) keys off exactly that field to decide whether to skip a cooling-down connection.Net effect: the per-connection cooldown for web-fetch fallback was a no-op across requests. The excluded HashSet inside
do_fetch_with_fallbackcovered the current request only; the next request would happily re-pick the same dead connection until it died again. Compare toapply_error_stateincore/account_fallback/mod.rs:614which setsrate_limited_untilcorrectly for the chat path — same intent, just missed in this code path.Fix: persist
until.to_rfc3339()intoc.rate_limited_untilinmark_connection_unavailable, and clear it back toNoneinclear_connection_erroron success.3.
src/core/executor/vertex.rs— missing test assertionstest_parse_vertex_model_partnerdestructured(location, project_id, actual_model, is_partner)but only asserted onlocationandis_partner.test_parse_vertex_model_standardignoredproject_idsimilarly. Added the missingassert_eq!s so the partner branch'sactual_model == "glm-5-maas"(nomodels/prefix) and the always-emptyproject_idcontract are locked in.Review & Testing Checklist for Human
cargo test --lib --tests— 54 test binaries, 0 failures locally; CI should be the same./api/cli-tools/executefrom the dashboard with a long-running command (e.g.sleep 60,timeout_secs: 2) and confirm the response returns within ~2s withtimed_out: trueand that nosleepprocess is left behind on the host (pgrep sleep)./v1/web/fetchagainst a provider that 429s, then immediately again and confirm the second request skips that connection (look forrate_limited_untilindb.json's provider_connections row, and the dashboard's provider card showing 'cooling down').cd web && npm run build).Notes
The compile warnings around
unused variable: timeout_secs,unused variable: until,unused variable: project_id, andunused variable: actual_modelare now gone, which is the simplest sniff-test that the bugs are gone too. A few unrelated unused-variable warnings remain (p,tool_name_map,model_str,plan,provider,headers,reqforStartMitmRequest) — I reviewed each and they're stale parameters / leftover refactors, not behavior-affecting. Happy to clean those up in a follow-up if you want.The CI workflow at
.github/workflows/rust-ci.ymlrunscargo fmt --all --checkandcargo test --lib --tests. Both pass locally on Rust 1.95.0.