fix(api): reject schemeless worker URLs in POST /workers#1532
fix(api): reject schemeless worker URLs in POST /workers#1532CatherineSue wants to merge 1 commit into
Conversation
`WorkerService::create_worker` calls `reserve_id_for_url(config.url)` before the AddWorker workflow runs, then returns a 202 with that WorkerId in the Location header. The reservation key is the submitted URL string; the workflow's `CreateWorkerStep` rewrites the URL via `normalize_url`, which prepends `http://` or `grpc://` for any input that doesn't already start with one of `http://`, `https://`, `grpc://`, or `grpcs://`. When normalization changes the string, the reservation is keyed on the bare URL while the live worker is registered under the canonical URL — two `url_to_id` entries, the bare one orphaned forever, and the WorkerId returned in the 202 points at nothing. The client polls its Location and gets 404. Validate the URL at the API boundary so the orphan can't be created through this path. Extract the existing scheme + host validation in `config::validation::ConfigValidator::validate_urls` into a free `pub(crate) fn validate_worker_url(url: &str) -> Result<(), String>`, have both the static-config validator and the new service-layer wrapper call it. Service-layer wrapper returns `WorkerServiceError::BadRequest` so the API returns 400 with a clear message instead of 202 -> 404. This does not address workflow-failure orphans (the reservation also leaks when the AddWorker workflow itself errors after a successful submission). That class needs a `release_reservation` lifecycle on the registry and is tracked as a separate follow-up. Signed-off-by: Chang Su <8605658+CatherineSue@users.noreply.github.com>
📝 WalkthroughWalkthroughThis PR extracts worker URL validation logic into a reusable helper function at the config layer, integrates it into config validation to produce ChangesWorker URL validation extraction and integration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors worker URL validation by extracting the logic into a reusable validate_worker_url function within the configuration module. This function is now utilized in the worker service to validate URLs during worker creation, ensuring that invalid inputs result in a 400 Bad Request error. The changes also include a comprehensive suite of unit tests covering various URL validation scenarios, such as scheme checks and host parsing. I have no feedback to provide as there were no review comments.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@model_gateway/src/worker/service.rs`:
- Around line 445-497: Add a service-level regression test that calls
WorkerService::create_worker with the bare host string "10.0.0.5:8000" and
asserts it returns a BadRequest (same failure class as
validate_worker_url_request); wire WorkerService with test doubles/mocks for the
ID reservation and job-queue submission used by create_worker and assert those
mocks were NOT invoked (i.e., no ID reservation and no queue submission
occurred) to ensure invalid URLs fail before side effects.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: df342086-617a-4ede-b206-9f0a7499ec65
📒 Files selected for processing (2)
model_gateway/src/config/validation.rsmodel_gateway/src/worker/service.rs
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_accepts_all_four_schemes() { | ||
| assert!(validate_worker_url_request("http://10.0.0.5:8000").is_ok()); | ||
| assert!(validate_worker_url_request("https://10.0.0.5:8000").is_ok()); | ||
| assert!(validate_worker_url_request("grpc://10.0.0.5:8000").is_ok()); | ||
| assert!(validate_worker_url_request("grpcs://10.0.0.5:8000").is_ok()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_accepts_case_insensitive_schemes() { | ||
| assert!(validate_worker_url_request("HTTP://10.0.0.5:8000").is_ok()); | ||
| assert!(validate_worker_url_request("GrPc://10.0.0.5:8000").is_ok()); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_rejects_bare_host_port_as_400() { | ||
| let err = validate_worker_url_request("10.0.0.5:8000").unwrap_err(); | ||
| assert!(matches!(err, WorkerServiceError::BadRequest { .. })); | ||
| assert_eq!(err.status_code(), StatusCode::BAD_REQUEST); | ||
| assert!(err | ||
| .to_string() | ||
| .contains("http://, https://, grpc://, or grpcs://")); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_rejects_empty_as_400() { | ||
| let err = validate_worker_url_request("").unwrap_err(); | ||
| assert!(matches!(err, WorkerServiceError::BadRequest { .. })); | ||
| assert!(err.to_string().contains("empty")); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_rejects_unknown_scheme() { | ||
| let err = validate_worker_url_request("ftp://10.0.0.5:8000").unwrap_err(); | ||
| assert!(matches!(err, WorkerServiceError::BadRequest { .. })); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_rejects_missing_host() { | ||
| let err = validate_worker_url_request("http://").unwrap_err(); | ||
| assert!(matches!(err, WorkerServiceError::BadRequest { .. })); | ||
| } | ||
|
|
||
| #[test] | ||
| fn validate_worker_url_request_rejects_unparsable_url() { | ||
| let err = validate_worker_url_request("http://[invalid").unwrap_err(); | ||
| assert!(matches!(err, WorkerServiceError::BadRequest { .. })); | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add a service-level regression test for create_worker side effects on invalid URLs.
These tests validate the wrapper, but they don’t assert the core regression behavior in WorkerService::create_worker: invalid URLs should fail before reservation and queue submission. Please add a test that calls create_worker("10.0.0.5:8000") and verifies no ID reservation/job submission occurs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@model_gateway/src/worker/service.rs` around lines 445 - 497, Add a
service-level regression test that calls WorkerService::create_worker with the
bare host string "10.0.0.5:8000" and asserts it returns a BadRequest (same
failure class as validate_worker_url_request); wire WorkerService with test
doubles/mocks for the ID reservation and job-queue submission used by
create_worker and assert those mocks were NOT invoked (i.e., no ID reservation
and no queue submission occurred) to ensure invalid URLs fail before side
effects.
Description
Problem
WorkerService::create_worker(model_gateway/src/worker/service.rs:240) callsreserve_id_for_url(config.url)(registry.rs:889) before the AddWorker workflow runs, so that the HTTP handler can return202 Acceptedsynchronously withLocation: /workers/{worker_id}. The contract requires the reservation key (config.urlat submission time) and the registration key (worker.url()after workflow normalization) to be the same string — only then doesregister_inner'surl_to_id.entry(...)find the pre-reserved entry and reuse the reservedWorkerId.CreateWorkerSteprewrites the URL vianormalize_url(model_gateway/src/workflow/steps/local/create_worker.rs:300-313):When the submitted URL has no recognized scheme (e.g. bare
host:port),normalize_urlprepends one — the reservation key and registration key diverge — and theworker_idreturned in the 202 response is permanently orphaned. The client polls/workers/{worker_id}and gets404 Not Found. The live worker is registered under a differentWorkerIdthat the client was never told about.Adjacent #1523 (
fix(service-discovery): drop http:// prefix so dual-probe detects gRPC) added aresolve_url_to_idcanonicalization helper that makes registry lookups tolerate the orphan, but does not address the API contract violation. This PR closes the orphan-creation path on the HTTP API side.Solution
Validate
config.urlat the API boundary, beforereserve_id_for_urlruns. Reject any URL that:http://,https://,grpc://,grpcs://(case-insensitive scheme allow-list), or::url::Url::parseor has no host.The exact rule already lives in
config::validation::ConfigValidator::validate_urls(used for staticrouter_configURLs). Extracted it into a freepub(crate) fn validate_worker_url(url: &str) -> Result<(), String>so both the static-config validator and the new service-layer wrapper share one implementation. The service-layer wrapper (validate_worker_url_request) translates the plain error string intoWorkerServiceError::BadRequest, yielding400 Bad Requestwith a message likeWorker URL '10.0.0.5:8000' is invalid: URL must start with http://, https://, grpc://, or grpcs://.Changes
model_gateway/src/config/validation.rs: extractpub(crate) fn validate_worker_url(url: &str) -> Result<(), String>.ConfigValidator::validate_urlsnow delegates to it and wraps the error string inConfigError::InvalidValue.model_gateway/src/worker/service.rs: addvalidate_worker_url_request(wraps the shared helper intoWorkerServiceError::BadRequest), call it as the first statement ofcreate_workerso no state is mutated for invalid input. Seven unit tests covering accept (four schemes + case-insensitive) and reject (barehost:port, empty, unknown scheme, missing host, unparsable URL) paths.Test Plan
Behavior matrix after this PR, walking every scenario the team discussed for #1523. Each timeline begins at
t=0when the API handler is invoked.Case A — K8s pod is discovered, registered, deleted (the primary scenario #1523 fixed)
This PR does not change this path.
service_discoverysubmitsJob::AddWorkerdirectly to the job queue and never callsWorkerService::create_worker; no validation runs.✅ Unchanged. Result: correct end-to-end registration and removal.
Case B — User POSTs
{"url":"http://10.0.0.5:8000"}(already-schemed)✅ Unchanged. Validation passes, no orphan.
Case C — User POSTs
{"url":"10.0.0.5:8000"}(bare) — THIS IS THE CASE THIS PR FIXESBefore this PR:
After this PR:
✅ Fixed. Client gets an immediate, actionable 400 instead of 202 -> 404. No orphan can be created via this path.
Case D — Mixed source: a human POSTs the same pod K8s is already managing
Two sub-cases.
D1: POST schemed URL while K8s also discovers the same pod.
Two valid workers exist for the same host:port under different schemes — a pre-existing data model permissiveness, unchanged by this PR. The 202's
worker_idis honest; polling it returns the worker the user created. No orphan, no 404.D2: POST bare URL while K8s also discovers the same pod.
Before this PR: produces the same orphan + 404 wart as Case C, plus the pod is also tracked separately by service_discovery under the canonical scheme.
After this PR: 400 at the API boundary as in Case C. The K8s-side registration is unaffected.
✅ The only behavior change is in the bare-URL sub-case, where we now reject upfront instead of silently producing an orphan.
Case E — Workflow or job-submit fails after a successful reservation
maintoday regardless of whether the submitted URL was schemed or bare. Validation at the API boundary cannot fix them — the URL is well-formed; it's the downstream failure that orphans the reservation.Properly fixing this requires a
release_reservation(url) -> boolregistry primitive that drops aurl_to_identry only when itsWorkerIdis not inworkers, wired into:WorkerService::create_worker's error-return paths (so E1 is reaped synchronously)Tracked separately; out of scope for this PR.
Test summary
cargo test -p smg --lib worker::service::tests::— 7/7 new tests passing (covering each accept and reject path ofvalidate_worker_url_request, including verification that rejections surface asWorkerServiceError::BadRequestwithStatusCode::BAD_REQUESTand the expected error message text).cargo test -p smg --lib config::validation::tests::— 23/23 pre-existing config-side tests passing after the helper extraction;test_validate_invalid_urlsstill exercises the same accept/reject rules via the new shared path.cargo +nightly fmt --checkclean.cargo clippy -p smg --all-targets --all-features -- -D warningsclean.Checklist
cargo +nightly fmtpassescargo clippy --all-targets --all-features -- -D warningspassesSummary by CodeRabbit
Bug Fixes
Tests