Summary
POST /v1/jobs/async/submit currently accepts only {agent_type, input, job_id?, expiry_seconds?}. There is no way for an HTTP client to control which data sources (web search, knowledge base, paper search, etc.) the agent uses for a given request.
The backend already supports per-request data-source filtering end-to-end; the HTTP surface just doesn't expose it. This issue proposes adding an optional data_sources: list[str] | None field to the request body so clients can fetch the available sources via GET /v1/data_sources and then scope their submissions.
Motivation
Today the only way to supply a data_sources list is via the WebSocket chat path, where the UI stringifies {"query": "...", "data_sources": [...]} into the chat message body. That works for the UI but is not a usable contract for external HTTP clients. For server-to-server or scripted usage, we want:
- GET /v1/data_sources: enumerate available sources (already exists).
- POST /v1/jobs/async/submit with data_sources: [...]: submit a job scoped to a chosen subset.
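As a rough illustration of the request body a scripted client would send under this proposal, here is a minimal sketch. The helper name build_submit_body, the agent type "example_agent", and the assumption that input is a plain string are all hypothetical; data_sources is the proposed field and is not accepted by the endpoint today.

```python
from __future__ import annotations

import json


def build_submit_body(
    agent_type: str,
    input_text: str,
    data_sources: list[str] | None = None,
) -> str:
    """Build a JSON body for POST /v1/jobs/async/submit (hypothetical helper).

    Assumes `input` is a string; the issue does not state its type.
    """
    body: dict[str, object] = {"agent_type": agent_type, "input": input_text}
    if data_sources is not None:
        # Omit the field entirely when the caller has no preference,
        # so the server keeps its default of using every configured source.
        body["data_sources"] = data_sources
    return json.dumps(body)


# "example_agent" and "web_search" are placeholder values, not known ids.
scoped = build_submit_body("example_agent", "summarize recent work", ["web_search"])
unscoped = build_submit_body("example_agent", "summarize recent work")
```

The ids passed in data_sources would come from the GET /v1/data_sources response, whose shape is not described in this issue.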
Current state
The plumbing below submit_job is already complete:
- submit_agent_job in frontends/aiq_api/src/aiq_api/jobs/submit.py already accepts data_sources: list[str] | None.
- The Dask runner in frontends/aiq_api/src/aiq_api/jobs/runner.py already calls filter_tools_by_sources(tools, data_sources) before constructing the agent.
- The filter primitive (src/aiq_agent/common/data_sources.py::filter_tools_by_sources) already handles None → all tools, [] → no tools, unknown ids → silent drop.
The only gap is JobSubmitRequest in frontends/aiq_api/src/aiq_api/routes/jobs.py, which has no data_sources field, so the HTTP handler cannot pass one through.
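For readers unfamiliar with the filter, the three cases listed above (None, [], unknown ids) can be restated as a small sketch. This is not the code in src/aiq_agent/common/data_sources.py; the Tool class, its source_id attribute, and the function name filter_tools_sketch are hypothetical stand-ins, and the sketch assumes every tool belongs to exactly one data source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for an agent tool; the real type is not shown here."""

    name: str
    source_id: str  # assumption: each tool maps to exactly one data source


def filter_tools_sketch(tools: list[Tool], data_sources: list[str] | None) -> list[Tool]:
    """Illustrative restatement of the documented semantics, not the real function."""
    if data_sources is None:
        # None -> all tools.
        return list(tools)
    wanted = set(data_sources)
    # [] -> empty set, so nothing matches and no tools are returned.
    # Unknown ids match no tool, so they are dropped without an error.
    return [tool for tool in tools if tool.source_id in wanted]
```

The silent drop of unknown ids is the behavior the proposal below tightens at the HTTP layer.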
Proposal
Add data_sources: list[str] | None (default None) to JobSubmitRequest and validate it in the handler against the live registry.
Semantics:
| Client sends | Behavior |
| --- | --- |
| field omitted | use all configured data-source tools (unchanged default) |
| null | identical to omitted |
| [] | job submitted with no data-source tools enabled |
| ["known_id", ...] | filter to those sources; job submitted |
| any unknown id | 422 Unprocessable Entity naming every unknown id; job not submitted |
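The handler-side check implied by the table could look roughly like the sketch below. It is framework-agnostic on purpose: UnknownDataSourcesError and validate_data_sources are hypothetical names, the registry is assumed to be available as a set of id strings, and mapping the error to a 422 response is left to the actual handler.

```python
from __future__ import annotations


class UnknownDataSourcesError(ValueError):
    """Hypothetical error that a handler could translate into a 422 response."""

    def __init__(self, unknown: list[str]) -> None:
        self.unknown = unknown
        super().__init__(f"Unknown data source ids: {', '.join(unknown)}")


def validate_data_sources(
    requested: list[str] | None,
    registered: set[str],
) -> list[str] | None:
    """Return the value to forward to submit_agent_job, or raise on unknown ids."""
    if requested is None:
        # Field omitted or null: keep the unchanged default (all configured tools).
        return None
    # Collect every unknown id, in request order and without repeats,
    # so the error can name all of them at once.
    unknown = [source for source in dict.fromkeys(requested) if source not in registered]
    if unknown:
        raise UnknownDataSourcesError(unknown)
    # [] passes through unchanged and means "no data-source tools".
    return requested
```

Rejecting unknown ids up front, instead of relying on the filter's silent drop, means a typo in a client request fails loudly rather than producing a job that quietly ran with fewer sources than intended.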