add optional data_sources field to restrict per-request tool selection #214

@tanleach

Summary

POST /v1/jobs/async/submit currently accepts only {agent_type, input, job_id?, expiry_seconds?}. There is no way for an HTTP client to control which data sources (web search, knowledge base, paper search, etc.) the agent uses for a given request.

The backend already supports per-request data-source filtering end-to-end; the HTTP surface just doesn't expose it. This issue proposes adding an optional data_sources: list[str] | None field to the request body so clients can fetch the available sources via GET /v1/data_sources and then scope their submissions.

Motivation

Today the only way to supply a data_sources list is via the WebSocket chat path, where the UI stringifies {"query": "...", "data_sources": [...]} into the chat message body. That works for the UI but is not a usable contract for external HTTP clients. For server-to-server or scripted usage, we want:

  1. GET /v1/data_sources — enumerate available sources (already exists).
  2. POST /v1/jobs/async/submit with data_sources: [...] — submit a job scoped to a chosen subset.
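
As a rough sketch of step 2 from a scripted client, the snippet below builds (but does not send) the submit request using only the standard library. The base URL, the agent type "example_agent", the source id "web_search", and the assumption that input is a plain string are all placeholders for illustration; none of them come from the codebase.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical host; adjust for your deployment


def build_submit_request(agent_type, input_text, data_sources=None):
    """Build a POST /v1/jobs/async/submit request without sending it."""
    body = {"agent_type": agent_type, "input": input_text}
    # Leaving the field out keeps the current default (all configured sources).
    if data_sources is not None:
        body["data_sources"] = list(data_sources)
    return urllib.request.Request(
        f"{BASE_URL}/v1/jobs/async/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; real ids would come from GET /v1/data_sources.
req = build_submit_request("example_agent", "Summarize topic X", ["web_search"])
# urllib.request.urlopen(req) would actually submit the job.
```

Omitting the key when no list is given means existing clients send exactly the body they send today.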

Current state

The plumbing below submit_job is already complete:

  • submit_agent_job in frontends/aiq_api/src/aiq_api/jobs/submit.py already accepts data_sources: list[str] | None.
  • The Dask runner in frontends/aiq_api/src/aiq_api/jobs/runner.py already calls filter_tools_by_sources(tools, data_sources) before constructing the agent.
  • The filter primitive (src/aiq_agent/common/data_sources.py::filter_tools_by_sources) already handles None → all tools, [] → no tools, unknown ids → silent drop.
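
For reference, the three filter behaviours listed above can be illustrated with a small stand-in. This is not the code in src/aiq_agent/common/data_sources.py; the function name and the (source_id, tool) pair representation are assumptions made only to show the semantics.

```python
def filter_tools_sketch(tools, data_sources):
    """Illustrative stand-in for filter_tools_by_sources.

    `tools` is a list of (source_id, tool) pairs, a hypothetical shape
    chosen for this sketch.
    """
    if data_sources is None:
        # None means "no restriction": keep every tool.
        return [tool for _, tool in tools]
    wanted = set(data_sources)
    # An empty list yields an empty set, so no tool matches.
    # Ids that match no tool simply contribute nothing (silent drop).
    return [tool for source_id, tool in tools if source_id in wanted]


tools = [("web_search", "web_tool"), ("knowledge_base", "kb_tool")]
all_tools = filter_tools_sketch(tools, None)
no_tools = filter_tools_sketch(tools, [])
some_tools = filter_tools_sketch(tools, ["web_search", "not_a_source"])
```

Because unknown ids are dropped silently at this layer, any strict validation has to happen earlier, in the HTTP handler.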

The only gap is JobSubmitRequest in frontends/aiq_api/src/aiq_api/routes/jobs.py, which has no data_sources field, so the HTTP handler cannot pass one through.

Proposal

Add data_sources: list[str] | None (default None) to JobSubmitRequest and validate it in the handler against the live registry.

Semantics:

Client sends        Behavior
field omitted       use all configured data-source tools (unchanged default)
null                identical to omitted
[]                  job submitted with no data-source tools enabled
["known_id", ...]   filter to those sources; job submitted
any unknown id      422 Unprocessable Entity naming every unknown id; job not submitted
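
One possible shape for the handler-side check is sketched below in plain Python. The exception class, the function name, and the way registry ids are passed in are hypothetical; in the real handler the error would be translated into whatever 422 response the web framework provides.

```python
class UnknownDataSourcesError(ValueError):
    """Hypothetical error the handler would map to 422 Unprocessable Entity."""

    status_code = 422

    def __init__(self, unknown):
        self.unknown = unknown
        super().__init__("Unknown data source id(s): " + ", ".join(unknown))


def validate_data_sources(requested, registry_ids):
    """Return the list to forward to submit_agent_job, or raise on unknown ids."""
    if requested is None:
        # Field omitted or null: keep the unchanged default (all tools).
        return None
    known = set(registry_ids)
    # dict.fromkeys de-duplicates while preserving order, so every
    # unknown id is named exactly once in the error.
    unknown = [s for s in dict.fromkeys(requested) if s not in known]
    if unknown:
        raise UnknownDataSourcesError(unknown)
    # An empty list passes through unchanged and disables all sources.
    return list(requested)


registry = ["web_search", "knowledge_base"]  # placeholder ids
try:
    validate_data_sources(["web_search", "foo", "bar"], registry)
except UnknownDataSourcesError as exc:
    rejected = exc.unknown
```

Rejecting unknown ids before submission avoids the silent-drop behaviour of the filter primitive, so a typo in a source id fails loudly instead of quietly narrowing the job.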
