Context
Currently investigators have three inputs:
- Sources (sources/) — code, papers, data to read and modify
- Skills (skills/) — domain knowledge injected into prompts
- Compute nodes — machines to run experiments on
There's a gap: executable tools that investigators can call during experiments — simulators, pre-trained models, evaluation scripts, external APIs.
Examples
- A physics simulator investigators call to test hypotheses
- A pre-trained model they query for predictions or embeddings
- A custom evaluation harness with domain-specific metrics
- An MCP server connecting to a database or external service
- A data preprocessing pipeline they run before training
Design Options
Option A: Keep it in sources/
Put executables in sources/ and document them in the research proposal. Investigators already have Bash access, so they can call anything (see the sketch below).
Pro: No new abstraction, simple, works today.
Con: No discoverability — investigators don't know what's executable vs what's reference material.
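To make Option A concrete, here is a minimal sketch of the status quo. The path and flags are hypothetical; the only real interface is whatever the research proposal happens to document.

```python
import subprocess

# Hypothetical tool buried in sources/: nothing distinguishes it from
# reference material except a sentence in the research proposal.
result = subprocess.run(
    ["python", "sources/physics_sim/run_sim.py", "--steps", "1000"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```

The discoverability con falls straight out of this: unless the proposal names run_sim.py, an investigator has no way to tell it apart from code that is merely there to read.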
Option B: Dedicated tools/ directory
A new tools/ directory in each session. Each tool has a manifest (name, description, how to call it, expected inputs/outputs). The orchestrator registers them and injects the tool descriptions into investigator prompts (sketched below).
Pro: Formal, discoverable, investigators know exactly what tools are available.
Con: More complexity, another directory to manage.
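Here is a minimal sketch of what a manifest and the registration pass could look like under Option B. The schema and field names (command, inputs, outputs) are illustrative, not a proposed format.

```python
import json
from pathlib import Path

# Illustrative manifest, e.g. tools/physics_sim/manifest.json:
#   {
#     "name": "physics_sim",
#     "description": "2D rigid-body simulator for quick hypothesis tests",
#     "command": "python tools/physics_sim/run.py --scene {scene}",
#     "inputs": {"scene": "path to a JSON scene description"},
#     "outputs": "JSON trajectory on stdout"
#   }

def render_tool_section(session_dir: Path) -> str:
    """Scan tools/*/manifest.json and build the prompt section that the
    orchestrator injects into investigator prompts."""
    lines = ["Available tools:"]
    for path in sorted(session_dir.glob("tools/*/manifest.json")):
        m = json.loads(path.read_text())
        lines.append(f"- {m['name']}: {m['description']}")
        lines.append(f"  invoke: {m['command']}")
        lines.append(f"  inputs: {m['inputs']}; outputs: {m['outputs']}")
    return "\n".join(lines)
```

Registration could stay this thin: no execution wrapper, just discoverability. Investigators would still invoke the command over Bash.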
Option C: MCP server integration
Tools are MCP servers that the Agent SDK connects to natively. Users drop in an MCP config file, and the orchestrator registers the servers (example below).
Pro: Native SDK integration, structured input/output, full tool-use protocol.
Con: Requires users to build MCP servers — higher barrier.
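For Option C, the user-facing surface could be a single config file. The sketch below follows the common MCP client convention of mapping a server name to a command plus args; the file name and the exact schema the Agent SDK expects are assumptions here.

```python
import json
from pathlib import Path

# Hypothetical session file mcp_config.json; the mcpServers shape mirrors
# common MCP client configs, but the exact schema is SDK-dependent:
#   {
#     "mcpServers": {
#       "db": {
#         "command": "npx",
#         "args": ["-y", "@modelcontextprotocol/server-postgres", "$DB_URL"]
#       }
#     }
#   }

config = json.loads(Path("mcp_config.json").read_text())
for name, server in config.get("mcpServers", {}).items():
    print(f"registering MCP server {name!r}: {server['command']}")
    # The orchestrator would hand this mapping to the Agent SDK at session
    # start so investigator tool calls go through the MCP protocol natively.
```

This keeps the orchestrator out of the execution path entirely, at the cost noted above: someone has to build or package each tool as an MCP server.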
Questions
- Is Option A (sources + documentation) sufficient for most use cases?
- Should tools be formally registered or just documented?
- Would MCP integration be valuable, or is Bash execution enough?
- What tools would YOU want to give your investigators?
Feedback welcome.