Context
Adapting Karpathy's Autoresearch pattern to autonomously optimize workflow node execution.
Why
The workflow engine executes DAGs of AI nodes: LLM calls, tool executions, and branching logic. As workflows grow more complex (10+ nodes, nested sub-workflows), execution time and memory use compound. Optimizing scheduling, parallelization, and node implementations directly affects user experience and compute costs.
What
Set up the autoresearch loop:
| File | Role | Who edits |
|---|---|---|
| benchmark.py | Runs test workflow suite, measures execution time + memory + correctness | Nobody (read-only) |
| engine.py | Scheduler, node implementations, parallelization logic | Agent only |
| program.md | Optimization targets, constraints, test workflows | Human only |
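The division of roles in the table suggests a keep-or-revert loop: the agent proposes a new engine.py, the read-only benchmark scores it, and the change is kept only if it scores better. The sketch below is one possible shape for that loop, not a definitive implementation. The names `Result`, `improves`, `propose`, and `autoresearch_loop` are hypothetical, and the acceptance rule (reject any incorrect candidate, then compare time before memory) is an assumption rather than something this issue specifies.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Result:
    """Hypothetical benchmark summary for one version of engine.py."""
    seconds: float
    peak_mb: float
    correct: bool


def improves(candidate: Result, best: Result) -> bool:
    # Assumption: an incorrect candidate is never accepted, however fast.
    if not candidate.correct:
        return False
    # Compare execution time first, then peak memory as a tie-breaker.
    return (candidate.seconds, candidate.peak_mb) < (best.seconds, best.peak_mb)


def autoresearch_loop(
    propose: Callable[[str, Result], str],   # agent: current source -> new source
    benchmark: Callable[[str], Result],      # read-only scorer
    baseline_source: str,
    iterations: int,
) -> Tuple[str, Result]:
    best_source = baseline_source
    best = benchmark(best_source)
    for _ in range(iterations):
        candidate_source = propose(best_source, best)
        candidate = benchmark(candidate_source)
        if improves(candidate, best):
            # Keep the change; otherwise the candidate is simply discarded.
            best_source, best = candidate_source, candidate
    return best_source, best
```

In a real setup `propose` would wrap the agent editing engine.py and `benchmark` would invoke benchmark.py; both are plain callables here so the loop can be exercised in isolation.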
Search space
- Scheduling strategies: Topological sort variants, priority queues, speculative execution
- Parallelization: asyncio concurrency limits, node-level vs branch-level parallelism
- Memory management: Streaming intermediate results vs buffering, context window packing
- Node execution: Batch compatible nodes, reuse connections, cache deterministic outputs
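As a concrete starting point for two of the items above (topological scheduling and an asyncio concurrency limit), here is a minimal sketch of a DAG executor. It is not the existing engine.py. The function name `run_dag`, the `deps` mapping format (node name to the set of its predecessors, with every node present as a key), and the default limit of 4 are all assumptions made for illustration.

```python
import asyncio
from graphlib import TopologicalSorter
from typing import Awaitable, Callable, Dict, Set

# Hypothetical node signature: receives its predecessors' outputs by name.
NodeFn = Callable[[Dict[str, object]], Awaitable[object]]


async def run_dag(
    nodes: Dict[str, NodeFn],
    deps: Dict[str, Set[str]],
    max_concurrency: int = 4,
) -> Dict[str, object]:
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    limit = asyncio.Semaphore(max_concurrency)
    results: Dict[str, object] = {}

    async def run_node(name: str) -> str:
        async with limit:  # cap the number of nodes executing at once
            inputs = {d: results[d] for d in deps.get(name, ())}
            results[name] = await nodes[name](inputs)
        return name

    pending: Set[asyncio.Task] = set()
    while sorter.is_active():
        # Start every node whose predecessors have all finished.
        for name in sorter.get_ready():
            pending.add(asyncio.create_task(run_node(name)))
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            sorter.done(task.result())  # unlocks the node's successors
    return results
```

Because nodes start as soon as their own predecessors finish, independent branches overlap without a level-by-level barrier. `max_concurrency` is the kind of knob the agent could tune.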
Evaluation metric
Primary: Total workflow execution time (seconds)
Secondary: Peak memory (MB)
Tertiary: Correctness (all node outputs match expected)
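The three metrics can be collected in a single pass over a workflow run. The sketch below assumes a hypothetical `measure` helper and a workflow exposed as a zero-argument callable returning a dict of node outputs. Neither comes from this issue. It uses `tracemalloc`, which tracks only allocations made through Python's allocator and adds overhead of its own, so the memory figure is an approximation of process-level peak memory.

```python
import time
import tracemalloc
from typing import Callable, Dict


def measure(
    run_workflow: Callable[[], Dict[str, object]],
    expected: Dict[str, object],
) -> Dict[str, object]:
    tracemalloc.start()
    start = time.perf_counter()
    try:
        outputs = run_workflow()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return {
        "seconds": elapsed,                        # primary
        "peak_mb": peak_bytes / (1024 * 1024),     # secondary
        "correct": outputs == expected,            # tertiary
    }
```

Exact equality on outputs only makes sense for deterministic nodes; LLM-backed nodes would likely need recorded or mocked responses for the correctness check to be stable.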
Prerequisites
- benchmark.py with timing + memory + correctness checks
- program.md
Generated by Claude