Add dynamic workflow tool#3426
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
⚠️ QA Report: PASS WITH ISSUES
The dynamic workflow tool worked end-to-end in manual SDK usage, but PR CI currently has a failed check-examples check.
Does this PR achieve its stated goal?
Yes, functionally. On the base branch, WorkflowToolSet is absent; on the PR branch, I could import/create the tool, execute workflow scripts, block an unsafe open() script, run a single real subagent via wf.run_agent, and run the new example that fans out subagents then reduces their answers into a final result with EXAMPLE_COST.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully |
| CI Status | check-examples failed; several checks were still in progress when checked |
| Functional Verification | ✅ Direct tool usage, guardrail behavior, run_agent, and example map_agents/reduce_agent flow worked |
Functional Verification
Test 1: Baseline absence vs PR availability
Step 1 — Establish baseline without the feature:
Ran git checkout --quiet origin/main; uv run python -c "import importlib.util; print('workflow_module_found=', importlib.util.find_spec('openhands.tools.workflow') is not None)"; uv run python -c "from openhands.tools import WorkflowToolSet; print(WorkflowToolSet)" || echo 'WorkflowToolSet import failed on base branch as expected':
BASE_COMMIT=5e614afe
workflow_module_found= False
ImportError: cannot import name 'WorkflowToolSet' from 'openhands.tools'
WorkflowToolSet import failed on base branch as expected
This confirms the workflow tool is new user-facing functionality, not pre-existing behavior.
Step 2 — Apply the PR's changes:
Checked out redo-dynamic-workflow-mvp at d91a5e8.
Step 3 — Re-run with the feature in place:
Used WorkflowToolSet.create(...).as_executable() from a real Conversation, then executed a flatten workflow and an unsafe workflow:
tool_available= WorkflowToolSet
flatten_status= completed is_error= False text= ['alpha', 'beta', 'gamma', 'delta']
unsafe_status= error is_error= True text= Workflow scripts may not call `open`
This shows the new tool is importable/creatable, executes a valid workflow script, and enforces the advertised best-effort unsafe-call guardrail.
Test 2: Single subagent workflow primitive
Step 1 — Establish baseline:
The base branch import check above shows there was no WorkflowToolSet/wf.run_agent path to exercise.
Step 2 — Apply the PR's changes:
On redo-dynamic-workflow-mvp, registered a small QA subagent and invoked the workflow tool with a script calling await wf.run_agent(...).
Step 3 — Run with the feature in place:
Ran the one-off SDK script through uv run python:
run_agent_status= completed is_error= False
run_agent_text= PASS-WORKFLOW-RUN-AGENT
cost= 0.019145000000000002
This confirms the single-subagent primitive works through the actual TaskManager/subagent path with a real LLM-backed subagent.
Test 3: Runnable standalone example with fan-out and reduce
Step 1 — Establish baseline:
The example file does not exist on the base branch as part of the new feature; the base import check also shows the workflow tool itself is absent.
Step 2 — Apply the PR's changes:
On redo-dynamic-workflow-mvp, ran the new standalone example exactly as a user would.
Step 3 — Run with the feature in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python examples/01_standalone_sdk/52_dynamic_workflow.py:
Task 'task_00000002' completed.
Task 'task_00000001' completed.
Task 'task_00000003' completed.
Task 'task_00000004' completed.
Dynamic workflow result:
An octopus runs on three hearts, with the main one pausing whenever it swims; a honeybee can recognize human faces through clever pattern-based vision despite its tiny brain; and snow leopards skip roaring in favor of chuffs, hisses, growls, and a loud, non-aggressive “prusten.”
EXAMPLE_COST: 0.053985000000000005
This confirms the advertised fan-out (map_agents) and reducer (reduce_agent) workflow runs end-to-end with real subagents and prints the expected example cost line.
Issues Found
- 🟠 Issue: GitHub CI currently reports
[Optional] Docs example / check-examplesas failed, with additional checks still in progress. Functional QA did not reproduce a workflow-tool failure—the new example ran successfully locally—but the PR should not be considered fully healthy until CI is green.
This review was created by an AI agent (OpenHands) on behalf of the user.
Coverage Report •
|
|||||||||||||||||||||||||||||||||||||||||||||
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The dynamic workflow tool works end-to-end: it imports only on the PR branch, executes generated workflow scripts, rejects unsafe scripts, runs real map/reduce sub-agent orchestration, and the new standalone example completes.
Does this PR achieve its stated goal?
Yes. The PR set out to add a WorkflowToolSet/workflow actions, provide WorkflowContext helpers (run_agent, map_agents, reduce_agent, flatten), and include a runnable SDK example. I exercised those as a user: direct workflow tool execution returned a completed observation, unsafe script execution returned an error observation, real LLM-backed sub-agents produced a reducer result through map_agents/reduce_agent, and examples/01_standalone_sdk/52_dynamic_workflow.py ran successfully and printed EXAMPLE_COST.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv environment. |
| CI Status | ✅ All checked non-QA jobs are passing; this qa-changes job was still in progress when checked. |
| Functional Verification | ✅ Verified import baseline, direct tool execution, safety rejection, real sub-agent map/reduce, and the standalone example. |
Functional Verification
Test 1: Baseline import vs PR import/execution
Step 1 — Establish baseline without the feature:
Ran git switch --detach origin/main && uv run python -c 'import importlib; importlib.import_module("openhands.tools.workflow")':
ModuleNotFoundError: No module named 'openhands.tools.workflow'
This shows the workflow package is new user-visible functionality added by this PR.
Step 2 — Apply the PR's changes:
Switched back to redo-dynamic-workflow-mvp at 181521b3a694e847453c1eaa3e2820f792811df8.
Step 3 — Re-run with the PR behavior in place:
Ran a user-style SDK script that creates an Agent with WorkflowToolSet, creates a Conversation, and calls WorkflowExecutor with a generated async def main(wf) script using wf.flatten:
status= completed
is_error= False
text= {'areas': ['sdk', 'tools', 'workspace', 'agent-server'], 'count': 4}
This confirms the new tool package imports and the workflow tool executes generated Python through the conversation context.
Test 2: Unsafe generated workflow code is rejected
Ran the workflow tool with a generated script that attempted open('/tmp/qa-workflow-should-not-open'):
status= error
is_error= True
text= Workflow scripts may not call `open`
This confirms the tool returns an error observation instead of executing a direct file read from generated workflow code.
Test 3: Real map_agents + reduce_agent orchestration
Ran a workflow action that registered a real qa_echo sub-agent backed by the configured LLM, fanned out two sub-agent tasks with wf.map_agents(max_concurrency=2), then reduced their outputs with wf.reduce_agent:
Task 'task_00000002' completed.
Task 'task_00000001' completed.
Task 'task_00000003' completed.
status= completed
is_error= False
text= QA-RED-BLUE
This confirms the workflow context can fan out and reduce real sub-agent conversations, not just run local helper logic.
Test 4: Standalone example runs as documented
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python examples/01_standalone_sdk/52_dynamic_workflow.py:
Message from Coverage Workflow Agent to User
## Repo-wide test coverage audit
...
Recommended next step
Start with a focused test push covering the P0 security and data-integrity gaps...
Tokens: ↑ input 177.15K • cache hit 59.25% • reasoning 3.9K • ↓ output 18.44K • $ 0.9667
EXAMPLE_COST: 0.9666999999999999
This confirms the parent agent can generate and run a dynamic workflow using the new tool, producing the intended repo-wide coverage audit and example cost line.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
|
CI is green on 181521b, the example now has the parent agent generate the workflow code, and related docs are in OpenHands/docs#535. Ready for another automated review.\n\n_This comment was created by an AI agent (OpenHands) on behalf of the user._ |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Good overall shape for an MVP — the three-layer design (WorkflowAction/WorkflowObservation/WorkflowToolSet in definition.py, runtime orchestration in impl.py, and the honest security disclaimer in the tool description) is clean and readable. The AST-level validation paired with _safe_globals is the right defence-in-depth approach for in-process script execution.
A few issues worth addressing before landing:
asyncio.gatherinmap_agentsis missingreturn_exceptions=True, so one failing sub-agent silently aborts all parallel work and loses every partial result — the most impactful bug here.- One unreachable
assertinsiderun_oneshould be removed or converted to a proper guard. - A private method (
_ensure_parent) is called across module boundaries — fragile contract worth flagging. - Minor placement nit on the bottom-of-file
TYPE_CHECKINGimport indefinition.py.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the dynamic workflow tool end-to-end as an SDK user; it imports on the PR branch, orchestrates real sub-agents via map/reduce, rejects unsafe scripts, and the new standalone example completed with a coverage report.
Does this PR achieve its stated goal?
Yes. The PR set out to add a WorkflowToolSet/WorkflowContext for Python dynamic workflow scripts plus a runnable standalone example. On the PR commit, I ran a real WorkflowExecutor with an SDK Conversation and a registered sub-agent; wf.map_agents, wf.flatten, and wf.reduce_agent completed and returned the expected workflow tokens. I also ran the new example path and its output ended with a repo-wide coverage report plus EXAMPLE_COST: 0.9111179999999999.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully and installed the uv workspace environment. |
| CI Status | 🟡 At check time: 33 checks passing, 3 skipped, 1 pending (qa-changes); no CI jobs were rerun. |
| Functional Verification | ✅ New import surface, real workflow execution, unsafe-script handling, and bundled example were exercised. |
Functional Verification
Test 1: New workflow tool import surface
Step 1 — Reproduce / establish baseline (without the fix):
Ran git checkout origin/main && uv run python -c "from openhands.tools import WorkflowToolSet; print(WorkflowToolSet)":
BRANCH=HEAD COMMIT=e65a8abd
ImportError: cannot import name 'WorkflowToolSet' from 'openhands.tools' (.../openhands-tools/openhands/tools/__init__.py)
EXIT_CODE=1
This shows the advertised workflow tool was not available from the public tools package on the base branch.
Step 2 — Apply the PR's changes:
Checked out redo-dynamic-workflow-mvp at 181521b3.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python -c "from openhands.tools import WorkflowToolSet; from openhands.tools.workflow import WorkflowAction, WorkflowExecutor; print(WorkflowToolSet.name, WorkflowAction.__name__, WorkflowExecutor.__name__)":
BRANCH=redo-dynamic-workflow-mvp COMMIT=181521b3
workflow_tool_set WorkflowAction WorkflowExecutor
EXIT_CODE=0
This confirms the new tool set and workflow action/executor are importable as SDK users would expect.
Test 2: Real dynamic workflow with sub-agent map/reduce
Step 1 — Reproduce / establish baseline (without the fix):
The base-branch import failure above establishes that this SDK user path did not exist before the PR.
Step 2 — Apply the PR's changes:
On 181521b3, I registered a minimal qa_workflow_worker sub-agent, created a parent SDK Conversation with WorkflowToolSet, and invoked WorkflowExecutor with generated-style workflow code using wf.map_agents, wf.flatten, and wf.reduce_agent.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_workflow_real.py:
Task 'task_00000002' completed.
Task 'task_00000001' completed.
Task 'task_00000003' completed.
STATUS completed
IS_ERROR False
TEXT_START
WORKFLOW_QA_alpha WORKFLOW_QA_beta WORKFLOW_QA_tail
TEXT_END
This confirms the workflow tool can orchestrate multiple real SDK sub-agent tasks, preserve map output, flatten intermediate values, run a reducer, and return a completed observation.
Test 3: Shipped standalone example
Step 1 — Reproduce / establish baseline (without the fix):
Ran git checkout origin/main && test -f examples/01_standalone_sdk/52_dynamic_workflow.py:
BRANCH=HEAD COMMIT=e65a8abd
example_missing
This shows the advertised standalone example did not exist on the base branch.
Step 2 — Apply the PR's changes:
Checked out redo-dynamic-workflow-mvp at 181521b3.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python examples/01_standalone_sdk/52_dynamic_workflow.py > /tmp/qa_dynamic_workflow_example.out 2>&1 and inspected the final output:
## Concrete next tests to add first
1. Agent Server secrets exposure tests
2. Agent Server persistence store tests
3. SDK LLM serialization/tool-call tests
4. Workspace lifecycle/error tests
5. Tools task tracker and timeout tests
6. SDK plugin/condenser orchestration tests
7. Websocket and patch/parser tests
EXAMPLE_COST: 0.9111179999999999
This confirms the new example ran through the parent-agent-generated workflow and produced the intended repo-wide coverage audit report.
Test 4: Unsafe generated script handling
Step 1 — Reproduce / establish baseline (without the fix):
The base branch had no workflow tool to run this error path.
Step 2 — Apply the PR's changes:
On 181521b3, I invoked the workflow executor with a script whose main(wf) attempted open('/etc/passwd').
Step 3 — Re-run with the fix in place:
Ran a real WorkflowExecutor invocation with an SDK Conversation:
STATUS error
IS_ERROR True
TEXT Workflow scripts may not call `open`
This confirms the tool returns an error observation instead of executing a blocked file-read operation directly inside workflow code.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
|
Review feedback addressed in 70d18ca; all checks are green and review threads are resolved. Requesting another automated review.\n\n_This comment was created by an AI agent (OpenHands) on behalf of the user._ |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
All four issues from the previous review are cleanly resolved in this iteration — attach_parent(), the return_exceptions=True gather, the dead assert removal, and the TYPE_CHECKING block placement. The three-layer design (definition.py / impl.py / __init__.py) is easy to navigate, the AST validator + _safe_globals restriction is the right defence-in-depth approach for in-process code execution, and the test coverage is solid.
A few new observations below — one worth fixing before landing, two suggestions for your consideration.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the dynamic workflow feature end-to-end through the SDK, including parent workflow tool execution, sub-agent map/reduce orchestration, failure aggregation, and the new standalone example.
Does this PR achieve its stated goal?
Yes. On origin/main, the workflow tool was not importable; on the PR branch, it imported as workflow_tool_set, a parent SDK Conversation requested the workflow tool, the workflow spawned sub-agent map jobs plus a reducer through the task path, and the workflow observation reached the parent. I also verified map_agents aggregates multiple sub-agent failures instead of stopping after the first failure, and the new example script ran to completion with an EXAMPLE_COST line.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully |
| CI Status | ✅ 33 checks passing; 1 QA Changes check pending for this run; 3 PR artifact jobs skipped |
| Functional Verification | ✅ Dynamic workflow import, conversation execution, failure aggregation, and example run all succeeded |
Functional Verification
Test 1: Feature availability before and after the PR
Step 1 — Establish baseline without the PR:
Ran git checkout --detach origin/main && uv run python /tmp/qa_dynamic_workflow.py import:
HEAD is now at e65a8abd Register claude-opus-4-8 in SDK model lists (#3424)
Traceback (most recent call last):
File "/tmp/qa_dynamic_workflow.py", line 16, in <module>
from openhands.tools.workflow import WorkflowToolSet
ImportError: cannot import name 'WorkflowToolSet' from 'openhands.tools.workflow' (unknown location)
This confirms the dynamic workflow tool is new behavior introduced by this PR.
Step 2 — Apply the PR changes:
Checked out redo-dynamic-workflow-mvp at 70d18ca12ef08f52c8de2457acf337f27eb9a59c.
Step 3 — Re-run with the PR:
Ran uv run python /tmp/qa_dynamic_workflow.py import:
workflow tool set import ok: workflow_tool_set
This confirms the new workflow tool set is available from the package entry point.
Test 2: Parent agent executes a dynamic workflow through the SDK tool path
Step 1 — Baseline:
The base branch import failure above shows there was no workflow entry point to execute through a parent agent before the PR.
Step 2 — Apply the PR changes:
Used the PR branch with a local OpenAI-compatible fake LLM server so the SDK could run a real Conversation deterministically without external model dependencies.
Step 3 — Run the workflow as a user-facing SDK conversation:
Ran uv run python /tmp/qa_dynamic_workflow.py conversation:
fake llm requests: 6
parent requested workflow tool: True
subagent/reducer prompts observed: [{"type": "text", "text": "audit sdk"}] | [{"type": "text", "text": "audit tools"}] | [{"type": "text", "text": "audit workspace"}] | [{"type": "text", "text": "combine findings
Input:
{
"findings": [...]
workflow observation reached parent: True
workflow observation snippet: [{'type': 'text', 'text': 'SUBAGENT_RESULT for [{"type": "text", "text": "combine findings\n\nInput:...'}]
This shows the parent agent requested the workflow tool, the generated script fanned out three sub-agent prompts, ran a reducer prompt with serialized intermediate results, and returned the workflow observation back to the parent conversation.
Test 3: map_agents aggregates sibling failures
Step 1 — Baseline:
The base branch did not have the workflow API, so there was no map_agents failure behavior to exercise.
Step 2 — Apply the PR changes:
Ran the PR implementation with a recording task manager that returns errors for two items and success for one item.
Step 3 — Execute the failing workflow:
Ran uv run python /tmp/qa_dynamic_workflow.py failure-aggregation:
started prompts: audit alpha, audit beta, audit gamma
aggregated errors: simulated failure for audit beta | simulated failure for audit gamma
This confirms all sibling tasks were attempted and both failures were surfaced together, matching the PR goal that one failing item should not cancel sibling collection.
Test 4: New standalone example runs to completion
Step 1 — Baseline:
examples/01_standalone_sdk/52_dynamic_workflow.py does not exist on origin/main, so this is a new runnable example from the PR.
Step 2 — Apply the PR changes:
Ran the example on the PR branch with LLM_MODEL=openai/gpt-4o-mini, LLM_API_KEY=fake-key, and LLM_BASE_URL pointing to a local OpenAI-compatible fake server.
Step 3 — Run the actual example script:
Ran python /tmp/qa_run_dynamic_workflow_example.py:
example exit code: 0
fake llm requests: 5
Tool: workflow
Result:
Coverage note for [{"type": "text", "text": "write final coverage report
Input:..."}]
Message from Coverage Workflow Agent to User
Example completed after workflow observation.
EXAMPLE_COST: 3.75e-06
This confirms the added example can execute the workflow tool path and emit the expected example cost marker.
Issues Found
None.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
The three-layer design (WorkflowAction/WorkflowObservation → WorkflowContext → WorkflowExecutor) is clean, the _TaskStarter Protocol enables genuine dependency injection without mocking, and the ExceptionGroup aggregation + attach_parent() fixes from earlier iterations are solid. The standalone example demonstrates the intended usage clearly. A few new findings below — the wf private-attribute exposure is the most substantive; the others are usability and robustness notes.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
|
@OpenHands /iterate |
|
I'm on it! neubig can track my progress at all-hands.dev |
|
CI is all green (33/33 checks passed). All review findings from the previous review have been addressed:
Ready for another automated review. This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
This is a well-structured, thoughtfully documented MVP for dynamic workflow orchestration. The code is clean, the tests are comprehensive, and the security model is honest about its limitations. A few issues worth addressing before merge — the most actionable ones are the wf-alias sandbox bypass and the silent no-op in attach_parent.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
- add docstring note to validate_workflow_script acknowledging the wf-alias bypass limitation (documentation gap, not security gap) - update type() comment to accurately note methods injected via wf are not re-sandboxed, though they only expose public wf API - add warning log in attach_parent when called with a different conversation (surfaces potential programming errors) - thread 2 (run_one re-wrap): kept as-is; the [item N] prefix is valuable context for debugging ExceptionGroup failures Co-authored-by: openhands <openhands@all-hands.dev>
|
Round 24 in 8a484b7: Fixed:
Kept as-is:
All 5 threads resolved. This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
This PR has gone through extensive iterative refinement and is in very good shape. The three-layer design (WorkflowAction/WorkflowObservation → WorkflowContext → WorkflowExecutor), the _TaskStarter Protocol for real dependency injection (no mocks needed in most tests), the ExceptionGroup failure aggregation with 1-based item indexing, asyncio.timeout for hung-workflow protection, idempotent close(), the str.replace prompt-injection fix, and the per-call max_concurrency cap are all cleanly implemented and tested.
Status of open threads:
-
Thread 3322692914 (add test for per-call
max_concurrencycap):test_map_agents_respects_context_concurrency_capat line 321 of the test file addresses this — it verifies that a per-callmax_concurrency=1000is silently capped at the context's limit of 3 and confirms peak concurrency never exceeds 3. ✅ -
Thread 3322692922 (document the two
register_toolcalls):WorkflowTool's docstring already says "PreferWorkflowToolSetfor standard SDK auto-create usage. UseWorkflowToolwhen you need to inject a custom executor (e.g., in tests or extensions)" which documents the intent. A short comment above the tworegister_toollines would close the suggestion explicitly (see inline comment below).
Note on the _call_name false-positive (review 4386764961): The latest bot review flagged this as still unresolved, but the fix (isinstance(node.func, ast.Name) guard) has actually been in place since commit cd0aeb1. Attribute method calls like result.open() are already correctly allowed. No action needed.
[RISK ASSESSMENT]
⚠️ Risk: 🟡 MEDIUM — executes LLM-generated Python in-process with best-effort AST validation. This is explicitly and clearly documented.- No breaking changes; all new public symbols are additive.
VERDICT: ✅ Ready to merge. Two minor items below — one is a missing test for a documented behaviour guarantee, the other is a one-line comment.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the dynamic workflow feature end-to-end: the new module is absent on main, imports on the PR, executes workflow scripts, delegates to real registered sub-agents through TaskManager, aggregates map failures, and the new standalone example completed successfully.
Does this PR achieve its stated goal?
Yes. The PR set out to add a WorkflowToolSet/workflow action-observation path, provide WorkflowContext helpers (run_agent, map_agents, reduce_agent, flatten), aggregate map_agents failures, and ship a runnable SDK example. I exercised those surfaces as a user would: direct SDK imports/calls, real LLM-backed sub-agent delegation through the workflow executor, and examples/01_standalone_sdk/52_dynamic_workflow.py, which generated and ran a coverage-audit workflow and printed EXAMPLE_COST: 1.118247.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv-managed dev environment. |
| CI Status | 🟡 Many checks passing; tools-tests, qa-changes, and several build/push jobs were still in progress when checked; no failures observed. |
| Functional Verification | ✅ Core workflow tool behavior and the standalone example both ran successfully. |
Functional Verification
Test 1: Baseline vs PR import behavior
Step 1 — Establish baseline without the feature:
Checked out origin/main and ran uv run python /tmp/qa_dynamic_workflow_import.py:
IMPORT_ERROR:ModuleNotFoundError:No module named 'openhands.tools.workflow'
This confirms the base branch does not expose the dynamic workflow module.
Step 2 — Apply the PR's changes:
Switched back to redo-dynamic-workflow-mvp at 8a484b75764fbc47af7577b36c68cd77e8c92498.
Step 3 — Re-run with the PR:
Ran uv run python /tmp/qa_dynamic_workflow_import.py:
IMPORT_OK:WorkflowToolSet:WorkflowAction:WorkflowExecutor
This confirms the new public import surface is available on the PR branch.
Test 2: Workflow executor and context helpers
Step 1 — Baseline:
On main, the import above fails, so this workflow tool entry point cannot be exercised there.
Step 2 — Apply the PR's changes:
Used the PR branch and called WorkflowExecutor with a generated script containing async def main(wf).
Step 3 — Run with the PR:
Ran uv run python /tmp/qa_dynamic_workflow_basic.py:
STATUS:completed
IS_ERROR:False
TEXT:alpha|beta|gamma|delta
Then ran uv run python /tmp/qa_dynamic_workflow_manager.py:
SUCCESS_RESULT:done:reduce
Input:
{
"one": "done:single",
"mapped": [
"done:map a",
"done:map b"
]
}
SUCCESS_PROMPTS:['worker:single', 'mapper:map a', 'mapper:map b', 'reducer:reduce
Input:
{
"one": "done:single",
"mapped": [
"done:map a",
"done:map b"
]
}']
CLOSED:True
FAILURE_TYPE:ExceptionGroup
FAILURE_TEXT:map_agents: one or more sub-agents failed (2 sub-exceptions)
FAILURE_ITEMS:[item 2] boom:item FAIL-1|[item 3] boom:item FAIL-2
This confirms flatten, run_agent, map_agents, reduce_agent, manager cleanup, and aggregated map failures all behave as described.
Test 3: Real registered sub-agents via TaskManager
Step 1 — Baseline:
The workflow module is absent on main, so real workflow delegation is unavailable there.
Step 2 — Apply the PR's changes:
Registered a minimal qa_echo sub-agent and invoked WorkflowExecutor without injecting a custom manager, so the workflow used the real TaskManager path.
Step 3 — Run with the PR:
Ran uv run python /tmp/qa_dynamic_workflow_real_subagents.py:
Task 'task_00000001' completed.
Task 'task_00000002' completed.
Task 'task_00000003' completed.
STATUS:completed
IS_ERROR:False
TEXT:RED, BLUE
This demonstrates the workflow can fan out to registered sub-agents and reduce their outputs through real SDK conversation/task machinery.
Test 4: Validation error path and standalone example
Validation path:
Ran uv run python /tmp/qa_dynamic_workflow_error.py:
STATUS:error
IS_ERROR:True
TEXT:Workflow scripts may not access private wf attributes or call wf.close()
This confirms unsafe/unsupported workflow calls surface as an error observation rather than silently running.
Standalone example:
Ran uv run python examples/01_standalone_sdk/52_dynamic_workflow.py:
Summary: Return synthesized repository coverage audit report
...
Finish with message:
## Repo-wide test coverage report
...
Tokens: ↑ input 185.62K • cache hit 58.48% • reasoning 4.85K • ↓ output 22.62K • $ 1.1182
EXAMPLE_COST: 1.118247
This confirms the new example runs end-to-end: the parent agent generated workflow code, the workflow executed sub-agent coverage audits, the reducer returned a repo-wide report, and the required EXAMPLE_COST line was printed.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
Co-authored-by: openhands <openhands@all-hands.dev>
|
CI is green on Low-risk note: the dynamic workflow tool is additive and is not added to any defaults or default execution paths; callers only get it if they explicitly opt into the tool. Ready for another automated review. This PR comment was created by an AI agent (OpenHands) on behalf of the user. |
|
🔍 Review in progress… We are performing the review through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 QA Report: PARTIAL
Core dynamic workflow behavior works in local execution, but a successful live LLM-backed sub-agent run could not be verified because the available LLM configuration failed before model completion.
Does this PR achieve its stated goal?
Partially verified. The PR adds the workflow tool surface and I was able to run dynamic workflow scripts that exercised run_agent, map_agents, reduce_agent, and flatten, including aggregation of multiple map_agents failures and creation of the tool inside a real SDK Conversation. I could not verify the fully live TaskManager + LLM sub-agent success path or the standalone example end-to-end because the available environment either lacked the LiteLLM proxy base URL or had an invalid direct OpenAI credential.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; no tests/linters were run. |
| CI Status | ⏳ 34 successful checks, 0 failed, 5 skipped, 1 in progress (qa-changes) when checked. |
| Functional Verification | 🟡 Core workflow execution verified; live LLM-backed sub-agent completion blocked by environment credentials/config. |
Functional Verification
Test 1: Baseline vs PR tool availability
Step 1 — Establish baseline without the PR:
Ran this on a detached origin/main worktree:
cd /tmp/qa-dynamic-workflow-base && uv run python - <<'PY'
try:
from openhands.tools.workflow import WorkflowToolSet
except Exception as exc:
print(type(exc).__name__ + ': ' + str(exc))
else:
print('WorkflowToolSet import unexpectedly succeeded:', WorkflowToolSet)
PYOutput:
ModuleNotFoundError: No module named 'openhands.tools.workflow'
This shows the workflow tool is new behavior introduced by the PR.
Step 2 — Apply the PR's changes:
Used the checked-out PR head fa05bbc3efb03c1956a520cf85e94222064c814b.
Step 3 — Re-run with the PR in place:
Ran a workflow script through the PR's API that imports WorkflowToolSet, executes a generated async def main(wf) script, fans out map_agents, calls reduce_agent, and closes the manager:
imported WorkflowToolSet: workflow_tool_set
workflow result: {'flattened': ['qa-worker handled [single item]', 'qa-worker handled [map alpha]', 'qa-worker handled [map {"name": "beta"}]'], 'reduced': 'qa-reducer handled [summarize\n\nInput:\n{...}]'}
task calls: [('qa-worker', 'single item', 'single run'), ..., ('qa-reducer', 'summarize\n\nInput:\n{...}', 'final reduce')]
manager closed: True
This confirms the new workflow module is importable and generated workflow code can coordinate run_agent, map_agents, reduce_agent, and flatten with real execution semantics.
Test 2: map_agents failure aggregation
Ran a generated workflow where two mapped items failed while one succeeded:
aggregated failure count: 2
aggregated failure messages: ['[item 2] subagent refused prompt: item fail-one', '[item 3] subagent refused prompt: item fail-two']
This confirms a failing item does not hide sibling failures; both mapped failures were collected and surfaced together.
Test 3: Tool creation inside a real SDK conversation
Ran a real Agent/Conversation construction with Tool(name=WorkflowToolSet.name) and executed a simple workflow through WorkflowExecutor:
tool definitions created: [('workflow', 'WorkflowAction')]
observation: completed {'flattened': ['a', 'b', 'c']} False
This confirms the new tool set can be created from a conversation state and returns a successful WorkflowObservation for a valid dynamic script.
Unable to Verify
Live LLM-backed sub-agent success path / standalone example
I attempted a minimal real TaskManager-backed sub-agent workflow with a registered sub-agent and the environment's default model:
using model: litellm_proxy/openai/gpt-5.5
api key present: True
workflow status: error
workflow text: map_agents: one or more sub-agents failed:
[1] [item 1] Conversation run failed ... Litellm_proxyException - api_base not set for LiteLLM Proxy responses API. Set via api_base parameter or LITELLM_PROXY_API_BASE environment variable
[2] [item 2] Conversation run failed ... Litellm_proxyException - api_base not set for LiteLLM Proxy responses API. Set via api_base parameter or LITELLM_PROXY_API_BASE environment variable
I retried with a direct OpenAI model to rule out the proxy setting:
using model: openai/gpt-4o-mini
api key present: True
workflow status: error
workflow text: Conversation run failed ... litellm.AuthenticationError: OpenAIException - Incorrect API key provided
Because both attempts failed before a model response due to environment configuration/authentication, I could not honestly verify a successful live LLM-backed sub-agent run or the full examples/01_standalone_sdk/52_dynamic_workflow.py behavior. Future QA would be easier if AGENTS.md documented a known-good low-cost LLM_MODEL/LLM_BASE_URL combination for SDK example verification.
Issues Found
None.
This review was generated by an AI agent (OpenHands) on behalf of the user.
VascoSch92
left a comment
There was a problem hiding this comment.
LGTM!
Just some style comments.
A things to flag is that the timeout is just effective for coroutine which awaits.
So this code will not be caught by the timeout:
async def main(wf):
while True:
passor a similar recursive script.
Is that a possible user-case or just a corner one?
|
@OpenHands fix the code review comments |
|
Uh oh! There was an unexpected error starting the job :( |
|
@OpenHands fix the code review comments |
|
Uh oh! There was an unexpected error starting the job :( |
|
@OpenHands fix the code review comments |
|
I'm on it! VascoSch92 can track my progress at all-hands.dev |
|
@neubig I launched my OH to fix the review. Hope is not a problem. |
- Use Final[str] for _WORKFLOW_DESCRIPTION - Add 'from __future__ import annotations' to definition.py for consistency and drop quoted forward references on WorkflowTool/WorkflowToolSet.create - Drop the docstring-duplicating comments above register_tool calls - Simplify validate_workflow_script main-signature check via main_args alias - Collapse the unsafe-call check into a single condition - Use ternary in _attribute_root_name Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed @VascoSch92's style nits in 16f873b:
Re: the timeout caveat — All six review threads are resolved. This comment was created by an AI agent (OpenHands) on behalf of the user. |
SummaryAddressed @VascoSch92's six inline code review comments on PR #3426 in commit Request fulfillment checklist
ConcisenessThe changes are purely the requested style nits — no extraneous modifications. The diff is +24/−24 across exactly the two files called out by the reviewer ( PR: #3426 |
2017a56 to
16f873b
Compare
|
[Automatic Post]: This PR seems to be currently waiting for review. @neubig @VascoSch92, could you please take a look when you have a chance? This comment was created by an AI agent (OpenHands) on behalf of the user. |
Summary
WorkflowToolSetwithWorkflowAction/WorkflowObservationfor Python dynamic workflow scriptsWorkflowContextthat supportsrun_agent,map_agents,reduce_agent, andflattenusing the existingTaskManagermap_agentssub-agent failures so one failing item does not cancel sibling tasks before their results/errors are collectedCloses #3425
Related docs PR: OpenHands/docs#535
Tests
uv run pytest tests/tools/workflow/test_workflow_tool.py -quv run pytest tests/tools/workflow/test_workflow_tool.py tests/tools/task/test_task_manager.py tests/tools/task/test_task_manager_thread_safety.py -quv run pre-commit run --files openhands-tools/openhands/tools/task/manager.py openhands-tools/openhands/tools/workflow/definition.py openhands-tools/openhands/tools/workflow/impl.py openhands-tools/openhands/tools/workflow/__init__.py openhands-tools/openhands/tools/__init__.py tests/tools/workflow/test_workflow_tool.py examples/01_standalone_sdk/52_dynamic_workflow.pyuv run python examples/01_standalone_sdk/52_dynamic_workflow.pyDOCS_PATH=/workspace/project/docs GITHUB_EVENT_NAME=pull_request GITHUB_EVENT_PATH=/tmp/pr_event.json python .github/scripts/check_documented_examples.pyThis PR was created by an AI agent (OpenHands) on behalf of the user.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:16f873b-pythonRun
All tags pushed for this build
About Multi-Architecture Support
16f873b-python) is a multi-arch manifest supporting both amd64 and arm6416f873b-python-amd64) are also available if needed