test(ci): surface Resource phase child failures #1211
Walkthrough: Adds three private helpers to ... Resource Phase Failure Reporting
Code Review
This pull request introduces helper functions _github_actions_escape, _tail_text, and _emit_resource_failure_summary in conftest.py to print detailed summaries and GitHub Actions error annotations for failed Resource child jobs. It also integrates these summaries into the test dispatching workflow and adds comprehensive unit tests in tests/ut/py/test_resource_failure_summary.py. There are no review comments, so I have no feedback to provide.
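The two smaller helpers named in this summary can be illustrated with a minimal sketch. The function names, signatures, and default tail length below are hypothetical and are not taken from the PR's `conftest.py`; the escaping rules are the ones GitHub Actions documents for workflow command message data.

```python
def escape_workflow_data(text: str) -> str:
    """Escape message data for a GitHub Actions workflow command.

    Hypothetical stand-in for a helper like _github_actions_escape.
    '%' is replaced first so the later escapes are not double-encoded.
    """
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def tail_text(text: str, max_lines: int = 20) -> str:
    """Return at most the last max_lines lines of text.

    Hypothetical stand-in for a helper like _tail_text. When lines are
    dropped, a marker line states how many were omitted.
    """
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return "\n".join(lines)
    omitted = len(lines) - max_lines
    marker = f"... ({omitted} earlier line(s) omitted)"
    return "\n".join([marker] + lines[-max_lines:])
```

Escaping matters because an unescaped newline would end the `::error` command early, so only the first line of a multi-line message would reach the annotation.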
a955e76 to 3d16fc7
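Difference in error output before and after the change

Before the change: when a child pytest in the Resource phase failed, the detailed failure information appeared only inside the corresponding folded ...
Process completed with exit code 1.
The final tail does not show which Resource child failed, only that task-submit returned 1. ...
standalone test_device_error_class_reaches_host_log[scope_deadlock] ... [FAILrc=-11]

After the change: a Resource child failure additionally prints an unfolded summary outside the folded group and emits a GitHub ...
*** Resource phase failed: 1 child job(s) ***

If a later L2 phase continues to run, a compact recap is printed once more before the final exit, to avoid the Resource failure ...
*** Resource phase failed recap: 1 child job(s) ***

Verified
python -m pytest tests/ut/py/test_resource_failure_summary.py -q
Result: 2 passed
Coverage:

Also ran:
/data/xxx/simpler/.venv/bin/python -m ruff check conftest.py tests/ut/py/
Result: All checks passed!

Summary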
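I think the output strategy here could be adjusted. The full pytest output of the Resource child is already placed in ... A more valuable improvement would probably be to make the failed folded group itself easier to locate. The outer layer currently tells us which child failed, but the group title is mainly the label, for example ...

Suggestions:

This way the end of the CI log shows which specific test case failed, and reviewers can search by nodeid to go straight to the corresponding folded group; the full traceback stays in the group and is not expanded a second time.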
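One way to read this suggestion is to put the pytest nodeid into the folded group's title, so that searching the log for the nodeid lands on the group. The sketch below assumes that reading; `folded_group` and its parameters are hypothetical names, not code from the PR. It relies only on the documented `::group::` and `::endgroup::` workflow commands.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def folded_group(label, nodeid, stream=sys.stdout):
    """Wrap output in a GitHub Actions folded group titled with the nodeid.

    Hypothetical helper: the title joins the child label and the pytest
    nodeid so the group can be found by searching for the nodeid.
    """
    # A group title must stay on one line, so line breaks become spaces.
    title = f"{label} :: {nodeid}".replace("\r", " ").replace("\n", " ")
    print(f"::group::{title}", file=stream)
    try:
        yield
    finally:
        # Close the group even if the wrapped code raises.
        print("::endgroup::", file=stream)


# Usage: the full child output stays inside the group.
demo = io.StringIO()
with folded_group("standalone", "tests/x.py::test_a[case]", stream=demo):
    print("full pytest output goes here", file=demo)
```

Closing the group in a `finally` clause keeps later log lines from being swallowed into the fold when the child run raises.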
Fixes hw-native-sys#1205 by making Resource-phase child pytest failures visible outside collapsed GitHub Actions groups. The dispatcher now prints failed child labels, return codes, devices, durations, output tails, and a final recap after L2 so the top-level CI tail remains actionable. This does not change the underlying hardware/runtime failure behavior.
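The reporting described in this commit message might look roughly like the following sketch. The `ChildFailure` record, the function name, and the per-line layout are assumptions made for illustration; only the two banner strings are taken from the sample output quoted in this conversation.

```python
import io
import sys
from dataclasses import dataclass


@dataclass
class ChildFailure:
    """Hypothetical record of one failed Resource child job."""
    label: str
    returncode: int
    devices: list
    duration_s: float
    output: str


def emit_failure_summary(failures, stream=sys.stdout, recap=False, tail_lines=5):
    """Print an unfolded summary of failed children (illustrative only)."""
    if not failures:
        return
    kind = "failed recap" if recap else "failed"
    print(f"*** Resource phase {kind}: {len(failures)} child job(s) ***", file=stream)
    for f in failures:
        line = (f"{f.label}: rc={f.returncode} devices={f.devices} "
                f"duration={f.duration_s:.1f}s")
        print(f"  {line}", file=stream)
        if recap:
            continue  # the recap stays compact: no output tail, no annotation
        for out_line in f.output.splitlines()[-tail_lines:]:
            print(f"    | {out_line}", file=stream)
        # Escape '%' first, then line breaks, as workflow commands require.
        msg = line.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")
        print(f"::error title=Resource child failed::{msg}", file=stream)


# Usage: first the detailed summary, later the compact recap.
demo = io.StringIO()
failed = [ChildFailure("standalone", -11, [0], 2.0, "line1\nline2")]
emit_failure_summary(failed, stream=demo)
emit_failure_summary(failed, stream=demo, recap=True)
```

Printing the recap without tails keeps the last lines of the log short, while the first summary carries the detail and the annotation.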
3d16fc7 to 0d63347
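@ChaoZheng109 I have adjusted the change in this direction and pushed a new commit. Main changes:
Verified: ... In addition, I ran a real Resource failure probe on a2a3 onboard, AICore ... The current summary shows, outside the group, the failed child's nodeid, label, rc, devices, duration, and the short hint mentioned above; the full traceback remains in the folded group.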
Summary
Fixes #1205.
When the root pytest dispatcher runs Resource-phase child jobs, a failing child currently leaves the top-level CI tail with only task-submit / Process completed with exit code 1 unless the reviewer expands the right GitHub group and the relevant output was not truncated.

This PR makes Resource failures visible outside collapsed groups by printing:
::error annotation for each failed child

Notes
This addresses the CI observability problem from #1205. It does not attempt to fix the underlying a5 runtime_fatal_codes teardown behavior; #1206 mitigated that separately by taking the a5 onboard case offline.

Verification
conftest.py due to Linux-only fcntl dependencies.