APITest new feature: introduce engineV4, a more efficient GPU task scheduling and parallel execution engine #638
Merged
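📝 Main content
Core improvements over engineV2
1. Architecture upgrade: from a process pool to task queues
engineV2 uses Pebble's ProcessPool with dynamic load balancing; engineV4 introduces an architecture in which each worker process has its own queue (WorkerPool + per-worker queues). The improvements include: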
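As a rough illustration of the per-worker queue idea, here is a minimal in-process sketch. It is not the engineV4 implementation: the class name `PerWorkerQueues`, its methods, and the shortest-queue dispatch policy are all assumptions made for this example, and real workers would be separate processes rather than entries in a list.

```python
from collections import deque


class PerWorkerQueues:
    """Toy dispatcher with one FIFO queue per worker (hypothetical names)."""

    def __init__(self, num_workers: int) -> None:
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        self.queues = [deque() for _ in range(num_workers)]

    def submit(self, task: str) -> int:
        """Append the task to the shortest queue and return that worker's index.

        Shortest-queue dispatch is an assumption of this sketch; ties go to
        the lowest worker index.
        """
        worker_id = min(range(len(self.queues)), key=lambda i: len(self.queues[i]))
        self.queues[worker_id].append(task)
        return worker_id

    def next_task(self, worker_id: int):
        """Pop the next task for one worker, or return None if its queue is empty."""
        queue = self.queues[worker_id]
        return queue.popleft() if queue else None


pool = PerWorkerQueues(num_workers=2)
assignments = [pool.submit(f"case-{i}") for i in range(4)]
```

Each worker only ever reads from its own queue, so a slow or crashed worker does not block the tasks already assigned to the others.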
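2. Enhanced fault tolerance
exit(99): fatal CUDA error → the worker is restarted automatically
exit(98): OOM error → the worker is restarted automatically
exit(1): test failure → the case is marked as an error and the worker is kept
pending_dispatch queue, ensuring that no task is lost
3. Intelligent memory management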
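The exit-code handling listed above (exit(99), exit(98), exit(1)) could be sketched as a small policy table. This is only a sketch under stated assumptions: the constant names, the `Action` enum, and `handle_exit` are hypothetical, and putting the in-flight task back on `pending_dispatch` when a worker is restarted is an assumed reading of how "no task is lost" is achieved.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    RESTART_WORKER = "restart_worker"
    MARK_ERROR = "mark_error"


# Exit codes as listed in the PR description; the constant names are hypothetical.
EXIT_CUDA_FATAL = 99
EXIT_OOM = 98
EXIT_TEST_FAILED = 1

POLICY = {
    EXIT_CUDA_FATAL: Action.RESTART_WORKER,
    EXIT_OOM: Action.RESTART_WORKER,
    EXIT_TEST_FAILED: Action.MARK_ERROR,
}


def handle_exit(exit_code, task, pending_dispatch, errors):
    """Return the action for an exit code and record the task accordingly.

    Unknown exit codes return None and leave both containers untouched.
    """
    action = POLICY.get(exit_code)
    if action is Action.RESTART_WORKER:
        # Assumption: the interrupted task is re-queued so it is not lost.
        pending_dispatch.append(task)
    elif action is Action.MARK_ERROR:
        errors.append(task)
    return action


pending_dispatch = deque()
errors = []
oom_action = handle_exit(EXIT_OOM, "case-7", pending_dispatch, errors)
fail_action = handle_exit(EXIT_TEST_FAILED, "case-8", pending_dispatch, errors)
```

Keeping the policy in one table makes it easy to see which failures cost a worker restart and which only affect the reported result.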
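4. Runtime observability improvements
--show_runtime_status option, which lets large-scale test runs control the output frequency
New features
1. Launch script enhancements
The new run-v4.sh provides three operations-friendly actions:
Performance comparison
Backward compatibility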