fix: fix DML parallel task #2106
Conversation
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough
Introduces a thread-local marker plus sync/submit APIs for a single-threaded GPU executor, adds an execution-provider-based check for whether a session must be serialized, and routes ONNX session creation and runs through that executor; replaces the busy-wait polling in OCR initialization with a thread lock, and logs and re-raises exceptions when OCR inference fails; adds concurrency and serialization unit tests.

Changes
- GPU executor and ONNX session integration
- OCR initialization and OCR inference error handling
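The single-threaded executor described above can be sketched roughly as follows. The helper names (`submit`, `run_sync`, `is_executor_thread`) mirror those mentioned later in the review, but the implementation details here are an illustrative sketch under stated assumptions, not the PR's actual code:

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, TypeVar

T = TypeVar("T")

_local = threading.local()


def _mark_executor_thread() -> None:
    # Thread-local flag, set once inside the single worker thread.
    _local.is_executor = True


# A single worker serializes everything submitted to it.
_executor = ThreadPoolExecutor(max_workers=1, initializer=_mark_executor_thread)


def is_executor_thread() -> bool:
    return getattr(_local, "is_executor", False)


def submit(fn: Callable[..., T], *args: Any, **kwargs: Any) -> Future[T]:
    return _executor.submit(fn, *args, **kwargs)


def run_sync(fn: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    # If we are already on the executor thread, call directly: waiting on a
    # future that the same thread would have to complete would deadlock.
    if is_executor_thread():
        return fn(*args, **kwargs)
    return submit(fn, *args, **kwargs).result()
```

The thread-local marker is what makes `run_sync` safe to call re-entrantly from code that is already running on the executor thread.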
Sequence diagram

```mermaid
sequenceDiagram
    participant Client as Caller
    participant Model as Inference loader / calling code
    participant Executor as GPU executor thread
    participant Session as ONNX session
    Client->>Model: Inference request (inputs)
    Model->>Session: Prepare inputs
    alt Session must be serialized (Dml or provider unknown)
        Model->>Executor: run_sync(session.run)
        Executor->>Session: session.run(...) (serialized execution)
        Session-->>Executor: Return results
        Executor-->>Model: Return results synchronously
    else No serialization needed (CPU)
        Model->>Session: session.run(...) directly
        Session-->>Model: Return results
    end
    Model-->>Client: Return inference output
```
Code review effort estimate: 🎯 4 (complex) | ⏱️ ~45 minutes
Possibly related PRs
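The walkthrough's other change, replacing busy-wait polling in OCR initialization with a thread lock, can be sketched like this. `LazyModel` is a hypothetical class name used only to illustrate the pattern, not the PR's actual structure:

```python
import threading
from typing import Any, Callable


class LazyModel:
    """Lock-guarded lazy initialization; the lock replaces busy-wait polling."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._model: Any = None

    def get(self) -> Any:
        # `with` releases the lock even if the factory raises, so a failed
        # init does not wedge subsequent callers.
        with self._lock:
            if self._model is None:
                self._model = self._factory()
            return self._model
```

Concurrent callers block on the lock instead of spinning, the factory runs at most once on success, and a failing factory leaves the lock released so a retry can succeed.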
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/one_dragon/utils/gpu_executor.py (1)
24-51: Complete the type signatures of the shared execution wrappers.
Lines 24-51: these newly added helpers are now the shared entry point for OCR and YOLO, but session, output_names, input_feed, and the varargs are still untyped, which makes mispassed arguments hard to catch with static checks later. Suggest tightening the parameters and return values of this public API first, then have the thin wrappers (e.g. in src/one_dragon/yolo/onnx_model_loader.py and src/onnxocr/predict_base.py) reuse the same signatures. As per coding guidelines:
**/*.py: All functions and methods must include type hints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/one_dragon/utils/gpu_executor.py` around lines 24 - 51: Add proper type hints to the shared executor helpers: annotate is_executor_thread() -> bool; submit(fn: Callable[..., T], /, *args: Any, **kwargs: Any) -> Future[T]; run_sync(fn: Callable[..., T], /, *args: Any, **kwargs: Any) -> T; should_serialize_session(session: Any) -> bool; and run_session(session: Any, output_names: Sequence[str], input_feed: Optional[Dict[str, Any]] = None, **kwargs: Any) -> Sequence[Any]; import and use typing symbols (TypeVar T, Callable, Any, Optional, Dict, Sequence, Future) so the wrappers in yolo/onnx_model_loader.py and onnxocr/predict_base.py can reuse the same signatures.
src/onnxocr/predict_base.py (1)
20-29: Extract the ORT session initialization into a shared factory.
Lines 20-29 here and lines 146-155 of src/one_dragon/yolo/onnx_model_loader.py duplicate the same provider selection and DML SessionOptions configuration. Both sites belong to the same parallel-crash fix surface; as soon as one side gains another compatibility parameter, the other will keep running the old logic, and the regression will be hard to trace. Suggest extracting this block into a shared helper so OCR and YOLO go through a single configuration path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/onnxocr/predict_base.py` around lines 20 - 29: Extract the duplicated ONNX Runtime session initialization into a shared factory (e.g., create_onnx_session or make_onnxruntime_session) and replace the inline code in src/onnxocr/predict_base.py (session_options, DmlExecutionProvider check, ORT_SEQUENTIAL, enable_mem_pattern, and onnxruntime.InferenceSession creation) and the similar block in src/one_dragon/yolo/onnx_model_loader.py to call that factory; the factory should accept model path, providers, and optional extra session options, implement the "if 'DmlExecutionProvider' in providers then session_options.execution_mode = onnxruntime.ExecutionMode.ORT_SEQUENTIAL and session_options.enable_mem_pattern = False" logic, return the InferenceSession, and be imported/used by both modules so future provider/option changes are made in one place.
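The DML-specific options the review quotes could be centralized roughly as follows. `SessionOpts` is a dependency-free stand-in for onnxruntime.SessionOptions and `configure_session_opts` is a hypothetical helper name, so this is a sketch of the suggested factory's core logic rather than the PR's code:

```python
from dataclasses import dataclass


@dataclass
class SessionOpts:
    """Stand-in for onnxruntime.SessionOptions (illustration only)."""
    execution_mode: str = "ORT_PARALLEL"
    enable_mem_pattern: bool = True


def configure_session_opts(opts: SessionOpts, providers: list[str]) -> SessionOpts:
    # The two DML-specific settings named by the review:
    # sequential execution and memory-pattern optimization disabled.
    if "DmlExecutionProvider" in providers:
        opts.execution_mode = "ORT_SEQUENTIAL"
        opts.enable_mem_pattern = False
    return opts
```

Keeping this decision in one helper means both the OCR and YOLO loaders pick up any future provider-specific option in a single place.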
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 53e7a87c-8bba-49b4-b30e-3543fe5fb75c
📒 Files selected for processing (9)
src/one_dragon/base/operation/one_dragon_context.py
src/one_dragon/utils/gpu_executor.py
src/one_dragon/yolo/onnx_model_loader.py
src/one_dragon/yolo/yolov8_onnx_cls.py
src/one_dragon/yolo/yolov8_onnx_det.py
src/onnxocr/predict_base.py
src/onnxocr/predict_cls.py
src/onnxocr/predict_det.py
src/onnxocr/predict_rec.py
```python
def should_serialize_session(session) -> bool:
    try:
        providers = session.get_providers()
    except Exception:
        return False
```
🧩 Analysis chain
🏁 Script executed:
```shell
# Read the specific file and lines in question
head -50 src/one_dragon/utils/gpu_executor.py | tail -20
```
Repository: OneDragon-Anything/ZenlessZoneZero-OneDragon
Length of output: 651
🏁 Script executed:
```shell
# Also check what's in the file around those lines with context
sed -n '30,55p' src/one_dragon/utils/gpu_executor.py
```
Repository: OneDragon-Anything/ZenlessZoneZero-OneDragon
Length of output: 846
🏁 Script executed:
```shell
# Check the downstream usage in the mentioned files
echo "=== src/one_dragon/yolo/onnx_model_loader.py around line 159 ==="
sed -n '155,165p' src/one_dragon/yolo/onnx_model_loader.py
echo "=== src/onnxocr/predict_base.py around line 68 ==="
sed -n '64,74p' src/onnxocr/predict_base.py
```
Repository: OneDragon-Anything/ZenlessZoneZero-OneDragon
Length of output: 875
🏁 Script executed:
```shell
# Check the full run_session function to understand control flow
sed -n '40,80p' src/one_dragon/utils/gpu_executor.py
```
Repository: OneDragon-Anything/ZenlessZoneZero-OneDragon
Length of output: 597
Exception handling should be improved so the serialization guard is not silently disabled.
should_serialize_session() at lines 41-44 catches every exception and returns False, so any error raised by get_providers() makes run_session() fall straight back to session.run() and bypass the serialization guard. This silently disables the intended DML protection at the call sites in src/one_dragon/yolo/onnx_model_loader.py line 164 and src/onnxocr/predict_base.py line 72. Only a missing method should be treated as non-serializable; other exceptions should propagate so real problems surface promptly.
The session parameter is also missing a type annotation and should be typed.
```python
def should_serialize_session(session) -> bool:
    try:
        providers = session.get_providers()
    except Exception:
        return False
    return "DmlExecutionProvider" in providers
```
Suggested change:
```python
def should_serialize_session(session) -> bool:
    get_providers = getattr(session, "get_providers", None)
    if get_providers is None:
        return False
    providers = get_providers()
    return "DmlExecutionProvider" in providers
```
🧰 Tools
🪛 Ruff (0.15.6)
[warning] 43-43: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/one_dragon/utils/gpu_executor.py` around lines 40 - 44, The current
should_serialize_session(session) swallows all exceptions from
session.get_providers(), silently disabling serialization; replace the broad
try/except with an existence check: use getattr(session, "get_providers", None)
and if it's None return False, otherwise call get_providers() and let any
exceptions propagate (do not catch Exception), then check for
"DmlExecutionProvider". Also add a type annotation for the session parameter
(e.g., session: Any or session: InferenceSession) so the signature is typed and
import Any or the appropriate ONNX session type.
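To make the reviewer's point concrete, here is the behavioral difference between the two variants, using a hypothetical `BrokenSession` whose provider lookup fails (the `_old`/`_new` suffixes are for illustration only):

```python
class BrokenSession:
    """Hypothetical session whose provider lookup raises."""

    def get_providers(self) -> list[str]:
        raise RuntimeError("device lost")


def should_serialize_session_old(session) -> bool:
    # Current variant: every exception is swallowed.
    try:
        providers = session.get_providers()
    except Exception:
        return False
    return "DmlExecutionProvider" in providers


def should_serialize_session_new(session) -> bool:
    # Suggested variant: only a missing method opts out; real errors propagate.
    get_providers = getattr(session, "get_providers", None)
    if get_providers is None:
        return False
    providers = get_providers()
    return "DmlExecutionProvider" in providers


# Old: the failure is hidden and serialization is silently disabled.
assert should_serialize_session_old(BrokenSession()) is False

# New: the underlying failure surfaces instead of being masked.
try:
    should_serialize_session_new(BrokenSession())
except RuntimeError:
    pass
else:
    raise AssertionError("expected RuntimeError to propagate")
```

With the suggested variant, a genuinely broken session fails loudly at the call site instead of quietly running unguarded.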
closed as not planned
High-priority PR, support needed. @JoshCai233, please help.
🧹 Nitpick comments (3)
tests/one_dragon/test_gpu_inference_serialization.py (1)
22-208: ⚡ Quick win: add type annotations uniformly to the new functions in the test file.
Among the new helper/test functions in lines 22-208, several definitions lack type annotations (at minimum -> None should be added). Suggest completing them in one pass to avoid noise from later static checks. As per coding guidelines:
**/*.py: "All functions and methods must include type hints".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/one_dragon/test_gpu_inference_serialization.py` around lines 22 - 208: Several helper and test functions lack return type annotations (should at least have -> None) which violates the project's typing guideline; add explicit type hints for all functions and methods in this diff—e.g., ConcurrencyProbe.__init__, ConcurrencyProbe.enter, FakeSession.__init__, FakeSession.get_providers, FakeSession.run, create_session (inside test_create_onnx_session_serializes_dml_factories), the test functions (test_create_onnx_session_serializes_dml_factories, test_run_session_serializes_dml_sessions, test_run_session_serializes_when_provider_lookup_fails, test_run_session_does_not_serialize_cpu_sessions, test_ocr_init_model_uses_lock_for_concurrent_calls, test_ocr_init_model_failure_releases_lock, test_onnx_paddleocr_ocr_reraises_inference_errors) and any local callables (download); update their signatures to include appropriate type hints (e.g., -> None or concrete return types) so static type checks pass.
src/onnxocr/predict_base.py (1)
74-75: ⚡ Quick win: add type annotations to run_onnx_session.
The method added at line 74 lacks parameter and return types, which is inconsistent with the repo's Python conventions. Suggested change:
```diff
+from typing import Any
+
-    def run_onnx_session(self, onnx_session, output_names, input_feed):
+    def run_onnx_session(
+        self,
+        onnx_session: Any,
+        output_names: list[str],
+        input_feed: dict[str, Any],
+    ) -> Any:
         return gpu_executor.run_session(onnx_session, output_names, input_feed=input_feed)
```
As per coding guidelines:
**/*.py: "All functions and methods must include type hints" and "use modern syntax features such as list[str] instead of List[str]".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/onnxocr/predict_base.py` around lines 74 - 75: Add Python type annotations to run_onnx_session: annotate the parameters and return type using modern syntax (e.g. onnx_session: Any, output_names: list[str], input_feed: dict[str, Any]) and give the function an explicit return type (-> Any) to match project guidelines; import Any from typing if not already present and keep the call to gpu_executor.run_session(onnx_session, output_names, input_feed=input_feed) unchanged.
src/one_dragon/yolo/onnx_model_loader.py (1)
164-165: ⚡ Quick win: run_session's type declaration is incomplete and its generics do not follow the 3.11 style.
Line 164 still uses List[str], and the method lacks a return type annotation. Suggested change:
```diff
-from typing import Optional, List
+from typing import Any, Optional
 ...
-    def run_session(self, output_names: List[str], input_feed: dict):
+    def run_session(self, output_names: list[str], input_feed: dict[str, Any]) -> Any:
         return gpu_executor.run_session(self.session, output_names, input_feed=input_feed)
```
As per coding guidelines:
**/*.py: "Target Python 3.11+ and use modern syntax features such as list[str] instead of List[str]" and "All functions and methods must include type hints".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/one_dragon/yolo/onnx_model_loader.py` around lines 164 - 165: The method run_session should use 3.11-style generics and include a return type: change the signature from def run_session(self, output_names: List[str], input_feed: dict): to use list[str] and an explicit return annotation matching what gpu_executor.run_session returns (e.g., -> list[Any] or -> tuple[Any, ...]); also tighten input_feed to dict[str, Any] and add from typing import Any at the top if needed so the signature becomes something like def run_session(self, output_names: list[str], input_feed: dict[str, Any]) -> list[Any]: while leaving the body return gpu_executor.run_session(self.session, output_names, input_feed=input_feed) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cc957fff-35ad-47fd-987c-b2f047706c0a
📒 Files selected for processing (6)
src/one_dragon/base/matcher/ocr/onnx_ocr_matcher.py
src/one_dragon/utils/gpu_executor.py
src/one_dragon/yolo/onnx_model_loader.py
src/onnxocr/onnx_paddleocr.py
src/onnxocr/predict_base.py
tests/one_dragon/test_gpu_inference_serialization.py
@JoshCai233's testing shows AMD cards can freeze; this needs to be stated explicitly in the GUI, or resolved with an updated approach. @A-nony-mous
Three other volunteers' AMD cards have been confirmed not to freeze; we need a more reasonable wording.
Added the wording "(use with caution on AMD cards)"; let's leave it at that.
Completely resolves the DML concurrency issue and restores the GPU configuration settings.
stress_test_directdml.py passes its tests.
ref: #874 #1675 #2043
Summary by CodeRabbit
New features
Fixes and behavior changes
Performance improvements
Tests