Arm backend: implement EthosUBackend::is_available() hardware check #20021
vacu9708 wants to merge 1 commit into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20021
Note: Links to docs will display an error until the docs builds have been completed.
I don't seem to have merge permission. Could a maintainer please merge this once there are no remaining blockers?
This PR needs a
CI failures: test-arm-backend-zephyr raises
@@ -0,0 +1,54 @@
From 08111f0f4f5034237181b8ac4b50ccdc84575fb9 Mon Sep 17 00:00:00 2001
Hi, thinking about it, it would be good if we could either get this upstreamed or keep it outside of the git repository, since users can have their own version of the driver supplied by the vendor through the build system. Adding a patch like this works inside the project but might collide when built in other systems, as you discovered with Zephyr. I'm not sure about CMSIS-Pack (on its way), and future integrations with, for example, PlatformIO/Arduino might need to handle this too.
Hi @zingo. I agree with you.
I removed the patch file and added a TODO comment above the weak fallback.
I am going to submit an upstream PR to add ethosu_driver_is_registered().
Once the upstream change is merged and all builds are confirmed to be using
a driver with ethosu_driver_is_registered() implemented, the weak symbol
and this TODO can be removed in a follow-up PR. Until then, the weak fallback
ensures this PR does not break any build that links against an older driver.
Without this check, Ethos-U users may have a hard time debugging mysterious freezes when ethosu_init() is accidentally omitted, for example after a refactoring.
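For readers unfamiliar with the weak-fallback approach described in this comment, here is a minimal sketch of the idea. It assumes a GCC/Clang toolchain (`__attribute__((weak))`) and assumes `ethosu_driver_is_registered()` takes no arguments and returns `bool`; the actual upstream signature may differ. `platform_is_available_sketch()` is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Assumed declaration; the upstream driver may define a different signature.
extern "C" bool ethosu_driver_is_registered(void);

// Weak fallback: the linker uses this definition only when the linked driver
// does not provide a strong ethosu_driver_is_registered(). It reports "true"
// so that builds against an older driver behave as before instead of
// failing at link time.
extern "C" __attribute__((weak)) bool ethosu_driver_is_registered(void) {
  return true;
}

// Hypothetical wrapper standing in for the Cortex-M availability check.
bool platform_is_available_sketch() {
  return ethosu_driver_is_registered();
}
```

When a driver that implements the function is linked in, its strong definition overrides the weak one without any source change, which is why the fallback can be removed later in a follow-up PR.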
is_available() returned 1 unconditionally, causing two failure modes:
- Cortex-M: missing ethosu_init() hangs silently in ethosu_reserve_driver()
- Cortex-A: absent /dev/ethosu0 fails late with a cryptic driver error

Changes:
- Cortex-M: calls ethosu_driver_is_registered() to check whether ethosu_init() was called. A weak fallback (return true) is kept until ethosu_driver_is_registered() is available in all driver builds.
- Cortex-A: constructs EthosU::Device to probe /dev/ethosu0.

Signed-off-by: Youngsik Yang <vacu9708@gmail.com>
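The Cortex-A change in this commit relies on a constructor that throws when the device node cannot be opened. The pattern can be sketched as follows. `ToyDevice` is a hypothetical stand-in written for this example, not the real `EthosU::Device`, and `probe_device()` is likewise an illustrative name.

```cpp
#include <cassert>
#include <exception>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a device wrapper whose constructor throws
// when the device node is absent or cannot be opened.
class ToyDevice {
 public:
  explicit ToyDevice(const std::string& path) : file_(path) {
    if (!file_.is_open()) {
      throw std::runtime_error("cannot open " + path);
    }
  }

 private:
  std::ifstream file_;
};

// Turns "constructor throws" into a boolean availability answer, so the
// caller can report unavailability early instead of failing at first use.
bool probe_device(const std::string& path) noexcept {
  try {
    ToyDevice device(path);
    (void)device;  // Only the success of construction matters here.
    return true;
  } catch (const std::exception&) {
    return false;
  }
}
```

The device object is destroyed as soon as the probe returns, so the check does not hold the device open.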
Summary
EthosUBackend::is_available() returned 1 unconditionally. Method::load() calls is_available() before init() and returns Error::NotFound with a clear message when the backend is unavailable, but that check was never triggered.

The consequence differs by platform:
- Cortex-M: ethosu_init() must be called by the firmware before ExecuTorch runs; it registers the NPU driver handle that ethosu_reserve_driver() later waits for. If the call is missing, the program hangs silently inside ethosu_reserve_driver(), which waits forever on a semaphore. No error is returned, no log is produced, and the device appears frozen.
- Cortex-A: /dev/ethosu0 may be absent if the kernel module is not loaded. The error was logged but attributed to the driver invocation rather than to availability.

    flowchart LR
        subgraph before["Before - is_available() = return 1"]
            direction TB
            B1["Method::load()"]
            B2["is_available()\nunconditionally returns true"]
            B3["init()\nexecute()"]
            B4["ethosu_reserve_driver()"]
            B5["invoke_linux_driver()\n-> cryptic driver error\nfails late"]
            SEM(["Semaphore count = 0\n==\nethosu_init() never called"])
            B1 --> B2 --> B3
            B3 -- Cortex-M --> B4
            B3 -- Cortex-A --> B5
            B4 -->|semaphore_take\nWAIT_FOREVER| SEM
            SEM -->|"🔄 spinning forever\nno error / no log"| SEM
        end
        subgraph after["After - is_available() checks hardware"]
            direction TB
            A1["Method::load()"]
            A2{"is_available()"}
            A3["ethosu_driver_is_registered()\nregistered_drivers == NULL"]
            A4["EthosU::Device('/dev/ethosu0')\nthrows if absent"]
            A5["Error::NotFound\n early / clear message"]
            A1 --> A2
            A2 -- Cortex-M --> A3
            A2 -- Cortex-A --> A4
            A3 -- false --> A5
            A4 -- throws --> A5
        end
        before ~~~ after

Changes
- Cortex-M: adds ethosu_driver_is_registered() to ethos-u-core-driver. It checks registered_drivers != NULL, which answers "was ethosu_init() called?" The semaphore count is always > 0 when registered_drivers != NULL, so a true return guarantees ethosu_reserve_driver() will not block.
- Cortex-A: constructs EthosU::Device, which throws if the device is absent or unopenable.

| File | Change |
| --- | --- |
| EthosUBackend_Internal.h | platform_is_available() |
| EthosUBackend.cpp | |
| EthosUBackend_Cortex_M.cpp | ethosu_driver_is_registered(); weak symbol fallback (return true) for build paths where the patch is not applied |
| EthosUBackend_Cortex_A.cpp | EthosU::Device probe |
| corstone_utils.cmake | patch_repo call for core_software/core_driver (Corstone SDK) |
| examples/arm/ethos-u-setup/core_driver/0001-Add-ethosu_driver_is_registered.patch | Adds ethosu_driver_is_registered() to the driver; applied to both the Corstone SDK and Zephyr hal_ethos_u |
| zephyr/CMakeLists.txt | hal_ethos_u at CMake configure time |

ethosu_driver_is_registered() is added via a patch file rather than an upstream contribution because ExecuTorch already uses this mechanism for the Ethos-U SDK. This PR extends the same pattern to core_driver.

The weak symbol ensures any other build path not reached by the patch degrades to return true rather than failing at link time.

Tests
No tests added. Other backends like VGFBackend, XNNPACK, and MLX each have real is_available() implementations and were merged without tests.

The implementations here are similarly straightforward; I think end-to-end coverage would require disproportionate infrastructure for changes of this size.
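For reviewers who want to reason about the gating order from the Summary (is_available() is consulted before init()) without hardware, the behaviour can be modelled in a few lines. This is a toy model only: `ToyError`, `ToyBackend`, and `toy_load()` are hypothetical names, not ExecuTorch types, and this is not a test proposed for this PR.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical error codes, loosely mirroring the ones named in the Summary.
enum class ToyError { Ok, NotFound };

// Hypothetical backend: the availability probe is injected so both the
// "always true" stub and a real check can be modelled.
struct ToyBackend {
  explicit ToyBackend(std::function<bool()> probe)
      : is_available(std::move(probe)), init_calls(0) {}

  ToyError init() {
    ++init_calls;  // On real Cortex-M hardware, this is where a hang would follow.
    return ToyError::Ok;
  }

  std::function<bool()> is_available;
  int init_calls;
};

// Models the load path: an unavailable backend is rejected before init().
ToyError toy_load(ToyBackend& backend) {
  if (!backend.is_available()) {
    return ToyError::NotFound;
  }
  return backend.init();
}
```

With a probe that returns false, `toy_load()` yields `ToyError::NotFound` and `init()` is never reached; with the old "always true" behaviour, `init()` runs regardless of the hardware state.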
$ lintrunner backends/arm/runtime/EthosUBackend*.cpp \
    backends/arm/runtime/EthosUBackend_Internal.h \
    backends/arm/scripts/corstone_utils.cmake
ok No lint issues.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani