Add Arduino library support for ExecuTorch #20221
Merged
```
# Generated by build_arduino_library.sh — do not check in
arduino_lib/
```
```cpp
/*
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the BSD-style license found in the
 * LICENSE file in the root directory of this source tree.
 */

#pragma once

// Arduino's custom <new> header omits <exception>, which breaks
// std::bad_variant_access in <variant>. Include it first.
#include <exception>

#ifndef C10_USING_CUSTOM_GENERATED_MACROS
#define C10_USING_CUSTOM_GENERATED_MACROS
#endif
#ifndef ET_ENABLE_DEPRECATED_CONSTANT_BUFFER
#define ET_ENABLE_DEPRECATED_CONSTANT_BUFFER 0
#endif
#ifndef FLATBUFFERS_MAX_ALIGNMENT
#define FLATBUFFERS_MAX_ALIGNMENT 1024
#endif

#include <executorch/extension/data_loader/buffer_data_loader.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/executor/method.h>
#include <executorch/runtime/executor/method_meta.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>
```
<!---
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the BSD-style license found in the
LICENSE file in the root directory of this source tree.
--->

# ExecuTorch Arduino Library

Run PyTorch models on Arduino microcontrollers using ExecuTorch.

This directory contains everything needed to package ExecuTorch as an
Arduino library. A build script vendors the runtime sources from this
repository into a self-contained library that Arduino users install
through the Library Manager or by copying into their libraries folder.

## How It Works

```
PyTorch Model ──► torch.export ──► .pte file ──► model.h (C array)
│
Arduino Sketch (.ino)
#include <ExecuTorchArduino.h>
#include "model.h"
│
arduino-cli compile ──► Upload ──► Runs on board
```

### The three pieces

1. **The library** (`arduino_lib/ExecuTorchArduino/`) — the ExecuTorch
   runtime, CMSIS-NN kernels, and portable ops packaged for the Arduino
   build system. Generated by `build_arduino_library.sh`; not checked in.

2. **The model** (`model.h`) — a `.pte` file converted to a C byte array.
   Each user brings their own model, exported from PyTorch with the
   Cortex-M backend.

3. **The sketch** (`.ino`) — a standard Arduino program that loads the
   model, feeds it input, and reads the output. Uses the native
   ExecuTorch C++ API (`Program::load`, `Method::execute`, etc.).

## Supported Boards

| Board | MCU | Status |
|-------|-----|--------|
| Arduino Uno Q | STM32U585 (Cortex-M33) | Tested |
| Arduino Nano 33 BLE | nRF52840 (Cortex-M4F) | Planned (requires mbed PAL) |
| Arduino Giga R1 WiFi | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |
| Arduino Portenta H7 | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |

The library currently requires the Zephyr board core. Non-Zephyr boards
(mbed) need a platform abstraction layer port before they can compile.
CMSIS-NN accelerated ops work on any ARM Cortex-M with DSP extensions.
Portable ops work on any architecture.

## Quick Start

### 1. Build the Arduino library

```bash
cd examples/arduino
./build_arduino_library.sh
```

This copies the required ExecuTorch sources from the repository into
`arduino_lib/ExecuTorchArduino/`, ready for Arduino.

### 2. Install the library

Copy the generated library into your Arduino libraries folder:

```bash
# macOS:
cp -r arduino_lib/ExecuTorchArduino ~/Documents/Arduino/libraries/
# Linux:
cp -r arduino_lib/ExecuTorchArduino ~/Arduino/libraries/
```

Or with `arduino-cli`:

```bash
cd arduino_lib && zip -r ExecuTorchArduino.zip ExecuTorchArduino && cd ..
# arduino-cli only accepts --zip-path once unsafe library installs are enabled:
arduino-cli config set library.enable_unsafe_install true
arduino-cli lib install --zip-path arduino_lib/ExecuTorchArduino.zip
```

### 3. Export a model

Each sketch needs a `model.h` file — a `.pte` model converted to a C
byte array. Use `pte_to_header.py` from the Arm examples to convert
any `.pte` file:

```bash
python examples/arm/executor_runner/pte_to_header.py \
    -p model.pte -d examples/arduino/examples/AddModel -o model.h
```

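Conceptually, the conversion is a byte-for-byte dump of the `.pte` file into an array definition that the sketch can pass to `BufferDataLoader`. The C++ sketch below only illustrates that idea: `read_file` and `to_c_array` are hypothetical helpers, not part of `pte_to_header.py`, and the output format (the `model_pte` name, `alignas(16)`, 12 bytes per row) is an assumption chosen to match the `model_pte` / `sizeof(model_pte)` usage in the step 4 sketch.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole binary file (e.g. add.pte) into memory.
// Returns an empty vector if the file cannot be opened; callers should check
// for that before emitting a header, since a zero-length array is invalid.
std::vector<uint8_t> read_file(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                              std::istreambuf_iterator<char>());
}

// Hypothetical helper: render bytes as an array definition, 12 bytes per row.
// The explicit array size keeps sizeof(name) equal to the byte count.
std::string to_c_array(const std::vector<uint8_t>& bytes,
                       const std::string& name) {
  std::string out = "alignas(16) const uint8_t " + name + "[" +
                    std::to_string(bytes.size()) + "] = {";
  char hex[8];
  for (size_t i = 0; i < bytes.size(); ++i) {
    if (i % 12 == 0) out += "\n ";  // start a new, indented row
    std::snprintf(hex, sizeof(hex), " 0x%02x", static_cast<unsigned>(bytes[i]));
    out += hex;
    if (i + 1 != bytes.size()) out += ",";
  }
  out += "\n};\n";
  return out;
}
```

A host tool built on this would write `to_c_array(read_file("add.pte"), "model_pte")` to `model.h`; `pte_to_header.py` remains the supported way to generate the header.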
**AddModel** — export a simple add model (no dataset needed):

```bash
python -c "
import torch
from executorch.exir import to_edge
from torch.export import export
class Add(torch.nn.Module):
    def forward(self, x): return x + 1.0
et = to_edge(export(Add().eval(), (torch.tensor([1.,2.,3.]),))).to_executorch()
with open('add.pte','wb') as f: f.write(bytes(et.buffer))"

python examples/arm/executor_runner/pte_to_header.py \
    -p add.pte -d examples/arduino/examples/AddModel -o model.h
```

**HelloExecuTorch** — uses any valid model; the AddModel `.pte` works:

```bash
cp examples/arduino/examples/AddModel/model.h \
   examples/arduino/examples/HelloExecuTorch/model.h
```

**KeywordSpotting** — requires a quantized DS-CNN model. Generate it
with `export_model.py`:

```bash
# Download the dataset (one time, ~2.3 GB) — run from repo root:
python -c "import torchaudio; torchaudio.datasets.SPEECHCOMMANDS(
    root='outputs/speech_commands', download=True)"

# Train DS-CNN, quantize with CMSIS-NN, and export model.h:
python examples/arduino/export_model.py \
    --output examples/arduino/examples/KeywordSpotting/model.h
```

This trains a DS-CNN on Google Speech Commands v2 (100 samples per class),
quantizes it to int8 via `CortexMQuantizer`, calibrates with real MFCC
audio data, and exports a 54 KB `.pte` as a C header.

To export with a pre-trained checkpoint instead of training:

```bash
python examples/arduino/export_model.py --checkpoint my_weights.pth \
    --output examples/arduino/examples/KeywordSpotting/model.h
```

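`CortexMQuantizer` maps float tensors to int8. Its exact scheme (per-tensor vs. per-channel parameters, how calibration picks scales and zero points) is not described here, so the sketch below only shows the standard affine int8 mapping that such quantizers build on, assuming a single per-tensor `scale` and `zero_point`: `q = clamp(round(x / scale) + zero_point, -128, 127)` and `x ≈ (q - zero_point) * scale`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Affine int8 quantization with an assumed per-tensor scale and zero point.
int8_t quantize_int8(float x, float scale, int32_t zero_point) {
  int32_t q = static_cast<int32_t>(std::lround(x / scale)) + zero_point;
  q = std::max<int32_t>(-128, std::min<int32_t>(127, q));  // saturate to int8
  return static_cast<int8_t>(q);
}

// Inverse mapping; recovers x only up to the rounding error of quantize_int8.
float dequantize_int8(int8_t q, float scale, int32_t zero_point) {
  return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```

With `scale = 0.25` and `zero_point = 0`, the value 0.5 maps to `q = 2` and back to 0.5, while inputs outside the representable range saturate at -128 or 127. Calibration with real MFCC data exists to choose scales that keep typical activations inside that range.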
**Note:** `build_arduino_library.sh` requires schema headers from a prior
cmake build. If you haven't built ExecuTorch yet, run
`./install_executorch.sh` first.

### 4. Write a sketch

```cpp
#include <ExecuTorchArduino.h>
#include "model.h"

using executorch::extension::BufferDataLoader;
using executorch::runtime::Error;
using executorch::runtime::HierarchicalAllocator;
using executorch::runtime::MemoryAllocator;
using executorch::runtime::MemoryManager;
using executorch::runtime::Method;
using executorch::runtime::MethodMeta;
using executorch::runtime::Program;
using executorch::runtime::Result;
using executorch::runtime::Span;

alignas(16) uint8_t method_pool[28 * 1024];

void setup() {
  Serial.begin(115200);
  delay(2000);

  executorch::runtime::runtime_init();

  auto loader = BufferDataLoader(model_pte, sizeof(model_pte));
  Result<Program> program = Program::load(&loader);
  if (!program.ok()) {
    Serial.println("Failed to load program");
    return;
  }

  // ... load method, set inputs, execute, read outputs
  // See examples/ for complete working sketches.
}

void loop() {
  // Run inference periodically
  delay(2000);
}
```

The sketch uses the **native ExecuTorch C++ API** — the same API used on
Linux, Android, and bare-metal targets. No wrapper layer, no
Arduino-specific abstractions.

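The sketch reserves a static, 16-byte-aligned `method_pool` for the method's runtime allocations. ExecuTorch's `MemoryAllocator` hands out memory from a caller-provided buffer like this one; the class below is a simplified, hypothetical bump allocator (`BumpPool` is not an ExecuTorch type) that illustrates the idea: each request is aligned, carved from the front of the buffer, and fails once the pool is exhausted. That failure mode is why the pool size must be chosen to cover what the loaded model actually needs.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified bump allocator over a caller-owned buffer.
// Not ExecuTorch's MemoryAllocator; it only illustrates the idea.
class BumpPool {
 public:
  BumpPool(uint8_t* base, size_t size) : base_(base), size_(size), used_(0) {}

  // Returns an address aligned to `alignment` (must be a power of two),
  // or nullptr if the request plus alignment padding no longer fits.
  void* allocate(size_t nbytes, size_t alignment = 16) {
    uintptr_t start = reinterpret_cast<uintptr_t>(base_) + used_;
    uintptr_t aligned =
        (start + alignment - 1) & ~static_cast<uintptr_t>(alignment - 1);
    size_t padding = static_cast<size_t>(aligned - start);
    if (padding > size_ - used_ || nbytes > size_ - used_ - padding) {
      return nullptr;
    }
    used_ += padding + nbytes;
    return reinterpret_cast<void*>(aligned);
  }

  size_t used() const { return used_; }

  // Releases everything at once; individual frees are not supported.
  void reset() { used_ = 0; }

 private:
  uint8_t* base_;
  size_t size_;
  size_t used_;
};
```

For example, on a 64-byte aligned buffer, two 10-byte requests consume 26 bytes (10, then 6 bytes of padding, then 10), and a further 64-byte request returns `nullptr`.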
### 5. Compile and upload

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq MySketch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* MySketch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

## What is inside the library

The `build_arduino_library.sh` script assembles these components from
the ExecuTorch repository:

| Component | Source in repo | Purpose |
|-----------|---------------|---------|
| ET Runtime | `runtime/executor/`, `runtime/core/`, `runtime/kernel/`, `runtime/platform/` | Model loading, memory management, op dispatch |
| Portable Ops | `kernels/portable/` | Software op implementations (any CPU) |
| Cortex-M Ops | `backends/cortex_m/ops/` | CMSIS-NN accelerated int8 ops |
| CMSIS-NN | fetched by cmake / Zephyr module | ARM's optimized DSP kernels |
| flatcc | `third-party/flatcc/` | `.pte` file parsing |
| flatbuffers | `third-party/flatbuffers/` | Schema headers |
| c10 | `runtime/core/portable_type/c10/` | Core type definitions |

The library uses no external dependencies beyond what the Arduino board
core provides.

## Arduino-specific patches

The build script applies these patches to make ExecuTorch compile under
Arduino's build system:

1. **`#include <exception>` before `<variant>`** — Arduino's custom
   `<new>` header omits `<exception>`, breaking `std::bad_variant_access`.

2. **`cmake_macros.h` stub** — c10/torch headers expect a cmake-generated
   file. The build script generates a stub; `C10_USING_CUSTOM_GENERATED_MACROS`
   is defined in `ExecuTorchArduino.h` to skip the include.

3. **`platform_stubs.c`** — provides weak stubs for `_Exit()`, `fprintf()`,
   and `__aeabi_f2lz` for the LLEXT environment on boards that lack them.

4. **Compile-time defines** — `ExecuTorchArduino.h` sets
   `ET_ENABLE_DEPRECATED_CONSTANT_BUFFER=0` (requires models exported with
   current ExecuTorch) and `FLATBUFFERS_MAX_ALIGNMENT=1024`.

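Weak stubs let `platform_stubs.c` supply a fallback definition without clashing on boards that already provide the symbol: with GCC and Clang, a definition marked `__attribute__((weak))` is used only when no strong (non-weak) definition of the same name is linked in. The sketch below shows the pattern with a hypothetical `board_log` hook rather than the real `_Exit()` / `fprintf()` / `__aeabi_f2lz` stubs, whose bodies are not shown here.

```cpp
#include <cstdio>

// Hypothetical hook with a weak default definition. If another object file
// in the link defines a non-weak board_log, the linker uses that one instead.
extern "C" __attribute__((weak)) int board_log(const char* msg) {
  return std::printf("%s\n", msg);  // fallback: write the message to stdout
}
```

Using C linkage keeps the symbol name unmangled, which matters when the strong definition it should yield to lives in a C file or in the board core.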
## Development

### Updating the library

After modifying ExecuTorch sources, regenerate the library:

```bash
./build_arduino_library.sh           # rebuild
./build_arduino_library.sh --clean   # remove generated output
```

### Testing

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq examples/HelloExecuTorch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* examples/HelloExecuTorch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

### Publishing to Arduino Library Manager

The library is published by adding its repository URL to the
[Arduino Library Registry](https://github.com/arduino/library-registry).
After the initial registration, new git tags are picked up automatically.

## Build Validation

Tested on Arduino Uno Q (STM32U585, Cortex-M33 @ 160 MHz):

- **Portable ops**: Add model (`x + 1.0`) produces correct output
  `[1,2,3] + 1 = [2.0, 3.0, 4.0]`
- **CMSIS-NN linear**: Quantized linear model (int8, 2.2 KB) runs
  `arm_fully_connected_s8` via `cortex_m::quantized_linear`
- **CMSIS-NN keyword spotting**: DS-CNN (MLPerf Tiny KWS benchmark,
  54 KB, int8) correctly classifies real audio from Google Speech
  Commands dataset via 16 CMSIS-NN accelerated ops (conv2d, depthwise
  conv2d, avgpool, linear, quantize, dequantize, pad)

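For the portable-ops check, one way to automate the comparison is to compute the expected values on the host and compare the board's printed output within a tolerance. The helpers below are hypothetical (not part of the library or its examples): `add_scalar` mirrors the `x + 1.0` model, and `all_close` applies an `atol + rtol * |expected|` bound similar to `numpy.allclose`, with defaults chosen here as an assumption.

```cpp
#include <cmath>
#include <cstddef>

// Host-side reference for the Add model: out[i] = x[i] + c.
void add_scalar(const float* x, float c, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = x[i] + c;
}

// True when |actual[i] - expected[i]| <= atol + rtol * |expected[i]| for all i.
// Written as !(diff <= bound) so that a NaN in actual counts as a mismatch.
bool all_close(const float* actual, const float* expected, std::size_t n,
               float rtol = 1e-5f, float atol = 1e-6f) {
  for (std::size_t i = 0; i < n; ++i) {
    float bound = atol + rtol * std::fabs(expected[i]);
    if (!(std::fabs(actual[i] - expected[i]) <= bound)) return false;
  }
  return true;
}
```

A relative-plus-absolute bound is a common choice for float outputs: exact equality is too strict once different compilers or kernels reorder arithmetic, while a purely absolute bound behaves poorly for large values.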
### Keyword Spotting Results

Verified with real audio on hardware:

```
"yes" → [yes]=7.82 >>> Detected: yes CORRECT!
"no" → [no]=1.60 >>> Detected: no CORRECT!
```

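The detection step reduces the model's per-class scores to the label with the highest score. A minimal host-side sketch of that reduction follows, assuming one float score per class and assuming the output order matches the keyword list in the Dataset section; neither assumption is confirmed here, and `KeywordSpotting.ino` defines the actual mapping. If the raw output is int8, taking the argmax over the int8 values picks the same class, because dequantization with a positive scale preserves ordering.

```cpp
#include <cstddef>
#include <string>

// Assumed label order (taken from the keyword list in the Dataset section).
const char* const kKeywords[10] = {"yes",   "no", "up",  "down", "left",
                                   "right", "on", "off", "stop", "go"};

// Index of the largest score; the first index wins on ties.
std::size_t argmax(const float* scores, std::size_t n) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Hypothetical helper: label of the highest-scoring class.
std::string top_keyword(const float scores[10]) {
  return kKeywords[argmax(scores, 10)];
}
```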
To test different keywords, change one line in the sketch:

```cpp
// In KeywordSpotting.ino, change this line:
#include "mfcc_yes.h"      // → detects "yes"
// #include "mfcc_no.h"    // → detects "no"
// #include "mfcc_stop.h"  // → detects "stop"
// Available: mfcc_yes.h, mfcc_no.h, mfcc_up.h, mfcc_down.h,
//            mfcc_left.h, mfcc_right.h, mfcc_on.h, mfcc_off.h,
//            mfcc_stop.h, mfcc_go.h
```

To test with your own audio recording:

```bash
python generate_test_input.py --input my_recording.wav --output mfcc_custom.h
# Then: #include "mfcc_custom.h" in the sketch
```

## End-to-End Flow

```
Google Speech Commands "yes" audio (.wav, 16kHz, 1 second)
→ MFCC extraction (49 time frames × 10 coefficients)
→ DS-CNN model (23K params, trained on MacBook CPU)
→ CortexMQuantizer → int8 (calibrated with real MFCC data)
→ CMSIS-NN ops (conv2d, depthwise_conv2d, avgpool, linear)
→ Export to .pte (54 KB)
→ Arduino library → arduino-cli compile → upload
→ Cortex-M33 @ 160 MHz (Arduino Uno Q, STM32U585)
→ Serial output: ">>> yes" ✅
```

## Dataset

Training and test audio come from [Google Speech Commands v2](https://arxiv.org/abs/1804.03209),
a dataset of roughly 105,000 one-second recordings of 35 words spoken by
thousands of people. It is the standard dataset used by the MLPerf Tiny
benchmark. Download it via `torchaudio.datasets.SPEECHCOMMANDS` (2.3 GB).

The DS-CNN KWS benchmark uses 12 output classes (silence, unknown, plus
10 keywords). The Arduino export script trains the 10 keyword classes:
yes, no, up, down, left, right, on, off, stop, go.

## LLEXT Memory Budget

The Arduino Uno Q loads sketches as LLEXT (Linkable Loadable Extensions).
Sizes reported by `arduino-cli compile` (Zephyr board core 0.55.2):

| Build | Code | Data | Total | Status |
|-------|------|------|-------|--------|
| HelloExecuTorch (portable ops) | 62 KB | 27 KB | 89 KB | ✅ |
| Add model (portable ops) | 88 KB | 35 KB | 123 KB | ✅ |
| DS-CNN (selective CMSIS-NN) | 87 KB | 57 KB | 144 KB | ✅ |

All CMSIS-NN sources are compiled, but the linker's
`--gc-sections` discards unused functions from the final binary.