| 1 | +<!--- |
| 2 | + Copyright (c) Meta Platforms, Inc. and affiliates. |
| 3 | + All rights reserved. |
| 4 | +
| 5 | + This source code is licensed under the BSD-style license found in the |
| 6 | + LICENSE file in the root directory of this source tree. |
| 7 | +---> |
| 8 | + |
| 9 | +# ExecuTorch Arduino Library |
| 10 | + |
| 11 | +Run PyTorch models on Arduino microcontrollers using ExecuTorch. |
| 12 | + |
| 13 | +This directory contains everything needed to package ExecuTorch as an |
| 14 | +Arduino library. A build script vendors the runtime sources from this |
| 15 | +repository into a self-contained library that Arduino users install |
| 16 | +through the Library Manager or by copying into their libraries folder. |
| 17 | + |
| 18 | +## How It Works |
| 19 | + |
| 20 | +``` |
| 21 | +PyTorch Model ──► torch.export ──► .pte file ──► model.h (C array) |
| 22 | + │ |
| 23 | + Arduino Sketch (.ino) |
| 24 | + #include <ExecuTorchArduino.h> |
| 25 | + #include "model.h" |
| 26 | + │ |
| 27 | + arduino-cli compile ──► Upload ──► Runs on board |
| 28 | +``` |
| 29 | + |
| 30 | +### The three pieces |
| 31 | + |
| 32 | +1. **The library** (`arduino_lib/ExecuTorchArduino/`) — the ExecuTorch |
| 33 | + runtime, CMSIS-NN kernels, and portable ops packaged for the Arduino |
| 34 | + build system. Generated by `build_arduino_library.sh`; not checked in. |
| 35 | + |
| 36 | +2. **The model** (`model.h`) — a `.pte` file converted to a C byte array. |
| 37 | + Each user brings their own model, exported from PyTorch with the |
| 38 | + Cortex-M backend. |
| 39 | + |
| 40 | +3. **The sketch** (`.ino`) — a standard Arduino program that loads the |
| 41 | + model, feeds it input, and reads the output. Uses the native |
| 42 | + ExecuTorch C++ API (`Program::load`, `Method::execute`, etc.). |
| 43 | + |
| 44 | +## Supported Boards |
| 45 | + |
| 46 | +| Board | MCU | Status | |
| 47 | +|-------|-----|--------| |
| 48 | +| Arduino Uno Q | STM32U585 (Cortex-M33) | Tested | |
| 49 | +| Arduino Nano 33 BLE | nRF52840 (Cortex-M4F) | Planned (requires mbed PAL) | |
| 50 | +| Arduino Giga R1 WiFi | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) | |
| 51 | +| Arduino Portenta H7 | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) | |
| 52 | + |
| 53 | +The library currently requires the Zephyr board core. Non-Zephyr boards |
| 54 | +(mbed) need a platform abstraction layer port before they can compile. |
| 55 | +CMSIS-NN accelerated ops work on any ARM Cortex-M with DSP extensions. |
| 56 | +Portable ops work on any architecture. |
| 57 | + |
| 58 | +## Quick Start |
| 59 | + |
| 60 | +### 1. Build the Arduino library |
| 61 | + |
| 62 | +```bash |
| 63 | +cd examples/arduino |
| 64 | +./build_arduino_library.sh |
| 65 | +``` |
| 66 | + |
| 67 | +This copies the required ExecuTorch sources from the repository into |
| 68 | +`arduino_lib/ExecuTorchArduino/`, ready for Arduino. |
| 69 | + |
| 70 | +### 2. Install the library |
| 71 | + |
| 72 | +Copy the generated library into your Arduino libraries folder: |
| 73 | + |
| 74 | +```bash |
| 75 | +# macOS: |
| 76 | +cp -r arduino_lib/ExecuTorchArduino ~/Documents/Arduino/libraries/ |
| 77 | +# Linux: |
| 78 | +cp -r arduino_lib/ExecuTorchArduino ~/Arduino/libraries/ |
| 79 | +``` |
| 80 | + |
| 81 | +Or with `arduino-cli`: |
| 82 | + |
| 83 | +```bash |
| 84 | +cd arduino_lib && zip -r ExecuTorchArduino.zip ExecuTorchArduino && cd .. |
| 85 | +arduino-cli lib install --zip-path arduino_lib/ExecuTorchArduino.zip |
| 86 | +``` |
| 87 | + |
| 88 | +### 3. Export a model |
| 89 | + |
| 90 | +The KeywordSpotting example requires a `model.h` file containing the |
| 91 | +DS-CNN model as a C byte array. Generate it with `export_model.py`: |
| 92 | + |
| 93 | +```bash |
| 94 | +# Download the dataset (one time, ~2.3 GB) — run from repo root: |
| 95 | +python -c "import torchaudio; torchaudio.datasets.SPEECHCOMMANDS( |
| 96 | + root='outputs/speech_commands', download=True)" |
| 97 | + |
| 98 | +# Train DS-CNN, quantize with CMSIS-NN, and export model.h: |
| 99 | +python examples/arduino/export_model.py \ |
| 100 | + --output examples/arduino/examples/KeywordSpotting/model.h |
| 101 | +``` |
| 102 | + |
| 103 | +This trains DS-CNN on Google Speech Commands v2 (100 samples/class), |
| 104 | +quantizes to int8 via `CortexMQuantizer`, calibrates with real MFCC |
| 105 | +audio data, and exports a 54 KB `.pte` as a C header. |
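
The last step, turning a `.pte` into a C header, can be sketched in a few lines of Python. This is an illustration only, not the code in `export_model.py`: the function name `pte_to_c_header`, the 16-byte alignment attribute, and the `size_t` type for the size constant are assumptions. The symbol names `model_pte` and `model_pte_size` match what the sketch in step 4 expects.

```python
def pte_to_c_header(pte_bytes: bytes, array_name: str = "model_pte") -> str:
    """Render raw .pte bytes as a C header (hypothetical helper)."""
    rows = []
    # Emit 12 bytes per row so the generated header stays readable.
    for offset in range(0, len(pte_bytes), 12):
        chunk = pte_bytes[offset:offset + 12]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    return "\n".join([
        "#pragma once",
        "#include <stddef.h>",
        "#include <stdint.h>",
        "",
        # Assumption: 16-byte alignment, mirroring the pools in the sketch.
        f"const uint8_t {array_name}[] __attribute__((aligned(16))) = {{",
        *rows,
        "};",
        f"const size_t {array_name}_size = sizeof({array_name});",
        "",
    ])


# Demo with placeholder bytes; a real run would read the exported .pte file.
print(pte_to_c_header(bytes(range(16))))
```

Declaring the array `const` lets the toolchain place the model in read-only storage on targets that support it, rather than copying it into RAM.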
| 106 | + |
| 107 | +To export with a pre-trained checkpoint instead of training: |
| 108 | + |
| 109 | +```bash |
| 110 | +python examples/arduino/export_model.py --checkpoint my_weights.pth \ |
| 111 | + --output examples/arduino/examples/KeywordSpotting/model.h |
| 112 | +``` |
| 113 | + |
| 114 | +**Note:** `build_arduino_library.sh` (step 1) requires schema headers from a prior |
| 115 | +CMake build. If you haven't built ExecuTorch yet, run |
| 116 | +`./install_executorch.sh` first. |
| 117 | + |
| 118 | +### 4. Write a sketch |
| 119 | + |
| 120 | +```cpp |
| 121 | +#include <ExecuTorchArduino.h> |
| 122 | +#include "model.h" |
| 123 | + |
| 124 | +using executorch::extension::BufferDataLoader; |
| 125 | +using executorch::runtime::Error; |
| 126 | +using executorch::runtime::HierarchicalAllocator; |
| 127 | +using executorch::runtime::MemoryAllocator; |
| 128 | +using executorch::runtime::MemoryManager; |
| 129 | +using executorch::runtime::Method; |
| 130 | +using executorch::runtime::MethodMeta; |
| 131 | +using executorch::runtime::Program; |
| 132 | +using executorch::runtime::Result; |
| 133 | +using executorch::runtime::Span; |
| 134 | + |
| 135 | +alignas(16) uint8_t method_pool[28 * 1024]; |
| 136 | +alignas(16) uint8_t temp_pool[8 * 1024]; |
| 137 | + |
| 138 | +void setup() { |
| 139 | + Serial.begin(115200); |
| 140 | + delay(2000); |
| 141 | + |
| 142 | + executorch::runtime::runtime_init(); |
| 143 | + |
| 144 | + auto loader = BufferDataLoader(model_pte, model_pte_size); |
| 145 | + Result<Program> program = Program::load(&loader); |
| 146 | + if (!program.ok()) { |
| 147 | + Serial.println("Failed to load program"); |
| 148 | + return; |
| 149 | + } |
| 150 | + |
| 151 | + // ... load method, set inputs, execute, read outputs |
| 152 | + // See examples/ for complete working sketches. |
| 153 | +} |
| 154 | + |
| 155 | +void loop() { |
| 156 | + // Run inference periodically |
| 157 | + delay(2000); |
| 158 | +} |
| 159 | +``` |
| 160 | + |
| 161 | +The sketch uses the **native ExecuTorch C++ API** — the same API used on |
| 162 | +Linux, Android, and bare-metal targets. No wrapper layer, no |
| 163 | +Arduino-specific abstractions. |
| 164 | + |
| 165 | +### 5. Compile and upload |
| 166 | + |
| 167 | +```bash |
| 168 | +arduino-cli compile --fqbn arduino:zephyr:unoq MySketch |
| 169 | +arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* MySketch |
| 170 | +arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200 |
| 171 | +``` |
| 172 | + |
| 173 | +## What is inside the library |
| 174 | + |
| 175 | +The `build_arduino_library.sh` script assembles these components from |
| 176 | +the ExecuTorch repository: |
| 177 | + |
| 178 | +| Component | Source in repo | Purpose | |
| 179 | +|-----------|---------------|---------| |
| 180 | +| ET Runtime | `runtime/executor/`, `runtime/core/`, `runtime/kernel/`, `runtime/platform/` | Model loading, memory management, op dispatch | |
| 181 | +| Portable Ops | `kernels/portable/` | Software op implementations (any CPU) | |
| 182 | +| Cortex-M Ops | `backends/cortex_m/ops/` | CMSIS-NN accelerated int8 ops | |
| 183 | +| CMSIS-NN | fetched by cmake / Zephyr module | ARM's optimized DSP kernels | |
| 184 | +| flatcc | `third-party/flatcc/` | .pte file parsing | |
| 185 | +| flatbuffers | `third-party/flatbuffers/` | Schema headers | |
| 186 | +| c10 | `runtime/core/portable_type/c10/` | Core type definitions | |
| 187 | + |
| 188 | +The library uses no external dependencies beyond what the Arduino board |
| 189 | +core provides. |
| 190 | + |
| 191 | +## Arduino-specific patches |
| 192 | + |
| 193 | +The build script applies these patches to make ExecuTorch compile under |
| 194 | +Arduino's build system: |
| 195 | + |
| 196 | +1. **`#include <exception>` before `<variant>`** — Arduino's custom |
| 197 | + `<new>` header omits `<exception>`, breaking `std::bad_variant_access`. |
| 198 | + |
| 199 | +2. **`cmake_macros.h` stub** — c10/torch headers expect a cmake-generated |
| 200 | + file. The build script generates a stub; `C10_USING_CUSTOM_GENERATED_MACROS` |
| 201 | + is defined in `ExecuTorchArduino.h` to skip the include. |
| 202 | + |
| 203 | +3. **`platform_stubs.c`** — provides weak stubs for `_Exit()`, `fprintf()`, |
| 204 | + and `__aeabi_f2lz` for the LLEXT environment on boards that lack them. |
| 205 | + |
| 206 | +4. **Compile-time defines** — `ExecuTorchArduino.h` sets |
| 207 | + `ET_ENABLE_DEPRECATED_CONSTANT_BUFFER=0` (requires models exported with |
| 208 | + current ExecuTorch) and `FLATBUFFERS_MAX_ALIGNMENT=1024`. |
| 209 | + |
| 210 | +## Development |
| 211 | + |
| 212 | +### Updating the library |
| 213 | + |
| 214 | +After modifying ExecuTorch sources, regenerate the library: |
| 215 | + |
| 216 | +```bash |
| 217 | +./build_arduino_library.sh # rebuild |
| 218 | +./build_arduino_library.sh --clean # remove generated output |
| 219 | +``` |
| 220 | + |
| 221 | +### Testing |
| 222 | + |
| 223 | +```bash |
| 224 | +arduino-cli compile --fqbn arduino:zephyr:unoq examples/HelloExecuTorch |
| 225 | +arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* examples/HelloExecuTorch |
| 226 | +arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200 |
| 227 | +``` |
| 228 | + |
| 229 | +### Publishing to Arduino Library Manager |
| 230 | + |
| 231 | +The library is published by adding its repository URL to the |
| 232 | +[Arduino Library Registry](https://github.com/arduino/library-registry). |
| 233 | +After the initial registration, new git tags are picked up automatically. |
| 234 | + |
| 235 | +## Build Validation |
| 236 | + |
| 237 | +Tested on Arduino Uno Q (STM32U585, Cortex-M33 @ 160 MHz): |
| 238 | + |
| 239 | +- **Portable ops**: Add model (`x + 1.0`) produces correct output |
| 240 | + `[1,2,3] + 1 = [2.0, 3.0, 4.0]` |
| 241 | +- **CMSIS-NN linear**: Quantized linear model (int8, 2.2 KB) runs |
| 242 | + `arm_fully_connected_s8` via `cortex_m::quantized_linear` |
| 243 | +- **CMSIS-NN keyword spotting**: DS-CNN (MLPerf Tiny KWS benchmark, |
| 244 | + 54 KB, int8) correctly classifies real audio from Google Speech |
| 245 | + Commands dataset via 16 CMSIS-NN accelerated ops (conv2d, depthwise |
| 246 | + conv2d, avgpool, linear, quantize, dequantize, pad) |
| 247 | + |
| 248 | +### Keyword Spotting Results |
| 249 | + |
| 250 | +Verified with real audio on hardware: |
| 251 | + |
| 252 | +``` |
| 253 | +"yes" → [yes]=7.82 >>> Detected: yes CORRECT! |
| 254 | +"no" → [no]=1.60 >>> Detected: no CORRECT! |
| 255 | +``` |
| 256 | + |
| 257 | +To test different keywords, change one line in the sketch: |
| 258 | + |
| 259 | +```cpp |
| 260 | +// In KeywordSpotting.ino, change this line: |
| 261 | +#include "mfcc_yes.h" // → detects "yes" |
| 262 | +// #include "mfcc_no.h" // → detects "no" |
| 263 | +// #include "mfcc_stop.h" // → detects "stop" |
| 264 | +// Available: mfcc_yes.h, mfcc_no.h, mfcc_up.h, mfcc_down.h, |
| 265 | +// mfcc_left.h, mfcc_right.h, mfcc_on.h, mfcc_off.h, |
| 266 | +// mfcc_stop.h, mfcc_go.h |
| 267 | +``` |
| 268 | + |
| 269 | +To test with your own audio recording: |
| 270 | + |
| 271 | +```bash |
| 272 | +python generate_test_input.py --input my_recording.wav --output mfcc_custom.h |
| 273 | +# Then: #include "mfcc_custom.h" in the sketch |
| 274 | +``` |
| 275 | + |
| 276 | +## End-to-End Flow |
| 277 | + |
| 278 | +``` |
| 279 | +Google Speech Commands "yes" audio (.wav, 16 kHz, 1 second) |
| 280 | + → MFCC extraction (49 time frames × 10 coefficients) |
| 281 | + → DS-CNN model (23K params, trained on MacBook CPU) |
| 282 | + → CortexMQuantizer → int8 (calibrated with real MFCC data) |
| 283 | + → CMSIS-NN ops (conv2d, depthwise_conv2d, avgpool, linear) |
| 284 | + → Export to .pte (54 KB) |
| 285 | + → Arduino library → arduino-cli compile → upload |
| 286 | + → Cortex-M33 @ 160 MHz (Arduino Uno Q, STM32U585) |
| 287 | + → Serial output: ">>> yes" ✅ |
| 288 | +``` |
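
The flow gives the MFCC shape (49 time frames × 10 coefficients) but not the framing parameters. As a sanity check, the sketch below shows one configuration that yields 49 frames from one second of 16 kHz audio. The 40 ms window, 20 ms hop, and no-padding framing are assumptions for illustration and are not read from `export_model.py`.

```python
SAMPLE_RATE_HZ = 16_000   # from the flow above
CLIP_SECONDS = 1          # from the flow above
NUM_COEFFS = 10           # from the flow above
WINDOW_MS = 40            # assumption: analysis window length
HOP_MS = 20               # assumption: stride between windows


def mfcc_frame_count(num_samples: int, window: int, hop: int) -> int:
    """Number of full windows that fit when the signal is not padded."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


window = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # 640 samples
hop = SAMPLE_RATE_HZ * HOP_MS // 1000         # 320 samples
frames = mfcc_frame_count(SAMPLE_RATE_HZ * CLIP_SECONDS, window, hop)

print(f"{frames} frames x {NUM_COEFFS} coefficients = {frames * NUM_COEFFS} values")
```

Under these assumptions the model input is 490 values per clip.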
| 289 | + |
| 290 | +## Dataset |
| 291 | + |
| 292 | +Training and test audio come from [Google Speech Commands v2](https://arxiv.org/abs/1804.03209): |
| 293 | +roughly 105,000 one-second recordings of 35 words spoken by thousands of |
| 294 | +people. It is the standard dataset for the MLPerf Tiny keyword-spotting benchmark. Download it |
| 295 | +via `torchaudio.datasets.SPEECHCOMMANDS` (about 2.3 GB). |
| 296 | + |
| 297 | +The DS-CNN KWS benchmark uses 12 output classes (silence, unknown, plus |
| 298 | +10 keywords). The Arduino export script trains the 10 keyword classes: |
| 299 | +yes, no, up, down, left, right, on, off, stop, go. |
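
Mapping the model's output scores to one of these keywords is typically an argmax over the 10 values. The sketch below is a host-side Python illustration, not the logic in `KeywordSpotting.ino`. The label order is assumed to follow the list above, and all example scores except the 7.82 for "yes" are made up.

```python
# Assumption: output index i corresponds to the i-th keyword listed above.
KEYWORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]


def detect_keyword(scores):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(KEYWORDS):
        raise ValueError(f"expected {len(KEYWORDS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=scores.__getitem__)
    return KEYWORDS[best], scores[best]


# Hypothetical scores for a "yes" clip; only 7.82 comes from the results above.
example_scores = [7.82, -1.3, 0.4, -2.1, 0.0, -0.7, 1.1, -0.5, 0.2, -1.9]
label, score = detect_keyword(example_scores)
print(f"[{label}]={score:.2f} >>> Detected: {label}")
```

If the exported model orders its classes differently, only `KEYWORDS` needs to change.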
| 300 | + |
| 301 | +## LLEXT Memory Budget |
| 302 | + |
| 303 | +The Arduino Uno Q loads sketches as LLEXT (Linkable Loadable Extensions) into |
| 304 | +131 KB of dynamic memory. Both code and data share this budget: |
| 305 | + |
| 306 | +| Build | Code | Data | Total | Status | |
| 307 | +|-------|------|------|-------|--------| |
| 308 | +| Add model (portable ops) | 97 KB | 2 KB | 99 KB | ✅ | |
| 309 | +| DS-CNN (selective CMSIS-NN) | 87 KB | 25 KB | 112 KB | ✅ | |
| 310 | +| DS-CNN (all CMSIS-NN) | 230 KB | 39 KB | 269 KB | ❌ Too large | |
| 311 | + |
| 312 | +Selective CMSIS-NN inclusion (~30 compiled files vs 111 total) keeps the |
| 313 | +build within budget. Arduino only compiles sources referenced by your |
| 314 | +sketch; unused CMSIS-NN functions are excluded by the linker. |
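
The budget check in the table is simple arithmetic: code plus data must not exceed 131 KB. The Python sketch below reproduces it using the figures from the table. The helper name `fits_budget` is made up for illustration and is not part of the build script.

```python
LLEXT_BUDGET_KB = 131  # dynamic memory available to a sketch on the Uno Q

# (code KB, data KB) per build, copied from the table above.
BUILDS = {
    "Add model (portable ops)": (97, 2),
    "DS-CNN (selective CMSIS-NN)": (87, 25),
    "DS-CNN (all CMSIS-NN)": (230, 39),
}


def fits_budget(code_kb, data_kb, budget_kb=LLEXT_BUDGET_KB):
    """Return (total KB, whether code and data together fit the budget)."""
    total = code_kb + data_kb
    return total, total <= budget_kb


for name, (code_kb, data_kb) in BUILDS.items():
    total, ok = fits_budget(code_kb, data_kb)
    print(f"{name}: {total} KB -> {'fits' if ok else 'too large'}")
```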