
Commit eff587e

Add ExecuTorch Arduino library with CMSIS-NN keyword spotting example
Package ExecuTorch as an Arduino library installable via Library Manager. Includes build script that vendors runtime sources, DS-CNN keyword spotting example with CMSIS-NN acceleration, and pre-generated MFCC test inputs from Google Speech Commands. Tested on Arduino Uno Q (STM32U585, Cortex-M33).
1 parent 218cc45 commit eff587e

22 files changed

Lines changed: 2471 additions & 0 deletions

examples/arduino/.gitignore

Lines changed: 2 additions & 0 deletions
# Generated by build_arduino_library.sh — do not check in
arduino_lib/
Lines changed: 30 additions & 0 deletions
/*
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the BSD-style license found in the
 * LICENSE file in the root directory of this source tree.
 */

#pragma once

// Arduino's custom <new> header omits <exception>, which breaks
// std::bad_variant_access in <variant>. Include it first.
#include <exception>

// Tell c10 headers to skip the cmake-generated cmake_macros.h include.
#ifndef C10_USING_CUSTOM_GENERATED_MACROS
#define C10_USING_CUSTOM_GENERATED_MACROS
#endif
// Disabled: models must be exported with current ExecuTorch.
#ifndef ET_ENABLE_DEPRECATED_CONSTANT_BUFFER
#define ET_ENABLE_DEPRECATED_CONSTANT_BUFFER 0
#endif
#ifndef FLATBUFFERS_MAX_ALIGNMENT
#define FLATBUFFERS_MAX_ALIGNMENT 1024
#endif

#include <executorch/extension/data_loader/buffer_data_loader.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/executor/method.h>
#include <executorch/runtime/executor/method_meta.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>

examples/arduino/README.md

Lines changed: 314 additions & 0 deletions
<!---
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the BSD-style license found in the
LICENSE file in the root directory of this source tree.
--->

# ExecuTorch Arduino Library

Run PyTorch models on Arduino microcontrollers using ExecuTorch.

This directory contains everything needed to package ExecuTorch as an
Arduino library. A build script vendors the runtime sources from this
repository into a self-contained library that Arduino users install
through the Library Manager or by copying into their libraries folder.

## How It Works

```
PyTorch Model ──► torch.export ──► .pte file ──► model.h (C array)

Arduino Sketch (.ino)
  #include <ExecuTorchArduino.h>
  #include "model.h"

arduino-cli compile ──► Upload ──► Runs on board
```

### The three pieces

1. **The library** (`arduino_lib/ExecuTorchArduino/`) — the ExecuTorch
   runtime, CMSIS-NN kernels, and portable ops packaged for the Arduino
   build system. Generated by `build_arduino_library.sh`; not checked in.

2. **The model** (`model.h`) — a `.pte` file converted to a C byte array.
   Each user brings their own model, exported from PyTorch with the
   Cortex-M backend.

3. **The sketch** (`.ino`) — a standard Arduino program that loads the
   model, feeds it input, and reads the output. Uses the native
   ExecuTorch C++ API (`Program::load`, `Method::execute`, etc.).
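
`model.h` only needs to expose the `.pte` bytes under the names the Quick Start sketch passes to `BufferDataLoader`: `model_pte` and `model_pte_size`. The C++ sketch below shows one way to render such a header from bytes already read into memory. The `pte_to_header` helper, the `alignas(16)` qualifier, and the 12-bytes-per-row layout are assumptions for illustration; `export_model.py` may format its output differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: render raw .pte bytes as a C++ header that defines
// `model_pte` and `model_pte_size`, the names the sketch passes to
// BufferDataLoader.
std::string pte_to_header(const std::vector<std::uint8_t>& bytes) {
  // A zero-length array is ill-formed, so refuse empty input.
  assert(!bytes.empty());

  std::string out = "#pragma once\n#include <stddef.h>\n#include <stdint.h>\n\n";
  // Assumption: 16-byte alignment; check what your ExecuTorch version expects.
  out += "alignas(16) const uint8_t model_pte[] = {";
  char item[8];
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i % 12 == 0) out += "\n ";  // start a new row every 12 bytes
    std::snprintf(item, sizeof(item), " 0x%02x,",
                  static_cast<unsigned>(bytes[i]));
    out += item;
  }
  out += "\n};\nconst size_t model_pte_size = sizeof(model_pte);\n";
  return out;
}
```

To use it, read the `.pte` file into a `std::vector<std::uint8_t>` (for example with a `std::ifstream` opened in `std::ios::binary` mode) and write the returned string to `model.h`.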

## Supported Boards

| Board | MCU | Status |
|-------|-----|--------|
| Arduino Uno Q | STM32U585 (Cortex-M33) | Tested |
| Arduino Nano 33 BLE | nRF52840 (Cortex-M4F) | Planned (requires mbed PAL) |
| Arduino Giga R1 WiFi | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |
| Arduino Portenta H7 | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |

The library currently requires the Zephyr board core. Non-Zephyr (mbed)
boards need a port of the platform abstraction layer (PAL) before they can
compile. CMSIS-NN accelerated ops work on any Arm Cortex-M core with the
DSP extension; portable ops work on any architecture.

## Quick Start

### 1. Build the Arduino library

```bash
cd examples/arduino
./build_arduino_library.sh
```

This copies the required ExecuTorch sources from the repository into
`arduino_lib/ExecuTorchArduino/`, ready for Arduino.

### 2. Install the library

Copy the generated library into your Arduino libraries folder:

```bash
# macOS:
cp -r arduino_lib/ExecuTorchArduino ~/Documents/Arduino/libraries/
# Linux:
cp -r arduino_lib/ExecuTorchArduino ~/Arduino/libraries/
```

Or with `arduino-cli`:

```bash
# Installing from a zip is disabled by default; enable it once
# (run `arduino-cli config init` first if you have no config file):
arduino-cli config set library.enable_unsafe_install true
cd arduino_lib && zip -r ExecuTorchArduino.zip ExecuTorchArduino && cd ..
arduino-cli lib install --zip-path arduino_lib/ExecuTorchArduino.zip
```

### 3. Export a model

The KeywordSpotting example requires a `model.h` file containing the
DS-CNN model as a C byte array. Generate it with `export_model.py`:

```bash
# Download the dataset (one time, ~2.3 GB). Run from the repo root:
mkdir -p outputs/speech_commands
python -c "import torchaudio; torchaudio.datasets.SPEECHCOMMANDS(
    root='outputs/speech_commands', download=True)"

# Train DS-CNN, quantize it for CMSIS-NN, and export model.h:
python examples/arduino/export_model.py \
    --output examples/arduino/examples/KeywordSpotting/model.h
```

This trains DS-CNN on Google Speech Commands v2 (100 samples per class),
quantizes it to int8 with `CortexMQuantizer`, calibrates with real MFCC
features computed from the audio, and exports the resulting 54 KB `.pte`
as a C header.
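
Int8 quantization stores each tensor as 8-bit integers plus a scale and a zero point. The sketch below shows the standard per-tensor affine mapping that quantize and dequantize ops compute, assuming round-half-to-even (as PyTorch's reference does) and the full [-128, 127] range. How `CortexMQuantizer` picks scales and zero points (calibration observers, per-channel weights) is not modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Affine quantization: q = clamp(round(x / scale) + zero_point, -128, 127).
// std::nearbyint uses the current rounding mode (ties-to-even by default).
std::int8_t quantize(float x, float scale, std::int32_t zero_point) {
  assert(scale > 0.0f);
  float r = std::nearbyint(x / scale) + static_cast<float>(zero_point);
  // Clamp in float first so out-of-range values never overflow the cast.
  r = std::min(std::max(r, -128.0f), 127.0f);
  return static_cast<std::int8_t>(r);
}

// Dequantization: x ~= (q - zero_point) * scale.
float dequantize(std::int8_t q, float scale, std::int32_t zero_point) {
  return static_cast<float>(static_cast<std::int32_t>(q) - zero_point) * scale;
}
```

With `scale = 0.5` and `zero_point = 10`, the value `-1.0` maps to `8` and dequantizes back to `-1.0`; values outside the representable range saturate at `-128` or `127`.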

To export with a pre-trained checkpoint instead of training:

```bash
python examples/arduino/export_model.py --checkpoint my_weights.pth \
    --output examples/arduino/examples/KeywordSpotting/model.h
```

**Note:** `build_arduino_library.sh` (step 1) requires schema headers from
a prior CMake build. If you haven't built ExecuTorch yet, run
`./install_executorch.sh` first.

### 4. Write a sketch

```cpp
#include <ExecuTorchArduino.h>
#include "model.h"

using executorch::extension::BufferDataLoader;
using executorch::runtime::Error;
using executorch::runtime::HierarchicalAllocator;
using executorch::runtime::MemoryAllocator;
using executorch::runtime::MemoryManager;
using executorch::runtime::Method;
using executorch::runtime::MethodMeta;
using executorch::runtime::Program;
using executorch::runtime::Result;
using executorch::runtime::Span;

// Static pools for the method allocator and for temporary allocations.
alignas(16) uint8_t method_pool[28 * 1024];
alignas(16) uint8_t temp_pool[8 * 1024];

void setup() {
  Serial.begin(115200);
  delay(2000);

  executorch::runtime::runtime_init();

  // The loader must outlive the Program that reads from it.
  BufferDataLoader loader(model_pte, model_pte_size);
  Result<Program> program = Program::load(&loader);
  if (!program.ok()) {
    Serial.println("Failed to load program");
    return;
  }

  // ... load method, set inputs, execute, read outputs
  // See examples/ for complete working sketches.
}

void loop() {
  // Run inference periodically
  delay(2000);
}
```

The sketch uses the **native ExecuTorch C++ API** — the same API used on
Linux, Android, and bare-metal targets. No wrapper layer, no
Arduino-specific abstractions.
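
`method_pool` and `temp_pool` are fixed, statically allocated buffers, and ExecuTorch's `MemoryAllocator` is a bump allocator: it hands out successive aligned chunks of one buffer and reports failure once the buffer is exhausted. The hypothetical `BumpArena` below is a simplified standard-C++ model of that behavior, not ExecuTorch's implementation, meant only to show why an undersized pool leads to allocation failures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified bump allocator over a caller-owned buffer.
class BumpArena {
 public:
  BumpArena(std::uint8_t* base, std::size_t size) : base_(base), size_(size) {}

  // Returns nullptr when the request plus alignment padding does not fit.
  void* allocate(std::size_t nbytes, std::size_t alignment = 16) {
    // Alignment must be a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + used_;
    std::uintptr_t aligned =
        (start + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
    std::size_t padding = static_cast<std::size_t>(aligned - start);
    if (padding + nbytes > size_ - used_) return nullptr;
    used_ += padding + nbytes;  // advance the bump pointer
    return reinterpret_cast<void*>(aligned);
  }

  void reset() { used_ = 0; }  // release everything at once
  std::size_t used() const { return used_; }

 private:
  std::uint8_t* base_;
  std::size_t size_;
  std::size_t used_ = 0;
};
```

With a 16-byte-aligned 64-byte pool, two 10-byte requests land at offsets 0 and 16 because of alignment padding (26 bytes used), and a further 64-byte request returns `nullptr`.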

### 5. Compile and upload

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq MySketch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* MySketch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

## What is inside the library

The `build_arduino_library.sh` script assembles these components from
the ExecuTorch repository:

| Component | Source in repo | Purpose |
|-----------|---------------|---------|
| ET Runtime | `runtime/executor/`, `runtime/core/`, `runtime/kernel/`, `runtime/platform/` | Model loading, memory management, op dispatch |
| Portable Ops | `kernels/portable/` | Software op implementations (any CPU) |
| Cortex-M Ops | `backends/cortex_m/ops/` | CMSIS-NN accelerated int8 ops |
| CMSIS-NN | fetched by cmake / Zephyr module | ARM's optimized DSP kernels |
| flatcc | `third-party/flatcc/` | .pte file parsing |
| flatbuffers | `third-party/flatbuffers/` | Schema headers |
| c10 | `runtime/core/portable_type/c10/` | Core type definitions |

The library uses no external dependencies beyond what the Arduino board
core provides.

## Arduino-specific patches

The build script applies these patches to make ExecuTorch compile under
Arduino's build system:

1. **`#include <exception>` before `<variant>`** — Arduino's custom
   `<new>` header omits `<exception>`, breaking `std::bad_variant_access`.

2. **`cmake_macros.h` stub** — c10/torch headers expect a cmake-generated
   file. The build script generates a stub; `C10_USING_CUSTOM_GENERATED_MACROS`
   is defined in `ExecuTorchArduino.h` to skip the include.

3. **`platform_stubs.c`** — provides weak stubs for `_Exit()`, `fprintf()`,
   and `__aeabi_f2lz` for the LLEXT environment on boards that lack them.

4. **Compile-time defines** — `ExecuTorchArduino.h` sets
   `ET_ENABLE_DEPRECATED_CONSTANT_BUFFER=0` (requires models exported with
   current ExecuTorch) and `FLATBUFFERS_MAX_ALIGNMENT=1024`.

## Development

### Updating the library

After modifying ExecuTorch sources, regenerate the library:

```bash
./build_arduino_library.sh          # rebuild
./build_arduino_library.sh --clean  # remove generated output
```

### Testing

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq examples/HelloExecuTorch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* examples/HelloExecuTorch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

### Publishing to Arduino Library Manager

The library is published by adding its repository URL to the
[Arduino Library Registry](https://github.com/arduino/library-registry).
After the initial registration, new git tags are picked up automatically.

## Build Validation

Tested on Arduino Uno Q (STM32U585, Cortex-M33 @ 160 MHz):

- **Portable ops**: Add model (`x + 1.0`) produces correct output
  `[1.0, 2.0, 3.0] + 1.0 = [2.0, 3.0, 4.0]`
- **CMSIS-NN linear**: Quantized linear model (int8, 2.2 KB) runs
  `arm_fully_connected_s8` via `cortex_m::quantized_linear`
- **CMSIS-NN keyword spotting**: DS-CNN (MLPerf Tiny KWS benchmark,
  54 KB, int8) correctly classifies real audio from the Google Speech
  Commands dataset via 16 CMSIS-NN accelerated ops (conv2d, depthwise
  conv2d, avgpool, linear, quantize, dequantize, pad)
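
An int8 fully connected layer such as `arm_fully_connected_s8` accumulates int8 products in int32, adds a bias, and rescales the sum back to int8. The plain C++ reference below sketches that arithmetic for a single output neuron. It uses a float `rescale` (input scale × weight scale / output scale) for readability, whereas CMSIS-NN uses a fixed-point multiplier and shift; the function name and parameters are illustrative, not CMSIS-NN's API, and weights are assumed symmetric (zero point 0).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Reference int8 fully connected computation for one output neuron.
std::int8_t fc_s8_one(const std::int8_t* input, const std::int8_t* weights,
                      std::size_t n, std::int32_t bias,
                      std::int32_t input_zero_point,
                      std::int32_t output_zero_point, float rescale) {
  std::int32_t acc = bias;
  for (std::size_t i = 0; i < n; ++i) {
    // Remove the input zero point; weights are assumed to have none.
    acc += (static_cast<std::int32_t>(input[i]) - input_zero_point) *
           static_cast<std::int32_t>(weights[i]);
  }
  // Requantize (float here, fixed-point in CMSIS-NN) and saturate to int8.
  float r = std::nearbyint(static_cast<float>(acc) * rescale) +
            static_cast<float>(output_zero_point);
  r = std::min(std::max(r, -128.0f), 127.0f);
  return static_cast<std::int8_t>(r);
}
```

For example, inputs `{1, 2, 3}`, weights `{4, 5, 6}`, zero bias and zero points, and `rescale = 0.5` accumulate to 32 and produce 16.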

### Keyword Spotting Results

Verified with real audio on hardware:

```
"yes" → [yes]=7.82 >>> Detected: yes CORRECT!
"no" → [no]=1.60 >>> Detected: no CORRECT!
```
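
The detected keyword is the class with the highest output score. A minimal sketch of that last step, assuming the model's final dequantize yields float scores and that class indices follow the order listed under Dataset (yes, no, up, down, left, right, on, off, stop, go); the `kLabels` order and both helpers are hypothetical, so check the label order your `export_model.py` run actually used.

```cpp
#include <cassert>
#include <cstddef>

// Assumed label order; it must match the class indices used in training.
constexpr const char* kLabels[] = {"yes",   "no", "up",  "down", "left",
                                   "right", "on", "off", "stop", "go"};
constexpr std::size_t kNumLabels = sizeof(kLabels) / sizeof(kLabels[0]);

// Index of the largest score; the first index wins on ties.
std::size_t argmax(const float* scores, std::size_t n) {
  assert(n > 0);
  std::size_t best = 0;
  for (std::size_t i = 1; i < n; ++i) {
    if (scores[i] > scores[best]) best = i;
  }
  return best;
}

// Map the model's score vector to a keyword string.
const char* detect_keyword(const float* scores, std::size_t n) {
  assert(n == kNumLabels);
  return kLabels[argmax(scores, n)];
}
```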

To test different keywords, change one line in the sketch:

```cpp
// In KeywordSpotting.ino, change this line:
#include "mfcc_yes.h"      // → detects "yes"
// #include "mfcc_no.h"    // → detects "no"
// #include "mfcc_stop.h"  // → detects "stop"
// Available: mfcc_yes.h, mfcc_no.h, mfcc_up.h, mfcc_down.h,
//            mfcc_left.h, mfcc_right.h, mfcc_on.h, mfcc_off.h,
//            mfcc_stop.h, mfcc_go.h
```

To test with your own audio recording:

```bash
python generate_test_input.py --input my_recording.wav --output mfcc_custom.h
# Then: #include "mfcc_custom.h" in the sketch
```

## End-to-End Flow

```
Google Speech Commands "yes" audio (.wav, 16kHz, 1 second)
  → MFCC extraction (49 time frames × 10 coefficients)
  → DS-CNN model (23K params, trained on MacBook CPU)
  → CortexMQuantizer → int8 (calibrated with real MFCC data)
  → CMSIS-NN ops (conv2d, depthwise_conv2d, avgpool, linear)
  → Export to .pte (54 KB)
  → Arduino library → arduino-cli compile → upload
  → Cortex-M33 @ 160 MHz (Arduino Uno Q, STM32U585)
  → Serial output: ">>> yes" ✅
```
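
The 49 × 10 input shape follows from framing arithmetic. Assuming a 40 ms analysis window and a 20 ms hop (640 and 320 samples at 16 kHz; a common MLPerf Tiny KWS setting, but an assumption here rather than something read from `generate_test_input.py`), a one-second clip yields 1 + (16000 - 640) / 320 = 49 frames, so the model consumes 490 MFCC values per inference. The `num_frames` helper is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of full analysis windows that fit in a clip (no padding).
constexpr std::size_t num_frames(std::size_t num_samples, std::size_t window,
                                 std::size_t hop) {
  return num_samples < window ? 0 : 1 + (num_samples - window) / hop;
}

constexpr std::size_t kSampleRate = 16000;                 // 1 s clip at 16 kHz
constexpr std::size_t kWindow = kSampleRate * 40 / 1000;   // 640 samples (assumed)
constexpr std::size_t kHop = kSampleRate * 20 / 1000;      // 320 samples (assumed)
constexpr std::size_t kNumMfcc = 10;                       // coefficients per frame

static_assert(num_frames(kSampleRate, kWindow, kHop) == 49, "49 time frames");

// Flattened model input length: frames x coefficients.
constexpr std::size_t kInputSize =
    num_frames(kSampleRate, kWindow, kHop) * kNumMfcc;
```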

## Dataset

Training and test audio from [Google Speech Commands v2](https://arxiv.org/abs/1804.03209):
about 105,000 one-second recordings of 35 words spoken by thousands of
people. It is the standard dataset used by the MLPerf Tiny keyword spotting
benchmark. Download it via `torchaudio.datasets.SPEECHCOMMANDS` (2.3 GB).

The DS-CNN KWS benchmark uses 12 output classes (silence, unknown, plus
10 keywords). The Arduino export script trains only the 10 keyword
classes, without silence or unknown: yes, no, up, down, left, right, on,
off, stop, go.

## LLEXT Memory Budget

The Arduino Uno Q loads sketches as Zephyr LLEXT (Linkable Loadable
Extensions) modules into 131 KB of dynamic memory. Code and data share
this budget:

| Build | Code | Data | Total | Status |
|-------|------|------|-------|--------|
| Add model (portable ops) | 97 KB | 2 KB | 99 KB | ✅ Fits |
| DS-CNN (selective CMSIS-NN) | 87 KB | 25 KB | 112 KB | ✅ Fits |
| DS-CNN (all CMSIS-NN) | 230 KB | 39 KB | 269 KB | ❌ Too large |

Selective CMSIS-NN inclusion (~30 compiled files vs 111 total) keeps the
build within budget. Arduino compiles every source file shipped in a
library, so vendoring only the CMSIS-NN files the model needs is what
keeps unused kernels out of the sketch.
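
Because code and data share one region, the Status column reduces to checking code + data against 131 KB. A minimal compile-time sketch using the table's figures; the `BuildSize` struct and `fits_llext` helper are hypothetical, and real sizes would come from your own build output.

```cpp
#include <cassert>
#include <cstdint>

// LLEXT region available to the sketch on the Uno Q, in KB.
constexpr std::uint32_t kLlextBudgetKb = 131;

struct BuildSize {
  std::uint32_t code_kb;
  std::uint32_t data_kb;
};

constexpr std::uint32_t total_kb(BuildSize b) { return b.code_kb + b.data_kb; }

// Code and data share the same budget, so only their sum matters.
constexpr bool fits_llext(BuildSize b, std::uint32_t budget_kb = kLlextBudgetKb) {
  return total_kb(b) <= budget_kb;
}

static_assert(fits_llext({97, 2}), "Add model (portable ops): 99 KB");
static_assert(fits_llext({87, 25}), "DS-CNN, selective CMSIS-NN: 112 KB");
static_assert(!fits_llext({230, 39}), "DS-CNN, all CMSIS-NN: 269 KB");
```

The selective DS-CNN build leaves 19 KB of headroom (131 - 112).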
