2 changes: 2 additions & 0 deletions examples/arduino/.gitignore
@@ -0,0 +1,2 @@
# Generated by build_arduino_library.sh — do not check in
arduino_lib/
30 changes: 30 additions & 0 deletions examples/arduino/ExecuTorchArduino.h
@@ -0,0 +1,30 @@
/*
* Copyright (c) Meta Platforms, Inc. and affiliates.
* All rights reserved.
*
* This source code is licensed under the BSD-style license found in the
* LICENSE file in the root directory of this source tree.
*/

#pragma once

// Arduino's custom <new> header omits <exception>, which breaks
// std::bad_variant_access in <variant>. Include it first.
#include <exception>

#ifndef C10_USING_CUSTOM_GENERATED_MACROS
#define C10_USING_CUSTOM_GENERATED_MACROS
#endif
#ifndef ET_ENABLE_DEPRECATED_CONSTANT_BUFFER
#define ET_ENABLE_DEPRECATED_CONSTANT_BUFFER 0
#endif
#ifndef FLATBUFFERS_MAX_ALIGNMENT
#define FLATBUFFERS_MAX_ALIGNMENT 1024
#endif

#include <executorch/extension/data_loader/buffer_data_loader.h>
#include <executorch/runtime/core/memory_allocator.h>
#include <executorch/runtime/executor/method.h>
#include <executorch/runtime/executor/method_meta.h>
#include <executorch/runtime/executor/program.h>
#include <executorch/runtime/platform/runtime.h>
344 changes: 344 additions & 0 deletions examples/arduino/README.md
@@ -0,0 +1,344 @@
<!---
Copyright (c) Meta Platforms, Inc. and affiliates.
All rights reserved.

This source code is licensed under the BSD-style license found in the
LICENSE file in the root directory of this source tree.
--->

# ExecuTorch Arduino Library

Run PyTorch models on Arduino microcontrollers using ExecuTorch.

This directory contains everything needed to package ExecuTorch as an
Arduino library. A build script vendors the runtime sources from this
repository into a self-contained library that Arduino users install
through the Library Manager or by copying into their libraries folder.

## How It Works

```
PyTorch Model ──► torch.export ──► .pte file ──► model.h (C array)
                                                      │
                                                      ▼
                                            Arduino Sketch (.ino)
                                              #include <ExecuTorchArduino.h>
                                              #include "model.h"
                                                      │
                                                      ▼
                               arduino-cli compile ──► Upload ──► Runs on board
```

### The three pieces

1. **The library** (`arduino_lib/ExecuTorchArduino/`) — the ExecuTorch
runtime, CMSIS-NN kernels, and portable ops packaged for the Arduino
build system. Generated by `build_arduino_library.sh`; not checked in.

2. **The model** (`model.h`) — a `.pte` file converted to a C byte array.
Each user brings their own model, exported from PyTorch with the
Cortex-M backend.

3. **The sketch** (`.ino`) — a standard Arduino program that loads the
model, feeds it input, and reads the output. Uses the native
ExecuTorch C++ API (`Program::load`, `Method::execute`, etc.).

## Supported Boards

| Board | MCU | Status |
|-------|-----|--------|
| Arduino Uno Q | STM32U585 (Cortex-M33) | Tested |
| Arduino Nano 33 BLE | nRF52840 (Cortex-M4F) | Planned (requires mbed PAL) |
| Arduino Giga R1 WiFi | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |
| Arduino Portenta H7 | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |

The library currently requires the Zephyr board core. Non-Zephyr boards
(mbed) need a platform abstraction layer port before they can compile.
CMSIS-NN accelerated ops work on any ARM Cortex-M with DSP extensions.
Portable ops work on any architecture.

## Quick Start

### 1. Build the Arduino library

```bash
cd examples/arduino
./build_arduino_library.sh
```

This copies the required ExecuTorch sources from the repository into
`arduino_lib/ExecuTorchArduino/`, ready for Arduino.

### 2. Install the library

Copy the generated library into your Arduino libraries folder:

```bash
# macOS:
cp -r arduino_lib/ExecuTorchArduino ~/Documents/Arduino/libraries/
# Linux:
cp -r arduino_lib/ExecuTorchArduino ~/Arduino/libraries/
```

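Or with `arduino-cli` (installing from a zip requires
`library.enable_unsafe_install` to be enabled in the CLI configuration):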

```bash
cd arduino_lib && zip -r ExecuTorchArduino.zip ExecuTorchArduino && cd ..
arduino-cli lib install --zip-path arduino_lib/ExecuTorchArduino.zip
```

### 3. Export a model

Each sketch needs a `model.h` file — a `.pte` model converted to a C
byte array. Use `pte_to_header.py` from the Arm examples to convert
any `.pte` file:

```bash
python examples/arm/executor_runner/pte_to_header.py \
-p model.pte -d examples/arduino/examples/AddModel -o model.h
```
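
For readers who want to see what the conversion amounts to, here is a minimal sketch of turning `.pte` bytes into a C header. It is not the implementation of `pte_to_header.py`; the function names, the 16-byte alignment, and the row width are assumptions made for illustration. The array name `model_pte` matches the symbol the sketches in this README reference.

```python
from pathlib import Path


def pte_to_c_header(pte_bytes: bytes, symbol: str = "model_pte", per_line: int = 12) -> str:
    """Render raw .pte bytes as a header defining an aligned byte array.

    Hypothetical helper: the alignment and layout are assumptions, not the
    output format of the real pte_to_header.py script.
    """
    if not pte_bytes:
        raise ValueError("empty .pte payload")
    rows = []
    for start in range(0, len(pte_bytes), per_line):
        chunk = pte_bytes[start:start + per_line]
        rows.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        "#pragma once\n"
        "#include <stdint.h>\n\n"
        f"// {len(pte_bytes)} bytes\n"
        f"alignas(16) const uint8_t {symbol}[] = {{\n{body}\n}};\n"
    )


def convert_file(pte_path: str, header_path: str) -> None:
    """Read a .pte file from disk and write the generated header next to a sketch."""
    data = Path(pte_path).read_bytes()
    Path(header_path).write_text(pte_to_c_header(data))
```

Because the header is included from a `.ino` file, it is compiled as C++, where `alignas` is a keyword. The sketch can then pass `model_pte` and `sizeof(model_pte)` to `BufferDataLoader`.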

**AddModel** — export a simple add model (no dataset needed):

```bash
python -c "
import torch
from executorch.exir import to_edge
from torch.export import export
class Add(torch.nn.Module):
    def forward(self, x): return x + 1.0
et = to_edge(export(Add().eval(), (torch.tensor([1.,2.,3.]),))).to_executorch()
with open('add.pte','wb') as f: f.write(bytes(et.buffer))"

python examples/arm/executor_runner/pte_to_header.py \
-p add.pte -d examples/arduino/examples/AddModel -o model.h
```

**HelloExecuTorch** — uses any valid model; the AddModel `.pte` works:

```bash
cp examples/arduino/examples/AddModel/model.h \
examples/arduino/examples/HelloExecuTorch/model.h
```

**KeywordSpotting** — requires a quantized DS-CNN model. Generate it
with `export_model.py`:

```bash
# Download the dataset (one time, ~2.3 GB) — run from repo root:
python -c "import torchaudio; torchaudio.datasets.SPEECHCOMMANDS(
root='outputs/speech_commands', download=True)"

# Train DS-CNN, quantize with CMSIS-NN, and export model.h:
python examples/arduino/export_model.py \
--output examples/arduino/examples/KeywordSpotting/model.h
```

This trains DS-CNN on Google Speech Commands v2 (100 samples/class),
quantizes to int8 via `CortexMQuantizer`, calibrates with real MFCC
audio data, and exports a 54 KB `.pte` as a C header.
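
To make the calibration step more concrete, the following is a generic sketch of affine int8 quantization: observe a value range on sample data, derive a scale and zero point, then map floats to int8. It illustrates the general idea only and is an assumption about the arithmetic; it is not how `CortexMQuantizer` is implemented, and the function names are hypothetical.

```python
def calibrate(samples):
    """Derive (scale, zero_point) from observed values.

    The range is widened to include 0.0 so that zero is exactly representable.
    """
    lo = min(0.0, min(samples))
    hi = max(0.0, max(samples))
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to int8, clamping to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


scale, zero_point = calibrate([-1.0, 0.0, 3.0])
print(scale, zero_point, quantize(1.5, scale, zero_point))
```

Calibrating with representative MFCC inputs matters because the observed range sets the scale: a range that is too wide wastes int8 resolution, and one that is too narrow clips real inputs.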

To export with a pre-trained checkpoint instead of training:

```bash
python examples/arduino/export_model.py --checkpoint my_weights.pth \
--output examples/arduino/examples/KeywordSpotting/model.h
```

**Note:** `build_arduino_library.sh` (step 1) requires schema headers from
a prior CMake build. If you haven't built ExecuTorch yet, run
`./install_executorch.sh` first.

### 4. Write a sketch

```cpp
#include <ExecuTorchArduino.h>
#include "model.h"

using executorch::extension::BufferDataLoader;
using executorch::runtime::Error;
using executorch::runtime::HierarchicalAllocator;
using executorch::runtime::MemoryAllocator;
using executorch::runtime::MemoryManager;
using executorch::runtime::Method;
using executorch::runtime::MethodMeta;
using executorch::runtime::Program;
using executorch::runtime::Result;
using executorch::runtime::Span;

alignas(16) uint8_t method_pool[28 * 1024];

void setup() {
  Serial.begin(115200);
  delay(2000);

  executorch::runtime::runtime_init();

  auto loader = BufferDataLoader(model_pte, sizeof(model_pte));
  Result<Program> program = Program::load(&loader);
  if (!program.ok()) {
    Serial.println("Failed to load program");
    return;
  }

  // ... load method, set inputs, execute, read outputs
  // See examples/ for complete working sketches.
}

void loop() {
  // Run inference periodically
  delay(2000);
}
```

The sketch uses the **native ExecuTorch C++ API** — the same API used on
Linux, Android, and bare-metal targets. No wrapper layer, no
Arduino-specific abstractions.

### 5. Compile and upload

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq MySketch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* MySketch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

## What is inside the library

The `build_arduino_library.sh` script assembles these components from
the ExecuTorch repository:

| Component | Source in repo | Purpose |
|-----------|---------------|---------|
| ET Runtime | `runtime/executor/`, `runtime/core/`, `runtime/kernel/`, `runtime/platform/` | Model loading, memory management, op dispatch |
| Portable Ops | `kernels/portable/` | Software op implementations (any CPU) |
| Cortex-M Ops | `backends/cortex_m/ops/` | CMSIS-NN accelerated int8 ops |
| CMSIS-NN | fetched by cmake / Zephyr module | ARM's optimized DSP kernels |
| flatcc | `third-party/flatcc/` | .pte file parsing |
| flatbuffers | `third-party/flatbuffers/` | Schema headers |
| c10 | `runtime/core/portable_type/c10/` | Core type definitions |

The library has no external dependencies beyond what the Arduino board
core provides.

## Arduino-specific patches

The build script applies these patches to make ExecuTorch compile under
Arduino's build system:

1. **`#include <exception>` before `<variant>`** — Arduino's custom
`<new>` header omits `<exception>`, breaking `std::bad_variant_access`.

2. **`cmake_macros.h` stub** — c10/torch headers expect a cmake-generated
file. The build script generates a stub; `C10_USING_CUSTOM_GENERATED_MACROS`
is defined in `ExecuTorchArduino.h` to skip the include.

3. **`platform_stubs.c`** — provides weak stubs for `_Exit()`, `fprintf()`,
and `__aeabi_f2lz` for the LLEXT environment on boards that lack them.

4. **Compile-time defines** — `ExecuTorchArduino.h` sets
`ET_ENABLE_DEPRECATED_CONSTANT_BUFFER=0` (requires models exported with
current ExecuTorch) and `FLATBUFFERS_MAX_ALIGNMENT=1024`.

## Development

### Updating the library

After modifying ExecuTorch sources, regenerate the library:

```bash
./build_arduino_library.sh # rebuild
./build_arduino_library.sh --clean # remove generated output
```

### Testing

```bash
arduino-cli compile --fqbn arduino:zephyr:unoq examples/HelloExecuTorch
arduino-cli upload --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* examples/HelloExecuTorch
arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
```

### Publishing to Arduino Library Manager

The library is published by adding its repository URL to the
[Arduino Library Registry](https://github.com/arduino/library-registry).
After the initial registration, new git tags are picked up automatically.

## Build Validation

Tested on Arduino Uno Q (STM32U585, Cortex-M33 @ 160 MHz):

- **Portable ops**: Add model (`x + 1.0`) produces correct output
`[1,2,3] + 1 = [2.0, 3.0, 4.0]`
- **CMSIS-NN linear**: Quantized linear model (int8, 2.2 KB) runs
`arm_fully_connected_s8` via `cortex_m::quantized_linear`
- **CMSIS-NN keyword spotting**: DS-CNN (MLPerf Tiny KWS benchmark,
54 KB, int8) correctly classifies real audio from Google Speech
Commands dataset via 16 CMSIS-NN accelerated ops (conv2d, depthwise
conv2d, avgpool, linear, quantize, dequantize, pad)

### Keyword Spotting Results

Verified with real audio on hardware:

```
"yes" → [yes]=7.82 >>> Detected: yes CORRECT!
"no" → [no]=1.60 >>> Detected: no CORRECT!
```
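
The detection shown above amounts to picking the class with the highest output score. A host-side sketch of that selection follows, assuming the ten outputs are ordered as the keywords are listed in this README (the real order depends on how the model was trained). The helper names and the `WRONG` verdict string are hypothetical.

```python
# Assumed label order; verify against the export script before relying on it.
KEYWORDS = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]


def detect(scores, labels=KEYWORDS):
    """Return (label, score) for the highest-scoring class."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]


def format_result(expected, scores):
    """Format one line in the style of the serial output shown above."""
    label, score = detect(scores)
    verdict = "CORRECT!" if label == expected else "WRONG"
    return f'"{expected}" → [{label}]={score:.2f} >>> Detected: {label} {verdict}'


scores = [7.82, 0.10, -1.2, -0.4, 0.3, -2.0, 0.0, -0.9, 1.1, 0.5]
print(format_result("yes", scores))
```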

To test different keywords, change one line in the sketch:

```cpp
// In KeywordSpotting.ino, change this line:
#include "mfcc_yes.h" // → detects "yes"
// #include "mfcc_no.h" // → detects "no"
// #include "mfcc_stop.h" // → detects "stop"
// Available: mfcc_yes.h, mfcc_no.h, mfcc_up.h, mfcc_down.h,
// mfcc_left.h, mfcc_right.h, mfcc_on.h, mfcc_off.h,
// mfcc_stop.h, mfcc_go.h
```

To test with your own audio recording:

```bash
python generate_test_input.py --input my_recording.wav --output mfcc_custom.h
# Then: #include "mfcc_custom.h" in the sketch
```

## End-to-End Flow

```
Google Speech Commands "yes" audio (.wav, 16kHz, 1 second)
→ MFCC extraction (49 time frames × 10 coefficients)
→ DS-CNN model (23K params, trained on MacBook CPU)
→ CortexMQuantizer → int8 (calibrated with real MFCC data)
→ CMSIS-NN ops (conv2d, depthwise_conv2d, avgpool, linear)
→ Export to .pte (54 KB)
→ Arduino library → arduino-cli compile → upload
→ Cortex-M33 @ 160 MHz (Arduino Uno Q, STM32U585)
→ Serial output: ">>> yes" ✅
```
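
The 49 × 10 MFCC shape can be checked with framing arithmetic. The sketch below assumes a 40 ms analysis window with a 20 ms hop and no padding; these parameters are not stated in this README, so treat them as an assumption and check `export_model.py` for the actual values.

```python
SAMPLE_RATE = 16000   # Hz, from the flow above
CLIP_SECONDS = 1      # one-second recordings
NUM_COEFFS = 10       # MFCC coefficients per frame, from the flow above

# Assumed framing parameters (not taken from this README).
WINDOW_MS = 40
HOP_MS = 20


def num_frames(num_samples, window, hop):
    """Number of full windows that fit when sliding by `hop` samples, no padding."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // hop


window = SAMPLE_RATE * WINDOW_MS // 1000   # samples per window
hop = SAMPLE_RATE * HOP_MS // 1000         # samples per hop
frames = num_frames(SAMPLE_RATE * CLIP_SECONDS, window, hop)

print(frames, NUM_COEFFS, frames * NUM_COEFFS)
```

Under these assumptions the model input holds 490 values per clip, which is 490 bytes once quantized to int8.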

## Dataset

Training and test audio come from [Google Speech Commands v2](https://arxiv.org/abs/1804.03209),
a set of roughly 105,000 one-second recordings of 35 words spoken by
thousands of people. It is the standard dataset for the MLPerf Tiny
keyword spotting benchmark. Download it via
`torchaudio.datasets.SPEECHCOMMANDS` (2.3 GB).

The DS-CNN KWS benchmark uses 12 output classes (silence, unknown, plus
10 keywords). The Arduino export script trains the 10 keyword classes:
yes, no, up, down, left, right, on, off, stop, go.

## LLEXT Memory Budget

The Arduino Uno Q loads sketches as Zephyr LLEXT (Linkable Loadable Extensions) modules.
Sizes reported by `arduino-cli compile` (Zephyr board core 0.55.2):

| Build | Code | Data | Total | Status |
|-------|------|------|-------|--------|
| HelloExecuTorch (portable ops) | 62 KB | 27 KB | 89 KB | ✅ |
| Add model (portable ops) | 88 KB | 35 KB | 123 KB | ✅ |
| DS-CNN (selective CMSIS-NN) | 87 KB | 57 KB | 144 KB | ✅ |

All CMSIS-NN sources are compiled, but the linker's
`--gc-sections` discards unused functions from the final binary.