pytorch · psiddh · Jun 19, 2026 · Jun 17, 2026 · Jun 19, 2026
diff --git a/examples/arduino/.gitignore b/examples/arduino/.gitignore
@@ -0,0 +1,2 @@
+# Generated by build_arduino_library.sh — do not check in
+arduino_lib/
diff --git a/examples/arduino/ExecuTorchArduino.h b/examples/arduino/ExecuTorchArduino.h
@@ -0,0 +1,30 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#pragma once
+
+// Arduino's custom <new> header omits <exception>, which breaks
+// std::bad_variant_access in <variant>. Include it first.
+#include <exception>
+
+#ifndef C10_USING_CUSTOM_GENERATED_MACROS
+#define C10_USING_CUSTOM_GENERATED_MACROS
+#endif
+#ifndef ET_ENABLE_DEPRECATED_CONSTANT_BUFFER
+#define ET_ENABLE_DEPRECATED_CONSTANT_BUFFER 0
+#endif
+#ifndef FLATBUFFERS_MAX_ALIGNMENT
+#define FLATBUFFERS_MAX_ALIGNMENT 1024
+#endif
+
+#include <executorch/extension/data_loader/buffer_data_loader.h>
+#include <executorch/runtime/core/memory_allocator.h>
+#include <executorch/runtime/executor/method.h>
+#include <executorch/runtime/executor/method_meta.h>
+#include <executorch/runtime/executor/program.h>
+#include <executorch/runtime/platform/runtime.h>
diff --git a/examples/arduino/README.md b/examples/arduino/README.md
@@ -0,0 +1,344 @@
+<!---
+  Copyright (c) Meta Platforms, Inc. and affiliates.
+  All rights reserved.
+
+  This source code is licensed under the BSD-style license found in the
+  LICENSE file in the root directory of this source tree.
+--->
+
+# ExecuTorch Arduino Library
+
+Run PyTorch models on Arduino microcontrollers using ExecuTorch.
+
+This directory contains everything needed to package ExecuTorch as an
+Arduino library. A build script vendors the runtime sources from this
+repository into a self-contained library that Arduino users install
+through the Library Manager or by copying into their libraries folder.
+
+## How It Works
+
+```
+PyTorch Model ──► torch.export ──► .pte file ──► model.h (C array)
+                                                      │
+                                          Arduino Sketch (.ino)
+                                          #include <ExecuTorchArduino.h>
+                                          #include "model.h"
+                                                      │
+                                          arduino-cli compile ──► Upload ──► Runs on board
+```
+
+### The three pieces
+
+1. **The library** (`arduino_lib/ExecuTorchArduino/`) — the ExecuTorch
+   runtime, CMSIS-NN kernels, and portable ops packaged for the Arduino
+   build system.  Generated by `build_arduino_library.sh`; not checked in.
+
+2. **The model** (`model.h`) — a `.pte` file converted to a C byte array.
+   Each user brings their own model, exported from PyTorch with the
+   Cortex-M backend.
+
+3. **The sketch** (`.ino`) — a standard Arduino program that loads the
+   model, feeds it input, and reads the output.  Uses the native
+   ExecuTorch C++ API (`Program::load`, `Method::execute`, etc.).
+
+## Supported Boards
+
+| Board | MCU | Status |
+|-------|-----|--------|
+| Arduino Uno Q | STM32U585 (Cortex-M33) | Tested |
+| Arduino Nano 33 BLE | nRF52840 (Cortex-M4F) | Planned (requires mbed PAL) |
+| Arduino Giga R1 WiFi | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |
+| Arduino Portenta H7 | STM32H747 (Cortex-M7) | Planned (requires mbed PAL) |
+
+The library currently requires the Zephyr board core.  Non-Zephyr boards
+(mbed) need a platform abstraction layer port before they can compile.
+CMSIS-NN accelerated ops work on any ARM Cortex-M with DSP extensions.
+Portable ops work on any architecture.
+
+## Quick Start
+
+### 1. Build the Arduino library
+
+```bash
+cd examples/arduino
+./build_arduino_library.sh
+```
+
+This copies the required ExecuTorch sources from the repository into
+`arduino_lib/ExecuTorchArduino/`, ready for Arduino.
+
+### 2. Install the library
+
+Copy the generated library into your Arduino libraries folder:
+
+```bash
+# macOS:
+cp -r arduino_lib/ExecuTorchArduino ~/Documents/Arduino/libraries/
+# Linux:
+cp -r arduino_lib/ExecuTorchArduino ~/Arduino/libraries/
+```
+
+Or with `arduino-cli`:
+
+```bash
+cd arduino_lib && zip -r ExecuTorchArduino.zip ExecuTorchArduino && cd ..
+arduino-cli lib install --zip-path arduino_lib/ExecuTorchArduino.zip
+```
+
+### 3. Export a model
+
+Each sketch needs a `model.h` file — a `.pte` model converted to a C
+byte array.  Use `pte_to_header.py` from the Arm examples to convert
+any `.pte` file:
+
+```bash
+python examples/arm/executor_runner/pte_to_header.py \
+    -p model.pte -d examples/arduino/examples/AddModel -o model.h
+```
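+
+Conceptually, the conversion just emits the `.pte` bytes as an aligned
+array. The following is a minimal sketch of that idea, not the actual
+`pte_to_header.py` implementation: the function name
+`pte_bytes_to_header`, the 16-byte alignment, and the 12-bytes-per-line
+layout are assumptions. The symbol name `model_pte` matches what the
+sketch in step 4 reads.
+
+```python
+def pte_bytes_to_header(data: bytes, symbol: str = "model_pte",
+                        per_line: int = 12) -> str:
+    """Render raw .pte bytes as a C++ header defining an aligned byte array."""
+    if not data:
+        raise ValueError("empty model: nothing to emit")
+    lines = [
+        "#pragma once",
+        "#include <stdint.h>",
+        "",
+        # alignas is a C++ keyword; Arduino sketches compile as C++.
+        f"alignas(16) const uint8_t {symbol}[{len(data)}] = {{",
+    ]
+    for start in range(0, len(data), per_line):
+        chunk = data[start:start + per_line]
+        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
+    lines.append("};")
+    return "\n".join(lines) + "\n"
+
+
+if __name__ == "__main__":
+    # Hypothetical 4-byte payload, only to show the output shape.
+    print(pte_bytes_to_header(bytes([0x10, 0x00, 0x00, 0xff])))
+```
+
+Sizing the array explicitly lets `sizeof(model_pte)` in the sketch
+report the model length without a separate length constant.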
+
+**AddModel** — export a simple add model (no dataset needed):
+
+```bash
+python -c "
+import torch
+from executorch.exir import to_edge
+from torch.export import export
+class Add(torch.nn.Module):
+    def forward(self, x): return x + 1.0
+et = to_edge(export(Add().eval(), (torch.tensor([1.,2.,3.]),))).to_executorch()
+with open('add.pte','wb') as f: f.write(bytes(et.buffer))"
+
+python examples/arm/executor_runner/pte_to_header.py \
+    -p add.pte -d examples/arduino/examples/AddModel -o model.h
+```
+
+**HelloExecuTorch** — uses any valid model; the AddModel `.pte` works:
+
+```bash
+cp examples/arduino/examples/AddModel/model.h \
+   examples/arduino/examples/HelloExecuTorch/model.h
+```
+
+**KeywordSpotting** — requires a quantized DS-CNN model.  Generate it
+with `export_model.py`:
+
+```bash
+# Download the dataset (one time, ~2.3 GB) — run from repo root:
+python -c "import torchaudio; torchaudio.datasets.SPEECHCOMMANDS(
+    root='outputs/speech_commands', download=True)"
+
+# Train DS-CNN, quantize with CMSIS-NN, and export model.h:
+python examples/arduino/export_model.py \
+    --output examples/arduino/examples/KeywordSpotting/model.h
+```
+
+This trains DS-CNN on Google Speech Commands v2 (100 samples/class),
+quantizes to int8 via `CortexMQuantizer`, calibrates with real MFCC
+audio data, and exports a 54 KB `.pte` as a C header.
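+
+Calibration, in general, means observing value ranges on representative
+inputs and deriving int8 quantization parameters from them. Below is a
+minimal sketch of generic affine int8 quantization to illustrate the
+idea. It is not how `CortexMQuantizer` is implemented, and the helper
+names are hypothetical.
+
+```python
+def affine_qparams(values, qmin=-128, qmax=127):
+    """Derive (scale, zero_point) from observed calibration values."""
+    # The range must include 0.0 so that zero is exactly representable.
+    lo = min(min(values), 0.0)
+    hi = max(max(values), 0.0)
+    # Fall back to 1.0 when all observed values are zero.
+    scale = (hi - lo) / (qmax - qmin) or 1.0
+    zero_point = int(round(qmin - lo / scale))
+    return scale, max(qmin, min(qmax, zero_point))
+
+
+def quantize(x, scale, zero_point, qmin=-128, qmax=127):
+    """Map a float to int8, clamping values outside the calibrated range."""
+    q = int(round(x / scale)) + zero_point
+    return max(qmin, min(qmax, q))
+
+
+def dequantize(q, scale, zero_point):
+    """Map an int8 value back to its approximate float."""
+    return (q - zero_point) * scale
+
+
+if __name__ == "__main__":
+    # Hypothetical calibration samples standing in for MFCC features.
+    scale, zp = affine_qparams([-1.0, 0.0, 3.0])
+    print(scale, zp, quantize(3.0, scale, zp))
+```
+
+A calibration set that covers the real input range matters because any
+value outside it is clamped to the int8 limits.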
+
+To export with a pre-trained checkpoint instead of training:
+
+```bash
+python examples/arduino/export_model.py --checkpoint my_weights.pth \
+    --output examples/arduino/examples/KeywordSpotting/model.h
+```
+
+**Note:** `build_arduino_library.sh` (step 1) requires schema headers from a
+prior cmake build.  If you haven't built ExecuTorch yet, run
+`./install_executorch.sh` first.
+
+### 4. Write a sketch
+
+```cpp
+#include <ExecuTorchArduino.h>
+#include "model.h"
+
+using executorch::extension::BufferDataLoader;
+using executorch::runtime::Error;
+using executorch::runtime::HierarchicalAllocator;
+using executorch::runtime::MemoryAllocator;
+using executorch::runtime::MemoryManager;
+using executorch::runtime::Method;
+using executorch::runtime::MethodMeta;
+using executorch::runtime::Program;
+using executorch::runtime::Result;
+using executorch::runtime::Span;
+
+alignas(16) uint8_t method_pool[28 * 1024];
+
+void setup() {
+  Serial.begin(115200);
+  delay(2000);
+
+  executorch::runtime::runtime_init();
+
+  auto loader = BufferDataLoader(model_pte, sizeof(model_pte));
+  Result<Program> program = Program::load(&loader);
+  if (!program.ok()) {
+    Serial.println("Failed to load program");
+    return;
+  }
+
+  // ... load method, set inputs, execute, read outputs
+  // See examples/ for complete working sketches.
+}
+
+void loop() {
+  // Run inference periodically
+  delay(2000);
+}
+```
+
+The sketch uses the **native ExecuTorch C++ API** — the same API used on
+Linux, Android, and bare-metal targets.  No wrapper layer, no
+Arduino-specific abstractions.
+
+### 5. Compile and upload
+
+```bash
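+# /dev/cu.usbmodem* is the macOS port pattern; on Linux the board
+# typically appears as /dev/ttyACM0 (check `arduino-cli board list`).
+arduino-cli compile --fqbn arduino:zephyr:unoq MySketch
+arduino-cli upload  --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* MySketch
+arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200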
+```
+
+## What is inside the library
+
+The `build_arduino_library.sh` script assembles these components from
+the ExecuTorch repository:
+
+| Component | Source in repo | Purpose |
+|-----------|---------------|---------|
+| ET Runtime | `runtime/executor/`, `runtime/core/`, `runtime/kernel/`, `runtime/platform/` | Model loading, memory management, op dispatch |
+| Portable Ops | `kernels/portable/` | Software op implementations (any CPU) |
+| Cortex-M Ops | `backends/cortex_m/ops/` | CMSIS-NN accelerated int8 ops |
+| CMSIS-NN | fetched by cmake / Zephyr module | ARM's optimized DSP kernels |
+| flatcc | `third-party/flatcc/` | .pte file parsing |
+| flatbuffers | `third-party/flatbuffers/` | Schema headers |
+| c10 | `runtime/core/portable_type/c10/` | Core type definitions |
+
+The library has no external dependencies beyond what the Arduino board
+core provides.
+
+## Arduino-specific patches
+
+The build script applies these patches to make ExecuTorch compile under
+Arduino's build system:
+
+1. **`#include <exception>` before `<variant>`** — Arduino's custom
+   `<new>` header omits `<exception>`, breaking `std::bad_variant_access`.
+
+2. **`cmake_macros.h` stub** — c10/torch headers expect a cmake-generated
+   file.  The build script generates a stub; `C10_USING_CUSTOM_GENERATED_MACROS`
+   is defined in `ExecuTorchArduino.h` to skip the include.
+
+3. **`platform_stubs.c`** — provides weak stubs for `_Exit()`, `fprintf()`,
+   and `__aeabi_f2lz` for the LLEXT environment on boards that lack them.
+
+4. **Compile-time defines** — `ExecuTorchArduino.h` sets
+   `ET_ENABLE_DEPRECATED_CONSTANT_BUFFER=0` (requires models exported with
+   current ExecuTorch) and `FLATBUFFERS_MAX_ALIGNMENT=1024`.
+
+## Development
+
+### Updating the library
+
+After modifying ExecuTorch sources, regenerate the library:
+
+```bash
+./build_arduino_library.sh        # rebuild
+./build_arduino_library.sh --clean  # remove generated output
+```
+
+### Testing
+
+```bash
+arduino-cli compile --fqbn arduino:zephyr:unoq examples/HelloExecuTorch
+arduino-cli upload  --fqbn arduino:zephyr:unoq -p /dev/cu.usbmodem* examples/HelloExecuTorch
+arduino-cli monitor -p /dev/cu.usbmodem* --config baudrate=115200
+```
+
+### Publishing to Arduino Library Manager
+
+The library is published by adding its repository URL to the
+[Arduino Library Registry](https://github.com/arduino/library-registry).
+After the initial registration, new git tags are picked up automatically.
+
+## Build Validation
+
+Tested on Arduino Uno Q (STM32U585, Cortex-M33 @ 160 MHz):
+
+- **Portable ops**: Add model (`x + 1.0`) produces correct output
+  `[1,2,3] + 1 = [2.0, 3.0, 4.0]`
+- **CMSIS-NN linear**: Quantized linear model (int8, 2.2 KB) runs
+  `arm_fully_connected_s8` via `cortex_m::quantized_linear`
+- **CMSIS-NN keyword spotting**: DS-CNN (MLPerf Tiny KWS benchmark,
+  54 KB, int8) correctly classifies real audio from Google Speech
+  Commands dataset via 16 CMSIS-NN accelerated ops (conv2d, depthwise
+  conv2d, avgpool, linear, quantize, dequantize, pad)
+
+### Keyword Spotting Results
+
+Verified with real audio on hardware:
+
+```
+"yes" → [yes]=7.82  >>> Detected: yes  CORRECT!
+"no"  → [no]=1.60   >>> Detected: no   CORRECT!
+```
+
+To test different keywords, change one line in the sketch:
+
+```cpp
+// In KeywordSpotting.ino, change this line:
+#include "mfcc_yes.h"    // → detects "yes"
+// #include "mfcc_no.h"  // → detects "no"
+// #include "mfcc_stop.h"  // → detects "stop"
+// Available: mfcc_yes.h, mfcc_no.h, mfcc_up.h, mfcc_down.h,
+//            mfcc_left.h, mfcc_right.h, mfcc_on.h, mfcc_off.h,
+//            mfcc_stop.h, mfcc_go.h
+```
+
+To test with your own audio recording:
+
+```bash
+python generate_test_input.py --input my_recording.wav --output mfcc_custom.h
+# Then: #include "mfcc_custom.h" in the sketch
+```
+
+## End-to-End Flow
+
+```
+Google Speech Commands "yes" audio (.wav, 16kHz, 1 second)
+  → MFCC extraction (49 time frames × 10 coefficients)
+  → DS-CNN model (23K params, trained on MacBook CPU)
+  → CortexMQuantizer → int8 (calibrated with real MFCC data)
+  → CMSIS-NN ops (conv2d, depthwise_conv2d, avgpool, linear)
+  → Export to .pte (54 KB)
+  → Arduino library → arduino-cli compile → upload
+  → Cortex-M33 @ 160 MHz (Arduino Uno Q, STM32U585)
+  → Serial output: ">>> yes" ✅
+```
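+
+The 49 × 10 MFCC shape in the flow above fixes the size of the input
+buffer the sketch has to hold. Here is a minimal sketch of that
+arithmetic. The 40 ms window and 20 ms stride are assumptions; the
+source states only the resulting 49 frames and 10 coefficients.
+
+```python
+SAMPLE_RATE_HZ = 16_000   # from the flow above: 16 kHz, 1 second
+NUM_COEFFS = 10           # MFCC coefficients per frame
+
+
+def num_frames(num_samples: int, window: int, stride: int) -> int:
+    """Number of full analysis windows that fit in the clip."""
+    return 1 + (num_samples - window) // stride
+
+
+# Assumed framing: 40 ms window (640 samples), 20 ms stride (320 samples).
+frames = num_frames(SAMPLE_RATE_HZ, window=640, stride=320)
+features = frames * NUM_COEFFS
+
+if __name__ == "__main__":
+    # One byte per feature once the input is quantized to int8.
+    print(f"{frames} frames x {NUM_COEFFS} coeffs = {features} int8 values")
+```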
+
+## Dataset
+
+Training and test audio come from [Google Speech Commands v2](https://arxiv.org/abs/1804.03209):
+roughly 105,000 one-second recordings of 35 words spoken by thousands of
+people.  It is the standard dataset for the MLPerf Tiny keyword-spotting
+benchmark.  Download via `torchaudio.datasets.SPEECHCOMMANDS` (~2.3 GB).
+
+The DS-CNN KWS benchmark uses 12 output classes (silence, unknown, plus
+10 keywords).  The Arduino export script trains on the 10 keyword classes
+only: yes, no, up, down, left, right, on, off, stop, go.
+
+## LLEXT Memory Budget
+
+The Arduino Uno Q loads sketches as LLEXT (Loadable Extensions).
+Sizes reported by `arduino-cli compile` (Zephyr board core 0.55.2):
+
+| Build | Code | Data | Total | Status |
+|-------|------|------|-------|--------|
+| HelloExecuTorch (portable ops) | 62 KB | 27 KB | 89 KB | ✅ |
+| Add model (portable ops) | 88 KB | 35 KB | 123 KB | ✅ |
+| DS-CNN (selective CMSIS-NN) | 87 KB | 57 KB | 144 KB | ✅ |
+
+All CMSIS-NN sources are compiled, but the linker's
+`--gc-sections` discards unused functions from the final binary.
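+
+The Total column is the sum of Code and Data. The short sketch below
+re-derives the totals from the table above. The dictionary layout is
+only for illustration, and no LLEXT size limit is checked because the
+source does not state one.
+
+```python
+# (code KB, data KB) per build, copied from the table above.
+builds = {
+    "HelloExecuTorch (portable ops)": (62, 27),
+    "Add model (portable ops)": (88, 35),
+    "DS-CNN (selective CMSIS-NN)": (87, 57),
+}
+
+totals = {name: code + data for name, (code, data) in builds.items()}
+
+if __name__ == "__main__":
+    for name, total in totals.items():
+        print(f"{name}: {total} KB")
+```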