Commit a6305cb

Arm backend: Add CMSIS Pack build infrastructure (#19942)
Adds the PyTorch::ExecuTorch CMSIS Pack build pipeline: PDSC template, per-operator component generator, kernel-registration generator, source-collection driver, top-level pack-build script, the production GitHub Actions build workflow, and a local test harness for the consumer-build smoke.

All changes are confined to backends/arm/cmsis_pack/ and backends/arm/scripts/cmsis_pack/. No code outside the cmsis_pack folder is modified.

## Pack contents

- Runtime, Kernel Utils, Kernel Registration umbrella components.
- 171 per-op Portable operator components.
- 9 per-op Quantized operator components.
- 16 per-op Cortex-M operator components. The ARM::CMSIS-NN dependency is auto-attached only to ops that route through CMSIS-NN (q/dq and softmax stay pure-CPU; the 13 quantized_* operators add `<require condition="CMSIS-NN"/>`).
- Ethos-U Backend for Cortex-M (bare-metal).

The Runtime component ships a pack-local patched copy of `runtime/platform/default/minimal.cpp` (under `backends/arm/cmsis_pack/contributions/runtime/platform/default/`). The patched copy marks each `et_pal_*` fallback with `ET_WEAK` directly on the definition. Relying on attribute inheritance from the (weak-via-ET_INTERNAL_PLATFORM_WEAKNESS) declaration in `platform.h` does not produce weak symbols on GCC 13/14/15 or armclang 6.24, so a downstream client that supplies its own `et_pal_*` override hits a multi-def link error against the fallback unless the fallback definition is marked weak directly. The upstream file is not modified; copy_sources.sh overlays the patched copy onto the staged pack tree.

## Codegen

- generate_components.py walks the kernel/operator source trees and emits one CMSIS pack component per op_*.cpp under three categories: Portable, Quantized, Cortex-M. The Cortex-M scan detects CMSIS-NN dependency per file via include + symbol pattern scan, so q/dq components stay free of the CMSIS-NN dep.
- generate_register_all_kernels.py walks the portable / quantized / cortex_m operator yamls, extracts kernel signatures from the matching .cpp sources, and emits one #ifdef RTE_ML_EXECUTORCH_OP_<CATEGORY>_<NAME>-guarded Kernel(...) registration per overload. Cortex-M kernels are forward-declared inside a `cortex_m::native` namespace block (with using aliases for Tensor / ScalarType / Int64ArrayRef / KernelRuntimeContext, mirroring cortex_m_ops_common.h but without pulling in arm_nn_types.h) and invoked via cortex_m::native::<short>(...). ATen and quantized_decomposed kernels remain in torch::executor::native.
- For cortex_m::quantize_per_tensor.out and cortex_m::dequantize_per_tensor.out the generated trampoline accepts either the declared 7 args or 8 args and routes the `out` slot via stack[stack.size()-1], so AOT pipelines that emit an optional out_dtype EValue do not require kernel recompilation.

## Local test harness

- backends/arm/cmsis_pack/test/validate_pack.py: structural validation of a built .pack archive (PDSC well-formed, file refs resolve to real entries, no leaked Python sources, runtime + RegisterAllKernels.cpp present).
- backends/arm/cmsis_pack/test/smoke/: minimal csolution consumer project plus a run.sh driver that builds the pack, validates it, and runs csolution + cbuild against the freshly built pack inside the avh-mlops-licensed-community Docker image. No compiler flags are hand-curated in the script; the cmsis-toolbox cdefault.yml plus the PDSC drive everything.

## Workflow

.github/workflows/build-cmsis-pack.yml runs the pack build only: cross-compile ExecuTorch headers, run build_pack.sh, upload the artifact, and (on non-prerelease release events) attach the .pack to the GitHub Release. Structural validation and consumer-build smoke run locally via the scripts under backends/arm/cmsis_pack/test/.

## Follow-up items (not in this CL)

1. executorch_config.yml is parsed but unused by generate_components.py; annotate as documentation or drive generation from it.
2. Add a coverage test that compares generate_register_all_kernels.py output against the union of functions.yaml / quantized.yaml / cortex_m operators.yaml.
3. PDSC template <repository> URL hard-codes pytorch/executorch.
4. Wire validate_pack.py + smoke/run.sh into the CI workflow once the runner image carries vcpkg + the AVH-MLOps toolset.
5. Extend the csolution smoke project to also exercise a Cortex-M per-op component.
6. Cortex-A / Linux-userspace Ethos-U backend variant: add as a second component once the userspace driver headers are vendorable in-pack.
7. Gate std::random_device use in op_rand / op_randn / op_native_dropout behind a bare-metal-aware define plus an et_pal_random_u32() PAL hook, so consumers selecting those per-op components do not hit unresolved libstdc++ _M_getval / _M_init / _M_fini at link time.

Original author: Matthias Hertel <matthias.hertel@arm.com>
Change-Id: Ide967ae0a24293d9f24b76961db92eb1f064655d
Signed-off-by: Matthias Hertel <matthias.hertel@arm.com>

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani
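
The `#ifdef`-guarded registration described under Codegen can be pictured with a small generator sketch. This is not the code in generate_register_all_kernels.py: the normalisation rule in `guard_macro`, the `wrapper` trampoline name, and the exact shape of the emitted `Kernel(...)` line are all assumptions made for illustration. Only the `RTE_ML_EXECUTORCH_OP_<CATEGORY>_<NAME>` macro pattern comes from the commit message.

```python
import re


def guard_macro(category: str, op_name: str) -> str:
    """Return RTE_ML_EXECUTORCH_OP_<CATEGORY>_<NAME> for one operator.

    Upper-casing and mapping non-alphanumerics to '_' are assumptions.
    """
    def norm(text: str) -> str:
        return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").upper()

    return f"RTE_ML_EXECUTORCH_OP_{norm(category)}_{norm(op_name)}"


def emit_guarded_registration(
    category: str, op_name: str, overload: str, wrapper: str
) -> str:
    """Emit one guarded registration entry for a single kernel overload.

    `wrapper` is a hypothetical name for the generated trampoline function.
    """
    macro = guard_macro(category, op_name)
    return "\n".join(
        [
            f"#ifdef {macro}",
            f'    Kernel("{overload}", {wrapper}),',
            f"#endif  // {macro}",
        ]
    )


if __name__ == "__main__":
    # Hypothetical operator, used only to show the emitted shape.
    print(emit_guarded_registration("Portable", "add", "aten::add.out", "wrap_add_out"))
```

Guarding each entry this way means a registration only compiles when the consumer has selected the matching per-op component, which is what keeps unselected kernels out of the link.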
1 parent 0dbaed4 commit a6305cb

18 files changed

Lines changed: 3076 additions & 0 deletions

.github/workflows/build-cmsis-pack.yml

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
# Copyright 2026 Arm Limited and/or its affiliates.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

name: Build CMSIS Pack

on:
  schedule:
    # Nightly at 03:00 UTC, staggered after nightly.yml (02:00) so the
    # shared runner pool isn't hit by both at the same minute.
    - cron: 0 3 * * *
  release:
    # Build (and, for non-prerelease, publish) the pack when a GitHub
    # Release is created. The tag the release points at drives the pack
    # version via GITHUB_REF below.
    types: [published]
  push:
    branches:
      - main
      - release/*
    paths:
      - .github/workflows/build-cmsis-pack.yml
      - backends/arm/cmsis_pack/**
      - backends/arm/cmsis_pack/scripts/**
      - backends/arm/runtime/**
      - backends/cortex_m/**
      - kernels/portable/**
      - kernels/quantized/**
      - runtime/**
      - schema/**
  pull_request:
    paths:
      - .github/workflows/build-cmsis-pack.yml
      - backends/arm/cmsis_pack/**
      - backends/arm/cmsis_pack/scripts/**
  workflow_dispatch:
    inputs:
      version_override:
        description: 'Override pack version (e.g., 1.2.0). Leave empty to derive from version.txt'
        required: false
        type: string

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}-${{ github.event_name == 'workflow_dispatch' }}
  cancel-in-progress: true

jobs:
  build-cmsis-pack:
    name: build-cmsis-pack
    uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main
    permissions:
      id-token: write
      contents: read
    with:
      runner: linux.2xlarge
      docker-image: ci-image:executorch-ubuntu-22.04-arm-sdk
      submodules: 'recursive'
      ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
      timeout: 60
      upload-artifact: cmsis-pack-artifact
      script: |
        set -eux

        echo "::group::Setup environment"
        # The generic Linux job chooses to use base env, not the one setup by the image
        CONDA_ENV=$(conda env list --json | jq -r ".envs | .[-1]")
        conda activate "${CONDA_ENV}"

        source .ci/scripts/utils.sh
        install_executorch "--use-pt-pinned-commit"
        echo "::endgroup::"

        echo "::group::Install ARM toolchain"
        .ci/scripts/setup-arm-baremetal-tools.sh
        source examples/arm/arm-scratch/setup_path.sh
        echo "::endgroup::"

        echo "::group::Cross-compile ExecuTorch for Cortex-M"
        # Stage 1: Build core ExecuTorch with arm-none-eabi-gcc
        # This generates required headers (flatbuffers, schema)
        backends/arm/scripts/build_executorch.sh
        CMAKE_BUILD_DIR="$(pwd)/cmake-out-arm"
        echo "::endgroup::"

        echo "::group::Determine pack version"
        # Derive version from tag, input override, schedule (nightly), or version.txt
        BASE_VER="$(cat version.txt | sed 's/a0$//')"
        if [[ -n "${{ inputs.version_override || '' }}" ]]; then
          PACK_VERSION="${{ inputs.version_override }}"
        elif [[ "${GITHUB_REF}" == refs/tags/v* ]]; then
          # Strip the leading 'refs/tags/v' prefix for release tags; an -rc
          # suffix, if present, is kept as part of the version.
          PACK_VERSION="${GITHUB_REF#refs/tags/v}"
        elif [[ "${{ github.event_name }}" == "schedule" ]]; then
          PACK_VERSION="${BASE_VER}-nightly-$(date -u +%Y%m%d)"
        else
          PACK_VERSION="${BASE_VER}-dev"
        fi
        echo "Pack version: ${PACK_VERSION}"
        echo "::endgroup::"

        echo "::group::Build CMSIS Pack"
        backends/arm/cmsis_pack/scripts/build_pack.sh \
          --executorch-root "$(pwd)" \
          --build-dir "${CMAKE_BUILD_DIR}" \
          --version "${PACK_VERSION}" \
          --output-dir "$(pwd)/artifacts-to-be-uploaded"
        echo "::endgroup::"

        # Structural validation and consumer-build smoke are intentionally
        # not run in CI yet. See:
        #   backends/arm/cmsis_pack/test/validate_pack.py  (structural)
        #   backends/arm/cmsis_pack/test/smoke/run.sh      (cbuild via AVH-MLOps)
        # for the local test drivers.

  # Attach the pack to the GitHub Release when a non-prerelease release is
  # published. Prereleases still build via the release trigger but are not
  # published.
  publish-cmsis-pack:
    if: github.event_name == 'release' && !github.event.release.prerelease
    needs: build-cmsis-pack
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Download pack artifact
        uses: actions/download-artifact@v4
        with:
          name: cmsis-pack-artifact
          path: pack-output

      - name: Upload to GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: pack-output/*.pack
          tag_name: ${{ github.ref_name }}
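
The version-selection branch in the workflow script above is easier to check when written as a pure function. The sketch below mirrors the shell logic as shown in the diff (override first, then a `refs/tags/v*` ref, then the nightly schedule, then the `-dev` fallback). The function name, parameter names, and sample version strings are illustrative assumptions, not part of the commit.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def derive_pack_version(
    version_txt: str,
    github_ref: str,
    event_name: str,
    override: str = "",
    today: Optional[datetime] = None,
) -> str:
    """Mirror the workflow's PACK_VERSION selection order."""
    # Same effect as `sed 's/a0$//'` on a single-line version.txt.
    base = re.sub(r"a0$", "", version_txt.strip())
    if override:
        return override
    if github_ref.startswith("refs/tags/v"):
        # Only the 'refs/tags/v' prefix is removed; suffixes such as -rc1 stay.
        return github_ref[len("refs/tags/v"):]
    if event_name == "schedule":
        day = today or datetime.now(timezone.utc)
        return f"{base}-nightly-{day:%Y%m%d}"
    return f"{base}-dev"


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(derive_pack_version("1.2.0a0\n", "refs/heads/main", "push"))
```

Keeping the date as an injectable `today` argument is a choice made here so the nightly branch can be exercised deterministically; the workflow itself calls `date -u` directly.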

backends/arm/cmsis_pack/README.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# PyTorch::ExecuTorch CMSIS Pack

Build scripts and templates for the `PyTorch::ExecuTorch` CMSIS Pack.

## Overview

This is a **source pack**: it ships ExecuTorch runtime + kernel sources
(not a prebuilt binary), packaged as a CMSIS Pack for bare-metal Cortex-M
consumers. Every portable, quantized and Cortex-M operator is exposed as a
selectable CMSIS component, enabling fine-grained code-size control.

## Structure

```
backends/arm/
├── cmsis_pack/
│   ├── config/
│   │   └── executorch_config.yml          # Build configuration and defines
│   ├── contributions/
│   │   ├── add/                           # Static files copied verbatim into the pack
│   │   │   ├── Documentation/
│   │   │   └── armclang_shims/sys/        # AC6-only sys/types.h shim (compiler.h fix)
│   │   └── runtime/platform/default/
│   │       └── minimal.cpp.patch          # Pack-local patch applied to upstream minimal.cpp
│   ├── templates/
│   │   └── PyTorch.ExecuTorch.pdsc.tpl    # Pack description template
│   ├── test/
│   │   ├── validate_pack.py               # Structural validation of a built .pack
│   │   └── smoke/                         # csolution consumer-build smoke project
│   │       ├── run.sh                     # Local driver (build + validate + cbuild)
│   │       ├── smoke.csolution.yml
│   │       ├── smoke.cproject.yml
│   │       ├── vcpkg-configuration.json
│   │       └── main.cpp
│   └── scripts/
│       ├── build_pack.sh                  # Main entry point
│       ├── copy_sources.sh                # Collects sources from repo tree
│       ├── generate_components.py         # Generates per-operator PDSC components
│       └── generate_register_all_kernels.py  # Generates #ifdef-guarded registrations
```

The build/codegen scripts and the test harness are co-located under
`backends/arm/cmsis_pack/`, keeping everything pack-specific in one tree.

## Components

- **Machine Learning::ExecuTorch::Runtime**: Core runtime (always required)
- **Machine Learning::ExecuTorch::Kernel Utils**: Kernel registration utilities
- **Machine Learning::ExecuTorch::Kernel Registration**: Per-op kernel registrations
- **Machine Learning::ExecuTorch Operators::Portable \***: Individual portable operators
- **Machine Learning::ExecuTorch Operators::Quantized \***: Quantized operators
- **Machine Learning::ExecuTorch Operators::Cortex-M \***: CMSIS-NN-optimized Cortex-M operators
- **Machine Learning::ExecuTorch::Backend EthosU**: Ethos-U NPU backend for Cortex-M host (bare-metal)

## Building locally

```bash
# 1. Cross-compile ExecuTorch for Cortex-M (generates required headers).
#    Equivalent to running backends/arm/scripts/build_executorch.sh; use
#    that script if you want the canonical Arm-backend build flags.
cmake \
  -DCMAKE_TOOLCHAIN_FILE=examples/arm/ethos-u-setup/arm-none-eabi-gcc.cmake \
  -DEXECUTORCH_BUILD_ARM_BAREMETAL=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_FLATC=ON \
  -Bcmake-out-arm .
cmake --build cmake-out-arm --config Release -j$(nproc)

# 2. Build the pack. --output-dir is where the .pack archive lands
#    (created if absent); each invocation rewrites this directory.
backends/arm/cmsis_pack/scripts/build_pack.sh \
  --executorch-root "$(pwd)" \
  --build-dir cmake-out-arm \
  --version "$(cat version.txt | sed 's/a0$//')" \
  --output-dir pack-output
```

The resulting `.pack` file is a zip archive installable via `cpackget add <file>.pack`.
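
Because the `.pack` is an ordinary zip archive, its contents can be inspected with the Python standard library before installing it. The following is a minimal sketch. The helper name `summarize_pack` and the idea of grouping entries by top-level directory are illustrative assumptions, and nothing here is taken from the pack's own scripts.

```python
import zipfile
from collections import Counter


def summarize_pack(pack) -> dict:
    """Return the PDSC name(s) and an entry count per top-level directory.

    `pack` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(pack) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
    pdsc = sorted(n for n in names if n.endswith(".pdsc"))
    top_level = Counter(n.split("/", 1)[0] if "/" in n else "." for n in names)
    return {"pdsc": pdsc, "entries": len(names), "by_dir": dict(top_level)}


if __name__ == "__main__":
    import sys

    # Usage: python3 summarize_pack.py path/to/file.pack
    if len(sys.argv) > 1:
        print(summarize_pack(sys.argv[1]))
```

A quick look at `by_dir` is a cheap way to spot an unexpectedly empty or oversized source tree before running the heavier consumer build.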

## Local testing

Two scripts under `test/` exercise a freshly built pack the way real
consumers will. Both run outside CI today; the CI workflow only builds and
uploads the `.pack` artifact.

```bash
# End-to-end: rebuild the pack, validate its structure, then run a
# csolution + cbuild consumer build inside the AVH-MLOps Docker image.
backends/arm/cmsis_pack/test/smoke/run.sh
```

The driver:

1. Calls `build_pack.sh` to produce a fresh `.pack` (always rebuilds, so no
   stale install gets exercised).
2. Runs `test/validate_pack.py` against the archive: PDSC well-formed,
   runtime + `RegisterAllKernels.cpp` present, no duplicate / leaked
   `.py` entries, every `<file name="..."/>` reference resolves.
3. Spawns `ghcr.io/arm-software/avh-mlops/arm-mlops-docker-licensed-community:latest-arm64`,
   `vcpkg activate`s the toolchain set declared in
   `test/smoke/vcpkg-configuration.json` (cmsis-toolbox + arm-none-eabi-gcc
   + cmake + ninja), `cpackget`-installs the freshly built pack into a
   container-local pack root, then `cbuild`s the smoke project for the
   `ARMCM55` target. All compile flags come from the PDSC and the
   cmsis-toolbox `cdefault.yml`; none are hand-curated in the script.
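
The structural checks listed in step 2 can be sketched in a few lines of standard-library Python. This is an illustration of the kinds of checks named above, not the contents of `validate_pack.py`: the function name, the message strings, and the assumption that the PDSC sits at the archive root (so `<file name="..."/>` values map directly onto archive entries) are all hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_pack(pack) -> list:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    with zipfile.ZipFile(pack) as zf:
        names = zf.namelist()
        entries = set(names)
        if len(names) != len(entries):
            problems.append("duplicate archive entries")
        pdsc = [n for n in names if n.endswith(".pdsc")]
        if len(pdsc) != 1:
            return problems + [f"expected exactly one .pdsc, found {len(pdsc)}"]
        try:
            root = ET.fromstring(zf.read(pdsc[0]))
        except ET.ParseError as exc:
            return problems + [f"PDSC is not well-formed XML: {exc}"]
    # Every <file name="..."/> must resolve to a real archive entry.
    for elem in root.iter("file"):
        ref = (elem.get("name") or "").replace("\\", "/")
        if ref and ref not in entries:
            problems.append(f"unresolved file reference: {ref}")
    # Generator scripts should not leak into the shipped pack.
    problems += [f"leaked Python source: {n}" for n in names if n.endswith(".py")]
    return problems
```

Returning a list of problems instead of raising on the first one is a deliberate choice in this sketch, so a single run reports everything that needs fixing.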

Override the defaults via environment variables:

| Var | Default | Meaning |
|-----|---------|---------|
| `PACK_VERSION` | `<version.txt>-stage` | Version string baked into the pack |
| `BUILD_DIR` | `arm_test/cmake-out` | CMake build dir feeding generated headers |
| `OUTPUT_DIR` | `arm_test/cmsis-pack-output` | Where the `.pack` archive lands |
| `DOCKER_IMAGE` | `ghcr.io/arm-software/avh-mlops/arm-mlops-docker-licensed-community:latest-arm64` | Image used for the consumer build |
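
The override behaviour in the table can be modelled as a lookup with fallbacks. The sketch below is an assumption about how such defaults are typically resolved (empty values fall back, as with shell `${VAR:-default}`); it does not reproduce `run.sh`. The `base_version` parameter stands in for whatever the driver reads from `version.txt`, and is left to the caller because the table does not say whether any suffix is stripped.

```python
import os
from typing import Mapping, Optional


def resolve_settings(base_version: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the four documented settings, letting `env` override defaults."""
    env = os.environ if env is None else env
    defaults = {
        "PACK_VERSION": f"{base_version}-stage",
        "BUILD_DIR": "arm_test/cmake-out",
        "OUTPUT_DIR": "arm_test/cmsis-pack-output",
        "DOCKER_IMAGE": (
            "ghcr.io/arm-software/avh-mlops/"
            "arm-mlops-docker-licensed-community:latest-arm64"
        ),
    }
    # Assumption: an empty variable is treated the same as an unset one.
    return {key: env.get(key) or value for key, value in defaults.items()}
```

For example, `resolve_settings("1.2.0", {"OUTPUT_DIR": "out"})` keeps the three other defaults and changes only where the archive lands; the version string there is a made-up value.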

Prerequisite: `BUILD_DIR` must be populated by
`backends/arm/scripts/build_executorch.sh` so the generated FlatBuffers /
schema headers are available to `build_pack.sh`.

To validate a previously built `.pack` archive without rebuilding or
running the consumer build:

```bash
python3 backends/arm/cmsis_pack/test/validate_pack.py path/to/PyTorch.ExecuTorch.<ver>.pack
```

## Dependencies

- ARM::CMSIS
- ARM::CMSIS-NN (for CMSIS-NN optimized operators)
- ARM::ethos-u-core-driver (for Ethos-U backend)
