Commit ba2516c

Add the initial version of Claude skills (#17284)
Adds the first version of Claude skills so that Claude knows how to export and debug a model. Many details are still missing for all of these backends, and we should keep iterating on them.
1 parent 02501d8 commit ba2516c

11 files changed

Lines changed: 352 additions & 52 deletions

.claude/backends.md

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# Backends

| Backend | Platform | Hardware | Location |
|---------|----------|----------|----------|
| XNNPACK | All | CPU | `backends/xnnpack/` |
| CUDA | Linux/Windows | GPU | `backends/cuda/` |
| CoreML | iOS, macOS | NPU/GPU/CPU | `backends/apple/coreml/` |
| MPS | iOS, macOS | GPU | `backends/apple/mps/` |
| Vulkan | Android | GPU | `backends/vulkan/` |
| QNN | Android | NPU | `backends/qualcomm/` |
| MediaTek | Android | NPU | `backends/mediatek/` |
| Arm Ethos-U | Embedded | NPU | `backends/arm/` |
| OpenVINO | Embedded | CPU/GPU/NPU | `backends/openvino/` |
| Cadence | Embedded | DSP | See `backends-cadence.md` |
| Samsung | Android | NPU | `backends/samsung/` |

## Partitioner imports
```python
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.backends.apple.coreml.partition.coreml_partitioner import CoreMLPartitioner
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.vulkan.partition.vulkan_partitioner import VulkanPartitioner
```

## Usage pattern
```python
from executorch.exir import to_edge

# exported_program is the result of torch.export.export(...)
edge = to_edge(exported_program)
edge = edge.to_backend(XnnpackPartitioner())  # or other partitioner
exec_prog = edge.to_executorch()
```

Unsupported ops fall back to portable CPU. Use multiple partitioners for priority fallback.
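
The priority-fallback idea can be illustrated with a toy model in plain Python. This is only a sketch of the concept, not ExecuTorch API: `assign_ops`, the backend names, and the op sets are hypothetical, and it assumes that earlier partitioners in the list take priority. Each op goes to the first backend that claims it, and anything unclaimed lands on the portable CPU kernels.

```python
def assign_ops(ops, backends):
    """Assign each op to the first backend (in priority order) that supports it."""
    assignment = {}
    for op in ops:
        for name, supported in backends:
            if op in supported:
                assignment[op] = name
                break
        else:
            # No backend claimed the op: fall back to portable CPU kernels.
            assignment[op] = "portable"
    return assignment


# Hypothetical op coverage, listed in priority order.
backends = [
    ("coreml", {"conv2d", "linear"}),
    ("xnnpack", {"linear", "add"}),
]
result = assign_ops(["conv2d", "linear", "add", "topk"], backends)
print(result)
```

Note that `linear` is supported by both toy backends but is assigned to the one listed first, which is the behavior "priority fallback" refers to.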

.claude/faq.md

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
# Common Errors

## Error Codes
Error codes are defined in `runtime/core/error.h`.

| Code | Name | Common Cause |
|------|------|--------------|
| 0x10 | NotSupported | Input shape mismatch - inputs don't match export shapes. Use dynamic shapes if needed. |
| 0x14 | OperatorMissing | Selective build missing operator. Regenerate `et_operator_library` from current model. |
| 0x20 | NotFound | Missing backend. Link with `--whole-archive`: `-Wl,--whole-archive libxnnpack_backend.a -Wl,--no-whole-archive` |
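
When triaging logs, it can help to turn a raw code into the hint from the table. A minimal sketch under stated assumptions: `ERROR_HINTS` and `explain_error` are hypothetical helpers, not part of ExecuTorch, and the hints are just the common causes listed above.

```python
# Hints summarized from the error-code table; the helper itself is hypothetical.
ERROR_HINTS = {
    0x10: "Input shape mismatch: inputs don't match export shapes.",
    0x14: "Selective build is missing an operator.",
    0x20: "Backend not linked into the binary.",
}


def explain_error(code: int) -> str:
    """Format an error code as hex and attach the known common cause, if any."""
    hint = ERROR_HINTS.get(code, "No hint recorded; see runtime/core/error.h.")
    return f"0x{code:02x}: {hint}"


print(explain_error(0x14))
print(explain_error(32))  # codes are often logged in decimal; 32 == 0x20
```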

## Export Issues

**Missing out variants**: Custom ops need an ExecuTorch implementation. See `kernel-library-custom-aten-kernel.md`.

**RuntimeError: convert function not implemented**: Unsupported operator. File a GitHub issue.

## Runtime Issues

**Slow inference**:
1. Build with `-DCMAKE_BUILD_TYPE=Release`
2. Ensure the model is delegated (use `XnnpackPartitioner`)
3. Set thread count: `threadpool::get_threadpool()->_unsafe_reset_threadpool(num_threads)`

**Numerical accuracy**: Use devtools to debug. See `/profile` skill.

**Error setting input 0x10**: Input shape mismatch. Specify dynamic shapes at export.

**Duplicate kernel registration abort**: Multiple `gen_operators_lib` targets linked. Use only one per target.

## Installation

**Missing python-dev**: `sudo apt install python<version>-dev`

**Missing pytorch_tokenizers**: `pip install -e ./extension/llm/tokenizers/`

.claude/llm-export.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# LLM Export

High-level API for exporting LLMs to .pte format.

## Supported Models
Llama 2/3/3.1/3.2, Qwen 2.5/3, Phi 3.5/4-mini, SmolLM2

Full list: `extension/llm/export/config/llm_config.py`

For other models (Gemma, Mistral, BERT, Whisper): use optimum-executorch (see `/setup` skill).

## Basic Usage

```bash
python -m executorch.extension.llm.export.export_llm \
  --config path/to/config.yaml
```

## Config Structure

```yaml
base:
  model_class: llama3_2
  checkpoint: path/to/consolidated.00.pth
  params: path/to/params.json
  metadata: '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'

model:
  use_kv_cache: True              # recommended
  use_sdpa_with_kv_cache: True    # recommended
  use_attention_sink: False       # extend generation
  quantize_kv_cache: False        # int8 KV cache

quantization:
  qmode: 8da4w                    # int8 activation + int4 weight
  group_size: 32
  embedding_quantize: 4,32

backend:
  xnnpack:
    enabled: True
    extended_ops: True

debug:
  verbose: True                   # show delegation table
  generate_etrecord: True         # for devtools profiling
```
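
The `metadata` value in the config above is a JSON object embedded in a YAML string, which is easy to misquote by hand. A minimal sketch, using only the standard library, that builds the string and checks that it parses back; the helper name `metadata_string` is hypothetical, and the token ids are the ones from the example config.

```python
import json


def metadata_string(bos_id, eos_ids):
    """Serialize tokenizer metadata as JSON for the `metadata` config field."""
    return json.dumps({"get_bos_id": bos_id, "get_eos_ids": list(eos_ids)})


meta = metadata_string(128000, [128009, 128001])

# Wrap in single quotes so YAML treats the JSON as one plain string.
print(f"metadata: '{meta}'")

# Round-trip to confirm the string is valid JSON with the expected values.
parsed = json.loads(meta)
```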

## Quantization Modes

**TorchAO (XNNPACK)**:
- `8da4w`: int8 dynamic activation + int4 weight
- `int8`: int8 weight-only
- `torchao:8da4w`: low-bit kernels for Arm

**pt2e (QNN, CoreML, Vulkan)**: Use for non-CPU backends.

## Config Classes
All options in `extension/llm/export/config/llm_config.py`:
- `LlmConfig` - top level
- `ExportConfig` - max_seq_length, max_context_length
- `ModelConfig` - model optimizations
- `QuantizationConfig` - quantization options
- `BackendConfig` - backend settings
- `DebugConfig` - verbose, etrecord, profiling

.claude/quantization.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Quantization

Docs: https://docs.pytorch.org/ao/main/pt2e_quantization/index.html

## Backend quantizers
| Backend | Quantizer |
|---------|-----------|
| XNNPACK | `XNNPACKQuantizer` |
| Qualcomm | `QnnQuantizer` |
| CoreML | `CoreMLQuantizer` |

## LLM modes
See `examples/models/llama/source_transformation/quantize.py`: `int8`, `8da4w`, `4w`

.claude/runtime-api.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Runtime API

## executorch.runtime (preferred)
```python
from pathlib import Path

from executorch.runtime import Runtime, Program, Method

runtime = Runtime.get()
program = runtime.load_program(Path("model.pte"))
# inputs: the sequence of input values (e.g. tensors) the method expects
outputs = program.load_method("forward").execute(inputs)
```

## portable_lib (low-level)
```python
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("model.pte")
outputs = module.forward(inputs)
```

## Missing kernel fixes

If the runtime reports missing kernel errors, import the kernel module before loading the program:

```python
# Missing quantized kernels (e.g., quantized_decomposed::embedding_byte.out)
from executorch.kernels import quantized

# Missing LLM custom ops (e.g., llama::custom_sdpa.out, llama::update_cache.out)
from executorch.extension.llm.custom_ops import custom_ops
```
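
Since these imports matter only for their side effect of registering kernels, a script can attempt them up front and report which ones are available. A minimal sketch, assuming the module paths implied by the imports above; `import_optional` is a hypothetical helper, and it only catches `ImportError`, so other load failures would still surface.

```python
import importlib


def import_optional(module_names):
    """Try to import each module by name; return {name: True/False}."""
    loaded = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            loaded[name] = True
        except ImportError:
            loaded[name] = False
    return loaded


# Importing these modules is what registers the extra kernels.
status = import_optional([
    "executorch.kernels.quantized",
    "executorch.extension.llm.custom_ops.custom_ops",
])
for name, ok in status.items():
    print(f"{name}: {'imported' if ok else 'not available'}")
```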

.claude/skills/building/SKILL.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
name: building
description: Build ExecuTorch runners or C++ libraries. Use when compiling runners for Llama, Whisper, or other models, or building the C++ runtime.
---

# Building

## Runners (Makefile)
```bash
make help              # list all targets
make llama-cpu         # Llama
make whisper-metal     # Whisper on Metal
make gemma3-cuda       # Gemma3 on CUDA
```

Output: `cmake-out/examples/models/<model>/<runner>`

## C++ Libraries (CMake)
```bash
cmake --list-presets                          # list presets
cmake --workflow --preset llm-release         # LLM CPU
cmake --workflow --preset llm-release-metal   # LLM Metal
```
```

.claude/skills/export/SKILL.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
name: export
description: Export a PyTorch model to .pte format for ExecuTorch. Use when converting models, lowering to edge, or generating .pte files.
---

# Export

## Basic pattern
```python
from executorch.exir import to_edge_transform_and_lower
from torch.export import export

# example_inputs is a tuple of sample tensors matching the model's forward signature
exported = export(model.eval(), example_inputs)
edge = to_edge_transform_and_lower(exported)
with open("model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```

## Model-specific scripts
| Model | Script |
|-------|--------|
| Llama | `examples/models/llama/export_llama.py` |
| Whisper | `examples/models/whisper/` |
| Parakeet | `examples/models/parakeet/export_parakeet_tdt.py` |

## Debugging
- Non-strict export: `export(model, inputs, strict=False)`
- tlparse: `TORCH_LOGS="+dynamo,+export" python script.py 2>&1 | tlparse`

.claude/skills/profile/SKILL.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
name: profile
description: Profile ExecuTorch model execution. Use when measuring performance, analyzing operator timing, or debugging slow models.
---

# Profile

## 1. Enable ETDump when loading
```python
program = runtime.load_program("model.pte", enable_etdump=True, debug_buffer_size=int(1e7))
```

## 2. Execute and save
```python
outputs = program.load_method("forward").execute(inputs)
program.write_etdump_result_to_file("etdump.etdp", "debug.bin")
```

## 3. Analyze with Inspector
```python
from executorch.devtools import Inspector

inspector = Inspector(etrecord="model.etrecord", etdump_path="etdump.etdp")
inspector.print_data_tabular()
```

.claude/skills/setup/SKILL.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
---
name: setup
description: Set up ExecuTorch development environment. Use when installing dependencies, setting up conda environments, or preparing to develop with ExecuTorch.
---

# Setup

1. Activate conda: `conda activate executorch`
   - If not found: `conda env list | grep -E "(executorch|et)"`

2. Install executorch: `./install_executorch.sh`

3. (Optional) For Hugging Face integration:
   - Read commit from `.ci/docker/ci_commit_pins/optimum-executorch.txt`
   - Install: `pip install git+https://github.com/huggingface/optimum-executorch.git@<COMMIT>`
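
The optional step 3 can be scripted so that the pinned commit is never copied by hand. A minimal sketch, assuming the pin file contains just the commit hash (possibly with surrounding whitespace); `pip_spec_from_pin` is a hypothetical helper, and the repository URL is the one from the install command above.

```python
from pathlib import Path

REPO_URL = "https://github.com/huggingface/optimum-executorch.git"


def pip_spec_from_pin(pin_file):
    """Build the pip requirement string from a file holding a commit hash."""
    commit = Path(pin_file).read_text().strip()
    if not commit:
        raise ValueError(f"{pin_file} is empty")
    return f"git+{REPO_URL}@{commit}"


# Run from the repository root; does nothing if the pin file is absent.
pin = Path(".ci/docker/ci_commit_pins/optimum-executorch.txt")
if pin.exists():
    print("pip install", pip_spec_from_pin(pin))
```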

.claude/tokenizers.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
# Tokenizers

C++ tokenizer implementations with Python bindings. Located in `extension/llm/tokenizers/`.

## Installation
```bash
pip install -e ./extension/llm/tokenizers/
```

## Python API

```python
from pytorch_tokenizers import get_tokenizer

# Auto-detect tokenizer type from file
tokenizer = get_tokenizer("path/to/tokenizer.model")  # or .json

# Encode/decode
tokens = tokenizer.encode("Hello world")
text = tokenizer.decode(tokens)
```

## Available Tokenizers

| Class | Format | Use Case |
|-------|--------|----------|
| `HuggingFaceTokenizer` | `.json` | HuggingFace models |
| `TiktokenTokenizer` | `.model` | OpenAI/Llama 3 |
| `Llama2cTokenizer` | `.model` | Llama 2, SentencePiece |
| `CppSPTokenizer` | `.model` | SentencePiece (C++) |

## Direct Usage

```python
from pytorch_tokenizers import HuggingFaceTokenizer, TiktokenTokenizer, Llama2cTokenizer

# HuggingFace (tokenizer.json)
tokenizer = HuggingFaceTokenizer("tokenizer.json", "tokenizer_config.json")

# Tiktoken (Llama 3, etc.)
tokenizer = TiktokenTokenizer(model_path="tokenizer.model")

# Llama2c/SentencePiece
tokenizer = Llama2cTokenizer(model_path="tokenizer.model")
```

## C++ Tokenizers

For C++ runners, include headers from `extension/llm/tokenizers/include/`:
- `hf_tokenizer.h` - HuggingFace
- `tiktoken.h` - Tiktoken
- `sentencepiece.h` - SentencePiece
- `llama2c_tokenizer.h` - Llama2c
- `tekken.h` - Mistral Tekken v7
