diff --git a/docs/architecture-review-and-rust-analysis.md b/docs/architecture-review-and-rust-analysis.md
new file mode 100644
index 000000000..2405d5658
--- /dev/null
+++ b/docs/architecture-review-and-rust-analysis.md
@@ -0,0 +1,252 @@
+# SIMPLER / PTO Runtime — Architecture Review & Rust-Suitability Analysis
+
+A review of the `simpler` project (HiSilicon PTO Runtime): the 7-layer level
+model, the chip-level three-program model, architectural diagrams, and a
+component-by-component analysis of where Rust would (and would not) help.
+
+> Scope note: this is an external review for discussion. The project today is
+> ~131 k LOC C++ + ~40 k LOC Python, **zero Rust**. Nothing here proposes a
+> rewrite; it maps where Rust's guarantees would pay off if components were
+> (re)written, and labels each with the **single dominant reason**.
+
+![SIMPLER architecture: L0–L6 level model, engine + three-program model, and Rust-suitability map](diagrams/architecture.svg)
+
+> The three panels above are rendered from [`diagrams/make_diagrams.py`](diagrams/make_diagrams.py)
+> (`python3 docs/diagrams/make_diagrams.py` regenerates `diagrams/architecture.svg`). The ASCII
+> versions below are the same diagrams inline.
+
+---
+
+## 1. What SIMPLER is
+
+A **task-graph runtime** that builds and executes DAGs of compute tasks on
+Ascend NPU clusters, coordinating **AICPU** (on-device control processor) and
+**AICore** (AIV vector / AIC cube compute) execution. Three independently
+compiled programs — Host `.so`, AICPU `.so`, AICore `.o` — cooperate through
+narrow C APIs, with a Python orchestration layer on top.
+
+Two orthogonal axes structure the codebase:
+
+- **Level**: L0 (core) → L6 (cluster) — a 7-layer hierarchy mirroring physical
+  topology.
+- **Program**: Host / AICPU / AICore — the three-program model at the L2 chip
+  boundary.
+
+---
+
+## 2. The 7-layer level model (L0–L6)
+
+```text
+ LEVEL   NAME              UNIT                         RUNTIME COMPONENT             WORLD
+ ─────   ────────────────  ───────────────────────────  ────────────────────────────  ─────────────
+ L6   ▒  CLOS2 / Cluster   full cluster (N6 super-nodes) Worker(level=5) ×N            ┐
+ L5   ▒  CLOS1 / SuperNode super-node (N5 pods)          Worker(level=4) ×N            │ HOST / CLUSTER
+ L4   ▒  POD   / Pod       pod (4 hosts)                 Worker(level=3) ×N + Sub ×M   │ (Orchestrator +
+ L3   ▒  HOST  / Node      one host (16 chips + M subs)  ChipWorker ×N + SubWorker ×M  │  Scheduler + Worker,
+         ─────────────────────────────────────────────────────────────────────────── │  IPC / RoCE / HCCS)
+ L2   █  CHIP  / Processor one NPU chip (shared GM)      Host.so + AICPU.so + AICore.o ┘ ← THE BOUNDARY
+ L1   ░  DIE   / L2Cache   chip die                      hardware-managed              ┐ ON-DEVICE
+ L0   ░  CORE  / AIV,AIC   individual compute core       hardware-managed              ┘ (shared GM + atomics)
+```
+
+**L2 is the boundary** between two worlds:
+
+- **L0–L2 (on-device)**: AICPU scheduler + AICore workers + device Global
+  Memory. Coordination by shared GM, atomics, barriers, and the
+  AICPU↔AICore **handshake protocol**. Hard hardware constraints apply
+  (e.g. AICore *cannot* write `DATA_MAIN_BASE`; MMIO reads are strictly serial
+  at ~95 ns each).
+- **L3–L6 (host/cluster)**: every level runs the **same** scheduling engine —
+  one `Worker` C++ class handles L3 upward; `level` is just a diagnostic label.
+  Composition is **recursive**: a parent Worker schedules child Workers through
+  the identical mailbox protocol L3 uses for chip children. Local composition
+  uses fork + shared memory; cross-host (L4–L6) uses RoCE / HCCS / UB / sockets.
+
+Maturity, per the docs: L3 is implemented; L4 is implemented locally and simulated
+remotely; L5/L6 reuse the L4 code path (untested), and their remote mode is only proposed.
+
+---
+
+## 3. The three engine components (L3+) and the L2 three-program model
+
+Every level from L3 up composes three cooperating components, each on its own thread(s):
+
+```text
+   ORCHESTRATOR (Orch thread)        SCHEDULER (Scheduler thread)     WORKER (Worker threads)
+   ─────────────────────────         ───────────────────────────      ──────────────────────
+   DAG builder. Runs on the          DAG executor. Drains 3 queues:    Execution layer.
+   user's thread. Owns:              · wiring  (wire fanout edges)     WorkerManager holds
+   · Ring     (slot pool)            · ready   (fanin satisfied →      WorkerThread pools.
+   · TensorMap(dep inference)                   pick idle worker)      Each encodes (callable,
+   · Scope    (tensor lifetime)      · completion (release fanout)     config, args) into a shm
+                                      Never inspects task data —        mailbox → signals the
+   submit_next_level(c, args, cfg)   only moves slot ids + reads        forked child → spin-polls
+     → alloc, dep-infer, push         TaskSlotState metadata.            TASK_DONE.
+       wiring_queue ─────────────────►                ─────────────────►
+                                                       ◄──── completion (slot, outcome)
+```
+
+At **L2**, the "Worker" leaf is a `ChipWorker` that drives the three chip-level
+programs (Host `.so`, AICPU `.so`, AICore `.o`):
+
+```text
+        ┌──────────────────────── Python application / SceneTestCase ───────────────────────┐
+        │  nanobind (task_interface)   ChipWorker(dlopen host.so)   RuntimeBuilder/KernelCompiler │
+        └───────────────┬───────────────────────┬───────────────────────────┬─────────────────┘
+                        │                        │                           │ (compile)
+                        ▼                        ▼                           ▼
+        ┌──────── Host Runtime (C++ .so) ────────┐              ┌──── Binary data (AICPU.so + AICore.o) ────┐
+        │ DeviceRunner · MemoryAllocator · C API │  loads ──►   │   dlopen'd / launched at runtime          │
+        └───────────────────────┬────────────────┘              └───────────────────┬───────────────────────┘
+                                │                                                    │
+                                ▼                       Ascend device                ▼
+                 ┌──────────────────────────────────────────────────────────────────────────┐
+                 │  AICPU: task scheduler loop   ◄── handshake buffers (aicpu_ready /         │
+                 │  AICore (AIV/AIC): kernels         aicore_done / task ptr) ──►  compute    │
+                 └──────────────────────────────────────────────────────────────────────────┘
+```
+
+**Two platform backends** (`onboard/` real hardware, `sim/` thread-based host
+simulation) and **two runtimes** (`host_build_graph` = graph built on host CPU,
+for dev/debug; `tensormap_and_ringbuffer` = graph built on AICPU/device, for
+production) sit under `src/{arch}/{platform,runtime}/`.
+
+**Python/C++ division** (from the docs): *Python decides **when**
+(fork ordering, `SharedMemory` lifecycle, callable registration); C++ decides
+**how fast** (threading, atomics, zero-copy dispatch).*
+
+---
+
+## 4. Where Rust fits — component-by-component
+
+Reading the layers top-to-bottom, here is each component, its current
+language, and the **one dominant reason** Rust would or would not help. The
+bracketed label (e.g. `[FFI-COST]`) is the headline reason.
+
+```text
+ ════════════════════════════════════════════════════════════════════════════════════════════
+ LAYER / COMPONENT                       TODAY    RUST?         DOMINANT REASON (label)
+ ════════════════════════════════════════════════════════════════════════════════════════════
+ L3–L6 orchestration / user DAG fn       Python   ✗ keep Py    [ERGONOMICS] dynamic user API,
+   (python/simpler/{worker,orchestrator})                       fork timing, torch interop
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ Scheduler engine (queues, dispatch      C++      ✓✓ STRONG    [CONCURRENCY-SAFETY] lock-free-ish
+   loop, TaskSlotState)                                          queues across Orch/Sched/Worker
+   src/common/hierarchical/scheduler                            threads — data races are the bug
+                                                                 class Rust's Send/Sync removes
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ Orchestrator (Ring, TensorMap, Scope,   C++      ✓✓ STRONG    [LIFETIME-SAFETY] Scope = tensor
+   slot state machine)                                          lifetimes + slot reuse; the exact
+   src/common/hierarchical/{ring,                                use-after-free / aliasing class
+   tensormap,scope,orchestrator}                                 ownership/borrow encodes
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ WorkerManager / WorkerThread + shm      C++      ✓ MODERATE   [CONCURRENCY-SAFETY] thread pool +
+   mailbox dispatch                                             mailbox state machine; but raw shm
+   src/common/hierarchical/worker_manager                       + fork interop needs heavy `unsafe`
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ Remote L3 transport: endpoint + wire    C++      ✓✓ STRONG    [PARSING-SAFETY] versioned frame
+   codec (RoCE/HCCS/UB/sockets)                                 codec over the network = untrusted
+   src/common/hierarchical/remote_{endpoint,wire}               bytes; Rust parsers reject malformed
+                                                                 input without memory-unsafety
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ Host Runtime: DeviceRunner,             C++      ~ WEAK       [FFI-COST] thin wrapper over CANN
+   MemoryAllocator, C API                                       C SDK (rtSetDevice, dlsym); Rust
+   src/{arch}/platform/*/host                                   adds FFI noise for little safety win
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ AICPU scheduler kernel (device .so)     C++      ~ WEAK*      [TOOLCHAIN] must compile with CANN's
+   src/{arch}/platform/*/aicpu                                  AICPU toolchain; no Rust target.
+                                                                 *Logic is race-heavy → Rust would
+                                                                  help IF a target existed
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ AICore compute kernel (device .o)       C++/PTO  ✗ NO         [NO-BACKEND] PTO ISA via CCEC; no
+   src/{arch}/platform/*/aicore                                 Rust/LLVM backend for AICore. This
+                                                                 is the kernel-safety story of the
+                                                                 *other* project (ascend-rs), not a
+                                                                 runtime concern
+ ────────────────────────────────────────────────────────────────────────────────────────────
+ Python↔C++ bindings (nanobind)          C++      ~ WEAK       [INTEROP] nanobind is mature; a Rust
+   python/bindings/task_interface.cpp                          PyO3 equivalent only pays off if the
+                                                                 bound engine is already Rust
+ ════════════════════════════════════════════════════════════════════════════════════════════
+```
+
+### The labels, expanded
+
+- **[CONCURRENCY-SAFETY] — Scheduler (strongest case) & WorkerManager.**
+  The Scheduler runs a dedicated thread draining three queues shared with the
+  Orch thread and N WorkerThreads, coordinated by mutex+CV and atomics over
+  `TaskSlotState`. Data races and torn reads of slot state are the bug class
+  that Rust's `Send`/`Sync` + borrow checker turn into *compile errors*; missed
+  wakeups are logic bugs that still need review and tests. **Main reason to use
+  Rust: fearless concurrency on the hot scheduling path.**
+
+- **[LIFETIME-SAFETY] — Orchestrator (Ring / TensorMap / Scope).**
+  `Scope` manages intermediate-tensor lifetimes; `Ring` reuses fixed slots with
+  back-pressure; `TensorMap` maps a producer slot to consumers. A slot freed
+  while a downstream consumer still references it is a use-after-free — the
+  *same* hazard class the companion `ascend-rs` work shows Rust ownership
+  rejects at compile time. **Main reason: if slots are owned handles rather than
+  raw indices, ownership/lifetimes make slot-reuse-after-free unrepresentable.**
+
+- **[PARSING-SAFETY] — Remote L3 wire codec.**
+  `remote_wire.cpp` is a versioned frame codec for cross-host task frames over
+  RoCE/HCCS/UB/sockets — i.e. it decodes **bytes off the network**. Hand-rolled
+  C++ binary parsers are a perennial CVE source (overreads, length confusion).
+  **Main reason: safe decoding of untrusted/versioned input.**
+
+- **[FFI-COST] — Host Runtime / C API.** `DeviceRunner` is a thin handle-based
+  wrapper over CANN C calls (`rtSetDevice`, stream sync, `dlsym`). Rewriting in
+  Rust means wrapping every CANN call it uses in `extern "C"` + `unsafe`; the safety
+  upside is small and the FFI tax is real. **Main reason *not* to: it's mostly FFI
+  glue, where `unsafe` call sites leave Rust's guarantees little to check.**
+
+- **[TOOLCHAIN] — AICPU kernel.** Logically this is *also* a race-heavy
+  scheduler (it would benefit from Rust), but it must be built by CANN's AICPU
+  compiler; there is no Rust target for the AICPU. **Main reason *not* to:
+  no toolchain, regardless of merit.**
+
+- **[NO-BACKEND] — AICore kernel.** Compiled to PTO ISA via CCEC; no
+  Rust/LLVM AICore backend exists. (This is exactly the boundary the *separate*
+  `ascend-rs` project addresses with a shape-typed Rust model + an IR-level
+  oracle — out of scope for this runtime.) **Main reason *not* to: no code
+  generation path to the device.**
+
+- **[ERGONOMICS] — Python orchestration layer.** The user writes orch
+  functions in Python; the layer also owns `fork()` timing, `SharedMemory`
+  alloc/unlink, and torch zero-copy interop. This is "decide *when*" glue where
+  Python's dynamism and ecosystem win. **Main reason to keep Python: user-facing
+  API + lifecycle orchestration, not throughput.**
+
+---
+
+## 5. Summary picture — Rust suitability over the architecture
+
+```text
+                         RUST SUITABILITY  (██ strong · ▓ moderate · ░ weak/no)
+ ┌─────────────────────────────────────────────────────────────────────────────────┐
+ │ L3–L6  Python orchestration ........... ░  keep Python   [ERGONOMICS]             │
+ │ ┌──────────────────────── host/cluster engine (C++) ───────────────────────────┐ │
+ │ │ Scheduler (queues, dispatch) ........ ██  STRONG        [CONCURRENCY-SAFETY]   │ │
+ │ │ Orchestrator (Ring/TensorMap/Scope) . ██  STRONG        [LIFETIME-SAFETY]      │ │
+ │ │ Remote wire codec / endpoint ........ ██  STRONG        [PARSING-SAFETY]       │ │
+ │ │ WorkerManager / mailbox dispatch .... ▓   MODERATE      [CONCURRENCY-SAFETY]   │ │
+ │ │ Host Runtime / DeviceRunner / C API . ░   weak          [FFI-COST]             │ │
+ │ │ nanobind bindings ................... ░   weak          [INTEROP]              │ │
+ │ └──────────────────────────────────────────────────────────────────────────────┘ │
+ │ ════════════════════════ L2 device boundary ════════════════════════════════════ │
+ │ AICPU scheduler kernel ................ ░   blocked       [TOOLCHAIN]              │
+ │ AICore compute kernel (PTO ISA) ....... ░   no            [NO-BACKEND]             │
+ └─────────────────────────────────────────────────────────────────────────────────┘
+```
+
+**Bottom line.** The high-value Rust targets are the **host-side coordination
+core** — Scheduler, Orchestrator, and the remote wire codec — where the
+dominant bug classes are concurrency races, slot-lifetime use-after-free, and
+unsafe parsing of untrusted input. Rust rejects the first two at compile time
+and turns the third into checked runtime errors. The device side
+(AICPU/AICore) is blocked by toolchain/backend availability, not by merit, and
+the Python layer is best left as the ergonomic "when" layer. If a rewrite were
+pursued, a pragmatic first step would be a single Rust crate replacing
+`src/common/hierarchical/` (scheduler + orchestrator + ring/tensormap/scope +
+remote_wire), exposed to the existing Python via PyO3, leaving the CANN FFI
+host runtime and the device kernels in C++.
diff --git a/docs/diagrams/architecture.svg b/docs/diagrams/architecture.svg
new file mode 100644
index 000000000..be67b7327
--- /dev/null
+++ b/docs/diagrams/architecture.svg
@@ -0,0 +1,11 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="1180" height="1328" viewBox="0 0 1180 1328" font-family="DejaVu Sans, sans-serif">
+<defs><marker id="arr" markerWidth="9" markerHeight="9" refX="7" refY="4.5" orient="auto"><path d="M0,0 L9,4.5 L0,9 z" fill="#8b949e"/></marker></defs>
+<rect width="1180" height="1328" fill="#0e1116"/>
+<text x="30" y="-6" fill="#e6edf3"></text>
+<rect x="14" y="18" width="1152" height="358" rx="10" fill="#161b22" opacity="0.35"/>
+<g transform="translate(0,28)"><text x="30" y="34" fill="#e6edf3" font-size="20" font-weight="700" font-family="DejaVu Sans">1 · Level model — the 7-layer hierarchy (L0–L6)</text><rect x="30" y="56" width="40" height="30" rx="5" fill="#1f4e6b"/><text x="50" y="77" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L6</text><text x="86" y="77" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">CLOS2 / Cluster</text><text x="300" y="77" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">full cluster (N6 super-nodes)</text><text x="600" y="77" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Worker(level=5) ×N</text><rect x="30" y="92" width="40" height="30" rx="5" fill="#1f4e6b"/><text x="50" y="113" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L5</text><text x="86" y="113" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">CLOS1 / SuperNode</text><text x="300" y="113" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">super-node (N5 pods)</text><text x="600" y="113" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Worker(level=4) ×N</text><rect x="30" y="128" width="40" height="30" rx="5" fill="#1f4e6b"/><text x="50" y="149" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L4</text><text x="86" y="149" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">POD / Pod</text><text x="300" y="149" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, 
sans-serif">pod (4 hosts)</text><text x="600" y="149" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Worker(level=3) ×N + Sub ×M</text><rect x="30" y="164" width="40" height="30" rx="5" fill="#1f4e6b"/><text x="50" y="185" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L3</text><text x="86" y="185" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">HOST / Node</text><text x="300" y="185" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">one host (16 chips + M subs)</text><text x="600" y="185" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">ChipWorker ×N + SubWorker ×M</text><rect x="30" y="200" width="40" height="30" rx="5" fill="#7c4a2d"/><text x="50" y="221" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L2</text><text x="86" y="221" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">CHIP / Processor</text><text x="300" y="221" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">one NPU chip (shared GM)</text><text x="600" y="221" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Host.so + AICPU.so + AICore.o</text><rect x="30" y="236" width="40" height="30" rx="5" fill="#7c4a2d"/><text x="50" y="257" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L1</text><text x="86" y="257" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">DIE / L2Cache</text><text x="300" y="257" fill="#8b949e" font-size="12.5" font-weight="400" 
text-anchor="start" font-family="DejaVu Sans, sans-serif">chip die</text><text x="600" y="257" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">hardware-managed</text><rect x="30" y="272" width="40" height="30" rx="5" fill="#7c4a2d"/><text x="50" y="293" fill="#e6edf3" font-size="14" font-weight="700" text-anchor="middle" font-family="DejaVu Sans, sans-serif">L0</text><text x="86" y="293" fill="#e6edf3" font-size="14" font-weight="600" text-anchor="start" font-family="DejaVu Sans, sans-serif">CORE / AIV, AIC</text><text x="300" y="293" fill="#8b949e" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">individual compute core</text><text x="600" y="293" fill="#e6edf3" font-size="12.5" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">hardware-managed</text><line x1="30" y1="197" x2="980" y2="197" stroke="#2ea043" stroke-width="2" stroke-dasharray="7 4"/><text x="985" y="191" fill="#2ea043" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">◄ L2 BOUNDARY</text><text x="985" y="80" fill="#1f4e6b" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">HOST / CLUSTER</text><text x="985" y="96" fill="#8b949e" font-size="11" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Orch+Sched+Worker</text><text x="985" y="110" fill="#8b949e" font-size="11" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">IPC · RoCE · HCCS</text><text x="985" y="237" fill="#7c4a2d" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">ON-DEVICE</text><text x="985" y="253" fill="#8b949e" font-size="11" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">shared GM + atomics</text></g>
+<rect x="14" y="386" width="1152" height="488" rx="10" fill="#161b22" opacity="0.35"/>
+<g transform="translate(0,396)"><text x="30" y="34" fill="#e6edf3" font-size="20" font-weight="700" font-family="DejaVu Sans">2 · Engine components (L3+) and the L2 three-program model</text><rect x="30" y="56" width="360" height="96" rx="8" fill="#161b22" stroke="#b7791f" stroke-width="1"/><text x="210.0" y="100" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">ORCHESTRATOR</text><text x="210.0" y="118" fill="#8b949e" font-size="11.5" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">Orch thread · DAG builder</text><text x="46" y="118" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Ring · TensorMap · Scope</text><text x="46" y="136" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">submit_next_level(c, args, cfg)</text><rect x="410" y="56" width="360" height="96" rx="8" fill="#161b22" stroke="#2f6f4f" stroke-width="1"/><text x="590.0" y="100" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">SCHEDULER</text><text x="590.0" y="118" fill="#8b949e" font-size="11.5" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">Scheduler thread · DAG executor</text><text x="426" y="118" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">wiring → ready → completion queues</text><text x="426" y="136" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">moves slot ids; never reads task data</text><rect x="790" y="56" width="360" height="96" rx="8" fill="#161b22" stroke="#3a5f8a" stroke-width="1"/><text x="970.0" y="100" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">WORKER</text><text x="970.0" y="118" fill="#8b949e" font-size="11.5" 
text-anchor="middle" font-family="DejaVu Sans Mono, monospace">Worker threads · execution</text><text x="806" y="118" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">WorkerManager + WorkerThread pool</text><text x="806" y="136" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">shm mailbox → forked child → poll</text><line x1="390" y1="104" x2="410" y2="104" stroke="#b7791f" stroke-width="2" marker-end="url(#arr)"/><line x1="770" y1="104" x2="790" y2="104" stroke="#2f6f4f" stroke-width="2" marker-end="url(#arr)"/><text x="580" y="100" fill="#8b949e" font-size="10" font-weight="400" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">wiring</text><text x="960" y="100" fill="#8b949e" font-size="10" font-weight="400" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">dispatch</text><line x1="790" y1="132" x2="770" y2="132" stroke="#3a5f8a" stroke-width="1.4" stroke-dasharray="4 3" marker-end="url(#arr)"/><text x="960" y="146" fill="#8b949e" font-size="10" font-weight="400" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">◄ completion (slot, outcome)</text><text x="30" y="196" fill="#8b949e" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">At L2 the Worker leaf = ChipWorker driving three on-device programs:</text><rect x="30" y="214" width="1120" height="40" rx="8" fill="#161b22" stroke="#30363d" stroke-width="1"/><text x="590.0" y="238" fill="#e6edf3" font-size="13" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">Python application / SceneTestCase  —  nanobind · ChipWorker (dlopen host.so) · RuntimeBuilder / KernelCompiler</text><line x1="590" y1="254" x2="590" y2="280" stroke="#8b949e" stroke-width="2" marker-end="url(#arr)"/><rect x="30" y="286" width="540" height="64" rx="8" fill="#161b22" stroke="#6e3b3b" stroke-width="1"/><text x="300.0" 
y="314" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">Host Runtime (C++ .so)</text><text x="300.0" y="332" fill="#8b949e" font-size="11.5" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">DeviceRunner · MemoryAllocator · C API</text><rect x="610" y="286" width="540" height="64" rx="8" fill="#161b22" stroke="#30363d" stroke-width="1"/><text x="880.0" y="314" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">Binary data</text><text x="880.0" y="332" fill="#8b949e" font-size="11.5" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">AICPU .so  +  AICore .o   (dlopen&#x27;d / launched)</text><line x1="300" y1="350" x2="300" y2="378" stroke="#8b949e" stroke-width="2" marker-end="url(#arr)"/><line x1="880" y1="350" x2="880" y2="378" stroke="#8b949e" stroke-width="2" marker-end="url(#arr)"/><rect x="30" y="384" width="1120" height="70" rx="8" fill="#10222b" stroke="#7c4a2d" stroke-width="1"/><text x="590.0" y="424" fill="#e6edf3" font-size="15" font-weight="600" text-anchor="middle" font-family="DejaVu Sans, sans-serif">Ascend device</text><text x="60" y="412" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">AICPU: task scheduler loop</text><text x="60" y="432" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">AICore (AIV/AIC): compute kernels</text><text x="720" y="412" fill="#8b949e" font-size="12" font-weight="400" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">◄── handshake buffers ──►</text><text x="720" y="432" fill="#8b949e" font-size="11" font-weight="400" text-anchor="middle" font-family="DejaVu Sans Mono, monospace">aicpu_ready · aicore_done · task ptr</text></g>
+<rect x="14" y="884" width="1152" height="424" rx="10" fill="#161b22" opacity="0.35"/>
+<g transform="translate(0,894)"><text x="30" y="34" fill="#e6edf3" font-size="20" font-weight="700" font-family="DejaVu Sans">3 · Rust-suitability map — dominant reason per component</text><text x="30" y="56" fill="#8b949e" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">COMPONENT</text><text x="560" y="56" fill="#8b949e" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">TODAY</text><text x="660" y="56" fill="#8b949e" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">RUST?</text><text x="830" y="56" fill="#8b949e" font-size="12" font-weight="700" text-anchor="start" font-family="DejaVu Sans, sans-serif">DOMINANT REASON</text><rect x="30" y="66" width="8" height="26" rx="2" fill="#6e3b3b"/><text x="50" y="84" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">L3–L6 Python orchestration</text><text x="560" y="84" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">Python</text><text x="660" y="84" fill="#8b949e" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">░ keep Python</text><text x="830" y="84" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[ERGONOMICS]</text><line x1="30" y1="98" x2="1150" y2="98" stroke="#30363d" stroke-width="1"/><rect x="30" y="100" width="8" height="26" rx="2" fill="#2ea043"/><text x="50" y="118" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">Scheduler (queues, dispatch loop)</text><text x="560" y="118" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="118" fill="#2ea043" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, 
monospace">██ STRONG</text><text x="830" y="118" fill="#e6edf3" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[CONCURRENCY-SAFETY]</text><rect x="30" y="134" width="8" height="26" rx="2" fill="#2ea043"/><text x="50" y="152" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">Orchestrator (Ring · TensorMap · Scope)</text><text x="560" y="152" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="152" fill="#2ea043" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">██ STRONG</text><text x="830" y="152" fill="#e6edf3" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[LIFETIME-SAFETY]</text><rect x="30" y="168" width="8" height="26" rx="2" fill="#2ea043"/><text x="50" y="186" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">Remote L3 wire codec / endpoint</text><text x="560" y="186" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="186" fill="#2ea043" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">██ STRONG</text><text x="830" y="186" fill="#e6edf3" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[PARSING-SAFETY]</text><rect x="30" y="202" width="8" height="26" rx="2" fill="#9e6a1f"/><text x="50" y="220" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">WorkerManager / mailbox dispatch</text><text x="560" y="220" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="220" fill="#9e6a1f" font-size="12" font-weight="600" 
text-anchor="start" font-family="DejaVu Sans Mono, monospace">▓ MODERATE</text><text x="830" y="220" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[CONCURRENCY-SAFETY]</text><rect x="30" y="236" width="8" height="26" rx="2" fill="#6e3b3b"/><text x="50" y="254" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">Host Runtime / DeviceRunner / C API</text><text x="560" y="254" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="254" fill="#8b949e" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">░ weak</text><text x="830" y="254" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[FFI-COST]</text><rect x="30" y="270" width="8" height="26" rx="2" fill="#6e3b3b"/><text x="50" y="288" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">nanobind bindings</text><text x="560" y="288" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="288" fill="#8b949e" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">░ weak</text><text x="830" y="288" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[INTEROP]</text><line x1="30" y1="302" x2="1150" y2="302" stroke="#7c4a2d" stroke-width="2" stroke-dasharray="7 4"/><text x="1146" y="317" fill="#7c4a2d" font-size="11" font-weight="700" text-anchor="end" font-family="DejaVu Sans, sans-serif">L2 device boundary</text><rect x="30" y="304" width="8" height="26" rx="2" fill="#6e3b3b"/><text x="50" y="322" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, 
sans-serif">AICPU scheduler kernel (device)</text><text x="560" y="322" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++</text><text x="660" y="322" fill="#8b949e" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">░ blocked</text><text x="830" y="322" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[TOOLCHAIN]</text><rect x="30" y="338" width="8" height="26" rx="2" fill="#6e3b3b"/><text x="50" y="356" fill="#e6edf3" font-size="13" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">AICore compute kernel (PTO ISA)</text><text x="560" y="356" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">C++/PTO</text><text x="660" y="356" fill="#8b949e" font-size="12" font-weight="600" text-anchor="start" font-family="DejaVu Sans Mono, monospace">░ no</text><text x="830" y="356" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans Mono, monospace">[NO-BACKEND]</text><text x="30" y="390" fill="#8b949e" font-size="12" font-weight="400" text-anchor="start" font-family="DejaVu Sans, sans-serif">Strong targets = the host coordination core: races, slot-lifetime UAF, untrusted wire bytes — compile-time-eliminated by Rust.</text></g>
+</svg>
\ No newline at end of file
diff --git a/docs/diagrams/make_diagrams.py b/docs/diagrams/make_diagrams.py
new file mode 100644
index 000000000..e66561fef
--- /dev/null
+++ b/docs/diagrams/make_diagrams.py
@@ -0,0 +1,177 @@
+#!/usr/bin/env python3
+"""Render the SIMPLER architecture diagrams as a single self-contained SVG:
+  1. L0-L6 level model (the 7-layer hierarchy)
+  2. Three engine components + L2 three-program model
+  3. Rust-suitability map over the architecture
+Run: python3 make_diagrams.py  ->  architecture.svg
+Pure stdlib; no external deps. Edit here and re-run to regenerate.
+"""
+import html, os
+
+W = 1180  # canvas width in px; every panel spans the full width
+# Each panel_*() function below returns a (svg_fragment, height) tuple.
+
+# ---- palette ----
+BG = "#0e1116"; PANEL = "#161b22"; INK = "#e6edf3"; MUTE = "#8b949e"
+LINE = "#30363d"
+ON_DEV = "#7c4a2d"   # on-device (L0-L2) accent
+HOST = "#1f4e6b"     # host/cluster accent
+STRONG = "#2ea043"   # rust strong
+MOD = "#9e6a1f"      # rust moderate
+WEAK = "#6e3b3b"     # rust weak/no
+ORCH = "#b7791f"; SCHED = "#2f6f4f"; WORK = "#3a5f8a"
+
+
+def esc(s): return html.escape(str(s))
+
+
+def box(x, y, w, h, fill, text, sub="", rx=8, tcol=INK, scol=MUTE, ts=15, anchor="middle", stroke=LINE):
+    cx = x + w / 2 if anchor == "middle" else x + 12
+    out = [f'<rect x="{x}" y="{y}" width="{w}" height="{h}" rx="{rx}" fill="{fill}" stroke="{stroke}" stroke-width="1"/>']
+    ty = y + (h / 2 + ts / 2 - 3 if not sub else h / 2 - 4)
+    out.append(f'<text x="{cx}" y="{ty:.0f}" fill="{tcol}" font-size="{ts}" font-weight="600" text-anchor="{anchor}" font-family="DejaVu Sans, sans-serif">{esc(text)}</text>')
+    if sub:
+        out.append(f'<text x="{cx}" y="{y + h/2 + 14:.0f}" fill="{scol}" font-size="11.5" text-anchor="{anchor}" font-family="DejaVu Sans Mono, monospace">{esc(sub)}</text>')
+    return "".join(out)
+
+
+def label(x, y, text, col=INK, ts=13, anchor="start", weight="400", mono=False):
+    fam = "DejaVu Sans Mono, monospace" if mono else "DejaVu Sans, sans-serif"
+    return f'<text x="{x}" y="{y}" fill="{col}" font-size="{ts}" font-weight="{weight}" text-anchor="{anchor}" font-family="{fam}">{esc(text)}</text>'
+
+
+def arrow(x1, y1, x2, y2, col=MUTE, w=1.6, dash=""):
+    d = f' stroke-dasharray="{dash}"' if dash else ""
+    return f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="{col}" stroke-width="{w}"{d} marker-end="url(#arr)"/>'
+
+
+# ============================ Panel 1: L0-L6 level model ============================
+def panel_level():
+    h = 340
+    o = [f'<text x="30" y="34" fill="{INK}" font-size="20" font-weight="700" font-family="DejaVu Sans">1 · Level model — the 7-layer hierarchy (L0–L6)</text>']
+    rows = [
+        ("L6", "CLOS2 / Cluster", "full cluster (N6 super-nodes)", "Worker(level=5) ×N", HOST),
+        ("L5", "CLOS1 / SuperNode", "super-node (N5 pods)", "Worker(level=4) ×N", HOST),
+        ("L4", "POD / Pod", "pod (4 hosts)", "Worker(level=3) ×N + Sub ×M", HOST),
+        ("L3", "HOST / Node", "one host (16 chips + M subs)", "ChipWorker ×N + SubWorker ×M", HOST),
+        ("L2", "CHIP / Processor", "one NPU chip (shared GM)", "Host.so + AICPU.so + AICore.o", ON_DEV),
+        ("L1", "DIE / L2Cache", "chip die", "hardware-managed", ON_DEV),
+        ("L0", "CORE / AIV, AIC", "individual compute core", "hardware-managed", ON_DEV),
+    ]
+    y0, rh = 56, 36  # y of the first row, and row pitch (px)
+    for i, (lv, name, unit, comp, accent) in enumerate(rows):
+        y = y0 + i * rh
+        o.append(f'<rect x="30" y="{y}" width="40" height="{rh-6}" rx="5" fill="{accent}"/>')
+        o.append(label(50, y + 21, lv, INK, 14, "middle", "700"))
+        o.append(label(86, y + 21, name, INK, 14, weight="600"))
+        o.append(label(300, y + 21, unit, MUTE, 12.5))
+        o.append(label(600, y + 21, comp, INK, 12.5, mono=True))
+    # boundary line between L2 and L3 (after row idx 3)
+    by = y0 + 4 * rh - 3
+    o.append(f'<line x1="30" y1="{by}" x2="{W-200}" y2="{by}" stroke="{STRONG}" stroke-width="2" stroke-dasharray="7 4"/>')
+    o.append(label(W-195, by - 6, "◄ L2 BOUNDARY", STRONG, 12, "start", "700"))
+    # world brackets
+    o.append(label(W-195, y0 + 24, "HOST / CLUSTER", HOST, 12, "start", "700"))
+    o.append(label(W-195, y0 + 40, "Orch+Sched+Worker", MUTE, 11, mono=True))
+    o.append(label(W-195, y0 + 54, "IPC · RoCE · HCCS", MUTE, 11, mono=True))
+    o.append(label(W-195, by + 40, "ON-DEVICE", ON_DEV, 12, "start", "700"))
+    o.append(label(W-195, by + 56, "shared GM + atomics", MUTE, 11, mono=True))
+    return "".join(o), h
+
+
+# ============== Panel 2: three engine components + L2 three-program model ==============
+def panel_engine():
+    h = 470
+    o = [f'<text x="30" y="34" fill="{INK}" font-size="20" font-weight="700" font-family="DejaVu Sans">2 · Engine components (L3+) and the L2 three-program model</text>']
+    # three engine boxes
+    o.append(box(30, 56, 360, 96, PANEL, "ORCHESTRATOR", "Orch thread · DAG builder", stroke=ORCH))
+    o.append(label(46, 118, "Ring · TensorMap · Scope", MUTE, 12, mono=True))
+    o.append(label(46, 136, "submit_next_level(c, args, cfg)", MUTE, 12, mono=True))
+    o.append(box(410, 56, 360, 96, PANEL, "SCHEDULER", "Scheduler thread · DAG executor", stroke=SCHED))
+    o.append(label(426, 118, "wiring → ready → completion queues", MUTE, 12, mono=True))
+    o.append(label(426, 136, "moves slot ids; never reads task data", MUTE, 12, mono=True))
+    o.append(box(790, 56, 360, 96, PANEL, "WORKER", "Worker threads · execution", stroke=WORK))
+    o.append(label(806, 118, "WorkerManager + WorkerThread pool", MUTE, 12, mono=True))
+    o.append(label(806, 136, "shm mailbox → forked child → poll", MUTE, 12, mono=True))
+    o.append(arrow(390, 104, 410, 104, ORCH, 2))
+    o.append(arrow(770, 104, 790, 104, SCHED, 2))
+    o.append(label(580, 100, "wiring", MUTE, 10, "middle", mono=True))
+    o.append(label(960, 100, "dispatch", MUTE, 10, "middle", mono=True))
+    o.append(arrow(790, 132, 770, 132, WORK, 1.4, "4 3"))
+    o.append(label(960, 146, "◄ completion (slot, outcome)", MUTE, 10, "middle", mono=True))
+
+    # L2 three-program model below
+    o.append(label(30, 196, "At L2 the Worker leaf = ChipWorker driving three on-device programs:", MUTE, 13))
+    o.append(box(30, 214, 1120, 40, PANEL, "Python application / SceneTestCase  —  nanobind · ChipWorker (dlopen host.so) · RuntimeBuilder / KernelCompiler", "", ts=13))
+    o.append(arrow(590, 254, 590, 280, MUTE, 2))
+    o.append(box(30, 286, 540, 64, PANEL, "Host Runtime (C++ .so)", "DeviceRunner · MemoryAllocator · C API", stroke=WEAK))
+    o.append(box(610, 286, 540, 64, PANEL, "Binary data", "AICPU .so  +  AICore .o   (dlopen'd / launched)", stroke=LINE))
+    o.append(arrow(300, 350, 300, 378, MUTE, 2))
+    o.append(arrow(880, 350, 880, 378, MUTE, 2))
+    o.append(box(30, 384, 1120, 70, "#10222b", "Ascend device", "", stroke=ON_DEV))
+    o.append(label(60, 412, "AICPU: task scheduler loop", INK, 13))
+    o.append(label(60, 432, "AICore (AIV/AIC): compute kernels", INK, 13))
+    o.append(label(720, 412, "◄── handshake buffers ──►", MUTE, 12, "middle", mono=True))
+    o.append(label(720, 432, "aicpu_ready · aicore_done · task ptr", MUTE, 11, "middle", mono=True))
+    return "".join(o), h
+
+
+# ==================== Panel 3: Rust-suitability map ====================
+def panel_rust():
+    rows = [
+        ("L3–L6 Python orchestration", "Python", WEAK, "░ keep Python", "ERGONOMICS"),
+        ("Scheduler (queues, dispatch loop)", "C++", STRONG, "██ STRONG", "CONCURRENCY-SAFETY"),
+        ("Orchestrator (Ring · TensorMap · Scope)", "C++", STRONG, "██ STRONG", "LIFETIME-SAFETY"),
+        ("Remote L3 wire codec / endpoint", "C++", STRONG, "██ STRONG", "PARSING-SAFETY"),
+        ("WorkerManager / mailbox dispatch", "C++", MOD, "▓ MODERATE", "CONCURRENCY-SAFETY"),
+        ("Host Runtime / DeviceRunner / C API", "C++", WEAK, "░ weak", "FFI-COST"),
+        ("nanobind bindings", "C++", WEAK, "░ weak", "INTEROP"),
+        ("AICPU scheduler kernel (device)", "C++", WEAK, "░ blocked", "TOOLCHAIN"),
+        ("AICore compute kernel (PTO ISA)", "C++/PTO", WEAK, "░ no", "NO-BACKEND"),
+    ]
+    rh = 34  # row pitch (px)
+    h = 70 + len(rows) * rh + 30  # title/header band + rows + footer caption
+    o = [f'<text x="30" y="34" fill="{INK}" font-size="20" font-weight="700" font-family="DejaVu Sans">3 · Rust-suitability map — dominant reason per component</text>']
+    o.append(label(30, 56, "COMPONENT", MUTE, 12, "start", "700"))
+    o.append(label(560, 56, "TODAY", MUTE, 12, "start", "700"))
+    o.append(label(660, 56, "RUST?", MUTE, 12, "start", "700"))
+    o.append(label(830, 56, "DOMINANT REASON", MUTE, 12, "start", "700"))
+    y0 = 66
+    for i, (comp, today, col, verdict, reason) in enumerate(rows):
+        y = y0 + i * rh
+        if i == 1:  # divider before the host engine block
+            o.append(f'<line x1="30" y1="{y-2}" x2="{W-30}" y2="{y-2}" stroke="{LINE}" stroke-width="1"/>')
+        if i == 7:  # device boundary
+            o.append(f'<line x1="30" y1="{y-2}" x2="{W-30}" y2="{y-2}" stroke="{ON_DEV}" stroke-width="2" stroke-dasharray="7 4"/>')
+            o.append(label(W-34, y+13, "L2 device boundary", ON_DEV, 11, "end", "700"))
+        o.append(f'<rect x="30" y="{y}" width="8" height="{rh-8}" rx="2" fill="{col}"/>')
+        o.append(label(50, y + 18, comp, INK, 13))
+        o.append(label(560, y + 18, today, MUTE, 12, mono=True))
+        o.append(label(660, y + 18, verdict, col if col != WEAK else MUTE, 12, mono=True, weight="600"))
+        o.append(label(830, y + 18, "[" + reason + "]", INK if col == STRONG else MUTE, 12, mono=True))
+    o.append(label(30, y0 + len(rows)*rh + 18,
+                   "Strong targets = the host coordination core: races, slot-lifetime UAF, untrusted wire bytes — compile-time-eliminated by Rust.",
+                   MUTE, 12))
+    return "".join(o), h
+
+
+panels = [panel_level(), panel_engine(), panel_rust()]
+gap = 28  # vertical spacing around each panel (px)
+total_h = sum(h for _, h in panels) + gap * (len(panels) + 1)  # one gap above, between, and below
+
+parts = [
+    f'<svg xmlns="http://www.w3.org/2000/svg" width="{W}" height="{total_h}" viewBox="0 0 {W} {total_h}" font-family="DejaVu Sans, sans-serif">',
+    f'<defs><marker id="arr" markerWidth="9" markerHeight="9" refX="7" refY="4.5" orient="auto"><path d="M0,0 L9,4.5 L0,9 z" fill="{MUTE}"/></marker></defs>',
+    f'<rect width="{W}" height="{total_h}" fill="{BG}"/>',
+    f'<text x="30" y="-6" fill="{INK}"></text>',
+]
+cy = gap
+for frag, hh in panels:
+    parts.append(f'<rect x="14" y="{cy-10}" width="{W-28}" height="{hh+18}" rx="10" fill="{PANEL}" opacity="0.35"/>')
+    parts.append(f'<g transform="translate(0,{cy})">{frag}</g>')
+    cy += hh + gap
+parts.append("</svg>")
+
+out = os.path.join(os.path.dirname(os.path.abspath(__file__)), "architecture.svg")
+with open(out, "w", encoding="utf-8") as f: f.write("\n".join(parts))  # SVG text contains non-ASCII glyphs
+print("wrote", out, f"({total_h}px tall)")