Commit 57736af
WeightsCacheManager: per-cache-path instances (#20337)
Summary:
Split `XNNWeightsCache` out of its de-facto singleton lifetime (one instance per `XnnpackBackend`, shared across every PTE that opted into the file-backed cache) into a per-cache-file-path instance dispensed by a new `XNNWeightsCacheManager`. Mirrors the `XNNWorkspaceManager` PerModel pattern: same path → same shared instance; different paths → independent instances; empty path → one shared heap-only instance so XNNPACK's in-memory name dedup still works across PTEs.
The singleton design is forced today because `XnnpackBackendOptions::weights_cache_` is a by-value member and `XnnpackBackend` itself is a namespace-scope global (`XNNPACKBackend.cpp:246`). For PTEs that genuinely share a cache file the singleton's per-entry refcounting works, but two PTEs with **different** packed-cache paths hit the same `XNNWeightsCache` instance, so the second PTE's `initialize_for_runtime` calls `ftruncate(0)` on the file under the first PTE's still-live mmap regions, and every subsequent access to those regions raises SIGBUS (P2369924970 traces this). The prior fix only added a warning in `XNNWeightsCache.h:147-151` ("the path MUST be unique per `XNNWeightsCache` instance"); this change enforces that rule at the manager level rather than relying on each caller to honor it.
Design:
`XNNWeightsCacheManager` (new, mirrors `XNNWorkspaceManager`):
- `std::unordered_map<std::string, std::weak_ptr<XNNWeightsCache>> caches_` keyed by absolute cache file path.
- `get_or_create(path)` looks up the path under a single `meta_mutex_`; if a live `weak_ptr` exists, returns the shared instance, otherwise constructs a new one, calls `set_packed_cache_path(path)` BEFORE registering it, and stores a `weak_ptr`.
- `meta_mutex_` is held only during the map op — never across any call into `XNNWeightsCache`, so different-path callers proceed in parallel after the brief window.
- `save_all()` snapshots live shared_ptrs under `meta_mutex_`, then iterates outside the meta lock and acquires each instance's own mutex around `save_packed_index()`.
- Empty path uses a separate `empty_path_cache_` (weak_ptr) + `empty_path_mutex_`: all heap-only callers (NGTTS sub-runners, FLLM classifier, PLLM methods when mmap MC is off) share one instance so XNNPACK's in-memory `look_up_or_insert` name dedup catches duplicate weights across PTEs and across methods within a PTE. Without this sharing, every `XnnpackBackend::init` allocated its own packed copy of every weight, regressing heap-only memory by ~500 MB on LoRA-multimethod PLLM (paste P2380809516: app_phys peak 1731 MB with per-instance vs ~1260 MB with the prior process-singleton). The hazard that motivates per-path keying (a non-opt-in PTE inheriting an opt-in PTE's path and writing into its mmap file) never applies to empty-path callers because they hold no path, no fd, and no mmap regions, so sharing among empty-path callers carries no isolation cost.
- Expired `weak_ptr` entries are erased opportunistically on the next `get_or_create` / `save_all` for that path. Stale entries from never-revisited paths linger; cost is one string + weak_ptr per dead entry. Acceptable per `XNNWorkspaceManager` precedent.
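The lookup rules above can be sketched as follows. This is a minimal illustration, not the code in this commit: `SketchWeightsCache`, `SketchWeightsCacheManager`, and `tracked_paths()` are hypothetical stand-ins, and only the path plumbing of the real `XNNWeightsCache` is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for XNNWeightsCache: path storage plus instance mutex.
class SketchWeightsCache {
 public:
  void set_packed_cache_path(const std::string& path) { path_ = path; }
  const std::string& packed_cache_path() const { return path_; }
  std::mutex& mutex() { return instance_mutex_; }

 private:
  std::string path_;
  std::mutex instance_mutex_;
};

// Hypothetical manager: same path -> same instance, empty path -> one shared
// heap-only instance, expired entries replaced on the next lookup.
class SketchWeightsCacheManager {
 public:
  std::shared_ptr<SketchWeightsCache> get_or_create(const std::string& path) {
    if (path.empty()) {
      std::lock_guard<std::mutex> lock(empty_path_mutex_);
      if (auto live = empty_path_cache_.lock()) {
        return live;
      }
      auto fresh = std::make_shared<SketchWeightsCache>();
      empty_path_cache_ = fresh;
      return fresh;
    }
    std::lock_guard<std::mutex> lock(meta_mutex_);
    auto it = caches_.find(path);
    if (it != caches_.end()) {
      if (auto live = it->second.lock()) {
        return live;
      }
      caches_.erase(it);  // Opportunistic cleanup of an expired entry.
    }
    auto fresh = std::make_shared<SketchWeightsCache>();
    fresh->set_packed_cache_path(path);  // Set the path before publishing.
    caches_[path] = fresh;
    return fresh;
  }

  // Number of path-keyed entries (live or expired) currently in the map.
  std::size_t tracked_paths() {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    return caches_.size();
  }

 private:
  std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<SketchWeightsCache>> caches_;
  std::mutex empty_path_mutex_;
  std::weak_ptr<SketchWeightsCache> empty_path_cache_;
};
```

Because the map stores `weak_ptr`s, the manager never extends an instance's lifetime; an instance lives exactly as long as some caller holds a `shared_ptr` to it.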
`XNNWeightsCache`:
- New `std::mutex instance_mutex_` member + `mutex()` accessor. The class has no internal synchronization; callers are responsible for holding `mutex()` around every method invocation, INCLUDING the XNNPACK callback paths (`look_up`, `reserve_space`, `look_up_or_insert`) that fire during `xnn_create_runtime`.
- `set_packed_cache_path` is documented as call-once-before-publish: production callers go through the manager, which sets the path before installing the `shared_ptr` in the map, so no other thread can observe the instance yet. Tests that construct the class directly must respect this contract.
`XnnpackBackendOptions`:
- Replaced `XNNWeightsCache weights_cache_` + `std::mutex weights_cache_mutex_` with `XNNWeightsCacheManager weights_cache_manager_`.
- New `get_or_create_weights_cache(path)` thin wrapper around the manager.
- `save_weights_cache_locked()` now walks every live cache via the manager's `save_all()`.
- `packed_cache_path_` keeps a small `path_mutex_` to serialize the `set_option(packed_cache_path_option_key)` → `init()` read; this is just transport for the option value, the path's authoritative home is per-instance inside each cache.
`XNNPACKBackend`:
- `init`: pulls the path from per-PTE `runtime_spec` (see "Per-PTE caller signal" below), asks the manager for the shared `XNNWeightsCache`, locks its `mutex()` for the entire init→compileModel sequence, then publishes the `shared_ptr` into the executor via the new `XNNExecutor::set_weights_cache`. Same-path PTEs serialize on the same instance mutex; different-path PTEs hold different mutexes and proceed in parallel — the singleton design forced full serialization here.
- `execute`: lock the per-executor cache's mutex (if any) instead of the global one. Concurrent execute on independent caches now runs in parallel.
- `destroy`: lock the per-executor cache's mutex, call `delete_packed_data`. The local `shared_ptr` keeps the instance alive across `delete_packed_data` even if dropping it from the executor was the last outside reference.
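The `destroy` keep-alive point can be illustrated with a small sketch. `SketchCache`, `SketchExecutor`, and `sketch_destroy` are hypothetical names, and taking the reference out of the executor with `std::move` is an assumption made for the illustration; the real `destroy` may detach it differently.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for XNNWeightsCache.
struct SketchCache {
  std::mutex& mutex() { return instance_mutex_; }
  void delete_packed_data() { ++deletes; }
  int deletes = 0;

 private:
  std::mutex instance_mutex_;
};

// Hypothetical stand-in for XNNExecutor: holds the cache set after compile.
struct SketchExecutor {
  std::shared_ptr<SketchCache> weights_cache;
};

// Returns the owner count observed while the instance mutex was held.
long sketch_destroy(SketchExecutor& executor) {
  // The local shared_ptr is declared before the lock, so it is destroyed
  // after the lock is released; the mutex never dies while it is held.
  std::shared_ptr<SketchCache> cache = std::move(executor.weights_cache);
  if (!cache) {
    return 0;  // Executor was created without a weights cache.
  }
  long owners = 0;
  {
    std::lock_guard<std::mutex> lock(cache->mutex());
    cache->delete_packed_data();
    owners = cache.use_count();
  }
  return owners;
}
```

Even when the executor held the last outside reference, the observed owner count is 1 during `delete_packed_data`, because the local copy still owns the instance.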
`XNNExecutor`:
- New `std::shared_ptr<XNNWeightsCache> weights_cache_` member, set once after `compileModel`. Forward-declared (rather than including `XNNWeightsCache.h`) to keep the transitive `pte_data_map.h` dependency out of the executor's public header — preserves the existing `xnnexecutor_test` build dep set.
Per-PTE caller signal (the FLLM / NGTTS isolation guarantee):
The manager's per-path dedup is necessary but not sufficient — if a non-opt-in PTE inherits an opt-in PTE's globally-set path, the manager hands it the same shared instance and the non-opt-in PTE's `reserve_space` writes into the opt-in model's mmap file. Investigation of the three on-device loaders confirms the concrete risk: cria PLLM pushes `packed_cache_path_option_key` globally (`runner_interface.h:365-373`), but NGTTS sub-runners (`AcousticRunner` / `HfMimiRunner` / `SemanticLmRunner` at `executorch/examples/models/fb/llama4/runner/*.cpp`) bypass cria entirely and never push a path, and the cria FLLM classifier path skips the push when `FactoryMetaData::useMmapPackedWeights = false` (`CriaHost.cpp:220-224`). All three loaders run in the same process and share one `XnnpackBackend` global (`XNNPACKBackend.cpp:246`).
Two complementary changes lock this down:
1. `XNNPACKBackend::init` no longer reads `options_.get_packed_cache_path()` (shared backend-singleton state). It reads the path strictly from `BackendInitContext::get_runtime_spec<const char*>(packed_cache_path_option_key)` — the only per-PTE signal that proves THIS PTE explicitly opted in. If `runtime_spec` carries no path, `cache_path` is empty and the manager hands the shared `empty_path_cache_` (per the empty-path branch above). Non-opt-in PTEs are guaranteed isolated from the mmap-path file regardless of what the global path happens to hold; they still dedupe against one another in the shared empty-path cache.
2. cria `runner_interface.h::loadModel()` no longer pushes XNNPACK options globally via `executorch::runtime::set_option`. It now builds a `BackendOptions<3>` carrying path / `weight_cache_option_key` / `workspace_sharing_mode_option_key`, wraps it in a `LoadBackendOptionsMap`, and passes that map to every `Module::load_method` call (primary, multimethod loop, YOCO prefill/decode). The `BackendOptions` and map both live on the `loadModel` stack frame, which extends through every `load_method` call — Span lifetime requirements satisfied. Per-PTE options propagate into the backend's `BackendInitContext::runtime_spec` via `Method::init`'s `LoadBackendOptionsMap` path (`method.cpp:957-963`). Non-opt-in cria PTEs and non-cria loaders (NGTTS, direct-Module) simply don't pass a map → empty runtime_spec → init forces empty path → shared heap-only instance with dedup.
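The runtime_spec-only rule from item 1 reduces to a small decision, sketched below with hypothetical types. `SketchRuntimeSpec`, `SketchBackendState`, `kSketchPathKey`, and `resolve_cache_path` are illustrative only and do not reflect the actual `BackendInitContext` or `BackendOptions` API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-PTE option bag standing in for the runtime_spec.
using SketchRuntimeSpec = std::map<std::string, std::string>;

// Hypothetical key string; the real key is packed_cache_path_option_key.
const char kSketchPathKey[] = "packed_cache_path";

// Hypothetical backend-wide state that some other loader may have set.
struct SketchBackendState {
  std::string global_packed_cache_path;
};

// The path comes only from the per-PTE spec. The backend-wide value is
// deliberately ignored, so a PTE that did not opt in resolves to the empty
// path and lands in the shared heap-only cache.
std::string resolve_cache_path(
    const SketchBackendState& /*backend*/,
    const SketchRuntimeSpec& spec) {
  auto it = spec.find(kSketchPathKey);
  return it == spec.end() ? std::string() : it->second;
}
```

With this shape, a stale global path cannot leak into a PTE that passed no options, which is the isolation property described above.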
Lock hierarchy (updated):
- `weights_cache_manager_.meta_mutex_` (leaf — only during path-keyed map ops, never held across calls into instances)
- `weights_cache_manager_.empty_path_mutex_` (leaf — only during empty-path weak_ptr lookup/store)
- `XNNWeightsCache::instance_mutex_` (one per cache)
- `workspace_meta_mutex_`
- `workspace_mutex_` (owned by executor)
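One way to keep the meta mutex from ever being held across a call into an instance is the snapshot-then-lock shape that `save_all()` is described as using. The sketch below assumes hypothetical `SketchCache` and `SketchManager` types; `track()` exists only to populate the map for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for XNNWeightsCache.
struct SketchCache {
  std::mutex& mutex() { return instance_mutex_; }
  void save_packed_index() { ++saves; }
  int saves = 0;

 private:
  std::mutex instance_mutex_;
};

class SketchManager {
 public:
  // Illustration-only helper that registers an existing cache under a path.
  void track(const std::string& path,
             const std::shared_ptr<SketchCache>& cache) {
    std::lock_guard<std::mutex> lock(meta_mutex_);
    caches_[path] = cache;
  }

  // Returns how many live caches were saved.
  std::size_t save_all() {
    std::vector<std::shared_ptr<SketchCache>> live;
    {
      // Phase 1: snapshot live instances under the meta lock only.
      std::lock_guard<std::mutex> lock(meta_mutex_);
      for (auto it = caches_.begin(); it != caches_.end();) {
        if (auto cache = it->second.lock()) {
          live.push_back(std::move(cache));
          ++it;
        } else {
          it = caches_.erase(it);  // Drop expired entries while here.
        }
      }
    }
    // Phase 2: the meta lock is released; take each instance mutex in turn.
    for (const auto& cache : live) {
      std::lock_guard<std::mutex> lock(cache->mutex());
      cache->save_packed_index();
    }
    return live.size();
  }

 private:
  std::mutex meta_mutex_;
  std::unordered_map<std::string, std::weak_ptr<SketchCache>> caches_;
};
```

Since no thread ever holds the meta mutex and an instance mutex at the same time in this shape, the two cannot participate in a lock-order inversion with each other.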
Race-condition / corner-case coverage:
- Same-path concurrent `get_or_create`: serialize on `meta_mutex_`, both return the same shared instance.
- Different-path concurrent `get_or_create`: parallel after the brief `meta_mutex_` window.
- Mid-load contention: same-path callers serialize on the instance mutex around `initialize_for_runtime`.
- Cross-PTE clobbering (the original bug): impossible — each path owns its own instance.
- Cross-process same-path: existing `flock(LOCK_EX|LOCK_NB)` defense untouched.
- Cache file deleted on disk: existing mmap stays valid (unix unlink semantics); manager doesn't track disk state.
- Process shutdown mid-save: executor-held `shared_ptr` outlives the manager map; instance destruction follows the executor's normal teardown.
- XNNPACK seed mismatch / cache format bump: existing per-entry seed reject + v1-trailer reject paths untouched.
- Empty path: shared via `empty_path_cache_` weak_ptr; recreated when all shared_ptrs drop; never collides with any mmap-path instance.
- Concurrent same-cache execute + destroy: serialize on the instance mutex.
- Stale global path inherited by non-opt-in PTE: prevented by the runtime_spec-only path read in init.
Mirrored to `fbcode/executorch/backends/xnnpack/runtime/`. The cria change lives only under `xplat/cria/` (no fbcode mirror).
### Test plan
Built `fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend` on linux, Apple, Android, and `fbcode//executorch/backends/xnnpack:xnnpack_backend` on linux; all green. Built downstream consumers to verify they still build against the changed API: `fbsource//xplat/cria/core:cria{Apple,Android}`, `fbsource//xplat/sgr/ml_service/modules/llm:lib_sgr_llmApple`, `fbsource//xplat/assistant/oacr/trims/modules/ondevice_modules:mwa_ondevice_moduleApple`; all green.
```
buck2 test \
fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache_manager \
fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache \
fbcode//executorch/backends/xnnpack/test:test_workspace_manager \
fbcode//executorch/backends/xnnpack/test:xnnexecutor_test
→ Pass 38. Fail 0. Build failure 0.
```
The new `test_xnn_weights_cache_manager` exercises 13 hazard cases the manager handles: `SamePathReturnsSameInstance`, `DifferentPathsReturnDifferentInstances`, `EmptyPathSharedAcrossCallers`, `EmptyPathRecreatedAfterAllRefsDrop`, `EmptyPathDoesNotShareWithMmapPath`, `ExpiredEntryDoesNotLeak`, `ExpiredEntryRecreatedOnNextCall`, `ConcurrentSamePathSameInstance` (16-thread fan-in), `ConcurrentDifferentPathsIndependent` (8-thread fan-out), `SaveAllNoLiveInstancesIsOk`, `SaveAllWalksLiveCaches`, `SaveAllSkipsExpiredEntries`, `NonEmptyPathRegistersInMap`.
Cria runner tests (`fbsource//xplat/cria/core/runner/tests/...`): Pass 938, Fail 0. The 18 `Fatal` entries reported by buck2 (`PrefillReturnsLogits`, `PrefillMapsParams`, `PrefillStringPrompt`, etc.) reproduce identically on this commit's parent (605126226e, no cria change, no init guard) with the same OpenMP/MKL/ASan SEGV stack — `kmp_basic_flag_native::done_check` → `__kmp_hyper_barrier_release` triggered by `mkl_blas_sgemm_omp_driver_v1` racing with `pthread_create` from `pthreadpool_create_v2`. These are pre-existing flakes in the asan-ubsan platform configuration, not caused by either the manager refactor or the runtime_spec migration.
`arc lint -a` clean across all 22 changed/added files (11 xplat + 11 fbcode mirrors; cria is xplat-only).
Differential Revision: D1084315101
Parent: 574bfca. Commit: 57736af.
10 files changed
Lines changed: 638 additions & 71 deletions