
Commit 57736af

doggeral authored and facebook-github-bot committed
WeightsCacheManager: per-cache-path instances (#20337)
Summary:

Split `XNNWeightsCache` out of its de-facto singleton lifetime (one instance per `XnnpackBackend`, shared across every PTE that opted into the file-backed cache) into a per-cache-file-path instance dispensed by a new `XNNWeightsCacheManager`. Mirrors the `XNNWorkspaceManager` PerModel pattern: same path → same shared instance; different paths → independent instances; empty path → one shared heap-only instance so XNNPACK's in-memory name dedup still works across PTEs.

The singleton design was forced today because `XnnpackBackendOptions::weights_cache_` is a by-value member and `XnnpackBackend` itself is a namespace-scope global (`XNNPACKBackend.cpp:246`). For PTEs that genuinely share a cache file the singleton's per-entry refcounting works, but two PTEs with **different** packed-cache paths hit the same `XNNWeightsCache` instance, so the second PTE's `initialize_for_runtime` calls `ftruncate(0)` on the file under the first PTE's still-live mmap regions, and every subsequent access SIGBUSes (P2369924970 traces this). The clue the prior fix added, the warning in `XNNWeightsCache.h:147-151` that "the path MUST be unique per `XNNWeightsCache` instance", is enforced here at the manager level rather than relying on each caller to honor it.

Design:

`XNNWeightsCacheManager` (new, mirrors `XNNWorkspaceManager`):

- `std::unordered_map<std::string, std::weak_ptr<XNNWeightsCache>> caches_` keyed by absolute cache file path.
- `get_or_create(path)` looks up the path under a single `meta_mutex_`; if a live `weak_ptr` exists, returns the shared instance, otherwise constructs a new one, calls `set_packed_cache_path(path)` BEFORE registering it, and stores a `weak_ptr`.
- `meta_mutex_` is held only during the map op, never across any call into `XNNWeightsCache`, so different-path callers proceed in parallel after the brief window.
- `save_all()` snapshots live shared_ptrs under `meta_mutex_`, then iterates outside the meta lock and acquires each instance's own mutex around `save_packed_index()`.
- Empty path uses a separate `empty_path_cache_` (weak_ptr) + `empty_path_mutex_`: all heap-only callers (NGTTS sub-runners, FLLM classifier, PLLM methods when mmap MC is off) share one instance so XNNPACK's in-memory `look_up_or_insert` name dedup catches duplicate weights across PTEs and across methods within a PTE. Without this sharing, every `XnnpackBackend::init` allocated its own packed copy of every weight, regressing heap-only memory by ~500 MB on LoRA-multimethod PLLM (paste P2380809516: app_phys peak 1731 MB with per-instance vs ~1260 MB with the prior process-singleton). The hazard that motivates per-path keying (a non-opt-in PTE inheriting an opt-in PTE's path and writing into its mmap file) never applies to empty-path callers because they hold no path / no fd / no mmap regions, so sharing among empty-path callers carries no isolation cost.
- Expired `weak_ptr` entries are erased opportunistically on the next `get_or_create` / `save_all` for that path. Stale entries from never-revisited paths linger; cost is one string + weak_ptr per dead entry. Acceptable per `XNNWorkspaceManager` precedent.

`XNNWeightsCache`:

- New `std::mutex instance_mutex_` member + `mutex()` accessor. The class has no internal synchronization; callers are responsible for holding `mutex()` around every method invocation, INCLUDING the XNNPACK callback paths (`look_up`, `reserve_space`, `look_up_or_insert`) that fire during `xnn_create_runtime`.
- `set_packed_cache_path` is documented as call-once-before-publish: production callers go through the manager, which sets the path before installing the `shared_ptr` in the map, so no other thread can observe the instance yet. Tests that construct the class directly must respect this contract.
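The `weak_ptr` lifetime rule in the design above (same path returns the live instance, an expired entry is recreated on the next call) can be sketched in isolation. This is a minimal single-threaded illustration, not the manager's code: `SketchCache`, `SketchRegistry`, and the free function `get_or_create` are hypothetical names, and the locking that the real manager does under `meta_mutex_` is deliberately left out.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for XNNWeightsCache; it only remembers its path.
struct SketchCache {
  std::string path;
};

// Hypothetical registry type: path -> non-owning reference.
using SketchRegistry =
    std::unordered_map<std::string, std::weak_ptr<SketchCache>>;

// Returns the live instance for `path`, or creates one if the entry is
// missing or expired. Single-threaded on purpose: a real manager would
// wrap this map operation in a mutex.
inline std::shared_ptr<SketchCache> get_or_create(
    SketchRegistry& registry,
    const std::string& path) {
  auto it = registry.find(path);
  if (it != registry.end()) {
    if (auto live = it->second.lock()) {
      return live; // some caller still owns the instance: share it
    }
  }
  auto cache = std::make_shared<SketchCache>();
  cache->path = path; // set the path before publishing into the map
  registry[path] = cache; // overwrites an expired entry, if any
  return cache;
}
```

Because the registry stores only `weak_ptr`s, the callers own the cache: once every returned `shared_ptr` is dropped the instance is destroyed, and the next lookup for that path builds a fresh one.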
`XnnpackBackendOptions`:

- Replaced `XNNWeightsCache weights_cache_` + `std::mutex weights_cache_mutex_` with `XNNWeightsCacheManager weights_cache_manager_`.
- New `get_or_create_weights_cache(path)` thin wrapper around the manager.
- `save_weights_cache_locked()` now walks every live cache via the manager's `save_all()`.
- `packed_cache_path_` keeps a small `path_mutex_` to serialize the `set_option(packed_cache_path_option_key)` → `init()` read; this is just transport for the option value, the path's authoritative home is per-instance inside each cache.

`XNNPACKBackend`:

- `init`: pulls the path from per-PTE `runtime_spec` (see "Per-PTE caller signal" below), asks the manager for the shared `XNNWeightsCache`, locks its `mutex()` for the entire init→compileModel sequence, then publishes the `shared_ptr` into the executor via the new `XNNExecutor::set_weights_cache`. Same-path PTEs serialize on the same instance mutex; different-path PTEs hold different mutexes and proceed in parallel (the singleton design forced full serialization here).
- `execute`: lock the per-executor cache's mutex (if any) instead of the global one. Concurrent execute on independent caches now runs in parallel.
- `destroy`: lock the per-executor cache's mutex, call `delete_packed_data`. The local `shared_ptr` keeps the instance alive across `delete_packed_data` even if dropping it from the executor was the last outside reference.

`XNNExecutor`:

- New `std::shared_ptr<XNNWeightsCache> weights_cache_` member, set once after `compileModel`. Forward-declared (rather than including `XNNWeightsCache.h`) to keep the transitive `pte_data_map.h` dependency out of the executor's public header; this preserves the existing `xnnexecutor_test` build dep set.
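The "lock the per-executor cache's mutex (if any)" step relies on a default-constructed `std::unique_lock`, which owns nothing until a real lock is move-assigned into it. A minimal sketch of that pattern follows; `LockableCache` and `lock_if_present` are hypothetical names introduced only for illustration.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a cache that exposes its own mutex.
struct LockableCache {
  std::mutex& mutex() {
    return mutex_;
  }

 private:
  std::mutex mutex_;
};

// Returns a lock that owns cache->mutex() when a cache is present, and an
// empty (non-owning) lock otherwise.
inline std::unique_lock<std::mutex> lock_if_present(
    const std::shared_ptr<LockableCache>& cache) {
  std::unique_lock<std::mutex> lock; // default-constructed: owns nothing
  if (cache) {
    // Move-assign a lock that has already acquired the mutex.
    lock = std::unique_lock<std::mutex>(cache->mutex());
  }
  return lock;
}
```

A caller keeps the returned lock in scope for the critical section; `owns_lock()` reports whether a mutex is actually held. The `shared_ptr` should be declared before the lock so that the lock is released before the cache (and its mutex) can be destroyed.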
Per-PTE caller signal (the FLLM / NGTTS isolation guarantee):

The manager's per-path dedup is necessary but not sufficient: if a non-opt-in PTE inherits an opt-in PTE's globally-set path, the manager hands it the same shared instance and the non-opt-in PTE's `reserve_space` writes into the opt-in model's mmap file. Investigation of the three on-device loaders confirms the concrete risk: cria PLLM pushes `packed_cache_path_option_key` globally (`runner_interface.h:365-373`), but NGTTS sub-runners (`AcousticRunner` / `HfMimiRunner` / `SemanticLmRunner` at `executorch/examples/models/fb/llama4/runner/*.cpp`) bypass cria entirely and never push a path, and the cria FLLM classifier path skips the push when `FactoryMetaData::useMmapPackedWeights = false` (`CriaHost.cpp:220-224`). All three loaders run in the same process and share one `XnnpackBackend` global (`XNNPACKBackend.cpp:246`). Two complementary changes lock this down:

1. `XNNPACKBackend::init` no longer reads `options_.get_packed_cache_path()` (shared backend-singleton state). It reads the path strictly from `BackendInitContext::get_runtime_spec<const char*>(packed_cache_path_option_key)`, the only per-PTE signal that proves THIS PTE explicitly opted in. If `runtime_spec` carries no path, `cache_path` is empty and the manager hands the shared `empty_path_cache_` (per the empty-path branch above). Non-opt-in PTEs are guaranteed isolated from the mmap-path file regardless of what the global path happens to hold; they still dedupe against one another in the shared empty-path cache.
2. cria `runner_interface.h::loadModel()` no longer pushes XNNPACK options globally via `executorch::runtime::set_option`. It now builds a `BackendOptions<3>` carrying path / `weight_cache_option_key` / `workspace_sharing_mode_option_key`, wraps it in a `LoadBackendOptionsMap`, and passes that map to every `Module::load_method` call (primary, multimethod loop, YOCO prefill/decode). The `BackendOptions` and map both live on the `loadModel` stack frame, which extends through every `load_method` call, so Span lifetime requirements are satisfied. Per-PTE options propagate into the backend's `BackendInitContext::runtime_spec` via `Method::init`'s `LoadBackendOptionsMap` path (`method.cpp:957-963`). Non-opt-in cria PTEs and non-cria loaders (NGTTS, direct-Module) simply don't pass a map → empty runtime_spec → init forces empty path → shared heap-only instance with dedup.

Lock hierarchy (updated):

- `weights_cache_manager_.meta_mutex_` (leaf: only during path-keyed map ops, never held across calls into instances)
- `weights_cache_manager_.empty_path_mutex_` (leaf: only during empty-path weak_ptr lookup/store)
- `XNNWeightsCache::instance_mutex_` (one per cache)
- `workspace_meta_mutex_`
- `workspace_mutex_` (owned by executor)

Race-condition / corner-case coverage:

- Same-path concurrent `get_or_create`: serialize on `meta_mutex_`, both return the same shared instance.
- Different-path concurrent `get_or_create`: parallel after the brief `meta_mutex_` window.
- Mid-load contention: same-path callers serialize on the instance mutex around `initialize_for_runtime`.
- Cross-PTE clobbering (the original bug): impossible, since each path owns its own instance.
- Cross-process same-path: existing `flock(LOCK_EX|LOCK_NB)` defense untouched.
- Cache file deleted on disk: existing mmap stays valid (unix unlink semantics); manager doesn't track disk state.
- Process shutdown mid-save: executor-held `shared_ptr` outlives the manager map; instance destruction follows the executor's normal teardown.
- XNNPACK seed mismatch / cache format bump: existing per-entry seed reject + v1-trailer reject paths untouched.
- Empty path: shared via `empty_path_cache_` weak_ptr; recreated when all shared_ptrs drop; never collides with any mmap-path instance.
- Concurrent same-cache execute + destroy: serialize on the instance mutex.
- Stale global path inherited by non-opt-in PTE: prevented by the runtime_spec-only path read in init.

Mirrored to `fbcode/executorch/backends/xnnpack/runtime/`. The cria change lives only under `xplat/cria/` (no fbcode mirror).

### Test plan

Built `fbsource//xplat/executorch/backends/xnnpack:xnnpack_backend` on linux, Apple, Android, and `fbcode//executorch/backends/xnnpack:xnnpack_backend` on linux: all green.

Built downstream consumers to verify the API change is binary-compatible: `fbsource//xplat/cria/core:cria{Apple,Android}`, `fbsource//xplat/sgr/ml_service/modules/llm:lib_sgr_llmApple`, `fbsource//xplat/assistant/oacr/trims/modules/ondevice_modules:mwa_ondevice_moduleApple`: all green.

```
buck2 test \
  fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache_manager \
  fbcode//executorch/backends/xnnpack/test:test_xnn_weights_cache \
  fbcode//executorch/backends/xnnpack/test:test_workspace_manager \
  fbcode//executorch/backends/xnnpack/test:xnnexecutor_test
→ Pass 38. Fail 0. Build failure 0.
```

The new `test_xnn_weights_cache_manager` exercises 13 hazard cases the manager handles: `SamePathReturnsSameInstance`, `DifferentPathsReturnDifferentInstances`, `EmptyPathSharedAcrossCallers`, `EmptyPathRecreatedAfterAllRefsDrop`, `EmptyPathDoesNotShareWithMmapPath`, `ExpiredEntryDoesNotLeak`, `ExpiredEntryRecreatedOnNextCall`, `ConcurrentSamePathSameInstance` (16-thread fan-in), `ConcurrentDifferentPathsIndependent` (8-thread fan-out), `SaveAllNoLiveInstancesIsOk`, `SaveAllWalksLiveCaches`, `SaveAllSkipsExpiredEntries`, `NonEmptyPathRegistersInMap`.

Cria runner tests (`fbsource//xplat/cria/core/runner/tests/...`): Pass 938, Fail 0. The 18 `Fatal` entries reported by buck2 (`PrefillReturnsLogits`, `PrefillMapsParams`, `PrefillStringPrompt`, etc.) reproduce identically on this commit's parent (605126226e, no cria change, no init guard) with the same OpenMP/MKL/ASan SEGV stack: `kmp_basic_flag_native::done_check` → `__kmp_hyper_barrier_release` triggered by `mkl_blas_sgemm_omp_driver_v1` racing with `pthread_create` from `pthreadpool_create_v2`. These are pre-existing flakes in the asan-ubsan platform configuration, not caused by either the manager refactor or the runtime_spec migration.

`arc lint -a` clean across all 22 changed/added files (11 xplat + 11 fbcode mirrors; cria is xplat-only).

Differential Revision: D108431510
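The test sources are not part of this page, so the following is only a guess at the shape of a fan-in check like `ConcurrentSamePathSameInstance`: many threads request the same path and must all receive one instance. `FanInCache`, `FanInManager`, and `fan_in_returns_one_instance` are hypothetical names, and the manager here is a simplified stand-in with a single mutex.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical empty cache type; only its identity matters here.
struct FanInCache {};

// Hypothetical, simplified manager: one mutex around a weak_ptr map.
class FanInManager {
 public:
  std::shared_ptr<FanInCache> get_or_create(const std::string& path) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (auto live = caches_[path].lock()) {
      return live;
    }
    auto cache = std::make_shared<FanInCache>();
    caches_[path] = cache;
    return cache;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::weak_ptr<FanInCache>> caches_;
};

// Spawns `num_threads` callers for the same path and reports whether they
// all received the same instance.
inline bool fan_in_returns_one_instance(int num_threads) {
  FanInManager manager;
  std::vector<std::shared_ptr<FanInCache>> results(num_threads);
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    // Each thread writes only its own slot, so `results` needs no lock.
    threads.emplace_back([&manager, &results, i] {
      results[i] = manager.get_or_create("/tmp/shared.cache");
    });
  }
  for (auto& t : threads) {
    t.join();
  }
  for (const auto& r : results) {
    if (r == nullptr || r != results[0]) {
      return false;
    }
  }
  return true;
}
```

Calling `fan_in_returns_one_instance(16)` mirrors the 16-thread fan-in named in the test plan; the results vector keeps every `shared_ptr` alive until the comparison, so the entry cannot expire mid-test.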
1 parent 574bfca commit 57736af

10 files changed

Lines changed: 638 additions & 71 deletions


backends/xnnpack/runtime/XNNExecutor.h

Lines changed: 21 additions & 0 deletions
@@ -25,6 +25,9 @@ namespace backends {
 namespace xnnpack {
 namespace delegate {

+// Forward-declared to keep XNNWeightsCache.h out of this header.
+class XNNWeightsCache;
+
 class XNNExecutor {
  private:
   std::unique_ptr<xnn_runtime, decltype(&xnn_delete_runtime)> runtime_{
@@ -37,6 +40,10 @@ class XNNExecutor {
   std::vector<xnn_external_value> externals_;
   std::vector<std::string> packed_data_names_;
   std::shared_ptr<XNNWorkspace> workspace_;
+  // Owned so the cache outlives delete_packed_data in destroy(),
+  // even when every other executor sharing it is gone. Empty when no
+  // file-backed cache is in use.
+  std::shared_ptr<XNNWeightsCache> weights_cache_;
   std::atomic<bool> in_use_{false};
   std::atomic<bool> destroyed_{false};

@@ -71,6 +78,20 @@ class XNNExecutor {
     return workspace_;
   }

+  // Set once by XNNPACKBackend::init after compileModel succeeds. Pass
+  // an empty shared_ptr if no file-backed cache is in use for this PTE
+  // (treated identically to never calling this).
+  inline void set_weights_cache(std::shared_ptr<XNNWeightsCache> cache) {
+    weights_cache_ = std::move(cache);
+  }
+
+  // Returns the per-PTE weights cache shared_ptr (may be empty). Used
+  // by XNNPACKBackend::execute to lock the cache's mutex around runtime
+  // invocation, and by destroy() to invoke delete_packed_data.
+  inline std::shared_ptr<XNNWeightsCache> get_weights_cache() const {
+    return weights_cache_;
+  }
+
   /**
    * Initialize the XNNExecutor with a given runtime and input/output ids.
    * The input/output ids are expected to be sorted in order of their
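The forward declaration in this header works because `std::shared_ptr` can be declared, copied, and moved while its element type is still incomplete. A minimal single-file sketch of the idea, using the hypothetical names `HeavyCache` and `Holder` rather than the real classes:

```cpp
#include <memory>
#include <utility>

// Forward declaration only: the holder below never needs the full type.
class HeavyCache;

// Hypothetical executor-like holder.
class Holder {
 public:
  void set_cache(std::shared_ptr<HeavyCache> cache) {
    cache_ = std::move(cache);
  }
  std::shared_ptr<HeavyCache> get_cache() const {
    return cache_;
  }

 private:
  std::shared_ptr<HeavyCache> cache_;
};

// In a real project this definition would live in a separate header that
// only the .cpp files include.
class HeavyCache {
 public:
  int entries = 0;
};
```

Code that only stores or passes the pointer compiles against the forward declaration; only code that dereferences it or constructs the object needs the full definition.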

backends/xnnpack/runtime/XNNPACKBackend.cpp

Lines changed: 61 additions & 22 deletions
@@ -91,18 +91,36 @@ class XnnpackBackend final
     auto workspace = workspace_result.get();

     bool use_weight_cache = options_.resolve_weight_cache(context);
-    // Hold the lock for the entire init-compile-finalize sequence to prevent
-    // concurrent inits from resetting is_finalized_ or overwriting
-    // named_data_map_ while compileModel is using the shared weights cache.
-    std::unique_lock<std::mutex> lock_weights_cache(
-        options_.weights_cache_mutex(), std::defer_lock);
+    // Per-path weights cache: acquire (or create) the shared instance
+    // for this PTE's cache file path, then hold its instance mutex for
+    // the entire init-compile sequence. Two PTEs targeting the same
+    // path get the same shared instance and serialize on its mutex;
+    // PTEs targeting different paths get independent instances and
+    // proceed in parallel (the singleton design forced full
+    // serialization here).
+    std::shared_ptr<xnnpack::delegate::XNNWeightsCache> weights_cache;
+    std::unique_lock<std::mutex> lock_weights_cache;
     if (use_weight_cache) {
-      lock_weights_cache.lock();
-
-      const auto& cache_path = options_.get_packed_cache_path();
-      options_.weights_cache().set_packed_cache_path(cache_path);
+      // Per-PTE: only use a packed cache path when this PTE opted in
+      // via runtime_spec (LoadBackendOptionsMap passed to load_method).
+      // Ignoring the backend-singleton's global path prevents a
+      // non-opt-in PTE from inheriting another model's cache file
+      // when multiple models share this backend in one process.
+      std::string cache_path;
+      auto path_spec = context.get_runtime_spec<const char*>(
+          xnnpack::packed_cache_path_option_key);
+      if (path_spec.ok()) {
+        cache_path = path_spec.get();
+      }
+      auto wc_result = options_.get_or_create_weights_cache(cache_path);
+      if (!wc_result.ok()) {
+        return wc_result.error();
+      }
+      weights_cache = wc_result.get();
+      lock_weights_cache =
+          std::unique_lock<std::mutex>(weights_cache->mutex());

-      options_.weights_cache().initialize_for_runtime(
+      weights_cache->initialize_for_runtime(
           context.get_runtime_allocator(), named_data_map);
       workspace->set_uses_weight_cache();
     }
@@ -118,7 +136,7 @@ class XnnpackBackend final
         processed->data(),
         processed->size(),
         executor,
-        &options_.weights_cache(),
+        weights_cache.get(),
         workspace_ptr,
         named_data_map,
         use_weight_cache);
@@ -135,6 +153,14 @@ class XnnpackBackend final
       return err;
     }

+    // Publish the cache into the executor so execute() / destroy() can
+    // reach it without going through options_. Held by shared_ptr so
+    // the instance survives until this executor is destroyed even if
+    // every other PTE sharing the same cache has already torn down.
+    if (use_weight_cache) {
+      executor->set_weights_cache(std::move(weights_cache));
+    }
+
     return executor;
   }

@@ -146,10 +172,15 @@ class XnnpackBackend final

     auto workspace = executor->get_workspace();

-    std::unique_lock<std::mutex> lock_weights_cache(
-        options_.weights_cache_mutex(), std::defer_lock);
-    if (executor->uses_weight_cache() || workspace->uses_weight_cache()) {
-      lock_weights_cache.lock();
+    // Per-executor cache lock: serializes concurrent execute() and
+    // destroy() against any other PTE that shares this cache instance.
+    // Different-path PTEs hold different mutexes and proceed in
+    // parallel. The empty-shared_ptr branch covers PTEs that didn't
+    // opt into the file-backed cache.
+    auto cache = executor->get_weights_cache();
+    std::unique_lock<std::mutex> lock_weights_cache;
+    if (cache) {
+      lock_weights_cache = std::unique_lock<std::mutex>(cache->mutex());
     }

     auto [raii_lock, _] = workspace->acquire();
@@ -176,17 +207,23 @@ class XnnpackBackend final
     if (handle != nullptr) {
       auto executor = static_cast<xnnpack::delegate::XNNExecutor*>(handle);
       auto workspace = executor->get_workspace();
-
-      const std::lock_guard<std::mutex> lock_weights_cache(
-          options_.weights_cache_mutex());
+      auto cache = executor->get_weights_cache();
+
+      // Per-executor cache lock: same semantics as execute(). Keeps a
+      // local shared_ptr so the instance lives through delete_packed_data
+      // even if dropping it from the executor below was the last
+      // outside reference.
+      std::unique_lock<std::mutex> lock_weights_cache;
+      if (cache) {
+        lock_weights_cache = std::unique_lock<std::mutex>(cache->mutex());
+      }

 #ifdef ENABLE_XNNPACK_PROFILING
       executor->print_avg_op_timings();
 #endif

-      if (executor->uses_weight_cache()) {
-        options_.weights_cache().delete_packed_data(
-            executor->get_packed_data_names());
+      if (cache && executor->uses_weight_cache()) {
+        cache->delete_packed_data(executor->get_packed_data_names());
       }

       // This is needed to serialize access to xnn_delete_runtime which is not
@@ -237,7 +274,9 @@ class XnnpackBackend final
   mutable xnnpack::XnnpackBackendOptions options_;

   // Lock hierarchy for mutexes:
-  // options_.weights_cache_mutex()
+  // weights_cache_manager_.meta_mutex_ (leaf — held only during
+  // get_or_create map ops)
+  // XNNWeightsCache::instance_mutex_ (one per cache instance)
   // workspace_meta_mutex_
   // workspace_mutex_ (owned by executor)
 };
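The `destroy` hunk above takes a local `shared_ptr` copy so the cache stays alive through `delete_packed_data` even when the executor held the last reference. A minimal sketch of that ordering, with hypothetical `TeardownCache`, `TeardownExecutor`, and `destroy_executor` standing in for the real types:

```cpp
#include <memory>
#include <mutex>

// Hypothetical cache with caller-held locking.
struct TeardownCache {
  std::mutex& mutex() {
    return mutex_;
  }
  void delete_packed_data() {
    ++deletions;
  }
  int deletions = 0;

 private:
  std::mutex mutex_;
};

// Hypothetical executor that owns one reference to the cache.
struct TeardownExecutor {
  std::shared_ptr<TeardownCache> cache;
};

// Returns the number of delete_packed_data calls seen by the cache, or -1
// if the executor had no cache.
inline int destroy_executor(TeardownExecutor& executor) {
  // Local copy first: it outlives both the lock and the executor's own
  // reference.
  std::shared_ptr<TeardownCache> cache = executor.cache;
  if (!cache) {
    return -1;
  }
  int deletions = 0;
  {
    std::lock_guard<std::mutex> lock(cache->mutex());
    executor.cache.reset(); // executor drops its reference
    cache->delete_packed_data(); // still valid: `cache` keeps it alive
    deletions = cache->deletions;
  } // lock released here, before `cache` can destroy the mutex
  return deletions;
}
```

The declaration order matters: because the `shared_ptr` is declared before the lock, the lock is destroyed first and never unlocks a mutex that has already been freed.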

backends/xnnpack/runtime/XNNWeightsCache.h

Lines changed: 39 additions & 5 deletions
@@ -14,6 +14,7 @@
 #include <executorch/runtime/core/memory_allocator.h>
 #include <executorch/runtime/core/result.h>
 #include <executorch/runtime/executor/pte_data_map.h>
+#include <mutex>
 #include <string>
 #include <unordered_map>
 #include <vector>
@@ -143,17 +144,44 @@ class XNNWeightsCache {
    * When set, reserve_space() allocates from a MAP_SHARED file instead
    * of heap, and finalize_for_runtime() calls msync to make pages clean.
    *
-   * The path MUST be unique per XNNWeightsCache instance — sharing it
-   * across instances (or processes) would mean O_TRUNC corrupts the other
-   * holder's mappings (SIGBUS on access). initialize_for_runtime() takes
-   * an advisory exclusive flock on the file; if the lock fails the mmap
-   * path is disabled for this instance and allocations fall back to heap.
+   * MUST be called BEFORE any other method on this instance, and never
+   * again afterward. Production callers go through XNNWeightsCacheManager,
+   * which sets the path once before publishing the shared_ptr; the
+   * per-instance mutex() does NOT need to be held for this single
+   * pre-publish setter call because no other thread can observe the
+   * instance yet. Tests that construct XNNWeightsCache directly must
+   * still respect the call-once contract.
+   *
+   * Multiple instances pointing at the same path WILL corrupt each
+   * other's state (O_TRUNC → SIGBUS); the manager prevents this by
+   * deduping per-path.
    */
   void set_packed_cache_path(const std::string& path);

   /** Save packed weight index so subsequent loads skip packing. */
   Error save_packed_index();

+  /**
+   * Per-instance mutex. Callers MUST hold this around every method
+   * call on this XNNWeightsCache (initialize_for_runtime,
+   * finalize_for_runtime, load_unpacked_data, delete_packed_data,
+   * save_packed_index) AND around any XNNPACK callback path that
+   * touches this cache (xnn_create_runtime invokes look_up /
+   * reserve_space / look_up_or_insert during compile). The cache has
+   * no internal synchronization; this mutex is the only thing that
+   * serializes concurrent use.
+   *
+   * Held by XNNPACKBackend::init from before initialize_for_runtime
+   * through compileModel; by ::execute around the runtime invocation
+   * when the executor uses this cache; by ::destroy around
+   * delete_packed_data; and by XNNWeightsCacheManager::save_all
+   * around each save_packed_index. The manager's own meta_mutex is a
+   * strictly-shallower-level lock — never held across this one.
+   */
+  std::mutex& mutex() noexcept {
+    return instance_mutex_;
+  }
+
  private:
   static constexpr uint32_t kCacheMagic = 0x58505743; // "XPWC"
   // Bump when the on-disk layout (footer or per-entry record) changes.
@@ -215,6 +243,12 @@ class XNNWeightsCache {
   // in mmap_regions_, so delete_packed_data() can munmap when ref_count==0.
   std::unordered_map<void*, size_t> file_ptr_to_region_index_;

+  // Per-instance lock. Documented contract on mutex() — the cache itself
+  // never touches this field; callers (XNNPACKBackend, manager, tests)
+  // are responsible for acquiring before any other public method.
+  std::mutex instance_mutex_;
+
+
   // Function pointers to override XNNPACK's default xnn_weights_cache_provider
   // functions.
   static size_t look_up(
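The `mutex()` contract documented in this header puts all synchronization on the caller. A minimal sketch of what honoring it looks like from several threads, using a hypothetical `UnsyncCache` class and `run_locked_inserts` helper (neither exists in the commit):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical cache with no internal synchronization: every method must
// be called with mutex() held by the caller.
class UnsyncCache {
 public:
  std::mutex& mutex() noexcept {
    return instance_mutex_;
  }
  void insert() {
    ++entries_;
  }
  int entries() const {
    return entries_;
  }

 private:
  int entries_ = 0;
  std::mutex instance_mutex_;
};

// Runs `num_threads` workers that each insert `inserts_per_thread` times,
// always under the cache's own mutex, and returns the final entry count.
inline int run_locked_inserts(int num_threads, int inserts_per_thread) {
  UnsyncCache cache;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&cache, inserts_per_thread] {
      for (int i = 0; i < inserts_per_thread; ++i) {
        std::lock_guard<std::mutex> lock(cache.mutex());
        cache.insert();
      }
    });
  }
  for (auto& th : threads) {
    th.join();
  }
  std::lock_guard<std::mutex> lock(cache.mutex());
  return cache.entries();
}
```

Dropping the `lock_guard` inside the worker loop would turn the increments into a data race, which is the failure mode the header comment warns about.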
Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
+/*
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the BSD-style license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+#include <executorch/backends/xnnpack/runtime/XNNWeightsCacheManager.h>
+
+#include <executorch/runtime/core/error.h>
+
+#include <utility>
+#include <vector>
+
+namespace executorch::backends::xnnpack {
+
+using executorch::runtime::Error;
+using executorch::runtime::Result;
+
+Result<std::shared_ptr<delegate::XNNWeightsCache>>
+XNNWeightsCacheManager::get_or_create(const std::string& cache_file_path) {
+  // Empty path → shared, heap-only instance. All empty-path callers
+  // (NGTTS sub-runners, FLLM classifier, PLLM methods when mmap MC is
+  // off) dedupe against one another's packed weights via XNNPACK's
+  // in-memory `look_up_or_insert` name match. Without this sharing,
+  // each `XnnpackBackend::init` allocated its own copy of every
+  // packed weight, regressing heap-only memory by hundreds of MB on
+  // LoRA-multimethod models — see header comment.
+  if (cache_file_path.empty()) {
+    std::scoped_lock<std::mutex> lock(empty_path_mutex_);
+    if (auto live = empty_path_cache_.lock()) {
+      return live;
+    }
+    auto cache = std::make_shared<delegate::XNNWeightsCache>();
+    empty_path_cache_ = cache;
+    return cache;
+  }
+
+  std::scoped_lock<std::mutex> lock(meta_mutex_);
+
+  auto it = caches_.find(cache_file_path);
+  if (it != caches_.end()) {
+    if (auto live = it->second.lock()) {
+      return live;
+    }
+    // Stale weak_ptr — erase before recreating. Without this, the
+    // insert below would overwrite the dead entry anyway; explicit
+    // erase makes the intent obvious.
+    caches_.erase(it);
+  }
+
+  auto cache = std::make_shared<delegate::XNNWeightsCache>();
+  // Set the path before publishing the shared_ptr into the map so any
+  // concurrent caller that finds the live weak_ptr observes a fully
+  // initialized instance. set_packed_cache_path is a plain string copy
+  // — no heavy work, no I/O — so doing it under meta_mutex_ is safe.
+  cache->set_packed_cache_path(cache_file_path);
+  caches_[cache_file_path] = cache;
+  return cache;
+}
+
+Error XNNWeightsCacheManager::save_all() {
+  // Snapshot live shared_ptrs under meta_mutex_, then release it
+  // before calling into per-instance save. This honors the
+  // meta_mutex_ → instance mutex hierarchy and lets concurrent
+  // get_or_create on unrelated paths proceed during the save walk.
+  std::vector<std::shared_ptr<delegate::XNNWeightsCache>> live;
+  {
+    std::scoped_lock<std::mutex> lock(meta_mutex_);
+    live.reserve(caches_.size());
+    for (auto it = caches_.begin(); it != caches_.end();) {
+      if (auto cache = it->second.lock()) {
+        live.push_back(std::move(cache));
+        ++it;
+      } else {
+        it = caches_.erase(it);
+      }
+    }
+  }
+
+  Error first_err = Error::Ok;
+  for (auto& cache : live) {
+    std::lock_guard<std::mutex> lock(cache->mutex());
+    Error err = cache->save_packed_index();
+    if (err != Error::Ok && first_err == Error::Ok) {
+      first_err = err;
+    }
+  }
+  return first_err;
+}
+
+size_t XNNWeightsCacheManager::live_count() const {
+  std::scoped_lock<std::mutex> lock(meta_mutex_);
+  size_t count = 0;
+  for (const auto& entry : caches_) {
+    if (!entry.second.expired()) {
+      ++count;
+    }
+  }
+  return count;
+}
+
+} // namespace executorch::backends::xnnpack
