perf(coreml): cache compiled models by default (faster runs + fixes restricted $TMPDIR) (#129) #142

Merged
raphaelsty merged 1 commit into main from fix/coreml-cache-dir
Jun 17, 2026

@raphaelsty raphaelsty commented Jun 17, 2026

Collaborator

Fixes #129. Co-authored with the issue reporter @RBozydar.

What & why

CoreML compiles the ONNX model into a CoreML bundle at session creation. With no cache dir, ONNX Runtime compiles into the ephemeral $TMPDIR, so:

  1. the model is recompiled on every colgrep invocation (each run is a fresh process), and
  2. on macOS setups where $TMPDIR (under /var/folders/.../T) is rootless-restricted, the compile fails outright (the reported crash).

This points CoreML at a persistent per-user cache dir by default, ~/Library/Caches/next-plaid/coreml (honoring XDG_CACHE_HOME), so the compiled model persists across runs and never touches $TMPDIR.
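For illustration, the default location described above could be computed along these lines. This is a minimal sketch, not the code in this PR: the function name `default_coreml_cache_dir` is hypothetical, and two behaviors are assumptions (an empty XDG_CACHE_HOME is treated as unset, and the `next-plaid/coreml` suffix is also appended under XDG_CACHE_HOME).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: picks the default CoreML cache directory.
///
/// `xdg_cache_home` is the value of XDG_CACHE_HOME if the variable is set;
/// `home` is the user's home directory.
pub fn default_coreml_cache_dir(xdg_cache_home: Option<&str>, home: &Path) -> PathBuf {
    let base = match xdg_cache_home {
        // Assumption: an empty value is treated the same as an unset variable.
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        // macOS per-user cache root: ~/Library/Caches
        _ => home.join("Library").join("Caches"),
    };
    // Assumption: the same suffix is used under either base directory.
    base.join("next-plaid").join("coreml")
}
```

Taking the environment value and home directory as arguments keeps the function pure, so it can be unit-tested without touching the process environment.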

Benefit (benchmarked: per-invocation --force-gpu CoreML search, fresh process each run)

| model | main (recompiles each run) | this PR (cached) | saved | speedup |
| --- | --- | --- | --- | --- |
| LateOn-Code-edge | ~1.01 s/run | ~0.70 s/run | ~0.31 s | ~31% |
| answerai-small | ~1.85 s/run | ~1.16 s/run | ~0.68 s | ~37% |

The first (cold) run matches main; every subsequent run loads the cached compiled model. Since searches run constantly, this compounds.
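To make the "compounds" point concrete, here is a small sketch of the arithmetic behind the benchmark numbers. The function names are hypothetical, and the inputs are the approximate per-run timings reported above, so results are only as precise as those figures.

```rust
/// Fraction of per-run time saved: (before - after) / before.
pub fn fraction_saved(before_s: f64, after_s: f64) -> f64 {
    (before_s - after_s) / before_s
}

/// Total wall time for `runs` invocations when only the first run is cold
/// (pays the compile cost) and every later run loads the cached model.
pub fn total_seconds(cold_s: f64, warm_s: f64, runs: u32) -> f64 {
    if runs == 0 {
        0.0
    } else {
        cold_s + warm_s * f64::from(runs - 1)
    }
}
```

With the LateOn-Code-edge figures, `fraction_saved(1.01, 0.70)` is about 0.31, and `total_seconds(1.01, 0.70, 100)` comes to roughly 70 s, versus about 101 s when every run recompiles (`total_seconds(1.01, 1.01, 100)`).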

And it fixes the original crash by construction: CoreML no longer compiles under $TMPDIR, so the rootless-restriction failure can't occur.

Behavior / compatibility

  • This is a deliberate default change, chosen for the speed and robustness win: CoreML now writes a persistent cache under ~/Library/Caches instead of using ephemeral temp.
  • Precedence: explicit NEXT_PLAID_COREML_CACHE_DIR / colgrep settings --coreml-cache-dir → default cache dir → $TMPDIR (only if the cache dir can't be created).
  • Non-CoreML builds and the success path of other providers are unaffected.
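The precedence above can be sketched as a small resolver. This is an illustrative sketch under stated assumptions rather than the PR's implementation: `resolve_cache_dir` and `CacheSource` are hypothetical names, the explicit directory is assumed to be used as given without being created here, and "can't be created" is modeled as `std::fs::create_dir_all` returning an error.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Which rung of the precedence chain produced the directory (illustrative).
#[derive(Debug, PartialEq)]
pub enum CacheSource {
    Explicit,
    Default,
    TempFallback,
}

/// Hypothetical resolver: explicit override, then default cache dir,
/// then the temp dir only if the default cannot be created.
pub fn resolve_cache_dir(
    explicit: Option<&Path>,
    default_dir: &Path,
    tmp_dir: &Path,
) -> (PathBuf, CacheSource) {
    // 1. An explicit setting or environment override always wins.
    //    Assumption: it is used as given and not created here.
    if let Some(dir) = explicit {
        return (dir.to_path_buf(), CacheSource::Explicit);
    }
    // 2. Otherwise try to create (or reuse) the default cache directory.
    match fs::create_dir_all(default_dir) {
        Ok(()) => (default_dir.to_path_buf(), CacheSource::Default),
        // 3. Fall back to the temp dir only when creation fails.
        Err(_) => (tmp_dir.to_path_buf(), CacheSource::TempFallback),
    }
}
```

Returning the source alongside the path is a design choice for the sketch: it lets a caller log when the fallback was taken, which is the case most likely to reintroduce the restricted-$TMPDIR failure.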

Override (persistent colgrep setting)

colgrep settings --coreml-cache-dir /some/other/dir   # relocate the cache (persists)
colgrep settings --clear-coreml-cache-dir             # back to the default location
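As a rough model of what these two commands manipulate, the persistent setting can be pictured as an optional path that is exported as NEXT_PLAID_COREML_CACHE_DIR when set. The sketch below is hypothetical: the `Settings` struct, its method names, and `env_export` are illustrative and are not taken from the colgrep source, and serialization is omitted.

```rust
use std::path::{Path, PathBuf};

/// Environment variable named in this PR for the cache override.
pub const COREML_CACHE_DIR_ENV: &str = "NEXT_PLAID_COREML_CACHE_DIR";

/// Hypothetical settings model: `None` means "use the default location".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Settings {
    coreml_cache_dir: Option<PathBuf>,
}

impl Settings {
    /// Mirrors `--coreml-cache-dir <PATH>`: relocate the cache.
    pub fn set_coreml_cache_dir(&mut self, dir: impl Into<PathBuf>) {
        self.coreml_cache_dir = Some(dir.into());
    }

    /// Mirrors `--clear-coreml-cache-dir`: go back to the default location.
    pub fn clear_coreml_cache_dir(&mut self) {
        self.coreml_cache_dir = None;
    }

    /// Current override, if any.
    pub fn coreml_cache_dir(&self) -> Option<&Path> {
        self.coreml_cache_dir.as_deref()
    }

    /// The (name, value) pair to export at startup, or `None` when no
    /// override is set and the default should apply.
    pub fn env_export(&self) -> Option<(&'static str, &Path)> {
        self.coreml_cache_dir().map(|dir| (COREML_CACHE_DIR_ENV, dir))
    }
}
```

The sketch returns the pair instead of mutating the process environment so that it stays testable; an actual startup path would presumably apply it before the ONNX session is created.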

Validation

  • Compiles + clippy clean with and without --features coreml; unit tests (coreml_cache_dir set/get/clear/serialize) + full suites green via ci-quick.
  • Benchmarked main vs this PR on two models (table above) — consistent ~31–37% faster repeated CoreML loads.
  • Smoke-tested: explicit dir + default both cache the compiled model; success path otherwise unchanged.
  • Note: I could not reproduce the reporter's exact rootless-/var/folders crash locally, but caching-by-default removes the $TMPDIR dependency that causes it. @RBozydar — confirming on your environment would be the final check.

Closes #129

@raphaelsty raphaelsty changed the title from "fix(coreml): persistent --coreml-cache-dir for restricted macOS TMPDIR (#129)" to "perf(coreml): cache compiled models by default (faster runs + fixes restricted $TMPDIR) (#129)" on Jun 17, 2026
…ache-dir (#129)

CoreML compiles the ONNX model into a CoreML bundle at session creation. With no
cache dir, ONNX Runtime compiles into the ephemeral $TMPDIR, so the model is
recompiled on every invocation, and on macOS setups where $TMPDIR (under
/var/folders/.../T) is rootless-restricted the compile fails outright (#129).

- next-plaid-onnx: point CoreML at a persistent per-user model cache dir by
  default (~/Library/Caches/next-plaid/coreml, honoring XDG_CACHE_HOME). The
  compiled model is reused across runs and never compiled under $TMPDIR.
  Precedence: NEXT_PLAID_COREML_CACHE_DIR -> default cache dir -> $TMPDIR (only if
  the cache dir cannot be created).
- colgrep: persistent `coreml_cache_dir` setting to relocate the cache
  (`colgrep settings --coreml-cache-dir <PATH>` / `--clear-coreml-cache-dir`),
  shown in `colgrep settings`, exported as NEXT_PLAID_COREML_CACHE_DIR at startup.

Benefit (per-invocation --force-gpu CoreML search, fresh process each run):
- LateOn-Code-edge: ~1.01s -> ~0.70s/run after first (~31% faster)
- answerai-small:   ~1.85s -> ~1.16s/run after first (~37% faster)
First (cold) run matches main; every subsequent run loads the cached model. Also
fixes the rootless-$TMPDIR crash by construction (no $TMPDIR compile).

Closes #129

Co-authored-by: RBozydar <34664833+RBozydar@users.noreply.github.com>
@raphaelsty raphaelsty force-pushed the fix/coreml-cache-dir branch from bab14b1 to 9885755 Compare June 17, 2026 14:54
@raphaelsty raphaelsty merged commit c573b12 into main Jun 17, 2026
20 checks passed

Development

Successfully merging this pull request may close these issues.

CoreML colgrep fails under default macOS TMPDIR; use stable model cache dir
