feat: Auto-clear SSD cache on model unload#885

Open
afanty2021 wants to merge 57 commits into jundot:main from afanty2021:feature/auto-clear-ssd-cache-on-unload

Conversation

@afanty2021

Summary

Add automatic SSD cache clearing when models are unloaded, useful for benchmarking scenarios where cache persistence across model loads is not needed.

Problem

When running multiple models in sequence for benchmarking:

  • Each model leaves SSD cache behind after unloading
  • Subsequent models write new cache, causing "SSD write queue full" warnings
  • Cache interference can affect benchmark results

2026-04-21 06:29:19,224 - omlx.cache.paged_ssd_cache - WARNING - SSD write queue full, dropping evicted block c127f0c289fba682

Solution

Add a clear_ssd_cache_on_unload configuration option that:

  • Clears SSD cache before engine stop during model unload
  • Is disabled by default (backward compatible)
  • Can be enabled via CLI flag, environment variable, or config file

Changes

Configuration

  • CLI flag: --paged-ssd-cache-clear-on-unload
  • Environment variable: OMLX_PAGED_SSD_CACHE_CLEAR_ON_UNLOAD
  • Config file: clear_on_unload in paged_ssd_cache section

Implementation

  • Modified _unload_engine() in omlx/engine_pool.py
  • Clears SSD cache before stopping the engine
  • Safe attribute access with exception handling
  • Comprehensive test coverage (3 new tests)
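
The cache-clear-before-stop logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual oMLX code: the attribute name `ssd_cache` and its `clear()` method are assumptions for the sake of the example.

```python
def clear_ssd_cache_on_unload(engine, enabled: bool) -> bool:
    """Best-effort SSD cache clear, intended to run before engine stop.

    Hypothetical sketch of the described behavior: `engine.ssd_cache`
    and `clear()` are assumed names, not the real oMLX internals.
    Returns True if a cache was cleared.
    """
    if not enabled:  # disabled by default -> backward compatible
        return False
    try:
        # Safe attribute access: engines without an SSD cache are skipped.
        cache = getattr(engine, "ssd_cache", None)
        if cache is not None:
            cache.clear()
            return True
    except Exception:
        # Cache cleanup must never block the model unload itself.
        pass
    return False
```

Because the call is wrapped in a broad try/except, a failing or missing cache never prevents the engine from stopping, which matches the "safe attribute access with exception handling" point above.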

Usage

# Enable auto-clear (recommended for benchmarking)
omlx serve --model-dir ~/models --paged-ssd-cache-clear-on-unload

# Environment variable
export OMLX_PAGED_SSD_CACHE_CLEAR_ON_UNLOAD=true
omlx serve --model-dir ~/models

# Config file (~/.omlx/settings.json)
{
  "paged_ssd_cache": {
    "clear_on_unload": true
  }
}

Test plan

  • Unit tests pass (3 new tests in test_engine_pool.py)
  • Configuration parsing works correctly
  • CLI argument parsing works correctly
  • Manual testing with multiple model loads/unloads
  • Documentation added (docs/CLEAR_SSD_CACHE_ON_UNLOAD.md)

Breaking changes

None. This feature is opt-in and disabled by default.

Checklist

  • Code follows project style guidelines
  • Tests added/updated
  • Documentation updated
  • No breaking changes

🤖 Generated with Claude Code

jundot and others added 30 commits March 29, 2026 19:15
- add oq_manager/hf_uploader fields to ServerState dataclass
- update KVCache reconstruct test to expect tensor shape offset
- update oQ predicate bits test for affine-only mode
- rewrite metal limit tests to match no-op behavior (jundot#429)
- fix memory fallback test mock to patch HAS_MLX
# Conflicts:
#	omlx/_version.py
#	packaging/venvstacks.toml
- Add model alias resolution for Claude model names
- Include capabilities field in models API response
- Add Ollama network setup documentation
- Fix tokenizer detection for Qwen3.5-Claude models

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MEMORY/ directory (omlx runtime memory data)
- Add benchmark_qwen35.py (personal performance testing script)
- Add config/ directory (personal omlx configuration files)
- Add docs/CLAUDE_CODE_*.md and docs/NETWORK_DEPLOYMENT.md (personal documentation)
- Add specific scripts in scripts/ directory (personal utility scripts)

These files are user-specific and should not be committed to the repository.
GenerationBatch is not available in mlx-lm 0.31.2.
Disabled the import and patch code temporarily.
TODO: Re-enable when mlx-lm adds GenerationBatch back or alternative is found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add complete CLAUDE.md documentation for the oMLX project to help
AI assistants understand the codebase architecture and development
workflow.

- Document project overview and core features
- Explain system architecture and components
- Provide development guide and testing instructions
- Record upstream changes (v0.3.5.dev1)
- Include technical stack and dependency information

This documentation enables better AI assistance for future development
tasks and maintains consistency with the project's AI workflow standards.
- Add graphify-out/ directory (Graphify tool output)
- Add user-specific configuration files (.graphify_python, GITHUB_ISSUE_TEMPLATE.md, hfd.sh)

These files are project-specific or generated artifacts that should not be
tracked in version control.
- Bump version to 0.3.5.dev2
- Sync the 20 latest upstream commits
- Record VLM performance improvement (2x speed)
- Add audio model extensions and voice cloning
- Update Metal cache optimization and IME input method fixes
- Sync timeout fixes and SSE stability improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolve scheduler.py conflict, adopting the upstream version:
- Restore the GenerationBatch monkey-patch
- Add mRoPE (multi-rotary position embedding) support
- Add VLM mRoPE integration tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update documentation date: 2026-04-10 → 2026-04-14
- Core features: add DFlash speculative decoding engine (3-4x speedup)
- System architecture: add DFlashEngine node
- Key dependencies: add dflash-mlx
- Directory structure: add engine/dflash.py
- Core components: add DFlash speculative decoding engine description
- Recent changes: add commit record edb7244
- Important changes: add detailed DFlash engine feature notes
- Related resources: add dflash-mlx link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TurboQuantKVCache.merge monkey-patch support
- Improve mRoPE implementation using PromptProcessingBatch.prompt
- Fix burst-completion bug (jundot#557)
- Improve TurboQuantKV implementation, add _apply_turboquant_kv_empty
- Add .worktrees/ to .gitignore
- Keep local user configuration
- Bump version to v0.3.5-rc1
- Add the latest 20 upstream commit records
- Update important-changes summary
- Record TurboQuantKV optimizations and mRoPE improvements
- Update project version to v0.3.5
- Update dflash-mlx dependency to v0.1.3 (814c4a1)
- Move change records from CLAUDE.md to CHANGELOG.md
- Add a changelog link at the top of CLAUDE.md
- Streamline CLAUDE.md to focus on project structure and the development guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se CLI

NSStatusItem.isVisible() only reflects the app's own setVisible: state,
so on Tahoe (26.x) it stays True even when ControlCenter hides the icon
or the user toggles it off in System Settings > Menu Bar. The 3s NSAlert
check in v0.3.6 never fired for affected users.

- Replace isVisible() with a frame check on the status item's button
  window (width/height and x-origin) which actually catches Tahoe hiding
- Retain the one-shot timer reference to prevent early PyObjC dealloc,
  bump delay 1s -> 3s for ControlCenter to settle
- NSAlert gains an "Open Menu Bar Settings" button that deep-links to
  System Settings via x-apple.systempreferences URL
- Re-check in health_timer so runtime toggle-off also triggers a warning
  (gated by _warned_hidden for once-per-session behavior)
- Add "omlx diagnose menubar" CLI that reports macOS version, app install,
  menubar process status, recent visibility log entries, and manual
  recovery steps, useful when the icon is gone and the menu is the only
  control surface

Apple's sandbox policy blocks programmatic re-enable on Tahoe: the real
visibility prefs live in group.com.apple.controlcenter's Group Container,
which third-party apps can't read or write. Writes to the legacy
com.apple.controlcenter.plist are ignored by Tahoe's ControlCenter
(verified on-device). So 0.3.7 focuses on detection plus guiding the
user to the right Settings pane.

Refs jundot#725 jundot#806
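
The frame check this commit describes can be sketched with duck-typed stand-ins for the PyObjC chain (NSStatusItem → button → window → frame). The predicate below is illustrative only; the accessor names and exact thresholds in omlx may differ.

```python
from collections import namedtuple

# Stand-ins for the NSRect structure returned by the button window's frame.
Size = namedtuple("Size", "width height")
Origin = namedtuple("Origin", "x y")
Frame = namedtuple("Frame", "origin size")

def status_item_visible(frame) -> bool:
    """Heuristic from the commit message: isVisible() is unreliable on
    Tahoe, but a zero-sized or off-screen button window reliably signals
    that ControlCenter has hidden the menu bar icon."""
    if frame is None:  # no button window at all
        return False
    return (frame.size.width > 0
            and frame.size.height > 0
            and frame.origin.x >= 0)
```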
afanty2021 and others added 27 commits April 17, 2026 15:49
Merge 30 upstream commits, main updates:
- Upgrade to v0.3.6
- Upgrade dflash-mlx to v0.1.3
- Fix Tahoe menu bar hidden-state detection
- Fix VLM tool message formatting
- Fix Jina reranker scoring
- oQ quantization float16 option (M1/M2 speedup)
- Remove LSUIElement to prevent ControlCenter blocking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	omlx/cli.py
#	packaging/omlx_app/app.py
- Add model alias resolution for Claude model names
- Include capabilities field in models API response
- Add Ollama network setup documentation
- Fix tokenizer detection for Qwen3.5-Claude models

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MEMORY/ directory (omlx runtime memory data)
- Add benchmark_qwen35.py (personal performance testing script)
- Add config/ directory (personal omlx configuration files)
- Add docs/CLAUDE_CODE_*.md and docs/NETWORK_DEPLOYMENT.md (personal documentation)
- Add specific scripts in scripts/ directory (personal utility scripts)

These files are user-specific and should not be committed to the repository.
GenerationBatch is not available in mlx-lm 0.31.2.
Disabled the import and patch code temporarily.
TODO: Re-enable when mlx-lm adds GenerationBatch back or alternative is found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add complete CLAUDE.md documentation for the oMLX project to help
AI assistants understand the codebase architecture and development
workflow.

- Document project overview and core features
- Explain system architecture and components
- Provide development guide and testing instructions
- Record upstream changes (v0.3.5.dev1)
- Include technical stack and dependency information

This documentation enables better AI assistance for future development
tasks and maintains consistency with the project's AI workflow standards.
- Add graphify-out/ directory (Graphify tool output)
- Add user-specific configuration files (.graphify_python, GITHUB_ISSUE_TEMPLATE.md, hfd.sh)

These files are project-specific or generated artifacts that should not be
tracked in version control.
- Bump version to 0.3.5.dev2
- Sync the 20 latest upstream commits
- Record VLM performance improvement (2x speed)
- Add audio model extensions and voice cloning
- Update Metal cache optimization and IME input method fixes
- Sync timeout fixes and SSE stability improvements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update documentation date: 2026-04-10 → 2026-04-14
- Core features: add DFlash speculative decoding engine (3-4x speedup)
- System architecture: add DFlashEngine node
- Key dependencies: add dflash-mlx
- Directory structure: add engine/dflash.py
- Core components: add DFlash speculative decoding engine description
- Recent changes: add commit record edb7244
- Important changes: add detailed DFlash engine feature notes
- Related resources: add dflash-mlx link

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Bump version to v0.3.5-rc1
- Add the latest 20 upstream commit records
- Update important-changes summary
- Record TurboQuantKV optimizations and mRoPE improvements
- Update project version to v0.3.5
- Update dflash-mlx dependency to v0.1.3 (814c4a1)
- Move change records from CLAUDE.md to CHANGELOG.md
- Add a changelog link at the top of CLAUDE.md
- Streamline CLAUDE.md to focus on project structure and the development guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove Git merge conflict markers
- Keep mRoPE (multi-dimensional RoPE) support code for vision-language models such as Qwen3-VL/3.5
- mlx-lm commit dcbf6e3 has reintroduced GenerationBatch, so it is safe to re-enable
- Add Gemma 4 tokenizer fix (clear extra_special_tokens)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set extra_special_tokens to empty dict instead of None to avoid
AttributeError when transformers tries to call .keys() on the value.

Gemma 4 configs have extra_special_tokens as a list, but transformers
expects a dict. Setting it to an empty dict overrides the config value
and prevents: "AttributeError: 'list' object has no attribute 'keys'"

Also add test cases to verify the fix works correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
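
The fix this commit describes can be sketched as a small normalization step applied to the tokenizer kwargs before loading. This is an illustrative sketch, not the actual omlx code; the function name and kwargs shape are assumptions.

```python
def sanitize_extra_special_tokens(tokenizer_kwargs: dict) -> dict:
    """Hypothetical sketch of the described fix: transformers expects
    extra_special_tokens to be a dict (it calls .keys() on it), but some
    Gemma 4 configs ship it as a list. Forcing a dict avoids
    "AttributeError: 'list' object has no attribute 'keys'"."""
    if not isinstance(tokenizer_kwargs.get("extra_special_tokens"), dict):
        # Override the bad config value with an empty dict, not None:
        # None would also break the .keys() call.
        tokenizer_kwargs["extra_special_tokens"] = {}
    return tokenizer_kwargs
```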
- Update mlx-lm to a401730 (MiniMax M2 parallel tool fix + BatchKVCache extend fix)
- Add regex dependency for Gemma 4 tool parser
- Sync venvstacks.toml with pyproject.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…VLM detection

- /v1/models now returns all available models instead of just the first one
- Removed hardcoded Claude aliases, use model_type for capability detection
- Added chat template fallback for models with tokenizer.chat_template = None
- Fixed quantized Gemma 4 detection (4bit/8bit models are text-only)
- Fixed image upload not cleared when switching to non-VLM model
- Fixed Qwen3.5/3.6-A3B tokenizer_class override (TokenizersBackend → Qwen2Tokenizer)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bility

Gemma 4 models require transformers >=5.5.0, but omlx needs <5.4.0
due to VLM breaking changes in 5.4.0. Override tokenizer_class from
GemmaTokenizer (new) to GemmaTokenizer (Gemma 2/3) which is compatible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch mlx-lm, mlx-vlm, and mlx-embeddings to use local file:// paths
instead of git+https URLs for faster development and testing.

Local paths:
- /Users/berton/Github/mlx-lm
- /Users/berton/Github/mlx-vlm
- /Users/berton/Github/mlx-embeddings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch dflash-mlx to use local file:// path instead of git URL
for faster development and testing.

Local path: /Users/berton/Github/dflash-mlx

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reduce max_tokens from 16384 to 8192
- Lower execution timeout from 30s to 5s
- Decrease test cases from 3 to 1

These changes improve benchmark speed while maintaining
basic validation capability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	omlx/server.py
Add automatic SSD cache clearing when models are unloaded,
useful for benchmarking scenarios where cache persistence
across model loads is not needed.

Features:
- New CLI flag: --paged-ssd-cache-clear-on-unload
- Environment variable: OMLX_PAGED_SSD_CACHE_CLEAR_ON_UNLOAD
- Config file option: clear_on_unload in paged_ssd_cache section

Implementation:
- Clears SSD cache before engine stop in _unload_engine()
- Safe attribute access with exception handling
- Comprehensive test coverage (3 new tests)

Fixes issue where SSD write queue becomes full during
multi-model benchmarking due to accumulated cache.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jundot force-pushed the main branch 2 times, most recently from 7844f15 to b078330 (April 28, 2026 02:11)
