Add MLX integration for local vision-language models #14
Open
jmanhype wants to merge 13 commits into GeneralUserModels:main from jmanhype:mlx-vlm-integration
Conversation
This change adds support for running GUM completely locally on Apple Silicon Macs using MLX-powered vision-language models, eliminating the need for OpenAI API calls.

Key Features:
- Drop-in replacement for the OpenAI API via an `MLXClient` wrapper
- Support for both vision tasks (screenshot analysis) and text tasks (proposition generation, revision, filtering)
- Configurable backend selection (OpenAI vs MLX) for both the Screen observer and core GUM functionality
- Automatic JSON cleanup for structured outputs
- Lazy model loading for better startup performance

Changes:
- Add `gum/mlx_client.py`: OpenAI-compatible wrapper for mlx-vlm
- Update `gum/gum.py`: Add `use_mlx` parameter and MLX backend support
- Update `gum/observers/screen.py`: Add MLX vision support for screenshots
- Update `pyproject.toml`: Add `mlx-vlm>=0.3.0` dependency
- Add `examples/mlx_example.py`: Complete working example with MLX
- Add `docs/mlx-integration.md`: Comprehensive MLX setup and usage guide

Benefits:
- Completely free (no API costs)
- 100% private (all data stays on device)
- Works offline
- Fast on Apple Silicon (M1/M2/M3)

Recommended for:
- Users with Apple Silicon Macs and 16GB+ RAM
- Privacy-conscious users
- Users wanting to avoid API costs
- Offline usage scenarios

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
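A rough sketch of the configurable backend selection this commit describes (illustrative only; the helper name and constructor signature below are assumptions, not the PR's actual `gum.py` code):

```python
from openai import AsyncOpenAI

from gum.mlx_client import MLXClient  # wrapper added by this PR


def make_client(use_mlx: bool, mlx_model: str):
    """Return a chat-completions client: local MLX or hosted OpenAI (illustrative)."""
    if use_mlx:
        # Local, on-device inference via the OpenAI-compatible MLX wrapper.
        return MLXClient(model=mlx_model)
    # Default path: hosted OpenAI API, unchanged from existing behavior.
    return AsyncOpenAI()
```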
Exclude Claude Code hook logs from version control 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
These files should not be version controlled 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Properly separate .claude/ entry on its own line 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add comprehensive JSON cleanup in MLXClient
- Handle markdown code fences, mismatched quotes, and malformed JSON
- Add test_mlx_integration.py for testing MLX functionality
- Document known limitation: Qwen2-VL-2B may generate malformed JSON
- Recommend using Qwen2.5-VL-7B or larger for better JSON compliance
- MLX model loading and generation confirmed working

Known Issues:
- Smaller models (2B) may generate JSON with quote inconsistencies
- Larger models (7B+) have better JSON compliance
- JSON parsing will be improved in future updates

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
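The cleanup step described here can be sketched roughly as below (not the actual MLXClient code; just an illustration of stripping markdown code fences and falling back when the JSON is malformed):

```python
import json
import re


def clean_json_response(text: str):
    """Best-effort cleanup of model output before JSON parsing (illustrative sketch)."""
    cleaned = text.strip()

    # Strip markdown code fences (e.g. a fenced json block) and keep the inner text.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", cleaned, flags=re.DOTALL)
    if fenced:
        cleaned = fenced.group(1).strip()

    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to the largest {...} or [...] span, if one exists.
        match = re.search(r"(\{.*\}|\[.*\])", cleaned, flags=re.DOTALL)
        if match:
            return json.loads(match.group(1))
        raise
```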
The 7B model provides excellent JSON compliance and higher-quality outputs.

Test Results:
- ✅ Perfect JSON parsing (no formatting issues)
- ✅ 5 high-quality propositions generated
- ✅ Better reasoning and confidence scores
- ✅ ~4.5GB RAM usage (acceptable for 32GB machines)

Changes:
- Update test_mlx_integration.py to use the 7B model
- Update examples/mlx_example.py to use the 7B model
- Confirmed working on an M2 32GB MacBook Pro

Recommendation: Use the 7B model for all production deployments

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Switch from 2B to 7B model for better JSON compliance
- Update all model references and print statements
- Tested successfully with perfect JSON generation
- Add --use-mlx flag to enable local MLX models
- Add --mlx-model flag to specify model (defaults to Qwen2.5-VL-7B)
- Support USE_MLX and MLX_MODEL environment variables
- Pass MLX config to both the Screen observer and gum instance
- Display backend info on startup (MLX vs OpenAI)
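A minimal sketch of how these flags and environment variables could be wired up (illustrative; the default model id is an assumption, and the real CLI lives in the PR's entry point):

```python
import argparse
import os

parser = argparse.ArgumentParser(description="Run GUM with a configurable LLM backend")
parser.add_argument(
    "--use-mlx",
    action="store_true",
    default=os.environ.get("USE_MLX", "").lower() in ("1", "true", "yes"),
    help="Use a local MLX vision-language model instead of the OpenAI API",
)
parser.add_argument(
    "--mlx-model",
    default=os.environ.get("MLX_MODEL", "mlx-community/Qwen2.5-VL-7B-Instruct-4bit"),  # assumed id
    help="MLX model to load when --use-mlx is set",
)
args = parser.parse_args()

# Display backend info on startup.
print("Backend: MLX (local)" if args.use_mlx else "Backend: OpenAI")
```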
- Clear MLX Metal cache after each generation
- Force garbage collection to free memory
- Prevents SIGSEGV crashes after multiple batches
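In mlx, that cleanup looks roughly like the sketch below (the exact cache-clearing call has moved between mlx releases, so treat this as an approximation rather than the PR's code):

```python
import gc

import mlx.core as mx


def release_mlx_memory() -> None:
    """Free Metal buffers and Python garbage after a generation batch (illustrative)."""
    # Clear MLX's Metal buffer cache (newer mlx versions also expose mx.clear_cache()).
    mx.metal.clear_cache()
    # Force a garbage-collection pass to drop lingering references to large tensors.
    gc.collect()
```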
- Support direct array format: [...]
- Support wrapped format: {"propositions": [...]}
- Fixes TypeError with 2B model that returns arrays directly
- Fix _revise_propositions to handle [...] and {"propositions": [...]}
- Fix _filter_propositions to handle [...] and {"relations": [...]}
- Wraps bare arrays for Pydantic validation
- Fixes JSONDecodeError and ValidationError with 2B model
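The normalization these commits describe amounts to wrapping bare arrays before Pydantic validation; a sketch (hypothetical helper name):

```python
def normalize_model_output(payload, key: str = "propositions") -> dict:
    """Wrap a bare JSON array so downstream Pydantic models always see {key: [...]}.

    Smaller models sometimes return [...] directly instead of
    {"propositions": [...]} (or {"relations": [...]} in the filtering step).
    """
    if isinstance(payload, list):
        return {key: payload}
    if isinstance(payload, dict):
        return payload
    raise TypeError(f"Expected list or dict, got {type(payload).__name__}")
```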
- Remove conditional JSON cleaning
- Always clean responses from MLX models
- Helps with the 2B model's frequent JSON formatting issues
Summary
This PR adds support for running GUM completely locally on Apple Silicon Macs using MLX-powered vision language models, eliminating the need for OpenAI API calls.
Key Features
- Drop-in replacement for the OpenAI API via an `MLXClient` wrapper
- Support for both vision tasks (screenshot analysis) and text tasks (proposition generation, revision, filtering)
- Configurable backend selection (OpenAI vs MLX) for both the Screen observer and core GUM
- Automatic JSON cleanup for structured outputs
- Lazy model loading for better startup performance
Changes
New Files
- `gum/mlx_client.py` - OpenAI-compatible wrapper for mlx-vlm with automatic JSON cleanup
- `examples/mlx_example.py` - Complete working example demonstrating MLX usage
- `docs/mlx-integration.md` - Comprehensive setup guide with benchmarks and troubleshooting

Modified Files
- `gum/gum.py` - Added `use_mlx` and `mlx_model` parameters for configurable backend selection
- `gum/observers/screen.py` - Added MLX vision support for screenshot analysis
- `pyproject.toml` - Added `mlx-vlm>=0.3.0` dependency

Technical Details
MLXClient Features:
- OpenAI-compatible interface (`.chat.completions.create()`)
- Automatic JSON cleanup for structured outputs
- Lazy model loading for better startup performance

Supported Use Cases:
- Vision tasks (screenshot analysis)
- Text tasks (proposition generation, revision, filtering)
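For orientation, the overall shape of an OpenAI-compatible wrapper with lazy loading looks something like this (a sketch, not the PR's `mlx_client.py`; the mlx-vlm call details are elided):

```python
from types import SimpleNamespace


class MLXClient:
    """Minimal sketch of an OpenAI-style client backed by a local MLX model."""

    def __init__(self, model: str):
        self.model_name = model
        self._model = None  # loaded lazily on the first request
        # Mirror the OpenAI client surface: client.chat.completions.create(...)
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    async def _create(self, *, messages, model=None, **kwargs):
        if self._model is None:
            self._load()  # deferred: load weights only when first needed
        text = self._generate(messages, **kwargs)
        # Mirror the OpenAI response shape: response.choices[0].message.content
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

    def _load(self) -> None:
        ...  # e.g. load the model/processor via mlx-vlm; details omitted

    def _generate(self, messages, **kwargs) -> str:
        ...  # prompt formatting + mlx-vlm generation + JSON cleanup; details omitted
```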
Usage Example
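The PR's `examples/mlx_example.py` is the authoritative example; below is a hypothetical sketch of the opt-in parameters described above (constructor details and the model id are assumptions):

```python
import asyncio

from gum import gum
from gum.observers import Screen


async def main() -> None:
    # use_mlx=True routes both the Screen observer and core GUM through the local MLX backend.
    async with gum(
        "Your Name",
        Screen(use_mlx=True),
        use_mlx=True,
        mlx_model="mlx-community/Qwen2.5-VL-7B-Instruct-4bit",  # assumed model id
    ):
        await asyncio.sleep(60)  # observe for a minute, then shut down


if __name__ == "__main__":
    asyncio.run(main())
```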
Performance Benchmarks (M2 32GB)
Recommended For
- Users with Apple Silicon Macs and 16GB+ RAM
- Privacy-conscious users
- Users wanting to avoid API costs
- Offline usage scenarios
Testing
✅ Tested on an M2 MacBook Pro (32GB) with the Qwen2.5-VL-7B model.
Documentation
See `docs/mlx-integration.md` for setup, benchmarks, and troubleshooting.

Backward Compatibility
✅ 100% backward compatible - existing OpenAI-based code works unchanged. MLX is opt-in via the `use_mlx=True` parameter.

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>