# EdgeLLM

Run Large Language Models on iOS devices with just one line of code.

```swift
let response = try await EdgeLLM.chat("Hello, world!")
```

> **Note:** EdgeLLM is now fully functional! It supports multiple models, including Qwen, Gemma, and Phi-3.
## Quick Start

```swift
import EdgeLLM

// 1. Basic chat (uses the default model)
let response = try await EdgeLLM.chat("Hello!")
print(response)

// 2. Choose a specific model
let gemmaResponse = try await EdgeLLM.chat("Hello!", model: .gemma2b)

// 3. Stream responses in real time
for try await token in EdgeLLM.stream("Tell me a joke") {
    print(token, terminator: "")
}

// 4. Advanced usage with a reusable instance
let llm = try await EdgeLLM(model: .qwen06b)
let answer = try await llm.chat("Explain quantum computing")
```
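Every call above is `async` and can `throw` (for example, if a model download fails), so in app code you will typically wrap it in `do`/`catch`. A minimal sketch; the failure causes named in the comment are plausible examples, not a documented error list:

```swift
do {
    let reply = try await EdgeLLM.chat("Hello!")
    print(reply)
} catch {
    // Download, model-load, or generation failures all surface here
    print("EdgeLLM error: \(error)")
}
```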
> Demo: a fictional penguin article summarized offline.
## Features

- **Dead Simple** - Chat with LLMs in one line
- **iOS Optimized** - Metal GPU acceleration for blazing speed
- **Privacy First** - Everything runs on-device
- **Easy Install** - Swift Package Manager ready
- **Streaming Support** - Real-time responses
## Installation

In Xcode:

- File → Add Package Dependencies
- Enter the URL: `https://github.com/john-rocky/EdgeLLM`
- Select a version and click "Add Package"
Or add it to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/john-rocky/EdgeLLM", from: "1.0.0")
]
```
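If you are depending on EdgeLLM from a package rather than an app target, a fuller `Package.swift` looks roughly like this. The product name in `.product(name:package:)` is an assumption based on the `EdgeLLM` module imported above:

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/john-rocky/EdgeLLM", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            // Product name assumed to match the module name
            dependencies: [.product(name: "EdgeLLM", package: "EdgeLLM")]
        )
    ]
)
```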
## Supported Models

- **Qwen 0.6B** (`.qwen06b`) - Smallest, fastest model (~1.2GB)
- **Gemma 2B** (`.gemma2b`) - Balanced performance (~2.5GB)
- **Phi-3.5 Mini** (`.phi3_mini`) - Most capable (~3.8GB)
Models are automatically downloaded on first use (WiFi recommended).
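If you want to choose a model automatically from the device's RAM, something like the sketch below works with the cases listed above. The `EdgeLLM.Model` type name and the RAM thresholds are assumptions for illustration:

```swift
import EdgeLLM
import Foundation

/// Picks the largest model that should fit comfortably in memory.
/// Thresholds are rough illustrations, not measured limits.
func preferredModel() -> EdgeLLM.Model {
    let ramGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
    switch ramGB {
    case ..<4: return .qwen06b    // ~1.2GB of weights
    case ..<6: return .gemma2b    // ~2.5GB of weights
    default:   return .phi3_mini  // ~3.8GB of weights
    }
}
```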
## Usage

### Basic Chat

```swift
import EdgeLLM

// Chat in one line!
let response = try await EdgeLLM.chat("What's the weather like?")
print(response)
```

### Streaming

```swift
// Receive the response token by token
for try await token in EdgeLLM.stream("Tell me a story") {
    print(token, terminator: "")
}
```
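To show streamed tokens in a UI, append each one to view state as it arrives. A minimal SwiftUI sketch using only `EdgeLLM.stream` from above (the view itself is illustrative; `.task` requires iOS 15, so on iOS 14 start a `Task` from `onAppear` instead):

```swift
import SwiftUI
import EdgeLLM

struct StoryView: View {
    @State private var story = ""

    var body: some View {
        ScrollView { Text(story).padding() }
            .task {
                do {
                    // The Text view re-renders as each token is appended
                    for try await token in EdgeLLM.stream("Tell me a story") {
                        story += token
                    }
                } catch {
                    story = "Error: \(error)"
                }
            }
    }
}
```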
### Models and Options

```swift
// Specify the model and generation options
let response = try await EdgeLLM.chat(
    "Technical question",
    model: .gemma2b,      // Use a different model
    options: EdgeLLM.Options(
        temperature: 0.3, // Lower temperature, more deterministic output
        maxTokens: 500
    )
)
```

### Conversations

```swift
// Keep an LLM instance for multi-turn conversations
let llm = try await EdgeLLM(model: .gemma2b)
// Multiple exchanges
let response1 = try await llm.chat("My name is John")
let response2 = try await llm.chat("What's my name?")
// Reset conversation
await llm.reset()
```
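In an app you would typically hold the instance in a view model so context survives across turns. A sketch under that assumption; the class, its property names, and the transcript format are mine, not part of EdgeLLM:

```swift
import SwiftUI
import EdgeLLM

@MainActor
final class ChatViewModel: ObservableObject {
    @Published private(set) var transcript: [String] = []
    private var llm: EdgeLLM?

    func send(_ message: String) async {
        transcript.append("You: \(message)")
        do {
            // Create the instance lazily; keeping it alive preserves conversation state
            if llm == nil { llm = try await EdgeLLM(model: .gemma2b) }
            if let llm {
                let reply = try await llm.chat(message)
                transcript.append("Bot: \(reply)")
            }
        } catch {
            transcript.append("Error: \(error)")
        }
    }

    func clearConversation() async {
        await llm?.reset()      // Clear the model's context
        transcript.removeAll()  // Clear the UI transcript
    }
}
```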
## Example Apps

A basic chat interface in `Examples/SimpleChat`:

```bash
cd Examples/SimpleChat
open SimpleChat.xcodeproj
```

An advanced demo with real-time streaming and performance monitoring in `Examples/StreamingChat`:

```bash
cd Examples/StreamingChat
open StreamingChat.xcodeproj
```

Features:
- Real-time token streaming
- Live performance metrics (tokens/sec, latency)
- Model comparison (Qwen3, Gemma, Phi-3.5)
## Requirements

- iOS 14.0+
- Xcode 15.0+
- 4GB+ free storage for models
- Recommended: iPhone 12 or newer (Neural Engine support)
## Performance

On iPhone 15 Pro:
- Initial load: 2-3 seconds
- Token generation: 10-30 tokens/sec (model dependent)
- Memory usage: 1-4GB depending on model
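You can reproduce a rough tokens/sec figure yourself with the streaming API. This is measurement code of my own, not from the library; note that the first call also pays model download and load time:

```swift
import EdgeLLM
import Foundation

/// Streams one prompt and prints an approximate generation speed.
func benchmark(_ prompt: String) async throws {
    let start = Date()
    var tokens = 0
    for try await token in EdgeLLM.stream(prompt) {
        tokens += 1
        print(token, terminator: "")
    }
    let seconds = Date().timeIntervalSince(start)
    print(String(format: "\n%d tokens in %.1fs ≈ %.1f tok/s",
                 tokens, seconds, Double(tokens) / seconds))
}
```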
## Troubleshooting

Models are downloaded automatically on first run, so the first response can take a while (WiFi recommended).

If memory is tight, try a smaller model like `.qwen06b`:

```swift
let response = try await EdgeLLM.chat("Hello", model: .qwen06b)
```
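If you prefer to degrade gracefully instead of failing, you can retry with the smallest model. EdgeLLM's error types aren't documented here, so this hypothetical helper catches any error:

```swift
import EdgeLLM

/// Tries a mid-size model first, then falls back to the smallest one.
func chatWithFallback(_ prompt: String) async throws -> String {
    do {
        return try await EdgeLLM.chat(prompt, model: .gemma2b)
    } catch {
        // e.g. memory pressure on older devices; retry with the smallest model
        return try await EdgeLLM.chat(prompt, model: .qwen06b)
    }
}
```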
## License

Apache 2.0 License

## Contributing

Pull requests are welcome!
- Clone the repository
- Set up git hooks to prevent large files:

  ```bash
  git config core.hooksPath .githooks
  ```

Guidelines:

- Never commit binary files (`.xcframework`, `.zip`, `.mlmodel`, etc.)
- Maximum file size: 10MB
- Large files should be uploaded to GitHub Releases
- The pre-commit hook will block commits with large files
## Acknowledgments

EdgeLLM is built on top of the [MLC-LLM](https://github.com/mlc-ai/mlc-llm) project.
