-
Notifications
You must be signed in to change notification settings - Fork 0
Model Catalog
llamdrop uses a two-layer model system. The verified catalog is the safe default. Live HuggingFace search gives access to everything else.
Every model in the verified catalog has been:
- Tested on real hardware across device tiers
- Confirmed downloadable from HuggingFace without login
- Assigned accurate RAM requirements based on real observation (not estimated from file size)
- Licensed for free use (Apache 2.0, MIT, or equivalent)
- Given a
prompt_formatfield so the correct chat template is used automatically
llamdrop shows only models your device can actually run, based on your device tier and available RAM at launch time. The browser header shows your current tier so you know where you stand.
| Tier | Available RAM | Typical devices |
|---|---|---|
| Micro | < 1 GB | Very old phones, minimal Linux |
| Low | 1 – 3 GB | Budget Android phones, Raspberry Pi 4 (4GB) |
| Low-Mid | 3 – 6 GB | Mid-range phones, older laptops |
| Mid | 6 – 12 GB | Modern phones, mainstream laptops |
| High | 12 – 16 GB+ | Flagship phones, gaming laptops |
| Model | Params | Min RAM | Best For |
|---|---|---|---|
| SmolLM2 135M | 135M | 0.5GB | Ultra-fast, basic Q&A |
| SmolLM2 360M | 360M | 0.8GB | Basic chat |
| Qwen2.5 0.5B | 500M | 1.0GB | General chat — best Micro quality |
| TinyLlama 1.1B | 1.1B | 1.2GB | Lightweight chat, fast replies |
| Gemma 3 1B | 1B | 1.2GB | General chat |
| SmolLM2 1.7B | 1.7B | 1.8GB | Chat, summarization — HuggingFace's best small model |
| Qwen3 0.6B | 600M | 1.1GB | Fast reasoning on budget hardware |
Recommended for Micro: Qwen2.5 0.5B or SmolLM2 1.7B
| Model | Params | Min RAM | Best For |
|---|---|---|---|
| Qwen2.5 1.5B | 1.5B | 2.0GB | General chat, coding — recommended default |
| Qwen2.5 Coder 1.5B | 1.5B | 2.0GB | Code generation, debugging |
| Llama 3.2 1B | 1B | 1.6GB | Fast general chat |
| DeepSeek R1 Distill 1.5B | 1.5B | 2.0GB | Step-by-step reasoning, math |
| SmolLM3 3B | 3B | 2.8GB | General chat |
Recommended for Low: Qwen2.5 1.5B Q4_K_M for most phones.
| Model | Params | Min RAM | Best For |
|---|---|---|---|
| Llama 3.2 3B | 3B | 4.0GB | General chat, reasoning |
| Qwen2.5 3B | 3B | 4.0GB | Multilingual, general chat |
| Qwen2.5 Coder 3B | 3B | 4.0GB | Code generation, review |
| DeepSeek R1 7B Q2 | 7B | 5.0GB | Advanced reasoning, problem solving |
| Mistral 7B v0.3 | 7B | 5.5GB | Best overall quality in this tier |
| Llama 3.1 8B | 8B | 5.5GB | General chat, long context |
Recommended for Low-Mid: Mistral 7B or Llama 3.1 8B for best quality.
| Model | Params | Min RAM | Best For |
|---|---|---|---|
| Gemma 3 12B | 12B | 8.0GB | Reasoning, general quality |
| Qwen3 8B | 8B | 6.5GB | Multilingual, strong reasoning |
| DeepSeek R1 14B | 14B | 10.0GB | Advanced reasoning, math |
| Mistral NeMo 12B | 12B | 8.5GB | Long context, general quality |
| Model | Params | Min RAM | Best For |
|---|---|---|---|
| Gemma 3 27B | 27B | 18.0GB | Strong general quality |
| Qwen3 32B | 32B | 22.0GB | Multilingual, reasoning |
| DeepSeek R1 32B | 32B | 22.0GB | Advanced reasoning |
| Qwen2.5 Coder 32B | 32B | 22.0GB | Code generation at scale |
| Model | Languages | Tier |
|---|---|---|
| Aya Expanse 8B Q2 | 100+ languages including Hindi, Marathi, Arabic | Low-Mid |
| Qwen2.5 3B / 7B | Strong multilingual | Low-Mid |
| Qwen3 series | Strong multilingual | All |
Arabic, Hindi, Spanish, and Portuguese UI languages are also supported natively.
Each model uses a specific chat template. llamdrop auto-detects this from models.json — no manual setup needed.
| Format | Models |
|---|---|
| ChatML | Qwen series, SmolLM2, TinyLlama, DeepSeek R1 1.5B, Aya |
| Llama3 | Llama 3.2 / 3.1 / 3.3, Mistral 7B / NeMo, DeepSeek R1 7B+ |
| Gemma | Gemma 2, Gemma 3 |
| Phi3 | Phi-3 Mini, Phi-3.5 Mini, Phi-4 |
llamdrop picks the right level for your device automatically based on live RAM at download time.
| Level | Quality | When used |
|---|---|---|
| Q5_K_M | Best | Plenty of RAM available |
| Q4_K_M | Very good | Standard — most devices |
| IQ3_M | Good | Moderate RAM pressure — better quality than Q2_K |
| IQ2_M | Acceptable | Tight RAM — better than Q2_K at same size |
| Q2_K | Acceptable | Last resort — RAM is very tight |
Note: IQ quants (IQ2/IQ3/IQ4) automatically disable Vulkan GPU acceleration — they are incompatible with the Vulkan compute path. llamdrop handles this automatically.
Select Search HuggingFace from the menu to search any GGUF model. llamdrop estimates RAM from file size and quantization. Results are clearly marked unverified — use this for models not in the catalog. You can also paste direct HuggingFace file URLs (e.g. https://huggingface.co/.../file.gguf) to bypass searching entirely.
Open a Pull Request editing models.json. Each entry needs:
{
"id": "model-id",
"name": "Display Name",
"tier": 2,
"min_device_level": "low",
"max_device_level": "high",
"hf_repo": "org/repo-name-GGUF",
"prompt_format": "chatml",
"best_for": "what it's good at",
"languages": ["english"],
"license": "Apache 2.0",
"license_allows_free_use": true,
"verified": true,
"variants": {
"Q4_K_M": {"filename": "model-q4_k_m.gguf", "download_size_gb": 1.0, "min_ram_gb": 2.0}
}
}See Contributing for full guidelines.
LLAMdrop v0.10.0 • Built by @DeVenLucaz • Free & Open Source
Empowering low-spec devices with local AI.