Skip to content

Model Catalog

DeVenLucaz edited this page Jun 19, 2026 · 4 revisions

llamdrop uses a two-layer model system. The verified catalog is the safe default. Live HuggingFace search gives access to everything else.


Layer 1 — Verified Catalog

Every model in the verified catalog has been:

  • Tested on real hardware across device tiers
  • Confirmed downloadable from HuggingFace without login
  • Assigned accurate RAM requirements based on real observation (not estimated from file size)
  • Licensed for free use (Apache 2.0, MIT, or equivalent)
  • Given a prompt_format field so the correct chat template is used automatically

llamdrop shows only models your device can actually run, based on your device tier and available RAM at launch time. The browser header shows your current tier so you know where you stand.


Device Tiers

Tier Available RAM Typical devices
Micro < 1 GB Very old phones, minimal Linux
Low 1 – 3 GB Budget Android phones, Raspberry Pi 4 (4GB)
Low-Mid 3 – 6 GB Mid-range phones, older laptops
Mid 6 – 12 GB Modern phones, mainstream laptops
High 12 – 16 GB+ Flagship phones, gaming laptops

Micro Tier — Ultra Low RAM (under 1GB available)

Model Params Min RAM Best For
SmolLM2 135M 135M 0.5GB Ultra-fast, basic Q&A
SmolLM2 360M 360M 0.8GB Basic chat
Qwen2.5 0.5B 500M 1.0GB General chat — best Micro quality
TinyLlama 1.1B 1.1B 1.2GB Lightweight chat, fast replies
Gemma 3 1B 1B 1.2GB General chat
SmolLM2 1.7B 1.7B 1.8GB Chat, summarization — HuggingFace's best small model
Qwen3 0.6B 600M 1.1GB Fast reasoning on budget hardware

Recommended for Micro: Qwen2.5 0.5B or SmolLM2 1.7B


Low Tier — Standard (1–3GB available)

Model Params Min RAM Best For
Qwen2.5 1.5B 1.5B 2.0GB General chat, coding — recommended default
Qwen2.5 Coder 1.5B 1.5B 2.0GB Code generation, debugging
Llama 3.2 1B 1B 1.6GB Fast general chat
DeepSeek R1 Distill 1.5B 1.5B 2.0GB Step-by-step reasoning, math
SmolLM3 3B 3B 2.8GB General chat

Recommended for Low: Qwen2.5 1.5B Q4_K_M for most phones.


Low-Mid Tier (3–6GB available)

Model Params Min RAM Best For
Llama 3.2 3B 3B 4.0GB General chat, reasoning
Qwen2.5 3B 3B 4.0GB Multilingual, general chat
Qwen2.5 Coder 3B 3B 4.0GB Code generation, review
DeepSeek R1 7B Q2 7B 5.0GB Advanced reasoning, problem solving
Mistral 7B v0.3 7B 5.5GB Best overall quality in this tier
Llama 3.1 8B 8B 5.5GB General chat, long context

Recommended for Low-Mid: Mistral 7B or Llama 3.1 8B for best quality.


Mid Tier (6–12GB available)

Model Params Min RAM Best For
Gemma 3 12B 12B 8.0GB Reasoning, general quality
Qwen3 8B 8B 6.5GB Multilingual, strong reasoning
DeepSeek R1 14B 14B 10.0GB Advanced reasoning, math
Mistral NeMo 12B 12B 8.5GB Long context, general quality

High Tier (12–16GB+ available)

Model Params Min RAM Best For
Gemma 3 27B 27B 18.0GB Strong general quality
Qwen3 32B 32B 22.0GB Multilingual, reasoning
DeepSeek R1 32B 32B 22.0GB Advanced reasoning
Qwen2.5 Coder 32B 32B 22.0GB Code generation at scale

Multilingual Models

Model Languages Tier
Aya Expanse 8B Q2 100+ languages including Hindi, Marathi, Arabic Low-Mid
Qwen2.5 3B / 7B Strong multilingual Low-Mid
Qwen3 series Strong multilingual All

Arabic, Hindi, Spanish, and Portuguese UI languages are also supported natively.


Prompt Formats

Each model uses a specific chat template. llamdrop auto-detects this from models.json — no manual setup needed.

Format Models
ChatML Qwen series, SmolLM2, TinyLlama, DeepSeek R1 1.5B, Aya
Llama3 Llama 3.2 / 3.1 / 3.3, Mistral 7B / NeMo, DeepSeek R1 7B+
Gemma Gemma 2, Gemma 3
Phi3 Phi-3 Mini, Phi-3.5 Mini, Phi-4

Quantization

llamdrop picks the right level for your device automatically based on live RAM at download time.

Level Quality When used
Q5_K_M Best Plenty of RAM available
Q4_K_M Very good Standard — most devices
IQ3_M Good Moderate RAM pressure — better quality than Q2_K
IQ2_M Acceptable Tight RAM — better than Q2_K at same size
Q2_K Acceptable Last resort — RAM is very tight

Note: IQ quants (IQ2/IQ3/IQ4) automatically disable Vulkan GPU acceleration — they are incompatible with the Vulkan compute path. llamdrop handles this automatically.


Layer 2 — Live HuggingFace Search

Select Search HuggingFace from the menu to search any GGUF model. llamdrop estimates RAM from file size and quantization. Results are clearly marked unverified — use this for models not in the catalog. You can also paste direct HuggingFace file URLs (e.g. https://huggingface.co/.../file.gguf) to bypass searching entirely.


Adding a Model to the Catalog

Open a Pull Request editing models.json. Each entry needs:

{
  "id": "model-id",
  "name": "Display Name",
  "tier": 2,
  "min_device_level": "low",
  "max_device_level": "high",
  "hf_repo": "org/repo-name-GGUF",
  "prompt_format": "chatml",
  "best_for": "what it's good at",
  "languages": ["english"],
  "license": "Apache 2.0",
  "license_allows_free_use": true,
  "verified": true,
  "variants": {
    "Q4_K_M": {"filename": "model-q4_k_m.gguf", "download_size_gb": 1.0, "min_ram_gb": 2.0}
  }
}

See Contributing for full guidelines.

🦙 LLAMdrop Wiki

📂 Resource Center

🆘 Support & Plans


Tip: Running on budget hardware? Check the Model Catalog for Tier 1 models.

Clone this wiki locally