Model Catalog

llamdrop uses a two-layer model system. The verified catalog is the safe default. Live HuggingFace search gives access to everything else.

Layer 1 — Verified Catalog

Every model in the verified catalog has been:

Tested on real hardware across device tiers
Confirmed downloadable from HuggingFace without login
Assigned accurate RAM requirements based on real observation (not estimated from file size)
Licensed for free use (Apache 2.0, MIT, or equivalent)
Given a prompt_format field so the correct chat template is used automatically

llamdrop shows only models your device can actually run, based on your device tier and available RAM at launch time. The browser header shows your current tier so you know where you stand.

Device Tiers

Tier	Available RAM	Typical devices
Micro	< 1 GB	Very old phones, minimal Linux
Low	1 – 3 GB	Budget Android phones, Raspberry Pi 4 (4GB)
Low-Mid	3 – 6 GB	Mid-range phones, older laptops
Mid	6 – 12 GB	Modern phones, mainstream laptops
High	12 – 16 GB+	Flagship phones, gaming laptops

Micro Tier — Ultra Low RAM (under 1GB available)

Model	Params	Min RAM	Best For
SmolLM2 135M	135M	0.5GB	Ultra-fast, basic Q&A
SmolLM2 360M	360M	0.8GB	Basic chat
Qwen2.5 0.5B	500M	1.0GB	General chat — best Micro quality
TinyLlama 1.1B	1.1B	1.2GB	Lightweight chat, fast replies
Gemma 3 1B	1B	1.2GB	General chat
SmolLM2 1.7B	1.7B	1.8GB	Chat, summarization — HuggingFace's best small model
Qwen3 0.6B	600M	1.1GB	Fast reasoning on budget hardware

Recommended for Micro: Qwen2.5 0.5B or SmolLM2 1.7B

Low Tier — Standard (1–3GB available)

Model	Params	Min RAM	Best For
Qwen2.5 1.5B	1.5B	2.0GB	General chat, coding — recommended default
Qwen2.5 Coder 1.5B	1.5B	2.0GB	Code generation, debugging
Llama 3.2 1B	1B	1.6GB	Fast general chat
DeepSeek R1 Distill 1.5B	1.5B	2.0GB	Step-by-step reasoning, math
SmolLM3 3B	3B	2.8GB	General chat

Recommended for Low: Qwen2.5 1.5B Q4_K_M for most phones.

Low-Mid Tier (3–6GB available)

Model	Params	Min RAM	Best For
Llama 3.2 3B	3B	4.0GB	General chat, reasoning
Qwen2.5 3B	3B	4.0GB	Multilingual, general chat
Qwen2.5 Coder 3B	3B	4.0GB	Code generation, review
DeepSeek R1 7B Q2	7B	5.0GB	Advanced reasoning, problem solving
Mistral 7B v0.3	7B	5.5GB	Best overall quality in this tier
Llama 3.1 8B	8B	5.5GB	General chat, long context

Recommended for Low-Mid: Mistral 7B or Llama 3.1 8B for best quality.

Mid Tier (6–12GB available)

Model	Params	Min RAM	Best For
Gemma 3 12B	12B	8.0GB	Reasoning, general quality
Qwen3 8B	8B	6.5GB	Multilingual, strong reasoning
DeepSeek R1 14B	14B	10.0GB	Advanced reasoning, math
Mistral NeMo 12B	12B	8.5GB	Long context, general quality

High Tier (12–16GB+ available)

Model	Params	Min RAM	Best For
Gemma 3 27B	27B	18.0GB	Strong general quality
Qwen3 32B	32B	22.0GB	Multilingual, reasoning
DeepSeek R1 32B	32B	22.0GB	Advanced reasoning
Qwen2.5 Coder 32B	32B	22.0GB	Code generation at scale

Multilingual Models

Model	Languages	Tier
Aya Expanse 8B Q2	100+ languages including Hindi, Marathi, Arabic	Low-Mid
Qwen2.5 3B / 7B	Strong multilingual	Low-Mid
Qwen3 series	Strong multilingual	All

Arabic, Hindi, Spanish, and Portuguese UI languages are also supported natively.

Prompt Formats

Each model uses a specific chat template. llamdrop auto-detects this from models.json — no manual setup needed.

Format	Models
ChatML	Qwen series, SmolLM2, TinyLlama, DeepSeek R1 1.5B, Aya
Llama3	Llama 3.2 / 3.1 / 3.3, Mistral 7B / NeMo, DeepSeek R1 7B+
Gemma	Gemma 2, Gemma 3
Phi3	Phi-3 Mini, Phi-3.5 Mini, Phi-4

Quantization

llamdrop picks the right level for your device automatically based on live RAM at download time.

Level	Quality	When used
Q5_K_M	Best	Plenty of RAM available
Q4_K_M	Very good	Standard — most devices
IQ3_M	Good	Moderate RAM pressure — better quality than Q2_K
IQ2_M	Acceptable	Tight RAM — better than Q2_K at same size
Q2_K	Acceptable	Last resort — RAM is very tight

Note: IQ quants (IQ2/IQ3/IQ4) automatically disable Vulkan GPU acceleration — they are incompatible with the Vulkan compute path. llamdrop handles this automatically.

Layer 2 — Live HuggingFace Search

Select Search HuggingFace from the menu to search any GGUF model. llamdrop estimates RAM from file size and quantization. Results are clearly marked unverified — use this for models not in the catalog. You can also paste direct HuggingFace file URLs (e.g. https://huggingface.co/.../file.gguf) to bypass searching entirely.

Adding a Model to the Catalog

Open a Pull Request editing models.json. Each entry needs:

{
  "id": "model-id",
  "name": "Display Name",
  "tier": 2,
  "min_device_level": "low",
  "max_device_level": "high",
  "hf_repo": "org/repo-name-GGUF",
  "prompt_format": "chatml",
  "best_for": "what it's good at",
  "languages": ["english"],
  "license": "Apache 2.0",
  "license_allows_free_use": true,
  "verified": true,
  "variants": {
    "Q4_K_M": {"filename": "model-q4_k_m.gguf", "download_size_gb": 1.0, "min_ram_gb": 2.0}
  }
}

See Contributing for full guidelines.

LLAMdrop v0.10.0 • Built by @DeVenLucaz • Free & Open Source
Empowering low-spec devices with local AI.

🦙 LLAMdrop Wiki

📂 Resource Center

🆘 Support & Plans

Tip: Running on budget hardware? Check the Model Catalog for Tier 1 models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Model Catalog

Layer 1 — Verified Catalog

Device Tiers

Micro Tier — Ultra Low RAM (under 1GB available)

Low Tier — Standard (1–3GB available)

Low-Mid Tier (3–6GB available)

Mid Tier (6–12GB available)

High Tier (12–16GB+ available)

Multilingual Models

Prompt Formats

Quantization

Layer 2 — Live HuggingFace Search

Adding a Model to the Catalog

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

🦙 LLAMdrop Wiki

📂 Resource Center

🆘 Support & Plans

Clone this wiki locally