Getting Started

This page walks through exactly what happens from your first llamdrop command to your first AI conversation.

Step 1 — First Launch

Type llamdrop and press Enter. On your very first launch, llamdrop runs a Dynamic Backend Probing micro-benchmark and shows a Device Profile card. This is a one-time welcome screen that tells you exactly what it detected about your hardware:

RAM total and available
CPU chip name (80+ Android SoC names translated, e.g. MT6853V → Dimensity 720)
CPU core layout (big.LITTLE aware — only fast cores are used)
CPU flags (AVX2, AVX512, NEON)
GPU vendor and whether GPU acceleration is active
Storage free space
OS and architecture
Your device tier (Micro / Low / Low-Mid / Mid / High)
Which backend will be used (llama.cpp CPU, Vulkan, CUDA, ROCm, or Ollama)
Recommended starting models for your hardware

This screen runs once and only once. Hardware detection at launch takes about 2 seconds, runs exactly once, and the result is used everywhere.

Step 2 — Main Menu

Navigate with arrow keys, select with Enter.

Option	What it does
🚀 Start chatting	Launch a downloaded model
⬇️ Browse & download	Verified model catalog — filtered to your device tier and RAM
🔎 Search HuggingFace	Search any GGUF model live
📂 My downloaded models	View models you have + scan phone for existing GGUFs
💾 Resume saved session	Continue a previous conversation
🔧 Device info	Your full hardware profile, tier, and backend decision
🩺 Doctor	Health check — diagnose and auto-fix install issues
⚙️ Config	View your settings and their source
🆙 Update llamdrop	Pull latest version from GitHub
⚙️ Update Engine	Update the underlying llama.cpp binary separately
🌐 Language / भाषा	Change display language (English, Hindi, Spanish, Portuguese, Arabic)
❓ Help	Quick reference

The menu header shows live RAM (colour-coded green/yellow/red), battery level, llama.cpp status, GPU status, and any available updates.

Step 3 — Browse and Pick a Model

Select Browse & download. You'll see a list filtered to models that fit your device tier and current RAM. Models that are too small to be useful on your hardware, or too large to ever run, are hidden. Each entry shows:

Model name, parameter count, quantization level
Download size and RAM required
Fit status: Great fit / Good fit / Tight on RAM
⚡ t/s benchmark score (shown after your first chat with that model, rolling average of last 5 runs)

Filters:

Press [C] to filter by category: chat, coding, reasoning, multilingual, etc.
Press [P] to filter by Provider (e.g. Meta, Qwen, Microsoft, Google).
Press [U] to toggle hiding/showing Unsupported Models (power users only).

Recommended starting points:

Under 2GB available RAM: Qwen2.5 0.5B or SmolLM2 1.7B
2–4GB available RAM: Qwen2.5 1.5B Q4_K_M — good quality, stable
4–8GB available RAM: Mistral 7B or Llama 3.1 8B
8GB+ available RAM: Gemma 3 12B, Qwen3 8B, or better

Step 4 — Download

Press Enter on a model. llamdrop:

Re-checks your live RAM right now and picks the best quantization for what actually fits (Q5_K_M → Q4_K_M → IQ3_M → Q2_K, depending on available RAM)
Shows download size and reason for the choice
Downloads with a progress bar — auto-resumes if connection drops
Verifies the file via SHA-256 checksum after download to catch corruption
If you cancel mid-download (Ctrl+C), the partial file is deleted immediately — it will never show up as a valid model

Step 5 — Chat

After download, press Enter to launch. You'll see the chat header with model name, RAM free, and context size.

You: hello

🦙  Hello! How can I help you today?

🟢 RAM: 2.4GB free
You:

Chat commands

Command	Action
`/save`	Save conversation to resume later
`/export`	Export chat as markdown to Downloads folder
`/trim`	Manually trim old context to free RAM
`/clear`	Clear conversation history
`/ram`	Show current RAM
`/quit`	Exit chat

Automatic & Advanced features

Context trimming — auto-trims when RAM gets low. Always preserves your first exchange — only the middle of long conversations gets trimmed, never your opening task or persona.
RAM monitor — background thread watches RAM during inference, triggers trim if critical
Battery tracking — shows % drop per response on Android with charge-level icons (🪫 🔴 🟡 🔋)
Session auto-save — saves every 5 exchanges (10 messages) automatically
File context — attach a file (TXT, MD, PDF, CSV, JSON) to your conversation before chatting to chat with your document.
Doctor Auto-healing — llamdrop doctor now includes an auto-fix [F] key to automatically repair missing binaries, broken configs, and corrupted directories.

Ollama backend (Linux)

If you have Ollama running, llamdrop auto-detects it and routes inference through Ollama's HTTP API. An Ollama Chat option appears in the menu.

Step 6 — Resume Sessions

Go to Resume saved session. Select a session number to resume, or type D2 to delete session 2. The model loads with full conversation history as context.

Next Steps

Model Catalog — full list of verified models with RAM requirements and tiers
FAQ & Troubleshooting — if something didn't work
Device Compatibility — confirmed working devices

LLAMdrop v0.10.0 • Built by @DeVenLucaz • Free & Open Source
Empowering low-spec devices with local AI.

🦙 LLAMdrop Wiki

📂 Resource Center

🆘 Support & Plans

Tip: Running on budget hardware? Check the Model Catalog for Tier 1 models.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Getting Started

Step 1 — First Launch

Step 2 — Main Menu

Step 3 — Browse and Pick a Model

Step 4 — Download

Step 5 — Chat

Chat commands

Automatic & Advanced features

Ollama backend (Linux)

Step 6 — Resume Sessions

Next Steps

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

🦙 LLAMdrop Wiki

📂 Resource Center

🆘 Support & Plans

Clone this wiki locally