-
Notifications
You must be signed in to change notification settings - Fork 0
Getting Started
This page walks through exactly what happens from your first llamdrop command to your first AI conversation.
Type llamdrop and press Enter. On your very first launch, llamdrop runs a Dynamic Backend Probing micro-benchmark and shows a Device Profile card. This is a one-time welcome screen that tells you exactly what it detected about your hardware:
- RAM total and available
- CPU chip name (80+ Android SoC names translated, e.g. MT6853V → Dimensity 720)
- CPU core layout (big.LITTLE aware — only fast cores are used)
- CPU flags (AVX2, AVX512, NEON)
- GPU vendor and whether GPU acceleration is active
- Storage free space
- OS and architecture
- Your device tier (Micro / Low / Low-Mid / Mid / High)
- Which backend will be used (llama.cpp CPU, Vulkan, CUDA, ROCm, or Ollama)
- Recommended starting models for your hardware
This screen runs once and only once. Hardware detection at launch takes about 2 seconds, runs exactly once, and the result is used everywhere.
Navigate with arrow keys, select with Enter.
| Option | What it does |
|---|---|
| 🚀 Start chatting | Launch a downloaded model |
| ⬇️ Browse & download | Verified model catalog — filtered to your device tier and RAM |
| 🔎 Search HuggingFace | Search any GGUF model live |
| 📂 My downloaded models | View models you have + scan phone for existing GGUFs |
| 💾 Resume saved session | Continue a previous conversation |
| 🔧 Device info | Your full hardware profile, tier, and backend decision |
| 🩺 Doctor | Health check — diagnose and auto-fix install issues |
| ⚙️ Config | View your settings and their source |
| 🆙 Update llamdrop | Pull latest version from GitHub |
| ⚙️ Update Engine | Update the underlying llama.cpp binary separately |
| 🌐 Language / भाषा | Change display language (English, Hindi, Spanish, Portuguese, Arabic) |
| ❓ Help | Quick reference |
The menu header shows live RAM (colour-coded green/yellow/red), battery level, llama.cpp status, GPU status, and any available updates.
Select Browse & download. You'll see a list filtered to models that fit your device tier and current RAM. Models that are too small to be useful on your hardware, or too large to ever run, are hidden. Each entry shows:
- Model name, parameter count, quantization level
- Download size and RAM required
- Fit status: Great fit / Good fit / Tight on RAM
- ⚡ t/s benchmark score (shown after your first chat with that model, rolling average of last 5 runs)
Filters:
- Press
[C]to filter by category: chat, coding, reasoning, multilingual, etc. - Press
[P]to filter by Provider (e.g. Meta, Qwen, Microsoft, Google). - Press
[U]to toggle hiding/showing Unsupported Models (power users only).
Recommended starting points:
-
Under 2GB available RAM:
Qwen2.5 0.5BorSmolLM2 1.7B -
2–4GB available RAM:
Qwen2.5 1.5B Q4_K_M— good quality, stable -
4–8GB available RAM:
Mistral 7BorLlama 3.1 8B -
8GB+ available RAM:
Gemma 3 12B,Qwen3 8B, or better
Press Enter on a model. llamdrop:
- Re-checks your live RAM right now and picks the best quantization for what actually fits (Q5_K_M → Q4_K_M → IQ3_M → Q2_K, depending on available RAM)
- Shows download size and reason for the choice
- Downloads with a progress bar — auto-resumes if connection drops
- Verifies the file via SHA-256 checksum after download to catch corruption
- If you cancel mid-download (Ctrl+C), the partial file is deleted immediately — it will never show up as a valid model
After download, press Enter to launch. You'll see the chat header with model name, RAM free, and context size.
You: hello
🦙 Hello! How can I help you today?
🟢 RAM: 2.4GB free
You:
| Command | Action |
|---|---|
/save |
Save conversation to resume later |
/export |
Export chat as markdown to Downloads folder |
/trim |
Manually trim old context to free RAM |
/clear |
Clear conversation history |
/ram |
Show current RAM |
/quit |
Exit chat |
- Context trimming — auto-trims when RAM gets low. Always preserves your first exchange — only the middle of long conversations gets trimmed, never your opening task or persona.
- RAM monitor — background thread watches RAM during inference, triggers trim if critical
- Battery tracking — shows % drop per response on Android with charge-level icons (🪫 🔴 🟡 🔋)
- Session auto-save — saves every 5 exchanges (10 messages) automatically
- File context — attach a file (TXT, MD, PDF, CSV, JSON) to your conversation before chatting to chat with your document.
-
Doctor Auto-healing —
llamdrop doctornow includes an auto-fix[F]key to automatically repair missing binaries, broken configs, and corrupted directories.
If you have Ollama running, llamdrop auto-detects it and routes inference through Ollama's HTTP API. An Ollama Chat option appears in the menu.
Go to Resume saved session. Select a session number to resume, or type D2 to delete session 2. The model loads with full conversation history as context.
- Model Catalog — full list of verified models with RAM requirements and tiers
- FAQ & Troubleshooting — if something didn't work
- Device Compatibility — confirmed working devices
LLAMdrop v0.10.0 • Built by @DeVenLucaz • Free & Open Source
Empowering low-spec devices with local AI.