🔍 A semantic image search system that lets you find images using natural language descriptions.
- Natural Language Search: Find images using everyday language like "sunset at the beach" or "a cute dog"
- Semantic Understanding: Powered by OpenAI's CLIP model - understands meaning, not just keywords
- Fast & Efficient: Uses FAISS for similarity search (about 50 ms per query over 10K images in the benchmarks below)
- GPU Accelerated: Leverages your GPU for 6-10x faster image and text encoding (measured on an RTX 3050)
- Completely Offline: Once the CLIP model has been downloaded, your images and searches stay private on your machine
- Batch Processing: Index entire folders with progress tracking
- Smart Filtering: Automatically skips already-processed images
Text Query → Text Encoder (CLIP) → Vector (512-dim)
↓
Vector Store (FAISS)
↓
Similar Vector IDs
↓
Metadata Store (SQLite)
↓
Image Results
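The flow in the diagram can be sketched end to end with only the standard library. This is a toy illustration, not the project's code: `toy_encode` is a bag-of-words stand-in for the CLIP encoders, `ToyVectorStore` is a brute-force stand-in for the FAISS index, and the `images` table, file names, and captions are hypothetical. In the real pipeline the image encoder works on pixels, not captions.

```python
import math
import sqlite3
from collections import Counter


def toy_encode(text):
    """Toy stand-in for a CLIP encoder: a unit-length bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine similarity)."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """Brute-force stand-in for the FAISS index: maps vector IDs to vectors."""

    def __init__(self):
        self.vectors = {}

    def add(self, vector_id, vector):
        self.vectors[vector_id] = vector

    def search(self, query_vector, k):
        """Return the k (vector_id, score) pairs most similar to the query."""
        scored = [(cosine(query_vector, v), vid) for vid, v in self.vectors.items()]
        scored.sort(key=lambda pair: -pair[0])
        return [(vid, score) for score, vid in scored[:k]]


def build_demo():
    """Index three hypothetical images, described here by captions."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT)")
    store = ToyVectorStore()
    demo = [
        ("sunset.jpg", "sunset at the beach"),
        ("dog.jpg", "a cute dog"),
        ("ocean_view.jpg", "ocean view from the beach"),
    ]
    for vector_id, (filename, caption) in enumerate(demo):
        db.execute("INSERT INTO images VALUES (?, ?)", (vector_id, filename))
        store.add(vector_id, toy_encode(caption))
    return db, store


def search(db, store, query, k=2):
    """Text query -> vector -> similar vector IDs -> metadata rows."""
    results = []
    for vector_id, score in store.search(toy_encode(query), k):
        row = db.execute(
            "SELECT filename FROM images WHERE id = ?", (vector_id,)
        ).fetchone()
        results.append((row[0], score))
    return results


if __name__ == "__main__":
    db, store = build_demo()
    for filename, score in search(db, store, "sunset at the beach"):
        print(f"{filename}: {score:.2f}")
```

The vector store only knows IDs and vectors, and the metadata store only knows IDs and file details, which is why the diagram passes through "Similar Vector IDs" between the two.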
# Clone or create project
cd QID
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
python test_setup.py
Should show:
- ✅ All libraries imported
- ✅ GPU detected (if available)
- ✅ CLIP model loaded
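If you want to run the GPU check on its own, a minimal sketch follows. It assumes the project uses PyTorch (the troubleshooting notes mention it) and is not taken from `test_setup.py`; `detect_device` is a hypothetical helper name. `torch.cuda.is_available()` is the standard PyTorch call for this.

```python
def detect_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # third-party; imported lazily so the check degrades gracefully
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {detect_device()}")
```

A result of `cpu` on a machine with an NVIDIA card usually points to a CPU-only PyTorch build or a driver problem.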
Add your own images
# Copy images to data/images/
cp /path/to/your/photos/* data/images/
python test_pipeline.py
This will:
- Scan `data/images/` for images
- Validate all images
- Encode images to vectors (GPU accelerated!)
- Store in database
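The scan step, together with skipping images that were already processed, might look roughly like the sketch below. It is an illustration under assumptions: the `indexed` table and the function names are hypothetical, the extension set comes from the supported formats listed in this README, and real validation (actually opening each image) is left out.

```python
import sqlite3
import tempfile
from pathlib import Path

# Extensions taken from the supported formats listed in this README.
SUPPORTED = {".jpg", ".png", ".bmp", ".gif", ".webp"}


def find_new_images(image_dir, db):
    """Return supported image paths in image_dir that are not yet indexed."""
    indexed = {row[0] for row in db.execute("SELECT path FROM indexed")}
    candidates = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
    return [p for p in candidates if str(p) not in indexed]


def mark_indexed(db, paths):
    """Record paths as processed so later runs skip them."""
    db.executemany(
        "INSERT OR IGNORE INTO indexed (path) VALUES (?)",
        [(str(p),) for p in paths],
    )
    db.commit()


def demo():
    """Scan a temporary folder twice; the second scan finds nothing new."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE indexed (path TEXT PRIMARY KEY)")
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.jpg", "b.PNG", "notes.txt"):
            (Path(tmp) / name).write_bytes(b"")  # empty placeholder files
        first = find_new_images(tmp, db)
        mark_indexed(db, first)
        second = find_new_images(tmp, db)
    return [p.name for p in first], [p.name for p in second]


if __name__ == "__main__":
    print(demo())
```

Keying on the path is the simplest choice; a content hash would also catch renamed or moved files, at the cost of reading every file.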
The pipeline script includes an interactive search:
🔍 Enter search query: sunset at the beach
✅ Found 5 results:
1. sunset.jpg (95% match)
2. ocean_view.jpg (89% match)
3. beach_day.jpg (87% match)
QID/
├── config/
│ └── config.yaml # Configuration
├── src/
│ ├── embeddings/
│ │ ├── image_encoder.py # Image → Vector
│ │ └── text_encoder.py # Text → Vector
│ ├── database/
│ │ ├── vector_store.py # FAISS vector database
│ │ └── metadata_store.py # SQLite metadata
│ ├── ingestion/
│ │ ├── image_processor.py # Image validation
│ │ └── batch_indexer.py # Batch processing
│ ├── query/
│ │ └── search_engine.py # Search functionality
│ └── utils/
│ ├── config.py # Config management
│ └── logger.py # Logging
├── ui/ # Tkinter UI (coming soon!)
├── data/
│ ├── images/ # Your images
│ ├── embeddings/ # Vector database
│ └── metadata/ # Image metadata
└── models/ # Downloaded CLIP models
Hardware: RTX 3050 GPU
| Task | CPU | GPU | Speedup |
|---|---|---|---|
| Encode 100 images | 30s | 5s | 6x |
| Encode 1000 images | 5min | 50s | 6x |
| Search 10K images | 50ms | 50ms | 1x* |
| Text encoding | 100ms | 10ms | 10x |
*FAISS-CPU is already very fast for personal libraries
Memory Usage:
- 10,000 images: ~25MB (vectors) + 2MB (metadata)
- CLIP model: 350MB
- Peak GPU memory: ~500MB
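The vector figure can be sanity-checked with a little arithmetic. The sketch below assumes 512-dimensional vectors stored as 4-byte float32 values, which gives about 20 MB of raw data for 10,000 images; the gap to the ~25MB quoted above would then be index and ID overhead, though this README does not break that down. `vector_memory_mb` is a hypothetical helper name.

```python
def vector_memory_mb(num_images, dim=512, bytes_per_value=4):
    """Raw size of num_images float32 vectors, in MB (1 MB = 10**6 bytes)."""
    return num_images * dim * bytes_per_value / 1e6


if __name__ == "__main__":
    for n in (10_000, 100_000):
        print(f"{n:>7} images: {vector_memory_mb(n):.1f} MB")
```

Storage grows linearly with library size, so a 100,000-image library needs roughly ten times the space.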
- Check: `nvidia-smi`
- Reinstall PyTorch with CUDA: see installation steps
- Check `data/images/` exists
- Verify supported formats: `.jpg`, `.png`, `.bmp`, `.gif`, `.webp`
- Reduce `batch_size` in config
- Use smaller model: `ViT-B/32` instead of `ViT-L/14`
- Ensure virtual environment is activated
- Check VS Code is using correct Python interpreter
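On the `batch_size` tip: a lower value helps with memory because images are typically encoded one batch at a time, so only one batch needs to fit on the GPU at once. A minimal sketch of that chunking, not the project's actual `batch_indexer.py` (`batched` is a hypothetical helper):

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    paths = [f"img_{i}.jpg" for i in range(10)]
    for batch in batched(paths, batch_size=4):
        print(len(batch), batch[0])
```

Halving `batch_size` roughly halves the per-batch activation memory, while the model weights stay the same size.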
✅ Completed:
- Image & text encoding (CLIP)
- Vector database (FAISS)
- Metadata storage (SQLite)
- Batch indexing pipeline
- Natural language search
- CLI interface
🚧 In Progress:
- Advanced UI (next phase!)
This is a learning project! Feel free to:
- Add features
- Fix bugs
- Improve documentation
- Share your improvements
Made with ❤️ for learning AI and building practical tools