🏛️ AI-Powered Semantic Search Engine for University Notices
Digital Archaeology is a Streamlit application that provides semantic search capabilities for notices (PDFs/images) using embeddings + FAISS, plus optional local LLM Q&A via Ollama (Mistral).
- 🔍 Semantic Search - Find documents using natural language queries
- 💬 AI Q&A - Ask questions and get answers powered by Mistral AI
- 📊 Dashboard - Monitor system statistics and activity logs
- 🕸️ Web Crawler - Automatically download notices from university websites
- 🎨 Dark Cyberpunk Theme - Stunning neon-accented UI with glassmorphism effects
- UI: Streamlit
- AI/ML: Python (sentence-transformers, FAISS, Mistral via Ollama)
- OCR: Tesseract + PyMuPDF
- Python 3.10+
- Tesseract OCR
- Ollama with Mistral model (for Q&A features)
# Arch Linux
sudo pacman -S python tesseract tesseract-data-eng poppler
# Install Ollama and Mistral
curl -fsSL https://ollama.com/install.sh | sh
ollama pull mistral
./setup_arch.sh
./run.sh
Open: http://localhost:8501
digital-archaeology/
├── src/ # Python modules
│ ├── search.py # Search engine
│ ├── qa_engine.py # Q&A with Mistral
│ ├── processor.py # Document processing
│ └── scraper.py # Web crawler
├── config/ # Configuration files
├── data/ # Data directories
└── run.sh # Streamlit launcher
Search documents using natural language queries. The system uses sentence transformers to create embeddings and FAISS for fast similarity search.
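The ranking idea behind this can be sketched in a few lines. This is a minimal illustration, not the project's `search.py`: `embed` is a hypothetical bag-of-words stand-in for a sentence-transformers model (it only captures word overlap, not meaning), and the brute-force loop stands in for the FAISS index. It assumes vectors are L2-normalized, so the inner product equals cosine similarity.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(docs: list[str]) -> dict[str, int]:
    """Map every word seen in the corpus to a vector position."""
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}


def embed(text: str, vocab: dict[str, int]) -> list[float]:
    """Hypothetical stand-in for a sentence-transformers model:
    count known words, then L2-normalize the vector."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query: str, docs: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Rank docs by inner product with the query vector.
    On normalized vectors this is cosine similarity; an exhaustive
    inner-product index would return the same ordering."""
    vocab = build_vocab(docs)
    q = embed(query, vocab)
    scored = [(sum(a * b for a, b in zip(q, embed(d, vocab))), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


notices = [
    "Exam schedule for the autumn semester",
    "Library closed on public holidays",
    "Scholarship application deadline extended",
]
results = search("when is the exam", notices, k=2)
```

In a real deployment the document vectors would be computed once and stored in the index, rather than re-embedded on every query as in this sketch.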
Ask questions about your documents and get answers generated by the Mistral model running locally via Ollama.
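A rough sketch of how a question plus retrieved passages could be sent to a local Ollama server follows. It is not the project's `qa_engine.py`: the prompt template and the function names are hypothetical. It assumes Ollama is listening on its default port 11434 and that the `mistral` model has already been pulled.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-turn generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(question: str, passages: list[str]) -> str:
    """Combine retrieved passages and the question (hypothetical template)."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the notices below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def build_request(question: str, passages: list[str],
                  model: str = "mistral") -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    payload = {
        "model": model,
        "prompt": build_prompt(question, passages),
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, passages: list[str], timeout: float = 120.0) -> str:
    """Send the request; requires `ollama serve` to be running locally."""
    request = build_request(question, passages)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `ask("When is the exam?", top_passages)` would return the model's answer as a string, where `top_passages` is whatever the search step retrieved.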
Automatically discover and download PDF notices from configured university websites.
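The discovery step could look roughly like the sketch below, which uses only the standard library. It is an illustration, not the project's `scraper.py`: the class and function names and the `university.example` URL are hypothetical, and downloading and politeness rules (rate limiting, robots.txt) are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings such as ?v=2 do not hide the extension.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def find_pdf_links(html: str, base_url: str) -> list[str]:
    """Return de-duplicated PDF links in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


sample_html = """
<ul>
  <li><a href="/notices/exam-2024.pdf">Exam notice</a></li>
  <li><a href="circulars/holiday.PDF?v=2">Holiday list</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/notices/exam-2024.pdf">Exam notice (duplicate)</a></li>
</ul>
"""
pdf_links = find_pdf_links(sample_html, "https://university.example/notices/index.html")
```

Each URL in `pdf_links` could then be fetched and handed to the document processor.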
Stunning UI with:
- Neon colors (cyan, magenta, green)
- Glowing effects and animations
- Glassmorphism cards
- Terminal-inspired fonts
- Responsive design
Edit .env files in root, backend/, and frontend/ directories to customize:
- API ports
- MongoDB connection
- Python virtual environment path
- CORS settings
- Upload limits
cd backend
npm run dev # Uses nodemon for auto-reload
cd frontend
npm run dev # Vite dev server with HMR
cd frontend
npm run build
Mistral not responding:
ollama pull mistral
ollama serve
Python modules not found:
source venv/bin/activate
pip install -r requirements.txt
MIT
v3.x - Streamlit app + local Ollama (Mistral)