AI-powered platform for Ministry of Education (MoE) and Higher-Education institutions to retrieve, understand, compare, explain, and audit government policies.
This project uses a phase-based documentation system for better organization:
- README.md (this file) - Quick start and overview
- PROJECT_DESCRIPTION.md - Comprehensive technical documentation
- PHASE_1_SETUP_AND_AUTHENTICATION.md (7 documents)
  - Email verification system
  - Two-step registration
  - University email domain validation
  - Authentication setup guides
- PHASE_2_DOCUMENT_MANAGEMENT.md (15 documents)
  - Document approval workflows
  - Draft and review processes
  - Access control and security
  - Status visibility and badges
  - Search and sorting features
- PHASE_3_INSTITUTION_AND_ROLE_MANAGEMENT.md (22 documents)
  - Institution hierarchy management
  - Ministry and university relationships
  - Role-based permissions
  - Institution deletion workflows
  - User management strategies
- PHASE_4_ADVANCED_FEATURES_AND_OPTIMIZATIONS.md (61 documents)
  - Chat system and voice queries
  - Notification system
  - RAG and vector store optimizations
  - Performance improvements (Redis, caching, indexing)
  - External data sources
  - Analytics and insights
  - UI/UX fixes and enhancements
  - Security audits and fixes
- Multi-format Support: PDF, DOCX, PPTX, Images (with OCR)
- Smart Search: Hybrid retrieval (semantic + keyword)
- Lazy RAG: Instant uploads, on-demand embedding
- Citation Tracking: All answers include source documents
- Role-Based Access: Hierarchical document visibility
- AI Chat Assistant: Natural language queries with cited sources
- Voice Queries: Ask questions via audio (98+ languages)
- Multilingual: 100+ languages including Hindi, Tamil, Telugu, Bengali
- Policy Analysis: Compare documents, detect conflicts, check compliance
- Role Hierarchy: Developer → Ministry Admin → University Admin → Document Officer → Student
- Institution Types: Universities, Hospitals, Research Centers, Defense Academies
- Approval Workflows: Multi-level document and user approval system
- Email Verification: Secure two-step registration process
- Real-time Notifications: Hierarchical notification routing
- Analytics Dashboard: System health, activity tracking, user insights
- External Data Sync: Connect to ministry databases
- Theme Support: Light/dark mode with persistent preferences
- Python 3.11+
- PostgreSQL 15+ with pgvector extension
- Node.js 18+
- Supabase account (or S3-compatible storage)
- Google API key (Gemini)
git clone <repository-url>
cd Beacon__V1
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Linux/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Create a .env file in the root directory:
# Database
DATABASE_HOSTNAME=your-db-host
DATABASE_PORT=5432
DATABASE_NAME=postgres
DATABASE_USERNAME=your-username
DATABASE_PASSWORD=your-password
# Supabase Storage
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-key
SUPABASE_BUCKET_NAME=Docs
# AI Service
GOOGLE_API_KEY=your-google-api-key
# JWT Authentication
JWT_SECRET_KEY=your-secret-key
JWT_ALGORITHM=HS256
JWT_EXPIRATION_MINUTES=1440
# Email (Optional - for verification)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your-email@gmail.com
SMTP_PASSWORD=your-app-password
FROM_EMAIL=your-email@gmail.com
FROM_NAME=BEACON System
FRONTEND_URL=http://localhost:5173
# Redis (Optional - for caching)
REDIS_URL=redis://localhost:6379
# Enable pgvector extension
python scripts/enable_pgvector.py
# Run migrations
alembic upgrade head
# Initialize developer account (optional)
python backend/init_developer.py
uvicorn backend.main:app --reload --host 127.0.0.1 --port 8000
Backend will be available at: http://localhost:8000
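The `DATABASE_*` variables from the .env file have to be combined into a single connection URL before SQLAlchemy can use them. How the backend does this is not shown in this README, so the following is only a minimal sketch: the function name `build_database_url` and the plain `postgresql://` scheme are assumptions, while the variable names come from the .env example.

```python
import os
from urllib.parse import quote_plus


def build_database_url(env=None):
    """Assemble a PostgreSQL URL from the DATABASE_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Percent-encode credentials so characters such as "@" or ":" do not break the URL.
    user = quote_plus(env["DATABASE_USERNAME"])
    password = quote_plus(env["DATABASE_PASSWORD"])
    host = env["DATABASE_HOSTNAME"]
    # Fall back to the defaults shown in the .env example.
    port = env.get("DATABASE_PORT", "5432")
    name = env.get("DATABASE_NAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


if __name__ == "__main__":
    example_env = {
        "DATABASE_HOSTNAME": "db.example.com",
        "DATABASE_USERNAME": "alice",
        "DATABASE_PASSWORD": "p@ss",
    }
    print(build_database_url(example_env))
```

Passing a plain dictionary instead of reading `os.environ` keeps the helper easy to test without touching real credentials.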
cd frontend
# Install dependencies
npm install
# Create .env file
echo "VITE_API_BASE_URL=http://localhost:8000/api" > .env
# Start development server
npm run dev
Frontend will be available at: http://localhost:5173
Backend:
- FastAPI (Python 3.11+)
- PostgreSQL with pgvector extension
- SQLAlchemy ORM
- Alembic migrations
- JWT authentication
Frontend:
- React 18 with Vite
- TailwindCSS + shadcn/ui components
- Zustand state management
- React Router v6
- Axios for API calls
AI/ML:
- Google Gemini 2.0 Flash (LLM)
- BGE-M3 embeddings (multilingual, 1024-dim)
- OpenAI Whisper (voice transcription)
- EasyOCR (image text extraction)
- pgvector (vector similarity search)
Storage:
- Supabase Storage, S3-compatible (document storage)
- PostgreSQL (metadata + embeddings)
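To make "vector similarity search" concrete, here is a small pure-Python sketch of cosine distance, which is what pgvector's `<=>` operator computes. Which distance operator this project actually uses is not stated in this README, and the function names and toy 2-dimensional vectors are illustrative only (BGE-M3 vectors have 1024 dimensions).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, stored, top_k=3):
    """Return the ids of the top_k stored vectors closest to the query."""
    ranked = sorted(stored, key=lambda doc_id: cosine_distance(query, stored[doc_id]))
    return ranked[:top_k]


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(nearest([1.0, 0.1], vectors, top_k=2))
```

In the database the same ranking would be done by an index-backed query rather than a Python loop; the sketch only shows the arithmetic.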
Upload → Process → Extract Metadata → Store
↓
Query → Search Metadata → Rerank → Embed (if needed) → Search → Answer + Citations
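The Search step in the query flow is described in this README as hybrid retrieval (semantic + keyword), but the README does not say how the two result lists are merged. The sketch below uses reciprocal rank fusion, one common way to do it; this is a swapped-in technique for illustration, not necessarily what BEACON implements, and the function name and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked highly by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. from vector search
    keyword_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from keyword search
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Rank-based fusion avoids having to put vector distances and keyword scores on a shared scale, which is why it is a popular default for hybrid search.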
Lazy Embedding Strategy:
- Documents uploaded instantly (no waiting for embedding)
- Embeddings generated on first query
- Subsequent queries use cached embeddings
- Multi-machine support via PostgreSQL storage
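The lazy embedding strategy above can be sketched as a small cache: uploading only stores the text, and the embedding is computed the first time a query needs it. The class name `LazyEmbeddingStore`, the in-memory dictionaries, and the toy `toy_embed` function are all assumptions for illustration; the project itself stores embeddings in PostgreSQL and uses BGE-M3.

```python
class LazyEmbeddingStore:
    """Store documents immediately and embed them on first use (illustrative only)."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._texts = {}
        self._vectors = {}
        self.embed_calls = 0  # counts how often the expensive step actually ran

    def upload(self, doc_id, text):
        # Upload returns right away: no embedding work happens here.
        self._texts[doc_id] = text
        # Drop any stale vector if the document is re-uploaded.
        self._vectors.pop(doc_id, None)

    def get_embedding(self, doc_id):
        # Embed on first query, then serve the cached vector.
        if doc_id not in self._vectors:
            self.embed_calls += 1
            self._vectors[doc_id] = self._embed_fn(self._texts[doc_id])
        return self._vectors[doc_id]


def toy_embed(text):
    """Stand-in for a real embedding model: returns a 2-number 'vector'."""
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    store = LazyEmbeddingStore(toy_embed)
    store.upload("policy-1", "national education policy")
    print(store.embed_calls)  # nothing embedded yet
    store.get_embedding("policy-1")
    store.get_embedding("policy-1")
    print(store.embed_calls)  # embedded once, second call hit the cache
```

The trade-off matches the performance table in this README: the first query on a document pays the embedding cost, and later queries do not.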
Developer (Super Admin)
↓
Ministry Admin (MoE Officials)
↓
University Admin (Institution Heads)
↓
Document Officer (Upload/Manage Docs)
↓
Student (Read-Only Access)
↓
Public Viewer (Limited Access)
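One simple way to model the hierarchy above in code is an ordered enum, where a higher value means a more privileged role. This is a sketch under the assumption that the ordering is strictly linear; the real system also scopes permissions by institution, which this example ignores. The names `Role` and `has_at_least` are hypothetical.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered from least to most privileged, following the hierarchy above."""
    PUBLIC_VIEWER = 0
    STUDENT = 1
    DOCUMENT_OFFICER = 2
    UNIVERSITY_ADMIN = 3
    MINISTRY_ADMIN = 4
    DEVELOPER = 5


def has_at_least(user_role, required_role):
    """Return True if user_role is at or above required_role in the hierarchy."""
    return user_role >= required_role


if __name__ == "__main__":
    print(has_at_least(Role.UNIVERSITY_ADMIN, Role.DOCUMENT_OFFICER))
    print(has_at_least(Role.STUDENT, Role.DOCUMENT_OFFICER))
```

A linear rank is convenient for checks such as "Document Officer or above", but institution-scoped rules need an additional comparison of the user's and the document's institution.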
| Feature | Developer | Ministry Admin | University Admin | Document Officer | Student |
|---|---|---|---|---|---|
| View all documents | β | β (restricted) | β (institution) | β (institution) | β (public) |
| Upload documents | β | β (auto-approved) | β (needs approval) | β (needs approval) | β |
| Approve documents | β | β | β (institution) | β | β |
| Manage users | β | β (limited) | β (institution) | β | β |
| System health | β | β | β | β | β |
| Analytics | β | β | β (institution) | β | β |
- POST /api/auth/register - User registration
- POST /api/auth/login - User login
- POST /api/auth/verify-email/{token} - Email verification
- GET /api/auth/me - Get current user
- POST /api/documents/upload - Upload document
- GET /api/documents/list - List documents (role-filtered)
- GET /api/documents/{id} - Get document details
- GET /api/documents/{id}/download - Download document
- DELETE /api/documents/{id} - Delete document
- GET /api/approvals/pending - Get pending documents
- POST /api/approvals/{id}/approve - Approve document
- POST /api/approvals/{id}/reject - Reject document
- POST /api/chat/query - Ask AI question
- POST /api/voice/query - Voice query (audio upload)
- GET /api/chat/sessions - Get chat history
- GET /api/institutions/list - List institutions
- POST /api/institutions/create - Create institution
- DELETE /api/institutions/{id} - Delete institution
- GET /api/notifications/list - List notifications
- GET /api/notifications/unread-count - Unread count
- POST /api/notifications/{id}/mark-read - Mark as read
- GET /api/analytics/stats - System statistics
- GET /api/analytics/activity - Activity feed
- GET /api/audit/logs - Audit logs
Full API Documentation: http://localhost:8000/docs
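As a rough illustration of calling these endpoints from Python, the sketch below builds (but does not send) requests for login and for a chat query using only the standard library. The endpoint paths come from the list above. The JSON field names (`email`, `password`, `question`) and the `Authorization: Bearer` header are assumptions, so check the interactive docs at /docs for the real request schemas.

```python
import json
from urllib.request import Request

API_BASE_URL = "http://localhost:8000/api"


def build_login_request(email, password):
    """Build a POST request for /auth/login (field names are assumed)."""
    body = json.dumps({"email": email, "password": password}).encode("utf-8")
    return Request(
        f"{API_BASE_URL}/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chat_request(question, token):
    """Build an authenticated POST request for /chat/query (schema is assumed)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{API_BASE_URL}/chat/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed token scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("What does the policy say about admissions?", "<jwt>")
    print(request.get_method(), request.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen` while the backend is running.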
# Run all tests
python tests/run_all_tests.py
# Individual tests
python tests/test_embeddings.py
python tests/test_voice_query.py
python tests/test_multilingual_embeddings.py
python tests/test_compliance_api.py
python tests/test_conflict_detection_api.py
| Operation | Time | Notes |
|---|---|---|
| Document Upload | 3-7s | Instant response |
| Query (embedded) | 4-7s | Fast |
| Query (first time) | 12-19s | Includes embedding |
| Voice transcription | 5-10s | 1 min audio |
| User Login | <1s | JWT generation |
- JWT-based authentication
- Email verification required
- Role-based access control (RBAC)
- Document-level permissions
- Audit logging for all actions
- SQL injection prevention (SQLAlchemy ORM)
- XSS protection (React escaping)
- Soft deletes (preserve audit trail)
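To show what JWT-based authentication with the settings from the .env example (`JWT_ALGORITHM=HS256`, `JWT_EXPIRATION_MINUTES=1440`) involves, here is a standard-library sketch that signs and verifies an HS256 token. It is an illustration of the token format, not the project's code: the function names and the claim set (`sub`, `iat`, `exp`) are assumptions, and a production backend would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("utf-8"), hashlib.sha256)
    return _b64url(digest.digest())


def create_token(subject, secret, expires_minutes=1440, now=None):
    """Create an HS256-signed token that expires after expires_minutes."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued_at, "exp": issued_at + expires_minutes * 60}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode("utf-8")) for p in (header, payload)]
    signing_input = ".".join(parts)
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_token(token, secret, now=None):
    """Return the payload if the signature is valid and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload


if __name__ == "__main__":
    token = create_token("user@example.edu", "your-secret-key")
    print(verify_token(token, "your-secret-key")["sub"])
```

With the default of 1440 minutes, a token stays valid for 24 hours after it is issued.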
Beacon__V1/
├── Agent/                  # AI/ML Components
│   ├── embeddings/         # BGE-M3 embeddings
│   ├── voice/              # Whisper transcription
│   ├── rag_agent/          # ReAct agent
│   ├── retrieval/          # Hybrid search
│   ├── lazy_rag/           # On-demand embedding
│   ├── vector_store/       # pgvector integration
│   └── tools/              # Search tools
│
├── backend/                # FastAPI Backend
│   ├── routers/            # API endpoints
│   ├── utils/              # Helper functions
│   ├── database.py         # SQLAlchemy models
│   └── main.py             # FastAPI app
│
├── frontend/               # React Frontend
│   ├── src/
│   │   ├── components/     # Reusable components
│   │   ├── pages/          # Route pages
│   │   ├── services/       # API calls
│   │   └── stores/         # Zustand stores
│   └── package.json
│
├── alembic/                # Database migrations
├── scripts/                # Utility scripts
├── tests/                  # Test suite
├── .env                    # Environment variables
├── requirements.txt        # Python dependencies
├── README.md               # This file
└── PROJECT_DESCRIPTION.md  # Detailed documentation
# Check PostgreSQL is running
psql -h HOST -U USER -d DATABASE
# Verify .env file has correct credentials
# Test connection: python test_redis_connection.py
# Install PyTorch with CUDA support
pip install torch --index-url https://download.pytorch.org/whl/cu118
# Install FFmpeg
# Windows: Download from https://ffmpeg.org/download.html
# Linux: sudo apt install ffmpeg
# Mac: brew install ffmpeg
# For Gmail:
# 1. Enable 2-Factor Authentication
# 2. Generate App Password: https://myaccount.google.com/apppasswords
# 3. Use App Password as SMTP_PASSWORD in .env
- Migrated from FAISS to pgvector for multi-machine support
- Implemented lazy RAG for instant document uploads
- Added email verification system
- Enhanced notification system with hierarchical routing
- Improved analytics dashboard with system health monitoring
- Optimized performance with Redis caching
- Added voice query support (98+ languages)
- Implemented document approval workflows
- Enhanced role-based access control
- Documentation: See phase documentation files for detailed guides
- API Docs: http://localhost:8000/docs
- Logs: Agent/agent_logs/
- Tests: python tests/run_all_tests.py

- Multi-format document processing
- Multilingual embeddings (100+ languages)
- Voice query system (98+ languages)
- Lazy RAG (instant uploads)
- Hybrid retrieval (semantic + keyword)
- External data ingestion
- Citation tracking
- Production-ready
Built with ❤️ for Government Policy Intelligence
Version: 2.0.0 | Status: Production Ready | Last Updated: December 5, 2025