An AI-enhanced classroom platform that ensures academic honesty by identifying plagiarism and AI-generated content.
UniqScan is an academic integrity platform that combines a Google Classroom–style MERN web app with Python microservices to analyze student submissions for similarity and AI-generated content. Instructors create classes and assignments; students submit files. The backend serves uploads and orchestrates analysis by calling a Similarity service that extracts text from PDFs, Office docs, and images (via Tesseract and PyMuPDF), compares it against a corpus to compute similarity, and then queries an AI detector to estimate AI-generated probability. Results are fused into an overall plagiarism score and returned with a rich HTML report that the UI renders alongside grades.
- Features
- Tech stack
- Repository layout
- Architecture and data flow
- Screenshots
- Environment variables (summary)
- How to run locally
- Python modules overview
- API overview
- Troubleshooting
- classroom/: MERN app
  - backend/: Node/Express API, file uploads, ML integration
  - frontend/: React UI
  - ML/: Python services consumed by the backend
    - AI_content/: AI detector service (E5 LoRA)
    - Similarity/: Similarity/OCR/plagiarism service
- ml_nlp-ocr/: Standalone document watcher and OCR-to-Markdown pipeline
- ai_text_detector/: Sample detectors and clients (HF model, Gradio, RapidAPI)
- Matcher_algo/: Matching and plagiarism CLI utilities
- models/: Local cache for the HF detector model
See per-directory READMEs for details.
Classroom & users
- User accounts: register, login, JWT-based auth
- Classrooms: create/join with access codes, roster management
- Posts & discussions: share announcements/resources per classroom
Assignments
- Teacher: create assignments with title/description/deadline
- Student: submit files (PDF/DOCX/PPT/Images/TXT/CSV)
- Submission management: list who submitted, who’s pending
Grading & reporting
- Backend triggers ML grading on submission (asynchronous queue)
- Scores returned and stored per project:
- Similarity score (%) vs local corpus
- AI-generated content score (%) via E5 small LoRA detector
- Overall plagiarism score and risk level
- Detailed, styled HTML report saved and accessible from UI
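
The list above says the similarity and AI scores are combined into an overall plagiarism score and a risk level, but it does not say how. The sketch below shows one plausible fusion, assuming a weighted average. The function name, the 0.6/0.4 weights, and the 30/60 risk thresholds are all illustrative assumptions, not values taken from the UniqScan services.

```python
def fuse_scores(similarity_pct: float, ai_pct: float,
                w_similarity: float = 0.6, w_ai: float = 0.4) -> dict:
    """Combine two 0-100 scores into an overall score and a risk label.

    Weights and thresholds are hypothetical; tune them to your own policy.
    """
    for name, value in (("similarity_pct", similarity_pct), ("ai_pct", ai_pct)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be in [0, 100], got {value}")

    total_weight = w_similarity + w_ai
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    # Weighted average, normalized so the result stays in [0, 100].
    overall = (w_similarity * similarity_pct + w_ai * ai_pct) / total_weight

    if overall >= 60:
        risk = "high"
    elif overall >= 30:
        risk = "medium"
    else:
        risk = "low"
    return {"overall": round(overall, 2), "risk": risk}
```

A weighted average keeps the result on the same percentage scale as its inputs, which makes it easy to show next to the two component scores in the report.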
ML pipeline
- Public file URL served by backend /uploads for ML services
- Similarity/OCR service: downloads file, extracts text with OCR, compares vs corpus
- AI detector service: /classify returns AI probability for text
- Resilient fallbacks if ML is slow/unavailable (timeout handling and fallback report)
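
The fallback behaviour in the last item can be sketched as a thin wrapper around the ML call. This is a minimal illustration, not the project's actual gradingService.js logic: the function name, the shape of the fallback dictionary, and the placeholder report text are assumptions.

```python
from typing import Callable

# Hypothetical placeholder report shown when analysis cannot complete.
FALLBACK_REPORT = "<p>Automatic analysis is unavailable. Please retry later.</p>"


def grade_with_fallback(analyze: Callable[[], dict]) -> dict:
    """Run the ML analysis and fall back to a stub result on network failure.

    `analyze` is any zero-argument callable that performs the request and
    returns the parsed result. Timeouts and refused connections surface as
    OSError subclasses (TimeoutError, ConnectionError) in Python.
    """
    try:
        result = analyze()
    except OSError as exc:
        return {
            "status": "fallback",
            "similarity": None,
            "ai_score": None,
            "report_html": FALLBACK_REPORT,
            "error": type(exc).__name__,
        }
    return {"status": "ok", **result}
```

Returning a result with the same keys in both cases means the caller can store and render it without branching on whether the ML services were reachable.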
Operations & DX
- Config via
.envfiles; timeouts and service URLs configurable - MongoDB persistence for users, classrooms, posts, homework, submissions
- Logs and JSON artifacts for ML service runs; corpus grows automatically
Frontend
- React 17, React Router, React Bootstrap, Axios
Backend
- Node.js, Express, CORS, Multer (uploads), JWT auth, Mongoose (MongoDB)
ML Services (Python)
- Flask microservices
- AI detection: PyTorch, Transformers, Hugging Face Hub
- Similarity/OCR: PyMuPDF (fitz), Tesseract (pytesseract), OpenCV, Pillow, NLTK, python-docx, python-pptx
Storage and infra
- MongoDB (Atlas or local)
- Local filesystem for uploads and generated reports
- Classroom users interact with the React frontend.
- Backend stores data in MongoDB and handles file uploads under classroom/backend/public/uploads.
- When a homework is submitted, backend calls the ML API (Python Similarity service) with a public URL to the uploaded file.
- The Similarity service downloads the file, extracts text (PDF/OCR/etc.), compares against local corpus, calls the AI detector service, and returns scores plus an HTML report.
- Backend returns those results to the frontend for display.
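
The "compares against local corpus" step is not specified in detail here. As a rough illustration, one common approach is Jaccard similarity over word n-grams; the Similarity service may use a different metric. All function names below are hypothetical.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard index of the two n-gram sets, in [0, 1]."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def best_corpus_match(submission: str, corpus: dict, n: int = 3) -> tuple:
    """Return (document name, similarity %) for the closest corpus document.

    `corpus` maps a document name to its extracted text.
    """
    best_name, best_score = None, 0.0
    for name, text in corpus.items():
        score = jaccard_similarity(submission, text, n)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, round(best_score * 100, 2)
```

Usage: pass the extracted text of a new submission and a dictionary of previously extracted texts; the result names the closest document and its similarity as a percentage.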
    React Frontend (student submit) ──> Node/Express Backend (stores file under /uploads)
                                            │
                                            ├─ calls ML API /grade/analyze with file_url
                                            │
                                            ▼
                Flask Similarity Service (download + extract text + compare corpus)
                                            │
                                            ├─ calls AI Detector /classify for AI %
                                            │
                                            └─ returns scores + HTML report
    Backend persists results and serves report → Frontend displays scores and report
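
The first hop in the diagram can be exercised from a script when testing the Similarity service on its own. The sketch below builds the public file URL and the /grade/analyze request without sending it. It assumes the request body is a JSON object with a single file_url field, as the diagram suggests; the authoritative contract is in classroom/backend/ML_API_DOCUMENTATION.md. Both function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote


def build_file_url(backend_url: str, filename: str) -> str:
    """Build the public URL of an uploaded file served under /uploads."""
    return f"{backend_url.rstrip('/')}/uploads/{quote(filename)}"


def build_grade_request(ml_api_base_url: str, file_url: str) -> urllib.request.Request:
    """Prepare a POST to /grade/analyze (payload shape is an assumption)."""
    body = json.dumps({"file_url": file_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ml_api_base_url.rstrip('/')}/grade/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    url = build_file_url("http://localhost:4000", "essay.pdf")
    request = build_grade_request("http://localhost:5000", url)
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
```

To actually send the request, pass it to urllib.request.urlopen with a generous timeout, since OCR and model inference can take a while.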
Note: Place images in the repository root assets/ folder as 1.png, 2.png, ....
- Landing
- Login
- Register
- Classroom view of Teacher
- Creating assignment
- Assignment view (Teacher)
- Join Class (Student)
- Classroom view of Student
- Assignment view (Student)
- Assignment Submission
- Assignment submission Check (Teacher)
- Report (HTML analysis)
- Grade view (Student)
Backend (classroom/backend/.env)
- MONGO_DB_URL: Mongo connection string
- CLIENT_URL: allowed CORS origin (e.g., http://localhost:3000)
- SECRET_ACCESS_TOKEN, ACCESS_TOKEN_EXPIRE, SECRET_REFRESH_TOKEN, REFRESH_TOKEN_EXPIRE: JWT config
- ML_API_BASE_URL: Similarity API base URL (e.g., http://localhost:5000)
- ML_API_TIMEOUT: request timeout in ms (default 300000)
- BACKEND_URL: optional explicit public URL for file links
Frontend (classroom/frontend/.env)
- REACT_APP_BASE_URL: Backend base URL (e.g., http://localhost:4000)
AI Content service (classroom/ML/AI_content)
- Optional HF_TOKEN if needed to pull the model
Similarity service (classroom/ML/Similarity)
- AI_SCORE_API_URL: e.g., http://localhost:5001 (the AI service base; /classify is appended)
- Tesseract must be installed; on Windows the default path is used automatically.
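
As an illustration of how a Python service might read these settings, the sketch below resolves the AI detector URL and converts the millisecond timeout to seconds. It is not the code in classroom/ML/Similarity/app.py. The function and class names are hypothetical, and treating http://localhost:5001 as the fallback when AI_SCORE_API_URL is unset is an assumption; the 300000 ms default for ML_API_TIMEOUT comes from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityConfig:
    ai_score_api_url: str  # base URL of the AI detector service
    classify_url: str      # base URL with /classify appended


def load_similarity_config(env=None) -> SimilarityConfig:
    """Read AI_SCORE_API_URL and derive the /classify endpoint."""
    env = os.environ if env is None else env
    # Assumed fallback; the README only gives this value as an example.
    base = env.get("AI_SCORE_API_URL", "http://localhost:5001").rstrip("/")
    return SimilarityConfig(ai_score_api_url=base, classify_url=f"{base}/classify")


def ml_timeout_seconds(env=None) -> float:
    """Convert ML_API_TIMEOUT (milliseconds) to seconds for Python HTTP clients."""
    env = os.environ if env is None else env
    return int(env.get("ML_API_TIMEOUT", "300000")) / 1000.0
```

Passing a plain dictionary as env makes the functions easy to test without touching the real environment.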
Recommended order, each in a separate terminal:

- AI Detector service (port 5001)

      cd classroom/ML/AI_content
      pip install flask torch transformers huggingface_hub requests
      python e5-small-lora.py

- Similarity service (port 5000)

      cd classroom/ML/Similarity
      pip install flask requests nltk pytesseract pillow pymupdf opencv-python numpy python-docx python-pptx
      # PowerShell syntax; in bash or zsh use: export AI_SCORE_API_URL="http://localhost:5001"
      $env:AI_SCORE_API_URL = "http://localhost:5001"
      python app.py

- Backend (port 4000 by default)

      cd classroom/backend
      npm install
      cp .env.example .env # if you keep a template; otherwise create manually
      npm run dev

- Frontend (port 3000)

      cd classroom/frontend
      npm install
      cp .env.example .env
      npm start

- Optional: Standalone OCR watcher

      cd ml_nlp-ocr
      pip install pytesseract pillow pymupdf watchdog opencv-python numpy tqdm python-docx python-pptx docling
      python app.py

Other Python modules in the repository:

- ai_text_detector/: reference scripts for AI detection via local HF model, Gradio Space, RapidAPI.
- ml_nlp-ocr/: robust document watcher with OCR and Markdown outputs.
- Matcher_algo/: general-purpose n-gram matcher and a folder plagiarism CLI that emits HTML/TXT reports.
- Backend API: Express routers under classroom/backend/src/routers expose users, classrooms, posts, and homework. Grading is orchestrated server-side via gradingService.js.
- ML API: See classroom/backend/ML_API_DOCUMENTATION.md for request/response contract used by the backend.
- AI Detector API: classroom/ML/AI_content/e5-small-lora.py exposes POST /classify { text } -> { ai_score } and GET /health.
- Similarity API: classroom/ML/Similarity/app.py exposes POST /grade/analyze and auxiliary endpoints.
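
The AI Detector contract above (POST /classify { text } -> { ai_score }) can be wrapped in two small helpers on the client side. This is a sketch with hypothetical function names; it only encodes and decodes the JSON bodies and makes no assumption about whether ai_score is a fraction or a percentage.

```python
import json


def encode_classify_request(text: str) -> bytes:
    """Serialize the request body for POST /classify."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps({"text": text}).encode("utf-8")


def parse_classify_response(body: bytes) -> float:
    """Extract ai_score from a /classify response body."""
    payload = json.loads(body.decode("utf-8"))
    if "ai_score" not in payload:
        raise KeyError("response is missing 'ai_score'")
    return float(payload["ai_score"])
```

Keeping encoding and parsing separate from the HTTP call makes it straightforward to unit-test the contract without a running service.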
- Tesseract not found
  - Install Tesseract and ensure its path is correct. On Windows, default path is used automatically by the services.
- CORS errors in browser
  - Ensure backend CLIENT_URL matches the React origin and that CORS is enabled in app.js.
- ML API connection refused or timeout
  - Start AI Detector (5001) and Similarity (5000) first. Verify ML_API_BASE_URL in backend and AI_SCORE_API_URL in Similarity.
- File URL not accessible by ML service
  - Backend must serve uploads at /uploads. Confirm getFileUrl builds a reachable URL and BACKEND_URL is set if deployed.
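
When diagnosing "connection refused or timeout", a quick probe of the AI Detector's GET /health endpoint tells you whether the service is up before you start the rest of the stack. The helper below is a hypothetical sketch using only the standard library. The README names /health only for the AI Detector, so do not assume the Similarity service exposes the same path.

```python
import http.client
import urllib.request


def check_health(base_url: str, timeout_s: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    print("AI Detector healthy:", check_health("http://localhost:5001"))
```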
- Ensure Tesseract OCR is installed and accessible on your system.
- The Similarity service grows its corpus in classroom/ML/Similarity/extracted_text/ as you analyze more files.
- The backend serves uploads via /uploads/... so the Python service can fetch them by URL. Don’t disable this in production without providing an alternative access method.















