PCOS Detect is an end-to-end AI-powered clinical decision support system designed to assess Polycystic Ovary Syndrome (PCOS) risk using a combination of:
- Structured clinical (tabular) data
- Pelvic ultrasound imaging
- Automatic medical report (PDF) parsing
The system integrates machine learning, deep learning, document intelligence, and full-stack engineering into a single modular pipeline focused on robustness, explainability, and real-world usability.
⚠️ This system is intended for screening and decision-support purposes only and must not be used as a medical diagnostic tool.
- Multimodal PCOS risk prediction (Tabular + Ultrasound)
- Domain-specific expert ensemble for clinical features
- Adaptive meta-learning-based fusion
- Explainable AI with Grad-CAM for ultrasound interpretation
- Automatic medical report (PDF) parsing
- Secure FastAPI backend with Next.js frontend
- ROC-AUC-driven evaluation metrics
Medical PDF ──┐
              ├──▶ Document Parser ──▶ Auto-filled Form (Editable)
Manual Input ─┘

Tabular Data ─────────▶ Clinical Experts (3 Models)
                              ↓
Ultrasound Image ─▶ CNN + Texture Features ─▶ Ultrasound Model
                              ↓
                   Meta Learner (Stacking)
                              ↓
                 Adaptive Multimodal Fusion
                              ↓
           Final PCOS Risk (Low / Moderate / High)
- FastAPI-based REST services
- Secure, modular inference pipeline
- Models loaded once at application startup
- Stateless, auditable prediction endpoints
- Next.js (App Router) with TypeScript
- Guided clinical data entry workflow
- PDF upload with editable auto-filled fields
- Ultrasound image upload
- Explainable results dashboard
- CatBoost classifiers for tabular and ultrasound data
- EfficientNet-B0 for ultrasound feature extraction (transfer learning)
- ResNet50 for Grad-CAM explainability
- Meta learner for expert fusion
PCOS/
├── app/                  # FastAPI backend
│   ├── api/              # Route handlers
│   ├── core/             # Configuration
│   ├── models/           # Pydantic schemas
│   └── main.py           # App entry point
│
├── data/
│   ├── tabular/          # Clinical datasets
│   ├── features/         # Ultrasound texture features
│   └── ultrasound/       # Raw & processed images
│
├── frontend/             # Next.js frontend
│   ├── app/              # App Router pages
│   ├── lib/              # API utilities
│   └── public/           # Static assets
│
├── models/               # Trained ML models
│   ├── expert_hormonal.cbm
│   ├── expert_metabolic.cbm
│   ├── expert_symptom.cbm
│   ├── ultrasound_catboost.cbm
│   └── resnet50_gradcam.pth
│
├── scripts/              # Data preprocessing & training scripts
├── notebooks/            # Experiments & evaluation
├── requirements.txt
└── README.md
Includes:
- Demographics (Age, Height, Weight, BMI)
- Hormonal markers (FSH, LH, AMH, TSH, PRL, Progesterone)
- Metabolic indicators (RBS, Weight Gain)
- Symptoms and lifestyle factors
Target Variable: PCOS (0 / 1)
- Ovarian ultrasound scans
- Feature extraction using:
  - EfficientNet-B0 embeddings
  - Local Binary Pattern (LBP) texture descriptors
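
As a rough illustration of the LBP texture descriptor listed above, the following pure-Python sketch computes a basic 8-neighbour, radius-1 LBP histogram for a grayscale image given as a list of rows. It is not the project's extraction code: the function names are hypothetical, and the radius, neighbour count, and LBP variant actually used are not stated in this README.

```python
from typing import List

# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[float]:
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * 256


# Toy inputs (not real ultrasound data).
flat = [[10] * 4 for _ in range(4)]          # uniform patch
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]     # single bright pixel
flat_hist = lbp_histogram(flat)
peak_code = lbp_code(peak, 1, 1)
```

In a real pipeline a library routine such as scikit-image's `local_binary_pattern` would more likely be used; the resulting histogram can then be concatenated with the CNN embedding before classification.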
- Blood test reports
- Ultrasound summaries
Used only for automatic data extraction, not for model training.
Instead of one monolithic model, clinical features are split into domain-specific experts:
| Expert | Feature Focus |
|---|---|
| Hormonal | FSH, LH, AMH, TSH, PRL |
| Metabolic | BMI, RBS, Weight Gain |
| Symptom | Acne, Hair Growth, Exercise |
- Model: CatBoostClassifier
- Metric: ROC-AUC
- Native handling of missing and categorical values
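
The expert split in the table above can be sketched as a simple routing step that hands each expert only its own features. This is a minimal sketch under assumptions: the column names follow the table, but the exact names expected by the trained models, and the helper `split_by_expert` itself, are hypothetical.

```python
from typing import Dict, Optional

# Feature groups copied from the expert table; the real column names used
# by the trained models are an assumption.
EXPERT_FEATURES = {
    "hormonal": ["FSH", "LH", "AMH", "TSH", "PRL"],
    "metabolic": ["BMI", "RBS", "Weight Gain"],
    "symptom": ["Acne", "Hair Growth", "Exercise"],
}


def split_by_expert(
    record: Dict[str, Optional[float]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Route one patient record into per-expert feature dicts.

    Features absent from the record are kept as None so that a model with
    native missing-value handling can treat them as missing downstream.
    """
    return {
        expert: {name: record.get(name) for name in names}
        for expert, names in EXPERT_FEATURES.items()
    }


# Illustrative record with made-up values and several missing fields.
record = {"FSH": 5.8, "LH": 9.1, "AMH": 8.4, "BMI": 27.3, "Acne": 1}
views = split_by_expert(record)
```

Keeping the groups in one mapping makes it easy to audit which measurement influences which expert.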
Outputs of all experts are combined using a meta learner.
Meta Features
- Expert probabilities
- Mean and maximum confidence
- Standard deviation
- Pairwise disagreement
Goal
- Improve calibration
- Reduce individual expert bias
- Increase robustness
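
One possible way to assemble the meta features listed above from the three expert probabilities is sketched below. The README does not define "confidence", so here it is assumed to be the distance of a probability from 0.5, rescaled to [0, 1]; the function name and the feature ordering are also hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List


def build_meta_features(expert_probs: Dict[str, float]) -> List[float]:
    """Build a meta-feature vector from per-expert PCOS probabilities."""
    names = sorted(expert_probs)                 # fixed, reproducible order
    probs = [expert_probs[n] for n in names]
    # Assumed definition: confidence = distance from 0.5, scaled to [0, 1].
    conf = [abs(p - 0.5) * 2 for p in probs]
    # Pairwise disagreement = absolute difference between two experts.
    disagreement = [abs(a - b) for a, b in combinations(probs, 2)]
    return probs + [mean(conf), max(conf), pstdev(probs)] + disagreement


# Illustrative probabilities, not model outputs.
feats = build_meta_features({"hormonal": 0.7, "metabolic": 0.6, "symptom": 0.9})
```

With three experts this yields nine values (three probabilities, three summary statistics, three pairwise gaps), which would then be fed to the stacking model.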
- Backbone: EfficientNet-B0 (ImageNet pretrained)
- Texture features: LBP
- Final classifier: CatBoost
Final PCOS risk is computed dynamically:
- High tabular confidence → tabular weighted more
- High ultrasound confidence → ultrasound weighted more
- Otherwise → balanced fusion
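
The three fusion rules above could look roughly like the sketch below. All numbers are assumptions made for illustration: the confidence definition, the 0.6 confidence threshold, the 0.7/0.3 weights, and the Low/Moderate/High cut-offs are not specified in this README.

```python
from typing import Optional


def confidence(p: float) -> float:
    """Assumed confidence measure: distance from 0.5, scaled to [0, 1]."""
    return abs(p - 0.5) * 2


def fuse(p_tab: float, p_us: Optional[float],
         high: float = 0.6, favoured: float = 0.7) -> float:
    """Combine tabular and ultrasound probabilities with adaptive weights."""
    if p_us is None:                      # ultrasound is optional
        return p_tab
    c_tab, c_us = confidence(p_tab), confidence(p_us)
    if c_tab >= high and c_tab > c_us:    # tabular clearly more confident
        w_tab = favoured
    elif c_us >= high and c_us > c_tab:   # ultrasound clearly more confident
        w_tab = 1 - favoured
    else:                                 # otherwise: balanced fusion
        w_tab = 0.5
    return w_tab * p_tab + (1 - w_tab) * p_us


def risk_band(p: float, low: float = 0.33, high: float = 0.66) -> str:
    """Map a fused probability to a risk label (hypothetical cut-offs)."""
    if p < low:
        return "Low"
    if p < high:
        return "Moderate"
    return "High"


fused = fuse(0.9, 0.55)    # confident tabular, uncertain ultrasound
label = risk_band(fused)
```

When no ultrasound image is supplied, the sketch simply falls back to the tabular probability.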
- Dedicated ResNet50 CNN trained for visualization
- Grad-CAM heatmaps highlight ovarian regions
- Exposed via backend API
- Displayed in frontend alongside predictions
Validation Accuracy: ~81%
Supported Inputs
- PDF lab reports
- Tables and free text
Parsing Stack
- Camelot (table extraction)
- pdfplumber (text extraction)
- Regex-based numeric parsing
- Unit normalization and validation
- Confidence scoring per extracted field
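
The regex parsing, unit normalization, and confidence scoring steps above can be sketched as follows, assuming the text has already been extracted from the PDF. The pattern, the unit table, and the confidence values (0.9 for a recognized unit, 0.5 otherwise) are hypothetical and are not the scoring used by the actual parser.

```python
import re
from typing import Dict

# Hypothetical unit table: factor that converts each accepted unit to the
# canonical one (mIU/mL for FSH and LH, ng/mL for AMH).
UNITS = {
    "FSH": {"miu/ml": 1.0, "iu/l": 1.0},
    "LH": {"miu/ml": 1.0, "iu/l": 1.0},
    "AMH": {"ng/ml": 1.0, "pmol/l": 1 / 7.14},
}

# Marker name, optional ":" or "=", a number, then an optional "x/y" unit.
PATTERN = re.compile(
    r"\b(?P<name>FSH|LH|AMH)\b[ \t]*[:=]?[ \t]*"
    r"(?P<value>\d+(?:\.\d+)?)[ \t]*(?P<unit>[A-Za-z]+/[A-Za-z]+)?",
    re.IGNORECASE,
)


def parse_lab_text(text: str) -> Dict[str, dict]:
    """Extract known markers from free text with a crude confidence score."""
    fields: Dict[str, dict] = {}
    for m in PATTERN.finditer(text):
        name = m.group("name").upper()
        value = float(m.group("value"))
        unit = (m.group("unit") or "").lower()
        factor = UNITS[name].get(unit)
        if factor is not None:
            # Recognized unit: normalize and report higher confidence.
            fields[name] = {"value": round(value * factor, 2), "confidence": 0.9}
        else:
            # Missing or unknown unit: keep the raw value, lower confidence.
            fields[name] = {"value": value, "confidence": 0.5}
    return fields


# Made-up report text for illustration.
fields = parse_lab_text("FSH: 5.8 mIU/mL\nLH 9.1\nAMH = 60 pmol/L")
```

Low-confidence fields are natural candidates for highlighting in the editable auto-filled form so that a user can review them.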
Sample Output
{
"FSH": { "value": 5.8, "confidence": 0.95 },
"AMH": { "value": 8.4, "confidence": 0.93 }
}

| Component | Metric | Score |
|---|---|---|
| Hormonal Expert | ROC-AUC | ~0.71 |
| Metabolic Expert | ROC-AUC | ~0.63 |
| Symptom Expert | ROC-AUC | ~0.75 |
| Meta Learner | ROC-AUC | ~0.80 |
| Ultrasound (Grad-CAM CNN) | Accuracy | ~81% |
POST /api/pcos/predict
FormData
tabular_data → JSON
ultrasound → Image (optional)
POST /api/pcos/parse-document
FormData
document → PDF
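
A client call to the prediction endpoint might look like the sketch below. Several details are assumptions: the base URL (uvicorn's default `http://localhost:8000`), the keys inside `tabular_data`, the JSON response, and the use of the third-party `requests` library, which is imported lazily so the payload helper works without it.

```python
import json
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: uvicorn default host/port


def build_predict_form(tabular: dict) -> dict:
    """Form fields for POST /api/pcos/predict; tabular_data is a JSON string."""
    return {"tabular_data": json.dumps(tabular)}


def predict(tabular: dict, ultrasound_path: Optional[str] = None) -> dict:
    """Send a prediction request; the ultrasound image is optional."""
    import requests  # third-party dependency, only needed for the HTTP call

    url = f"{BASE_URL}/api/pcos/predict"
    data = build_predict_form(tabular)
    if ultrasound_path is None:
        resp = requests.post(url, data=data, timeout=30)
    else:
        with open(ultrasound_path, "rb") as fh:
            resp = requests.post(url, data=data,
                                 files={"ultrasound": fh}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # assumption: the endpoint answers with JSON


# Hypothetical field names; the real schema lives in the backend models.
form = build_predict_form({"FSH": 5.8, "AMH": 8.4, "BMI": 27.3})
```

Calling `predict({...}, "scan.png")` with the backend running would submit both modalities in one multipart request.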
conda create -n pcos python=3.10
conda activate pcos
pip install -r requirements.txt
uvicorn app.main:app --reload

cd frontend
npm install
npm run dev

This system is not a medical diagnostic tool.
It is intended for educational, research, and decision-support purposes only. Final diagnosis must always be made by a licensed healthcare professional.