End-to-end MLOps pipeline engineered for medical device AI — incorporating FDA Good Machine Learning Practice (GMLP) principles, PCCP-ready model versioning, automated demographic bias monitoring, and 510(k)-aligned evaluation workflows.
Most ML pipelines stop at model accuracy. Medical AI requires something fundamentally different: auditability, bias transparency, regulatory alignment, and the ability to safely update models post-market under a Predetermined Change Control Plan (PCCP).
This pipeline is built around the FDA's 10 GMLP guiding principles and structured to support the full Software as a Medical Device (SaMD) lifecycle — from data ingestion through post-market surveillance.
```
                          MedML-Ops Pipeline

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────────────┐
│   Data   │    │  Schema  │    │ Training │    │    Experiment     │
│Ingestion │───▶│Validation│───▶│ Pipeline │───▶│     Tracking      │
│          │    │ + Drift  │    │ (AutoML) │    │     (MLflow)      │
└──────────┘    └──────────┘    └──────────┘    └───────────────────┘
                     │               │                    │
                     ▼               ▼                    ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────────────┐
│   GMLP   │    │   Bias   │    │  Model   │    │    Regression     │
│Compliance│◀───│ Detector │◀───│Evaluator │◀───│      Testing      │
│ Checklist│    │(Fairness)│    │(Clinical)│    │  (PCCP-aligned)   │
└──────────┘    └──────────┘    └──────────┘    └───────────────────┘
     │               │               │
     ▼               ▼               ▼
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────────────┐
│   PCCP   │    │  Model   │    │  Model   │    │      FastAPI      │
│ Manager  │───▶│ Registry │───▶│ Server   │───▶│    Deployment     │
│          │    │ (MLflow) │    │(Versioned│    │   (Dockerized)    │
└──────────┘    └──────────┘    └──────────┘    └───────────────────┘
                                                          │
                                                          ▼
                                     ┌────────────────────────────────────────┐
                                     │         Production Monitoring          │
                                     │  Data Drift   │  Prediction Drift      │
                                     │  PSI / KS     │  Evidently AI          │
                                     │  Alerting     │  Model Cards (Auto-gen)│
                                     └────────────────────────────────────────┘
```
Every component maps to one or more of the FDA's 10 Good Machine Learning Practice principles. The gmlp_checklist.py module automatically audits your pipeline state and generates compliance documentation suitable for 510(k) submissions.
bias_detector.py stratifies model performance across age groups, biological sex, and self-reported ethnicity — computing equalized odds, demographic parity gap, and calibration curves per subgroup. This directly addresses FDA guidance on identifying and mitigating bias in clinical AI.
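For intuition, the core subgroup metrics can be computed in a few lines of plain Python. The sketch below is illustrative only; the helper names are hypothetical and not bias_detector.py's actual API:

```python
# Illustrative per-subgroup fairness metrics (hypothetical names,
# not the bias_detector.py interface).

def subgroup_rates(y_true, y_pred, groups):
    """Selection rate, TPR, and FPR for each demographic subgroup."""
    rates = {}
    for g in sorted(set(groups)):
        yt = [t for t, gg in zip(y_true, groups) if gg == g]
        yp = [p for p, gg in zip(y_pred, groups) if gg == g]
        pos = [p for p, t in zip(yp, yt) if t == 1]  # predictions on true positives
        neg = [p for p, t in zip(yp, yt) if t == 0]  # predictions on true negatives
        rates[g] = {
            "selection_rate": sum(yp) / len(yp),
            "tpr": sum(pos) / len(pos) if pos else None,
            "fpr": sum(neg) / len(neg) if neg else None,
        }
    return rates

def demographic_parity_gap(rates):
    """Largest difference in positive-prediction rate across subgroups."""
    sel = [r["selection_rate"] for r in rates.values()]
    return max(sel) - min(sel)

def equalized_odds_gap(rates):
    """Worst-case TPR or FPR disparity across subgroups."""
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

Reporting both metrics matters: demographic parity compares raw positive-prediction rates, while equalized odds compares error rates, and each can look acceptable while the other reveals a disparity.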
pccp_manager.py implements the FDA's Predetermined Change Control Plan framework, allowing you to pre-define and validate model updates without requiring a new 510(k) for each version change.
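Conceptually, a PCCP gate reduces to matching a candidate update against a pre-specified change type and its acceptance criteria. A minimal sketch, with hypothetical structure and names (not pccp_manager.py's real interface):

```python
# Hypothetical PCCP-style gate: a model update deploys only if it matches a
# pre-specified change type and clears every acceptance criterion.
PRESPECIFIED_CHANGES = {
    "retrain_on_new_data": {
        "allowed_modifications": ["weights"],  # architecture stays fixed
        "acceptance_criteria": {"auroc": 0.90, "sensitivity": 0.85},
        "required_checks": ["bias_audit", "regression_tests"],
    },
}

def validate_change(change_type, candidate_metrics, passed_checks):
    """Return (approved, reasons) for a candidate model update."""
    spec = PRESPECIFIED_CHANGES.get(change_type)
    if spec is None:
        return False, [f"'{change_type}' is not a pre-specified change"]
    reasons = []
    for metric, floor in spec["acceptance_criteria"].items():
        value = candidate_metrics.get(metric, 0.0)
        if value < floor:
            reasons.append(f"{metric}={value} below floor {floor}")
    for check in spec["required_checks"]:
        if check not in passed_checks:
            reasons.append(f"missing required check: {check}")
    return not reasons, reasons
```

The design point is that the criteria are fixed before the update exists, so approval is a mechanical check rather than an after-the-fact judgment call.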
model_evaluator.py goes beyond AUROC to support Youden's J threshold selection, iso-sensitivity curves, and comparison against predicate device performance benchmarks for 510(k) substantial equivalence arguments.
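Youden's J itself is straightforward: sweep candidate thresholds and keep the one that maximizes sensitivity + specificity − 1. A dependency-free sketch (the production model_evaluator.py implementation may differ):

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    where a case is called positive when score >= threshold.
    (Ties in y_score are ignored for clarity.)"""
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_j, best_thr = float("-inf"), None
    tp = fp = 0
    # Sweep thresholds from the highest score downward, updating TP/FP counts.
    for score, label in sorted(zip(y_score, y_true), reverse=True):
        tp += label
        fp += 1 - label
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_j, best_thr = j, score
    return best_thr, best_j
```

For clinical use, the unweighted J is only a starting point: when false negatives are costlier than false positives, the same sweep can maximize a cost-weighted objective instead.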
model_card_generator.py produces Google-style model cards extended with FDA submission metadata — intended use, training population demographics, known limitations, and subgroup performance tables.
| Component | Technology |
|---|---|
| Experiment Tracking | MLflow 2.x |
| Data Validation | Great Expectations + custom DICOM validation |
| Model Serving | FastAPI |
| Drift Monitoring | Evidently AI |
| Containerization | Docker + Docker Compose |
| AutoML / HPO | Optuna |
| Statistical Testing | SciPy, statsmodels |
| Calibration | scikit-learn calibration, netcal |
| Configuration | YAML + Pydantic |
| Code Quality | Black, isort, mypy, pre-commit |
```
medml-ops/
├── src/
│   ├── data_validation/
│   │   ├── schema_validator.py      # Great Expectations-style schema + DICOM validation
│   │   └── bias_detector.py         # Demographic fairness metrics & subgroup analysis
│   ├── training/
│   │   ├── experiment_tracker.py    # MLflow GMLP-compliant experiment tracking
│   │   └── automated_training.py    # Optuna HPO + reproducible training pipeline
│   ├── evaluation/
│   │   ├── model_evaluator.py       # Clinical operating point + calibration + 510(k) metrics
│   │   └── regression_testing.py    # PCCP-aligned regression & slice-based testing
│   ├── deployment/
│   │   ├── model_server.py          # FastAPI inference server with prediction logging
│   │   └── docker/
│   │       ├── Dockerfile.train     # Training container
│   │       ├── Dockerfile.serve     # Serving container
│   │       └── docker-compose.yml   # Full pipeline orchestration
│   ├── monitoring/
│   │   ├── drift_monitor.py         # PSI, KS, chi-square drift detection + Evidently AI
│   │   └── model_card_generator.py  # FDA-ready automated model cards
│   └── compliance/
│       ├── gmlp_checklist.py        # FDA GMLP 10-principle automated audit
│       └── pccp_manager.py          # Predetermined Change Control Plan manager
├── configs/
│   └── pipeline_config.yaml         # Full pipeline configuration
├── scripts/
│   ├── run_pipeline.py              # End-to-end pipeline execution
│   └── generate_report.py           # Compliance report generation
├── docs/
│   ├── FDA_COMPLIANCE.md            # GMLP, PCCP, 510(k), IEC 62304, 21 CFR 820
│   └── ARCHITECTURE.md              # Pipeline architecture deep-dive
├── tests/                           # Unit and integration tests
├── notebooks/                       # Exploratory analysis notebooks
├── Makefile                         # Common pipeline commands
├── requirements.txt
├── setup.py
└── .gitignore
```
- Python 3.10+
- Docker & Docker Compose
- MLflow tracking server (or use local file store)
```bash
git clone https://github.com/yourusername/medml-ops.git
cd medml-ops

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
pip install -e .

# Configure your pipeline
cp configs/pipeline_config.yaml configs/my_experiment.yaml
# Edit configs/my_experiment.yaml

# Run end-to-end
make pipeline CONFIG=configs/my_experiment.yaml

# Or step-by-step:
make validate   # Data validation
make train      # Model training
make evaluate   # Evaluation + bias audit
make serve      # Start inference server
make monitor    # Start drift monitoring
make report     # Generate compliance report
```

Build and start the full pipeline with Docker:

```bash
cd src/deployment/docker
docker-compose up --build

# Services:
# - MLflow UI:      http://localhost:5000
# - FastAPI server: http://localhost:8000
# - API docs:       http://localhost:8000/docs
# - Monitoring:     http://localhost:8001
```

```python
from src.data_validation.schema_validator import MedicalDataValidator

validator = MedicalDataValidator(config_path="configs/pipeline_config.yaml")
report = validator.validate_dataset(
    data_path="data/training_cohort.csv",
    reference_path="data/reference_schema.json",
)
print(report.summary())
```

```python
from src.data_validation.bias_detector import DemographicBiasDetector

detector = DemographicBiasDetector(
    sensitive_attributes=["age_group", "sex", "ethnicity"]
)
bias_report = detector.analyze(
    y_true=labels,
    y_pred=predictions,
    demographics=demographic_df,
)
bias_report.save_html("reports/bias_audit_v1.html")
```

```python
from src.training.experiment_tracker import GMLPExperimentTracker

with GMLPExperimentTracker(experiment_name="chest_xray_classifier") as tracker:
    tracker.log_dataset_metadata(dataset_path, version="v2.1")
    tracker.log_hyperparams(config)
    # ... training loop ...
    tracker.log_metrics({"auroc": 0.94, "f1": 0.87})
    tracker.log_gmlp_documentation()
```

```python
from src.compliance.gmlp_checklist import GMLPComplianceChecker

checker = GMLPComplianceChecker(run_id=tracker.run_id)
report = checker.run_full_audit()
report.export_pdf("compliance/gmlp_audit_v1.pdf")
```

| FDA Principle | Implementation |
|---|---|
| Multi-disciplinary expertise | Documented stakeholder sign-offs in PCCP |
| Good software engineering | Pre-commit hooks, type hints, unit tests |
| Clinical study design | Stratified splits, demographic balance checks |
| Data management | Schema validation, DICOM metadata audit, drift detection |
| Re-training practices | PCCP manager, automated regression testing |
| Model design tradeoffs | Clinical operating point selection, sensitivity/specificity curves |
| Human-AI collaboration | Confidence thresholding, uncertainty quantification |
| Test dataset independence | Train/test separation verification in GMLP checklist |
| Evaluation reflects deployment | Domain shift detection, production drift monitoring |
| Monitoring post-deployment | Evidently AI drift, PSI/KS alerts, automated model cards |
See docs/FDA_COMPLIANCE.md for the full regulatory mapping.
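For reference, the PSI drift statistic mentioned in the table has a compact definition: bin a feature on the reference data, compare each bin's share between reference and production samples, and sum (p − q)·ln(p/q). A self-contained sketch with an illustrative binning strategy (drift_monitor.py's actual binning may differ):

```python
import math

def population_stability_index(reference, production, bins=10):
    """PSI = sum over bins of (p_ref - p_prod) * ln(p_ref / p_prod).
    Equal-width bin edges are taken from the reference sample's range."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # count of edges below v
        eps = 1e-4  # floor empty bins so the log stays finite
        return [max(c / len(values), eps) for c in counts]

    p, q = bin_shares(reference), bin_shares(production)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift worth alerting on; each term of the sum is non-negative, so PSI is zero only when the binned distributions match.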
- FDA GMLP — Good Machine Learning Practice for Medical Device Development
- FDA AI/ML Action Plan — Artificial Intelligence/Machine Learning-Based Software as a Medical Device Action Plan
- IEC 62304 — Medical device software lifecycle processes
- 21 CFR Part 820 — FDA Quality System Regulation
- ISO 13485 — Medical devices quality management systems
- FDA 510(k) — Premarket notification framework
See CONTRIBUTING.md. All contributions must maintain GMLP compliance standards documented in docs/FDA_COMPLIANCE.md.
MIT License — see LICENSE.
Built for the intersection of software engineering rigor and FDA regulatory science. Designed to demonstrate what production medical AI engineering actually looks like.