This repo contains a MobileNetV2 model and the full pipeline to train it and deploy it. The code under src/ is packaged into a single Docker image that can either run training or serve inference via FastAPI. A small shell script switches the entrypoint between train and predict.
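The mode switch itself lives in a shell script, which this README does not reproduce. As a rough illustration of the same idea, the Python sketch below maps the `train` and `predict` arguments to a command line. The module paths (`src.train`, `src.predict:app`) and the uvicorn options are assumptions made for the example, not taken from the repo.

```python
# Hypothetical commands: the real module paths and server options are not
# stated in this README.
COMMANDS = {
    "train": ["python", "-m", "src.train"],
    "predict": ["uvicorn", "src.predict:app", "--host", "0.0.0.0", "--port", "8000"],
}


def resolve_command(mode):
    """Return the command line for a mode, or raise ValueError if it is unknown."""
    if mode not in COMMANDS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(COMMANDS)}")
    return list(COMMANDS[mode])
```

A real entrypoint would then hand the resolved command to the operating system (for example with `os.execvp`) so that the chosen process replaces the launcher and receives container signals directly.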
Infrastructure:
- MinIO for training data and model storage
- MLflow for experiment tracking, metrics, and model registry
- Kubeflow Trainer for running the training job
- KServe for hosting inference endpoints (QA and Prod)
- GitLab for CI/CD and the container registry
Model:
- Backbone: MobileNetV2 with pretrained weights to keep inference lean
- Data: SPEED (Spacecraft Pose Estimation Dataset) for training and evaluation of pose estimation on noncooperative spacecraft
- Input: an in-orbit satellite image
- Output: a pose estimate
- Source files: config.py, data_processing.py, model.py, predict.py, train.py
- Serving: predict.py exposes a FastAPI app
CI/CD pipeline stages:
- Unit tests
- Build image and push to the container registry
- Train model on Kubeflow. Metrics and artifacts tracked in MLflow
- Deploy QA on KServe
- Integration tests against the live QA endpoint
- Deploy Prod
- Manual rollback jobs that apply previously saved manifests
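The integration stage above calls the live QA endpoint. A minimal sketch of such a call using only the standard library follows. It assumes that `KSERVE_ENDPOINT` holds a base URL including the scheme, that the FastAPI app accepts raw image bytes, and that the route is `/predict`; the route and the payload format are hypothetical, so check predict.py for the real ones.

```python
import os
import urllib.request


def build_predict_request(endpoint, image_bytes, path="/predict"):
    """Build (but do not send) a POST request carrying raw image bytes.

    The "/predict" default is a placeholder route, not confirmed by the repo.
    """
    url = endpoint.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def check_qa_endpoint(image_bytes, timeout=10.0):
    """Send one request to the endpoint in KSERVE_ENDPOINT and return the HTTP status."""
    request = build_predict_request(os.environ["KSERVE_ENDPOINT"], image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from the network call lets the URL and headers be checked in a unit test, while `check_qa_endpoint` runs only where the QA endpoint is reachable.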
Basic commands:
pytest tests/unit/
pytest tests/integration/
docker run IMAGENAME train
docker run -p 8000:8000 IMAGENAME predict
CI/CD variables expected by the pipeline:
AWS_ACCESS_KEY_ID
AWS_REGION
AWS_SECRET_ACCESS_KEY
DOCKER_AUTH_CONFIG
KSERVE_ENDPOINT
MINIO_ACCESS_KEY
MINIO_ENDPOINT
MINIO_SECRET_KEY
MLFLOW_TRACKING_URI
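
A job that starts with one of these variables unset tends to fail late and with an unhelpful error. The sketch below is one way to check them up front. The variable names come from the list above; the helper `missing_variables` is hypothetical and not part of the repo.

```python
import os

# Names copied from the list of CI/CD variables above.
REQUIRED_VARIABLES = (
    "AWS_ACCESS_KEY_ID",
    "AWS_REGION",
    "AWS_SECRET_ACCESS_KEY",
    "DOCKER_AUTH_CONFIG",
    "KSERVE_ENDPOINT",
    "MINIO_ACCESS_KEY",
    "MINIO_ENDPOINT",
    "MINIO_SECRET_KEY",
    "MLFLOW_TRACKING_URI",
)


def missing_variables(env=None):
    """Return the required names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

Calling `missing_variables()` at the start of a job and exiting with a non-zero status when the list is non-empty surfaces configuration gaps before any image is built or training starts.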