COMP 3610 – Assignment 4: MLOps & Model Deployment

Name: Samuel Soman
ID: 816039318

End-to-end ML deployment pipeline: a tuned Random Forest model predicts tip_amount for NYC Yellow Taxi trips, tracked with MLflow, served via a FastAPI REST API, and containerised with Docker Compose.

Prerequisites

Requirement	Version
Python	3.10+
Docker Desktop	Latest (must be running for Part 3)
Git	Any

How to Run the Entire Assignment

Follow the steps below in order. The notebook handles everything from data download through model training, API testing, and Docker deployment.

1. Clone the repository

git clone <repo-url>
cd COMP3610-A4

2. Create a virtual environment and install dependencies

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

pip install -r requirements.txt

3. Run the notebook (Parts 1 & 2)

Open assignment4.ipynb in VS Code or Jupyter and execute all cells from top to bottom. No separate MLflow server is needed — the notebook uses a local SQLite-backed tracking store (sqlite:///mlflow.db) so everything runs in-process.

The notebook will:

Download and clean the NYC Yellow Taxi dataset (~1.8 M rows)
Engineer features (speed, log distance, fare ratios, weekend flag)
Part 1 — MLflow: Train 3 models (RF baseline, RF tuned, Linear Regression), log parameters/metrics/artifacts/tags to MLflow, compare runs, and register the best model in the Model Registry
Export the best model to models/rf_regressor.joblib
Part 2 — FastAPI: Display app.py and test_app.py source, then run pytest (10 tests — all should pass)

Note: scikit-learn imports can be slow on some machines. Allow a few minutes for the model training cells to complete.
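
The feature-engineering step listed above can be sketched in plain Python. This is only an illustration: the output names (speed_mph, log_distance, fare_per_mile, is_weekend), the exact formulas, and the Monday = 0 day convention are assumptions, not taken from the notebook. The input field names come from the sample request used later in this README.

```python
import math


def engineer_features(trip: dict) -> dict:
    """Return a copy of `trip` with illustrative derived features added."""
    distance = trip["trip_distance"]
    duration_min = trip["trip_duration_minutes"]
    fare = trip["fare_amount"]

    features = dict(trip)
    # Average speed (distance per hour); guard against zero-duration trips.
    features["speed_mph"] = distance / (duration_min / 60) if duration_min > 0 else 0.0
    # log1p keeps zero-distance trips finite, since log(1 + 0) = 0.
    features["log_distance"] = math.log1p(distance)
    # Fare per unit distance; guard against zero-distance trips.
    features["fare_per_mile"] = fare / distance if distance > 0 else 0.0
    # Assumption: Monday = 0, so 5 and 6 are Saturday and Sunday.
    features["is_weekend"] = int(trip["pickup_day_of_week"] >= 5)
    return features


example = engineer_features({
    "trip_distance": 3.5,
    "fare_amount": 15.0,
    "pickup_day_of_week": 2,
    "trip_duration_minutes": 12.0,
})
```

The zero guards matter because the test suite includes a zero-distance edge case; a naive ratio would raise ZeroDivisionError there.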

4. (Optional) Browse the MLflow UI

To view logged experiments in the MLflow dashboard, run this in a separate terminal:

mlflow ui --backend-store-uri sqlite:///mlflow.db --port 5000

Then open http://localhost:5000.

5. Run the API locally (without Docker)

uvicorn app:app --port 8000

Swagger docs: http://localhost:8000/docs
Health check: http://localhost:8000/health
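
Once the server is up, /predict can also be called from Python using only the standard library. The sketch below is a hypothetical client, not part of the repository: the helper names build_predict_request and predict are invented for illustration, and the payload mirrors the curl example in the Docker section. The response schema is not documented here, so the parsed JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_predict_request(trip: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    body = json.dumps(trip).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(trip: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(trip, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample_trip = {
    "trip_distance": 3.5,
    "passenger_count": 1,
    "fare_amount": 15.0,
    "pickup_hour": 14,
    "pickup_day_of_week": 2,
    "trip_duration_minutes": 12.0,
}
request = build_predict_request(sample_trip)
```

With the API running, predict(sample_trip) sends the request. Building the request separately makes it possible to inspect the URL, headers, and body without a live server.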

6. Run the test suite

pytest test_app.py -v

All 10 tests should pass:

Single prediction, batch prediction
Missing fields (422), out-of-range values (422)
Health check, model info
Zero-distance edge case, extreme fare edge case
Batch >100 rejection (422)
Swagger docs accessible

7. Docker — Build & Run (Part 3)

Make sure Docker Desktop is running, then execute the Docker cells in the notebook, or run manually:

# Build the image
docker build -t taxi-tip-api .

# Report image size
docker images taxi-tip-api

# Start all services (API + MLflow bonus)
docker compose up -d --build

# Verify services are running
docker compose ps

# Test the containerised API
curl http://localhost:8000/health
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d "{\"trip_distance\":3.5,\"passenger_count\":1,\"fare_amount\":15.0,\"pickup_hour\":14,\"pickup_day_of_week\":2,\"trip_duration_minutes\":12.0}"

# Shut down cleanly
docker compose down
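
Containers can take a few seconds to accept connections after docker compose up returns, so a first curl may fail. A minimal polling sketch in Python, assuming /health answers HTTP 200 once the API is ready (the helper names http_probe and wait_until_healthy are hypothetical, as are the retry counts):

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it succeeds or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, wait_until_healthy(lambda: http_probe("http://localhost:8000/health")) blocks for up to roughly 30 seconds and returns whether the service came up. Passing the probe and sleep function as arguments keeps the loop testable without a running container.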

Service	Port	Description
`api`	8000	FastAPI prediction service
`mlflow`	5000	MLflow tracking server (bonus)

Project Structure

COMP3610-A4/
├── assignment4.ipynb      # Main notebook (Parts 1-4, all outputs visible)
├── app.py                 # FastAPI application
├── test_app.py            # pytest test suite (10 tests)
├── Dockerfile             # Container definition for the API
├── docker-compose.yml     # Orchestrates API + MLflow services
├── requirements.txt       # Pinned Python dependencies
├── README.md              # This file
├── .gitignore             # Excludes data/, mlruns/, models/, etc.
├── .dockerignore          # Excludes unnecessary files from Docker build
└── models/                # (gitignored) Model artifacts generated by notebook

API Endpoints

Method	Path	Description
`POST`	`/predict`	Single tip-amount prediction
`POST`	`/predict/batch`	Batch predictions (max 100 trips)
`GET`	`/health`	Status, model loaded flag, uptime
`GET`	`/model/info`	Model name, version, features, training metrics
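
Because /predict/batch rejects more than 100 trips, a client holding a larger list has to split it first. A small sketch of that splitting, assuming nothing about the batch request body beyond the 100-trip limit (chunk_trips is a hypothetical helper and the trip dicts are placeholders):

```python
from typing import Iterator

MAX_BATCH_SIZE = 100  # limit stated for /predict/batch


def chunk_trips(trips: list, size: int = MAX_BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of `trips`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(trips), size):
        yield trips[start:start + size]


# Placeholder trips; real ones would carry the full set of input fields.
trips = [{"trip_distance": float(i)} for i in range(250)]
batches = list(chunk_trips(trips))
batch_sizes = [len(batch) for batch in batches]
```

Each resulting batch can then be sent as its own request, keeping every call within the limit.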

Input validation is handled by Pydantic (TripInput with 13 fields, 7 with constraints). Invalid requests return HTTP 422 with descriptive error messages. Unexpected server errors return a structured HTTP 500 JSON response (no stack traces exposed).
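
On the client side, a 422 response can be turned into readable messages. The sketch below relies on FastAPI's standard validation error body, a "detail" list whose items carry "loc" and "msg". The shape of this project's 500 response is not documented here, so it is handled generically, and summarize_error is a hypothetical helper.

```python
import json


def summarize_error(status_code: int, body: str) -> list[str]:
    """Turn an error response body into a list of human-readable lines."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return [f"HTTP {status_code}: non-JSON response"]

    detail = payload.get("detail") if isinstance(payload, dict) else None
    if status_code == 422 and isinstance(detail, list):
        messages = []
        for item in detail:
            # "loc" is a path such as ["body", "trip_distance"].
            location = ".".join(str(part) for part in item.get("loc", []))
            messages.append(f"{location}: {item.get('msg', 'invalid value')}")
        return messages
    # Assumption: other errors carry a "detail" value or an arbitrary JSON body.
    return [f"HTTP {status_code}: {detail if detail is not None else payload}"]


sample_422 = json.dumps({
    "detail": [
        {"loc": ["body", "trip_distance"], "msg": "Field required", "type": "missing"}
    ]
})
lines = summarize_error(422, sample_422)
```

The exact wording of "msg" depends on the installed Pydantic version, so the sample body above is illustrative rather than captured from this API.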

Environment Variables

Variable	Default	Description
`MODEL_PATH`	`models/rf_regressor.joblib`	Path to the saved model file
`MODEL_VERSION`	`1`	Model version string returned in responses
`MLFLOW_TRACKING_URI`	—	Set in docker-compose for the MLflow service
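
As a rough illustration of how these variables might be read at startup, the sketch below uses os.environ with the defaults from the table. It is not the code in app.py: load_settings and the dictionary keys are hypothetical, and MLFLOW_TRACKING_URI is simply left unset (None) when absent.

```python
import os
from typing import Mapping

DEFAULT_MODEL_PATH = "models/rf_regressor.joblib"
DEFAULT_MODEL_VERSION = "1"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read configuration from `env`, falling back to the documented defaults."""
    return {
        "model_path": env.get("MODEL_PATH", DEFAULT_MODEL_PATH),
        "model_version": env.get("MODEL_VERSION", DEFAULT_MODEL_VERSION),
        # No default assumed here; None means "not configured".
        "mlflow_tracking_uri": env.get("MLFLOW_TRACKING_URI"),
    }


defaults = load_settings({})
overridden = load_settings({"MODEL_PATH": "/app/models/custom.joblib", "MODEL_VERSION": "2"})
```

Accepting the environment as a parameter makes it easy to check the fallback behaviour without touching the real process environment.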

Project Structure

├── assignment4.ipynb      # Main notebook – MLflow, API demo, Docker demo
├── app.py                 # FastAPI application (lifespan model loading)
├── test_app.py            # pytest test suite (10 tests)
├── Dockerfile             # python:3.11-slim, layer-cached pip install
├── docker-compose.yml     # api + mlflow services
├── requirements.txt       # Pinned Python dependencies
├── README.md              # This file
├── .gitignore
├── .dockerignore
└── models/                # (gitignored) exported model artifacts

Environment Variables

Variable	Default	Description
`MODEL_PATH`	`models/rf_regressor.joblib`	Path to the saved model file
`MODEL_VERSION`	`1`	Model version string returned in responses
`MLFLOW_TRACKING_URI`	`http://localhost:5000`	MLflow server URL (overridden in Docker Compose)

AI Tools Used

Tool	Purpose
GitHub Copilot	Notebook documentation assistance; debugging and fixing gaps or errors in code; checking the work against the assignment specification for accuracy and proper implementation.
ChatGPT 5.4	Recommendations on how to structure the assignment (which parts to start with and in what order); documentation assistance for the README.
