- Sanjit Srinivasan (s256657)
- João Prazeres (s243036)
- Kornel Gładkowski (s242908)
- Mounika Maidamshetti (s250148)
This repository contains the project work carried out by group 41 in the MLOps course taught at DTU (course website).
The modeling task is a supervised text classification problem: we apply natural language processing (NLP) techniques to automatically predict the priority level of customer support tickets and, optionally, the subject or department to which each ticket should be routed. The main goal of the project, however, is to develop and deploy this solution in a streamlined, reproducible, and efficient manner that reflects real-world machine learning workflows.
We use PyTorch and Hugging Face Transformers for modeling, Weights & Biases for experiment tracking and the model registry, DVC for data versioning, and FastAPI for serving. These tools are core components of the system and are integrated throughout the project lifecycle.
We use the Kaggle dataset Customer IT Support - Ticket Dataset. It contains 28,600 observations, each with a topic assigned by the customer, a description of the customer's issue or inquiry, and a priority level (low, medium, or critical). Each observation also records the department to which the ticket is assigned.
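As an illustration of how the fields described above could be turned into training examples, here is a minimal sketch. The field names (`subject`, `body`, `priority`) and the helper `to_example` are hypothetical and may not match the dataset's actual column names; the label set simply follows the three priority levels listed above.

```python
from dataclasses import dataclass

# Priority levels as listed in the dataset description above.
PRIORITY_LABELS = ["low", "medium", "critical"]
LABEL2ID = {label: idx for idx, label in enumerate(PRIORITY_LABELS)}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


@dataclass
class Example:
    text: str   # model input: topic and description joined
    label: int  # integer-encoded priority


def to_example(ticket: dict) -> Example:
    """Join topic and description into one string and encode the priority."""
    # Hypothetical field names; adjust them to the real CSV columns.
    text = f"{ticket['subject'].strip()}\n{ticket['body'].strip()}"
    priority = ticket["priority"].strip().lower()
    if priority not in LABEL2ID:
        raise ValueError(f"Unknown priority: {priority!r}")
    return Example(text=text, label=LABEL2ID[priority])


# Made-up ticket, used only to exercise the helper.
ticket = {
    "subject": "VPN not connecting",
    "body": "I cannot reach the VPN since this morning.",
    "priority": "Critical",
}
example = to_example(ticket)
print(example.label, ID2LABEL[example.label])
```

Keeping the label mapping in one place means the same `ID2LABEL` table can be reused when decoding model predictions at serving time.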
We intend to use a pre-trained natural language processing (NLP) model. To train the model and perform hyperparameter sweeping, we will initially use compressed versions of BERT, such as DistilBERT or ALBERT. These compressed versions allow for more efficient training and a greater focus on the MLOps aspect of the project.
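The hyperparameter sweep mentioned above could be described with a configuration dictionary in the format that Weights & Biases sweeps accept. The sketch below is only an assumption about what such a sweep might look like: the metric name `val_accuracy`, the value ranges, and the helper `expand_grid` are illustrative and not taken from the project. The two model identifiers are the standard Hugging Face Hub checkpoints for DistilBERT and ALBERT.

```python
from __future__ import annotations

import itertools

# Hugging Face Hub identifiers for the two compressed BERT variants named above.
MODEL_CANDIDATES = ["distilbert-base-uncased", "albert-base-v2"]

# Sweep description in the W&B sweep-config layout; all values are assumptions.
sweep_config = {
    "method": "grid",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "model_name": {"values": MODEL_CANDIDATES},
        "learning_rate": {"values": [2e-5, 5e-5]},
        "batch_size": {"values": [16, 32]},
    },
}


def expand_grid(config: dict) -> list[dict]:
    """Enumerate every parameter combination a grid sweep over `config` covers."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


runs = expand_grid(sweep_config)
print(f"{len(runs)} runs in the grid")
for run in runs[:2]:
    print(run)
```

Listing the grid locally before launching gives a quick check of how many training runs a sweep will cost; the same dictionary could then be handed to `wandb.sweep` to register the sweep.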
The directory structure of the project looks like this:
.
├── configs # Configuration files
├── .devcontainer # Development container configuration
│   ├── devcontainer.json
│   └── post_create.sh
├── dockerfiles # Dockerfiles
│   ├── api.dockerfile
│   └── train.dockerfile
├── docs # Documentation
│   ├── source
│   │   └── index.md
│   ├── mkdocs.yaml
│   └── README.md
├── .github # GitHub actions and automation
│   ├── agents
│   │   └── dtu_mlops_agent.md
│   ├── prompts
│   │   └── add_test.prompt.md
│   └── workflows
│       ├── linting.yaml
│       ├── pre-commit-update.yaml
│       └── tests.yaml
├── models # Trained models
├── notebooks # Jupyter notebooks
├── reports # Reports
│   └── figures
├── src # Source code
│   └── customer_support
│       ├── api.py
│       ├── data.py
│       ├── evaluate.py
│       ├── __init__.py
│       ├── model.py
│       ├── train.py
│       └── visualize.py
├── tests # Tests
│   ├── __init__.py
│   ├── test_api.py
│   ├── test_data.py
│   └── test_model.py
├── AGENTS.md
├── .gitignore
├── LICENSE
├── .pre-commit-config.yaml
├── pyproject.toml # Python project file
├── .python-version
├── README.md # Project README
├── renovate.json # Renovate configuration
├── tasks.py # Project tasks
└── uv.lock # UV lock file

Created using mlops_template, a cookiecutter template for getting started with Machine Learning Operations (MLOps).
