A classification pipeline designed to automate the routing of customer support tickets. This project compares traditional Machine Learning baselines (SVM, SGD) against a Deep Learning approach (DistilBERT) to categorize tickets into 10 distinct support queues.
https://huggingface.co/datasets/Tobi-Bueck/customer-support-tickets

- Multi-Model Architecture:
  - Baseline: TF-IDF Vectorization with Linear SVM and SGD Classifiers.
  - Deep Learning: Fine-tuned DistilBERT (Hugging Face Transformers) using Class Weighting to handle imbalanced data.
- Production-Ready API: A FastAPI service designed to serve predictions in real time.
- Dockerized Deployment: Fully containerized application using uv for dependency management and Python 3.12.
- GPU Acceleration: Optimized for training on NVIDIA GPUs, specifically configured for RTX 50-series (Blackwell) hardware via PyTorch Nightly.
Demo video: demo_triage.mp4
Evaluated on a dataset of ~28k English support tickets. The Weighted DistilBERT model achieved the highest performance by successfully identifying rare categories that simpler models often ignore.
| Model | Accuracy | Macro F1-Score | Key Strength |
|---|---|---|---|
| Linear SVM | 52% | 0.53 | Fast training baseline. |
| SGD Classifier | 67% | 0.68 | Strong performance with N-grams. |
| DistilBERT (Weighted) | 70% | 0.74 | Superior on rare classes (e.g., General Inquiry F1: 0.79). |
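
The gap between accuracy and macro F1 in the table is easier to read with a small worked example. The sketch below is not taken from the project's notebooks; it is a plain-Python illustration of how macro F1 averages per-class F1 scores with equal weight, so a model that ignores a rare class is penalized even when its accuracy looks high. The toy labels and counts are made up for illustration ("Billing" is a hypothetical queue name).

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gets a false positive
            fn[truth] += 1  # true label gets a false negative
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): 9 common tickets, 1 rare ticket.
y_true = ["Billing"] * 9 + ["General Inquiry"]
# A model that never predicts the rare class.
y_pred = ["Billing"] * 10

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
```

On this toy data the accuracy is 0.9 while the macro F1 is below 0.5, because the F1 of the ignored class is 0 and counts as much as the common class.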
Note: Class weighting was critical for the transformer model. It raised the General Inquiry F1-score from 0.05 to 0.79.
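
To illustrate the idea behind class weighting, here is a minimal plain-Python sketch. It assumes the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)); the weighting scheme actually used in `notebooks/transformer.ipynb` is not stated here and may differ. The function names, the toy label counts, and the "Technical Support" queue name are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: n_samples / (n_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def weighted_cross_entropy(probs, targets, weights):
    """Weighted mean negative log-likelihood, normalized by the sum of target weights."""
    total = sum(weights[t] * -math.log(p[t]) for p, t in zip(probs, targets))
    norm = sum(weights[t] for t in targets)
    return total / norm


# Toy, made-up label distribution: 8 common tickets and 2 rare ones.
targets = ["Technical Support"] * 8 + ["General Inquiry"] * 2
weights = balanced_class_weights(targets)

# A model that is always confident in the common class.
probs = [{"Technical Support": 0.9, "General Inquiry": 0.1}] * len(targets)

uniform = {label: 1.0 for label in weights}
unweighted_loss = weighted_cross_entropy(probs, targets, uniform)
weighted_loss = weighted_cross_entropy(probs, targets, weights)
```

With these weights, mistakes on the rare class contribute more to the loss, so the weighted loss for this always-common-class model is higher than the unweighted one. In PyTorch, weights like these would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`; that this project does so is an assumption.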
- Language: Python 3.12
- Deep Learning: PyTorch (Nightly), Transformers (Hugging Face)
- Machine Learning: Scikit-learn, Pandas, NumPy
- API: FastAPI, Uvicorn
- DevOps: Docker, uv (Package Manager)
This project uses uv for fast, reproducible dependency management.

    git clone https://github.com/James-Crockett/Support_Ticket_Auto_Triage.git
    cd Support_Ticket_Auto_Triage

    # Install uv if you haven't already
    pip install uv

    # Sync dependencies (creates .venv automatically)
    uv sync

The project is configured to use PyTorch Nightly (CUDA 12.8) for compatibility with RTX 50-series (sm_120) hardware. This may become unnecessary in the future.
The training logic is split across two Jupyter notebooks:
1. notebooks/data_exp_linear.ipynb: EDA and linear baseline models.
2. notebooks/transformer.ipynb: DistilBERT fine-tuning.
Build and run the inference API:

    docker build -t ticket-triage-api .
    docker run -p 8000:8000 ticket-triage-api

Send a test request:

    curl -X 'POST' \
      'http://localhost:8000/predict' \
      -d 'subject=Login Issue&body=I cannot access my account.'

Project structure:

    ├── models/              # Saved models (Git-ignored)
    │   ├── sgd/             # Serialized SGD model & vectorizer
    │   └── transformer/     # Fine-tuned DistilBERT model
    ├── notebooks/           # Jupyter Notebooks for training
    ├── main.py              # FastAPI application entry point
    ├── Dockerfile           # Production Docker config
    ├── pyproject.toml       # Dependency configuration
    └── README.md            # Project documentation
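
The `/predict` request shown with curl above can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint accepts form-encoded `subject` and `body` fields, as the curl example suggests; the helper names are hypothetical, and because the response format is not documented here, the raw response text is returned as-is.

```python
import urllib.parse
import urllib.request

# Same endpoint as the curl example; assumes the container is running locally.
API_URL = "http://localhost:8000/predict"


def build_predict_request(subject, body, url=API_URL):
    """Build a form-encoded POST request (hypothetical helper, mirrors `curl -d`)."""
    payload = urllib.parse.urlencode({"subject": subject, "body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def predict(subject, body, url=API_URL, timeout=10):
    """Send the ticket to the API and return the raw response text."""
    request = build_predict_request(subject, body, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires the API to be running):
# print(predict("Login Issue", "I cannot access my account."))
```

Keeping request construction separate from the network call makes the encoding easy to inspect without a running server.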