Auto-label help-desk tickets with topic + sentiment using a multi-task transformer head — served via FastAPI, evaluated in a Streamlit dashboard, with PII redaction built-in.
| Metric | Value |
|---|---|
| Topic accuracy | ~83% (keyword mock) |
| Sentiment accuracy | ~75% (keyword mock) |
| Latency | <5 ms (mock) / ~14 ms (DistilBERT) |
| PII redacted | email, phone, ticket IDs |

To replace mock inference with the fine-tuned DistilBERT head, run `python -m src.models.train`.
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Run API
uvicorn app.main:app --host 0.0.0.0 --port 8000
# → http://localhost:8000/docs
# Run eval dashboard
streamlit run src/eval/dashboard.py
# Run tests
pytest tests/ -v

docker build -t triage:latest .
docker run -p 8000:8000 triage:latest

POST /predict
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "Refund failed twice; card charged."}'

Response
{
"topic": {"label": "billing", "score": 0.82},
"sentiment": {"label": "neg", "score": 0.90},
"probs": {
"topic": {"billing": 0.82, "bug": 0.04, ...},
"sentiment": {"neg": 0.90, "neu": 0.05, "pos": 0.05}
},
"latency_ms": 3,
"model_version": "mock-keyword-0.1.0"
}

GET /health
{"status": "ok", "model_version": "mock-keyword-0.1.0"}

Topic labels: login · billing · bug · feature · shipping · other

Sentiment labels: neg · neu · pos
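
The `mock-keyword` model version and the label sets above suggest a keyword-lookup baseline. The following is a minimal sketch of how such a mock might produce a response of the documented shape; it is not the code in `service.py`. The keyword lists, the function names (`keyword_probs`, `top_label`, `mock_predict`), the fallback labels, and the `sketch-keyword` version string are all assumptions made for illustration.

```python
import time

TOPIC_LABELS = ["login", "billing", "bug", "feature", "shipping", "other"]
SENTIMENT_LABELS = ["neg", "neu", "pos"]

# Hypothetical keyword lists; the real mock may use different ones.
TOPIC_KEYWORDS = {
    "login": ["login", "password", "sign in"],
    "billing": ["refund", "charged", "invoice", "card"],
    "bug": ["crash", "error", "bug"],
    "feature": ["feature request", "add support"],
    "shipping": ["shipping", "delivery", "tracking"],
}
SENTIMENT_KEYWORDS = {
    "neg": ["failed", "broken", "angry", "charged"],
    "pos": ["thanks", "great", "love"],
}


def keyword_probs(text, keywords, labels, fallback):
    """Turn substring keyword hits into a distribution over labels."""
    lowered = text.lower()
    hits = {
        label: sum(kw in lowered for kw in keywords.get(label, ()))
        for label in labels
    }
    total = sum(hits.values())
    if total == 0:
        # No keyword matched: put all mass on the fallback label.
        return {label: 1.0 if label == fallback else 0.0 for label in labels}
    return {label: count / total for label, count in hits.items()}


def top_label(probs):
    """Pick the highest-probability label and report its score."""
    label = max(probs, key=probs.get)
    return {"label": label, "score": probs[label]}


def mock_predict(text):
    """Assemble a response with the same keys as the /predict example."""
    start = time.perf_counter()
    topic = keyword_probs(text, TOPIC_KEYWORDS, TOPIC_LABELS, "other")
    sentiment = keyword_probs(text, SENTIMENT_KEYWORDS, SENTIMENT_LABELS, "neu")
    return {
        "topic": top_label(topic),
        "sentiment": top_label(sentiment),
        "probs": {"topic": topic, "sentiment": sentiment},
        "latency_ms": int((time.perf_counter() - start) * 1000),
        "model_version": "sketch-keyword",  # hypothetical version string
    }
```

With these assumed keywords, `mock_predict("Refund failed twice; card charged.")` labels the ticket `billing` / `neg`. Substring matching is crude (for example, `card` also matches `discard`), which is one reason a baseline like this is only a stand-in for the fine-tuned model.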
app/
└── main.py — FastAPI app factory
src/
├── infer/
│ ├── preprocess.py — PII redaction (email, phone, ticket IDs)
│ └── service.py — inference router (mock + real model path)
├── models/
│ ├── multitask_head.py — DistilBERT multi-task PyTorch module
│ └── train.py — fine-tuning script (DistilBERT on CSV)
└── eval/
└── dashboard.py — Streamlit eval dashboard
tests/
├── test_service.py — 24 API tests
└── test_preprocess.py — 7 PII redaction tests
data/
└── seed/seed.csv — 36 labelled seed tickets for training + eval
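
The layout above lists `multitask_head.py` as a DistilBERT multi-task PyTorch module. As a rough sketch of what a head of that kind could look like (not the repository's actual code), the block below puts two linear classifiers on top of one shared pooled representation and sums their cross-entropy losses. The class name `MultiTaskHeadSketch`, the `multitask_loss` helper, the dropout rate, and the unweighted loss sum are assumptions; the encoder is left out, and a random tensor of DistilBERT-base's hidden size (768) stands in for its pooled output.

```python
import torch
from torch import nn

HIDDEN_SIZE = 768    # hidden size of DistilBERT-base
NUM_TOPICS = 6       # login, billing, bug, feature, shipping, other
NUM_SENTIMENTS = 3   # neg, neu, pos


class MultiTaskHeadSketch(nn.Module):
    """Two linear classifiers sharing one pooled encoder representation."""

    def __init__(self, hidden_size=HIDDEN_SIZE, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)  # assumed rate
        self.topic_head = nn.Linear(hidden_size, NUM_TOPICS)
        self.sentiment_head = nn.Linear(hidden_size, NUM_SENTIMENTS)

    def forward(self, pooled):
        # pooled: (batch, hidden_size), e.g. the first-token hidden state
        pooled = self.dropout(pooled)
        return self.topic_head(pooled), self.sentiment_head(pooled)


def multitask_loss(topic_logits, sentiment_logits, topic_y, sentiment_y):
    """Unweighted sum of the two cross-entropy losses (an assumption)."""
    ce = nn.functional.cross_entropy
    return ce(topic_logits, topic_y) + ce(sentiment_logits, sentiment_y)


if __name__ == "__main__":
    head = MultiTaskHeadSketch()
    pooled = torch.randn(4, HIDDEN_SIZE)  # stand-in for encoder output
    topic_logits, sentiment_logits = head(pooled)
    loss = multitask_loss(
        topic_logits,
        sentiment_logits,
        torch.randint(0, NUM_TOPICS, (4,)),
        torch.randint(0, NUM_SENTIMENTS, (4,)),
    )
    print(topic_logits.shape, sentiment_logits.shape, loss.item())
```

Sharing the encoder means one forward pass serves both predictions. If one task dominates training, the two loss terms could be weighted instead of summed equally.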
# Train on seed data (36 examples — add more for real accuracy)
python -m src.models.train \
--train_csv data/seed/seed.csv \
--epochs 3 \
--batch_size 8 \
--lr 2e-5
# Saves to artifacts/model/; the API picks it up automatically on restart

preprocess.py strips emails, phone numbers, and ticket IDs before any logging or inference:
from src.infer.preprocess import redact
text, n = redact("Contact user@example.com or call 555-123-4567 about TKT-001.")
# → "Contact <email> or call <phone> about <ticket>.", 3
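
For readers curious how a function with this `(text, count)` return shape could work, here is a minimal regex-based sketch. It is not the implementation in `preprocess.py`: the name `redact_sketch` and all three patterns are assumptions, and the ticket pattern is inferred only from the `TKT-001` example above.

```python
import re

# Assumed patterns; the real preprocess.py may use different ones.
PATTERNS = [
    ("<email>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<phone>", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("<ticket>", re.compile(r"\bTKT-\d+\b")),
]


def redact_sketch(text):
    """Replace each PII match with its placeholder and count replacements."""
    total = 0
    for placeholder, pattern in PATTERNS:
        # subn returns the new string and the number of substitutions made
        text, n = pattern.subn(placeholder, text)
        total += n
    return text, total


if __name__ == "__main__":
    print(redact_sketch("Contact user@example.com or call 555-123-4567 about TKT-001."))
```

The phone pattern only covers ten-digit numbers written as three groups; international formats and other ticket prefixes would need additional patterns.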
