smakde/NLP-Ticket-Triage

NLP Ticket Triage & Sentiment

CI · Python 3.12 · License: MIT

Auto-label help-desk tickets with a topic and a sentiment using a multi-task transformer head, served via FastAPI and evaluated in a Streamlit dashboard, with PII redaction built in.


Results (mock inference on seed data)

| Metric | Value |
| --- | --- |
| Topic accuracy | ~83% (keyword mock) |
| Sentiment accuracy | ~75% (keyword mock) |
| Latency | <5ms (mock) / ~14ms (DistilBERT) |
| PII redacted | email, phone, ticket IDs |

To replace mock inference with the fine-tuned DistilBERT head, run python -m src.models.train (see Training); the API loads the saved model on restart.


API Screenshot

[Screenshot: FastAPI Docs]

Eval Dashboard Screenshot

[Screenshot: Eval Dashboard]


Quickstart

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Run API
uvicorn app.main:app --host 0.0.0.0 --port 8000
# → http://localhost:8000/docs

# Run eval dashboard
streamlit run src/eval/dashboard.py

# Run tests
pytest tests/ -v

Docker

docker build -t triage:latest .
docker run -p 8000:8000 triage:latest

API

POST /predict

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Refund failed twice; card charged."}'

Response

{
  "topic":     {"label": "billing", "score": 0.82},
  "sentiment": {"label": "neg",     "score": 0.90},
  "probs": {
    "topic":     {"billing": 0.82, "bug": 0.04, ...},
    "sentiment": {"neg": 0.90, "neu": 0.05, "pos": 0.05}
  },
  "latency_ms": 3,
  "model_version": "mock-keyword-0.1.0"
}
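
The same call can be made from Python. The block below is a minimal sketch using only the standard library; it assumes the API is running locally on port 8000 as in the Quickstart. `build_request`, `predict`, and `summarize` are illustrative names, not part of this repository.

```python
import json
import urllib.request

# URL from the Quickstart; adjust it if the API runs elsewhere.
API_URL = "http://localhost:8000/predict"


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the POST request for /predict with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the ticket text to the API and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(response: dict) -> str:
    """Collapse a /predict response into a one-line summary (hypothetical helper)."""
    topic = response["topic"]
    sentiment = response["sentiment"]
    return (
        f"{topic['label']} ({topic['score']:.2f}), "
        f"{sentiment['label']} ({sentiment['score']:.2f})"
    )
```

With the server up, `summarize(predict("Refund failed twice; card charged."))` produces a one-line summary of the top topic and sentiment labels with their scores.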

GET /health

{"status": "ok", "model_version": "mock-keyword-0.1.0"}
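
When the API is started in Docker or in CI, it can help to wait for /health before sending traffic. The following is a rough sketch of such a readiness poll, assuming the default local URL; `fetch_health` and `wait_until_healthy` are hypothetical helpers, and the retry count and delay are arbitrary choices.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # assumed local default


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(fetch=fetch_health, attempts: int = 10, delay: float = 0.5):
    """Poll until status is "ok"; return model_version, or None if never ready."""
    for _ in range(attempts):
        try:
            payload = fetch()
            if payload.get("status") == "ok":
                return payload.get("model_version")
        except (urllib.error.URLError, OSError, ValueError):
            pass  # server not reachable yet, or the body was not valid JSON
        time.sleep(delay)
    return None
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running server.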

Topics

login · billing · bug · feature · shipping · other

Sentiment

neg · neu · pos
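
To illustrate how a keyword mock can map text onto these label sets, here is a small sketch. The keyword lists, the function names, and the hit-count normalization are all assumptions made for illustration; the actual rules in src/infer/service.py may differ.

```python
# Hypothetical keyword lists; the project's real mock rules may differ.
TOPIC_KEYWORDS = {
    "login": ("password", "login", "sign in"),
    "billing": ("refund", "charged", "invoice", "card"),
    "bug": ("error", "crash", "broken"),
    "feature": ("feature request", "please add"),
    "shipping": ("delivery", "shipping", "tracking"),
}
SENTIMENT_KEYWORDS = {
    "neg": ("failed", "broken", "unacceptable"),
    "pos": ("thanks", "great", "love"),
}


def keyword_probs(text: str, keywords: dict, fallback: str) -> dict:
    """Count substring hits per label and normalize them to sum to 1."""
    lowered = text.lower()
    hits = {label: sum(kw in lowered for kw in kws) for label, kws in keywords.items()}
    hits[fallback] = 0
    total = sum(hits.values())
    if total == 0:
        # No keyword matched: put all mass on the fallback label.
        return {label: (1.0 if label == fallback else 0.0) for label in hits}
    return {label: count / total for label, count in hits.items()}


def top_label(probs: dict) -> dict:
    """Pick the highest-scoring label in the response's label/score shape."""
    label = max(probs, key=probs.get)
    return {"label": label, "score": probs[label]}


def mock_predict(text: str) -> dict:
    """Return topic and sentiment predictions shaped like the /predict response."""
    topic = keyword_probs(text, TOPIC_KEYWORDS, fallback="other")
    sentiment = keyword_probs(text, SENTIMENT_KEYWORDS, fallback="neu")
    return {
        "topic": top_label(topic),
        "sentiment": top_label(sentiment),
        "probs": {"topic": topic, "sentiment": sentiment},
    }
```

Plain substring matching is crude ("love" also matches "glove"), and the scores from this sketch will not match the example response above; it only shows the shape of the approach.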


Project Structure

app/
└── main.py             — FastAPI app factory
src/
├── infer/
│   ├── preprocess.py   — PII redaction (email, phone, ticket IDs)
│   └── service.py      — inference router (mock + real model path)
├── models/
│   ├── multitask_head.py — DistilBERT multi-task PyTorch module
│   └── train.py          — fine-tuning script (DistilBERT on CSV)
└── eval/
    └── dashboard.py    — Streamlit eval dashboard
tests/
├── test_service.py     — 24 API tests
└── test_preprocess.py  — 7 PII redaction tests
data/
└── seed/seed.csv       — 36 labelled seed tickets for training + eval

Training

# Train on seed data (36 examples — add more for real accuracy)
python -m src.models.train \
  --train_csv data/seed/seed.csv \
  --epochs 3 \
  --batch_size 8 \
  --lr 2e-5
# Saves to artifacts/model/ — API picks it up automatically on restart
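
A multi-task head is commonly trained by combining one loss per task. The sketch below shows that idea in plain Python for a single example, assuming the objective is a weighted sum of the topic and sentiment cross-entropy losses. This README does not state the actual objective used by train.py, and `sentiment_weight` is a hypothetical parameter.

```python
import math


def cross_entropy(logits: list, target: int) -> float:
    """Cross-entropy for one example, using log-sum-exp for numerical stability."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def multitask_loss(
    topic_logits: list,
    topic_target: int,
    sentiment_logits: list,
    sentiment_target: int,
    sentiment_weight: float = 1.0,  # assumed knob, not taken from train.py
) -> float:
    """Weighted sum of the per-task losses computed from the two heads' logits."""
    topic_loss = cross_entropy(topic_logits, topic_target)
    sentiment_loss = cross_entropy(sentiment_logits, sentiment_target)
    return topic_loss + sentiment_weight * sentiment_loss
```

With six topic classes and three sentiment classes, uniform logits give a loss of log 6 + log 3, which is a useful sanity check for an untrained model.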

PII Redaction

preprocess.py replaces emails, phone numbers, and ticket IDs with placeholder tokens before any logging or inference, and returns the redacted text together with the number of replacements:

from src.infer.preprocess import redact
text, n = redact("Contact user@example.com or call 555-123-4567 about TKT-001.")
# → "Contact <email> or call <phone> about <ticket>.", 3
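
For readers curious how this kind of redaction can work, here is a minimal regex-based sketch. The patterns and the name `redact_sketch` are assumptions for illustration; the real patterns live in src/infer/preprocess.py and may cover more formats.

```python
import re

# Hypothetical patterns; the project's actual expressions may differ.
PATTERNS = [
    ("<email>", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("<phone>", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("<ticket>", re.compile(r"\bTKT-\d+\b")),
]


def redact_sketch(text: str) -> tuple:
    """Replace each PII match with its token; return (redacted_text, match_count)."""
    total = 0
    for token, pattern in PATTERNS:
        text, count = pattern.subn(token, text)
        total += count
    return text, total
```

The phone pattern here only covers ten-digit numbers written as three groups, so international or parenthesized formats would need extra patterns.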
