
elmarto87/support-ticket-analyzer

Support Ticket Analyzer

A Jupyter notebook pipeline that takes a CSV of support tickets, classifies each one by category and severity using Claude, and produces a prioritized PM report — with a Fix Now / Next Sprint / Backlog breakdown and leadership-ready insights.


The Problem

PMs inheriting a support queue face the same problem every sprint: hundreds of tickets, no consistent categorization, and stakeholders asking "what are the top issues?" Answering means reading through the noise to spot patterns, a process that doesn't scale and biases the answer toward whichever tickets were filed most recently or escalated most loudly.

This tool automates the triage: classify every ticket by category and AI-assessed severity, surface recurring themes, and output a structured backlog recommendation in under 5 minutes.


The Solution

CSV of support tickets → Classify (category + severity + topic) → Theme synthesis → PM report

Input: Any CSV with subject and body columns. Optional: id, date, priority.

Output:

  • Per-ticket classification: category, AI-assessed severity, topic label, one-liner summary
  • Top 5 recurring themes with severity skew
  • Prioritized backlog: Fix Now / Next Sprint / Backlog
  • PM insights for leadership
  • Charts: category breakdown, severity distribution, ticket volume over time
  • Exported CSV with all classifications for further analysis
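Per-ticket labels and the themed summary are two views of the same data. The sketch below is a deterministic stand-in, not the notebook's method (the README has Claude synthesize the themes): it assumes each classified ticket carries a `topic` label and a `low`/`medium`/`high` severity, ranks topics by ticket count, and reports each topic's dominant severity, which is one way to compute a skew like the "mostly high severity" labels in the example output further down. `TicketClassification` and `top_themes_with_skew` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TicketClassification:
    """Hypothetical per-ticket record mirroring the README's output fields."""
    ticket_id: str
    category: str
    severity: str  # assumed scale: "low", "medium", "high"
    topic: str
    one_liner: str


def top_themes_with_skew(rows, n=5):
    """Rank topics by ticket count and report each topic's dominant severity.

    Returns a list of (topic, ticket_count, dominant_severity, share) tuples,
    where share is the fraction of the topic's tickets at that severity.
    """
    by_topic = {}
    for row in rows:
        # One Counter of severities per topic label.
        by_topic.setdefault(row.topic, Counter())[row.severity] += 1

    themes = []
    for topic, severities in by_topic.items():
        total = sum(severities.values())
        dominant, count = severities.most_common(1)[0]
        themes.append((topic, total, dominant, count / total))

    # Largest topics first; Python's sort is stable, so ties keep first-seen order.
    themes.sort(key=lambda theme: theme[1], reverse=True)
    return themes[:n]
```

With three `SSO login` tickets (two high, one medium) and one low-severity `Report speed` ticket, the function returns `SSO login` first, with `high` as its dominant severity at a 2/3 share.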

Quickstart

1. Clone the repo

git clone https://github.com/elmarto87/support-ticket-analyzer.git
cd support-ticket-analyzer

2. Install dependencies

pip install -r requirements.txt

3. Set up your API key

cp .env.example .env
# Edit .env and add your Anthropic API key

4. Run with sample data

jupyter notebook main.ipynb

sample_data/sample_tickets.csv is included so you can run the full pipeline immediately, without supplying any data of your own.
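If the notebook fails on its first API call, it is worth confirming that the key from step 3 is actually visible to Python. A minimal stdlib-only check follows, assuming the variable is named `ANTHROPIC_API_KEY` (the name the Anthropic Python SDK reads by default; confirm against `.env.example`). `read_env_file` and `api_key_configured` are hypothetical helpers, not part of the repo, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path

# Assumption: the key is stored under ANTHROPIC_API_KEY, the Anthropic SDK's
# default variable name. Check .env.example for the name the repo expects.
KEY_NAME = "ANTHROPIC_API_KEY"


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper).

    Blank lines and '#' comments are skipped and surrounding quotes are
    stripped. This is not a full dotenv parser: no interpolation or
    multiline values.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_configured(path=".env"):
    """True if the key is set in the process environment or in the .env file."""
    return bool(os.environ.get(KEY_NAME) or read_env_file(path).get(KEY_NAME))
```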


CSV Format

  • subject (required): Ticket subject line
  • body (required): Full ticket description
  • id (optional): Ticket ID for reference
  • date (optional): Date filed; enables the volume trend chart
  • priority (optional): Agent-assigned priority (not used for AI severity)
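A sketch of loading a file in this format with the standard library: it rejects files missing `subject` or `body`, keeps any optional columns that are present, and buckets the optional `date` column by ISO week as raw input for a volume trend chart. `load_tickets` and `weekly_volume` are hypothetical names, and the ISO `YYYY-MM-DD` date format is an assumption; the notebook itself may parse dates differently (for example with pandas).

```python
import csv
import io
from collections import Counter
from datetime import date

REQUIRED = {"subject", "body"}  # id, date and priority are optional


def load_tickets(text):
    """Read ticket CSV text and check it has the required columns.

    Raises ValueError naming any missing required column. Extra and
    optional columns are passed through unchanged.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = set(reader.fieldnames or [])
    missing = REQUIRED - columns
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    return list(reader)


def weekly_volume(tickets):
    """Count tickets per ISO week, keyed by (year, week).

    Assumes ISO dates (YYYY-MM-DD); rows with a missing or unparseable
    date are skipped rather than failing the whole run.
    """
    counts = Counter()
    for row in tickets:
        raw = (row.get("date") or "").strip()
        try:
            year, week, _ = date.fromisoformat(raw).isocalendar()
        except ValueError:
            continue
        counts[(year, week)] += 1
    return dict(sorted(counts.items()))
```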

Example Output

Output from a run on 50 sample SaaS support tickets.

[Charts: Category Breakdown, Severity Breakdown, Volume Trend]

Top themes Claude identified:

  • Authentication & Account Access Failures — mostly high severity — SSO breakdowns, 2FA SMS failures, and password reset issues are collectively locking users out; treat as systemic, not isolated
  • Data Integrity & Export Reliability — mostly high severity — CSV import corruption, missing custom fields in exports, and CRM sync delays are blocking reporting workflows
  • Security & Compliance Gaps — mostly high severity — Non-functional API key revocation and missing audit logs represent active compliance risk

PM backlog (Claude-generated):

🔴 Fix Now

  • SSO login failing for ~20% of users with no workaround — P0 blocker
  • API key revoke button non-functional — active security vulnerability
  • CSV import silently corrupting blank fields — data integrity risk

🟡 Next Sprint

  • Consolidate password reset + 2FA SMS delivery failures into one infra audit
  • Profile and fix reports page query performance regression
  • Investigate post-upgrade entitlement provisioning pipeline

🟢 Backlog

  • Folder and tag organization for content
  • Dashboard widget layout persistence
  • Search result ordering improvements

Tradeoffs and Decisions

1. Claude classification vs. embedding-based clustering

Embedding clustering groups tickets by semantic similarity but produces unlabeled clusters: you still have to read each cluster to understand what it represents. Claude classification returns human-readable labels (category, topic, one-liner) directly, making the output immediately actionable without manual interpretation. The tradeoff: Claude occasionally disagrees with a human classifier on edge cases, whereas clustering is fully unsupervised and never commits to a label that could be wrong. For a PM use case where interpretability matters more than precision, classification wins.
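The interpretability argument depends on the labels coming back in a fixed, known vocabulary. A minimal sketch of that guardrail, assuming the prompt asks Claude to reply with a JSON array of objects carrying `category`, `severity`, `topic`, and `one_liner`: values outside the allowed sets are normalized rather than trusted. The category list and `parse_classifications` are hypothetical; the notebook's actual schema may differ.

```python
import json

# Hypothetical label vocabularies; the notebook's real categories may differ.
CATEGORIES = {"bug", "feature_request", "account_access", "billing", "other"}
SEVERITIES = {"low", "medium", "high"}


def parse_classifications(reply_text):
    """Parse a model reply into label dicts, normalizing out-of-vocabulary values.

    Unknown categories fall back to "other"; unknown severities become None
    so they can be flagged for manual review instead of silently guessed.
    """
    items = json.loads(reply_text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of classifications")
    parsed = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"expected an object, got {item!r}")
        category = str(item.get("category", "")).strip().lower()
        severity = str(item.get("severity", "")).strip().lower()
        parsed.append({
            "category": category if category in CATEGORIES else "other",
            "severity": severity if severity in SEVERITIES else None,
            "topic": str(item.get("topic", "")).strip(),
            "one_liner": str(item.get("one_liner", "")).strip(),
        })
    return parsed
```

Constraining replies to a closed label set is what keeps the labels comparable across tickets, which is the advantage over unlabeled clusters.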

2. AI-assessed severity vs. agent-assigned priority

Agent-assigned priority fields are unreliable — enterprise customers tend to mark everything high, and support agents triage by loudness rather than impact. The analyzer ignores the input priority field and has Claude re-assess severity based on the ticket text (user impact, number of users affected, availability vs. UX issue). This produces a more consistent signal for prioritization.
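One way to check this claim on your own data is to cross-tabulate the input `priority` column against the AI-assessed severity. The sketch assumes both use the same `low`/`medium`/`high` scale, which real agent priority fields often do not; `priority_vs_severity` is a hypothetical helper, not part of the repo.

```python
from collections import Counter

# Assumed shared scale so the two fields can be compared directly.
LEVELS = ["low", "medium", "high"]


def priority_vs_severity(pairs):
    """Cross-tabulate agent priority against AI-assessed severity.

    pairs is an iterable of (agent_priority, ai_severity) strings. Returns
    the (priority, severity) counts and the share of tickets where the two
    disagree. Pairs with either value off the assumed scale are ignored.
    """
    table = Counter()
    for priority, severity in pairs:
        p, s = priority.strip().lower(), severity.strip().lower()
        if p in LEVELS and s in LEVELS:
            table[(p, s)] += 1
    total = sum(table.values())
    agree = sum(count for (p, s), count in table.items() if p == s)
    disagreement = (total - agree) / total if total else 0.0
    return table, disagreement
```

A large count in the `("high", "low")` cell is the pattern described above: tickets agents marked high that the ticket text suggests are low impact.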

3. Batch size of 20 tickets per API call

Larger batches (50+) reduce API calls but increase the chance of the model losing track of index alignment across a long context. Smaller batches (5–10) are more accurate but multiply cost. A batch size of 20 balances cost, speed, and classification consistency: the model has enough context to normalize topic labels across similar tickets without losing index accuracy.
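The batching and index-alignment concern can be made concrete. This sketch splits tickets into batches of 20, numbers each ticket within its batch, and rejects a reply whose indices do not match one-to-one, so a misaligned response fails loudly instead of shifting labels onto the wrong tickets. The prompt layout, the `index` field in each result, and all function names are assumptions for illustration.

```python
def make_batches(tickets, batch_size=20):
    """Split tickets into consecutive batches of at most batch_size."""
    return [tickets[i:i + batch_size] for i in range(0, len(tickets), batch_size)]


def format_batch(batch):
    """Number each ticket within its batch so replies can be matched back."""
    return "\n\n".join(
        f"[{i}] Subject: {t['subject']}\n{t['body']}" for i, t in enumerate(batch)
    )


def check_alignment(batch, results):
    """Verify the reply has exactly one result per ticket, indexed 0..n-1.

    results is assumed to be a list of dicts with an "index" key echoing the
    number from format_batch. Returns the results ordered by index.
    """
    indices = sorted(r["index"] for r in results)
    if indices != list(range(len(batch))):
        raise ValueError(f"index mismatch: expected 0..{len(batch) - 1}, got {indices}")
    return sorted(results, key=lambda r: r["index"])
```

For 45 tickets, `make_batches` yields batches of 20, 20, and 5; a retry-on-mismatch loop around `check_alignment` is a natural extension.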


What I Learned

  • Agent-assigned priority fields are almost useless for PM prioritization — Claude's re-assessment based on the ticket description is more consistent and impact-aligned than whatever the support team entered
  • Authentication issues cluster together in the data but feel like separate bugs in the queue — seeing them as a theme (rather than individual tickets) is what surfaces the systemic root cause pattern
  • Asking Claude to generate a "one-liner" per ticket, rather than summarizing the full body, produces a better input for the theme synthesis step — shorter, more normalized text makes the cross-ticket pattern recognition more accurate
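Because the one-liners are the input to theme synthesis, the step that assembles them is worth keeping compact. A minimal sketch, assuming each classified ticket is a dict with `category`, `severity`, and `one_liner` keys: it groups one-liners by category, tags each with its severity, and caps each group's length. The output layout and `build_theme_input` are hypothetical; the notebook's actual prompt is not shown in this README.

```python
from collections import defaultdict


def build_theme_input(classified, max_per_category=50):
    """Build a compact, grouped one-liner listing for a theme-synthesis prompt.

    Grouping by category and tagging severity keeps each line short while
    preserving the signal the synthesis step needs. Each header shows the
    full ticket count even when the listed lines are capped.
    """
    grouped = defaultdict(list)
    for row in classified:
        grouped[row["category"]].append(f"- ({row['severity']}) {row['one_liner']}")

    sections = []
    for category in sorted(grouped):
        lines = grouped[category][:max_per_category]
        sections.append(f"{category} (n={len(grouped[category])}):\n" + "\n".join(lines))
    return "\n\n".join(sections)
```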

Requirements

  • Python 3.9+
  • Anthropic API key (create one in the Anthropic Console)
  • See requirements.txt for package versions
