Business Case: Retail Customer Feedback Analysis for Product Insights
This project implements a text classification system to analyze customer product reviews for sentiment analysis. The system classifies reviews into three categories (positive, negative, neutral) to help identify problematic orders and improve customer satisfaction.
We follow a simple rating-based labeling scheme:
- if the rating is $< 2.5$, the review is labeled as negative
- if the rating is $\in [2.5, 3.5]$, it is labeled as neutral
- otherwise, the review is labeled as positive
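The thresholds above can be sketched as a small helper (hypothetical name `rating_to_label`, not necessarily how the repository implements it):

```python
def rating_to_label(rating: float) -> str:
    """Map a star rating to a sentiment label using the thresholds above."""
    if rating < 2.5:
        return "negative"
    if rating <= 3.5:
        return "neutral"
    return "positive"
```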
We select Amazon Review (2018) as our source dataset. Due to limited computational resources, we deliberately use a small dataset, though a dataset that is too small would also hurt model performance. In addition, the dataset contains more positive samples than neutral or negative ones (see below), so we manually adjusted the ratio so that each label has a roughly balanced number of samples.
Thus, we selected a balanced subset of Amazon reviews, constructed with the mixing utilities in `utils/mixin.py`.
Train the RoBERTa-based text classifier:
```shell
uv run finetune/bert_ft.py
```

Training Parameters:

- Model: `FacebookAI/roberta-base`
- Epochs: $5$
- Batch size: $8$ per device
- Learning rate: $2\times 10^{-5}$
- Weight decay: $0.01$
- Dataset split: $80\%$ train, $10\%$ test, $10\%$ validation
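With the HuggingFace `transformers` library, these hyperparameters map onto a `TrainingArguments` object roughly as below. This is a configuration sketch, not the actual `finetune/bert_ft.py`:

```python
from transformers import TrainingArguments

# Hyperparameters as listed above; `output_dir` matches the checkpoint
# location mentioned in this README.
training_args = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
)
```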
Checkpoints will be saved to results/checkpoint-*/
Evaluate a trained model with confusion matrix and metrics:
```shell
uv run finetune/val.py --dataset .data/All_Beauty_5.json.gz --output-dir assets/val
```

Outputs:
- Confusion matrix heatmap (PNG)
- Classification report with precision/recall/F1 (TXT)
- Accuracy, loss, and other metrics
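To illustrate what the evaluation computes, here is a minimal pure-Python sketch of a confusion matrix and per-class precision/recall (the actual `finetune/val.py` presumably uses a library such as scikit-learn):

```python
LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    idx = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def precision_recall(m, i):
    """Precision and recall for class index i of matrix m."""
    tp = m[i][i]
    predicted = sum(row[i] for row in m)   # column sum
    actual = sum(m[i])                     # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```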
Generate explainability visualizations using LIME and SHAP:
```shell
# Multiple texts
uv run finetune/explain.py --text "Great product!" "Terrible quality" "Just okay"

# Single text
uv run finetune/explain.py --text "This product broke after one day" --method lime
```

Outputs saved to assets/explain/:
- LIME feature importance plots
- SHAP visualization
- Text explanations showing which words influenced predictions
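The intuition behind word-level explanations can be sketched with a simple occlusion-style score: measure how much the model's score drops when each word is removed. This is a toy approximation for illustration, not the LIME/SHAP implementation in `finetune/explain.py`:

```python
def word_importance(text, score_fn):
    """Occlusion-style importance: score drop when each word is removed.

    `score_fn` maps a text to a scalar score (e.g. probability of a class).
    """
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return importances
```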
Use the classifier interactively:
```shell
uv run server/processor/text.py
```

Or use it programmatically:

```python
from server.processor.text import ReviewClassifier

classifier = ReviewClassifier()
results = classifier.predict([
    "This product is excellent!",
    "Terrible quality, very disappointed",
])
for result in results:
    print(f"Label: {result['label']}, Confidence: {result['conf']:.2f}")
```
```text
├── finetune/
│   ├── bert_ft.py       # Main training script
│   ├── val.py           # Evaluation with confusion matrix
│   └── explain.py       # Explainability (LIME/SHAP)
├── server/
│   └── processor/
│       ├── text.py      # Inference wrapper
│       └── vlm.py       # Vision-language model (optional)
├── utils/
│   ├── constant.py      # Labels, paths, configs
│   ├── mixin.py         # Dataset mixing utilities
│   ├── parser.py        # JSON/GZ data parser
│   └── structure.py     # Data structures
├── results/             # Model checkpoints
├── assets/              # Visualizations & outputs
│   ├── val/             # Evaluation plots
│   └── explain/         # Explainability outputs
├── demo.py              # Quick demo script
└── README.md
```
The dataset is split into three parts: 80% for train, 10% for test, and 10% for validation.
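An 80/10/10 split can be produced with a seeded shuffle along these lines; `split_dataset` is a hypothetical helper for illustration, not the repository's actual code:

```python
import random

def split_dataset(items, seed=42):
    """Shuffle deterministically, then slice into 80% / 10% / 10%."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_test = int(0.1 * n)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val
```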
For text classification, we use FacebookAI/roberta-base as the backbone and fine-tune it on the above dataset with the HuggingFace transformers and accelerate libraries. We trained for 10 epochs with a batch size of 8, an initial learning rate of 2×10⁻⁵, and a weight decay of 0.01. After training, the model reaches an accuracy of 77%.
The model achieves good performance across categories:

| Category | Accuracy | Log |
|---|---|---|
| All Beauty | *(figure)* | |
| Amazon Fashion | *(figure)* | |
| Appliance | *(figure)* | |
For the Luxury Beauty category: *(figures omitted)*