ArcaLunar/CustomerReviewSystem

BC3415 AI in Accounting & Finance - Individual Project

Project Overview

Business Case: Retail Customer Feedback Analysis for Product Insights

This project implements a text classification system for sentiment analysis of customer product reviews. The system classifies each review as positive, negative, or neutral, which helps identify problematic orders and improve customer satisfaction.

Project Design

Label Design

We follow a simple rating-based labeling scheme:

  • if the rating is $\lt 2.5$, the review is labeled as negative
  • if the rating is in $[2.5, 3.5]$, the review is labeled as neutral
  • otherwise, the review is labeled as positive
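The rule above can be sketched as a small function. The function name and the label strings are assumptions made for illustration; the project's own implementation may differ.

```python
def rating_to_label(rating: float) -> str:
    """Map a star rating to a sentiment label using the thresholds above."""
    if rating < 2.5:
        return "negative"
    if rating <= 3.5:  # covers the closed interval [2.5, 3.5]
        return "neutral"
    return "positive"


# Both boundary values, 2.5 and 3.5, fall into the neutral class.
labels = [rating_to_label(r) for r in (1.0, 2.5, 3.0, 3.5, 4.0, 5.0)]
print(labels)
```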

Dataset Selection

We use Amazon Review (2018) [1] as our base dataset. Because computation resources are limited, we deliberately work with a small dataset, although one that is too small would hurt the models' performance. In addition, the dataset contains more positive samples than neutral or negative ones (see below), so we manually adjusted the ratio so that each label has $10000$ samples.

Thus, we selected a subset of the Amazon reviews containing $30000$ samples. The details can be checked in utils/mixin.py. Of these, $80\%$ form the training set, $10\%$ the test set, and the remaining $10\%$ the validation set.
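As a rough illustration of the balancing and splitting described above, here is a minimal sketch using only the standard library. It is not the code in utils/mixin.py; the function name, the `(text, label)` tuple format, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def balance_and_split(samples, per_label, seed=0):
    """Downsample each label to `per_label` items, then split 80/10/10.

    `samples` is a list of (text, label) tuples (an assumed format).
    Returns (train, test, val).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    balanced = []
    for label, items in by_label.items():
        if len(items) < per_label:
            raise ValueError(f"not enough samples for label {label!r}")
        balanced.extend(rng.sample(items, per_label))
    rng.shuffle(balanced)

    n = len(balanced)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_test = n // 10
    train = balanced[:n_train]
    test = balanced[n_train:n_train + n_test]
    val = balanced[n_train + n_test:]
    return train, test, val


# Toy data with an imbalanced label distribution.
raw = (
    [(f"pos {i}", "positive") for i in range(50)]
    + [(f"neu {i}", "neutral") for i in range(12)]
    + [(f"neg {i}", "negative") for i in range(15)]
)
train, test, val = balance_and_split(raw, per_label=10)
print(len(train), len(test), len(val))
```

With `per_label=10000` and three labels, the same arithmetic gives 24000 training, 3000 test, and 3000 validation samples.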

Usage

Model Training

Train the RoBERTa-based text classifier:

uv run finetune/bert_ft.py

Training Parameters:

  • Model: FacebookAI/roberta-base
  • Epochs: $5$
  • Batch size: $8$ per device
  • Learning rate: $2\times 10^{-5}$
  • Weight decay: $0.01$
  • Dataset split: $80\%$ train, $10\%$ test, $10\%$ validation

Checkpoints will be saved to results/checkpoint-*/
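To get a feel for the length of a training run, the parameters above translate into optimizer steps as follows. This is a back-of-the-envelope sketch that assumes a single device and no gradient accumulation, neither of which is stated here; with more devices or with accumulation, the step count shrinks accordingly.

```python
import math

DATASET_SIZE = 30_000   # total samples in the balanced subset
TRAIN_PERCENT = 80      # share used for training
BATCH_SIZE = 8          # per device; one device is assumed
EPOCHS = 5

train_samples = DATASET_SIZE * TRAIN_PERCENT // 100
steps_per_epoch = math.ceil(train_samples / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(f"train samples:   {train_samples}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
```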

Model Evaluation

Evaluate a trained model with confusion matrix and metrics:

uv run finetune/val.py --dataset .data/All_Beauty_5.json.gz --output-dir assets/val

Outputs:

  • Confusion matrix heatmap (PNG)
  • Classification report with precision/recall/F1 (TXT)
  • Accuracy and other metrics (losses)
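For readers unfamiliar with these metrics, the sketch below shows how a confusion matrix, accuracy, and per-class precision/recall/F1 are derived from true and predicted labels. It is a plain-Python illustration, not the code in finetune/val.py; the label strings and function names are assumptions.

```python
LABELS = ["negative", "neutral", "positive"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels=LABELS):
    """Precision, recall, and F1 for each label."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


y_true = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "neutral", "positive", "negative"]

cm = confusion_matrix(y_true, y_pred)
report = per_class_metrics(cm)
print(cm)
print(f"accuracy: {accuracy(cm):.2f}")
for label, scores in report.items():
    print(label, {k: round(v, 2) for k, v in scores.items()})
```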

Text Explainability

Generate explainability visualizations using LIME and SHAP:

# Multiple texts
uv run finetune/explain.py --text "Great product!" "Terrible quality" "Just okay"

# Single text, LIME only
uv run finetune/explain.py --text "This product broke after one day" --method lime

Outputs saved to assets/explain/:

  • LIME feature importance plots
  • SHAP visualization
  • Text explanations showing which words influenced predictions
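The idea of "which words influenced the prediction" can be illustrated with a much simpler technique than LIME or SHAP: leave-one-out occlusion, where each word is removed in turn and the change in the model's score is recorded. The sketch below is not what finetune/explain.py does, and `toy_negative_score` is a hypothetical keyword-based stand-in for the fine-tuned classifier.

```python
def toy_negative_score(text: str) -> float:
    """Hypothetical stand-in for the model: a 'negative' score in [0, 1]."""
    negative_words = {"broke", "terrible", "disappointed"}
    hits = sum(w.strip(".,!?").lower() in negative_words for w in text.split())
    return min(1.0, 0.5 * hits)


def occlusion_importance(text, score_fn):
    """Score drop when each word is removed (leave-one-out occlusion)."""
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return importances


importance = occlusion_importance(
    "This product broke after one day", toy_negative_score
)
for word, delta in importance:
    print(f"{word:>8}: {delta:+.2f}")
```

LIME and SHAP refine this idea by perturbing many words at once and fitting a local surrogate model or estimating Shapley values, which accounts for interactions between words that single-word removal misses.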

Interactive Inference

Use the classifier interactively:

uv run server/processor/text.py

Or use it programmatically:

from server.processor.text import ReviewClassifier

classifier = ReviewClassifier()
results = classifier.predict([
    "This product is excellent!",
    "Terrible quality, very disappointed"
])

for result in results:
    print(f"Label: {result['label']}, Confidence: {result['conf']:.2f}")
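Building on the result format shown above (dicts with `label` and `conf` keys), a follow-up step for the business case could flag confidently negative reviews for manual review. This is a sketch only: the helper name, the `"negative"` label string, and the 0.8 threshold are assumptions, and the results here are hand-written rather than produced by the model.

```python
def flag_negative(reviews, results, min_conf=0.8):
    """Return reviews predicted negative with confidence >= min_conf."""
    return [
        review
        for review, result in zip(reviews, results)
        if result["label"] == "negative" and result["conf"] >= min_conf
    ]


reviews = [
    "This product is excellent!",
    "Terrible quality, very disappointed",
    "Not great, but it works",
]
# Hand-written stand-ins for classifier.predict(reviews) output.
results = [
    {"label": "positive", "conf": 0.97},
    {"label": "negative", "conf": 0.93},
    {"label": "negative", "conf": 0.55},
]

flagged = flag_negative(reviews, results)
print(flagged)
```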

Project Structure

.
├── finetune/
│   ├── bert_ft.py       # Main training script
│   ├── val.py           # Evaluation with confusion matrix
│   └── explain.py       # Explainability (LIME/SHAP)
├── server/
│   └── processor/
│       ├── text.py      # Inference wrapper
│       └── vlm.py       # Vision-language model (optional)
├── utils/
│   ├── constant.py      # Labels, paths, configs
│   ├── mixin.py         # Dataset mixing utilities
│   ├── parser.py        # JSON/GZ data parser
│   └── structure.py     # Data structures
├── results/             # Model checkpoints
├── assets/              # Visualizations & outputs
│   ├── val/            # Evaluation plots
│   └── explain/        # Explainability outputs
├── demo.py             # Quick demo script
└── README.md

Training Results

The dataset is split into three parts: 80% for training, 10% for testing, and 10% for validation.

For text classification, we use FacebookAI/roberta-base as the backbone and fine-tune it on the above dataset with the Hugging Face transformers and accelerate libraries. We trained for 10 epochs with a batch size of 8, an initial learning rate of 2×10⁻⁵, and a weight decay of 0.01. After training, the model reaches an accuracy of 77%.

Evaluation Result

The model's accuracy on individual product categories:

| Category | Accuracy |
| --- | --- |
| All Beauty | $90.06\%$ |
| Amazon Fashion | $86.71\%$ |
| Appliance | $80.11\%$ |

Confusion Matrix

[Figure: confusion matrix for the Luxury Beauty category]
