Business Case: Retail Customer Feedback Analysis for Product Insights
This project implements a text classification system to analyze customer product reviews for sentiment analysis. The system classifies reviews into three categories (positive, negative, neutral) to help identify problematic orders and improve customer satisfaction.
We follow a simple rating-based labeling scheme:
- if the rating is $< 2.5$, the review is labeled as negative
- if the rating is $\in [2.5, 3.5]$, it is labeled as neutral
- otherwise, the review is labeled as positive
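The thresholds above can be sketched as a small helper (hypothetical name `rating_to_label`, not necessarily how the repository implements it):

```python
def rating_to_label(rating: float) -> str:
    """Map a star rating to a sentiment label using the thresholds above."""
    if rating < 2.5:
        return "negative"
    if rating <= 3.5:
        return "neutral"
    return "positive"
```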
We select Amazon Review (2018) as our source dataset. Due to limited computational resources, we deliberately use a small dataset, though a dataset that is too small would also hurt model performance. In addition, the dataset contains more positive samples than neutral or negative ones (see below), so we manually adjusted the ratio so that each label has a roughly balanced number of samples.
Thus, we selected a balanced subset of Amazon reviews, constructed with the mixing utilities in `utils/mixin.py`.
Train the RoBERTa-based text classifier:
```shell
uv run finetune/bert_ft.py
```

Training Parameters:

- Model: `FacebookAI/roberta-base`
- Epochs: $5$
- Batch size: $8$ per device
- Learning rate: $2\times 10^{-5}$
- Weight decay: $0.01$
- Dataset split: $80\%$ train, $10\%$ test, $10\%$ validation
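With the HuggingFace `transformers` library, these hyperparameters map onto a `TrainingArguments` object roughly as below. This is a configuration sketch, not the actual `finetune/bert_ft.py`:

```python
from transformers import TrainingArguments

# Hyperparameters as listed above; `output_dir` matches the checkpoint
# location mentioned in this README.
training_args = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    weight_decay=0.01,
)
```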
Checkpoints will be saved to results/checkpoint-*/
Evaluate a trained model with confusion matrix and metrics:
```shell
uv run finetune/val.py --dataset .data/All_Beauty_5.json.gz --output-dir assets/val
```

Outputs:
- Confusion matrix heatmap (PNG)
- Classification report with precision/recall/F1 (TXT)
- Accuracy, loss, and other metrics
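To illustrate what the evaluation computes, here is a minimal pure-Python sketch of a confusion matrix and per-class precision/recall (the actual `finetune/val.py` presumably uses a library such as scikit-learn):

```python
LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels."""
    idx = {label: i for i, label in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m

def precision_recall(m, i):
    """Precision and recall for class index i of matrix m."""
    tp = m[i][i]
    predicted = sum(row[i] for row in m)   # column sum
    actual = sum(m[i])                     # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall
```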
Generate explainability visualizations using LIME and SHAP:
```shell
# Multiple texts
uv run finetune/explain.py --text "Great product!" "Terrible quality" "Just okay"

# Single text
uv run finetune/explain.py --text "This product broke after one day" --method lime
```

Outputs saved to assets/explain/:
- LIME feature importance plots
- SHAP visualization
- Text explanations showing which words influenced predictions
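The intuition behind word-level explanations can be sketched with a simple occlusion-style score: measure how much the model's score drops when each word is removed. This is a toy approximation for illustration, not the LIME/SHAP implementation in `finetune/explain.py`:

```python
def word_importance(text, score_fn):
    """Occlusion-style importance: score drop when each word is removed.

    `score_fn` maps a text to a scalar score (e.g. probability of a class).
    """
    words = text.split()
    base = score_fn(text)
    importances = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        importances.append((word, base - score_fn(reduced)))
    return importances
```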
Use the classifier interactively:
```shell
uv run server/processor/text.py
```

Or use it programmatically:

```python
from server.processor.text import ReviewClassifier

classifier = ReviewClassifier()
results = classifier.predict([
    "This product is excellent!",
    "Terrible quality, very disappointed",
])
for result in results:
    print(f"Label: {result['label']}, Confidence: {result['conf']:.2f}")
```
```text
├── finetune/
│   ├── bert_ft.py       # Main training script
│   ├── val.py           # Evaluation with confusion matrix
│   └── explain.py       # Explainability (LIME/SHAP)
├── server/
│   └── processor/
│       ├── text.py      # Inference wrapper
│       └── vlm.py       # Vision-language model (optional)
├── utils/
│   ├── constant.py      # Labels, paths, configs
│   ├── mixin.py         # Dataset mixing utilities
│   ├── parser.py        # JSON/GZ data parser
│   └── structure.py     # Data structures
├── results/             # Model checkpoints
├── assets/              # Visualizations & outputs
│   ├── val/             # Evaluation plots
│   └── explain/         # Explainability outputs
├── demo.py              # Quick demo script
└── README.md
```
The dataset is split into three parts: 80% for train, 10% for test, and 10% for validation.
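An 80/10/10 split can be produced with a seeded shuffle along these lines; `split_dataset` is a hypothetical helper for illustration, not the repository's actual code:

```python
import random

def split_dataset(items, seed=42):
    """Shuffle deterministically, then slice into 80% / 10% / 10%."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_test = int(0.1 * n)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val
```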
For text classification, we use FacebookAI/roberta-base as the backbone and fine-tune it on the above dataset with the HuggingFace transformers and accelerate libraries. We trained for 10 epochs with a batch size of 8, an initial learning rate of 2×10⁻⁵, and a weight decay of 0.01. After training, the model reaches an accuracy of 77%.
The model achieves good performance across categories:

| Category | Accuracy | Log |
|---|---|---|
| All Beauty | *(figure)* | |
| Amazon Fashion | *(figure)* | |
| Appliance | *(figure)* | |
For the Luxury Beauty category: *(figures omitted)*