Sentiment-Analysis-using-NLP-

Multi-Class Sentiment Analysis with BERT and PyTorch

This project is an end-to-end Natural Language Processing (NLP) pipeline for multi-class sentiment analysis. It uses a pre-trained BERT model from Hugging Face, fine-tuned on the Google Play Store User Reviews dataset to classify app reviews into three categories: Positive, Negative, or Neutral.

The entire project is implemented in a Google Colab notebook, demonstrating the full workflow from data loading and cleaning to model training, evaluation, and inference.

Key Features

Multi-Class Classification: Classifies text into Positive, Negative, and Neutral sentiments.
State-of-the-Art Model: Leverages bert-base-uncased, a powerful pre-trained Transformer model, fine-tuned for high accuracy.
End-to-End Pipeline: Includes all steps: data preprocessing, BERT tokenization, model training, performance evaluation, and a prediction function for new text.
High Performance: Achieved ~94% accuracy on the validation set after just 3 epochs of training.
PyTorch-Powered: Built with PyTorch for model training and management, with GPU acceleration.

Technologies Used

Python 3.8+
PyTorch: For building and training the deep learning model.
Hugging Face Transformers: For loading the pre-trained BERT model and its tokenizer.
Scikit-learn: For splitting data and performance evaluation.
Pandas: For data manipulation and cleaning.
Seaborn & Matplotlib: For data visualization and plotting training performance.
Google Colab: As the development and training environment with GPU support.

Project Structure

```text
.
├── bert-sentiment-model/               # Directory for the saved fine-tuned model and tokenizer
├── googleplaystore_user_reviews.csv    # The dataset file (not included in repo)
├── Sentiment_Analysis_with_BERT.ipynb  # The main Colab notebook with all the code
└── README.md                           # This file
```

Setup and Usage

To run this project, you can follow these steps:

Clone the repository (optional):

```bash
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
```

Open in Google Colab:
- Go to Google Colab.
- Click on File > Upload notebook and upload the Sentiment_Analysis_with_BERT.ipynb file.
Enable GPU:
- In the Colab notebook, navigate to Runtime > Change runtime type.
- Select GPU from the "Hardware accelerator" dropdown menu.
Download the Dataset:
- Download the dataset from Kaggle: Google Play Store User Reviews.
- Upload the googleplaystore_user_reviews.csv file to your Colab session using the file-explorer pane on the left.
Run the Notebook:
- Execute the cells in the notebook sequentially.
- The notebook will automatically install all required dependencies. The training process will take approximately 15-20 minutes on a standard Colab GPU.

How It Works

The notebook is divided into the following key stages:

Data Loading and Cleaning: The raw CSV is loaded, and rows with missing reviews or sentiments are dropped.
Preprocessing: Sentiment labels ('Positive', 'Negative', 'Neutral') are mapped to the numerical values 2, 1, and 0, respectively.
BERT Tokenization: The text is tokenized using the BertTokenizer, converting it into a format suitable for the model (Input IDs and Attention Masks).
Training: The BertForSequenceClassification model is fine-tuned on the training data for 3 epochs.
Evaluation: The model's performance is evaluated on the validation set after each epoch.
Inference: A prediction function is provided to test the trained model on new, unseen sentences.
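The data loading, cleaning, and label-mapping stages can be sketched with pandas as follows. This is a minimal sketch, not the notebook's code: it assumes the review text and label live in the `Translated_Review` and `Sentiment` columns of the Kaggle CSV, and the helper name `clean_and_encode` is hypothetical. A small in-memory DataFrame stands in for the CSV so the snippet runs on its own.

```python
import pandas as pd

# Label mapping as described in the Preprocessing stage.
LABEL2ID = {"Positive": 2, "Negative": 1, "Neutral": 0}

# Assumed column names in googleplaystore_user_reviews.csv.
TEXT_COL = "Translated_Review"
LABEL_COL = "Sentiment"


def clean_and_encode(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows missing a review or a sentiment,
    then add an integer `label` column."""
    cleaned = df.dropna(subset=[TEXT_COL, LABEL_COL]).copy()
    cleaned["label"] = cleaned[LABEL_COL].map(LABEL2ID)
    return cleaned.reset_index(drop=True)


# In the notebook the frame would come from:
# df = pd.read_csv("googleplaystore_user_reviews.csv")
df = pd.DataFrame(
    {
        "App": ["A", "A", "B", "B"],
        TEXT_COL: ["Love it", None, "Crashes constantly", "It's ok"],
        LABEL_COL: ["Positive", None, "Negative", "Neutral"],
    }
)

clean_df = clean_and_encode(df)
print(clean_df[[TEXT_COL, LABEL_COL, "label"]])
```

Using `dropna(subset=...)` removes only rows where the review or the sentiment is missing, so rows with gaps in unrelated columns are kept.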
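To make the Input IDs and Attention Masks from the tokenization stage concrete, here is a toy, pure-Python illustration of what a BERT tokenizer returns when asked to pad and truncate to a fixed length. It is not `BertTokenizer` itself: the WordPiece IDs passed in are hypothetical, and the helper `encode_for_bert` and `max_length=8` are illustrative choices. The special-token IDs (`[CLS]` = 101, `[SEP]` = 102, `[PAD]` = 0) are those of the `bert-base-uncased` vocabulary.

```python
from typing import List, Tuple

# Special-token IDs in the bert-base-uncased vocabulary.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def encode_for_bert(token_ids: List[int], max_length: int) -> Tuple[List[int], List[int]]:
    """Toy version of fixed-length padding with truncation:
    wrap in [CLS] ... [SEP], truncate, pad, and build the attention mask."""
    # Reserve two positions for [CLS] and [SEP].
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    # 1 marks real tokens the model should attend to; 0 marks padding.
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    input_ids += [PAD_ID] * padding
    attention_mask += [0] * padding
    return input_ids, attention_mask


# Hypothetical WordPiece IDs for a short review.
ids, mask = encode_for_bert([2023, 10439, 2003, 2307], max_length=8)
print(ids)   # [101, 2023, 10439, 2003, 2307, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

Fixing every sequence to the same length lets reviews be stacked into batches, while the attention mask tells the model to ignore the padded positions.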

Using the Prediction Function

Once the model is trained, you can easily predict the sentiment of any text:

```python
# Example of using the prediction function from the notebook
review_text = "This app is great, but the latest update has some bugs."
predicted_sentiment = predict_sentiment(review_text)

print(f"Review: '{review_text}'")
print(f"Predicted Sentiment: {predicted_sentiment}")
# Expected output: 'Neutral' or 'Positive'
```
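The notebook's `predict_sentiment` is not reproduced here. As a hedged sketch, the last step of such a function usually takes the model's three output logits (in Hugging Face, the `logits` field of the model output), picks the highest-scoring class, and inverts the label mapping from the Preprocessing stage. The helper name `logits_to_sentiment` and the example logits are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Inverse of the mapping used during preprocessing.
ID2LABEL = {2: "Positive", 1: "Negative", 0: "Neutral"}


def logits_to_sentiment(logits: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the predicted label and the softmax probability of each label."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = {ID2LABEL[i]: e / total for i, e in enumerate(exps)}
    # The argmax over logits equals the argmax over probabilities.
    pred_id = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[pred_id], probs


# Hypothetical logits ordered by class ID: [Neutral, Negative, Positive].
label, probs = logits_to_sentiment([1.2, -0.5, 2.3])
print(label)  # Positive
```

Returning the probabilities alongside the label is optional, but it helps with mixed reviews like the one above, where two classes may score close together.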

Results

The model was trained for 3 epochs and achieved the following performance on the validation set:

Validation Accuracy: ~94%
Validation Loss: ~0.20
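For reference, metrics like these are typically computed as the fraction of correct argmax predictions and the mean cross-entropy over the validation set (`BertForSequenceClassification` uses cross-entropy loss for single-label classification). The pure-Python sketch below shows both calculations with hypothetical logits, labels, and a hypothetical helper name `evaluate`. It is not the notebook's evaluation code, which may, for example, average the loss per batch rather than per example.

```python
import math
from typing import List, Tuple


def evaluate(all_logits: List[List[float]], labels: List[int]) -> Tuple[float, float]:
    """Return (accuracy, mean cross-entropy loss) over a set of examples."""
    correct = 0
    total_loss = 0.0
    for logits, y in zip(all_logits, labels):
        # Cross-entropy is -log softmax(logits)[y] = logsumexp(logits) - logits[y],
        # with log-sum-exp computed stably by factoring out the max logit.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total_loss += log_sum_exp - logits[y]
        pred = max(range(len(logits)), key=lambda i: logits[i])
        correct += int(pred == y)
    n = len(labels)
    return correct / n, total_loss / n


# Hypothetical validation logits (one row per review, columns = class IDs 0, 1, 2).
val_logits = [[0.1, 0.2, 3.0], [2.5, 0.3, 0.1], [0.0, 0.0, 0.0]]
val_labels = [2, 0, 1]
acc, loss = evaluate(val_logits, val_labels)
print(f"accuracy={acc:.3f} loss={loss:.3f}")
```

A model that is uniformly unsure over three classes scores a loss of log 3 (about 1.10) per example, so a validation loss around 0.20 indicates that most predictions are both correct and confident.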

Performance Plots


Training vs. Validation Loss

Validation Accuracy

License

This project is licensed under the MIT License. See the LICENSE file for details.
