An end-to-end Deep Learning system for binary sentiment classification of movie reviews using a Bi-Directional Recurrent Neural Network (RNN), with complete text preprocessing, model training, evaluation, and real-time deployment through a Flask web application.
This project implements the full Sentiment Analysis pipeline using Natural Language Processing (NLP) and Deep Learning.
The model is built with TensorFlow and Keras and classifies each movie review as Positive or Negative.
The project covers the entire pipeline including:
- Data preprocessing
- Text cleaning and normalization
- Word embedding generation
- Deep learning model training
- Model evaluation
- Model saving
- Real-time deployment using Flask
Understanding customer sentiment is critical for businesses and platforms that collect reviews.
The objective of this project is to:
- Analyze movie review text
- Convert raw text into numerical representations
- Train a deep learning model to classify sentiment
- Deploy the trained model for real-time predictions
- Dataset Used: IMDB Movie Reviews Dataset
- Source: Public IMDB dataset (CSV format)
- Total Used: First 10,000 reviews
- Target Labels:
  - Positive → 1
  - Negative → 0
The dataset contains:
- Review text
- Sentiment label
The model is built using a Bi-Directional RNN architecture.
- Embedding Layer
- Masking Layer
- Bi-Directional SimpleRNN (Layer 1)
- Bi-Directional SimpleRNN (Layer 2)
- Bi-Directional SimpleRNN (Layer 3)
- Dense Output Layer (Sigmoid activation)
It processes text:
- Forward direction (left → right)
- Backward direction (right → left)
This helps capture contextual meaning more effectively.
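To make the two reading directions concrete, here is a toy sketch of a bidirectional pass using a single-unit tanh RNN in plain Python. It is not the project's Keras model: the function names and the weight values are illustrative assumptions, and a real layer would use weight matrices and trained parameters.

```python
import math

def run_simple_rnn(sequence, w_in, w_rec, bias=0.0):
    """Run a one-unit tanh RNN over the sequence and return the final hidden state."""
    hidden = 0.0
    for x in sequence:
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
    return hidden

def run_bidirectional(sequence, forward_weights, backward_weights):
    """Read the sequence left-to-right and right-to-left, then concatenate both states."""
    forward_state = run_simple_rnn(sequence, *forward_weights)
    backward_state = run_simple_rnn(sequence[::-1], *backward_weights)
    return [forward_state, backward_state]

# Illustrative (w_in, w_rec) values, not trained parameters.
weights = (0.8, 0.5)
states = run_bidirectional([1.0, 0.0, -1.0], weights, weights)
```

Even with identical weights, the two directions end in different states because each one sees the tokens in a different order. That is the extra context a bidirectional layer passes on to the next layer.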
- Load CSV dataset
- Select first 10,000 rows
- Map sentiment labels to binary (0/1)
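The loading steps above could look roughly like the sketch below. It uses the standard-library `csv` module to stay dependency-free, and it assumes the CSV has `review` and `sentiment` columns with the values `positive` and `negative`. Those column names and the `load_reviews` helper are assumptions, not confirmed details of `main.py`.

```python
import csv
import io

LABEL_MAP = {"positive": 1, "negative": 0}

def load_reviews(csv_file, limit=10_000):
    """Read up to `limit` rows and return (texts, labels) with binary labels."""
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        if len(texts) >= limit:
            break
        texts.append(row["review"])  # assumed column name
        labels.append(LABEL_MAP[row["sentiment"].strip().lower()])  # assumed column name
    return texts, labels

# Tiny in-memory stand-in for the real CSV file.
sample = io.StringIO(
    "review,sentiment\n"
    "A wonderful film.,positive\n"
    "Dull and far too long.,negative\n"
    "Loved every minute.,positive\n"
)
texts, labels = load_reviews(sample, limit=2)
```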
- Convert text to lowercase
- Remove punctuation
- Remove stopwords
- Apply lemmatization (WordNetLemmatizer)
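A minimal sketch of the four cleaning steps above, in that order. To keep it runnable without NLTK downloads, it uses a small illustrative stopword set and an identity function as the default lemmatizer; both are placeholders, and `clean_review` is a hypothetical name.

```python
import string

# Small illustrative stopword set; the project uses NLTK's English stopword list.
STOPWORDS = {"the", "a", "an", "was", "and", "is", "it", "of", "to"}
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_review(text, stopwords=STOPWORDS, lemmatize=lambda word: word):
    """Lowercase, strip punctuation, drop stopwords, then lemmatize each token."""
    tokens = text.lower().translate(PUNCT_TABLE).split()
    return [lemmatize(token) for token in tokens if token not in stopwords]

cleaned = clean_review("The movie was absolutely amazing, and emotionally powerful!")
```

With NLTK installed and its data downloaded, `set(stopwords.words("english"))` and `WordNetLemmatizer().lemmatize` can be passed in as the `stopwords` and `lemmatize` arguments.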
- One-hot encoding
- Vocabulary size: 10,000
- Padding sequences to fixed length
- Convert text into numerical tensors
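The encoding and padding steps can be illustrated as follows. This is a plain-Python stand-in for the idea behind Keras' `one_hot` and `pad_sequences` utilities: each word is hashed to an integer index below the vocabulary size, and short sequences are left-padded with zeros. Whether the project uses those exact utilities is an assumption, and the sequence length of 5 is purely illustrative because the source does not state the real value.

```python
import hashlib

VOCAB_SIZE = 10_000

def word_to_index(word, vocab_size=VOCAB_SIZE):
    """Hash a word to a stable integer in [1, vocab_size - 1]; 0 is reserved for padding."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (vocab_size - 1) + 1

def encode_and_pad(tokens, max_len, vocab_size=VOCAB_SIZE):
    """Encode tokens as indices, keep the last `max_len`, and left-pad with zeros."""
    indices = [word_to_index(token, vocab_size) for token in tokens][-max_len:]
    return [0] * (max_len - len(indices)) + indices

encoded = encode_and_pad(["movie", "absolutely", "amazing"], max_len=5)
```

Because the indices come from hashing, two different words can collide on the same index. Reserving 0 for padding is what lets a masking layer skip the padded positions.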
- Training set: 7000 samples
- Validation set: 2000 samples
- Test set: 1000 samples
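One way to produce a 7000/2000/1000 split is sketched below. The source does not say whether the data is shuffled or which library performs the split, so the shuffling, the seed, and the `split_dataset` helper are all assumptions.

```python
import random

def split_dataset(samples, n_train=7000, n_val=2000, n_test=1000, seed=42):
    """Shuffle a copy of `samples` and cut it into train/validation/test lists."""
    if n_train + n_val + n_test > len(samples):
        raise ValueError("split sizes exceed the number of samples")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

train, val, test = split_dataset(range(10_000))
```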
- Optimizer: Adam
- Loss Function: Binary Crossentropy
- Epochs: 20
- Batch Size: 100
- Metric: Accuracy
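For readers new to the loss and metric listed above, this sketch computes binary cross-entropy and accuracy by hand on four made-up predictions. It only illustrates the formulas and is not the training code; the sample labels and probabilities are invented for the example.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

labels = [1, 0, 1, 0]          # made-up ground truth
probs = [0.9, 0.2, 0.4, 0.1]   # made-up sigmoid outputs
loss = binary_crossentropy(labels, probs)
acc = accuracy(labels, probs)
```

The third sample (label 1, probability 0.4) is the only misclassification, so accuracy is 0.75, and that sample also contributes the largest share of the loss.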
The trained model is saved as:
review.pkl
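Saving and reloading with `pickle` follows the pattern sketched here. A small dictionary stands in for the trained model so the example runs without TensorFlow; the helper names and the dictionary contents are hypothetical.

```python
import os
import pickle
import tempfile

def save_artifact(obj, path):
    """Serialize `obj` to `path` with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)

def load_artifact(path):
    """Load a pickled object back from `path`."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Stand-in dictionary; the real project pickles the trained model object.
artifact = {"vocab_size": 10_000, "labels": {0: "Negative", 1: "Positive"}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "review.pkl")
    save_artifact(artifact, path)
    restored = load_artifact(path)
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code while deserializing.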
The trained model is deployed using Flask.
- User enters a review
- Text is cleaned and processed
- Model predicts sentiment
- Displays:
  - Sentiment (Positive / Negative)
  - Confidence score
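The last step, turning the model's sigmoid output into a label and a confidence score, might look like this sketch. How `app.py` actually computes confidence is not stated in the source. Here it is assumed to be the probability of the predicted class, and `describe_prediction` is a hypothetical helper that a Flask route could call.

```python
def describe_prediction(probability, threshold=0.5):
    """Map a sigmoid output to a (label, confidence) pair."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= threshold:
        return "Positive", round(probability, 2)
    # Assumption: confidence in a Negative prediction is 1 - p.
    return "Negative", round(1.0 - probability, 2)

result = describe_prediction(0.92)
```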
Sentiment_Analysis/
│
├── log_files/
├── templates/
│ └── index.html
├── main.py
├── RNN.py
├── embedding.py
├── lemmatization.py
├── log_code.py
├── app.py
├── test_model.py
├── review.pkl
├── requirements.txt
└── README.md
- Python
- NumPy
- Pandas
- NLTK
- TensorFlow / Keras
- Scikit-learn
- Flask
- Matplotlib
git clone https://github.com/your-username/Sentiment_Analysis.git
cd Sentiment_Analysis

pip install -r requirements.txt

import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

python main.py

python app.py

Open in browser:
http://127.0.0.1:5000/
Input:
The movie was absolutely amazing and emotionally powerful.
Output:
Positive (confidence: 0.92)
This system can be applied or adapted to:
- Movie review classification
- Product review analysis
- Customer feedback monitoring
- Opinion mining
- Social media sentiment analysis
✔ End-to-end Deep Learning pipeline
✔ Custom text preprocessing implementation
✔ Bi-Directional RNN architecture
✔ Model serialization using Pickle
✔ Real-time web deployment with Flask
✔ Structured logging system
- Helps companies understand user sentiment
- Automates feedback classification
- Enables data-driven decision-making
- Reduces manual review analysis effort
Hema Malini Gangumalla
Aspiring Data Scientist | NLP & Deep Learning Enthusiast
MIT License