An enterprise-level text classification application using BERT transformers with a modern Streamlit web interface.
This application provides a comprehensive solution for training and deploying BERT models for text classification tasks. Built with Streamlit and leveraging Hugging Face's transformers library, it offers an intuitive web interface for both technical and non-technical users.
- Interactive Training: Upload CSV datasets and train customized BERT models through a user-friendly web interface
- Real-time Prediction: Classify new text using trained models with instant results
- Batch Processing: Run predictions on multiple texts simultaneously
- Model Management: Save, load, and manage trained models with automatic persistence
- Data Visualization: View data distributions, confusion matrices, and model performance metrics
- Custom Styling: Modern UI with custom CSS styling and responsive design
- Comprehensive Logging: Detailed application logging for debugging and monitoring
- Configurable Parameters: Adjustable training hyperparameters and model settings
bert-classification-application/
├── app.py # Main Streamlit application entry point
├── config/ # Configuration settings
│ ├── __init__.py
│ └── config.py # App configuration and constants
├── data/ # Data handling utilities
│ ├── __init__.py
│ └── data_loader.py # CSV loading and preprocessing
├── models/ # Model definition and training
│ ├── __init__.py
│ ├── model.py # BERT classifier implementation
│ └── training.py # Training logic and utilities
├── ui/ # User interface components
│ ├── __init__.py
│ ├── about_page.py # About page UI
│ ├── prediction_page.py # Prediction interface
│ ├── styles.py # Custom CSS and styling
│ └── training_page.py # Training interface
├── utils/ # Helper utilities
│ ├── __init__.py
│ ├── logger.py # Logging configuration
│ └── visualization.py # Charts and plots
├── tests/ # Test suite
│ ├── __init__.py
│ └── test_model.py # Model tests
├── trained_models/ # Saved model files (auto-created)
├── logs/ # Application logs (auto-created)
├── requirements.txt # Python dependencies
├── setup.py # Package installation
├── Containerfile # Container deployment
└── README.md # This file
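The modules above are tied together by app.py. A minimal sketch of what that entry point could look like, assuming hypothetical render() and helper functions in the ui and utils packages:

```python
# app.py -- illustrative sketch only; the real entry point may differ.
# The render() functions and helper names below are assumptions, not the actual API.
import streamlit as st

from ui import about_page, prediction_page, training_page
from ui.styles import apply_custom_styles          # hypothetical helper
from utils.logger import get_logger                # hypothetical helper

logger = get_logger(__name__)

st.set_page_config(page_title="BERT Text Classifier", layout="wide")
apply_custom_styles()

# Sidebar navigation between the three pages described below
page = st.sidebar.radio("Navigation", ["Upload & Train", "Predict", "About"])
if page == "Upload & Train":
    training_page.render()
elif page == "Predict":
    prediction_page.render()
else:
    about_page.render()
```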
- Python 3.8 or higher
- pip package manager
- Clone the repository:
git clone <your-repository-url>
cd bert-classification-application
- Create and activate a virtual environment:
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Open your browser and navigate to:
http://localhost:8501
- Navigate to "Upload & Train" page
- Upload your dataset (CSV format with text and label columns)
- Select your text and label columns from the dropdown menus
- Configure training parameters:
- Model name (default: distilbert-base-uncased)
- Maximum sequence length
- Batch size
- Number of epochs
- Learning rate
- Click "Train Model" and monitor the progress
- View training results including confusion matrix and classification report
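Under the hood, training roughly follows the standard Hugging Face fine-tuning recipe. The sketch below approximates what models/training.py does; file paths, hyperparameter values, and the single-batch loop are placeholders, not the app's actual code:

```python
# Approximate fine-tuning flow; the app's models/training.py may differ.
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer

df = pd.read_csv("reviews.csv")                         # placeholder dataset
label2id = {name: i for i, name in enumerate(sorted(df["label"].unique()))}
# test_df is held out for the confusion matrix / classification report
train_df, test_df = train_test_split(df, test_size=0.2, stratify=df["label"])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label2id)
)

# Tokenize the training split (one batch here for brevity; the app iterates
# in mini-batches of the configured batch size).
enc = tokenizer(list(train_df["text"]), truncation=True, padding=True,
                max_length=128, return_tensors="pt")
targets = torch.tensor([label2id[l] for l in train_df["label"]])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                      # "Number of epochs"
    optimizer.zero_grad()
    loss = model(**enc, labels=targets).loss            # cross-entropy computed internally
    loss.backward()
    optimizer.step()

model.save_pretrained("trained_models/my-model")        # persisted for later prediction
tokenizer.save_pretrained("trained_models/my-model")
```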
- Navigate to "Predict" page
- Load a previously trained model from the dropdown
- Enter text for classification
- View predictions with confidence scores
- Use batch prediction for multiple texts at once
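Programmatically, a single prediction with confidence scores reduces to a softmax over the model's logits. A minimal sketch, assuming a model saved under trained_models/ as in the training sketch above:

```python
# Minimal inference sketch; the app's prediction page wraps something similar.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "trained_models/my-model"                   # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

texts = ["Great customer service", "I didn't like this product"]  # batch input
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits
probs = torch.softmax(logits, dim=-1)                   # per-class confidence
preds = probs.argmax(dim=-1)

for text, pred, conf in zip(texts, preds, probs.max(dim=-1).values):
    print(f"{text!r} -> class {pred.item()} ({conf.item():.2%} confidence)")
```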
Your CSV file should contain at least two columns:
- Text column: Contains the text data to be classified
- Label column: Contains the target labels/categories
Example:
text,label
"This movie is amazing!",positive
"I didn't like this product",negative
"Great customer service",positive
The application supports any BERT-based model from the Hugging Face Hub:
- distilbert-base-uncased (default, faster training)
- bert-base-uncased
- roberta-base
- albert-base-v2
- And many more...
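Swapping the backbone is just a matter of passing a different checkpoint name; the transformers Auto classes resolve the matching architecture. For example:

```python
# Most encoder checkpoints on the Hub can be loaded the same way.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-base"                            # or bert-base-uncased, albert-base-v2, ...
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
```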
Key settings can be modified in config/config.py:
- Model settings: Default model, max length, batch size
- Training parameters: Epochs, learning rate, test split ratio
- UI settings: Colors, layout, styling
- Logging: Log level and format
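The exact contents of config/config.py depend on your checkout; the sketch below only illustrates the kind of constants involved, and every name and value in it is an assumption:

```python
# config/config.py -- illustrative only; names and values are assumptions.
import os

DEFAULT_MODEL_NAME = "distilbert-base-uncased"   # model settings
MAX_SEQUENCE_LENGTH = 128
BATCH_SIZE = 16

NUM_EPOCHS = 3                                   # training parameters
LEARNING_RATE = 2e-5
TEST_SPLIT_RATIO = 0.2

PRIMARY_COLOR = "#4F8BF9"                        # UI settings
PAGE_LAYOUT = "wide"

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")       # logging
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"
```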
- torch: PyTorch deep learning framework
- transformers: Hugging Face transformers library
- streamlit: Web application framework
- pandas: Data manipulation and analysis
- scikit-learn: Machine learning utilities
- matplotlib/seaborn: Data visualization
- python-dotenv: Environment variable management
See requirements.txt for specific versions.
Build and run using the provided Containerfile:
# Build container
podman build -t bert-classifier .
# Run container
podman run -p 8501:8501 bert-classifier
Application logs are automatically saved to logs/app.log. The log level can be configured via the LOG_LEVEL environment variable or in config/config.py.
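How the LOG_LEVEL override is honoured depends on utils/logger.py; a minimal sketch using the standard logging module and python-dotenv, with hypothetical function names, might look like this:

```python
# utils/logger.py -- illustrative sketch; the real module may differ.
import logging
import os

from dotenv import load_dotenv

load_dotenv()                                    # pick up LOG_LEVEL from a .env file
os.makedirs("logs", exist_ok=True)               # logs/ is auto-created

def get_logger(name: str) -> logging.Logger:
    level = os.getenv("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        filename="logs/app.log",
        level=getattr(logging, level, logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger(name)
```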
Run the test suite:
pytest tests/
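The shape of such a test, illustrated with a tokenizer-level check (hypothetical; the real assertions in tests/test_model.py may differ, and the checkpoint is downloaded on first run):

```python
# Illustrative test only -- the real suite lives in tests/test_model.py.
from transformers import AutoTokenizer

def test_tokenizer_returns_input_ids():
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    encoded = tokenizer("Great customer service", truncation=True, max_length=128)
    assert len(encoded["input_ids"]) > 0
```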
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
For issues and questions, please open an issue in the GitHub repository.