An enterprise-level text classification application using BERT transformers with a modern Streamlit web interface.
This application provides a comprehensive solution for training and deploying BERT models for text classification tasks. Built with Streamlit and leveraging Hugging Face's transformers library, it offers an intuitive web interface for both technical and non-technical users.
- Interactive Training: Upload CSV datasets and train customized BERT models through a user-friendly web interface
- Real-time Prediction: Classify new text using trained models with instant results
- Batch Processing: Run predictions on multiple texts simultaneously
- Model Management: Save, load, and manage trained models with automatic persistence
- Data Visualization: View data distributions, confusion matrices, and model performance metrics
- Custom Styling: Modern UI with custom CSS styling and responsive design
- Comprehensive Logging: Detailed application logging for debugging and monitoring
- Configurable Parameters: Adjustable training hyperparameters and model settings
bert-classification-application/
├── app.py # Main Streamlit application entry point
├── config/ # Configuration settings
│ ├── __init__.py
│ └── config.py # App configuration and constants
├── data/ # Data handling utilities
│ ├── __init__.py
│ └── data_loader.py # CSV loading and preprocessing
├── models/ # Model definition and training
│ ├── __init__.py
│ ├── model.py # BERT classifier implementation
│ └── training.py # Training logic and utilities
├── ui/ # User interface components
│ ├── __init__.py
│ ├── about_page.py # About page UI
│ ├── prediction_page.py # Prediction interface
│ ├── styles.py # Custom CSS and styling
│ └── training_page.py # Training interface
├── utils/ # Helper utilities
│ ├── __init__.py
│ ├── logger.py # Logging configuration
│ └── visualization.py # Charts and plots
├── tests/ # Test suite
│ ├── __init__.py
│ └── test_model.py # Model tests
├── trained_models/ # Saved model files (auto-created)
├── logs/ # Application logs (auto-created)
├── requirements.txt # Python dependencies
├── setup.py # Package installation
├── Containerfile # Container deployment
└── README.md # This file
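The modules above are tied together by app.py. A minimal sketch of what that entry point could look like, assuming hypothetical render() and helper functions in the ui and utils packages:

```python
# app.py -- illustrative sketch only; the real entry point may differ.
# The render() functions and helper names below are assumptions, not the actual API.
import streamlit as st

from ui import about_page, prediction_page, training_page
from ui.styles import apply_custom_styles          # hypothetical helper
from utils.logger import get_logger                # hypothetical helper

logger = get_logger(__name__)

st.set_page_config(page_title="BERT Text Classifier", layout="wide")
apply_custom_styles()

# Sidebar navigation between the three pages described below
page = st.sidebar.radio("Navigation", ["Upload & Train", "Predict", "About"])
if page == "Upload & Train":
    training_page.render()
elif page == "Predict":
    prediction_page.render()
else:
    about_page.render()
```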
- Python 3.8 or higher
- pip package manager
- Clone the repository:
git clone <your-repository-url>
cd bert-classification-application
- Create and activate a virtual environment:
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Run the Streamlit app:
streamlit run app.py
- Open your browser and navigate to:
http://localhost:8501
- Navigate to "Upload & Train" page
- Upload your dataset (CSV format with text and label columns)
- Select your text and label columns from the dropdown menus
- Configure training parameters:
- Model name (default: distilbert-base-uncased)
- Maximum sequence length
- Batch size
- Number of epochs
- Learning rate
- Click "Train Model" and monitor the progress
- View training results including confusion matrix and classification report
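Under the hood, training roughly follows the standard Hugging Face fine-tuning recipe. The sketch below approximates what models/training.py does; file paths, hyperparameter values, and the single-batch loop are placeholders, not the app's actual code:

```python
# Approximate fine-tuning flow; the app's models/training.py may differ.
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from transformers import AutoModelForSequenceClassification, AutoTokenizer

df = pd.read_csv("reviews.csv")                         # placeholder dataset
label2id = {name: i for i, name in enumerate(sorted(df["label"].unique()))}
# test_df is held out for the confusion matrix / classification report
train_df, test_df = train_test_split(df, test_size=0.2, stratify=df["label"])

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(label2id)
)

# Tokenize the training split (one batch here for brevity; the app iterates
# in mini-batches of the configured batch size).
enc = tokenizer(list(train_df["text"]), truncation=True, padding=True,
                max_length=128, return_tensors="pt")
targets = torch.tensor([label2id[l] for l in train_df["label"]])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):                                      # "Number of epochs"
    optimizer.zero_grad()
    loss = model(**enc, labels=targets).loss            # cross-entropy computed internally
    loss.backward()
    optimizer.step()

model.save_pretrained("trained_models/my-model")        # persisted for later prediction
tokenizer.save_pretrained("trained_models/my-model")
```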
- Navigate to "Predict" page
- Load a previously trained model from the dropdown
- Enter text for classification
- View predictions with confidence scores
- Use batch prediction for multiple texts at once
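Programmatically, a single prediction with confidence scores reduces to a softmax over the model's logits. A minimal sketch, assuming a model saved under trained_models/ as in the training sketch above:

```python
# Minimal inference sketch; the app's prediction page wraps something similar.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "trained_models/my-model"                   # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

texts = ["Great customer service", "I didn't like this product"]  # batch input
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits
probs = torch.softmax(logits, dim=-1)                   # per-class confidence
preds = probs.argmax(dim=-1)

for text, pred, conf in zip(texts, preds, probs.max(dim=-1).values):
    print(f"{text!r} -> class {pred.item()} ({conf.item():.2%} confidence)")
```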
Your CSV file should contain at least two columns:
- Text column: Contains the text data to be classified
- Label column: Contains the target labels/categories
Example:
text,label
"This movie is amazing!",positive
"I didn't like this product",negative
"Great customer service",positive
The application supports any BERT-based model from the Hugging Face Hub:
- distilbert-base-uncased (default, faster training)
- bert-base-uncased
- roberta-base
- albert-base-v2
- And many more...
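Swapping the backbone is just a matter of passing a different checkpoint name; the transformers Auto classes resolve the matching architecture. For example:

```python
# Most encoder checkpoints on the Hub can be loaded the same way.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-base"                            # or bert-base-uncased, albert-base-v2, ...
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
```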
Key settings can be modified in config/config.py:
- Model settings: Default model, max length, batch size
- Training parameters: Epochs, learning rate, test split ratio
- UI settings: Colors, layout, styling
- Logging: Log level and format
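The exact contents of config/config.py depend on your checkout; the sketch below only illustrates the kind of constants involved, and every name and value in it is an assumption:

```python
# config/config.py -- illustrative only; names and values are assumptions.
import os

DEFAULT_MODEL_NAME = "distilbert-base-uncased"   # model settings
MAX_SEQUENCE_LENGTH = 128
BATCH_SIZE = 16

NUM_EPOCHS = 3                                   # training parameters
LEARNING_RATE = 2e-5
TEST_SPLIT_RATIO = 0.2

PRIMARY_COLOR = "#4F8BF9"                        # UI settings
PAGE_LAYOUT = "wide"

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")       # logging
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"
```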
- torch: PyTorch deep learning framework
- transformers: Hugging Face transformers library
- streamlit: Web application framework
- pandas: Data manipulation and analysis
- scikit-learn: Machine learning utilities
- matplotlib/seaborn: Data visualization
- python-dotenv: Environment variable management
See requirements.txt for specific versions.
Build and run using the provided Containerfile:
# Build container
podman build -t bert-classifier .
# Run container
podman run -p 8501:8501 bert-classifier
Application logs are automatically saved to logs/app.log. The log level can be configured via the LOG_LEVEL environment variable or in config/config.py.
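How the LOG_LEVEL override is honoured depends on utils/logger.py; a minimal sketch using the standard logging module and python-dotenv, with hypothetical function names, might look like this:

```python
# utils/logger.py -- illustrative sketch; the real module may differ.
import logging
import os

from dotenv import load_dotenv

load_dotenv()                                    # pick up LOG_LEVEL from a .env file
os.makedirs("logs", exist_ok=True)               # logs/ is auto-created

def get_logger(name: str) -> logging.Logger:
    level = os.getenv("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        filename="logs/app.log",
        level=getattr(logging, level, logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger(name)
```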
Run the test suite:
pytest tests/
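The shape of such a test, illustrated with a tokenizer-level check (hypothetical; the real assertions in tests/test_model.py may differ, and the checkpoint is downloaded on first run):

```python
# Illustrative test only -- the real suite lives in tests/test_model.py.
from transformers import AutoTokenizer

def test_tokenizer_returns_input_ids():
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    encoded = tokenizer("Great customer service", truncation=True, max_length=128)
    assert len(encoded["input_ids"]) > 0
```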
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details
For issues and questions, please open an issue in the GitHub repository.