flight_state estimator Using ML
A comprehensive machine learning project for bearing condition monitoring using both Classical ML and Deep Learning approaches on the industry-standard CWRU Bearing Dataset.
This project implements two approaches to bearing fault detection:
Classical Machine Learning:
- ✅ Manual feature extraction (time + frequency domain)
- ✅ Traditional ML models (Random Forest, SVM, Gradient Boosting)
- ✅ Fast training (~30 seconds)
- ✅ 92-98% accuracy
Deep Learning (1D CNN):
- ✅ Automatic feature learning from raw signals
- ✅ End-to-end learning pipeline
- ✅ Training time (~3-5 minutes)
- ✅ 95-99% accuracy
Real-World Application: Predictive maintenance, condition monitoring, industrial fault detection
Bearing Conditions:
- 🟢 Normal - Healthy bearing
- 🔴 Inner Race Fault - Inner race damage
- 🟠 Outer Race Fault - Outer race damage
bearing-fault-ml/
│
├── main.py                  # Main entry (supports both approaches)
├── data_loader.py           # CWRU dataset loader
├── feature_extractor.py     # Feature engineering (for ML)
├── models.py                # Classical ML models
├── visualizer.py            # Results visualization
├── deep_learning_model.py   # 1D CNN implementation
├── requirements.txt         # Dependencies
├── README.md                # This file
│
├── data/
│   └── cwru/                # CWRU dataset
│       ├── normal/
│       ├── inner_fault/
│       └── outer_fault/
│
└── results/                 # Auto-generated
    └── YYYY-MM-DD_HH-MM-SS/
        ├── results.png               # ML comparison
        ├── confusion_matrix.png      # Confusion matrices
        ├── cnn_training_history.png  # CNN training curves
        ├── metrics.csv               # Metrics
        ├── config.txt                # Configuration
        └── cnn_model.h5              # Saved CNN model
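Each run writes into its own timestamped folder under results/. As a rough illustration of that naming scheme, the sketch below builds such a folder with the standard library. The function name `make_run_dir` and its parameters are hypothetical and are not taken from main.py.

```python
from datetime import datetime
from pathlib import Path


def make_run_dir(root="results", now=None):
    """Create and return <root>/<YYYY-MM-DD_HH-MM-SS>/ for one run."""
    now = now or datetime.now()
    # Same pattern as the folder name shown in the project tree
    run_dir = Path(root) / now.strftime("%Y-%m-%d_%H-%M-%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Calling `make_run_dir()` with no arguments uses the current time, so two runs started in different seconds never share an output folder.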
# Clone repository
git clone https://github.com/ak5hay19/bearing-fault-ml.git
cd bearing-fault-ml
# Install dependencies
pip install -r requirements.txt
# For Deep Learning (optional)
pip install tensorflow
Visit: https://engineering.case.edu/bearingdatacenter
Download 12 files (right-click → Save Link As):
Normal baseline:

| File | Motor Load | Location |
|---|---|---|
| 97.mat | 0 HP | Normal Baseline → 12k Drive End |
| 98.mat | 1 HP | Normal Baseline → 12k Drive End |
| 99.mat | 2 HP | Normal Baseline → 12k Drive End |
| 100.mat | 3 HP | Normal Baseline → 12k Drive End |

Inner race fault:

| File | Motor Load | Location |
|---|---|---|
| 105.mat | 0 HP | 12k Drive End → Inner Race → 0.007" |
| 106.mat | 1 HP | 12k Drive End → Inner Race → 0.007" |
| 107.mat | 2 HP | 12k Drive End → Inner Race → 0.007" |
| 108.mat | 3 HP | 12k Drive End → Inner Race → 0.007" |

Outer race fault:

| File | Motor Load | Location |
|---|---|---|
| 130.mat | 0 HP | 12k Drive End → Outer Race → 0.007" → @6:00 |
| 131.mat | 1 HP | 12k Drive End → Outer Race → 0.007" → @6:00 |
| 132.mat | 2 HP | 12k Drive End → Outer Race → 0.007" → @6:00 |
| 133.mat | 3 HP | 12k Drive End → Outer Race → 0.007" → @6:00 |
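With twelve files spread over three classes, it is easy to miss one. Once the downloads have been placed in the data/cwru/ layout this project uses, a check along these lines could list anything that is absent. This is only a sketch: `EXPECTED_FILES` and `missing_files` are hypothetical names, and the file numbers are copied from the tables above.

```python
from pathlib import Path

# File numbers per class folder, copied from the download tables
EXPECTED_FILES = {
    "normal": [97, 98, 99, 100],
    "inner_fault": [105, 106, 107, 108],
    "outer_fault": [130, 131, 132, 133],
}


def missing_files(root="data/cwru"):
    """Return the expected .mat paths that are not present under root."""
    root = Path(root)
    return [
        root / folder / f"{number}.mat"
        for folder, numbers in EXPECTED_FILES.items()
        for number in numbers
        if not (root / folder / f"{number}.mat").is_file()
    ]
```

An empty list from `missing_files()` means all twelve recordings are where the loader expects them.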
# Create folder structure
mkdir -p data/cwru/normal
mkdir -p data/cwru/inner_fault
mkdir -p data/cwru/outer_fault
# Place downloaded files:
# data/cwru/normal/ ← 97-100.mat
# data/cwru/inner_fault/ ← 105-108.mat
# data/cwru/outer_fault/ ← 130-133.mat
Edit main.py (lines 17-18):
# For Classical ML only (fast)
USE_DEEP_LEARNING = False
USE_CLASSICAL_ML = True
# For Deep Learning only
USE_DEEP_LEARNING = True
USE_CLASSICAL_ML = False
# For comparison (both)
USE_DEEP_LEARNING = True
USE_CLASSICAL_ML = True
Run:
python main.py
Classical ML pipeline:
Raw Signal (2048 points)
↓
Feature Extraction (11 features)
    • Time: Mean, Std, RMS, Peak, Kurtosis, Skewness
    • Frequency: Dominant freq, Energy bands, Spectral centroid
↓
Feature Scaling (StandardScaler)
↓
ML Models (RF, SVM, GB)
↓
Predictions
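To make the feature step concrete, here is a minimal NumPy sketch that turns one 2048-point segment into an 11-element vector. It is not the code in feature_extractor.py. The function name, the choice of three equal-width energy bands (picked so that the total comes to 11), and the use of non-excess kurtosis are assumptions. The 12 kHz sampling rate corresponds to the "12k" drive-end recordings.

```python
import numpy as np

FS = 12_000  # Hz, sampling rate of the "12k" drive-end recordings


def extract_features(segment, fs=FS, n_bands=3):
    """Return 11 time- and frequency-domain features for one 1-D segment."""
    x = np.asarray(segment, dtype=float)
    mean = x.mean()
    std = x.std()  # a constant segment (std == 0) would divide by zero below
    centered = x - mean
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    # Standardized 3rd and 4th moments (non-excess kurtosis, ~3 for Gaussian data)
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4

    # One-sided power spectrum of the mean-removed segment
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    dominant_freq = freqs[np.argmax(power)]
    centroid = np.sum(freqs * power) / np.sum(power)
    # Share of spectral energy in n_bands equal-width bands (assumed split)
    band_energy = [band.sum() / power.sum()
                   for band in np.array_split(power, n_bands)]

    return np.array([mean, std, rms, peak, kurtosis, skewness,
                     dominant_freq, *band_energy, centroid])
```

Stacking these vectors row by row gives the feature matrix that would then go through StandardScaler before model training.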
Models:
- Random Forest: 100 trees, robust ensemble
- SVM: RBF kernel, high-dimensional classification
- Gradient Boosting: Sequential learning, highest accuracy
Performance:
- Training time: 5-10 seconds
- Accuracy: 92-98%
- Interpretability: High ✅
Deep Learning (1D CNN) pipeline:
Raw Signal (2048 points)
↓
1D CNN Architecture:
    • Conv1D(64) → BatchNorm → MaxPool → Dropout
    • Conv1D(128) → BatchNorm → MaxPool → Dropout
    • Conv1D(256) → BatchNorm → MaxPool → Dropout
    • Conv1D(128) → BatchNorm → GlobalAvgPool
    • Dense(128) → Dropout
    • Dense(64) → Dropout
    • Dense(3, softmax)
↓
Predictions
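The layer stack above could be written in Keras roughly as follows. This is a sketch rather than the contents of deep_learning_model.py: the kernel size, padding, pool size, ReLU activations and dropout rate are all assumptions, so the parameter count will not necessarily match the figure quoted in this README. The Adam learning rate and the loss follow the values listed under CNN Details.

```python
FILTERS = (64, 128, 256, 128)  # Conv1D widths from the architecture above


def build_cnn(input_len=2048, n_classes=3, kernel_size=7, dropout=0.3):
    """Build and compile a 1D CNN with the layer order listed above."""
    # Imported lazily because TensorFlow is an optional dependency here
    from tensorflow import keras
    layers = keras.layers

    model = keras.Sequential([keras.Input(shape=(input_len, 1))])
    for i, filters in enumerate(FILTERS):
        model.add(layers.Conv1D(filters, kernel_size,
                                padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        if i < len(FILTERS) - 1:  # the last conv block feeds GlobalAvgPool
            model.add(layers.MaxPooling1D(pool_size=2))
            model.add(layers.Dropout(dropout))
    model.add(layers.GlobalAveragePooling1D())
    for units in (128, 64):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(n_classes, activation="softmax"))

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_cnn().summary()` prints the per-layer shapes, which is a quick way to compare an assumed configuration against the reported parameter total.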
CNN Details:
- Input: Raw vibration signals (no preprocessing)
- Total Parameters: ~487,000
- Optimizer: Adam (lr=0.001)
- Loss: Categorical crossentropy
- Callbacks: Early stopping, learning rate reduction
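The parameter figures can be cross-checked against the architecture. The sample CNN output in this README reports 487,235 total and 486,083 trainable parameters. In Keras, each BatchNormalization layer holds a moving mean and a moving variance per channel that are not trained, so the four BatchNorm layers should account for the whole difference. The short calculation below is a consistency check on those published numbers only.

```python
FILTERS = (64, 128, 256, 128)  # channels seen by the four BatchNorm layers
TOTAL_PARAMS = 487_235         # "Total params" in the sample output
TRAINABLE_PARAMS = 486_083     # "Trainable params" in the sample output

# Two non-trainable values per channel: moving mean and moving variance
non_trainable = 2 * sum(FILTERS)

print(non_trainable)                    # 1152
print(TOTAL_PARAMS - TRAINABLE_PARAMS)  # 1152
```

Both values agree, which is consistent with BatchNorm following each of the four Conv1D layers.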
Performance:
- Training time: 2-5 minutes
- Accuracy: 95-99%
- Automatic feature learning ✅
| Metric | Classical ML | Deep Learning |
|---|---|---|
| Accuracy | 92-98% | 95-99% |
| Training Time | 5-10 sec ⚡ | 2-5 min 🐢 |
| Feature Engineering | Manual | Automatic |
| Interpretability | High | Low |
| Model Size | <1 MB | 5-10 MB |
| GPU Required | No | Recommended |
| Data Required | Low | Medium |
| Best For | Quick results | Max accuracy |
[4/5] Training models...
Training Random Forest... Done (2.34s)
Training SVM... Done (1.87s)
Training Gradient Boosting... Done (3.45s)
[5/5] Evaluating...
Random Forest : Accuracy = 0.9456 (94.56%)
SVM : Accuracy = 0.9523 (95.23%)
Gradient Boosting : Accuracy = 0.9612 (96.12%)
✅ Best Classical Model: Gradient Boosting (96.12%)
[2/4] Building and training 1D CNN...
Input shape: (2048, 1)
Number of classes: 3
Model: "sequential"
Total params: 487,235
Trainable params: 486,083
Epoch 1/100
58/58 [======] - 2s - loss: 0.8234 - accuracy: 0.6543 - val_accuracy: 0.7012
...
Epoch 35/100
58/58 [======] - 1s - loss: 0.0234 - accuracy: 0.9921 - val_accuracy: 0.9812
[3/4] Evaluating CNN...
✅ CNN Test Accuracy: 0.9823 (98.23%)
======================================================================
✅ RESULTS SUMMARY
======================================================================
📊 Classical ML:
Best Model: Gradient Boosting
Accuracy: 0.9612 (96.12%)
🧠 Deep Learning:
CNN Accuracy: 0.9823 (98.23%)
📁 Output: results/2025-01-10_14-30-45/
======================================================================
results.png: bar charts comparing:
- Model accuracies
- Training times
- Percentage view
confusion_matrix.png: confusion matrices for all ML models, showing classification errors
cnn_training_history.png: CNN training curves:
- Training vs validation accuracy
- Training vs validation loss
- Early stopping point
metrics.csv:
Model,Accuracy,Accuracy_Percentage,Training_Time_seconds
Random Forest,0.9456,94.56,2.34
SVM,0.9523,95.23,1.87
Gradient Boosting,0.9612,96.12,3.45
config.txt: complete configuration and dataset statistics
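The metrics.csv rows shown above can be read back with the standard csv module, for example to pick the best classical model of a run. The helper name `best_model` is hypothetical, and the sketch assumes the column names match the header shown above.

```python
import csv


def best_model(csv_path):
    """Return (model name, accuracy) for the row with the highest Accuracy."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    best = max(rows, key=lambda row: float(row["Accuracy"]))
    return best["Model"], float(best["Accuracy"])
```

For the sample rows above, `best_model("results/2025-01-10_14-30-45/metrics.csv")` would return `("Gradient Boosting", 0.9612)`.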