Predictive Maintenance

This is a machine learning project for engine health classification. We use NASA's CMAPSS dataset (simulated turbofan engine degradation) for predictive maintenance. By reframing Remaining Useful Life (RUL) forecasting as a multiclass classification task, we map operational settings and 21 sensor readings to three machine health states:

Good (0)
Moderate (1)
Warning (2)

Overview

The core objective is to detect equipment degradation early. The CMAPSS simulation models turbofan engines and their major modules, including the low- and high-pressure compressors (LPC, HPC) and turbines (LPT, HPT). Rather than relying on fixed-schedule maintenance, this project uses sensor data to alert operators proactively when an engine's health declines.

This repository includes pre-processed data splits and Jupyter notebooks for building, training, and testing multiclass classifiers.

Dataset Structure

The data is stored in CSV files with the following columns:

ID: Engine unit identifier (used internally to track each engine).
Cycle: Time step (operating cycle) of the observation.
OpSet1, OpSet2, OpSet3: Operational condition settings.
SensorMeasure1 to SensorMeasure21: Sensor readings.
labels: Target health class (0, 1, or 2) used for training and testing.
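To make the schema concrete, here is a small sketch that builds the expected column list and checks a loaded partition against it. The exact header spellings of ID and Cycle are assumptions based on the list above (the other names match the example code later in this README), and check_schema and label_values_ok are hypothetical helpers, not part of the repository.

```python
import pandas as pd

# Column names as listed in the schema above (ID and Cycle spellings assumed).
OP_COLS = ["OpSet1", "OpSet2", "OpSet3"]
SENSOR_COLS = [f"SensorMeasure{i}" for i in range(1, 22)]
EXPECTED_COLS = ["ID", "Cycle"] + OP_COLS + SENSOR_COLS + ["labels"]


def check_schema(df: pd.DataFrame) -> list[str]:
    """Return the expected columns that are missing from df (hypothetical helper)."""
    return [c for c in EXPECTED_COLS if c not in df.columns]


def label_values_ok(df: pd.DataFrame) -> bool:
    """True if every value in the labels column is one of the classes 0, 1, 2."""
    return bool(df["labels"].isin([0, 1, 2]).all())


# Usage: missing = check_schema(pd.read_csv("Training_1_all_features.csv"))
```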

Data files (split into four partitions):

Training_1_all_features.csv to Training_4_all_features.csv
Test_classification_1.csv to Test_classification_4.csv

(Note: the raw text files produced by the CMAPSS simulation are loaded in the notebook and processed into these CSVs.)

Classification Logic

To turn RUL into class labels, we first compute a continuous Life Ratio (LR) for each observation and then bin it into discrete classes:

LR = Current Cycle / End of Life (EOL)

Here, EOL is the cycle at which the engine unit reaches failure.
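Assuming RUL is counted as EOL minus the current cycle (a common CMAPSS convention that this README does not state explicitly), LR is simply a reversed, normalized RUL:

```latex
\mathrm{LR} = \frac{\mathrm{Cycle}}{\mathrm{EOL}}
            = \frac{\mathrm{EOL} - \mathrm{RUL}}{\mathrm{EOL}}
            = 1 - \frac{\mathrm{RUL}}{\mathrm{EOL}}
```

Under that assumption, the 0.60 and 0.80 thresholds below correspond to 40% and 20% of the engine's life remaining.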

The final target mapping is defined as:

Good (0): LR ≤ 0.60
Moderate (1): 0.60 < LR ≤ 0.80
Warning (2): LR > 0.80

The labels column in the CSV files follows this mapping.
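The labeling step can be sketched in pandas as follows. This assumes complete run-to-failure trajectories (as in the CMAPSS training files), so each engine's EOL is its last recorded cycle; the ID and Cycle column names and the helper add_health_labels are assumptions. The result goes into a new health_label column so it can be compared against the existing labels column instead of overwriting it.

```python
import numpy as np
import pandas as pd


def add_health_labels(
    df: pd.DataFrame, id_col: str = "ID", cycle_col: str = "Cycle"
) -> pd.DataFrame:
    """Add EOL, LR and a 3-class health_label column (hypothetical helper).

    Assumes every engine's trajectory runs to failure, so EOL is the
    engine's last recorded cycle.
    """
    out = df.copy()
    # EOL: maximum cycle of each engine unit, broadcast back to all of its rows.
    out["EOL"] = out.groupby(id_col)[cycle_col].transform("max")
    out["LR"] = out[cycle_col] / out["EOL"]
    # np.select uses the first condition that holds:
    # LR <= 0.60 -> 0 (Good), LR <= 0.80 -> 1 (Moderate), otherwise 2 (Warning).
    out["health_label"] = np.select(
        [out["LR"] <= 0.60, out["LR"] <= 0.80], [0, 1], default=2
    )
    return out


# One engine that fails at cycle 10.
demo = pd.DataFrame({"ID": [1] * 10, "Cycle": list(range(1, 11))})
print(add_health_labels(demo)["health_label"].tolist())
# [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
```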

Project Files

Classification.ipynb: Main workflow (EDA, feature engineering, and model training).
making_test_data.ipynb: Helper notebook for assembling and verifying the test partitions.
Training data: Training_1_all_features.csv to Training_4_all_features.csv (features and ground-truth labels).
Test data: Test_classification_1.csv to Test_classification_4.csv (features and ground-truth labels).

Installation

Recommended environment: Python 3.9 or higher.

To set up a local virtual environment:

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install jupyterlab pandas numpy scikit-learn matplotlib seaborn

Unix/macOS (bash):

python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install jupyterlab pandas numpy scikit-learn matplotlib seaborn
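An optional check, sketched here with the standard library only (the helper name check_environment is hypothetical), reports which of the packages installed above are present in the active environment and warns if Python is older than 3.9:

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the pip install commands above.
REQUIRED = ["jupyterlab", "pandas", "numpy", "scikit-learn", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED) -> dict:
    """Map each package to its installed version, or None if it is missing."""
    if sys.version_info < (3, 9):
        print(f"Warning: Python {sys.version.split()[0]} is older than the recommended 3.9")
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in the active environment
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name:<13} {ver or 'MISSING'}")
```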

Getting Started

Launch the Jupyter environment:

jupyter lab

Open the Classification.ipynb notebook and run the cells in order. The notebook will:

Import the dataset.
Compute EOL and LR for each observation.
Generate the discrete health classes.
Train machine learning models and evaluate them on the test split.

(Optional) Run making_test_data.ipynb if you wish to study the data preparation process or tweak test partitions.

Example Code (Python)

Here is a quick way to train and evaluate a Logistic Regression baseline on the first partition:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load files
train_set = pd.read_csv("Training_1_all_features.csv")
test_set  = pd.read_csv("Test_classification_1.csv")

# Define columns
features = ["OpSet1", "OpSet2", "OpSet3"] + [f"SensorMeasure{i}" for i in range(1, 22)]
X_tr, y_tr = train_set[features], train_set["labels"]
X_te, y_te = test_set[features],  test_set["labels"]

# Train/Evaluate Baseline
model = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=300))
])
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
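A tree ensemble, one of the candidates listed under Modeling Approach below, can replace the logistic regression in the baseline above. This is a sketch, not the notebook's model: make_forest is a hypothetical helper, and class_weight="balanced" is an illustrative choice to counter the imbalance implied by the thresholds (in a complete run-to-failure trajectory, roughly 60% of cycles are Good and about 20% each are Moderate and Warning).

```python
from sklearn.ensemble import RandomForestClassifier


def make_forest(n_estimators: int = 300, random_state: int = 0) -> RandomForestClassifier:
    """Random Forest alternative to the logistic regression baseline (hypothetical helper).

    No StandardScaler is needed: tree splits depend only on the ordering of each
    feature's values, not on their scale.
    """
    return RandomForestClassifier(
        n_estimators=n_estimators,
        class_weight="balanced",  # weight classes inversely to their frequency in y
        n_jobs=-1,                # use all CPU cores
        random_state=random_state,
    )


# Usage, reusing X_tr, y_tr, X_te, y_te and classification_report from the baseline:
# forest = make_forest().fit(X_tr, y_tr)
# print(classification_report(y_te, forest.predict(X_te),
#                             target_names=["Good", "Moderate", "Warning"]))
```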

Modeling Approach

Input Space: 3 operational settings and 21 sensor readings. Feature scaling is strongly recommended for scale-sensitive models such as logistic regression, SVMs, and neural networks; tree-based models do not require it.
Temporal Dynamics: Each CSV row is a single-cycle snapshot. Rolling statistics computed per engine (for example, moving averages of sensor readings over recent cycles) can capture degradation trends and may improve predictive power.
Algorithm Selection: Suitable candidates for this multiclass task include Random Forests, gradient-boosted trees (LightGBM, XGBoost), SVMs, and deep learning models.
Scalability: The example above trains on a single partition, but multiple partitions can be concatenated to expose the model to a greater variety of fault types.
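The temporal and scalability points above can be combined in a short sketch. It assumes engine IDs restart in each partition (as they do in the raw CMAPSS files), so the engine key becomes (partition, ID); the helpers combine_partitions and add_rolling_means, the partition column, and the 5-cycle default window are illustrative choices, not the notebook's method.

```python
import pandas as pd


def combine_partitions(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack partitions and tag each row with its 1-based partition number."""
    tagged = [f.assign(partition=i) for i, f in enumerate(frames, start=1)]
    return pd.concat(tagged, ignore_index=True)


def add_rolling_means(df: pd.DataFrame, cols: list[str], window: int = 5) -> pd.DataFrame:
    """Append trailing per-engine moving averages named <col>_roll<window>.

    Engines are keyed by (partition, ID) because IDs are assumed to restart in
    each partition. Rows are sorted by Cycle within each engine so the window
    only looks back in time; early cycles average over whatever history exists
    (min_periods=1).
    """
    keys = ["partition", "ID"]
    out = df.sort_values(keys + ["Cycle"]).copy()
    for c in cols:
        out[f"{c}_roll{window}"] = out.groupby(keys)[c].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
    return out
```

For example, applying add_rolling_means to combine_partitions over the four training CSVs with the 21 SensorMeasure columns would add 21 smoothed features. Because the window only looks backward, no row uses information from future cycles.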

Evaluation Metrics

In predictive maintenance, false negatives (for example, predicting Good when the engine is actually in the Warning state) are the most costly errors, so tracking how often they occur is critical. Use standard multiclass metrics, including:

Overall accuracy
F1-score (macro and micro; for single-label multiclass problems, micro-F1 equals accuracy)
Per-class precision and recall
Confusion matrix visualization
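A minimal sketch of these metrics with scikit-learn, putting the Warning class front and center. Counting a "missed warning" as a true Warning predicted as Good or Moderate is an illustrative definition, and the helper maintenance_report and its returned keys are hypothetical.

```python
from sklearn.metrics import confusion_matrix, f1_score, recall_score


def maintenance_report(y_true, y_pred) -> dict:
    """Multiclass summary with emphasis on missed warnings (hypothetical helper).

    Classes: 0 = Good, 1 = Moderate, 2 = Warning.
    """
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
    # Rows are true classes, columns are predictions; row 2 holds the true Warning cases.
    missed_warnings = int(cm[2, 0] + cm[2, 1])
    return {
        "confusion_matrix": cm,
        "macro_f1": f1_score(y_true, y_pred, average="macro"),
        # Recall restricted to the Warning class: caught warnings / all true warnings.
        "warning_recall": recall_score(y_true, y_pred, labels=[2], average=None)[0],
        "missed_warnings": missed_warnings,
    }


# Plotting the matrix (needs matplotlib):
# from sklearn.metrics import ConfusionMatrixDisplay
# ConfusionMatrixDisplay(report["confusion_matrix"],
#                        display_labels=["Good", "Moderate", "Warning"]).plot()
```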
