avyasaini/Predictive-Maintenance

Predictive Maintenance

This machine learning project classifies engine health using NASA's CMAPSS dataset (turbofan engine degradation) for predictive maintenance. By reframing Remaining Useful Life (RUL) forecasting as a multiclass classification task, we map 3 operational settings and 21 sensor readings to three machine health states:

  • Good (0)
  • Moderate (1)
  • Warning (2)

Table of contents

  • Overview
  • Dataset Structure
  • Classification Logic
  • Project Files
  • Installation
  • Getting Started
  • Example Code (Python)
  • Modeling Approach
  • Evaluation Metrics

Overview

The core objective is to detect patterns of equipment degradation early. The CMAPSS simulation models turbofan engine degradation across several engine modules: the high- and low-pressure compressors (HPC, LPC) and the high- and low-pressure turbines (HPT, LPT). Instead of relying on traditional, scheduled maintenance, this project uses sensor data to alert operators proactively about declining engine health.

Included in this repository are pre-processed data splits and Jupyter notebooks designed to build, train, and test models capable of reliable multiclass prediction.

Dataset Structure

The dataset is stored as CSV files with the following columns:

  • ID: Engine unit identifier (used for internal tracking).
  • Cycle: Current time step in the engine's observation sequence.
  • OpSet1, OpSet2, OpSet3: Operational condition variables.
  • SensorMeasure1 to SensorMeasure21: Captured sensor readings.
  • labels: Target category (0, 1, or 2) used for training/testing.

Data files available (split into 4 primary partitions):

  • Training_1_all_features.csv to Training_4_all_features.csv
  • Test_classification_1.csv to Test_classification_4.csv

Note: the raw text logs generated by the CMAPSS simulation are loaded in the notebook and processed into these CSVs.
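The notebook performs this conversion itself. As a rough sketch only, assuming the raw logs follow the publicly documented CMAPSS layout (whitespace-separated, no header row, 26 columns ordered as unit, cycle, three settings, 21 sensors), the loading step could look like the following. The helper name `read_raw_cmapss` is hypothetical and not taken from the repository; the column names come from the schema above.

```python
import io
import pandas as pd

# Column names follow the CSV schema described above.
COLUMNS = (
    ["ID", "Cycle", "OpSet1", "OpSet2", "OpSet3"]
    + [f"SensorMeasure{i}" for i in range(1, 22)]
)

def read_raw_cmapss(source):
    """Read a whitespace-separated CMAPSS log (path or file-like object).

    Assumption: 26 columns and no header row, as in the public dataset.
    """
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)

# Tiny synthetic stand-in for a raw log: two cycles of one engine (made-up values).
sample = io.StringIO(
    "1 1 " + " ".join(["0.5"] * 24) + "\n"
    "1 2 " + " ".join(["0.6"] * 24) + "\n"
)
raw_df = read_raw_cmapss(sample)
print(raw_df.shape)
```

Passing a file path instead of the `StringIO` object works the same way, since `pd.read_csv` accepts both.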

Classification Logic

To perform classification, we replace the continuous Remaining Useful Life (RUL) target with a Life Ratio (LR), the fraction of an engine's life already consumed, and then bin LR into discrete classes:

LR = Current Cycle / End of Life (EOL)

The final target mapping is defined as:

  • Good (0): LR is up to 0.60
  • Moderate (1): LR is above 0.60 and up to 0.80
  • Warning (2): LR is greater than 0.80

The labels column present in our datasets reflects this configuration.
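The mapping above can be sketched in a few lines of pandas. This is a minimal illustration, not the notebook's actual code: the helper name `add_health_labels` is hypothetical, and it assumes every engine is recorded until failure, so that the last observed Cycle per ID can stand in for EOL. For truncated test trajectories, EOL would have to come from another source.

```python
import numpy as np
import pandas as pd

def add_health_labels(df):
    """Return a copy of df with EOL, LR and labels columns added.

    Assumption: each engine (ID) runs to failure, so its maximum
    Cycle is used as End of Life (EOL).
    """
    out = df.copy()
    out["EOL"] = out.groupby("ID")["Cycle"].transform("max")
    out["LR"] = out["Cycle"] / out["EOL"]
    # LR <= 0.60 -> Good (0); 0.60 < LR <= 0.80 -> Moderate (1); else Warning (2)
    out["labels"] = np.select(
        [out["LR"] <= 0.60, out["LR"] <= 0.80],
        [0, 1],
        default=2,
    )
    return out

# Synthetic engine with a 10-cycle life, for illustration only.
demo = pd.DataFrame({"ID": [1] * 10, "Cycle": range(1, 11)})
labeled = add_health_labels(demo)
print(labeled[["Cycle", "LR", "labels"]])
```

`np.select` evaluates the conditions in order, so the second condition only applies to rows that failed the first, which yields the half-open intervals listed above.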

Project Files

  • Classification.ipynb — Contains the main workflow: EDA, feature engineering, and model training.
  • making_test_data.ipynb — Helper notebook for assembling and verifying test partitions.
  • Training Data — Training_1_all_features.csv to Training_4_all_features.csv (features and truth labels).
  • Test Data — Test_classification_1.csv to Test_classification_4.csv (features and truth labels).

Installation

Recommended environment: Python 3.9 or higher.

To set up a local virtual environment:

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install jupyterlab pandas numpy scikit-learn matplotlib seaborn

Unix/macOS (bash):

python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install jupyterlab pandas numpy scikit-learn matplotlib seaborn

Getting Started

  1. Launch the Jupyter environment:
jupyter lab
  2. Open the Classification.ipynb notebook and execute the cells sequentially. The notebook:
  • Imports the dataset.
  • Computes the EOL and LR metrics.
  • Generates the discrete health classes.
  • Trains machine learning models and evaluates performance on the test split.
  3. (Optional) Run making_test_data.ipynb if you wish to study the data preparation process or tweak the test partitions.

Example Code (Python)

Here is a quick way to train a Logistic Regression baseline on the first partition:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load files
train_set = pd.read_csv("Training_1_all_features.csv")
test_set  = pd.read_csv("Test_classification_1.csv")

# Define columns
features = ["OpSet1", "OpSet2", "OpSet3"] + [f"SensorMeasure{i}" for i in range(1, 22)]
X_tr, y_tr = train_set[features], train_set["labels"]
X_te, y_te = test_set[features],  test_set["labels"]

# Train/Evaluate Baseline
model = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=300))
])
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))

Modeling Approach

  • Input Space: Includes 3 operational settings and 21 sensor measurements. Feature scaling is highly recommended.
  • Temporal Dynamics: Each CSV row is a single-cycle snapshot, so adding rolling statistics (such as per-engine moving averages of the features) can improve predictive power.
  • Algorithm Selection: This multiclass scenario is well-suited for Random Forests, LightGBM/XGBoost, SVM, and deep learning models.
  • Scalability: The example script works on a single partition, but multiple partitions can be concatenated to expose the model to a wider variety of operating conditions and fault modes.
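The temporal and scalability points can be combined in one short sketch. It assumes that engine IDs may repeat across partitions, so a `partition` column is added to keep engines distinct. The helper names, the `partition` column, and the window size are all illustrative choices rather than part of the repository.

```python
import pandas as pd

def concat_partitions(frames):
    """Stack partitions, tagging each row with its partition number.

    Assumption: engine IDs may restart in each partition, so
    (partition, ID) is used as the unique engine key.
    """
    tagged = [f.assign(partition=i) for i, f in enumerate(frames, start=1)]
    return pd.concat(tagged, ignore_index=True)

def add_rolling_mean(df, columns, window=5):
    """Append per-engine rolling means that use current and past cycles only."""
    keys = ["partition", "ID"]
    out = df.sort_values(keys + ["Cycle"]).reset_index(drop=True)
    for col in columns:
        out[f"{col}_rollmean{window}"] = out.groupby(keys)[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
    return out

# Two tiny synthetic partitions that both contain an engine with ID 1.
part1 = pd.DataFrame(
    {"ID": [1, 1, 1], "Cycle": [1, 2, 3], "SensorMeasure2": [10.0, 20.0, 30.0]}
)
part2 = pd.DataFrame(
    {"ID": [1, 1], "Cycle": [1, 2], "SensorMeasure2": [100.0, 200.0]}
)
combined = concat_partitions([part1, part2])
rolled = add_rolling_mean(combined, ["SensorMeasure2"], window=2)
print(rolled)
```

Grouping by both keys prevents a window from spanning two different engines, and the trailing window avoids leaking future cycles into a row's features.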

Evaluation Metrics

Given the nature of predictive maintenance, assessing the frequency of False Negatives (predicting "Good" when "Warning" is imminent) is critical. Use standard multi-class metrics including:

  • Overall Accuracy
  • F1-score (macro and micro)
  • Precision and Recall profiles
  • Confusion Matrix visualization
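A minimal sketch of these metrics with scikit-learn follows. The label arrays are synthetic and only illustrate the calls; in practice `y_true` and `y_pred` would come from the test split and a trained model. The "Warning predicted as Good" rate is one possible way to quantify the critical false negatives mentioned above, not a metric defined by the repository.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, recall_score

# Synthetic labels for illustration only (0 = Good, 1 = Moderate, 2 = Warning).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 0, 2, 1])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
macro_f1 = f1_score(y_true, y_pred, average="macro")
micro_f1 = f1_score(y_true, y_pred, average="micro")
warning_recall = recall_score(y_true, y_pred, labels=[2], average=None)[0]

# Share of true Warning samples predicted as Good: the costliest miss.
warning_as_good = cm[2, 0] / cm[2].sum()

print(cm)
print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
print(f"Warning recall = {warning_recall:.2f}, Warning->Good rate = {warning_as_good:.2f}")
```

Macro F1 weights the three classes equally, which matters when Warning rows are the minority. Micro F1 equals overall accuracy in a single-label multiclass setting.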
