Semi-supervised classification of equipment breakdown events using a small labeled set to guide label propagation across a large unlabeled dataset.
- 1 600 breakdown events recorded across 20 sensors
- Only 40 events are labeled (3 failure types); the remaining 1 560 are unlabeled
- Goal: classify all events into Failure 1, Failure 2, or Failure 3
With only 40 labeled points, training a model on all 20 sensors risks noise dominating signal. For each sensor we compute the point-biserial correlation against each failure-type indicator (one binary column per class). Sensors whose maximum absolute correlation across all three classes exceeds a threshold of 0.30 are kept.
This reduces the feature space to four sensors (Sensor 0, Sensor 2, Sensor 8, Sensor 9) that carry the clearest discriminative signal. The threshold was chosen via a sweep over the labeled data: below 0.30, noisy sensors are included and the leave-one-out (LOO) F1 drops.
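A minimal sketch of this filter, assuming a DataFrame of sensor readings and a label Series that is NaN for unlabeled events (the names `df`, `labels`, and `select_sensors` are illustrative, not the repo's actual identifiers):

```python
import pandas as pd
from scipy.stats import pointbiserialr

def select_sensors(df: pd.DataFrame, labels: pd.Series, threshold: float = 0.30) -> list[str]:
    """Keep sensors whose max |point-biserial r| against any class indicator exceeds the threshold."""
    mask = labels.notna()              # only the 40 labeled events enter the correlation
    y = labels[mask]
    kept = []
    for sensor in df.columns:
        x = df.loc[mask, sensor]
        best = 0.0
        for cls in y.unique():
            # one binary indicator per failure type vs. the continuous sensor reading
            r, _ = pointbiserialr((y == cls).astype(int), x)
            best = max(best, abs(r))
        if best > threshold:
            kept.append(sensor)
    return kept
```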
LabelSpreading (scikit-learn) propagates the 40 known labels across the full 1 600-point graph using an RBF kernel. The graph connects every pair of points with an edge weighted by feature similarity; labels flow iteratively from labeled nodes to unlabeled ones.
Key hyperparameters (tuned via LOO cross-validation):
| Parameter | Value | Meaning |
|---|---|---|
| kernel | rbf | Similarity metric between points |
| gamma | 1.0 | RBF bandwidth |
| alpha | 0.4 | Label clamping strength (0 = hard labels, 1 = free propagation) |
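A minimal sketch of the setup with these tuned values; scikit-learn's semi-supervised estimators mark unlabeled points with `-1`, and the random data here is a stand-in for the real scaled sensor matrix:

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X_selected = rng.normal(size=(1600, 4))    # stand-in for the 4 scaled sensors
y_semi = np.full(1600, -1)                 # -1 marks the 1 560 unlabeled events
y_semi[:40] = rng.integers(0, 3, size=40)  # 0/1/2 encode the three failure types

model = LabelSpreading(kernel="rbf", gamma=1.0, alpha=0.4)
model.fit(X_selected, y_semi)
```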
Standard train/test splits are not viable with only 40 labeled points. Instead, we use LOO-CV: for each of the 40 labeled events we temporarily hide its label, refit the model on the remaining 39 labeled + 1 560 unlabeled points, and record the prediction. This gives an honest estimate of generalization without wasting any labeled data.
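A sketch of that loop, reusing the `X_selected` / `y_semi` convention from the previous snippet (assumed names); `transduction_` holds the model's label for every training point, so indexing it at the hidden position yields the held-out prediction:

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.semi_supervised import LabelSpreading

labeled_idx = np.flatnonzero(y_semi != -1)     # indices of the 40 labeled events
preds = np.empty(labeled_idx.size, dtype=int)

for i, idx in enumerate(labeled_idx):
    y_fold = y_semi.copy()
    y_fold[idx] = -1                           # hide exactly one label
    clf = LabelSpreading(kernel="rbf", gamma=1.0, alpha=0.4)
    clf.fit(X_selected, y_fold)
    preds[i] = clf.transduction_[idx]          # prediction for the held-out event

print(classification_report(y_semi[labeled_idx], preds))
```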
LOO results (macro F1 = 0.686):
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Failure 1 | 1.00 | 0.60 | 0.75 | 10 |
| Failure 2 | 0.80 | 0.40 | 0.53 | 10 |
| Failure 3 | 0.66 | 0.95 | 0.78 | 20 |
| Macro | 0.82 | 0.65 | 0.69 | 40 |
Failure 2 has the weakest recall — it is the hardest class to distinguish from the others given the available labeled examples.
After evaluation, the model is retrained on all 1 600 points (40 labeled + 1 560 unlabeled) to propagate labels across the full dataset.
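A sketch of that final step, again reusing `X_selected` and `y_semi` from the earlier snippets (assumed names):

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

final = LabelSpreading(kernel="rbf", gamma=1.0, alpha=0.4)
final.fit(X_selected, y_semi)            # 40 labeled + 1 560 unlabeled events
propagated = final.transduction_         # one label per event, all 1 600
print(np.bincount(propagated))           # per-class counts, as tabulated below
```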
Propagated distribution:
| Label | Count |
|---|---|
| Failure 1 | 447 |
| Failure 2 | 442 |
| Failure 3 | 711 |
```
sensor_failure_analysis/
├── data/
│   └── data_sensors.csv           # 1 600 events × 20 sensors
├── pipelines/
│   ├── config.py                  # paths, thresholds, hyperparameters
│   └── main_pipeline.py           # end-to-end orchestration
├── src/
│   ├── data_processing/
│   │   ├── loader.py              # CSV loading, label parsing, scaling
│   │   └── feature_selection.py   # correlation matrix, sensor selection
│   └── model_training/
│       ├── train.py               # LabelSpreading fit + LOO-CV
│       └── evaluate.py            # metrics, MLflow logging
└── experiments/                   # exploratory notebooks and plots
```
Install dependencies:

```bash
poetry install
```

Run the pipeline:

```bash
poetry run python pipelines/main_pipeline.py
```

View experiment results in MLflow:

```bash
poetry run mlflow ui --backend-store-uri sqlite:///mlflow.db
```

Then open http://localhost:5000.
| Decision | Alternative considered | Reason chosen |
|---|---|---|
| Label Spreading | k-means / PCA clustering | Unsupervised methods ignore the 40 labels; semi-supervised approach directly uses them |
| Point-biserial correlation for feature selection | PCA, mutual information | Interpretable, stable with small labeled sets, directly measures per-class discriminability |
| LOO-CV | k-fold CV | With only 40 labeled points, LOO maximises training data per fold and gives the most reliable estimate |
| StandardScaler on selected sensors only | Scale all 20 sensors | Avoids scaling noise sensors that were discarded |