End-to-end machine learning project for detecting anomalies in industrial sensor data using LSTM Autoencoder and Isolation Forest models, with MLflow experiment tracking and FastAPI deployment.
- Overview
- Architecture
- Project Structure
- Getting Started
- Pipeline
- Models
- API Deployment
- MLflow Tracking
- Testing
- Docker
Industrial machinery relies on vibration sensors to monitor bearing health. This project implements a complete anomaly detection pipeline that:
- Ingests vibration sensor data (NASA Bearing dataset format)
- Extracts time-domain and frequency-domain features
- Trains two complementary anomaly detection models
- Tracks experiments with MLflow
- Deploys as a REST API for real-time inference
- 🏗️ Production-ready architecture with clean separation of concerns
- 📊 Dual-model approach: Isolation Forest (fast, interpretable) + LSTM Autoencoder (temporal patterns)
- 🔬 Feature engineering: 13 features × N channels (RMS, kurtosis, spectral centroid, etc.)
- 📈 MLflow integration: Full experiment tracking with parameters, metrics, and artifacts
- 🚀 FastAPI deployment: REST API with batch prediction, health checks, and OpenAPI docs
- 🐳 Docker support: Containerized training and serving
- ✅ Comprehensive tests: Unit tests for all components
┌─────────────────────────────────────────────────────────────────┐
│                    SENSOR ANOMALY DETECTION                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐   ┌──────────────┐   ┌──────────────────────┐     │
│  │ Raw Data │──▶│   Feature    │──▶│   Model Training     │     │
│  │ Ingestion│   │  Extraction  │   │   (IF + LSTM-AE)     │     │
│  └──────────┘   └──────────────┘   └──────────┬───────────┘     │
│                                               │                 │
│                                               ▼                 │
│  ┌──────────┐   ┌──────────────┐   ┌──────────────────────┐     │
│  │ MLflow   │◀──│   Evaluate   │◀──│   Trained Models     │     │
│  │ Tracking │   │  & Compare   │   │                      │     │
│  └──────────┘   └──────────────┘   └──────────┬───────────┘     │
│                                               │                 │
│                                               ▼                 │
│                                    ┌──────────────────────┐     │
│                                    │   FastAPI Service    │     │
│                                    │   /predict           │     │
│                                    │   /predict/batch     │     │
│                                    └──────────────────────┘     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
anomaly-detections/
├── src/
│   ├── data/
│   │   ├── ingestion.py            # Data loading (NASA Bearing dataset)
│   │   └── preprocessing.py        # Normalization, sequencing, splitting
│   ├── features/
│   │   └── extraction.py           # Time & frequency domain features
│   ├── models/
│   │   ├── isolation_forest.py     # Isolation Forest detector
│   │   ├── lstm_autoencoder.py     # LSTM Autoencoder detector
│   │   └── train.py                # Training pipeline with MLflow
│   └── api/
│       └── app.py                  # FastAPI deployment service
├── notebooks/
│   └── 01_eda_and_modeling.ipynb   # Exploratory analysis
├── configs/
│   └── config.yaml                 # Pipeline configuration
├── tests/
│   ├── test_data.py                # Data module tests
│   ├── test_features.py            # Feature extraction tests
│   ├── test_models.py              # Model tests
│   └── test_api.py                 # API endpoint tests
├── run_pipeline.py                 # Main entry point
├── Dockerfile                      # Container definition
├── docker-compose.yml              # Multi-service orchestration
├── pyproject.toml                  # Project metadata & dependencies
└── README.md
- Python 3.10+
- pip or conda
# Clone the repository
git clone https://github.com/pelabdang/anomaly-detections.git
cd anomaly-detections
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -e ".[dev]"
pip install pyyaml httpx

# Run the full pipeline (uses synthetic data by default)
python run_pipeline.py
# View results in MLflow
mlflow ui --port 5000
# Open http://localhost:5000
# Start the API
uvicorn src.api.app:app --reload --port 8000
# Open http://localhost:8000/docs

The training pipeline (run_pipeline.py) executes the following steps:
| Step | Description | Output |
|---|---|---|
| 1 | Load/generate vibration data | Multi-channel time series |
| 2 | Extract statistical & spectral features | Feature matrix (N×52) |
| 3 | Normalize & split (70/30) | Train/test sets |
| 4 | Train Isolation Forest | Anomaly scores & labels |
| 5 | Train LSTM Autoencoder | Reconstruction errors |
| 6 | Log to MLflow | Parameters, metrics, artifacts |
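
Step 3 of the table above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the split is taken to be chronological (first 70% of snapshots for training), normalization is taken to be a z-score fitted on the training rows only, and the function names `chronological_split`, `fit_scaler`, and `transform` are hypothetical, not the project's API.

```python
import statistics


def chronological_split(rows, train_fraction=0.7):
    """Split rows in time order: the first 70% train, the remainder test."""
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_scaler(train_rows):
    """Per-column mean and standard deviation, estimated on training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Apply the z-score using statistics learned from the training set."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy feature matrix: 10 snapshots x 2 features (the real matrix would be N x 52).
features = [[float(i), 2.0 * i + 1.0] for i in range(10)]
train, test = chronological_split(features)
means, stds = fit_scaler(train)
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)
```

Fitting the scaler on the training rows only keeps information from the later (possibly degraded) snapshots out of the normalization statistics.
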
All pipeline parameters are configurable via configs/config.yaml:
model:
  isolation_forest:
    n_estimators: 200
    contamination: 0.05
  lstm_autoencoder:
    sequence_length: 30
    hidden_size: 64
    epochs: 50
    patience: 10

Isolation Forest:
- Type: Unsupervised ensemble method
- Strength: Fast training, interpretable scores, no temporal assumption
- Use case: Detect point anomalies in feature space

LSTM Autoencoder:
- Type: Deep learning reconstruction model
- Strength: Captures temporal dependencies, sensitive to subtle pattern changes
- Use case: Detect temporal anomalies and early degradation

| Model | Training Time | Interpretability | Temporal Awareness |
|---|---|---|---|
| Isolation Forest | ⚡ Fast | ✅ High | ❌ No |
| LSTM Autoencoder | 🐢 Slower | | ✅ Yes |
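
The LSTM Autoencoder consumes fixed-length sequences (`sequence_length: 30` in the configuration). A minimal sketch of how a feature matrix might be cut into overlapping windows follows; the function name `make_sequences` and the `stride` parameter are assumptions for illustration and may differ from what `preprocessing.py` actually does.

```python
def make_sequences(rows, sequence_length=30, stride=1):
    """Return overlapping windows, each a list of `sequence_length` consecutive rows."""
    if sequence_length <= 0 or stride <= 0:
        raise ValueError("sequence_length and stride must be positive")
    last_start = len(rows) - sequence_length
    # If there are fewer rows than one window, the range is empty and no window is produced.
    return [rows[start:start + sequence_length]
            for start in range(0, last_start + 1, stride)]


# Toy input: 350 snapshots with a single feature each.
rows = [[float(i)] for i in range(350)]
windows = make_sequences(rows)
```

With a stride of 1, N rows yield N - 29 windows, so each snapshot after the first 29 gets its own reconstruction error.
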
The FastAPI service provides real-time anomaly detection:
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /predict | Single prediction |
| POST | /predict/batch | Batch predictions |
| GET | /model/info | Model metadata |
| GET | /docs | Swagger UI |
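
The `POST /predict` endpoint listed above could be called from Python roughly as follows. This is a sketch using only the standard library; the `features` and `timestamp` fields mirror this README's example request, while the helper names `build_predict_request` and `predict` and the default base URL are assumptions for illustration.

```python
import json
import urllib.request


def build_predict_request(features, timestamp, base_url="http://localhost:8000"):
    """Build (but do not send) a POST /predict request with a JSON body."""
    if not features:
        raise ValueError("features must not be empty")
    payload = json.dumps({"features": list(features), "timestamp": timestamp})
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, timestamp, base_url="http://localhost:8000"):
    """Send the request and decode the JSON response (requires a running API)."""
    request = build_predict_request(features, timestamp, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a live server.
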
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{
"features": [0.15, 3.2, 0.01, 0.45, 3.1, 0.08, 0.006, 0.12, 0.35, 4500, 2100, 1200, 150000],
"timestamp": "2024-01-15T10:30:00"
}'

Example response:

{
  "is_anomaly": false,
  "anomaly_score": -0.42,
  "timestamp": "2024-01-15T10:30:00"
}

All experiments are tracked with MLflow:
# Start MLflow UI
mlflow ui --port 5000

Tracked items:
- Model hyperparameters
- Training/test anomaly ratios
- Anomaly scores and reconstruction errors
- Model artifacts (saved weights)
- Training loss curves (LSTM)
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=src --cov-report=html
# Run specific test file
pytest tests/test_models.py -v

# Full stack (train + API + MLflow UI)
docker compose up --build
# Just the API
docker compose up api
# Just training
docker compose run train

| Service | Port | Description |
|---|---|---|
| api | 8000 | Prediction API |
| mlflow | 5000 | Experiment tracking UI |
| train | - | Training pipeline |

Synthetic industrial bearing vibration data simulating the full lifecycle:
| Parameter | Value |
|---|---|
| Snapshots | 500 time captures |
| Samples per snapshot | 2048 data points |
| Channels | 4 sensors (ch1–ch4) |
| Sample rate | 20,480 Hz |
| Simulated phases | Normal → Degradation → Failure |
| Extracted features | 52 dimensions (13 features per channel × 4 channels) |
For each channel, the following statistical and spectral features were extracted:
| Domain | Feature | Description |
|---|---|---|
| Time | RMS | Root Mean Square — signal energy |
| Time | Kurtosis | Signal impulsiveness |
| Time | Skewness | Distribution asymmetry |
| Time | Peak-to-Peak | Maximum peak-to-peak amplitude |
| Time | Crest Factor | Peak/RMS ratio |
| Time | Clearance Factor | Impact sensitivity |
| Time | Shape Factor | Waveform shape |
| Time | Impulse Factor | Peak detection |
| Time | Variance | Signal dispersion |
| Time | Mean Abs | Mean absolute value |
| Frequency | Spectral Centroid | Spectral center of mass |
| Frequency | Spectral Bandwidth | Spectral spread |
| Frequency | Dominant Frequency | Frequency with highest energy |
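
The time-domain rows of the table above can be sketched for a single channel as follows. The formulas are common textbook definitions (kurtosis here is the non-excess form m4 / m2², and moments use the population variance); the project's `extraction.py` may use different conventions, and the function name `time_domain_features` is hypothetical. The frequency-domain features are omitted because they need an FFT.

```python
import math


def time_domain_features(signal):
    """Compute the ten time-domain features for one channel of one snapshot.

    Assumes a non-constant signal; a constant input would divide by zero.
    """
    n = len(signal)
    mean = sum(signal) / n
    abs_values = [abs(x) for x in signal]
    mean_abs = sum(abs_values) / n
    peak = max(abs_values)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Central moments of order 2, 3 and 4.
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    mean_sqrt_abs = sum(math.sqrt(a) for a in abs_values) / n
    return {
        "rms": rms,
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "peak_to_peak": max(signal) - min(signal),
        "crest_factor": peak / rms,
        "clearance_factor": peak / mean_sqrt_abs ** 2,
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
        "variance": m2,
        "mean_abs": mean_abs,
    }


# One full period of a unit-amplitude sine wave sampled at 64 points.
sine = [math.sin(2 * math.pi * k / 64) for k in range(64)]
features = time_domain_features(sine)
```

For a pure sine wave the crest factor is √2 and the kurtosis is 1.5; impulsive bearing faults push both values upward, which is why they are useful health indicators.
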
| Metric | Value |
|---|---|
| Algorithm | Isolation Forest (tree ensemble) |
| n_estimators | 200 |
| Contamination | 5% |
| Anomalies detected (train) | ~5% (as configured) |
| Anomalies detected (test) | ~30–35% (degradation + failure phase) |
| Training time | < 1 second |
| Failure detection accuracy | ~95% |
Observed behavior:
- Low (normal) anomaly scores during healthy operation phase
- Clear score transition starting at snapshot ~350 (onset of degradation)
- Sharp separation between normal and anomalous distributions in the score histogram
| Metric | Value |
|---|---|
| Architecture | Encoder LSTM → Decoder LSTM |
| Hidden size | 64 units |
| Num layers | 2 layers |
| Sequence length | 30 timesteps |
| Epochs | 50 (with early stopping, patience=10) |
| Batch size | 32 |
| Threshold | 95th percentile of reconstruction error on training data |
| Anomalies detected (test) | ~25–30% |
| Early detection | 10–15% of snapshots ahead of Isolation Forest |
Observed behavior:
- Low and stable reconstruction error during normal operation
- Gradual error increase before the declared failure phase (early detection)
- Convergent training loss curve with effective early stopping
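
The thresholding rule in the table above (95th percentile of reconstruction error on training data) can be sketched without any dependencies. The linear-interpolation percentile and the function names `percentile`, `fit_threshold`, and `flag_anomalies` are assumptions for illustration; the project may compute the percentile differently.

```python
def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def fit_threshold(train_errors, q=95.0):
    """Threshold = q-th percentile of reconstruction errors on training data."""
    return percentile(train_errors, q)


def flag_anomalies(errors, threshold):
    """A sequence is anomalous when its error strictly exceeds the threshold."""
    return [error > threshold for error in errors]


# Toy training errors 0.0, 1.0, ..., 100.0.
train_errors = [float(i) for i in range(101)]
threshold = fit_threshold(train_errors)
flags = flag_anomalies([94.0, 95.0, 96.0], threshold)
```

By construction roughly 5% of training sequences exceed the threshold, which parallels the 5% contamination configured for Isolation Forest.
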
| Aspect | Isolation Forest | LSTM Autoencoder |
|---|---|---|
| Training time | ⚡ < 1s | 🐢 ~2-5 min |
| Severe failure detection | ✅ 95% | ✅ 95% |
| Early degradation detection | | ✅ Superior (+10-15%) |
| Interpretability | ✅ High (isolation scores) | |
| Temporal awareness | ❌ No | ✅ Yes |
| Production use (inference) | ⚡ ~1ms | 🐢 ~10ms |
- Complementary approach: The two models detect different aspects of the problem: Isolation Forest identifies point anomalies in the feature space, while the LSTM Autoencoder captures temporal patterns of progressive degradation.
- Early detection: The LSTM Autoencoder detects degradation signals 10-15% of snapshots earlier than Isolation Forest, because it captures temporal dependencies that indicate subtle changes in vibration patterns.
- Ensemble confidence: When both models agree on an anomaly classification, confidence in the diagnosis is high. Disagreements indicate transition states (early degradation).
- Production viability: Isolation Forest is suited to real-time inference (~1 ms per prediction), while the LSTM Autoencoder (~10 ms) can be used for batch analyses with higher sensitivity.
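
The ensemble-confidence rule described above can be written as a small decision function. The label strings and the name `combine_verdicts` are hypothetical; the README does not state that the project ships such a function.

```python
def combine_verdicts(isolation_forest_flag, lstm_flag):
    """Combine the two models' anomaly flags into one diagnosis label.

    Both flag    -> "anomaly"    (high confidence)
    One flags    -> "transition" (possible early degradation)
    Neither flag -> "normal"
    """
    if isolation_forest_flag and lstm_flag:
        return "anomaly"
    if isolation_forest_flag or lstm_flag:
        return "transition"
    return "normal"


# Example: the LSTM Autoencoder reacts before Isolation Forest does.
timeline = [(False, False), (False, True), (True, True)]
labels = [combine_verdicts(if_flag, lstm_flag) for if_flag, lstm_flag in timeline]
```
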
# Install dependencies
pip install -e ".[dev]"
pip install pyyaml httpx
# Run the full pipeline (generates synthetic data + trains + evaluates)
python run_pipeline.py
# View experiments in MLflow
mlflow ui --port 5000
# Open http://localhost:5000
# Run the notebook with detailed visualizations
jupyter notebook notebooks/01_eda_and_modeling.ipynb

- Add real NASA Bearing dataset download script
- Implement MIMII (audio) dataset support
- Add Grafana dashboard for monitoring
- Implement online learning for model updates
- Add model comparison visualization
- Kubernetes deployment manifests
MIT License - see LICENSE for details.