A privacy-by-design federated learning framework for anomaly detection in smart buildings using stacked Long Short-Term Memory (LSTM) networks. This repository implements FSLSTM, a federated stacked LSTM model that lets IoT sensors collaboratively train a shared anomaly detection model while preserving data privacy through secure multi-party computation.
Keywords: federated learning, anomaly detection, smart buildings, IoT sensors, LSTM, privacy preservation, machine learning, deep learning
Our framework targets smart building infrastructures equipped with diverse IoT sensor networks, including:
- Lighting Control Systems - Smart occupancy-based lighting automation
- HVAC Systems - Intelligent heating, ventilation, and air conditioning control
- Security Cameras - Building surveillance and access monitoring
- Fire Suppression - Real-time fire detection and suppression systems
- Water Management - Leak detection and water usage optimization
- Building Access Control - Smart entry and security management
Our federated stacked LSTM approach outperforms both centralized and federated baselines on every reported metric:
| Model | Precision | Recall | F1-Score | Balanced Accuracy | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|
| FSLSTM (Ours) | 0.89 | 0.79 | 0.87 | 0.90 | 0.162 | 0.19 | 0.435 |
| FGRU | 0.84 | 0.66 | 0.59 | 0.80 | 0.211 | 0.29 | 0.538 |
| FLR | 0.65 | 0.71 | 0.70 | 0.69 | 0.339 | 0.34 | 0.583 |
| LSTM | 0.66 | 0.61 | 0.58 | 0.71 | 0.243 | 0.33 | 0.574 |
| LR | 0.57 | 0.60 | 0.52 | 0.72 | 0.341 | 0.48 | 0.692 |
Our evaluation encompasses 180 IoT sensors across five critical building systems:
- Lighting Systems: 86 sensors (47.8%)
- Occupancy Detection: 46 sensors (25.6%)
- HVAC Thermostats: 23 sensors (12.8%)
- Water Leakage Detection: 16 sensors (8.9%)
- Building Access Control: 9 sensors (5.0%)
Key Performance Highlights:
- AUC Score: 0.90 - Superior classification performance
- 2x Faster Convergence - Compared to centralized LSTM training
- Privacy-Preserving - No raw sensor data leaves local devices
- 90% Balanced Accuracy - Robust performance across imbalanced datasets
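Balanced accuracy is the unweighted mean of per-class recall, so a detector cannot score well simply by predicting the majority "normal" class. The stdlib sketch below implements that standard definition; `balanced_accuracy` is an illustrative helper, not part of the `fslstm` package (scikit-learn provides the same metric as `sklearn.metrics.balanced_accuracy_score`).

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, robust to class imbalance."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# 8 normal readings (0) and 2 anomalies (1): always predicting "normal"
# gives 80% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```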
Detection rates for collective and contextual anomalies:

| Method | Correct (%) | False (%) |
|---|---|---|
| FSLSTM | 88 | 9 |
| FGRU | 74 | 12 |
| FLR | 65 | 21 |
| LSTM | 66 | 33 |
| LR | 56 | 54 |
FSLSTM trains efficiently:
- Stable Convergence: Reaches its best performance in ~20 epochs
- Smooth Loss Curves: Less fluctuation than centralized approaches
- Fast Training: 2x faster than centralized LSTM on the same datasets
- Consistent Performance: Reliable convergence across multiple runs
Scalability:
- Linear Scalability: Training time grows roughly linearly with the number of sensors
- FSLSTM Advantage: Consistently outperforms FGRU and centralized LSTM
- Best Efficiency: Achieved with 160-200 sensors
- Practical Deployment: Suitable for large-scale IoT deployments
Privacy-by-Design Implementation:
- Local Training: Each sensor trains on its private data locally
- Secure Aggregation: Only model parameters are shared, over encrypted channels
- Pattern Recognition: The global model learns from patterns distributed across sensors
- Anomaly Detection: Real-time classification with threshold determination
- BAS Integration: Seamless integration with Building Automation Systems
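How the anomaly threshold is determined is not specified here. One common approach, sketched below as an assumption rather than the FSLSTM procedure, is to sweep candidate thresholds over the model's sigmoid outputs on a validation set and keep the one with the highest F1 score. The helpers `f1_at_threshold` and `select_threshold` and the toy data are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when scores >= threshold are flagged as anomalies (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_threshold(scores, labels):
    """Return the observed score that, used as threshold, maximizes F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Toy validation-set sigmoid outputs and ground-truth anomaly labels.
val_scores = [0.05, 0.20, 0.35, 0.40, 0.62, 0.70, 0.91]
val_labels = [0, 0, 0, 1, 1, 1, 1]
threshold = select_threshold(val_scores, val_labels)
print(threshold)  # 0.4

# Apply the selected threshold to new predictions.
flags = [s >= threshold for s in [0.1, 0.45, 0.8]]
```

Selecting on a validation split keeps the test set untouched; a fixed cut-off such as 0.5 is simpler but ignores class imbalance.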
Communication Overhead Reduction:
- FSLSTM: ~80 MB communication cost (83% reduction vs. centralized LSTM)
- Federated Advantage: Much lower bandwidth requirements than centralized training
- Scalable Design: Cost remains manageable as the number of clients grows
- Privacy Benefit: No raw data transmission required
Regression Performance:
- 90% Prediction Accuracy for building energy consumption
- Real-time Monitoring: 600-minute prediction windows
- Smart Optimization: Enables proactive energy management
- Pattern Recognition: Captures complex temporal dependencies
Anomaly Detection Capabilities:
- Real-time Detection: Immediate identification of anomalous patterns
- Multi-sensor Monitoring: Simultaneous tracking across sensor types
- Peak Detection: Automatic identification of unusual energy spikes
- Contextual Analysis: Correlation of temperature and occupancy readings
- Smart Alerts: Proactive maintenance and fault prevention
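The peak detection method is not described further. A simple, widely used option, shown here as a hypothetical sketch, is a rolling z-score: a reading is flagged when it lies more than `k` standard deviations from the mean of the preceding `window` readings. The window length, `k`, and the toy energy series are illustrative assumptions.

```python
from statistics import mean, pstdev


def detect_spikes(values, window=4, k=3.0):
    """Return indices i where values[i] deviates from the mean of the
    preceding `window` values by more than k population standard deviations."""
    spikes = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Skip flat histories (sigma == 0) to avoid flagging on zero variance.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            spikes.append(i)
    return spikes


# Toy 15-minute energy readings (kWh) with one spike at index 6.
energy = [1.20, 1.25, 1.22, 1.28, 1.24, 1.26, 3.90, 1.23, 1.27]
print(detect_spikes(energy))  # [6]
```

Note that a spike inflates the standard deviation of the windows that follow it, so closely spaced spikes can mask each other; a robust statistic such as the median absolute deviation reduces that effect.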
Our evaluation uses three real-world datasets from General Electric Current smart building IoT production systems:
- Sensor Event Log Dataset: 1M+ event logs from 180 sensors over 4 months
- Energy Usage Dataset: Electricity consumption data aggregated every 15 minutes
- Weather API Dataset: Environmental data (temperature, humidity, pressure, solar radiation)
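Because the energy dataset is aggregated every 15 minutes, finer-grained readings must be bucketed to that interval before they can be aligned with it. The sketch below floors each timestamp to its 15-minute slot and sums the readings; summing (rather than averaging) and the helper names are assumptions, and the timestamp format follows the `energy_usage.csv` sample shown later in this README.

```python
from collections import defaultdict
from datetime import datetime


def to_15min_bucket(ts):
    """Floor a datetime to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def aggregate_energy(rows):
    """Sum (timestamp_string, kWh) readings into 15-minute buckets, sorted by time."""
    totals = defaultdict(float)
    for ts_str, kwh in rows:
        ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S")
        totals[to_15min_bucket(ts)] += kwh
    return dict(sorted(totals.items()))


readings = [
    ("2019-05-01 08:00:00", 1.25),
    ("2019-05-01 08:01:00", 2.80),
    ("2019-05-01 08:16:00", 1.10),
]
for bucket, kwh in aggregate_energy(readings).items():
    print(bucket, round(kwh, 2))
# 2019-05-01 08:00:00 4.05
# 2019-05-01 08:15:00 1.1
```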
Data Processing Pipeline:
- Temporal Window: 600-minute sequences (10-hour windows)
- Sequence Length: 60 timesteps (1-hour LSTM input sequences)
- Data Split: 80% training, 10% validation, 10% testing
- Multi-Task Support: Classification (anomaly detection) + Regression (energy prediction)
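The windowing and split steps above can be sketched in plain Python. This assumes one sample per minute, a 60-minute stride (as in the `SensorDataProcessor` example further down), and a chronological rather than shuffled split, so validation and test windows come after the training windows in time. The function names are hypothetical, not the `fslstm` API.

```python
def sliding_windows(series, window_size, stride):
    """Consecutive windows of `window_size` samples whose starts are `stride` apart."""
    return [
        series[start:start + window_size]
        for start in range(0, len(series) - window_size + 1, stride)
    ]


def chronological_split(samples, train=0.8, val=0.1):
    """Split ordered samples into train/val/test without shuffling, so that
    validation and test windows start after the training windows."""
    n_train = int(len(samples) * train)
    n_val = int(len(samples) * val)
    return (
        samples[:n_train],
        samples[n_train:n_train + n_val],
        samples[n_train + n_val:],
    )


# One day of per-minute readings (toy values), 600-minute windows, 60-minute stride.
minutes = list(range(24 * 60))
windows = sliding_windows(minutes, window_size=600, stride=60)
train_set, val_set, test_set = chronological_split(windows)
print(len(windows), len(train_set), len(val_set), len(test_set))  # 15 12 1 2
```

Because the windows overlap, adjacent splits still share some minutes; leaving a gap of one window length between splits removes that leakage at the cost of a few samples.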
Stacked LSTM Configuration:
- 3 LSTM Layers: Hierarchical feature learning
- 128 Hidden Units: Per layer (configurable)
- Fully Connected: 100-unit dense layer
- Activation Functions: Sigmoid (classification) / Linear (regression)
- Dropout Regularization: 20% rate to prevent overfitting
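A minimal PyTorch sketch of the configuration listed above. It assumes a single output unit and a ReLU on the 100-unit dense layer (that activation is not named here); `StackedLSTM` is an illustrative class, not necessarily the implementation in `fslstm/models/fslstm.py`.

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Stacked LSTM -> 100-unit dense layer -> sigmoid (classification) or linear (regression) output."""

    def __init__(self, input_size, hidden_size=128, num_layers=3,
                 fc_size=100, dropout=0.2, task="classification"):
        super().__init__()
        # nn.LSTM applies `dropout` to the outputs of every layer except the last.
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.head = nn.Sequential(
            nn.Linear(hidden_size, fc_size),
            nn.ReLU(),  # assumption: dense-layer activation is not specified
            nn.Dropout(dropout),
            nn.Linear(fc_size, 1),
        )
        self.task = task

    def forward(self, x):
        # x: (batch, seq_len, input_size); use the hidden state of the last timestep.
        out, _ = self.lstm(x)
        y = self.head(out[:, -1, :])
        # Sigmoid for anomaly probabilities, identity for energy regression.
        return torch.sigmoid(y) if self.task == "classification" else y


model = StackedLSTM(input_size=4)  # 4 input features per timestep (toy value)
batch = torch.randn(8, 60, 4)      # 8 sequences of 60 timesteps
print(model(batch).shape)          # torch.Size([8, 1])
```

For classification the sigmoid output pairs with `nn.BCELoss`; the regression variant returns raw values suitable for `nn.MSELoss`.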
Federated Learning Process:
- Client Selection: Random sampling of 36 sensors per round (20% participation)
- Local Training: 5 epochs on private sensor data
- Secure Aggregation: Parameters are shared over encrypted channels and aggregated with FedAvg
- Global Update: Averaging weighted by each client's number of training samples
- Iterative Process: 50 communication rounds for convergence
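One communication round of the process above can be sketched as: sample a fixed number of clients uniformly at random, run a local update on each starting from the global parameters, and combine the results with FedAvg, weighting each client by its sample count. Parameters are flat lists of floats for brevity, the encryption layer of secure aggregation is omitted, and `toy_local_update` is a hypothetical stand-in for the 5 local training epochs.

```python
import random


def fedavg(client_updates):
    """FedAvg: average client parameters, weighting each by its sample count.

    client_updates: list of (num_samples, params), params being a list of floats.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [
        sum(n * params[i] for n, params in client_updates) / total
        for i in range(dim)
    ]


def federated_round(global_params, client_sizes, clients_per_round, local_update, rng):
    """Sample clients, run local training from the global model, aggregate."""
    selected = rng.sample(range(len(client_sizes)), clients_per_round)
    updates = [(client_sizes[c], local_update(c, list(global_params))) for c in selected]
    return fedavg(updates)


def toy_local_update(client_id, params):
    # Hypothetical stand-in for local training: move each parameter halfway toward 1.0.
    return [p + 0.5 * (1.0 - p) for p in params]


rng = random.Random(0)
client_sizes = [rng.randint(500, 5000) for _ in range(180)]  # toy samples per sensor
params = [0.0, 0.0]
for _ in range(50):  # 50 communication rounds, 36 of 180 clients each
    params = federated_round(params, client_sizes, 36, toy_local_update, rng)
print([round(p, 6) for p in params])  # [1.0, 1.0]
```

Since every toy client produces the same update, the demo only illustrates the mechanics; with real local training, clients holding more data pull the global model further toward their own update.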
Our federated approach outperforms the centralized and federated baselines on all evaluation metrics:
Classification Performance Improvements:
- +29 percentage points F1-Score improvement over centralized LSTM
- +19 percentage points Balanced Accuracy gain over centralized LSTM
- +10 percentage points Balanced Accuracy improvement over FGRU
- +18 points higher ROC AUC than centralized LSTM
Regression Performance Improvements:
- 33% lower MAE compared to centralized LSTM
- 42% lower MSE compared to centralized LSTM
- 24% lower RMSE than centralized LSTM (19% lower than FGRU)
- 90% accuracy on energy consumption prediction
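These deltas can be recomputed directly from the results table: percentage-point gains for bounded classification metrics and relative reductions for error metrics. The values are copied from that table; the helper names are illustrative.

```python
# Values copied from the results table (FSLSTM vs. centralized LSTM and FGRU).
fslstm = {"f1": 0.87, "bal_acc": 0.90, "mae": 0.162, "mse": 0.19, "rmse": 0.435}
lstm = {"f1": 0.58, "bal_acc": 0.71, "mae": 0.243, "mse": 0.33, "rmse": 0.574}
fgru = {"f1": 0.59, "bal_acc": 0.80, "mae": 0.211, "mse": 0.29, "rmse": 0.538}


def pp_gain(ours, baseline):
    """Absolute gain, in percentage points, for a metric in [0, 1]."""
    return round(100 * (ours - baseline))


def pct_reduction(ours, baseline):
    """Relative reduction (%) of an error metric where lower is better."""
    return round(100 * (baseline - ours) / baseline)


print(pp_gain(fslstm["f1"], lstm["f1"]))              # 29
print(pp_gain(fslstm["bal_acc"], lstm["bal_acc"]))    # 19
print(pp_gain(fslstm["bal_acc"], fgru["bal_acc"]))    # 10
print(pct_reduction(fslstm["mae"], lstm["mae"]))      # 33
print(pct_reduction(fslstm["mse"], lstm["mse"]))      # 42
print(pct_reduction(fslstm["rmse"], lstm["rmse"]))    # 24
print(pct_reduction(fslstm["rmse"], fgru["rmse"]))    # 19
```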
Convergence Speed Comparison:
- FSLSTM: Converges in ~20 epochs (2 hours)
- Centralized LSTM: Requires ~50 epochs (6 hours)
- FGRU: Similar federated efficiency but lower accuracy
- Communication Rounds: 50 rounds optimal for stable performance
- Privacy-Preserving: Federated learning approach that keeps sensor data local
- Multi-Task Learning: Simultaneous learning across multiple sensor types
- Fast Convergence: 2x faster training convergence compared to centralized LSTM
- Comprehensive Evaluation: Support for both classification and regression tasks
- Real-World Datasets: Evaluated on IoT production systems from smart buildings
- Secure Aggregation: Built-in privacy protection mechanisms
- Python 3.7 or higher
- CUDA-compatible GPU (recommended for federated training)
git clone https://github.com/your-username/FSLSTM.git
cd FSLSTM
pip install -e .

pip install fslstm

pip install "torch>=1.7.0"
pip install "numpy>=1.19.0"
pip install "pandas>=1.2.0"
pip install "scikit-learn>=0.24.0"
pip install "matplotlib>=3.3.0"
pip install "seaborn>=0.11.0"
pip install "tqdm>=4.60.0"
pip install "syft>=0.5.0"
pip install "tensorboard>=2.4.0"

from fslstm import FSLSTMTrainer, DataLoader
from fslstm.config import Config
# Load configuration for smart building anomaly detection
config = Config.from_file("configs/smart_building.yaml")
# Prepare IoT sensor data for federated learning
data_loader = DataLoader(config)
train_data, test_data = data_loader.load_sensor_data()
# Initialize federated learning trainer
trainer = FSLSTMTrainer(config)
# Train the FSLSTM model using federated approach
trainer.fit(train_data)
# Evaluate anomaly detection performance
results = trainer.evaluate(test_data)
print(f"Balanced Accuracy: {results['balanced_accuracy']:.4f}")
print(f"F1 Score: {results['f1_score']:.4f}")

# Train FSLSTM model for smart building anomaly detection
python scripts/train.py --config configs/smart_building.yaml
# Evaluate trained federated learning model
python scripts/evaluate.py --model_path checkpoints/fslstm_best.pth --data_path data/test/
# Run complete federated learning pipeline
python scripts/run_pipeline.py --config configs/smart_building.yaml

sensor_data/
├── sensor_events.csv
├── energy_usage.csv
└── weather_api.csv
Sensor Events (sensor_events.csv):
timestamp,sensor_id,sensor_type,value,status,zone_id
2019-05-01 08:00:00,S001,occupancy,1,normal,Zone_A
2019-05-01 08:01:00,S002,temperature,22.5,normal,Zone_B

Energy Usage (energy_usage.csv):
timestamp,sensor_id,energy_consumption,appliance_type
2019-05-01 08:00:00,S001,1.25,LED_light
2019-05-01 08:01:00,S002,2.8,HVAC

from fslstm.data import SensorDataProcessor
processor = SensorDataProcessor(
    window_size=600,  # 10 hours, expressed in minutes
    stride=60,        # 1-hour stride between consecutive windows
    normalize=True
)
# Process raw smart building sensor data
processed_data = processor.process_sensor_logs("data/sensor_events.csv")

# Model Configuration for Federated LSTM
model:
  name: "FSLSTM"
  lstm_layers: 3
  hidden_size: 128
  dropout: 0.2
  fc_size: 100
# Federated Learning Configuration for IoT Sensors
federated:
  num_clients: 180
  clients_per_round: 36
  num_rounds: 50
  local_epochs: 5
  batch_size: 1024
# Training Configuration for Smart Building Anomaly Detection
training:
  learning_rate: 0.001
  optimizer: "adam"
  loss_function: "cross_entropy"  # or "mse" for regression
  device: "cuda"
# Data Configuration for IoT Sensor Networks
data:
  window_size: 600
  sequence_length: 60
  train_split: 0.8
  val_split: 0.1
  test_split: 0.1
# Sensor Configuration for Smart Buildings
sensors:
  categories: ["lights", "thermostat", "occupancy", "water_leakage", "building_access"]
  num_sensors: 180
# Privacy Configuration for Federated Learning
privacy:
  secure_aggregation: true
  differential_privacy: false

from fslstm.config import Config
config = Config()
config.model.lstm_layers = 3
config.model.hidden_size = 256
config.federated.num_clients = 100
config.training.learning_rate = 0.0005
# Save the customized configuration
config.save("my_config.yaml")

from fslstm import FSLSTMTrainer, FederatedDataLoader
# Initialize federated data loader for IoT sensors
fed_loader = FederatedDataLoader(
    data_path="data/sensor_events.csv",
    num_clients=180,
    client_split="sensor_type"  # Partition client data by sensor type
)
# Create federated datasets for smart building sensors
client_datasets = fed_loader.create_client_datasets()
# Initialize federated learning trainer
trainer = FSLSTMTrainer(config)
# Federated training for anomaly detection
trainer.federated_fit(
    client_datasets=client_datasets,
    num_rounds=50,
    clients_per_round=36
)

# For comparison with a centralized (non-federated) LSTM baseline
from fslstm.baselines import CentralizedLSTM
centralized_model = CentralizedLSTM(config)
centralized_model.fit(train_data)
results = centralized_model.evaluate(test_data)

# Enable logging and visualization for federated learning
from fslstm.utils import TrainingLogger
logger = TrainingLogger(log_dir="logs/fslstm_experiment")
trainer = FSLSTMTrainer(config, logger=logger)
# Training with monitoring for smart building anomaly detection
trainer.fit(train_data, validation_data=val_data)  # val_data: the held-out validation split
# View federated learning training curves
logger.plot_training_curves()
logger.plot_convergence_comparison()

from fslstm.evaluation import Evaluator
evaluator = Evaluator(config)
# Load trained federated learning model
model = trainer.load_model("checkpoints/fslstm_best.pth")
# Evaluate on smart building test data
results = evaluator.evaluate(
    model=model,
    test_data=test_data,
    metrics=["balanced_accuracy", "precision", "recall", "f1_score", "auc", "mae", "mse", "rmse"]
)
print("Anomaly Detection Classification Results:")
print(f" Balanced Accuracy: {results['balanced_accuracy']:.4f}")
print(f" Precision: {results['precision']:.4f}")
print(f" Recall: {results['recall']:.4f}")
print(f" F1-Score: {results['f1_score']:.4f}")
print("Energy Prediction Regression Results:")
print(f" MAE: {results['mae']:.4f}")
print(f" MSE: {results['mse']:.4f}")
print(f" RMSE: {results['rmse']:.4f}")

from fslstm.evaluation import AnomalyDetector
detector = AnomalyDetector(model, threshold=0.5)
# Detect anomalies in real-time IoT sensor data
anomalies = detector.detect_anomalies(sensor_stream)
# Evaluate collective and contextual anomalies in smart buildings
collective_results = detector.evaluate_collective_anomalies(test_data)
contextual_results = detector.evaluate_contextual_anomalies(test_data)

from fslstm.baselines import run_baseline_comparison
# Compare with baseline machine learning methods
baseline_results = run_baseline_comparison(
    data=test_data,
    methods=["LR", "LSTM", "FLR", "FGRU", "FSLSTM"],
    config=config
)
# Generate comparison plots for research evaluation
evaluator.plot_method_comparison(baseline_results)
evaluator.plot_roc_curves(baseline_results)

FSLSTM achieves the best results of all evaluated methods on smart building anomaly detection:
| Model | Precision | Recall | F1-Score | Balanced Accuracy | MAE | MSE | RMSE |
|---|---|---|---|---|---|---|---|
| LR | 0.57 | 0.60 | 0.52 | 0.72 | 0.341 | 0.48 | 0.692 |
| LSTM | 0.66 | 0.61 | 0.58 | 0.71 | 0.243 | 0.33 | 0.574 |
| FLR | 0.65 | 0.71 | 0.70 | 0.69 | 0.339 | 0.34 | 0.583 |
| FGRU | 0.84 | 0.66 | 0.59 | 0.80 | 0.211 | 0.29 | 0.538 |
| FSLSTM | 0.89 | 0.79 | 0.87 | 0.90 | 0.162 | 0.19 | 0.435 |
- Fast Convergence: 2x faster training compared to centralized LSTM
- Superior Performance: 90% balanced accuracy on sensor anomaly detection
- Privacy Preservation: Maintains data locality while achieving collaborative learning
- Communication Efficiency: Significant reduction in communication costs
- Multi-Task Learning: Effective learning across different sensor types
from fslstm.visualization import ResultVisualizer
visualizer = ResultVisualizer()
# Plot federated learning training convergence
visualizer.plot_convergence_comparison(trainer.history)
# Plot ROC curves for anomaly detection
visualizer.plot_roc_curves(results)
# Plot smart building energy consumption prediction
visualizer.plot_energy_prediction(predictions, ground_truth)
# Plot real-time anomaly detection timeline
visualizer.plot_anomaly_timeline(anomalies, timestamps)

FSLSTM/
├── fslstm/
│   ├── __init__.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── fslstm.py              # Main FSLSTM model
│   │   ├── lstm_layers.py         # LSTM layer implementations
│   │   └── federated_model.py     # Federated learning wrapper
│   ├── data/
│   │   ├── __init__.py
│   │   ├── data_loader.py         # Data loading utilities
│   │   ├── preprocessing.py       # Data preprocessing
│   │   └── federated_data.py      # Federated data distribution
│   ├── training/
│   │   ├── __init__.py
│   │   ├── trainer.py             # Main training logic
│   │   ├── federated_trainer.py   # Federated training
│   │   └── aggregation.py         # Federated aggregation algorithms
│   ├── evaluation/
│   │   ├── __init__.py
│   │   ├── evaluator.py           # Model evaluation
│   │   ├── metrics.py             # Evaluation metrics
│   │   └── anomaly_detection.py   # Anomaly detection evaluation
│   ├── baselines/
│   │   ├── __init__.py
│   │   ├── centralized_lstm.py    # Centralized LSTM baseline
│   │   ├── federated_lr.py        # Federated Logistic Regression
│   │   └── federated_gru.py       # Federated GRU
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── config.py              # Configuration management
│   │   ├── logger.py              # Logging utilities
│   │   └── privacy.py             # Privacy mechanisms
│   └── visualization/
│       ├── __init__.py
│       ├── plots.py               # Plotting functions
│       └── dashboard.py           # Interactive dashboard
├── scripts/
│   ├── train.py                   # Training script
│   ├── evaluate.py                # Evaluation script
│   ├── run_pipeline.py            # Complete pipeline
│   └── preprocess_data.py         # Data preprocessing script
├── configs/
│   ├── smart_building.yaml        # Default configuration
│   ├── ablation_study.yaml        # Ablation study config
│   └── baseline_comparison.yaml   # Baseline comparison config
├── data/
│   ├── raw/                       # Raw sensor data
│   ├── processed/                 # Processed datasets
│   └── examples/                  # Example datasets
├── notebooks/
│   ├── 01_data_exploration.ipynb  # Data exploration
│   ├── 02_model_training.ipynb    # Model training tutorial
│   ├── 03_evaluation.ipynb        # Evaluation and results
│   └── 04_visualization.ipynb     # Result visualization
├── tests/
│   ├── test_models.py
│   ├── test_data.py
│   ├── test_training.py
│   └── test_evaluation.py
├── requirements.txt
├── setup.py
├── README.md
└── LICENSE
from fslstm.sensors import SensorInterface

class CustomSensor(SensorInterface):
    def __init__(self, sensor_id, sensor_type):
        super().__init__(sensor_id, sensor_type)

    def read_data(self):
        # Replace with custom IoT sensor data reading logic
        raise NotImplementedError("Return raw readings from your device here")

    def preprocess(self, data):
        # Replace with custom preprocessing for smart building data
        processed_data = data
        return processed_data

# Register the custom IoT sensor type for federated learning
trainer.register_sensor_type("custom_sensor", CustomSensor)

# Configure different tasks for different IoT sensor types
config.tasks = {
    "occupancy": {"type": "classification", "classes": 2},
    "temperature": {"type": "regression", "target": "energy_consumption"},
    "lighting": {"type": "classification", "classes": 2}
}

from fslstm.privacy import DifferentialPrivacy, SecureAggregation
# Enable differential privacy for federated learning
privacy_mechanism = DifferentialPrivacy(epsilon=1.0, delta=1e-5)
trainer.set_privacy_mechanism(privacy_mechanism)
# Enable secure aggregation for IoT sensor networks
secure_agg = SecureAggregation()
trainer.set_aggregation_method(secure_agg)

from fslstm.experiments import AblationStudy
# Run ablation study on number of LSTM layers for federated learning
ablation = AblationStudy(config)
results = ablation.run_layer_ablation(
    layers=[1, 2, 3, 4],
    dataset=train_data
)
# Analyze results for smart building anomaly detection
ablation.plot_layer_comparison(results)

from fslstm.experiments import ConvergenceAnalysis
# Analyze federated learning convergence with different number of IoT clients
convergence_study = ConvergenceAnalysis(config)
convergence_results = convergence_study.analyze_client_scaling(
    client_counts=[20, 40, 80, 160, 200],
    dataset=train_data
)

If you use this code in your research, please cite:
@article{fslstm2020,
  title={A Federated Learning Approach to Anomaly Detection in Smart Buildings},
  journal={ACM Transactions on Internet of Things},
  volume={2},
  number={4},
  pages={1--23},
  year={2021},
  keywords={federated learning, anomaly detection, smart buildings, IoT sensors, LSTM, privacy preservation}
}

Related Research Publications:
- FedTime: Federated Learning for Time Series Forecasting by Raed Abdel Sater
- Federated Learning for IoT: Challenges and Opportunities
- Privacy-Preserving Machine Learning in Smart Cities
- LSTM Networks for Time Series Anomaly Detection
This project is licensed under the MIT License. See the LICENSE file for details.
Note: This implementation is based on the federated learning framework for anomaly detection in smart buildings. The model supports both classification tasks (sensor fault detection) and regression tasks (energy consumption prediction) while preserving data privacy through federated learning.