Final Year Project | Research Paper Ready | IEEE IoT Journal Target
A production-ready implementation of a Byzantine-resilient Federated Learning system for IoT Intrusion Detection. TV-FLIDS defends against malicious clients through a unified three-criteria verification gate combined with dynamic memory-aware trust scoring.
| Setup | Min RAM | Recommended GPU | Est. Full Run Time |
|---|---|---|---|
| CPU only | 8 GB | — | ~8 hours |
| GPU (8 GB VRAM) | 16 GB | NVIDIA RTX 3080+ | ~90 min |
Set CPU parallelism: export TVFLIDS_SIM_CLIENT_CPUS=4
Standard Federated Learning is vulnerable to adversarial clients that poison the global model. TV-FLIDS addresses this by introducing:
- Verification Gate — Pre-aggregation filter checking loss consistency, gradient direction, and statistical outliers
- Trust Scoring — Dynamic per-client scores with exponential memory decay (T_i = α·S_i + β·A_i − γ·O_i)
- Adaptive Weights — Meta-gradient learning of α, β, γ from server validation loss
- Formal Guarantees — Proposition 1 bounding Byzantine influence under the trust floor
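The trust update described above can be sketched in a few lines. This is a minimal illustration, not the repository's `TrustScorer`: the function name `update_trust` and the `TrustWeights` container are hypothetical, the default values are the ones listed in `config/fl_config.yaml`, and the value ranges assumed for S_i (similarity), A_i (accuracy) and O_i (anomaly) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustWeights:
    alpha: float = 0.4  # similarity weight
    beta: float = 0.4   # accuracy weight
    gamma: float = 0.2  # anomaly penalty weight


def update_trust(prev_trust, similarity, accuracy, anomaly,
                 weights=TrustWeights(), memory_decay=0.9, min_trust=0.01):
    """Return one client's trust score after one round.

    Instantaneous score: alpha*S_i + beta*A_i - gamma*O_i.
    The score is blended with the previous trust using an exponential
    moving average and then clamped from below by the trust floor.
    ASSUMPTION: S_i lies in [-1, 1], A_i and O_i in [0, 1].
    """
    instant = (weights.alpha * similarity
               + weights.beta * accuracy
               - weights.gamma * anomaly)
    blended = memory_decay * prev_trust + (1.0 - memory_decay) * instant
    return max(min_trust, blended)


# Hypothetical inputs: one well-behaved client, one flagged client.
honest = update_trust(prev_trust=0.5, similarity=0.9, accuracy=0.85, anomaly=0.0)
flagged = update_trust(prev_trust=0.5, similarity=-0.2, accuracy=0.40, anomaly=1.0)
print(f"honest={honest:.3f} flagged={flagged:.3f}")
```

With a decay of 0.9, a single bad round only lowers trust slightly; repeated bad rounds are what push a client toward the floor.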
NSL-KDD / UNSW-NB15
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Flower FL Simulation │
│ │
│ Client 1..N │
│ ┌────────────┐ │
│ │ Local MLP │──► Δw_i + val_loss_i ──────────────────────►│
│ └────────────┘ (per round) │
│ │
│ TVFLIDSStrategy (Server) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ 1. VerificationModule │ │
│ │ ├─ Check 1: loss_after > loss_before? │ │
│ │ ├─ Check 2: cosine_sim(Δw_i, mean_Δw) > threshold? │ │
│ │ └─ Check 3: z_score(||Δw_i||) < threshold? │ │
│ │ 2. TrustScorer (Adaptive) │ │
│ │ T_i(t) = 0.9·T_i(t-1) + 0.1·[α·S_i+β·A_i-γ·O_i] │ │
│ │ Meta-gradient update on α, β, γ │ │
│ │ 3. Weighted Aggregation │ │
│ │ w^{t+1} = Σ(T_i/ΣT) · w_i │ │
│ └────────────────────────────────────────────────────────┘ │
│ │ │
│ Global Model w^{t+1} │
└─────────────────────────────────────────────────────────────┘
│
▼
NSL-KDD Test Set Evaluation
Metrics: Accuracy, F1-Macro, Attack Success Rate
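The server-side steps in the diagram can be approximated with plain Python lists. The sketch below is illustrative only and is not the code in `trust/verification.py` or `fl/strategy.py`: the function names are hypothetical, the loss check assumes the quantity compared is `loss_before - loss_after`, and dropping clients that fail a check before aggregation is an assumption about how the gate and the aggregation interact. Thresholds are the defaults from `config/fl_config.yaml`.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def verification_gate(updates, loss_improvements,
                      loss_thr=0.0, cos_thr=0.0, z_thr=2.5):
    """Return one list of failed check names per client (empty list = passed)."""
    dim = len(updates[0])
    mean_update = [mean(u[k] for u in updates) for k in range(dim)]
    norms = [math.sqrt(sum(x * x for x in u)) for u in updates]
    mu, sigma = mean(norms), pstdev(norms)

    report = []
    for u, gain, norm in zip(updates, loss_improvements, norms):
        failed = []
        # ASSUMPTION: gain = loss_before - loss_after; negative means the loss rose.
        if gain < loss_thr:
            failed.append("loss")
        # Direction check against the mean update of this round.
        if cosine(u, mean_update) < cos_thr:
            failed.append("cosine")
        # Outlier check on the update norm.
        z = (norm - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > z_thr:
            failed.append("zscore")
        report.append(failed)
    return report


def trust_weighted_average(client_weights, trust):
    """w^{t+1} = sum_i (T_i / sum_j T_j) * w_i, coordinate by coordinate."""
    total = sum(trust)
    dim = len(client_weights[0])
    return [sum(t / total * w[k] for w, t in zip(client_weights, trust))
            for k in range(dim)]


# Nine honest clients and one client pushing a larger update the opposite way.
updates = [[1.0, 1.0]] * 9 + [[-3.0, -3.0]]
gains = [0.05] * 9 + [-0.20]
report = verification_gate(updates, gains)
passed = [i for i, failed in enumerate(report) if not failed]

# ASSUMPTION: clients that fail any check are dropped before aggregation.
# Hypothetical local weights w_i and trust scores T_i for the clients that passed.
local_weights = [[0.5 + 0.1 * i, 1.0] for i in passed]
trust = [0.5] * len(passed)
global_weights = trust_weighted_average(local_weights, trust)
print(report[-1], passed, global_weights)
```

Note that the norm z-score can only exceed 2.5 when enough clients take part in a round: with a population standard deviation, a single outlier among n clients has a z-score of at most sqrt(n - 1).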
The default model is IDSMLP (4-layer MLP with BatchNorm + Dropout).
An experimental IDSBiLSTM is available for sequential traffic analysis:
python experiments/run_experiment.py --strategy tvflids --attack label_flip_30 --model bilstm
The paper results use --model mlp (default).
- 6 FL strategies: FedAvg, Krum, Trimmed Mean, FLTrust, FoolsGold, TV-FLIDS
- 4 attack types: Label Flip, Gradient Scaling, Noise Injection, Backdoor
- IID & Non-IID data partitioning (Dirichlet α=0.5/0.1)
- Publication-grade statistics: 5-seed mean±std, Wilcoxon, McNemar tests
- 6 paper figures: Convergence, trust evolution, robustness curve, ablation, weight trajectory, confusion matrices
- Overhead analysis: Per-round timing and communication cost measurement
- Full ablation suite: A1–A5 component contribution analysis
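For readers unfamiliar with Dirichlet partitioning, the non-IID split mentioned in the list above can be sketched as follows. This is a stdlib-only illustration, not the code in `data/partitioning.py`: the function name `dirichlet_partition` is hypothetical, and sampling a separate Dirichlet vector per class is one common variant that is assumed here. Smaller α gives more skewed class mixes per client.

```python
import random
from collections import Counter, defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=42):
    """Split sample indices across clients using per-class Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a set of independent Gamma(alpha, 1) draws, normalised.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0  # guard against an all-zero underflow
        start, cum = 0, 0.0
        for c, g in enumerate(gammas):
            cum += g / total
            # The last client takes whatever remains so every index is assigned.
            end = len(idxs) if c == num_clients - 1 else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy labels: 1000 samples over 5 classes, split across 20 clients.
labels = [i % 5 for i in range(1000)]
mild = dirichlet_partition(labels, num_clients=20, alpha=0.5)
extreme = dirichlet_partition(labels, num_clients=20, alpha=0.1)
print("client 0 class mix, alpha=0.5:", Counter(labels[i] for i in mild[0]))
print("client 0 class mix, alpha=0.1:", Counter(labels[i] for i in extreme[0]))
```

With very small α some clients may receive few or no samples, so a real pipeline would likely enforce a minimum client size.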
conda env create -f environment.yml
conda activate tvflids
make data

pip install -r requirements.txt
bash scripts/download_nslkdd.sh

python -c "import torch; import flwr; print('PyTorch:', torch.__version__, '| Flower:', flwr.__version__)"
# Expected: PyTorch: 2.1.0 | Flower: 1.6.0
make smoke

- Simulation speed: TV-FLIDS adds ~15% overhead over FedAvg per round (verification gate + trust scoring; evaluation is shared).
- CPU parallelism: Set TVFLIDS_SIM_CLIENT_CPUS to at most nproc / 2 to avoid Ray resource starvation:
  export TVFLIDS_SIM_CLIENT_CPUS=4
  python experiments/run_experiment.py --strategy tvflids --attack label_flip_30
- GPU usage: Set TVFLIDS_SIM_CLIENT_GPUS=0.1 per virtual client if your GPU has ≥ 8 GB VRAM.
# Automatic download via script
bash scripts/download_nslkdd.sh
# OR via Python
python -c "from data.preprocessing.nslkdd_pipeline import download_nslkdd; download_nslkdd('data/raw/KDDTrain+.txt', 'data/raw/KDDTest+.txt')"
After the download, the files should contain:
data/raw/KDDTrain+.txt → ~125,973 rows
data/raw/KDDTest+.txt → ~22,544 rows
All stochastic operations are seeded via utils/seed.py::set_all_seeds(seed):
| Operation | Seeded via |
|---|---|
| NumPy global state | np.random.seed(seed) |
| NumPy RNGs | np.random.default_rng(seed) |
| PyTorch | torch.manual_seed(seed) + torch.backends.cudnn.deterministic=True |
| CUDA | torch.cuda.manual_seed_all(seed) |
| Python hash | os.environ["PYTHONHASHSEED"] = str(seed) |
| Attacks | seed + client_id per client |
| SMOTE | random_state=seed |
| Data partitioning | np.random.default_rng(seed) |
To reproduce Table 1, use seeds [42, 123, 456, 789, 1337].
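A seeding helper along the lines of the table above might look like this. It is a sketch under assumptions, not the contents of `utils/seed.py`: seeding Python's `random` module and disabling `cudnn.benchmark` are additions not listed in the table, and NumPy and PyTorch are treated as optional so the snippet runs without them.

```python
import os
import random


def set_all_seeds(seed: int) -> None:
    """Seed the RNGs named in the table; skip libraries that are not installed."""
    # Only affects child processes; the running interpreter's hash seed
    # is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)  # ASSUMPTION: not in the table, added for completeness

    try:
        import numpy as np
        np.random.seed(seed)  # legacy global state
    except ImportError:
        pass

    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False  # ASSUMPTION: not in the table
    except ImportError:
        pass


def client_rng(seed: int, client_id: int) -> random.Random:
    """Per-client generator seeded with seed + client_id, as used for attacks."""
    return random.Random(seed + client_id)


set_all_seeds(42)
first = random.random()
set_all_seeds(42)
second = random.random()
print(first == second)
```

Code that uses `np.random.default_rng(seed)` creates its own generator, so it has to be passed the seed explicitly rather than relying on the global state set here.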
After downloading NSL-KDD, verify file integrity:
python scripts/verify_data.py
Expected SHA-256 hashes are recorded in scripts/verify_data.py.
If the check fails, re-run bash scripts/download_nslkdd.sh.
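The integrity check amounts to comparing a file's SHA-256 digest with a recorded value. The following is a generic sketch of that idea, not the contents of `scripts/verify_data.py`; the helper names `sha256_of` and `find_mismatches` are hypothetical, and the expected digests must be taken from the repository's script.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Return the paths that are missing or whose digest differs.

    `expected` maps a file path to its expected hex digest.
    """
    bad = []
    for path, want in expected.items():
        if not Path(path).is_file() or sha256_of(path) != want.lower():
            bad.append(path)
    return bad


# Usage (digests are placeholders to be copied from scripts/verify_data.py):
#   find_mismatches({"data/raw/KDDTrain+.txt": "<expected hex digest>"})
# An empty list means every file matched.
```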
Download from: https://research.unsw.edu.au/projects/unsw-nb15-dataset
Place at: data/raw/UNSW_NB15_training-set.csv and data/raw/UNSW_NB15_testing-set.csv
# TV-FLIDS vs 30% label flip attack (primary experiment)
python experiments/run_experiment.py --strategy tvflids --attack label_flip_30
# FedAvg baseline (no defense, shows vulnerability)
python experiments/run_experiment.py --strategy fedavg --attack label_flip_30
# Clean baseline (no attacks, upper bound)
python experiments/run_experiment.py --strategy fedavg --attack no_attack
# FLTrust comparison (SOTA baseline)
python experiments/run_experiment.py --strategy fltrust --attack label_flip_30
# Custom seed and rounds (fast test)
python experiments/run_experiment.py \
--strategy tvflids \
--attack label_flip_30 \
--seed 123 \
--rounds 20
# IID data partition
python experiments/run_experiment.py \
--strategy tvflids \
--attack label_flip_30 \
--partition iid
# Extreme non-IID (stress test)
python experiments/run_experiment.py \
--strategy tvflids \
--attack label_flip_30 \
--partition noniid \
--alpha 0.1
# Different attack types
python experiments/run_experiment.py --strategy tvflids --attack gradient_scale_30
python experiments/run_experiment.py --strategy tvflids --attack noise_30
python experiments/run_experiment.py --strategy tvflids --attack backdoor_20
Integration tests will auto-download NSL-KDD if needed.
python tests/test_all.py
python tests/test_integration.py
python experiments/run_full_comparison.py \
--strategies fedavg krum trimmed_mean fltrust foolsgold tvflids \
--attack label_flip_30 \
--seeds 42 123 456 789 1337 \
--rounds 100
python experiments/run_ratio_sweep.py \
--methods fedavg fltrust tvflids \
--ratios 0.0 0.1 0.2 0.3 0.4 0.5 \
--seeds 42 123 456 \
--rounds 100
python experiments/run_ablation.py \
--attack label_flip_30 \
--rounds 100 \
--seeds 42 123 456 789 1337
# Runs everything: expect 2-4 hours on CPU, ~30 min on GPU
bash scripts/run_all_experiments.sh
After running all experiments:
# Generate LaTeX Table 1
python scripts/generate_tables.py \
--input results/tables/full_comparison_results.json \
--output results/tables/table1.tex
# Check result completeness before submission
python scripts/check_results.py
Proposition 1 and Lemma 1 (trust convergence) can be verified numerically:
# Synthetic verification (no data required)
python theory/proposition1_verification.py
# Full theory validation suite
python scripts/run_theory_validation.py
Results are saved to results/tables/proposition1_real.json.
| Strategy | Reference | Key Property |
|---|---|---|
| fedavg | McMahan et al., 2017 | Standard baseline (no defense) |
| krum | Blanchard et al., NeurIPS 2017 | Nearest-neighbor selection |
| trimmed_mean | Yin et al., ICML 2018 | Coordinate-wise robust mean |
| fltrust | Cao et al., NDSS 2021 | Server-root trust bootstrapping |
| foolsgold | Fung et al., 2018 | Sybil resistance via history |
| flame | Nguyen et al., USENIX Security 2022 | HDBSCAN + adaptive noise |
| rfa | Pillutla et al., IEEE TSP 2022 | Geometric median (Weiszfeld) |
| tvflids | This work | 3-criteria gate + adaptive trust |
| tvflids_fixed | This work | TV-FLIDS with fixed α, β, γ |
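As a point of reference for the "coordinate-wise robust mean" entry in the table, a trimmed-mean aggregator can be written in a few lines. This is a generic textbook sketch, not the implementation under `fl/baselines/`; the function name and the `trim_ratio` parameter are hypothetical.

```python
def trimmed_mean(updates, trim_ratio=0.2):
    """Coordinate-wise trimmed mean of equally sized client updates.

    For each coordinate, the k smallest and k largest values are discarded
    (k = floor(trim_ratio * n)) and the rest are averaged.
    """
    n = len(updates)
    k = int(trim_ratio * n)
    if 2 * k >= n:
        raise ValueError("trim_ratio removes every client")
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[k:n - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Four similar updates and one extreme value in the first coordinate.
updates = [[1.0, 0.0], [1.1, 0.1], [0.9, -0.1], [1.0, 0.0], [100.0, 0.0]]
robust = trimmed_mean(updates, trim_ratio=0.2)
plain = [sum(u[j] for u in updates) / len(updates) for j in range(2)]
print("trimmed:", robust, "plain mean:", plain)
```

The extreme value dominates the plain mean but is discarded by the trimmed mean, which is the property the baseline relies on.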
| Config Key | Type | Ratio | Description |
|---|---|---|---|
| no_attack | — | 0% | Clean baseline |
| label_flip_10/20/30 | Data | 10/20/30% | Flip attack→Normal labels |
| gradient_scale_10/30 | Model | 10/30% | Amplify gradient ×10 |
| noise_30 | Model | 30% | Gaussian noise (σ=0.5) |
| backdoor_20 | Data | 20% | Trigger pattern insertion |
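Three of the attacks in the table are simple enough to sketch directly. The code below is illustrative and is not taken from `attacks/adversarial.py`: all function names are hypothetical, the label value used for Normal traffic is an assumption, and adding noise to the update (rather than replacing it) is an assumption. The backdoor attack is omitted because the trigger pattern is not specified here.

```python
import random


def pick_malicious(num_clients, ratio, seed):
    """Choose round(ratio * num_clients) client ids to act as adversaries."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_clients), round(ratio * num_clients)))


def label_flip(labels, normal_label=0):
    """Data attack: relabel every attack sample as Normal.

    ASSUMPTION: `normal_label` is the class id used for benign traffic.
    """
    return [normal_label] * len(labels)


def gradient_scale(update, factor=10.0):
    """Model attack: amplify the submitted update by a constant factor."""
    return [factor * x for x in update]


def gaussian_noise(update, sigma=0.5, seed=0, client_id=0):
    """Model attack: add zero-mean Gaussian noise to every coordinate."""
    rng = random.Random(seed + client_id)  # per-client seeding: seed + client_id
    return [x + rng.gauss(0.0, sigma) for x in update]


# 30% of 20 clients are adversarial in the *_30 configurations.
adversaries = pick_malicious(num_clients=20, ratio=0.3, seed=42)
print(len(adversaries), "adversarial clients:", adversaries)
print(label_flip([0, 3, 1, 0]))
print(gradient_scale([0.1, -0.2]))
```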
Dataset paths live in config/dataset_config.yaml.
Edit config/fl_config.yaml to change hyperparameters:
federated_learning:
  num_clients: 20 # Simulated IoT devices
  num_rounds: 100 # FL communication rounds
  fraction_fit: 0.5 # 50% clients participate per round
  local_epochs: 5 # Local training epochs
  local_lr: 0.001 # Adam learning rate
trust:
  alpha: 0.4 # Similarity weight
  beta: 0.4 # Accuracy weight
  gamma: 0.2 # Anomaly penalty weight
  memory_decay: 0.9 # EMA decay factor
  min_trust: 0.01 # Trust floor
verification:
  loss_threshold: 0.0 # Reject if ΔL < 0
  cosine_threshold: 0.0 # Flag if cos_sim < 0
  zscore_threshold: 2.5 # Flag if |z| > 2.5

| Strategy | Accuracy | F1-Macro | ASR |
|---|---|---|---|
| FedAvg (no defense) | 0.6026 ± 0.0114 | 0.5726 ± 0.0089 | 0.4012 ± 0.0253 |
| Krum | 0.8170 ± 0.0089 | 0.7891 ± 0.0108 | 0.1650 ± 0.0243 |
| Trimmed Mean | 0.8038 ± 0.0097 | 0.7755 ± 0.0104 | 0.2157 ± 0.0404 |
| FLTrust | 0.8642 ± 0.0121 | 0.8349 ± 0.0107 | 0.1357 ± 0.0388 |
| FoolsGold | 0.8081 ± 0.0109 | 0.7794 ± 0.0119 | 0.1908 ± 0.0352 |
| FLAME | 0.8256 ± 0.0082 | 0.8016 ± 0.0095 | 0.1878 ± 0.0308 |
| RFA | 0.8082 ± 0.0117 | 0.7811 ± 0.0130 | 0.2016 ± 0.0226 |
| TV-FLIDS | 0.8801 ± 0.0095 | 0.8502 ± 0.0141 | 0.1908 ± 0.0035 |
Values are mean ± std over 5 seeds (42, 123, 456, 789, 1337).
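The "mean ± std" cells can be produced from per-seed values as sketched below. The numbers in the example are made-up placeholders, not results from this project, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev


def summarize(values, digits=4):
    """Format per-seed values as 'mean ± std' using the sample standard deviation."""
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"


# Placeholder accuracies for five seeds (illustrative only).
placeholder_accuracy = [0.87, 0.88, 0.89, 0.88, 0.88]
print(summarize(placeholder_accuracy))  # 0.8800 ± 0.0071
```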
Full results: results/tables/full_comparison_results.json
tv-flids/
├── config/ # Hyperparameter configs (YAML)
│ ├── fl_config.yaml
│ └── dataset_config.yaml
├── data/
│ ├── preprocessing/ # NSL-KDD, UNSW-NB15 pipelines
│ └── partitioning.py # IID & Non-IID (Dirichlet) partitioners
├── extras/
│ └── mnist_fl_pipeline.py # MNIST FL benchmark (not used in paper)
├── models/
│ └── mlp.py # IDSMLP + IDSBiLSTM architectures
├── fl/
│ ├── client.py # Flower FL client with attack injection
│ ├── strategy.py # TVFLIDSStrategy (main novel contribution)
│ └── baselines/ # FedAvg, Krum, TrimMean, FLTrust, FoolsGold
├── trust/
│ ├── trust_scorer.py # Fixed-weight trust scoring
│ ├── adaptive_trust_scorer.py # Meta-gradient adaptive α,β,γ
│ └── verification.py # Three-criteria verification gate
├── attacks/
│ └── adversarial.py # 4 attack types + configuration registry
├── evaluation/
│ ├── metrics.py # Accuracy, F1, ASR, FNR tracking
│ ├── statistical_testing.py # Wilcoxon, McNemar, multi-seed reporting
│ ├── visualization.py # 6 paper-ready figure generators
│ └── overhead.py # Time/communication cost analysis
├── theory/
│ ├── proposition1_verification.py # Prop. 1 numerical verification
│ └── convergence_analysis.py # Convergence rate (τ) fitting
├── experiments/
│ ├── run_experiment.py # Main experiment runner (start here)
│ ├── run_ablation.py # A1-A5 ablation studies
│ ├── run_ratio_sweep.py # Adversarial ratio sweep
│ └── run_full_comparison.py # Multi-seed Table 1 reproduction
├── utils/
│ ├── seed.py # Centralized seed management
│ └── logger.py # JSON + TensorBoard logging
├── scripts/
│ ├── download_nslkdd.sh # Dataset download
│ └── run_all_experiments.sh # Full paper reproduction
├── results/ # Generated outputs (gitignored)
│ ├── logs/ # Per-experiment JSON logs
│ ├── figures/ # PDF paper figures
│ └── tables/ # CSV/JSON result tables
├── requirements.txt
└── README.md
- Byzantine threshold: TV-FLIDS degrades when adversarial fraction exceeds ~50%. At f/N > 0.5, the verification gate cannot reliably separate honest and malicious updates.
- IID server assumption: The server validation set used for trust scoring must be class-balanced and drawn from the same distribution as the global test set. Distribution shift between server val and test data is not handled.
- Single model architecture: All clients use the same MLP architecture. Heterogeneous model support (e.g., different depths) is out of scope.
- NSL-KDD age: NSL-KDD is a 2009-era dataset. Performance on modern IoT traffic datasets (e.g., CIC-IoT23) is not evaluated.
If you use this code in your research, please cite:
@inproceedings{tvflids2025,
title = {TV-FLIDS: Trust-Aware and Verifiable Federated Learning for
Intrusion Detection under Adaptive Byzantine Clients},
author = {Ali Akarma},
booktitle = {},
year = {2025},
url = {https://github.com/aliakarma/tv-flids}
}

@inproceedings{blanchard2017nips,
title={Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent},
author={Blanchard et al.},
booktitle={NeurIPS}, year={2017}
}
@inproceedings{cao2021fltrust,
title={FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping},
author={Cao et al.},
booktitle={NDSS}, year={2021}
}
@inproceedings{mcmahan2017fedavg,
title={Communication-Efficient Learning of Deep Networks from Decentralized Data},
author={McMahan et al.},
booktitle={AISTATS}, year={2017}
}

MIT License. See LICENSE for details.