TV-FLIDS: Trust-Aware & Verifiable Federated Intrusion Detection System

Final Year Project | Research Paper Ready | IEEE IoT Journal Target

A production-ready implementation of a Byzantine-resilient Federated Learning system for IoT Intrusion Detection. TV-FLIDS defends against malicious clients through a unified three-criteria verification gate combined with dynamic memory-aware trust scoring.

Hardware Requirements

Setup	Min RAM	Recommended GPU	Est. Full Run Time
CPU only	8 GB	—	~8 hours
GPU (8 GB VRAM)	16 GB	NVIDIA RTX 3080+	~90 min

Set CPU parallelism: export TVFLIDS_SIM_CLIENT_CPUS=4

Overview

Standard Federated Learning is vulnerable to adversarial clients that poison the global model. TV-FLIDS addresses this by introducing:

Verification Gate — Pre-aggregation filter checking loss consistency, gradient direction, and statistical outliers
Trust Scoring — Dynamic per-client scores with exponential memory decay (T_i = α·S_i + β·A_i − γ·O_i)
Adaptive Weights — Meta-gradient learning of α, β, γ from server validation loss
Formal Guarantees — Proposition 1 bounding Byzantine influence under the trust floor
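
The trust update listed above can be sketched in a few lines. This is a minimal illustration rather than the repository's TrustScorer: the helper name `update_trust`, the assumption that S_i, A_i and O_i lie in [0, 1], and applying the trust floor by clipping are all assumptions made for the example. The default constants mirror the values shown in config/fl_config.yaml.

```python
def update_trust(prev_trust, similarity, accuracy, anomaly,
                 alpha=0.4, beta=0.4, gamma=0.2,
                 memory_decay=0.9, min_trust=0.01):
    """One exponential-memory trust update for a single client.

    Hypothetical helper: all three signals are assumed to lie in [0, 1].
    """
    # Instantaneous score: alpha*S_i + beta*A_i - gamma*O_i
    instant = alpha * similarity + beta * accuracy - gamma * anomaly
    # Blend with the previous round's trust (exponential moving average)
    blended = memory_decay * prev_trust + (1.0 - memory_decay) * instant
    # Assumption: the trust floor is enforced by clipping from below
    return max(min_trust, blended)


# A well-behaved client drifts upward, a suspicious one drifts downward
honest = update_trust(0.5, similarity=0.9, accuracy=0.8, anomaly=0.0)
suspect = update_trust(0.5, similarity=0.0, accuracy=0.2, anomaly=1.0)
print(round(honest, 4), round(suspect, 4))
```

Because of the 0.9 decay, a single bad round only moves the score slightly; a client has to misbehave repeatedly before its weight approaches the floor.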

Architecture

NSL-KDD / UNSW-NB15
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Flower FL Simulation                     │
│                                                             │
│  Client 1..N                                                │
│  ┌────────────┐                                             │
│  │ Local MLP  │──► Δw_i + val_loss_i ──────────────────────►│
│  └────────────┘         (per round)                         │
│                                                             │
│  TVFLIDSStrategy (Server)                                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  1. VerificationModule                                 │ │
│  │     ├─ Check 1: loss_after > loss_before?              │ │
│  │     ├─ Check 2: cosine_sim(Δw_i, mean_Δw) > threshold? │ │
│  │     └─ Check 3: z_score(||Δw_i||) < threshold?         │ │
│  │  2. TrustScorer (Adaptive)                             │ │
│  │     T_i(t) = 0.9·T_i(t-1) + 0.1·[α·S_i+β·A_i-γ·O_i]    │ │
│  │     Meta-gradient update on α, β, γ                    │ │
│  │  3. Weighted Aggregation                               │ │
│  │     w^{t+1} = Σ(T_i/ΣT) · w_i                          │ │
│  └────────────────────────────────────────────────────────┘ │
│                        │                                    │
│                   Global Model w^{t+1}                      │
└─────────────────────────────────────────────────────────────┘
        │
        ▼
  NSL-KDD Test Set Evaluation
  Metrics: Accuracy, F1-Macro, Attack Success Rate
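
The server-side steps in the diagram can be sketched with plain Python lists. This is an illustrative sketch under stated assumptions, not the code in trust/verification.py or fl/strategy.py: the function names are hypothetical, an update is assumed to pass only if all three checks pass, Check 1 is read as "reject if the loss went up", and the z-score uses the population standard deviation of the update norms.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def verify_updates(updates, loss_before, loss_after,
                   cosine_threshold=0.0, zscore_threshold=2.5):
    """Return one boolean per client: True if all three checks pass."""
    dim = len(updates[0])
    mean_update = [mean(u[k] for u in updates) for k in range(dim)]
    norms = [math.sqrt(sum(x * x for x in u)) for u in updates]
    mu, sigma = mean(norms), pstdev(norms)
    accepted = []
    for u, norm, lb, la in zip(updates, norms, loss_before, loss_after):
        loss_ok = la <= lb  # Check 1 (assumed: loss must not increase)
        direction_ok = cosine(u, mean_update) > cosine_threshold  # Check 2
        z = abs(norm - mu) / sigma if sigma > 0 else 0.0
        norm_ok = z < zscore_threshold  # Check 3
        accepted.append(loss_ok and direction_ok and norm_ok)
    return accepted


def aggregate(vectors, trust):
    """Trust-weighted average: sum_i (T_i / sum_j T_j) * w_i."""
    total = sum(trust)
    dim = len(vectors[0])
    return [sum(t * v[k] for t, v in zip(trust, vectors)) / total
            for k in range(dim)]


# Three similar updates and one that points the opposite way
updates = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [-1.0, -1.0]]
accepted = verify_updates(updates, [1.0] * 4, [0.8, 0.8, 0.8, 1.5])
kept = [u for u, ok in zip(updates, accepted) if ok]
new_delta = aggregate(kept, trust=[0.6, 0.5, 0.5])
print(accepted, new_delta)
```

The fourth update fails both the loss check and the direction check, so it never reaches the weighted average.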

Model Selection

The default model is IDSMLP (4-layer MLP with BatchNorm + Dropout).
An experimental IDSBiLSTM is available for sequential traffic analysis:

python experiments/run_experiment.py --strategy tvflids --attack label_flip_30 --model bilstm

The paper results use --model mlp (default).

Features

6 FL strategies: FedAvg, Krum, Trimmed Mean, FLTrust, FoolsGold, TV-FLIDS (FLAME and RFA are also available; see Available Strategies)
4 attack types: Label Flip, Gradient Scaling, Noise Injection, Backdoor
IID & Non-IID data partitioning (Dirichlet α=0.5/0.1)
Publication-grade statistics: 5-seed mean±std, Wilcoxon, McNemar tests
6 paper figures: Convergence, trust evolution, robustness curve, ablation, weight trajectory, confusion matrices
Overhead analysis: Per-round timing and communication cost measurement
Full ablation suite: A1–A5 component contribution analysis
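
For the Non-IID setting mentioned in the list above, a Dirichlet label-skew partition can be sketched as follows. This is a generic standard-library illustration, not the code in data/partitioning.py; the function name and its signature are assumptions. Smaller α concentrates each class on fewer clients, so α=0.1 is more skewed than α=0.5.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=42):
    """Split sample indices across clients with Dirichlet label skew."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) samples, normalised
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        # Turn the proportions into cumulative cut points for this class
        start, cum = 0, 0.0
        for c in range(num_clients):
            cum += gammas[c] / total
            last = c == num_clients - 1
            end = len(idxs) if last else round(cum * len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy example: two classes of 50 samples each, five clients
labels = [0] * 50 + [1] * 50
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
print([len(p) for p in parts])
```

Every index is assigned to exactly one client, and the same seed always yields the same split.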

Installation

Option A — Conda (Recommended)

conda env create -f environment.yml
conda activate tvflids
make data

Option B — pip

pip install -r requirements.txt
bash scripts/download_nslkdd.sh

Verify Installation

python -c "import torch; import flwr; print('PyTorch:', torch.__version__, '| Flower:', flwr.__version__)"
# Expected: PyTorch: 2.1.0 | Flower: 1.6.0
make smoke

Performance Notes

Simulation speed: TV-FLIDS adds ~15% overhead over FedAvg per round
(verification gate + trust scoring; evaluation is shared).

CPU parallelism: Set TVFLIDS_SIM_CLIENT_CPUS to at most nproc / 2
to avoid Ray resource starvation:

export TVFLIDS_SIM_CLIENT_CPUS=4
python experiments/run_experiment.py --strategy tvflids --attack label_flip_30

GPU usage: Set TVFLIDS_SIM_CLIENT_GPUS=0.1 per virtual client
if your GPU has ≥ 8 GB VRAM.

Dataset Setup

NSL-KDD (Primary — required)

# Automatic download via script
bash scripts/download_nslkdd.sh

# OR via Python
python -c "from data.preprocessing.nslkdd_pipeline import download_nslkdd; download_nslkdd('data/raw/KDDTrain+.txt', 'data/raw/KDDTest+.txt')"

Expected row counts after download:

data/raw/KDDTrain+.txt  → ~125,973 rows
data/raw/KDDTest+.txt   → ~22,544 rows

Reproducibility Guarantee

All stochastic operations are seeded via utils/seed.py::set_all_seeds(seed):

Operation	Seeded via
NumPy global state	`np.random.seed(seed)`
NumPy RNGs	`np.random.default_rng(seed)`
PyTorch	`torch.manual_seed(seed)` + `torch.backends.cudnn.deterministic=True`
CUDA	`torch.cuda.manual_seed_all(seed)`
Python hash	`os.environ["PYTHONHASHSEED"] = str(seed)`
Attacks	`seed + client_id` per client
SMOTE	`random_state=seed`
Data partitioning	`np.random.default_rng(seed)`

To reproduce Table 1, use seeds [42, 123, 456, 789, 1337].
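
A seeding helper along the lines of the table above might look like the sketch below. It is not the contents of utils/seed.py: seeding the standard-library `random` module and the `client_attack_rng` helper are assumptions added for illustration, and the NumPy and PyTorch calls are skipped when those packages are not installed.

```python
import os
import random


def set_all_seeds(seed):
    """Seed the RNG sources listed in the table (illustrative sketch)."""
    # Only affects Python processes started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)  # assumption: the stdlib RNG is seeded as well
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
    except ImportError:
        pass


def client_attack_rng(seed, client_id):
    """Hypothetical helper: one reproducible RNG per client."""
    return random.Random(seed + client_id)


set_all_seeds(42)
first = random.random()
set_all_seeds(42)
second = random.random()
print(first == second)
```

Giving each client its own generator keeps an attacker's random choices independent of how many other clients were sampled in the same round.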

Data Integrity Verification

After downloading NSL-KDD, verify file integrity:

python scripts/verify_data.py

Expected SHA-256 hashes are recorded in scripts/verify_data.py. If the check fails, re-run bash scripts/download_nslkdd.sh.
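
For readers who want to check a file by hand, a SHA-256 comparison can be sketched with the standard library. This is a generic illustration and not the contents of scripts/verify_data.py; the helper names are hypothetical, and the demo hashes a throwaway temporary file rather than the real dataset, whose expected hashes are not reproduced here.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA-256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a temporary file containing the bytes b"abc"
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
ok = matches(tmp.name, ABC_SHA256)
os.remove(tmp.name)
print(ok)
```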

UNSW-NB15 (Secondary — optional)

Download from: https://research.unsw.edu.au/projects/unsw-nb15-dataset

Place at: data/raw/UNSW_NB15_training-set.csv and data/raw/UNSW_NB15_testing-set.csv

Quick Start

Run a single experiment

# TV-FLIDS vs 30% label flip attack (primary experiment)
python experiments/run_experiment.py --strategy tvflids --attack label_flip_30

# FedAvg baseline (no defense, shows vulnerability)
python experiments/run_experiment.py --strategy fedavg --attack label_flip_30

# Clean baseline (no attacks, upper bound)
python experiments/run_experiment.py --strategy fedavg --attack no_attack

# FLTrust comparison (SOTA baseline)
python experiments/run_experiment.py --strategy fltrust --attack label_flip_30

Run with options

# Custom seed and rounds (fast test)
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --seed 123 \
    --rounds 20

# IID data partition
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --partition iid

# Extreme non-IID (stress test)
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --partition noniid \
    --alpha 0.1

# Different attack types
python experiments/run_experiment.py --strategy tvflids --attack gradient_scale_30
python experiments/run_experiment.py --strategy tvflids --attack noise_30
python experiments/run_experiment.py --strategy tvflids --attack backdoor_20

Testing

Integration tests will auto-download NSL-KDD if needed.

python tests/test_all.py
python tests/test_integration.py

Reproducing Paper Results

Table 1 — Full Strategy Comparison

python experiments/run_full_comparison.py \
    --strategies fedavg krum trimmed_mean fltrust foolsgold tvflids \
    --attack label_flip_30 \
    --seeds 42 123 456 789 1337 \
    --rounds 100

Figure 3 — Robustness Curve

python experiments/run_ratio_sweep.py \
    --methods fedavg fltrust tvflids \
    --ratios 0.0 0.1 0.2 0.3 0.4 0.5 \
    --seeds 42 123 456 \
    --rounds 100

Table 2 — Ablation Study

python experiments/run_ablation.py \
    --attack label_flip_30 \
    --rounds 100 \
    --seeds 42 123 456 789 1337

All Experiments (Full Reproduction)

# Runs everything — expect 2-4 hours on CPU, ~30 min on GPU
bash scripts/run_all_experiments.sh

Reproducing Paper Tables

After running all experiments:

# Generate LaTeX Table 1
python scripts/generate_tables.py \
    --input results/tables/full_comparison_results.json \
    --output results/tables/table1.tex

# Check result completeness before submission
python scripts/check_results.py

Theoretical Verification

Proposition 1 and Lemma 1 (trust convergence) can be verified numerically:

# Synthetic verification (no data required)
python theory/proposition1_verification.py

# Full theory validation suite
python scripts/run_theory_validation.py

Results are saved to results/tables/proposition1_real.json.

Available Strategies

Strategy	Reference	Key Property
`fedavg`	McMahan et al., 2017	Standard baseline (no defense)
`krum`	Blanchard et al., NeurIPS 2017	Nearest-neighbor selection
`trimmed_mean`	Yin et al., ICML 2018	Coordinate-wise robust mean
`fltrust`	Cao et al., NDSS 2021	Server-root trust bootstrapping
`foolsgold`	Fung et al., 2018	Sybil resistance via history
`flame`	Nguyen et al., USENIX Security 2022	HDBSCAN + adaptive noise
`rfa`	Pillutla et al., IEEE TSP 2022	Geometric median (Weiszfeld)
`tvflids`	This work	3-criteria gate + adaptive trust
`tvflids_fixed`	This work	TV-FLIDS with fixed α, β, γ
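
As a point of comparison for the robust baselines in the table, a coordinate-wise trimmed mean can be sketched as follows. This is a generic textbook-style illustration, not the implementation in fl/baselines/; the function name and the `trim_ratio` parameter are assumptions.

```python
def trimmed_mean(updates, trim_ratio=0.2):
    """Average each coordinate after dropping the extremes at both ends."""
    n = len(updates)
    k = int(trim_ratio * n)  # values dropped from each end, per coordinate
    if 2 * k >= n:
        raise ValueError("trim_ratio would discard every client")
    dim = len(updates[0])
    result = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        kept = column[k:n - k]
        result.append(sum(kept) / len(kept))
    return result


# Four similar one-dimensional updates and one extreme outlier
robust = trimmed_mean([[1.0], [2.0], [3.0], [4.0], [100.0]], trim_ratio=0.2)
print(robust)
```

A plain average of the same five values would be 22.0, which shows why an unweighted mean is easy to poison.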

Available Attacks

Config Key	Type	Ratio	Description
`no_attack`	—	0%	Clean baseline
`label_flip_10/20/30`	Data	10/20/30%	Flip attack→Normal labels
`gradient_scale_10/30`	Model	10/30%	Amplify gradient ×10
`noise_30`	Model	30%	Gaussian noise (σ=0.5)
`backdoor_20`	Data	20%	Trigger pattern insertion
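
Three of the attacks in the table can be sketched with the standard library. This is an illustrative sketch, not the code in attacks/adversarial.py: the function names are hypothetical, class 0 is assumed to encode Normal traffic, and the Ratio column is read as the fraction of malicious clients.

```python
import random

NORMAL = 0  # assumption: class 0 encodes Normal traffic


def pick_malicious(num_clients, ratio, seed=42):
    """Choose which client ids act maliciously (assumed meaning of Ratio)."""
    rng = random.Random(seed)
    count = round(ratio * num_clients)
    return sorted(rng.sample(range(num_clients), count))


def flip_labels(labels, normal_label=NORMAL):
    """Label flip: every attack label is rewritten as Normal."""
    return [normal_label if y != normal_label else y for y in labels]


def scale_update(update, factor=10.0):
    """Gradient scaling: amplify the model update by a constant factor."""
    return [factor * x for x in update]


def add_noise(update, sigma=0.5, rng=None):
    """Noise injection: add zero-mean Gaussian noise to each coordinate."""
    rng = rng or random.Random(0)
    return [x + rng.gauss(0.0, sigma) for x in update]


attackers = pick_malicious(num_clients=20, ratio=0.3)
print(attackers)
print(flip_labels([0, 1, 2, 0]))
print(scale_update([0.1, -0.2]))
```

The backdoor attack is left out because the trigger pattern is not described here.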

Configuration

Dataset paths live in config/dataset_config.yaml.

Edit config/fl_config.yaml to change hyperparameters:

federated_learning:
  num_clients: 20       # Simulated IoT devices
  num_rounds: 100       # FL communication rounds
  fraction_fit: 0.5     # 50% clients participate per round
  local_epochs: 5       # Local training epochs
  local_lr: 0.001       # Adam learning rate

trust:
  alpha: 0.4            # Similarity weight
  beta: 0.4             # Accuracy weight
  gamma: 0.2            # Anomaly penalty weight
  memory_decay: 0.9     # EMA decay factor
  min_trust: 0.01       # Trust floor

verification:
  loss_threshold: 0.0   # Reject if ΔL < 0
  cosine_threshold: 0.0 # Flag if cos_sim < 0
  zscore_threshold: 2.5 # Flag if |z| > 2.5

Expected Results

Strategy	Accuracy	F1-Macro	ASR
FedAvg (no defense)	0.6026 ± 0.0114	0.5726 ± 0.0089	0.4012 ± 0.0253
Krum	0.8170 ± 0.0089	0.7891 ± 0.0108	0.1650 ± 0.0243
Trimmed Mean	0.8038 ± 0.0097	0.7755 ± 0.0104	0.2157 ± 0.0404
FLTrust	0.8642 ± 0.0121	0.8349 ± 0.0107	0.1357 ± 0.0388
FoolsGold	0.8081 ± 0.0109	0.7794 ± 0.0119	0.1908 ± 0.0352
FLAME	0.8256 ± 0.0082	0.8016 ± 0.0095	0.1878 ± 0.0308
RFA	0.8082 ± 0.0117	0.7811 ± 0.0130	0.2016 ± 0.0226
TV-FLIDS	0.8801 ± 0.0095	0.8502 ± 0.0141	0.1908 ± 0.0035

Values are mean ± std over 5 seeds (42, 123, 456, 789, 1337).
Full results: results/tables/full_comparison_results.json
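
The "mean ± std" format used in the table can be reproduced from per-seed values with the standard library. This is a sketch under assumptions: the helper name is hypothetical, the sample standard deviation (n − 1 denominator) is used because the text does not say which variant the paper reports, and the input numbers below are made up for illustration rather than taken from any experiment.

```python
from statistics import mean, stdev


def summarize(values, digits=4):
    """Format per-seed values as 'mean ± std' (sample standard deviation)."""
    return f"{mean(values):.{digits}f} ± {stdev(values):.{digits}f}"


# Made-up per-seed accuracies, one per seed, for illustration only
example_runs = [0.5, 0.7]
summary = summarize(example_runs)
print(summary)
```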

Project Structure

tv-flids/
├── config/                    # Hyperparameter configs (YAML)
│   ├── fl_config.yaml
│   └── dataset_config.yaml
├── data/
│   ├── preprocessing/         # NSL-KDD, UNSW-NB15 pipelines
│   └── partitioning.py        # IID & Non-IID (Dirichlet) partitioners
├── extras/
│   └── mnist_fl_pipeline.py          # MNIST FL benchmark (not used in paper)
├── models/
│   └── mlp.py                 # IDSMLP + IDSBiLSTM architectures
├── fl/
│   ├── client.py              # Flower FL client with attack injection
│   ├── strategy.py            # TVFLIDSStrategy (main novel contribution)
│   └── baselines/             # FedAvg, Krum, TrimMean, FLTrust, FoolsGold
├── trust/
│   ├── trust_scorer.py        # Fixed-weight trust scoring
│   ├── adaptive_trust_scorer.py  # Meta-gradient adaptive α,β,γ
│   └── verification.py        # Three-criteria verification gate
├── attacks/
│   └── adversarial.py         # 4 attack types + configuration registry
├── evaluation/
│   ├── metrics.py             # Accuracy, F1, ASR, FNR tracking
│   ├── statistical_testing.py # Wilcoxon, McNemar, multi-seed reporting
│   ├── visualization.py       # 6 paper-ready figure generators
│   └── overhead.py            # Time/communication cost analysis
├── theory/
│   ├── proposition1_verification.py  # Prop. 1 numerical verification
│   └── convergence_analysis.py       # Convergence rate (τ) fitting
├── experiments/
│   ├── run_experiment.py      # Main experiment runner (start here)
│   ├── run_ablation.py        # A1-A5 ablation studies
│   ├── run_ratio_sweep.py     # Adversarial ratio sweep
│   └── run_full_comparison.py # Multi-seed Table 1 reproduction
├── utils/
│   ├── seed.py                # Centralized seed management
│   └── logger.py              # JSON + TensorBoard logging
├── scripts/
│   ├── download_nslkdd.sh     # Dataset download
│   └── run_all_experiments.sh # Full paper reproduction
├── results/                   # Generated outputs (gitignored)
│   ├── logs/                  # Per-experiment JSON logs
│   ├── figures/               # PDF paper figures
│   └── tables/                # CSV/JSON result tables
├── requirements.txt
└── README.md

Known Limitations

Byzantine threshold: TV-FLIDS degrades when adversarial fraction exceeds ~50%.
At f/N > 0.5, the verification gate cannot reliably separate honest and malicious updates.
IID server assumption: The server validation set used for trust scoring must be
class-balanced and drawn from the same distribution as the global test set.
Distribution shift between server val and test data is not handled.
Single model architecture: All clients use the same MLP architecture.
Heterogeneous model support (e.g., different depths) is out of scope.
NSL-KDD age: NSL-KDD is a 2009-era dataset. Performance on modern IoT traffic
datasets (e.g., CIC-IoT23) is not evaluated.

Citation

If you use this code in your research, please cite:

@inproceedings{tvflids2025,
  title   = {TV-FLIDS: Trust-Aware and Verifiable Federated Learning for
             Intrusion Detection under Adaptive Byzantine Clients},
  author  = {Ali Akarma},
  booktitle = {},
  year    = {2025},
  url     = {https://github.com/aliakarma/tv-flids}
}

Key References

@inproceedings{blanchard2017nips,
  title={Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent},
  author={Blanchard et al.},
  booktitle={NeurIPS}, year={2017}
}

@inproceedings{cao2021fltrust,
  title={FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping},
  author={Cao et al.},
  booktitle={NDSS}, year={2021}
}

@inproceedings{mcmahan2017fedavg,
  title={Communication-Efficient Learning of Deep Networks from Decentralized Data},
  author={McMahan et al.},
  booktitle={AISTATS}, year={2017}
}

License

MIT License — see LICENSE for details.
