TV-FLIDS: Trust-Aware & Verifiable Federated Intrusion Detection System

Final Year Project | Research Paper Ready | IEEE IoT Journal Target

A production-ready implementation of a Byzantine-resilient Federated Learning system for IoT Intrusion Detection. TV-FLIDS defends against malicious clients through a unified three-criteria verification gate combined with dynamic memory-aware trust scoring.


Hardware Requirements

| Setup | Min RAM | Recommended GPU | Est. Full Run Time |
|---|---|---|---|
| CPU only | 8 GB | | ~8 hours |
| GPU (8 GB VRAM) | 16 GB | NVIDIA RTX 3080+ | ~90 min |

Set CPU parallelism: export TVFLIDS_SIM_CLIENT_CPUS=4


Overview

Standard Federated Learning is vulnerable to adversarial clients that poison the global model. TV-FLIDS addresses this by introducing:

  1. Verification Gate — Pre-aggregation filter checking loss consistency, gradient direction, and statistical outliers
  2. Trust Scoring — Dynamic per-client scores with exponential memory decay (T_i = α·S_i + β·A_i − γ·O_i)
  3. Adaptive Weights — Meta-gradient learning of α, β, γ from server validation loss
  4. Formal Guarantees — Proposition 1 bounding Byzantine influence under the trust floor
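
The trust rule in item 2 can be sketched in a few lines. This is a minimal illustration, not the repository's TrustScorer: the helper name `update_trust`, the assumption that S_i, A_i and O_i are already scaled to [0, 1], and the clipping to the range [min_trust, 1] are all assumptions made for the example. The default weights, decay factor, and floor mirror the values listed under Configuration.

```python
def update_trust(prev_trust, similarity, accuracy, anomaly,
                 alpha=0.4, beta=0.4, gamma=0.2,
                 memory_decay=0.9, min_trust=0.01):
    """One exponential-memory trust update for a single client (hypothetical helper)."""
    # Instantaneous score: T_i = alpha*S_i + beta*A_i - gamma*O_i
    instant = alpha * similarity + beta * accuracy - gamma * anomaly
    # Exponential memory: keep memory_decay of the old score, blend in the new one
    blended = memory_decay * prev_trust + (1.0 - memory_decay) * instant
    # Assumed clipping: never drop below the trust floor, never exceed 1
    return min(1.0, max(min_trust, blended))


trust_honest, trust_malicious = 0.5, 0.5
for _ in range(30):
    trust_honest = update_trust(trust_honest, similarity=0.9, accuracy=0.8, anomaly=0.0)
    trust_malicious = update_trust(trust_malicious, similarity=0.0, accuracy=0.2, anomaly=1.0)
print(round(trust_honest, 3), round(trust_malicious, 3))
```

With these illustrative inputs the honest client drifts toward its instantaneous score while the malicious client decays until it rests on the trust floor, which is the behavior the memory term is meant to provide.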

Architecture

NSL-KDD / UNSW-NB15
        │
        ▼
┌─────────────────────────────────────────────────────────────┐
│                    Flower FL Simulation                     │
│                                                             │
│  Client 1..N                                                │
│  ┌────────────┐                                             │
│  │ Local MLP  │──► Δw_i + val_loss_i ──────────────────────►│
│  └────────────┘         (per round)                         │
│                                                             │
│  TVFLIDSStrategy (Server)                                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  1. VerificationModule                                 │ │
│  │     ├─ Check 1: loss_after > loss_before?              │ │
│  │     ├─ Check 2: cosine_sim(Δw_i, mean_Δw) > threshold? │ │
│  │     └─ Check 3: z_score(||Δw_i||) < threshold?         │ │
│  │  2. TrustScorer (Adaptive)                             │ │
│  │     T_i(t) = 0.9·T_i(t-1) + 0.1·[α·S_i+β·A_i-γ·O_i]    │ │
│  │     Meta-gradient update on α, β, γ                    │ │
│  │  3. Weighted Aggregation                               │ │
│  │     w^{t+1} = Σ(T_i/ΣT) · w_i                          │ │
│  └────────────────────────────────────────────────────────┘ │
│                        │                                    │
│                   Global Model w^{t+1}                      │
└─────────────────────────────────────────────────────────────┘
        │
        ▼
  NSL-KDD Test Set Evaluation
  Metrics: Accuracy, F1-Macro, Attack Success Rate
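
The server-side steps in the diagram can be sketched with NumPy as follows. This is a rough illustration under stated assumptions, not the code in fl/strategy.py: the function names are hypothetical, the pass direction of Check 1 (an update passes when it does not increase the validation loss) is assumed, and rejected clients are simply given zero weight during aggregation. The thresholds are the defaults shown under Configuration.

```python
import numpy as np


def verification_gate(deltas, loss_before, loss_after,
                      cosine_threshold=0.0, zscore_threshold=2.5):
    """Return a boolean mask of client updates that pass all three checks."""
    deltas = np.asarray(deltas, dtype=float)          # shape (n_clients, n_params)
    loss_after = np.asarray(loss_after, dtype=float)

    # Check 1 (assumed direction): the update must not increase the loss
    loss_ok = loss_after <= loss_before

    # Check 2: cosine similarity between each update and the mean update
    mean_delta = deltas.mean(axis=0)
    norms = np.linalg.norm(deltas, axis=1)
    cos = deltas @ mean_delta / (norms * np.linalg.norm(mean_delta) + 1e-12)
    direction_ok = cos > cosine_threshold

    # Check 3: z-score of each update norm across the participating clients
    z = (norms - norms.mean()) / (norms.std() + 1e-12)
    norm_ok = np.abs(z) < zscore_threshold

    return loss_ok & direction_ok & norm_ok


def trust_weighted_average(weights, trust, accepted):
    """w^{t+1} = sum_i (T_i / sum T) * w_i, restricted to accepted clients."""
    weights = np.asarray(weights, dtype=float)
    t = np.asarray(trust, dtype=float) * accepted     # rejected clients get weight 0
    return (t[:, None] * weights).sum(axis=0) / t.sum()


rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(7, 5))
malicious = -honest[:1]                               # update pointing the wrong way
deltas = np.vstack([honest, malicious])
accepted = verification_gate(deltas, loss_before=1.0,
                             loss_after=[0.8] * 7 + [1.3])
aggregate = trust_weighted_average(deltas, np.full(8, 0.5), accepted)
print(accepted)
```

The sketch does not guard against the case where every client is rejected (the trust sum would be zero); a real strategy would need a fallback such as keeping the previous global model.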

Model Selection

The default model is IDSMLP (4-layer MLP with BatchNorm + Dropout).
An experimental IDSBiLSTM is available for sequential traffic analysis:

python experiments/run_experiment.py --strategy tvflids --attack label_flip_30 --model bilstm

The paper results use --model mlp (default).


Features

  • 6 FL strategies: FedAvg, Krum, Trimmed Mean, FLTrust, FoolsGold, TV-FLIDS (FLAME, RFA, and a fixed-weight TV-FLIDS variant are also available; see Available Strategies)
  • 4 attack types: Label Flip, Gradient Scaling, Noise Injection, Backdoor
  • IID & Non-IID data partitioning (Dirichlet α=0.5/0.1)
  • Publication-grade statistics: 5-seed mean±std, Wilcoxon, McNemar tests
  • 6 paper figures: Convergence, trust evolution, robustness curve, ablation, weight trajectory, confusion matrices
  • Overhead analysis: Per-round timing and communication cost measurement
  • Full ablation suite: A1–A5 component contribution analysis
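
For readers unfamiliar with Dirichlet-based non-IID splits, the idea behind the partitioning feature can be sketched as below. This is a generic label-skew partitioner written for illustration; the function name and signature are assumptions and may differ from what data/partitioning.py actually does.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=42):
    """Split sample indices across clients with Dirichlet-distributed class shares."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Share of this class given to each client; small alpha gives uneven shares
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices


labels = np.repeat([0, 1], [600, 400])                # toy binary labels
parts = dirichlet_partition(labels, num_clients=20, alpha=0.5)
print([len(p) for p in parts])
```

Lower values of alpha (for example 0.1) concentrate each class on fewer clients, which is why the README treats that setting as a stress test.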

Installation

Option A — Conda (Recommended)

conda env create -f environment.yml
conda activate tvflids
make data

Option B — pip

pip install -r requirements.txt
bash scripts/download_nslkdd.sh

Verify Installation

python -c "import torch; import flwr; print('PyTorch:', torch.__version__, '| Flower:', flwr.__version__)"
# Expected: PyTorch: 2.1.0 | Flower: 1.6.0
make smoke

Performance Notes

  • Simulation speed: TV-FLIDS adds ~15% overhead over FedAvg per round
    (verification gate + trust scoring; evaluation is shared).
  • CPU parallelism: Set TVFLIDS_SIM_CLIENT_CPUS to at most nproc / 2
    to avoid Ray resource starvation:
    export TVFLIDS_SIM_CLIENT_CPUS=4
    python experiments/run_experiment.py --strategy tvflids --attack label_flip_30
  • GPU usage: Set TVFLIDS_SIM_CLIENT_GPUS=0.1 per virtual client
    if your GPU has ≥ 8 GB VRAM.
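
If you launch experiments from a Python wrapper, the "at most nproc / 2" rule can be applied programmatically. The snippet below is only a sketch: it uses os.cpu_count(), which reports logical cores and may differ from nproc inside containers or under CPU affinity limits.

```python
import os

# Half of the logical cores, but never less than one
client_cpus = max(1, (os.cpu_count() or 2) // 2)
os.environ["TVFLIDS_SIM_CLIENT_CPUS"] = str(client_cpus)
print(f"TVFLIDS_SIM_CLIENT_CPUS={client_cpus}")
```

Setting the variable this way affects only the current Python process and any child processes it starts.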

Dataset Setup

NSL-KDD (Primary — required)

# Automatic download via script
bash scripts/download_nslkdd.sh

# OR via Python
python -c "from data.preprocessing.nslkdd_pipeline import download_nslkdd; download_nslkdd('data/raw/KDDTrain+.txt', 'data/raw/KDDTest+.txt')"

The download should produce:

data/raw/KDDTrain+.txt  → 125,973 rows
data/raw/KDDTest+.txt   → 22,544 rows
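
A quick way to confirm the row counts from Python is sketched below. The helper names are hypothetical, and the check assumes the files have no header row and one record per non-empty line.

```python
from pathlib import Path

EXPECTED_ROWS = {
    "data/raw/KDDTrain+.txt": 125_973,
    "data/raw/KDDTest+.txt": 22_544,
}


def count_rows(path):
    """Count non-empty lines in a text file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def check_row_counts(expected=EXPECTED_ROWS):
    """Return {path: (found, wanted)} for every file that is missing or mis-sized."""
    problems = {}
    for path, wanted in expected.items():
        found = count_rows(path) if Path(path).exists() else None
        if found != wanted:
            problems[path] = (found, wanted)
    return problems
```

Calling check_row_counts() from the repository root returns an empty dict when both files match; a missing file is reported with None as the found count.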

Reproducibility Guarantee

All stochastic operations are seeded via utils/seed.py::set_all_seeds(seed):

| Operation | Seeded via |
|---|---|
| NumPy global state | np.random.seed(seed) |
| NumPy RNGs | np.random.default_rng(seed) |
| PyTorch | torch.manual_seed(seed) + torch.backends.cudnn.deterministic=True |
| CUDA | torch.cuda.manual_seed_all(seed) |
| Python hash | os.environ["PYTHONHASHSEED"] = str(seed) |
| Attacks | seed + client_id per client |
| SMOTE | random_state=seed |
| Data partitioning | np.random.default_rng(seed) |

To reproduce Table 1, use seeds [42, 123, 456, 789, 1337].
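
For orientation, a function with the same name and the seeding calls listed above might look like the sketch below. It is a reconstruction from the table, not a copy of utils/seed.py: the optional PyTorch import, the extra random.seed call, and the returned Generator are assumptions added for the example.

```python
import os
import random

import numpy as np


def set_all_seeds(seed):
    """Seed the RNG sources listed in the table; return a NumPy Generator."""
    # Only affects hash randomization in interpreters started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)                       # not in the table; added as an assumption
    np.random.seed(seed)                    # legacy NumPy global state
    try:
        import torch                        # optional so the sketch runs without PyTorch
    except ImportError:
        torch = None
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)    # silently ignored when no GPU is present
        torch.backends.cudnn.deterministic = True
    return np.random.default_rng(seed)


rng = set_all_seeds(42)
client_id = 3
attack_rng = np.random.default_rng(42 + client_id)   # "seed + client_id per client"
```

Calling set_all_seeds again with the same seed restores both the global NumPy stream and the returned Generator to the same starting point.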

Data Integrity Verification

After downloading NSL-KDD, verify file integrity:

python scripts/verify_data.py

Expected SHA-256 hashes are recorded in scripts/verify_data.py. If the check fails, re-run bash scripts/download_nslkdd.sh.
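
The kind of check involved can be sketched with the standard library. The helper names below are hypothetical, and the expected digests themselves must come from scripts/verify_data.py; none are reproduced here.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat regardless of file size, which matters more for UNSW-NB15 than for NSL-KDD.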

UNSW-NB15 (Secondary — optional)

Download from: https://research.unsw.edu.au/projects/unsw-nb15-dataset

Place the files at: data/raw/UNSW_NB15_training-set.csv and data/raw/UNSW_NB15_testing-set.csv


Quick Start

Run a single experiment

# TV-FLIDS vs 30% label flip attack (primary experiment)
python experiments/run_experiment.py --strategy tvflids --attack label_flip_30

# FedAvg baseline (no defense, shows vulnerability)
python experiments/run_experiment.py --strategy fedavg --attack label_flip_30

# Clean baseline (no attacks, upper bound)
python experiments/run_experiment.py --strategy fedavg --attack no_attack

# FLTrust comparison (SOTA baseline)
python experiments/run_experiment.py --strategy fltrust --attack label_flip_30

Run with options

# Custom seed and rounds (fast test)
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --seed 123 \
    --rounds 20

# IID data partition
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --partition iid

# Extreme non-IID (stress test)
python experiments/run_experiment.py \
    --strategy tvflids \
    --attack label_flip_30 \
    --partition noniid \
    --alpha 0.1

# Different attack types
python experiments/run_experiment.py --strategy tvflids --attack gradient_scale_30
python experiments/run_experiment.py --strategy tvflids --attack noise_30
python experiments/run_experiment.py --strategy tvflids --attack backdoor_20

Testing

Integration tests will auto-download NSL-KDD if needed.

python tests/test_all.py
python tests/test_integration.py

Reproducing Paper Results

Table 1 — Full Strategy Comparison

python experiments/run_full_comparison.py \
    --strategies fedavg krum trimmed_mean fltrust foolsgold tvflids \
    --attack label_flip_30 \
    --seeds 42 123 456 789 1337 \
    --rounds 100

Figure 3 — Robustness Curve

python experiments/run_ratio_sweep.py \
    --methods fedavg fltrust tvflids \
    --ratios 0.0 0.1 0.2 0.3 0.4 0.5 \
    --seeds 42 123 456 \
    --rounds 100

Table 2 — Ablation Study

python experiments/run_ablation.py \
    --attack label_flip_30 \
    --rounds 100 \
    --seeds 42 123 456 789 1337

All Experiments (Full Reproduction)

# Runs everything — expect 2-4 hours on CPU, ~30 min on GPU
bash scripts/run_all_experiments.sh

Reproducing Paper Tables

After running all experiments:

# Generate LaTeX Table 1
python scripts/generate_tables.py \
    --input results/tables/full_comparison_results.json \
    --output results/tables/table1.tex

# Check result completeness before submission
python scripts/check_results.py

Theoretical Verification

Proposition 1 and Lemma 1 (trust convergence) can be verified numerically:

# Synthetic verification (no data required)
python theory/proposition1_verification.py

# Full theory validation suite
python scripts/run_theory_validation.py

Results are saved to results/tables/proposition1_real.json.


Available Strategies

| Strategy | Reference | Key Property |
|---|---|---|
| fedavg | McMahan et al., 2017 | Standard baseline (no defense) |
| krum | Blanchard et al., NeurIPS 2017 | Nearest-neighbor selection |
| trimmed_mean | Yin et al., ICML 2018 | Coordinate-wise robust mean |
| fltrust | Cao et al., NDSS 2021 | Server-root trust bootstrapping |
| foolsgold | Fung et al., 2018 | Sybil resistance via history |
| flame | Nguyen et al., USENIX Security 2022 | HDBSCAN + adaptive noise |
| rfa | Pillutla et al., IEEE TSP 2022 | Geometric median (Weiszfeld) |
| tvflids | This work | 3-criteria gate + adaptive trust |
| tvflids_fixed | This work | TV-FLIDS with fixed α, β, γ |

Available Attacks

| Config Key | Type | Ratio | Description |
|---|---|---|---|
| no_attack | | 0% | Clean baseline |
| label_flip_10/20/30 | Data | 10/20/30% | Flip attack→Normal labels |
| gradient_scale_10/30 | Model | 10/30% | Amplify gradient ×10 |
| noise_30 | Model | 30% | Gaussian noise (σ=0.5) |
| backdoor_20 | Data | 20% | Trigger pattern insertion |
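
To make the attack descriptions concrete, three of them can be sketched with NumPy as below. These are simplified stand-ins, not the code in attacks/adversarial.py: the function names, the use of label 0 for Normal, the choice to flip every attack sample, and the reading of Ratio as the fraction of malicious clients are all assumptions.

```python
import numpy as np


def label_flip(labels, normal_label=0):
    """Data poisoning: relabel every attack sample as Normal."""
    labels = np.asarray(labels)
    return np.where(labels != normal_label, normal_label, labels)


def gradient_scale(delta, factor=10.0):
    """Model poisoning: amplify the update by a constant factor."""
    return factor * np.asarray(delta, dtype=float)


def noise_injection(delta, sigma=0.5, seed=0):
    """Model poisoning: add zero-mean Gaussian noise to the update."""
    rng = np.random.default_rng(seed)
    delta = np.asarray(delta, dtype=float)
    return delta + rng.normal(0.0, sigma, size=delta.shape)


def pick_malicious(num_clients, ratio, seed=42):
    """Choose which client ids are adversarial for a given ratio."""
    rng = np.random.default_rng(seed)
    count = int(round(ratio * num_clients))
    return sorted(rng.choice(num_clients, size=count, replace=False).tolist())


print(label_flip([0, 1, 1, 0]), pick_malicious(20, 0.3))
```

The backdoor attack is omitted because the trigger pattern is not described in this README.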

Configuration

Edit config/fl_config.yaml to change hyperparameters (dataset paths live in config/dataset_config.yaml):

federated_learning:
  num_clients: 20       # Simulated IoT devices
  num_rounds: 100       # FL communication rounds
  fraction_fit: 0.5     # 50% clients participate per round
  local_epochs: 5       # Local training epochs
  local_lr: 0.001       # Adam learning rate

trust:
  alpha: 0.4            # Similarity weight
  beta: 0.4             # Accuracy weight
  gamma: 0.2            # Anomaly penalty weight
  memory_decay: 0.9     # EMA decay factor
  min_trust: 0.01       # Trust floor

verification:
  loss_threshold: 0.0   # Reject if ΔL < 0
  cosine_threshold: 0.0 # Flag if cos_sim < 0
  zscore_threshold: 2.5 # Flag if |z| > 2.5

Expected Results

| Strategy | Accuracy | F1-Macro | ASR |
|---|---|---|---|
| FedAvg (no defense) | 0.6026 ± 0.0114 | 0.5726 ± 0.0089 | 0.4012 ± 0.0253 |
| Krum | 0.8170 ± 0.0089 | 0.7891 ± 0.0108 | 0.1650 ± 0.0243 |
| Trimmed Mean | 0.8038 ± 0.0097 | 0.7755 ± 0.0104 | 0.2157 ± 0.0404 |
| FLTrust | 0.8642 ± 0.0121 | 0.8349 ± 0.0107 | 0.1357 ± 0.0388 |
| FoolsGold | 0.8081 ± 0.0109 | 0.7794 ± 0.0119 | 0.1908 ± 0.0352 |
| FLAME | 0.8256 ± 0.0082 | 0.8016 ± 0.0095 | 0.1878 ± 0.0308 |
| RFA | 0.8082 ± 0.0117 | 0.7811 ± 0.0130 | 0.2016 ± 0.0226 |
| TV-FLIDS | 0.8801 ± 0.0095 | 0.8502 ± 0.0141 | 0.1908 ± 0.0035 |

Values are mean ± std over 5 seeds (42, 123, 456, 789, 1337).
Full results: results/tables/full_comparison_results.json
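
The mean ± std format can be reproduced from per-seed values as sketched below. The input numbers are placeholders chosen for the example, not results from this project, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption; the repository's reporting code may use the population form instead.

```python
from statistics import mean, stdev


def summarize(per_seed_values, digits=4):
    """Format a list of per-seed metric values as 'mean ± std'."""
    m = mean(per_seed_values)
    s = stdev(per_seed_values)      # sample std (n - 1); assumed
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Placeholder accuracies for five seeds (illustrative only, not real results)
illustrative = [0.80, 0.82, 0.81, 0.79, 0.83]
print(summarize(illustrative))
```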


Project Structure

tv-flids/
├── config/                    # Hyperparameter configs (YAML)
│   ├── fl_config.yaml
│   └── dataset_config.yaml
├── data/
│   ├── preprocessing/         # NSL-KDD, UNSW-NB15 pipelines
│   └── partitioning.py        # IID & Non-IID (Dirichlet) partitioners
├── extras/
│   └── mnist_fl_pipeline.py          # MNIST FL benchmark (not used in paper)
├── models/
│   └── mlp.py                 # IDSMLP + IDSBiLSTM architectures
├── fl/
│   ├── client.py              # Flower FL client with attack injection
│   ├── strategy.py            # TVFLIDSStrategy (main novel contribution)
│   └── baselines/             # FedAvg, Krum, TrimMean, FLTrust, FoolsGold
├── trust/
│   ├── trust_scorer.py        # Fixed-weight trust scoring
│   ├── adaptive_trust_scorer.py  # Meta-gradient adaptive α,β,γ
│   └── verification.py        # Three-criteria verification gate
├── attacks/
│   └── adversarial.py         # 4 attack types + configuration registry
├── evaluation/
│   ├── metrics.py             # Accuracy, F1, ASR, FNR tracking
│   ├── statistical_testing.py # Wilcoxon, McNemar, multi-seed reporting
│   ├── visualization.py       # 6 paper-ready figure generators
│   └── overhead.py            # Time/communication cost analysis
├── theory/
│   ├── proposition1_verification.py  # Prop. 1 numerical verification
│   └── convergence_analysis.py       # Convergence rate (τ) fitting
├── experiments/
│   ├── run_experiment.py      # Main experiment runner (start here)
│   ├── run_ablation.py        # A1-A5 ablation studies
│   ├── run_ratio_sweep.py     # Adversarial ratio sweep
│   └── run_full_comparison.py # Multi-seed Table 1 reproduction
├── utils/
│   ├── seed.py                # Centralized seed management
│   └── logger.py              # JSON + TensorBoard logging
├── scripts/
│   ├── download_nslkdd.sh     # Dataset download
│   └── run_all_experiments.sh # Full paper reproduction
├── results/                   # Generated outputs (gitignored)
│   ├── logs/                  # Per-experiment JSON logs
│   ├── figures/               # PDF paper figures
│   └── tables/                # CSV/JSON result tables
├── requirements.txt
└── README.md

Known Limitations

  • Byzantine threshold: TV-FLIDS degrades when the adversarial fraction exceeds ~50%.
    At f/N > 0.5, the verification gate cannot reliably separate honest from malicious updates.
  • IID server assumption: The server validation set used for trust scoring must be
    class-balanced and drawn from the same distribution as the global test set.
    Distribution shift between server val and test data is not handled.
  • Single model architecture: All clients use the same MLP architecture.
    Heterogeneous model support (e.g., different depths) is out of scope.
  • NSL-KDD age: NSL-KDD is a 2009-era dataset. Performance on modern IoT traffic
    datasets (e.g., CIC-IoT23) is not evaluated.

Citation

If you use this code in your research, please cite:

@inproceedings{tvflids2025,
  title   = {TV-FLIDS: Trust-Aware and Verifiable Federated Learning for
             Intrusion Detection under Adaptive Byzantine Clients},
  author  = {Ali Akarma},
  booktitle = {},
  year    = {2025},
  url     = {https://github.com/aliakarma/tv-flids}
}

Key References

@inproceedings{blanchard2017nips,
  title={Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent},
  author={Blanchard et al.},
  booktitle={NeurIPS}, year={2017}
}

@inproceedings{cao2021fltrust,
  title={FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping},
  author={Cao et al.},
  booktitle={NDSS}, year={2021}
}

@inproceedings{mcmahan2017fedavg,
  title={Communication-Efficient Learning of Deep Networks from Decentralized Data},
  author={McMahan et al.},
  booktitle={AISTATS}, year={2017}
}

License

MIT License — see LICENSE for details.
