This project implements human activity recognition using WiFi signals instead of cameras. It uses Channel State Information (CSI), turns it into images, and classifies activities with a Vision Transformer (ViT) model. The project report is available here.
- Install Python and dependencies:

  ```
  conda create -n wihar python=3.9
  conda activate wihar
  pip install -r requirements.txt
  ```

- Download the NTU-Fi HAR CSI files and place them in:

  ```
  data/NTU-Fi_HAR/train_amp/<activity_name>/*.mat
  data/NTU-Fi_HAR/test_amp/<activity_name>/*.mat
  ```
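Before generating images, it can help to peek at what each `.mat` file actually contains. The variable names stored inside NTU-Fi files are dataset-specific, so this sketch (using SciPy, assumed to be among the dependencies) simply lists whatever is present:

```python
from scipy.io import loadmat

def inspect_csi_mat(path):
    """Return the name and shape of each variable stored in a .mat CSI file.

    The actual variable names (e.g. the CSI amplitude array) depend on the
    dataset export; this just lists what is there, skipping MATLAB metadata.
    """
    data = loadmat(path)
    return {k: v.shape for k, v in data.items() if not k.startswith("__")}

# Example: inspect_csi_mat("data/NTU-Fi_HAR/train_amp/run/1.mat")
```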
- Generate images from the CSI data:

  ```
  python cfr_out.py
  ```

  This creates images in `data/cfr_dataset/train/` and `data/cfr_dataset/test/`.
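Conceptually, the CSI-to-image step amounts to normalizing a CSI amplitude matrix (subcarriers × packets) and saving it as a grayscale image at the ViT input size. The following is a hedged sketch of that idea, not the actual contents of `cfr_out.py`, which may differ:

```python
import numpy as np
from PIL import Image

def csi_to_image(amplitude, size=(224, 224)):
    """Min-max normalize a CSI amplitude matrix to 0-255 and resize it.

    `amplitude` is assumed to be a 2-D array (subcarriers x packets);
    the 224x224 output matches a standard ViT input resolution.
    """
    a = np.asarray(amplitude, dtype=np.float32)
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)  # scale to [0, 1]
    img = Image.fromarray((a * 255).astype(np.uint8), mode="L")
    return img.resize(size)

# Example: csi_to_image(np.random.rand(114, 500)).save("sample.png")
```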
- Create a validation split by moving a portion (10–20%) of the images from each class in `train/` to a new `val/` folder (moving rather than copying keeps the train and validation sets disjoint):

  ```
  data/cfr_dataset/
  ├── train/
  ├── val/
  └── test/
  ```
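The split above can be scripted. A minimal sketch, assuming the `data/cfr_dataset/train/<class>/` layout produced by `cfr_out.py`:

```python
import os
import random
import shutil

def make_val_split(root, ratio=0.15, seed=0):
    """Move `ratio` of each class's images from root/train to root/val.

    Assumes root/train/<class>/*.png as produced by the image-generation
    step; adjust the paths if your layout differs.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_dir = os.path.join(root, "train")
    for cls in sorted(os.listdir(train_dir)):
        src = os.path.join(train_dir, cls)
        dst = os.path.join(root, "val", cls)
        os.makedirs(dst, exist_ok=True)
        images = sorted(os.listdir(src))
        n_val = max(1, int(len(images) * ratio))
        for name in rng.sample(images, n_val):
            shutil.move(os.path.join(src, name), os.path.join(dst, name))

# Example: make_val_split("data/cfr_dataset", ratio=0.15)
```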
- Train the model:

  ```
  python vit_train.py fit \
    --data.dataset custom \
    --data.root data/cfr_dataset/ \
    --data.num_classes 6 \
    --data.batch_size 32 \
    --trainer.max_steps 1800 \
    --trainer.check_val_every_n_epoch 2 \
    --model.warmup_steps 180 \
    --model.lr 0.01
  ```

- Test the model:
  ```
  python vit_train.py test \
    --ckpt_path <path_to_best_checkpoint> \
    --data.dataset custom \
    --data.root data/cfr_dataset/ \
    --data.num_classes 6
  ```

  Replace `<path_to_best_checkpoint>` with the path to your best checkpoint file.
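The `fit`/`test` subcommands and `--trainer.*` flags suggest a PyTorch Lightning CLI, which by default writes checkpoints under `lightning_logs/`. If that assumption holds for your setup, a small helper can locate the newest checkpoint for `--ckpt_path`:

```python
import glob
import os

def latest_checkpoint(log_dir="lightning_logs"):
    """Return the newest .ckpt file under log_dir, or None if none exist.

    The default lightning_logs/ location is an assumption; point log_dir
    at wherever your training run saved its checkpoints.
    """
    ckpts = glob.glob(os.path.join(log_dir, "**", "*.ckpt"), recursive=True)
    return max(ckpts, key=os.path.getmtime) if ckpts else None

# Example: print(latest_checkpoint())
```

Note this picks the most recently written checkpoint, not necessarily the best-scoring one; check your checkpoint filenames if the training run saves a "best" checkpoint by validation metric.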
Thanks to SenseFi and vit-finetune.