Kun Wang · Haifeng Bao* · Weining Fang
State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University
📄 Official implementation of the DriverHOI model from our paper submitted to Reliability Engineering & System Safety.
To handle the spatially dense layout of display and control devices on train driving consoles, we designed DriverHOI, a model based on Graph Parsing Neural Networks (GPNN). The model fuses 3D hand poses with geometric priors of the devices, builds a "Driver–Device" heterogeneous graph, and performs iterative message passing to jointly infer the driving action and the interaction object.
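As a rough illustration of what iterative message passing over a driver-device graph involves, the sketch below runs a few rounds on a star graph with one driver node and several device nodes. It is not the DriverHOI architecture: the dot-product link score, the fixed `mix` factor, and the function name `message_passing` are assumptions that stand in for the learned link, message, and update functions of a GPNN.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def message_passing(driver, devices, steps=2, mix=0.5):
    """Run `steps` rounds of message passing on a driver-device star graph.

    driver  : feature vector of the driver (hand) node
    devices : list of feature vectors, one per device node
    Returns the updated driver vector, the updated device vectors, and the
    final soft link weights (one per device, summing to 1).
    """
    dim = len(driver)
    weights = []
    for _ in range(steps):
        # Link step (assumed): score each driver-device edge by a dot product.
        scores = [sum(a * b for a, b in zip(driver, dev)) for dev in devices]
        weights = softmax(scores)
        # Message to the driver: link-weighted sum of device features.
        to_driver = [sum(w * dev[k] for w, dev in zip(weights, devices))
                     for k in range(dim)]
        # Message to each device: driver features scaled by that edge's weight.
        to_devices = [[w * driver[k] for k in range(dim)] for w in weights]
        # Update step (assumed): fixed interpolation instead of a learned update.
        driver = [(1 - mix) * driver[k] + mix * to_driver[k] for k in range(dim)]
        devices = [[(1 - mix) * dev[k] + mix * msg[k] for k in range(dim)]
                   for dev, msg in zip(devices, to_devices)]
    return driver, devices, weights

# Toy input: the first device points the same way as the driver feature.
driver_out, devices_out, link_weights = message_passing(
    driver=[1.0, 0.0],
    devices=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
)
print([round(w, 3) for w in link_weights])
```

In this toy input the device whose features align with the driver node keeps the largest link weight across rounds, which is the intuition behind reading the interaction object off the graph.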
Performance under 10-fold Leave-One-Subject-Out (LOSO) cross-validation:
| Metric | Score |
|---|---|
| Overall Accuracy | 94.0% |
| Action Accuracy | 98.3% |
| Interaction Accuracy | 99.1% |
| Device Top-1 Accuracy | 94.6% |
| Device Top-3 Accuracy | 99.8% |
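The scores above could be computed from per-sample predictions along the following lines. This is a minimal sketch under stated assumptions: the record layout (`action_pred`, `action_true`, `device_scores`, `device_true`) is hypothetical, and Overall Accuracy is assumed to count a sample as correct only when both the action and the top-1 device are right. Interaction Accuracy is left out because this README does not define it.

```python
def topk_hit(scores, target, k):
    """True if `target` is among the k highest-scoring device indices."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return target in ranked[:k]

def evaluate(samples):
    """Compute action, device top-1/top-3, and joint accuracy.

    Each sample is a dict with the hypothetical keys
    action_pred, action_true, device_scores, device_true.
    """
    n = len(samples)
    action_ok = [s["action_pred"] == s["action_true"] for s in samples]
    top1_ok = [topk_hit(s["device_scores"], s["device_true"], 1) for s in samples]
    top3_ok = [topk_hit(s["device_scores"], s["device_true"], 3) for s in samples]
    # Assumption: "overall" means action and top-1 device are both correct.
    joint_ok = [a and d for a, d in zip(action_ok, top1_ok)]
    return {
        "action_acc": sum(action_ok) / n,
        "device_top1_acc": sum(top1_ok) / n,
        "device_top3_acc": sum(top3_ok) / n,
        "overall_acc": sum(joint_ok) / n,
    }

demo = [
    {"action_pred": "press", "action_true": "press",
     "device_scores": [0.1, 0.7, 0.2], "device_true": 1},
    {"action_pred": "push", "action_true": "press",
     "device_scores": [0.6, 0.3, 0.1], "device_true": 0},
]
print(evaluate(demo))
```

Under this definition the overall score can never exceed the action accuracy or the device top-1 accuracy.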
The DriverHOI3D dataset was collected in a high-fidelity 1:1 train driving simulator. It contains 1,856 samples from 10 subjects, covering 47 valid action–device interaction pairs.
| Item | Detail |
|---|---|
| Subjects | 10 participants |
| Samples | 1,856 |
| Actions | 4 types — Point, Press, Push, Swing |
| Devices | 31 console devices + 1 "No Interaction" |
| Camera Views | 4 synchronized views per sample |
| Annotations | RGB image, 3D hand keypoints (21 joints/hand), camera parameters |
📂 Data Directory Structure
DriverHOI3D/
├── calibration/
│   ├── intri.yml
│   └── extri.yml
├── subject1/
│   ├── point/
│   │   ├── device0/
│   │   │   ├── MBP25030012/   # Camera view 1
│   │   │   ├── MBP25030014/   # Camera view 2
│   │   │   ├── MBP25030016/   # Camera view 3
│   │   │   ├── MBP25030017/   # Camera view 4
│   │   │   └── post_json/     # 3D keypoint annotations
│   │   ├── device1/
│   │   └── ...
│   ├── press/
│   ├── push/
│   └── swing/
├── subject2/
└── ...
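The layout above can be walked with the standard library alone. The sketch below is not part of the repository: the function name `iter_device_folders` and the returned record keys are hypothetical, and it treats every subfolder of a device folder other than `post_json` as a camera view.

```python
from pathlib import Path

ACTIONS = ("point", "press", "push", "swing")

def iter_device_folders(data_root):
    """Yield one record per subject/action/device folder under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        return
    for subject_dir in sorted(root.glob("subject*")):
        for action in ACTIONS:
            action_dir = subject_dir / action
            if not action_dir.is_dir():
                continue
            for device_dir in sorted(action_dir.glob("device*")):
                if not device_dir.is_dir():
                    continue
                # Assumption: all subfolders except post_json are camera views.
                views = sorted(
                    p.name for p in device_dir.iterdir()
                    if p.is_dir() and p.name != "post_json"
                )
                yield {
                    "subject": subject_dir.name,
                    "action": action,
                    "device": device_dir.name,
                    "views": views,
                    "keypoints_dir": device_dir / "post_json",
                }

for record in iter_device_folders("/path/to/DriverHOI3D"):
    print(record["subject"], record["action"], record["device"], record["views"])
```

Note that `sorted` orders folder names lexicographically, so `subject10` is listed before `subject2`.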
git clone https://github.com/wang-10086/DriverHOI.git
cd DriverHOI
pip install -r requirements.txt
Edit config.py to set your dataset path:
DATA_ROOT = "/path/to/DriverHOI3D"
Train a single fold, holding out one subject for testing (subject1 in this example):
python train.py --test_subj subject1 --exp_name subject1 --model_type DriverHOI --num_epochs 50 --lr 1e-3
Run the full cross-validation:
python run_cv.py --model_type DriverHOI --job_name driverhoi_exp01
This trains 10 folds sequentially, each time leaving one subject out for testing.
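The leave-one-subject-out split itself is simple to state in code. The sketch below only illustrates the idea and is not taken from run_cv.py; the helper name `loso_folds` is hypothetical, and the subject names `subject1` to `subject10` are assumed from the dataset's directory naming.

```python
def loso_folds(subjects):
    """Return (train_subjects, test_subject) pairs, one per held-out subject."""
    return [
        ([s for s in subjects if s != held_out], held_out)
        for held_out in subjects
    ]

# Assumed naming: subject1 ... subject10, following the data directory layout.
subjects = [f"subject{i}" for i in range(1, 11)]

for train_subjects, test_subject in loso_folds(subjects):
    print(f"test={test_subject} train={len(train_subjects)} subjects")
```

Because no subject appears in both the training and test set of a fold, each fold measures how well the model generalizes to an unseen driver.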
Single fold: edit CKPT_PATH and TEST_SUBJ in test.py, then run:
python test.py
Batch evaluation (all folds): edit JOB_NAME in test_cv.py to match the training job name, then run:
python test_cv.py
The framework supports four ablation configurations to verify the contribution of each input feature:
| Mode | Pose | Geom | Visual | Act Acc | Top-1 Acc | Overall Acc |
|---|---|---|---|---|---|---|
| Baseline | ✅ | ✅ | ❌ | 98.3% | 94.6% | 94.0% |
| No_Pose | ❌ | ✅ | ❌ | 89.7% | 94.1% | 84.9% |
| No_Geom | ✅ | ❌ | ❌ | 97.7% | 56.6% | 55.5% |
| With_Visual | ✅ | ✅ | ✅ | 92.8% | 93.9% | 88.1% |
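To read the table as differences from the baseline, the short sketch below subtracts each configuration's scores from the Baseline row. The numbers are copied from the table above; the dictionary layout and the helper name `drop_from_baseline` are illustrative only. `Decimal` is used so the one-decimal percentages subtract exactly.

```python
from decimal import Decimal

# Accuracy values (percent) copied from the ablation table above.
RESULTS = {
    "Baseline":    {"act": "98.3", "top1": "94.6", "overall": "94.0"},
    "No_Pose":     {"act": "89.7", "top1": "94.1", "overall": "84.9"},
    "No_Geom":     {"act": "97.7", "top1": "56.6", "overall": "55.5"},
    "With_Visual": {"act": "92.8", "top1": "93.9", "overall": "88.1"},
}

def drop_from_baseline(results, baseline="Baseline"):
    """Return, per mode and metric, baseline minus mode in percentage points."""
    base = results[baseline]
    return {
        mode: {m: Decimal(base[m]) - Decimal(v) for m, v in scores.items()}
        for mode, scores in results.items()
        if mode != baseline
    }

for mode, drops in drop_from_baseline(RESULTS).items():
    print(mode, {metric: str(delta) for metric, delta in drops.items()})
```

With these numbers, removing the geometric priors costs 38.0 points of device Top-1 accuracy, while removing pose costs 8.6 points of action accuracy.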
Run the non-baseline configurations by passing --ablation to run_cv.py:
python run_cv.py --model_type DriverHOI --ablation no_pose --job_name ablation_no_pose
python run_cv.py --model_type DriverHOI --ablation no_geom --job_name ablation_no_geom
python run_cv.py --model_type DriverHOI --ablation with_visual --job_name ablation_visual
We compare three interaction reasoning mechanisms under the same feature encoding:
| Model | Mechanism | Act Acc | Top-1 Acc | Overall Acc |
|---|---|---|---|---|
| TransHOI | Attention-based | 97.2% | 92.9% | 90.6% |
| SCG-HOI | SCG-based (static graph) | 96.8% | 91.6% | 88.4% |
| DriverHOI | GPNN (dynamic graph) | 98.3% | 94.6% | 94.0% |
Train the two comparison models with run_cv.py:
python run_cv.py --model_type TransHOI --job_name transhoi_exp01
python run_cv.py --model_type SCG-HOI --job_name scghoi_exp01
This project is released for academic research purposes.