
CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation

Bohao Li  |  Zhicheng Cao  |  Huixian Li  |  Yangming Guo

Paper (arXiv:2603.09418)

Abstract

State-of-the-art whole-body pose estimators often lack robustness, producing anatomically implausible predictions in challenging scenes. We posit this failure stems from spurious correlations learned from visual context, a problem we formalize using a Structural Causal Model (SCM). The SCM identifies visual context as a confounder that creates a non-causal backdoor path, corrupting the model's reasoning. We introduce the Causal Intervention Graph Pose (CIGPose) framework to address this by approximating the true causal effect between visual evidence and pose. The core of CIGPose is a novel Causal Intervention Module: it first identifies confounded keypoint representations via predictive uncertainty and then replaces them with learned, context-invariant canonical embeddings. These deconfounded embeddings are processed by a hierarchical graph neural network that reasons over the human skeleton at both local and global semantic levels to enforce anatomical plausibility. Extensive experiments show CIGPose achieves a new state-of-the-art on COCO-WholeBody. Notably, our CIGPose-x model achieves 67.0% AP, surpassing prior methods that rely on extra training data. With the additional UBody dataset, CIGPose-x is further boosted to 67.5% AP, demonstrating superior robustness and data efficiency.
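For orientation, here is the textbook backdoor-adjustment identity behind this argument, written in Pearl's notation. It assumes the visual context C opens a backdoor path F ← C → Y between the visual evidence F and the pose Y, and that C satisfies the backdoor criterion. These symbols and the exact form of the SCM are assumptions made for illustration, not the paper's equations. CIGPose approximates the interventional quantity rather than evaluating this sum explicitly.

```latex
% Interventional (what CIGPose targets) vs. observational (what a standard estimator learns)
P(Y \mid do(F)) = \sum_{c} P(Y \mid F, C = c)\, P(C = c)
\qquad \text{vs.} \qquad
P(Y \mid F) = \sum_{c} P(Y \mid F, C = c)\, P(C = c \mid F)
```

The two expressions differ only in how each context is weighted. The observational form uses P(C = c | F), so contexts that merely co-occur with certain features get extra weight, and this is where spurious context-pose correlations enter.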


📖 Introduction

CIGPose is a whole-body pose estimation framework that improves robustness in challenging scenes (e.g., occlusion, clutter, difficult lighting) by explicitly addressing visual confounding via a causal perspective.

It is implemented as an MMPose project under mmpose/projects/cigpose.

  • Causal formulation: Model visual context as a confounder and target the interventional distribution P(Y|do(F)) instead of the observational P(Y|F), where F denotes the visual evidence and Y the pose.
  • Causal Intervention Module (CIM): Use predictive uncertainty to identify confounded keypoint embeddings and replace them with learned, context-invariant canonical embeddings.
  • Hierarchical graph reasoning: Perform local (intra-part) and global (inter-part) message passing on deconfounded embeddings to enforce anatomical plausibility.
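The block below is one possible way to formalize the two modules. All of the following are illustrative assumptions for this README and are not the paper's exact equations: the notation, the hard threshold τ, the mean pooling into part nodes, and the residual combination. For example, the actual module may use a soft, uncertainty-weighted replacement instead. The sketch only shows the structure the bullets describe. Uncertain keypoint embeddings are swapped for canonical ones, and the result is then refined within body parts and across body parts.

```latex
% Hypothetical notation: f_k = embedding of keypoint k (k = 1..K, K = 133 for COCO-WholeBody),
% u_k = its predictive uncertainty, \tau = a threshold, e_k = learned canonical embedding of keypoint k.
m_k = \mathbb{1}\left[u_k > \tau\right], \qquad
\tilde{f}_k = (1 - m_k)\, f_k + m_k\, e_k

% Local (intra-part) message passing over skeleton neighbours \mathcal{N}(k) inside part \mathcal{P}_{\pi(k)}:
h_k = \sigma\Big(W_{\mathrm{loc}}\, \tilde{f}_k + \sum_{j \in \mathcal{N}(k)} W_{\mathrm{loc}}\, \tilde{f}_j\Big)

% Global (inter-part) message passing between part nodes g_p, with \mathcal{M}(p) the parts adjacent to p:
g_p = \frac{1}{|\mathcal{P}_p|} \sum_{k \in \mathcal{P}_p} h_k, \qquad
\hat{g}_p = \sigma\Big(W_{\mathrm{glob}}\, g_p + \sum_{q \in \mathcal{M}(p)} W_{\mathrm{glob}}\, g_q\Big), \qquad
z_k = h_k + \hat{g}_{\pi(k)}
```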

🖼️ Visualizations

1) Overall results and a typical failure case caused by confounding

  • (a) Quantitative comparison on COCO-WholeBody val: CIGPose achieves strong accuracy while remaining data-efficient.
  • (b) Qualitative comparison vs. RTMPose-x: baseline predictions may latch onto spurious background cues; CIGPose mitigates this by intervening on confounded keypoint representations, producing more anatomically plausible poses.

2) Qualitative comparison on challenging scenes

  • From left to right: input image, RTMPose-x, CIGPose-x.
  • CIGPose is more robust under common confounders (e.g., occlusion and clutter), yielding cleaner and more coherent whole-body structures.

3) More qualitative results

  • Additional examples illustrating the robustness gains and improved anatomical consistency brought by causal intervention and hierarchical graph reasoning.

📊 Model Zoo & Results

Results on COCO-WholeBody v1.0 val

| Config | Input Size | FLOPs (G) | Body AP | Foot AP | Face AP | Hand AP | Whole AP | ckpt |
|---|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 2.3 | 69.0 | 64.3 | 82.1 | 49.7 | 59.9 | pth |
| CIGPose-l | 256x192 | 4.6 | 71.2 | 69.0 | 83.3 | 54.0 | 62.6 | pth |
| CIGPose-l | 384x288 | 10.7 | 73.0 | 72.0 | 88.3 | 59.8 | 66.3 | pth |
| CIGPose-x | 384x288 | 18.7 | 73.5 | 72.3 | 88.1 | 60.2 | 67.0 | pth |
| CIGPose-l+UBody | 256x192 | 4.6 | 71.3 | 66.2 | 83.4 | 55.5 | 63.1 | pth |
| CIGPose-l+UBody | 384x288 | 10.7 | 73.1 | 72.3 | 88.0 | 61.2 | 66.9 | pth |
| CIGPose-x+UBody | 384x288 | 18.7 | 73.5 | 70.3 | 88.4 | 62.6 | 67.5 | pth |

Results on COCO val2017

| Config | Input Size | FLOPs (G) | Params (M) | AP | AR | ckpt |
|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 1.9 | 14 | 76.6 | 79.3 | pth |
| CIGPose-l | 256x192 | 4.2 | 28 | 77.6 | 80.3 | pth |
| CIGPose-l | 384x288 | 9.4 | 29 | 78.5 | 81.1 | pth |

Results on CrowdPose test set

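| Config | Input Size | Params (M) | AP | AP (easy) | AP (medium) | AP (hard) | ckpt |
|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 14.4 | 71.4 | 81.0 | 72.7 | 58.9 | pth |
| CIGPose-l | 256x192 | 28.4 | 73.7 | 82.8 | 75.1 | 61.2 | pth |
| CIGPose-l | 384x288 | 28.8 | 74.2 | 82.9 | 75.6 | 62.5 | pth |
| CIGPose-x | 384x288 | 50.4 | 75.8 | 84.2 | 77.3 | 63.6 | pth |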

🛠️ Installation

Our code is based on MMPose.

# 1. Create a conda environment
conda create -n cigpose python=3.8 -y
conda activate cigpose

# 2. Install PyTorch (adjust CUDA version as needed)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html

# 3. Install MMEngine, MMCV, and MMDetection via MIM
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"

# 4. Clone this repository and install its bundled MMPose in editable mode
git clone https://github.com/53mins/CIGPose.git
cd CIGPose/mmpose
pip install -v -e .
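A quick way to confirm the environment is to import each package installed above and print its version. The helper below (check_pkg) is hypothetical and not part of the repository. It reports failures instead of aborting, so an MMCV version that MMPose or MMDetection rejects at import time typically shows up as a FAILED line rather than a crash.

```bash
#!/usr/bin/env bash
# Hypothetical post-install check (not part of the repository).
PY=$(command -v python || command -v python3)

# Print "<package>: <version>", "NOT INSTALLED", or "FAILED (...)" for one package.
check_pkg() {
  "$PY" - "$1" <<'EOF'
import importlib
import sys

name = sys.argv[1]
try:
    module = importlib.import_module(name)
    print(f"{name}: {getattr(module, '__version__', 'unknown')}")
except ModuleNotFoundError as exc:
    # Only call it "not installed" if the package itself is missing.
    print(f"{name}: NOT INSTALLED" if exc.name == name else f"{name}: FAILED ({exc})")
except Exception as exc:  # e.g. version-compatibility assertions at import time
    print(f"{name}: FAILED ({type(exc).__name__}: {exc})")
EOF
}

for pkg in torch torchvision mmengine mmcv mmdet mmpose; do
  check_pkg "$pkg"
done

# Report whether PyTorch can see a GPU (skipped if torch is missing).
"$PY" - <<'EOF'
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("CUDA check skipped: torch not installed")
EOF
```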

🏃 Usage

Data Preparation

Please refer to the MMPose guidelines for dataset preparation.
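As a pre-flight check before training, the hypothetical helper below (check_coco_wholebody, not part of the repository) looks for the COCO-WholeBody files in the layout described by the MMPose dataset documentation. It assumes the CIGPose configs keep MMPose's default data root, data/coco; pass a different root if they do not. The person-detection file is only needed for top-down evaluation with detected boxes. COCO, CrowdPose, and UBody have their own layouts and are not covered.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): check the COCO-WholeBody
# layout used by MMPose's default configs. The root defaults to data/coco.
check_coco_wholebody() {
  local root="${1:-data/coco}"
  local missing=0
  local path
  for path in \
    annotations/coco_wholebody_train_v1.0.json \
    annotations/coco_wholebody_val_v1.0.json \
    person_detection_results/COCO_val2017_detections_AP_H_56_person.json \
    train2017 \
    val2017; do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $root/$path"
      missing=$((missing + 1))
    fi
  done
  echo "$missing missing"
  # Non-zero exit status if anything is missing, so it can gate a training script.
  [ "$missing" -eq 0 ]
}
```

Run it from the MMPose root with `check_coco_wholebody data/coco`. It prints one line per missing path followed by a final count.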

⭐Train a model

# Run from the repository root (CIGPose/)
cd mmpose
bash tools/dist_train.sh mmpose/projects/cigpose/wholebody_2d_keypoint/cigpose-l_8xb32-420e_coco-wholebody-384x288.py 8
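The config name appears to follow MMPose's naming convention. In that convention, 8xb32 means 8 GPUs with 32 images each, and 420e means 420 epochs. The hypothetical helper parse_schedule (not part of the repository) decodes that part of the name. This is useful when adapting the command to a different number of GPUs.

```bash
#!/usr/bin/env bash
# Decode "<gpus>xb<batch_per_gpu>-<epochs>e" from an MMPose-style config name.
# Assumes CIGPose configs follow MMPose's naming convention.
parse_schedule() {
  local name sched
  name=$(basename "$1" .py)
  # Extract e.g. "8 32 420" from "..._8xb32-420e_...".
  sched=$(printf '%s\n' "$name" |
    sed -n 's/.*_\([0-9][0-9]*\)xb\([0-9][0-9]*\)-\([0-9][0-9]*\)e_.*/\1 \2 \3/p')
  if [ -z "$sched" ]; then
    echo "unrecognized config name: $name" >&2
    return 1
  fi
  set -- $sched  # split "8 32 420" into $1 $2 $3
  echo "gpus=$1 per_gpu=$2 total_batch=$(($1 * $2)) epochs=$3"
}

# Example: the training config used above.
parse_schedule cigpose-l_8xb32-420e_coco-wholebody-384x288.py
# prints: gpus=8 per_gpu=32 total_batch=256 epochs=420

# Training on fewer GPUs shrinks the total batch (4 x 32 = 128 instead of 256).
# MMPose's tools/train.py accepts --auto-scale-lr to rescale the learning rate,
# provided the config defines auto_scale_lr; dist_train.sh forwards arguments
# after the GPU count:
# CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_train.sh <config> 4 --auto-scale-lr
```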

⭐Test a model

# Arguments: <config> <checkpoint> <num_gpus>
bash tools/dist_test.sh mmpose/projects/cigpose/wholebody_2d_keypoint/cigpose-l_8xb32-420e_coco-wholebody-384x288.py path/to/checkpoint.pth 8

🙏 Acknowledgements

  • This project is built on top of MMPose, and follows its training/testing utilities and dataset conventions.

📄 License

This project is released under the license included in the LICENSE file.

📚 Citation

If you find CIGPose useful in your research, please consider citing:

@article{li2026cigpose,
  title={CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation},
  author={Li, Bohao and Cao, Zhicheng and Li, Huixian and Guo, Yangming},
  journal={arXiv preprint arXiv:2603.09418},
  year={2026}
}

[CVPR 2026] CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation