Bohao Li | Zhicheng Cao | Huixian Li | Yangming Guo
State-of-the-art whole-body pose estimators often lack robustness, producing anatomically implausible predictions in challenging scenes. We posit this failure stems from spurious correlations learned from visual context, a problem we formalize using a Structural Causal Model (SCM). The SCM identifies visual context as a confounder that creates a non-causal backdoor path, corrupting the model's reasoning. We introduce the Causal Intervention Graph Pose (CIGPose) framework to address this by approximating the true causal effect of visual evidence on pose. The core of CIGPose is a novel Causal Intervention Module: it first identifies confounded keypoint representations via predictive uncertainty and then replaces them with learned, context-invariant canonical embeddings. These deconfounded embeddings are processed by a hierarchical graph neural network that reasons over the human skeleton at both local and global semantic levels to enforce anatomical plausibility. Extensive experiments show CIGPose achieves a new state-of-the-art on COCO-WholeBody. Notably, our CIGPose-x model achieves 67.0% AP, surpassing prior methods that rely on extra training data. With the additional UBody dataset, CIGPose-x is further boosted to 67.5% AP, demonstrating superior robustness and data efficiency.
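Concretely, with the visual context acting as the confounder between visual features and the predicted pose, the interventional distribution targeted above is identifiable through the standard backdoor adjustment (here \(F\) denotes visual features, \(Y\) the pose, and \(c\) ranges over context configurations; this is the textbook identity, not a formula taken from the paper's derivation):

```latex
P(Y \mid do(F)) \;=\; \sum_{c} P(Y \mid F, c)\, P(c)
```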
- 📖 Introduction
- 🖼️ Visualizations
- 📊 Model Zoo & Results
- 🛠️ Installation
- 🏃 Usage
- 🙏 Acknowledgements
- 📄 License
- 📚 Citation
CIGPose is a whole-body pose estimation framework that improves robustness in challenging scenes (e.g., occlusion, clutter, difficult lighting) by explicitly addressing visual confounding via a causal perspective.
It is implemented as an MMPose project under `mmpose/projects/cigpose`.
- Causal formulation: Model visual context as a confounder and target the interventional distribution P(Y|do(F)) instead of the observational P(Y|F).
- Causal Intervention Module (CIM): Use predictive uncertainty to identify confounded keypoint embeddings and replace them with learned, context-invariant canonical embeddings.
- Hierarchical graph reasoning: Perform local (intra-part) and global (inter-part) message passing on deconfounded embeddings to enforce anatomical plausibility.
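As a rough illustration of how these two stages compose, here is a minimal sketch in plain Python. All names (`intervene`, `hierarchical_pass`, `PARTS`, the threshold `tau`) are hypothetical stand-ins, not identifiers from the actual CIGPose code, and simple averaging replaces the learned graph message passing.

```python
# Toy skeleton: 5 keypoints grouped into two "parts" (e.g. left arm, right arm).
# Embeddings are plain lists of floats; the real model uses learned tensors.
PARTS = [[0, 1, 2], [3, 4]]

def intervene(embeddings, uncertainties, canonical, tau=0.5):
    """CIM-style intervention: replace high-uncertainty (confounded) keypoint
    embeddings with context-invariant canonical embeddings."""
    return [canonical[k] if uncertainties[k] > tau else embeddings[k]
            for k in range(len(embeddings))]

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def hierarchical_pass(embeddings, parts):
    """One round of local (intra-part) then global (inter-part) smoothing,
    a crude stand-in for the hierarchical graph message passing."""
    # Local: mix each keypoint with the mean of its own part.
    local = list(embeddings)
    for part in parts:
        mean = average([embeddings[k] for k in part])
        for k in part:
            local[k] = [(e + m) / 2 for e, m in zip(embeddings[k], mean)]
    # Global: mix each part-smoothed embedding with the whole-body mean.
    body = average(local)
    return [[(e + b) / 2 for e, b in zip(v, body)] for v in local]
```

In this sketch the gating is a hard threshold on a scalar uncertainty; the actual module operates on learned embeddings and canonical codes, but the control flow (detect confounded keypoints, substitute, then propagate over the skeleton graph) is the same.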
- (a) Quantitative comparison on COCO-WholeBody val: CIGPose achieves strong accuracy while remaining data-efficient.
- (b) Qualitative comparison vs. RTMPose-x: baseline predictions may latch onto spurious background cues; CIGPose mitigates this by intervening on confounded keypoint representations, producing more anatomically plausible poses.
- From left to right: input image, RTMPose-x, CIGPose-x.
- CIGPose is more robust under common confounders (e.g., occlusion and clutter), yielding cleaner and more coherent whole-body structures.
- Additional examples further illustrating the robustness gains and improved anatomical consistency brought by causal intervention + hierarchical graph reasoning.
| Config | Input Size | FLOPs (G) | Body AP | Foot AP | Face AP | Hand AP | Whole AP | ckpt |
|---|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 2.3 | 69.0 | 64.3 | 82.1 | 49.7 | 59.9 | pth |
| CIGPose-l | 256x192 | 4.6 | 71.2 | 69.0 | 83.3 | 54.0 | 62.6 | pth |
| CIGPose-l | 384x288 | 10.7 | 73.0 | 72.0 | 88.3 | 59.8 | 66.3 | pth |
| CIGPose-x | 384x288 | 18.7 | 73.5 | 72.3 | 88.1 | 60.2 | 67.0 | pth |
| CIGPose-l+UBody | 256x192 | 4.6 | 71.3 | 66.2 | 83.4 | 55.5 | 63.1 | pth |
| CIGPose-l+UBody | 384x288 | 10.7 | 73.1 | 72.3 | 88.0 | 61.2 | 66.9 | pth |
| CIGPose-x+UBody | 384x288 | 18.7 | 73.5 | 70.3 | 88.4 | 62.6 | 67.5 | pth |
| Config | Input Size | FLOPs (G) | Params (M) | AP | AR | ckpt |
|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 1.9 | 14 | 76.6 | 79.3 | pth |
| CIGPose-l | 256x192 | 4.2 | 28 | 77.6 | 80.3 | pth |
| CIGPose-l | 384x288 | 9.4 | 29 | 78.5 | 81.1 | pth |
| Config | Input Size | Params (M) | AP | AP easy | AP medium | AP hard | ckpt |
|---|---|---|---|---|---|---|---|
| CIGPose-m | 256x192 | 14.4 | 71.4 | 81.0 | 72.7 | 58.9 | pth |
| CIGPose-l | 256x192 | 28.4 | 73.7 | 82.8 | 75.1 | 61.2 | pth |
| CIGPose-l | 384x288 | 28.8 | 74.2 | 82.9 | 75.6 | 62.5 | pth |
| CIGPose-x | 384x288 | 50.4 | 75.8 | 84.2 | 77.3 | 63.6 | pth |
Our code is based on MMPose.
# 1. Create a conda environment
conda create -n cigpose python=3.8 -y
conda activate cigpose
# 2. Install PyTorch (adjust CUDA version as needed)
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
# 3. Install MMCV and MMDetection
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
# 4. Clone this repository
git clone https://github.com/53mins/CIGPose.git
cd CIGPose/mmpose
pip install -v -e .
Please refer to the MMPose guidelines for dataset preparation. Training and testing use the standard MMPose entry points:
cd mmpose
bash tools/dist_train.sh mmpose/projects/cigpose/wholebody_2d_keypoint/cigpose-l_8xb32-420e_coco-wholebody-384x288.py 8
bash tools/dist_test.sh mmpose/projects/cigpose/wholebody_2d_keypoint/cigpose-l_8xb32-420e_coco-wholebody-384x288.py path/to/checkpoint.pth 8
- This project is built on top of MMPose, and follows its training/testing utilities and dataset conventions.
This project is released under the license in the LICENSE file.
If you find CIGPose useful in your research, please consider citing:
@article{li2026cigpose,
title={CIGPose: Causal Intervention Graph Neural Network for Whole-Body Pose Estimation},
author={Li, Bohao and Cao, Zhicheng and Li, Huixian and Guo, Yangming},
journal={arXiv preprint arXiv:2603.09418},
year={2026}
}

