Heatmap-Free Lightweight Pose Estimation via Multiplicative Feature Interaction and Occlusion-Aware Training
Installation | Quick Start | Model Zoo | Ablations | Citation
StarNet-Pose is an efficient 2D human pose estimation framework built on the lightweight StarNet backbone, which relies on element-wise multiplicative feature interaction. It achieves state-of-the-art accuracy among lightweight top-down pose estimators while running at up to 313.8 FPS on an NVIDIA RTX 5090 (batch size 1).
The core contributions:
- StarNet Backbone: A lightweight CNN architecture based on the "Star Operation" (element-wise multiplication of two feature branches), achieving strong feature representation with significantly fewer parameters than conventional backbones.
- Multiplicative Feature Interaction: Replaces traditional additive feature fusion (e.g., skip connections in HRNet) with multiplicative interaction, enabling richer feature representation at lower computational cost.
- Occlusion-Aware Training: Systematic evaluation of CoarseDropout augmentation strategies for improving robustness under heavy occlusion.
- Two Efficient Variants: StarNet-Pose-T (3.5M params, 72.00 AP, 313.8 FPS) and StarNet-Pose-S (6.4M params, 72.99 AP, 173.5 FPS).
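As a rough illustration of why element-wise multiplication of two branches can be expressive, the sketch below reduces the star operation to a single output channel with two purely linear branches (no bias, no activation). That reduction is an assumption made for clarity, and `star` and `pairwise_expansion` are hypothetical helper names, not part of this repository. The product of the two branches expands algebraically into a weighted sum over all pairwise input products.

```python
import random


def star(w1, w2, x):
    """Star operation on one output channel: (w1 . x) * (w2 . x)."""
    dot1 = sum(a * b for a, b in zip(w1, x))
    dot2 = sum(a * b for a, b in zip(w2, x))
    return dot1 * dot2


def pairwise_expansion(w1, w2, x):
    """The same value written as a sum over every product x_i * x_j."""
    d = len(x)
    return sum(w1[i] * w2[j] * x[i] * x[j] for i in range(d) for j in range(d))


random.seed(0)
d = 8
x = [random.uniform(-1, 1) for _ in range(d)]
w1 = [random.uniform(-1, 1) for _ in range(d)]
w2 = [random.uniform(-1, 1) for _ in range(d)]

# Both forms agree up to floating-point error.
assert abs(star(w1, w2, x) - pairwise_expansion(w1, w2, x)) < 1e-9

# Number of distinct x_i * x_j terms (i <= j) touched by one multiplication.
print(d * (d + 1) // 2)  # 36 for d = 8
```

A single multiplication of two d-dimensional projections therefore mixes d(d+1)/2 distinct second-order terms without materializing them, which is one way to read the "richer representation at lower cost" claim above.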
This repository provides the official implementation of "StarNet-Pose: Heatmap-Free Lightweight Pose Estimation via Multiplicative Feature Interaction and Occlusion-Aware Training", accepted at Neurocomputing.
Built upon MMPose, the OpenMMLab Pose Estimation Toolbox.
| Feature | Description |
|---|---|
| Lightweight | Two variants: 3.5M (Tiny) and 6.4M (Small) parameters |
| Ultra-Fast | Up to 313.8 FPS (Tiny) on RTX 5090 |
| High Precision | 72.99 AP (Small) on COCO val2017, surpassing RTMPose-s by +1.4 AP |
| Occlusion Robust | Occlusion-aware training via CoarseDropout augmentation |
| Plug-and-Play | Compatible with RTMPose head and full MMPose ecosystem |
| Two Variants | StarNet-Pose-T (Tiny) and StarNet-Pose-S (Small) |
- Python >= 3.8
- PyTorch >= 1.8 (Install Guide)
- CUDA (optional, for GPU training)
StarNet-Pose is built upon MMPose. First install the base framework:
# Install MMEngine & MMCV
pip install mmengine mmcv
# Install MMPose (the full framework)
git clone https://github.com/open-mmlab/mmpose.git
cd mmpose
pip install -e .
cd ..
git clone https://github.com/lechan775/starnet-pose.git
cd starnet-pose
# Install dependencies
pip install -r requirements.txt
# Copy backbone modules into your MMPose installation
cp -r mmpose/models/backbones/* <mmpose_install_path>/mmpose/models/backbones/
cp -r configs/body_2d_keypoint/rtmpose/coco/* <mmpose_install_path>/configs/body_2d_keypoint/rtmpose/coco/
Pre-trained StarNet backbone weights (ImageNet-1K) and StarNet-Pose model checkpoints are distributed via GitHub Releases.
# StarNet-S1 backbone (for StarNet-Pose-T)
wget https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s1.pth.tar
# StarNet-S3 backbone (for StarNet-Pose-S)
wget https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s3.pth.tar
Full StarNet-Pose checkpoints are available at GitHub Releases.
import torch
from starnet import starnet_s3
# Load pre-trained StarNet-S3 (ImageNet-1K)
model = starnet_s3(pretrained=True)
model.eval()
x = torch.randn(1, 3, 224, 224)
feat = model(x)  # 1000-way ImageNet logits
cd <mmpose_install_path>
# Train StarNet-Pose-S (recommended: 8 GPUs)
GPUS=8 bash tools/dist_train.sh \
configs/body_2d_keypoint/rtmpose/coco/rtmpose_starnetca-s3_8xb256-420e_coco-256x192.py 8
# Fast evaluation on COCO-mini
python tools/train.py \
configs/body_2d_keypoint/rtmpose/coco/rtmpose_starnetca-s3_8xb256-420e_coco-256x192_mini.py
# Evaluate trained model on COCO val2017
python tools/test.py \
configs/body_2d_keypoint/rtmpose/coco/rtmpose_starnetca-s3_8xb256-420e_coco-256x192.py \
<path_to_checkpoint>.pth
# Compute FLOPs and parameter count
python tools/analysis_tools/get_flops.py \
configs/body_2d_keypoint/rtmpose/coco/rtmpose_starnetca-s3_8xb256-420e_coco-256x192.py
# Measure inference latency and FPS
python tools/benchmark_latency.py \
configs/body_2d_keypoint/rtmpose/coco/rtmpose_starnetca-s3_8xb256-420e_coco-256x192.py \
<path_to_checkpoint>.pth
| Model | Backbone | Params (M) | GFLOPs | AP | AP⁵⁰ | AP⁷⁵ | AR | Latency (ms) | FPS |
|---|---|---|---|---|---|---|---|---|---|
| StarNet-Pose-T | StarNet-S1 | 3.504 | 0.435 | 72.00 | 91.51 | 79.62 | 75.03 | 3.19 | 313.8 |
| StarNet-Pose-S | StarNet-S3 + CA | 6.428 | 0.765 | 72.99 | 91.64 | 80.69 | 76.01 | 5.76 | 173.5 |
FPS is measured on an NVIDIA RTX 5090 under a unified PyTorch/MMPose forward benchmark (batch size 1). StarNet-Pose-T surpasses RTMPose-t by +3.8 AP while running at a higher frame rate.
| Method | Backbone | Params (M) | GFLOPs | AP | AP⁵⁰ | AP⁷⁵ | AR |
|---|---|---|---|---|---|---|---|
| RTMPose-t | CSPNeXt-t | 3.34 | 0.360 | 68.20 | 88.30 | 75.90 | 73.60 |
| RTMPose-s | CSPNeXt-s | 5.47 | 0.680 | 71.60 | 89.20 | 78.90 | 76.80 |
| Lite-HRNet-30 | Lite-HRNet-30 | 1.80 | 0.319 | 67.20 | 88.00 | 75.00 | 73.30 |
| X-HRNet-30 | X-HRNet-30 | 2.10 | 0.300 | 67.40 | 87.50 | 75.40 | 73.50 |
| LMFormer-L | LMFormer-L | 4.10 | 1.400 | 68.90 | 88.30 | 76.40 | 74.70 |
| LGM-Pose | LGM-Pose | 1.10 | 0.600 | 69.30 | 89.50 | 76.20 | 73.70 |
| StarNet-Pose-T | StarNet-S1 | 3.504 | 0.435 | 72.00 | 91.51 | 79.62 | 75.03 |
| StarNet-Pose-S | StarNet-S3 + CA | 6.428 | 0.765 | 72.99 | 91.64 | 80.69 | 76.01 |
| Model | Params (M) | FLOPs (G) | AP | AP⁷⁵ | AR | Latency (ms) | FPS |
|---|---|---|---|---|---|---|---|
| Lite-HRNet-18 | 1.10 | 0.205 | 64.80 | 73.00 | 71.20 | 18.14 | 55.1 |
| RTMPose-t | 3.34 | 0.360 | 68.20 | 75.90 | 73.60 | 3.51 | 285.1 |
| StarNet-Pose-T | 3.504 | 0.435 | 72.00 | 79.62 | 75.03 | 3.19 | 313.8 |
| Lite-HRNet-30 | 1.80 | 0.319 | 67.20 | 75.00 | 73.30 | 30.38 | 32.9 |
| RTMPose-s | 5.47 | 0.680 | 71.60 | 78.90 | 76.80 | 3.71 | 269.6 |
| StarNet-Pose-S | 6.428 | 0.765 | 72.99 | 80.69 | 76.01 | 5.76 | 173.5 |
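The latency and FPS columns can be cross-checked against each other. The sketch below assumes FPS is simply the reciprocal of per-image latency at batch size 1. The table does not state this, so treat it as a plausible reading rather than the benchmark script's definition. `fps_from_latency` is a hypothetical helper, and the numbers are copied from the table above.

```python
def fps_from_latency(latency_ms):
    """Frames per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


# (latency in ms, reported FPS), copied from the speed comparison table.
reported = {
    "Lite-HRNet-18": (18.14, 55.1),
    "RTMPose-t": (3.51, 285.1),
    "StarNet-Pose-T": (3.19, 313.8),
    "Lite-HRNet-30": (30.38, 32.9),
    "RTMPose-s": (3.71, 269.6),
    "StarNet-Pose-S": (5.76, 173.5),
}

for name, (latency_ms, fps) in reported.items():
    derived = fps_from_latency(latency_ms)
    # Small gaps are expected because latency is rounded to two decimals.
    print(f"{name:15s} derived {derived:6.1f} FPS, reported {fps:6.1f} FPS")
```

Under that assumption every row agrees to within about half a percent, so the two columns appear to describe the same measurement.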
| Component | StarNet-Pose-T (Tiny) | StarNet-Pose-S (Small) |
|---|---|---|
| Stem | 3×3 Conv-BN-ReLU6, stride 2 | 3×3 Conv-BN-ReLU6, stride 2 |
| Stage 1 | 24 ch, depth 2, no CA | 32 ch, depth 2, no CA |
| Stage 2 | 48 ch, depth 2, no CA | 64 ch, depth 2, no CA |
| Stage 3 | 96 ch, depth 8, no CA | 128 ch, depth 8, CA enabled |
| Stage 4 | 192 ch, depth 3, no CA | 256 ch, depth 4, CA enabled |
| Head | RTMCCHead (SimCC), in-ch 192 | RTMCCHead (SimCC), in-ch 256 |
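To make the table concrete, the following sketch walks a 256×192 input through the four stages and reports the feature-map shape after each one. It assumes the stem and every stage each halve the spatial resolution once (stride 2), which matches the stage resolutions shown in the architecture diagram. `stage_shapes` is a hypothetical helper, not a function from this repository.

```python
def stage_shapes(height=256, width=192, channels=(24, 48, 96, 192)):
    """Return (channels, height, width) after each of the four stages."""
    # The stem halves the resolution once (stride 2).
    h, w = height // 2, width // 2
    shapes = []
    for c in channels:
        # Each stage begins with a stride-2 downsampling convolution.
        h, w = h // 2, w // 2
        shapes.append((c, h, w))
    return shapes


tiny = stage_shapes(channels=(24, 48, 96, 192))    # StarNet-Pose-T widths
small = stage_shapes(channels=(32, 64, 128, 256))  # StarNet-Pose-S widths

for i, (t, s) in enumerate(zip(tiny, small), start=1):
    print(f"Stage {i}: Tiny {t}  Small {s}")
```

The last entry, 192 or 256 channels at 8×6, is what the RTMCCHead receives, consistent with the `in-ch` values in the Head row.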
| Configuration | AP | AP⁵⁰ | AP⁷⁵ | AR | Best AP (epoch) |
|---|---|---|---|---|---|
| StarNet baseline (w/o attention) | 50.62 | 77.94 | 54.08 | 54.42 | 50.67 (180) |
| StarNet + CBAM | 49.54 | 77.21 | 54.47 | 53.35 | 50.21 (190) |
| StarNet + CA | 49.65 | 78.15 | 51.26 | 53.46 | 50.60 (170) |
| Configuration | Backbone | CoarseDropout | AP |
|---|---|---|---|
| RTMPose-T baseline | CSPNeXt-Tiny | 1.0 to 0.5 schedule | 42.14 |
| StarNet-S1 (w/o CA) | StarNet-S1 | 1.0 to 0.5 schedule | 47.30 |
| StarNetCA-S1 | StarNet-S1 | 1.0 to 0.5 schedule | 46.86 |
| StarNetCA-S1 + fixed dropout | StarNet-S1 | Fixed p=0.6 | 47.13 |
| StarNet-S1 + fixed dropout | StarNet-S1 | Fixed p=0.6 | 47.48 |
To reproduce these experiments:
bash experiments/occlusion_prob_study/run_fast_track_experiments.sh
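The two CoarseDropout settings compared above can be sketched in plain Python. The README does not say how the scheduled variant moves from 1.0 to 0.5, so this example assumes a step switch for the final epochs, in the style of RTMPose's two-stage pipelines. The 30-epoch second stage, the 3×3 hole size, and the helper names `dropout_prob` and `coarse_dropout` are all illustrative assumptions. Only the 420-epoch length comes from the config file names.

```python
import random


def dropout_prob(epoch, max_epochs=420, stage2_epochs=30, p_start=1.0, p_end=0.5):
    """Assumed step schedule: p_start for most of training, p_end at the end."""
    return p_end if epoch >= max_epochs - stage2_epochs else p_start


def coarse_dropout(image, p, hole_h=3, hole_w=3, rng=random):
    """With probability p, zero one random rectangle of a 2D image (list of rows)."""
    out = [row[:] for row in image]  # never modify the input in place
    if rng.random() >= p:
        return out
    h, w = len(image), len(image[0])
    top = rng.randrange(h - hole_h + 1)
    left = rng.randrange(w - hole_w + 1)
    for r in range(top, top + hole_h):
        for c in range(left, left + hole_w):
            out[r][c] = 0
    return out


image = [[1] * 10 for _ in range(10)]
occluded = coarse_dropout(image, p=dropout_prob(epoch=0), rng=random.Random(0))
print(sum(v == 0 for row in occluded for v in row))  # 9 pixels masked
```

The fixed-probability rows in the table would correspond to calling `coarse_dropout(image, p=0.6)` at every epoch instead of consulting a schedule.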
Input (256×192×3)
       │
       ▼
┌──────────────┐
│ Stem (Conv)  │  3×3 Conv-BN-ReLU6, stride=2
└──────┬───────┘
       ▼
┌──────────────┐
│   Stage 1    │  StarBlock ×N₁ (no CA)
│   (64×48)    │
└──────┬───────┘
       ▼  Downsample (3×3 Conv, stride=2)
┌──────────────┐
│   Stage 2    │  StarBlock ×N₂ (no CA)
│   (32×24)    │
└──────┬───────┘
       ▼  Downsample
┌──────────────┐
│   Stage 3    │  StarBlock ×N₃ (CA enabled in S variant)
│   (16×12)    │
└──────┬───────┘
       ▼  Downsample
┌──────────────┐
│   Stage 4    │  StarBlock ×N₄ (CA enabled in S variant)
│    (8×6)     │
└──────┬───────┘
       ▼
┌──────────────┐
│  RTMCC Head  │  SimCC-based coordinate classification
└──────┬───────┘
       ▼
Keypoints (17 × 2·(W+H) logits)
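The output size in the last line of the diagram follows from how SimCC represents coordinates: each keypoint gets one 1D classification vector for x and one for y, with the number of bins equal to the input size times a split ratio. The sketch below assumes a split ratio of 2.0 (the RTMPose default) and plain argmax decoding. The actual MMPose codec may apply further steps such as score extraction, and `simcc_bins` and `decode_simcc` are hypothetical helpers.

```python
def simcc_bins(input_w=192, input_h=256, split_ratio=2.0):
    """Number of x-bins and y-bins per keypoint for a given input size."""
    return int(input_w * split_ratio), int(input_h * split_ratio)


def decode_simcc(x_logits, y_logits, split_ratio=2.0):
    """Argmax-decode per-keypoint logits into (x, y) input-pixel coordinates."""
    keypoints = []
    for xs, ys in zip(x_logits, y_logits):
        x_bin = max(range(len(xs)), key=xs.__getitem__)
        y_bin = max(range(len(ys)), key=ys.__getitem__)
        # Dividing by the split ratio maps bin indices back to pixels.
        keypoints.append((x_bin / split_ratio, y_bin / split_ratio))
    return keypoints


w_bins, h_bins = simcc_bins()
print(w_bins, h_bins, w_bins + h_bins)  # 384 512 896 logits per keypoint

# One toy keypoint whose logits peak at x-bin 100 and y-bin 301.
xs = [0.0] * w_bins
ys = [0.0] * h_bins
xs[100], ys[301] = 5.0, 3.0
print(decode_simcc([xs], [ys]))  # [(50.0, 150.5)]
```

With 17 COCO keypoints the head therefore emits 17 × 896 logits for a 256×192 crop, and the split ratio of 2 gives half-pixel resolution without any 2D heatmap.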
The StarBlock core operation (multiplicative feature interaction):
      ┌─────────┐          ┌─────────┐
x ───▶│ DWConv  │───▶ f₁ ─▶│  ReLU6  │───┐
      │   7×7   │          └─────────┘   │ element-wise
      └─────────┘                        ├──▶ multiply ──▶ g ──▶ DWConv2 ──▶ + ──▶ out
      ┌─────────┐                        │                                   ▲
      │ DWConv  │───▶ f₂ ────────────────┘                                   │
      │   7×7   │                                         input ─────────────┘ (residual)
      └─────────┘
Coordinate Attention (applied after the g projection in StarNet-Pose-S):
x (B,C,H,W) ──┬──▶ Pool H (B,C,H,1) ──┐
              │                       ├──▶ Concat ──▶ Conv1×1+BN+ReLU ──▶ Split
              └──▶ Pool W (B,C,1,W) ──┘                          │                   │
                                                                 ▼                   ▼
                                                          ConvH → Sigmoid     ConvW → Sigmoid
                                                                 │                   │
                                                                 └────▶ x × attnH × attnW ──▶ out
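The gating structure in the diagram can be illustrated on a single channel. This is a deliberately stripped-down sketch: the shared 1×1 convolution with BN and ReLU, and the per-direction convolutions, are replaced by the identity so that only the pooling, sigmoid, and broadcast multiplication remain. It is not the repository's `coordinate_attention.py`, and `coordinate_gate` is a hypothetical name.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_gate(x):
    """Apply row-wise and column-wise sigmoid gates to one H x W feature map."""
    h, w = len(x), len(x[0])
    # "Pool H": average over the width, leaving one value per row (H x 1).
    pooled_h = [sum(row) / w for row in x]
    # "Pool W": average over the height, leaving one value per column (1 x W).
    pooled_w = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    # Learned convolutions are omitted here (identity), then sigmoid gating.
    attn_h = [sigmoid(v) for v in pooled_h]
    attn_w = [sigmoid(v) for v in pooled_w]
    # out = x * attnH * attnW, with each gate broadcast along the other axis.
    return [[x[i][j] * attn_h[i] * attn_w[j] for j in range(w)] for i in range(h)]


feature = [[1.0, 1.0], [1.0, 1.0]]
gated = coordinate_gate(feature)
print(gated[0][0])  # equals sigmoid(1.0) ** 2 for this constant input
```

Because each position is scaled by one row gate and one column gate, the module encodes position along both axes at a cost that grows with H + W rather than H × W.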
See figures/ for detailed architecture diagrams (.drawio format).
starnet-pose/
├── starnet.py                          # Standalone StarNet implementation (no MMPose dep.)
├── mmpose/models/backbones/
│   ├── starnet.py                      # StarNet backbone for MMPose (StarNet-S1/S3)
│   ├── starnet_ca.py                   # StarNetCA backbone (StarNet + Coordinate Attention)
│   └── utils/
│       └── coordinate_attention.py     # Coordinate Attention module (CVPR 2021)
├── configs/body_2d_keypoint/rtmpose/coco/
│   ├── rtmpose_starnet-s3_*.py         # StarNet-Pose-S (vanilla StarNet-S3)
│   ├── rtmpose_starnetca-s3_*.py       # StarNet-Pose-S (StarNet-S3 + CA stages 3-4)
│   ├── rtmpose_starnetca-s3_*_a800.py  # A800-optimized config
│   └── rtmpose_*_mini.py               # COCO-mini fast evaluation configs
├── experiments/occlusion_prob_study/
│   ├── datasets.py                     # Dataset path configuration
│   ├── experiment_matrix.py            # Experiment suite definitions
│   ├── generate_configs.py             # Config generator
│   ├── run_fast_track_experiments.sh   # Fast evaluation pipeline (~2 hours)
│   ├── run_all_paper_experiments.sh    # Full experiment suite
│   └── generated_configs/              # Auto-generated experiment .py files
├── tools/
│   ├── train.py                        # Training launch script
│   ├── benchmark_latency.py            # FPS & latency benchmark
│   ├── test_starnet_*.py               # Unit tests
│   └── visualize_*.py                  # Visualization utilities
├── demo/                               # Demo examples
├── figures/                            # Architecture diagrams (.drawio)
├── requirements.txt                    # Python dependencies
├── LICENSE                             # Apache 2.0 License
├── CITATION.cff                        # Citation metadata
└── README.md                           # This file
This work builds upon several excellent open-source projects:
- StarNet / Rewrite the Stars (CVPR 2024): The StarNet backbone architecture by Xu Ma et al.
- Coordinate Attention (CVPR 2021): The CA module by Qibin Hou et al.
- MMPose: The OpenMMLab Pose Estimation Toolbox.
- RTMPose: Real-time multi-person pose estimation framework.
If you use StarNet-Pose in your research, please cite:
@article{pan2025starnetpose,
title = {StarNet-Pose: Heatmap-Free Lightweight Pose Estimation via
Multiplicative Feature Interaction and Occlusion-Aware Training},
author = {Zheng Luo and Guowei Jiang and Runhang Pan and Qi Qi and Xin Xie and Siyuan Chen},
journal = {Neurocomputing},
year = {2026},
url = {https://github.com/lechan775/starnet-pose}
}
Also consider citing the foundational works:
@inproceedings{ma2024starnet,
title = {Rewrite the Stars},
author = {Ma, Xu and ...},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR)},
year = {2024}
}
@inproceedings{hou2021ca,
title = {Coordinate Attention for Efficient Mobile Network Design},
author = {Hou, Qibin and Zhou, Daquan and Feng, Jiashi},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR)},
year = {2021}
}
@misc{mmpose2020,
title = {OpenMMLab Pose Estimation Toolbox and Benchmark},
author = {MMPose Contributors},
year = {2020},
url = {https://github.com/open-mmlab/mmpose}
}
This project is released under the Apache 2.0 License.
Note: The StarNet backbone code is adapted from Rewrite-the-Stars. The Coordinate Attention module is adapted from CoordAttention. This project inherits the Apache 2.0 license from MMPose.