A quick experiment that uses Stable Diffusion (SD) image generation models (v1.5 and XL) and MediaPipe's Pose Landmarker vision model to generate 3D human poses from a text prompt.
- 2025-06-22: Minor updates on docs and dependencies.
- 2024-11-12: Added a Godot full-body IK pose solver.
- 2024-10-24: Initial implementation.
- Clone the repo: `git clone https://github.com/jerenchen/simple-diffusion-pose-gen.git`.
- Change into the `python` directory and (optionally) activate a virtual environment (e.g. conda).
- Install the Python dependencies: `pip install -r requirements.txt`.
- Download MediaPipe Pose Landmarker (Full) and save the file under `python/tasks`.
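
As a quick sanity check for the download step, the sketch below looks for the model file under `python/tasks`. It is not part of the repo: the helper `find_pose_model` is hypothetical, and the filename `pose_landmarker_full.task` is an assumption based on how MediaPipe names the Full model download, so adjust it if your file is named differently.

```python
from pathlib import Path

# Assumed filename of the MediaPipe Pose Landmarker (Full) download.
MODEL_NAME = "pose_landmarker_full.task"


def find_pose_model(repo_root=".", model_name=MODEL_NAME):
    """Return the model path under python/tasks, or None if it is missing.

    Hypothetical helper for illustration; the repo may locate the file differently.
    """
    path = Path(repo_root) / "python" / "tasks" / model_name
    return path if path.is_file() else None


if __name__ == "__main__":
    model = find_pose_model()
    if model:
        print(f"Found model: {model}")
    else:
        print("Model not found under python/tasks")
```

Run it from the repo root; pass a different `repo_root` if you run it from elsewhere.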
- Inside the `python` directory, start the Python pose-gen service: `python posegen.py --base sd15 --steps 8`.
- Open and run `project.godot` inside the `godot` directory with Godot Engine.
- Enter a prompt to generate a pose.
Figure: Generating a pose using the prompt "A basketball player making a 3-pointer jump shot".
NOTE: The standalone demo requires PySide6 >= v6.7.
Inside the `python` directory, run `python simple-diffusion-pose-gen.py`, then enter a prompt to generate a pose.
- SD15 has fewer model parameters and therefore requires less memory, whereas SDXL can generate higher-fidelity images.
- Hyper-SD (an SD inference acceleration technique) steps: 2, 4, or 8. This trades speed (fewer, larger steps) against quality (more, smaller steps).
- PyTorch device for running SD inference (if available): CPU, CUDA, or MPS (Apple Silicon).
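
The options above can be sketched as a small command-line parser plus a device fallback. This is a minimal illustration, not the actual `posegen.py`: only `--base sd15` and `--steps 8` appear in the command shown earlier, so the `sdxl` choice value, the `--device` flag, and the helper names `build_parser` and `resolve_device` are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options (a sketch, not the repo's parser)."""
    parser = argparse.ArgumentParser(description="Pose-gen options (illustrative)")
    # "sdxl" is an assumed value; only "sd15" appears in the documented command.
    parser.add_argument("--base", choices=["sd15", "sdxl"], default="sd15")
    # Hyper-SD step counts listed in the README.
    parser.add_argument("--steps", type=int, choices=[2, 4, 8], default=8)
    # Hypothetical flag; the repo may expose device selection differently.
    parser.add_argument("--device", choices=["auto", "cpu", "cuda", "mps"], default="auto")
    return parser


def resolve_device(requested="auto"):
    """Return a usable PyTorch device string, falling back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # Without PyTorch installed, nothing else can be probed.
    cuda_ok = torch.cuda.is_available()
    mps_backend = getattr(torch.backends, "mps", None)  # Absent in older PyTorch.
    mps_ok = bool(mps_backend and mps_backend.is_available())
    if requested == "auto":
        return "cuda" if cuda_ok else "mps" if mps_ok else "cpu"
    if (requested == "cuda" and not cuda_ok) or (requested == "mps" and not mps_ok):
        return "cpu"  # Requested accelerator is unavailable.
    return requested


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--base", "sd15", "--steps", "8"])
print(args.base, args.steps, resolve_device(args.device))
```

Falling back to CPU keeps the service usable on machines without a GPU, at the cost of slower inference.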


