A minimal but verified synthetic RGB-D data generation pipeline for human tracking and pose-related computer vision tasks in NVIDIA Isaac Sim.
This project uses SMPL/AMASS-style human motion as the source of body pose and shape, renders the human in Isaac Sim, and exports synchronized RGB, depth, mask, bounding box, 3D joints, camera metadata, and SMPL annotations.
Large-scale generation is not yet the focus. The first goal is to build a reliable geometry-and-annotation pipeline, verify that the exported labels align with the rendered image, and only then scale the system.
Synthetic data is useful for human tracking because it can provide labels that are difficult or expensive to collect in real-world data, such as depth, segmentation masks, 3D joints, and parametric body annotations.
However, a synthetic image is only useful if its labels are trustworthy. A rendered RGB image may look correct while the 3D pose, camera metadata, depth, mask, or bounding box is misaligned with it. This project therefore focuses first on internal consistency before scaling to more humans, scenes, cameras, and appearance variation.
The project follows an iterative development process. I first define the assumptions and output labels, then build a minimal end-to-end pipeline, and then validate whether the exported annotations are consistent with the rendered frame.
Only after this clean pipeline is verified should the system move to appearance variation, real-world sanity checking, design revision, and large-scale data generation.
The current implementation covers the green part of the workflow: a controlled SMPL/AMASS/Isaac Sim pipeline with internal consistency validation. The next major step is an appearance module prototype for clothing, eyewear, and headwear.
The current pipeline starts from an AMASS motion sequence. AMASS provides natural mocap-based motion in an SMPL-compatible format, including pose, translation, and body-shape information.
The SMPL body model converts this motion into a posed human mesh, SMPL parameters, and 3D joints. Isaac Sim then renders the human in an indoor scene using a configured camera. The pipeline exports RGB, depth, and mask outputs, along with SMPL annotations, bounding boxes, 3D joints, and camera metadata.
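As a rough illustration of the first stage, the sketch below reads an AMASS-style `.npz` file and subsamples it to a target frame rate. The key names (`poses`, `trans`, `betas`, `mocap_framerate`) follow commonly distributed AMASS files but vary between releases, so treat them as assumptions. `load_amass_sequence` and `target_fps` are hypothetical names, not part of this repository.

```python
import numpy as np

def load_amass_sequence(path, target_fps=30.0):
    """Load pose, translation, and shape from an AMASS-style .npz and subsample frames."""
    data = np.load(path)
    # Assumed key names; some releases store the rate as 'mocap_frame_rate'.
    fps_key = "mocap_framerate" if "mocap_framerate" in data.files else "mocap_frame_rate"
    source_fps = float(data[fps_key])
    # Keep every `step`-th frame so the output rate is close to target_fps.
    step = max(1, int(round(source_fps / target_fps)))
    return {
        "poses": data["poses"][::step],  # (T, D) axis-angle pose parameters
        "trans": data["trans"][::step],  # (T, 3) root translation
        "betas": data["betas"],          # body-shape coefficients, shared by all frames
        "fps": source_fps / step,        # effective frame rate after subsampling
    }
```

Subsampling by an integer step keeps the original mocap frames untouched instead of interpolating poses, which avoids introducing artifacts before the labels have been verified.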
A final overlay check is used to verify that the exported annotations match the rendered frame.
This is an example RGB frame rendered from Isaac Sim. The current version uses a clean SMPL body, which is useful for verifying the geometry and annotation pipeline before adding clothing and appearance variation.
The depth image is exported from the renderer and gives the per-pixel distance for the same camera frame as the RGB image.
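One way to sanity-check a depth export is to back-project it into 3D points in the camera frame. The sketch below assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and assumes the depth value is the distance along the optical axis. If the exported depth is instead the Euclidean distance to the camera center, it must be converted first. `backproject_depth` is a hypothetical helper name.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Turn an (H, W) z-depth map into an (H, W, 3) array of camera-frame points."""
    h, w = depth.shape
    # u holds the pixel column index and v the pixel row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)
```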
The instance mask identifies visible human pixels in the rendered image. This is useful for segmentation, occlusion analysis, bounding box generation, and debugging label-image alignment.
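As an example of the bounding-box use case, a tight 2D box can be derived directly from the mask. This is a minimal sketch and not necessarily how this pipeline produces its boxes. `bbox_from_mask` is a hypothetical name, and the inclusive `(x_min, y_min, x_max, y_max)` convention is an assumption.

```python
import numpy as np

def bbox_from_mask(mask):
    """Return (x_min, y_min, x_max, y_max) with inclusive pixel bounds, or None if empty."""
    ys, xs = np.nonzero(mask)  # row and column indices of all human pixels
    if xs.size == 0:
        return None  # the human is not visible in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A box computed this way covers only visible pixels, so it shrinks under occlusion. Comparing it with the exported box is a cheap alignment test.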
The sequence preview shows the SMPL human animated over time in the Isaac Sim scene. The current motion comes from AMASS and is converted through the SMPL body model.
The overlay image is used to validate that the exported labels are aligned with the rendered RGB frame.
The validation checks include:
- the instance mask covers the visible human body
- the bounding box encloses the human region
- projected SMPL joints align with the rendered body
- RGB, depth, mask, SMPL metadata, and camera metadata refer to the same frame
These checks do not prove real-world transfer, but they verify that the synthetic labels are geometrically consistent before the dataset is scaled.
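The joint-alignment check can be sketched as a pinhole projection followed by a mask lookup. The code below assumes a 4x4 `world_to_cam` matrix and a camera that looks along +Z with x to the right and y down. Renderers that follow the USD camera convention look along -Z with +Y up, so an axis flip may be needed first. All function and parameter names are hypothetical.

```python
import numpy as np

def project_joints(joints_world, world_to_cam, fx, fy, cx, cy):
    """Project (J, 3) world-frame joints to (J, 2) pixel coordinates plus (J,) depths."""
    homo = np.hstack([joints_world, np.ones((len(joints_world), 1))])
    cam = (world_to_cam @ homo.T).T[:, :3]
    z = cam[:, 2]  # joints with z <= 0 are behind the camera; ignore their pixels
    u = fx * cam[:, 0] / z + cx
    v = fy * cam[:, 1] / z + cy
    return np.stack([u, v], axis=1), z

def fraction_inside_mask(pixels, depths, mask):
    """Fraction of joints that lie in front of the camera and land on a mask pixel."""
    h, w = mask.shape
    hits = np.zeros(len(pixels), dtype=bool)
    front = np.flatnonzero(depths > 0)
    cols = np.round(pixels[front, 0]).astype(int)
    rows = np.round(pixels[front, 1]).astype(int)
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)
    hits[front[inside]] = mask[rows[inside], cols[inside]] > 0
    return float(hits.mean())
```

Because the mask contains only visible pixels, occluded or out-of-frame joints count as misses. A pass threshold somewhat below 1.0 is therefore more realistic than requiring every joint to hit the mask.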
For each generated frame, the pipeline can export:
| Output | Description |
|---|---|
| RGB image | Rendered camera image from Isaac Sim |
| Depth image | Ground-truth depth from the renderer |
| Instance mask | Visible human mask in image space |
| Bounding box | 2D box around the visible human |
| SMPL parameters | Body pose, shape, and translation metadata |
| 3D joints | SMPL-derived joints transformed into the render/world frame |
| Camera metadata | Camera pose and intrinsics used for projection |
| Verification overlay | RGB image with mask, bbox, and projected joints |
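A simple completeness check can confirm that every frame has all of its outputs before any geometric validation runs. The sketch below assumes a hypothetical layout in which each frame writes files named `<frame_id>.<suffix>` into one directory, with the bounding box, SMPL parameters, 3D joints, and camera metadata bundled into one JSON file. The actual export layout of this project may differ.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical per-frame file suffixes; the real export layout may differ.
EXPECTED_SUFFIXES = {"rgb.png", "depth.npy", "mask.png", "annotations.json"}

def find_incomplete_frames(output_dir):
    """Map each incomplete frame id to the set of expected outputs it is missing."""
    found = defaultdict(set)
    for path in Path(output_dir).iterdir():
        if not path.is_file():
            continue
        # Split '000123.rgb.png' into frame id '000123' and suffix 'rgb.png'.
        frame_id, _, suffix = path.name.partition(".")
        if suffix in EXPECTED_SUFFIXES:
            found[frame_id].add(suffix)
    return {
        frame_id: EXPECTED_SUFFIXES - suffixes
        for frame_id, suffixes in sorted(found.items())
        if suffixes != EXPECTED_SUFFIXES
    }
```

An empty result means every frame id that appears in the directory has the full set of outputs.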
Implemented:
- SMPL/AMASS-based human mesh sequence generation
- Isaac Sim RGB-D rendering
- instance mask export
- bounding box export
- SMPL annotation export
- 3D joint export in render/world frame
- annotation overlay validation
Planned:
- more AMASS motion sequences
- more camera viewpoints and indoor scenes
- body-shape variation through SMPL parameters
- SMPL-compatible clothing / appearance variation
- small real-world RGB-D reference check
- downstream model testing for detection, pose, or tracking
The planned development path is:
1. Verify the clean SMPL-based pipeline
2. Add a small SMPL-compatible clothing / appearance prototype
3. Collect a small real-world RGB-D reference set
4. Compare synthetic and real data gaps
5. Revise generator settings
6. Scale one factor at a time: motion, body shape, camera/scene, appearance, and multi-person cases
7. Test downstream utility on detection, pose, or tracking models
This repository is intended to contain code, configuration files, documentation, and small example outputs.
Large datasets, SMPL model files, AMASS data, and third-party assets should not be committed directly unless their licenses explicitly allow redistribution.