Unsupervised Projector–Camera Mapping with Pix2PixHD
Abstract
The Reality Transform superimposes dynamic, adaptive projections onto surfaces—whether people, buildings or household items—generating vibrant son-et-lumière experiences. Machine learning and sensor feedback let the system continuously refine brightness, color, and patterns, turning physical spaces into interactive storytelling platforms.
This repository demonstrates an unsupervised workflow for learning how a scene transforms camera observations into styled projected images—akin to identifying a transfer function in control theory. We capture “emission”–“recording” pairs automatically, train a high-resolution Pix2PixHD model to approximate the scene’s inverse response, and then leverage external style pipelines (ComfyUI, Stable Diffusion, etc.) to stabilize or creatively guide the final appearance.
No labeled data or explicit calibration is needed: the system learns purely from raw projector–camera interactions. Below are the main steps, plus optional paths for static animations or fully interactive feedback loops.
1. Data Capture
Script: `record_dtd_dataset.py`
Project Emission (Step Input): Displays a pattern (e.g., from DTD textures) via the projector.
Capture Recording (Step Response): Saves the camera’s reaction in Recordings/.
Alignment: The same filename is used in Emissions/, matching each pair (E_i, R_i).
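The pairing convention above can be sketched as follows. This is a minimal illustration, not the actual capture script: the `save_pair` helper is hypothetical, and real emission/recording data would come from the projector pattern and the camera frame.

```python
from pathlib import Path

def save_pair(index: int, emission: bytes, recording: bytes,
              root: Path = Path(".")) -> tuple[Path, Path]:
    """Write an emission/recording pair under the SAME filename so the
    dataset stays aligned with no extra bookkeeping."""
    name = f"{index:05d}.png"
    e_path = root / "Emissions" / name
    r_path = root / "Recordings" / name
    for path, data in ((e_path, emission), (r_path, recording)):
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)  # in practice: encoded image bytes
    return e_path, r_path
```

Because both folders share filenames, later stages can reconstruct every (E_i, R_i) pair by iterating one directory and looking up the twin in the other.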
2. Data Preparation
Script: `crop_and_prepare_pix2pixHD_dataset.py`
Crop & Resize: Optionally remove unwanted vertical space, then resize both emissions and recordings (e.g., 2048×1024).
Aligned Folders: Place camera images in train_A and projector images in train_B.
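The crop-and-resize step might look like the sketch below (a simplified stand-in for the preparation script, using Pillow; the function name and defaults are illustrative):

```python
from PIL import Image

def crop_and_resize(img: Image.Image, size=(2048, 1024),
                    crop_top: int = 0, crop_bottom: int = 0) -> Image.Image:
    """Optionally trim vertical bands, then resize to the training resolution."""
    w, h = img.size
    img = img.crop((0, crop_top, w, h - crop_bottom))  # drop unwanted vertical space
    return img.resize(size, Image.LANCZOS)
```

The same transform must be applied to both the emission and its matching recording, with the results written to train_B and train_A respectively, so the pairs stay pixel-aligned.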
3. Pix2PixHD Training
Script: `train.py`
Input/Output Setup: The “recording” (A) is the input, and the matching “emission” (B) is the target.
GAN & Losses: Pix2PixHD uses adversarial, feature matching, and optionally VGG-based perceptual losses.
Checkpoints: Trained weights (.pth files) are saved at intervals.
This trains a function from the camera view back to the projector domain. By itself, Pix2PixHD is a single transformation—unstable if placed in a direct feedback loop.
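A typical training invocation looks like the following. The flag names follow the NVIDIA pix2pixHD repository's conventions; the experiment name and paths are placeholders, and your fork's options may differ:

```shell
python train.py \
  --name rec2emis \
  --dataroot ./datasets/rec2emis \
  --label_nc 0 --no_instance \
  --loadSize 2048 --fineSize 1024
```

`--label_nc 0 --no_instance` tells Pix2PixHD to treat train_A as plain RGB images rather than semantic label maps, which is what this camera-to-projector setup needs.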
The cropped, resized step response and resultant predicted emission are illustrated below.
Step Response
Resultant Predicted Emission
3.5. Optional Animation & Off-Line Styling
Before moving to real-time control, you can use the trained generator’s predicted emissions as static inputs to:
Deforum for music-synchronized or timeline-based animations,
Stable Diffusion WebUI (e.g., for prompt-driven transformations),
Any offline pipeline that modifies or stylizes the emitted image.
This produces non-interactive animations or edited frames for a static scene—no projector feedback involved. You simply pass the generator’s output to your creative or style framework, then (if desired) project the final frames afterward.
Pix2PixHD's predicted emission from the step response, used as input to ControlNet
4. Real-Time Inference & Stabilized Feedback
Script: `call_rec2emis_comfui.py`
Process:
Capture the environment (camera feedback).
Run the Pix2PixHD generator (raw function).
Incorporate external “style” or pose constraints (ComfyUI, etc.).
Project the updated emission.
Capture again and repeat. While stable loops have been demonstrated with fast style transfer, the per-iteration latency here has made experimentation cumbersome; future experiments may use mannequins as static test subjects.
ComfyUI or other style pipelines stabilize and guide the loop. Without them, a naive feedback loop can oscillate or diverge. This approach enables fully interactive illusions or environment transformations.
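One simple way to damp such a loop, independent of the style pipeline, is to low-pass filter the emission between iterations. The sketch below is an assumption about how stabilization could be added, not the repository's implementation; `capture`, `generator`, `stylize`, and `project` stand for the real camera, Pix2PixHD, ComfyUI, and projector calls.

```python
import numpy as np

def stabilized_update(previous: np.ndarray, predicted: np.ndarray,
                      alpha: float = 0.3) -> np.ndarray:
    """Blend the new prediction into the previous emission so the loop
    changes gradually instead of oscillating (an exponential moving average)."""
    return (1.0 - alpha) * previous + alpha * predicted

# Loop skeleton (hypothetical callables):
#   emission = initial_pattern
#   while running:
#       recording = capture()                      # camera feedback
#       predicted = generator(recording)           # Pix2PixHD inference
#       styled    = stylize(predicted)             # ComfyUI / style constraints
#       emission  = stabilized_update(emission, styled)
#       project(emission)
```

Smaller `alpha` means a slower but more stable loop; `alpha = 1.0` recovers the naive feedback that can oscillate or diverge.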
5. Future RL Integration
This pipeline can also bootstrap a reinforcement learning (RL) agent: by using Pix2PixHD (plus stylization) as part of an environment model, an RL policy could learn to optimize projector outputs for specific objectives—extending the concept to more sophisticated, policy-driven tasks.
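The environment-model idea could take a Gym-like shape such as the toy below. Everything here is hypothetical: the scene response is replaced by a scalar gain, where a real version would call the trained Pix2PixHD generator (or its forward counterpart), and the reward is a simple distance to a target appearance.

```python
import numpy as np

class ProjectorEnv:
    """Toy environment model for an RL agent driving the projector.

    The 'scene' is a stand-in linear response; reward is the negative
    mean-squared error between the camera observation and a target image.
    """

    def __init__(self, target: np.ndarray, gain: float = 0.8):
        self.target = target
        self.gain = gain  # placeholder for the learned scene response

    def step(self, emission: np.ndarray):
        observation = self.gain * emission  # scene response (stand-in)
        reward = -float(np.mean((observation - self.target) ** 2))
        return observation, reward
```

An RL policy would then choose `emission` to maximize reward, i.e. to make the scene's appearance match the objective.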
Sample Outputs
Below are clickable thumbnails from the samples/ directory. Images were captured with an industrial camera and may appear dull compared to the view by eye. Video of the Reality Transform, captured with conventional cameras, is available on social media, e.g. Reddit and X.