Our repository is organized into multiple directories for different aspects of robotic surgery research:
Our scene reconstruction and pose analytics pipeline follows these key stages:
- **YOLO-pose Finetuning**: Initial model training on the SurgPose dataset to establish foundational pose-recognition capabilities, followed by refinement of the pretrained model on SurgVU pose annotations (Northwell Physicians Group, Encord; contact liam.mchugh@columbia.edu) for domain-specific adaptation.
- **Monocular Depth Finetuning**: Using calibrated stereo-vision inference as annotations, Metric3D can be finetuned for laparoscopic surgery, improving the performance of depth-integrated kinematics reconstruction on monocular video datasets. Code for finetuning and depth inference (monocular via Metric3D, stereo via NVLabs FoundationStereo) can be found in the `depth_recon` subdirectory.
- **Kinematic Inference**:
  - Core pose detection: extraction of key instrument positions and orientations
  - Optional enhancements:
    - Stereo/monocular depth inference for enhanced spatial awareness
    - SAM instrument masking to constrain x/y and especially depth projections
- **Kinematic Clustering**: Analysis of movement patterns to identify surgical gestures and techniques
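The stereo annotations used in the depth-finetuning stage come from the standard pinhole relation Z = f·B/d. A minimal sketch of that conversion, with illustrative (not calibrated) focal length and baseline values:

```python
import numpy as np

# Illustrative calibration constants -- real values come from stereo calibration.
FOCAL_PX = 1000.0   # focal length in pixels (assumed)
BASELINE_M = 0.005  # stereo baseline in meters (assumed)

def disparity_to_depth(disparity: np.ndarray) -> np.ndarray:
    """Convert a disparity map (pixels) to metric depth via Z = f * B / d.

    Zero or negative disparities are marked as infinitely far / invalid.
    """
    depth = np.full_like(disparity, np.inf, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
    return depth

disp = np.array([[50.0, 100.0],
                 [0.0, 25.0]])
print(disparity_to_depth(disp))  # 0.1 m, 0.05 m; inf for the invalid pixel; 0.2 m
```

Dense depth maps produced this way from a stereo pipeline can then serve as pseudo-ground-truth for finetuning a monocular model.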
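For the clustering stage, one simple approach is to summarize tool-tip trajectories with velocity statistics and group them. A toy sketch on synthetic data (both the features and the tiny k-means are illustrative, not the repository's actual method):

```python
import numpy as np

def extract_features(traj):
    """Summarize a 2D tool-tip trajectory by simple per-frame speed statistics."""
    vel = np.diff(traj, axis=0)          # frame-to-frame displacement
    speed = np.linalg.norm(vel, axis=1)  # per-frame speed
    return np.array([speed.mean(), speed.std(), speed.max()])

def two_means(X, iters=50):
    """Tiny k-means (k=2) with a deterministic init at the feature-norm extremes."""
    norms = np.linalg.norm(X, axis=1)
    centers = X[[np.argmin(norms), np.argmax(norms)]].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two synthetic "gestures": slow smooth motion vs. fast jittery motion.
rng = np.random.default_rng(1)
slow = [np.cumsum(rng.normal(0, 0.01, (30, 2)), axis=0) for _ in range(5)]
fast = [np.cumsum(rng.normal(0, 0.5, (30, 2)), axis=0) for _ in range(5)]
X = np.vstack([extract_features(t) for t in slow + fast])
print(two_means(X))  # slow clips share one label, fast clips the other
```

Real gesture discovery would use richer features (3D positions from the depth stage, orientations, temporal windows), but the pipeline shape is the same: per-clip descriptors in, cluster labels out.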
This guide will help you set up the environment and run kinematic inference for this project.
```bash
# Initialize submodules
git submodule update --init --recursive

# Create environment from the provided YAML file
# Local machines (flexible torch/CUDA):
conda env create -f kinematics/kinematics_env_flexmachine.yml
# Cloud environments:
conda env create -f kinematics/kinematics_env.yml

# Activate the environment
conda activate kinematics
```

Pose Models (see XX for complete pose analytics report):
- surgvu finetunes: download `surgvu_finetune.zip`
- northwell finetune (yolo11m):

Extract and place the model files in `kinematics/models/`.
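Downstream of detection, per-frame keypoints can be reduced to instrument positions and orientations. A minimal sketch assuming a two-keypoint base/tip schema (hypothetical — the real keypoint layout depends on the finetuned model):

```python
import numpy as np

def instrument_pose(keypoints):
    """Reduce detected keypoints to a 2D tip position and shaft angle.

    Assumes keypoints[0] is the shaft base and keypoints[1] the tool tip;
    the actual keypoint schema depends on the finetuned pose model.
    """
    base = np.asarray(keypoints[0], dtype=float)
    tip = np.asarray(keypoints[1], dtype=float)
    angle_deg = np.degrees(np.arctan2(tip[1] - base[1], tip[0] - base[0]))
    return tip, angle_deg

tip, angle = instrument_pose([[100, 200], [150, 250]])
print(tip, angle)  # tip at (150, 250), shaft at 45 degrees in image coordinates
```

With depth available, the same reduction extends to 3D by back-projecting each keypoint through the camera intrinsics.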
```bash
python kinematics/kinematic_inference.py --input <input video> --save-video
```

For further questions, please refer to the project documentation or contact the maintainer.
