Surgical Phase Inference Toolkit

This repository provides a practical inference pipeline for surgical phase recognition.

Files

  • phaselib.py: core library for model loading, inference (frame/image/video), postprocessing, rendering.
  • phase_run.py: CLI wrapper that uses phaselib for image/video processing and rendering.
  • frame_api_example.py: tiny example that initializes the model and predicts one frame.

Model Checkpoints

We provide two checkpoints: one trained on a curated set of 108 videos, and one trained on all 202 available videos. For downstream applications, we recommend the second model, since it was trained on every available video.

Model Name       NumTrain  NumTest  Batch Size  LR     Test Acc
resnet50_108.pt  108       50       16          1e-05  0.8012
resnet50_202.pt  202       0        32          1e-05  -

Installation

Use your existing environment with PyTorch. Required packages:

  • torch
  • torchvision
  • numpy
  • opencv-python
  • Pillow
  • pandas

CLI Usage

1. Video input -> video output

For postprocessed output that suppresses spurious phase switches introduced by the per-frame pipeline, run:

python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30 \
  --min-segment-frames 15 \
  --use-default-hernia-order

For the best per-frame accuracy, run:

python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30

2. Image input -> image output

python phase_run.py input_frame.png \
  --output-media output_frame_overlay.png \
  --input-type image \
  --output-type image

Both --input-type and --output-type accept auto, video, and image.
By default (auto), the type is inferred from the file extension. The current CLI supports only matching media modes (video -> video, image -> image).
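The extension-based inference in auto mode might be sketched as follows (the helper name and extension lists are illustrative; the actual sets recognized by phase_run.py may differ):

```python
from pathlib import Path

# Illustrative extension sets; phase_run.py may recognize more formats.
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}

def infer_media_type(path: str) -> str:
    """Infer 'video' or 'image' from the file extension (auto mode)."""
    ext = Path(path).suffix.lower()
    if ext in VIDEO_EXTS:
        return "video"
    if ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"Cannot infer media type from extension: {ext!r}")
```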

Outputs

For every run, three outputs are produced:

  • Media output (--output-media):
    • video mode: rendered video with timeline below the frame
    • image mode: rendered image with timeline below the image
  • JSON report (--output-json, default: same basename + .json)
  • NPY array (--output-npy, default: same basename + .npy)
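The default report paths follow a simple same-basename rule, which might be sketched as (hypothetical helper, not part of the phaselib API):

```python
from pathlib import Path

def default_report_paths(output_media: str) -> tuple[str, str]:
    """Derive default JSON/NPY paths from the media output path:
    same basename, extensions swapped to .json and .npy."""
    base = Path(output_media).with_suffix("")
    return str(base) + ".json", str(base) + ".npy"
```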

JSON content

The JSON includes:

  • input/output metadata
  • raw predictions and postprocessed predictions
  • confidences
  • phase label metadata and timeline colors
  • postprocessing settings used
  • contiguous phase segments with start/end times
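The contiguous-segment information can be derived directly from the frame-level predictions and FPS. A minimal sketch (the field names here are illustrative, not the exact JSON schema):

```python
def contiguous_segments(preds, fps):
    """Group frame-level phase indices into contiguous segments with
    start/end times in seconds."""
    segments = []
    start = 0
    for i in range(1, len(preds) + 1):
        # A segment ends at the last frame or when the phase changes.
        if i == len(preds) or preds[i] != preds[start]:
            segments.append({
                "phase": preds[start],
                "start_sec": start / fps,
                "end_sec": i / fps,
            })
            start = i
    return segments
```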

NPY content

An int32 array of postprocessed frame-level phase indices (one entry per processed frame).

Timeline and Marker Behavior

  • Timeline strip is rendered below the media.
  • Each phase index maps to a distinct color.
  • The current-time marker is drawn only within the timeline strip (it does not extend into the video/image panel).

FPS Control

Use --output-fps to control video output FPS.

  • If omitted, output FPS defaults to input video FPS.
  • Duration is preserved by mapping output timestamps back to processed input-frame indices.
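The duration-preserving mapping from output timestamps back to input frames might look like this (an illustrative sketch; the actual resampling in phaselib may use a different rounding rule):

```python
def output_to_input_frame(out_idx, out_fps, in_fps, num_input_frames):
    """Map an output-frame index to the processed input frame at the
    same timestamp, clamped to the valid frame range."""
    t = out_idx / out_fps            # timestamp of the output frame
    in_idx = round(t * in_fps)       # nearest input frame at that time
    return min(in_idx, num_input_frames - 1)
```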

Postprocessing Controls

These controls reduce noisy frame-level phase switching:

  • --median-window: sliding mode (majority) filter over a local window of frames.
  • --min-segment-frames: merges very short segments into neighbors.
  • --phase-order: comma-separated ordered phase names or indices to enforce logical progression.
  • --use-default-hernia-order: applies built-in inguinal hernia order.
  • --logic-max-backward: allowed backward steps in ordered phases.
  • --logic-max-forward-jump: max allowed forward jump in ordered phases.
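The first two controls can be understood through a minimal reimplementation (illustrative only, not the exact phaselib code):

```python
from collections import Counter

def mode_filter(preds, window=5):
    """Sliding mode filter (cf. --median-window): each frame takes the
    most common label in its local window."""
    half = window // 2
    out = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        out.append(Counter(preds[lo:hi]).most_common(1)[0][0])
    return out

def merge_short_segments(preds, min_frames=4):
    """Relabel segments shorter than min_frames with the preceding
    segment's phase (cf. --min-segment-frames)."""
    out = list(preds)
    start = 0
    for i in range(1, len(out) + 1):
        if i == len(out) or out[i] != out[start]:
            if i - start < min_frames and start > 0:
                out[start:i] = [out[start - 1]] * (i - start)
            start = i
    return out
```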

Example with logic-aware smoothing:

python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --use-default-hernia-order \
  --median-window 5 \
  --min-segment-frames 4 \
  --logic-max-backward 0 \
  --logic-max-forward-jump 1

Programmer API

The runtime API is intentionally simple:

from phaselib import initialize_model
import cv2

predictor = initialize_model(
    model_path="resnet50_202.pt",
    device="auto",
)

# Single frame
frame = cv2.imread("example_frame.png")
pred = predictor.predict_frame(frame)
print(pred.phase_name, pred.confidence)

# Full video with postprocessing
result = predictor.predict_video("surgery.mp4", batch_size=32)
smoothed = predictor.postprocess(
    result.raw_preds, result.confidences,
    use_hernia_order=True, min_segment_frames=3,
)
print(result.num_frames, smoothed)

You can also run frame_api_example.py directly.

References

Please cite the papers below if you use these models:

@article{zang2023surgical,
  title={Surgical phase recognition in inguinal hernia repair---AI-based confirmatory baseline and exploration of competitive models},
  author={Zang, Chengbo and Turkcan, Mehmet Kerem and Narasimhan, Sanjeev and Cao, Yuqing and Yarali, Kaan and Xiang, Zixuan and Szot, Skyler and Ahmad, Feroz and Choksi, Sarah and Bitner, Daniel P and others},
  journal={Bioengineering},
  volume={10},
  number={6},
  pages={654},
  year={2023},
  publisher={MDPI}
}

@article{choksi2023bringing,
  title={Bringing Artificial Intelligence to the operating room: edge computing for real-time surgical phase recognition},
  author={Choksi, Sarah and Szot, Skyler and Zang, Chengbo and Yarali, Kaan and Cao, Yuqing and Ahmad, Feroz and Xiang, Zixuan and Bitner, Daniel P and Kostic, Zoran and Filicori, Filippo},
  journal={Surgical Endoscopy},
  volume={37},
  number={11},
  pages={8778--8784},
  year={2023},
  publisher={Springer}
}
