This repository provides a practical inference pipeline for surgical phase recognition.
The repository contains:
- `phaselib.py`: core library for model loading, inference (frame/image/video), postprocessing, and rendering.
- `phase_run.py`: CLI wrapper that uses `phaselib` for image/video processing and rendering.
- `frame_api_example.py`: tiny example that initializes the model and predicts one frame.
We provide two checkpoints: one trained on a curated set of 108 videos, and the other on a set of 202 videos. For downstream applications, we suggest the second model, which was trained on all available videos.
| Model Name | Train Videos | Test Videos | Batch Size | LR | Test Acc |
|---|---|---|---|---|---|
| resnet50_108.pt | 108 | 50 | 16 | 1e-5 | 0.8012 |
| resnet50_202.pt | 202 | 0 | 32 | 1e-5 | - |
Use your existing environment with PyTorch. Required packages:
`torch`, `torchvision`, `numpy`, `opencv-python`, `Pillow`, `pandas`
For a postprocessed output that attempts to remove mistakes made by the per-frame pipeline, run:

```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30 \
  --min-segment-frames 15 \
  --use-default-hernia-order
```

For the best per-frame accuracy, run:
```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30
```

To process a single image, run:

```
python phase_run.py input_frame.png \
  --output-media output_frame_overlay.png \
  --input-type image \
  --output-type image
```

`--input-type` and `--output-type` support `auto`, `video`, and `image`.
By default (auto), type is inferred from file extension.
Current CLI behavior supports matching media modes (video -> video, image -> image).
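As a sketch, the `auto` inference from the file extension might look like the following; the extension sets here are assumptions for illustration, not the actual lists used by `phase_run.py`:

```python
from pathlib import Path

# Assumed extension sets; the real CLI may recognize more formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

def infer_media_type(path: str) -> str:
    """Guess image vs. video from the file extension (auto mode)."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Cannot infer media type for {path!r}; "
                     "pass --input-type explicitly")

print(infer_media_type("input.mp4"))  # video
```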
For every run, three outputs are produced:
- Media output (`--output-media`):
  - video mode: rendered video with a timeline below the frame
  - image mode: rendered image with a timeline below the image
- JSON report (`--output-json`, default: same basename + `.json`)
- NPY array (`--output-npy`, default: same basename + `.npy`)
The JSON includes:
- input/output metadata
- raw predictions and postprocessed predictions
- confidences
- phase label metadata and timeline colors
- postprocessing settings used
- contiguous phase segments with start/end times
The NPY file is an int32 array of postprocessed frame-level phase indices.
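For illustration, the contiguous phase segments reported in the JSON can be recovered from such a frame-level index array. This is a minimal sketch assuming a simple `(phase, start_s, end_s)` tuple format; the helper name and the actual JSON schema are assumptions:

```python
import numpy as np

def segments_from_preds(preds: np.ndarray, fps: float):
    """Collapse frame-level phase indices into contiguous segments.

    Returns (phase_index, start_time_s, end_time_s) tuples; end times
    are exclusive, i.e. frame i covers [i/fps, (i+1)/fps).
    """
    segments = []
    start = 0
    for i in range(1, len(preds) + 1):
        # Close the current segment at the end of the array or on a change.
        if i == len(preds) or preds[i] != preds[start]:
            segments.append((int(preds[start]), start / fps, i / fps))
            start = i
    return segments

preds = np.array([0, 0, 0, 1, 1, 2], dtype=np.int32)
print(segments_from_preds(preds, fps=1.0))
# [(0, 0.0, 3.0), (1, 3.0, 5.0), (2, 5.0, 6.0)]
```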
- Timeline strip is rendered below the media.
- Each phase index maps to a distinct color.
- The current-time marker is drawn only within the timeline strip (it does not extend into the video/image panel).
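One simple way to give each phase index a distinct color is to spread hues evenly around the color wheel. This is a hypothetical palette for illustration, not necessarily the one `phaselib` renders with:

```python
import colorsys

def phase_color(idx: int, num_phases: int = 7) -> tuple:
    """Map a phase index to a distinct RGB color by spreading hues evenly."""
    h = (idx % num_phases) / num_phases
    r, g, b = colorsys.hsv_to_rgb(h, 0.8, 0.95)
    return (int(r * 255), int(g * 255), int(b * 255))

print([phase_color(i) for i in range(3)])
```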
Use `--output-fps` to control the output video FPS:
- If omitted, output FPS defaults to input video FPS.
- Duration is preserved by mapping output timestamps back to processed input-frame indices.
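The duration-preserving mapping above can be sketched as follows; the function name and rounding behavior are illustrative assumptions, not the exact internals of `phaselib`:

```python
def output_to_input_frame(out_idx: int, out_fps: float,
                          in_fps: float, num_in_frames: int) -> int:
    """Map an output frame index to the processed input frame at the
    same wall-clock time, so duration is preserved when FPS differs."""
    idx = int(out_idx * in_fps / out_fps)  # input frame at timestamp out_idx/out_fps
    return min(idx, num_in_frames - 1)     # clamp to the last available frame

# 30 fps output drawn from a 15 fps input: each input frame appears twice.
print([output_to_input_frame(i, 30.0, 15.0, 10) for i in range(6)])
# [0, 0, 1, 1, 2, 2]
```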
These controls reduce noisy frame-level phase switching:
- `--median-window`: mode filter over a local window of frames.
- `--min-segment-frames`: merges very short segments into neighboring segments.
- `--phase-order`: comma-separated ordered phase names or indices to enforce logical progression.
- `--use-default-hernia-order`: applies the built-in inguinal hernia phase order.
- `--logic-max-backward`: number of backward steps allowed in ordered phases.
- `--logic-max-forward-jump`: maximum forward jump allowed in ordered phases.
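As an illustration of the `--median-window` idea, a sliding-window mode filter can be sketched as below; this is a simplified stand-in, not the exact `phaselib` implementation:

```python
from collections import Counter

def mode_filter(preds: list, window: int = 5) -> list:
    """Replace each frame's label with the most common label in its
    local window, suppressing isolated single-frame flips."""
    half = window // 2
    out = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        out.append(Counter(preds[lo:hi]).most_common(1)[0][0])
    return out

# The lone '1' at index 2 is smoothed away; the real transition survives.
print(mode_filter([0, 0, 1, 0, 0, 2, 2, 2], window=3))
# [0, 0, 0, 0, 0, 2, 2, 2]
```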
Example with logic-aware smoothing:

```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --use-default-hernia-order \
  --median-window 5 \
  --min-segment-frames 4 \
  --logic-max-backward 0 \
  --logic-max-forward-jump 1
```

The runtime API is intentionally simple:
```python
from phaselib import initialize_model
import cv2

predictor = initialize_model(
    model_path="resnet50-p7-v188-b16-lr1em5-a.pt",
    device="auto",
)

# Single frame
frame = cv2.imread("example_frame.png")
pred = predictor.predict_frame(frame)
print(pred.phase_name, pred.confidence)

# Full video with postprocessing
result = predictor.predict_video("surgery.mp4", batch_size=32)
smoothed = predictor.postprocess(
    result.raw_preds, result.confidences,
    use_hernia_order=True, min_segment_frames=3,
)
print(result.num_frames, smoothed)
```

You can also run `frame_api_example.py` directly.
Please cite the papers below if you use these models:
@article{zang2023surgical,
title={Surgical phase recognition in inguinal hernia repair---AI-based confirmatory baseline and exploration of competitive models},
author={Zang, Chengbo and Turkcan, Mehmet Kerem and Narasimhan, Sanjeev and Cao, Yuqing and Yarali, Kaan and Xiang, Zixuan and Szot, Skyler and Ahmad, Feroz and Choksi, Sarah and Bitner, Daniel P and others},
journal={Bioengineering},
volume={10},
number={6},
pages={654},
year={2023},
publisher={MDPI}
}
@article{choksi2023bringing,
title={Bringing Artificial Intelligence to the operating room: edge computing for real-time surgical phase recognition},
author={Choksi, Sarah and Szot, Skyler and Zang, Chengbo and Yarali, Kaan and Cao, Yuqing and Ahmad, Feroz and Xiang, Zixuan and Bitner, Daniel P and Kostic, Zoran and Filicori, Filippo},
journal={Surgical Endoscopy},
volume={37},
number={11},
pages={8778--8784},
year={2023},
publisher={Springer}
}