This repository provides a practical inference pipeline for surgical phase recognition.
The repository contains:
- `phaselib.py`: core library for model loading, inference (frame/image/video), postprocessing, and rendering.
- `phase_run.py`: CLI wrapper that uses `phaselib` for image/video processing and rendering.
- `frame_api_example.py`: tiny example that initializes the model and predicts one frame.
We provide two checkpoints: one trained on a curated set of 108 videos, and the other on a set of 202 videos. For downstream applications, we suggest the second model, which was trained on all available videos.
| Model Name | Train Videos | Test Videos | Batch Size | LR | Test Acc |
|---|---|---|---|---|---|
| resnet50_108.pt | 108 | 50 | 16 | 1e-5 | 0.8012 |
| resnet50_202.pt | 202 | 0 | 32 | 1e-5 | - |
Use your existing environment with PyTorch. Required packages:
`torch`, `torchvision`, `numpy`, `opencv-python`, `Pillow`, `pandas`
For a postprocessed output that attempts to remove mistakes made by the per-frame pipeline, run:

```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30 \
  --min-segment-frames 15 \
  --use-default-hernia-order
```

For the best per-frame accuracy, run:
```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --output-fps 30
```

To process a single image, run:

```
python phase_run.py input_frame.png \
  --output-media output_frame_overlay.png \
  --input-type image \
  --output-type image
```

`--input-type` and `--output-type` support `auto`, `video`, and `image`.
By default (auto), type is inferred from file extension.
Current CLI behavior supports matching media modes (video -> video, image -> image).
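As a sketch, the `auto` inference from the file extension might look like the following; the extension sets here are assumptions for illustration, not the actual lists used by `phase_run.py`:

```python
from pathlib import Path

# Assumed extension sets; the real CLI may recognize more formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".bmp"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}

def infer_media_type(path: str) -> str:
    """Guess image vs. video from the file extension (auto mode)."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"Cannot infer media type for {path!r}; "
                     "pass --input-type explicitly")

print(infer_media_type("input.mp4"))  # video
```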
For every run, three outputs are produced:
- Media output (`--output-media`):
  - video mode: rendered video with a timeline below the frame
  - image mode: rendered image with a timeline below the image
- JSON report (`--output-json`, default: same basename + `.json`)
- NPY array (`--output-npy`, default: same basename + `.npy`)
The JSON includes:
- input/output metadata
- raw predictions and postprocessed predictions
- confidences
- phase label metadata and timeline colors
- postprocessing settings used
- contiguous phase segments with start/end times
The NPY file is an int32 array of postprocessed frame-level phase indices.
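For illustration, the contiguous phase segments reported in the JSON can be recovered from such a frame-level index array. This is a minimal sketch assuming a simple `(phase, start_s, end_s)` tuple format; the helper name and the actual JSON schema are assumptions:

```python
import numpy as np

def segments_from_preds(preds: np.ndarray, fps: float):
    """Collapse frame-level phase indices into contiguous segments.

    Returns (phase_index, start_time_s, end_time_s) tuples; end times
    are exclusive, i.e. frame i covers [i/fps, (i+1)/fps).
    """
    segments = []
    start = 0
    for i in range(1, len(preds) + 1):
        # Close the current segment at the end of the array or on a change.
        if i == len(preds) or preds[i] != preds[start]:
            segments.append((int(preds[start]), start / fps, i / fps))
            start = i
    return segments

preds = np.array([0, 0, 0, 1, 1, 2], dtype=np.int32)
print(segments_from_preds(preds, fps=1.0))
# [(0, 0.0, 3.0), (1, 3.0, 5.0), (2, 5.0, 6.0)]
```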
- Timeline strip is rendered below the media.
- Each phase index maps to a distinct color.
- The current-time marker is drawn only within the timeline strip (it does not extend into the video/image panel).
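One simple way to give each phase index a distinct color is to spread hues evenly around the color wheel. This is a hypothetical palette for illustration, not necessarily the one `phaselib` renders with:

```python
import colorsys

def phase_color(idx: int, num_phases: int = 7) -> tuple:
    """Map a phase index to a distinct RGB color by spreading hues evenly."""
    h = (idx % num_phases) / num_phases
    r, g, b = colorsys.hsv_to_rgb(h, 0.8, 0.95)
    return (int(r * 255), int(g * 255), int(b * 255))

print([phase_color(i) for i in range(3)])
```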
Use `--output-fps` to control the output video FPS:
- If omitted, output FPS defaults to input video FPS.
- Duration is preserved by mapping output timestamps back to processed input-frame indices.
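The duration-preserving mapping above can be sketched as follows; the function name and rounding behavior are illustrative assumptions, not the exact internals of `phaselib`:

```python
def output_to_input_frame(out_idx: int, out_fps: float,
                          in_fps: float, num_in_frames: int) -> int:
    """Map an output frame index to the processed input frame at the
    same wall-clock time, so duration is preserved when FPS differs."""
    idx = int(out_idx * in_fps / out_fps)  # input frame at timestamp out_idx/out_fps
    return min(idx, num_in_frames - 1)     # clamp to the last available frame

# 30 fps output drawn from a 15 fps input: each input frame appears twice.
print([output_to_input_frame(i, 30.0, 15.0, 10) for i in range(6)])
# [0, 0, 1, 1, 2, 2]
```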
These controls reduce noisy frame-level phase switching:
- `--median-window`: mode filter over a local window of frames.
- `--min-segment-frames`: merges very short segments into neighboring segments.
- `--phase-order`: comma-separated ordered phase names or indices to enforce logical progression.
- `--use-default-hernia-order`: applies the built-in inguinal hernia phase order.
- `--logic-max-backward`: number of backward steps allowed in ordered phases.
- `--logic-max-forward-jump`: maximum forward jump allowed in ordered phases.
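As an illustration of the `--median-window` idea, a sliding-window mode filter can be sketched as below; this is a simplified stand-in, not the exact `phaselib` implementation:

```python
from collections import Counter

def mode_filter(preds: list, window: int = 5) -> list:
    """Replace each frame's label with the most common label in its
    local window, suppressing isolated single-frame flips."""
    half = window // 2
    out = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        out.append(Counter(preds[lo:hi]).most_common(1)[0][0])
    return out

# The lone '1' at index 2 is smoothed away; the real transition survives.
print(mode_filter([0, 0, 1, 0, 0, 2, 2, 2], window=3))
# [0, 0, 0, 0, 0, 2, 2, 2]
```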
Example with logic-aware smoothing:

```
python phase_run.py input.mp4 \
  --output-media output_overlay.mp4 \
  --use-default-hernia-order \
  --median-window 5 \
  --min-segment-frames 4 \
  --logic-max-backward 0 \
  --logic-max-forward-jump 1
```

The runtime API is intentionally simple:
```python
from phaselib import initialize_model
import cv2

predictor = initialize_model(
    model_path="resnet50-p7-v188-b16-lr1em5-a.pt",
    device="auto",
)

# Single frame
frame = cv2.imread("example_frame.png")
pred = predictor.predict_frame(frame)
print(pred.phase_name, pred.confidence)

# Full video with postprocessing
result = predictor.predict_video("surgery.mp4", batch_size=32)
smoothed = predictor.postprocess(
    result.raw_preds, result.confidences,
    use_hernia_order=True, min_segment_frames=3,
)
print(result.num_frames, smoothed)
```

You can also run `frame_api_example.py` directly.
Please cite the papers below if you use these models:
@article{zang2023surgical,
title={Surgical phase recognition in inguinal hernia repair---AI-based confirmatory baseline and exploration of competitive models},
author={Zang, Chengbo and Turkcan, Mehmet Kerem and Narasimhan, Sanjeev and Cao, Yuqing and Yarali, Kaan and Xiang, Zixuan and Szot, Skyler and Ahmad, Feroz and Choksi, Sarah and Bitner, Daniel P and others},
journal={Bioengineering},
volume={10},
number={6},
pages={654},
year={2023},
publisher={MDPI}
}
@article{choksi2023bringing,
title={Bringing Artificial Intelligence to the operating room: edge computing for real-time surgical phase recognition},
author={Choksi, Sarah and Szot, Skyler and Zang, Chengbo and Yarali, Kaan and Cao, Yuqing and Ahmad, Feroz and Xiang, Zixuan and Bitner, Daniel P and Kostic, Zoran and Filicori, Filippo},
journal={Surgical Endoscopy},
volume={37},
number={11},
pages={8778--8784},
year={2023},
publisher={Springer}
}