Annotation boundary between 'dissect' and 'cut' in CholecT50 - question from Rendezvous reproduction #30

@Harivignz

Hi CAMMA team,

I'm Hari Vignesh Balaji, an AI engineer from India.
I'm currently reproducing the Rendezvous evaluation on CholecT45-crossval. The per-class verb AP breakdown raises a question that I haven't found addressed in the paper or the annotation protocol.

When I inspect frames in which the model confuses 'dissect' with 'cut', the two classes look nearly identical: the same hook instrument, the same tissue contact, the same spatial configuration. The difference seems to lie in the surgical intent and the motion vector rather than in any single-frame visual feature.
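The confusion described above can be quantified with a small per-frame check. The following is a minimal sketch, not the evaluation code from Rendezvous or ivtmetrics: the function name `pair_confusion`, the toy data, and the verb indices (`DISSECT = 2`, `CUT = 5`) are assumptions and should be checked against the dataset's own label mapping.

```python
def pair_confusion(scores, labels, a, b):
    """Count frames where exactly one of verbs a/b is labelled
    and the model scores the other verb of the pair higher."""
    counts = {"a_as_b": 0, "b_as_a": 0, "correct": 0, "skipped": 0}
    for frame_scores, frame_labels in zip(scores, labels):
        has_a, has_b = bool(frame_labels[a]), bool(frame_labels[b])
        if has_a == has_b:
            # Neither verb or both verbs are labelled: not a clean a-vs-b case.
            counts["skipped"] += 1
            continue
        true_idx, other_idx = (a, b) if has_a else (b, a)
        if frame_scores[other_idx] > frame_scores[true_idx]:
            counts["a_as_b" if has_a else "b_as_a"] += 1
        else:
            counts["correct"] += 1
    return counts


# Assumed verb indices; verify against the dataset's label mapping.
DISSECT, CUT = 2, 5
NUM_VERBS = 10


def multi_hot(*active):
    """Build a multi-hot label row with the given verb indices set to 1."""
    return [1 if i in active else 0 for i in range(NUM_VERBS)]


def score_row(dissect, cut):
    """Build a toy score row; all other verbs get a score of 0.0."""
    row = [0.0] * NUM_VERBS
    row[DISSECT], row[CUT] = dissect, cut
    return row


# Hypothetical toy frames, for illustration only.
labels = [multi_hot(DISSECT), multi_hot(CUT), multi_hot(CUT),
          multi_hot(0), multi_hot(DISSECT)]
scores = [score_row(0.4, 0.6), score_row(0.2, 0.7), score_row(0.8, 0.3),
          score_row(0.1, 0.1), score_row(0.9, 0.1)]

result = pair_confusion(scores, labels, DISSECT, CUT)
print(result)
```

Counting the two directions separately shows whether the errors are symmetric or whether one verb absorbs the other, which is a different signal from the per-class AP alone.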

My question is about the annotation design: was there an explicit criterion in the annotation guide that allowed annotators to distinguish 'dissect' from 'cut' in ambiguous frames? Or was the boundary set at the clinical-semantic level (i.e., annotators were surgeons who applied clinical context that the model could not recover from a frozen frame)?

I'm trying to understand whether the verb mAP ceiling on these classes reflects a dataset labelling ambiguity, a fundamental single-frame limitation, or something addressable with temporal context, which I noticed the Rendezvous-in-Time paper partially addresses.

Thank you for any insight and for making the code and ivtmetrics publicly available. They made reproduction straightforward.

Best regards,
Hari Vignesh BALAJI
India
https://www.linkedin.com/in/harivignz/
https://github.com/Harivignz
