(Potentially) A very important bug in extract_features_from_gt.py

Hi, 12-in-1 is very interesting work based on ViLBERT.
However, I am confused about the extract_features_from_gt.py script.

The README under the data/ directory says that, to extract features, users should first convert all ground-truth annotations to the following format:

{
    {
        'file_name': 'name_of_image_file',
        'file_path': '<path_to_image_file_on_your_disk>',
        'bbox': array([
                        [ x1, y1, width1, height1],
                        [ x2, y2, width2, height2],
                        ...
                    ]),
        'num_box': 2
    },
    ....
}

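However, I notice that the extract_features_from_gt.py script never converts these boxes from xywh (x, y, width, height) to xyxy (x1, y1, x2, y2) format. If the feature extractor expects corner coordinates, the width and height would be read as the second corner, and features would be extracted from the wrong regions.
I am not sure whether this is a deliberate design choice or a bug.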
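
For reference, here is a minimal sketch of the conversion I would have expected before the boxes are passed to the feature extractor. The helper name `xywh_to_xyxy` and the sample entry are hypothetical (they are not from the repository), and the sketch assumes that (x, y) is the top-left corner of each box:

```python
def xywh_to_xyxy(boxes):
    """Convert [x, y, width, height] boxes to [x1, y1, x2, y2] corners.

    Hypothetical helper, not part of the repository. Assumes (x, y) is
    the top-left corner of each box.
    """
    converted = []
    for x, y, w, h in boxes:
        # The bottom-right corner is the top-left corner plus the box size.
        converted.append([x, y, x + w, y + h])
    return converted


# Made-up ground-truth entry following the README's format.
entry = {
    "file_name": "example.jpg",
    "bbox": [[10, 20, 30, 40], [0, 0, 5, 5]],
    "num_box": 2,
}
corner_boxes = xywh_to_xyxy(entry["bbox"])
```

Without a step like this, a box such as [10, 20, 30, 40] would be interpreted as the corners (10, 20) and (30, 40) rather than (10, 20) and (40, 60).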

Further, if this is a bug, does it affect the features used in ViLBERT and 12-in-1? Were they extracted with the correct bounding boxes?
