Question
I see in the appendix that if the model outputs look_down, the next step is for the model to output pixel coordinates. However, I want to know: when the model outputs look_down, does it always perform look_down first and then output a pixel?
Great job! Thanks!