We annotate the popular long-term tracking dataset, LTB50, with dense language descriptions. Based on this language-annotated dataset, we extend traditional Long-term visual Tracking (LT) to Long-term Vision-Language Tracking (LVLT).
We also provide an annotation toolkit built with the tkinter package. Launch it with:
python -m lib.gui
Text Box: The upper one shows the last description; the lower one is used to annotate the current frame. Fill the lower one with a language description and click the Save button (or press the Enter key).
- key Ctrl+Up and Ctrl+Down (button |< and >|): choose video
- key Ctrl+Left and Ctrl+Right (button < and >): choose frame
- key Shift+Left and Shift+Right (button << and >>): fast-backward and fast-forward
- key Alt+Left and Alt+Right (button @<< and >>@): jump to the previous description, jump to the next description
- key Enter (button Save): save the description of the current frame
- key Delete (button Clear): clear the description of the current frame
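The shortcuts above can be illustrated with a minimal sketch. It is not the toolkit's actual code: the names `KEY_BINDINGS`, `FrameNavigator`, `bind_keys`, the `fast_step` of 10 frames, the direction assigned to each key, and the choice to store descriptions in a dictionary keyed by frame index are all assumptions made for illustration. Only the tkinter event patterns (such as `<Control-Left>` or `<Return>`) and the `widget.bind` call are standard tkinter.

```python
from bisect import bisect_left, bisect_right

# Hypothetical mapping from tkinter event patterns to action names.
# The direction of each key (e.g. Up = previous video) is an assumption.
KEY_BINDINGS = {
    "<Control-Up>": "prev_video",
    "<Control-Down>": "next_video",
    "<Control-Left>": "prev_frame",
    "<Control-Right>": "next_frame",
    "<Shift-Left>": "fast_backward",
    "<Shift-Right>": "fast_forward",
    "<Alt-Left>": "prev_description",
    "<Alt-Right>": "next_description",
    "<Return>": "save",
    "<Delete>": "clear",
}


class FrameNavigator:
    """Tracks the current frame of one video and its sparse descriptions."""

    def __init__(self, num_frames, fast_step=10):
        self.num_frames = num_frames
        self.fast_step = fast_step  # assumed step for fast-backward/forward
        self.frame = 0
        self.descriptions = {}  # frame index -> description text

    def move(self, delta):
        # Clamp so the current frame always stays inside the video.
        self.frame = max(0, min(self.num_frames - 1, self.frame + delta))

    def save(self, text):
        self.descriptions[self.frame] = text

    def clear(self):
        self.descriptions.pop(self.frame, None)

    def last_description(self):
        # Most recent description strictly before the current frame
        # (what the upper text box would show under this assumption).
        frames = sorted(self.descriptions)
        i = bisect_left(frames, self.frame)
        return self.descriptions[frames[i - 1]] if i > 0 else None

    def jump_prev_description(self):
        frames = sorted(self.descriptions)
        i = bisect_left(frames, self.frame)
        if i > 0:
            self.frame = frames[i - 1]

    def jump_next_description(self):
        frames = sorted(self.descriptions)
        i = bisect_right(frames, self.frame)
        if i < len(frames):
            self.frame = frames[i]


def bind_keys(widget, handlers):
    # Attach each shortcut to a zero-argument handler looked up by action name.
    for pattern, action in KEY_BINDINGS.items():
        widget.bind(pattern, lambda event, a=action: handlers[a]())
```

In a real window, `bind_keys(root, handlers)` would be called once on the `tkinter.Tk` root, with `handlers` mapping each action name to a callable such as `lambda: nav.move(1)`.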
If you find this project useful in your research, please consider citing:
@article{DBLP:journals/corr/abs-1804-07056,
author = {Alan Lukezic and
Luka Cehovin Zajc and
Tom{\'{a}}s Voj{\'{\i}}r and
Jiri Matas and
Matej Kristan},
title = {Now you see me: evaluating performance in long-term visual tracking},
eprinttype = {arXiv},
eprint = {1804.07056},
}
- Now you see me: evaluating performance in long-term visual tracking.
Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, Matej Kristan. arXiv, 1804.07056.
LVLT is released under the GPL-3.0 License.
