This repository was archived by the owner on Feb 3, 2026. It is now read-only.
I ran `demo.sh`, and the prompt seems to ingest all the images but contains only one `<image>` token. The results become more ambiguous as the video gets longer. Is inference based only on the first frame it sees after segmenting?