Hi,
Great work!
I noticed that your implementation replaces the original MTR's 64 static anchors with 6 learnable, randomly initialized queries.
I tried applying Annealed WTA directly to the original MTR (with its static anchors), but training failed to converge. Is the switch to learnable queries strictly necessary for convergence? Did you observe similar instability with static anchors during development?
Thanks.