Thank you for sharing your code. The paper describes the vision loss as autoregressive (AR), but the code at this line appears to reconstruct the current token rather than predict the next one. Where is the label shift for the AR objective applied, or have I misunderstood this part? Thanks.
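
To make the question concrete, here is a minimal sketch of the label shift I would expect for a next-token objective. The function name `shift_for_next_token` and the token ids are hypothetical and are not taken from the repository; this only illustrates the general idea, not how the released code is meant to work.

```python
from typing import List, Sequence, Tuple


def shift_for_next_token(tokens: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a token sequence into AR inputs and next-token targets.

    Hypothetical helper for illustration only (not from the repository).
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form an input/target pair")
    inputs = list(tokens[:-1])   # positions 0 .. T-2 are fed to the model
    targets = list(tokens[1:])   # the output at position t is compared with token t+1
    return inputs, targets


vision_tokens = [11, 42, 7, 93]  # made-up token ids
inputs, targets = shift_for_next_token(vision_tokens)
print(inputs)   # [11, 42, 7]
print(targets)  # [42, 7, 93]
```

Without such a shift, the target at position t is the same token the model received at position t, which is what the line in question looks like to me.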