
Hello, author. Today, when I ran sh run_scripts/sevila/pre-train/pretrain_qvh.sh, I encountered the above error. I had no issues running the refine script or the other scripts. By debugging and tracing the error, I found that it occurs on the 26th iteration, when scores is added to position_bias_masked.
The cause is that in T5Attention, when key_states is computed, the projection step takes the following cross-attention branch:
# cross-attn
# (batch_size, n_heads, seq_length, dim_per_head)
hidden_states = shape(proj_layer(key_value_states))
At this point, the shape of key_states becomes [16, 32, 440, 64], so the key-length dimension of scores (its last dimension, index 3) is 440. However, key_length remains 110, so the same dimension of position_bias_masked is 110, and the two tensors cannot be added.
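To make the mismatch concrete, here is a shape-only sketch in plain Python. It is not the library code: the helper names are hypothetical, no tensors are allocated, and the query length of 110 is an assumption, since only the key-side sizes are reported above. It applies the usual rule that scores has shape (batch_size, n_heads, query_length, key_length) and checks whether that shape can be broadcast against position_bias_masked.

```python
# Shape-only sketch; all helper names are hypothetical and nothing here
# calls into transformers or torch.

def attention_scores_shape(query_shape, key_shape):
    """Shape of query @ key.transpose(-1, -2) for 4-D (batch, heads, length, dim) inputs."""
    b, h, q_len, d = query_shape
    kb, kh, k_len, kd = key_shape
    if (b, h, d) != (kb, kh, kd):
        raise ValueError(f"incompatible query/key shapes: {query_shape} vs {key_shape}")
    return (b, h, q_len, k_len)


def can_broadcast(shape_a, shape_b):
    """True if the two shapes satisfy the trailing-dimension broadcasting rule."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True


batch, heads, dim_per_head = 16, 32, 64
query_len = 110          # assumption: the query length is not stated in the report
key_length = 110         # value used when position_bias_masked is built
projected_key_len = 440  # sequence length of key_states after the projection

scores = attention_scores_shape(
    (batch, heads, query_len, dim_per_head),
    (batch, heads, projected_key_len, dim_per_head),
)
position_bias_masked = (batch, heads, query_len, key_length)

print(scores)                                       # (16, 32, 110, 440)
print(can_broadcast(scores, position_bias_masked))  # False
print(projected_key_len // key_length)              # 4
```

The projected key sequence is exactly four times key_length, which might mean that key_value_states holds four concatenated segments while key_length was computed from a single one. That is only a guess from the numbers.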
I tried modifying some of the code, but that only produced more errors. I'm not sure whether one of my settings is wrong, so I'd like to ask for your advice. Thank you.