RuntimeError: The size of tensor a (416) must match the size of tensor b (104) at non-singleton dimension 3 #31

<img width="1557" height="758" alt="Image" src="https://github.com/user-attachments/assets/49a0b3f9-eb89-40c0-af4c-095555daa697" />
Hello, author. Today I ran `sh run_scripts/sevila/pre-train/pretrain_qvh.sh` and hit the error above. The refine script and the other scripts run without problems. After debugging and tracing the failure, I found that it occurs when `scores` is added to `position_bias_masked` (on the 26th iteration).

The cause is in `T5Attention`: when `key_states` is computed, the projection takes the following cross-attention branch:
```python
# cross-attn
# (batch_size, n_heads, seq_length, dim_per_head)
hidden_states = shape(proj_layer(key_value_states))
```
At this point, `key_states` has shape [16, 32, 440, 64], so dimension 3 of `scores` (the key-length axis) is 440. However, `key_length` is still 110, so dimension 3 of `position_bias_masked` is 110, and the two tensors cannot be added.
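
For reference, the mismatch can be reproduced with shape arithmetic alone, without PyTorch. This is only a minimal sketch, not code from the repository: the helper names `matmul_scores_shape` and `broadcast_mismatch` and the query length `Q_LEN` are hypothetical, and only the key shape [16, 32, 440, 64] and the key length 110 come from my debugging.

```python
def matmul_scores_shape(query_shape, key_shape):
    """Shape of query @ key^T for (batch, n_heads, length, dim_per_head) inputs."""
    batch, heads, q_len, dim = query_shape
    k_batch, k_heads, k_len, k_dim = key_shape
    if (batch, heads, dim) != (k_batch, k_heads, k_dim):
        raise ValueError("batch, n_heads, and dim_per_head must agree")
    return (batch, heads, q_len, k_len)


def broadcast_mismatch(shape_a, shape_b):
    """Return (axis, size_a, size_b) for the rightmost axis that cannot
    broadcast, or None if the two same-rank shapes are compatible."""
    if len(shape_a) != len(shape_b):
        raise ValueError("this sketch only handles shapes of equal rank")
    for axis in reversed(range(len(shape_a))):
        a, b = shape_a[axis], shape_b[axis]
        # Sizes are compatible when they are equal or one of them is 1.
        if a != b and a != 1 and b != 1:
            return (axis, a, b)
    return None


Q_LEN = 8  # hypothetical query length; the real value is not known here

key_states_shape = (16, 32, 440, 64)      # observed while debugging
query_states_shape = (16, 32, Q_LEN, 64)  # assumed to match key_states
scores_shape = matmul_scores_shape(query_states_shape, key_states_shape)

# key_length is still 110, so the bias keeps 110 on its last axis.
position_bias_masked_shape = (16, 32, Q_LEN, 110)

mismatch = broadcast_mismatch(scores_shape, position_bias_masked_shape)
print(scores_shape, mismatch)
```

The check fails on axis 3 with sizes 440 and 110, which has the same form as the reported error. Both pairs of sizes (440 vs. 110 here, 416 vs. 104 in the error message) differ by a factor of 4.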

I tried modifying the code, but that only produced more errors. I'm not sure whether one of my settings is wrong, so I'd appreciate your advice. Thank you.
