RuntimeError: The size of tensor a (416) must match the size of tensor b (104) at non-singleton dimension 3 #31

@GaoR311

Description

Hello, author. Today, when I ran sh run_scripts/sevila/pre-train/pretrain_qvh.sh, I hit the error above. The refine script and the other scripts run without issues. Tracing the bug in a debugger, I found that the failure occurs on the 26th iteration, when scores is added to position_bias_masked.

The cause is in T5Attention: when key_states is computed, the project function takes the following cross-attention branch:

# cross-attn
# (batch_size, n_heads, seq_length, dim_per_head)
hidden_states = shape(proj_layer(key_value_states))

At this point, the shape of key_states becomes [16, 32, 440, 64], so the last dimension of scores (dimension 3, counting from 0) is 440. However, key_length remains 110, so the same dimension of position_bias_masked is 110, and the addition fails.
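The shape arithmetic behind this mismatch can be reproduced without loading the model. The following is a minimal pure-Python sketch, not code from the repository: the helper names, the query length Q_LEN, and the leading dimensions of position_bias_masked are hypothetical and chosen only for illustration. Only the key_states shape and the key lengths 440 and 110 come from the debugging session described above.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of a and b using NumPy/PyTorch-style rules.

    Raises ValueError if two sizes differ and neither of them is 1.
    """
    out = []
    rank = max(len(a), len(b))
    # Compare sizes from the trailing dimension backwards, padding with 1.
    for i, (x, y) in enumerate(zip_longest(reversed(a), reversed(b), fillvalue=1)):
        if x != y and x != 1 and y != 1:
            dim = rank - 1 - i
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
        out.append(max(x, y))
    return tuple(reversed(out))


def attention_scores_shape(query_shape, key_shape):
    """Shape of query @ key^T: (batch, n_heads, q_len, k_len)."""
    batch, heads, q_len, dim = query_shape
    k_batch, k_heads, k_len, k_dim = key_shape
    assert (batch, heads, dim) == (k_batch, k_heads, k_dim)
    return (batch, heads, q_len, k_len)


Q_LEN = 1  # hypothetical: the query length is not stated in the report

key_states = (16, 32, 440, 64)  # shape observed after the cross-attn projection
scores = attention_scores_shape((16, 32, Q_LEN, 64), key_states)

# Hypothetical leading dimensions; only key_length = 110 comes from the report.
position_bias_masked = (1, 32, Q_LEN, 110)

try:
    broadcast_shape(scores, position_bias_masked)
    mismatch = None
except ValueError as err:
    mismatch = str(err)

print(scores)
print(mismatch)
```

Under these assumptions, the key length of scores follows key_states, while position_bias_masked still uses key_length, so the two cannot be broadcast together along the last dimension.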

I tried modifying the code, but that only produced more errors. I'm not sure whether one of my settings is wrong, so I'd like to ask for your advice. Thank you.
