Hi,
Thank you for the hard work you've put into this project. I really appreciate the effort and dedication that has gone into maintaining Tora.
I have a question regarding the difference between the model architecture described in the paper and the current implementation.
The paper mentions that a cross-attention mechanism is used in both the S-DiT-B and T-DiT-B blocks.
However, from reviewing the code, I couldn't find where this cross-attention is actually implemented.
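
For concreteness, this is roughly the kind of layer I expected to find inside each block: queries come from the block's hidden states and keys/values come from the conditioning tokens. This is just a minimal sketch of generic cross-attention to illustrate my question; the class and parameter names (`CrossAttention`, `to_q`, `to_kv`, etc.) are my own, not from the repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Minimal multi-head cross-attention sketch (not the repo's code):
    queries from the block's hidden states, keys/values from the
    conditioning sequence."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (B, N, D) hidden states of the DiT block
        # context: (B, M, D) conditioning tokens
        B, N, D = x.shape
        h = self.num_heads
        q = self.to_q(x).view(B, N, h, D // h).transpose(1, 2)
        k, v = self.to_kv(context).chunk(2, dim=-1)
        k = k.view(B, -1, h, D // h).transpose(1, 2)
        v = v.view(B, -1, h, D // h).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)  # (B, h, N, D // h)
        out = out.transpose(1, 2).reshape(B, N, D)
        return self.proj(out)
```
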
If I’m missing something, please let me know.
Thanks again!