Implementation and the description in the paper #47

@pmy0792

Hi,
Thank you for the hard work you’ve put into this project. I really appreciate the effort and dedication that has gone into maintaining Tora.

I have a question regarding the difference between the model architecture described in the paper and the current implementation.
The paper states that both the S-DiT-B block and the T-DiT-B block use a cross-attention mechanism.
However, after reviewing the code, I could not find where this cross-attention is implemented.

If I’m missing something, please let me know.

Thanks again!
