Thanks again to the authors for open-sourcing part of the training code, which helps a lot with reproduction.
How can rule-based chunking, such as the space-like chunking mentioned in the paper, be implemented? Some examples would be great!
My understanding from the paper is that the chunking mechanism is driven by trained linear Q and linear K projections. How, then, can space-like chunking be used during training?
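To make the question concrete, here is a minimal sketch of what I imagine space-like chunking could look like at the byte level. It is only my guess, not the authors' method: the rule (open a new chunk at the first byte and at every non-whitespace byte that follows whitespace) is an assumption, and the names `space_boundary_mask` and `split_chunks` are hypothetical helpers, not functions from the released code.

```python
# Assumption: "space-like" means ASCII whitespace; the paper's rule may differ.
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"


def space_boundary_mask(text: str) -> list[bool]:
    """Return one flag per UTF-8 byte; True marks the start of a new chunk."""
    data = text.encode("utf-8")
    mask = []
    prev_is_space = False
    for i, b in enumerate(data):
        is_space = b in ASCII_WHITESPACE
        # The first byte always opens a chunk; afterwards a chunk starts
        # at the first non-whitespace byte that follows whitespace.
        mask.append(i == 0 or (prev_is_space and not is_space))
        prev_is_space = is_space
    return mask


def split_chunks(text: str) -> list[bytes]:
    """Group the UTF-8 bytes of `text` into chunks using the mask above."""
    data = text.encode("utf-8")
    chunks = []
    for b, starts_chunk in zip(data, space_boundary_mask(text)):
        if starts_chunk:
            chunks.append(bytearray())
        chunks[-1].append(b)
    return [bytes(c) for c in chunks]
```

What I do not understand is how a fixed mask like this would replace the boundaries predicted from the learned Q and K projections during training.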
Thanks.