Question about space-like chunking

Thanks again to the authors for open-sourcing part of the training code, which helps a lot with reproduction.

How is the rule-based chunking mentioned in the paper, **space-like chunking**, implemented? Some examples would be great!

My understanding from the paper is that the chunking mechanism is driven by trained linear layers **Q** and **K**, but how is space-like chunking used during training?
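
To make the question concrete, here is my naive guess at what a rule-based variant might look like. It is only a sketch under my own assumptions, not something taken from the paper or the repo: the function names `space_boundary_mask` and `split_by_mask` are hypothetical, and I am assuming that a chunk starts at position 0 and at every whitespace byte. Would a fixed mask like this simply replace the boundaries predicted from **Q** and **K** during training?

```python
def space_boundary_mask(data: bytes) -> list[bool]:
    """Return one flag per byte; True marks the first byte of a chunk.

    Assumption (mine, not the paper's): a chunk starts at position 0
    and at every whitespace byte.
    """
    whitespace = b" \t\n\r"
    # Iterating over bytes yields ints; `int in bytes` tests membership.
    return [i == 0 or b in whitespace for i, b in enumerate(data)]


def split_by_mask(data: bytes, mask: list[bool]) -> list[bytes]:
    """Cut `data` into chunks, starting a new chunk wherever mask is True."""
    starts = [i for i, is_start in enumerate(mask) if is_start]
    ends = starts[1:] + [len(data)]
    return [data[s:e] for s, e in zip(starts, ends)]


if __name__ == "__main__":
    text = b"hello world  ok"
    mask = space_boundary_mask(text)
    print(split_by_mask(text, mask))
```

With this convention each word keeps its leading space, and consecutive spaces each become their own chunk. I am not sure whether that matches what the paper intends.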

Thanks.