DeepSeek Native Sparse Attention PyTorch implementation (unofficial)
[NSA from Scratch] DeepSeek's New Work: Native Sparse Attention, a Long-Form Walkthrough (with Code)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention