
RTX30xx FP8 feather matmul triton kernel? #16

@phazei


I saw that https://github.com/SuriyaaMM/feather was released not too long ago.

I was wondering whether that's something that would ideally be implemented here. Would it basically speed up all FP8 operations, presuming the operation is memory-bound, i.e. the compute finishes long before the data has been copied from VRAM to the registers? It seems to be in line with the kernels here.
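One rough way to frame the question is a roofline-style estimate: an operation only gets faster from smaller storage if moving the bytes takes longer than the math. The sketch below is only an illustration of that reasoning. The function name, the bandwidth and throughput figures, and the per-element op counts are all hypothetical placeholders, not measurements of any RTX 30xx card and not taken from the feather repository.

```python
def estimated_time_s(n_elements, bytes_per_element, flops_per_element,
                     bandwidth_bytes_per_s, peak_flops_per_s):
    """Roofline-style estimate: the slower of memory traffic and compute dominates.

    Returns (estimated_time_in_seconds, is_memory_bound).
    """
    memory_time = n_elements * bytes_per_element / bandwidth_bytes_per_s
    compute_time = n_elements * flops_per_element / peak_flops_per_s
    return max(memory_time, compute_time), memory_time > compute_time


# Hypothetical placeholder hardware numbers (assumptions, not real GPU specs).
BANDWIDTH = 1.0e12  # bytes per second
PEAK = 1.0e13       # floating-point operations per second

N = 1 << 20  # number of elements read by the operation

# 16-bit storage: 2 bytes per element, assumed 2 ops per element.
fp16_time, fp16_memory_bound = estimated_time_s(N, 2, 2, BANDWIDTH, PEAK)

# 8-bit storage: 1 byte per element, plus assumed extra ops to upcast in registers.
fp8_time, fp8_memory_bound = estimated_time_s(N, 1, 4, BANDWIDTH, PEAK)

# Ratio above 1 means the 8-bit variant is estimated to be faster.
speedup = fp16_time / fp8_time

# Compute-heavy case: with many ops per element, fewer bytes no longer helps.
heavy16_time, heavy16_memory_bound = estimated_time_s(N, 2, 100, BANDWIDTH, PEAK)
heavy8_time, heavy8_memory_bound = estimated_time_s(N, 1, 102, BANDWIDTH, PEAK)
```

Under these made-up numbers the memory-bound case benefits from halving the bytes moved, while the compute-heavy case does not, which is why "all FP8 operations" would only speed up if they are limited by memory traffic rather than arithmetic.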
