feat: addons for FP8 attention bmm, paged attention, and linear in FMS #154
ani300 merged 16 commits into foundation-model-stack:main from
Conversation
Signed-off-by: Andrea Fasoli <andrea.fasoli@ibm.com>
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
This PR needs minimal changes and in my opinion it is good to go. The FP8 addons are an experimental feature that may require further validation of math and model outputs, but they don't interact with other parts of FMS-MO, so they won't break existing code. The only exception is the additional import of torchao, needed only for FP8, which is being added to the build as an optional requirement. @tharapalanivel @chichun-charlie-liu please check that this is done appropriately. @ani300 we have bare-bones unit tests for the other addons that check op registration in the torch namespace and validate output shapes: https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/tests/aiu_addons/test_gptq_addon.py
andrea-fasoli left a comment
this PR looks ready to go
Now that ibm-fms is an optional package, we will need to add guards to each file that imports fms, similar to the other aiu_addon files.
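A minimal sketch of an optional-dependency guard in the spirit of the guards used by the aiu_addon files; the names `HAS_FMS` and `register_fms_addons` are illustrative, not the actual FMS-MO API:

```python
# Guard the optional ibm-fms import so modules load even when it is absent.
try:
    import fms  # optional ibm-fms package
    HAS_FMS = True
except ImportError:
    fms = None
    HAS_FMS = False

def register_fms_addons() -> None:
    """Register addon ops, failing with a clear message if fms is missing."""
    if not HAS_FMS:
        raise ImportError(
            "ibm-fms is required for this addon; install it with "
            "`pip install ibm-fms`"
        )
    # ...actual registration against fms would go here (hypothetical)...
```

Failing at registration time, rather than at import time, keeps the rest of FMS-MO usable without the optional dependency.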
@BrandonGroth done
Description of the change
This is an updated version of @andrea-fasoli 's refactor of my FP8 work, adding Paged Attention kernels as well as cleaning up the code.
Related issues or PRs
#149
How to verify the PR
Code review (including math) is required.
Was the PR tested
Checklist for passing CI/CD:
- `git commit --signoff` or equivalent
- `tox -e fix`
- `tox -e lint`
- `tox -e spellcheck`
- `tox -e unit`

Note: CI/CD performs unit tests on multiple versions of Python from a fresh install. There may be differences between your local environment and the test environment.