Some simple or mock implementations of LLM infra functions.

TODO:

- training: parallel_implementation
- inference (vLLM features): clean_vllm
  - chunked prefill
  - paged attention
- CUDA algorithms
  - flash attention
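
The paged-attention item above can be sketched as follows: the KV cache is stored in fixed-size physical blocks, and each sequence keeps a block table mapping logical block indices to physical ones, so a sequence's cache need not be contiguous in memory. This is a minimal NumPy sketch, not vLLM's actual implementation; the names (`PagedKVCache`, `BLOCK_SIZE`, `gather`) are made up for illustration.

```python
import numpy as np

BLOCK_SIZE = 4  # tokens per KV-cache block (toy value; vLLM uses e.g. 16)

class PagedKVCache:
    """Toy paged KV cache: keys/values live in fixed-size physical blocks,
    and a per-sequence block table maps logical blocks to physical ones."""

    def __init__(self, num_blocks: int, head_dim: int):
        self.k = np.zeros((num_blocks, BLOCK_SIZE, head_dim))
        self.v = np.zeros((num_blocks, BLOCK_SIZE, head_dim))
        self.free = list(range(num_blocks))  # pool of free physical blocks
        self.block_tables = {}               # seq_id -> [physical block ids]
        self.lengths = {}                    # seq_id -> tokens written so far

    def append(self, seq_id, k_vec, v_vec):
        """Write one token's key/value, allocating a new block when needed."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:              # current block is full (or first token)
            table.append(self.free.pop())
        blk, off = table[n // BLOCK_SIZE], n % BLOCK_SIZE
        self.k[blk, off] = k_vec
        self.v[blk, off] = v_vec
        self.lengths[seq_id] = n + 1

    def gather(self, seq_id):
        """Return the sequence's K and V as contiguous (n, head_dim) arrays."""
        n = self.lengths[seq_id]
        table = self.block_tables[seq_id]
        ks = self.k[table].reshape(-1, self.k.shape[-1])[:n]
        vs = self.v[table].reshape(-1, self.v.shape[-1])[:n]
        return ks, vs
```

In a real kernel the attention reads K/V directly through the block table instead of gathering into contiguous buffers; the gather here just makes the mapping easy to check.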
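
For the flash-attention item, the core trick is the online softmax: scores are computed tile by tile over K/V while maintaining a running max, a running normalizer, and an unnormalized output accumulator, so the full score vector is never materialized. A rough NumPy sketch for a single query row (the real algorithm is a fused CUDA kernel tiled over both queries and keys; `flash_attention_row` and its `block` parameter are illustrative names):

```python
import numpy as np

def flash_attention_row(q, K, V, block=4):
    """Attend one query row over K/V in tiles of `block` rows,
    using the online-softmax recurrence (running max m, normalizer l,
    unnormalized accumulator acc)."""
    d = q.shape[0]
    m = -np.inf                      # running max of scores seen so far
    l = 0.0                          # running softmax normalizer
    acc = np.zeros(V.shape[1])       # unnormalized weighted sum of V rows
    for start in range(0, K.shape[0], block):
        Kb = K[start:start + block]
        Vb = V[start:start + block]
        s = Kb @ q / np.sqrt(d)      # scaled scores for this tile only
        m_new = max(m, s.max())
        # rescale previous partial sums to the new running max
        scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ Vb
        m = m_new
    return acc / l
```

The result matches naive softmax attention exactly; the win on a GPU is that each K/V tile is read once through fast shared memory and the O(n) score row never hits HBM.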