From the report, it is not easy to tell how much VRAM the KV cache is saving #4

@gaowayne

Please review the data below.

================================================================================
Mode                      Context    Peak VRAM    KV Est     Speed        Prefill
================================================================================
fp16                      460        6098         159        68.9         0.294
turboquant-4bit           460        6066         170        32.9         0.321
turboquant-3bit           460        6070         173        40.1         0.038
fp16                      930        6304         303        73.3         0.019
turboquant-4bit           930        6242         343        33.1         0.039
turboquant-3bit           930        6249         350        39.7         0.034
fp16                      1860       6715         605        69.5         0.021
turboquant-4bit           1860       6590         687        32.6         0.054
turboquant-3bit           1860       6605         702        39.3         0.045
fp16                      3720       7532         1212       64.0         0.040
turboquant-4bit           3720       7285         1376       32.5         0.119
turboquant-3bit           3720       7317         1407       38.7         0.092
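
To make the question concrete, here is a minimal sketch of the comparison I would expect the report to show: for each context length, the difference in Peak VRAM and in KV Est relative to the fp16 row. The rows are copied from the table above. The helper name `savings_vs_fp16` is hypothetical and not part of the tool, and because the report does not state units, the sketch only assumes both columns use the same unit across rows.

```python
# Rows copied from the report above: (mode, context, peak_vram, kv_est).
# Units are not stated in the report, so values are compared as-is.
rows = [
    ("fp16", 460, 6098, 159),
    ("turboquant-4bit", 460, 6066, 170),
    ("turboquant-3bit", 460, 6070, 173),
    ("fp16", 930, 6304, 303),
    ("turboquant-4bit", 930, 6242, 343),
    ("turboquant-3bit", 930, 6249, 350),
    ("fp16", 1860, 6715, 605),
    ("turboquant-4bit", 1860, 6590, 687),
    ("turboquant-3bit", 1860, 6605, 702),
    ("fp16", 3720, 7532, 1212),
    ("turboquant-4bit", 3720, 7285, 1376),
    ("turboquant-3bit", 3720, 7317, 1407),
]


def savings_vs_fp16(rows):
    """Return (mode, context, peak_saved, kv_est_delta) for each non-fp16 row.

    peak_saved   = fp16 peak VRAM minus this mode's peak VRAM (positive = saving)
    kv_est_delta = this mode's KV Est minus fp16 KV Est (positive = larger estimate)
    """
    baseline = {ctx: (peak, kv) for mode, ctx, peak, kv in rows if mode == "fp16"}
    result = []
    for mode, ctx, peak, kv in rows:
        if mode == "fp16":
            continue
        base_peak, base_kv = baseline[ctx]
        result.append((mode, ctx, base_peak - peak, kv - base_kv))
    return result


if __name__ == "__main__":
    print(f"{'Mode':<18}{'Context':>8}{'Peak saved':>12}{'KV Est delta':>14}")
    for mode, ctx, peak_saved, kv_delta in savings_vs_fp16(rows):
        print(f"{mode:<18}{ctx:>8}{peak_saved:>12}{kv_delta:>14}")
```

With these numbers, Peak VRAM is slightly lower for both turboquant modes at every context length, while the KV Est column is higher than fp16, which is why the saving is hard to read from the report as it stands.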
