
vLLM console stats are wrong with performance-mode enabled #404

@yiminghub2024


(APIServer pid=1) INFO 04-27 13:13:47 [loggers.py:259] Engine 000: Avg prompt throughput: 1.3 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO: 10.133.72.160:53084 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:53098 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:56890 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:13:57 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO: 10.133.72.160:56900 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:56916 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:49642 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:49652 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:49668 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:46842 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.90.1.5:35528 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:46854 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:46868 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:14:27 [loggers.py:259] Engine 000: Avg prompt throughput: 1.3 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO: 10.133.72.160:59910 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:59916 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:59920 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.90.1.5:38502 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.90.1.5:38510 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:14:37 [loggers.py:259] Engine 000: Avg prompt throughput: 2.6 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO: 10.133.72.160:54104 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO: 10.133.72.160:54108 - "GET /metrics HTTP/1.1" 200 OK

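After I enable performance-mode, the stats printed to the console are wrong. I have many sessions, high token throughput, heavy KV cache usage, and many prefix cache hits, but the log reports 0 for them (for example, Running: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%).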
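To make it easier to compare these console figures against another source, here is a minimal sketch that parses the stats line in the format shown in the log above. The names `STATS_RE` and `parse_stats_line` and the dictionary keys are hypothetical, chosen for illustration; they are not part of vLLM, and the pattern only assumes the line layout pasted in this issue.

```python
import re

# Pattern for the periodic stats line, based only on the log format pasted above.
# Assumption: the field order and wording match that log; other versions may differ.
STATS_RE = re.compile(
    r"Engine (?P<engine>\d+): "
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_usage_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)


def parse_stats_line(line):
    """Return the stats fields of one log line as a dict, or None if it is not a stats line."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "engine": fields["engine"],
        "prompt_tps": float(fields["prompt_tps"]),
        "gen_tps": float(fields["gen_tps"]),
        "running": int(fields["running"]),
        "waiting": int(fields["waiting"]),
        "kv_usage_pct": float(fields["kv_usage_pct"]),
        "prefix_hit_pct": float(fields["prefix_hit_pct"]),
    }


if __name__ == "__main__":
    sample = (
        "(APIServer pid=1) INFO 04-27 13:14:37 [loggers.py:259] Engine 000: "
        "Avg prompt throughput: 2.6 tokens/s, Avg generation throughput: 0.2 tokens/s, "
        "Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, "
        "Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%"
    )
    print(parse_stats_line(sample))
```

Running this over the full console log gives one record per stats interval, which can then be checked against the request lines (the `POST /v1/chat/completions` entries) logged in the same window.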
