performance-mode: vLLM console log prints incorrect stats

(APIServer pid=1) INFO 04-27 13:13:47 [loggers.py:259] Engine 000: Avg prompt throughput: 1.3 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO:     10.133.72.160:53084 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:53098 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:56890 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:13:57 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO:     10.133.72.160:56900 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:56916 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:49642 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:49652 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:49668 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:46842 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.90.1.5:35528 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:46854 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:46868 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:14:27 [loggers.py:259] Engine 000: Avg prompt throughput: 1.3 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO:     10.133.72.160:59910 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:59916 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:59920 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.90.1.5:38502 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.90.1.5:38510 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 04-27 13:14:37 [loggers.py:259] Engine 000: Avg prompt throughput: 2.6 tokens/s, Avg generation throughput: 0.2 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, MM cache hit rate: 12.1%
(APIServer pid=1) INFO:     10.133.72.160:54104 - "GET /metrics HTTP/1.1" 200 OK
(APIServer pid=1) INFO:     10.133.72.160:54108 - "GET /metrics HTTP/1.1" 200 OK


After I enable performance-mode, the stats printed to the console are wrong. I have many active sessions, high token throughput, heavy KV cache usage, and many prefix cache hits, but the log reports them all as 0.
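
To make the mismatch easier to show, here is a rough sketch that pulls the numbers out of the `Engine 000:` stats lines above so they can be compared with what the clients actually see. The regex is written only against the line format in this log, and the names `parse_stats_line` and `looks_idle` are made up for illustration; they are not part of vLLM.

```python
import re

# Pattern matched against the stats line format shown in the log above.
# Assumption: the field order and wording stay exactly as printed there.
STATS_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_usage>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit>[\d.]+)%"
)

INT_FIELDS = ("running", "waiting")


def parse_stats_line(line):
    """Return the stats as a dict, or None if the line is not a stats line."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        key: int(value) if key in INT_FIELDS else float(value)
        for key, value in match.groupdict().items()
    }


def looks_idle(stats):
    """True when the line claims no requests, no KV cache use, no prefix hits."""
    return (
        stats["running"] == 0
        and stats["waiting"] == 0
        and stats["kv_usage"] == 0.0
        and stats["prefix_hit"] == 0.0
    )


if __name__ == "__main__":
    import sys

    # Usage: pipe the server log in, e.g. `python check_stats.py < server.log`
    for raw in sys.stdin:
        stats = parse_stats_line(raw)
        if stats is not None and looks_idle(stats):
            print("idle-looking stats line:", stats)
```

Every stats line in the log above would be flagged by this check, even though `POST /v1/chat/completions` requests are completing in the same window.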

performance-mode vllm console print is error #404

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

performance-mode vllm console print is error #404

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions