-
Notifications
You must be signed in to change notification settings - Fork 167
Description
What did you do?
Hello, I am new to Consul and trying to understand why consul_up metric continuously fluctuates between up and down, despite all services running well (all Consul nodes are healthy and pods running). We have an alert set to trigger when consul_up is failing to be above 90% in past 5 min: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90.
What did you expect to see?.
We expect to see consul_up give a value of 1 and be constant.
What did you see instead? Under which circumstances?
Instead, we see continuous fluctuations between consul_up being 1 (up) and 0 (down). Thus, our alert is getting triggered often even when all Consul health checks are spotless (we had a Consul support engineer verify this).
I have attached all images and log files explaining the issue.
Environment
Prod
-
System information:
Linux 5.8.0-1041-aws x86_64
-
consul_exporter version:
0.7.1
-
Consul version:
Consul v1.8.0
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible
agents) -
Prometheus version:
prometheus, version 2.28.1 (branch: HEAD, revision: b0944590a1c9a6b35dc5a696869f75f422b107a1)
-
Prometheus configuration file:
-
Logs:
prometheus_consul_exporter_logs.txt
prometheus_logs.txt


