Consul Fails to Query Service Health - consul_up is down ~40% of time #255

@nikashnarula

Description

What did you do?
Hello, I am new to Consul and trying to understand why the consul_up metric continuously fluctuates between up and down, even though all services are running well (all Consul nodes are healthy and all pods are running). We have an alert that triggers when the 5-minute average of consul_up drops below 90%: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90.
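For context, the alert described above corresponds to a Prometheus alerting rule roughly like the following (a sketch; the group name, alert name, and labels are illustrative, only the expr is from our setup):

```yaml
# Sketch of the alerting rule described above; names are made up.
groups:
  - name: consul-exporter
    rules:
      - alert: ConsulExporterConsulDown
        # Fires when consul_up, averaged over the last 5 minutes,
        # was up less than 90% of the time.
        expr: (avg_over_time(consul_up{job="consul-exporter"}[5m]) * 100) < 90
        labels:
          severity: warning
        annotations:
          summary: "consul_exporter failed to query Consul more than 10% of the time over the last 5 minutes"
```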

What did you expect to see?
We expect consul_up to hold a constant value of 1.

What did you see instead? Under which circumstances?
Instead, we see consul_up continuously fluctuating between 1 (up) and 0 (down). As a result, our alert fires frequently even though all Consul health checks are passing (a Consul support engineer verified this).
I have attached images and log files illustrating the issue.
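One way to narrow this down (a suggestion on our part, not something from the attached logs) is to compare Prometheus's own `up` metric for the exporter target against `consul_up`, which consul_exporter sets to 0 whenever its last query of the Consul API failed:

```promql
# Scrape health of the exporter target itself
# (1 = Prometheus scraped the exporter successfully).
up{job="consul-exporter"}

# Health of the exporter's query to Consul
# (1 = the exporter's last query of Consul succeeded).
consul_up{job="consul-exporter"}
```

If `up` stays at 1 while `consul_up` flaps, the exporter is reachable but its queries to Consul are intermittently failing (e.g. timeouts or transient API errors), which would point at the exporter-to-Consul path rather than at Prometheus scraping.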

Attachments: consul_nodes_health, consul_uptime_graph, consul_uptime_value

Environment
Prod

  • System information:

    Linux 5.8.0-1041-aws x86_64

  • consul_exporter version:

    0.7.1

  • Consul version:

    Consul v1.8.0
    Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible
    agents)

  • Prometheus version:

    prometheus, version 2.28.1 (branch: HEAD, revision: b0944590a1c9a6b35dc5a696869f75f422b107a1)

  • Prometheus configuration file:

    prometheus_config.txt

  • Logs:
    prometheus_consul_exporter_logs.txt
    prometheus_logs.txt
