MCP: allow filtering infrastructure_health by cluster #4286

@ebarron

Summary

Add an optional cluster parameter to the infrastructure_health MCP tool so users can scope the health check to a single cluster (or a small set of clusters), rather than always evaluating every cluster the TSDB is monitoring.

Background

infrastructure_health currently runs a hardcoded set of PromQL queries (cluster status, node status, aggregate status, volume status, capacity thresholds, health alerts) with no label filter — it always evaluates across every cluster the TSDB is monitoring.

In multi-cluster Harvest deployments (multiple datacenters, lab vs. prod, etc.), operators commonly want to check the health of one specific cluster or a logical grouping of clusters (e.g. all prod clusters, one lab) without seeing noise from the others.

Proposed enhancement

Add an optional argument to the tool, e.g.:

  • cluster — exact cluster name, or
  • cluster_match — regex pattern matched against the cluster label

When provided, each of the internal PromQL queries should be augmented with a {cluster="X"} (or {cluster=~"pattern"}) matcher so the report only reflects the targeted cluster(s).

When omitted, behavior is unchanged (global view).
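
To make the proposal concrete, here is a minimal sketch of how the matcher could be derived from the two optional arguments and spliced into each query. It is illustrative only: the real queries inside infrastructure_health are not shown in this issue, so the metric names, the `{sel}` placeholder, and the helper names (`cluster_selector`, `build_queries`) are all hypothetical. Rejecting the case where both arguments are set is also an assumption, not something the proposal specifies.

```python
# Hypothetical query templates. "{sel}" marks where a label selector would go.
# The metric names are placeholders, not the tool's actual queries.
QUERY_TEMPLATES = {
    "nodes_down": "count(example_node_status{sel} == 0)",
    "volumes_offline": "count(example_volume_status{sel} == 0)",
}


def escape_label_value(value):
    """Escape a value for use inside a double-quoted PromQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def cluster_selector(cluster=None, cluster_match=None):
    """Return a PromQL selector for the cluster label, or "" for the global view."""
    if cluster and cluster_match:
        # Assumed behavior: treat the two arguments as mutually exclusive.
        raise ValueError("pass either cluster or cluster_match, not both")
    if cluster:
        return '{cluster="%s"}' % escape_label_value(cluster)
    if cluster_match:
        return '{cluster=~"%s"}' % escape_label_value(cluster_match)
    return ""


def build_queries(cluster=None, cluster_match=None):
    """Fill every template with the same selector so the report stays consistent."""
    sel = cluster_selector(cluster, cluster_match)
    return {name: tpl.replace("{sel}", sel) for name, tpl in QUERY_TEMPLATES.items()}


if __name__ == "__main__":
    for name, query in build_queries(cluster_match="lab-.*").items():
        print(name, "=>", query)
```

With no arguments the selector is empty and the queries are identical to the unfiltered ones, which matches the "behavior is unchanged" requirement. Queries that already carry label matchers would need the cluster matcher merged into the existing braces rather than appended, which this sketch does not attempt.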

Use cases

  • "How is prod-east-1 doing right now?"
  • "Are any of my lab clusters unhealthy?"
  • Per-datacenter ops dashboards / chatbot conversations scoped to a single site.

Notes

Filed as part of a NAbox chatbot integration review — all the other harvest-mcp query tools (metrics_query, metrics_range_query, list_metrics with matches, list_label_values) already allow cluster scoping via PromQL label matchers in their query arguments. infrastructure_health is the only "what's wrong?" tool that can't be narrowed.
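
For comparison, the difference between the two styles of scoping might look like this at the MCP request level. The `tools/call` envelope follows the MCP JSON-RPC shape. The `query` argument name and the metric name are assumptions for illustration, and the second request uses the `cluster` argument proposed in this issue, which does not exist yet.

```python
import json


def tool_call(name, arguments, request_id=1):
    """Build an MCP tools/call JSON-RPC request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Possible today: the caller scopes the result with a label matcher in the PromQL text.
# "query" and "example_node_status" are assumed names, used only for illustration.
scoped_query = tool_call(
    "metrics_query",
    {"query": 'example_node_status{cluster="prod-east-1"}'},
)

# Proposed: infrastructure_health accepts the cluster directly and adds the
# matcher to its internal queries itself.
scoped_health = tool_call(
    "infrastructure_health",
    {"cluster": "prod-east-1"},
    request_id=2,
)

if __name__ == "__main__":
    print(json.dumps(scoped_query, indent=2))
    print(json.dumps(scoped_health, indent=2))
```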
