Recommended approaches to handle high cardinality from workload metrics (e.g., qos_read_io_type)? #4247
Hi Harvest Team,

We are using Harvest to monitor our NetApp storage systems, but we have run into high-cardinality issues after enabling `workload.yaml` and `workload_volume.yaml`. Our monitoring team cautions that high cardinality is an anti-pattern in the observability world. Specifically, metrics like `qos_read_io_type` emit one timeseries per combination of volume and `metric` label value:

`qos_read_io_type{..., volume="volume_A", ..., metric="hya_non_cache"}`

For environments with thousands of volumes, this combinatorial explosion (number of volumes × distinct values of the `metric` label, such as `cache`, `disk`, `bamboo_ssd`) results in extremely high cardinality that easily exceeds our telemetry ingestion limits (e.g., 50,000 timeseries); for example, 5,000 volumes × 10 io types would already produce 50,000 series from this one metric alone. Furthermore, we noticed that many of these timeseries continuously report a value of 0.

For now, we have disabled `workload.yaml` and `workload_volume.yaml`, although we would ideally like to re-enable them in the future. We would love your suggestions on recommended approaches to handle this.
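P.S. For anyone wanting to gauge the scale first, a count query like the sketch below (assuming Prometheus is the backing TSDB; the label names come from the series shown above) shows the per-io-type breakdown:

```promql
# How many qos_read_io_type series exist per io-type value?
count by (metric) (qos_read_io_type)

# Total series emitted by this one metric across all volumes.
count(qos_read_io_type)
```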
Replies: 1 comment
hi @songlin-rgb, good question. Here are some general thoughts on high cardinality.
Other templates, such as `CIFSSession`, `CIFSShare`, `NetConnections`, etc., can be high cardinality too.

Harvest doesn't have a collection-time option to drop zero-valued series (something like `drop_if_zero: true`?), but you can do that at ingest via recording rules or `metric_relabel_configs`. You could also use this to only keep metrics for specific volumes, …
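As a concrete sketch of the ingest-side filtering mentioned above (assuming Prometheus scrapes the Harvest poller directly; the job name, target address, and volume pattern are placeholders), `metric_relabel_configs` can drop series before they are stored:

```yaml
scrape_configs:
  - job_name: harvest                      # placeholder job name
    static_configs:
      - targets: ["harvest-poller:12990"]  # placeholder poller address
    metric_relabel_configs:
      # Drop the high-cardinality metric entirely at ingest.
      - source_labels: [__name__]
        regex: qos_read_io_type
        action: drop
      # Or drop qos series only for volumes matching a pattern
      # (vol_tmp.* is a hypothetical naming convention).
      - source_labels: [__name__, volume]
        regex: qos_.*;vol_tmp.*
        action: drop
```

A `keep` action can invert this into an allowlist of volumes, but note that `keep` discards every non-matching series in the scrape job, so scope it carefully.

For the recording-rule route, here's a sketch that rolls per-volume series up to a coarser level (the `cluster`/`svm` labels are assumptions; adjust to the labels in your environment):

```yaml
groups:
  - name: harvest-qos-rollups
    rules:
      # Aggregate away the volume label, keeping per-io-type totals.
      - record: svm:qos_read_io_type:sum
        expr: sum by (cluster, svm, metric) (qos_read_io_type)
```

Keep in mind that recording rules only create new aggregated series; they don't remove the raw per-volume ones, so pair them with relabeling (or shorter retention) if the raw series must not be ingested at all.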