Advanced Analytics & Statistical Analysis

ProDisco is not limited to fetching Kubernetes resources: its sandbox also includes statistical analysis capabilities for in-depth cluster observability. By combining Prometheus metrics with the simple-statistics library, you can run anomaly detection, trend analysis, and correlation analysis directly in the sandbox.

Available Analytics Library
Discovering Analytics Functions
Example Workflows
Quick Reference: Prompt Examples

Available Analytics Library

The sandbox provides the simple-statistics library for statistical analysis:

| Library | Version | Purpose | Key Functions |
|---|---|---|---|
| simple-statistics | 7.8.8 | Descriptive stats, distributions, regression | `mean`, `median`, `standardDeviation`, `zScore`, `linearRegression`, `sampleCorrelation` |

Discovering Analytics Functions

Use `searchTools` with `documentType: "function"` to discover the available analytics functions:

```javascript
// List all analytics functions
{ documentType: "function", library: "simple-statistics" }

// Search for specific functions
{ methodName: "regression", documentType: "function" }

// Find correlation functions
{ methodName: "correlation", documentType: "function" }
```

Example Workflows

1. Cluster Health Report with Statistics

Prompt:

Analyze the CPU and memory usage across all pods in my cluster. Calculate mean, median, standard deviation, and identify any outliers using z-scores. Show me which pods are consuming resources above the 95th percentile.

What it does:

Queries CPU and memory metrics for all pods
Calculates descriptive statistics (mean, median, std dev, min, max)
Computes z-scores to identify statistical outliers
Finds pods above the 95th percentile
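This page does not show the metric query itself. The following is a hedged sketch. It assumes the standard cAdvisor metric `container_cpu_usage_seconds_total` and a response in the Prometheus HTTP API's instant-vector shape. Under those assumptions, per-pod CPU in millicores could be extracted as shown below. `CPU_QUERY` and `toPodCpu` are hypothetical names, not ProDisco APIs.

```javascript
// Hypothetical PromQL: per-pod CPU usage in cores, as a 5-minute rate.
// container!="" drops the pod-level aggregate series that cAdvisor also exports.
const CPU_QUERY =
  'sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))';

// Convert a Prometheus instant-query response body into [{ name, cpu }],
// with cpu in millicores. Each sample arrives as [unixSeconds, "value"].
function toPodCpu(response) {
  if (response.status !== 'success' || response.data.resultType !== 'vector') {
    throw new Error('expected a successful instant-vector response');
  }
  return response.data.result.map((series) => ({
    name: `${series.metric.namespace}/${series.metric.pod}`,
    cpu: parseFloat(series.value[1]) * 1000, // cores -> millicores
  }));
}
```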

Example Output:

CPU USAGE ANALYSIS
==================
Total Pods Analyzed: 15
Mean:                8.60 millicores
Median:              2.11 millicores
Std Deviation:       12.21 millicores
95th Percentile:     46.74 millicores

PODS ABOVE 95TH PERCENTILE:
┌────────────────────────────────────────────┬──────────┬─────────┐
│ NAMESPACE/POD                              │ CPU (mc) │ Z-SCORE │
├────────────────────────────────────────────┼──────────┼─────────┤
│ kube-system/kube-apiserver-kind-control-pl │     46.7 │    3.12 │ ⚠️
└────────────────────────────────────────────┴──────────┴─────────┘

STATISTICAL OUTLIERS (|z-score| > 2):
└─ kube-system/kube-apiserver: 46.7 mc (z=3.12, HIGH)

Key Libraries Used:

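```javascript
const ss = require('simple-statistics');

// values: per-pod CPU usage in millicores; value: one pod's usage
const mean = ss.mean(values);
const median = ss.median(values);
const stdDev = ss.standardDeviation(values); // population standard deviation
const percentile95 = ss.quantile(values, 0.95);
const zScore = (value - mean) / stdDev; // equivalent to ss.zScore(value, mean, stdDev)
```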
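The sketch below spells out the workflow-1 arithmetic in plain JavaScript without dependencies. It assumes per-pod CPU values have already been fetched into a hypothetical `[{ name, cpu }]` array, and `analyzeCpu` is an illustrative name. The standard deviation here is the population version, dividing by n, which is what `ss.standardDeviation` computes. `ss.sampleStandardDeviation` divides by n - 1 instead.

```javascript
// Dependency-free sketch of the workflow-1 statistics.
// pods: hypothetical input, an array of { name, cpu } with cpu in millicores.
function analyzeCpu(pods, zThreshold = 2) {
  if (pods.length === 0) throw new Error('no pods to analyze');
  const values = pods.map((p) => p.cpu);
  const n = values.length;
  const sorted = [...values].sort((a, b) => a - b);

  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Median: middle value, or the average of the two middle values when n is even
  const mid = Math.floor(n / 2);
  const median = n % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  // Population standard deviation (divide by n), as ss.standardDeviation does
  const stdDev = Math.sqrt(values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n);
  // Nearest-rank 95th percentile; ss.quantile may interpolate differently in some cases
  const p95 = sorted[Math.ceil(0.95 * n) - 1];

  // z-score: how many standard deviations each pod sits from the mean
  const scored = pods.map((p) => ({
    ...p,
    zScore: stdDev === 0 ? 0 : (p.cpu - mean) / stdDev,
  }));
  return {
    mean,
    median,
    stdDev,
    p95,
    aboveP95: scored.filter((p) => p.cpu >= p95),
    outliers: scored.filter((p) => Math.abs(p.zScore) > zThreshold),
  };
}
```

With 15 pods, the nearest-rank 95th percentile falls on the largest value, because ceil(0.95 × 15) = 15. The busiest pod therefore always appears in the "above 95th percentile" list. This matches the example output, where kube-apiserver's 46.7 mc equals the reported percentile.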

2. Memory Leak Detection

Prompt:

Check for potential memory leaks in my cluster. Fetch memory usage over the last 2 hours and use linear regression to identify pods with steadily increasing memory. Predict what the memory will be in 1 hour.

What it does:

Fetches 2 hours of memory time-series data per pod
Fits linear regression to each pod's memory trend
Calculates growth rate (MB/hour)
Projects memory usage 1 hour into the future
Flags pods with concerning growth patterns
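The following is a hedged sketch of the data-preparation step. It assumes memory comes from the cAdvisor metric `container_memory_working_set_bytes`, fetched through Prometheus's `/api/v1/query_range` endpoint over a 2-hour window. Each pod's series is then normalized into a `times` array (minutes since the first sample) and a `memoryValues` array (MB, taken as 2^20 bytes), which is the form a memory-vs-time regression needs. `MEMORY_QUERY` and `toMemorySeries` are hypothetical names, not ProDisco APIs.

```javascript
// Hypothetical PromQL for per-pod working-set memory in bytes, to be sent as a
// range query (e.g. 2-hour window, fixed step) to /api/v1/query_range.
const MEMORY_QUERY =
  'sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})';

// Convert a Prometheus range-query ("matrix") response into one series per pod:
// times in minutes since that series' first sample, memory in MB (2^20 bytes).
function toMemorySeries(response) {
  if (response.status !== 'success' || response.data.resultType !== 'matrix') {
    throw new Error('expected a successful matrix response');
  }
  return response.data.result.map((series) => {
    const t0 = series.values[0][0]; // Unix seconds of the first sample
    return {
      name: `${series.metric.namespace}/${series.metric.pod}`,
      times: series.values.map(([ts]) => (ts - t0) / 60),
      memoryValues: series.values.map(([, v]) => parseFloat(v) / (1024 * 1024)),
    };
  });
}
```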

Example Output:

MEMORY LEAK DETECTION
=====================
Pod: prometheus-grafana
   Current Memory: 702.3 MB
   Trend: +0.84 MB/hour
   R² (fit quality): 0.89
   Predicted (1 hour): 703.1 MB
   ⚠️ Potential leak - consistent upward trend

Pod: alertmanager
   Current Memory: 48.2 MB
   Trend: -0.02 MB/hour
   ✅ Stable - no leak detected

Key Libraries Used:

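```javascript
const ss = require('simple-statistics');

// Assumed inputs: timestamps (Unix seconds, as Prometheus returns them) and
// memoryBytes (matching samples). Convert to minutes and MB so the slope is MB/minute.
const times = timestamps.map((t) => (t - timestamps[0]) / 60);
const memoryValues = memoryBytes.map((b) => b / (1024 * 1024));

// Fit linear regression: memory (MB) vs time (minutes)
const pairs = times.map((t, i) => [t, memoryValues[i]]);
const regression = ss.linearRegression(pairs); // { m: slope, b: intercept }
const regressionLine = ss.linearRegressionLine(regression);
const rSquared = ss.rSquared(pairs, regressionLine); // fit quality (R²)

// Predict 60 minutes past the latest sample
const currentTime = times[times.length - 1];
const predictedMemory = regressionLine(currentTime + 60); // 1 hour ahead
const growthRate = regression.m * 60; // MB per hour
```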
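The dependency-free sketch below shows what `ss.linearRegression` and `ss.rSquared` compute. It has three parts:

- an ordinary least-squares fit on (minutes, MB)
- R² as the share of variance the trend line explains
- a projection 60 minutes past the last sample

The flagging rule and its thresholds (`minR2`, `minGrowthMBPerHour`) are hypothetical. The page does not state how ProDisco decides that a trend is a "potential leak".

```javascript
// Dependency-free sketch of the leak check: ordinary least squares on
// (time in minutes, memory in MB). minR2 and minGrowthMBPerHour are
// hypothetical flagging thresholds, not documented ProDisco settings.
function leakCheck(times, memoryValues, { minR2 = 0.8, minGrowthMBPerHour = 0.5 } = {}) {
  const n = times.length;
  if (n < 2) throw new Error('need at least two samples');
  const meanT = times.reduce((s, t) => s + t, 0) / n;
  const meanM = memoryValues.reduce((s, m) => s + m, 0) / n;

  let sTT = 0;
  let sTM = 0;
  for (let i = 0; i < n; i++) {
    sTT += (times[i] - meanT) ** 2;
    sTM += (times[i] - meanT) * (memoryValues[i] - meanM);
  }
  const m = sTM / sTT; // slope: MB per minute
  const b = meanM - m * meanT; // intercept: MB at time 0
  const predict = (t) => m * t + b;

  // R²: 1 - (residual sum of squares / total sum of squares)
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < n; i++) {
    ssRes += (memoryValues[i] - predict(times[i])) ** 2;
    ssTot += (memoryValues[i] - meanM) ** 2;
  }
  const r2 = ssTot === 0 ? 1 : 1 - ssRes / ssTot; // flat series: line fits exactly

  const growthRate = m * 60; // MB per hour
  const predictedMemory = predict(times[n - 1] + 60); // 60 minutes after last sample
  const suspectedLeak = growthRate >= minGrowthMBPerHour && r2 >= minR2;
  return { growthRate, r2, predictedMemory, suspectedLeak };
}
```

The rule requires a good fit as well as a positive slope, so a noisy pod is not flagged on its slope alone. Under these example thresholds, the prometheus-grafana result above (+0.84 MB/hour, R² 0.89) would be flagged and alertmanager (-0.02 MB/hour) would not.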

3. Network Anomaly Detection

Prompt:

Analyze network traffic patterns in my cluster and detect anomalies. Use statistical methods to find any network receive/transmit rates that are more than 2 standard deviations from normal.

What it does:

Queries network receive/transmit bytes rate over time
Calculates mean and standard deviation per interface
Identifies data points with |z-score| > 2
Classifies anomalies as HIGH (spike) or LOW (drop)

Example Output:

NETWORK TRAFFIC ANOMALY DETECTION
=================================
Analysis Period: Last 1 hour (1-minute intervals)
Threshold: ±2 standard deviations from mean

RECEIVE TRAFFIC (eth0):
   Mean Rate: 0.5 KB/s
   Std Dev:   0.1 KB/s

   ⚠️ ANOMALIES DETECTED: 5
      └─ 2025-12-09T23:59:20Z: 0.8 KB/s (z-score: 3.15, HIGH)
      └─ 2025-12-10T00:00:20Z: 0.8 KB/s (z-score: 3.14, HIGH)
      └─ 2025-12-10T00:01:20Z: 0.8 KB/s (z-score: 3.13, HIGH)

INTERPRETATION:
The eth0 interface experienced a traffic spike around midnight,
suggesting a scheduled job or automated task.

Key Libraries Used:

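```javascript
const ss = require('simple-statistics');

// values: rate samples for one interface; timestamps: matching sample times
const mean = ss.mean(values);
const stdDev = ss.standardDeviation(values);
const anomalies = [];

// Skip the scan when stdDev is 0 (constant traffic) to avoid dividing by zero
if (stdDev > 0) {
  values.forEach((value, i) => {
    const zScore = (value - mean) / stdDev;
    if (Math.abs(zScore) > 2) {
      anomalies.push({
        time: timestamps[i],
        value,
        zScore,
        direction: zScore > 0 ? 'HIGH' : 'LOW'
      });
    }
  });
}
```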
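The workflow computes the mean and standard deviation per interface, and the z-score scan above handles one interface at a time. The dependency-free sketch below repeats that scan for each interface separately. It assumes the rate samples have already been split into a hypothetical `{ interfaceName: [{ time, value }] }` object. One possible source is a range query on a per-interface rate such as `rate(container_network_receive_bytes_total[1m])`. `anomaliesByInterface` is an illustrative name.

```javascript
// Sketch: run the z-score scan separately for each network interface.
// seriesByInterface is a hypothetical input shape, e.g.
// { eth0: [{ time, value }, ...] } with value in bytes per second.
function anomaliesByInterface(seriesByInterface, threshold = 2) {
  const report = {};
  for (const [iface, samples] of Object.entries(seriesByInterface)) {
    const values = samples.map((s) => s.value);
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    // Population standard deviation, as ss.standardDeviation computes it
    const stdDev = Math.sqrt(
      values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length
    );
    const anomalies = [];
    if (stdDev > 0) { // constant traffic has no deviations to flag
      for (const { time, value } of samples) {
        const zScore = (value - mean) / stdDev;
        if (Math.abs(zScore) > threshold) {
          anomalies.push({ time, value, zScore, direction: zScore > 0 ? 'HIGH' : 'LOW' });
        }
      }
    }
    report[iface] = { mean, stdDev, anomalies };
  }
  return report;
}
```

A spike also raises the mean and standard deviation it is compared against. As a result, a window containing many spikes can push some of them below the threshold.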

4. Performance Correlation Analysis

Prompt:

Find correlations between CPU usage and memory usage for the prometheus pods. Tell me if high CPU correlates with high memory usage.

What it does:

Fetches time-series data for both CPU and memory
Calculates Pearson correlation coefficient (r)
Computes R² (coefficient of determination)
Fits linear regression to quantify relationship
Interprets correlation strength
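The page does not say which cutoffs ProDisco uses to label correlation strength. The mapping below is hypothetical. It follows one common rule of thumb: |r| below 0.2 is negligible, and the labels then step through weak, moderate, strong, and very strong in increments of 0.2. With the example's r = -0.1635, it yields the NEGLIGIBLE NEGATIVE label shown in the output below. r² (0.0267 in the example) is the share of memory variance that a linear fit on CPU explains, here under 3%.

```javascript
// Hypothetical labelling of Pearson's r. The cutoffs are one common rule of
// thumb, not a fixed standard and not necessarily what ProDisco uses.
function describeCorrelation(r) {
  const a = Math.abs(r);
  const strength =
    a < 0.2 ? 'NEGLIGIBLE'
    : a < 0.4 ? 'WEAK'
    : a < 0.6 ? 'MODERATE'
    : a < 0.8 ? 'STRONG'
    : 'VERY STRONG';
  const sign = r > 0 ? 'POSITIVE' : r < 0 ? 'NEGATIVE' : 'NONE';
  return sign === 'NONE' ? strength : `${strength} ${sign}`;
}
```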

Example Output:

CPU vs MEMORY CORRELATION ANALYSIS - PROMETHEUS PODS
====================================================

PER-POD ANALYSIS:
┌─────────────────────────────────────────────────────────────┐
│ Pod: prometheus-grafana                                      │
│ Pearson Correlation (r):  -0.1635                           │
│ R-squared (r²):            0.0267                           │
│ Correlation Strength:     ⚪ NEGLIGIBLE NEGATIVE             │
│ Data Points:              61                                 │
├─────────────────────────────────────────────────────────────┤
│ Linear Regression: Memory = -0.036 × CPU + 702.69           │
│ For every 1mc CPU increase, memory decreases by 0.036 MB    │
└─────────────────────────────────────────────────────────────┘

CONCLUSION:
There is NO meaningful linear correlation between CPU and memory usage.
Average correlation across pods: -0.033
Changes in CPU usage do not linearly predict memory usage for these pods.

Key Libraries Used:

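```javascript
const ss = require('simple-statistics');

// cpuValues[i] and memValues[i] must be samples from the same timestamp
// Pearson correlation coefficient
const correlation = ss.sampleCorrelation(cpuValues, memValues);
const rSquared = correlation * correlation; // equals the regression R² for one predictor

// Linear regression: memory = m × cpu + b
const pairs = cpuValues.map((cpu, i) => [cpu, memValues[i]]);
const regression = ss.linearRegression(pairs);
```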
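The snippet above leaves one step implicit: pairing the two series. `sampleCorrelation` compares `cpuValues[i]` with `memValues[i]`, so both arrays must refer to the same sample times. The dependency-free sketch below joins the two series on timestamp, then computes Pearson's r, r², and the regression slope. It assumes both series come from range queries and are already parsed into `[unixSeconds, number]` pairs. The `alignedCorrelation` name and input shape are hypothetical.

```javascript
// Sketch: pair CPU and memory samples by timestamp before correlating.
// cpu and mem are hypothetical inputs: arrays of [unixSeconds, value] pairs.
function alignedCorrelation(cpu, mem) {
  const memByTime = new Map(mem.map(([t, v]) => [t, v]));
  const pairs = cpu
    .filter(([t]) => memByTime.has(t))
    .map(([t, v]) => [v, memByTime.get(t)]);
  const n = pairs.length;
  if (n < 2) throw new Error('need at least two aligned samples');

  const meanX = pairs.reduce((s, [x]) => s + x, 0) / n;
  const meanY = pairs.reduce((s, [, y]) => s + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (const [x, y] of pairs) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
    syy += (y - meanY) ** 2;
  }
  // Pearson r: the n - 1 factors of the sample covariance and sample standard
  // deviations cancel, so this should match ss.sampleCorrelation on the same pairs.
  // r is NaN when either series is constant.
  const r = sxy / Math.sqrt(sxx * syy);
  return { n, r, rSquared: r * r, slope: sxy / sxx }; // slope: memory per unit CPU
}
```

Range queries evaluate at start + k × step. Running both queries with the same start, end, and step therefore keeps their timestamps identical, so the join only drops samples missing from one of the series.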

Quick Reference: Prompt Examples

Copy these prompts to get started with analytics:

| Use Case | Prompt |
|---|---|
| Cluster Health | "Analyze CPU and memory usage across all pods. Calculate mean, median, standard deviation, and identify outliers using z-scores. Show pods above the 95th percentile." |
| Memory Leaks | "Check for memory leaks. Fetch memory usage over 2 hours and use linear regression to identify pods with increasing memory. Predict memory in 1 hour." |
| Network Anomalies | "Analyze network traffic and detect anomalies. Find receive/transmit rates more than 2 standard deviations from normal." |
| Correlation | "Find correlations between CPU and memory usage for prometheus pods. Tell me if high CPU correlates with high memory." |
