# StornX: Intelligent Kubernetes Traffic Optimization & Pod Rescheduling
StornX monitors your Kubernetes workloads and automatically reschedules pods to reduce inter-service latency, balance traffic across nodes, and maintain fault tolerance — all without downtime.
- [Why StornX?](#why-stornx)
- [Key Features](#key-features)
- [Architecture](#architecture)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
- [Documentation](#documentation)
- [Development](#development)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)
## Why StornX?

In distributed microservice architectures, Kubernetes schedules pods once — at creation time. As traffic patterns change, pods that communicate frequently may end up on distant nodes, increasing latency and wasting bandwidth.
StornX fixes this by continuously optimizing pod placement and traffic routing based on real-time metrics.
| Without StornX | With StornX |
|---|---|
| Pods placed randomly across zones | Communicating pods co-located for lower latency |
| Static Istio routing weights | Adaptive traffic balancing based on load & latency |
| Manual scaling decisions | Zone-aware autoscaling that preserves fault tolerance |
| No visibility into cross-service latency | Prometheus-driven, data-informed decisions every cycle |
## Key Features

- **OptiBalancer** — Gradually rebalances Istio DestinationRule traffic weights based on latency, load, and replica count. Uses adaptive step sizing with configurable urgency scaling to avoid oscillation.
- **OptiScaler** — Intelligent pod autoscaling that selects the optimal node for new replicas using service-graph analysis (upstream/downstream relationships) and falls back to resource-based (LFU) selection when no graph data is available (see the sketch after this list).
- **Fault Tolerance** — Ensures replicas are distributed across availability zones. Respects PodDisruptionBudgets and coordinates with existing HPAs.
- **Zero-Downtime Rescheduling** — New pods reach the `Running` state before old pods are removed, guaranteeing uninterrupted service.
- **Single-Instance Design** — Runs as exactly one replica to prevent duplicate scheduling decisions.
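To make the OptiScaler decision concrete, here is a minimal TypeScript sketch of the two-stage node choice: graph-aware selection first, least-utilized fallback second. The input shape (`NodeStats`) and the scoring are illustrative assumptions, not StornX's actual implementation.

```typescript
// Illustrative sketch only — types and scoring are assumptions,
// not StornX's actual node-selection code.
interface NodeStats {
  name: string;
  cpuPercent: number; // current CPU utilization (0–100)
  memPercent: number; // current memory utilization (0–100)
  /** Requests/sec exchanged with the target service's upstream/downstream
   *  peers already running on this node (from the service graph). */
  trafficToPeers: number;
}

/** Pick the node for a new replica: prefer the node with the most
 *  service-graph traffic to peers; fall back to the least-utilized
 *  node when no graph data is available. */
function selectNode(nodes: NodeStats[]): NodeStats {
  if (nodes.length === 0) throw new Error("no candidate nodes");

  const withGraphData = nodes.filter((n) => n.trafficToPeers > 0);
  if (withGraphData.length > 0) {
    // Graph-aware: co-locate with the heaviest communication partners.
    return withGraphData.reduce((best, n) =>
      n.trafficToPeers > best.trafficToPeers ? n : best
    );
  }
  // Fallback: least-loaded node by combined CPU + memory utilization.
  return nodes.reduce((best, n) =>
    n.cpuPercent + n.memPercent < best.cpuPercent + best.memPercent ? n : best
  );
}
```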
## Architecture

```
┌───────────────────────────────────────────────────────────────────────┐
│ StornX │
│ │
│ ┌──────────────┐ ┌───────────────┐ ┌───────────────────────┐ │
│ │ Cron Engine │───▶│ OptiBalancer │───▶│ Istio DestinationRule│ │
│ │ (node-cron) │ │ Traffic │ │ Updates │ │
│ │ │ │ Optimization │ └───────────────────────┘ │
│ │ │ └───────────────┘ │
│ │ │ ┌───────────────┐ ┌───────────────────────┐ │
│ │ │───▶│ OptiScaler │───▶│ Pod Create / Delete │ │
│ │ │ │ Autoscaling │ │ (kubectl) │ │
│ └──────────────┘ └───────────────┘ └───────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────────────────────────┐ │
│ │ Prometheus Adapter │ │
│ │ • P95 response times (Istio) │ │
│ │ • CPU / Memory utilization │ │
│ │ • Request rates & service graph │ │
│ └──────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
```
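As an example of the kind of data the Prometheus Adapter pulls each cycle, the sketch below reads a per-workload P95 latency from the Prometheus HTTP API using Istio's standard request-duration histogram. The PromQL shape and this thin `fetch`-based client are illustrative assumptions, not StornX's actual adapter.

```typescript
// Illustrative only — the PromQL query and this thin client are assumptions.
const PROMETHEUS_URL =
  process.env.PROMETHEUS_URL ??
  "http://prometheus.prometheus.svc.cluster.local:9090";

// P95 request duration (ms) per destination workload, from Istio metrics.
const P95_QUERY = `histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket[5m]))
  by (le, destination_workload))`;

async function fetchP95(): Promise<Map<string, number>> {
  const url = `${PROMETHEUS_URL}/api/v1/query?query=${encodeURIComponent(P95_QUERY)}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Prometheus query failed: ${res.status}`);
  const body = (await res.json()) as {
    data: {
      result: {
        metric: { destination_workload: string };
        value: [number, string];
      }[];
    };
  };
  // Map workload name -> P95 latency in milliseconds.
  return new Map(
    body.data.result.map((r) => [r.metric.destination_workload, Number(r.value[1])])
  );
}
```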
## Quick Start

### Prerequisites

| Component | Version | Required |
|---|---|---|
| Kubernetes | ≥ 1.19 | Yes |
| Helm | ≥ 3.2 | Yes |
| Prometheus | any | Yes |
| Istio | any | Recommended (for traffic balancing) |
### Install

```bash
# Add the namespace
kubectl create namespace stornx

# Install with default values
helm install stornx ./.kubernetes/helm -n stornx

# Or with production values
helm install stornx ./.kubernetes/helm -n stornx \
  -f ./.kubernetes/helm/values-production.yaml

# Customize inline
helm install stornx ./.kubernetes/helm -n stornx \
  --set config.namespaces="my-app,my-api" \
  --set config.prometheusUrl="http://prometheus.monitoring.svc:9090"
```

### Verify

```bash
kubectl get pods -n stornx
kubectl logs -n stornx -l app.kubernetes.io/name=stornx -f
```

### Uninstall

```bash
helm uninstall stornx -n stornx
kubectl delete namespace stornx
```

## Configuration

StornX is configured entirely via environment variables, all exposed through the Helm chart values.
### Core Settings

| Variable | Default | Description |
|---|---|---|
| `ENV` | `production` | Run mode (`production` / `development`) |
| `APP_PORT` | `3000` | HTTP server port |
| `NAMESPACES` | `default` | Comma-separated namespaces to monitor |
| `PROMETHEUS_URL` | `http://prometheus.prometheus.svc.cluster.local:9090` | Prometheus endpoint |
| `CRONJOB_EXPRESSION` | `* * * * *` | Cron schedule for the optimization loop (default: every minute) |
| `LOCALITY_LABELS_CRON` | `* * * * *` | Cron schedule for zone-label discovery |
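A minimal sketch of how a node-cron loop can consume these settings follows. The cycle functions `runOptiBalancer` and `runOptiScaler` are hypothetical placeholders, not StornX's actual exports.

```typescript
// Sketch of the cron-driven optimization loop.
// runOptiBalancer / runOptiScaler are hypothetical placeholders.
import cron from "node-cron";

const CRONJOB_EXPRESSION = process.env.CRONJOB_EXPRESSION ?? "* * * * *";
const NAMESPACES = (process.env.NAMESPACES ?? "default").split(",");

async function runOptiBalancer(namespace: string): Promise<void> {
  /* adjust Istio DestinationRule weights for this namespace */
}
async function runOptiScaler(namespace: string): Promise<void> {
  /* create or delete pods on the optimal nodes for this namespace */
}

// One optimization cycle per cron tick, over every monitored namespace.
cron.schedule(CRONJOB_EXPRESSION, async () => {
  for (const ns of NAMESPACES) {
    await runOptiBalancer(ns);
    await runOptiScaler(ns);
  }
});
```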
### Metrics & Thresholds

| Variable | Default | Description |
|---|---|---|
| `METRICS_TYPE` | `memory` | Primary metric: `cpu` or `memory` |
| `METRICS_UPPER_THRESHOLD` | `70` | Upper utilization % that triggers scale-up / rescheduling |
| `METRICS_LOWER_THRESHOLD` | `20` | Lower utilization % that triggers scale-down |
| `RESPONSE_TIME_THRESHOLD` | `100` | Target P95 response time (ms) |
| `CPU_WEIGHT` | `50` | Weight (0–100) for CPU in the combined score |
| `MEMORY_WEIGHT` | `50` | Weight (0–100) for memory in the combined score |
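As a worked example of how the weights might combine with the thresholds, here is a sketch; the exact formula is an assumption, not taken from StornX's source.

```typescript
// Sketch — the combination formula is an assumption, not StornX's code.
const CPU_WEIGHT = Number(process.env.CPU_WEIGHT ?? 50);
const MEMORY_WEIGHT = Number(process.env.MEMORY_WEIGHT ?? 50);
const UPPER = Number(process.env.METRICS_UPPER_THRESHOLD ?? 70);
const LOWER = Number(process.env.METRICS_LOWER_THRESHOLD ?? 20);

/** Weighted CPU/memory score on a 0–100 scale. */
function combinedScore(cpuPercent: number, memPercent: number): number {
  return (
    (cpuPercent * CPU_WEIGHT + memPercent * MEMORY_WEIGHT) /
    (CPU_WEIGHT + MEMORY_WEIGHT)
  );
}

// Example: cpu=90, mem=40 with equal weights -> score 65 (no action);
// with CPU_WEIGHT=80 / MEMORY_WEIGHT=20 -> score 80, which exceeds
// METRICS_UPPER_THRESHOLD=70 and triggers a scale-up.
function decide(cpu: number, mem: number): "scale-up" | "scale-down" | "hold" {
  const score = combinedScore(cpu, mem);
  if (score > UPPER) return "scale-up";
  if (score < LOWER) return "scale-down";
  return "hold";
}
```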
### OptiBalancer Tuning

These variables control the adaptive traffic-shifting algorithm. Leave the defaults if you are unsure — see docs/README.md for a deep dive. A sketch of the step-sizing logic follows the table below.
| Variable | Default | Description |
|---|---|---|
| `BALANCER_MIN_DELTA` | `5` | Minimum L1 delta required to apply a DestinationRule update |
| `BALANCER_MIN_STEP_SIZE` | `5` | Floor of the adaptive step (percentage points per cycle) |
| `BALANCER_MAX_STEP_SIZE` | `20` | Ceiling of the adaptive step (percentage points per cycle) |
| `BALANCER_URGENCY_THRESHOLD` | `50` | L1 delta at which the maximum step is used |
| `BALANCER_EPSILON` | `1` | Per-route convergence tolerance |
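A minimal sketch of how these parameters can interact: the linear interpolation below is one plausible reading of "adaptive step sizing with urgency scaling," not StornX's exact algorithm.

```typescript
// Sketch — an assumed reading of adaptive step sizing, not StornX's code.
const MIN_DELTA = Number(process.env.BALANCER_MIN_DELTA ?? 5);
const MIN_STEP = Number(process.env.BALANCER_MIN_STEP_SIZE ?? 5);
const MAX_STEP = Number(process.env.BALANCER_MAX_STEP_SIZE ?? 20);
const URGENCY = Number(process.env.BALANCER_URGENCY_THRESHOLD ?? 50);
const EPSILON = Number(process.env.BALANCER_EPSILON ?? 1);

/** Move current route weights one bounded step toward the desired weights.
 *  Returns null when the change is too small to be worth applying. */
function stepWeights(
  current: Record<string, number>,
  desired: Record<string, number>
): Record<string, number> | null {
  // L1 delta: total absolute difference across all routes.
  const routes = Object.keys(desired);
  const l1 = routes.reduce(
    (sum, r) => sum + Math.abs(desired[r] - (current[r] ?? 0)),
    0
  );
  if (l1 < MIN_DELTA) return null; // below BALANCER_MIN_DELTA: skip update

  // Urgency scaling: the step grows linearly from MIN_STEP to MAX_STEP
  // as the L1 delta approaches BALANCER_URGENCY_THRESHOLD.
  const step = Math.min(
    MAX_STEP,
    MIN_STEP + (MAX_STEP - MIN_STEP) * Math.min(1, l1 / URGENCY)
  );

  const next: Record<string, number> = {};
  for (const r of routes) {
    const diff = desired[r] - (current[r] ?? 0);
    // Within EPSILON: treat the route as converged; otherwise move at
    // most `step` percentage points toward the desired weight.
    next[r] =
      Math.abs(diff) <= EPSILON
        ? desired[r]
        : (current[r] ?? 0) + Math.sign(diff) * Math.min(step, Math.abs(diff));
  }
  // A real implementation would renormalize so the weights sum to 100.
  return next;
}
```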
### Fault Tolerance

| Variable | Default | Description |
|---|---|---|
| `FT_MAX_ZONES` | `3` | Maximum number of zones across which replicas are spread |
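For illustration, one way to honor `FT_MAX_ZONES` when placing replicas is a simple round-robin over the first N zones; this is a sketch of assumed logic, not StornX's placement code.

```typescript
// Sketch — round-robin zone assignment capped at FT_MAX_ZONES (assumed logic).
const FT_MAX_ZONES = Number(process.env.FT_MAX_ZONES ?? 3);

/** Assign each replica index a zone, spreading across at most FT_MAX_ZONES. */
function assignZones(replicas: number, zones: string[]): string[] {
  const usable = zones.slice(0, FT_MAX_ZONES);
  return Array.from({ length: replicas }, (_, i) => usable[i % usable.length]);
}

// assignZones(5, ["eu-a", "eu-b", "eu-c", "eu-d"])
// -> ["eu-a", "eu-b", "eu-c", "eu-a", "eu-b"]  (the 4th zone stays unused)
```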
For full Helm chart parameters (RBAC, probes, resources, persistence, etc.) see the Helm Chart README.
## Documentation

| Document | Description |
|---|---|
| Main README | Environment variables, balancer deep-dive |
| OptiScaler Docs | Autoscaling algorithm, decision trees, fault tolerance |
| Helm Chart README | Full chart parameters, installation variants |
| Istio Setup | Istio Helm installation notes |
| Addons | Prometheus, Grafana, Jaeger, Kiali manifests |
## Development

Prerequisites:

- Node.js ≥ 22
- Yarn

```bash
cd scheduler
yarn install

# Development mode (uses local kubeconfig)
ENV=development yarn start

yarn lint   # TypeScript check + ESLint + Prettier
yarn test   # Jest with coverage

# Build and push the Docker image
docker build -f .docker/Dockerfile -t alazidis/stornx:latest scheduler
docker push alazidis/stornx:latest
```

### Project Structure

```
StornX/
├── scheduler/ # Core application (TypeScript / Node.js)
│ ├── src/
│ │ ├── config/ # Environment config, logger, K8s client
│ │ ├── core/
│ │ │ ├── optiBalancer/ # Traffic weight optimization engine
│ │ │ └── optiScaler/ # Pod autoscaling logic
│ │ ├── adapters/
│ │ │ ├── k8s/ # Kubernetes API services
│ │ │ └── prometheus/ # Prometheus query layer
│ │ └── cronjobs/ # Cron-based scheduling orchestrator
│ └── tests/ # Jest unit + scenario tests
├── .kubernetes/helm/ # Helm chart
├── .docker/ # Dockerfile (multi-stage)
├── .github/workflows/ # CI pipeline (GitHub Actions)
├── addons/ # Istio addons, Kubecost, sample apps
├── docs/ # Extended documentation
└── istio-helm/           # Istio Helm setup
```
## Roadmap

- Reduce inter-service response time — Continuously optimize pod placement so that frequently communicating services are co-located, minimizing network hops and P95 latency
- Per-deployment autoscaling — Scale each Deployment independently based on its own metrics, thresholds, and traffic patterns instead of a single global policy
- Per-deployment traffic routing — Apply fine-grained Istio DestinationRule weight adjustments per Deployment, allowing each service to have its own balancing strategy
- StatefulSet support — Extend rescheduling and autoscaling to StatefulSets (currently only Deployments are supported)
- Predictive decisions from historical traffic — Use historical metrics to forecast traffic patterns and proactively scale / rebalance before demand spikes occur
- Dashboard — Built-in web UI for visualizing decisions, traffic distributions, and historical trends
## Contributing

Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feature/my-feature`)
- Commit your changes (`git commit -m 'feat: add my feature'`)
- Push to the branch (`git push origin feature/my-feature`)
- Open a Pull Request

Please ensure all tests pass (`yarn test`) and linting is clean (`yarn lint`) before submitting.
## License

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.
Built by Apostolos Lazidis