feat(metrics): TECH-6381 add org_slug label to keeperhub_workflow_executions_total by chong-techops · Pull Request #1146 · KeeperHub/keeperhub

chong-techops · 2026-05-06T06:56:01Z

Summary

Adds org_slug label to keeperhub_workflow_executions_total so dashboards/alerts can scope success rate and traffic panels to managed clients (techops-services, ajna) the same way the errors gauge already supports.
Replaces the two separate DB queries (status totals + per-org error breakdown) with a single combined query that groups by status AND organization.slug. Per-status totals and errorByOrgSlug are derived from the combined result, keeping the existing errors gauge unchanged.
Anonymous workflows continue to bucket under org_slug="_anonymous" so per-org sums match the unfiltered totals.
Updates METRICS_REFERENCE.md with the corrected aggregation pattern (sum(max by (..., org_slug) (...))) — the previously documented max by (status) would silently underreport totals once the partition label exists.

Why

TechOps wants the workflow Success Rate panel split into User vs System (managed orgs only), so user-workflow failures don't drag the system rate down. The errors gauge already had org_slug; the executions gauge didn't, which made a system-only success rate impossible to compute.

Deploy ordering

This PR should ship before the corresponding techops_infrastructure PR so the dashboard panels render against the new label as soon as they reach prod.

Replace the two queries (status counts + per-org error breakdown) with a single combined query that groups by status AND organization.slug. Adds executionsByStatusAndOrgSlug to WorkflowStats so the prometheus collector can label keeperhub_workflow_executions_total by org_slug, following the same convention as the errors gauge (anonymous workflows bucket under '_anonymous' so per-org sums match the global totals). totalSuccess/totalError/etc and errorByOrgSlug are now derived from the combined query, keeping the existing errors gauge wiring unchanged.
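As a rough illustration of the derivation described in this commit, the sketch below folds rows grouped by status and org slug into per-status totals and a per-org error breakdown. The row shape, the `deriveStats` helper, the `"success"`/`"error"` status strings, and the sample counts are all assumptions made for the example, not the PR's actual code.

```typescript
// Hypothetical row shape for a query grouped by status and organization slug.
type StatusOrgRow = { status: string; orgSlug: string | null; count: number };

type DerivedStats = {
  executionsByStatusAndOrgSlug: { status: string; orgSlug: string; count: number }[];
  totalsByStatus: Record<string, number>;
  errorByOrgSlug: Record<string, number>;
};

// Anonymous workflows have no organization; bucket them under a reserved slug.
const ANONYMOUS_ORG_SLUG = "_anonymous";

function deriveStats(rows: StatusOrgRow[]): DerivedStats {
  const stats: DerivedStats = {
    executionsByStatusAndOrgSlug: [],
    totalsByStatus: {},
    errorByOrgSlug: {},
  };
  for (const row of rows) {
    const orgSlug = row.orgSlug ?? ANONYMOUS_ORG_SLUG;
    // Keep the full (status, org) breakdown for the labeled executions gauge.
    stats.executionsByStatusAndOrgSlug.push({ status: row.status, orgSlug, count: row.count });
    // Per-status totals are the sum over all orgs, including "_anonymous".
    stats.totalsByStatus[row.status] = (stats.totalsByStatus[row.status] ?? 0) + row.count;
    // The per-org error breakdown is the "error" slice of the same rows.
    if (row.status === "error") {
      stats.errorByOrgSlug[orgSlug] = (stats.errorByOrgSlug[orgSlug] ?? 0) + row.count;
    }
  }
  return stats;
}

// Made-up sample data, for illustration only.
const sampleRows: StatusOrgRow[] = [
  { status: "success", orgSlug: "techops-services", count: 120 },
  { status: "error", orgSlug: "techops-services", count: 3 },
  { status: "success", orgSlug: "ajna", count: 40 },
  { status: "success", orgSlug: null, count: 10 },
  { status: "error", orgSlug: null, count: 5 },
];
const derived = deriveStats(sampleRows);
```

Because every row lands in exactly one org bucket, the per-org counts for a status add up to that status's total, which is the property the `_anonymous` convention is meant to preserve.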

Add org_slug to keeperhub_workflow_executions_total so dashboards and alerts can scope the success rate to managed clients (the errors gauge already had this label; the executions gauge didn't, which made it impossible to compute a system-only success rate). Reset before populating so series for orgs that drop to zero in a given status clear out instead of going stale -- same pattern as the errors gauge.
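To show why resetting before populating matters, here is a minimal sketch using an in-memory stand-in for a labeled gauge. `InMemoryGauge`, `populate`, and the sample counts are hypothetical; the real collector presumably uses a Prometheus client gauge, whose behavior this only approximates.

```typescript
type GaugeLabels = Record<string, string>;

// Minimal in-memory stand-in for a labeled gauge (not the real collector).
class InMemoryGauge {
  private series: Record<string, number> = {};

  // Build a stable key from the label set so label order does not matter.
  private static keyOf(labels: GaugeLabels): string {
    return Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`).join(",");
  }

  set(labels: GaugeLabels, value: number): void {
    this.series[InMemoryGauge.keyOf(labels)] = value;
  }

  // Drop every series so only freshly set label combinations remain.
  reset(): void {
    this.series = {};
  }

  seriesKeys(): string[] {
    return Object.keys(this.series).sort();
  }
}

type ExecutionCount = { status: string; orgSlug: string; count: number };

function populate(gauge: InMemoryGauge, rows: ExecutionCount[], resetFirst: boolean): void {
  if (resetFirst) gauge.reset();
  for (const row of rows) {
    gauge.set({ status: row.status, org_slug: row.orgSlug }, row.count);
  }
}

// Made-up data: the ajna/error row is present in the first collection only.
const firstCollect: ExecutionCount[] = [
  { status: "success", orgSlug: "ajna", count: 40 },
  { status: "error", orgSlug: "ajna", count: 2 },
];
const secondCollect: ExecutionCount[] = [{ status: "success", orgSlug: "ajna", count: 45 }];

const withReset = new InMemoryGauge();
populate(withReset, firstCollect, true);
populate(withReset, secondCollect, true);

const withoutReset = new InMemoryGauge();
populate(withoutReset, firstCollect, false);
populate(withoutReset, secondCollect, false);
```

In this sketch `withReset` ends up with only the success series, while `withoutReset` still exposes the stale error series from the first collection.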

…rrors Document the convention so dashboard authors know they can scope these gauges by managed-client org_slug, and that '_anonymous' is reserved for personal workflows.

…eled gauges The existing 'use max by (label)' guidance was correct when status was the only label and pods were the only repetition source. With org_slug now a real partition dimension on workflow_executions_total and workflow_execution_errors_total, max by (status) returns the busiest single org instead of the total -- a silent regression for any panel using that pattern. Document the corrected pattern (sum-of-max) so dashboard authors get the total across orgs while still deduping pods, and update the delta() examples to match.
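The difference between the two aggregation patterns can be worked through on a small example. The sketch below emulates, in plain TypeScript, what `max by (status)` and a sum over `max by (status, org_slug)` would return when two pods each report the same per-org values. The pod names, sample values, and helper functions are assumptions for illustration; this is not PromQL and not the wording used in METRICS_REFERENCE.md.

```typescript
type PodSample = { pod: string; status: string; orgSlug: string; value: number };

// Emulates max by (status): one value per status, the largest single series.
function maxByStatus(samples: PodSample[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const s of samples) {
    out[s.status] = Math.max(out[s.status] ?? -Infinity, s.value);
  }
  return out;
}

// Emulates sum by status over max by (status, org_slug):
// dedupe pods per org with max, then add the orgs together.
function sumOfMaxByStatus(samples: PodSample[]): Record<string, number> {
  const perStatusOrg: Record<string, Record<string, number>> = {};
  for (const s of samples) {
    const perOrg = perStatusOrg[s.status] ?? {};
    perOrg[s.orgSlug] = Math.max(perOrg[s.orgSlug] ?? -Infinity, s.value);
    perStatusOrg[s.status] = perOrg;
  }
  const totals: Record<string, number> = {};
  for (const status of Object.keys(perStatusOrg)) {
    let sum = 0;
    for (const org of Object.keys(perStatusOrg[status])) {
      sum += perStatusOrg[status][org];
    }
    totals[status] = sum;
  }
  return totals;
}

// Made-up data: two pods report identical values for each org.
const podSamples: PodSample[] = [
  { pod: "pod-a", status: "success", orgSlug: "techops-services", value: 100 },
  { pod: "pod-b", status: "success", orgSlug: "techops-services", value: 100 },
  { pod: "pod-a", status: "success", orgSlug: "ajna", value: 40 },
  { pod: "pod-b", status: "success", orgSlug: "ajna", value: 40 },
  { pod: "pod-a", status: "success", orgSlug: "_anonymous", value: 10 },
  { pod: "pod-b", status: "success", orgSlug: "_anonymous", value: 10 },
];

const busiestOrgOnly = maxByStatus(podSamples);
const totalAcrossOrgs = sumOfMaxByStatus(podSamples);
```

With these numbers the max-only pattern yields 100 for `success` (the busiest org alone), while sum-of-max yields 150 (100 + 40 + 10). A plain sum over all series would give 300, counting each pod's copy twice.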

chong-techops temporarily deployed to staging May 6, 2026 06:56 — with GitHub Actions Inactive

chong-techops added 4 commits May 6, 2026 16:27

docs(metrics): TECH-6381 note org_slug label on workflow executions/e…

70fbdb2


chong-techops force-pushed the TECH-6381-executions-total-org-slug-label branch from 3acb76a to fd045a7 May 6, 2026 06:58

chong-techops temporarily deployed to staging May 6, 2026 06:58 — with GitHub Actions Inactive

chong-techops requested a review from a team May 6, 2026 07:00

chong-techops wants to merge 4 commits into staging from TECH-6381-executions-total-org-slug-label
