
Production Map: Auto-build service dependency graph from traces #8

@nomadicmehul

Summary

Build a dynamic "production map": a service dependency graph constructed from trace data and K8s service discovery. The agent uses this map to understand the blast radius of a failure and to trace cascading failures back to their source.
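
A minimal sketch of how dependency edges could be derived from trace data. It assumes each span exposes a trace ID, span ID, parent span ID and service name (roughly the fields OpenTelemetry spans carry). `Span` and `edges_from_spans` are hypothetical names, not an existing nightops API. A parent/child span pair that crosses a service boundary is read as "the parent's service calls the child's service".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical minimal span record; real trace exports carry many more fields.
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    service: str


def edges_from_spans(spans: Iterable[Span]) -> set[tuple[str, str]]:
    """Return (caller, callee) service pairs inferred from parent/child spans."""
    spans = list(spans)
    # Index spans by (trace_id, span_id) so each child can find its parent.
    by_id = {(s.trace_id, s.span_id): s for s in spans}
    edges: set[tuple[str, str]] = set()
    for child in spans:
        if child.parent_span_id is None:
            continue  # root span: nothing called it within this trace
        parent = by_id.get((child.trace_id, child.parent_span_id))
        # Skip parents that were not collected and in-process (same-service) spans.
        if parent is not None and parent.service != child.service:
            edges.add((parent.service, child.service))
    return edges
```

For a trace frontend → checkout → checkout (internal span) → payments, this yields the two edges `("frontend", "checkout")` and `("checkout", "payments")`; the internal checkout span adds nothing.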

Why This Matters

When pod-A fails, the agent needs to know that service-B and service-C depend on it in order to assess the impact. Today the agent has no such context. Sonarly calls this a "living map"; we should build our own, infrastructure-native version.
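
Given (caller, callee) edges, the blast radius can be computed by walking the graph backwards from the failing service: anything that calls it, directly or transitively, is potentially affected. A sketch with a hypothetical `blast_radius` function; mapping a failing pod to its Service (for example via Endpoints) is assumed to happen beforehand.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable


def blast_radius(edges: Iterable[tuple[str, str]], failed: str) -> set[str]:
    """Return every service that directly or transitively calls `failed`."""
    # Invert the call edges: callee -> set of its callers.
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callee in edges:
        callers[callee].add(caller)
    impacted: set[str] = set()
    queue = deque([failed])
    # Breadth-first search over callers; the `impacted` check handles cycles.
    while queue:
        svc = queue.popleft()
        for caller in callers[svc]:
            if caller != failed and caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted
```

Downstream dependencies of the failed service (e.g. a database it calls) are deliberately not included; they are candidates for root cause rather than impact.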

Acceptance Criteria

  • Discover services from K8s (services, endpoints, ingress)
  • Build dependency edges from trace data (service A calls service B)
  • Persist map in data/production_map.json (updated periodically)
  • Agent queries the map during investigation to assess blast radius
  • CLI command: nightops map show to display the dependency graph
  • Map updates automatically as new traces are observed
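
For persistence, one possible approach is to merge newly observed edges into data/production_map.json on every refresh and write the file atomically (temp file plus `os.replace`) so the agent never reads a half-written map. The JSON layout (`services`, and `edges` with `source`/`target`) and the function names are assumptions; the issue does not specify a schema.

```python
import json
import os
import tempfile
from pathlib import Path


def load_map(path: Path) -> dict:
    """Load the map, or return an empty one if the file does not exist yet."""
    if not path.exists():
        return {"services": [], "edges": []}
    return json.loads(path.read_text())


def merge_edges(prod_map: dict, new_edges) -> dict:
    """Add newly observed (caller, callee) edges without dropping known ones."""
    edges = {(e["source"], e["target"]) for e in prod_map["edges"]}
    edges |= set(new_edges)
    services = set(prod_map["services"]) | {s for edge in edges for s in edge}
    return {
        "services": sorted(services),
        "edges": [{"source": a, "target": b} for a, b in sorted(edges)],
    }


def save_map(prod_map: dict, path: Path) -> None:
    """Write atomically so readers never see a partially written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(prod_map, f, indent=2)
    os.replace(tmp, path)
```

This version only ever adds edges. A real refresh loop would probably record a `last_seen` timestamp per edge so that calls which stop appearing in traces can age out of the map.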

Technical Notes

  • Start with K8s service discovery + trace data from observability providers
  • Could use networkx or a simple adjacency list for the graph
  • Inspired by Sonarly's "production map" concept but focused on infra topology
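
For the K8s side, a low-dependency starting point is to shell out to `kubectl get services --all-namespaces -o json` and read `items[].metadata` from the returned ServiceList, then seed a plain adjacency list with every discovered Service so that Services with no observed traffic still appear in `nightops map show`. The function names are hypothetical, and the sketch assumes trace service names have already been normalized to the same `namespace/name` form, which real trace data will not guarantee.

```python
from __future__ import annotations

import json
import subprocess
from typing import Iterable


def services_from_k8s_json(doc: dict) -> set[str]:
    """Extract "namespace/name" identifiers from a Kubernetes ServiceList."""
    return {
        f'{item["metadata"].get("namespace", "default")}/{item["metadata"]["name"]}'
        for item in doc.get("items", [])
    }


def discover_services() -> set[str]:
    """List Services in all namespaces using the current kubectl context."""
    result = subprocess.run(
        ["kubectl", "get", "services", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return services_from_k8s_json(json.loads(result.stdout))


def build_adjacency(
    services: Iterable[str], edges: Iterable[tuple[str, str]]
) -> dict[str, list[str]]:
    """Adjacency list (caller -> callees); every known service becomes a node."""
    graph: dict[str, list[str]] = {svc: [] for svc in sorted(services)}
    for caller, callee in sorted(edges):
        graph.setdefault(caller, []).append(callee)
        graph.setdefault(callee, [])  # callee may be absent from K8s discovery
    return graph
```

A plain dict of lists keeps the stored map trivially JSON-serializable; networkx could be layered on later if richer graph queries are needed.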

Metadata

Assignees

No one assigned

    Labels

    • area:agent (Agent architecture and orchestration)
    • area:rca (Root cause analysis)
    • phase:2-smart-triage (Phase 2 — Smart Triage & Investigation)
    • priority:high (Important, do if time allows)
    • type:feature (New feature or capability)

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
