[Feature] Add OpenTelemetry and Prometheus-compatible metrics #313

Description

@richardcmckinney

Problem

The app has structured logs and queues, but operators need metrics and traces to diagnose latency, queue stalls, webhook failures, auth anomalies, and AI cost drivers.

Proposed solution

Add OpenTelemetry instrumentation and a Prometheus-compatible metrics endpoint. Track request latency, response status, API rate-limit events, queue depth, job duration, webhook delivery outcomes, storage operations, database query latency, and AI token usage.
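To make "Prometheus-compatible" concrete, here is a minimal, dependency-free sketch of a request-latency histogram rendered in the Prometheus text exposition format. It is an illustration only, not the proposed implementation: the metric name `http_request_duration_seconds`, the bucket bounds, and the `LatencyHistogram` class are assumptions, and a real deployment would more likely use an existing client library or the OpenTelemetry SDK.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds in seconds; the issue does not specify any.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, 2.5)


def escape_label(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LatencyHistogram:
    """Histogram keyed by (method, route pattern, status). Illustrative only."""

    def __init__(self, name, buckets=BUCKETS):
        self.name = name
        self.buckets = tuple(sorted(buckets))
        # One extra slot at the end holds observations above the last bound (+Inf).
        self._counts = defaultdict(lambda: [0] * (len(self.buckets) + 1))
        self._sums = defaultdict(float)

    def observe(self, method, route, status, seconds):
        # `route` should be the route pattern (e.g. "/users/:id"), not the raw
        # path, so label cardinality stays bounded.
        key = (method, route, str(status))
        # bisect_left finds the first bucket whose upper bound is >= seconds.
        self._counts[key][bisect_left(self.buckets, seconds)] += 1
        self._sums[key] += seconds

    def render(self):
        """Return the series as text, with cumulative bucket counts."""
        lines = [f"# TYPE {self.name} histogram"]
        for key in sorted(self._counts):
            method, route, status = (escape_label(part) for part in key)
            labels = f'method="{method}",route="{route}",status="{status}"'
            running = 0
            for bound, count in zip(self.buckets, self._counts[key]):
                running += count
                lines.append(f'{self.name}_bucket{{{labels},le="{bound}"}} {running}')
            running += self._counts[key][-1]
            lines.append(f'{self.name}_bucket{{{labels},le="+Inf"}} {running}')
            lines.append(f"{self.name}_sum{{{labels}}} {self._sums[key]}")
            lines.append(f"{self.name}_count{{{labels}}} {running}")
        return "\n".join(lines) + "\n"
```

Calling `observe("GET", "/users/:id", 200, 0.07)` and then serving `render()` from a metrics route would give a scraper one cumulative `_bucket` series per bound plus `_sum` and `_count` for that label set.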

Acceptance criteria

  • Metrics endpoint can be enabled for self-hosted deployments.
  • HTTP request metrics include route pattern, method, status, and latency buckets.
  • Queue metrics include waiting, active, failed, completed, stalled, and job duration.
  • Webhook metrics include attempts, successes, failures, retries, and latency.
  • AI metrics include model, feature, token count, failure count, and cost estimate where available.
  • Tracing can be exported to an OTLP collector.
  • Sensitive payloads, tokens, emails, and secrets are never included in span attributes or metric labels.
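One way to approach the last criterion is an allowlist: only label keys that are explicitly approved ever reach a metric or span, so an unexpected field such as an email or token is dropped by default. The sketch below assumes hypothetical label keys and a hypothetical `safe_labels` helper; the issue names the dimensions to track but not the exact keys.

```python
# Hypothetical allowlist; the issue lists these dimensions but not exact label keys.
ALLOWED_LABELS = frozenset(
    {"method", "route", "status", "queue", "model", "feature", "outcome"}
)


def safe_labels(attributes):
    """Keep only allowlisted keys and stringify their values.

    Anything not on the allowlist (emails, tokens, payload fields) is
    dropped rather than redacted, so new fields are excluded by default.
    """
    return {
        key: str(value)
        for key, value in attributes.items()
        if key in ALLOWED_LABELS
    }


# Usage: the email and authorization fields never make it into the labels.
labels = safe_labels(
    {
        "method": "POST",
        "route": "/webhooks/:id",
        "status": 202,
        "email": "user@example.com",
        "authorization": "Bearer abc123",
    }
)
```

An allowlist is chosen over a denylist here because a denylist has to anticipate every sensitive field name, while an allowlist fails closed. It also keeps label cardinality predictable, which matters for Prometheus-style backends.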