Add API usage analytics + metrics emission framework #630

@JohnRDOrazio

Summary

Track this as a possible future feature: a metrics layer for both internal health signals (currently surfaced ad-hoc via /health) and API usage analytics (currently not captured anywhere).

Why

Today the API has no first-class metrics emission. Reconciliation health is exposed via the /health endpoint (see openfga_outbox block added in the Option B+C work), and request-level signals exist only in the log files. There's no way to answer questions like:

  • How many /calendar requests did we serve last month?
  • Which API keys / applications are the heaviest consumers?
  • What's the per-endpoint error rate trend?
  • Which calendars (nations / dioceses) are queried most often?
  • What's p50 / p95 / p99 latency per endpoint?

These are the questions that drive capacity planning, deprecation decisions, and outreach to heavy users.

Out of scope of the current openfga-reconciliation work

The Option B+C design (see docs/superpowers/specs/2026-06-02-openfga-async-reconciliation-design.md) deliberately uses /health polling rather than a metrics emitter, on the reasoning that adding Prometheus/StatsD wiring for two operational signals would be premature. This issue tracks the eventual graduation to a real metrics framework when more signals justify it.

Candidate substrates

  • Prometheus + a /metrics endpoint — the industry default, scraped by ops infrastructure. Requires the promphp/prometheus_client_php package (or similar) and a place to register collectors.
  • StatsD + DogStatsD-compatible emitter — push model, lighter integration, no scraping infrastructure on the host.
  • PG aggregation table + scheduled rollup — an api_requests_hourly (or similar) table populated by middleware on each request, rolled up via a periodic SQL job. No external metrics infrastructure; everything stays in PG. Lower fidelity but zero new dependencies.
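
As a rough illustration of the third option, the sketch below writes each request into an hourly bucket with an upsert. It is written in Python against an in-memory SQLite database purely so it can run standalone; the real API is PHP on PG. The column names, the `record_request` helper, and the choice to fold the rollup into the write path (instead of a raw table plus a scheduled job, as the bullet describes) are all assumptions for illustration, not a proposed schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per (hour, endpoint, status code).
SCHEMA = """
CREATE TABLE api_requests_hourly (
    bucket_start     TEXT    NOT NULL,  -- start of the hour, ISO 8601, UTC
    endpoint         TEXT    NOT NULL,
    status_code      INTEGER NOT NULL,
    request_count    INTEGER NOT NULL DEFAULT 0,
    total_latency_ms REAL    NOT NULL DEFAULT 0,
    PRIMARY KEY (bucket_start, endpoint, status_code)
)
"""

# Insert a new bucket row, or bump the counters of the existing one.
UPSERT = """
INSERT INTO api_requests_hourly
    (bucket_start, endpoint, status_code, request_count, total_latency_ms)
VALUES (?, ?, ?, 1, ?)
ON CONFLICT (bucket_start, endpoint, status_code) DO UPDATE SET
    request_count    = api_requests_hourly.request_count + 1,
    total_latency_ms = api_requests_hourly.total_latency_ms
                       + excluded.total_latency_ms
"""


def hour_bucket(ts: datetime) -> str:
    """Truncate a timezone-aware timestamp to the start of its UTC hour."""
    utc = ts.astimezone(timezone.utc)
    return utc.replace(minute=0, second=0, microsecond=0).isoformat()


def record_request(conn, ts, endpoint, status_code, latency_ms):
    """Called once per request by the (hypothetical) metrics middleware."""
    conn.execute(UPSERT, (hour_bucket(ts), endpoint, status_code, latency_ms))


def open_demo_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn
```

Storing a count and a latency sum per bucket is enough for request volume and mean latency, but not for p95 / p99; that is the "lower fidelity" trade-off mentioned above.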

Per-question signals to capture

| Signal | Tags / dimensions |
| --- | --- |
| Request count | endpoint, status code, API key ID, application ID |
| Request latency | endpoint, status code |
| Calendar data fetched | calendar type (general/national/diocesan/wider), nation, diocese |
| Auth event | event type (login, refresh, logout), result |
| Outbox state | (already in /health; could be moved here) |
| OpenFGA call count + latency | operation (check/listObjects/writeTuple/deleteTuple), result |
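
To make the signal/tag pairing concrete, here is a minimal, substrate-agnostic sketch of a tagged registry: counters and latency samples keyed by metric name plus a set of tags, with nearest-rank percentiles for the p50 / p95 / p99 question. It is Python for brevity, and every name in it (`MetricsRegistry`, `increment`, `observe`, `percentile`, the metric names) is hypothetical; a Prometheus or StatsD client would replace all of it.

```python
import math
from collections import defaultdict


class MetricsRegistry:
    """In-memory counters and latency samples keyed by (name, tags)."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._samples = defaultdict(list)

    @staticmethod
    def _key(name, tags):
        # Sort tags so the same tag set always maps to the same key.
        return (name, tuple(sorted(tags.items())))

    def increment(self, name, **tags):
        """Count one occurrence of a signal, e.g. one served request."""
        self._counters[self._key(name, tags)] += 1

    def observe(self, name, value, **tags):
        """Record one measurement, e.g. a request latency in ms."""
        self._samples[self._key(name, tags)].append(value)

    def count(self, name, **tags):
        return self._counters.get(self._key(name, tags), 0)

    def percentile(self, name, pct, **tags):
        """Nearest-rank percentile of the recorded samples, or None."""
        values = sorted(self._samples.get(self._key(name, tags), []))
        if not values:
            return None
        rank = max(1, math.ceil(pct * len(values) / 100))
        return values[rank - 1]


# Usage with tags taken from the table above (values are made up).
registry = MetricsRegistry()
registry.increment("request_count", endpoint="/calendar", status_code=200)
registry.observe("request_latency_ms", 12.5, endpoint="/calendar", status_code=200)
```

Keeping raw samples in memory is only workable for a sketch; a real substrate would use histogram buckets or a streaming estimator, and high-cardinality tags such as API key ID would need care on any of the candidates.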

Acceptance criteria (when this lands)

  • A middleware captures per-request metrics with the dimensions above, adding minimal latency overhead (< 1 ms at p99).
  • /admin/metrics or /metrics (auth model TBD per substrate choice) exposes the aggregates.
  • The outbox observability currently in /health migrates to the new substrate (or stays in /health if that's the simpler ops model — explicitly decide).
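
The first criterion could look roughly like the sketch below: a wrapper that times each request and hands one event to a recorder, including when the handler throws. It is Python rather than the project's PHP, and the `Request` / `Response` shapes, the `record` callback, and the injectable `clock` are assumptions made so the example runs on its own; they do not reflect the API's actual middleware interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    endpoint: str
    api_key_id: Optional[str] = None


@dataclass
class Response:
    status_code: int


def metrics_middleware(
    handler: Callable[[Request], Response],
    record: Callable[[dict], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Callable[[Request], Response]:
    """Wrap a handler so every call emits one metrics event."""

    def wrapped(request: Request) -> Response:
        start = clock()
        status = 500  # reported if the handler raises before returning
        try:
            response = handler(request)
            status = response.status_code
            return response
        finally:
            # Runs on success and on exception, so failures are counted too.
            record({
                "endpoint": request.endpoint,
                "status_code": status,
                "api_key_id": request.api_key_id,
                "latency_ms": (clock() - start) * 1000.0,
            })

    return wrapped
```

The overhead budget then applies to whatever `record` does: an in-process counter bump or a UDP send is cheap, while a synchronous database write per request may not fit under 1 ms.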

Related

  • Option B+C async reconciliation design (the proximate trigger for this issue).
  • src/Repositories/ApiKeyRepository.php and src/Repositories/ApplicationRepository.php — already model the dimensions we'd tag metrics with.

Notes

This is a track-only issue. No code lands until a separate brainstorming pass picks a substrate and scopes the rollout.
