"[FEATURE] Historical Metrics API with 7-Day Rolling Window" #31

@humanauction

Historical Metrics & Time-Series Storage

User Story

AS A network administrator
I WANT to query historical network metrics over configurable time ranges
SO THAT I can identify trends, investigate past incidents and generate reports.


Acceptance Criteria

Given the daemon has been running for multiple time windows
When I query /metrics/history?start=1735171200&end=1735257600
Then I receive a JSON response containing aggregated stats for each 1-minute window within that range

Given the database exceeds the 7-day retention policy
When the daemon performs automatic cleanup
Then records older than 7 days are deleted, and only recent data remains

Given no time range is specified in the request
When I query /metrics/history
Then the API returns the last 24 hours of data by default

Given I request data with start > end or invalid timestamps
When the API validates the request
Then it returns a 400 Bad Request with a descriptive error message
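
The last two scenarios (default range and invalid input) could be handled by one small helper. The following is a minimal sketch, not the daemon's actual code: `HistoryRange`, `RangeError`, `parseTimestamp`, and `resolveRange` are hypothetical names, and the fallback used when only one of `start`/`end` is supplied is an assumption this issue does not specify.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>
#include <variant>

// Hypothetical types for illustration; not taken from the NetMon codebase.
struct HistoryRange {
    std::int64_t start;
    std::int64_t end;
};

struct RangeError {
    int status;           // HTTP status code to return
    std::string message;  // descriptive error for the response body
};

// Parse a non-negative Unix timestamp. Rejects empty input, signs on
// their own, negative values, overflow, and trailing characters.
inline std::optional<std::int64_t> parseTimestamp(const std::string& s) {
    std::int64_t value = 0;
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last || value < 0) return std::nullopt;
    return value;
}

// Resolve optional query parameters against the current time `now`.
// Assumption: a missing `end` means "now", and a missing `start` means
// 24 hours before `end`, which yields the last 24 hours when both are absent.
inline std::variant<HistoryRange, RangeError> resolveRange(
    const std::optional<std::string>& startParam,
    const std::optional<std::string>& endParam,
    std::int64_t now) {
    constexpr std::int64_t kDefaultSpan = 24 * 60 * 60;

    std::int64_t end = now;
    if (endParam) {
        auto parsed = parseTimestamp(*endParam);
        if (!parsed) return RangeError{400, "invalid 'end' timestamp"};
        end = *parsed;
    }

    std::int64_t start = end - kDefaultSpan;
    if (startParam) {
        auto parsed = parseTimestamp(*startParam);
        if (!parsed) return RangeError{400, "invalid 'start' timestamp"};
        start = *parsed;
    }

    if (start > end) {
        return RangeError{400, "'start' must not be greater than 'end'"};
    }
    return HistoryRange{start, end};
}
```

Returning a `std::variant` keeps validation free of HTTP plumbing, so the handler only has to map `RangeError` onto a 400 response.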

Additional acceptance criteria:

  • Historical data is persisted to SQLite in 1-minute aggregates (reusing existing agg_stats table)
  • /metrics/history endpoint supports query parameters: start (Unix timestamp), end (Unix timestamp), limit (max windows)
  • Response format matches /metrics structure but includes an array of time-windowed snapshots
  • Database schema includes an index on window_start for efficient range queries (already exists)
  • Automatic cleanup job runs daily at 00:00 UTC, deleting records older than 7 days
  • No performance degradation on /metrics endpoint (historical queries run in separate thread pool)
  • Integration tests verify data integrity across write → read → cleanup lifecycle

Tasks

Backend (C++):

  • Add GET /metrics/history endpoint to NetMonDaemon.cpp
    • Parse query parameters (start, end, limit)
    • Validate timestamp range (400 on invalid input)
    • Call StatsPersistence::loadHistoryRange(start, end, limit)
    • Format response as JSON array of windowed stats
  • Extend StatsPersistence.cpp with time-range query:
    • Add loadHistoryRange(int64_t start, int64_t end, size_t limit) method
    • SQL query: SELECT * FROM agg_stats WHERE window_start BETWEEN ? AND ? ORDER BY window_start DESC LIMIT ?
  • Implement cleanup job in StatsPersistence.cpp:
    • Add cleanupOldRecords(int retention_days) method
    • SQL: DELETE FROM agg_stats WHERE window_start < (strftime('%s', 'now') - (? * 86400))
  • Schedule cleanup task in NetMonDaemon::run():
    • Launch background thread that sleeps until next midnight UTC
    • Call persistence_->cleanupOldRecords(7) daily
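
The timing arithmetic behind the cleanup schedule can be sketched as two pure functions, which makes it testable without waiting for midnight. This is a rough sketch under assumptions: `untilNextMidnightUtc` and `retentionCutoff` are hypothetical helper names, and the cutoff helper only mirrors in C++ what the `DELETE` statement above computes inside SQLite.

```cpp
#include <chrono>
#include <cstdint>

// Seconds from `nowUnix` (Unix time, assumed non-negative) until the next
// 00:00 UTC boundary. At exactly midnight this returns a full day, so the
// job does not fire twice back to back.
inline std::chrono::seconds untilNextMidnightUtc(std::int64_t nowUnix) {
    constexpr std::int64_t kDay = 86400;
    // Unix time ignores leap seconds, so UTC midnights are exact multiples of 86400.
    const std::int64_t sinceMidnight = nowUnix % kDay;
    return std::chrono::seconds(kDay - sinceMidnight);
}

// Smallest window_start that survives cleanup: rows with
// window_start < cutoff are the ones the DELETE would remove.
inline std::int64_t retentionCutoff(std::int64_t nowUnix, int retentionDays) {
    return nowUnix - static_cast<std::int64_t>(retentionDays) * 86400;
}
```

In the background thread, the loop would presumably pass the current Unix time to `untilNextMidnightUtc`, sleep for that duration with `std::this_thread::sleep_for`, and then call `persistence_->cleanupOldRecords(7)`. A condition variable with a timed wait may be preferable to a plain sleep so that daemon shutdown can interrupt it.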

Testing:

  • Unit test: StatsPersistence::loadHistoryRange() with mock data (3-day range)
  • Unit test: StatsPersistence::cleanupOldRecords() verifies deletion of old rows
  • Integration test: GET /metrics/history with valid/invalid timestamps
  • Integration test: Verify 7-day retention (simulate time passage, check DB state)
  • Load test: Ensure /metrics/history with 10,080 windows (7 days of 1-min data) completes in <500ms

Documentation:

  • Update docs/api.md with /metrics/history endpoint specification
  • Add example curl commands for common queries (last hour, specific date range)
  • Document retention policy in README.md

Technical Notes

Response Format Example:

{
  "start": 1735171200,
  "end": 1735257600,
  "windows": [
    {
      "timestamp": 1735257540,
      "window_start": 1735257540,
      "total_bytes": 1048576,
      "total_packets": 2048,
      "bytes_per_second": 17476,
      "protocol_breakdown": {"TCP": 800000, "UDP": 248576},
      "active_flows": [...]
    },
    {
      "timestamp": 1735257480,
      ...
    }
  ]
}
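
To illustrate how that shape might be produced, here is a minimal serialization sketch using only the standard library. `WindowSnapshot` and `toHistoryJson` are hypothetical names; the real daemon may well use a JSON library instead. The `active_flows` field is left out because its element shape is elided above, and the code assumes protocol names contain no characters that need JSON escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one row from agg_stats.
struct WindowSnapshot {
    std::int64_t window_start = 0;
    std::uint64_t total_bytes = 0;
    std::uint64_t total_packets = 0;
    std::uint64_t bytes_per_second = 0;
    std::map<std::string, std::uint64_t> protocol_breakdown;
};

// Build the /metrics/history body. "timestamp" repeats window_start,
// as in the example response. Output is compact (no whitespace).
inline std::string toHistoryJson(std::int64_t start, std::int64_t end,
                                 const std::vector<WindowSnapshot>& windows) {
    std::ostringstream out;
    out << "{\"start\":" << start << ",\"end\":" << end << ",\"windows\":[";
    for (std::size_t i = 0; i < windows.size(); ++i) {
        const WindowSnapshot& w = windows[i];
        if (i > 0) out << ',';
        out << "{\"timestamp\":" << w.window_start
            << ",\"window_start\":" << w.window_start
            << ",\"total_bytes\":" << w.total_bytes
            << ",\"total_packets\":" << w.total_packets
            << ",\"bytes_per_second\":" << w.bytes_per_second
            << ",\"protocol_breakdown\":{";
        bool first = true;
        for (const auto& [proto, bytes] : w.protocol_breakdown) {
            if (!first) out << ',';
            first = false;
            // Assumption: proto needs no escaping (e.g. "TCP", "UDP").
            out << '"' << proto << "\":" << bytes;
        }
        out << "}}";
    }
    out << "]}";
    return out.str();
}
```

An empty result set serializes to an empty `windows` array rather than an error, which keeps client handling uniform for ranges with no data.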

Database Impact:

  • 1 minute of data ≈ 50-200 flow records (typical workload)
  • 7 days × 1440 minutes = 10,080 windows
  • Estimated DB size: 50-200 MB (with indexes)
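
The sizing figures above can be checked at compile time. This is only a worked version of the arithmetic; the constant names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::int64_t kRetentionDays = 7;
constexpr std::int64_t kWindowsPerDay = 24 * 60;  // one window per minute
constexpr std::int64_t kRetainedWindows = kRetentionDays * kWindowsPerDay;

// 50-200 flow records per window, per the typical-workload estimate.
constexpr std::int64_t kMinRows = kRetainedWindows * 50;
constexpr std::int64_t kMaxRows = kRetainedWindows * 200;

static_assert(kRetainedWindows == 10080, "7 days of 1-minute windows");
static_assert(kMinRows == 504000, "lower bound on retained flow records");
static_assert(kMaxRows == 2016000, "upper bound on retained flow records");
```

Dividing the 50-200 MB estimate by these row counts implies roughly 100 bytes per record including index overhead. That per-record figure is an inference from the numbers above, not a measurement.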

Performance Safeguards:

  • Limit max query range to 30 days (prevent OOM on large queries); with 7-day retention this still returns at most 10,080 windows
  • Use LIMIT clause to cap response size (default: 1440 windows = 24 hours)
  • Cleanup runs at low-traffic hours (00:00 UTC)
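
The first two safeguards might look like the sketch below. `rangeWithinCap` and `effectiveLimit` are hypothetical helpers. The 30-day cap and the 1440 default come from the list above, while the hard ceiling of 10,080 on `limit` is an assumption based on the load-test target and is not stated as a requirement.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::int64_t kMaxRangeSeconds = 30LL * 86400;  // 30-day cap
constexpr std::size_t kDefaultLimit = 1440;               // 24 hours of windows
constexpr std::size_t kMaxLimit = 10080;                  // assumed ceiling: 7 days

// True when the range is ordered and no wider than the 30-day cap.
inline bool rangeWithinCap(std::int64_t start, std::int64_t end) {
    return end >= start && (end - start) <= kMaxRangeSeconds;
}

// Choose the LIMIT value for the SQL query: the default when the client
// sends nothing (or zero), otherwise the request clamped to the ceiling.
inline std::size_t effectiveLimit(std::optional<std::size_t> requested) {
    if (!requested || *requested == 0) return kDefaultLimit;
    return std::min(*requested, kMaxLimit);
}
```

Clamping an oversized `limit` instead of rejecting it is a design choice; returning 400 would be equally defensible if clients should learn about the ceiling explicitly.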
