Feature Request: Real-time dashboard view combining in-memory buffer with SQLite #290

@Neiko2002

Description

Problem

We evaluated dozens of LLM gateways and proxies (Phoenix/Arize, LiteLLM, GoModel, and many others) looking for a self-hosted solution that provides real-time visibility of LLM requests and streamed completions in a Web UI.

The core requirement is simple: when a client sends a request to the gateway and the LLM streams tokens back via SSE, the dashboard should display the prompt and streamed tokens as they happen — not with a 5-second delay.

What we found

Most tools have significant delays (30–50s polling) or no body visibility at all. GoModel came closest — it captures request/response bodies, parses SSE events, and reconstructs full response content. However, the dashboard only reads from SQLite, not from the in-memory buffer.

GoModel's StreamLogObserver correctly captures SSE events and builds the full response body in memory. It then flushes to SQLite every 5 seconds (FlushInterval: 5s) or every 1000 entries (BufferSize: 1000). The dashboard's GET /admin/api/v1/audit/log endpoint queries only SQLite — it does not also read the in-memory buffer.
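
For reference, the write-back mechanism described above amounts to something like the following sketch. Every name in it (Entry, Buffer, persist) is illustrative rather than GoModel's actual code; it only shows the shape of the behavior: entries accumulate in memory and reach SQLite on a timer or once the buffer fills.

```go
package audit

import (
	"sync"
	"time"
)

// Entry is an illustrative stand-in for one captured request/response pair.
type Entry struct {
	Timestamp    time.Time
	RequestBody  string
	ResponseBody string // reconstructed from parsed SSE events
}

// Buffer is a write-back cache: reads never touch it today, only Flush does.
type Buffer struct {
	mu         sync.Mutex
	entries    []Entry
	bufferSize int
	persist    func([]Entry) // batch insert into SQLite
}

// NewBuffer starts a background flusher, e.g. flushInterval=5s, bufferSize=1000.
func NewBuffer(flushInterval time.Duration, bufferSize int, persist func([]Entry)) *Buffer {
	b := &Buffer{bufferSize: bufferSize, persist: persist}
	go func() {
		for range time.Tick(flushInterval) {
			b.Flush()
		}
	}()
	return b
}

// Add buffers an entry and flushes early once the buffer is full.
func (b *Buffer) Add(e Entry) {
	b.mu.Lock()
	b.entries = append(b.entries, e)
	full := len(b.entries) >= b.bufferSize
	b.mu.Unlock()
	if full {
		b.Flush()
	}
}

// Flush hands the buffered batch to the SQLite writer and resets the buffer.
func (b *Buffer) Flush() {
	b.mu.Lock()
	batch := b.entries
	b.entries = nil
	b.mu.Unlock()
	if len(batch) > 0 {
		b.persist(batch)
	}
}
```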

This means:

  • No live updates in the dashboard between flush cycles
  • Minimum 5-second delay before any new request appears
  • No auto-refresh or polling mechanism to check for new entries
  • The in-memory buffer exists solely as a write-back cache, not as a live data source

Proposed Solution

Combine in-memory and SQLite data in the dashboard API response:

  1. Keep the in-memory buffer as the primary read source for the dashboard API, with SQLite as the durable backup.
  2. Or: Return both in-memory entries (newest, not yet flushed) and SQLite entries (persisted) in the GET /admin/api/v1/audit/log response, merging them by timestamp (see the first sketch below).
  3. Or: Add a separate WebSocket/SSE endpoint that pushes new entries to the dashboard as they arrive in memory, without waiting for the SQLite flush (see the second sketch below).
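
The merge variant (option 2) could be small. Building on the buffer sketch above, and assuming a hypothetical queryPersisted helper for the SQLite side (plus "database/sql" and "sort" imports), the handler's core might look like this:

```go
// Snapshot returns a copy of the entries not yet flushed to SQLite
// (an addition to the Buffer sketched earlier).
func (b *Buffer) Snapshot() []Entry {
	b.mu.Lock()
	defer b.mu.Unlock()
	return append([]Entry(nil), b.entries...)
}

// mergedAuditLog is what GET /admin/api/v1/audit/log could return:
// persisted SQLite rows plus unflushed in-memory entries, sorted by
// timestamp. queryPersisted is a hypothetical query helper, not an
// existing GoModel function.
func mergedAuditLog(buf *Buffer, db *sql.DB) ([]Entry, error) {
	persisted, err := queryPersisted(db) // SELECT ... FROM the audit table
	if err != nil {
		return nil, err
	}
	merged := append(persisted, buf.Snapshot()...)
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].Timestamp.After(merged[j].Timestamp) // newest first
	})
	return merged, nil
}
```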

This would give users near-instant visibility into LLM requests: the prompt appears immediately, and streamed tokens update in real time as they are captured by StreamLogObserver.
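
The push variant (option 3) would be a standard server-sent events handler on top of the same buffer. Here, subscribe/unsubscribe are hypothetical fan-out hooks on the buffer's Add path, not an existing GoModel API; assumed imports are "encoding/json", "fmt", and "net/http":

```go
// liveAuditHandler streams each new in-memory entry to the dashboard
// the moment it is captured, with no dependency on the SQLite flush.
func liveAuditHandler(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	ch := subscribe() // hypothetical: a chan Entry fed from Buffer.Add
	defer unsubscribe(ch)

	for {
		select {
		case <-r.Context().Done(): // dashboard tab closed
			return
		case e := <-ch:
			data, err := json.Marshal(e)
			if err != nil {
				continue
			}
			fmt.Fprintf(w, "data: %s\n\n", data)
			flusher.Flush() // deliver now, without waiting for the SQLite flush
		}
	}
}
```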

Why this matters

For debugging LLM integrations, monitoring token usage, and understanding what prompts/completions are flowing through the gateway, real-time visibility is essential. A 5-second delay is acceptable for analytics and cost tracking, but it makes the dashboard useless for live debugging.

We tested many tools and found that none of them provide true real-time dashboard visibility of LLM request/response bodies. GoModel has the architecture to do it (the in-memory buffer already exists and captures everything), but it doesn't expose it.

Environment

  • GoModel version: latest (ENTERPILOT/GoModel)
  • Use case: Debugging LLM streaming, monitoring real-time token usage, understanding prompt/response content
  • Current workaround: None — must wait 5+ seconds for each request to appear
