
Camera job becomes unstable or stops after long runtime #54

Description

@nathabee

Type

bug, camera, performance, backend, frontend

Problem

After a camera job is started, it runs normally for a long time (roughly two hours or more), and then the system becomes unstable. The frontend appears saturated, and the backend camera job may stop producing new snapshots or stop responding correctly.

Current suspicion: the problem may be caused by one or more of these:

  • frontend DOM or image refresh leak
  • repeated image reload without cache control
  • unbounded frontend polling
  • backend camera loop not isolated enough from dashboard polling
  • disk growth from snapshots / deltas / events
  • memory pressure caused by image loading, image comparison, or retained buffers
  • database contention if camera events/snapshots are persisted too frequently
  • HTTP request accumulation if the frontend keeps polling while previous requests are still pending
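
To make the last suspicion concrete, here is a minimal sketch of a poller that never starts a new status request while the previous one is still pending. `GuardedPoller`, `JobStatus`, and the injected `fetchStatus` function are hypothetical names, not taken from the current code base, and the payload shape is an assumption.

```typescript
// Hypothetical status payload; the real API shape is not specified in this issue.
type JobStatus = { state: string; latestSnapshotId: string | null };

// Issues a new status request only when the previous one has settled.
class GuardedPoller {
  private inFlight = false;
  skipped = 0; // ticks dropped because a request was still pending
  private readonly fetchStatus: () => Promise<JobStatus>;
  private readonly onStatus: (s: JobStatus) => void;

  constructor(
    fetchStatus: () => Promise<JobStatus>,
    onStatus: (s: JobStatus) => void,
  ) {
    this.fetchStatus = fetchStatus;
    this.onStatus = onStatus;
  }

  // Meant to be called from a timer. Returns true if a request was started.
  // Assumes fetchStatus reports failure by rejecting, not by throwing synchronously.
  tick(): boolean {
    if (this.inFlight) {
      this.skipped++;
      return false;
    }
    this.inFlight = true;
    this.fetchStatus()
      .then((s) => this.onStatus(s))
      .catch(() => { /* swallowed in this sketch; a real client would surface it */ })
      .then(() => { this.inFlight = false; });
    return true;
  }
}
```

In a browser this could be driven by something like `setInterval(() => poller.tick(), 2000)`, where the interval is only an illustrative value. A steadily growing `skipped` counter would suggest that the backend answers more slowly than the frontend polls.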

Expected behavior

The camera job must run independently in the backend after being started.

The dashboard may disconnect, close, reload, or become unavailable, but the backend camera job should continue until:

  • the user stops it,
  • the runtime stops,
  • the camera source fails,
  • a configured error threshold is reached,
  • or the job lifecycle explicitly moves to FAILED, STOPPED, or COMPLETED.
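
The stop conditions above can be sketched as a small state machine. FAILED, STOPPED, and COMPLETED come from this issue; CREATED and RUNNING are assumed names for the non-terminal states, and treating the error threshold as a count of consecutive failed captures is also an assumption.

```typescript
// FAILED, STOPPED and COMPLETED are named in the issue;
// CREATED and RUNNING are assumed names for the non-terminal states.
type JobState = "CREATED" | "RUNNING" | "FAILED" | "STOPPED" | "COMPLETED";

const TERMINAL: readonly JobState[] = ["FAILED", "STOPPED", "COMPLETED"];

class CameraJob {
  state: JobState = "CREATED";
  consecutiveErrors = 0;
  private readonly maxConsecutiveErrors: number;

  constructor(maxConsecutiveErrors: number) {
    this.maxConsecutiveErrors = maxConsecutiveErrors;
  }

  isTerminal(): boolean {
    return TERMINAL.indexOf(this.state) !== -1;
  }

  start(): void {
    if (this.state !== "CREATED") {
      throw new Error("cannot start from state " + this.state);
    }
    this.state = "RUNNING";
  }

  // User-initiated stop; a job that already ended keeps its terminal state.
  stop(): void {
    if (!this.isTerminal()) this.state = "STOPPED";
  }

  // Record the outcome of one capture attempt made by the backend loop.
  recordCapture(ok: boolean): void {
    if (this.state !== "RUNNING") return;
    if (ok) {
      this.consecutiveErrors = 0; // a success resets the error streak
      return;
    }
    this.consecutiveErrors++;
    if (this.consecutiveErrors >= this.maxConsecutiveErrors) {
      this.state = "FAILED";
    }
  }
}
```

Nothing in this sketch depends on a connected dashboard: the state changes only through backend events (`start`, `stop`, capture results).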

The frontend must not be part of the camera capture loop.

Actual behavior

After a long camera-job runtime, the system degrades. The dashboard appears saturated, and the backend may stop producing new snapshots or stop responding correctly.

Design rule

The camera job must be backend-owned.

Bad pattern:

Frontend timer -> call API -> backend captures one picture -> frontend timer -> repeat

Correct pattern:

Frontend -> POST start camera job
Backend scheduler/job loop -> captures snapshots independently
Frontend -> GET camera job status / latest snapshot metadata
Frontend -> optional image refresh
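
A rough sketch of the correct pattern, assuming a backend written in TypeScript purely for illustration (the actual backend language is not stated here). `BackendCameraLoop`, `SnapshotMeta`, and the injected `capture` and `now` functions are hypothetical. The point is that the status read never triggers a capture.

```typescript
// Hypothetical metadata shape; field names are illustrative only.
type SnapshotMeta = { id: number; capturedAtMs: number };

class BackendCameraLoop {
  private latest: SnapshotMeta | null = null;
  private nextId = 1;
  private readonly capture: () => void;
  private readonly now: () => number;

  constructor(capture: () => void, now: () => number) {
    this.capture = capture;
    this.now = now;
  }

  // Invoked only by the backend scheduler, never by an HTTP request.
  tick(): void {
    this.capture();
    this.latest = { id: this.nextId++, capturedAtMs: this.now() };
  }

  // Invoked by the status endpoint: a pure read with no capture side effect.
  getLatest(): SnapshotMeta | null {
    return this.latest;
  }
}
```

With this split, any number of dashboard polls hits only `getLatest()`, so the capture rate is set by the scheduler alone.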

Investigation points

Check whether:

  • camera capture is triggered by frontend polling instead of backend scheduling
  • image elements are recreated endlessly instead of reusing/updating one stable element
  • old image URLs remain referenced in the DOM
  • URL.createObjectURL() is used without URL.revokeObjectURL()
  • snapshot/delta lists are appended without trimming
  • event rendering appends infinitely instead of replacing/paging
  • frontend polling continues while a previous request is still unresolved
  • the backend camera loop runs on a shared HTTP handler thread instead of a scheduler/executor
  • snapshot persistence opens image streams without closing them
  • image buffers are retained after analysis
  • database writes happen too often or without batching/deduplication
  • snapshot/delta retention is missing or too permissive
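
Two of the frontend checks above (object URLs that are never revoked, lists that are appended without trimming) can be illustrated with a small sketch. `BoundedList`, `ImageSlot`, and `ObjectUrlApi` are hypothetical helpers, not existing code. The object-URL functions are injected so the sketch does not depend on a browser environment.

```typescript
// Keeps at most `capacity` items, dropping the oldest first.
class BoundedList<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be >= 1");
    this.capacity = capacity;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) {
      this.items.splice(0, this.items.length - this.capacity);
    }
  }

  toArray(): T[] {
    return this.items.slice();
  }
}

// Minimal view of the two object-URL functions, matching the static
// URL.createObjectURL / URL.revokeObjectURL pair available in browsers.
interface ObjectUrlApi<B> {
  createObjectURL(blob: B): string;
  revokeObjectURL(url: string): void;
}

// Owns the object URL shown in one stable image element and
// revokes the previous URL every time a new image is shown.
class ImageSlot<B> {
  private current: string | null = null;
  private readonly api: ObjectUrlApi<B>;
  private readonly setSrc: (url: string) => void;

  constructor(api: ObjectUrlApi<B>, setSrc: (url: string) => void) {
    this.api = api;
    this.setSrc = setSrc;
  }

  show(blob: B): void {
    const next = this.api.createObjectURL(blob);
    this.setSrc(next);
    if (this.current !== null) this.api.revokeObjectURL(this.current);
    this.current = next;
  }

  dispose(): void {
    if (this.current !== null) this.api.revokeObjectURL(this.current);
    this.current = null;
  }
}
```

In a browser, the slot could be built as `new ImageSlot<Blob>(URL, (u) => { img.src = u; })` for one existing `img` element, so at most one object URL is live at any time. The same `BoundedList` idea applies to event rows and snapshot/delta lists. The right capacity is a product decision that this issue does not fix.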

Acceptance criteria

  • Starting the camera job creates a backend-side job/session with an explicit lifecycle.
  • Closing the browser does not stop the camera job.
  • Reopening the dashboard shows the current camera job state.
  • The frontend only polls status/latest metadata at a bounded interval.
  • The frontend does not accumulate DOM nodes, images, logs, or event rows without bound.
  • Long-run test of at least 3 hours does not show continuous backend heap growth.
  • Long-run test of at least 3 hours does not show uncontrolled browser memory growth.
  • Camera job failure is visible as a clear backend job state, not only as a frozen dashboard.
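
The issue does not define "continuous heap growth". One possible reading, offered only as an assumption, is a least-squares trend over periodic memory samples that stays above a chosen limit. `HeapSample`, `heapSlope`, and `showsContinuousGrowth` are hypothetical helpers, and the limit value is left to whoever runs the test.

```typescript
// One memory reading taken during the long-run test.
type HeapSample = { tMinutes: number; usedMb: number };

// Least-squares slope of used memory over time, in MB per minute.
function heapSlope(samples: HeapSample[]): number {
  const n = samples.length;
  if (n < 2) throw new Error("need at least two samples");
  let sumT = 0;
  let sumY = 0;
  for (const s of samples) {
    sumT += s.tMinutes;
    sumY += s.usedMb;
  }
  const meanT = sumT / n;
  const meanY = sumY / n;
  let num = 0;
  let den = 0;
  for (const s of samples) {
    const dt = s.tMinutes - meanT;
    num += dt * (s.usedMb - meanY);
    den += dt * dt;
  }
  if (den === 0) throw new Error("samples must span more than one timestamp");
  return num / den;
}

// True when the fitted trend exceeds the allowed growth rate.
function showsContinuousGrowth(samples: HeapSample[], maxMbPerMinute: number): boolean {
  return heapSlope(samples) > maxMbPerMinute;
}
```

Garbage-collected runtimes produce a sawtooth pattern, so a short window can give a misleading slope in either direction. Sampling over the full 3 hours, or only right after collections, is likely to be more reliable.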
