Design decision: should backfill ingestion publish to the change stream? #40

@maxine-at-forecast

Context

From adversarial review of v0.4.0b1 (W11).

Problem

Records ingested via the backfill path (ingestion/backfill.py) do not emit change events to the ChangeStream. Any client subscribed to subscribeChanges will miss records that arrive through backfill.

Considerations

Arguments for publishing backfill events:

  • Subscribers get a complete view of all data changes regardless of ingestion path
  • Simplifies client logic — no need for separate backfill awareness

Arguments against:

  • Backfill is historical data, not real-time changes — semantically different
  • Backfill can produce thousands of events in rapid succession, overwhelming subscriber queues and triggering backpressure disconnects
  • Clients that care about historical completeness should use query endpoints, not the live stream
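To make the backpressure concern concrete, here is a minimal sketch. It assumes each subscriber has a bounded queue and is disconnected when that queue overflows. `SubscriberQueue`, its `maxsize`, and the disconnect-on-full policy are hypothetical illustrations, not the actual ChangeStream implementation.

```python
from collections import deque


class SubscriberQueue:
    """Hypothetical bounded per-subscriber queue; not the real ChangeStream code."""

    def __init__(self, maxsize: int) -> None:
        self.maxsize = maxsize
        self.events: deque = deque()
        self.connected = True

    def publish(self, event: dict) -> bool:
        """Enqueue an event; disconnect the subscriber if the queue is full."""
        if not self.connected:
            return False
        if len(self.events) >= self.maxsize:
            # Assumed backpressure policy: drop the slow subscriber.
            self.connected = False
            return False
        self.events.append(event)
        return True


def simulate_backfill_burst(queue: SubscriberQueue, n_records: int) -> int:
    """Publish n_records back to back with no consumer draining in between.

    Returns how many events were accepted before the subscriber was dropped.
    """
    accepted = 0
    for i in range(n_records):
        if queue.publish({"record_id": i, "op": "insert"}):
            accepted += 1
    return accepted
```

Under these assumptions, a backfill of a few thousand records against a queue sized for live traffic fills the queue and disconnects the subscriber before it can drain, which is the failure mode described above.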

Decision needed

  • Should backfill publish to the change stream?
  • If yes, should events be tagged with a source: "backfill" field so clients can filter?
  • If no, should this be documented explicitly in the subscribeChanges endpoint docs?
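If the answer to the first two questions is yes, client-side filtering could look roughly like the sketch below. Only the `source: "backfill"` tag comes from this issue; the `ChangeEvent` shape, the `"live"` default, and the `filter_events` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical event shape; only the `source` tag is from the proposal."""

    record_id: str
    op: str
    source: str = "live"  # assumed values: "live" or "backfill"


def filter_events(
    events: Iterable[ChangeEvent], include_backfill: bool = False
) -> Iterator[ChangeEvent]:
    """Yield events, skipping backfill-tagged ones unless the client opts in."""
    for event in events:
        if include_backfill or event.source != "backfill":
            yield event


if __name__ == "__main__":
    stream = [
        ChangeEvent("a1", "insert"),
        ChangeEvent("b7", "insert", source="backfill"),
        ChangeEvent("a2", "update"),
    ]
    for event in filter_events(stream):
        print(event.record_id, event.op, event.source)
```

Filtering on the client keeps the stream complete for subscribers that want everything, but it does not reduce the event volume on the wire, so the backpressure concern would still need to be handled separately (for example with server-side filtering at subscribe time).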

Labels: question
