sendInteractions: validate that referenced datasets exist before recording analytics #41

@maxine-at-forecast

Context

From adversarial review of v0.4.0b1 (W7).

Problem

The sendInteractions endpoint validates that datasetUri is a syntactically valid AT-URI with the correct collection (science.alt.dataset.entry), but it never checks that the referenced dataset actually exists in the entries table. By contrast, publishLabel calls query_get_entry(pool, d_did, d_rkey) and returns 400 if the entry is not found.

Without this check, the analytics tables accumulate orphan events for nonexistent datasets, which could pollute analytics dashboards and waste storage.
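
To make the gap concrete, here is a minimal Python sketch of what a purely syntactic check looks like. The names parse_dataset_uri and DatasetRef are hypothetical and not taken from the codebase; only the collection name and the general at://authority/collection/rkey shape come from the description above. A URI can pass this kind of check while pointing at a dataset that was never created.

```python
from typing import NamedTuple

# Collection name taken from the issue text.
DATASET_COLLECTION = "science.alt.dataset.entry"


class DatasetRef(NamedTuple):
    """Hypothetical container for the parts needed for an entries lookup."""
    did: str
    rkey: str


def parse_dataset_uri(uri: str) -> DatasetRef:
    """Syntactic validation only: says nothing about whether the entry exists."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError("not an AT-URI")
    parts = uri[len(prefix):].split("/")
    # Expect exactly authority / collection / rkey, all non-empty.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected at://<authority>/<collection>/<rkey>")
    did, collection, rkey = parts
    if collection != DATASET_COLLECTION:
        raise ValueError(f"unexpected collection: {collection}")
    return DatasetRef(did, rkey)
```

The returned (did, rkey) pair is the kind of key an existence lookup would need; the sketch stops short of that lookup, which is the step this issue asks for.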

Trade-offs

  • Adding an existence check means a DB query per interaction item (up to 100 per batch), which increases latency for a fire-and-forget endpoint
  • Could batch the existence checks with a single query_get_entries call for the whole batch instead of per-item lookups
  • Alternatively, could do a soft check (log a warning but still record) to avoid rejecting valid interactions for recently-deleted datasets
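
The batch and soft-check options above can be sketched together in Python. This is an illustration under stated assumptions, not the project's implementation: lookup_existing stands in for whatever single-query batch lookup the codebase provides (the issue mentions query_get_entries, whose actual signature is not shown here), and split_by_existence and select_recordable are hypothetical names.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable

logger = logging.getLogger(__name__)

Key = tuple[str, str]  # (did, rkey) pair identifying one dataset entry

# Assumed shape: takes a set of keys, returns the subset that exists.
LookupFn = Callable[[set[Key]], Awaitable[set[Key]]]


async def split_by_existence(
    keys: Iterable[Key], lookup_existing: LookupFn
) -> tuple[list[Key], list[Key]]:
    """Partition a batch into (found, missing) using one lookup call."""
    keys = list(keys)
    # Deduplicate so repeated references to one dataset cost one lookup key.
    existing = await lookup_existing(set(keys))
    found = [k for k in keys if k in existing]
    missing = [k for k in keys if k not in existing]
    return found, missing


async def select_recordable(
    keys: Iterable[Key], lookup_existing: LookupFn, *, strict: bool
) -> list[Key]:
    """Strict mode rejects the batch; soft mode logs and records everything."""
    keys = list(keys)
    found, missing = await split_by_existence(keys, lookup_existing)
    if not missing:
        return found
    if strict:
        raise ValueError(f"{len(missing)} interaction(s) reference unknown datasets")
    logger.warning("recording %d interaction(s) for unknown datasets", len(missing))
    return keys
```

Either mode issues one lookup per batch rather than one per item, so the added latency does not scale with the batch size of up to 100. Whether strict mode should reject the whole batch or only drop the missing items is left open here, as it is in the issue.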

Acceptance criteria

  • Interactions referencing nonexistent datasets are either rejected or flagged
  • Performance impact is minimal (batch lookup preferred over per-item)
  • Tests cover both existing and nonexistent dataset URIs

Metadata

Labels: enhancement (New feature or request)