Skip to content

Stateless retention period #96

@nickevansuk

Description

@nickevansuk

Background

To minimise the number of RPDE pages a data consumer needs to harvest in order to reach "relevant" data, a retention period can be utilised. "Relevant" data for the purposes of this proposal are records that contain data relating to a events in the future.

Overall approach

An effective retention period can be implemented by having the first page of the feed start from the first relevant record (instead of from the beginning of time). This approach is useful for simple cases where a CRON job is not desirable.

The idea of moving the "first page" is that the data consumer starts harvesting from a point in the feed where all events in the future are ahead of it. The beginning of the feed effectively moves, so when the data consumer starts from the beginning of the feed, they always start from the earliest point that’s relevant, but anyone who’s already consuming the data will be unaffected.

The approaches below all maintain the integrity of the core RPDE invariants.

The main challenge is that the RPDE query is a > query, and hence just finding and using the earliest relevant record in the feed as the "default value" parameters for the query will exclude that record.

Approach 1: Precise

Alter the RPDE query slightly such that it precisely includes the "default value"

To get the "default values"

  --get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
  SELECT @firstTimestamp = modified, @firstId = id
    FROM ...
   WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
   LIMIT 1

In the RPDE query

   --include this WHERE clause only if default values are available (first page)
   WHERE (modified = @firstTimestamp AND id >= @firstId) OR (modified > @firstTimestamp)
   --include this WHERE clause only if @afterTimestamp and @afterId provided (not first page)
   WHERE (modified = @afterTimestamp AND id > @afterId) OR (modified > @afterTimestamp)
      -- if default values are not available and @afterTimestamp and @afterId not provided, do not include WHERE clause
ORDER BY modified, id

Approach 2: Rough

Alter the RPDE query slightly such that it includes the "default value", and anything that was modified at the same time

This approach does not take into account bulk updates

To get the "default values"

  --get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
  SELECT @firstTimestamp = modified
    FROM ...
   WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified
   LIMIT 1

In the RPDE query

   --include this WHERE clause only if default values are available (first page)
   WHERE (modified >= @firstTimestamp)
   --include this WHERE clause only if @afterTimestamp and @afterId provided (not first page)
   WHERE (modified = @afterTimestamp AND id > @afterId) OR (modified > @afterTimestamp)
   -- if default values are not available and @afterTimestamp and @afterId not provided, do not include WHERE clause
ORDER BY modified, id

Approach 3: Simple

Use id - 1 to ensure that the default value is included (note modified - 1 can be used for non-integer id values, though this will not be as precise for bulk updates).

To get the "default values"

  --get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
  SELECT @afterTimestamp = modified, @afterId = id - 1
    FROM ...
   WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
   LIMIT 1

In the RPDE query

Note no changes to the RPDE query are required

   --include this WHERE clause only if @afterTimestamp and @afterId provided OR default values are provided
   WHERE (modified = @afterTimestamp AND id > @afterId)
      OR (modified > @afterTimestamp)
ORDER BY modified, id

Approach 4: Specific

Use two queries to obtain the precise value to use in the existing RPDE query

Get the first relevant record, and use this to attain the default values

  --to be executed ONLY if @afterTimestamp and @afterId NOT provided

  --get first relevant record
  SELECT @relevantAfterTimestamp = modified, @relevantAfterId = id
    FROM ...
   WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
   LIMIT 1

  --get default vlaues
  SELECT @afterTimestamp = modified, @afterId = id
    FROM ...
   WHERE modified < @relevantAfterTimestamp AND id < @relevantAfterId
ORDER BY modified DESC, id DESC
   LIMIT 1

In the RPDE query

Note no changes to the RPDE query are required

   --include this WHERE clause only if @afterTimestamp and @afterId provided OR default values are provided
   WHERE (modified = @afterTimestamp AND id > @afterId)
      OR (modified > @afterTimestamp)
ORDER BY modified, id

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions