Background
To minimise the number of RPDE pages a data consumer needs to harvest in order to reach "relevant" data, a retention period can be utilised. "Relevant" data for the purposes of this proposal are records that contain data relating to a events in the future.
Overall approach
An effective retention period can be implemented by having the first page of the feed start from the first relevant record (instead of from the beginning of time). This approach is useful for simple cases where a CRON job is not desirable.
The idea of moving the "first page" is that the data consumer starts harvesting from a point in the feed where all events in the future are ahead of it. The beginning of the feed effectively moves, so when the data consumer starts from the beginning of the feed, they always start from the earliest point that’s relevant, but anyone who’s already consuming the data will be unaffected.
The approaches below all maintain the integrity of the core RPDE invariants.
The main challenge is that the RPDE query is a > query, and hence just finding and using the earliest relevant record in the feed as the "default value" parameters for the query will exclude that record.
Approach 1: Precise
Alter the RPDE query slightly such that it precisely includes the "default value"
To get the "default values"
--get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
SELECT @firstTimestamp = modified, @firstId = id
FROM ...
WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
LIMIT 1
In the RPDE query
--include this WHERE clause only if default values are available (first page)
WHERE (modified = @firstTimestamp AND id >= @firstId) OR (modified > @firstTimestamp)
--include this WHERE clause only if @afterTimestamp and @afterId provided (not first page)
WHERE (modified = @afterTimestamp AND id > @afterId) OR (modified > @afterTimestamp)
-- if default values are not available and @afterTimestamp and @afterId not provided, do not include WHERE clause
ORDER BY modified, id
Approach 2: Rough
Alter the RPDE query slightly such that it includes the "default value", and anything that was modified at the same time
This approach does not take into account bulk updates
To get the "default values"
--get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
SELECT @firstTimestamp = modified
FROM ...
WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified
LIMIT 1
In the RPDE query
--include this WHERE clause only if default values are available (first page)
WHERE (modified >= @firstTimestamp)
--include this WHERE clause only if @afterTimestamp and @afterId provided (not first page)
WHERE (modified = @afterTimestamp AND id > @afterId) OR (modified > @afterTimestamp)
-- if default values are not available and @afterTimestamp and @afterId not provided, do not include WHERE clause
ORDER BY modified, id
Approach 3: Simple
Use id - 1 to ensure that the default value is included (note modified - 1 can be used for non-integer id values, though this will not be as precise for bulk updates).
To get the "default values"
--get default values, to be executed ONLY if @afterTimestamp and @afterId NOT provided
SELECT @afterTimestamp = modified, @afterId = id - 1
FROM ...
WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
LIMIT 1
In the RPDE query
Note no changes to the RPDE query are required
--include this WHERE clause only if @afterTimestamp and @afterId provided OR default values are provided
WHERE (modified = @afterTimestamp AND id > @afterId)
OR (modified > @afterTimestamp)
ORDER BY modified, id
Approach 4: Specific
Use two queries to obtain the precise value to use in the existing RPDE query
Get the first relevant record, and use this to attain the default values
--to be executed ONLY if @afterTimestamp and @afterId NOT provided
--get first relevant record
SELECT @relevantAfterTimestamp = modified, @relevantAfterId = id
FROM ...
WHERE startDate >= @now AND state <> "deleted"
ORDER BY modified, id
LIMIT 1
--get default vlaues
SELECT @afterTimestamp = modified, @afterId = id
FROM ...
WHERE modified < @relevantAfterTimestamp AND id < @relevantAfterId
ORDER BY modified DESC, id DESC
LIMIT 1
In the RPDE query
Note no changes to the RPDE query are required
--include this WHERE clause only if @afterTimestamp and @afterId provided OR default values are provided
WHERE (modified = @afterTimestamp AND id > @afterId)
OR (modified > @afterTimestamp)
ORDER BY modified, id
Background
To minimise the number of RPDE pages a data consumer needs to harvest in order to reach "relevant" data, a retention period can be utilised. "Relevant" data for the purposes of this proposal are records that contain data relating to a events in the future.
Overall approach
An effective retention period can be implemented by having the first page of the feed start from the first relevant record (instead of from the beginning of time). This approach is useful for simple cases where a CRON job is not desirable.
The idea of moving the "first page" is that the data consumer starts harvesting from a point in the feed where all events in the future are ahead of it. The beginning of the feed effectively moves, so when the data consumer starts from the beginning of the feed, they always start from the earliest point that’s relevant, but anyone who’s already consuming the data will be unaffected.
The approaches below all maintain the integrity of the core RPDE invariants.
The main challenge is that the RPDE query is a
>query, and hence just finding and using the earliest relevant record in the feed as the "default value" parameters for the query will exclude that record.Approach 1: Precise
Alter the RPDE query slightly such that it precisely includes the "default value"
To get the "default values"
In the RPDE query
Approach 2: Rough
Alter the RPDE query slightly such that it includes the "default value", and anything that was modified at the same time
This approach does not take into account bulk updates
To get the "default values"
In the RPDE query
Approach 3: Simple
Use
id - 1to ensure that the default value is included (notemodified - 1can be used for non-integeridvalues, though this will not be as precise for bulk updates).To get the "default values"
In the RPDE query
Note no changes to the RPDE query are required
Approach 4: Specific
Use two queries to obtain the precise value to use in the existing RPDE query
Get the first relevant record, and use this to attain the default values
In the RPDE query
Note no changes to the RPDE query are required