Trim recents table via table swap and add cleanup worker cronjob #4613
Open
zwolf wants to merge 12 commits into
This was referenced May 21, 2026
Member
Author
squish would concatenate the SQL-style comments and break the migration, so I'm leaving the SQL unsquished rather than deleting the comments or changing their format. I also loosened some of the other Rubocop RSpec guidelines (number of lines and expectations) and updated the expected Rails version.
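A minimal sketch of the failure mode, for reviewers who haven't hit it. The `squish` helper below is a plain-Ruby stand-in for ActiveSupport's `String#squish` so the snippet runs without Rails, and the SQL is a made-up example, not the migration's actual statement.

```ruby
# Stand-in for ActiveSupport's String#squish (an approximation for this demo):
# collapse every run of whitespace, newlines included, into one space and strip.
def squish(str)
  str.gsub(/\s+/, ' ').strip
end

# Hypothetical SQL with a line comment, in the style the migration uses.
sql = <<~SQL
  -- rename the bloated table out of the way
  ALTER TABLE recents RENAME TO recents_old;
SQL

squished = squish(sql)

# After squishing, the newline that ended the comment is gone, so the
# statement now sits on the same line as the leading "--" marker.
puts squished
```

Because a `--` comment runs to the end of the line, the database would treat the whole squished string as a comment and the statement would never execute.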
Co-authored-by: Copilot <copilot@github.com>
Member
Author
This is ready for review! @yuenmichelle1 @Tooyosi @lcjohnso
yuenmichelle1
approved these changes
Jun 4, 2026
Member
Author
Found an error: I hadn't changed the cutoff from 14 to 90 days in the migration. Fixed.
The recents table is huge and getting huger. To make the name accurate, we should only retain records from the last 90 days, capped at a maximum of 20 recents per user. This PR executes the initial massive data purge and sets up a Sidekiq cron job to maintain these limits going forward.
Data Migration
Deleting ~400M rows via standard batched DELETE queries would take ages, thrash the database I/O, and leave behind massive index fragmentation (dead tuples). To make this happen with zero downtime and no index bloat, this migration uses a table swap approach:
recents_new)
Cleanup Worker
RecentsCleanupWorker runs two separate sweeps hourly:
Time sweep: Deletes anything older than 90 days using in_batches.
Volume sweep: Enforces the 20-record maximum per user, per project.
Deleting this way avoids the "mark now, remove later" antipattern that results in multiple dead tuples (one on update, one on delete) when a row is affected individually. These sweeps bulk delete them by created_at, then by user_id/created_at.
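To make the time sweep concrete, here is a rough in-memory model of it. This is not the worker's code: `Recent`, `time_sweep`, and the batch size are names and values invented for illustration, and `each_slice` only stands in for what `in_batches` does against the real table.

```ruby
# Hypothetical in-memory stand-in for a row in the recents table.
Recent = Struct.new(:id, :user_id, :project_id, :created_at)

SECONDS_PER_DAY = 86_400

# Deletes rows older than max_age_days from the array, one batch at a time,
# and returns how many rows were removed.
def time_sweep(recents, now:, max_age_days: 90, batch_size: 1000)
  cutoff = now - (max_age_days * SECONDS_PER_DAY)
  expired_ids = recents.select { |r| r.created_at < cutoff }.map(&:id)

  deleted = 0
  expired_ids.each_slice(batch_size) do |batch|
    # One bulk delete per batch, analogous to a single DELETE ... WHERE id IN (...).
    recents.reject! { |r| batch.include?(r.id) }
    deleted += batch.size
  end
  deleted
end
```

Each row is removed by exactly one bulk delete, with no intermediate update, which is the property the paragraph above is after.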
For the volume sweep, querying the entire table with a GROUP BY every hour isn't ideal, even once it reaches a steady state of new and pruned records. So it only queries users who have been active in the last 2 hours (created a new recent) and prunes their extras directly using the new compound index.
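A sketch of the volume sweep logic in the same in-memory style. It assumes the cap applies per user and project (following the "per user, per project" wording above); `Recent`, `volume_sweep`, and its parameters are hypothetical names, and the real worker does this in SQL against the compound index rather than in Ruby.

```ruby
# Hypothetical in-memory stand-in for a row in the recents table.
Recent = Struct.new(:id, :user_id, :project_id, :created_at)

# Trims each recently active user's recents down to max_per_group per project
# and returns how many rows were removed. Inactive users are not examined.
def volume_sweep(recents, now:, max_per_group: 20, active_window_hours: 2)
  window_start = now - (active_window_hours * 3600)

  # Only users who created a recent inside the window are considered.
  active_user_ids = recents.select { |r| r.created_at >= window_start }
                           .map(&:user_id)
                           .uniq

  extra_ids = recents
              .select { |r| active_user_ids.include?(r.user_id) }
              .group_by { |r| [r.user_id, r.project_id] }
              .flat_map do |_key, rows|
                # Keep the newest max_per_group rows; everything older is extra.
                rows.sort_by(&:created_at).reverse.drop(max_per_group).map(&:id)
              end

  recents.reject! { |r| extra_ids.include?(r.id) }
  extra_ids.size
end
```

In this model a user with 25 recents who was active an hour ago loses their 5 oldest rows, while a user with 25 recents who was last active five hours ago is left alone until the time sweep reaches them.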
If records fall through the cracks (say, a user creates 25 recents and the worker fails for some reason, then they weren't active in the next run's window), those 5 extra records will simply age out and be caught by the 90-day time sweep. This isn't very likely, though, and accepting it keeps these queries super quick.
I tested the migration locally with a simulated delay to ensure the transaction lock correctly catches concurrent inserts. That is, I threw in a sleep(30) and created some classifications and recents on the console, and they were copied into the new table by the last step. I'm pretty sure the lock required to copy the few tens of thousands of recents during the swap should be nice and quick and shouldn't noticeably affect request time (especially since those queries are already abysmal).
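For readers unfamiliar with the pattern, here is roughly what a table swap with a locked catch-up copy can look like in PostgreSQL. This is a sketch under assumptions, not the migration in this PR: the statement order, the `NOT EXISTS` catch-up, and the `table_swap_statements` helper are all invented for illustration, and only the `recents`, `recents_new`, and `recents_old` names come from the description.

```ruby
# Builds the SQL for a hypothetical table swap, in execution order.
# Sequence ownership and foreign keys are not handled in this sketch.
def table_swap_statements(cutoff_days: 90)
  cutoff = "NOW() - INTERVAL '#{Integer(cutoff_days)} days'"
  [
    # 1. Empty copy of the table with the same columns, defaults, and indexes.
    'CREATE TABLE recents_new (LIKE recents INCLUDING ALL)',
    # 2. Bulk copy of the rows to keep while the old table stays writable.
    "INSERT INTO recents_new SELECT * FROM recents WHERE created_at >= #{cutoff}",
    # 3. Short locked section: block writes, copy anything inserted since
    #    step 2, then swap the names.
    'BEGIN',
    'LOCK TABLE recents IN ACCESS EXCLUSIVE MODE',
    'INSERT INTO recents_new SELECT r.* FROM recents r ' \
    "WHERE r.created_at >= #{cutoff} " \
    'AND NOT EXISTS (SELECT 1 FROM recents_new n WHERE n.id = r.id)',
    'ALTER TABLE recents RENAME TO recents_old',
    'ALTER TABLE recents_new RENAME TO recents',
    'COMMIT'
  ]
end

table_swap_statements.each { |statement| puts "#{statement};" }
```

Only step 3 holds the lock, so writers wait for the catch-up copy and the two renames rather than for the full copy in step 2.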
There will be a follow-up PR to drop the bloated recents_old table, and that space will get reclaimed by autovacuuming over time.
edit: I'll clean up some of the Hound suggestions, but I'm not using SQL.squish to maintain readability and avoid accidentally concatenating comments.
Review checklist
apiary.apib file?