The mergebot's commits table is already getting pretty large (2.5 GB). Almost all of that comes from the statuses column: a single value does not compress that well on its own, but the values are extremely redundant between rows. They are not identical, though, because various items include build numbers or identifiers, commit authors, runtime durations, ...
There are, most likely, four good options here:
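As a rough illustration of that redundancy, here is a minimal sketch comparing per-row compression with compressing many rows together. The payload shape, the `make_statuses` helper, the context names and the URL are all assumptions made up for the example; the real statuses column may look quite different.

```python
import json
import zlib


def make_statuses(build_id: int, author: str, duration: float) -> bytes:
    """Build a hypothetical statuses payload (assumed shape, not the real one)."""
    payload = {
        context: {
            "state": "success",
            "target_url": f"https://runbot.example/build/{build_id}",
            "description": f"built by {author} in {duration:.1f}s",
        }
        for context in ("ci/runbot", "ci/style", "ci/security")
    }
    return json.dumps(payload, sort_keys=True).encode()


# 500 rows which differ only by build id, author and duration.
rows = [make_statuses(1000 + i, f"author{i % 7}", 60 + i * 0.5) for i in range(500)]

raw = sum(len(row) for row in rows)
# Each row compressed on its own, roughly what per-value compression can see.
per_row = sum(len(zlib.compress(row)) for row in rows)
# All rows compressed as one stream, so cross-row redundancy is exploited.
together = len(zlib.compress(b"\n".join(rows)))

print(f"raw={raw} per_row={per_row} together={together}")
```

The gap between `per_row` and `together` is the part that per-value compression cannot reach, and which the options below try to recover one way or another.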
- prune older commits: commits are fundamentally just a vessel to move statuses over to PRs and stagings, so once those are closed, merged, or otherwise completed, the commits (eventually) stop being useful. This requires no external dependencies or extras, and can even be done on an ad hoc basis (e.g. every few months, clean out all the commits older than 6 months which don't match any open PR, or something). Even for commits which are conserved (staging heads and commits), it should be possible to remove the statuses themselves
- encode runbot patterns and perform manual content extraction into known templates
- partition based on the commit's write date, although I don't think that really saves anything when retrieving backups or duplicating the DB, unless the partitions can use a different storage form than the source?
- use some sort of compressible columnar storage e.g. timescaledb, pg-xpatch (no support for UPDATE tho), ...
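
The pruning option could be sketched roughly as follows. This is only an illustration of the selection logic under assumed names: the `Commit` dataclass, `split_for_pruning`, and the two sets of shas are hypothetical stand-ins, not the mergebot's actual models, and the 6 month cutoff is just the example figure from above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Commit:
    # Hypothetical stand-in for a row of the commits table.
    sha: str
    write_date: datetime
    statuses: Optional[str]


def split_for_pruning(
    commits: Iterable[Commit],
    open_pr_heads: Set[str],
    conserved: Set[str],
    now: datetime,
    max_age: timedelta = timedelta(days=180),
) -> Tuple[List[str], List[str]]:
    """Return (shas to delete, shas whose statuses can be cleared)."""
    cutoff = now - max_age
    to_delete: List[str] = []
    to_clear: List[str] = []
    for commit in commits:
        # Recent commits and commits matching an open PR are left alone.
        if commit.write_date >= cutoff or commit.sha in open_pr_heads:
            continue
        if commit.sha in conserved:
            # Conserved commits keep their row but can drop their statuses.
            if commit.statuses is not None:
                to_clear.append(commit.sha)
        else:
            to_delete.append(commit.sha)
    return to_delete, to_clear
```

Keeping the selection as a pure function makes it easy to dry-run against a dump before anything is actually deleted or updated.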