Columnar storage / compression

The mergebot's commits table is already getting pretty large (2.5GB). This is almost entirely due to the `statuses` column, whose values do not compress that well individually but are *extremely* redundant between rows, though not identical (various items include build numbers or identifiers, commit authors, runtime durations, ...).
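To illustrate the "poor compression per row, high redundancy between rows" point, here is a minimal Python sketch that compresses one statuses payload on its own and then again with a neighbouring row supplied as a preset `zlib` dictionary. The payload layout, the `ci/runbot` and `ci/style` contexts, and the `runbot.example.com` URLs are invented for the example and are not the mergebot's actual data.

```python
import json
import zlib


def make_statuses(build_id: int, author: str, duration: float) -> bytes:
    """Build a hypothetical statuses payload (keys and contexts are invented)."""
    payload = {
        "ci/runbot": {
            "state": "success",
            "target_url": f"https://runbot.example.com/build/{build_id}",
            "description": f"build by {author} took {duration:.1f}s",
        },
        "ci/style": {
            "state": "success",
            "target_url": f"https://runbot.example.com/build/{build_id + 1}",
            "description": "style checks passed",
        },
    }
    return json.dumps(payload, sort_keys=True).encode()


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """Deflate `data`, optionally priming the compressor with a preset dictionary."""
    if zdict:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of `compress`; the same dictionary must be supplied."""
    if zdict:
        decompressor = zlib.decompressobj(zdict=zdict)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()


if __name__ == "__main__":
    reference = make_statuses(1000, "alice", 30.0)  # stands in for "another row"
    row = make_statuses(1002, "bob", 45.2)
    print("raw:", len(row))
    print("compressed alone:", len(compress(row)))
    print("compressed against another row:", len(compress(row, reference)))
```

The preset dictionary only stands in for whatever cross-row mechanism a storage engine would actually use; the point is that most of the bytes in a row are already present in its neighbours.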

There are, most likely, four good options here:

1. prune older commits: commits are fundamentally just a vessel to move statuses over to PRs and stagings, so once those are closed, merged, or otherwise completed, the commits (eventually) stop being useful. This requires no external dependencies or extras, and can even be done on an occasional basis (e.g. every few months, clean all the commits older than 6 months which don't match any open PR, or something). Even on commits which are conserved (staging heads and commits) it should be possible to remove the statuses themselves
2. encode runbot patterns and perform manual content extraction into known templates
3. partition based on the commit's write date, although I don't think that really saves on retrieving backups / duplicating the DB, unless the partition can use a different storage form than the source?
4. use some sort of compressible columnar storage, e.g. [timescaledb](https://www.tigerdata.com/timescaledb), [pg-xpatch](https://github.com/ImGajeed76/pg-xpatch) (no support for UPDATE though), ...
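The pruning in option 1 could look roughly like the sketch below. It uses an in-memory SQLite database so that it runs standalone. The `commits` and `prs` tables, their columns, and the `'open'` state value are assumptions made for the example and are not the mergebot's real schema, and the production database would be PostgreSQL rather than SQLite.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical, simplified schema: not the mergebot's actual tables.
SCHEMA = """
CREATE TABLE commits (sha TEXT PRIMARY KEY, statuses TEXT, write_date TEXT);
CREATE TABLE prs (id INTEGER PRIMARY KEY, head TEXT, state TEXT);
"""


def prune_commits(conn: sqlite3.Connection, now: datetime, max_age_days: int = 180) -> int:
    """Delete commits older than the cutoff that are not the head of an open PR."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(
        """
        DELETE FROM commits
         WHERE write_date < ?
           AND NOT EXISTS (
               SELECT 1 FROM prs
                WHERE prs.head = commits.sha AND prs.state = 'open'
           )
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Populate a throwaway database and run one pruning pass."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = datetime(2025, 1, 1)
    old = (now - timedelta(days=365)).isoformat()
    recent = (now - timedelta(days=10)).isoformat()
    conn.executemany(
        "INSERT INTO commits VALUES (?, ?, ?)",
        [("aaa", "{}", old), ("bbb", "{}", old), ("ccc", "{}", recent)],
    )
    conn.executemany(
        "INSERT INTO prs (head, state) VALUES (?, ?)",
        [("aaa", "closed"), ("bbb", "open")],
    )
    deleted = prune_commits(conn, now)
    remaining = [row[0] for row in conn.execute("SELECT sha FROM commits ORDER BY sha")]
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

In `demo()`, the old commit attached to a closed PR is removed, while the old commit that is still the head of an open PR and the recent commit are kept. A variant for conserved commits would set `statuses` to NULL instead of deleting the row.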

Columnar storage / compression #1383
