
Subtransaction cache overflow with --no-kill-backend --wait-timeout=10800 #457

Description

@asvrvs

pg_repack_assignment_table_log.txt

Summary

When applying pg_repack in safe mode (--no-kill-backend) with a long wait timeout (specifically, 10800 seconds), the PostgreSQL cluster can degrade with the so-called subtransaction (or subtransaction cache overflow) problem.

The (supposed) mechanics of the problem:

  1. After applying the CDC log to the temporary table for the first time (and before repack_swap), pg_repack tries to escalate its lock to the ACCESS EXCLUSIVE level.
  2. To do this, it creates a SAVEPOINT (which opens a subtransaction):

     ```
     LOG: (query) SAVEPOINT repack_sp1
     LOG: (query) SET LOCAL statement_timeout = 100
     LOG: (query) LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE
     ```

  3. If the load on the table is high, this (usually) fails, and the subtransaction is rolled back:

     ```
     LOG: (query) ROLLBACK TO SAVEPOINT repack_sp1
     ```

  4. If the wait timeout is long (like --wait-timeout=10800), the SAVEPOINT … LOCK TABLE … ROLLBACK TO cycle repeats N times (in our case, more than 3500 times).
  5. The crucial point (IMHO) is how PostgreSQL implements SAVEPOINT: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established, but in PostgreSQL the old savepoint is kept, though only the more recent one will be used when rolling back or releasing.
  6. Moreover, when rolling back to a savepoint, PostgreSQL doesn't release (delete) it: the savepoint remains valid and can be rolled back to again later, if needed.
  7. So with repeated SAVEPOINT … LOCK TABLE … ROLLBACK TO, PostgreSQL essentially accumulates multiple open subtransactions under the same name (repack_sp1); see the sketch after this list.
  8. The problem is that the more subtransactions each transaction keeps open (not rolled back or released), the greater the transaction management overhead: up to 64 open subxids are cached in shared memory for each backend, and after that point the storage I/O overhead increases significantly due to additional lookups of subxid entries in pg_subtrans.
  9. Since for every new incoming transaction PostgreSQL must determine tuple visibility by taking a snapshot (which includes all currently running transactions together with their subtransactions), the cluster quickly degrades, with symptoms such as SubtransSLRU and SubtransBuffer wait events going through the roof.
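For illustration, here is a minimal sketch of that pattern in plain SQL (the table and savepoint names are taken from the log above; the exact statement sequence pg_repack issues may differ slightly):

```sql
BEGIN;
-- ... apply the CDC log to the temporary table ...

-- Lock-escalation retry, repeated thousands of times in the same transaction:
SAVEPOINT repack_sp1;                                   -- opens a new subtransaction
SET LOCAL statement_timeout = 100;
LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE;  -- times out under load
ROLLBACK TO SAVEPOINT repack_sp1;                       -- undoes the attempt, but the
                                                        -- savepoint itself survives
SAVEPOINT repack_sp1;                                   -- the old repack_sp1 is kept;
                                                        -- a new one stacks on top
-- ... and so on: past 64 open subxids, the per-backend cache in shared
-- memory overflows, and visibility checks fall back to pg_subtrans lookups.
```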

And this is exactly what happened in our case: after pg_repack was terminated, the PostgreSQL cluster quickly resumed its normal operation.
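For reference, here is one way to spot that state from another session (this assumes a PostgreSQL version whose pg_stat_activity exposes the SubtransSLRU / SubtransBuffer wait events, i.e. 13 or later):

```sql
-- Count backends currently waiting on subtransaction lookups
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IN ('SubtransSLRU', 'SubtransBuffer')
GROUP BY 1, 2;
```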

Proposed solution

If I may suggest an improvement: pg_repack could not only roll back to the savepoint, but also release it:

```sql
ROLLBACK TO SAVEPOINT repack_sp1;
RELEASE SAVEPOINT repack_sp1;
```

RELEASE SAVEPOINT frees the savepoint's resources, and it should prevent the subtransaction cache overflow issue.
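A sketch of what one retry iteration would then look like (the statement sequence is assumed from the log above, not taken from pg_repack's source):

```sql
SAVEPOINT repack_sp1;
SET LOCAL statement_timeout = 100;
LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE;
-- on timeout:
ROLLBACK TO SAVEPOINT repack_sp1;  -- undo the failed lock attempt
RELEASE SAVEPOINT repack_sp1;      -- destroy the savepoint, closing its subtransaction
```

This keeps the subtransaction stack at a constant depth no matter how many times the lock attempt is retried.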

I'd be glad to hear your comments.

Thank you!
