
Subtransaction cache overflow with --no-kill-backend --wait-timeout=10800 #457

Description

@asvrvs

pg_repack_assignment_table_log.txt

Summary

When applying pg_repack in safe mode (--no-kill-backend) with a long wait timeout (specifically, 10800 seconds), the PostgreSQL cluster can degrade with the so-called subtransaction (or subtransaction cache overflow) problem.

The (supposed) mechanics of the problem:

  1. After applying the CDC log to the temporary table for the first time (and before repack_swap), pg_repack tries to escalate its lock to the ACCESS EXCLUSIVE level.
  2. To do this, it creates a SAVEPOINT (which opens a subtransaction):

     ```
     LOG: (query) SAVEPOINT repack_sp1
     LOG: (query) SET LOCAL statement_timeout = 100
     LOG: (query) LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE
     ```

  3. If the load on the table is high, this (usually) fails, and the subtransaction is rolled back:

     ```
     LOG: (query) ROLLBACK TO SAVEPOINT repack_sp1
     ```

  4. If the wait timeout is long (like --wait-timeout=10800), the SAVEPOINT … LOCK TABLE … ROLLBACK TO cycle repeats N times (in our case, more than 3500 times).
  5. The crucial point (IMHO) is how PostgreSQL implements SAVEPOINT: SQL requires a savepoint to be destroyed automatically when another savepoint with the same name is established, but in PostgreSQL the old savepoint is kept, though only the more recent one will be used when rolling back or releasing.
  6. Moreover, when rolling back to a savepoint, PostgreSQL doesn't release (delete) it: the savepoint remains valid and can be rolled back to again later, if needed.
  7. So with repeated SAVEPOINT … LOCK TABLE … ROLLBACK TO, PostgreSQL essentially accumulates multiple open subtransactions under the same name (repack_sp1); see the sketch after this list.
  8. The problem is that the more subtransactions each transaction keeps open (not rolled back or released), the greater the transaction management overhead: up to 64 open subxids are cached in shared memory for each backend, and after that point the storage I/O overhead increases significantly due to additional lookups of subxid entries in pg_subtrans.
  9. Since for every new incoming transaction PostgreSQL must determine tuple visibility by taking a snapshot (which includes all currently running transactions together with their subtransactions), the cluster quickly degrades, with symptoms such as SubtransSLRU and SubtransBuffer wait events going through the roof.
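For illustration, here is a minimal sketch of that pattern in plain SQL (the table and savepoint names are taken from the log above; the exact statement sequence pg_repack issues may differ slightly):

```sql
BEGIN;
-- ... apply the CDC log to the temporary table ...

-- Lock-escalation retry, repeated thousands of times in the same transaction:
SAVEPOINT repack_sp1;                                   -- opens a new subtransaction
SET LOCAL statement_timeout = 100;
LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE;  -- times out under load
ROLLBACK TO SAVEPOINT repack_sp1;                       -- undoes the attempt, but the
                                                        -- savepoint itself survives
SAVEPOINT repack_sp1;                                   -- the old repack_sp1 is kept;
                                                        -- a new one stacks on top
-- ... and so on: past 64 open subxids, the per-backend cache in shared
-- memory overflows, and visibility checks fall back to pg_subtrans lookups.
```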

And this is exactly what happened in our case: after pg_repack was terminated, the PostgreSQL cluster quickly resumed its normal operation.
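For reference, here is one way to spot that state from another session (this assumes a PostgreSQL version whose pg_stat_activity exposes the SubtransSLRU / SubtransBuffer wait events, i.e. 13 or later):

```sql
-- Count backends currently waiting on subtransaction lookups
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IN ('SubtransSLRU', 'SubtransBuffer')
GROUP BY 1, 2;
```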

Proposed solution

If I may suggest an improvement: pg_repack could not only roll back to the savepoint, but also release it:

```sql
ROLLBACK TO SAVEPOINT repack_sp1;
RELEASE SAVEPOINT repack_sp1;
```

RELEASE SAVEPOINT frees the savepoint's resources, and it should prevent the subtransaction cache overflow issue.
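A sketch of what one retry iteration would then look like (the statement sequence is assumed from the log above, not taken from pg_repack's source):

```sql
SAVEPOINT repack_sp1;
SET LOCAL statement_timeout = 100;
LOCK TABLE public.assignment IN ACCESS EXCLUSIVE MODE;
-- on timeout:
ROLLBACK TO SAVEPOINT repack_sp1;  -- undo the failed lock attempt
RELEASE SAVEPOINT repack_sp1;      -- destroy the savepoint, closing its subtransaction
```

This keeps the subtransaction stack at a constant depth no matter how many times the lock attempt is retried.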

I'd be glad to hear your comments.

Thank you!
