
Panic on Tx rollback when connection has been closed by PgBouncer/Postgres (close of closed channel) #2470

@molguin92


Describe the bug

Tx.Rollback() consistently panics when the underlying connection has been closed by the server (PgBouncer or Postgres) due to an idle-in-transaction timeout.

Stack trace:

panic: close of closed channel

goroutine 3122 [running]:
github.com/jackc/pgx/v5/pgconn.(*PgConn).receiveMessage(0xc000419b08)
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/pgconn/pgconn.go:583 +0x370
github.com/jackc/pgx/v5/pgconn.(*Pipeline).getResults(0xc000419c60)
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/pgconn/pgconn.go:2162 +0x34
github.com/jackc/pgx/v5/pgconn.(*Pipeline).Close(0xc000419c60)
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/pgconn/pgconn.go:2263 +0xf6
github.com/jackc/pgx/v5.(*Conn).deallocateInvalidatedCachedStatements(0xc000dceb40, {0xdde860, 0xc0006c0e00})
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/conn.go:1402 +0x226
github.com/jackc/pgx/v5.(*Conn).Exec(0xc000dceb40, {0xdde860?, 0xc0006c0e00?}, {0xcc6678, 0x8}, {0x0, 0x0, 0x0})
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/conn.go:462 +0xcc
github.com/jackc/pgx/v5.(*dbTx).Rollback(0xc000012c00, {0xdde860?, 0xc0006c0e00?})
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/tx.go:204 +0x4b
github.com/jackc/pgx/v5/pgxpool.(*Tx).Rollback(0xc000012c18, {0xdde860?, 0xc0006c0e00?})
        /home/gitlab-runner/go/pkg/mod/github.com/jackc/pgx/v5@v5.7.0/pgxpool/tx.go:38 +0x2a
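
For context, `close of closed channel` is a Go runtime panic raised whenever `close` is called a second time on the same channel. It is not an error value the caller can check, which is why it escapes `Rollback` instead of being returned. Below is a minimal stdlib-only illustration. The function name `closeTwice` is hypothetical and not part of pgx.

```go
package main

import "fmt"

// closeTwice closes the same channel twice and returns the recovered panic
// message instead of letting the program crash.
func closeTwice() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r) // runtime.Error prints its message via Error()
		}
	}()
	done := make(chan struct{})
	close(done)
	close(done) // the second close triggers the runtime panic
	return "no panic"
}

func main() {
	fmt.Println(closeTwice()) // prints "close of closed channel"
}
```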

I suspect this is a repeat of the issue identified in #1920.

To Reproduce
Steps to reproduce the behavior:

  1. Initialize a pgxpool.Pool and connect to a PostgreSQL instance behind a PgBouncer proxy.
  2. Begin a transaction: tx, _ := pool.Begin(ctx).
  3. Hold the transaction open for exactly as long as the idle-in-transaction timeout configured on both PgBouncer and PostgreSQL.
  4. Attempt to commit using the now-expired context. As expected, this fails with a "context deadline exceeded" error:
err := tx.Commit(ctx) // err is context.DeadlineExceeded
  5. Now attempt to roll back the transaction using a different context, for instance context.Background(), as is generally considered best practice. This causes a panic:
tx.Rollback(context.Background()) // panic!!

Unfortunately, I have been unable to reproduce this in a unit or integration test, given the toil involved in configuring PgBouncer for testing. Following the above steps with only PostgreSQL (no PgBouncer) does not result in the panic, which leads me to believe PgBouncer is a key factor here.

I have been able to reproduce this reliably in a staging environment, from a Kubernetes pod connecting to a real PgBouncer + PostgreSQL setup. The full code needs too much boilerplate to share here, but the relevant segment is structured as follows:

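package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// DATABASE_URL is a placeholder for the PgBouncer connection string.
	pool, err := pgxpool.New(context.Background(), os.Getenv("DATABASE_URL"))
	if err != nil {
		panic(err)
	}
	defer pool.Close()
	// Both PgBouncer and PostgreSQL are configured to terminate idle transactions at
	// the 1 minute mark.
	cacheCtx, cacheCtxCancel := context.WithTimeout(context.Background(), time.Minute)
	defer cacheCtxCancel()
	tx, err := pool.Begin(cacheCtx)
	if err != nil {
		panic(err)
	}
	time.Sleep(time.Minute)
	// At this point both the cache context and the idle-in-transaction deadlines
	// should have been hit.
	_, err = tx.Exec(cacheCtx, "INSERT INTO random_table (value) VALUES ($1);", 123)
	if !errors.Is(err, context.DeadlineExceeded) {
		panic(fmt.Sprintf("%v", err))
	}
	fmt.Println(tx.Rollback(context.Background())) // panics!!
}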
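
A side note on the deadline check: comparing the error with `!=` only matches the bare `context.DeadlineExceeded` sentinel. If a driver wraps the context error (whether pgx does so on this exact path is not verified here), `errors.Is` is the robust check. A stdlib-only sketch, where `queryLike` and `compare` are hypothetical stand-ins for a driver call and the check:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// queryLike simulates a driver call that wraps the context error with %w
// before returning it (hypothetical; real drivers may or may not wrap).
func queryLike(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("exec failed: %w", err)
	}
	return nil
}

// compare waits for a short deadline to expire, then checks the wrapped error
// with both == and errors.Is.
func compare() (eq, is bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	<-ctx.Done() // block until the deadline has definitely passed
	err := queryLike(ctx)
	return err == context.DeadlineExceeded, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	eq, is := compare()
	fmt.Println(eq) // false: the error is wrapped, so == does not match
	fmt.Println(is) // true: errors.Is walks the wrap chain
}
```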

Expected behavior
Rollback after a connection has been closed by the server should return an error, but never panic.

Actual behavior
Rollback after a connection has been closed by the server results in a panic in (*PgConn).receiveMessage: close of closed channel.

Version

  • Go: go version go1.25.5 linux/amd64.
  • PostgreSQL: PostgreSQL 14.8
  • pgx: v5.7.0

Additional context

I suspect this is a regression of the behavior identified and fixed in #1920.

Comparing the stack traces and the code, the last three calls are almost identical in both issues, with one difference: in my case (*Pipeline).Close calls (*Pipeline).getResults, whereas in #1920 (pgx v5.4.3) (*Pipeline).Close called (*Pipeline).GetResults instead (note the capitalization!!). (*Pipeline).GetResults checks the p.closed flag introduced by the fix for #1920 (2e84dcc); (*Pipeline).getResults does not.
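
To make the suspected mechanism concrete, here is a deliberately simplified toy model of why the guard matters. The `pipeline` type and its `finish`/`Finish` methods are hypothetical and are not pgx's real `Pipeline`. An exported method that checks a `closed` flag before tearing down is idempotent, while calling the unguarded unexported path directly closes the channel twice.

```go
package main

import "fmt"

// pipeline is a toy stand-in that owns a single channel closed on teardown.
type pipeline struct {
	closed bool
	done   chan struct{}
}

// finish closes done without checking the flag (the unguarded path).
func (p *pipeline) finish() {
	close(p.done)
}

// Finish is the guarded path: it becomes a no-op once the pipeline is closed.
func (p *pipeline) Finish() {
	if p.closed {
		return
	}
	p.closed = true
	p.finish()
}

// tryTwice calls f twice and reports whether either call panicked.
func tryTwice(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	f()
	return false
}

func main() {
	p1 := &pipeline{done: make(chan struct{})}
	fmt.Println(tryTwice(p1.finish)) // true: the second close panics
	p2 := &pipeline{done: make(chan struct{})}
	fmt.Println(tryTwice(p2.Finish)) // false: the flag short-circuits
}
```

A `sync.Once` would give the same idempotence and is also safe for concurrent callers; a plain bool is enough in this model because only one goroutine touches the value.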

I believe this regression may have accidentally been introduced in 22fe501#diff-7600c6d53b2bb66dc26f15ff10fb080ad09158eea6e7562a8dc1db33d1ea2abfL2181.
