
bitswap: taskWorker still blocks on stream.Close #1142

@lidel

Description

Summary

Bitswap taskWorker goroutines in boxo/bitswap/server spend up to DefaultNegotiationTimeout (10s) each on stream.Close() when the remote peer is slow or silent during the multistream lazy handshake. Each stalled Close holds a worker until its SetReadDeadline expires. On a busy node with even a small fraction of unresponsive peers, the TaskWorkerCount pool saturates and bitswap stops serving blocks to new peers.

We observed this on collab-cluster nodes running kubo v0.41.0-rc2 (boxo v0.38.0) while investigating #1141. The earlier fix in #1083 (boxo v0.36.0) set a read deadline to bound each individual Close, and that deadline works as intended. The remaining problem is structural: 10s per worker is still too long under the realistic churn a larger storage provider may experience.

Environment

Component        Version
kubo             v0.41.0-rc2
boxo             v0.38.0
go-libp2p        v0.48.0
go-multistream   v0.6.1

Observed on collab-cluster-rbx8.cluster.dwebops.net

Call path

Every busy taskWorker goroutine on the affected host sat in the same frame:

boxo/bitswap/server.taskWorker                  (server.go:278)
  boxo/bitswap/server.sendBlocks                (server.go:335)
  boxo/bitswap/network.router.SendMessage       (router.go:104)
  boxo/bitswap/network/bsnet.impl.SendMessage   (ipfs_impl.go:401)
  go-libp2p/p2p/host/basic.streamWrapper.Close  (basic_host.go:691)
  multistream.lazyClientConn.Close              (lazyClient.go:196)
  multistream.once.Do                           (lazyClient.go:53 / 58)
  ...doReadHandshake...                          (lazyClient.go:107)
  ...stream.Read blocked in yamux or quic-go...

lazyClientConn.Close() finishes the handshake it skipped during the write path by reading a varint token. If the peer is gone or silent, that read blocks until the SetReadDeadline set in bsnet.SendMessage expires.
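
For illustration, a minimal self-contained sketch of that pattern (lazyConnSketch and its fields are made-up names, not the actual go-multistream source): the client defers the handshake read until Close, so Close inherits whatever read deadline is, or is not, set on the underlying conn.

// Conceptual sketch only; lazyConnSketch is a stand-in, not go-multistream code.
package main

import (
	"bufio"
	"fmt"
	"net"
	"sync"
	"time"
)

// lazyConnSketch writes immediately but only reads the handshake ack on Close,
// mirroring the lazyClientConn behaviour described above.
type lazyConnSketch struct {
	raw           net.Conn
	handshakeOnce sync.Once
	handshakeErr  error
}

func (c *lazyConnSketch) Write(p []byte) (int, error) { return c.raw.Write(p) }

func (c *lazyConnSketch) Close() error {
	// Finish the handshake the write path skipped: read the delimited token the
	// remote should echo back. A silent peer makes this Read block until the
	// read deadline on the underlying conn expires.
	c.handshakeOnce.Do(func() {
		_, c.handshakeErr = bufio.NewReader(c.raw).ReadString('\n')
	})
	closeErr := c.raw.Close()
	if c.handshakeErr != nil {
		return c.handshakeErr
	}
	return closeErr
}

func main() {
	client, server := net.Pipe()
	go func() { // "peer" that reads our bytes but never answers
		buf := make([]byte, 1024)
		for {
			if _, err := server.Read(buf); err != nil {
				return
			}
		}
	}()

	lazy := &lazyConnSketch{raw: client}
	_, _ = lazy.Write([]byte("want-list\n")) // write path returns immediately

	_ = client.SetReadDeadline(time.Now().Add(2 * time.Second)) // the #1083-style bound
	start := time.Now()
	err := lazy.Close()
	fmt.Printf("Close took %v (err: %v)\n", time.Since(start), err)
}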

How widespread is this on a running node

On rbx8 at the moment of capture (kubo v0.41.0-rc2, 12-hour uptime, TaskWorkerCount=12):

  • 8 of 12 workers sat in this frame.
  • vole bitswap check against this node timed out in 3 of 3 trials (30s timeout).

A snapshot on a v0.40.1 sibling node showed 5 of 12 workers in the same frame, with partial probe success.

Reproduction

A minimal reproducer needs only net.Conn and go-multistream, with no libp2p involved (a sketch follows the results below):

  • Silent server: TCP listener that accepts and holds the connection without responding.
  • Client: NewMSSelect, then Write(payload), then optional SetReadDeadline, then Close.
  • Measure elapsed time for Close.

Results:

  • Without SetReadDeadline: Close never returned within the 10s test cap.
  • With SetReadDeadline(2s): Close returned in 2.000289693s, confirming the deadline chain works.
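
A sketch of that reproducer, assuming the go-multistream v0.6.x API (the protocol ID, payload, and 2s deadline are placeholders, not the exact program left on the boxes):

// Reproducer sketch: lazy multistream client against a silent TCP server.
package main

import (
	"fmt"
	"net"
	"time"

	multistream "github.com/multiformats/go-multistream"
)

func main() {
	// Silent server: accept the TCP connection and never write anything back.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go func() {
		conn, _ := ln.Accept()
		if conn != nil {
			defer conn.Close()
			time.Sleep(time.Minute) // hold the connection, say nothing
		}
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}

	// Lazy negotiation: Write sends the handshake and payload without waiting
	// for the server's answer, mirroring the bitswap send path.
	lazy := multistream.NewMSSelect(conn, "/test/1.0.0")
	if _, err := lazy.Write([]byte("payload")); err != nil {
		panic(err)
	}

	// Comment this line out to compare the two results reported above.
	_ = conn.SetReadDeadline(time.Now().Add(2 * time.Second))

	start := time.Now()
	closeErr := lazy.Close() // blocks reading the handshake ack from the silent peer
	fmt.Printf("Close returned after %v (err: %v)\n", time.Since(start), closeErr)
}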

Quantifying the pool-saturation effect

A 12-worker pool processes 200 tasks, 10 percent of which have a silent peer, with a 10s close deadline (same as production):

  • Current path (worker waits for Close): 194 / 200 completed, 20s (budget capped), 9.7 tasks/s
  • Decoupled (Close runs in a goroutine, worker returns after Write): 200 / 200 completed, 5ms, ~40,000 tasks/s

At 10 percent unresponsive peers, the synchronous Close drops effective throughput by roughly 4,000x. This matches the field observation: the engine -> msg path fires for thousands of other peers in the same window in which a new peer sees zero blocks.
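
A toy model of that benchmark (illustrative only, not the actual benchmark code; the deadline is scaled from 10s down to 100ms so the demo runs quickly, and runPool and its parameters are made-up names):

// Fixed worker pool where 1 in silentEvery tasks ends with a Close that stalls
// for closeDeadline; compares holding the worker vs. parking the Close off-worker.
package main

import (
	"fmt"
	"sync"
	"time"
)

func runPool(workers, tasks, silentEvery int, closeDeadline time.Duration, decoupled bool) time.Duration {
	start := time.Now()
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				// The write itself is effectively free; the cost is the Close
				// against a silent peer.
				if job%silentEvery == 0 {
					if decoupled {
						go time.Sleep(closeDeadline) // Close parked off-worker
					} else {
						time.Sleep(closeDeadline) // worker held for the full deadline
					}
				}
			}
		}()
	}
	for i := 1; i <= tasks; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return time.Since(start)
}

func main() {
	const workers, tasks, silentEvery = 12, 200, 10
	fmt.Println("synchronous Close:", runPool(workers, tasks, silentEvery, 100*time.Millisecond, false))
	fmt.Println("decoupled Close:  ", runPool(workers, tasks, silentEvery, 100*time.Millisecond, true))
}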

Why it makes sense to fix this in boxo

Two upstream locations could bound Close directly: go-multistream (in lazyClientConn.Close) and go-libp2p (in streamWrapper.Close). Either would help every consumer.

Some reasons to prefer a boxo-local fix:

  • go-multistream is used by implementations beyond libp2p. Changing Close semantics, even to "bounded with a timeout", could surprise older code that depends on the blocking guarantee for correctness (the QUIC STOP_SENDING edge case the current code comment describes).
  • The future of go-libp2p maintenance is uncertain, so an upstream round trip for behaviour that mainly matters to bitswap may take a while to land.
  • Bitswap has a clear opinion: once the message bytes are written, the multistream ack is not needed to consider the send done. This is a bitswap-specific choice, and boxo is a natural place for it.

A boxo-local fix keeps the blast radius small, ships on boxo's release cadence, and avoids renegotiating Close semantics with every other consumer of go-multistream.

Proposed fix

In bsnet.SendMessage, after msgToStream succeeds, move Close to a goroutine and return. The worker is freed as soon as the bytes are written.

// Sketch; not final.
if err := bsnet.msgToStream(ctx, s, outgoing, timeout); err != nil {
    _ = s.Reset() // write failed: reset so the remote does not wait on a half-sent message
    return err
}
// The message bytes are on the wire; finish the (possibly blocking) multistream
// handshake and Close off the task worker's hot path.
go func(s network.Stream, deadline time.Time) {
    _ = s.SetReadDeadline(deadline) // bound the read inside lazyClientConn.Close, as #1083 does today
    _ = s.Close()
}(s, time.Now().Add(timeout))
return nil

Trade-offs:

  • The caller no longer sees errors from Close. Current callers in boxo log them and move on, and the bitswap message itself is already written to the stream before Close is reached.
  • Each send spawns one goroutine. Goroutines are cheap, and the in-flight total is bounded by the deadline: in steady state the close-goroutine count is about send_rate * deadline (for example, 500 sends/s with a 10s deadline tops out around 5,000 parked goroutines), which is the same waiting the task-worker pool absorbs today. The change only moves that waiting off the workers.
  • streamMessageSender (the persistent per-peer sender) uses a different code path, so it should not be affected; I will double-check this in the PR.

Related

Artifacts

Two goroutines.stacks files from collab cluster production (Kubo v0.40.1 and v0.41.0-rc2), the full ipfs diag profile bundles, and the two reproducer programs are available on the boxes.

Next steps

I will open a PR with the proposed fix, plus a regression test that mirrors the pool-saturation benchmark.

Metadata

Labels

  • P1 (High: likely tackled by core team if no one steps up)
  • kind/bug (a bug in existing code, including security flaws)
  • need/analysis (needs further analysis before proceeding)
