I hit what appeared to be a "slow boot" when testing libkrun and libkrunfw with an in-guest agent that communicates over vsock.
I put a reproducer together and opened a draft PR for it, #683. The PR is just a placeholder to hold the reproducer. Claude did aid in this development, but please don't discount this as AI slop. If I'm using it wrong, please feel free to let me know.
When a host process connect()s to the unix socket exposed by krun_add_vsock_port2(..., listen=true), the very first attempt after VM boot behaves as follows:
- connect() returns immediately (libkrun's UnixAcceptorProxy accepts on the host)
- read(1) blocks for ~5 seconds
- read() returns 0 (EOF); no byte was ever delivered from the guest
- a second host connection, opened immediately afterwards, completes a full round trip in well under 1 ms
A caller that probes for guest readiness by retrying on EOF sees its total cold-start time dominated by this 5 s block.
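A readiness probe of the kind described above can be sketched as follows. This is a minimal illustration and not the reproducer from #683: the names connect_unix, probe_once and probe_until_ack are hypothetical, and the socket path is whatever was passed to krun_add_vsock_port2. With a finite timeout_ms the caller gives up on the stalled first connection early instead of waiting for the EOF.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

enum probe_result { PROBE_ACK, PROBE_EOF, PROBE_TIMEOUT, PROBE_ERROR };

/* Connect to the unix socket exposed for the vsock port.
 * Returns a connected fd, or -1 on failure. */
static int connect_unix(const char *path)
{
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for one ack byte on fd (-1 waits forever).
 * An interrupted poll() is simply restarted with the full timeout. */
static enum probe_result probe_once(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return PROBE_ERROR;
    if (rc == 0)
        return PROBE_TIMEOUT;

    char byte;
    ssize_t n = read(fd, &byte, 1);
    if (n == 1)
        return PROBE_ACK;
    if (n == 0)
        return PROBE_EOF;   /* peer closed without sending the ack */
    return PROBE_ERROR;
}

/* Open a fresh connection per attempt, retrying on EOF or timeout.
 * Returns the number of attempts used, or -1 if no ack was received. */
static int probe_until_ack(const char *path, int timeout_ms, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int fd = connect_unix(path);
        if (fd < 0)
            return -1;
        enum probe_result r = probe_once(fd, timeout_ms);
        close(fd);
        if (r == PROBE_ACK)
            return attempt;
        if (r == PROBE_ERROR)
            return -1;
    }
    return -1;
}
```

Passing timeout_ms = -1 reproduces the unbounded read(1) case, where the first attempt only ends once the peer closes the socket.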
Reproducer
A self-contained C reproducer lives in #683 at examples/vsock-latency/. The guest binary is a tiny statically linked program that binds an AF_VSOCK listener, accepts a connection, and writes one ack byte. The host program configures the VM, spawns a timing thread, and calls krun_start_enter.
cd examples/vsock-latency
make
make demo
Sample output (libkrunfw 5.4.0 / bundled kernel 6.12.87 / host kernel 6.18):
=== PROBE_TIMEOUT_MS=unset (cold-boot, ~5s)... ===
attempt[1] +5008.00 ms read=EOF
attempt[2] +0.39 ms read=ack
phase: first-roundtrip-ok +5330.88 ms (delta-from-socket=5018.48 ms, attempts=2)
=== PROBE_TIMEOUT_MS=2000 (cold-boot, ~2.5s)... ===
attempt[1] +2029.04 ms read=timeout
attempt[2] +0.29 ms read=ack
phase: first-roundtrip-ok +2351.62 ms (delta-from-socket=2039.41 ms, attempts=2)
=== PROBE_TIMEOUT_MS=100 (cold-boot, ~0.5s)... ===
attempt[1] +101.94 ms read=timeout
attempt[2] +0.30 ms read=ack
phase: first-roundtrip-ok + 424.91 ms (delta-from-socket=112.34 ms, attempts=2)
The diagnostic line is attempt[1] +5008 ms read=EOF: a host connect() + read(1) round trip that should take milliseconds takes 5 seconds before libkrun closes the socket with EOF.
Trace through the code
Following the host-side path on the doomed first attempt:
- Host connect() → libkrun's UnixAcceptorProxy::process_event accepts the unix connection (src/devices/src/virtio/vsock/unix.rs:687-714).
- libkrun creates a new proxy for the connection and tries to complete the vsock handshake to the guest's AF_VSOCK listener.
- The vsock leg of that first attempt does not complete and the proxy gets marked ProxyRemoval::Deferred (src/devices/src/virtio/vsock/muxer_thread.rs:99-104, visible as deferring proxy removal: <id> WARN messages).
- The reaper thread holds the proxy in released_map for TIMEOUT = Duration::new(5, 0) before removing it (src/devices/src/virtio/vsock/reaper.rs:10,32-38).
- While the proxy is in released_map, the host-side unix socket FD stays bound to it. The host's read() therefore blocks until the reaper finally removes the proxy and closes the unix FD, yielding EOF.
Subsequent host connections create a fresh proxy and complete normally in sub-millisecond time.
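The deferred-removal behaviour traced above can be modelled in a few lines of C. This is only a toy model of the described sequence, not libkrun's Rust code: struct released_proxy, reap_expired, open_deferred_connection and host_read_state are hypothetical names, a socketpair stands in for the host connection, and the exact expiry comparison is an assumption. It shows why the host's read() sees nothing until the held FD is closed, and EOF right after.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mirrors TIMEOUT = Duration::new(5, 0) from the trace above. */
#define REAP_TIMEOUT_MS 5000

struct released_proxy {
    int  unix_fd;        /* proxy's end of the host connection, -1 once closed */
    long released_at_ms; /* time at which removal was deferred */
};

/* Model a host connection whose proxy was deferred at now_ms.
 * Stores the proxy's end in *p and returns the host's end (or -1). */
static int open_deferred_connection(struct released_proxy *p, long now_ms)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    p->unix_fd = sv[1];
    p->released_at_ms = now_ms;
    return sv[0];
}

/* Close every entry that has been held for at least REAP_TIMEOUT_MS.
 * Returns how many entries were reaped. */
static size_t reap_expired(struct released_proxy *map, size_t len, long now_ms)
{
    size_t reaped = 0;

    for (size_t i = 0; i < len; i++) {
        if (map[i].unix_fd < 0)
            continue;
        if (now_ms - map[i].released_at_ms >= REAP_TIMEOUT_MS) {
            close(map[i].unix_fd);   /* host read() now returns 0 */
            map[i].unix_fd = -1;
            reaped++;
        }
    }
    return reaped;
}

/* What would the host's read(1) do right now?
 * 0 = still blocked, 1 = EOF, -1 = anything else. */
static int host_read_state(int host_fd)
{
    struct pollfd pfd = { .fd = host_fd, .events = POLLIN };
    int rc = poll(&pfd, 1, 0);

    if (rc == 0)
        return 0;
    if (rc < 0)
        return -1;
    char byte;
    return read(host_fd, &byte, 1) == 0 ? 1 : -1;
}
```

In this model a connection deferred at t = 0 is still blocked when reap_expired runs at t = 4999 ms and reads EOF once it runs at t = 5000 ms, which matches the attempt[1] +5008 ms read=EOF line in the sample output.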
Environment
- libkrun: 1.18.0 (built from source)
- libkrunfw: 5.4.0
- libkrunfw bundled kernel: 6.12.87
- Host kernel: 6.18.x
- Architecture: x86_64