Cherry pick commits for v1.18.1 by slp · Pull Request #682 · containers/libkrun

slp · 2026-05-18T16:16:16Z

This PR cherry picks some commits from main to be included in v1.18.1.

Some applications check for network availability by looking for a network device configured for Internet access. When TSI is used, there is no such device available by default, although the Internet is accessible. Those applications then behave as if the connection were unavailable.

Let's solve this problem by setting up a dummy network interface. The dummy interface is automatically created when CONFIG_DUMMY is enabled in the kernel or the corresponding kernel module is loaded. This means a sufficiently recent libkrunfw version is needed (see containers/libkrunfw#116).

The dummy interface is initially down. To make the applications happy, the interface must be brought up and set up for Internet connections. This is ensured by setting the IP address to 10.0.0.1/8 (an arbitrary choice without any special reason) in init.c if TSI is enabled. The netmask is selected to be sane; it doesn't cover the whole IP range and we cannot set a default route because then TSI has problems, but it's OK for the tested application. We can change it if some application has trouble with that.

TSI availability is determined by checking for the presence of `tsi_hijack` in the kernel command line, before the `--` delimiter if present.

The dummy interface simply swallows all packets, but it is effectively bypassed by TSI for practical purposes. Things like ICMP don't work in either case.

When the kernel support is not available, the device is not present and init.c cannot set it up. We skip the configuration silently in such a case, so as not to spam users with errors if they use an older libkrunfw or custom kernels.

Fixes: containers#576
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
(cherry picked from commit 2593acc)
Signed-off-by: Sergio Lopez <slp@redhat.com>
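The command-line check described in this commit message can be sketched as follows. This is a minimal illustration, not the code in init.c: the helper name `cmdline_has_tsi_hijack` is hypothetical, and the sketch assumes that `tsi_hijack` appears as a whole whitespace-separated token.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if the token "tsi_hijack" appears in a
 * kernel command line before the "--" delimiter (or anywhere, if there is no
 * delimiter). Assumes whole-token matching. */
static bool cmdline_has_tsi_hijack(const char *cmdline)
{
    const char *p = cmdline;

    while (*p != '\0') {
        /* Skip separators between tokens. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;

        /* Find the end of the current token. */
        const char *start = p;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        size_t len = (size_t)(p - start);

        /* Tokens after "--" are arguments for init, not kernel parameters. */
        if (len == 2 && strncmp(start, "--", 2) == 0)
            return false;
        if (len == strlen("tsi_hijack") &&
            strncmp(start, "tsi_hijack", len) == 0)
            return true;
    }
    return false;
}
```

A caller would typically read `/proc/cmdline` into a buffer and pass it to this helper before deciding whether to configure the dummy interface.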

Fixes containers#650.

Signed-off-by: Sven-Hendrik Haase <svenstaro@gmail.com>
(cherry picked from commit e12b9b3)
Signed-off-by: Sergio Lopez <slp@redhat.com>

In commit c0e42fb0e0 ("vsock/virtio: cap TX credit to local buffer size") the kernel stopped honoring our peer_buf_alloc value, capping it to its own. Use the kernel's peer_buf_alloc instead of CONN_TX_BUF_SIZE as a hint of when we need to send a credit update.

Signed-off-by: Sergio Lopez <slp@redhat.com>
(cherry picked from commit 4b5b451)
Signed-off-by: Sergio Lopez <slp@redhat.com>

We need to bind to the correct socket types (IPv6, Unix) instead of only IPv4. This fixes UDP and unix dgram tests hanging when waiting for a reply.

Reported-by: Jan Noha <nohajc@gmail.com>
Signed-off-by: Matej Hrica <mhrica@redhat.com>
(cherry picked from commit 4380b32)
Signed-off-by: Sergio Lopez <slp@redhat.com>

The cross_domain `write` handler only matched `CrossDomainItem::WaylandWritePipe`, falling into the catch-all for every other item type after the unconditional `remove()` at the top of the function had already dropped the entry from the table. PipeWire (and other clients that share host-created eventfds via SCM_RIGHTS for per-period wakeups) sends CMD_WRITE on those eventfd identifiers. The first such write returns InvalidCrossDomainItemType *after* removing the item, and every subsequent write on the same identifier returns InvalidCrossDomainItemId, masquerading on the guest as the opaque VIRTIO_GPU_RESP_ERR_UNSPEC (0x1200).

Reproduced inside a libkrun guest on x86_64 by routing PipeWire's ALSA-shim audio through a host PipeWire daemon. With a paired BT speaker as the sink, `speaker-test -D pipewire -c2 -t wav -l 1` produces, per stream, ~10 entries of:

    [ 0.682819] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.723762] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.767615] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.807779] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.852469] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.896552] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.936504] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 0.980567] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 1.024476] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
    [ 1.064636] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)

with audio still playing: PipeWire has socket-based fallback timing that doesn't depend on eventfd ack, so the failures are cosmetic for playback. They are not cosmetic for clients that strictly require the eventfd handshake (PipeWire's ALSA shim under heavier loads, and the buffer-pool wakeup path used by V4L2 capture streams).

Add an Eventfd arm that mirrors the WaylandWritePipe semantics: `write_volatile` performs the 8-byte counter increment, and the item is re-inserted into the table unless the guest signaled `hang_up`.

Verified post-fix: zero CMD_WRITE failures, zero `0x1200` entries in guest dmesg, audio playback unchanged. Camera capture (gst-launch pipewiresrc → MJPEG) also exercises this path for buffer-pool wakeups and runs cleanly with valid 98%-non-zero JPEG frames.

Signed-off-by: Adam Ford <adam.ford@anodize.com>
(cherry picked from commit 5835d52)
Signed-off-by: Sergio Lopez <slp@redhat.com>
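For readers unfamiliar with the "8-byte counter increment" mentioned above, here is a small C sketch of what a write to an eventfd does at the syscall level. It is independent of the Rust code in this commit; the helper names `eventfd_add` and `eventfd_drain` are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Add `value` to the eventfd counter. The kernel rejects buffers smaller
 * than 8 bytes, so the value is always passed as a uint64_t.
 * Returns 0 on success, -1 on error. */
static int eventfd_add(int efd, uint64_t value)
{
    ssize_t n = write(efd, &value, sizeof(value));
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Read the accumulated counter and reset it to zero (default, non-semaphore
 * mode). Returns 0 if the read fails, e.g. EAGAIN on a non-blocking eventfd
 * whose counter is already zero. */
static uint64_t eventfd_drain(int efd)
{
    uint64_t value = 0;
    if (read(efd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return 0;
    return value;
}
```

Because successive writes accumulate in the counter, a waiter that drains the eventfd late still observes that wakeups happened, which is why dropping the item after the first write (the bug described above) breaks every later wakeup.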

The unescape_string() function in init.c, which handles JSON escape sequences when parsing environment variables from .krun_config.json, had a bug where the pointer 'val' was not advanced past the escape character after processing it.

When encountering a two-character JSON escape sequence like \n or \", the switch statement pre-increments val to point at the escape character (e.g., 'n' or '"') and writes the unescaped byte to the output. However, it never advances val past that character. On the next loop iteration, the character is not a backslash, so it gets copied again as a literal character.

This causes:

- \n (JSON-escaped newline) to produce a newline followed by a literal 'n'
- \" (JSON-escaped double quote) to produce two double quotes

For example, an environment variable set to a JSON string like:

    {\"key\": \"value\"}

would be rendered inside the krun VM as:

    {""key"": ""value""}

Fix this by adding val++ after writing the unescaped character in each case of the switch statement. The 'u' (unicode) case already handles its own pointer arithmetic and is not affected.

Fixes: containers#678
Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
(cherry picked from commit 60fe4f6)
Signed-off-by: Sergio Lopez <slp@redhat.com>
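The shape of the fix can be illustrated with a simplified, standalone unescape loop. This is a sketch under assumptions, not the function from init.c: the name `unescape_simple` and its two-buffer signature are hypothetical, and `\uXXXX` sequences are deliberately left untouched here.

```c
#include <stddef.h>

/* Unescape two-character JSON escapes from `val` into `out`.
 * `out` must have room for strlen(val) + 1 bytes.
 * Hypothetical sketch: \uXXXX is not decoded and is copied through as-is. */
static void unescape_simple(const char *val, char *out)
{
    while (*val != '\0') {
        if (*val != '\\') {
            *out++ = *val++;
            continue;
        }
        /* Step onto the character that follows the backslash. */
        val++;
        switch (*val) {
        /* Each case writes the unescaped byte AND advances past the escape
         * character; omitting the val++ reproduces the duplicated output. */
        case 'n':  *out++ = '\n'; val++; break;
        case 't':  *out++ = '\t'; val++; break;
        case 'r':  *out++ = '\r'; val++; break;
        case '"':  *out++ = '"';  val++; break;
        case '\\': *out++ = '\\'; val++; break;
        case '/':  *out++ = '/';  val++; break;
        default:
            /* Unknown escape or trailing backslash: keep the backslash and
             * let the next iteration copy the following character, if any. */
            *out++ = '\\';
            break;
        }
    }
    *out = '\0';
}
```

With the `val++` in place, the JSON-escaped input `{\"key\": \"value\"}` comes out with single double quotes rather than the doubled ones shown in the commit message.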

Building using musl fails with:

    error[E0412]: cannot find type `statx` in crate `libc`
       --> src/devices/src/virtio/fs/linux/passthrough.rs:187:39

musl_v1_2_3 allows builds targeting musl 1.2.3 or newer to use statx.

rust-lang/libc/src/unix/linux_like/mod.rs#L264-L269:

    cfg_if! {
        if #[cfg(any(
            target_env = "gnu",
            target_os = "android",
            all(target_env = "musl", musl_v1_2_3)
        ))] {

fixes: containers#431
Signed-off-by: Pepper Gray <hello@peppergray.xyz>
(cherry picked from commit c9c92e3)
Signed-off-by: Sergio Lopez <slp@redhat.com>

The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table, including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels.

For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn (many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock) the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output.

This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and only re-acquires the lock if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream) the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn.

Behavioral changes vs the old code:

- Common case (hang_up == 0): item stays in the table; we hand out a dup'd fd for the write. Net behavior identical, lock hold time bounded by dup().
- hang_up == 1: item removed in a separate phase after the write, instead of "removed unconditionally then re-inserted on hang_up == 0". Same observed end state.
- Concurrent writes to the same id (no longer serialized): the kernel guarantees atomicity for eventfd writes (8 bytes) and pipe writes <= PIPE_BUF, which are the only two CrossDomainItem variants this branch handles. Each caller dup's its own fd and writes through it independently.

Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). Capturing the host sink monitor and counting sample-to-sample deltas exceeding a clean-sine threshold, the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after.

Signed-off-by: Adam Ford <adam.ford@anodize.com>
(cherry picked from commit d904143)
Signed-off-by: Sergio Lopez <slp@redhat.com>
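The three-phase pattern described in this commit (dup under the lock, write without it, re-lock only on hang-up) can be sketched in C with a pthread mutex. Everything here is a hypothetical stand-in for the Rust item table: the `items` array, `item_add`, and `item_write` are illustrative names, not libkrun APIs.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_ITEMS 16

/* Hypothetical stand-in for the cross-domain item table. */
struct item { int fd; int in_use; };
static struct item items[MAX_ITEMS];   /* zero-initialized: all slots free */
static pthread_mutex_t item_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register `fd` under `id`; the table takes ownership of the fd. */
static int item_add(unsigned id, int fd)
{
    if (id >= MAX_ITEMS)
        return -1;
    pthread_mutex_lock(&item_lock);
    items[id].fd = fd;
    items[id].in_use = 1;
    pthread_mutex_unlock(&item_lock);
    return 0;
}

static ssize_t item_write(unsigned id, const void *buf, size_t len, int hang_up)
{
    if (id >= MAX_ITEMS)
        return -1;

    /* Phase 1: hold the lock only long enough to duplicate the fd. */
    pthread_mutex_lock(&item_lock);
    int fd = items[id].in_use ? dup(items[id].fd) : -1;
    pthread_mutex_unlock(&item_lock);
    if (fd < 0)
        return -1;

    /* Phase 2: the potentially slow syscall runs without the lock. */
    ssize_t n = write(fd, buf, len);
    close(fd);

    /* Phase 3: only a hang-up touches the table again. */
    if (hang_up) {
        pthread_mutex_lock(&item_lock);
        if (items[id].in_use) {
            close(items[id].fd);
            items[id].in_use = 0;
        }
        pthread_mutex_unlock(&item_lock);
    }
    return n;
}
```

The duplicated descriptor keeps the underlying file open for the duration of the write even if another thread removes the table entry in the meantime, which is what makes it safe to drop the lock before the syscall.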

from_tx_virtq_head assumed a fixed 2-descriptor layout (header + one data descriptor), which breaks with newer kernels that may combine header and data in a single descriptor (Linux 6.2+) or split data across multiple descriptors.

Handle three cases: single combined descriptor (zero-copy), classic two-descriptor (zero-copy, unchanged), and multi-descriptor data (copied into an owned contiguous buffer).

The RX path already handled the combined case; this brings the TX path to parity and beyond.

Assisted-by: Claude Code:claude-opus-4.6
Signed-off-by: Sergio Lopez <slp@redhat.com>
(cherry picked from commit 0ecf4d5)
Signed-off-by: Sergio Lopez <slp@redhat.com>
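The three layouts can be sketched in C over a simplified descriptor chain. This is an illustration only, not a port of the Rust code: `struct desc`, `struct tx_packet`, and `parse_tx_chain` are hypothetical, the 44-byte header length assumes the virtio-vsock packet header, and the payload length is taken from the descriptor sizes rather than from a header field.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_LEN 44u  /* assumed: size of the virtio-vsock packet header */

/* Hypothetical, simplified view of one descriptor in a chain. */
struct desc { const uint8_t *addr; size_t len; };

struct tx_packet {
    const uint8_t *hdr;   /* points into the first descriptor */
    const uint8_t *data;  /* payload, borrowed or owned */
    size_t data_len;
    uint8_t *owned;       /* non-NULL when the payload was copied; free() it */
};

static int parse_tx_chain(const struct desc *chain, size_t n,
                          struct tx_packet *pkt)
{
    memset(pkt, 0, sizeof(*pkt));
    if (n == 0 || chain[0].len < HDR_LEN)
        return -1;
    pkt->hdr = chain[0].addr;

    if (n == 1) {
        /* Case 1: header and data share one descriptor (zero-copy). */
        pkt->data = chain[0].addr + HDR_LEN;
        pkt->data_len = chain[0].len - HDR_LEN;
        return 0;
    }
    if (n == 2 && chain[0].len == HDR_LEN) {
        /* Case 2: classic header + one data descriptor (zero-copy). */
        pkt->data = chain[1].addr;
        pkt->data_len = chain[1].len;
        return 0;
    }

    /* Case 3: data spans several descriptors; copy into one buffer. */
    size_t total = chain[0].len - HDR_LEN;
    for (size_t i = 1; i < n; i++)
        total += chain[i].len;
    pkt->owned = malloc(total ? total : 1);
    if (pkt->owned == NULL)
        return -1;

    size_t off = chain[0].len - HDR_LEN;
    memcpy(pkt->owned, chain[0].addr + HDR_LEN, off);
    for (size_t i = 1; i < n; i++) {
        memcpy(pkt->owned + off, chain[i].addr, chain[i].len);
        off += chain[i].len;
    }
    pkt->data = pkt->owned;
    pkt->data_len = total;
    return 0;
}
```

Only the third case allocates, so the common single- and two-descriptor layouts keep borrowing guest memory directly, matching the zero-copy distinction drawn in the commit message.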

This is a patch version update for the stable-1.18.x series.

Signed-off-by: Sergio Lopez <slp@redhat.com>

mtjhrc

LGTM!

mz-pdm and others added 9 commits May 18, 2026 17:20

Bump libspa to v0.9

352a1ad


slp requested review from MatiasVara, dorindabassey, jakecorrenti, mtjhrc and tylerfanelli as code owners May 18, 2026 16:16

Bump patch version to 1.18.1

3411afd


slp force-pushed the cherry-pick-1.18.1 branch from b44b1b2 to 3411afd Compare May 18, 2026 16:23

jakecorrenti approved these changes May 18, 2026


mtjhrc approved these changes May 20, 2026


slp merged commit b7e43f0 into containers:stable-1.18.x May 20, 2026
14 checks passed

slp deleted the cherry-pick-1.18.1 branch May 20, 2026 15:54

slp merged 10 commits into containers:stable-1.18.x from slp:cherry-pick-1.18.1


7 participants
