Cherry pick commits for v1.18.1 #682
Merged
Some applications check for network availability by looking for a network device configured for Internet access. When TSI is used, no such device exists by default, even though the Internet is reachable, so those applications behave as if no connection were available.
Let's solve this problem by setting up a dummy network interface. The dummy interface is created automatically when CONFIG_DUMMY is enabled in the kernel or the corresponding kernel module is loaded, so a sufficiently recent libkrunfw version is needed (see containers/libkrunfw#116).
The dummy interface is initially down. To make the applications happy, the interface must be brought up and configured for Internet connections. init.c ensures this when TSI is enabled by setting the IP address to 10.0.0.1/8 (an arbitrary choice). The netmask is chosen to be sane: it doesn't cover the whole IP range, and we cannot set a default route because TSI then has problems, but this is sufficient for the tested application. We can change it if some application has trouble with that.
TSI availability is determined by checking for the presence of `tsi_hijack` in the kernel command line, before the `--` delimiter if present.
The dummy interface simply swallows all packets, but for practical purposes TSI bypasses it. Things like ICMP don't work in either case.
When kernel support is not available, the device is not present and init.c cannot set it up. We skip the configuration silently in that case, to avoid spamming users with errors if they use older libkrunfw or custom kernels.
Fixes: containers#576
Signed-off-by: Milan Zamazal <mzamazal@redhat.com>
(cherry picked from commit 2593acc)
Signed-off-by: Sergio Lopez <slp@redhat.com>
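A minimal Rust sketch of the `tsi_hijack` check described above, for illustration only. The real check lives in init.c (C); the function name `tsi_enabled` is hypothetical, and treating `tsi_hijack` as a whole whitespace-separated token (rather than a substring match) is an assumption.

```rust
/// Hypothetical helper: returns true if `tsi_hijack` appears among the
/// kernel command-line arguments that precede the first standalone `--`.
fn tsi_enabled(cmdline: &str) -> bool {
    cmdline
        .split_whitespace()
        // Arguments after `--` are handed to init, not parsed by the kernel,
        // so they must not enable TSI.
        .take_while(|arg| *arg != "--")
        // Assumption: the flag is matched as an exact token.
        .any(|arg| arg == "tsi_hijack")
}
```

Inside a guest, the input would typically come from `/proc/cmdline`, e.g. via `std::fs::read_to_string("/proc/cmdline")`.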
Fixes containers#650.
Signed-off-by: Sven-Hendrik Haase <svenstaro@gmail.com>
(cherry picked from commit e12b9b3)
Signed-off-by: Sergio Lopez <slp@redhat.com>
In commit c0e42fb0e0 ("vsock/virtio: cap TX credit to local buffer
size") the kernel stopped honoring our peer_buf_alloc value, capping
it to its own.
Use the kernel's peer_buf_alloc instead of CONN_TX_BUF_SIZE as the hint
for when we need to send a credit update.
Signed-off-by: Sergio Lopez <slp@redhat.com>
(cherry picked from commit 4b5b451)
Signed-off-by: Sergio Lopez <slp@redhat.com>
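A hypothetical Rust sketch of why the hint matters. Only `peer_buf_alloc` and `CONN_TX_BUF_SIZE` come from the commit above; the `TxCredit` struct, its other fields, and the "notify after half the window" threshold are illustrative assumptions, not libkrun's actual policy. The point it models is that after the kernel change the guest sends against the smaller of the two buffer sizes, so the update decision must be based on that smaller window.

```rust
/// Illustrative per-connection credit state (not libkrun's real struct).
struct TxCredit {
    /// buf_alloc we advertise to the guest (stand-in for CONN_TX_BUF_SIZE).
    our_buf_alloc: u32,
    /// buf_alloc the guest advertises in its packet headers (peer_buf_alloc).
    peer_buf_alloc: u32,
    /// Bytes received from the guest and already forwarded to the host.
    fwd_cnt: u32,
    /// fwd_cnt value carried in our most recent credit update.
    last_fwd_cnt_sent: u32,
}

impl TxCredit {
    /// The guest caps its TX credit to its own buffer size, so the window it
    /// actually uses is the smaller of the two values.
    fn effective_window(&self) -> u32 {
        self.our_buf_alloc.min(self.peer_buf_alloc)
    }

    /// Assumed policy: send an update once half the effective window has been
    /// consumed without the guest hearing a new fwd_cnt from us.
    fn needs_credit_update(&self) -> bool {
        // virtio-vsock counters are free-running u32 values, so subtract with
        // wrapping arithmetic.
        let unreported = self.fwd_cnt.wrapping_sub(self.last_fwd_cnt_sent);
        unreported >= self.effective_window() / 2
    }
}
```

With illustrative numbers, basing the decision on a 1 MiB local buffer would wait for 512 KiB of consumption, while a guest capped at 64 KiB stops sending long before that.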
We need to bind to the correct socket types (IPv6, Unix) instead of only IPv4. This fixes UDP and Unix dgram tests hanging while waiting for a reply.
Reported-by: Jan Noha <nohajc@gmail.com>
Signed-off-by: Matej Hrica <mhrica@redhat.com>
(cherry picked from commit 4380b32)
Signed-off-by: Sergio Lopez <slp@redhat.com>
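A std-only Rust sketch of binding the local end in the peer's address family. The function names are hypothetical and the actual libkrun test code is not shown here. For Unix datagram sockets the sketch binds the client to a path, because an unbound Unix datagram socket gives the server no address to reply to, which matches the "hangs waiting for a reply" symptom.

```rust
use std::io;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, UdpSocket};
use std::os::unix::net::UnixDatagram;
use std::path::Path;

/// Returns an unspecified local address in the same family as `peer`,
/// so an IPv6 peer gets `[::]:0` instead of an IPv4 `0.0.0.0:0`.
fn local_bind_addr(peer: &SocketAddr) -> SocketAddr {
    match peer {
        SocketAddr::V4(_) => SocketAddr::from((Ipv4Addr::UNSPECIFIED, 0)),
        SocketAddr::V6(_) => SocketAddr::from((Ipv6Addr::UNSPECIFIED, 0)),
    }
}

/// Binds a UDP socket that can exchange datagrams with `peer`.
fn bind_udp_for(peer: &SocketAddr) -> io::Result<UdpSocket> {
    UdpSocket::bind(local_bind_addr(peer))
}

/// Binds a Unix datagram client to `path` so replies have a destination.
fn bind_unix_dgram_client(path: &Path) -> io::Result<UnixDatagram> {
    UnixDatagram::bind(path)
}
```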
The cross_domain `write` handler only matched
`CrossDomainItem::WaylandWritePipe`, falling into the catch-all for
every other item type after the unconditional `remove()` at the top of
the function had already dropped the entry from the table. PipeWire
(and other clients that share host-created eventfds via SCM_RIGHTS for
per-period wakeups) sends CMD_WRITE on those eventfd identifiers — the
first such write returns InvalidCrossDomainItemType *after* removing
the item, and every subsequent write on the same identifier returns
InvalidCrossDomainItemId, masquerading on the guest as the opaque
VIRTIO_GPU_RESP_ERR_UNSPEC (0x1200).
Reproduced inside a libkrun guest on x86_64 by routing PipeWire's
ALSA-shim audio through a host PipeWire daemon. With a paired BT
speaker as the sink, `speaker-test -D pipewire -c2 -t wav -l 1`
produces, per stream, ~10 entries of:
[ 0.682819] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.723762] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.767615] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.807779] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.852469] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.896552] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.936504] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 0.980567] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 1.024476] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
[ 1.064636] [drm:virtio_gpu_dequeue_ctrl_func] *ERROR* response 0x1200 (command 0x207)
with audio still playing — PipeWire has socket-based fallback timing
that doesn't depend on eventfd ack, so the failures are cosmetic for
playback. They are not cosmetic for clients that strictly require the
eventfd handshake (PipeWire's ALSA shim under heavier loads, and the
buffer-pool wakeup path used by V4L2 capture streams).
Add an Eventfd arm that mirrors the WaylandWritePipe semantics:
`write_volatile` performs the 8-byte counter increment, and the item
is re-inserted into the table unless the guest signaled `hang_up`.
Verified post-fix: zero CMD_WRITE failures, zero `0x1200` entries in
guest dmesg, audio playback unchanged. Camera capture (gst-launch
pipewiresrc → MJPEG) also exercises this path for buffer-pool wakeups
and runs cleanly with valid 98%-non-zero JPEG frames.
Signed-off-by: Adam Ford <adam.ford@anodize.com>
(cherry picked from commit 5835d52)
Signed-off-by: Sergio Lopez <slp@redhat.com>
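A simplified Rust model of the new Eventfd arm, for illustration. `CrossDomainItem`, `WaylandWritePipe`, `Eventfd`, `hang_up` and the two error names come from the commit above; the generic writer `W` (standing in for the host fd and `write_volatile`), the `Unsupported` variant, the `cross_domain_write` name and the `WriteError` shape are hypothetical.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Simplified item table entry; `W` stands in for the host-side fd.
enum CrossDomainItem<W: Write> {
    WaylandWritePipe(W),
    Eventfd(W),
    /// Any variant that does not accept CMD_WRITE.
    Unsupported,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    InvalidCrossDomainItemId,
    InvalidCrossDomainItemType,
    Io(io::ErrorKind),
}

fn cross_domain_write<W: Write>(
    items: &mut HashMap<u32, CrossDomainItem<W>>,
    id: u32,
    data: &[u8],
    hang_up: bool,
) -> Result<(), WriteError> {
    // As in the handler described above, the entry is removed up front.
    let mut item = items
        .remove(&id)
        .ok_or(WriteError::InvalidCrossDomainItemId)?;
    let res = match &mut item {
        // The fix: eventfds share the pipe arm's write-then-reinsert path.
        CrossDomainItem::WaylandWritePipe(w) | CrossDomainItem::Eventfd(w) => {
            w.write_all(data).map_err(|e| WriteError::Io(e.kind()))
        }
        // Other types still hit the catch-all (and stay removed).
        CrossDomainItem::Unsupported => {
            return Err(WriteError::InvalidCrossDomainItemType)
        }
    };
    // Re-insert unless the guest signaled hang_up.
    if !hang_up {
        items.insert(id, item);
    }
    res
}
```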
The unescape_string() function in init.c, which handles JSON escape
sequences when parsing environment variables from .krun_config.json,
had a bug where the pointer 'val' was not advanced past the escape
character after processing it.
When encountering a two-character JSON escape sequence like \n or \",
the switch statement pre-increments val to point at the escape
character (e.g., 'n' or '"') and writes the unescaped byte to the
output. However, it never advances val past that character. On the
next loop iteration, the character is not a backslash, so it gets
copied again as a literal character.
This causes:
- \n (JSON-escaped newline) to produce a newline followed by a
literal 'n'
- \" (JSON-escaped double quote) to produce two double quotes
For example, an environment variable set to a JSON string like:
{\"key\": \"value\"}
would be rendered inside the krun VM as:
{""key"": ""value""}
Fix this by adding val++ after writing the unescaped character in
each case of the switch statement. The 'u' (unicode) case already
handles its own pointer arithmetic and is not affected.
Fixes: containers#678
Assisted-by: <anthropic/claude-opus-4.6>
Signed-off-by: Dusty Mabe <dusty@dustymabe.com>
(cherry picked from commit 60fe4f6)
Signed-off-by: Sergio Lopez <slp@redhat.com>
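For illustration, a Rust rendering of the corrected loop; the real function is unescape_string() in init.c (C). Consuming the escape character with the iterator's `next()` plays the role of the added `val++`. The name `unescape_json`, returning `None` on malformed input, and rejecting lone surrogate `\u` escapes are assumptions of this sketch.

```rust
/// Decodes JSON string escapes, consuming each escape character exactly once.
fn unescape_json(input: &str) -> Option<String> {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // `next()` consumes the character after the backslash, so it is not
        // copied again as a literal on the following iteration.
        let decoded = match chars.next()? {
            '"' => '"',
            '\\' => '\\',
            '/' => '/',
            'b' => '\u{8}',
            'f' => '\u{c}',
            'n' => '\n',
            'r' => '\r',
            't' => '\t',
            'u' => {
                // Exactly four hex digits; surrogate pairs are not handled.
                let hex: String = chars.by_ref().take(4).collect();
                if hex.len() != 4 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                char::from_u32(u32::from_str_radix(&hex, 16).ok()?)?
            }
            _ => return None,
        };
        out.push(decoded);
    }
    Some(out)
}
```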
Building with musl fails with:
error[E0412]: cannot find type `statx` in crate `libc`
  --> src/devices/src/virtio/fs/linux/passthrough.rs:187:39
The musl_v1_2_3 cfg allows builds targeting musl 1.2.3 or newer to use statx, see
rust-lang/libc/src/unix/linux_like/mod.rs#L264-L269:
cfg_if! {
    if #[cfg(any(
        target_env = "gnu",
        target_os = "android",
        all(target_env = "musl", musl_v1_2_3)
    ))] {
Fixes: containers#431
Signed-off-by: Pepper Gray <hello@peppergray.xyz>
(cherry picked from commit c9c92e3)
Signed-off-by: Sergio Lopez <slp@redhat.com>
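One way to keep code that names `libc::statx` from breaking on targets where libc does not expose it is to gate it on the same predicate libc uses, quoted above. This is a sketch only: the constant name is hypothetical, and how commit c9c92e3 actually resolves the build is not stated here. It assumes the build sets the `musl_v1_2_3` cfg when targeting musl 1.2.3 or newer, for example with `RUSTFLAGS="--cfg musl_v1_2_3"`, which applies the cfg to every crate, including libc.

```rust
// Mirror libc's predicate so statx-dependent code is compiled only where the
// type exists. `musl_v1_2_3` is a custom cfg and may trigger an
// `unexpected_cfgs` warning on recent compilers.
#[cfg(any(
    target_env = "gnu",
    target_os = "android",
    all(target_env = "musl", musl_v1_2_3)
))]
const STATX_TYPE_AVAILABLE: bool = true;

// Everywhere else, fall back to a stat()-based path instead.
#[cfg(not(any(
    target_env = "gnu",
    target_os = "android",
    all(target_env = "musl", musl_v1_2_3)
)))]
const STATX_TYPE_AVAILABLE: bool = false;
```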
The previous write() held item_state.lock() across the write_volatile call that performs the actual syscall, serializing every cross-domain CMD_WRITE behind any other operation on the items table, including add_item from process_receive, which fires for every new fd received via SCM_RIGHTS as guests open additional channels.
For per-period eventfd wakeups (e.g. PipeWire audio streams signaling host playback every audio period), this means the write completes only after any in-flight item_state operation finishes. Under stream-create churn (many guest applications opening streams concurrently, each delivering new fds via SCM_RIGHTS that hit add_item under the same lock), the wait can exceed the audio period budget and produce missed-deadline glitches at the host's audio output.
This change shrinks the critical section to a brief fd dup() under the lock, performs the syscall lock-free, and re-acquires the lock only if hang_up indicates the item should be removed. In the common case (hang_up == 0, e.g. repeated eventfd wakeups for an active stream), the table is no longer touched per write, eliminating both the contention and the previous "remove + conditional re-insert" churn.
Behavioral changes vs the old code:
- Common case (hang_up == 0): the item stays in the table and we hand out a dup'd fd for the write. Net behavior is identical; lock hold time is bounded by dup().
- hang_up == 1: the item is removed in a separate phase after the write, instead of being "removed unconditionally, then re-inserted on hang_up == 0". The observed end state is the same.
- Concurrent writes to the same id (no longer serialized): the kernel guarantees atomicity for eventfd writes (8 bytes) and pipe writes <= PIPE_BUF, which are the only two CrossDomainItem variants this branch handles. Each caller dups its own fd and writes through it independently.
Verified with a synthetic reproducer: a sustained 1 kHz sine playing through a guest PipeWire stream while ~200 short-lived guest streams open and close concurrently (each issuing SCM_RIGHTS for new eventfds, contending with the sustained stream's per-period writes for item_state). We captured the host sink monitor and counted sample-to-sample deltas exceeding a clean-sine threshold: the stress workload produced ~10 distinct glitch bursts in an 8-second capture before this change, and zero across five consecutive runs after.
Signed-off-by: Adam Ford <adam.ford@anodize.com>
(cherry picked from commit d904143)
Signed-off-by: Sergio Lopez <slp@redhat.com>
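A std-only Rust sketch of the three-phase write described above. `Mutex<HashMap<u32, File>>` stands in for item_state and `File::try_clone()` for the dup() step (it duplicates the descriptor); the `ItemState` type and the `NotFound` error are assumptions, not the actual libkrun code.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Write};
use std::sync::Mutex;

/// Hypothetical stand-in for item_state: id -> writable fd (pipe or eventfd).
struct ItemState {
    items: Mutex<HashMap<u32, File>>,
}

impl ItemState {
    fn write(&self, id: u32, data: &[u8], hang_up: bool) -> io::Result<()> {
        // Phase 1: hold the lock only long enough to duplicate the fd.
        let mut fd = {
            let items = self.items.lock().unwrap();
            let file = items
                .get(&id)
                .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown item id"))?;
            file.try_clone()?
        }; // lock released here

        // Phase 2: the write syscall runs without the lock, so add_item-style
        // operations on other ids are not blocked behind it.
        fd.write_all(data)?;

        // Phase 3: take the lock again only when the guest asked to hang up.
        if hang_up {
            self.items.lock().unwrap().remove(&id);
        }
        Ok(())
    }
}
```

A side effect of this design is that the duplicated descriptor stays valid for the duration of the write even if another thread removes and closes the table's copy concurrently.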
from_tx_virtq_head assumed a fixed 2-descriptor layout (header + one data descriptor), which breaks with newer kernels that may combine header and data in a single descriptor (Linux 6.2+) or split data across multiple descriptors.
Handle three cases: single combined descriptor (zero-copy), classic two-descriptor (zero-copy, unchanged), and multi-descriptor data (copied into an owned contiguous buffer).
The RX path already handled the combined case; this brings the TX path to parity and beyond.
Assisted-by: Claude Code:claude-opus-4.6
Signed-off-by: Sergio Lopez <slp@redhat.com>
(cherry picked from commit 0ecf4d5)
Signed-off-by: Sergio Lopez <slp@redhat.com>
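An illustrative Rust sketch of the three-way split. Descriptors are modeled as plain byte slices already mapped from guest memory, and `split_tx_chain` and `Layout` are hypothetical names; the real from_tx_virtq_head works on virtqueue descriptors. `HDR_SIZE` is the 44-byte size of `struct virtio_vsock_hdr` from the virtio specification. `Cow` keeps the two zero-copy cases borrowed and only allocates for the multi-descriptor case.

```rust
use std::borrow::Cow;

/// Size of struct virtio_vsock_hdr per the virtio specification.
const HDR_SIZE: usize = 44;

#[derive(Debug, PartialEq)]
enum Layout {
    Combined,
    TwoDescriptor,
    MultiDescriptor,
}

/// Splits a TX chain into (header, payload, layout), or None if malformed.
fn split_tx_chain<'a>(descs: &[&'a [u8]]) -> Option<(&'a [u8], Cow<'a, [u8]>, Layout)> {
    let first: &'a [u8] = descs.first().copied()?;
    let rest = &descs[1..];
    if first.len() < HDR_SIZE {
        return None;
    }
    let (hdr, inline_data) = first.split_at(HDR_SIZE);
    match rest {
        // Header and data share one descriptor (zero-copy). Header-only
        // packets also land here with an empty payload.
        [] => Some((hdr, Cow::Borrowed(inline_data), Layout::Combined)),
        // Classic layout: header-only descriptor plus one data descriptor.
        [data] if inline_data.is_empty() => {
            Some((hdr, Cow::Borrowed(*data), Layout::TwoDescriptor))
        }
        // Data spread over several descriptors: copy into one owned buffer.
        _ => {
            let mut buf = inline_data.to_vec();
            for d in rest {
                buf.extend_from_slice(d);
            }
            Some((hdr, Cow::Owned(buf), Layout::MultiDescriptor))
        }
    }
}
```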
This is a patch version update for the stable-1.18.x series.
Signed-off-by: Sergio Lopez <slp@redhat.com>
jakecorrenti approved these changes on May 18, 2026
This PR cherry-picks some commits from main for inclusion in v1.18.1.