Skip to content

[pull] master from libretro:master#975

Merged
pull[bot] merged 17 commits intoAlexandre1er:masterfrom
libretro:master
Apr 28, 2026
Merged

[pull] master from libretro:master#975
pull[bot] merged 17 commits intoAlexandre1er:masterfrom
libretro:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented Apr 28, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

The hourglass icon on the task-progress notification widget never
visibly rotated.  The animation system was correctly easing
msg->hourglass_rotation through 0 -> -2*pi every six seconds, but
the rendered icon stayed static across every backend.

Three independent bugs, each blocking a different family of
drivers:

  1. gfx/gfx_widgets.c - the call site that draws the hourglass
     forwarded the live `radians` value to gfx_widgets_draw_icon
     but left `cosine`/`sine` at their cos(0)/sin(0) defaults.
     The matrix-rotation path (gl, gl1, glcore, vulkan, d3d8)
     consumes those trig values via gfx_display_rotate_z, so it
     was always building an identity rotation regardless of
     `radians`.

     Fix: recompute cosf(radians)/sinf(radians) alongside `radians`
     in the in-progress branch.

  2. gfx/drivers/d3d_shaders/sprite_sm4.hlsl.h - the sprite shader
     used by the d3d10 / d3d11 / d3d12 menu display drivers.  The
     C code on those drivers dutifully writes
     `sprite->params.rotation = draw->rotation` into the vertex
     stream, but the shader pipeline never reads it: the GSInput
     struct doesn't have a `params` field, so rotation is dropped
     at the VS->GS boundary, and the geometry shader builds the
     four corner vertices straight from `position.xy + position.zw
     * cornerOffset` with no rotation applied.

     Fix: forward `params` from VS to GS (params.x is already
     consumed by the VS for scaling; we need params.y in the GS
     for rotation), and rewrite the GS to compute corner offsets
     by rotating per-corner pixel-space sign vectors `(sx, sy) in
     {+/-1}^2` and scaling them back by `position.z * 0.5` (X) and
     `position.w * 0.5` (Y).  Because position.z = 2W/vw and
     position.w = 2W/vh both encode the same pixel-space length W
     for a square icon, this preserves the icon's shape under
     rotation on a non-square viewport.  Non-square icons would
     skew, but the only caller of rotation today (the hourglass)
     is square.

  3. gfx/drivers/d3d9hlsl.c, gfx/drivers/d3d9cg.c - the d3d9
     menu display drivers' single-sprite path (no explicit vertex
     array supplied by caller, which is what gfx_widgets_draw_icon
     does) builds an axis-aligned quad from draw->x / draw->y /
     draw->width / draw->height directly, applies scale_factor,
     and never references draw->rotation or draw->matrix_data at
     all - so rotation was silently dropped on both shader
     paths.

     Fix: when draw->rotation is non-zero, rotate each of the
     four pixel-space corner offsets around the quad centre, then
     normalise back to [0,1] via 1/video_width and 1/video_height.
     Same aspect-correct approach as the d3d_shaders fix above.
     Add <math.h> to d3d9hlsl.c (d3d9cg.c already had it).

The d3d9 multi-vertex path uses the matrix_data route, so it's
handled by fix (1) alone and needs no driver-side change.

Tested on gl / gl1 / glcore / vulkan / d3d9_cg / d3d9_hlsl /
d3d10 / d3d11 / d3d12.
Follow-up to b9777c8 (the original ASan+UBSan workflow) and
70f24be (v8: flip to halt-on-error after a clean baseline).

Adds a third smoke tier that actually loads a libretro core and
runs it through ~5 seconds of real frame dispatch + clean
shutdown, covering everything --help and --features can't reach:
dlopen of a core, retro_load_game, the stb_image-driven decode
path, the X11 + XVideo color-conversion + shared-memory image-
transfer pipeline, the runloop, and full cleanup-on-shutdown.

The new step:

  1. Builds cores/libretro-imageviewer/image_core.so with the same
     SANITIZER=address,undefined the main retroarch binary uses
     -- ensures stb_image's stack-buffer-overflow / use-after-
     return surface is fully instrumented (the global ASan
     allocator interceptor catches heap bugs across module
     boundaries regardless, but stack instrumentation requires
     per-TU compilation).  This works because v9 (efae310) made
     the standalone Makefile sanitizer-aware.

  2. Generates an 8x8 solid-red test PNG via a Python heredoc
     using only struct + zlib -- 75 bytes, no apt dependency on
     imagemagick or similar (Python ships on every ubuntu-latest).

  3. Spins up Xvfb on display :99 with a 320x240x24 screen (small
     to minimise X server memory footprint), waits for the XVideo
     extension to be available, writes a minimal retroarch.cfg
     that pins the video driver to "xvideo" and the audio driver
     to "null" (no PulseAudio / ALSA dependency on the runner),
     and runs:

       DISPLAY=:99 ./retroarch \
           -c /tmp/asan-cfg/retroarch.cfg \
           -L cores/libretro-imageviewer/image_core.so \
           /tmp/test.png \
           --max-frames=300 \
           --verbose

     --max-frames=300 = ~5s nominal at 60fps, ~15-25s under
     sanitizer overhead.  Crucially, --max-frames triggers a
     clean exit through the normal runloop teardown -- not a
     SIGTERM mid-execution -- so cleanup paths are sanitizer-
     instrumented too.  Wall-clock timeout(1) at 60s as a safety
     backstop.

  4. Surfaces sanitizer findings (AddressSanitizer: / runtime
     error:) in the step output as informational text only.  The
     step is soft-fail (continue-on-error: true) on this first
     iteration because lots of things can fire here that aren't
     RetroArch bugs (Xvfb quirks, libGL / Mesa software-rasterizer
     leaks at shutdown, X11 driver init noise) and forcing strict
     enforcement before measuring a baseline would block merges
     on noise.  Once the baseline is characterised the same way
     the --help step's was, this step can be flipped to strict.

Why xvideo specifically: it's the smallest self-contained X11
video driver in the tree (1163 LOC adapted from bSNES / MPlayer),
no GL dependency, real YUV color conversion + XShm transfer, and
Xvfb exposes the XVideo extension by default.  null video would
be simpler but skips the entire pixel-data path, which is exactly
the surface most likely to fire under sanitizer.

* .github/workflows/Linux-asan-ubsan.yml

  - Install dependencies: add xvfb, x11-utils, libxv-dev,
    libxext-dev, libxxf86vm-dev.  No new compiler/runtime
    dependencies; the X11 transitive dev headers are needed for
    xvideo build, xvfb + x11-utils for the runtime invocation.

  - Configure: add --enable-xvideo to make xvideo a hard build
    requirement -- silent fall-back to a different driver would
    skew the smoke's coverage without warning.

  - New steps appended after --features:
      * Build imageviewer core under ASan + UBSan
      * Generate test PNG for imageviewer
      * Smoke run imageviewer headless under Xvfb (soft-fail)

  - Header comment + --help comment updated to describe the
    three-tier coverage structure (--help, --features, imageviewer).

Verified locally on efae310:
  - YAML parses with 9 steps, soft-fail correctly scoped to only
    the new imageviewer step.
  - Imageviewer core builds clean with SANITIZER=address,undefined
    (build step minus apt installs reproduced verbatim locally).
  - Test PNG generation runs clean, produces a valid 75-byte 8x8
    PNG (verified by `file` and stb_image acceptance).
  - Xvfb on a fresh ubuntu-latest equivalent exposes the XVideo
    and MIT-SHM extensions out-of-the-box.
  - xdpyinfo -queryExtensions emits extension names with leading
    whitespace, hence the ^[[:space:]]+ in the detection regex
    (a naive ^XVideo / ^MIT-SHM regex would silently fail to
    match -- caught and fixed during pre-flight).

Cannot verify locally: the actual end-to-end run, which would
require building a sanitizer-instrumented retroarch binary against
a full apt set.  The first CI run is the experiment, the same way
the original workflow's was.
Reported by gcc 15.2.0 (msys2 / MinGW) building libretro-net-retropad
under -Wstringop-overflow:

   stdstring.c: In function 'string_tokenize':
   stdstring.c:500:16: warning: writing 1 byte into a region of
   size 0 [-Wstringop-overflow=]
     500 |    token[_len] = '\0';
         |    ~~~~~~~~~~~~^~~~~~

The flagged write is redundant.  string_tokenize allocates
malloc((_len + 1) * sizeof(char)) and then calls
strlcpy(token, str_ptr, _len + 1).  Looking at the strlcpy
contract in libretro-common/compat/compat_strl.c:31-41:

   size_t __len = _len < len - 1 ? _len : len - 1;
   memcpy(s, in, __len);
   s[__len] = '\0';

strlcpy already NUL-terminates within the `_len + 1` byte limit.
Because _len was computed at lines 489-492 as either
(delim_ptr - str_ptr) -- strictly <= strlen(str_ptr) -- or
strlen(str_ptr) directly, _len <= strlen(str_ptr) is always
true, so strlcpy's __len resolves to _len and the terminator
lands at token[_len].  The token[_len] = '\0' line just rewrites
the byte strlcpy already wrote.

gcc 15's tightened -Wstringop-overflow analyser cannot constrain
the relationship between _len and the malloc'd size (the size
expression `(_len + 1) * sizeof(char)` involves a multiply that
the analyser doesn't fold through to the bounds check on
token[_len]), so it warns under the assumption that _len could
exceed the malloc'd size by one.  Removing the redundant write
silences the warning without changing semantics.

Not a strlcpy_append situation: that helper (78c52ab) is for
chained appends of the form

    _len += strlcpy(s + _len, src, len - _len);

This is a single copy into a freshly-malloc'd buffer -- the
right fix is to delete the redundant trailing write, not to
swap idioms.

* libretro-common/string/stdstring.c

  Drop the redundant token[_len] = '\0' that follows the strlcpy
  in string_tokenize, with an inline comment explaining the
  strlcpy contract that makes the line dead and the
  -Wstringop-overflow interaction that made it actively harmful.

Verified locally:
  - The five string_tokenize behaviours that callers rely on
    (basic comma split, empty token at start, multi-char
    delimiter, no-delimiter pass-through, empty input) all pass
    under -fsanitize=address with the redundant write removed.
  - Cannot reproduce the gcc 15 warning on local gcc 13.3
    (warning is gcc-15-specific), but verified the local build
    stays clean under -Wstringop-overflow=4 and -Werror after
    the change.
Run #1 of the v10 imageviewer smoke step tripped on Xvfb's
incomplete XVideo support:

   Error: [XVideo] XvQueryAdaptors() found 0 adaptors.
   Error: [Video] Cannot open video driver. Exiting...

Xvfb advertises the XVideo *extension* (which my pre-flight
check verified) but provides zero XVideo *adaptors* -- adaptors
are the hardware-acceleration backends, and Xvfb has no real
video hardware to back them with.  The xvideo driver in
gfx/drivers/xvideo.c correctly errors out via XvQueryAdaptors's
count check rather than crashing -- that's good defensive code,
not a bug -- but it means xvideo is unusable on Xvfb and we
need a different video driver for the smoke.

Pivot to SDL2.  RetroArch's sdl2 video driver
(gfx/drivers/sdl2_gfx.c) uses SDL2's X11 backend with MIT-SHM
for image transfer -- both confirmed available on Xvfb out of
the box.  A standalone SDL_CreateRenderer probe under Xvfb
returns a working SDL_RENDERER_SOFTWARE renderer cleanly
(verified locally before this patch landed).

Coverage tradeoff: we lose xvideo's YUV color-conversion +
XShm codepath but keep all the high-leverage surface (dlopen
of the core, retro_load_game, the stb_image decode path, video
driver init, the runloop, full cleanup-on-shutdown).  The
dropped xvideo-specific surface is small enough that it's not
worth the cost of finding a virtual X server with software
XVideo adaptors -- nothing widely available actually provides
those.

* .github/workflows/Linux-asan-ubsan.yml

  - Install dependencies: drop libxv-dev, libxext-dev,
    libxxf86vm-dev (xvideo build deps).  libsdl2-dev was
    already in the base set so no new packages required.

  - Configure: --enable-xvideo -> --enable-sdl2.  Same explicit-
    over-implicit reasoning: silent fall-back to a different
    driver would skew the smoke without warning.

  - Smoke step: video_driver = "xvideo" -> "sdl2" in the inline
    retroarch.cfg.  The xdpyinfo XVideo readiness probe becomes
    an MIT-SHM probe (which SDL2 actually uses).  Step comment
    rewritten to document why we pivoted from xvideo, with the
    actual error message from run #1 inlined for context.

  - Header comment: tier-2 description updated from "X11 +
    XVideo color-conversion pipeline" to "SDL2 + X11 + MIT-SHM
    rendering pipeline".

Pre-flight lesson: extension presence != functional driver.
xdpyinfo -queryExtensions told me XVideo was present; what I
should have verified is that XvQueryAdaptors returns a positive
count.  For SDL2 the equivalent positive verification is
SDL_CreateRenderer returning non-NULL, which the local probe
confirmed.

Verified locally:
  - YAML parses, 9 steps, soft-fail still scoped to only the
    imageviewer step.
  - Standalone SDL2 renderer probe (SDL_CreateWindow +
    SDL_CreateRenderer w/ SDL_RENDERER_SOFTWARE) on Xvfb display
    returns ok, exit 0.
  - Patch applies cleanly to b9777c8 + v8 (efae310 + ace9c9f
    didn't touch this file).

Cannot verify locally: the actual end-to-end run, which would
require building a sanitizer-instrumented retroarch binary
linked against SDL2.  As before, the next CI run is the
experiment.
Found by the new ASan+UBSan CI workflow's headless SDL2
imageviewer smoke (b9777c8 + d967813).  Run #2 of the smoke
(after the v12 xvideo->sdl2 pivot) caught a SEGV at process
startup:

   AddressSanitizer:DEADLYSIGNAL
   ==9537==ERROR: AddressSanitizer: SEGV on unknown address
   0x000000000968.  The signal is caused by a READ memory
   access.

   #0 XQueryExtension      (libX11.so.6+0x361f6)
   #1 XInitExtension       (libX11.so.6)
   #2 XextAddDisplay       (libXext.so.6)
   #3 XScreenSaverQueryExtension (libXss.so.1+0x1565)
   #4 x11_suspend_screensaver
   #5 video_driver_init_internal
   #6 drivers_init
   #7 retroarch_main_init

The address 0x968 is a small offset off NULL -- libX11's
XQueryExtension dereferencing the Display * argument before
checking it for null.

Root cause:
gfx/common/x11_common.c::xss_screensaver_inhibit() was called
via x11_suspend_screensaver() with g_x11_dpy == NULL.  The flow:

  1. SDL2 video driver inits.
  2. sdl2_set_handles() in gfx/common/sdl2_common.c:54 calls
     video_driver_display_type_set(RARCH_DISPLAY_X11) so the
     display *type* gate matches X11, and routes the SDL-owned
     X Display through video_driver_display_set() at line 55.
  3. The file-scope `g_x11_dpy` in x11_common.c:66 stays at its
     initial NULL value.  It's only assigned at line 795 inside
     XOpenDisplay() in the xvideo / GL / X11-direct init paths,
     which the SDL2 driver doesn't take.
  4. drivers_init() -> video_driver_init_internal() ->
     x11_suspend_screensaver() -- line 286 gates on display_type
     == RARCH_DISPLAY_X11 (passes; SDL2 set it), then at lines
     289-292 tries dbus_suspend_screensaver() if HAVE_DBUS, then
     at line 293 calls xss_screensaver_inhibit(g_x11_dpy,
     enable) -- with g_x11_dpy == NULL.
  5. xss_screensaver_inhibit() at line 227 calls
     XScreenSaverQueryExtension(NULL, ...) -> libX11
     dereferences -> SEGV.

Why hasn't this been hit before?
Most desktop builds set HAVE_DBUS -- pkg-config finds dbus-1 on
basically every modern Linux distro -- so the
dbus_suspend_screensaver() call at line 290 short-circuits and
returns true before the xss path is reached.  The CI runner
doesn't apt-install libdbus-1-dev (HAVE_DBUS off) but libxss1 is
pulled in transitively by xvfb / x11-utils (HAVE_XSCRNSAVER on
at link time), exposing the corner.

A symmetric defensive check already exists at line 255 in
xdg_screensaver_inhibit():

   if (g_x11_dpy && g_x11_win)
      ...

The xss path was the missing-guard one.

* gfx/common/x11_common.c

  Add a `if (!dpy) return false;` guard at the top of
  xss_screensaver_inhibit().  Applies only to the
  HAVE_XSCRNSAVER branch; the no-op stub at line 237 already
  discards `dpy` with `(void) dpy;` and returns false.

  Inline comment explaining the failure mode points at the
  relevant call sites in sdl2_common.c so a future maintainer
  can trace the invariant without re-deriving it.

Verified locally:
  - The added check returns false for NULL dpy without
    invoking libX11 (matches the existing "extension not
    present" return contract).  Non-NULL dpy paths are
    unchanged.
  - xss_screensaver_inhibit() is `static`, single caller (line
    311 in x11_suspend_screensaver) -- no other reachable code
    paths to audit.

Cannot verify locally that this resolves the CI SEGV without
rebuilding retroarch with the same apt set the runner has --
the next CI run is the experiment.  Expected outcome:
xss_screensaver_inhibit() returns false harmlessly, the
function falls through to the xdg_screensaver path (which has
its own NULL guard), and the SDL2 smoke proceeds into
retro_run.
…oy paths

Four issues found in the same audit pass over gfx/drivers/vulkan.c, all
in the create-with-old / destroy-buffer ownership protocol.  None of
them are the trigger for the iOS materialui content-close abort that
prompted the audit (the font teardown does not go through the patched
paths), but they are real and worth fixing on their own.

* vulkan_create_texture: VkDeviceMemory leak on alloc failure
  (gfx/drivers/vulkan.c)

  When called with a non-NULL `old`, the function destroys old's
  view/image/buffer at the top of the `if (old)` block, then either
  pilfers old->memory or allocates fresh memory for tex.memory.  The
  trailing `if (old)` cleanup is responsible for freeing old->memory in
  the no-pilfer path (skipped by the pilfer branch via
  old->memory = VK_NULL_HANDLE).

  The vkAllocateMemory failure branch returns early before reaching
  that cleanup.  At return, old->view/image/buffer are dangling but
  zeroed by the caller's `*old = tex` overwrite, while old->memory
  still holds the original live VkDeviceMemory and is now unreachable.
  Pure leak per failed call; reachable from every call site that
  passes a real `old` (vk_per_frame texture realloc on resize, MUI/RGUI
  set_texture, readback staging realloc).

  Inline an equivalent cleanup before the early return: unmap if
  mapped, free old->memory, zero old.  Mirrors the unmap-before-free
  the pilfer branch already does on line 1277.

* vulkan_create_texture: free-while-mapped on no-pilfer cleanup
  (gfx/drivers/vulkan.c)

  The trailing `if (old)` cleanup was calling vkFreeMemory on
  old->memory without unmapping first, even though the pilfer branch a
  few lines up does unmap.  vkFreeMemory implicitly unmaps so this is
  not a spec violation, but MoltenVK has historically handled
  free-while-mapped poorly (KhronosGroup/MoltenVK#957) and the
  asymmetry with the pilfer branch is gratuitous.

  Add the `if (old->mapped) vkUnmapMemory(...)` guard before the free.

* vulkan_destroy_buffer: VUID-vkFreeMemory-memory-00677 violation
  (gfx/drivers/vulkan.c)

  Order was: vkUnmapMemory (unconditional) -> vkFreeMemory ->
  vkDestroyBuffer.  Two problems.  First, the memory is freed while
  buffer->buffer is still bound to it, which the spec forbids per
  VUID-vkFreeMemory-memory-00677 -- vulkan_destroy_texture (line 974)
  gets this right; this function was the outlier.  Second,
  vkUnmapMemory was unconditional, but vulkan_create_buffer leaves
  buffer->mapped == NULL when its initial vkMapMemory fails, so a
  destroy on a never-mapped buffer would unmap memory that is not
  currently mapped.

  Fix the order to unmap (gated on buffer->mapped) -> destroy buffer
  -> free memory, with NULL-handle guards on the destroys for symmetry
  with vulkan_destroy_texture.

* vulkan_init_quad_ibo: same VUID violation in vkMapMemory failure path
  (gfx/drivers/vulkan.c)

  Identical wrong order in the IBO-init map-failure cleanup -- frees
  vk->quad_ibo.memory before destroying vk->quad_ibo.buffer that is
  still bound to it.  Swap the two calls.  vulkan_deinit_quad_ibo
  itself was already correct.
rgui_render() was filling the entire framebuffer with bg_dark_color
before drawing the info messagebox, which erased the menu underneath
and left the popup floating against an empty background.

The messagebox already paints its own opaque background, border, and
drop shadow within its footprint, so the full-screen fill served no
purpose other than destroying the menu render that had just been
composed above it. Drop it; the popup now appears on top of the
visible menu as intended.
Five independent issues found during a focused audit of menu/drivers/
xmb.c.  Two are real corruption vectors driven by uninitialized memory
and a stale-pointer free ordering; two are size_t underflows in the
ticker helpers that don't fire on the current XMB call sites but
violate the contract a future caller could plausibly hit; and one is
an unused heap allocation on every main-menu init.  Grouped together
because they share an audit context and none is large enough to stand
alone.

* xmb_alloc_node: uninitialized gfx_thumbnail_t fields
  (menu/drivers/xmb.c)

  The comment above xmb_alloc_node deliberately uses malloc instead
  of calloc to avoid zeroing the multi-KB thumbnail_path_data char
  arrays on every node alloc on big lists (MAME, FBA).  Per-field
  init only sets node->thumbnail_icon.icon.texture = 0, leaving the
  rest of the gfx_thumbnail_t substruct (status, flags, alpha,
  delay_timer, width, height) and all of thumbnail_path_data
  uninitialized.

  When a node is later freed without its thumbnail ever having been
  loaded -- the common case for the line-edit path at the bottom of
  the file (xmb_list_init -> xmb_alloc_node, allocated and stored
  into list->list[i].userdata) -- xmb_free_node calls
  gfx_thumbnail_reset, which reads `thumbnail->flags &
  GFX_THUMB_FLAG_FADE_ACTIVE` and `thumbnail->texture` before
  initializing them.  Uninit `flags` may match the bit and trigger
  gfx_animation_kill_by_tag against arbitrary state; uninit
  `texture != 0` feeds a garbage handle to
  video_driver_texture_unload.  The latter is the actual heap
  corruption vector -- driver-side texture maps treat the handle as
  an index/key into their own allocations.

  Separately, the lazy thumbnail-path resolution in the render path
  reads `thumbnail_icon->thumbnail_path_data.icon_path[0]` to decide
  whether resolution is needed; on a freshly-allocated node that
  byte is uninitialized.

  Zero the small gfx_thumbnail_t substruct (~32 bytes) and clear
  icon_path[0] only.  This preserves the original optimization that
  kept thumbnail_path_data's PATH_MAX/NAME_MAX char arrays unzeroed,
  since the path-data fields are written via fill_pathname/strlcpy
  before any read of those fields.

* playlist_db_node_map freed after the nodes it borrows from
  (menu/drivers/xmb.c)

  The map's values are non-owning xmb_node_t* pointers into
  xmb->horizontal_list[i].userdata, populated in
  xmb_context_reset_horizontal_list.  In xmb_refresh_horizontal_list,
  xmb_free, and the xmb_init error path the order was: free nodes
  via xmb_free_list_nodes, then RHMAP_FREE the map.  Between those
  two calls the map holds dangling pointers.

  No re-entry path I could trace would actually read the map during
  that window today (the menu is single-threaded and
  xmb_free_list_nodes calls only gfx_thumbnail_reset and free), but
  the invariant isn't enforced and the fix is a one-line reorder.
  Free the map first at all three sites.

* xmb_animation_line_ticker_smooth: size_t underflow in fade-string
  copy length (menu/drivers/xmb.c)

  At the two fade-string memcpy sites:

      if (copy_len >= line_ticker->top_fade_str_len)
         copy_len = line_ticker->top_fade_str_len - 1;

  if top_fade_str_len == 0 the comparison is true, copy_len
  underflows to SIZE_MAX, the memcpy writes effectively unbounded
  bytes, and the trailing NUL store at line_ticker->top_fade_str
  [copy_len] writes at SIZE_MAX offset.  Same pattern for
  bottom_fade_str_len.

  XMB's only caller passes sizeof(stack_array) so the live path is
  safe today, but xmb_animation_line_ticker_smooth takes the public
  gfx_animation_ctx_line_ticker_smooth_t shape.  Add the missing
  `_str_len > 0` guard to each branch.

* xmb_animation_line_ticker: same underflow on line_ticker->len
  (menu/drivers/xmb.c)

  max_copy = line_ticker->len - 1 underflows when len == 0, then
  copy_len > max_copy is false and the memcpy at line ~4566 writes
  the full block size.  The early sanity check guards line_len < 1
  and max_lines < 1 but not len < 1.  Add it for symmetry with the
  smooth variant; same caller-contract argument.

* xmb_menu_init_list: dead heap allocation
  (menu/drivers/xmb.c)

  info.exts = strldup("lpl", sizeof("lpl")) has been allocated on
  every main-menu init since the function was introduced.
  DISPLAYLIST_MAIN_MENU never reads info->exts (only
  directory-scanning displaylists like
  DISPLAYLIST_DATABASE_PLAYLISTS_HORIZONTAL do, where the same line
  legitimately filters .lpl files); the buffer is allocated, never
  read, then freed by menu_displaylist_info_free.  Almost certainly
  copy-pasted from xmb_init_horizontal_list at line ~2927.  Drop it.
Three issues found during a focused audit of menu/drivers/ozone.c.
The first is a latent double-free in the deep-copy helper that the
struct's own comment warned about; the second is a dangling-pointer
window during teardown across three sites; the third is a dead heap
allocation on every main-menu init.  Same audit class as the recent
xmb pass (commit 9a27e66) since ozone shares much of its data-flow
shape with xmb.

* ozone_copy_node: latent double-free of console_name
  (menu/drivers/ozone.c)

  ozone_node has a comment above its definition saying

      /* If you change this struct, also
         change ozone_alloc_node and
         ozone_copy_node */

  ozone_alloc_node was kept in sync when console_name was added,
  ozone_copy_node was not.  ozone_copy_node does

      *new_node          = *old_node;
      new_node->fullpath = old_node->fullpath
            ? strdup(old_node->fullpath) : NULL;

  -- bitwise-copying every pointer field including console_name,
  then deep-copying only fullpath.  When either the source or the
  copy is later passed to ozone_free_node, the free(node->
  console_name) at the top of that function frees the same buffer
  twice.

  The bug is latent today: console_name is only populated on
  horizontal_list nodes (sidebar playlist tabs), and the only
  caller of ozone_copy_node is ozone_list_deep_copy, which copies
  the vertical selection_buf into selection_buf_old for the
  scroll-transition cache.  Vertical entries don't carry a
  console_name, so no double-free actually fires from current
  call sites.  But the precondition is unenforced -- any future
  feature that copies a horizontal node (or even adds a second
  string field to ozone_node and forgets the same step) re-opens
  the bug.  Mirror fullpath's strdup-or-NULL pattern.

* playlist_db_node_map freed after the nodes it borrows from
  (menu/drivers/ozone.c)

  Same shape as the xmb fix in 9a27e66.  The map's values are
  non-owning ozone_node_t* pointers into ozone->horizontal_list[i].
  userdata, populated in ozone_context_reset_horizontal_list.  In
  ozone_refresh_horizontal_list, the ozone_init error path, and
  ozone_free, the order was: free nodes via ozone_free_list_nodes,
  then RHMAP_FREE the map.  Between those two calls the map holds
  dangling pointers.

  No re-entry path I could trace would actually read the map during
  that window today, but the invariant isn't enforced and the fix
  is a one-line reorder per site.  Free the map first at all three.

* ozone_menu_init_list: dead heap allocation
  (menu/drivers/ozone.c)

  info.exts = strldup("lpl", sizeof("lpl")) is allocated on every
  main-menu init.  DISPLAYLIST_MAIN_MENU never reads info->exts
  (only directory-scanning displaylists like
  DISPLAYLIST_DATABASE_PLAYLISTS_HORIZONTAL do, where the same
  call legitimately filters .lpl files); the buffer is allocated,
  never read, then freed by menu_displaylist_info_free.  Almost
  certainly copy-pasted from ozone_init_horizontal_list at line
  ~5046, same as the xmb instance fixed in 9a27e66.  Drop it.
word_wrap_wideglyph violated its implicit contract that the returned
length always fits inside the destination buffer.  Three return paths
forwarded strlcpy()'s return value, which is strlen(src) per spec --
not bytes-actually-written -- so on truncation the function returned
a value larger than the destination it had written into.

The sister function word_wrap() has a `len < src_len + 1` guard up
front that bails to 0 when the buffer is too small, sidestepping
this entirely.  word_wrap_wideglyph has no such guard and tries to
fit what it can, which is correct behaviour, but the inflated return
breaks every caller that uses it as a length argument.

Concrete impact: three menu drivers (xmb, ozone, materialui) feed
this return value as the `n` argument to memchr() over the wrapped
destination buffer when laying out messageboxes
(xmb_render_messagebox_internal at xmb.c line ~1183;
ozone_draw_messagebox at ozone.c line ~7362;
materialui_render_messagebox at materialui.c line ~2825).  Pre-patch,
an inflated `n` walks memchr past the destination into adjacent stack
memory (the lines[] / line_lengths[] arrays the messagebox helpers
keep right next to the wrapped buffer).  A '\n' byte found there
yields a wild pointer used in pointer arithmetic and as a length
argument to font_driver_get_message_width(), causing stack info
disclosure or a crash.

The wideglyph path is selected by msg_hash_get_wideglyph_str(), so
the bug is reachable in CJK locales (Japanese, Korean, Chinese) via
any messagebox payload exceeding MENU_LABEL_MAX_LENGTH (256 bytes
default) -- error notifications, file paths, network handshake
text, search results.

Two issues fixed in stdstring.c:

* Return value: every truncating return path
  (`return strlcpy(...)` and `return (s - s_start) + strlcpy(...)`)
  now clamps strlcpy's return to the actual bytes written:

      copied = strlcpy(s, src, n);
      if (copied >= n) copied = (n > 0) ? n - 1 : 0;
      return [(s - s_start) +] copied;

  This avoids introducing a strnlen() dependency (not used anywhere
  else in libretro-common, portability concerns on old MSVC).

* `len` semantic drift: the loop body decremented `len` by char_len
  on every byte written, while the early-return paths computed
  `remaining = len - (s - s_start)`.  Both expressions track "bytes
  used", so the formula effectively double-subtracted, yielding a
  too-small `remaining` and forcing the strlcpys to truncate harder
  than necessary.  The lastspace/lastwideglyph rewinds at lines ~411
  and ~429 also moved `s` backward without a matching `len` adjustment,
  desyncing the two over time.  Drop the `len -= char_len`, compute
  `remaining = len - (s - s_start)` once at the top of each loop
  iteration (and reuse it for the buffer-overflow guard previously
  spelled `if (char_len >= len)`).  `len` now stays constant across
  the function body, matching word_wrap()'s convention.

Regression test: libretro-common/samples/string/word_wrap_overflow
_test/.  Five cases covering both truncating and non-truncating
inputs across the line-358 early-return path and the main-loop
late-return paths.  The discriminator is two-fold: the test asserts
return-value-vs-bytes-written directly, AND mimics the menu drivers'
call shape by feeding the returned value to memchr() over the
destination, so an ASan build catches pre-patch failures via
heap-buffer-overflow on the memchr.  Verified by running against
both pre-patch and post-patch stdstring.c:

    pre-patch:  short_input_tiny_dest fails
                (returned=47, dst_size=16) -> exit 1
    post-patch: all five cases pass -> exit 0

CI: word_wrap_overflow_test added to RUN_TARGETS in
.github/workflows/Linux-libretro-common-samples.yml.  Builds under
SANITIZER=address,undefined like the other overflow regression
tests in the same workflow (rpng_chunk_overflow_test,
vfs_read_overflow_test, cdrom_cuesheet_overflow_test,
chd_meta_overflow_test, cdfs_dir_record_test, rzip_chunk_size_test).

Note on libretro-common/test/string/test_stdstring.c (the
libcheck-based suite): I considered adding a unit-test entry there
too, but the file is already broken at HEAD on master --
test_string_replacesubstr calls string_replace_substring with the
old 3-arg signature, but the API was extended to 5 args some time
ago; the test fails to compile.  Wiring up Makefile.test in CI
would have caught that regression but is out of scope here.
The samples-based regression test is the working surface and
matches the established pattern for every other overflow test
in the tree.
The fallback path in rgui_render_osk that triggers when the on-screen
keyboard cannot fit on the framebuffer was building its message via
the unsafe `*pos += strlcpy(buf + *pos, src, len - *pos)` pattern.
The construction was sound here because msg is large
(NAME_MAX_LENGTH) and inputs are short, but the same string-build
shape is what strlcpy_append exists to replace -- it's bound-checked,
short-circuits cleanly on truncation, and avoids the off-by-one
arithmetic around the embedded '\n'.

Mirrors materialui_render's identical construction at materialui.c
line ~8091, which was already converted in the strlcpy_append
chain-conversion sweeps (commits 25ade82 / e446242 / 78c52ab).
Three independent issues found during a focused audit of menu/
drivers/materialui.c.  One is a real heap overflow in the status-bar
metadata builder driven by user-controllable input strings; one is
an OOB write/read on power-user-sized playlist counts; one is a
TOCTOU OOB read in the touch-tap handler.  Same audit class as the
recent xmb (9a27e66) and ozone (d56c728) commits, grouped together
because they share an audit context and none is large enough to
stand alone.

* status-bar metadata: heap overflow via _len underflow in
  strlcpy chain (menu/drivers/materialui.c)

  materialui_render_process_entry_playlist_desktop builds the
  status-bar metadata string ("Core: <name> ⠀ <runtime> ⠀ <last
  played>") through a chain of six

      _len += strlcpy(buf + _len, src, sizeof(buf) - _len);

  calls.  This pattern is NOT self-bounding: strlcpy returns
  strlen(src), not bytes-actually-written, so when any intermediate
  call truncates -- because the source is longer than the remaining
  buffer space -- _len overshoots sizeof(status_bar.str).  The next
  iteration's `sizeof(...) - _len` underflows size_t to ~SIZE_MAX,
  the strlcpy treats the destination as essentially infinite, and
  writes proceed past status_bar.str into adjacent struct fields
  (runtime_fallback_str, last_played_fallback_str, then whatever
  follows in materialui_handle_t on the heap).

  Reachable via any path that produces a long source string: a
  crafted playlist entry with an outsize core_name, a long runtime
  log entry, or a verbose locale translation of
  MENU_ENUM_LABEL_VALUE_PLAYLIST_SUBLABEL_CORE.

  Convert the chain to strlcpy_append, the bound-checked replacement
  added in 78c52ab and applied across the codebase in 25ade82 /
  e446242.  After truncation strlcpy_append clamps *pos to len-1 so
  the chain short-circuits cleanly instead of propagating an
  underflow.

* playlist_selection[]: OOB write/read for users with too many
  playlists (menu/drivers/materialui.c)

  mui->playlist_selection is sized [NAME_MAX_LENGTH], which is the
  *file/dir name length cap* (128 on small builds, 256 on default)
  -- the wrong constant for what's actually a per-playlist-tab
  remembered-selection cache.  materialui_navigation_set writes

      mui->playlist_selection[mui->playlist_selection_ptr] = selection;

  with no bound check on the index, where the index is the user's
  row in the playlists tab (also stored unchecked one branch
  above).  A user with > NAME_MAX_LENGTH playlists -- not implausible
  for someone with FBNeo, MAME, and every console they own --
  navigating to row N and then entering any playlist will OOB-write
  at offset N into adjacent struct fields and the heap beyond.  The
  read site in materialui_populate_entries has the matching OOB
  read.

  Two-part fix.  Replace NAME_MAX_LENGTH with a purpose-built
  MUI_PLAYLIST_SELECTION_MAX (1024 -- comfortable headroom for any
  plausible setup, ~8 KB cache on mui).  Bound-check both the write
  and read against that constant.  Selections in slots beyond
  MUI_PLAYLIST_SELECTION_MAX simply aren't remembered; falling
  through to the existing default-selection behaviour is
  preferable to a heap corruption.

* materialui_pointer_up TAP path: OOB read on stale ptr
  (menu/drivers/materialui.c)

  The TAP-gesture branch does

      list = MENU_LIST_GET_SELECTION(menu_list, 0);
      if (!list)
         break;
      if (!(node = (materialui_node_t*)list->list[ptr].userdata))
         break;

  with no bound check on ptr against list->size.  ptr was set in a
  prior render frame's hit-test loop, where it was bounded by that
  frame's entries_end.  Between render and event delivery the list
  can be repopulated -- search filter applied, navigation back/
  forward, async list rebuild -- leaving ptr stale and possibly
  past the new list end.

  The neighbouring LONG_PRESS branch (line ~11199) and the swipe
  handler at materialui_pointer_up_swipe_horz_default (line ~10874)
  both already guard with `ptr < entries_end`; the TAP path missed
  the same defence.  Add it.
Four independent issues found during a focused audit of gfx/
gfx_thumbnail.c, the subsystem touched by all four menu drivers
(xmb, ozone, materialui, rgui).  Same audit class as the recent
xmb (9a27e66), ozone (d56c728), and materialui (d5e598a) commits.

* gfx_thumbnail_get_path: missing break on LEFT case
  (gfx/gfx_thumbnail.c)

  The switch in gfx_thumbnail_get_path() that selects which path
  to return based on thumbnail_id is missing a `break` after the
  GFX_THUMBNAIL_LEFT case.  When a caller asks for the LEFT
  thumbnail and *path_data->left_path is empty, control falls
  through to GFX_THUMBNAIL_ICON; if *path_data->icon_path is
  populated, the function returns the icon path *as if it were
  the left thumbnail*.  The caller then loads the icon as the
  left thumbnail texture.

  Wrong-content substitution rather than memory corruption, but
  silent and reachable on any menu where left thumbnails are
  unset but icons are configured (the default for most playlist
  configurations on xmb/ozone/materialui).  Add the missing
  break.

* gfx_thumbnail_get_content_dir: stack buffer overflow on long
  directory paths (gfx/gfx_thumbnail.c)

  The function extracts the basename of the directory portion of
  path_data->content_path by copying that prefix into a local
  tmp_buf[NAME_MAX_LENGTH] and then calling
  path_basename_nocompression() on the copy.  The strlcpy size
  passed is the directory-portion length _len, which is bounded
  only by `_len < PATH_MAX_LENGTH` (i.e. up to 2047) -- but
  tmp_buf is only NAME_MAX_LENGTH (256) bytes.  Whenever the
  directory portion exceeds 255 chars, strlcpy writes up to
  _len-1 bytes into the smaller tmp_buf and corrupts the stack
  past it.

  Reachable on systems with deep folder hierarchies; content_path
  is set from runtime_content_path_basename and other content
  paths capped at PATH_MAX_LENGTH, so 500+ char directory
  prefixes are entirely plausible (a savefile path nested under
  the user's home directory plus a deep ROM organisation
  scheme).

  Fix: anchor the copy at the latest position that still lets
  the directory's last segment fit in tmp_buf, then take the
  basename from there.  We never needed the full prefix in
  tmp_buf -- only its tail -- so dropping the leading bytes when
  _len > sizeof(tmp_buf) gives the same answer without the
  overflow.

* gfx_savestate_thumbnail_get_path: _len underflow in path
  build (gfx/gfx_thumbnail.c)

  Same unsafe pattern that the recent strlcpy_append sweep
  (78c52ab / 25ade82 / e446242 / d5e598a) was created to
  replace, applied here to a mixed strlcpy + snprintf chain:

      _len  = strlcpy(s, state_name, len);
      if (state_slot > 0)
         _len += snprintf(s + _len, len - _len, "%d", state_slot);
      ...
      strlcpy(s + _len, FILE_PATH_PNG_EXTENSION, len - _len);

  Both strlcpy and snprintf return their would-be length on
  truncation, so when state_name approaches @len, _len overshoots
  @len, the next `len - _len` subtraction underflows size_t to
  ~SIZE_MAX, and subsequent calls treat the destination as
  essentially infinite.

  Reachable: state_name is runloop_st->name.savestate which is
  sized PATH_MAX_LENGTH; on a deep ROM directory the savestate
  name itself can fill the buffer, and the slot suffix or .png
  extension push the running total past @len.

  Add explicit truncation guards after each step so the chain
  stays inside the buffer.  Also add the missing `!s || !len`
  guard at the top -- the historical s[0] = '\0' before the
  NULL check was a latent NULL-deref.

* gfx_thumbnail_path_init: malloc -> calloc to zero
  playlist_index (gfx/gfx_thumbnail.c)

  gfx_thumbnail_path_init() allocates the path_data struct via
  malloc and then calls gfx_thumbnail_path_reset(), which
  resets the string buffers and the three playlist_*_mode
  fields but leaves playlist_index untouched -- garbage from
  malloc until the first set_content_*() call writes it.  Read
  sites in xmb (xmb_set_title at xmb.c:2566 and the related
  fallback at xmb.c:7462) sample
  thumbnail_path_data->playlist_index before any setter has
  necessarily run, then pass it to playlist_get_index().  The
  callee bounds-checks against the playlist's size, so a
  garbage index just produces a stale pl_entry rather than a
  crash, but the code is latently UB.

  Switch to calloc.  Cheaper than touching path_reset's API
  and forecloses the issue for any future read site that
  doesn't replicate the bounds-check defence.
task_set_title() and task_set_error() take ownership of a heap
pointer and rely on the task system to free it at completion, but
neither setter frees the *previous* value -- so calling either one
twice on the same task without an intervening free leaks the prior
string.  task_free_title() has long existed for the title side of
this protocol; the error side had no symmetric helper at all,
forcing callers that legitimately needed to update the error mid-
task to either guard with task_get_error() or just live with the
leak.

This commit:

* Adds task_free_error(), symmetric to the pre-existing
  task_free_title() (both lock property_lock under HAVE_THREADS,
  free if non-NULL, NULL out the field).

* Documents the free-then-set protocol clearly in
  queues/task_queue.h on both setters, both struct fields, and
  cross-references the helpers via @see -- the previous "@warning
  This does not free the existing X" line was correct but easy to
  miss, and three sites in tasks/task_cloudsync.c missed it.

* Fixes those three task_cloudsync.c violators (lines 107, 1281,
  1331).  task_cloudsync sets the initial title at line 1439 via
  direct assignment, then later replaces it via task_set_title in
  three callbacks without calling task_free_title first.  Each
  replacement leaks the prior strdup.

* Adds a regression test under libretro-common/samples/queues/
  task_queue_title_error_test/ exercising both helpers and the
  free-then-set protocol (single-set, replace-via-free, free-on-
  null no-op, double-free no-op, 50-iteration stress chain).
  Wires it into the existing libretro-common-samples CI workflow,
  so it builds under -fsanitize=address,undefined and LSan flags
  any future regression to the protocol.

The test compiles task_queue.c without HAVE_THREADS so the slock /
scond / sthread paths drop out cleanly -- the title/error
protocol is identical with and without threading; the locks just
serialise it.  The single remaining external dependency
(cpu_features_get_time_usec) is provided as a trivial stub since
no test path drives the queue through gather().

Verified pre-patch: the test fails to link without task_free_error
(undefined reference), confirming it genuinely exercises the new
symbol rather than passing trivially.

Verified post-patch: ASan + UBSan + LSan clean, all ten cases
pass.
@pull pull Bot locked and limited conversation to collaborators Apr 28, 2026
@pull pull Bot added the ⤵️ pull label Apr 28, 2026
@pull pull Bot merged commit 50f3dd8 into Alexandre1er:master Apr 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant