[pull] master from libretro:master#980

Merged
pull[bot] merged 11 commits into Alexandre1er:master from libretro:master
Apr 30, 2026
@pull pull Bot commented Apr 30, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

The GDI video driver previously only worked with RGUI and had a stub
font driver returning NULL for glyph metrics.  This commit brings it
up to feature parity with d3d8/d3d9 for the menu/widget/font/OSD
surfaces while preserving Win95/98 backwards compatibility.

gfx/drivers/gdi_gfx.c

  Display driver (gfx_display_ctx_gdi)
    - draw: solid quads via FillRect+cached HBRUSH, translucent
      solids via 1x1 premultiplied AlphaBlend, per-corner gradients
      via software bilinear interpolation into a scratch DIB (alpha
      interpolated per-pixel alongside RGB so widget drop-shadows
      that hold RGB constant and animate alpha top->bottom fade
      correctly), and textures via per-texture DIB + AlphaBlend
      with optional RGB tint through a scratch DIB.
    - draw recognises both coord conventions: plain quad
      (draw->x/y/w/h) and custom geometry (coords->vertex +
      coords->tex_coord) used by gfx_display_draw_texture_slice
      (9-patch) and gfx_display_draw_bg (menu wallpapers).
    - draw->scale_factor applied as centred scaling around the dst
      rect midpoint on the plain-quad path only (XMB selected-tab
      icon zoom).  Slice path leaves scale_factor uninitialised so
      we must not read it there.
    - get_default_vertices/tex_coords return real BL,BR,TL,TR
      arrays in 0..1 space so gfx_display_draw_bg gets non-zero
      coords (otherwise MaterialUI/XMB wallpapers degenerate to a
      0x0 rect).
    - scissor_begin/end via SaveDC/RestoreDC + CreateRectRgn +
      SelectClipRgn.  blend_begin/end are no-ops (AlphaBlend is
      per-call).

  Font driver (modelled on d3d8/gl1)
    - A8 atlas mirrored into a BGRA premultiplied DIB; per-glyph
      AlphaBlend hot path for plain white, scratch-DIB tinted
      path for arbitrary RGB.  All five vtable functions
      implemented (get_glyph and get_message_width were NULL in
      the legacy driver, breaking text layout in widgets and the
      stats overlay).
    - Pre-Win98 fallback: TextOut + system font when AlphaBlend
      isn't available.  Glyph metrics still come from the font
      backend so layout stays consistent.

  gdi_load_texture
    - Premultiplies alpha at load time and detects fully-opaque
      textures (lets the draw fast-path skip alpha math).

  gdi_frame
    - bmp_menu (window-sized 32-bit BGRA DIB section) is the
      compositing target whenever any window-resolution content
      is active that frame: textured menus, gfx_widgets, OSD msg,
      or the Display Statistics overlay.  The core/RGUI frame is
      StretchDIBits-upscaled into bmp_menu so widgets, OSD glyphs
      and stats text land at native window resolution on top of
      the upscaled image, instead of being drawn at core size and
      smeared by WM_PAINT's StretchBlt at present time.
    - Pure core gameplay (no widgets, OSD, menu, or stats) keeps
      the legacy small-bmp + WM_PAINT path to avoid the bmp_menu
      clear + BitBlt cost in the steady state.
    - When a textured menu is alive over running content, the
      core frame is uploaded into bmp_menu as a background
      underlay BEFORE menu_driver_frame paints the menu on top.
      Without this, Ozone's semi-transparent wallpaper composites
      against solid black instead of the running game.
    - Display Statistics overlay (statistics_show + stat_text)
      rendered through the font driver with osd_stat_params
      positioning.  Drawn BEFORE widgets so widget panels can
      partially obscure it (matching d3d8/d3d9 layering).
      Suppressed while the menu is alive — d3d8/d3d9 structure
      this as `else if (statistics_show)` off the menu condition;
      the menu drivers own the screen and have their own stats
      path.
    - Per-frame OSD msg renders AFTER widgets so transient
      messages aren't hidden by notification panels.
    - bmp_menu clear is FillRect with bmp_menu pre-selected, not
      a direct DIB pixel write.  Direct writes to a DIB section
      selected into a DC race with GDI's batch queue and produce
      sporadic missing draws.
    - GdiFlush() before BitBlt-to-winDC at present time so all
      draws into bmp_menu have committed.
    - Manual BitBlt-to-winDC + ValidateRect for the bmp_menu
      present path; the legacy bmp path keeps InvalidateRect ->
      WM_PAINT.
    - gdi_gfx_widgets_enabled wired into the video_gdi vtable.

gfx/common/gdi_defines.h

  Extends gdi_t with bmp_menu DIB section + menu_pixels pointer +
  menu_surface_width/height, bmp_width/bmp_height tracking
  (separate from video_width to avoid Step 6/8 fighting over the
  field when RGUI is alive at a different size from the core),
  cached HBRUSH (brush_cached + colour invalidation flag),
  scissor save/restore state, menu_textured_active flag.

gfx/common/win32_common.c

  WM_PAINT handlers (wnd_proc_gdi_dinput, wnd_proc_gdi_winraw,
  wnd_proc_gdi_common): StretchBlt source rect uses bmp_width /
  bmp_height with a fallback to video_width / video_height.

configuration.c

  check_menu_driver_compatibility: case 'g' now matches "gdi"
  exactly so XMB / Ozone / MaterialUI become selectable on the
  GDI driver instead of falling through to RGUI.

Compile-clean under MinGW i686 with -Wall -Wextra
-Werror=implicit-function-declaration
-Werror=declaration-after-statement for _WIN32_WINNT in {0x0400,
0x0410, 0x0501} and with / without HAVE_MENU and HAVE_GFX_WIDGETS.
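
The load-time premultiply and opaque detection described for gdi_load_texture can be sketched as below.  This is a minimal illustration, not the driver's code: the function name premultiply_argb and the 0xAARRGGBB pixel layout are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: premultiply 0xAARRGGBB pixels in place.
 * Returns true when every pixel had alpha == 255, so a caller
 * could route the texture to an opaque fast path. */
static bool premultiply_argb(uint32_t *px, size_t count)
{
   size_t i;
   bool all_opaque = true;

   for (i = 0; i < count; i++)
   {
      uint32_t a = (px[i] >> 24) & 0xFF;
      uint32_t r, g, b;

      if (a == 255)
         continue; /* opaque pixels are already in premultiplied form */

      all_opaque = false;
      r = (((px[i] >> 16) & 0xFF) * a) / 255;
      g = (((px[i] >>  8) & 0xFF) * a) / 255;
      b = (( px[i]        & 0xFF) * a) / 255;
      px[i] = (a << 24) | (r << 16) | (g << 8) | b;
   }
   return all_opaque;
}
```

AlphaBlend with AC_SRC_ALPHA expects premultiplied source pixels, which is why doing this once at load keeps the per-draw path free of pixel rewriting.
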
The GDI video driver was stretching the core frame edge-to-edge
across the entire window regardless of the user's aspect ratio
setting (Settings -> Video -> Scaling -> Aspect Ratio).  d3d8 / d3d9
report e.g. Scale: 2821 x 2160 inside a 3840 x 2160 viewport with
black pillar bars; GDI was reporting Scale: 0 x 0.

Three pieces were missing:

  1. gdi_t had no video_viewport_t state.  set_viewport was an
     empty stub, set_aspect_ratio and apply_state_changes pokes
     were both NULL, so the user's aspect-ratio settings never
     reached the driver.

  2. The frame upload (Step 9 + the textured-menu Step 4b
     underlay) used (0, 0, surface_w, surface_h) as the
     destination rect on bmp_menu, ignoring the viewport entirely.

  3. WM_PAINT's StretchBlt used (0, 0, screen_w, screen_h) as the
     destination on the legacy bmp path, also ignoring viewport.

gfx/common/gdi_defines.h

  Adds video_viewport_t vp + bool keep_aspect + bool should_resize
  to gdi_t.  vp.full_width / full_height hold the window size;
  vp.x / y / width / height hold the destination rect for the core
  frame after aspect-ratio correction.  Includes ../video_defines.h
  for the type.

gfx/drivers/gdi_gfx.c

  - gdi_init: keep_aspect = video->force_aspect; should_resize =
    true so the first frame computes a real viewport before any
    present.
  - gdi_alive: sets should_resize on window-resize events (the
    resize bool from win32_check_window).
  - gdi_set_viewport: real implementation calling
    video_driver_update_viewport(&gdi->vp, force_full,
    gdi->keep_aspect, true).  Mirrors d3d8's pattern.
  - gdi_set_aspect_ratio + gdi_apply_state_changes pokes added
    and wired into the poke interface table.  Both set
    should_resize; set_aspect_ratio also forces keep_aspect = true
    (matching d3d8).
  - gdi_frame Step 2b: when should_resize is set, call
    gdi_set_viewport with the current window size and clear the
    flag.  Defensive fallback to full-window viewport if vp ends
    up zero-sized so we still draw something.
  - gdi_upload_core_frame_to_menu (Step 4b helper) and Step 9
    bmp_menu branch: destination rect is now the viewport
    sub-rect (gdi->vp.x / y / width / height) of bmp_menu rather
    than the whole surface.  Step 4 already cleared bmp_menu to
    black so the bars appear automatically without an explicit
    fill.
  - Forward declaration for gdi_set_viewport since it's called
    from gdi_frame but defined alongside the vtable near the
    bottom of the file.
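
As a rough sketch of what the viewport computation has to produce, the following centres an aspect-corrected rect inside the window.  It is not video_driver_update_viewport; the type sketch_viewport_t, the function sketch_fit_viewport and the integer num/den aspect representation are all assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fields of video_viewport_t used here. */
typedef struct
{
   int      x, y;
   unsigned width, height;
   unsigned full_width, full_height;
} sketch_viewport_t;

/* Fit an aspect_num:aspect_den rect inside win_w x win_h, centred.
 * Falls back to the full window when any input is zero. */
static void sketch_fit_viewport(sketch_viewport_t *vp,
      unsigned win_w, unsigned win_h,
      unsigned aspect_num, unsigned aspect_den)
{
   vp->full_width  = win_w;
   vp->full_height = win_h;
   vp->x           = 0;
   vp->y           = 0;
   vp->width       = win_w;
   vp->height      = win_h;

   if (!win_w || !win_h || !aspect_num || !aspect_den)
      return; /* defensive fallback: full-window viewport */

   if ((uint64_t)win_w * aspect_den > (uint64_t)win_h * aspect_num)
   {
      /* Window is wider than the content: pillarbox (bars left/right). */
      vp->width = (unsigned)(((uint64_t)win_h * aspect_num
               + aspect_den / 2) / aspect_den);
      vp->x     = (int)((win_w - vp->width) / 2);
   }
   else
   {
      /* Window is taller than the content: letterbox (bars top/bottom). */
      vp->height = (unsigned)(((uint64_t)win_w * aspect_den
               + aspect_num / 2) / aspect_num);
      vp->y      = (int)((win_h - vp->height) / 2);
   }
}
```

For a 3840 x 2160 window and a 4:3 core this yields a 2880 x 2160 rect at x = 480, leaving two 480-pixel pillar bars.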

gfx/common/win32_common.c

  - New wnd_proc_gdi_paint(gdi) helper: paints the four border
    rects (top / bottom / left / right of the viewport) with
    BLACK_BRUSH, then StretchBlts gdi->bmp into the viewport
    rect.  Skips the FillRect calls when the viewport already
    fills the window.
  - All three WM_PAINT handlers (wnd_proc_gdi_dinput,
    wnd_proc_gdi_winraw, wnd_proc_gdi_common) replaced with a
    single call to the helper.  The previous bodies were three
    near-identical 25-line blocks.
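
The border-rect split used by the paint helper can be illustrated in plain C.  This is a sketch under assumptions: sketch_rect_t mimics the left/top/right/bottom layout of a Win32 RECT, and sketch_border_rects is a hypothetical name, not the helper in win32_common.c.

```c
/* Same field order as a Win32 RECT, redefined so the sketch is portable. */
typedef struct { int left, top, right, bottom; } sketch_rect_t;

static void sketch_set_rect(sketch_rect_t *r, int l, int t, int rt, int b)
{
   r->left = l; r->top = t; r->right = rt; r->bottom = b;
}

/* Writes up to four non-empty border rects (top, bottom, left, right)
 * around the viewport and returns how many were written.  Returns 0
 * when the viewport already fills the window, so no fill is needed. */
static int sketch_border_rects(sketch_rect_t out[4],
      int win_w, int win_h, int vp_x, int vp_y, int vp_w, int vp_h)
{
   int n    = 0;
   int vp_r = vp_x + vp_w;
   int vp_b = vp_y + vp_h;

   if (vp_y > 0)      /* top bar spans the full window width */
      sketch_set_rect(&out[n++], 0, 0, win_w, vp_y);
   if (vp_b < win_h)  /* bottom bar spans the full window width */
      sketch_set_rect(&out[n++], 0, vp_b, win_w, win_h);
   if (vp_x > 0)      /* left bar covers only the viewport's rows */
      sketch_set_rect(&out[n++], 0, vp_y, vp_x, vp_b);
   if (vp_r < win_w)  /* right bar covers only the viewport's rows */
      sketch_set_rect(&out[n++], vp_r, vp_y, win_w, vp_b);
   return n;
}
```

Each returned rect would then be passed to FillRect with BLACK_BRUSH before the StretchBlt into the viewport rect.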

Manual BitBlt-from-bmp_menu present path in gdi_frame Step 14
needed no change: bmp_menu is window-sized and was cleared to
black with the game underlay landing only in the viewport
sub-rect, so the bars are baked into bmp_menu by present time
and a 1:1 BitBlt to winDC is still correct.

Note: at high window resolutions GDI's WM_PAINT StretchBlt will
drop frames where d3d8 / d3d9 would not — the StretchBlt is a
software scale on the CPU.  Running at integer scale (Settings ->
Video -> Scaling -> Integer Scale) avoids this on cores whose
native resolution divides evenly into the window.

Compile-clean under MinGW i686 with -Wall -Wextra
-Werror=implicit-function-declaration
-Werror=declaration-after-statement for _WIN32_WINNT in {0x0400,
0x0410, 0x0501} and with / without HAVE_MENU and
HAVE_GFX_WIDGETS.

The GDI video driver had get_overlay_interface set to NULL, so
on-screen input overlays (touch controls / virtual gamepads
configured via Settings -> On-Screen Overlay -> Display Overlay)
loaded silently, registered for input hit-testing, but never
rendered anything visible.  This commit adds a working overlay
implementation matching the d3d8 / d3d9 contract, so an overlay
config tuned against those backends behaves identically here.

gfx/common/gdi_defines.h

  Adds nested struct gdi_overlay { HBITMAP bmp; unsigned tex_w/h;
  float tex_coords[4]; float vert_coords[4]; float alpha_mod;
  bool fullscreen; } and an array of these on gdi_t, plus an
  overlays_size count and an overlays_enabled flag.  All wrapped in
  HAVE_OVERLAY.

gfx/drivers/gdi_gfx.c

  Implements the six video_overlay_interface_t entry points:

    - gdi_overlay_load: builds a 32-bit BGRA premultiplied DIB
      section per image (same conversion gdi_load_texture does
      for menu / widget textures), so the per-frame draw is a
      straight AlphaBlend with no pixel rewriting.

    - gdi_overlay_tex_geom / gdi_overlay_vertex_geom: store the
      0..1 normalised geometry verbatim.  Unlike d3d8 / d3d9 / gl
      we do NOT flip y here: those backends emit vertices in y-up
      clip space and rely on the viewport transform to invert,
      whereas GDI is a y-down pixel-space blit straight to a DC.
      Overlay descriptor coordinates are y-down (same convention
      as RETRO_DEVICE_POINTER, which the hit-test path consumes),
      so a direct multiply is correct.  Applying the d3d-style
      `y = 1 - y; h = -h;` flip here renders the overlay
      vertically mirrored.

    - gdi_overlay_enable: toggles overlays_enabled.

    - gdi_overlay_full_screen: per-overlay flag controlling
      whether vert_coords span the full window (true; covers
      letterbox / pillarbox bars so touch controls keep working
      when the game is letterboxed) or just the game viewport
      rect (false).

    - gdi_overlay_set_alpha: per-overlay alpha modulation,
      applied at render time as AlphaBlend's
      SourceConstantAlpha (per-pixel alpha was already
      premultiplied at load).
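
To make the y-down mapping and the full_screen flag concrete, here is a small sketch of turning normalised overlay geometry into a destination pixel rect.  The names sketch_box_t and sketch_overlay_dst are hypothetical, and the { x, y, w, h } ordering of vert_coords is an assumption inferred from the flip quoted above.

```c
#include <stdbool.h>

typedef struct { int x, y, w, h; } sketch_box_t;

/* Map 0..1 overlay geometry (assumed order: x, y, w, h) to pixels.
 * No y flip: both the descriptor and the GDI target are y-down.
 * fullscreen selects the whole window as the base rect; otherwise
 * the game viewport is used. */
static sketch_box_t sketch_overlay_dst(const float vert[4], bool fullscreen,
      sketch_box_t window, sketch_box_t viewport)
{
   sketch_box_t base = fullscreen ? window : viewport;
   sketch_box_t dst;

   dst.x = base.x + (int)(vert[0] * (float)base.w + 0.5f);
   dst.y = base.y + (int)(vert[1] * (float)base.h + 0.5f);
   dst.w =          (int)(vert[2] * (float)base.w + 0.5f);
   dst.h =          (int)(vert[3] * (float)base.h + 0.5f);
   return dst;
}
```

With y = 0.75 the rect lands in the bottom quarter of the base rect, which is where a y-down descriptor places it; a d3d-style flip would move it to the top.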

  gdi_overlays_render composites all enabled overlays onto the
  active target (bmp_menu when active) using AlphaBlend with
  AC_SRC_ALPHA + SourceConstantAlpha.  No-op on Win95 (no
  AlphaBlend); the static dispatch is sized so the compile-time
  branch produces no warnings on either side.

  Wired into gdi_frame as a new Step 10b between stats (Step 10)
  and widgets (Step 11), matching the d3d8 / d3d9 layering: stats
  -> overlay -> widgets -> OSD msg.  Overlays render regardless
  of menu state so virtual gamepad controls remain visible while
  navigating the menu, also matching the d3d8 / d3d9 behaviour.

  Wired into the need_bmp_menu trigger block: an active overlay
  forces bmp_menu allocation just like widgets do, since overlays
  are window-resolution images that would smear if drawn into the
  small core-frame bmp and scaled up by WM_PAINT.

  gdi_overlay_free called from gdi_free before texDC teardown
  since the overlay DIB sections may be selected into texDC at
  free time.

  Forward declarations added near the existing forward-decl block.

Compile-clean under MinGW i686 with -Wall -Wextra
-Werror=implicit-function-declaration
-Werror=declaration-after-statement for _WIN32_WINNT in {0x0400,
0x0410, 0x0501} and with / without HAVE_OVERLAY, HAVE_MENU,
HAVE_GFX_WIDGETS.
…t_info

gdi: implement viewport_info

The video_driver_t::viewport_info hook is optional, and several
drivers (gdi, caca, sixel, network, fpga, vg, ps2, xenon360,
xshm) leave it NULL.  The wrapper video_driver_get_viewport_info
returns false in that case and leaves the caller's struct
untouched.

That contract is unsafe with the way many call sites are
written.  Callers in menu_setting.c, menu/drivers/rgui.c, and
several input drivers declare `video_viewport_t vp;` on the
stack, call video_driver_get_viewport_info(&vp), and proceed to
read fields off the struct without checking the return value.
When viewport_info is NULL, those reads land on uninitialised
stack memory.

The most damaging consequence: setting_action_start_custom_vp_*
in menu_setting.c writes the result straight back into
settings->video_vp_custom:

    custom->width  = vp.full_width  - custom->x;
    custom->height = vp.full_height - custom->y;

When the active video driver is gdi (no viewport_info),
vp.full_width / vp.full_height are stack garbage, and the
resulting custom_viewport_width / _height get persisted to
retroarch.cfg on shutdown.  Restarting then reads back values
like custom_viewport_width=57874, custom_viewport_height=32759,
custom_viewport_x=972119847 — and aspect ratio CUSTOM uses those
values for the game frame's destination rect, which renders the
game effectively invisible until the user manually picks a
different aspect ratio.

This patch:

  gfx/video_driver.c

    Zero the output struct in video_driver_get_viewport_info
    when the driver doesn't implement viewport_info, or when
    the viewport pointer itself is NULL.  Callers that
    correctly check the return value see no behavioural change;
    callers that ignore it now read all-zeros rather than stack
    garbage, which degenerates predictably (vp.full_width = 0
    means custom->width gets set to -custom->x — bounded —
    instead of unbounded).  This bounds the failure mode for
    every driver currently lacking viewport_info, not just gdi.

  gfx/drivers/gdi_gfx.c

    Implements viewport_info properly.  Mirrors
    d3d8_viewport_info: copy gdi->vp into the output struct.
    With this in place, gdi_t's existing vp tracking (already
    maintained for letterbox/pillarbox and overlay rendering)
    becomes the source of truth that the menu reads back when
    the user adjusts the custom viewport.
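
The contract change can be sketched in isolation.  The types and names below (sketch_viewport_t, sketch_driver_t, sketch_get_viewport_info, sketch_gdi_viewport_info) are hypothetical stand-ins, not the signatures in gfx/video_driver.c; the point is only that the output struct is zeroed whenever the hook is missing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct
{
   int      x, y;
   unsigned width, height;
   unsigned full_width, full_height;
} sketch_viewport_t;

typedef struct
{
   /* Optional hook; NULL on drivers that do not implement it. */
   void (*viewport_info)(void *data, sketch_viewport_t *vp);
} sketch_driver_t;

/* Wrapper: callers that ignore the return value read zeros,
 * never uninitialised stack memory. */
static bool sketch_get_viewport_info(const sketch_driver_t *drv,
      void *data, sketch_viewport_t *vp)
{
   if (!vp)
      return false;
   if (!drv || !drv->viewport_info)
   {
      memset(vp, 0, sizeof(*vp));
      return false;
   }
   drv->viewport_info(data, vp);
   return true;
}

/* Driver-side hook in the style described for gdi: copy the
 * driver's tracked viewport into the caller's struct. */
static void sketch_gdi_viewport_info(void *data, sketch_viewport_t *vp)
{
   *vp = *(const sketch_viewport_t*)data;
}
```
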
Two related fixes for the Vulkan HDR pipeline, both surfaced by toggling
HDR mode (Off/HDR10/scRGB) in the menu while running fullscreen on
Win11 + NVIDIA.

1) vulkan_create_swapchain never set VK_CTX_FLAG_HDR_SCRGB on the scRGB
   success path - it only ever cleared it. Downstream code that branches
   on this flag would therefore behave as if scRGB output was inactive
   even when the swapchain had been created with R16G16B16A16_SFLOAT +
   VK_COLOR_SPACE_EXTENDED_SRGB_LINEAR_EXT.

2) swapchain_semaphores[] is populated lazily by vulkan_acquire_next_image,
   one slot at a time, only for the image actually returned by
   vkAcquireNextImageKHR. On the swapchain recreate path
   vulkan_destroy_swapchain memsets the entire array to zero, and at
   least one path through the recreate reaches vulkan_present with
   current_swapchain_index pointing at an image whose slot has not yet
   been re-populated. vkQueuePresentKHR is then handed VK_NULL_HANDLE in
   pWaitSemaphores, which the NVIDIA ICD dereferences under real-fullscreen
   on Win11, causing a segfault. Other drivers (AMD, Intel, MoltenVK,
   and NVIDIA in windowed-fullscreen) tolerate the NULL silently, which
   is why this only reproduces on one configuration.

   Fix by pre-creating all per-image present semaphores immediately
   after vkGetSwapchainImagesKHR, so no acquire ordering can leave the
   array in a half-populated state. The existing lazy-allocate in
   vulkan_acquire_next_image becomes a no-op (slot is already non-NULL)
   but is left in place as a safety net.

Reproducer: fullscreen Vulkan + NVIDIA RTX 5090 + Win11, toggle HDR mode
in the menu. Crashes inside vkQueuePresentKHR on the second frame after
reinit, when the acquire returns swapchain image index 1 for the first
time on the new swapchain.

When RGUI is up over a running core with menu_rgui_transparency
enabled, d3d9 / d3d8 / etc render the chequer pattern with
partial alpha so the game shows through.  GDI was rendering the
chequer opaque against solid black, hiding the game completely.

Two changes:

  Step 4b (game-frame underlay)

    The condition that skipped Step 4b whenever menu_frame was
    set was based on the assumption that RGUI uses the legacy
    non-textured (gdi->bmp + WM_PAINT) path.  That stopped being
    true once widgets / overlays could force RGUI through the
    bmp_menu path — which is the common case (any user with the
    FPS widget or a touch overlay enabled).  The condition is
    relaxed to "menu is alive AND content is loaded", so RGUI
    gets the same game-frame underlay that XMB / Ozone /
    MaterialUI already do.

  Step 9 RGUI bmp_menu branch

    StretchDIBits with SRCCOPY discards the alpha and overwrites
    the underlay.  Replaced with gdi_blit_rgui_alpha, a new
    helper that:

      - Allocates a 32-bit BGRA scratch DIB section sized to
        RGUI's frame.
      - Converts the 16-bit RGBA4444 source to BGRA32
        premultiplied (alpha lives in bits 0-3 of the RGBA4444
        word; argb32_to_rgba4444 in rgui.c is the producer).
      - AlphaBlend-stretches the scratch DIB over bmp_menu with
        AC_SRC_OVER + AC_SRC_ALPHA.

    On Win95 (no AlphaBlend) the helper is compiled out and the
    code falls back to the existing opaque StretchDIBits path,
    so transparency degrades to solid backgrounds — same
    behaviour as platforms RGUI itself doesn't consider
    transparency-capable (ps2, sdl_dingux, etc).

    The opaque path is also still used for non-RGUI 16-bit
    sources and for non-16-bit sources, which keeps the core
    game frame on its existing fast SRCCOPY path.

The conversion is per-pixel software, but RGUI frames are small
(256x192 to 512x480 typical) and only one composite happens per
frame.  Caching the scratch DIB on gdi_t is a possible future
optimisation; left out for now since correctness was the goal
and the per-frame cost is negligible at typical RGUI
resolutions.
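
The per-pixel conversion can be sketched as follows.  Only the position of alpha (bits 0-3) comes from the description above; the R/G/B nibble order, the 0xAARRGGBB output packing, and the function name are assumptions for illustration.  The premultiply uses the round-to-nearest `(x + 127) / 255` form.

```c
#include <stdint.h>

/* Hypothetical converter: one RGBA4444 word to a premultiplied
 * 32-bit pixel packed as 0xAARRGGBB (B,G,R,A in little-endian memory).
 * Assumed source layout: R in bits 12-15, G in 8-11, B in 4-7,
 * A in 0-3. */
static uint32_t sketch_rgba4444_to_bgra32_premul(uint16_t px)
{
   /* Expand each 4-bit channel to 8 bits: 0xF * 17 == 255. */
   uint32_t r = ((px >> 12) & 0xF) * 17;
   uint32_t g = ((px >>  8) & 0xF) * 17;
   uint32_t b = ((px >>  4) & 0xF) * 17;
   uint32_t a = ( px        & 0xF) * 17;

   /* Premultiply with round-to-nearest. */
   r = (r * a + 127) / 255;
   g = (g * a + 127) / 255;
   b = (b * a + 127) / 255;

   return (a << 24) | (r << 16) | (g << 8) | b;
}
```

A fully transparent source word maps to 0, so AlphaBlend leaves the game-frame underlay untouched at that pixel.
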
The four AlphaBlend helper paths were allocating a fresh DIB
section on every call:

  - gdi_blit_rgui_alpha            (RGUI alpha composite, per frame)
  - gdi_blit_texture_modulated     (tinted icon, per draw)
  - gfx_display_gdi_draw gradient  (per-vertex colour ramp, per draw)
  - gfx_display_gdi_draw 1x1       (translucent solid colour, per draw)

CreateDIBSection / DeleteObject is a kernel-side syscall pair —
allocates committed pages, populates the BITMAPINFOHEADER,
registers a kernel handle, and on free does the inverse.  For
Ozone, gfx_display_ctx_gdi_draw is called hundreds of times per
frame; the bulk of those go through the 1x1 or gradient paths.

Cache three scratch DIBs on gdi_t (the gradient and
texture-modulated paths share one):

  - scratch_1x1: fixed 1x1 BGRA, allocated lazily on first use,
    held for the lifetime of gdi_t.  Used by the translucent
    solid-quad path: rewrite the single pixel each call instead
    of recreating the whole DIB.
  - scratch_quad: variable, grow-only.  Shared by the gradient
    and texture-modulated paths.  When a draw asks for w x h
    pixels and the existing DIB is at least that big, it's
    reused; otherwise we DeleteObject + CreateDIBSection at the
    new max dimension.  We never shrink, since the whole point
    is to amortise the allocation cost across frames.
  - scratch_rgui: variable, grow-only.  Used by the RGUI alpha
    path.  Kept separate from scratch_quad so a frame that
    composites RGUI AND draws gradient quads doesn't thrash one
    shared slot back and forth.

The "DIB might be larger than the request" wrinkle: the inner
loops in the gradient / tint / RGUI paths now use the cached
DIB's actual width as their stride (gdi->scratch_*_w), and the
downstream AlphaBlend / BitBlt source rect is the requested
w x h sub-rect — so leftover stale pixels in the unused tail are
never sampled.
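
A portable sketch of the grow-only policy and the stride rule follows.  calloc stands in for CreateDIBSection and free for DeleteObject; sketch_scratch_t and the function names are hypothetical and only illustrate the bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
   uint32_t *pixels;
   unsigned  w, h;   /* actual allocated size, never shrinks */
} sketch_scratch_t;

/* Reuse the buffer when it is big enough; otherwise reallocate at
 * the per-axis maximum of old and requested size.  Old contents are
 * discarded, as with a fresh DIB section. */
static bool sketch_scratch_ensure(sketch_scratch_t *s, unsigned w, unsigned h)
{
   unsigned  new_w, new_h;
   uint32_t *p;

   if (!w || !h)
      return false;
   if (s->pixels && s->w >= w && s->h >= h)
      return true;

   new_w = (w > s->w) ? w : s->w;
   new_h = (h > s->h) ? h : s->h;
   p     = (uint32_t*)calloc((size_t)new_w * new_h, sizeof(uint32_t));
   if (!p)
      return false; /* caller bails out of the draw */

   free(s->pixels);
   s->pixels = p;
   s->w      = new_w;
   s->h      = new_h;
   return true;
}

/* Write a w x h sub-rect; the row stride is the allocated width. */
static void sketch_scratch_fill(sketch_scratch_t *s,
      unsigned w, unsigned h, uint32_t colour)
{
   unsigned x, y;
   for (y = 0; y < h; y++)
      for (x = 0; x < w; x++)
         s->pixels[(size_t)y * s->w + x] = colour;
}

static void sketch_scratch_release(sketch_scratch_t *s)
{
   free(s->pixels);
   s->pixels = NULL;
   s->w      = 0;
   s->h      = 0;
}
```

Indexing with the requested width instead of s->w would shear the image as soon as the cached buffer is wider than the current request.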

Three new helpers:

  gdi_ensure_scratch_quad / gdi_ensure_scratch_rgui:
    Grow-or-reuse a variable-sized scratch DIB.  Return false on
    allocation failure; callers bail out of the draw (matching
    the previous CreateDIBSection-failure behaviour).

  gdi_ensure_scratch_1x1:
    Lazy first-time allocation of the fixed 1x1 DIB.

  gdi_release_scratch:
    Tear down all three.  Called from gdi_free, before texDC is
    destroyed (the scratch DIBs may be selected into texDC at
    free time).

No visual change intended.  All four paths produce exactly the
same pixels into the same destination rect — the only thing
that changes is the lifetime of the source DIB section.

The gradient path in gfx_display_gdi_draw computed every pixel
through a doubly-nested bilinear interpolation loop:

  for each row
    compute top-bottom interp factor
    for each column
      compute left-right interp factor
      blend TL/TR (top edge), blend BL/BR (bottom edge)
      blend the top-edge and bottom-edge results vertically
      premultiply if not all_opaque
      store

In practice almost every gradient the menu and widget code emits
is one of two specific shapes:

  - vertical-only   (TL == TR and BL == BR): header strips,
    drop shadows, sidebar fades.  Every column is identical;
    every row is a uniform colour.
  - horizontal-only (TL == BL and TR == BR): rarer, but used for
    e.g. some progress / focus indicators.  Every row is
    identical; colour varies across columns only.

Detect those cases and collapse the doubly-nested loop:

  vertical-only: compute the row colour once per row (one
    interpolation per channel), then fill the row width with that
    pixel.  For a 600x80 vertical gradient that's ~80 pixel
    computes plus 80 row fills, instead of ~48000 pixel computes.

  horizontal-only: compute the first row pixel-by-pixel, memcpy
    it to every subsequent row.

The general 4-corner bilinear path stays as the fallback for
anything that doesn't fit either shape — moved into its own
else branch with no behavioural change relative to the previous
implementation.

Output is byte-identical to the bilinear path for both 1D
shapes (the math reduces to the same formula when the redundant
dimension's factors cancel), so this is purely a speed
optimisation with no visual change.
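
The shape detection and the byte-identical claim can be checked with a small stand-alone sketch.  The function names are hypothetical, the interpolation formula is one plausible choice rather than the driver's exact arithmetic, and the premultiply step is left out to keep the example short.

```c
#include <stdint.h>
#include <string.h>

/* Per-channel linear blend of two packed 32-bit pixels, f in 0..255. */
static uint32_t sketch_lerp_px(uint32_t a, uint32_t b, uint32_t f)
{
   uint32_t out = 0;
   int shift;
   for (shift = 0; shift < 32; shift += 8)
   {
      uint32_t ca = (a >> shift) & 0xFF;
      uint32_t cb = (b >> shift) & 0xFF;
      out |= ((ca * (255 - f) + cb * f) / 255) << shift;
   }
   return out;
}

/* Interpolation factor 0..255 for index i of n; 0 when n <= 1. */
static uint32_t sketch_factor(unsigned i, unsigned n)
{
   return (n > 1) ? (i * 255u) / (n - 1) : 0;
}

/* General 4-corner bilinear path. */
static void sketch_gradient_general(uint32_t *dst, unsigned stride,
      unsigned w, unsigned h,
      uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br)
{
   unsigned x, y;
   for (y = 0; y < h; y++)
   {
      uint32_t fy = sketch_factor(y, h);
      for (x = 0; x < w; x++)
      {
         uint32_t fx  = sketch_factor(x, w);
         uint32_t top = sketch_lerp_px(tl, tr, fx);
         uint32_t bot = sketch_lerp_px(bl, br, fx);
         dst[y * stride + x] = sketch_lerp_px(top, bot, fy);
      }
   }
}

/* Detects the two 1D shapes and collapses the nested loop. */
static void sketch_gradient_fast(uint32_t *dst, unsigned stride,
      unsigned w, unsigned h,
      uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br)
{
   unsigned x, y;
   if (tl == tr && bl == br)
   {
      /* Vertical-only: one colour per row, then fill the row. */
      for (y = 0; y < h; y++)
      {
         uint32_t row = sketch_lerp_px(tl, bl, sketch_factor(y, h));
         for (x = 0; x < w; x++)
            dst[y * stride + x] = row;
      }
   }
   else if (tl == bl && tr == br)
   {
      /* Horizontal-only: compute row 0, copy it to the other rows. */
      if (!h)
         return;
      for (x = 0; x < w; x++)
         dst[x] = sketch_lerp_px(tl, tr, sketch_factor(x, w));
      for (y = 1; y < h; y++)
         memcpy(dst + y * stride, dst, w * sizeof(uint32_t));
   }
   else
      sketch_gradient_general(dst, stride, w, h, tl, tr, bl, br);
}
```

With this formula, blending a colour with itself returns it unchanged, which is why the redundant dimension drops out and both fast paths match the general path exactly.
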
The hot pixel paths in gfx_display_gdi_draw and gdi_font_render_line
do many `(uint32_t)x / 255u` operations per pixel — that's a 20-30
cycle integer divide on x86 vs a few cycles for shift+add.  For a
typical Ozone-with-widgets frame:

  - General 4-corner gradient: 14 divides per pixel.
  - 1D gradients (vertical/horizontal): 4 divides per row/column,
    plus 3 per non-opaque pixel.  Less hot since the previous
    commit collapsed those to 1D loops, but still worth a free
    win.
  - Tinted-glyph font composite: 4 divides per glyph pixel.

Add a GDI_DIV255 macro:

  #define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

Verified bit-exact equivalent of `(uint32_t)x / 255u` for every
input in [0, 255*255 = 65025] — a brute-force comparison against
integer division across all 65026 values produces zero diffs.
That's exactly the input range that products of two 8-bit values
land in, which is what every divide-by-255 site here computes.
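
The brute-force comparison is easy to reproduce.  The macro below is copied from the text; the counting function sketch_count_div255_mismatches is a hypothetical harness written for this example.

```c
#include <stdint.h>

#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

/* Counts inputs in [0, 255*255] where the shift+add form differs
 * from true integer division by 255. */
static unsigned sketch_count_div255_mismatches(void)
{
   uint32_t x;
   unsigned bad = 0;

   for (x = 0; x <= 255u * 255u; x++)
   {
      if (GDI_DIV255(x) != x / 255u)
         bad++;
   }
   return bad;
}
```

The range matters: the identity is only claimed for products of two 8-bit values, so the macro should not be applied to wider intermediates.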

Applied at every hot per-pixel /255 site:

  - Gradient bilinear (general 4-corner path): 14 sites per
    pixel.
  - 1D gradient paths (vertical-only, horizontal-only): 4 sites
    per row/column plus 3 sites per non-opaque pixel.
  - Tinted-glyph font scratch composite: 4 sites per pixel.
  - 1x1 translucent-solid premultiply: 3 sites per draw.
  - Texture-modulated tint (out_a only): 1 site per pixel.
  - Font line outer premultiply: 3 sites per line.
  - gdi_load_texture / gdi_overlay_load: 3 sites per non-opaque
    pixel.  Load-time only, but free to apply for consistency.

Deliberately NOT changed:

  - The `/ (255u * 255u)` divides for out_r/g/b in
    gdi_blit_texture_modulated.  Collapsing those to two
    sequential GDI_DIV255 calls would introduce up to 1 LSB of
    rounding error compared to the single divide, since
    (a/255)*(b/255) has a different rounding boundary than
    (a*b)/(255*255).  The cost saving isn't worth a visible
    drift in tinted-icon pixels.
  - The `(x + 127) / 255` rounded form in gdi_blit_rgui_alpha.
    That's deliberately round-to-nearest rather than truncate,
    which GDI_DIV255 doesn't reproduce.  RGUI's per-frame cost
    is dominated by syscall / blit overhead, not the divides.
  - The `(iy * 255u) / (dst_h - 1)` interp-factor divides.
    Divisor varies per draw; not a constant-255 case.
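
The 1 LSB drift from chaining two divides can be demonstrated with a short sketch.  The helper names are hypothetical, and the three-factor product is a simplified model of the modulated-tint arithmetic rather than the driver's exact expression.

```c
#include <stdint.h>

#define GDI_DIV255(x) ((((x) + 1) + ((x) >> 8)) >> 8)

/* Single divide by 255*255, as kept in the tint path. */
static uint32_t sketch_single_divide(uint32_t a, uint32_t b, uint32_t c)
{
   return (a * b * c) / (255u * 255u);
}

/* Two sequential divide-by-255 steps, truncating after each. */
static uint32_t sketch_two_step(uint32_t a, uint32_t b, uint32_t c)
{
   return GDI_DIV255(GDI_DIV255(a * b) * c);
}

/* Largest difference between the two forms over all 8-bit triples. */
static uint32_t sketch_max_two_step_error(void)
{
   uint32_t a, b, c, worst = 0;

   for (a = 0; a < 256; a++)
      for (b = 0; b < 256; b++)
         for (c = 0; c < 256; c++)
         {
            uint32_t d = sketch_single_divide(a, b, c)
                       - sketch_two_step(a, b, c);
            if (d > worst)
               worst = d;
         }
   return worst;
}
```

For example, with all three factors at 200 the single divide gives 123 while the two-step form gives 122, because the first truncation discards a fraction that the second multiply then scales up.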

No visual change intended.  Output is byte-identical to the
divide-based code at every converted site.

The bmp_menu present path was creating a temporary DC, selecting
the DIB section into it, BitBlt-ing through that DC to the
window DC, then tearing the temporary DC down — every frame:

   HDC menu_dc = CreateCompatibleDC(gdi->winDC);
   if (menu_dc) {
      HBITMAP menu_old = (HBITMAP)SelectObject(menu_dc, gdi->bmp_menu);
      GdiFlush();
      BitBlt(gdi->winDC, ..., menu_dc, ..., SRCCOPY);
      SelectObject(menu_dc, menu_old);
      DeleteDC(menu_dc);
   }

SetDIBitsToDevice does the same thing — copy a DIB to a window
DC at 1:1 scale — but takes the raw pixel pointer directly,
skipping the temporary DC framework entirely.  We already keep a
uint32_t* into the DIB pixels (gdi->menu_pixels, populated by
gdi_ensure_menu_surface) so the substitution is straightforward:

   GdiFlush();
   SetDIBitsToDevice(gdi->winDC,
         0, 0, surface_width, surface_height,
         0, 0, 0, surface_height,
         gdi->menu_pixels, &bmi, DIB_RGB_COLORS);

Per frame this saves a CreateCompatibleDC, two SelectObjects,
and a DeleteDC syscall.  The pixel-copy bandwidth is unchanged
(the GDI implementation still has to move surface_width *
surface_height * 4 bytes from system RAM to the window's
displayable surface), but the API path is shorter and avoids
some of GDI's DC-based source-routing overhead.

bmp_menu is allocated at exactly the window surface size so
there's no scaling involved, which is the key precondition for
SetDIBitsToDevice (the no-scaling, simpler cousin of
StretchDIBits).  The DIB is top-down (biHeight is negative in
gdi_ensure_menu_surface) and SetDIBitsToDevice supports
top-down DIBs natively, so the bit layout matches without
needing to flip rows.

Compatibility: SetDIBitsToDevice is part of the original Win32
API.  Same availability surface as BitBlt itself — Win95, NT
3.5+, Win32s.  No regression vs the previous path.

The legacy bmp + WM_PAINT + StretchBlt route (used when no
widgets, no textured menu, no OSD/stats) is untouched — that
path actually does scale (small core frame to window-sized
viewport), so SetDIBitsToDevice doesn't fit there.
@pull pull Bot locked and limited conversation to collaborators Apr 30, 2026
@pull pull Bot added the ⤵️ pull label Apr 30, 2026
@pull pull Bot merged commit ff65ab1 into Alexandre1er:master Apr 30, 2026