Skip to content

Windows fixes#1103

Open
mtelvers wants to merge 6 commits into
ocsigen:masterfrom
mtelvers:windows-fixes-v2
Open

Windows fixes#1103
mtelvers wants to merge 6 commits into
ocsigen:masterfrom
mtelvers:windows-fixes-v2

Conversation

@mtelvers
Copy link
Copy Markdown
Contributor

@mtelvers mtelvers commented Mar 3, 2026

I have been working on ocluster/obuilder on Windows and have run into several difficulties with lwt. I would like to contribute these fixes back to the codebase. I have divided it into four separate commits to make it easier to review. Each commit has a detailed description.

I note that some of the code generation was done using an AI agent. However, I have reviewed every line of these commits and tested them on both Windows 2025 and Windows 2019.

Move initialize_threading() to run before pool_mutex is accessed in
lwt_unix_start_job, rather than only inside the DETACH/SWITCH case.

On Windows, CRITICAL_SECTION must be explicitly initialized via
InitializeCriticalSection before use. Unlike pthreads where a
zero-initialized mutex (PTHREAD_MUTEX_INITIALIZER) is valid, a
zero-initialized CRITICAL_SECTION is invalid. The previous code could
access pool_mutex before initialize_threading() was called, causing
crashes or undefined behavior on Windows when the first job used
async_method != NONE.

initialize_threading() is idempotent (guarded by threading_initialized
flag), so calling it earlier is safe and has no effect on Unix platforms.
Copy link
Copy Markdown
Collaborator

@raphael-proust raphael-proust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for the contribution that's outside of my area, it's very valuable <3

i'll try to get someone to read the windows-specific C code before merging

Comment thread src/unix/lwt_unix.cppo.ml Outdated
wait_read ch >>= fun () ->
(* On Windows, select() doesn't work with pipe handles, so skip
wait_read and let the worker thread handle blocking directly. *)
(if Sys.win32 then Lwt.return_unit else wait_read ch) >>= fun () ->
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somewhat nitpicky but also i want to keep the code very readable to ease maintenance…

I think i'd rather have the branch hoisted up so that the AST is flatter (if wider)

Lazy.force ch.blocking >>= function
| true when Sys.win32 ->
  (* comment *)
  windows code
| true ->
  linux code true
| false ->
  linux code false

Copy link
Copy Markdown
Contributor Author

@mtelvers mtelvers Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored this usage pattern.

}
}

#endif
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have the bandwidth to review this rn. I'm completely unfamiliar with the specifics of windows. I'll try to ping someone who can help with that part.

Comment thread src/unix/lwt_unix_stubs.c
if (async_method != LWT_UNIX_ASYNC_METHOD_NONE) {
initialize_threading();
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a fix for #1104 actually

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was at least morally wrong on Unix as well, I think - those variables aren't using the static initialisers either?

Copy link
Copy Markdown
Contributor

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a quick look - the first commit fixes a bug introduced in #1094, which seems worth separating.

I'm not convinced by the arguments given for the other commits:

  • I don't understand the issue or fix being addressed in the handle inheritance race (but as this code very closely mirrors code in OCaml, it's certainly interesting) - but the description seems to refer to a pattern that's not actually in use (where is SetHandleInformation being used and why doesn't it actually turn off handle inheritance, and where's the mention of the fact that stdhandles inherit regardless of setting?)
  • The waitpid commit has a suspicious lack of the subtleties referred to in ocaml/ocaml#11021 and related issues (the PID implementation is a nightmare...). It might be that the commit is a better frying pan than the current one, though!
  • Surely the wait_read/wait_write change blocks concurrency, or at least risks hanging loads of threads in read syscalls (less of an issue on Windows than Unix, perhaps, but still not great?)? Note that OCaml's win32unix implementation can block on pipes and other non-socket kinds of fd... I'd kinda expect lwt's implementation here to specialise on the handle type (i.e. use select when it's a handle and otherwise roll a call for WaitForMultipleObjects or some such as appropriate when they're not sockets, using the win32unix interface to see which is which)

Comment thread src/unix/lwt_unix_stubs.c
if (async_method != LWT_UNIX_ASYNC_METHOD_NONE) {
initialize_threading();
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was at least morally wrong on Unix as well, I think - those variables aren't using the static initialisers either?

@@ -101,7 +158,7 @@ CAMLprim value lwt_process_create_process(value prog, value cmdline, value env,

flags |= CREATE_UNICODE_ENVIRONMENT;
if (! CreateProcess(progs, cmdlines, NULL, NULL, TRUE, flags,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bInheritHandles is still TRUE, which leaves me wondering what this commit is actually fixing?

Switch from STARTUPINFO to STARTUPINFOEX with
PROC_THREAD_ATTRIBUTE_HANDLE_LIST to explicitly specify which handles
the child process should inherit.

Previously, bInheritHandles=TRUE caused all inheritable handles in
the process to be inherited by every child. When spawning concurrent
child processes, this leaks unrelated handles — e.g. child B inherits
child A's pipe handles, preventing EOF on A's pipes when A exits.
On Unix, O_CLOEXEC prevents this; on Windows, the equivalent is
PROC_THREAD_ATTRIBUTE_HANDLE_LIST to restrict inheritance to only the
intended stdin/stdout/stderr handles.

The handle list is deduplicated since UpdateProcThreadAttribute requires
unique handle values (e.g. when stdout and stderr point to the same
handle).

Also fixes the comment to correctly reference fd0, fd1, fd2 instead of
fd1, fd2, fd3.
Add a new Windows-specific waitpid job (windows_waitpid_job.c) that uses
OpenProcess + WaitForSingleObject + GetExitCodeProcess instead of the
POSIX-style Unix.waitpid which doesn't work properly on Windows.

The new implementation:
- Runs in a worker thread via the Lwt job system for proper async behavior
- Supports WNOHANG via a 0-timeout WaitForSingleObject call
- Returns (0, WEXITED 0) for WNOHANG when the process is still running,
  matching the POSIX convention
- Returns (pid, WEXITED exit_code) when the process has exited

The OCaml side dispatches to _win32_waitpid on Windows (which runs the
new C job) instead of _waitpid (which called Unix.waitpid synchronously).
On Windows, select() only works with socket handles, not with pipe
handles or regular file handles. Calling wait_read or wait_write on
a blocking pipe fd would fail because the underlying select() call
cannot handle non-socket HANDLEs.

Skip the wait_read/wait_write calls on Windows (guarded by Sys.win32)
and let the worker thread handle blocking directly. This is safe
because Sys.win32 is a compile-time constant, so there is no runtime
overhead on Unix platforms.

Affected call sites: read, pread, read_bigarray, write, pwrite,
write_bigarray (6 total).
@mtelvers
Copy link
Copy Markdown
Contributor Author

mtelvers commented Apr 10, 2026

Thanks for the detailed review. I really appreciate the expertise here.

pool_mutex fix (commit a3b5e36): I agreed this is a standalone bug fix for the regression introduced in #1094. Happy to split this into its own PR. You're right that it's morally wrong on Unix too. :-)

Handle inheritance (lwt_process_stubs.c): The commit message is poorly worded, sorry about that. The old code didn't use SetHandleInformation; it was more a description of the general problem rather than the specific code path. What's actually happening is that the old code used STARTUPINFO with bInheritHandles=TRUE, which causes all inheritable handles in the process to be inherited by the child. bInheritHandles must remain TRUE for the attribute list to take effect. The reference is https://devblogs.microsoft.com/oldnewthing/20111216-00/?p=8873. I'll reword the commit message to make this clearer.

The concrete problem: with the old code, bInheritHandles=TRUE with plain STARTUPINFO causes every inheritable handle in the process to be inherited by every child. While pipes created with ~cloexec:true start non-inheritable, ensure_inheritable temporarily creates inheritable duplicates for the child's stdio, and any other inheritable handles in the process would also leak. Using PROC_THREAD_ATTRIBUTE_HANDLE_LIST restricts each child to only its intended stdio handles, which is the correct pattern on Windows (equivalent to O_CLOEXEC + posix_spawn on Unix).

I've also noticed a minor issue: when all three stdio handles are NULL/INVALID_HANDLE_VALUE, the attribute list isn't created and EXTENDED_STARTUPINFO_PRESENT isn't set, but cb is still sizeof(STARTUPINFOEX) rather than sizeof(STARTUPINFO). This works in practice but is technically incorrect per the API contract. I'll fix that.

waitpid: Fair point about the PID reuse issue. OpenProcess by PID is inherently racy if the process has exited and the PID has been reused. OCaml's runtime avoids this by caching the process handle from CreateProcess, but Lwt's async job bypasses Unix.waitpid so it can't access that internal cache. There's also a handle leak if the promise is cancelled: the worker thread stays blocked in WaitForSingleObject(INFINITE) with no way to clean up (though this same issue exists in lwt_process_wait_job)

Looking at this again, my motivation (ocluster/obuilder on Windows) ended up not using it. It exclusively uses Lwt_process for process management, which already has a safe wait path via cached handles. I left this in for completeness rather than necessity; however, Lwt_unix.system and Lwt_unix.wait() do use this path, and without it Windows falls back to calling Unix.waitpid synchronously, which blocks the event loop. I've added a comment documenting the PID reuse caveat and pointing callers to Lwt_process for race-free waiting.

wait_read/wait_write: The motivation here is that without this fix, any Lwt_io.read or Lwt_io.write on a pipe channel hangs on Windows. The chain is: Lwt_process.open_process creates pipes, which are marked as blocking on Windows (mk_ch ~blocking:Sys.win32). Reading from them goes through Lwt_io -> Lwt_bytes.read -> Lwt_unix.read_bigarray, which takes the | true -> branch and calls wait_read. That first does a zero-timeout select() check via readable ch. If data is already in the pipe buffer, this may succeed, and the read proceeds fine. But if the pipe is empty, register_action registers the fd with the Lwt engine's select() loop, and select() on Windows only works with sockets, not pipes, so the engine can't monitor the fd and the read hangs. Thus, the bug is intermittent: fast-producing child processes that fill the pipe buffer before Lwt reads hide it; slow producers that haven't written yet expose it. This is the bug I've been looking for for some years!

On the concurrency concern: skipping wait_read doesn't actually lose concurrency here. run_job dispatches the blocking read to a worker thread pool (default size 1000), so multiple concurrent Lwt.t promises each get their own worker thread while the main event loop continues. On Unix, wait_read is an optimised sleep on select() until data is ready, but on Windows with blocking pipe fds, the worker-thread approach is reasonable. The cost is that each pending pipe read/write ties up a pool thread. Under extreme concurrency, this could exhaust the pool, but for normal workloads, the pool is more than adequate.

The guard is slightly broader than necessary as it is applied to all blocking fds on Windows, not just pipes. In practice, this is fine because Lwt creates sockets as non-blocking, so they already take the | false -> branch and aren't affected.

You mentioned using WaitForMultipleObjects for non-socket handles, which would be a better long-term approach. That would be a much larger PR, there's the issue of the 64-handle MAXIMUM_WAIT_OBJECTS limit on the API. A more sophisticated engine backend can always be built later.

@raphael-proust
Copy link
Copy Markdown
Collaborator

first commit is included in ocaml/opam-repository#29752

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants