Windows fixes #1103
Move initialize_threading() to run before pool_mutex is accessed in lwt_unix_start_job, rather than only inside the DETACH/SWITCH case. On Windows, CRITICAL_SECTION must be explicitly initialized via InitializeCriticalSection before use. Unlike pthreads where a zero-initialized mutex (PTHREAD_MUTEX_INITIALIZER) is valid, a zero-initialized CRITICAL_SECTION is invalid. The previous code could access pool_mutex before initialize_threading() was called, causing crashes or undefined behavior on Windows when the first job used async_method != NONE. initialize_threading() is idempotent (guarded by threading_initialized flag), so calling it earlier is safe and has no effect on Unix platforms.
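A minimal sketch of the ordering this commit describes, not the actual lwt source: `sketch_mutex`, `sketch_start_job`, `init_calls` and `queued_jobs` are hypothetical names introduced for illustration, and the non-Windows branch substitutes a pthread mutex so the snippet builds anywhere. It assumes the first call happens on a single thread, since a plain flag guard is not itself thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef CRITICAL_SECTION sketch_mutex;
#define sketch_mutex_init(m)   InitializeCriticalSection(m)
#define sketch_mutex_lock(m)   EnterCriticalSection(m)
#define sketch_mutex_unlock(m) LeaveCriticalSection(m)
#else
#include <pthread.h>
typedef pthread_mutex_t sketch_mutex;
#define sketch_mutex_init(m)   pthread_mutex_init((m), NULL)
#define sketch_mutex_lock(m)   pthread_mutex_lock(m)
#define sketch_mutex_unlock(m) pthread_mutex_unlock(m)
#endif

/* Zero-initialised static storage: not a usable CRITICAL_SECTION yet. */
static sketch_mutex pool_mutex;
static int threading_initialized = 0;
static int init_calls = 0;   /* illustration only: counts real initialisations */
static int queued_jobs = 0;  /* illustration only: state protected by pool_mutex */

/* Idempotent: only the first call initialises the mutex. */
static void initialize_threading(void) {
  if (threading_initialized) return;
  sketch_mutex_init(&pool_mutex);
  init_calls++;
  threading_initialized = 1;
}

/* Hypothetical stand-in for lwt_unix_start_job. */
static void sketch_start_job(int is_async) {
  /* Initialise before the first lock, whatever the async method is. */
  initialize_threading();
  assert(threading_initialized);

  sketch_mutex_lock(&pool_mutex);
  if (is_async) queued_jobs++;
  sketch_mutex_unlock(&pool_mutex);
}
```

Calling `sketch_start_job` repeatedly initialises the mutex once; later calls only pay for the flag check.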
raphael-proust
left a comment
thanks a lot for the contribution that's outside of my area, it's very valuable <3
i'll try to get someone to read the windows-specific C code before merging
- wait_read ch >>= fun () ->
+ (* On Windows, select() doesn't work with pipe handles, so skip
+    wait_read and let the worker thread handle blocking directly. *)
+ (if Sys.win32 then Lwt.return_unit else wait_read ch) >>= fun () ->
somewhat nitpicky but also i want to keep the code very readable to ease maintenance…
I think i'd rather have the branch hoisted up so that the AST is flatter (if wider)
Lazy.force ch.blocking >>= function
| true when Sys.win32 ->
  (* comment *)
  windows code
| true ->
  linux code true
| false ->
  linux code false
I've refactored this usage pattern.
}
}

#endif
I don't have the bandwidth to review this rn. I'm completely unfamiliar with the specifics of windows. I'll try to ping someone who can help with that part.
if (async_method != LWT_UNIX_ASYNC_METHOD_NONE) {
  initialize_threading();
}
This was at least morally wrong on Unix as well, I think - those variables aren't using the static initialisers either?
dra27
left a comment
I've had a quick look - the first commit fixes a bug introduced in #1094, which seems worth separating.
I'm not convinced by the arguments given for the other commits:
- I don't understand the issue or fix being addressed in the handle inheritance race (but as this code very closely mirrors code in OCaml, it's certainly interesting) - but the description seems to refer to a pattern that's not actually in use (where is SetHandleInformation being used and why doesn't it actually turn off handle inheritance, and where's the mention of the fact that stdhandles inherit regardless of setting?)
- The waitpid commit has a suspicious lack of the subtleties referred to in ocaml/ocaml#11021 and related issues (the PID implementation is a nightmare...). It might be that the commit is a better frying pan than the current one, though!
- Surely the wait_read/wait_write change blocks concurrency, or at least risks hanging loads of threads in read syscalls (less of an issue on Windows than Unix, perhaps, but still not great?)? Note that OCaml's win32unix implementation can block on pipes and other non-socket kinds of fd... I'd kinda expect lwt's implementation here to specialise on the handle type (i.e. use select when it's a handle and otherwise roll a call for WaitForMultipleObjects or some such as appropriate when they're not sockets, using the win32unix interface to see which is which)
@@ -101,7 +158,7 @@ CAMLprim value lwt_process_create_process(value prog, value cmdline, value env,

  flags |= CREATE_UNICODE_ENVIRONMENT;
  if (! CreateProcess(progs, cmdlines, NULL, NULL, TRUE, flags,
bInheritHandles is still TRUE, which leaves me wondering what this commit is actually fixing?
Switch from STARTUPINFO to STARTUPINFOEX with PROC_THREAD_ATTRIBUTE_HANDLE_LIST to explicitly specify which handles the child process should inherit. Previously, bInheritHandles=TRUE caused all inheritable handles in the process to be inherited by every child. When spawning concurrent child processes, this leaks unrelated handles — e.g. child B inherits child A's pipe handles, preventing EOF on A's pipes when A exits. On Unix, O_CLOEXEC prevents this; on Windows, the equivalent is PROC_THREAD_ATTRIBUTE_HANDLE_LIST to restrict inheritance to only the intended stdin/stdout/stderr handles. The handle list is deduplicated since UpdateProcThreadAttribute requires unique handle values (e.g. when stdout and stderr point to the same handle). Also fixes the comment to correctly reference fd0, fd1, fd2 instead of fd1, fd2, fd3.
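The deduplication step mentioned in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `sketch_handle` is a hypothetical stand-in for the Windows `HANDLE` type so the snippet compiles on any platform, and `dedup_handles` is an invented name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for HANDLE, so the sketch is portable. */
typedef void *sketch_handle;

/* Copy the three stdio handles into `out`, keeping only the first
   occurrence of each value. Returns the number of unique handles. */
static size_t dedup_handles(const sketch_handle in[3], sketch_handle out[3]) {
  size_t n = 0;
  for (size_t i = 0; i < 3; i++) {
    int seen = 0;
    for (size_t j = 0; j < n; j++) {
      if (out[j] == in[i]) { seen = 1; break; }
    }
    if (!seen) out[n++] = in[i];
  }
  assert(n >= 1 && n <= 3);
  return n;
}
```

On Windows, the resulting array and a byte size of `n * sizeof(HANDLE)` would then be what is passed to UpdateProcThreadAttribute for PROC_THREAD_ATTRIBUTE_HANDLE_LIST. A quadratic scan is fine here because there are never more than three entries.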
Add a new Windows-specific waitpid job (windows_waitpid_job.c) that uses OpenProcess + WaitForSingleObject + GetExitCodeProcess instead of the POSIX-style Unix.waitpid which doesn't work properly on Windows. The new implementation:
- Runs in a worker thread via the Lwt job system for proper async behavior
- Supports WNOHANG via a 0-timeout WaitForSingleObject call
- Returns (0, WEXITED 0) for WNOHANG when the process is still running, matching the POSIX convention
- Returns (pid, WEXITED exit_code) when the process has exited
The OCaml side dispatches to _win32_waitpid on Windows (which runs the new C job) instead of _waitpid (which called Unix.waitpid synchronously).
On Windows, select() only works with socket handles, not with pipe handles or regular file handles. Calling wait_read or wait_write on a blocking pipe fd would fail because the underlying select() call cannot handle non-socket HANDLEs. Skip the wait_read/wait_write calls on Windows (guarded by Sys.win32) and let the worker thread handle blocking directly. This is safe because Sys.win32 is a compile-time constant, so there is no runtime overhead on Unix platforms. Affected call sites: read, pread, read_bigarray, write, pwrite, write_bigarray (6 total).
1cc72ec to
6413e90
Thanks for the detailed review. I really appreciate the expertise here.
Handle inheritance ( The concrete problem: with the old code, I've also noticed a minor issue: when all three stdio handles are waitpid: Fair point about the PID reuse issue. Looking at this again, my motivation (ocluster/obuilder on Windows) ended up not using it. It exclusively uses
On the concurrency concern: skipping The guard is slightly broader than necessary as it is applied to all blocking fds on Windows, not just pipes. In practice, this is fine because Lwt creates sockets as non-blocking, so they already take the You mentioned using |
first commit is included in ocaml/opam-repository#29752
I have been working on ocluster/obuilder on Windows and have run into several difficulties with lwt. I would like to contribute these fixes back to the codebase. I have divided it into four separate commits to make it easier to review. Each commit has a detailed description.
I note that some of the code generation was done using an AI agent. However, I have reviewed every line of these commits and tested them on both Windows 2025 and Windows 2019.