CommandQueueMT: Make re-entrant again + Fix multiple flushers case #113802

RandomShaper · 2025-12-09T13:32:52Z

Let's see if we finally get a functional and reliable CommandQueueMT.

In order to allow _flush() to be called from multiple threads, we were considering making the mutex recursive. However, even after removing the unlock allowance logic that is indeed not needed here, there's still some condition variable logic that needs the mutex to be binary.

In the end, I've simply added a thread-local variable for the current thread to know if it should attempt to lock.

Moreover, loosely related to re-entrancy, in the case of another thread requesting a flush while another thread performing one, I've added code so the former waits for the latter. Otherwise, we would be breaking the assumption that after calling a flush function all the commands added up to that point have been run.

☣ Disclaimer: untested. ☣

CC @brycehutchings

Fixes #113627.

core/templates/command_queue_mt.h

brycehutchings · 2025-12-09T18:57:43Z

This makes sense to me, and I love that unlock allowance zone isn't needed so this file continues to trend toward being simpler 🥳

Ivorforce · 2025-12-09T19:00:48Z

Note that thread_local can be pretty expensive. But it's more important that CommandQueueMT is reliable than that it is performant (plus, I'm not sure if it matters in practice here, since the cost might be less than all the mutexes, context switches and the task itself anyway).

dsnopek · 2025-12-09T19:15:08Z

Thanks! I'll do some testing with multi-threaded rendering in the context of XR

dsnopek

I some testing of this PR with my XR fork of the GDQuest TPS demo and multi-threaded rendering enabled - it worked fine! I also did some Perfetto traces (on both Samsung Galaxy XR and Meta Quest 3) to make sure that it still looks like it did back on #112506 and that was good too :-)

RandomShaper · 2025-12-10T10:25:52Z

Note that thread_local can be pretty expensive. But it's more important that CommandQueueMT is reliable than that it is performant (plus, I'm not sure if it matters in practice here, since the cost might be less than all the mutexes, context switches and the task itself anyway).

Good point. I also considered using an atomic flusher thread id plus some compare-and-exchange logic (I'd swear I saw something like that committed recently, but I couldn't find it). However, I agree that either approach is dwarfed by everything else. Not a bad thing to take a deeper look into at some point, though.

@dsnopek, thank you again for testing!

PS: I'm happy we found the right approach in the end, but I have to admit I regret having been so "trigger-happy," which led to so much iteration.

mihe

I can't speak to the code too much, but I can confirm that this does indeed fix #113627 when testing the MRP in #113620, without #113622.

brycehutchings · 2025-12-09T18:37:28Z

core/templates/command_queue_mt.h

+
 		MutexLock lock(mutex);

 		if (unlikely(flush_read_ptr)) {


Is this check needed at all? When I look at how flush_read_ptr interacts with the mutex, it looks like it is always set to 0 before the mutex is released. So any time we get past the mutex being acquired, isn't flush_read_ptr guaranteed to be 0? It seems like it should just flow down to the while loop and run that normally.

This check used to be before the mutex acquire, and there it made sense to avoid the deadlock issue (although it had race condition problems). By moving it to after the mutex, the deadlock issue was introduced, but now the condition should never happen. At least I think so?

The lock guard does temp_unlock() while flush_read_ptr is non-zero, so it's not guaranteed to always be zero when we get past the mutex. See this discussion.

Ah right. Is there a risk that after lock.temp_unlock() but before the no-op command is queued that the other thread finishes flushing and then this stalls forever?

Even if so though, in practice this might mean being blocked for one frame?

Repiteo · 2025-12-11T00:17:18Z

Thanks!

RandomShaper requested a review from mihe December 9, 2025 13:32

RandomShaper requested review from a team as code owners December 9, 2025 13:32

RandomShaper added bug topic:core cherrypick:4.4 Considered for cherry-picking into a future 4.4.x release cherrypick:4.5 Considered for cherry-picking into a future 4.5.x release labels Dec 9, 2025

RandomShaper added this to the 4.6 milestone Dec 9, 2025

akien-mga requested a review from dsnopek December 9, 2025 13:38

blueskythlikesclouds reviewed Dec 9, 2025

View reviewed changes

core/templates/command_queue_mt.h Show resolved Hide resolved

mihe reviewed Dec 9, 2025

View reviewed changes

core/templates/command_queue_mt.h Show resolved Hide resolved

CommandQueueMT: Make re-entrant again + Fix multiple flushers case

824287e

RandomShaper force-pushed the make_cmd_queue_reentrant_again branch from 7380544 to 824287e Compare December 9, 2025 15:09

dsnopek approved these changes Dec 9, 2025

View reviewed changes

mihe approved these changes Dec 10, 2025

View reviewed changes

brycehutchings approved these changes Dec 10, 2025

View reviewed changes

Repiteo merged commit fe638bc into godotengine:master Dec 11, 2025
20 checks passed

RandomShaper deleted the make_cmd_queue_reentrant_again branch December 11, 2025 06:19

Uh oh!

CommandQueueMT: Make re-entrant again + Fix multiple flushers case #113802

CommandQueueMT: Make re-entrant again + Fix multiple flushers case #113802

Conversation

RandomShaper commented Dec 9, 2025

Uh oh!

Uh oh!

Uh oh!

brycehutchings commented Dec 9, 2025

Uh oh!

Ivorforce commented Dec 9, 2025

Uh oh!

dsnopek commented Dec 9, 2025

Uh oh!

dsnopek left a comment

Choose a reason for hiding this comment

Uh oh!

RandomShaper commented Dec 10, 2025

Uh oh!

mihe left a comment

Choose a reason for hiding this comment

Uh oh!

brycehutchings Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

mihe Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

brycehutchings Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

brycehutchings Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Repiteo commented Dec 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants