Skip to content

Conversation

@RandomShaper
Copy link
Member

Let's see if we finally get a functional and reliable CommandQueueMT.

In order to allow _flush() to be called from multiple threads, we were considering making the mutex recursive. However, even after removing the unlock allowance logic that is indeed not needed here, there's still some condition variable logic that needs the mutex to be binary.

In the end, I've simply added a thread-local variable for the current thread to know if it should attempt to lock.

Moreover, loosely related to re-entrancy, in the case of another thread requesting a flush while another thread performing one, I've added code so the former waits for the latter. Otherwise, we would be breaking the assumption that after calling a flush function all the commands added up to that point have been run.

☣ Disclaimer: untested. ☣

CC @brycehutchings

Fixes #113627.

@RandomShaper RandomShaper requested a review from mihe December 9, 2025 13:32
@RandomShaper RandomShaper requested review from a team as code owners December 9, 2025 13:32
@RandomShaper RandomShaper added bug topic:core cherrypick:4.4 Considered for cherry-picking into a future 4.4.x release cherrypick:4.5 Considered for cherry-picking into a future 4.5.x release labels Dec 9, 2025
@RandomShaper RandomShaper added this to the 4.6 milestone Dec 9, 2025
@akien-mga akien-mga requested a review from dsnopek December 9, 2025 13:38
@RandomShaper RandomShaper force-pushed the make_cmd_queue_reentrant_again branch from 7380544 to 824287e Compare December 9, 2025 15:09
@brycehutchings
Copy link
Contributor

This makes sense to me, and I love that unlock allowance zone isn't needed so this file continues to trend toward being simpler 🥳

@Ivorforce
Copy link
Member

Note that thread_local can be pretty expensive. But it's more important that CommandQueueMT is reliable than that it is performant (plus, I'm not sure if it matters in practice here, since the cost might be less than all the mutexes, context switches and the task itself anyway).

@dsnopek
Copy link
Contributor

dsnopek commented Dec 9, 2025

Thanks! I'll do some testing with multi-threaded rendering in the context of XR

Copy link
Contributor

@dsnopek dsnopek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I some testing of this PR with my XR fork of the GDQuest TPS demo and multi-threaded rendering enabled - it worked fine! I also did some Perfetto traces (on both Samsung Galaxy XR and Meta Quest 3) to make sure that it still looks like it did back on #112506 and that was good too :-)

@RandomShaper
Copy link
Member Author

Note that thread_local can be pretty expensive. But it's more important that CommandQueueMT is reliable than that it is performant (plus, I'm not sure if it matters in practice here, since the cost might be less than all the mutexes, context switches and the task itself anyway).

Good point. I also considered using an atomic flusher thread id plus some compare-and-exchange logic (I'd swear I saw something like that committed recently, but I couldn't find it). However, I agree that either approach is dwarfed by everything else. Not a bad thing to take a deeper look into at some point, though.


@dsnopek, thank you again for testing!

PS: I'm happy we found the right approach in the end, but I have to admit I regret having been so "trigger-happy," which led to so much iteration.

Copy link
Contributor

@mihe mihe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't speak to the code too much, but I can confirm that this does indeed fix #113627 when testing the MRP in #113620, without #113622.


MutexLock lock(mutex);

if (unlikely(flush_read_ptr)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check needed at all? When I look at how flush_read_ptr interacts with the mutex, it looks like it is always set to 0 before the mutex is released. So any time we get past the mutex being acquired, isn't flush_read_ptr guaranteed to be 0? It seems like it should just flow down to the while loop and run that normally.

This check used to be before the mutex acquire, and there it made sense to avoid the deadlock issue (although it had race condition problems). By moving it to after the mutex, the deadlock issue was introduced, but now the condition should never happen. At least I think so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock guard does temp_unlock() while flush_read_ptr is non-zero, so it's not guaranteed to always be zero when we get past the mutex. See this discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Is there a risk that after lock.temp_unlock() but before the no-op command is queued that the other thread finishes flushing and then this stalls forever?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if so though, in practice this might mean being blocked for one frame?

@Repiteo Repiteo merged commit fe638bc into godotengine:master Dec 11, 2025
20 checks passed
@Repiteo
Copy link
Contributor

Repiteo commented Dec 11, 2025

Thanks!

@RandomShaper RandomShaper deleted the make_cmd_queue_reentrant_again branch December 11, 2025 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug cherrypick:4.4 Considered for cherry-picking into a future 4.4.x release cherrypick:4.5 Considered for cherry-picking into a future 4.5.x release topic:core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CommandQueueMT no longer supports recursive flushing

7 participants