Running with threaded guest apps stalls on futex(FUTEX_CLOCK_REALTIME) but it feels like it shouldn't. #21

@neil-rti

I'm using the prescribed QEMU branch (78385...) and have applied the Cannoli patches without error. Cannoli works great on single-threaded guest applications, but any guest app that uses threads causes QEMU to hang on a futex() call (see below).

To reproduce, I use two terminals:

  1. (in Cannoli coverage or symbolizer example directory): cargo run --release
  2. QEMU_STRACE=1 QEMU_CANNOLI=<path/to/cannoli>/release/libcoverage.so <path/to/qemu>build/qemu-x86_64 app-to-run

I've also tried the libjitter_always.so option with the symbolizer example.

The QEMU terminal stalls with the following trace:

<snip>
368333 rt_sigprocmask(SIG_BLOCK,0x0000000000593d00,0x00002aaaab2aad90,8) = 0
368333 Unknown syscall 435
368333 clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,child_stack=0x00002aaaad2b4230,parent_tidptr=0x00002aaaad2b4910,tls=0x00002aaaad2b4640,child_tidptr=0x00002aaaad2b4910)368333 set_robust_list(0x2aaaacab2920,24) = -1 errno=38 (Function not implemented)
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaacab2f30,NULL,8) = 0
368333 set_robust_list(0x2aaaad2b4920,24) = -1 errno=38 (Function not implemented)
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaad2b4f30,NULL,8) = 0
 = 368344
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaab2aad90,NULL,8) = 0
368333 write(1,0x60bef0,45)main waits for threads to complete. gVar = 0
 = 45
368333 futex(0x00002aaaabaae910,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,368341,NULL,NULL,0)Killed

When hung, the Cannoli terminal (running the coverage example) shows:

cov      11180 | /home/devl/q/testapps/t1/thread_app_x86_64!main+0x109
cov      11181 | /home/devl/q/testapps/t1/thread_app_x86_64!std::thread::join()+0x0
<snip 6>
cov      11188 | /home/devl/q/testapps/t1/thread_app_x86_64!std::thread::join()+0x12
cov      11189 | /home/devl/q/testapps/t1/thread_app_x86_64!pthread_join+0x0
<snip 3>
cov      11193 | /home/devl/q/testapps/t1/thread_app_x86_64!pthread_join+0xe
cov      11194 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x0
<snip 38>
cov      11232 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x118
cov      11233 | /home/devl/q/testapps/t1/thread_app_x86_64!_pthread_cleanup_push+0x0
<snip 5>
cov      11233 | /home/devl/q/testapps/t1/thread_app_x86_64!_pthread_cleanup_push+0x21
cov      11233 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x11d
<snip 7>
cov      11241 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x12f
cov      11242 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0x0
<snip 22>
cov      11265 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xc8
cov      11266 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xcb
cov      11267 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xce
cov      11268 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xd1
cov      11269 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xd5
cov      11270 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xda
cov      11271 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xe0
cov      11272 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xe5

The guest application code at the hang point disassembles to:

  54baf8:	45 31 c0             	xor    %r8d,%r8d
  54bafb:	44 89 ea             	mov    %r13d,%edx
  54bafe:	41 89 c4             	mov    %eax,%r12d
  54bb01:	8b 74 24 0c          	mov    0xc(%rsp),%esi
  54bb05:	48 8b 7c 24 10       	mov    0x10(%rsp),%rdi
  54bb0a:	41 b9 ff ff ff ff    	mov    $0xffffffff,%r9d
  54bb10:	b8 ca 00 00 00       	mov    $0xca,%eax
  54bb15:	0f 05                	syscall 

The QEMU call stack at the hang is:

#0  safe_syscall_base () at ../common-user/host/x86_64/safe-syscall.inc.S:75
#1  0x0000585c53916d63 in safe_futex (val3=-1, uaddr2=0x0, timeout=<optimized out>, val=368434, op=265, uaddr=<optimized out>) at ../linux-user/syscall.c:678
#2  do_safe_futex (val3=-1, uaddr2=0x0, timeout=<optimized out>, val=368434, op=265, uaddr=<optimized out>) at ../linux-user/syscall.c:7857
#3  do_futex (time64=false, cpu=<optimized out>, val3=<optimized out>, uaddr2=0, timeout=<optimized out>, val=<optimized out>, op=<optimized out>, uaddr=46912512911632) at ../linux-user/syscall.c:7944
#4  do_syscall1 (cpu_env=<optimized out>, num=<optimized out>, arg1=46912512911632, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=4294967295, arg8=<optimized out>, arg7=<optimized out>)
    at ../linux-user/syscall.c:12990
#5  0x0000585c5391a653 in do_syscall (cpu_env=cpu_env@entry=0x585c7c51dba0, num=202, arg1=46912512911632, arg2=265, arg3=368434, arg4=0, arg5=0, arg6=4294967295, arg7=0, arg8=0) at ../linux-user/syscall.c:13894
#6  0x0000585c5386baf4 in cpu_loop (env=env@entry=0x585c7c51dba0) at ../linux-user/x86_64/../i386/cpu_loop.c:242
#7  0x0000585c53867835 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../linux-user/main.c:1160
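
To read the `op=265` and `val3=-1` arguments in frame #1, it can help to split the futex operation word into its command and flag bits. The sketch below does that with constant values taken from the Linux UAPI header `linux/futex.h`, copied locally so the snippet builds anywhere. `FutexOp` and `decode_futex_op` are illustrative names, not part of QEMU or Cannoli.

```cpp
#include <cassert>

// Values as defined in the Linux UAPI header <linux/futex.h>.
constexpr int kFutexWaitBitset    = 9;    // FUTEX_WAIT_BITSET
constexpr int kFutexPrivateFlag   = 128;  // FUTEX_PRIVATE_FLAG
constexpr int kFutexClockRealtime = 256;  // FUTEX_CLOCK_REALTIME
constexpr int kFutexCmdMask = ~(kFutexPrivateFlag | kFutexClockRealtime);

// Hypothetical helper type: the decoded parts of a futex `op` argument.
struct FutexOp {
    int  cmd;             // command number, e.g. 9 for FUTEX_WAIT_BITSET
    bool is_private;      // FUTEX_PRIVATE_FLAG set
    bool clock_realtime;  // FUTEX_CLOCK_REALTIME set
};

// Split a raw futex `op` word into its command and option flags.
inline FutexOp decode_futex_op(int op) {
    assert(op >= 0 && "futex op words are small non-negative integers");
    FutexOp out;
    out.cmd            = op & kFutexCmdMask;
    out.is_private     = (op & kFutexPrivateFlag) != 0;
    out.clock_realtime = (op & kFutexClockRealtime) != 0;
    return out;
}
```

For the values in the backtrace, `decode_futex_op(265)` gives command 9 (`FUTEX_WAIT_BITSET`) with `FUTEX_CLOCK_REALTIME` set and `FUTEX_PRIVATE_FLAG` clear. `val3=-1` is `0xffffffff`, which is `FUTEX_BITSET_MATCH_ANY`, and it matches the `$0xffffffff` loaded into `%r9d` in the disassembly. Syscall number `0xca` (202) is `futex` on x86_64, so the guest appears to be making an ordinary shared futex wait with no timeout.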

These applications run successfully in QEMU when Cannoli is not connected, or when run natively on Linux.
This was tested on x86_64 Ubuntu 22, but I've also tried this on Ubuntu 20 and 24 -- same results.
I've tried this with symbolizer and coverage clients in Cannoli -- same lockup with both.

I'm pretty sure this should work. The TLS debug video by gamozolabs on YouTube was helpful: it shows VLC running with Cannoli coverage without stalling on futex(). But when I run VLC the same way, it stops in the same place as above: futex(FUTEX_CLOCK_REALTIME).
Note that other futex() calls that don't use CLOCK_REALTIME seem to work as expected.
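
Since only the `FUTEX_CLOCK_REALTIME` waits appear to stall, one way to narrow this down is a guest program that issues that exact futex operation directly, without going through `pthread_join`. The following is a minimal sketch of that idea, assuming a Linux guest and GCC or Clang (for the `__atomic` builtins). `futex_call`, `wait_while_equal`, and `futex_roundtrip` are hypothetical helper names, and the behavior of this sketch under Cannoli is untested.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <memory>
#include <thread>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// glibc does not export a futex() wrapper, so go through syscall(2).
static long futex_call(uint32_t* uaddr, int op, uint32_t val,
                       const struct timespec* timeout, uint32_t val3) {
    return syscall(SYS_futex, uaddr, op, val, timeout, nullptr, val3);
}

// Block until *word differs from `expected`, using FUTEX_WAIT_BITSET with a
// NULL timeout and a match-any bitset, optionally with FUTEX_CLOCK_REALTIME.
static void wait_while_equal(uint32_t* word, uint32_t expected,
                             bool clock_realtime) {
    const int op = FUTEX_WAIT_BITSET | (clock_realtime ? FUTEX_CLOCK_REALTIME : 0);
    while (__atomic_load_n(word, __ATOMIC_ACQUIRE) == expected) {
        long rc = futex_call(word, op, expected, nullptr, FUTEX_BITSET_MATCH_ANY);
        // EAGAIN: the value already changed; EINTR: interrupted by a signal.
        assert(rc == 0 || errno == EAGAIN || errno == EINTR);
        (void)rc;
    }
}

// Returns true once a helper thread has cleared the word and woken the waiter.
// If the wake-up is lost, this call never returns, which would mirror the hang.
bool futex_roundtrip(bool clock_realtime) {
    // Heap-allocated and shared so the detached waker can outlive this call.
    auto word = std::make_shared<uint32_t>(1u);
    std::thread([word] {
        __atomic_store_n(word.get(), 0u, __ATOMIC_RELEASE);
        futex_call(word.get(), FUTEX_WAKE, INT_MAX, nullptr, 0);
    }).detach();  // detached on purpose: join() would itself be a futex wait
    wait_while_equal(word.get(), 1u, clock_realtime);
    return __atomic_load_n(word.get(), __ATOMIC_ACQUIRE) == 0;
}
```

Calling `futex_roundtrip(true)` and `futex_roundtrip(false)` from `main`, then running the binary under QEMU with and without `QEMU_CANNOLI` set, might show whether the clock flag alone is enough to trigger the stall or whether the thread-exit wake-up is what goes missing.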

Can anyone verify that this trouble happens on their system, or can any experienced eyes offer any pointers to a solution?
Race condition? Something clobbering the mutex? Borked-up patch job?
This should work, dammit. TIA.

If it's of any help, here's a small guest test app that I used to collect the above trace/stack info. It runs as expected under QEMU (tested for x86_64, aarch64, and arm) and natively, but hangs as above when Cannoli is used:

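#include <cstdint>   // int32_t
#include <iostream>
#include <thread>
#include <vector>

// Global updated by every thread.
// NOTE: the updates are unsynchronized (a data race); std::atomic<int32_t>
// would make the final value well-defined.
int32_t gVar = 0;

// Function executed by each thread: adds its id to the global
void threadfunc(int id) {
    gVar += id;
}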

int main() {
    const int num_threads = 4;  // Number of threads to spawn
    std::vector<std::thread> threads;  // Vector to hold threads

    std::cout << "Hello from main!" << std::endl;

    // Create and launch threads
    for (int i = 0; i < num_threads; ++i) {
        threads.push_back(std::thread(threadfunc, i));  // Launch each thread
    }

    std::cout << "main waits for threads to complete. gVar = " << gVar << std::endl;

    // Join the threads with the main thread
    for (auto& t : threads) {
        t.join();
    }

    std::cout << "All threads completed!  gVar = " << gVar << std::endl;
    return 0;
}
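
A variant of the test app that records per-thread progress could help tell two cases apart: a worker that never finishes its function, versus workers that finish but whose exit never wakes the `join()`. The sketch below is one possible way to do that. `ThreadProgress`, `wait_for_workers`, and `run_workers` are made-up names for illustration, and the 5-second timeout is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical progress counters shared between main and the workers.
struct ThreadProgress {
    std::atomic<int> started{0};
    std::atomic<int> finished{0};
    std::atomic<int> sum{0};
};

// Poll until all `n` workers report completion or `timeout` elapses.
// Uses sleep_for rather than join(), so it does not wait on a futex.
bool wait_for_workers(const ThreadProgress& p, int n,
                      std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (p.finished.load() < n) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}

// Launch `n` workers that each add their id to `p.sum`, report progress on
// stderr just before joining, then join. Returns the final sum.
int run_workers(int n, ThreadProgress& p) {
    assert(n >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&p, i] {
            p.started.fetch_add(1);
            p.sum.fetch_add(i);
            p.finished.fetch_add(1);
        });
    }
    const bool all_done = wait_for_workers(p, n, std::chrono::seconds(5));
    std::fprintf(stderr, "before join: started=%d finished=%d all_done=%d\n",
                 p.started.load(), p.finished.load(), static_cast<int>(all_done));
    for (auto& t : threads) t.join();  // the call that hangs in the report above
    return p.sum.load();
}
```

If the "before join" line shows every worker finished and the program still hangs in `join()`, that would point at the thread-exit wake-up rather than at the worker code itself. If some workers never finish, the stall is more likely inside the worker threads.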
