I'm using the prescribed QEMU branch (78385...) and have applied the Cannoli patches without errors. Cannoli works great on single-threaded guest applications, but running any guest app that uses threads leaves QEMU hung in a futex() call (see below).
Two terminals:
- Terminal 1 (in the Cannoli coverage or symbolizer example directory): `cargo run --release`
- Terminal 2: `QEMU_STRACE=1 QEMU_CANNOLI=<path/to/cannoli>/release/libcoverage.so <path/to/qemu>build/qemu-x86_64 app-to-run`
I've also tried the symbolizer example with the libjitter_always.so option.
The QEMU terminal stalls with this trace:
```
<snip>
368333 rt_sigprocmask(SIG_BLOCK,0x0000000000593d00,0x00002aaaab2aad90,8) = 0
368333 Unknown syscall 435
368333 clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,child_stack=0x00002aaaad2b4230,parent_tidptr=0x00002aaaad2b4910,tls=0x00002aaaad2b4640,child_tidptr=0x00002aaaad2b4910)368333 set_robust_list(0x2aaaacab2920,24) = -1 errno=38 (Function not implemented)
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaacab2f30,NULL,8) = 0
368333 set_robust_list(0x2aaaad2b4920,24) = -1 errno=38 (Function not implemented)
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaad2b4f30,NULL,8) = 0
 = 368344
368333 rt_sigprocmask(SIG_SETMASK,0x00002aaaab2aad90,NULL,8) = 0
368333 write(1,0x60bef0,45)main waits for threads to complete. gVar = 0
 = 45
368333 futex(0x00002aaaabaae910,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,368341,NULL,NULL,0)Killed
```
Meanwhile, the Cannoli terminal (coverage client) shows this when hung:
```
cov 11180 | /home/devl/q/testapps/t1/thread_app_x86_64!main+0x109
cov 11181 | /home/devl/q/testapps/t1/thread_app_x86_64!std::thread::join()+0x0
<snip 6>
cov 11188 | /home/devl/q/testapps/t1/thread_app_x86_64!std::thread::join()+0x12
cov 11189 | /home/devl/q/testapps/t1/thread_app_x86_64!pthread_join+0x0
<snip 3>
cov 11193 | /home/devl/q/testapps/t1/thread_app_x86_64!pthread_join+0xe
cov 11194 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x0
<snip 38>
cov 11232 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x118
cov 11233 | /home/devl/q/testapps/t1/thread_app_x86_64!_pthread_cleanup_push+0x0
<snip 5>
cov 11233 | /home/devl/q/testapps/t1/thread_app_x86_64!_pthread_cleanup_push+0x21
cov 11233 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x11d
<snip 7>
cov 11241 | /home/devl/q/testapps/t1/thread_app_x86_64!__pthread_clockjoin_ex+0x12f
cov 11242 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0x0
<snip 22>
cov 11265 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xc8
cov 11266 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xcb
cov 11267 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xce
cov 11268 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xd1
cov 11269 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xd5
cov 11270 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xda
cov 11271 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xe0
cov 11272 | /home/devl/q/testapps/t1/thread_app_x86_64!__futex_abstimed_wait_cancelable64+0xe5
```
The guest application code at this hang point disassembles to:
```
54baf8: 45 31 c0              xor    %r8d,%r8d
54bafb: 44 89 ea              mov    %r13d,%edx
54bafe: 41 89 c4              mov    %eax,%r12d
54bb01: 8b 74 24 0c           mov    0xc(%rsp),%esi
54bb05: 48 8b 7c 24 10        mov    0x10(%rsp),%rdi
54bb0a: 41 b9 ff ff ff ff     mov    $0xffffffff,%r9d
54bb10: b8 ca 00 00 00        mov    $0xca,%eax
54bb15: 0f 05                 syscall
```
and QEMU's call stack at that point is:
```
#0  safe_syscall_base () at ../common-user/host/x86_64/safe-syscall.inc.S:75
#1  0x0000585c53916d63 in safe_futex (val3=-1, uaddr2=0x0, timeout=<optimized out>, val=368434, op=265, uaddr=<optimized out>) at ../linux-user/syscall.c:678
#2  do_safe_futex (val3=-1, uaddr2=0x0, timeout=<optimized out>, val=368434, op=265, uaddr=<optimized out>) at ../linux-user/syscall.c:7857
#3  do_futex (time64=false, cpu=<optimized out>, val3=<optimized out>, uaddr2=0, timeout=<optimized out>, val=<optimized out>, op=<optimized out>, uaddr=46912512911632) at ../linux-user/syscall.c:7944
#4  do_syscall1 (cpu_env=<optimized out>, num=<optimized out>, arg1=46912512911632, arg2=<optimized out>, arg3=<optimized out>, arg4=0, arg5=0, arg6=4294967295, arg8=<optimized out>, arg7=<optimized out>)
    at ../linux-user/syscall.c:12990
#5  0x0000585c5391a653 in do_syscall (cpu_env=cpu_env@entry=0x585c7c51dba0, num=202, arg1=46912512911632, arg2=265, arg3=368434, arg4=0, arg5=0, arg6=4294967295, arg7=0, arg8=0) at ../linux-user/syscall.c:13894
#6  0x0000585c5386baf4 in cpu_loop (env=env@entry=0x585c7c51dba0) at ../linux-user/x86_64/../i386/cpu_loop.c:242
#7  0x0000585c53867835 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../linux-user/main.c:1160
```
These applications run fine under QEMU when Cannoli isn't attached, and natively on Linux.
I tested this on x86_64 Ubuntu 22, and also on Ubuntu 20 and 24, with the same results.
I've tried both the symbolizer and coverage clients in Cannoli: same lockup with both.
I'm pretty sure this should work. The TLS debug video by gamozolabs on YouTube was helpful: it shows VLC running with Cannoli coverage without stalling on futex(). But when I run VLC the same way, it stops in the same place as above: futex(FUTEX_CLOCK_REALTIME).
Note that other futex() calls that don't use FUTEX_CLOCK_REALTIME seem to work as expected.
Can anyone verify that this trouble happens on their system, or can any experienced eyes offer any pointers to a solution?
Race condition? Something clobbering the mutex? Borked-up patch job?
This should work, dammit. TIA.
If it's of any help, here's the small guest test app I used to collect the trace and stack info above.
It runs as expected under QEMU (tested for x86_64, aarch64, and arm) and natively, but hangs as above when Cannoli is used.
```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Global updated by every thread. std::atomic avoids a data race (a plain
// int32_t updated from several threads at once is undefined behavior); this
// is unlikely to be related to the hang, but keeps the repro well-defined.
std::atomic<int32_t> gVar{0};

// Function to be executed by each thread
void threadfunc(int id) {
    gVar += id;
}

int main() {
    const int num_threads = 4;         // Number of threads to spawn
    std::vector<std::thread> threads;  // Vector to hold threads

    std::cout << "Hello from main!" << std::endl;

    // Create and launch threads
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(threadfunc, i);  // Launch each thread
    }

    std::cout << "main waits for threads to complete. gVar = " << gVar.load() << std::endl;

    // Join the threads with the main thread (the hang occurs inside join())
    for (auto& t : threads) {
        t.join();
    }

    std::cout << "All threads completed! gVar = " << gVar.load() << std::endl;
    return 0;
}
```