bench(fmha): cp.async vs TMA bulk microbench — refutes Tier-2 LDGSTS→TMA lever #169