Hi,
We have been seeing slow performance with Cilk Plus and Clang in tight loops that use hyperobjects.
Apparently, the Intel compiler manages to hoist __cilkrts_hyper_lookup() out of the critical loops, while Clang leaves the call in the inner loop, causing significant performance degradation (we have observed slowdowns of up to 2x).
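To illustrate the pattern, here is a minimal sketch in plain C++ rather than actual Cilk Plus code. The names lookup_view, g_view and g_lookups are hypothetical stand-ins for the runtime lookup and a reducer view; they are not part of the Cilk Plus runtime. The sketch assumes the loop body contains no spawn or sync, so the view cannot change between iterations and a single lookup would suffice.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the runtime's view lookup. This is NOT the real
// __cilkrts_hyper_lookup(); it only counts calls so the cost is visible.
static long g_view = 0;
static std::size_t g_lookups = 0;

long* lookup_view() {
    ++g_lookups;
    return &g_view;
}

// Roughly what we see with the call left in the inner loop:
// one lookup per iteration.
long sum_lookup_in_loop(const std::vector<int>& v) {
    g_view = 0;
    for (int x : v) {
        *lookup_view() += x;
    }
    return g_view;
}

// Roughly what hoisting achieves: one lookup before the loop,
// then plain memory accesses inside it.
long sum_lookup_hoisted(const std::vector<int>& v) {
    g_view = 0;
    long* view = lookup_view();
    for (int x : v) {
        *view += x;
    }
    return g_view;
}
```

Both functions compute the same sum; the only difference is the number of lookup calls, which is what dominates in our tight loops.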
Will you be looking into this?
Kind regards,
Hans Vandierendonck