The add/sub benchmark fails to reach maximum throughput on Skylake when run single-threaded. Find the cause and fix it.