On Mon, Jul 25, 2011 at 5:04 AM, Maurizio De Cecco wrote:
I'm getting SIMD instructions when I compile. However, you have two
things slowing you down:
- The calculations for the for(;;) loop itself (the loop overhead) are slowing you down.
- You're only using one xmm register, so you're seeing some slowdowns from memory accesses.
Both of these can be solved by having gcc unroll your loops for you
(recompile with -funroll-loops).
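To make the effect of -funroll-loops more concrete, here is a minimal hand-unrolled sketch of the element-wise multiply loop (bufc[k] = bufa[k] * bufb[k]). The function name mul_unrolled and the unroll factor of 4 are illustrative assumptions, not what gcc will necessarily pick. The loop counter is updated and tested once per four elements instead of once per element. The four independent products also give the compiler room to keep several values in different xmm registers at once, though the exact code it generates depends on the gcc version and flags.

```c
#include <stddef.h>

/* Hypothetical helper: c[k] = a[k] * b[k] for k in [0, n).
 * Hand-unrolled by 4 to mimic what -funroll-loops does: one
 * loop-counter update and compare per four elements. */
void mul_unrolled(float *c, const float *a, const float *b, size_t n)
{
    size_t k = 0;

    /* Main unrolled body: covers the largest multiple of 4 that fits in n.
     * The four statements are independent, so they can be scheduled
     * (and vectorized) without waiting on each other. */
    for (; k + 4 <= n; k += 4) {
        c[k]     = a[k]     * b[k];
        c[k + 1] = a[k + 1] * b[k + 1];
        c[k + 2] = a[k + 2] * b[k + 2];
        c[k + 3] = a[k + 3] * b[k + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; k < n; k++)
        c[k] = a[k] * b[k];
}
```

In practice it is usually simpler to leave the loop in its plain form and let -funroll-loops (together with -O2/-O3) do this transformation, then check the generated assembly with -S.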
In addition, you're handling 3 buffers at a time: bufc[k] = bufa[k] *
bufb[k]. You might be able to speed it up a little by converting the
loop so that it only works on two buffers, copying bufa into bufc first:
memcpy(bufc, bufa, N*sizeof(float));
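A minimal sketch of the two-buffer idea follows. The memcpy of bufa into bufc comes from the message; the in-place multiply loop after it is an assumption about the intended rewrite, and the function name mul_copy_then_inplace is hypothetical. The buffers must not overlap, since memcpy on overlapping regions is undefined. The memcpy itself still reads one buffer and writes another, so whether this beats the single three-buffer loop is something to benchmark rather than assume.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical two-pass version of c[k] = a[k] * b[k]:
 * first copy a into c, then multiply c by b in place, so the
 * arithmetic loop touches only two buffers (c and b). */
void mul_copy_then_inplace(float *c, const float *a, const float *b, size_t n)
{
    memcpy(c, a, n * sizeof(float));   /* c now holds a's values */

    for (size_t k = 0; k < n; k++)
        c[k] *= b[k];                   /* same result as c[k] = a[k] * b[k] */
}
```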