wiki:LibCSSE

Version 1 (modified by john, 12 years ago)

--

The first routine I worked on was memcpy().

Comparison of the stock memcpy() copying a single page on FreeBSD vs. Linux on a Westmere (values are TSC deltas, so lower is faster):

x fbsd/westmere/builtin
+ linux/builtin
    N           Min           Max        Median           Avg        Stddev
x 1000           336         18444           340       361.628     573.11483
+ 1000           276          9996           280       288.924     307.34136
Difference at 95.0% confidence
        -72.704 +/- 40.3074
        -20.1046% +/- 11.1461%
        (Student's t, pooled s = 459.847)
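
The "Difference" lines above can be reproduced from the N, Avg and Stddev columns. The following is a minimal sketch in C, assuming a two-sample Student's t interval with a pooled standard deviation (as the last line of the output indicates) and a large-sample 95% critical value of 1.96. The critical value is an assumption, since the output does not print it. The names sample_summary, pooled_stddev, diff_half_width and sqrt_newton are hypothetical and are not part of ministat.

```c
#include <assert.h>

/* Summary of one data set, taken from one row of the output above. */
struct sample_summary {
    double n;       /* number of samples ("N") */
    double mean;    /* "Avg" column */
    double stddev;  /* "Stddev" column */
};

/*
 * Square root by Newton's iteration. Used instead of sqrt() only so this
 * sketch has no libm link dependency; sqrt() from <math.h> works equally well.
 */
static double sqrt_newton(double v)
{
    double x = v > 1.0 ? v : 1.0;
    int i;

    if (v <= 0.0)
        return 0.0;
    for (i = 0; i < 64; i++)
        x = 0.5 * (x + v / x);
    return x;
}

/* Pooled standard deviation of two data sets ("pooled s"). */
static double pooled_stddev(struct sample_summary a, struct sample_summary b)
{
    double num;

    assert(a.n > 1.0 && b.n > 1.0);
    num = (a.n - 1.0) * a.stddev * a.stddev
        + (b.n - 1.0) * b.stddev * b.stddev;
    return sqrt_newton(num / (a.n + b.n - 2.0));
}

/*
 * Half-width of the confidence interval for the difference of the means.
 * t_crit is the Student's t critical value; 1.96 is assumed for 95%
 * confidence with this many samples.
 */
static double diff_half_width(struct sample_summary a, struct sample_summary b,
                              double t_crit)
{
    return t_crit * pooled_stddev(a, b) * sqrt_newton(1.0 / a.n + 1.0 / b.n);
}
```

Feeding in the two rows above ({1000, 361.628, 573.11483} for FreeBSD and {1000, 288.924, 307.34136} for Linux) gives a pooled s of about 459.847 and a half-width of about 40.307 around the mean difference of -72.704. Dividing both by the FreeBSD mean of 361.628 yields the percentage line.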

Idea                          Westmere         Sandy Bridge     Ivy Bridge
Replace dec with sub          none             none             none
Use movsd instead of movsq    slightly slower  slightly slower  6% faster
Simple movdqa loop            138% slower      58% slower       46% slower
movdqa 32 at a time (old)     27% slower       14% faster       17% faster
movdqa 32 at a time (new)     27% slower       15% faster       18% faster
movdqa 64 at a time           224% slower      131% slower      116% slower
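
For readers unfamiliar with the instruction, here is one plausible reading of "movdqa 32 at a time", sketched in C with SSE2 intrinsics: two aligned 16-byte loads followed by two aligned 16-byte stores per loop iteration. This is an illustration under assumptions, not the routine that was measured. The routines above were presumably written in assembly, the difference between the "old" and "new" variants is not described here, and the function name copy_movdqa32 is hypothetical. The intrinsics _mm_load_si128 and _mm_store_si128 require 16-byte-aligned addresses, and compilers typically lower them to movdqa or an equivalent aligned move.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst, 32 bytes per iteration.
 * Preconditions (assumed, as for a page-aligned 4096-byte page):
 * both pointers are 16-byte aligned, len is a multiple of 32,
 * and the buffers do not overlap.
 */
static void copy_movdqa32(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t i;

    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert(len % 32 == 0);

    for (i = 0; i < len / 32; i++) {
        /* Issue both loads before the stores, 16 bytes each. */
        __m128i lo = _mm_load_si128(s + 2 * i);
        __m128i hi = _mm_load_si128(s + 2 * i + 1);

        _mm_store_si128(d + 2 * i, lo);
        _mm_store_si128(d + 2 * i + 1, hi);
    }
}
```

Under the same reading, a "64 at a time" variant would unroll this to four loads and four stores per iteration.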