wiki:LibCSSE/memset

Version 1 (modified by john, 12 years ago)

memset

Variants

||= Name =||= Description =||
|| stock || Machine-dependent (MD) amd64 version, using rep stosq ||
|| SSE2 || movdqu for block-store ||
|| SSE2 aligned || movaps for aligned block-store and movdqu for unaligned ||
|| AVX 128 || 128-bit vmovdqu for block-store ||
|| AVX 256 || 256-bit vmovdqu for block-store ||
|| ERMS || rep stosb for machines with ERMS (Enhanced REP MOVSB/STOSB) ||
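
As a rough illustration of what "movdqu for block-store" means, the sketch below fills a buffer 16 bytes at a time with the SSE2 intrinsic `_mm_storeu_si128`, which compilers typically lower to movdqu. The function name `memset_sse2_sketch` and the byte-wise tail handling are assumptions made for this example and are not taken from the LibCSSE sources. When SSE2 is not available at compile time, it falls back to a plain byte loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical name; illustrates a 16-byte block-store loop only. */
void *memset_sse2_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;

#if defined(__SSE2__)
    /* Broadcast the fill byte into all 16 lanes of an XMM register. */
    const __m128i fill = _mm_set1_epi8((char)c);

    /* Unaligned 16-byte stores while a full block remains. */
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, fill);
        p += 16;
        len -= 16;
    }
#endif
    /* Tail (or everything, when SSE2 is unavailable): one byte at a time. */
    while (len > 0) {
        *p++ = (unsigned char)c;
        len--;
    }
    return dst;
}
```

An aligned variant would presumably store the unaligned head first and then switch to aligned stores (`_mm_store_si128`) once the pointer reaches a 16-byte boundary.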

Machines Tested

||= CPU =||= Speed (GHz) =||= Notes =||
|| AMD FX-8120 || 3.11 || 1 x 8, zoo.freebsd.org ||
|| AMD Opteron 6328 || 3.20 || 2 x 8, Supermicro H8DG6/H8DGi ||
|| Intel Xeon X5365 || 3.00 || 2 x 4, Supermicro X7DBU ||
|| Intel Xeon X5482 || 3.20 || 2 x 4, Supermicro X7DWN+ ||
|| Intel Xeon X5675 || 3.07 || Westmere, 2 x 6, Supermicro X8DTU ||
|| Intel Core i5-2520M || 2.50 || Sandy Bridge, 1 x 4, Thinkpad X220 (4286) ||
|| Intel Core i5-2500K || 3.30 || Sandy Bridge, 1 x 4, MSI Z77A-G45 (MS-7752) ||
|| Intel Xeon E5-2680 || 2.70 || Romley, 2 x 8, Supermicro X9DRW ||
|| Intel Xeon E5-2667 v2 || 3.30 || Romley V2, 2 x 8, Supermicro X9DRW (supports ERMS) ||
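
Whether a machine supports ERMS can be checked at run time: the feature is reported in CPUID leaf 7, sub-leaf 0, EBX bit 9. The following is a minimal sketch assuming GCC or Clang on x86 (it relies on `<cpuid.h>`). The helper name `cpu_has_erms` is hypothetical, and on non-x86 targets it simply reports 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Returns 1 if the CPU advertises ERMS, 0 otherwise (hypothetical helper). */
int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7, sub-leaf 0: structured extended feature flags. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 7 is not available */
    return (int)((ebx >> 9) & 1u);  /* EBX bit 9 = ERMS */
#else
    return 0;                   /* not an x86 target */
#endif
}
```

A dispatcher could use such a check once at startup to decide whether the ERMS variant is worth selecting.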

Test Cases

||= Name =||= Description =||
|| page || set page to 0xa5 ||
|| short || set aligned 15 bytes to 0xa5 ||
|| short2 || set aligned 32 bytes to 0xa5 ||
|| short3 || set aligned 48 bytes to 0xa5 ||
|| offset || set misaligned (+4) 128 bytes to 0 ||
|| offset2 || set misaligned (+7) 97 bytes to 0 ||
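
One way to picture the test cases above is as (offset, length, fill byte) triples applied to an aligned buffer. The sketch below is an assumption about how such a harness could look, not the original test code. `struct set_case`, `run_case`, the 64-byte base alignment, and the 4096-byte page size are all hypothetical choices.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one test case. */
struct set_case {
    const char *name;
    size_t offset;  /* bytes past the aligned base address */
    size_t len;     /* number of bytes to set */
    int fill;       /* fill byte */
};

/* Assumed page size of 4096 bytes; the source does not state it. */
static const struct set_case cases[] = {
    { "page",    0, 4096, 0xa5 },
    { "short",   0,   15, 0xa5 },
    { "short2",  0,   32, 0xa5 },
    { "short3",  0,   48, 0xa5 },
    { "offset",  4,  128, 0x00 },
    { "offset2", 7,   97, 0x00 },
};

/* Runs one case with a memset-like function; returns 1 if only the
 * requested bytes changed and all of them hold the fill value. */
int run_case(const struct set_case *tc, void *(*fn)(void *, int, size_t))
{
    size_t total = tc->offset + tc->len + 1;    /* +1 guard byte */
    unsigned char *raw = malloc(total + 63);
    unsigned char *base;
    int ok = 1;

    if (raw == NULL)
        return 0;
    /* Round the pointer up to a 64-byte boundary. */
    base = (unsigned char *)(((uintptr_t)raw + 63) & ~(uintptr_t)63);

    /* Pre-fill with a pattern that differs from both fill values. */
    memset(base, 0x3c, total);
    fn(base + tc->offset, tc->fill, tc->len);

    for (size_t i = 0; i < total; i++) {
        int inside = i >= tc->offset && i < tc->offset + tc->len;
        unsigned char want = inside ? (unsigned char)tc->fill : 0x3c;
        if (base[i] != want)
            ok = 0;
    }
    free(raw);
    return ok;
}
```

Passing the function under test as a pointer lets the same cases be run against each variant, for example `run_case(&cases[4], memset)`.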

Results

Each number is the minimum of a distribution of samples, where each sample is the TSC (time-stamp counter) delta measured across a single invocation of the test.
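
The measurement described above can be sketched as follows: read the counter, invoke the function once, read the counter again, and keep the smallest delta seen. This is a simplified illustration, not the original harness. The names `read_ticks` and `min_memset_ticks` are hypothetical, `__rdtsc()` assumes GCC or Clang on x86, and the sketch omits the serializing instructions and CPU pinning a careful benchmark would probably add.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() on GCC/Clang */
#endif

/* Reads an increasing tick counter (the TSC on x86). */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    /* Fallback for non-x86 builds: processor time, not TSC cycles. */
    return (uint64_t)clock();
#endif
}

/* Minimum tick delta over `rounds` single invocations of memset().
 * Returns UINT64_MAX when rounds is 0. */
uint64_t min_memset_ticks(void *buf, int fill, size_t len, unsigned rounds)
{
    uint64_t best = UINT64_MAX;

    for (unsigned i = 0; i < rounds; i++) {
        uint64_t start = read_ticks();
        memset(buf, fill, len);
        uint64_t end = read_ticks();

        /* Read one byte back so the store is not optimized away. */
        if (len > 0)
            (void)*(volatile unsigned char *)buf;
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Taking the minimum rather than the mean filters out samples inflated by interrupts, cache misses, or frequency changes, at the cost of reporting a best-case figure.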

Bold marks the lowest time among the variants for a given test and CPU combination. Green text marks times faster than the stock implementation, and red text marks times slower than the stock implementation.

Each row is a CPU. For each test (page, short, short2, short3, offset, offset2) there is one column per variant:

||= CPU =||= stock =||= SSE2 =||= SSE2 aligned =||= AVX 128 =||= AVX 256 =||= ERMS =||
|| AMD FX-8120 || || || || || || ||
|| AMD Opteron 6328 || || || || || || ||
|| Intel Xeon X5365 || || || || || || ||
|| Intel Xeon X5482 || || || || || || ||
|| Intel Xeon X5675 || || || || || || ||
|| Intel Core i5-2520M || || || || || || ||
|| Intel Core i5-2500K || || || || || || ||
|| Intel Xeon E5-2680 || || || || || || ||
|| Intel Xeon E5-2667 v2 || || || || || || ||

Conclusions