wiki:LibCSSE/memset

Version 14 (modified by john, 12 years ago)

memset

Variants

| Name | Description |
|---|---|
| stock | MD amd64 version (`rep stosq`) |
| SSE2 | `movdqu` for block-store |
| SSE2 aligned | `movaps` for aligned block-store and `movdqu` for unaligned |
| AVX 128 | 128-bit `vmovdqu` for block-store |
| AVX 256 | 256-bit `vmovdqu` for block-store |
| ERMS | `rep stosb` for machines with ERMS |
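
As a rough illustration of the SSE2 variant, the block-store loop could be written with intrinsics, where `_mm_storeu_si128` compiles to `movdqu`. This is a minimal sketch, not the LibCSSE source: `memset_sse2_sketch` is a hypothetical name, and the byte-loop fallback for builds without SSE2 is an assumption added so the example compiles on any target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch of an SSE2 block-store memset; not the LibCSSE code. */
static void *
memset_sse2_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;

    assert(dst != NULL || len == 0);
#if defined(__SSE2__)
    /* Broadcast the fill byte to all 16 lanes of an XMM register. */
    const __m128i fill = _mm_set1_epi8((char)c);

    /* Store 16 bytes per iteration; the unaligned store maps to movdqu. */
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, fill);
        p += 16;
        len -= 16;
    }
#endif
    /* Tail bytes (or the whole range when SSE2 is unavailable). */
    while (len-- > 0)
        *p++ = (unsigned char)c;
    return (dst);
}
```

An aligned variant would presumably first store up to a 16-byte boundary and then switch to `_mm_store_si128`, which requires an aligned destination.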

Note: clang was too smart and inlined all the short memset calls, so I had to create a copy of the amd64 version called memset_stock() to fool it.
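
A renamed copy is one way to stop the compiler from expanding short memset calls inline. Another common technique is to call through a volatile function pointer, which forces a real indirect call on every invocation. The sketch below shows that alternative; it is not what was done for these measurements, and `call_memset_opaque` and `memset_ptr` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by memset and every variant under test. */
typedef void *(*memset_fn)(void *, int, size_t);

/*
 * The pointer object itself is volatile, so the compiler has to reload it
 * on each call and cannot assume the target is the builtin memset.
 */
static memset_fn volatile memset_ptr = memset;

/* Hypothetical helper: performs an indirect call instead of inline stores. */
static void *
call_memset_opaque(void *dst, int c, size_t len)
{
    assert(dst != NULL || len == 0);
    return (memset_ptr(dst, c, len));
}
```

The indirect call adds a small, roughly constant overhead to every variant, which matters most for the short test cases.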

Machines Tested

| CPU | Speed (GHz) | Notes |
|---|---|---|
| AMD FX-8120 | 3.11 | 1 x 8, zoo.freebsd.org |
| AMD Opteron 6328 | 3.20 | 2 x 8, Supermicro H8DG6/H8DGi |
| Intel Xeon X5365 | 3.00 | 2 x 4, Supermicro X7DBU |
| Intel Xeon X5482 | 3.20 | 2 x 4, Supermicro X7DWN+ |
| Intel Xeon X5675 | 3.07 | Westmere, 2 x 6, Supermicro X8DTU |
| Intel Core i5-2520M | 2.50 | Sandy Bridge, 1 x 4, Thinkpad X220 (4286) |
| Intel Core i5-2500K | 3.30 | Sandy Bridge, 1 x 4, MSI Z77A-G45 (MS-7752) |
| Intel Xeon E5-2680 | 2.70 | Romley, 2 x 8, Supermicro X9DRW |
| Intel Xeon E5-2667 v2 | 3.30 | Romley V2, 2 x 8, Supermicro X9DRW (supports ERMS) |

Test Cases

| Name | Description |
|---|---|
| page | set page to 0xa5 |
| short | set aligned 15 bytes to 0xa5 |
| short2 | set aligned 32 bytes to 0xa5 |
| short3 | set aligned 48 bytes to 0xa5 |
| offset | set misaligned (+4) 128 bytes to 0 |
| offset2 | set misaligned (+7) 97 bytes to 0 |
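
To make the alignment and length of each test case concrete, the list above can be written as data. This is a sketch under stated assumptions: a 4096-byte page, and "aligned" taken to mean the start of a page-aligned buffer. The names `memset_case`, `cases`, `case_buf` and `run_case` are hypothetical and do not come from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_ASSUMED 4096  /* assumption: amd64 base page size */

/* One row of the test-case table. */
struct memset_case {
    const char *name;
    size_t offset;  /* bytes past the aligned buffer start */
    size_t len;     /* bytes to set */
    int value;      /* fill byte */
};

static const struct memset_case cases[] = {
    { "page",    0, PAGE_SIZE_ASSUMED, 0xa5 },
    { "short",   0, 15,  0xa5 },
    { "short2",  0, 32,  0xa5 },
    { "short3",  0, 48,  0xa5 },
    { "offset",  4, 128, 0 },
    { "offset2", 7, 97,  0 },
};

/* Page-aligned scratch buffer with room for the largest offset. */
static _Alignas(PAGE_SIZE_ASSUMED) unsigned char case_buf[2 * PAGE_SIZE_ASSUMED];

/* Run one case through the given memset implementation. */
static void *
run_case(const struct memset_case *tc, void *(*fn)(void *, int, size_t))
{
    assert(tc->offset + tc->len <= sizeof(case_buf));
    return (fn(case_buf + tc->offset, tc->value, tc->len));
}
```

Each variant can then be driven over the same table, for example `run_case(&cases[4], memset)` for the offset case.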

Results

Each number is the minimum of a distribution of samples, where each sample is the TSC (time stamp counter) delta across a single invocation of the test.
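
The measurement described above can be sketched as follows. This is a hedged illustration rather than the harness that produced these numbers: `read_ticks` and `min_ticks` are hypothetical names, the sample count is up to the caller, and on non-x86 hosts the sketch assumes a nanosecond clock as a stand-in for the TSC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define SKETCH_HAVE_TSC 1
#endif

/* Read an increasing tick counter. */
static uint64_t
read_ticks(void)
{
#ifdef SKETCH_HAVE_TSC
    return (__rdtsc());  /* x86 time stamp counter */
#else
    /* Assumption: nanoseconds stand in for TSC ticks on non-x86 hosts. */
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ((uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec);
#endif
}

/*
 * Time `samples` single invocations of fn(dst, c, len) and return the
 * smallest delta seen.
 */
static uint64_t
min_ticks(void *(*fn)(void *, int, size_t), void *dst, int c, size_t len,
    unsigned samples)
{
    uint64_t best = UINT64_MAX;

    assert(samples > 0);
    for (unsigned i = 0; i < samples; i++) {
        uint64_t start = read_ticks();
        fn(dst, c, len);
        uint64_t delta = read_ticks() - start;
        if (delta < best)
            best = delta;
    }
    return (best);
}
```

Taking the minimum filters out samples inflated by interrupts or cache misses. The sketch does not serialize the instruction stream around the counter reads, which a more careful harness might do.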

Bold indicates the lowest time among the given variations in a Test and CPU combination. Green text is used for times faster than the stock implementation, and red text is used for times slower than the stock implementation.
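
The highlighting rules can be recomputed from the raw numbers. The sketch below is one possible reading of them, with hypothetical names (`classify`, `fastest`); it assumes every slot in a row holds a value, so variants without a result have to be skipped by the caller.

```c
#include <assert.h>
#include <stddef.h>

enum verdict { SAME_AS_STOCK, FASTER_THAN_STOCK, SLOWER_THAN_STOCK };

/* Compare one variant's time with the stock time (lower is better). */
static enum verdict
classify(unsigned stock, unsigned variant)
{
    if (variant < stock)
        return (FASTER_THAN_STOCK);
    if (variant > stock)
        return (SLOWER_THAN_STOCK);
    return (SAME_AS_STOCK);
}

/* Index of the lowest time in a row of n results; the first one wins ties. */
static size_t
fastest(const unsigned *times, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (times[i] < times[best])
            best = i;
    return (best);
}
```

For the Intel Xeon E5-2667 v2 page results (428, 344, 344, 340, 494, 292), `fastest` returns index 5, the ERMS column, and `classify(428, 494)` reports the AVX 256 variant as slower than stock.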

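Test: page

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 1078 | 987 | 972 | 974 | 3095 | 1009 |
| AMD Opteron 6328 | 482 | 443 | 424 | 461 | 2454 | 449 |
| Intel Xeon X5365 | 657 | 1206 | 378 | -- | -- | 720 |
| Intel Xeon X5482 | 624 | 1144 | 312 | -- | -- | 696 |
| Intel Xeon X5675 | 352 | 296 | 300 | -- | -- | 428 |
| Intel Core i5-2520M | 1812 | 962 | 962 | 950 | 1400 | 13100 |
| Intel Core i5-2500K | 734 | 635 | 635 | 627 | 924 | 907 |
| Intel Xeon E5-2680 | 356 | 308 | 308 | 304 | 452 | 440 |
| Intel Xeon E5-2667 v2 | 428 | 344 | 344 | 340 | 494 | 292 |

Test: short

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 157 | 161 | 157 | 157 | 157 | 157 |
| AMD Opteron 6328 | 69 | 106 | 106 | 106 | 106 | 106 |
| Intel Xeon X5365 | 144 | 144 | 144 | -- | -- | 144 |
| Intel Xeon X5482 | 32 | 112 | 112 | -- | -- | 112 |
| Intel Xeon X5675 | 24 | 100 | 100 | -- | -- | 96 |
| Intel Core i5-2520M | 87 | 350 | 350 | 350 | 350 | 337 |
| Intel Core i5-2500K | 222 | 231 | 231 | 231 | 231 | 222 |
| Intel Xeon E5-2680 | 28 | 112 | 112 | 112 | 112 | 112 |
| Intel Xeon E5-2667 v2 | 24 | 60 | 60 | 60 | 60 | 60 |

Test: short2

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 188 | 99 | 90 | 97 | 91 | 248 |
| AMD Opteron 6328 | 68 | 87 | 88 | 86 | 87 | 128 |
| Intel Xeon X5365 | 126 | 90 | 90 | -- | -- | 162 |
| Intel Xeon X5482 | 32 | 64 | 64 | -- | -- | 128 |
| Intel Xeon X5675 | 46 | 83 | 92 | -- | -- | 120 |
| Intel Core i5-2520M | 87 | 162 | 162 | 150 | 600 | 400 |
| Intel Core i5-2500K | 156 | 107 | 107 | 99 | 396 | 264 |
| Intel Xeon E5-2680 | 28 | 52 | 52 | 48 | 196 | 132 |
| Intel Xeon E5-2667 v2 | 24 | 84 | 84 | 84 | 228 | 60 |

Test: short3

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 203 | 89 | 119 | 95 | 119 | 290 |
| AMD Opteron 6328 | 66 | 90 | 92 | 90 | 89 | 151 |
| Intel Xeon X5365 | 126 | 99 | 90 | -- | -- | 171 |
| Intel Xeon X5482 | 32 | 72 | 64 | -- | -- | 144 |
| Intel Xeon X5675 | 24 | 83 | 48 | -- | -- | 136 |
| Intel Core i5-2520M | 87 | 162 | 162 | 150 | 562 | 450 |
| Intel Core i5-2500K | 156 | 107 | 107 | 99 | 404 | 297 |
| Intel Xeon E5-2680 | 28 | 52 | 52 | 48 | 196 | 148 |
| Intel Xeon E5-2667 v2 | 24 | 84 | 88 | 84 | 228 | 56 |

Test: offset

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 265 | 89 | 96 | 97 | 148 | 469 |
| AMD Opteron 6328 | 102 | 89 | 95 | 95 | 99 | 230 |
| Intel Xeon X5365 | 243 | 135 | 135 | -- | -- | 252 |
| Intel Xeon X5482 | 56 | 120 | 120 | -- | -- | 224 |
| Intel Xeon X5675 | 99 | 56 | 106 | -- | -- | 192 |
| Intel Core i5-2520M | 87 | 187 | 187 | 187 | 625 | 700 |
| Intel Core i5-2500K | 453 | 123 | 123 | 123 | 412 | 420 |
| Intel Xeon E5-2680 | 56 | 60 | 60 | 60 | 200 | 204 |
| Intel Xeon E5-2667 v2 | 52 | 96 | 96 | 92 | 232 | 60 |

Test: offset2

| CPU | stock | SSE2 | SSE2 aligned | AVX 128 | AVX 256 | ERMS |
|---|---|---|---|---|---|---|
| AMD FX-8120 | 221 | 122 | 122 | 120 | 144 | 469 |
| AMD Opteron 6328 | 104 | 92 | 93 | 93 | 98 | 226 |
| Intel Xeon X5365 | 126 | 108 | 108 | -- | -- | 225 |
| Intel Xeon X5482 | 56 | 96 | 104 | -- | -- | 200 |
| Intel Xeon X5675 | 48 | 99 | 106 | -- | -- | 160 |
| Intel Core i5-2520M | 87 | 187 | 187 | 187 | 637 | 612 |
| Intel Core i5-2500K | 156 | 123 | 123 | 123 | 420 | 354 |
| Intel Xeon E5-2680 | 52 | 60 | 60 | 60 | 204 | 176 |
| Intel Xeon E5-2667 v2 | 48 | 64 | 64 | 64 | 208 | 60 |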

Conclusions