wiki:LibCSSE/memset

Version 21 (modified by john, 12 years ago)

memset

Variants

Name          Description
stock         MD amd64 version (rep stosq)
SSE2          movdqu for block-store
SSE2 aligned  movaps for aligned block-store and movdqu for unaligned
AVX 128       128-bit vmovdqu for block-store
AVX 256       256-bit vmovdqu for block-store
ERMS          rep stosb for machines with ERMS
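
As a rough illustration of what "movdqu for block-store" means, here is a minimal sketch in C using SSE2 intrinsics. The function name memset_sse2_sketch is hypothetical, and the byte-at-a-time tail is an assumption made for brevity; the variants measured on this page are not reproduced here and probably handle heads and tails differently.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical sketch, not the routine benchmarked on this page. */
void *memset_sse2_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;
#if defined(__SSE2__)
    /* Broadcast the fill byte into all 16 lanes of an XMM register. */
    __m128i fill = _mm_set1_epi8((char)c);

    /* Unaligned 16-byte stores: the block-store loop. */
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, fill);
        p += 16;
        len -= 16;
    }
#endif
    /* Remaining bytes (or the whole buffer when SSE2 is unavailable). */
    while (len > 0) {
        *p++ = (unsigned char)c;
        len--;
    }
    return dst;
}
```

An aligned variant would first store bytes until p reaches a 16-byte boundary and then switch to aligned stores (_mm_store_si128) for the main loop.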

Note: clang was too smart and inlined all the short memset calls, so I had to create a copy of the amd64 version called memset_stock() to fool it.
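
For readers who hit the same problem, the sketch below shows two common ways to keep a compiler from expanding a memset call inline. Both names (memset_stock_sketch and memset_indirect) are hypothetical, and the C loop only stands in for the copied assembly routine. The noinline attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <string.h>

/*
 * Option 1: a separately named, non-inlined copy. The compiler does not
 * treat this name as a builtin, so calls to it stay real calls.
 * The volatile pointer keeps the loop from being rewritten as memset().
 */
__attribute__((noinline))
void *memset_stock_sketch(void *dst, int c, size_t len)
{
    volatile unsigned char *p = dst;
    size_t i;

    for (i = 0; i < len; i++)
        p[i] = (unsigned char)c;
    return dst;
}

/*
 * Option 2: call the library memset through a volatile function pointer.
 * The compiler must reload the pointer on each call and cannot assume
 * it still points at memset, so it cannot inline the call.
 */
void *(*volatile memset_indirect)(void *, int, size_t) = memset;
```

Option 2 measures the real library routine plus the cost of an indirect call, while option 1 measures whatever code is placed in the copy.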

Machines Tested

CPU                    Speed (GHz)  Notes
AMD FX-8120            3.11         1 x 8, zoo.freebsd.org
AMD Opteron 6328       3.20         2 x 8, Supermicro H8DG6/H8DGi
Intel Xeon X5365       3.00         2 x 4, Supermicro X7DBU
Intel Xeon X5482       3.20         2 x 4, Supermicro X7DWN+
Intel Xeon X5675       3.07         Westmere, 2 x 6, Supermicro X8DTU
Intel Core i5-2520M    2.50         Sandy Bridge, 1 x 4, Thinkpad X220 (4286)
Intel Core i5-2500K    3.30         Sandy Bridge, 1 x 4, MSI Z77A-G45 (MS-7752)
Intel Xeon E5-2680     2.70         Romley, 2 x 8, Supermicro X9DRW
Intel Xeon E5-2667 v2  3.30         Romley V2, 2 x 8, Supermicro X9DRW (supports ERMS)

Test Cases

Name     Description
page     set page to 0xa5
short    set aligned 15 bytes to 0xa5
short2   set aligned 32 bytes to 0xa5
short3   set aligned 48 bytes to 0xa5
offset   set misaligned (+4) 128 bytes to 0
offset2  set misaligned (+7) 97 bytes to 0
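
The test cases above can be written down as a small table in C, together with a checker that confirms a memset variant touched exactly the requested bytes. This is only a sketch: the struct, the field names, check_case, and the guard byte are hypothetical, and the 4096-byte page size is an assumption (it matches amd64 but is not stated on this page).

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096  /* assumption: 4 KB pages */

/* One row of the test-case table (hypothetical layout). */
struct memset_case {
    const char *name;
    size_t offset;  /* bytes past an aligned base address */
    size_t len;     /* number of bytes to set */
    int value;      /* fill byte */
};

static const struct memset_case memset_cases[] = {
    { "page",    0, PAGE_BYTES, 0xa5 },
    { "short",   0, 15,         0xa5 },
    { "short2",  0, 32,         0xa5 },
    { "short3",  0, 48,         0xa5 },
    { "offset",  4, 128,        0x00 },
    { "offset2", 7, 97,         0x00 },
};
#define N_CASES (sizeof memset_cases / sizeof memset_cases[0])

typedef void *(*memset_fn)(void *, int, size_t);

/* Run one case with fn; return 1 if only the requested bytes changed. */
int check_case(memset_fn fn, const struct memset_case *tc)
{
    static _Alignas(64) unsigned char arena[PAGE_BYTES + 64];
    const unsigned char guard = 0x3c;  /* differs from both fill values */
    size_t end = tc->offset + tc->len;
    size_t i;

    memset(arena, guard, sizeof arena);
    fn(arena + tc->offset, tc->value, tc->len);

    for (i = 0; i < sizeof arena; i++) {
        unsigned char want = (i >= tc->offset && i < end)
            ? (unsigned char)tc->value : guard;
        if (arena[i] != want)
            return 0;
    }
    return 1;
}
```

Checking correctness this way is separate from timing; the timed runs would call the variant directly without the verification loop.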

Results

Each number is the minimum over a distribution of samples, where each sample is the TSC delta measured across a single invocation of the test.
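
A minimal sketch of this kind of measurement in C follows. The names read_ticks, min_ticks, and test_page are hypothetical, as is the run count passed by the caller. The sketch reads the TSC with the __rdtsc() intrinsic without any serializing instruction, which may differ from how the numbers on this page were collected. On non-x86 targets it falls back to clock() only so that the code still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() */
#define HAVE_TSC 1
#else
#include <time.h>
#endif

/* Read a monotonically increasing tick counter. */
static uint64_t read_ticks(void)
{
#ifdef HAVE_TSC
    return __rdtsc();
#else
    return (uint64_t)clock();  /* coarse fallback, not a TSC */
#endif
}

/* Time `runs` single invocations of test() and return the smallest delta. */
uint64_t min_ticks(void (*test)(void), unsigned runs)
{
    uint64_t best = UINT64_MAX;
    unsigned i;

    for (i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        test();
        uint64_t delta = read_ticks() - start;
        if (delta < best)
            best = delta;
    }
    return best;
}

/* Example test body: the "page" case with an assumed 4 KB page. */
static unsigned char page_buf[4096];

static void test_page(void)
{
    memset(page_buf, 0xa5, sizeof page_buf);
}
```

Calling min_ticks(test_page, 1000) would return the best of 1000 samples. Taking the minimum rather than the mean filters out samples inflated by interrupts, cache misses, and other noise.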

Bold indicates the lowest time among the given variations in a Test and CPU combination. Green text is used for times faster than the stock implementation, and red text is used for times slower than the stock implementation.

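A -- entry marks a variant with no measurement for that CPU.

page

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120             1078   987           972      974     3095   1009
AMD Opteron 6328         490   446           454      454     2485    457
Intel Xeon X5365         657  1206           378       --       --    720
Intel Xeon X5482         624  1144           312       --       --    688
Intel Xeon X5675         352   296           300       --       --    428
Intel Core i5-2520M     1812   962           962      950     1400  13100
Intel Core i5-2500K      321   285           285      282      417    411
Intel Xeon E5-2680       356   308           308      304      448    436
Intel Xeon E5-2667 v2    424   344           340      340      484    292

short

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120              157   161           157      157      157    157
AMD Opteron 6328         108   106           108      108      108    108
Intel Xeon X5365         144   144           144       --       --    144
Intel Xeon X5482         112   112           112       --       --    112
Intel Xeon X5675         100   100            96       --       --     96
Intel Core i5-2520M      337   350           350      350      350    337
Intel Core i5-2500K       81    84            84       84       84     81
Intel Xeon E5-2680       108   112           112      112      112    108
Intel Xeon E5-2667 v2     56    60            60       60       60     56

short2

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120              188    99            90       97       91    248
AMD Opteron 6328         126    90            92       92       94    130
Intel Xeon X5365         126    90            90       --       --    162
Intel Xeon X5482          96    64            64       --       --    128
Intel Xeon X5675          76    44            48       --       --    120
Intel Core i5-2520M      237   162           162      150      600    400
Intel Core i5-2500K       57    39            39       36      171     96
Intel Xeon E5-2680        76    52            52       48      196    128
Intel Xeon E5-2667 v2    152    84            80       80      224     56

short3

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120              203    89           119       95      119    290
AMD Opteron 6328         128    91            91       96       94    144
Intel Xeon X5365         126    99            90       --       --    171
Intel Xeon X5482          96    72            64       --       --    144
Intel Xeon X5675          76    44            48       --       --    136
Intel Core i5-2520M      237   162           162      150      612    450
Intel Core i5-2500K       57    39            39       36      174    135
Intel Xeon E5-2680        76    52            52       52      196    144
Intel Xeon E5-2667 v2    152    84            84       80      228     56

offset

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120              265    89            96       97      148    469
AMD Opteron 6328         148    90            95       93      103    231
Intel Xeon X5365         243   135           135       --       --    252
Intel Xeon X5482         216   120           120       --       --    224
Intel Xeon X5675         208   106           106       --       --    192
Intel Core i5-2520M      687   187           187      187      625    700
Intel Core i5-2500K      192    45            45       45      177    180
Intel Xeon E5-2680       220    60            60       60      204    208
Intel Xeon E5-2667 v2    120    96            96       92      236     60

offset2

CPU                    stock  SSE2  SSE2 aligned  AVX 128  AVX 256   ERMS
AMD FX-8120              221   122           122      120      144    469
AMD Opteron 6328         137    93            96       96       99    233
Intel Xeon X5365         126   108           108       --       --    225
Intel Xeon X5482          96    96            96       --       --    192
Intel Xeon X5675          76    52            56       --       --    160
Intel Core i5-2520M      237   187           187      187      637    612
Intel Core i5-2500K       57    45            45       45      180    156
Intel Xeon E5-2680        76    60            60       60      208    172
Intel Xeon E5-2667 v2    132    64            64       64      208     56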

Conclusions