| Version 19 (modified by john, 12 years ago) (diff) |
|---|
memset
Variants
| Name | Description |
|---|---|
| stock | MD amd64 version (rep stosq) |
| SSE2 | movdqu for block-store |
| SSE2 aligned | movaps for aligned block-store and movdqu for unaligned |
| AVX 128 | 128-bit vmovdqu for block-store |
| AVX 256 | 256-bit vmovdqu for block-store |
| ERMS | rep stosb for machines with ERMS (Enhanced REP MOVSB/STOSB) |
Note: clang was too smart and inlined all the short memset calls, so I had to create a copy of the amd64 version called memset_stock() to fool it.
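For readers unfamiliar with the block-store idea, here is a minimal sketch of what the SSE2 variant could look like. It is not the code that was benchmarked: the name memset_sse2_sketch() is hypothetical, the byte-at-a-time tail is an assumption, and the non-SSE2 fallback exists only so the sketch compiles elsewhere.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif

/* Hypothetical sketch of an SSE2 block-store memset, not the tested code. */
static void *
memset_sse2_sketch(void *dst, int c, size_t len)
{
	unsigned char *p = dst;

#if defined(__SSE2__)
	/* Broadcast the fill byte into all 16 lanes of an XMM register. */
	const __m128i v = _mm_set1_epi8((char)c);

	/* Unaligned 16-byte block stores (compiles to movdqu). */
	while (len >= 16) {
		_mm_storeu_si128((__m128i *)p, v);
		p += 16;
		len -= 16;
	}
#endif
	/* Byte tail (assumption); also the whole loop when SSE2 is absent. */
	while (len > 0) {
		*p++ = (unsigned char)c;
		len--;
	}
	return (dst);
}
```

The "SSE2 aligned" variant would presumably check the destination alignment first and use an aligned store (_mm_store_si128) when the pointer is 16-byte aligned.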
Machines Tested
| CPU | Speed (GHz) | Notes |
|---|---|---|
| AMD FX-8120 | 3.11 | 1 x 8 zoo.freebsd.org |
| AMD Opteron 6328 | 3.20 | 2 x 8 Supermicro H8DG6/H8DGi |
| Intel Xeon X5365 | 3.00 | 2 x 4 Supermicro X7DBU |
| Intel Xeon X5482 | 3.20 | 2 x 4 Supermicro X7DWN+ |
| Intel Xeon X5675 | 3.07 | Westmere 2 x 6 Supermicro X8DTU |
| Intel Core i5-2520M | 2.50 | Sandy Bridge 1 x 4 Thinkpad X220 (4286) |
| Intel Core i5-2500K | 3.30 | Sandy Bridge 1 x 4 MSI Z77A-G45 (MS-7752) |
| Intel Xeon E5-2680 | 2.70 | Romley 2 x 8 Supermicro X9DRW |
| Intel Xeon E5-2667 v2 | 3.30 | Romley V2 2 x 8 Supermicro X9DRW (supports ERMS) |
Test Cases
| Name | Description |
|---|---|
| page | set page to 0xa5 |
| short | set aligned 15 bytes to 0xa5 |
| short2 | set aligned 32 bytes to 0xa5 |
| short3 | set aligned 48 bytes to 0xa5 |
| offset | set misaligned ( + 4) 128 bytes to 0 |
| offset2 | set misaligned ( + 7) 97 bytes to 0 |
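The test cases above can be written down as a small table-driven harness. This is only a sketch: the struct, the run_case() helper, the 4 KiB page size, and the 64-byte base alignment are all assumptions made for illustration, not details taken from the actual benchmark.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4 KiB amd64 page */

struct memset_case {
	const char *name;
	size_t offset;	/* misalignment added to the aligned base */
	size_t len;	/* number of bytes to set */
	int value;	/* fill byte */
};

/* One entry per row of the Test Cases table. */
static const struct memset_case cases[] = {
	{ "page",    0, SKETCH_PAGE_SIZE, 0xa5 },
	{ "short",   0, 15,  0xa5 },
	{ "short2",  0, 32,  0xa5 },
	{ "short3",  0, 48,  0xa5 },
	{ "offset",  4, 128, 0 },
	{ "offset2", 7, 97,  0 },
};

#define NCASES (sizeof(cases) / sizeof(cases[0]))

typedef void *(*memset_fn)(void *, int, size_t);

/* Aligned scratch buffer; the 64-byte alignment is an assumption. */
static _Alignas(64) unsigned char scratch[2 * SKETCH_PAGE_SIZE];

/* Run one case through fn and verify the result; returns 1 on success. */
static int
run_case(memset_fn fn, const struct memset_case *tc)
{
	const unsigned char want = (unsigned char)tc->value;
	const unsigned char guard = (unsigned char)~want;
	size_t i;

	/* Pre-fill with the complement so stray or missing writes show up. */
	memset(scratch, guard, sizeof(scratch));
	fn(scratch + tc->offset, tc->value, tc->len);

	for (i = 0; i < sizeof(scratch); i++) {
		int inside = i >= tc->offset && i < tc->offset + tc->len;

		if (scratch[i] != (inside ? want : guard))
			return (0);
	}
	return (1);
}
```

Any variant with the memset() signature can be passed as fn, so the same table serves both for correctness checks and as the set of inputs to time.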
Results
Each number is the minimum of a distribution of samples, where each sample is the TSC delta across a single invocation of the test.
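A minimal sketch of that measurement, assuming the TSC is read with the __rdtsc() compiler intrinsic. The helper names are hypothetical, the sketch does not serialize around the reads (whether the real harness did is not stated here), and the clock_gettime() fallback is only there so it builds on non-x86 machines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Read the time-stamp counter (no serialization in this sketch). */
static inline uint64_t
read_tsc(void)
{
	return (__rdtsc());
}
#else
#include <time.h>

/* Fallback for non-x86 builds: nanoseconds instead of TSC ticks. */
static inline uint64_t
read_tsc(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec);
}
#endif

typedef void *(*memset_fn)(void *, int, size_t);

/* Time iters single invocations of fn and return the smallest delta. */
static uint64_t
min_tsc_delta(memset_fn fn, void *buf, int c, size_t len, unsigned iters)
{
	uint64_t best = UINT64_MAX;
	unsigned i;

	for (i = 0; i < iters; i++) {
		uint64_t t0 = read_tsc();

		fn(buf, c, len);

		uint64_t delta = read_tsc() - t0;

		if (delta < best)
			best = delta;
	}
	return (best);
}
```

Taking the minimum rather than the mean discards samples inflated by interrupts, cache misses, and other noise, which is why the reported values look like small multiples of the TSC granularity.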
Bold marks the lowest time among the variants for a given test and CPU combination. Green text marks times faster than the stock implementation, and red text marks times slower than the stock implementation. A "--" entry means there is no measurement for that variant on that CPU (the AVX variants on the pre-AVX Xeons).
| CPU | page: stock | page: SSE2 | page: SSE2 aligned | page: AVX 128 | page: AVX 256 | page: ERMS | short: stock | short: SSE2 | short: SSE2 aligned | short: AVX 128 | short: AVX 256 | short: ERMS | short2: stock | short2: SSE2 | short2: SSE2 aligned | short2: AVX 128 | short2: AVX 256 | short2: ERMS | short3: stock | short3: SSE2 | short3: SSE2 aligned | short3: AVX 128 | short3: AVX 256 | short3: ERMS | offset: stock | offset: SSE2 | offset: SSE2 aligned | offset: AVX 128 | offset: AVX 256 | offset: ERMS | offset2: stock | offset2: SSE2 | offset2: SSE2 aligned | offset2: AVX 128 | offset2: AVX 256 | offset2: ERMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMD FX-8120 | 1078 | 987 | 972 | 974 | 3095 | 1009 | 157 | 161 | 157 | 157 | 157 | 157 | 188 | 99 | 90 | 97 | 91 | 248 | 203 | 89 | 119 | 95 | 119 | 290 | 265 | 89 | 96 | 97 | 148 | 469 | 221 | 122 | 122 | 120 | 144 | 469 |
| AMD Opteron 6328 | 490 | 446 | 454 | 454 | 2485 | 457 | 108 | 106 | 108 | 108 | 108 | 108 | 126 | 90 | 92 | 92 | 94 | 130 | 128 | 91 | 91 | 96 | 94 | 144 | 148 | 90 | 95 | 93 | 103 | 231 | 137 | 93 | 96 | 96 | 99 | 233 |
| Intel Xeon X5365 | 657 | 1206 | 378 | -- | -- | 720 | 144 | 144 | 144 | -- | -- | 144 | 126 | 90 | 90 | -- | -- | 162 | 126 | 99 | 90 | -- | -- | 171 | 243 | 135 | 135 | -- | -- | 252 | 126 | 108 | 108 | -- | -- | 225 |
| Intel Xeon X5482 | 624 | 1144 | 312 | -- | -- | 688 | 112 | 112 | 112 | -- | -- | 112 | 96 | 64 | 64 | -- | -- | 128 | 96 | 72 | 64 | -- | -- | 144 | 216 | 120 | 120 | -- | -- | 224 | 96 | 96 | 96 | -- | -- | 192 |
| Intel Xeon X5675 | 352 | 296 | 300 | -- | -- | 428 | 100 | 100 | 96 | -- | -- | 96 | 76 | 44 | 48 | -- | -- | 120 | 76 | 44 | 48 | -- | -- | 136 | 208 | 106 | 106 | -- | -- | 192 | 76 | 52 | 56 | -- | -- | 160 |
| Intel Core i5-2520M | 1812 | 962 | 962 | 950 | 1400 | 13100 | 337 | 350 | 350 | 350 | 350 | 337 | 237 | 162 | 162 | 150 | 600 | 400 | 237 | 162 | 162 | 150 | 612 | 450 | 687 | 187 | 187 | 187 | 625 | 700 | 237 | 187 | 187 | 187 | 637 | 612 |
| Intel Core i5-2500K | 321 | 285 | 285 | 282 | 417 | 411 | 81 | 84 | 84 | 84 | 84 | 81 | 57 | 39 | 39 | 36 | 171 | 96 | 57 | 39 | 39 | 36 | 174 | 135 | 192 | 45 | 45 | 45 | 177 | 180 | 57 | 45 | 45 | 45 | 180 | 156 |
| Intel Xeon E5-2680 | 356 | 308 | 308 | 304 | 452 | 440 | 28 | 112 | 112 | 112 | 112 | 112 | 28 | 52 | 52 | 48 | 196 | 132 | 28 | 52 | 52 | 48 | 196 | 148 | 56 | 60 | 60 | 60 | 200 | 204 | 52 | 60 | 60 | 60 | 204 | 176 |
| Intel Xeon E5-2667 v2 | 428 | 344 | 344 | 340 | 494 | 292 | 24 | 60 | 60 | 60 | 60 | 60 | 24 | 84 | 84 | 84 | 228 | 60 | 24 | 84 | 88 | 84 | 228 | 56 | 52 | 96 | 96 | 92 | 232 | 60 | 48 | 64 | 64 | 64 | 208 | 60 |
