| Version 8 (modified by john, 12 years ago) |
|---|
memset
Variants
| Name | Description |
|---|---|
| stock | machine-dependent (MD) amd64 version using rep stosq |
| SSE2 | movdqu for block-store |
| SSE2 aligned | movaps for aligned block-store and movdqu for unaligned |
| AVX 128 | 128-bit vmovdqu for block-store |
| AVX 256 | 256-bit vmovdqu for block-store |
| ERMS | rep stosb for machines with ERMS (Enhanced REP MOVSB/STOSB) |
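As a rough illustration of the SSE2 variant's block-store idea, here is a minimal C sketch using compiler intrinsics rather than the hand-written assembly that was benchmarked. The function name `memset_sse2_sketch` is hypothetical, and the byte-wise tail handling is an assumption, not a description of the tested code. `_mm_storeu_si128` normally compiles to an unaligned 16-byte store such as movdqu.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch; not the routine measured on this page. */
void *memset_sse2_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;

    assert(dst != NULL || len == 0);

#if defined(__SSE2__)
    /* Broadcast the fill byte into all 16 lanes of an XMM register. */
    const __m128i fill = _mm_set1_epi8((char)c);

    /* Block store: 16 bytes per iteration, no alignment requirement. */
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, fill);
        p += 16;
        len -= 16;
    }
#endif
    /* Tail bytes (or the whole buffer when SSE2 is unavailable). */
    while (len > 0) {
        *p++ = (unsigned char)c;
        len--;
    }
    return dst;
}
```

An aligned variant would presumably store byte-wise up to a 16-byte boundary first and then switch to aligned stores (`_mm_store_si128`), which is the distinction the "SSE2 aligned" row draws.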
Machines Tested
| CPU | Speed (GHz) | Notes |
|---|---|---|
| AMD FX-8120 | 3.11 | 1 x 8 zoo.freebsd.org |
| AMD Opteron 6328 | 3.20 | 2 x 8 Supermicro H8DG6/H8DGi |
| Intel Xeon X5365 | 3.00 | 2 x 4 Supermicro X7DBU |
| Intel Xeon X5482 | 3.20 | 2 x 4 Supermicro X7DWN+ |
| Intel Xeon X5675 | 3.07 | Westmere 2 x 6 Supermicro X8DTU |
| Intel Core i5-2520M | 2.50 | Sandy Bridge 1 x 4 Thinkpad X220 (4286) |
| Intel Core i5-2500K | 3.30 | Sandy Bridge 1 x 4 MSI Z77A-G45 (MS-7752) |
| Intel Xeon E5-2680 | 2.70 | Romley 2 x 8 Supermicro X9DRW |
| Intel Xeon E5-2667 v2 | 3.30 | Romley V2 2 x 8 Supermicro X9DRW (supports ERMS) |
Test Cases
| Name | Description |
|---|---|
| page | set page to 0xa5 |
| short | set aligned 15 bytes to 0xa5 |
| short2 | set aligned 32 bytes to 0xa5 |
| short3 | set aligned 48 bytes to 0xa5 |
| offset | set 128 bytes at a misaligned start (+ 4) to 0 |
| offset2 | set 97 bytes at a misaligned start (+ 7) to 0 |
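The test cases above can be written down as a small table in C. This is only a sketch: the struct, the names `test_cases`, `test_case_count` and `run_test_case`, and the 4096-byte page size are assumptions for illustration, and the standard `memset` stands in for whichever variant is under test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ASSUMED_PAGE_SIZE 4096  /* assumption: the page size is not stated above */

/* Hypothetical description of one test case. */
struct test_case {
    const char *name;
    size_t offset;  /* byte offset from an aligned base address */
    size_t len;     /* number of bytes to set */
    int fill;       /* fill value */
};

static const struct test_case test_cases[] = {
    { "page",    0, ASSUMED_PAGE_SIZE, 0xa5 },
    { "short",   0, 15,                0xa5 },
    { "short2",  0, 32,                0xa5 },
    { "short3",  0, 48,                0xa5 },
    { "offset",  4, 128,               0    },
    { "offset2", 7, 97,                0    },
};

size_t test_case_count(void)
{
    return sizeof(test_cases) / sizeof(test_cases[0]);
}

/* Run case number idx against a buffer whose start is suitably aligned. */
void run_test_case(size_t idx, unsigned char *aligned_base)
{
    assert(aligned_base != NULL && idx < test_case_count());
    const struct test_case *tc = &test_cases[idx];
    memset(aligned_base + tc->offset, tc->fill, tc->len);
}
```

The buffer passed in needs at least `ASSUMED_PAGE_SIZE` bytes so that every case fits.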
Results
Each number is the minimum value observed over the distribution of runs, where each value is the TSC delta measured across a single invocation of the test.
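The measurement described above could be sketched roughly as follows. This is not the harness that produced the results. The names `read_ticks` and `min_tick_delta` are hypothetical, the run count is left to the caller, and the sketch assumes GCC or Clang for `__rdtsc()` from `<x86intrin.h>`. It also omits any serializing instruction around the TSC reads, and on non-x86 targets it falls back to `clock()`, which is not a TSC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Read a tick counter: the TSC on x86, clock() elsewhere (fallback only). */
static uint64_t read_ticks(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return (uint64_t)clock();
#endif
}

/* Return the smallest tick delta seen across `runs` single invocations. */
uint64_t min_tick_delta(void *buf, int c, size_t len, unsigned runs)
{
    uint64_t best = UINT64_MAX;

    assert(buf != NULL && runs > 0);
    for (unsigned i = 0; i < runs; i++) {
        uint64_t start = read_ticks();
        memset(buf, c, len);              /* the routine under test */
        uint64_t delta = read_ticks() - start;
        if (delta < best)
            best = delta;
    }
    return best;
}
```

Taking the minimum rather than the mean discards runs disturbed by interrupts or cache misses, so it approximates the best-case cost of one call.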
Bold indicates the lowest time among the variants for a given Test and CPU combination. Green text marks times faster than the stock implementation, and red text marks times slower than the stock implementation.
| CPU | page / stock | page / SSE2 | page / SSE2 aligned | page / AVX 128 | page / AVX 256 | page / ERMS | short / stock | short / SSE2 | short / SSE2 aligned | short / AVX 128 | short / AVX 256 | short / ERMS | short2 / stock | short2 / SSE2 | short2 / SSE2 aligned | short2 / AVX 128 | short2 / AVX 256 | short2 / ERMS | short3 / stock | short3 / SSE2 | short3 / SSE2 aligned | short3 / AVX 128 | short3 / AVX 256 | short3 / ERMS | offset / stock | offset / SSE2 | offset / SSE2 aligned | offset / AVX 128 | offset / AVX 256 | offset / ERMS | offset2 / stock | offset2 / SSE2 | offset2 / SSE2 aligned | offset2 / AVX 128 | offset2 / AVX 256 | offset2 / ERMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMD FX-8120 | 663 | 601 | 404 | 405 | 1886 | 1018 | 49 | 98 | 64 | 64 | 95 | 153 | 49 | 52 | 38 | 39 | 55 | 243 | 49 | 55 | 49 | 43 | 72 | 292 | 51 | 56 | 43 | 44 | 90 | 469 | 52 | 74 | 52 | 49 | 88 | 469 |
| AMD Opteron 6328 | 482 | 443 | 424 | 461 | 2454 | 449 | 69 | 106 | 106 | 106 | 106 | 106 | 68 | 87 | 88 | 86 | 87 | 128 | 66 | 90 | 92 | 90 | 89 | 151 | 102 | 89 | 95 | 95 | 99 | 230 | 104 | 92 | 93 | 93 | 98 | 226 |
| Intel Xeon X5365 | 657 | 1197 | 378 | -- | -- | 720 | 63 | 144 | 144 | -- | -- | 144 | 63 | 90 | 90 | -- | -- | 162 | 63 | 99 | 90 | -- | -- | 171 | 243 | 135 | 135 | -- | -- | 252 | 126 | 108 | 117 | -- | -- | 225 |
| Intel Xeon X5482 | 624 | 1144 | 312 | -- | -- | 696 | 32 | 112 | 112 | -- | -- | 112 | 32 | 64 | 64 | -- | -- | 128 | 32 | 72 | 64 | -- | -- | 144 | 56 | 120 | 120 | -- | -- | 224 | 56 | 96 | 104 | -- | -- | 200 |
| Intel Xeon X5675 | 352 | 296 | 300 | -- | -- | 428 | 24 | 100 | 100 | -- | -- | 96 | 46 | 83 | 92 | -- | -- | 120 | 24 | 83 | 48 | -- | -- | 136 | 99 | 56 | 106 | -- | -- | 192 | 48 | 99 | 106 | -- | -- | 160 |
| Intel Core i5-2520M | ||||||||||||||||||||||||||||||||||||
| Intel Core i5-2500K | ||||||||||||||||||||||||||||||||||||
| Intel Xeon E5-2680 | 356 | 308 | 308 | 304 | 452 | 440 | 28 | 112 | 112 | 112 | 112 | 112 | 28 | 52 | 52 | 48 | 196 | 132 | 28 | 52 | 52 | 48 | 196 | 148 | 56 | 60 | 60 | 60 | 200 | 204 | 52 | 60 | 60 | 60 | 204 | 176 |
| Intel Xeon E5-2667 v2 | 428 | 344 | 344 | 340 | 494 | 292 | 24 | 60 | 60 | 60 | 60 | 60 | 24 | 84 | 84 | 84 | 228 | 60 | 24 | 84 | 88 | 84 | 228 | 56 | 52 | 96 | 96 | 92 | 232 | 60 | 48 | 64 | 64 | 64 | 208 | 60 |
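The highlighting rule for the results (lowest time per Test and CPU, faster or slower than stock) can be expressed as a short C sketch. The names `classify_times` and `enum mark` are hypothetical, and the caller is assumed to pass only measured variants, with the stock time first and any "--" entries left out.

```c
#include <assert.h>
#include <stddef.h>

enum mark { MARK_SAME, MARK_FASTER, MARK_SLOWER };

/*
 * times[0] is the stock implementation. Fills marks[i] relative to stock
 * and returns the index of the lowest time (the first one on a tie).
 */
size_t classify_times(const unsigned *times, size_t n, enum mark *marks)
{
    size_t best = 0;

    assert(times != NULL && marks != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (times[i] < times[best])
            best = i;
        if (times[i] < times[0])
            marks[i] = MARK_FASTER;   /* shown in green */
        else if (times[i] > times[0])
            marks[i] = MARK_SLOWER;   /* shown in red */
        else
            marks[i] = MARK_SAME;
    }
    return best;
}
```

For the Xeon E5-2667 v2 page test (428, 344, 344, 340, 494, 292), this returns index 5 (ERMS) as the lowest time and marks only AVX 256 as slower than stock.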
