| Version 2 (modified by john, 12 years ago) |
|---|
strlen
Variants
| Name | Description |
|---|---|
| stock | MD (machine-dependent) amd64 version using rep stosq |
| SSE2 | movdqu for block-store |
| SSE2 aligned | movaps for aligned block-store and movdqu for unaligned |
| AVX 128 | 128-bit vmovdqu for block-store |
| AVX 256 | 256-bit vmovdqu for block-store |
| ERMS | rep stosb for machines with ERMS (Enhanced REP MOVSB/STOSB) |
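
To make the "block-store" idea concrete, here is a minimal C sketch of the SSE2 variant using compiler intrinsics. It is not the routine measured on this page: the function name `fill_sse2_sketch` is hypothetical, the byte-wise tail handling is an assumption, and the real variants are presumably written in assembly. `_mm_storeu_si128` is the intrinsic that normally compiles to movdqu.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical illustration only; not the code benchmarked on this page. */
void *fill_sse2_sketch(void *dst, int c, size_t len)
{
    unsigned char *p = dst;

#if defined(__SSE2__)
    /* Broadcast the fill byte into all 16 lanes of an XMM register. */
    const __m128i fill = _mm_set1_epi8((char)c);

    /* Block store: 16 bytes per iteration, unaligned store (movdqu). */
    while (len >= 16) {
        _mm_storeu_si128((__m128i *)p, fill);
        p += 16;
        len -= 16;
    }
#endif
    /* Tail (or the whole buffer when SSE2 is unavailable): byte stores. */
    while (len > 0) {
        *p++ = (unsigned char)c;
        len--;
    }
    return dst;
}
```

An "SSE2 aligned" variant would presumably first advance `p` to a 16-byte boundary and then switch to an aligned store; that step is omitted here.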
Machines Tested
| CPU | Speed (GHz) | Notes |
|---|---|---|
| AMD FX-8120 | 3.11 | 1 x 8 zoo.freebsd.org |
| AMD Opteron 6328 | 3.20 | 2 x 8 Supermicro H8DG6/H8DGi |
| Intel Xeon X5365 | 3.00 | 2 x 4 Supermicro X7DBU |
| Intel Xeon X5482 | 3.20 | 2 x 4 Supermicro X7DWN+ |
| Intel Xeon X5675 | 3.07 | Westmere 2 x 6 Supermicro X8DTU |
| Intel Core i5-2520M | 2.50 | Sandy Bridge 1 x 4 Thinkpad X220 (4286) |
| Intel Core i5-2500K | 3.30 | Sandy Bridge 1 x 4 MSI Z77A-G45 (MS-7752) |
| Intel Xeon E5-2680 | 2.70 | Romley 2 x 8 Supermicro X9DRW |
| Intel Xeon E5-2667 v2 | 3.30 | Romley V2 2 x 8 Supermicro X9DRW (supports ERMS) |
Test Cases
| Name | Description |
|---|---|
| page | set page to 0xa5 |
| short | set aligned 15 bytes to 0xa5 |
| short2 | set aligned 32 bytes to 0xa5 |
| short3 | set aligned 48 bytes to 0xa5 |
| offset | set misaligned ( + 4) 128 bytes to 0 |
| offset2 | set misaligned ( + 7) 97 bytes to 0 |
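
The test cases above can be written down as a small table of descriptors. The sketch below is an assumption about how such a harness might look, not the harness used here: the struct, the names `fill_case` and `run_fill_case`, and the 4096-byte page size are all hypothetical, and the standard `memset` stands in for the variant under test.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

/* One test case: fill `len` bytes at (aligned base + offset) with `value`. */
struct fill_case {
    const char *name;
    size_t offset;
    size_t len;
    int value;
};

static const struct fill_case fill_cases[] = {
    { "page",    0, SKETCH_PAGE_SIZE, 0xa5 },
    { "short",   0, 15,               0xa5 },
    { "short2",  0, 32,               0xa5 },
    { "short3",  0, 48,               0xa5 },
    { "offset",  4, 128,              0x00 },
    { "offset2", 7, 97,               0x00 },
};

#define N_FILL_CASES (sizeof fill_cases / sizeof fill_cases[0])

/* Apply one case to an aligned buffer of at least SKETCH_PAGE_SIZE bytes.
 * memset() is a stand-in for whichever variant is being measured. */
static void run_fill_case(const struct fill_case *tc, unsigned char *aligned_base)
{
    memset(aligned_base + tc->offset, tc->value, tc->len);
}
```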
Results
Each number is the minimum of a distribution of samples, where each sample is the TSC delta measured across a single invocation of the test.
Bold indicates the lowest time among the variants for a given test and CPU combination. Green text marks times faster than the stock implementation, and red text marks times slower than it.
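
A rough sketch of this kind of measurement in C follows. It assumes GCC or Clang on x86 (for `__rdtsc` from `<x86intrin.h>`); the names `min_tsc_delta` and `fill_page` and the run count are hypothetical, and serializing instructions around the TSC reads are omitted, so this is not necessarily how the numbers on this page were collected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() on GCC/Clang */
#endif

/* Read the time-stamp counter; returns 0 on targets without a TSC. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;
#endif
}

/* Minimum TSC delta over `runs` single invocations of fn(arg). */
uint64_t min_tsc_delta(void (*fn)(void *), void *arg, size_t runs)
{
    uint64_t best = UINT64_MAX;

    for (size_t i = 0; i < runs; i++) {
        uint64_t start = read_tsc();
        fn(arg);                               /* one invocation of the test */
        uint64_t delta = read_tsc() - start;
        if (delta < best)
            best = delta;
    }
    return best;
}

/* Example workload (assumption): fill one 4096-byte page with 0xa5. */
static void fill_page(void *buf)
{
    memset(buf, 0xa5, 4096);
}
```

Taking the minimum rather than the mean discards samples inflated by interrupts, cache misses, or frequency changes, which is a common reason to report it for micro-benchmarks.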
| CPU | page / stock | page / SSE2 | page / SSE2 aligned | page / AVX 128 | page / AVX 256 | page / ERMS | short / stock | short / SSE2 | short / SSE2 aligned | short / AVX 128 | short / AVX 256 | short / ERMS | short2 / stock | short2 / SSE2 | short2 / SSE2 aligned | short2 / AVX 128 | short2 / AVX 256 | short2 / ERMS | short3 / stock | short3 / SSE2 | short3 / SSE2 aligned | short3 / AVX 128 | short3 / AVX 256 | short3 / ERMS | offset / stock | offset / SSE2 | offset / SSE2 aligned | offset / AVX 128 | offset / AVX 256 | offset / ERMS | offset2 / stock | offset2 / SSE2 | offset2 / SSE2 aligned | offset2 / AVX 128 | offset2 / AVX 256 | offset2 / ERMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMD FX-8120 | 663 | 601 | **404** | 405 | 1886 | 1018 | **49** | 98 | 64 | 64 | 95 | 153 | 49 | 52 | **38** | 39 | 55 | 243 | 49 | 55 | 49 | **43** | 72 | 292 | 51 | 56 | **43** | 44 | 90 | 469 | 52 | 74 | 52 | **49** | 88 | 469 |
| AMD Opteron 6328 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Xeon X5365 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Xeon X5482 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Xeon X5675 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Core i5-2520M | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Core i5-2500K | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Xeon E5-2680 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| Intel Xeon E5-2667 v2 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
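
The highlighting rule (lowest time per test, and faster or slower than stock) is simple arithmetic over one group of six numbers. The sketch below applies it to the AMD FX-8120 "page" results from the table; the function names are hypothetical, and the variant order (stock, SSE2, SSE2 aligned, AVX 128, AVX 256, ERMS) follows the column order used on this page.

```c
#include <stddef.h>

enum { N_VARIANTS = 6 };  /* stock, SSE2, SSE2 aligned, AVX 128, AVX 256, ERMS */

/* AMD FX-8120, "page" test, copied from the results table. */
static const unsigned fx8120_page[N_VARIANTS] = { 663, 601, 404, 405, 1886, 1018 };

/* Index of the lowest time (the entry that would be bold); first wins ties. */
size_t fastest_variant(const unsigned times[N_VARIANTS])
{
    size_t best = 0;
    for (size_t i = 1; i < N_VARIANTS; i++)
        if (times[i] < times[best])
            best = i;
    return best;
}

/* -1: faster than stock (green), 1: slower (red), 0: same time.
 * Stock is assumed to be at index 0. */
int versus_stock(const unsigned times[N_VARIANTS], size_t variant)
{
    if (times[variant] < times[0])
        return -1;
    if (times[variant] > times[0])
        return 1;
    return 0;
}
```

For this group, `fastest_variant(fx8120_page)` selects index 2 (SSE2 aligned, 404 cycles), and `versus_stock(fx8120_page, 4)` reports AVX 256 (1886 cycles) as slower than stock (663 cycles).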
