Version 1 (modified by john, 12 years ago)
strlen
Variants
| Name | Description |
|---|---|
| stock | Machine-independent (MI) C version |
| SSE2 | pcmpeqb and pmovmskb |
| SSE4.2 | pcmpestri and pcmpestrm |
| AVX | 128-bit vpcmpeqb and vpmovmskb |
| ERMS | repne scasb, for machines with ERMS (Enhanced REP MOVSB/STOSB) |
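As a rough illustration of the SSE2 row, the sketch below uses the `_mm_cmpeq_epi8` (pcmpeqb) and `_mm_movemask_epi8` (pmovmskb) intrinsics from `<emmintrin.h>` to test 16 bytes at a time. It is a hypothetical C rendering (`strlen_sse2_sketch` is an illustrative name), not the assembly that was benchmarked, and it assumes GCC or Clang on x86-64 for `__builtin_ctz`. Like real vectorized strlen implementations, it reads whole aligned 16-byte blocks that may include bytes outside the string; an aligned 16-byte load never crosses a page boundary, so this is safe in practice, but sanitizers may report it.

```c
#include <emmintrin.h> /* SSE2 intrinsics: pcmpeqb, pmovmskb */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the pcmpeqb/pmovmskb technique. */
size_t strlen_sse2_sketch(const char *s)
{
    /* Round down to a 16-byte boundary so no load can cross into
     * an unmapped page beyond the terminator. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    const __m128i zero = _mm_setzero_si128();

    /* First block: compare all 16 bytes with zero (pcmpeqb), collect one
     * bit per byte (pmovmskb), then shift out bits for bytes before s. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
    mask >>= (uintptr_t)s & 15;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Every following block is 16-byte aligned. */
    for (;;) {
        p += 16;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128((const __m128i *)p), zero));
        if (mask != 0) /* lowest set bit marks the first NUL in this block */
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
```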
Note: clang treats strlen as a builtin and optimized the plain strlen calls away, so I had to create a copy of the C version called strlen_mi() to fool it.
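Renaming is enough because compilers may evaluate calls to the standard name `strlen` at compile time, or drop them when the result is unused; a function with any other name is an ordinary call. The sketch below is a hypothetical stand-in for `strlen_mi()` using the common word-at-a-time zero-byte test; it is not the actual libc source. Like the vector variants, its word loads may read a few bytes past the terminator within the same aligned word.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x0101...01 and 0x8080...80, sized to the native word. */
#define ONES  ((uintptr_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))

/* Nonzero iff some byte of w is zero (classic bit trick). */
#define HAS_ZERO(w) (((w) - ONES) & ~(w) & HIGHS)

/* Hypothetical stand-in for strlen_mi(): a differently named copy, so
 * the compiler cannot treat calls to it as the strlen builtin. */
size_t strlen_mi(const char *s)
{
    const char *p = s;

    /* Byte loop until p is word-aligned. */
    while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Word loop: an aligned word never straddles a page boundary. */
    for (;;) {
        uintptr_t w;
        memcpy(&w, p, sizeof w); /* avoids strict-aliasing problems */
        if (HAS_ZERO(w))
            break;
        p += sizeof w;
    }

    /* Find the exact zero byte inside the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```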
Machines Tested
| CPU | Speed (GHz) | Notes |
|---|---|---|
| Xeon X5365 | 3.00 | 2 x 4 Supermicro X7DBU |
| Xeon X5482 | 3.20 | 2 x 4 Supermicro X7DWN+ |
| Xeon X5675 | 3.07 | Westmere 2 x 6 Supermicro X8DTU |
| Core i5-2520M | 2.50 | Sandy Bridge 1 x 4 Thinkpad X220 (4286) |
| Xeon E5-2680 | 2.70 | Romley 2 x 8 Supermicro X9DRW |
| Xeon E5-2667 v2 | 3.30 | Romley V2 2 x 8 Supermicro X9DRW |
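Not every variant can run on every machine listed: for example, the Core 2-based X5365 and X5482 lack SSE4.2, AVX first appeared with Sandy Bridge, and ERMS with Ivy Bridge. A minimal sketch of checking the relevant CPUID bits, assuming GCC or Clang's `<cpuid.h>`; `detect_strlen_features` and its struct are hypothetical names.

```c
#include <cpuid.h>   /* GCC/Clang: __get_cpuid, __get_cpuid_count */
#include <stdbool.h>

struct strlen_features {
    bool sse2, sse42, avx, erms;
};

struct strlen_features detect_strlen_features(void)
{
    struct strlen_features f = { false, false, false, false };
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.sse2  = (edx >> 26) & 1;   /* CPUID.1:EDX[26] */
        f.sse42 = (ecx >> 20) & 1;   /* CPUID.1:ECX[20] */
        /* AVX bit only (ECX[28]); a real dispatcher must also check
         * OSXSAVE (ECX[27]) and XGETBV to confirm OS support. */
        f.avx   = (ecx >> 28) & 1;
    }
    /* __get_cpuid_count returns 0 if leaf 7 is not supported. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        f.erms = (ebx >> 9) & 1;     /* CPUID.(EAX=7,ECX=0):EBX[9] */
    return f;
}
```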
Test Cases
| Name | Description |
|---|---|
| page | aligned string, one page minus 1 character long (the string plus its NUL is exactly one page of bytes) |
| short | aligned string, 14 characters long |
| short2 | aligned string, 32 characters long |
| short3 | aligned string, 48 characters long |
| offset | string starting at a 4-byte offset, 126 characters long |
| offset2 | string starting at a 7-byte offset, 95 characters long |
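One way the page and offset cases could be set up, as a sketch assuming a POSIX system: allocate page-aligned memory with `posix_memalign`, then place the string at the requested offset from the page boundary. `make_test_string` is a hypothetical helper, not taken from the original benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* posix_memalign, sysconf */
#endif
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns a string of `len` 'x' characters starting
 * `offset` bytes past a page boundary. *base receives the allocation to
 * pass to free(). Returns NULL on failure. */
char *make_test_string(size_t offset, size_t len, void **base)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t need = offset + len + 1;                  /* +1 for the NUL */
    size_t size = (need + page - 1) / page * page;   /* round up to pages */
    void *mem;

    if (posix_memalign(&mem, page, size) != 0)
        return NULL;
    char *s = (char *)mem + offset;
    memset(s, 'x', len);
    s[len] = '\0';
    *base = mem;
    return s;
}
```

For example, `make_test_string(0, page - 1, &base)` would give the page case and `make_test_string(7, 95, &base)` the offset2 case.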
Results
Each number is the minimum of the distribution of TSC deltas, where each delta is measured across a single invocation of the test (lower is better).
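A sketch of how such a minimum TSC delta might be collected, assuming GCC or Clang on x86 (`__rdtsc` and `_mm_lfence` from `<x86intrin.h>`). `min_tsc_delta` is a hypothetical harness, not the one used for these results; the `lfence` instructions keep the timed call from overlapping the `rdtsc` reads. Taking the minimum filters out interrupts, cache misses, and other noise that can only lengthen a run.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>      /* strlen, for the usage example */
#include <x86intrin.h>   /* __rdtsc, _mm_lfence (GCC/Clang) */

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical harness: time `runs` single invocations of fn(s) and
 * return the smallest TSC delta. *len_out receives fn's result, and the
 * volatile sink keeps the call from being optimized away. */
uint64_t min_tsc_delta(strlen_fn fn, const char *s, int runs, size_t *len_out)
{
    uint64_t best = UINT64_MAX;
    volatile size_t sink = 0;

    for (int i = 0; i < runs; i++) {
        _mm_lfence();                /* let earlier work finish first */
        uint64_t t0 = __rdtsc();
        _mm_lfence();                /* don't start fn before reading t0 */
        sink = fn(s);
        _mm_lfence();                /* wait for fn before reading t1 */
        uint64_t t1 = __rdtsc();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    *len_out = sink;
    return best;
}
```

Usage might look like `min_tsc_delta(strlen_mi, s, 10000, &n)` for each variant and test string.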
| CPU | page / stock | page / SSE2 | page / SSE4.2 | page / AVX | page / ERMS | short / stock | short / SSE2 | short / SSE4.2 | short / AVX | short / ERMS | short2 / stock | short2 / SSE2 | short2 / SSE4.2 | short2 / AVX | short2 / ERMS | short3 / stock | short3 / SSE2 | short3 / SSE4.2 | short3 / AVX | short3 / ERMS | offset / stock | offset / SSE2 | offset / SSE4.2 | offset / AVX | offset / ERMS | offset2 / stock | offset2 / SSE2 | offset2 / SSE4.2 | offset2 / AVX | offset2 / ERMS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
