wiki:LibCSSE/strlen

Version 5 (modified by john, 12 years ago)


strlen

Variants

Name Description
stock MI C version
SSE2 pcmpeqb and pmovmskb
SSE4.2 pcmpestri and pcmpestrm
AVX 128-bit vpcmpeqb and vpmovmskb
ERMS repne scasb for machines with ERMS
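
To illustrate the pcmpeqb / pmovmskb approach named in the SSE2 row, here is a minimal sketch written with compiler intrinsics. It is not the LibCSSE source: the name strlen_sse2_sketch is hypothetical, and __builtin_ctz assumes GCC or Clang. The sketch rounds the pointer down to a 16-byte boundary so every load is aligned and stays inside a page the string already touches; it may still read a few bytes past the terminator within that 16-byte block, which memory checkers can flag.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch of the technique, not the LibCSSE routine. */
size_t strlen_sse2_sketch(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round down to a 16-byte boundary so that every load is aligned. */
    uintptr_t addr = (uintptr_t)s;
    unsigned skip = (unsigned)(addr & 15);
    const char *p = (const char *)(addr - skip);

    /* pcmpeqb: 0xFF in every byte lane equal to zero.
     * pmovmskb: gather the top bit of each lane into a 16-bit mask. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));

    /* Ignore zero bytes that lie before the start of the string. */
    mask &= (~0u << skip);

    while (mask == 0) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    }

    /* The lowest set bit is the index of the first NUL in this chunk. */
    return (size_t)((uintptr_t)p + (unsigned)__builtin_ctz(mask) - addr);
}
```

The AVX variant in the table uses the VEX-encoded forms of the same two instructions, so the loop structure would be the same.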

Note: clang was too smart and optimized plain strlen calls away, so I had to create a copy of the C version called strlen_mi() to fool it.
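
A sketch of what such a copy might look like is below. Only the name strlen_mi() comes from the note above; the body and the noinline attribute are assumptions. Whether this is enough to keep a given compiler version from folding or replacing the call depends on the compiler and its flags.

```c
#include <stddef.h>

/* Plain byte loop under a different name, so calls to it are not treated
 * as the strlen builtin. The noinline attribute (GCC/Clang) is an
 * assumption, added so a call on a constant string is less likely to be
 * folded away at the call site. */
__attribute__((noinline))
size_t strlen_mi(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```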

Machines Tested

CPU Speed (GHz) Notes
Xeon X5365 3.00 2 x 4 Supermicro X7DBU
Xeon X5482 3.20 2 x 4 Supermicro X7DWN+
Xeon X5675 3.07 Westmere 2 x 6 Supermicro X8DTU
Core i5-2520M 2.50 Sandy Bridge 1 x 4 Thinkpad X220 (4286)
Xeon E5-2680 2.70 Romley 2 x 8 Supermicro X9DRW
Xeon E5-2667 v2 3.30 Romley V2 2 x 8 Supermicro X9DRW

Test Cases

Name Description
page aligned string one page - 1 long
short aligned string 14 characters long
short2 aligned string 32 characters long
short3 aligned string 48 characters long
offset 4 byte offset string 126 characters long
offset2 7 byte offset string 95 characters long
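
The test strings above could be set up along these lines. This is a sketch under assumptions: the helper make_case and the constant CASE_PAGE_SIZE are hypothetical, the page size is taken to be 4096 bytes, and "aligned" is interpreted as starting on a page boundary, which the page does not state.

```c
#include <stdlib.h>
#include <string.h>

#define CASE_PAGE_SIZE 4096  /* assumed page size */

/* Hypothetical helper: allocates a page-aligned buffer and places a string
 * of len characters at the given byte offset from its start.
 * Returns the buffer (to be passed to free()), or NULL on failure;
 * *str receives the start of the string. */
char *make_case(size_t offset, size_t len, char **str)
{
    /* aligned_alloc (C11) wants the size to be a multiple of the alignment. */
    size_t need = offset + len + 1;
    size_t size = (need + CASE_PAGE_SIZE - 1) / CASE_PAGE_SIZE * CASE_PAGE_SIZE;
    char *base = aligned_alloc(CASE_PAGE_SIZE, size);

    if (base == NULL)
        return NULL;
    memset(base, 'x', size);       /* non-zero filler */
    base[offset + len] = '\0';     /* terminator right after len characters */
    *str = base + offset;
    return base;
}
```

Under these assumptions, make_case(0, CASE_PAGE_SIZE - 1, &s) would correspond to the page case and make_case(4, 126, &s) to the offset case.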

Results

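Each number is the minimum of the measured distribution, where every measurement is a TSC delta (in TSC ticks) across a single invocation of the test; lower is better. A -- entry means there is no result for that variant on that CPU.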
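
The measurement described above can be sketched as follows. The helper min_tsc_delta and the strlen_fn typedef are hypothetical, and the actual harness may differ, for example by serializing around the timestamp reads; plain rdtsc is not a serializing instruction, so very short intervals are only approximate. The __rdtsc() intrinsic assumes GCC or Clang on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>  /* __rdtsc() on GCC and Clang */

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: times `runs` single invocations of fn(s) with the
 * TSC and returns the smallest delta seen (UINT64_MAX if runs <= 0). */
uint64_t min_tsc_delta(strlen_fn fn, const char *s, int runs)
{
    volatile size_t sink = 0;  /* keeps the call from being optimized out */
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < runs; i++) {
        uint64_t t0 = __rdtsc();
        sink = fn(s);
        uint64_t t1 = __rdtsc();

        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Taking the minimum rather than the mean discards runs that were disturbed by interrupts or cache misses, which is presumably why the table reports it.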

page

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365         1386    864     --     --  16506
Xeon X5482         1608    808     --     --  16464
Xeon X5675         1592    848   2100     --   8252
Core i5-2520M      3525   1950   6463    536  25812
Xeon E5-2680       1496    632   2048    644   8260
Xeon E5-2667 v2    1296    632   2076    648   8260

short

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365           81     72     --     --    180
Xeon X5482           48     40     --     --    136
Xeon X5675           60     24     24     --     92
Core i5-2520M       100     75     75     18    300
Xeon E5-2680         36     24     24     28     96
Xeon E5-2667 v2      24     24     24     24    100

short2

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365           90     72     --     --    252
Xeon X5482           48     40     --     --    140
Xeon X5675           32     46     32     --    124
Core i5-2520M       112     75     87     18    412
Xeon E5-2680         40     24     28     24    132
Xeon E5-2667 v2      36     24     28     24    132

short3

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365           90     81     --     --    315
Xeon X5482           56     40     --     --    264
Xeon X5675           78     46     83     --    156
Core i5-2520M       112     75    112     21    512
Xeon E5-2680         52     24     32     24    164
Xeon E5-2667 v2      48     24     28     28    164

offset

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365          144    108     --     --    630
Xeon X5482           80     80     --     --    592
Xeon X5675           76     64    104     --    316
Core i5-2520M       212    100    262     21   1012
Xeon E5-2680         80     40     84     24    324
Xeon E5-2667 v2      80     44     84     24    324

offset2

CPU               stock   SSE2 SSE4.2    AVX   ERMS
Xeon X5365          135    108     --     --    504
Xeon X5482           72     72     --     --    464
Xeon X5675           64     56     88     --    256
Core i5-2520M       175    100    212     24    825
Xeon E5-2680         68     36     64     24    264
Xeon E5-2667 v2      64     28     64     24    264