
Version 3 (modified by john, 12 years ago)


strlen

Variants

Name    Description
stock   machine-independent (MI) C version
SSE2    pcmpeqb and pmovmskb
SSE4.2  pcmpestri and pcmpestrm
AVX     128-bit vpcmpeqb and vpmovmskb
ERMS    repne scasb, for machines with ERMS (Enhanced REP MOVSB/STOSB)
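The idea behind the SSE2 variant can be sketched in C with compiler intrinsics: compare 16 bytes at a time against zero with pcmpeqb (`_mm_cmpeq_epi8`), collapse the result into a bit mask with pmovmskb (`_mm_movemask_epi8`), and take the index of the lowest set bit. This is a minimal sketch, not the libc code benchmarked here; the name `strlen_sse2_sketch()` is hypothetical, and it assumes GCC or clang (for `__builtin_ctz`). Like typical libc string routines it reads whole aligned 16-byte blocks, which may touch bytes outside the string but never crosses into another page; that over-read is not strictly conforming ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical illustration of the pcmpeqb/pmovmskb technique. */
static size_t strlen_sse2_sketch(const char *s)
{
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    size_t misalign = (uintptr_t)s & 15;
    /* Round down to a 16-byte boundary: an aligned 16-byte load never
     * crosses a page boundary, so it cannot fault past the terminator. */
    const __m128i *p = (const __m128i *)((uintptr_t)s & ~(uintptr_t)15);

    /* pcmpeqb sets a byte lane to 0xff where the byte is NUL; pmovmskb
     * packs the top bit of each lane into a 16-bit mask. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(p), zero));
    /* Drop matches in the bytes that precede s in this block. */
    mask >>= misalign;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    for (;;) {
        p++;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(p), zero));
        if (mask != 0)
            return (size_t)((const char *)p - s) + (size_t)__builtin_ctz(mask);
    }
#else
    /* Byte-loop fallback so the sketch still builds on non-SSE2 targets. */
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
#endif
}
```

Starting from the aligned block that contains `s` and shifting the mask by the misalignment handles the unaligned head without a separate byte loop.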

Note: clang treats strlen() as a builtin and optimized the plain strlen calls away, so I had to create a copy of the C version called strlen_mi() that it would not recognize.
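A sketch of that workaround, assuming the benchmark calls the MI code under the name `strlen_mi()`: because clang treats `strlen` as a builtin, it can fold a call whose argument it can see into a constant, or drop a call whose result is unused. A copy under another name, kept out of line and with its result stored somewhere the compiler must honor, avoids both. The byte-loop body and the helpers `run_once()` and `sink` are hypothetical, not the libc source.

```c
#include <assert.h>
#include <stddef.h>

size_t strlen_mi(const char *s);

/* Volatile stores cannot be elided, so the result of the call under
 * test is always used. */
static volatile size_t sink;

/* Out-of-line copy under a name the compiler does not treat as a
 * builtin. Illustrative byte loop, not the libc source. */
__attribute__((noinline))
size_t strlen_mi(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical benchmark body: noinline keeps a real call here, and the
 * volatile store keeps the result live so the call is not discarded. */
void run_once(const char *s)
{
    sink = strlen_mi(s);
}
```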

Machines Tested

CPU              Speed (GHz)  Notes
Xeon X5365       3.00         2 x 4, Supermicro X7DBU
Xeon X5482       3.20         2 x 4, Supermicro X7DWN+
Xeon X5675       3.07         Westmere, 2 x 6, Supermicro X8DTU
Core i5-2520M    2.50         Sandy Bridge, 1 x 4, ThinkPad X220 (4286)
Xeon E5-2680     2.70         Romley, 2 x 8, Supermicro X9DRW
Xeon E5-2667 v2  3.30         Romley V2, 2 x 8, Supermicro X9DRW

Test Cases

Name     Description
page     aligned string, one page minus 1 character long
short    aligned string, 14 characters long
short2   aligned string, 32 characters long
short3   aligned string, 48 characters long
offset   string at a 4-byte offset from an aligned address, 126 characters long
offset2  string at a 7-byte offset from an aligned address, 95 characters long
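The test strings could be set up along these lines. A minimal sketch, assuming a 4096-byte page, that "aligned" means the string starts on a page boundary (which satisfies any smaller alignment too), and that each offset is measured from that boundary; `struct test_case`, `cases[]`, and `make_test_string()` are hypothetical names, not the benchmark's source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed page size; real code might query sysconf(_SC_PAGESIZE). */
#define TEST_PAGE_SIZE 4096

struct test_case {
    const char *name;
    size_t offset;   /* bytes past the page-aligned base */
    size_t len;      /* characters before the terminating NUL */
};

static const struct test_case cases[] = {
    { "page",    0, TEST_PAGE_SIZE - 1 },  /* string + NUL fill one page */
    { "short",   0, 14 },
    { "short2",  0, 32 },
    { "short3",  0, 48 },
    { "offset",  4, 126 },
    { "offset2", 7, 95 },
};

/* Allocate a page-aligned buffer, place tc->len 'a' characters plus a
 * NUL starting tc->offset bytes in, and return the string. *base gets
 * the pointer to pass to free(). Returns NULL on allocation failure. */
static char *make_test_string(const struct test_case *tc, void **base)
{
    assert(tc->offset < TEST_PAGE_SIZE);
    size_t need = tc->offset + tc->len + 1;
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    size_t size = (need + TEST_PAGE_SIZE - 1) / TEST_PAGE_SIZE * TEST_PAGE_SIZE;
    char *buf = aligned_alloc(TEST_PAGE_SIZE, size);
    if (buf == NULL)
        return NULL;
    memset(buf, 'a', size);
    buf[tc->offset + tc->len] = '\0';
    *base = buf;
    return buf + tc->offset;
}
```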

Results

Each number is the minimum of a distribution of samples, where each sample is the TSC delta measured across a single invocation of the test.
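A minimal sketch of that measurement, assuming GCC or clang on x86 (`__rdtsc()` from `<x86intrin.h>`): time each invocation separately and keep the smallest delta. The names `read_stamp()`, `min_delta()`, and `sink` are hypothetical. A bare `rdtsc` without serializing instructions (such as `lfence` or `rdtscp`) is a simplification, and on non-x86 hosts the sketch falls back to `clock_gettime()` so it still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* strlen() for the usage below */
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#else
#include <time.h>
#endif

static volatile size_t sink;   /* keeps each result live */

/* TSC on x86; a nanosecond clock elsewhere (fallback only). */
static uint64_t read_stamp(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Time `iters` separate invocations of fn(s) and return the smallest
 * delta, i.e. the minimum of the sample distribution. */
static uint64_t min_delta(size_t (*fn)(const char *), const char *s, int iters)
{
    assert(iters > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = read_stamp();
        sink = fn(s);
        uint64_t t1 = read_stamp();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

For example, `min_delta(strlen, str, 1000)` times the stock routine. The minimum is a common choice because outside interference (interrupts, cache misses) mostly adds time to a sample, so the smallest delta is closest to the routine's own cost.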

Values are grouped by test case. "--" marks variants not measured on that CPU (neither the X5365 nor the X5482 supports SSE4.2 or AVX).

Test       | page                        | short                       | short2                      | short3                      | offset                      | offset2
CPU        | stock SSE2 SSE4.2 AVX  ERMS | stock SSE2 SSE4.2 AVX  ERMS | stock SSE2 SSE4.2 AVX  ERMS | stock SSE2 SSE4.2 AVX  ERMS | stock SSE2 SSE4.2 AVX  ERMS | stock SSE2 SSE4.2 AVX  ERMS
Xeon X5365 |  1386  864     --  -- 16506 |    81   72     --  --   180 |    90   72     --  --   252 |    90   81     --  --   315 |   144  108     --  --   630 |   135  108     --  --   504
Xeon X5482 |  1608  808     --  -- 16464 |    48   40     --  --   136 |    48   40     --  --   140 |    56   40     --  --   264 |    80   80     --  --   592 |    72   72     --  --   464