= strlen =

== Variants ==

||= '''Name''' =||= '''Description''' =||
|| stock || Machine-independent (MI) C version ||
|| SSE2 || {{{pcmpeqb}}} and {{{pmovmskb}}} ||
|| SSE4.2 || {{{pcmpestri}}} and {{{pcmpestrm}}} ||
|| AVX || 128-bit {{{vpcmpeqb}}} and {{{vpmovmskb}}} ||
|| ERMS || {{{repne scasb}}}, intended for machines with ERMS (Enhanced REP MOVSB/STOSB) ||

'''Note:''' clang treats {{{strlen}}} as a builtin and optimized plain {{{strlen}}} calls away, so I had to create a copy of the C version called {{{strlen_mi()}}} to keep the calls in the benchmark.

== Machines Tested ==

||= '''CPU''' =||= '''Speed (GHz)''' =||= '''Notes''' =||
|| AMD FX-8120 || 3.11 || 1 x 8, zoo.freebsd.org ||
|| Intel Xeon X5365 || 3.00 || 2 x 4, Supermicro X7DBU ||
|| Intel Xeon X5482 || 3.20 || 2 x 4, Supermicro X7DWN+ ||
|| Intel Xeon X5675 || 3.07 || Westmere, 2 x 6, Supermicro X8DTU ||
|| Intel Core i5-2520M || 2.50 || Sandy Bridge, 1 x 4, ThinkPad X220 (4286) ||
|| Intel Core i5-2500K || 3.30 || Sandy Bridge, 1 x 4, MSI Z77A-G45 (MS-7752) ||
|| Intel Xeon E5-2680 || 2.70 || Romley, 2 x 8, Supermicro X9DRW ||
|| Intel Xeon E5-2667 v2 || 3.30 || Romley V2, 2 x 8, Supermicro X9DRW (supports ERMS) ||

== Test Cases ==

||= '''Name''' =||= '''Description''' =||
|| page || aligned string, one page minus one character long ||
|| short || aligned string, 14 characters long ||
|| short2 || aligned string, 32 characters long ||
|| short3 || aligned string, 48 characters long ||
|| offset || string at a 4-byte offset, 126 characters long ||
|| offset2 || string at a 7-byte offset, 95 characters long ||

== Results ==

Each number is the minimum of the distribution of TSC deltas, where each delta is measured across a single invocation of the test. '''Bold''' marks the lowest time among the variants for a given test and CPU. Green marks times faster than the stock implementation, and red marks times slower than it. {{{--}}} means the CPU lacks the instruction set extension the variant requires.
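The measurement described above can be sketched as follows. This is a minimal illustration, not the harness used for these results: it assumes x86 with GCC or Clang (for {{{__rdtsc()}}} from {{{<x86intrin.h>}}} and the empty {{{asm}}} compiler barrier), and both {{{bench_min_tsc()}}} and the byte-at-a-time body of {{{strlen_mi()}}} are hypothetical stand-ins. The volatile sink serves the same purpose as the clang note: it gives each timed call an observable effect so the compiler cannot discard it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); GCC/Clang on x86 */

/* Hypothetical, simplified byte-at-a-time stand-in for the MI C strlen.
 * A call to this function is less likely to be folded or dropped than a
 * plain strlen() call, which the compiler treats as a builtin. */
size_t strlen_mi(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Every result is stored here so the timed call has an observable effect. */
static volatile size_t sink;

/* Hypothetical helper: run fn(s) `iters` times, time each single
 * invocation with the TSC, and return the smallest delta. */
uint64_t bench_min_tsc(size_t (*fn)(const char *), const char *s, int iters)
{
    assert(iters > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iters; i++) {
        uint64_t t0 = __rdtsc();
        __asm__ volatile("" ::: "memory");  /* compiler-only barrier */
        sink = fn(s);
        __asm__ volatile("" ::: "memory");
        uint64_t t1 = __rdtsc();
        /* rdtsc is not a serializing instruction; this sketch ignores that
         * and relies on the minimum over many runs to discard noisy samples
         * (interrupts, cache misses). */
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

For an "offset"-style case, one would fill a 16-byte-aligned buffer, place the terminator 126 bytes after {{{buf + 4}}}, and call {{{bench_min_tsc(strlen_mi, buf + 4, 100000)}}}.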
{{{#!th rowspan=3
'''CPU'''
}}}
{{{#!th colspan=30
'''Test / Variant'''
}}}
|--
{{{#!th colspan=5
'''page'''
}}}
{{{#!th colspan=5
'''short'''
}}}
{{{#!th colspan=5
'''short2'''
}}}
{{{#!th colspan=5
'''short3'''
}}}
{{{#!th colspan=5
'''offset'''
}}}
{{{#!th colspan=5
'''offset2'''
}}}
|--
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =|| \
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =|| \
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =|| \
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =|| \
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =|| \
||= '''stock''' =||= '''SSE2''' =||= '''SSE4.2''' =||= '''AVX''' =||= '''ERMS''' =||
|| Intel Xeon X5365 || \
|| 1386|| '''[[span(864, style=color: green)]]'''|| -- || -- || [[span(16506, style=color:red)]]|| \
|| 81|| '''[[span(72, style=color: green)]]'''|| -- || -- || [[span(180, style=color:red)]]|| \
|| 90|| '''[[span(72, style=color: green)]]'''|| -- || -- || [[span(252, style=color:red)]]|| \
|| 90|| '''[[span(81, style=color: green)]]'''|| -- || -- || [[span(315, style=color:red)]]|| \
|| 144|| '''[[span(108, style=color: green)]]'''|| -- || -- || [[span(630, style=color:red)]]|| \
|| 135|| '''[[span(108, style=color: green)]]'''|| -- || -- || [[span(504, style=color:red)]]||
|| Intel Xeon X5482 || \
|| 1608|| '''[[span(808, style=color: green)]]'''|| -- || -- || [[span(16464, style=color:red)]]|| \
|| 48|| '''[[span(40, style=color: green)]]'''|| -- || -- || [[span(136, style=color:red)]]|| \
|| 48|| '''[[span(40, style=color: green)]]'''|| -- || -- || [[span(140, style=color:red)]]|| \
|| 56|| '''[[span(40, style=color: green)]]'''|| -- || -- || [[span(264, style=color:red)]]|| \
|| '''80'''|| '''80'''|| -- || -- || [[span(592, style=color:red)]]|| \
|| '''72'''|| '''72'''|| -- || -- || [[span(464, style=color:red)]]||
|| Intel Xeon X5675 || \
|| 1592|| '''[[span(848, style=color: green)]]'''|| [[span(2100, style=color:red)]]|| -- || [[span(8252, style=color:red)]]|| \
|| 60|| '''[[span(24, style=color: green)]]'''|| '''[[span(24, style=color: green)]]'''|| -- || [[span(92, style=color:red)]]|| \
|| '''32'''|| [[span(46, style=color:red)]]|| '''32'''|| -- || [[span(124, style=color:red)]]|| \
|| 78|| '''[[span(46, style=color: green)]]'''|| [[span(83, style=color:red)]]|| -- || [[span(156, style=color:red)]]|| \
|| 76|| '''[[span(64, style=color: green)]]'''|| [[span(104, style=color:red)]]|| -- || [[span(316, style=color:red)]]|| \
|| 64|| '''[[span(56, style=color: green)]]'''|| [[span(88, style=color:red)]]|| -- || [[span(256, style=color:red)]]||
|| Intel Core i5-2520M || \
|| 3525|| [[span(1950, style=color: green)]]|| [[span(6463, style=color:red)]]|| '''[[span(536, style=color: green)]]'''|| [[span(25812, style=color:red)]]|| \
|| 100|| [[span(75, style=color: green)]]|| [[span(75, style=color: green)]]|| '''[[span(18, style=color: green)]]'''|| [[span(300, style=color:red)]]|| \
|| 112|| [[span(75, style=color: green)]]|| [[span(87, style=color: green)]]|| '''[[span(18, style=color: green)]]'''|| [[span(412, style=color:red)]]|| \
|| 112|| [[span(75, style=color: green)]]|| 112|| '''[[span(21, style=color: green)]]'''|| [[span(512, style=color:red)]]|| \
|| 212|| [[span(100, style=color: green)]]|| [[span(262, style=color:red)]]|| '''[[span(21, style=color: green)]]'''|| [[span(1012, style=color:red)]]|| \
|| 175|| [[span(100, style=color: green)]]|| [[span(212, style=color:red)]]|| '''[[span(24, style=color: green)]]'''|| [[span(825, style=color:red)]]||
|| Intel Core i5-2500K || \
|| 1002|| '''[[span(552, style=color: green)]]'''|| [[span(1893, style=color:red)]]|| [[span(573, style=color: green)]]|| [[span(7350, style=color:red)]]|| \
|| 21|| '''[[span(18, style=color: green)]]'''|| '''[[span(18, style=color: green)]]'''|| '''[[span(18, style=color: green)]]'''|| [[span(87, style=color:red)]]|| \
|| 33|| '''[[span(18, style=color: green)]]'''|| '''[[span(18, style=color: green)]]'''|| '''[[span(18, style=color: green)]]'''|| [[span(117, style=color:red)]]|| \
|| 36|| '''[[span(21, style=color: green)]]'''|| [[span(27, style=color: green)]]|| '''[[span(21, style=color: green)]]'''|| [[span(147, style=color:red)]]|| \
|| 57|| [[span(24, style=color: green)]]|| [[span(69, style=color:red)]]|| '''[[span(21, style=color: green)]]'''|| [[span(297, style=color:red)]]|| \
|| 45|| '''[[span(21, style=color: green)]]'''|| [[span(54, style=color:red)]]|| [[span(24, style=color: green)]]|| [[span(225, style=color:red)]]||
|| Intel Xeon E5-2680 || \
|| 1496|| '''[[span(632, style=color: green)]]'''|| [[span(2048, style=color:red)]]|| [[span(644, style=color: green)]]|| [[span(8260, style=color:red)]]|| \
|| 36|| '''[[span(24, style=color: green)]]'''|| '''[[span(24, style=color: green)]]'''|| [[span(28, style=color: green)]]|| [[span(96, style=color:red)]]|| \
|| 40|| '''[[span(24, style=color: green)]]'''|| [[span(28, style=color: green)]]|| '''[[span(24, style=color: green)]]'''|| [[span(132, style=color:red)]]|| \
|| 52|| '''[[span(24, style=color: green)]]'''|| [[span(32, style=color: green)]]|| '''[[span(24, style=color: green)]]'''|| [[span(164, style=color:red)]]|| \
|| 80|| [[span(40, style=color: green)]]|| [[span(84, style=color:red)]]|| '''[[span(24, style=color: green)]]'''|| [[span(324, style=color:red)]]|| \
|| 68|| [[span(36, style=color: green)]]|| [[span(64, style=color: green)]]|| '''[[span(24, style=color: green)]]'''|| [[span(264, style=color:red)]]||
|| Intel Xeon E5-2667 v2 || \
|| 1296|| '''[[span(632, style=color: green)]]'''|| [[span(2076, style=color:red)]]|| [[span(648, style=color: green)]]|| [[span(8260, style=color:red)]]|| \
|| '''24'''|| '''24'''|| '''24'''|| '''24'''|| [[span(100, style=color:red)]]|| \
|| 36|| '''[[span(24, style=color: green)]]'''|| [[span(28, style=color: green)]]|| '''[[span(24, style=color: green)]]'''|| [[span(132, style=color:red)]]|| \
|| 48|| '''[[span(24, style=color: green)]]'''|| [[span(28, style=color: green)]]|| [[span(28, style=color: green)]]|| [[span(164, style=color:red)]]|| \
|| 80|| [[span(44, style=color: green)]]|| [[span(84, style=color:red)]]|| '''[[span(24, style=color: green)]]'''|| [[span(324, style=color:red)]]|| \
|| 64|| [[span(28, style=color: green)]]|| 64|| '''[[span(24, style=color: green)]]'''|| [[span(264, style=color:red)]]||

== Conclusions ==

 - The SSE2 version is faster than the stock version on almost every test and CPU: it ties on three combinations and is slower only on short2 on the Xeon X5675.
 - The SSE4.2 version is generally slower than the SSE2 version, and on the page and offset tests it is slower than even the stock version on every CPU that supports it.
 - The AVX version often outperforms the SSE2 version, most clearly on the offset test, where it is the fastest variant on every CPU that supports it; on the page test SSE2 is slightly faster except on the Core i5-2520M.
 - ERMS does not appear to accelerate {{{repne scasb}}}: the ERMS variant is the slowest on every test and CPU, including the Xeon E5-2667 v2, which supports ERMS.
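The SSE2 variant's instruction pair ({{{pcmpeqb}}} plus {{{pmovmskb}}}) can be illustrated with intrinsics. This is a hedged sketch, not the benchmarked code: {{{strlen_sse2_sketch()}}} is a hypothetical name, {{{__builtin_ctz()}}} assumes GCC or Clang, and {{{_mm_cmpeq_epi8()}}} / {{{_mm_movemask_epi8()}}} are the intrinsics for {{{pcmpeqb}}} / {{{pmovmskb}}}. Every load is 16-byte aligned, so no load crosses into a page that holds none of the string; the reads before {{{s}}} and past the terminator are nonetheless outside what ISO C guarantees, which is one reason such routines are usually written in assembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8 (pcmpeqb), _mm_movemask_epi8 (pmovmskb) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an SSE2 strlen; not the benchmarked code. */
size_t strlen_sse2_sketch(const char *s)
{
    assert(s != NULL);
    const __m128i zero = _mm_setzero_si128();

    /* Round s down to its 16-byte block.  Aligned 16-byte loads never
     * straddle a page boundary, so every load touches a page that also
     * holds part of the string. */
    uintptr_t misalign = (uintptr_t)s & 15;
    const __m128i *p = (const __m128i *)((uintptr_t)s - misalign);

    /* pcmpeqb sets a lane to 0xFF where the byte is NUL; pmovmskb packs
     * the lane sign bits into a 16-bit mask, bit i for byte i. */
    unsigned mask = (unsigned)_mm_movemask_epi8(
        _mm_cmpeq_epi8(_mm_load_si128(p), zero));

    /* Drop the bits for bytes that precede s in the first block. */
    mask >>= misalign;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Scan whole aligned blocks until one contains a NUL. */
    for (;;) {
        p++;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(p), zero));
        if (mask != 0)
            return (size_t)((const char *)p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

Built with {{{-mavx}}}, GCC and Clang emit the VEX-encoded 128-bit {{{vpcmpeqb}}} / {{{vpmovmskb}}} for these same intrinsics, which is the instruction pair listed for the AVX variant; whether the benchmarked AVX code differs in other ways (unrolling, alignment handling) is not stated here.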
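The ERMS variant is built on {{{repne scasb}}}. A minimal sketch of that instruction used as a strlen, assuming x86 or x86-64 and GCC/Clang extended inline assembly; {{{strlen_scasb_sketch()}}} is a hypothetical name, not the benchmarked routine. Intel documents ERMS (Enhanced REP MOVSB/STOSB) for {{{rep movsb}}} and {{{rep stosb}}} only, which is consistent with the variant showing no speedup for {{{scasb}}}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of strlen via repne scasb (GCC/Clang inline asm).
 * RCX starts at -1 and is decremented once per byte examined, including
 * the terminating NUL, so on exit ~RCX == length + 1.  The x86 ABIs
 * require the direction flag to be clear on function entry, so scasb
 * walks forward through memory. */
size_t strlen_scasb_sketch(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    size_t count = (size_t)-1;
    __asm__ volatile("repne scasb"
                     : "+D"(p), "+c"(count)  /* RDI: cursor, RCX: counter */
                     : "a"(0)                /* AL = 0: byte to search for */
                     : "cc", "memory");
    return ~count - 1;
}
```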