{"id":1249,"date":"2011-09-19T13:57:51","date_gmt":"2011-09-19T17:57:51","guid":{"rendered":"http:\/\/fgiasson.com\/blog\/?p=1249"},"modified":"2011-09-19T13:57:51","modified_gmt":"2011-09-19T17:57:51","slug":"benchmark-of-phps-main-string-search-functions","status":"publish","type":"post","link":"https:\/\/fgiasson.com\/blog\/index.php\/2011\/09\/19\/benchmark-of-phps-main-string-search-functions\/","title":{"rendered":"Benchmark of PHP&#8217;s main String Search Functions"},"content":{"rendered":"<table>\n<tbody>\n<tr>\n<td>I am currently upgrading the <a title=\"structWSF\" href=\"http:\/\/openstructs.org\/structwsf\">structWSF<\/a> ontology-related web service endpoints, along with the structOntology <a title=\"conStruct\" href=\"http:\/\/drupal.org\/project\/construct\">conStruct<\/a> module, to make them perform better so that we can load ontologies that have thousands of classes and properties (at least up to 30 000 of them).<\/td>\n<td valign=\"top\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-1367 shadow_curl\" title=\"benchmark\" src=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/benchmark-300x229.jpg\" alt=\"\" width=\"180\" height=\"137\" srcset=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/benchmark-300x229.jpg 300w, https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/benchmark.jpg 604w\" sizes=\"auto, (max-width: 180px) 100vw, 180px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>While testing these upgrades with the <a title=\"UMBEL Ontology\" href=\"http:\/\/umbel.org\">UMBEL ontology<\/a>, I noticed that much of the time was spent in a small number of <code>stripos()<\/code> calls located in the <code>loadXML()<\/code> function of the <a href=\"http:\/\/code.google.com\/p\/structwsf\/source\/browse\/branches\/dev\/framework\/ProcessorXML.php\"><code>ProcessorXML.php<\/code><\/a> internal <a title=\"structXML\" 
href=\"http:\/\/techwiki.openstructs.org\/index.php\/StructXML\">structXML<\/a> parser. They were used to extract the prefixes declared in the header of the structXML files, and then to resolve them in the XML file. I was using <code>stripos()<\/code> instead of <code>strpos()<\/code> to make the parsing of these structXML files case-insensitive, even though XML itself is case-sensitive. However, because of their processing cost, I changed this behavior to use the <code>strpos()<\/code> function instead. Here are the main reasons for this change:<\/p>\n<ul>\n<li>XML is itself case-sensitive, so don&#8217;t try to be too clever<\/li>\n<li>The structXML files that are exchanged are mostly internal to structXML<\/li>\n<li>Their parsing performance is critical<\/li>\n<\/ul>\n<h3>The Tests<\/h3>\n<p>This is a non-scientific post about some experiments I ran with the various PHP 5.3 string search functions. These tests were performed on a small Amazon EC2 instance using <a href=\"http:\/\/www.php-debugger.com\/dbg\/\">DBG<\/a> and <a title=\"PhpED\" href=\"http:\/\/www.nusphere.com\/products\/phped.htm\">PhpED<\/a>.<\/p>\n<p>[cc lang=&#8217;php&#8217; line_numbers=&#8217;true&#8217;]<br \/>\n[raw]<br \/>\n<?php\n  \n$text = \"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce malesuada aliquet pharetra. Nunc tincidunt tempus eleifend. Cras aliquet risus eget tortor elementum at molestie erat auctor. Sed sapien nulla, auctor a aliquam in, ornare eget enim. Ut ac luctus nunc. Etiam et tortor felis, sed fringilla orci. Fusce laoreet ligula turpis, quis sodales enim. Pellentesque at sapien ut dolor malesuada placerat eu ac quam. Pellentesque purus elit, sodales in fringilla eu, egestas vitae ipsum. Nam condimentum, nisi ac tincidunt luctus, odio erat porta turpis, eget varius felis leo sit amet lorem. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas quis pulvinar dui. Integer quis eros nibh. 
Donec in lectus vitae ligula euismod vulputate ut euismod enim. Ut vehicula, sapien at faucibus ornare, nulla lorem luctus purus, sed imperdiet augue purus quis enim.\";\n$explodedText = explode(\" \", $text);\n  \nfor($i = 0; $i < 10000; $i++)\n{\n  // array_rand() returns a random key, so use it to look up the word itself\n  $word = $explodedText[array_rand($explodedText)];\n    \n  strpos($text, $word);\n  stripos($text, $word);\n  strstr($text, $word);\n  stristr($text, $word);\n}\n\n?><br \/>\n[\/raw]<br \/>\n[\/cc]<\/p>\n<p>The first test uses a text of 138 words. That text is exploded into an array where each value is a word of that text. Then, at each iteration, we randomly select a word and search for it within the text using each of the four search functions.<\/p>\n<blockquote><p><em>Note that in the result images below, each line in the left-most column corresponds to a line of the PHP code above.<\/em><\/p><\/blockquote>\n<p>The first test runs 10 000 iterations. Here are the results of the first run:<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test1.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1275\" title=\"PHP string search test #1\" src=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test1.gif\" alt=\"\" width=\"450\" height=\"68\" srcset=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test1.gif 876w, https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test1-300x45.gif 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/p>\n<p>The second test uses the same 138 words, but the test is performed 100 000 times:<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test2.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1285\" title=\"test2\" 
src=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test2.gif\" alt=\"\" width=\"450\" height=\"68\" \/><\/a><\/p>\n<p>As we can see, <code>strpos()<\/code> and <code>strstr()<\/code> are clearly faster than their case-insensitive counterparts.<\/p>\n<p>Now, let&#8217;s see how the size of the searched text affects the results. We will repeat the two tests, with 10 000 and 100 000 iterations, on a text of 497 words.<\/p>\n<p>[cc lang=&#8217;php&#8217; line_numbers=&#8217;true&#8217;]<br \/>\n[raw]<br \/>\n<?php\n  \n$longText = \"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce malesuada aliquet pharetra. Nunc tincidunt tempus eleifend. Cras aliquet risus eget tortor elementum at molestie erat auctor. Sed sapien nulla, auctor a aliquam in, ornare eget enim. Ut ac luctus nunc. Etiam et tortor felis, sed fringilla orci. Fusce laoreet ligula turpis, quis sodales enim. Pellentesque at sapien ut dolor malesuada placerat eu ac quam. Pellentesque purus elit, sodales in fringilla eu, egestas vitae ipsum. Nam condimentum, nisi ac tincidunt luctus, odio erat porta turpis, eget varius felis leo sit amet lorem. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Maecenas quis pulvinar dui. Integer quis eros nibh. Donec in lectus vitae ligula euismod vulputate ut euismod enim. Ut vehicula, sapien at faucibus ornare, nulla lorem luctus purus, sed imperdiet augue purus quis enim. Nunc eu consectetur quam. Duis nulla sem, tincidunt vel placerat at, ultricies eu est. Vestibulum sed nulla nunc, et tristique orci. Aliquam nulla sapien, lobortis in sagittis vitae, tincidunt ut felis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut condimentum, orci venenatis mollis faucibus, purus enim euismod massa, a imperdiet sapien arcu in sapien. Nulla convallis sodales pretium. Nulla facilisi. Maecenas molestie est tortor. 
Fusce congue, leo eu tristique sodales, odio leo facilisis lectus, in euismod odio tellus ut sapien. Fusce odio orci, facilisis eu convallis et, consectetur nec mauris. Nullam nulla lacus, volutpat sit amet pulvinar quis, pulvinar eget dolor. Curabitur sit amet odio sem, at dapibus tellus. Donec nec dictum eros. Morbi convallis libero ultrices magna varius suscipit. Duis bibendum volutpat felis non fermentum. Phasellus nunc mi, ornare et vulputate sed, pellentesque sed enim. Mauris suscipit, nisl quis tempor mollis, tortor nunc varius odio, eu dictum odio mi quis sapien. Morbi placerat, erat quis mattis iaculis, urna nisi faucibus nisi, eu mattis elit mauris eu quam. Mauris euismod tincidunt ante quis interdum. Phasellus elementum libero in arcu tempus tincidunt. Praesent in nunc eget nibh porta imperdiet eget eget mauris. Morbi pellentesque dapibus lacus, rutrum sollicitudin nisi fermentum vel. Cras tempor mattis urna, sit amet semper eros varius ut. Fusce erat elit, tempus non commodo et, egestas sit amet odio. Suspendisse libero neque, porttitor vel volutpat eget, placerat in mi. Proin pharetra leo in ligula porttitor vestibulum. Curabitur vel mauris nec lorem sollicitudin porttitor. Sed suscipit, mauris ac sollicitudin tempus, orci velit aliquet leo, vitae ornare mi nulla a tellus. Morbi turpis justo, vestibulum ac auctor sed, vulputate nec nisl. Quisque ut ultricies orci. Sed vel dolor at felis egestas venenatis in ut elit. Nam quis neque sem. Morbi turpis magna, porttitor vulputate dignissim commodo, auctor eu nibh. Ut at nisl tortor. Quisque cursus interdum mi ut molestie. Vivamus nec ipsum ipsum. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Sed quis ipsum erat, quis dignissim nunc. Sed eu diam dapibus tortor fermentum dignissim. Phasellus ac turpis nisl, dictum consequat elit. Suspendisse at turpis quis eros pharetra imperdiet. Mauris ut nisl augue. 
\";\n// trim() drops the trailing space so explode() does not yield an empty last word\n$explodedLongText = explode(\" \", trim($longText));\n  \nfor($i = 0; $i < 500000; $i++)\n{\n  // array_rand() returns a random key, so use it to look up the word itself\n  $word = $explodedLongText[array_rand($explodedLongText)];\n    \n  strpos($longText, $word);\n  stripos($longText, $word);\n  strstr($longText, $word);\n  stristr($longText, $word);\n}\n\n?><br \/>\n[\/raw]<br \/>\n[\/cc]<\/p>\n<p>The third test runs 10 000 iterations. Here are the results of the third run:<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test3.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1286\" title=\"test3\" src=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test3.gif\" alt=\"\" width=\"450\" height=\"68\" srcset=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test3.gif 875w, https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test3-300x45.gif 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/a><\/p>\n<p>The fourth test uses the same 497 words, but the test is performed 100 000 times:<\/p>\n<p style=\"text-align: center;\"><a href=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test4.gif\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1287\" title=\"test4\" src=\"https:\/\/fgiasson.com\/blog\/wp-content\/uploads\/2011\/09\/test4.gif\" alt=\"\" width=\"450\" height=\"68\" \/><\/a><\/p>\n<p>As we can see, even with more words, the same performance pattern holds.<\/p>\n<h3>Conclusion<\/h3>\n<p>After many runs (I only showed a few here), I think I can affirm that <code>strpos()<\/code> and <code>strstr()<\/code> are much faster than their case-insensitive counterparts. <code>strpos()<\/code> also seems a little faster than <code>strstr()<\/code>, but that seems to depend on the context and on which random words are being searched for. 
In any case, PHP&#8217;s documentation recommends using <code>strpos()<\/code> instead of <code>strstr()<\/code> when we only need to know whether a substring occurs, because it is faster and uses less memory.<\/p>\n<p>There may also be memory considerations I am not aware of that affect the code I used to test these functions. In any case, I can affirm that in a real context, where queries are sent to the Ontology: Read web service endpoint that hosts the UMBEL ontology, <code>strpos()<\/code> is much faster than <code>stripos()<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am currently upgrading the structWSF ontology-related web service endpoints, along with the structOntology conStruct module, to make them perform better so that we can load ontologies that have thousands of classes and properties (at least up to 30 000 of them). While testing these upgrades with the UMBEL ontology, I noticed that [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[155,66],"tags":[186,187,188,177,168],"class_list":["post-1249","post","type-post","status-publish","format-standard","hentry","category-osf-web-services","category-programming","tag-code","tag-optimization","tag-osf","tag-php","tag-umbel-2"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1249","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=1249"}],"v
ersion-history":[{"count":76,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1249\/revisions"}],"predecessor-version":[{"id":1383,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/1249\/revisions\/1383"}],"wp:attachment":[{"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=1249"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=1249"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fgiasson.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=1249"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}