Category Archives: php-mysql

Share mysql and apache folders from windows XAMPP to linux XAMPP

This is a replica of the article by jultech from 2007 for posterity and data-replication purposes.

When dual-booting between Windows (the first OS installed) and Linux, do this:

~ # mv /opt/lampp/var/mysql /opt/lampp/var/mysql.BACKUP
~ # ln -s /mnt/windisk/path/to/XAMPP/mysql/data/ /opt/lampp/var/mysql
~ # mv /opt/lampp/htdocs /opt/lampp/htdocs.BACKUP
~ # ln -s /mnt/windisk/path/to/XAMPP/htdocs /opt/lampp/htdocs

This creates a backup of the original XAMPP folders in Linux, and then symlinks them to the Windows disk mounts.

Restart XAMPP to have the configuration running.

Thank you jultech.

Add custom social media share icon buttons in phpBB

Self-reference article: how to add responsive social media share icons to every page of your forum.

HTML to add to /styles/stylename/template/over_header.html, right under <div id="page-header" class="page-width">

 <div id="socialshare" data-expendable="yes">
 <div>
 <a class="twittersharebutton" href="http://twitter.com/share?url={SOCIAL_URL}" target="_blank">t</a>
 <a class="facebooksharebutton" href="http://www.facebook.com/sharer/sharer.php?u={SOCIAL_URL}" target="_blank">f</a>
 <a class="googlesharebutton" href="https://plus.google.com/share?url={SOCIAL_URL}" target="_blank">g</a>
 </div>
 </div>

CSS to add at the bottom of the <head> section of the same file:

<style>
#socialshare {
    z-index: 20;
}
a.twittersharebutton, a.facebooksharebutton, a.googlesharebutton {
    color: white;
    text-align: center;
    text-decoration: none;
    font-family: tahoma;
    vertical-align: bottom;
}
a.twittersharebutton {
    background-color: #55acee;
}
a.facebooksharebutton {
    background-color: #3b5998;
}
a.googlesharebutton {
    font-family: georgia;
    background-color: #dd4b39;
    line-height: 60% !important;
}
#socialshare a:hover {
    color: white;
    filter: saturate(3);
}
@media (min-width: 1100px) {
    #socialshare {
        position: absolute;
        text-align: center;
        left: -55px;
    }
    #socialshare>div {
        width: 50px;
        position: fixed;
        top: 10px;
    }
    a.twittersharebutton, a.facebooksharebutton, a.googlesharebutton {
        display: block;
        margin-bottom: 10px;
        width: 50px;
        height: 50px;
        font-size: 50px;
        line-height: 51px;
        border-radius: 6px;
    }
    #page-header {position: relative;}
}
@media (max-width: 1099px) {
    #socialshare {
        bottom: 2px;
        left: 2px;
        position: fixed;
    }
    a.twittersharebutton, a.facebooksharebutton, a.googlesharebutton {
        display: inline-block;
        margin-right: 10px;
        width: 30px;
        height: 30px;
        font-size: 30px;
        line-height: 31px;
        border-radius: 4px;
    }
}
</style>

Code to add in /includes/functions.php, in the template variable assignment (look for LAST_VISIT_DATE)

'SOCIAL_URL' => urlencode(generate_board_url() . '/' . $user->page['page']),
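For orientation, a hedged sketch of where that line lands: phpBB 3.0's page_header() assigns its template variables in one big $template->assign_vars() call (surrounding entries abbreviated and from memory, your version may differ slightly):

// in page_header(), /includes/functions.php
$template->assign_vars(array(
    // ...existing assignments...
    'LAST_VISIT_DATE'   => $user->format_date($user->data['session_last_visit']),
    'SOCIAL_URL'        => urlencode(generate_board_url() . '/' . $user->page['page']),
    // ...existing assignments...
));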

Reload, all done!

Cache PHP to gzipped static html pages using htaccess redirect

No matter the server performance, the fastest kind of website for a visitor is one made of static HTML pages: the server just has to send existing data to the browser, instead of starting the PHP interpreter, opening a connection to MySQL, fetching data, formatting the page and only then sending it. Less CPU, less memory, less processing time, more users served in less time.

Avoiding PHP and MySQL execution altogether is the key. My project was to create something similar to WPSuperCache to be implemented in a generic PHP website, so my thanks go to that plugin's developers for the source code that gave me precious tips.

DISCLAIMER: this guide presumes you are "fluent" with PHP and .htaccess coding, and is only meant to give directions as to how to obtain a certain result. It will not give a pre-cooked solution to just copy/paste into your website; you need to change/add code to adapt it to your needs. I am not the person who can help you if you need a tailored solution, simply because I have no time to give away free personalized help!

Here’s our logical scheme:

  1. Visitor comes to website and asks for page X;
  2. Webserver checks via .htaccess (avoiding PHP execution) if there is a static cached version of page X already, and in this case either serves the gzipped file (if the client supports it), or falls back to the uncompressed HTML version… at which point our scheme ends; otherwise, if no cached version exists, continue to next step;
  3. PHP builds the page and serves it to the visitor as “fresh”, but at the same time saves the HTML output as both gzipped and uncompressed version so the next visitor will directly download that one.

This is a very bare procedure, which needs to be refined with the following conditions:

  • Not all pages need to be cached (those that by their very nature change very often: real-time statistics, a poll, whatever)… in fact, some pages must NOT be cached (those that only moderators can view, both for security reasons and consistency); PHP must avoid caching those pages, so that .htaccess will always serve the "fresh" PHP page.
  • Pages that do get cached still need to be refreshed, because sometimes they can change (articles or posts that are modified/updated with time, or where comments get posted).
  • Some pages, even if a corresponding cached version exists, must NOT be served from cache, for instance when POST data is submitted (sometimes also with GET; you as the webmaster should know if this is the case), which requires PHP to handle the request.

Define “caching behaviour” in PHP

This section is where you set the variables that the PHP caching routine will check against to know when and how to do its job.

$cachepath=$_SERVER["DOCUMENT_ROOT"]."/cache/";
$cachesuffix="-static";
$nocachelist=array(
    "search"=>1,
    "statistics"=>1,
    "secretcodes"=>1,
    "..."=>1,
);

And explained:

  • $cachepath is the folder where you want the static (HTML and gzip) files to be stored. (The website I use this caching routine on has all its pages accessible from the root folder thanks to rewrite URLs, in other words the only slash in the URL is the one after the domain name, so I have no need to reproduce any folder structure; if you do, you're on your own.)
  • $cachesuffix is a string appended to the URL string (for example, if the address is domainname.com/pagename then the cache file will be named pagename-static); this is useful if you want to cache the homepage, which is not named index.php but is just domainname.com/, because in that case the cache file name would otherwise be empty and .htaccess wouldn't find it (see the concrete example after this list);
  • $nocachelist is an associative array where you add as many keys (each pointing to a value of 1) as the pages you don't want (for whatever reason) to cache; as key name, put the string the user would write in the URL bar after the domain name slash to get to the page. For example, if you don't want to cache domainname.com/statistics you would put "statistics"=>1 in there, as already shown.
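To make the naming concrete, here is what the two cache files for domainname.com/pagename end up being called with the settings above; these are exactly the names the .htaccess rules further down will look for:

//resulting cache file names for domainname.com/pagename
$file =$cachepath."pagename".$cachesuffix.".html";    // .../htdocs/cache/pagename-static.html
$filez=$cachepath."pagename".$cachesuffix.".html.gz"; // .../htdocs/cache/pagename-static.html.gz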

Have PHP actually save the cached pages to disk

if (
    !isset($nocachelist[$_GET["page"]]) &&
    empty($_SESSION["admin"]) &&
    !count($_POST) &&
    !$_SERVER["QUERY_STRING"]
) {
    //build the uncompressed cache
    if (!file_exists($cachepath.$url.$cachesuffix.".html")) {
        $cached=str_replace("~creationtime","cached",$html);
        $fp=fopen($cachepath.$url.$cachesuffix.".html","w");
        fwrite($fp,$cached);
        fclose($fp);
    }
    //build the compressed cache
    if (!file_exists($cachepath.$url.$cachesuffix.".html.gz")) {
        $cachedz=str_replace("~creationtime","cached&amp;gzipped",$html);
        $cachedz=gzencode($cachedz,9);
        $fp=fopen($cachepath.$url.$cachesuffix.".html.gz","w");
        fwrite($fp,$cachedz);
        fclose($fp);
    }
}

Pretty much self-explanatory, isn't it?
No seriously, do you really want me to elaborate?

Ok, you got it.

  • $_GET["page"] is a GET value I set early in the code to know where we are in the website; you can use any variable here, as long as you can check it against the $nocachelist array;
  • the other conditions in the first if should be clear: they avoid building the page's cache if the CMS's admin is logged in (security), or if POST data was submitted, or if there is a query string appended to the URL (consistency/stability);
  • $url is a variable I define early in the code, containing the string after the domain name slash and before the query string question mark, basically the kind of string you fill the $nocachelist array with (if you were paying attention, you may now think I have a redundant variable, since $url and $_GET["page"] should be the same, but this is not the case for other reasons);
  • $html is the string variable that, across the whole CMS, collects the raw HTML code to echo at the end of the PHP execution; you can either do like I do and build such a string, or use an output buffer to obtain the HTML if you instead print it directly during PHP execution (see the sketch after this list);
  • ~creationtime is a "hotkey" I use in my template to plug in the time in seconds that was needed to create the page in PHP; since I am creating a cached version here, the creation time of the page is zero, because it's already there to be downloaded by the browser instead of having to be built by the server, so in its place I print either "cached&gzipped" for clients that support gzip, or just "cached" when the browser doesn't; you can safely strip this part out, as it's more of an eyecandy/nerdy/debug thing.
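Since the last bullets lean on variables you have to define yourself, here is a hedged sketch of both; the exact derivation of $url depends on your rewrite scheme:

//the string between the domain name slash and the query string "?"
$url=trim(strtok($_SERVER["REQUEST_URI"],"?"),"/"); // "pagename" for domainname.com/pagename?foo=bar

//if your CMS prints HTML directly instead of collecting it in a string,
//capture the output in $html with a buffer:
ob_start();
// ... CMS builds and echoes the page ...
$html=ob_get_contents(); // raw HTML for the caching block above
ob_end_flush();          // the visitor still receives the page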

Let .htaccess send the cached files before starting the PHP interpreter

AddEncoding x-gzip .gz
<FilesMatch "\.html\.gz$">
    ForceType text/html
</FilesMatch>

#GZIP CMS
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{REQUEST_URI} !/forum/
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP:Cookie} !^.*(isadmin|nocache).*$
RewriteCond %{HTTP:X-Wap-Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond /full/path/to/your/htdocs/cache/$1-static.html.gz -f
RewriteRule ^(.*) /cache/$1-static.html.gz [L,T=text/html]

#UNCOMPRESSED CMS
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{REQUEST_URI} !/forum/
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP:Cookie} !^.*(isadmin|nocache).*$
RewriteCond %{HTTP:X-Wap-Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Profile} !^[a-z0-9\"]+ [NC]
RewriteCond /full/path/to/your/htdocs/cache/$1-static.html -f
RewriteRule ^(.*) /cache/$1-static.html [L]

This is the code you need to plug into .htaccess, preferably after everything else but before defining the custom error pages; anyway, since you should be .htaccess-fluent, you shouldn't need to be told where this fits best.

Some detailing for the curious:

  • The first bit is needed (at least I needed it) to serve the gzipped files in a way the browser knows how to handle, otherwise I just got gibberish (the gzip was being sent to output without being uncompressed by the browser first);
  • if the cookies "isadmin" or "nocache" exist, the cached version of the page will not be served even if it exists; the explanation is easy: if an admin is logged in, and a page has special content that only admins can see, you don't want them to be served the "vanilla" cached version instead; so it's your duty to set an "isadmin" cookie when an admin logs in, and remove it when the admin logs out (see the sketch after this list);
  • choosing the correct full path on the webserver was a bit tricky in my case; I can't quite remember where the issue was, but depending on the method I used to get it I had different paths; I seem to remember I had to choose between $_SERVER["DOCUMENT_ROOT"] and dirname(__FILE__), because only one was working with .htaccess;
  • you don't really have to change much in this code snippet unless you have particular needs; the /forum/ path exclusion could be irrelevant in your case;
  • thanks go to the WPSuperCache developers for their .htaccess code that I stole to build this snippet!
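As for the "isadmin" cookie duty mentioned in the list, the minimal version is just a pair of setcookie() calls (a sketch; tune path and lifetime to your setup):

setcookie("isadmin","1",0,"/");          // on admin login: session cookie, site-wide
setcookie("isadmin","",time()-3600,"/"); // on admin logout: expire it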

Last but not least

The cache is there to help you, not to handicap your website. You choose what the cache performance should be, and how many times a cached copy should get served instead of the dynamic page before considering it paid off; but you have to clear the cache from time to time, to make sure you're not serving outdated stuff to your visitors.

In my case I use an external cronjob provider (setcronjob.com) to trigger a PHP routine every night which includes the following:

$handle=opendir("cache");
while (($file=readdir($handle))!==false)
    if ($file!="." && $file!="..") @unlink("cache/".$file); //skip the dot entries
closedir($handle);

so that every day the website starts off with a fresh cache. No less important, you should clear the cache of a single page as soon as you know that page changed, unless you're OK with the changes becoming visible only the next day, after all the cache is cleared anyway. Example: you edit a page while logged in as admin, or a user posts a comment, or anything happens under your control that alters the page's HTML in any way: simply use unlink() to delete both the gzipped and uncompressed caches, and the website will recreate them with the updated content, as in the helper sketched below.
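A minimal helper along those lines (the function name is mine), reusing the $cachepath and $cachesuffix settings from earlier:

function clear_page_cache($url) {
    global $cachepath,$cachesuffix;
    @unlink($cachepath.$url.$cachesuffix.".html");    // uncompressed copy
    @unlink($cachepath.$url.$cachesuffix.".html.gz"); // gzipped copy
}

//e.g. right after a comment is saved on domainname.com/somearticle:
clear_page_cache("somearticle");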


Have fun with pimping up this draft!

Zip MySQL database backup with PHP and send as email attachment

I was a happy user of MySQLDumper until a while ago, when my shared hosting upgraded to a cloud architecture and the latest PHP version; with that, some Perl modules went missing and I was left without a viable backup solution (MySQLDumper's PHP approach uses a JavaScript reloading routine which is super slow and compatible with neither wget nor remote cronjobs like SetCronJob.com).

So I found a couple of tutorials on the net (this, and this), mixed them together, especially improving David Walsh's routine, blended them with some healthy bzip2 compression, and shipped it to my webspace.

I stripped the part where you can select which tables to back up; I figured that any backup deserving the name should be "all-inclusive"; also, I think anyone with the special need to back up only selected tables should be smart enough to change the code to suit his/her needs.

UPDATE 9/9/11: I just added a snippet of code to blacklist some tables from the backup, since I needed to trim the size, or it gave a fatal error about the available memory being exhausted; just fill the array with table names as you see fit.

This is the result:

<?php
$creationstart=strtok(microtime()," ")+strtok(" ");

$dbhost="your.sql.host";
$dbname="yourdbname";
$dbuser="yourmysqlusername";
$dbpass="yourmysqlpassword";

$mailto="mail.me@mymailbox.com";
$subject="Backup DB";
$from_name="Your trustworthy website";
$from_mail="noreply@yourwebsite.com";
$replyto=$from_mail; //was undefined in the original draft
$message="Database backup attached."; //mail body text, was undefined in the original draft

mysql_connect($dbhost, $dbuser, $dbpass);
mysql_select_db($dbname);

//tables you do NOT want in the backup
$tablesblocklist=array(
    "tablename1"=>1,
    "tablename2"=>1,
    "tablename3"=>1,
);

$tables=array();
$result=mysql_query("SHOW TABLES");
while ($row=mysql_fetch_row($result))
    $tables[]=$row[0];

$return="";
foreach ($tables as $table) {
    if (!isset($tablesblocklist[$table])) {
        $result=mysql_query("SELECT * FROM $table");
        $return.="DROP TABLE IF EXISTS $table;";
        $row2=mysql_fetch_row(mysql_query("SHOW CREATE TABLE $table"));
        $return.="\n\n".$row2[1].";\n\n";
        while ($row=mysql_fetch_row($result)) {
            $return.="INSERT INTO $table VALUES(";
            $fields=array();
            foreach ($row as $field)
                $fields[]="'".mysql_real_escape_string($field)."'";
            $return.=implode(",",$fields).");\n";
        }
        $return.="\n\n\n";
    }
}
$filename='db-backup-'.date("Y-m-d H.i.s").'.sql.bz2'; //was "H.m.i" (hour.month.minute), surely a typo

$content=chunk_split(base64_encode(bzcompress($return,9)));
$uid=md5(uniqid(time()));
$header=
    "From: ".$from_name." <".$from_mail.">\r\n".
    "Reply-To: ".$replyto."\r\n".
    "MIME-Version: 1.0\r\n".
    "Content-Type: multipart/mixed; boundary=\"".$uid."\"\r\n\r\n".
    "This is a multi-part message in MIME format.\r\n".
    "--".$uid."\r\n".
    "Content-type:text/plain; charset=iso-8859-1\r\n".
    "Content-Transfer-Encoding: 7bit\r\n\r\n".
    $message."\r\n\r\n".
    "--".$uid."\r\n".
    "Content-Type: application/octet-stream; name=\"".$filename."\"\r\n".
    "Content-Transfer-Encoding: base64\r\n".
    "Content-Disposition: attachment; filename=\"".$filename."\"\r\n\r\n".
    $content."\r\n\r\n".
    "--".$uid."--";
mail($mailto,$subject,"",$header);

$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "Database backup compressed to bz2 and sent by email in $creationtime seconds";
?>
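For completeness, a naive restore sketch: it assumes an open mysql_connect() like the script above, and relies on the fact that mysql_real_escape_string() escapes raw newlines, so the ";\n" ending each dumped statement never appears inside the data. For anything serious, decompress the file and feed it to the mysql command-line client instead:

$sql=bzdecompress(file_get_contents("db-backup-XXXX.sql.bz2")); //your backup file name here
foreach (explode(";\n",$sql) as $query)
    if (trim($query)!="") mysql_query($query) or die(mysql_error());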


The whole routine runs in roughly 7 seconds on my not-so-bright hoster's server, including the bzip2'ing and the mailing, while the original text-only output of David's code took roughly 17 seconds on the same server; it's probably due to the removal of several redundant loops, and to using the builtin mysql_real_escape_string() and implode() functions instead of David's workarounds.

My sincere thanks go to the authors of the two guides, without whom I wouldn’t be writing this today 😉

MySQL query with ORDER BY, GROUP BY to remove duplicate rows after ordering

Apparently this is a widely discussed issue with MySQL; many newbies (like me) want to SELECT rows from a table, knowing there are duplicates of some field ‘a’ which they want unique, while making sure that the selected row has the lowest, highest, longest or shortest (whatever) value of field ‘b’ among the duplicates… well, these newbies first try a query like:

SELECT a, b, c FROM table WHERE <conditions> ORDER BY b DESC GROUP BY a

and they get errors and realize it’s not possible, because MySQL syntax wants the GROUP BY clause to come before the ORDER BY one, like:

SELECT a, b, c FROM table WHERE <conditions> GROUP BY a ORDER BY b DESC

In this case though, rows first get grouped, and only after the duplicates are removed do they get ordered (pretty much useless for what we need).

How to go about first ordering duplicates the way we need, and then remove the rest?

To cut to the chase, this is how:

SELECT t.a, t.b, t.c 
  FROM table t
  JOIN (SELECT MIN(b) as b
    FROM table
    GROUP BY a) t2 ON t2.b = t.b
  WHERE <conditions>
  GROUP BY a
  ORDER BY NULL

What this "query skeleton" does is order the rows by field ‘b’ inside the JOIN (in ASC order, since the MIN() function is used; otherwise you would use MAX(), or whatever function better suits your needs), and then group these rows by field ‘a’ in the outer query; the ORDER BY NULL clause at the end avoids MySQL doing unnecessary ordering steps, as the rows have already been ordered by the GROUP BY clause. Effective, sure, but an incredibly slow query.

I thought I could go on by just using this, yet I wasn’t satisfied, as I became a performance optimization junkie during my PHP/MySQL learning curve. So I thought, what if I just get the raw rows, and order them later as I need in a PHP array?

So, let’s examine this approach instead:

$results=doquery("
  SELECT a, b, c
    FROM table
    WHERE <conditions>
  ");
$datagrid=array();
while ($slot=mysql_fetch_array($results)) $datagrid[]=$slot;
sort($datagrid);
$compare="";
foreach ($datagrid as $k=>$row)
  if ($row[0]==$compare)
    unset($datagrid[$k]);
  else
    $compare=$row[0];

After this, the multidimensional array $datagrid contains the data that you need. Some explaining is in order:

  • the SQL query returns the unprocessed rows corresponding to the WHERE conditions you specified, so you will get duplicate rows, in an arbitrary order;
  • after inserting the rows’ data into a multidimensional array, the sort() PHP function takes care of sorting this array by its sub-arrays, first by subarray[0], then by subarray[1] and so on (let’s open a parenthesis here: if you search for “php sort multidimensional array” on the internet, you will find all sorts of hand-made custom scripts to do this, as if sort() alone weren’t enough; well, I have PHP5 on my hosting, and I threw a multidimensional array at it… it got splendidly sorted just as I needed, so it worked for me; in case it doesn’t for you, look in the user comments for sort() on php.net, there are all kinds of custom multi-sorting scripts in there, even if nothing is as fast as the builtin sort(); see also the usort() sketch after this list);
  • a $compare string is initialized, and used in the following foreach loop to unset() all the subarrays where element [0] (our ‘a’ field) is the same as the previous subarray’s [0], thus effectively pruning duplicates after the first occurrence.
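If plain sort() doesn’t happen to order the subarrays the way you need, an explicit usort() comparison does the same job; an era-appropriate sketch, comparing field ‘a’ then field ‘b’ as strings:

function cmp_rows($x,$y) {
    if ($x[0]!=$y[0]) return strcmp($x[0],$y[0]); //first by field 'a'
    return strcmp($x[1],$y[1]);                   //then by field 'b'
}
usort($datagrid,"cmp_rows");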

After this you can pass the resulting array to another routine which will process it as needed.

To take performance even further, in case you need to do a simple processing on the resulting ordered&grouped array, you can change the last foreach of the previous routine into:

foreach ($datagrid as $row)
  if ($row[0]!=$compare) {
    $compare=$row[0];
    //<processing routine in here>
  }

Again, to explain: this foreach loop doesn’t care about unsetting the duplicated subarrays, it just runs your processing code on the first occurrence of each subarray, ignoring the others; with this approach, you are cutting the execution time several ways:

  1. you presumably save time as you don’t need another foreach loop to process the pruned multidimensional array, since you do everything in the first and only foreach
  2. you avoid issuing as many unset()‘s as there are duplicate rows
  3. you save on cycles and memory needed to define a new array to store the pruned data, as you only work on the very first array you built on the SQL results
  4. you simplify the foreach because you are not requesting an associative result anymore ($k=>$row) but only request the element value

With my benchmark (on a VERY reduced set of rows, I must admit), I measured a performance increase of a whopping one hundred times (yes, exactly 100X faster) for the PHP ordering/pruning, compared to the MySQL “double-GROUP BY” approach (0.08 seconds for MySQL/PHP, against 8 seconds for MySQL only, over 100 iterations). Your results may vary on very large, or different kinds of, data sets; anyway, testing will not hurt. Judging from my 100x speedup, there’s still plenty of leeway.

Inverted search in MySQL database: check if keywords in field match some text

Usually, “searching a database” means looking for rows that contain a “document” in a field that matches a certain keywords set (mostly using the MATCH AGAINST construct); let’s say you want to do the opposite instead, that is finding those rows that have a field containing keywords, or a phrase, that matches to a text you already know. This is what I call an inverted search, or reverse search.

Real-life scenario: you build a website for job offers, candidates come in searching for listings that suit them, but find none, or find just a few that are not satisfactory. Either they keep coming to the website doing a search (let’s say they use the keywords “veterinar* cat” because they want to take care of animals, and like cats above all), or they subscribe to the RSS feed for that search (if your site offers one)… or they “save” this preference, and have the site send them an email as soon as a new offer pops up that matches their search, and this is where this article comes in.

Now, it’s easy to save the keywords for each user in a MySQL table, but how easy is it to check if the text of new messages matches said keywords? In PHP, you would go about building an array of keywords, and using strpos() on each of them to see if they are all contained in the text.

In MySQL you have INSTR(), which is the equivalent of PHP’s strpos(), but you cannot store arrays in a field (I mean real arrays, not serialized ones), and storing as many rows as there are keywords is going to be troublesome, and normalization isn’t needed anyway: a “set of keywords” is mostly unique to each user subscribing to a search, so it makes sense to store them in a text field in the form “keyword1,keyword2,keyword3,…”; again, if you were in PHP, you would do explode(“,”,$keywords) to obtain the array of keywords, but in MySQL there is no such function, so you pretty much have to create one.

Ideally, you need a function where you pass the known document/text to match against (the concatenated subject and body of the job offer, in this example), the string in the form “keyword1,…,keywordN“, and a parameter telling the function that the delimiter (“glue” in PHP’s explode/implode) is a comma, so:

somefunction(document, keywords, delimiter)

This function would then return a positive integer if all of the keywords are present in the text, or 0 otherwise.

The function I put together by scraping info around was:

DELIMITER ;;

DROP FUNCTION IF EXISTS `matchkwdarray`;;
CREATE FUNCTION `matchkwdarray`(`str` text, `arr` text, `delimit` tinytext) RETURNS char(1) CHARSET utf8
    DETERMINISTIC
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE matches INT DEFAULT 1;
    DECLARE token TINYTEXT;
    label:WHILE(matches>0) DO
        -- extract the i-th keyword from the delimited list
        SET token=replace(substring(substring_index(arr, delimit, i), length(substring_index(arr, delimit, i - 1)) + 1), delimit, '');
        -- ran out of keywords: all of them were found, return the last (positive) INSTR result
        IF token='' THEN LEAVE label; END IF;
        -- position of the keyword in the text, 0 if missing (which also ends the loop)
        SET matches=INSTR(str, token);
        SET i=i+1;
    END WHILE;
    RETURN matches;
END;;

DELIMITER ;

To add this to your MySQL database, you can either connect via CLI or, like most people with a website on a shared hosting without command-line access, use a database frontend; phpMyAdmin, bloated as it is, doesn’t support functions/routines, so you need another one, and I found Adminer to be a massively better alternative, performance- and flexibility-wise; it’s only a 150kb-something single PHP file, which makes it all the better.

So, when you test the function, you have the following:

SELECT matchkwdarray('this is just a very simple test', 'just,test', ',')
Returns: 2

SELECT matchkwdarray('this is just a very simple test', 'some,stuff', ',')
Returns: 0

SELECT matchkwdarray('this is just a very simple test', 'this,very,stuff', ',')
Returns: 0

The function is resource efficient: as soon as a keyword is not found in the text, it stops processing the keywords array. Surely, if you don’t want a 100% match but are fine with less than that, you can easily modify the function to “stream” the whole keywords array, perhaps removing the last character off each keyword to allow some variation, and return a percentage match which you can then evaluate later on. A PHP-side sketch of calling the function follows.
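To tie it back to the job-offers scenario, a hedged sketch of the PHP side when a new offer comes in; the saved_searches table and its columns are hypothetical:

//$subject and $body belong to the freshly posted offer
$doc=mysql_real_escape_string($subject." ".$body);
$result=mysql_query("
    SELECT user_id, keywords
    FROM saved_searches
    WHERE matchkwdarray('$doc', keywords, ',')!='0'
");
while ($row=mysql_fetch_assoc($result)) {
    //email the subscriber about the matching offer
}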

MySQL SELECT query with more WHERE conditions runs faster

For some reason I wanted to add redundant conditions to a WHERE clause in a MySQL SELECT query, but I feared that, being redundant, they might end up slowing down the query.

Shortly: there is this classifieds site, where the items table has town_id, province_id and region_id columns. If you already know the town_id you’re looking for, you can query:

SELECT * FROM items WHERE town_id='12'

instead of:

SELECT * FROM items WHERE town_id='12' AND province_id='34' AND region_id='56'

as the results will be the same… but should you?

I prepared a quick test panel:

$creationstart=strtok(microtime()," ")+strtok(" ");
for($i=0;$i<=999;$i++)
 $test=doquery("SELECT * FROM items WHERE town_id='58003'");
$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "town only: $creationtime<br />";

$creationstart=strtok(microtime()," ")+strtok(" ");
for($i=0;$i<=999;$i++)
 $test=doquery("SELECT * FROM items WHERE town_id='58003' AND province_id='58' AND region_id='12'");
$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "all conditions: $creationtime<br />";

Surprisingly, over repeated runs it took anywhere from 2.4s to 4.5s for the “all conditions” routine to complete, with the “town only” query consistently ending up about 0.4s slower. At first I thought it might be the MySQL cache in action, but even after swapping the order of the two loops inside the code, the results were the same.

So, even if it looks redundant to you, adding more WHERE ... AND ... clauses in your query speeds up the fetching of the results; at least it did in my setup, so see the sketch below to check yours.
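Whether the extra conditions help will depend on your table’s indexes, so it’s worth peeking at what the optimizer does with each variant:

//compare the execution plans of the two variants
$result=mysql_query("EXPLAIN SELECT * FROM items
    WHERE town_id='12' AND province_id='34' AND region_id='56'");
while ($row=mysql_fetch_assoc($result)) print_r($row);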


PHP search string functions performance is “needle” dependent

I was playing with some templating benchmarks in my previous post, and bumped into an interesting find regarding the way strpos works, and consequently all similar “search (and replace)” functions like str_replace, strrpos, stripos, str_ireplace, substr_replace, even preg_match and preg_replace (I tested only str_replace and preg_match out of these, but I assume they will all bear the same results).

  1. From here on I am using the php.net convention, and will call “needle” the string you’re searching for, and “haystack” the string inside of which you’re executing the search.
  2. Knowing these results will only be useful if you can choose which “needles” to put in the “haystack” to be searched for at a later time, hence only if you’re building a template.
  3. I searched to see if someone else had found this already, but it doesn’t appear to be the case, so if I am presenting already-known things as new, just don’t bash me.

So let’s say you’re creating a template scheme, where you insert several tags that need to be replaced by the dynamically generated content of your website. You can call them whatever you want, as long as they are unique strings inside the template, so for example %tag% or {tag} or <!--tag--> or ~tag~ (or just whatever). Maybe you think that choosing which delimiting chars to use for the tag name is only up to your personal taste, and I did too before discovering this by accident, but I am going to guess that if the template contains well-formed HTML code, then using ~tag~ is going to be much faster than using <!--tag-->.

The general principle is, searching for a needle beginning with a “rare” character is much faster than searching for a needle starting with a commonly used character inside the haystack.

For example, if your haystack is an excerpt from an ebook, searching for “%hello” (if present) will be way faster than searching for “hello”. The reason? The C function that searches for the string starts by looking for its first character; when found, it checks if the following one matches, and so on. So if you’re searching for “hello”, the function will pause at every “h” to see if an “e” follows, and if yes it checks for an “l” and then another “l”; yet if the word is “hellish”, the function will not find the ending “o”, and will have to discard the work done so far and go on with the search. The “%” character, on the other hand, is pretty rare inside “normal text”, if not unique, so the function will have to “pause” far fewer times before hitting a full match.

Let’s put it to the test, this is the routine:

$creationstart=strtok(microtime()," ")+strtok(" ");
for ($i=0;$i<100000;$i++) $testpos=strpos($test,"malesuarda");
$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "malesuarda $testpos: ".$creationtime."<br />";

$creationstart=strtok(microtime()," ")+strtok(" ");
for ($i=0;$i<100000;$i++) $testpos=strpos($test,"%malesuada");
$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "%malesuada $testpos: ".$creationtime."<br />";

Let’s do some explaining: $test is a fairly long string that I previously defined inside the code (it is made of lorem ipsum kind of text, several paragraphs amounting to almost 13kb), inside of which I took a random word, “malesuada“, which is repeated several times, and I made two occurrences of this word slightly different, to render them unique; they were both towards the end of the string. I changed one into malesuarda by adding an “r”, and another one (further along in the string) into %malesuada, then just loaded the PHP script; I echoed the value of $testpos as well, to confirm that the strings were actually found.

As expected, here are the results:

malesuarda 10970: 3.5609
%malesuada 11514: 0.7632

Replacing strpos with any of the other functions listed at the beginning of this article will yield similar results.


PHP templating, preg_match, file_get_contents and strpos, the fastest

While working on the website of reecycle.it (the Italian freecycling site, for giving away and recycling used objects), I am also spending my resources on reducing as much as possible the load on the Apache server of my Italian shared hosting, Tophost, so I decided to run some benchmarks to see which was the fastest way I could load the templates, using different methods (a template is a “model” of a page, or part of one, written in HTML code, inside which the dynamic values loaded by the website engine are inserted).

All the templates used by reecycle.it were initially created as several .html files (one per template), with several %tags% substituted by the str_replace() function before output; but I thought that maybe there were different, faster ways to obtain the same result. A page can contain up to 5 or more different templates (general layout, login panel, simple search widget, search results table, and single row of the search results table), so each page could tell the server to access and load 5 different files from the hard disk (we are ignoring the disk cache for simplicity); maybe reading a single, bigger file containing all of the templates, loading it into memory, and later extracting the needed parts is faster? This path can be taken in two ways: either by writing two “clean” lines of code using preg_match() and a regular expression, or by using a rather less “elegant” but strictly more performant combo of strpos() and substr(), which instead needs several more lines of code.

In other words, I needed to know which one of the three (separate template files, single big template with regexp extraction, and single big template with strpos/substr extraction) was faster. I already knew preg_match was heaps slower than strpos/substr, yet I included it in the test for the sake of completeness.

This is the routine I used:

<?php
$creationstart=strtok(microtime()," ")+strtok(" ");

for ($i=0;$i<1000;$i++) {
 $text=file_get_contents("full.html");
 for ($n=1;$n<=8;$n++) {
 preg_match("/<!--{$n}\(-->(.*)<!--\){$n}-->/s",$text,$matches);
 $html[$n]=$matches[1];
 }
}

$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "preg_match: ".$creationtime."<br />";

/////////////////
$creationstart=strtok(microtime()," ")+strtok(" ");

for ($i=0;$i<1000;$i++) {
 $text=file_get_contents("full.html");
 for ($n=1;$n<=8;$n++) {
 $start=strpos($text,"<!--$n(-->")+strlen("<!--$n(-->");
 $ending=strpos($text,"<!--)$n-->");
 $html[$n]=substr($text,$start,($ending-$start));
 }
}

$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "strpos/substr: ".$creationtime."<br />";

////////////////////
$creationstart=strtok(microtime()," ")+strtok(" ");

for ($i=0;$i<1000;$i++) {
 for ($n=1;$n<=8;$n++) {
 $html[$n]=file_get_contents($n.".html");
 }
}

$creationend=strtok(microtime()," ")+strtok(" ");
$creationtime=number_format($creationend-$creationstart,4);
echo "file_get_contents: ".$creationtime."<br />";

where full.html is the single HTML file containing all the templates (a total of 8, consisting of paragraphs in lorem ipsum style, of different lengths), identified by <!--templatenumber(--> and <!--)templatenumber--> delimiters between which lies the code to extract, while the single template files were named 1.html through 8.html.

What the code does: for each method it runs 1000 iterations in which every single template is loaded, from 1 to 8, and measures the time needed to complete. This usage is not very realistic, as real template code is several lines of HTML rather than a few long lines of text, and templates are never loaded all together, but only those actually needed to draw the page; anyway, better performance in this test should mean better performance in real-life use (WRONG! check the bottom for more details).

So, this was the result:

preg_match: 1.8984
strpos/substr: 0.0681
file_get_contents: 0.1352

Final times were obviously different at each page refresh, from a minimum (for preg_match) of 1.4s up to a maximum of 3s, but the relationship between them remained the same: the strpos/substr combination was two times faster than file_get_contents called for each file. What surprised me is how the preg_match method is almost 30 times slower than strpos/substr, and hence 15 times slower than asking the server to read several different files (I suppose the latter was helped by the disk cache in action).

On a side note, the tech support of my hosting, when asked about this, suggested I drop reading separate files in favour of reading a single file and using preg_match… go figure.

UPDATE:

I just went and tested this benchmark with the real templates off reecycle.it… oh, how wrong I was.
I made a function on reecycle.it to fetch the requested template which, when the template is missing from the big file, loads the single-file template, returns it, and adds the missing template to the big single file; after a while I got a 37kb supertemplate.tpl containing all of the templates of the website. I just changed the routines in the example above to use the real template files… and behold, the results were inverted! Using file_get_contents() on several files was two times faster than using strpos/substr on a single big file. No matter how I changed it, several separate small files were still much faster than a single big file.

I blame it on two things: the file is actually big, so the string functions have to process a big chunk of data, and especially the tag format, since the template delimiters, being practically in HTML comment format, begin with the “less than” symbol, which is bound to be repeated lots of times inside HTML code, forcing the strpos function to pause at every occurrence.

In fact, in the templates archive file I modified the delimiting tags to look like {tag(} and {)tag} instead of using the HTML comment tags <!--…-->, and the results of the benchmark went back to normal: the strpos/substr combo on the single archive file was faster than file_get_contents on several separate files, and the more templates requested, the faster the strpos/substr method compared to file_get_contents… see the results in my next post.



PunBB e-mail notification on each new post extension

On another site I manage, I just switched from the previous forum script to PunBB; one feature I missed after the switch is the ability to activate an email notification for each and every post the users add to the forum, so I had to write my own.
Following is the source code of the extension: save it as manifest.xml, upload it into the /extensions/newpost_mail_notify/ folder of PunBB (or another name of your liking), and install it from the PunBB admin panel with the usual method. There is nothing to configure; as long as the extension is active, it just does its job.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE extension SYSTEM "ext-1.0.dtd">

<extension engine="1.0">
<id>pun_newpost_mail_notify</id>
<title>Send email to admin for each new post</title>
<version>1.1</version>
<description>This extension simply sends an email to the configured admin email address everytime a new post is added by someone who's not the configured admin</description>
<author>Ephestione</author>

<minversion>1.3.2</minversion>
<maxtestedon>1.3.4</maxtestedon>

<hooks>
<hook id="po_pre_redirect"><![CDATA[
if ($forum_user['group_id']!=1) { //1 = admin group id
$notification_to=$forum_config['o_admin_email'];
$notification_subject='['.$forum_config['o_board_title'].'] New topic/post notification';
$notification_message='New post/thread at URL: '.$base_url.'/viewtopic.php?pid='.$new_pid.'#p'.$new_pid;
mail($notification_to,$notification_subject,$notification_message);
}
]]></hook>

</hooks>

</extension>

This extension is very spartan: it will send a mail to the preconfigured admin email address, only if the new post is by someone other than the admin (obviously), with a simple link pointing to the post on the board.

I just updated the code with the suggestion by Grez from the PunBB forums, correcting the check on the user, which now uses the group_id and not the user id, making it more solid.

Disclaimer: it’s not my fault if your server explodes; this extension has been made/tested for PunBB 1.3.4, and if you’re not using that version you should definitely upgrade. It may be a bad idea to use it if you get a lot of posts each day; I needed it because my forum is not so active, so I prefer getting notified when there’s something new, rather than checking every day.
