Tag Archives: htaccess

Cache PHP to gzipped static html pages using htaccess redirect

No matter the server performance, the fastest kind of website for a visitor is the one with static HTML pages; this way the server just has to upload existing data to the browser instead of starting the PHP compiler, open a connection to MySQL, fetch data, format the page and only after that send it. Less CPU, less memory, less processing time, more users served in a smaller amount of time.

Avoiding PHP and MySQL execution altogether is the key, and my project was to create something similar to WPSuperCache to be implemented in a generic PHP website, so my thanks go to said plugin’s developers for the source code that gave me precious tips.

DISCLAIMER: this guide presumes you are “fluent” with PHP and .htaccess coding, and is meant to just give directions as of how to obtain a certain result. This guide will not give a pre-cooked solution to just copy/paste into your website, you need to change/add code to adapt it to your need; I am not the person who can help you if you need a tailored solution, simply because I have no time to give away free personalized help!

Here’s our logical scheme:

  1. Visitor comes to website and asks for page X;
  2. Webserver checks via .htaccess (avoiding PHP execution) if there is a static cached version of page X already, and in this case either serves the gzipped file (if the client supports it), or falls back to the uncompressed HTML version… at which point our scheme ends; otherwise, if no cached version exists, continue to next step;
  3. PHP builds the page and serves it to the visitor as “fresh”, but at the same time saves the HTML output as both gzipped and uncompressed version so the next visitor will directly download that one.

This is a very bare procedure, which needs to be perfected by the following conditions:

  • Not all pages need to be cached (those that by their very nature change very often, like real time statistics, a poll, whatever)… in fact, some pages must NOT be cached (those that only moderators can view, both for secutiry reasons and consistency); PHP must avoid caching those pages, so that htaccess will always serve the “fresh” PHP page.
  • Pages that do get cached still need to be refreshed, because sometimes they can change (articles or posts that are modified/updated with time, or where comments get posted).
  • Some pages, even if the corresponding cached version exists, must NOT be served from cache, for instance when POST data is submitted (sometimes, also with GET, you as the webmaster should know if this is the case), that requires PHP to be handled.

Define “caching behaviour” in PHP

This section is where you set the variables that the PHP caching routine will check against to know when and how to do its job.

$cachepath=$_SERVER["DOCUMENT_ROOT"]."/cache/";
$cachesuffix="-static";
$nocachelist=array(
    "search"=>1,
    "statistics"=>1,
    "secretcodes"=>1,
    "..."=>1,
 );

And explained:

  • $cachepath is the folder where you want the static (HTML and gzip) files to be stored (the website I use this caching routine on, has all the pages accessible from root folder with rewrite URLs, in other words the only slash in the URL is the one after the domain name, and I have no need to reproduce any folder structure; if you do, you’re on your own);
  • $cachesuffix is a string I need to add to the URL string (for example if address is domainname.com/pagename then in this case the cachefile will be named pagename-static), this is useful if you want to cache the homepage which is not named index.php but is just domainname.com/ because in that case, the cachefile name would be empty and .htaccess won’t find it;
  • $nocachelist is an associative array where you have to add as many keys (pointing to a value of 1) as the pages you don’t want (for whatever reason) to cache; in the key name you have to put the string the user would write in the URL bar after the domain name slash to get to the page, for example if you don’t want to cache domainname.com/statistics you would be using “statistics”=>1 in there, as already is.

Have PHP actually save to disk the cached pages

     if (
        !isset($nocachelist[$_GET["page"]]) &&
        !$_SESSION["admin"] &&
        !count($_POST) &&
        !$_SERVER["QUERY_STRING"]
    ) {
        //build the uncompressed cache
        if (!file_exists($cachepath.$url.$cachesuffix.".html")) {
            $cached=str_replace("~creationtime","cached",$html);
            $fp=fopen($cachepath.$url.$cachesuffix.".html","w");
            fwrite($fp,$cached);
            fclose($fp);
        }
        //build the compressed cache
        if (!file_exists($cachepath.$url.$cachesuffix.".html.gz")) {
            $cachedz=str_replace("~creationtime","cached&gzipped",$html);
            $cachedz=gzencode($cachedz,9);
            $cachedzsize=strlen($cachedz);
            $fp=fopen($cachepath.$url.$cachesuffix.".html.gz","w");
            fwrite($fp,$cachedz);
            fclose($fp);
        }
    }

Pretty much self explanatory, isn’t it.
No seriously, do you really want me to elaborate?

Ok, you got it.

  • The $_GET[“page”] is a GET value I set early in the code to know where we are in the website, you can use any variable here as long as you can check it against the $nocachelist array;
  • The other conditions in the first if should be clear, they avoid building the page’s cache if the CMS’s admin is logged (security) or if POST data is submitted or if there is a query string appended to the URL (consistency/stability);
  • $url is a variable that I define early in the code, and contains the string after the domain name slash and before the query string question mark, basically the kind of string you fill the $nocachelist array with (if you were paying attention, you may now think I have a redundant variable since $url and $_GET[“page”] should be the same, but this is not the case for other reasons);
  • $html is the string variable that, across the whole CMS, defines the raw HTML code to echo at the end of the PHP execution; you can either do like I do and define such string, or use an output buffer to obtain HTML if you instead print the HTML directly to screen during PHP execution;
  • ~creationtime is a “hotkey” I use in my template to plug in the time in seconds that was needed to create the page in PHP; since I am creating a cached version now, the creation time of the page is zero, because it’s already there to be downloaded from the browser instead of having to be compiled by the server, so in there I print either “cached&gzipped” for clients that support gzip, or only “cached” when the browser doesn’t not; you can safely strip out this part, as this is more of a eyecandy/nerdy/debug thing.

Let .htaccess send the cached files before starting the PHP compiler

AddEncoding x-gzip .gz
<FilesMatch "\.html\.gz$">
    ForceType text/html
</FilesMatch>

#GZIP CMS
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{REQUEST_URI} !/forum/
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP:Cookie} !^.*(isadmin|nocache).*$
RewriteCond %{HTTP:X-Wap-Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond /full/path/to/your/htdocs/cache/$1-static.html.gz -f
RewriteRule ^(.*) /cache/$1-static.html.gz [L,T=text/html]

#UNCOMPRESSED CMS
RewriteCond %{REQUEST_METHOD} !POST
RewriteCond %{REQUEST_URI} !/forum/
RewriteCond %{QUERY_STRING} !.*=.*
RewriteCond %{HTTP:Cookie} !^.*(isadmin|nocache).*$
RewriteCond %{HTTP:X-Wap-Profile} !^[a-z0-9\"]+ [NC]
RewriteCond %{HTTP:Profile} !^[a-z0-9\"]+ [NC]
RewriteCond /full/path/to/your/htdocs/cache/$1-static.html -f
RewriteRule ^(.*) /cache/$1-static.html [L]

This is the code you need to plug in .htaccess, preferably after everything else, but before defining the custom error pages; anyway, since you should be .htaccess-fluent, you shouldn’t need to be told where this fits best.

Some detailing for the curious:

  • First bit is needed (at least I needed it) to serve the gzipped files in a way that the browser knows to handle, otherwise I just got gibberish (the gzip was being sent to output without being uncompressed by the browser first);
  • if the cookies “isadmin” or  “nocache” exist, the cache version of the page will not be served even if it exists; easy explanation, if an admin is logged, and there is special content on a page that only admins can see, you don’t want them to see the “vanilla” cached version of the pages instead; so it’s your duty in this case to set a “isadmin” cookie when an admin logs in, and remove it when the admin logs out;
  • choosing the correct full path on the webserver was a bit tricky in my case, I can’t quite remember where the issue was, but depending on the method I used to get it I had different paths; I kind of remember I had to choose between $_SERVER[“DOCUMENT_ROOT”] and dirname(__FILE__) because only one was working with .htaccess;
  • you don’t really have to change much in this code snippet unless you have particular needs; the /forum/ path exclusion could be irrelevant in your case;
  • thanks go to WPSuperCache developers for their .htaccess code that I stole to build this snippet!

Last but not least

The cache is there to help you, not to give an handicap to your website. You choose what the cache performance should be, and how many times it should get served instead of the dynamic page, before considering it paied off, but you have to clear the cache from time to time to make sure you’re not serving outdated stuff to your visitors.

In my case I use an external cronjob provider (setcronjob.com) to trigger a PHP routine every night which includes the following:

$handle=opendir("cache");
while (($file = readdir($handle))!==false) @unlink("cache/".$file);
closedir($handle);

so that everyday the website starts off with a fresh cache. Not less important, you should clear the cache of a single page as soon as you know that page changed, unless you’re ok with the changes being visibile only the next day after all the cache is cleared anyway. Example: you edit a page while logged as admin, or a user posts a comment, or anything happens that you have control over that alters in any way the page’s HTML: simply use unlink() to delete both the gzipped and uncompressed caches and the website will recreate them with the updated content.

 

Have fun with pimping up this draft!

Redirect mobile and desktop users to the correct CSS via .htaccess file

The winning solution for a mobile compliant website is to have a single HTML structure for every visitor, and just choose the appropriate CSS to render the page correctly for both worlds; so, I’m not speaking here about a /mobile/ folder under your website root, nor any ?mobile query string, but about the exact same URL, that has all the candy when viewed by a desktop browser, while is super-compact and sleak if the browser is running on a smartphone.

How do you do that?

First, get yourself two stylesheets suited for desktop and mobile, or rather, like I did, prepare a set of CSS files, where there is a “common.css” always used, and a “desktop.css” and “mobile.css” that get loaded when needed. Also, get yourself minify in order to define a “desk” and a “mobi” group of files to load with a simple URL (I use this procedure on another website).

Then add this to your .htaccess file, fairly early after the start of the RewriteEngine:

RewriteCond %{HTTP_USER_AGENT} !^.*(2.0\ MMP|240x320|400X240|AvantGo|BlackBerry|Blazer|Cellphone|Danger|DoCoMo|Elaine/3.0|EudoraWeb|Googlebot-Mobile|hiptop|IEMobile|KYOCERA/WX310K|LG/U990|MIDP-2.|MMEF20|MOT-V|NetFront|Newt|Nintendo\ Wii|Nitro|Nokia|Opera\ Mini|Palm|PlayStation\ Portable|portalmmm|Proxinet|ProxiNet|SHARP-TQ-GX10|SHG-i900|Small|SonyEricsson|Symbian\ OS|SymbianOS|TS21i-10|UP.Browser|UP.Link|webOS|Windows\ CE|WinWAP|YahooSeeker/M1A1-R2D2|iPhone|iPod|Android|BlackBerry9530|LG-TU915\ Obigo|LGE\ VX|webOS|Nokia5800).* [NC]
RewriteCond %{HTTP_user_agent} !^(w3c\ |w3c-|acs-|alav|alca|amoi|audi|avan|benq|bird|blac|blaz|brew|cell|cldc|cmd-|dang|doco|eric|hipt|htc_|inno|ipaq|ipod|jigs|kddi|keji|leno|lg-c|lg-d|lg-g|lge-|lg/u|maui|maxo|midp|mits|mmef|mobi|mot-|moto|mwbp|nec-|newt|noki|palm|pana|pant|phil|play|port|prox|qwap|sage|sams|sany|sch-|sec-|send|seri|sgh-|shar|sie-|siem|smal|smar|sony|sph-|symb|t-mo|teli|tim-|tosh|tsm-|upg1|upsi|vk-v|voda|wap-|wapa|wapi|wapp|wapr|webc|winw|winw|xda\ |xda-).* [NC]
RewriteRule ^stylesheets\.css$ /min/index.php?g=cssdesk [L]
RewriteRule ^stylesheets\.css$ /min/index.php?g=cssmobi [L]

First of all, these directives do not consider the iPad as a mobile device, since it has plenty of screen space in my opinion to just work as a desktop browser; if you do not agree, just insert an “ipad” string in there together with the rest.

Some explaining is in order: credit for the RewriteCond’s goes to the creators of WP Supercache, which has been a model to write my own caching system on another website of mine (next article is coming about the caching routine itself).

Those RewriteCond lines tie the next RewriteRule directive only to desktop browsers (since they exclude all user agents that can be associated to mobile devices), so the first RewriteRule redirects to the /min/index.php?g=cssdesk minified CSS set tailored for desktops (change it as needed). The [L] directive right after it tells .htaccess to ignore everything else after that. When the RewriteCond’s are not met, it means a mobile device is coming to your site, so the first RewriteRule is ignored, and the one after that is used that redirects to the mobile CSS set instead.

You see that I redirect requests for a file named “stylesheets.css”. Guess what, this file doesn’t exist on my website, and it just references from the template’s HTML: it’s .htaccess that deals with giving the browser the correct data. In this case all you have to do is reference the stylesheets in your HTML code like this:

<link rel="stylesheet" href="/stylesheets.css" type="text/css" media="screen" />

The advantages? This is faster if compared to the same operation done with PHP, since a webserver processes .htaccess directives faster than it compiles PHP code, plus you may be saving an additional include operation on an external PHP file. Also, an approach of this type is absolutely needed if you want to implement a no-PHP-execution caching system, because then there would be no PHP code to use to address browsers to the desktop or mobile CSS, would it.