Script Caching with PHP
Date: 2007-02-17 Source: PHP爱好者
Intended Audience
Introduction
The Caching Imperative
The Script Caching Solution
The Caching Script
Implementation: Avoiding Common Pitfalls
Summary
The Script
About the Author
Intended Audience
This article is intended for the PHP programmer interested in creating a static HTML cache of dynamic PHP scripts. The article has been written specifically for an Apache server running PHP scripts, but the ideas described here are applicable to almost any Web environment.
The article assumes that you have some experience with creating dynamic Web sites and that you are familiar with HTTP, at least enough to know what a "404 Page Not Found" error means and what the environment variables $REQUEST_URI and $DOCUMENT_ROOT contain.
Introduction
The benefits of using dynamic Web pages are well known, but there are nonetheless two significant drawbacks: speed and search engine accessibility.
Speed: The speed at which a user receives a page after clicking a link or entering a URL is a crucial factor for a Web site. It depends on dozens of variables, some of which you can control and some of which you cannot. There are countless bottlenecks in the process, and it’s probably impossible to fix them all. The bottleneck we will tackle here is the one caused by waiting for the server-side scripts to create the HTML output.
Search Engine Accessibility: By this I mean the ability of search engines to point to a particular Web page. Most search engines work by using a "crawler" program. A crawler begins on a certain page and follows the links on it. Every page the crawler visits is then indexed in the search engine’s database.
Most crawlers, however, are only programmed to navigate through static (HTML) pages, not dynamic ones. So, for example, pages whose URLs contain a "?" character (indicating a query string) or a filename ending with ".php" will not be accessed. Consequently, crawlers will not index these pages, making your site less accessible to new visitors.
Note: A crawler cannot tell the difference between an HTML file’s output and a PHP file’s. They both send the same content type. Therefore, most crawlers simply decide according to the filename and/or if there is a query string in the URL – that is, if the URL contains a "?".
This article discusses a procedure for dealing with both of these drawbacks. The article’s script should be sufficient under most circumstances, and it is particularly suited to small-scale Web sites and to individual script pages whose content changes only occasionally.
The Caching Imperative
Simply speaking, caching entails storing the output of one or more dynamic scripts into static HTML files. A visitor to your site would be directed to these HTML files rather than to their original dynamic versions.
The mechanism for doing so can be described using a magazine’s Web site as an example.
A magazine’s Web site would likely have a database containing numerous articles and stories. You would normally have a script (say "show_article.php") that:
Receives an article ID number
Reads the article’s content from the database
Puts it into some kind of HTML template
Formats the whole page with navigation links, etc.
Sends the resulting HTML to the visitor’s browser
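
The steps above can be sketched roughly as follows. This is only an illustration and not the magazine's actual script: the $articles array stands in for the database, and the markup is a placeholder template; both are assumptions made for this example.

```php
<?php
// A rough stand-in for show_article.php. The $articles array replaces the
// database and the markup is a placeholder template (both hypothetical).
function render_article(array $articles, int $id): string
{
    // Steps 1 and 2: receive the ID and look the article up.
    if (!isset($articles[$id])) {
        return '<html><body>Article not found</body></html>';
    }
    $article = $articles[$id];

    // Steps 3 and 4: put the content into a template with navigation links.
    return '<html><head><title>' . htmlspecialchars($article['title']) . '</title></head>'
        . '<body><a href="/">Home</a>'
        . '<h1>' . htmlspecialchars($article['title']) . '</h1>'
        . '<p>' . htmlspecialchars($article['body']) . '</p>'
        . '</body></html>';
}

$articles = [123 => ['title' => 'Cache Article', 'body' => 'Caching saves time.']];

// Step 5: send the HTML to the browser (defaults to article 123 when no
// "id" parameter is given, e.g. when run from the command line).
$id = isset($_GET['id']) ? (int) $_GET['id'] : 123;
echo render_article($articles, $id), "\n";
```

Every request repeats all five steps, which is exactly the work a cache avoids.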
As such, on the site’s homepage you might have links to current articles coded as follows:
<a href="show_article.php?id=123">Cache Article</a>
Now, articles tend to be static, and you would hope that the site is operating under heavy request loads (because it’s popular!). Yet every request for an article undergoes the same extensive processing: accessing the database, finding the article, and displaying it.
Moreover, if the script depends on other database information, such as layout specifications, the process takes even longer. Lastly, a search engine’s crawler would not even index the content of your articles, because the link to the article page contains a "?" and a ".php" extension, and thus the crawler would not follow it.
Therefore, to alleviate these problems, a Webmaster should at least consider implementing some form of caching system.
When You Should Cache a Script
While the caching solution presented in this article will be beneficial to many users, there are circumstances in which you will prefer not to cache your scripts at all, or to use a different caching method.
Scripts that deal with frequently changing data (such as stock values or discussion forums) or that process forms are not fit for the system described in this article. In these cases, the decision is up to you: you might leave them dynamic, or you might opt for a more advanced solution such as the Zend Cache.
Note: Using the Zend Cache for your site’s caching needs would render the system described in this article totally unnecessary (though you might still want to read it in order to improve your PHP skills!). The Zend Cache provides you with a complete turnkey caching solution. For a complex site I would advise buying it (and I’m not just saying this because this is Zend’s site, but because the application is both easier to maintain and well supported).
On the other hand, if your site only features a few basic scripts, then you probably do not need to bother with caching at all.
Nonetheless, if you:
Feature (at least relatively) complex scripts on your site,
Wish to be able to handle numerous page hits,
and/or
Cannot afford the cost of a commercial caching solution,
then I hope this caching mechanism will serve you well.
For pages that do not need to be kept up to the minute, the speed of this system cannot be beaten since it creates pure static HTML pages.
The Script Caching Solution
The standard caching solution is to generate static HTML files. Continuing the earlier example, the link to the cached article will now be coded as follows:
<a href="/cache/show_article/id_123.html">Cache Article</a>
id_123.html contains the output generated by the show_article.php script when it is called using id=123.
It is good practice to store all of the cached files under a single directory of their own (in the above example, the "/cache" directory), with a sub-directory named after each dynamic script that creates them (e.g. the "show_article/" directory).
In this manner, the cached files are separated from the dynamic scripts, which makes the site easier to maintain; for example, you can easily delete the old cached files generated by a certain script. More importantly, however, it simplifies cache.php’s string replacement mechanism. For more details, refer to the "cache.php Details" section.
Be aware that links to your dynamic pages will need to be switched to point to their respective static HTML files.
So, if you want article #123 to be cached, for example, you would simply change the link from "show_article.php?id=123" to "/cache/show_article/id_123.html".
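
If your links are generated by PHP code, the switch can be made in one place. The helper below is a hypothetical sketch (cache_url_for() is not part of cache.php); it assumes the naming convention used throughout this article, where "=" becomes "_" and "&" becomes "__".

```php
<?php
// Hypothetical helper (not part of cache.php): builds the static cache URL
// for a dynamic script and its query parameters, using the article's
// convention: "=" becomes "_" and "&" becomes "__".
function cache_url_for(string $script, array $params): string
{
    // "/show_article.php" or "show_article.php" -> "show_article"
    $name = preg_replace('/\.php$/', '', ltrim($script, '/'));

    $pairs = [];
    foreach ($params as $key => $value) {
        $pairs[] = $key . '_' . $value;
    }

    return '/cache/' . $name . '/' . implode('__', $pairs) . '.html';
}

echo cache_url_for('show_article.php', ['id' => 123]), "\n";
// /cache/show_article/id_123.html
```

Pages you prefer not to cache simply keep their original dynamic link.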
Note: The HTML files do not have to exist before you assign these new links. A page is not cached until its new link has been requested from the server for the first time.
Furthermore, since the HTML files will reside under a different URL, any relative paths within those files (e.g. "images/art.gif") will need to be corrected. Therefore, consider working with absolute paths such as "http://www.myserver.com/path/to/images/art.gif" or "/path/to/images/art.gif" (note the leading "/", meaning relative to the server’s root).
Alternatively, you can add a <BASE HREF="http://www.myserver.com/"> tag to your HTML <head> section.
Note: It is NOT recommended that you change the paths to paths relative to the cache directory (such as "../../path/to/images/art.gif"). The whole point of this caching system is that files may or may not be cached, according to your preferences. You will want the links to work whether the HTML is read from a cached file (under the /cache/ directory) or from the dynamic script (in some other directory); absolute URLs guarantee this.
The Caching Script
Central to the caching system is the caching script itself (cache.php). It reads the output of the dynamic script by calling fopen(<dynamic script URL>), as if it were a browser. It displays this output to the user and then saves it to a static HTML file.
cache.php, itself, only uses basic PHP. It can also function independently of any other script. Consequently, you do not need to modify any existing scripts in order to implement script caching.
Activating the Caching Script
The recommended method for activating cache.php is by way of the "404 Page Not Found" event, thereby automating its execution and minimizing its impact on the site.
The "404 Page Not Found" error informs the visitor that the server could not find his/her requested page. Most of the time a standard "Page Not Found" page is displayed. However, since most Web servers enable you to customize your error pages, you can call the cache.php script when a file is not found in place of displaying the default "Page Not Found" page.
For example, in Apache, you can edit your configuration file (httpd.conf, located in the "apache/conf/" directory) by adding the following statement:
ErrorDocument 404 /cache.php
This statement assigns responsibility for handling a 404 error to the cache.php script. Apache will call this script when a file is not found in place of the default "Page Not Found" page.
Warning: Always save a copy of the original configuration file before changing it. If you unintentionally corrupt it, you will be able to fall back on the original file.
Also, add the cache.php script to your system before applying the change. Otherwise, the 404 handler will not find the cache.php script, which leads to another 404 error, and so on, resulting in an endless loop. (Actually, Apache handles that case by issuing a 500 error, but you might run into a server or version that does not handle it properly.)
The major benefit of caching by way of the 404 error is that it happens both automatically and on demand, provided you have already changed the links to point to the HTML files. The absence of a linked HTML file triggers the 404 error, prompting cache.php to create the file.
The first visitor to a file, however, activates the slower, dynamic script by way of the caching script. When cache.php is triggered, it determines the URL of the original dynamic script and reads its HTML output. It displays this output to the visitor and then saves it into the new static HTML file.
cache.php is only triggered for the first visitor. Once the static HTML file has been created, the link becomes valid and the 404 error is no longer generated on subsequent requests. However, if the data behind the dynamic script changes (e.g. someone updates the article), you can simply remove the cached .html file, so that the 404 error is triggered once more.
Note: Continuing with our earlier example, if you change show_article.php itself in a way that alters its HTML output, you will want to "clean" out your cache, meaning delete all of the files under the "show_article/" directory. Your cache will then (eventually) be refreshed with the new HTML.
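
Cleaning out one script's cache directory can itself be scripted. The following is a minimal sketch, assuming the cached files sit directly inside that directory; clear_cache_dir() is a hypothetical helper and not part of cache.php, and the demonstration uses a throwaway temporary directory rather than a real cache.

```php
<?php
// Hypothetical maintenance helper (not part of cache.php): deletes every
// cached .html file directly inside one script's cache directory and
// returns how many files were removed.
function clear_cache_dir(string $dir): int
{
    $removed = 0;
    // glob() may return false on error, so fall back to an empty list.
    $files = glob(rtrim($dir, '/') . '/*.html') ?: [];
    foreach ($files as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}

// Demonstration in a throwaway directory.
$dir = sys_get_temp_dir() . '/cache_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/id_123.html', '<html>old</html>');
file_put_contents($dir . '/id_124.html', '<html>old</html>');

$count = clear_cache_dir($dir);
echo "removed $count file(s)\n";
rmdir($dir);
```

After the files are gone, the next request for each page triggers the 404 handler again and the cache refills with fresh HTML.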
Tip: If you do not want to cache a certain file, simply leave the (original) link to the dynamic file as is (meaning don’t define an HTML link for that file).
cache.php Details
The caching script (cache.php) receives the location of the (non-existent) static file via $REQUEST_URI, and its purpose is ultimately to generate this static file.
cache.php first determines the URL of the original dynamic script by parsing $REQUEST_URI with str_replace(). The resulting URL is stored in the $maker_URL variable.
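
To make the translation concrete, here is the same string handling pulled out into a stand-alone function. It is only a sketch: maker_url_from_cache_path() is a hypothetical name, the host is passed in explicitly, and "www.example.com" is a placeholder.

```php
<?php
// Hypothetical stand-alone version of the URL translation done in cache.php.
// The host name is passed in; "www.example.com" below is a placeholder.
function maker_url_from_cache_path(string $cachePath, string $host): string
{
    // "/cache/show_article/id_123.html" -> "/show_article/id_123"
    $path = str_replace('/cache/', '/', $cachePath);
    $path = str_replace('.html', '', $path);

    // Everything before the last "/" names the script;
    // everything after it encodes the query string.
    $lastSlash = strrpos($path, '/');
    $script = substr($path, 0, $lastSlash) . '.php';
    $query = substr($path, $lastSlash + 1);

    // "__" must be decoded before "_", otherwise each "__" would be
    // turned into "==".
    $query = str_replace('__', '&', $query);
    $query = str_replace('_', '=', $query);

    return 'http://' . $host . $script . '?' . $query;
}

echo maker_url_from_cache_path('/cache/show_article/id_123.html', 'www.example.com'), "\n";
// http://www.example.com/show_article.php?id=123
```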
The script then opens the dynamic script’s URL and reads its output. This is really quite simple, as PHP enables you to do so by using the fopen() function.
Note: fopen() can open a Web page as well as a file, provided PHP’s allow_url_fopen setting is enabled. You could read a page from your own site by entering your site’s URL (or "127.0.0.1", a reserved IP address that always points to your local machine).
In the magazine example, you would use:
$read = fopen ( "http://www.newspapersite.com/show_article.php?id=123" , "r" );
cache.php then reads from the dynamic script’s URL using fgets(), just as if it were a file. While reading the HTML, the script saves it all into a variable. You could display the output as it is being read (as cache.php does) or defer its display until the reading is done.
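
The reading loop works the same way for any stream handle. The sketch below uses an in-memory stream in place of a real URL so that it can be run anywhere; read_stream() is a hypothetical helper name, and this variant defers display until all the output has been collected.

```php
<?php
// Reads everything from an open stream, chunk by chunk, into one string.
// Works for any handle returned by fopen(), whether a URL or a local file.
function read_stream($handle): string
{
    $output = '';
    // Compare with false explicitly so that a line such as "0"
    // does not end the loop early.
    while (($line = fgets($handle, 256)) !== false) {
        $output .= $line;
    }
    return $output;
}

// Demonstration with an in-memory stream instead of a real URL.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "<html>\n<body>Article 123</body>\n</html>\n");
rewind($handle);

$html = read_stream($handle);
fclose($handle);

echo $html;
```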
Lastly, the script opens a local file in write mode to save the newly created HTML:
$write = fopen ( "cache/show_article/id_123.html" , "w" );
Note: cache.php does not save the static file until it has finished reading from the dynamic script. The saving operation is also quick, as the entire file is saved at once.
Implementation: Avoiding Common Pitfalls
The caching script provided here handles some common traps that might be encountered. I will describe them here in order to give you a better understanding of the script’s action and also help you avoid those pitfalls should you decide to write your own script.
The script’s action is triggered by visitors to the site, whose behavior you cannot predict. Therefore, create the static file only after you have generated all of the HTML, thereby saving it all at once.
Doing so prevents an incomplete file from being created when a first visitor views only part of a page and then moves on to another page. Remember, once a file is in the cache, the caching script will not be triggered again. As such, subsequent visitors would see the cached file even if only part of the actual HTML had been saved. cache.php minimizes this risk by collecting all of the HTML in a single string and only then writing it to the file with a single fwrite() call.
Two visitors might request the same file simultaneously. If the file has not yet been cached, two instances of the script may attempt to create the same cached file at the same time, which will probably lead to problems. To avoid this, cache.php calls flock() when creating the static file. This function locks the file, preventing another script from locking it until a second flock() call releases the lock.
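
A locked write might look like the sketch below. It deliberately differs from cache.php in two ways, both of which are assumptions made for this example: it opens the file in "c" mode and truncates it only after the lock is held (cache.php uses "w", which truncates on open), and it creates the cache sub-directory if it is missing. save_cache_file() is a hypothetical name.

```php
<?php
// A sketch of saving the cache file under an exclusive lock.
// Assumptions: the "c" open mode (create without truncating) and the
// recursive mkdir() are choices made for this example, not taken from
// cache.php.
function save_cache_file(string $path, string $html): bool
{
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }

    $handle = fopen($path, 'c');
    if ($handle === false) {
        return false;
    }

    // Non-blocking: if another request already holds the lock, give up
    // and let that request finish writing the file.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }

    // Only the lock holder empties and rewrites the file, in one write.
    ftruncate($handle, 0);
    $ok = fwrite($handle, $html) === strlen($html);
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);

    return $ok;
}

// Demonstration in a throwaway directory.
$path = sys_get_temp_dir() . '/cache_demo_' . uniqid() . '/show_article/id_123.html';
$saved = save_cache_file($path, '<html>Article 123</html>');
echo $saved ? "saved\n" : "not saved\n";
```

Note that flock() is advisory: it only protects against other scripts that also call flock() on the same file.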
What used to be a query string (e.g. "?id=123&x=1") now becomes a filename. Different operating systems have different file naming conventions. In cache.php, I decided that "=" and "&" will be converted to "_" and "__", respectively. If the need arises (for example, some of your scripts accept values that contain "_" in their query string), you can modify the script to use the convention most suitable for you.
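
The following sketch shows why a value containing "_" is a problem, and how a different convention avoids it. decode_query() is a hypothetical helper, and the "-" and "--" separators are only an example choice, not a recommendation from the original script.

```php
<?php
// Hypothetical decoder with configurable separators (cache.php hard-codes
// "_" for "=" and "__" for "&").
function decode_query(string $encoded, string $eq = '_', string $amp = '__'): string
{
    // Decode the longer "&" separator first so it is not mistaken
    // for two "=" separators.
    $decoded = str_replace($amp, '&', $encoded);
    return str_replace($eq, '=', $decoded);
}

// The default convention works for simple values...
echo decode_query('id_123__x_1'), "\n";                 // id=123&x=1

// ...but a value that itself contains "_" is decoded incorrectly:
echo decode_query('name_my_article'), "\n";             // name=my=article

// Separators that never occur in your values avoid this.
echo decode_query('name-my_article', '-', '--'), "\n";  // name=my_article
```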
Prepare for genuine 404 events. There might be times when the 404 handler is called because a request was made for a file that really does not exist, unrelated to the caching system. The caching script accounts for this by checking, with the file_exists() function, whether the dynamic script that should create the requested file can be found at all. If it cannot be found, the script displays a "Couldn’t find $REQUEST_URI" message and ceases execution. You might want to customize this message to meet your particular needs, or even add features such as sending an automatic email message (with the $REQUEST_URI value) to the Webmaster when a page is not found, so you can fix it later.
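
The check can be isolated as in the sketch below, which uses a throwaway directory as the document root. find_maker_script() is a hypothetical helper; it applies the same string handling as cache.php and returns null for a genuine 404.

```php
<?php
// Hypothetical helper: returns the filesystem path of the dynamic script
// behind a requested URL, or null when no such script exists
// (a genuine 404).
function find_maker_script(string $documentRoot, string $requestUri): ?string
{
    $path = str_replace('/cache/', '/', $requestUri);
    $path = str_replace('.html', '', $path);
    $lastSlash = strrpos($path, '/');
    $script = $documentRoot . substr($path, 0, $lastSlash) . '.php';

    return file_exists($script) ? $script : null;
}

// Demonstration with a throwaway document root.
$root = sys_get_temp_dir() . '/docroot_demo_' . uniqid();
mkdir($root);
file_put_contents($root . '/show_article.php', '');

$found = find_maker_script($root, '/cache/show_article/id_123.html');
$missing = find_maker_script($root, '/no/such/page.html');

echo $found === null ? "404\n" : "cache it\n";    // cache it
echo $missing === null ? "404\n" : "cache it\n";  // 404
```

In a real handler, the null case is where you would print your "not found" page and exit. Sending an explicit 404 status at that point, for example with http_response_code(404), is an addition that the original script does not make.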
Summary
Speed is an important factor in dynamic Web sites. In this article I described a method of increasing a site’s speed, the effect of which depends on the site’s reliance on dynamic scripts. The heavier a script is, the more speed you gain by turning it into a static page. In addition, static pages can be indexed by crawler programs, thereby making your site more accessible to new visitors.
There are, however, many different solutions for caching and speeding up dynamic Web pages. The one described here is one of the simplest, yet could be very useful in many cases. Since it works by creating a static HTML page instead of a dynamic page, it is as fast as you can get using a pure software solution.
The script is best set up so that it is triggered by the "404 Page Not Found" event, thus automating its action and minimizing its impact on the site. Moreover, the script requires that links inside the HTML to images, JS files and other files be written as absolute paths. Links to pages that you want to cache must be changed to point directly to the (resulting) static cache files, even if the static files have not yet been created, since cache.php creates them automatically on the first 404 event.
There are scripts that you might not want to cache. These include scripts that deal with processing forms, displaying rapidly changing data (such as stock prices) or time critical information. Under these circumstances, simply leave the link pointing to the dynamic script.
If your site contains a few small scripts, you may not need to bother with caching at all. On the other hand, if you rely on complex scripts whose data must always be fresh, you should use a much more sophisticated solution, such as the Zend Cache. But if you are somewhere in between, I hope this article will be of help to you. If you have any comments, please feel free to email me.
The Script
cache.php
Here is a simple version of the caching script. This version is intentionally simple and meant to be easily read. In practice, you can handle the exceptions more gracefully (show a nicer "Page Not Found" page, add "@" before file operations, etc.). You might also tailor it to your needs: instead of assuming that the creating script ends with ".php", for example, you could configure it to be ".php3", ".pl" or some other variation.
Yet this script does the job and can be used as it is.
<?php

// example caching script

// register_globals no longer exists in current PHP versions,
// so copy the server variables used below from $_SERVER
// ($_SERVER requires PHP 4.1.0 or later).
$REQUEST_URI = $_SERVER['REQUEST_URI'];
$DOCUMENT_ROOT = $_SERVER['DOCUMENT_ROOT'];
$HTTP_HOST = $_SERVER['HTTP_HOST'];

// get the static HTML file's location
$cache_file = $REQUEST_URI;

// find out the URL of the dynamic script
// which creates the static file.
$maker_URL = str_replace ( "/cache/" , "/" , $cache_file );
$maker_URL = str_replace ( ".html" , "" , $maker_URL );
$last_slash = strrpos ( $maker_URL , "/" );

// find out the creating script's name
// and make sure it exists.
$script = substr ( $maker_URL , 0 , $last_slash ) . ".php";
$find = $DOCUMENT_ROOT . $script;
if ( !file_exists ( $find )) {

    // if the file does not exist, show a
    // File Not Found error -
    echo ( "Couldn't find " . htmlspecialchars ( $REQUEST_URI ) );
    // you can put a nice page here...
    // but don't forget to exit !
    exit;
}

// now parse the query string
// here, "_" means "=" and "__" means "&"
// These rules are just personal preferences
$query_str = "?" . substr ( $maker_URL , $last_slash+1 );
$query_str = str_replace ( "__" , "&" , $query_str );
$query_str = str_replace ( "_" , "=" , $query_str );

// and now create the full maker_URL
$maker_URL = "http://" . $HTTP_HOST . $script . $query_str;

// open the maker script and read its output
$read = fopen ( $maker_URL , "r" );
if ( !$read ) {
    echo ( "Could not open $maker_URL" );
    exit;
}

$HTML_output = "";

// read the HTML output while displaying it
// (compare with false so that a line such as "0" does not end the loop)
while ( ( $line = fgets ( $read , 256 ) ) !== false ) {
    $HTML_output .= $line;
    echo $line;
}
fclose ( $read );

// finally, save the HTML output
// in a cache file.
$write = fopen ( $DOCUMENT_ROOT . $cache_file , "w" );

if ( !$write ) {

    // you might not have permission
    // to write in that directory.
    echo ( "could not open $cache_file for writing" );
    exit;
}

// lock the write file and
// write all the HTML into it
// (for PHP version <4.0.1 change LOCK_EX to 2)
if ( !flock ( $write , LOCK_EX | LOCK_NB )) {
    echo ( "could not lock $cache_file" );
    exit;
}

fwrite ( $write , $HTML_output , strlen ( $HTML_output ) );

// release lock. For PHP version <4.0.1
// change LOCK_UN to 3
flock ( $write , LOCK_UN );

fclose ( $write );
?>
php爱好者站 http://www.phpfans.net PHP|MySQL|javascript|ajax|html.
Introduction
The Caching Imperative
The script Caching Solution
The Caching script
Implementation: Avoiding Common Pitfalls
Summary
The script
About the Author
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Intended Audience
This article is intended for the PHP programmer interested in creating a static HTML cache of dynamic PHP scripts. The article has been written specifically for an Apache server running PHP scripts, but the ideas described here are applicable to almost any Web environment.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
The article assumes that you have some experience with creating dynamic Web sites and that you are familiar with HTTP – at least enough to know what a "404 Page Not Found" error means and the definition of the environment variables $REQUEST_URI and $DOCUMENT_ROOT.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Introduction
The benefits to using dynamic Web pages are well known, but there are nonetheless two significant drawbacks: speed and search engine accessibility.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Speed: The speed in which a user receives a page after clicking a link or entering a URL is a crucial factor for a Website. It depends on dozens of variables, some of which you may have control over and some of which you don’t. There are countless bottlenecks in the process, and it’s probably impossible to fix them all. This bottleneck we will tackle here is the one caused by waiting for the server side scripts to create the HTML output.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Search Engine Accessibility: By this I mean the ability of search engines to point to a particular Web page. Most search engines function by using a "Crawler" program. Crawler programs begin on a certain page and navigate through the links on it. Every page a crawler visits is then indexed on the search engine’s database.
Most crawlers, however, are only programmed to navigate through static (HTML) pages – not dynamic ones. So, for example, pages with URLs that contain a "?" character (indicating a query string) or a filename ending with ".php" will not be accessed. Consequently, crawlers will not index these pages, making your site less accessible to new visitors.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Note: A crawler cannot tell the difference between an HTML file’s output and a PHP file’s. They both send the same content type. Therefore, most crawlers simply decide according to the filename and/or if there is a query string in the URL – that is, if the URL contains a "?".
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
This article discusses a procedure for dealing with both of these drawbacks. The article’s script should be sufficient for use under most circumstances – but in particular, small scale Web sites and individual script pages that are only moderately subject to change (dynamics).
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
The Caching Imperative
Simply speaking, caching entails storing the output of one or more dynamic scripts into static HTML files. A visitor to your site would be directed to these HTML files rather than to their original dynamic versions.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
The mechanism for doing so can be described using a Magazine’s Web site as an example.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
A Magazine’s Web site would likely have a database that contained numerous articles and stories. You would normally have a script (say "show_article.php") that:
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Receives an article ID number
Reads the article’s content from the database
Puts it into some kind of HTML template
Formats the whole page with navigation links etc...
Sends the resulting HTML to the visitor’s browser
As such, in the site’s homepage you might have links to current articles coded as follows:
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
<a href="show_article.php?id=123">Cache Article</a>
Now, articles tend to be static and you would hope that the site was operating under heavy request loads (because it’s popular!!). Consequently, requests for each article would undergo extensive processing – meaning access database, search article, and display it.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Moreover, when you depend on other database information such as layout specifications, then the process would take even longer. Lastly, a search engine’s crawler would not even index the content of your article(s) because the link to the article page contains a "?" and a ".php" extension, and thus the crawler would not follow it.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Therefore, to alleviate these problems a Webmaster should at least consider implementing some form of caching system.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
When You Should Cache a script
While the caching solution presented in this article will be beneficial to many users, there will be circumstances when you will prefer not to cache your scripts at all or use a different caching method.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
scripts that must deal with frequently changing data such as stock values, discussion forums or process forms are not fit for the system described in this article. Under these cases, the decision is up to you – you might decide to leave them dynamic or you might opt for a more advanced solution such as using the Zend Cache.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Note: Using the Zend Cache for your site caching needs would render the system described in this article totally unnecessary (though you might still want to read it in order to improve your PHP skills !). The Zend Cache provides you with a complete turnkey caching solution. For a complex site I would advise buying it (and I’m not just saying this because this is Zend’s site but because the application is both easier to maintain and is well supported.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
On the other hand, if your site only features a few basic scripts, then you probably do not need to bother with caching at all.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Nonetheless, if you:
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Feature (at least relatively) complex scripts on your site,
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Wish to be able to handle numerous page hits,
and/or
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
Cannot afford the cost of a commercial caching solution,
then I hope this caching mechanism will serve you well.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
For pages that do not need to be kept up to the minute, the speed of this system cannot be beaten since it creates pure static HTML pages.
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
The script Caching Solution
The standard caching system solution is to generate static HTML files. From the earlier example, then, the link to the cache article will now be coded as follows:
ww w.china it power.coGr9ewybGYzJ5IKoJxjkKXBsZB
<a href="/cache/show_article/id_123.html">Cache Article</a>
id_123.html contains the output generated by the show_article.php script when it is called using id=123.
It is good practice to store all of the cached files under a single directory of their own (in the above example, the "/cache" directory), with one sub-directory for each dynamic script that generates them (e.g. the "show_article/" directory).
In this manner, the cached files are separated from the dynamic scripts, making the site easier to maintain: for example, you can easily delete old cached files generated by a certain script. More importantly, however, it simplifies cache.php’s string replacement mechanism. For more details, refer to the "cache.php Details" section.
Be aware that links to your dynamic pages will need to be switched to point to their respective static HTML files (the cached output).
So, if you want article #123 to be cached, for example, you simply change the link from "show_article.php?id=123" to "cache/show_article/id_123.html".
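Rewriting links by hand gets tedious once a script takes several parameters. The following is a minimal sketch of a link builder, not part of the original cache.php: the function name cache_link() is hypothetical, and it assumes that the dynamic scripts live in the document root and that the naming convention is the one used later in this article ("=" becomes "_", "&" becomes "__").

```php
<?php
// Hypothetical helper (not part of the original cache.php): builds the
// static cache link for a dynamic script and its query parameters,
// using the article's convention "=" -> "_" and "&" -> "__".
// Assumes the script sits directly in the document root.
function cache_link($script, array $params)
{
    // "show_article.php" -> "show_article"
    $dir = basename($script, ".php");

    $pairs = array();
    foreach ($params as $name => $value) {
        $pairs[] = $name . "_" . $value;
    }

    return "/cache/" . $dir . "/" . implode("__", $pairs) . ".html";
}

echo cache_link("show_article.php", array("id" => 123)), "\n";
// prints /cache/show_article/id_123.html
```

Parameter names or values that themselves contain "_" would not survive the round trip under this convention, so such scripts need a different separator.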
Note: The HTML files do not have to exist before you assign these new links. A page is not cached until it has been requested from the server for the first time.
Furthermore, since the HTML files will reside under a different URL, any relative paths within those files (e.g. "images/art.gif") will need to be corrected. Therefore, consider working with absolute paths such as "http://www.myserver.com/path/to/images/art.gif" or "/path/to/images/art.gif" (note the leading "/", meaning relative to the root of the current server).
Alternatively, you can add a <BASE HREF="http://www.myserver.com/"> tag to your HTML <head> section.
Note: It is NOT recommended that you change the paths to relative paths from the cache directory (such as "../../path/to/images/art.gif"). This is because the whole point of this caching system is that files may or may not be cached according to your preferences. You will want to have the links working whether the HTML is read from a cached file (under the /cache/ directory) or from the dynamic script (in some other directory); absolute URLs guarantee this.
The Caching script
Central to the caching system is the caching script itself (cache.php). It reads the dynamic script by calling fopen(<dynamic script URL>), as if it were a browser. It displays the resulting output to the user and then saves it to a static HTML file.
cache.php, itself, only uses basic PHP. It can also function independently of any other script. Consequently, you do not need to modify any existing scripts in order to implement script caching.
Activating the Caching script
The recommended method for activating cache.php is by way of the "404 Page Not Found" event, thereby automating its execution and minimizing its impact on the site.
The "404 Page Not Found" error informs the visitor that the server could not find his/her requested page. Most of the time a standard "Page Not Found" page is displayed. However, since most Web servers enable you to customize your error pages, you can call the cache.php script when a file is not found in place of displaying the default "Page Not Found" page.
For example, in Apache, you can edit your configuration file (httpd.conf, located in the "apache/conf/" directory) by adding the following statement:
ErrorDocument 404 /cache.php
This statement assigns responsibility for handling a 404 error to the cache.php script. Apache will call this script when a file is not found in place of the default "Page Not Found" page.
Warning: Be sure to save a copy of the original configuration file before changing it, as you should with any configuration file. If you unintentionally corrupt it, you will always be able to revert to the original.
Secondly, add the cache.php script to your system before applying the change. Otherwise, the 404 handler will not find the cache.php script, which leads to another 404 error and so on, resulting in an endless loop. (Actually, Apache handles that case by issuing a 500 error, but you might run into a server or version that does not handle it properly.)
The major benefit of caching by way of the 404 error is that it happens both automatically and on demand, provided you have initially defined the links to the HTML files. The absence of a "linked" HTML file triggers the 404 error, prompting cache.php to create the file.
The first visitor to a file, however, activates the slower, dynamic script by way of the caching script. When cache.php is triggered, it determines the URL of the original dynamic script and fetches its HTML output. It displays this output to the visitor and then saves it into the new static HTML file.
cache.php is only triggered for the first visitor. Once the static HTML file has been created, the defined link becomes valid and the 404 error is no longer generated upon subsequent requests. However, if the data behind the dynamic script changes (e.g. someone updates the article), you can simply remove the cached .html file, so that the 404 error is triggered once more.
Note: Continuing with our earlier example, if you change show_article.php itself in a way that alters its HTML output, you will want to "clean" out your cache, that is, delete all of the files under the "show_article/" directory. Your cache will then (eventually) be refreshed with the new HTML.
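Cleaning out the cache for one script can itself be scripted. The sketch below is an illustration only: the function name clean_cache() and its parameters are hypothetical, and it assumes the directory layout described earlier (one sub-directory per dynamic script under a common cache directory).

```php
<?php
// Hypothetical helper: deletes every cached .html file that was
// generated by one dynamic script, e.g. everything under
// "<cache root>/show_article/". Returns the number of files removed.
function clean_cache($cache_root, $script_dir)
{
    $removed = 0;
    $pattern = rtrim($cache_root, "/") . "/" . $script_dir . "/*.html";

    // glob() returns false on error; treat that as "nothing to delete"
    $files = glob($pattern);
    if ($files === false) {
        return 0;
    }

    foreach ($files as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }

    return $removed;
}

// Possible usage on a live site (the path is an assumption):
// clean_cache($_SERVER["DOCUMENT_ROOT"] . "/cache", "show_article");
```

Only files ending in ".html" are touched, so anything else you keep in that directory is left alone.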
Tip: If you do not want to cache a certain file, simply leave the (original) link to the dynamic file as is (meaning don’t define an HTML link for that file).
cache.php Details
The caching script (cache.php) receives the location of the (non-existent) static file via $REQUEST_URI, and its purpose is ultimately to generate this static file. ($REQUEST_URI is parsed using str_replace().)
cache.php first determines the original dynamic script’s URL using str_replace(). The resulting URL is stored in the $maker_URL variable.
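To make the string replacement concrete, here is a small worked sketch of the mapping from a requested cache URI back to the dynamic script and its query string. The function name maker_path() is hypothetical; the rules themselves (the "/cache/" prefix, "_" standing for "=", "__" standing for "&") are the ones this article uses.

```php
<?php
// Sketch of the mapping from a requested cache URI back to the dynamic
// script and its query string. The function name is hypothetical; the
// rules ("/cache/" prefix, "_" for "=", "__" for "&") follow the article.
function maker_path($cache_uri)
{
    // "/cache/show_article/id_123.html" -> "/show_article/id_123"
    $path = str_replace("/cache/", "/", $cache_uri);
    $path = str_replace(".html", "", $path);

    // everything before the last "/" names the script,
    // everything after it encodes the query string
    $last_slash = strrpos($path, "/");
    $script = substr($path, 0, $last_slash) . ".php";
    $query  = substr($path, $last_slash + 1);

    // decode "__" first, otherwise it would be read as two "_"
    $query = str_replace("__", "&", $query);
    $query = str_replace("_", "=", $query);

    return $script . "?" . $query;
}

echo maker_path("/cache/show_article/id_123.html"), "\n";
// prints /show_article.php?id=123
```

Because only the part after the last "/" is decoded, underscores in the script name itself (as in "show_article") are left untouched.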
The script then opens the dynamic script’s URL and reads its output. This is really quite simple as PHP enables you to do so by using the fopen() function.
Note: fopen() can open a Web page as well as a file. You could read a page from your own site by entering your site’s URL (or "127.0.0.1" which is a reserved IP address that will always point to your local machine).
In the magazine example, you would use:
$read = fopen ( "http://www.newspapersite.com/show_article.php?id=123","r" );
cache.php then reads from the dynamic script’s URL using fgets(), just as if it were a file. While reading the HTML, the script saves it all into a variable. You could display the output on the screen as it is being read (as cache.php does) or simply defer its display until the reading has been done.
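The read loop can be sketched as a small function that collects everything into one string and leaves the display for later. The name read_all() is hypothetical. Since fopen() treats URLs and local files alike, the same code works for both; note that opening a URL this way only works when PHP’s allow_url_fopen setting is enabled.

```php
<?php
// Hypothetical helper: reads everything from a URL or a local file
// into one string. Returns false if the source cannot be opened.
function read_all($source)
{
    $handle = @fopen($source, "r");
    if (!$handle) {
        return false;
    }

    $output = "";
    // compare against false explicitly so that a final line
    // containing only "0" does not end the loop early
    while (($line = fgets($handle, 4096)) !== false) {
        $output .= $line;
    }
    fclose($handle);

    return $output;
}

// Possible usage with the example URL from this article:
// $html = read_all("http://www.newspapersite.com/show_article.php?id=123");
```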
Lastly, the script opens a local file to save the newly created HTML:
$write = fopen ( "cache/show_article/id_123.html","w" );
Note: cache.php does not save the static file until it has finished reading from the dynamic script. The saving operation is also quick – as the entire file is saved at once.
Implementation: Avoiding Common Pitfalls
The caching script provided here handles some common traps that might be encountered. I will describe them here in order to give you a better understanding of the script’s action and also help you avoid those pitfalls should you decide to write your own script.
Visitors to the site whose behavior you cannot predict trigger the script’s action. Therefore, create the static file only after you have generated all of the HTML, thereby saving it all at once.
Doing so prevents an incomplete file from being created when a first visitor views only part of a page and then moves on to another page. Remember, once a file is in the cache, the caching script will not be triggered again. As such, subsequent visitors will see the cached file, even if only part of the actual HTML was saved. cache.php minimizes this risk by writing to the file only once all of the HTML has been collected in a single string, using a single fwrite() call.
Two visitors might request the same file simultaneously. If the file was not yet cached, it might mean that both of the scripts will attempt to create the same cached file simultaneously. This will probably lead to problems. To avoid it, cache.php employs flock() when creating the static file. This command locks a file, preventing another script from accessing it until another flock() is issued to unlock the file.
What used to be a query string (e.g. "?id=123&x=1"), now becomes a filename. Different Operating Systems have different file naming conventions. In cache.php, I decided that "=" and "&" will be converted to "_" and "__", respectively. If the need arises (for example, some of your scripts accept strings which contain "_" in their query string), you can modify the script to reflect the convention most suitable for you.
Prepare for genuine 404 events. There might be times when the 404 handler is called because a request was made for a file which really does not exist and has nothing to do with the caching system. The caching script accounts for this by checking whether the dynamic script that should generate the requested file can be found at all. This is done using the file_exists() function. If the script cannot be found, cache.php displays a "Couldn’t find $REQUEST_URI" message and ceases execution. You might want to customize this message to meet your particular needs, or even add features such as sending an automatic email message to the Webmaster when a page is not found (with the $REQUEST_URI value), so you can fix it later.
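The pitfalls about partial files and simultaneous requests both come down to how the static file is written. The following sketch shows one way to combine a single write with flock(). It is a variation on what cache.php does, not a copy of it: the function name save_cache_file() is hypothetical, and it opens the file in "c" mode (available in newer PHP versions) instead of "w", so that an existing file is not truncated before the lock has been obtained.

```php
<?php
// Hypothetical helper: saves $html to $path in one locked write.
// Returns true on success, false if the file could not be opened
// or another process already holds the lock.
function save_cache_file($path, $html)
{
    // "c" opens for writing without truncating, so an existing file is
    // left untouched until the lock has actually been obtained
    $handle = @fopen($path, "c");
    if (!$handle) {
        return false;
    }

    // non-blocking exclusive lock: give up if someone else is writing
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }

    ftruncate($handle, 0);
    $ok = fwrite($handle, $html) === strlen($html);
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);

    return $ok;
}
```

If the lock cannot be obtained, the visitor has already been shown the page, so simply skipping the save and letting the other request finish it is a reasonable choice.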
Summary
Speed is an important factor in dynamic Websites. In this article I described a method of increasing a site’s speed, the effect of which depends on the site’s reliance on dynamic scripts. The heavier a script is, the more speed you gain by turning it into a static page. In addition, static pages can be indexed by crawler programs, thereby making your site more accessible to new visitors.
There are, however, many different solutions for caching and speeding up dynamic Web pages. The one described here is one of the simplest, yet could be very useful in many cases. Since it works by creating a static HTML page instead of a dynamic page, it is as fast as you can get using a pure software solution.
The script is best set up so it’s triggered by the "404 Page Not Found" event, thus automating its action and minimizing its impact on the site. Moreover, the script requires that links inside the HTML to images, JS files and other files be defined as absolute paths. Links to pages that you want to cache must be changed to point directly to the (resulting) static cache files, even if the static files have not yet been created, as the 404 event triggers cache.php to create them automatically.
There are scripts that you might not want to cache. These include scripts that deal with processing forms, displaying rapidly changing data (such as stock prices) or time critical information. Under these circumstances, simply leave the link pointing to the dynamic script.
If your site contains a few small scripts, you may not need to bother with caching at all. On the other hand, if you rely on complex scripts and fresh data, you should use a much more sophisticated solution, such as the Zend Cache. But if you are somewhere in between, I hope this article will be of help to you. If you have any comments, please feel free to email me.
The script
cache.php
Here is a simple version of the caching script. It is intentionally simple and meant to be easily read. In practice, you can handle the exceptions more gracefully (a nicer "Page Not Found" page, "@" before file operations, etc.). You might also tailor it to your needs: instead of assuming the creating script ends with ".php", for example, you could configure it to be ".php3", ".pl" or some other variation.
Yet this script does the job and can be used as it is.
<?php
// example caching script

// register_globals is gone in current PHP versions, so read the
// server variables used below from $_SERVER explicitly
$REQUEST_URI   = $_SERVER["REQUEST_URI"];
$DOCUMENT_ROOT = $_SERVER["DOCUMENT_ROOT"];
$HTTP_HOST     = $_SERVER["HTTP_HOST"];

// get the static HTML file's location
$cache_file = $REQUEST_URI;

// find out the URL of the dynamic script
// which creates the static file.
$maker_URL = str_replace ( "/cache/" , "/" , $cache_file );
$maker_URL = str_replace ( ".html" , "" , $maker_URL );
$last_slash = strrpos ( $maker_URL , "/" );

// find out the creating script's name
// and make sure it exists.
$script = substr ( $maker_URL , 0 , $last_slash ) . ".php";
$find = $DOCUMENT_ROOT . $script;
if ( !file_exists ( $find )) {
    // if the file does not exist, show a
    // File Not Found error (the URI is escaped
    // because it comes straight from the request)
    echo ( "Couldn't find " . htmlspecialchars ( $REQUEST_URI ) );
    // you can put a nice page here...
    // but don't forget to exit !
    exit;
}

// now parse the query string
// here, "_" means "=" and "__" means "&"
// These rules are just personal preferences
$query_str = "?" . substr ( $maker_URL , $last_slash+1 );
$query_str = str_replace ( "__" , "&" , $query_str );
$query_str = str_replace ( "_" , "=" , $query_str );

// and now create the full maker_URL
$maker_URL = "http://" . $HTTP_HOST . $script . $query_str;

// open the maker script and read its output
$read = fopen ( $maker_URL , "r" );
if ( !$read ) {
    echo ( "Could not open $maker_URL" );
    exit;
}

$HTML_output = "";

// read the HTML output while displaying it
while ( ( $line = fgets ( $read , 256 ) ) !== false ) {
    $HTML_output .= $line;
    echo $line;
}
fclose ( $read );

// finally, save the HTML output
// in a cache file.
$write = fopen ( $DOCUMENT_ROOT . $cache_file , "w" );
if ( !$write ) {
    // you might not have permission
    // to write in that directory.
    echo ( "could not open $cache_file for writing" );
    exit;
}

// lock the write file and
// write all the HTML into it
// (for PHP version <4.0.1 change LOCK_EX to 2)
if ( !flock ( $write , LOCK_EX | LOCK_NB )) {
    echo ( "could not lock $cache_file" );
    exit;
}

fwrite ( $write , $HTML_output , strlen ( $HTML_output ) );

// release lock. For PHP version <4.0.1
// change LOCK_UN to 3
flock ( $write , LOCK_UN );
fclose ( $write );
?>