{kun´ēzē}
 
06Dec2016

To cache or not to cache?

Updated: 07 December 2016

What is web caching?


There are many myths about caching (for example, some people believe that pages served over HTTPS are never cached), and there is a lot of ignorance about how to use caching effectively. This article doesn’t have all the answers, but it may help readers learn about web caching and the benefits, costs and risks associated with how they use it.

A web cache (or HTTP cache) is an information technology for the temporary storage of web documents, such as HTML pages and images, to reduce bandwidth usage, server load and perceived lag.[1] Fetching web content over the internet is both slow and expensive, and in today’s fast-paced, time-constrained world most people’s attention span barely survives one or two seconds, so one of the main purposes of caching is to improve the user experience. Large responses can require many round trips between the client and the server, which delays when they are available and when the browser can process them, and they also incur data costs for the visitor. In other words, caching helps reduce the delay between the moment the user clicks a mouse, presses a key or taps a screen and the moment an “event” (such as displaying a new web page) occurs.
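To make “temporary storage” concrete, here is a minimal Python sketch of an in-memory cache keyed by URL, in which every entry expires after a fixed time-to-live. The names TTLCache, fetch and origin are hypothetical, and the single fixed lifetime is a simplifying assumption: real HTTP caches take each lifetime from the response’s own headers.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache: each stored response expires after ttl seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._store.get(url)
        if entry is None:
            return None  # miss: never fetched
        stored_at, body = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[url]  # expired: treat as a miss
            return None
        return body  # hit: no round trip needed

    def put(self, url: str, body: bytes) -> None:
        self._store[url] = (self.clock(), body)


def fetch(url: str, cache: TTLCache, origin: Callable[[str], bytes]) -> bytes:
    """Serve from the cache when possible, otherwise call the (slow) origin."""
    body = cache.get(url)
    if body is None:
        body = origin(url)
        cache.put(url, body)
    return body
```

The first call pays for the round trip to the origin; repeat calls within the time-to-live are answered locally, which is where the bandwidth, load and latency savings come from.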

The ability to cache and reuse previously fetched resources is a critical part of optimising for performance. Used wrongly, however, caching can be counter-productive, as we shall discuss.

Types of caches

For most people, caching happens in two main places: in the web browser and on the web server. Some people also use “intermediate” caches, i.e. caches that sit between the server and the browser; these auxiliary caches are usually called proxy caches. Content Distribution Networks (CDNs), such as Cloudflare, are a form of proxy caching.

Perhaps the easiest way to explain which types of caching are involved in the traffic between the client and the server of a Content Management System (CMS) is to look first at that traffic without any caching. At its most basic level, the client issues a request for information to the web application; the web application obtains the data from its database and bundles the output back to the client; the client then receives the data bundle, unpacks it and displays it on an output device. A diagram of the traffic flow appears below.

Data flow between the web client and server in a CMS
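The uncached flow can be modelled in a few lines of Python. Everything here (the DATABASE dictionary, query_database, handle_request, client_visit) is a hypothetical stand-in for a real CMS; the point is only that every visit, even a repeat visit for the same article, goes all the way to the database.

```python
from typing import Dict, List

# Hypothetical stand-in for the CMS database: article id -> article title.
DATABASE: Dict[int, str] = {1: "To cache or not to cache?"}
queries: List[int] = []  # records every database round trip


def query_database(article_id: int) -> str:
    queries.append(article_id)
    return DATABASE[article_id]


def handle_request(article_id: int) -> str:
    """Web application: obtain the data and bundle it into an HTML response."""
    title = query_database(article_id)
    return f"<html><body><h1>{title}</h1></body></html>"


def client_visit(article_id: int) -> str:
    """Client: request the page and receive the bundle to display."""
    return handle_request(article_id)
```

Calling client_visit(1) twice records two database queries; each of the caches described in the following sections exists to avoid some of that repeated work.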

Client browser cache

Web browser client settings (Google Chrome)

Client browsers, such as Google Chrome, Mozilla Firefox and Internet Explorer, store copies of the resources you fetch on the local hard drive for future access, typically in a directory such as “Temporary Internet Files”, where most of the objects associated with the web pages you have visited are kept. For example, when you click the Back button on your desktop browser, you are most likely looking at a locally cached version of the page rather than contents that may have changed in the meantime. The advantage is obvious: the page appears to load faster. The disadvantage is that if the content has changed, you will not see the updates while you are viewing the cached version. Of course, users can clear their browser cache or change its settings at any time, as shown in the example Google Chrome dialog box in the accompanying image.
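What a browser keeps, and for how long, is largely controlled by the Cache-Control header the server sends with each response. The Python sketch below shows a deliberately simplified version of the freshness decision, assuming only the max-age, no-cache and no-store directives matter; it ignores the Expires and Age headers, heuristic freshness and revalidation, so treat it as an illustration rather than any browser’s actual algorithm. The function names are hypothetical.

```python
import re
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age value (in seconds) from a Cache-Control header, if any."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: float) -> bool:
    """Simplified rule: may the browser reuse its stored copy without asking the server?

    no-store means nothing is kept; no-cache means the copy must be revalidated first.
    Otherwise the copy is reusable only while it is younger than max-age.
    """
    directives = cache_control.lower()
    if "no-store" in directives or "no-cache" in directives:
        return False
    lifetime = max_age(cache_control)
    return lifetime is not None and age_seconds < lifetime
```

A page sent with "max-age=3600" can be redisplayed from the local cache for an hour, which is one reason the Back button can show content that has since changed on the server.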

Proxy server cache

Proxy server caching works on the same principles as the client browser cache, but on a larger scale, serving many users simultaneously. Typically the most requested URLs are stored along with their associated objects and are then served from the cache whenever any client on the network requests them. Often, these web proxy servers are set up on the organisation’s network firewall or as part of a larger firewall solution. They may also use technologies such as URL blocking and associated “black lists” (or “white lists”) to restrict or allow certain categories of websites or specific URLs. A diagram of where a typical proxy server sits in relation to an organisation’s network is illustrated in the image below.


In the example above, the proxy server sits in what is known as a DMZ, which in computer security terms is also called a perimeter network.
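The “most requested URLs” policy can be approximated in several ways. This Python sketch swaps in least-recently-used (LRU) eviction, a common choice but an assumption here, together with a simple URL block list; ProxyCache and its parameters are hypothetical and do not correspond to any particular proxy product.

```python
from collections import OrderedDict
from typing import Callable, Set


class ProxyCache:
    """Shared cache for many clients; evicts the least recently used URL when full."""

    def __init__(self, capacity: int, origin: Callable[[str], bytes], blocked: Set[str]):
        self.capacity = capacity
        self.origin = origin      # fetches a URL from the real web server
        self.blocked = blocked    # hypothetical "black list" of URLs
        self._store: "OrderedDict[str, bytes]" = OrderedDict()

    def request(self, client: str, url: str) -> bytes:
        # client is unused: entries are keyed on URL alone, so all clients share them.
        if url in self.blocked:
            raise PermissionError(f"{url} is blocked by proxy policy")
        if url in self._store:
            self._store.move_to_end(url)  # mark as recently used
            return self._store[url]
        body = self.origin(url)
        self._store[url] = body
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # drop the least recently used entry
        return body
```

When one user fetches a page, a colleague requesting the same URL a moment later is served from the proxy without another trip to the internet.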

Server-side cache

A server-side cache reduces the load on the web server by storing a copy of dynamically generated pages on the server itself. Serving a page from this cache saves the time needed to generate a fresh page on the fly. However, if the data that makes up the page has changed, the copy served from the cache will be out of date until the cache is refreshed.
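The freshness problem is easiest to see in a small sketch. This hypothetical PageCache (not Joomla’s own caching API) stores rendered pages and keeps serving them until something calls invalidate(); the render function stands in for the database queries and templating a CMS would perform.

```python
from typing import Callable, Dict


class PageCache:
    """Server-side cache of rendered pages, keyed by page id."""

    def __init__(self, render: Callable[[str], str]):
        self.render = render  # expensive step: database queries plus templating
        self._pages: Dict[str, str] = {}

    def get(self, page_id: str) -> str:
        if page_id not in self._pages:
            self._pages[page_id] = self.render(page_id)
        return self._pages[page_id]

    def invalidate(self, page_id: str) -> None:
        """Call whenever the underlying data changes, or visitors keep seeing the old page."""
        self._pages.pop(page_id, None)
```

Forgetting the invalidate() call when content is edited is how a server-side cache ends up serving stale pages.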

Caching: benefits and disadvantages

The main benefit in using caches is to improve performance; the main disadvantage is that cached content may be stale, out-of-date and therefore possibly misleading.

Useful for:

Stable, “mature” websites.

Where performance is important.

Where information does not often change (e.g. images, articles, “blogs”).

Re-usable forms.

Not useful for:

Websites still under construction or in development.

Where current, up-to-date information is important.

Where information often changes (e.g. web-based forums, “breaking news”).

Forms that involve the transfer of sensitive, personal or financial data.
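The “useful for” and “not useful for” lists above map fairly naturally onto values of the HTTP Cache-Control response header. The Python mapping below is a sketch: the category names and the one-hour and one-year lifetimes are illustrative assumptions, not recommendations from any particular CMS.

```python
from typing import Dict

# Hypothetical mapping from kinds of content to Cache-Control header values.
CACHE_POLICIES: Dict[str, str] = {
    # Versioned images, scripts and stylesheets: safe to keep for a long time.
    "static_asset": "public, max-age=31536000, immutable",
    # Articles and blog posts: reuse for an hour, then fetch or revalidate.
    "article": "public, max-age=3600",
    # Forums, breaking news: may be stored, but must be revalidated on every use.
    "fast_changing": "no-cache",
    # Forms carrying sensitive, personal or financial data: never store.
    "sensitive": "no-store",
}


def cache_headers(kind: str) -> Dict[str, str]:
    """Return response headers for a kind of content; unknown kinds are not stored."""
    return {"Cache-Control": CACHE_POLICIES.get(kind, "no-store")}
```

Defaulting unknown content to no-store errs on the side of freshness and privacy at the cost of performance, which suits sites that are still under construction.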


Several tutorials can help you configure your website to control what is stored in caches; see, for example, https://www.mnot.net/cache_docs/. The main thing to keep in mind is that end users, the visitors to your website, will be unaware of what is stored in cache. Your task, as the webmaster, is to give your visitors the best user experience you can without compromising performance, security or the integrity of information, and while staying within your bandwidth limits.

Also, two items are “normally” cached by the web browser: favicons and Javascript libraries. It’s important to remember that if you update software on the server, upload a new favicon image or replace a file, you may not immediately see the change when you visit your website. One of the classic problems during website development or maintenance is discovering that the site no longer behaves the “same way” as it used to; quite often the simplest way to tell whether a functional problem is caused by stale, cached content is to view the site in a different web browser, or even on a different device or computer. Knowing that many problems (especially with Javascript) are caused by an old file remaining in a cache can save a lot of time when debugging web-based applications. If in doubt, clear the cache on the server, on any proxy servers you use, and in the browser.
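One widely used way to spare visitors the stale-file problem, rather than asking them to clear their caches, is “cache busting”: give each new version of a file a new URL, so browsers and proxies treat it as a different resource. A minimal Python sketch, assuming your templates reference the generated file names; the path media/js/app.js and the function name are hypothetical. This only helps for files whose URLs you control in your pages, not for URLs a browser requests on its own.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash into a file name, e.g. app.js -> app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name changes only when the file’s content changes, unchanged files keep their long-lived cache entries while an updated Javascript library or favicon is fetched fresh on the next visit.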

Caching is a double-edged tool: it can help, but it can also cause hours of unnecessary worry if you forget what is cached and where the cache is stored.

Notes:

[1]  https://en.wikipedia.org/wiki/Web_cache

About the author:

has worked in the information technology industry since 1971 and, since retiring from the workforce in 2007, is a website hobbyist specialising in Joomla, a former member of the Kunena project for more than 8 years and contributor on The Joomla Forum™. The opinions expressed in this article are entirely those of the author.

