It is recommended to implement caching in your web applications and websites. It gives a good performance boost: the user sees the page much sooner, and the server has less work to do.
It is actually quite simple to implement caching in your web application. About 10-15 lines of code are usually enough.
Please note that this tutorial mainly deals with caching between your web application and the user's browser. It does not take CDNs, proxies, load balancers etc. into account, and caching techniques may be (slightly) different for them.
Also note that this tutorial is focused on caching content emitted by server-side scripts (PHP, ASP etc.), which is usually HTML, JSON etc. It does not discuss caching CSS, JavaScript or image files.
HTTP Caching Headers
Caching is all about sending proper HTTP headers from the server. On the client side you don't need to do anything - the browser reads the HTTP headers and makes caching decisions based on them.
There are many HTTP headers, but while dealing with caching, only 4 are important :
- Cache-Control
- ETag
- Last-Modified
- Expires
The Last-Modified and Expires headers are defined in the older HTTP/1.0 specification. They have largely been superseded by the newer Cache-Control and ETag headers defined in the HTTP/1.1 specification. For modern browsers you just need the Cache-Control and ETag headers.
"Last-Modified" and "Expires" headers are briefly described in the tutorial. The major focus will be on "ETag" and "Cache-Control" headers.
Freshness and Validation
When the response arrives from the server, and the server has asked the browser to cache it, the browser stores the response in its cache. Note that the server can ask the browser to cache the content only for a definite period of time (say 120 seconds, 1 day or 6 months). During this time the cache is fresh: if that content is requested again, the browser loads it from the cache and no request is sent to the server.
Even after the cache has expired, the browser does not delete it immediately. It is better to first confirm with the server whether the cached copy is really out of date: if the content has not changed, why delete it?
So the browser first validates with the server whether the content has indeed changed. A request is sent to the server again. If the content has changed, the server sends the fresh content and the browser replaces the cached copy. Now the cache is fresh again, and the browser will use it until it expires again.
If the content has not changed, the server simply informs the browser of this, and the browser can keep using the previously stored content. Note that in this case the server does not send the whole response again; it only sends HTTP headers indicating that the content has not changed. The browser does not need to download the whole response, which improves efficiency considerably.
Freshness and validation are important concepts in HTTP caching. You must send HTTP headers indicating freshness and validity of content from your server-side script.
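The freshness and validation decision described above can be sketched in a few lines. This is only an illustrative model, not how any real browser is implemented; the names CachedResponse and next_step are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    body: str
    stored_at: float  # time the response was stored, in seconds
    max_age: int      # freshness lifetime granted by the server, in seconds


def next_step(entry: Optional[CachedResponse], now: float) -> str:
    """Decide what a browser-like cache does when the page is requested again."""
    if entry is None:
        return "fetch"        # nothing cached yet: download the full response
    if now - entry.stored_at < entry.max_age:
        return "use-cache"    # still fresh: no request is sent to the server
    return "validate"         # expired: ask the server whether the content changed


entry = CachedResponse(body="<html>...</html>", stored_at=1000.0, max_age=120)
print(next_step(entry, now=1060.0))  # use-cache (60 seconds old)
print(next_step(entry, now=1200.0))  # validate (200 seconds old)
```

A "validate" result corresponds to the request the browser sends to check whether its stored copy can still be used.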
HTTP caching headers are now explained in detail. All these are response headers sent by the server.
Cache-Control Header
Through the Cache-Control header we can specify both freshness and validation. There are a number of directives for this header :
"public" and "private"
The "public" directive indicates that the response can be stored by any cache system - browser, proxy, CDN etc. The "private" directive indicates that the response can be stored only by a system intended for a single user - for example the browser.
If you specify neither "public" nor "private" in the Cache-Control header, the response is generally treated as "public".
Example 1 : The below HTTP header specifies that the response can be cached only by a browser :
Cache-Control: private
Example 2 : The below HTTP header specifies that the response can be cached by any caching system - browser, proxy, CDN etc :
Cache-Control: public
The "no-cache" directive indicates that the response may be stored in the browser's cache, but it has to be validated with the server before every use. The browser sends a request to the server each time: if the content has changed it uses the new response, and if not it uses the stored copy. Although a request has to be sent every time, efficiency comes from the fact that the full response may not need to be downloaded every time.
Cache-Control: no-cache
The "no-store" directive disallows caching: the browser has to fetch the full response from the server each and every time. You can use this directive if you want the browser to NOT cache the content.
Cache-Control: no-store
The "max-age" directive specifies the time for which the content stays fresh in the cache. The value is in seconds, and is counted from the time the response was generated.
Example : The below HTTP header specifies that the response has to be cached, and it can be reused for the next 120 seconds.
Cache-Control: max-age=120
The "must-revalidate" directive indicates that once the cache has expired, it must not be used without validation from the server. In some cases a browser may serve content from the cache even after it has expired, for example when the server cannot be reached. This directive prevents such behaviour.
Cache-Control: must-revalidate
You can combine multiple directives in the Cache-Control header, depending on what you want.
Example : The below HTTP header specifies that the response can be cached only by a browser, for 1 hour and expired content should never be used.
Cache-Control: private, max-age=3600, must-revalidate
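As a rough sketch, a server-side script could assemble the Cache-Control value from the directives it wants. The helper name build_cache_control and its parameters are made up for this example; they are not part of any framework.

```python
from typing import Optional


def build_cache_control(
    visibility: Optional[str] = None,
    max_age: Optional[int] = None,
    no_cache: bool = False,
    no_store: bool = False,
    must_revalidate: bool = False,
) -> str:
    """Join the chosen directives into a Cache-Control header value."""
    if visibility not in (None, "public", "private"):
        raise ValueError("visibility must be 'public', 'private' or None")
    directives = []
    if visibility:
        directives.append(visibility)
    if no_store:
        directives.append("no-store")
    if no_cache:
        directives.append("no-cache")
    if max_age is not None:
        directives.append(f"max-age={max_age}")  # lifetime in seconds
    if must_revalidate:
        directives.append("must-revalidate")
    return ", ".join(directives)


value = build_cache_control("private", max_age=3600, must_revalidate=True)
print("Cache-Control: " + value)
# Cache-Control: private, max-age=3600, must-revalidate
```

How the resulting header is actually sent depends on your server-side language or framework.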
ETag Header
Through the ETag header we can specify validation.
The ETag header is basically a unique token for the content of your page. For example, the below HTTP header specifies that the unique token for the response is "12345qwert".
ETag: "12345qwert"
ETag is based on content, so whenever the content changes, the ETag should change as well.
Example : Assume that at some time the content of your page is "tree" and the ETag is set to "12ASQW". When the content of the page changes to "flower", the ETag has to change as well, so the server emits a different ETag header, say "45FGCS".
It is the responsibility of the server or the server-side script to generate the ETag header. It is up to you how you generate this header; however, it is most common to use the last modified timestamp or a hash of the content.
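For instance, a hash-based ETag could be generated as in the sketch below. The function name make_etag, the choice of SHA-256 and the truncation to 16 characters are all assumptions made for illustration; any scheme works as long as the token changes whenever the content changes.

```python
import hashlib


def make_etag(content: bytes) -> str:
    """Derive an ETag from the response body (hypothetical helper)."""
    digest = hashlib.sha256(content).hexdigest()[:16]  # shortened for readability
    return f'"{digest}"'  # the ETag value is sent wrapped in double quotes


tree_tag = make_etag(b"tree")
flower_tag = make_etag(b"flower")
print(tree_tag != flower_tag)          # True: different content, different ETag
print(tree_tag == make_etag(b"tree"))  # True: same content, same ETag
```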
How does the browser use the ETag header : On the first request, the browser caches the content and saves the value of the ETag header given by server. As long as the cache is fresh, content is served from it.
However, when the cache has expired, the browser sends an If-None-Match request header specifying the same value as the ETag header. The server calculates the ETag for the current content. If the calculated ETag is the same as If-None-Match, it sends a 304 Not Modified response indicating that the content has not changed. The browser reuses the cache and extends the expiry period as specified in the Cache-Control response header.
If the calculated ETag is different from If-None-Match, the server sends the full response. The browser deletes the old cached copy, and the new content is cached for the period specified in the Cache-Control response header.
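The server side of this exchange might look roughly like the following. It is a simplified sketch: respond and make_etag are hypothetical helpers, the 120-second lifetime is arbitrary, and a real If-None-Match header can also carry several ETags or weak validators, which this example ignores.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(content: bytes) -> str:
    """Hash-based ETag (assumed scheme, for illustration only)."""
    return '"' + hashlib.sha256(content).hexdigest()[:16] + '"'


def respond(content: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for the current content."""
    etag = make_etag(content)
    headers = {"ETag": etag, "Cache-Control": "private, max-age=120"}
    if if_none_match == etag:
        return 304, headers, b""   # not modified: headers only, no body
    return 200, headers, content   # first visit or changed content: full response


status, headers, body = respond(b"tree", None)
print(status)                                         # 200
print(respond(b"tree", headers["ETag"])[0])           # 304
print(respond(b"flower", headers["ETag"])[0])         # 200
```

The 304 response still carries the ETag and Cache-Control headers, so the browser knows how long the revalidated copy stays fresh.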
Once the cache expires again, the whole process is repeated.
Last-Modified Header
The Last-Modified header contains the date and time at which the content was last modified.
Last-Modified: Sat, 22 Jun 2019 07:30:00 GMT
This indicates that the response was last modified on 22nd June 2019, 7:30 AM GMT.
As mentioned earlier, Last-Modified was defined by an older HTTP specification. If you are only considering modern browsers, you can leave out this header. However, if you are also considering other caching systems like proxies and load balancers, it is best to include this header as well.
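If you do emit Last-Modified, the date must be in the HTTP date format shown above. A minimal sketch using Python's standard library follows; the helper name http_date is made up, and the timestamp is assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


last_modified = datetime(2019, 6, 22, 7, 30, 0, tzinfo=timezone.utc)
print("Last-Modified: " + http_date(last_modified))
# Last-Modified: Sat, 22 Jun 2019 07:30:00 GMT
```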
Expires Header
The Expires header contains the date and time after which the response is considered expired.
Expires: Fri, 22 Jun 2018 07:30:00 GMT
This indicates that the response is considered expired after 22nd June 2018, 7:30 AM GMT.
Like the Last-Modified header, you can leave out the Expires header for modern browsers.
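If you still need Expires for older caching systems, its value can be derived from the time the response is generated plus the intended lifetime. The sketch below assumes a hypothetical helper expires_header and a 120-second lifetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(generated_at: datetime, lifetime_seconds: int) -> str:
    """Return the Expires value for a response generated at the given (aware) time."""
    expiry = generated_at.astimezone(timezone.utc) + timedelta(seconds=lifetime_seconds)
    return format_datetime(expiry, usegmt=True)


generated_at = datetime(2018, 6, 22, 7, 28, 0, tzinfo=timezone.utc)
print("Expires: " + expires_header(generated_at, 120))
# Expires: Fri, 22 Jun 2018 07:30:00 GMT
```

This is the absolute-date counterpart of sending max-age=120 in Cache-Control.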
Conclusion
As you have seen, caching can be implemented through a combination of the Cache-Control and ETag HTTP response headers. Specifying only one of them won't be of much help; you need to specify both.
This tutorial was mostly focused on the theory of cache headers. See the next tutorial for some examples of caching. With both these tutorials, you should have a good understanding of HTTP caching.