Pascal Widdershoven - 9 December 2019
652 words in about 3 minutes
I recently ran into an issue where after deployment of an SPA (Single Page Application), a situation would occur where the page looked broken because CSS could not be loaded. During analysis of the issue I ran into a thing I had never heard of: Heuristic Freshness.
For context, the application is a typical SPA. The compiled application consists of a bunch of files looking something like this (simplified):
1 2 3 ├── index.html └── styles.4a3f9848037579025b00.css └── main.31f7dadf6d2b01fc08c7.js
The browser loads
index.html, which includes various CSS and JS files.
Caching related headers for the CSS and JS looked like this:
1 2 3 4 Date: Fri, 06 Dec 2019 13:09:03 GMT Etag: "31957a05a5df3c3b315b728b40b6e10e" Last-Modified: Mon, 02 Dec 2019 14:12:09 GMT Expires: Sat, 07 Dec 2019 03:09:03 GMT
1 2 3 Date: Fri, 06 Dec 2019 13:09:03 GMT Etag: "c4238385fe77f826b5584fed1f1f1659" Last-Modified: Tue, 03 Dec 2019 10:50:54 GMT
At first sight things looked okay, but then I noticed there weren’t any
Expires on the
I’m quite well aware what all these caching headers do when they are present, but I wasn’t sure what would happen if they are not present 🤔.
This brings us to
Heuristic Freshness. The HTTP specification defines that, when a server does not explicitly specify expiration times, the client (browser) can use heuristics to estimate a plausible expiration time itself.
How exactly this ‘plausible’ expiration time is determined is left up to the client, but it seems that in practise most browsers use the following algorithm:
(now() - Last-Modified) * 0.10. This means a couple of things:
- When you don’t have any
Expiresheaders, the browser calculates this plausible expiry time itself.
- Once your assets are cached by browsers, there’s no way for you to evict them from the cache.
- The files will be cached longer as time passes after deployment (assuming your
Last-Modifiedheaders reflect the time of the last deployment).
As you can see, this can result in some pretty nasty caching issues that are hard to diagnose as the duration for which files are cached will differ case by case depending on time and potentially browser used.
Cache-Control and Expires headers to the rescue!
The lesson I take away from this is that it’s crucial to set either
Expires, to ensure you control how long files can be cached by the browser.
For single page apps like I outlined above where you have an
index.html and a bunch of assets with hashed filenames, the following is a good, safe practise:
1 Cache-control: private, max-age=0, no-cache
This will ensure that the browser will never use a cached copy of your
index.html, without checking with the server if the cache is still valid via a conditional GET request.
For other assets with hashed filenames, you want the opposite:
1 Cache-Control: public, max-age=31557600
These files can be cached for a long, long time, as when they change their filenames will change as well.
With this in place the following will happen during and after a deployment:
- A new
index.htmlwill be uploaded to the server.
- When a user loads your application, the browser will send a conditional GET request
If-Modified-Since: <previous Last-Modified value>, and the server will respond with the new version of your
index.html, since the file was modified by the deployment. This happens because you’ve instructed the browser to always verify if the cached page can be used, using the
- The browser will no longer us the old cached page.
- The browser will store the new page in cache and will continue sending conditional GET requests in the future, to verify if the cached page can still be used.
For more information on HTTP caching (especially about what headers do when they are present) see: