Caching 2: cachefu strategy

With cachefu and varnish installed and a basic configuration in place, it is now time to tweak it. Only if the performance isn’t satisfactory, of course.

The default settings are quite OK and take care of some important quick wins, like resource registry caching, image caching and basic document/folder caching. With pretty safe defaults.

Cachefu setup is a matter of trade-offs. So this chapter is structured as “x or y” questions. “Do you want to use Etags or in-memory caching?” It should give you a feel for the trade-offs involved.

At the end I explain cachefu’s default strategy.

Always fresh or a cache of some sort?

Risk. Caching is a matter of risk. If every piece of content is delivered fresh, it is never out of date. By definition, caching means risk of serving outdated content. You can mitigate the risk by cache purging or Etags, btw.

Get permission. Many pages rarely change. A news item, once written, might only be changed when there’s an accidental typo. Is it bad if there’s a theoretical 10 minute delay before some users see the typo fixed? On the other hand, an expensive-to-render homepage might need an instant refresh to show new news items once they become available.

You get the best performance when plone’s work is minimized.

Action: take a look at the site and determine which items can be cached more aggressively. Cachefu’s defaults are on the safe side, so is there room for more aggressiveness? Are there public pages that are viewed a lot in a short time by the same user and that don’t change often? You could cache them in the browser. Or something like a homepage that you’re bound to visit a lot while clicking around in a site. If performance is tight, perhaps you could cache it for 5 minutes? You’re starting in on advanced tweakery here though.

Browser cache or proxy cache?

Cachefu can tell the proxy cache (varnish or squid) to cache content so that subsequent requests for the same item are served directly by the proxy cache instead of plone. Less work, happy plone.

Every request going to the server is still a request. With associated waiting/downloading time. You can improve the actual perceived speed of your website a whole lot by allowing the browser to cache items in the local browser cache. The typical plone page has a couple of css/javascript files, tens of images and the actual html. If you can reduce those 30-something calls to just one, your site feels much more speedy. And the load on the server is reduced.

The drawback is that you have no control over the browser-cached items apart from the information you send over the very first time (“you can store it for two hours”). A proxy cache, instead, can be purged. Purged manually from the command prompt (“aargh, I made a mistake, let’s clear the whole cache”) or purged automatically (“this news item has changed, let’s tell the proxy cache this item is out of date”).

What cachefu sensibly does: cache filesystem images and the resource registry’ed css/javascript files in the browser cache and also in the proxy cache. Pages are cached in the proxy cache.

Authenticated pages pose a restriction: they cannot logically be stored in the proxy cache, as otherwise unauthenticated users could access them. Only browser caching is possible.

Action: are there personalised pages that are requested often that can be “de-personalised”? A neutrally-named link to your personal settings instead of seeing the user’s name displayed there? Javascript to insert personalised content?

Etag or in proxy cache?

Some pages cannot be reliably cached in the proxy cache. They change too often or they must be always fresh. If they’re expensive to calculate but often visited, you get a dilemma. Etags can solve this dilemma.

An “Etag” is just a string that identifies the state of a page. Cachefu helps you create that string: you can put into it for instance:

  • Modification time.
  • Current user’s ID.
  • Last catalog modification time.
  • Current user’s roles.
  • And anything you want, returned by a script of your choosing. Just make sure it is a short string.

The string is sent to the browser, which returns it on subsequent requests. If the calculated string is still the same, a “304 not modified” is returned. The idea is that it is way cheaper to re-calculate the Etag than to calculate and return the whole page.
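A minimal sketch of that round trip in plain Python. This is not cachefu’s actual code: the function names, the hashing and the example components are assumptions for illustration, and real Etag headers are quoted and may carry several values.

```python
import hashlib


def build_etag(components):
    """Join the state components into one short, opaque string."""
    raw = "|".join(str(part) for part in components)
    # Hashing (an assumption, not cachefu's method) keeps the Etag short.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def respond(if_none_match, components, render_page):
    """Return (status, body); render only when the browser's copy is stale."""
    etag = build_etag(components)
    if if_none_match == etag:
        return 304, ""         # cheap: nothing is rendered
    return 200, render_page()  # expensive: the full page is rendered


# Hypothetical components: modification time, user roles, catalog counter.
state = ["2009-03-01T12:00", "Member", 4711]
status, body = respond(None, state, lambda: "<html>...</html>")
```

The first request carries no Etag, so the page is rendered. A later request that sends the same Etag back gets the 304 without any rendering, as long as none of the components changed.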

Etags are on a per-individual-browser basis. They’re a private hint passed between one browser and the server. So they’re great for authenticated users.

Tip: if your Etag setup is getting really complicated: why not create a browser view that handles it all for you in code? context/@@Etags/homepage could be a browser view’s method to return the Etags for the homepage. That way you can optimize the Etag at will.

Etags are valid for a limited time. If you’ve covered all the variables that make up your page, you could set this real high. If you set it to an hour, you’re guaranteed that the item is refreshed at least once an hour. Just a safety precaution.

Action: use Etags for pages that are often visited but that have to be absolutely fresh. For instance a homepage with a poll on it. If you’ve filled in the poll, the homepage should show the results instead of the poll form. So you put the poll’s vote status into the Etag.

Etag or in memory?

There’s a possible measure that makes Etags even more performant. An Etag’ed page still has to be calculated for every separate browser that requests it, even though the page might be the same as for the other browser. Cachefu has a memory cache in which it can store rendered pages hashed by Etag.

Note: the more specific the Etag, the fewer memory cache hits you’ll get. So if you can get by with putting the current user’s roles in the Etag instead of the current user’s ID, you’ve already won some performance.
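To make the roles-versus-ID point concrete, here is a toy model of a memory cache keyed by Etag. The class and the Etag strings are made up for illustration; cachefu’s own memory cache is more involved.

```python
class EtagPageCache:
    """Tiny in-memory store of rendered pages, keyed by (url, etag)."""

    def __init__(self):
        self._pages = {}
        self.hits = 0
        self.misses = 0

    def get_or_render(self, url, etag, render_page):
        key = (url, etag)
        if key in self._pages:
            self.hits += 1
        else:
            self.misses += 1
            self._pages[key] = render_page()
        return self._pages[key]


cache = EtagPageCache()
render = lambda: "<html>news</html>"
# Etag built from roles: two different members share one rendered copy.
cache.get_or_render("/news", "roles=Member", render)
cache.get_or_render("/news", "roles=Member", render)
# Etag built from user IDs: every user gets a separate copy.
cache.get_or_render("/news", "user=alice", render)
cache.get_or_render("/news", "user=bob", render)
```

Only the second role-based request is a cache hit. The two user-specific requests each render and store their own copy.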

Action: identify often requested pages that are valid for larger groups of users and that have to be absolutely fresh. Set them to “store in memory”. The result is the same as with the Etag setting with the additional in-plone-memory caching of the rendered results.

Action: do keep an eye on cachefu’s memory size settings. Make an estimate of the amount of cached copies and make sure the memory cache is used effectively.
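A back-of-the-envelope estimate could look like the sketch below. All numbers are invented for illustration, so plug in measurements from your own site.

```python
def estimate_cache_bytes(pages, variants_per_page, avg_page_bytes):
    """Worst case: every page is stored once per distinct Etag value."""
    return pages * variants_per_page * avg_page_bytes


# Hypothetical site: 200 memory-cached pages of about 30 kB each.
by_role = estimate_cache_bytes(pages=200, variants_per_page=4, avg_page_bytes=30000)
by_user = estimate_cache_bytes(pages=200, variants_per_page=500, avg_page_bytes=30000)

print(by_role / 1e6, "MB with roles in the Etag")
print(by_user / 1e6, "MB with user IDs in the Etag")
```

With these made-up numbers, four role combinations need about 24 MB, while 500 individual users would need about 3 GB for the same pages.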

Should we cache at all?

Some pages aren’t visited often so they might actually be simply served by plone itself. A blog with 500 entries has a lot of pages that are viewed just once a year.

Action: don’t overdo caching, restrict yourself to the high-yield items. Measure, then cache.

Authenticated usage, personalised content?

Public pages without personalisation are great for caching. The more personalised a page, the harder it is to get caching right. It might not be possible at all.

Action: if performance of the site could be a problem, watch out with personalising pages. Showing the name of a logged-in user in the page already means the page is not usable for others and thus not generally cacheable.

One-language or multiple languages?

Plone has lots of translations. Showing more than one language means that the proxy cache has to distinguish on the Accept-Language header. Every individual Accept-Language combination thus gets its own copy of a page in the proxy cache. With that many copies the hit rate drops, and you can easily pay an order of magnitude in performance.

If you enable just a few languages, you could figure out the used language in code and use that value in an Etag to salvage some performance. So just the used language “nl” instead of a full language combination “nl, en” next to “nl, de, en”. On some sites, this helps a lot, especially when the Etag is used for memory caching.
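Figuring out the used language can be sketched like this. The function is hypothetical (Plone has its own language negotiation) and the header parsing is deliberately simple, but it shows how different header combinations collapse into one Etag component.

```python
def used_language(accept_language, supported, default="en"):
    """Reduce an Accept-Language header to the one language actually used."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        code, _, params = item.strip().partition(";")
        code = code.strip().lower()
        quality = 1.0
        name, _, value = params.strip().partition("=")
        if name.strip() == "q":
            try:
                quality = float(value)
            except ValueError:
                quality = 0.0
        if code and quality > 0:
            # Sort by descending quality, then by position in the header.
            candidates.append((-quality, position, code))
    for _, _, code in sorted(candidates):
        base = code.split("-")[0]  # "nl-be" counts as "nl"
        if base in supported:
            return base
    return default


site_languages = {"nl", "en"}
first = used_language("nl, en", site_languages)
second = used_language("nl, de, en", site_languages)
```

Both headers end up as plain “nl”, so both visitors can share the same cached copy.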

Action: see if you can restrict the website to just one language. The caching will be much simpler and more performant.

Max-age versus s-maxage?

“max-age” tells the browser how long to store a page in the browser cache. “s-maxage” tells the same to the proxy cache. You could set them to the same value. (If there is no s-maxage, the proxy cache falls back to max-age.)

On the other hand, cachefu can signal the proxy cache to purge a page but it cannot purge a browser’s cache. So it is possible to safely set the s-maxage to a higher value.
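On the wire, both values travel in the same Cache-Control header, where the proxy directive is spelled s-maxage. A small sketch with a hypothetical helper (not a cachefu function):

```python
def cache_control(browser_seconds, proxy_seconds):
    """Build a Cache-Control value with separate browser and proxy lifetimes."""
    return "max-age=%d, s-maxage=%d" % (browser_seconds, proxy_seconds)


# No browser caching, one hour in the (purgeable) proxy cache.
header = cache_control(browser_seconds=0, proxy_seconds=3600)
print("Cache-Control:", header)
```

The browser asks again every time, but the proxy cache can answer for an hour without bothering plone.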

Action: check if proxy cache purging works reliably. If so, you can take the risk of a longer server max age for a bit of extra performance.

Public or private?

Requests of logged-in users are ignored by the proxy cache. Sometimes the page or image is perfectly valid for anonymous users, too. In that case, the public cache-control header tells the proxy cache to store it anyway.

The other way around, some items are personalised despite not looking to be so. Perhaps there’s some IP-based localisation that shows a different homepage image based on perceived location? In that case, a private cache-control header tells the proxy cache not to store it.
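The decision described above can be modelled in a few lines. This is a simplified model of the behaviour, not actual varnish or squid logic, and the function name is made up.

```python
def proxy_may_store(cache_control, authenticated):
    """Model: may the proxy cache keep a copy of this response?"""
    directives = {part.strip().lower() for part in cache_control.split(",")}
    if "private" in directives:
        return False  # explicitly personalised: never store
    if "public" in directives:
        return True   # explicitly fine for everyone: always store
    # Default guesswork: skip responses to logged-in users.
    return not authenticated


logged_in_default = proxy_may_store("max-age=3600", authenticated=True)
logged_in_public = proxy_may_store("public, max-age=3600", authenticated=True)
anonymous_private = proxy_may_store("private", authenticated=False)
```

The explicit directives only matter where the default guess would be wrong: a logged-in request for a shared image, or an anonymous request for a localised one.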

Action: are there items that have to be set explicitly to private or public? Where the default guesswork fails?

Explanation of cachefu default strategy in Plone 3.x

Cachefu has detailed explanations in the description field inside its rules. I’ll just raise the main points so that you get a better feel for how the above questions play out in practice.

  • Resource registry css/javascript files have a unique ID for every version. So they, by definition, never change. If the css changes, the combined file gets a new ID in its name. So cachefu caches it forever in the browser and the proxy cache. So plone only has to serve it once in a lifetime. No need for purging, as changes mean a new file name.
  • Rarely changing items like file system images are cached in the browser and the proxy cache for a day. There’s some risk in this, but it really helps limit the amount of requests that plone has to handle.
  • “Content” (which means individual pages, news items, etc.) has a rule that caches them in the proxy cache. Pages normally don’t have many unrelated changes, so keeping them in the proxy cache for an hour before refreshing is pretty safe. When the page content itself changes, cachefu purges the page from the proxy cache, so that is covered too. There’s no client-side caching by default as that is impossible to purge.
  • “Containers” (which means the plone site itself, folders, topics) are cached in memory. They depend too much on other content: if an item is added inside them, their navigation tree or folder contents display must change instantly. So purging is impractical (at least, without custom event handlers). Caching in memory is based on the Etag key of the container which by default includes the catalog modification time: every time something somewhere changes, every container’s cache is refreshed on the next request. In-memory caching instead of per-browser Etag caching is used as the assumption is that containers are visited quite often, so storing rendered copies is a worthwhile trade-off.
  • “HTTPCache” items (which mostly mean file system images that are tied to the HTTPCache with a .metadata file) are stored in the proxy cache and the browser cache for a day. They rarely change and the resulting drop in incoming requests is great. A trade-off.

Gotchas

Some additional questions to help you get started thinking about the default configuration.

  • How much memory are all those memory-cached rendered folders going to take up? How often does something in the site change, invalidating the memory cache? Is it worth it?
  • Are there expensive pages, like a homepage with loads of portlets? Specific caching with a customized Etag for just that content type could help heaps. Just add an extra rule, that is easy enough to do.
  • Can you lengthen the amount of time file system images are cached? Or should you lower it?
  • How customized are your pages for logged-in users? Do you depend on all the items that make up the default Etag? Just roles instead of user ID?
  • Can you cut the catalog modification date out of the Etag and lower the amount of time the Etag is valid to compensate? Think about the trade-offs involved.

Extra: purging the proxy cache

Problem: you configured the proxy cache to cache content for a specified time (an hour or a day). Whenever the content changes, you want to refresh the proxy cache’s copy. And you don’t want the proxy cache to keep on badgering plone with “is this still fresh enough?” questions.

Solution: CacheFu can automatically send a purge request to the proxy cache to forget about certain items. The proxy cache doesn’t accept such requests from the internet at large, but typically only those requests originating on the server itself.
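Such a purge request is an ordinary HTTP request with the method PURGE instead of GET. A sketch using Python’s standard library; the proxy address and path are hypothetical, and the exact URL your proxy expects depends on your virtual hosting setup.

```python
import urllib.request


def make_purge_request(proxy_base, path):
    """Build (but do not send) an HTTP PURGE request for one cached path."""
    url = proxy_base.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, method="PURGE")


# Hypothetical proxy address and item path.
request = make_purge_request("http://localhost:6081", "/news/my-item")
print(request.get_method(), request.full_url)

# To actually send it, run this on the server itself:
# urllib.request.urlopen(request, timeout=5)
```

The send is left commented out on purpose: from anywhere but the server, the proxy cache should refuse it.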

Note that the browser can also send a header to the server (and thus to the proxy cache) that requests a really really really fresh copy. “Forced reload”. Squid configurations normally listen to this, but varnish does not (at least not in the plone-provided configuration).

On your development machine you can see a “purge queue full” error in your zope debug log. That is cachefu’s purging mechanism that tries to send PURGE requests to the configured proxy cache, which of course fails as cachefu is now on your laptop and not on the server. Don’t worry, the errors have no impact.

The simplest way is to purge just the basic url of any object that changes. Some objects have multiple user-visible views, though. Like different sizes for images. A summary and a full view for a blog entry. And so on. Cachefu handles that use case, too.

Cachefu has several ways to determine the full list of urls to purge:

  • The relevant rule is asked for relative urls to purge. The view urls for the object are determined by taking the relative url and adding /, /view and the current default view. If you’ve configured extra views in your rule, those are added too. If the object has an archetypes imagefield, that field’s possible urls (so the image sizes) are also added.
  • In addition to the above, a rule can have a purge expression. This is a python script or view method that returns extra urls. By default, cachefu adds a python script (getImageAndFilePurgeUrls.py) that adds /download for files and the various image scales for images.
  • IPurgeUrls subscription adapters are looked up and queried for both relative and absolute urls. This isn’t used by cachefu itself, though. You can use it in your own products to configure extra purging through zope’s component architecture instead of python scripts.
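The first two steps can be sketched as follows. This is a simplified illustration with made-up function and parameter names, not cachefu’s implementation, and it leaves out the adapter lookup.

```python
def urls_to_purge(relative_url, default_view, extra_views=(), extra_urls=()):
    """Collect the relative urls to purge for one changed object."""
    base = relative_url.rstrip("/")
    urls = [base, base + "/", base + "/view", base + "/" + default_view]
    # Extra views configured in the rule.
    urls.extend(base + "/" + view.lstrip("/") for view in extra_views)
    # Extra urls as a purge expression might return them.
    urls.extend(extra_urls)
    # Drop duplicates while keeping the order.
    return list(dict.fromkeys(urls))


# Hypothetical news item with a summary view and one image scale.
purge_list = urls_to_purge(
    "/news/my-item",
    default_view="newsitem_view",
    extra_views=["summary"],
    extra_urls=["/news/my-item/image_thumb"],
)
```

Each url in the resulting list then gets its own PURGE request.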

Purging varnish from the command line

Sometimes, stuff breaks. There’s something in varnish’s cache that you want to get out. Varnish supports regex-based purging. To purge everything, look up the port number of your varnish instance and run:

$> bin/varnishadm -T localhost:13080 url.purge '.*'

The regex at the end allows more detailed purging, but that’s probably not interesting. Another brute-force way is to:

  • kill your varnish instance
  • delete the parts/myvarnish/storage file
  • start varnish again.