Samuel Williams Sunday, 21 March 2010

Ensuring sites are cache-friendly is an important part of deploying a website. Sites that load quickly and reduce bandwidth costs are great, especially when there are lots of visitors.

I've been experimenting with Rack-Cache, which is an excellent project. It is a full-featured HTTP cache for Rack with support for multiple storage backends.

It is important to keep in mind that not all resources should be cached in the same way. For example, I made two separate caches as part of my application: a file cache and a dynamic content cache. This is because files on disk don't need to be stored in a cache, whereas dynamically generated content does (otherwise you'd have to regenerate it on every request).

Caching for files and other static resources should rely on ETags. Each static resource has an ETag, which is typically a hash of the file size and last-modified time. This is pretty easy to implement.

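# The core parts of the File class that I use:
require 'digest/sha1'
require 'time'

class FileReader
	def initialize(path)
		@path = path
		@etag = Digest::SHA1.hexdigest("#{File.size(@path)}#{mtime_date}")
	end

	attr :path
	attr :etag

	# Lets servers that support it send the file straight from disk.
	def to_path
		@path
	end

	# Modification time as an HTTP date (assumed format, since it feeds Last-Modified).
	def mtime_date
		File.mtime(@path).httpdate
	end

	def size
		File.size(@path)
	end

	# Stream the file in chunks (the 8 KiB chunk size is an assumption).
	def each
		File.open(@path, "rb") do |fp|
			while part = fp.read(8192)
				yield part
			end
		end
	end

	# Returns false when the client's cached copy is still valid.
	def modified?(env)
		if modified_since = env['HTTP_IF_MODIFIED_SINCE']
			# HTTP dates only have one-second resolution, so compare whole seconds.
			return false if File.mtime(@path).to_i <= Time.parse(modified_since).to_i
		end

		if etags = env['HTTP_IF_NONE_MATCH']
			etags = etags.split(/\s*,\s*/)
			return false if etags.include?(etag) || etags.include?('*')
		end

		return true
	end
end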

# Here is basically how we serve the file to the client:
class Static
	# ... snip ...

	def call(env)
		# path and ext are assumed to come from the snipped request handling.
		file = FileReader.new(path)
		response_headers = {
			"Last-Modified" => file.mtime_date,
			"Content-Type" => @extensions[ext],
			"Cache-Control" => @cache_control,
			"ETag" => file.etag
		}

		if file.modified?(env)
			response_headers["Content-Length"] = file.size.to_s
			return [200, response_headers, file]
		else
			# The client's copy is still valid, so send headers only.
			return [304, response_headers, []]
		end
	end
end
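
The validation step above can also be tried in isolation, without touching the disk. The following is a minimal sketch rather than the code this site runs: `etag_for` and `not_modified?` are hypothetical names, the ETag is assumed to be sent as a quoted string, and If-None-Match is checked first and takes priority over If-Modified-Since, which is the order the HTTP specification asks for.

```ruby
require 'digest/sha1'
require 'time'

# Build a quoted ETag from a size and modification time (hypothetical helper).
def etag_for(size, mtime)
  digest = Digest::SHA1.hexdigest("#{size}#{mtime.httpdate}")
  "\"#{digest}\""
end

# Hypothetical helper: true when the client's cached copy is still valid
# and a 304 Not Modified can be sent instead of the body.
def not_modified?(env, etag:, mtime:)
  if (header = env['HTTP_IF_NONE_MATCH'])
    candidates = header.split(/\s*,\s*/)
    # If-None-Match uses weak comparison, so ignore any W/ prefix.
    # When this header is present, If-Modified-Since is not consulted.
    return candidates.any? { |c| c == '*' || c.sub(/\AW\//, '') == etag }
  end

  if (since = env['HTTP_IF_MODIFIED_SINCE'])
    # HTTP dates have one-second resolution, so compare whole seconds.
    return mtime.to_i <= Time.httpdate(since).to_i
  end

  false
end

mtime = Time.utc(2010, 3, 21, 12, 0, 0)
etag = etag_for(1024, mtime)

p not_modified?({ 'HTTP_IF_NONE_MATCH' => etag }, etag: etag, mtime: mtime)               # prints true
p not_modified?({ 'HTTP_IF_MODIFIED_SINCE' => mtime.httpdate }, etag: etag, mtime: mtime) # prints true
p not_modified?({}, etag: etag, mtime: mtime)                                             # prints false
```

Note that `Time.httpdate` raises `ArgumentError` on a malformed date, so a production version would probably want to rescue that and treat the resource as modified.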

Dynamically generated content should typically be cached using the last-modified time exclusively, and only for a short period (such as 1 hour). This ensures that your site won't be overloaded generating content (when you get slashdotted), while the content is still regenerated fairly frequently.

# Using rack-cache is easy - simply install it and add it to your Rack application:

require 'rack/cache'

use Rack::Cache, {
	:verbose => true
}

# Then in your content generation, write something like this

response.headers['Cache-Control'] = 'max-age=3600'

# And rack-cache will take care of the rest :)
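
To make it concrete what a cache in front of the application sees, here is a minimal sketch of a Rack-style endpoint written as a plain Ruby class, so it runs without any gems. The `DynamicPage` class, its `render` method and the fixed timestamp are all hypothetical, and adding `public` to the header is my assumption for a shared cache; the snippet above sets only `max-age`.

```ruby
require 'time'

# Hypothetical endpoint: any object with call(env) returning
# [status, headers, body] can act as a Rack application.
class DynamicPage
  def initialize(generated_at)
    # Assumed time of the last content change; a real application
    # would look this up from its data store.
    @generated_at = generated_at
  end

  def call(env)
    headers = {
      'Content-Type' => 'text/html',
      # Allow caches to reuse the response for one hour.
      'Cache-Control' => 'public, max-age=3600',
      'Last-Modified' => @generated_at.httpdate
    }

    since = env['HTTP_IF_MODIFIED_SINCE']
    if since && @generated_at.to_i <= Time.httpdate(since).to_i
      # The client's copy is current, so skip regeneration entirely.
      [304, headers, []]
    else
      body = render
      headers['Content-Length'] = body.bytesize.to_s
      [200, headers, [body]]
    end
  end

  private

  # Stand-in for expensive content generation.
  def render
    '<h1>Hello</h1><p>Generated content</p>'
  end
end

app = DynamicPage.new(Time.utc(2010, 3, 21, 12, 0, 0))
status, headers, _body = app.call({})
puts status                    # prints 200
puts headers['Cache-Control']  # prints public, max-age=3600
```

With a cache such as rack-cache in front, requests within the hour would be answered from the cache and never reach `call` at all; after that, the Last-Modified check lets a revalidation finish with a 304 instead of a full regeneration.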

Also, just because you are caching content doesn't mean your page can't have dynamic elements. AJAX makes it trivial to provide interactive RSS feeds, swap images, or update content. This means that the majority of your content can be cached while specific parts are generated dynamically on the client. This is something I'm experimenting with.

Debugging Cache Issues

I had problems because Apache was adding a second Cache-Control header to every response. This was caused by a global ExpiresDefault directive, which simply appends another Cache-Control header. This can cause incorrect cache information to propagate to caches across the internet. Figuring out all the little problems took me a while, since there are many levels at which information can be cached.
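
A quick way to spot this kind of duplication from Ruby is to ask Net::HTTP for every value of the header instead of the merged one. This is only a sketch: `duplicate_cache_control` is a hypothetical helper, the two `max-age` values are made up, and the response is built by hand purely so the example runs offline. Against a live site you would pass in the result of `Net::HTTP.get_response(URI(url))` instead.

```ruby
require 'net/http'

# Hypothetical helper: returns every Cache-Control value when the header
# was sent more than once, or an empty array when there is no duplication.
def duplicate_cache_control(response)
  values = response.get_fields('Cache-Control') || []
  values.length > 1 ? values : []
end

# Hand-built response imitating an application header plus a second one
# appended by the web server (values are illustrative).
response = Net::HTTPOK.new('1.1', '200', 'OK')
response.add_field('Cache-Control', 'max-age=3600')
response.add_field('Cache-Control', 'max-age=86400')

p duplicate_cache_control(response)  # prints ["max-age=3600", "max-age=86400"]
```

`response['Cache-Control']` would join the values into a single comma-separated string, which is easy to misread; `get_fields` keeps them separate so the duplication is obvious.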

I found two great tools for checking whether your pages are serving the correct headers and whether your stack responds correctly to conditional headers such as If-Modified-Since and If-None-Match:

Both of these sites will point out issues with the content you are serving, and highlight potential problems with resources which won't be cached properly due to missing headers, incorrect headers and/or incorrect behavior.

