Ensuring sites are cache-friendly is an important part of deploying a website. Sites that load quickly and reduce bandwidth costs are great, especially when there are lots of visitors.
I've been experimenting with Rack-Cache, which is an excellent project. It is a full-featured cache for Rack with support for multiple backends.
It is important to keep in mind that not all resources are suited for caching in the same way. For example, I made two separate caches as part of my application: a file cache and a dynamic content cache. This is because files on disk don't need to be stored in a cache, whereas dynamically generated content does (otherwise you'd have to regenerate it each time).
Caching for resources such as files and other static resources should rely on ETags. Each static resource has an ETag, which is typically a hash of the file size and last modified time. This is pretty easy to implement.
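# The core parts of the FileReader class that I use:
require 'digest/sha1'
require 'time'

class FileReader
  def initialize(path)
    @path = path
    # ETag values are quoted strings, so store the hash with its quotes.
    @etag = %("#{Digest::SHA1.hexdigest("#{File.size(@path)}#{mtime_date}")}")
  end

  attr :path
  attr :etag

  def to_path
    @path
  end

  def mtime_date
    File.mtime(@path).httpdate
  end

  def size
    File.size(@path)
  end

  # Stream the file in 8 KB chunks so it can be used as a Rack body.
  def each
    File.open(@path, "rb") do |fp|
      while part = fp.read(8192)
        yield part
      end
    end
  end

  def modified?(env)
    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if etags = env['HTTP_IF_NONE_MATCH']
      etags = etags.split(/\s*,\s*/)
      return !(etags.include?(etag) || etags.include?('*'))
    end

    if modified_since = env['HTTP_IF_MODIFIED_SINCE']
      # Compare whole seconds; HTTP dates carry no sub-second precision.
      return false if File.mtime(@path).to_i <= Time.parse(modified_since).to_i
    end

    return true
  end
end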
# Here is basically how we serve the file to the client:
class Static
  # ... snip ...

  def call(env)
    file = FileReader.new(...)

    response_headers = {
      "Last-Modified" => file.mtime_date,
      "Content-Type" => @extensions[ext],
      "Cache-Control" => @cache_control,
      "ETag" => file.etag
    }

    if file.modified?(env)
      # Full response: the FileReader streams itself via #each.
      response_headers["Content-Length"] = file.size.to_s
      return [200, response_headers, file]
    else
      # The client's copy is still valid, so send headers only.
      return [304, response_headers, []]
    end
  end
end
Dynamically generated content should typically be cached using its last modified time alone, and only for a short period of time (such as 1 hour). This ensures that your site won't be overloaded generating content (when you get slashdotted), while the content is still regenerated fairly frequently.
# Using rack-cache is easy - simply install it and add it to your config.ru
require 'rack/cache'

use Rack::Cache, {
  :verbose => true
}

# Then in your content generation, write something like this
response.headers['Cache-Control'] = 'max-age=3600'

# And rack-cache will take care of the rest :)
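As a rough sketch of the dynamic content side, the Rack-style app below sets Last-Modified and a one hour max-age, and answers If-Modified-Since with a 304. The names DynamicPage, render_page and the updated_at callable are hypothetical and not part of rack-cache or my application; only the standard library is used.

```ruby
require 'time'

# Hypothetical Rack-style app: a dynamic page cached for one hour.
class DynamicPage
  MAX_AGE = 3600 # seconds, matching the max-age used above

  # updated_at is a callable returning the Time the content last changed.
  def initialize(updated_at)
    @updated_at = updated_at
  end

  def call(env)
    last_modified = @updated_at.call
    headers = {
      "Last-Modified" => last_modified.httpdate,
      "Cache-Control" => "max-age=#{MAX_AGE}"
    }

    # Client copy is still current: skip regenerating the page.
    return [304, headers, []] if fresh?(env, last_modified)

    body = render_page
    headers["Content-Type"] = "text/html"
    headers["Content-Length"] = body.bytesize.to_s
    [200, headers, [body]]
  end

  private

  def fresh?(env, last_modified)
    since = env['HTTP_IF_MODIFIED_SINCE']
    return false unless since
    # Compare whole seconds, since HTTP dates have no finer precision.
    last_modified.to_i <= Time.httpdate(since).to_i
  rescue ArgumentError
    false # Unparseable date: treat it as an unconditional request.
  end

  # Stand-in for the expensive content generation.
  def render_page
    "<html><body>Generated at #{Time.now.httpdate}</body></html>"
  end
end
```

With something like this behind Rack::Cache, the cache can serve its stored copy for an hour and then revalidate cheaply instead of regenerating the page every time.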
Also, just because you are caching content doesn't mean your page can't have dynamic elements: AJAX can trivially provide interactive RSS feeds, swap images and change content. This means that the majority of your content can be cached while specific parts are generated dynamically on the client. This is something which I'm experimenting with.
Debugging Cache Issues
I had problems because Apache was adding a second Cache-Control header to all responses. This was because of a global ExpiresDefault directive, which simply appends another Cache-Control header. This can cause incorrect cache information to permeate through the internet. Figuring out all the little problems took me a while, since there are many levels which potentially cache information.
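A small script can make this kind of duplication visible. The following is only a sketch: duplicated_headers and cache_control_values are hypothetical helper names, and the second one relies on Net::HTTP's get_fields, which returns every value received for a header.

```ruby
require 'net/http'
require 'uri'

# Returns the names of headers that occur more than once in a list of
# [name, value] pairs (names are compared case-insensitively).
def duplicated_headers(pairs)
  counts = Hash.new(0)
  pairs.each { |name, _value| counts[name.downcase] += 1 }
  counts.select { |_name, count| count > 1 }.keys
end

# Fetches url and returns every Cache-Control value the server sent.
# Needs network access, so it is only defined here, not called.
def cache_control_values(url)
  response = Net::HTTP.get_response(URI.parse(url))
  response.get_fields('Cache-Control') || []
end
```

If cache_control_values returns more than one entry for a page, something in the stack (in my case Apache) is adding its own header on top of the one the application set.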
I found two great tools for checking whether your pages are serving the correct headers, and whether your stack responds to things such as If-Modified-Since and If-None-Match correctly:
Both of these sites will point out issues with the content you are serving, and highlight potential problems with resources which won't be cached properly due to missing headers, incorrect headers and/or incorrect behavior.
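The same kind of check can be roughed out locally against any Rack-style app before it is deployed. This is a minimal sketch, not how those tools work: check_conditional_get is a hypothetical helper, and it assumes the app returns its headers under the "ETag" and "Last-Modified" keys as in the code earlier in this post.

```ruby
# Replays a request against a Rack-style app using the validators from
# the first response, and reports the status codes that come back.
def check_conditional_get(app, env = {})
  status, headers, _body = app.call(env.dup)
  report = { :initial => status }

  if etag = headers["ETag"]
    replay = env.merge('HTTP_IF_NONE_MATCH' => etag)
    report[:if_none_match] = app.call(replay).first
  end

  if last_modified = headers["Last-Modified"]
    replay = env.merge('HTTP_IF_MODIFIED_SINCE' => last_modified)
    report[:if_modified_since] = app.call(replay).first
  end

  report
end
```

A stack that handles conditional requests properly should report 304 for each validator it emits; a 200 on a replay means the full response is being sent again.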