Asynchronous DNS with EventMachine

DNS resolution in a typical application is a blocking operation. This is especially problematic if your program is event-driven, because a blocking lookup stalls the run loop while it waits for a response. RubyDNS provides a fully featured asynchronous DNS resolver built on top of EventMachine, which can be used to minimise the latency of name resolution in your program.


require 'rubydns'

resolver = RubyDNS::Resolver.new([[:udp, "8.8.8.8", 53], [:tcp, "8.8.8.8", 53]])

EventMachine::run do
	resolver.query('www.codeotaku.com') do |response|
		case response
		when RubyDNS::Message
			response.answer.each do |answer|
				host = answer[0].to_s
				ttl = answer[1]
				resource_class = answer[2].class.name.split('::', 4)[-1]
				address = answer[2].address.to_s
			
				puts host.ljust(20) + ttl.to_s.ljust(5) + resource_class.ljust(15) + address.to_s
			end
		when RubyDNS::ResolutionFailure
			puts "Error"
		end
		
		EventMachine::stop
	end
end

# Gives the output
# www.codeotaku.com   287  IN::A          108.162.195.101
# www.codeotaku.com   287  IN::A          108.162.196.199

In your own code you'd probably want to use the new resolver.addresses_for method which helpfully returns a list of addresses.

Why?

Event-driven programming is pretty straightforward. You essentially have a loop that reads events and responds to them. Processing is kept to a minimum per cycle; otherwise the loop stalls and becomes unresponsive.
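To make the stall concrete, here is a minimal sketch of such a loop in plain Ruby. The `run_loop` helper and the event names are hypothetical and exist only for illustration; EventMachine's reactor is considerably more sophisticated.

```ruby
# A toy event loop: pull events off a queue and dispatch them to handlers.
# `run_loop` and the event names are illustrative, not part of EventMachine.
def run_loop(events, handlers)
	log = []
	
	until events.empty?
		name, payload = events.shift
		handler = handlers[name]
		
		# Each handler should return quickly; a slow handler delays every
		# event queued behind it.
		log << handler.call(payload) if handler
	end
	
	log
end

handlers = {
	:greet => lambda { |who| "hello #{who}" },
	:slow => lambda { |seconds| sleep(seconds); "slept #{seconds}s" },
}

events = [[:greet, "alice"], [:slow, 0.1], [:greet, "bob"]]

# "bob" is not greeted until the slow handler has finished.
puts run_loop(events, handlers).inspect
```

A blocking name lookup inside a handler behaves just like the `:slow` handler here, only with a delay you don't control.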

Software that uses a network for communication typically relies on name resolution. In particular, RubyDNS::Server is primarily interested in two things: receiving incoming requests and sending out responses, and sending out requests and waiting for responses.

DNS resolution is one task that is typically done using operating system functions such as gethostbyname or getaddrinfo. The main problem is that these functions cause your process to sleep until a result is available.

(Have you ever noticed how sometimes in games, the entire game stalls when logging in or listing servers? This is very often due to name resolution latency affecting the main event loop.)

If RubyDNS used these functions, no other events could be processed while waiting for the operating system to respond. In practice, this means that RubyDNS::Server would perform poorly if many people were using it simultaneously.

To avoid these problems, RubyDNS recently introduced its own RubyDNS::Resolver, which provides robust asynchronous DNS resolution built on top of EventMachine. This resolver isn't just for RubyDNS::Server, though: it can be used in any EventMachine-based, event-driven code that wants high-performance name resolution.

Implementation

Given a request, which consists of one or more DNS questions, our resolver first checks whether UDP is a suitable transport. DNS packets are typically sent over UDP, but a packet that is too big should be sent over TCP instead:

# `message` is the outgoing DNS request. `servers` is an array of potential
# upstream servers who might be able to provide an answer.
def initialize(message, servers, options = {}, &block)
	@message = message
	@packet = message.encode
	
	@servers = servers.dup
	
	# We select the protocol based on the size of the data:
	if @packet.bytesize > UDP_TRUNCATION_SIZE
		@servers.delete_if{|server| server[0] == :udp}
	end
	
	# Measured in seconds:
	@timeout = options[:timeout] || 5
	
	@logger = options[:logger]
end
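The transport selection can be tried in isolation. The following is a minimal sketch, not the RubyDNS implementation: it uses `Resolv::DNS::Message` from Ruby's standard library to build a query, and both the `candidate_servers` helper and the value of 512 bytes for `UDP_TRUNCATION_SIZE` (the classic limit for DNS messages over UDP) are assumptions made for illustration.

```ruby
require 'resolv'

# Assumed value: the classic maximum size of a DNS message sent over UDP.
UDP_TRUNCATION_SIZE = 512

# Hypothetical helper: return the servers that can carry the given packet.
def candidate_servers(packet, servers)
	if packet.bytesize > UDP_TRUNCATION_SIZE
		servers.reject { |server| server[0] == :udp }
	else
		servers.dup
	end
end

servers = [[:udp, "8.8.8.8", 53], [:tcp, "8.8.8.8", 53]]

# Build and encode a small query for an A record:
message = Resolv::DNS::Message.new(1)
message.rd = 1
message.add_question(Resolv::DNS::Name.create("www.codeotaku.com."), Resolv::DNS::Resource::IN::A)
packet = message.encode

puts "Query is #{packet.bytesize} bytes"
puts candidate_servers(packet, servers).inspect
```

An ordinary query like this one is far below the limit, so both transports remain candidates; only unusually large requests are restricted to TCP.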

With this list of candidates, we connect to each one in turn and send the request. In all failure cases, we try the next server if one is available. If no server has been successful, we signal a resolution failure:

def try_next_server!
	if @request
		@request.close_connection
		@request = nil
	end
				
	if @servers.size > 0
		@server = @servers.shift

		# We make requests one at a time to the given server, naturally the servers
		# are ordered in terms of priority.
		case @server[0]
		when :udp
			@request = UDPRequestHandler.open(@server[1], @server[2], self)
		when :tcp
			@request = TCPRequestHandler.open(@server[1], @server[2], self)
		else
			raise InvalidProtocolError.new(@server)
		end
					
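		# Setting up the timeout, cancelling the timer from any previous
		# attempt so that it can't interrupt this one...
		@timer.cancel if @timer
		@timer = EventMachine::Timer.new(@timeout) do
			try_next_server!
		end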
	else
		# Signal that the deferrable has failed and resolution was not possible:
		fail ResolutionFailure.new("No available servers responded to the request.")
	end
end
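Stripped of the networking, the fallback order is easy to see. The sketch below is a synchronous simplification under stated assumptions: `first_response` is a hypothetical helper, and the block stands in for sending one request and either returning a response or returning nil on failure or timeout.

```ruby
# Hypothetical, synchronous stand-in for the asynchronous failover logic.
# The block plays the role of one request: it returns a response, or nil
# if the server failed or timed out.
def first_response(servers)
	remaining = servers.dup
	
	until remaining.empty?
		server = remaining.shift
		response = yield(server)
		
		return response if response
	end
	
	raise "No available servers responded to the request."
end

servers = [[:udp, "10.0.0.1", 53], [:tcp, "10.0.0.1", 53], [:udp, "10.0.0.2", 53]]

# Pretend that only the last server answers:
answer = first_response(servers) do |server|
	"response from #{server[1]}" if server[1] == "10.0.0.2"
end

puts answer
```

The real resolver does the same walk through the list, except that each step is driven by an EventMachine callback or timer rather than a return value.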

We then wait until EventMachine tells us one of two things: some response was received or there was a timeout. If we receive a response, as long as it wasn't truncated, we are successful:

def process_response!(response)
	if response.tc != 0
		# We hardcode this behaviour for now.
		try_next_server!
	else
		succeed response
	end
end
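The `tc` (truncated) flag lives in the DNS message header. As a rough illustration, `Resolv::DNS::Message` from Ruby's standard library exposes the same header field, so the check can be exercised without a network. The `truncated?` helper is hypothetical.

```ruby
require 'resolv'

# Hypothetical helper mirroring the check above.
def truncated?(response)
	response.tc != 0
end

# Simulate a server reply that did not fit in a single UDP datagram:
reply = Resolv::DNS::Message.new(1)
reply.qr = 1 # this message is a response
reply.tc = 1 # the answer was truncated

# Round-trip through the wire format, as a real response would arrive:
decoded = Resolv::DNS::Message.decode(reply.encode)

puts truncated?(decoded)
```

A truncated reply over UDP is the server's way of saying the full answer needs TCP, which is why moving on to the next candidate is a reasonable response.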

In practice, we use EventMachine::Deferrable to handle this signalling. Using deferrables ultimately led to concise and reliable code, and I was very happy with the results. I'd recommend taking a look at the full source code.
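If you haven't used deferrables before, the idea is that callers register a success callback and a failure callback up front, and the object signals exactly one of them later. The class below is a deliberately tiny stand-in written for this post, not EventMachine::Deferrable itself, which has more features (for example, it also handles callbacks registered after completion).

```ruby
# A deliberately tiny stand-in for the callback/errback pattern. This is NOT
# EventMachine::Deferrable, just enough to show how the signalling works.
class TinyDeferrable
	def initialize
		@callbacks = []
		@errbacks = []
		@status = :unknown
	end
	
	def callback(&block)
		@callbacks << block
		self
	end
	
	def errback(&block)
		@errbacks << block
		self
	end
	
	# Signal success and notify everyone waiting for a result:
	def succeed(*args)
		finish(:succeeded, @callbacks, args)
	end
	
	# Signal failure and notify everyone waiting for an error:
	def fail(*args)
		finish(:failed, @errbacks, args)
	end
	
	private
	
	# Only the first signal counts; later ones are ignored.
	def finish(status, listeners, args)
		return unless @status == :unknown
		@status = status
		listeners.each { |listener| listener.call(*args) }
	end
end

request = TinyDeferrable.new
request.callback { |response| puts "Got: #{response}" }
request.errback { |error| puts "Failed: #{error}" }

# Later, when the network layer has a result:
request.succeed("108.162.195.101")
```

This is why the snippets above can simply call `succeed` or `fail`: whoever started the query has already said what should happen in each case.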