Samuel Williams Friday, 16 November 2018

The majority of performance improvements in modern processors come from increased core counts rather than increased instruction execution frequency. To maximise hardware utilization, applications need to use multiple processes and threads. Servers that process discrete requests are good candidates for both parallelism and concurrency improvements. We discuss different ways in which servers can improve processor utilization and how these approaches affect application code. We show that fibers require minimal changes to existing application code and are thus a good approach for retrofitting existing systems.

RubyKaigi 2019 Slides
Fibers Are the Right Solution.pdf

Building an Asynchronous Server

A typical server consists of a loop that accepts connections and performs some work. We explore basic server designs and how they can be modified to improve scalability.

Looping Server

Here is a basic synchronous server loop:

#!/usr/bin/env ruby

require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
	client = server.accept
	
	while buffer = client.gets
		client.puts(buffer)
	end
	
	client.close
end

It cannot handle more than one client at a time.

Forking Servers

We wrap the client connection in a child process. This is a trivial change to existing code which, in addition to allowing multiple requests to be handled simultaneously on multiple processor cores, also isolates the parent process from bugs and security problems in the child process.

#!/usr/bin/env ruby

require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
	client = server.accept
	
	fork do
		while buffer = client.gets
			client.puts(buffer)
		end
		
		client.close
	end
	
	client.close
end

This is a very robust design that can easily be applied to existing servers. However, creating many child processes can consume a lot of memory, and context-switching overhead and latency can also be a concern.
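One detail worth noting: the loop above never waits on its children, so each finished child lingers as a zombie process until the parent exits. A minimal fix, sketched below for POSIX systems, is to ignore SIGCHLD so the kernel reaps exited children automatically:

```ruby
# Ignoring SIGCHLD asks the kernel to reap exited children
# automatically, so they never accumulate as zombies.
trap(:CHLD, "IGNORE")

pid = fork { exit!(0) }
sleep(0.1) # give the child time to exit

# Because the child was already reaped, there is nothing to wait for:
begin
  Process.wait(pid)
  result = :waited
rescue Errno::ECHILD
  result = :already_reaped
end
```

The alternative is to install a SIGCHLD handler that calls Process.wait in a loop, which also lets the parent observe each child's exit status.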

Threading Servers

Rather than using a child process, we can use a thread in the same process. We lose the benefits of isolation, but on modern systems the difference in performance is minor. Some systems (e.g. the JVM) don't support fork, while others (e.g. MRI) don't have truly independent threads.

#!/usr/bin/env ruby

require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
	client = server.accept
	
	Thread.new do
		while buffer = client.gets
			client.puts(buffer)
		end
		
		client.close
	end
end
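One thread per connection doesn't bound resource usage under load. A common refinement, sketched below with an arbitrary pool size of 4, is a fixed set of worker threads fed from a queue:

```ruby
require 'socket'

queue = Queue.new

# A fixed pool of workers: each pops a connection, serves it to
# completion, then waits for the next. Pushing nil shuts a worker down.
workers = 4.times.map do
	Thread.new do
		while client = queue.pop
			while buffer = client.gets
				client.puts(buffer)
			end
			
			client.close
		end
	end
end

# The accept loop then becomes:
#   loop { queue.push(server.accept) }
```

The trade-off is head-of-line blocking: a slow client occupies a worker for its entire lifetime, so this suits short-lived connections better than long-lived ones.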

In practice, forks and threads are at odds with each other. If you fork while there are active threads, you will very likely run into bugs: only the forking thread survives in the child, so state owned by other threads, such as held locks, is left in limbo. Even if you didn't create any threads yourself, some library might have, so mixing the two can be very tricky in practice.
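A small demonstration of why (POSIX-only; the pipe is just plumbing to report the child's view back to the parent): after fork, only the calling thread exists in the child, so whatever other threads were doing is abandoned mid-flight:

```ruby
# Start a background thread, then fork: the thread does not exist
# in the child process.
background = Thread.new { sleep }

reader, writer = IO.pipe

pid = fork do
	reader.close
	writer.puts(Thread.list.size) # only the forking (main) thread remains
	writer.close
	exit!(0)
end

writer.close
threads_in_child = reader.read.to_i
reader.close
Process.wait(pid)
background.kill
```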

Fiber Servers

Rather than using a thread, we can use a fiber. The fiber must yield back to the reactor if an operation would block. The reactor is responsible for resuming the fibers when the operation can continue without blocking.
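The underlying primitive is simple. A fiber runs until it explicitly yields, and continues from exactly that point when something resumes it:

```ruby
fiber = Fiber.new do
	Fiber.yield(:first) # pause here, handing :first back to the resumer
	:second             # the block's final value is returned by the last resume
end

first = fiber.resume  # runs the fiber until it yields
second = fiber.resume # continues after the yield, to completion
```

Nothing here is concurrent by itself; the scheduling comes from a reactor deciding when to resume each fiber.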

#!/usr/bin/env ruby

require 'socket'
require 'fiber'

# The full implementation is given here, in order to show all the parts. A simpler implementation is given below.
class Reactor
	def initialize
		@readable = {}
		@writable = {}
	end
	
	def run
		while @readable.any? or @writable.any?
			readable, writable = IO.select(@readable.keys, @writable.keys, [])
			
			readable.each do |io|
				@readable[io].resume
			end
			
			writable.each do |io|
				@writable[io].resume
			end
		end
	end
	
	def wait_readable(io)
		@readable[io] = Fiber.current
		Fiber.yield
		@readable.delete(io)
		
		return yield if block_given?
	end
	
	def wait_writable(io)
		@writable[io] = Fiber.current
		Fiber.yield
		@writable.delete(io)
		
		return yield if block_given?
	end
end

server = TCPServer.new('localhost', 9090)
reactor = Reactor.new

Fiber.new do
	loop do
		client = reactor.wait_readable(server) {server.accept}
		
		Fiber.new do
			while buffer = reactor.wait_readable(client) {client.gets}
				reactor.wait_writable(client)
				client.puts(buffer)
			end
			
			client.close
		end.resume
	end
end.resume

reactor.run

Fibers have less overhead than processes and threads: while one might be comfortable running thousands of threads, fibers can context switch hundreds of millions of times per second on a modern processor core.
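This is easy to measure. The micro-benchmark below (a sketch; absolute numbers vary with hardware and Ruby version) counts resume/yield round trips per second:

```ruby
require 'benchmark'

fiber = Fiber.new do
	loop { Fiber.yield }
end

n = 100_000
elapsed = Benchmark.realtime do
	n.times { fiber.resume }
end

rate = (n / elapsed).round
puts "#{rate} fiber context switches per second"
```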

Here is a simpler version using the async and async-io gems:

#!/usr/bin/env ruby

require 'async'
require 'async/io/tcp_socket'

Async do |task|
	server = Async::IO::TCPServer.new('localhost', 9090)
	
	loop do
		client, address = server.accept
		
		task.async do
			while buffer = client.gets
				client.puts(buffer)
			end
			
			client.close
		end
	end
end

Callback Servers

Rather than using fibers, you can replace the loop with a callback. The reactor invokes a specific function when a certain event is triggered. While fibers naturally execute code in order, just like threads, callbacks often need an associated state machine to achieve the same level of functionality.

var net = require('net');

var server = net.createServer(function (socket) {
	socket.on('data', function(data){
		socket.write(data.toString())
	})
});

server.listen(9090, "localhost");

Callbacks are the simplest form of concurrency and also have the lowest overhead, but in exchange they move state-tracking complexity into user code (aka callback hell).
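The inversion is visible even without I/O. In sequential (thread or fiber) style, intermediate state lives in local variables; in callback style, the continuation must be passed in and the state threaded through closures. A contrived sketch, where read and write stand in for asynchronous operations:

```ruby
# Sequential style: read, then write; the intermediate value is a local.
def sequential(read, write)
	data = read.call
	write.call(data)
end

# Callback style: each step takes a block to invoke on completion,
# so control flow nests and state is captured in closures instead.
def callback_style(read, write, &done)
	read.call do |data|
		write.call(data) do |result|
			done.call(result)
		end
	end
end
```

With two steps the difference is cosmetic; with branching, loops and error handling, the callback version grows an explicit state machine that the sequential version gets for free.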

Hybrid Servers

Asynchronicity should be a property of how the program is executed, not what it does.

Hybrid parallelism and concurrency is required to maximise scalability. We must choose at least one of multi-process or multi-threads (or both), and additionally, a model for concurrency. Asynchronous fibers are the only model for concurrency which can work with existing code bases in a completely transparent way, and thus the right solution for bringing concurrency to existing web applications.

Parallelism

The first decision, which is largely dictated by the system, is whether to use processes or threads for parallelism. As long as you assume that the server is handling discrete requests and responses, both approaches are equally scalable in practice. async-container provides both and they can be interchanged.

Not all platforms support both options equally well. For example, Windows and the JVM do not support fork. Threads on MRI are not truly independent and thus do not scale as well.

Generally speaking, multi-process is more predictable and has better isolation (e.g. security, reliability, restartability). Threads naturally have less isolation, and this can make code more complex and cause bugs. If one thread crashes, it can take down the entire system.

Concurrency

While the model for parallelism doesn't generally affect how the server is implemented, the model for concurrency most certainly does. The default option, do nothing, is the simplest. Your code will execute from top to bottom, in a predictable way.

If you have existing code, multi-process or multi-thread can be a good approach. You cannot easily retrofit this code with promises, callbacks and other explicit forms of concurrency, because they invert flow control and require major structural changes.

Fibers, on the other hand, are semantically similar to threads (excepting parallelism), but with less overhead. You can run whatever code you want in a fiber, and by default it won't behave any worse than if it were running in its own thread or process. However, if you leverage non-blocking I/O, you can perform significantly better and, with the same resources, handle significantly more active connections.

Some models expose both synchronous and asynchronous methods. Such interfaces burden the application code with irrelevant concurrency complexity. Every operation that can be asynchronous should be, where possible, without requiring a change to the method signature.

Birds of a Feather, Fly Together

Falcon is a web server that supports both multi-thread and multi-process parallelism. For concurrency, it depends on async, which uses fibers. This allows existing Rack applications to work at least as well as existing servers, but in the case they choose to use concurrency-aware libraries, they can achieve significant improvements to both throughput and latency with minimal changes to configuration and zero changes to actual usage.

Asynchronous Postgres

While completely experimental, async-postgres is a transparent wrapper which makes the pg gem work concurrently when handling multiple long-running queries.

Asynchronous MySQL

Similarly, async-mysql is a transparent wrapper which makes the mysql2 gem work concurrently when handling multiple long-running queries.

Asynchronous Faraday

async-http-faraday is a backend for Faraday which makes HTTP requests execute concurrently, using a connection pool for HTTP/1 and multiplexing for HTTP/2, with minimal code changes required.

Asynchronous Ruby

It's actually possible to make ALL Ruby I/O concurrent by default. While this doesn't extend to native libraries, it does show that it's a feasible approach with minimal changes required to user code.

Conclusion

Multi-process and multi-thread designs provide parallelism and allow servers to use all available processors. Fibers improve scalability further by maximising I/O concurrency with minimal overheads. Callbacks achieve a similar result, but the inverted flow control requires significant changes to existing code. Fibers don't affect the visible code flow, and thus make it possible to execute existing code with minimal changes, while potentially improving latency and throughput. Fibers are the best solution for composable, scalable, non-blocking clients and servers.
