The majority of performance improvements in modern processors come from increased core counts rather than increased clock speeds. To maximise hardware utilization, applications need to use multiple processes and threads. Servers that process discrete requests are good candidates for both parallelism and concurrency improvements. We discuss different ways in which servers can improve processor utilization and how these approaches affect application code. We show that fibers require minimal changes to existing application code and are thus a good approach for retrofitting existing systems.
Building an Asynchronous Server
A typical server consists of a loop that accepts connections and performs some work. We explore basic server designs and how they can be modified to improve scalability.
Looping Server
Here is a basic synchronous server loop:
#!/usr/bin/env ruby
require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
  client = server.accept

  while buffer = client.gets
    client.puts(buffer)
  end

  client.close
end
Because accept is not called again until the current client disconnects, it cannot handle more than one client at a time.
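We can demonstrate the limitation by running the loop in a background thread and connecting two clients. The port (9091) and timeout are arbitrary choices for this sketch:

```ruby
require 'socket'
require 'io/wait'

# The looping server from above, run in a background thread for this demo:
server = TCPServer.new('localhost', 9091)

Thread.new do
  loop do
    client = server.accept

    while buffer = client.gets
      client.puts(buffer)
    end

    client.close
  end
end

first = TCPSocket.new('localhost', 9091)
second = TCPSocket.new('localhost', 9091)

first.puts("hello")
first_reply = first.gets   # "hello\n" - the first client is served.

second.puts("world")
# The server is still blocked reading from the first client, so the second
# client receives nothing yet:
blocked_reply = second.wait_readable(0.5) ? second.gets : nil

first.close                # The server moves on to the next client,
second_reply = second.gets # and only now echoes "world\n".
```

The second client only receives its echo once the first client disconnects.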
Forking Servers
We wrap the client connection handling in a child process. This is a trivial change to existing code which, in addition to allowing multiple requests to be handled simultaneously on multiple processor cores, also isolates the parent process from bugs and security problems in the child process.
#!/usr/bin/env ruby
require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
  client = server.accept

  fork do
    while buffer = client.gets
      client.puts(buffer)
    end

    client.close
  end

  # The child owns the connection now, so the parent closes its copy:
  client.close
end
This is a very robust design that can be easily applied to existing servers. However, creating many child processes can consume a lot of memory, and the cost of context switching between them can increase latency.
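One practical detail the sketch above omits: each exited child lingers as a zombie process until the parent reaps it. A common fix is to ignore SIGCHLD, which asks the kernel to reap children automatically. A minimal sketch of the mechanism, using a single short-lived child in place of a client handler:

```ruby
# With SIGCHLD ignored, exited children are reaped by the kernel and
# never accumulate as zombies:
trap(:CHLD, 'IGNORE')

pid = fork { sleep 0.1 } # Stands in for a client-handling child.
sleep 0.5                # Give the child time to exit and be reaped.

# Because the child was reaped automatically, there is no exit status
# left to collect - Process.wait reports "no child processes":
reaped = begin
  Process.wait(pid)
  false
rescue Errno::ECHILD
  true
end
```

Alternatively, Process.detach(pid) reaps a specific child in a background thread, if you'd rather not touch signal handlers.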
Threading Servers
Rather than using a child process, we can use a thread within the same process. We lose the benefits of isolation, but on modern systems the difference in performance is minor. Some systems (e.g. the JVM) don't support fork, while others (e.g. MRI) don't have truly independent threads because of the global interpreter lock.
#!/usr/bin/env ruby
require 'socket'

server = TCPServer.new('localhost', 9090)

loop do
  client = server.accept

  Thread.new do
    while buffer = client.gets
      client.puts(buffer)
    end

    client.close
  end
end
In practice, forks and threads are at odds with each other. If you fork while there are active threads, only the forking thread survives in the child, and you will very likely run into bugs. Even if you didn't create any threads yourself, some library might have, so this can be very tricky in practice.
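The hazard is easy to demonstrate: after fork, only the calling thread exists in the child, so any work another thread was doing (and any locks it held) is abandoned mid-flight. A minimal sketch:

```ruby
worker = Thread.new do
  loop { sleep 0.01 } # Stands in for background work.
end

sleep 0.1 # Let the worker get going.

pid = fork do
  # In the child, only the forking thread survives; the worker is dead:
  exit!(worker.alive? ? 1 : 0)
end

_, status = Process.wait2(pid)
worker_gone_in_child = status.exitstatus == 0

# Meanwhile the parent's worker thread is unaffected:
parent_worker_alive = worker.alive?
worker.kill
```

If the worker had been holding a mutex at the moment of the fork, that mutex would remain locked forever in the child, which is the classic source of post-fork deadlocks.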
Fiber Servers
Rather than using a thread, we can use a fiber. Because fibers are cooperatively scheduled, a fiber must yield back to the reactor whenever an operation would block. The reactor is responsible for resuming the fibers once the operation can continue without blocking.
#!/usr/bin/env ruby
require 'socket'
require 'fiber'

# The full implementation is given here, in order to show all the parts. A simpler implementation is given below.
class Reactor
  def initialize
    @readable = {}
    @writable = {}
  end

  def run
    while @readable.any? or @writable.any?
      # Block until at least one registered descriptor is ready:
      readable, writable = IO.select(@readable.keys, @writable.keys, [])

      readable.each do |io|
        @readable[io].resume
      end

      writable.each do |io|
        @writable[io].resume
      end
    end
  end

  # Suspend the current fiber until io is readable, then perform the
  # optional block (the actual read operation):
  def wait_readable(io)
    @readable[io] = Fiber.current
    Fiber.yield
    @readable.delete(io)

    return yield if block_given?
  end

  def wait_writable(io)
    @writable[io] = Fiber.current
    Fiber.yield
    @writable.delete(io)

    return yield if block_given?
  end
end

server = TCPServer.new('localhost', 9090)
reactor = Reactor.new

Fiber.new do
  loop do
    client = reactor.wait_readable(server) {server.accept}

    Fiber.new do
      while buffer = reactor.wait_readable(client) {client.gets}
        reactor.wait_writable(client)
        client.puts(buffer)
      end

      client.close
    end.resume
  end
end.resume

reactor.run
Fibers have less overhead than processes and threads: a fiber context switch is just a jump between stacks, with no system call and no scheduler involvement. While one might be comfortable with thousands of threads, fibers can context switch millions of times per second on a modern processor core.
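This claim is easy to measure with a rough micro-benchmark; the exact figure varies by hardware and Ruby version, but it should be comfortably in the millions per second:

```ruby
require 'fiber'

fiber = Fiber.new do
  loop { Fiber.yield }
end

n = 1_000_000
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
n.times { fiber.resume } # Each resume switches in and yields back out.
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

switches_per_second = (n / elapsed).round
puts "#{switches_per_second} resume/yield round-trips per second"
```

Note that each resume is a round trip (a switch into the fiber and a yield back out), so the number of raw context switches is double the reported rate.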
Here is a simpler version using the async and async-io gems:
#!/usr/bin/env ruby
require 'async'
require 'async/io/tcp_socket'

Async do |task|
  server = Async::IO::TCPServer.new('localhost', 9090)

  loop do
    client, address = server.accept

    task.async do
      while buffer = client.gets
        client.puts(buffer)
      end

      client.close
    end
  end
end
Callback Servers
Rather than using fibers, you can replace the loop with a callback. The reactor invokes a specific function when a certain event is triggered. While fibers naturally execute code in order, just like threads, callbacks often need an associated state machine to achieve the same level of functionality.
var net = require('net');

var server = net.createServer(function (socket) {
  socket.on('data', function (data) {
    socket.write(data.toString());
  });
});

server.listen(9090, "localhost");
Callbacks are the simplest form of concurrency and have the lowest overhead, but in exchange they move state-tracking complexity into user code (aka callback hell).
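The state-tracking burden shows up even in the echo example: 'data' events deliver arbitrary chunks, so a line-oriented protocol must carry a partial-line buffer across callbacks - state that a fiber or thread would keep implicitly in its stack. A sketch of such a parser in Ruby (the class and callback names are illustrative, not from any particular library):

```ruby
# A callback-style line parser. Each on_data call may contain a fragment
# of a line, several lines, or both, so the partial line must be stored
# explicitly between events:
class LineParser
  def initialize(&on_line)
    @buffer = +""      # Explicit state the callback model forces on us.
    @on_line = on_line
  end

  def on_data(chunk)
    @buffer << chunk

    # Emit every complete line, keeping any trailing fragment buffered:
    while line = @buffer.slice!(/\A[^\n]*\n/)
      @on_line.call(line)
    end
  end
end

lines = []
parser = LineParser.new {|line| lines << line}

parser.on_data("hel")
parser.on_data("lo\nwor")
parser.on_data("ld\n")
# lines is now ["hello\n", "world\n"]
```

With fibers, `client.gets` hides exactly this buffering behind a method that simply doesn't return until a full line is available.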
Hybrid Servers
Asynchronicity should be a property of how the program is executed, not what it does.
Hybrid parallelism and concurrency are required to maximise scalability. We must choose at least one of multi-process or multi-thread (or both), and additionally a model for concurrency. Asynchronous fibers are the only concurrency model which can work with existing code bases in a completely transparent way, and thus the right solution for bringing concurrency to existing web applications.
Parallelism
The first decision, which is largely dictated by the system, is whether to use processes or threads for parallelism. As long as you assume that the server is handling discrete requests and responses, both approaches are equally scalable in practice. async-container provides both and they can be interchanged.
Not all platforms support both options equally well. For example, Windows and Java do not support fork. Threads on MRI are not truly independent and thus do not scale as well.
Generally speaking, multi-process is more predictable and has better isolation (e.g. security, reliability, restartability). Threads naturally have less isolation, and this can make code more complex and cause bugs. If one thread crashes, it can take down the entire system.
Concurrency
While the model for parallelism doesn't generally affect how the server is implemented, the model for concurrency most certainly does. The default option, do nothing, is the simplest. Your code will execute from top to bottom, in a predictable way.
If you have existing code, multi-process or multi-thread can be a good approach. You cannot easily retrofit this code with promises, callbacks and other explicit forms of concurrency, because they invert flow control and require major structural changes.
Fibers, on the other hand, are semantically similar to threads (excepting parallelism), but with less overhead. You can run whatever code you want in a fiber, and by default it won't behave any worse than if it were running in its own thread or process. However, if you leverage non-blocking I/O, you can perform significantly better and, with the same resources, handle significantly more active connections.
Some models expose both synchronous and asynchronous variants of each method. Such leaky interfaces burden the application code with irrelevant concurrency complexity. All operations that can be asynchronous should be, where possible, and it shouldn't require changing the method signature.
Birds of a Feather Fly Together
Falcon is a web server that supports both multi-thread and multi-process parallelism. For concurrency, it depends on async, which uses fibers. This allows existing Rack applications to work at least as well as existing servers, but in the case they choose to use concurrency-aware libraries, they can achieve significant improvements to both throughput and latency with minimal changes to configuration and zero changes to actual usage.
Asynchronous Postgres
While completely experimental, async-postgres is a transparent wrapper which makes the pg gem work concurrently when handling multiple long-running queries.
Asynchronous MySQL
Similarly, async-mysql is a transparent wrapper which makes the mysql2 gem work concurrently when handling multiple long-running queries.
Asynchronous Faraday
async-http-faraday is a backend for Faraday which makes HTTP requests execute concurrently, using a connection pool for HTTP/1 and multiplexing for HTTP/2, with minimal code changes required.
Asynchronous Ruby
It's actually possible to make ALL Ruby I/O concurrent by default. While this doesn't extend to native libraries, it does show that it's a feasible approach with minimal changes required to user code.
Conclusion
Multi-process and multi-thread designs provide parallelism and allow servers to use all available processors. Fibers improve scalability further by maximising I/O concurrency with minimal overhead. Callbacks achieve a similar result, but the inverted flow control requires significant changes to existing code. Fibers don't affect visible code flow, and thus make it possible to execute existing code with minimal changes, while potentially improving latency and throughput. Fibers are the best solution for composable, scalable, non-blocking clients and servers.