The majority of performance improvements in modern processors are due to increased core count rather than increased instruction execution frequency. To maximise hardware utilization, applications need to use multiple processes and threads. Servers that process discrete requests are a good candidate for both parallelization and concurrency improvements. We discuss different ways in which servers can improve processor utilization and how these different approaches affect application code. We show that fibers require minimal changes to existing application code and are thus a good approach for retrofitting existing systems.
Building an Asynchronous Server
A typical server consists of a loop that accepts connections and performs some work. We explore basic server designs and how they can be modified to improve scalability.
Here is a basic synchronous server loop:
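A minimal sketch of such a loop, assuming a line-based echo protocol over TCP. The names `handle` and `serve_sync` and the `limit` parameter are ours, chosen for illustration only:

```ruby
require 'socket'

# Echo each line back to the client until it disconnects.
def handle(client)
  while (line = client.gets)
    client.write(line)
  end
ensure
  client.close
end

# Accept and serve clients strictly one after another.
# `limit` is a hypothetical knob so the loop can end in a demo or test;
# with the default of nil it runs forever, as a real server would.
def serve_sync(server, limit: nil)
  served = 0
  until limit && served >= limit
    client = server.accept
    handle(client) # blocks here, so no other client can be accepted meanwhile
    served += 1
  end
end
```

Calling `serve_sync(TCPServer.new('127.0.0.1', 9000))` would start the loop; the address and port are arbitrary.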
It cannot handle more than one client at a time.
We wrap the client connection in a child process. This is a trivial change to existing code. In addition to allowing multiple requests to be handled simultaneously on multiple processor cores, it also isolates the parent process from bugs and security problems in the child process.
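As a rough sketch, assuming the same echo handler and a platform where `fork` is available, the change could look like this. `serve_forking` and `limit` are hypothetical names used only for illustration:

```ruby
require 'socket'

# Echo each line back to the client until it disconnects.
def handle(client)
  while (line = client.gets)
    client.write(line)
  end
ensure
  client.close
end

# Fork one child process per accepted connection.
# `limit` is a hypothetical knob so the loop can end in a demo or test.
def serve_forking(server, limit: nil)
  served = 0
  until limit && served >= limit
    client = server.accept
    pid = fork do
      server.close   # the child never accepts connections itself
      handle(client)
      exit!(0)       # leave without running at_exit handlers inherited from the parent
    end
    client.close        # the parent no longer needs its copy of the socket
    Process.detach(pid) # reap the child in the background to avoid zombies
    served += 1
  end
end
```

The only structural difference from the synchronous loop is the `fork` block around `handle`; the handler itself is unchanged.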
This is a very robust design that can be easily applied to existing servers. However, creating many child processes can consume a lot of memory, and context switching overhead and latency might also be a concern.
Rather than using a child process, we can use a thread in the same process. We lose the benefits of isolation. On modern systems the difference in performance is minor. Some systems (e.g. JVM) don't support fork, while others (e.g. MRI) don't have truly independent threads.
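A comparable sketch with one thread per connection, under the same assumptions (echo protocol, hypothetical names `serve_threaded` and `limit`):

```ruby
require 'socket'

# Echo each line back to the client until it disconnects.
def handle(client)
  while (line = client.gets)
    client.write(line)
  end
ensure
  client.close
end

# Start one thread per accepted connection.
# `limit` is a hypothetical knob so the loop can end in a demo or test.
def serve_threaded(server, limit: nil)
  served = 0
  until limit && served >= limit
    client = server.accept
    # Pass the socket as an argument so each thread keeps its own reference
    # even after `client` is reassigned by the next accept.
    Thread.new(client) { |socket| handle(socket) }
    served += 1
  end
end
```

Unlike the forking version, an unhandled fault in one handler now shares an address space with every other connection.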
In practice, forks and threads are at odds with each other. If you try to fork while there are active threads, you will very likely run into bugs. Even if you didn't create the thread, some other library might have, so it can be very tricky in practice.
Rather than using a thread, we can use a fiber. The fiber must yield back to the reactor if an operation would block. The reactor is responsible for resuming the fibers when the operation can continue without blocking.
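To make the division of labour between fiber and reactor concrete, here is a deliberately tiny sketch built on `IO.select`. The `Reactor` class, `serve_fibers` and `limit` are hypothetical and only track readability; this is not how any particular library implements its event loop:

```ruby
require 'socket'
require 'fiber' # only needed for Fiber.current on Ruby older than 3.0

# Toy reactor: remembers which fiber is waiting on which IO.
class Reactor
  def initialize
    @readers = {} # io => fiber waiting for it to become readable
  end

  # Start a fiber and run it until it first has to wait.
  def async(&block)
    Fiber.new(&block).resume
  end

  # Called from inside a fiber: yield back to the reactor until `io` is readable.
  def wait_readable(io)
    @readers[io] = Fiber.current
    Fiber.yield
  end

  # Resume waiting fibers as their IO becomes ready, until none are left.
  def run
    until @readers.empty?
      ready, = IO.select(@readers.keys)
      ready.each { |io| @readers.delete(io).resume }
    end
  end
end

# Reads top to bottom like the threaded handler, but never blocks on a read.
def handle(reactor, client)
  loop do
    reactor.wait_readable(client)
    data = client.read_nonblock(1024, exception: false)
    break if data.nil?              # client disconnected
    next if data == :wait_readable  # not ready after all, wait again
    client.write(data) # still a blocking write, kept for brevity
  end
ensure
  client.close
end

# `limit` is a hypothetical knob so the accept fiber can finish in a demo or test.
def serve_fibers(reactor, server, limit: nil)
  reactor.async do
    accepted = 0
    until limit && accepted >= limit
      reactor.wait_readable(server)
      client = server.accept_nonblock(exception: false)
      next if client == :wait_readable
      accepted += 1
      reactor.async { handle(reactor, client) }
    end
  end
end
```

After `serve_fibers(reactor, server)`, calling `reactor.run` drives every connection on a single thread. A fuller reactor would also wait for writability and handle timeouts.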
Fibers have less overhead than processes and threads. While one might be comfortable with thousands of threads, fibers can context switch hundreds of millions of times per second on a modern processor core.
Here is a simpler version using the async and async-io gems:
Rather than using fibers, you can replace the loop with a callback. The reactor invokes a specific function when a certain event is triggered. While fibers naturally execute code in order, just like threads, callbacks often need an associated state machine to achieve the same level of functionality.
Callbacks are the simplest form of concurrency and have the lowest overhead, but in exchange they move the complexity of state tracking into user code (also known as callback hell).
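The state tracking burden can be seen in a small sketch. Because no stack survives between callbacks, the partially received line has to live in an object. `CallbackReactor`, `LineEcho` and `serve_callbacks` are hypothetical names, and the reactor again only watches for readability:

```ruby
require 'socket'

# Toy reactor: invokes a callback whenever a registered IO becomes readable.
class CallbackReactor
  def initialize
    @callbacks = {} # io => proc to invoke when readable
  end

  def on_readable(io, &callback)
    @callbacks[io] = callback
  end

  def remove(io)
    @callbacks.delete(io)
  end

  def run
    until @callbacks.empty?
      ready, = IO.select(@callbacks.keys)
      ready.each { |io| @callbacks[io]&.call }
    end
  end
end

# Per-connection state machine: buffers input until a full line has arrived.
class LineEcho
  def initialize(reactor, client)
    @reactor = reactor
    @client = client
    @buffer = +''
    reactor.on_readable(client) { on_data }
  end

  def on_data
    chunk = @client.read_nonblock(1024, exception: false)
    return if chunk == :wait_readable
    return finish if chunk.nil? # client disconnected

    @buffer << chunk
    # Echo every complete line; keep any trailing partial line buffered.
    while (line = @buffer.slice!(/\A.*\n/))
      @client.write(line)
    end
  end

  def finish
    @reactor.remove(@client)
    @client.close
  end
end

# `limit` is a hypothetical knob so the reactor can run dry in a demo or test.
def serve_callbacks(reactor, server, limit: nil)
  accepted = 0
  reactor.on_readable(server) do
    client = server.accept_nonblock(exception: false)
    next if client == :wait_readable
    LineEcho.new(reactor, client)
    accepted += 1
    reactor.remove(server) if limit && accepted >= limit
  end
end
```

Compare `LineEcho` with the plain `while (line = client.gets)` loop of the earlier designs: the behaviour is the same, but the control flow is now spread across callbacks and instance variables.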
Asynchronicity should be a property of how the program is executed, not what it does.
A hybrid of parallelism and concurrency is required to maximise scalability. We must choose multi-process, multi-thread, or both for parallelism and, additionally, a model for concurrency. Asynchronous fibers are the only model for concurrency which can work with existing code bases in a completely transparent way, and are thus the right solution for bringing concurrency to existing web applications.
The first decision, which is largely dictated by the system, is whether to use processes or threads for parallelism. As long as you assume that the server is handling discrete requests and responses, both approaches are equally scalable in practice. async-container provides both and they can be interchanged.
Not all platforms support both options equally well. For example, Windows and Java do not support fork. Threads on MRI are not truly independent and thus do not scale as well.
Generally speaking, multi-process is more predictable and has better isolation (e.g. security, reliability, restartability). Threads naturally have less isolation, and this can make code more complex and cause bugs. If one thread crashes, it can take down the entire system.
While the model for parallelism doesn't generally affect how the server is implemented, the model for concurrency most certainly does. The default option, do nothing, is the simplest. Your code will execute from top to bottom, in a predictable way.
If you have existing code, multi-process or multi-thread can be a good approach. You cannot easily retrofit this code with promises, callbacks and other explicit forms of concurrency, because they invert flow control and require major structural changes.
Fibers, on the other hand, are semantically similar to threads (excepting parallelism), but with less overhead. You can run whatever code you want in a fiber and by default it won't behave any worse than if it were running in its own thread or process. However, if you leverage non-blocking I/O, you can perform significantly better and, with the same resources, handle significantly more active connections.
Some models expose both synchronous and asynchronous methods. Such lazy interfaces burden the application code with irrelevant concurrency complexity. Every operation that can be asynchronous should be, wherever possible, and making it so shouldn't require changing the method signature.
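One way to picture a single signature serving both modes is a read method that asks whether a reactor is driving the current thread. This is a sketch under our own assumptions: `Reactor`, `Reactor.current`, `Reactor.run` and `read_chunk` are hypothetical and not the API of any library mentioned here:

```ruby
require 'fiber' # only needed for Fiber.current on Ruby older than 3.0

# Toy reactor that can be looked up from application code.
class Reactor
  # The reactor driving the current thread, or nil when running synchronously.
  def self.current
    Thread.current.thread_variable_get(:reactor)
  end

  # Install a reactor for the duration of the block, then run its event loop.
  def self.run
    reactor = new
    Thread.current.thread_variable_set(:reactor, reactor)
    yield reactor
    reactor.run_loop
  ensure
    Thread.current.thread_variable_set(:reactor, nil)
  end

  def initialize
    @readers = {} # io => fiber waiting for it to become readable
  end

  def async(&block)
    Fiber.new(&block).resume
  end

  def wait_readable(io)
    @readers[io] = Fiber.current
    Fiber.yield
  end

  def run_loop
    until @readers.empty?
      ready, = IO.select(@readers.keys)
      ready.each { |io| @readers.delete(io).resume }
    end
  end
end

# Application code: one method, one signature, no async variant.
def read_chunk(io)
  loop do
    data = io.read_nonblock(1024, exception: false)
    return data unless data == :wait_readable

    if (reactor = Reactor.current)
      reactor.wait_readable(io) # inside a reactor: let other fibers run
    else
      IO.select([io])           # no reactor: block the thread as usual
    end
  end
end
```

Called at the top level, `read_chunk` simply blocks. Called inside `reactor.async { ... }` within `Reactor.run`, the same call yields while waiting, so how the program is executed changes while the calling code does not.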
Birds of a Feather, Fly Together
Falcon is a web server that supports both multi-thread and multi-process parallelism. For concurrency, it depends on async, which uses fibers. This allows existing Rack applications to work at least as well as they do on existing servers. If they choose to use concurrency-aware libraries, they can achieve significant improvements to both throughput and latency with minimal changes to configuration and zero changes to actual usage.
While completely experimental, async-postgres is a transparent wrapper which makes the pg gem work concurrently when handling multiple long-running queries.
Similarly, async-mysql is a transparent wrapper which makes the mysql2 gem work concurrently when handling multiple long-running queries.
async-http-faraday is a backend for Faraday which makes HTTP requests execute concurrently, using a connection pool for HTTP/1 and multiplexing for HTTP/2, with minimal code changes required.
It's actually possible to make ALL Ruby I/O concurrent by default. While this doesn't extend to native libraries, it does show that it's a feasible approach with minimal changes required to user code.
Multi-process and multi-thread designs provide parallelism and allow servers to use all available processors. Fibers improve scalability further by maximising I/O concurrency with minimal overhead. Callbacks achieve a similar result, but the inverted flow control requires significant changes to existing code. Fibers don't affect visible code flow, and thus make it possible to execute existing code with minimal changes, while potentially improving latency and throughput. Fibers are the best solution for composable, scalable, non-blocking clients and servers.