Applications that process many simultaneous requests are an area where Ruby struggles. The traditional request-per-process/thread model does not scale well. Explicit concurrency, including promises/futures, impose a significant cognitive burden and their use pollutes code all the way up the call chain. We present async, a gem for Ruby that implements the reactor pattern using stackful coroutines (fibers). We discuss asynchronous programming models, touch briefly on the state of asynchronous programming in Ruby, and show how fibers can be a excellent model for concurrency.
What is asynchronous programming?
A computer program is defined by a set of operations. A synchronous program is one where these operations happen one after the other. The completion of one operation preceeds the start of the next. Synchronous programs will incur latency equal to the sum of all the operations.
If a program can be broken up into a series of operations that can be executed independently, it can be considered asynchronous. The completion of some operations can happen independently of others. Asynchronous programs can improve processor utilization, increasing throughput and lowering latency.
Modern computers utilize both concurrency and parallelism to execute computer programs asynchronously. These concepts are used to describe how operations are scheduled and executed on the computer's processors.
Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
Rob Pike
Parallel Programs
Parallel programs are those that execute tasks simulateously on multiple processors, typically using processes or threads. Parallel programs are good for CPU-bound workloads because tasks can be executed on independent processors at full speed with minimal contention.
Concurrent Programs
Concurrent programs interleave task execution on a single processor, typically by cooperative or preemptive scheduling. Concurrent programs are good for IO-bound workloads because tasks share the processor while they are waiting on high-latency IO operations.
What makes asynchronous programs slow?
In an ideal world, asynchronous programs run at full speed on as many processors as you have; tasks are scheduled and executed as rapidly as possible. Tasks that depend on external state (e.g. disk IO) have to wait for data to become available. We refer to this as blocking due to contention. This can occur in several ways:
- If you have more tasks than processors, you have CPU contention.
- If you are waiting on disk IO operations, you have disk contention.
- If you are waiting on network operations, you have network contention.
- If you have multiple tasks trying to modify shared resource, you have resource contention.
In general, contention increases latency and decreases throughput. Sometimes it's unavoidable. Sometimes you can mitigate contention by pre-fetching required information. However, even down to the CPU instructions, there can be contention (e.g. on shared L3 cache).
If you can schedule another task to run, you can minimise the effect of the blocking operations on other tasks, thereby reducing overall latency and increasing throughput. Hyper-threading, which operates at the CPU level, is one technique that CPU designers use to increase throughput at the expense of latency (due to context switching on the CPU core).
What is the Reactor pattern?
The reactor pattern is simply a way of scheduling blocking operations so that other tasks can run concurrently. Typically, an operation fails because it would block (e.g. read()
fails with EWOULDBLOCK
). We schedule the file descriptor into an event polling mechanism (e.g. select
, epoll
or kqueue
) and suspend the caller. When the operation is ready to continue, the caller is resumed. In some systems, you provide a callback, in other systems, you can use stackless or stackful coroutines to manage state.
Async uses stackful coroutines, otherwise known as fibers (similar to green threads). Fibers allow us to use normal function, loops and conditionals to execute our program. When an operation would block, we yield the fiber back to the reactor, which will resume from that point when the operation can continue. In the mean time, other tasks can execute. Because no new syntax required for blocking operations (they are just normal function calls), it's possible for existing code to work in a reactor with no modifications; we can also avoid callback hell, a problem associated with more explicit forms of concurrency.
Typical reactors handle blocking IO and timers. Some operations don't have non-blocking operating system level interfaces (e.g. gethostbyname()
), and in that case you need to use thread pools or other concurrency primatives. For the most part these can be implemented cleanly with non-blocking interfaces.
State of Ruby in 2018
Ruby (the interpreter) has a global interpreter lock which ensures that only one line of Ruby is executing at any time within a single Ruby process. Even if you have multiple threads, you can't execute Ruby code in parallel without multiple processes. Some specific parts of the interpreter give up this lock when executing blocking system calls, which allows other Ruby threads to execute. The implication of this is that a multi-process design has better throughput and lower latency than multi-thread.
Ruby (the standard library) has a frustrating IO model:
- It has a mixture of blocking and non-blocking operations, e.g.
- A variety of
Socket
and Socket-like classes with slightly different method signatures, e.g. - Large surface area for basic operations, some of which may or may not use internal buffering, e.g.
pread, read, recv, sysread, readpartial, sync=, fsync, flush, close
. There is no general buffering implementation, and so it gets reinvented, e.g. - Many different places where blocking DNS resolution can occur. Many networking classes take strings for addresses which are resolved internally with blocking DNS, e.g.
- At least 3 different ways to invoke non-blocking behavior, the oldest of which destroyed VM performance.
- JRuby doesn't support the standard UNIX socket model due to limitations of the JVM and doesn't implement the same methods as MRI.
The future for an asynchronous Ruby
It's clear at this point that Ruby has a complex legacy. However, going forward, Ruby needs a vision for a future that makes it easy for users to write efficient asynchronous programs. There are many aspects to this problem, but one core issue is providing an interface on which IO operations can work asynchronously.
Ruby already has support for Fibers, which make cooperative scheduling trivial. All that's needed is an interface for waiting on blocking operations. There are many kinds of blocking operations, but the main ones are IO related. Most other blocking operations can be turned into IO operations by using threads and pipes. The "reactor" or "selector" is a common design pattern, where one can register IO of interest and then receive notifications when the IO is ready to be read/write from.
Fortunately, we already have a very stable implementation of the lowest level. All that's required is calling Fiber.yield
and fiber.resume
at the right time. We've made a proposal to implement such hooks into the existing Ruby implementation and we'd love your feedback.
An overview of Async
We actually have a working implementation of the above proposal using wrappers. The async gem provides a stable implementation of a concurrent IO reactor. It provides cooperatively scheduled tasks which yield when an operation would block. It doesn't monkey patch Ruby's standard library (althought perhaps we should), but provides wrappers which in some cases can be used transparently.
The reactor and all its tasks are contained entirely within one thread, which avoids any locking or inter-thread communication which can be a significant overhead in per-process reactor designs. It also avoids typical shared state synchronisation issues between tasks since only one is executing at a given time.
Nested tasks implicity manage resources. If a task makes several HTTP requests, but is soon cancelled, all nested tasks will also be cancelled. This ensures that complex asynchronous processes can be managed with ease and with guaranteed invariants.
Tasks themselves yield when operations would block, and this is completely transparent to the caller, unlike async/await style concurrency (which includes promises/futures). This avoids the implementation details of concurrency bleeding into otherwise understandable synchronous code, and allows previously synchronous code to become (within limitations of the Ruby API) transparently asynchronous.
Practical implementation with Async
We have enjoyed implementing a wide varity of asynchronous programs with Ruby. It's truly a pleasure and the performance is very encouraging.
- Async provides a reactor, the task structure and a generic IO wrapper. This core gem has a small surface area by design.
- Async::IO provides IO wrappers which in many cases are a drop-in replacements for the Ruby standard library. It is likely that the design of this gem will continue to evolve.
- Async::Container provides policy driven concurrency for servers. You write your server function, and you deploy it on as many threads or processes as you have available.
- Async::DNS provides both DNS resolver (client) and server. It's the guts of RubyDNS.
- Async::HTTP provides both HTTP(S) client and server, with experimental support for HTTP/2.
- Async::WebSocket provides a simple example client/server and is a good place to start if you'd just like to try something out.
- Async::Postgres and Async::MySQL provide asynchronous connection adapters for ActiveRecord. It must be used with the falcon web server. It's a bit of an experiment to see what's possible.
- Async::Await transparently wraps your code and makes it asynchronous. It's magic.
Here is an example of non-blocking Ruby using async-await
:
We look forward to a future where such a model can be brought to all of Ruby.