Asynchronicity should be a property of how the program is executed, not what it does.
Ruby currently implements mutually exclusive threads and exposes both blocking and non-blocking operations. It also supports Fibers, which can be used to implement cooperatively scheduled event-driven IO. The cognitive burden of dealing with these different APIs is left as an exercise to the programmer, and thus we have a wide range of IO libraries with varying degrees of concurrency. Composability of components built on different underlying IO libraries is generally poor because each library exposes its own API and has its own underlying event loop. We present an approach to concurrency that scales well and avoids the need to change existing programs.
Improving Concurrency and Composability
Fibers are a negative overhead abstraction for concurrency, with each fiber representing a synchronous set of operations, and multiple fibers executing cooperatively in a single thread. This design provides concurrency with none of the overheads of parallel (multi-threaded) programming. Programmers can write their code as if it were sequential, which is easy to reason about, but when an operation would block, other fibers are given a chance to execute. Excellent scalability on Ruby is achieved by running multiple processes, each with its own event loop, and many fibers.
Here is an example of a basic asynchronous read() operation. It is possible to inject such wrappers into existing code, and they will work concurrently without any further changes:
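A minimal sketch of such a wrapper (the class names here are illustrative, not the post's exact code): it retries a nonblocking read, deferring to a selector whenever the descriptor isn't ready. A trivial selector that simply blocks via IO.select is included so the sketch is self-contained.

```ruby
# Trivial selector: just block until the descriptor is readable.
# A fiber-aware selector could park the current fiber here instead.
class BlockingSelector
  def wait_readable(io)
    IO.select([io])
  end
end

# Wrapper that presents a synchronous read() built on nonblocking IO.
class AsyncIO
  def initialize(io, selector)
    @io = io
    @selector = selector
  end

  def read(length)
    loop do
      result = @io.read_nonblock(length, exception: false)

      case result
      when :wait_readable
        @selector.wait_readable(@io) # not ready: let the selector decide
      when nil
        return nil # EOF
      else
        return result
      end
    end
  end
end
```

With a fiber-aware selector, the same wrapper becomes cooperative without any change to the calling code, which is exactly the property being injected here.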
What might wait_readable() look like? In a simple implementation, the current fiber registers itself with a selector and yields until the descriptor becomes readable:
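A sketch of such a selector, assuming hypothetical names (Selector, run) rather than any final API: wait_readable parks the current fiber, and run resumes fibers whose descriptors have become readable.

```ruby
require "fiber"

class Selector
  def initialize
    @readable = {}
  end

  # Park the current fiber until `io` is readable.
  def wait_readable(io)
    @readable[io] = Fiber.current
    Fiber.yield
  end

  # Event loop: resume fibers as their descriptors become ready.
  def run
    until @readable.empty?
      ready, = IO.select(@readable.keys)
      ready.each do |io|
        @readable.delete(io).resume
      end
    end
  end
end
```

The key point is that blocking is expressed as yielding the fiber, so many such operations interleave on one thread.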
The problem with this design is that everyone has to agree on a wrapper and selector implementation. We already have a core IO layer in Ruby that practically everyone uses. Along with IO.select(...), we have a ton of options for event-driven concurrency, including but not limited to: NIO4R (alive), Async (alive), LightIO (experimental), EventMachine (undead), ruby-io (experimental).
The best boundary for event-driven IO loops in Ruby is per-thread (or, taking the GIL into account, per-process). Event-driven IO is naturally cooperative, and scheduling operations across threads makes it needlessly complicated. We can leverage Ruby's existing IO implementation by intercepting calls to io.wait_readable() and io.wait_writable() and redirecting them to Thread.current.selector. We add an appropriate C API for Thread.current.selector and a layer of indirection to int rb_io_wait_readable(int f) (and others):
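Sketched in Ruby rather than C: the indirection checks whether the current thread has a selector installed and, if so, defers to it; otherwise it falls back to the default blocking behavior. Thread.current.selector is the proposed accessor, so a thread variable stands in for it here (an assumption, not the real API).

```ruby
# Ruby-level model of the proposed rb_io_wait_readable indirection.
def io_wait_readable(io)
  if (selector = Thread.current.thread_variable_get(:selector))
    # Event-driven path: redirect the wait to the per-thread selector.
    selector.wait_readable(io)
  else
    # Default path: block, as Ruby does today.
    IO.select([io])
  end
end
```

Code that never installs a selector keeps today's blocking semantics, which is why no existing programs need to change.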
Here is an example of how this fits together:
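A self-contained sketch of the pieces working together, with illustrative names (Selector, async_read) rather than the proposed core API: a fiber attempts a nonblocking read, parks itself in the selector when the pipe is empty, and is resumed once data arrives.

```ruby
require "fiber"

# Minimal per-thread selector: fibers park until their IO is readable.
class Selector
  def initialize
    @waiting = {}
  end

  def wait_readable(io)
    @waiting[io] = Fiber.current
    Fiber.yield
  end

  def run
    until @waiting.empty?
      ready, = IO.select(@waiting.keys)
      ready.each { |io| @waiting.delete(io).resume }
    end
  end
end

# Nonblocking read that defers to the selector when no data is ready.
def async_read(io, selector, length)
  loop do
    chunk = io.read_nonblock(length, exception: false)
    return chunk unless chunk == :wait_readable
    selector.wait_readable(io)
  end
end

selector = Selector.new
reader, writer = IO.pipe

results = []
Fiber.new { results << async_read(reader, selector, 128) }.resume

writer.write("Hello World")
selector.run
```

The fiber's code reads sequentially; all the event-driven machinery lives in the selector behind it.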
This design has a very minimal surface area and allows reuse of existing event loops (e.g. EventMachine, NIO4R). It's also trivial for other Rubies (e.g. JRuby, Rubinius, TruffleRuby) to implement.
While it's hard to make objective comparisons since this is a feature addition rather than a performance improvement, we can at least look at some benchmarks from async-http and async-postgres, which implement the wrapper approach discussed above.
Puma scales up to its configured limits. Falcon scales up until all cores are pegged.
The code is available here and the Ruby bug report has more details. There is a PR tracking changes.
The goal of these changes is to improve the composability and performance of async. I've implemented the wrapper approach in async-io, and it's proven to be a good model in several higher-level libraries, including async-dns, async-http and async-http-faraday.
It seems kind of an unfair comparison to use a single process in Puma vs. 8 processes in Falcon.
Just from some quick math it seems like Falcon would still win by a large margin even if Puma was running 8 processes, so I think it’s unnecessary to “cheat” in order to look good no?
Puma is running with 16 threads, which in practice allow it to use all 8 cores to the same extent as Falcon. So I think it's fair.