Over the past seven years, I have focused on enhancing concurrency, scalability, and interactivity within the Ruby ecosystem. In 2017, I released Async, a framework for building concurrent Ruby applications, but the initial design required wrappers for intercepting blocking operations, which limited compatibility with existing code. To address these limitations, I created the fiber scheduler, which was introduced in Ruby 3.1, alongside Async 2, in 2021. By transparently redirecting blocking operations to an event loop, the fiber scheduler enables existing Ruby code to run concurrently without modification.
Despite these advancements, the quest for real-time web applications posed further challenges. In 2022, we released Rack 3, which made streaming a mandatory part of the specification. These changes were adopted by Rails 7.1, and further enhanced in Rails 7.2. When combined with the Falcon web server, applications are capable of handling thousands of real-time connections, expanding the possibilities for interactivity. To best leverage these capabilities, new approaches to application design are required. We will discuss the evolution of these technologies, and show how to take advantage of them to build experiences that were previously impossible in Ruby.
The Birth of Async
My first servers were connected to the internet using an ADSL modem. This photo from 2002 shows the basic setup I had in my bedroom:
I used this hardware to experiment with Linux and internet hosting. I was fascinated by the idea of running my own servers, and I wanted to learn how to build and deploy my own public websites. There were many challenges with this setup: My internet connection was slow, and I only had a single dynamically assigned IPv4 address. I used dynamic DNS so that external clients could connect to my servers using public hostnames. Port forwarding from the ADSL modem's external interface to the internal server was used to route traffic to the correct machine. However, when trying to access those hostnames from within my local network, the DNS records would resolve to the internal interface of the modem, and port forwarding would not work. This meant that I could not access my own websites from within my local network.
My solution to this problem was to run a DNS server on my local network. This server would intercept requests for my website's hostname and return the internal IP address of the server. All other requests would be forwarded to the DNS server provided by the ADSL modem. For a long time I used bind for this purpose, but I had to manually configure every hostname and zone, which was cumbersome. A few years later, I started learning Ruby, and I thought it would be a fun project to write a custom DNS server to handle this task. Little did I know how deep the rabbit hole would go.
My initial design from 2009 used a single thread per request, which easily became overwhelmed. When I tried to use it on my home network, I noticed browsing the web became much slower, or didn't work at all, as the server was overloaded. Around that time, Node.js was released, and I started to learn about event loops and non-blocking IO. I wanted to apply these concepts to my DNS server, but I found it difficult to integrate with Ruby's existing IO libraries. I started to experiment with different concurrency models, and this is where the story of Async begins.
EventMachine
A few years later, in 2011, I became aware of EventMachine, and decided to upgrade my DNS server implementation to use it. This allowed me to handle multiple concurrent requests, but I had to rewrite my code to suit its callback-based programming model. When I tried to integrate existing libraries like Net::HTTP, I found compatibility to be poor and the callback-based programming model cumbersome and error-prone. In addition, EventMachine at the time could crash under certain conditions, and did not support IPv6, which was becoming increasingly important, so I wasn't entirely happy with the new implementation.
- Callback-Driven Programming Model: Leads to complex, hard-to-maintain code often referred to as "callback hell."
- Limited Concurrency Model: Single event loop doesn't fully utilize multicore processors, causing performance bottlenecks in CPU-bound tasks.
- Blocking Operations: Blocking methods can freeze the event loop, making the application unresponsive, requiring workarounds.
- Difficult Error Handling: Managing errors within callbacks is tricky, leading to fragile code if not handled carefully.
- Compatibility: Due to many of the above points, existing libraries and applications often need to be rewritten to work with EventMachine.
- Steep Learning Curve: The complexity of event-driven programming in EventMachine can be daunting for newcomers.
Celluloid
In 2013, I heard about Celluloid - a framework for building scalable applications in Ruby. It provided an event-driven actor capable of handling multiplexed IO with non-blocking compatibility shims around Ruby's native IO. I was excited by the idea of actors, which I thought would be a good fit for my DNS server, and an improvement over the EventMachine implementation. I decided to update my DNS server with Celluloid. I had to rewrite my code again, and I started to notice my tests were failing in strange ways. I discovered that Celluloid's actors were global, and state was leaking between tests. I started working on Celluloid itself to address these issues, but found the internal implementation challenging due to hard-coded constraints and design decisions that were difficult to follow. These limitations likely contributed to the project's lack of progress towards a stable release.
- Global State and Actors: Actors were global, making it difficult to isolate state.
- Asynchronous Error Handling: The system of supervisors and linked actors could lead to cascading failures, complicating error recovery and debugging.
- Complexity of Actor Messaging: The asynchronous messaging system introduced additional complexity, making the code harder to reason about.
- Compatibility: Celluloid still failed to integrate nicely with external libraries and applications.
Principles
Based on these experiences, I decided to build my own concurrency framework. In hindsight, I can distill my motivations into a few key principles:
Simplicity: The interface should be intuitive and consistent, and the implementation should be easy to understand and reason about. Complexity should arise from the layering of simple components, not from the design itself. We should avoid introducing new concepts or abstractions unless absolutely necessary, aiming to make the framework as easy to understand and use as possible.
Compatibility: Existing programs should be able to run concurrently without modification. Interfaces that introduce non-determinism should be transparent to the program. We should not need to introduce new keywords, methods or semantics except in places where the user explicitly desires concurrency, and even these should be kept to a minimum.
Isolation: The life-cycle of concurrent tasks and associated resources must be clearly defined, and execution of independent operations should not cause undesirable interactions, even in the presence of errors. Normal sequential behaviour should not be affected by concurrent operations. Concurrency may be an internal implementation detail, and should not affect the public interface or behaviour.
In addition, I made a decision to impose a short timeframe for achieving a stable release, guided by past experiences with projects that never reached this critical milestone. The goal was to prioritize the development of the most essential interfaces and avoid the prolonged instability and indecision often associated with extended development phases. By time-boxing the development, the aim was to ensure Async remained focused on the core problem domain of concurrency.
Other Notable Mentions
Before we talk about Async, I want to mention a few other projects and talks that have influenced my thinking and approach to building concurrent systems in Ruby:
- neverblock (2008): was possibly the first attempt to provide synchronous IO operations within EventMachine using fibers, allowing existing synchronous code to execute concurrently using an event loop. I didn't know about this gem until quite recently, but it shows that there is a long history of people trying to improve Ruby's concurrency.
- em-synchrony (2011): was another early attempt to provide synchronous IO operations within EventMachine using fibers. It included monkey patches for many popular libraries for compatibility with EventMachine, but ultimately shows how hard it is to maintain.
- User-level threads... with threads (2014): is a talk by Paul Turner about how cooperative scheduling can be used to build high-performance concurrent systems. This talk influenced my thinking about the cost of context switching and scheduling operations.
- C++ Coroutines - a negative overhead abstraction (2015): is a talk by Gor Nishanov about how coroutines are a fundamental extension to the concept of a routine that can lead to significantly simpler code. This talk was a major influence on my thinking about how to build concurrent systems.
- Faster IO through io_uring (2019): is a talk by Jens Axboe about the introduction of the io_uring interface in the Linux kernel. This talk was a major influence on my thinking about how to manage non-blocking system calls, specifically asynchronous file system access.
Async 1
After four months of development, in August 2017, I released Async 1. It provided a simple mechanism for running concurrent tasks and hooks for non-blocking IO. Building on existing ideas from the celluloid-io gem, it utilized nio4r for the event loop and a separate async-io gem for compatibility shims. This design allowed existing Ruby code to run concurrently without modification, provided it could use the compatibility shims, and established a foundation for building high-performance network servers and clients.
Async 1 was the best implementation I could create without changes to Ruby itself. I believe it was a success, in that it showed the potential Ruby has for building highly concurrent web applications. I presented the implementation and ideas to Matz at RubyWorld Conference 2019 and explained that expanding Ruby's internal implementation would enable us to improve compatibility. Specifically, we needed a way to transparently redirect blocking operations from Ruby's native interfaces to an event loop. Matz agreed with the general ideas, and supported the development of a fiber scheduler interface for Ruby.
Fiber Scheduler
Two years later, in December 2021, Ruby 3.1 was released with the fiber scheduler interface. It provided a way to transparently redirect Ruby's internal blocking operations to an event loop. The initial implementation only supported a limited set of operations, but over time it has grown to support almost every blocking operation within Ruby, including:
- Waiting on threads, queues and mutexes.
- Waiting on IO, including reads and writes.
- Resolving DNS names to addresses.
- Waiting on child processes.
- Executing code with a timeout.
The fiber scheduler is transparent to application code, and allows existing Ruby code to run concurrently without modification. This is in stark contrast to other languages and libraries, which often require significant changes to the application code, or the use of special syntax or keywords to introduce concurrency, creating significant compatibility problems.
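To make this concrete, here is a toy fiber scheduler — a minimal sketch for illustration, not a production implementation — that implements just enough of the interface to intercept Kernel#sleep, so that three fibers sleeping for 0.1 seconds each finish in about 0.1 seconds total:

```ruby
# A toy fiber scheduler (illustration only): just enough of the interface
# to intercept Kernel#sleep. Real schedulers, such as Async's, implement
# the full interface (IO, DNS, timeouts, blocking primitives) on top of
# an event loop.
class MiniScheduler
  def initialize
    @waiting = {} # fiber => monotonic wake-up time
  end

  # Ruby calls this when a non-blocking fiber invokes Kernel#sleep:
  def kernel_sleep(duration = nil)
    @waiting[Fiber.current] = current_time + (duration || 0)
    Fiber.yield # return control to the scheduler
  end

  # Required interface stubs, unused by this example:
  def io_wait(io, events, timeout) = events
  def block(blocker, timeout = nil) = Fiber.yield
  def unblock(blocker, fiber) = nil

  # Run until every scheduled fiber has finished. Ruby also invokes this
  # automatically when the thread exits.
  def close
    until @waiting.empty?
      now = current_time
      ready = @waiting.select { |_fiber, time| time <= now }.keys
      if ready.empty?
        sleep(@waiting.values.min - now) # the root fiber is blocking, so this really sleeps
      else
        ready.each { |fiber| @waiting.delete(fiber); fiber.resume }
      end
    end
  end

  # Fiber.schedule creates non-blocking fibers through this hook:
  def fiber(&block)
    Fiber.new(blocking: false, &block).tap(&:resume)
  end

  private def current_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

scheduler = MiniScheduler.new
Fiber.set_scheduler(scheduler)

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

3.times do |i|
  Fiber.schedule do
    sleep(0.1) # transparently redirected to MiniScheduler#kernel_sleep
    puts "fiber #{i} woke up"
  end
end

scheduler.close # drain all pending fibers
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "elapsed: #{elapsed.round(2)}s"
```

The key point is the last line: the sleeps overlap, taking around 0.1 seconds in total rather than 0.3, because each one suspends only its own fiber rather than the whole thread. This is the same mechanism that allows Async 2 to run thousands of connections concurrently.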
Async 2
Async 2 was developed alongside the fiber scheduler, and released at the same time in December 2021. Because the fiber scheduler intercepts blocking operations, the Async::IO compatibility shims are no longer needed. As a result, the implementation of Async 2 is significantly simpler than its predecessor, while having improved compatibility with existing Ruby libraries and applications.
IO::Event
At the heart of Async 2 is the io-event gem, which provides a Ruby-optimised event loop capable of handling thousands of concurrent connections. In fact, we have significantly exceeded the carrying capacity of Ruby's garbage collector, which is now the main bottleneck. This event loop now defaults to io_uring on Linux, giving you access to the latest in high-performance IO interfaces.
More specifically, io_uring is an extremely exciting technology that continues to evolve with each release of the Linux kernel. It provides a way to perform non-blocking system calls using a submission queue and a completion queue. Operations like read and write are enqueued to the submission queue, and the kernel processes those requests in the background, enqueuing the result to the completion queue. This allows us to handle all types of file descriptors, greatly improving the compatibility and performance of the event loop.
Falcon
In order to ground the development of Async in real-world use cases, I created the Falcon web server. It enabled me to explore the capabilities of Async in the context of existing web applications, and to identify areas for improvement. Specifically, I wanted to implement support for HTTP/2, WebSockets, and other modern web technologies, and to ensure that Async could handle the demands of real-time interactive web applications. While working on these features, I discovered that the design of Rack had weaknesses that needed to be addressed. Falcon served as a foundation for exploring these issues, and helped to inform the development of Rack 3.
Rack 3
As part of the effort to improve the performance and scalability of Ruby web applications, we released Rack 3 in 2022. Streaming (and an explicit model for when buffering was allowed) was a key feature of this release, allowing web applications to send data to the client as it becomes available, rather than waiting for the entire response to be generated. This change is particularly useful for real-time applications, such as chat rooms, live dashboards, and multiplayer games, where low latency and high interactivity are critical.
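For illustration, here is a minimal sketch of a Rack 3 streaming response: instead of an Enumerable body, the body responds to call(stream), and the server invokes it with a stream once the headers have been sent.

```ruby
# A minimal Rack 3 application with a streaming body. In Rack 3, the
# response body may respond to #call(stream) instead of #each; the
# server invokes it with a stream after sending the headers.
app = lambda do |env|
  body = proc do |stream|
    3.times do |i|
      stream.write("chunk #{i}\n") # delivered to the client as it is written
      # sleep 1 # e.g. wait for the next event in a live dashboard
    end
  ensure
    stream.close
  end

  [200, { "content-type" => "text/plain" }, body]
end
```

A server such as Falcon detects the callable body and writes each chunk as it is produced (chunked transfer encoding on HTTP/1, data frames on HTTP/2), rather than buffering the entire response before sending it.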
Rails 7
Rails 7.0 introduced the initial support for fiber-per-request. This change was a significant step, enabling Falcon to run Rails applications without leaking state between requests.
Rails 7.1 introduced initial support for Rack 3. When I started working on this part of Rails, it felt a bit like an archaeological dig - there is a lot of history in the Rails codebase, and it can be difficult to understand the motivations behind the implementation. However, I was able to work with the Rails team to integrate the new features of Rack 3, and to ensure that Rails applications could take advantage of the improved performance and scalability of the new version of Rack. I greatly appreciate their support and guidance.
Rails 7.2 introduced with_connection, a significant change to the way ActiveRecord manages database connections. Holding on to a connection for the duration of a request is a common pattern in Rails, but it can lead to significant contention on the connection pool, especially in long-running requests like WebSockets. with_connection holds the connection only for the duration of the block, and then returns it to the pool, reducing contention and improving the efficiency of the connection pool.
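The effect is easy to demonstrate with a toy connection pool — a sketch, not ActiveRecord's actual implementation — where each request checks a connection out only while it is querying:

```ruby
# A toy connection pool (not ActiveRecord's implementation) showing the
# with_connection pattern: check out only for the duration of the block.
class TinyPool
  def initialize(size)
    @connections = Queue.new
    size.times { |i| @connections << "connection-#{i}" }
  end

  def with_connection
    connection = @connections.pop # blocks if the pool is exhausted
    yield connection
  ensure
    @connections << connection if connection
  end
end

pool = TinyPool.new(2)
results = Queue.new

# Ten concurrent "requests" share two connections without starving each
# other, because each one holds a connection only while it is "querying".
threads = 10.times.map do |i|
  Thread.new do
    pool.with_connection do |connection|
      results << "request #{i} used #{connection}"
    end
  end
end

threads.each(&:join)
puts "#{results.size} requests completed with a pool of 2"
```

If each "request" instead held its connection for its entire lifetime — as a long-running WebSocket request would under the old pattern — the pool of two would support only two concurrent requests.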
One last bastion of compatibility is ActionCable, and I'm pleased to report that we have an effort to "Adapterize" ActionCable so that we can take advantage of Falcon's high-performance WebSocket support. This change will ultimately allow Rails applications to handle thousands of concurrent ActionCable connections, without separate servers or infrastructure, providing simplified development and deployment for real-time web applications. We are currently aiming to ship this feature with Rails 8.
Live
While it's exciting to transparently improve the concurrency of existing applications, I believe that the real power of Async is in building new applications that take full advantage of the capabilities of the Async ecosystem. The live gem provides an interface for building real-time web applications. It takes advantage of the capabilities of Falcon, Async 2, and Rack 3, and provides a foundation for creating server-side rendered components which provide progressive enhancement to existing web applications. As an example, here is a clock view:
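The following sketch illustrates the shape of such a view. Note that the stand-in Live::View and builder defined here are simplified assumptions so the example runs standalone; with the real gem you subclass Live::View directly, which provides update! and renders over a WebSocket.

```ruby
# Sketch of a Live-style clock view. The stand-in Live::View and builder
# below are simplified assumptions so this example runs standalone; the
# real gem renders HTML and pushes updates to the browser over a WebSocket.
unless defined?(Live)
  module Live
    Builder = Struct.new(:content) do
      def text(string)
        self.content = string.to_s
      end
    end

    class View
      attr_reader :output

      # Stand-in for Live's update!, which re-renders the view; the real
      # implementation sends the new content to the client.
      def update!
        builder = Builder.new
        render(builder)
        @output = builder.content
      end
    end
  end
end

class ClockView < Live::View
  # Called when the view is attached to a page: start a task which
  # re-renders the view every second (the real version uses an Async task).
  def bind(page)
    @update ||= Thread.new do
      loop do
        update!
        sleep 1
      end
    end
  end

  def close
    @update&.kill
  end

  def render(builder)
    builder.text("The time is: #{Time.now}")
  end
end
```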
Which gives a result like this:
The time is: 2024-10-15 06:22:03 +0000
Ruby has had a poor reputation for building real-time interactive web applications - and it's not undeserved - but that all changes with the foundation we have built with Async, Falcon, Rack, and Live.
Flappy Bird
As part of my talk at RubyKaigi 2024, I gave a demonstration of a Flappy Bird clone built using the Live gem and running within a Rails web application. The game is built using server-side rendered "Live View", and uses WebSockets for real-time communication between the client and server. The game is fully interactive, and demonstrates the power of the Ruby ecosystem for building real-time web applications.
This demonstration shatters the misconception that Rails can't be used for real-time web applications. Rails is a popular framework, and despite the significant challenges, there is huge value in supporting progressive enhancement and real-time interactivity within the existing ecosystem. I encourage you to clone the repository and try it out for yourself. It's a fun way to explore the capabilities of the Ruby ecosystem, and to see how far we have come in building real-time web applications.
RubyKaigi 2024 Presentation
You can find the slides for my presentation here.
Lively
Building appropriate foundations is crucial to avoid unnecessary complexity. Many of the frameworks we use today, while powerful, can be challenging for those who are just starting out. The barrier to entry is high, and the learning curve can be steep. New developers often struggle to grasp systems with intricate interfaces that, despite being thoughtfully designed at each stage of their development, can appear confusing or counterintuitive when viewed as a whole.
More specifically, each time we introduce a layer of indirection, we add multiplicative complexity to the whole system. To manage this effectively, it's important to limit the number of options, focusing only on the decisions that truly matter within the problem domain. With this in mind, I created the Lively gem for live programming in Ruby. Lively builds on the foundation of the Live gem, and provides a single-file live coding environment.
Lively enables you to build web applications with a single file, and see the results of your changes in real-time. It provides a simple interface for building interactive applications, and is designed to be easy to use for beginners. I believe that Lively will be a valuable tool for teaching programming, and for exploring the capabilities of the Ruby ecosystem.
Conclusion
The integration of Async 2, Rack 3, Falcon, and Rails 7.2 provides a powerful foundation for building highly interactive and scalable web applications. Due to the design of the fiber scheduler, existing Ruby code can run with improved concurrency, enabling developers to progressively enhance existing applications with real-time interactivity without significant changes to the codebase. The Live gem shows the potential for building new applications that take full advantage of the capabilities of the Async ecosystem, and the Lively gem provides a simple interface for building interactive applications, pushing the boundaries of what is possible within the Ruby ecosystem. I look forward to seeing the innovative applications that Ruby developers will create using these tools.