Samuel Williams Monday, 13 April 2020

Ruby provides a mixed bag of tools for building asynchronous systems. While CRuby wraps all Ruby code in a Global Virtual Machine Lock (GVL), JRuby and TruffleRuby provide truly parallel threads. Code analysis reveals that thread synchronisation primitives are often used incorrectly, and while results may be okay on CRuby, running the same code on JRuby or TruffleRuby can expose race conditions with unexpected consequences. We present a light weight per-thread fiber scheduler which improves the concurrency of existing code with minimal changes. We discuss its implementation within Ruby and evaluate its performance using Net::HTTP.

This is the final report required by the 2019 Ruby Association Grant. There are two previous progress reports: December 2019 and January 2020.

Introduction

In 1996, Yukihiro Matsumoto (Matz) introduced an optional Thread primitive to MRI for producer-consumer style programs, inspired by OCaml's implementation of green threads. At the time, multi-core processors were not popular, and so interleaved execution of Ruby code (concurrency) was acceptable performance trade off.

In 2004, Koichi Sasada introduced native threads into YARV, so that blocking system calls would not stall the entire virtual machine (VM). In order to retain compatibility with existing C code which expected non-reentrant execution, the "Giant VM Lock" (GVL) was also introduced. This lock prevents the parallel execution of many parts of the Ruby VM, including Ruby code and C extensions. In 2007, YARV was merged into CRuby. It was assumed that fine-grained locking would eventually be implemented, however it turned out to be difficult due to thread safety and performance issues, and thus CRuby threads are unable to provide true parallelism.

In many situations, C extensions can release the GVL in order to improve parallelism. This can improve the scalability of a multi-threaded program. However, even this is not guaranteed. Thus, it can be challenging building truly scalable systems using threads with CRuby.

In addition, the non-determinism of threads creates a combinatorial explosion of program states. We have analysed existing Ruby code and identified a significant number of thread-safety issues. Even within Ruby itself, non-determinism leads to unpredictable bugs, including garbage collection of weak references, and deadlocks during signal handling.

Threads, as a model of computation, are wildly nondeterministic, and the job of the programmer becomes one of pruning that nondeterminism. Although many research techniques improve the model by offering more effective pruning, I argue that this is approaching the problem backwards. Rather than pruning nondeterminism, we should build from essentially deterministic, composable components. Nondeterminism should be explicitly and judiciously introduced where needed, rather than removed where not needed.Edward A. Lee
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2006-1
January 10, 2006

Modern languages have tackled these problems in a variety of ways. Node.js uses a single-threaded event-driven design, with explicit callbacks. Using async/await style syntax helps to alleviate the need for heavily nested code, but it also adds a new semantic dimension which can be cumbersome in practice.

Go uses a multi-threaded scheduler, where multiple "goroutines" are scheduled across multiple system threads. Some thread-safe constructs are provided by default, but the developer may still be required to understand and manage shared mutable state, and unfortunately, in practice, this model is still not as good as a well optimised event loop.

JRuby and TruffleRuby both provide Thread-level parallelism and as a consequence, programs that worked correctly in CRuby due to the GVL, may work incorrectly on these implementations. While true parallelism is a welcome addition, existing thread safety issues are exacerbated with additional nondeterminsm in the form of preemptive scheduling and simultaneous execution which can lead to data corruption.

Existing work shows that event loops can be used to build scalable systems, however existing implementations must wrap around or replace Ruby constructs. We investigated how to use Fibers to expose non-blocking operations directly within CRuby. In order to work within existing event loops, we implemented a per-thread fiber scheduler which provides hooks for these non-blocking operations. This report presents an overview of the design, and shows how the approach improves the performance of existing real-world programs.

Implementation

You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.R. Buckminster Fuller

Our proposed design requires four minor changes. First, we introduce a thread scheduler interface which provides low level hooks for blocking operations. Then, we introduce non-blocking fibers. We extend threads to track whether the current execution context allows non-blocking operations, and if so, invoke the scheduler hooks. Finally, we change all I/Os to be (internally) non-blocking by default, so that they invoke these hooks when they would otherwise block.

Thread Scheduler

The per-thread fiber scheduler interface is used to intercept blocking operations. A typical implementation would be a wrapper for a gem like EventMachine or Async. This design provides separation of concerns between the event loop implementation and application code. It also allows for layered schedulers which can perform instrumentation.

class Scheduler
	# Wait for the given file descriptor to become readable.
	def wait_readable(fd)
	end
	
	# Wait for the given file descriptor to become writable.
	def wait_writable(fd)
	end
	
	# Wait for the given file descriptor to match the specified events within the specified timeout.
	def wait_for_single_fd(fd, events, timeout)
	end
	
	# Sleep the current task for the specified duration, or forever if not specified.
	def wait_sleep(duration = nil)
	end
	
	# The Ruby virtual machine is going to enter a system level blocking operation.
	def enter_blocking_region
	end
	
	# The Ruby virtual machine has completed the system level blocking operation.
	def exit_blocking_region
	end
	
	# Intercept the creation of a non-blocking fiber.
	def fiber(&block)
		Fiber.new(blocking: false, &block)
	end
	
	# Invoked when the thread exits.
	def run
		# Implement event loop here.
	end
end

Non-blocking Fibers

We introduce the concept of blocking and non-blocking fibers. By default, fibers are blocking and there is no change in behaviour. In contrast, non-blocking fibers may invoke specific scheduler hooks when a blocking operation occurs, and these hooks may introduce context switching points.

Fiber.new(blocking: false) do
	puts Fiber.current.blocking? # false
	
	# May invoke `Thread.scheduler&.wait_readable`.
	io.read(...)
	
	# May invoke `Thread.scheduler&.wait_writable`.
	io.write(...)
	
	# Will invoke `Thread.scheduler&.wait_sleep`.
	sleep(n)
end.resume

We also introduce a new method which simplifies the creation of these non-blocking fibers:

Fiber do
	puts Fiber.current.blocking? # false
end

The purpose of this method is to allow the scheduler to internally decide the policy for when to start the fiber, and whether to use symmetric or asymmetric fibers.

Non-blocking Threads

We introduce the concept of blocking and non-blocking threads. By default, threads are blocking. When switching to a non-blocking fiber, we track this state, and the thread may become non-blocking. When a non-blocking thread invokes a blocking operation, it may defer the operation to the thread scheduler.

puts Thread.current.blocking? # 1 (true)

Fiber.new(blocking: false) do
	puts Thread.current.blocking? # false
end.resume

In addition, locking a mutex causes the thread to become blocking:

mutex = Mutex.new

puts Thread.current.blocking? # 1 (true)

Fiber.new(blocking: false) do
	puts Thread.current.blocking? # false
	
	mutex.synchronize do
		puts Thread.current.blocking? # (1) true
	end
	
	puts Thread.current.blocking? # false
end.resume

Non-blocking I/O

Internally, all Ruby I/O operations can handle EAGAIN/EWOULDBLOCK. By making Ruby I/O nonblocking by default (OS permitting), we opportunistically route all operations via the scheduler. Here is an example of constructing a non-blocking socket:

static int
rsock_socket0(int domain, int type, int proto)
{
#ifdef SOCK_CLOEXEC
	type |= SOCK_CLOEXEC;
#endif

#ifdef SOCK_NONBLOCK
	type |= SOCK_NONBLOCK;
#endif

	int result = socket(domain, type, proto);

	if (result == -1)
		return -1;

	rb_fd_fix_cloexec(result);

#ifndef SOCK_NONBLOCK
	rsock_make_fd_nonblock(result);
#endif

	return result;
}

Evaluation

Given the above modifications, we can compile and run a simple example using Net::HTTP and show that scalability is improved with minimal changes to user code.

def fetch_topics(topics)
	counts = {}
	
	conditions = topics.map do |topic|
		condition = Scheduler::Condition.new
		
		Fiber.new(blocking: Fiber.current.blocking?) do
			uri = URI("https://www.google.com/search?q=#{topic}")
			counts[topic] = Net::HTTP.get(uri).scan(topic).size
			
			condition.signal
		end.resume
		
		condition
	end
	
	# Wait for all requests to finish:
	conditions.each(&:wait)
	
	return counts
end

We run the above code with 1-8 topics, measuring the execution time:

While we can't show the benefits of this specific implementation in real world code yet, Async implements a similar event-driven model and has shown improvements in GitHub Changelog Generator and Travis.rb. In both these cases, Async::HTTP::Faraday provides a relatively transparent way to improve concurrency.

Project Schedule

November 2019
Add support for Scheduler#enter_blocking_region and Scheduler#exit_blocking_region.
December 2019
Meetings with Matz and Koichi in Japan to discuss implementation and long term plans. Details are in the previous report.
January 2020
Rework scheduler implementation and improve performance.
Implemented prototype scheduler in Async
March 2020
Add Kernel::Fiber method and scheduler hook.
Fix timeout handling and bind scheduler.run to thread exit.
Improvements to tests. Implement support for thread blocking which bypasses scheduler.
April 2020
Meeting with Matz and Koichi online to discuss progress.
Add support for non-blocking blocking fibers and threads.
Make all I/O non-blocking by default where possible.
Add more tests and documentation.

The schedule was delayed because many experiments were required. It was discussed with the mentor, Koichi Sasada, who approved the extension.

Conclusion

We have shown how non-blocking fibers can be used to improve the concurrency of existing code. We have described a model for intercepting blocking operations on a per-thread basis. We demonstrated a working implementation, showing significant improvements to the concurrency of Net::HTTP with zero changes to the underlying implementation. Finally, we have shown that a similar event-driven model can be used in real world code with similar performance improvements and ergonomics.

We have submitted a proposal for consideration by the Ruby core developers. We will discuss it and hopefully find consensus, and then we will update our implementation. All things going well, merge it in time for Ruby 3.

Sponsorship

Many thanks to the 2019 Ruby Association Grant and all my sponsors who help enable this work.

If you are a company and you depend on Ruby, I invite you to support my work. I also provide commercial support agreements and contracting for businesses that want to improve their infrastructure using Falcon and Async. Feel free to get in touch.

Comments

Leave a comment

Please note, comments must be formatted using Markdown. Links can be enclosed in angle brackets, e.g. <www.codeotaku.com>.