On a 3GHz (3 billion hertz) processor, you expect to be able to context switch billions of times per second? (coder543)
Modern CPUs are incredible, and switching between stackful coroutines has about the same overhead as a function call. Here is the implementation I wrote:
.globl coroutine_transfer
coroutine_transfer:
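# Arguments: %rdi = address where the caller's stack pointer is saved, %rsi = address holding the callee's stack pointer
# Save caller state (the callee-saved registers)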
pushq %rbp
pushq %rbx
pushq %r12
pushq %r13
pushq %r14
pushq %r15
# Save caller stack pointer
movq %rsp, (%rdi)
# Restore callee stack pointer
movq (%rsi), %rsp
# Restore callee state (the registers it saved when it last switched away)
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbx
popq %rbp
# Put the first argument into the return value
movq %rdi, %rax
# We pop the return address and jump to it
ret
On my (objectively ancient) Linux desktop, on a single core, I get on the order of 100 million context switches per second. Across all 8 cores, this approaches 1 billion.
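As a rough sanity check on those figures, here is a back-of-the-envelope sketch of the cycle budget per switch. It assumes the 3 GHz clock from the quoted question (the desktop's actual clock speed isn't stated here) and perfect scaling across cores; the function names are made up for illustration.

```python
# Back-of-the-envelope cycle budget per context switch.
# Every constant is an assumption taken from the thread, not a measurement.
CLOCK_HZ = 3_000_000_000          # 3 GHz, borrowed from the quoted question
SWITCHES_PER_CORE = 100_000_000   # "on the order of 100 million" per second
CORES = 8


def cycles_per_switch(clock_hz: int, switches_per_second: int) -> float:
    """Average number of clock cycles available for each switch on one core."""
    return clock_hz / switches_per_second


def aggregate_switches(switches_per_core: int, cores: int) -> int:
    """Upper bound across all cores, assuming perfect scaling."""
    return switches_per_core * cores


# Budget per switch at the single-core rate quoted above.
print(cycles_per_switch(CLOCK_HZ, SWITCHES_PER_CORE))  # 30.0

# Best-case total across all cores.
print(aggregate_switches(SWITCHES_PER_CORE, CORES))    # 800000000

# Budget per switch needed to reach 1 billion per second on a single core.
print(cycles_per_switch(CLOCK_HZ, 1_000_000_000))      # 3.0
```

Under those assumptions, each switch has a budget of about 30 cycles, which is in function-call territory. A billion per second on one core would leave only about 3 cycles per switch, so that figure is only within reach in aggregate across cores.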
That being said, my original remark that it was possible to context switch billions of times per second was too casual and not backed by evidence. At best it was unclear, and at worst it was off by an order of magnitude. So, I apologise for any confusion and have updated the article.
The source code is available and you can run the benchmark yourself.