Samuel Williams Saturday, 05 August 2017

I've recently been working on an offline Vulkan renderer/compositor. Our initial implementation was a one-shot renderer: spawn the process, render the image, and exit. However, to amortize startup costs, we are converting it to a multi-shot renderer with an HTTP API. The first implementation simply used vkQueueWaitIdle(), but in a multi-threaded environment this is less than optimal: when multiple command buffers are submitted to the same queue, it waits for all of them to finish, not just the one we care about.

Using a fence allows the CPU to wait for the GPU to complete a specific command buffer, in this case, rendering the image and saving it to host memory.

In our renderer, we tried using a fence (incorrectly) with a timeout of 0, assuming that meant waiting indefinitely. We couldn't get it to work, so we reverted to vkQueueWaitIdle(), which was fine for a one-shot renderer. However, after implementing our multi-threaded renderer and attempting to use fences (again incorrectly), we got corrupt output.

After checking the documentation, I found out we were doing it wrong:

If timeout is zero, then vkWaitForFences does not wait, but simply returns the current state of the fences. VK_TIMEOUT will be returned in this case if the condition is not satisfied, even though no actual wait was performed.
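To make that rule concrete, here is a toy model of it that runs without a GPU. It is not Vulkan code: FakeFence, WaitResult and count_timeouts are names made up for this sketch, and the "GPU work" is just a counter of nanoseconds that each wait is allowed to consume.

```cpp
#include <cstdint>

// Toy stand-ins for vk::Result::eSuccess and vk::Result::eTimeout.
enum class WaitResult { Success, Timeout };

// Hypothetical simulated fence: it becomes signalled once `remaining_ns`
// of pretend GPU work has elapsed.
struct FakeFence {
	std::uint64_t remaining_ns;

	// Mimics the documented rule: let at most `timeout_ns` pass, then
	// report whether the fence is signalled. A timeout of zero lets no
	// time pass, so it only reports the current state.
	WaitResult wait(std::uint64_t timeout_ns) {
		if (timeout_ns >= remaining_ns) {
			remaining_ns = 0;
			return WaitResult::Success;
		}
		remaining_ns -= timeout_ns;
		return WaitResult::Timeout;
	}
};

// Waits repeatedly with the same timeout, giving up after `max_attempts`.
// Returns how many of the attempts timed out.
inline int count_timeouts(FakeFence fence, std::uint64_t timeout_ns, int max_attempts) {
	int timeouts = 0;
	while (timeouts < max_attempts && fence.wait(timeout_ns) == WaitResult::Timeout)
		++timeouts;
	return timeouts;
}
```

In this model, a fence with 25ms of outstanding work never succeeds when polled with a timeout of 0, no matter how many attempts are made, while waiting in 10ms slices times out twice and then succeeds. An already signalled fence returns success even with a timeout of 0, which is why a zero timeout is a poll rather than an error.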

The correct implementation uses a loop; we can issue a warning if the job appears to be taking longer than expected:

Console::debug("Submitting command buffer to GPU...");

// The Vulkan device:
vk::Device & device = _context->device();

// The command buffer we want to submit:
auto submits = vk::SubmitInfo()
	.setCommandBufferCount(1).setPCommandBuffers(&commands);

// The queue we are going to submit to:
auto queue = device.getQueue(graphics_queue, 0);

// Generate a temporary fence:
auto fence = device.createFenceUnique({});

// Submit the command buffer to the queue with the fence:
queue.submit(1, &submits, *fence);

// Loop until the fence is signalled:
while (true) {
	// Wait for 10ms for the render to complete:
	auto result = device.waitForFences(*fence, true, 10000000);
	
	// Check the result - if it's successful we are done:
	if (result == vk::Result::eSuccess)
		break;
	
	// Otherwise, the wait either timed out (the render is taking longer than 10ms) or failed:
	Console::warn("Wait for fence: ", vk::to_string(result));
	
	// If the result wasn't a timeout (e.g. error), we fail:
	if (result != vk::Result::eTimeout)
		throw std::runtime_error("renderer failed");
}
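The timeout passed to vkWaitForFences is in nanoseconds, so the literal 10000000 in the loop above is 10ms. Counting zeros is easy to get wrong, so one option is to derive the value from std::chrono. The helper below is only a sketch: fence_timeout is a hypothetical name, not part of Vulkan or Vulkan-Hpp, and it assumes the duration is non-negative.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert any std::chrono duration into the
// nanosecond count that vkWaitForFences expects as its timeout.
// Assumes a non-negative duration; a negative one would wrap around.
template <typename Rep, typename Period>
constexpr std::uint64_t fence_timeout(std::chrono::duration<Rep, Period> duration) {
	return static_cast<std::uint64_t>(
		std::chrono::duration_cast<std::chrono::nanoseconds>(duration).count());
}

// Checked at compile time: 10ms is the value used in the wait loop.
static_assert(fence_timeout(std::chrono::milliseconds(10)) == 10000000,
	"10ms should be 10,000,000ns");
```

With this in place, the wait could be written as device.waitForFences(*fence, true, fence_timeout(std::chrono::milliseconds(10))), which keeps the unit visible at the call site.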

In hindsight, this was a relatively trivial problem, but it highlighted the fact that Vulkan can be hard to comprehend in its entirety. I didn't write the original fence code, so, not knowing any better, I initially suspected a problem with image barriers. When code gets bulky, refactoring and the subsequent debugging become harder.