Samuel Williams Saturday, 03 September 2011

I was interested to learn more about garbage collection. The idea is not complex, but obviously writing a fast, portable implementation that deals with all the edge cases is a challenge.

However, I just wanted to try something simple - detecting when pointers are no longer directly on the stack is a good starting point, so I implemented that with a very basic mark and sweep algorithm.

//
//  main.cpp
//  GarbageCollector
//
//  Created by Samuel Williams on 2/09/11.
//  Copyright 2011 Orion Transfer Ltd. All rights reserved.
//

#include <iostream>
#include <map>
#include <cstddef>
#include <cstdlib>
#include <cstring>

/// A very basic stack-only garbage collector.
/// This was to enjoy writing some code to see how garbage collection works.
/// Everything else is just an implementation detail or optimisation =p
class GarbageCollector {
	protected:
		struct Allocation {
			std::size_t size;
			unsigned short mark;
		};
		
		typedef std::map<void *, Allocation> PointerMap;
		
		PointerMap m_allocations;
			
	public:
		void * allocate(std::size_t size);
		void collect();
};

void* GarbageCollector::allocate(std::size_t size) {
	void * pointer = malloc(size);
	
	if (pointer) {
		std::cout << "Allocating " << size << " bytes of memory at " << pointer << std::endl;
		Allocation allocation = {size, 0};
		
		PointerMap::value_type pair(pointer, allocation);
		m_allocations.insert(pair);
	}
	
	return pointer;
}

// This will only collect pointers _directly_ on the stack.
void GarbageCollector::collect() {
	// We expect for this to work, the GarbageCollector must be allocated at the top of the stack.
	void * top = this;
	
	// Then we grab the address of the stack at the current point.
	void ** current = (void **)&top;
	
	std::cout << "Collecting from " << top << " to " << current << std::endl;
	
	// Clear all the marks.
	for (PointerMap::iterator root = m_allocations.begin(); root != m_allocations.end(); root++) {
		root->second.mark = 0;
	}
	
	// We scan the stack and mark all pointers we find.
	while (current < top) {
		void * pointer = *current;
		PointerMap::iterator allocation = m_allocations.find(pointer);
		
		if (allocation != m_allocations.end()) {
			std::cerr << "Found allocation " << pointer << " at " << current << std::endl;
			allocation->second.mark += 1;
		}
		
		// Move to next pointer
		current += 1;
	}
	
	// Scan through all roots again and free any items that were not marked.
	for (PointerMap::iterator root = m_allocations.begin(); root != m_allocations.end(); root++) {
		if (root->second.mark == 0) {
			std::cerr << "Releasing " << root->second.size << " bytes at " << root->first << std::endl;
			
			free(root->first);

			m_allocations.erase(root);
		}
	}
}

void helloWorld (GarbageCollector & gc)
{
	char * buffer = (char *)gc.allocate(256);
	std::cout << "Buffer pointer at " << &buffer << std::endl;

	strcpy(buffer, "Hello, World!");
	
	std::cout << buffer << std::endl;
}

int main (int argc, const char * argv[])
{
	GarbageCollector gc;
	
	for (std::size_t i = 0; i < 10; i++)
		helloWorld(gc);
	
	gc.collect();
	
    return 0;
}

This garbage collector will only scan the stack for allocated memory pointers. It isn't portable or efficient. However, I enjoyed writing the code and trying it out. Maybe I'll make something more complete in the future.

Limitations (Update)

Because this garbage collector is designed to be very simple, it has many limitations:

Feel free to improve the code and send suggestions. Ideally, I want to keep it as easy to understand as possible.

Comments

Interesting post. I started writing some sample precise garbage collectors at: https://github.com/ioddly/gc-zoo. Tracking roots is kind of a pain compared to just scanning the stack but ultimately worth the effort, in my opinion.

On what OS/arch did you test it on? It crashes on my school computer (Win XP). I haven’t tested it yet on my home machine which runs Linux.

Also, I don’t really understand how

void * top = this;
void ** current = (void **)&top;

works. I think it has something to do with crashing the computer ;)

@Kostya It works by getting the address of the local variable on the stack. We make the assumption that this is allocated on the stack at line #101. By writing &top which is effectively &this, we are taking the address of the top of the stack. It is a bit of a hack but it works fine for demonstration purposes - and I’ve also (surprisingly) seen this type of construct in non-trivial garbage collection implementations.

Leave a comment

Please note, comments must be formatted using Markdown. Links can be enclosed in angle brackets, e.g. <www.codeotaku.com>.