Garbage Collection

Posted on 26 May 2011 by David Chisnall

On Sunday, I thought I'd have a go at implementing Apple's APIs for Objective-C garbage collection in the GNUstep Objective-C runtime, using the Boehm collector. As of today (Thursday), it's working well enough that I can run complex applications like Gorm. I've also modified LanguageKit to insert the required write barriers, so you can now write full garbage collected Smalltalk code that integrates with Objective-C code.

I initially started working on this because I got bored with people saying 'GNUstep sucks, it doesn't support garbage collection like Cocoa' and never expected to actually use it. After playing with it, however, I'm starting to change my mind. Gorm using GC uses somewhere between 5 and 10% less RAM than Gorm using explicit reference counting. I see similar low memory usage with everything else that I've tried. There may be some CPU usage overheads, but nothing I've run has been perceptibly slower, so if there are then they're not very important.

Apple spent something on the order of 25 man-years developing their garbage collector for OS X, so don't be surprised if three days of my effort doesn't perform as well. Autozone (Apple's GC) has two main advantages over Boehm:

  • The Boehm mark phase can run concurrently, but it's still a stop-the-world collector. Autozone is fully concurrent.
  • The Boehm collector is portable, while Autozone is so deeply wedged into the Mach virtual memory subsystem that there's no hope of ever disentangling it. As a result, the Boehm collector doesn't get nearly as much feedback from the VM subsystem as Autozone does. As a simple example, Autozone can read the dirty flag from page table entries directly, so it knows whether a page has been modified without scanning it. Boehm can do something similar using mprotect(), but this incurs significant system call overhead.
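As a rough illustration of the mprotect() approach mentioned in the last point, here is a minimal POSIX sketch in plain C. It is not the Boehm collector's code: the names region, dirty, tracking_start(), tracking_reset() and tracked_store() are made up for this example, and it assumes a system where a write to a read-only page is reported to a signal handler with the faulting address.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { TRACKED_PAGES = 4 };

static char *region;                               /* start of the tracked pages */
static size_t page_size;
static volatile sig_atomic_t dirty[TRACKED_PAGES]; /* one flag per page */

/* Fault handler: the first write to a protected page lands here. Mark the
 * page dirty and unprotect it, so the faulting store is retried and succeeds.
 * mprotect() is not formally async-signal-safe; this is only a sketch. */
static void on_write_fault(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + TRACKED_PAGES * page_size)
        _exit(1); /* a real crash somewhere else, not one of our pages */
    size_t page = (addr - base) / page_size;
    dirty[page] = 1;
    mprotect(region + page * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Clear the dirty flags and write-protect every page again. */
static int tracking_reset(void)
{
    for (size_t i = 0; i < TRACKED_PAGES; i++)
        dirty[i] = 0;
    return mprotect(region, TRACKED_PAGES * page_size, PROT_READ);
}

/* Map the region, install the handler and start tracking. Returns 0 on success. */
static int tracking_start(void)
{
    struct sigaction action;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region = mmap(NULL, TRACKED_PAGES * page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memset(&action, 0, sizeof action);
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = on_write_fault;
    /* Linux reports the fault as SIGSEGV; some systems use SIGBUS instead. */
    if (sigaction(SIGSEGV, &action, NULL) != 0 ||
        sigaction(SIGBUS, &action, NULL) != 0)
        return -1;
    return tracking_reset();
}

/* Store through a volatile pointer so the compiler cannot drop or move it. */
static void tracked_store(size_t offset, char value)
{
    *(volatile char *)(region + offset) = value;
}
```

After tracking_start(), a call such as tracked_store(0, 1) takes one fault and one mprotect() call, and leaves dirty[0] set. Every first write to a clean page pays that cost, which is the system call overhead that reading the page table's dirty bit directly avoids.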

That said, the Boehm collector is actively developed, and is used in a lot of projects. Performance is already good, and should continue to improve.

Why is garbage collection important for Étoilé? For one thing, it's pretty much an expected feature of languages these days. If you've been using Objective-C for a while, then you probably do the -retain/-release dance without even thinking about it. For new developers, it represents a fairly significant barrier to entry. For example, consider the following method:

- (void)setFoo: (id)newFoo
{
    id tmp = [newFoo retain];
    [foo release];
    foo = tmp;
}

That's the simple form of a set method in Objective-C. Oh, and that isn't thread-safe. Here's the thread-safe version:

- (void)setFoo: (id)newFoo
{
    id tmp = [newFoo retain];
    tmp = __sync_swap(&foo, tmp);
    [tmp release];
}

Well, I think it is, anyway. I probably made an error somewhere, because concurrency is hard. Now here's the thread-safe version in GC mode:

- (void)setFoo: (id)newFoo
{
    foo = newFoo;
}

See the improvement? If you saw those two versions, which would you find easier to understand? Now for the really important question: which one would you expect to be faster? This is where it gets a little bit more complicated. The GC version actually has some compiler trickery involved. If you had to write it by hand, with a compiler that didn't insert the write barriers for you, it would look more like this:

- (void)setFoo: (id)newFoo
{
    objc_assign_ivar(newFoo, self, offset_of_foo);
}

So this version does have the overhead of the function call - it's not a straight assignment. The reference counted version, however, has 4 function calls - two to the runtime function to look up the retain and release methods, and two to actually call those methods. With my latest LLVM optimisations, the lookups will probably be cached, but you still have to do method calls.

What goes on in those method calls? Well, retain and release both do atomic increment / decrement operations. On a multicore system, these mean locking the bus, which can have an overhead on the order of a hundred cycles. Oh, and there's a third atomic operation in the middle.
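To make the three atomic operations concrete, here is a minimal sketch in plain C11 rather than Objective-C, so that the atomics are visible. The Object and Holder types and the object_retain(), object_release() and holder_set_foo() functions are hypothetical stand-ins for this example, not the runtime's real implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted object. */
typedef struct Object {
    atomic_int refcount;
} Object;

static Object *object_new(void)
{
    Object *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    atomic_init(&obj->refcount, 1); /* the creator owns one reference */
    return obj;
}

/* Like -retain: one atomic increment. */
static Object *object_retain(Object *obj)
{
    if (obj != NULL)
        atomic_fetch_add(&obj->refcount, 1);
    return obj;
}

/* Like -release: one atomic decrement. Returns 1 if the object was freed. */
static int object_release(Object *obj)
{
    if (obj != NULL && atomic_fetch_sub(&obj->refcount, 1) == 1) {
        free(obj);
        return 1;
    }
    return 0;
}

/* Hypothetical owner with a single "foo" slot. */
typedef struct Holder {
    _Atomic(Object *) foo;
} Holder;

static void holder_init(Holder *self)
{
    atomic_init(&self->foo, NULL);
}

/* The thread-safe setter: three atomic operations per call.
 * Returns 1 if the old value was freed as a result. */
static int holder_set_foo(Holder *self, Object *new_foo)
{
    Object *tmp = object_retain(new_foo);   /* atomic op 1: increment */
    tmp = atomic_exchange(&self->foo, tmp); /* atomic op 2: swap */
    return object_release(tmp);             /* atomic op 3: decrement */
}
```

Each call to holder_set_foo() with a non-NULL old and new value performs an increment, a swap and a decrement, all of which have to be visible to every core.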

In contrast, the objc_assign_ivar() function does little more than the assignment. It's very cheap. Of course, that's not the whole story. In the traditional mode, if the reference count hits 0, then the object is deleted immediately. In GC mode, the collector must periodically find objects that are no longer referenced and delete them, which adds some overhead.
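For comparison, here is a sketch of what "little more than the assignment" might look like. It is written in plain C with a hypothetical sketch_assign_ivar() that only mirrors the argument order used in the call to objc_assign_ivar() shown earlier (value, destination object, byte offset). The real runtime function may do additional bookkeeping for the collector, and the Example struct is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: a class pointer followed by one ivar. */
typedef struct Example {
    void *isa; /* stand-in for the class pointer */
    void *foo; /* the instance variable being assigned */
} Example;

/* Hypothetical write barrier: compute the address of the ivar from the
 * object pointer and the byte offset, then do a plain pointer store. */
static void *sketch_assign_ivar(void *value, void *dest, ptrdiff_t offset)
{
    assert(dest != NULL);
    void **slot = (void **)((char *)dest + offset);
    *slot = value;
    return value;
}

/* What the compiler-generated setter boils down to in this sketch. */
static void example_set_foo(Example *self, void *new_foo)
{
    sketch_assign_ivar(new_foo, self, offsetof(Example, foo));
}
```

There is one ordinary function call and one store, with no atomic operations and no message sends.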

The other complication is autorelease pools. With pure reference counting, you have a problem returning temporary objects. You want to return them with a reference count of 0 (because you no longer have a reference to them), but you don't want the caller to have to remember to release them. The OpenStep solution to this is to add an -autorelease method, which adds the object to a pool. It is then sent a -release message when the pool is destroyed - typically at the end of the run loop iteration.

This means that temporary objects can exist for a very long time. I had some code a few months ago that was allocating and autoreleasing about 500MB of temporary objects, but only about 5MB of them were live. On my machine, this meant that objects that were no longer live were first being swapped out, then were being swapped back in when the autorelease pool sent them a release message, then finally being completely freed.

In a garbage collected environment, this would not have happened. The collector would have been periodically run and would have freed some of the unreferenced objects before they got swapped out.
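The autorelease behaviour described above can be sketched in a few lines of plain C. This is a toy model under stated assumptions, not the GNUstep implementation: Pool, pool_autorelease(), pool_drain() and the live_objects counter are invented for the example, and the reference count is deliberately non-atomic to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (non-atomic, single-threaded sketch). */
typedef struct Object {
    int refcount;
} Object;

static int live_objects; /* how many objects are currently allocated */

static Object *object_new(void)
{
    Object *obj = malloc(sizeof *obj);
    assert(obj != NULL);
    obj->refcount = 1;
    live_objects++;
    return obj;
}

static void object_release(Object *obj)
{
    if (--obj->refcount == 0) {
        free(obj);
        live_objects--;
    }
}

/* Toy autorelease pool: a growable list of objects that are owed a release. */
typedef struct Pool {
    Object **items;
    size_t count;
    size_t capacity;
} Pool;

/* Like -autorelease: remember the object, release it when the pool drains. */
static Object *pool_autorelease(Pool *pool, Object *obj)
{
    if (pool->count == pool->capacity) {
        size_t capacity = pool->capacity ? pool->capacity * 2 : 16;
        Object **grown = realloc(pool->items, capacity * sizeof *grown);
        assert(grown != NULL);
        pool->items = grown;
        pool->capacity = capacity;
    }
    pool->items[pool->count++] = obj;
    return obj;
}

/* Like destroying the pool at the end of a run loop iteration. */
static void pool_drain(Pool *pool)
{
    for (size_t i = 0; i < pool->count; i++)
        object_release(pool->items[i]);
    free(pool->items);
    pool->items = NULL;
    pool->count = 0;
    pool->capacity = 0;
}

/* A method that returns a temporary object to its caller. */
static Object *make_temporary(Pool *pool)
{
    return pool_autorelease(pool, object_new());
}
```

If a loop calls make_temporary() many times and ignores the results, none of those objects is freed until pool_drain() runs, even though nothing refers to them any more. That is the pattern that pushed dead objects out to swap in the case above.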

Large autorelease pools are a significant problem with a number of defensive programming patterns. One example is this kind of set method:

- (void)setFoo: (id)newFoo
{
    [foo autorelease];
    foo = [newFoo retain];
}

This means that the old value of foo won't be freed until the end of the current run loop iteration. More importantly, the synthesized property accessor methods are implemented something like this:

- (id)foo
{
    return [[foo retain] autorelease];
}

Both of these are intended to allow you to hold a reference to the value of foo on the stack without finding it suddenly turning into a dangling pointer. The second method (the accessor) actually works; the first (the autoreleasing setter) just gives you some bugs that are insanely hard to debug in multithreaded code. Google recommends the former, but like almost everything else in their Objective-C style guide it's a really good way of writing unmaintainable code.

This kind of defensive programming means that you don't have to spend so much time thinking, but means that you're writing highly suboptimal code. This puts garbage collection in the same category as most high-level language features: if you look at any specific point in your program, you can probably write it more efficiently using low-level techniques, but doing that for the entire program is probably impossible. If you're in garbage collected mode, the synthesized property accessor method looks like this:

- (id)foo
{
    return foo;
}

No message sends. Nothing added to the autorelease pool. The pointer is on the stack, so it's treated as a root (i.e. the object can't be collected, and neither can anything else that it references). You can write code this simple anywhere you are dealing with objects.

Since it wasn't working, I'd imagine that most GNUstep / Étoilé programmers have not really looked at Apple's garbage collection APIs in detail. For the most part, you can keep the main advantage of Objective-C: the ease with which you can drop into low-level mode for the few bits of your code that really are performance critical. You can still allocate memory with malloc() and free it with free() - the collector will ignore this memory completely.

This is useful for things like images, large buffers of data, and so on. In a typical Objective-C program, under 20% of the heap will be managed by the garbage collector directly. There are some halfway steps. For example, if you allocate memory with NSAllocateCollectable(), with 0 as the second argument, then you get memory that the collector will free for you when it is no longer referenced, but which is not scanned. If you store a pointer to such data in an instance variable, then it will not be freed as long as the object referencing it is live.

One very convenient feature is the addition of zeroing weak references. The canonical use case for these is NSNotificationCenter. You typically add a line in your -dealloc method removing yourself as a notification observer. In a garbage collected program, you have a -finalize method instead of -dealloc, which is called when the object is collected. You can't unregister from a notification center here, because while any object has a reference to your object, it won't be eligible for finalisation. The notification center solves this by storing a weak reference to your object. This doesn't prevent it from being destroyed. As an added bonus, it means that you don't need to unregister for notifications - the pointer held by NSNotificationCenter will become nil with no interaction on your part.
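The zeroing behaviour can be modelled in plain C. This sketch only simulates the idea: there is no real collector here, so weak_clear_all() stands in for whatever the collector does when it reclaims an object, and Center, center_add(), center_post() and weak_store() are hypothetical names rather than the NSNotificationCenter or runtime APIs.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_WEAK_SLOTS = 32, MAX_OBSERVERS = 8 };

/* Registry of every location that currently holds a weak pointer. */
static void **weak_slots[MAX_WEAK_SLOTS];
static size_t weak_slot_count;

/* Store target in *slot and remember the slot so it can be zeroed later. */
static void weak_store(void **slot, void *target)
{
    assert(weak_slot_count < MAX_WEAK_SLOTS);
    *slot = target;
    weak_slots[weak_slot_count++] = slot;
}

/* Stand-in for the collector reclaiming target: every weak slot that
 * still points at it is set to NULL. */
static void weak_clear_all(void *target)
{
    for (size_t i = 0; i < weak_slot_count; i++)
        if (*weak_slots[i] == target)
            *weak_slots[i] = NULL;
}

typedef void (*Handler)(void *observer);

/* Toy notification center that holds its observers weakly. */
typedef struct Center {
    void *observers[MAX_OBSERVERS];
    Handler handlers[MAX_OBSERVERS];
    size_t count;
} Center;

static void center_add(Center *center, void *observer, Handler handler)
{
    assert(center->count < MAX_OBSERVERS);
    weak_store(&center->observers[center->count], observer);
    center->handlers[center->count] = handler;
    center->count++;
}

/* Deliver to every observer that is still alive; zeroed slots are skipped.
 * Returns the number of observers notified. */
static int center_post(Center *center)
{
    int delivered = 0;
    for (size_t i = 0; i < center->count; i++) {
        if (center->observers[i] != NULL) {
            center->handlers[i](center->observers[i]);
            delivered++;
        }
    }
    return delivered;
}

/* Example handler: the observer is just an int counter. */
static void count_notification(void *observer)
{
    ++*(int *)observer;
}
```

Once weak_clear_all() has been called for an observer, center_post() silently skips it. The observer never had to unregister, which is the convenience described above.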

Most Objective-C code uses some C libraries, and you often want to pass object pointers into such libraries. It's a common C idiom to implement callbacks as a function pointer and a (void*) data pointer. In Objective-C, you typically pass a small trampoline function and a retained object into such functions. The trampoline then sends a message to the object when it is called. With garbage collection, you can call either CFRetain() or -[NSGarbageCollector disableCollectorForPointer:] to tell the collector not to free the object.
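The function-pointer-plus-(void*) idiom looks roughly like this in plain C. Everything here is invented for illustration: library_for_each() stands in for an arbitrary C library entry point, and the Accumulator struct with accumulator_add() stands in for an object and the message the trampoline would send to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C library API: calls callback(item, context) for each item. */
typedef void (*item_callback)(int item, void *context);

static void library_for_each(const int *items, size_t count,
                             item_callback callback, void *context)
{
    assert(callback != NULL);
    for (size_t i = 0; i < count; i++)
        callback(items[i], context);
}

/* Stand-in for an object with one "method". */
typedef struct Accumulator {
    long total;
    size_t calls;
} Accumulator;

static void accumulator_add(Accumulator *self, int value)
{
    self->total += value;
    self->calls++;
}

/* The trampoline: recover the object from the opaque pointer and
 * forward the call to it. */
static void accumulator_trampoline(int item, void *context)
{
    accumulator_add((Accumulator *)context, item);
}

/* Usage: hand the library a trampoline plus the object it should reach. */
static long sum_with_library(const int *items, size_t count)
{
    Accumulator acc = { 0, 0 };
    library_for_each(items, count, accumulator_trampoline, &acc);
    return acc.total;
}
```

In this sketch the object lives on the caller's stack, so its lifetime is not a concern. If the library stored the context pointer in its own malloc()'d memory for later use, the collector would not see that reference, and that is the case where the object has to be pinned explicitly.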

Most Objective-C code should run unmodified in this mode. Clang will strip out memory management message sends (and if it misses any, they're implemented as no-ops) and insert the required barrier functions.

If you want to play with it, you will need to get trunk versions of clang, the GNUstep Objective-C runtime, and the GNUstep base and back libraries from Subversion. You will then need to recompile everything with the -fobjc-gc-only command-line option. Hopefully everything will work. I didn't have to make any changes to the GNUstep AppKit implementation nor to Gorm. Similarly, the EtoileFoundation test suite worked fine without any changes. I did have to modify LanguageKit, but it's a compiler so that's expected. I had to make one change to GNUstep-back, because it was storing object pointers in some memory allocated with malloc(). If you're doing that, then expect some small problems. Typically, these are fixed by turning malloc(size) calls into calls to NSAllocateCollectable(size, NSScannedOption) and deleting the corresponding free() - hardly a major change.

Hopefully, once it's a bit better tested, we will enable garbage collection by default for Étoilé, and get to write much simpler code.