News

Autorelease Performance Improvements

Posted on 6 April 2012 by David Chisnall

As you may be aware, recent versions of Objective-C added an @autoreleasepool keyword that defines a scope bracketed by an autorelease pool. There are two reasons for this. The first is so that, in ARC mode, the compiler can reason more accurately about object lifetimes. The second is to eliminate the need to create a new object for every scope.

The latter is important because long-lived autorelease pools can collect a lot of objects and defer their destruction. This matters most on devices like the iPhone, where RAM is limited, but it also slows things down elsewhere. Objects that persist long after their last use prevent the program from reusing that memory, which increases cache churn and means more system calls to get new memory pages from the OS and return them, rather than just reusing them.

This means that cheap autorelease pool creation can have a lot of performance advantages beyond those apparent in microbenchmarks. GNUstep already tries to make autorelease pools quite cheap to create by keeping a per-thread cache of freed autorelease pools and reusing them as required. If you're using ARC mode, however, you don't get autorelease pool objects. The @autoreleasepool scope in ARC mode is implemented by bracketing it in calls to objc_autoreleasePoolPush() and objc_autoreleasePoolPop(). The push function returns a void* pointer, which is later passed to the pop function. In the current release of GNUstep, these just create and destroy an NSAutoreleasePool, so there's no difference between them and explicitly creating the pool.
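The per-thread cache of freed pools described above can be pictured as a simple free list. The following is a minimal sketch in C, not GNUstep's actual code: the names cached_pool, pool_create, pool_destroy and heap_allocations are invented for illustration, and a real pool would also carry the storage for its autoreleased objects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool object; a real one would also hold
 * the list of autoreleased objects. */
typedef struct cached_pool
{
    struct cached_pool *next_free; /* link used only while on the free list */
} cached_pool;

/* One free list per thread, so no locking is needed. */
static _Thread_local cached_pool *free_list;
/* Counts how often we had to fall back to malloc(). */
static _Thread_local unsigned long heap_allocations;

/* Reuse a cached pool if one is available, otherwise allocate a new one. */
cached_pool *pool_create(void)
{
    cached_pool *pool = free_list;
    if (pool != NULL)
    {
        free_list = pool->next_free;
    }
    else
    {
        pool = malloc(sizeof *pool);
        if (pool == NULL)
        {
            abort();
        }
        heap_allocations++;
    }
    pool->next_free = NULL;
    return pool;
}

/* Instead of freeing the pool, put it back on this thread's cache. */
void pool_destroy(cached_pool *pool)
{
    assert(pool != NULL);
    pool->next_free = free_list;
    free_list = pool;
}
```

In this sketch, a loop that repeatedly creates and destroys one pool only reaches malloc() on its first iteration. Each scope still has to set up and tear down a pool object, though, which is the cost the marker-based approach avoids.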

The current release of the GNUstep Objective-C runtime includes its own implementation, which it will use if the NSAutoreleasePool doesn't opt in to supporting ARC mode. With the current svn trunk code, this is now enabled by default. The implementation just keeps a linked list of page-sized buffers. The push function returns a pointer into the current buffer, and the pop function releases everything that was added after that pointer. This means that creating a new pool scope is very cheap: it's just returning a marker, not creating a new object.
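To make the marker idea concrete, here is a minimal sketch in C of a pool built from a linked list of page-sized buffers. It illustrates the technique and is not the runtime's code. All names (toy_object, pool_page, toy_pool_push and so on) are hypothetical, the 4096-byte page size is an assumption, and the state is a plain global where a real runtime would keep it per thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed page size; all names in this sketch are hypothetical. */
#define TOY_PAGE_SIZE 4096
/* Object slots that fit in a page after the two header pointers. */
#define SLOTS_PER_PAGE ((TOY_PAGE_SIZE - 2 * sizeof(void *)) / sizeof(void *))

typedef struct toy_object { int retain_count; } toy_object;

typedef struct pool_page
{
    struct pool_page *previous;        /* older page in the list */
    toy_object **insert;               /* next free slot in this page */
    toy_object *slots[SLOTS_PER_PAGE]; /* autoreleased objects */
} pool_page;

static pool_page *current_page; /* would be per-thread in a real runtime */
static unsigned long released;  /* number of releases performed so far */

static pool_page *new_page(pool_page *previous)
{
    pool_page *page = malloc(sizeof *page);
    if (page == NULL)
    {
        abort();
    }
    page->previous = previous;
    page->insert = page->slots;
    return page;
}

static int page_is_full(const pool_page *page)
{
    return page->insert == page->slots + SLOTS_PER_PAGE;
}

static void toy_release(toy_object *object)
{
    object->retain_count--;
    released++;
}

/* Open a scope: normally just returns the current insert position. */
void *toy_pool_push(void)
{
    if (current_page == NULL || page_is_full(current_page))
    {
        current_page = new_page(current_page);
    }
    return current_page->insert;
}

/* Record an object to be released when the enclosing scope is popped. */
void toy_autorelease(toy_object *object)
{
    assert(current_page != NULL); /* must be called inside a scope */
    if (page_is_full(current_page))
    {
        current_page = new_page(current_page);
    }
    *current_page->insert++ = object;
}

/* Close a scope: release everything added since the marker was returned. */
void toy_pool_pop(void *marker)
{
    toy_object **stop = marker;
    while (current_page != NULL && current_page->insert != stop)
    {
        if (current_page->insert == current_page->slots)
        {
            /* This page is empty and the marker is in an older page. */
            pool_page *empty = current_page;
            current_page = empty->previous;
            free(empty);
            continue;
        }
        toy_release(*--current_page->insert);
    }
    assert(current_page != NULL); /* the marker must belong to a live page */
}
```

Opening a scope here costs a comparison and a pointer read unless a fresh page is needed, and nested scopes fall out naturally because an inner marker always sits after the outer one.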

To see how these compare in terms of performance, I used this little microbenchmark:

#import <Foundation/Foundation.h>

@interface Foo : NSObject
{
    id foo;
}
@property(readonly, nonatomic) id foo;
@end
@implementation Foo
- (id)init
{
    // Run the superclass initialiser before touching our own ivars.
    if ((self = [super init]))
    {
        foo = [NSObject new];
    }
    return self;
}
- (id)foo
{
#if __has_feature(objc_arc)
    return foo;
#else
    return [[foo retain] autorelease];
#endif
}
@end

int main(void)
{
    id x = [Foo new];
    for (unsigned int i=0 ; i<1000 ; i++)
    {
        @autoreleasepool
        {
            id f;
            for (unsigned int j=0 ; j<100000 ; j++)
            {
                f = [x foo];
            }
        }
    }
    return 0;
}

This performs a total of 100,000,000 autoreleases, 100,000 per autorelease pool. So, how long does it take to run?

With the old implementation, 6.9 seconds.
With the new implementation, 4.5 seconds.
With the new implementation, and the benchmark compiled in ARC mode, 3.5 seconds.
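To put these timings in per-iteration terms, the arithmetic can be written as two small helpers. This is only a worked calculation on the numbers above. The function names are made up for illustration, and each iteration includes the message send and loop overhead as well as the autorelease itself.

```c
/* Average wall-clock cost of one loop iteration, in nanoseconds. */
double ns_per_iteration(double seconds, double iterations)
{
    return seconds * 1e9 / iterations;
}

/* How many times faster the `after` run is than the `before` run. */
double speedup(double before_seconds, double after_seconds)
{
    return before_seconds / after_seconds;
}
```

With 100,000,000 iterations, the three runs come to roughly 69 ns, 45 ns and 35 ns per iteration. That makes the new implementation about 1.5 times faster than the old one, and the ARC build about 2 times faster.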

Now I think I need to find something else to optimise.
