News

More Optimization

Posted on 28 April 2010 by David Chisnall

One of the things that is traditionally very slow in Objective-C is sending messages to classes. Consider something like:

[NSMutableArray new];

The compiler expands it to something roughly like this:

Class receiver = objc_lookup_class("NSMutableArray");
SEL new = @selector(new);
IMP method = objc_msg_lookup(receiver, new);
method(receiver, new);

There are several causes of overhead here. The first is the class lookup. In the new runtime, the class table is implemented as a hopscotch hash, which is relatively fast, but a lookup still requires hashing the class name and then finding it in the table. This accounts for the majority of the cost of a class message send.

The second bit of overhead is the message lookup. The new runtime uses the objc_msg_lookup_sender() function, which has a slightly different signature. As I wrote yesterday, you can cache the return value from this call, so we can save a lot of the overhead involved.

The final part is the overhead involved in constructing the call frame for the method and jumping there. This is present even when calling C functions. Overall, this adds up. Sending a million class messages took 56 seconds of CPU time on my machine.

With the new ABI, classes are exported as a public symbol. This means that we can save the cost of the class lookup, as long as that symbol is available. One of the optimizations I committed to the libobjc2 tree today substitutes that symbol for the call to objc_lookup_class(). With that, the cost of a million class messages drops to a bit over 10 seconds. Not bad.
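To make the substitution concrete, here is a rough sketch in plain C. It is not the libobjc2 implementation: toy_class, toy_lookup_class() and the toy_class_... symbols are hypothetical stand-ins, and a linear scan plays the part of the real hash table. It only shows the shape of the transformation: a lookup by name is replaced with a direct reference to a symbol that the linker resolves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's class structure. */
struct toy_class { const char *name; };

/* Classes exported as public symbols (assumed naming, for illustration only). */
struct toy_class toy_class_NSObject = { "NSObject" };
struct toy_class toy_class_NSMutableArray = { "NSMutableArray" };

/* Toy class table. The real runtime uses a hash table; a linear scan keeps
 * the sketch short while still paying a per-call string comparison cost. */
static struct toy_class *toy_class_table[] = {
    &toy_class_NSObject,
    &toy_class_NSMutableArray,
};

/* Stand-in for a by-name class lookup. Returns NULL if no class matches. */
struct toy_class *toy_lookup_class(const char *name)
{
    size_t count = sizeof(toy_class_table) / sizeof(toy_class_table[0]);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(toy_class_table[i]->name, name) == 0) {
            return toy_class_table[i];
        }
    }
    return NULL;
}

/* Before the optimization: the receiver is found by name on every send. */
struct toy_class *receiver_by_lookup(void)
{
    return toy_lookup_class("NSMutableArray");
}

/* After the optimization: the receiver is just the address of the symbol. */
struct toy_class *receiver_by_symbol(void)
{
    return &toy_class_NSMutableArray;
}
```

Both functions return the same pointer, but the second one does no work at run time. The substitution is only safe when the symbol is actually visible to the linker, which is the condition mentioned above.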

Yesterday, I talked about caching the message lookup for message sends in loops. Class messages always have the same receiver, so they're also a good choice for caching. Another pass that I added today caches all class message sends. Now we're down to only a bit over 4 seconds for a million message sends.
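The effect of the caching pass can be sketched by hand in C. Everything here is hypothetical (toy_imp, toy_msg_lookup(), and the lookup counter exist only for this illustration), and a real cache would also need some way to notice that the method has been replaced, which this sketch leaves out.

```c
#include <stddef.h>

/* Hypothetical method pointer type for a message with no arguments. */
typedef int (*toy_imp)(void);

/* Counts how often the (pretend) expensive lookup runs. */
static int toy_lookup_count = 0;

/* The method that the lookup resolves to. */
static int toy_new_impl(void)
{
    return 42;
}

/* Stand-in for the runtime's message lookup function. */
static toy_imp toy_msg_lookup(void)
{
    toy_lookup_count++;
    return toy_new_impl;
}

/* Uncached send: every call pays for a lookup. */
int send_uncached(void)
{
    toy_imp method = toy_msg_lookup();
    return method();
}

/* Cached send: the lookup runs once and the result is kept per call site.
 * Invalidation is deliberately omitted in this sketch. */
int send_cached(void)
{
    static toy_imp cached = NULL;
    if (cached == NULL) {
        cached = toy_msg_lookup();
    }
    return cached();
}

/* Accessor so that callers can observe how many lookups happened. */
int toy_lookups_so_far(void)
{
    return toy_lookup_count;
}
```

Calling send_cached() any number of times performs one lookup in total, while each call to send_uncached() performs another. For a class message the receiver never changes, so the cached value stays valid for as long as the method itself is not replaced.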

What about the cost of the function call itself? In C, for a small function, we could inline it, but this isn't an option for Objective-C because of the dynamic dispatch. Or is it? The final pass that I added does speculative inlining. This means that it inlines the function that it guesses will be called, and wraps it in a test. If the (cached) lookup returns the function that we are expecting, we go down the inlined path. If not, we call the returned function pointer. The current pass always inlines class methods if possible, but I'll change that soon so that it only inlines them when doing so is also sensible.
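Written out by hand, the guarded inlining looks roughly like the following C. This is a sketch of the idea rather than what the LLVM pass emits: the toy_ names are hypothetical, and the "inlined" body is simply copied into the fast path to stand in for what the optimizer would do.

```c
/* Hypothetical method pointer type for a message taking one int. */
typedef int (*toy_imp)(int);

/* The implementation that we guess will be called. */
static int toy_expected_impl(int x)
{
    return x + 1;
}

/* A different implementation, as if the method had been replaced. */
static int toy_other_impl(int x)
{
    return x * 2;
}

/* What the pretend runtime currently dispatches to. */
static toy_imp toy_current_impl = toy_expected_impl;

/* Simulates replacing the method at run time (nonzero selects the other one). */
void toy_replace_method(int use_other)
{
    toy_current_impl = use_other ? toy_other_impl : toy_expected_impl;
}

/* Stand-in for the (cached) message lookup. */
static toy_imp toy_msg_lookup(void)
{
    return toy_current_impl;
}

/* Speculatively inlined send: test the guess, then take the fast or slow path. */
int send_speculative(int x)
{
    toy_imp method = toy_msg_lookup();
    if (method == toy_expected_impl) {
        /* Fast path: body of toy_expected_impl, inlined by hand. */
        return x + 1;
    }
    /* Slow path: the guess was wrong, so call whatever the lookup returned. */
    return method(x);
}
```

While the guess holds, send_speculative(1) returns 2 without an indirect call. After toy_replace_method(1), send_speculative(3) falls back to the looked-up pointer and returns 6, so the dynamic behaviour is preserved even when the guess is wrong.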

With speculative inlining, we're now down to 2 seconds for a million class messages. For comparison, a million C function calls took 3 seconds on the same machine.

That's the sort of performance I'm aiming for. And, because these optimizations are all done at the LLVM layer, they will work with both Objective-C and Smalltalk. They depend on libobjc2, although it should be possible to implement something similar for Apple's runtime (but not for the old GNU runtime).
