News: Stay up to date

The Étoilé community is an active group of developers, designers, testers and users. New work is being done every day. Visit often to find out what we've been up to.


Smalltalk and Objective-C Performance

Posted on 27 April 2010 by David Chisnall

If you find yourself optimizing your code, then it means that the author of your compiler has failed. In Étoilé, we use two languages: Smalltalk and Objective-C. In theory, we can also use EScript (a dialect of ECMAScript implemented with LanguageKit), but I don't think anyone ever has, aside from a couple of examples.

Clang, which compiles Objective-C, and LanguageKit, which compiles Smalltalk, both use LLVM for code generation. This means that they both produce an intermediate representation in the same form. LLVM provides a lot of infrastructure for transforming this intermediate representation, which is how you implement compiler optimizations.

As part of the libobjc2 project, I've been writing a few of these passes to speed up code targeting the new runtime (which both Clang and LanguageKit do). The first of these passes is very simple. The new runtime adds support for non-fragile instance variables. With older Objective-C implementations, instance variables were accessed via a fixed offset. This is nice and fast, but it means that, if you modify one class's instance variable layout (including just adding an ivar), then you must recompile all of its subclasses.

With non-fragile ivars, you access all instance variables via an indirection variable. This records the offset of the ivar, and is fixed up by the runtime when the class is loaded. This means that you always get the right offset, even if other ivars are rearranged. This is great, but sometimes you don't actually need it. If your class inherits directly from NSObject, for example, or from intermediate classes declared in the same library (which, it turns out, covers about 90% of classes), the extra indirection buys you nothing, because you will be recompiling the subclasses whenever you recompile the class anyway.

The ivar lowering pass reverses this. It introduces hard-coded ivar offsets if it is safe to do so without increasing the fragility of the library. This means that you only get the performance penalty from non-fragile ivars if you actually need them, for example when you subclass something from a third-party library.
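The two access strategies look roughly like this when sketched in plain C. The indirection symbol and fixup function below are hypothetical names for illustration, not the real runtime's:

```c
#include <assert.h>
#include <stddef.h>

/* A toy "object": the isa pointer occupies the first slot, as in
 * Objective-C, followed by the instance variables. */
struct Object { void *isa; int value; };

/* Fragile ABI: the offset of `value` is baked in at compile time.
 * Fast, but wrong if the superclass layout changes later. */
static const size_t value_offset_fragile = offsetof(struct Object, value);

/* Non-fragile ABI: the compiler emits a load of an indirection
 * variable, which the runtime fills in when the class is loaded.
 * (The symbol name here is invented, not the real runtime's.) */
static size_t objc_ivar_offset_Object_value;

static void runtime_fixup(void)
{
    /* The runtime computes the real offset at load time, so the
     * access stays correct even if earlier ivars were rearranged. */
    objc_ivar_offset_Object_value = offsetof(struct Object, value);
}

static int get_value_fragile(void *obj)
{
    return *(int *)((char *)obj + value_offset_fragile);
}

static int get_value_nonfragile(void *obj)
{
    /* One extra load (the offset) before the ivar access itself. */
    return *(int *)((char *)obj + objc_ivar_offset_Object_value);
}
```

The ivar lowering pass effectively rewrites the second form into the first whenever it can prove the offset can never change.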

This isn't especially interesting. The extra cost of indirect ivar accesses is really small. You're only likely to notice it if you are doing a huge number of ivar accesses in different objects and your cache is full.

Message sends, on the other hand, have a big impact on performance. Every time you send a message in Objective-C or Smalltalk, you need to do a dynamic lookup to find the method to call, then you call the method in the same way that you call a C function. Both of these have some cost.

One way you can work around this in code is IMP caching, where you perform the lookup yourself, store the result, then call it as a function pointer. The compiler couldn't do this itself, because caching the IMP (the method implementation — the function pointer for the method) broke some of the dynamic features of Objective-C. If you changed the selector-to-method mapping (either explicitly via runtime functions or by loading a category), then the cache became invalid, and there was no mechanism for the runtime to invalidate it.
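Stripped of the real runtime, the IMP caching pattern looks like this in plain C. The dispatch table and `lookup` function are stand-ins for the runtime's selector-to-IMP lookup, not its actual API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* IMP is the type of a method implementation function pointer. */
typedef int (*IMP)(void *self);

static int do_nothing(void *self) { (void)self; return 0; }

/* A toy dispatch table standing in for the runtime's method lists. */
struct method { const char *selector; IMP imp; };
static struct method methods[] = { { "doNothing", do_nothing } };

/* Stand-in for the runtime's lookup function: some non-trivial work
 * performed on every uncached message send. */
static IMP lookup(const char *selector)
{
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (strcmp(methods[i].selector, selector) == 0)
            return methods[i].imp;
    return 0;
}

int sum_with_imp_cache(void *receiver, int n)
{
    /* IMP caching: perform the lookup once, outside the loop... */
    IMP cached = lookup("doNothing");
    int total = 0;
    for (int i = 0; i < n; i++)
        total += cached(receiver);   /* ...then call it as a plain
                                      * function pointer each time. */
    return total;
}
```

The danger the paragraph above describes is visible here: if something replaced the entry in `methods` after the loop started, `cached` would keep pointing at the old implementation with no way to find out.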

This changed with the new runtime. Now, the method lookup function returns a slot, which can be safely cached and invalidated later. The new pass makes use of this to automatically cache slots for message sends that happen in loops. To test it, I wrote a simple benchmark program that sends a message 1,000,000,000 times. The method does nothing, it just returns immediately, so all of the time is spent in the message lookup and sending.
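A toy model of the slot idea, in C, with invented names rather than libobjc2's real structures: a slot pairs the function pointer with a version number, and any change to the method bumps the version so cached copies can detect that they are stale:

```c
#include <assert.h>

typedef int (*IMP)(void *self);

static int do_nothing(void *self) { (void)self; return 0; }
static int return_one(void *self) { (void)self; return 1; }

/* A slot: the function pointer plus a version for invalidation.
 * (Invented layout, for illustration only.) */
struct slot { IMP method; int version; };

static struct slot the_slot = { do_nothing, 1 };

/* What a category load or runtime method replacement would do:
 * swap the IMP and bump the version so stale caches miss. */
static void replace_method(IMP newimp)
{
    the_slot.method = newimp;
    the_slot.version++;
}

/* What the auto-caching pass effectively generates for a message
 * send inside a loop: hoist the lookup, revalidate cheaply. */
int send_in_loop(void *receiver, int n)
{
    struct slot *cached = &the_slot;      /* hoisted lookup */
    int cached_version = cached->version;
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (cached->version != cached_version) {
            /* Slot was invalidated: redo the lookup (trivial here). */
            cached = &the_slot;
            cached_version = cached->version;
        }
        total += cached->method(receiver);
    }
    return total;
}
```

The common case is one integer comparison per send instead of a full lookup, which is where the benchmark's speedup comes from.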

Unoptimized, this takes 10 seconds. With the auto-caching pass, it takes around 4.6 seconds. Adding in the normal set of optimizations, these times drop to 8 and 3 seconds, respectively. For reference, replacing the message send with a C function call makes the time 3.5 seconds, so Objective-C is very, very close to raw C performance in this case. Note that if the function is in a different library, as it often is, you have to go via a relocation table, which brings the speeds much closer, and can even make the cached Objective-C version faster.

This isn't the end, of course. One thing that you can do easily in C/C++, but not in Objective-C, is inline a function. This involves replacing a call to a function with a copy of the function body. Inlining eliminates the call overhead and enables further optimizations, such as constant propagation and common subexpression elimination across the call boundary.

We can do inlining with Objective-C too, in theory, but it has to be speculative. We can inline methods, then wrap the inlined version in a test that checks that it really is the correct one. That's next on the list.
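The shape of such a guarded, speculative inline can be sketched in C. The names here are made up for illustration; this is not what the compiler actually emits:

```c
#include <assert.h>

typedef int (*IMP)(void *self);

/* The method body the compiler has chosen to inline. */
static int meaning(void *self) { (void)self; return 42; }
static int other(void *self)   { (void)self; return 7; }

/* Stand-in for the dynamic lookup; reassigning current_imp
 * simulates a category load or swizzle changing the method. */
static IMP current_imp = meaning;
static IMP lookup(void *receiver) { (void)receiver; return current_imp; }

int send_speculatively_inlined(void *receiver)
{
    if (lookup(receiver) == meaning) {
        /* Guard passed: run the inlined copy of the body, which
         * the optimizer can now fold like ordinary C code. */
        return 42;
    }
    /* Guard failed: fall back to a genuine dynamic dispatch. */
    return lookup(receiver)(receiver);
}
```

The guard preserves Objective-C's dynamism: if the method is ever replaced at runtime, the comparison fails and the slow path does the correct dynamic send.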

We probably can't get Smalltalk quite as fast as C, but if it's within 10-20%, there's very little reason not to use it.