News

LanguageKit SmallInt Improvements

Posted on 2 April 2009 by David Chisnall

Yesterday, I spent some more time hacking on clang. This time, it was implementing -ftrapv, a command-line option which automatically inserts overflow checking on integer arithmetic operations. This was made very easy to do because LLVM 2.5 now supports this in the IR. There are now intrinsic functions like llvm.sadd.with.overflow which return a pair containing the result of the arithmetic operation and a flag indicating whether it overflowed. On x86 (and some other back ends - I don't know exactly which ones now, but eventually it should be all of them), this just provides access to a condition flag in a register.

The implementation of -ftrapv in GCC just aborts in case of overflow, but in clang I had it call a function. This function takes the two operands some some information about the operation as arguments, and returns a value which is used in place of the overflowed result. I contributed a simple default version of this to clang, which simply printed a helpful error message and aborted.

For LanguageKit, I did something a bit more interesting. The implementation of arithmetic operations in Smalltalk originally was quite naive. A SmallInt in LanguageKit is an integer squeezed into a pointer. On a 32-bit platform, it is a 31-bit integer, while on 64-bit platforms it is a 63-bit integer. In both cases, the remaining (low) bit is set to 1, to distinguish it from an object pointer (object pointers are always word-aligned and so their least-significant bit is always 0). For each operation, I was doing something like this:

a >>= 1; // Turn the SmallInt into an int
b >>= 1; // Turn the SmallInt into an int
c = a + b; // Do the actual addition
if (c > SMALL_INT_MAX) { handle overflow } // Do some simple overflow checking
c = (c << 1) | 1; // Shift the int back to a SmallInt and set the low bit
return c; // Return the result

One of the problems with this is that every operation then added at least one pair of shifts which the LLVM optimisers could not remove. Detecting that a value right shifted and then left shifted by the same amount is the same as the original is nontrivial, and requires knowledge of the values being stored.

The new code is much simpler. This is the real code from the addition implementation, a function which takes two pointers obj and other as arguments:

OTHER_OBJECT(plus);
intptr_t val = ((intptr_t)other) & ~ 1;
return (void*)((intptr_t)obj + val);

The first line is a macro which tests if other is an object (typically a BigInt, but maybe an NSNumber of similar). If it is, then it promotes obj to a BigInt and tries the addition again. The second line clears the low bit on the other object pointer. The final line just adds the two SmallInts together. Because the low bit on one is always 1, and the low bit in the other is always 0, this addition does the same thing as the two right shifts, addition, left shift, and or in the older version. Even better, because we are now adding word-sized numbers, the hardware overflow checking can now be used correctly. If the addition in the last line fails, the handler will be invoked. This is implemented in the LanguageKit runtime framework and automatically inserts a BigInt pointer in place of the SmallInt value.

I've made similar improvements to the multiplication and subtraction routines. Previously, I was using a very naive approach to checking for multiplication overflow. If either of the operands was bigger than the maximum value you could store in a half-word then I was promoting the operation. Now, multiplications are only promoted if the result does not fit in a 31-bit or 63-bit signed value, just like addition and multiplication.

So, what does this mean performance-wise? I wrote a simple benchmark (in Etoile/Languages/Benchmark) to test this a while ago. This implements the most naive way of calculating the Fibonacci sequence (recursively call fib(n-1) and fib(n-2)) in both Objective-C and Smalltalk. The code measures the amount of CPU time taken to calculate fib(30) 100 times in each version.

When I first wrote this benchmark, some time before Christmas, it showed that the Smalltalk implementation took around 30 times as long as the Objective-C version. This wasn't too bad; I never intended Smalltalk to be used for heavy number crunching. The entire point of targeting the Objective-C ABI was that you can use Objective-C, C, or even inline assembly if you really care about speed. Looking at the generated bitcode, I saw that there were a lot of redundant shifts, and so a great deal of potential for optimisation. Running it again, with the new SmallInt code, I get this:

ObjC fibonacci execution took 23.773438 seconds.
Smalltalk fibonacci execution took 35.953125 seconds.  
Ratio: 1.512323

Smalltalk now takes 1.5 times as long as Objective-C. It's worth mentioning in this case that using a slightly better algorithm in Smalltalk is around a factor of 100 faster; good algorithms with bad compilers almost always beat good compilers with bad algorithms.

I didn't actually believe these results when I first looked at them, and spent quite a while going through the code looking to check that they are correct. This test is pretty much the worst possible case for Smalltalk, so in general use I'd expect Smalltalk to be even closer to Objective-C in terms of performance.

By the way, this test also now shows that the LowerIfTrue transform is actually useful. This is an AST transform in one of the LanguageKit plugins that turns -ifTrue: and related message sends in the AST into conditionals. When I wrote it, I tested this benchmark with and without it and there was no measurable difference; the overhead from the arithmetic was dwarfing the overhead from the extra message sends. This is not the case now. Disabling this optimisation gives this result:

ObjC fibonacci execution took 23.773438 seconds.
Smalltalk fibonacci execution took 60.390625 seconds.
Ratio: 2.540256

The Smalltalk implementation sends an -ifTrue: message when testing whether the argument is less than 2 (in which case it returns 1), while the Objective-C version just uses an if statement. With the transform, the Smalltalk version does not need the extra message send to the block and so compiles to almost the same machine code. I am not a huge fan of this optimisation in theory (since it makes the language marginally less expressive), but in practice it is unlikely to affect any existing code and is implemented by most other Smalltalk compilers.

News: Stay up to date

News

LanguageKit SmallInt Improvements

Search

Status

Archive