News: Stay up to date

The Étoilé community is an active group of developers, designers, testers and users. New work is being done every day. Visit often to find out what we've been up to.


LanguageKit, The Next Generation (Or Something)

Posted on 12 August 2011 by David Chisnall

Anyone following the svn logs will have spotted a ludicrous number of changes in LanguageKit appearing recently. A lot of this has been simple code cleanup. For example, code generation for assignments now all goes via a delegate object, so we can easily plug in different memory management strategies. Currently, LanguageKit supports emitting either the automatic reference counting (ARC) or garbage collection (GC) read and write barriers.
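To make the delegate idea concrete, here is a small sketch in C. LanguageKit itself does this in Objective-C++ and emits LLVM IR rather than text. The code generator hands every assignment to a strategy object, and each memory-management mode decides what to emit. The struct and function names are hypothetical. `objc_storeStrong()` and `objc_assign_strongCast()` are real runtime entry points for ARC and GC stores, but whether LanguageKit emits exactly these calls is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical delegate interface: given the text of a destination and a
   value, write out the code to emit for the assignment "dest := value". */
typedef struct {
    const char *name;
    void (*emit_assign)(char *out, size_t len, const char *dest, const char *value);
} AssignDelegate;

/* ARC strategy: strong stores go through objc_storeStrong(&dest, value). */
static void arc_assign(char *out, size_t len, const char *dest, const char *value) {
    snprintf(out, len, "objc_storeStrong(&%s, %s);", dest, value);
}

/* GC strategy: strong stores go through a write barrier. */
static void gc_assign(char *out, size_t len, const char *dest, const char *value) {
    snprintf(out, len, "objc_assign_strongCast(%s, &%s);", value, dest);
}

static const AssignDelegate delegates[] = {
    { "ARC", arc_assign },
    { "GC",  gc_assign  },
};

/* Select a strategy by name (e.g. from a compiler option); NULL if unknown. */
static const AssignDelegate *delegate_named(const char *name) {
    for (size_t i = 0; i < sizeof delegates / sizeof delegates[0]; i++) {
        if (strcmp(delegates[i].name, name) == 0) {
            return &delegates[i];
        }
    }
    return NULL;
}

/* The code generator only talks to the delegate, so adding a new memory
   management strategy does not touch this function. */
static void compile_assignment(const AssignDelegate *d, char *out, size_t len,
                               const char *dest, const char *value) {
    d->emit_assign(out, len, dest, value);
}
```

With this shape, supporting another memory model means adding one more entry to `delegates` rather than touching every place that compiles an assignment.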

Using ARC, instead of the old retain / release code, means that LanguageKit will be able to benefit from the ARC optimisers written for clang. These remove redundant retain / release pairs and do a number of other clever tricks to reduce the number of operations required.
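As a very rough illustration of the kind of transformation involved, the sketch below cancels a retain that is immediately followed by a release of the same value. This is not clang's actual ARC optimiser, which reasons about far more than adjacency. The instruction representation is hypothetical.

```c
#include <stddef.h>

/* Toy instruction stream: each operation retains, releases or uses a value,
   with values identified by small integers. */
typedef enum { OP_RETAIN, OP_RELEASE, OP_USE } OpKind;
typedef struct { OpKind kind; int value; } Op;

/* Drop any retain of v that is immediately followed by a release of v,
   since the pair has no net effect. The output prefix is treated like a
   stack, so nested pairs such as retain(a) retain(a) release(a) release(a)
   also cancel in a single pass. Anything in between (e.g. a use) keeps the
   pair, which is the conservative choice. Compacts ops in place and
   returns the new length. */
static size_t remove_redundant_pairs(Op *ops, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && ops[i].kind == OP_RELEASE &&
            ops[out - 1].kind == OP_RETAIN &&
            ops[out - 1].value == ops[i].value) {
            out--;              /* discard the matching retain, skip the release */
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```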

The interface to the code generation part of LanguageKit reflected two design decisions that no longer apply. First, I originally intended to share the runtime-specific code with clang. Since then, the clang and LanguageKit versions of CGObjCGNU.cpp have diverged a lot, so that's no longer an issue. Second, there was no usable Objective-C++ compiler (GCC's Objective-C++ support was terrible and clang had no C++ support at all), so there was a lot of conversion from Objective-C types to C types and then to C++ types. Now, the entire back end is written in Objective-C++, so we can pass objects right down. This simplifies the code a lot.

LanguageKit now requires the GNUstep Objective-C Runtime (libobjc2), and is no longer compatible with the legacy GCC runtime. This gives us a lot of interesting features.

The polymorphic selector problem is now more or less solved. Libobjc2 introduced type-dependent dispatch a while ago. This means that the mapping from selectors to methods now depends on the types, as well as the names, of the selector. This is important, because Objective-C permits you to define two methods in different parts of the class hierarchy with different types. You can then cast one of these objects to id (an untyped object), cast it to the other type, call the method, and have undefined behaviour. This is particularly problematic for Smalltalk code, because we have no type info in the source code, so we can't disambiguate these cases at compile time.
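Here is a minimal sketch of what type-dependent dispatch means for lookup. It assumes a single flat table and simplified type encodings: `Q@:` for a method returning an unsigned integer, `@@:` for one returning an object (real encodings also carry frame offsets). This is not libobjc2's data structure. It only illustrates the idea that the name and the types together form the key. `SomeClass` is a hypothetical class.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical method table entry: with type-dependent dispatch, the key
   is the selector name AND its type encoding. */
typedef struct {
    const char *name;   /* selector name, e.g. "count" */
    const char *types;  /* simplified type encoding, e.g. "Q@:" */
    const char *impl;   /* stands in for the IMP (function pointer) */
} MethodEntry;

static const MethodEntry table[] = {
    { "count", "Q@:", "-[NSArray count] (returns NSUInteger)" },
    { "count", "@@:", "-[SomeClass count] (returns an object)" },
};

/* Look up by name and types. A mismatch in either is a miss, so a caller
   expecting an object can never reach the integer-returning method. */
static const char *lookup(const char *name, const char *types) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0 &&
            strcmp(table[i].types, types) == 0) {
            return table[i].impl;
        }
    }
    return NULL;  /* a real runtime would do something more useful here */
}
```

With name-only lookup, the two `count` entries would collide, and whichever was found first would be called with the wrong return type.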

There are two halves to this problem. One is defining methods, the other is sending messages. When you define a #count method in Smalltalk, should that be the version that returns an integer (like NSArray) or the version that returns an object? With the latest version of LanguageKit, both are now emitted. The runtime will automagically select the correct one based on the type info in the selector.
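One way to picture emitting both versions is a hypothetical C sketch. The Smalltalk method body is compiled once with the all-objects signature, and a second, integer-returning variant unboxes its result. The post does not say whether LanguageKit compiles the body twice or wraps it like this. The wrapper is therefore an assumption, as are the boxing helper and the variant list.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a boxed integer object. */
typedef struct { long value; } BoxedInt;

static BoxedInt *box_int(long v) {
    BoxedInt *b = malloc(sizeof *b);
    if (b != NULL) {
        b->value = v;
    }
    return b;
}

/* The Smalltalk #count body compiled with the all-objects signature ("@@:"):
   it always returns an object. Pretend the collection holds three elements. */
static void *count_object(void *self) {
    (void)self;
    return box_int(3);
}

/* Second variant for the same method with the integer signature ("Q@:"):
   call the object version and unbox its result. */
static unsigned long count_integer(void *self) {
    BoxedInt *b = count_object(self);
    if (b == NULL) {
        return 0;  /* allocation failed; a real implementation would raise */
    }
    unsigned long v = (unsigned long)b->value;
    free(b);       /* in this toy, the caller of count_object owns the box */
    return v;
}

/* Both variants are registered under the same name with different types,
   leaving the runtime to pick one based on the caller's type encoding. */
typedef struct {
    const char *name;
    const char *types;
    void (*imp)(void);   /* generic function pointer; cast back before calling */
} MethodVariant;

static const MethodVariant count_variants[] = {
    { "count", "@@:", (void (*)(void))count_object },
    { "count", "Q@:", (void (*)(void))count_integer },
};
```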

Another change in libobjc2 is that the method lookup function now returns a cacheable slot pointer. As a side effect, we can also look up the type encoding of a method very quickly. Now, when you send a message with an ambiguous type signature, LanguageKit generates code that first looks up the method's type encoding, then branches on which encoding it finds. This is slower than a normal message send, but not by much.
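The code generated at an ambiguous call site might look roughly like this C sketch. `Slot` is loosely modelled on the idea of a looked-up slot that carries the method's type encoding and implementation. The field names, the simplified encodings, the sample methods and the caller-provided `Box` used for boxing are all hypothetical, and the `_cmd` argument of a real method is omitted.

```c
#include <stddef.h>

/* Hypothetical slot: what the lookup returns, including the type encoding. */
typedef struct {
    const char *types;      /* simplified encoding: "Q@:" or "@@:" */
    void (*method)(void);   /* implementation, stored as a generic pointer */
} Slot;

/* Toy boxed integer, filled in place instead of allocating an object. */
typedef struct { unsigned long value; } Box;

/* Two sample implementations with different return types. */
static unsigned long array_count(void *self) { (void)self; return 2; }
static void *object_count(void *self) { return self; }

/* Generated-code sketch for "receiver count" when the return type is
   ambiguous: inspect the slot's encoding, then call through the matching
   function type. Calling through the correct type keeps this well defined. */
static void *send_count(void *receiver, const Slot *slot, Box *storage) {
    if (slot->types[0] == 'Q') {
        unsigned long (*imp)(void *) = (unsigned long (*)(void *))slot->method;
        storage->value = imp(receiver);   /* integer result: box it */
        return storage;
    }
    void *(*imp)(void *) = (void *(*)(void *))slot->method;
    return imp(receiver);                 /* already an object */
}
```

The extra cost over a normal send is one load and a compare on the encoding, which fits the post's claim that the penalty is small.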

LanguageKit now generates blocks that use the same ABI as Objective-C blocks. This has three advantages. First, lots of people care about the performance of Objective-C blocks, so they'll be working on improving the LLVM optimisers to make them faster (e.g. by inlining them). Second, it means that we can now pass LanguageKit blocks to functions or methods that expect Objective-C blocks. This is not completely true yet, because blocks currently always take objects as arguments and return an object, while a lot of Objective-C code expects blocks with different argument types, but it's a start. Finally, it means that we have less code in LanguageKit to maintain, which is always good for reliability.
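For reference, the blocks ABI meant here is the one documented in clang's Block ABI specification. Every block literal starts with an `isa` pointer, flags, a reserved word, an `invoke` function pointer and a descriptor, followed by any captured variables, and `invoke` receives the block itself as its first argument. The plain C sketch below builds such a literal by hand, so no `-fblocks` is needed. The `Counter` type standing in for an object and the `AddBlock` layout are hypothetical. A real stack block would point `isa` at `_NSConcreteStackBlock`.

```c
#include <stddef.h>

/* Descriptor with only the two mandatory fields; flags is 0, so there are
   no copy/dispose helpers and no signature string. */
struct Block_descriptor {
    unsigned long int reserved;  /* always 0 */
    unsigned long int size;      /* sizeof the whole block literal */
};

/* Common header that every block literal starts with. */
struct Block_header {
    void *isa;      /* normally &_NSConcreteStackBlock etc.; NULL in this sketch */
    int flags;
    int reserved;
    void *(*invoke)(struct Block_header *block, void *arg);
    struct Block_descriptor *descriptor;
};

/* Stand-in for an object: a mutable counter. */
typedef struct { long value; } Counter;

/* A block literal that captures one variable after the header, taking and
   returning an "object" as LanguageKit blocks currently do. */
struct AddBlock {
    struct Block_header header;
    long delta;     /* captured variable */
};

static void *add_invoke(struct Block_header *block, void *arg) {
    /* Valid cast: the header is the first member of struct AddBlock. */
    struct AddBlock *self = (struct AddBlock *)block;
    Counter *c = arg;
    c->value += self->delta;
    return c;
}

static struct Block_descriptor add_descriptor = { 0, sizeof(struct AddBlock) };

/* Any caller that knows only the common header can run the block. */
static void *call_block(struct Block_header *block, void *arg) {
    return block->invoke(block, arg);
}
```

`call_block()` knows nothing about `AddBlock`. It relies only on the common header, which is why code written against one compiler's blocks can call blocks produced by another compiler that follows the same layout.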

LanguageKit has always had an LKObject type. This is a pointer-sized value that either has a small integer hidden inside it, with the low bit set to 1, or is a pointer to a real object. Before any message send, we checked the low bit and only did a real message send if it was zero.
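Here is a minimal C sketch of the tagging scheme described above. It assumes objects are at least 2-byte aligned, so a real pointer always has its low bit clear. The `lk_` names are hypothetical, not LanguageKit's actual helpers. Integers must fit in one bit fewer than a pointer.

```c
#include <stdint.h>

/* Hypothetical model of LKObject: a pointer-sized word that is either a
   real object pointer (low bit 0) or a small integer shifted left by one
   with the low bit set to 1. */
typedef uintptr_t LKObject;

static LKObject lk_from_pointer(void *obj) {
    return (LKObject)obj;
}

static LKObject lk_box_small_int(intptr_t v) {
    return ((uintptr_t)v << 1) | 1u;
}

static int lk_is_small_int(LKObject o) {
    return (o & 1u) != 0;
}

/* Relies on arithmetic right shift of negative values, which is
   implementation-defined in C but universal on mainstream compilers. */
static intptr_t lk_unbox_small_int(LKObject o) {
    return (intptr_t)o >> 1;
}

/* The guard that used to precede every send: only a word with the low bit
   clear is a real object that can take the normal message-send path. */
static int lk_needs_real_send(LKObject o) {
    return (o & 1u) == 0;
}
```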

Now, support for small objects is part of the runtime. If the low bits in an object pointer are not 0, the runtime looks up the class in a small side table instead of dereferencing the pointer. This means that LanguageKit can skip the special cases, generating much smaller code. It still generates the special cases for a small selection of methods, but only ones that get a significant benefit from inlining in the small integer case, such as arithmetic. This also has the advantage that we can return small integers from methods without needing to box them, and Objective-C code can use them as if they were real objects. On 64-bit platforms, we'll eventually start storing 32-bit floats in pointers too.
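The runtime side of this can be sketched under a few stated assumptions. There are three low tag bits, which is reasonable for 8-byte-aligned objects on 64-bit platforms. A tag of 0 means an ordinary pointer. A hypothetical `SmallInt` class is registered for tag 1. The actual tag assignments in libobjc2 may differ.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical class record; a real runtime class holds much more. */
typedef struct Class { const char *name; } Class;

/* A real object starts with its isa (class) pointer. */
typedef struct Object { const Class *isa; } Object;

/* Assumption: 8-byte-aligned objects leave the low three bits free. */
#define TAG_MASK ((uintptr_t)7)

static const Class SmallInt = { "SmallInt" };

/* Side table indexed by the tag bits; unused tags are NULL. */
static const Class *small_object_classes[8] = { NULL, &SmallInt };

/* Find the class of any object-sized word. Tagged values never reach the
   dereference, because they are not valid pointers. */
static const Class *class_of(const void *obj) {
    uintptr_t tag = (uintptr_t)obj & TAG_MASK;
    if (tag != 0) {
        return small_object_classes[tag];
    }
    return ((const Object *)obj)->isa;
}
```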

Oh, on the subject of boxing and unboxing, that code is improved too. We can now box and unbox more complex structures fairly reliably. The new code still needs testing on more platforms, but it's looking promising.
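As an illustration of what boxing a structure involves (not LanguageKit's implementation), this sketch copies the bytes of a C struct into a heap object together with its type encoding. It then refuses to unbox into a different type. The `BoxedValue` and `Range` types and the `{Range=QQ}` encoding are hypothetical examples.

```c
#include <stdlib.h>
#include <string.h>

/* Example struct to box. */
typedef struct { unsigned long long location, length; } Range;

/* Hypothetical boxed-struct object: the type encoding of the value plus a
   private copy of its bytes. */
typedef struct {
    const char *encoding;
    size_t size;
    unsigned char bytes[];   /* flexible array member holding the copy */
} BoxedValue;

/* Box any struct by copying it; returns NULL if allocation fails. */
static BoxedValue *box_value(const void *value, size_t size, const char *encoding) {
    BoxedValue *b = malloc(sizeof *b + size);
    if (b == NULL) {
        return NULL;
    }
    b->encoding = encoding;
    b->size = size;
    memcpy(b->bytes, value, size);
    return b;
}

/* Copy the value out. Returns 0 and leaves *out untouched if the requested
   type does not match what was boxed. */
static int unbox_value(const BoxedValue *b, void *out, size_t size, const char *encoding) {
    if (b->size != size || strcmp(b->encoding, encoding) != 0) {
        return 0;
    }
    memcpy(out, b->bytes, size);
    return 1;
}
```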

My favourite new feature, however, was something I almost finished over Christmas and then left to bitrot. It's now finished, and we have transparent bridging for C functions. When you specify a framework to load, LanguageKit loads the relevant library. If you have SourceCodeKit installed, then it will also use libclang to parse the framework header (e.g. FooKit.h, for FooKit) and find all of the functions that it declares and their types.

In the Smalltalk front end, this is exposed via the C pseudoclass. Messages sent to C look like message sends but are really function calls. For example, you can use the standard math library function sqrt() like this:

C sqrt: 42

This doesn't go via any kind of foreign function interface. There's no sending a message, deconstructing the call frame, and then generating the new call. It's the direct equivalent of writing sqrt(42) in C.

For functions that take more than one argument, we have two options in terms of syntax. You can use a C-like syntax, where the function looks like a single-argument message that takes an array as its argument, like this:

C fdim: {60. 12}

This is equivalent to writing fdim(60, 12) in C. Alternatively, you can use a more Smalltalk-like syntax and split the function name into keyword parts. For example, to call NSLocationInRange(), you might write:

C NSLocation: l InRange: r.

The parser strips the colons and combines the message parts, so this is equivalent to writing NSLocationInRange(l, r) in C.
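The name mapping can be sketched in a few lines of C. This assumes the rule is exactly "drop the colons and concatenate the keyword parts". The function name `c_function_name` is hypothetical. The C-like syntax, where an array literal supplies the arguments, is not covered here.

```c
#include <stddef.h>

/* Map a keyword selector to a C function name by dropping every colon and
   joining the parts: "NSLocation:InRange:" -> "NSLocationInRange",
   "sqrt:" -> "sqrt". Writes at most len - 1 characters plus a terminator
   and returns the number of characters written (excluding the terminator). */
static size_t c_function_name(const char *selector, char *out, size_t len) {
    size_t n = 0;
    if (len == 0) {
        return 0;
    }
    for (const char *p = selector; *p != '\0' && n + 1 < len; p++) {
        if (*p != ':') {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

The arguments are then passed in keyword order, so `l` and `r` become the first and second arguments of the C call.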