Compiler Fun

Posted on 12 May 2008 by David Chisnall

Anyone following the Étoilé svn logs recently will have noticed that I haven't been committing much for a few weeks. The reason is that I've been taking a short break to do some compiler hacking.

Objective-C support was first added to GCC by some guys at NeXT. They didn't want to release their code, but were eventually forced to by the FSF. They did not release the code for their runtime library, and so this code was completely useless to anyone else. RMS wrote a drop-in replacement for this library, which became the GNU Objective-C runtime. Gradually the GNU and NeXT runtimes diverged and the Objective-C support code in GCC became littered with #ifdefs.

After Apple bought NeXT, they continued developing their version of GCC in a branch. This branch was slightly cleaner, since it never had support for the GNU runtime, but for the same reason it was of no use to anyone on platforms other than Darwin. The Objective-C code in GCC is no fun at all to work with - Objective-C structures are lowered to the corresponding C structures, so there is no clean Objective-C AST to work with and runtime-specific code is interleaved with the abstract representations. When Apple add a new language feature, they add it to their branch, and if anyone else wants to use it then they have to merge the changes into the main trunk. Unfortunately, no one is doing this and Objective-C support in GCC is in a rather depressing state (bugs in Objective-C are not seen as show stoppers for a release, as we saw in the early 4.x series).

Recently, GCC switched to GPLv3. Apple corporate policy is that they will not touch GPLv3 code, and so the Apple branch is now a fork of GCC 4.2. Features added to GNU GCC will not find their way into Apple GCC and vice versa, unless explicitly licensed in a compatible way by their contributor.

Apple have also started looking at a new compiler infrastructure, known as LLVM. This is a modular framework for building compilers. It currently has an Objective-C/C/C++ front end based on Apple's GCC. This combination of an LLVM back end and a GCC front end is typically known as llvm-gcc. It is found in the iPhone SDK and is likely to be found in the OS X dev tools soon. GCC isn't really designed to be split apart like this, however, and so the Apple guys have been working on a new front end, called clang.

Unlike GCC, clang has very clean layering. This is intentional, since Apple also want to use it in Xcode for syntax highlighting and refactoring tools. This means that every single Objective-C language construct gets corresponding AST nodes, which are then passed to another part of the program that emits LLVM intermediate representation (IR) code. The IR is an assembly-like language in static single assignment (SSA) form, which is then turned into native code for the desired platform.
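As a rough illustration of this layering, here is a minimal C++ sketch in which the AST knows nothing about code generation and a separate emitter walks it to produce SSA-style instructions. All of the names (Expr, IntLiteral, VarRef, Add, IREmitter) are hypothetical and are not clang's real classes, and the emitted text only loosely resembles LLVM IR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST: literals, named values, additions.
struct Expr {
    virtual ~Expr() = default;
};
struct IntLiteral : Expr {
    int value;
    explicit IntLiteral(int v) : value(v) {}
};
struct VarRef : Expr {
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
};
struct Add : Expr {
    std::unique_ptr<Expr> lhs, rhs;
    Add(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
};

// Separate layer: walks the AST and emits one instruction per addition.
// Every temporary is assigned exactly once, which is the SSA property.
class IREmitter {
    std::vector<std::string> lines_;
    int nextTemp_ = 0;

public:
    // Returns the operand that names the value of `e`.
    std::string emit(const Expr &e) {
        if (auto *lit = dynamic_cast<const IntLiteral *>(&e))
            return std::to_string(lit->value);
        if (auto *var = dynamic_cast<const VarRef *>(&e))
            return "%" + var->name;
        if (auto *add = dynamic_cast<const Add *>(&e)) {
            std::string l = emit(*add->lhs);
            std::string r = emit(*add->rhs);
            std::string tmp = "%t" + std::to_string(nextTemp_++);
            lines_.push_back(tmp + " = add i32 " + l + ", " + r);
            return tmp;
        }
        assert(false && "unknown AST node");
        return "";
    }
    const std::vector<std::string> &lines() const { return lines_; }
};
```

Building the tree for (a + 2) + b and passing it to IREmitter::emit produces two add instructions, the second of which uses the temporary defined by the first. A syntax highlighter or refactoring tool could use the same AST types without ever touching the emitter.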

When I first looked at clang, most of the parsing code for Objective-C was done, but none of the code generation part. This meant that I was free to add any interfaces I wanted. Clang now has an abstract class encapsulating all of the runtime-specific behaviour, and hooks in the generic code that call it. I have also written a complete implementation of this class for the GNU runtime and an almost-complete one for the Étoilé runtime. As a result, clang can now compile about 90% of the files in GNUstep-base without issue. The remaining ones fail because of a couple of outstanding bugs with implicit casts (the LLVM type system is much stricter than the Objective-C one, so casts which are implicit in Objective-C need to become explicit in the IR) and a few C features. GNUstep uses variable length arrays in a few places, for example, and I have only added partial support for these.
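The shape of such an abstraction might look something like the following C++ sketch. It is only an illustration under assumptions: the class and function names are invented here and are not clang's actual interface, and the "generated code" is just pseudo-C text. The first implementation follows the GNU runtime's convention of looking up the method with objc_msg_lookup and then calling the result; the second is modelled on an objc_msgSend-style entry point purely for contrast, and says nothing about how the Étoilé runtime works.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface: everything the generic code generator needs to
// ask of a runtime. Real interfaces would cover classes, categories,
// protocols and selectors as well.
class RuntimeSketch {
public:
    virtual ~RuntimeSketch() = default;
    virtual std::string emitMessageSend(const std::string &receiver,
                                        const std::string &selector) const = 0;
};

// GNU-style: look up the implementation, then call it.
class LookupThenCallRuntime : public RuntimeSketch {
public:
    std::string emitMessageSend(const std::string &receiver,
                                const std::string &selector) const override {
        const std::string args =
            "(" + receiver + ", @selector(" + selector + "))";
        return "objc_msg_lookup" + args + args;
    }
};

// Contrast case: a single send function does the lookup and the call.
class DirectSendRuntime : public RuntimeSketch {
public:
    std::string emitMessageSend(const std::string &receiver,
                                const std::string &selector) const override {
        return "objc_msgSend(" + receiver + ", @selector(" + selector + "))";
    }
};

// Generic code: contains no runtime-specific knowledge, only a hook.
std::string lowerMessageExpr(const RuntimeSketch &rt,
                             const std::string &receiver,
                             const std::string &selector) {
    assert(!selector.empty());
    return "result = " + rt.emitMessageSend(receiver, selector) + ";";
}
```

Calling lowerMessageExpr with either runtime object yields a different lowering of the same message send, while the generic function itself needs no #ifdefs.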

My changes to clang are currently undergoing code review. Once the review is finished and I've made the required changes, they should go in.

Objective-C isn't the only thing that makes this interesting. Since the object model code is all isolated in a separate class, it is possible to plug this into other compilers trivially. Generating classes, protocols, categories, selectors and message sends that use the underlying GNU runtime (and soon the Étoilé runtime) functionality is straightforward with this class, because each high-level construct is mapped to a method call. I am currently in the process of writing a Smalltalk compiler that uses this same back end. LLVM supports both JIT and static compilation, so we will be able to JIT-compile Smalltalk while developing, dump it to a file, and statically compile it for distribution.
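To make the idea of one back end shared by several front ends concrete, here is a small C++ sketch. Everything in it is hypothetical: ObjectModelBackend stands in for the object model class and merely records what it was asked to generate, and the two "front ends" accept only a single unary message send each.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared object-model back end.
// Each high-level construct corresponds to one method call.
class ObjectModelBackend {
    std::vector<std::string> ops_;

public:
    void generateClass(const std::string &name, const std::string &super) {
        assert(!name.empty());
        ops_.push_back("class " + name + " : " + super);
    }
    void generateMessageSend(const std::string &receiver,
                             const std::string &selector) {
        assert(!selector.empty());
        ops_.push_back("send " + selector + " to " + receiver);
    }
    const std::vector<std::string> &ops() const { return ops_; }
};

// Toy Objective-C front end: accepts only "[receiver selector]".
bool compileObjCSend(const std::string &src, ObjectModelBackend &be) {
    if (src.size() < 2 || src.front() != '[' || src.back() != ']')
        return false;
    std::istringstream in(src.substr(1, src.size() - 2));
    std::string receiver, selector;
    if (!(in >> receiver >> selector))
        return false;
    be.generateMessageSend(receiver, selector);
    return true;
}

// Toy Smalltalk front end: accepts only "receiver selector."
bool compileSmalltalkSend(const std::string &src, ObjectModelBackend &be) {
    if (src.empty() || src.back() != '.')
        return false;
    std::istringstream in(src.substr(0, src.size() - 1));
    std::string receiver, selector;
    if (!(in >> receiver >> selector))
        return false;
    be.generateMessageSend(receiver, selector);
    return true;
}
```

Feeding "[array count]" to the first front end and "array count." to the second results in two identical requests to the back end, which is the property that lets a new language reuse the existing runtime support.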

This means that Smalltalk will be a first-class citizen of the Étoilé ecosystem. It will be possible to write applications in Smalltalk, and Smalltalk classes will be able to inherit from Objective-C classes. There is no bridging - Smalltalk methods will be compiled to native code and attached to the same structures as Objective-C methods. Once this is finished, I will be recommending Smalltalk as the development language of choice for new Étoilé applications. If profiling shows that a particular piece of code is too slow, then you might want to rewrite it in Objective-C (or even pure C), although I don't expect Smalltalk to be much slower than Objective-C.
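A toy model of what "no bridging" means is sketched below in C++. The structures are invented for illustration and are far simpler than those of any real Objective-C runtime: a class holds a single method table and a superclass pointer, and dispatch walks the chain without knowing or caring which language a method was compiled from.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

struct Object;
// Stand-in for a pointer to compiled native code.
using Method = std::function<std::string(Object &)>;

// Hypothetical class structure: one method table per class, shared by
// every language that targets the runtime.
struct Class {
    std::string name;
    const Class *super;
    std::map<std::string, Method> methods;
};

struct Object {
    const Class *isa;
};

// Walk the superclass chain until some class implements the selector.
const Method *lookup(const Class *cls, const std::string &sel) {
    for (; cls != nullptr; cls = cls->super) {
        auto it = cls->methods.find(sel);
        if (it != cls->methods.end())
            return &it->second;
    }
    return nullptr;
}

// Send a message; the fallback string borrows Smalltalk's naming.
std::string send(Object &obj, const std::string &sel) {
    assert(obj.isa && "every object needs a class");
    const Method *m = lookup(obj.isa, sel);
    if (!m)
        return "doesNotUnderstand: " + sel;
    return (*m)(obj);
}
```

If a base class gets a method from an Objective-C compiler and a subclass gets one from a Smalltalk compiler, an instance of the subclass responds to both through the same send function, with no translation layer in between.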

Smalltalk is not the only high-level language we will implement in this way - just the first. Expect Io, JavaScript and maybe even Self implementations later. These languages are all prototype-based, however, and so require a few features that are not found in the GNU runtime (but are in the Étoilé runtime) for full support.