News: Stay up to date

The Étoilé community is an active group of developers, designers, testers and users. New work is being done every day. Visit often to find out what we've been up to.


Static Compiling Smalltalk

Posted on 10 November 2008 by David Chisnall

One of the things I wanted to do with Smalltalk was to allow static compilation. This is possible with LLVM as the back end. The compiler generates LLVM IR, a low-level intermediate representation, which can be optimised and then either compiled to native code or interpreted. I was using this for the JIT: the IR was created when the code was loaded, but only turned into native code on demand, as each method was used.

Today I committed a few changes to LanguageKit to allow the bitcode to be written to a file instead of being loaded into the JIT. This was slightly more complicated than you might imagine. With the JIT, I use a trick where each Smalltalk module uses the set of functions defining small integer messages as a template. This allows them to be inlined nicely without having to worry about cross-module optimisations. For static compilation this is not desirable, so the biggest change was allowing the generated code to reference these functions either externally or internally, depending on how the code generator is being used.

Once this was done, I added a new -c option to edlc. If you now do:

$ edlc -c -f

You will get a file test.bc as output. This contains the LLVM bitcode for the Smalltalk file. The next step is to link together all of the .bc files, including the MsgSendSmallInt.bc file which contains definitions of small integer messages:

$ llvm-link ${GNUSTEP_LOCAL_ROOT}/Library/Frameworks/LanguageKit.framework/Versions/0/Resources/MsgSendSmallInt.bc test.bc -o smalltalk.bc

This outputs a single file, smalltalk.bc, containing all of the bitcode from the various modules. If you compiled more than one Smalltalk file, list all of the .bc files here. This is completely unoptimised, so let's run some optimisations on it:

$ opt -O3 smalltalk.bc -o smalltalk.optimised.bc

This runs the same set of optimisations that llvm-gcc runs at -O3. I haven't yet done any proper testing to see whether this is a good choice for Smalltalk code, but hopefully it is (if anyone can come up with a good list of optimisations before I get around to benchmarking it, please let me know).

Now that we have an optimised bitcode file, we want to turn it into object code. This is a two-step process:

$ llc smalltalk.optimised.bc
$ gcc -c smalltalk.optimised.s

The first step produces assembly code and the second step assembles it (you could run as, the assembler, directly for the second step, but I was lazy and just handed it to the GCC compiler driver). You now have a file called smalltalk.optimised.o, an object code file that you can link into your executable just as you would one compiled from Objective-C.

This sounds a bit complicated, and it is. It actually takes more steps than the first C compiler I ever used, where preprocessing, compiling, assembling, and linking were all separate steps. Fortunately, Nicola Pero is working on adding support for it to GNUstep Make, so soon it should be just a matter of putting SMALLTALK_FILES=... in your GNUmakefile.
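Until that support is available, the whole sequence can be wrapped in a small shell script. The following is a minimal POSIX shell sketch, not part of LanguageKit or GNUstep Make: the function name smalltalk_to_object and the EDLC, LLVM_LINK, OPT, LLC and CC overrides are hypothetical, and it assumes that edlc -c -f NAME.st writes NAME.bc to the current directory, as the test.bc example suggests. It simply chains the commands described in this post.

```shell
#!/bin/sh
# Hypothetical wrapper for the static compilation steps:
#   edlc -c (per file) -> llvm-link -> opt -O3 -> llc -> gcc -c
# ASSUMPTION: `edlc -c -f NAME.st` writes NAME.bc to the current directory.
# Tool names can be overridden from the environment, e.g. OPT=/path/to/opt.

EDLC=${EDLC:-edlc}
LLVM_LINK=${LLVM_LINK:-llvm-link}
OPT=${OPT:-opt}
LLC=${LLC:-llc}
CC=${CC:-gcc}
SMALLINT_BC="${GNUSTEP_LOCAL_ROOT}/Library/Frameworks/LanguageKit.framework/Versions/0/Resources/MsgSendSmallInt.bc"

# Usage: smalltalk_to_object BASENAME FILE.st [FILE.st ...]
# Writes BASENAME.bc, BASENAME.optimised.bc, BASENAME.optimised.s and
# BASENAME.optimised.o, then prints the name of the object file.
smalltalk_to_object() {
    base=$1
    shift
    n=$#
    # Compile each Smalltalk file and append its .bc name to the argument list.
    # The for loop's word list is expanded once, so appending does not affect it.
    for src in "$@"; do
        "$EDLC" -c -f "$src" || return 1
        set -- "$@" "${src%.st}.bc"
    done
    # Drop the original .st names, leaving only the .bc names.
    shift "$n"
    # Link every module together with the small integer message definitions.
    "$LLVM_LINK" "$SMALLINT_BC" "$@" -o "$base.bc" || return 1
    # Optimise, generate assembly, then assemble with the GCC driver.
    "$OPT" -O3 "$base.bc" -o "$base.optimised.bc" || return 1
    "$LLC" "$base.optimised.bc" -o "$base.optimised.s" || return 1
    "$CC" -c "$base.optimised.s" -o "$base.optimised.o" || return 1
    printf '%s\n' "$base.optimised.o"
}
```

For example, after sourcing the script, smalltalk_to_object smalltalk test.st other.st (hypothetical file names) should leave smalltalk.optimised.o in the current directory. Rebuilding the file list with set -- rather than a space-separated string keeps names containing spaces intact, and each step stops the function if the previous one failed.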

The bad news is that static compilation support is too big a change to be properly reviewed in time for 0.4.0, so unless you are running trunk you will have to wait a bit to see it. 0.4.1 should be out around the New Year, so you don't have too long to wait...