News: Stay up to date

The Étoilé community is an active group of developers, designers, testers and users. New work is being done every day. Visit often to find out what we've been up to.

Pragmatic Smalltalk 0.5

Posted on 12 July 2008 by David Chisnall

I've been calling Étoilé 'a pragmatic Smalltalk' for a long time (although Nicolas, I believe, was the one to coin the expression). Smalltalk is a really great language, but it has two disadvantages:

1) It tends to be bytecode-interpreted, which is not very fast.
2) Implementations tend to be all-or-nothing.

The first is less of a problem now that CPUs are so fast they spend 90% of their time idle in a typical desktop workload. The second is much more of a problem. Smalltalk-80 includes a complete GUI, and common implementations such as Squeak adopt this model. This means that Squeak applications and 'native' applications are entirely separate. If you need even one thing that Squeak doesn't have, then using Squeak is not easy.

This week, I committed the first version of the Smalltalk compiler I have been working on to Étoilé svn. Unlike other Smalltalk implementations, this is designed from the ground up for interoperability. Smalltalk objects are compiled (to native code) as Objective-C objects. This means that they can subclass Objective-C objects, and can even implement categories on Objective-C objects. There is no C function interface - if you want to call C functions then call them from Objective-C.

The compiler is in three components. SmalltalkKit contains everything required to take a string containing Smalltalk code and compile it to a set of Objective-C objects.

The Support library contains things needed by Smalltalk but not by Objective-C. The most important class here is BlockClosure, which implements a Smalltalk block as an Objective-C object. The object holds a function pointer as an instance variable, along with pointers to bound variables and space for promoting other variables (eliminating the need for garbage-collected stack frames). There are also a few categories, such as map: and related methods on NSArray, which take blocks as arguments. Note that these are implemented in Objective-C even though they are used by Smalltalk; in most cases they could easily be implemented in Smalltalk instead.
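To make the closure layout a little more concrete, here is a rough sketch in plain C rather than Objective-C. It is not the real BlockClosure class: the struct, the fixed MAX_BOUND limit and every function name are assumptions made up for illustration. It shows a function pointer, pointers to bound variables, and storage that those variables can be promoted into once the enclosing stack frame is about to go away, plus a map-style helper that takes such a block.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BOUND 4  /* arbitrary limit, for the sketch only */

typedef struct Closure Closure;
typedef long (*BlockFn)(Closure *self, long arg);

/* Hypothetical layout, loosely modelled on the description above. */
struct Closure {
    BlockFn fn;                /* compiled body of the block */
    long   *bound[MAX_BOUND];  /* pointers to variables bound from the enclosing scope */
    long    promoted[MAX_BOUND]; /* space for variables promoted off the stack */
    size_t  nbound;
};

static void closure_init(Closure *c, BlockFn fn) {
    c->fn = fn;
    c->nbound = 0;
}

/* Bind a variable that still lives in the caller's stack frame. */
static void closure_bind(Closure *c, long *var) {
    assert(c->nbound < MAX_BOUND);
    c->bound[c->nbound++] = var;
}

/* Copy every bound variable into the closure and retarget the pointers,
 * so the closure stays valid after the enclosing frame returns. */
static void closure_promote(Closure *c) {
    for (size_t i = 0; i < c->nbound; i++) {
        c->promoted[i] = *c->bound[i];
        c->bound[i] = &c->promoted[i];
    }
}

static long closure_call(Closure *c, long arg) {
    return c->fn(c, arg);
}

/* Rough analogue of a map: method: apply the block to each element. */
static void array_map(const long *in, long *out, size_t n, Closure *block) {
    for (size_t i = 0; i < n; i++)
        out[i] = closure_call(block, in[i]);
}

/* Body of a block like [:x | x + offset]. */
static long add_offset_body(Closure *self, long x) {
    return x + *self->bound[0];
}

/* Fill in a closure that outlives this function's frame. The closure is
 * written through `out` so the promoted pointers stay valid (it must not
 * be copied by value afterwards). */
static void make_adder(Closure *out, long offset) {
    closure_init(out, add_offset_body);
    closure_bind(out, &offset);  /* offset lives on this stack frame */
    closure_promote(out);        /* move it into the closure before returning */
}
```

A caller would declare a Closure, pass its address to make_adder, and hand it to array_map. Because the bound variable has been promoted into the closure itself, nothing on the stack needs to be kept alive or garbage collected.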

The final part is a tool which compiles a Smalltalk file, instantiates a specified class, and sends the instance a run message. This is very small and shows how the compiler can be used, and will serve as the framework for writing complete applications in Smalltalk.

The parsing is done in Objective-C, using the Lemon parser generator from SQLite. The abstract syntax tree (AST) is constructed out of Objective-C objects, which means it's exposed to Smalltalk. As a result, Smalltalk programs can generate code easily by constructing the AST and invoking its compileWith: method, or by instantiating a parser and giving it a string.
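As an illustration of building an AST by hand and asking it to compile itself, here is a small sketch in plain C. Nothing in it is SmalltalkKit's API: the Node and CodeGen types are hypothetical, compile_with merely stands in for the compileWith: method mentioned above, and the "code generator" just appends textual stack-machine pseudo-instructions instead of producing LLVM IR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical code generator: collects textual pseudo-instructions. */
typedef struct {
    char   text[256];
    size_t len;
} CodeGen;

static void codegen_reset(CodeGen *g) {
    g->len = 0;
    g->text[0] = '\0';
}

static void codegen_emit(CodeGen *g, const char *op, const char *operand) {
    size_t room = sizeof g->text - g->len;
    int n = snprintf(g->text + g->len, room, "%s %s\n", op, operand);
    assert(n >= 0 && (size_t)n < room);  /* sketch only: no buffer growth */
    g->len += (size_t)n;
}

/* Hypothetical AST node: each node knows how to compile itself. */
typedef struct Node Node;
struct Node {
    void (*compile_with)(const Node *self, CodeGen *g);
    const char *text;      /* literal value, or selector for a message send */
    const Node *receiver;  /* message sends only */
    const Node *argument;  /* message sends only, may be NULL */
};

static void literal_compile(const Node *self, CodeGen *g) {
    codegen_emit(g, "push", self->text);
}

static void send_compile(const Node *self, CodeGen *g) {
    /* Compile the receiver and argument first, then the send itself. */
    self->receiver->compile_with(self->receiver, g);
    if (self->argument != NULL)
        self->argument->compile_with(self->argument, g);
    codegen_emit(g, "send", self->text);
}

static Node make_literal(const char *value) {
    Node n = { literal_compile, value, NULL, NULL };
    return n;
}

static Node make_send(const Node *receiver, const char *selector,
                      const Node *argument) {
    Node n = { send_compile, selector, receiver, argument };
    return n;
}

/* Build the AST for the expression `3 + 4` and compile it into g. */
static void compile_three_plus_four(CodeGen *g) {
    Node three = make_literal("3");
    Node four  = make_literal("4");
    Node sum   = make_send(&three, "+", &four);
    codegen_reset(g);
    sum.compile_with(&sum, g);
}
```

The point of the sketch is only the shape: a program that can construct nodes can also generate code, without going through source text or a parser at all.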

Currently, the compiler only works in-process. It uses runtime introspection when constructing the AST. Code generation, however, is done via LLVM, and involves generating an LLVM intermediate representation (IR) version of the AST, running LLVM optimisation passes on this, and then compiling it to native code. With minor modifications, it is possible to emit the LLVM IR as bitcode and then run extra optimisations on it or compile and link it as a native library. Whether this is interesting depends on how long it takes to run the compiler. For the simple tests I've done so far, program startup has taken much longer than parsing and code generation (and I'm using a debug build of LLVM, which runs at about 10% of the speed of a release build). For larger programs, it might be worth compiling statically. If parsing is a major overhead, it might be worth caching the bitcode for each Smalltalk input class.
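The caching idea is only a possibility at this stage, but one way it might look is sketched below in plain C. Everything here is an assumption: the BitcodeCache type, the use of an FNV-1a hash of the source text to detect changes, and fake_compile, which stands in for the real parse and code generation steps and produces a placeholder string rather than actual LLVM bitcode.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8  /* arbitrary size, for the sketch only */

typedef struct {
    int      used;
    uint64_t source_hash;     /* hash of the Smalltalk source for the class */
    char     class_name[32];
    char     bitcode[64];     /* stand-in for real LLVM bitcode */
} CacheEntry;

typedef struct {
    CacheEntry slots[CACHE_SLOTS];
    int        compile_count; /* how many times the slow path ran */
} BitcodeCache;

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Placeholder for parsing and code generation. */
static void fake_compile(const char *source, char *out, size_t out_size) {
    snprintf(out, out_size, "bitcode<%zu bytes of source>", strlen(source));
}

/* Return cached bitcode for a class, recompiling only if the source changed. */
static const char *cache_get(BitcodeCache *c, const char *class_name,
                             const char *source) {
    uint64_t h = fnv1a(source);
    CacheEntry *slot = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        CacheEntry *e = &c->slots[i];
        if (e->used && strcmp(e->class_name, class_name) == 0) {
            if (e->source_hash == h)
                return e->bitcode;  /* hit: skip parsing and code generation */
            slot = e;               /* stale entry: reuse its slot */
            break;
        }
        if (!e->used && slot == NULL)
            slot = e;               /* remember the first free slot */
    }
    assert(slot != NULL);           /* sketch only: no eviction policy */
    assert(strlen(class_name) < sizeof slot->class_name);
    slot->used = 1;
    slot->source_hash = h;
    strcpy(slot->class_name, class_name);
    fake_compile(source, slot->bitcode, sizeof slot->bitcode);
    c->compile_count++;
    return slot->bitcode;
}
```

A real cache would presumably live on disk and store the emitted bitcode file per class. Whether it is worth having at all depends on the measurements mentioned above.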

So far, it is a fairly naive implementation. Many more optimisations are possible than are currently done, and some of them are very easy. My aim, however, is to move as many as possible into LLVM passes, so that they can be used when compiling other dynamic languages. The code representing the Objective-C object model is taken from code I wrote for clang, the new C language family front end for LLVM, and so is also used for compiling Objective-C with LLVM.