News: Stay up to date

The Étoilé community is an active group of developers, designers, testers and users. New work is being done every day. Visit often to find out what we've been up to.

So, you want to invent a language?

Posted on 12 October 2008 by David Chisnall

I posted a little while ago about the Smalltalk compiler in Étoilé svn. Since then, Truls Becken has rewritten my parser (which was quite bad, and is now quite good) and tidied up the code a little. I've also refactored it into two frameworks, LanguageKit and SmalltalkKit. LanguageKit contains all of the abstract syntax tree and code generation stuff, while SmalltalkKit contains all of the Smalltalk-specific parts.

The Smalltalk-specific part comes to a shade over 500 lines of code. This means that writing a new front-end for something Smalltalk-like is very easy (I plan on adding some things to LanguageKit to make slightly less Smalltalk-like languages similarly easy).

If you want to play, then the first thing you need is a subclass of LKCompiler, which implements two methods: +fileExtension and +parser. The first returns the extension used by scripts in your language (@"st" for Smalltalk), while the second returns the Class implementing your parser.

Then you need to implement the parser. This needs to implement just one method, parseString:, which takes a string as an argument and returns an AST. For Smalltalk, I have a hand-written tokeniser and use LEMON (from the SQLite project) for the parser. The tokeniser simply turns the string into a stream of tokens and then passes them one at a time to the parser (it might be simpler if I wrote it using something like Lex, but since it's only 200 lines of code now I can't really be bothered). The parser is generated from a BNF-like description of the grammar, with instructions in Objective-C on how to generate the AST from this.
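As a rough illustration of the arrangement described above, in which a hand-written tokeniser pushes tokens one at a time into a parser, here is a minimal sketch in plain C. It is not the SmalltalkKit tokeniser. The token codes, the token_sink callback type, and the token_record helper are all hypothetical names invented for this example. With LEMON, the generated header would normally supply the token codes and the generated parse function would take the place of the callback.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; a LEMON-generated header would normally
 * define the real ones. */
enum { TOK_END = 0, TOK_WORD, TOK_KEYWORD, TOK_NUMBER, TOK_ASSIGN, TOK_SYMBOL };

/* Stand-in for the push-parser entry point: called once per token. */
typedef void (*token_sink)(void *parser, int code, const char *text, size_t len);

/* Split src into tokens and push each one to the sink as it is found. */
static void tokenise(const char *src, token_sink push, void *parser)
{
    const char *p = src;
    while (*p != '\0') {
        if (isspace((unsigned char)*p)) { p++; continue; }
        const char *start = p;
        int code;
        if (isalpha((unsigned char)*p)) {
            while (isalnum((unsigned char)*p)) p++;
            code = TOK_WORD;
            /* "max:" is a keyword selector part, but "x:=" is not. */
            if (p[0] == ':' && p[1] != '=') { p++; code = TOK_KEYWORD; }
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
            code = TOK_NUMBER;
        } else if (p[0] == ':' && p[1] == '=') {
            p += 2;
            code = TOK_ASSIGN;
        } else {
            p++;                /* any other character is a one-char symbol */
            code = TOK_SYMBOL;
        }
        push(parser, code, start, (size_t)(p - start));
    }
    push(parser, TOK_END, p, 0); /* tell the parser the input is finished */
}

/* Hypothetical sink that just records token codes, useful for testing. */
#define MAX_TOKENS 32
struct token_record { int codes[MAX_TOKENS]; size_t count; };

static void record_token(void *parser, int code, const char *text, size_t len)
{
    struct token_record *rec = parser;
    (void)text; (void)len;
    if (rec->count < MAX_TOKENS) rec->codes[rec->count++] = code;
}
```

Calling tokenise("x := 3 max: 42", record_token, &rec) on a zero-initialised struct token_record pushes six tokens: a word, an assignment, a number, a keyword, a number, and the end marker. Because the tokeniser only ever talks to the callback, swapping the recording sink for a real parser does not require touching the scanning loop.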

Now that Truls has rewritten it, the Smalltalk grammar is a fairly good example of a LEMON grammar. If you want to write a new language, a good first step is tweaking Smalltalk a bit. If you find that you want a semantic construct that isn't supported by the AST, drop in to SILC and talk to me - adding static flow control (if statements and while loops) is high on my list of priorities, as is support for primitive (non-object) types that aren't auto-boxed.