News

Syntax Highlighting with Clang

Posted on 15 October 2010 by David Chisnall

One of the reasons that I got involved with clang originally was the promise that the same front-end code could be used for other things. Since then, the only things that I've used clang for are compiling and as a static analyser.

More recently, the clang team has produced a new interface, libclang. This is a set of C APIs that expose the functionality that an IDE might want. I've started wrapping these in IDEKit (which Quentin informs me is a name that has already been used by someone else, so expect to see it renamed soon, probably to SourceCodeKit).

The libclang APIs let you do a lot of things, including reporting diagnostics (errors, warnings, and so on) in an editor and performing code completion. The first thing that I decided to work on was syntax highlighting.

Most code editors claim to perform syntax highlighting, but a lot really just do lexical highlighting. Vim is an example of this; it highlights by simply tokenising the input buffer and pattern matching. You can see the difference between Vim's lexical highlighting and real syntax highlighting in this image:

Vim and Clang comparison

The top window is a modified version of Typewriter that uses IDEKit to perform syntax highlighting. The bottom window is the same file (MsgSendSmallInt.m from LanguageKit) opened in Vim. There are a few things to notice.

First, Vim doesn't know that COMPARE is a macro instantiation, so it doesn't highlight it at all. True syntax highlighting does. Second, look at the message sends. This code shows two class messages, both sent to BigInt. The syntax highlighter can tell that these are message sends (so it highlights the selector component) and that BigInt is a class, so it makes it purple. In contrast, Vim's lexical highlighter doesn't have patterns for the class or selector names, so it ignores them.

Another example is the handling of intptr_t. Vim treats this as a built-in type name because it's one of the C99-specified types. The syntax highlighter, in contrast, knows that it is a typedef, so highlights it in a different colour to real keywords like int and void.
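To make the difference concrete, here is a minimal sketch of the lexical approach in C. It is not Vim's actual implementation; the token classes, the word lists, and the function name classify_lexically are all hypothetical and exist only for illustration. The point is that a classifier like this sees nothing but the spelling of a token, so it can only recognise names that appear in its fixed tables.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token classes for this sketch; not Vim's real highlight groups. */
typedef enum {
    TOK_KEYWORD,
    TOK_TYPE,
    TOK_IDENTIFIER,
    TOK_NUMBER,
    TOK_OTHER
} TokenClass;

/* Fixed word lists: a lexical highlighter only knows what is written here.
 * These lists are deliberately tiny and are an assumption of the sketch. */
static const char *const kKeywords[] = { "if", "else", "return", "while", "for" };
static const char *const kTypes[]    = { "int", "void", "char", "intptr_t" };

/* Returns 1 if the len-byte word matches an entry in the list, else 0. */
static int in_list(const char *word, size_t len,
                   const char *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strlen(list[i]) == len && strncmp(list[i], word, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Classify one token purely from its spelling, with no knowledge of
 * declarations, macros, or the surrounding syntax. */
TokenClass classify_lexically(const char *tok, size_t len)
{
    if (len == 0) {
        return TOK_OTHER;
    }
    if (isdigit((unsigned char)tok[0])) {
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)tok[0]) || tok[0] == '_') {
        if (in_list(tok, len, kKeywords, sizeof kKeywords / sizeof kKeywords[0])) {
            return TOK_KEYWORD;
        }
        if (in_list(tok, len, kTypes, sizeof kTypes / sizeof kTypes[0])) {
            return TOK_TYPE;
        }
        return TOK_IDENTIFIER;
    }
    return TOK_OTHER;
}
```

With these tables, intptr_t comes back as a type simply because it is listed, while COMPARE and BigInt both come back as plain identifiers. Telling a macro from a class name from a typedef needs information from the parser, which is what a real syntax highlighter has access to.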

You can find the modified version of Typewriter in Developer/Examples/CodeEditor. It's just a simple demo - the real code will be integrated into CodeMonkey later. It works fast enough in the files that I tested that you can type without noticing any delay. It's currently re-highlighting the selected line after every character press. This needs a bit of tuning.

For example, it's only really worth running the highlighter at all when the user has typed a whitespace or punctuation character; anything else will probably be the middle of a keyword or identifier, so won't provide any new interesting highlights.
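That heuristic could look something like the following sketch. The function name should_rehighlight is hypothetical and this is not code from the Typewriter demo; it only shows the test described above. One assumption added here is that an underscore is treated as part of an identifier, since ispunct() would otherwise count it as punctuation.

```c
#include <ctype.h>
#include <stdbool.h>

/* Decide whether a freshly typed character is worth a highlighter run.
 * Whitespace and punctuation usually end a token, so the line may have
 * gained a complete keyword or identifier. Letters and digits are most
 * likely the middle of a token, so they are skipped. */
bool should_rehighlight(char typed)
{
    unsigned char c = (unsigned char)typed;

    /* Assumption of this sketch: '_' is part of C identifiers, so it
     * does not count as a token boundary even though ispunct() accepts it. */
    if (c == '_') {
        return false;
    }
    return isspace(c) || ispunct(c);
}
```

An editor would call this from its key-press handler and only invoke the highlighter on the current line when it returns true.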

Oh, and one more thing: The highlighter runs in two passes. In the first pass, it tags ranges in the source (an attributed string) with semantic attributes. It then goes through and replaces these with presentation attributes. You don't have to run the second step; you can also use some other transform on the result, such as generating HTML with attributes containing the semantic information and
