12 Language Grammars

Language grammars are used to assign names to document elements such as keywords, comments and strings. The purpose of this is to allow styling (syntax highlighting) and to make the text editor “smart” about which context the caret is in. For example, you may want a key stroke or tab trigger to act differently depending on the context, or you may want to disable spell checking as you type those portions of your text document which are not prose (e.g. HTML tags).

The language grammar is used only to parse the document and assign names to subsets of this document. Then scope selectors can be used for styling, preferences and deciding how keys and tab triggers should expand.

For a more thorough introduction to this concept see the introduction to scopes blog post.

12.1 Example Grammar

You can create a new language grammar by opening the bundle editor (Window → Show Bundle Editor) and selecting “New Language” from the add button in the lower left corner.

This will give you a starting grammar which will look like the one below, so let us start by explaining that.

 1  {  scopeName = 'source.untitled';
 2     fileTypes = ( );
 3     foldingStartMarker = '\{\s*$';
 4     foldingStopMarker = '^\s*\}';
 5     patterns = (
 6        {  name = 'keyword.control.untitled';
 7           match = '\b(if|while|for|return)\b';
 8        },
 9        {  name = 'string.quoted.double.untitled';
10           begin = '"';
11           end = '"';
12           patterns = ( 
13              {  name = 'constant.character.escape.untitled';
14                 match = '\\.';
15              }
16           );
17        },
18     );
19  }

The format is the property list format and at the root level there are five key/value pairs: scopeName, fileTypes, foldingStartMarker, foldingStopMarker and patterns.

There are two additional (root level) keys which are not used in the example:

12.2 Language Rules

A language rule is responsible for matching a portion of the document. Generally a rule will specify a name which gets assigned to the part of the document which is matched by that rule.

There are two ways a rule can match the document: it can provide either a single regular expression or two. With a single expression, given by the match key as in the first rule above (lines 6-8), everything which matches that regular expression gets the name specified by that rule. For example, the first rule above assigns the name keyword.control.untitled to the following keywords: if, while, for and return. We can then use a scope selector of keyword.control to have our theme style these keywords.
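As a further sketch of the single-expression form, a rule for line comments could look like the following. It assumes a hypothetical language in which // starts a comment that runs to the end of the line, and it reuses the untitled suffix from the example grammar. The /* */ annotation is for the reader and can be dropped.

```
{  /* hypothetical rule: everything from // to the end of the line */
   name = 'comment.line.double-slash.untitled';
   match = '//.*$';
},
```

Such a rule would sit in the root-level patterns array next to the keyword rule.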

The other type of match is the one used by the second rule (lines 9-17). Here two regular expressions are given using the begin and end keys. The name of the rule will be assigned from where the begin pattern matches to where the end pattern matches (including both matches). If there is no match for the end pattern, the end of the document is used.

In this latter form, the rule can have sub-rules which are matched against the part between the begin and end matches. In our example, we match strings that start and end with a quote character, and escape characters inside the matched strings are marked up as constant.character.escape.untitled (lines 13-15).

Note that the regular expressions are matched against only a single line of the document at a time. That means it is not possible to use a pattern that matches multiple lines. The reason for this is technical: the parser must be able to restart at an arbitrary line and re-parse only the minimal number of lines affected by an edit. In most situations it is possible to use the begin/end model to overcome this limitation.
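To illustrate how the begin/end model can cover a construct that spans several lines, here is a sketch of a block comment rule. The HTML-style comment delimiters and the name comment.block.untitled are assumptions made for this illustration. The /* */ annotation is for the reader and can be dropped.

```
{  /* hypothetical rule: a comment that may span several lines */
   name = 'comment.block.untitled';
   begin = '<!--';
   end = '-->';
},
```

Each regular expression still matches within a single line. The begin match opens the rule on one line and the end match may close it on a later one. As described above, if the end pattern never matches, the rule extends to the end of the document.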

12.3 Rule Keys

What follows is a list of all keys which can be used in a rule.

12.4 Naming Conventions

TextMate is free-form in the sense that you can assign basically any name you wish to any part of the document that you can markup with the grammar system and then use that name in scope selectors.

There are, however, conventions so that one theme can target as many languages as possible without having dozens of rules specific to each language, and so that functionality (mainly preferences) can be re-used across languages. For example, you probably do not want an apostrophe to be auto-paired when inserted in strings and comments, regardless of the language you are in, so it makes sense to set this up only once.

Before going through the conventions, here are a few things to keep in mind:

  1. A minimal theme will only assign styles to 10 of the 11 root groups below (meta does not get a visual style), so you should “spread out” your naming. That is, instead of putting everything below keyword (as your formal language definition may insist), you should think “would I want these two elements styled differently?” and if so, they should probably be put into different root groups.

  2. Even though you should “spread out” your names, when you have found the group in which you want to place your element (e.g. storage) you should re-use the existing names used below that group (for storage that is modifier or type) rather than make up a new sub-type. You should, however, append as much information as possible to the sub-type you choose. For example, if you are matching the static storage modifier, then instead of just naming it storage.modifier, use storage.modifier.static.«language». A scope selector of just storage.modifier will match both, but having the extra information in the name means it is possible to target it specifically, disregarding the other storage modifiers.

  3. Put the language name last in the name. This may seem redundant, since you can generally use a scope selector of: source.«language» storage.modifier, but when embedding languages, this is not always possible.
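Applied to actual rules, points 2 and 3 could look like the sketch below. The keyword static comes from the text above. The second rule, its final keyword, and the untitled language suffix are hypothetical and only serve to show two modifiers sharing a sub-type. The /* */ annotations are for the reader and can be dropped.

```
{  /* sub-type storage.modifier, then the specific modifier, then the language */
   name = 'storage.modifier.static.untitled';
   match = '\bstatic\b';
},
{  /* hypothetical second modifier, named the same way */
   name = 'storage.modifier.final.untitled';
   match = '\bfinal\b';
},
```

A scope selector of storage.modifier would match both rules, while storage.modifier.static would match only the first.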

And now the 11 root groups which are currently in use, with some explanation of their intended purpose. This is presented as a hierarchical list, but the actual scope name is obtained by joining the name from each level with a dot. For example, double-slash is comment.line.double-slash.