Language grammars
Since I made beta 6 and onward unofficial, I’ve been quiet about new features, despite my plan (that, and being a lazy writer). Though seeing how 41% of users are using the “unofficial” betas, I’m going back to “normal” beta releases.
As for labeling these releases betas, at least two anonymous posts think I should call them alpha. For me alpha is something which crashes, and if alpha means feature incomplete and beta is feature complete, I’m afraid TextMate won’t leave alpha for a long, long time. But I’ll probably annotate them with some descriptive term sometime in the future, which would clarify the state and also allow me to release the equivalent of nightly builds.
But enough about that, let’s take a look at one of the major new features (though for anyone with the time, I strongly encourage reading the full release notes for each new (beta) release)…
The ability to tell TextMate about your language so that you can, amongst other things:
- turn strings blue
- make return continue the current comment on next line
- disable spell checking for HTML tags
TextMate debuted with a flexible format to define the language grammar, but the grammar was in a file on disk and the description for the format was hidden in the TextMate help book. Luckily though a lot of people did find these hidden items and have done a great job adding support for a lot of languages I hadn’t heard about like Perl, Python, and Java (I have since learned that at least Perl is generally used by “hackers” who spend their time performing “DoS” attacks).
With the recent betas the grammar files can be inspected and edited by choosing (from the menus): View → Language → Edit Languages…
The language editor still uses the old-style plist format, something I do plan to improve upon, but hopefully the current editor is enough to motivate people to play more with language grammars.
The rest of this post will detail only what’s new since the help book was written.
Recursion
A grammar can now refer to itself. For example in the context of HTML we can create a rule like this:
{ name = "markup.bold.html";
begin = "<b>"; end = "</b>";
patterns = ( // really means “contains”
{ include = "$self"; }
);
}
This rule tells TextMate that everything starting with <b> and ending with </b> should be named “markup.bold.html” and can contain any of the constructs that the current grammar has a rule for.
Local rules
To avoid repetition it is now possible to define a rule once and refer to it in multiple places. This is done by placing the rule in the key/value container stored at the root level under the repository key, and referring to it with a # prefix.
It’s probably best with an example:
{ name = "PHP"; scopeName = "source.php";
patterns = (
{ include = "#string"; },
{ include = "#variable"; }
);
repository = {
string = {
name = "string.quoted.double.php";
begin = "\""; end = "\"";
patterns = (
{ include = "#escape"; },
{ include = "#variable"; }
);
};
variable = {
name = "variable.other.php";
// this pattern is a huge simplification
match = "\\$[A-Za-z]\\w*|\\$\\{[A-Za-z]\\w*\\}";
};
escape = {
name = "constant.character.escape";
match = "\\\\.";
};
};
}
Another use for this is when we want to recursively refer to only a subset of our grammar. For example I’m told that Perl has qq(…) strings in which it allows balanced parentheses. To match that, we’d have to create a rule like this:
{ name = "string.unquoted.qq.perl";
begin = "qq\\("; end = "\\)";
patterns = (
{ include = "#qq_string_content"; },
);
},
And then in the repository add the definition for qq_string_content:
qq_string_content = {
begin = "\\("; end = "\\)";
patterns = (
{ include = "#qq_string_content"; },
);
};
Back references
For tags, LaTeX environments and similar, the end pattern depends on how the construct began. Previously it wasn’t possible to make a begin/end rule for that, but now you can refer to captures from the begin pattern in the end pattern using normal back-references. For example to match bash heredoc constructs (the variant without indent stripping) one could use the following rule:
{ name = "string.unquoted.heredoc.bash";
begin = "<<(\"|')(\\w+)\\1"; end = "^\\2$";
}
One could of course also create a patterns array to correctly mark up variables and similar, which do get expanded by bash when the heredoc token is unquoted.
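A sketch of what such a rule for the unquoted case might look like follows. The scope names and the simplified variable pattern are assumptions made for illustration, not taken from an actual bash grammar, and the variable rule is assumed to live in the grammar's repository.

```
{   name = "string.unquoted.heredoc.bash";
    // unquoted token, so capture 1 is the token itself
    begin = "<<(\\w+)"; end = "^\\1$";
    patterns = (
        // bash expands variables here, so give them a name
        { include = "#variable"; }
    );
}

// in the repository (hypothetical, heavily simplified):
variable = {
    name = "variable.other.bash";
    match = "\\$\\w+|\\$\\{\\w+\\}";
};
```

Keeping the quoted and unquoted forms as two separate rules means only the unquoted one gets the patterns array, which mirrors how bash treats the two cases.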
Styling
Previously the color/font style for a construct was placed in the rule which matched the construct. This gave each language its own personality, but was strangely disliked by a lot of users.
Now the grammar files don’t contain any styles. They only assign a name to the construct matched and it’s then possible to associate styles to that name using View → Theme → Edit Themes…
This Theme Editor is still very rudimentary (one of the reasons I’ve kept the current betas as unofficial). But the gist of it is this:
There are four themes by default (All Hallow’s Eve, Boring, iPlastic, and Pastels on Dark). Each theme includes one or more settings groups (consider a settings group like a CSS file). Each settings group then contains a number of settings items (consider each settings item as a CSS rule).
When you open the Theme Editor you’ll see a list with all the settings groups on the left side. Above this list is a popup gadget which shows the theme. The themes all have access to the same settings groups, and the check marks to the left of the settings group names control whether or not that settings group is included for the selected theme.
If you unfold a settings group you’ll see the settings items it contains. If you click one of these you’ll get a) the settings it “sets” and b) the scope. The scope is a bit like a CSS selector. It is based on the names given to constructs by rules.
So for example if we look above for the qq(…) rule defined for Perl, this assigns the name “string.unquoted.qq.perl” to these constructs, and we can use that name in the scope to style these strings.
The match is however prefix-based, so we don’t have to enter the entire name. We could settle for “string.unquoted.qq” or even just “string”. This has the advantage of targeting all constructs that have “string” as the first part of their name.
For this reason it is important that rules derive their name from one of the standard names.
I mentioned that the scope was a bit like a CSS selector, and like a CSS selector, the scope can also target constructs based on their context. Take for example the minimal PHP grammar above. It names strings, and inside strings we can have variables and escape codes. Imagine we’d like to have variables in strings underlined, but variables outside strings should appear as normal. For this we’d use a scope like “string variable”. This targets all variables inside strings.
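In textual form, a settings item for the “string variable” case could look roughly like the sketch below. The key names (name, scope, settings, fontStyle) follow the format used by later TextMate theme files and are an assumption here; the Theme Editor in the current beta may present them differently.

```
{   name = "Variables in strings";    // hypothetical label
    scope = "string variable";        // variables, but only inside strings
    settings = { fontStyle = "underline"; };
}
```

Because matching is prefix-based, this would apply to “variable.other.php” inside “string.quoted.double.php”, but not to a variable outside a string.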
Named captures and content
In addition to assigning a name to the entire thing matched, it is also possible to assign names to the captures of the regular expression.
This is done with an additional captures key. For example if we make one rule to match a tag and want to assign a name to the namespace and tag name we could do:
{ match = "</?(?:([-_a-zA-Z0-9]+):)?([-_a-zA-Z0-9:]+).*?>";
captures = {
1 = { name = "entity.name.tag.namespace.xml"; };
2 = { name = "entity.name.tag.xml"; };
};
}
Here capture 1 is given the name “entity.name.tag.namespace.xml” and capture 2 is given the name “entity.name.tag.xml”.
For rules which use begin/end keys, captures refer to both patterns. But one can instead use beginCaptures or endCaptures to refer to only the begin or end pattern.
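As a sketch of the difference, the rule below names the opening and closing quote of a string separately. The scope names are assumptions chosen for illustration, not names used by any shipped grammar.

```
{   name = "string.quoted.double.php";
    begin = "(\")"; end = "(\")";
    beginCaptures = {
        // capture 1 of the begin pattern only
        1 = { name = "punctuation.definition.string.begin.php"; };
    };
    endCaptures = {
        // capture 1 of the end pattern only
        1 = { name = "punctuation.definition.string.end.php"; };
    };
}
```

With a plain captures key, both quotes would have received the same name.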
Additionally it might be useful to name only the stuff between the begin and end pattern. This is done using contentName instead of, or in addition to, the normal name. So we may want to revise the HTML bold rule above to:
{ contentName = "markup.bold.html";
begin = "<(b)>"; end = "</(b)>";
captures = {
1 = { name = "entity.name.tag.html"; };
};
patterns = (
{ include = "$self"; }
);
}
Settings
The Theme Editor also allows you to set miscellaneous settings for matched constructs. I haven’t composed a list of these yet, and I don’t intend this to be settable in the Theme Editor for long (instead it’ll go into the bundle editor, so settings can be structured like bundles), but for now you can unfold the “Settings: Basic” group and check out some of the stuff currently set.
You’ll need to switch to textual mode which is done with the segmented control in the upper right corner of the theme editor (showing three lines of text and the standard fonts and colors icon).
Scopes
Scopes were already described above under styling. What wasn’t mentioned is that these scopes can also be used for bundle items (in the bundle editor). This affects when the “activation” method is active.
So for example if you make a macro and give it return as key equivalent but set the scope to “string”, then your macro will only be executed when you press return while the caret is inside a string.
The scope is also used to decide which bundle item to execute, when there are multiple matches. It’ll always use the one with the most exact scope. If there are several candidates, it’ll show you a menu.
Multi threading
Previously, parsing of your text happened lazily (before display) and was cached. But because parsing a line requires all lines above it to be parsed, going to the bottom meant parsing the entire document, and when pasting large chunks of text you’d have to wait for the parser to finish working on that text.
The parser now runs in its own thread, and display no longer requires the text to have been parsed (though it’ll lack styles if it hasn’t). So pasting large chunks of text, loading a huge file and going to the bottom etc. should no longer cause a noticeable delay.
Oniguruma
The Oniguruma regular expression library (version 3.8.2) is now used instead of the library I wrote myself. The main functional advantage is support of look-behind, but Oniguruma has a lot of other neat features. This is the library used in Ruby. The documentation for the regular expression syntax can be found here.
I still use my own library for normal regular expression searches (so the help book entry on the syntax still applies here). I do plan to switch, but there are some technical problems, which is the main reason I wrote my own library in the first place.
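To illustrate the look-behind support mentioned above, here is a sketch of a rule which names only the identifier of a variable, leaving the leading $ out of the match. The scope name is borrowed from the PHP example earlier and the pattern is, again, a simplification.

```
{   name = "variable.other.php";
    // (?<=\$) is a look-behind: a "$" must precede the match,
    // but is not itself part of what gets named
    match = "(?<=\\$)[A-Za-z]\\w*";
}
```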
Conclusion
This completes the functional revision of the language definition system for this time. Not that I lack ideas for further improvements, but I know some users are eager to see similar attention given to other parts of TextMate, and that’s what I will do.
My plan is to finish the (graphical) interface to this stuff, wrap up the various loose ends, and then do a real non-beta 1.1 release. And then start with the 1.1.x releases where I’ll take care of some of the other things I want to improve. The project window especially needs a major overhaul, including making the project drawer more like the Finder, supporting tabs for non-projects, and allowing split views.
As for the function popup (in a better form than the current Go to Symbol list), this will most likely appear before 1.1 final. But please stop nagging me about it!