Introduction to scopes
Problem
Text editors often have different modes for different file types. For example, an editor can have a Java mode with appropriate build actions defined, and another mode for LaTeX.
This system doesn’t work for documents with mixed languages. For example HTML can embed ASP, Java, JavaScript, PHP, Ruby, Tcl, and probably half a dozen other languages. So what happens if we write for in an HTML document and press tab (to expand the for-loop)? Some editors solve this by creating new modes for the combinations, like HTML+PHP. But this isn’t a complete solution, since HTML+PHP can still contain both JavaScript and PHP, so we’d still not know how to expand the for keyword.
If you are not a web programmer this problem is probably rare, but if you look at your source, it can be considered a mixed document as well. You have the keywords, variables, operators, numbers, and constants, but you also have strings and comments which generally contain prose.
Do you want the same editing mode for keywords and variables as you want for string content? For example,
- normally you never have a single apostrophe in code, so it makes sense to insert these in pairs, but in strings and comments you use it for contractions,
- it may make sense for spell checking as you type to be enabled for strings, but certainly not for your source (or escape sequences, variables, interpolated code or similar which could be embedded in the string),
- you may want enter to continue the comment on the next line, but only when inside a comment,
- …
Solution
To solve the mixed language problem and additionally allow for a lot of new possibilities with regard to end-user customization, TextMate views your document as a tree, constructed using one of the user customizable language grammars.
What would normally be a mode-specific setting (in other text editors) can then be set for a subset of this tree (using patterns to select the subset).
HTML/XML Analogy
The above may sound a little abstract, but it’s exactly what is done when creating a CSS rule like this:
div em { font-style: normal; }
If our document looks like this:
<body>
    <p>First <em>emphasis</em></p>
    <div>
        <p>Second <em>emphasis</em></p>
    </div>
</body>
The CSS rule will disable font styling for the second instance of the word emphasis (which would normally be italic).
The tree structure of the above is obvious since we explicitly write the tags. In TextMate the tags are implicit and are derived from the language grammar. So if we apply this example grammar:
scopeName = 'c_source';
patterns = (
    { name = 'storage'; match = 'char'; },
    { name = 'modifier'; match = 'const'; },
    { name = 'operator'; match = '[=*]'; },
    { name = 'string'; begin = '"'; end = '"';
        patterns = (
            { name = 'constant'; match = '\\\\.'; },
        );
    }
);
To this line of code:
char const* str = "Hello world\n";
The conceptual XML tree will look like this:
<c_source>
    <storage> char </storage>
    <modifier> const </modifier>
    <operator> * </operator>
    str
    <operator> = </operator>
    <string>
        "Hello world<constant>\n</constant>"
    </string>
    ;
</c_source>
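To make the step from grammar to tree a little more concrete, here is a minimal Python sketch that applies the example grammar to the line of code and prints the resulting tree. It is an illustration only, not TextMate's actual parser: the function names (parse, to_xml, scope_tree) are made up for this example, the rule "the earliest match wins, with ties going to the enclosing end pattern and then to rule order" is an assumption, and the regular expressions are written as Python raw strings (so the escape rule is spelled r"\\."). Patterns that can match the empty string are not handled.

```python
import re

# The example grammar from above, transcribed into Python data (hypothetical layout).
GRAMMAR = {
    "scopeName": "c_source",
    "patterns": [
        {"name": "storage", "match": r"char"},
        {"name": "modifier", "match": r"const"},
        {"name": "operator", "match": r"[=*]"},
        {"name": "string", "begin": r'"', "end": r'"',
         "patterns": [{"name": "constant", "match": r"\\."}]},
    ],
}


def parse(text, pos, rules, end=None):
    """Return (children, new_pos); a child is plain text or a (name, children) pair."""
    children = []
    while pos < len(text):
        candidates = []  # entries are (start, priority, match, rule)
        if end is not None:
            m = re.compile(end).search(text, pos)
            if m:
                candidates.append((m.start(), -1, m, None))
        for i, rule in enumerate(rules):
            m = re.compile(rule.get("match") or rule["begin"]).search(text, pos)
            if m:
                candidates.append((m.start(), i, m, rule))
        if not candidates:
            break
        # Assumption: earliest match wins; ties go to the end pattern, then rule order.
        start, _, m, rule = min(candidates, key=lambda c: c[:2])
        if start > pos:
            children.append(text[pos:start])  # unscoped text before the match
        if rule is None:  # the enclosing begin/end rule closes here
            children.append(m.group())
            return children, m.end()
        if "match" in rule:
            children.append((rule["name"], [m.group()]))
            pos = m.end()
        else:  # begin/end rule: recurse with its nested patterns
            inner, pos = parse(text, m.end(), rule.get("patterns", []), rule["end"])
            children.append((rule["name"], [m.group()] + inner))
    if pos < len(text):
        children.append(text[pos:])  # trailing unscoped text
    return children, len(text)


def to_xml(name, children):
    """Render the tree as conceptual XML (no escaping of special characters)."""
    inner = "".join(c if isinstance(c, str) else to_xml(*c) for c in children)
    return f"<{name}>{inner}</{name}>"


def scope_tree(text, grammar=GRAMMAR):
    children, _ = parse(text, 0, grammar["patterns"])
    return to_xml(grammar["scopeName"], children)


print(scope_tree('char const* str = "Hello world\\n";'))
```

The printed tree is compact (no padding spaces or line breaks), but it nests the same elements as the conceptual tree above.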
TextMate currently supports CSS descendant selectors and grouping for selecting a subset of the tree. It calls the selected subset the scope, however, and wherever a command, preference, color or similar can be made to work on only a subset of your source, the text field in which the selector is entered is prefixed with a scope label.
So with the above grammar, if we want to select all strings, the scope is: string, if we only want strings in C sources, it’s: c_source string, and if we want to select constants inside strings, the scope is: string constant.
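As a rough sketch of how such a selector could be tested against a position in the tree, the following Python treats the scope of a position as the list of element names from the root down to the innermost element. The function name matches and the list representation are assumptions made for this example, and so is the rule that the selector's names only need to appear in order somewhere along the path (so string also applies inside a constant nested in a string).

```python
def matches(selector, scope_path):
    """True if any comma-separated alternative matches the scope path.

    scope_path lists element names from the root to the innermost element.
    """
    def descendant(names):
        # Subsequence test: each name must be found after the previous one.
        remaining = iter(scope_path)
        return all(name in remaining for name in names)

    return any(descendant(part.split()) for part in selector.split(","))


# Scope of the caret inside the \n escape of the example line.
caret = ["c_source", "string", "constant"]

print(matches("string", caret))            # all strings
print(matches("c_source string", caret))   # strings in C sources
print(matches("string constant", caret))   # constants inside strings
print(matches("storage, string", caret))   # grouping: storage or string
```

Each of the four calls prints True for this caret position; reversing the order, as in constant string, would not match.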
Hint: If you have the Language Definitions bundle enabled you can press ⌃⇧P to see a tooltip showing the exact scope of the caret. The current scope of the caret is also exported as a TM_SCOPE shell variable when executing shell commands.
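A command written as a script could read that variable from its environment. The Python sketch below assumes that TM_SCOPE holds the scope names separated by spaces, which is not stated above, and the function name current_scope is made up for this example.

```python
import os


def current_scope(environ=os.environ):
    """Return the caret's scope names, or an empty list when TM_SCOPE is unset.

    Assumption: the variable holds the names separated by whitespace.
    """
    return environ.get("TM_SCOPE", "").split()


if __name__ == "__main__":
    scope = current_scope()
    print("in a string" if "string" in scope else "not in a string")
```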
One last twist
Using the above system to select e.g. all comments would require us to either name all comments comment (regardless of type) or alternatively use grouping to select all types, e.g.: line-comment, block-comment, embedded-documentation-comment etc.
The former solution loses information and the latter quickly becomes unmanageable, especially when new types are introduced for which a general type already exists.
So instead the names used are hierarchical with the most general type first, and then more specific types can follow, separated by dots. E.g. for the 3 comment types we’d have:
- comment.line
- comment.block
- comment.documentation
The CSS descendant selectors mentioned above are loosened so that they will settle for a prefix-match of the actual name, e.g. a scope of comment will match all 3 names from the list, whereas comment.block will only match name number 2.
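The prefix rule can be sketched in a few lines of Python. The function name name_matches is hypothetical, and comparing whole dot-separated components (so that comment does not match a name like commentary) is an assumption of this example.

```python
def name_matches(selector_name, scope_name):
    """True if selector_name is a dot-wise prefix of scope_name."""
    wanted = selector_name.split(".")
    return scope_name.split(".")[:len(wanted)] == wanted


names = ["comment.line", "comment.block", "comment.documentation"]

# The general name selects every comment type; the specific one only its own.
print([n for n in names if name_matches("comment", n)])
print([n for n in names if name_matches("comment.block", n)])
```

The first line printed lists all three names, the second only comment.block.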
Update: Starting with 1.1b17 it’s also possible to AND, OR, and subtract scope selectors, e.g.: (a | b) & c - d would select the scope which is matched by c and by either a or b, but is not matched by d.
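The meaning of that expression can be modeled with small predicate combinators. This Python sketch does not parse the selector syntax; it only builds the example by hand, following the reading given in the text. The helper names (name, either, both, minus) are made up, and a scope is assumed to be a list of dotted names.

```python
def name(selector_name):
    """Selector matching any scope element that has selector_name as a dot-wise prefix."""
    wanted = selector_name.split(".")
    return lambda scope: any(n.split(".")[:len(wanted)] == wanted for n in scope)


def either(x, y):   # OR:  x | y
    return lambda scope: x(scope) or y(scope)


def both(x, y):     # AND: x & y
    return lambda scope: x(scope) and y(scope)


def minus(x, y):    # subtract: x - y
    return lambda scope: x(scope) and not y(scope)


# (a | b) & c - d, read as: matched by c and by a or b, and not matched by d.
selector = minus(both(either(name("a"), name("b")), name("c")), name("d"))

print(selector(["a", "c"]))       # a and c present, no d
print(selector(["a", "c", "d"]))  # d present, so excluded
```

The first call prints True and the second prints False.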
Where to use them
This entry overlaps a bit with my previous entry on language grammars, but there are currently two places you can use scopes in TextMate (ignoring here that you can use them from your shell commands), namely the Theme Editor and the Bundle Editor.
For commands, macros, and snippets, the scope controls when the activation should be used. So setting the scope of a macro to comment still allows you to select it from the menu, but the key equivalent or tab trigger will only be checked when the caret is inside a comment.
For drag commands, the “activation” of the command is dragging a file to the text area, and here the scope is matched against the scope of the drop position. So if for example you make a drag command with scope set to source.css and open an HTML document, the drag command will only be considered for execution when you drop a file into a <style>…</style> section.
Starting with 1.1b15 (unreleased as of this writing) there are also preferences in the bundle editor, which are miscellaneous settings like spell checking, smart typing, increase/decrease indent patterns, completion stuff, etc. The scope for these items means that the preference only applies to that scope. Some preferences like tab size and soft wrap can only work on the root scope (e.g. you can set it for text.plain but not string constant), although I do expect that as this system evolves, more and more will be settable for arbitrary scopes.
Prior to 1.1b15 the preferences mentioned are set in the Theme Editor.
The Theme Editor is the other place where scopes can be set. As mentioned in a previous post, this is a bit rudimentary, but basically it works like the preferences described above, with the difference being that the settings you change refer to the visual appearance, i.e. font style and colors, and it has an unintuitive system for trying to visualize having multiple style sheets which can include other style sheets :) I hope to have the new color set editor ready for 1.1b15.
Update: The above mostly applies to Fonts & Colors in preferences.
What names should be used
If your fingers are itching to start defining your own language grammars for your proprietary file formats, in order to get context sensitive behavior, you may wonder which names should be used for the stuff you mark up.
You are of course free to use whatever names you like, but you’re also strongly encouraged to follow the names defined in the wiki.
Using these names ensures interoperability and means the default themes should do a decent job at coloring your files.