Recursion in Regular Expressions
Robin Houston (author of the original proposal for extending regular expressions to support recursion) gave an example on the TextMate mailing list of how to use this feature to match properly nested parentheses.
He also demonstrates the use of (?x:…) to allow commenting of the regular expression and splitting it over multiple lines, making it less terse (highly recommended when crafting complex regular expressions, e.g. for language grammars).
Inserting Thousand Separators
Speaking of regular expressions, I previously wrote a post about how to word-wrap text using a regular expression. To follow up on that, here is a regular expression that matches every group of digits which should be followed by a thousand separator:
\d{1,3}(?=(\d{3})+(?!\d))
So the replacement string would be \0, or $&, (depending on format string syntax). Let’s try this with the ls command and perl:
% ls -l /mach_kernel \
|perl -pe 's/\d{1,3}(?=(\d{3})+(?!\d))/$&,/g'
-rw-r--r-- 1 root wheel 10,235,416 19 Sep 05:50 /mach_kernel
Of course, in this case you’d be better off with the -h option, which gives the “human readable size”, i.e.:
% ls -lh /mach_kernel
-rw-r--r-- 1 root wheel 9.8M Sep 19 05:50 /mach_kernel
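For experimenting with the thousand-separator expression outside the shell, here is a minimal Python sketch. The pattern is the one given above, only laid out with comments in verbose mode (Python's counterpart to the (?x:…) style mentioned earlier); the names THOUSANDS and add_separators are made up for this illustration.

```python
import re

# The expression from above, split over several lines with comments.
# THOUSANDS and add_separators are illustrative names, not from the post.
THOUSANDS = re.compile(r"""
    \d{1,3}        # one to three digits forming a group
    (?=            # ...but only when followed by
        (\d{3})+   #    one or more complete groups of three digits
        (?!\d)     #    after which the run of digits ends
    )
""", re.VERBOSE)

def add_separators(text, sep=","):
    # Equivalent to replacing each match with \0, or $&, (match plus separator).
    return THOUSANDS.sub(lambda m: m.group(0) + sep, text)

print(add_separators("10235416"))  # 10,235,416
```

One thing to keep in mind: the expression knows nothing about decimal points, so a long run of digits after one gets grouped as well (3.14159 becomes 3.14,159). For file sizes as in the ls example this does not matter.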
Figure Space
Staying in the domain of pretty-printing numbers: I use GeekTool to dump a lot of information on my desktop, some of which consists of columns of numbers. Let’s imagine I am tracking the value of the dollar and euro per week (the leftmost column is the week number, and the figures are unlikely to depict the actual values):
8 $5.45 €7.44
9 $5.42 €7.45
10 $5.37 €7.44
Displaying such data is best done with a monospace font, so that things align. But the € or $ character in these fonts might not be entirely to our liking (when we blow up the text size to 24 points), so can we use a proportional width font? Generally, proportional width fonts make all the digits exactly the same width, so the digits themselves do align nicely. The space, however, normally does not match the width of the digits, so we have a problem with 10 being wider than ␣9 (here ␣ represents a regular space).
Fortunately, somebody already thought of that. If you check out Wikipedia’s table of space characters you will find the Figure Space (U+2007), which is a space made especially for aligning numbers, i.e. it has the width of a digit. So by using leading figure spaces for our week numbers, we can make the above data align nicely even when printed with a proportional width font.
If you do want to use this (or any other Unicode character) with GeekTool output, you need to patch the source to interpret command output as UTF-8.
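To make the figure-space padding concrete, here is a small Python sketch of how a script feeding GeekTool might left-pad the week numbers. It assumes the output ends up being read as UTF-8; the helper name pad_number and the rows list are invented for the example, with the sample figures taken from the table above.

```python
FIGURE_SPACE = "\u2007"  # U+2007 FIGURE SPACE, as wide as a digit

def pad_number(value, width):
    # Right-align the digits, filling the left side with figure spaces
    # instead of regular spaces.
    return str(value).rjust(width, FIGURE_SPACE)

if __name__ == "__main__":
    # Sample rows from the table above (not real exchange rates).
    rows = [(8, "$5.45", "€7.44"), (9, "$5.42", "€7.45"), (10, "$5.37", "€7.44")]
    width = max(len(str(week)) for week, _, _ in rows)
    for week, dollar, euro in rows:
        print(pad_number(week, width), dollar, euro)
```

Only the week column needs padding here, since the currency columns already have the same number of digits in every row.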