TextMate News

Anything vaguely related to TextMate and macOS.

Obfuscating Emails Revisited

The outcome of my last entry about obfuscating email addresses is the HTML → Encrypt Selection (ROT13) command which replaces the current selection with JavaScript to output it (with the actual markup stored in ROT13).
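To make the idea concrete, here is a minimal sketch of what such a command could emit. It is not the actual TextMate command: the names `rot13` and `encryptSelection` are invented for illustration, and the use of `document.write` is an assumption about how the generated script outputs the markup.

```javascript
// Rotate ASCII letters by 13 places; other characters pass through unchanged.
// Applying the function twice returns the original text.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, (ch) => {
    const base = ch <= 'Z' ? 65 : 97; // code of 'A' or 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Hypothetical helper: wrap the ROT13'ed markup in a script that decodes it
// and writes it back into the page.
function encryptSelection(markup) {
  // JSON.stringify yields a valid JavaScript string literal; "<" is escaped
  // so the literal can never contain "</script>" and end the element early.
  const literal = JSON.stringify(rot13(markup)).replace(/</g, '\\x3c');
  const decoder =
    'function (c) { var base = c <= "Z" ? 65 : 97;' +
    ' return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base); }';
  return '<script type="text/javascript">document.write(' + literal +
    '.replace(/[a-zA-Z]/g, ' + decoder + '));</' + 'script>';
}
```

Calling `encryptSelection('<a href="mailto:user@example.com">user@example.com</a>')` returns a script element whose source contains only the rotated text, so the readable address appears only once a browser has run the script.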

Since most of my pages are already preprocessed, I decided to turn it into a general filter that works on a full HTML page. It replaces every document node that contains an email address with the corresponding JavaScript.
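The filter itself uses a parser. As a much rougher illustration of the same idea, the sketch below uses regular expressions to find the innermost elements that mention an email address and hands each one to an encoder. The names `obfuscatePage`, `LEAF_ELEMENT` and `EMAIL` are hypothetical, the `encode` callback stands in for whatever produces the JavaScript replacement, and the patterns are assumptions that will miss plenty of real-world markup.

```javascript
// Simplified notion of an email address (an assumption, not a full grammar).
const EMAIL = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/;

// Innermost elements only: an open tag, content without further tags, and
// the matching close tag. Tags like <% … %> never match "<" plus a name.
const LEAF_ELEMENT = /<(\w+)\b[^<>]*>[^<>]*<\/\1>/g;

// Replace every innermost element that mentions an address (in its text or
// in an attribute such as href) with whatever `encode` returns for it.
function obfuscatePage(html, encode) {
  return html.replace(LEAF_ELEMENT, (element) =>
    EMAIL.test(element) ? encode(element) : element);
}
```

With a stand-in encoder such as `(element) => '[hidden]'`, a page containing `<a href="mailto:me@example.com">me@example.com</a>` has that whole link element replaced, while elements without an address are left alone.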

Since it has given me much joy to once again be able to write plain email addresses in my pages (without worrying about spam harvesters) I figured I should share :) But do double-check that the result you get from the filter is actually what you expect — the parser used is written for (my) valid HTML pages with potential script tags (such as <% … %> and <?php … ?>).

Replacing the full document node with a <script> … </script> construct should keep the resulting page well-formed. The filter also inserts a <noscript> tag to improve usability for clients without JavaScript, but <noscript> is a block-level tag, so this will generally make the page invalid. If you want a validating page, you can remove the <noscript> stuff at line 119.

I have pondered a few ways to keep the page valid and still insert a <noscript> tag, but I think the best approach would be to make the <noscript> content the default content and then use JavaScript to replace it (thus avoiding the <noscript> tag altogether). Ideally the fallback content should be a human-decodable version of the email address.
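A minimal sketch of that approach, under stated assumptions: the class name `obfuscated-email`, the `data-rot13` attribute and both function names are invented here, and the address is assumed to contain no characters that need HTML escaping.

```javascript
// Rotate ASCII letters by 13 places (same transformation encodes and decodes).
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, (ch) => {
    const base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Default content, emitted when the page is generated: readable by a human,
// but with no plain address anywhere in the markup.
function fallbackMarkup(address) {
  const readable = address.replace('@', ' at ').replace(/\./g, ' dot ');
  return '<span class="obfuscated-email" data-rot13="' + rot13(address) +
    '">' + readable + '</span>';
}

// Runs in the browser: swap each fallback span for a real mailto link.
// `root` is typically `document`.
function revealEmails(root) {
  for (const span of root.querySelectorAll('.obfuscated-email')) {
    const address = rot13(span.getAttribute('data-rot13'));
    const link = span.ownerDocument.createElement('a');
    link.href = 'mailto:' + address;
    link.textContent = address;
    span.replaceWith(link);
  }
}
```

Clients without JavaScript keep the "user at example dot com" text, and no <noscript> element is needed, so validity is unaffected.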

If anyone ports the filter to PHP please let me know, as I would love to turn it into a WordPress plug-in.

General

14 Comments

29 September 2007

by Billy Halsey

Hi Allan! Your WordPress plugin is available at my blog at this address. Planned for the next version is automatic linking of the email address and user customization of the plain-text alternative.

Thanks for the great idea!

29 September 2007

by Billy Halsey

Hi Allan,

Thanks for testing eMob. You’ve exposed many of the inferiorities that my initial hack seems to suffer. A JavaScript genius I am not.

The biggest problem seems to come from playing nicely with other plugins that automatically convert the email to a clickable link, resulting in something like

<a href="mailto:<abbr id="...

Before this plugin can be used, it’s apparent that I must fix this issue immediately.

As for the added spans, development was an iterative process and it just fell out of one of the iterations. ;-) Again, not a JS or DOM person, but I’ll clean that up. Thanks for the tips!

07 October 2007

by Brandon

This is awesome! I’ve been obfuscating e-mail addresses using ROT13 ever since I read your first post on it. Unfortunately I’m still a bit of a Rails newb, so I’m not quite sure what to do with this. I assume I should put it in my application controller? Or is there a better/different place to put it? Anything else I need to do besides that? Thanks!

09 October 2007

by bob

These days Spam harvesting bots have no trouble parsing javascript. So this technique is well-meant, but I’m sorry to tell you that it’s utterly useless.

09 October 2007

by Guillaume Rischard

Sure they can parse javascript, but is it worth the cost? You have a harvesting bot. You find a bunch of javascript. Do you take the time to run it, hoping it has an email address, or just go to the next chunk of text?

Running a simple regular expression on the input will always be many times more efficient, in terms of email addresses harvested/time, than running the javascript on the page.

If this method becomes common enough and computers fast enough to make running the javascript worth it, the code can always be made more complex, to make the decoding of email addresses slower.
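To illustrate the cheap regular-expression pass described above, here is a small sketch. The pattern and the name `harvest` are assumptions about what a naive harvester might do, not a description of any real tool.

```javascript
// A naive harvester: scan the raw page source for address-shaped text.
const ADDRESS_SHAPED = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;

function harvest(pageSource) {
  return pageSource.match(ADDRESS_SHAPED) || [];
}

// The same address, once in the clear and once ROT13'ed inside a script.
const plainPage = '<p>Contact: user@example.com</p>';
const scriptedPage = '<script>/* decodes and writes: */ "hfre@rknzcyr.pbz"</script>';

const fromPlain = harvest(plainPage);       // contains the real address
const fromScripted = harvest(scriptedPage); // only the rotated, useless text
```

Since ROT13 leaves "@" and "." untouched, the regular expression still finds something address-shaped in the scripted page, but it is the rotated text rather than a deliverable address. Recovering the real one requires running or reverse-engineering the script.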

09 October 2007

by Allan Odgaard

If bots start to run JavaScript (while harvesting email addresses) then we can litter our pages with traps, i.e. put up invisible links to pages that lead to running infinite loops, or even report infested machines via XMLHttpRequest.

Meanwhile I can report that I receive zero spam for the JavaScript-obfuscated addresses published at this site (and they have been here for quite some time).

Based on a previous test I did, it seems the majority of harvesting bots don’t even entity-decode the pages they fetch.

So what are you basing your comment on Bob?

16 October 2007

by bob

It’s a shame I lost the program, but I’ve spent a lot of time working on spam-protecting email addresses. Then a friend showed me a $20 Windows shareware tool that can be used to harvest email addresses from webpages. It found all addresses that were only visible after parsing JavaScript, and it was also pretty good at evading spam traps (when they were hidden using CSS or JavaScript, it would just ignore them).

If this method works for you: great, but don’t be surprised if one day you find that your name is on the spam lists.

18 October 2007

by Allan Odgaard

So bob, we have gone from the method being “utterly useless” to you recalling having seen a shareware program that could decipher addresses? ;)

Do you recall any details about this program? Like what type of enciphering did you do, that the program broke?

Don’t get me wrong, I know this is a cat & mouse game, and I certainly do not consider this method anywhere near unbreakable. I can just report that:

Based on a test I did, entity-encoded emails are safe from harvesters
I get no spam to the addresses listed (and enciphered) on these pages.
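For reference, entity-encoding an address can be sketched as below. This assumes every character is written as a decimal numeric character reference; the test mentioned above does not say which encoding was used, and the name `entityEncode` is made up.

```javascript
// Encode every character as a decimal numeric character reference.
// A browser renders the result as the original text, but the raw page
// source no longer contains "@" or any letter of the address.
function entityEncode(text) {
  return Array.from(text, (ch) => '&#' + ch.codePointAt(0) + ';').join('');
}

const encodedAddress = entityEncode('user@example.com');
```

A harvester that scans the raw source for "@" finds nothing in `encodedAddress`; it would first have to decode the references, which is the step the test suggests most bots skip.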

So the cat (harvester) is not too concerned with winning the game (likely because there is far too much low-hanging fruit out there).

But even if the cat awakes, I am pretty optimistic, since legitimate use of email addresses will always be done by a human, and no script can fully mimic a human.

The question is how do we make the least obtrusive Turing test?

Two ideas I am presently pondering are:

Use XMLHttpRequest both to fetch the valid address and, as a trap, to blacklist the client’s IP (or feed it bad data). A script is likely to follow the wrong path, whereas for a human, CSS can make that near impossible.
Use DOM events, e.g. check that mouseEnter was actually called for the link clicked — we could require more intricate “gestures” to be performed.
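The second idea could look roughly like the sketch below. It is an illustration only: `guardLink` and the `decodeHref` callback are hypothetical names, and the sketch ignores keyboard and touch users entirely.

```javascript
// Only hand out the real address if the pointer actually entered the link
// before it was clicked. `decodeHref` is a hypothetical callback that
// returns the real "mailto:" URL; it is not called until the check passes.
function guardLink(link, decodeHref) {
  let pointerEntered = false;

  link.addEventListener('mouseenter', () => {
    pointerEntered = true;
  });

  link.addEventListener('click', (event) => {
    if (pointerEntered) {
      // Set the real target before the default action follows the link.
      link.href = decodeHref();
    } else {
      // A click without a preceding mouseenter looks scripted: block it.
      event.preventDefault();
    }
  });
}
```

In a page this might be wired up as `guardLink(document.querySelector('a.contact'), () => 'mailto:' + decodedAddress)`, where both the selector and `decodedAddress` are placeholders.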

I have some concerns though about a) making it easy to automatically obfuscate a page (so requiring server support for XMLHttpRequest is maybe not ideal) and b) still supporting browsers w/o JavaScript, non-mouse users, etc. One solution to that problem, though, could be to always provide challenge/response-protected email addresses by default, and then have JavaScript alter them to non-protected addresses when the Turing test has determined that the page is viewed by a human.

22 October 2007

by Bob

Allan, I’ve asked the friend that gave me the software if he still has it. I’ll let you know.

07 November 2007

by thyncology » Blog Archive » Email Entitizer

[…] by a recent ongoing dialog on the TextMate blog about obfuscating email addresses to prevent spambots from scraping and foiling their […]

09 November 2007

by kameko

interesting that bob has not returned with this magical $20 shareware harvesting tool.

02 December 2007

by Brandon

I’m fairly new to Rails, and I’m not exactly sure what to do with the script? Is it a plugin? Do I call it using before_filter? Is there a special place I should put it?

17 April 2009

by links for 2009-04-17 « Amy G. Dala

[…] TextMate Blog » Obfuscating Emails Revisited (tags: email javascript web_dev utilities) […]

12 May 2009

by Obfuscate no more: why your email address should go au naturale - Jason Priem

[…] are too many js methods to cover in any detail here. Some are better than others; a few try to degrade gracefully for users without Javascript support. All of them, though, share […]