TextMate News

Anything vaguely related to TextMate and macOS.

Obfuscating Email Addresses

People occasionally ask how to modify the “Convert Selection to Entities” command to also convert ASCII characters, so that they can use it as a simple email address obfuscation technique.

Because of that, we want to add an actual “Obfuscate Email Address” command, but how should it be done?
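For comparison, the plain entity approach being asked about fits in a few lines. This is a minimal sketch, not the bundle's actual “Convert Selection to Entities” command, and the function name `encodeAllAsEntities` is hypothetical. It turns every character, ASCII included, into a decimal numeric character reference.

```javascript
// Hypothetical helper: encode every character, plain ASCII included,
// as a decimal numeric character reference ("a" -> "&#97;").
function encodeAllAsEntities(text) {
  let out = "";
  // for...of iterates by code point, so a character outside the BMP
  // becomes one entity rather than two broken surrogate halves.
  for (const ch of text) {
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

console.log(encodeAllAsEntities("a@b.c")); // &#97;&#64;&#98;&#46;&#99;
```

A browser renders the result exactly like the raw address. The catch is that any harvester that decodes entities before searching sees the same thing.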

A study from 2002 says that entity-encoding the email address is enough (conclusion 5), which I find rather hard to believe. Testing that is actually the sole purpose of this post, so I hereby give you one entity-encoded email address: wrwilsq02@sneakemail.com. I will update this post in a month or so to say whether it got any spam.

2006-09-10: In response to this comment, here is a non-obfuscated email address: 7fi8nmi02@sneakemail.com

2006-10-16: The results are in: 286 emails were received at the plain-text address and just 1 at the entity-encoded address. That one was from a Nigerian scammer, so perhaps it was sent manually, although none of my contact addresses have received such email.

Regardless of whether this address gets spam, I think the TextMate command to obfuscate an email address should (when called in an HTML context) insert a small JavaScript snippet that uses document.write or similar, perhaps with a <noscript>foo~at~bar•com</noscript> fallback.
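As a rough illustration of that idea, here is a sketch of a generator such a command could use. Everything in it is an assumption: the function name `obfuscatedEmailSnippet`, the choice to store the address reversed, and the requirement that the input be a plain user@domain address with no quotes or backslashes. It emits a short inline script that rebuilds the link with document.write, followed by the <noscript> fallback.

```javascript
// Hypothetical generator for an "Obfuscate Email Address" command:
// an inline script that rebuilds the link at page load, plus a
// <noscript> fallback in the foo~at~bar•com style.
// Assumes a plain user@domain address without quotes or backslashes.
function obfuscatedEmailSnippet(address) {
  const parts = address.split("@");
  if (parts.length !== 2) {
    throw new Error("Expected exactly one @ in: " + address);
  }
  const [user, domain] = parts;
  // Store the address reversed so it never appears literally in the page;
  // the inline script reverses it back in the browser.
  const reversed = address.split("").reverse().join("");
  const fallback = user + "~at~" + domain.replace(/\./g, "•");
  return (
    '<script type="text/javascript">' +
    'var e="' + reversed + '".split("").reverse().join("");' +
    "document.write('<a href=\"mailto:'+e+'\">'+e+'</a>');" +
    "</script>" +
    "<noscript>" + fallback + "</noscript>"
  );
}

console.log(obfuscatedEmailSnippet("foo@bar.com"));
```

Reversing is about the weakest transformation possible. It only keeps the literal address out of the source, and any of the encodings suggested in the comments could replace it without changing the overall shape.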

So any suggestions for what the code should look like would be appreciated. The goal is to keep it as short as possible, since it will be inline. Add your suggestion as a comment to this entry; remember that comments are formatted with Markdown, so indent your code by one tab or four spaces.

Let me end this post with a piece of general advice: when you start your business, do not pick logical addresses such as sales, feedback, support, and similar. These will get spam, no matter how well you obfuscate them.

I personally use tm-sales, tm-feedback, tm-support, etc. which currently do not get any spam at all.

Categories: General

36 Comments

Interesting: In Safari’s RSS feed the address comes out not encoded at all…

08 September 2006

by Chris Ryland

I take exactly the opposite approach: use logical names so that people can easily contact you, assume you will get spam, and set up strong spam defenses. (I use DSPAM, an open-source solution that is without peer, though setting it up is non-trivial.)

The email obfuscation method I use is the Automatic Labs Enkoder. From the site: “The Enkoder protects email addresses by converting them into encrypted JavaScript code, hiding them from email-harvesting robots while revealing them to real people.”

If you don’t want to use JavaScript, this isn’t a viable option, of course, but if you can, it’s easy and seems to work really well.

I wonder if this can be repackaged as a command for TextMate?

08 September 2006

by Jacob Rus

Is there a reason that a random w and l aren’t encoded? Is that just for fun?

I’ve used the following with fair success. It’s not as obfuscated as the Enkoder option, but it probably works as well as any of these do (I suspect the spammers have already started rendering HTML before searching it).

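<script type="text/javascript">
    // Character codes of the address (...list_of_bytes... is a placeholder).
    var noSpam = new Array(...list_of_bytes...);
    // Index loop rather than for...in, which would also visit any
    // enumerable properties a library adds to Array.prototype.
    for (var i = 0; i < noSpam.length; i++) {
        document.write(String.fromCharCode(noSpam[i]));
    }
</script>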
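The list of bytes has to come from somewhere. A minimal sketch of a helper that produces it, along with the whole snippet, might look like this. `charCodeList` and `charCodeSnippet` are hypothetical names. The generated snippet uses an array literal because `new Array(n)` with a single numeric argument creates an empty array of length n, not a one-element array.

```javascript
// Hypothetical helper: the comma-separated character codes that go in
// place of ...list_of_bytes... (UTF-16 code units, which is what
// String.fromCharCode expects).
function charCodeList(address) {
  const codes = [];
  for (let i = 0; i < address.length; i++) {
    codes.push(address.charCodeAt(i));
  }
  return codes.join(",");
}

// Build the complete inline script, writing the address in one call.
function charCodeSnippet(address) {
  return (
    '<script type="text/javascript">' +
    "var noSpam=[" + charCodeList(address) + "];" +
    "document.write(String.fromCharCode.apply(null,noSpam));" +
    "</script>"
  );
}

console.log(charCodeList("a@b.c")); // 97,64,98,46,99
```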

re: “Perhaps providing a foo~at~bar•com version.”

I have no idea how good the spammers’ crawlers are, but I would imagine that nowadays they have no problem “understanding” all the common variations of user AT domain DOT com. This is not meant to discourage the noscript version, just an observation.
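To illustrate the point, here is a sketch of how little code it takes to undo the usual spelled-out forms. The function name and the pattern are hypothetical and are not taken from any real harvester. A pattern this loose also produces false positives on ordinary prose.

```javascript
// Illustration only: how little it takes to undo the common
// "user AT domain DOT com" spellings. The pattern is an assumption,
// not taken from any real harvester, and it also matches ordinary
// prose such as "look at this dot thing".
function findObfuscatedAddresses(text) {
  // "at" and "dot" may be padded with spaces, brackets, or parentheses.
  const at = String.raw`\s*[\[(]?\s*\bat\b\s*[\])]?\s*`;
  const dot = String.raw`\s*[\[(]?\s*\bdot\b\s*[\])]?\s*`;
  const label = "[A-Za-z0-9-]+";
  const re = new RegExp(
    "([A-Za-z0-9._%+-]+)" + at + "(" + label + "(?:" + dot + label + ")+)",
    "gi"
  );
  const dotRe = new RegExp(dot, "gi");
  const found = [];
  for (const m of text.matchAll(re)) {
    // Turn every spelled-out "dot" in the domain back into ".".
    found.push(m[1] + "@" + m[2].replace(dotRe, "."));
  }
  return found;
}

console.log(findObfuscatedAddresses("mail joe [at] example [dot] co [dot] uk"));
// [ 'joe@example.co.uk' ]
```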

It’s certainly important to make life as hard as possible for the spammers, but I think if I had a business, I would go for easy-to-remember email addresses (info, feedback, support, etc.) and give my customers standard links with no obfuscation at all. Making things easier for the customers is more important than trying to avoid the unavoidable.

Chris Ryland: My problem with spam filters is false positives. When you get 20+ emails a day, some with bad or empty subjects and some even in HTML, filtering will produce false positives.

Ole: To extend a little on the above, some of the subjects people use would fool a human too. For example, looking in my inbox, I have one with the subject “Next Round” and another showing “Possible **VERY** devastating…”. If these were mixed in with spam, I am not sure I would open them, nor those with an empty subject, nor those that turn out to be set in 7 pt Helvetica when opened (these are rare, but they come e.g. from resellers or purchasing agents contacting me on behalf of someone else, and clearly not sending the email from a Mac ;) )

Jacob: The entity encoding is done automatically by Markdown, which has a 10% chance of leaving each character raw.
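To make that behaviour concrete, here is a sketch in the same spirit. It is not Markdown’s actual code. The 10% raw chance comes from the description above, while the even split between hex and decimal entities for the remaining characters, and always encoding the @, are assumptions for this sketch. The random source is a parameter so the output can be reproduced.

```javascript
// Sketch in the spirit of Markdown's email encoding (not its code):
// each character is randomly written raw (~10%), as a hex entity,
// or as a decimal entity. The "@" is never left raw.
function markdownStyleEncode(address, random = Math.random) {
  let out = "";
  for (const ch of address) {
    const code = ch.codePointAt(0);
    const r = random();
    if (ch !== "@" && r > 0.9) {
      out += ch; // leave this character raw
    } else if (r < 0.45) {
      out += "&#x" + code.toString(16).toUpperCase() + ";"; // hex entity
    } else {
      out += "&#" + code + ";"; // decimal entity
    }
  }
  return out;
}

console.log(markdownStyleEncode("me@example.com")); // differs on every run
```

Whatever the original motivation, the mix means a harvester cannot match one fixed encoding of the @ or of the address as a whole. It still has to decode entities, which is cheap.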

I am not sure what the point of that is, but presumably the more chaotic-looking result was believed to make it harder for a script to target the output of this specific encoding function.

Ole: About the intelligence of spam harvesters: have a look at, e.g., this mailing list post. It uses the most basic obfuscation technique (replacing @ with at), which is standard for all installations of Mailman, and yet I have not received a single spam letter sent to that email address. The script wouldn’t even need to be intelligent; it could just target Mailman.

I must say I am pretty surprised spammers haven’t figured this out, and I fear the day they do.

oliver: Something similar to the Enkoder is what I would like to provide out of the box.

I’d prefer, though, that the resulting script be at most 3-4 lines. I imagine people would insert email addresses more often if they had such a command, and I’d rather not have their pages turn into 50% encrypted JavaScript ;)

The code snippet by Chris Adams seems more appropriate for this.

I just removed all email addresses from my web sites. Instead, I use a free package called [Scform] [1]. This package includes a form for sending emails, with the addresses provided through a drop-down list. The actual addresses are stored on the web host, but NOT in public space: when a message is sent to director@mysite.org, PHP code looks up the real address in a list on the host that is outside the scope of the web site and not publicly readable.

Having to use a form to send mail is mildly annoying, but sure beats getting a ton of spam to all the people who could be recipients.
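The approach Lewis describes comes down to a lookup that only ever happens on the server. Scform itself is PHP. What follows is just a sketch of the idea in JavaScript, with made-up role names and addresses, and `resolveRecipient` is a hypothetical name. The form submits a role key from the drop-down list, never an address.

```javascript
// Hypothetical sketch of the form-based approach: the public page only
// knows role keys; the real addresses live in a private table that the
// web server never serves. In practice the table would be read from a
// file outside the document root; it is inlined here for brevity.
// A Map (rather than a plain object) means keys such as "toString"
// are not accidentally found via the prototype.
const PRIVATE_DIRECTORY = new Map([
  ["director", "jane.doe@example.org"],
  ["webmaster", "ops-team@example.org"],
]);

// Resolve the recipient chosen in the form's drop-down list.
function resolveRecipient(roleKey) {
  const address = PRIVATE_DIRECTORY.get(roleKey);
  if (address === undefined) {
    // Never fall back to a submitted address: that would let anyone
    // use the form to send mail to arbitrary recipients.
    throw new Error("Unknown recipient: " + roleKey);
  }
  return address;
}

console.log(resolveRecipient("director"));
```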

[1] http://jimsun.linxnet.com/SCForm.html “Scform web site”

08 September 2006

by Jacob Rus

Lewis:

Say no to html forms for emails. They’re really lame, and by far the worst solution mentioned. :)

-Jacob

08 September 2006

by Jacob Rus

Lewis: Also note, if you put a : after the [1] in your comment, Markdown will turn the link into an actual <a> tag, like so: Scform

:)

08 September 2006

by Jacob Rus

ack, bleh, it didn’t do it, as I just copied and pasted the converted curly quotes from the rendered output of your comment. :/. Ok, here: Scform

:)

I’ve been using a command that employs the fantastic PEAR HTML_Crypt class to obfuscate HTML, with complete success. It has proven extremely good not only at protecting email addresses from harvesting but also at completely stopping things like comment spam (by obfuscating the form tag).

I set it up like so:

1) I installed the HTML_Crypt package by entering the following in the terminal:

pear install HTML_Crypt

2) I set up a command in TextMate with the input being “selected text or document” and the output being “replace selected text”:

#!/usr/bin/php
<?php

require '/usr/lib/php/HTML/Crypt.php';

$c = new HTML_Crypt(file_get_contents("/dev/stdin"));
$c->output();
?>

The text gets obfuscated with JavaScript, which I know some purists will take issue with, but it has been fine for practical use. I’ve used it on many, many sites and have not received a single complaint from a client.

This is a server-side/client-side JS combination we use in our CMS. The junk characters are generated by the server and stripped out by the client, and spans are used to hide the junk characters in the link text. It might be too long for your purposes, but it is useful for comparison. It’s usually output as (more or less) a single line of code.

<script type="text/javascript"><!--
function tmemail(e) {var f="",i=0,j;
for(;i<e.length;i++){j=parseInt(e.charAt(i));
i+=j+1;f+=e.charAt(i);}return f;}
document.write('<a href="mailto:'+
tmemail("2jkn2kle2lfw1fm1"+
"ja0r2ewk2w8e283t")+"@"+
tmemail("227e17a1fs2dst2s"+
"je0r2jcn2cvh0i0f0i1n."+
"0c0o13.2idn2diz")+'">');//-->
</script>newma<span
style="display:none">j</span>rket@<span
style="display:none">k</span>easte<span
style="display:none">2</span>rnhif<span
style="display:none">j</span>i.co.<span
style="display:none">s</span>nz<span
style="display:none">i</span><script
type="text/javascript"><!--
document.write('</a>');//--></script>
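The server-side half is not shown above. Judging from tmemail(), each real character is preceded by a digit j and then j junk characters. A sketch of an encoder producing that format might look like the following. `padWithJunk` and its junk alphabet are assumptions, and tmemail() is repeated (with an explicit radix) only so the round trip can be run outside a browser.

```javascript
// Hypothetical server-side counterpart to tmemail(): before each real
// character, emit a digit j followed by j junk characters.
const JUNK = "abcdefghijklmnopqrstuvwxyz0123456789";

function padWithJunk(text, random = Math.random) {
  let out = "";
  // Index loop over UTF-16 code units, matching tmemail()'s charAt().
  for (let i = 0; i < text.length; i++) {
    const j = Math.floor(random() * 3); // 0-2 junk characters, as in the sample
    out += String(j);
    for (let k = 0; k < j; k++) {
      out += JUNK[Math.floor(random() * JUNK.length)];
    }
    out += text[i];
  }
  return out;
}

// Decoder from the comment above: read digit j, skip j characters,
// keep the next one.
function tmemail(e) {
  var f = "", i = 0, j;
  for (; i < e.length; i++) {
    j = parseInt(e.charAt(i), 10);
    i += j + 1;
    f += e.charAt(i);
  }
  return f;
}

console.log(tmemail("2jkn2kle2lfw1fm1ja0r2ewk2w8e283t")); // newmarket
```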

Allan, I’d love to integrate the Enkoder into TextMate. Email me and we’ll make it happen.

Hi Allan,

To see whether it is the entities that stop spam, you should also include an email address without them. That way you’ll be able to compare and measure the difference. With only the one email address in the post, all you’ll be able to say is “I got some spam” or “I didn’t get any spam”. In the second case, it is perfectly possible that a spammer just didn’t crawl your page. In the first case, you might have stopped some spam but not all, and you wouldn’t know.

With one email address with entities and one without, you’ll be able to tell whether the entities made the difference.

You’ll also want to make sure that no server-side spam filters are running on the email addresses you use.

Douglas

Allan: thanks for the link, that is indeed interesting. I wouldn’t have thought that spammers hadn’t caught on to that.

I don’t mean to toot my own horn here, but honestly, the HTML_Crypt method is really worth a look.

The output looks like (one line):

[Edit: I wrapped it to (almost) fit the comment box —Allan]

<script type="text/javascript">var a,s,n;
function x9764187d0e4c1a77606274d5783afc6d(s)
{r='';for(i=0;i<s.length;i++)
{n=s.charCodeAt(i);if(n>=8364){n=128;}
r+=String.fromCharCode(n-3);}return r;}
a='?d#kuhi@%pdlowr=h{dpsohCh{dpsoh1frp%Ah{dpsohCh{dpsoh1frp?2dA';
document.write (x9764187d0e4c1a77606274d5783afc6d(a));
</script>

And one could easily add some <noscript> stuff for standards’ sake.
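Reading the sample output, the encoding is a plain character shift: the decoder subtracts 3 from every character code. It also has a special case for codes of 8364 and up, 8364 being the euro sign’s code point. The following sketch reproduces that shift and emits a similar one-line script. It is not HTML_Crypt’s source, and `shiftEncode` and `shiftSnippet` are hypothetical names.

```javascript
// Shift every UTF-16 code unit up by `offset` (3 in the sample output).
function shiftEncode(html, offset = 3) {
  let out = "";
  for (let i = 0; i < html.length; i++) {
    out += String.fromCharCode(html.charCodeAt(i) + offset);
  }
  return out;
}

// Emit a compact inline script that shifts the text back and writes it.
function shiftSnippet(html, offset = 3) {
  // JSON.stringify yields a valid JS string literal (escaping any " or \
  // produced by the shift); "<" is escaped so "</script" can never appear.
  const literal = JSON.stringify(shiftEncode(html, offset)).replace(/</g, "\\u003c");
  return (
    '<script type="text/javascript">document.write(function(s){var r="";' +
    "for(var i=0;i<s.length;i++)r+=String.fromCharCode(s.charCodeAt(i)-" +
    offset + ");return r}(" + literal + "));</script>"
  );
}

console.log(shiftEncode('<a href="mailto:example@example.com">example@example.com</a>'));
```

Fed the link that the sample decodes to, `shiftEncode` returns exactly the `a='...'` string shown above. Because String.fromCharCode wraps codes modulo 65536, the shift also round-trips for code units near the top of the range.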

Of course, to be really picky, one could point out that using tm-sales instead of sales, etc. is a clear violation of RFC 2142.

But then, it’s not like anyone reads that one anyway.

10 September 2006

by Steve Lianoglou

Supposing there were a vote on taking Dan Benjamin up on his offer of integrating the Enkoder into TextMate, I’d be inclined to vote yes.

I love that thing.

Jacob,

You said, “Say no to html forms for emails. They’re really lame, and by far the worst solution mentioned. :)”

Could you elaborate on that, as in tell me why?

Also, thanks for the markdown tip. I missed that.

Lewis

11 September 2006

by Ryan Mohr

Why not a simple noSpamMailto("joe", "aol", "com")?

function noSpamMailto(a, b, c) {
  var e = a + "@" + b + "." + c;
  document.write("<a href='mailto:"+e+"'>");
  document.write(e);
  document.write("</a>"); 
}

That’s the method I use, and I don’t get much spam. I’m pretty sure the bots don’t pick up the email addresses because they don’t execute the JavaScript when crawling the page, and the address is not easily parseable in the source.
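A command would need to turn an address into that call. Since noSpamMailto joins its arguments as a + "@" + b + "." + c, one sketch is to split at the @ and at the last dot, so that multi-label domains still round-trip. `noSpamMailtoCall` is a hypothetical name.

```javascript
// Hypothetical helper: turn an address into a noSpamMailto(...) call.
// Splitting at the "@" and the last "." means b may itself contain dots,
// so domains such as example.co.uk still reassemble correctly.
function noSpamMailtoCall(address) {
  const at = address.lastIndexOf("@");
  const dot = address.lastIndexOf(".");
  if (at <= 0 || dot < at + 2 || dot === address.length - 1) {
    throw new Error("Not a user@domain.tld address: " + address);
  }
  const parts = [
    address.slice(0, at),
    address.slice(at + 1, dot),
    address.slice(dot + 1),
  ];
  // JSON.stringify produces properly quoted and escaped string literals.
  return "noSpamMailto(" + parts.map((p) => JSON.stringify(p)).join(", ") + ");";
}

console.log(noSpamMailtoCall("joe@example.co.uk"));
// noSpamMailto("joe", "example.co", "uk");
```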

11 September 2006

by Ryan Mohr

And here it is again formatted correctly

function noSpamMailto(a, b, c) {
  var e = a + "@" + b + "." + c;
  document.write("<a href='mailto:"+e+"'>");
  document.write(e);
  document.write("</a>");
}

Douglas: Yes, you’re right. I want to disprove that entity-encoding prevents spam, but if I get no spam, it can of course be for other reasons. I have now added a non-obfuscated email address (well, some days ago); interestingly, neither of the two addresses has received any spam yet.

Pedant: I seem to recall another RFC saying that domain names should make sense, as in no-one would ever register the-moon.com unless it actually got populated with a web server ;) FYI though, I don’t actually bounce the “regular” addresses; they go directly into the spam folder (which I do skim manually).

Lewis: One of the reasons HTML forms suck for sending email is that the user gets no copy of his message. This means that it can be difficult for the user to track when he sent what, and/or quote what he sent in a later context.

Using an email program gives you that for free. With an HTML form, the user needs to keep a separate log of what he did, and should he later need to reference one of his messages to the receiver, he can’t give the exact time of delivery, message ID, or similar, but can only say “I sent it via your web form”.

Others: Thanks for the many suggestions.

To contribute something myself: I found a ROT13 coder in JavaScript, which has two advantages: 1) the address can easily be decoded from within TextMate after generating the JS (with the email address selected, press ⌃⌘T and enter rot13), and 2) it seems to produce the shortest code, though HTML_Crypt is very close.

Here is an example:

<script type="text/javascript">document.write(
"grfg@rknzcyr.pbz".replace(/[a-zA-Z]/g, function(c){
return String.fromCharCode((c <= "Z" ? 90 : 122) >=
(c = c.charCodeAt(0) + 13) ? c : c - 26);}));</script>
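To generate the encoded string in the first place, a small ROT13 function is enough. ROT13 is its own inverse, so the same function also decodes. `rot13` and `rot13Snippet` are hypothetical names, and the snippet generator simply wraps the decoder shown above around the encoded address.

```javascript
// ROT13: rotate letters by 13 places, leaving everything else alone.
function rot13(text) {
  return text.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= "Z" ? 65 : 97; // char code of "A" or "a"
    return String.fromCharCode(((c.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Wrap the encoded address in the inline decoder from the comment above.
function rot13Snippet(address) {
  return (
    '<script type="text/javascript">document.write(' +
    JSON.stringify(rot13(address)) +
    '.replace(/[a-zA-Z]/g, function(c){return String.fromCharCode(' +
    '(c <= "Z" ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26);}));</script>'
  );
}

console.log(rot13("test@example.com")); // grfg@rknzcyr.pbz
```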

13 September 2006

by Jacob Rus

Say no to html forms for emails. They’re really lame, and by far the worst solution mentioned. :)

Could you elaborate on that, as in tell me why?

Sure.

  1. People have varying tastes in email clients, but all clients have some sort of useful text box. The text box in an HTML form is tiny and obnoxious (like the one I’m typing in).
  2. HTML forms don’t put sent mail in my “sent” box. I usually can’t cc/bcc an html-form email to myself, or my friend, or my lawyer.
  3. HTML forms don’t let me set up other email headers.

Basically, it’s a control issue. If I send mail using the usual email system (SMTP and the rest), I have complete control over my end. If I use your form, you have control.

13 September 2006

by Jacob Rus

Lewis:

Well, so much for the quotes there… sorry the output got messed up. Hopefully it’s still understandable.

I created an “Enkode Select” command for TextMate in my FEC (web front-end code) bundle. I’ve got it in a public Subversion repository at http://vivalaweb.info/svn/projects/fec-tmbundle/trunk/

I notice this thread has drifted a little from the TextMate-specific question, so let me first comment on email obfuscation techniques in general. One method I have used with great success is giving out a URL instead of an email address on research papers, articles, etc. that I write. Obviously this doesn’t help on a Web form where you need to enter an email address, but it has cut down on my spam from the bots that sniff for email addys. If you don’t have a URL, turn a mailto:name@domain URL into a short URL using http://www.tinyURL.com and give that out. Spiders pick it up as a URL (for now), but humans who click it get an email link.

As far as TextMate goes, I don’t see the feature adding any value, because, like Allan, I just can’t believe that study. I think many of the methods in this thread will end up being more effective.

The best non-JavaScript, non-image method of obfuscating an email address that still keeps it clickable (in Firefox and IE) is in my link.

A random-text logo generated from mardeg.sitesled.com is shrunk to normal size using inline CSS; the mailto links are then accessed from separate files: CSS for IE and XML for Firefox.

This is definitely the solution for what to put in the NOSCRIPT tags; in fact, you won’t see the demonstration in the link appear until you disable JavaScript in your browser.

We’ve used Andrew Gregory’s scripts with success: http://www.scss.com.au/family/andrew/webdesign/emaillinks/

You probably don’t want to require the inclusion of external libraries, though. But perhaps the idea (rewriting existing HTML) can be slimmed down to inline code.

Allan, I would use both a command that obfuscates via the Enkoder method (inline JavaScript) and one that converts ASCII to entities.

Depending on the project, I use both of these methods. Right now, for the entity conversion, I run the link through Markdown, take the output, and paste it into the source file (for cases where the source isn’t in Markdown).

So my vote is to add both into a bundle.

One more vote for the Enkoder, been using it since it was just the version on the web site. Personally I think it’s cruel to post an address without it.

Allan, I’m not sure that the fact that it produces 20-ish lines of code will discourage people from using it (if that was indeed your argument). If that were true, so many people wouldn’t be doing it by copying and pasting from the stand-alone application. Personally I don’t find the JS distracting (it’s in a nicely formatted box), but if you do, couldn’t the bundle command then fold that block?

I personally like Dan Benjamin’s Enkoder; however, the code it generates is too bulky for my phone book page, which displays about 70 clickable email addresses. I’m looking for something a bit slimmer.

Mac OS X users can also use a Dashboard widget called obfuscatr. It provides JavaScript or plain hexadecimal encoding of your email address, similar to the Markdown email obfuscation discussed here. See the details at http://tekkie.flashbit.net/mac-os/obfuscatr-110-released.

obfuscatr was also featured in MacWorld Italy of March 2008: http://tekkie.flashbit.net/mac-os/obfuscatr-featured-in-macworld

This is a very handy function in TextMate. I’ve made a Python version of it for working with dynamically generated HTML.