Friday, November 14, 2008

Privacy in open communities?

I just finished reading Doug's latest blog post, which refers to the qemu mailing list archives. We maintain mail archives at Eclipse too, and I'm always interested in comparing notes with other sites.

Beyond all the similarities between their archives and ours, I was surprised to see a link to "download the archives in mbox format."

The mbox file is raw, and reveals email addresses without obfuscation. It also reveals the IP addresses used to send email (which, in turn, may reveal your geographical location), and, if you're behind a NAT firewall, the mail headers often reveal internal IP addresses too.
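To make that concrete, here is a rough sketch of how little effort it takes to pull those details out of a raw message. Everything in it is an assumption for illustration: the sample message is made up (using reserved example addresses), the function names are mine, and the obfuscation scheme is a hypothetical one, not what the Eclipse archives actually do.

```python
import email
import ipaddress
import re
from email.utils import parseaddr

# Hypothetical raw message, similar to what one entry in an mbox file contains.
# All names and addresses are invented for this sketch.
RAW_MESSAGE = """\
Received: from workstation (unknown [192.168.1.23])
\tby mail.example.org (Postfix) with ESMTP id ABC123
\tfor <dev-list@example.org>; Fri, 14 Nov 2008 09:00:00 -0500 (EST)
Received: from mail.example.org (mail.example.org [203.0.113.7])
\tby lists.example.org with ESMTP; Fri, 14 Nov 2008 09:00:01 -0500 (EST)
From: Alice Example <alice@example.org>
To: dev-list@example.org
Subject: Build question

Hello list.
"""

# Loose IPv4 pattern; good enough for a sketch, not a strict validator.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# RFC 1918 ranges, i.e. addresses typically used behind a NAT firewall.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def exposed_details(raw):
    """Return the sender address and every IPv4 address in Received headers."""
    msg = email.message_from_string(raw)
    _, sender = parseaddr(msg.get("From", ""))
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IPV4.findall(header))
    return {"sender": sender, "ips": ips}


def is_rfc1918(ip):
    """True if the address falls in one of the RFC 1918 internal ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # the loose regex matched something that is not an IP
    return any(addr in net for net in RFC1918_NETWORKS)


def obfuscate(address):
    """Hypothetical obfuscation: keep the local part, mask the domain."""
    local, _, _ = address.partition("@")
    return f"{local}@xxxxxxxx"


if __name__ == "__main__":
    details = exposed_details(RAW_MESSAGE)
    print("sender as published in raw mbox:", details["sender"])
    print("sender as an HTML archive might show it:", obfuscate(details["sender"]))
    for ip in details["ips"]:
        kind = "internal" if is_rfc1918(ip) else "public"
        print(ip, kind)
```

A spam harvester needs nothing more sophisticated than this, which is the whole problem with handing out the raw file.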

Without being overly paranoid, I'm not sure that kind of 'private' information really needs to be out in the open. Putting the raw mbox file up for download is not something we do here at Eclipse.

While I occasionally get asked for the mbox file(s) for Eclipse mail archives, my answer is always 'no'. People are welcome to scrape the 'clean' HTML archives from our website. Sure, it may make research and analysis a bit more difficult, but I feel very strongly about protecting the Eclipse community's email addresses from spam harvesters.


Doug Schaefer said...

+1. Privacy needs to be the first concern. I love the HTML mailing list archives because they show up in searches, making conversations easy to find.

BTW, I don't see Eclipse newsgroup postings showing up in Google search results. It would be quite valuable to have that.

10:00 AM  

