Wednesday, September 26, 2007

Search source code in CVS... Can that be useful?

Folks have asked for the ability to search through CVS in the past, so I figured I'd see if I can leverage the existing software we run on to accomplish this. It's important for me to try to use what we already have to keep software maintenance to a minimum. We're running ViewVC already, which provides a nice web interface to CVS/SVN, and we have a search engine, so I've launched the Indexer right now on portions of CVS to see if it can work.

Here is what is being indexed:
- each file's commit log (example). This page contains revision, tag and branch strings, as well as commit comments (which often include a Bugzilla bug number!)
- the actual source code contained in the HEAD stream only (example). Picking only the HEAD stream will ensure the index isn't contaminated with code from older revisions.
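
For the curious, the "log pages plus HEAD source only" rule could be sketched as a small URL filter. This is only an illustration and not the Indexer's actual configuration: the function name and the example.org URLs are made up, and it assumes ViewVC's usual query parameters (view=log for the commit log, view=markup for source, and revision, pathrev, r1 and r2 for pages pinned to something other than HEAD).

```python
from urllib.parse import urlparse, parse_qs

# Query parameters that pin a ViewVC page to a specific revision, tag,
# branch, or diff (assumed from ViewVC's usual URL scheme).
PINNED_PARAMS = {"revision", "pathrev", "r1", "r2"}


def should_index(url: str) -> bool:
    """Hypothetical filter: keep commit-log pages and HEAD source views."""
    query = parse_qs(urlparse(url).query)
    view = query.get("view", [""])[0]
    if view == "log":
        # Commit log page: revisions, tags, branches, commit comments.
        return True
    if view == "markup":
        # Source view: keep it only when nothing pins it to an older revision.
        return PINNED_PARAMS.isdisjoint(query)
    return False
```

Anything that carries a revision or tag in the URL is skipped, which is what keeps code from older revisions out of the index.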

Try a couple of searches with the very limited data that's already been indexed (results may be slow as the indexer is working):
Search for a Bugzilla bug reference: Bug 134394
Another one: 171518
Search by package name: "org.eclipse.ui.tutorials.rcp.part1"
Search for words in source code: TODO

I guesstimate it would take a few days to index all of CVS and SVN, and it will add a few hundred thousand URLs to our search database, so this is all just a test. I'd likely need to upgrade the search engine to the latest release, which offers better performance for large sites like Eclipse.

Again, the key here is leveraging our existing setup for this. Ideally, I could install some new whizbang application that arguably does a better job, but using ViewVC and Search means zero added maintenance for us.

So the question is: Do you think this is useful?

Tuesday, September 25, 2007

Nick Boldt needs to win Best Open Source Comedian Award

Nick just never ceases to make me laugh. Read his latest creation:

How does he come up with this stuff?

This is another one of my faves:

Read more of Nick's in-your-face insightfulness:

Thanks, Nick!

Friday, September 21, 2007

Project Meta-Data for

We had a system where projects maintained meta-data in an XML file in their web tree. That was good because it gave us much-needed information about the projects that we could use on the web site. But it was bad because it was hard to expand it, the projects had to maintain the files by hand, the error rate in the files was high, and correlating data between projects was hard. So we migrated all of that into a database over the last month and a half or so.
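
To give a rough idea of what that kind of migration looks like, here is a minimal sketch. It is not our actual import script: the XML element names, the table layout and the use of SQLite are all assumptions made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical project-info file; the real element names may differ.
SAMPLE_XML = """
<project id="tools.example">
  <mailing-list>example-dev</mailing-list>
  <bugzilla-product>Example</bugzilla-product>
  <release name="1.0" date="2007-10-15"/>
</project>
""".strip()


def import_project(conn, xml_text):
    """Copy one project's XML meta-data into the (assumed) tables."""
    root = ET.fromstring(xml_text)
    project_id = root.get("id")
    for child in root:
        if child.tag == "release":
            conn.execute(
                "INSERT INTO releases VALUES (?, ?, ?)",
                (project_id, child.get("name"), child.get("date")),
            )
        else:
            # Everything else is stored as a simple field/value pair.
            conn.execute(
                "INSERT INTO project_info VALUES (?, ?, ?)",
                (project_id, child.tag, child.text),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_info (project_id TEXT, field TEXT, value TEXT)")
conn.execute("CREATE TABLE releases (project_id TEXT, name TEXT, date TEXT)")
import_project(conn, SAMPLE_XML)

# Correlating data across projects becomes a single query.
upcoming = conn.execute(
    "SELECT project_id, name, date FROM releases ORDER BY date"
).fetchall()
```

Once every project's file has gone through something like import_project, a question such as "which releases are planned, across all projects, in date order" no longer means opening dozens of XML files.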

All of the project data from the XML files was imported. We updated all of the relevant pages on newsgroups, mailing lists, the /projects page, the timeline, the categories pages, etc. to use the new system. Best of all, we have built a new Portal component to allow project leads to maintain this data. So what kind of data are we talking about? Some examples are planned release dates, mailing list names, Bugzilla product names, blog RSS feeds, and on and on. We will undoubtedly be adding more as time passes, but the idea is to make it easy for projects to describe themselves to us and to the rest of the community so that we can easily make information available on the web. Bjorn has put together some brief documentation on what the data is and how we use it, here. Now, project leads, if you don't mind updating your data... ;)

Wednesday, September 12, 2007

How much SPAM do you tolerate before you become annoying?

Sometimes I feel companies go too far with their SPAM protection.

From: someone@
To: webmaster@
Dear webmaster, yesterday I was able to log into bugzilla, and today it says my account is blocked because bugzilla can't send me e-mail. Can you unlock my account and/or explain to me *why* my account was blocked?

From: webmaster@
To: someone@
Dear someone, your Bugzilla account was locked because Bugzilla cannot send you e-mail. The SPAM protection on your mail servers wants Bugzilla to verify it's a valid sender, but Bugzilla cannot do that because it's not a human.

From: someone's mail server-reply-code-12345
To: webmaster@
You're trying to send mail to someone@. We want to protect ourselves from 100% of SPAM, so please reply to this e-mail leaving the reply-code-12345 intact.

(Oh, good grief).

From: webmaster@
To: someone's mail server-reply-code-12345

Yes, I'm human, this is valid, let me in!

From: someone@
To: webmaster@
That was easy, thanks for your help. Your address was confirmed. You can now feel better about the fact that your e-mail will reach my inbox quickly.

Thankfully, I don't need to go through this hassle to reply to everyone that sends mail to I hope this person is enjoying a 100% SPAM-free Inbox, at the expense of everyone else's time. I mean, I appreciate that your time is valuable, but mine is too, and there are other methods for preventing/reducing SPAM that are less intrusive -- especially considering you mailed me first. In case it's not obvious, the address is not the most spam-protected mail address out on that there web, yet we manage it while remaining productive...

I had to rant.

Timeline Gets a Facelift

The projects timeline at just got a facelift thanks to the Simile Project at MIT and the work we've been doing to centralize all the project information that was formerly held in project-info.xml files into a single database. There is some work to do cleaning up dates that are malformed in the DB (hence some project data is not yet shown) but the new front-end is done and I think it's pretty cool. Check it out here.

Monday, September 10, 2007

Downloads RSS Feed

While we're on a roll with new RSS feeds (and because we like you all so much) (okay, and to be completely honest about it, there was a long-standing bug about this: bug 164390) we've rolled out a new feed for tracking project releases. It's available from the downloads page as both an icon on the page and in the URL bar at the top of your (supported) browser. If you look at it, you'll see it's fairly sparse in the number of releases. The data comes from the project-info.xml files that each project has published (or not). We've rolled all of that data into a new database and will shortly be releasing a portal component for projects to manage it. If you don't see your release listed, the portal component will be the place to change that. In a few weeks. :)
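
If you're wondering how little it takes to turn release records into a feed, here is a minimal sketch. It is not the code behind our feed: the function name, the record fields and the example.org link are assumptions for illustration, and it only emits a handful of RSS 2.0 elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_release_feed(releases, title, link):
    """Build a bare-bones RSS 2.0 document from release records."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Project releases"
    for rel in releases:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{rel['project']} {rel['name']}"
        # RSS 2.0 expects RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(rel["date"])
    return ET.tostring(rss, encoding="unicode")


# Hypothetical release record, as it might come out of the database.
releases = [
    {
        "project": "tools.example",
        "name": "1.0",
        "date": datetime(2007, 10, 15, tzinfo=timezone.utc),
    },
]
feed = build_release_feed(releases, "Project Releases", "http://example.org/downloads")
```

A project with no release data simply contributes no item elements, which is why the feed looks sparse until the data is filled in.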