Wednesday, September 26, 2007

Search source code in CVS... Can that be useful?

Folks have asked for the ability to search through CVS in the past, so I figured I'd see if I could leverage the software we already run to accomplish this. It's important for me to use what we already have, to keep software maintenance to a minimum. We already run ViewVC, which provides a nice web interface to CVS/SVN, and we have a search engine, so I've launched the indexer on portions of CVS to see if this can work.

Here is what is being indexed:
- each file's commit log (example). This page contains revision, tag and branch strings, as well as commit comments (which often include a Bugzilla bug number!)
- the actual source code contained in the HEAD stream only (example). Picking only the HEAD stream will ensure the index isn't contaminated with code from older revisions.
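Bug numbers in commit comments are what make the commit logs worth indexing. As a rough illustration, here is a minimal sketch of picking Bugzilla references out of a commit comment. It assumes references are written as "bug" followed by digits (for example "Bug 134394" or "bug #171518"); that convention and the function name find_bug_refs are hypothetical and not part of the indexer itself.

```python
import re

# Hypothetical pattern (an assumption about comment conventions):
# matches "bug 134394", "Bug #171518", "BUG:134394", case-insensitively.
# The leading \b keeps words like "debug 42" from matching.
BUG_REF = re.compile(r"\bbug\s*[#:]?\s*(\d+)", re.IGNORECASE)


def find_bug_refs(commit_comment: str) -> list[int]:
    """Return the bug numbers mentioned in a commit comment, in order, without duplicates."""
    found: list[int] = []
    for match in BUG_REF.finditer(commit_comment):
        number = int(match.group(1))
        if number not in found:
            found.append(number)
    return found


if __name__ == "__main__":
    comment = "Fix for Bug 134394; also addresses bug #171518 (see bug 134394)"
    print(find_bug_refs(comment))
```

Run on the sample comment, this prints [134394, 171518]: the repeated reference is reported once, in the order it first appears.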

Try a couple of searches with the very limited data that's already been indexed (results may be slow while the indexer is running):
- Search for a Bugzilla bug reference: Bug 134394
- Another one: 171518
- Search by package name: "org.eclipse.ui.tutorials.rcp.part1"
- Search for words in source code: TODO

I guesstimate it would take a few days to index all of CVS and SVN, and it would add a few hundred thousand URLs to our search database, so this is all just a test. I'd likely need to upgrade the search engine to the latest release, which offers better performance for large sites like Eclipse.

Again, the key here is leveraging our existing setup. Sure, I could install some new whizbang application that arguably does a better job, but using ViewVC and Search means zero added maintenance for us.

So the question is: Do you think this is useful?


