Once in a while we get asked to move an Eclipse project from the Technology project to another top-level project. Of course, this involves moving CVS and Bugzilla data. Because Bugzilla has no built-in "move" tools, we have to go in and run SQL UPDATEs against the live database by hand.
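To give a feel for what's involved, here is a minimal sketch of that kind of manual move, assuming the classifications/products tables of a recent Bugzilla schema and the conventional "bugs" database name; the project and classification names are made up for illustration:

```sh
# Re-parent a hypothetical "Example Project" product under the "Eclipse"
# classification; run only after rehearsing on an offline copy (see below).
mysql bugs <<'SQL'
UPDATE products
   SET classification_id = (SELECT id FROM classifications
                             WHERE name = 'Eclipse')
 WHERE name = 'Example Project';
SQL
```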
Manipulating live data this way is scary business. If anything happens to the CVS or Bugzilla data, we immediately have dozens of committers armed with hand grenades, revolvers, shotguns, SMGs and RPGs, just itching to use them.
So how do we protect the CVS and Bugzilla data from potential disasters caused by, among other things, human error?
- We issue test commands on offline datasets until we get them right (see the sketch after this list)
- We maintain offsite snapshots of data for fast restore
- We maintain both realtime and delayed snapshots of all our data
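Rehearsing on an offline copy is cheap insurance. A bare-bones version of that drill, with hypothetical host, database and file names, looks something like this:

```sh
# Dump the live database and load it into a scratch copy.
mysqldump -h db-master bugs > bugs-snapshot.sql
mysql -e 'CREATE DATABASE bugs_test'
mysql bugs_test < bugs-snapshot.sql

# Run the candidate UPDATE script against the copy, then eyeball the
# result before going anywhere near the live data.
mysql bugs_test < move-project.sql
mysql bugs_test -e "SELECT name, classification_id FROM products
                    WHERE name = 'Example Project'"
```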
I think the key here is realtime and delayed snapshots. A realtime mirror offers little to no protection against an "rm -rf /" or a bad SQL UPDATE/DELETE, because the mistake is copied to the mirror almost as soon as it is made; only a delayed copy leaves you a window to recover.
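To make that concrete, here is a minimal two-generation snapshot script of the sort a nightly cron job could run; the host name and paths are hypothetical, and rsync's --link-dest option keeps the second copy cheap by hardlinking files that haven't changed:

```sh
#!/bin/sh
# Keep today's and yesterday's copies: even if a bad command reaches
# tonight's snapshot, yesterday's is untouched.
rm -rf /snapshots/yesterday
[ -d /snapshots/today ] && mv /snapshots/today /snapshots/yesterday
rsync -a --delete --link-dest=/snapshots/yesterday \
      main-server:/home/data/ /snapshots/today/
```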
Here are the levels of data protection we employ here at the Eclipse Foundation:
- Redundant disks. No single disk failure should prevent access to the data.
- Redundant storage/database servers. No single server failure should prevent access to the data. This is a costly proposition: mirroring instantly cuts our 1.5 TB of disk space in half. Be wary of SANs - even with RAID, the unit itself is a single point of failure if it isn't cloned.
- Offsite clone servers. We have a server in a different physical location that holds not one but two copies (today's and yesterday's) of all the eclipse.org data. This "eclipse.org-in-a-box" is fully equipped to serve our web site, CVS and database if required, and gets daily snapshots from the main servers.
- Backups to tape. We run nightly backups to tape, with monthly offsite backups.
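For completeness, a bare-bones sketch of what a nightly tape run can look like; the tape device name and the paths are assumptions about a typical Linux setup, not our exact configuration:

```sh
#!/bin/sh
# Write the night's backup to tape, then eject it for offsite rotation.
mt -f /dev/nst0 rewind
tar -cf /dev/nst0 /home/data /var/lib/cvs
mt -f /dev/nst0 offline
```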
Sound like a lot of infrastructure? It is. But considering what's at stake, we think it's worth it.
Update: fixed a spelling mistake in the word "wary" (Thanks, Bjorn)