Tuesday, May 08, 2007

Bugzilla: avoiding stale searches

We have a master-slave MySQL replication setup here at Eclipse for redundancy, and Bugzilla is configured to send the SELECTs that are appropriate for the slave to the slave DB. This helps performance greatly, especially when small queries would otherwise wait on tables locked by large queries. Our Bugzilla database isn't small, and it's open to the world, so it takes a huge beating: people issue the darndest queries, and lots of them. Some of those queries take minutes to run, which causes the slave's data to lag behind the master.

When the slave lags, weird stuff happens to Bugzilla. The most common complaint comes when a user runs a named search and sees a bug they recently closed still displayed as Open. Confusing, frustrating, and avoidable.

I wrote a host of system monitoring scripts for the Eclipse servers, and one of those scripts is a MySQL monitor. It reports usage metrics for a nifty web page we use, and it also kills queries that run for too long. I recently hacked in functionality that updates the Bugzilla parameters on the fly so that Bugzilla uses the master DB exclusively whenever the slave DB lags by more than 120 seconds. When the slave catches up, the parameters are changed back so that Bugzilla uses the slave again for maximum performance.
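
For the curious, the decision at the heart of that switch can be sketched roughly like this. This is only an illustration, not the actual monitor script: the function names are made up for this example, and it assumes the lag arrives as a number of seconds (for instance the Seconds_Behind_Master value that MySQL reports in SHOW SLAVE STATUS). Treating an unknown lag as "use the master" is also an assumption of the sketch. How the Bugzilla parameters actually get rewritten is left out.

```python
from typing import Optional

# Threshold from the post: switch to the master once the slave
# is lagged by more than 120 seconds.
LAG_THRESHOLD_SECONDS = 120


def choose_database(lag_seconds: Optional[int],
                    threshold: int = LAG_THRESHOLD_SECONDS) -> str:
    """Return "master" or "slave" for Bugzilla's read queries.

    lag_seconds is the slave's reported lag. None means the lag is
    unknown (for example, replication is not running); this sketch
    assumes the safe choice in that case is the master.
    """
    if lag_seconds is None:
        return "master"
    return "master" if lag_seconds > threshold else "slave"


def plan_switch(current: str,
                lag_seconds: Optional[int],
                threshold: int = LAG_THRESHOLD_SECONDS) -> Optional[str]:
    """Return the database to switch to, or None if no change is needed.

    Only reporting a change when the target differs from the current
    setting keeps the monitor from rewriting parameters on every pass.
    """
    target = choose_database(lag_seconds, threshold)
    return target if target != current else None
```

A monitor loop would call plan_switch() on each pass with the current setting and the freshly measured lag, and only touch the Bugzilla parameters when it returns something other than None.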

Since I implemented this late last week, Bugzilla has been switched to the master (and back) at least a dozen times as a result of heavy load on the slave. Good performance + up-to-date queries = happy committers. I like that.

Gunnar "you must do the right thing" Wagenknecht suggested that I release these infrastructure scripts under the EPL, so I'm in the process of doing so. If they're useful for us, they might be useful for someone else too.

