Tuesday, May 02, 2006

MySQL oddities

Some of you who use our site 22 hours/day may have noticed we've been having some database problems for the last couple of weeks. We have two MySQL servers: a master and a replicated slave (well, we used to, anyway). The fault tolerance that replication provides is nice to have, and it also gives us a performance advantage by letting us shift plenty of SELECT queries to the slave (Bugzilla supports this and benefits tremendously from it, as do our website and search engine).

For the past few weeks, after about 3 days of good service, the master just stops processing queries. The queries silently queue up until our connection limit is hit. The database server itself responds well; it just doesn't do anything. No logged info, no error message, nothing. So far, no operational parameter seems out of place, except for one query in the SHOW PROCESSLIST output that has a state of *** DEAD ***. We restart the MySQL server and we're good for another few days, except now our slave is out of sync.
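Since that *** DEAD *** state is the only visible symptom, it is the obvious thing to watch for. Here is a rough sketch of such a check, assuming the SHOW PROCESSLIST rows have already been fetched by whatever client library is in use. The ProcessRow type and the find_dead_threads name are made up for illustration; they are not part of MySQL or of our actual setup.

```cpp
#include <string>
#include <vector>

// Hypothetical row type: one entry of SHOW PROCESSLIST output,
// reduced to the two columns this check needs.
struct ProcessRow {
    long id;            // the Id column
    std::string state;  // the State column
};

// Returns the ids of all rows whose state is exactly the
// "*** DEAD ***" marker described above.
std::vector<long> find_dead_threads(const std::vector<ProcessRow>& rows) {
    std::vector<long> dead;
    for (const ProcessRow& row : rows) {
        if (row.state == "*** DEAD ***") {
            dead.push_back(row.id);
        }
    }
    return dead;
}
```

A non-empty result would be a reasonable trigger for an alert, so the stall gets noticed before the connection limit is hit.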

We went looking in the MySQL docs for that *** DEAD *** state and came up empty-handed. Matt downloaded our version's source and started poking around, and found the only instance of the string:

sql_show.cc
#if !defined(DONT_USE_THR_ALARM) && ! defined(SCO)
        if (pthread_kill(tmp->real_id,0))
          tmp->proc_info="*** DEAD ***";        // This shouldn't happen
#endif

So it looks as if what shouldn't happen happens, except we don't know why or how to fix it. Not being a C++ guru, I don't dare look at more code than the above snippet.
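For what it's worth, the check itself is a standard POSIX idiom: calling pthread_kill with signal 0 sends nothing and only performs the error checking, so a nonzero return means the system no longer accepts that thread id. A minimal standalone sketch of the same probe follows. The thread_looks_dead name is made up for illustration and is not MySQL's code.

```cpp
#include <pthread.h>
#include <signal.h>

// Hypothetical helper mirroring the check in sql_show.cc.
// Signal 0 delivers nothing; pthread_kill only validates its arguments
// and returns 0 on success or an error number (such as ESRCH) on failure.
// Caveat: the result is only meaningful while the thread id is still
// valid. Probing an id whose thread has already been joined is
// undefined behavior.
bool thread_looks_dead(pthread_t tid) {
    return pthread_kill(tid, 0) != 0;
}
```

Calling thread_looks_dead(pthread_self()) returns false, since the calling thread is obviously alive. In our case the server evidently got a nonzero result for a thread it still had in its list.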

A few folks have suggested simply restarting the MySQL server every day. "Lots of sites employ the practice of restarting services regularly", I hear. I dunno, but it sounds Windows-ish to me. Our setup has been working flawlessly for a year without requiring this type of action, so what happened that we'd need that now?

At any rate, the kind folks at Intel have offered to help. If you want to help us work on this issue, you can check out the thread on the mysql forums.

1 Comment:

Anonymous Le ScaL said...

Some companies gave machines to the foundation; couldn't they provide a license for their database?

4:41 PM  
