Monday, November 24, 2008

Eclipse servers out and about last weekend

After months of stellar uptime, the Eclipse servers were acting up last weekend. As it turns out, our primary backend server became so busy doing file operations that its response time was measured in minutes. As a rule of thumb, consider a load average of 10.00 to be somewhat high. Here is 'fred', with load averages in the 40s, whereas its usual load is about 5.00:

On our dev/download cluster, node 5 had an interesting day Saturday, slowly crawling up to a barely responsive 331.90:

On Sunday, the same node was pretty much dead all day (no blue line at all). At some point it did manage to resurface and report a load average of 1932.61. I had never seen such a number on a Linux server.
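For anyone who wants to check numbers like these on their own Linux box, here is a minimal sketch of reading the load averages and flagging the ones above the "somewhat high" mark of 10.00 mentioned above. The function names, the `HIGH_LOAD` constant, and the sample line are made up for illustration; this is not how the graphs in this post were produced.

```python
import os

# Threshold taken from the post: a load of 10.00 is "somewhat high".
# The constant name is an assumption for this sketch.
HIGH_LOAD = 10.0


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from a
    /proc/loadavg-style line (the first three whitespace-separated fields)."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def is_high(load, threshold=HIGH_LOAD):
    """True when a load average is at or above the threshold."""
    return load >= threshold


if __name__ == "__main__":
    # os.getloadavg() returns the same three averages on Unix-like systems
    # and raises OSError if they cannot be obtained.
    for label, load in zip(("1 min", "5 min", "15 min"), os.getloadavg()):
        status = "HIGH" if is_high(load) else "ok"
        print(f"{label}: {load:.2f} ({status})")
```

`parse_loadavg` is only there for cases where you have a saved `/proc/loadavg` line rather than a live machine to query.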

Oh well, so much for perfect uptime.


Kim Moir said...

Interesting. Do you know the root cause that made the primary backend server so busy over the weekend?

10:16 AM  
