Monday, March 01, 2010

Hardware Upgrades Part 1: today's problems

Since we've received hardware donations from Google and Intel, I figured I'd start a series on how we're planning on making things better.  Let's have a look at the current problems we're facing.

Although there is room for improvement, it all works quite well.

  • Internet traffic is load-balanced across multiple servers for fault tolerance and scalability

  • Shared files are stored on a pair of NFS servers

  • Lots of content is stored on local disks to avoid NFS

  • NFS servers also act as the MySQL servers

  • NFS/MySQL servers have three local storage devices: an 8-drive RAID, a 7-drive RAID and a 16-drive iSCSI RAID.

Since all our Cisco network gear was upgraded last year thanks to a very generous donation from Cisco, we're capable of sending multiple gigabits per second to the Internet.

PROBLEM 1: Using 80% of a 100 megabit Ethernet cable doesn't give us much room to grow.  I can only tap into another 12 megabits or so in case I need bandwidth fast.
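To make the headroom arithmetic concrete, here is a rough sketch in Python. The function name and the `usable_fraction` parameter are made up for illustration. The 0.92 figure is only an assumption chosen because it reproduces the "12 megabits or so" above; the nominal headroom at 80% of 100 megabits would be 20, and the post doesn't say where the difference goes.

```python
def headroom_mbps(link_mbps: float, utilization: float,
                  usable_fraction: float = 1.0) -> float:
    """Spare bandwidth in Mbit/s before the link saturates.

    usable_fraction is a hypothetical knob for the share of the nominal
    line rate that is achievable in practice (an assumption, not a
    measured value).
    """
    used = link_mbps * utilization          # traffic already on the link
    ceiling = link_mbps * usable_fraction   # practical upper bound
    return max(ceiling - used, 0.0)         # never report negative headroom


# Nominal headroom: 100 Mbit/s link at 80% utilization.
nominal = headroom_mbps(100, 0.80)

# Assumed practical ceiling of 92% of line rate.
practical = headroom_mbps(100, 0.80, usable_fraction=0.92)

print(f"nominal headroom:   {nominal:.0f} Mbit/s")
print(f"practical headroom: {practical:.0f} Mbit/s")
```

Either way, the spare capacity is a small fraction of what the upgraded network gear could push.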

PROBLEM 2: The 5-node dev/download cluster works well for redundancy and load capacity, but with only 8GB of RAM in each node, file cache hits are nil.  Every request for a file, be it for CVS, a mailing list archive or a download, must come from NFS.

PROBLEM 3: Virtualized web servers make redundancy and scalability easy, but with only 8GB of RAM in the hosts, one host can only hold one instance of Bugzilla, and the CPUs are largely idle.  Furthermore, an obscure Xen issue is drastically affecting memory performance.

PROBLEM 4: Backend servers have 16GB RAM, which is OK, but must share that amount between MySQL and NFS.  MySQL is "detuned" to not consume too much RAM.  Even so, MySQL uses precious RAM that could be used for file cache.

PROBLEM 5: Single IBM server for builds and signing.  The machine is a monster, but four CPUs can only do so much at any given time.  Also, Continuous Integration means CPUs are rarely idle.

PROBLEM 6: (not shown) Since we have a number of problems, adding new tools such as Git or Gerrit Code Review would only make matters worse.

In my next post, I'll discuss how we plan on addressing these issues with the new hardware we've received.


Kim Moir said...

I look forward to this series. Very interesting!

10:58 AM  
