Wednesday, April 30, 2008

Portal Committer Nominations Update

We've had the Portal's committer election process in the field for more than a year now. During that time we have made some refinements and we haven't stopped looking at how we can improve things. I'll be working on a lot of updates over the next few months as we implement changes necessary for the Standardized Groups effort (uber-bug 198541).

In the interim I'm trying to wrap up some of the outstanding bugs against the existing process. Three complaints stand out:

- the project lead has to enter the prospective committer's employer information, even when we already have it on file
- it's the lead who has to enter that information, not the prospective committer
- Foundation staff have to manually tie nominations to existing committer records whenever the nomination uses an email address other than the one we know about

All of these problems stem from the fact that a nomination can be based on any email address. We don't know who the person is, so we can't tie the nomination to any existing information about them, and we can't have them log in in a way that would connect the records either. So we rolled out a change yesterday that should solve this problem, and hopefully make the nomination process easier. It only solves part of the problem -- I have not yet fixed the issues for the project leads, but that fix is on its way. What we rolled out yesterday is a new nomination form. It now looks like this:

You can type any part of the name or email address of an existing Bugzilla account or committer, and the form will search our records for matches. You then select the person you are nominating and move on to the next step -- the familiar page you've been using for the better part of the last year. Here's the modified page, with the new drop-down showing search results:

Yes, the box busts right out of our small widgets. Gabe is working on making that better for all of our components that need it. For this process, though, the one major change is that everyone needs either a valid Bugzilla ID or an existing committer ID to be nominated. Committers are all required to set up a Bugzilla account anyway (and most only get nominated after fixing bugs), so by moving this step to the beginning of the process we can solve a lot of problems for Portal users and speed up nominations. More changes are on their way, including the fixes for project leads mentioned above. Hope you enjoy!

Bugzilla: mod_perl + mod_deflate = giggles

Bugzilla 3.0 supports huge performance gains via mod_perl, but I haven't enabled it on bugs.eclipse.org because the added RAM requirements would kill our PHP-tuned cluster. mod_deflate, which compresses HTML output on-the-fly, is not enabled either, as it adds too much CPU load to our already busy cluster. However, as we gear up to deploy newly donated AMD hardware, we can afford to use both mod_perl and mod_deflate for some serious performance gains.
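
For context, turning on mod_perl for Bugzilla 3.0 takes only a couple of httpd.conf directives. The sketch below is illustrative, assuming a stock Apache 2.x layout and a Bugzilla install under /var/www/html/bugzilla rather than our actual paths:

# Load mod_perl (module path assumes a stock Apache 2.x layout).
LoadModule perl_module modules/mod_perl.so

# Bugzilla 3.0 ships a mod_perl.pl that compiles the CGIs once and keeps them
# resident in the Apache children, instead of forking a Perl process per request.
PerlSwitches -w -T
PerlConfigRequire /var/www/html/bugzilla/mod_perl.pl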

Of course, before enabling such features on our live site, some benchmarks are in order.

A simple Perl script fetches random bugs. This is not typical Bugzilla usage -- a real user reads, writes and runs searches -- but page fetches far outnumber writes, and it provides a uniform way of measuring raw Bugzilla throughput.

#!/usr/bin/perl
use strict;
use warnings;

# Hammer the server with requests for random bug pages, forever.
while (1) {
    my $bug_id = int(rand() * 150000);

    # Ask for compressed output so mod_deflate is exercised when it's enabled.
    # The doubled backslash passes a real line continuation to the shell.
    my @lines = `wget --no-check-certificate --header='Accept-Encoding: gzip,deflate' \\
        'https://bugs.eclipse.org/bugs/show_bug.cgi?id=$bug_id' 2>&1`;
}

For each test I launch 20 of these scripts on 2 different computers. 'bugs.eclipse.org' refers to the new AMD-based Bugzilla server running in my office. SSL is enabled.
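
Launching the clients is nothing fancy -- something like the loop below, with the script name made up for the example:

# Start 20 copies of the fetch script in the background.
for i in $(seq 1 20); do
  ./fetch_random_bugs.pl &
done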

TEST 1: typical CGI, no mod_perl, no mod_deflate.
Pages/minute: 375
RAM used: 1.5G
Load average: 21.00
Typical latency for 1 page: 1 to 2 seconds
Network transit: 8.4 MB/minute (typical page size 23 KB)

This test reflects our current setup. Simply fetching a bug page yields a noticeable latency before any network transit occurs. Forking a Perl process and compiling show_bug.cgi are expensive.

TEST 2: mod_perl, no mod_deflate.
Pages/minute: 916
RAM used: 1.4G
Load average: 15.33
Typical latency for 1 page: 0 to 1 second
Network transit: 20.5 MB/minute (typical page size 23 KB)

Wow. A net gain of 541 pages/minute, with page latency near zero. With no Perl process to fork and no script to compile, Apache gets bits on the wire almost instantaneously.

TEST 3: mod_perl and mod_deflate with compression level '1' (least).
Pages/minute: 930
RAM used: 1.7G
Load average: 17.00
Typical latency for 1 page: 0 to 1 second
Network transit: 6.1 MB/minute (typical page size 6.7 KB)

Another wow. Page throughput is up only slightly, but network transit is cut by about 70%. mod_deflate will be most helpful on pages with many comments and on search result pages.

TEST 4: mod_perl and mod_deflate with compression level '9' (most).
Pages/minute: 910
RAM used: 1.7G
Load average: 17.00
Typical latency for 1 page: 0 to 1 second
Network transit: 5.24 MB/minute (typical page size 5.9 KB)

Page throughput is down slightly compared to minimal compression, and page size decreases only marginally. I suspect I'm getting all I can out of this box (and/or my two 'client' computers are hosed), but the results are conclusive: mod_perl makes Bugzilla giddy-up, and mod_deflate with the compression level set to 1 gives the most bang for the buck on the network.
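
If these numbers hold up on the production cluster, the deflate side of the configuration is equally small -- something along these lines, with the filtered content types being illustrative rather than our exact list:

LoadModule deflate_module modules/mod_deflate.so

# Compress text responses on the fly; level 1 gave the best
# throughput/size trade-off in the tests above.
AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript
DeflateCompressionLevel 1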

Expect to see noticeable performance gains for Bugzilla sometime in May.

Thursday, April 24, 2008

Your CPU is bleeding edge when...


... this is your boot screen. I'm setting up the cool AMD servers that were donated at EclipseCON. Strangely, one of them reports "AMD Engineering Sample" as the CPU brand, unlike its properly branded "Quad-Core AMD Opteron" brothers.

I am fully confident that these Engineering Samples are up to the task. Sort of.

Monday, April 21, 2008

MyFoundation Portal Gets a Facelift

Committers and members are familiar with the MyFoundation Portal, which has become a key face of the Foundation over the last year and a half. When you logged in, the components on the screen looked like this:

Gabe and I hacked on the Portal on Friday, and Gabe did some more hacking this morning: those square corners have been given a more Web 2.0 look, with rounded corners and slightly larger borders. Here are the results! We hope you enjoy the new look!

Thursday, April 17, 2008

Humour me: Nick Boldt strikes again

You need to go read the description of this bug. Nick's LOL script made me LMFAO!!11!

Thursday, April 10, 2008

Tuning busy Linux boxes

Anyone with a bit of skill can put together a fast Linux box for serving files and databases. Hook up thousands of users accessing 500G of data with hundreds of SQL queries/second and you have a challenge.

You don't need fancy tools to find bottlenecks, as they typically occur in one of four areas: CPU, RAM, disk and network.

Here's a sample of 'top' taken from our MySQL and NFS master. This busy box is a quad-CPU IBM POWER5 machine with 16G of RAM and two RAID-5 arrays (one 8-disk 15K RPM and one 7-disk 10K RPM) on separate RAID controllers:

(Screenshot: top output from the MySQL/NFS master)

Although this box is currently very fast and responsive, it's good to identify the first bottleneck before it becomes a problem. Let's investigate:

1. A load average of 8.18 is not necessarily bad, considering the box has eight processor units (through the magic of SMT), but a high load average doesn't necessarily mean your CPUs are slow -- on Linux it also counts processes stuck waiting on I/O.

2. An average of about 62% of CPU time is spent waiting for I/O. This is not good, and it's our biggest bottleneck. Unlike idle time, a high 'wa' value means the CPU has nothing to do but wait, because the running processes need their I/O to complete before they can continue.

3. 16G of RAM with 16G used is not necessarily a sign that the box is short on memory. A busy Linux box *should* use all its RAM for disk cache and buffers. On this box, 3000252 KB (roughly 3G) of RAM is caching files from disk.

4. The MySQL process is taking up a whopping 6.1G of RAM. It may be a bit too aggressive for this box, as some of that RAM could go towards disk cache.

5. Of the 8 NFS daemons, only one is runnable (R); the others are blocked in uninterruptible I/O wait (D). These are the source of our high I/O wait time from item 2 (the one-liners after this list are a quick way to spot such processes). MySQL and the LDAP server are happily sleeping (S), waiting for work.
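
For anyone following along on their own box, a couple of generic one-liners give the same picture without the top screenshot:

# List processes currently stuck in uninterruptible sleep (state D).
ps -eo state,pid,comm | awk '$1 ~ /^D/'

# Watch the run queue (r), blocked tasks (b) and I/O wait (wa) over time.
vmstat 2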

One could conclude that this box needs faster disks (or more of them), but what it really needs is RAM, and lots of it. It's spending too much time going to disk for files that could be cached, and at 3G the disk cache is nowhere near enough for 500G of actively used data. An alternative to more RAM would be to free the 6G occupied by MySQL by moving it to another box.

An extra 16G of RAM (32G total) would go a long way in decreasing disk I/O. Having a total of 56G of RAM would be ideal -- 50G of disk cache (10% of the active data size) and 6G for the monstrous MySQL process.

Tuesday, April 08, 2008

April Fools fun with committers

I'm only a week late in blogging about this, but I pulled a joke on my beloved committers last week when I sent out an oh-so-formal-looking email stating that I was:

- drastically reducing disk quotas (in light of p2's greatness)
- moving the server infra from Linux to Windows (hahaha)
- erasing all CVS projects at the end of the week, forcing all to migrate to SVN

A few 'fell' for the joke, although it's understandable -- I don't have a habit of joking on that list -- but at the end of the day it got a bunch of committers talking to each other, and a few thanked me for making them laugh. It's all in good fun.

Now excuse me as I must go press ENTER on that rm -rf /cvsroot/* command.

Tuesday, April 01, 2008

COBOL IDE project most popular Eclipse project

Long-time popular projects WTP and CDT didn't stand a chance against the COBOL project's stellar gain in popularity in March. Registering over 6 million downloads for March alone, the COBOL project has become the #1 most popular project at eclipse.org. "The Canadian government still uses COBOL," said Eclipse Evangelist Wayne Beaton. He then just stood there, silently.