Thursday, May 07, 2009

Optimizing SQL: Part 1 - minimizing connections and query counts

www.eclipse.org is a pretty busy site, and for the most part, it's well optimized and quite fast.  I'm not talking about Bugzilla or CVS here, only www.eclipse.org.

I'm a big fan of profiling and optimizing. Perhaps this is because I learned to program on a Timex Sinclair 1000 with 2KB of RAM, where Every Byte Counts!

In this multi-part series, I'll walk through some of the very simple optimizations I've done at Eclipse to make the site as fast as it can be with the hardware we have.  The first installment:

Minimizing connections and query counts


You're likely very familiar with the Pick a Mirror page, the most accessed page on our site. That page needs database information to render, so I tend to keep it lightweight and optimized.

Since I recently added Bit Torrent as a download option, I wanted the Pick a Mirror page to show you the Torrent link if the selected file had one.  I created a simple table called torrents_index which contained the name of the torrent file, so a simple pattern match between your file and the table would suffice.

But that involves querying the database. Our web servers and database servers are separate machines, so each query and response hits the LAN (generally).

Since the Pick a Mirror PHP script already queries the database for basic information about the file, I simply tapped into an existing query by adding a reference to TOR.torrentname (added content in red):
$sql = "SELECT DRP.drop_id, VAR.key_value, TOR.torrentname
FROM drops AS DRP
LEFT JOIN SYS_variables AS VAR ON VAR.key_name = 'download_table'
LEFT JOIN torrents_index AS TOR ON (TOR.torrentname LIKE '%${filename_fileonly}.torrent')
WHERE LEFT('$_file', LENGTH(DRP.our_path)) = DRP.our_path
AND LOCATE(DRP.file_pattern, '$_file') > 0
ORDER BY LENGTH(DRP.our_path) DESC LIMIT 1";
$rs = mysql_query($sql, $dbh);

In the above query, you'll notice that I'm fetching data from three different tables, DRP, VAR and TOR by using a JOIN, but the tables are not actually related in any way.

This simple technique of reusing a query is very efficient. It avoids another call to mysql_query(), and saves an entire round-trip to the database server.  Overhead to the existing query is minimal to nil, as observed by timing its execution and showing the query plan with EXPLAIN.

Next up: Optimizing the size of your queries

3 Comments:

Anonymous Mik Kersten said...

Looks like we share some computer history. I got that Timex Sinclair 1000 for my 9th birthday and was hooked. Brings back lots of memories. My first steps learning English were to translate the BASIC manual into Polish.

We are routinely amazed by the effects of your optimizations Denis. It's amazing how fast Bugzilla runs right now, at least when using Mylyn, and presumably in the Web UI as well. I'm actually able to open and submit bugs more quickly than I can browse and send messages via Outloook/IMAP.

6:21 AM  
Anonymous Denis Roy said...

Mik, that is too funny. I got the TS1000 when I was 8 years old. We eventually bought the 16K RAM pack, but like most, we lost our fair share of work due to a RAM pack crash.

I still have all my TS1000 gear, and I 'booted' it up a couple of years ago to see if it still works, and it does -- 25 years later.

9:55 AM  
Anonymous Denis Roy said...

We've been using APC for the Wiki/Live/Eclipse Plugin Central for a while and I agree, it's great. www is not CPU-bound at all, so we haven't really investigated it. Being geo-based, the mirrors list does change quite often, just not for any particular user :)

11:09 AM  

Post a Comment

<< Home