Friday, May 08, 2009

Optimizing SQL Part 2: the size of your queries

The size and complexity of a SQL query can greatly influence its execution time. Since MySQL 5.0, subselects are now supported, but I often see SQL novices overuse subselects with disastrous results.

For instance, consider these queries I use in Babel.  For accurate results, I ran all the queries once, then cleared the query cache, then ran them again.
select string_id from strings where file_id = 35;
[snip]
83 rows in set (0.00 sec)

select translation_id from translations where string_id in 
(select string_id from strings where file_id = 35) and language_id = 2;
[snip]
126 rows in set (8.18 sec)

If I replace the above subselect with the exact list of string_id values from the first query, look at the results:
select translation_id from translations where string_id in 
(1072,1073,[snip: 80 others],197676) and language_id = 2;
[snip]
126 rows in set (4.94 sec)

In this case, without any further optimization, it's worth issuing two simple queries rather than a single subselect.

Of course, in() cannot compete with an inner join for this type of query:
select translation_id from translations 
inner join strings on strings.string_id = translations.string_id
where translations.language_id = 2 and strings.file_id = 35;
[snip]
126 rows in set (0.02 sec)

Subselects are not always your friend. Using table aliases can then cut down on the amount of bytes you're sending to your server, reducing the load on your network stack, and making your query easier to read:
select translation_id from translations t 
inner join strings s on s.string_id = t.string_id
where t.language_id = 2 and s.file_id = 35;

Next tip: maximizing query cache

1 Comments:

Anonymous Ken said...

Oracle (and probably others) optimizes a non-correlated subquery into an inner join. So some of the blame lay on MySQL, not "SQL novices."

2:23 AM  

Post a Comment

<< Home