An application’s performance can often be improved in many ways, but query optimization should always be near the top of your toolkit. Databases are designed to optimize many cases on their own, and yet by asking them questions in different ways and changing the structure of your data, performance can vary wildly!

The query itself – Can it be improved? Can you reduce the number of joins, return fewer fields, do less sorting, avoid pagination, avoid generating temporary tables, make better use of existing indexes? MySQL’s EXPLAIN is your friend in seeing what you should improve about your queries.
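As a quick illustration of a couple of those levers (the table, columns, and index here are hypothetical):

```sql
-- Before: pulls every column and sorts on a column no index covers
SELECT * FROM messages WHERE owner = 42 ORDER BY subject;

-- After: returns only the fields the page actually renders and,
-- assuming a composite index on (owner, sent_at), sorts via the
-- index instead of a filesort
SELECT id, subject, sent_at
FROM messages
WHERE owner = 42
ORDER BY sent_at DESC
LIMIT 20;
```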

Looking at an individual query, a SELECT can first be run through MySQL’s EXPLAIN. The output might look something like this:

*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: pending
 type: ALL
 possible_keys: mailkey
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 1396998
 Extra: Using where; Using temporary; Using filesort

Look for your possible_keys and the key that MySQL plans to use for executing your query. Are there fields you could include in your where clause that might make it quicker to find the row (or rows) that you’re looking for?

In the query I’ve referenced, you’ll notice that no keys are being used (see the key column), and it’s using a temporary table and a filesort (both noted in the Extra column). No matter what anyone tells you, neither of these is intrinsically bad, but if you have too many queries using temporary tables and filesorts across your application, you’ll quickly need more disk I/O bandwidth to handle them, or they’ll become a hotspot.
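A composite index that serves both the WHERE clause and the ORDER BY is often enough to make both the temporary table and the filesort disappear. Sketching the idea against the pending table above (the column names are hypothetical, since we only see the EXPLAIN output):

```sql
-- Hypothetical shape of the slow query
SELECT id, recipient
FROM pending
WHERE status = 'queued'
ORDER BY created_at;

-- One index satisfies both the filter and the sort in a single pass
ALTER TABLE pending ADD KEY status_created (status, created_at);
```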

Are you asking the question in the right way? Can you add an index? Can you eliminate joins? Can you eliminate subqueries? Can you delete data to improve performance?

Let’s walk through an elementary example optimization from start to finish. We’ll start with a simple schema:

CREATE TABLE `test_opt` (
 `id` int(11) NOT NULL auto_increment,
 `owner` int(11) NOT NULL,
 `data` char(40) character set latin1 default NULL,
 PRIMARY KEY  (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This table has three columns, and you’ve been using it to store data for clients. Each row has an auto-increment id, an owner field acting as a foreign key, and the data payload in the data field.

We’ll generate some test data using this small Perl snippet (test_opt.pl):

#!/usr/bin/perl
use strict;
use warnings;

# Emit 10,000 tab-separated rows of "owner<TAB>payload N".
# owner cycles through 0-12, so each owner ends up with ~769 rows.
for my $i (1 .. 10_000) {
    print $i % 13, "\t", "payload $i\n";
}

And the following commands:

$ perl test_opt.pl > test_opt.tsv
$ mysqlimport -u username -p -c owner,data --local test_opt_db test_opt.tsv

For more details on mysqlimport, see the MySQL Manual.

The primary queries you’ve been running against this table are single-row SELECTs and UPDATEs using the PRIMARY KEY id. Naturally, these queries are quite fast.

mysql> explain select owner,data from test_opt where id=4393\G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: test_opt
 type: const
 possible_keys: PRIMARY
 key: PRIMARY
 key_len: 4
 ref: const
 rows: 1
 Extra:

The EXPLAIN for the query shows it’s using the PRIMARY KEY in the key column. Since we’re using InnoDB, the rows column is always an estimate; in this case, though, MySQL knows we’re looking up a single PRIMARY KEY value, so there will be exactly one row.

But even looking up all the rows for a given owner reveals a problem. For example:

mysql> explain select id, data from test_opt where owner=10\G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: test_opt
 type: ALL
 possible_keys: NULL
 key: NULL
 key_len: NULL
 ref: NULL
 rows: 10269
 Extra: Using where

You’ll see MySQL is now showing that possible_keys is NULL, and in the Extra field you’ll see Using where. Since no keys are being used, this is a euphemism for a table scan (hence the 10,269-row estimate)! If this table had millions of rows, MySQL would scan all of them to find the entries with a given owner.

In this case, the answer is easy: add an index!

mysql> alter table test_opt add key owner (owner);
Query OK, 10000 rows affected (0.42 sec)
Records: 10000  Duplicates: 0  Warnings: 0

You’ll see the EXPLAIN output improve, too:

mysql> explain select id, data from test_opt where owner=10\G
*************************** 1. row ***************************
 id: 1
 select_type: SIMPLE
 table: test_opt
 type: ref
 possible_keys: owner
 key: owner
 key_len: 4
 ref: const
 rows: 768
 Extra:

Problem solved!
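If this query were hot enough, one possible next step (a sketch, with the usual index-size trade-off) is a covering index. InnoDB secondary indexes implicitly carry the primary key, so an index on (owner, data) contains every column this query touches, and EXPLAIN would report Using index in the Extra column:

```sql
-- Covers owner, data, and (implicitly) id for this query
ALTER TABLE test_opt ADD KEY owner_data (owner, data);

EXPLAIN SELECT id, data FROM test_opt WHERE owner=10\G
```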

In the real world, problems won’t always be this simple, but the key is to dig in and see what your options are. Some challenges you might run into in the real world:

  1. There’s already an index, but the queries are still slow
  2. The table already has a bunch of indexes
  3. The table is too large to add an index without extended downtime
  4. The data isn’t clean enough for an index to be helpful (particularly when a UNIQUE index is appropriate)

All of these can be addressed and overcome with creativity and patience.
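For challenge 3 in particular, note that newer MySQL versions (5.6 and later) can build a secondary index online; on older servers, tools like Percona’s pt-online-schema-change take a copy-and-swap approach instead. A sketch of the online form:

```sql
-- Ask for an in-place, non-blocking build, and fail fast if the
-- server can't honor it rather than silently locking the table
ALTER TABLE test_opt
  ADD KEY owner (owner),
  ALGORITHM=INPLACE, LOCK=NONE;
```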

The keys to optimizing a schema boil down to a few questions, both when designing it and when evolving it:

  1. What data do I need to store?
  2. What questions/queries will be asked of the data — will it need to be joined, have ownership information, accessed for administrative purposes?
  3. How will the table scale as data grows?
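Applied to our example, answering those questions up front would have meant shipping the table with its access pattern already indexed. One possible shape:

```sql
CREATE TABLE `test_opt` (
  `id` int(11) NOT NULL auto_increment,
  `owner` int(11) NOT NULL,
  `data` char(40) character set latin1 default NULL,
  PRIMARY KEY (`id`),
  KEY `owner` (`owner`)  -- we know we'll be querying by owner
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```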

I’m not going to try to cover every trick for getting more out of your database here, as that’s a whole topic in itself. I will, however, link to several resources where you can read more about Schema Design, Query Optimization, and Performance Tuning of MySQL; feel free to add your own resources in the comments:

  1. Jay Pipes
  2. MySQL Performance Blog

Making your site faster and more scalable by tuning queries and schema is part art and part science, so be persistent and use testing to verify that your changes are providing the value you expected!