Hunting for Database Hotspots, Part 1
When running something like mk-query-digest or our home-grown querysniffer, we like to ask three questions to give us a list of potential database hotspots in our application.
- Which queries are run the most often?
- Which queries take the longest when run?
- Which queries take the most database time overall?
There are many results you’d expect to see common among many web applications when looking at the list including:
- sessions – figuring out if the given user is logged in, and loading up their session
- logins – processing a login and assigning a user into a session
- permissions – should a user be allowed to access a given page
- billing checks – figuring out if a customer has paid
For each of the queries we get on the list, the key is coming up with a plan of attack and prioritized list of improvement areas.
There are several potential attack areas for a given query:
Do nothing, it doesn’t matter – The query is run often, but doesn’t really generate much load. Queries that frequently run on tables that don’t change much can get cached by the MySQL query cache. Queries with really low average execution times are often unlikely to yield high-impact optimizations or are already cached by MySQL.
The query itself – Can it be improved? Can you reduce the number of joins, return less fields, do less sorting, avoid pagination, avoid generating temporary tables, make better use of existing indexes? MySQL’s EXPLAIN is your friend to help you see what you should improve about your queries.
The application – How can the application work with the database to improve performance? Can you eliminate the query entirely by not providing some data to the user? Can you show different but easier to calculate data? Can you reduce the frequency with which it’s accessed by moving more expensive data into drill-down pages rather than on dashboards?
Caching – This is actually related to your application, but deserves a category of its own. How often does the data change? Does the user need the most up-to-date version? Do you know when the data changes and you could invalidate the cache?
Writes – Can you batch them? Can you do them later?
In future posts, we’ll be delving into each of these areas to get more specific.