Database Optimization
for Web Developers
About me
•
16 years experience in web development and administration
(systems, databases, networks)
•
Various jobs:
• Web Developer • Chief Architect • Manager of IT
• Director of Technical Operations • Database Consultant
•
Mostly worked with MySQL
• Developer • DBA
• Administrator
What slows down your web page?
•
Content load time (CSS / JS / Images)
•
Poorly optimized code
Assumptions
•
Average web developer proficiency with SQL databases
•
SQL
•
JOINs
•
Indexes
•
No specific database server expected
MariaDB [sakila]> EXPLAIN SELECT COUNT(*) FROM payment WHERE
payment_date >= '2005-‐05-‐01 00:00:00' AND payment_date < '2005-‐06-‐01 00:00:00';
+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | id | table | type | possible_keys | key | rows | Extra | +-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | 1 | payment | ALL | NULL | NULL | 16451 | Using where | +-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+
Not enough indexes!
MariaDB [sakila]> EXPLAIN SELECT COUNT(*) FROM payment WHERE
payment_date >= '2005-‐05-‐01 00:00:00' AND payment_date < '2005-‐06-‐01 00:00:00';
+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+ | id | table | type | possible_keys | key | rows | +-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+ | 1 | payment | range | payment_date | payment_date | 1155 | +-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+
Not enough indexes!
When are indexes used?
•
Left-most prefix of index
•
If multiple columns, can use from left to right until column is not specified
exactly
•
Index cannot continue to be used after a range
Too many indexes?
MariaDB [sakila]> SHOW KEYS FROM film;+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | Table | Key_name | Seq_in_index | Column_name | +-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | film | PRIMARY | 1 | film_id | | film | idx_title | 1 | title | | film | idx_fk_language_id | 1 | language_id | | film | idx_fk_original_language_id | 1 | original_language_id |
| film | idx_title_year | 1 | title | | film | idx_title_year | 2 | release_year |
Avoid full table scans
SELECT f.FID, f.title, c.category_id, c.nameFROM film_list f JOIN category c ON c.name=f.category WHERE f.price=4.99
+-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | table | possible_keys | key | rows | Extra | +-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+ | c | NULL | NULL | 16 | | | f | NULL | NULL | 973 | Using where; Using join buffer | +-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐+-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐-‐+
Don’t put a query inside a loop
•
Get all the data in one query, and then loop through the results
•
Use a (well-indexed) JOIN to eliminate the need for additional queries
•
If you’re using an ORM, be careful not to put code in a loop that will result in
queries being run; generally, ORMs have bulk-loading methods and ways to
specify JOINs
foreach ($film_ids as $film_id) {
$result = $db-‐>query("SELECT * FROM film WHERE film_id=$film_id");
$row = $result-‐>fetch_array();
echo "<li><span>".$row[1]."</span></li>\n";
Other kinds of indexes
•
B-tree index
•
Hash index
•
Full-text index
•
MyISAM only until MySQL 5.6, MariaDB 10.0.3
•
Often better to use external full-text indexes, like Sphinx, Lucene, Solr,
ElasticSearch
•
Spatial index
•
“Minimum bounding rectangle” before MySQL 5.6 or MariaDB 5.5
•
BYO index
•
Hash column for long keys
•
Grid index
If you can’t index it, cache it!
•
Do the expensive stuff less often
•
Run reports for executives / marketing once a day
•
Create a read-only cache table for searching
•
Don’t propagate inactive rows to the active cache; in other
words, limit the number of rows to be searched
•
Rotate cache tables to keep the active cache table read-only
Eliminate redundant queries
•
Ensure that commonly-accessed data is only
retrieved once
•
Store relevant user information in their session
(with expiration)
•
Use memcached (or an equivalent fast data
store)
•
If you use an ORM, it may have ways to
MyISAM’s “table lock” problem
Read Queue
SELECT (0.1s)
SELECT (3.5s)
SELECT
(121s)
SELECT (0.0s)
SELECT (1.2s)
SELECT (5.1s)
Write Queue
UPDATE
(0.0s)
Multiversion Concurrency Control
(Time T1)
1: a
2: b
3: c
4: d
5: e
6: f
7: g
8: h
9: i
10: j
(Time T2)
1: a
2: b
3: c
4: d
5: h
6: f
7: g
8: h
9: i
10: j
Query 1 at time T1:
SELECT * FROM tbl
Query 2 at time T2:
UPDATE tbl SET v=‘h’ WHERE k=5
Even if Query 1 gets to row 5
after
time T2, the
database will return ‘e’ for row 5. The data will be
consistent from when the query began.*
Batch writes
•
Put the data into a queue
•
Have a process that reads the queue and does a bulk
insert into the database
•
Redis has a built-in queue data type (LIST)
NoSQL interfaces
•
MySQL
•
HandlerSocket
•
memcached
•
PostgreSQL
•
unknown
Replication
•
This is by no means a cure-all
•
One should not use replication slaves to solve
concurrency issues without understanding the
ramifications
•
It’s possible that replication can
increase
the impact of
Tame your ORM
•
Most ORMs have a debug console where you can see
every query that is performed
•
For MySQL, you can use the Percona Toolkit’s
Conclusion: Be smart!
•
Know what your database is doing
•