MySQL: Cloud vs Bare Metal, Performance and Reliability
Los Angeles MySQL Meetup
Let’s meet each other
• Performance geek
– All kinds MySQL and some Sphinx
• Working for Blackbird • Twitter @vfedorkov
Credits and thank yous
• LAMySQL Meetup
– Joe Devon
– Caroline Rose
• Blackbird, AWS advances tech partner
– Laine Campbell
– http://blackbird.io
How does it start
• Startup deciding to go with MySQL • Getting successful
• That server gets slower and slower • Now what to do?
Usual ways to grow
• Buy bigger server (scale up) • Buy another server (scale out) • Tune up your application
Questions to answer
• Which server to buy?
– Where?
• How long does it take to get new server? • How long does it take to make it work? • Why it didn’t help?
Which server to get?
• How much would infrastructure will cost me? • Which performance will I get?
• How easy would be to add more servers? • How much would it cost me?
Physical servers
• Predictable performance • Raw infrastructure
• Takes long to install • Old-style operations
Amazon EC2
• Virtually provisioned boxes
– Managed infrastructure
• Boxes • Network
• Attached disks
– Web/API bound operations
• Strong scalability suport
– Full OS access
• Flexible performance
Amazon RDS
• Fully managed MySQL instances (SaaS)
– Oracle, MSSQL, RedShift, etc. – No OS access
• Same on-demand scalability as EC2
– Scripted scaling up/down
Resources
Bare metal EC2 RDS
CPU Physical Virtual, limited
Virtual, limited
RAM Physical Shared Shared
IO Predictable Network-based, limited
Network-based, limited
Day-to-day MySQL operations
• Hardware faults • Software bugs
• Reliable backup and restore • Monitoring
• Upgrading and replication setup • Networking and Security
Physical servers vs EC2 vs RDS
Bare metal EC2 RDS
Provisioning By you Ready-to-go boxes DB instances Monitoring Your own OS-level by amazon All by amazon
Backups Your own Or no backups Snapshot-based Manual on MySQL level Automated, no delayed replicas Replication Manual Manual Auto magic Networking All yours Provided by AWS Provided by AWS
Hardware faults
Amazon cloud details
• Several regions (US, Europe, Asia)
• Several availability zones inside one region • Different instance types
– General
– Memory optimized – Compute-optimized – GPU-instances
Storage
• EBS – EC2 / RDS • POIPS – EC2 / RDS • SSD-based backend • Ephemeral– EC2 only, attached SSD
RDS details
• Parameter groups instead of my.cnf • One click Web interface
– API available
• Limited visibility and operations • Cloud-specific behavior
– Upgrade as a Surprise
• Can be disables in control panel
Know your application
• Configure MySQL properly • Tune your queries
• Monitor your system • Predict your traffic
EC2 and RDS specific
• Hardware might be different
– Even each benchmark run
• IO is highly relies on network • CPU & IO limiters are in place
First step
• MySQL configuration
– 5% settings makes 95% performance
• Which every query could kill
• What to tune at the first place
– innodb_buffer_pool_size
• Set on ¾ memory in RDS by default
– innodb_file_per_table – key_buffer for MyISAM
Second step
• Network
– skip_name_resolv
• May affect authentication
– max_connect_errors • FLUSH HOSTS – max_allowed_packet • IO – innodb_flush_log_at_trx_commit – innodb_log_file_size
Changes to avoid (usually)
• max_join_size • sort_buffer_size • tx_isolation
• Remember about 5% settings and 95% performance!
Backups
• What would you do if your MySQL has gone?
– Replicas-Replicas-Replicas
• … and binary logs
– For point-in-time recovery – Better to another host
– expire_logs_days – max_binlog_size
• RDS snapshots
• Volume snapshots on EC2
Consider worst case
• Limit InnoDB IO capacity on EBS
– Affecting replica creation time!
• Tune flush_log_at_trx_commit • Keep tmp_table_size big enough
– Temporary tables are BAD! – There are drawbacks!
Queries
• Most likely the cause of MySQL slowness
– Affect application speed
• Fastest MySQL query is the query that did not MySQL at all
– Cached
– Sent to another storage (you name it) – Just didn’t run
Bad queries: full table scans
SELECT * FROM table …
• WHERE DAY(FROM_UNIXTIME(`ts`)) = 205 • WHERE enabled != 1
• WHERE id NOT IN (1,2,3,…,10) • WHERE url LIKE ‘%something%’
– LIKE ‘something%’ is still okay • ORDER BY RAND()
Poor index usage
• SELECT * / COUNT(*)
FROM users WHERE enabled=1
– Hello half of the table
• How to fix?
– Pre-aggregation – Caches!
• Not in MySQL query cache!
Temporary tables
• Bad
– Very bad if they hit the disk
• Will hit if there are TEXT or BLOB fields
• When?
– GROUP BY
– Sub SELECTS (better in 5.6) – DISTINCT + ORDER BY
What else to do?
• Scale
• Run heavy queries on “reporting” replica • Rewrite bad queries where possible
When load is really high
• Query cache (off) • Thread cache
• Table cache
• Slow query log • wait_timeout
• connection pooling
Thank you!
http://blackbird.io,