Retaining globally
distributed high
availability
Art van Scheppingen
1. Who is Spil Games? 2. Theory
3. Spil Storage Pla9orm
4. Ques=ons?
Who are we?
Who is Spil Games?
• Company founded in 2001
• 350+ employees world wide
• 180M+ unique visitors per month
• 45 portals in 19 languages • Casual games
• Social games
• Real =me mul=player games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)
Geographic Reach
180 Million Monthly Ac=ve Users(*)
• Over 45 localized portals in 19 languages
• Mul= pla9orm: web, mobile, tablet
• Focus on casual and social games
Girls, Teens and Family
spielen.com
juegos.com
gamesgames.com
games.co.uk
Brands
Foundations
The exci2ng theory
• What does it exactly mean?
Wikipedia:
High availability is a system design approach and
associated service implementa=on that ensures a
prearranged level of opera=onal performance will be met during a contractual measurement period.
Oracle:
• Availability of resources in a computer system
• Master with (many) slave(s)
How do we reach HA with MySQL?
Master
• Master with (many) slave(s)
• Mul= Master
How do we reach HA with MySQL?
Master
Slave
Master
• Master with (many) slave(s)
• Mul= Master
• Clustering
How do we reach HA with MySQL?
Mysqld Mysqld ndbd ndbd ndbd ndbd ndbd mgmt
• Master with (many) slave(s)
• Mul= Master
• Clustering
• Geographical redundancy
How do we reach HA with MySQL?
Master local DC Slave local
DC
• Scale up • Ver=cal
• Faster CPU/Memory/disks
• Expensive
• Costs mul=ply in same rate as # of nodes
• Scale out
• Horizontal
• More (small) machines
• Inexpensive
• Par==oning/federa=ng (sharding)
• Func=onal
• Shard your database func=onally
• Reads
• Add more slaves (keep them coming!)
• Writes
• More disks
• Horizontal par==oning
• Federated par==ons
• Breaking up tables in small parts on the same host
• Par==oned on a column
• Infinite growth (as long as you add diskspace)
• Less used data to slower (cheaper) disks
• No stored procedures, func=ons, etc
• Uneven usage of par==ons (hash par==on may help)
• Once wrihen, data remains on the par==on
Horizontal partitioning
• Breaking up your table in parts on mul=ple hosts
• Par==oned on a column
• Infinite growth (as long as you add hosts)
• Less used data on slower hosts
• Not supported in (standard) MySQL
• Par==oning on applica=on level (or proxy)
• Alterna=vely: NDB
• Uneven usage of par==ons
• Once wrihen data (mostly) remains on the par==on
• Parallel queries to retrieve data from all shards
Federated partitions (sharding)
• Parallel execu=on of sequen=al jobs
• Limited by the weakest link
• As fast as the slowest node
• Fix: nonsequen=al (asynchronous) execu=on
Typical LAMP stack
Client Webserver PHP MySQL Memcache Webserver PHP LoadbalancerA-typical LAMP stack
Client Webserver PHP MySQL Memcache Webserver PHP Loadbalancer MQ JobsSpil Storage Platform
Abstrac2ng the storage layer
• Dependent on one storage pla9orm
• No more pla9orm-‐specific query language
• Differen=ate writes
• Op=mis=c (asynchronous)
• Pessimis=c (synchronous)
• Shard data beher
• Par==on on user and func=on
• Cluster informa=on by users, not by func=on
• Global expansion
• Par==on on geographic loca=on
• Solve uneven usage of data storage
• Move data from shard to shard
• Anything may/could/will fail eventually
• Not designed for the “happy” flow
New architecture overview
Server API Application Model Storage platform Client-side API Presentation layer• Everything wrihen in Erlang
• Piqi as protocol • binary
• JSON
• XML
• SSP u=lizes local caching (memcache)
• Flexible (persistent) storage layer • MySQL (various flavors)
• Membase/Couchbase
• Could be any other storage product
• MQs (DWH updates)
• Predictable
• Reliable
• Decent performance
• Easy to comprehend
• Excellent eco system • Libraries
• Monitoring tools
• Knowledge
• Func=onal language
• High availability: designed for telecom solu=ons
• Excels at concurrency, distribu=on, fault tolerance
• Do more with less!
• Other companies using Erlang:
• What is the bucket model?
• Each record has one unique owner ahribute (GID)
• GID (Global IDen=fier) iden=fying different types
• Bucket(s) per func=onality
• Bucket is structured data
• Ahributes contain data of records
• Ahributes do not have to correspond to schema
$ curl -‐X POST -‐H 'Accept: applica=on/json' -‐H \
'Content-‐Type: applica=on/json' -‐-‐data-‐binary "{\"gid\": \
288511851128422401}" hhp://127.0.0.1:8777/demobucket/get { "records": [ { "gid": 288511851128422401, "given_name": "g", "registered_on": 1, "email": "mail1", "gender": "m",
"birthdate": { "year": 1963, "month": 6, "day": 21 } }
],
"meta_info": { "total_ct": 1 } }
CREATE TABLE demobucket (
gid bigint(20) unsigned not null, given_name varchar(64) not null,
registered_on =nyint(3) unsigned default 0, email varchar(255) not null,
gender enum(‘m’, ‘f’, ‘u’) not null default ‘m’, birthdate date not null,
PRIMARY KEY(gid) );
CREATE TABLE demobucket (
gid bigint(20) unsigned not null, user_name varchar(64) not null,
user_register =mestamp on update CURRENT_TIMESTAMP(),
user_emailaddress varchar(255) not null, user_gender char(1) not null default ‘m’, user_dob varchar(10) not null,
PRIMARY KEY(gid) );
CREATE COLUMNFAMILY demobucket ( gid int PRIMARY KEY,
given_name varchar, registered_on =mestamp, email varchar, gender varchar, birth_date varchar );
demobucket:get( #demobucket_get_input{ gid=12345, filters= [
#filter{ ahr= <<"gender">> , op= <<"=">> , parms= {string, <<"f">>}}, #filter{ ahr= <<"registered_on">>, op= <<"sort">>, parms=asc },
#filter{ ahr= <<"gid">>, op= <<"limit">>, parms={int, 10 }} ]} )
• Nearest datacenter (DC) to the end user
• Satellite DC
• Processing and caching
• Do not own/store data
• Storage DC
• Processing, caching and persistent storage
• Store all same user data in same DC
• Par==on on user globally
• Global IDen=fier per user
• Contains GIDs and their master DC
• GIDs master DC predefined
• Migrated GIDs get updated
• Globally sharded on GID
• (local) GID Lookup
How does this work?
GID lookup
Shard 1 Shard 2
Persistent storage
• Spread data even on shards
• Migra=on of buckets between shards
• GID migra=on between DCs
• Crea=ng a new storage DC needs data migra=on
• Users will automa=cally be migrated a‚er visi=ng
another DC many =mes
• Versioning on bucket defini=ons
• GIDs are assigned to a bucket version
• Data in old bucket versions remain (read only)
• New data only gets wrihen to new bucket version
• Updates migrate data to new bucket version
• Migrates can be triggered
Seamless schema upgrades
Demobucket v1 GID 1234 1235 1236 1237 1238 1239 name Roy Moss Jen Douglas Denholm Richmond Demobucket v2 GID name gender GID 1241 name Patricia gender f GID 1241 1235 name Patricia Moss gender f m GID 1234 1236 1237 1238 1239 name Roy Jen Douglas Denholm Richmond GID 1234 1237 1238 1239 name Roy Douglas Denholm Richmond GID 1241 1235 1236 name Patricia Moss Jen gender f m f• Every cluster (two masters) will contain two shards • Data wrihen interleaved
• HA for both shards
• No warmup needed
• Both masters ac=ve and “warmed up”
• Slaves added (other DC) for HA and backup
Multi Master writes
SSP Shard 1 Shard 2
• SPAPI is in place
• SSP is (mostly) running in shadow mode
• GID buckets running in produc=on
• Ac=vity feed system first to produc=on
• Satellite DC in early 2013!
• Presenta=on can be found at:
hhp://spil.com/perconalondon2012
• If you wish to contact me: [email protected]
• Don’t forget to rate my talk!