Retaining globally distributed high availability Art van Scheppingen Head of Database Engineering

(1)

Retaining globally distributed high availability

Art van Scheppingen

(2)

1.  Who is Spil Games?

2.  Theory

3.  Spil Storage Platform

4.  Questions?

(3)

Who are we?

Who is Spil Games?

(4)

•  Company founded in 2001

•  350+ employees worldwide

•  180M+ unique visitors per month

•  45 portals in 19 languages

•  Casual games

•  Social games

•  Real-time multiplayer games

•  Mobile games

•  35+ MySQL clusters

•  60k queries per second (3.5 billion qpd)

(5)

Geographic Reach

180 Million Monthly Active Users(*)

•  Over 45 localized portals in 19 languages

•  Multi-platform: web, mobile, tablet

•  Focus on casual and social games

(6)

Girls, Teens and Family

spielen.com

juegos.com

gamesgames.com

games.co.uk

Brands

(7)

Foundations

The exciting theory

(8)

•  What exactly does it mean?

(9)

Wikipedia:

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

Oracle:

•  Availability of resources in a computer system
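A "prearranged level of operational performance" is usually expressed as an availability percentage. As a rough worked example (the targets below are illustrative and not taken from the talk), this sketch converts a percentage into the downtime it allows per year:

```python
# Downtime allowed per year for a given availability target.
# The targets used below are illustrative, not figures from the talk.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the redundancy options on the following slides matter.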

(10)

•  Master with (many) slave(s)

How do we reach HA with MySQL?

Master

(11)

•  Multi Master

How do we reach HA with MySQL?

Master

Slave

Master

(12)

•  Multi Master

•  Clustering

How do we reach HA with MySQL?

Mysqld Mysqld ndbd ndbd ndbd ndbd ndbd mgmt

(13)

•  Multi Master

•  Clustering

•  Geographical redundancy

How do we reach HA with MySQL?

Master local DC Slave local DC

(14)

•  Scale up

•  Vertical

•  Faster CPU/Memory/disks

•  Expensive

•  Costs multiply at the same rate as # of nodes

•  Scale out

•  Horizontal

•  More (small) machines

•  Inexpensive

•  Partitioning/federating (sharding)

(15)

•  Functional

•  Shard your database functionally

•  Reads

•  Add more slaves (keep them coming!)

•  Writes

•  More disks

•  Horizontal partitioning

•  Federated partitions
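Scaling reads by adding slaves implies that something decides which host receives each statement. The following is a minimal sketch of such a router, assuming hypothetical host names and a deliberately naive "starts with SELECT" test; it is not how the talk's platform does it:

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._readers = itertools.cycle(slaves or [master])

    def route(self, sql: str) -> str:
        # Naive classification: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("master-1", ["slave-1", "slave-2"])
```

Adding a slave only extends the read rotation; every write still lands on the single master, which is why the slide treats writes separately.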

(16)

•  Breaking up tables in small parts on the same host

•  Partitioned on a column

•  Infinite growth (as long as you add disk space)

•  Less used data to slower (cheaper) disks

•  No stored procedures, functions, etc

•  Uneven usage of partitions (hash partition may help)

•  Once written, data remains on the partition

Horizontal partitioning

(17)

•  Breaking up your table in parts on multiple hosts

•  Partitioned on a column

•  Infinite growth (as long as you add hosts)

•  Less used data on slower hosts

•  Not supported in (standard) MySQL

•  Partitioning on application level (or proxy)

•  Alternatively: NDB

•  Uneven usage of partitions

•  Once written, data (mostly) remains on the partition

•  Parallel queries to retrieve data from all shards

Federated partitions (sharding)
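Partitioning at the application level, with parallel queries across all shards, can be sketched as follows. The in-memory dictionaries stand in for databases on separate hosts, and the modulo rule is only one possible partitioning function; both are assumptions made for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for databases on separate hosts.
SHARDS = [dict() for _ in range(4)]


def shard_for(user_id: int) -> int:
    """Pick a shard from the partition column (here: a simple modulo)."""
    return user_id % len(SHARDS)


def put(user_id: int, row: dict) -> None:
    SHARDS[shard_for(user_id)][user_id] = row


def get(user_id: int):
    # A lookup on the partition column touches exactly one shard.
    return SHARDS[shard_for(user_id)].get(user_id)


def find_all(predicate) -> list:
    """Scatter the query to every shard in parallel, then gather the results."""
    def scan(shard):
        return [row for row in shard.values() if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        parts = list(pool.map(scan, SHARDS))
    return [row for part in parts for row in part]


names = ["Roy", "Moss", "Jen", "Douglas", "Denholm"]
for uid, name in enumerate(names, start=1234):
    put(uid, {"gid": uid, "name": name})
```

A query that does not filter on the partition column, such as `find_all(lambda r: r["name"].startswith("D"))`, has to visit every shard.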

(18)

•  Parallel execution of sequential jobs

•  Limited by the weakest link

•  As fast as the slowest node

•  Fix: nonsequential (asynchronous) execution
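The "as fast as the slowest node" point can be illustrated with a small fan-out. In this sketch the node names and latencies are invented, and `asyncio.sleep` stands in for a network round trip. All requests start together, so the caller waits roughly as long as the slowest node rather than the sum of all latencies:

```python
import asyncio
import time


async def query_node(name: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)  # stand-in for a network round trip
    return name


async def fan_out(latencies: dict) -> tuple:
    """Query all nodes concurrently and report how long the caller waited."""
    start = time.perf_counter()
    names = await asyncio.gather(
        *(query_node(node, delay) for node, delay in latencies.items())
    )
    return list(names), time.perf_counter() - start


# Invented latencies: node "c" is ten times slower than node "a".
names, elapsed = asyncio.run(fan_out({"a": 0.01, "b": 0.02, "c": 0.10}))
print(f"waited {elapsed:.3f}s for {names}")
```

Work that does not need every answer before continuing can avoid the wait on the slow node, which is the motivation for asynchronous execution.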

(19)

Typical LAMP stack

Client Webserver PHP MySQL Memcache Webserver PHP Loadbalancer

(20)

A-typical LAMP stack

Client Webserver PHP MySQL Memcache Webserver PHP Loadbalancer MQ Jobs

(21)

Spil Storage Platform

Abstracting the storage layer

(22)

•  Dependent on one storage platform

•  No more platform-specific query language

•  Differentiate writes

•  Optimistic (asynchronous)

•  Pessimistic (synchronous)

•  Shard data better

•  Partition on user and function

•  Cluster information by users, not by function

•  Global expansion

•  Partition on geographic location

•  Solve uneven usage of data storage

•  Move data from shard to shard

•  Anything may/could/will fail eventually

•  Not designed for the "happy" flow
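The difference between optimistic (asynchronous) and pessimistic (synchronous) writes can be sketched with a queue and a background worker. The `Store` class and its method names are hypothetical, and a real implementation would also need error handling and durability for queued writes:

```python
import queue
import threading


class Store:
    """Toy key-value store offering both write styles (hypothetical API)."""

    def __init__(self):
        self.data = {}
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Background worker: applies queued writes one at a time.
        while True:
            key, value = self._pending.get()
            self.data[key] = value
            self._pending.task_done()

    def write_pessimistic(self, key, value):
        """Synchronous: returns only after the value has been stored."""
        self.data[key] = value

    def write_optimistic(self, key, value):
        """Asynchronous: enqueue the write and return immediately."""
        self._pending.put((key, value))

    def flush(self):
        """Block until every queued write has been applied."""
        self._pending.join()
```

The optimistic path answers the caller before the data is stored, so it suits writes where a short delay or a rare loss is acceptable.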

(23)

(24)

(25)

New architecture overview

Server API Application Model Storage platform Client-side API Presentation layer

(26)

•  Everything written in Erlang

•  Piqi as protocol

•  binary

•  JSON

•  XML

•  SSP utilizes local caching (memcache)

•  Flexible (persistent) storage layer

•  MySQL (various flavors)

•  Membase/Couchbase

•  Could be any other storage product

•  MQs (DWH updates)

(27)

•  Predictable

•  Reliable

•  Decent performance

•  Easy to comprehend

•  Excellent ecosystem

•  Libraries

•  Monitoring tools

•  Knowledge

(28)

•  Functional language

•  High availability: designed for telecom solutions

•  Excels at concurrency, distribution, fault tolerance

•  Do more with less!

•  Other companies using Erlang:

(29)

•  What is the bucket model?

•  Each record has one unique owner attribute (GID)

•  GID (Global IDentifier) identifying different types

•  Bucket(s) per functionality

•  Bucket is structured data

•  Attributes contain data of records

•  Attributes do not have to correspond to schema

(30)

$ curl -X POST -H 'Accept: application/json' \
    -H 'Content-Type: application/json' \
    --data-binary "{\"gid\": 288511851128422401}" \
    http://127.0.0.1:8777/demobucket/get

{ "records": [ { "gid": 288511851128422401, "given_name": "g", "registered_on": 1,
                 "email": "mail1", "gender": "m",
                 "birthdate": { "year": 1963, "month": 6, "day": 21 } } ],
  "meta_info": { "total_ct": 1 } }

(31)

CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  given_name varchar(64) not null,
  registered_on tinyint(3) unsigned default 0,
  email varchar(255) not null,
  gender enum('m', 'f', 'u') not null default 'm',
  birthdate date not null,
  PRIMARY KEY(gid)
);

(32)

CREATE TABLE demobucket (
  gid bigint(20) unsigned not null,
  user_name varchar(64) not null,
  user_register timestamp on update CURRENT_TIMESTAMP(),
  user_emailaddress varchar(255) not null,
  user_gender char(1) not null default 'm',
  user_dob varchar(10) not null,
  PRIMARY KEY(gid)
);

(33)

CREATE COLUMNFAMILY demobucket (
  gid int PRIMARY KEY,
  given_name varchar,
  registered_on timestamp,
  email varchar,
  gender varchar,
  birth_date varchar
);
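The three layouts above store the same bucket under different column names, which is what "attributes do not have to correspond to schema" suggests. A minimal sketch of the translation step, using the column names of the second table layout, might look like this. The mapping functions are hypothetical, and value conversion (for example, the structured birthdate to a varchar) is left out:

```python
# Bucket attribute -> column name in the second table layout.
ATTR_TO_COLUMN = {
    "gid": "gid",
    "given_name": "user_name",
    "registered_on": "user_register",
    "email": "user_emailaddress",
    "gender": "user_gender",
    "birthdate": "user_dob",
}
COLUMN_TO_ATTR = {column: attr for attr, column in ATTR_TO_COLUMN.items()}


def to_row(record: dict) -> dict:
    """Rename bucket attributes to backend column names (values untouched)."""
    return {ATTR_TO_COLUMN[attr]: value for attr, value in record.items()}


def to_record(row: dict) -> dict:
    """Rename backend columns back to bucket attributes."""
    return {COLUMN_TO_ATTR[column]: value for column, value in row.items()}
```

Callers only ever see bucket attributes such as `given_name`; swapping the storage backend means swapping the mapping, not the callers.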

(34)

demobucket:get(
  #demobucket_get_input{
    gid = 12345,
    filters = [
      #filter{ attr = <<"gender">>, op = <<"=">>, parms = {string, <<"f">>} },
      #filter{ attr = <<"registered_on">>, op = <<"sort">>, parms = asc },
      #filter{ attr = <<"gid">>, op = <<"limit">>, parms = {int, 10} }
    ]
  }
)
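To make the filter list more concrete, here is a sketch of how such filters could be evaluated against a list of records. The semantics are an assumption inferred from the operator names ("=" selects, "sort" orders, "limit" truncates), and plain tuples stand in for the Erlang `#filter` records:

```python
def apply_filters(records, filters):
    """Apply (attr, op, parm) filters: select first, then sort, then limit."""
    rows = list(records)
    for attr, op, parm in filters:
        if op == "=":
            rows = [r for r in rows if r.get(attr) == parm]
    for attr, op, parm in filters:
        if op == "sort":
            rows.sort(key=lambda r: r[attr], reverse=(parm == "desc"))
    for attr, op, parm in filters:
        if op == "limit":
            rows = rows[:parm]
    return rows


# Invented sample data, for illustration only.
records = [
    {"gid": 1, "gender": "f", "registered_on": 3},
    {"gid": 2, "gender": "m", "registered_on": 1},
    {"gid": 3, "gender": "f", "registered_on": 2},
]
filters = [("gender", "=", "f"), ("registered_on", "sort", "asc"), ("gid", "limit", 10)]
result = apply_filters(records, filters)
```

Because the filters are data rather than SQL text, the same request can be translated for any of the storage backends.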

(35)

(36)

•  Nearest datacenter (DC) to the end user

•  Satellite DC

•  Processing and caching

•  Do not own/store data

•  Storage DC

•  Processing, caching and persistent storage

•  Store all same user data in same DC

•  Partition on user globally

•  Global IDentifier per user

(37)

•  Contains GIDs and their master DC

•  GIDs master DC predefined

•  Migrated GIDs get updated

(38)

•  Globally sharded on GID

•  (local) GID Lookup

How does this work?

GID lookup

Shard 1 Shard 2

Persistent storage
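The two lookup levels can be sketched as two small tables: a global directory that maps a GID to its master DC, and a local lookup inside a storage DC that maps a GID to a shard. Class names, DC names, and the placement rule (new GIDs go to the least-filled shard) are assumptions for illustration, not details from the talk:

```python
class GidDirectory:
    """Global directory: GID -> master DC."""

    def __init__(self, predefined_dc):
        self._predefined_dc = predefined_dc  # function: gid -> DC name
        self._migrated = {}                  # overrides for migrated GIDs

    def master_dc(self, gid: int) -> str:
        return self._migrated.get(gid, self._predefined_dc(gid))

    def migrate(self, gid: int, new_dc: str) -> None:
        self._migrated[gid] = new_dc


class LocalGidLookup:
    """Lookup inside one storage DC: GID -> shard."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._placement = {}

    def shard_for(self, gid: int) -> str:
        if gid not in self._placement:
            # Assumption: a new GID goes to the shard holding the fewest GIDs.
            counts = {shard: 0 for shard in self._shards}
            for shard in self._placement.values():
                counts[shard] += 1
            self._placement[gid] = min(self._shards, key=lambda s: counts[s])
        return self._placement[gid]

    def move(self, gid: int, shard: str) -> None:
        self._placement[gid] = shard


# Hypothetical predefined rule and shard names.
directory = GidDirectory(lambda gid: "dc-a" if gid % 2 == 0 else "dc-b")
lookup = LocalGidLookup(["shard1", "shard2"])
```

Keeping placement in a table rather than deriving it from the GID is what allows a record to be moved to another shard or DC later.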

(39)

(40)

•  Spread data evenly across shards

•  Migration of buckets between shards

•  GID migration between DCs

•  Creating a new storage DC needs data migration

•  Users will automatically be migrated after visiting another DC many times
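Automatic migration after repeated visits to another DC could be driven by a simple counter. The talk does not quantify "many times", so the threshold below is an arbitrary assumption, as are the class and DC names:

```python
from collections import Counter


class MigrationTracker:
    """Count visits to non-master DCs and signal when a GID should move."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold       # arbitrary; not given in the talk
        self.foreign_visits = Counter()  # (gid, dc) -> visits to that DC

    def record_visit(self, gid: int, visited_dc: str, master_dc: str) -> str:
        """Return the DC that should own the GID after this visit."""
        if visited_dc == master_dc:
            return master_dc
        key = (gid, visited_dc)
        self.foreign_visits[key] += 1
        if self.foreign_visits[key] >= self.threshold:
            del self.foreign_visits[key]
            return visited_dc  # enough visits: migrate the GID here
        return master_dc
```

When the returned DC differs from the current master DC, the caller would move the user's data and update the GID lookup.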

(41)

•  Versioning on bucket definitions

•  GIDs are assigned to a bucket version

•  Data in old bucket versions remains (read only)

•  New data only gets written to new bucket version

•  Updates migrate data to new bucket version

•  Migrations can be triggered

(42)

Seamless schema upgrades

Demobucket v1 (GID, name): 1234 Roy, 1235 Moss, 1236 Jen, 1237 Douglas, 1238 Denholm, 1239 Richmond

Demobucket v2 (GID, name, gender): empty

Demobucket v2: 1241 Patricia f

Demobucket v2: 1241 Patricia f, 1235 Moss m

Demobucket v1: 1234 Roy, 1236 Jen, 1237 Douglas, 1238 Denholm, 1239 Richmond

Demobucket v1: 1234 Roy, 1237 Douglas, 1238 Denholm, 1239 Richmond

Demobucket v2: 1241 Patricia f, 1235 Moss m, 1236 Jen f
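The migrate-on-update behaviour shown on this slide can be sketched with two dictionaries standing in for the two bucket versions. The `read` and `write` helpers are hypothetical. Following the slide's diagram, a record leaves v1 once an update has moved it to v2:

```python
# Two bucket versions, seeded with part of the slide's example data.
v1 = {1234: {"name": "Roy"}, 1235: {"name": "Moss"}, 1236: {"name": "Jen"}}
v2 = {}


def read(gid: int):
    """The newest version wins; otherwise fall back to the old version."""
    if gid in v2:
        return v2[gid]
    return v1.get(gid)


def write(gid: int, **attrs) -> None:
    """All writes land in v2; updating a v1 record moves it to v2."""
    record = v2.get(gid) or v1.pop(gid, {})
    v2[gid] = {**record, **attrs}


write(1241, name="Patricia", gender="f")  # new data goes straight to v2
write(1235, gender="m")                   # an update migrates Moss to v2
```

Records that are never updated stay readable in v1, so no bulk migration is required to roll out a new bucket definition.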

(43)

•  Every cluster (two masters) will contain two shards

•  Data written interleaved

•  HA for both shards

•  No warmup needed

•  Both masters ac=ve and “warmed up”

•  Slaves added (other DC) for HA and backup

Multi Master writes

SSP Shard 1 Shard 2
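One way to read "two masters, two shards, both masters active" is that each master is the preferred writer for one shard and takes over the other shard if its peer fails. That reading is an assumption, and the sketch below (with hypothetical names) only illustrates it:

```python
class MasterPair:
    """Two masters; each is the preferred writer for one shard (assumed)."""

    def __init__(self, master_a: str, master_b: str):
        self.preferred = {"shard1": master_a, "shard2": master_b}
        self.healthy = {master_a: True, master_b: True}

    def writer_for(self, shard: str) -> str:
        preferred = self.preferred[shard]
        if self.healthy[preferred]:
            return preferred
        # Preferred master is down: fall back to any healthy master.
        for master, ok in self.healthy.items():
            if ok:
                return master
        raise RuntimeError("no healthy master available")


pair = MasterPair("master-a", "master-b")
```

Because both masters already serve traffic, the surviving master has warm caches when it takes over the second shard.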

(44)

•  SPAPI is in place

•  SSP is (mostly) running in shadow mode

•  GID buckets running in production

•  Activity feed system first to production

•  Satellite DC in early 2013!

(45)

(46)

(47)

•  Presentation can be found at:

http://spil.com/perconalondon2012

•  If you wish to contact me: [email protected]

•  Don’t forget to rate my talk!