Big Data Router for Real-Time Analytics


Battlefield 3 Player Statistics

•  EA collected 50 TB/day in 2013.

•  Available player stats sites:

    •  http://battlelog.battlefield.com

    •  http://bf3stats.com

•  Features per-gun/vehicle/class leaderboards etc.

•  Geo-leaderboards were introduced when Battlefield 4 was released in November 2013.


Harvested Player Data from bf3stats.com

•  Roughly 2 million player records

•  Each player record has 1076 fields

•  Effectively a spreadsheet with 2 billion cells

Details:

•  Each player record has a country field.

•  Each player record has fields for all assault rifles: AK-74, M416, M16, AEK-971, F2000, FAMAS, AUG-A3, KH-2002, …


Question

For each country and assault rifle: what percentage of players have that assault rifle as their favorite assault rifle?

bf3stats (MongoDB): > 1 hour

BioCAM RAW: 37 milliseconds

[Bar chart: Favorite Assault Rifle query time in Log10(milliseconds), axis from 0.00 to 7.00: bf3stats (MongoDB) 6.56, BioCAM RAW 1.57]
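To make the question concrete, here is a minimal sketch of the computation in plain Python. It is not how BioCAM or bf3stats computes the answer. The record layout, the sample numbers and the definition of "favorite" (the rifle with the most kills) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical player records: country plus kills per assault rifle.
# Field names and numbers are invented for illustration only.
players = [
    {"country": "Sweden", "kills": {"AK-74": 120, "M416": 300, "M16": 90}},
    {"country": "Sweden", "kills": {"AK-74": 10, "M416": 5, "M16": 400}},
    {"country": "Sweden", "kills": {"AK-74": 0, "M416": 80, "M16": 20}},
    {"country": "France", "kills": {"AK-74": 50, "M416": 40, "M16": 45}},
]


def favorite_rifle(player):
    """Assumed definition: the rifle with the most kills."""
    return max(player["kills"], key=player["kills"].get)


def favorite_share(players):
    """Return {country: {rifle: percent of that country's players}}."""
    counts = defaultdict(Counter)
    for p in players:
        counts[p["country"]][favorite_rifle(p)] += 1
    return {
        country: {rifle: 100.0 * n / sum(c.values()) for rifle, n in c.items()}
        for country, c in counts.items()
    }


shares = favorite_share(players)
```

In this sketch two of the three Swedish players favor the M416, so its share for Sweden is two thirds of 100 percent.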


Extract from the Analysis

country_name        AK-74    M416     M16      AEK-971  F2000    FAMAS    AUG A3   KH 2002  AN-94    G3A3     SCAR-L   L85A2
Sweden              12.31%   20.98%   27.32%   19.13%   7.43%    3.65%    2.26%    1.87%    1.20%    2.11%    0.39%    1.34%
United States       11.19%   23.68%   25.80%   16.53%   8.05%    4.26%    2.63%    1.71%    1.45%    2.26%    0.62%    1.83%
Russian Federation  22.95%   12.96%   22.35%   26.44%   6.09%    1.85%    1.85%    1.76%    1.57%    1.18%    0.35%    0.66%
France              11.72%   17.02%   33.34%   14.88%   8.79%    6.71%    2.15%    1.79%    0.90%    1.34%    0.35%    1.01%
United Kingdom      13.34%   21.40%   26.52%   16.34%   7.68%    4.03%    2.45%    1.65%    1.05%    1.72%    0.43%    3.40%


Conclusion

•  Sufficient reporting speed to handle high-velocity data flows

•  Fast enough to perform analysis in real time, on the fly

           

                             


BioCAM Web Service

•  Core BioCAM Analytics Engine

•  Duda Web Services Framework (http://duda.io)

•  Monkey Web Server (http://monkey-project.com)

•  HTTP(S)/JSON Web Service Interface

•  Create multiple BioCAM instances with different schemes

•  Arbitrarily deep breakdowns for various kinds of analysis

•  Each breakdown serves multiple aggregates

•  Drill-downs natively supported from the Web Service API

[Diagram: components Duda, BioCAM, Monkey, HTTP/JSON]


RTDS (Real-Time Data Storage)

•  NoSQL graph database to persistently store generic interconnected objects in an application

•  Linked directly into the application to store its state

•  Designed for telecom requirements

•  24/7 always low latency (no maintenance windows!), 1+1 mirroring, fast switchover and failover, upgrades at runtime

•  Side effect: low overhead and energy efficient

[Diagram: components Duda, BioCAM, Monkey, HTTP/JSON, RTDS]


Real-Time Data Storage (RTDS)

•  Persistent NoSQL graph database

•  Stores generic interconnected objects in an application

•  Linked directly into the application to store its state

•  Low overhead

•  Energy efficient


Real-Time Data Storage cont.

•  Designed for telecom requirements

•  24/7 always low latency

•  No maintenance windows

•  1+1 mirroring

•  Fast switchover and failover

•  Upgrades at runtime


RTDS – Internal Workings

•  Data is stored as a transaction log

•  Proven method; provides atomic transactions, audit history and correctly ordered updates in the hot standby instance

•  Robust in crash scenarios (corruption at the end of the log only)

•  Self-rotating transaction log

•  No checkpointing (as it introduces latency and peaks in CPU/RAM resources)

•  Background traversal of all objects writes the latest state to the log; when the traversal is complete, the log is rotated

•  ~1% of CPU, no latency peaks, no resource peaks; only the last two logs are required for restoring the complete state
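The rotation scheme can be illustrated with a toy model. This is only a sketch of one reading of the bullets above, not the RTDS implementation or API: the class and function names are hypothetical, the logs are in-memory lists, and the background traversal runs in a single step instead of incrementally.

```python
class ToyStore:
    """Toy model of a self-rotating transaction log (not the RTDS API)."""

    def __init__(self):
        self.objects = {}   # live state: object id -> value
        self.logs = [[]]    # retained logs; the last one is the current log

    def commit(self, obj_id, value):
        """Apply an update and append it to the current log."""
        self.objects[obj_id] = value
        self.logs[-1].append((obj_id, value))

    def traverse_and_rotate(self):
        """Write the latest state of every object to the log, then rotate.

        A real system would do this incrementally in the background;
        here it happens in one step for simplicity.
        """
        for obj_id, value in self.objects.items():
            self.logs[-1].append((obj_id, value))
        self.logs.append([])        # start a fresh current log
        self.logs = self.logs[-2:]  # older logs are no longer needed


def restore(logs):
    """Rebuild the state by replaying the retained logs in order."""
    state = {}
    for log in logs:
        for obj_id, value in log:
            state[obj_id] = value
    return state


store = ToyStore()
store.commit("a", 1)
store.commit("b", 2)
store.traverse_and_rotate()  # log 0 now ends with a full snapshot
store.commit("a", 3)         # lands in log 1
store.traverse_and_rotate()  # log 1 ends with a full snapshot; log 0 is dropped
store.commit("c", 4)         # lands in log 2

recovered = restore(store.logs)
```

Because every completed log contains the latest state of all objects, replaying the last completed log followed by the current one is enough to rebuild the state in this model, which is consistent with the claim that only the last two logs are required.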


Real-Time Data Storage cont.

•  Default operation: asynchronous without locks

•  Lock-free algorithms to get and commit transaction buffers

•  Background threads for log flushing and mirroring

•  Avoids latency and priority inversions

•  Locks are engaged in overload situations

•  Overhead: one RAM copy per object (used for background traversal, verifying state consistency etc.)


Three companies, one binary!

[Diagram: components RTDS, Duda, BioCAM, Monkey]

Monkey Software Company

Oricane AB


BioCAM – Internal Representation

•  Records consist of value fields and class fields

•  Value fields are typically numbers (price, quantity, temperature etc.)

•  Three types of class fields:

    •  Explicit: color, brand, country etc.

    •  Implicit: timestamp falling within an hour, week, month etc.

    •  Synthetic: favorite assault rifle

•  Class field values are mapped to unsigned integers

•  Master key built by packing class fields into a large unsigned integer

[Diagram: master key layout: Class field 1 | Class field 2 | Class field 3]
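The master key idea can be sketched in a few lines of Python. The field names, the bit widths and the value-to-integer encoding below are assumptions chosen for illustration; the slides do not specify BioCAM's actual layout.

```python
# Hypothetical class fields with assumed bit widths: (name, bits).
FIELDS = [("brand", 2), ("color", 2), ("country", 3)]

# Assumed mapping of class field values to unsigned integers.
CODES = {
    "brand": {"Audi": 0, "Ford": 1, "Volvo": 2},
    "color": {"White": 0, "Red": 1, "Blue": 2},
    "country": {"Denmark": 0, "Finland": 1, "Norway": 2, "Sweden": 3},
}


def pack_master_key(record):
    """Pack the encoded class fields into one unsigned integer."""
    key = 0
    for name, bits in FIELDS:
        code = CODES[name][record[name]]
        if code >= (1 << bits):
            raise ValueError(f"{name} code {code} does not fit in {bits} bits")
        key = (key << bits) | code
    return key


def unpack_master_key(key):
    """Recover the per-field integer codes from a packed key."""
    codes = {}
    for name, bits in reversed(FIELDS):
        codes[name] = key & ((1 << bits) - 1)
        key >>= bits
    return codes


key = pack_master_key({"brand": "Volvo", "color": "Red", "country": "Sweden"})
```

With this layout the record (Volvo, Red, Sweden) packs to the bit pattern 10 01 011, and `unpack_master_key` returns the three integer codes again.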


Breakdown

•  Multi-branch tree structure

•  Each level corresponds to a unique class field

•  Not all class fields need to be present

•  Branches correspond to class field values

•  The sequence of branches (field values) traversed from root to leaf is called a path


Breakdown Construction

•  For each record a handle is created

•  Each handle contains a reference to the record and a slave key

•  The slave key is an integer representation of the path, where field values from higher levels are stored in more significant bits

•  The array of handles is sorted by increasing slave key

•  The implicit tree structure is built bottom-up from the sorted array
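The construction steps above can be sketched as follows. This is an illustrative reading, not BioCAM's code: the level names, the bit widths and the representation of tree nodes as (prefix, start, end) ranges into the sorted handle array are assumptions.

```python
from itertools import groupby

# Breakdown levels from root to leaf with assumed bit widths: (name, bits).
LEVELS = [("brand", 2), ("color", 2), ("country", 3)]


def slave_key(codes):
    """Pack the path; fields from higher levels land in more significant bits."""
    key = 0
    for name, bits in LEVELS:
        key = (key << bits) | codes[name]
    return key


def build_levels(records):
    """Return (handles, levels).

    handles is the array of (slave key, record index) sorted by slave key.
    levels[i] lists (prefix, start, end) ranges into handles for tree level i,
    with levels[-1] holding the leaves.
    """
    handles = sorted((slave_key(r), i) for i, r in enumerate(records))
    # Leaves: runs of handles sharing the full slave key.
    nodes, pos = [], 0
    for key, run in groupby(handles, key=lambda h: h[0]):
        n = len(list(run))
        nodes.append((key, pos, pos + n))
        pos += n
    levels = [nodes]
    # Walk upwards: merge adjacent children that share the parent prefix.
    for _, bits in reversed(LEVELS[1:]):
        parents = []
        for prefix, group in groupby(nodes, key=lambda nd: nd[0] >> bits):
            group = list(group)
            parents.append((prefix, group[0][1], group[-1][2]))
        nodes = parents
        levels.append(nodes)
    levels.reverse()
    return handles, levels


# Records with class fields already mapped to unsigned integers (hypothetical).
records = [
    {"brand": 2, "color": 1, "country": 3},
    {"brand": 0, "color": 0, "country": 1},
    {"brand": 0, "color": 0, "country": 1},
    {"brand": 0, "color": 2, "country": 0},
]
handles, levels = build_levels(records)
```

Since the handles are sorted, all records below any node occupy one contiguous range of the array, so the tree never has to be stored with explicit child pointers in this sketch.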


Aggregates

•  Zero or more aggregates are associated with each breakdown

•  Aggregate values are associated with breakdown nodes and leaves

•  Aggregate functions are associated with breakdown levels

•  Leaf aggregate values are computed from value fields in the records using the leaf aggregate function

•  Node aggregate values are computed from the children's aggregate values using the node aggregate function

•  Typically only one value field in the records is considered
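A minimal sketch of leaf and node aggregation over a small breakdown follows. The nested-dictionary tree and the sales figures are hypothetical, and for brevity one node function is used for all levels, whereas the slides allow a function per level.

```python
def aggregate(node, leaf_fn, node_fn, path=(), out=None):
    """Compute aggregate values for every node and leaf.

    node is either a list of value-field numbers (a leaf) or a dict
    mapping a class field value (branch) to a child node.
    Returns {path: aggregate value}; the root has the empty path ().
    """
    if out is None:
        out = {}
    if isinstance(node, list):
        # Leaf: aggregate directly over the value fields of its records.
        out[path] = leaf_fn(node)
    else:
        # Inner node: aggregate over the children's aggregate values.
        for branch, child in node.items():
            aggregate(child, leaf_fn, node_fn, path + (branch,), out)
        out[path] = node_fn([out[path + (branch,)] for branch in node])
    return out


# Hypothetical sales figures in a Brand -> Color breakdown.
tree = {
    "Audi": {"White": [10, 20], "Red": [5]},
    "Volvo": {"Blue": [7, 8, 9]},
}
totals = aggregate(tree, leaf_fn=sum, node_fn=sum)
```

Here `totals[("Audi",)]` is the sum of the two Audi leaves and `totals[()]` is the grand total. Swapping in `max` for both functions would yield the largest single sale below each node instead.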


Example

Country: Sweden (S), Finland (F), Denmark (D), Norway (N)

Brand: Audi (A), Ford (F), Volvo (V)

Color: White (W), Red (R), Blue (B)

Breakdown: Brand, Color, Country

Aggregate: Sales


Example

[Diagram: breakdown tree with levels Brand (A, F, V), Color (W, R, B under each brand) and Country (D, F, N, S under each color); labeled example path: Audi, White, Finland]


Traditional Analytics in Retail

1.  E-receipts sent to Data Warehouse

2.  Analysis of new and historical data

3.  Infrequent reports (once per week etc.)

Data not relevant to "what's happening now" is involved in the analysis


Real-time On-the-fly Analytics in Retail

1.  E-receipts sent to Data Warehouse

2.  E-receipts intercepted/sent in real time to BioCAM WS

3.  Analysis performed on the fly

4.  Reporting in real time

Real-time monitoring, analysis and reporting with minimum stress on the data warehouse

[Diagram: data flow for steps 1 to 4, with the BioCAM Web Service handling steps 2 to 4]


Whatever Mart, Inc.

The Multi Tera Dollar Retail Corporation

•  1,500 stores distributed across the globe, open 10:00-18:00

•  15,000 unique products when taking size, color etc. into account

•  Customers purchase an average of 30 random products in each open store every second

•  At peak rate, 2,300 customers purchase 45,000 products per second, thus surpassing 500,000 USD per second in net sales

•  E-receipts are reported immediately to BioCAM Web Service

•  Five different analyses are performed every ten seconds
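As a quick sanity check, the peak figures can be related to each other with simple arithmetic. The sketch assumes that the peak corresponds to all stores being open at the same time, which the slide does not state explicitly; the variable names are purely illustrative.

```python
stores = 1500
products_per_store_per_second = 30
peak_customers_per_second = 2300
peak_sales_usd_per_second = 500_000

# Assumption: at peak, every store is open simultaneously.
peak_products_per_second = stores * products_per_store_per_second

# Derived figures implied by the peak numbers on the slide.
products_per_customer = peak_products_per_second / peak_customers_per_second
min_avg_price_usd = peak_sales_usd_per_second / peak_products_per_second
products_between_analyses = peak_products_per_second * 10
```

Under that assumption the product rate matches the stated 45,000 products per second, each customer buys just under 20 products, and the average product price must exceed roughly 11 USD for sales to surpass 500,000 USD per second.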


Almost 1,000 billion transactions since launch


Benchmarks

Configurations:

•  Web Service – Access via Web Service front-end

•  Direct Access – Test program linked with BioCAM, access via C API

•  Stripped – Direct access to BioCAM stripped of RTDS

•  Four different database sizes (number of records)


Aggregate Value Re-calculation Time

•  2500-3000 record transactions per second

•  Re-calculation speed not dependent on transactions/second

•  Measured in milliseconds

Web Service   Direct Access   Stripped
35            31              29
167           153             133
804           711             650


Transaction Time

Web Service              Direct Access            Stripped
Load (x/s)   Time (us)   Load (x/s)   Time (us)   Load (x/s)   Time (us)
454          1201        407          183         483          144
1463         1824        1246         161         1464         125
2510         2684        2036         143         2275         118
2930         3064        2408         132         2772         109
4568         32150       3414         128         4107         100
5975         235471      4583         120         5742         91


Conclusion

•  Aggregate value re-calculation cost that is linear in database size is expected, since the optimized re-calculation scheme is not yet implemented

•  Transaction cost is completely dominated by the Web Service front-end, especially at higher load

•  It would be interesting to bypass the web server and run JSON over IP

•  Transaction cost for Direct Access and Stripped decreases with higher load, most likely due to reduced context switching and higher cache locality
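The front-end overhead can be quantified from the transaction time measurements above. The sketch below divides the Web Service time by the Direct Access time row by row. Note that the loads in a row differ somewhat between the two configurations, so the ratios are only a rough indication.

```python
# (load per second, time in microseconds) pairs from the transaction time table.
web_service = [(454, 1201), (1463, 1824), (2510, 2684),
               (2930, 3064), (4568, 32150), (5975, 235471)]
direct_access = [(407, 183), (1246, 161), (2036, 143),
                 (2408, 132), (3414, 128), (4583, 120)]

# Row-wise ratio of Web Service time to Direct Access time.
ratios = [ws_time / da_time
          for (_, ws_time), (_, da_time) in zip(web_service, direct_access)]

# Direct Access time per transaction falls as the load rises.
direct_times = [time for _, time in direct_access]
direct_time_decreasing = all(a > b for a, b in zip(direct_times, direct_times[1:]))
```

The ratio grows from about 6.6 at the lowest load to almost 2,000 at the highest, while the Direct Access time per transaction keeps falling, in line with the second and fourth conclusions.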


Key Application Area: Gaming

•  Counter-Strike: Global Offensive (CSGO) Real-Time Statistics Site to be launched

•  Currently 150,000 players online simultaneously

•  Player base grows exponentially

•  Partnership with World #1 CSGO team Ninjas in Pyjamas (www.nip.gl)


Key Application Area: Energy

•  Oricane is involved in Cloudberry Datacenters (http://www.cloudberry-datacenters.com)

•  Focus is on energy savings in data centers - discussions are slow…

•  Oricane wants to address:

    •  Energy production

    •  Energy trading

    •  Embedded applications

•  Looking for a fast-paced key partner with lots of data to process
