
Telemetry: The Customer Experience


(1)

Copyright © 2014 Splunk Inc.

Simon Warrington

Senior Program Manager, Microsoft

Telemetry: The Customer Experience

(2)

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to

(3)

Agenda

•  Telemetry Defined
•  The Splunk Journey
•  Architecture

(4)

Microsoft Xbox

•  Microsoft
–  $77.8B FY 2013
–  99K+ employees
•  Xbox One
–  All-in-one entertainment console
–  5 MM+ sold
•  Xbox Entertainment Studio
–  Video-based applications for sports, live events and original narrative content

(5)

About Me

•  Senior Program Manager
•  Microsoft employee for 2 years
•  IBM Enterprise Architect for 13 years

(6)

Xbox Entertainment Studio (XES) Charter

•  Showcase Xbox capabilities and break new ground
•  Provide content and experiences only found on Xbox One
•  Influence sales and console usage

(7)
(8)

What is Telemetry?

•  Telemetry is the highly automated communications process by which measurements are made and other data collected at remote or inaccessible points and transmitted to receiving equipment for

(9)

Understanding the Customer Experience

•  Gain “last mile” insights in real-time
•  Correlate errors or performance characteristics across Xbox and cloud ecosystems
•  Gain visibility into occurrence and source of outages

(10)

Telemetry: The “So What?” Test

Thick Client

Thin Client

Monitor Server Logs

What are users doing out there???

(11)
(12)
(13)

The Splunk Journey

[Timeline figure. Axis points: T < 2012, 2012, 2013, 2014. Labels: Splunk Storm, Console]

(14)

Previous BI Solution: Key Challenges

•  Homegrown, brittle BI solution

•  Schema-driven, very rigid

•  Difficult to accommodate changes

•  Needed more stability and reliability

•  Difficult to ingest data

•  Drain on engineering resources

Disjointed picture of the customer experience

(15)

The Splunk Promise

•  Critical information available in real-time
•  Powerful time-series analytics
•  Access to granular data
•  Robust alerting

Telemetry logs

“Just fire the hose at Splunk and it has the intelligence to …understand what those key

(16)
(17)

2012: Legacy System

[Architecture diagram of the legacy system. Labels that survive extraction include: Live Telemetry Capture Layer, Capture Service, Azure “VHD” blobs, Azure Connect, Proxy, Staging, SQL Azure, Cloud Service, Cloud Database, Vertica Replication, SSIS, Asp.net REST, Data Mart, On-Site SQL DB0 through DB6, BI Subscriber (temp), Report Server bp-bisql05, BI Team query, Splunk Storm Forwarder Svc, VM-01 through VM-04, ABC Omniture Svc, DEF Dashboard Svc, DEF Purchasing Svc, Purchasing Data, Refund Tool, REST API, Reports, Alerts, and App0 through App8]

(18)

2013 Architectural Context

Splunk Storm + Apache Storm + Ducksboard + Hadoop

•  Splunk Storm
–  Limited dashboard access
–  Limited real-time queries
•  Splunk was unproven: we needed some redundancy
•  Other systems allowed Splunk to focus on troubleshooting

[Architecture diagram. Labeled components: Azure Cloud Services, Service Bus, Apache Storm, Partner Feeds, Splunk Storm, Hadoop, Azure Storage]

(19)

High Level Architecture

Splunk Enterprise in Microsoft Cloud – Azure

[Architecture diagram. Labeled components: Service Bus, Ops Team]

(20)

Cluster Topology

Splunk Enterprise in Microsoft Cloud – Azure

[Topology diagram. Labels: Azure Cloud Services, Region 1, Region 2, Region 3, Ops Team, Services Team]

(21)

Splunk Specs

•  Data sources
–  XBOX 360 Telemetry logs
–  XBOX One Telemetry logs
–  Smartglass Telemetry logs
–  Win 8 / Win Phone 8
–  Cloud Services
•  Indexing
–  Average: 75 GB/day
–  Peak: 250 GB/day
•  Stakeholders
–  Engineering, BI, IT Operations

(22)

2014 Architectural Context

Splunk Enterprise + Hadoop

•  Splunk Enterprise
–  Operations all up
•  Splunk capabilities proven
–  Simultaneously supports multiple teams & data perspectives
–  Access to data near real-time
–  Ever growing Xbox One, Xbox 360, Win 8, Win Phone 8, Cloud Services & Smartglass
•  Infrastructure dramatically simplified!

[Architecture diagram. Labeled components: Azure Cloud Services, Service Bus, Partner Feeds, Splunk 6.0, Hadoop]

(23)

Impact of Ops on User Experience

(24)

Xbox Telemetry: Splunk-eye-view

2014-09-06 01:34:14.9-0000 / “XXXXXXXXXXXXXXXX”/ nfl_x1 / video_heartbeat /
message_id=a4095c7a5e204ffa869e776a7312d924, appsource=nfl_x1, clientip=XX.XX.XX, log_time=2014-09-06 01:34:15.2-0000, event_type=video_heartbeat, video_clip_type=clip, video_name="Underrated week 1 matchups", device_type=DurangoApp,
video_session_id=498C8DEE-6F02-4F8C-B55A-BBB0172D8BEF, event_name=video_heartbeat, session_id=C0F903B2-AC5B-4CC9-996D-4A00247DF6B3, content_version=1.8.1.44986,
video_buffer_seconds=59, video_length=182.624, video_progress=162.8571785, video_bitrate=583000, app_channel=nflnow, video_avg_receive_rate=6045924,
video_buffer_progress_percent=100, video_min_bitrate=100000, video_receive_rate=3834704, tx_sequence=5454, video_total_dropped_frames=0, primary_video_id=2c42dcd2-c98f-4e2c-9995-e7a1f34dec64, video_playback_speed=1, mode=ViewFullScreenLandscape,
video_player_state=Playing, video_start_bitrate=600000, video_channel=nflnow, heartbeat_type=secondary, video_render_fps=59.9460334777832, authentication=n, video_max_bitrate=600001, date_time=2014-09-06T01:34:17, video_dropped_fps=0,
video_cc_enabled=n, video_id=dd49c4bb-b47c-4582-aea6-cc32a6c5e698, video_stream_url="http://fvodhstream-vh.akamaihd.net/i/films/2014/nfl_com/fantasy/reg/01/140905_fantasy_live_wk1_underrated_matchups_\,180k\,320k\,500k\,700k\,1200k\,2000k\,3200k\,5000k\,.mp4.csmil/master.m3u8", primary_video_category=clip, sub_session_id=0,
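The event above is a flat list of key=value pairs, some of them double-quoted. As a rough illustration of how such a line could be turned into named fields outside of Splunk, here is a minimal Python sketch. The function name `parse_event` and the regular expression are assumptions made for this example, not part of the presentation or of any Splunk API; the shortened sample string reuses field names and values from the slide.

```python
import re

# A value is either double-quoted (backslash escapes such as "\," are
# allowed inside the quotes) or runs up to the next comma.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_event(raw: str) -> dict:
    """Parse the key=value portion of a telemetry event into a dict of strings."""
    fields = {}
    for key, value in PAIR_RE.findall(raw):
        value = value.strip()
        # Drop the surrounding quotes from quoted values.
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


# Shortened sample built from fields shown on the slide.
sample = (
    'event_type=video_heartbeat, video_name="Underrated week 1 matchups", '
    'log_time=2014-09-06 01:34:15.2-0000, video_buffer_seconds=59, '
    'video_bitrate=583000, video_player_state=Playing'
)
event = parse_event(sample)
print(event["video_name"], event["video_bitrate"])
```

All values come back as strings; converting numeric fields such as `video_bitrate` is left to the caller, since the slide does not state the intended types.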

(25)

Xbox Telemetry: Ops-eye-view

[Dashboard screenshot. Panels: Error Percentages over CCU; External dependency resolution; Number of users (In Application, Watching Video)]

(26)

Same data, different perspectives

Different perspectives for different stakeholders

(27)

Real-time Operational Insights

•  Population Perspective
–  Concurrent users/sessions within an application/video event
–  % of errors by type and by population
   –  Trigger threshold alerts
•  Individual Perspective
–  User session-level troubleshooting
•  System Perspectives
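The population perspective can be made concrete with a small calculation: count the distinct sessions, work out what share of them reported each error type, and flag the types that cross a threshold. The Python sketch below is illustrative only. `session_id` appears in the sample telemetry event, but `error_type`, the sample events, and the 30% threshold are hypothetical; the presentation does not say how its alerts were defined.

```python
from collections import Counter


def error_percentages(events):
    """Return {error_type: percent of distinct sessions that reported it}."""
    sessions = {e["session_id"] for e in events}
    if not sessions:
        return {}
    # Count each session at most once per error type.
    seen = {
        (e["session_id"], e["error_type"])
        for e in events
        if e.get("error_type")
    }
    per_type = Counter(error_type for _, error_type in seen)
    return {t: 100.0 * n / len(sessions) for t, n in per_type.items()}


def alerts(percentages, threshold_pct):
    """Return the error types whose share of sessions exceeds the threshold."""
    return sorted(t for t, pct in percentages.items() if pct > threshold_pct)


# Hypothetical events: four sessions, two error types.
events = [
    {"session_id": "s1", "event_type": "video_heartbeat"},
    {"session_id": "s2", "event_type": "error", "error_type": "buffer_timeout"},
    {"session_id": "s3", "event_type": "error", "error_type": "buffer_timeout"},
    {"session_id": "s3", "event_type": "error", "error_type": "buffer_timeout"},
    {"session_id": "s4", "event_type": "error", "error_type": "dependency_failure"},
]
pcts = error_percentages(events)
print(pcts)
print(alerts(pcts, threshold_pct=30.0))
```

Counting sessions rather than raw events keeps one noisy client from dominating the percentage, which is one plausible reading of "by population".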

(28)

Real-time Operational Verification

•  Release Management Process
–  Can deploy new applications and watch adoption rates on multiple platforms in real time
–  Can update any existing application
   –  Validate telemetry
   –  Verify no user drop off
   –  Quickly assess before and after behavior
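One simple way to "verify no user drop off" is to compare average concurrent-user counts from a window before the update with a window after it. The Python sketch below shows that comparison under stated assumptions: the function name `drop_off`, the sample counts, and the 10% tolerance are all hypothetical and are not taken from the presentation.

```python
from statistics import mean


def drop_off(before_counts, after_counts, max_drop_pct=10.0):
    """Compare mean concurrent users before and after a release.

    Returns (change_pct, flagged). flagged is True when usage fell by
    more than max_drop_pct relative to the pre-release mean.
    """
    before = mean(before_counts)
    after = mean(after_counts)
    if before == 0:
        raise ValueError("pre-release mean is zero; percentage change is undefined")
    change_pct = 100.0 * (after - before) / before
    return change_pct, change_pct < -max_drop_pct


# Hypothetical per-minute concurrent-user samples around an update.
change, flagged = drop_off([1000, 1040, 980, 1020], [990, 1010, 1005, 995])
print(round(change, 2), flagged)
```

A fixed tolerance is the crudest possible check; it ignores time-of-day patterns, so a real comparison would likely need a like-for-like baseline.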

(29)

Fast Troubleshooting – Examples

Partner
–  Problem: Xbox title not working
–  Symptoms: Spike in external dependency failure events
–  Issue: ISP outage in NY

Customer
–  Problem: Poor video quality for UFC PPV
–  Symptoms: Video buffering time-out exceptions across user sessions
–  Issue: Simultaneous Netflix

Internal
–  Problem: Configuration defects
–  Symptoms: Concurrent Video Viewer trend line drops off

(30)

Results with Splunk

Better Resource Allocation
–  Maintenance: from 16 hours per DAY to 8 hours per month
–  BI team no longer responsible for troubleshooting
–  Engineers focused on building applications

Proactive Alerting and Troubleshooting
–  Issue notification: from 15-20 minutes to seconds
–  Eliminated false positive alerts
–  Faster to determine source of issues: partner or internal

Operational Insights
–  Creating dashboards: from 1 week to hours
–  Capacity forecasting
–  App usage and trends

(31)

Key Takeaways

•  Cloud-based Splunk implementation
–  Cloud Disk I/O
–  Universal Forwarder vs other integration options
–  Load Balancing across the index cluster
–  Load Balancing Search heads for different stakeholders
–  Other Limitations

(32)
