Copyright © 2014 Splunk Inc.
Simon Warrington
Senior Program Manager,
Microsoft
Telemetry: The
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and
estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as
of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change
at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to
Agenda
• Telemetry Defined
• The Splunk Journey
• Architecture
Microsoft Xbox
• Microsoft
– $77.8B FY 2013
– 99K+ employees
• Xbox One
– All-in-one entertainment console
– 5 MM+ sold
• Xbox Entertainment Studio
– Video-based applications for sports, live events and original narrative content
About Me
• Senior Program Manager
• Microsoft employee for 2 years
• IBM Enterprise Architect for 13 years
Xbox Entertainment Studio (XES) Charter
• Showcase Xbox capabilities and break new ground
• Provide content and experiences only found on Xbox One
• Influence sales and console usage
What is Telemetry?
• Telemetry is the highly automated communications process by
which measurements are made and other data collected at remote
or inaccessible points and transmitted to receiving equipment for
Understanding the Customer Experience
• Gain “last mile” insights in real-time
• Correlate errors or performance characteristics across Xbox and cloud ecosystems
• Gain visibility into occurrence and source of outages
Telemetry: The “So What?” Test
Thick Client
Thin Client
Monitor Server Logs
What are users doing
out there???
The Splunk Journey
2012
2013
2014
T < 2012
Splunk
Storm
Console
Previous BI Solution: Key Challenges
• Homegrown, brittle BI solution
• Schema-driven, very rigid
• Difficult to accommodate changes
• Needed more stability and reliability
• Difficult to ingest data
• Drain on engineering resources
Disjointed picture
of the customer
experience
The Splunk Promise
Critical information available in real-time
Powerful time-series analytics
Access to granular data
Robust alerting
Telemetry
logs
“Just fire the hose at Splunk
and it has the intelligence to
…understand what those key
2012: Legacy System
[Architecture diagram: BP legacy system. Labeled components include a cloud service and capture service writing to Azure blobs, staging areas, Azure Connect, Vertica replication, SSIS, cloud and on-site SQL databases (SQL DB0 to SQL DB 6), an ASP.NET REST data mart, a report server (bp-bisql05) queried by the BI team, a Splunk Storm forwarder service on VM-01 to VM-04, Omniture, dashboard and purchasing services, a refund tool, a REST API, a live telemetry capture layer fed by App0 to App8, and reports and alerts as outputs.]
2013 Architectural Context
Splunk Storm + Apache Storm + Ducksboard + Hadoop
• Splunk Storm
– Limited dashboard access
– Limited real-time queries
• Splunk was unproven: we needed some redundancy
• Other systems allowed Splunk to focus on troubleshooting
Azure Cloud Services
Service Bus
Apache Storm
Partner Feeds
Splunk Storm
Hadoop
Azure Storage
High Level Architecture
Splunk Enterprise in Microsoft Cloud – Azure
Service Bus
Ops
Team
Cluster Topology
Splunk Enterprise in Microsoft Cloud – Azure
Azure Cloud Services
Ops
Team
Region 1
Region 2
Region 3
Services
Team
Splunk Specs
• Data sources
– Xbox 360 Telemetry logs
– Xbox One Telemetry logs
– Smartglass Telemetry logs
– Win 8 / Win Phone 8
– Cloud Services
• Indexing
– Average: 75 GB/day
– Peak: 250 GB/day
• Stakeholders
– Engineering, BI, IT Operations
2014 Architectural Context
Splunk Enterprise + Hadoop
• Splunk Enterprise
– Operations all up
• Splunk capabilities proven
– Simultaneously supports multiple teams & data perspectives
– Access to data near real-time
– Ever growing Xbox One, Xbox 360, Win 8, Win Phone 8, Cloud Services & Smartglass
• Infrastructure dramatically simplified!
Azure Cloud Services
Service Bus
Partner Feeds
Splunk 6.0
Hadoop
Impact of Ops on
User Experience
Xbox Telemetry: Splunk-eye-view
2014-09-06 01:34:14.9-0000 / “XXXXXXXXXXXXXXXX”/ nfl_x1 / video_heartbeat /
message_id=a4095c7a5e204ffa869e776a7312d924, appsource=nfl_x1, clientip=XX.XX.XX, log_time=2014-09-06 01:34:15.2-0000, event_type=video_heartbeat, video_clip_type=clip, video_name="Underrated week 1 matchups", device_type=DurangoApp,
video_session_id=498C8DEE-6F02-4F8C-B55A-BBB0172D8BEF, event_name=video_heartbeat, session_id=C0F903B2-AC5B-4CC9-996D-4A00247DF6B3, content_version=1.8.1.44986,
video_buffer_seconds=59, video_length=182.624, video_progress=162.8571785, video_bitrate=583000, app_channel=nflnow, video_avg_receive_rate=6045924,
video_buffer_progress_percent=100, video_min_bitrate=100000, video_receive_rate=3834704, tx_sequence=5454, video_total_dropped_frames=0, primary_video_id=2c42dcd2-c98f-4e2c-9995-e7a1f34dec64, video_playback_speed=1, mode=ViewFullScreenLandscape,
video_player_state=Playing, video_start_bitrate=600000, video_channel=nflnow, heartbeat_type=secondary, video_render_fps=59.9460334777832, authentication=n, video_max_bitrate=600001, date_time=2014-09-06T01:34:17, video_dropped_fps=0,
video_cc_enabled=n, video_id=dd49c4bb-b47c-4582-aea6-cc32a6c5e698, video_stream_url="http:// fvodhstream-vh.akamaihd.net/i/films/2014/nfl_com/fantasy/reg/
01/140905_fantasy_live_wk1_underrated_matchups_\,180k\,320k\,500k\,700k\,1200k\,2000k\,3200k\, 5000k\,.mp4.csmil/master.m3u8", primary_video_category=clip, sub_session_id=0,
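The heartbeat event above is a comma-separated run of key=value pairs, with some values in double quotes. As a rough illustration of how such a line can be split into fields, here is a minimal Python sketch. It is not how Splunk extracts fields; the regular expression and the `parse_event` helper are assumptions made for this example, and the sample string is a shortened copy of the event shown above.

```python
import re

# Hypothetical pattern: a key, "=", then either a double-quoted value
# (backslash escapes allowed) or a bare value that runs to the next comma.
PAIR_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_event(line: str) -> dict:
    """Split a key=value telemetry line into a dict of string fields."""
    fields = {}
    for key, value in PAIR_RE.findall(line):
        value = value.strip()
        # Drop the surrounding quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


# Shortened copy of the sample event from the slide.
sample = (
    'message_id=a4095c7a5e204ffa869e776a7312d924, appsource=nfl_x1, '
    'event_type=video_heartbeat, video_name="Underrated week 1 matchups", '
    'video_buffer_seconds=59, video_total_dropped_frames=0'
)
event = parse_event(sample)
print(event["event_type"], event["video_buffer_seconds"])
```

All values come back as strings, so numeric fields such as `video_buffer_seconds` still need an explicit conversion before any arithmetic.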
Xbox Telemetry: Ops-eye-view
Error Percentages over CCU
External dependency resolution
Number of users
• In Application
• Watching Video
Same data, different perspectives
Different Perspectives
for different stakeholders
Real-time Operational Insights
• Population Perspective
– Concurrent users/sessions within an application/video event
– % of errors by type and by population
  → Trigger threshold alerts
• Individual Perspective
– User session-level troubleshooting
• System Perspectives
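The population perspective (percentage of errors by type across concurrent users, feeding threshold alerts) can be sketched in a few lines of Python. This is an illustration only, not the search or alert configuration used in the talk. The `error_type` field, the `"error"` event type value, and both helper functions are hypothetical; `event_type` and `session_id` are field names taken from the sample event.

```python
def error_percentages(events, concurrent_users):
    """Return {error_type: percent of concurrent sessions that hit it}."""
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    sessions_by_type = {}
    for e in events:
        # Assumption: error events carry event_type == "error" and an
        # error_type field; neither value appears in the source slides.
        if e.get("event_type") == "error":
            sessions_by_type.setdefault(e["error_type"], set()).add(e["session_id"])
    return {
        err: 100.0 * len(sessions) / concurrent_users
        for err, sessions in sessions_by_type.items()
    }


def triggered_alerts(percentages, threshold_percent):
    """Return the error types whose share exceeds the threshold, sorted."""
    return sorted(t for t, p in percentages.items() if p > threshold_percent)


events = [
    {"event_type": "error", "error_type": "buffer_timeout", "session_id": "s1"},
    {"event_type": "error", "error_type": "buffer_timeout", "session_id": "s2"},
    {"event_type": "error", "error_type": "buffer_timeout", "session_id": "s3"},
    {"event_type": "error", "error_type": "dependency_failure", "session_id": "s4"},
    {"event_type": "video_heartbeat", "session_id": "s5"},
]
percentages = error_percentages(events, concurrent_users=100)
alerts = triggered_alerts(percentages, threshold_percent=2.0)
print(percentages, alerts)
```

Counting distinct sessions rather than raw events keeps one noisy client from inflating the percentage; whether that matches the dashboards in the talk is not stated in the source.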
Real-time Operational Verification
• Release Management Process
– Can deploy new applications and watch adoption rates on multiple platforms in real time
– Can update any existing application
  → Validate Telemetry
  → Verify no user drop off
  → Quickly assess before and after behavior
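One way to picture the "verify no user drop off" and "before and after" checks is to compare average concurrent users in a window before a release with a window after it. The Python sketch below is a simplified illustration under that assumption; the function names, the sample counts, and the 10% tolerance are all hypothetical and do not come from the talk.

```python
def drop_off_ratio(before_counts, after_counts):
    """Fractional drop in mean concurrent users from before to after."""
    if not before_counts or not after_counts:
        raise ValueError("both windows need at least one sample")
    before = sum(before_counts) / len(before_counts)
    after = sum(after_counts) / len(after_counts)
    if before == 0:
        raise ValueError("no users in the 'before' window")
    return (before - after) / before


def has_drop_off(before_counts, after_counts, max_drop=0.10):
    """True when the drop exceeds the (hypothetical) tolerated fraction."""
    return drop_off_ratio(before_counts, after_counts) > max_drop


# Made-up concurrent-user samples around a release.
before = [1000, 1020, 980]
healthy_after = [990, 1010, 1000]
degraded_after = [700, 720, 680]

print(has_drop_off(before, healthy_after))
print(has_drop_off(before, degraded_after))
```

A real check would also need to account for normal daily traffic patterns, which this sketch ignores.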
Fast Troubleshooting – Examples
Partner
– Problem: Xbox title not working
– Symptoms: Spike in external dependency failure events
– Issue: ISP outage in NY
Customer
– Problem: Poor video quality for UFC PPV
– Symptoms: Video buffering time-out exceptions across user sessions
– Issue: Simultaneous Netflix
Internal
– Problem: Configuration defects
– Symptoms: Concurrent Video Viewer trend line drops off
Proactive Alerting
and Troubleshooting
Better Resource
Allocation
Results with Splunk