BIG DATA AND INVESTIGATIVE ANALYTICS

The New Frontier

Table of Contents

Introduction

Chapter 1: What Is Investigative Analytics?

Chapter 2: Top Five Requirements for Investigative Analytics

Chapter 3: Case Studies – Investigative Analytics for Big Data

Summary

Introduction

Big Data and Investigative Analytics

There’s no question that big data represents both a challenge and an opportunity. As big data volumes continue to explode, businesses will face challenges in quickly extracting rich insight from the mountain of machine-generated data streaming in from devices, sensors, smart meters, operational equipment and other sources.

Traditional analytic tools are often not up to the job of allowing users to interrogate highly diverse types of big data. As data connections and dependencies grow exponentially, it’s no longer possible to capture actionable information in a rigid set of KPIs and canned reports. To effectively manage big data, companies need to explore options for performing richer, real-time data analysis with far fewer resources.

One approach for doing that is Investigative Analytics, where users ask a series of quickly changing, iterative questions to figure out why something did or did not happen and how to optimize a particular outcome in the future. Compared to traditional analytics, which lack flexibility, investigative analytics yields insight into questions that haven’t even been dreamed up yet.

In this ebook, we will delve into the role of investigative analysis as it relates to big data, technology requirements for putting investigative analytics into

CHAPTER ONE

WHAT IS INVESTIGATIVE ANALYTICS?

Emerging Data Analytics Stack

Days of One-Size-Fits-All Are Gone

“Yesterday’s BI-ETL-EDW stack is wrong-sided for tomorrow’s needs, and quickly becoming irrelevant.” - Gigamon

In today’s big data world, the one-size-fits-all approach no longer works. The data management stack has transformed into multiples, while the analytic stack has had to respond with individualized tools to get at the appropriate data and function, be it operational analytics, investigative analytics or predictive analytics. Big data has created pockets of specialization, where some databases are great for warehousing (e.g. Hadoop), while others excel at analytics.

Companies are also challenged by an evolving infrastructure and the proliferation of data centers, data warehouses and data marts. Not only is the infrastructure used to deliver information changing, the data coming in from a myriad of new devices is also changing dramatically – in terms of speed, type and volume of data.

With the overwhelming influx of machine-generated data begging to be analyzed, business users such as data scientists need real-time, interactive visualization of their data and flexible query creation. Today, with the right mix of solutions, businesses are able to analyze months’ worth of data with sub-second response time and realize extraordinary business value from performing deep analysis with queries created on the fly.

Big Data & The Internet of Things

A jet airliner generates 20TB of diagnostic data per hour of flight. The average oil platform has 40,000 sensors, generating data 24/7. 80% of all households in Germany (32 million) will need to be equipped with smart meters by 2020, in accordance with the European Union market guidelines. These examples alone represent a staggering amount of data that must be captured, analyzed and acted upon.

Today’s Analytic Environment:

More “things” are now connected to the Internet than people, a phenomenon dubbed The Internet of Things. Fueled by machine-to-machine (M2M) data, the Internet of Things promises to make our lives easier and better, from more efficient energy delivery and consumption to mobile health innovations where doctors can monitor patients from afar. However, the resulting tidal wave of data streaming in from smart devices, sensors, monitors, meters, etc., is testing the capabilities of traditional database technologies. They simply can’t keep up; or when they’re challenged to scale, are cost prohibitive.

Just ten years ago, the largest data warehouse in the world was 30TB; today, petabyte-sized data warehouses are common, and the volumes continue to grow. According to a 2012 Information Difference survey, most of the 209 customers surveyed said they were experiencing data growth of 20-50% annually.
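To put annual growth of 20-50% in perspective, the compounding can be worked out directly. The short Python sketch below is illustrative only: the function names and the ten-year horizon are assumptions chosen for the example, not figures from the survey.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Total multiplication of data volume after compounding for `years`."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for data volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # Low and high ends of the growth range quoted in the text.
    for rate in (0.20, 0.50):
        print(
            f"{rate:.0%} per year: x{growth_factor(rate, 10):.1f} in 10 years, "
            f"doubles every {doubling_time_years(rate):.1f} years"
        )
```

At a steady 20% per year, volume grows roughly six-fold over a decade; at 50% per year it grows more than fifty-fold, assuming the rate holds constant.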

Investigative Analytics

Move from “What Happened?”...to “Why?”

Traditional analytic tools are often not up to the job of allowing users to interrogate the fast moving, highly diverse types of high-volume big data. As data connections and dependencies grow exponentially, it’s no longer possible to capture actionable information in a rigid set of KPIs and canned reports. To effectively manage big data, companies should explore options for performing richer, real-time data analysis. One effective approach is investigative analytics.

In the recent TDWI ebook, Investigative Analytics: The New BI Frontier (June 2013), analyst Stephen Swoyer describes the bookends of the analytic continuum as traditional analytics and predictive analytics:

• Traditional analytics puts questions into historical context, includes common BI activities (e.g. reports, dashboards, scorecards), and is mostly SQL-driven.
• Predictive analytics, on the other hand, uses data mining or statistical algorithms to score data with models and forecasts.

Both of these approaches answer the question of “what” – What happened? What will happen?

With a more open-ended process, investigative analytics, in comparison, answers the “why”: Why did it happen?

Swoyer describes investigative analytics as “an open-ended activity that looks for patterns, anomalies, and clusters (i.e., for clues) that can be used to formulate questions or which can be correlated with events, conditions, or phenomena.” With investigative analytics, users can ask a series of quickly changing, iterative questions to figure out why something did or did not happen and how to optimize a particular outcome in the future, resulting in deeper and richer insight.

[Figure: the three types of analytics]

• Operational Analytics: automatic calculations during live transactions; alerts, KPIs, standard reports. What happened?
• Predictive Analytics: What is going to happen?
• Investigative Analytics: iterative, quickly changing queries (usually ad hoc). What has happened and why?

CHAPTER TWO

TOP FIVE REQUIREMENTS FOR INVESTIGATIVE ANALYTICS

Number 1: Low Touch

The extensive effort needed for fine-tuning through indexing, partitioning and sharding can get in the way of effective, efficient analytics. In a time of still-constrained budgets, data analysis needs to be affordable, as well as easy to use and implement, in order to justify the investment. This demands low-touch solutions that are optimized to deliver fast analysis of large volumes of data, with minimal hardware, administrative effort or customization needed to set up or change query and reporting parameters.

“The cool thing is that it can produce a new report – which produces a new ad-hoc query – and I don’t have to worry about performance because Infobright takes care of all that for me.”

- Bob Hammond, CTO, Jumptap

Low-touch – minimal DBA requirements with a self-tuning system

Number 2: Ad-Hoc Performance

Frictionless Inquiry: Move from question to answer, quickly.

In fast-paced business and operational environments (smart grids are a great example), intelligence needs change quickly, so analytic tools can’t be constrained by data schemas that limit the number and type of queries that can be performed. Traditional data solutions like standard, row-based relational databases fall short here, as they were designed to handle single-record, structured data. Big data analysis requires a flexible solution that allows for unplanned, ad-hoc querying, and that doesn’t require a lot of tinkering or time-consuming manual configuration – such as indexing and managing data partitions – to create and change analytic queries.

Enter frictionless inquiry, where the path between question and answer is void of rigid structure: when users reach the “aha!” moment, they’ll have all the information needed to ask the next question or dig deeper into data, without

Number 3: Dynamic Scalability

Scalability: Inherently respond to increased load along any of these axes – query performance, number of users, number of records/size of data.

As demand for investigative analysis of big data increases, businesses need highly scalable solutions that can handle current and future data growth. At some point, traditional, hardware-based infrastructure will run out of headroom in terms of storage and processing capabilities. However, adding more data centers, servers and disk storage subsystems is expensive to buy and maintain, creating a situation where costs begin to outweigh the

Number 4: Load Speeds

Machine-generated data is loaded very, very quickly and often needs to be investigated within a short period of time – for example, a mobile carrier who wants to automate location-based smart phone offers based on incoming GPS data. If it takes too long to process and analyze this kind of data, the resulting intelligence will fail to be useful.

Businesses can’t afford for data to get stale. Solutions must be able to load, dynamically query, analyze and communicate information quickly enough to support whatever real-time query processing or alerting is required.

Within 60 seconds of data hitting Infobright customer HasOffers’ tracking platform, customers are able to run ad-hoc queries and get results that they can use to make better business decisions in real-time.

Number 5: Compression

Economical storage of big data requires very efficient data compression within a network node, smart device or even a massive data center cluster.

Efficient compression lowers TCO, allowing for less storage capacity and minimized networking and hardware investments. In addition, efficient data compression increases the accuracy of query results by enabling tighter data sampling increments and longer historical data sets (e.g. accommodating situations like seasonality in retail). By capturing more data at finer granularity levels – e.g. one second vs. one hour – businesses will be able to identify patterns that exist at those finer levels (which may have previously been missed due to storage constraints).
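The effect of sampling granularity on data volume is simple arithmetic. The Python sketch below compares one reading per hour with one reading per second for a single sensor. The 16-byte reading size and the 25:1 compression ratio are hypothetical values picked for illustration (the ratio echoes the 25X figure quoted in the LiveRail case study, but real ratios depend on the data).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int) -> int:
    """Number of readings one sensor produces per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


def stored_bytes_per_day(interval_seconds: int,
                         bytes_per_sample: int,
                         compression_ratio: float) -> float:
    """Bytes kept on disk per sensor per day after compression."""
    raw_bytes = samples_per_day(interval_seconds) * bytes_per_sample
    return raw_bytes / compression_ratio


if __name__ == "__main__":
    BYTES_PER_SAMPLE = 16      # assumption: size of one stored reading
    COMPRESSION_RATIO = 25.0   # assumption: illustrative 25:1 compression

    hourly = samples_per_day(3600)
    per_second = samples_per_day(1)
    print(f"hourly: {hourly} samples/day, per-second: {per_second} samples/day")
    print(f"per-second sampling yields {per_second // hourly}x more rows")
    compressed = stored_bytes_per_day(1, BYTES_PER_SAMPLE, COMPRESSION_RATIO)
    print(f"per-second, compressed: {compressed:,.0f} bytes/day")
```

Moving from hourly to per-second readings multiplies row counts by 3,600, which is why compression matters for keeping fine-grained history affordable.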

CHAPTER THREE

BIG DATA, INVESTIGATIVE ANALYTICS CASE STUDIES

Overview

Mavenir’s Converged Messaging Solution

Mavenir Systems provides innovative mobile convergence solutions that enable mobile operators to offer subscribers new and enhanced services and applications.

Challenges  

Mavenir

Mavenir’s goal was to drive more revenue by offering a solution to mobile operators that allows them to retrieve detailed SMS records for customer service and regulatory compliance. They needed an analytics solution to:

• Quickly load and store large volumes of detailed data
    • Capacity in excess of 3 billion messages per day
    • Peak periods like Chinese New Year can generate over 70 million messages in an hour
• Make that data available for analysis within minutes
• Store 90 days’ worth of data with a small hardware footprint
• Handle projected 70% growth rate in mobile messaging
• Have low TCO including low storage and license costs
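The volume figures above are easier to compare when converted to per-second rates. This Python sketch does the conversion; the helper name is made up for the example, and it assumes load is spread evenly across each period, which real traffic will not be.

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def average_rate_per_second(count: int, period_seconds: int) -> float:
    """Average events per second if `count` events are spread evenly."""
    return count / period_seconds


if __name__ == "__main__":
    daily_capacity = 3_000_000_000   # "in excess of 3 billion messages per day"
    peak_hour = 70_000_000           # "over 70 million messages in an hour"
    retention_days = 90              # "90 days' worth of data"

    daily_rate = average_rate_per_second(daily_capacity, SECONDS_PER_DAY)
    peak_rate = average_rate_per_second(peak_hour, SECONDS_PER_HOUR)
    print(f"daily capacity: about {daily_rate:,.0f} messages/second")
    print(f"peak hour: about {peak_rate:,.0f} messages/second")
    print(f"retained at full capacity: {daily_capacity * retention_days:,} messages")
```

That works out to roughly 34,700 messages per second for the daily capacity figure and roughly 19,400 per second for the peak-hour figure, with up to 270 billion messages retained over 90 days at full capacity.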

   

“Data storage is a big issue for mobile operators, and it’s only going to get more challenging as the use of messaging continues to explode.”

Solution: Infobright Enterprise Edition (IEE)

Mavenir

Data Compression & History

• Keep 90 days of data stored in less hardware footprint due to drastic compression

Getting Data in and Out Quickly

• 20k records per second at peak capacity in initial release
• Current iteration is 100k records per peak
• Projected 70% growth plan
• Load from event/log files every 5 minutes, making available in near-real time

Reducing Capex & Opex

• No indexes, data partitioning or manual tuning
• No need for DBA resources to manage the database on an ongoing basis
• Low licensing costs
• TCO only 20% of the cost of competitive solutions

Mavenir has won major wireless carriers such as MetroPCS, Telstra and Viettel based on this solution.

Overview

LiveRail is the leading publisher monetization platform for video delivering over three billion impressions – 25% of all online video ads – each month.

LiveRail

LiveRail is a multi-platform, real-time video advertising ecosystem providing:

• Real-time bidding
• Yield optimization
• Ad serving analytics

Challenges  

LiveRail

With a growing roster of customers – including PBS, MLB.com and CBS Interactive – LiveRail was faced with managing increasingly large data volumes and a need to provide clients with near real-time access to this information for reporting and ad-hoc analysis.

• 10 billion monthly video ad opportunities
• 2 billion data points each day
• Dozens of engagement metrics including percentages
    • Viewed/completed
    • Pause/resume
    • Muting

Publishers needed the ability to drill down with near real-time access to determine optimal video length, as well as determine whether there is a correlation between completion rates and ad frequency.

   

“Infobright gives our customers the ability to do fast, ad-hoc analysis against the extensive video advertising data.”

Solution: Infobright IEE + Hadoop

LiveRail recognized with Computerworld Data+ Award

LiveRail

Data Compression & History

• 25X space reduction
Or
• 25X more history online

Analyzing Data Quickly

• 20,000 ad-hoc/real-time reports per day run by customers
• Reports that used to take two to three minutes now take seconds

Reducing Capex & Opex

• No indexing or tuning required
• Fewer servers or storage disk required
• Lower licensing costs than alternatives
• Low-touch, simple

In Summary

Big Data and Investigative Analytics

Big data demands a big change in thinking. Companies that maintain their status quo of analytics technologies and processes will find themselves spending progressively more money on servers, storage and DBAs – an approach that’s difficult to sustain and still presents the risk of not getting the needed answers.

Gone are the days of simply seeking the “what” from an analytics solution. Today, companies can – and need to – know why. Investigative analytics are the key to revealing patterns of behavior or insights to immediately take action on, and either capitalize on or prevent in the future.

To extract rich, real-time insight from the onslaught of machine-generated data, companies require a technology foundation characterized by five requirements:

• Low-touch administration
• Flexible, ad-hoc querying
• Dynamic scalability
• Fast, reliable performance
• Efficient compression

When there’s more and more data to mine, investigative analytics cut through the clutter with precision, ensuring accurate, immediate results, even as machine-generated data grows to the petabyte scale… and beyond. By maximizing insight into data, companies can make better decisions at the speed of business, thereby reducing costs, identifying new revenue streams, and

HAVE QUESTIONS?

See how JDSU and others are using Infobright to meet their investigative analytics needs and drive business value.

Find us on the web: www.infobright.com
Contact us: 877-596-2483 / [email protected]
