Next-generation Data Management Platforms for Predictive Analytics

Dr. Michael Zeller of Zementis interviews Dr. Raghu Ramakrishnan of Microsoft

Raghu Ramakrishnan, Ph.D.
Technical Fellow; Head, Big Data Platforms and Cloud Information Services Lab
Microsoft

Dr. Ramakrishnan shares his perspectives on the future of machine learning and artificial intelligence, and describes how cloud architectures can provide advantages when working with big data and predictive analytics. He also highlights some of the work that he and his team at Microsoft are doing in predictive analytics using Microsoft's Azure cloud.

MZ: Hello, and welcome to this Zementis video chat on the topic of real-world use cases for predictive analytics, a very exciting topic. Today it is my pleasure to welcome Dr. Raghu Ramakrishnan to share his perspectives about predictive analytics, data mining and machine learning, and today with a particular focus on the next generation of data management platforms. Not only will we cover the data science and the technology behind the next-generation platforms, but we’ll also explore the business implications of these technologies.

But, before we begin, please allow me to introduce Dr. Ramakrishnan, who has a rich professional background that straddles (the) corporate as well as academic worlds. Since 2012, he has been with Microsoft, where he leads their Cloud and Information Services Lab. He also leads the engineering for Big Data Platforms and serves as a Technical Fellow. He came to Microsoft from Yahoo!, where he served as a Yahoo! Fellow for 6 years. Prior to Yahoo!, he founded QUIQ, a company that developed innovative collaborative customer support and knowledge management solutions.

Previously, Raghu was a Professor of Computer Sciences at The University of Wisconsin, in Madison, and his teaching and research focused on database systems, emphasizing data retrieval, integration, analysis, and mining. And he often collaborated with researchers in industry.

He and his group developed scalable algorithms for clustering, decision-tree construction, and itemset counting, and they were among the first to investigate mining of continuously evolving and streaming data, a very hot topic today. His work on query optimization found its way into several commercial database systems, and his work on extending SQL to deal with queries over sequences influenced the design of window functions in SQL 99.

As if this is not enough, he also serves in ACM SIGKDD (the ACM Special Interest Group on Knowledge Discovery and Data Mining [2]), where he has received numerous awards, and he co-authored one of the most popular textbooks on Database Management Systems, also known as the “Cow Book”, and it's currently in its third edition.

So, welcome, Raghu.

RR: Thank you, Michael.

MZ: A pleasure to have you here!

RR: Likewise. A pleasure to be here!

MZ: Let me start by citing a recent article in the Harvard Business Review [3]. It really caught my attention, since it seems to offer a very interesting statistic that seems to frame the debate… the current debate about prospects for machine learning and AI. So let me share with you: the authors surveyed a large group of CTOs and CIOs, as well as other senior executives, and asked what percentage of them believed that technology will be able to capture and meaningfully utilize critical, experience-based knowledge (we call that “deep smarts”, maybe in reference to deep learning technologies that we look at as scientists). But, surprisingly, 71%, 71%! of the executives gave the answer “Partially”, while only 4%, only 4%! said it's “Almost Completely”. So, this doesn’t really seem like a very strong vote of confidence for machine learning and AI!

And, if I may contrast that with Stephen Hawking’s perspective (of course that's the other extreme here), he said “The development of full artificial intelligence could spell the end of the human race” [4]. Of course that's a little controversial, but I wonder if in the business world our technology executives, CTOs and CIOs, may not really grasp the full impact of all the technology in deep learning, machine learning, predictive analytics that we see today. So, what is your perspective on that?

[2] http://www.sigkdd.org/

[3] https://hbr.org/2014/11/artificial-intelligence-cant-replace-hard-earned-knowledge-yet

[4] http://www.bbc.com/news/technology-30290540

RR: So, I think there are two distinct phenomena at play here, Mike. First off, if you look at companies like the web companies, their basic task is to either present content the users are looking for, or to monetize that user’s attention by showing them an ad. And both of these involve selecting from many choices. In search, you are selecting from many different URLs you could display as a search result. In any kind of portal scenario, you are trying to choose from a range of articles you could show. Or with ads, of course, you are selecting which to show. These are, fundamentally, tasks where we are predicting what the user will respond to.

So, they needed to be good at prediction as a condition for being in business, right? And they also had access to enormous amounts of data, the logs of what people did when you showed them an article or when they saw an ad. And you could learn from this and it could get better, and the economic incentive was so high, they invested heavily and they made it work. This has been a journey 10 years in the making.

Today, there is a certain part of the world that totally understands the value of data and data-driven action. If you take the typical enterprise, they understand the value of data as well, but from a slightly different standpoint. For one thing, they all understand that transactional systems are the backbone, the bookkeeping. But when you consider using that data to guide future actions, future decisions, they are still, for the most part, leveraging reporting and retrospective analysis.

They are now in the early days of using data to look ahead, and some of what we hear is this early phase in the journey. Some of what we hear is just the plain difficulty of operating on large amounts of data to gain insight. If you take the web companies again, their stakes are so high, a tiny percentage increase in click rates means hundreds of millions of dollars. So they were able to invest.

Where the gain is not quite that sharp, it takes a lower barrier to success to get more people to be engaged, and right now, frankly, the burden of setting up systems that bring together a lot of data, maintain them, allow you to cleanse them, allow you to bring business logic to bear on them and make meaningful decisions, is simply high and there aren't enough qualified people to go around. It's not that enterprises are not desirous of doing this. There is a practical gap in the number of people who are available… who can help in all phases of this task.

MZ: Excellent, excellent point here. I absolutely agree. You look at the companies that have embraced the technology, you look at maybe manufacturing companies, which are just starting to embrace it, but the Internet of Things is (I think) one of the hottest topics today. So, you brought up exactly a very good point.

Let’s turn a little bit to the technology and the architecture for a moment. You have, over the years, worked with many different platform technologies in your career, and in recent years we really have seen the emergence of cloud as a viable architecture. So, in that context, what does an enterprise-scale global cloud architecture bring to the table for data analytics, for lowering the barrier of entry for predictive models, for machine learning and for big data in general?

RR: It's a great question. I think the cloud has the potential to greatly simplify some of the challenges. So, as opposed to buying a collection of machines, installing software and then administering them, you have an option of renting that software preinstalled, operational with effectively a bunch of administrators who operate the servers for you. This takes away some of the mechanical burdens. It could also in principle take away some of the more challenging operational concerns, like security (and) compliance certifications. At the end of the day you still have the opportunity and the responsibility of understanding your business, understanding your data, bringing your tools to bear to make sense of it and to get some insight. But, what the cloud can do, I think, is reduce some of the avoidable friction. However, there is another thing to keep in mind.

Historically, people think in terms of data warehousing as part of the journey towards insights. And data warehousing for the past 20 years has meant relational data warehousing. Today that's no longer the case. While relational warehouses continue to thrive, increasingly people want to look at any and all forms of data: weblogs, documents, imagery, along with tabular data. Sometimes tabular data that is from transactional systems; in other cases, transactional tabular data that is distilled from some of these other sources of data. But over and above the tabular data, they want to deal with unstructured, partially structured data. In addition to the cloud helping to avoid some of the friction with analysis, part of the challenge people face is… there is another revolution going on.

Historically, people create so-called data warehouses as a staging area for data analysis, and this has typically meant relational data warehouses. However, increasingly, while relations continue to be very important, both relations that come from transactional stores and relations or tables that are distilled from all these diverse types of data that we talked about, people want to directly intermingle analysis of relational and unstructured, loosely structured, multimedia data.

So this is fundamentally a departure from how they have framed the goal of analysis. Along with the diversity of data and the enormously greater scales of data, you are seeing diversity in the types of analysis they need to be able to carry out. And this has led to interest in a new class of systems called distributed systems (such as Hadoop), so people are now simultaneously grappling with how to wrap their heads around this new class of systems, new class of heterogeneous data warehouses and heterogeneous analysis, at the same time that they are trying to move to the cloud.

MZ: Yes, sometimes I have heard that called a "data lake" in the cloud.

RR: Yes, and in fact we riffed off of that in naming our new store product on Azure. It is a service called Azure Data Lake.

MZ: Are we where we should be with cloud technology, then? Is it abstracting away enough of the nitty-gritty details of tweaking machines and looking at virtual boxes?

RR: I think we are on a path that's the right trajectory. We’re not there yet. I would say that today, by far the majority use of cloud is as a means to rent infrastructure, whether it's compute cycles or storage. What we need is a transition to renting higher-level services and eventually vertically integrated solutions. And, we are seeing that happening. All the major cloud vendors, they are now offering services such as database services. You can have transactional databases or even analytic stores now that you can rent, which is a far cry from simply renting blobs and virtual machines.

MZ: And then talking about data and the cloud, I think no discussion of a cloud environment is complete until you touch on the topic of security. We have seen very high profile data breaches in the last few years. They are happening more often and they are getting more severe. So, is the cloud kind of an additional risk, or maybe is it actually an opportunity for us?

RR: It's a fantastic question. I would have said that, say, 3 years ago, the answer would have been that most people viewed the cloud as a risk… that they could secure their data more effectively themselves. After the rash of high-profile breaches, I think that is shifting. If you look at the large cloud providers, they are investing enormously in security, because they must, and that's their core competence. And as this becomes recognized, I think people are increasingly more comfortable placing their data in the cloud. But it's not just security. As you well know, compliance is a big, big issue, and in many verticals, there are critical compliance certifications that must be achieved before you can use systems. And as cloud providers get certified, this provides a very attractive alternative to doing it yourself and going through all of that stress. So, I think, 3 years from now, most people will view compliance certifications and security as reasons to move to the cloud.

MZ: That's actually a fantastic point. So, I guess with that prediction, maybe in a few years we will see an inflection point where it becomes better to move to the cloud to shed the responsibility, to a certain extent, and to rely on someone who has more capacity, more expertise in security, in compliance, to really take on these extremely important tasks to secure your enterprise. Because it’s, as we know, a high risk. Reputational risk and losing the data is never, never a good idea.

I think in closing I would say thank you very much for your time today. This conversation was extremely insightful, and I hope it also helped our viewers to deepen their knowledge about analytics, advanced analytics (and) how cloud technology and similar technologies really can help them in their business endeavors.

And to you, our viewers, let me close with a quote from another person who knew a thing or two about practical applications of science: Albert Einstein.

Albert Einstein once said: “Information is not knowledge”. Let that sink in a little bit: “Information is not knowledge”. So, as we sometimes really get caught up in the hype of big data, let's not forget that the goal here is not to collect data but really to derive knowledge from it, which ultimately will lead to better business decisions.

So thank you, thank you again for joining us today! To learn more about predictive analytics and various practical use cases, please visit our website at www.zementis.com.


About Zementis

Zementis, Inc. provides software solutions for predictive analytics. The company was founded on the principle that data science teams and IT departments can collaborate seamlessly and efficiently, allowing predictive models to rapidly move from development to deployment, so that businesses and other data-centric organizations can easily incorporate predictive analytics into their routine operations. Agile deployment of predictive solutions is the cornerstone of the Zementis philosophy.

CIO Review recognized Zementis as one of the “Top 20 most promising Big Data companies in 2013”, and Gartner named Zementis a “Cool Vendor in Data Science” in 2014. Its ADAPA® and Universal PMML Plug-in (UPPI) scoring engines are designed from the ground up to benefit from open standards and to significantly shorten the time-to-market for predictive analytics in any industry. Customers such as Bosch, FICO, Equifax and Western Union have used Zementis solutions successfully to enhance their predictive analytics capacity and capabilities.

Zementis partners with leading analytics and data warehouse solution providers to enrich and extend customer capabilities. Supported partner solutions and platforms include: Amazon Web Services, Apache Software Foundation (Hadoop, Hive, Spark, Storm, Tomcat), Cloudera, Datameer, FICO, Hortonworks, IBM (BigInsights, PureData / Netezza, WebSphere), MapR, Microsoft Azure, Oracle WebLogic, Pivotal Greenplum, RedHat JBoss, SAP (HANA, Sybase IQ), Teradata and Teradata Aster.

For more information, please visit www.zementis.com.
