Building Successful Big Data Solutions

Executive Summary

 

The decision to invest in and leverage the widespread “Big Data”¹ revolution, whether you’re a large multinational corporation or the smallest sole proprietorship, is no longer optional: data growth has outstripped the ability of people and 20th-century technology to make sense of it all. Differentiation and successful execution require a 21st-century approach to intelligent analytics, one that goes beyond count-and-sort methodologies and instead processes all data automatically, whether structured or unstructured. A successful business requires tools that continuously learn and reveal actionable and unforeseen connections, while also moving flexibly between legacy data (which may reside in highly organized silos) and unstructured data generated in real time. Atigeo’s xPatterns™ intelligent Big Data platform is capable of providing the required level of analytics visibility into data, both structured and unstructured, for any application now and into the future.

 

According to McKinsey & Company², there is a growing shortage of both data managers and skilled data analysts to handle the continued exponential growth of unstructured data. Current technologies require multiple analyst touch points and impose volume limits and strict data policies; today’s solutions must contend with a complex and constantly evolving set of open source technologies. However, solutions must go beyond the ability to store, manage and retrieve copious amounts of data (that is simply the point of entry) and provide advanced analytics, which can lead to quantifiably more effective marketing and optimized operations. The question to ask is: “Does my solution merely enable search, recommendations and classifications over large volumes of data, or does it also achieve the unprecedented relevance necessary for robust ROI?”

 

Additionally, the privacy, compliance and security of one’s data are paramount. As data explodes, these concerns grow with it; xPatterns was designed from the ground up to address them in a Big Data world. Merchants, governments and hackers are all looking for ways to leverage personal data, and consumers are right to be wary of the shifting boundary between more services and less privacy. The question to ask is: “Does my Big Data platform secure my data out of the box?”

 

As noted above, the personnel shortage is magnified by the time required to retrain current IT staff, because the Big Data learning curve is significantly steeper given the number of technologies and components involved. Companies must decide: do I train my personnel, or do I partner with 21st-century technology?

 

While all of the above is challenging, Atigeo’s seven-year head start in addressing Big Data analytics positions xPatterns as the appropriate application framework for building enterprise-grade, intelligent Big Data applications, which can be deployed on-premise or in the cloud with minimal IT support. The robust Software Development Kit (SDK) lets data scientists easily plug in and experiment with best-in-class tools.

 

In summary, Atigeo’s xPatterns platform makes it easy to combine different kinds of intelligent components in Big Data applications, including those built by our partners and those available in open source.

¹ Big Data definition: the unprecedented growth in the volume, velocity and variety of data in our world.

² The McKinsey Global Institute: “Big Data: The next frontier for innovation, competition and productivity,” June 2011.

The Big Data Opportunity

“Big Data” has been defined by the unprecedented growth in the volume, velocity and variety of data; Atigeo believes it is also necessary to address both the visualization of data in our world and its accessibility for all. This explosion in unstructured and semi-structured data is expected to account for 90% of newly created data going forward³. This opens up significant business opportunities to leverage Big Data through advanced analytics that tie directly into business processes and applications. As the following diagram, presented by McKinsey & Company in September 2011⁴, shows, early adopters in this space have significantly outperformed their respective markets.

 

   

   

The question today is no longer whether a company should invest in Big Data analytics to stay competitive, but how to gain the insight hidden beneath the surface of the data while lowering total cost of ownership and time to market, in order to increase its chance of success and return.

The difference between Big Data and traditional, smaller transactional data sets is that Big Data, being large in sample, reveals more insightful patterns when advanced analytics such as statistical analysis, machine learning, data mining, natural language processing, information retrieval and predictive analytics are applied in automated ways. Without that automation, your ability to unlock the value of your data (relevance) falls off dramatically, because there aren’t enough people on the planet to analyze and structure this global data growth.

                                                                                                                         

³ J.P. Morgan, Big Data Primer, June 2012.

However, according to a 2011 analysis by McKinsey & Company⁵, the United States alone could face a shortage of 1.5 million data-savvy managers and analysts by 2018, just as enterprises must tackle ever more unstructured data. The diagram on the right summarizes the general trend we see: the combination of massive amounts of data (volume), coming from multiple sources (variety) in real time (velocity), makes traditional approaches (in particular those that rely on human tagging, prioritization and analysis) ineffective or impractical. Atigeo believes that we are at an inflection, or decision, point where the growth of unstructured data overwhelms the growth in the number of analysts who identify structure within it. Relevance therefore falls off unless the approach to unstructured data changes, and Atigeo expects this gap to keep widening for the foreseeable future. Hence, the collection, analysis and integration of Big Data into business operations must be automated, and this requires an expanded portfolio of technologies.

 

For enterprises to capture the Big Data opportunity effectively, “accessibility” in their Big Data solution is paramount. Access is the means to democratize Big Data tools, empowering every employee throughout the organization to maximize the value of the data for the company instead of leaving the job to a small group of specialists.

 

Therefore, building successful Big Data solutions is about taking advantage of volume, velocity, variety and visualization through analytics, and making the results accessible to all.

Implementing Big Data Solutions

Our framework for Big Data analytics implementation, successfully applied across multiple verticals to date, confirms that the best solution requires a different technology mix per customer, substantial domain-specific knowledge and data, and multiple iterations of data-driven continuous improvement.

Until now, there has been no end-to-end solution that fits all these criteria, and we expect tremendous advancements in technology over the next few years from both incumbents and new entrants. Companies must therefore consider a platform that can quickly adopt new technologies, for both distributed data processing and advanced analytics, as they become available and enterprise-ready. Such a platform must also be able to comply with each company’s unique requirements while leveraging existing data and infrastructure.

 

Traditional database and analytics technologies served industry well until recently. In the 21st century, real-time advances available in open source, high-speed networks, breakthroughs in compute power and systems, and intelligence technologies like xPatterns are game changing.

Introducing xPatterns

 

xPatterns is an application framework for building enterprise-grade, intelligent Big Data applications, and an abstraction platform that can leverage all of these advances from ISVs, open source community technologies, NLP, machine learning, semantic technologies, academia, etc. Our roadmap is guided by our belief that the opportunity to capture the value of Big Data lies in access, analytics and visualization. xPatterns democratizes current technologies by abstracting the complexity of using, for example, the open source Hadoop framework (Access), and by adding an ever-increasing set of proprietary and open source Analytics and Visualization tools that enable automated and easy manipulation of data to fit all business needs.

⁵ The McKinsey Global Institute: see footnote 2.

It can be deployed either on-premise or in the cloud. xPatterns provides an SDK that lets data scientists easily configure plug-and-play components and experiment with best-in-class tools, reusing and integrating with the company’s existing assets. Data scientists can then deploy apps directly as web services or analytical jobs, providing a seamless transition from analysis to production. The runtime environment (Hadoop, NoSQL, search, etc.) is completely abstracted away, allowing faster time to market, removing the need for in-house expertise and easing transitions between underlying technologies.
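The SDK’s own interfaces are not documented here, so the following is only a minimal Python sketch of the general plug-and-play idea, assuming every component exposes a single `run` method: components register under a name, and a pipeline is just an ordered list of names. `Component`, `REGISTRY`, `register` and `build_pipeline` are hypothetical names, not part of the xPatterns SDK.

```python
from typing import Callable, Dict, List, Protocol


class Component(Protocol):
    """Hypothetical contract: every plug-and-play component transforms a batch of records."""

    def run(self, records: List[str]) -> List[str]:
        ...


# Hypothetical registry (not the xPatterns SDK): component name -> component class.
REGISTRY: Dict[str, Callable[[], Component]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that makes a component selectable by name in a pipeline config."""
    def wrap(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return wrap


@register("strip")
class Strip:
    def run(self, records: List[str]) -> List[str]:
        return [r.strip() for r in records]


@register("lowercase")
class Lowercase:
    def run(self, records: List[str]) -> List[str]:
        return [r.lower() for r in records]


def build_pipeline(names: List[str]) -> Callable[[List[str]], List[str]]:
    """Instantiate the named components and chain them in order."""
    stages = [REGISTRY[name]() for name in names]

    def pipeline(records: List[str]) -> List[str]:
        for stage in stages:
            records = stage.run(records)
        return records

    return pipeline


# Swapping or reordering tools is a configuration change, not a code change.
pipeline = build_pipeline(["strip", "lowercase"])
print(pipeline(["  Hello ", "WORLD"]))  # ['hello', 'world']
```

Because each stage depends only on the `run` contract, a new tool can be added by registering one class; deploying the resulting pipeline as a web service or batch job is outside the scope of this sketch.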

 

xPatterns: what’s included out of the box?

 

Distributed Processing
• Scalable, reliable processing
• Scalable, reliable storage
• NoSQL (key/value) storage
• Pig & Hive queries
• Workflows & scheduling
• High availability
• Backups
• Auto-scaling
• Search, filtering, faceting
• Real-time dataset updates
• Shared schema management

Advanced Analytics
• Natural language toolkit
• Supervised learning
• Unsupervised learning
• Concept extraction
• Ontologies
• Plotting & visualization
• Information retrieval
• Data mining
• Scientific computing
• Predictive analytics
• Inference

Framework Features
• Create & deploy apps
• Scheduled workflows
• Data ingestion, push or pull
• Normalize, filter and de-dup incoming data
• Plug & play analytical tools
• Continuous measurement
• Automated feedback loop
• Data lifecycle management
• Logging & monitoring
• Personalization
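To make the “Normalize, filter and de-dup incoming data” feature concrete, here is a minimal Python sketch under stated assumptions: records arrive as flat string dictionaries, a record without an `id` counts as incomplete, and duplicates are exact matches after normalization. The function names are hypothetical and do not describe how xPatterns implements ingestion.

```python
import hashlib
import unicodedata
from typing import Dict, Iterable, Iterator


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Lowercase keys and values, apply Unicode NFKC normalization, collapse whitespace."""
    return {
        key.strip().lower(): " ".join(unicodedata.normalize("NFKC", value).split()).lower()
        for key, value in record.items()
    }


def fingerprint(record: Dict[str, str]) -> str:
    """SHA-256 digest of the record with keys sorted, so field order does not matter."""
    canonical = "\x1f".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(records: Iterable[Dict[str, str]], required: str = "id") -> Iterator[Dict[str, str]]:
    """Normalize each record, filter out records missing `required`, drop exact duplicates."""
    seen = set()  # in-memory for the sketch; at scale this would live in a shared store
    for raw in records:
        record = normalize(raw)
        if not record.get(required):
            continue  # filter: incomplete record
        digest = fingerprint(record)
        if digest in seen:
            continue  # de-dup: identical to an earlier record after normalization
        seen.add(digest)
        yield record


incoming = [
    {"id": "42", "name": "  Ada   Lovelace "},
    {"ID": "42", "Name": "ada lovelace"},  # duplicate once normalized
    {"id": "", "name": "no id"},           # filtered out
    {"id": "7", "name": "Alan Turing"},
]
for record in ingest(incoming):
    print(record)
```

Hashing a canonical form keeps the duplicate check cheap; near-duplicate detection (for example, MinHash) would be a separate step and is not shown.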

 

The xPatterns architectural diagram consists of three layers: the infrastructure layer; the horizontal and domain-specific intelligence layer; and the development and administrative environment layer. The framework is designed to give customers the flexibility to choose the right intelligence for their specific Big Data problem using a simple high-level programming language. While customers focus on business solutions, xPatterns takes care of the Big Data environment. Thus, xPatterns can lower the barrier to entry for any enterprise or application seeking to take advantage of Big Data opportunities.

 

 

Intelligence Components

xPatterns makes it easy to combine different kinds of intelligence components in Big Data applications. Some of these components are open source, including popular Python libraries such as NLTK⁶ for natural language processing, scikit-learn⁷ for machine learning and matplotlib⁸ for visualization. Additional intelligence components are those built by our partners, such as IBM’s SystemT⁹ and SystemML¹⁰.

⁶ http://nltk.org/
⁷ http://scikit-learn.org/stable/
⁸ http://matplotlib.sourceforge.net/
⁹ http://www.almaden.ibm.com/cs/projects/systemt/
¹⁰ http://www.almaden.ibm.com/cs/projects/systemml/

The third category of components comprises patented innovations that enable xPatterns to deliver better results, using algorithms developed by Atigeo and exposed through a rich set of APIs. Examples of these are:

 

Relevance: xPatterns Relevance takes a “relevance discovery” approach that delivers on the promise of deriving actionable intelligence from an enterprise’s disparate sources of structured and unstructured data. xPatterns automatically creates and dynamically maintains semantic ontologies known as domain experts (DEs). At the core of the relevance technology is the creation of high-quality DEs in near real time.

 

The DE is built as a Relevance Neural Network (RNN) that maps relationships between a set of terms (i.e., semantic concepts) and related terms (the output layer), intermediated by context (i.e., documents or articles). The network weights are initialized (or bootstrapped) with statistically optimal values based on frequency statistics. Thereafter, the weights are strengthened or weakened through training, both by live interaction with users and by new data. This learning capability improves relevance by leveraging the wisdom of crowds. The figure on the left below depicts a DE RNN; the figure on the right is an xPatterns visualization of network relationships, showing relevant documents for a concept and relevant concepts for a document:

A DE captures and represents relationships between concepts within a given domain. DEs are created automatically by analyzing and processing large bodies of unstructured text about the domain. They can be used to determine indirect semantic relationships between queried concepts and related concepts, and to explain the relevance of a specific document to a specific concept. DEs represent “IsAssociatedWith” relationships for a domain, derived simply from reading and reviewing large bodies of unstructured text about a given area of interest.
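The Relevance Neural Network itself is patented and not described in enough detail to reproduce. The Python sketch below only illustrates the two ideas the text does describe, using a plain co-occurrence model swapped in for the RNN: weights bootstrapped from frequency statistics (how often two terms share a document, the “context”), then nudged up or down by user feedback. All names, the update rule and the toy corpus are assumptions.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List

Weights = Dict[str, Dict[str, float]]


def bootstrap(docs: List[List[str]]) -> Weights:
    """Initial weights from frequency statistics: w[a][b] = P(b in doc | a in doc)."""
    doc_freq: Counter = Counter()
    pair_freq: Counter = Counter()
    for doc in docs:
        terms = set(doc)  # the document acts as the shared context between terms
        doc_freq.update(terms)
        pair_freq.update(permutations(terms, 2))  # ordered pairs (a, b) with a != b
    weights: Weights = defaultdict(dict)
    for (a, b), n in pair_freq.items():
        weights[a][b] = n / doc_freq[a]
    return weights


def feedback(weights: Weights, term: str, related: str, useful: bool, rate: float = 0.1) -> None:
    """Move one association toward 1 (useful) or 0 (not useful) by a fraction `rate`."""
    row = weights.setdefault(term, {})
    current = row.get(related, 0.0)
    target = 1.0 if useful else 0.0
    row[related] = current + rate * (target - current)


def related_terms(weights: Weights, term: str, k: int = 3) -> List[str]:
    """Top-k associated terms, highest weight first, ties broken alphabetically."""
    ranked = sorted(weights.get(term, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


corpus = [
    ["insulin", "diabetes", "glucose"],
    ["diabetes", "glucose", "diet"],
    ["insulin", "pancreas"],
    ["diet", "exercise"],
]
weights = bootstrap(corpus)
print(related_terms(weights, "insulin"))  # ['diabetes', 'glucose', 'pancreas']
feedback(weights, "insulin", "pancreas", useful=True)  # e.g. users keep following this link
print(related_terms(weights, "insulin"))  # ['pancreas', 'diabetes', 'glucose']
```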

Inference: xPatterns Inference delivers complex predictions from evidence. Combined with a Bayesian Model Averaging (BMA) approach to integrate user preferences embodied in a Bayesian network (BN), xPatterns Inference can provide higher accuracy even when collective preferences are sparse. The power of Inference comes from its ability to integrate evidence from different domains, at various levels of scope, in a scalable way.
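Bayesian Model Averaging itself is a standard technique: instead of trusting one model, the predictions of several models are averaged, each weighted by its posterior probability given the data. The following Python sketch shows only that averaging step. How xPatterns builds its Bayesian network of user preferences is not described here, so the three models, their log marginal likelihoods and their predictions are hypothetical numbers.

```python
import math
from typing import List, Sequence


def posterior_weights(log_evidence: Sequence[float], prior: Sequence[float]) -> List[float]:
    """p(M_k | D) proportional to p(D | M_k) * p(M_k), normalized in log space for stability."""
    scores = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    top = max(scores)
    unnormalized = [math.exp(s - top) for s in scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def bma_predict(predictions: Sequence[float], log_evidence: Sequence[float],
                prior: Sequence[float]) -> float:
    """BMA: p(y | D) = sum_k p(y | M_k, D) * p(M_k | D)."""
    weights = posterior_weights(log_evidence, prior)
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical: three models' P(user likes an item), their log marginal likelihoods,
# and a uniform prior over the models.
predictions = [0.9, 0.6, 0.2]
log_evidence = [-10.0, -11.0, -14.0]
prior = [1 / 3, 1 / 3, 1 / 3]
print(round(bma_predict(predictions, log_evidence, prior), 3))  # 0.811
```

The best-supported model dominates the average without the others being discarded, which is what makes BMA more robust than picking a single model when evidence is sparse.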

Inference incorporates ontological information into the task of prediction. This information can be captured through a representation of DEs, thereby allowing unstructured information to be incorporated, which is particularly well suited to cold-start prediction scenarios.

 

Cold-start prediction describes a situation where the data sample is still small and forming, and there is not enough data to make predictions using traditional statistical models. An example is shown in the diagram below, where user A has provided a small set of cuisine preferences and the task is to infer A’s preferences for cuisines not listed. The algorithm takes into account the preferences of all users and the additional relationship weightings represented by Domain Experts to infer the likelihood of A’s other cuisine preferences. This allows us to calculate, with high confidence, the probability that A likes Chinese food even when the preferences collected from the population are too small a sample, especially at the beginning of the collection process.

   

As the scenario evolves, personal or local evidence grows in tandem with population-level behavior. This evidence may be structured or unstructured. The three main components (ontology, personal/local behavior and population-level behavior) combine to produce optimally informed inference.
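One common way to express this combination, swapped in here for illustration and not presented as Atigeo’s algorithm, is Beta-Binomial shrinkage: a prior built from population behavior and an ontology-based guess is gradually overridden as personal evidence accumulates. In this Python sketch, similarity weights between cuisines stand in for Domain Expert relationship weights, and every number is hypothetical.

```python
from typing import Dict


def ontology_estimate(user_likes: Dict[str, float], similarity: Dict[str, float]) -> float:
    """Similarity-weighted average of the user's known preferences (0 = dislike, 1 = like)."""
    known = [c for c in user_likes if c in similarity]
    den = sum(similarity[c] for c in known)
    if den == 0:
        return 0.5  # no related evidence: stay neutral
    return sum(similarity[c] * user_likes[c] for c in known) / den


def cold_start_probability(population_rate: float, user_likes: Dict[str, float],
                           similarity: Dict[str, float], hits: int = 0, trials: int = 0,
                           blend: float = 0.5, strength: float = 4.0) -> float:
    """Beta-Binomial posterior mean for P(user likes the target item).

    The prior mean blends population behavior with an ontology-based guess;
    `strength` is the prior's pseudo-count, so personal evidence (`hits` out of
    `trials`) gradually outweighs it.
    """
    prior_mean = blend * population_rate + (1 - blend) * ontology_estimate(user_likes, similarity)
    alpha = prior_mean * strength
    beta = (1 - prior_mean) * strength
    return (alpha + hits) / (alpha + beta + trials)


# Hypothetical numbers: user A's stated preferences and each cuisine's similarity to Chinese food.
likes_a = {"sichuan": 1.0, "thai": 1.0, "mexican": 0.0}
similarity_to_chinese = {"sichuan": 0.9, "thai": 0.6, "mexican": 0.1}
print(cold_start_probability(0.6, likes_a, similarity_to_chinese))  # about 0.77
print(cold_start_probability(0.6, likes_a, similarity_to_chinese, hits=0, trials=2))  # about 0.51
```

With no personal observations, the estimate rests on the population rate and A’s related cuisines; two negative observations pull it down noticeably, showing personal evidence taking over. The `strength` pseudo-count controls how quickly that happens.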

 

Classification: xPatterns Classification infers type or class from complex information. Classification integrates structured and unstructured data into classification scenarios, which may be large in the volume of data, the size of the input space and the number of possible classes. Classification develops a deeper understanding of unstructured data by processing natural language to decipher complex relationships. This deeper understanding captures qualities such as sentiment, time and reference, which are used to distinguish among subtly different classes.

 

For example, a domain-specific classification tool has been incorporated for healthcare professionals, leveraging the Unified Medical Language System® (UMLS®) from the US National Library of Medicine as well as International Classification of Diseases (ICD-9 and ICD-10) data to return precise classifications for unstructured medical text.
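xPatterns Classification is not specified in enough detail to reproduce, so the sketch below swaps in a standard baseline, multinomial naive Bayes with add-one smoothing, to show the basic shape of inferring a class from unstructured text. The two specialty labels and the training phrases are hypothetical; a real medical classifier would work from UMLS concepts and far larger training sets.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and token occurrences per class."""
    class_counts: Counter = Counter()
    token_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in examples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def classify(model: Model, tokens: List[str]) -> str:
    """Return the class maximizing log P(class) + sum of log P(token | class), add-one smoothed."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, n_docs in class_counts.items():
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for token in tokens:
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("chest pain shortness of breath".split(), "cardiology"),
    ("elevated troponin chest pain".split(), "cardiology"),
    ("wheezing shortness of breath inhaler".split(), "pulmonology"),
    ("chronic cough wheezing".split(), "pulmonology"),
]
model = train(training)
print(classify(model, "troponin and chest pain".split()))  # cardiology
print(classify(model, "wheezing and cough".split()))       # pulmonology
```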

Cooperative Distributed Inferencing (CDI): xPatterns CDI is a new paradigm for inferencing and optimal control in real time. It is a distributed optimization approach with built-in synchronization for the continuous optimization of all types of rules, both soft and hard. The inferencing paradigm converts multiple knowledge bases from exponential complexity to polynomial complexity. Constraints are then built with a Pareto strategy that synchronizes the different rules so that they converge to an optimal result.

The applications of this inferencing model are vast. One example is optimizing the power grid, which has multiple knowledge bases and rules that outdated algorithms do not fully take into account. This leads to local ad hoc adjustments and empirical corrections, which are suboptimal. The figure below shows the current model and an xPatterns CDI model.

 

In summary, the three main differentiating points for CDI are:

1. It handles large rule sets in real time.
2. It expresses a variety of rules and constraints through optimization.
3. It supports distributed cooperation between independent nodes, without requiring trust among nodes.
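CDI’s internals are not described here. As a rough illustration of independent nodes cooperating toward a shared optimum, the Python sketch below uses a standard technique, dual decomposition (price coordination), on a toy power-grid dispatch problem: each generator privately chooses its output for a given price, and the price is adjusted until supply meets demand. The quadratic costs, capacities and step size are assumptions, and unlike the CDI claim above, this scheme relies on a coordinator and on nodes reporting their output honestly.

```python
from typing import List, Tuple


def local_dispatch(price: float, cost_coeff: float, capacity: float) -> float:
    """A node's private decision: maximize price*g - cost_coeff*g**2 over 0 <= g <= capacity."""
    return min(max(price / (2 * cost_coeff), 0.0), capacity)


def coordinate(demand: float, cost_coeffs: List[float], capacities: List[float],
               step: float = 0.01, iters: int = 10_000,
               tol: float = 1e-6) -> Tuple[float, List[float]]:
    """Dual decomposition: nodes share only their output; the coordinator shares only a price.

    Without binding capacities, the price iteration converges when
    step * sum(1 / (2 * a) for a in cost_coeffs) < 2.
    """
    price = 0.0
    outputs = [0.0] * len(cost_coeffs)
    for _ in range(iters):
        outputs = [local_dispatch(price, a, cap) for a, cap in zip(cost_coeffs, capacities)]
        mismatch = demand - sum(outputs)
        if abs(mismatch) < tol:
            break
        price += step * mismatch  # raise the price on a shortfall, lower it on a surplus


    return price, outputs


# Hypothetical grid: three generators with quadratic costs a*g**2 and 60-unit capacities.
price, outputs = coordinate(100.0, [0.01, 0.02, 0.04], [60.0, 60.0, 60.0])
print(round(price, 4), [round(g, 2) for g in outputs])  # 1.1429 [57.14, 28.57, 14.29]
```

At the converged price, every generator below its capacity runs where its marginal cost equals the price, which is the optimality condition for this problem; a tighter capacity on one node simply shifts its share to the others.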

Personas: The unique xPatterns privacy model makes it possible for individual users to create, build and control their own digital “personas.” These anonymous, secure profiles keep users’ identities completely private while accurately reflecting their interests and behaviors in the digital landscape. In this way, it becomes possible to deliver highly relevant, personalized content and experiences to individuals without learning their actual identities; only their relevance scores are visible.

 

Here is how the xPatterns persona module works:

• All content types are given a relevance score based on the personalized attributes of the user
• The user profile can be initialized from existing enterprise data sources
• Profile attributes can be dynamically updated from real-time inferred or explicit behavioral data
• Applications can be designed to give consumers full management of their personas
• Persona attributes are unstructured, meaning they don’t have to be selected from static lists
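The persona module’s internals are not documented here; the following Python sketch only illustrates the listed behaviors under simple assumptions. A persona is identified by a random token rather than an identity, holds free-form (unstructured) interest weights, can be seeded from existing data, is updated from behavioral events with exponential decay, can be edited by its owner, and scores content by relevance. The `Persona` class and its methods are hypothetical.

```python
import secrets
from collections import defaultdict
from typing import Dict, List, Optional


class Persona:
    """Hypothetical pseudonymous profile: a random handle plus free-form interest weights.

    No name, e-mail or account ID is stored; the handle is a random token.
    """

    def __init__(self, seed_attributes: Optional[Dict[str, float]] = None):
        self.handle = secrets.token_hex(16)
        self.attributes: Dict[str, float] = defaultdict(float, seed_attributes or {})

    def observe(self, terms: List[str], weight: float = 1.0, decay: float = 0.9) -> None:
        """Decay existing interests, then reinforce the terms seen in a new behavioral event."""
        for key in list(self.attributes):
            self.attributes[key] *= decay
        for term in terms:
            self.attributes[term.lower()] += weight

    def forget(self, term: str) -> None:
        """Let the user delete any attribute from their persona."""
        self.attributes.pop(term.lower(), None)

    def relevance(self, content_terms: List[str]) -> float:
        """Score content by summing the persona's weights for the distinct terms it mentions."""
        return sum(self.attributes.get(t, 0.0) for t in {term.lower() for term in content_terms})


persona = Persona({"hiking": 2.0})    # e.g. seeded from existing enterprise data
persona.observe(["Trail", "running"])  # hiking decays to 1.8; trail and running get 1.0
articles = {"a": ["hiking", "boots"], "b": ["running", "trail"], "c": ["stocks"]}
ranked = sorted(articles, key=lambda k: persona.relevance(articles[k]), reverse=True)
print(ranked)  # ['b', 'a', 'c']
```

A random handle by itself provides pseudonymity rather than full anonymity, since the attributes can still be revealing; a production privacy model needs safeguards beyond this sketch.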

NLP-P (Natural Language Pre-Processing): Atigeo has built a set of healthcare domain-specific natural language processing tools on top of existing open source projects and multiple reference sources. The pre-processing, which can be applied to any domain, consists of body and sentence extraction, negation tagging, normalization, lemmatization and stop-word removal. It is used to improve the overall relevance of xPatterns both when corpora are generated and at query time.
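The healthcare-specific NLP-P tools are not public, so the following is a minimal, domain-neutral Python sketch of the pre-processing steps listed above, using only the standard library. Body extraction is omitted, the stop-word and negation-cue lists are tiny illustrative assumptions, negation scope is a simple rule (a cue negates the tokens that follow it until the next comma or semicolon), and a crude suffix rule stands in for a real lemmatizer such as NLTK’s `WordNetLemmatizer`.

```python
import re
from typing import List

# Tiny illustrative word lists; a real system would use curated, domain-specific lists.
STOP_WORDS = {"a", "an", "the", "of", "and", "or", "is", "was", "with", "to", "in", "for"}
NEGATION_CUES = {"no", "not", "denies", "without"}


def split_sentences(text: str) -> List[str]:
    """Sentence extraction: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lemmatize(token: str) -> str:
    """Toy stand-in for a real lemmatizer: 'ies' -> 'y', drop a plural 's' (but not 'ss')."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text: str) -> List[List[str]]:
    """Per sentence: normalize, tag negated tokens, lemmatize and remove stop words."""
    sentences = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z0-9]+|[,;]", sentence.lower())  # normalization + tokenization
        negated = False
        kept = []
        for token in tokens:
            if token in {",", ";"}:
                negated = False  # a clause boundary ends the negation scope
            elif token in NEGATION_CUES:
                negated = True
            elif token not in STOP_WORDS:
                lemma = lemmatize(token)
                kept.append("NEG_" + lemma if negated else lemma)
        sentences.append(kept)
    return sentences


print(preprocess("Patient denies chest pains, reports frequent headaches. No allergies."))
# [['patient', 'NEG_chest', 'NEG_pain', 'report', 'frequent', 'headache'], ['NEG_allergy']]
```

Tagging negated tokens (rather than deleting them) keeps “denies chest pain” distinct from “chest pain” in later relevance scoring.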

Applications

Atigeo has been working with several partners to solve important real-world Big Data analytics challenges. The following are some examples:

 

xPatterns Clinical Auto-Coding: Oftentimes there is an under-coding problem, where hospitals do not bill insurance companies correctly and therefore are not paid the accurate amount. Hospitals face a shortage of trained staff to translate Electronic Medical Records (EMRs) into the required ICD-9 and ICD-10, CPT, HCPCS, APC Grouper, Charge Master and DRG codes, and more. Atigeo has developed an intelligence system that automatically suggests correct codes for any number of EMRs. We can also run large data sets of past EMRs through our intelligence system to perform an audit or add more accurate codes, creating a complete view of actual clinical services for compliance or research purposes. In addition to NLP, the product combines multiple intelligent methodologies, including inference, classification, ontology and machine learning, that differentiate Atigeo from its competitors.
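The production auto-coder combines NLP, inference, classification, ontologies and machine learning; none of that is reproduced here. The Python sketch below shows only the simplest building block, under stated assumptions: matching phrases in a note against a phrase-to-code table while skipping negated mentions, so that “denies chest pain” is not coded. The table is a hypothetical three-code excerpt using the ICD-10-CM codes I10 (essential hypertension), E11.9 (type 2 diabetes mellitus without complications) and R07.9 (chest pain, unspecified); real coding requires the full code sets, coding guidelines and human review.

```python
import re
from typing import Dict, List

# Hypothetical, tiny phrase -> ICD-10-CM table; a production coder needs the full code sets.
PHRASE_TO_CODE: Dict[str, str] = {
    "essential hypertension": "I10",
    "hypertension": "I10",
    "type 2 diabetes": "E11.9",
    "chest pain": "R07.9",
}
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b")


def suggest_codes(note: str) -> List[str]:
    """Suggest codes for phrases in a note, skipping mentions negated earlier in the same clause."""
    found: Dict[str, None] = {}  # a dict keeps first-seen order and drops duplicate codes
    for clause in re.split(r"[.;,\n]", note.lower()):
        for phrase, code in PHRASE_TO_CODE.items():
            position = clause.find(phrase)
            if position == -1:
                continue
            if NEGATION.search(clause[:position]):
                continue  # e.g. "denies chest pain" should not be billed as chest pain
            found[code] = None
    return list(found)


note = "History of essential hypertension. Type 2 diabetes, well controlled. Patient denies chest pain."
print(suggest_codes(note))  # ['I10', 'E11.9']
```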

 

   

Research - Document Discovery: Because our analytics algorithms are specifically designed to solve unstructured data relevance problems, we applied them to a large set of unstructured text documents as our first Big Data usage scenario. We processed gigabytes of medical research documents (PubMed) and patents (USPTO), assigning relevance scores and generating domain concepts. Users can submit search queries to find relevant documents organized in clusters. The platform continues to improve by applying machine learning to users’ interactions with the documents.

 

Through xPatterns Relevance, document discovery is no longer a linear search problem. We have developed a visualization tool that lets users easily navigate among clusters of relevant documents, and sometimes even discover relevant concepts and documents that are not obvious from the original search query.
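xPatterns Relevance scoring and clustering are not reproduced here. To illustrate what ranking documents by relevance to a query involves, the Python sketch below swaps in a standard baseline, TF-IDF weighting with cosine similarity. The three document IDs and their texts are hypothetical stand-ins for PubMed abstracts and USPTO patents, and clustering of the results is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_table(docs: Dict[str, List[str]]) -> Dict[str, float]:
    """Inverse document frequency, log(N / df), for every term in the collection."""
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {term: math.log(len(docs) / count) for term, count in df.items()}


def vectorize(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unseen in the collection are ignored."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query: List[str], docs: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Return up to k document IDs ranked by cosine similarity, omitting zero scores."""
    idf = idf_table(docs)
    q = vectorize(query, idf)
    scored = sorted(((cosine(q, vectorize(toks, idf)), doc_id) for doc_id, toks in docs.items()),
                    key=lambda s: (-s[0], s[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


# Hypothetical mini-collection standing in for PubMed abstracts and USPTO patents.
docs = {
    "pubmed-1": "insulin resistance in type 2 diabetes".split(),
    "pubmed-2": "gene therapy for cystic fibrosis".split(),
    "uspto-1": "insulin pump with glucose sensor".split(),
}
print(search(["insulin", "glucose"], docs))  # ['uspto-1', 'pubmed-1']
```

The rarer query term (“glucose”) carries more weight than the common one (“insulin”), which is why the patent outranks the abstract.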

Clinical Analytics/Intelligence as a Service: Atigeo developed a clinical intelligence layer on the xPatterns framework. With easy access to pre-loaded medical domain toolboxes in the cloud, users can run analytics against their own large data sets, such as EMRs. The xPatterns analytical toolset lets users perform natural language data mining, correlations, etc. to find insightful patterns on a given research topic. For this use case, there are tremendous benefits to leveraging a cloud service, which xPatterns supports. Benefits include:

 

1. Scalability and agility: Initially, processing Big Data requires a large number of servers, which are no longer needed once the data has been processed and the results stored. Cloud services provide the flexibility to scale up and down as needed. By leveraging the cloud, an enterprise can optimize its processing power without waste.

2. Deployment and maintenance cost: Upfront investment in infrastructure deployments and skilled staffing is high, and keeping up with the latest software is expensive when advancement is happening very rapidly in this space.

3. Time: Cloud flexibility takes deployment time out of the equation, and it also gives an enterprise the ability to control turnaround time for output.
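The scalability argument is simple arithmetic: fixed infrastructure must be sized for the peak, whereas elastic cloud capacity is paid for only while it runs. The following Python sketch compares the two for a hypothetical one-day workload at an assumed flat price of 0.50 per server-hour; real cloud and on-premise prices differ and are not modeled.

```python
from typing import List


def fixed_cost(hourly_servers: List[int], price_per_server_hour: float) -> float:
    """Provision for the peak and pay for that capacity every hour."""
    return max(hourly_servers) * len(hourly_servers) * price_per_server_hour


def elastic_cost(hourly_servers: List[int], price_per_server_hour: float) -> float:
    """Pay only for the servers actually running in each hour."""
    return sum(hourly_servers) * price_per_server_hour


# Hypothetical day: a 6-hour, 100-server burst to process the data,
# then 4 servers for the remaining 18 hours to serve the stored results.
demand = [100] * 6 + [4] * 18
print(fixed_cost(demand, 0.50))    # 1200.0
print(elastic_cost(demand, 0.50))  # 336.0
```

Under these assumptions the elastic option costs 336 instead of 1,200, because 72% of the provisioned server-hours would otherwise sit idle.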

Conclusion

The question today is no longer whether a company should invest in Big Data analytics to stay competitive, but how to gain the insight hidden beneath the surface of the data, and how to lower total cost of ownership and improve time to market in the face of these challenges. Big Data brings both opportunities and challenges, and both are met by xPatterns, which lowers barriers to entry in the Big Data space by taking away the complexity and advancing the insight. Infrastructure and talent acquisition should not be any enterprise’s major concern; the focus should be on the solution, and Atigeo’s xPatterns is the enabling “Big Data Intelligence platform” for the 21st century.
