
How To Get More Data From Your Computer


(1)

Industry Perspective:

Big Data and Big Data Analytics

David Barnes

Program Director, Emerging Internet Technologies

IBM Software Group

(2)

What is Big Data?

(3)

The Adjacent Possible

(4)

Inexpensive disk

+ Increased processing power

+ Data Warehouse

+ The Web

+ X

= Big Data

X = Sensors used to gather climate information, posts to social media sites, digital pictures and videos, transaction records, cell phone GPS signals, and more.

(5)

© 2010 IBM Corporation

161 exabytes of data were created in 2006 –

3 million times the amount of information contained

in all the books ever written.

In 2010 the number reached 988 exabytes.

IDC estimates that 1.8 zettabytes were created and

replicated in 2011.

(6)


Every day, people create the equivalent of 2.5

quintillion bytes of data from sensors, mobile devices,

online transactions, and social networks.

Every month people send one billion Tweets and post

30 billion messages on Facebook.

90% (or more) of the world’s data is unstructured.

(7)

The true nature of information

(8)

Is noisy

Is oftentimes dirty

Is often full of valuable information

Unstructured Data

(9)


Big Data has swept into every industry

and business function.

Businesses need to put the power of Big Data analytics in the hands of their business employees; the title “Data Scientist” is somewhat misleading.

“Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers.”

– McKinsey Global Institute

The Big Data Imperative


Big Data Business

Patterns

Computational Journalism

Chief Legal Officer

Retail Business Planner

IT Systems Management

Pharma - Clinical Trials

Business Fraud Detection

Evidence Based Medicine

Web Archiving

. . .

(10)


Today’s Problem

Data growing at compound annual growth of 60%/year

Storage capacity continues to increase dramatically

Storage access speeds have not kept up

At a transfer speed of 500 MB/sec, 1 terabyte of data will require ~30 mins to read from a single drive
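The read-time figure can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and an even split of the data across drives that are read in parallel; the function name `read_seconds` is only illustrative.

```python
# Back-of-the-envelope scan times. Decimal units are an assumption.
TB = 10**12
MB = 10**6


def read_seconds(total_bytes: int, bytes_per_sec: int, drives: int = 1) -> float:
    """Seconds needed to read the data when it is split evenly across
    `drives` drives that are all read in parallel."""
    return total_bytes / (bytes_per_sec * drives)


single = read_seconds(1 * TB, 500 * MB)             # 2000.0 s, about 33 minutes
parallel = read_seconds(1 * TB, 500 * MB, drives=100)  # 20.0 s

print(f"1 drive: {single / 60:.1f} min; 100 drives: {parallel:.0f} s")
# prints "1 drive: 33.3 min; 100 drives: 20 s"
```

The model ignores seek time, network transfer and coordination overhead, so real numbers would be somewhat worse than these ideal figures.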

Enter Map/Reduce

• Automates the mechanisms of large-scale distributed computation (i.e., work distribution, load balancing, replication, failure/recovery)

• Divide & Conquer: 1 terabyte split among 100 drives will require ~20 seconds to read

• M/R parallel processing model provides cost effective framework for new generation

of analytic applications on unstructured or semi-structured data
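To make the Map/Reduce model concrete, here is a minimal single-process sketch of the classic word-count example in plain Python. It only illustrates the map, shuffle and reduce phases; it is not Hadoop's API, and the function names are assumptions chosen for this example. A real framework would run the map and reduce steps on many machines and handle distribution and recovery itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values collected for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # Each split could be mapped on a different machine; here they run in sequence.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle(mapped))


splits = ["big data is big", "data is noisy"]
counts = word_count(splits)
print(counts)
```

Because each split is mapped independently and each key is reduced independently, both phases can be spread across many drives and processors, which is what makes the divide-and-conquer read times above achievable.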

(11)


Requirement: A New Class of Big Data Applications

Big Data analytics must be

brought to the line-of-business

user.

• Leverage easy-to-use

manipulation metaphors

• Use natural language

technologies for analytics

• Provide rich visualizations to

quickly identify insights

(12)

Demo

Buyer Sentiment Analysis

(13)

© 2010 IBM Corporation Sharenomics - Rise of Social Economy Slide

Social Media: Chilean Earthquake 2010

2010 Chilean earthquake fifth largest

earthquake in recorded history

The affected areas suffered major

devastation - buildings, airports,

hospitals, prisons, bridges, and roads

were severely damaged

Land-based communications systems

suffered major outages

The wireless 3G infrastructure remained

intact and operational


(14)


Social networking on wireless

networks major form of

communications

Extreme Blue students collected 226 million Tweets, analyzed them, and categorized them by incidence type and location

Tweets included “Can I get food?”, “Can I get gas?”, “Are the bridges down?”, and images

The results were visualized

Completed in ~12 weeks
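The slides do not say how the Tweets were categorized. As a rough illustration of grouping messages by incidence type and location, the sketch below uses simple keyword matching. The category names, keyword lists, record layout and sample Tweets are all hypothetical, not taken from the project.

```python
from collections import Counter

# Hypothetical keyword lists; the project's real categories are not given in the slides.
CATEGORIES = {
    "food": ("food", "water"),
    "fuel": ("gas", "fuel"),
    "infrastructure": ("bridge", "road", "airport"),
}


def categorize(text: str) -> str:
    """Return the first category whose keyword appears in the text, else 'other'."""
    lowered = text.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def tally(tweets):
    """Count Tweets per (incidence type, location) pair."""
    return Counter((categorize(t["text"]), t["location"]) for t in tweets)


# Made-up sample records for illustration only.
tweets = [
    {"text": "Can I get food?", "location": "Talca"},
    {"text": "Can I get gas?", "location": "Santiago"},
    {"text": "Are the bridges down?", "location": "Talca"},
]
result = tally(tweets)
print(result)
```

Substring matching is crude (for example, "gas" also matches "gasp"), so a real analysis at this volume would need proper text analytics rather than a keyword table.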

(15)

Big Data = Volume, Variety and Velocity

Volume - Scale from terabytes to zettabytes

Variety - Relational and non-relational data types from an ever-

expanding variety of sources

Velocity - Streaming data and large volume data movement

(16)


(17)
(18)
(19)

The supercomputer is based on over 1,200 high-powered IBM System x servers and can perform 150 trillion calculations per second, equivalent to 30 million calculations per Danish citizen per second.

Vestas expects its data sets will grow to 20-plus

petabytes over the next four years.

(20)


(21)

© 2011 IBM Corporation

Seton Healthcare Family

Reducing CHF readmission to improve care

Business Challenge

Seton Healthcare strives to reduce the occurrence of high-cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.

What’s Smart?

The IBM Content and Predictive Analytics for Healthcare solution will help to better target and understand high-risk CHF patients for care management programs by:

• Utilizing natural language processing to extract key elements from unstructured History and Physical, Discharge Summaries, Echocardiogram Reports, and Consult Notes

• Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data

• Providing an interface through which providers can intuitively navigate, interpret and take action

Smarter Business Outcomes

Seton will be able to proactively target care management and reduce readmission of CHF patients.

Teaming unstructured content with predictive analytics, Seton will be able to identify patients likely for readmission and introduce early interventions to reduce cost, mortality

IBM solution

• IBM Content and Predictive Analytics for Healthcare

• IBM Cognos Business Intelligence

• IBM BAO solution services

“IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage information in new ways not possible before. We can access an integrated view of relevant clinical and operational information to drive more informed decision making and optimize patient and operational outcomes.”
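The Seton slide refers to predictive models with high positive predictive value (PPV). PPV is the share of patients flagged by a model who really do have the outcome, here readmission. The sketch below shows the standard calculation; the function name and the counts are made up for illustration and are not Seton's results.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): of all patients the model flagged,
    the fraction that were actually readmitted."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no patients were flagged")
    return true_positives / flagged


# Made-up counts: 100 patients flagged, 80 of them actually readmitted.
ppv = positive_predictive_value(true_positives=80, false_positives=20)
print(ppv)  # 0.8
```

A high PPV matters for care management because each flagged patient triggers an intervention, so false positives translate directly into wasted effort.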

(22)

IBM Content and Predictive Analytics for Healthcare

The Seton CHF Readmission Solution

Unstructured Data (Cerner Clinical Documentation: History and Physical, Discharge Summary, Echocardiogram)

Structured Data (Avega Cost Data, DSS Admission History, DSS Procedure History, Cerner Clinical Events)

Raw Information

Search and Visually Explore (Mine)

Monitor, Dashboard and Report (Cognos BI)

Question and Answer*

Custom Solutions

Dynamic Multimode Interaction

IBM Content and Predictive Analytics

Content Analytics

• Natural Language Processing

• Medical Fact and Relationship Extraction (Annotation)

• Trend, Pattern, Anomaly, Deviation Analysis

Predictive Analytics

• Predictive Scoring and Probability Analysis

Analyzed and Visualized Information

Health Integration Framework

Data Warehouse and Model

Master Data Management

Advanced Case Management

Business Analytics

Partners (HLI)

Specialized Research

IBM Watson for Healthcare

Confirm hypotheses or seek alternative ideas with confidence-based responses from learned knowledge*

Utilizing natural language processing to extract key elements from unstructured History and Physical and Discharge Summary

Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data

Providing an interface through which providers can intuitively navigate, interpret and take action

(23)

What Really Causes Readmissions at Seton

Key Findings

The Data We Thought Would Be Useful … Wasn’t

• 113 candidate predictors from structured and unstructured data sources

• Structured data was less reliable than unstructured data, which increased the reliance on unstructured data

New Unexpected Indicators Emerged … Highly Predictive Model

• 18 accurate indicators or predictors (see next slide)

Predictor Analysis          % Encounters, Structured Data    % Encounters, Unstructured Data
Ejection Fraction (LVEF)    2%                               74%
Smoking Indicator           35% (65% Accurate)               81% (95% Accurate)
Living Arrangements         <1%                              73% (100% Accurate)
Drug and Alcohol Abuse      16%                              81%
Assisted Living             0%                               13%

97% at 80th percentile

49% at 20th percentile

(24)

Visualizing the Results: Readmissions Dashboard

The Cognos dashboard reporting system can help in monitoring the key clinical, operational and financial metrics. More importantly, it makes it possible to track down the top-priority cases for case management.

1. Clinical Statistics: admission count, readmission count and readmission rate

2. Operational Statistic: counts of different length-of-stay periods

3. Financial Statistic: total direct cost by total admission and by readmission

4. Mortality: mortality rate

5. Average length of stay

6. Average direct cost by total admission and by readmission only

7. PA Model Score: distribution of propensity of readmission

(25)


(26)


USC Annenberg School of Communications

(27)

InfoSphere Streams

(28)

Big Data Platform Vision

Big Data Enterprise Engines

Big Data Solutions

Internet Scale Analytics

Streaming Analytics

Developers End Users Administrators

Big Data User Environments

Bringing Big Data to the Enterprise

Client and Partner Solutions

Open Source Foundational Components

Hadoop, MapReduce, HDFS, HBase, Pig, Lucene, Jaql

AGENTS INTEGRATION

Marketing Warehouse Appliances

Data Warehouse

Database

Analytics

Business Intelligence

Master Data Mgmt

InfoSphere Warehouse

Netezza

InfoSphere MDM

DB2

SPSS

Cognos

Unica
