1 © 2011 IBM Corporation
From Data to Foresight:
Laura Haas, IBM Fellow
IBM Research - Almaden
The road from data to foresight is long
Must acquire, integrate, enhance and align
Must deal with missing and incomplete data
Must store, protect, and manage
Must create models and other analytics and test them
Must run these analyses efficiently over large data volumes
Must understand and share results
Requires significant (and expensive) EXPERTISE in data management,
systems, analytics, and the domain
Takes TIME
?
How can I reduce my?
Consumer Reports
[Figure: dependency graph of a rainfall-runoff model. Computation steps shown: rainfall error, saturation & surface runoff, overland routing, percolation, interflow, base flow, upper layer evaporation, lower layer evaporation, miscellaneous fluxes, solve state equations, update state. Outputs: instantaneous runoff, routed runoff, total water (upper layer, lower layer). Legend: flux computations, state computations. Note: in addition to dependencies shown, most flux calculations are dependent on values of state variables at the previous timestep.]
The 4 V’s of data
Volume: Data at Rest. Terabytes to exabytes of existing data to process
Velocity: Data in Motion. Streaming data, milliseconds to seconds to respond
Variety: Data in Many Forms. Structured, unstructured, text, multimedia
Veracity*: Data in Doubt. Uncertainty due to data inconsistency & incompleteness, ambiguities, latency, deception, model approximations
Valuable new insights are hidden in this wealth of data!
Identify criminals and threats
from disparate video, audio,
and data feeds
Make risk decisions based on
real-time transactional data
Predict weather patterns to plan
optimal wind turbine usage, and
optimize capital expenditure on
asset placement
Detect life-threatening
conditions at hospitals in
time to intervene
Fortunately, new platforms can unlock the value of data
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform
Systems Management
Application Development
Visualization & Discovery
Accelerators
Information Integration & Governance
Hadoop System
Stream Computing
Data Warehouse
New analytic applications drive the
requirements for a big data platform
• Integrate and manage the full
variety, velocity and volume of data
• Apply advanced analytics to
information in its native form
• Visualize all available data for
ad-hoc analysis
• Develop new analytic applications
• Optimize and control scheduling of
many simultaneous analyses
Outcome-based medicine vision: Leverage public and private content, rich analytics
to improve treatment outcomes
Research & Development and Intellectual Property
Target Identification and Validation; Lead Discovery and Optimization; Safety and Efficacy
Genomics; Proteomics; Metabolomics; Chemical and Biological Extraction, Profiling, Analytics, and Reasoning
Clinical Decision Support
Patient Similarity and Segmentation; Patient Cohorts for Clinical Support; Clinical Genomics Analysis
Comparative Effectiveness Research; Predictive Modeling of Outcome; Disease Progression Analysis; Treatment Cost Analysis; Temporal Analysis
Patient experience and social community support
Patient first-hand experiences; Social community development and support
Target Selection; Candidate Selection; Development Selection; Target Identification; Lead Discovery; Preclinical Development; Clinical I, II, III; Patient Experience; Launch; Patient Outcome; Medical Care
An Example: Leveraging data to accelerate life sciences R&D
► R&D
Find white space and gain insight into complex chemical and biological patents; gain early insights into a given target-compound match from past patents for better research target and target-compound selection decisions
► Legal
Detect IP infringement earlier and increase the quality of patent filings
► Corporate Strategy / Business Dev
Identify collaboration and acquisition targets for greater research value and effectiveness, and find patent in- and out-licensing candidates for efficient management and monetization of IP
The Benefits
► Valuable insights into competitive landscape, white space, and IP portfolio
► High quality chemical extractions available hours after patents are available from patent authorities
► Previously unobtainable insights at the scientists’ fingertips with the touch of a button
► Fast and easy search and analysis drastically reducing search time from weeks and months to just minutes
The Situation
Highly volatile, increasingly complex environment
Traditional R&D is not delivering
New approaches are needed
Collaborative R&D models
The new normal requiring open platforms, clear boundaries and protection
Agile responses
Vital to drive fast adaptation to changing competitive IP landscape, including adjustments to strategy, portfolio investments and partnerships
Effective IP portfolio management
Delivering key value for out-licensing and monetizing of non-core IP
Strategic ecosystem development
Growth and competitive differentiation through aggressive collaboration, early identification of acquisition and recruitment targets
IBM BAO strategic IP insight platform (SIIP)
A unique and powerful data and analytics offering
Aggregates and processes
30M+ patents and scientific literature from around the globe
Automatically extracts
chemical and biological entities – 200M+ chemical compound instances to date
Generates
chemical and biological entity profiles
Searches and analyzes
using natural language-based inputs for key relationship discovery and IP insights
Reasoning
about causality of drug, diseases, targets, and efficacy and side effects
Integrates and enhances
existing data and applications
A Smart Entity Profiling, Analytics and Reasoning Methodology
Medicine
Disease
Patients
IP - Legal status - Assignee - Foreign filings - Expiration Date - . . .
Drug - Activity - Half life - Protein Binding - . . .
Physical - Computational - Molecular Weight - MF, Bp, Mp - . . .
Spectral - IR - NMR - Mass Spectra - . . .
Toxicity - Clinical Trials - Pre-Clinical - . . .
Pathways - Metabolic - Genetic - Environmental - Cellular - Organism - . . .
Screening - Activity - . . .
Genetic - . . .
Organisms - Organism - Organ - Cell - Tissue - . . .
Life styles - . . .
Reactions - Enzymes - . . .
Medical History - . . .
Patents
Literature
Experimental
HTS
Medical Records
Clinical
Business
Social
• An integrated framework leveraging a broad set of data and many types of analytics:
• Hypothesis generation
• Entity extraction and profiling
• Relationship discovery and analytics
• Summarization
• Reasoning
• Scoring and ranking
• Predictive modeling
• Key steps:
• Extract key entities
• Combine information from multiple sources
• Discover relationships
Medical Records
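The three key steps above (extract key entities, combine information from multiple sources, discover relationships) can be illustrated with a toy sketch. This is not how the IBM platform works: the vocabulary, the sample sentences, the function names, and the use of simple substring matching and document-level co-occurrence counts are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical vocabulary (assumption): entity name -> entity type.
VOCAB = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "headache": "disease",
    "cox-1": "target",
}

def extract_entities(text):
    """Step 1: return the set of known entities mentioned in the text."""
    lowered = text.lower()
    return {name for name in VOCAB if name in lowered}

def combine_sources(sources):
    """Step 2: map each entity to the set of source names that mention it."""
    profile = defaultdict(set)
    for source_name, docs in sources.items():
        for doc in docs:
            for entity in extract_entities(doc):
                profile[entity].add(source_name)
    return dict(profile)

def discover_relationships(sources):
    """Step 3: count how often two entities co-occur in the same document."""
    counts = defaultdict(int)
    for docs in sources.values():
        for doc in docs:
            for pair in combinations(sorted(extract_entities(doc)), 2):
                counts[pair] += 1
    return dict(counts)

# Made-up sample documents (assumption), grouped by source type.
SOURCES = {
    "patents": ["Aspirin inhibits COX-1."],
    "literature": ["Aspirin relieves headache.", "Ibuprofen relieves headache."],
}

profiles = combine_sources(SOURCES)
relations = discover_relationships(SOURCES)
# Rank candidate relationships by co-occurrence count, then alphabetically.
ranked = sorted(relations.items(), key=lambda kv: (-kv[1], kv[0]))
```

Here `profiles` records which sources mention each entity, and `ranked` orders entity pairs by how often they appear together. A real system would replace the dictionary lookup with proper chemical and biological entity recognition and the co-occurrence count with stronger relationship analytics.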