• No results found

Data Science & Multi-Stakeholder Partnerships

N/A
N/A
Protected

Academic year: 2021

Share "Data Science & Multi-Stakeholder Partnerships"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Science &

Multi-Stakeholder Partnerships

Annika Jimenez

Global Head of Data Science Services

(2)
(3)

Data Science Value Chain

Instrumen-tation Logs Capture Store and Prepare Transform Access Development Model Deploy Applications Process Change Product

Engineer Engineer Platform DBA

Data Engineer/Program mer Data Engineer Data Scientist Platform

(4)

Sample Scenario – Data Sources

Clinical Trial Data

(Full EMR, imaging, clinical notes, disease progression, EDSS)

Experimental Data

(biosamples, genomics, mass-spec)

Operational Data on Clinical Trial

Execution

EMR

Medical Claims

(Humana, CMS, others)

Syndicated Prescription

(IMS)

Pharmacy Claims

(Humana, i3)

Patient Communities

Website Logs

Call Center Data

Public genomics databases

Literature

Patient Registry

Self Reporting

(PatientsLikeMe, Social Media)

Discovery

Clinical

Trials

Patient

Engagement

Provider Data

(Factual, DocGraph)

+

+

+

(5)

What Is A “Data Scientist”?

Programmi

ng

Sk

ill

s

Mathematical/Statistical Skills

(6)

Pivotal Data Science Dream Team

Derek Lin – Network Security, Fraud Detection, Speech and Language

Processing, (Principal Scientist at RSA, M.S. in Signal Processing, USC)

Hulya Farinas – Optimization, Resource Allocation in Healthcare (Modeler

at M-Factor, IBM, Ph.D. in Operations Research, University of Florida)

Kaushik Das – Mathematical Modeling in Energy, Retail and Telco(Director of Analytics at M-Factor, M.S. in Mineral Engineering, UC Berkeley)

Rahel Jhirad – Quantitative Modeling and Risk Management in Trading and Finance (Global Risk Management at Saloman Brothers, Morgan Stanley, Ph.D. in Economics, Princeton, M.S. in Mathematics, Courant Institute)

Sarah Aerni – Genomics and Machine Learning (Ph.D. in Biomedical

Informatics, Stanford)

Mariann Micsinai – Next Generation Sequencing (Market Risk Management Associate at Lehman Brothers, Ph.D. in Computational Biology, NYU and Yale)

Emily Kawaler – Clinical Informatics and Machine Learning (M.S. in

Computer Sciences, University of Wisconsin-Madison)

Joseph Zadeh – IT/Network Traffic and Financial Modeling (Ph.D. in

Mathematics, Purdue)

Victor Fang – Imaging and Graph Analytics, Machine Learning (Sr. Scientist at Riverain Medical, Ph.D. in Computer Sciences, University of Cincinnati)

Anirudh Kondaveeti – Trajectory Data Mining and Machine Learning (Ph.D. in Computing & Dec. Systems Eng, Arizona State University)

Hong Ooi – Insurance and Finance Risk Modeling (Statistician at ANZ, Ph.D. in Statistics, Australian National University)

Michael Brand –Text, Speech and Video Research for Retail, Finance and

Gaming (Chief Scientist at Verint Systems, M.S. in Applied Mathematics, Weizmann Institute)

Kee Siong Ng – Data Mining in Healthcare

(Sr. Data Miner at Medicare Australia, Ph.D. in Computer Science, and Postdoctoral Fellow, Australian National University)

Noah Zimmerman – Statistics and Immunology (Ph.D. in Biomedical

Informatics, Stanford)

Noelle Sio – Digital Media Analytics and Mathematical Modeling(Sr. Analyst at eHarmony, Fox Interactive Media (Myspace), M.S. in Applied Mathematics, Cal Poly Pomona)

Jin Yu – Stochastic Optimization, Robust Statistics in Machine Learning,

Computer Vision (Research Associate at U of Adelaide, Ph.D. in Machine Learning, Australian National University)

Rashmi Raghu – Computational Methods and Analysis (Ph.D. in Mechanical Engineering, Stanford)

Woo Jung – Bayesian Inference and Demand Analysis (Sr. Statistician at

M-Factor, M.S. in Statistics, Stanford)

Jarrod Vawdrey – Marketing Analytics & SAS (Analytics Consultant at Aspen Marketing, B.S. in Mathematics, Kennesaw State University)

Niels Kasch – Text Analytics and NLP (Ph.D. in Computer Science, UMBC) • Vivek Ramamurthy – Online Learning, Stochastic Modeling, Convex

Optimization (Ph.D. in Operations Research, UC Berkeley)

Srivatsan Ramanujam – NLP and Text Mining

(Natural Language Scientist at Sony, Salesforce.com, M.S. in Computer Sciences, UT Austin)

Alexander Kagoshima – Time Series, Statistics and Machine Learning (M.S. in

(7)
(8)

Data Science Curricula

Online:

“Introduction to Data Science”

University of Washington

https://www.coursera.org/course/datasci

“Machine Learning”

Stanford University

https://www.coursera.org/course/ml

“Introduction to Databases”

Stanford University

https://www.coursera.org/course/db

“Introduction to Artificial Intelligence”

Stanford University

http://www.udacity.com/overview/Course/cs271/Cou

rseRev/1

Syracuse: Certificate of Advanced Study in

Data Science

http://ischool.syr.edu/future/cas/datascience.as

px

Northwestern: M.S. Analytics

http://www.analytics.northwestern.edu/

North Carolina State University, Institute for

Advanced Analytics: M.S. Analytics

http://analytics.ncsu.edu/

Columbia Institute of Data Science

Lots More!

(9)

Data Science and Big Data Analytics

Course and Certification

Course Overview

“Open” curriculum

Practitioner’s approach

Enables immediate participation

on analytics projects

Prepares for EMC Proven

Professional Data Science

Associate Certification

(10)

Hard Problem Solving

Academia

Data

Owners

Distributed

Computing

• Genomics

• Video

• Energy

*

Innovation

(11)

RNA Sequencing

Customer

Translational Pathology Department of a

University

Business Problem

Reducing time spent on processing and

analyzing RNA sequences

Challenges

The center identified three bottleneck steps

in their RNA sequencing pipeline: Alignment,

splice junction detection, and gene level

expression

Solution

Algorithms implemented in UAP are three

times faster than the client’s existing

platform.

In addition to

improving the

algorithm run time,

we identified

modifications to the

method capable of

improving prediction

on short exons.

(12)

Pivotal Industry Collaboration

Analytics Workbench

1000-node Hadoop cluster

Publicly available data sets

Largest test-bed for Hadoop, Mahout, etc.

Universities and non-academia

Mission: Provide a collaborative, community-focused, open and

innovative platform for rapid discovery and demonstration of

solutions to the world's biggest data challenges.

(13)

References

Related documents

The cluster variables - age, type of organisation structure, formalization level, number of managerial hierarchy levels - have been chosen based on the works by (Hanks et al.

We detect a significant positive relationship between knowledge stock at non-university research institutes and the number of entrepreneurs with high affinity to research and

The decision variable comes in strongly statistically signicant as a driver of the standard deviation (eq. 13), where Governing Council days with (bigger) policy rate changes

6/d. The court stated that § 186 was different from mail and wire fraud predicate acts and the in- clusion of § 186 violations as a specific predicate act suggested

In the early stages of our collaboration, I helped the team to contextualise our work within the emerging discipline of Transition Design, which provided a framework

The value of wear rates is directly proportional to the sliding speed for palm stearin and has higher wear rates compared to commercial mineral engine oil.. Whereas

A Master is a Master indeed, a Master in all three phases of life: A Guru or Master on the physical plane, shar- ing our joys and sorrows, guiding affectionately each one of us in

Switch the logical standby database to the primary database role: SQL> alter database commit to switchover to primary;..