• No results found

!"#$%&'()*+*,-%')').*/-(.-%0*/-121)&1-23

N/A
N/A
Protected

Academic year: 2021

Share "!"#$%&'()*+*,-%')').*/-(.-%0*/-121)&1-23"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

!"#$%&'()*+*,-%')').*/-(.-%0*/-121)&1-23

Ivo D. Dinov

Assoc. Director

MIDAS

Ed & Training

Nandit Soparkar

CEO, Ubiquiti

Patrick Harrington

Co-Founder

Chief Data Scientist

CompGenome.com

Erin Shellman

Research Scientist

AWS

(2)

MIDAS Data Science

Education & Training:

Challenges & Opportunities

Ivo D. Dinov

Associate Professor, School of Nursing

Associate Education Director, Michigan Institute for Data Science (MIDAS)

University of Michigan

(3)

4%&'()%5*6'.*7%&%*8$'1)$1**

9#--'$#5%*9()2&155%&'()3

Recently established Data Science institutes and curricular programs

!

(4)

:;7<8*6'.*7%&%*!$(2=2&103

http://socr.umich.edu/docs/BD2K/BigDataResourceome.html

!

:;7<8*6'.*7%&%*!$(2=2&103

http://socr.umich.edu/docs/BD2K/BigDataResourceome.html

http://socr.umich.edu/docs/BD2K/BigDataResourceome.html

http://socr.umich.edu/docs/BD2K/BigDataResourceome.html

MIDAS

(5)

:;7<8*>($#2*()*71?15(@').*6'.*7%&%*8A'5523

o

!

Listening

: streams, analyzing sentiment, intent and trends;

o

!

Looking

: searching, indexing and memory management of heterogeneous

datasets; Raw, derived or indexed data as well as meta-data;

o

!

Programming

: Handling Map-Reduce/HDFS, No-SQL DB, protocol

provenance, algorithm development/optimization, pipeline workflows;

o

!

Inferring

: Principles of data analyses, Bayesian modeling, inference,

uncertainty and quantification of likelihoods; Reasoning & logic; Analytics:

Regression, feature selection, dimensionality reduction, temporal patterns,

validation; Actionable Knowledge

o

!

Machine Learning

: Classification, clustering, mining, information

extraction, knowledge retrieval, decision making;

o

!

Predicting

: Forecasting, neural models, deep learning, and research topics;

o

!

Summarizing

: Presentation of data, processing protocol, analytics

(6)

8@1$'B$*7%&%*8$'1)$1*,-%')').*9C%551).123

o

!

Technological advances and

students’ IT

skills far outpace educators’

technological expertise

and the current

rate of IT adoption in college curricula

o

!

Lack of

open learning resources and

interactive interoperable platforms

o

!

Difficulty sharing

data, tools, materials

& activities across different disciplines

o

!

Limited technology-enhanced

continuing

instructor education opportunities

o

!

Discipline-specific

knowledge boundaries

o

!

Skills for communication and teamwork

involving cooperation in dynamic groups

of dispersed researchers

o

!

Ability to aggregate & harmonize data

(e.g., fusion of qualitative and quantitative

elements).

knowledge boundaries

(e.g., fusion of qualitative and quantitative

and the current

data, tools, materials

continuing

knowledge boundaries

knowledge boundaries

continuing

(7)

9(-1*:;7<8*!"#$%&'()*9(0@()1)&2

3

o

!

Drivers

: Motivations, datasets (6D of Big Data),

challenges, applications

!

o

!

Methods

: foundational and trans-disciplinary

scientific techniques

!

o

!

Tools

: web-services, software, code, platforms

!

o

!

Analytics

: practice of data interrogation

!

(8)

!D'2&1)&*7%&%*8$'1)$1*

!"#$%&'()*+*,-%')').3

o

!

Undergraduate DS Degree Program

!

o

!

Stats + Engineering

!

o

!

Started Fall 2015

!

o! www.eecs.umich.edu/eecs/undergraduate/data-science!

Undergraduate DS Degree Program

o

!

DS Summer Institute (Public Health)

!

o

!

20+ faculty, 40+ trainees

!

o

!

Full support/in residence (1 month)

!

o

!

BigDataSummerInst.sph.umich.edu

!

o

!

Big Data Summer Bootcamp

!

o

!

5 years running,

Business-Engineering

!

o

!

Practice of Econ-Bio-Social

Analytics

!

o

!

ibug-um15.github.io/2015-summer-camp

!

o

!

Graduate DS Certificate Program

!

o

!

Transdisciplinary

(SM,SPH,LS&A,SN,CoE,SI)!

o

!

12 cr, Modeling+Technology

+Practice

!

o

!

http://MIDAS.umich.edu/certificate

!

Full support/in residence (1 month)

BigDataSummerInst.sph.umich.edu

(9)

:;7<8*!"#$%&'()*+*,-%')').*EF(').*>(-G%-"H

3

o

!

Graduate Data Science Certificate Program

!

o

!

Enrollment of 100 (UMich) students

!

o

!

Develop the Online Graduate Data Science Certificate Curriculum

!

o

!

Develop a 5-yr dual undergrad-grad MS Program in Data Science

!

o

!

DS Summer Institute (Public Health)

!

o

!

Big Data Summer Bootcamp

!

o

!

MIDAS Seminar Series (academia, industry, government, partners)

!

o

!

Web-streamed and archived for broader impact

!

o

!

National Events

!

o

!

BD2K, Midwest Big-Data-Hub, Math, Stats, Computer Vision,

Machine Learning, Informatics, …

!

o

!

Funding: NIH, NSF, Foundation: Research & Traineeship

(10)

:'$C'.%)*7'I1-1)$1*')*3

7%&%*8$'1)$1*!"#$%&'()*+*,-%')').3

o

!

Public-Private-Partnership (PPP)

Model

!

o

!

Integration of Research + Practice +

Education

!

o

!

Problems + Modeling + Technology +

Applications

!

UMich

Industry

Fed

Orgs

Community

Problems + Modeling + Technology +

Applications

Problems + Modeling + Technology +

(11)

8&%&'2&'$2*J)5')1*9(0@#&%&'()%5*K12(#-$1*E8J9KH*

;)&1.-%&'()*(L*!"#$%&'()M*K121%-$C*+*/-%$&'$1

3

o

!

Probability and Statistics EBook

!

o

!

Motivation, Methods, Techniques,

Practice

!

o

!

wiki.socr.umich.edu/index.php/EBook

!

o

!

Data sets

!

o

!

100’s of Research, Observed & Simulated

!

o

!

wiki.socr.umich.edu/index.php/

SOCR_Data

!

o

!

Hands-on Activities

(Team-focused)

!

o

!

Modeling, Inference, Concept Demos

!

o! wiki.socr.umich.edu/index.php/SOCR_EduMaterials!

o

!

Applets and Webapps

!

o

!

Over 500 Web tools

!

o

!

Distributed services based on Java, HTML5/

JavaScript

!

SOCR has over 9.6M users worldwide since 2002

!

www.SOCR.umich.edu

!

Features: Free, Cloud-service, no-barrier access, LGPL/ CC-BY!

Ref: Dinov et al., TS (2013), DOI: 10.1111/test.12012!

100’s of Research, Observed & Simulated

(Team-focused)

Modeling, Inference, Concept Demos

(12)

:;7<8N8J9K*7%2CO(%-"P*6'.*7%&%*>#2'()*!D%0@51

3

(13)

6'.*7%&%*<)%5=&'$2

3

http://socr.umich.edu/HTML5/Dashboard

!

o

!

Web-service combining and

integrating multi-source

socioeconomic and medical

datasets

!

o

!

Big data analytic processing

!

o

!

Interface for exploratory

navigation, manipulation and

visualization

!

o

!

Adding/removing of visual queries

and interactive exploration of

multivariate associations

!

o

!

Powerful HTML5 technology

enabling mobile on-demand

computing

!

(14)

:;7<8*J)5')1*7%&%*8$'1)$1*!"#$%&'()*/-(.-%0*E78!/H3

MIDAS plans to

:

o

!

Develop new hands-on

MOOC DS courses

(data, learning modules,

services, code snippets, applications/partners) …

o

!

Deploy MIDAS

Training as a Service (TaaS)

MOOC platform

o

!

Compile

Datasets

(research-derived, observational, simulated) – identify,

aggregate, manage, navigate, and service exemplary Big Data sets

o

!

Faculty/Mentors

–instructors, collaborators, partners, students and staff to

develop a core DS course-series (4 courses)

o

!

Instructional resources and complete end-to-end

learning modules

o

!

Track and Validate the DSEP Program

– annual reports (MIDAS ETC),

(15)

How to Engage & Partner in MIDAS Education?

Partners

!

o

!

Open-Source

Community

!

o

!

Academia

!

o

!

Trainees

!

o

!

Industry

!

o

!

Non-profit

!

o

!

Government

!

o

!

Philanthropy

!

o

!

Community

Orgs

!

Activities

!

!

Drivers

!

o

!

Challenges

!

o

!

Datasets

!

o

!

Applications

!

Methods

!

!

Technologies

!

!

Tools/

Services

!

!

Training Opps

!

o

!

Mentorship

!

o

!

DS Projects

!

o

!

Internships

!

(16)

:;7<8*!"#$%&'()*+*,-%')').*,1%0

3

MIDAS Education & Training Committee

Ivo Dinov, Margaret Hedstrom, Honglak Lee, Sebastian Zöllner, Richard Gonzalez, Kerby Shedden

Other Contributing MIDAS Faculty

Engineering

Alfred Hero: Electrical Engineering and Computer Science; Biomedical Engineering, Stats H. V. Jagadish: Electrical Engineering and Computer Science

Mike Cafarella: Computer Science and Engineering

Karthik Duraisamy: Atmospheric, Oceanic, and Space Sciences Judy Jin: Industrial & Operations Engineering

Dragomir Radev: School of Information; Computer Science and Engineering; Linguistics

Information & Health Sciences

Brian Athey: Computational Medicine and Bioinformatics, Medicine Carl Lagoze: School of Information

Qiaozhu Mei: School of Information Jeremy Taylor, Biostatistics, Public Health

Basic Sciences

Vijay Nair: Statistics & Engineering

George Alter: Institute for Social Research, History Christopher Miller: Physics & Astronomy

August Evrard: Physics & Astronomy Anna Gilbert: Mathematics, Engineering

Stephen Smith: Ecology and Evolutionary Biology

Ambuj Tewari: Statistics; Computer Science and Engineering

Contact

http://MIDAS.umich.edu

!

[email protected]

!

[email protected]

(17)

!"#$%&'()*+*,-%')').*/-(.-%0*/-121)&1-23

Ivo D. Dinov

Assoc. Director MIDAS

Ed & Training

Nandit Soparkar

CEO, Ubiquiti

Patrick Harrington

Co-Founder/Chief Data Scientist

CompGenome.com

Erin Shellman

(18)

!"#$%&'()*+*,-%')').*/-(.-%0*/-121)&1-23

Ivo D. Dinov

Assoc. Director

MIDAS

Ed & Training

Nandit Soparkar

CEO, Ubiquiti

Patrick Harrington

Co-Founder

Chief Data Scientist

CompGenome.com

Erin Shellman

Research Scientist

AWS

(19)

6'.*7%&%*8$'1)$13

From a descriptive to a constructive definition of Big Data

!

IBM’s 4V’s – volume, variety, velocity

(speed)

and veracity

(reliability)

!

MIDAS Big Data Characterization

– identifies gaps, challenges & needs

Example: analyzing observational data of

1,000’s Parkinson’s disease patients

based on 10,000’s signature biomarkers

derived from multi-source imaging,

genetics, clinical, physiologic, phenomics

and demographic

!

data elements.

!

!

Software developments, student training,

service platforms and methodological

advances associated with the Big Data

Discovery Science all present existing

opportunities for learners, educators,

researchers, practitioners and policy

makers

!

Mixture of quantitative! & qualitative estimates! Dinov, et al. (2014)! Incompleteness

Size

Expected 2016 Increase 2018 Expected 2016 Increase 2018 Expected 2016 Increase 2018

(20)

Q-="1-R2*5%GP*!D@()1)&'%5*F-(G&C*(L*7%&%*3

Dinov, et al., 2014

!

Gryo_Byte Cryo_Short Cryo_Color 0 2E+15 4E+15 6E+15 1 µm 10 µm 100 µm 1mm 1cm Gryo_Byte Cryo_Short Cryo_Color 1mm Neuroimaging(GB)

Genomics_BP(GB) Moore’s Law (1000'sTrans/CPU)

0 5000000 10000000 15000000 Neuroimaging(GB) Genomics_BP(GB)

Moore’s Law (1000'sTrans/CPU)

Increase of Image Resolution

!

Data volume

!

Increases faster

!

than computational power

!

"

Time

"!

Walter, Sc i A m , 2005 ! storag

e doubles every 1.2 yrs

! Computational pow er ! dou ble each 1.6 years !

(21)

7%&%*8$'1)$1*,-%')').*/-(.-%0P*9(-1*9#--'$#5#03

Themes!

Training

!

Examples

!

Data

Scr

ubbing

!

Big Data Management

!

Big Data technology, Searching, indexing, memory management, Information extraction, feature selection, Supervised and unsupervised-learning, stream mining

Big Data Representation

!

Matrix Representation of Sets, Minhashing, Jaccard Similarity, Distance Measures, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance, Networks, Graph similarity, Sets as Strings, Prefix Indexing, Data Streams and Processing, Representative Samples, Bloom Filter, Flajolet-Martin Algorithm, TF/IDF, MoM, MLE, Alon-Matias-Szegedy Algorithm for Second Moments, Datar-Gionis-Indyk-Motwani Algorithm (DGIM)

!

Data

Mining

!

Mining

Social-Network Graphs

!

Bio-Social Network Graphs, Root-Mean-Square Error, UV-Decomposition, The NetFlix Challenge, Tweeter Challenge, Distances and Clustering of Social-Network Graphs, Network Cluster

Betweenness, Complete Bipartite Subgraphs, Graph Partition, Matrices and Graphs, Eigen-values/ vectors of the Laplacian Matrix, Graph Neighborhood Properties, Diameter of a Graph, Transitive Closure and Reachability, Crowdsourcing Algorithms

Modeling

!

Statistical Modeling, Machine Learning, Computational Modeling, Feature Extraction, Power Laws, Map-Reduce/Hadoop

!

Link Analysis

!

PageRank, Spider Traps, Taxation and loops in Big Data traversal, Random Walks

!

Data

Inference

!

Clustering

!

Curse of Dimensionality, Classification and Regression Trees/Random Forests, Hierarchical Clustering, K-means Algorithms, Bradley, Fayyad, and Reina (BFR) Algorithm, CURE Algorithm, Clusters in the GRGPF Algorithm

!

Classification

!

Random Forest, SVM, Neural Networks, Latent Class Models, Finite Mixture Models

!

Dimensionality

Reduction

!

Eigenvalues and Eigenvectors, Principal-Component Analysis, Singular-Value Decomposition, Wrapper-Based vs Filter-Based Feature Selection, Independent Components Analysis,

(22)

BD

Big Data

Information

Knowledge

Action

Raw Observations

Processed Data

Maps, Models

Actionable Decisions

Data Aggregation

Data Fusion

Causal Inference

Treatment Regimens

Data Scrubbing

Summary Stats

Networks, Analytics

Forecasts, Predictions

Semantic-Mapping

Derived Biomarkers

Linkages, Associations

Healthcare Outcomes

BD

BD

BD

BD

BD

BD

BD

Information

Knowledge

Processed Data

Maps, Models

I

K

Knowledge

Action

Maps, Models

Actionable Decisions

Knowledge

Maps, Models

References

Related documents