How do we train Data Scientists and Data Engineers?

Academic year: 2021

(1)

Eric Rozier

Asst Prof of EECS at the University of Cincinnati

Faculty Mentor, DSSG, at the University of Chicago

(2)

Training the Next Generation of Data Scientists

Focus on two main programs:

– Summer: a three-month intensive program

• DSSG

– Academic year: curriculum development to support in-class, hands-on experiences

(3)
(4)
(5)
(6)
(7)
(8)

http://dssg.uchicago.edu @datascifellows

Eric & Wendy Schmidt

Data Science for Social Good

(9)
(10)

Data Science for Social Good @datascifellows

What is DSSG?

Data Science for Social Good Fellowship

• 40-50 fellows in teams of 3-4
• Experienced mentors
• 12 weeks in Chicago
• Impactful problems with non-profit & government partners

(11)

Goals of the Fellowship

• Train data scientists who care about and understand how to solve social problems

• Expose governments & non-profits to data science and train them to use data to make better decisions

• Seed a community of people and organizations working together to make social impact

• Create open source data science tools that are targeted at the needs of high impact social problems

(12)

By the Numbers (2013 and 2014)

• 48 Fellows, 8 Mentors, 14 Projects, 12 Weeks
• 36 Fellows, 6 Mentors, 12 Projects, 12 Weeks

(13)

Ideal Fellows

Making an Impact with Data

• Computer Science & Programming
• Statistics & Machine Learning
• Econometrics & Social Science Methods
• Databases
• Experimental Design
• Communication
• Problem Formulation

(14)

2013-2014 Fellows

• ~1000 applicants
• 40 countries
• ~250 universities
• 84 fellows
• Computer Scientists, Statisticians, Economists, Public Policy (and other computational and quantitative fields)

CMU, U. of Chicago, Northwestern, Harvard, MIT, Stanford, ITAM, Cornell, Yale, Villanova, Ohio State, USC, U Penn, Notre Dame, U of Minnesota, U of Michigan, Cambridge, McGill, UC Berkeley, U of Colorado, Swarthmore, Oberlin, UIUC, Emory, Duke, Fordham, Johns Hopkins, IIT, SAIC, NYU, Penn State, Simon Fraser, UC Santa Barbara

(15)

Partners: Non-Profits, Government Agencies, Corporations with a Social Mission

Geographies: Local, State, National, and International

Types of Problems: Impact Evaluation, Targeting, Risk Modeling,

Types of Data: Structured data, geospatial data, time series, text data, network data

(16)

Health: Predicting lead poisoning (home inspection data)

Energy: Reducing energy use via disaggregation (smart-meter data)

Education: Predicting high school dropout (education records)

Economic Development: Targeting and assessing urban revitalization (administrative data)

Corruption: Detecting collusion (contract data)

Federal Budgeting: Identifying earmarks (Congressional bills)

(17)
(18)

• Improving high school graduation rates by identifying at-risk students early

• Increasing government transparency by identifying earmarks

• Developing new strategies to reduce maternal mortality

• Preventing lead poisoning through proactive home inspections and health check-ups

(19)

Prediction Saves Time & Money

Buildings: 197,157; Time: 76 years; Money: $98 million
Buildings: 42,695; Time: 16.4 years; Money: $21.3 million
Buildings: 378; Time: 2 months; Money: $189,000
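The three sets of figures on this slide can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the slide does not say how the three scenarios were defined, so the scenario labels are hypothetical, and it assumes the "Money" figure is the total cost of inspecting the listed buildings.

```python
# Figures copied from the slide. The scenario labels are hypothetical;
# the slide itself does not name the three scenarios.
scenarios = {
    "largest": {"buildings": 197_157, "cost_usd": 98_000_000},
    "middle": {"buildings": 42_695, "cost_usd": 21_300_000},
    "smallest": {"buildings": 378, "cost_usd": 189_000},
}


def cost_per_building(scenario):
    """Implied average cost per building, assuming cost is a total."""
    return scenario["cost_usd"] / scenario["buildings"]


# Implied per-building cost for each scenario, rounded to cents.
per_building = {
    name: round(cost_per_building(s), 2) for name, s in scenarios.items()
}

# Money saved relative to the largest scenario.
baseline_cost = scenarios["largest"]["cost_usd"]
savings = {name: baseline_cost - s["cost_usd"] for name, s in scenarios.items()}

for name in scenarios:
    print(f"{name}: ${per_building[name]:,.2f} per building, "
          f"${savings[name]:,} saved vs. largest")
```

Under that assumption, all three scenarios imply roughly $500 per building, which suggests the savings come from narrowing the set of buildings to inspect rather than from cheaper inspections.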

(20)

The Eric & Wendy Schmidt Data Science for Social Good Summer Fellowship 2014

At-Risk Children

Lead Levels During Childhood

Even without detailed child-level features, there are strong, sanity-checked, prediction-capable patterns.

(21)
(22)

Target: Prediction From Birth

(23)
(24)

Who are we looking for?

(25)

Expertise in one or more of the following areas:

• Computer Science
• Statistics
• Public Policy
• Social Science
• Other Quantitative or Analytical Areas

• Some coding experience
• Passion for making a social impact
• Problem solving (critical thinking) experience
• Enjoy working on a team

(26)

Mentors

• Deep expertise in computer science, machine learning, statistics, or social sciences
• Experience working on real problems in industry
• Experience leading teams and managing projects

Ben Yuhas: Principal, Yuhas Consulting Group

Eric Rozier: Assistant Professor of Electrical and Computer Engineering

Kate Cagney: Sociology & Health Studies; Director, Population Research Center, U’Chicago

Joe Walsh: Lead Forecaster for GE Healthcare & Policy

(27)

Organizations that…

1. Have an interesting social-impact problem to solve

2. Have data that can help solve it

3. Have a desire to put our work into action

• Especially interested in longer-term collaborations beyond the 12 weeks of the fellowship

Project Partners

Governments / Government Orgs, Foundations, Research Institutions, Non-Profits

(28)

Get Involved!

Application deadlines: Fellows: Feb 1, 2015 • Mentors: Feb 1, 2015 • Partners: Jan 10, 2015

Applications & more info: http://dssg.uchicago.edu • Or email: [email protected]

(29)

Data Science in the Curriculum with

(30)
(31)

The Data Deluge

Big Data education suffers from similar challenges.

How do we help students drink from the fire hose?

(32)

Big Data and the Curriculum

Big Data is putting pressure on the curriculum

– Not just CS/ECE: Business, Finance, Social Science, Economics, Biology, Medicine, Public Policy

NIH has held several meetings on Big Data education.

– NIH wants to integrate Big Data/Data Science into the regular curriculum.

(33)

NIH Conclusions

Teach from case studies

– Proper training should include hands-on experience with real data.

– Use and study of cutting-edge:

• Tools

• Techniques

(35)

NIH Conclusions

Train Data Scientists to work as team members.

– The team is one of the most important parts of real data science applications.

(36)

New Ways of Thinking

Get students used to the pace of change,

(37)

New Ways of Learning

(38)

Active Learning

After 2 weeks we tend to remember:

Passive learning

• 10% of what we read

• 20% of what we hear

• 30% of what we see

• 50% of what we hear and see

Active learning

• 70% of what we say

(39)

Bloom’s Taxonomy

Evaluation

Synthesis

Analysis

Application

Comprehension

Knowledge

(40)

Three-Pronged Approach

Reading, presenting, and discussing the current state of the art.

Hands-on study with real data.

Original research in

(41)
(42)
(43)

Creating a Classroom Around a Digital Observatory

(44)
(45)

Transitioning a DSSG-like Environment to the Academic Year

Identify a smaller number of partners to work with larger groups on a longer time scale.

Understand that our expectations need to be tempered

– Summer: an exclusive, competitive program with international recruitment.

– Academic year: students are drawn from the (admittedly excellent) student body at large, so motivation may be lower.

(46)

Frontiers of Data Science Class

Several published papers resulting from the class.

Mixed undergrad and graduate, interdisciplinary environment.

Awarded Frontiers of Engineering Education by the NAE

(47)


Growth of the Course

First year

– 8 students

– Electrical Engineers, Computer Engineers, Computer Scientists, Environmental Scientists, Economists

Second year

– 14 students

(49)
(50)

Developing Scalable Infrastructures

Understand the financial limitations of the classroom.

Develop resources that can be leveraged for both research and curriculum; a practical curriculum based on real experience will have similar needs anyway!

(51)

The Need for Practice in the Academy

We need to train ourselves in Data Science to teach it.

– Many faculty haven’t had real industrial experience with Data Science.

– The field and its practice are changing fast.

– Encourage the development of Data Science workshops, boot camps, and summer programs for faculty as well as students.

(52)

More information

http://dssg.io
