• No results found

Big Data in Education

N/A
N/A
Protected

Academic year: 2021

Share "Big Data in Education"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

Big Data in Education

Alex J. Bowers, Ph.D.

Associate Professor of Education Leadership

Teachers College, Columbia University

(2)
(3)

Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

(4)

Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

You’re dealing with Big Data when you’re working with data that doesn’t fit into your computer unit. Note that makes it an evolving definition: Big Data has been around for a long time. . . . Today, Big Data means working with data that doesn’t fit in one computer. p.24

(5)
(6)

Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.

You’re dealing with Big Data when you’re working with data that doesn’t fit into your computer unit. Note that makes it an evolving definition: Big Data has been around for a long time. . . . Today, Big Data means working with data that doesn’t fit in one

computer. p.24

A data scientist is someone who knows how to extract meaning from and interpret data, which requires both tools and methods from statistics and machine learning, as well as

being human… Once she gets the data into

shape, a crucial part is exploratory data analysis, which combines visualization and

data sense… She’ll communicate with team members, engineers, and leadership in clear language and with data visualizations so that even if her colleagues are not immersed in the data themselves, they will understand the implications. p.16

(7)

Statistician

Data Scientist

http://flowingdata.com/2013/12/18/data-scientist-surpasses-statistician-on-google-trends/ December 18, 2013

Data scientist surpasses statistician on Google

Trends

(8)

Mike Bostock & D3:

github.com/mbostock

(9)

Kris Erickson http://ericksondata.com/wp/2012/venn-diagram-of-data-science/

(10)

But what do school practitioners (the users)

actually say that they need?

(11)

But what do school practitioners (the users)

actually say that they need?

Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal Data Use: Ideas for District, Building and Classroom Leaders. In A.J. Bowers, A.R. Shoho, B. G. Barnett (Eds.) Using Data in Schools to Inform Leadership and Decision Making (forthcoming). Charlotte, NC: Information Age Publishing Inc.

(12)

Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal Data Use: Ideas for District, Building and Classroom Leaders. In A.J. Bowers, A.R. Shoho, B. G. Barnett (Eds.) Using Data in Schools to Inform Leadership and Decision Making (forthcoming). Charlotte, NC: Information Age Publishing Inc.

(13)

Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal Data Use: Ideas for District, Building and Classroom Leaders. In A.J. Bowers, A.R. Shoho, B. G. Barnett (Eds.) Using Data in Schools to Inform Leadership and Decision Making (forthcoming). Charlotte, NC: Information Age Publishing Inc.

What components of a statewide longitudinal

data system are needed for that system to best

meet the respective needs of superintendent,

principal and teacher leaders?

(14)

Superintendents

Teachers

Principals

Longitudinal Data System

What components of a longitudinal data system are needed for that system

to best meet the respective needs of superintendent, principal, and teacher

leaders?

(15)

Superintendents (

n

=55)

Individual Student/Teacher

• Teacher data: performance, attendance, past experience

• Who rides the bus

• Record of parent conferences

• Test scores for students as well as whether or not they

have met growth

Comparative

• All district’s comparative test data, district demographics

• Access test data, budget information, comparing per pupil

expenditures, demographic data for the local area as

compared to the region and state

• Student growth from year to year, demographic group

growth

(16)

Principals (

n

=45)

Teacher Data

• Teachers’ past performance and years of experience

• Background of teachers and records showing student test

scores broken down by teacher

• Student and teacher achievement and growth data,

teacher evaluation information, student attendance and

dropout data, staff attendance, accountability information

Student Data

• Achievement information on teachers based on student

performance

• Background of teachers being hired; teacher performance

data

• Are we able to keep high performing teachers in our

district?

(17)

Teachers (

n

=193)

(18)

Brocato, Willis & Dechert (2014) p.6

(19)

Figure 1. Summary of eight themes and corresponding frequency percentage in the “more frequent” level of category

What components of a longitudinal data system are needed for that system

to best meet the respective needs of superintendent, principal, and teacher

leaders?

(20)

Which stages of the Data Science Process do we do well in education?

Where are the challenges?

(21)

K-12 Schooling Outcomes

Dropout risk

Accuracy, precision, sensitivity & specificity

Longitudinal patterns in teacher assigned grades

Data driven decision making (3DM)

Targeted and data-informed resource allocation

to improve schooling outcomes

Bowers, A.J., Sprott, R., Taff, S.A. (2013) Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The High School Journal.96(2), 77-100. http://hdl.handle.net/10022/AC:P:21258

(22)

Issues of Accuracy in Predicting High School Dropout

• Dropping out of high school in the U.S. is associated with a multitude of negative outcomes.

• However, other than a select number of demographic and background variables, we know little about the accuracy of current dropout predictors that are school-related.

• Current predictions accurately predict only about 50-60% of students who actually dropout.

• According to Gleason & Dynarski (2002), accurate prediction of who will dropout is a resource and efficiency issue:

– A large percentage of students are mis-identified as at-risk.

– A large percentage of students who are at-risk are never identified.

• Current reporting of dropout “flags” across the literature is haphazard.

– Almost none report accuracy

– Many report specificity or sensitivity, but rarely both

• A dropout flag may be highly precise, in that almost all of the students with the flag dropout, but may not be accurate since the flag may identify only a small proportion of the dropouts.

(23)

Re-analyzing Past Dropout Flags for Accuracy,

Precision, Sensitivity and Specificity

Literature search.

Queried multiple databases:

JSTOR, Google Scholar, EBSCO, Educational Full Text Wilson Web

Studies were included that:

Were published since 1979

Examined High School dropout

Examined school-wide characteristics and included all students

A focus on the student level

Reported frequencies for re-analysis

Initially yield 6,434 overlapping studies

301 studies were read in full

140 provided school-wide samples and quantifiable data

36 articles provided enough data for accuracy re-calculations

Yield 110 separate dropout flags

(24)

Dropout Graduate Dropout a True-positive (TP) Correct b False-positive (FP) Type I Error a+b Graduate c False-negative (FN) Type II Error d True-negative (TN) Correct c+d a+c b+d a+b+c+d=N Event P re di ct or

Precision = a/(a + b) Positive Predictive Value

True-Positive Proportion = a/(a + c) Sensitivity

True-Negative Proportion = d/(b + d) Specificity

False-Positive Proportion = b/(b + d) 1-Specificity

Event table for calculating dropout contingency proportions

Fawcett (2004); Hanley & McNeil (1982); Swets (1988); Zwieg & Campbell (1993)

“Hits”

(25)

0

0.2

0.4

0.6

0.8

1.0

T

ru

e

-p

o

s

it

iv

e

p

ro

p

o

rt

io

n

(Se

n

s

it

iv

it

y

)

0

0.2

0.4

0.6

0.8

1.0

False-positive proportion (1-Specificity)

Perfect prediction

Better prediction

Worse prediction

Low attendance

An example of the true-positive proportion plotted against the false-positive

proportion for Balfanz et al. (2007) comparing the relative operating characteristics (ROC) of each dropout flag.

(26)

0

0.2

0.4

0.6

0.8

1.0

T

ru

e

-p

o

s

it

iv

e

p

ro

p

o

rt

io

n

(Se

n

s

it

iv

it

y

)

0

0.2

0.4

0.6

0.8

1.0

False-positive proportion (1-Specificity)

Relative operating characteristics (ROC) of all dropout flags reviewed, plotted as the true-positive proportion against the false-positive proportion. Numbers refer to dropout indicator IDs.

(27)
(28)

0

0.2

0.4

0.6

0.8

1.0

T

ru

e

-p

o

s

it

iv

e

p

ro

p

o

rt

io

n

(Se

n

s

it

iv

it

y

)

0

0.2

0.4

0.6

0.8

1.0

False-positive proportion (1-Specificity)

Relative operating characteristics (ROC) of all dropout flags reviewed, plotted as the true-positive proportion against the false-positive proportion. Numbers refer to dropout indicator IDs.

(29)

0

0.2

0.4

0.6

0.8

1.0

T

ru

e

-p

o

s

it

iv

e

p

ro

p

o

rt

io

n

(Se

n

s

it

iv

it

y

)

0

0.2

0.4

0.6

0.8

1.0

False-positive proportion (1-Specificity)

Relative operating characteristics (ROC) of all dropout flags reviewed, plotted as the true-positive proportion against the false-positive proportion. Numbers refer to dropout indicator IDs.

(30)

http://itp.wceruw.org/documents/KnowlesDropoutEarlyWarningSystemforITPJan2013.pdf

(31)

Knowles, J. (2014)

References

Related documents

ry‟abanyamigabane bafite ubwiganze mu migabane kandi bafite ijambo rya mbere mu buyobozi bwe biyemeje mu buryo bwemewe n‟Ikigo ndetse n‟Isoko ry‟Imari n‟Imigabane

Church Health: A Comparison of the Clergy and Laity Perspectives on the Church’s Discipleship, Spiritual Formation, and Outreach.. An Honors College Project Thesis

This course covers the major elements of logistics management including gaining competitive advantages through logistics and supply chain, the customer service dimension of logistics,

With a view to strengthening relations among US allies in Asia, the new strategy documents for the first time call on Japan to pursue trilateral security cooperation among the

Simulating clinical concentrations and delivery rates of a typical intravenous infusion, a variety of routinely used pharmaceutical drugs were tested for potential binding to

In this study, we assessed the dynamics of the abundance and community structure of selected soil bacterial communities as a function of plant cultivar, growth stage, and soil

Although the CJEU has not specified whether in case the conditions above are met national courts shall apply a longer limitation period or not apply any limitation period at all,

The results suggest that financial development measured by broad money and domestic credit to private sector has a highly statistically significant negative effect on the