• No results found

Predicting Student Persistence Using Data Mining and Statistical Analysis Methods

N/A
N/A
Protected

Academic year: 2021

Share "Predicting Student Persistence Using Data Mining and Statistical Analysis Methods"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Predicting Student

Persistence Using Data

Mining and Statistical

Analysis Methods

Koji Fujiwara

Office of Institutional Research and Effectiveness

Bemidji State University & Northwest Technical College December 18, 2012

(2)

NTC Enrollment

Northwest

 

Technical

 

College

 

enrollment

 

grew

 

very

 

quickly

 

between

 

2004

 

and

 

2009,

 

but

 

now

 

it

 

has

 

been

 

falling

 

off.

 Fall 2004 Headcount = 794

 Fall 2009 Headcount = 1,604

(3)

Fall Semester Headcount and FTE

 FTE = Undergraduate Semester Hours / 15.

1,217 1,373 1,604 1,420 1,384 1,168 779 744 859 819 749 650 0 200 400 600 800 1,000 1,200 1,400 1,600 1,800

Fall 07 Fall 08 Fall 09 Fall 10 Fall 11 Fall 12

Headcount (30th day) FTE (Final)

(4)

NTC Enrollment Report

Headcount and FTE for Term by Campus - Northwest Technical College

Spring 2013 Date: 12/17/2012 Days to beginning of term: 27.5693

Unduplicated Duplicated

Campus Headcount Headcount FTE Credits FYE

NTC 576 576 366.13 5492 183.07

NTC Distance 696 895 309.07 4636 154.53

Total 1272 1471 675.19 10128 337.60

(5)

Retention & Grad Rates – IPEDS definition

78% 74% 81% 71% 76% 40% 37% 48% 35% 39% 32% 29% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%

Fall 2007 (166) Fall 2008 (114) Fall 2009 (170) Fall 2010 (141) Fall 2011 (104)

Cohort (Size)

Fall to Spring Fall to Fall 150% Grad Rates

(6)

Retention/Success Study at NTC (& other colleges)

A

 

study

 

was

 

conducted

 

last

 

year

 

to

 

identify

 

the

 

factors

 

that

 

affected

 

student

 

persistence

 

(success,

 

to

 

be

 

defined

 

later).

 Factors: Credit Completion Rate & GPA

The

 

first

 

semester

 

credit

 

completion

 

rate

 

was

 

analyzed

 

using

 

a

2

 

(success

 

status:

 

successful

 

vs.

 

unsuccessful)

 

×

2

 

(college

 

ready

 

status:

 

ready

 

vs.

 

underprepared)

 

×

2

 

(late

 

registration

 

status:

 

regular

 

vs.

 

late)

(7)

What did we learn? (AIRUM 2011 / AIR 2012)

0 20 40 60 80 100

Early Reg. Late Reg. Early Reg. Late Reg. Underprepared College Ready

Unsuccessful Successful

Credit Comp. Rate (%)

We need to focus on 

these students as well.

(8)

More Proactive Retention/Success Efforts

How

 

can

 

an

 

IR

 

office

 

contribute

 

to

 

this

 

project?

 

We

 

looked

 

at

 

eight

 

data

 

mining

 

and

 

two

 

statistical

 

models

 

to

 

find

 

the

 

best

 

method

 

to

 

develop

 

a

 

predictive

 

model

 

for

 

(9)

Proposed Retention/Success Plan

[Summer]

Retention/Success Study 

& Develop a Model

[Fall]

Run the model

[Spring]

Interventions

(10)
(11)

Target Variable

Target

 

Variable:

 

Second

 

Fall

 

Success

 Define “Second Fall Success = Success” if

At the beginning of second Fall

 Students came back to NTC (retained) or

 Students transferred out to another institution (transferred) or

 Students completed the program (graduated)

(12)

Models

Decision Tree

Naive Bayes Classifier

K‐Nearest Neighbor Classifier

Neural Network

Support Vector Machine

Bagging

Boosting

Random Forest

Probit Regression

(13)

DATA

Combined

 

Fall

 

2008

 

– 2010

 

first

year

,

 

full

time

,

 

degree

 

seeking

students

 

data

 There was no difference in the success status and cohort year (Chi‐

square test, p = 0.72).

In

 

this

 

study,

 

we

 

included

 

first

time

 

regular

 

students

and

(14)

Predictor Variables

Age

Gender (Female/Male)

College Ready Status (Underprepared/Ready)

• if a student needed to take at least one college readiness course, 

then the student was categorized as “Underprepared”.

Pell Status (No/Yes)

Late Registration Status (before or on/after August 1st)

Underrepresented Status (No/Yes)

(15)

Predictor Variables

# Attempted Credit Hours

Enrolled Semester GPA

Enrolled Semester Credit Completion Rate

(16)
(17)

How to Estimate a Model Performance?

Simple

 

Cross

Validation

Original Data

N = 622

Testing (for validation)

N2 = 311

Training (for modeling)

(18)

How to Compare the Results?

We

 

propose

 

to

 

use

 

these

 

three

 

indicators:

 Overall Classification Rate

 Sensitivity

 Prob. that the model classified as “Successful” when given to a 

group of students who were actually “Successful”

 Specificity

 Prob. that the model classified as “Unsuccessful” when given to a 

(19)
(20)

Results

Method Classification Sensitivity Specificity

Logistic Regression  81.7% 89.8% 67.8%

Probit Regression 81.7% 90.8% 66.1%

Decision Tree 81.4% 89.3% 67.8%

Naive Bayes Classifier 82.3% 90.3% 68.7%

K‐Nearest Neighbor (5) 82.3% 90.3% 68.7%

Neural Network 81.4% 86.7% 72.2%

Support Vector Machine 82.6% 92.9% 65.2%

Bagging 82.3% 89.8% 69.6%

Boosting 78.9% 87.8% 63.5%

(21)

Conclusion and Future Research

We

 

are

 

going

 

to

 

use

 

a

 

Neural

 

Network model

 

and

 

a

 

bagging model.

December

• These models will be applied to the actual data. • The list will be sent to the Student Services.

January

• The Student Services will start contacting to the students. • Interventions (what types?)

Next Summer 

and Fall

• Summer – Conduct Retention Study • Fall – Evaluate the models

(22)

Conclusion and Future Research (contd.)

Current

 

Model

 Need Students First Semester Academic Performance Data

 Interventions: Spring Semester

Ideal

 

Model

 Can run before the Fall semester begins

 Need other predictors: i.e.) FA data?

References

Related documents

schwarz cherry/cover black cerisier/couverture noir Raffhalter Miami, 14 cm Tie Back Miami, 14 cm Rinceau Miami, 14 cm Wandabstand, in Metall walldistance, in metal distance de mur,

John Paul Crowe P 25/08/2014 development of the following: A loose "A" roof cattle shed, ancillary open slatted tank and feed passage, loose lean-to calf creep attached

The results of the sensorial profile indicate that Blueberry & Mint gums show better visual aspect indicators (colour, homogeneity, and general aspect),

F IGURE 6.—Box plots (horizontal line ¼ median, box ¼ interquartile range, whiskers are defined in Methods) of median (A) annual, (B) seasonal, and (C) diel movement rates of

A comprehensive overview on the effect of privatisation on efficiency is provided by Megginson and Netter (2001), who summarise existing empirical studies in the tele-

(2002) explored the construct of entrepreneurial marketing and suggest that it is "the proactive identification and exploitation of opportunities for acquiring and

S2 S2 S1 S1 ID ID Operate Operate w/ Restraint w/ Restraint IOmin IOmin High High Current Current Setting Setting Restrain Restrain Load flow w/ Load flow w/ CT Error . CT

The WWWTB initiative had significant overlaps with the P6 implementation due to the fact that the WWWTB effort was intended to define the boundaries of the portfolio of projects