Developing Data-Driven Predictive Models of Student Success

Kresge Data Mining Project

Phase Two Report

University of Maryland University College

November 27, 2013

Table of Contents

Executive Summary
Introduction
Research Goals
Section 1: General Grant Overview
Section 2: Key Findings and Conclusions from Phase 1
Objectives and Milestones
Section 3: Relevant Literature
Section 4: Data Sources
Section 5: Overview of Research Design and Target Variables
Section 6: Key Findings from Data Mining
Research Goal 1: Profile students based on community college course taking behaviors
Figure 1. Change in GPA for students retained or not retained at UMUC
Table 1. Community college grade distributions for students successful or not at UMUC
Table 2. Community college grade distributions for students retained or not at UMUC
Figure 2. Success quadrants
Figure 3. Likelihood of community college course selections for Stars
Figure 4. Likelihood of community college course selections for Strivers and Slackers
Figure 5. Likelihood of community college course selections for Splitters
Figure 6. Binned number of community college credits by community college GPA for each success profile
Figure 7. Binned number of community college credits by UMUC GPA for each success profile
Figure 8. Binned number of community college credits by delta GPA for each success profile
Table 3. Community college credits and GPA and UMUC GPA by success profile
Section 7: Key Findings from Predictive Analyses
Research Goal 2: Identify demographic profiles of MC and PGCC students transferring to UMUC
Table 4. Description of demographic and community college course taking background data clusters for Montgomery College
Table 5. Description of demographic and community college course taking background data clusters for Prince George’s Community College
Table 6. UMUC first term GPA
Research Goal 4: Identify demographic and community college background factors predicting course success at UMUC
Table 7. Summary of predictors for logistic regressions predicting overall GPA and success in specific courses
Research Goal 4a: Examine demographic, community college background factors, and course efficiency as predicting course success at UMUC
Table 8. Courses taken at community college by institution
Table 9. Success at UMUC by coursework taken or not taken
Table 10. Course efficiency rates differentiated by types of courses taken or not
Table 11. Results of multivariate logistic regression analysis of success at UMUC
Research Goal 4b: Examine demographic, community college background factors, and change in GPA as predicting retention at UMUC
Table 12. Results of multivariate logistic regression analysis of retention at UMUC
Research Goal 5: Investigate predictors of behaviors in WebTycho and success at UMUC
Table 13. Description of WebTycho activity clusters for Montgomery College
Table 14. Description of WebTycho activity clusters for Prince George’s Community College
Table 15. Summary of top ten predictors of success at UMUC and WebTycho cluster membership
Section 8: Summary of Results
Section 9: Research and Intervention Planning in Phase 3
References
Appendices

Executive Summary

This report documents analyses and findings completed in Phase 2 of the Kresge Data Mining Grant: Developing Data-Driven Predictive Models of Student Success. This grant was awarded to University of Maryland University College (UMUC) in collaboration with two community college partners: Montgomery College (MC) and Prince George’s Community College (PGCC). The purpose of the grant was:

1. To build an integrated database tracking students across institutions, from community college to UMUC.

2. To use predictive statistical models and data mining techniques to track and model students’ progress across institutions.

3. To identify factors predictive of students’ success at UMUC that may inform the development of interventions aimed to improve outcomes for undergraduate students transferring from community colleges to UMUC or other four-year institutions.

In Phase 1 of the grant, UMUC, in collaboration with partner institutions, designed and developed a database, the Kresge Data Mart (KDM), with records of more than 250,000 students. This database includes information on student demographics, academic performance at UMUC and the community college, and student behaviors in courses hosted in WebTycho, UMUC’s proprietary online learning management system.

 

Key results from Phase 1 included a literature review of publications on students’ performance in online courses, successful course completion, re-enrollment, and retention. Further, literature on data mining techniques in higher education was examined. The literature review showed that factors such as the number of schools students attended, the number of credits students transferred, and the students’ community college GPA were associated with successful course completion and retention. Regression analyses determined that students’ online classroom activities prior to the start of a class and during the early weeks of the course were predictive of successful course completion.

In Phase 1, three goals for the project were identified:

1. Validate the predictive models and data mining techniques explored in Phase 1 on an expanded dataset.

2. Build profiles of successful students and their online learning behaviors.

3. Develop interventions to improve the success of students transferring from community colleges to UMUC.

 

The above three goals were accomplished in Phase 2, which involved examination of students’ demographic profiles, course work from the community colleges, and performance at UMUC. A variety of methodologies were used to identify predictors of students’ success and retention. These included:

1. Cluster analyses to determine profiles of students based on demographic factors and community college course-taking backgrounds.

2. Logistic regression to examine demographic factors and variables associated with students’ community college course-taking histories to predict success at UMUC.

3. Cluster analyses to determine profiles of students’ online behaviors in courses at UMUC.

4. Data mining techniques to identify profiles in the student population based on GPA and re-enrollment. Community college grade distributions and course taking preferences for these different groups of students were examined.
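As a rough illustration of the logistic regression approach named in item 2, the sketch below fits a one-predictor model by batch gradient descent. It is not the report's model: the single predictor (course efficiency), the toy data, and the names `fit_logistic` and `predict` are all assumptions made for illustration, and the analyses in the report are multivariate.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: List[float], ys: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[float, float]:
    """Fit P(y=1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # derivative of log loss w.r.t. the linear score
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def predict(b0: float, b1: float, x: float) -> float:
    """Predicted probability of success for predictor value x."""
    return sigmoid(b0 + b1 * x)


# Made-up toy data (NOT from the KDM): x = course efficiency,
# y = 1 if the student earned a first-term GPA of 2.0 or above.
efficiency = [0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 1.00]
succeeded = [0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(efficiency, succeeded)
```

A positive fitted slope `b1` means that, in this toy data, higher course efficiency goes with a higher predicted probability of success. In practice a statistics package would be used in place of the hand-written fit.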

 

In addition to predicting outcomes associated with success, analyses in Phase 2 determined a variety of trends characterizing the student population and developed student profiles based on demographics, prior academic work, and online classroom behavior.

The primary outcome measures of interest in Phase 2 include students’ success at UMUC, defined as earning a first term GPA of 2.0 or above, and students’ retention at UMUC within 12 months following their first academic term. Key findings are presented below.

1. Across studies, age and marital status were associated with success at UMUC. Older, married students are more likely to succeed, perhaps indicative of students’ maturity or a stronger commitment to their educational goals.

2. Four success profiles of students at UMUC were identified based on students’ GPA and re-enrollment. Profiles differed in terms of community college course taking preferences and course load, and in the change in GPA when transferring to UMUC. Again, these results suggest that the degree of student preparedness, particularly in specific target areas (e.g., accounting, economics), is predictive of success at UMUC.

3. Course efficiency, the ratio of credits earned to credits attempted, in the community college was determined to be a predictor of success at UMUC. The higher the course efficiency, the more likely a student is to succeed.

4. A new factor, delta GPA, was introduced in these analyses, corresponding to the difference between students’ GPA at the community college and at UMUC. While most students experienced a decreased GPA when transferring to UMUC, the magnitude of this decrease was predictive of students’ continued enrollment at UMUC beyond the first term (i.e., retention).

5. Similarly, students who took math or honors courses in community college were more likely to succeed at UMUC, suggesting that rigor of community college courses may prepare students to succeed at a university.

6. Students’ behaviors in the online classroom indicated high variability in the extent to which they engage in course content and course-related activities. A substantial percentage of students accessed course content and course materials to a limited extent, thus impacting successful course completion.
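The derived measures used in these findings can be written down compactly. The following is a minimal sketch, assuming a hypothetical record layout (`TransferStudent` and its field names are illustrative, not KDM column names). The sign convention for delta GPA (UMUC minus community college, so that a drop is negative) is also an assumption, since the summary does not state one. The pairing of the two outcomes mirrors the GPA and re-enrollment basis of the four success profiles, but does not attempt to assign the report's profile names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TransferStudent:
    """Hypothetical record; field names are illustrative only."""
    cc_credits_attempted: float
    cc_credits_earned: float
    cc_gpa: float
    umuc_first_term_gpa: float
    reenrolled_within_12_months: bool


def course_efficiency(s: TransferStudent) -> Optional[float]:
    """Ratio of community college credits earned to credits attempted."""
    if s.cc_credits_attempted <= 0:
        return None  # undefined when no credits were attempted
    return s.cc_credits_earned / s.cc_credits_attempted


def delta_gpa(s: TransferStudent) -> float:
    """UMUC first-term GPA minus community college GPA (sign convention assumed)."""
    return s.umuc_first_term_gpa - s.cc_gpa


def is_successful(s: TransferStudent) -> bool:
    """Success as defined in this report: first-term GPA of 2.0 or above."""
    return s.umuc_first_term_gpa >= 2.0


def outcome_quadrant(s: TransferStudent) -> Tuple[bool, bool]:
    """(successful, retained) pair; one of four possible combinations."""
    return (is_successful(s), s.reenrolled_within_12_months)


# Made-up example student, not drawn from the data.
example = TransferStudent(60, 45, 3.2, 2.7, True)
efficiency = course_efficiency(example)
change = delta_gpa(example)
```

For the made-up student above, 45 of 60 attempted credits were earned, and the GPA fell by half a point on transfer while remaining above the 2.0 success threshold.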

 

Based on findings in Phase 2, interventions aimed at promoting success of transfer students at UMUC are presented. These interventions differ in the audience targeted and whether they provide social support (e.g., peer mentor) or academic support (e.g., check-list) to promote student success. Further, long-term initiatives to promote student success that have been developed collaboratively with partner institutions are introduced.

Introduction

The purpose of this report is to document work done by UMUC, MC, and PGCC on the Kresge Data Mining Grant: Developing Data-Driven Predictive Models of Student Success. This report has three primary purposes:

1. To review prior work completed on the Kresge Data Mining Grant in Phase 1.

2. To document work completed in Phase 2 of the grant, expanding on findings from Phase 1.

3. To introduce research-driven future directions and interventions aimed at promoting transfer students’ success at UMUC; the evaluation of these interventions will be undertaken in Phase 3 of the Kresge grant.

The research in this report has been conducted by the UMUC Institutional Research Office. Research from Phase 2 has been documented in detail. This report presents the research in nine sections:

Section 1: General grant overview
Section 2: Key findings and conclusions from Phase 1
Section 3: Relevant literature
Section 4: Data sources
Section 5: Overview of research design and target variables
Section 6: Key findings from data mining
Section 7: Key findings from predictive analyses
Section 8: Summary of results
Section 9: Research and intervention planning in Phase 3

In Phase 2, five key research goals were accomplished. Specifically, researchers were able to:

1. Profile students at UMUC based on community college course taking behaviors.

2. Identify demographic profiles of MC and PGCC students transferring to UMUC.

3. Determine MC and PGCC transfer students’ performance at UMUC.

4. Identify demographic and community college background factors predicting course success at UMUC.

   a. Examine demographic, community college background factors, and course efficiency as predicting course success at UMUC.

   b. Examine demographic, community college background factors, and change in GPA as predicting retention at UMUC.

5. Investigate predictors of behaviors in WebTycho and success at UMUC.

Section 1: General Grant Overview

Grant Partnership

UMUC is a four-year public university that offers online degree programs to a diverse population of working adults. With support from this grant, UMUC established partnerships with two Maryland community colleges that also serve large and diverse student populations. Montgomery College (MC), established in 1946, enrolls over 60,000 students annually. Prince George’s Community College (PGCC) enrolls more than 40,000 students from approximately 128 different countries. Both institutions serve the metro-D.C. area, but differ in that PGCC serves more low-income students. Both institutions have endorsed the goals of this project and are committed to working with UMUC to find ways to promote student success throughout their academic careers.

Financial Support

The Kresge Foundation awarded UMUC a $1.2 million grant to build an integrated database, explore data mining techniques, build predictive models of student success, implement and evaluate intervention strategies that are designed to improve student success, and disseminate the results of this research to national constituents.

In Phase 1 of the research study, approximately 41% of total grant funds were expended on purchasing hardware to house the data-mining database, collecting data from partner institutions, and providing dedicated salaries for a data mining specialist and a graduate assistant. Additional staff resources were provided in kind by UMUC. In Phase 2, UMUC expended funds for additional data collection, data mining consulting, and conference presentations. (See Appendix A for the financial statement.) In Phase 3, expenses are expected to total $400,000. These funds are intended to be spent on collecting additional data from the community colleges, additional data mining research, and implementing interventions, with a graduate student to coordinate the interventions. In addition, funds will support a national convening to present and discuss research findings on educational data mining, predictive modeling, and learner analytics.

Section 2: Key Findings and Conclusions from Phase 1

In Phase 1, a Memorandum of Understanding (MOU) was negotiated and signed between UMUC and partner institutions in order to clarify data security and the parameters for use of the data in the research project. The MOU allows UMUC researchers to conduct research using individual student data while protecting student information and confidentiality.

UMUC, in collaboration with partner community colleges MC and PGCC, designed, developed, and implemented a database of over 250,000 student records. The Kresge Data Mart (KDM) contains information on student demographics, academic performance at the community colleges and at UMUC, and student behaviors in the online classroom at UMUC.

Key outcomes of Phase 1 included a literature review on students’ success in online courses. Further, literature about the use of data mining techniques in higher education was identified and reviewed; this literature is described in Section 3. Data mining determined that factors associated with successful outcomes included students’ prior academic work, namely the number of schools students attended, the number of credits students transferred, and students’ GPA in community college. These predictors were associated with both successful GPA and retention at UMUC. Additional findings from Phase 1 included that certain online course behaviors, such as opening and reading conference notes in the first four weeks of a course, were associated with course success, as was students’ engagement in the online classroom prior to the start of a class.

The analyses in Phase 1 were focused on examining a large variety of factors to determine their value in predicting student success. These findings were used to develop initial predictive models of successful performance at UMUC. These predictive models were refined and validated in Phase 2.

At the conclusion of Phase 1, three goals for the completion of the grant were identified:

1. Validate the predictive models and data mining techniques explored in Phase 1 on an expanded dataset.

2. Build profiles of successful students and their online learning behaviors.

3. Develop interventions to improve the success of students transferring from community colleges to UMUC.

Objectives and Milestones

Specific objectives and milestones are presented below for each stage of the research project. Objectives from Phase 1 and Phase 2 of the project are abridged, with planned Phase 3 work further expanded. These objectives and milestones have been modified throughout the course of the project, but are consistent with grant requirements.

Phase 1 (April 2011 – October 2012)

| Objectives | Milestones | Status |
|---|---|---|
| Develop a Project Action Plan | Develop a project action and collaboration plan with the partnering agencies. | Complete |
| Data Collection and Preparation | Prepare a data “universe” (integrated database system) on CC transfer students in the UMUC population (KDM) | Complete |
| | Understand variables; define student characteristics and retention data; develop data dictionary. | Complete |
| Data Analysis | Conduct initial predictive analyses and employ data mining techniques to identify factors contributing to students’ success | Complete |
| Project Evaluation | Conduct ongoing project evaluation. Take action on identified areas for improvement. | Complete |

Phase 2 (November 2012 – October 2013)

| Objectives | Milestones | Status |
|---|---|---|
| Develop and Validate Analytic Models of Student Success | Analyze data and identify factors that predict success/failure. | Complete |
| | Validate predictive analyses and models developed through data mining techniques to predict students’ success and retention at UMUC. | Complete |
| | Build student profiles based on analyses. | Complete |
| Disseminate Key Findings | Discuss results with Kresge Workgroup and share with advisory board. | Complete |
| | Discuss results with Project Partners and obtain feedback. | Complete |
| | Present key findings at national conferences on higher education | Ongoing |
| Develop Interventions | Work with stakeholders at UMUC and CC partners to develop a list of potential interventions. | Complete |
| Project Evaluation | Conduct ongoing project evaluation. Take action on identified areas for improvement. | Ongoing |
| Research Plan 3 | Design and develop KDM2 to update and | In progress |
| | Plan Phase 3 analyses on expanded integrated data. | In progress |

Phase 3 (November 2013 – October 2014)

| Objectives | Milestones | Status |
|---|---|---|
| Develop Interventions | Review relevant literature on interventions that promote student success in online learning. | In progress |
| | Develop an implementation plan and timeline for piloting of interventions. | In progress |
| Implement Pilot Interventions | Implement and evaluate pilot interventions. | Not yet started |
| Disseminate Results on Interventions | Develop and disseminate report on the pilot interventions | Not yet started |
| Phase 3 Analyses | Develop and execute Phase 3 research plan | In progress |
| Report Findings | Present key findings from Phase 3 analyses at national conferences; publish research in journals | Not yet started |
| | Prepare written report of both Phase 3 analyses and full scope of Kresge grant work. | Not yet started |
| Dissemination of Results and Resources | Develop website and repository for educational data mining and student success. | Not yet started |
| | Host a national convening on data mining and learner analytics. | Not yet started |
| Project Evaluation | Conduct final project evaluation. | Not yet started |

Section 3: Relevant Literature

The literature review below addresses factors contributing to students’ success in online courses, the use of data mining techniques in educational research, and factors impacting the success and retention of non-traditional students. It was undertaken to inform the development of interventions aimed at promoting students’ success.

Online student success literature

Current literature on student success focuses on student outcomes such as course success, course withdrawal, re-enrollment, and retention. For example, student variables such as student characteristics, previous course work, grades, and time spent in course discussions and activities may be useful in predicting course success (Aragon & Johnson, 2008; Morris & Finnegan, 2009; Morris, Finnegan, & Lee, 2009; Park & Choi, 2009). Course-level variables acquired from student login data from the learning management system may have predictive value in measuring course withdrawal (Willging & Johnson, 2008; Nistor & Neubauer, 2010). Student, course, program, and institution level variables such as student characteristics, number of transfer credits, final grade in any given course, experience in online environments, and course load may be useful in predicting re-enrollment and retention (Aragon & Johnson, 2008; Morris & Finnegan, 2009; Boston, Diaz, Gibson, Ice, Richardson, & Swan, 2011).

Although these studies showcase a variety of findings related to student success, the majority of studies of retention in online learning environments use traditional statistical or qualitative methods. Park and Choi (2009) point out that expansion of methods such as data mining may have utility when student, course, program, and institutional level variables are well defined and institutionally meaningful. Literature related to educational data mining focuses on exploratory research.

 

Educational data mining literature

Data mining is a method of discovering new and potentially useful information from large amounts of data (Baker & Yacef, 2009; Luan, 2001). Educational data mining is a subset of the field of data mining that draws on a wide variety of literatures such as statistics, psychometrics, and computational modeling to examine relationships that may predict student outcomes (Romero & Ventura, 2007; Baker & Yacef, 2009). In educational data mining, data mining algorithms are used to create and improve models of student behavior in order to better understand student learning (Luan, 2002).

Data mining methods are most helpful for finding patterns already present in data, not necessarily in testing hypotheses (Luan, 2001). Baker and Yacef (2009) suggest that research in higher education should use a variety of algorithms, such as classification, clustering, or association algorithms, in determining relationships between variables. Although many definitions of these techniques exist in data mining literature, Han and Kamber (2001) offer the following definitions. Classification is the process of finding a set of models or functions that describe and distinguish data classes or concepts to predict a class of objects whose class label is unknown. Clustering analyzes data objects that are related to similar outcomes without consulting a class label. Association is the discovery of rules showing attribute value conditions that occur frequently together in a given set of data (Han & Kamber, 2001).
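To make the association definition concrete, the sketch below computes the two standard measures used to judge an association rule, support and confidence. The records and attribute names (`took_math`, `succeeded`) are made up for illustration and do not come from the studies cited or from the KDM.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of records that contain every attribute value in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule cannot be evaluated if the antecedent never occurs
    return support(transactions, antecedent | consequent) / base


# Toy, made-up records: each set lists attribute values observed for one student.
records = [
    frozenset({"took_math", "succeeded"}),
    frozenset({"took_math", "succeeded"}),
    frozenset({"took_math"}),
    frozenset({"succeeded"}),
]

rule_support = support(records, frozenset({"took_math", "succeeded"}))
rule_confidence = confidence(records, frozenset({"took_math"}), frozenset({"succeeded"}))
```

In this toy set, the two attribute values occur together in two of four records, and two of the three students who took math succeeded. An association algorithm searches for rules whose support and confidence both exceed chosen thresholds.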

   

Recent research suggests that these data mining algorithms can be used to examine variables related to student success. Yu, DiGangi, Jannasch-Pennell, Lo, and Kaprolet (2010) used a classification algorithm to explore potential predictors related to student retention in a traditional undergraduate institution. In this study, the authors used a decision tree to explore demographic, academic performance, and enrollment variables as they related to student retention. This study revealed a predictable relationship between earned hours and retention, but also found that at this institution, retention was closely related to state of residence (in-state/out-of-state) and living location (on campus/off campus). The authors speculate that this finding points to the potential utility of online courses in improving retention for out-of-state or off-campus students.

Despite these recent developments in exploring variables related to student success in traditional higher education settings, research using data mining techniques to uncover relationships among variables in online courses is limited in scope. This study is designed to fill this gap in the extant literature by utilizing data on online students who attended multiple institutions.

 

Retention in Non-Traditional Student Populations

Historically, research on student retention largely focused on the experiences of traditional students, until a seminal book by Tinto (1993) expanded on extant models of retention to consider which factors may impact the retention of non-traditional students. Across the literature, non-traditional students are considered to be those above age 26 or taking classes through non-traditional pathways, including distance and online learning. For both traditional and non-traditional students, retention was thought to be a consequence of students’ academic and social integration (Tinto, 1993). Other research has echoed the central role of social factors in predicting retention for non-traditional students, online, and distance learners (Boston, Diaz, Gibson, Ice, Richardson, & Swan, 2009). At the same time, the processes and policies that foster social integration in online environments are different from the factors that foster social connections in more traditional settings. For students enrolled in online courses, feelings of social integration may stem from learners and instructors conveying a sense of themselves through the use of para-language (i.e., emoticons), self-disclosure, humor or other verbal expressions of personal emotions and/or values (Boston et al., 2009). These behaviors are believed to result in open communication, trust, and group cohesion and are identified as necessary for successful collaboration (Boston et al., 2009).

Using social network analysis, Dawson (2010) found that visualizing classroom interaction patterns could provide insights into the nature of interactions for high- versus low-achieving students completing an online course. Dawson (2010) determined that high-performing students primarily interacted with other high-performing students, and likewise, low-performing students were more likely to have interactions with other low-performing students. More importantly, in examining instructor-student interactions, instructors networked with high-performing students (81.7%) at significantly higher rates than they did with low-performing students (34.61%).

Social connections in online learning may result in cognitive and learning gains as well. Rovai (2002) found a correlation between levels of engagement in the classroom community and increased levels of content learning and understanding; this was especially true for females.

 

Theories  of  student  retention  have  considered  the  contributions  that  student  motivation  and   challenges  that  external  barriers  may  present  for  students’  continued  enrollment  in  college.     Kember  (1989)  presents  students’  decisions  to  re-­‐enroll  as  the  result  of  a  cost-­‐benefit  analysis,   wherein  students  compare  the  price  of  attendance  and  time-­‐commitment  associated  with  college   attendance  to  the  anticipated  benefits  of  receiving  a  degree.      

 

Examinations  of  student  retention  have  focused  on  two  complimentary  processes,  those  of   persistence  and  attrition  (e.g.,  Rovai,  2002);  with  positive  academic  variables  associated  with   persistence  and  negative  academic  variables  associated  with  attrition  (Bean  &  Metzner,  1985).    In   predicting  persistence,  external  factors,  such  as  family  and  organizational  support  of  the  students’   academic  efforts,  played  a  major  role  in  determining  intent  to  persist,  and  course  satisfaction  and   perceived  relevance  to  students’  daily  lives  was  a  significant  source  of  motivation  to  persist  in   college  course  work  (Park  &  Choi,  2009).    

 

Predictive models of student retention have considered students’ background factors, such as previous GPA and academic performance (Bean & Metzner, 1985). Further, online learners’ use of web-based technologies positively affected their engagement and retention (Chen, Lambert, & Guidry, 2010).

 

Whereas the aforementioned studies focused on individual student factors predicting retention, Moore and Fetzner (2009) addressed the institutional characteristics that fostered commitment in non-traditional students. These factors included having a leadership culture that fosters commitment to student success and institutional policies and practices that incorporate student support services and technological support. For online learners, access to services and to support that meets their needs was found to be crucial (Moore & Fetzner, 2009). Further, student satisfaction, defined as students being happy with their progress and with the support received for learning, and perceiving the knowledge they were learning as valuable, was predictive of retention. Faculty satisfaction, stemming from involvement in curricular design and training in the use of online technologies supporting learning, was found to be key to engagement and a contributor to retention (Moore & Fetzner, 2009).

The findings from the published literature offer insights into (a) factors that may be modeled as predictive of students’ success, (b) techniques that may be used to investigate and model student success, and (c) areas, specific to the needs of non-traditional learners, that may be targeted for intervention.

 


Section  4:  Data  Sources    

One  of  the  key  achievements  of  Phase  1  of  the  Kresge  research  grant  was  the  development  of  the   KDM,  an  integrated  multi-­‐institutional  database  that  chronicles  the  prior  academic  work  of  transfer   students.    Data  for  the  KDM  came  from  four  data  systems:  

 

1. Banner - Montgomery College’s student information system

2. Datatel - Prince George’s Community College’s student information system

3. PeopleSoft - UMUC’s student information system

4. WebTycho - UMUC’s proprietary learning management system, which records students’ activities in an online classroom

 

Demographic,  academic,  and  enrollment  data  were  collected  on  each  student  from  each  institution.     In  addition,  transfer  data  and  online  classroom  behavior  data  were  included  from  UMUC.  

Demographic data included students’ gender, age, marital status, and race/ethnicity. Enrollment data included course registration, program of study or major, and student status. Academic data included information about students’ academic history prior to transferring to UMUC, such as course grades, repeated courses, and remedial coursework. Transfer data included the number of courses transferred, transfer GPA, and prior degrees earned. There were two sources for these data: community college data provided through the Kresge project, and UMUC transcript data. The latter may be incomplete because UMUC records contain information only on courses that students chose to transfer to UMUC and that were equivalent to a UMUC course. Classroom behavior data were specific to each course and each WebTycho session. Each session recorded a login time, access to various modules within the classroom, and posting of or responding to conference notes. Each action that students made in the classroom was recorded and totaled for each session, defining student activity.

 

The  KDM  served  as  the  primary  resource  for  all  the  analyses  and  findings  for  this  research  grant.     Section  5  describes  the  research  and  methods  for  Phase  2.        

 

Section  5:  Overview  of  Research  Design  and  Target  Variables    

In Phase 2, research was developed to comprehensively answer and expand on questions introduced during Phase 1 of the project. The findings from these knowledge sheets are summarized in subsequent sections. Section 6 of this report presents findings from exploratory data mining analyses that identify potential predictors of students’ success and retention at UMUC. The following questions were considered.

 

1. Which  profiles  of  students  at  UMUC  can  be  identified?    

2. To  what  extent  does  community  college  course  taking  differentiate  each  success  profile  at   UMUC?  

 

Section  7  of  this  report  presents  findings  from  predictive  analyses,  including  cluster  analyses  and   logistic  regression,  modeling  factors  in  students’  demographic  and  community  college  course  taking   backgrounds  that  predict  success  at  UMUC  and  validating  specific  predictors  of  students’  success   identified  in  Section  6.    The  following  questions  were  considered.    

     

3. What  are  the  demographic  profiles  of  community  college  students  transferring  from  MC  and   PGCC  to  UMUC?    


4. Which factors from students’ demographic profiles and course-taking backgrounds in CC predict success at UMUC overall, and in specific courses?

5. What  kinds  of  online  learning  behaviors  do  students  transferring  to  UMUC  engage  in?      

These questions encompassed examinations of students’ performance in community college overall (Research Questions 1 and 2) as well as in specific courses (Research Questions 2 and 4). The questions examined not only UMUC GPA but also reenrollment (Research Question 1) as a desired outcome variable, and considered not only performance but also process and learning behaviors at UMUC (Research Question 4). In addition, a number of possible predictors of success not previously considered were included, such as students’ course efficiency in community college (the ratio of credits completed to credits attempted) and change in GPA (the difference between students’ community college and UMUC GPA).
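The two derived predictors described above can be sketched in a few lines. This is an illustrative sketch only: the function and parameter names are hypothetical, and the sign convention for change in GPA (UMUC GPA minus community college GPA) follows the definition given later in Section 6.

```python
def course_efficiency(credits_completed: float, credits_attempted: float) -> float:
    """Ratio of community college credits completed to credits attempted."""
    if credits_attempted <= 0:
        raise ValueError("credits_attempted must be positive")
    return credits_completed / credits_attempted


def change_in_gpa(cc_gpa: float, umuc_gpa: float) -> float:
    """First-semester UMUC GPA minus community college GPA.

    A negative value means the student's GPA dropped after transfer.
    """
    return umuc_gpa - cc_gpa


# Hypothetical student: 45 of 60 attempted credits completed, GPA 3.5 -> 3.0.
efficiency = course_efficiency(45, 60)
delta = change_in_gpa(cc_gpa=3.5, umuc_gpa=3.0)
```

Under these assumptions, a student who completed 45 of 60 attempted credits has a course efficiency of 0.75, and a move from a 3.5 to a 3.0 GPA is a change of -0.5.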

Student  Population.  The  population  of  interest  for  the  Phase  2  analyses  was  defined  as  first  term   undergraduate  students  transferring  to  UMUC  from  MC  or  PGCC  between  Spring  2005  and  Spring   2012.  Subsets  of  this  population  were  drawn  for  subsequent  analyses.  

 

Variables.    In  this  report  a  number  of  outcomes  are  associated  with  student  success:        

Course Success – earning a final grade of A, B, or C in any course.

Unsuccessful Course Completion – earning a grade of D, F, FN, or W in a course.

Student Success – students’ first term GPA of 2.0 or above.

Re-enrollment – enrollment in the immediate next semester after initial enrollment.

Retention – re-enrollment at UMUC within 12 months after initial enrollment.

The first term GPA cut-off point of 2.0 is based on current UMUC policies that define academic probation. On a 4-point scale, 2.0 corresponds to a C average.
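The outcome definitions above translate directly into simple rules. The sketch below is illustrative, not the project's actual code: the function names are hypothetical, and representing retention as the number of months until a student's next UMUC enrollment is an assumption about how the data might be stored.

```python
from typing import Optional

SUCCESS_GRADES = {"A", "B", "C"}              # Course Success
UNSUCCESSFUL_GRADES = {"D", "F", "FN", "W"}   # Unsuccessful Course Completion
GPA_CUTOFF = 2.0                              # academic probation threshold


def is_course_success(grade: str) -> bool:
    """True for a final grade of A, B, or C."""
    return grade.strip().upper() in SUCCESS_GRADES


def is_student_success(first_term_gpa: float) -> bool:
    """True when the first term GPA is 2.0 or above."""
    return first_term_gpa >= GPA_CUTOFF


def is_retained(months_to_next_enrollment: Optional[int]) -> bool:
    """True when the student re-enrolled within 12 months.

    None (an assumed encoding) means no later enrollment was observed.
    """
    return months_to_next_enrollment is not None and months_to_next_enrollment <= 12
```

Note that a GPA of exactly 2.0 counts as successful under this definition, which is why the comparison is inclusive.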


Section  6:  Key  Findings  from  Data  Mining      

The  findings  presented  in  this  section  are  a  result  of  data  mining  efforts  aimed  at  identifying   factors  contributing  to  students’  success  and  retention  at  UMUC.    Data  mining  is  an  exploratory   technique  that  identifies  factors  emerging  from  big  data  and  allows  iterative  predictive  models   to  be  run,  using  a  variety  of  algorithms  and  boosting  techniques  to  improve  prediction  accuracy.     In  the  data  mining  phase  of  the  analyses,  a  large  number  and  variety  of  models  were  run  with   the  aim  of  predicting  retention  and  student  success  at  UMUC.    The  key  models  and  factors   identified  through  data  mining  are  presented.    A  summary  of  models  to  be  discussed  in  the   results  can  be  found  in  Appendix  B,  along  with  information  about  model  fit.  

 

Research  Goal  1.  Profile  students  at  UMUC  based  on  community  college  course  taking   behaviors  

 

In these analyses two joint indicators of students’ success at UMUC were used: achievement at UMUC of a first-semester GPA of 2.0 or above and retention at UMUC. These indicators of success were used to create outcome profiles, and then a predictive model was built on the students’ prior academic work and demographic variables.

 

Sample.  The  initial  data  set  consisted  of  14,218  students  with  a  total  of  187,697  course   enrollments  from  Montgomery  College,  and  11,046  students  and  a  total  of  156,373  course   enrollments  from  Prince  George’s  Community  College.    

 

The top 50 courses from each community college were determined and were organized by course subject area. These top 50 courses from each community college represented a sample from a total of 1,404 PGCC courses and 2,737 MC courses. As a result, the final dataset included 12,637 students and 108,237 enrollments. The number of students and a listing of all of the courses included in each data set are included in Appendix C.

 

Methods.  Data  exploration  was  performed  using  IBM  Modeler,  SPSS,  SAS  JMP  10  Pro,  and  Excel.   Data  were  transformed  and  new  variables  were  created  as  needed.    Transformations  were   performed  in  Modeler,  JMP,  and  Excel.  

A variety of black box algorithms (neural nets, boosted trees, and Random Forests) were used to develop profiles of students’ success. Random Forests is a recently developed algorithm that provides strong data modeling, but its findings may not be readily interpretable. It builds a large number of small trees and averages their results. JMP’s Bootstrap Forest, used on a dataset of variables derived solely from the community college data, provided a way of differentiating the likelihood of retention among those students with low UMUC GPAs.

To  evaluate  effectiveness,  these  models  were  developed  on  a  subset  of  the  data  and  then   applied  to  a  different  subset  (a  holdout  dataset)  that  had  not  been  used  in  the  model  building.   The  misclassification  rate  (the  proportion  of  wrong  predictions)  was  used  to  evaluate  the   effectiveness  of  the  models.  A  number  of  other  measurements  of  effectiveness  were  assessed,   including  lift,  sensitivity,  specificity,  false  positive  rate  and  false  negative  rate.  However,  the   models  which  performed  well  on  the  original  dataset  did  not  yield  equally  good  results  on  the   holdout  dataset,  indicating  that  the  models  were  overfitting  the  data  (i.e.,  they  would  not   generalize  well  to  other  data).  
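The holdout procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the 30% holdout fraction, the random seed, and the function names are hypothetical (the report does not state the split used), and the model-fitting step, which was done in IBM Modeler and JMP, is omitted.

```python
import random


def holdout_split(records, holdout_fraction=0.3, seed=0):
    """Shuffle records and split them into (training, holdout) lists.

    The holdout subset is never used for model building.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


def misclassification_rate(actual, predicted):
    """Proportion of predictions that differ from the actual outcome."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)
```

A model is overfitting in the sense used above when its misclassification rate on the training subset is clearly lower than its rate on the holdout subset.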


Four  indices  of  model  fit  were  used  to  compare  and  evaluate  model  quality.    Performance   indicators  were  calculated  based  on  model  fit  for  the  validation  data  subset.        

• Overall accuracy is the percentage of students correctly identified as “successful” or “not successful.”

• Accuracy improvement (lift) compares the accuracy  of the model to the accuracy of predicting the majority case (“successful”) for everyone. Negative lift means the accuracy is worse than simply predicting the majority case for everyone.

• False positive rate is the percentage of “not successful” students identified as “successful.”

• False negative rate is the percentage of “successful” students identified as “not successful.”    
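The four indices above can be computed from paired actual and predicted outcomes, as in the sketch below. The names are hypothetical, and expressing lift as a difference (model accuracy minus the accuracy of predicting "successful" for everyone) is an assumption; the report describes the comparison but does not give a formula.

```python
def fit_indices(actual, predicted):
    """Compute the four fit indices from lists of booleans.

    True means "successful"; False means "not successful".
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # successful, predicted successful
    tn = sum(1 for a, p in pairs if not a and not p)  # not successful, predicted so
    fp = sum(1 for a, p in pairs if not a and p)      # not successful, predicted successful
    fn = sum(1 for a, p in pairs if a and not p)      # successful, predicted not successful
    n = len(pairs)
    accuracy = (tp + tn) / n
    baseline = (tp + fn) / n  # accuracy of predicting "successful" for everyone
    return {
        "accuracy": accuracy,
        "lift": accuracy - baseline,  # assumed form: difference in accuracy
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else None,
    }
```

With this form, a lift below zero means the model does worse than labeling every student "successful", matching the interpretation of negative lift given above.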

 

Results - Student Retention

Change in GPA. The first set of models used students’ retention as an outcome. The strongest predictor of student retention was change in GPA, computed by subtracting students’ community college GPA from their GPA in their first semester at UMUC. Values ranged from -4.0 to +4.0 and were binned in intervals of 0.25. Among students who were retained within a year, only 40% experienced a drop in their GPA in their first semester at UMUC. By contrast, among students who were not retained within a year, 70% had experienced a decrease in their GPA.

The main finding is that, regardless of whether their UMUC GPA was above or below 2.0, students whose first-semester GPA at UMUC was lower than what it was at community college were less likely to demonstrate persistence at the university.
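The change-in-GPA binning and the comparison of GPA drops by retention status can be sketched as below. This is an illustration only: the names are hypothetical, and labeling each 0.25-wide bin by its lower edge is an assumption, since the report does not say how bin boundaries were aligned.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.25


def gpa_change_bin(cc_gpa, umuc_gpa, width=BIN_WIDTH):
    """Lower edge of the bin containing UMUC GPA minus community college GPA."""
    delta = umuc_gpa - cc_gpa
    return math.floor(delta / width) * width


def share_with_gpa_drop(students):
    """Share of students whose GPA fell, by retention status.

    students: iterable of (cc_gpa, umuc_gpa, retained) tuples.
    Returns a dict mapping retained (True/False) to a proportion.
    """
    totals = defaultdict(int)
    drops = defaultdict(int)
    for cc_gpa, umuc_gpa, retained in students:
        totals[retained] += 1
        if umuc_gpa < cc_gpa:
            drops[retained] += 1
    return {group: drops[group] / totals[group] for group in totals}
```

Applied to the full population, the second function would yield the kind of comparison reported above (a GPA drop for 40% of retained students versus 70% of students who were not retained).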

Model  1  summary  information  is  presented  in  Appendix  B.  The  distribution  of  delta  GPA  is   presented  in  Figure  1  on  the  following  page.


Figure  1.  

Change  in  GPA  for  Students  Retained  or  Not  Retained  at  UMUC  

[Chart: "Change in GPA and Retention". X-axis: change in GPA from CC to UMUC (binned). Y-axis: % of students retained or not, 0% to 100%.]


Results - Student Success

Demographic Factors. A variety of models, presented in Appendix B, were used to determine predictors of success. First, a model was developed to predict students’ success using demographic factors. Independent models predicting success at UMUC based on student demographics were run separately for MC and PGCC students. Model information is summarized in Models 2, 3, and 4 in Appendix B.

Community College GPA. Community college GPA was binned as successful if greater than or equal to 2.0, or unsuccessful if less than 2.0. CC GPA was found to be a significant predictor of students’ success at UMUC. Further, students’ success at UMUC was predicted by the percentage of A, B, and C grades that students received at community college. See Appendix B, Models 5, 6, and 7 for summary information. Distributions of community college grades for students classified as successful or not successful at UMUC are displayed in Table 1.

The main finding is that students who earned a UMUC first term GPA of 2.0 or above were more likely to have earned As at community college than students earning a UMUC first term GPA below 2.0.

Conversely, students who earned a UMUC first term GPA below 2.0 were more likely to have earned Fs or Ws at community college than students earning a UMUC first term GPA of 2.0 or above. See Appendix B, Model 10 for summary information. The importance of students’ community college performance in predicting UMUC success was upheld through both data mining and predictive (Section 7) approaches.

 

Table 1. Community College Grade Distributions for Students Successful or Not at UMUC (N = 15,890)

CC grades (mean %)                  A grades   B grades   C grades   D grades   F grades   W grades
UMUC GPA ≥ 2.0 (10,871 students)    30%        27%        17%        6%         10%        11%
UMUC GPA < 2.0 (5,019 students)     16%        20%        17%        7%         22%        19%

Note: Grade distributions were computed based on the total number of course enrollments.

Similarly, distributions of community college grades for students classified as retained or not at UMUC are displayed in Table 2.

No substantial differences in community college grade distributions were found between students who were retained at UMUC within a year and those who were not.


Table 2. Community College Grade Distributions for Students Retained or Not at UMUC (N = 15,890)

CC grades (mean %)   A grades   B grades   C grades   D grades   F grades   W grades
Retention YES        26%        26%        17%        6%         12%        12%
Retention NO         22%        23%        16%        6%         17%        16%

In addition to independently considering these two outcomes of student success – UMUC GPA and retention at UMUC – researchers also examined the two outcomes jointly. Thus, profiles of student success at UMUC were determined that classified students based on successful GPA and retention. All combinations of the two attributes were examined. Four quadrants were formed with students evidencing a high or low GPA, and being retained or not. These four Success Quadrants were named Stars, Strivers, Slackers, and Splitters. Each quadrant is described in Figure 2 below.

Figure 2.

Success  Quadrants    
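The quadrant assignment can be sketched as a two-way classification on GPA and retention. The sketch uses descriptive labels rather than the four profile names, because the mapping of names to quadrants is given in Figure 2 and is not assumed here; the function name and the use of the 2.0 cut-off for "high" GPA are illustrative assumptions based on the student success definition in Section 5.

```python
from collections import Counter


def success_quadrant(first_term_gpa, retained, cutoff=2.0):
    """Return a descriptive quadrant label for one student."""
    gpa_level = "high GPA" if first_term_gpa >= cutoff else "low GPA"
    status = "retained" if retained else "not retained"
    return f"{gpa_level}, {status}"


# Hypothetical students: (first term UMUC GPA, retained within 12 months).
students = [(3.4, True), (1.6, True), (1.2, False), (3.8, False), (2.0, True)]
quadrant_counts = Counter(success_quadrant(gpa, kept) for gpa, kept in students)
```

Counting students per label in this way gives the size of each of the four quadrants.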

                                                           
