In this chapter, I describe the data source, samples, measures, and data analysis plans for my dissertation, which is composed of two studies. The first study is a replication study that examines science motivation in rural African American high school students using Eccles et al.’s (1983) expectancy-value theory of motivation. The second study investigates the relationships between parental socialization and rural African American students’ science expectancies using the Eccles et al. (1983) parent socialization model of expectancy-value theory. The following section describes the source of data used for both studies. Subsequent sections detail the study variables and analytic plans for each study. The chapter concludes with an overview of the research questions, hypotheses, and analytic strategies used in this dissertation.
Source of Data
HSLS:09 Sample Design
The High School Longitudinal Study of 2009 (HSLS:09; Ingels et al., 2013) is a nationally representative, longitudinal study of over 23,000 students from 944 public and private schools across the United States that contained both 9th and 11th grades. During the base year (Fall 2009), a randomly selected sample of ninth-grade students was administered an online questionnaire and a mathematics assessment within the school setting. Students’ parents, lead school counselors, school administrators, and mathematics and science teachers were also administered questionnaires online or by phone. The data for this dissertation were taken from the first follow-up, which took place when most sample members were in 11th grade (Spring 2012), and from the transcript study (Fall 2013). During the first follow-up, students were again administered an online questionnaire and a mathematics assessment. Counselors and principals at each participating school were also administered questionnaires, and a random subsample of parents was administered a questionnaire. In September 2013, transcripts were requested from all base-year schools, as well as from any school to which students had reported transferring during the first follow-up and/or the 2013 Update study that occurred during the summer of 2013. Variables from the first follow-up student questionnaire and the High School Transcript Study were analyzed in Study 1. In Study 2, variables from the first follow-up student and parent questionnaires were analyzed.
HSLS: Instrumentation
This dissertation utilized data from both the student and parent questionnaires of the HSLS:09 first follow-up, which occurred during the spring of 2012. Data from the HSLS:09 High School Transcript Study were also analyzed. These data sources are described below.
First follow-up student questionnaire. This online survey was administered during the spring of 2012 to students who had participated in the base-year survey as ninth graders during the fall 2009 semester. The survey was given regardless of students’ current enrollment status. It was designed to include students who had been retained, had dropped out, had graduated early, or had transferred to other schools or educational settings since the base-year study. Students were surveyed on a variety of topics related to their school experience, including attendance, grades, demographics, and family background (including plans and preparations for life after high school). Other topics were knowledge of college admission processes (testing, financial aid, college characteristics), influences on thinking and behavior, and peer behaviors and aspirations. Students were also asked specifically about their experiences with science and mathematics. These questions covered course enrollment, feelings about math and science classes, math and science identity and utility, and participation in science and mathematics extracurricular programs.
First follow-up parent questionnaire. During the first follow-up in 2012, a parent questionnaire was administered online, by telephone, or through an in-person interview to a random subsample of students’ parents. The parent or guardian most familiar with the student’s school situation and experience was asked about several topics:
- household characteristics;
- the sample member’s prior educational experiences;
- parental involvement in the sample member’s education;
- postsecondary planning, including parent expectations and aspirations.
High school transcript. In September 2013, researchers began collecting transcripts from all of the base-year schools that participated in the study. Transcript requests were also sent to the transfer schools that students reported during the first follow-up in 2012 and the 2013 Update. School administrators were asked to report enrollment, testing, and coursetaking² information for each student, along with the school’s grading and graduation policies and requirements (Ingels et al., 2015). Student information in the transcript study included:
- participation in specialized programs;
- enrollment dates (graduation date or final withdrawal date);
- the reason the student left school (graduated, transferred, etc.);
- diploma type (standard diploma, General Educational Development (GED) certificate, certificate of attendance, etc.);
- cumulative grade point average (GPA);
- standardized test scores for the PSAT, SAT/ACT, AP, IB, and/or SAT subject tests;
- coursetaking histories for grades 9 through 12. High school-level courses taken before 9th grade, such as algebra, geometry, or foreign language, were also included.

Coursetaking histories included the course title, course number, school year, grade level, and term in which each course was taken, as well as the credits earned and the grade received in each course. Schools also provided information about their grading scale, course weighting system, GPA formula, term system, and diploma requirements.
² The U.S. Department of Education, Institute of Education Sciences (December 2005) style guide treats coursetaking as a compound word.
HSLS: Participants. The first follow-up of the HSLS:09 included data from 20,594 students from 904 schools across the United States. Data were collected during the Spring 2012 semester, when most students were in 11th grade. The original sample was composed of 10,384 (50.4%) males and 10,210 (49.6%) females. In addition, 17,164 (83.3%) of the students surveyed attended public schools. Approximately 84% of the respondents attended the same school as in ninth grade (the base-year school). The remainder had transferred (11.6%), dropped out (2.1%), begun homeschooling (1.1%), or graduated early (0.9%). Demographic information about the racial and ethnic backgrounds of first follow-up participants is included in Table 2.
The High School Transcript Study included transcripts from 21,928 students from over 900 schools across the United States. This total includes students who were recruited during the base year but did not participate in the first follow-up. The study included 11,146 males (50.8%) and 10,782 females (49.2%). As in the first follow-up, most students in the transcript study attended public schools (n = 18,123; 82.6%). Nearly 25% of these students (n = 5,288; 24.1%) were identified as rural. Demographic information about the racial and ethnic backgrounds of High School Transcript Study participants is also included in Table 2.
Table 2.
HSLS:09 Subject Demographics: By Race

Race                                   First Follow-Up Studyᵃ      High School Transcript Studyᵇ
                                       Number     Percentage (%)   Number     Percentage (%)
American Indian/Alaska Native          187        0.91             219        1.00
Asian                                  2,129      10.34            2,270      10.34
African American/Black, Non-Hispanic   2,518      12.23            2,685      12.24
Hispanic/Latino                        3,193      15.50            3,488      15.91
White, Non-Hispanic                    12,217     59.32            12,897     58.82
Multiracial                            350        1.69             369        1.68
Study Variables and Analytic Plans
Variables used in this dissertation are drawn from three sources:
- single-item and composite variables from the student and parent questionnaires of the HSLS:09 first follow-up;
- a single-item variable from the HSLS:09 High School Transcript Study.

The HSLS:09 research team (Ingels et al., 2013) created a range of composite variables from two or more questionnaire items, suitable for use as endogenous variables in data analysis. Additionally, imputation was used in many cases to fill in missing responses and lessen item nonresponse bias. For more information on composite variables and their associated scale scores, please see Chapter 5 of the HSLS:09 First Follow-up Data File Documentation (NCES 2014-361). All variables used in this dissertation are described below within their respective studies. For composite variables, the questionnaire items are described in Tables 3-5.
Sample Selection: Studies 1 and 2
Consistent with the research aims of the dissertation, Study 1 and Study 2 both included a sample of African American students from rural public schools. As described below, a three-step process was used to identify the rural African American students for data analyses. I began by extracting the publicly available data from the official HSLS:09 website (https://nces.ed.gov/surveys/hsls09/) using the SPSS Statistics Version 24 software package. From the full dataset, I first extracted the data for African American students. From among these data, I then extracted only the students who attended rural, public schools (see Figure 5).
To identify the students’ race and ethnicity, the HSLS:09 researchers created a composite race/ethnicity variable (X2RACE) for each student respondent. It was composed of six dichotomous composite variables (e.g., X2WHITE, X2BLACK, X2ASIAN). Each one indicates whether or not a student belonged to that racial category (e.g., White/not White, Black/not Black, Asian/not Asian). Students indicated their race on the base-year questionnaire as well as on the first follow-up questionnaire. When a student’s response was missing, the composite was imputed using race/ethnicity data from the parent questionnaires. For both studies of the dissertation, the dichotomous composite variable X2BLACK was used to create a sample of students whose response had been coded as 1 = BLACK.
Next, I used school locale (X2LOCALE) to exclude students from non-rural locales, retaining students whose school had been coded as 4 = Rural. I also used school control (X2CONTROL) to exclude students who attended non-public schools, retaining those coded as public (X2CONTROL = 1).
Last, I excluded students who were no longer enrolled in school by 11th grade (the Spring 2012 first follow-up). This step eliminated students who may have graduated early, dropped out, or begun homeschooling. It also eliminated students who may have become ineligible for the study (e.g., by moving outside the United States or being incarcerated). These steps yielded a total of 824 rural African American 11th-grade public school students.
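The selection steps above can be expressed as a sequence of filters that reports the remaining n at each stage, as in Figure 5. The following Python sketch is illustrative only; the selection itself was carried out in SPSS 24. The `Student` record layout is assumed, and the non-public code 2 in the toy data is made up. The `enrolled_spring_2012` field is a hypothetical stand-in for the HSLS:09 enrollment-status information, which this chapter does not name. The codes X2BLACK = 1, X2LOCALE = 4, and X2CONTROL = 1 follow the text and Figure 5.

```python
from dataclasses import dataclass

@dataclass
class Student:
    x2black: int                # 1 = Black (X2BLACK)
    x2locale: int               # 4 = Rural (X2LOCALE)
    x2control: int              # 1 = Public (X2CONTROL)
    enrolled_spring_2012: bool  # hypothetical flag; the HSLS:09 variable is not named here

def select_sample(students):
    """Apply the selection filters in order and record n after each one."""
    steps = [
        ("African American (X2BLACK = 1)", lambda s: s.x2black == 1),
        ("Rural (X2LOCALE = 4)", lambda s: s.x2locale == 4),
        ("Public (X2CONTROL = 1)", lambda s: s.x2control == 1),
        ("Enrolled in Spring 2012", lambda s: s.enrolled_spring_2012),
    ]
    remaining = list(students)
    counts = [("All students", len(remaining))]
    for label, keep in steps:
        remaining = [s for s in remaining if keep(s)]
        counts.append((label, len(remaining)))
    return remaining, counts

# Toy records (made up for illustration; not HSLS:09 data)
toy = [
    Student(1, 4, 1, True),
    Student(1, 4, 1, False),
    Student(1, 4, 2, True),
    Student(1, 2, 1, True),
    Student(0, 4, 1, True),
]
sample, counts = select_sample(toy)
for label, n in counts:
    print(f"{label}: n = {n}")
```

Reporting n after every filter makes it straightforward to check an extraction against the counts shown in Figure 5.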
Figure 5. Steps used to extract the study sample from the HSLS:09 dataset
All students (N = 23,503)
→ African American students (X2BLACK = 1) (n = 3,832)
→ African American rural students (X2LOCALE = 4) (n = 947)
→ African American rural public school students (X2CONTROL = 1) (n = 889)
→ African American rural public school students currently enrolled in school during Spring 2012 (n = 824)
Study 1 – Replication Study
Endogenous variables. The first study conducted for this dissertation was a replication study of Eccles et al.’s (1983) expectancy-value theory of motivation using a national sample of rural African American high school students. As described earlier, students’ expectancies and subjective task values in a domain are theorized to predict achievement-related behaviors (e.g., persistence, choice, and performance).
Science expectancies (science self-efficacy). X2SCIEFF is a scale of the sample member’s science self-efficacy; higher X2SCIEFF values represent higher science self-efficacy. HSLS:09 researchers (Ingels et al., 2013) created this scale through principal components factor analysis (weighted by W2STUDENT) and standardized it to a mean of 0 and a standard deviation of 1. The inputs to this scale were S2STESTS, S2STEXTBOOK, S2SSKILLS, and S2SASSEXCL (see Table 3 for item descriptions). The coefficient of reliability (alpha) for the scale is .92.
Science subjective task value. Subjective task value (STV) is composed of four components: attainment/identity value, utility value, intrinsic value, and cost. HSLS:09 researchers created composite variables for the first three components of STV, and these composites were used in this study.
X2SCIID is a scale of the sample member’s science identity. Sample members who agree more strongly with the statements “You see yourself as a science person” or “Others see you as a science person” have higher values on X2SCIID. HSLS:09 researchers (Ingels et al., 2013) calculated a coefficient of reliability (alpha) for the scale of .89.
X2SCIUTI is a scale of the sample member’s perception of the utility of science; higher values represent perceptions of greater science utility. The variable was created through principal components factor analysis (weighted by W2STUDENT) and standardized to a mean of 0 and a standard deviation of 1. The inputs to this scale were S2SUSELIFE, S2SUSECLG, and S2SUSEJOB (see Table 4 for item descriptions). HSLS:09 researchers (Ingels et al., 2013) calculated a coefficient of reliability (alpha) for the scale of .82.
X2SCIINT is a scale of the sample member’s interest in his or her Spring 2012 science course; higher values represent greater interest in the science course. The variable was created through principal components factor analysis (weighted by W2STUDENT) and standardized to a mean of 0 and a standard deviation of 1. The inputs to this scale were S2SENJOYING, S2SWASTE, S2SBORING, S2FAVSUBJ, and S2SENJOYS (see Table 4 for item descriptions). HSLS:09 researchers (Ingels et al., 2013) calculated a coefficient of reliability (alpha) for the scale of .77.
Table 3.
Science Self-Efficacy Item Descriptors
Composite Variable / Item: Description
Scale of Science Efficacy (X2SCIEFF): How much do you agree or disagree with the following statements about [science course title/science]?
S2STESTS: You [are/were] confident that you [can/could] do an excellent job on tests in this course. / You are confident that you can do an excellent job on science tests.
S2STEXTBOOK: You [are/were] certain that you [can/could] understand the most difficult material presented in the textbook used in this course. / You are certain that you can understand the most difficult material presented in science textbooks.
S2SSKILLS: You [are/were] certain that you [can/could] master the skills [being taught/that were taught] in this course. / You are certain that you can master science skills.
S2SASSEXCL: You [are/were] confident that you [can/could] do an excellent job on assignments in this course. / You are confident that you can do an excellent job on science assignments.
Source: Ingels et al. (2013)
Table 4.
Science Subjective Task Value Item Descriptors
Composite Variable / Item: Description
Scale of the sample member’s science identity (attainment value) (X2SCIID): How much do you agree or disagree with the following statements about science?
S2SPERSON1: You see yourself as a science person.
S2SPERSON2: Others see you as a science person.
Scale of the sample member’s perception of the utility of science (X2SCIUTI): How much do you agree or disagree with the following statements about science?
S2SUSELIFE: Science is useful for everyday life.
S2SUSECLG: Science is useful for college.
S2SUSEJOB: Science is useful for a future career.
Scale of the sample member’s interest in his or her science course (X2SCIINT): How much do you agree or disagree with the following statements about science?
S2SENJOYING: You [are enjoying/enjoyed] this class very much. / You enjoy science classes very much.
S2SWASTE: You [think/thought] this class [is/was] a waste of your time. / You think science classes are a waste of your time.
S2SBORING: You [think/thought] this class [is/was] boring. / You think science classes are boring.
S2FAVSUBJ: Not including lunch or study periods, what [is/was] your favorite school subject?
S2SENJOYS: you really enjoy science?
Source: Ingels et al. (2013)
Study 1: Exogenous variables. Achievement-related behaviors (e.g., persistence, choice, and performance) are indicative of student motivation (Andersen & Ward, 2014; Eccles et al., 1983; Graham, 2004). Choice was measured by the number of science credits earned by students, as reported on their transcripts. X3TCREDSCI indicates the number of science credits a student earned; it ranges from 0 to 8 and includes half-credits (e.g., 3.5).
Student effort was measured using X2SEFFORT, a scale of the student’s answers about effort in science. HSLS:09 researchers (Ingels et al., 2013) created this variable through principal components factor analysis and standardized it to a mean of 0 and a standard deviation of 1. The inputs to this scale were S2SATTENTION, S2SONTIME, S2SSTOPTRYING, and S2SGETBY (see Table 5 for item descriptions).
To determine which students planned to persist in science beyond high school, I used X2STU30OCC_STEM1. This variable classifies the job that the student plans to have, or would like to have, at age 30. HSLS:09 researchers matched students’ reported job titles and duties to descriptions from the Occupational Information Network (O*NET). They assigned 2-digit O*NET codes using the 2000 Standard Occupational Classification taxonomy and then categorized each occupation by its relation to STEM (Ingels et al., 2013). X2STU30OCC_STEM1 was coded as follows:
- 0 = Not a STEM occupation
- 1 = Life and Physical Science, Engineering, Mathematics, and Information Technology Occupations
- 2 = Social Science Occupations
- 3 = Architecture Occupations
- 4 = Health Occupations
- 5 = Split across two sub-domains
- 6 = Unspecified sub-domain

Missing values were imputed from the base-year student questionnaire.
Table 5.
Achievement Behaviors Item Descriptors
Composite Variable / Item: Description
Scale of the student’s answers about science effort (X2SEFFORT): How often [do/did] you do these things in [science course title]?
S2SATTENTION: You [pay/paid] attention to the teacher.
S2SONTIME: You [turn/turned] in your assignments and projects on time.
S2SSTOPTRYING: When an assignment [is/was] very difficult, you [stop/stopped] trying.
S2SGETBY: You [do/did] as little work as possible; you just [want/wanted] to get by.
Source: Ingels et al. (2013)
Analytic plan. In the following sections, I describe the analytic procedures used in the replication study (Study 1). I provide details about the steps taken to prepare the data for analysis (e.g., recoding, data screening), in addition to the steps taken to test the replication study’s research hypotheses.
Recoded variables. I recoded responses to the occupational question in the student questionnaire (X2STU30OCC_STEM1). This variable initially included six codes indicating plans for future employment in STEM subdomains (e.g., Health Occupations, Life and Physical Science). It also included codes indicating non-STEM occupations, uncodeable entries, and other nonresponses. All codes indicating STEM employment were collapsed into one (1 = Plans to persist in a STEM occupation), because the specific subdomains were not pertinent to the analyses. Any responses indicating non-STEM occupational plans were coded as 0. Uncodeable entries were recoded as missing (999).
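The recoding rule above can be sketched as a small function. This Python sketch is illustrative; the recoding itself was done in SPSS 24. It takes STEM codes 1-6 and non-STEM code 0 from the coding list given earlier. It assumes that every other value represents an uncodeable entry or a nonresponse code. The exact reserve codes should be taken from the HSLS:09 codebook. Python’s `None` plays the role of the 999 user-missing code used in SPSS.

```python
from typing import Optional

STEM_SUBDOMAIN_CODES = {1, 2, 3, 4, 5, 6}  # STEM sub-domain codes of X2STU30OCC_STEM1
NON_STEM_CODE = 0

def recode_stem_plan(code: int) -> Optional[int]:
    """Collapse X2STU30OCC_STEM1 into 1 = plans STEM occupation, 0 = does not.

    Any other value (assumed: uncodeable entries or HSLS:09 reserve codes)
    is returned as None, i.e., treated as missing.
    """
    if code in STEM_SUBDOMAIN_CODES:
        return 1
    if code == NON_STEM_CODE:
        return 0
    return None

# Toy codes (made up); -9 stands in for a nonresponse code
print([recode_stem_plan(c) for c in [0, 1, 4, 6, -9, 2]])  # [0, 1, 1, 1, None, 1]
```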
In addition, HSLS:09 researchers assigned a value of -7 to any student who lacked a complete set of responses to the items used to create the composite variables analyzed in this study (X2SCIEFF, X2SCIINT, X2SCIUTI, and X2SCIID). I recoded these entries as missing so that they would not be included in the analyses.
Categorical data. As noted in the previous section, one of the exogenous variables (X2STU30OCC_STEM1) was recoded into a dichotomous variable (1 = Plans to persist in a STEM occupation, 0 = No plans to persist in a STEM occupation). Because this variable is categorical, I used the weighted least squares mean- and variance-adjusted (WLSMV) estimator for modeling and to handle missing data.
Descriptive analyses. Using SPSS 24, I conducted descriptive analyses, including means, standard deviations, skewness, and kurtosis. I also examined bivariate correlations and scatterplots to identify cases with extreme values and pairs of variables with high correlations (r > .85), which could indicate multicollinearity. Mplus version 7 was used to test the structural models hypothesized in this study. According to Muthén and Muthén (2012), Mplus can analyze both categorical and continuously scored variables.
Applying weights. I used the HSLS:09 analytic weights, which adjust for the subsampling of sample units, calibrate the sample to population counts, and account for nonresponse. Two analytic weights (W2STUDENT and W3STUDENTTR) were applied during the analyses to provide more accurate population estimates.
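An analytic weight such as W2STUDENT enters a point estimate as a weighted mean. HSLS:09 also supplies sets of replicate weights for estimating standard errors, discussed next. The sketch below shows both ideas under stated assumptions. It uses the standard balanced repeated replication (BRR) variance formula without a Fay adjustment: the mean squared deviation of the replicate estimates from the full-sample estimate. The exact formula for HSLS:09 should be confirmed in the NCES documentation. The toy data use 4 replicate weight sets rather than 200, and these sets are made up rather than a genuinely balanced design.

```python
import math

def weighted_mean(values, weights):
    """Weighted point estimate, e.g., with W2STUDENT as the weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def brr_standard_error(values, full_weights, replicate_weights):
    """BRR standard error (no Fay adjustment assumed): square root of the mean
    squared deviation of the replicate estimates from the full-sample estimate."""
    theta = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, rw) for rw in replicate_weights]
    variance = sum((t - theta) ** 2 for t in reps) / len(reps)
    return math.sqrt(variance)

# Toy data (made up): 4 students, full-sample weights, 4 replicate weight sets
scores = [1.0, 2.0, 3.0, 4.0]
full_w = [1.0, 1.0, 1.0, 1.0]
replicates = [
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 2.0],
]
print(weighted_mean(scores, full_w))
print(brr_standard_error(scores, full_w, replicates))
```

Because every replicate estimate is computed the same way as the full-sample estimate, the spread of the replicate estimates reflects the sampling design. An unweighted simple-random-sample formula would ignore the clustering of students within schools.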
Applying replicate weights. To account for clustering that may have occurred within HSLS:09 sampling procedures, I employed balanced repeated replication (BRR) weights. This procedure was used because students within the same school may be more similar to one another than to students from other schools. Such similarity arises from exposure to similar environments (e.g., counselors, administrators, teachers, courses). Ignoring this clustering distorts variance estimation and leads to underestimating the actual variability in the population (i.e., underestimated variances and standard errors) (NCES, 2012). HSLS:09 provides 200 BRR weights for each of the student analytic samples (W2STUDENT1-W2STUDENT200, W3HSTRANS1-W3HSTRANS200). Estimation was repeated 200 times to