CHAPTER THREE Methods - Relative Age Effects and Measures of Potential in the Primary Grades

This study is an examination of the relationship between relative age and students’ achievement test performance, teacher ratings of students’ academic performance, and teacher ratings of students’ learning behaviors. I have three research questions aimed at exploring this topic.

Research Questions and Hypotheses

In this section, I outline the three research questions used to guide this study and explain my hypotheses for each question.

Research Question 1: How does relative age predict student performance on academic assessments and teacher assessments of student behaviors in the primary grades after controlling for student SES and gender?

Students enter kindergarten with a range of skills, and the younger a student is, the less time he/she has had to learn and the fewer opportunities he/she has had to experience learning. It is reasonable to expect that a relationship exists between relative age and student performance on math, science, and reading assessments, particularly in kindergarten. When students enter kindergarten, teachers gauge a student’s readiness using objective measures such as standardized assessments and behavior-based assessments such as observations of behaviors and the student’s maturity. Because maturity is linked to absolute age, it stands to reason that if a student is comparatively young for the grade, then he/she will seem less mature. By analyzing teacher observations of student behaviors and a student’s relative age, we can determine whether a student’s relative age is related to the teacher’s perceptions of the student’s behaviors. I expect relative age, after controlling for student SES and gender, will predict student performance on academic assessments and teacher assessments of student behaviors.

Research Question 2: How does the effect of relative age on measures of academic performance and teacher ratings of student behaviors attenuate as students age?

Bloom (1966) acknowledged that the first three years of a child’s school career are the most developmentally important in education as students experience a period of rapid cognitive and intellectual growth. Elder and Lubotsky’s (2009) results supported this developmental theory, finding that the positive effect of being relatively older became non-significant after first grade, and age-related gaps began diminishing soon after. Because students grow so quickly in their first few years of schooling, I expect that the strength of the relationship between a student’s relative age and his/her achievement scores and teacher observations of student performance will decrease as the student progresses through the grades.

Research Question 3: What is the magnitude of the relationship between teacher assessments of student behaviors and student performance on academic assessments in the primary grades?

Particularly in elementary school, teachers are very familiar with their students’ performance and abilities. Teachers make instructional decisions and establish expectations based on student performance (Südkamp et al., 2012). Researchers have linked student performance and teacher perceptions of student performance in an attempt to determine the accuracy of teacher judgments (Südkamp et al., 2012) and to explore student characteristics that might influence teacher perceptions (Meissel et al., 2017). Because of this strong research support, I expect the relationship between teacher assessments of student behaviors and student performance on academic assessments to be relatively strong.

To address these research questions, I applied an instrumental variable estimator incorporated into an autoregressive cross-lagged model with three latent variable pathways to longitudinal data. In this chapter, I describe the sample and variables of interest and provide an explanation of the methodology.

Sample Description

The sample for this study consisted of students in primary grades across the United States from the 2010-2011 school year through the 2013-2014 school year. The participants took part in the third version of the Early Childhood Longitudinal Study (ECLS) program, the ECLS-K:2011 (NCES, 2017). The nationally representative sample was selected in a three-stage process from students attending private and public school kindergarten in 2010-2011. During the first stage, the United States was divided into sampling units; then, schools and programs within the sampling units were selected for participation, with attention to the ethnic subgroups that were desired for oversampling (Asian, Native Hawaiian, and Other Pacific Islander). In the last stage, children in the selected schools were invited to participate in the study. The characteristics of this sample are outlined in Appendix B. In accordance with the results dissemination procedures outlined by NCES for the restricted-use data, all n values pertaining to student descriptions have been rounded to the nearest 10 (NCES, 2017).

There are 18,170 students in the sample from 1,310 schools across all 50 states divided into four regions. Nine states made up the Northeast region, which consisted of 3,010 (16.56%) students with 2,540 attending a public school. The Midwest region was represented by 3,870 (21.29%) students from 12 states with 3,220 attending public school. Sixteen states and the District of Columbia made up the South region, which consisted of 6,640 (36.54%) students, 6,070 of whom attended a public school. The West region had 4,660 (25.64%) students with 4,130 attending public schools from across 13 states. The largest number of students attended schools in city (n = 6,010) and suburb locations (n = 6,790), while the smallest numbers came from rural (n = 3,960) and town (n = 1,410) locations.

Approximately 12% (n = 2,220) of students attended private schools. Of those students, 1,760 attended a school with a religious affiliation. Almost 47% of student participants identified as White, non-Hispanic (n = 8,500); 2,400 identified as Black, non-Hispanic; 4,590 as Hispanic; 1,540 as Asian, non-Hispanic; 120 as Native Hawaiian/Other Pacific Islander, non-Hispanic; 170 as American Indian or Alaska Native, non-Hispanic; 830 as Two or more races; and 50 as Unknown.

From the 2010-2011 base-year sample of 18,170 students, 15,390 were still eligible at the third grade data collection (the 2013-2014 school year, Wave 7). The overall response rate for Wave 7 was 84.2 weighted and 79.9 unweighted (n = 12,900). Of the school types, public schools (n = 11,690) and private Catholic schools (n = 594) had the highest response rates of 95.1 and 94.1, respectively. Of those eligible, 1,740 students (approximately 11% of the third grade sample) formed the unknown/homeschooled group, which had one of the lowest response rates (n = 50, response rate = 2.7) for the Wave 7 data collection. The South (n = 4,570) and Midwest (n = 2,690) regions of the United States had the highest response rates (South = 95.3; Midwest = 95.2) of the census regions. Hispanic students (n = 3,540) had the highest response rate of all student race/ethnicity categories at 86.3, while the American Indian or Alaska Native, non-Hispanic group had the lowest (response rate = 70.7).

Data Collection

Data collection began in the 2010-2011 school year with a cohort of students in kindergarten. The data used in the current study span four years from the 2010-2011 school year through the 2013-2014 school year, when most of the students in the study were in third grade. The direct cognitive assessments were completed in the fall and spring of each school year. Because the primary years of a student’s education are a time of rapid cognitive development, the assessments given at each wave were not exactly the same, but they did measure the same underlying constructs. The achievement assessments were vertically scaled to account for student growth. The child-level teacher questionnaires were administered in the fall and spring of each school year as well, but the teachers were not asked the same questions at every wave. For example, at Wave 1 teachers were not asked to provide ratings for a student’s general ability level in a content area, and at Wave 6 teachers were not asked to provide feedback about a student’s ability to perform specific content area skills.

Instruments

The outcome variables of interest in this study are the child-level reading, math, and science assessment scores, as well as items from the child-level classroom teacher questionnaire.

Direct cognitive assessments. The battery of tests given to students was designed to require approximately 60 minutes for students to complete. Trained and certified child assessors administered the tests individually. The battery of assessments consisted of two executive functioning measures and reading, math, and science achievement tests. Assessors screened students’ language to determine which components of the assessment to administer (Tourangeau et al., 2016). The language screener was used through Wave 4. Regardless of performance on the language screener, all students received the same first set of items on the reading assessment, which was a combination of items from the Simon Says and Art Show tasks from the Preschool Language Assessment Scale (preLAS; Duncan & De Avila, 2000).

The reliabilities of the reading, math, and science scores range from .75 to .95 (NCES, 2017). The reading reliability decreased across waves, from .95 at Wave 1 to .87 at Wave 7, but the number of items administered also decreased over time (Tourangeau et al., 2016). Math had consistent reliability coefficients across all waves, with the lowest being .92 and the highest .94. Science had the lowest reliability at each wave (.75 at Wave 2, and .83 at all other waves) compared to reading and math, presumably attributable to the diversity of the science content and the low number of total items (Tourangeau et al., 2016). Construct validity of the achievement assessments was maintained through the development of potential test items based on national and state performance standards, state achievement assessments, and commercial achievement assessments (Tourangeau et al., 2016). Curriculum experts and school teachers established the content validity of the assessment items.

The math, reading, and science assessments were individually administered as two-stage adaptive tests. All students received the same questions of varying difficulty on the first stage. A student’s performance on the first stage determined which form of the second stage test he/she received (low, middle, or high difficulty). For the questions, children were required to either point to a response or tell the assessor an answer, but they were not required to explain their reasoning or write their answers.
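The routing logic of a two-stage adaptive test can be illustrated with a minimal Python sketch. The function name and the cut scores used below are hypothetical and serve only to show the idea; the actual routing rules used in the ECLS-K:2011 are not described in this chapter.

```python
def route_second_stage(first_stage_correct, low_cut, high_cut):
    """Pick a second-stage form from the number of first-stage items correct.

    low_cut and high_cut are hypothetical cut scores (assumptions for
    illustration), with low_cut < high_cut.
    """
    if first_stage_correct < low_cut:
        return "low"
    if first_stage_correct < high_cut:
        return "middle"
    return "high"


# Example with made-up cut scores of 5 and 10 items correct.
for score in (3, 7, 12):
    print(score, route_second_stage(score, low_cut=5, high_cut=10))
```

Under these assumed cut scores, a student with 3 first-stage items correct would be routed to the low-difficulty form, a student with 7 to the middle form, and a student with 12 to the high form.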

Reading. The reading assessment is a measure of basic skills such as letter recognition, beginning and ending sounds, sight vocabulary, decoding multisyllabic words, vocabulary knowledge, and reading comprehension. The reading passages represented grade-appropriate content, length, and language complexity across a variety of literary genres (Tourangeau et al., 2016). The earlier grades focused on the basic skills, and as the grade level increased, the focus shifted more toward reading comprehension. The reading assessment was administered in the fall and spring of each year, kindergarten through fifth grade.

Math. The math assessment covered the following topics: conceptual knowledge, procedural knowledge, and problem solving; number sense, properties, and operations; measurement; geometry and spatial sense; data analysis, statistics, and probability; and patterns, algebra, and functions (Tourangeau et al., 2016). Although students could see the majority of the text of the questions, the assessors also read the questions to the students so as to limit the dependence on reading and language. Assessors also provided students with paper and pencil. The math assessment was given in the fall and spring of each year, kindergarten through fifth grade.

Science. The science assessment included questions about physical science, life science, Earth science, space science, and scientific inquiry (Tourangeau et al., 2016). As with the math assessments, the assessors read the questions, potential responses, and any associated text to students to avoid a confounding measurement of reading ability or comprehension. The science assessment was given in the spring of kindergarten and then in the fall and spring of subsequent years.

For the reading, math, and science assessments, the data file contained thetas, standard errors of the thetas, and IRT scale scores for each assessment at each wave. The theta scores are estimates of a child’s achievement based on his/her performance on the items administered. They are used to represent a student’s latent ability; the scores are not item-dependent. The IRT scale score uses the theta to predict the probability the child would have answered an item correctly had he/she been administered all items. Then the probabilities for all items in a content area are combined to create one scale score for each subject area (Tourangeau et al., 2016). For this study, I used the IRT scale scores, because they are more appropriate for cross-sectional and longitudinal analyses and are generally more easily interpretable (Tourangeau et al., 2016).
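The relationship between a theta and an IRT scale score can be sketched in Python. This is an illustration under stated assumptions, not the NCES scoring procedure: it assumes a three-parameter logistic item response function, assumes the item probabilities are combined by summing them, and uses made-up item parameters. The function names are hypothetical.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under an assumed 3PL model.

    a = discrimination, b = difficulty, c = guessing (all hypothetical here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def irt_scale_score(theta, items):
    """Sum the predicted probabilities over every item in the pool.

    Summing is an assumption about how the probabilities are combined.
    """
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


# Made-up item pool: (a, b, c) for each item.
item_pool = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.2)]
print(irt_scale_score(-0.5, item_pool))
print(irt_scale_score(0.5, item_pool))
```

Because each item probability increases with theta, a higher theta always yields a higher scale score under this sketch, and the score can be read as the expected number of items answered correctly out of the full pool.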

Child-level teacher questionnaires. Teachers used their observations of students’ behaviors to answer questions about students’ knowledge, skill level, and social interactions.

Indirect cognitive assessments. Teachers evaluated students’ academic achievement in language and literacy, science, and mathematical thinking using the Academic Rating Scale (ARS). The instrument initially was developed for the ECLS-K to overlap with and supplement the direct cognitive assessments. Teachers used a 5-point scale to identify the degree to which a student had acquired and expressed the given skills and behaviors. These ratings of content area-specific skills were a part of the teacher questionnaire at Waves 1 through 4. In Waves 2 through 7, teachers provided a rating of a student’s generalized ability in a particular content area. For Waves 2 through 4, the content areas were reading, math, science, and social studies. Writing and oral language ratings were added starting with Wave 6. Reliability estimates for ratings of students’ academic performance were not reported.

Learning behavior measures. The social skills instrument used was adapted from the Social Skills Rating System (SSRS; Elliott, 1990). This assessment focused on factors of social competence and problem behaviors. Items were also adapted from the Children’s Behavior Questionnaire (CBQ; Putnam & Rothbart, 2006) and the Temperament in Middle Childhood Questionnaire (TMCQ; Simonds & Rothbart, 2004). These items were descriptions of behaviors related to attentional focusing, and teachers rated how true the description was of the student. The reliability estimates for this behavior construct ranged from a low of .83 at Wave 4 to a high of .96 at Waves 6 and 7. Additionally, teachers reported how often students exhibited certain behaviors related to approaches to learning. The scores on these questions were combined into an overall measure of a student’s approach to learning, which had a reliability estimate of .91 at each wave. This assessment was given in the fall and spring of the project’s first year and then every spring of subsequent school years.

Demographics. Standard demographic variables from the data set, such as student sex and SES, were used as control variables. The data set contains variables representing parent income, parent education, and occupation prestige. These variables were combined and normalized to act as a composite SES variable. I also incorporated dummy-coded variables reflecting the following: whether a student was an English learner (EL) at Wave 2, a first-time kindergartener, and a racial or ethnic minority, with White and Asian combined as the reference group. The school-level demographic variables I used were percentage of Free or Reduced Lunch (FRL) eligible students in the school, district-level poverty, and percentage of non-White students in the school.
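One common way to combine and normalize components into a composite can be sketched in Python as follows. This is only an illustration: it assumes each component is converted to a z-score and the z-scores are averaged with equal weight, which may differ from the procedure actually used to build the SES composite in the data set. The function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def ses_composite(income, education, prestige):
    """Average the standardized components for each student.

    Equal weighting of the three components is an assumption.
    """
    standardized = [z_scores(income), z_scores(education), z_scores(prestige)]
    return [mean(student) for student in zip(*standardized)]


# Made-up values for three students.
income = [30000, 60000, 90000]
education = [12, 16, 20]
prestige = [35.0, 50.0, 65.0]
print(ses_composite(income, education, prestige))
```

Standardizing first puts income, education, and prestige on a common scale, so no single component dominates the composite simply because of its units.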

Variables

I used a combination of existing and computed variables. Below I have outlined how I used and/or calculated the variables of interest. First, I provide a detailed description of how I calculated the age-related variables and the executive functioning indicator. Then, I describe the different coding systems I used and the specific variables to which they applied. I end this section with a description of how I dealt with missing data and a short description of the data cleaning process.

Computation of variables. I calculated the student’s actual age at kindergarten entry (actual age or AA); youngest age, according to state or district laws, at which a student could enter kindergarten (eligibility age or EA); predicted relative age at kindergarten entry (predicted relative age or PRA); and actual relative age at Waves 1, 2, 4, 6, and 7 (actual relative age or ARA).

Actual age. The database provided a kindergarten entry age for most students, as well as the month and year of birth. The provided kindergarten entry age is the student’s age on September 1 of the year the student first entered kindergarten. However, some students attended a kindergarten-like program prior to the 2010-2011 school year. For those students, the provided kindergarten entry age is the student’s age on September 1 of a school year prior to the 2010-2011 school year (see more under Repeaters in the Variable Coding section). The data set did not provide the year a student first entered kindergarten when that year was other than 2010-2011, and not all schools start on September 1. For both reasons, I recalculated the student’s actual age on the school start date of the 2010-2011 school year. To do this, I subtracted the student’s provided month and year of birth from the school start date. The difference was recorded in months as the student’s actual age at kindergarten entry.
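The subtraction described above can be sketched in Python. This is a minimal illustration that assumes the age is counted in whole months from the birth month to the month of the school start date; how the day of the month was handled is not stated in the text. The function name and the example dates are hypothetical.

```python
from datetime import date


def actual_age_months(birth_year, birth_month, school_start):
    """Age in whole months on the school start date.

    Only the month and year of birth are available, so the day of the
    month is ignored (an assumption for this sketch).
    """
    year_part = (school_start.year - birth_year) * 12
    month_part = school_start.month - birth_month
    return year_part + month_part


# Example: a student born in June 2005 whose school started August 31, 2010.
print(actual_age_months(2005, 6, date(2010, 8, 31)))  # 62
```

In this example, the five full years contribute 60 months and the June-to-August difference contributes 2 more, for an actual age of 62 months.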

Eligibility age. Eligibility age (EA) is the youngest age a student could be on the date school started in the 2010-2011 school year. This age is dependent on the state- or district-mandated rules on kindergarten entry age (i.e., the cutoff date by which a student must turn 5 to enter school in 2010-2011) and the date the 2010-2011 school year began at the school the student attended in Wave 1. To calculate this variable, I subtracted the cutoff date from the school start date. If the difference was positive, the cutoff date was before the school start date and all students entering school in 2010-2011 should be 5 years or older. If the difference was negative, the cutoff date was after the school start date and some students could enter kindergarten at less than 5 years old. The difference was added to the universal cutoff age of 60 months (5 years) and this sum was the EA (ENTRYAGE in the models). For example, if the cutoff date was July 31, 2010, and the school start date was August 31, 2010, the difference would be +1 month. This value would be added to 60 months, for an EA of 61 months. If the cutoff date was October 31, 2010, and the school start date was August 31, 2010, the approximate difference would be -2 months. When the difference is added to 60 months, the eligibility age for this student (and others in the same school) would be 58 months. The EA ranged from 54.80 months to 65.30 months.
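The two worked examples in this paragraph can be reproduced with a short Python sketch. The function names are hypothetical, and the conversion of leftover days into a fraction of a month (days divided by 30) is an assumption made here so that the result can take fractional values; the text does not specify how partial months were computed.

```python
from datetime import date


def months_between(earlier, later):
    """Signed difference in months from `earlier` to `later`.

    Leftover days are converted using a 30-day month (an assumption).
    """
    whole_months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    return whole_months + (later.day - earlier.day) / 30.0


def eligibility_age(cutoff, school_start, base_months=60):
    """EA = 60 months plus (school start date minus cutoff date)."""
    return base_months + months_between(cutoff, school_start)


# Cutoff before the start date: difference of +1 month.
print(eligibility_age(date(2010, 7, 31), date(2010, 8, 31)))   # 61.0
# Cutoff after the start date: difference of -2 months.
print(eligibility_age(date(2010, 10, 31), date(2010, 8, 31)))  # 58.0
```

A positive difference raises the EA above 60 months, and a negative difference lowers it, matching the sign convention described in the paragraph.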

Actual relative age. Actual Relative Age (ARA) is the student’s age relative to his/her kindergarten schoolmates in 2010-2011. I used the student’s cutoff date (the date by which a student had to be born to enter kindergarten) and date of birth (DOB; provided as the month and year). I subtracted the month and year of birth from the cutoff month, day, and year. I converted the year difference into months by multiplying by 12 and added it to the difference from the month calculation. This sum was the student’s ARA in months. I recalculated students’ ARAs at each wave to account for two potential issues. The first pertains to how data were collected. After consent was obtained for kindergarten students in the 2010-2011 school year, the data collectors continued to gather data on students regardless of the student’s grade level. If a student did not progress to the next grade level with his/her cohort, his/her relative age would change because he/she was assessed (at least by teachers) in comparison to those in his/her current grade level, not the grade level of his/her cohort. To account for students’ being retained or accelerated at a grade level, I recalculated the ARAs at each wave. The second potential issue was that a student could change schools between assessment waves, in which case his/her ARA could also change, depending on whether the cutoff dates between the schools differed. Because of these possibilities, students’ ARAs were recalculated at each wave. This recoding process is described in detail under Movers in the Variable Coding section.
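The ARA arithmetic, and the reason it can change from wave to wave, can be sketched in Python. The function name, the example birth date, and the per-wave cutoff dates are hypothetical; the sketch follows the subtraction as described in the text and ignores the day of the cutoff date, which is an assumption.

```python
from datetime import date


def actual_relative_age(birth_year, birth_month, cutoff):
    """Months between the birth month/year and the cutoff month/year."""
    year_diff_in_months = (cutoff.year - birth_year) * 12
    month_diff = cutoff.month - birth_month
    return year_diff_in_months + month_diff


# A student born in March 2005 who moves to a school with a later cutoff.
cutoff_by_wave = {
    1: date(2010, 9, 1),   # original school
    2: date(2010, 9, 1),
    4: date(2010, 12, 1),  # new school with a different cutoff (made up)
}
for wave, cutoff in cutoff_by_wave.items():
    print(wave, actual_relative_age(2005, 3, cutoff))
```

With these made-up dates, the ARA is 66 months at Waves 1 and 2 and 69 months at Wave 4, which shows why a change in the applicable cutoff date requires the value to be recomputed.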

Predicted relative age. Predicted Relative Age (PRA) is the student’s predicted relative

Eligibility Status, date of birth (DOB; provided as the month and year), and the date by which a student had to be born to be eligible to enter kindergarten (cutoff date). First, I calculated the student’s ARA and eligibility status for the 2010-2011 school year (see Eligibility status under
