5. DATA COLLECTION METHODS AND RESPONSE RATES
5.7 Data Collection Quality Control
Data collection quality control efforts began with the development and testing of the CATI/CAPI applications and FMS. As these applications were programmed, extensive testing of the system was conducted. This testing included review by project design staff, statistical staff, and the programmers themselves. This testing by staff members, representing different aspects of the project, was designed to ensure that the systems were working properly from all perspectives. Quality control processes continued with the development of field procedures that maximized cooperation and thereby reduced the potential for nonresponse bias.
A live pilot test of the systems and procedures, including field supervisor and assessor training, was conducted in April and May 1998 with 12 elementary schools in the Washington, DC, metropolitan area. The purpose of the pilot test was to ensure that all the systems were working properly. Modifications to the data collection procedures, training programs, and systems were made to improve efficiency and reduce respondent burden. Modifications to the parent interview to address some issues raised by pilot test respondents were also made at this time.
Quality control activities continued during training and data collection. During the assessor training, field staff practiced conducting the parent interview in pairs and practiced the direct child assessment with kindergarten children brought to the training site for this purpose. When the fieldwork began, field supervisors observed each assessor conducting child assessments and made telephone calls to parents to validate the interview. Field managers made telephone calls to the schools to collect
information on the school activities for validation purposes. A sample of the assessor-completed OLDS score sheets were rescored in the home office for quality control purposes.
5.7.1 Child Assessment Observations
Field supervisors conducted on-site observations of the child assessments. During the fall- kindergarten collection, one observation was completed for each assessor within the first two weeks of the field period. In spring-kindergarten, two observations were completed for each assessor. The first observation was within two weeks after the assessments began, and the second observation was completed within three weeks of the first observation.
A standardized observation form was used to evaluate the assessor’s performance in conducting the child assessment. The assessor was rated in three areas:
Rapport building and working with the child – such as use of neutral praise and the
assessor’s response to various child behaviors.
Cognitive assessment activities – such as reading questions verbatim, the use of
acceptable probes, the use of appropriate hand motions, and the absence of coaching.
Specific assessment activities – such as correctly coding answers to open-ended
questions in the assessment, weighing and measuring the child correctly, and following administration procedures.
The field supervisors recorded their observations on the form and then reviewed the form with the assessor. The most frequent problems observed were not reading the items verbatim and inappropriate gesturing. Feedback was provided to the assessors on the strengths and weaknesses of their performance and, when necessary, remedial training was provided in areas of weakness. A training video reinforcing appropriate gesturing and reading items verbatim was created and used in the spring training.
5.7.2 Parent Validations
Parent validation forms were generated for approximately ten percent of the completed parent interviews. The first parent interview completed by an assessor was always validated. Over the course of the field period, a running count of an assessor’s completed parent interviews was maintained, and each tenth completed parent interview was selected for validation. This ensured that ten percent of
each assessor’s cases were selected for validation. The parent validation was approximately five minutes long.
Field supervisors used a standardized parent validation script when calling the parents. The script covered the following topics:
Verification of the child’s name, date of birth, and gender; and
Between eight and ten questions from the current round interview were re-asked of the
parent.
During the validation process, no evidence was found of parent interviews being falsified.
5.7.3 School Validations
To ensure that assessments proceeded smoothly, a validation call was completed with the school principal in approximately ten percent of each supervisor’s assigned schools in both the fall- and spring-kindergarten collections.
Field managers conducted the school validations by telephone. The first school that each team completed was called to ascertain how well the preassessment and assessment activities went. If the feedback from the school was positive, the fifth school that each team completed was called. If any problems were indicated in the first validation call, immediate action was taken with the field supervisor. The validation feedback was discussed with the supervisor and remedial action was provided, including in-person observation of the supervisor’s next school if necessary.
Field managers used a standardized script when calling the school principals. The script covered the following topics:
How well the ECLS-K supervisor organized and executed the sampling tasks;
An overall rating of how the assessments went;
Feedback about the study from the children and kindergarten teachers;
Suggestions for improving procedures and making it easier for a school to participate;
and
5.7.4 Quality Control of the OLDS Scoring
The OLDS used to screen children for English language proficiency included the “Let’s Tell Stories” subtest. This subtest involved reading the child a short story and having the child repeat it back to the assessor. The child’s responses were recorded verbatim and scored by the assessor. Responses to this subtest are unique to each child, and it was important for interviewers’ and coders’ scoring of the child’s responses to match the preLAS®2000 standards.
ECLS-K assessors were trained to conduct the OLDS using audiotapes of the stories and children’s responses to the stories. Assessors listened to the audiotaped stories and to the child’s responses and recorded the child’s responses verbatim. Then the assessor scored the story using the preLAS®2000 rules. Reasons for scoring each story a particular way were discussed in detail. Differences
between the assessor’s scores and the correct scores were discussed during training, so assessors could understand the difference between the scores. Several stories in each scoring category were provided for practice to fine tune the assessor’s scoring. Then the scoring ability of each assessor was tested. Only assessors who scored a 90 percent accuracy in scoring the training stories as matched against the
preLAS®2000 samples were allowed to conduct the OLDS.
A ten percent sample of each assessor’s OLDS stories were recoded in the home office. The coders received the same training as the assessors. Coders then scored the stories independently. If the home office coders’ scores differed from the assessor, their scores were verified by their supervisor. All cases were adjudicated by lead trainers for the OLDS. Approximately 66 percent of the stories had complete score agreement between the assessor, coder, and lead trainer. The additional 33 percent of the stories had score agreement by two of the three scorers.
5.7.5 Assessor Effects
Individual Test Administrator effects and Design Effects
A multi-level analysis3 was carried out to estimate components of variance in fall- and
spring-kindergarten cognitive scores associated with the: (1) student level, (2) school level, (3) team
3 Bryk, A. & Raudenbush, S.W. (1992). Hierarchical Linear Models: Applications and data analysis methods. New York, Sage Publications.
leader, and (4) individual test administrator. This secondary analysis was motivated by Westat’s earlier finding of larger than expected design effects. In addition, the impact on the above sources of variance of the SES indicator (parent’s education) was also estimated. It was expected that much of the clustering of students within neighborhood schools (hence higher design effects) could be explained by SES.
In addition to the potential clustering effects related to shared parent SES within schools, there was a concern that the individual mode of administration might inject additional and unwanted variance to both the individual and the between school components of variance in the cognitive scores. Since it is more difficult to standardize test administrations when tests are individually administered, this source of variance could contribute to the high design effects if the individual assessors differed systematically in their modes of administration.
It was found that the component of variance associated with the individual test administration effect was negligible in all three cognitive areas and thus had little or no impact on the design effects. Much of the design effects with respect to cognitive scores could be explained by parents’ SES.