PART II. STANDARDS
12. DATA FILE CREATION STANDARDS
12.2 DATA INTEGRATION
To ensure that PIAAC data from all sources are reliably and consistently integrated.
Rationale
In PIAAC, the structure of the data is complex and data will originate from a variety of sources:
Sample design international file (SDIF) from the country’s study management system (e.g. ID numbers, selection probabilities, screener disposition codes);
Assessment design meta and workflow data from the TAO platform (e.g. random assignments, branching and redirection to paper-and-pencil mode, sequence of adaptive measures);
Background questionnaire (BQ) responses from the TAO platform;
Log/audit information for the background questionnaire and general workflow (e.g. time taken, validation checks, interview pauses, interviewer actions);
Cognitive assessment responses and scores for automatically scored items from the TAO platform;
Auxiliary and audit information from the TAO platform (e.g. time taken, number of activities);
Scoring of paper-and-pencil booklets (main and reliability scoring);
Coding of education, occupation, industry, language, country and region.
The corresponding databases and files must be matched and checked for structural consistency using unique record identifiers (see Section 10.6). Because of the complexity of the different sources of data in PIAAC, and given that most data will originate from the TAO platform, it is imperative that the Consortium provide software so that the national databases can be reliably built and verified on a continuous basis as the survey progresses and so that data can be delivered on time.
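As an illustration only, the matching of sources by unique record identifier can be sketched as below. This is not the Consortium's software: the function names and the sample identifiers are hypothetical, and the sketch assumes each source has already been reduced to a list of record identifiers. It flags identifiers that occur more than once within a source and identifiers that appear in a source but not in the SDIF.

```python
from collections import Counter
from typing import Iterable, Set


def duplicate_ids(ids: Iterable[str]) -> Set[str]:
    """Return identifiers that occur more than once within one source."""
    counts = Counter(ids)
    return {record_id for record_id, n in counts.items() if n > 1}


def orphan_ids(sdif_ids: Iterable[str], source_ids: Iterable[str]) -> Set[str]:
    """Return identifiers present in a source but absent from the SDIF."""
    return set(source_ids) - set(sdif_ids)


# Hypothetical identifiers, for illustration only.
sdif = ["1001", "1002", "1003"]
bq = ["1001", "1002", "1002", "1004"]

print(duplicate_ids(bq))     # {'1002'}
print(orphan_ids(sdif, bq))  # {'1004'}
```

A case that is sampled but has no BQ record is not necessarily an error (e.g. a nonrespondent), which is why the sketch only reports records that cannot be traced back to the sample file.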
“Integration,” as used below, refers to the structural assembly of the above-mentioned sets/sources of variables to form the country database. “Importing,” as used below, refers to the incremental addition of data for individual cases or sets of cases from a particular source to the country database.
Standards, Guidelines and Recommendations
Standard 12.2.1 All data collected for PIAAC will be imported into a national database using the data integration software (i.e. the Data Management Expert, DME) provided by the Consortium, following specifications in the corresponding operational manuals and international/national record layouts (codebooks).
Guideline 12.2.1A The participating countries are responsible for data integration supervised by a National Data Manager. The Consortium will provide support for this activity in the form of software, manuals, codebooks and mandatory training for National Data Managers as part of or separate from NPM meetings.
Guideline 12.2.1B All data must be verified for structural consistency within and across sources and for agreement with the internationally defined formats and record layouts. Countries are responsible for ensuring that sample design and disposition data are recorded for every case (household or person), including those that do not enter the VM, and for checking that disposition codes are in agreement with the availability of BQ and assessment responses (CBA and/or PP). For all applicable cases, countries are responsible for ensuring the availability and correct matching of BQ responses, computer-based assessment responses and behavioural information, paper-and-pencil assessment scores and captured responses, and any applicable coding (education, occupation, industry, country, language, and region).
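The agreement between disposition codes and data availability can be pictured with a minimal sketch. The code value, field names and sample cases below are hypothetical and are not taken from the PIAAC codebooks; the sketch assumes a single "completed" disposition code and covers only the BQ, although the same comparison would apply to assessment responses.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical disposition code meaning "BQ completed"; the real codes
# are defined in the international record layout.
BQ_COMPLETED = 1


@dataclass
class Case:
    case_id: str
    bq_disposition: int
    has_bq_data: bool


def inconsistent_cases(cases: List[Case]) -> List[str]:
    """Return IDs where the disposition code and BQ data availability disagree."""
    flagged = []
    for case in cases:
        expects_data = case.bq_disposition == BQ_COMPLETED
        if expects_data != case.has_bq_data:
            flagged.append(case.case_id)
    return flagged


cases = [
    Case("A", 1, True),   # completed, data present: consistent
    Case("B", 1, False),  # completed, data missing: flagged
    Case("C", 5, True),   # not completed, data present: flagged
    Case("D", 5, False),  # not completed, no data: consistent
]
print(inconsistent_cases(cases))  # ['B', 'C']
```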
Guideline 12.2.1C Data must be imported on a regular and incremental basis as the survey progresses (e.g. the ongoing import of data files generated by TAO for each respondent as they are returned from the field by interviewers).
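One way to picture incremental importing is a routine that can be rerun on every delivery from the field and adds only files it has not yet seen. The sketch below is an assumption-laden illustration, not a description of the DME: the function name, the file names and the record structure (a dictionary with a `case_id` key) are all hypothetical.

```python
from typing import Dict, List, Set


def import_new_files(
    database: Dict[str, dict],
    imported_files: Set[str],
    batch: Dict[str, dict],
) -> List[str]:
    """Add records from files not imported before; return the names of files added.

    `batch` maps a file name to one respondent record (hypothetical structure).
    """
    added = []
    for file_name in sorted(batch):
        if file_name in imported_files:
            continue  # already imported in an earlier run
        record = batch[file_name]
        database[record["case_id"]] = record
        imported_files.add(file_name)
        added.append(file_name)
    return added


database: Dict[str, dict] = {}
imported: Set[str] = set()

# First delivery from the field.
import_new_files(database, imported, {"resp_A.xml": {"case_id": "A"}})
# A later delivery that repeats an earlier file and adds a new one.
added = import_new_files(
    database,
    imported,
    {"resp_A.xml": {"case_id": "A"}, "resp_B.xml": {"case_id": "B"}},
)
print(added)  # ['resp_B.xml']
```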
Guideline 12.2.1D Information on data missing as a result of technical problems in the VM, lost paper instruments, denied permission to share or other reasons must be recorded and provided to the Consortium as detailed in the Data Management Manual.
Standard 12.2.2 Any national instrument adaptations, as agreed upon with the Consortium, must be reflected in the national record layout (codebooks).
Guideline 12.2.2A Adaptations to the national context must be reflected in the national record layout before data are imported, based on the corresponding documentation. (See Section 6.2.) For instance, additional values in a BQ multiple-choice question must be reflected in the national record layout and must correspond to the BQ data that are expected to be imported from the TAO platform.
Guideline 12.2.2B All adaptations must be thoroughly tested prior to the production use of the data integration software (DME).
Guideline 12.2.2C The integration of data will follow the adapted national record layout. Any necessary recoding or mapping to re-establish the international record layout will be carried out after all data have been imported and integrated according to the documentation (i.e. the Background Questionnaire Adaptation Sheet) as agreed upon between the country and the Consortium. By default, the Consortium will assume responsibility for mapping nationally adapted variables to international variables.
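The mapping from a nationally adapted variable back to the international record layout can be sketched as a simple recode applied after import. The variable names (`EDU_NAT`, `EDU_INT`) and code values below are hypothetical and do not come from any PIAAC codebook; in practice the mapping would be taken from the agreed adaptation documentation.

```python
from typing import Dict, List


def recode(
    records: List[dict],
    national_var: str,
    international_var: str,
    mapping: Dict[int, int],
) -> List[dict]:
    """Return copies of the records with the international variable derived
    from the national one; the national value is kept alongside it."""
    recoded = []
    for record in records:
        value = record[national_var]
        if value not in mapping:
            # An unmapped value means the mapping and the data disagree.
            raise ValueError(f"no mapping for {national_var}={value!r}")
        new_record = dict(record)
        new_record[international_var] = mapping[value]
        recoded.append(new_record)
    return recoded


# Hypothetical example: national categories 7 and 8 both map to international 3.
national_to_international = {1: 1, 2: 2, 7: 3, 8: 3}
records = [{"case_id": "A", "EDU_NAT": 7}, {"case_id": "B", "EDU_NAT": 2}]
print(recode(records, "EDU_NAT", "EDU_INT", national_to_international))
```

Raising an error on an unmapped value, rather than silently passing it through, is a design choice of this sketch so that gaps in the mapping surface before data are delivered.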
Quality Control Procedures
The data integration software provided by the Consortium will facilitate the adaptation of the record layout to the national context, the integration and importing of data and the verification of data accuracy.
The Consortium will review national adaptations to the BQ/JRA from a data and coding perspective before they are implemented.
The software will be able to generate reports that provide an overview of the consistency of the entire database. Each country will be required to generate and review these reports on a regular basis, make corrections as necessary and conduct a final review before delivering the database to the Consortium.
12.3 DATA VERIFICATION