Chapter 3: MATERIALS AND METHODS
3.6 Data Linkage Process
The PATH data were provided to CHeReL with the following identifiers for linkage: Given name, Middle name, Family name (surname), Date of birth, Sex, Address, Locality/Suburb, State and Postcode. The PATH data custodian also provided a record identification number (RecID), a unique code number assigned to each PATH participant and supplied to CHeReL in order to maintain privacy. The RecID is different from the participant’s original PATH ID.
The MLK extract was linked to the PATH records using probabilistic record linkage methods and ChoiceMaker software (Borthwick, Buechi, & Goldberg, 2003). Probabilistic matching computes weights for identifiers based on how well they can identify a match or a non-match, and uses these weights to calculate the probability that two records match. Based on this probability, record pairs are classified as matches, non-matches or possible matches.
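The weighting idea described above can be illustrated with a minimal sketch in the classic Fellegi-Sunter style. This is not ChoiceMaker's own model, and the field names, the m and u probabilities and the prior are hypothetical values chosen purely for illustration.

```python
import math

# Hypothetical parameters per identifier (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PARAMS = {
    "family_name": (0.95, 0.01),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "postcode": (0.85, 0.05),
}


def pair_weight(agreements):
    """Sum the log2 agreement or disagreement weight of each compared field."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PARAMS[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def match_probability(weight, prior=0.001):
    """Convert a total weight into a match probability using Bayes' rule.

    `prior` is an assumed proportion of true matches among compared pairs.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 2 ** weight
    return posterior_odds / (1 + posterior_odds)
```

In this sketch an identifier such as date of birth, which rarely agrees by chance, contributes a much larger agreement weight than sex, which agrees by chance about half the time.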
ChoiceMaker uses ‘blocking’ and ‘scoring’ to identify definite and possible matches. During blocking, ChoiceMaker searches the target datasets for records which are possible matches to each other. There are two types of blocking. The exact blocking algorithm requires records to have the same set of valid fields and the same values for these fields. The automated blocking algorithm builds a set of conditions that are used to find as many records as possible that potentially match each other. Scoring combines a probabilistic decision, computed using a machine learning technique, with absolute rules, which include upper and lower probability cut-offs, to determine whether each potential match denotes, or possibly denotes, the same person. The upper and lower probability cut-offs start at 0.75 and 0.25, respectively, and are adjusted for each individual linkage to ensure false links are kept to a minimum. The false positive rate for this extraction was 3/1,000 records (0.3%).
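The two stages described above can be sketched as follows. This is a simplified illustration under stated assumptions, not ChoiceMaker's implementation: the function names and record fields are hypothetical, only exact blocking is shown, and treating the cut-offs as inclusive is an assumption, since the text gives only the starting values of 0.75 and 0.25.

```python
from collections import defaultdict


def exact_block(records_a, records_b, fields):
    """Pair records that hold identical, non-missing values on every blocking field."""
    def key_of(record):
        key = tuple(record.get(f) for f in fields)
        # A record with a missing blocking value cannot take part in exact blocking.
        return key if all(v not in (None, "") for v in key) else None

    index = defaultdict(list)
    for rec in records_b:
        key = key_of(rec)
        if key is not None:
            index[key].append(rec)

    pairs = []
    for rec in records_a:
        key = key_of(rec)
        if key is not None:
            pairs.extend((rec, other) for other in index.get(key, []))
    return pairs


def classify(probability, lower=0.25, upper=0.75):
    """Classify a candidate pair using the starting cut-offs quoted in the text.

    Inclusive boundaries are an assumption made for this sketch.
    """
    if probability >= upper:
        return "match"
    if probability <= lower:
        return "non-match"
    return "possible match"
```

Pairs falling between the two cut-offs are the possible matches; raising the upper cut-off for a given linkage reduces false links at the cost of leaving more pairs unresolved.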
Once the linkages were finalised, CHeReL created a Project Person Number (PPN) for each person identified in the linkage and assigned this PPN to the PATH, APC, and EDIS records. CHeReL returned the PPN and the matching RecID from the PATH dataset to the data custodians. The data custodians are staff from ACT Health Government Epidemiology Branch. The data custodians then supplied the datasets with all the approved information from the source database plus the PPN to the project investigator (the author).
To allow linked records for the same individual to be identified and extracted, the project investigator had to match the PPN generated by CHeReL to the unique RecID that the PATH data custodian had assigned to each PATH participant. The RecID was then matched to each participant’s original PATH ID number. The results of the linkage are given in Table 2.
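The two-step matching of identifiers described above (PPN to RecID, then RecID to PATH ID) amounts to chaining two lookup tables. A minimal sketch follows, assuming each table is held as a dictionary; the function names, field names and identifier values are hypothetical and do not reflect the actual file layouts.

```python
def build_crosswalk(ppn_to_recid, recid_to_path_id):
    """Chain PPN -> RecID -> PATH ID, skipping PPNs whose RecID has no PATH ID."""
    return {
        ppn: recid_to_path_id[recid]
        for ppn, recid in ppn_to_recid.items()
        if recid in recid_to_path_id
    }


def attach_path_id(records, crosswalk):
    """Return copies of the hospital records with the PATH ID added.

    Records whose PPN is absent from the crosswalk are left out.
    """
    return [
        {**rec, "path_id": crosswalk[rec["ppn"]]}
        for rec in records
        if rec["ppn"] in crosswalk
    ]
```

Keeping the crosswalk separate from the clinical records means the original PATH ID is attached only at the final step.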
3.6.2 Ethics
To gain approval to access CHeReL data, an appropriate legal basis and ethics approval were required. First, the project investigator was required to complete the CHeReL Application for Data form. This form outlines the project, including details of the investigators; the background, aim, research design and methods of the study; the datasets to be linked; and the linkage required. Investigators must also specify participant consent and the storage and retention of data. The investigators also specified the variables they required from the APC and ED data (e.g. date and time of admission, date and time of separation) and provided a research protocol. After reviewing these documents, CHeReL provided the project investigators with a technical feasibility letter.
Ethics approval for the project was then sought from the ACT Health Human Research Ethics Committee. Approval from the PATH and ACT Health data custodians was also requested. Once these approvals were granted, the PATH data custodian provided CHeReL with personal identifiers from the PATH dataset. No clinical data were sent to CHeReL. PATH data were encrypted with a password and transferred to CHeReL using a secure file transfer facility.
3.6.3 Variables Requested
The variables requested for the ACT APC data included age in years at time of admission, sex and marital status of the participant, the date and time of admission, the date and time of separation, and the length of stay. Day stay flags indicating whether the patient’s admission was a same-day or overnight stay were also requested. To identify the amount of time each PATH participant spent in hospital between waves, each episode of hospital care was linked and the lengths of stay were summed to form a complete hospital stay variable. Primary and additional diagnoses (up to 100 diagnoses) were requested, along with Major Diagnosis Category, to check whether hospital admission was related to, or contributed to, declines in cognition. Finally, the hospital service-care type and separation mode were requested.
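The derivation of the complete hospital stay variable can be sketched as below. The analysis itself was run in SPSS, so this Python version is only illustrative. The field names are hypothetical, and assigning an episode to a between-wave interval by its admission date is an assumption of this sketch, not a rule stated in the text.

```python
from datetime import date


def total_stay_between_waves(episodes, wave_start, wave_end):
    """Sum length of stay per person for episodes admitted in [wave_start, wave_end)."""
    totals = {}
    for episode in episodes:
        # Assumption: an episode belongs to the interval containing its admission date.
        if wave_start <= episode["admission_date"] < wave_end:
            ppn = episode["ppn"]
            totals[ppn] = totals.get(ppn, 0) + episode["length_of_stay"]
    return totals
```

Participants with no admissions in the interval do not appear in the result, so they would need to be assigned a value of zero or missing when the totals are merged back onto the participant file.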
The variables requested for ACT EDIS included age, sex and marital status. A number of time variables were requested, including arrival date and time, triage date and time, seen date and time, and actual departure date. To gauge the seriousness of the condition, triage category, type of visit, diagnosis and departure status were requested. The triage category is a number between one and five which indicates how quickly patients should be treated based on how critical their condition is, with category one patients requiring immediate resuscitation. The type of visit describes whether the presentation is an emergency visit, a return visit or a pre-arranged visit, or whether the patient is in transit or dead on arrival. Finally, the departure status indicated whether the patient was admitted to the hospital, was referred to another hospital, departed without being admitted or referred to another hospital, did not wait to be attended by a health professional, left at their own risk after being attended by a health professional but before the emergency department service episode was completed, died in the emergency department as a non-admitted patient, or was dead on arrival.
3.6.4 Data Management
The ACT APC and ACT EDIS datasets were stored in password-protected files on password-protected computers. These datasets were saved in both Microsoft Excel and SPSS formats. All statistical analyses were conducted using SPSS. Details of the statistical analysis, including exclusion criteria and the handling of missing data, are provided in the manuscripts in Chapters 6 and 7.