This section addresses the final objective of this chapter: to examine the extent to which the inclusion of unverified CPRD-recorded cancer diagnoses causes overestimates in incidence figures from CPRD data, by repeating the primary analysis from Chapter 4 including only cancer registry recorded diagnoses.
5.8.1 Primary analysis: recap of one year cancer incidence
After exclusions (defined in the previous chapter), there were 31,261 patients with thrombocytosis (9,435 men and 21,826 women) and 7,969 patients with a normal platelet count (2,599 men and 5,370 women). Table 5.4 shows the number of men and women in each cohort, and the number of each who had a record in either the CPRD or the cancer registry showing a diagnosis of cancer within one year of their index date (the date of their first blood test showing thrombocytosis, or equivalent).
5.8.1.1 Thrombocytosis cohort
There were 2,453 cancers recorded in either the CPRD or the cancer registry in the thrombocytosis cohort; this represents a cancer incidence of 11.6% (95% CI 11.0-12.3) in men and 6.2% (95% CI 5.9-6.5) in women (Table 5.4).
5.8.1.2 Normal platelet count cohort
There were 225 diagnoses of cancer recorded in either the CPRD or the cancer registry in the normal platelet count cohort; this represents a cancer incidence of 4.1% (95% CI 3.4-4.9) in men and 2.2% (95% CI 1.8-2.6) in women (Table 5.4).
5. CPRD validation study
Table 5.4: Comparison of the number of incident cancer cases in each cohort when data from both the CPRD and the cancer registry, or only the cancer registry, are included.

Thrombocytosis

| Included diagnoses | Men: N | Men: n cancers diagnosed | Men: incidence % (95% CI) | Women: N | Women: n cancers diagnosed | Women: incidence % (95% CI) |
|---|---|---|---|---|---|---|
| Either source | 9,435 | 1,098 | 11.6 (11.0-12.3) | 21,826 | 1,355 | 6.2 (5.9-6.5) |
| Cancer registry | 9,333 | 1,021 | 10.9 (10.3-11.6) | 21,668 | 1,265 | 5.8 (5.5-6.2) |

Normal platelet count

| Included diagnoses | Men: N | Men: n cancers diagnosed | Men: incidence % (95% CI) | Women: N | Women: n cancers diagnosed | Women: incidence % (95% CI) |
|---|---|---|---|---|---|---|
| Either source | 2,599 | 106 | 4.1 (3.4-4.9) | 5,370 | 119 | 2.2 (1.8-2.6) |
| Cancer registry | 2,580 | 93 | 3.6 (2.9-4.3) | 5,345 | 109 | 2.0 (1.7-2.4) |
5.8.2 Sensitivity analysis: one year cancer incidence
The number of cancer registry recorded diagnoses was determined for each group. These and the respective incidences are presented in Table 5.4.
There were 2,488 qualifying records of cancer diagnoses in the cancer registry. The majority of these (2,286, 91.9%) were in patients with thrombocytosis: 1,021 in men and 1,265 in women. When only cancer registry-recorded diagnoses were included, the one year incidence of cancer in patients with thrombocytosis was 10.9% (95% CI 10.3-11.6) for men and 5.8% (95% CI 5.5-6.2) for women. In patients with a normal platelet count, there were 93 records of diagnoses in men and 109 in women. The one year cancer incidence in this cohort was 3.6% (95% CI 2.9-4.3) for men and 2.0% (95% CI 1.7-2.4) for women.
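As an illustration of how these incidence figures follow from the counts in Table 5.4, the sketch below computes each proportion with a 95% confidence interval. The method used to derive the reported intervals is not stated here, so the Wilson score interval is an assumption; its limits may differ from the reported ones by about 0.1 percentage points. The function name `wilson_interval` and the dictionary layout are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases, n, level=0.95):
    """Return (proportion, lower, upper) for cases/n.

    Uses the Wilson score interval. This is an assumed method:
    the source does not say how its confidence intervals were derived.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = cases / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width


# Cancer registry-only counts from Table 5.4: (n cancers diagnosed, N)
registry_only = {
    "thrombocytosis, men": (1021, 9333),
    "thrombocytosis, women": (1265, 21668),
    "normal platelet count, men": (93, 2580),
    "normal platelet count, women": (109, 5345),
}

results = {
    group: wilson_interval(cases, n)
    for group, (cases, n) in registry_only.items()
}

for group, (p, lower, upper) in results.items():
    print(f"{group}: {100 * p:.1f}% (95% CI {100 * lower:.1f}-{100 * upper:.1f})")
```

The point estimates depend only on the counts, so they should match the text regardless of the interval method chosen.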
5.8.3 Comparing one year incidence when all recorded diagnoses or only cancer registry recorded diagnoses are included
Including only cancer registry-recorded diagnoses in the one year incidence analysis resulted in 190 fewer diagnoses: 90 fewer in male patients and 100 fewer in female patients. The one year incidence decreased by less than one percentage point in every group: by 0.7 percentage points for men with thrombocytosis, 0.4 for women with thrombocytosis, 0.5 for men with a normal platelet count, and 0.2 for women with a normal platelet count.
Together with the overlapping confidence intervals between the two measures for each group, this suggests that restricting the analysis to cancer registry-recorded diagnoses does not greatly change the one year incidence of cancer in any of the subgroups, and that including CPRD-recorded diagnoses that were not confirmed by the cancer registry causes little overestimation in incidence figures.
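The comparison can be reproduced from Table 5.4 with a short sketch such as the one below. It takes the counts and reported confidence limits directly from the table; the names `table_5_4` and `compare` are illustrative, and the overlap check is only the informal comparison described in the text, not a formal significance test.

```python
# Values from Table 5.4:
# (n cancers diagnosed, N, reported CI lower %, reported CI upper %)
table_5_4 = {
    "thrombocytosis, men": {
        "either": (1098, 9435, 11.0, 12.3),
        "registry": (1021, 9333, 10.3, 11.6),
    },
    "thrombocytosis, women": {
        "either": (1355, 21826, 5.9, 6.5),
        "registry": (1265, 21668, 5.5, 6.2),
    },
    "normal platelet count, men": {
        "either": (106, 2599, 3.4, 4.9),
        "registry": (93, 2580, 2.9, 4.3),
    },
    "normal platelet count, women": {
        "either": (119, 5370, 1.8, 2.6),
        "registry": (109, 5345, 1.7, 2.4),
    },
}


def compare(either, registry):
    """Return (fewer diagnoses, drop in percentage points, intervals overlap)."""
    cases_e, n_e, lo_e, hi_e = either
    cases_r, n_r, lo_r, hi_r = registry
    fewer = cases_e - cases_r
    drop = round(100 * cases_e / n_e - 100 * cases_r / n_r, 1)
    overlap = lo_e <= hi_r and lo_r <= hi_e
    return fewer, drop, overlap


comparison = {
    group: compare(rows["either"], rows["registry"])
    for group, rows in table_5_4.items()
}

total_fewer = sum(fewer for fewer, _, _ in comparison.values())
fewer_men = sum(v[0] for g, v in comparison.items() if g.endswith(", men"))
fewer_women = sum(v[0] for g, v in comparison.items() if g.endswith(", women"))

for group, (fewer, drop, overlap) in comparison.items():
    print(f"{group}: {fewer} fewer diagnoses, "
          f"{drop} percentage points lower, intervals overlap: {overlap}")
```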
5.9 Chapter discussion
The validation study presented in the first part of this chapter has encouraging results: 5,924 of 7,028 cancer diagnoses recorded in the CPRD were confirmed by the cancer registry (a PPV of 84.3%). This result is supported by Boggon et al. (2013), who found that 4,830 of 5,797 CPRD-recorded cancers were confirmed by the cancer registry (83.3%). In that study, there were more cases of colorectal, lung, urinary tract, and pancreatic cancer recorded in the cancer registry than in the CPRD; lung and pancreatic cancer have a particularly poor prognosis. Patients with these diagnoses may be more likely to die in hospital soon after their diagnosis, and details of this may not be fed back to their primary care record. That study also found that cancers diagnosed through routes other than histology (such as myeloma and leukaemia) were under-recorded in the cancer registry; some cancer registries are typically over-reliant on histology. The majority (528 out of 967, 54.6%) of CPRD cancer records that were not validated by a cancer registry record in Boggon et al. (2013) were validated by other means: hospital records or practice records. CPRD cancers that are not confirmed by the cancer registry therefore cannot be assumed to be false records.
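The proportions quoted above can be checked with a few lines of arithmetic. The sketch below treats the cancer registry as the reference standard, as the text does; the function name `ppv` is illustrative.

```python
def ppv(confirmed, recorded):
    """Positive predictive value: confirmed records / all recorded diagnoses."""
    return confirmed / recorded


# Present study: CPRD cancer diagnoses confirmed by the cancer registry
ppv_present = ppv(5924, 7028)

# Boggon et al. (2013), as quoted in the text
ppv_boggon = ppv(4830, 5797)

# CPRD records in Boggon et al. (2013) without a matching registry record,
# and the share of those validated by hospital or practice records
unconfirmed_boggon = 5797 - 4830
validated_elsewhere = 528 / unconfirmed_boggon

print(f"PPV, present study: {100 * ppv_present:.1f}%")
print(f"PPV, Boggon et al. (2013): {100 * ppv_boggon:.1f}%")
print(f"Unconfirmed records validated by other means: "
      f"{100 * validated_elsewhere:.1f}% of {unconfirmed_boggon}")
```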
Dregan et al. (2012) found a higher level of concurrence between the CPRD and the cancer registry; 92% of CPRD cancers were confirmed by the registry data. However, that study included only colorectal, gastro-oesophageal, respiratory, and urinary tract cancers. These four sites were chosen because the validation study builds on an earlier study by the same authors that investigated the incidence of cancer in patients with haematuria, haemoptysis, dysphagia, and rectal bleeding. Comparing the present study with that of Dregan et al. (2012), it appears that records of some types of cancer in the CPRD are more valid than others.
The present study found that sex, year of diagnosis, and age group at diagnosis were significant predictors of concordance in recording between the CPRD and the cancer registry. Boggon et al. (2013) also examined predictors of concordance, and found that age was predictive (with less CPRD recording with greater age). Sex did not predict concordance in that study, and year of diagnosis was not examined.
It is possible that some patients have codes in their CPRD records which indicate a cancer diagnosis when they are not a true case. These errors can be caused by processes or errors in the healthcare system, or by data recording issues. Healthcare system issues include diagnoses being mistakenly recorded in the CPRD; Dregan et al. (2012) found that 77% of incorrect CPRD cancer records had an alarm symptom recorded, which could have resulted in a mistakenly diagnosed cancer, or in a ‘suspected’ cancer record being picked up in research studies as a diagnosis. Disagreement between the two sources on the primary site of diagnosis could also be attributed to erroneous recording if a first, wrongly diagnosed site was recorded in the CPRD and the correct site was recorded later; the first cancer in the patient’s records would not match the later, correct, cancer registry record. The two data sources collect data in different ways at different points in the healthcare system; the CPRD data are updated monthly with multiple consultations and entries per diagnosis, whereas the cancer registry aims to make one single record for each cancer diagnosis in each patient. Data recording issues include the possibility that a cancer is recorded in the cancer registry just after the end date of the patient’s registration with a CPRD practice, in which case the diagnosis may not be fed back to their old practice. The CPRD and the cancer registry use different coding dictionaries, which could result in inconsistencies, and mistakes in the patient ID number used to link the two data sources may mean that a CPRD patient appears to have no record in the cancer registry, or vice versa.
The sensitivity analysis presented at the end of this chapter found that only including cancer registry diagnoses in the primary analysis (identifying the cancer incidence in patients with thrombocytosis and with a normal platelet count) did not greatly alter the results; the estimated cancer incidence for men with thrombocytosis was 11.6% (11.0-12.3) using CPRD and cancer registry diagnoses and 10.9% (10.3-11.6) using cancer registry diagnoses only. Similarly, for women with thrombocytosis the incidence was 6.2% (5.9-6.5) including CPRD and cancer registry diagnoses, and 5.8% (5.5-6.2) using cancer registry diagnoses only.