Methodological considerations of the research contained in this thesis

The challenge of an evaluation exercise in the absence of optimal reference standards

Before attempting to change the current practice of signal management, we need to know how well it actually performs. The performance of signal detection algorithms is usually assessed with diagnostic test parameters [65] such as sensitivity, specificity, positive predictive value, negative predictive value or the area under the receiver operating characteristic curve (AUC). These metrics assess the algorithms’ capacity to discriminate between true signals and non-causal associations. To use them, we need a reference standard for comparison, composed of ‘true positive’ and ‘true negative’ signals classified according to the best currently available evidence. The sources of evidence for true drug-disease associations may be published scientific literature, product information leaflets or expert opinion.
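As an illustration of how these parameters follow from a comparison with a reference standard, the short sketch below computes them from a two-by-two classification. It is not the code used in this thesis: the function name `diagnostic_metrics`, the data layout and the drug-event pairs are invented for the example.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]  # (drug, event); hypothetical representation


def diagnostic_metrics(
    flagged: Set[Pair], reference: Dict[Pair, bool]
) -> Dict[str, Optional[float]]:
    """Compare the pairs flagged by an algorithm with a reference standard.

    reference maps each pair to True (positive control) or False (negative control).
    A metric is None when its denominator is zero.
    """
    tp = sum(1 for p, truth in reference.items() if truth and p in flagged)
    fn = sum(1 for p, truth in reference.items() if truth and p not in flagged)
    fp = sum(1 for p, truth in reference.items() if not truth and p in flagged)
    tn = sum(1 for p, truth in reference.items() if not truth and p not in flagged)

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Invented toy data: two positive and two negative controls.
reference = {
    ("drug_a", "event_x"): True,
    ("drug_b", "event_x"): True,
    ("drug_c", "event_x"): False,
    ("drug_d", "event_x"): False,
}
flagged = {("drug_a", "event_x"), ("drug_c", "event_x")}
metrics = diagnostic_metrics(flagged, reference)
```

In this toy case one positive control and one negative control are flagged, so each of the four metrics equals 0.5.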

The absence of a robust reference standard is a major obstacle to evaluating the performance of signal detection methods. Even when reference standards exist, they are mostly sub-optimal. Firstly, most are limited in size because of time constraints: they contain a limited number of drug-event associations, and it is customary to focus on a small set of drugs or outcomes of interest. This is also the case for our paediatric reference standard used in Chapter 4.1, which focuses on 16 paediatric drugs and 16 ADRs. Secondly, many reference standards lack verified true negatives (controls) and focus on positive test cases only. This is a major limitation, since in the absence of true negative associations we cannot assess the specificity of the method or the AUC; only partial performance can be calculated. We avoided this in our research by using only reference standards with both positive and negative cases. A third limitation, which is very difficult to avoid, is the possible correlation between the constructed reference standard and the database to which the method is applied. Even if not directly consulted in the creation of the reference standard, information from spontaneous reporting often contributes to product labelling and to patients’ perception of ADRs, and might influence the classification [66]. We could not avoid this completely either, since we used product information leaflets to verify the true positive signals, and the information contained in these leaflets might influence reporting behaviour.
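The dependence of the AUC on negative controls can be made concrete with a minimal sketch. It assumes the AUC is estimated as the proportion of positive-negative pairs in which the positive control receives the higher signal score (ties count as one half). The function name `auc` and the scores are invented for illustration.

```python
from typing import List, Optional


def auc(pos_scores: List[float], neg_scores: List[float]) -> Optional[float]:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Returns None when either group is empty, because no
    positive-negative pair can be formed.
    """
    if not pos_scores or not neg_scores:
        return None
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented signal scores for three positive and two negative controls.
positive_scores = [3.2, 1.1, 2.5]
negative_scores = [0.9, 1.4]

auc_with_controls = auc(positive_scores, negative_scores)
auc_positives_only = auc(positive_scores, [])  # no negative controls available
```

With a positives-only reference standard the function has nothing to compare against and returns `None`, which mirrors the point made above: without negative controls, sensitivity is the only one of these metrics that can still be estimated.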

Ultimately, constructing a universally valid reference standard to test signal detection methods is challenging, since causality assessment is not a black-and-white decision and also changes with time. Knowledge accrues as supplementary data become available: new studies that are better conducted, performed in larger populations and substantiated with biological evidence, or simply more cases. This is one reason why many research groups construct their own reference standards at the time of the study. The most common approach has been to use historical (‘time-frozen’) safety signals as positive controls. However, as mentioned above, the status of a signal might change over time, which can lead to misclassification. Noren et al. [67] argue that evaluation should be done against emerging rather than established adverse events, and that a time-stamped reference database of ADRs would be the best way forward.
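One way to picture how a time-stamped reference standard could be used is sketched below. This is an illustrative assumption, not a method described in the cited work: a positive test case counts as detected early only if the algorithm first flagged it before the date on which the association became known. The function name, data layout and dates are all invented.

```python
from datetime import date
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (drug, event); hypothetical representation


def early_detection_rate(
    known_on: Dict[Pair, date], first_flagged: Dict[Pair, date]
) -> Optional[float]:
    """Fraction of positive test cases flagged strictly before their index date.

    known_on: date on which each association became known (the time stamp).
    first_flagged: date on which the algorithm first highlighted the pair;
    pairs that were never flagged are simply absent.
    """
    if not known_on:
        return None
    early = sum(
        1
        for pair, index_date in known_on.items()
        if pair in first_flagged and first_flagged[pair] < index_date
    )
    return early / len(known_on)


# Invented dates for three positive test cases.
known_on = {
    ("drug_a", "event_x"): date(2013, 6, 1),
    ("drug_b", "event_y"): date(2013, 9, 15),
    ("drug_c", "event_z"): date(2013, 3, 10),
}
first_flagged = {
    ("drug_a", "event_x"): date(2012, 11, 20),  # flagged before it became known
    ("drug_b", "event_y"): date(2014, 1, 5),    # flagged only afterwards
}
rate = early_detection_rate(known_on, first_flagged)
```

Here only one of the three positive test cases is flagged before its index date. An evaluation against a time-frozen standard would instead count every flagged pair as a success, regardless of timing.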


Chapter 7


Earlier efforts to develop reference standards were usually not systematic or transparent about their decision process, were limited in the size and diversity of the drug-outcome pairs included, or lacked negative controls. The situation improved as various research groups created reference standards for the purpose of testing signal detection methods:

The EU-ADR reference standard was based on existing scientific literature and expert opinion and included 44 positive associations and 50 negative controls for ten outcomes of interest: bullous eruptions; acute renal failure; anaphylactic shock; acute myocardial infarction; rhabdomyolysis; aplastic anaemia/pancytopenia; neutropenia/agranulocytosis; cardiac valve fibrosis; acute liver injury; and upper gastrointestinal bleeding [68].

The PROTECT reference standard was compiled from the product information of 220 drugs approved in Europe [15]. It also captures the date on which each ADR appeared in the product information. It contains only positive test cases.

Harpaz et al. constructed a reference set based on drug labelling revisions, such as new warnings, which were issued and communicated by the US Food and Drug Administration in 2013. It includes 44 drugs and 38 events, with both positive and negative cases, and is time-indexed: it contains the date on which an association (positive test case) became known according to product labels [66].

For the purpose of methods testing, OMOP built a reference set of 399 test cases: 165 ‘positive controls’, which are medical product exposures for which there is evidence to suspect an association with the outcome, and 234 ‘negative controls’, which are drugs for which there is no evidence of an association with the outcome. The test cases cover four health outcomes of interest (HOIs): acute myocardial infarction, acute liver injury, acute renal failure, and gastrointestinal bleeding. The reference standard spans 181 unique drugs, including nonsteroidal anti-inflammatory drugs, antibiotics, antidepressants, angiotensin-converting enzyme inhibitors, β-blockers, antiepileptics, and glucose-lowering drugs. The work is continued by OHDSI, which aims to develop a large reference set of 1,000 active ingredients across 100 HOIs [69]. The aim is to capitalize on previously constructed reference sets and to use a wide range of information sources, such as literature, product information and observational healthcare data.

Since none of the existing reference standards was fit for our purpose (each being restricted in the number of either products or outcomes covered), in this thesis we used two reference standards tailored to our research. One was constructed from published scientific literature and expert opinion (Chapter 3.1) and consists of both positive and negative reference drug-event pairs, focused on selected outcomes of interest. In contrast to the approach used in previous studies, verification was performed for all drug-event associations related to the events of interest, irrespective of whether they were highlighted as signals or not.


Summary, general discussion and future perspectives
