Chapter 4: Development of a Bayesian mixture model to enhance interpretation of
4.4 Discussion
4.4.4 Model limitations
As mentioned earlier, the largest constraint on full interpretation of the model output described here is the use of the same data for both model fitting and validation. This was considered unavoidable, as few other parasitologically confirmed data were available. As such, whilst the modelling framework described here may be reasonable, extra care should be taken when interpreting the results of this preliminary evaluation. The difficulty of obtaining good-quality parasitological data is a known issue for echinococcosis. As the positive predictive value of purgation for identification of echinococcosis would be expected to be 100%, this approach could be used to identify positive samples. However, the accurate identification of negative samples is very challenging, and would itself be worthy of further investigation. Whilst some account of this is incorporated into the model structure (using the Z-score to exclude potential false negatives), the presence of false negatives is a considerable problem for full evaluation. False negatives would be expected to overrepresent individuals with low burdens, which comprise a large proportion of the population, if not a large proportion of the total parasite biomass. Although the use of faecal samples from known nonendemic/echinococcosis-negative areas is a simple solution to this problem, these dogs are unlikely to be representative of negative dogs in a highly endemic area. Further work investigating optimal methods of parameterising the current model in the face of this challenge would be beneficial.
One central concept of the model developed here is that the coproELISA OD values from noninfected dogs will be broadly Gaussian distributed. The issues associated with the use of the (unbounded) Gaussian distribution to model an outcome which can only take positive values were introduced in the previous chapter, and remain a potential issue here. However, these issues were considered relatively trivial. Analysis of coproantigen data from nonendemic sites has repeatedly suggested that the OD values amongst true Echinococcus spp. negative dogs follow a Gaussian distribution (author’s own observation), hence the original practice of calculating a cut-off for positivity which is three standard deviations above the mean of a known negative panel (Deplazes et al., 1992; Allan et al., 1992). The necropsy-negative samples used here did not follow a Gaussian distribution, with at least three dogs having higher than expected OD values. As described in the previous chapter, the data in the current study are of high quality (having been based upon necropsy and visual inspection of intestines by experienced individuals), but were not derived using the ‘gold standard’ test (the sedimentation and counting technique). It is therefore plausible that some infected animals (especially those with low burdens) were not detected and were misclassified as negative, that is, as false negatives (Allan and Craig, 2006). However, a similar non-Gaussian distribution of OD values amongst negative samples has also been reported from a study of foxes in France (Raoul et al., 2001). These data were based upon the sedimentation and counting technique, and therefore the sensitivity would be expected to be high (Raoul et al., 2001).
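The traditional cut-off mentioned above (mean of a known negative panel plus three standard deviations) can be illustrated with a minimal sketch. The function name, the use of the sample (n - 1) standard deviation, and the OD readings are assumptions made for illustration only; they are not taken from the cited studies or from the data analysed in this chapter.

```python
from statistics import mean, stdev


def negative_panel_cutoff(negative_ods, n_sd=3.0):
    """Return mean + n_sd * SD of a panel of known-negative OD values.

    Assumption: the sample (n - 1) standard deviation is used; the
    cited studies may have used a different estimator.
    """
    if len(negative_ods) < 2:
        raise ValueError("need at least two negative samples")
    return mean(negative_ods) + n_sd * stdev(negative_ods)


# Hypothetical OD readings from a known-negative panel (illustrative only).
panel = [0.05, 0.07, 0.06, 0.08, 0.04]
cutoff = negative_panel_cutoff(panel)

# Dichotomous interpretation: a sample is called positive if its OD
# exceeds the cut-off.
is_positive = [od > cutoff for od in (0.09, 0.30)]
```

Because this rule relies on the negative panel being approximately Gaussian, a heavier upper tail amongst negatives (as observed in the necropsy-negative samples here) would be expected to inflate the standard deviation and shift the cut-off upwards.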
One other possible explanation for the observed lack of normality amongst negative samples relates to the disparity between infection and the presence of coproantigens. The coproantigen ELISA test detects Echinococcus coproantigens rather than the presence of worms per se. Coproantigens may be present for some days after removal of the worms themselves with a cestocidal drug (Deplazes et al., 1990; Allan et al., 1990; Jenkins et al., 2000). As a praziquantel dosing campaign was in place in the study area at the time of the study (van Kesteren et al., 2015), those dogs with high OD readings may have recently been treated with praziquantel (leaving them free of worms but still with residual coproantigens). Cross-reaction with other cestodes such as Taenia spp. was considered unlikely in the current case, as necropsy was conducted and Taenia are large worms which would be difficult to overlook.
Further work is required to evaluate the usefulness of the current Bayesian finite mixture model as a method of interpretation of ELISA data, in particular with regard to the incorporation of worm burden data. One other issue of relevance is coendemicity of different strains or species of Echinococcus in an area. The data used here were taken from an area principally endemic for E. granulosus sensu lato (which includes a number of different species and strains, all of which have a similar lifecycle). However, there are major foci of Echinococcus spp. infection where both E. granulosus and E. multilocularis coexist (including Kyrgyzstan). Due to differences in the lifecycles of these two species (in particular in terms of intermediate host preference, but also in terms of patterns of infection in domestic dogs (Kapel et al., 2006) and potential immunity (Budke et al., 2005b)), the distributions of parasites in infected dogs and the effect of an intervention campaign may differ between species. As the coproELISA does not allow species identification, PCR techniques are required to distinguish these species. Methods of incorporation of PCR data into the current model, possibly using latent class models (Hartnack et al., 2013) and/or Bayesian strategies (Praet et al., 2013), will be investigated in future work.
4.5 Conclusions
The current paper describes a novel approach for interpretation of canine coproantigen ELISA data based upon Bayesian finite mixture modelling. The model can be made identifiable through the incorporation of samples of known status taken from endemic areas. The limited sensitivity of the methods of diagnosis available can be incorporated into the Gaussian-distributed negative component of the mixture model, and the skewed distribution of positive samples can be explicitly accounted for by using Polya trees. The output of the model can be used for traditional dichotomous interpretation of sample data (along with estimation of the sensitivity and specificity of the test), but it is suggested that attention is given to the possibility of interpretation of results on a continuous scale. Methods of interpreting these data at the population level and at the individual level are discussed, along with potential areas of further application, in particular the incorporation of the model output into statistical and mathematical models. Despite these promising signs, further work is required to evaluate this approach, using data collected from other areas, and also considering incorporation of PCR data in order to allow identification of the species of Echinococcus present.
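As an illustration of what interpretation on a continuous scale could look like, the sketch below computes the probability that a sample comes from the positive component of a two-component mixture, given its OD value. It is a simplification under stated assumptions and not the model fitted in this chapter: the positive component is represented by a log-normal density as a stand-in for the Polya tree, all parameter values are hypothetical, and fixed point values are used where a Bayesian analysis would use posterior draws.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x, mu, sigma):
    """Log-normal density; a stand-in (assumption) for the Polya tree."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def prob_infected(od, prevalence, neg_mean, neg_sd, pos_mu, pos_sigma):
    """Probability of membership of the positive component given an OD value.

    Computes prevalence * f_pos / (prevalence * f_pos + (1 - prevalence) * f_neg),
    with a Gaussian negative component and a log-normal positive component.
    """
    f_neg = NormalDist(neg_mean, neg_sd).pdf(od)
    f_pos = lognormal_pdf(od, pos_mu, pos_sigma)
    numerator = prevalence * f_pos
    denominator = numerator + (1 - prevalence) * f_neg
    return numerator / denominator if denominator > 0 else float("nan")


# Hypothetical parameter values (illustrative only, not model estimates).
params = dict(prevalence=0.2, neg_mean=0.06, neg_sd=0.02,
              pos_mu=math.log(0.3), pos_sigma=0.6)

p_low = prob_infected(0.05, **params)   # OD close to the negative mean
p_high = prob_infected(0.40, **params)  # OD well above the negative panel
```

Rather than a yes/no result, each sample receives a probability that could, for example, be averaged over samples to give a population-level summary or carried forward into a subsequent statistical model.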