“The world is richer in association than in meanings, and it is the part of wisdom to differentiate the two”
John Barth, novelist
Once confounding, chance and bias have all been eliminated as possible explanations for an observed association, its validity can be assessed using a set of inductive epidemiological criteria originally proposed by Hill (1965). These criteria, some of which are applicable to genetic association studies, include consistency (or replication) and biologic plausibility.
I.5.1. REPLICATION, REPLICATION, REPLICATION
In the absence of a more extensive understanding of the molecular-level effects that may contribute to the aetiology of OCD, replication (preferably in multiple independent studies) may be the most powerful evidence in favour of causality (Campbell and Rudan, 2002). In fact, it has been suggested that statistically significant associations should be replicated before any claim of a susceptibility gene is accepted (Colhoun et al., 2003; Little et al., 2002). In population-based genetic association studies of OCD, results have been highly inconsistent; however, the problem of non-replication is not limited to OCD investigations, but seems to plague most association studies investigating the genetic aetiology of complex disorders.
At least three studies have investigated how widespread the problem of non-replication is in population-based genetic association studies of complex disorders.
Hirschhorn et al. (2002) observed that the results of only six of 166 “positive” genetic association studies were consistently reproduced. Similarly, Ioannidis et al. (2003) performed 55 meta-analyses (comprising 579 study comparisons), and found that only 16% of the genetic associations identified were replicated without the influence of heterogeneity or bias. On the other hand, Lohmueller et al. (2003) concluded that approximately a quarter of previously published associations are true, with false-negative results (usually due to underpowered studies) accounting for a large proportion of the inconsistencies. The authors therefore advocate testing, in large samples, previously reported associations that have been replicated at least once, in order to identify the true genetic risk factors.
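The link between sample size and false-negative results noted above can be illustrated with a rough power calculation. The sketch below is not taken from any of the cited studies; it uses a standard normal approximation for a two-sided comparison of allele frequencies between cases and controls, and assumes that each individual contributes two independent alleles. The function name `allelic_test_power` and all frequencies and sample sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(freq_cases, freq_controls, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided z-test comparing allele frequencies.

    Assumption: every individual contributes two independent alleles, so the
    number of observations per group is twice the number of individuals.
    """
    for freq in (freq_cases, freq_controls):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must lie strictly between 0 and 1")

    alleles_cases = 2 * n_cases
    alleles_controls = 2 * n_controls

    # Standard error of the difference in allele frequencies (unpooled).
    se = sqrt(
        freq_cases * (1 - freq_cases) / alleles_cases
        + freq_controls * (1 - freq_controls) / alleles_controls
    )

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(freq_cases - freq_controls) / se

    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical example: the same modest frequency difference (0.35 vs 0.30)
# examined in a small and in a large case-control sample.
small_study = allelic_test_power(0.35, 0.30, n_cases=100, n_controls=100)
large_study = allelic_test_power(0.35, 0.30, n_cases=2000, n_controls=2000)
print(f"power with 100 cases and 100 controls:   {small_study:.2f}")
print(f"power with 2000 cases and 2000 controls: {large_study:.2f}")
```

Under these assumed values the small study would miss a true association far more often than the large one, which is consistent with the argument that underpowered studies contribute to apparent non-replication.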
It could, of course, be argued that the inconsistencies and inability to replicate statistically significant findings may represent true variations of underlying associations between populations (Colhoun et al., 2003), as co-factors associated with the disease may be represented variably in different populations. Factors such as different degrees of LD between marker and susceptibility alleles, allele and haplotype frequency differences, environmental modifiers and patient ascertainment strategies may all contribute towards discrepant genetic association results between populations (Vieland, 2001; Stephens et al., 2001; Glatt et al., 2001).
As already mentioned, the degree of LD between the marker and disease-susceptibility alleles may vary between populations. For this reason, LD should always be examined within the study population itself, rather than assumed on the basis of results from a study of a different population. Moreover, association studies may be difficult to replicate if the variant under investigation has a low ES, or if its penetrance and/or allele frequency varies across populations. In particular, significant associations with rare alleles (frequency below 5%) are more likely to be population-specific, and thus less likely to be replicated (Campbell and Rudan, 2002; Pritchard, 2001; Wright and Hastie, 2001).
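To make the point about population-specific LD concrete, the following sketch computes three commonly used pairwise LD measures (D, D′ and r²) from haplotype and allele frequencies. It is an illustration only: the function name `ld_measures` and the frequencies for the two populations are hypothetical and are not drawn from any study cited here.

```python
def ld_measures(hap_freq, marker_freq, risk_freq):
    """Return (D, D', r^2) for two biallelic loci.

    hap_freq    -- frequency of the haplotype carrying both the marker allele
                   and the susceptibility allele
    marker_freq -- frequency of the marker allele
    risk_freq   -- frequency of the susceptibility allele
    """
    for freq in (marker_freq, risk_freq):
        if not 0.0 < freq < 1.0:
            raise ValueError("both loci must be polymorphic (0 < frequency < 1)")

    # D: deviation of the observed haplotype frequency from the frequency
    # expected if the two alleles were independent.
    d = hap_freq - marker_freq * risk_freq

    # D' rescales D by its maximum attainable magnitude for these frequencies.
    if d >= 0:
        d_max = min(marker_freq * (1 - risk_freq), (1 - marker_freq) * risk_freq)
    else:
        d_max = min(marker_freq * risk_freq, (1 - marker_freq) * (1 - risk_freq))
    d_prime = d / d_max

    # r^2: squared correlation between the alleles at the two loci.
    r_squared = d ** 2 / (
        marker_freq * (1 - marker_freq) * risk_freq * (1 - risk_freq)
    )
    return d, d_prime, r_squared


# Hypothetical frequencies for the same marker/susceptibility pair in two populations.
population_1 = ld_measures(hap_freq=0.18, marker_freq=0.20, risk_freq=0.20)
population_2 = ld_measures(hap_freq=0.08, marker_freq=0.20, risk_freq=0.30)
print("population 1 (D, D', r^2):", population_1)
print("population 2 (D, D', r^2):", population_2)
```

With these assumed frequencies the marker is a good proxy for the susceptibility allele in the first population but a poor one in the second, so an association detected through the marker in one population need not be detectable in the other.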
To circumvent the potential lack of replication, it may be useful to examine alternative, more common (and, where possible, functional) variants in the same gene for association with the disorder. It has also been suggested that an “internal check” for association be conducted in the same population as the original positive study (Campbell and Rudan, 2002). Ideally, replication should be conducted in both family- and population-based association studies, thereby demonstrating both linkage and association of the variant with the disorder and further reinforcing the evidence for the association (Owen et al., 1997).
In order to interpret an association that has not been consistently replicated, it is necessary to identify which factors are most relevant to the discrepancies between studies that result in non-replication, and to control for them. It is also important to note that a lack of replication does not in itself negate a causal relationship between the variant and the disorder; instead, it may indicate the need for further studies in certain populations, or a more detailed analysis of the gene containing the variant under consideration (Tabor et al., 2002).
I.5.2. BIOLOGICAL PLAUSIBILITY OF THE CANDIDATE GENES AND POLYMORPHISMS IN GENETIC ASSOCIATION STUDIES
A candidate gene is one that, on the basis of prior physiological, genetic or biochemical characterisation, is suspected to contribute to the aetiology of the disorder under investigation.
Candidate genes can be categorised into either positional candidates, chosen on the basis of a genomic location that has previously been found to be associated with the disorder, or hypothesis-driven candidates, chosen because of their (hypothesised) role in the aetiology of the disorder.
Any investigation into the causality of an observed association needs to account for the biological validity of the candidate gene, which depends on the prior probability that the candidate gene (and the variant under investigation) is involved in the pathology of the disorder, in this case OCD. It follows that a low prior probability of candidature will result in an increased risk of obtaining false-positive results.
In theory, the idea of prior probability would enable one to quantify biologic plausibility on a probability scale, and to incorporate it into subsequent statistical analyses; in reality, however, the prior probability that a candidate gene is involved in the development of OCD is difficult to determine exactly, due to the presently incomplete knowledge regarding the biological mechanisms of pathology. Prior probability can therefore, at best, be estimated, and is thus largely subjective and hypothesis-driven (Freimer and Sabatti, 2004).
On the basis of such estimation, the prior probability that a candidate gene is biologically plausible is increased if the gene has been found to be associated with existing familial forms of the disorder and/or with the same disease in a population of different ethnicity; if the gene variant is found to be involved in molecular mechanisms of the disorder; if the gene has a high mRNA copy number in tissues thought to be affected by the pathological process of the disorder; and if sufficiently valid experimental evidence (for example, animal studies or gene knock-out models) exists to support the role of the candidate gene (or variant) in the disorder (or related disorders).
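The relationship between prior probability and the risk of false-positive findings described above can be sketched with Bayes' rule. The example below is a simplified illustration under stated assumptions: a single test with a fixed significance level and known power, and a prior probability chosen by the investigator. The function name `probability_association_is_true` and the numerical values are hypothetical.

```python
def probability_association_is_true(prior, power, alpha=0.05):
    """Probability that a statistically significant association is real.

    prior -- assumed prior probability that the candidate variant is truly
             associated with the disorder
    power -- probability of obtaining a significant result if it is
    alpha -- probability of obtaining a significant result if it is not
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")

    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical comparison: a weakly supported and a well supported candidate,
# both tested with 80% power at the 5% significance level.
weak_candidate = probability_association_is_true(prior=0.01, power=0.8)
strong_candidate = probability_association_is_true(prior=0.50, power=0.8)
print(f"prior 0.01: P(true | significant) = {weak_candidate:.2f}")
print(f"prior 0.50: P(true | significant) = {strong_candidate:.2f}")
```

Because the prior itself can only be estimated, the output should be read as a way of comparing scenarios rather than as a precise probability.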
The prior probability that the variant under investigation is involved in the disorder (or is in LD with a susceptibility allele) cannot be disregarded when assessing the biologic plausibility of candidate genes. Given the wide array of variants available for an association study, the available data need to be sifted in order to prioritise and select the polymorphisms most likely to detect association. The polymorphisms most likely to be associated with disease are those that affect the function of the candidate gene and its associated protein (Tabor et al., 2002). At first glance, therefore, an expedient approach would seem to be to identify variants within the coding regions of candidate genes for use as markers. However, non-coding variants, especially those in regulatory regions, have also been found to influence gene function (Horikawa et al., 2000).
Consequently, searches restricted to coding variants may miss important information contained within the non-coding regions. Unfortunately, the characterisation of regulatory regions and of their effect on the level of gene or phenotype expression is still in its infancy. Nevertheless, it is known that the functional effects of polymorphisms within candidate genes are normally complex and that, at a molecular level, the combinatorial nature of alleles should be taken into account. It would therefore be more informative to consider the genetic variants under investigation in their haplotypic context, rather than in isolation.