
Suppose that among $n_1$ sampled cases ($Y = 1$) and $n_0$ sampled controls ($Y = 0$), a fixed number $c_{jk}$ of subjects are observed at level $k$ of the random variable $X_j$. This variable is defined such that each element of its support represents a specific combination of the levels of $G_j$ and $E$, for all $j = 1, \ldots, m$ and $k \in \{1, \ldots, 3\varepsilon\} = \mathcal{X}_\varepsilon$. Precisely, for each $G_j \in \{0, 1, 2\} = \mathcal{G}$ and $E \in \{0, 1, \ldots, \varepsilon - 1\} = \mathcal{E}_\varepsilon$, we have

$$X_j = 1 + G_j + 3E. \qquad (4.2)$$
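As a quick check of (4.2), the Python sketch below maps each $(G_j, E)$ combination to its level of $X_j$ and back. The function names `encode` and `decode` are hypothetical and serve only as illustration. Because $X_j - 1 = G_j + 3E$ with $G_j \in \{0, 1, 2\}$, $G_j$ is the least significant base-3 digit of $X_j - 1$, so the map is a bijection onto $\mathcal{X}_\varepsilon$.

```python
def encode(g: int, e: int, eps: int) -> int:
    """Map a (G_j, E) combination to its level X_j = 1 + G_j + 3E in {1, ..., 3*eps}."""
    if g not in (0, 1, 2):
        raise ValueError("genotype G_j must be 0, 1 or 2")
    if not 0 <= e < eps:
        raise ValueError("environmental level E must lie in {0, ..., eps - 1}")
    return 1 + g + 3 * e


def decode(x: int, eps: int) -> tuple:
    """Recover (G_j, E) from level x, using x - 1 = G_j + 3E with 0 <= G_j <= 2."""
    if not 1 <= x <= 3 * eps:
        raise ValueError("level must lie in {1, ..., 3*eps}")
    e, g = divmod(x - 1, 3)
    return g, e


if __name__ == "__main__":
    eps = 2
    # List every (G_j, E) combination with its level of X_j.
    for e in range(eps):
        for g in range(3):
            print(f"G_j={g}, E={e} -> X_j={encode(g, e, eps)}")
```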

Note that any pattern in $G_j$ and $E$ can be expressed as a combination of elements of $\mathcal{X}_\varepsilon$, the support of $X_j$. These elements form a subset of $\mathcal{X}_\varepsilon$, which we denote by the subscript of the pattern $L_C$, $C \in \{A, B\}$. For example, since $1 + 0 + 3 \cdot 1 = 4$ and $1 + 1 + 3 \cdot 1 = 5$, the pattern given by (4.1) is

$$L_A = (G_j \in \{0, 1\}) \wedge (E = 1) \iff X_j \in \{4, 5\},$$

for which $A = \{4, 5\}$.
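The translation from a conjunctive pattern to its subset of $\mathcal{X}_\varepsilon$ can be sketched as follows, assuming the pattern is given as a set of allowed genotypes and a set of allowed environmental levels. The helper `pattern_to_subset` is a hypothetical name, not part of the original method.

```python
def pattern_to_subset(g_set, e_set, eps):
    """Subset C of X_eps for the pattern L_C = (G_j in g_set) AND (E in e_set).

    Uses the encoding X_j = 1 + G_j + 3E from (4.2).
    """
    if not set(g_set) <= {0, 1, 2}:
        raise ValueError("genotypes must come from {0, 1, 2}")
    if not set(e_set) <= set(range(eps)):
        raise ValueError("environmental levels must come from {0, ..., eps - 1}")
    return frozenset(1 + g + 3 * e for g in g_set for e in e_set)


if __name__ == "__main__":
    # Pattern (4.1): (G_j in {0, 1}) AND (E = 1), here with eps = 2.
    A = pattern_to_subset({0, 1}, {1}, eps=2)
    print(sorted(A))  # [4, 5]
```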

When studying associations between disease status and explanatory variables by way of a parametric additive model (e.g., logistic regression), our interest centers on determining which variables belong in the model and estimating their corresponding effect sizes (measured by the appropriate model coefficients). By contrast, when studying these associations in the context of logic patterns, as is the case here, we seek logic expressions that are associated with disease status. Let $W_l$ denote the indicator random variable whose success and failure correspond, respectively, to the elements of the support of $X_j$ lying in $A_l$ and in $B_l$, for some $l = 1, \ldots, q$. We consider the test of the null hypothesis of no association between disease status ($Y$) and $W_l$ versus the alternative hypothesis that some association exists between these variables. Rejection of the null hypothesis suggests that the odds of disease differ significantly between the two levels of $W_l$. In turn, this indicates that the logical pattern ordered pair $(L_{A_l}, L_{B_l})$ is associated with disease status.
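The section does not commit to a particular test statistic at this point. As one assumed choice, the sketch below cross-classifies subjects by disease status and $W_l$ and applies a Pearson chi-square test (1 degree of freedom, no continuity correction). The dictionaries `case_counts` and `control_counts`, which map each level $k$ of $X_j$ to the number of cases or controls observed at that level, are hypothetical inputs. Subjects whose level lies in neither $A_l$ nor $B_l$ are left out of the table.

```python
import math


def w_table(case_counts, control_counts, A, B):
    """Counts (a, b, c, d): cases with W_l = 1, cases with W_l = 0,
    controls with W_l = 1, controls with W_l = 0.

    W_l = 1 for levels in A and W_l = 0 for levels in B; other levels are excluded.
    """
    a = sum(case_counts.get(k, 0) for k in A)
    b = sum(case_counts.get(k, 0) for k in B)
    c = sum(control_counts.get(k, 0) for k in A)
    d = sum(control_counts.get(k, 0) for k in B)
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds of disease when W_l = 1 relative to W_l = 0, i.e. (a*d) / (b*c)."""
    if b * c == 0:
        return math.nan if a * d == 0 else math.inf
    return (a * d) / (b * c)


def pearson_chi_square(a, b, c, d):
    """Return (statistic, p-value) for the 2x2 table [[a, b], [c, d]] with 1 df."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # a zero margin carries no information about association
    stat = n * (a * d - b * c) ** 2 / margins
    # For 1 df, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts for eps = 2 (levels 1..6), testing pattern (4.1): A = {4, 5}.
    cases = {1: 40, 2: 30, 3: 10, 4: 60, 5: 45, 6: 15}
    controls = {1: 60, 2: 45, 3: 15, 4: 35, 5: 30, 6: 15}
    a, b, c, d = w_table(cases, controls, {4, 5}, {1, 2, 3, 6})
    stat, p = pearson_chi_square(a, b, c, d)
    print((a, b, c, d), odds_ratio(a, b, c, d), stat, p)
```

Other tests of association for a $2 \times 2$ table (e.g., Fisher's exact test or a likelihood-ratio test from logistic regression) could be substituted without changing the surrounding logic.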

In the context of the aforementioned test of hypotheses, we essentially seek the logical pattern ordered pair $(L_{A_l}, L_{B_l})$ that yields the smallest p-value (i.e., the strongest association signal), computed under the null hypothesis, among all possible logical pattern ordered pairs. Without loss of generality, assume that $L_{A_l}$ and $L_{B_l}$ are chosen such that the sets $A_l$ and $B_l$ form a binary partition of $\mathcal{X}_\varepsilon$, that is, $A_l \cup B_l = \mathcal{X}_\varepsilon$ and $A_l \cap B_l = \emptyset$; equivalently, $L_{A_l}$ and $L_{B_l}$ are complementary patterns. There are in fact $2^{3\varepsilon - 1} - 1$ distinct binary partitions of $\mathcal{X}_\varepsilon$ into two nonempty sets, each relating to a unique logical pattern ordered pair $(L_{A_l}, L_{B_l})$. Thus, considering the logical pattern ordered pairs pertaining to all partitions of $\mathcal{X}_\varepsilon$ leads one to conduct $2^{3\varepsilon - 1} - 1$ tests of the null hypothesis of no association between $W_l$ and disease status at each locus. This is an inordinate number: already for $\varepsilon = 3$ it amounts to $2^{8} - 1 = 255$ tests per locus.
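The count $2^{3\varepsilon - 1} - 1$ can be checked by brute force for small $\varepsilon$. The sketch below enumerates every binary partition of $\mathcal{X}_\varepsilon$ and compares the total with the closed form. The generator name `binary_partitions` is hypothetical.

```python
from itertools import combinations


def binary_partitions(universe):
    """Yield each unordered partition {A, B} of `universe` into two non-empty sets once.

    Fixing the smallest element inside A prevents (A, B) and (B, A) from both appearing.
    """
    items = sorted(universe)
    first, rest = items[0], items[1:]
    full = frozenset(items)
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            A = frozenset((first,) + extra)
            B = full - A
            if B:  # A equal to the whole universe is not a binary partition
                yield A, B


if __name__ == "__main__":
    for eps in (1, 2, 3, 4):
        universe = range(1, 3 * eps + 1)
        count = sum(1 for _ in binary_partitions(universe))
        print(eps, count, 2 ** (3 * eps - 1) - 1)
```

Fixing one element in $A$ halves the $2^{3\varepsilon}$ subsets, and discarding the case $A = \mathcal{X}_\varepsilon$ removes one more, which is exactly the closed form.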

However, from a biological perspective, it makes sense to restrict attention to logical patterns in $G_j$ and $E$ connected by the $\wedge$ operator (e.g., (4.1)), particularly when searching for GxE interaction. Accordingly, we construct our $q$-fold set of candidate pattern ordered pairs $\{(L_{A_l}, L_{B_l})\}_{l=1,\ldots,q}$ by formulating patterns in $G_j$ and $E$ systematically, with the aim of assessing GxE interaction as well as the main effects of the genetic and environmental factors. Since heterozygous individuals most often have either an intermediate phenotype or a phenotype identical to that of the homozygous variant individuals (dominant GMI)² or of the homozygous wild-type individuals (recessive GMI) [109], we treat the heterozygous genotype at locus $j$ (i.e., $G_j = 1$) as intermediate between the two homozygous genotypes. Hence, we consider the following four combinations of $G_j$: $G_j \in \{0, 1\}$, $G_j \in \{1, 2\}$, $G_j = 0$, and $G_j = 2$. Combining each of these with each level of the environmental factor (e.g., $G_j = 0$ combined with $E = 0$), we obtain a total of $4\varepsilon$ candidate patterns, denoted $L_{A_1}, \ldots, L_{A_{4\varepsilon}}$ (for clarity, we index candidate patterns by $l$). For each $l = 1, \ldots, 4\varepsilon$, the candidate pattern $L_{B_l}$ is defined to be the complement of $L_{A_l}$. Each of these candidate pattern ordered pairs defines a distinct random variable $W_l$ with which to test the null hypothesis of no association between $W_l$ and disease status. Taken collectively, the hypothesis tests involving the random variables $\{W_l\}_{l=1,\ldots,4\varepsilon}$ assess the effect of GxE interaction.
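The construction of the $4\varepsilon$ GxE candidate pairs can be sketched as follows. The text does not fix the order of the index $l$; the sketch assumes the environmental level varies slowest and the four genotype combinations fastest. The names `GENOTYPE_SETS` and `gxe_candidate_pairs` are hypothetical.

```python
# Genotype combinations from the text, treating G_j = 1 as intermediate.
GENOTYPE_SETS = [(0, 1), (1, 2), (0,), (2,)]


def gxe_candidate_pairs(eps):
    """Return the 4*eps GxE candidate pairs (A_l, B_l) as sets of X_j levels.

    Assumed ordering: environmental level E outermost, then GENOTYPE_SETS.
    B_l is the complement of A_l in X_eps = {1, ..., 3*eps}.
    """
    universe = frozenset(range(1, 3 * eps + 1))
    pairs = []
    for e in range(eps):
        for g_set in GENOTYPE_SETS:
            A = frozenset(1 + g + 3 * e for g in g_set)  # X_j = 1 + G_j + 3E
            pairs.append((A, universe - A))
    return pairs


if __name__ == "__main__":
    for l, (A, B) in enumerate(gxe_candidate_pairs(2), start=1):
        print(l, sorted(A), sorted(B))
```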

Following the line of regression modeling, which incorporates both main effects and interaction effects, we can also include candidate patterns that assess the genetic and environmental main effects. The candidate patterns for assessing the genetic main effect are those corresponding to the dominant and recessive genetic models, given by

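$$L_{A_{4\varepsilon+1}} = (G_j \in \{1, 2\}) \wedge (E \in \mathcal{E}_\varepsilon)$$

and

$$L_{A_{4\varepsilon+2}} = (G_j = 2) \wedge (E \in \mathcal{E}_\varepsilon),$$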

respectively, where the candidate pattern $L_{B_l}$ is defined to be the complement of $L_{A_l}$ for each $l \in \{4\varepsilon + 1, 4\varepsilon + 2\}$. Each of these candidate pattern ordered pairs defines a distinct random variable $W_l$ with which to test the null hypothesis of no association between $W_l$ and disease status. Taken collectively, the hypothesis tests involving the random variables $W_{4\varepsilon+1}$ and $W_{4\varepsilon+2}$ assess the main effect of $G_j$.

² Unless otherwise specified, henceforth when we speak of a GMI upon a SNP locus it is assumed in terms of the

Now consider the candidate patterns needed to assess the main effect of $E$. Since we make no assumption regarding intermediate effects for the environmental factor, we treat this factor as a nominal categorical variable.³ We construct the corresponding candidate pattern ordered pairs analogously to dummy coding a qualitative predictor in regression modeling, with level zero of $E$ as the baseline group. Consequently, whenever $\varepsilon > 2$, the candidate patterns $L_{A_l}$ and $L_{B_l}$ are not complements of one another, because subjects at the remaining nonzero levels of $E$ fall in neither $A_l$ nor $B_l$. More precisely, for each $l = 4\varepsilon + 3, \ldots, 5\varepsilon + 1$, the candidate patterns for assessing the environmental main effect are defined by

$$L_{A_l} = (G_j \in \mathcal{G}) \wedge \big(E = l - (4\varepsilon + 2)\big)$$

and

$$L_{B_l} = (G_j \in \mathcal{G}) \wedge (E = 0).$$
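The main-effect candidates, and the fact that $W_l$ is undefined for some subjects once $\varepsilon > 2$, can be sketched as follows. All function names are hypothetical; returning `None` for subjects outside $A_l \cup B_l$ is an illustrative convention for "excluded from this test".

```python
def genetic_main_effect_pairs(eps):
    """Pairs for l = 4*eps + 1 (dominant: G_j in {1, 2}) and l = 4*eps + 2 (recessive: G_j = 2)."""
    universe = frozenset(range(1, 3 * eps + 1))
    dominant = frozenset(1 + g + 3 * e for g in (1, 2) for e in range(eps))
    recessive = frozenset(1 + 2 + 3 * e for e in range(eps))
    return [(dominant, universe - dominant), (recessive, universe - recessive)]


def environment_main_effect_pairs(eps):
    """Pairs for l = 4*eps + 3, ..., 5*eps + 1: E = l - (4*eps + 2) versus baseline E = 0."""
    baseline = frozenset(1 + g for g in range(3))  # (G_j in G) AND (E = 0) -> {1, 2, 3}
    return [(frozenset(1 + g + 3 * e for g in range(3)), baseline) for e in range(1, eps)]


def w_value(x, A, B):
    """W_l for a subject at level x: 1 if x is in A, 0 if in B, None if excluded from the test."""
    if x in A:
        return 1
    if x in B:
        return 0
    return None


if __name__ == "__main__":
    eps = 3
    A, B = environment_main_effect_pairs(eps)[0]  # E = 1 versus E = 0
    # Level 8 corresponds to G_j = 1, E = 2, so it lies in neither A nor B.
    print(sorted(A), sorted(B), w_value(8, A, B))  # prints [4, 5, 6] [1, 2, 3] None
```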

Each of these candidate pattern ordered pairs defines a distinct random variable $W_l$ with which to test the null hypothesis of no association between $W_l$ and disease status. Taken collectively, the hypothesis tests involving the random variables $\{W_l\}_{l=4\varepsilon+3,\ldots,5\varepsilon+1}$ assess the main effect of the environmental factor. Table 4.1 summarizes our proposed candidate patterns for assessing, at SNP locus $j$, the genetic and environmental main effects and GxE interaction. Overall, a total of $q = 4\varepsilon + 2 + (\varepsilon - 1) = 5\varepsilon + 1$ candidate pattern ordered pairs $(L_{A_l}, L_{B_l})$ are considered within the GEM approach presented here.
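At a single locus, the search then amounts to computing one p-value per candidate pair and retaining the smallest. The sketch below assumes a Pearson chi-square test for each $2 \times 2$ table of disease status by $W_l$ (the text does not prescribe this choice). It takes the $q$ candidate pairs $(A_l, B_l)$ as sets of $X_j$ levels and uses hypothetical `case_counts`/`control_counts` dictionaries keyed by level. It illustrates only the selection step and makes no adjustment for having examined $q$ candidates.

```python
import math


def chi_square_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0
    stat = (a + b + c + d) * (a * d - b * c) ** 2 / margins
    return math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df


def strongest_candidate(case_counts, control_counts, pairs):
    """Return (l, p) for the candidate pair with the smallest p-value; l is 1-based."""
    best_l, best_p = None, math.inf
    for l, (A, B) in enumerate(pairs, start=1):
        a = sum(case_counts.get(k, 0) for k in A)     # cases with W_l = 1
        b = sum(case_counts.get(k, 0) for k in B)     # cases with W_l = 0
        c = sum(control_counts.get(k, 0) for k in A)  # controls with W_l = 1
        d = sum(control_counts.get(k, 0) for k in B)  # controls with W_l = 0
        p = chi_square_pvalue(a, b, c, d)
        if p < best_p:
            best_l, best_p = l, p
    return best_l, best_p


if __name__ == "__main__":
    # Hypothetical eps = 2 counts with an excess of cases at level 6 (G_j = 2, E = 1).
    cases = {k: 50 for k in range(1, 7)}
    cases[6] = 150
    controls = {k: 50 for k in range(1, 7)}
    full = frozenset(range(1, 7))
    # A few illustrative candidates; a full run would pass all q = 5*eps + 1 pairs.
    pairs = [
        (frozenset({6}), full - {6}),                  # (G_j = 2) AND (E = 1)
        (frozenset({3, 6}), full - {3, 6}),            # recessive genetic main effect
        (frozenset({4, 5, 6}), frozenset({1, 2, 3})),  # E = 1 versus E = 0
    ]
    print(strongest_candidate(cases, controls, pairs))
```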