Association testing in linear time

6.2 Future work

6.2.5 Association testing in linear time

In this thesis, we have worked on approaches that work well for datasets with up to several thousands of samples and hundreds of thousands of markers. While this is good enough for most datasets that are publicly available at this moment, it will not suffice for new datasets that are currently created. Large consortia, like the Genetic Investigation of Anthropometric Traits, have already gathered data for over hundreds of thousands of individuals (Berndt et al., 2013), requiring new algorithms that scale linearly with the number of samples and the number of markers. Fortunately, a number of fast and accurate sparse approximations to Gaussian processes have been proposed recently, which scale well and allow for parallel inference (Hensman et al., 2013; Gal et al., 2014). We think that similar inference schemes as presented there can also be used to accelerate association testing in linear mixed models. Moreover, we are exploring new covariance functions which might allow for faster inference. For instance, Davies and Ghahramani (2014) suggest to construct a kernel based on random partitioning of the data. The similarity between two points is thereby defined as the probability that the two points are assigned to the same cluster. In genetics, SNP-based clustering is inevitably linked to inferring the population structure (Pritchard et al., 2000), making this the perfect choice for designing a covariance function that can capture the relatedness between samples. We also started exploring a permutation-based approach to correct for population structure (Huang et al., 2013). The key idea behind this is that the exchange probability between two samples is proportional to their similarity, making it more likely that samples are shu✏ed

within the same population. The method is generic in two ways: firstly, we can use an arbitrary covariance function to measure the similarity between the samples, and secondly, we can choose any desired test statistic for association testing.

Appendix A

Appendix

In this appendix, we provide a brief refresher to probability theory and linear algebra. For a more detailed introduction of probability theory, we refer the reader to (Wasserman, 2004; Bishop, 2006) and for linear algebra to (Lay, 2012; Golub and Van Loan, 1996).

A.1 Probability theory

In probability theory, we are given an experiment and want to assign a probability to a possible event A. An event is thereby a set of possible outcomes, and the set of of all possible outcomes is called the sample space ⌃.

Formally, we can then define a probability function p, as as any function that fulfills the following three axioms:

1. p(A)_{ 0 for all events A 2 ⌃.} 2. p(⌦) = 1.

3. if A1, A2, . . .2 ⌃ are disjoint, then

p 1 [ i=1 Ai ! = 1 X i=1 p(Ai). (A.1)

Probabilities can either be interpreted as “relative frequencies with which an event occurs in the long run”(Bulmer, 1979) or as a “subjective degree of belief”(Koller and Friedman, 2009), resulting in the frequentists and the Bayesian school of thought. While there has been and still is much dispute about the di↵erent merits of the two

approaches, a deeper discussion lies beyond the scope of this thesis, and can for instance be found in (Efron, 1986; MacKay, 2002). In this work, we use a pragmatic approach and apply techniques from both fields, depending on our subjective opinion what suits best for the problem at hand.

Basic properties Let A and B be two events. Then, their joint distribution is denoted as p(A, B). The two events are independent, if

p(A, B) = p(A)p(B). (A.2)

If p(B) > 0, the conditional probability of A given B is p(A_{|B) =} p(A, B)

p(B) . (A.3)

The product rule is then obtained by rewriting the definition of the conditional probability as

p(A, B) = p(A)p(B|A), (A.4)

and the sum rule is given by

p(A) =X

p(A, B). (A.5)

By combining the sum and product rule, we directly get to Bayes theorem p(A_{|B) =} Pp(B|A)p(A)

Bp(B|A)p(A)

. (A.6)

Random variable A random variable describes a mapping X : ⌦ ! R from the sample space to the real numbers. Random variables can either be discrete, taking a countable list of values, or continuous, taking any value in one or multiple intervals. From now on, we concentrate on the continuous case.

The cumulative distribution function FX : R ! [0, 1] of a random variable X is

given by

FX(x) = p(X  0). (A.7)

If the random variable X is continuous, the corresponding probability density function fX fulfills the following properties:

A.1. Probability theory 101 1. fX(x) 0

2. R1₁fX(x)dx = 1

3. p(a < X < b) =R_abfX(x)dx for a b.

The cumulative distribution function FX and the probability density function fX are

then linked via the identity

FX(x) =

Z x 1

fX(t)dt. (A.8)

Moments The expected value of a random variable X is defined as E [X] =

xf (x)dx. (A.9)

and its variance as

Var [X] =E⇥(X E [X])2⇤. (A.10)

The standard deviation of the random variable X is defined as the square root of its variance.

Gaussian distribution The Gaussian distribution is parameterized by the mean µ and the variance 2_{, and defined by the probability density function}

N x µ, 2 ₌ _p 1 2⇡ 2 exp 1 2 2(x µ) 2 . (A.11)

The distribution is bell-shaped around the mean and its width is determined by the variance parameter, see also Figure A.1. Its extension to d dimensions is given by

N (x |µ, ⌃) = p 1 (2⇡)d_|⌃|exp ✓ 1 2(x µ) >_⌃ 1_(x _µ) ◆ , (A.12)

where µ is now the d-dimensional mean vector, and ⌃ is the d⇥d covariance matrix. The importance of the Gaussian distribution is grounded by the central limit theorem. It states that, under mild conditions, the sum of a large number of independent random variables, that need not necessarily be Gaussian distributed by themselves, will be approximately Gaussian distributed (Bulmer, 1979).

Figure A.1: One-dimensional Gaussian distribution. Gaus- sian probability density function (pdf), with mean µ = 0 and variance 2 = 1, is shown in blue. The mean is depicted with red, the standard deviation with green. 68.27% of the mass of the distribution is within one standard deviation of the mean (yellow-shaded area).

Apart from that, the Gaussian distribution is also loved for its analytically tractability. If the marginal distribution p(x) and the conditional distribution p(y|x) are normally distributed

p(x) = _{N x µ, ⇤} 1 , (A.13)

p(y_{|x) = N y Ax + b, L} 1 , (A.14)

it is easy to show that the marginal distribution of p(y) and the conditional distribution p(x|y) are again Gaussians, and have a closed-form solution for the mean and the covariance (Bishop, 2006)

p(y) = _{N y Aµ + b, L} 1_{+ A⇤} 1_A> _, _(A.15)

p(x_{|y) = N x ⌃}⇥A>L(y b) + ⇤µ⇤, ⌃ , (A.16) where ⌃ = ⇤ + A>LA 1.

Prior, posterior and likelihood In Bayesian statistics, the parameters ✓ are not fixed, but random variables. They are endowed with a prior distribution p(✓) that reflects our beliefs, before we have seen any data. The likelihood function p(D|✓) connects the parameters with the data: it describes how good the parameters ✓ can explain the dataset D. After having observed the data, we can then update our beliefs of the parameters ✓ by using Bayes theorem

p(✓|D) = p(D|✓)p(✓)

A.2. Linear algebra 103

In document Modeling the polygenic architecture of complex traits (Page 109-115)