Multi-level Experiments - limma: Linear Models for Microarray and RNA-Seq Data User's Guide

We have considered paired comparisons, and we have considered comparisons between two independent groups. There are, however, experiments that combine both of these types of comparisons.

Consider a single-channel experiment with the following targets frame:

FileName Subject Condition Tissue

This experiment involves 6 subjects, including 3 patients who have the disease and 3 normal subjects.

From each subject, we have expression profiles of two tissue types, A and B.
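A targets frame matching this description can be sketched in base R as follows. The file names and the level labels "Diseased" and "Normal" are assumptions made for illustration; only the column names and the layout of 6 subjects with 2 tissues each come from the text.

```r
# Hypothetical targets frame: 6 subjects x 2 tissues = 12 arrays.
# File names and the labels "Diseased"/"Normal" are illustrative assumptions.
targets <- data.frame(
  FileName  = sprintf("File%02d", 1:12),
  Subject   = rep(1:6, each = 2),
  Condition = rep(c("Diseased", "Normal"), each = 6),
  Tissue    = rep(c("A", "B"), times = 6),
  stringsAsFactors = FALSE
)

# Every subject contributes both tissues but belongs to only one condition
subject_by_tissue   <- table(targets$Subject, targets$Tissue)
condition_by_tissue <- table(targets$Condition, targets$Tissue)
print(subject_by_tissue)
print(condition_by_tissue)
```

The two tables make the structure explicit: Tissue varies within each subject, whereas Condition varies only between subjects.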

In analysing this experiment, we want to compare the two tissue types. This comparison can be made within subjects, because each subject yields a value for both tissues. We also want to compare diseased subjects to normal subjects, but this comparison is between subjects.

If we only wanted to compare the two tissue types, we could do a paired samples comparison.

If we only wanted to compare diseased to normal, we could do an ordinary two group comparison.

Since we need to make comparisons both within and between subjects, it is necessary to treat Subject as a random effect. This can be done in limma using the duplicateCorrelation function.

The two experimental factors Condition and Tissue could be handled in many ways. Here we will assume that it is convenient to join the two into a combined factor:

> Treat <- factor(paste(targets$Condition,targets$Tissue,sep="."))

> design <- model.matrix(~0+Treat)

> colnames(design) <- levels(Treat)
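To see what the combined factor produces, the same three steps can be run on a small stand-in for the targets frame. This is only a sketch: the labels "Diseased" and "Normal" are assumed, and only base R is used.

```r
# Stand-in targets frame (level labels are assumptions for illustration)
targets <- data.frame(
  Subject   = rep(1:6, each = 2),
  Condition = rep(c("Diseased", "Normal"), each = 6),
  Tissue    = rep(c("A", "B"), times = 6)
)

# Join the two factors into one and build a design with no intercept,
# so that each column is the mean of one Condition.Tissue combination
Treat  <- factor(paste(targets$Condition, targets$Tissue, sep = "."))
design <- model.matrix(~0 + Treat)
colnames(design) <- levels(Treat)

print(colnames(design))
print(colSums(design))  # number of arrays in each combination
```

Under these assumptions the design has four columns, one per Condition.Tissue combination, and each array belongs to exactly one of them.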

Then we estimate the correlation between measurements made on the same subject:

> corfit <- duplicateCorrelation(eset,design,block=targets$Subject)

> corfit$consensus

Then this within-subject correlation is input into the linear model fit:

> fit <- lmFit(eset,design,block=targets$Subject,correlation=corfit$consensus)

Now we can make any comparisons between the experimental conditions in the usual way, for example:
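The contrast matrix cm used in the next step might be defined along the following lines. This is a sketch rather than the guide's own definition: only the name DiseasedvsNormalForTissueA appears in the text, so the other contrast names and the coefficient names Diseased.A, Diseased.B, Normal.A and Normal.B are assumptions. The matrix is written out by hand in base R; with limma loaded, makeContrasts(DiseasedvsNormalForTissueA = Diseased.A - Normal.A, ..., levels = design) would build the same columns from symbolic expressions.

```r
# Assumed coefficient names, i.e. the column names of the design matrix
lev <- c("Diseased.A", "Diseased.B", "Normal.A", "Normal.B")

# One column per comparison; entries are the weights on each coefficient.
# All names except DiseasedvsNormalForTissueA are hypothetical.
cm <- cbind(
  DiseasedvsNormalForTissueA  = c( 1,  0, -1,  0),
  DiseasedvsNormalForTissueB  = c( 0,  1,  0, -1),
  TissueBvsTissueAForNormal   = c( 0,  0, -1,  1),
  TissueBvsTissueAForDiseased = c(-1,  1,  0,  0)
)
rownames(cm) <- lev
print(cm)
```

The first two contrasts are between-subject comparisons and the last two are within-subject comparisons.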

Then compute these contrasts and moderated t-tests:

> fit2 <- contrasts.fit(fit, cm)

> fit2 <- eBayes(fit2)

Then

> topTable(fit2, coef="DiseasedvsNormalForTissueA")

will find those genes that are differentially expressed between the normal and diseased subjects in the A tissue type. And so on.

This experiment has two levels of variability. First, there is the variation from person to person, which we call the between-subject stratum. Then there is the variability of repeat measurements made on the same subject, the within-subject stratum. The between-subject variation is always expected to be larger than the within-subject variation, because the latter is adjusted for baseline differences between the subjects. Here the comparison between tissues can be made within subjects, and hence should be more precise than the comparison between diseased and normal, which must be made between subjects.
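The difference in precision between the two kinds of comparison can be illustrated numerically. The sketch below assumes a simple random-intercept model for one gene, with a between-subject variance component and a within-subject variance component. The two variance values are hypothetical, and the calculation is ordinary generalized least squares in base R, not limma's internal code.

```r
# Layout from the text: 6 subjects, 2 tissues each, 3 subjects per condition
subject   <- rep(1:6, each = 2)
condition <- rep(c("Diseased", "Normal"), each = 6)
tissue    <- rep(c("A", "B"), times = 6)
Treat <- factor(paste(condition, tissue, sep = "."))
X <- model.matrix(~0 + Treat)
colnames(X) <- levels(Treat)

# Hypothetical variance components (assumptions for illustration only)
sigma2_between <- 1.00   # subject-to-subject variance
sigma2_within  <- 0.25   # repeat-measurement variance within a subject

# Covariance of the 12 observations: subject effects are shared within subject
Z <- model.matrix(~0 + factor(subject))
V <- sigma2_within * diag(12) + sigma2_between * Z %*% t(Z)

# Implied correlation between two measurements on the same subject
rho <- sigma2_between / (sigma2_between + sigma2_within)

# Covariance matrix of the generalized least squares coefficient estimates
cov_beta <- solve(t(X) %*% solve(V) %*% X)

# Coefficient order: Diseased.A, Diseased.B, Normal.A, Normal.B
tissue_con  <- c(-1, 1,  0, 0)  # Diseased.B - Diseased.A (within subjects)
disease_con <- c( 1, 0, -1, 0)  # Diseased.A - Normal.A   (between subjects)
var_tissue  <- drop(t(tissue_con)  %*% cov_beta %*% tissue_con)
var_disease <- drop(t(disease_con) %*% cov_beta %*% disease_con)
print(c(rho = rho, var_tissue = var_tissue, var_disease = var_disease))
```

Under this model the tissue contrast has variance 2 * sigma2_within / 3, because the subject effects cancel, while the disease contrast has variance 2 * (sigma2_between + sigma2_within) / 3.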

Chapter 10

Two-Color Experiments with a Common Reference

10.1 Introduction

Now consider two-color microarray experiments in which a common reference has been used on all the arrays. If the same channel has been used for the common reference throughout the experiment, then the expression log-ratios may be analysed exactly as if they were log-expression values from a single channel experiment. In these cases, the design matrix can be formed as for a single channel experiment.

When the common reference is dye-swapped, the simplest method is to set up the design matrix using the modelMatrix() function and the targets file.

10.2 Two Groups

Suppose now that we wish to compare two wild type (WT) mice with three mutant (Mu) mice using arrays hybridized with a common reference RNA (Ref):

FileName Cy3 Cy5
File1    Ref WT
File2    Ref WT
File3    Ref Mu
File4    Ref Mu
File5    Ref Mu

The interest here is in the comparison between the mutant and wild type mice. There are two major ways in which this comparison can be made. We can either

1. create a design matrix which includes a coefficient for the mutant vs wild type difference, or

2. create a design matrix which includes separate coefficients for wild type and mutant mice and then extract the difference as a contrast.

For the first approach, the design matrix should be as follows

> design

     WTvsREF MUvsWT
[1,]       1      0
[2,]       1      0
[3,]       1      1
[4,]       1      1
[5,]       1      1

Here the first coefficient estimates the difference between wild type and the reference for each probe while the second coefficient estimates the difference between mutant and wild type. For those not familiar with model matrices in linear regression, it can be understood in the following way. The matrix indicates which coefficients apply to each array. For the first two arrays the fitted values will be just the WTvsREF coefficient, which is correct. For the remaining arrays the fitted values will be WTvsREF + MUvsWT, which is equivalent to mutant vs reference, also correct. For reasons that will be apparent later, this is sometimes called the treatment-contrasts parametrization. Differentially expressed genes can be found by

> fit <- lmFit(MA, design)

> fit <- eBayes(fit)

> topTable(fit, coef="MUvsWT", adjust="BH")

There is no need here to use contrasts.fit() because the comparison of interest is already built into the fitted model. This analysis is analogous to the classical pooled two-sample t-test except that information has been borrowed between genes.
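The fitted-value interpretation of the treatment-contrasts design can be checked numerically. In this minimal sketch the coefficient values are hypothetical numbers chosen for one probe, and the design is written out by hand in base R.

```r
# Treatment-contrasts design for the five arrays (2 WT, then 3 Mu)
design <- cbind(WTvsREF = 1, MUvsWT = c(0, 0, 1, 1, 1))

# Hypothetical coefficient values for a single probe (illustration only)
beta <- c(WTvsREF = 0.5, MUvsWT = 2.0)

# Fitted log-ratio for each array: the sum of the coefficients that apply to it
fitted_values <- drop(design %*% beta)
print(fitted_values)
```

The first two arrays receive only the WTvsREF coefficient, while the last three receive WTvsREF + MUvsWT, which is the mutant vs reference log-ratio.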

For the second approach, the design matrix should be

     WT MU
[1,]  1  0
[2,]  1  0
[3,]  0  1
[4,]  0  1
[5,]  0  1

The first coefficient now represents wild-type vs the reference and the second represents mutant vs the reference. Our comparison of interest is the difference between these two coefficients. We will call this the group-means parametrization. Differentially expressed genes can be found by

> fit <- lmFit(MA, design)

> cont.matrix <- makeContrasts(MUvsWT=MU-WT, levels=design)

> fit2 <- contrasts.fit(fit, cont.matrix)

> fit2 <- eBayes(fit2)

> topTable(fit2, adjust="BH")

The results will be exactly the same as for the first approach.
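The agreement between the two parametrizations can be illustrated for the coefficient estimates using ordinary least squares in base R. This is a sketch only: the log-ratios are made-up values for a single probe, and it checks the estimated difference, not the moderated statistics that eBayes() computes.

```r
# Hypothetical log-ratios for one probe on the five arrays (2 WT, then 3 Mu)
y <- c(0.1, -0.1, 1.9, 2.2, 2.0)

# First approach: treatment-contrasts parametrization
design_treat <- cbind(WTvsREF = 1, MUvsWT = c(0, 0, 1, 1, 1))
# Second approach: group-means parametrization
design_means <- cbind(WT = c(1, 1, 0, 0, 0), MU = c(0, 0, 1, 1, 1))

# Ordinary least squares fits; coefficients are named by the design columns
coef_treat <- lm.fit(design_treat, y)$coefficients
coef_means <- lm.fit(design_means, y)$coefficients

# The contrast MU - WT from the second fit matches MUvsWT from the first
diff_direct   <- coef_treat[["MUvsWT"]]
diff_contrast <- coef_means[["MU"]] - coef_means[["WT"]]
print(c(direct = diff_direct, contrast = diff_contrast))
```

Both designs span the same column space, so they describe the same fitted model and differ only in how the coefficients are labelled.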

The design matrix can be constructed

1. manually,

2. using the limma function modelMatrix(), or

3. using the built-in R function model.matrix().

Let Group be the factor defined by

> Group <- factor(c("WT","WT","Mu","Mu","Mu"), levels=c("WT","Mu"))

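For the first approach, the treatment-contrasts parametrization, the design matrix can be computed by

> design <- cbind(WTvsRef=1,MUvsWT=c(0,0,1,1,1))

or by

> param <- cbind(WTvsRef=c(-1,1,0),MUvsWT=c(0,-1,1))

> rownames(param) <- c("Ref","WT","Mu")

> design <- modelMatrix(targets, parameters=param)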

or by

> design <- model.matrix(~Group)

> colnames(design) <- c("WTvsRef","MUvsWT")

all of which produce the same result. For the second approach, the group-means parametrization, the design matrix can be computed by

> design <- cbind(WT=c(1,1,0,0,0),MU=c(0,0,1,1,1))

or by

> param <- cbind(WT=c(-1,1,0),MU=c(-1,0,1))

> rownames(param) <- c("Ref","WT","Mu")

> design <- modelMatrix(targets, parameters=param)

or by

> design <- model.matrix(~0+Group)

> colnames(design) <- c("WT","MU")

all of which again produce the same result.
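That the manual and model.matrix() constructions agree can be verified directly in base R. The sketch below leaves out modelMatrix() so that it runs without limma, and it names the group-means columns WT and MU so that they match the makeContrasts() call used earlier.

```r
Group <- factor(c("WT", "WT", "Mu", "Mu", "Mu"), levels = c("WT", "Mu"))

# Treatment-contrasts parametrization: manual vs model.matrix()
treat_manual <- cbind(WTvsRef = 1, MUvsWT = c(0, 0, 1, 1, 1))
treat_mm <- model.matrix(~Group)
colnames(treat_mm) <- c("WTvsRef", "MUvsWT")

# Group-means parametrization: manual vs model.matrix()
means_manual <- cbind(WT = c(1, 1, 0, 0, 0), MU = c(0, 0, 1, 1, 1))
means_mm <- model.matrix(~0 + Group)
colnames(means_mm) <- c("WT", "MU")

# Compare entries only; model.matrix() also attaches row names and attributes
same_treat <- all(treat_manual == treat_mm)
same_means <- all(means_manual == means_mm)
print(c(same_treat = same_treat, same_means = same_means))
```

The order of levels in Group matters here: because "WT" is listed first, model.matrix(~Group) treats wild type as the baseline.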

10.3 Several Groups

The above approaches for two groups extend easily to any number of groups. Suppose that the experiment has been conducted to compare three RNA sources, “RNA1”, “RNA2” and “RNA3”.

For example the targets frame might be

FileName Cy3 Cy5

For this experiment the design matrix could be formed by

> design <- modelMatrix(targets, ref="Ref")

after which the analysis would be exactly as for the equivalent single channel experiment in Section 9.3. For example, to make all pair-wise comparisons between the three groups one could proceed

> fit <- lmFit(eset, design)

> contrast.matrix <- makeContrasts(RNA2-RNA1, RNA3-RNA2, RNA3-RNA1, levels=design)

> fit2 <- contrasts.fit(fit, contrast.matrix)

> fit2 <- eBayes(fit2)

and so on.
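The structure of the pair-wise contrast matrix can be written out by hand to show what each column means. This base R sketch assumes that the design matrix columns are named RNA1, RNA2 and RNA3, as the makeContrasts() call implies; the column labels used here are illustrative.

```r
# Assumed coefficient names, taken from the makeContrasts() call in the text
lev <- c("RNA1", "RNA2", "RNA3")

# Each column holds the weights applied to the three group coefficients
contrast.matrix <- cbind(
  "RNA2-RNA1" = c(-1,  1, 0),
  "RNA3-RNA2" = c( 0, -1, 1),
  "RNA3-RNA1" = c(-1,  0, 1)
)
rownames(contrast.matrix) <- lev
print(contrast.matrix)

# Only two of the three pair-wise comparisons are linearly independent
contrast_rank <- qr(contrast.matrix)$rank
print(contrast_rank)
```

The third contrast is the sum of the first two, so the three comparisons carry two degrees of freedom between them.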

Chapter 11

Direct Two-Color Experimental Designs

11.1 Introduction

Direct two-color designs are those in which there is no common reference, but the RNA samples are instead compared directly by competitive hybridization on the same arrays. Direct two-color designs can be very efficient and powerful, but they require the most statistical knowledge to choose the appropriate design matrix.
