
the other genes to adjust the individual genes in terms of the global variance. In case of $d_0 = 0$, $\tilde{t}_{gj} = t_{gj}$ holds (Ritchie et al., 2015).

3.4 Statistics for concentration-dependent analyses

In concentration-dependent gene expression studies, a convincing concentration progression is a criterion for data quality. Genes that are deregulated by a compound at a certain tested concentration are usually also deregulated at the next higher concentration. Therefore, genes with a deviating expression profile, i.e. genes with a non-monotonic concentration progression, may be indicative of low data quality and should hence be treated with caution. To improve data reliability, Grinberg et al. (2014) introduced two indices for the progression analysis of gene alterations over increasing concentration levels: the progression profile index and the progression profile error indicator. Both statistics are exclusivity indices for the comparison of adjacent concentrations and are calculated separately for each compound and incubation time point. Mathematically, the indices are defined as the probability of not being deregulated at a certain concentration level, conditional on being deregulated at another concentration level.

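Let $C_1$ and $C_2$ denote two concentration levels and $G^{\mathrm{Diff}}_{C_1}$ and $G^{\mathrm{Diff}}_{C_2}$ the events of being differentially expressed at $C_1$ and $C_2$, respectively. The complement $\overline{G^{\mathrm{Diff}}_{C_1}}$ denotes the event of not being differentially expressed at $C_1$. The conditional probability of $\overline{G^{\mathrm{Diff}}_{C_1}}$ given $G^{\mathrm{Diff}}_{C_2}$ is then defined as the ratio of the probability of the intersection of the events $\overline{G^{\mathrm{Diff}}_{C_1}}$ and $G^{\mathrm{Diff}}_{C_2}$ and the probability of the event $G^{\mathrm{Diff}}_{C_2}$:

$$P\left(\overline{G^{\mathrm{Diff}}_{C_1}} \,\middle|\, G^{\mathrm{Diff}}_{C_2}\right) = \frac{P\left(\overline{G^{\mathrm{Diff}}_{C_1}} \cap G^{\mathrm{Diff}}_{C_2}\right)}{P\left(G^{\mathrm{Diff}}_{C_2}\right)}. \qquad (3.9)$$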

This quantity is estimated by replacing each probability with the corresponding relative proportion of genes, e.g. the probability of being deregulated at $C_2$ with the proportion of genes deregulated at $C_2$.
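A minimal Python sketch of this estimator, assuming the differentially expressed genes at each concentration are available as sets of gene identifiers. The function name `conditional_not_deregulated` and the toy gene sets are hypothetical. Because both relative proportions share the same denominator (the number of genes on the array), the estimate reduces to a ratio of gene counts.

```python
# Hypothetical helper (not from the original work): estimates formula (3.9)
# from sets of differentially expressed (DE) gene identifiers.
def conditional_not_deregulated(de_c1, de_c2):
    """Estimate P(not DE at C1 | DE at C2).

    The relative proportions share the denominator (genes on the array),
    so the estimate is |DE at C2 but not at C1| / |DE at C2|.
    """
    if not de_c2:
        raise ValueError("no genes deregulated at C2; the ratio is undefined")
    only_c2 = de_c2 - de_c1  # deregulated at C2 but not at C1
    return len(only_c2) / len(de_c2)


# Hypothetical toy gene sets
de_c1 = {"g1", "g2", "g3"}
de_c2 = {"g1", "g2", "g3", "g4"}
print(conditional_not_deregulated(de_c1, de_c2))  # 0.25
```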

3.4.1 Progression profile index

The progression profile index is defined as the ratio of two proportions: the proportion of genes that are deregulated exclusively at the higher concentration $C_2$, and the proportion of genes that are deregulated in total at $C_2$. In terms of formula (3.9), this corresponds to the case $C_1 < C_2$. Values close to zero indicate that almost all genes deregulated at the higher concentration are already deregulated at the lower concentration, whereas values close to one indicate many additional genes deregulated at the higher concentration.
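As an illustration, the index can be computed for every pair of adjacent concentrations of one compound and time point. This sketch assumes the DE gene sets are supplied in order of increasing concentration; the function name, the toy data, and returning `None` when no gene is deregulated at the higher concentration are assumptions made here.

```python
# Hypothetical helper: progression profile index for adjacent concentrations.
def progression_profile_index(de_sets):
    """de_sets: sets of DE genes ordered by increasing concentration.

    For each adjacent pair (C1 < C2), return the proportion of genes
    deregulated at C2 that are not deregulated at C1, or None if no gene
    is deregulated at C2 (assumed convention).
    """
    indices = []
    for lower, higher in zip(de_sets, de_sets[1:]):
        if not higher:
            indices.append(None)
        else:
            indices.append(len(higher - lower) / len(higher))
    return indices


# Hypothetical series of three concentrations
series = [{"g1"}, {"g1", "g2"}, {"g1", "g2", "g3", "g4", "g5"}]
print(progression_profile_index(series))  # [0.5, 0.6]
```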

3.4.2 Progression profile error indicator

The progression profile error indicator is defined analogously to the progression profile index with the roles of the two concentrations reversed, namely as the ratio of the proportion of genes that are deregulated exclusively at the lower concentration and the proportion of genes that are deregulated in total at the lower concentration.

In terms of formula (3.9), this corresponds to the case $C_1 > C_2$. Values close to one indicate that a high fraction of genes is deregulated exclusively at the lower but not at the respective higher concentration. Values close to zero indicate the reverse case, i.e. almost all genes deregulated at the lower concentration remain deregulated at the higher concentration. Values above 0.5 are considered indicative of an implausible concentration progression of the respective compound.
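A short sketch of the error indicator together with the 0.5 cutoff. The function names and gene sets are hypothetical, and returning `None` when no gene is deregulated at the lower concentration is an assumed convention.

```python
# Hypothetical helpers: progression profile error indicator and 0.5 flag.
def progression_error_indicator(de_lower, de_higher):
    """Proportion of genes DE at the lower concentration that are not DE at
    the higher concentration, i.e. formula (3.9) with C1 > C2.
    Returns None if no gene is DE at the lower concentration (assumption)."""
    if not de_lower:
        return None
    return len(de_lower - de_higher) / len(de_lower)


def is_implausible(value, cutoff=0.5):
    """Flag an implausible concentration progression (value above cutoff)."""
    return value is not None and value > cutoff


# Hypothetical example: most genes DE at the lower concentration disappear
low = {"g1", "g2", "g3", "g4"}
high = {"g1", "g5"}
value = progression_error_indicator(low, high)
print(value, is_implausible(value))  # 0.75 True
```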

3.4.3 Modified progression profile error indicator

The modified progression profile error indicator is an adjustment of the progression profile error indicator and has been introduced for the case that only few genes are altered in total. As a certain number of false positive genes is to be expected, a tolerance limit, i.e. a minimum number of differentially expressed genes, should be reached before the respective genes are included in the calculation of the progression profile error indicator. Therefore, the number of genes deregulated in total is incorporated into the calculation of the index: if the value of the progression profile error indicator is larger than 0.5 and fewer than 20 genes are deregulated at the respective lower concentration, the value of the index is set to zero. The interpretation of the modified index is the same as for the progression profile error indicator.
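The rule above can be sketched as follows; the function name, the parameter names `min_genes` and `cutoff`, and the example gene sets are hypothetical, while the values 20 and 0.5 follow the text.

```python
# Hypothetical helper: modified progression profile error indicator.
def modified_error_indicator(de_lower, de_higher, min_genes=20, cutoff=0.5):
    """Error indicator that is set to zero if it exceeds `cutoff` while fewer
    than `min_genes` genes are deregulated at the lower concentration."""
    if not de_lower:
        return None  # undefined without DE genes (assumed convention)
    value = len(de_lower - de_higher) / len(de_lower)
    # With very few DE genes, a high value may be driven by false positives
    if value > cutoff and len(de_lower) < min_genes:
        return 0.0
    return value


# Hypothetical examples
few = {f"g{i}" for i in range(5)}  # 5 DE genes, none at the higher conc.
print(modified_error_indicator(few, set()))  # 0.0

many_low = {f"g{i}" for i in range(40)}
many_high = {f"g{i}" for i in range(10)}
print(modified_error_indicator(many_low, many_high))  # 0.75
```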

3.4.4 Selection value

To systematically analyze stereotypic versus compound-specific gene expression responses, the selection value principle has been introduced in Grinberg et al. (2014). A stereotypic response means that an expression alteration is induced by many compounds, while a specific expression response is induced by individual compounds or small numbers of compounds. For a gene, the selection value determines the number of compounds that induce a change in its expression. Compounds are ranked gene-wise by the magnitude of their fold changes: for upregulated genes, compounds are ranked from high to low fold changes, and for downregulated genes from low to high values. The selection value x of a gene (Sv x) refers to the compound at rank x; a critical change at this rank indicates that the gene is induced by at least x compounds. The threshold for the critical change is pre-specified. In case of small replicate numbers, higher thresholds are recommended to keep the number of false positive genes as low as possible, since the probability of false positive alerts decreases with increasing fold change. The higher the selection value, the less compound-specific the response. For a given fold-change threshold, the so-called Sv 20 genes are those genes that respond to at least 20 compounds, reflecting a stereotypic response. By contrast, a compound-specific response is specified here by Sv 3 genes, i.e. genes that are deregulated by at least three compounds. Note that genes of higher selection values are always contained in the genes of lower selection values, i.e. Sv 20 genes are a subset of Sv 3 genes.
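A minimal sketch of the ranking step for a single gene, assuming that the Sv x value of a gene is the fold change of the compound at rank x, so that the gene is an Sv x gene when this value reaches the pre-specified threshold (equivalently, when at least x compounds reach it). The use of log2 fold changes (a threshold of 1.0 then corresponds to a twofold change), the function names, and the data are assumptions for illustration.

```python
# Hypothetical helpers: selection value (Sv x) of a single gene.
def selection_value(fold_changes, x, direction="up"):
    """Return the fold change of the compound at rank x for one gene.

    fold_changes: dict compound -> log2 fold change (log2 scale is assumed).
    Upregulation: rank from high to low; downregulation: from low to high.
    """
    if not 1 <= x <= len(fold_changes):
        raise ValueError("x must lie between 1 and the number of compounds")
    ranked = sorted(fold_changes.values(), reverse=(direction == "up"))
    return ranked[x - 1]


def is_sv_gene(fold_changes, x, threshold, direction="up"):
    """True if at least x compounds reach the pre-specified threshold."""
    value = selection_value(fold_changes, x, direction)
    return value >= threshold if direction == "up" else value <= -threshold


# Hypothetical log2 fold changes of one gene under five compounds
fc = {"A": 2.1, "B": 1.4, "C": 0.3, "D": 1.1, "E": -0.2}
print(selection_value(fc, 3), is_sv_gene(fc, 3, 1.0), is_sv_gene(fc, 4, 1.0))
# 1.1 True False
```

Because the ranking is monotone, an Sv 20 gene automatically passes the Sv 3 check at the same threshold, which mirrors the subset relation described above.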

Based on the selection value concept, a consensus Sv x signature comprises the union of the Sv x gene lists of all individual test conditions. That is, the consensus Sv x list includes all genes that are Sv x genes in at least one of the tested conditions. Consensus genes are often used for the comparison of different model organisms, test systems, or data sets.
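Forming the consensus list is a set union; the sketch below assumes the Sv x gene lists are stored per test condition in a dictionary. The function name, condition labels, and gene identifiers are hypothetical.

```python
# Hypothetical helper: consensus Sv x signature as the union of gene lists.
def consensus_signature(sv_lists):
    """sv_lists: dict test condition -> iterable of Sv x genes."""
    consensus = set()
    for genes in sv_lists.values():
        consensus.update(genes)
    return consensus


# Hypothetical Sv 3 gene lists of three test conditions
sv3_lists = {
    "compound_A": ["g1", "g2"],
    "compound_B": ["g2", "g3"],
    "compound_C": [],
}
print(sorted(consensus_signature(sv3_lists)))  # ['g1', 'g2', 'g3']
```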

3.4.5 Overlap ratio

The overlap ratio addresses the question of whether the overlap of genes between two test conditions, condition 1 and condition 2, corresponds to a randomly expected result. The ratio quantifies to which degree genes in the overlap are overrepresented: a value of 1.0 indicates a random overlap, and values higher than 1.0 indicate an overlap larger than expected by chance under independence. A ratio of 2.0, for example, indicates that twice as many genes are in the overlap as randomly expected. The overlap ratio is defined as follows:

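$$\text{Overlap ratio} = \frac{O \cdot n_{\text{Gene universe}}}{n_{\text{Condition 1}} \cdot n_{\text{Condition 2}}},$$

where $n_{\text{Gene universe}}$ represents the total number of genes on the array (array ≙ sample), $n_{\text{Condition 1}}$ represents the total number of genes that are altered under the influence of test condition 1, $n_{\text{Condition 2}}$ indicates the total number of genes differentially expressed under test condition 2, and $O$ represents the number of genes in the overlap. Since $n_{\text{Condition 1}} \cdot n_{\text{Condition 2}} / n_{\text{Gene universe}}$ is the overlap expected by chance under independence, the ratio compares the observed with the expected overlap. Significance of the overrepresentation is assessed by Fisher's exact test. The basic idea of the overlap ratio was first presented in Shinde et al. (2017).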
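The overlap ratio and its significance can be sketched with Python's standard library alone. Here the one-sided Fisher's exact test is evaluated as the upper tail of the hypergeometric distribution, P(X ≥ O), which is the same quantity; with SciPy available, `scipy.stats.fisher_exact(..., alternative="greater")` on the corresponding 2 × 2 table should give the same p-value. The function names and the numbers in the example are hypothetical.

```python
from math import comb


# Hypothetical helpers: overlap ratio and one-sided overrepresentation test.
def overlap_ratio(overlap, n_universe, n_cond1, n_cond2):
    """Observed overlap divided by the overlap expected under independence."""
    return overlap * n_universe / (n_cond1 * n_cond2)


def overrepresentation_p(overlap, n_universe, n_cond1, n_cond2):
    """One-sided Fisher's exact test p-value, P(X >= overlap), where X is
    hypergeometric: n_cond2 genes drawn from n_universe, n_cond1 of them
    altered under condition 1."""
    upper = min(n_cond1, n_cond2)
    total = comb(n_universe, n_cond2)
    tail = sum(
        comb(n_cond1, k) * comb(n_universe - n_cond1, n_cond2 - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total  # exact integer arithmetic, then one division


# Hypothetical numbers: 20 shared genes where 10 are expected by chance
ratio = overlap_ratio(20, 20000, 500, 400)
p = overrepresentation_p(20, 20000, 500, 400)
print(ratio)  # 2.0
print(f"p = {p:.2g}")
```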

In document Statistical analysis of concentration-dependent high-dimensional gene expression data (Page 33-36)