4. Preliminary study of sex-biased gene expression using microarrays
4.2.3 Statistical analysis
For the MFU arrays, I analyzed the data with a Microsoft Excel template designed by John Parsch. The template offered a quick and easy way to analyze MFU array experiments, but it was inflexible: the data had to have exactly the same layout in every experiment. Background correction was performed by subtracting the mean background intensity from the mean foreground intensity for each channel at each spot. The mean of the eight replicate spots per gene was then used as the raw intensity value. The intensities were not log-transformed before analysis.
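The spot-level summary described above can be sketched in Python as follows. This is a minimal illustration, not a reproduction of the Excel template; the input layout (a list of foreground/background mean pairs, one per replicate spot) and the example intensities are hypothetical.

```python
from statistics import mean

def raw_intensity(spots):
    """Background-corrected raw intensity for one gene in one channel.

    `spots` is a list of (foreground_mean, background_mean) pairs, one per
    replicate spot (eight per gene on the MFU arrays). Each spot is corrected
    by subtraction and the corrected values are averaged; no log transform.
    """
    corrected = [fg - bg for fg, bg in spots]
    return mean(corrected)

# Hypothetical intensities for the eight replicate spots of one gene (Cy5 channel)
cy5_spots = [(1200, 200), (1100, 150), (1300, 250), (1250, 200),
             (1150, 180), (1220, 210), (1180, 190), (1240, 220)]
print(raw_intensity(cy5_spots))  # 1005
```

Because only subtraction and averaging are involved, the result is the same whether spots are corrected first and then averaged, or the foreground and background means are averaged first and then subtracted.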
To normalize the intensities of the two channels (Cy3 = green, Cy5 = red), an adjustor was calculated from the ratio of the two channels at the control spots (male genomic DNA and female genomic DNA); the raw Cy5 intensity was then multiplied by this adjustor so that the average signal intensities of the two channels were equal. For each slide, genes were assigned to sex-bias categories using a fold-change cutoff, the result of a paired t-test, or both. For example, a gene with a male/female ratio greater than 2 and a P-value less than 0.05 in the paired t-test was classified as male-biased. A nested ANOVA, which is similar to the paired t-test, was also performed to classify genes into sex-bias categories. Treating the sex-bias classifications from previous studies (Ranz et al. 2003; Parisi et al. 2003; Gibson et al. 2004) as correct, I could estimate the frequency of type I and type II errors in my results.
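A minimal Python sketch of the normalization and classification logic follows. It assumes the adjustor is the ratio of mean Cy3 to mean Cy5 intensity at the genomic-DNA control spots, and that the paired t-test P-value is computed elsewhere (for example with SciPy's `scipy.stats.ttest_rel`). The symmetric female-biased rule (ratio below 1/2), the channel-to-sex assignment, and the exact error definitions are assumptions for illustration; all function names and numbers are hypothetical.

```python
from statistics import mean

def cy5_adjustor(control_cy3, control_cy5):
    """Factor that makes the mean Cy5 signal of the genomic-DNA control spots
    equal the mean Cy3 signal (assumed form: mean Cy3 / mean Cy5)."""
    return mean(control_cy3) / mean(control_cy5)

def classify(male, female, p_value, fold=2.0, alpha=0.05):
    """Sex-bias call from normalized intensities and a paired t-test P-value
    computed elsewhere. The female-biased rule mirrors the male-biased one
    (assumption)."""
    ratio = male / female
    if p_value < alpha and ratio > fold:
        return "male"
    if p_value < alpha and ratio < 1 / fold:
        return "female"
    return "unbiased"

def error_counts(calls, reference):
    """Compare calls with a reference classification from earlier studies.
    Type I (false positive): sex-biased here, unbiased in the reference.
    Type II (false negative): unbiased here, sex-biased in the reference.
    Genes absent from the reference, or called in the opposite direction,
    are not counted in this sketch."""
    type1 = sum(1 for gene, call in calls.items()
                if call != "unbiased" and reference.get(gene) == "unbiased")
    type2 = sum(1 for gene, call in calls.items()
                if call == "unbiased" and reference.get(gene) in ("male", "female"))
    return type1, type2

# Hypothetical slide: male sample labeled with Cy5, female sample with Cy3
adj = cy5_adjustor([900, 1100], [450, 550])  # 2.0
male = 1500 * adj                            # adjusted Cy5 intensity
print(classify(male, 1000, p_value=0.01))    # male
print(error_counts({"g1": "male", "g2": "unbiased", "g3": "female"},
                   {"g1": "unbiased", "g2": "male", "g3": "female"}))  # (1, 1)
```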
A nested ANOVA was also performed over all four replicates (including dye swaps and biological replicates) to classify genes as male-biased, female-biased, or non-sex-biased in their expression. The numbers of false negatives (type II errors) and false positives (type I errors) were estimated in the same way as described above. Every gene could be classified into a sex-bias group by this analysis, although some classifications were not consistent across the different analysis methods. Gene expression in adult males and females was analyzed in this manner for two highly inbred D. melanogaster lines from Africa (ZB82 and ZB398, derived from Lake Kariba, Zimbabwe) and two highly inbred D. melanogaster lines from Europe (EU01 and EU20, derived from Leiden, the Netherlands).
To further investigate inconsistencies with previous studies (Ranz et al. 2003; Parisi et al. 2003; Gibson et al. 2004), the 91 genes were divided into two groups: genes with a high-quality signal, whose intensity in at least one channel was one standard deviation above the local background in at least one replicate, and genes with a low-quality signal, whose intensity in neither channel was one standard deviation above the local background in any replicate. To quantify signal quality, a signal score was defined as follows. Each spot was given a value of 1 if the intensity in either channel was one standard deviation above the local background, and 0 otherwise. Because there were eight replicate spots per gene, the maximum signal score per slide was eight, meaning that all eight replicate spots on that slide had a high-quality signal; a score of zero meant that all eight spots were of low quality. Finally, the signal score for each gene was calculated as the mean of its per-slide scores over the four replicate slides.
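The signal score can be computed as in the following sketch. It assumes each spot is described, per channel, by its foreground mean, local background mean, and local background standard deviation, and that "one standard deviation above the local background" means foreground > background + SD (whether the cutoff is inclusive is not stated). The data layout and values are hypothetical.

```python
from statistics import mean

def spot_score(spot):
    """1 if either channel is more than one SD above its local background,
    else 0. `spot` maps channel name -> (foreground, background, background_sd)."""
    return int(any(fg > bg + sd for fg, bg, sd in spot.values()))

def slide_score(spots):
    """Number of high-quality replicate spots on one slide (0 to 8 on the MFU arrays)."""
    return sum(spot_score(s) for s in spots)

def signal_score(slides):
    """Mean per-slide score over the replicate slides."""
    return mean([slide_score(spots) for spots in slides])

def signal_quality(slides):
    """'high' if any spot on any slide passes, otherwise 'low'."""
    return "high" if signal_score(slides) > 0 else "low"

# Hypothetical spots: one passing (Cy3 well above background), one failing
good = {"cy3": (500, 100, 50), "cy5": (120, 100, 50)}
bad = {"cy3": (120, 100, 50), "cy5": (130, 100, 50)}
slides = [[good] * 8, [good] * 4 + [bad] * 4, [bad] * 8, [good] * 2 + [bad] * 6]
print(signal_score(slides))    # 3.5
print(signal_quality(slides))  # high
```

Since a gene is "high quality" exactly when at least one spot on at least one slide passes, the two-group split is equivalent to asking whether the mean signal score is above zero.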
For the DGRC-1 arrays, the spot signal intensities were normalized using the CARMAweb server (https://carmaweb.genome.tugraz.at/carma/), which provides an intuitive graphical interface for the normalization and analysis of data from current microarray platforms. In general, microarray analysis can be divided into three steps: 1) preprocessing, 2) replicate handling, and 3) detection of differentially expressed genes. A schematic of how this is achieved in CARMAweb is shown in Figure 15. The preprocessing of two-color microarrays consists of three steps: background correction, within-array normalization, and between-array normalization. In my case, the options "subtract", "print tip loess", and "quantile method" were selected for these three steps, respectively. To exclude low-quality spots, only spots whose intensity in at least one of the channels was one standard deviation above the local background in at least one replicate slide were used. After normalization, the ratio of the fluorescence intensities of the two channels for each spot in each replicate was used as input for the Bayesian analysis software BAGEL (Townsend and Hartl 2002) to detect genes differentially expressed between adult males and females. Two highly inbred D. melanogaster lines from Africa (ZB82 and ZB398, derived from Lake Kariba, Zimbabwe), two highly inbred D. melanogaster lines from Europe (EU01 and EU20, derived from Leiden, the Netherlands), and one laboratory strain of D. simulans (s1, derived from Chapel Hill, NC; Meiklejohn et al. 2004) were used for this analysis.
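Of the preprocessing steps, between-array quantile normalization is the easiest to illustrate. The sketch below is a generic quantile normalization, not CARMAweb's code: it treats each channel of each slide as one column (an assumption about how the method is applied to two-color data), breaks ties by position rather than averaging them, and omits print-tip loess entirely. It then forms the per-spot male/female ratios of the kind passed to BAGEL; which channel carries the male sample is a hypothetical assignment.

```python
from statistics import mean

def quantile_normalize(columns):
    """Quantile-normalize equal-length intensity columns (one per channel per slide).

    Each value is replaced by the mean, across all columns, of the values that
    share its rank, so every column ends up with the same distribution.
    Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean intensity at each rank across all columns
    rank_means = [mean([col[r] for col in sorted_cols]) for r in range(n)]
    normalized = []
    for col in columns:
        # Spot indices ordered from lowest to highest intensity in this column
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

def spot_ratios(male, female):
    """Per-spot male/female intensity ratios."""
    return [m / f for m, f in zip(male, female)]

# Hypothetical background-subtracted intensities for three spots on one slide
male_cy5 = [5, 2, 3]
female_cy3 = [4, 1, 6]
male_norm, female_norm = quantile_normalize([male_cy5, female_cy3])
print(male_norm, female_norm)  # [5.5, 1.5, 3.5] [3.5, 1.5, 5.5]
print(spot_ratios(male_norm, female_norm))
```

After normalization both columns contain the same set of values; only their order across spots differs, so the ratios reflect relative ranks rather than raw channel brightness.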
Figure 15. The picture above shows how this analysis workflow is modeled in CARMAweb and how the different components can be linked together. The central part is the Data directory. All files listed there can be used as input for a new analysis step. After each analysis step the results can be returned to this Data directory, or further analyses can be performed directly on the result files from a previous step. (From https://carmaweb.genome.tugraz.at/carma/)