Microarray data acquisition, normalisation and analysis Hybridised slides were scanned with a confocal laser scanner

(ScanArray Lite, Microarray Analysis System, PerkinElmer Life Sciences, Beaconsfield, UK). The microarrays were first scanned at a laser output of 633 nm to detect Cy5 labelled cDNA first, as this dye is more sensitive to photo-bleaching. Scans were performed at 10 µm resolution and laser

intensities and PMT gain ranging from 60-90 % and then saved as *TIFF files. The process was then repeated with a laser output of 543 nm to detect Cy3 labelled cDNA. One scan for each dye type was selected for analysis. The criteria for selection included; the scans with the highest average signal intensity; no saturated features (spots) and a low background signal. The fluorescent signal intensities of both dyes for each spot were measured using Quantarray Microarray Analysis software (PerkinElmer Life Sciences). Features were located with a 20 x 20 spot-grid with 224.5 nm spacing

between spots, a 160 nm standard feature diameter, grid elasticity was set to 10 and feature elasticity was set to 50. The grid and feature elasticity is a feature of the Quantarray Microarray Analysis software, and allows automatic location of the sub-arrays (grids) and features which fall outside the expected location as dictated by the grid and feature spacing. Adaptive quantification, which allows the spots intensity to be measured from spots of different sizes, was performed with a maximum feature diameter of 190 nm with background measurements taken between 220 to 265 nm from the centre point (Figure 3.20.a).

Figure 3.20.a. Red area represents feature data area (variable 160-190 nm diameter), blue portion represents background area measured from 220-265 nm from the centre-point of the feature.

Data processing was performed using the web-accessible MicroArray Data Suites of Computed Analysis (MADSCAN) software (Le Meur et al., 2004). MADSCAN then follows a stepwise data processing method encompassing:

Filtration. Median background intensities (taken from the 220-265 nm reading) of replicate spots are subtracted from each of the corresponding foreground intensities (taken from the 160-190nm feature data area). Features which exhibited high diameter variance, signal below background level or signal above saturation level were removed from further analysis.

Normalisation. Genes showing the lowest variation in expression between the two conditions were selected by an iterative Rank Invariant Method algorithm (Tseng et al., 2001). This method ranks all of the spot intensities in the Cy3 and Cy5 channels and the spots where the difference in rank is less than 5 are deemed to be non-variant. The data for the remaining genes are subjected to an intensity dependent normalisation to the non- variant genes using the Lowess fitness algorithm (Dudoit et al., 2000). Lowess fitness stands for Locally weighted estimated and uses the

geometrical mean (A) and a constant (k) to minimize signal-dependent non- linear bias between the two intensity levels, Cy3 (G) and Cy5 (R). Madscan performs a within-print tip (local) normalization with a smooth parameter (defined as the fraction of data used to smooth at each data point) f = 0.40.

Scaling. After normalisation, scaling procedures were applied to bring the variances of filtered and normalised expression values among replicate spots to the same variation level.

Outlier detection. Due to the small number of replicates available in microarray experiments, the use of modified statistical tests are required to

experiments represent the genes showing significant changes in expression between the two conditions.

The median absolute deviation test is a measure of the spread of data, similar to standard deviation (Burke, 2001), and is expressed as;

MADamedian xi@xb L L L M M M T U 1 za 0.675 _x i@xm ` a MAD fffffffffffffffffffffffffffffffffffffffffffffff ₂ if ziL L M M≥3.5then x_iis an outlier. 3

The Grubb’s test specifically detects outliers in data. If the largest value in a data set is suspected of being an outlier then;

Tna

xn@xb

fffffffffffffffffff_, ₄

but if the smallest value is suspected then;

T_iax

b_@x1 σ

ffffffffffffffffff_, ₅

where xn is the largest value, x1 is the smallest value, x is the mean of

all the n values and s is the sample standard deviation of all n values. The

critical value for the test performed depends on the sample size n and the

selected significance level

Data integration. The replicated data points are summarised using mean and coefficient of variation values per slide and between replicated slides. This step consolidates the data sets and allows the comparison between them.

Hierarchical clustering with Cluster and Tree View software 1.6.6 (Eisen et al., 1998) was performed with the SSH array data to obtain a good visual data representation. These programs permit the clustering of genes

Fold change method (Draghici, 2003) was used for selection of differentially expressed genes. Using the fold change method, genes were considered up-regulated if the ratio between salinity stress and control normalised expression values was reproducibly (including dye-swap) higher than 1.5 (mRNA more abundant in fish encountering salinity stress) and under expressed if the ratio was lower than 0.6 (mRNA is less abundant in fish encountering salinity stress).

Statistical analysis was performed using Student's t-test to determine the significance of inter-individual differences between gene expression in fish in the FW to SW acclimation experiments.

In document Development of microarray techniques for the study of gene expression in the European eel (Anguilla anguilla) during silvering and migration to seawater (Page 88-93)