1.1 Experimental prediction approaches
1.1.6 Microarrays
Once the whole-genome sequences of the major organisms had been completed, determining the function and structure of the genes became a long-term task known as “functional genomics”. An overview of global gene expression patterns in normal and disease conditions enables researchers to understand the spatio-temporal interactions and regulation of genes. Microarray technology led the transition from studies of the individual biological functions of a few related genes, proteins or pathways towards more global investigations of cellular activity. Microarray technology began in 1989 and was brought to the attention of the wider scientific community by a publication from Schena and colleagues that made researchers aware of the potential of array technology (Schena et al., 1995). They described the high capacity of cDNA microarray technology to monitor the expression of 45 Arabidopsis genes in parallel. This represented a major advance over Northern blotting, which reports the expression level of only one gene at a time. Since then, the use of microarray technologies has been reported for multiple organisms, including yeast (Lashkari et al., 1997), Drosophila (White et al., 1999) and human.
1.1.6.1 Types of microarray
Microarray technology varies in terms of manufacturing method and detection method. In terms of manufacturing method, there are two types of array: spotted arrays and oligonucleotide arrays. In spotted microarrays, the probes are oligonucleotides (oligos), cDNA or small fragments of PCR products that correspond to the genes of interest and are “spotted” onto a glass slide.
The term oligonucleotide microarray typically refers to the specific manufacturing techniques used by companies such as Agilent, whose oligos are longer (60-mer probes), and Affymetrix, whose oligos are shorter (25-mer probes). In both cases, the oligos are synthetic in origin rather than derived from DNA clones.
There are two detection methods: one-colour and two-colour microarrays. In one-colour microarrays, a single sample is processed, labelled (for example, with a fluorescent dye) and applied to a microarray, such as those available from Affymetrix. In two-colour microarrays, the two samples to be compared are labelled with different fluorophores and applied to the same microarray. The relative intensities of the two fluorophores are used in ratio-based analysis to identify up-regulated and down-regulated genes. Because both samples share the same background, background effects are reduced and the sensitivity of detection is increased (Tang et al., 2007).
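The ratio-based analysis described above can be sketched in a few lines of Python. This is a minimal illustration rather than a published pipeline: the function names, the intensity values and the twofold cut-off (|log2 ratio| >= 1) are assumptions chosen for the example, and a real analysis would first background-correct and normalize the two channels.

```python
import math

def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference) for one spot on a two-colour array."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)

def classify_spot(test_intensity, reference_intensity, threshold=1.0):
    """Label a spot using an assumed twofold cut-off (|log2 ratio| >= 1)."""
    ratio = log2_ratio(test_intensity, reference_intensity)
    if ratio >= threshold:
        return "up-regulated"
    if ratio <= -threshold:
        return "down-regulated"
    return "unchanged"

# Hypothetical intensities: (test channel, reference channel) per gene.
spots = {
    "geneA": (4000.0, 1000.0),
    "geneB": (500.0, 2000.0),
    "geneC": (1200.0, 1000.0),
}
for gene, (test, ref) in spots.items():
    print(gene, round(log2_ratio(test, ref), 2), classify_spot(test, ref))
```

Working on the log2 scale makes up- and down-regulation symmetric: a fourfold increase and a fourfold decrease give +2 and -2 respectively.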
Affymetrix microarrays
Affymetrix (www.affymetrix.com) is a company based in the United States that manufactures DNA microarrays (also called GeneChips). The company manufactures different types of array for different organisms: expression arrays, exon arrays, tiling arrays and miRNA arrays. The company now also designs chip technology aimed at clinical diagnosis.
Three-prime expression microarrays
The expression arrays are the first generation of Affymetrix microarrays. The probes are designed to be complementary to target sequences at the 3’ UTR of annotated genes, predicted genes and ESTs, which are called consensus sequences in Affymetrix parlance (Cui and Loraine, 2009). Each gene is represented by multiple probe pairs (together known as a probe set), which are used to measure the level of transcription of each ORF sequence represented on the GeneChip. Each probe pair consists of a perfect-match (PM) and a mismatch (MM) 25-mer oligo selected from the target sequence. The PM probe is a 25-base sequence complementary to the target gene, whilst the MM probe is identical to the PM probe except for a single mismatch at the 13th base. The sequences on the expression arrays are believed to recognize unique regions at the 3’ end of the gene. Figure 1-3 illustrates the GeneChip design method.
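The PM/MM relationship can be made concrete with a short sketch. It assumes that the mismatch is created by replacing the 13th base with its complement, which is the usual description of the design but is not stated explicitly here; the function name and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm_probe, position=13):
    """Return the MM probe: the PM probe with one base (1-based position) complemented."""
    if len(pm_probe) != 25:
        raise ValueError("expected a 25-mer probe")
    index = position - 1  # convert the 1-based position to a 0-based index
    swapped = COMPLEMENT[pm_probe[index]]
    return pm_probe[:index] + swapped + pm_probe[index + 1:]

pm = "ATGCGTACCTGAGTCAAGGCTTACG"  # made-up 25-mer, not a real probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two sequences differ only at the central base, so a target that binds the PM probe specifically should bind the MM probe more weakly.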
Figure 1-3 A schematic of an Affymetrix probe set.
Each gene is represented by multiple probe pairs. Each probe pair consists of a perfect-match (PM) and a mismatch (MM) probe. The PM probe is a 25-base sequence complementary to the target gene, whilst the MM probe is identical to the PM probe except for a single mismatch at the 13th base. Picture taken from www.vsni.co.uk/software/genstat/htmlhelp/marray/AffymetrixChips.htm
The Drosophila Genome 2.0 Array was designed with sequence and annotation from FlyBase Drosophila genome draft version 3.1, the Berkeley Drosophila Genome Project (BDGP) and additional public content from the Drosophila community. The array contains 18,880 probe sets covering over 18,500 transcripts. Fourteen pairs of oligonucleotide probes are used to measure the level of transcription of each ORF sequence represented on the GeneChip Drosophila Genome 2.0 Array.
Tiling Microarrays
Tiling arrays are designed with probes tiled across the whole target genome. On some arrays the probes partially overlap, as on the S. cerevisiae Tiling 1.0R Array, whilst other arrays have non-overlapping probes, as on the Drosophila Tiling 2.0R Array (see also Section 3.2.1.2). Tiling arrays can be used for a range of applications including genome mapping, novel gene discovery, DNA-protein interaction (ChIP-chip) and DNA methylation studies. The different types of microarray design are compared in Figure 1-4: the probes of 3’ expression arrays lie at the 3’ end of the genes, exon array probes are designed within each known exon of the genes, and tiling array probes are tiled across the whole genome.
Figure 1-4 Diagram of different types of Affymetrix microarrays.
The picture shows the design strategy of the different array types: 3’ expression array probes lie at the 3’ end, exon array probes in the major exons, and tiling array probes across the genome. Picture adapted from the Affymetrix website (www.affymetrix.com).
1.1.6.2 Microarray and transcriptional profiling
Transcriptional profiling, the main application of microarrays, can measure gene expression patterns, gene structure and gene function at the whole-genome level. The whole-genome expression array is designed for this purpose. The first whole-genome microarray was employed for yeast in 1997; the arrays contained up to 2,479 yeast open reading frames (ORFs). The results of three experiments showed that many genes were differentially expressed under the three environmental conditions (Lashkari et al., 1997). Transcriptional profiling can be used in disease diagnosis and in the analysis of gene expression in cancer, including disease pathology, progression, resistance to treatment and response to cellular microenvironments, and may ultimately lead to improved early diagnosis and innovative therapeutic approaches for cancer (DeRisi et al., 1996). Expression analysis using microarrays has also been applied in toxicological research to define how the regulation and expression of genes mediate the toxicological effects associated with exposure to a chemical (Bartosiewicz et al., 2001a; Bartosiewicz et al., 2001b). For Drosophila, the Dow/Davies lab created the FlyAtlas website (www.flyatlas.org). This website helps researchers all over the world to design experiments that examine gene expression patterns in specific tissues (Chintapalli et al., 2007; Wang et al., 2004).
1.1.6.3 Microarray and genotyping
Another main application of microarrays is comparative genomic analysis. The use of microarray technology for genotyping is more advanced than its use for transcript profiling. The single nucleotide polymorphism (SNP) is the most frequent type of variation in the genome, and a SNP array is a useful tool for studying slight variations between whole genomes. Specific uses of this technology include determining individual genome information (Redon et al., 2006), determining disease susceptibility (Botstein and Risch, 2003) and measuring the efficacy of drug therapies (Martinelli et al., 2009). SNPs can also be used to study genetic abnormalities in cancer (Bacolod et al., 2009).
1.1.6.4 Microarray and novel gene discovery
Traditional molecular approaches to identifying genes, including cloning and sequencing large collections of cDNAs (ESTs), have succeeded in identifying tens of thousands of genes, but they eventually reach a point of greatly diminished returns. Transcripts that are of low abundance, or that are expressed only in rare cell types or in response to specific stimuli, may never be identified by these methods (Mockler et al., 2005). Microarrays can be used to solve some of these problems: expression arrays allow confirmation of predicted gene models, and tiling arrays serve as a tool for novel gene discovery. Tiling arrays have probes tiled across the whole genome, covering essentially all nonrepetitive regions, and so enable the discovery of novel genes or novel alternative splicing. Human tiling arrays have been used to interrogate chromosomes 21 and 22 with 25-mer probes spaced on average every 35 bp. These studies used human cell lines and tissue samples. The results indicated that a much larger portion of the human genome is transcribed than was previously predicted, and also revealed the activity of novel noncoding genes in the human genome (Cawley et al., 2004; Kampa et al., 2004; Kapranov et al., 2002).
Drosophila tiling arrays have used 25-mer oligonucleotide probes spaced evenly across the Drosophila genome at intervals of approximately 35 base pairs. Studies using tiling arrays of the Drosophila genome show that 85% of the fly genome is transcribed and processed into mature transcripts representing 30% of the fly genome, and that 30% of detected embryonic transcription is unannotated (Manak et al., 2006). Tiling array studies of 25 Drosophila cell lines also revealed more than one thousand novel transcribed regions (Cherbas et al., 2011). Drosophila tiling arrays will be discussed further in Chapter 3. Custom exon arrays can also be used as a gene discovery tool for detecting novel splice junctions, which can subsequently be used to find novel genes.
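The spacing figures quoted above imply a simple layout: 25-mer probes whose start positions are 35 bp apart leave about 10 bp uncovered between consecutive probes. The sketch below generates such a layout for a toy sequence. It assumes perfectly even spacing and ignores the repeat masking and probe-quality filtering that real designs apply; the function name and the toy sequence are illustrative only.

```python
def tile_probes(sequence, probe_length=25, step=35):
    """Yield (start, probe) pairs for probes placed every `step` bases (0-based starts)."""
    last_start = len(sequence) - probe_length
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + probe_length]

toy_genome = "ACGT" * 30  # 120 bp stand-in for a genomic region
probes = list(tile_probes(toy_genome))
for start, probe in probes:
    print(start, probe)

# With step > probe_length the probes do not overlap;
# the uncovered gap between neighbours is step - probe_length.
gap = 35 - 25
print("gap between probes:", gap, "bp")
```

Choosing a step smaller than the probe length (for example, `step=10`) gives a partially overlapping design instead.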
1.1.6.5 Genomic DNA mask for probe selection method
Although oligonucleotide arrays are a powerful and widely used tool for large-scale gene expression profiling, most commercial arrays (such as Affymetrix arrays) are only available for model species. For example, Drosophila expression arrays are available for Drosophila melanogaster but not for other Drosophila species. Hammond and colleagues developed a genomic DNA-based probe selection strategy that improves the sensitivity of high-density oligonucleotide arrays when they are applied to heterologous species. Genomic DNA from the species of interest is cross-hybridized to an array designed for an available species, and the probes whose sequences differ between the two species are masked out, which improves the sensitivity of gene expression detection. This is a potential gene discovery method for non-model species (Davey et al., 2009; Hammond et al., 2005).
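A masking step of this kind can be sketched as a simple filter. The sketch assumes that each probe has a hybridization intensity measured with genomic DNA from the heterologous species, that probes below a chosen intensity threshold are masked, and that a probe set is kept if at least one of its probes survives. The threshold value, the data layout and the function name are hypothetical and are not taken from the cited papers.

```python
def build_probe_mask(gdna_intensity, threshold):
    """Keep probes whose genomic-DNA hybridization signal reaches the threshold.

    gdna_intensity maps probe-set ID -> list of per-probe intensities.
    Returns probe-set ID -> list of retained probe indices; probe sets with
    no retained probe are dropped entirely.
    """
    mask = {}
    for probe_set, intensities in gdna_intensity.items():
        kept = [i for i, value in enumerate(intensities) if value >= threshold]
        if kept:
            mask[probe_set] = kept
    return mask

# Hypothetical gDNA hybridization intensities for three probe sets.
gdna = {
    "set_1": [820.0, 95.0, 640.0, 30.0],
    "set_2": [40.0, 55.0, 20.0, 10.0],
    "set_3": [300.0, 310.0, 90.0, 500.0],
}
mask = build_probe_mask(gdna, threshold=100.0)
print(mask)
```

Raising the threshold keeps fewer but better-matched probes, so in practice the threshold is a trade-off between the number of probe sets retained and the reliability of each one.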
1.1.6.6 Microarray data analysis
Microarray data analysis is the most difficult challenge in microarray
and even from operator to operator. The issue is how to normalize the results so that they are comparable, which makes normalization the key step. Normalization is a process that adjusts microarray data for effects that arise from variation in the technology rather than from biological differences. A variety of normalization schemes are in use, including total-intensity, ratio-based, and both linear and nonlinear regression techniques (Quackenbush, 2001, 2002).
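As an illustration of the simplest of these schemes, total-intensity normalization rescales each array so that its summed intensity matches a common target. The sketch below uses the mean of the array totals as that target, which is one reasonable choice among several; the function name and the values are made up for the example.

```python
def total_intensity_normalize(arrays):
    """Scale each array (a list of intensities) so that all arrays share the same total."""
    totals = [sum(a) for a in arrays]
    target = sum(totals) / len(totals)  # assumed target: mean of the array totals
    return [
        [value * target / total for value in a]
        for a, total in zip(arrays, totals)
    ]

# Two hypothetical arrays; the second is twice as bright overall.
raw = [[100.0, 200.0, 700.0], [200.0, 400.0, 1400.0]]
normalized = total_intensity_normalize(raw)
print(normalized)
```

This scheme corrects only a global brightness difference between arrays; it does not address intensity-dependent effects, which is why regression-based methods are also used.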
RMA (Robust Multiarray Average) and GC-RMA are very popular normalization methods for microarrays, especially Affymetrix microarrays (Irizarry et al., 2003). See also Chapter 2, Section 2.7.4.
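RMA combines background correction, quantile normalization and a robust summarization of each probe set. The quantile normalization step alone can be sketched as follows. This is a simplified pure-Python illustration that assumes equal-length arrays with no tied values; it is not the Bioconductor implementation, and the function name is illustrative.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the rank-wise mean of the sorted arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)]
    result = []
    for a in arrays:
        # order[rank] is the index of the probe holding that rank in this array.
        order = sorted(range(n), key=lambda i: a[i])
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = reference[rank]
        result.append(normalized)
    return result

raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(raw))
```

After this step each array contains exactly the same set of values, and only the assignment of those values to probes differs between arrays.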
It is important to deposit microarray data in a format that can be used by others, in repositories such as NCBI GEO and ArrayExpress. Minimum Information About a Microarray Experiment (MIAME) was the first successful submission standard for microarray data, bringing at least some basic standardization to microarray-based assays (Brazma et al., 2001). This standard information makes microarray data more useful and comparable.
There is no single definitive method for data analysis, but commercial software and custom pipelines are applied to the analysis of microarray data, such as Partek (Downey, 2006), GeneSpring, and Bioconductor, a major collection of packages written in the R statistical language.
1.1.6.7 Advantages and disadvantages of microarrays
The microarray was the first technology to allow a global view of gene expression patterns across the genome. It allows comparative genome analysis to find SNPs, copy number variation, novel genes and alternative splicing. The disadvantage is that microarray technology measures gene expression using hybridization values rather than digital counts. As a result, microarrays do not generate absolute gene expression values, and the hybridization values are subject to background noise. Microarrays require
prior knowledge of the genome and do not support de novo sequences. For novel
interfere with the identification of gene boundaries so that the information necessary for novel genes is subject to error. Normalization methods are rather difficult to apply to ensure that the microarray analyses can be compared to each other.