
2.1.1 Array Design

At present, the two prevalent types of DNA array, cDNA and oligonucleotide arrays (depicted in Figure 2.1), adopt different experimental platforms. The clone-based platform is used to produce cDNA arrays, while the oligonucleotide-based platform is used to create high-density oligonucleotide arrays. Both arrays exploit hybridisation; however, they differ in probe length and composition, the layout of sequences on the array, cross-hybridisation and hybridisation effects from an immobilised substrate (Mah et al., 2004), and the objectives of the studies they suit. For studies that focus on a specific subject area and need the abundance ratio of differentially expressed genes, such as genes relevant to particular metabolic pathways, the low-density cDNA array is required. For studies where little prior information on relevant genes is available, or where an unbiased overview of global changes in gene expression patterns is required, the high-density oligonucleotide array is the best option (Tomiuk and Hofmann, 2001). Table 2.2 on page 21 presents a comparison of cDNA and oligonucleotide arrays, along with their

Table 2.1: Resources for microarray experiments and microarray repositories.

Website | URL | Resources
National Center for Biotechnology Information (NCBI) | http://www.ncbi.nlm.nih.gov/ | GEO database, GenBank, analysis software, search browsers
European Bioinformatics Institute (EBI) | http://www.ebi.ac.uk/arrayexpress | GenBank, biological ontology, ArrayExpress database
National Human Genome Research Institute (NHGRI) | http://research.nhgri.nih.gov/ | cDNA microarray protocols, cDNA microarray repository, analysis software
Broad Institute, cancer genomic group | http://www.broad.mit.edu/ | analysis software, microarray repository & associated articles
Stanford University, genomic department | http://smd.stanford.edu/ | experiment protocols, SOURCE & AmiGO browsers, analysis software, cDNA microarray repository
Microarray Gene Expression Data (MGED) Society | http://www.mged.org/ | MIAME standard, gene ontologies, MAGE
GO Consortium | http://www.geneontology.org/ | gene ontology and participating laboratories

distinct advantages and disadvantages.

cDNA arrays contain cDNA fragments, generated by PCR amplification of cDNA clones, that are robotically spotted on the surface of a glass slide. The mRNA from two different biological samples is reverse-transcribed, labelled with different dye colours and hybridised to the spotted DNA sequences (Ebert and Golub, 2004). After hybridisation, a special scanner measures the fluorescence intensity of each differentially expressed gene on a fine grid of pixels and produces a digital image of the hybridised array. Normally, higher fluorescence indicates a higher expression value of the gene in the sample. The cDNA array is relatively simple to produce and is inexpensive for laboratories with access to robotic equipment; however, it needs careful attention to the chemistry that adheres the DNA to the glass (Ebert and Golub, 2004). A lack of standard procedure, owing to manufacturing errors and the improvised techniques used by individual research laboratories to produce high-quality cDNA arrays, has caused more problems than one might expect. For instance, the primary technical difficulty in microarray experiments is that the amount of each DNA probe robotically spotted varies between arrays. To control this inconsistency, sample RNA is often hybridised with a defined amount of reference RNA that is labelled with a different fluorescent

Figure 2.1: The microarray experiments: oligonucleotide versus cDNA arrays. In all microarray experiments, the RNA is extracted from the biological sample and is then amplified using PCR assays. For oligonucleotide microarrays, the probes are synthesised directly onto a solid surface and a single dye colour is used to read the gene expression in the sample. For cDNA microarrays, the PCR products from cDNA libraries are deposited onto a solid surface and two dye colours are used to read the gene expression in the samples.

dye (Ebert and Golub, 2004). However, this raises another technical concern: the amount of reference RNA in the hybridisation process depends on the amount of probe that is robotically spotted and on the manufacturer’s guideline. Additionally, cDNA probes often contain repetitive sequences; as a result, the process becomes intensive, especially when the experiment is conducted on a genome-wide scale, and cross-hybridisation becomes more problematic (Ebert and Golub, 2004). A significant advantage of the cDNA array is that it does not require prior sequence information, because it was initially designed for sequence modelling. Thus, it is an attractive alternative for model organisms whose genomes are not yet sequenced (Ebert and Golub, 2004). The disadvantages of the cDNA array are inconsistency in the procedures adopted by individual research laboratories and manufacturers, which results in variation in gene measurements; intensive computational cost for genome-wide experiments; and cross-hybridisation caused by repetitive sequences in the cDNA probes.
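The comparison of a sample against a differently labelled reference is commonly summarised, spot by spot, as the log ratio of the two channel intensities. The following Python sketch illustrates the idea under stated assumptions: the function name `log2_ratios`, the `floor` parameter and the intensity values are hypothetical and are not taken from the cited sources.

```python
import math


def log2_ratios(sample, reference, floor=1.0):
    """Return log2(sample / reference) for each spot of a two-dye array.

    `sample` and `reference` are lists of fluorescence intensities, one
    value per spot, in the same spot order. `floor` is an assumed lower
    bound applied to both channels so that zero or negative
    (e.g. background-corrected) intensities do not break the logarithm.
    """
    if len(sample) != len(reference):
        raise ValueError("both channels must cover the same spots")
    ratios = []
    for s, r in zip(sample, reference):
        s = max(s, floor)
        r = max(r, floor)
        ratios.append(math.log2(s / r))
    return ratios


# Hypothetical intensities for four spots (illustrative values only).
sample_channel = [800.0, 200.0, 400.0, 0.0]
reference_channel = [200.0, 800.0, 400.0, 50.0]

ratios = log2_ratios(sample_channel, reference_channel)
for spot, value in enumerate(ratios):
    print(f"spot {spot}: log2 ratio = {value:.2f}")
```

A positive value means the spot is brighter in the sample channel than in the reference channel, a negative value the opposite, and zero means equal intensity. Because each spot is expressed relative to the reference hybridised to the same spot, differences in the amount of probe deposited affect both channels alike.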

Compared with cDNA arrays, the high-density oligonucleotide array is an active area of technological development (Parmigiani et al., 2003). The most widely used oligonucleotide arrays are manufactured by Affymetrix (Santa Clara, CA), which uses an in-situ photolithographic synthesis technique to produce oligonucleotides

onto the array chips. Therefore, oligonucleotide arrays, also referred to as Affymetrix arrays, are less problematic than cDNA arrays. Oligonucleotide arrays contain short oligonucleotide probes, between 25 and 60 mers (base-pairs) in length, that are either synthesised in-situ or robotically spotted on the surface of the glass slide. In an oligonucleotide array experiment, sample RNA is prepared, labelled with a dye and hybridised to an array, which is then scanned into a digital image to obtain a fluorescence intensity value for each probe. Unlike cDNA arrays, oligonucleotide arrays use a single-channel labelling system, i.e. a single dye colour, in hybridisation, and each oligo probe contains a unique oligonucleotide sequence (Ebert and Golub, 2004), which eases the hybridisation process. Because the probes in oligonucleotide arrays are short, hybridisation specificity is more easily controlled than in cDNA arrays.

The literature (Mah et al., 2004; Asyali et al., 2006) shows that, although cDNA and oligonucleotide arrays correlate poorly at the level of the DNA probes, both arrays display similar characteristics in the data, even though combining the results of the two arrays has not been possible. Hence, studies that combine results from multiple similar arrays have gained close attention in the bioinformatics field. With the robust and reproducible gene expression data that can be generated on multiple similar arrays, the technical aspects of array design have become less critical (Ebert and Golub, 2004). However, the degree of similarity of the DNA probe sets and the expression level of the corresponding transcript in the experiments still play important roles in reproducibility, and this issue remains a topic of intensive research (Asyali et al., 2006).
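One common way to quantify agreement between two platforms is the Pearson correlation of measurements for genes matched across the arrays. The sketch below is a minimal illustration in Python and is not the procedure used in the cited studies; the function name `pearson` and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression measures for five genes matched across platforms
# (illustrative values only, one entry per gene in the same order).
cdna_values = [1.2, -0.8, 0.3, 2.1, -1.5]
oligo_values = [0.9, -0.2, 0.8, 1.1, -0.4]

r = pearson(cdna_values, oligo_values)
print(f"cross-platform Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the two platforms rank and scale the matched genes similarly, while a value near 0 would indicate poor agreement. The result depends on how genes are matched between probe sets, which is itself the difficulty the paragraph above describes.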

In document Genetic algorithm-neural network: feature extraction for bioinformatics data. (Page 37-40)