We cannot predict the collective behavior of a complex system from the behavior of its components alone. One approach to understanding biological systems, especially cellular systems, is to identify and map the interactions between the components of a cellular network and to analyze them with the theoretical tools of network science. Human cells have about 20,000 protein-coding genes, which encode an estimated 6.13 million proteins (including modified protein variants that result from alternative splicing, single amino acid polymorphisms, and posttranslational modifications) [207]. Experiments on single molecules are inefficient for untangling the web of interactions among these cellular components. Microarray technology provides a precise, high-throughput technique for identifying the biochemical activities of millions of biomolecules and the interactions among them in a single experiment (e.g., up to 10⁵ interactions per cm² of a microarray [167]). Microarrays can help study gene expression patterns, antibody-antigen interactions, protein-DNA interactions, and many other aspects of cellular phenotypes [212]. In Chapter 4, I will use data produced by protein binding microarrays to construct genotype networks and to simulate populations on such networks. Here, I briefly describe the fundamentals of microarray technology and some of its prominent applications.
Before explaining microarray technology, I provide some definitions that will be useful for describing the technology. A library of biomolecules is a set of molecules that differ from one another in a well-defined way. For example, a DNA library may contain all possible DNA sequences of length 10 or only a subset of such sequences. There are different classes of libraries that differ in their spatial organization. The simplest library is a mixture of randomly generated sequences [56], which can be used to find aptamers (short DNA or peptide molecules that bind to specific biomolecules) [120]. A second class comprises libraries whose elements are spatially separated through binding to different microscopic beads [78]. In such libraries, one does not have a priori knowledge about what sequence is bound to each bead, and this needs to be determined with further experiments. Finally, there are libraries whose elements are arrayed on a supporting surface such as a plastic or glass microscope slide, a film, or a semiconductor chip [63, 231]. In such an array, we know the exact location of each element of the library.
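To make the notion of a complete library concrete, the following minimal Python sketch enumerates all DNA sequences of a given length. The helper name `dna_library` is hypothetical and introduced here only for illustration; it does not correspond to any microarray software.

```python
from itertools import product

BASES = "ACGT"

def dna_library(length):
    """Yield every DNA sequence of the given length (hypothetical helper)."""
    for letters in product(BASES, repeat=length):
        yield "".join(letters)

# A complete library of length 3 has 4**3 = 64 members.
library = list(dna_library(3))
print(len(library))   # 64
print(library[:5])    # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```

By the same counting, a complete library of sequences of length 10 contains 4^10 = 1,048,576 members, which illustrates why arrayed libraries need high-density surfaces.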
Once an array is ready, analytes can be applied to it. A scanner detects and records any interaction between the analytes and the array elements. Detection of interactions, which needs to be ultrasensitive to detect single-molecule reactions, can be mediated by labeling [65, 91] or by other methods, such as electrochemical methods [60, 272]. The large amounts of data produced by microarray experiments require the use of bioinformatics methods.
The most widely used types of microarrays are DNA microarrays [222]. They are used to detect gene expression patterns and to sequence mutations on a large scale. DNA microarrays consist of DNA oligonucleotides attached to a supporting surface. The sample to be analyzed can be DNA or RNA (converted to cDNA) that can hybridize with the array elements. Only highly complementary sequences remain attached to the array after a washing step. Using microarrays, one can determine the expression levels of thousands of genes. Other applications include, but are not limited to, comparative genomic hybridization, chromatin immunoprecipitation on chip, single nucleotide polymorphism detection, and alternative splicing detection.
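The role of complementarity in hybridization can be illustrated with a toy Python sketch. It is a deliberate simplification: real hybridization depends on thermodynamics, sequence length, and washing conditions, none of which are modeled here. The function names and the 0.9 threshold are assumptions made for illustration only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def complementarity(probe, target):
    """Fraction of positions at which target pairs with probe (equal lengths)."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    expected = reverse_complement(probe)
    matches = sum(a == b for a, b in zip(expected, target))
    return matches / len(probe)

def survives_wash(probe, target, threshold=0.9):
    """Toy rule (assumption): a target stays bound if it is highly complementary."""
    return complementarity(probe, target) >= threshold

probe = "AACGTC"
print(reverse_complement(probe))        # GACGTT
print(survives_wash(probe, "GACGTT"))   # True: perfect complement
print(survives_wash(probe, "GACGAT"))   # False: one mismatch in six positions
```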
DNA microarrays provide only limited information about protein abundance and function. One reason is that protein expression levels do not always correlate with mRNA levels [89, 159]. Protein microarrays, first introduced in 1983 [33], address this problem. In a protein array, proteins or nonpeptide aptamers are used as the elements immobilized on a surface. Protein microarrays can be employed to quantify protein expression, or to identify protein–protein interactions, disease biomarkers, and the DNA-binding specificity of protein variants [272].
Protein binding microarray (PBM) technology comprises another category of microarrays that provides rapid, high-throughput characterization of protein-DNA interactions in vitro [19, 20, 176]. The complex responses of cells to environmental changes, as well as changes in gene expression throughout development, are mediated by transcription factors binding to DNA sequences. Transcription factors can activate or repress gene expression by promoting or inhibiting the transcription of genes. DNA binding sites of transcription factors in eukaryotes are usually short (6–10 base pairs) [19]. PBMs provide a means of measuring the binding affinity of a transcription factor to all possible DNA binding sequences in a single experiment. This helps build full landscapes of transcription factor binding affinities and provides insight into the regulatory functions of transcription factors. Moreover, by quantifying the binding affinity of transcription factors to all possible sequences, one can detect binding differences between homologous proteins. In PBMs, transcription factors are expressed with epitope tags (sequences that can be recognized by antibodies) and then applied to arrays of double-stranded DNA. A washing step removes any transcription factor that is not strongly bound to the DNA sequences. Fluorophore-conjugated antibodies are then applied to the microarrays; they attach to the epitope tags of the transcription factors and help quantify the binding of transcription factors to DNA sequences [18]. A limitation of current PBMs is that they can only detect the binding affinities of transcription factors with short motifs (less than 12 bp) [19], whereas prokaryotic transcription factors can bind to DNA sites of 20 bp or longer.
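A short counting sketch in Python suggests why covering all possible binding sequences is feasible for short sites but not for long ones. It assumes that a double-stranded site and its reverse complement represent the same site, so each such pair is counted once. The function names are hypothetical, and the calculation is an illustration rather than a description of how any particular PBM is designed.

```python
def n_sequences(k):
    """Number of single-stranded DNA sequences of length k."""
    return 4 ** k

def n_double_stranded_sites(k):
    """Number of distinct double-stranded sites of length k, counting a
    sequence and its reverse complement as one site (assumption)."""
    total = 4 ** k
    # Only even-length sequences can equal their own reverse complement;
    # there are 4**(k // 2) of them.
    self_complementary = 4 ** (k // 2) if k % 2 == 0 else 0
    return (total + self_complementary) // 2

for k in (6, 8, 10, 12, 20):
    print(k, n_sequences(k), n_double_stranded_sites(k))
```

For k = 8 this gives 32,896 distinct sites, whereas for k = 20 the number of sequences exceeds 10^12, far more than the roughly 10⁵ interactions per cm² quoted earlier for a microarray.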