CHAPTER 3. METHODS AND PROCEDURES
3.1 Background
Many biologists today are interested in specific genes of plants and animals. While many different types of genes have been identified, gene roles and functions are still not clearly understood or defined. Discovering the roles of genes is extremely important for identifying key characteristics such as disease or age resistance. Universities and corporations around the world are creating tools and visualizations to help scientists explore genes and their roles.
3.1.1 Biology
DNA (deoxyribonucleic acid) contains the genetic instructions used in the development and function of all living organisms; in essence, it holds the blueprint needed to construct all cellular components. DNA sequences are also involved in regulating genetic information. In order for an organism to survive, it must regulate cellular processes. The area of DNA that contains this information is a gene. Gene expression is the process by which inheritable information is made into a functional gene product, such as protein or RNA. A gene contains genetic information as well as the sequence for ribonucleic acid (RNA), a nucleic acid that transmits genetic information from DNA to proteins. RNA is crucial because it assists cells in the creation of proteins. The enzyme-driven process that converts an entire gene to RNA is called transcription. Certain types of RNA even regulate which genes are active. RNA is usually single-stranded and comes in several types, one of which is messenger RNA (mRNA). mRNA defines one or more protein sequences and is ephemeral, i.e., more is made as needed. The mRNA is used to make a matching protein sequence, a process called translation. mRNA delivers this sequence to a cell’s ribosome to create needed proteins. These proteins are then responsible for building the structure of the organism’s body and helping different chemical reactions take place. A chain of such reactions is called a pathway. Pathways can be very complex, requiring many different resources to function properly. Many different pathways can exist within a single cell. Pathways are important to maintaining homeostasis within an organism. A metabolic network is a collection of pathways.
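The transcription and translation steps described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: it assumes the input is the DNA coding strand, uses the standard genetic code, and starts translating at the first AUG it finds. The function names `transcribe` and `translate` are chosen here for illustration only.

```python
# Standard genetic code, laid out in the conventional U, C, A, G base order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (each T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
protein = translate(mrna)
```

Here `mrna` holds the transcribed sequence and `protein` the one-letter amino acid string. Real genes involve further steps (for example, splicing in eukaryotes) that this sketch deliberately ignores.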
Scientists are able to knock out (get rid of) or silence (suppress) specific genes in order to test different conditions. To identify the function of unknown genes, scientists use microarray analysis. Microarrays, also known as gene chips, contain 100 to 20,000 DNA gene samples and possibly 2 to 80 experimental conditions. These chips are made of glass or nylon substrates. Each chip contains specific DNA samples plotted in the array by a robotic printer. Fluorescently labeled mRNA from an experimental condition is spread over the chip. The mRNA will bind strongly or weakly with certain DNA. Using a laser scan, sensors detect the level at which the sample expressed each gene. The expression levels of the genes can then reveal recognizable patterns and help identify function (Seo and Shneiderman 2005).
The ability to recognize gene functions faces several obstacles. For example, identification can be difficult for genes that exhibit similar profiles. Another obstacle is the sheer volume of data that these types of experiments generate.
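To make the notion of "similar profiles" concrete, the sketch below compares expression profiles with the Pearson correlation coefficient. This is one common similarity measure, used here as an assumption; the text does not say which measure the cited tools use. The gene names, expression values, and the 0.95 threshold are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def similar_pairs(profiles, threshold=0.95):
    """Return (gene, gene, r) for every pair whose correlation meets the threshold."""
    names = sorted(profiles)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(profiles[first], profiles[second])
            if r >= threshold:
                pairs.append((first, second, r))
    return pairs


# Hypothetical expression levels: one value per experimental condition.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
pairs = similar_pairs(profiles)
```

The pairwise loop also hints at the data-volume obstacle: with n genes there are n(n-1)/2 pairs to compare, so a chip with 20,000 genes yields roughly 200 million comparisons.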
Scientists can also compare gene interactions using identified pathways. However, there are various pathway databases, and these databases are based on primary publications. Unfortunately, consistent terminology and identifiers have not been agreed upon. In addition, only certain organisms have even moderately documented pathways. Finally, agreed-upon pathways are subject to change as new research is introduced.
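One practical consequence of inconsistent identifiers is that the same gene must be reconciled across databases before records can be compared. A minimal sketch of such a reconciliation step follows. The alias table, the identifiers in it, and the function names are all hypothetical; no real database schema is implied.

```python
def build_alias_index(alias_table):
    """Map every known name (case-insensitive) to one canonical identifier."""
    index = {}
    for canonical, aliases in alias_table.items():
        for name in (canonical, *aliases):
            key = name.strip().lower()
            if key in index and index[key] != canonical:
                # The same name points at two different genes.
                raise ValueError(f"ambiguous alias: {name!r}")
            index[key] = canonical
    return index


def normalize(identifier, index):
    """Return the canonical identifier, or None if the name is unknown."""
    return index.get(identifier.strip().lower())


# Hypothetical alias table: canonical ID -> names used by other sources.
alias_table = {
    "GENE_0001": ["abc1", "ABC-1"],
    "GENE_0002": ["xyz2"],
}
index = build_alias_index(alias_table)
canonical = normalize(" ABC1 ", index)
```

Raising an error on ambiguous aliases is a deliberate choice in this sketch: silently picking one gene would hide exactly the kind of inconsistency the text describes.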
3.1.1.1 Investigative process
To understand gene behavior and regulation, scientists run complex tests using one or more genes. The results of these tests are then run through various statistical evaluations. Often, visualizations at this point can show clusters of genes as well as any gene that does not behave as expected. Since some genes have very similar profiles, it can be hard to determine exactly what causes reactions to take place. Scientists rely on databases of stored experiments and papers to illuminate which genes they are looking at and to help determine roles. After one or more interesting genes are found, the next step is determining their overall function within a cell. Pathways, the chains of chemical reactions, are needed to show cause-and-effect relationships. Known pathways are also stored in databases for scientific use. See Figure 47 for the investigative process.
Figure 47. Discovery process
3.1.2 Biological Visualization Tools
While researchers can determine interesting genes and expressions using statistical measures, it is hoped that the use of visualizations of genetic data will decrease research time and lead to new discoveries. Tools have already been developed for each stage of the discovery process (see Figure 47).
Several databases containing biological data are available to scientists. One of the most difficult aspects of using these resources is that there is no standard method of organization, and the scientific community has not agreed on universal identifiers for different genes. TAIR (2008) provides the AraCyc tool. AraCyc is a visualization tool for biochemical pathways of Arabidopsis thaliana (mustard plant) and is supported by the Pathway Tools software. AraCyc includes a mix of information extracted from peer-reviewed literature and computational predictions. The MetNet group at Iowa State University provides MetNetDB (MetNet 2008). This database contains information on networks of metabolic and regulatory interactions in Arabidopsis thaliana. Database information is based on input from biologists. Interactions that are stored include transcription, translation, protein modification, assembly, allosteric regulation, and translocation from one subcellular compartment to another. The database also provides AraCyc-curated pathways and AGRIS-curated regulatory networks. Data is derived from a collection of other web databases. MetNetDB also provides a Curator tool that can be used to both query and modify the database.
Once the known biological data is collected, scientists then have the task of finding interesting genes within an organism. Tools that visualize statistical significance greatly speed up this process. GeneVis (Baker et al. 2002) is a particle-based system that provides an environment for visually exploring genetic regulatory networks. It simulates genetic network behavior based on probabilistic occurrences of gene-protein interactions. Two different visualizations are provided: one of the movement of regulatory proteins and one of the relative concentration of those proteins. Different representational models are offered, including a protein interaction representation, a protein concentration representation, and a network structure representation. Protein interaction focuses on the activities of individual proteins. Protein concentration displays the relative spread and concentration of proteins. Finally, network structure depicts the genetic network dependencies present in the simulation. GeneVis offers some interactive components. These include animations between representations and three types of viewing lenses (fuzzy lens, base-pair lens, and ring lens). The simulation starts at the cell level. A large circle in the middle of the visualization represents a chromosome. Various genes (represented by small spheres) are then plotted on the chromosome. Small fuzzy dots around the genes and chromosome represent proteins. Color is used to distinguish the type of protein each dot represents. Because the tool is dynamic, one can see not only when a protein binds with a gene, but also the resulting effect in the environment. Users can then switch between three lenses to see protein interaction, concentration, or representation for one protein type.
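The idea of simulating a regulatory network through probabilistic gene-protein interactions can be illustrated with a toy model. The sketch below is not GeneVis's algorithm, which is not specified here; it assumes a deliberately simple rule in which each regulator protein binds its target gene with a fixed probability per time step, and each successful binding yields one copy of the gene's product. The names `simulate`, `regulation`, and `bind_probability` are hypothetical.

```python
import random


def simulate(regulation, counts, bind_probability, steps, seed=0):
    """Toy stochastic simulation of protein counts over time.

    regulation maps a regulator protein to the protein its target gene
    produces (an assumed, highly simplified network description).
    Returns the protein counts after every step, starting with step 0.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    counts = dict(counts)
    history = [dict(counts)]
    for _ in range(steps):
        produced = {name: 0 for name in counts}
        for regulator, product in regulation.items():
            # Each regulator molecule gets one chance to bind per step.
            for _ in range(counts[regulator]):
                if rng.random() < bind_probability:
                    produced[product] += 1
        for name, extra in produced.items():
            counts[name] += extra
        history.append(dict(counts))
    return history


def relative_concentration(counts):
    """Share of each protein in the total, as used by a concentration view."""
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


history = simulate({"A": "B"}, {"A": 2, "B": 0}, bind_probability=0.5, steps=10)
shares = relative_concentration(history[-1])
```

A particle-based display would draw one dot per protein in `history[t]`, while a concentration view would plot `relative_concentration` for each step. Protein decay and spatial movement are omitted to keep the sketch short.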
exploRase (MetNet 2008) is a statistics-based gene visualization tool written in R and available from Iowa State University. The purpose of this tool is to allow the user to explore and analyze multivariate systems biology data. It handles transcriptomic, metabolomic, and proteomic data. A graphical user interface is provided on top of the script-based R language. This visualization tool allows users to load biological data and analyze ‘omics data in the context of metabolic and regulatory networks. Three files are necessary to use exploRase, but once loaded the files are saved as a set. The user is also able to save calculated statistics. exploRase uses various chart and plot representations (dotplots, scattergrams, parallel coordinate plots, etc.), which are configurable by the user. The system also supports coordinated multi-window display. A selection (brush) in one window is propagated to the other windows. The user is also able to sort tables of metadata and link this information via color-coding to the display.
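Coordinated multi-window display with brushing is commonly built on a shared selection that notifies every view when it changes. The sketch below shows that pattern in Python. It is an assumption about how such linking can be structured, not a description of exploRase's R implementation, and the class names `LinkedSelection` and `View` are hypothetical.

```python
class LinkedSelection:
    """Shared selection model: brushing here updates every subscribed view."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, listener):
        """Register a callable that receives the new selection."""
        self._listeners.append(listener)

    def brush(self, gene_ids):
        """Replace the selection and notify all views."""
        self._selected = frozenset(gene_ids)
        for listener in self._listeners:
            listener(self._selected)


class View:
    """Stand-in for one plot window (scatterplot, parallel coordinates, ...)."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = frozenset()
        selection.subscribe(self._on_selection)

    def _on_selection(self, gene_ids):
        # A real view would redraw here; the sketch only records the state.
        self.highlighted = gene_ids


selection = LinkedSelection()
scatter = View("scatterplot", selection)
parallel = View("parallel coordinates", selection)
selection.brush({"geneA", "geneC"})
```

After the call to `brush`, both views hold the same highlighted set, which is the behavior the text describes: a selection made in one window carries over to the others.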
After a scientist has identified interesting genes, he/she must then move on to discovering how the gene interacts in the organism. Pathways can be thought of as the chain of chemical reactions that take place in a cell. By knowing how and what is affected, scientists can narrow down the role of certain genes. Cytoscape (Cytoscape 2008) is an open source software platform for visualizing molecular interaction networks. It integrates interactions with gene expression profiles as well as other state data. Additional features can be added as plugins. Plugin functionality ranges from profiling analyses to layout direction and file format support. The user of Cytoscape is able to customize the visualization of his/her data in various formats (hyperbolic, node-link, etc.). The user can also map attributes to different colors, line thicknesses, and border colors. The visualization is laid out in a 2D plane and allows the user to choose a layout algorithm. Interactive tools include zoom/pan, overview, and marking. Filtering is made possible by selecting nodes and/or interactions based on data (threshold, p-value, gene expression level). Cytoscape can also find active subnetworks and clusters. The user session can be saved for future work. This file contains the network, attributes, desktop states, properties, and visual styles of the tool. Finally, Cytoscape network visualization data can be exported to a static image in a variety of formats. The Cytoscape team highly encourages other teams to create plugins for this tool. FCModeler (MetNet 2008) is a Java plugin for Cytoscape. The MetNet plugins are designed to visualize network data from the MetNet database. The goal of these plugins is to provide a modeling framework for biologists to explore hypotheses, analyze network structure, and visualize results of experiments for different types of omics data. The plugins feature a subgraph creator, dot layout, and algorithms for finding cycles and paths.
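Threshold filtering, subgraph creation, and cycle and path finding are standard graph operations, and a minimal sketch may help show what they involve. The code below is written in Python for illustration and does not use or reproduce the Cytoscape or FCModeler APIs (both are Java). It assumes a directed network stored as an adjacency dictionary; the gene names, expression values, and function names are hypothetical.

```python
from collections import deque


def filter_nodes(expression, threshold):
    """Keep the nodes whose expression level meets the threshold."""
    return {node for node, level in expression.items() if level >= threshold}


def induced_subgraph(edges, keep):
    """Subgraph containing only the kept nodes and the edges between them."""
    return {
        node: [t for t in targets if t in keep]
        for node, targets in edges.items()
        if node in keep
    }


def shortest_path(edges, source, target):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in edges.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def has_cycle(edges):
    """Depth-first search with three node states to detect a directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GRAY
        for neighbor in edges.get(node, []):
            neighbor_state = state.get(neighbor, WHITE)
            if neighbor_state == GRAY:
                return True  # back edge: the path loops onto itself
            if neighbor_state == WHITE and visit(neighbor):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


# Hypothetical regulatory network and expression levels.
edges = {"g1": ["g2"], "g2": ["g3"], "g3": ["g1"], "g4": []}
expression = {"g1": 2.0, "g2": 1.5, "g3": 0.2, "g4": 3.0}
subgraph = induced_subgraph(edges, filter_nodes(expression, threshold=1.0))
```

In this example the full network contains a feedback loop (g1 to g2 to g3 and back to g1), while the filtered `subgraph` drops the weakly expressed g3 and with it the loop. Cycles matter in this setting because they correspond to feedback in a regulatory or metabolic network.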