In this example, the server provides a gateway to the Internet for reaching the national databases used in gene data analysis. Individual computers, running different operating systems, share access to data from the microarray image scanner as soon as the data are generated. For example, even though a workstation may be running MacOS, UNIX, Linux, or some version of the Windows operating system, and the microarray image scanner controller operates under a proprietary operating system, the network provides a common communications channel for sharing and
capturing data from the experiment as well as making sense of it through computer-based analysis. The network also supports the sharing of resources, such as printers, modems, plotters, and other networked peripherals. In addition, a wireless extension of the network allows the researchers to share the wireless laptop for working with the data, for example by transforming spot data from the image analysis workstation into array data that a variety of complex data-manipulation utilities can process. In this context, the purpose of the LAN is to provide instantaneous connectivity between the various devices in the laboratory, thereby facilitating the management, storage, and use of the data.
Consider the process without the network depicted in Figure 3-1. The gene analysis workstation would have to be connected directly to the Internet—a potentially dangerous proposition without a software or hardware firewall or safety barrier to guard against potential hackers. Similarly, the
results of any analysis would have to be separately archived to a floppy, Zip® disk, or CD-ROM. In addition, sharing experimental data would require burning a CD-ROM or using other media
compatible with the other workstations in the laboratory. Simply attaching a data file to an e-mail message or storing it in a shared or open folder on the server would be out of the question. Data could also be shared through printouts, but because the computers aren't part of a network, each workstation requires its own printer, plotter, modem, flatbed scanner, or other peripherals. For
example, unless the expression analysis workstation has its own connection to the Internet, results of the experiment can't be easily communicated to collaborating laboratories or even the department in an adjoining building. Furthermore, even though many of the public online bioinformatics databases accept submissions on floppy or other media, the practice is usually frowned upon in favor of
electronic submission.
Without the wireless component of the LAN, researchers in the lab would not be able to instantly explore the data generated by the scanning and analysis workstation, but would have to wait until the other researchers operating a workstation have time to write the data to a disk or other media. More importantly, every workstation operator would be responsible for backing up and archiving their own data—a time-consuming, high-risk proposition. It's far more likely, for example, that a
researcher in the laboratory will fail to manually archive local data on a regular basis than it is for a central, automated backup system to fail.
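To make the contrast with manual archiving concrete, the following is a minimal sketch of the kind of job a central backup system might run on a schedule. It is an illustration under stated assumptions, not a description of any particular product: the function names `sha256_of` and `archive_new_files` and the directory layout are hypothetical, and the archive directory is assumed to sit outside the source directory, for example on a server share.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_new_files(source_dir: Path, archive_dir: Path) -> list[Path]:
    """Copy files that are missing from, or differ in, archive_dir.

    Hypothetical helper: archive_dir is assumed to lie outside source_dir.
    Returns the list of archive paths written during this run.
    """
    copied = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = archive_dir / src.relative_to(source_dir)
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue  # already archived and unchanged
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy contents and timestamps
        if sha256_of(dst) != sha256_of(src):
            raise IOError(f"verification failed for {src}")
        copied.append(dst)
    return copied
```

Run from a scheduler on the server, a routine like this removes the dependence on each researcher remembering to archive local data, which is the risk the preceding paragraph describes.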
This brief tour of the prototypical microarray laboratory highlights several applications of networks in bioinformatics. The underlying advantage of the network is the ability to move data from one
computer to another as quickly, transparently, and securely as possible. This entails accessing online databases, publishing findings, communicating via e-mail, working with other researchers through integrated networked applications known as groupware, and downloading applications and large data sets from online sources via file transfer protocol (FTP) and other methods.
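As an illustration of the FTP downloads mentioned above, here is a minimal sketch that uses Python's standard `ftplib` module. The helper names `split_ftp_url` and `download`, and the example host `ftp.example.org`, are assumptions made for this sketch; a real database site will have its own host name, directory layout, and access policy.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, remote_path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname or not parts.path.strip("/"):
        raise ValueError(f"not a usable FTP URL: {url!r}")
    return parts.hostname, parts.path


def download(url: str, dest_dir: str = ".") -> Path:
    """Fetch one file over anonymous FTP in binary mode; return the local path."""
    host, remote_path = split_ftp_url(url)
    local_path = Path(dest_dir) / Path(remote_path).name
    with FTP(host, timeout=30) as ftp:
        ftp.login()  # no arguments means anonymous login
        with open(local_path, "wb") as out:
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_path


# Hypothetical usage (the host and path are placeholders, not a real archive):
# download("ftp://ftp.example.org/pub/data/sequences.fasta.gz", dest_dir="/tmp")
```

Binary mode matters here because sequence archives are commonly distributed as compressed files, which a text-mode transfer could corrupt.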
Although many of these features can be had by simply plugging in a few network cards and following a handful of instruction manuals, chances are that several key functions won't be available without considerably more knowledge of network technology. For example, selecting and configuring a network requires that someone make educated decisions regarding bandwidth, reliability, security, and cost. Furthermore, the mixed operating system environments typical of bioinformatics laboratories, which tend to have at least one workstation running Linux or UNIX, present challenges not found in generic office networks.
What's more, it may not be obvious from the simple network depicted in Figure 3-1 that
bioinformatics networks present unique networking challenges that typically can't be addressed by generic network installations. The first is that there is a huge amount of data involved. The network isn't handling short e-mail messages typical of the corporate environment, but massive sequence strings, images, and other data. In addition, unlike networks that support traditional business transaction processing, data are continually flowing from disk arrays, servers, and other sources to computers for processing because the data can't fit into computer RAM. As a result, the network and external data sources are in effect extensions of the computer bus, and the performance of the network limits the overall performance of the system. It doesn't matter whether the computer processor is capable of processing several hundred million operations per second if the network feeding data from the disks to the computer has a throughput of only 4–5 Mbps.
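A short calculation shows how quickly a slow link comes to dominate. The sketch below estimates the time needed to move a data set across links of different speeds. The 1 GB data set size and the 100 Mbps comparison link are assumptions chosen for illustration; only the 4–5 Mbps figures come from the text. Throughput is taken to mean megabits per second, with 1 Mbps equal to 1,000,000 bits per second.

```python
def transfer_seconds(size_bytes: float, throughput_mbps: float) -> float:
    """Seconds needed to move size_bytes over a link sustaining throughput_mbps."""
    if throughput_mbps <= 0:
        raise ValueError("throughput must be positive")
    bits = size_bytes * 8
    return bits / (throughput_mbps * 1_000_000)


# Hypothetical data set of 1 GB (10**9 bytes); not a figure from the text.
DATASET_BYTES = 1_000_000_000

for mbps in (4, 5, 100):
    secs = transfer_seconds(DATASET_BYTES, mbps)
    print(f"{mbps:>4} Mbps: {secs / 60:.1f} minutes")
```

Under these assumptions, the 4 Mbps link needs 2,000 seconds, a little over half an hour, to deliver the data, during which even a very fast processor can work only as quickly as the bytes arrive.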
This chapter continues the exploration of the Internet, intranets, wireless systems, and other network technologies that apply directly to sharing, manipulating, and archiving sequence data and other bioinformatics information. The following sections explore network architecture—how a network is designed, how the components on the system are connected to the network, and how the
components interact with each other. As illustrated in Figure 3-2, this includes examining networks from the perspective of:
● Geographical scope
● Underlying model or models used to implement the network
● Signal transmission technology
● Bandwidth or speed
● Protocol or standards used to define how signals are handled by the network
● Ownership or funding source involved in network development
● Hardware, including cables, wires, and other media used to provide the information conduit from one device to the next
● Content carried by the network