3. Changing research practices

3.3 Information search and access

3.3.3 Use of research databases

Databases play an increasingly valuable role in research. The OECD (1998) noted that:

Scientists in many fields now produce data sets which are accessible via the Internet to colleagues around the globe. The Internet also provides new opportunities for scientists in different countries to combine local data sets into global ones. This is useful for research projects requiring data from around the world, notably in biological and Earth-related sciences, (eg. the Human Genome Project and the International Geosphere-Biosphere Programme). One notable… example was the immediate release of data collected by the Hubble Space Telescope (HST) to any astronomer wishing to study it (OECD 1998, p28).

Important too in areas as diverse as astronomy, medicine and music is the increasing ability to store and manipulate images and sounds.

Analysis of this wealth of data increasingly involves complex software.

Technologies such as satellite imaging systems and particle accelerators collect huge amounts of data, the interpretation of which often requires specialised software. Computer networks can provide wider access to such software. For researchers, one of the most important changes wrought by the Internet, and particularly the WWW, has been the ability to readily upload specialised software code. Transfer and use of software via the Internet have become as essential to many researchers as e-mail. Given the increased sophistication of software and the considerable investment required to develop it, the incentive to share software is increasing. Programmes that earlier would have been written solely for personal use are now made available over the Internet, where libraries of free software for scientific purposes are growing (OECD 1998, p36).

7 Personal communication from Philip Kent, Executive Manager, Knowledge and Information

Box 3.1 Examples of shared databases

Large shared databases have become important resources in many fields of science and social science. These databases allow researchers working on different pieces of large problems to contribute to and benefit from the work of other researchers and shared resources. Examples of such databases include the following:

GenBank (www.ncbi.nlm.nih.gov/Genbank/) is the National Institutes of Health’s annotated collection of publicly available DNA sequences. As of June 2001, GenBank contained approximately 12.9 billion base pairs from 12.2 million sequence records. The number of nucleotide base pairs in its database has doubled approximately every 14 months. As part of a global collaboration, GenBank exchanges data daily with European and Japanese gene banks.

The Protein Data Bank (www.rcsb.org/pdb/) is the worldwide repository for the processing and distribution of three-dimensional biological macromolecular structure data (Berman et al. 2000).

The European Space Agency (ESA) Microgravity Database (www.esa.int/cgi-bin/mgdb) gives scientists access to information regarding all microgravity experiments carried out on ESA and National Aeronautics and Space Administration missions by European scientists since the 1960s.

The Tsunami Database (www.ngdc.noaa.gov/seg/hazard/tsu.html) provides information on tsunami events from 49 B.C. to the present in the Mediterranean and Caribbean Seas and the Atlantic, Indian, and Pacific Oceans. It contains information on the source and effects of each tsunami.

The Earth Resources Observation Systems Data Center (edcwww.cr.usgs.gov/) houses the National Satellite Land Remote Sensing Data Archive, a comprehensive, permanent record of the planet’s land surface derived from almost 40 years of satellite remote sensing. By 2005, the total holdings will come to some 2.4 million gigabytes of data.

Source: National Science Board (2002) Science and Engineering Indicators 2002, Arlington, VA: National Science Foundation, 2002 (NSB-02-1), p8-25.
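The doubling time quoted for GenBank in Box 3.1 can be restated as an equivalent annual growth rate. The Python sketch below does that arithmetic. The function names are illustrative only, and the projection assumes, purely for the sake of the example, that the 14-month doubling time stays constant, which the source does not claim.

```python
def annual_growth_rate(doubling_months: float) -> float:
    """Annual growth rate implied by a constant doubling time (in months)."""
    return 2 ** (12 / doubling_months) - 1


def project(size: float, months: float, doubling_months: float) -> float:
    """Size after `months`, assuming a constant doubling time (an assumption)."""
    return size * 2 ** (months / doubling_months)


# Figures quoted in Box 3.1 (National Science Board 2002)
GENBANK_JUNE_2001_BP = 12.9e9  # base pairs held as of June 2001
DOUBLING_MONTHS = 14           # approximate doubling time

rate = annual_growth_rate(DOUBLING_MONTHS)
print(f"Implied annual growth: {rate:.0%}")

# Two doubling periods (28 months) quadruple the holdings.
after_28_months = project(GENBANK_JUNE_2001_BP, 28, DOUBLING_MONTHS)
print(f"After 28 months: {after_28_months / 1e9:.1f} billion base pairs")
```

A 14-month doubling time corresponds to growth of roughly 81% per year, which gives a sense of how quickly such shared databases were expanding.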

Not only are ICTs important for storing and sharing scientific information, they are also increasingly important in automated data collection. In many areas data are now collected in digital form (eg. seismic data, remote imaging, mineral composition and oceanographic data, music and film) – ie. born digital. This makes data analysis faster and easier. Sinclair (1999) cited the example of automated gene sequencers, which use robotics to process samples and computers to manage, store, and retrieve data, and have made possible the rapid sequencing of the human genome, which in turn has resulted in unprecedented expansion of genomic databases (National Science Board 2002, p8-25).

Certainly in such areas as genetics, genomics and proteomics, the cutting edges of scientific research and of information technology converge (eg. in bioinformatics) (Houghton 2002b). Databases allow researchers to analyse and manipulate protein structures and give access to huge volumes of genetic information. Discussing the genetic and genomic revolutions, Tollerman et al. (2001) suggested that:

The genomics wave is technology-driven, formed by the integration of new high throughput techniques with powerful new computing capabilities. It is active throughout R&D, most immediately at the drug discovery stage, and promises to enhance productivity greatly, without jeopardizing downstream success rates. The genetics wave is data-driven, exploiting the details of individuals' genetic variation that are emerging from the oceans of genomic data…

We characterize genomics… as the confluence of two interdependent trends that are fundamentally changing the way R&D is conducted: industrialization (creating vastly higher throughputs, and hence a huge increase in data), and informatics (computerized techniques for managing and analyzing those data). The surge of data – generated by the former, and processed by the latter – is of a different order from the data yields of the pre-genomics era (Tollerman et al. 2001, pp1-2).

In a study of U.K.-based academic researchers, Education for Change et al. (2002) found that 48% of the researchers they surveyed were using computerised datasets of primary data, and 34% thought that their use would increase in the future. They found that use was higher in the sciences, but still considerable in the arts and humanities. By research field:

• 31% of U.K.-based medical and biological sciences researchers considered datasets to be essential to their research, a further 24% used them and 44% believed that their use would increase;

• 28% of physical sciences and engineering researchers considered datasets to be essential to their research, a further 23% used them and 39% believed that their use would increase;

• 27% of social science researchers considered datasets to be essential to their research, a further 24% used them and 31% believed that their use would increase;

• 33% of area studies and languages researchers considered datasets to be essential to their research, a further 12% used them and 23% believed that their use would increase; and

• 14% of arts and humanities researchers considered datasets to be essential to their research and a further 23% used them (Education for Change et al. 2002).
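The field-by-field figures above can be combined to compare overall dataset use across fields. The short Python sketch below adds the "essential" and "further used" shares for each field. It assumes the two groups do not overlap, which the phrase "a further" suggests but the source does not state explicitly, and the variable and function names are illustrative.

```python
# (field, % who consider datasets essential, % who further use them),
# as reported by Education for Change et al. (2002)
SURVEY = [
    ("Medical and biological sciences", 31, 24),
    ("Physical sciences and engineering", 28, 23),
    ("Social sciences", 27, 24),
    ("Area studies and languages", 33, 12),
    ("Arts and humanities", 14, 23),
]


def total_use(essential: int, further: int) -> int:
    """Combined share of dataset users, assuming the two groups are disjoint."""
    return essential + further


totals = {field: total_use(essential, further) for field, essential, further in SURVEY}

# List fields from highest to lowest combined use.
for field, pct in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{field:<36}{pct:>3}%")
```

On this reading, combined use ranges from 37% in the arts and humanities to 55% in the medical and biological sciences.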

Similarly, in a survey of scientists and engineers in the United States, the National Science Foundation found that 34% reported using digital libraries and data repositories, and a further 23% expected to do so in the future (Atkins et al. 2003, pB5).

Lyman and Varian (2000) estimated that the world produced between 1 and 2 exabytes (ie. billion gigabytes) of unique information per year, roughly 250 megabytes for every man, woman and child on earth. Printed documents of all kinds comprise only 0.003% of the total. As Hey and Trefethen (2002) put it, “...it is evident that e-science data generated from sensors, satellites, high-performance computer simulations, high-throughput devices, scientific images and so on will soon dwarf all of the scientific data collected in the whole history of scientific exploration.”

Closer to home, it is notable that the fastest-growing area of ICT-related trade between Australia and the rest of the world since the early 1990s has been database and subscription services. Between 1993-94 and 2001-02, database services exports increased by 34% per annum and imports increased 33% per annum, compared with increases in overall ICT-related services exports of 7.4% per annum and imports of 5.9% per annum (Houghton 2002a).
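The per-capita figure and the growth rates quoted above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only. It assumes a world population of about 6 billion (a figure not given in the source), decimal units (1 exabyte = 10^18 bytes), and that the per-annum rates compound over the eight years from 1993-94 to 2001-02. The names are hypothetical.

```python
EXABYTE = 10**18   # bytes; "billion gigabytes" in decimal units
MEGABYTE = 10**6   # bytes

WORLD_POPULATION = 6e9  # assumption: roughly 6 billion people around 2000
YEARS = 8               # 1993-94 to 2001-02


def per_capita_mb(total_exabytes: float, population: float = WORLD_POPULATION) -> float:
    """Megabytes per person for a given yearly total in exabytes."""
    return total_exabytes * EXABYTE / population / MEGABYTE


def growth_multiple(rate_per_annum: float, years: int = YEARS) -> float:
    """Total growth multiple if a per-annum rate compounds over `years`."""
    return (1 + rate_per_annum) ** years


# Lyman and Varian's range of 1 to 2 exabytes, plus its midpoint.
for exabytes in (1.0, 1.5, 2.0):
    print(f"{exabytes} EB/year -> {per_capita_mb(exabytes):.0f} MB per person")

# Growth rates reported by Houghton (2002a).
for label, rate in [
    ("Database services exports", 0.34),
    ("Database services imports", 0.33),
    ("All ICT-related services exports", 0.074),
    ("All ICT-related services imports", 0.059),
]:
    print(f"{label}: x{growth_multiple(rate):.1f} over {YEARS} years")
```

Under these assumptions the midpoint of the 1 to 2 exabyte range gives 250 megabytes per person, and 34% annual growth compounds to roughly a tenfold increase over the period, against less than a doubling for ICT-related services exports overall.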

There is truly a revolution in the systems of scientific and scholarly information access, which has not yet entirely registered with most of those writing on scholarly communication. The U.S. National Research Council (2001) went so far as to suggest that:

The rapidly expanding availability of primary sources of data in digital form may be shifting the balance of research away from working with secondary sources such as scholarly publications. Researchers today struggle to extract meaning from these masses of data, because our techniques of searching, analyzing, interpreting, and certifying information remain primitive. New automated systems, and perhaps new intermediary institutions for searching and authenticating information, will develop to provide these services, much as libraries and scholarly publications served these roles in the past (National Research Council 2001, p5).

The level of use of research databases and expectations regarding future use, and the potential therein for new modes of information access and dissemination, suggest a need to pay much more attention to the use of databases and non-traditional, non-text digital objects in the future.

In document Changing Research Practices in the Digital Information and Communication Environment (Page 69-72)