

The Laboratory Use of Computers

VI. DATA ORGANIZATION AND STORAGE

The laboratory and the individual scientist can easily be overwhelmed by the sheer volume of data produced today. Very rarely can an analytical problem be answered with a single sample, let alone a single analysis. Compound this by the number of problems or experiments that a scientist must address, and the time spent organizing and summarizing the data can eclipse the time spent acquiring it. Scientific data also tends to be spread out among several different storage systems. The scientist’s conclusions based on a series of experiments are often documented in formal reports. Instrument data is typically contained on printouts or in electronic files. The results of individual experiments tend to be documented in laboratory notebooks or on official forms designed for that purpose.

It is important that all of the data relevant to an experiment be captured: the sample preparation, the standard preparation, the instrument parameters, and the significance of the sample itself. This metadata must be cross-referenced to the raw data and the final results so that the results can be reproduced if necessary. It is often written in the notebook, or in many cases it is captured by the analytical instrument, stored in the data file, and printed on the report, where it cannot be easily searched. Without this metadata, the data collected by an instrument can be useless, because the metadata may be crucial to its interpretation.
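The cross-referencing described above can be illustrated with a minimal Python sketch. The record layout, field names, and the find_records helper are all hypothetical, chosen only to show one record tying the metadata to the raw data file and the final results so that the whole set can be searched together.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """Hypothetical record linking one analysis to its metadata and results."""
    sample_id: str
    sample_description: str          # significance of the sample
    sample_prep: str                 # how the sample was prepared
    standard_prep: str               # how the standard was prepared
    instrument_parameters: dict      # e.g. column, flow rate, wavelength
    raw_data_file: str               # identifier of the instrument data file
    results: dict = field(default_factory=dict)


def find_records(records, **criteria):
    """Return the records whose top-level fields equal every given criterion."""
    return [
        rec for rec in records
        if all(getattr(rec, name) == value for name, value in criteria.items())
    ]
```

Because the metadata travels with a reference to the raw data file, a search on any field (for example, find_records(records, sample_id="S-001")) leads back to both the instrument file and the reported result.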

Scientists have taken advantage of various personal productivity tools such as electronic spreadsheets, personal databases, and file storage schemes to organize and store their data. While such tools may be adequate for a single scientist, such as a graduate student working on a single project, they fail for laboratories performing large numbers of tests. It is also very difficult to use such highly configurable, nonaudited software in regulated environments. In such cases, efficient operation requires a highly organized system of data storage and compliance with the established procedures by all of the scientific staff.

A. Automated Data Storage

Ideally, all of the scientific data files of a laboratory would be cataloged (indexed) and stored in a central data repository. Several commercial data management systems are designed to do just this. Ideally, these systems automatically catalog the files using indexing data available in the data files themselves and then upload the files without manual intervention from the scientist. In reality, this is more difficult than it first appears. The scientist must enter the indexing data into the scientific application, and the scientific application must support its entry. Another potential problem is the proprietary nature of most instrument vendors’ data files. Even when the instrument vendors are willing to share their data formats, the sheer number of different instrument file formats makes this a daunting task. Still, with some standardization, these systems can greatly decrease the time scientists spend on mundane filing-type activities and provide a reliable archive for the laboratory’s data. These systems also have the added benefit of providing the file security and audit trail functionality required in regulated laboratories on an enterprise-wide scale instead of on a system-by-system basis.
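As a rough illustration of automatic cataloging, the following Python sketch indexes every data file in a directory. It assumes a hypothetical plain-text file layout in which each file begins with "key: value" header lines followed by a blank line; real vendor formats are usually proprietary and would need their own readers. The table name, column names, and the ".dat" extension are likewise assumptions.

```python
import hashlib
import sqlite3
from pathlib import Path


def read_index_fields(path):
    """Read 'key: value' header lines up to the first blank line (assumed format)."""
    fields = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                break
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
    return fields


def catalog_directory(conn, directory):
    """Index every *.dat file in directory; return the number of new entries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "sha256 TEXT PRIMARY KEY, filename TEXT, sample_id TEXT, instrument TEXT)"
    )
    added = 0
    for path in sorted(Path(directory).glob("*.dat")):
        # The content hash identifies the file and skips files already cataloged.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        fields = read_index_fields(path)
        cursor = conn.execute(
            "INSERT OR IGNORE INTO catalog VALUES (?, ?, ?, ?)",
            (digest, path.name, fields.get("sample_id"), fields.get("instrument")),
        )
        added += cursor.rowcount
    conn.commit()
    return added
```

Keying the catalog on a content hash is one design choice among several: it makes repeated runs harmless and gives a simple way to detect later whether an archived file has been altered.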

However, storing the data files in a database solves only part of the archiving problem. Despite the existence of a few industry standard file formats, most vendors use a proprietary file format, as already discussed. If the data files are saved in their native file format, they remain useful only as long as the originating application is available or a suitable viewer is developed. Rendering the data files in a neutral file format such as XML mitigates the obsolescence problem but once again requires that the vendor’s file format be known. It will also generally preclude reanalyzing the data after the conversion.
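A minimal Python sketch of rendering a data trace in XML follows. The element and attribute names (dataset, axes, trace, point) are invented for illustration and do not correspond to any published standard; the sketch also assumes the trace has already been decoded from the vendor format.

```python
import xml.etree.ElementTree as ET


def to_neutral_xml(sample_id, x_unit, y_unit, points):
    """Serialize (x, y) data points to a simple, self-describing XML string."""
    root = ET.Element("dataset", {"sample_id": sample_id})
    ET.SubElement(root, "axes", {"x_unit": x_unit, "y_unit": y_unit})
    trace = ET.SubElement(root, "trace")
    for x, y in points:
        # repr() keeps enough digits for the float to be read back unchanged.
        ET.SubElement(trace, "point", {"x": repr(x), "y": repr(y)})
    return ET.tostring(root, encoding="unicode")


def from_neutral_xml(text):
    """Recover the list of (x, y) points from the XML produced above."""
    root = ET.fromstring(text)
    return [(float(p.get("x")), float(p.get("y"))) for p in root.iter("point")]
```

The output can be read by any XML parser long after the originating application is gone, but it holds only what was exported. Vendor-specific content such as integration events or method settings is lost unless it is mapped explicitly, which is why reanalysis is generally not possible after conversion.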

B. Laboratory Information Management Systems

Analytical laboratories, especially quality control, clinical testing, and central research laboratories, produce large amounts of data that need to be accessed by several different groups, such as customers, submitters, analysts, managers, and quality assurance personnel. Searching paper files for results is a necessarily manual process that requires both personnel and significant amounts of time. Electronic databases are the obvious solution for storing the data so that it can be quickly retrieved as needed. As long as sufficient order is imposed on the storage of the data, large amounts of data can be retrieved and summarized almost instantaneously by all interested parties.
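The kind of retrieval and summary described above can be sketched with Python's built-in sqlite3 module. The table layout and column names are assumptions made for this example, not the schema of any particular product.

```python
import sqlite3


def build_results_db(rows):
    """Create an in-memory results table from (sample_id, test, analyst, value) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results (sample_id TEXT, test TEXT, analyst TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def summarize_by_test(conn):
    """Return (test, count, mean, minimum, maximum) for every test, sorted by name."""
    cursor = conn.execute(
        "SELECT test, COUNT(*), AVG(value), MIN(value), MAX(value) "
        "FROM results GROUP BY test ORDER BY test"
    )
    return cursor.fetchall()
```

A single query replaces a manual search through paper files, but only because every result was stored with the same fields. This is the "sufficient order" that the storage scheme has to impose.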

A database by itself, however, does not address the workflow issues that arise between the involved parties. Laboratories under regulatory oversight, such as pharmaceutical quality control, clinical, environmental control, pathology, and forensic labs, must follow strict procedures with regard to sample custody and testing reviews. Laboratory information management systems (LIMS) were developed to enforce the laboratory’s workflow rules as well as to store the analytical results for convenient retrieval. Everything from sample logging, workload assignments, data entry, quality assurance review, managerial approval, and report generation to invoice processing can be carefully controlled and tracked. The scope of a LIMS can vary greatly, from a simple database that stores final results and prints reports to a comprehensive data management system that includes raw data files, notebook-type entries, and standard operating procedures. The degree to which this can be done depends on the ability and willingness of all concerned parties to standardize their procedures. LIMS functions are often also event and time driven. If a sample fails to meet specifications, the LIMS can be programmed to automatically e-mail the supervisor or log additional samples. It can also be programmed to automatically log required water monitoring samples every morning and print the corresponding labels.
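The event-driven and time-driven behavior described above can be sketched as a toy Python class. MiniLims, Specification, and all method names are hypothetical; a real LIMS would send actual e-mail and print labels, whereas this sketch only records what it would have done.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Specification:
    test: str
    low: float
    high: float


class MiniLims:
    """Toy model of event-driven and time-driven LIMS rules (illustrative only)."""

    def __init__(self, specs):
        self.specs = {spec.test: spec for spec in specs}
        self.samples = []        # sample IDs that have been logged
        self.notifications = []  # messages that would be e-mailed to a supervisor

    def log_sample(self, sample_id):
        self.samples.append(sample_id)

    def enter_result(self, sample_id, test, value):
        """Record a result; on a specification failure, notify and log a retest."""
        spec = self.specs[test]
        passed = spec.low <= value <= spec.high
        if not passed:
            self.notifications.append(
                f"{sample_id}: {test} = {value} outside {spec.low}-{spec.high}"
            )
            self.log_sample(f"{sample_id}-RETEST")
        return passed

    def log_daily_samples(self, day, sampling_points):
        """Time-driven rule: log one water monitoring sample per sampling point."""
        for point in sampling_points:
            self.log_sample(f"WATER-{point}-{day.isoformat()}")
```

In practice a scheduler would call log_daily_samples each morning, and the failure branch of enter_result would hand the message to a mail system instead of appending it to a list.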

It was mentioned earlier that paper-based filing systems are undesirable because of the relatively large effort required to search for and obtain data. The LIMS database addresses this issue. However, if the laboratory enters data into its LIMS database manually via keyboard, it may pay a large up-front price to place the data in the database so that it can be easily retrieved. Practically from the inception of LIMS, direct instrument interfaces were envisioned whereby the LIMS would control the instrumentation and the instrument would automatically upload its data. This has certainly been implemented successfully in some cases, but once again the proprietary nature of instrument control codes and data file structures makes this a monumental task for laboratories. Third-party parsing and interfacing software has been very useful in extracting information from instrument data files and uploading the data to the LIMS. Once properly programmed and validated, these systems can bring about very large productivity gains in terms of the time saved entering and reviewing the data as well as resolving issues related to incorrect data entry. Progress will undoubtedly continue to be made on this front, since computers are well suited to such tedious, repetitive tasks, leaving the scientist to draw conclusions from the summarized data.
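As a simple illustration of what such parsing software does, the Python sketch below extracts results from a text report. The comma-separated layout and the column names sample_id, analyte, and amount are assumptions made for this example; actual instrument exports vary widely and are often proprietary.

```python
import csv
import io


def parse_peak_report(text):
    """Parse a CSV report into result records, collecting bad rows separately.

    Returns (results, errors). Each error is (line_number, message); line
    numbers assume a one-line header and no blank lines in the report.
    """
    results, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):
        try:
            results.append({
                "sample_id": row["sample_id"].strip(),
                "analyte": row["analyte"].strip(),
                "amount": float(row["amount"]),
            })
        except (KeyError, AttributeError, TypeError, ValueError) as exc:
            # Missing columns or non-numeric amounts are reported, not uploaded.
            errors.append((line_number, str(exc)))
    return results, errors
```

Rows that fail to parse are set aside for review instead of being silently dropped or uploaded, which reflects the validation such an interface needs before it can be relied upon to replace manual entry.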

