SCIENTIFIC DATA VISUALIZATION AND DIGITAL IMAGE PROCESSING FOR STRUCTURAL BIOLOGY


A Thesis

Submitted to the Faculty of

Purdue University by

Ioana Maria Boier Martin

In Partial Fulfillment of the Requirements for the Degree

of

Doctor of Philosophy


TABLE OF CONTENTS

Page

LIST OF TABLES . . . . vii

LIST OF FIGURES . . . . viii

ABSTRACT . . . . xi

1. INTRODUCTION . . . . 1

1.1 Motivation and Goals. . . . 1

1.2 Organization of the Thesis . . . . 2

1.3 Research Contributions . . . . 3

2. FUNDAMENTAL CONCEPTS IN STRUCTURAL BIOLOGY . . . . 5

2.1 Crystals and Diffraction. . . . 5

2.2 X-ray Crystallography . . . . 8

2.3 Electron Microscopy . . . . 12

2.4 Other Methods for Structure Determination . . . . 13

3. COMBINING DATA VISUALIZATION WITH COMPUTATIONS . . . . . 15

3.1 Overview . . . . 15

3.2 Survey of Software Packages for Structural Biology . . . . 16

3.3 Tonitza: A Scientific Visualization Package for Structural Biology . . . . 19

3.3.1 The Graphical User Interface . . . . 19

3.3.2 Input/Output . . . . 22

3.3.3 Computations . . . . 24

3.3.4 Data Visualization . . . . 29

3.4 Algorithms . . . . 33

3.4.1 Computing Isosurfaces . . . . 34

3.4.2 Arcball Rotations . . . . 36

3.5 Software Engineering Design Issues. . . . . 38

3.6 Conclusion . . . . 39

4. IMAGE PROCESSING OF ELECTRON MICROGRAPHS . . . . 41

4.1 Motivation . . . . 41

4.2 Overview of Digital Image Processing Techniques . . . . 42

4.2.1 Image Enhancement. . . . 43

4.2.2 Image Segmentation . . . . 47

4.3 Digital Image Processing of Electron Micrographs . . . . 50

4.4 Automatic Particle Selection . . . . 52

4.4.1 Related Work . . . . 53

4.4.2 The Road to the Crosspoint Method . . . . 54

4.4.3 The Crosspoint Method . . . . 55

4.4.3.1 Preprocessing. . . . 56

4.4.3.2 Particle identification. . . . 57

4.4.3.3 Postprocessing . . . . 62

4.4.3.4 Results . . . . 63

4.4.3.5 Sensitivity of the Crosspoint Method . . . . 68

4.5 EMMA: A Package for Image Processing of Electron Micrographs . . . . 68

4.5.1 Motivation and General Structure . . . . 68

4.5.2 Particle Extraction . . . . 70

4.5.3 Image Processing . . . . 70

4.6 Conclusion . . . . 71

5. PARALLEL PROCESSING METHODS AND APPLICATIONS . . . . 73

5.1 Introduction . . . . 73

5.2 Adaptive Load Balancing Strategies . . . . 73

5.2.1 Problem Definition . . . . 73

5.2.2 Macromolecular Structure Computations . . . . 74

5.2.3 Experimental Studies . . . . 76

5.3 Parallel Algorithms for Objects with High Symmetry . . . . 82

5.3.1 Problem Definition . . . . 82

5.3.2 Structure Determination of Spherical Viruses . . . . 83

5.4 Conclusion . . . . 87

6. CONCLUSIONS AND FUTURE WORK . . . . 88

6.1 Algorithms and Heuristics . . . . 88

6.2 Software Packages . . . . 89

6.3 Future Work . . . . 90

BIBLIOGRAPHY . . . . 92

Appendix A: Data Sets Used in the Visualization Examples . . . . 97

Appendix B: Electron Micrographs Used for Testing the Crosspoint Method . . . . 98

Appendix C: Data Formats Accepted in Tonitza and EMMA . . . . 100

Appendix D: Using Tonitza for the Development of Cluster Labeling Algorithms . . . . 104

VITA . . . . 105

LIST OF PUBLICATIONS . . . . 106

LIST OF TABLES

Table Page

3.1 Selected software packages used in molecular modeling . . . . 17

4.1 Timing results for the Crosspoint Method . . . . 64

4.2 Number of particles identified by the Crosspoint Method . . . . 65

4.3 Sensitivity of the Crosspoint Method to changes in the particle radius in the case of the PHIX-B micrograph. True radius is approximately 25 pixels.. . . . 69

5.1 Approximate values of break points and slopes as the number N of work units varies (P = 20, t₀ = 2, α = 0.1) . . . . 77

5.2 Approximate values of break points and slopes as the transfer latency t₀ varies (P = 20, N = 500, α = 0.1) . . . . 78

5.3 Approximate values of break points and slopes as the dynamic load factor η varies (P = 20, t₀ = 2, α = 0.1) . . . . 79

5.4 Execution time (in seconds) for single and double interpolation . . . . 86

C.1 The MAP INTEGER*2 format . . . . 101

C.2 The PURDUE INTEGER*2 format . . . . 101

C.3 The PARTICLE format . . . . 102

C.4 The SGI header format . . . . 103


LIST OF FIGURES

Figure Page

2.1 An icosahedron viewed along each of its symmetry axes. . . . 6

2.2 Schematic representation of the diffraction process . . . . 7

2.3 Portion of a diffraction pattern from an HRV14 crystal . . . . 8

2.4 Fitting a structure in an electron density map . . . . 11

2.5 Image formation in a lens (from [BAKE95]) . . . . 12

3.1 A snapshot of the screen during a Tonitza session . . . . 20

3.2 The colormap editor . . . . 23

3.3 The material editor . . . . 24

3.4 Data rotation window . . . . 25

3.5 Average electron density as a function of the particle radius . . . . 26

3.6 Correlation coefficient as a function of the magnification factor . . . . 27

3.7 Correlation coefficient as a function of the particle radius . . . . 27

3.8 Ross-River virus surface with antibody fragments attached. The surface of the antibodies was computed as a difference map between the complex form of the virus attached with antibodies and the native virus structure. . . . 28

3.9 Electron density histogram for the Ross-River virus structure . . . . 29

3.10 Equatorial contour map for the Ross-River virus structure . . . . 30


3.12 Stack of mask contours for a Coxsackievirus B3 asymmetric unit . . . . 31

3.13 Spherical sections viewed along two-, three-, and five-fold axes in the case of the Ross-River virus . . . . 32

3.14 Shaded isosurface representations of Ross-River virus data: (a) boxed spike, (b) view along a three-fold axis overlaid with a lattice which shows the positions of the symmetry elements . . . . 33

3.15 Ambiguous case in the Marching Cubes method: depending on the value at the centre of the top face of the cube, either (a) the hexagon, or (b) the two triangles belong to the surface polygonalization . . . . 35

3.16 Naive implementation of the Marching Cubes algorithm . . . . 35

4.1 Histogram of a low-dose digitized electron micrograph (the numbers on the left represent the indices of the gray levels present in the image) . . . . 44

4.2 Pseudo-code for the histogram equalization algorithm . . . . 45

4.3 Histogram of the same electron micrograph as in Figure 4.1 after equalization . . . . 45

4.4 Spatial lowpass filter of size . . . . 46

4.5 A basic highpass spatial filter . . . . 46

4.6 Mask used for high-boost filtering . . . . 47

4.7 The Sobel operator masks . . . . 48

4.8 Mask used to compute the Laplacian . . . . 48

4.9 Circle detection using the Hough transform . . . . 49

4.10 Schematic representation of the 3D reconstruction process (from [DERO68]) . . . . 53

4.11 (a) Portion of a micrograph containing the images of several virus particles, (b) the variation of the pixel intensity along the horizontal line shown in (a) . . . . 54

4.12 The micrograph in Figure 4.11(a) after a Sobel transformation . . . . 55

4.13 The steps of the Crosspoint Method: (a) original micrograph, (b) the micrograph after histogram equalization, (c) the micrograph after histogram equalization and averaging, (d) the final result of the Crosspoint Method . . . . 56

4.14 Pseudo-code describing the marking phase of the Crosspoint Method . . . . 58

4.15 Various situations in the clustering algorithms: (a) eight neighbors of the current pixel are considered in the stack algorithm, (b) four neighbors of the current pixel are considered in the coloring algorithm, (c) two separate clusters must be merged in the coloring algorithm . . . . 59

4.16 Pseudo-code for the stack algorithm . . . . 60

4.17 Pseudo-code for the coloring algorithm . . . . 61

4.18 Disconnecting particles by thinning: (a) particle identification without thinning, (b) particle identification with thinning . . . . 63

4.19 The result of the Crosspoint Method applied to the PHIX-A micrograph . . . . 65

4.20 The result of the Crosspoint Method applied to the PHIX-B micrograph . . . . 66

4.21 The result of the Crosspoint Method applied to the BMV micrograph . . . . 66

4.22 The result of the Crosspoint Method applied to the REO-18 micrograph . . . . 67

4.23 The result of the Crosspoint Method applied to the T1LHC micrograph . . . . 67

4.24 A snapshot of the screen during an EMMA session . . . . 72

5.1 Plot of imbalance versus dispersion as η varies . . . . 80

5.2 Plot of imbalance versus dispersion as N varies . . . . 81

5.3 Generic model of the load imbalance as a function of the load dispersion . . . . 81

5.4 Asymmetric unit and its border region . . . . 82

5.5 The convergence rate of the single and double interpolation methods . . . . 86

6.1 A parallel virtual environment for structural biology computations . . . . 90

D.1 Using Tonitza to verify the correctness of a parallel labeling algorithm: (a) correct labeling of clusters, (b) incorrect labeling of clusters . . . . 104


ABSTRACT

Martin, Ioana M. Boier Ph.D., Purdue University, August 1996. Scientific Data Visualization and Digital Image Processing for Structural Biology. Major Professor: Dan C. Marinescu.

This thesis focuses on the design and development of algorithms and tools for interactive data visualization and digital image processing of large data sets produced in structural biology experiments.

We describe various computational and visualization algorithms, which we have developed and implemented as part of the Tonitza package for interactive visualization and analysis of structural biology data. The computational algorithms include methods for fitting of data sets using correlation and scaling, various types of interpolation, and algorithms for generating statistical information. Several two- and three-dimensional representations of the data sets are described in detail.

We present a number of image processing methods for extracting information from images, and we discuss their applications to electron microscopy. Such methods extend the scientist’s ability to study images of biological structures. We describe the Crosspoint Method, a new technique we developed for automatic detection of the positions of virus particles in electron micrographs. We present the heuristics and the algorithms involved and compare the results obtained with those reported in the literature. We introduce EMMA, a new package for digital processing of micrograph images which includes the Crosspoint Method.

As part of the research presented in this thesis, we also describe several parallel algorithms and load balancing schemes for structural biology applications. They helped improve our understanding of the complexity of the problems involved and of how the data is transformed from the moment it is collected until the moment it is displayed.

1. INTRODUCTION

1.1 Motivation and Goals

The determination of the structure of biological macromolecules has traditionally required the most powerful computers available. Parallel and distributed computing has opened new possibilities in structural biology by allowing the analysis of larger and more complex problems, but interfacing with such computing systems has become more difficult. Complex problem solving environments with friendly user interfaces, which support data visualization, knowledge processing, and automatic data migration, are necessary. The rate of data acquisition has also improved considerably because of more powerful synchrotrons and data collection devices such as CCD (Charge Coupled Device) detectors. Data rates of one frame every few seconds, with a frame consisting of up to 4096×4096 pixels of 16 to 24 bits each, will be quite common in a few years. Though most of the data processing can be done off line, the sheer volume of data collected in structural biology experiments requires efficient algorithms, data storage systems, and powerful computers.

The research presented in this thesis is part of an interdisciplinary effort to apply high performance computing techniques to structural biology. Such techniques fall into three main categories: (a) the development of parallel algorithms and data management strategies for performing various tasks [CORN94], (b) the design of interactive tools for digital image processing, data visualization and analysis [BOIR96], and (c) the development of a parallel virtual environment that supports concurrent execution of parallel and sequential programs on computing platforms with various architectures [SIRB96].

The goals of this research are: (a) to develop a graphics environment which combines interactive visualization of large data sets with computational methods that allow for data analysis in terms of its structural significance, (b) to design a new algorithm for automatic selection of virus particles from electron micrographs and a digital image processing package which is centered around the implementation of this algorithm, (c) to simulate the behavior of computationally intensive irregular problems for a given class of load balancing schemes, and (d) to study and compare the performance of two different algorithms for problems involving objects with a high degree of symmetry.

1.2 Organization of the Thesis

This thesis focuses on applications of high-performance computing methods to structural biology. The emphasis is on computer graphics and digital image processing methods and their use for the determination of the structure of biological macromolecules. Sequential and parallel computational aspects are also considered.

Chapter 2 gives a brief introduction to structural biology. Fundamental concepts related to structure determination, such as crystals, diffraction, image formation, etc., are defined. These concepts are essential for understanding the design of the algorithms and methods presented in later chapters.

Chapter 3 is devoted to combining data visualization with computations to allow biologists to analyze, manipulate, display, and debug data. An interactive graphics system we developed for this purpose is described and compared with other existing molecular modeling software packages. Algorithms, computational methods, software engineering issues, and specific design problems are also discussed. Examples based on data sets representing virus structures recently under study are included.

Chapter 4 is centered on the Crosspoint Method, a new algorithm for automatic selection of virus particles from electron micrographs. The method, results, and comparisons with existing automatic selection procedures are presented in detail. The method is discussed in the context of traditional digital image processing techniques, some of which are described. An interactive image processing package which includes the Crosspoint Method, traditional image processing transformations, and a mechanism for manual selection and extraction of individual particle images is also described.

A glimpse at the computational challenges involved in generating the data sets used for structure determination is offered in Chapter 5. A simulation experiment designed to estimate the performance of hybrid load balancing schemes on distributed memory MIMD systems is presented. An application for which such schemes have proven useful is discussed. A comparison between two different techniques for performing computations related to highly symmetrical structures is also described.

Chapter 6 summarizes the work presented in this thesis and describes some considerations for future research.

1.3 Research Contributions

The most important contribution of the research described in this thesis is the Crosspoint Method for automatic detection of the positions of virus particles in electron micrographs (see section 4.4.3 on page 55). This method combines traditional image processing techniques with heuristics and a new algorithm for the detection of particle centers. The complexity of the method is linear in the number of pixels in the image as opposed to the O( ) complexity of the algorithms described in the literature [HEEL82], [FRAN84], [OLSO89], [THUM95]. The results obtained using this method are much better than those previously reported, both in terms of the running time and the quality of the solution. The Crosspoint Method is used to replace manual selection of particles and thus to improve the efficiency of the three-dimensional structure reconstruction process. The resolution of three-dimensional reconstructions can be increased by allowing a larger number of particle projections to be considered, but this is feasible only using an automated procedure for the identification of the positions of individual particles. The method is also very important for the control of the electron microscope.

EMMA is a new image processing package built around the Crosspoint Method which allows the structural biologist to employ the automatic virus particle detection algorithm in an interactive, user-friendly environment. The package also provides capabilities for traditional processing of electron micrographs.

A new correlation method for fitting data sets is outlined in section 3.3.3 on page 24. This method is iterative and allows scaling of data sets obtained from separate computations relative to each other. It is particularly useful for analyzing the differences between two similar structures.


Tonitza is a graphics package for the visualization, analysis and manipulation of very large data sets specific to structural biology, developed as part of the work reported in this thesis. The integration of data manipulation and visualization offers distinct advantages (e.g., uniform graphical user interface, increased efficiency) over using separate packages for each of these tasks.

2. FUNDAMENTAL CONCEPTS IN STRUCTURAL BIOLOGY

2.1 Crystals and Diffraction

Biomacromolecules often occur naturally or in vitro as organized structures composed of subunits arranged in a symmetrical way. Such structures are readily studied by diffraction methods. Some of the fundamental concepts concerning crystalline matter, symmetry relationships and diffraction theory are summarized in this chapter.

A crystal is a regular arrangement of atoms, ions, or molecules, and is conceptually built up by the continuing translational repetition of some structural pattern [BAKE95]. This pattern, or unit cell, may contain one or more molecules or a complex assembly of molecules. In three dimensions, the unit cell is defined by three edge lengths a, b, c, and three interaxial angles, α, β, γ. The seven three-dimensional crystal systems (triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal, and cubic) arise from the seven basic space-filling shapes that unit cells can adopt.

A lattice is a mathematical formalism that defines an infinite array of imaginary points: each point in the lattice is identical to every other point. That is, the view from each point of a lattice is identical to the view in the same direction from any other point (this condition is not obeyed by finite crystals). Three-dimensional lattices are defined by three translations a, b, c, and three axes at angles α, β, γ to each other. Crystal lattices may be primitive, with one lattice point per unit cell, or centered, containing two or four points per unit cell. The crystal structure is built up by placing a motif at every lattice point. The motif may be asymmetric or symmetric.

The symmetry of any crystal of a biological molecule can be described only by rotations and/or translations. This is because biological molecules such as proteins consist mainly of L-amino acids and hence reflection and inversion symmetries are not allowed. Such structures are called enantiomorphic. The crystal structure, crystal lattice, and motif are all restricted in the symmetries they can display, but biomacromolecular assemblies themselves are not restricted, in the sense that they may display additional internal, non-crystallographic symmetry (i.e., symmetry that is not contained within the allowed lattice symmetries). The symmetry of a three-dimensional structure is described by a space group. There are 230 ways to generate a regular pattern from a motif associated with a three-dimensional lattice, hence 230 space groups. Only 65 space groups are compatible with the enantiomorphic biological structures.
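As a small worked illustration of how the six unit cell parameters are used in practice, the sketch below computes the volume of a general (triclinic) cell from the edge lengths a, b, c and the three interaxial angles. The function name `unit_cell_volume` and the numerical cell dimensions are hypothetical and chosen only for the example; the formula is the standard expression for the volume of a parallelepiped.

```python
import math

def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Volume of a general unit cell from edge lengths and interaxial angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(angle))
                  for angle in (alpha_deg, beta_deg, gamma_deg))
    # V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
    #                + 2 cos(alpha) cos(beta) cos(gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

# Hypothetical cells: a cubic cell and a hexagonal cell (gamma = 120 degrees).
cubic = unit_cell_volume(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
hexagonal = unit_cell_volume(10.0, 10.0, 20.0, 90.0, 90.0, 120.0)
print(cubic, hexagonal)
```

For the cubic cell the expression reduces to a³, and for the hexagonal cell to a²c sin 120°, which gives a quick sanity check of the general formula.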

The asymmetric unit is a part of a symmetric object from which the object can be generated by symmetry operations. The number of asymmetric units may be less than, equal to, or greater than the number of molecules in the unit cell. If the number of asymmetric units is equal to or less than the number of molecules in the unit cell, then the molecule either contains no symmetry or it contains non-crystallographic symmetry. If the number of asymmetric units is greater than the number of molecules in the unit cell, then the molecules must occupy special positions and possess the appropriate symmetry elements of the space group.

Of special interest for structure determination is icosahedral symmetry, which denotes the symmetry of an icosahedron and/or that of a dodecahedron. This type of symmetry governs the arrangement of protein subunits within the shells of spherical viruses [BRAN91]. Figure 2.1 shows the symmetry properties of an icosahedron. There are three different types of rotations that bring it into self-coincidence. The corresponding symmetry elements are six five-fold axes (passing through the twelve vertices), ten three-fold axes (through the centers of the twenty faces), and fifteen two-fold axes (through the midpoints of the thirty edges).
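The relation between the vertices, faces, and edges of an icosahedron and its rotation axes can be checked with a short computation. The sketch below is only an illustration: it builds the twelve vertices from the golden ratio, counts edges and triangular faces, and uses the fact that each rotation axis passes through a pair of opposite elements, so twelve vertices, twenty faces, and thirty edges yield six, ten, and fifteen distinct axes. All function and variable names are hypothetical.

```python
import itertools
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio

def icosahedron_vertices():
    """The 12 vertices (0, +-1, +-PHI) together with their cyclic permutations."""
    verts = []
    for s1, s2 in itertools.product((1.0, -1.0), repeat=2):
        x, y, z = 0.0, s1, s2 * PHI
        verts += [(x, y, z), (z, x, y), (y, z, x)]
    return verts

def count_edges_and_faces(verts, edge_length=2.0, tol=1e-9):
    """Edges are vertex pairs at the edge length; faces are mutually adjacent triples."""
    edges = {(i, j) for i, j in itertools.combinations(range(len(verts)), 2)
             if abs(math.dist(verts[i], verts[j]) - edge_length) < tol}
    faces = [t for t in itertools.combinations(range(len(verts)), 3)
             if all(pair in edges for pair in itertools.combinations(t, 2))]
    return len(edges), len(faces)

verts = icosahedron_vertices()
n_edges, n_faces = count_edges_and_faces(verts)

# Each axis joins two opposite vertices, face centers, or edge midpoints.
five_fold_axes = len(verts) // 2
three_fold_axes = n_faces // 2
two_fold_axes = n_edges // 2

# Identity + 4 non-trivial rotations per five-fold axis + 2 per three-fold + 1 per two-fold.
n_rotations = 1 + 4 * five_fold_axes + 2 * three_fold_axes + two_fold_axes
print(five_fold_axes, three_fold_axes, two_fold_axes, n_rotations)
```

The total of 60 rotations is the order of the icosahedral rotation group, which is why an icosahedral virus shell is built from 60 equivalent asymmetric units.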

Diffraction methods, including X-ray, neutron, electron, and optical diffraction, provide a powerful way to study molecular structures. The ultimate goal is to understand the chemical properties of molecules by determining their atomic structures. Presently, only diffraction techniques such as X-ray and neutron diffraction are routinely capable of revealing the arrangement of atoms in molecular structures.

Diffraction is the non-rectilinear propagation of electromagnetic radiation and occurs when an object scatters the incident radiation. The rays scattered from different portions of the object interfere both constructively and destructively, producing a diffraction pattern which can be recorded. The recording can be made either on film (the classical method) or using an electronic detector which feeds the detected signals directly into a computer in digitized form. Figure 2.2 is a schematic representation of the diffraction process. Figure 2.3 shows a portion of the diffraction pattern of an HRV14 crystal [STEL96]. The diffraction pattern consists of points, also called reflections. Each reflection arises from interference of rays scattered from all irradiated portions of the object. Structure determination by diffraction methods involves measuring or calculating structure factors at all points in the diffraction pattern. Each structure factor is a complex number, described by amplitude and phase.

Figure 2.2 Schematic representation of the diffraction process (diagram labels: incident radiation, crystal, diffracted rays, photographic film)

Figure 2.3 Portion of a diffraction pattern from an HRV14 crystal

Amplitude is the strength of interference at a particular point and is proportional to the square root of the intensity in the recorded pattern. The phase is the relative time of arrival of the scattered radiation at a particular point (e.g., photographic film), and this information is lost when the diffraction pattern is recorded.
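The loss of phase information can be made concrete with a few lines of code. The sketch below is illustrative only: it treats a structure factor as a complex number, takes the recorded intensity to be the squared amplitude (a proportionality constant of one is assumed), and shows that two structure factors with the same amplitude but different phases produce the same recorded intensity. The numerical values are hypothetical.

```python
import cmath
import math

# Hypothetical structure factor: amplitude 12.0, phase 75 degrees.
structure_factor = cmath.rect(12.0, math.radians(75.0))

# The detector records only the intensity, assumed here to equal |F|^2.
intensity = abs(structure_factor) ** 2

# The amplitude can be recovered from the recording ...
amplitude_from_intensity = math.sqrt(intensity)

# ... but a structure factor with a different phase gives the same intensity,
# so the phase cannot be recovered from the recording alone.
other_factor = cmath.rect(12.0, math.radians(-140.0))
same_intensity = abs(other_factor) ** 2

print(amplitude_from_intensity, intensity, same_intensity)
```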

Sir W. L. Bragg of Cambridge University described diffraction from crystals as arising from the reflection of radiation from imaginary parallel planes of electron density. Each set of planes is characterized by three Miller indices, (h, k, l), which are the reciprocals of the intercepts, in unit cell edge lengths, that the set of planes makes with the axes of the unit cell. The intensity of each (h, k, l) reflection is proportional to the electron density distribution in the (h, k, l) planes [BLUN76].
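The definition of Miller indices as reciprocals of axial intercepts can be sketched as follows. This is a minimal illustration, not part of any package described in this thesis: the function name `miller_indices` is hypothetical, a plane parallel to an axis is represented by `None` (an intercept at infinity), and the result is reduced to the smallest integers, which is a simplifying assumption (in diffraction work, higher-order reflections such as (2, 2, 0) are kept distinct from (1, 1, 0)).

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def miller_indices(intercepts):
    """Miller indices from intercepts given in units of the cell edges a, b, c.

    An intercept of None means the plane never cuts that axis (index 0).
    An intercept of exactly 0 is not meaningful and raises ZeroDivisionError.
    """
    reciprocals = [Fraction(0) if t is None else 1 / Fraction(t) for t in intercepts]
    # Multiply by the least common multiple of the denominators to obtain integers.
    common = reduce(lambda m, f: m * f.denominator // gcd(m, f.denominator), reciprocals, 1)
    integers = [int(f * common) for f in reciprocals]
    # Reduce to lowest terms (assumption: only the plane orientation is of interest).
    divisor = reduce(gcd, (abs(i) for i in integers))
    return tuple(i // divisor for i in integers) if divisor else tuple(integers)

print(miller_indices((1, 2, 3)))                  # plane cutting a, 2b, 3c
print(miller_indices((Fraction(1, 2), 1, None)))  # plane cutting a/2 and b, parallel to c
```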

2.2 X-ray Crystallography

X-rays are short wavelength electromagnetic radiation, emitted when electrons jump from a higher to a lower energy state. In laboratories, X-rays are produced by high-voltage tubes in which a metal plate is bombarded with highly accelerated electrons, causing X-rays of a specific wavelength to be emitted. More powerful X-ray beams are produced using synchrotron storage rings, where electrons travel close to the speed of light.

The first prerequisite for solving a three-dimensional structure of a protein by X-ray crystallography is a well-ordered crystal that strongly diffracts X-rays. Crystallization is often quite difficult to achieve for large macromolecules, and crystal growth can be slow [BRAN91].

For a typical protein crystal of a small macromolecule such as myoglobin, each of the roughly 20,000 measured diffracted beams contains scattered X-rays from each of the 1500 or so atoms in the molecule. Extracting information about individual atoms from such a system requires considerable computational effort.

A major concern of structure determination using X-ray crystallography is retrieving the phase information lost upon recording of the diffraction pattern. The phases could be obtained if it were possible to focus the scattered rays with a lens to form an image. Unfortunately, such lenses do not exist, and some other method must be used. For small molecules (several thousands of atoms), the “phase problem” was solved by Nobel Prize winners J. Karle and H. Hauptman [HAUP53], who developed the so-called direct methods. For large proteins (millions of atoms), however, the problem remains, and several techniques, such as Multiple Isomorphous Replacement and Molecular Replacement, have been devised [BRAN91].

The Molecular Replacement method (MR) [ROSS72] utilizes the identity of structure in different parts of the crystallographic asymmetric unit, caused by the repetition of the same subunit structure in the formation of the whole molecule. It may also use the relationship between different crystal forms of the same or similar molecules. The method consists of solving three main problems.

(a) The rotation problem: systematic inspection of the Patterson map (this is a map of vectors connecting the heavy atoms [BRAN91]) allows determination of the relative orientation of independent molecules (or subunits of molecules) within one crystal lattice or between different crystal forms. Rotation matrices, $R$, are computed in this stage.

(b) The translation problem: the translation of subunits must be determined with respect to the designated crystallographic symmetry elements. This may also be done by inspection of the Patterson function. A translation vector, $\mathbf{d}$, is computed in this stage. Using the rotation matrices determined in the previous stage, the exact relationship between a point $\mathbf{x}$ in a standard molecule (or subunit) and the corresponding point $\mathbf{x}'$ in a different subunit can be expressed as $\mathbf{x}' = R\,\mathbf{x} + \mathbf{d}$.

(c) The phase problem: determine the phases corresponding to the recorded amplitudes. Two types of symmetry are considered at this stage: the crystallographic symmetry, independent of the molecules under study, and the non-crystallographic symmetry, occurring within a molecule itself. In order to determine the phases which will allow determination of the molecular structure, a number of equations are set up. These represent the condition that, for the set of observed structure factor amplitudes, the electron density distribution within the volume of the unit cell is identical within all subunits related by both crystallographic and non-crystallographic symmetry, and that it is zero or constant outside these volumes.

The iterative computation required for solving these equations is currently being carried out on parallel machines, in a process known as phase refinement and extension. The amplitudes of the structure factors are obtained by processing the diffraction data and are used to determine the non-crystallographic symmetry. An initial, low resolution model for the phases (and hence for the molecular structure) is obtained by one of a series of methods. For example, a virus may be assumed to be a hollow shell, or the model of a related virus may be used. In the first iteration, the phases derived from the model are combined with the corresponding observed structure factor amplitudes. In subsequent iterations, calculated structure factor amplitudes are replaced by observed ones. In the next step, the model is expanded from one asymmetric unit to the whole unit cell. An electron density map is computed by Fourier synthesis. The electron density values are then averaged among all the structurally identical non-crystallographic units. As a result, a new and more accurate map is obtained. Using this map and an inverse Fourier transform, a set of calculated structure factors (phases and amplitudes) is produced (at this point, the resolution can also be extended). These phases, better than the original ones, are combined with the observed amplitudes, replacing the previous, less exact phase set, and the entire cycle is repeated. After a number of cycles the phases usually converge. Convergence is measured by a correlation coefficient which relates the observed structure factors (determined experimentally) and the calculated ones.
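The convergence measure mentioned above can be illustrated with a small example. The text does not specify the exact formula, so the sketch below assumes the standard linear (Pearson) correlation coefficient between observed and calculated structure factor amplitudes; the function name and all amplitude values are hypothetical.

```python
import math

def correlation_coefficient(f_obs, f_calc):
    """Linear correlation between observed and calculated amplitudes (assumed form)."""
    n = len(f_obs)
    mean_obs = sum(f_obs) / n
    mean_calc = sum(f_calc) / n
    covariance = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(f_obs, f_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in f_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in f_calc)
    return covariance / math.sqrt(var_obs * var_calc)

# Hypothetical amplitudes for five reflections.
observed = [10.0, 25.0, 7.5, 40.0, 18.0]
early_cycle = [14.0, 19.0, 12.0, 30.0, 25.0]   # calculated after an early cycle
late_cycle = [10.5, 24.0, 8.0, 39.0, 18.5]     # calculated after a later cycle

cc_early = correlation_coefficient(observed, early_cycle)
cc_late = correlation_coefficient(observed, late_cycle)
print(round(cc_early, 3), round(cc_late, 3))
```

In such a scheme the coefficient would be monitored from cycle to cycle, and refinement stopped once it no longer increases appreciably.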


Figure 2.4 Fitting a structure in an electron density map

The electron density map has to be interpreted as a polypeptide chain with a particular amino acid sequence. Several limitations of the data may complicate the interpretation of this map. The map itself may contain approximation errors in the phases. The resolution of the data is also a factor that affects the quality of the map and is, in turn, affected by the quality of the crystal. From a map at low resolution (10 Å or larger) one can obtain only the shape of the molecule. At medium resolution (between 4 and 6 Å) it is possible to distinguish some secondary structural features (e.g., α-helices, β-sheets), and at higher resolution the path of the polypeptide chain can be traced and a known amino acid sequence can be fit into the map. Very high resolution (1 Å) is required to observe atoms as discrete spheres of density. However, few structures have been determined to such high resolution. Figure 2.4 shows a two-dimensional example of how a known model can be fitted into an electron density contour map, by placing the atoms of the known structure at peaks of electron density. Mask maps are sometimes used in conjunction with electron density maps to distinguish between different particles.

Computer graphics is currently used extensively for both chain tracing and model building, to present the data and to manipulate the models.

2.3 Electron Microscopy

In the case of electron microscopy, structure can be directly visualized because the electrons can be focused with lenses to form images. In the absence of noise, an image is considered to contain structural information (amplitudes and phases of the structure factors) in directly interpretable form. According to Abbe’s theory, image formation is a two-stage, double-diffraction process. That is, an image is the diffraction pattern of the diffraction pattern of an object (see Figure 2.5).

Figure 2.5 Image formation in a lens (from [BAKE95])

With an ideal lens system, an image depicts every detail present in the object. In the first stage a parallel beam of rays incident on the object is scattered and the interference pattern (Fraunhofer diffraction pattern) is brought to focus at the back focal plane of the lens. This stage is also referred to as the forward Fourier transformation. The intensity distribution of the recorded diffraction pattern of an object is proportional to the squared modulus of the Fourier transform of that object. The second stage of image formation occurs when the scattered radiation passes beyond the back focal plane of the lens and interferes (recombines) to form an image. This is the inverse Fourier transformation stage. Note that the image does not exactly represent the object, because some scattered rays never enter the lens and cannot be focused at the image plane.
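The two stages of image formation can be mimicked numerically. The sketch below is a deliberately simplified, one-dimensional illustration using a direct discrete Fourier transform: the forward transform plays the role of the diffraction pattern at the back focal plane, the inverse transform recombines it into an image, and a limited aperture is modeled by discarding the highest spatial frequencies. The object values, the cutoff, and all names are hypothetical.

```python
import cmath

def dft(values, sign=-1):
    """Direct discrete Fourier transform (sign=-1 forward, sign=+1 unnormalized inverse)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * k * x / n)
                for x, v in enumerate(values))
            for k in range(n)]

def idft(coefficients):
    """Inverse transform, normalized by the number of samples."""
    n = len(coefficients)
    return [c / n for c in dft(coefficients, sign=+1)]

obj = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]   # hypothetical 1D object

pattern = dft(obj)                                # stage 1: diffraction pattern
intensity = [abs(f) ** 2 for f in pattern]        # what a detector at that plane records
image = [z.real for z in idft(pattern)]           # stage 2: recombination into an image

# Finite aperture: rays scattered to high angles (high frequencies) never enter the lens.
n = len(pattern)
apertured = [f if min(k, n - k) <= 2 else 0.0 for k, f in enumerate(pattern)]
blurred = [z.real for z in idft(apertured)]

ideal_error = max(abs(a - b) for a, b in zip(image, obj))
aperture_error = max(abs(a - b) for a, b in zip(blurred, obj))
print(ideal_error, aperture_error)
```

With all frequencies retained the image reproduces the object to rounding error, whereas the apertured image differs from it, which is the numerical counterpart of the loss of detail described above.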

Image processing of electron micrographs provides an objective way to extract reliable structural information from noisy images. Noise appears in all micrographs to various extents and can arise from a variety of sources, such as the specimen itself, the specimen support film, the microscope, and photography.

The goal of structure determination by means of electron microscopy is to produce three-dimensional electron density maps based on information obtained from projected specimen images. The theoretical foundation for this three-dimensional reconstruction process is given by the Projection Theorem, which states that the Fourier transform of the projected structure of a three-dimensional object is equivalent to a two-dimensional central section of the three-dimensional Fourier transform of the object, normal to the direction of projection [CROW70]. Indeed, the Fourier transform of the function $\rho(x, y, z)$ is:

$$F(X, Y, Z) = \iiint \rho(x, y, z)\, e^{2 \pi i (xX + yY + zZ)}\, dx\, dy\, dz.$$

The central section $Z = 0$ of the transform is given by:

$$F(X, Y, 0) = \iint \sigma(x, y)\, e^{2 \pi i (xX + yY)}\, dx\, dy, \quad \text{with} \quad \sigma(x, y) = \int \rho(x, y, z)\, dz.$$

This approach is similar to conventional X-ray crystallography, except that in this case the phases can be computed from the image. The different views may be collected either from a single particle by using a tilting stage in the microscope, or from a number of particles in different but identifiable orientations. In general, it is desirable to combine data from different particles, so that imperfections can be averaged out [CRRA71]. Methods for digital image processing of electron micrographs are presented in detail in section 4.3 on page 50.
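The Projection Theorem can be checked numerically on a small grid. The following is a minimal sketch, assuming a discrete Fourier transform on a cubic grid in place of the continuous integrals; the function names and the toy density are hypothetical, and the sign of the exponent follows the convention used in the text.

```python
import cmath

def dft3_at(rho, X, Y, Z):
    """One sample F(X, Y, Z) of the 3-D discrete Fourier transform of rho[x][y][z]."""
    n = len(rho)
    return sum(rho[x][y][z] * cmath.exp(2j * cmath.pi * (x * X + y * Y + z * Z) / n)
               for x in range(n) for y in range(n) for z in range(n))

def dft2_at(sigma, X, Y):
    """One sample of the 2-D discrete Fourier transform of sigma[x][y]."""
    n = len(sigma)
    return sum(sigma[x][y] * cmath.exp(2j * cmath.pi * (x * X + y * Y) / n)
               for x in range(n) for y in range(n))

def project_along_z(rho):
    """Projected structure sigma(x, y): the sum of rho(x, y, z) over z."""
    return [[sum(column) for column in plane] for plane in rho]

# Toy density on a 4 x 4 x 4 grid (arbitrary values, for illustration only).
n = 4
rho = [[[((3 * x + 5 * y + 7 * z) % 11) / 10.0 for z in range(n)]
        for y in range(n)] for x in range(n)]
sigma = project_along_z(rho)

# Largest deviation between the central section Z = 0 of the 3-D transform
# and the 2-D transform of the projection; it should be at rounding level.
max_difference = max(abs(dft3_at(rho, X, Y, 0) - dft2_at(sigma, X, Y))
                     for X in range(n) for Y in range(n))
print(max_difference)
```

Projections taken along other directions supply other central sections, which is how a set of views fills in the three-dimensional transform.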

2.4 Other Methods for Structure Determination

X-ray crystallography and electron microscopy are only two of a number of methods developed to investigate molecular structures. Diffraction is only one of the many phenomena that can be exploited to gather information about the arrangement of atoms within molecules. For example, certain atomic nuclei have a magnetic moment or spin. The chemical environment of such nuclei can be probed by Nuclear Magnetic Resonance (NMR). This technique can be exploited to give information on the distances between atoms in a molecule. These distances can then be used to derive a three-dimensional model of the molecule. NMR and X-ray crystallography are in many respects complementary. X-ray crystallography deals with the structure of proteins in crystalline state, whereas NMR determines the structure in solution. X-ray crystallography is more suitable for characterization of protein surfaces and the water structure around the protein, whereas NMR is more suitable for investigation of dynamic processes such as those during folding. X-ray crystallography remains the only method available to determine the structure of large protein molecules, whereas NMR is the method of choice for small protein molecules that might be difficult to crystallize [BRAN91].

Structural studies are often insufficient to infer the function of a protein from its structure. It is then necessary to combine biochemical studies with structural information. The specific role of each amino acid residue for the function of the protein can be tested by making specific mutations of that residue and examining the properties of the new protein. By combining such functional studies in solution, DNA techniques, and three-dimensional structure determination, scientists are trying to gain insights into the way molecules work.


3. COMBINING DATA VISUALIZATION WITH COMPUTATIONS

3.1 Overview

Graphics packages for data exploration like IRIS Explorer [EXPL91] and AVS (Application Visualization System) [AVSP89] offer distinct advantages to scientists and engineers in need of data visualization tools. Yet, when data visualization and very specific computations are tightly coupled, the use of generic graphics packages loses some of its appeal. For example, displaying two three-dimensional reconstructions of a spherical virus simultaneously may require resizing one of the data sets relative to the other so that the two are brought to the same scale [BAKE95]. Finding the optimal magnification factor to be used for resizing is based on computing the correlation of the two data sets. The correlation procedure is iterative and consists of defining the correlation regions (one is not interested in correlating densities within the nucleic acid core) and the magnification range, determining an optimal magnification factor that maximizes the correlation coefficient, refining it, and scaling one map relative to the other. For such a sequence of computation and visualization steps, the integration of data transformations and data visualization into a specialized tool provides distinct advantages: there is a uniform graphical user interface, and efficiency is increased because data manipulation is done in main memory and the need for I/O operations is minimized.

Considerations like the ones described above provided the motivation for Tonitza, an interactive package for visualization, analysis, and data manipulation. It is aimed at the field of structural biology, but it can be used to explore any scalar field data (see Appendix D). It allows processing and display of multivariate gridded data in a variety of representations and it is available on graphics workstations running OpenGL [OPGL93].


3.2 Survey of Software Packages for Structural Biology

A variety of generic as well as specialized software packages for molecular modeling are available today. Generic visualization packages like AVS [AVSP89] and IRIS Explorer [EXPL91] are suitable for physical sciences and engineering and can be used for molecular modeling. Such packages have a modular design and application-specific software can be generated by combining different modules into flow networks. Flow networks are then executed as scripts.

According to [NIHG96], specialized software packages used for molecular modeling can be classified into several categories, based on their functionality (see also Table 3.1). (a) Structure database manipulation packages are designed to either search and access the major chemical structure databases (the Brookhaven Protein Databank, the Cambridge Structural Database, the Drug Information System 3D Database) or create, maintain and search local chemical structure databases. PDBtool, for example, is a Protein Data Bank browser that allows querying and verification of macromolecular structure data. It includes a three-dimensional viewer and renderer and a variety of graphical and textual structure verification tools. It runs on SGI and SUN platforms. (b) Model building packages allow the user to fit structures into electron density maps and/or compute and display all the molecular graphs that correspond to a given chemical formula, prescribed and forbidden substructures, optional conditions (intervals for possible ring sizes, hybridization of carbon atoms), stereoisomers, etc. An example is O, a general purpose macromolecular modeling environment. The program supports model building and display of macromolecules. Due to specific features such as building and rebuilding of models into electron density, O is mainly aimed at the field of protein crystallography. The program is built on top of a versatile database system which contains the entire molecular data in a predefined structure. It is available on ESV, SGI, and HP platforms. Another example is XtalView, a complete package for solving macromolecular crystal structures by isomorphous replacement, including building the molecular model. It runs on Sun, DEC, SGI, IBM, and PC computers and takes full advantage of the modern workstation environment. (c) Structure drawing/visualization packages have revolutionized the publication and presentation of chemistry information. In addition, they can be used to visualize and manipulate structures in various representations and/or to input structures to some molecular modeling programs. RasMol, for example, is a program for the visualization of proteins, nucleic acids and small molecules, aimed at displaying, teaching, and generation of publication quality images. The program runs on X-compatible Unix computers, Macintosh, PowerMac, and PCs running Microsoft Windows. Another example for this category is VMD, designed for the visualization of biological systems such as proteins, nucleic acids, and lipid bilayer assemblies. It provides a wide variety of methods for rendering and coloring a molecule: simple points and lines, CPK spheres and cylinders, licorice bonds, backbone tubes and ribbons, etc. It runs on SGI workstations.

(d) Molecular mechanics packages include features such as energy minimization (used to optimize initial geometries or to repair poor geometries), template forcing (i.e., evaluating whether a molecule can adopt a template conformation consistent with a given model), and torsion forcing (to obtain Ramachandran-type contour plots for proteins). (e) Molecular dynamics packages attempt to solve Newton’s equation of motion:

$$m_i \frac{\partial^2 x_i}{\partial t^2} = -\nabla_{x_i} E_{total}$$

(where $i$ runs through all free atoms and the gradient $\nabla_{x_i} E_{total}$ is derived from the energy function), produce molecular dynamics simulations, store them in trajectory files, etc. (f) Conformational searching packages include algorithms for efficient searching of conformational space defined in terms of torsion angles and multi-molecular translations/rotations, some of which lend themselves well to parallelization. (g) Distance geometry packages facilitate molecular structure determination using metric matrix distance geometry. (h) Quantum chemistry packages include various applications of density functional theory. (i) NMR-based structure determination packages allow determination and refinement of structures based on interproton distance estimates, coupling constants measurements, and other information, such as known hydrogen-bonding patterns.

Functionality                         Name            Source
------------------------------------  --------------  -------------------------------
Structure database manipulation       Iditis          Oxford Molecular
                                      PROTEP          Tripos Associates
                                      PDBtool         San Diego Supercomputing Center
                                      ChemDBS-3D      Chemical Design, Inc.
Model building                        O               Aarhus University
                                      XtalView        Scripps Institute
                                      MOLGEN+         University of Bayreuth
                                      CONCORD         Tripos Associates
                                      Cobra           Oxford Molecular
Structure drawing / visualization     RasMol          University of Edinburgh
                                      ChemDraw        Cambridge Scientific
                                      ISIS Draw       Molecular Design, Ltd.
                                      Raster3D        University of Washington
                                      VMD             University of Illinois, Urbana
                                      Kekule          PSI International
Molecular mechanics                   CHARMm          Molecular Simulations, Inc.
                                      Discover        Biosym Technologies
                                      Polaris         Molecular Simulations, Inc.
Molecular dynamics                    MacroModel      Columbia University
                                      QUANTA/CHARMm   Columbia University
                                      SYBYL           Tripos Associates
                                      Cobra           Oxford Molecular
                                      Insight II      Biosym Technologies
Distance geometry                     DGEOM           Dupont
                                      X-PLOR          Molecular Simulations, Inc.
                                      SYBYL           Tripos Associates
Quantum chemistry                     GAMESS          Iowa State University
                                      HONDO           IBM Corporation
                                      UniChem         Cray Research
NMR-based structure determination     NMRchitect      Biosym Technologies
                                      X-PLOR          Molecular Simulations, Inc.
                                      DGEOM           Dupont

Table 3.1 Selected software packages used in molecular modeling

These are just a few examples of molecular modeling packages. There are many others not listed in this survey, which focuses only on programs widely used by the structural biology community.

3.3 Tonitza - A Scientific Visualization Package for Structural Biology

We have developed Tonitza, a package which consists of a graphical user interface (GUI), input/output, visualization, and computation modules. This section describes the overall structure of these modules. All figures are based on data sets representing Ross-River or Coxsackie B3 virus structures [CHEN95], [MUCK95] (see Appendix A).

3.3.1 The Graphical User Interface

The user interface is based on X-Windows and Motif [YOUN90]. The main style is that of a direct-manipulation user interface [FOLE90] in which the objects and attributes that can be operated on are represented visually and operations are invoked by actions performed on the visual representations, typically by using the mouse. However, this interaction style is not sufficient by itself and other interface styles, such as menus, are also included. Figure 3.1 shows a snapshot of the screen during a Tonitza session and illustrates the style of the user interface.


Once an object has been displayed in the main window, it can be manipulated in various ways using the mouse or the dials. Depending on the representation, objects may be rotated, translated, scaled, and/or clipped with a plane. A special algorithm based on quaternion algebra was used to implement rotation of objects around an arbitrary axis using the mouse; it is discussed in section 3.4.2 on page 36. Objects may be customized using colormap and material editors.

A colormap editor allows the user to specify the mapping of data values to colors to create a so-called colormap. Such maps are used for continuous scale representations. The design of the colormap editor is based on Icol [ICOL92]. The visualization session usually starts with a gray scale colormap, with colors varying linearly from black (corresponding to the lowest data value) to white (for the highest data value). A colormap saved from a previous session may also be used as an initial colormap for the current session. The editor allows the user to create linearly interpolated colormaps between key points in the Red-Green-Blue (RGB) color space. It consists of a menu, a graph area, a color cell area, and a set of slider bars (see Figure 3.2). The menu has options for saving the current colormap in a file and for restoring a previous mapping. The graph area consists of two parts: the lower half contains the colors currently stored in the colormap in the form of vertical rectangles (the overall appearance is that of a smoothly varying color scale); the upper half contains a graph depicting the red, green, and blue color components for each of the colors in the colormap. The color cell area consists of cells, one for each color in the graph area. Tonitza tries to allocate the largest number of colors available. It first tries to allocate 256 colors. If it cannot, then it tries to allocate 128, and so on. Each color cell is selectable with the mouse and, when selected, it defines a key point or knot. Key points are displayed in the graph area as black vertical lines. The slider bars (one for each of the R, G, and B components) allow the color corresponding to the current key point to be modified. This change is reflected in both the graph and color cell areas. Once key points have been edited, the R, G, and B components no longer vary linearly between two colors; instead, they vary piecewise linearly between consecutive key points. Figure 3.2 shows a snapshot of the colormap editor.
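The piecewise linear interpolation between key points can be illustrated with a short sketch. It is not taken from Tonitza; the function names, the representation of key points as (cell index, RGB) pairs, and the requirement that the first and last cells are key points are all assumptions made for the example.

```python
def build_colormap(key_points, size=256):
    """Piecewise linear RGB colormap.

    key_points: list of (cell_index, (r, g, b)) with distinct indices;
    the first and last cells (0 and size - 1) are assumed to be key points.
    """
    knots = sorted(key_points)
    colormap = []
    for i in range(size):
        for (i0, c0), (i1, c1) in zip(knots, knots[1:]):
            if i0 <= i <= i1:
                t = (i - i0) / (i1 - i0)  # position between the two knots
                colormap.append(tuple(a + t * (b - a) for a, b in zip(c0, c1)))
                break
    return colormap

def color_for_value(value, vmin, vmax, colormap):
    """Map a data value in [vmin, vmax] to the nearest colormap cell."""
    t = (value - vmin) / (vmax - vmin)
    index = int(round(t * (len(colormap) - 1)))
    return colormap[min(len(colormap) - 1, max(0, index))]

# Initial gray scale: black for the lowest value, white for the highest.
gray = build_colormap([(0, (0.0, 0.0, 0.0)), (255, (1.0, 1.0, 1.0))])

# Selecting cell 128 as a key point and setting it to red.
edited = build_colormap([(0, (0.0, 0.0, 0.0)),
                         (128, (1.0, 0.0, 0.0)),
                         (255, (1.0, 1.0, 1.0))])
```

In the edited map the red component rises linearly up to cell 128 and the green and blue components rise linearly after it, which is the piecewise behavior described above.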

The material editor may be used in conjunction with surface representations to modify the material properties of the current object, whether it is displayed in a wireframe or shaded representation. For wireframe objects, only the color can be altered. Shaded [...] and adjusting the primary reflections of light from the object surface [WATT93]. All shading and lighting calculations are done via OpenGL library calls [OPGL93] which implement the Phong reflection model [WATT93]. In this model the intensity associated with a vertex in the polygonal representation of an object has three components. (a) The ambient term is the ambient color of the light scaled by the ambient material property. The ambient material property affects the overall color of the object. It is independent of the position of the viewpoint and it is most noticeable where an object receives no direct illumination. (b) The diffuse term has a more complex expression, depending on the angle at which light impinges on the surface. The diffuse reflectance of the material plays the most important role in determining what the human eye perceives the color of the object to be. It is affected by the color of the incident diffuse light and the angle of the incident light relative to the normal direction. The position of the viewpoint does not affect the diffuse reflectance of the material at all. (c) The specular term combines the specular color of the light, the specular property of the material, the shininess of the material (also known as specular exponent), and the angle between the direction of the light and the direction of the viewer. The specular reflection from objects produces highlights. The higher the shininess of the object, the smaller and brighter the highlight is. The material editor allows the user to define the ambient, diffuse, and specular material properties of an object, and to specify its shininess and its opacity. Figure 3.3 shows the material editor. In addition to the interface components necessary for material specification, it contains a small drawing area in which an ellipsoid made of the current material is displayed. This helps the user to get an idea of what material a particular choice of parameters yields, before this material is applied to any of the objects displayed.
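The three terms of the reflection model can be written out for a single light and a single color channel. This is a minimal sketch of the classic Phong formulation with a mirror-reflection vector, not the OpenGL code path itself (the OpenGL fixed-function pipeline evaluates the specular term with a halfway vector). The dictionary keys and function names are hypothetical.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(normal, to_light, to_viewer, material, light):
    """Intensity at a vertex: ambient + diffuse + specular (one channel)."""
    n, l, v = normalize(normal), normalize(to_light), normalize(to_viewer)
    # (a) ambient term: independent of light direction and viewpoint
    ambient = light["ambient"] * material["ambient"]
    # (b) diffuse term: depends only on the angle between normal and light
    n_dot_l = max(dot(n, l), 0.0)
    diffuse = light["diffuse"] * material["diffuse"] * n_dot_l
    # (c) specular term: depends on the viewer and on the shininess exponent
    specular = 0.0
    if n_dot_l > 0.0:
        reflected = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, l))
        specular = (light["specular"] * material["specular"]
                    * max(dot(reflected, v), 0.0) ** material["shininess"])
    return ambient + diffuse + specular

material = {"ambient": 0.1, "diffuse": 0.6, "specular": 0.3, "shininess": 10.0}
light = {"ambient": 1.0, "diffuse": 1.0, "specular": 1.0}
head_on = phong_intensity((0, 0, 1), (0, 0, 1), (0, 0, 1), material, light)
unlit = phong_intensity((0, 0, 1), (0, 0, -1), (0, 0, 1), material, light)
```

With the light behind the surface only the ambient term remains, which matches the remark that the ambient property is most noticeable where an object receives no direct illumination.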

3.3.2 Input/Output

In Tonitza, the input/output (I/O) module is responsible for: reading/writing data files from/to the disk, automatic file format recognition, conversion between formats, and handling I/O errors. The program accepts as input gridded data produced by scientific and engineering software. It provides support for a variety of data formats used in X-ray crystallography and electron microscopy (see Appendix C). In addition, it accepts as input image and movie files. Tonitza also allows the user to produce new data and image files and to export them. The I/O module can be easily enhanced to include new data formats. In general, a file accepted or generated by Tonitza consists of a header followed by data. The format of the data is automatically detected using the information provided in the file header. The data consists of a set of values that form a scalar field in a two- or three-dimensional space.


3.3.3 Computations

Specific computations provide support for data analysis. They may be used independently or in conjunction with the visualization module to obtain information about the contents of the data. The computational functions supported in Tonitza are described in what follows.


Data rotation allows computation of rotated maps by resampling from the original data. This feature is useful, for example, in the case of three-dimensional electron density maps representing virus particles reconstructed by electron microscopy methods. Such particles usually exhibit a high degree of symmetry and it is important to be able to select subvolumes for visualization based on symmetry elements relative to the original data set, which is usually given in a “standard orientation”. This feature also allows one to slice the data volume with arbitrary planes and to visualize it in arbitrary orientations. Figure 3.4 shows a snapshot of the dialog window used for the specification of the rotation parameters.

Figure 3.4 Data rotation window
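Computing a rotated map by resampling can be sketched as follows: each point of the output grid is mapped back into the original volume with the inverse rotation and the density there is obtained by interpolation. The sketch assumes trilinear interpolation, a rotation about the z axis through the centre of the grid, and zero density outside the volume; these choices and all names are illustrative, not a description of the Tonitza implementation.

```python
import math

def trilinear(vol, x, y, z, eps=1e-9):
    """Trilinear interpolation in vol[i][j][k]; returns 0.0 outside the grid."""
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    point = []
    for coord, n in zip((x, y, z), dims):
        if coord < -eps or coord > n - 1 + eps:
            return 0.0
        point.append(min(max(coord, 0.0), n - 1.0))
    base = [min(int(c), n - 2) for c, n in zip(point, dims)]
    frac = [c - b for c, b in zip(point, base)]
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((frac[0] if di else 1.0 - frac[0])
                          * (frac[1] if dj else 1.0 - frac[1])
                          * (frac[2] if dk else 1.0 - frac[2]))
                value += weight * vol[base[0] + di][base[1] + dj][base[2] + dk]
    return value

def rotate_about_z(vol, angle_degrees):
    """Rotate the map counter-clockwise about the z axis by resampling."""
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    cx, cy = (nx - 1) / 2.0, (ny - 1) / 2.0
    c = math.cos(math.radians(angle_degrees))
    s = math.sin(math.radians(angle_degrees))
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            dx, dy = i - cx, j - cy
            # inverse rotation: where in the original map does (i, j) come from?
            sx = c * dx + s * dy + cx
            sy = -s * dx + c * dy + cy
            for k in range(nz):
                out[i][j][k] = trilinear(vol, sx, sy, float(k))
    return out

# A 3 x 3 x 2 map with a single marker on the +x side of the centre.
vol = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
vol[2][1][0] = 1.0
rotated = rotate_about_z(vol, 90.0)  # the marker moves to the +y side
```

A general rotation would use a full 3 x 3 rotation matrix in place of the two-dimensional one, with the same inverse-mapping structure.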

The correlation and scaling of two data sets allows the user to compare the two sets, and to adjust one relative to the other by resampling based on the value of a magnification factor. For example, cryo-electron microscopy and image reconstruction of the Ross-River virus [CHEN95] showed that the T = 4 quaternary structure [BRAN91] of the nucleocapsid consists of pentamer and hexamer clusters of the nucleocapsid protein, but not dimers as have been observed in several crystallographic studies [CHOI91]. Comparison of the surface features of Ross-River virus with and without bound antibody Fab fragments reveals the locations where the 240 Fab fragments bind to the 80 copies of glycoprotein spikes. The binding domain of these antibodies also seems to be a region in which the virus engages in the host recognition [SMIT95]. Before the two structures can be displayed together, they have to be brought to the same scale. The magnification factor is used to resize one of the data sets relative to the other based on their correlation. The correlation procedure is iterative and it consists of four steps: (a) define the regions over which correlations are made based on a plot of the average electron density as a function of the particle radius (Figure 3.5), (b) compute an optimal magnification scale factor from a range of factors as the one that maximizes the correlation of the two data sets in the regions defined in the previous step (Figure 3.6), (c) using the optimal magnification factor determined in step (b), plot the correlation coefficient as a function of the particle radius and use this plot to redefine the correlation regions (Figure 3.7), and (d) determine the coefficients of a linear transformation to be applied to the electron density values in one data set relative to the other so that the two data sets have the same average density value. Steps (b) and (c) may be repeated iteratively, to improve the precision of the magnification factor.


Figure 3.6 Correlation coefficient as a function of the magnification factor

Figure 3.7 Correlation coefficient as a function of the particle radius
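Steps (b) and (d) of the correlation procedure can be sketched on one-dimensional radial density profiles, which stand in here for the full three-dimensional maps. The simplification to profiles, the linear interpolation used for resampling, the least-squares fit in step (d), and all names are assumptions made for illustration.

```python
import math

def correlation(a, b):
    """Correlation coefficient of two equally long sequences (non-constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def resample(profile, factor):
    """Magnify a profile sampled at r = 0, 1, 2, ...: new(r) = old(r / factor)."""
    out = []
    for r in range(len(profile)):
        src = r / factor
        i = int(src)
        if i >= len(profile) - 1:
            out.append(profile[-1])
        else:
            t = src - i
            out.append((1.0 - t) * profile[i] + t * profile[i + 1])
    return out

def best_magnification(reference, other, factors, region):
    """Step (b): the factor that maximizes the correlation inside the region."""
    lo, hi = region
    return max(factors, key=lambda f: correlation(reference[lo:hi],
                                                  resample(other, f)[lo:hi]))

def fit_linear(reference, other):
    """Step (d): a, b such that a * other + b matches reference (least squares)."""
    n = len(reference)
    mean_r, mean_o = sum(reference) / n, sum(other) / n
    a = (sum((o - mean_o) * (r - mean_r) for o, r in zip(other, reference))
         / sum((o - mean_o) ** 2 for o in other))
    return a, mean_r - a * mean_o

# Synthetic profiles: `other` is `reference` shrunk by a factor of 1.1.
reference = [math.exp(-((r - 20.0) / 4.0) ** 2) for r in range(60)]
other = [math.exp(-((1.1 * r - 20.0) / 4.0) ** 2) for r in range(60)]
factors = [f / 100.0 for f in range(90, 130, 5)]
best = best_magnification(reference, other, factors, region=(5, 40))
```

Repeating the scan with a finer set of factors around the current optimum corresponds to the refinement obtained by iterating steps (b) and (c).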


The composite map feature allows one to compute a linear combination of two data sets. The new map may be subsequently displayed in various representations and may give some insight about the relationship between the two maps. A special case is the difference of two data sets. Figure 3.8 shows a shaded surface representation of the Ross-River virus in its native form together with Fab antibody fragments attached to it. The surface of the antibodies corresponds to a map computed as the difference between data representing the complex form of the virus attached with antibodies and the native virus structure.

Figure 3.8 Ross-River virus surface with antibody fragments attached. The surface of the antibodies was computed as a difference map between the complex form of the virus attached with antibodies and the native virus structure.
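A composite map is a point-by-point linear combination of two maps sampled on the same grid, and the difference map is the special case with weights 1 and -1. The sketch below assumes that both maps have already been brought to the same scale and grid; the function names and the toy values are hypothetical.

```python
def composite_map(map_a, map_b, weight_a=1.0, weight_b=1.0):
    """Linear combination weight_a * map_a + weight_b * map_b on a common grid."""
    return [[[weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(plane_a, plane_b)]
            for plane_a, plane_b in zip(map_a, map_b)]

def difference_map(complex_map, native_map):
    """Density present in the complex but not in the native structure."""
    return composite_map(complex_map, native_map, 1.0, -1.0)

# Toy 1 x 2 x 2 maps (arbitrary values, for illustration only).
complex_map = [[[1.0, 2.0], [3.0, 4.0]]]
native_map = [[[0.5, 0.5], [1.0, 1.0]]]
antibody_density = difference_map(complex_map, native_map)
```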

Graphs are used in Tonitza to support other computational features (for example, the steps of the correlation procedure). Figure 3.5 offers an example of such a graph.

Histograms provide statistical information about the data. They may be computed [...] density). Figure 3.9 shows a histogram for the electron density data corresponding to a Ross-River virus particle.

Figure 3.9 Electron density histogram for the Ross-River virus structure

3.3.4 Data Visualization

Tonitza supports two- and three-dimensional visualization of data. It allows interactive zooming, rotation, translation of the objects displayed, image recording, animation, and printing. Multiple data sets and/or multiple scalar properties can be displayed simultaneously. The representations available and examples of how they may be used to investigate biological structures are presented here.

Planar sections allow the display of data in sections parallel to the planes of coordinates. Using the data rotation feature described in the previous section, arbitrary sections through the data volume may be obtained. Sections of different orientations can be displayed simultaneously, at different positions within the data volume. A set of slider bars allows the user to sweep the entire volume with such sections. The data in planes may be represented as a set of contour lines at user-defined levels, as a continuous scale map, by associating data values with colors, or by using both representations superimposed. The colors can be interactively edited using the colormap editor described in section 3.3.1 on page 19. Stacks of contours can be created out of a set of sections, to allow interpretation from a three-dimensional perspective. This fairly primitive form of depth cueing will be replaced in future versions by stereopsis. Figure 3.10 shows a contour map for the Ross-River virus, with electron density contoured at three levels. The map illustrates the overall organization within the multilayered virus structure [CHEN95]. The positions of icosahedral two-, three-, and five-fold axes are shown. Figure 3.11 depicts an equatorial section (through the same virus structure) as a continuous scale map. Such a density map by itself does not allow one to differentiate between the nucleic acid, protein, and lipid components of the virus particle, but low-angle neutron and X-ray scattering results permit one to ascribe specific features to these components. The map also shows regions of membrane pinching which are suggested to be the location of the transmembrane connections [CHEN95]. The arrows in Figure 3.11 indicate two such regions. A stack of contours for a Coxsackievirus B3 mask map is shown in Figure 3.12. The map reveals the spatial arrangement of particles within an asymmetric unit.

Figure 3.10 Equatorial contour map for the Ross-River virus structure


Figure 3.11 Equatorial continuous scale map for the Ross-River virus structure


Spherical sections are similar to continuous scale maps, except that now data values are interpolated on spheres instead of planes. Such a representation is useful when one is interested in visualizing the distribution of scalar values at various radii within the data volume. Sweeping such sections through the entire volume may reveal particular properties which cannot be inferred from planar representations. In the case of the Ross-River virus, spherical sections reveal the glycoprotein spikes as flower-like structures that project outward from the virus structure and have a hollow base (Figure 3.13) [CHEN95].

Figure 3.13 Spherical sections viewed along two-, three-, and five-fold axes in the case of the Ross-River virus
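Interpolating data values on a sphere can be sketched as follows. The example assumes a cubic grid with the sphere centred on it and lying entirely inside the volume, a regular sampling in the polar and azimuthal angles, and trilinear interpolation; the names and the sampling scheme are illustrative only.

```python
import math

def trilinear(vol, x, y, z):
    """Trilinear interpolation in vol[i][j][k]; (x, y, z) must lie inside the grid."""
    n = len(vol)
    i, j, k = (min(int(c), n - 2) for c in (x, y, z))
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx) * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                value += weight * vol[i + di][j + dj][k + dk]
    return value

def spherical_section(vol, radius, n_theta=18, n_phi=36):
    """Sample a cubic volume on a sphere of the given radius about its centre.

    Returns section[a][b] for polar angle theta_a and azimuth phi_b.
    """
    centre = (len(vol) - 1) / 2.0
    section = []
    for a in range(n_theta):
        theta = math.pi * (a + 0.5) / n_theta
        row = []
        for b in range(n_phi):
            phi = 2.0 * math.pi * b / n_phi
            x = centre + radius * math.sin(theta) * math.cos(phi)
            y = centre + radius * math.sin(theta) * math.sin(phi)
            z = centre + radius * math.cos(theta)
            row.append(trilinear(vol, x, y, z))
        section.append(row)
    return section

# Test volume whose value equals the x coordinate of each grid point.
n = 9
vol = [[[float(i) for _ in range(n)] for _ in range(n)] for i in range(n)]
section = spherical_section(vol, 3.0, n_theta=4, n_phi=8)
```

Sweeping the radius from the centre outward and mapping each section through a colormap gives the kind of sequence described above.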

Isosurfaces are a different kind of representation, which provides information about sets of points within a data volume that share a particular scalar value. An isosurface is the three-dimensional equivalent of a contour line in the plane. In Tonitza, isosurfaces can be displayed as wireframe or shaded, for the entire data volume or for selected subvolumes. The algorithm for computing isosurfaces, based on the classical Marching Cubes method [LORE87], is discussed in detail in section 3.4.1 on page 34. Figure 3.14 (a) shows a shaded surface representation of a Ross-River spike. The figure clearly reveals the three-fold nature of the surface spike and the bilobal nature of each of the spike petals [CHEN95]. Figure 3.14 (b) shows a shaded representation of a Ross-River isosurface viewed along a three-fold axis. The figure is overlaid with a lattice which shows the relative positions of the spikes with respect to the two-, three-, and five-fold symmetry axes [CHEN95].

Figure 3.14 Shaded isosurface representations of Ross-River virus data: (a) boxed spike, (b) view along a three-fold axis overlaid with a lattice which shows the positions of the symmetry elements

3.4 Algorithms

Computer graphics has the potential to go beyond its preoccupation with photorealism and simple surfaces and expand into the medium of useful applications for various areas of human interest. The tools that enable this expansion are the algorithms, and the graphics programmer must weave together their strengths and weaknesses to best serve a particular problem. We describe in this section two of the most complex algorithms used in Tonitza, which have been adapted to fulfill the efficiency requirements imposed by the need for visualizing very large data sets.


3.4.1 Computing Isosurfaces

Until recently, wireframe isosurfaces have been the predominant form of displaying electron density maps in interactive modeling environments, whereas tools which provided solid isosurface representations have traditionally been non-interactive.

In 1977 Fuchs et al. [FUCH77] proposed a method to construct optimal contour surfaces from consecutive planar contours. An isosurface is built from a set of triangular tiles obtained by determining the optimal surface between a pair of contours. This method is computationally complex and does not allow for the fast generation of isosurfaces required in interactive computer graphics. In 1986 Wyvill et al. [WYVI86] introduced the concept of “soft object” and described a method of producing such objects by generating a field and computing isosurfaces at various levels within this field. The method for computing the isosurfaces uses the “Marching Cubes” concept, although this name is associated with a paper published a year later by Lorensen and Cline [LORE87].

The Marching Cubes method offers a much more structured approach than Fuchs’ method for high resolution surface extraction from three-dimensional data grids. In this algorithm, scalar values are assumed to be given at each point of a grid. For each grid cell, the contouring threshold is compared with each of the eight grid values at the cell corners to classify each corner as being inside or outside the volume of interest. Each cell in the grid then has a number of edges marked with intersection points. Since each of the eight corners is either inside or outside, the arrangement of the intersected edges can be classified into 2^8 = 256 cases. The vertex normals required for shading are generated using the original grid values: at each grid point a gradient vector is determined using a central difference approximation and is then normalized to unit length.
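The per-cell classification, the edge interpolation, and the gradient-based normals can be sketched as follows. The corner numbering, the convention that values at or above the threshold count as inside, and the function names are assumptions made for this example; the original method may order the corners differently.

```python
import math

# Corner offsets of a grid cell; the bit order is an assumed convention.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]

def cell_index(vol, i, j, k, threshold):
    """8-bit case index of cell (i, j, k): bit n is set if corner n is inside."""
    index = 0
    for bit, (di, dj, dk) in enumerate(CORNERS):
        if vol[i + di][j + dj][k + dk] >= threshold:
            index |= 1 << bit
    return index  # one of 256 possible cases

def interpolate_vertex(p0, p1, v0, v1, threshold):
    """Point on the edge p0-p1 where the linearly interpolated value hits the threshold."""
    t = (threshold - v0) / (v1 - v0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

def unit_gradient(vol, i, j, k):
    """Central-difference gradient at an interior grid point, normalized to unit length."""
    g = ((vol[i + 1][j][k] - vol[i - 1][j][k]) / 2.0,
         (vol[i][j + 1][k] - vol[i][j - 1][k]) / 2.0,
         (vol[i][j][k + 1] - vol[i][j][k - 1]) / 2.0)
    length = math.sqrt(sum(c * c for c in g))
    return g if length == 0.0 else tuple(c / length for c in g)

# Test volume whose value equals the x coordinate of each grid point.
vol = [[[float(i) for _ in range(3)] for _ in range(3)] for i in range(3)]
case = cell_index(vol, 0, 0, 0, threshold=0.5)
```

In a full implementation the case index selects an entry of a precomputed table that lists the intersected edges and their triangulation.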

For large data sets like those obtained from structural biology experiments it is very important that the algorithm for computing isosurfaces be fast. The techniques we used to speed up the computation of isosurfaces were combined from various sources and are discussed here.

Let us note that the Marching Cubes method as described by Lorensen and Cline may generate holes in the surfaces due to ambiguities caused by the fact that only values at grid points are considered. Figure 3.15 depicts such an ambiguous situation. How should one decide between the hexagon shown in (a) and the two triangles shown in (b)?


The solution is given by Wyvill et al. [WYVI86], who examine the values of the field not only at the grid points, but also at the centre of each grid cell face and at the centre of each grid cell. For the example in Figure 3.15 this means that if the top face of the cube has a value of “inside” (filled circle) at its centre then the hexagon is the right choice. Otherwise, the two triangles are to be chosen as part of the isosurface polygonalization.

Figure 3.15 Ambiguous case in the Marching Cubes method: depending on the value at the centre of the top face of the cube, either (a) the hexagon, or (b) the two triangles belong to the surface polygonalization
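The face-centre test can be sketched for a single cell face. For gridded data the field cannot be evaluated at the face centre directly, so this example estimates it as the average of the four corner values; that estimate, the inside convention (value at or above the threshold), and the function names are assumptions, not a statement of how [WYVI86] or Tonitza compute it.

```python
def is_ambiguous_face(corner_values, threshold):
    """True if diagonally opposite corners agree and adjacent corners differ.

    corner_values lists the four corners in cyclic order around the face.
    """
    inside = [v >= threshold for v in corner_values]
    return (inside[0] == inside[2] and inside[1] == inside[3]
            and inside[0] != inside[1])

def face_centre_inside(corner_values, threshold):
    """Estimate the field at the face centre as the mean of the corner values."""
    return sum(corner_values) / 4.0 >= threshold

def inside_corners_connected(corner_values, threshold):
    """For an ambiguous face: True if the two inside corners are joined across it."""
    return face_centre_inside(corner_values, threshold)

face = [1.0, 0.0, 1.0, 0.0]  # inside, outside, inside, outside at threshold 0.4
joined = inside_corners_connected(face, threshold=0.4)     # centre 0.5 is inside
separated = inside_corners_connected(face, threshold=0.6)  # centre 0.5 is outside
```

When the centre is inside, the surface passes around the joined pair of corners (the hexagon in the example of Figure 3.15); otherwise each inside corner is cut off separately (the two triangles).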

Figure 3.16 shows the pseudo-code for a naive implementation of the Marching Cubes method:

    for each grid cell do
        find polygon(s) of intersection of isosurface with the grid cell
        compute surface normals at polygon(s) vertices
        triangulate polygon(s)
    endfor

Figure 3.16 Naive implementation of the Marching Cubes algorithm

The vertices of the polygon(s) of intersection are computed by linear interpolation from the grid points. The first idea that comes to mind when trying to speed up the algorithm is to take advantage of the fact that neighboring cells share polygon vertices and to organize the computation in such a way that vertices are computed and stored only once. Next, we also take advantage of the fact that the number of edge combinations in which an isosurface can intersect a grid cell is finite [WATT93]. Each cell vertex is assigned a bit and the string of eight bits associated with the eight vertices of a cell is used as an index into an edge table. This table contains a list of the edges which are intersected for a given arrangement of vertices with respect to the isosurface and information about the corresponding triangulation. The benefit of using such a table is that it can be precomputed.

The computation of the surface normals constitutes a major processing bottleneck. To speed up this process, Henn et al. [HENN96] propose an alternative preprocessing step to be applied to the entire data grid. Their modification of the original isosurface algorithm is motivated by the fact that for recontouring using a new threshold value the same gradient vectors are to be used for the interpolation of the surface normals. The idea is to calculate the gradient vectors only once and to store them for reuse, thus reducing the time for recontouring by removing this computation from the processing performed for each grid cell.

The combined use of all the techniques previously described allowed us to generate complex isosurfaces for large data sets in real-time. A related issue is the display of such surfaces after they have been generated. We addressed this issue by taking advantage of the graphics hardware of Silicon Graphics workstations, which proved crucial in allowing real-time display and manipulation of objects.
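Two of the speed-ups described above, computing shared vertices only once and reusing precomputed gradients when recontouring, can be sketched as follows. The class and function names, the dictionary keyed by grid edge, and the one-sided differences on the grid boundary are assumptions made for illustration.

```python
def precompute_gradients(vol):
    """Gradient at every grid point, computed once and reused for any threshold.

    Central differences inside the grid, one-sided differences on the boundary.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])

    def diff(get, index, n):
        lo, hi = max(index - 1, 0), min(index + 1, n - 1)
        return (get(hi) - get(lo)) / (hi - lo)

    return [[[(diff(lambda a: vol[a][j][k], i, nx),
               diff(lambda b: vol[i][b][k], j, ny),
               diff(lambda c: vol[i][j][c], k, nz))
              for k in range(nz)] for j in range(ny)] for i in range(nx)]

class VertexPool:
    """Stores each isosurface vertex once, keyed by the grid edge it lies on."""

    def __init__(self, vol, gradients, threshold):
        self.vol, self.gradients, self.threshold = vol, gradients, threshold
        self.vertices, self.normals = [], []
        self._index_of_edge = {}

    def vertex_on_edge(self, p0, p1):
        """Index of the vertex on edge p0-p1; created on first request only."""
        key = (min(p0, p1), max(p0, p1))
        if key not in self._index_of_edge:
            v0 = self.vol[p0[0]][p0[1]][p0[2]]
            v1 = self.vol[p1[0]][p1[1]][p1[2]]
            t = (self.threshold - v0) / (v1 - v0)
            g0 = self.gradients[p0[0]][p0[1]][p0[2]]
            g1 = self.gradients[p1[0]][p1[1]][p1[2]]
            self.vertices.append(tuple(a + t * (b - a) for a, b in zip(p0, p1)))
            # interpolated (not yet normalized) normal from the stored gradients
            self.normals.append(tuple(a + t * (b - a) for a, b in zip(g0, g1)))
            self._index_of_edge[key] = len(self.vertices) - 1
        return self._index_of_edge[key]

vol = [[[float(i) for _ in range(3)] for _ in range(3)] for i in range(3)]
gradients = precompute_gradients(vol)        # done once for the data set
pool = VertexPool(vol, gradients, threshold=0.5)
first = pool.vertex_on_edge((0, 0, 0), (1, 0, 0))   # requested by one cell
second = pool.vertex_on_edge((1, 0, 0), (0, 0, 0))  # same edge, neighboring cell
recontour = VertexPool(vol, gradients, threshold=1.5)  # gradients are reused
```

Returning indices rather than coordinates also lets the triangles of neighboring cells refer to the same vertex, which suits indexed drawing of the resulting mesh.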

3.4.2 Arcball Rotations

Interactive graphics systems often require techniques that allow the user to rotate graphical objects freely in three-dimensional space using commonly available two-dimensional input devices such as the mouse. The problem is that there is no single natural mapping from the two parameters of the input device to the three-dimensional space. Several methods have been devised to solve this problem: the virtual trackball [HULT90], the rolling ball [HANS92], and the arcball [SHOE94]. The first two methods mentioned here are ill-behaved in the sense that they exhibit path dependence. That is, when dragging the mouse from point A to point B the end result depends on the path followed. For handling rotations with the mouse in Tonitza, we have chosen Shoemake’s method [SHOE94], which has the advantages that it is path independent and it is relatively easy to implement. A description of the arcball rotation controller based on quaternion algebra follows.

The set of all possible rotations fits naturally into a coherent algebraic structure, the quaternions [FOLE90]. A unit quaternion $q = [(x, y, z), w] = [\hat{v}\sin\theta, \cos\theta]$, with $x^2 + y^2 + z^2 + w^2 = 1$, represents a rotation by $2\theta$ around the axis given by the unit vector $\hat{v}$. Performing successive rotations corresponds to multiplying quaternions. The product of two quaternions $qp$ represents a rotation $p$ followed by the rotation $q$. Given two points $\hat{v}_0$ and $\hat{v}_1$ on a unit sphere, they can be written as unit quaternions: $[\hat{v}_0, 0]$ and $[\hat{v}_1, 0]$. Their ratio $\hat{v}_1\hat{v}_0^{-1}$ converts the arc between them into a rotation [SHOE94]. The axis of rotation is perpendicular to the plane containing the two vectors $\hat{v}_0$ and $\hat{v}_1$, and the angle of rotation is twice the angle between them: $\hat{v}_1\hat{v}_0^{-1} = [\hat{v}_0 \times \hat{v}_1, \hat{v}_0 \cdot \hat{v}_1]$.

The arcball uses the mouse press and drag positions as the end points of an arc generating a rotation. The user presses a mouse button at $\hat{v}_0$ and drags it to $\hat{v}_1$. As the mouse is dragged, $\hat{v}_1$ changes continuously and so does the rotation. It can be easily proved that the effect of an arcball rotation is path independent. That is, a stroke from $\hat{v}_0$ to $\hat{v}_1$ followed by a stroke from $\hat{v}_1$ to $\hat{v}_2$ gives the same effect as a direct stroke from $\hat{v}_0$ to $\hat{v}_2$. Indeed, $(\hat{v}_2\hat{v}_1^{-1})(\hat{v}_1\hat{v}_0^{-1}) = \hat{v}_2\hat{v}_0^{-1}$. Once the user starts dragging, it matters where the mouse is positioned, but not how it got there. Another advantage of this method is that there is no penalty for losing a mouse sample, which is often hard to avoid [SHOE94].

From a user interface design point of view, we consider the arcball motion a better alternative to a set of slider bars that would allow rotations only about a predefined set of rotation axes. It is not only general, but also more natural, and it can be mastered quickly. The motion of the user using the arcball mimics the motion of rotating a ball on a planar surface using the palm of the hand.
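The arcball construction described in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in Tonitza. The function names are hypothetical, quaternions are stored as (x, y, z, w) tuples, and the mapping from mouse coordinates to the sphere (points outside the ball are projected onto its silhouette) is an assumption modeled on Shoemake's description.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # stored as (x, y, z, w)


def mouse_to_sphere(x: float, y: float,
                    center: Tuple[float, float] = (0.0, 0.0),
                    radius: float = 1.0) -> Vec3:
    """Map a 2D mouse position to a point on the unit sphere.

    Assumed mapping: positions outside the ball are projected onto its
    silhouette circle (z = 0).
    """
    px = (x - center[0]) / radius
    py = (y - center[1]) / radius
    r2 = px * px + py * py
    if r2 > 1.0:
        s = 1.0 / math.sqrt(r2)
        return (px * s, py * s, 0.0)
    return (px, py, math.sqrt(1.0 - r2))


def arc_quaternion(v0: Vec3, v1: Vec3) -> Quat:
    """Rotation for the arc v0 -> v1: v1 * v0^-1 = [v0 x v1, v0 . v1]."""
    cross = (v0[1] * v1[2] - v0[2] * v1[1],
             v0[2] * v1[0] - v0[0] * v1[2],
             v0[0] * v1[1] - v0[1] * v1[0])
    dot = v0[0] * v1[0] + v0[1] * v1[1] + v0[2] * v1[2]
    return (cross[0], cross[1], cross[2], dot)


def quat_mul(q: Quat, p: Quat) -> Quat:
    """Quaternion product qp: rotation p followed by rotation q."""
    qx, qy, qz, qw = q
    px, py, pz, pw = p
    return (qw * px + pw * qx + (qy * pz - qz * py),
            qw * py + pw * qy + (qz * px - qx * pz),
            qw * pz + pw * qz + (qx * py - qy * px),
            qw * pw - (qx * px + qy * py + qz * pz))


# Path independence: a stroke v0 -> v1 followed by v1 -> v2 composes to
# the same quaternion as the direct stroke v0 -> v2.
v0, v1, v2 = (1.0, 0.0, 0.0), (0.6, 0.8, 0.0), (0.0, 0.6, 0.8)
two_strokes = quat_mul(arc_quaternion(v1, v2), arc_quaternion(v0, v1))
direct = arc_quaternion(v0, v2)
assert all(math.isclose(a, b, abs_tol=1e-12)
           for a, b in zip(two_strokes, direct))
```

During a drag, only the press point and the current mouse position need to be kept. The rotation is recomputed from these two points at every mouse event, which is why a lost sample has no lasting effect.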