
Design and Implementation of a Molecular Graphics Display Program.

(A) Introduction.

When work began on this project, the Computer Graphics laboratory at the University of St. Andrews had the following equipment: a Tektronix 4109 16-colour terminal plus printer, a Digital VT340 16-colour terminal, a MicroVAX GPX/II colour workstation, and an LJ250 colour printer.

The colour workstation has an extremely large palette of colours available, and it can display many more of these at once than is possible on the other two VDUs. Production of the graphics is also much quicker because the workstation is connected directly to the computer and driven by special software. A recent acquisition is a Tektronix CAChe system, a Macintosh-based colour graphics system that can produce 3-D images using crossed polaroid spectacles and filters over the screen, so that each image is visible to only one eye at a time. Although the program described in this chapter is not compatible with the new system, it will be possible to modify it so that molecular structure files can be interchanged and converted into CAChe format.

The Chem-X [70] package was also running on the VAX, and this was the main program available for the manipulation of molecular graphics, but it has three major drawbacks. The first is that the program will only drive the VT340 and Tektronix terminals, which are the least powerful of the available display equipment. The second is that the display of large structures is painfully slow, because the picture has to be transmitted down a 9600 baud communications line to these terminals. The final drawback is that the program is only supplied as an executable module, and we are therefore unable to modify it to perform any specific tasks related to the work at St. Andrews. Other disadvantages of Chem-X are its difficulty of use and the particularly long time between starting up the program and being able to display a structure. Although the latter is not actually a problem, it is annoying to the user, especially if the program is only being used to check a single structure, for example to ensure that it is the correct geometrical isomer. Therefore, it was decided that a new program should be written, which we would be able to modify at will, and which would use the more powerful facilities of the MicroVAX workstation.

(B) Details of the Program.

The program, MHCDraw, is written in the FORTRAN programming language on the VAX, and uses the Graphics Kernel System (GKS) for device-independent use of many different terminals. This also means that individual device drivers do not have to be written for use with other terminals. GKS provides many standard routines to perform graphics output, including line drawing, polygon filling and display area scaling, together with the ability to accept input via the mouse, and hence allows easy access to these features without the programmer needing to worry about the details. Most of the program is written in standard FORTRAN-77, so the task of moving it to another machine should be fairly easy. It consists of between 8000 and 9000 lines of code, including comments and blank lines. It also has a small on-line help facility, which gives the user a list of all the commands and a brief description of what they do.

The main uses of the program can be divided into three types:

1) Conversion between different molecular graphics file formats.

2) Graphics display.

(C) Data Type Conversion and File Manipulation.

(C1) Output File Formats.

The storage and manipulation of molecular structures is best done in Cartesian co-ordinates. This enables the use of simple transformations to perform rotations and translations to obtain different views of the molecule under study. In addition to the co-ordinates, the machine stores a list of connectivities, which tells it which atom is joined to which, allowing the representation of the molecule as a stick picture, with bonded atoms being joined by lines. This method of representation has the advantage that it is extremely fast to draw.
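As an illustration of this representation, the following Python sketch stores a few atoms as Cartesian co-ordinates with a connectivity list, applies a translation, and produces the line segments of a stick picture. It is not taken from MHCDraw (which is written in FORTRAN); the names `translate` and `stick_segments`, the 0-based atom indices and the co-ordinate values are all assumptions made for the example.

```python
# Hypothetical minimal representation: Cartesian co-ordinates plus a
# connectivity list of 0-based atom index pairs (illustrative values only).
coords = [
    (0.0, 0.0, 0.0),   # C
    (1.0, 0.0, 0.0),   # H
    (0.0, 1.5, 0.0),   # O
]
bonds = [(0, 1), (0, 2)]


def translate(points, shift):
    """Return a new list of points moved by the vector `shift`."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def stick_segments(points, connectivity):
    """Return one (start, end) line segment per bonded pair of atoms."""
    return [(points[i], points[j]) for i, j in connectivity]


moved = translate(coords, (1.0, 2.0, 3.0))
segments = stick_segments(moved, bonds)
```

Because only one line per bond has to be drawn, the cost of the picture grows with the number of bonds, which is consistent with the speed advantage noted above.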

It is not particularly easy for the user of a program to obtain a mental picture of a molecule when it is represented in Cartesian co-ordinates, and the generally accepted method for the input of data into Theoretical Chemistry programs is in internal co-ordinates in the form of a Z-matrix. For most atoms, this involves providing the atom type, bond length, angle and dihedral angle, plus the numbers of the atoms to which these parameters are referred. In MOPAC [36], these are called Na, Nb and Nc. The first atom is always at the co-ordinate origin, and requires no other parameters. The second is along the x-axis, and only requires a bond length, whilst the third requires both an angle and a bond length. An example of a possible MOPAC Z-matrix for methanol is shown in Figure 3.1.

Atom  Length  Flag 1  Angle  Flag 2  Dihedral  Flag 3  Na  Nb  Nc
C
H     1.1     1                                        1
H     1.1     1       109.5  1                         1   2
O     1.3     1       109.5  1       120.0     1       1   2   3
H     0.9     1       109.5  1       90.0      1       4   1   2
H     1.1     1       109.5  1       240.0     1       1   2   3

Figure 3.1

MOPAC Z-matrix for Methanol

All angles are entered in degrees and bond lengths in Angstroms. The flags tell the program whether or not the geometry of a particular variable is to be optimised, whilst Na, Nb and Nc for the penultimate hydrogen atom (number 5) show that the bond length is defined as the distance to O4 (since Na = 4), and the angle is that between H5, O4 and C1 (i.e. Nb = 1). The dihedral angle is the twist angle between H5, O4, C1 and H2 (i.e. Nc = 2). More specifically, in this example, if atoms O4 and C1 are superimposed, with O4 at the front, then the dihedral is the apparent angle between H5 and H2 (see Figure 3.2). Values of x in the diagram are positive if we rotate clockwise when moving from front (H5) to back (H2).

Figure 3.2

This method of representation makes it much easier for the user to picture the molecule, and to build up a data file for it. Hence the graphics display program has to be able to convert between the two different data types.
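The conversion from a Z-matrix to Cartesian co-ordinates can be sketched as follows, using the methanol values of Figure 3.1. This is an illustrative Python version and not the MOPAC routine used by MHCDraw: the function names `place` and `zmatrix_to_cartesian`, the row layout, the use of 0 for an unused reference and the choice of the xy plane for the third atom are assumptions of the example. The dihedral is taken as positive for a clockwise twist when viewed with the Na atom at the front.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def _unit(p):
    n = math.sqrt(sum(a * a for a in p))
    return tuple(a / n for a in p)


def place(a, b, c, r, theta, phi):
    """Atom at distance r from a, angle theta to b, dihedral phi to c (degrees)."""
    t, p = math.radians(theta), math.radians(phi)
    bc = _unit(_sub(a, b))                # axis pointing from Nb to Na
    n = _unit(_cross(_sub(b, c), bc))     # normal to the Nc-Nb-Na plane
    m = _cross(n, bc)                     # completes the local axis system
    d = (-r * math.cos(t),
         r * math.sin(t) * math.cos(p),
         r * math.sin(t) * math.sin(p))
    return tuple(a[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k] for k in range(3))


def zmatrix_to_cartesian(zmat):
    """Rows are (symbol, length, angle, dihedral, Na, Nb, Nc), references 1-based."""
    xyz = []
    for k, (_, r, theta, phi, na, nb, nc) in enumerate(zmat):
        if k == 0:
            xyz.append((0.0, 0.0, 0.0))           # first atom at the origin
        elif k == 1:
            xyz.append((r, 0.0, 0.0))             # second atom along the x-axis
        elif k == 2:
            a, b = xyz[na - 1], xyz[nb - 1]
            u = _unit(_sub(b, a))                 # points along +x or -x
            t = math.radians(theta)
            xyz.append((a[0] + r * math.cos(t) * u[0], a[1] + r * math.sin(t), 0.0))
        else:
            xyz.append(place(xyz[na - 1], xyz[nb - 1], xyz[nc - 1], r, theta, phi))
    return xyz


METHANOL = [
    ("C", 0.0, 0.0, 0.0, 0, 0, 0),
    ("H", 1.1, 0.0, 0.0, 1, 0, 0),
    ("H", 1.1, 109.5, 0.0, 1, 2, 0),
    ("O", 1.3, 109.5, 120.0, 1, 2, 3),
    ("H", 0.9, 109.5, 90.0, 4, 1, 2),
    ("H", 1.1, 109.5, 240.0, 1, 2, 3),
]
xyz = zmatrix_to_cartesian(METHANOL)
```

The sketch fails if the three reference atoms are collinear, since the plane normal is then undefined; a practical routine would need to guard against this.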

It should be noted that input to programs such as MOPAC is possible using Cartesian co-ordinates, but the use of an internal co-ordinate system makes it much easier to follow the course of the optimisations, and to analyse results, such as the forces on the atoms, which can be reported in terms of bond stretching and angle deformation, rather than a change in x, y and z co-ordinates. It follows that the ability to convert Cartesian co-ordinates to internal co-ordinates is also a useful feature.
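The reverse conversion amounts to measuring bond lengths, angles and dihedrals from the Cartesian co-ordinates. The following Python sketch shows one common way of doing this; it is not the MOPAC code referred to in this chapter, and the function names are assumptions. The sign of the dihedral follows the convention described for Figure 3.2: viewed with the second atom in front of the third, the value is positive when the first atom must be rotated clockwise to eclipse the fourth.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def bond_length(a, b):
    """Distance between atoms a and b."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, measured at atom b."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / (bond_length(a, b) * bond_length(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral(a, b, c, d):
    """Twist angle a-b-c-d in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, -1), (0, -1, -1))` describes a quarter turn clockwise when viewed from the second atom towards the third, and evaluates to 90 degrees.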

The various programs used to calculate structures and display properties all use different formats and orders for the data. Some use atomic symbols, whilst others use the atomic number. Hence, there was a potential use for a program that converts between these different formats to save time for the users, and also to save space on the computer disks, because the ease of interconversion means that only one co-ordinate file needs to be stored on disk.
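The difference between symbol-based and number-based formats can be handled with a simple lookup table, as in the Python sketch below. The table is deliberately restricted to a few elements, and the names `ATOMIC_NUMBER`, `to_numbers` and `to_symbols` are hypothetical; they are not the identifiers used in MHCDraw.

```python
# Partial element table, sufficient for the illustration only.
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "F": 9,
                 "P": 15, "S": 16, "Cl": 17}
SYMBOL = {number: symbol for symbol, number in ATOMIC_NUMBER.items()}


def to_numbers(symbols):
    """Convert a list of atomic symbols to atomic numbers."""
    return [ATOMIC_NUMBER[s] for s in symbols]


def to_symbols(numbers):
    """Convert a list of atomic numbers to atomic symbols."""
    return [SYMBOL[z] for z in numbers]


methanol_numbers = to_numbers(["C", "H", "H", "O", "H", "H"])
```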

Since routines to calculate dihedrals and angles from Cartesian co-ordinates and the Cartesian system from a Z-matrix were already available in the code for MOPAC [36], it was not necessary to rewrite such algorithms. Therefore, the source of these subroutines is acknowledged.

MHCDraw will produce the input atomic co-ordinates for MOPAC (using the command mopac), the Gaussian series [37, 51] (command gaussian) and the Quest module of AMBER [64] (command quest), and it will write out the modified Crystal Structure Cartesian format required by 3D2. This file format also includes the atomic charges for the molecule, and can also be read by Chem-X. A further possible output format is that of the Protein Data Bank (PDB), which can also be used with the AMBER program. This contains the atom type, co-ordinates, amino acid residue type and number. The latter information can also be stored in an extended crystal structure-type file (again, filetype .CHG) designed especially for this use. This speeds up the study of protein molecules because it stores both the amino acid residue information from the PDB file and the connectivity information found in the CHG file format, and hence the connectivities do not need to be calculated every time the protein is read in.

(C2) Data File Formats used for Storage of Molecular Structures.

For the results of a MOPAC calculation, the archive files (filetype .ARC) are probably the most useful to keep, because they contain both the structure and the results of the calculation. For other calculations, the logical files to keep are the CHG files. Not only can these be used to store a particular view of the molecule, but they also hold the atomic charges and connectivities, which are quite time consuming to calculate. For very large molecules the same information can also be written out in binary file format (filetype .PAC) to save disk space. Although these files are not transferable to other computers, they can save about 25% of the space used by a normal CHG file, and are quicker to read because less disk access is needed.

To produce the above output, the program stores the following information: atom type, atomic number, 'read-in' co-ordinates, transformed co-ordinates (i.e. those corresponding to the current view of the molecule), connectivities, plus amino acid residues and numbers from protein structure files, as well as the atomic charges. The use of two sets of co-ordinates enables a chosen view of the molecule, such as one that highlights the active site of an enzyme, to be stored whilst maintaining the original orientation of the molecule. Internal co-ordinates are also stored in the program, but only if they have been read in from a file or have been calculated because they are explicitly needed. They are not calculated automatically because the process would be very time consuming for a protein or other large system. Thus it is necessary for the program to store a series of flags to indicate whether the internal co-ordinates are present, or whether they need to be calculated before writing out a MOPAC input file, for example.
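The stored information and the 'calculate only when needed' behaviour might be organised as in the following Python sketch. It is an assumption-laden illustration and not the FORTRAN data layout of MHCDraw: the class name `Molecule`, its field names and the `builder` callback are hypothetical, and the value `None` plays the role of the flag indicating that no internal co-ordinates are present.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Molecule:
    symbols: List[str]
    read_in_xyz: List[Vec]                               # as read from file
    view_xyz: List[Vec] = field(default_factory=list)    # current view
    bonds: List[Tuple[int, int]] = field(default_factory=list)
    charges: List[float] = field(default_factory=list)
    zmatrix: Optional[list] = None     # None means "not yet available"

    def __post_init__(self):
        # The current view starts as a copy of the read-in orientation.
        if not self.view_xyz:
            self.view_xyz = list(self.read_in_xyz)

    def internal(self, builder: Callable[["Molecule"], list]) -> list:
        """Return the Z-matrix, calling `builder` only if none is stored."""
        if self.zmatrix is None:
            self.zmatrix = builder(self)
        return self.zmatrix
```

A routine that writes a MOPAC input file would call `internal` once; later calls reuse the stored result, so the expensive construction is never repeated for a large system.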

(C3) Construction of the Z-matrix.

If the Z-matrix is constructed in the order that the atoms were read in, and atoms are 'referenced' with respect to previous ones, the values of Na, Nb and Nc must be less than the number of the atom currently being considered. This system works if the atomic ordering has been done sensibly, but data from sources such as the Crystal Structure database often deviate from a numbering system that is logical as far as data file generation is concerned. If the ordering is bad, then this process leads to bad referencing and meaningless bond lengths. This can be seen in the following example:

Atom 2 can only be referenced back to atom 1, giving an extremely long bond length, and small angle C1-C2-C3. This could lead to trouble if the data file produced is used for a geometry optimisation using internal co-ordinates.

If the source of the co-ordinates for a molecule is in the form of internal co-ordinates (i.e. bond lengths, bond angles and dihedrals) then the Z-matrix does not need to be constructed. If this is not the case, and only the co-ordinates and connectivities are available, the program has to decide which atoms to use as reference atoms, i.e. the values of Na, Nb and Nc. Thus the algorithm used by MHCDraw searches the connectivity table, starting from atom one, for an atom that is chemically bound to the one under consideration. This is done in two passes: initially, hydrogen atoms are ignored because it leads to a more useful ordering if the referencing is done with atoms along a carbon chain. If the program cannot find a suitable atom, it searches again for possible atoms, this time including hydrogen atoms. If this second search also fails, it means that the atom is part of a separate molecule, and it is referenced with respect to the preceding atom in the list.
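The two-pass search can be sketched in Python as follows. Only the choice of the bonded reference atom (Na) is shown, and the sketch assumes that candidates are restricted to atoms earlier in the list than the one under consideration; the function name `choose_na`, the 0-based indices and the `neighbours` table are hypothetical and do not reproduce the MHCDraw source.

```python
def choose_na(i, symbols, neighbours):
    """Return the 0-based index of the reference atom Na for atom i (i >= 1).

    `neighbours[i]` is the set of atoms bonded to atom i.
    """
    for skip_hydrogen in (True, False):       # pass 1 ignores hydrogens
        for j in range(i):                    # search from the first atom
            if j not in neighbours[i]:
                continue
            if skip_hydrogen and symbols[j] == "H":
                continue
            return j
    return i - 1    # separate molecule: use the preceding atom in the list


# Methanol (atoms 0-5) followed by a separate water molecule (atoms 6-8).
symbols = ["C", "H", "H", "O", "H", "H", "O", "H", "H"]
neighbours = {
    0: {1, 2, 3, 5}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3}, 5: {0},
    6: {7, 8}, 7: {6}, 8: {6},
}
references = [choose_na(i, symbols, neighbours) for i in range(1, len(symbols))]
```

In this example the hydroxyl hydrogen (atom 4) is referenced to the oxygen, while the water oxygen (atom 6) has no bonded predecessor and so falls back to atom 5.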

(C4) Input File Formats.

The program can obtain geometrical data using the read command, from the Crystal Structure database, Protein Structures, output from AMBER [64], Chem-X [70], MOPAC archive files, Gaussian [37, 51] log files and QUEST [64] log files. It will also look for the atomic charges from the last two formats, and extract them if they are present. Charges are also read from the preceding two formats. The extraction of atomic charges is particularly useful because it allows for easy input to two electrostatic potential programs used by the group. These are ASP [71], which compares the potential, and 3D2 [67], which is used to display the maps. This automatic analysis of the output files from programs such as Gaussian and QUEST is extremely useful because these are painful tasks to do by hand. A recent addition to the program is a routine to analyse the output of the ESPFIT program. This is used to fit charges to reproduce electrostatic potentials, which gives a simple method of comparing potentials calculated in a rigorous manner. The potential can be simulated with an array of point charges using ESPFIT; the geometry and charges can then be read (and displayed) using the graphics program, and then written out in the CHG format used for normal structure storage.

(D) Graphical Display.

Molecules can be displayed as stick pictures, in which chemically bonded atoms are joined with a line. This line is split into two colours, indicating the types of atom that make up the bond. The colouring system normally depends upon atom type, and is generally consistent with that used by 3D2, although all halogens are coloured light blue, and the default colour (for atoms which do not have a specified colour) is orange. The colours are as follows: green (carbon), red (oxygen), dark blue (nitrogen), white (hydrogen), yellow (sulphur), purple (phosphorus) and, as mentioned above, light blue for halogens. The display background is black to make the display easier to view, which is why this colour has not been used for carbon, as is sometimes the case. A second method of colouring atoms is possible, which is used if more than one molecule is being studied at once. This colours all the atoms in a particular molecule one colour, and colours different molecules (or segments, as they are termed in MHCDraw) with different hues.
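The atom-type colouring and the two-colour bond can be illustrated with the Python sketch below. The colour table follows the list given above, but the function names and the decision to split each bond exactly at its midpoint are assumptions of the example; the text does not state where MHCDraw places the colour change.

```python
ELEMENT_COLOUR = {
    "C": "green", "O": "red", "N": "dark blue",
    "H": "white", "S": "yellow", "P": "purple",
}
HALOGENS = {"F", "Cl", "Br", "I", "At"}
DEFAULT_COLOUR = "orange"


def atom_colour(symbol):
    """Colour by atom type, light blue for halogens, orange otherwise."""
    if symbol in HALOGENS:
        return "light blue"
    return ELEMENT_COLOUR.get(symbol, DEFAULT_COLOUR)


def bond_halves(p, q, symbol_p, symbol_q):
    """Split the bond p-q at its midpoint (an assumption) into two coloured lines."""
    mid = tuple((a + b) / 2 for a, b in zip(p, q))
    return [(p, mid, atom_colour(symbol_p)), (mid, q, atom_colour(symbol_q))]


halves = bond_halves((0, 0, 0), (2, 0, 0), "C", "O")
```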

Alternatively, each atom can be displayed as a shaded sphere with a radius equal to the covalent bonding radius of the atom. This radius was chosen for reasons of speed, because it allows the picture to be constructed from a series of five concentric shaded circles, as opposed to constructing a molecular surface using individual points on the surface as with Connolly's MS [57] program. At this radius, there is only a small amount of overlap between adjacent atoms, so intersections on the surface do not need to be calculated, but the display still gives a reasonable idea of the space occupied by the atoms, and of the cavities between them. The atoms are sorted according to z co-ordinate, and those at the back are drawn first, so that they do not obscure those at the front. The sorting is performed upon an array of pointers to the atoms, rather than by swapping the actual co-ordinates and connectivity information, so that the ordering of the molecule is preserved.
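The back-to-front drawing order and the five concentric circles can be sketched in Python as follows. The sketch assumes that a larger z value is nearer to the viewer and that radius and brightness change in equal steps between circles; neither detail is given in the text, and the function names are hypothetical.

```python
def draw_order(xyz):
    """Indices of the atoms sorted from back to front; xyz itself is untouched."""
    return sorted(range(len(xyz)), key=lambda i: xyz[i][2])


def concentric_circles(radius, n=5):
    """(radius, brightness) pairs, outermost and darkest circle first."""
    return [(radius * (n - k) / n, (k + 1) / n) for k in range(n)]


atoms = [(0.0, 0.0, 1.0), (0.0, 0.0, -2.0), (0.0, 0.0, 0.5)]
order = draw_order(atoms)          # atom 1 is furthest back, atom 0 nearest
circles = concentric_circles(1.0)
```

Sorting an index array in this way leaves the atom list and its connectivity numbering unchanged, which is the purpose of the pointer array described above.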

(D1) Three-Dimensional Aspects of the Display.

Most molecules are inherently three-dimensional, and as described in Chapter One, their 3D shape is a vital factor in determining their use as a drug, but it is impossible to represent this information truly on most graphical displays. Nevertheless, some indication of the 3D structure of a molecule can play a crucial role in studies such as inhibitor docking (i.e. visualising the active site of a protein, and positioning the inhibitor within it) and checking structural isomers to make sure that the correct molecule is being studied.

The program attempts to enhance the user's perception of 3D in two ways. Firstly, the extensive colour map available on the GPX workstation allows depth-cueing of the atoms. This involves choosing a shade of the colour used for each atom that depends upon the z co-ordinate. Atoms at the back of the molecule are coloured dimly, whilst those at the front are bright. This makes the far side of the molecule fade into the distance, and hence makes those atoms which are nearer look closer. The range of shades can be adjusted by the user, so that the effect can either be slight, or conversely, so that the atoms at the back almost disappear.
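Depth-cueing of this kind can be illustrated with a simple linear scaling of brightness with z, as in the Python sketch below. The linear form, the parameter `min_brightness` (standing in for the user-adjustable range of shades) and the function names are assumptions; the text does not specify how MHCDraw maps depth to shade.

```python
def depth_brightness(z, z_back, z_front, min_brightness=0.3):
    """Brightness between min_brightness (back) and 1.0 (front), linear in z."""
    if z_front == z_back:
        return 1.0
    t = (z - z_back) / (z_front - z_back)
    t = min(1.0, max(0.0, t))                 # clamp to the visible range
    return min_brightness + (1.0 - min_brightness) * t


def shade(rgb, brightness):
    """Scale an (r, g, b) colour with components in 0..1 by the brightness."""
    return tuple(component * brightness for component in rgb)


green = (0.0, 1.0, 0.0)
back_atom = shade(green, depth_brightness(-5.0, -5.0, 5.0))
front_atom = shade(green, depth_brightness(5.0, -5.0, 5.0))
```

A small `min_brightness` makes the atoms at the back almost disappear, whereas a value close to 1.0 gives only a slight effect.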

The second method of enhancing the three-dimensionality of a molecule relies upon the fact that GKS allows the easy modification of line width, and so thicker lines are drawn for bonds at the front of the molecule, and thinner ones at the back. This function is obtained with the auto-width on command, and should be followed by a second command (width followed by an integer) to tell the program the largest line width it should use. Normally, only four different widths are needed to obtain a realistic effect, especially when used in conjunction with shading. This method is also of some use with the less versatile terminals in the laboratory, such as the Tektronix, from which it is simple to obtain hard copies.
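One way of choosing a line width from the depth of a bond is shown in the Python sketch below, which divides the depth range into equal bands. This banding, the function name `line_width` and the default of four widths (taken from the remark above) are assumptions made for the illustration, not a description of the MHCDraw implementation.

```python
def line_width(z, z_back, z_front, max_width=4):
    """Integer width from 1 (back of the molecule) to max_width (front)."""
    if z_front == z_back or max_width <= 1:
        return 1
    t = (z - z_back) / (z_front - z_back)
    t = min(1.0, max(0.0, t))                 # clamp to the visible range
    return 1 + min(max_width - 1, int(t * max_width))


# Widths for bonds whose depths span the range 0 (back) to 10 (front).
widths = [line_width(z, 0.0, 10.0) for z in (0.0, 3.0, 5.0, 8.0, 10.0)]
```

Since each bond joins two atoms, a single depth must be chosen for it; taking the mean z co-ordinate of the two atoms would be one simple possibility.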

(D2) Multiple Segments.

MHCDraw allows the user to read in and manipulate up to twenty molecules at once using the read / append command. Hence molecules can be superimposed and compared, or added together and written out in a single file. Individual molecules are referred to by the program as segments, and it will give the user a list of the current segments with the showseg command.

(D3) Rotation of Molecules.

The molecular view can be changed by rotating the molecule with the rotate (or rot) command. Three angles have to be specified with this command: the rotations about the x, y and z axes respectively. Angles may be positive or negative (i.e. 315° is the same as -45°), or greater than 360° if so desired.
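The effect of the three angles can be illustrated with the Python sketch below, which rotates a point about the x, y and z axes in turn. The order in which the three rotations are applied and the sense of a positive angle (anticlockwise when looking down the axis towards the origin) are assumptions, since neither is specified here; the function names are hypothetical.

```python
import math


def rot_x(p, degrees):
    a = math.radians(degrees)
    x, y, z = p
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def rot_y(p, degrees):
    a = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def rot_z(p, degrees):
    a = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


def rotate(p, ax, ay, az):
    """Rotate point p about the x, then y, then z axis (angles in degrees)."""
    return rot_z(rot_y(rot_x(p, ax), ay), az)


def close(p, q, tol=1e-9):
    """True if two points agree to within the tolerance."""
    return all(abs(a - b) < tol for a, b in zip(p, q))


point = (1.0, 2.0, 3.0)
same_view = close(rotate(point, 0, 0, 315), rotate(point, 0, 0, -45))
```

Because the trigonometric functions are periodic, no special treatment is needed for negative angles or for angles greater than 360°.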

The rotation is always done with respect to the initial view (i.e. the orientation read in from the file). Hence, two commands of rot 0 0 30 will produce the same
