• No results found

I. SpinningTop

8. Testing and Discussion

8.1.2. Fragment Generation

How fragments are generated can have an effect on the ability of a given Athenaeum to pa- rameterise certain molecules. Namely, situations may arise whereby even though all atoms of a molecule are mapped to at least one fragment in the Athenaeum, some bonds and angles do not have corresponding mapped bonds or angles in any fragments of the Athenaeum. Due to the rules introduced in section 7.4.1.2 for checking the validity of fragments containing cycles or portions of cycles, it is possible for rotatable bonds between two cycles to not have any mapped fragments unless both cycles are fully contained within one fragment. Such a case occurs with molecule XII. As a means to work around this limitation, an additionalFULLY_FRAGMENTswitch was created which removes the cyclic rule when generating an Athenaeum. This allows for par-

8.1. Algorithm Optimisation

Figure 8.4:Distributions of the diameter of charge groups as determined withwvalues from (a) 1 to (j) 10. The filled blue histograms show the charge group diameter distribution produced for eachwvalue and the green outlines show the reference ATB distribution. The axes are scaled to fit the reference values vertically and the largest calculated value horizontally. Single atom charge groups are excluded due to their diameter being zero.

8. Testing and Discussion

Figure 8.5:Edge terminated version of the fragment in figure 7.11. The subgraph under consideration as a fragment is given in black. The blue vertices and edges are the overlap region from edges not incident to a cycle, withk=2. The green vertex and edge are the overlap region from edges incident to a cycle, withkc=1. Vertices and edges not part of the fragment and overlap region are in grey.Dashed edges are the terminal edges of the fragment.

tial cycles which will provide parameters for the rotatable bonds. A mid-point between following the validity rules of cycles laid out in section 7.4.1.2 and allowing full fragmentation of cycles would be to only allow full fragmentation of cycles when there is at least one complete cycle in the fragment.

Another potential means of dealing with this issue is through modifying the way in which the relationship between fragments and their overlap region is described. Instead of being vertex terminated at the boundary with their overlap region, fragments could be edge terminated. An example of such a fragment is given in figure 8.5. Effectively, this would mean that bonded parameters associated with the edges between a fragment and its overlap region are kept as part of the fragment mapping. The parameter inheritance scheme described in section 7.6 requires that for a given bonded term, all vertices between which the term is defined must be contained within the fragment. By switching to an edge terminated fragment description, the requirement would change to allowing one of the vertices to be in the overlap region. Of course, the vertex in the overlap region would not contribute to the non-bonded parameters. For the tests undertaken here, this method has not been implemented, with the issue dealt with through allowing full cycle fragmentation.

8.1. Algorithm Optimisation

8.1.3. Fragment Size

The size of a fragment can be described as either a strict count on the number of atoms in the fragment, or as a determination of the molecular mass of a fragment. The strict counting scheme is used here as a count scheme is more desirable due to its inherit ability to ensure all fragments contain bonded terms. Limits to fragment sizes come in two forms: a lower size limit and an upper size limit. With the use of a counting scheme, a lower size limit is effectively enforced by the requirement for bonded parameters to be mapped. Though dihedral terms are mapped independently of the Athenaeum fragments, it may be desirable for the fragments to be large enough, on average, to define a dihedral term. Effectively, this means each fragment should have a path between two atoms with a length of at least four. Excluding hydrogen, the atoms of molecules within SRC9064 have an average degree of 2.889. Rounding this up to 3, it can be seen that, on average, the smallest fragment containing a given atom with a path length of at least four would require a minimum size of five. Of course, this value can be increased as desired; five is just the absolute smallest recommended fragment size.

There is no obvious driving force towards providing an upper fragment size limit. A given Athenaeum has an implicit upper size limit based on a combination of source molecule sizes and overlap length. The upper size limit would be redundant if it was larger than the size of the largest source molecule, and the overlap length reduces this limit as any fragments must have an overlap. As there is no fundamental reason for a fragment size upper limit, tests presented have no upper size limit enforced.

8.1.4. Overlap

The length of the overlap used when generating an Athenaeum plays a major role in the ability of an Athenaeum to map to a target molecule, and the reliability of the parameters produced in the end. To gain insight into the effect of overlap length on the applicability of parameters pro- duced by the CherryPicker algorithm, both the bonded and non-bonded terms are independently investigated.

8.1.4.1. Bonded Terms

To investigate the applicability of bonded terms, four Athenaeums with different overlap lengths were generated using the SRC9064 molecules. Full fragmentation of cycles was allowed in all cases. Overlap lengths used were between zero and three inclusive. Each Athenaeum was mapped to all molecules in CPT023, as well as to an additional ten large molecules*, with a *cholesteryl chloroformate, cis-13-docosenoyl chloride, 9Z,12Z-octadecadienoyl chloride, N,N,N,N-

8. Testing and Discussion

Figure 8.6: Distributions of the number of potential bond parameters mapped to all molecules within CPT023 using Athenaeums with overlap lengths of between (a) 0 and (d) 3. Athenaeums were generated using SRC9064 with full cyclic fragmentation enabled, and the mapped UA parameters counted. The ver- tical magenta line shows the mean number of matched parameter types for each Athenaeum. Numerical values for the mean (μ) and standard deviation (σ) of the distributions are provided.

bit-mask of4261937151†for vertex mapping and28‡for edge mapping, and a lower fragment size limit of five. The number of potential bond and angle parameters for each bond and angle were then counted and distributions plotted. These distributions are shown in figure 8.6 for the bond parameters and figure 8.7 for the angle parameters.

Both of these figures show the same trends, though the statistics of bond parameter distribu- tions are always noticeably lower than the corresponding angle parameter distribution. With an overlap of zero, the distribution is very broad with a high mean value. With an overlap of one, butylphenylazo)phenoxy)undecyl ester, pentacontane, and palytoxin. Except for pentacontane, these molecules are not a part of CPT023 as SRC9064 does not produce fragments which can fully map them. Pentacontane is a straight chain alkane, which is well represented in the other molecules, and was used purely for initial development purposes.

all available information except stereo chemistry and vertex degree.only bond order.

8.1. Algorithm Optimisation

Figure 8.7: Distributions of the number of potential angle parameters mapped to all molecules within CPT023 using Athenaeums with overlap lengths of between (a) 0 and (d) 3. Athenaeums were generated using SRC9064 with full cyclic fragmentation enabled, and the mapped UA parameters counted. The ver- tical magenta line shows the mean number of matched parameter types for each Athenaeum. Numerical values for the mean (μ) and standard deviation (σ) of the distributions are provided.

the distribution becomes concentrated at lower values, but still has a long tail of high numbers of matched bonded parameters. Overlaps of two and three continue this trend with increased con- centration of the distribution at lower values, and a diminishing of the tail. Generally speaking, a large number of potential bonded parameters is undesirable as such a situation indicates that fragments are mapped which poorly match the surrounding environment. An overlap of zero clearly shows this, as there is no accounting for the environment around a fragment in the map- ping process. The long tail of the distributions with overlap one show that even though having an overlap of one is a vast improvement on having no overlap, a larger overlap should generally be desired.

8. Testing and Discussion

8.1.4.2. Non-bonded Terms

To investigate the applicability of non-bonded terms, namely the atomic point charges, a slightly different approach was taken. Instead of generating an Athenaeum and performing mapping to a target molecule, three single atom fragments, with overlap lengths of zero, one, and two, were chosen and mapped to all the molecules in SRC9064. The use of a lower size limit means that a single atom would never be mapped in a general mapping procedure, however it does provide a simple means to investigate the effect of increasing overlap length. These mappings were performed with only element and formal charge for vertices, and bond order for edges. The atom fragments chosen were: the hydrogen of a methyl group bonded to another carbon; a carbon within an aromatic carbon ring, with a hydrogen substituent only; and the neutral oxygen of a negatively charged carboxylic acid functional group, bonded to another carbon. For each fragment and overlap combination, the point charges for all matching atoms within SRC9064 were plotted in a distribution. These distributions are given in figure 8.8.

With an overlap of zero we see that the methyl hydrogen has a bimodal distribution (fig- ure 8.8a) due to the inability of a fragment with zero overlap to be able to distinguish between polar and non-polar hydrogens. The aromatic carbon has a distribution skewed towards negative partial atomic charges (figure 8.8d), and the carboxylic oxygen has a broad base with a sharp central peak (figure 8.8g). All distributions have a reasonably large difference between their mean and median. With an overlap of one, the methyl hydrogen distribution becomes unimodal as polar and non-polar hydrogen atoms are now able to be distinguished, however the carboxylic oxygen distribution becomes bimodal. This bimodality comes about due to being unable to distinguish between oxygens of protonated carboxylic acids and deprotonated carboxylic acids. Additionally, the distributions are narrower and means and medians are a lot closer together. Finally, with an overlap of two, all distributions have a relatively small standard deviation and almost identical mean and median. In combination with the bond parameter results, this indi- cates that an acceptable overlap length to use is at least two.

8.2. Parameterisation Tests

In order to evaluate the parameter sets returned by the CherryPicker algorithm, several testing methods were devised: a static test to determine how well a QM minimised structure is main- tained (section 8.2.1), a dynamic test to determine deviations of bond lengths and angles from a QM minimised structure (section 8.2.2), and a test comparing with experimental NMR data (section 8.2.3).

To perform these tests, five Athenaeums were generated from SRC9064 with overlaps of

8.2. Parameterisation Tests

Figure 8.8:Distributions of partial atomic charges mapped to single atom target fragments with varying degrees of overlap. The central atom of each fragment was (a) to (c) a methyl hydrogen, (d) to (f) an aromatic carbon and (g) to (i) a carboxylic acid oxygen. Each charge distribution obtained by mapping the single atom fragment to the molecules in SRC9064 is plotted as a blue histogram, with the cyan curves being the gaussian formed with the distributions’ mean (μ) and standard deviation (σ). Vertical lines are given in magenta for the mean, and green for the median (ν) of the distribution. The counts (y-axis) have been normalised so that the largest count becomes 1.0, with the actual counts (N) given.

8. Testing and Discussion

Table 8.1:Molecules successfully parameterised by each Athenaeum.

Athenaeum Parameterised molecules

OLP0 I, II, III, IV, V, VI, VII, VIII, IX, X, XI, XII, XIII, XIV, XV, XVI, XVII, XVIII, XIX, XX, XXI, XXII, XXIII

OLP1 I, II, III, IV, VII, VIII, IX, X, XII, XIV, XVI, XVII, XX, XXI, XXII, XXIII

OLP2 I, III, VII, VIII, IX, X, XVI, XXII OLP3 I, III, VII, VIII, IX, XVI

OLP4 I, III, VII, VIII, IX, XVI

between zero and four (OLP0 to OLP4) inclusive. Full cyclic fragmentation was allowed in the fragment generation stage. Parameterisations were then performed by mapping each molecule in CPT023 with each Athenaeum. Additionally, each molecule was submitted to the ATB for parameterisation at either its QM0 or QM1 level, depending on the number of atoms in the molecule, so as to provide comparative parameters. Due to the higher level of theory used for generation of the parameters for molecules in SRC9064, it is expected that the CherryPicker parameters will out perform the ATB source parameters.

Most Athenaeums were unable to fully parameterise all of the molecules, due to the set of molecules in SRC9064 being what was available rather than specifically designed for fragment based parameterisation. The molecules each Athenaeum successfully parameterised are given in table 8.1. In all of the tests outlined below, the molecular mechanics calculations were performed using the GROMOS molecular dynamics engine.

8.2.1. Structural Minimisation

A simple test of a parameter set is to determine if, when the structure is energy minimised using the assigned parameter set, the minimised structure is chemically reasonable.

Method To provide a reference, chemically reasonable, structure, each molecule in CPT023 was QM structurally optimised at the B3LYP/6-31G(d) level of theory using GAMESS-US. With the optimised structure as the initial conformation, molecular mechanics energy minimi- sation was performed using the parameter set assigned from each Athenaeum, and the resulting minimised structure rotated to align with the QM optimised structure. A RMSD between the two conformations was calculated. Results are given in table 8.2. As a comparison, for each

8.2. Parameterisation Tests

Table 8.2:RMSD between QM optimised structure and molecular mechanics optimised structure. Values are in nm.

CherryPicker ATB

Athenaeum Min Mean Max Min Mean Max

OLP0 0.0069 0.0282 0.1412 0.0091 0.0347 0.1604 OLP1 0.0072 0.0246 0.1136 0.0114 0.0263 0.0550 OLP2 0.0133 0.0233 0.0378 0.0172 0.0258 0.0389 OLP3 0.0131 0.0222 0.0406 0.0172 0.0242 0.0330 OLP4 0.0149 0.0239 0.0450 0.0172 0.0242 0.0330

Athenaeum, the RMSD values obtained when minimising the subset of molecules that was pa- rameterised by that Athenaeum but using the parameters obtained from the ATB are also given. Results The smaller the RMSD, the closer the molecular mechanics minimised structure matches the reference QM structure. As such, a small RMSD shows that, the parameters as- signed using CherryPicker obtain a local minimum conformation which is similar to the QM op- timised structure. The results in table 8.2 show that, even with an Athenaeum generated with an overlap of zero or one, the CherryPicker parametrisation algorithm generates parameters which are better at maintaining the QM optimised geometry than the corresponding ATB parameters. Though the sample sizes are small, this result does bode well for the CherryPicker algorithm as it produces parameters which rival the performance of ATB generated parameters at a fraction of the computational cost, especially when the molecule has a large number of atoms. For exam- ple, a parametrisation of palytoxin by the ATB takes nearly 13 hours, whereas the CherryPicker algorithm can complete the parametrisation in approximately 10 minutes. The results also rein- force the requirement that an Athenaeum should be generated with an overlap length of at least two. Though OLP0 and OLP1 both have molecules with very low RMSD values, they also have molecules with much larger RMSD values than the Athenaeums with longer overlaps. It is more desirable to use an Athenaeum which produces reasonable parameters for molecules in general, rather than good parameters for a few types of molecules and bad parameters for the rest.

8.2.2. Dynamic Stability

Static stability, such as that given by the structural minimisation, is a useful indicator that pa- rameters are at least reasonable. However, they should also act reasonably during a simulation. Method To investigate this, seven of the eight molecules successfully parameterised using the OLP2 Athenaeum were simulated for 50 ns, using both the OLP2 parameters and the ATB

8. Testing and Discussion

parameters. Though molecule III was completely parameterised using the OLP2 Athenaeum, simulations using both that parameter set and the ATB parameter set were unable to be un- dertaken as there were issues with the SHAKE algorithm applying constraints to some solvent molecules. Each molecule was solvated in a cubic box of chloroform with at least 15 Å between the box walls and the molecule. Initial velocities were generated from a Maxwell distribution at 60 K. The temperature was increased to 300 K in steps of 60 K, with a short 10 ps simulation undertaken at each intermediate step. Temperature was maintained by coupling to a Berendsen thermostat. No pressure coupling was involved. A twin range cut-off scheme was used, with a short range cut-off of 8 Å calculated every time step, and a long range cut-off of 14 Å calculated every five steps along with the pairlist update. Bond length constraints were applied to only the solvent molecules using SHAKE. As such, a time step of 0.5 fs was used. Conformations were printed out every 0.5 ps The resulting simulations are analysed in this and the proceeding section.

Results Here, we look at the variation in bond length and angle over the course of the simu- lation. For each frame of the simulation, the percentage deviation of every bond and angle from the QM optimised structure was calculated. For each bond and angle, a mean percentage devia- tion was calculated. An overall molecular mean was calculated from these values. These results are given in table 8.3. Over the course of a simulation, bond lengths and angles are expected to vibrate around an equilibrium value, which is here referenced as the corresponding value in the QM optimised structure. As such, smaller average percentage deviations imply, though do not guarantee, that the assigned parameter sets are appropriate. Use of a percentage deviation, as opposed to absolute deviation, accounts for bonds or angles with weaker force constants having larger absolute deviations as they would generally have longer equilibrium values, which means large absolute deviations are a smaller percentage deviation.

Generally, the results between the OLP2 parameters and the ATB parameters are fairly similar. Occasionally, the ATB results have less variation than the OLP2 results, and vice versa. The mean bond length deviation tends to sit between 1.5 % and 2.0 %. The ATB parameters stray outside these bounds with molecules IX and X. The most notable deviation is found with the OLP2 parameters and molecule XVI. In this case, the mean deviation is noticeably raised by