• No results found

I. SpinningTop

7. Algorithmic Design and Theory

7.6. Parameterisation

7.6.3. Non-bonded Terms

The non-bonded terms consist of van der Waals and electrostatic charge terms, that is an atom type and a point charge. For each atom in the target molecule, from the mappings provided by the subgraph isomorphism testing procedure, lists of all atom types and point charges mapped to it are created. The GROMOS force field uses atom types to keep track of the van der Waals parameters assigned to each atom. The atom type is a discrete value and so the atom type of the atom in the target molecule is set to the mode of the list of atom types. Point charges are more complex as they are continuous values. Experimentation showed that the distribution of mapped partial charges on a specific atom tend towards a narrow, unimodal distribution with the mean and median of the data set being approximately equal, when a sufficiently large number of fragments are mapped. As such, the mean of the mapped partial charges is used as the point charge on an atom. As the actual distribution of mapped partial charges is generally hidden from the user, the median, standard deviation, and sample size are provided in the output to allow the user some means of determining the reliability of each value.

7. Algorithmic Design and Theory

Due to the nature of the subgraph isomorphism testing, it is possible for an atom from a molecule in the Athenaeum to be mapped to a single atom in the target condensed molecular graph multiple times. Conceivably, this could result in a situation where the mean of the partial charges is distorted due to the large number of identical partial charges from a single fragment atom. This case is not taken into consideration here, but there are a number of ways that it could be. One method would be in the subgraph isomorphism testing phase. In this case, given a vertexvin multiple fragments of a molecule, if vmaps to a vertex in the target condensed molecular graph, all mappings except the largest (or smallest) fragment are discarded. In this way, the fragments mapped will be as large (or small) as possible, thereby reducing the number of multiple maps of a single fragment atom by a reduction in the number of fragments mapped. Another method would be to apply a diminishing returns to the count for each atom with multiple maps. That is, each additional mapping of an atom to a single vertex in the target molecular graph could be given a lesser weight. This could be treated in either a continuous or discrete manner. In the continuous manner, each additional mapping after the initial adds, for instance, some fraction of the partial charge of the atom to the sum of mapped partial charges. Of course, the count of mapped atoms will need to be increased by the same fraction. A discrete manner would increment the effective count of mapped atoms at certain actual count values. For instance, an implementation may require one mapped atom for the first effective count, four for the second, nine for the third and so on. In the limit, this can be seen as only counting each mapped atom once.

7.6.3.1. Charge Group Determination

The GROMOS force field makes use of the charge group concept, therefore it is necessary to arrange atoms into small groups based on the parameters produced by CherryPicker. When the partial charges of a group of atoms add up to exactly zero, the leading term of the electrostatic in- teraction between two such groups of atoms is of dipolar character. The sum of the1/rmonopole

contributions of the various atom pairs in the group-group interaction will be zero. Use of charge groups in calculating the electrostatic energy increases performance as fewer calculations are re- quired. Additionally, use of charge groups reduces errors and discontinuities in the energy as atoms move in and out of the cutoff radius. As such, it is desirable that the atoms of a molecule be organised into groups such that the sum of the group partial charges add up to, ideally, zero, or integer values otherwise. In their paper describing an algorithm for partitioning a molecule into charge groups, Canzaret al.describe why charge groups should not be too large, and that they should be connected.10 Namely, the effective cutoff distance of an individual atom in a given charge group is given by the cutoff distance minus the distance to the centre of geometry of the

7.6. Parameterisation

charge group. If the distance of an atom to the centre of geometry becomes large, the effective cutoff becomes small, leading to an undesirable increase in errors and discontinuities. Imposing the restriction that charge groups should be connected aids in this as connectivity imposes spatial proximity. A similar approach to the algorithm described by Canzaret al.is undertaken here to determine an optimal charge group assignment.

The algorithm described here is designed to subdivide a graphGintoN connected subgraphs which minimises some penalty function. The subgraphs ofG are all disjoint. That is, given any pair of subgraphs G1= (V1,E1) andG2= (V2,E2),V1∩V2=/0. The penalty function is

calculated for individual subgraphs, and summed together to give an overall penalty for the subdivision. Following the condition that charge groups should not be too large, the subgraphs have a soft maximum weightw. The weight of a vertex is the number of vertices contracted into a vertex, including the vertex itself, after a leaf contraction of the molecular graph is undertaken. A leaf contraction is a contraction of leaf vertices, i.e. those vertices with degree 1, into their parent vertex. This is primarily undertaken so as to ensure that hydrogen atoms are never in their own charge group. The soft nature ofwmeans that a given a graph with maximum degree of 4, a subgraph will have an upper weight limit ofw+3. That is, during the optimisation process, a subgraph with total weightw−1 could have a vertex with weight 4 (the vertex plus three leaves contracted into it) added to it, but a subgraph with total weightwwill not have any more vertices added. The penalty function used to determine optimal charge group partitioning is:

ρ(G) =

g∈S(G) |

i∈g qi−

i∈g FC(i)| (7.1)

whereρ(G)is the total penalty score for a subdivisionS(G)of graphG,gis a subgraph ofG, qiis the partial atomic charge on atomiand FC(i)is the formal charge of atomias determined

by the algorithm described in chapter 9. As part of the leaf contraction, the formal charge and partial charge of leaves that are contracted are added to those of their parent vertex.

LetG= (V,E) be a graph, with subgraphsG1=G[V] andG2=G[V \V]. It should be

obvious that the minimum penalty charge group partition ofG, CGP(G)can be given as min(CGP(G)) =min(CGP(G1)) +min(CGP(G2)). (7.2)

That is, a graph can be split into two subgraphs and the minimum penalty charge group parti- tioning of the graph can be obtained from the minimum penalty charge group partitioning of the two subgraphs. This fact lends itself to solving the optimisation problem using dynamic programming. By applying this fact recursively to the subgraphsG1andG2, the optimisation

7. Algorithmic Design and Theory

Algorithm 3Generate the optimal charge group partitioning of a graph 1: procedureCHARGEGROUPPARTITION(G)

2: functionCGP(B)

3: ifB≡/0then terminate recursion when bag is empty 4: return[], 0

5: else ifBhas been seenthen

6: returnseen_subgraphs[B] 7: end if

8: v←min(B) choose a vertex fromB

9: for allG= (V,E)CONNECTSUB(B\ {v},{v},N(v))do 10: best_part, best_penCGP(B\V)

11: g_pen←ρ(G)

12: ifbest_pen+g_pen<seen_subgraphs[B]then

13: seen_subgraphs[B][V]+best_part, best_pen+g_pen 14: end if

15: end for

16: returnseen_subgraphs[B] 17: end function

18: seen_subgraphs = dict() associative array to store seen subgraphs 19: returnCGP(V)

20: end procedure

this is given in algorithm 3. Though this algorithm is guaranteed to give an optimal charge group partition, the optimal charge group partition is not guaranteed to contain only charge groups with integer values.

To further improve the performance of the algorithm, an upper limit on the size of a graph to subdivide can be provided. Then, if a graph is larger than this size, it is recursively split into approximately even sized subgraphs until the subgraph size is less than this upper limit. In such a case, the charge group partition will not necessarily be optimal, but assuming the upper limit is much larger thanw, it will be a good approximation.

7.6.3.2. Charge Redistribution

Adding partial atomic charges from different sources or, as is the case here, adding means of lists of partial atomic charges is likely to result in non-integer total molecular charge once all atoms in the target molecule have been processed. There are two ways that this could be handled, a global scaling of all partial atomic charges, or more localised scaling of specific partial atomic

7.6. Parameterisation

charges. A global scaling method would scale each partial atomic charge by the value:

qscaled=

qraw·Qi∈Gqi

(7.3) whereqscaled is the scaled partial atomic charge of an atom, qraw is the initial partial atomic

charge of an atom,Qis the target molecular charge and∑i∈Gqiis the total molecular charge as

given by the sum of initial partial atomic charges of all atoms. Because partial atomic charges are given to only a few decimal places (between three and five), when there is a small difference between the target molecular charge and the initial total molecular charge, the scaling of each individual partial charge may not be sufficient for changes to be noticeable within the number of decimal points output. As such, a more localised charge redistribution scheme is utilised here.

The charge redistribution scheme developed here applies a number of rules to individual atoms and small groups of atoms. These rules generate a matrix equation of the formAx=bwherex is the partial charge changes that need to be applied to each atom, which can then be solved for xusing linear least squares. Each atom in the target molecule has a column inA, and the rows are the results of applying the rules to the molecule. If a certain rule does not apply for an atom, it is given a value of0in that row, otherwise the value is that determined by the rule. The rules used are formalised in equations 7.4 to 7.11 and are described below.

qreq=Q−

i∈G qi (7.4) qCGf−qCG0=0 ifqCG0 Z (7.5) p(CGf) =p(CG0) (7.6) Δqi=0 ifiis pre-mapped (7.7) Δqi=0 ifqi=0 (7.8) Δqi=0 if(χi−max n∈N(i)(χn)0.5)(qreq<0) (7.9) Δqi=0 if(χi−max n∈N(i)(χn)0.5)(qreq>0) (7.10) Δqiqj if j∈symm(i) (7.11)

The rules can be divided up into global rules, charge group rules, and atomic rules.

Global Rules Possibly the most important rule is that the amount of additional charge that should be added to the molecule, qreq, needs to be equal to the difference between the target

molecular charge,Q, and the total initial charge∑qi, as given by equation 7.4. If this rule is not

7. Algorithmic Design and Theory

black hole.

Charge Group Rules An initial charge group partitioning can be performed using the initial partial atomic charges. The charge groups given by this partitioning can then be used to apply charge group based rules. Two such rules are developed, equations 7.5 and 7.6, though only the first is generally used.

Equation 7.5 shows that if a given charge group has an integer total charge, after the redistri- bution of charge on the molecule, it should retain its integer charge. This rule does not preclude the addition of charge to atoms within this charge group as a small partial charge added to one atom can be balanced by subtracting the same magnitude partial charge from another in the group.

Equation 7.6 shows that the dipole of a charge group, p(CG), should remain constant with charge redistribution. As a dipole is conformationally dependent, this rule is not generally utilised.

Atomic Rules Atomic rules apply to individual or pairs of vertices of the molecular graph. Equation 7.7 shows that if a vertexihas been mapped using the first Athenaeum, its charge,qi,

should remain constant through charge redistribution. This rule follows from the spirit of the first Athenaeum as a source of well tested parameters. Allowing charge to be added to such a pre-mapped vertex would mean that the parameters would no longer match those of the source fragments.

Equation 7.8 shows that atoms with zero total partial charge should not have additional charge added to them. Generally this would be unnecessary as an explicit rule as any changes would be minor. However, it is provided for as sometimes it may be desired.

Equations 7.9 and 7.10 describe the situation when an atom is involved in a polar bond, defined as a bond where the magnitude of the difference in electronegativities, χ, of the two atoms making up the bond is larger than 0.5. If a total positive additional charge is required, additional charge is only added to the least electronegative atom of a polar bond, and if a total negative additional charge is required, additional charge is only added to the most electronegative atom of a polar bond.

The final rule is given by equation 7.11 which shows that if two vertices are determined to be symmetric partners of one another, the total charge added to one of the vertices must be equal to the total charge added to the other. This ensures that the symmetry of the molecule is maintained. Finalising Charges and Charge Groups The matrix formed by applying the rules described above is solved using linear least squares. Once solved, this gives the amount of additional

7.6. Parameterisation 1 2 3 4 5 6 7 8 9 1,2 2,3 3,4 4,5 1,5 1,6 6,7 6,8 6,9

Figure 7.13:Example of a line graph. The graphGis drawn in black with the line graphL(G)superim- posed in blue.

charge that needs to be added to each atom in the molecule. Due to rounding errors, adding these additional partial charges to their atoms may result in very minor differences between the target charge and the total charge. To deal with this, any remaining charge once additional charge has been added and the new partial charges rounded to the correct number of decimal points is added to either the single atom with largest signed partial charge, or the set of symmetric partners. In this way, the symmetry of the molecule is retained. With final charges determined, the charge groups can be recalculated.