
Accuracy Test of the Force Field Based Quasi-Chemical Method PAC-MAC

Chapter III

ABSTRACT

We present a comprehensive evaluation of the recently developed Pair Configurations to Molecular Activity Coefficients (PAC-MAC) method. PAC-MAC is a force field based quasi-chemical method for rapid calculation of binary phase diagrams. The accuracy of the method is tested by comparing the calculated excess mixing free energy with experimental data for 1092 binary mixtures. The root mean squared error (RMSE) is shown to be 0.15 kBT. Furthermore, a comparison with UNIFAC and molecular simulations is performed. UNIFAC shows a significantly higher accuracy (RMSE: 0.07 kBT), whereas molecular simulations lead to comparable results. The accuracy of both molecular simulations and PAC-MAC depends strongly on the force field used. The binary parameters of UNIFAC are optimized using experimental miscibility data, whereas the force field parameters are not. Therefore, a better performance of UNIFAC is expected. A concise dataset demonstrates the capacity of PAC-MAC to predict results obtained using Monte Carlo simulations.

This work has been published as: Sweere, A. J. M.; Serral Gracia, R.; Fraaije, J. G. E. M. Extensive Accuracy Test of the Force-Field-Based Quasichemical Method PAC-MAC. J. Chem. Eng. Data 2016, 61, 3989-3997.


INTRODUCTION

A predictive model for the calculation of thermodynamic miscibility properties is key to interpreting, predicting or replacing chemical experiments. Over the years, several methods have been developed, each with its own pros and cons. UNIFAC [1], COSMO-RS [2, 3] and SAFT [4, 5] are valued for their accuracy and short calculation time. However, a common drawback of these methods is the lack of a three-dimensional molecular structure. Molecular dynamics [6], on the other hand, explicitly includes the configurational information, but at the cost of extensive computations. A fast model that also takes the molecular conformation into account is the primary objective of the Pair Configurations to Molecular Activity Coefficients (PAC-MAC) method presented in Chapter II.

PAC-MAC is a force field based quasi-chemical method for the calculation of activity coefficients of multi-component mixtures. A force field refers to simplified potential energy functions and corresponding parameters for the calculation of the potential energy of a molecular system. The quasi-chemical approximation of Guggenheim [7] refers to an improvement of the regular solution model. The regular solution model assumes a random distribution of pairwise interactions between sites or molecules, whereas the quasi-chemical approximation contains a better description of the pair distribution by taking intermolecular pair energies into account.

The operational procedure of PAC-MAC consists of three steps. The first step is defining the surface of an optimized molecular conformation for all components within the mixture. The second step involves sampling $4m$ (with $m = 10^5$) adjacent molecular pair configurations, representing the possible observable configurations of two molecules within the first coordination shell of a mixture in the condensed phase. For each sampled pair, the intermolecular energy is calculated using classical force fields, by default the OPLS all-atom (OPLS-AA) force field [8], and the portion of the surface occupied by the neighboring molecule is tracked. In the final step, an expression for the free energy, based on the quasi-chemical approximation of Guggenheim [7], is minimized with respect to the pair probabilities, subject to a constraint requiring the molecular surface to be fully occupied by neighboring molecules. The constraint is added in order to obtain a proper packing of the molecules, because it is assumed that a molecule in the condensed phase is completely surrounded, even if the molecule contains associative sites. Activity coefficients and corresponding miscibility properties are derived from the expression of the free energy.

As shown in Chapter II, sampling of the required $4 \cdot 10^5$ pair configurations in PAC-MAC takes about 5 minutes on a single core (Intel® Xeon® E5-2620) and the additional calculation time to obtain activity coefficients is about 20 seconds for each chosen mole fraction. Monte Carlo (MC) simulations often require about $10^9$ MC steps to obtain a precise result for a single mole fraction [9], implying a great reduction of calculation time with the PAC-MAC model for the prediction of miscibility properties. A proof of concept of the PAC-MAC model was provided by comparing the obtained results with experimental data from the NIST Thermodynamic Data Engine database [10] for a concise set of binary mixtures.

In Chapter III, we extend the accuracy test of PAC-MAC by comparing the obtained excess Gibbs free energy of mixing with experimental data for a diverse dataset containing 1092 binary mixtures. In the extension, we have found it expedient to introduce two modifications to the original PAC-MAC method. The first modification involves the introduction of vacuum surface elements to soften the constraint on molecular surface coverage. The other modification involves the Staverman-Guggenheim expression [11] rewritten in PAC-MAC variables, to have a description of the combinatorial entropy of athermal mixtures. The obtained results are compared with UNIFAC [12-14] and, for a few binary mixtures, also with molecular simulations [9, 15, 16]. The derivation of the Staverman-Guggenheim expression, the derivation of the expression for the activity coefficients and calculated data are included in the Supporting Information.


THEORETICAL BASIS

The PAC-MAC method has been extensively discussed in Chapter II, to which we refer for all details. The general outline of PAC-MAC is formulated in this section; two modifications to the original model are explained in detail. The first adjustment involves the introduction of a vacuum surface fraction to allow partial occupation around the molecule. The other adjustment involves a reformulation of the Staverman-Guggenheim expression [11]. The method consists of three parts, explained in three subsections. The first subsection (“Tessellation of the molecular surface”) presents our definition of the molecular surface and the ensuing tessellation of the surface. The second subsection (“Sampling of pair configurations”) explains the sampling procedure of the pair configurations. The last subsection (“Solving system of equations”) presents the expression for the free energy.

Tessellation of the molecular surface

First, the molecular surfaces of all components in the mixture are required to perform a PAC-MAC calculation for obtaining miscibility properties. The molecular surface is defined by the exterior of spheres centered on the atomic nuclei [17]. The radius of the sphere around atom k is given by:

$$R_k = 0.66\,\sigma_{kk} \tag{1}$$

where $\sigma_{kk}$ is the Lennard-Jones distance parameter and the factor 0.66 (referred to as $s_{seg}$ in Chapter II) is an atom-type independent scaling coefficient obtained by fitting the coordination numbers of 12 selected solvent molecules calculated using molecular dynamics (MD) simulations. The molecular surface is divided into polygonal surface panels (mostly pentagon- and hexagon-shaped) with an area $A_j^I$ that is approximately equal to 0.5 Å². The chosen surface panel size is a trade-off between a fast PAC-MAC calculation (large size) and a high resolution of the molecule (small size). In Chapter II, we used 200 surface panels independent of the size of the molecule. A panel size of 0.5 Å² results in a comparable resolution for small molecules like acetonitrile (191 surface panels) and ethanol (206 surface panels).

Various algorithms have been developed to create a uniform distribution of panels over the molecular surface [3, 18, 19]. Within PAC-MAC, the tessellation is based on the spherical Fibonacci mapping technique [20]. We generate a spiral over the convex hull of the molecular surface with a separation distance of 0.8 Å between the turns. A convex hull is equal to the solvent accessible surface using a spherical probe with an infinite radius [21]. Subsequently, the centers of the panels are placed on the spiral separated by a distance of 0.8 Å from each other. Then, by using a repulsive force between the centers of the panels, a more uniform distribution is obtained. Finally, the polygonal surface panels are obtained using a Voronoi diagram [22] and the vertices of the panels are translated from the convex hull to the molecular surface. Figure 1 shows the tessellation of the molecular surface of acetonitrile (total area: 100.9 Å²) into 191 surface panels.

Figure 1. Acetonitrile molecule. Left: Ball-and-stick representation. Right: Molecular surface in PAC-MAC, divided into 191 surface panels of approximately 0.5 Å².
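As an illustration of the Fibonacci mapping idea, the following minimal sketch places quasi-uniform points on a single sphere. It is not the PAC-MAC tessellation itself: the spiral over the convex hull, the repulsive relaxation and the Voronoi construction are omitted, and the function name `fibonacci_sphere` is hypothetical.

```python
import math

def fibonacci_sphere(n_points, radius=1.0):
    """Return n_points quasi-uniform points on a sphere (Fibonacci spiral).

    Simplification: a single sphere is used here, whereas the text describes
    a spiral over the convex hull of the whole molecular surface.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))  # about 2.4 rad
    points = []
    for i in range(n_points):
        # Equal steps in z give every point the same share of the sphere area.
        z = 1.0 - (2.0 * i + 1.0) / n_points
        r_xy = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        points.append((radius * r_xy * math.cos(phi),
                       radius * r_xy * math.sin(phi),
                       radius * z))
    return points

# Example: 191 panel centers, the panel count quoted for acetonitrile.
centers = fibonacci_sphere(191, radius=2.0)
```

In a full tessellation these points would only be the starting positions of the panel centers; the panel polygons follow from a Voronoi construction around them.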

Sampling of pair configurations

In the PAC-MAC method, m pair configurations are sampled, with $m = 10^5$ by default, between all possible combinations of adjacent molecules. The sampling procedure consists of 6 steps:

1) The vertices of the polygonal surface panels are translated from the molecular surface to the convex hull of the surface to avoid indentation.

2) A random point on the convex hull surface is chosen on both molecules.

3) The molecules are rotated around their geometrical center so that the normal vectors of both points on the convex hull surface point in opposite directions.

4) The molecules are translated so that both points on the convex hull surface are at the same location.

5) Both molecules are given a random rotation around the normal vector.

6) Both molecules are randomly translated along the normal vector according to a uniform distribution in an interval controlled by atom type-independent distance scale parameters.


Figure 2. Pair sampling procedure. 1) Convex hull around the molecular surface. 2) Choosing a random point on the convex hull surface. 3) Rotation of molecules to obtain two normal vectors in opposite direction. 4) Translation of molecules to have both points on the same location. 5) Rotation of molecules around their normal vector. 6) Random translation in interval. Colors of surface panels refer to the closest atomic centers: hydrogen (white), carbon (gray), nitrogen (blue) or oxygen (red).
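Steps 3 to 6 of the sampling procedure can be sketched as a rigid-body placement of one molecule relative to the other. The sketch below is an illustration under simplifying assumptions, not the PAC-MAC implementation: molecule A is kept fixed (only the relative orientation is built), the contact points and outward unit normals are taken as given, and all function names as well as the bounds of the translation interval are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(a / norm for a in u)

def rotate(vec, axis, angle):
    """Rotate vec about a unit axis by angle (Rodrigues' rotation formula)."""
    c, s = math.cos(angle), math.sin(angle)
    k_x_v = cross(axis, vec)
    k_dot_v = dot(axis, vec)
    return tuple(vec[i] * c + k_x_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

def align_axis_angle(u, v):
    """Axis and angle of a rotation that maps unit vector u onto unit vector v."""
    axis = cross(u, v)
    if dot(axis, axis) < 1e-24:
        # u and v are parallel or anti-parallel: pick any axis perpendicular to u.
        helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = cross(u, helper)
    angle = math.acos(max(-1.0, min(1.0, dot(u, v))))
    return unit(axis), angle

def place_neighbor(atoms_b, p_a, n_a, p_b, n_b, spin, gap):
    """Place molecule B against the contact point p_a of the fixed molecule A.

    p_a, p_b: chosen points on the two convex hulls; n_a, n_b: outward unit normals.
    spin: rotation angle about the shared normal; gap: shift along the normal.
    """
    target = tuple(-a for a in n_a)
    axis, angle = align_axis_angle(n_b, target)
    placed = []
    for atom in atoms_b:
        rel = tuple(atom[i] - p_b[i] for i in range(3))  # relative to the contact point
        rel = rotate(rel, axis, angle)                   # step 3: normals anti-parallel
        rel = rotate(rel, n_a, spin)                     # step 5: spin about the normal
        # steps 4 and 6: move the contact point onto p_a, then shift along the normal
        placed.append(tuple(p_a[i] + gap * n_a[i] + rel[i] for i in range(3)))
    return placed

# Usage with placeholder geometry; the interval (0.0, 0.5) is an assumed example.
rng = random.Random(0)
example = place_neighbor([(0.0, 0.0, 0.0)], (2.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                         (0.0, 0.0, 1.0), (0.0, 0.0, 1.0),
                         spin=rng.uniform(0.0, 2.0 * math.pi),
                         gap=rng.uniform(0.0, 0.5))
```

Rotating about the contact point and then translating yields the same final pair geometry as rotating about the geometrical center followed by a translation, which is why the two translations of steps 4 and 6 can be combined here.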

Two calculations are performed for each sampled pair configuration. The first calculation is the intermolecular energy $w_i$ using the OPLS-AA force field [8]. Within the OPLS-AA force field, the total intermolecular energy consists of Coulomb and Lennard-Jones interactions. Hydrogen bonding effects are included within the point charges of the atoms. The second calculation involves tracking of the surface panels that are blocked (we also refer to such panels as ‘occupied’) by the neighboring molecule. A surface panel is blocked if its normal vector, starting from the center of the panel and perpendicular to the convex hull surface, pierces the molecular surface of the neighboring molecule. Figure 3 shows the intermolecular energy and occupied surface panels of the pair configuration obtained in Figure 2.


Figure 3. Calculation of the intermolecular energy and blocked surface panels (black colored) for a sampled pair configuration.
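The energy evaluation for one pair configuration can be sketched as a double loop over the atoms of both molecules, summing Lennard-Jones and Coulomb terms. This is a minimal sketch, not the code used in PAC-MAC: the atom tuple layout and function name are hypothetical, the geometric combining rules are the ones customarily used with OPLS-AA, and the Coulomb constant is an assumed value in kcal·Å/(mol·e²).

```python
import math

# Assumed conversion constant e^2 / (4 pi eps0) in kcal * Å / (mol * e^2).
COULOMB_KCAL = 332.06371

def pair_energy(mol_a, mol_b):
    """Intermolecular energy (kcal/mol) between two rigid molecules.

    Each atom is a tuple (x, y, z, charge, sigma, epsilon) with units
    Å, Å, Å, e, Å, kcal/mol. Only atom pairs on different molecules are
    included, so no intramolecular exclusions are needed.
    """
    energy = 0.0
    for xa, ya, za, qa, sig_a, eps_a in mol_a:
        for xb, yb, zb, qb, sig_b, eps_b in mol_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            sigma = math.sqrt(sig_a * sig_b)   # geometric combining rule
            eps = math.sqrt(eps_a * eps_b)     # geometric combining rule
            sr6 = (sigma / r) ** 6
            energy += 4.0 * eps * (sr6 * sr6 - sr6)     # Lennard-Jones term
            energy += COULOMB_KCAL * qa * qb / r        # Coulomb term
    return energy
```

Dividing the result by $k_B T$ in the same units gives the dimensionless pair energy that enters the free energy expression.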

Solving system of equations

The probability $x_i^{IJ}$ to obtain the sampled pair configuration i between molecules I and J is calculated by minimizing an expression for the free energy. The expression for the free energy is given by:

$$F^{mix} = F^{id} + F^{SG} + F^{comb} + F^{vac} + F^{int} \tag{2}$$

$F^{id}$ is the entropy of mixing of an ideal solution:

$$\frac{F^{id}}{N k_B T} = \sum_{I=A,B} x_I \ln x_I \tag{3}$$

In Equation (3), $x_I$ is the mole fraction of component I in the mixture of molecules A and B. $F^{SG}$

is the Staverman-Guggenheim correction (see Supporting Information): , , ln ln SG I I I I I I I A B I A B B I I I I x y F x y Nk T x y                 

(4)

In Equation (4), $\phi_I$ and $y_I$ are respectively the volume fraction and coordination fraction of component I in the mixture. Equation (4) is an analytically derived expression to describe the total number of possible ways to arrange molecules on a lattice, taking size and shape into account [23]. The quasi-chemical combinatorial term is [7]:

$$\frac{F^{comb}}{N k_B T} = \frac{1}{2} z \sum_{I=A,B} \sum_{J=A,B} \sum_{i=1}^{m} x_i^{IJ} \ln\left(\frac{x_i^{IJ}\, m}{y_I\, y_J}\right) \tag{5}$$

In Equation (5), z is the coordination number. We also introduce in this chapter the term vacuum fraction: the fraction of the molecular surface that is not in contact with a neighboring molecule. Because of steric effects, the molecular surface will never be fully occupied by surrounding molecules in the first coordination shell. Even in the fcc close-packed structure of equal spheres, 20% of the area is not covered by neighboring spheres, see the illustration in Figure 4.

Figure 4. Fcc close-packing of equal spheres. The green spheres cover 80.4% of the surface of the central red sphere.
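The 80.4% coverage can be reproduced with a short calculation. The sketch below assumes that the area covered by one neighbor is the spherical cap cut out by the cone from the center of the central sphere tangent to that neighbor; this definition is an assumption made for illustration, since the text does not state how the number was obtained. For touching equal spheres the cone half-angle is 30°, and the 12 caps of the fcc neighbors touch without overlapping.

```python
import math

def covered_fraction(n_neighbors=12, radius=1.0, center_distance=2.0):
    """Fraction of a sphere surface covered by tangent cones to its neighbors.

    Assumes the caps do not overlap, which holds for the 12 nearest
    neighbors in an fcc packing of touching equal spheres.
    """
    half_angle = math.asin(radius / center_distance)   # 30 degrees for touching spheres
    cap_fraction = (1.0 - math.cos(half_angle)) / 2.0  # cap area / full sphere area
    return n_neighbors * cap_fraction

coverage = covered_fraction()   # 6 - 3*sqrt(3), about 0.804
vacuum = 1.0 - coverage         # about 0.196, the roughly 20% left uncovered
```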

We assume that no intermolecular energy is involved in the vacuum-contribution. This leads to a free energy contribution that is purely entropic:

$$\frac{F^{vac}}{N k_B T} = \sum_{I=A,B} x_I \sum_{j=1}^{L_I} \frac{A_j^I}{\langle A_i^I \rangle}\, x_{j,I}^{vac} \ln\left(\frac{x_{j,I}^{vac}}{x_I^{vac}}\right) \tag{6}$$

In Equation (6), $x_{j,I}^{vac}$ is the vacuum surface fraction of surface panel j, $x_I^{vac}$ is the average vacuum surface fraction of molecule I, $A_j^I$ is the surface area of surface panel j and $\langle A_i^I \rangle$ is the average occupied surface area in an interaction on molecule I. $F^{int}$ is the total intermolecular energy of the binary mixture:

$$\frac{F^{int}}{N k_B T} = \frac{1}{2} z \sum_{I=A,B} \sum_{J=A,B} \sum_{i=1}^{m} x_i^{IJ} \frac{w_i^{IJ}}{k_B T} \tag{7}$$

In Equation (7), $w_i^{IJ}$ is the intermolecular energy of pair configuration i between molecules I and J calculated using the OPLS-AA force field.
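The two simplest contributions, the ideal mixing term of Equation (3) and the interaction term of Equation (7), can be evaluated directly once the pair probabilities are known. The sketch below assumes a hypothetical data layout (dictionaries keyed by the molecule pair, holding one value per sampled configuration) and takes the pair probabilities and their normalization as given; it does not perform the minimization.

```python
import math

def ideal_mixing(mole_fractions):
    """F_id / (N k_B T) = sum over I of x_I ln x_I (Equation 3)."""
    return sum(x * math.log(x) for x in mole_fractions if x > 0.0)

def interaction_term(z, pair_prob, pair_energy_kt):
    """F_int / (N k_B T) = (z / 2) * sum over I, J, i of x_i^IJ * w_i^IJ / (k_B T).

    pair_prob[(I, J)]      : list of pair probabilities x_i^IJ (assumed layout)
    pair_energy_kt[(I, J)] : list of pair energies w_i^IJ in units of k_B T
    """
    total = 0.0
    for key, probs in pair_prob.items():
        energies = pair_energy_kt[key]
        total += sum(p * w for p, w in zip(probs, energies))
    return 0.5 * z * total

# Toy input with one configuration per pair type (illustrative numbers only).
keys = [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")]
probs = {k: [0.25] for k in keys}
energies = {k: [-1.0] for k in keys}
f_id = ideal_mixing([0.5, 0.5])
f_int = interaction_term(6.0, probs, energies)
```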

The vacuum terms are absent in Chapter II. Because of the vacuum inclusion, the surface coverage constraint is now that each surface panel is covered by neighboring molecules or by the panel vacuum fraction:

$$\frac{1}{2} z \sum_{J=A,B} \sum_{i=1}^{m} x_i^{IJ} o_{ij}^{IJ} + \frac{1}{2} z \sum_{J=A,B} \sum_{i=1}^{m} x_i^{JI} o_{ij}^{JI} + x_I\, x_{j,I}^{vac} = x_I, \qquad j = 1..L_I,\quad I = A, B \tag{8}$$

The introduction of a vacuum fraction thus relaxes the former strict constraint that the molecule must be completely surrounded.

The free energy, Equation (2), is minimized by optimizing the variables $x_i^{IJ}$, $x_{j,I}^{vac}$ and $z$ subject to the constraint, Equation (8). Finally, the expression for the activity coefficient of molecule $I \in \{A, B\}$ is obtained using the “intercept rule”:

$$\ln \gamma_I = \frac{F^{ex}}{N k_B T} + (1 - x_I)\, \frac{\partial}{\partial x_I}\left(\frac{F^{ex}}{N k_B T}\right) \tag{9}$$

We refer to the Supporting Information for more details concerning the derivation of the activity coefficients, Equation (9), in PAC-MAC.
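The intercept rule can be illustrated numerically for a binary mixture. In the sketch below, the excess free energy per molecule (in units of $k_B T$) is supplied as a function of the mole fraction, and the derivative is approximated by a central difference. The one-parameter model `chi * x * (1 - x)` is only a hypothetical stand-in for the PAC-MAC free energy; for that model the exact result is $\ln \gamma_I = \chi (1 - x_I)^2$, which the sketch reproduces.

```python
def ln_gamma(f_ex, x_i, h=1e-5):
    """ln(gamma_I) via the intercept rule for a binary mixture.

    f_ex: function returning F_ex / (N k_B T) at mole fraction x_I.
    The derivative with respect to x_I is taken by central differences.
    """
    slope = (f_ex(x_i + h) - f_ex(x_i - h)) / (2.0 * h)
    return f_ex(x_i) + (1.0 - x_i) * slope

# Stand-in model (an assumption for illustration, not the PAC-MAC free energy).
chi = 1.2

def regular_solution(x):
    return chi * x * (1.0 - x)

ln_gamma_a = ln_gamma(regular_solution, 0.3)   # exact value: chi * 0.7**2
```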


RESULTS AND DISCUSSION

The original PAC-MAC model is evaluated quantitatively by comparing the calculated excess free energy with experimental data from the NIST database [10]. The experimental dataset consists of 1092 vapor-liquid equilibrium diagrams of binary mixtures at constant temperature (Pxy diagrams) and contains 166 different molecules. The distribution of the combinations of molecules is depicted in Figure 5. The 2184 components in the experimental dataset are clustered in 9 categories. The 12 ketones, 4 aldehydes, 16 ethers and 11 esters in the set are grouped in the category “molecules containing oxygen (excluding alcohols)”. The 10 amines, 3 nitriles and 3 nitrogen-containing aromatic compounds are grouped in the category “molecules containing nitrogen”. Three examples of the 22 molecules with different or multiple functional groups are DMSO, 2-aminoethanol and nitrobenzene.

Figure 5. Distribution of the 1092 binary mixtures in the experimental dataset. Molecules are clustered in 9 categories.

The temperatures in the experimental dataset are on average 336.3 K and 95% of the binary mixtures have a temperature between 280 K and 400 K. The temperature distribution of the 1092 Pxy diagrams is presented by a histogram in Figure 6.


Figure 6. Temperature distribution of the experimental dataset of 1092 Pxy diagrams. Average temperature: 336.3 K, lowest temperature: 253.2 K (acetone - hexane), highest temperature: 523.1 K (quinoline - meta-cresol).

Using the extended Raoult’s Law, assuming gaseous ideality, the activity coefficient of component I is calculated from the experimental Pxy diagram [24]:

$$\gamma_I = \frac{x_I^{gas}\, P\left(x_I^{liq}\right)}{x_I^{liq}\, P\left(x_I^{liq} = 1\right)} \tag{10}$$

Here, $x_I^{gas}$ and $x_I^{liq}$ represent the mole fraction of component I in the vapor and liquid phase, respectively, and $P\left(x_I^{liq}\right)$ is the vapor pressure of the binary mixture at mole fraction $x_I^{liq}$, so that $P\left(x_I^{liq} = 1\right)$ is the vapor pressure of pure component I. Subsequently, the experimental excess free energy of a binary mixture containing molecules A and B is calculated using:

$$\frac{F^{ex}_{exp}}{k_B T} = x_A^{liq} \ln \gamma_A + x_B^{liq} \ln \gamma_B \tag{11}$$
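The conversion from one point of a Pxy diagram to an experimental excess free energy can be sketched as follows. The function names and the numbers in the example are hypothetical; the ideal mixture used as a check obeys Raoult's law exactly, so both activity coefficients should equal one and the excess free energy should vanish.

```python
import math

def activity_coefficient(x_gas, x_liq, p_mixture, p_pure):
    """gamma_I from the extended Raoult's law with an ideal vapor phase."""
    return x_gas * p_mixture / (x_liq * p_pure)

def excess_free_energy(x_liq_a, gamma_a, gamma_b):
    """F_ex / (k_B T) = x_A ln(gamma_A) + x_B ln(gamma_B) for a binary mixture."""
    x_liq_b = 1.0 - x_liq_a
    return x_liq_a * math.log(gamma_a) + x_liq_b * math.log(gamma_b)

# Check with an ideal mixture (illustrative pure-component pressures, any unit).
p_pure_a, p_pure_b = 100.0, 50.0
x_a = 0.5
p_mix = x_a * p_pure_a + (1.0 - x_a) * p_pure_b   # Raoult's law total pressure
y_a = x_a * p_pure_a / p_mix                      # vapor mole fraction of A
gamma_a = activity_coefficient(y_a, x_a, p_mix, p_pure_a)
gamma_b = activity_coefficient(1.0 - y_a, 1.0 - x_a, p_mix, p_pure_b)
f_ex = excess_free_energy(x_a, gamma_a, gamma_b)
```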

The PAC-MAC model is tested by comparing the calculated excess free energy, $F^{ex}_{PAC-MAC}$, with $F^{ex}_{exp}$ at a mole fraction of $x_A^{liq} = x_B^{liq} = 0.5$ for all 1092 data points. If the activity coefficients $\gamma_A$ and $\gamma_B$ are not available at $x_A^{liq} = x_B^{liq} = 0.5$, then an estimate is obtained by linear interpolation between the nearest neighboring activity coefficients. The accuracy is represented by the correlation coefficient and the root mean squared error (RMSE) given by:
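The interpolation step can be sketched as a lookup of the two measured points that bracket the target mole fraction. The function name is hypothetical, and the sketch interpolates the activity coefficient itself (not its logarithm), following the wording of the text.

```python
def interpolate_at(x_values, y_values, x_target=0.5):
    """Linearly interpolate y at x_target between the two nearest neighbors.

    x_values: measured liquid mole fractions; y_values: activity coefficients.
    Raises ValueError when x_target is not bracketed by the measurements.
    """
    pairs = sorted(zip(x_values, y_values))
    for x, y in pairs:
        if x == x_target:          # measured exactly at the target
            return y
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        if x0 < x_target < x1:
            t = (x_target - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x_target lies outside the measured range")

# Illustrative values only.
gamma_at_half = interpolate_at([0.6, 0.4], [1.4, 1.2])
```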

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left(F^{ex}_{PAC-MAC,k} - F^{ex}_{exp,k}\right)^2} \tag{12}$$

With n being the number of data points and k being the index of a data point. The experimental values for the excess free energy of mixing are included in the Supporting Information.
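Both accuracy measures are straightforward to compute from paired lists of calculated and experimental values. The sketch below uses the Pearson correlation coefficient; the text only says "correlation coefficient", so the choice of Pearson is an assumption, as are the function names.

```python
import math

def rmse(calculated, experimental):
    """Root mean squared error between paired values (Equation 12)."""
    n = len(calculated)
    return math.sqrt(sum((c - e) ** 2
                         for c, e in zip(calculated, experimental)) / n)

def pearson_r(calculated, experimental):
    """Pearson correlation coefficient (assumed definition of the correlation)."""
    n = len(calculated)
    mean_c = sum(calculated) / n
    mean_e = sum(experimental) / n
    cov = sum((c - mean_c) * (e - mean_e)
              for c, e in zip(calculated, experimental))
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_c * var_e)
```

Because the excess free energies are expressed in units of $k_B T$, the RMSE carries the same unit.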

A scatterplot of $F^{ex}_{PAC-MAC}$ versus $F^{ex}_{exp}$ for the 1092 binary mixtures using the original vacuum-free PAC-MAC model is shown in Figure 7. The dataset is divided into 5 different categories to give more insight into the plot. The 77 mixtures containing water are grouped together in the category “Water – X”. All 474 remaining mixtures containing one or more hydrogen bonding donors (alcohols or amines) are represented by the category “HB donor – X”. Mixtures without any hydrogen bonding involved are divided into mixtures in which both molecules contain heteroatoms (“Heteroatom – Heteroatom”, 130 data points), mixtures in which only one molecule contains heteroatoms (“Heteroatom – Hydrocarbon”, 285 data points) and mixtures of hydrocarbons (“Hydrocarbon – Hydrocarbon”, 126 data points).

Figure 7. Scatterplot of excess free energies of mixing, original PAC-MAC versus experimental.

Figure 7 shows reasonable correlation between the calculated and experimental excess free energy. However, the plot contains many outliers significantly reducing the correlation coefficient to 0.558 and increasing the RMSE to 0.273 kBT. The outliers are primarily attributable to mixtures containing water. Omission of the 77 binary mixtures containing water results in a reduction of the RMSE to 0.177 kBT.