Data Management and Analysis - New Methods for Understanding and Controlling the Self-Assembly

Since HOOMD-Blue is a general-purpose molecular dynamics package, it provides no system-specific functions. A single simulation therefore has to bring together the initial configuration of the desired epoxy system, the interaction parameters, the thermodynamic state point, and other simulation parameters such as the time step size and the output frequency. It is common practice to write a monolithic simulation script that contains all of this information. The advantage of this approach is that the script is easy to change quickly. However, exploring variations of the simulated model requires duplicating code, which makes bug fixing tedious because the same bug must be fixed in every copy of the original script. This approach soon becomes unmaintainable for large projects. To avoid duplicated code, an object-oriented Python package called epoxpy [102] was developed and is used throughout this research.

2.6.1 epoxpy

epoxpy is an open-source Python package that is essentially a collection of Python classes, each of which represents a single simulation. These simulation classes form a layer of abstraction over HOOMD-Blue and the bonding plugin dybond, which was developed as part of this research. They also leverage an open-source molecule builder package called mbuild to generate the initial particle positions and connectivities of the epoxy system being modeled. Every simulation class has a function called execute, which calls three subroutines: initialize, mix and run. The initialize function uses mbuild to generate the initial particle positions and connectivities. The mix function runs a high-temperature MD simulation with all the prescribed interaction potentials and generates a random initial condition that is representative of a fully miscible, uncured polymer blend. This phase is important for relaxing unphysical configurations that the molecule builder may generate. The run function executes the curing simulation, or the cooling simulations that are performed on the cured simulation output for purposes such as calculating equilibrium properties as ensemble averages in order to measure Tg.
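The execute pattern described above can be sketched as follows. This is a minimal illustration and not the epoxpy source: the class name ToyEpoxySimulation and the log attribute are hypothetical, and each stage only records that it ran instead of calling mbuild or HOOMD-Blue.

```python
from abc import ABC, abstractmethod


class Simulation(ABC):
    """Hypothetical base class: fixes the order of the three stages."""

    def __init__(self):
        self.log = []  # records which stages ran, in order

    def execute(self):
        # The order is fixed here once; subclasses only fill in the stages.
        self.initialize()
        self.mix()
        self.run()
        return self.log

    @abstractmethod
    def initialize(self):
        """Build initial particle positions and connectivities."""

    @abstractmethod
    def mix(self):
        """Randomize the uncured blend at high temperature."""

    @abstractmethod
    def run(self):
        """Run the curing (or cooling) simulation."""


class ToyEpoxySimulation(Simulation):
    """Hypothetical concrete class; each stage only records that it ran."""

    def initialize(self):
        # A real class would call the molecule builder here.
        self.log.append("initialize")

    def mix(self):
        # A real class would run high-temperature MD here.
        self.log.append("mix")

    def run(self):
        # A real class would run the curing simulation here.
        self.log.append("run")


stages = ToyEpoxySimulation().execute()
print(stages)
```

Keeping the stage order in the base class means a new simulation type only has to supply the three stage bodies, which is one way to avoid the script duplication discussed earlier.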

The simulation classes all share a common base class called Simulation, which encapsulates the functionality common to any simulation. The next level in this inheritance tree contains the EpoxySimulation class, which encapsulates the methods and variables that are common to all simulations of epoxy materials, as shown in Figure 2.6. Further down the tree, the simulation classes contain increasingly specific methods and variables that apply to specific simulations. For example, the ABCTypeEpoxyLJHarmonicSimulation class is responsible for running a specific epoxy system with “A”, “B” and “C” beads in which the non-bonded interactions are modeled using LJ and the bonded interactions use the harmonic bond potential, unlike ABCTypeEpoxyDPDHarmonicSimulation, which uses DPD as its non-bonded interaction. As each of these simulations is added to the class hierarchy, test cases that run a small simulation and validate the output against predefined expected values are also created. Simple expected values include the number of particles or the number of bonds in the system. More complex expected values include the average equilibrium bond distance of the system. Simple tests can catch coding errors that would otherwise break the system and raise exceptions in production runs. Errors caught at the time they are introduced are easier to fix because the code change that caused them is still fresh in the developer’s mind. Furthermore, these tests are executed automatically when code is committed to the code repository. This practice of running tests on every incremental code change to make sure that the system has not broken is known as “continuous integration”.
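A minimal sketch of such a hierarchy and of a test against predefined expected values is given below. The class names follow the text, but every method name, constructor argument and number is a hypothetical stand-in. In particular, the sketch assumes that "A" and "B" are single beads and that "C" is a linear chain, which the text above does not state.

```python
from abc import ABC, abstractmethod


class Simulation(ABC):
    """Functionality shared by every simulation (hypothetical sketch)."""

    @abstractmethod
    def nonbonded_potential(self):
        """Name of the non-bonded interaction model."""

    @abstractmethod
    def bonded_potential(self):
        """Name of the bond potential."""


class EpoxySimulation(Simulation):
    """State shared by all epoxy simulations: bead counts per type."""

    def __init__(self, num_a, num_b, num_c_chains, c_chain_length):
        self.num_a = num_a
        self.num_b = num_b
        self.num_c_chains = num_c_chains
        self.c_chain_length = c_chain_length

    def num_particles(self):
        return self.num_a + self.num_b + self.num_c_chains * self.c_chain_length

    def num_initial_bonds(self):
        # Assumption: only C chains are bonded before curing (n beads, n - 1 bonds).
        return self.num_c_chains * (self.c_chain_length - 1)


class ABCTypeEpoxySimulation(EpoxySimulation):
    """Still abstract: the potentials have not been chosen yet."""


class ABCTypeEpoxyLJHarmonicSimulation(ABCTypeEpoxySimulation):
    def nonbonded_potential(self):
        return "LJ"

    def bonded_potential(self):
        return "harmonic"


class ABCTypeEpoxyDPDHarmonicSimulation(ABCTypeEpoxySimulation):
    def nonbonded_potential(self):
        return "DPD"

    def bonded_potential(self):
        return "harmonic"


def test_small_system_counts():
    # Expected values are worked out by hand for this tiny system.
    sim = ABCTypeEpoxyLJHarmonicSimulation(
        num_a=10, num_b=20, num_c_chains=2, c_chain_length=10
    )
    assert sim.num_particles() == 50
    assert sim.num_initial_bonds() == 18


def test_abstract_class_cannot_be_instantiated():
    try:
        ABCTypeEpoxySimulation(1, 1, 1, 2)
    except TypeError:
        return
    raise AssertionError("abstract class was instantiated")


test_small_system_counts()
test_abstract_class_cannot_be_instantiated()
```

Tests of this shape are cheap enough to run on every commit. A check on the average equilibrium bond distance would need an actual MD run and is therefore not sketched here.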

2.6.2 epoxpy-flow

[Figure 2.6 (class diagram). Classes shown: Simulation, EpoxySimulation, ABCTypeEpoxySimulation, ABCNPTypeEpoxyDPDHarmonicSimulation, ABCTypeEpoxyDPDHarmonicSimulation, ABCNPTypeEpoxyLJHarmonicSimulation, ABCTypeEpoxyDPDFENESimulation, ABCTypeEpoxyDPDLJSimulation, ABCTypeEpoxyLJHarmonicSimulation, ATypeEpoxySimulation, ATypeEpoxyLJHarmonicSimulation.]

Figure 2.6: The class hierarchy in epoxpy. Note that only the yellow classes can be instantiated as objects and the others are all abstract classes.

Most materials discovery projects, including the present research, require running hundreds or even thousands of simulations per study, which need to be executed in parallel on supercomputers with many computing nodes. Two components are necessary for this workflow to be seamless. The first is an efficient data management framework to store, manage and query the data generated by the simulations. The second is a job submission framework that can create, submit and manage jobs on supercomputers. epoxpy-flow is a Python code that leverages the data management framework signac and the job management framework signac-flow to achieve both of these tasks. Apart from creating and submitting jobs for large parameter sweeps, epoxpy-flow is also responsible for performing post-processing operations on the simulation output, so that data analysis is done in a somewhat decentralized fashion. The output of the post-processing is typically the aggregate data obtained from a single simulation. This data includes quantities such as the mean temperature, pressure, diffusivity, gel point and final cure fraction, and is typically stored in the JSON (JavaScript Object Notation) file format.
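The parameter sweep and the per-simulation aggregate file can be illustrated with the standard library alone. The sketch below deliberately does not use signac or signac-flow. The function names, directory layout, parameter names and numbers are all hypothetical, and the "simulation output" is placeholder data.

```python
import itertools
import json
import statistics
import tempfile
from pathlib import Path


def make_state_points(parameter_grid):
    """Expand a dict of parameter lists into one dict per combination."""
    keys = sorted(parameter_grid)
    value_lists = [parameter_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def post_process(temperature_series, cure_fraction_series):
    """Reduce raw time series to the aggregate data of one simulation."""
    return {
        "mean_temperature": statistics.fmean(temperature_series),
        "final_cure_fraction": cure_fraction_series[-1],
    }


def write_job(workspace, index, state_point, aggregate):
    """Store the parameters and the aggregate data of one job as JSON files."""
    job_dir = Path(workspace) / f"job_{index:04d}"
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "state_point.json").write_text(json.dumps(state_point, sort_keys=True))
    (job_dir / "aggregate.json").write_text(json.dumps(aggregate, sort_keys=True))
    return job_dir


# Hypothetical sweep: three temperatures and two bond potentials.
grid = {"kT": [1.0, 2.0, 3.0], "bond_potential": ["harmonic", "fene"]}
state_points = make_state_points(grid)

with tempfile.TemporaryDirectory() as workspace:
    for index, state_point in enumerate(state_points):
        # Placeholder series stand in for real simulation output.
        aggregate = post_process([state_point["kT"]] * 3, [0.0, 0.5, 0.9])
        write_job(workspace, index, state_point, aggregate)
    n_jobs = len(list(Path(workspace).glob("job_*/aggregate.json")))

print(len(state_points), n_jobs)
```

Writing one small JSON file per job keeps the post-processing local to each simulation, so only the aggregate files need to be copied off the cluster for analysis.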

2.6.3 pandas and jupyter notebooks

Even though most of the post-processing is performed remotely, the data analysis is performed locally on a copy of the aggregate data in JSON format. An easy-to-use, web-based interactive Python editor called Jupyter Notebook is used to write the data analysis and visualization code. A Python package called pandas is used to convert the raw JSON data into a table-based data structure called a DataFrame, which makes extensive data analysis very convenient. All of the data analysis and visualizations produced in this research were obtained using the procedures described here.
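As an illustration of this step, the sketch below loads a few aggregate records from JSON into a DataFrame and averages one quantity per temperature. The column names and values are hypothetical placeholders, not results from this research, and the example assumes that pandas is installed.

```python
import json

import pandas as pd

# Hypothetical aggregate records, one per simulation (placeholder values).
raw = """[
  {"kT": 1.0, "bond_potential": "harmonic", "final_cure_fraction": 0.80},
  {"kT": 1.0, "bond_potential": "fene", "final_cure_fraction": 0.90},
  {"kT": 2.0, "bond_potential": "harmonic", "final_cure_fraction": 0.90},
  {"kT": 2.0, "bond_potential": "fene", "final_cure_fraction": 0.94}
]"""

records = json.loads(raw)

# One row per simulation, one column per parameter or aggregate quantity.
df = pd.DataFrame.from_records(records)

# Average the final cure fraction over all simulations at each temperature.
mean_cure = df.groupby("kT")["final_cure_fraction"].mean()
print(mean_cure)
```

Once the data is in a DataFrame, filtering by parameter, grouping and plotting can all be done interactively in a notebook cell.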

CHAPTER 3 DEVELOPING EFFICIENT METHODS FOR REACTION MODELLING OF EPOXY CROSSLINKING
