Biomolecular simulations can be conducted using highly diverse models and at different resolutions, each with their own strengths and weaknesses. In general, there is a trade-off between model detail and simulation efficiency: the greater the detail, the smaller the system that can be simulated. An appropriate resolution must be detailed enough to capture the properties of interest while remaining computationally tractable. Fig. 1.2 summarizes the different spatial resolutions applied to proteins and protein phase separation.
The most detailed resolution is quantum mechanics (QM), which explicitly accounts for electrons and orbitals and is mostly limited to very small systems, such as short peptide sequences. Hybrid methods such as quantum mechanics/molecular mechanics (QM/MM) may be used to extend the size of the system; however, QM regions are still limited to several hundred atoms[134]. One important contribution of QM calculations to the study of disordered proteins has been in constructing classical force fields for atomic-resolution simulations[135]. QM methods can also be used to improve and extend existing atomistic force fields to include non-canonical amino acids, such as those with post-translational modifications[67].
Figure 1.2: Overview of simulation resolutions. Higher-resolution techniques give more detailed insight into chemical systems, but are hindered by low computational efficiency and by limits on the system sizes that can be studied.
At the level of classical mechanics, the highest resolution is achieved by all-atom simulations with an explicit representation of solvent molecules. These simulations have been applied to many biomolecular systems owing to their high level of detail and their sufficient computational efficiency to treat a single IDP of several hundred amino acids, or a small assembly of shorter IDPs. Such simulations can provide structural characteristics of protein sequences[55, 136, 137] as well as detailed information on inter-residue interactions[78, 103], in good agreement with experimental measurements[138]. Rauscher et al. simulated 27 copies of a 35-residue elastin-like peptide (ELP) for a combined simulation time of 165 µs and verified the proposed liquid-like nature of ELP assemblies, showing that association is driven by nonspecific hydrophobic contacts and hydrogen bonds[78]. To date, this is the only such study of a large assembly of phase-separating IDPs at atomic resolution. However, the system size and the amount of sampling required to obtain converged results are highly demanding and thus beyond the reach of most research groups. One way to lower this sampling barrier would be to develop or use atomic-resolution force fields with implicit solvent based on mean-field theory[139].
To further overcome the obstacle of simulating large IDP assemblies, coarse-grained (CG) models are commonly employed, in which a group of atoms is represented collectively as a single coarse-grained “bead”[140]. The degree of coarse-graining is flexible, ranging from multiple beads per amino acid to multiple amino acids per bead[141], and follows the same trade-off between model detail and simulation efficiency described earlier. CG models commonly account for protein-solvent interactions implicitly by adjusting the protein-protein interactions accordingly, which further reduces the computational cost. CG models can be system-specific, optimized against the experimental data of one particular system[142, 143], or general-purpose, focusing on transferability to all IDP sequences[144–146]. CG simulations have been successfully applied to the study of IDP phase separation and assembly using multiple beads per residue[56, 147], a single bead per residue[88, 144], and multiple residues per bead[148–150]. For elucidating sequence-encoded phase separation, a single-bead-per-residue (residue-level) model strikes the best balance, minimizing the computational cost while explicitly representing the amino acid sequence. Dignon et al. proposed a general-purpose, residue-level model that treats IDPs as flexible chains, ignoring secondary structure, and parameterizes all 20 canonical amino acids based on either amino acid hydrophobicity[144] or bioinformatics-based contact potentials[151]. This model has been used successfully to reproduce the sequence-dependent phase behavior of disordered proteins[33, 35]. The framework also accommodates the introduction of non-canonical amino acids, improved interaction potentials, and the imposition of secondary structure through either rigid-body constraints or combined angle and dihedral potentials[144]. To date, the residue-level CG model is the most detailed model that can practically simulate IDP phase coexistence.
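As an illustration of how a hydrophobicity-based, residue-level pair interaction can be written, the sketch below assumes the Ashbaugh-Hatch functional form, in which a Lennard-Jones potential is scaled by a pairwise hydrophobicity λ_ij that, together with the bead diameter σ_ij, is obtained from arithmetic-mean combining rules. The residue diameters and λ values are hypothetical placeholders rather than published parameters, the 0.2 kcal/mol energy scale is an assumption, and the screened electrostatic term that such models add for charged residues is omitted.

```python
import math

# Hypothetical placeholder parameters, NOT published values:
# residue -> (bead diameter sigma in nm, hydrophobicity lambda in [0, 1]).
RESIDUES = {
    "F": (0.64, 1.00),
    "A": (0.50, 0.73),
    "K": (0.64, 0.28),
}
EPSILON = 0.8368  # kJ/mol, i.e. 0.2 kcal/mol (assumed energy scale)


def lennard_jones(r, sigma, eps):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def ashbaugh_hatch(r, sigma, lam, eps=EPSILON):
    """LJ scaled by hydrophobicity lam; continuous at the LJ minimum r = 2^(1/6) sigma."""
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    if r <= r_min:
        # Repulsive core plus a constant shift so that the well depth is lam * eps.
        return lennard_jones(r, sigma, eps) + (1.0 - lam) * eps
    # Attractive tail scaled by lam.
    return lam * lennard_jones(r, sigma, eps)


def pair_energy(res_i, res_j, r):
    """Residue-pair energy using arithmetic-mean combining rules for sigma and lambda."""
    sigma_i, lam_i = RESIDUES[res_i]
    sigma_j, lam_j = RESIDUES[res_j]
    return ashbaugh_hatch(r, 0.5 * (sigma_i + sigma_j), 0.5 * (lam_i + lam_j))


for pair in (("F", "F"), ("F", "K"), ("K", "K")):
    sigma = 0.5 * (RESIDUES[pair[0]][0] + RESIDUES[pair[1]][0])
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    print(pair, "well depth (kJ/mol):", round(pair_energy(*pair, r_min), 3))
```

With λ = 1 a pair feels the full Lennard-Jones attraction, while λ = 0 leaves only the purely repulsive core, so a single per-residue parameter controls how strongly each residue type attracts the others.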
Polymer theories can also be applied to the problem. Lin et al. combined Flory-Huggins theory with a random phase approximation (RPA) and successfully captured the interactions between charged amino acids[83]. Studying polyampholytic chains with different charge patterning, they further observed a strong correlation between the radius of gyration (Rg) of individual chains and their corresponding critical temperature for phase separation[37].
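For orientation, the mean-field Flory-Huggins part of such theories can be evaluated in a few lines. The sketch below uses the textbook lattice free energy of mixing per site for a polymer of N segments in a monomeric solvent, f(φ) = (φ/N) ln φ + (1 − φ) ln(1 − φ) + χφ(1 − φ) in units of kT, whose spinodal and critical point follow analytically. It deliberately omits the RPA electrostatic contribution used by Lin et al., so it illustrates only the chain-length dependence of the critical point, not charge-patterning effects; the function names are illustrative.

```python
import math


def fh_free_energy(phi, n_seg, chi):
    """Flory-Huggins free energy of mixing per lattice site, in units of kT."""
    return (phi / n_seg) * math.log(phi) + (1.0 - phi) * math.log(1.0 - phi) + chi * phi * (1.0 - phi)


def spinodal_chi(phi, n_seg):
    """chi at which d^2 f / d phi^2 = 0 for polymer volume fraction phi."""
    return 0.5 * (1.0 / (n_seg * phi) + 1.0 / (1.0 - phi))


def critical_point(n_seg):
    """Critical point: phi_c = 1 / (1 + sqrt N), chi_c = (1 + 1 / sqrt N)^2 / 2."""
    root = math.sqrt(n_seg)
    return 1.0 / (1.0 + root), 0.5 * (1.0 + 1.0 / root) ** 2


for n_seg in (1, 10, 100, 1000):
    phi_c, chi_c = critical_point(n_seg)
    print(f"N = {n_seg:4d}: phi_c = {phi_c:.3f}, chi_c = {chi_c:.3f}")
```

Longer chains have a smaller critical χ and a lower critical volume fraction, with χ_c → 1/2 and φ_c → 0 as N → ∞.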
1.2.2 Advanced sampling of phase coexistence
Even with well-chosen models, efficient sampling of phase behavior remains a non-trivial task. One classic strategy is to improve sampling by constraining polymer chains to a simple lattice. Brute-force lattice Monte Carlo simulations have been used at residue level to study the phase behavior of short polyampholytic sequences and to determine the effects of charge patterning on phase separation[86]. Other studies used a much coarser model, representing multi-domain proteins and RNAs as chains of interaction sites on a lattice, parameterized specifically to capture behaviors observed in experiment[148, 149, 152]. Lattice representations, however, are limited in their ability to capture densities in the condensed phase[85]. Off-lattice representations therefore provide a more accurate description of protein chains, which we find justifies the additional sampling challenge.
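To make the lattice approach concrete, the following is a minimal sketch of Metropolis Monte Carlo for homopolymer chains on a periodic simple cubic lattice, using only reptation ("slithering snake") moves and a uniform contact energy ε between non-bonded nearest-neighbor beads. The move set, energy model, helper names, and parameter values are illustrative assumptions rather than the residue-level models of the cited studies, which use sequence-dependent interactions and richer move sets; the energy is recomputed from scratch at every step for clarity rather than speed.

```python
import math
import random

# Nearest-neighbour steps on a simple cubic lattice.
DIRECTIONS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def shift(site, step, box):
    """Move a lattice site by one step with periodic wrapping (box side >= 3)."""
    return tuple((s + d) % box for s, d in zip(site, step))


def contact_energy(chains, box, eps):
    """eps times the number of non-bonded nearest-neighbour bead pairs."""
    owner = {site: (c, i) for c, chain in enumerate(chains) for i, site in enumerate(chain)}
    visits = 0
    for site, (c, i) in owner.items():
        for step in DIRECTIONS:
            other = owner.get(shift(site, step, box))
            if other is None:
                continue
            c2, j = other
            if c2 == c and abs(i - j) == 1:
                continue  # bonded neighbours are not contacts
            visits += 1
    return eps * (visits // 2)  # each contact is seen from both beads


def reptation_step(chains, box, eps, beta, rng):
    """Attempt one slithering-snake move with Metropolis acceptance; return True if accepted."""
    c = rng.randrange(len(chains))
    # Choosing either end with equal probability keeps the proposal symmetric.
    chain = chains[c] if rng.random() < 0.5 else chains[c][::-1]
    new_head = shift(chain[-1], rng.choice(DIRECTIONS), box)
    occupied = {site for ch in chains for site in ch}
    occupied.discard(chain[0])  # the tail bead vacates its site during the move
    if new_head in occupied:
        return False  # excluded volume: reject overlapping moves
    trial = chain[1:] + [new_head]
    trial_chains = chains[:c] + [trial] + chains[c + 1:]
    d_energy = contact_energy(trial_chains, box, eps) - contact_energy(chains, box, eps)
    if d_energy <= 0 or rng.random() < math.exp(-beta * d_energy):
        chains[c] = trial
        return True
    return False


rng = random.Random(1)
box, eps, beta = 8, -1.0, 1.0  # hypothetical: attractive contacts, energies in units of kT
chains = [[(x, y, 0) for x in range(5)] for y in (0, 2, 4, 6)]  # four straight 5-bead chains
n_steps = 2000
accepted = sum(reptation_step(chains, box, eps, beta, rng) for _ in range(n_steps))
print("acceptance:", accepted / n_steps, "energy:", contact_energy(chains, box, eps))
```

Reptation is used here only because it is the simplest move that keeps chains connected and self-avoiding; realistic lattice studies combine it with local, pivot, and cluster moves to sample dense phases efficiently.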
Another common approach for sampling phase coexistence is grand canonical Monte Carlo (GCMC), which involves randomly attempting insertions and deletions of molecules[153]. One weakness of GCMC is that the acceptance probability of inserting a chain into a liquid-density phase drops rapidly as chain length increases, making the study of IDPs prohibitive without lattice representations and/or other enhanced sampling techniques. One such technique is configurational-bias Monte Carlo[154], which can be used to find the “holes” in the dense phase. On-lattice GCMC simulations using configurational biasing have been successful for systems of polymers up to 1000 residues[155]. Jacobs et al. used on-lattice GCMC with multicanonical biasing to examine the effects of interaction strengths and of the number of unique components on phase separation, showing that intermolecular interactions have a greater influence than the number of components[156]. Another popular method is Gibbs ensemble Monte Carlo (GEMC), in which the system is split between two separate boxes whose volumes can vary and between which particles may be transferred, thus yielding two coexisting “bulk” phases without an interface[155].
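The acceptance rules behind these methods can be written compactly. The sketch below implements the standard Metropolis criteria for GCMC insertion and deletion, expressed through an activity z = exp(βμ)/Λ³, and for a GEMC particle transfer between two boxes, following the usual textbook derivations for simple particles. The function names are hypothetical, and the last function is only a crude mean-field estimate, assuming each of the m beads of a randomly inserted chain independently lands on an empty site with probability 1 − φ; it illustrates why naive insertion fails for long chains, which is what configurational-bias schemes are designed to overcome.

```python
import math


def metropolis(log_ratio):
    """min(1, exp(log_ratio)), evaluated without overflow."""
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def gcmc_insert_acc(z, volume, n, d_energy, beta):
    """Accept inserting one molecule into a box holding n: min(1, zV/(n+1) exp(-beta dU))."""
    return metropolis(math.log(z * volume / (n + 1)) - beta * d_energy)


def gcmc_delete_acc(z, volume, n, d_energy, beta):
    """Accept deleting one of n molecules: min(1, n/(zV) exp(-beta dU))."""
    if n == 0:
        return 0.0
    return metropolis(math.log(n / (z * volume)) - beta * d_energy)


def gemc_transfer_acc(n_from, v_from, n_to, v_to, d_energy_from, d_energy_to, beta):
    """Accept moving one molecule from box 'from' to box 'to' in the Gibbs ensemble."""
    if n_from == 0:
        return 0.0
    ratio = n_from * v_to / ((n_to + 1) * v_from)
    return metropolis(math.log(ratio) - beta * (d_energy_from + d_energy_to))


def naive_chain_insertion(phi, m):
    """Mean-field chance (hypothetical estimate) that all m beads avoid occupied sites."""
    return (1.0 - phi) ** m


for m in (1, 10, 100, 1000):
    print(f"m = {m:4d}: naive insertion probability at phi = 0.3 ~ {naive_chain_insertion(0.3, m):.2e}")
```

Working with logarithms keeps the criteria finite even when large energy changes or very dilute boxes would make the raw ratio overflow.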
Another efficient way to simulate phase behavior is to use a slab geometry, in which two coexisting phases are simulated in an elongated simulation box with periodic boundary conditions, with two planar interfaces perpendicular to the long axis[85, 144, 157]. This strategy can be used with virtually any representation of IDPs, on or off lattice. Jung et al. have also applied the slab geometry to multi-component systems, obtaining converged results in excellent agreement with semi-GCMC and GEMC studies[157]. These observations suggest that the slab method could be highly beneficial to the study of LLPS of IDPs and other biomolecules.
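In practice, coexisting densities are read off a slab simulation from the density profile along the long axis. The sketch below, using only the standard library, assumes bead z-coordinates pooled from one or more frames of a box with long-axis length L_z and cross-sectional area A. It centers the slab with a circular mean, which handles a dense phase straddling the periodic boundary, histograms the coordinates, and averages the profile in a central window and far from the slab. The helper names and window widths are hypothetical choices (the windows must lie well inside each bulk region, away from the interfaces), and the synthetic coordinates stand in for simulation output.

```python
import math


def slab_profile(z_coords, box_z, area, n_bins):
    """Number-density profile along z with the slab centre shifted to z = 0."""
    # Treat z as an angle so the centre is found even across the periodic boundary.
    angles = [2.0 * math.pi * z / box_z for z in z_coords]
    mean_angle = math.atan2(sum(map(math.sin, angles)), sum(map(math.cos, angles)))
    z_centre = mean_angle / (2.0 * math.pi) * box_z
    width = box_z / n_bins
    counts = [0] * n_bins
    for z in z_coords:
        shifted = (z - z_centre + 0.5 * box_z) % box_z  # slab centre now at box_z / 2
        counts[min(int(shifted / width), n_bins - 1)] += 1
    centres = [(i + 0.5) * width - 0.5 * box_z for i in range(n_bins)]
    return centres, [c / (area * width) for c in counts]


def coexisting_densities(centres, densities, dense_half_width, dilute_min_distance):
    """Average the profile inside the slab and far away from it."""
    dense = [d for z, d in zip(centres, densities) if abs(z) < dense_half_width]
    dilute = [d for z, d in zip(centres, densities) if abs(z) > dilute_min_distance]
    return sum(dense) / len(dense), sum(dilute) / len(dilute)


# Synthetic data (hypothetical units): a dense slab between z = 40 and 60 holding
# 10 beads per unit length, surrounded by a dilute phase of 1 bead per unit length.
box_z, area = 100.0, 1.0
z_dense = [40.0 + 0.1 * k for k in range(200)]
z_dilute = [0.5 + k for k in range(100) if not 40.0 <= 0.5 + k < 60.0]
centres, densities = slab_profile(z_dense + z_dilute, box_z, area, n_bins=50)
rho_dense, rho_dilute = coexisting_densities(centres, densities, 8.0, 20.0)
print(f"dense: {rho_dense:.2f}, dilute: {rho_dilute:.2f}")
```

Repeating this at several temperatures yields the coexistence curve; a more careful analysis would also average over many frames and check that the bulk windows show flat plateaus.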