As part of MSF, we implemented a multi-state design logic based on Rosetta’s genetic algorithm (GA) to explore the sequence space. As described in Subsection 2.7.2, the GA maintains a population of design sequences that are evolved for a number of generations. The fitness of an individual sequence results from the application-specific design protocol and a user-defined fitness function. For the initial implementation of MSF, two widely used Rosetta protocols were integrated: ENZDES provides ligand binding and enzyme design functionality by repacking and redesigning residues around the binding/active site and by optimizing catalytic contacts [Richter et al., 2011]. ANCHORED redesigns a protein interface by using information from a known interaction at the same interface of one partner [Lewis and Kuhlman, 2011]. The resulting applications were validated and expose all options of the two protocols and of the GA to the user. To accommodate the multi-state design approach, each sequence is evaluated with the application-specific design protocol on all given states. A combined score (the fitness) is calculated according to a so-called "dynamic aggregation function" (DAF, introduced by MPI_MSD [Leaver-Fay et al., 2011a]) that may weight states individually and supports both positive and negative design.
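The interplay of the GA and the aggregation of per-state scores can be illustrated with a minimal Python sketch. It is not the MSF or MPI_MSD implementation: the function names, the mismatch-counting toy energy, the weighted-sum form of the aggregation, and the simple truncation selection with point mutations are all assumptions chosen for illustration. Lower values are treated as better, in analogy to energies; a positive weight marks a state for positive design and a negative weight marks it for negative design.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_state_energy(sequence, state):
    """Hypothetical stand-in for running a design protocol on one state.

    Counts mismatches between the sequence and a per-state target string;
    a real protocol would graft the sequence onto a structure and score it.
    """
    return sum(1.0 for a, b in zip(sequence, state) if a != b)


def aggregate_fitness(energies, weights):
    """Assumed aggregation: weighted sum of per-state energies (lower is better).

    Positive weights reward low energy on a state (positive design);
    negative weights reward high energy on a state (negative design).
    """
    return sum(w * e for w, e in zip(weights, energies))


def evolve(population, states, weights, generations, rng):
    """Evolve the population and return the fittest sequence found."""

    def fitness(seq):
        return aggregate_fitness(
            [toy_state_energy(seq, s) for s in states], weights
        )

    for _ in range(generations):
        # Keep the better half, so the current best sequence always survives.
        ranked = sorted(population, key=fitness)
        survivors = ranked[: len(ranked) // 2]
        children = []
        while len(survivors) + len(children) < len(population):
            # Create a child by a single point mutation of a random survivor.
            parent = rng.choice(survivors)
            pos = rng.randrange(len(parent))
            child = parent[:pos] + rng.choice(AMINO_ACIDS) + parent[pos + 1:]
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


if __name__ == "__main__":
    rng = random.Random(42)
    states = ["ACDEFG", "ACDEFH"]   # two toy "states" (target strings)
    weights = [1.0, 1.0]            # both states used for positive design
    start = ["".join(rng.choice(AMINO_ACIDS) for _ in range(6)) for _ in range(10)]
    print(evolve(start, states, weights, generations=50, rng=rng))
```

Changing a weight to a negative value turns the corresponding state into a negative design target without touching the GA loop, which is the kind of flexibility the aggregation function is meant to provide.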
The framework MSF was integrated as an additional protocol into Rosetta and is written entirely in C++98. It aims at significantly reducing the development effort of equipping arbitrary Rosetta protocols with multi-state design capability. To this end, MSF bundles a number of classes responsible for distributing tasks to the available computational resources by establishing MPI-based [Gropp et al., 1996] communication and task synchronization, as well as for the initialization and execution of Rosetta protocols. The software architecture is the same for all protocols: one process is responsible for the logic of the GA, and a user-defined number of additional processes perform grafting and scoring, which makes the approach highly scalable. Every aspect of MSF was designed with flexibility in mind. This allows the modification of superficial aspects of the algorithm as well as easy access to core elements, due to a global allocation system that manages the instantiation of polymorphic key classes. Simply put, the most important functions of the multi-state design pipeline are exposed to the application and may be modified for each application individually without breaking compatibility with existing applications. Due to the modularity of the implementation, extending single-state Rosetta protocols with multi-state design capability is easier with MSF than with the already existing multi-state applications.

In the following, ROSETTA:ENZDES (or, for the sake of brevity, ENZDES) and ROSETTA:MSF:GA:ENZDES (MSF:GA:ENZDES) are the names of the SSD and MSD implementations for enzyme design; ANCHORED and MSF:GA:ANCHORED are the SSD and MSD implementations for anchored protein-protein interface design.

Comparison to existing MSD approaches in ROSETTA
This is not the first implementation of multi-state design in Rosetta. Because MSD methodology extends the application spectrum, Rosetta offers several multi-state applications; noteworthy are MPI_MSD [Leaver-Fay et al., 2011a] and RECON [Sevy et al., 2015]. MPI_MSD provides a generic multi-state design implementation based on a genetic algorithm that optimizes a single sequence on multiple states given a fitness function. RECON starts by individually optimizing one sequence for each state; subsequently, the computation of a consensus sequence is promoted by incrementally increasing convergence restraints. However, the current implementations of both methods are limited to certain design tasks and cannot make use of fine-tuned protocols, e.g. those required for enzyme design (ENZDES) or anchored design of protein-protein interfaces (ANCHOREDDESIGN). MSF was developed to overcome this limitation: its integration into Rosetta enables the use of already proven single-state protocols in an MSD environment.
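The division of labour in MSF (one process for the GA logic, additional processes for grafting and scoring) can be sketched as a task decomposition in which every (sequence, state) pair is an independent evaluation, so that one generation yields population × states tasks. The Python sketch below is only an illustration under stated assumptions: a thread pool from the standard library stands in for MPI-based communication, the toy mismatch energy replaces a real design protocol, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate_task(task):
    """Hypothetical worker job: 'graft' one sequence onto one state and score it."""
    seq_index, state_index, sequence, state = task
    # Toy energy: number of mismatches between sequence and state target.
    energy = sum(1.0 for a, b in zip(sequence, state) if a != b)
    return seq_index, state_index, energy


def evaluate_generation(population, states, n_workers):
    """Score every (sequence, state) pair and collect an energy table.

    The caller plays the role of the GA process; the pool plays the role of
    the worker processes. At most len(population) * len(states) workers can
    be busy at the same time.
    """
    tasks = [
        (i, j, seq, st)
        for (i, seq), (j, st) in product(enumerate(population), enumerate(states))
    ]
    energies = [[0.0] * len(states) for _ in population]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Leaving the with-block waits for all tasks, which mirrors the
        # synchronization needed before the GA moves to the next generation.
        for i, j, energy in pool.map(evaluate_task, tasks):
            energies[i][j] = energy
    return energies


if __name__ == "__main__":
    table = evaluate_generation(["AAA", "AAC"], ["AAA", "CCC"], n_workers=4)
    for row in table:
        print(row)
```

Each row of the resulting table holds the per-state energies of one sequence and could be passed to an aggregation function to obtain that sequence's fitness.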
MSF can be regarded as a progression of MPI_MSD: it relies on the same GA protocol and on the DAF to account for the different states. In MPI_MSD, the maximal number of processes that can be used efficiently is limited by the number of design states, whereas MSF is highly scalable and can utilize up to states × population processes. In addition, MSF was written from scratch and designed with high maintainability and flexibility in mind, allowing any developer to extend existing Rosetta protocols with multi-state design capability. We thus present MSF as a third option for multi-state design in Rosetta, which may especially be considered in cases where generic design algorithms cannot be applied or produce unsatisfactory results. MPI_MSD relies on the standard packing procedure, which grafts sequences onto a pose in a generic way and does not support specialized tasks like enzyme or interface design out of the box. At the time of writing, it was also not possible to utilize RECON in the same manner, since it does not support the strict separation of processes required by the GA to synchronize the design processes during the transition between generations.

Availability, installation, and command line options
At the time of writing, the integration of MSF into Rosetta’s master branch is ongoing work by Samuel Schmitz and aims at providing MSF support for upcoming ROSETTA releases. In the meantime, MSF is available via a git branch based on version 2015.19.57819, which contains the final version of MSF used for benchmarking and for the retro-aldolase designs. The branch, named SamuelSchmitz/msf_2015.19.57819, contains the two applications for enzyme design (msf_ga_enzdes) and anchored interface design (msf_ga_anchored). It can be accessed only by RosettaCommons developers, by cloning the repository via git [Torvalds and Hamano, 2010] and checking out the branch (https://github.com/RosettaCommons/main/tree/SamuelSchmitz/msf_2015.19.57819). Rosetta must be compiled with MPI support, e.g. ./scons.py mode=release extras=mpi msf_ga_enzdes msf_ga_anchored. The available command line options and an example command to run MSF are given in Appendix A.