# PhD Defense - Haoyun Feng

| Start: | 12/4/2014 at 1:00PM |
|---|---|
| End: | 12/4/2014 at 4:30PM |
| Location: | 258 Fitzpatrick Hall |

**Haoyun Feng**

**December 4, 2014, 258 Fitzpatrick Hall, 1:00 pm**

*Advisor:*

**Dr. Jesus Izaguirre**

*Committee Members:*

**Dr. Eric Darve, Dr. Zoltan Toroczkai, Dr. Christopher Sweet**

*Students/Faculty are welcome to attend the presentation portion of the defense*

*Title*

ENSEMBLE METHODS FOR FLUX CALCULATION

*Abstract*

Molecular Dynamics (MD) simulation is a numerical tool for simulating the movements of molecules. It generates a sequence of coordinates representing the Brownian motion of molecules, which is called a trajectory. Biological studies such as drug design usually involve reaction mechanisms over long time scales, which may cost several years of CPU time to simulate. Therefore, ensemble algorithms have been developed to significantly accelerate MD simulations using distributed computing systems. A complication of ensemble algorithms is that they usually require a one-dimensional reaction coordinate (RC), and it is challenging to extract an RC from a high-dimensional conformational space. Two methodologies that overcome this complication have attracted attention over the past few years: the so-called Weighted Ensemble (WE) and Markov State Models (MSMs). Instead of an RC, both require a clustering of microscopic configurations into networks of "macro-states". However, defining macro-states is still a complicated procedure that relies on sufficient sampling of the conformational space and on the design of the clustering algorithm. In Chapters 1, 2, and 3, I show that WE rate predictions are less sensitive than MSM predictions to the definition of the macro-states. MSMs introduce significant biases in the computation of reaction rates, and these biases depend on the boundaries of the macro-states. On the other hand, AWE, a formulation of Weighted Ensemble that uses the notion of colors to compute fluxes along with a different algorithm to kill and split walkers, gives reliable flux estimates under varying definitions of the macro-states. Rigorous numerical experiments using alanine dipeptide and penta-alanine support this analysis. The results suggest that whereas an MSM provides a good idea of the metastable sets and a visualization of the overall dynamics, the computation of dynamical quantities is in general less biased when done using AWE.

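
For readers unfamiliar with how weighted-ensemble methods kill and split walkers, here is a minimal Python sketch of a generic resampling step inside a single bin. It is an illustration only and is not the AWE algorithm from the thesis. The function name `resample_bin`, the `(state, weight)` walker representation, and the merge-lightest/split-heaviest rule are all assumptions made for this example. The property it demonstrates is that the number of walkers in a bin is adjusted to a target while the total probability weight is conserved.

```python
import random


def resample_bin(walkers, target, rng):
    """Return `target` walkers carrying the same total weight.

    walkers: list of (state, weight) pairs with positive weights (assumed format).
    target:  desired number of walkers in this bin (hypothetical parameter).
    rng:     a random.Random instance, used to pick the survivor of a merge.
    """
    walkers = list(walkers)
    if not walkers:
        return []  # an empty bin cannot be repopulated by resampling alone

    # "Kill": merge the two lightest walkers until the bin is not overfull.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        total = w1 + w2
        # The survivor is chosen with probability proportional to its weight.
        survivor = s1 if rng.random() < w1 / total else s2
        walkers = [(survivor, total)] + walkers[2:]

    # "Split": halve the heaviest walker until the bin is full.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]

    return walkers
```

Because merging sums weights and splitting halves them, the total weight in the bin is unchanged by either operation, which is what lets weighted-ensemble estimates of fluxes remain unbiased under resampling.
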

Although the accuracy of AWE is not sensitive to the underlying partition, its efficiency can be affected. Current WE algorithms are developed using a Voronoi-bin partition of the conformational space, but this leads to a poor partition of the reaction coordinate. I further discuss how a metastable-state partition, which defines states with maximal internal kinetic connectivity, provides a better partition for AWE. Numerical results on alanine dipeptide show that AWE with a metastable-state partition is significantly more efficient than AWE with a Voronoi-bin partition, especially when the underlying partition has a small number of states. To further accelerate AWE, I worked on improving the efficiency of the algorithm for discovering metastable states from MD trajectories. In existing studies, Monte Carlo simulated annealing (MCSA) has been widely applied to define metastable states that optimize the metastability of the dynamical system. Chapter 6 proposes two greedy algorithms, G1 and G2, based on different definitions of the local search space, to improve on the efficiency and scalability of MCSA on distributed computing systems. Numerical experiments are conducted on two biological systems, alanine dipeptide and the WW domain. They demonstrate that G1 is the most efficient of the three algorithms on both a single-core machine and a distributed computing system. The sequential version of G2 is the slowest, but it gains the largest speedup on distributed computing systems.
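
To make the idea of optimizing metastability concrete, the following Python sketch shows one common way to score a partition and a naive greedy local search over it. It is a hedged illustration and is neither G1, G2, nor MCSA as defined in the thesis, whose local search spaces are not specified in this abstract. The sketch assumes that metastability is measured as the sum of self-transition probabilities of the lumped macro-state chain, and that the input is a matrix of transition counts between micro-states. The names `metastability` and `greedy_lump` are hypothetical.

```python
def metastability(counts, labels, n_macro):
    """Sum of self-transition probabilities of the lumped chain.

    counts:  square list of lists; counts[i][j] is the number of observed
             transitions from micro-state i to micro-state j (assumed input).
    labels:  labels[i] is the macro-state index assigned to micro-state i.
    n_macro: number of macro-states.
    """
    lumped = [[0.0] * n_macro for _ in range(n_macro)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            lumped[labels[i]][labels[j]] += c
    score = 0.0
    for a in range(n_macro):
        row_sum = sum(lumped[a])
        if row_sum > 0:  # an empty macro-state contributes nothing
            score += lumped[a][a] / row_sum
    return score


def greedy_lump(counts, labels, n_macro):
    """Reassign one micro-state at a time while metastability improves."""
    labels = list(labels)
    best = metastability(counts, labels, n_macro)
    improved = True
    while improved:
        improved = False
        for i in range(len(labels)):
            choice = labels[i]  # best macro-state found so far for micro-state i
            for a in range(n_macro):
                if a == choice:
                    continue
                labels[i] = a
                score = metastability(counts, labels, n_macro)
                if score > best + 1e-12:  # accept strict improvements only
                    best, choice, improved = score, a, True
            labels[i] = choice
    return labels, best
```

A greedy search of this kind accepts only improving moves, so it terminates quickly but can stop at a local optimum, whereas simulated annealing occasionally accepts worse moves to escape one. Each candidate move here is scored independently of the others, which is the kind of work that could be distributed across cores.
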