    • The dynameomics entropy dictionary: a large-scale assessment of conformational entropy across protein fold space

      Towse, Clare-Louise; Akke, M.; Daggett, V. (2017-04-04)
      Molecular dynamics (MD) simulations contain considerable information with regard to the motions and fluctuations of a protein, the magnitude of which can be used to estimate conformational entropy. Here we survey conformational entropy across protein fold space using the Dynameomics database, which represents the largest existing dataset of protein MD simulations for representatives of essentially all known protein folds. We provide an overview of MD-derived entropies accounting for all possible degrees of dihedral freedom on an unprecedented scale. Although different side chains might be expected to impose varying restrictions on the conformational space that the backbone can sample, we found that the backbone entropy and side chain size are not strictly coupled. An outcome of these analyses is the Dynameomics Entropy Dictionary, the contents of which have been compared with entropies derived by other theoretical approaches and experiment. As might be expected, the conformational entropies scale linearly with the number of residues, demonstrating that conformational entropy is an extensive property of proteins. The calculated conformational entropies of folding agree well with previous estimates. Detailed analysis of specific cases identifies deviations in conformational entropy from the average values that highlight how conformational entropy varies with sequence, secondary structure, and tertiary fold. Notably, alpha-helices have lower entropy on average than do beta-sheets, and both are lower than coil regions.
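The abstract above describes estimating conformational entropy from the spread of dihedral angles sampled in MD. One common way to illustrate the idea is a histogram-based estimate, S = -R Σ p_i ln p_i, over the bins of a single dihedral. The sketch below is an assumption made for illustration and not the authors' published protocol: the function name `dihedral_entropy` and the 30° bin width are hypothetical choices.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def dihedral_entropy(angles_deg, bin_width=30):
    """Histogram estimate of the entropy (J/mol/K) of one dihedral angle.

    Hypothetical helper: bins the sampled angles and evaluates
    S = -R * sum(p_i * ln p_i) over the occupied bins.
    """
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for angle in angles_deg:
        # Wrap into [0, 360) before binning; the final modulo guards
        # against floating-point values that round up to exactly 360.
        idx = int((angle % 360) // bin_width) % n_bins
        counts[idx] += 1
    total = len(angles_deg)
    return -R * sum((c / total) * math.log(c / total) for c in counts if c)

# A dihedral locked in one bin has zero entropy; one spread evenly
# over all 12 bins reaches the maximum for this binning, R * ln(12).
rigid = dihedral_entropy([10.0] * 100)
flexible = dihedral_entropy([15.0 + 30.0 * i for i in range(12)])
```

A narrow distribution (as expected for a well-ordered helix residue) gives a lower value than a broad one (as expected for a coil residue), which is the qualitative trend the abstract reports.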
    • The effect of chirality and steric hindrance on intrinsic backbone conformational propensities: tools for protein design

      Childers, M.C.; Towse, Clare-Louise; Daggett, V. (2016-06-09)
      The conformational propensities of amino acids are an amalgamation of sequence effects, environmental effects and underlying intrinsic behavior. Many have attempted to investigate neighboring residue effects to aid in our understanding of protein folding and improve structure prediction efforts, especially with respect to difficult-to-characterize states, such as disordered or unfolded states. Host-guest peptide series are a useful tool in examining the propensities of the amino acids free from the surrounding protein structure. Here, we compare the distributions of the backbone dihedral angles (φ/ψ) of the 20 proteogenic amino acids in two different sequence contexts using the AAXAA and GGXGG host-guest pentapeptide series. We further examine their intrinsic behaviors across three environmental contexts: water at 298 K, water at 498 K, and 8 M urea at 298 K. The GGXGG systems provide the intrinsic amino acid propensities devoid of any conformational context. The alanine residues in the AAXAA series enforce backbone chirality, thereby providing a model of the intrinsic behavior of amino acids in a protein chain. Our results show modest differences in φ/ψ distributions due to the steric constraints of the Ala side chains, the magnitudes of which are dependent on the denaturing conditions. One of the strongest factors modulating φ/ψ distributions was the protonation of titratable side chains, and the largest differences observed were in the amino acid propensities for the rarely sampled αL region.
    • Insights into Unfolded Proteins from the Intrinsic ϕ/ψ Propensities of the AAXAA Host-Guest Series

      Towse, Clare-Louise; Vymetal, J.; Vondrasek, J.; Daggett, V. (2016-01-19)
      Various host-guest peptide series are used by experimentalists as reference conformational states. One such use is as a baseline for random-coil NMR chemical shifts. Comparison to this random-coil baseline, through secondary chemical shifts, is used to infer protein secondary structure. The use of these random-coil data sets rests on the perception that the reference chemical shifts arise from states where there is little or no conformational bias. However, there is growing evidence that the conformational composition of natively and nonnatively unfolded proteins fails to approach anything that can be construed as random coil. Here, we use molecular dynamics simulations of an alanine-based host-guest peptide series (AAXAA) as a model of unfolded and denatured states to examine the intrinsic propensities of the amino acids. We produced ensembles that are in good agreement with the experimental NMR chemical shifts and confirm that the sampling of the 20 natural amino acids in this peptide series is far from random. Preferences toward certain regions of conformational space were both present and dependent upon the environment when compared under conditions typically used to denature proteins, i.e., thermal and chemical denaturation. Moreover, the simulations allowed us to examine the conformational makeup of the underlying ensembles giving rise to the ensemble-averaged chemical shifts. We present these data as an intrinsic backbone propensity library that forms part of our Structural Library of Intrinsic Residue Propensities to inform model building, to aid in interpretation of experiment, and for structure prediction of natively and nonnatively unfolded states.
    • Modeling Protein Folding Pathways

      Towse, Clare-Louise; Daggett, V. (2015-05-01)
      This chapter gives an introduction to protein simulation methodology aimed at experimentalists and graduate students new to in silico investigations. More emphasis is placed on the knowledge needed to select appropriate simulation protocols, leaving theoretical and mathematical depth to other texts. The chapter explains some of the more practical considerations of performing simulations of proteins, in particular, the additional considerations required when studying protein folding where nonnative environments are modeled. Forced unfolding simulations are highly relevant and invaluable in characterizing proteins naturally exposed to mechanical stress as a component of their biological function. The chapter illustrates this utility by discussing research that has been done primarily on the giant muscle protein titin. Using molecular dynamics (MD) simulations to investigate protein folding faces two main challenges. The most obvious relates to the timescale of protein folding and the computational expense required for adequate sampling.
    • Nature versus design: the conformational propensities of D-amino acids and the importance of side chain chirality

      Towse, Clare-Louise; Hopping, G.G.; Vulovic, I.M.; Daggett, V. (2014-11-27)
      D-amino acids are useful building blocks for de novo peptide design and they play a role in aging-related diseases associated with gradual protein racemization. For amino acids with achiral side chains, one should be able to presume that the conformational propensities of L- and D-amino acids are a reflection of one another due to the straightforward geometric inversion at the Cα atom. However, this presumption does not account for the directionality of the backbone dipole and the inverted propensities have never been definitively confirmed in this context. Furthermore, little is known about how alternative side chain chirality affects the backbone conformations of isoleucine and threonine. Using a GGXGG host-guest pentapeptide system, we have completed exhaustive sampling of the conformational propensities of the D-amino acids, including D-allo-isoleucine and D-allo-threonine, using atomistic molecular dynamics simulations. Comparison of these simulations with the same systems hosting the cognate L-amino acids verifies that the intrinsic backbone conformational propensities of the D-amino acids are the inverse of their cognate L-enantiomers. Where amino acids have a chiral center in their side chain (Thr, Ile) the β-configuration affects the backbone sampling, which in turn can confer different biological properties.
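The inversion described in the abstract above corresponds to mapping each backbone conformation (φ, ψ) to (−φ, −ψ). A minimal sketch of how such a comparison could be set up follows; it is an illustration under stated assumptions, not the authors' analysis. The helper names (`wrap`, `mirror`, `populations`, `overlap`), the 60° grid, and the toy samples are all hypothetical.

```python
from collections import Counter

def wrap(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180) % 360 - 180

def mirror(samples):
    """Invert each (phi, psi) pair through the origin of the Ramachandran map."""
    return [(wrap(-phi), wrap(-psi)) for phi, psi in samples]

def populations(samples, width=60):
    """Fractional population of each cell on a coarse phi/psi grid."""
    counts = Counter(
        (int((wrap(phi) + 180) // width), int((wrap(psi) + 180) // width))
        for phi, psi in samples
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def overlap(samples_a, samples_b, width=60):
    """Histogram overlap in [0, 1]; 1 means identical cell populations."""
    pa = populations(samples_a, width)
    pb = populations(samples_b, width)
    return sum(min(p, pb.get(cell, 0.0)) for cell, p in pa.items())

# Toy data (made up for illustration): the D samples are the exact
# inverse of the L samples, so mirroring them restores full overlap.
l_samples = [(-60, -45), (-65, -40), (-120, 130), (60, 45)]
d_samples = [(60, 45), (65, 40), (120, -130), (-60, -45)]
raw = overlap(d_samples, l_samples)
mirrored = overlap(mirror(d_samples), l_samples)
```

With real trajectories, an overlap near 1 after mirroring would be consistent with inverse propensities. Samples lying exactly on a cell edge can shift cells when mirrored, so a finer grid or a kernel density estimate may be preferable in practice.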
    • New Dynamic Rotamer Libraries: Data-Driven Analysis of Side-Chain Conformational Propensities

      Towse, Clare-Louise; Rysavy, S.J.; Vulovic, I.M.; Daggett, V. (2016-01-05)
      Most rotamer libraries are generated from subsets of the PDB and do not fully represent the conformational scope of protein side chains. Previous attempts to rectify this sparse coverage of conformational space have involved application of weighting and smoothing functions. We resolve these limitations by using physics-based molecular dynamics simulations to determine more accurate frequencies of rotameric states. This work forms part of our Dynameomics initiative and uses a set of 807 proteins selected to represent 97% of known autonomous protein folds, thereby eliminating the bias toward common topologies found within the PDB. Our Dynameomics-derived rotamer libraries encompass 4.8 × 10^9 rotamers, sampled from at least 51,000 occurrences of each of 93,642 residues. Here, we provide a backbone-dependent rotamer library, based on secondary structure ϕ/ψ regions, and an update to our 2011 backbone-independent library that addresses the doubling of our dataset since its original publication.
    • When a domain is not a domain, and why it is important to properly filter proteins in databases: conflicting definitions and fold classification systems for structural domains make filtering of such databases imperative

      Towse, Clare-Louise; Daggett, V. (2012-12-01)
      Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our consensus domain dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1,695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predictions suggest that 40% of proteins have either local or global disorder. One thing is clear: filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets.