

Homology Modeling


2. Protein Sidechain Modeling: A Self Consistent Mean Field Approach

2.1 Overview

Modeling the side-chain conformation of the amino acid residues of a protein from its sequence and a model of its backbone is a sub-problem of many current efforts in protein tertiary structure determination, prediction and design. In comparative protein structure modeling, for example, the backbone of the target protein is derived from a homologous template structure, onto which side-chains must subsequently be placed. In response to this ubiquitous need, prediction of side-chain conformation has been an active area of research over the last ten years (for review, see [1]).

Three elements are key to all side-chain modeling methods: the choice of the variables describing the conformational space, the choice of the energy function that measures the proximity of a given conformation to the true native conformation of the protein, and the choice of the minimization procedure. The next three sections cover these three points. The remaining sections describe the SCMF method and its accuracy.

2.2 Protein sidechain conformation. Rotamer libraries.


A protein structure is fully described by the three-dimensional coordinates of all its atoms. If bond lengths and bond angles are assumed to be fixed, the same protein can be described by its torsion (or dihedral) angles. This significantly reduces the number of degrees of freedom.
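A torsion angle is computed from the coordinates of four consecutive atoms, for example N, CA, CB and the γ atom for χ1. The following is a minimal standard-library sketch. It assumes that coordinates are given as (x, y, z) tuples, and the function names are our own. The sign follows the usual IUPAC convention: the angle is positive when the bond p1-p0 must be rotated clockwise, viewed along p1→p2, to eclipse the bond p2-p3.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in [-180, 180], for the atoms p0-p1-p2-p3."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane (p0, p1, p2)
    n2 = _cross(b2, b3)  # normal to the plane (p1, p2, p3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement gives 0°, and a trans arrangement gives ±180°.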



Even in dihedral angle space, the conformational space accessible to all sidechains of a protein remains very large. Most existing methods for modeling sidechain conformation alleviate this problem with a key approximation: they discretize the sidechain conformational space, so that each sidechain can only adopt a discrete set of conformations. This approximation is based on an observation from high-resolution experimental protein structures: side-chains tend to cluster around a discrete set of favored conformations, known as rotamers [2,3]. In most cases, these rotamers correspond to local minima of the side-chain potential energy map.
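The discretization can be illustrated on χ1 alone. This minimal sketch assumes the three canonical staggered positions (60°, 180° and -60°) as rotamer centers. Real rotamer libraries [4-9] provide residue-specific centers for every χ angle. Angles are compared on the circle, so -175° is treated as close to 180°.

```python
# Assumed rotamer centers for chi1 (degrees): the three staggered positions.
STAGGERED_CHI1 = (60.0, 180.0, -60.0)

def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_rotamer(chi, centers=STAGGERED_CHI1):
    """Index of the rotamer center closest to the observed angle chi."""
    return min(range(len(centers)),
               key=lambda k: angular_distance(chi, centers[k]))
```

For example, `nearest_rotamer(-175.0)` selects the 180° rotamer (index 1), and `nearest_rotamer(290.0)` selects the -60° rotamer (index 2).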

Many rotamer libraries are presently available [4-9]. Among these, the backbone-dependent rotamer library originally developed by Dunbrack and Karplus [5] is worth mentioning. The table below gives links to rotamer libraries available on the Web.

Rotamer libraries

Library name          Reference   Location
Ponder and Richards   [4]         http://www.fccc.edu/research/labs/dunbrack/sidechain/ponder_richards.rot
Dunbrack and Cohen    [5,6]       http://www.fccc.edu/research/labs/dunbrack/sidechain.html
Tuffery et al.        [7]         http://condor.urbb.jussieu.fr/Rotamer.php/
DeMaeyer et al.       [8]         http://www.fccc.edu/research/labs/dunbrack/sidechain/demaeyer.rot
Lovell et al.         [9]         http://kinemage.biochem.duke.edu/databases/rotamer.php


2.3 Protein Energetics


  • Molecular force-fields
  • Predicting the three-dimensional structure of a protein and simulating its dynamics remain two of the most active fields of research in computational biology. Both goals depend on our understanding of the energetics of the protein molecule, i.e. of the forces that stabilize its native conformation. Recent advances in computers and computational techniques enable us to study protein structure and dynamics more accurately and in more realistic environments. A key to these advances is the use of all-atom molecular force fields. Recent examples of force fields include the Merck molecular force field (MMFF) [10], OPLS-AA [11], AMBER [12] and CHARMM [13].

  • Energy functions for sidechain modelling
  • All force fields distinguish between bonded interactions (bond stretching, bond angle bending, torsion angle rotation) and non-bonded interactions. Most methods for modeling sidechain conformation represent sidechains by a discrete set of conformations (i.e. rotamers). In these rotamers, bond lengths and bond angles are fixed, and the sidechain torsion angles correspond to (or are very close to) minima of the torsion angle potentials. The bonded terms are then nearly identical for all rotamers, so only non-bonded interactions matter. The basic functional form for the non-bonded interaction energy between two atoms i and j has two terms. The van der Waals (vdW) term is approximated by a Lennard-Jones potential, and the electrostatic term is described by a Coulomb potential:

    $$E_{ij} = \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}} \qquad (1)$$

    $$E = \sum_{i<j} E_{ij} \qquad (2)$$

    where r_ij is the distance between atoms i and j, and A_ij, B_ij, q_i and q_j are parameters given by the force field of interest. ε0 is the dielectric constant of vacuum.

  • Hydrogen bonds
  • Hydrogen bonds are key to many phenomena, including the formation and stabilization of secondary structures. Their geometry should therefore be incorporated into potential energy functions as accurately as possible. In most force fields, hydrogen bonds are implicitly accounted for by equations (1) and (2). Explicit potentials are also available (see for example [14]).

  • Solvent interaction
  • Equations (1) and (2) account for intramolecular interactions and do not take into account the effect of the solvent. Several models, both implicit and explicit, have been developed to represent solvent interactions.
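The non-bonded pair energy described above is a Lennard-Jones term plus a Coulomb term, summed over atom pairs, and it can be evaluated directly. This minimal sketch rests on several assumptions. Distances are in Å, charges are in units of the elementary charge, and energies are in kcal/mol. The factor for 1/(4πε0) in these units is taken as about 332 kcal·Å/(mol·e²). The parameter values used with it are hypothetical and are not taken from any of the force fields cited above.

```python
import math
from itertools import combinations

# Assumed units: Angstrom, elementary charges, kcal/mol.
# 332.0637 kcal*A/(mol*e^2) is the usual value of 1/(4*pi*eps0) in these units.
COULOMB_CONSTANT = 332.0637

def pair_energy(r, a_ij, b_ij, q_i, q_j):
    """Non-bonded energy of one atom pair: A/r^12 - B/r^6 + k*qi*qj/r."""
    lennard_jones = a_ij / r**12 - b_ij / r**6
    coulomb = COULOMB_CONSTANT * q_i * q_j / r
    return lennard_jones + coulomb

def total_energy(coords, charges, a, b):
    """Sum of pair energies over all atom pairs i < j.

    coords: list of (x, y, z); charges: list of partial charges;
    a, b: dicts keyed by (i, j) with i < j (a hypothetical layout).
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        energy += pair_energy(r, a[(i, j)], b[(i, j)], charges[i], charges[j])
    return energy
```

For example, with A = 1 and B = 2 the Lennard-Jones term reaches its minimum, -1, at r = (2A/B)^(1/6) = 1.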


2.4 Search algorithms in sidechain conformation space


Search algorithms fall into two classes: stochastic and deterministic. Stochastic algorithms such as Monte Carlo [15] and genetic algorithms [16] follow probabilistic trajectories that converge, but they are not guaranteed to reach the global minimum of the system. Their outcome also depends on their initial conditions and on the seed of the random number generator. In contrast, deterministic methods such as Dead-End Elimination (DEE) [17] and SCMF [18] always give the same result for a given set of parameters. They do not, however, always converge, most often because of the computational time that convergence would require.
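The following sketch applies the stochastic approach to a toy rotamer problem using Metropolis Monte Carlo. It assumes that energies are precomputed as self terms `e_self[i][k]` and pair terms `e_pair[(i, j)][k][l]`; this layout is a hypothetical choice made for this example. It also assumes kT = 0.6 kcal/mol. The random generator is seeded explicitly, since the result depends on the seed.

```python
import math
import random

def assignment_energy(assign, e_self, e_pair):
    """Energy of one rotamer assignment: self terms plus pairwise terms."""
    energy = sum(e_self[i][k] for i, k in enumerate(assign))
    for (i, j), table in e_pair.items():
        energy += table[assign[i]][assign[j]]
    return energy

def metropolis(e_self, e_pair, kt=0.6, steps=5000, seed=0):
    """Metropolis Monte Carlo over rotamer assignments.

    e_self[i][k]: energy of rotamer k of residue i (hypothetical layout).
    e_pair[(i, j)][k][l]: energy between rotamer k of i and rotamer l of j.
    Returns the lowest-energy assignment visited and its energy.
    """
    rng = random.Random(seed)  # the result depends on this seed
    assign = [rng.randrange(len(rots)) for rots in e_self]
    energy = assignment_energy(assign, e_self, e_pair)
    best, best_energy = list(assign), energy
    for _ in range(steps):
        i = rng.randrange(len(e_self))             # pick a residue
        old = assign[i]
        assign[i] = rng.randrange(len(e_self[i]))  # propose a new rotamer
        # Full recomputation keeps the sketch short; real codes update locally.
        new_energy = assignment_energy(assign, e_self, e_pair)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            energy = new_energy                    # accept the move
            if energy < best_energy:
                best, best_energy = list(assign), energy
        else:
            assign[i] = old                        # reject: restore the old rotamer
    return best, best_energy
```

Only the lowest-energy assignment visited is returned. Nothing in the procedure guarantees that this is the global minimum.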

Both classes of algorithms have been applied to the problem of modeling sidechain conformation (for review, see [1]). The same methods can also be used for protein design.
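On the deterministic side, the dead-end elimination idea [17] can be sketched with its original criterion. Rotamer r of residue i is discarded if some other rotamer t of the same residue satisfies one condition: the best possible energy involving r is still higher than the worst possible energy involving t. In that case r cannot belong to the global minimum. The energy tables use the same hypothetical self/pair layout as a toy example. Stronger criteria used in practice are not included.

```python
def pair_table(e_pair, i, j):
    """Pair energies oriented as table[rotamer of i][rotamer of j], or None."""
    if (i, j) in e_pair:
        return e_pair[(i, j)]
    if (j, i) in e_pair:
        return [list(col) for col in zip(*e_pair[(j, i)])]  # transpose
    return None

def dead_end_elimination(e_self, e_pair):
    """Iterate the original DEE criterion until no rotamer can be removed."""
    n = len(e_self)
    alive = [set(range(len(rots))) for rots in e_self]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    # Best case for r versus worst case for t.
                    best_r = e_self[i][r]
                    worst_t = e_self[i][t]
                    for j in range(n):
                        table = pair_table(e_pair, i, j) if j != i else None
                        if table is None:
                            continue
                        best_r += min(table[r][s] for s in alive[j])
                        worst_t += max(table[t][s] for s in alive[j])
                    if best_r > worst_t:
                        alive[i].discard(r)  # r can never be in the optimum
                        changed = True
                        break
    return alive
```

If a single rotamer survives for every residue, that assignment is the global minimum. Otherwise the remaining rotamers must be searched by another method.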

2.5 Prediction of protein sidechain conformation using SCMF


Here I focus on the SCMF method that Marc Delarue and I originally developed for protein sidechain modeling [18]. Its application to sidechain modeling is outlined in the figure below.
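In a rotamer-based model, the SCMF iteration can be summarized by two equations. The notation here is our own sketch:

- P(i,k) is the probability that residue i adopts rotamer k.
- x_ik are the corresponding atomic coordinates.
- U is the two-body potential.
- E(i,k) is the mean-field effective energy.
- RT is the thermal energy.

The effective energy of a rotamer combines its interaction with the fixed backbone and the probability-weighted interactions with all rotamers of the other residues. The probabilities are then recomputed with a Boltzmann-like formula:

$$E(i,k) = U(x_{ik}, \mathrm{backbone}) + \sum_{j \neq i} \sum_{l} P(j,l)\, U(x_{ik}, x_{jl})$$

$$P(i,k) = \frac{\exp\left(-E(i,k)/RT\right)}{\sum_{k'} \exp\left(-E(i,k')/RT\right)}$$

The iteration starts from uniform probabilities and repeats both steps until the probabilities no longer change. The predicted structure is then built from the most probable rotamer of each residue.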





Modeling sidechain conformation using SCMF requires special attention to the following points:

  • A two-body energy function.
    The mean-field equation for the effective energy of the system was derived for a two-body potential U. Both the Lennard-Jones and the Coulomb potentials satisfy this condition. A multi-body potential, however, such as a surface term for implicit solvent, would require a modification of the formalism. The same problem would arise if the Coulomb potential were replaced by a Generalized Born potential, since the Born radii are structure-dependent. To my knowledge, this has not yet been attempted.

  • A soft potential for close contact.

  • Both the Lennard-Jones potential and the Coulomb potential described by equation (1) diverge as the interatomic distance goes to 0. This can be a problem because sidechains are positioned according to fixed rotamer orientations, so steric clashes may occur. To remove these infinite or very large energy barriers, the potential energy function is truncated at a maximum value of 10 kcal/mol, as described by Levitt [19].

  • A geometric potential for disulphide bridges

  • If the protein contains known disulphide bridges, a distance-dependent potential is introduced to guide the formation of each disulphide bond with proper geometry, as defined in the ECEPP energy function [20].

  • Updating the probabilities

  • Direct application of the Boltzmann-like equation to update the probabilities can cause the total effective energy of the system to oscillate. These oscillations can be removed by giving the system a memory [18,21]:

    $$P_{new} = \mu\, P_{comp} + (1 - \mu)\, P_{old}$$

    where P is the matrix containing the probabilities. The subscripts new, comp and old refer to the updated probability matrix, the probability matrix obtained from the Boltzmann-like formula, and the old probability matrix, respectively. µ is a constant factor, usually set to 0.5 [18].
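The points above can be combined into a compact SCMF loop on a toy rotamer problem. This is a sketch, not the implementation of [18], and it makes several assumptions. The energy tables are precomputed per rotamer (interaction with the fixed backbone) and per rotamer pair. The 10 kcal/mol cap is applied here to rotamer-pair energies rather than to individual atom-pair terms. RT = 0.6 kcal/mol (roughly room temperature) is an assumed value. The disulphide term is omitted.

```python
import math

def scmf(e_self, e_pair, rt=0.6, mu=0.5, cap=10.0, tol=1e-6, max_iter=500):
    """Self-consistent mean field over rotamer probabilities (toy sketch).

    e_self[i][k]: energy of rotamer k of residue i with the fixed backbone.
    e_pair[(i, j)][k][l]: energy between rotamer k of i and rotamer l of j.
    Returns the most probable rotamer per residue and the probability matrix.
    """
    # Soft potential: truncate large pair energies at `cap` (kcal/mol).
    pair = {key: [[min(e, cap) for e in row] for row in table]
            for key, table in e_pair.items()}
    # Start from uniform probabilities.
    p = [[1.0 / len(rots)] * len(rots) for rots in e_self]
    for _ in range(max_iter):
        # Mean-field effective energy: self term plus probability-weighted pairs.
        eff = [list(rots) for rots in e_self]
        for (i, j), table in pair.items():
            for k in range(len(e_self[i])):
                for l in range(len(e_self[j])):
                    eff[i][k] += p[j][l] * table[k][l]
                    eff[j][l] += p[i][k] * table[k][l]
        # Boltzmann-like update, damped by the memory factor mu.
        change = 0.0
        for i, energies in enumerate(eff):
            e_min = min(energies)  # shift exponents for numerical stability
            w = [math.exp(-(e - e_min) / rt) for e in energies]
            z = sum(w)
            for k, wk in enumerate(w):
                new = mu * wk / z + (1.0 - mu) * p[i][k]
                change = max(change, abs(new - p[i][k]))
                p[i][k] = new
        if change < tol:
            break
    best = [max(range(len(pi)), key=pi.__getitem__) for pi in p]
    return best, p
```

With µ = 0.5, each new probability is the average of the Boltzmann-like estimate and the previous value, which damps the oscillations described above.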



2.6 Sidechain prediction: accuracy


  • Sidechain prediction based on SCMF
  • The SCMF method for sidechain prediction was tested on 340 protein domains, ranging in size from 100 to 695 residues. These are the "complete" domains (i.e. with complete backbone) of the SCOP 1.55 [22] fold library. Results are shown below.




  • Comparing SCMF and SCWRL


  • SCMF was shown to compare well with other automated techniques for the prediction of side-chain conformation [18]. The figure below shows the comparison of SCMF with SCWRL [23] for predicting the conformation of side-chains of the 340 proteins considered above.




    SCWRL first chooses, for each sidechain, the most probable rotamer given the local conformation of the backbone. It then optimizes these conformations using Monte Carlo to remove clashes and improve packing. Over all residues, SCMF and SCWRL have the same level of accuracy. SCWRL, however, works better than SCMF for exposed residues. The conformations of these side-chains are mainly determined by their interactions with the solvent, and these interactions are not included in the energy function used by SCMF. SCWRL does not consider the solvent explicitly either, but solvent effects are implicitly included in its backbone-dependent rotamer probabilities.
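Side-chain prediction accuracy is commonly reported as the fraction of residues whose predicted χ1 (or χ1 and χ2) lies within a tolerance of the experimental value. This sketch computes that fraction. The 40° default tolerance is an assumption (a frequently used threshold), not necessarily the criterion behind the results above.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def chi_accuracy(predicted, native, tolerance=40.0):
    """Fraction of residues whose predicted angle is within tolerance of native."""
    if len(predicted) != len(native):
        raise ValueError("predicted and native must have the same length")
    if not predicted:
        return 0.0
    hits = sum(angular_distance(p, n) <= tolerance
               for p, n in zip(predicted, native))
    return hits / len(predicted)
```

For example, `chi_accuracy([60, -170, 300], [65, 175, 180])` counts the first two residues as correct, so it returns 2/3. For side chains with symmetric terminal groups (for example χ2 of Phe, Tyr and Asp), the comparison is usually made modulo 180°.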



References

1. Vasquez, M. Modeling sidechain conformation. Curr. Opin. Struct. Biol., 6, 217-221 (1996).

2. Janin, J, Wodak, S, Levitt, M and Maigret, B. Conformation of amino-acid side-chains in proteins. J. Mol. Biol., 125, 357-386 (1978).

3. Schrauber, H, Eisenhaber, F and Argos, P. Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. J. Mol. Biol., 230, 591-612 (1993).

4. Ponder, JW and Richards, FM. Tertiary templates for proteins: use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol., 193, 775-791 (1987).

5. Dunbrack, RL and Karplus, M. Backbone-dependent rotamer library for proteins: application to side-chain prediction. J. Mol. Biol., 230, 543-574 (1993).

6. Dunbrack, RL and Cohen, FE. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci., 6, 1661-1681 (1997).

7. Tuffery, P, Etchebest, C, Hazout, S and Lavery, R. A new approach to the rapid determination of protein side-chain conformations. J. Biomol. Struct. Dyn., 8, 1267-1289 (1991).

8. DeMaeyer, M, Desmet, J and Lasters, I. All in one: A highly detailed rotamer library improves both accuracy and speed in the modelling of sidechains by dead-end elimination. Folding & Des., 2, 53-66 (1997).

9. Lovell, SC, Word, JM, Richardson, JS and Richardson, DC. The penultimate rotamer library. Proteins: Struct. Func. Genet., 40, 389-408 (2000).

10. Halgren, TA. Merck molecular force field. I. Basis, form, scope, parametrization and performance of MMFF94. J. Comput. Chem., 17, 490-519 (1996).

11. Jorgensen, WL, Maxwell, DS and Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc., 118, 11225-11236 (1996).

12. Cornell, WD et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc., 117, 5179-5197 (1995).

13. MacKerell, AD Jr., Wiorkiewicz-Kuczera, J and Karplus, M. An all-atom empirical energy function for the simulation of nucleic acids. J. Am. Chem. Soc., 117, 11946-11975 (1995).

14. Fabiola, F, Bertram, R, Korostelev, A and Chapman, MS. An improved hydrogen bond potential: impact on medium resolution protein structures. Protein Sci., 11, 1415-1423 (2002).

15. Metropolis, N, Rosenbluth, AW, Rosenbluth, MN, Teller, AH and Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087-1092 (1953).

16. Holland, JH. Adaptation in Natural and Artificial Systems, The MIT Press, Boston (1993).

17. Desmet, J, DeMaeyer, M, Hazes, B and Lasters, I. The dead-end elimination theorem and its use in protein side-chain positioning. Nature (London), 356, 539-542 (1992).

18. Koehl, P and Delarue, M. Application of a self consistent mean field theory to predict protein side-chain conformations and estimate their conformational entropy. J. Mol. Biol., 239, 249-275 (1994).

19. Levitt, M. Protein folding by constrained minimization and molecular dynamics. J. Mol. Biol., 226, 507-533 (1992).

20. Momany, FA, McGuire, RF, Burgess, AW and Scheraga, HA. Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, non bonded interactions, hydrogen bond interactions and intrinsic torsion potentials for the naturally occurring amino acids. J. Phys. Chem., 79, 2361-2366 (1975).

21. Finkelstein, AV and Reva, BA. Search for the stable state of a short chain in a molecular field. Protein Eng., 5, 617-624 (1992).

22. Murzin, AG, Brenner, SE, Hubbard, T and Chothia, C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247, 536-540 (1995).

23. Bower, MJ, Cohen, FE and Dunbrack, RL. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J. Mol. Biol., 267, 1268-1282 (1997).





  Page last modified 2 January 2005 http://www.cs.ucdavis.edu/~koehl/BioEbook/