

Optimization


2. Self Consistent Mean Field Theory

2.1 Overview

Many problems in computational biology face a severe combinatorial explosion. In protein structure prediction, for example, a systematic search of the structure of a 50-residue peptide, in which each residue can adopt no more than five different states, would require testing 5^50, i.e. approximately 9x10^34, conformations. Similarly, a systematic sequence engineering of a 100-residue protein would require testing 10^40 sequences. For simplified systems, such as lattice models, these combinatorial problems can be solved exhaustively by enumerating all possible states [1]. This approach, however, is not suited to much larger systems such as proteins, because the variable space grows exponentially with the size of the system. Various methods have been developed to alleviate this problem, and among them procedures based on mean-field theory (MFT) have become very popular. Section 2.2 gives a brief description of MFT; for a more detailed review, see [2]. Section 2.3 gives a non-exhaustive list of existing applications of MFT in computational biology.
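The arithmetic behind the first example is easy to check. The short Python sketch below is an illustration only, not part of the original text. It counts the conformations of the 50-residue, five-state peptide. For contrast, it also counts the probability variables that a mean-field description would carry if it kept one copy per state and per residue. That second count is an assumption used only for the comparison.

```python
# Exhaustive search versus mean-field variable count for the example above:
# a 50-residue peptide whose residues each adopt one of 5 states.
n_residues = 50
n_states = 5

# Exhaustive enumeration must visit every combination of residue states.
n_conformations = n_states ** n_residues
print(f"exhaustive: {n_conformations:.1e} conformations")  # exhaustive: 8.9e+34 conformations

# A mean-field description keeps one probability per state per residue,
# so the number of variables grows linearly instead of exponentially.
n_mean_field_variables = n_states * n_residues
print(f"mean field: {n_mean_field_variables} probabilities")  # mean field: 250 probabilities
```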
 

2.2 Mean field theory

We focus on the application of MFT to molecular systems.
 
  • The effective system: multicopy of the molecule


    We define the coordinate vector of all atoms in the molecular system as X. The probability of finding the molecular coordinates between X and X+dX is denoted by µ(X)dX, where µ(X) is the probability density of the coordinates, normalized to 1. The total energy of the system is given by:

    $$ E_{\mathrm{eff}} = \int U(X)\,\mu(X)\,dX \tag{1} $$

    where U is the potential energy function.
    If the system contains a single molecule with a unique conformation whose coordinates are X_0, then µ(X) = δ(X - X_0) and Equation (1) becomes:

    $$ E_{\mathrm{eff}} = U(X_0) \tag{2} $$

    The native conformation X_nat of the molecular system corresponds to the global minimum of E_eff. The search for this global minimum is hindered by the presence of many local minima. One way to alleviate this problem is to use an effective, larger system as a computational tool that enhances the sampling of the minimization procedure. This larger system is obtained by considering multiple copies of the molecule, or of part of the molecule.

    Assumption 1: Hartree approximation

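    The probability density µ is replaced by a product of independent probability densities of the different subsystems, using a Hartree product [3]:

    $$ \mu(X) = \prod_{j} \mu_j(X_j) \tag{3} $$

    where X_j denotes the coordinates of subsystem j.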

    As an example of such a partition into subsystems, a ligand and a protein can be considered separately [4,5]. Another example is the partition of a protein into backbone and side-chains [6]. The effective system is then built by considering multiple copies of each subsystem j.
     

     

    Figure: Multicopy sampling of the conformation of a protein. The protein is divided into its backbone (subsystem 0) and its side-chains. In the simple case presented here, the tyrosine at position j in the sequence has 3 possible conformations, corresponding to the 3 "copies" of subsystem j. Each copy k is given a weight, or probability, P(j,k). These weights are normalised to 1 over all copies.

     

    Assumption 2: Discrete sets

    The probability density µ_j of each subsystem is expanded into a finite number of delta functions:

    $$ \mu_j(X_j) = \sum_{k_j=1}^{K_j} P(j,k_j)\, \delta\big(X_j - X_{j,k_j}\big) \tag{4} $$
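    where k_j runs over all K_j copies of subsystem j, and X_{j,k_j} denotes the coordinates of copy k_j. The P(j,k_j) are normalization factors, or probabilities, verifying:

    $$ \sum_{k_j=1}^{K_j} P(j,k_j) = 1 \tag{5} $$

    For practical reasons, K_j is always finite.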
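    Assumptions 1 and 2 together mean that a configuration of the whole molecule is specified by choosing one copy k_j per subsystem. The probability of that configuration is the product of the P(j,k_j). The Python sketch below illustrates this bookkeeping. Its toy probabilities and the helper names `is_normalized` and `joint_probability` are hypothetical. They loosely mirror the figure: a single-copy backbone, a three-copy side chain, and one extra two-copy side chain.

```python
import itertools
import math

# Hypothetical toy system: P[j][k] is the probability of copy k of subsystem j.
P = [
    [1.0],              # subsystem 0: backbone, a single copy
    [0.5, 0.3, 0.2],    # subsystem 1: side chain with K_1 = 3 copies
    [0.6, 0.4],         # subsystem 2: side chain with K_2 = 2 copies
]

def is_normalized(P, tol=1e-12):
    """Normalization condition: the K_j probabilities of each subsystem sum to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in P)

def joint_probability(P, config):
    """Hartree product of the discrete densities; config[j] is the copy chosen for subsystem j."""
    return math.prod(P[j][k] for j, k in enumerate(config))

# Every configuration of the full molecule picks exactly one copy per subsystem.
configs = list(itertools.product(*(range(len(row)) for row in P)))
total = sum(joint_probability(P, c) for c in configs)
print(len(configs), round(total, 12))  # 6 1.0
```

    Because each subsystem is normalized independently, the joint probabilities over all configurations automatically sum to 1.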
     
     

  • The energy of the effective system

     

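    Substituting Equations (3) and (4) into Equation (1) and integrating over the spatial variables leads to the following functional form for the energy of an effective system made of N subsystems:

    $$ E_{\mathrm{eff}} = \sum_{k_1=1}^{K_1} \cdots \sum_{k_N=1}^{K_N} \Big( \prod_{j=1}^{N} P(j,k_j) \Big)\, U\big(X_{1,k_1}, \ldots, X_{N,k_N}\big) \tag{6} $$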
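    Equation (6) can be evaluated literally by enumerating every combination of copies. The Python sketch below does exactly that for an arbitrary potential `U`. Here `U` is a hypothetical callable assumed to take one coordinate entry per subsystem. The cost of the loop grows as the product of all K_j. With a single copy per subsystem, the result reduces to Equation (2).

```python
import itertools
import math

def effective_energy_bruteforce(P, coords, U):
    """Probability-weighted sum of U over all combinations of copies.

    P[j][k]: probability of copy k of subsystem j
    coords[j][k]: coordinates of copy k of subsystem j (any object U accepts)
    U: potential energy of one full configuration, given a list holding one
       coordinate entry per subsystem
    The loop visits prod_j K_j combinations, so it only suits tiny systems.
    """
    energy = 0.0
    for config in itertools.product(*(range(len(row)) for row in P)):
        weight = math.prod(P[j][k] for j, k in enumerate(config))
        energy += weight * U([coords[j][k] for j, k in enumerate(config)])
    return energy

# Hypothetical 1D example: two subsystems coupled by a harmonic term.
def U(xs):
    return (xs[0] - xs[1]) ** 2

print(effective_energy_bruteforce([[1.0], [0.5, 0.5]], [[0.0], [1.0, 3.0]], U))  # 5.0
print(effective_energy_bruteforce([[1.0], [1.0]], [[0.0], [2.0]], U))            # 4.0, i.e. U(X_0)
```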
    Assumption 3: Pair-wise potential

    The potential energy function U is assumed to be a sum of one-body and two-body terms:

    $$ U(X) = \sum_{i} U_i(X_i) + \sum_{i} \sum_{j>i} U_{ij}(X_i, X_j) \tag{7} $$
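    in which case Equation (6) reduces to:

    $$ E_{\mathrm{eff}} = \sum_{i} \sum_{k_i=1}^{K_i} P(i,k_i)\, U_i(X_{i,k_i}) + \sum_{i} \sum_{j>i} \sum_{k_i=1}^{K_i} \sum_{k_j=1}^{K_j} P(i,k_i)\, P(j,k_j)\, U_{ij}(X_{i,k_i}, X_{j,k_j}) \tag{8} $$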
    Equation (8) is the effective energy function used in most biological applications of MFT. In essence, MFT replaces the problem of finding the global minimum of the energy given by Equation (2) with the problem of finding the minimum of the "effective" potential energy given by Equation (8). Its major advantage is a significant reduction of the variable space. For example, a single configuration of the effective system simultaneously accounts for the K_i x K_j alternative configurations of subsystems i and j.
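    Under Assumption 3, the energies can be precomputed for every copy and every pair of copies, and Equation (8) becomes a few nested sums. The Python sketch below uses hypothetical names. `e1[i][k]` stands for U_i(X_{i,k}), and `e2[(i, j)][k][l]`, stored only for i < j, stands for U_ij(X_{i,k}, X_{j,l}). The work grows with the sum of K_i x K_j over interacting pairs, rather than with the product of all K_j.

```python
def effective_energy_pairwise(P, e1, e2):
    """Effective energy for a potential made of one- and two-body terms.

    P[i][k]: probability of copy k of subsystem i
    e1[i][k]: one-body energy U_i of copy k of subsystem i
    e2[(i, j)][k][l]: pair energy U_ij between copy k of i and copy l of j (i < j);
                      pairs missing from e2 are treated as non-interacting
    """
    # One-body terms: each copy is weighted by its own probability.
    energy = sum(p * e for row_p, row_e in zip(P, e1) for p, e in zip(row_p, row_e))
    # Two-body terms: each pair of copies is weighted by the product of their probabilities.
    for (i, j), table in e2.items():
        for k, p_ik in enumerate(P[i]):
            for l, p_jl in enumerate(P[j]):
                energy += p_ik * p_jl * table[k][l]
    return energy

# Hypothetical example: two subsystems with two copies each.
P = [[0.5, 0.5], [0.25, 0.75]]
e1 = [[1.0, 3.0], [0.0, 4.0]]
e2 = {(0, 1): [[0.0, 2.0], [4.0, 8.0]]}
print(effective_energy_pairwise(P, e1, e2))  # 9.25
```

    On this small example, the result matches the full enumeration of all four copy combinations weighted by P(0,k)P(1,l), as the reduction from Equation (6) requires.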
     
     

  • Minimization of the effective energy

     

    The effective energy E_eff can be minimized along three different routes:
     
     

    • Fixed probabilities, variable conformations: the LES protocol


      The normalisation factors P in Equation (8) are kept constant, usually P(j,k_j) = 1/K_j, where K_j is the number of copies of subsystem j. The positions of each copy of each subsystem are then optimized by solving Newton-like equations of motion. This procedure was initially proposed as the LES (Locally Enhanced Sampling) method [3,7].
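      A toy version of the LES idea is sketched below in Python. Everything in it is an assumption made for illustration:
      - two subsystems, A and B, moving in one dimension;
      - a double-well one-body potential and a harmonic coupling between A and B;
      - plain steepest descent in place of the Newton-like equations of motion used by LES.
      The probabilities stay fixed at 1/K. Each copy therefore moves under its own one-body force plus the coupling force averaged over all copies of the other subsystem. Copies of the same subsystem never interact.

```python
# Toy LES-style minimization (hypothetical 1D model, not the original LES code).
# Two subsystems, A and B, each represented by several copies with fixed
# probabilities P = 1/K_A and 1/K_B.

def u_one(x):
    """One-body double-well potential with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

def du_one(x):
    """Derivative of u_one."""
    return 4.0 * x * (x * x - 1.0)

def u_pair(x, y):
    """Harmonic coupling between a copy of A and a copy of B."""
    return 0.5 * (x - y) ** 2

def les_energy(xa, xb):
    """Effective energy (one- plus two-body form) with uniform, fixed probabilities."""
    pa, pb = 1.0 / len(xa), 1.0 / len(xb)
    energy = pa * sum(u_one(x) for x in xa) + pb * sum(u_one(y) for y in xb)
    energy += pa * pb * sum(u_pair(x, y) for x in xa for y in xb)
    return energy

def les_minimize(xa, xb, step=0.01, n_steps=2000):
    """Steepest descent on the copy positions (a stand-in for equations of motion).

    The force on a copy of A is its own one-body force plus the coupling force
    averaged over all copies of B (its mean field), and vice versa. This equals
    -dE/dx divided by the copy's fixed probability, so for a small enough step
    every update lowers the effective energy.
    """
    xa, xb = list(xa), list(xb)
    for _ in range(n_steps):
        fa = [-(du_one(x) + sum(x - y for y in xb) / len(xb)) for x in xa]
        fb = [-(du_one(y) + sum(y - x for x in xa) / len(xa)) for y in xb]
        xa = [x + step * f for x, f in zip(xa, fa)]
        xb = [y + step * f for y, f in zip(xb, fb)]
    return xa, xb

xa0, xb0 = [-1.5, -0.2, 0.3, 1.4], [-1.2, 0.1, 1.6]
xa, xb = les_minimize(xa0, xb0)
print(les_energy(xa0, xb0), les_energy(xa, xb))
```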
       
       

    • Fixed conformations, variable probabilities: the SCMF procedure


      The positions of the various copies of the subsystems are assumed to be known and fixed in space. The effective system is then described by its free energy F:

      $$ F = E_{\mathrm{eff}} - T S \tag{9} $$
      where T is the temperature, k_B the Boltzmann constant and S the entropy, defined as:

      $$ S = -k_B \sum_{j} \sum_{k=1}^{K_j} P(j,k) \ln P(j,k) \tag{10} $$
      The free energy F is a function of all the factors P, which are stored in a matrix P. The element P(j,k) of this matrix is the probability that subsystem j is in its possible state k.
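      For a given probability matrix, Equations (9) and (10) are straightforward to evaluate. The Python sketch below uses hypothetical helper names. It expresses the entropy in units of k_B, so the product T S becomes kT times the dimensionless entropy. It also uses the usual convention 0 ln 0 = 0, so copies with zero probability do not contribute.

```python
import math

def entropy(P):
    """Entropy in units of k_B: -sum_j sum_k P(j,k) ln P(j,k), with 0 ln 0 taken as 0."""
    return -sum(p * math.log(p) for row in P for p in row if p > 0.0)

def free_energy(e_eff, P, kT):
    """F = E_eff - T S, with S in units of k_B so that T S = kT * entropy(P)."""
    return e_eff - kT * entropy(P)

# One subsystem with four equally likely copies: S = ln 4 (in units of k_B).
P = [[0.25, 0.25, 0.25, 0.25]]
print(entropy(P), free_energy(1.0, P, kT=2.0))
```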

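      Minimization of F with respect to P leads to:

      $$ P(j,k) = \frac{\exp\big(-W(j,k)/k_B T\big)}{\sum_{l=1}^{K_j} \exp\big(-W(j,l)/k_B T\big)} \tag{11} $$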
      where W(j,k) is the molecular field potential "felt" by copy k of subsystem j:

      $$ W(j,k) = U_j(X_{j,k}) + \sum_{i \neq j} \sum_{k_i=1}^{K_i} P(i,k_i)\, U_{ij}(X_{i,k_i}, X_{j,k}) \tag{12} $$
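      The step from the free energy to the Boltzmann form of the probabilities can be sketched as follows. The sketch assumes symmetric pair terms, U_ij(X_i, X_j) = U_ji(X_j, X_i), and enforces the normalization of each subsystem with a hypothetical Lagrange multiplier λ_j. The derivative of the effective energy of Equation (8) with respect to a single probability is exactly the molecular field W(j,k):

```latex
$$ \frac{\partial E_{\mathrm{eff}}}{\partial P(j,k)}
   = U_j(X_{j,k}) + \sum_{i \neq j} \sum_{k_i=1}^{K_i} P(i,k_i)\, U_{ij}(X_{i,k_i}, X_{j,k})
   = W(j,k) $$

$$ \frac{\partial}{\partial P(j,k)} \Big[ E_{\mathrm{eff}} - T S
   + \sum_{j'} \lambda_{j'} \Big( \sum_{k'} P(j',k') - 1 \Big) \Big]
   = W(j,k) + k_B T \big[ \ln P(j,k) + 1 \big] + \lambda_j = 0 $$

$$ \Rightarrow \quad P(j,k) \propto \exp\big( -W(j,k)/k_B T \big) $$
```

      Normalizing over k recovers Equation (11). W(j,k) itself depends on the probabilities of the other subsystems, so the two equations must be solved iteratively.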
      Equations (11) and (12) are then iterated until convergence, that is, until self-consistency is achieved. This procedure is referred to as the SCMF (Self-Consistent Mean Field) method [2,6].
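      The SCMF loop can be sketched in Python as follows. The energy tables are hypothetical: `e1[j][k]` holds the one-body energy of copy k of subsystem j, and `e2[(i, j)][ki][kj]` holds the pair energy, stored once per interacting pair. Several practical choices are assumptions, not part of the text above:
      - the iteration starts from uniform probabilities;
      - each new probability matrix is blended with the previous one (parameter `mix`) to damp oscillations;
      - the loop stops when no probability changes by more than `tol`.
      The exponentials are shifted by the row minimum of W for numerical stability. This shift cancels in Equation (11).

```python
import math

def pair_energy(e2, i, ki, j, kj):
    """U_ij between copy ki of subsystem i and copy kj of subsystem j.
    e2[(a, b)][ka][kb] is stored once per interacting pair; absent pairs do not interact."""
    if (i, j) in e2:
        return e2[(i, j)][ki][kj]
    if (j, i) in e2:
        return e2[(j, i)][kj][ki]
    return 0.0

def mean_field(j, k, P, e1, e2):
    """W(j,k): one-body energy of copy k plus its pair energies averaged over
    the current probabilities of every other subsystem."""
    w = e1[j][k]
    for i, row in enumerate(P):
        if i != j:
            w += sum(p * pair_energy(e2, i, ki, j, k) for ki, p in enumerate(row))
    return w

def boltzmann(w_row, kT):
    """Boltzmann probabilities exp(-W/kT)/Z, shifted by min(W) to avoid overflow."""
    w_min = min(w_row)
    weights = [math.exp(-(w - w_min) / kT) for w in w_row]
    z = sum(weights)
    return [x / z for x in weights]

def scmf(e1, e2, kT=0.6, mix=0.5, tol=1e-8, max_iter=1000):
    """Iterate W -> P until the probabilities stop changing (self-consistency)."""
    P = [[1.0 / len(row)] * len(row) for row in e1]  # uniform starting guess
    for iteration in range(1, max_iter + 1):
        W = [[mean_field(j, k, P, e1, e2) for k in range(len(row))] for j, row in enumerate(P)]
        P_new = [boltzmann(w_row, kT) for w_row in W]
        # Damped update: blend new and old probabilities (rows stay normalized).
        P_mixed = [[mix * a + (1.0 - mix) * b for a, b in zip(r_new, r_old)]
                   for r_new, r_old in zip(P_new, P)]
        change = max(abs(a - b) for r1, r2 in zip(P_mixed, P) for a, b in zip(r1, r2))
        P = P_mixed
        if change < tol:
            return P, iteration
    return P, max_iter

# Hypothetical example: subsystem 0 prefers copy 0 on its own, and the pair term
# favours matching copies, so both subsystems should end up in copy 0.
e1 = [[0.0, 1.0], [0.0, 0.0]]
e2 = {(0, 1): [[-1.0, 1.0], [1.0, -1.0]]}
P, n_iter = scmf(e1, e2, kT=0.1)
print(n_iter, [[round(p, 3) for p in row] for row in P])
```

      Damping is a common safeguard rather than a requirement. With `mix=1.0`, the loop applies Equation (11) directly at each step, which can oscillate between states when couplings are strong.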
       
       

    • Simultaneous variation of conformations and probabilities


      With simultaneous variation of both coordinates and probabilities, it is in principle possible to combine the advantages and avoid the disadvantages of the two previous methods. Because the probabilities are allowed to vary, the copies of each subsystem can be weighted correctly, instead of receiving the fixed uniform weights used in LES. Furthermore, the choice of the basis set of conformations, which SCMF keeps fixed, is no longer critical. For a detailed description of this approach, see [8].

2.3 Applications

In the first application of MFT to protein simulation, Elber and Karplus [4] used an approximate mean-field treatment of protein-ligand dynamics to study in detail the diffusion pathways of carbon monoxide through myoglobin. Finkelstein and Reva [9] published another application: the search for stable protein folds. Other applications followed:
- finding minimum-energy conformations in side-chain modelling [6,10-13];
- protein conformation optimization [14-16];
- protein structure prediction on lattices [9,17];
- loop construction in protein homology modelling [18-20];
- protein sequence design [21-23].
 
 

References

1. Dill, KA, Bromberg, S, Yue, KZ, Fiebig, KM, Yee, DP, Thomas, PD and Chan, HS. Principles of protein folding - a perspective from simple exact models. Protein Sci., 4, 561-602 (1995).

2. Koehl, P and Delarue, M. Mean-field minimization methods for biological macromolecules. Curr. Opin. Struct. Biol., 6, 222-226 (1996).

3. Landau, LD and Lifshitz, EM. Quantum Mechanics. New York: Pergamon Press (1958).

4. Elber, R and Karplus, M. Enhanced sampling in molecular dynamics: use of time-dependent Hartree approximation for a simulation of carbon-monoxide diffusion through myoglobin. J. Am. Chem. Soc., 112, 9161-9175 (1990).

5. Czerminski, R and Elber, R. Computational studies of ligand diffusion in globins: 1. Leghemoglobin. Proteins: Struct. Funct. Genet., 10, 70-80 (1991).

6. Koehl, P and Delarue, M. Application of a self consistent mean field theory to predict protein side-chain conformations and estimate their conformational entropy. J. Mol. Biol., 239, 249-275 (1994).

7. Straub, JE and Karplus, M. Energy equipartitioning in the classical time-dependent Hartree approximation. J. Chem. Phys., 94, 6737-6739 (1991).

8. Huber, T, Torda, AE and van Gunsteren, WF. Optimization methods for conformational sampling using a Boltzmann-weighted mean field approach. Biopolymers, 39, 103-104 (1996).

9. Finkelstein, AV and Reva, BA. Search for the stable state of a short chain in a molecular field. Protein Eng., 5, 617-624 (1992).

10. Lee, C. Predicting protein mutant energetics by self consistent ensemble optimisation. J. Mol. Biol., 236, 918-939 (1994).

11. Roitberg, A and Elber, R. Modelling side-chains in peptides and proteins: application of the locally enhanced sampling and the simulated annealing method to find minimum energy conformations. J. Chem. Phys., 95, 9277-9287 (1991).

12. Jackson, RM, Gabb, HA and Sternberg, MJE. Rapid refinement of protein interfaces incorporating solvation: application to the docking problem. J. Mol. Biol., 276, 265-285 (1998).

13. Mendes, J, Soares, CM and Carrondo, MA. Improvement of side-chain modeling in proteins with the self-consistent mean field theory method based on an analysis of the factors influencing prediction. Biopolymers, 50, 111-131 (1999).

14. Olszewski, KA, Piela, L and Scheraga, HA. Meanfield theory as a tool for intramolecular conformation optimisation. 1. Tests on terminally-blocked alanine and met-enkephalin. J. Phys. Chem., 96, 4672-4676 (1992).

15. Olszewski, KA, Piela, L and Scheraga, HA. Meanfield theory as a tool for intramolecular conformation optimisation. 2. Tests on the homopolypeptides decaglycine and icosalanine. J. Phys. Chem., 97, 260-266 (1993).

16. Olszewski, KA, Piela, L and Scheraga, HA. Meanfield theory as a tool for intramolecular conformation optimisation. 3. Test on melittin. J. Phys. Chem., 97, 267-270 (1993).

17. Rabow, AA and Scheraga, HA. Lattice neural network minimization. Application of neural network optimization for locating the global-minimum conformations of proteins. J. Mol. Biol., 232, 1157-1168 (1993).

18. Zheng, Q, Rosenfeld, R and Kyle, JD. Theoretical analysis of the multi-copy sampling method in molecular modeling. J. Chem. Phys., 99, 8892-8896 (1993).

19. Zheng, Q, Rosenfeld, R, DeLisi, C and Kyle, JD. Multiple copy sampling in protein loop modeling: computational efficiency and sensitivity to dihedral angle perturbations. Protein Sci., 3, 493-506 (1994).

20. Koehl, P and Delarue, M. A self consistent mean field approach to simultaneous gap closure and side-chain positioning in homology modeling. Nature Struct. Biol., 2, 163-170 (1995).

21. Reva, BA and Finkelstein, AV. A new approach to the design of a sequence with the highest affinity for a molecular surface. Protein Eng., 5, 625-628 (1992).

22. Kono, H and Doi, J. Energy minimization method using automata network for sequence and sidechain conformation prediction from given backbone geometry. Proteins: Struct. Funct. Genet., 19, 244-255 (1994).

23. Kono, H and Saven, J. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J. Mol. Biol., 306, 607-628 (2001).





  Page last modified 18 March 2005 http://www.cs.ucdavis.edu/~koehl/BioEbook/