Patrice Koehl (University of California, Davis)
Michael Levitt (Stanford University)
Marc Delarue (Institut Pasteur, Paris, France)





Why Do We Need Comparative Protein Structure Modelling?

It has been hypothesized that the total number of different protein folds is finite, and of the order of 1000. The fact that protein structure space is finite, and much smaller than protein sequence space, has raised the hope that representative structural models can be obtained for all protein sequences without resorting to expensive, systematic experimental structure determination. Structural genomics currently focuses on building an extensive library of folds, and a figure of 10,000 to 100,000 representative proteins has been proposed. With such a library, it is expected that models can be constructed for all proteins.
 

The aim of structural genomics is to generate enough protein structures that any unknown protein will have a close homolog whose structure is known. The success of this approach depends on the number of protein structures (i.e. the number of red dots) as well as on the quality of the tools for detecting structural homologs (defined schematically by the radii of the circles).

 

The success of this approach, however, depends strongly on our ability to identify a proper structural template for a protein of interest, and to build an accurate model for this protein based on that template. Techniques for the identification step, or "fold recognition" problem, rely on the assumption that similarity between the sequences of two proteins implies similarity between their structures. The building step is usually referred to as "homology modelling" or "comparative protein structure modelling". ProModel was designed to solve this building problem, and is described in detail below.
 
 

What is Comparative Protein Structure Modelling?

Comparative protein structure modelling usually proceeds in four steps:
  • (a) Fold recognition. First, a template protein structure is identified as a plausible model for the protein sequence of interest. This step, usually referred to as fold recognition, relies on sequence matching [5,6] and/or on threading techniques, in which the sequence is tested against a library of protein folds.
  • (b) Building the 3D model. In most cases, the template protein provides only an incomplete framework for building a three-dimensional model of the protein of interest (the "target"). This framework consists of pieces of protein backbone corresponding to the conserved regions in the alignment of the template and target sequences. The second step of comparative modelling is to fill the gaps in the framework (usually referred to as "loop building") and to predict the conformation of the sidechains of the target protein on the completed backbone.
  • (c) Refinement. The structural model generated in step (b) is only as good as the sequence alignment it is based on. It can be further refined by energy minimization, using either molecular mechanics or molecular dynamics programs.
  • (d) Assessment of the models. After refinement, the quality of the final model is assessed using standard energy functions and/or visual inspection with molecular graphics programs.
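Step (a) relies in part on sequence matching. As an illustration only, the textbook Needleman-Wunsch dynamic-programming algorithm below computes an optimal global alignment of two sequences, and a helper reports the percentage of identical residues among aligned positions. This is a minimal sketch, not ProModel's or any fold-recognition server's method: the function names are hypothetical, and the scoring scheme (match +1, mismatch -1, linear gap -2) is an assumption chosen for readability.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b with a linear gap penalty.

    Toy scoring (assumption): identity-based match/mismatch scores.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Traceback from the bottom-right cell to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Percentage of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != '-' and y != '-']
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    s, t, p = needleman_wunsch("ACDEFG", "ACDFG")
    print(s, t, p)                  # 3 ACDEFG ACD-FG
    print(percent_identity(t, p))   # 100.0
```

Practical tools replace the identity score with a substitution matrix and use affine gap penalties (Gotoh's variant of the same recursion); the quadratic dynamic-programming structure is unchanged.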


Comparative Protein Structure Modelling: building the structural model.
- The alignment between the template and target sequences defines the conserved regions of the two proteins, from which a framework for the target protein is built. Gaps in the framework correspond to insertions and deletions.
- A structural model for the target protein is built starting from the framework.
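To make the relation between the alignment and the framework concrete, the sketch below reads a gapped target/template alignment and reports (i) which target residues can inherit a template backbone position, (ii) insertions, i.e. runs of target residues with no template partner, and (iii) deletions, i.e. consecutive target residues whose template partners are not consecutive, leaving a chain break. The function name, the 0-based indexing and the '-' gap character are assumptions for illustration; this is not ProModel's input format or code.

```python
def framework_from_alignment(target_aln, template_aln):
    """Derive the modelling framework from a pairwise alignment (sketch).

    Returns (mapping, insertions, deletions):
      mapping    dict: target residue index -> template residue index
                 (0-based) for columns where both sequences have a residue
      insertions list of (start, end) target ranges, end exclusive, with no
                 template partner: these residues need loop building
      deletions  list of (i, i + 1) target pairs whose template partners are
                 not adjacent: the copied backbone breaks between them
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    t_idx = p_idx = 0  # running residue counters in target / template
    for t_char, p_char in zip(target_aln, template_aln):
        if t_char != '-' and p_char != '-':
            mapping[t_idx] = p_idx
        if t_char != '-':
            t_idx += 1
        if p_char != '-':
            p_idx += 1
    n_target = t_idx

    # Insertions: maximal runs of unmapped target residues
    insertions, start = [], None
    for i in range(n_target):
        if i not in mapping and start is None:
            start = i
        elif i in mapping and start is not None:
            insertions.append((start, i))
            start = None
    if start is not None:
        insertions.append((start, n_target))

    # Deletions: template residues skipped between two mapped target residues
    deletions = [(i, i + 1) for i in range(n_target - 1)
                 if i in mapping and i + 1 in mapping
                 and mapping[i + 1] != mapping[i] + 1]
    return mapping, insertions, deletions


if __name__ == "__main__":
    print(framework_from_alignment("ACDEF--GH", "AC-EFWWGH"))
    # ({0: 0, 1: 1, 3: 2, 4: 3, 5: 6, 6: 7}, [(2, 3)], [(4, 5)])
```

In this toy alignment, target residue D (index 2) has no template partner and must be built as a loop, while the two template residues W are absent from the target, so the backbone copied for target residues 4 and 5 is discontinuous and must be reconnected.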


 

What is ProModel, and how does it work?


ProModel is a suite of programs designed to perform comparative protein structure modelling. Given the PDB file of the template protein and an alignment of the sequence of the protein to be modelled with the template sequence, ProModel generates a three-dimensional model that includes all heavy atoms. ProModel relies on self-consistent mean field (SCMF) theory both for loop modelling and for sidechain conformation prediction. The following links provide both the theoretical background and practical information on how ProModel works:

  • Self-Consistent Mean Field Theory. Many problems in computational biology face combinatorial obstacles too large to be solved by exhaustive enumeration. Self-consistent mean field (SCMF) theory provides an efficient, fast and robust method to overcome these obstacles. A general description of this theory applied to protein modelling is available here.
  • Sidechain Modelling. The simplest homology modelling case occurs when there are no gaps in the alignment of the template and target sequences, in which case the framework consists of a complete backbone. Building a structural model for the target protein then reduces to predicting the conformations of its sidechains. A solution to this problem based on SCMF theory is described here.
  • Generating Loops. In general, the framework used as a starting point for comparative protein structure modelling consists of fragments of protein backbone separated by gaps. One of the first steps in the modelling process is to generate possible conformations of protein fragments to fill in these gaps. A brief description of the method used in ProModel to solve this problem, as well as references to other existing techniques, is available here.
  • Generating 3D Models. ProModel combines all the tools described above into a general program that selects protein fragments to fill the gaps in the framework and simultaneously predicts the conformations of all sidechains of the protein. A general overview of ProModel is available here.
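The SCMF idea behind the sidechain step can be sketched on a discrete rotamer library: each residue i carries a probability for each of its rotamers, the mean-field energy of a rotamer is its interaction with the fixed backbone plus its pairwise interactions averaged over the current rotamer probabilities of all other residues, and the probabilities are updated with Boltzmann weights until they stop changing. The code below is a minimal, generic sketch of that scheme, not ProModel's implementation; the function name, the layout of the energy tables, kT = 0.6 (about k_B T in kcal/mol near 300 K) and the damping factor `mix` are all assumptions.

```python
import math


def scmf_rotamers(self_energy, pair_energy, kT=0.6, mix=0.5,
                  tol=1e-6, max_iter=1000):
    """Self-consistent mean field rotamer selection (minimal sketch).

    self_energy[i][r]          energy of rotamer r of residue i with the
                               fixed backbone (hypothetical input format)
    pair_energy[(i, j)][r][s]  energy between rotamer r of residue i and
                               rotamer s of residue j, stored only for i < j
    Returns (probs, choice): converged rotamer probabilities and the most
    probable rotamer of each residue.
    """
    n = len(self_energy)
    # Start from uniform probabilities over each residue's rotamers
    probs = [[1.0 / len(rots)] * len(rots) for rots in self_energy]

    def pair(i, r, j, s):
        # Look up the pair table stored once for the ordered pair (i < j)
        if i < j:
            return pair_energy[(i, j)][r][s]
        return pair_energy[(j, i)][s][r]

    for _ in range(max_iter):
        new_probs = []
        for i in range(n):
            # Mean-field energy of each rotamer of residue i
            energies = []
            for r in range(len(self_energy[i])):
                e = self_energy[i][r]
                for j in range(n):
                    if j != i and (min(i, j), max(i, j)) in pair_energy:
                        e += sum(probs[j][s] * pair(i, r, j, s)
                                 for s in range(len(self_energy[j])))
                energies.append(e)
            # Boltzmann weights, shifted by the minimum for numerical safety
            e_min = min(energies)
            weights = [math.exp(-(e - e_min) / kT) for e in energies]
            z = sum(weights)
            # Damped update to avoid oscillations between iterations
            new_probs.append([mix * (w / z) + (1 - mix) * p
                              for w, p in zip(weights, probs[i])])
        delta = max(abs(a - b) for new, old in zip(new_probs, probs)
                    for a, b in zip(new, old))
        probs = new_probs
        if delta < tol:
            break

    choice = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, choice


if __name__ == "__main__":
    # Toy system: residue 1 prefers rotamer 0 on its own, and the rotamer
    # pairs (0, 0) and (1, 1) of the two residues clash
    self_e = [[0.0, 0.0], [0.0, 1.0]]
    pair_e = {(0, 1): [[10.0, 0.0], [0.0, 10.0]]}
    probs, choice = scmf_rotamers(self_e, pair_e)
    print(choice)  # [1, 0]
```

All residues are updated from the previous iteration's probabilities, and `mix` blends old and new values; without such damping, mean-field iterations can oscillate between competing solutions. The final assignment is read off as the most probable rotamer of each residue.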
     

Useful links

 

ProModel is one of many programs written to solve the comparative protein structure modelling problem. Links to other available programs and/or web services are listed below; this list is by no means exhaustive.
 
 
Name          Web site
MODELLER      http://guitar.rockerfeller.edu/modeller/
3D-Jigsaw     http://www.bmm.icnet.uk/servers/3djigsaw/
SDSC1         http://cl.sdsc.edu/hm.html
SCWRL         http://www.fccc.edu/research/labs/dunbrack/scwrl/
SWISS-MODEL   http://www.expasy.ch/swissmod/SWISS-MODEL.html
ESyPred3D     http://www.fundp.ac.be/urbm/bioinfo/esypred

 
 


  Page last modified 18 March 2005 http://www.cs.ucdavis.edu/~koehl/ProModel/