http://www.cs.ucdavis.edu/~koehl/


Protein Sequence Design


1. Specificity versus Stability

1.1 What is specificity?


The inverse folding problem was originally defined by Pabo [1] as the problem of identifying the sequences compatible with a given protein fold. A successful protein design calculation should generate a sequence that is compatible with the template fold (the "design in" procedure) and incompatible with competing folds (the "design out" procedure, or specificity problem). This problem can be reformulated as finding the sequence, S, that has a high probability, P, of adopting the template conformation, Cnat, at room temperature. P is given by:

$$P(C_{nat} \mid S) = \frac{\exp\left[-E(C_{nat},S)/kT\right]}{\sum_{C} \exp\left[-E(C,S)/kT\right]} \qquad (1)$$


E(C,S) is the energy of sequence S in conformation C, T is the temperature and k is the Boltzmann constant. The denominator in eq. (1) corresponds to a partition function, Z. A rigorous approach to the problem of maximizing the probability P would require simultaneous and complete explorations of all of sequence space and conformation space. While this may be feasible for a short peptide chain with a simplified representation [2], it cannot be applied to a longer protein chain with a detailed all-atom representation.
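As a toy illustration of eq. (1), the sketch below evaluates P by exhaustive enumeration for a very short chain on a 2D square lattice, with an HP-type contact energy of -1 for every non-bonded H-H contact. The lattice, the energy function, the value kT = 1 and all function names are assumptions made for this example and are not taken from the studies cited here. Walks related by rotation or reflection are counted as distinct conformations.

```python
import math

def enumerate_walks(n):
    """Return all self-avoiding walks of n beads on a 2D square lattice,
    starting at the origin (symmetry-related copies are all included)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    walks = []

    def extend(path, occupied):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in moves:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return walks

def hp_energy(seq, conf):
    """Toy E(C,S): -1 for each pair of H beads that are lattice neighbours
    but not consecutive along the chain."""
    index_at = {p: i for i, p in enumerate(conf)}
    energy = 0
    for i, (x, y) in enumerate(conf):
        if seq[i] != "H":
            continue
        # Looking only in +x and +y counts every neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                energy -= 1
    return energy

def occupation_probability(seq, target, kT=1.0):
    """Eq. (1): Boltzmann weight of the target over the partition function Z."""
    confs = enumerate_walks(len(seq))
    z = sum(math.exp(-hp_energy(seq, c) / kT) for c in confs)
    return math.exp(-hp_energy(seq, target) / kT) / z

# Hypothetical 4-bead example: a U-shaped target that brings both ends together.
target = ((0, 0), (1, 0), (1, 1), (0, 1))
print(occupation_probability("HPPH", target))
```

The number of walks grows exponentially with chain length, which is why this brute-force evaluation of Z is only practical for very short chains with simplified representations.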

In some studies, this problem has been ignored: Malakauskas and Mayo [3] successfully redesigned the core of the B1 domain of protein G, using a variant of the dead-end elimination algorithm, without explicit consideration of specificity. Ignoring specificity, however, has not always been successful. In the case of the HP model, for example, superstable sequences have been designed with all H residues inside and all P residues at the surface [4,5]. These sequences, however, are not specific and can fold into many "native" conformations [6].

1.2 Specificity through Foldability

Several criteria have been proposed to simplify or replace the optimization of the occupational probability defined in equation (1). These criteria all relate to the concept of "foldability", i.e. the degree to which a particular sequence is likely to fold. To correlate the foldability with the stability and kinetic accessibility of particular proteins, the following parameters were suggested:

  • the energy gap between the folded state and the lowest energy structure from an ensemble of non-folded states [7];
  • the ratio Tf/Tg, where Tf is the temperature at which the folded state becomes stable, and Tg is a heteropolymer glass transition temperature [8,9];
  • the ratio Δ/σ, where Δ is the difference between the energy of the folded state and the average energy of the non-folded states, and σ is the standard deviation of the energy distribution [8,10,11].

Some of these criteria have been used in model studies based on lattice models. It has not been clear, however, how to integrate any of these terms into a protein design algorithm that does not involve enumeration of all sequences and structures.
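When the energies of a set of non-folded (decoy) conformations are available, two of these criteria reduce to simple statistics: the energy gap, and the ratio of the energy difference to the standard deviation of the decoy energies. The sketch below is a minimal illustration under that assumption. The function name, the sign convention (positive values favour the folded state) and the example energies are hypothetical, and Tf/Tg is not computed because it requires a thermodynamic model of the chain.

```python
import statistics

def foldability_scores(native_energy, decoy_energies):
    """Compute two foldability criteria from a native energy and decoy energies.

    Sign convention (an assumption of this sketch): positive values mean the
    folded state lies below the non-folded ensemble.
    """
    # Energy gap: lowest-energy non-folded structure minus the folded state.
    gap = min(decoy_energies) - native_energy
    # Difference between the average non-folded energy and the folded state.
    delta = statistics.fmean(decoy_energies) - native_energy
    # Standard deviation of the non-folded energy distribution.
    sigma = statistics.pstdev(decoy_energies)
    return {"gap": gap, "delta": delta, "sigma": sigma, "ratio": delta / sigma}

# Hypothetical energies in arbitrary units.
scores = foldability_scores(-10.0, [-4.0, -2.0, 0.0, -6.0])
print(scores)
```

Obtaining a representative set of decoy energies is the hard part in practice, which is the enumeration problem described above.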

Even though a systematic exploration of conformational space is not possible, there have been attempts to include competing backbones in full-atom, off-lattice protein design procedures. Among the successful results, it is worth mentioning the work of Harbury and co-workers [12], who designed families of α-helical bundle proteins with a right-handed super-helical twist. In their design procedure, the overall protein fold was specified by hydrophobic-polar residue patterns, whereas the bundle oligomerization states and interior side-chain rotamers were engineered by computational enumeration of packing in alternate backbone structures. Their designed peptides were found experimentally to form α-helical dimers, trimers and tetramers, in agreement with their design goals. Similarly, the design of metal-ion binding sites with a specified spatial arrangement was found to be possible only by considering specificity through the inclusion of alternate backbones in the design procedure [13-16]. Unfortunately, these procedures remain limited to small peptides, small repeats in longer peptides, and/or small regions in proteins such as metal binding sites, and cannot be applied directly to the complete design of even a medium-sized protein.

1.3 Specificity and the Random Energy Model

As an alternative, Shakhnovich and Gutin [17,18] proposed a simple, approximate solution to the problem of specificity, based on the random energy model (for review, see Pande et al. [19]). In this approach, the partition function Z (the denominator of eq. (1)) is assumed to depend only on the amino acid composition and not on the order of the residues in the sequence. Given this approximation, Z is constant for all sequences of a given composition, so specificity can be achieved by minimizing the energy of the template conformation in sequence space alone, provided that the amino acid composition of the sequence is held constant. This procedure has been applied to protein design simulations on lattices [17,18]. A major feature of this approach is that it is computationally feasible, even in the case of full-atom representations.
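One way to hold the composition constant is to search sequence space with moves that only swap the residues at two positions. The sketch below is a minimal Metropolis search of that kind, assuming a hypothetical contact-based energy for the template conformation. The contact list, the pair_energy function, the temperature and the number of steps are illustrative choices, not the protocol of refs [17,18].

```python
import math
import random

def design_fixed_composition(seq, contacts, pair_energy,
                             steps=2000, kT=0.5, seed=0):
    """Minimize the template energy over permutations of seq.

    seq         : starting sequence; its composition is never changed
    contacts    : list of (i, j) residue pairs in contact in the template
    pair_energy : function (a, b) -> energy of a contact between a and b
    """
    rng = random.Random(seed)
    current = list(seq)

    def energy(s):
        return sum(pair_energy(s[i], s[j]) for i, j in contacts)

    e_cur = energy(current)
    best, e_best = current[:], e_cur
    for _ in range(steps):
        # Swap move: exchanges two residues, so the composition is preserved.
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        e_new = energy(current)
        # Metropolis criterion in sequence space.
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / kT):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = current[:], e_cur
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo swap
    return "".join(best), e_best

# Hypothetical 6-residue template with two contacts and an HP-like energy.
contacts = [(0, 5), (1, 4)]
hp_pair = lambda a, b: -1.0 if a == "H" and b == "H" else 0.0
print(design_fixed_composition("HPPHPP", contacts, hp_pair))
```

Because only swaps are allowed, the returned sequence is always a permutation of the starting one; no conformational search is performed at any point.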

References

1. Pabo, C. Designing proteins and peptides. Nature, 301, 200 (1983).

2. Seno, F, Vendruscolo, M, Maritan, A and Banavar, JR. Optimal protein design procedure. Physical Review Letters, 77, 1901-1904 (1996).

3. Malakauskas, SM and Mayo, SL. Design, structure and stability of a hyperthermophilic protein variant. Nature Structural Biology, 5, 470-475 (1998).

4. Shakhnovich, EI and Gutin, AM. A new approach to the design of stable proteins. Protein Eng., 6, 793-800 (1993).

5. Shakhnovich, EI and Gutin, AM. Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. (USA), 90, 7195-7199 (1993).

6. Yue, K and Dill, KA. Inverse protein folding problem: designing polymer sequences. Proc. Natl. Acad. Sci. (USA), 89, 4163-4167 (1992).

7. Shakhnovich, EI. Theoretical studies of protein-folding thermodynamics and kinetics. Current Opinion in Structural Biology, 7, 29-40 (1997).

8. Goldstein, RA, Luthey-Schulten, ZA and Wolynes, PG. Optimal protein folding codes from spin glass theory. Proc. Natl. Acad. Sci. (USA), 89, 4918-4922 (1992).

9. Socci, N and Onuchic, J. Folding kinetics of protein-like heteropolymers. J. Chem. Phys., 101, 1519-1528 (1994).

10. Abkevich, V, Gutin, A and Shakhnovich, E. Improved design of stable and fast-folding model proteins. Fold. Des., 1, 221-230 (1996).

11. Melin, R, Li, H, Wingreen, N and Tang, C. Designability, thermodynamic stability, and dynamics in protein folding: a lattice model study. J. Chem. Phys., 110, 1252-1262 (1999).

12. Harbury, P, Plecs, J, Tidor, B, Alber, T and Kim, P. High-resolution protein design with backbone freedom. Science, 282, 1462-1467 (1998).

13. Coldren, CD, Hellinga, HW and Caradonna, JP. The rational design and construction of a cuboidal iron-sulfur protein. Proc. Natl. Acad. Sci. (USA), 94, 6635-6640 (1997).

14. Pinto, AL, Hellinga, HW and Caradonna, JP. Construction of a catalytically active iron superoxide dismutase by rational protein design. Proc. Natl. Acad. Sci. (USA), 94, 5562-5567 (1997).

15. Hellinga, HW. The construction of metal centers in proteins by rational design. Fold. Des., 3, R1-R8 (1998).

16. Hellinga, HW. Construction of a blue copper analog through iterative rational protein design cycles demonstrates principles of molecular recognition in metal center formation. Journal of the American Chemical Society, 120, 10055-10066 (1998).

17. Shakhnovich, EI and Gutin, AM. A new approach to the design of stable proteins. Protein Eng., 6, 793-800 (1993).

18. Shakhnovich, EI and Gutin, AM. Engineering of stable and fast-folding sequences of model proteins. Proc. Natl. Acad. Sci. (USA), 90, 7195-7199 (1993).

19. Pande, VS, Grosberg, AY and Tanaka, T. Statistical mechanics of simple models of protein folding and design. Biophysical Journal, 73, 3192-3210 (1997).







  Page last modified 2 January 2005 http://www.cs.ucdavis.edu/~koehl/BioEbook/