Design tests

Using large-scale experimental data

Recognition specificity

The goal of this protocol is to predict sequences that are consistent with a given protein structure and function. In the benchmark, predictions of tolerated sequences are compared with sequences experimentally selected to be functional (in this case, peptides selected by phage display to be recognized by a given PDZ domain). The performance metrics are the area under the ROC curve (AUC), two sequence profile comparisons (the average absolute difference, AAD, and the Frobenius distance), and the predicted rank of the most frequent amino acid at each position (Rank Top).
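The profile-based metrics can be illustrated with a short sketch. This is not the benchmark's own analysis code: the function names, the row-per-position list layout, and the three-letter toy alphabet are assumptions made for illustration, and the definitions follow the plain reading of each metric name (mean absolute difference of frequencies, square root of the summed squared differences, and the predicted rank of the experimentally most frequent amino acid).

```python
import math

def average_absolute_difference(pred, exp):
    """Mean of |predicted - experimental| over all positions and amino acids."""
    diffs = [abs(p - e)
             for pred_row, exp_row in zip(pred, exp)
             for p, e in zip(pred_row, exp_row)]
    return sum(diffs) / len(diffs)

def frobenius_distance(pred, exp):
    """Square root of the summed squared differences between two profiles."""
    return math.sqrt(sum((p - e) ** 2
                         for pred_row, exp_row in zip(pred, exp)
                         for p, e in zip(pred_row, exp_row)))

def rank_top(pred_row, exp_row):
    """Predicted rank (1 = best) of the experimentally most frequent amino acid."""
    top = max(range(len(exp_row)), key=lambda i: exp_row[i])
    # Rank is one plus the number of amino acids predicted strictly higher.
    return 1 + sum(1 for value in pred_row if value > pred_row[top])

# Toy profiles: 2 positions x 3 amino acids (hypothetical numbers).
predicted = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
experimental = [[0.2, 0.7, 0.1], [0.0, 0.8, 0.2]]

aad = average_absolute_difference(predicted, experimental)
frob = frobenius_distance(predicted, experimental)
ranks = [rank_top(p, e) for p, e in zip(predicted, experimental)]
print(aad, frob, ranks)
```

Lower AAD and Frobenius values indicate closer agreement between profiles, and a Rank Top of 1 means the prediction placed the experimentally preferred amino acid first at that position.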

The data sets used in this benchmark are:

  • Human PDZ - a set of 17 human PDZ domains with (i) published phage display data of peptide sequences recognized by the domain and (ii) a PDB structure with a bound peptide;
  • Erbin point mutant - a phage display dataset of peptides recognized by synthetic variants of the Erbin PDZ domain with single point mutations at 10 different positions near the binding site;
  • Erbin 10 mutation - a phage display dataset of peptides recognized by 61 different synthetic variants of the Erbin PDZ domain containing between four and ten mutations.
The SCOP class associated with PDZ domains is b (all beta proteins).

The Sequence Tolerance protocol (see publications below) can be used for specificity prediction and library design (e.g. for phage display, yeast display, or other selection techniques). The protocol scores a large number of sequences for a given input structure. The scores can be reweighted to emphasize inter- or intra-molecular interactions.
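One way to picture the reweighting is as a weighted sum of score components. The sketch below assumes each sequence has already been split into an intramolecular and an intermolecular score term; the function name, the weights, the peptide strings, and the numbers are all hypothetical and are not taken from the protocol.

```python
def reweighted_score(intra, inter, w_intra=1.0, w_inter=1.0):
    """Weighted sum of intra- and inter-molecular score terms (lower is better)."""
    return w_intra * intra + w_inter * inter

# Hypothetical (intra, inter) score components for two toy peptide sequences.
components = {
    "ETSV": (-10.0, -4.0),
    "ETWV": (-8.0, -7.0),
}

def best_sequence(components, w_intra, w_inter):
    """Return the sequence with the lowest reweighted score."""
    return min(components,
               key=lambda seq: reweighted_score(*components[seq],
                                                w_intra=w_intra,
                                                w_inter=w_inter))

# Equal weights favour the sequence with the stronger interface term.
print(best_sequence(components, w_intra=1.0, w_inter=1.0))
# Down-weighting the interface term changes the preferred sequence.
print(best_sequence(components, w_intra=1.0, w_inter=0.5))
```

The point of the toy example is only that the relative weights decide which sequences rank highest, which is why the emphasis matters for specificity prediction.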

In the current benchmark, the sequence tolerance protocol is run over an ensemble of structures generated by the backrub protocol, which allows changes in both backbone (backrub moves) and side chain (rotamer repacking) degrees of freedom. Backrub moves are accepted or rejected in a Monte Carlo simulation according to the Metropolis criterion.
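For readers unfamiliar with the Metropolis criterion, a minimal generic sketch follows. It is not the backrub implementation: the function name, the `kT` parameter, and the score units are assumptions for illustration. A move that lowers the score is always accepted, and a move that raises it by `delta_score` is accepted with probability exp(-delta_score / kT).

```python
import math
import random

def metropolis_accept(delta_score, kT, rng=random):
    """Decide whether to accept a move that changes the score by delta_score."""
    if delta_score <= 0:
        # Downhill (or neutral) moves are always accepted.
        return True
    # Uphill moves are accepted with Boltzmann probability.
    return rng.random() < math.exp(-delta_score / kT)

# Usage: estimate the acceptance rate for an uphill move of size kT.
rng = random.Random(0)
trials = 10000
accepted = sum(metropolis_accept(0.6, kT=0.6, rng=rng) for _ in range(trials))
print(accepted / trials)  # should be close to exp(-1)
```

Occasionally accepting uphill moves is what lets the simulation escape local minima and sample a diverse ensemble rather than a single minimized structure.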

Figure: Sequence tolerance benchmark description. The general scheme for profile prediction is shown using the DLG1-2 PDZ domain as an example. Each member of a backrub ensemble is used to generate a Position Weight Matrix (PWM). The PWMs are combined into a unified PWM for the final prediction and evaluated by comparison with experimental data from phage display.
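The step of combining per-member PWMs can be sketched as follows. The text does not state how the PWMs are combined, so this example assumes a simple unweighted mean across ensemble members; the actual protocol may weight members differently. The function name and the two-letter toy alphabet are hypothetical.

```python
def combine_pwms(pwms):
    """Average a list of PWMs (each a list of per-position frequency rows)."""
    n_members = len(pwms)
    n_positions = len(pwms[0])
    n_letters = len(pwms[0][0])
    return [[sum(member[i][j] for member in pwms) / n_members
             for j in range(n_letters)]
            for i in range(n_positions)]

# Two toy ensemble members, 2 positions x 2 amino acids each.
member_a = [[1.0, 0.0], [0.5, 0.5]]
member_b = [[0.0, 1.0], [0.5, 0.5]]

unified = combine_pwms([member_a, member_b])
print(unified)
```

Because each input row sums to one, each row of the averaged PWM also sums to one, so the result can be compared directly with an experimental frequency profile.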
Protocol documentation
  • Smith, CA, Kortemme, T. Structure-Based Prediction of the Peptide Sequence Space Recognized by Natural and Synthetic PDZ Domains. 2010. J Mol Biol 402(2):460-74. doi: 10.1016/j.jmb.2010.07.032
  • Smith, CA, Kortemme, T. Predicting the Tolerated Sequences for Proteins and Protein Interfaces Using RosettaBackrub Flexible Backbone Design. 2011. PLoS ONE 6(7):e20451. doi: 10.1371/journal.pone.0020451