The goal of this protocol is to predict sequences that are consistent with a given protein structure and function. In the benchmark, the predicted tolerated sequences are compared with sequences experimentally selected to be functional (in this case, peptides selected by phage display to be recognized by a given PDZ domain). The performance metrics are the area under the ROC curve (AUC), sequence profile comparisons (AAD, the average absolute difference, and the Frobenius distance), and the predicted rank of the most frequent amino acid at each position (Rank Top).
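To make the profile-based metrics concrete, the following is a minimal sketch and not the benchmark's own analysis code. It assumes that each profile is a list of positions, each holding amino acid frequencies in a fixed order, that AAD is averaged over all position and amino acid cells, and that Rank Top refers to the experimentally most frequent amino acid. The function names `aad`, `frobenius`, and `rank_top` are hypothetical, and the toy profiles use a three-letter alphabet for brevity.

```python
import math


def aad(pred, exp):
    """Average absolute difference over all (position, amino acid) cells.

    Assumption: the average runs over every cell of the two profiles.
    """
    diffs = [abs(p - e)
             for prow, erow in zip(pred, exp)
             for p, e in zip(prow, erow)]
    return sum(diffs) / len(diffs)


def frobenius(pred, exp):
    """Frobenius distance: square root of the summed squared differences."""
    return math.sqrt(sum((p - e) ** 2
                         for prow, erow in zip(pred, exp)
                         for p, e in zip(prow, erow)))


def rank_top(pred_row, exp_row):
    """Predicted rank (1 = best) of the most frequent experimental amino acid."""
    top = max(range(len(exp_row)), key=exp_row.__getitem__)
    return 1 + sum(1 for p in pred_row if p > pred_row[top])


# Toy profiles: 2 positions x 3 amino acids (illustrative values only).
predicted = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
experimental = [[0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]

print(aad(predicted, experimental))
print(frobenius(predicted, experimental))
print([rank_top(p, e) for p, e in zip(predicted, experimental)])
```

Lower AAD and Frobenius values indicate closer agreement between the profiles, and a Rank Top of 1 means the prediction places the most frequent experimental amino acid first at that position.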
The data sets used in this benchmark are:
The Sequence Tolerance protocol (see publications below) can be used for specificity prediction and library design (e.g. for phage display, yeast display, or other selection techniques). The protocol scores a large number of sequences for a given input structure. The scores can be reweighted to emphasize inter- or intramolecular interactions.
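The effect of reweighting can be illustrated with a small sketch. It assumes, purely for illustration, that each sequence's score splits into an intramolecular and an intermolecular component that are combined as a weighted sum; the actual functional form and weights used by the protocol are described in the publications. The function `reweighted_score`, the sequence labels, and all numbers below are hypothetical.

```python
def reweighted_score(intra, inter, w_intra=1.0, w_inter=1.0):
    """Weighted sum of score components (assumed form; lower is better)."""
    return w_intra * intra + w_inter * inter


def best_sequence(components, w_intra=1.0, w_inter=1.0):
    """Return the label of the sequence with the lowest reweighted score."""
    return min(components,
               key=lambda s: reweighted_score(*components[s],
                                              w_intra=w_intra,
                                              w_inter=w_inter))


# Made-up (intramolecular, intermolecular) score components per sequence.
components = {
    "SEQ_A": (-10.0, -4.0),
    "SEQ_B": (-8.0, -7.0),
}

print(best_sequence(components))                # equal weights
print(best_sequence(components, w_inter=0.25))  # intermolecular term down-weighted
```

With equal weights the sequence with the stronger intermolecular component scores best, while down-weighting that term changes the ranking, which is the kind of shift in emphasis that reweighting is meant to provide.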
In the current benchmark, the sequence tolerance protocol is run over an ensemble of structures generated by the backrub protocol, which allows changes in both backbone (backrub moves) and side-chain (rotamer repacking) degrees of freedom. Backrub moves are accepted or rejected in a Monte Carlo simulation according to the Metropolis criterion.
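The Metropolis criterion mentioned above can be sketched generically as follows. This is a textbook illustration rather than the backrub implementation: the function name `metropolis_accept`, the `kt` parameter, and the example values are assumptions, and the temperature used in the benchmark is not stated here.

```python
import math
import random


def metropolis_accept(delta_e, kt, rng=random):
    """Decide whether to accept a move that changes the energy by delta_e.

    Moves that lower (or keep) the energy are always accepted; moves that
    raise it are accepted with probability exp(-delta_e / kt).
    """
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kt)


rng = random.Random(0)  # seeded generator so the run is reproducible
accepted = sum(metropolis_accept(0.5, kt=0.6, rng=rng) for _ in range(10000))
print(accepted / 10000)  # should be close to exp(-0.5 / 0.6)
```

Accepting some energy-raising moves lets the simulation escape local minima, so the resulting ensemble samples a range of low-energy backbone conformations instead of a single minimized structure.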