This benchmark measures how accurately the change in stability of a monomeric protein (ΔΔG) caused by a point mutation is predicted. A ΔΔG value may be predicted as the difference in score between the modeled wild-type and mutant structures (for Rosetta protocols, this score is measured in Rosetta energy units, or REU). The predicted ΔΔG values are compared against ΔΔG values (in kcal/mol) measured by published experimental assays. The metrics used to evaluate the protocols are: i) the linear correlation (Pearson coefficient) between experimental and predicted values; ii) the mean absolute error (MAE) between the same values, i.e. the average of the absolute differences, or errors, between the experimental values and their corresponding predicted values; and iii) the stability classification accuracy, which measures whether a mutation was correctly predicted to be stabilizing, destabilizing, or neutral.
The correlation metric is independent of the scale of the scoring function used for prediction, i.e. one unit of the scoring function need not correspond to 1 kcal/mol. The other two metrics depend on absolute values. In the tables below, the MAE metric treats the experimental and predicted values as having the same unit of measurement, e.g. 1 kcal/mol = 1 REU. Therefore, a high MAE may reflect a difference in the scale of the values rather than in their relative ordering. For the stability classification accuracy metric, the result depends on the range of ΔΔG values for which a mutation is considered neutral. In the tables below, we have defined the range of neutral experimental ΔΔG values as between -1 kcal/mol and +1 kcal/mol and the range of neutral predicted values as between -1 unit (e.g. -1 REU) and +1 unit of the scoring function of the method.
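The three metrics can be sketched in a few lines of Python. This is only an illustration, not the benchmark's analysis code: the function names are hypothetical, the neutral range is assumed to include its endpoints, and the class labels refer only to the sign of ΔΔG because the sign convention for stabilizing versus destabilizing mutations is not stated here.

```python
import math

NEUTRAL_CUTOFF = 1.0  # kcal/mol for experimental values, scoring units for predictions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(xs, ys):
    """Average of the absolute differences between paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def stability_class(ddg, cutoff=NEUTRAL_CUTOFF):
    """Classify by sign; values in [-cutoff, +cutoff] are neutral (assumed inclusive)."""
    if ddg > cutoff:
        return "positive"
    if ddg < -cutoff:
        return "negative"
    return "neutral"


def classification_accuracy(experimental, predicted):
    """Fraction of mutations whose predicted class matches the experimental class."""
    matches = sum(
        stability_class(e) == stability_class(p)
        for e, p in zip(experimental, predicted)
    )
    return matches / len(experimental)


# Made-up values: predictions are exactly twice the experimental values.
experimental = [0.5, 2.0, -1.5, 3.0]
predicted = [1.0, 4.0, -3.0, 6.0]

r = pearson(experimental, predicted)
mae = mean_absolute_error(experimental, predicted)
accuracy = classification_accuracy(experimental, predicted)
```

With these made-up values the correlation is perfect (r = 1.0) while the MAE is 1.75, which shows how a pure difference in scale raises the MAE without affecting the correlation.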
This benchmark contains three curated datasets that have been previously published: i) a curated set of 1005 single point mutants collected by Guerois et al.; ii) a curated set of 2154 single point mutants collected by Potapov et al.; and iii) a curated set of 1210 single point mutants collected by Kellogg et al. where the mutated protein chains are limited to 350 residues. All of the data in these datasets are taken from experimental measurements and most originate from the ProTherm database, a large database of experimental measurements of changes in protein stability upon point mutation. We have made some modifications to the datasets from the original publications, such as: i) updating deprecated PDB identifiers and correcting PDB IDs, PDB residue IDs, and ΔΔG values based on cross-referencing to the respective publications; ii) annotating each record of a dataset with the publications from which its ΔΔG values originate; and iii) adding secondary structure information for the mutated residues (derived using mkdssp) as well as the SCOP class and Pfam domains using information from SIFTS and SCOPe.
We have also compiled a new, lightly curated dataset of 2971 single point mutants from the ProTherm database, which we refer to below as ProTherm*. This dataset contains most of the single point mutations available in the database for which the corresponding X-ray crystal structure has a resolution of 2.5 Å or better, taking mean values when multiple measurements had been published. A small number of records were excluded due to a variance of more than 2.5 kcal/mol in the experimental values, and mutations in transmembrane proteins were omitted.
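The aggregation step described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the procedure actually used to build ProTherm*: the text does not specify how the variance was computed, so the sketch assumes the spread is the difference between the largest and smallest measurement, and the function name and mutation keys are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MAX_SPREAD = 2.5  # kcal/mol; assumed here to mean max - min of the measurements


def aggregate_measurements(measurements, max_spread=MAX_SPREAD):
    """Average repeated ΔΔG measurements per mutation.

    `measurements` is an iterable of (mutation_key, ddg) pairs. Mutations
    whose measurements spread over more than `max_spread` are dropped.
    """
    grouped = defaultdict(list)
    for key, ddg in measurements:
        grouped[key].append(ddg)

    aggregated = {}
    for key, values in grouped.items():
        if max(values) - min(values) > max_spread:
            continue  # measurements disagree too much; skip this mutation
        aggregated[key] = mean(values)
    return aggregated


# Made-up records with hypothetical mutation keys.
records = [
    ("mutation_A", 1.0),
    ("mutation_A", 2.0),
    ("mutation_B", -1.0),
    ("mutation_B", 3.0),
    ("mutation_C", 0.4),
]
dataset = aggregate_measurements(records)
```

In this example `mutation_A` is kept with the mean of its two measurements, `mutation_B` is dropped because its measurements differ by 4.0 kcal/mol, and `mutation_C` is kept unchanged.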
The four datasets overlap both in terms of their point mutations and, for a given mutation, in terms of published experimental ΔΔG values. We have characterized most of this overlap and find that nearly all of the mutations in the datasets are contained within the complete ProTherm database (see graph below). For some data points, the three previously published datasets use aggregated (mean) values from multiple ΔΔG measurements.
The recommended Rosetta protocol for computing changes in monomeric protein stability upon point mutation is a method described by Kellogg et al. and referred to as Protocol 16 below, named after its row in Table 1 [Kellogg et al.]. Protocol 16 combines a soft-repulsive potential for conformational sampling of side-chains with a standard hard-repulsive potential for minimization. The ΔΔG value is calculated as the difference in score between the three best-scoring wild-type structures and the three best-scoring mutant structures out of (typically) fifty such pairs.
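The final scoring step might be sketched as below. This is not the Rosetta implementation; it assumes that lower scores are better, that the three best scores on each side are averaged, that the best wild-type and mutant structures are selected independently of their pairing, and that ΔΔG is taken as mutant minus wild type. The function name is hypothetical.

```python
def predict_ddg(wildtype_scores, mutant_scores, top_n=3):
    """Predicted ΔΔG from the scores of the generated structure pairs.

    Assumes lower scores are better and averages the `top_n` best
    scores on each side before taking mutant minus wild type.
    """
    best_wildtype = sorted(wildtype_scores)[:top_n]
    best_mutant = sorted(mutant_scores)[:top_n]
    mean_wildtype = sum(best_wildtype) / len(best_wildtype)
    mean_mutant = sum(best_mutant) / len(best_mutant)
    return mean_mutant - mean_wildtype


# Made-up scores (in REU) for four wild-type/mutant pairs; a real run
# would typically have fifty.
wildtype_scores = [-100.0, -102.0, -101.0, -95.0]
mutant_scores = [-98.0, -99.0, -97.0, -90.0]

ddg = predict_ddg(wildtype_scores, mutant_scores)
```

Here the three best wild-type scores average -101.0 and the three best mutant scores average -98.0, giving a predicted ΔΔG of 3.0 in scoring units.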