Design tests

Using evolutionary information

Sequence profile recovery and amino acid covariation

Amino acid covariation, where the identities of amino acids at different sequence positions are correlated, is a hallmark of naturally occurring proteins. This benchmark quantifies the extent to which computational protein design methods can recapitulate naturally occurring amino acid covariation.

To compare amino acid covariation in natural and designed protein sequences, we selected a dataset of 40 protein domains that are diverse in secondary structure composition and fold class (the fold classes are defined explicitly in the paper below and are also included in the protocol capture). We then quantified natural amino acid covariation for each domain by building a multiple sequence alignment (MSA) for the domain and computing covariation between every pair of MSA columns with a mutual-information-based method. Pairs of positions with a covariation score at least two standard deviations above the mean were considered highly covarying pairs.
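The column-pair scoring and thresholding steps can be sketched in Python as follows. This is a minimal illustration that assumes plain mutual information, with gaps treated as an ordinary symbol and no sequence weighting. The function names are hypothetical, and the benchmark's own mutual-information-based method may include corrections not shown here.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def mutual_information(col_a, col_b):
    """Plain mutual information (nats) between two alignment columns.

    Assumption of this sketch: gaps count as a 21st symbol and no
    sequence weighting or background correction is applied.
    """
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in counts_ab.items():
        p_joint = count / n
        p_indep = (counts_a[a] / n) * (counts_b[b] / n)
        mi += p_joint * math.log(p_joint / p_indep)
    return mi


def covariation_scores(msa):
    """Score every pair of MSA columns (i, j) with i < j."""
    length = len(msa[0])
    cols = [[seq[i] for seq in msa] for i in range(length)]
    return {(i, j): mutual_information(cols[i], cols[j])
            for i, j in combinations(range(length), 2)}


def highly_covarying_pairs(scores, n_sd=2.0):
    """Pairs scoring at least n_sd standard deviations above the mean."""
    values = list(scores.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return {pair for pair, s in scores.items() if s >= cutoff}


msa = ["ADGKL", "ADHKL", "CEGKL", "CEHKL"]  # toy alignment, 5 columns
scores = covariation_scores(msa)
print(highly_covarying_pairs(scores))  # {(0, 1)}
```

In the toy alignment, columns 0 and 1 change together (A with D, C with E), so only that pair clears the two-standard-deviation cutoff.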

We designed protein sequences for each of the 40 domains using RosettaDesign (see publication below). We first used the standard RosettaDesign fixed backbone protocol, which takes a crystal structure as input and runs Monte Carlo simulated annealing, to design 500 sequences for each domain structure. We then quantified amino acid covariation in the designed sequences and compared it to the natural amino acid covariation for each domain.
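The text does not spell out how designed and natural covariation are compared. One plausible scoring, assumed here rather than taken from the benchmark, treats the natural highly covarying pairs as positives and asks how well the designed covariation scores rank them above all other pairs, summarized as a ROC area under the curve (AUC) in its Mann-Whitney form. The function name `covariation_auc` is hypothetical.

```python
from itertools import product


def covariation_auc(natural_top_pairs, designed_scores):
    """ROC AUC of designed covariation scores at recovering natural top pairs.

    natural_top_pairs: set of (i, j) pairs that covary highly in the natural MSA.
    designed_scores: dict mapping every (i, j) pair to its covariation score
        computed from the designed sequences.
    Returns the probability that a random natural top pair outscores a random
    other pair (ties count 0.5); 0.5 means no better than chance.
    """
    pos = [s for p, s in designed_scores.items() if p in natural_top_pairs]
    neg = [s for p, s in designed_scores.items() if p not in natural_top_pairs]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for s_pos, s_neg in product(pos, neg):
        if s_pos > s_neg:
            wins += 1.0
        elif s_pos == s_neg:
            wins += 0.5
    return wins / (len(pos) * len(neg))


natural_top = {(0, 1)}
designed = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.3}
print(covariation_auc(natural_top, designed))  # 1.0
```

The pairwise loop is quadratic in the number of pairs; a rank-based computation gives the same value faster for full-size domains.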

To investigate the effect of the magnitude of backbone flexibility in the design protocol, we generated conformational ensembles with a variety of protocols, including Backrub, Kinematic Closure ("KIC"), Small Phi-Psi moves ("Small"), and Relax. We designed sequences for each ensemble and quantified the similarity of each set of designed sequences to natural covariation. Covariation similarity increased significantly for flexible backbone simulations relative to the fixed backbone simulation. The benchmark also examines several other sequence characteristics, including sequence recovery, sequence profile similarity, and sequence entropy.
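Two of these additional characteristics are easy to state precisely. The sketch below assumes that sequence recovery is the mean fraction of positions at which a designed sequence matches the native (crystal structure) sequence, and that sequence entropy is the Shannon entropy, in bits, of the designed amino acid distribution at each position. The benchmark's exact definitions may differ (for example in log base or averaging), profile similarity is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter


def sequence_recovery(native, designed_seqs):
    """Mean fraction of positions where a designed sequence matches the native."""
    if not designed_seqs:
        raise ValueError("no designed sequences")
    fractions = []
    for seq in designed_seqs:
        if len(seq) != len(native):
            raise ValueError("designed sequence length differs from native")
        matches = sum(a == b for a, b in zip(native, seq))
        fractions.append(matches / len(native))
    return sum(fractions) / len(fractions)


def position_entropy(designed_seqs, i):
    """Shannon entropy (bits) of the designed amino acids at position i."""
    counts = Counter(seq[i] for seq in designed_seqs)
    n = len(designed_seqs)
    # H = sum over residues of p * log2(1 / p)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


native = "ACDE"
designed = ["ACDE", "ACDF", "GCDF", "GCDE"]
print(sequence_recovery(native, designed))  # 0.75
print([position_entropy(designed, i) for i in range(len(native))])  # [1.0, 0.0, 0.0, 1.0]
```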

Figure: Covariation benchmark description

Flow chart of the computational strategy to compare natural and designed amino acid covariation [Ollikainen & Kortemme].

For each domain family, a crystal structure of the domain is obtained from the PDB. This structure is passed to a protocol that generates a conformational ensemble of protein structures. Each structure in this ensemble is then passed to a protocol that designs a low-energy sequence consistent with the structure. Amino acid covariation is calculated for every pair of positions in the designed sequences, and the designed covariation is compared to the covariation observed among naturally occurring sequences of the same protein domain.

Publications
Ollikainen N, Kortemme T. Computational Protein Design Quantifies Structural Constraints on Amino Acid Covariation. PLoS Comput Biol. 2013;9(11):e1003313. doi: 10.1371/journal.pcbi.1003313