Abstract.
Protein loops often play important roles in biological
functions. Modeling loops accurately is crucial to determining the
functional specificity of a protein. Despite recent progress in
loop prediction, which has produced a number of algorithms over
the past decade, few rigorous algorithmic approaches exist to model
protein loops using global orientational restraints, such as those
obtained from residual dipolar coupling (RDC) data in solution NMR
spectroscopy. In this article, we present a novel, sparse-data,
RDC-based algorithm, which exploits the mathematical interplay between
RDC-derived sphero-conics and protein kinematics, and formulates the
loop structure determination problem as a system of low-degree
polynomial equations that can be solved exactly, in closed form. The
polynomial roots, which encode the candidate conformations, are
searched systematically, using provable pruning strategies that discard
the vast majority of conformations, so that every loop conformation
consistent with the data is enumerated and all others are provably
pruned; completeness is therefore ensured. Results on experimental RDC
datasets for four proteins,
namely human ubiquitin, FF2, DinI and GB3, demonstrate that our
algorithm can compute loops with higher accuracy, a 3- to 6-fold
improvement in backbone RMSD, than those obtained by traditional
structure determination protocols on the same data. Excellent results
were also obtained on synthetic RDC datasets for protein loops of
4, 8 and 12 residues used in previous studies. These results suggest
that our algorithm can be successfully applied to determine protein
loop conformations, and hence will be useful in high-resolution
protein backbone structure determination, including loops, from sparse
NMR data.