The Molecular Basis For Receptor-Ligand Recognition In the Immunoglobulin Superfamily
Gil, Nelson Homero, Jr.
The immunoglobulin superfamily (IgSF) is one of the largest superfamilies in the human proteome, comprising nearly 500 cell-surface and secreted members critical to processes ranging from the modulation of the immune response to the formation of neural synapses. Molecular-level knowledge of IgSF binding interfaces remains scarce: only ~5% of IgSF proteins currently have crystallographic structures in complexed form. The primary motivation of this thesis has been to systematically elucidate the molecular-level details of IgSF binding interfaces in order to aid identification of potential binding partners. To this end, we have developed broadly applicable computational approaches that extract functional information from protein sequence and structure data. Chapter I provides an expanded introduction to the IgSF and reviews the historical development of protein function prediction using sequence-based and structure-based methods. In addition, the principles of pharmacophore-based protein binding partner prediction are reviewed. Chapter II details the development of Selection of Alignment by Maximal Mutual Information (SAMMI), an algorithm to identify an optimal multiple sequence alignment (MSA) for functional residue detection by conservation analysis. This approach hypothesizes that the mutual information among aligned sequence positions will be maximal in those MSAs that include the most diverse possible set of structurally and functionally homogeneous sequence homologs. In Chapter III, the performance of SAMMI is examined in the context of state-of-the-art functional residue prediction methods. We demonstrate that simple conservation analysis of SAMMI-selected MSAs improves upon modern methods, which mostly include sequence conservation as one of several input features for machine learning.
We further show that a simple combinatorial MSA sampling algorithm will generally produce an MSA including an optimal set of homologs whose conservation analysis doubles state-of-the-art performance, at which point the primary source of error is binding site definition. In Chapter IV, we describe the development of a structure-based binding interface prediction algorithm in which binding sites of structurally homologous proteins are mapped onto a query protein of interest. We demonstrate that IgSF proteins tend to share the structural locations of their binding sites, and that the reliability of these predictions can be estimated based on their agreement with sequence-based conservation analysis. This allows us to provide binding interface predictions and associated reliability scores for IgSF proteins with unknown binding interfaces. In Chapter V, we provide a new quality indicator for the Protein Ligand Interface Design (ProtLID) algorithm, which predicts binding partners given a protein binding interface. ProtLID creates a residue-specific (rs) pharmacophore, a map of residue binding preferences that potential binding partners should satisfy. We implemented a novel statistical analysis of the rs-pharmacophore spatial matching distribution to provide a confidence measurement for any specific binding partner prediction. In summary, this thesis provides novel insights into the study of the IgSF and the field of bioinformatics. SAMMI is the first objective approach to the selection of representative sequences for subsequent analysis. Our work also highlights the largely neglected importance of optimally composing MSAs for conservation analysis. Furthermore, we provide a systematic examination of IgSF binding interfaces and formalize the notion that IgSF proteins tend to bind at similar geometric locations on their structure. Finally, we provide a quality measure that can complement the ProtLID approach and prioritize experimental study of computational predictions.
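
The idea behind SAMMI, choosing among candidate MSAs by the mutual information between aligned positions, can be illustrated with a minimal Python sketch. This is not the SAMMI implementation: the abstract does not specify how mutual information is aggregated or how gaps are handled, so the function names (`column_mutual_information`, `mean_pairwise_mi`, `select_alignment`), the use of a plain mean over all column pairs, and the treatment of gaps as ordinary symbols are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information (in bits) between two aligned columns.

    Frequencies are estimated directly from the observed counts.
    Assumption: gap characters are treated like any other symbol.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written in terms of raw counts
        mi += p_ab * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mean_pairwise_mi(msa):
    """Average mutual information over all pairs of columns.

    `msa` is a list of equal-length aligned sequences (strings).
    Assumption: a plain mean is used as the alignment-level score.
    """
    columns = list(zip(*msa))
    pairs = list(combinations(columns, 2))
    if not pairs:
        return 0.0
    return sum(column_mutual_information(a, b) for a, b in pairs) / len(pairs)


def select_alignment(candidates):
    """Return the candidate MSA with the highest mean pairwise MI."""
    return max(candidates, key=mean_pairwise_mi)


# Toy example: two candidate alignments of four two-residue sequences.
coupled = ["AC", "AC", "BD", "BD"]      # the two columns co-vary perfectly
uncoupled = ["AC", "AD", "BC", "BD"]    # the two columns are independent
best = select_alignment([uncoupled, coupled])
```

In this toy case the perfectly co-varying alignment is selected. A realistic version would also need to generate the candidate MSAs (for example by sampling subsets of homologs, as the combinatorial sampling described above suggests) and correct for small-sample bias in the mutual information estimate.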