Computational studies of de novo motif discovery in aptamer selections
Shieh, Kevin R.
Aptamers are synthetic oligonucleotides with many promising applications as biomarkers, diagnostics, and therapeutics. Through a combinatorial chemistry technique known as SELEX (systematic evolution of ligands by exponential enrichment), researchers can potentially develop novel aptamers that bind almost any target, whether a molecule, a protein, or a cell. Following SELEX, we identify functional aptamers, which can be characterized by the underlying motifs responsible for their specific binding interactions. Cloning and sequencing of candidate aptamers, however, is limited to tens of sequences, which constrains how many sequences one can analyze at the bench. By contrast, next-generation sequencing (NGS) techniques provide far greater sequencing coverage of the RNA species in the enriched pool. Sequencing multiple rounds can also reveal insights into the dynamics of aptamer evolution and enrichment throughout the selection. One caveat is that greater sequencing coverage requires new methods of analysis to sift through the sequence libraries produced by NGS and identify a tractable subset for analysis at the bench.

In this dissertation, we explore de novo motif discovery in aptamer selections using NGS and SELEX. Our studies leverage computational analysis of sequencing data to elucidate the evolutionary changes that occur during a selection, and we use this information to improve the identification of novel aptamer motifs. We first provide background on aptamers and on the screening technique for identifying them using NGS and SELEX. We then introduce frequency distributions as a mechanism to partition enriched populations from background distributions, thereby reducing the sequence space for analysis. We have developed a program for experimentalists called AptCompare, which simplifies the analysis of NGS results from aptamer selections.
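The abstract does not say exactly how the frequency distributions are built or thresholded. As a minimal sketch, assuming reads have already been reduced to their randomized regions, one way to separate an enriched population from background is to compute each unique sequence's frequency within a round and keep sequences whose frequency rose sharply between an earlier and a later round while also being observed more than once. The function names and the thresholds `min_fold`, `min_count`, and `pseudo` are hypothetical illustrations, not the dissertation's method.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def frequency_distribution(counts: Counter) -> dict[str, float]:
    """Convert read counts per unique sequence into within-round frequencies."""
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


def partition_enriched(
    early_reads: Iterable[str],
    late_reads: Iterable[str],
    min_fold: float = 2.0,
    min_count: int = 2,
    pseudo: float = 1e-6,
) -> tuple[set[str], set[str]]:
    """Split late-round sequences into (enriched, background) sets.

    Hypothetical criteria: a sequence is enriched if it has at least
    `min_count` reads in the late round AND its frequency rose by at least
    `min_fold` relative to the early round. Sequences absent from the early
    round get the small frequency `pseudo` so the ratio stays finite; the
    `min_count` requirement keeps new one-off reads (likely noise) out.
    """
    early_counts = Counter(early_reads)
    late_counts = Counter(late_reads)
    f_early = frequency_distribution(early_counts)
    f_late = frequency_distribution(late_counts)

    enriched: set[str] = set()
    background: set[str] = set()
    for seq, freq in f_late.items():
        fold = freq / f_early.get(seq, pseudo)
        if late_counts[seq] >= min_count and fold >= min_fold:
            enriched.add(seq)
        else:
            background.add(seq)
    return enriched, background


if __name__ == "__main__":
    # Toy data: GGGG rises from 1/4 to 5/9 of reads; ACGT appears only once.
    early = ["AAGT", "CCGA", "TTAC", "GGGG"]
    late = ["GGGG"] * 5 + ["AAGT", "CCGA", "TTAC", "ACGT"]
    enriched, background = partition_enriched(early, late)
    print(sorted(enriched))    # ['GGGG']
    print(sorted(background))  # ['AAGT', 'ACGT', 'CCGA', 'TTAC']
```

Comparing frequencies rather than raw counts keeps the criterion independent of how deeply each round was sequenced; the only sequences passed on for further analysis are the small enriched subset.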
AptCompare provides a pipeline and graphical user interface that let non-bioinformaticians run simple pre-processing of sequencing reads and analyze the processed data. The program integrates six existing analysis programs, and we compare their results on the same selection. One of these programs, RNAmotifAnalysis, is computationally intensive, and many laboratories may lack the computing resources to run it. To make this analysis accessible, we provide a computing framework that executes its tasks in the distributed computing environment of the Open Science Grid. Our work seeks to enhance the design of novel aptamers by aiding in the discovery and characterization of their binding motifs.
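The abstract does not detail AptCompare's pre-processing steps. As a minimal sketch of one common form of read pre-processing for SELEX libraries, assuming each read contains the randomized region flanked by known constant primer-binding sequences, the code below extracts the randomized region, rejects reads whose flanks are missing or whose region has an unexpected length, and counts unique sequences. The flank sequences `FIVE_PRIME` and `THREE_PRIME`, the length parameters, and the function names are hypothetical placeholders.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical constant regions flanking the randomized region; a real
# selection would use the flanks from its own library design.
FIVE_PRIME = "GGGAGACAAG"
THREE_PRIME = "TTCGACAGGA"


def extract_random_region(
    read: str,
    five_prime: str = FIVE_PRIME,
    three_prime: str = THREE_PRIME,
    expected_length: int = 20,
    tolerance: int = 2,
) -> Optional[str]:
    """Return the region between the two constant flanks, or None.

    A read is rejected if either flank is missing (exact match only) or if the
    extracted region's length differs from `expected_length` by more than
    `tolerance` nucleotides.
    """
    start = read.find(five_prime)
    if start == -1:
        return None
    start += len(five_prime)
    end = read.find(three_prime, start)  # search only after the 5' flank
    if end == -1:
        return None
    region = read[start:end]
    if abs(len(region) - expected_length) > tolerance:
        return None
    return region


def preprocess(reads: Iterable[str], **kwargs) -> Counter:
    """Extract randomized regions from all reads and count unique sequences."""
    counts: Counter = Counter()
    for read in reads:
        region = extract_random_region(read.strip().upper(), **kwargs)
        if region is not None:
            counts[region] += 1
    return counts


if __name__ == "__main__":
    read = FIVE_PRIME + "ACGTACGTAC" + THREE_PRIME
    print(preprocess([read, read.lower()], expected_length=10, tolerance=0))
```

Exact string matching keeps the sketch short; real sequencing data contain errors, so a production pipeline would more likely use mismatch-tolerant trimming (for example, a dedicated adapter trimmer) before counting.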