Oral Candidacy - Olivia Choudhury
Start: 4/12/2016 at 1:00 PM
End: 4/12/2016 at 3:30 PM
Location: 258 Fitzpatrick Hall
Faculty and students are welcome to attend the presentation portion of the defense.
Adviser: Dr. Scott Emrich
Committee: Dr. Kevin Bowyer, Dr. Jeanne Romero-Severson, Dr. Douglas Thain
"Expediting Analysis and Improving Fidelity of Emerging Genome Data"
The plummeting cost of genome sequencing has spurred the generation of massive quantities of genomic data over the last two decades. Depending on whether well-developed genomic resources exist for the organism itself or for a close relative, organisms can be categorized as model species (with a known reference genome) or non-model species (without one). In both cases, sequencing throughput has outpaced server performance, creating a major bottleneck in the rate of data analysis. The problem is further exacerbated by the difficulty of extracting useful information from next-generation sequencing data, primarily because of missing or erroneous values. Although analyses of model species can rely on existing reference genomes for imputation and correction, the problem is considerably more challenging for non-model species.
It is therefore imperative to formulate efficient and accurate computational formalisms that address these challenges. To this end, the first phase of this thesis proposal focuses on developing high-throughput frameworks for expediting genomic data analysis. In this phase, a regression-based predictive model is designed to determine an optimal runtime configuration for the cost-efficient use of computing resources in next-generation sequence analysis. The second phase provides an efficient approach for improving the fidelity of genomic data containing substantial missing or spurious values in two bioinformatics applications: (i) haplotype phasing; and (ii) single-molecule real-time sequencing. The final phase discusses the implementation and performance of the proposed methodologies in improving the quality of multiple malaria vector genomes and in understanding chromosome evolution in the ecologically (and economically) important oak genome. The goal of this research is to develop systematic formalisms that improve quality and pace, the two challenging facets of analyzing large-scale emerging genome data, in order to extract the underlying wealth of information efficiently and robustly.