Oral Candidacy - Nicholas Hazekamp

Start: 8/3/2017 at 1:00PM
End: 8/3/2017 at 5:30PM
Location: 315 Stinson Remick
Attendees: Faculty and students are welcome to attend the presentation portion of the defense.
Nicholas Hazekamp
Adviser: Dr. Douglas Thain
Dr. Scott Emrich, Dr. Nirav Merchant, Dr. Jaroslaw Nabrzyski

Title:

“Scalable Data Partitioning and Resource Provisioning in Data-Intensive Bioinformatic Workflows”


As users work to scale up data processing, scientific workflows are often the solution. However, when using workflows, users run into two problems: naive partitioning of data and inaccurate resource provisioning of tasks. This proposal shows how methods such as dynamic job expansion, storage management, and flexible resource specification have been used to address these issues. It also puts forward planned work to dynamically partition data at runtime, avoiding naive user-side decisions. This includes techniques to determine appropriate partitions and to hoist shared data transfer out of the model for more consistent results. Dynamic partitioning will be developed on a synthetic application and verified using BWA and MAKER on several datasets using HTCondor, SGE, and the NSF Jetstream cloud platform. We anticipate that dynamic partitioning will outperform static partitioning in execution time when both are run on a static pool of resources.