
Oral Candidacy - Paige Rodeghero

Start: 5/11/2017 at 1:00PM
End: 5/11/2017 at 4:00PM
Location: 181 Fitzpatrick
Attendees: Faculty and students are welcome to attend the presentation portion of the defense.

Paige Rodeghero

Oral Candidacy

May 11, 2017          1:00 pm           181 Fitzpatrick

Adviser:  Dr. Collin McMillan


Dr. Jane Cleland-Huang        Dr. Nick Kraft        Dr. Aaron Striegel


Title: Learning Programmer Behavior to Improve Automatic Documentation Generation Algorithms


Programmers spend a large portion of their time reading and navigating source code in order to comprehend it. However, studies of program comprehension consistently find that programmers would prefer to focus on small sections of code during software maintenance, and "try to avoid" comprehending everything. Programmers depend on documentation to quickly understand source code. Source code documentation explains, in plain English, how the source code works: it summarizes source code, explains the behavior of a section of code, shows relationships to other code, etc. Specifically, a "source code summary" is a short passage of text (typically 1-3 sentences) explaining what the source code does or how it can be used. Programmers prefer reading these summaries to reading code, as Roehm et al. and others have demonstrated. In this proposal, I focus on improving the creation of these source code summaries by observing programmers documenting source code and writing or modifying algorithms to mimic their behavior.

Unfortunately, source code summaries are time-consuming to write. Because software is constantly changing, the documentation also needs to be updated frequently. However, as Fluri et al. describe, documentation is often left in its original state due to time constraints. The result is documentation that may be incorrect and is often misleading. Recently, efforts to automatically generate documentation have proliferated. The long-term goal is to reduce the manual effort of writing source code summaries and to generate summaries from source code with little to no effort. Currently, programmers use tools such as JavaDoc and Doxygen. These require programmers to write summaries as specially-formatted metadata, and all the tools do is put that information into HTML documents -- that is useful, but still leaves a majority of the effort to a human expert.

In other words, these tools do not generate content such as method summaries; they only format documentation that the programmers themselves wrote.

Recent research has targeted the problem of automatic source code summarization. Generally speaking, these summarization tools work by analyzing the source code, determining the important terms, and creating summaries based on those terms. The current tools have been shown to be effective under specific conditions, but they are unable to produce summaries of human-level quality. The reason is a knowledge gap in the literature: the research community does not know precisely what a summary should include. Summaries should reflect what programmers need, and we do not know exactly what programmers need. With an improved understanding of how programmers write documentation, we would be better equipped to automatically generate source code summaries.
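
As a rough illustration of the general "analyze the code, pick the important terms, build a summary from them" pipeline described above, the following minimal Python sketch scores terms by simple frequency. It is not the algorithm of any particular tool or of this proposal; the stop list, the function names (split_identifier, important_terms, summarize), the sample method, and the summary template are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical stop list: language keywords and filler terms that say
# little about what a method does.
STOP_TERMS = {
    "public", "private", "static", "void", "int", "double", "string",
    "return", "new", "for", "if", "else", "this", "get", "set",
}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [part.lower() for part in parts]


def important_terms(source, top_n=3):
    """Return the top_n most frequent non-stop terms found in identifiers."""
    counts = Counter()
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for term in split_identifier(identifier):
            if term not in STOP_TERMS and len(term) > 1:
                counts[term] += 1
    return [term for term, _ in counts.most_common(top_n)]


def summarize(method_name, source, top_n=3):
    """Fill a fixed (hypothetical) sentence template with the chosen terms."""
    readable_name = " ".join(split_identifier(method_name))
    terms = ", ".join(important_terms(source, top_n))
    return f"Method '{readable_name}' works with: {terms}."


# Hypothetical Java method used only as input text for the sketch.
SAMPLE = """
public double computeTotalPrice(List<Item> items) {
    double total = 0;
    for (Item item : items) {
        total += item.getPrice() * item.getQuantity();
    }
    return total;
}
"""

if __name__ == "__main__":
    print(summarize("computeTotalPrice", SAMPLE))
```

A frequency count is only one possible way to rank terms. The sketch makes the limitation discussed above concrete: it can list prominent terms, but it has no notion of what a programmer actually needs the summary to say.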

My research seeks to close this gap in the current literature. My strategy is to study how programmers write documentation in order to write algorithms that mimic their process. Specifically, I will 1) observe programmers reading source code and then write an algorithm that extracts the same information they looked at, 2) observe programmers in meetings and mimic their behavior in creating user stories, and 3) observe programmers' communication in industry to categorize their behavior. My research assists code summarization research by providing a guide to the way programmers work and read source code.