Oral Candidacy - Tomer Levinboim
Start: 8/20/2015 at 2:00PM
End: 8/20/2015 at 5:00PM
Location: 315 Stinson Remick
Faculty and students are welcome to attend the presentation portion of the defense.
Adviser: Dr. David Chiang
Committee: Dr. Kevin Knight, Dr. Tijana Milenkovic, Dr. Dong Wang, Dr. Tim Weninger
"Better Exploiting Translation Invertibility and Transitivity in Low-Resource Machine Translation"
Proposed thesis: Translation invertibility and transitivity can be better exploited in machine translation (and in particular, low-resource machine translation), while keeping the underlying algorithms computationally efficient.
Overview: Our intuition about translation tells us it is both invertible and transitive. That is, translating a sentence into another language and then back should likely yield the original sentence (hence, translation is invertible), and carefully translating from (say) French to English and then from English to Spanish should likely yield a valid French-to-Spanish translation (hence, translation is transitive).
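The transitivity intuition can be made concrete with a toy sketch (illustrative only, not from the talk): if we model lexical translation as probability tables, translating through a pivot language amounts to composing those tables, P(t|s) = Σ_p P(t|p)·P(p|s). The words and probabilities below are hypothetical.

```python
# Hypothetical French->English and English->Spanish lexical tables.
# "chat" in French means "cat"; the English word "chat" is a distractor.
fr_en = {"chat": {"cat": 0.9, "chat": 0.1}}
en_es = {"cat": {"gato": 0.8, "felino": 0.2}, "chat": {"charla": 1.0}}

def compose(src_pivot, pivot_tgt):
    """Transitivity: P(t|s) = sum over pivots p of P(t|p) * P(p|s)."""
    out = {}
    for s, pivots in src_pivot.items():
        dist = {}
        for p, p_prob in pivots.items():
            for t, t_prob in pivot_tgt.get(p, {}).items():
                dist[t] = dist.get(t, 0.0) + p_prob * t_prob
        out[s] = dist
    return out

# Induce a French->Spanish table by pivoting through English.
fr_es = compose(fr_en, en_es)
print({t: round(p, 2) for t, p in fr_es["chat"].items()})
# -> {'gato': 0.72, 'felino': 0.18, 'charla': 0.1}
```

Invertibility can be viewed the same way: composing a table with its own inverse should concentrate probability mass on the identity (each word translating back to itself).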
Although requiring these two properties of a Machine Translation (MT) system seems reasonable, existing approaches have either not utilized them or failed to utilize them effectively. In particular, the most successful and prevalent statistical MT approach, phrase-based MT, is entirely based on a string-rewriting and re-ordering mechanism whose construction does not take these properties into account. Therefore, I propose to improve the output quality of phrase-based MT systems by developing efficient and effective invertibility- and transitivity-based machine learning (ML) techniques that can be integrated along the MT pipeline.
Using invertibility and transitivity seems especially appealing due to their abstract mathematical nature and the fact that they are properties of translation itself, not of any particular language. In contrast, many other approaches try to add linguistic knowledge (such as syntax, morphology, or semantics), which requires high-quality manually annotated data to model and whose integration into the MT pipeline often complicates the underlying ML algorithms. Such problems and others are further exacerbated for language pairs that suffer from small amounts of training data (the low-resource setting).
In the talk, I will discuss my existing invertibility- and transitivity-based contributions to SMT, as well as propose three new contributions along the MT pipeline, which together will serve to support my proposed thesis.