PhD Defense - Tomer Levinboim
Start: 3/28/2017 at 11:00AM
End: 3/28/2017 at 2:00PM
Location: 100 Stinson Remick
Adviser: Dr. David Chiang
Committee: Dr. Kevin Knight, Dr. Tijana Milenkovic, Dr. Dong Wang, Dr. Tim Weninger
Invertibility and Transitivity in Low-Resource Machine Translation
Phrase-based machine translation (MT) emerged as a successful translation technology in the early 2000s, after years of attempts using rule-based and word-based methods. This approach can be viewed as a string-rewriting mechanism that composes a target-language translation by segmenting, reordering, and translating a given source sentence, using probabilistic rules estimated from bilingual corpora.
Since its inception, there have been many attempts to integrate linguistic knowledge (syntax, morphology) within the phrase-based MT framework. However, acquiring and applying such knowledge ultimately relies on large manually-annotated datasets (treebanks, part-of-speech annotations, morphologically analyzed words), which, unfortunately, are not available for most languages.
In this dissertation, I focus on the “low-resource” data scenario, in which the availability of linguistic analyzers cannot be assumed, and the size of the bilingual corpus is limited, which leads to statistical estimation problems for the underlying machine learning (ML) algorithms.
I argue that translation should exhibit two abstract properties -- invertibility and transitivity. Intuitively, these terms mean that translating from one language to another and then back should recover text similar to the original, and that translation through an intermediate language should not differ much from direct source-to-target translation. This intuition stems from the observation that translation is a meaning-preserving transformation, or rather, that after translation, both input and output sentences should encode the same meaning.
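One way to make these two properties concrete -- a sketch in assumed notation, not necessarily the formulation used in the dissertation -- is as approximate constraints on word-translation probability tables. Writing $t_{F\to E}(e \mid f)$ for the probability of target word $e$ given source word $f$, with $F$, $E$, and $I$ denoting the source, target, and intermediate languages:

```latex
% Invertibility: translating F -> E and back to F should
% approximately recover the original word.
\sum_{e} t_{F\to E}(e \mid f)\, t_{E\to F}(f' \mid e)
  \;\approx\; \mathbb{1}[f' = f]

% Transitivity: translating through an intermediate language I
% should approximate direct F -> E translation.
t_{F\to E}(e \mid f)
  \;\approx\; \sum_{i} t_{F\to I}(i \mid f)\, t_{I\to E}(e \mid i)
```

In matrix form, the first constraint says the source-to-target and target-to-source translation matrices should be approximate inverses, and the second says the composition of the two pivot matrices should approximate the direct one; such constraints can then be encouraged as regularizers during estimation.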
In the talk, I will present efficient and language-independent ML techniques that promote or rely on transitivity and invertibility. I will show that integrating these properties into the phrase-based MT pipeline leads to significant improvements in low-resource MT.