
PhD Defense - Tomer Levinboim

Start: 3/28/2017 at 11:00AM
End: 3/28/2017 at 2:00PM
Location: 100 Stinson Remick

Tomer Levinboim

Dissertation Defense


Adviser: Dr. David Chiang

Committee Members:

Dr. Kevin Knight, Dr. Tijana Milenkovic, Dr. Dong Wang, Dr. Tim Weninger


Invertibility and Transitivity in Low-Resource Machine Translation


Phrase-based machine translation (MT) emerged as a successful translation technology in the early 2000s, after years of attempts using rule-based and word-based methods. It can be viewed as a string-rewriting mechanism that composes a target-language translation by segmenting, reordering, and translating a given source sentence, using probabilistic rules estimated from bilingual corpora.
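The segment-translate-reorder mechanism can be sketched in a few lines. This is a toy illustration only: the phrase table, segmentation, and reordering below are hand-made, whereas a real phrase-based system estimates probabilistic rules from bilingual corpora and searches over many candidate derivations.

```python
# Toy sketch of phrase-based MT as string rewriting. All entries are
# invented for illustration, not learned from data.
phrase_table = {
    "veo": "I see",
    "la": "the",
    "casa": "house",
    "blanca": "white",
}

def translate(source, segmentation, reordering):
    """Apply one derivation: cut `source` at the given (start, end)
    word spans, translate each span via the phrase table, then
    permute the translated segments."""
    words = source.split()
    segments = [" ".join(words[i:j]) for i, j in segmentation]
    translated = [phrase_table[s] for s in segments]
    return " ".join(translated[k] for k in reordering)

# Spanish adjectives follow the noun, so this derivation swaps the
# last two segments: "casa blanca" -> "white house".
print(translate("veo la casa blanca",
                [(0, 1), (1, 2), (2, 3), (3, 4)],
                [0, 1, 3, 2]))  # -> I see the white house
```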

Since its inception, there have been many attempts to integrate linguistic knowledge (syntax, morphology) into the phrase-based MT framework. However, acquiring and applying such knowledge ultimately relies on large manually annotated datasets (treebanks, part-of-speech annotations, morphologically analyzed words), which, unfortunately, are not available for most languages.

In this dissertation, I focus on the "low-resource" data scenario, in which the availability of linguistic analyzers cannot be assumed and the bilingual corpus is small, leading to statistical estimation problems for the underlying machine learning (ML) algorithms.

I argue that translation should exhibit two abstract properties: invertibility and transitivity. Intuitively, translating from one language to another and then back should produce text similar to the original, and translating through an intermediate language should not differ much from direct source-to-target translation. This intuition stems from the observation that translation is a meaning-preserving transformation: after translation, the input and output sentences should encode the same meaning.
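The two properties can be stated concretely on a toy example. The word-level "translation models" below are stand-in dictionaries invented for illustration; the dissertation's actual techniques operate on learned probabilistic models, not lookup tables.

```python
# Invertibility: translating forward and then back should recover
# (approximately) the original text.
fwd = {"hola": "hello", "mundo": "world"}   # source -> target (toy)
bwd = {v: k for k, v in fwd.items()}        # target -> source (toy)

sent = "hola mundo"
round_trip = " ".join(bwd[fwd[w]] for w in sent.split())
assert round_trip == sent  # forward-then-back recovers the input

# Transitivity: source -> pivot -> target should agree with direct
# source -> target translation.
src_to_piv = {"hola": "salut"}   # source -> pivot (toy)
piv_to_tgt = {"salut": "hello"}  # pivot -> target (toy)
src_to_tgt = {"hola": "hello"}   # direct (toy)
assert piv_to_tgt[src_to_piv["hola"]] == src_to_tgt["hola"]
```

In the exact dictionary setting both properties hold with equality; for real statistical models they hold only approximately, which is what makes them useful as soft constraints during estimation.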

In this talk, I will present efficient and language-independent ML techniques that promote or rely on transitivity and invertibility. I will show that integrating these properties into the phrase-based MT pipeline leads to significant improvements in low-resource MT.