Karl Stratos - Learning Effective Representations of Text


11/9/2017, 3:30PM to 4:45PM


140 DeBartolo


David Chiang

Email: dchiang@nd.edu
Phone: 574-631-9441
Website: http://www.nd.edu/~dchiang/
Office: 326D Cushing Hall


College of Engineering Associate Professor
Natural language processing, machine learning, and digital humanities.

Learning rich, useful, and generalizable representations of text is a fundamental problem in natural language processing (NLP). Such representations have long played a critical role in advancing the field, from the agglomerative word clusters of Brown et al. (1992) to the many intricate neural architectures in use today.

I will present a series of efforts on learning effective representations of text. The first part of the talk will focus on new spectral algorithms for certain representation learning problems in NLP: word clustering, word embedding, and unsupervised part-of-speech tagging (UAI 2014; ACL 2015; TACL 2016). This includes the first provably correct algorithm for the influential clustering framework of Brown et al. The second part of the talk will focus on new neural architectures that exploit domain-specific properties to compute superior representations, including a Unicode-based sub-character architecture and a multitasking architecture for named-entity recognition (EMNLP 2017).

Seminar Speaker:

Karl Stratos

Toyota Technological Institute at Chicago

Karl Stratos is a research assistant professor at Toyota Technological Institute at Chicago (TTIC). Before TTIC, he was a senior research scientist at Bloomberg L.P. (2016-2017). He received a PhD in computer science from Columbia University, where he focused on spectral methods in the context of NLP. His PhD advisor was Michael Collins; he also worked closely with Daniel Hsu during this time. He is broadly interested in statistical approaches to language processing, particularly methods that can leverage unlabeled data, so his research tends toward semi-supervised and unsupervised learning. His recent research interests include information extraction, in particular entity linking.