
What You Need to Know about Chinese for Chinese Language Processing

Start:

12/2/2015 at 2:00PM

End:

12/2/2015 at 3:00PM

Location:

258 Fitzpatrick

Host:


David Chiang

Email: dchiang@nd.edu
Phone: 574-631-9441
Website: http://www.nd.edu/~dchiang/
Office: 326D Cushing Hall

Affiliations

College of Engineering Associate Professor
Natural language processing, machine learning, and digital humanities.

In this talk, I will introduce essential knowledge of Chinese linguistics. This covers both the fundamentals of the linguistic structure of Chinese and how such knowledge of the language can be exploited in Chinese language processing. The perspective will be synergetic, aiming to provide comprehensive knowledge of the linguistic characteristics of Chinese along with insights and case studies showing how such knowledge can help language technology.

The talk will be organized according to the structure of linguistic knowledge of Chinese, starting from the basic building blocks and moving up to the use of Chinese in context. The first part deals with characters (字) as the basic linguistic unit of Chinese in terms of phonology, orthography, and basic concepts. It will also discuss an ontological view of how the Chinese writing system organizes meaningful content, and how this onomasiological decision affects Chinese text processing. The second part deals with words (词) and presents basic issues in the definition and identification of words in Chinese, especially given that Chinese text lacks conventional marks of word boundaries. The third part focuses on lemmatization and parts of speech (词类). It underlines the unique challenges Chinese poses for lemmatization, as well as the distributional properties of Chinese PoS and tagging systems. The fourth part deals with sentences and structure, focusing on how to identify grammatical relations in Chinese and on a few Chinese-specific constructions. For each topic, the empirical foundation of linguistic facts is clearly explicated with a robust generalization. That generalization is then accounted for in terms of its function in the knowledge representation system. Lastly, this knowledge representation role is exploited in terms of the aims of specific language technology tasks. For references, in addition to language resources and various relevant papers, the talk will draw on Huang and Shi’s (2016) reference grammar for linguistic description of Chinese.

Seminar Speaker:

Chu-Ren Huang

The Hong Kong Polytechnic University

Chu-Ren Huang’s main research areas are corpus and computational linguistics, lexical semantics, Chinese grammar, language resources, ontology, and digital humanities. He is currently a Chair Professor at the Hong Kong Polytechnic University. He is a Fellow and President of the Hong Kong Academy of the Humanities, a permanent member of the International Committee on Computational Linguistics, and President of the Asian Association of Lexicography. He played an influential role in the development of the new fields of corpus and computational linguistics in the Chinese and Asian context. His research output includes 12 licensable language resources, 10 searchable online language databases, 20 books or edited volumes, over 130 journal articles or book chapters, and over 380 refereed or invited conference papers. He currently serves as Chief Editor of the journal Lingua Sinica, as well as of Cambridge University Press’ Studies in Natural Language Processing and Springer’s The Humanities in Asia book series. He is an associate editor of both the Journal of Chinese Linguistics and Lexicography. Conference series in which he has played advisory and/or organizing roles include ALR, ASIALEX, CLSW, CogALex, COLING, IsCLL, LAW, OntoLex, PACLIC, ROCLING, and SIGHAN. Chinese language resources constructed under his direction include the CKIP lexicon and ICG, Sinica, Sinica Treebank, Sinica BOW, Chinese WordSketch, Tagged Chinese Gigaword Corpus, Hantology, Chinese WordNet, and Emotion Annotated Corpus. He is the co-author of a Chinese reference grammar (Huang and Shi 2016) and of a book on Chinese language processing (Lu, Xue and Huang, in preparation).