Cornelia Caragea - Keyphrase Extraction in Citation Networks: How Do Citation Contexts Help?

Start:

2/11/2016 at 3:30PM

End:

2/11/2016 at 5:00PM

Location:

356 Fitzpatrick

Host:

Patrick Flynn

Keyphrase extraction is the problem of automatically extracting descriptive phrases or concepts from documents. Keyphrases act as a concise summary of a document and have been used successfully in many applications, such as query formulation, document clustering, classification, recommendation, indexing, and summarization. Previous approaches to keyphrase extraction generally used the textual content of a target document or a local neighborhood of textually similar documents. We posit that, in a scholarly domain, in addition to a document’s textual content and textually similar neighbors, other informative neighborhoods exist that have the potential to improve keyphrase extraction. Research papers are not isolated. Rather, they are highly interconnected in giant citation networks, in which papers cite or are cited by other papers in appropriate citation contexts, i.e., short text segments surrounding a citation’s mention. These contexts are not arbitrary; they serve as brief summaries of a cited paper. We exploit citation context information for keyphrase extraction and show remarkable improvements in performance over strong baselines in both supervised and unsupervised settings.

Seminar Speaker:

Cornelia Caragea

University of North Texas

Cornelia Caragea is an Assistant Professor in the Computer Science and Engineering department at the University of North Texas, where she directs the Machine Learning group. Her research interests lie at the intersection of artificial intelligence, machine learning, data mining, information retrieval, and natural language processing, with applications to text and image analysis, scientific data analysis, bioinformatics, and social media. She has published research papers in prestigious venues such as AAAI, IJCAI, WWW, EMNLP, ICDM, and ACM Transactions on the Web. Cornelia has reviewed for many journals, including Nature, ACM Transactions on Intelligent Systems and Technology, and IEEE Transactions on Knowledge and Data Engineering; has served on several NSF panels; and has been a program committee member for top conferences such as AAAI, IJCAI, ACL, NAACL, EMNLP, COLING, and CIKM. She has also helped organize several workshops on scholarly big data at conferences such as IJCAI, AAAI, and IEEE BigData. Cornelia earned a Bachelor of Science degree in Computer Science and Mathematics from the University of Bucharest and a Ph.D. in Computer Science from Iowa State University. Prior to joining the University of North Texas in Fall 2012, she was a post-doctoral researcher at the Pennsylvania State University.