Oral Candidacy - Pingjie Tang

Start: 4/19/2018 at 10:30AM
End: 4/19/2018 at 12:30PM
Location: 384 Nieuwland
Attendees: Students and faculty are welcome to attend the presentation portion of the defense.

Pingjie Tang
Oral Candidacy
April 19, 2018, 10:30 am, 384 Nieuwland
Adviser: Dr. Nitesh Chawla
Committee Members:
Dr. Pablo Robles-Granda, Dr. Jed Pitera, Dr. Tim Weninger


Multi-view Representation Learning: Approaches and Applications


Extracting high-quality data representations is critical to the success of machine learning algorithms. Prior work typically relies on conventional feature engineering to achieve this goal. However, feature engineering is labor-intensive, and handcrafted features have limited expressiveness and are prone to overfitting. Representation learning approaches have drawn significant attention in recent years, motivated primarily by the emerging need to generate discriminative features without human supervision. Among the diverse categories of representation learning, this proposal focuses on multi-view representation learning, which attempts to harness data from multiple distinct feature spaces to learn data representations. We first show how to integrate data from different sources by constructing a heterogeneous information network, on which a relevance search task is performed to exploit the relatedness between data of heterogeneous types. Effective features are then designed to identify noise in the network and facilitate prediction. Next, we propose the task of automatic patent document categorization and formulate it as a classic multi-label classification problem. We fuse the input documents (X) and the observed label sets (Y) associated with each document by projecting them into a common low-dimensional embedding space using a deep neural network model; in this space, the interaction between relevant X and Y is maximized by optimizing the embedding vectors for both data views. Real-world data is typically characterized by a large number of labels and an extremely imbalanced label distribution. In an extreme multi-label classification task, we take advantage of deep metric learning to directly measure the correlation between X and Y while mitigating the tail-label issue.
Finally, we present two pieces of proposed work, knowledge graph construction and inference and a visually aware recommender system, and outline potential approaches for each.