
CSE Seminar Series: Guided Feature Selection from Incomplete Data in Disease-gene Prediction and Chemoinformatics Applications


5/21/2012 at 3:00PM


5/21/2012 at 4:00PM


356A Fitzpatrick



Tijana Milenkovic


Email: tmilenko@nd.edu
Phone: 574-631-8975
Website: http://www.nd.edu/~tmilenko/
Office: 381 Fitzpatrick Hall


College of Engineering Associate Professor
I am the director of the Complex Networks Lab (http://www.cse.nd.edu/~cone/). My research interests are as follows. Complex networks and network mining: developing graph theoretic, mathematical, and computational algorithms for efficient extraction of function from topology of complex ...
When solving bioinformatics problems, several descriptive aspects of the data are often found to be informative. For example, both gene expression levels and genome sequences can be used to discover genetic markers of disease susceptibility. Numerous studies have shown that in such cases, predictors benefit from integrating these multiple representations. These studies, however, generally assume that all relevant representations (or feature sets) are available for every sample in the dataset. In practice, this assumption is often unrealistic; for example, several GWAS and microarray studies might be available for the same phenotype but share only a handful of patients. Moreover, we may also have features for additional samples that have not been characterized for the problem at hand (such as the genotype and gene expression levels of individuals not phenotyped for the trait of interest).

Under such conditions, it is desirable to use all available information rather than being restricted to fully labeled, fully described samples. In particular, we wish to leverage the otherwise available information to guide feature selection on the rarest attributes. For that purpose, we present RosettaSVM, a co-regularized support vector machine approach that minimizes the classification error, over fully described data points, of predictors trained on either feature set. We apply it to select the most relevant attributes among those least frequently available, on several simulated and real data sets from gene-disease association and chemoinformatics applications.

Seminar Speaker:

Chloe-Agathe Azencott

Max Planck Institute, Germany

Chloe-Agathe Azencott is a postdoctoral researcher in the Machine Learning and Computational Biology research group at the Max Planck Institute for Intelligent Systems and the Max Planck Institute for Developmental Biology in Tuebingen (Germany). She received her M.E. in Computer Science from Telecom Bretagne (France) and her M.S. in Mathematics and Computer Science from the University of Rennes 2 (France) in 2005. She received her PhD in Computer Science from the University of California, Irvine in 2010. Her primary research interests are applications of machine learning and data mining to the life sciences, in particular in the fields of drug discovery and personalized medicine.

Seminar Sponsors: