Aaron Clauset - The Ground Truth About Metadata and Community Detection in Networks


10/26/2017 at 3:30PM


10/26/2017 at 4:45PM


140 DeBartolo


Nitesh Chawla

Email: nchawla@nd.edu
Phone: 574-631-1090
Website: http://www.nd.edu/~nchawla/
Office: 384 Nieuwland Science Hall


College of Engineering Frank M. Freimann Professor
Dr. Chawla's research interests are broadly in the areas of Big Data: data science, machine learning, network science, and their applications to social networks, healthcare informatics/analytics, and climate/environmental sciences. He directs the Notre Dame Interdisciplinary Center for Network ...
Community detection is one of the most common tasks in network analysis, in which we seek to infer the underlying structural modules or groups of a network from the pattern of which nodes are connected. The standard way to evaluate these algorithms is to measure how closely the inferred communities correlate with node "metadata", i.e., node labels like a person's ethnicity in a social network or the brain region in a connectome.
In this talk, I will present two strong results on community detection and node metadata.
First, I'll introduce the No Free Lunch theorem for community detection, which proves that all community detection algorithms have the same average performance across all network inputs, and the no-bijection theorem, which proves that no algorithm can always recover "ground truth" communities.
Second, I'll show that using node metadata to guide the community detection process, rather than as an evaluation target, yields better inferences about the network's organization. To substantiate this claim, I'll introduce a Bayesian stochastic block model that automatically learns the correlation between node metadata and network communities, if any exists. The learned correlations are interesting in their own right, and they allow us to make predictions about the community membership of nodes whose network connections are unknown. This method performs better than any algorithm that uses only structure or only metadata, and I will demonstrate its application to several real-world networks drawn from social, biological, and technological domains.
This is joint work with Leto Peel, Daniel B. Larremore, and Mark Newman.
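For readers unfamiliar with the evaluation practice the abstract questions, the sketch below scores an inferred partition against node metadata. It assumes normalized mutual information (NMI) as the score, which is a common choice but is not named in the abstract. The function name and the toy labels are hypothetical, and this is not the speaker's method.

```python
from collections import Counter
from math import log


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same nodes (arithmetic-mean normalization)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        # Shannon entropy of the empirical label distribution.
        return -sum((c / n) * log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Mutual information from the joint and marginal label counts.
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in a single group
    return 2 * mutual_info / (h_a + h_b)


# Hypothetical toy data: six nodes, one inferred partition, two metadata labelings.
inferred = [0, 0, 0, 1, 1, 1]
metadata_aligned = ["a", "a", "a", "b", "b", "b"]
metadata_unrelated = ["a", "b", "a", "b", "a", "b"]

score_aligned = normalized_mutual_information(inferred, metadata_aligned)
score_unrelated = normalized_mutual_information(inferred, metadata_unrelated)
print(score_aligned, score_unrelated)
```

A high score says only that the partition and the metadata agree. As the abstract argues, a low score does not by itself show that the algorithm failed, since metadata need not coincide with the network's structural communities.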

Seminar Speaker:

Aaron Clauset

University of Colorado Boulder

Aaron Clauset is an Assistant Professor in the Department of Computer Science and the BioFrontiers Institute at the University of Colorado Boulder, and is External Faculty at the Santa Fe Institute. He received a PhD in Computer Science, with distinction, from the University of New Mexico, a BS in Physics, with honors, from Haverford College, and was an Omidyar Fellow at the prestigious Santa Fe Institute.
Clauset is an internationally recognized expert on network science, data science, and machine learning for complex systems. His work has appeared in many prestigious scientific venues, including Nature, Science, PNAS, JACM, WWW, ICWSM, STOC, SIAM Review, and Physical Review Letters, and has been covered in the popular press by the Wall Street Journal, The Economist, Discover Magazine, New Scientist, Wired, Miller-McCune, the Boston Globe and The Guardian.