Seokki Lee - PUG: A Framework for Efficiently Computing and Summarizing Why and Why-not Provenance


9/19/2019, 3:30PM to 4:45PM


131 DeBartolo


Taeho Jung

Email: tjung@nd.edu
Phone: 574-631-8322
Website: https://sites.nd.edu/taeho-jung/
Office: 351 Fitzpatrick Hall


College of Engineering Assistant Professor
Big data security, user privacy, privacy-preserving computation, accountability
Explaining why an answer is in the result of a query, or why it is missing from the result, is important for many applications, including auditing, debugging data and queries, and answering hypothetical questions about data. Both types of questions, i.e., why and why-not provenance, have been studied extensively. In this work, we introduce a graph-based provenance model that, while syntactic in nature, is powerful enough to encode the evaluation of queries with negation (First-Order queries). We demonstrate that our model generalizes a wide range of provenance models from the literature. Using our model, we present the first practical approach for capturing such provenance for a set of (missing) query results a user is interested in. We introduce a rewriting technique that efficiently generates explanations, i.e., the part of the provenance that is relevant to the query outputs of interest.
However, for why-not provenance, and to a lesser degree also for why provenance, explanations can still be very large, resulting in severe scalability and usability challenges. We introduce a novel approximate summarization technique for provenance that overcomes these challenges. Our approach uses patterns to encode (why-not) provenance as a summarized representation of sets of elements from the provenance. We develop techniques for computing summaries that balance informativeness, conciseness, and completeness (the fraction of the provenance described by the summary). To achieve scalability, we integrate a sampling technique into provenance capture and summarization. We implement these techniques in our PUG (Provenance Unification through Graphs) system, which runs on top of a relational database. We demonstrate through extensive experiments that our approach scales to large datasets and produces comprehensive and meaningful explanations and summaries.

Seminar Speaker:

Seokki Lee

Illinois Institute of Technology

Seokki Lee has been a PhD candidate in the Department of Computer Science at Illinois Institute of Technology since 2014. He completed an M.S. in Engineering Management at California State University, Northridge (CSUN) in 2014, before which he worked in industry for six years in South Korea (Compuware and ELITEK). He also obtained an M.S. in Computer Science in 2009 from Hanyang University, a top-five university in South Korea. His research focuses on data provenance for applications in many domains, databases, and Big Data. He has published in top conferences and journals such as IEEE ICDE and the VLDB Journal.