NOTES PERTAINING TO PUBLICATION, DISTRIBUTION, ETC.
Text of Note
Place of publication: United States, Ann Arbor; ISBN=9781085589277
DISSERTATION (THESIS) NOTE
Dissertation or thesis details and type of degree
Ph.D.
Discipline of degree
Computer Science
Body granting the degree
Illinois Institute of Technology
Text preceding or following the note
2019
SUMMARY OR ABSTRACT
Text of Note
Database provenance explains how results are derived by queries. However, many use cases, such as auditing and debugging of transactions, require an understanding of how the current state of a database was derived by a transactional history. We introduce an approach for capturing the provenance of transactions. Our approach works not only for serializable concurrency control protocols but also for non-serializable protocols, including snapshot isolation. The main drivers of our approach are a provenance model for queries, updates, and transactions, and reenactment, a novel technique for retroactively capturing the provenance of tuple versions. We introduce the MV-semirings provenance model for updates and transactions as an extension of the existing semiring provenance model for queries. Our reenactment technique exploits the time travel and audit logging capabilities of modern DBMSs to replay parts of a transactional history using queries. Importantly, our technique requires no changes to the transactional workload or underlying DBMS and results in only moderate runtime overhead for transactions. We discuss how our MV-semirings model and reenactment approach can serve a wide variety of applications and use cases, including answering historical what-if queries, which determine the effect of hypothetical changes to past operations of a business; post-mortem debugging of transactions; and creating private data workspaces for exploration. We have implemented our approach on top of a commercial DBMS, and our experiments confirm that, by applying novel optimizations, we can efficiently capture provenance for complex transactions over large data sets.
TOPICAL NAME USED AS SUBJECT
Computer science
UNCONTROLLED SUBJECT TERMS
Subject Term
Concurrency control protocol;Database;Data provenance;Reenactment;Transaction;Update statement