Sudeepa Roy


                               Assistant Professor
                               Department of Computer Science
                               Duke University
                               308 Research Drive
                               Campus Box 90129
                               Durham, NC 27708-0129

                               Office: D325 LSRC Building
                               Phone: (919) 660-6596
                               Fax: (919) 660-6519
                               E-mail: sudeepa AT cs DOT duke DOT edu      







Background

I joined the Department of Computer Science at Duke University in Fall 2015.
I am a member of the Duke Database Group (a.k.a. Duke Database Devils; more about Duke Blue Devils),
which is part of the Duke Systems and Architecture Group.

Before joining Duke, I was a postdoctoral research associate in the Department of Computer Science and Engineering at the
University of Washington, where I worked with Prof. Dan Suciu and the database group.

I graduated from the University of Pennsylvania with a Ph.D. in Computer and Information Science, where I was advised by
Prof. Susan Davidson and Prof. Sanjeev Khanna. During my Ph.D., I did two internships at IBM Research, Almaden,
and received a Google PhD fellowship in Structured Data in 2011.


Research

I am broadly interested in data and information management, with a focus on foundational aspects of databases and
big data analysis. My current research focuses on building tools and techniques that help users get the maximum benefit
from the available data. My ongoing work on causality and explanations in databases aims to help users gain
deep insights into data by providing causal analysis and rich explanations for their questions. My work on data and workflow
provenance, probabilistic databases, and crowd-sourcing probes compelling, fundamental questions that need to be answered
to enable end-to-end processing and analysis of unstructured, noisy, and unreliable data in today's world while preserving its entire context.

See my publications.

Projects




Funding


Services

Program Committee Member

Organization

Other Committee Member

External Reviewer


Teaching


Students

I am fortunate to work with a number of wonderful students at Duke!
(The list below does not include the great students from other schools I work with.)

Current students

Former students


Publications    

Book Chapter

  1. Uncertain Data Lineage [pdf].
        Encyclopedia of Database Systems, 2nd edition, Springer, 2018.

  2. Provenance: Privacy and Security [pdf].
        (with Susan Davidson)
        Encyclopedia of Database Systems, 2nd edition, Springer, 2018.

Tutorial

  1. Causality and Explanations in Databases [pdf] [slides].
        (with Alexandra Meliou and Dan Suciu)
        International Conference on Very Large Data Bases (VLDB) 2014.

Journal Publication

  1. Computing Optimal Repairs for Functional Dependencies.
        (with Ester Livshits and Benny Kimelfeld)
        To appear in ACM Transactions on Database Systems (TODS), 2019.

  2. Exact Model Counting of Query Expressions: Limitations of Propositional Methods [pdf].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        ACM Transactions on Database Systems (TODS), Vol. 42, Issue 1, pages 1:1-1:46, March 2017.

  3. Answering Conjunctive Queries with Inequalities [pdf].
        (with Paraschos Koutris, Tova Milo, and Dan Suciu)
        Theory of Computing Systems (TOCS), Springer, Vol. 61, Number 1, pages 2-30, 2017.

  4. Top-k and Clustering with Noisy Comparisons [pdf].
        (with Susan B. Davidson, Sanjeev Khanna, and Tova Milo)
        ACM Transactions on Database Systems (TODS), Vol. 39, Issue 4, pages 35:1-35:39, December 2014 (best paper special issue).

Invited Article

  1. Query Perturbation Analysis: An Adventure of Database Researchers in Fact-Checking [pdf].
        (with Jun Yang, Pankaj K. Agarwal, Brett Walenz, You Wu, Cong Yu, and Chengkai Li)
        IEEE Data Engineering Bulletin 2018 (41(3), pages 28-42).

  2. On the Complexity of Evaluating Order Queries with the Crowd [pdf].
        (with Benoit Groz and Tova Milo)
        IEEE Data Engineering Bulletin 2015 (38(3), pages 44-58).

Manuscript

  1. Learning to Sample: Counting with Complex Queries [arxiv].
        (with Brett Walenz, Stavros Sintos, and Jun Yang)

  2. Principles of Progress Indicators for Database Repairing [arxiv].
        (with Ester Livshits, Ihab Ilyas, and Benny Kimelfeld)

  3. Generalized Deletion Propagation on Counting Conjunctive Query Answers [arxiv].
        (with Debmalya Panigrahi and Shweta Patwa)

  4. FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference [arxiv].
        (with Cynthia Rudin, Alexander Volfovsky, and Tianyu Wang)

Conference Publication

  1. Almost Matching Exactly With Instrumental Variables [arxiv].
        (with M. Usaid Awan, Yameng Liu, Marco Morucci, Cynthia Rudin, and Alexander Volfovsky)
        Conference on Uncertainty in Artificial Intelligence (UAI) 2019.

  2. CAPE: Explaining Outliers by Counterbalancing [pdf].
        (with Zhengjie Miao, Qitian Zeng, Chenjie Li, Boris Glavic, and Oliver Kennedy)
        To appear in Proceedings of the VLDB Endowment (PVLDB), Vol 12, demonstration track, 2019.

  3. LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers [pdf].
        (with Zhengjie Miao and Andrew Lee)
        To appear in Proceedings of the VLDB Endowment (PVLDB), Vol 12, demonstration track, 2019.

  4. Almost-Exact Matching with Replacement for Causal Inference [arxiv].
        (with Awa Dieng, Yameng Liu, Cynthia Rudin, and Alexander Volfovsky)
        International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

  5. RATest: Explaining Wrong Queries Using Small Examples [pdf].
        (with Zhengjie Miao and Jun Yang)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), demonstration track, 2019.

  6. Explaining Wrong Queries Using Small Examples [pdf].
        (with Zhengjie Miao and Jun Yang)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), 2019.

  7. Going Beyond Provenance: Explaining Query Answers with Pattern-based Counterbalances [pdf].
        (with Zhengjie Miao, Qitian Zeng, and Boris Glavic)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), 2019.

  8. iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks [pdf].
        (with Prajakta Kalmegh and Shivnath Babu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), 2019.

  9. Interactive Summarization and Exploration of Top Aggregate Query Answers [pdf].
        (with Yuhao Wen, Xiaodan Zhu, and Jun Yang)
        Proceedings of the VLDB Endowment (PVLDB) 2018, Vol 11 Issue 13/VLDB 2019.

  10. Computing Optimal Repairs for Functional Dependencies [arxiv].
        (with Ester Livshits and Benny Kimelfeld)
        Principles of Database Systems (PODS) 2018.

  11. iQCAR: A demonstration of an Inter-query Contention Analyzer for Cluster Computing Frameworks [pdf].
        (with Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, and Shivnath Babu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), demonstration track, 2018.

  12. QAGView: Interactively Summarizing High-Valued Aggregate Query Answers [pdf].
        (with Yuhao Wen, Xiaodan Zhu, and Jun Yang)
        ACM SIGMOD International Conference on Management of Data (SIGMOD), demonstration track, 2018.

  13. Optimizing Iceberg Queries with Complex Joins [pdf].
        (with Brett Walenz and Jun Yang)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2017.

  14. Explaining Query Answers with Explanation-Ready Databases [pdf] [slides].
        (with Laurel Orr and Dan Suciu)
        Proceedings of the VLDB Endowment (PVLDB) Vol 9/VLDB 2016.

  15. Answering Conjunctive Queries with Inequalities [pdf].
        (with Paraschos Koutris, Tova Milo, and Dan Suciu)
        International Conference on Database Theory (ICDT) 2015.

  16. A Formal Approach to Finding Explanations for Database Queries [pdf] [slides].
        (with Dan Suciu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2014.

  17. Circuits for Datalog Provenance [pdf] [slides].
        (with Daniel Deutch, Tova Milo, and Val Tannen)
        International Conference on Database Theory (ICDT) 2014.

  18. Model Counting of Query Expressions: Limitations of Propositional Methods [pdf].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        International Conference on Database Theory (ICDT) 2014.
        Invited to ACM TODS as one of the best papers in ICDT 2014

  19. Lower Bounds for Exact Model Counting and Applications in Probabilistic Databases [pdf] [slides].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        Conference on Uncertainty in Artificial Intelligence (UAI) 2013.

  20. Provenance-based Dictionary Refinement in Information Extraction [pdf] [slides].
        (with Laura Chiticariu, Vitaly Feldman, Frederick R Reiss and Huaiyu Zhu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2013.

  21. Using the Crowd for Top-k and Group-by Queries [pdf] [slides].
        (with Susan B. Davidson, Sanjeev Khanna and Tova Milo)
        International Conference on Database Theory (ICDT) 2013.
        Invited to ACM TODS as one of the best papers in ICDT 2013

  22. A Propagation Model for Provenance Views of Public/Private Workflows [pdf] [slides].
        (with Susan B. Davidson and Tova Milo)
        International Conference on Database Theory (ICDT) 2013.

  23. Queries with Difference on Probabilistic Databases [pdf] [slides].
        (with Sanjeev Khanna and Val Tannen)
        International Conference on Very Large Data Bases (VLDB) 2011.

  24. Provenance Views for Module Privacy [pdf] [slides].
        (with Susan B. Davidson, Sanjeev Khanna, Tova Milo, and Debmalya Panigrahi)
        Principles of Database Systems (PODS) 2011.

  25. Faster Query Answering in Probabilistic Databases using Read-Once Functions [pdf] [slides].
        (with Vittorio Perduca and Val Tannen)
        International Conference on Database Theory (ICDT) 2011.

  26. Enabling Privacy in Provenance-Aware Workflow Systems [pdf].
        (with Susan Davidson, Sanjeev Khanna, Julia Stoyanovich, Val Tannen, Yi Chen and Tova Milo)
        Vision Track, Conference on Innovative Data Systems Research (CIDR) 2011.

  27. An Optimal Labeling Scheme for Workflow Provenance Using Skeleton Labels [pdf].
        (with Zhuowei Bao, Susan Davidson and Sanjeev Khanna)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2010.

  28. Optimizing User Views for Workflows [pdf] [slides].
        (with Olivier Biton, Susan Davidson and Sanjeev Khanna)
        International Conference on Database Theory (ICDT) 2009.

  29. STCON in Directed Unique-Path Graphs [pdf] [slides].
        (with Sampath Kannan and Sanjeev Khanna)
        Foundations of Software Technology and Theoretical Computer Science (FSTTCS) 2008.

  30. Automatic Translation of Simulink Models into Input Language of a Model Checker [pdf].
        (with Meenakshi B. and Abhishek Bhatnagar)
        International Conference on Formal Engineering Methods (ICFEM) 2006.

Workshop, Poster, and Other Publication

  1. iQCAR: Inter-Query Contention Analyzer [pdf].
        (with Prajakta Kalmegh and Shivnath Babu)
        Symposium on Cloud Computing (SOCC), Poster, 2018.

  2. Hiding Data and Structure in Workflow Provenance [pdf].
        (with Susan B. Davidson and Zhuowei Bao)
        Invited paper, International Workshop on Databases in Networked Information Systems (DNIS) 2011.

  3. Privacy Issues in Scientific Workflow Provenance [pdf] [slides].
        (with Susan Davidson, Sanjeev Khanna and Sarah Cohen Boulakia)
        International Workshop on Workflow Approaches to New Data-centric Science (WANDS) 2010.

Ph.D. Dissertation

    Provenance and Uncertainty [pdf].
    Sudeepa Roy
    University of Pennsylvania, August 2012



Patents

  1. Refining a dictionary for information extraction.
        (with Laura Chiticariu, Vitaly Feldman, Frederick Reiss, and Huaiyu Zhu)
        Assignee: International Business Machines Corporation (IBM)
        Publication Number: US 8775419 B2,  2014

  2. Automatic Translation of Simulink Models into Input Language of a Model Checker.
        (with Meenakshi B. and Abhishek Bhatnagar)
        Assignee: Honeywell International Inc.
        Publication Number: US 7698668 B2,  2010
