Sudeepa Roy

[photograph]

                               Assistant Professor
                               Department of Computer Science
                               Duke University
                               308 Research Drive
                               Campus Box 90129
                               Durham, NC 27708-0129

                               Office: D325 LSRC Building
                               Phone: (919) 660-6596
                               Fax: (919) 660-6519
                               E-mail: sudeepa AT cs DOT duke DOT edu      







Background

I joined the Department of Computer Science at Duke University in Fall 2015.
I am a member of the Duke Database Group (a.k.a. Duke Database Devils; more about Duke Blue Devils),
which is part of the Duke Systems and Architecture Group.

Before joining Duke, I was a postdoctoral research associate in the Department of Computer Science and Engineering,
University of Washington, where I worked with Prof. Dan Suciu and the database group.

I graduated from the University of Pennsylvania with a Ph.D. in Computer and Information Science, where I was advised by
Prof. Susan Davidson and Prof. Sanjeev Khanna. During my Ph.D., I did two internships at IBM Research, Almaden,
and received a Google PhD fellowship in Structured Data in 2011.


Research

I am broadly interested in data and information management, with a focus on foundational aspects of databases and
big data analysis. My current research focuses on building tools and techniques that help users derive the maximum benefit
from the available data. My ongoing work on explanations in databases aims directly to help users gain
deep insights into data by providing rich explanations for their questions, while my work on data and workflow provenance,
probabilistic databases, and crowd-sourcing probes compelling, fundamental questions that need to be answered
to enable end-to-end processing and analysis of unstructured, noisy, and unreliable data while preserving its entire context.

See my publications.
[photograph]

Project page for FIREFly: Formal Interactive Rich Explanations on the Fly


Funding

NSF CAREER Award #1552538: "CAREER: FIREFLY - Rich Explanations for Database Queries"


Services

Program Committee Member

Organization

Award Committee Member

External Reviewer


Teaching


Publications     (by topic)

Tutorial

  1. Causality and Explanations in Databases [pdf] [slides].
        (with Alexandra Meliou and Dan Suciu)
        International Conference on Very Large Data Bases (VLDB) 2014.

Journal Publications

  1. Exact Model Counting of Query Expressions: Limitations of Propositional Methods [pdf].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        ACM Transactions on Database Systems (TODS), Vol. 42, Issue 1, March 2017.
        (Preliminary versions appeared in ICDT 2014 and UAI 2013)

  2. Top-k and Clustering with Noisy Comparisons [pdf].
        (with Susan B. Davidson, Sanjeev Khanna, and Tova Milo)
        ACM Transactions on Database Systems (TODS), Vol. 39, Issue 4, December 2014 (best paper special issue).
        (A preliminary version appeared in ICDT 2013)

Invited Article

  1. On the Complexity of Evaluating Order Queries with the Crowd [pdf].
        (with Benoit Groz and Tova Milo)
        IEEE Data Engineering Bulletin 2015 (38(3), pages 44-58).

Conference Publications

  1. Optimizing Iceberg Queries with Complex Joins [pdf].
        (with Brett Walenz and Jun Yang)
        To appear in ACM SIGMOD International Conference on Management of Data (SIGMOD) 2017.

  2. Explaining Query Answers with Explanation-Ready Databases [pdf] [slides].
        (with Laurel Orr and Dan Suciu)
        Proceedings of the VLDB Endowment (PVLDB) Vol 9/VLDB 2016.

  3. Answering Conjunctive Queries with Inequalities [pdf].
        (with Paraschos Koutris, Tova Milo, and Dan Suciu)
        International Conference on Database Theory (ICDT) 2015.

  4. A Formal Approach to Finding Explanations for Database Queries [pdf] [slides].
        (with Dan Suciu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2014.

  5. Circuits for Datalog Provenance [pdf] [slides].
        (with Daniel Deutch, Tova Milo, and Val Tannen)
        International Conference on Database Theory (ICDT) 2014.

  6. Model Counting of Query Expressions: Limitations of Propositional Methods [pdf].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        International Conference on Database Theory (ICDT) 2014.
        Invited to ACM TODS as one of the best papers in ICDT 2014.

  7. Lower Bounds for Exact Model Counting and Applications in Probabilistic Databases [pdf] [slides].
        (with Paul Beame, Jerry Li, and Dan Suciu)
        Conference on Uncertainty in Artificial Intelligence (UAI) 2013.

  8. Provenance-based Dictionary Refinement in Information Extraction [pdf] [slides].
        (with Laura Chiticariu, Vitaly Feldman, Frederick R. Reiss and Huaiyu Zhu)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2013.

  9. Using the Crowd for Top-k and Group-by Queries [pdf] [slides].
        (with Susan B. Davidson, Sanjeev Khanna and Tova Milo)
        International Conference on Database Theory (ICDT) 2013.
        Invited to ACM TODS as one of the best papers in ICDT 2013.

  10. A Propagation Model for Provenance Views of Public/Private Workflows [pdf] [slides].
        (with Susan B. Davidson and Tova Milo)
        International Conference on Database Theory (ICDT) 2013.

  11. Queries with Difference on Probabilistic Databases [pdf] [slides].
        (with Sanjeev Khanna and Val Tannen)
        International Conference on Very Large Data Bases (VLDB) 2011.

  12. Provenance Views for Module Privacy [pdf] [slides].
        (with Susan B. Davidson, Sanjeev Khanna, Tova Milo, and Debmalya Panigrahi)
        Principles of Database Systems (PODS) 2011.

  13. Faster Query Answering in Probabilistic Databases using Read-Once Functions [pdf] [slides].
        (with Vittorio Perduca and Val Tannen)
        International Conference on Database Theory (ICDT) 2011.

  14. Enabling Privacy in Provenance-Aware Workflow Systems [pdf].
        (with Susan Davidson, Sanjeev Khanna, Julia Stoyanovich, Val Tannen, Yi Chen and Tova Milo)
        Vision Track, Conference on Innovative Data Systems Research (CIDR) 2011.

  15. An Optimal Labeling Scheme for Workflow Provenance Using Skeleton Labels [pdf].
        (with Zhuowei Bao, Susan Davidson and Sanjeev Khanna)
        ACM SIGMOD International Conference on Management of Data (SIGMOD) 2010.

  16. Optimizing User Views for Workflows [pdf] [slides].
        (with Olivier Biton, Susan Davidson and Sanjeev Khanna)
        International Conference on Database Theory (ICDT) 2009.

  17. STCON in Directed Unique-Path Graphs [pdf] [slides].
        (with Sampath Kannan and Sanjeev Khanna)
        Foundations of Software Technology and Theoretical Computer Science (FSTTCS) 2008.

  18. Automatic Translation of Simulink Models into Input Language of a Model Checker [pdf].
        (with Meenakshi B. and Abhishek Bhatnagar)
        International Conference on Formal Engineering Methods (ICFEM) 2006.

Workshop and Other Publications

  1. Hiding Data and Structure in Workflow Provenance [pdf].
        (with Susan B. Davidson and Zhuowei Bao)
        Invited paper, International Workshop on Databases in Networked Information Systems (DNIS) 2011.

  2. On Provenance and Privacy [pdf].
        (with Susan Davidson, Sanjeev Khanna, Julia Stoyanovich, Val Tannen and Yi Chen)
        Keynote by Prof. Susan Davidson and invited paper, International Conference on Database Theory (ICDT) 2011.

  3. Privacy Issues in Scientific Workflow Provenance [pdf] [slides].
        (with Susan Davidson, Sanjeev Khanna and Sarah Cohen Boulakia)
        International Workshop on Workflow Approaches to New Data-centric Science (WANDS) 2010.

Ph.D. Dissertation

    Provenance and Uncertainty [pdf].
    Sudeepa Roy
    University of Pennsylvania, August 2012



Patents

  1. Refining a dictionary for information extraction.
        (with Laura Chiticariu, Vitaly Feldman, Frederick Reiss, and Huaiyu Zhu)
        Assignee: International Business Machines Corporation (IBM)
        Publication Number: US 8775419 B2, 2014

  2. Automatic Translation of Simulink Models into Input Language of a Model Checker.
        (with Meenakshi B. and Abhishek Bhatnagar)
        Assignee: Honeywell International Inc.
        Publication Number: US 7698668 B2, 2010
