HUME:

A Unified and Declarative Approach to
Causal Analysis for Big Data

Summary

This is a collaborative project at the University of Washington (UW), University of California, Santa Cruz (UCSC), and Duke University.

Acknowledgment

This project is supported by a collaborative NSF IIS-medium Award (1703281 (UW), 1703331 (UCSC), and 1703431 (Duke)).

Overview

Observational data is available today in multi-relational form, often extracted from various sources, and stored in multiple flat and interrelated tables. Standard statistical methods for conducting causal inference on observational data assume a very simple data model: a single table with independent units. This research has the potential to significantly impact application domains where differentiating causality from correlation is essential, e.g., education policy and cancer genomics. The HUME project develops techniques for efficient causal analysis using a declarative approach, over complex views, and over large datasets that are integrated from disparate data sources. HUME uses a SQL-like language and is integrated with a relational database system.
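To make the gap between multi-relational data and the single-table assumption concrete, here is a minimal Python sketch. It joins two hypothetical tables and aggregates them into one row per unit, which is the shape standard causal inference methods expect. The table names, fields, values, and the `build_unit_table` helper are illustrative assumptions, not HUME's schema or its SQL-like language.

```python
from collections import defaultdict

# Hypothetical tables; names and values are illustrative only.
students = [
    {"sid": 1, "tutored": 1},
    {"sid": 2, "tutored": 0},
    {"sid": 3, "tutored": 1},
]
enrollments = [
    {"sid": 1, "course": "db", "grade": 3.5},
    {"sid": 1, "course": "ml", "grade": 4.0},
    {"sid": 2, "course": "db", "grade": 3.0},
    {"sid": 3, "course": "ml", "grade": 2.5},
    {"sid": 3, "course": "os", "grade": 3.5},
]


def build_unit_table(students, enrollments):
    """Join students with enrollments and aggregate to one row per student."""
    grades = defaultdict(list)
    for e in enrollments:
        grades[e["sid"]].append(e["grade"])

    units = []
    for s in students:
        g = grades.get(s["sid"], [])
        units.append({
            "sid": s["sid"],
            "treatment": s["tutored"],        # treatment comes from one table
            "n_courses": len(g),               # covariate: aggregate over the join
            "outcome": sum(g) / len(g) if g else None,  # outcome: mean grade
        })
    return units


unit_table = build_unit_table(students, enrollments)
```

Each row of `unit_table` now carries a treatment, a covariate, and an outcome for one student, even though those values originated in different tables.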

The project develops techniques for defining arbitrarily complex units, treatments, outcomes, and covariates, by combining joins, data mapping, and aggregates across multiple tables, and uses a causal network to choose a good set of covariates for causal inference. The first part of the project develops scalable techniques for sub-classification and matching for large data sets obtained by declaratively integrating multiple data sources. The second part of the project develops scalable methods for discovering causal relationships among the attributes in the views by constraint-based, search-based, and hybrid discovery processes. Finally, the third part of the project investigates interference among units that arises from the complex views, by designing normal forms and by automatically inferring the underlying assumptions using techniques from database theory.
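As a rough illustration of sub-classification, the following Python sketch estimates an average treatment effect by stratifying units on a single discrete covariate. It is the textbook estimator, not HUME's implementation, and it assumes the chosen covariate is sufficient to adjust for confounding. The function name `subclassification_ate`, the field names, and the data are hypothetical.

```python
from collections import defaultdict


def subclassification_ate(rows, covariate):
    """Estimate the average treatment effect by stratifying on one discrete covariate."""
    # Group outcomes by covariate value, then by treatment status (0 or 1).
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in rows:
        strata[r[covariate]][r["treatment"]].append(r["outcome"])

    weighted_sum, n_used = 0.0, 0
    for groups in strata.values():
        treated, control = groups[1], groups[0]
        if not treated or not control:
            continue  # skip strata without both treated and control units
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        size = len(treated) + len(control)
        weighted_sum += diff * size  # weight each stratum by its size
        n_used += size

    if n_used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return weighted_sum / n_used


# Hypothetical unit table: one row per unit.
rows = [
    {"treatment": 1, "outcome": 5.0, "level": "low"},
    {"treatment": 0, "outcome": 3.0, "level": "low"},
    {"treatment": 0, "outcome": 4.0, "level": "low"},
    {"treatment": 1, "outcome": 8.0, "level": "high"},
    {"treatment": 1, "outcome": 9.0, "level": "high"},
    {"treatment": 0, "outcome": 8.0, "level": "high"},
    {"treatment": 1, "outcome": 1.0, "level": "mid"},  # no control unit: skipped
]

ate = subclassification_ate(rows, "level")
```

Strata that lack either treated or control units are dropped, so the estimate applies only to the region of covariate overlap.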

Faculty PIs

Lise Getoor

University of California, Santa Cruz

Sudeepa Roy

Duke University

Dan Suciu

University of Washington

Postdocs

Golnoosh Farnadi

UCSC

Babak Salimi

UW

Students

Dhanya Sridhar

UCSC

Harsh Parikh

PhD student, Duke

Moe Kayali

Undergraduate student, UW

Past Members

Corey Cole, UW
Neha Gupta, Duke
Yuchao Tao, Duke

Publications

Tianyu Wang, Marco Morucci, M. Usaid Awan, Yameng Liu, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky:
FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference. To appear in Journal of Machine Learning Research (JMLR), 2021.
Babak Salimi, Harsh Parikh, Moe Kayali, Lise Getoor, Sudeepa Roy, and Dan Suciu:
Causal Relational Learning [arxiv]. SIGMOD Conference 2020.
Sriram Srinivasan, Golnoosh Farnadi, and Lise Getoor:
BOWL: Bayesian Optimization for Weight Learning in Probabilistic Soft Logic. AAAI 2020.
Sriram Srinivasan, Eriq Augustine, and Lise Getoor:
Tandem Inference: An Out-of-Core Streaming Algorithm for Very Large-Scale Relational Inference. AAAI 2020.
Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky:
Adaptive Hyper-box Matching for Interpretable Individualized Treatment Effect Estimation. UAI 2020.
M. Usaid Awan, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky:
Almost-Matching-Exactly for Treatment Effect Estimation under Network Interference. AISTATS 2020.
Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, and Dan Suciu:
Bag Query Containment and Information Theory. PODS 2020.
Dan Suciu:
Probabilistic Databases for All. PODS 2020.
Varun Embar, Sriram Srinivasan, and Lise Getoor:
Estimating Aggregate Properties in Relational Networks with Unobserved Data. International Workshop on Statistical Relational AI (StarAI) 2020.
Varun Embar, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Christos Faloutsos, and Lise Getoor:
Contrastive Entity Linkage: Mining Variational Attributes From Large Catalogs for Entity Linkage. Automated Knowledge Base Construction (AKBC) 2020.
Eriq Augustine, Theodoros Rekatsinas, and Lise Getoor:
Tractable Probabilistic Reasoning Through Effective Grounding. ICML Workshop on Tractable Probabilistic Modeling (TPM) 2019.
Golnoosh Farnadi, Behrouz Babaki, and Lise Getoor:
A Declarative Approach to Fairness in Relational Domains. IEEE Data Engineering Bulletin 2019, Vol. 42 No. 3.
Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu:
Interventional Fairness: Causal Database Repair for Algorithmic Fairness. SIGMOD Conference 2019: 793-810 **best paper award**
M. Usaid Awan, Yameng Liu, Marco Morucci, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky:
Almost Matching Exactly With Instrumental Variables. UAI 2019
Awa Dieng, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky:
Interpretable Almost-Exact Matching for Causal Inference. AISTATS 2019
Dhanya Sridhar and Lise Getoor:
Estimating Causal Effects of Tone in Online Debates. Intelligent User Interfaces (IUI) 2019 **IUI outstanding paper award**
Pigi Kouki, James Schaffer, Jay Pujara, John O'Donovan, and Lise Getoor:
Personalized Explanations for Hybrid Recommender Systems. IJCAI 2019
Arti Ramesh, Dan Goldwasser, Bert Huang, Hal Daume, and Lise Getoor:
Interpretable Engagement Models for MOOCs using Hinge-loss Markov Random Fields. Transactions on Learning Technologies 2019
Sriram Srinivasan, Behrouz Babaki, Golnoosh Farnadi, and Lise Getoor:
Lifted Hinge-Loss Markov Random Fields. AAAI Conference on Artificial Intelligence 2019.
Varun R Embar, Sriram Srinivasan, and Lise Getoor:
Tractable Marginal Inference for Hinge-Loss Markov Random Fields. Third ICML workshop on Tractable Probabilistic Modeling 2019.
Varun R Embar, Jay Pujara, and Lise Getoor:
Collective Alignment of Large-scale Ontologies. AKBC Workshop on Federated KBs and the Open Knowledge Network 2019.
Prajakta Kalmegh, Shivnath Babu, Sudeepa Roy:
iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks. SIGMOD Conference 2019: 918-935
Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu:
HypDB: A Demonstration of Detecting, Explaining and Resolving Bias in OLAP queries. PVLDB 2018 11(12): 2062-2065, **best demo award**
Babak Salimi, Johannes Gehrke, Dan Suciu:
Bias in OLAP Queries: Detection, Explanation, and Removal. SIGMOD Conference 2018: 1021-1035
Dhanya Sridhar, Jay Pujara, Lise Getoor:
Scalable Probabilistic Causal Structure Discovery. International Joint Conference on Artificial Intelligence (IJCAI) 2018.

Golnoosh Farnadi, Behrouz Babaki, Lise Getoor:
Fairness in Relational Domains. AIES 2018.
Golnoosh Farnadi, Behrouz Babaki, Lise Getoor:
Fairness-aware Relational Learning and Inference. International Workshop on Declarative Learning Based Programming (DeLBP) 2018.
Awa Dieng, Yameng Liu, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky:
Collapsing-Fast-Large-Almost-Matching-Exactly: A Matching Method for Causal Inference. Manuscript 2018. Preliminary version on arXiv:1806.06802 [stat.ML]
Dhanya Sridhar, Aaron Springer, Victoria Hollis, Steve Whittaker, Lise Getoor:
Estimating Causal Effects of Exercise from Mood Logging Data. IJCAI CausalML Workshop 2018.
Prajakta Kalmegh, Harrison Lundberg, Frederick Xu, Shivnath Babu, Sudeepa Roy:
iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks. SIGMOD Conference 2018: 1721-1724
Babak Salimi, Corey Cole, Dan R. K. Ports, Dan Suciu:
ZaliQL: Causal Inference from Observational Data at Scale. PVLDB 10(12): 1957-1960 (2017)
Helga Gudmundsdottir, Babak Salimi, Magdalena Balazinska, Dan R. K. Ports, Dan Suciu:
A Demonstration of Interactive Analysis of Performance Measurements with Viska. SIGMOD Conference 2017: 1707-1710

Dhanya Sridhar, Jay Pujara, Lise Getoor:
Using Noisy Extractions to Discover Causal Knowledge. Workshop on Automated Knowledge Base Construction (AKBC) 2017.
Dhanya Sridhar, Jay Pujara, Lise Getoor:
A Scalable Probabilistic Approach for Causal Structure Discovery. Women in Machine Learning Workshop 2017.
Tianyu Wang, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky:
FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference. Manuscript 2017. Preliminary version on arXiv:1707.06315 [stat.ML]

Tools/Resources

"Causal Relational Learning", SIGMOD 2020: codebase.
SIGMOD 2020 video on "Causal Relational Learning"
Probabilistic Soft Logic