Code

Optimal Sparse Decision Tree code

Fast Sparse Decision Tree Optimization via Reference Ensembles
(code) | (paper) | (speed up using guesses)

Creates optimal sparse decision trees quickly. Remember not to remove the regularization term! The predecessor to this algorithm is OSDT.
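
As a rough illustration of why the regularization term matters, the sketch below scores two hypothetical trees with a sparsity-regularized objective: misclassification rate plus a penalty per leaf. The function name, the `lam` parameter, and all numbers are assumptions made for illustration; this is not the package's API.

```python
def regularized_objective(n_errors, n_samples, n_leaves, lam):
    """Misclassification rate plus a penalty of `lam` for every leaf."""
    return n_errors / n_samples + lam * n_leaves


# Two hypothetical trees evaluated on the same 100 samples.
small_tree = regularized_objective(n_errors=12, n_samples=100, n_leaves=3, lam=0.02)
large_tree = regularized_objective(n_errors=8, n_samples=100, n_leaves=9, lam=0.02)

# With the penalty, the sparser tree has the lower objective
# even though it makes more training errors.
print(small_tree < large_tree)

# With lam = 0 the objective is training error alone,
# so the larger tree is always preferred.
no_penalty_small = regularized_objective(12, 100, 3, lam=0.0)
no_penalty_large = regularized_objective(8, 100, 9, lam=0.0)
print(no_penalty_large < no_penalty_small)
```

Without the penalty there is nothing in the objective that favors sparsity, which is one reason to keep the term in place.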

Sparse Generalized Additive Models code

Fast Sparse Generalized Linear and Additive Classification
(code) | (paper)

Produces sparse additive models quickly.

FasterRisk: Fast and Accurate Interpretable Risk Scores
(code) | (paper)

Creates risk assessment scoring systems, which are linear models with integer coefficients that estimate risk.
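
To make "linear models with integer coefficients that estimate risk" concrete, here is a minimal sketch of how such a scoring system can be applied once it has been learned. The feature names, point values, and intercept are hypothetical, and the logistic link is an assumption about how the score is mapped to a risk; the sketch does not use the FasterRisk package.

```python
import math

# Hypothetical scoring system: integer points per binary feature.
POINTS = {"age_over_60": 2, "prior_event": 3, "normal_test_result": -1}
INTERCEPT = -3  # hypothetical integer offset


def total_score(features):
    """Sum the points of every feature that is present (value 1)."""
    return sum(POINTS[name] * value for name, value in features.items())


def predicted_risk(features):
    """Map the integer score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + total_score(features))))


person = {"age_over_60": 1, "prior_event": 1, "normal_test_result": 1}
print(total_score(person))  # 2 + 3 - 1 = 4
print(round(predicted_risk(person), 3))
```

Because the coefficients are small integers, the score can be added up by hand and the risk read from a lookup table.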

Learning Optimized Risk Scores from Large-Scale Datasets (RiskSLIM)
(code) | (paper)

Creates risk assessment scoring systems, which are linear models with integer coefficients that estimate risk. This code is slower than FasterRisk, but it can incorporate constraints and provide provable optimality. Unless you need those features, try FasterRisk instead.

Supersparse Linear Integer Models (SLIM)
(matlab code) | (python code) | (matlab code) | (paper) | (bib)

For building scoring systems, which are linear models with integer coefficients. Part of the winning entry for the 2016 INFORMS Innovative Applications in Analytics Award. Note that this code is slow; the two algorithms listed above (FasterRisk and RiskSLIM) are better.

Dimension Reduction code

PaCMAP for Dimension Reduction
(code) | (paper)

Winner of the 2023 John M. Chambers Statistical Software Award from the American Statistical Association.

Or's of And's (Disjunction of Conjunctions) code

Bayesian Or's of And's
(code and coupon data) | (data on UCI repo) | (paper) | (bib) | (code by Ritwik Mitra, Emily Dodwell, Elena Khusainova, Deirdre Paul)

For classification, an alternative to decision trees, inductive logic programming and associative classification.
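
The model form (a disjunction of conjunctions) can be sketched in a few lines: an example is classified as positive if it satisfies every condition of at least one clause. The clauses and condition names below are hypothetical, and the sketch shows prediction only, not the Bayesian learning procedure.

```python
# Hypothetical model: each set is one AND-clause; the list is OR-ed together.
MODEL = [
    {"weather=sunny", "companions=friends"},
    {"distance=near", "time=evening", "used_coupon_before=yes"},
]


def predict(active_conditions, model=MODEL):
    """Return True if at least one clause is a subset of the active conditions."""
    return any(clause <= active_conditions for clause in model)


print(predict({"weather=sunny", "companions=friends", "time=noon"}))  # True
print(predict({"weather=sunny", "time=evening"}))  # False
```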

Box Drawings for Learning with Imbalanced Data
(matlab code) | (paper) | (bib)

For imbalanced classification with real-valued features.

Variable Importance for the Rashomon Set

Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models
(code) | (paper)

MCR - Model Class Reliance
(code) | (paper)

For assessing variable importance of a model class for a dataset.

Matching for Causal Inference

FLAME - Fast Large Almost Matching Exactly
DAME - Dynamic Almost Matching Exactly
FLAME-IV - Almost Matching Exactly with Instrumental Variables
MALTS - Matching After Learning to Stretch
AHB - Adaptive Hyperboxes
(code) | (CRAN site)

For large-scale interpretable matching in causal inference. Honorable mention for the 2022 John M. Chambers Statistical Software Award from the American Statistical Association.

Interpretable Neural Networks

Concept Whitening
(code) | (paper)

Interpretable Prototype Neural Networks (This Looks Like That)
(code) | (bib) | (paper)

Interpretable Deep Neural Networks with Hierarchical Prototypes
(code) | (paper)

Rule Lists and Falling Rule Lists

Certifiably Optimal RulE ListS (CORELS)
(code) | (R-bindings by Dirk Eddelbuettel) | (paper)

For classification, an alternative to decision trees. Predecessors to this code are Bayesian Rule Lists (BRL) and Scalable Bayesian Rule Lists (SBRL) (R interface, C code - Creative Commons License). The CORELS code is more efficient, so please use that.

Optimized Falling Rule Lists and Softly Falling Rule Lists
(paper) | (code) | (bib)

For classification where the probabilities decrease along the list.
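
As a minimal sketch of what "the probabilities decrease along the list" means, the example below represents a falling rule list as an ordered sequence of (condition, probability) pairs, checks the monotonicity constraint, and predicts with the first rule that fires. The rules, feature names, and probabilities are hypothetical; this illustrates the model form only, not the optimization that learns it.

```python
# Hypothetical falling rule list: (condition, estimated probability of the positive class).
RULES = [
    (lambda x: x["a"] and x["b"], 0.90),
    (lambda x: x["c"], 0.60),
    (lambda x: x["a"], 0.35),
]
DEFAULT_PROBABILITY = 0.10  # used when no rule fires


def is_falling(rules, default):
    """Check that probabilities never increase from one rule to the next."""
    probs = [p for _, p in rules] + [default]
    return all(earlier >= later for earlier, later in zip(probs, probs[1:]))


def predict_probability(x):
    """Return the probability attached to the first rule whose condition holds."""
    for condition, probability in RULES:
        if condition(x):
            return probability
    return DEFAULT_PROBABILITY


print(is_falling(RULES, DEFAULT_PROBABILITY))
print(predict_probability({"a": False, "b": False, "c": True}))
```

The ordering means the highest-risk cases are captured by the first rules, so a reader can stop at the first rule that applies.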

Falling Rule Lists (FRL)
(python code) | (paper) | (bib)

For classification where the probabilities decrease along the list. This algorithm is Bayesian and based on sampling, whereas Optimized Falling Rule Lists (listed above) uses optimization.

Name-based Ethnicity Classification

EthnicIA
(code) | (paper)

Superresolution

Photo Upsampling via Latent Space Exploration of Generative Models (PULSE)
(project page with code and online demo) | (paper)

Recidivism

Age of Unfairness
(code) | (paper)

Interpretable Models for Recidivism Prediction
(code for processing raw data) | (code for machine learning pipeline) | (paper)

Summary Explanations

Globally Consistent Summary-Explanations
(code) | (paper)

Recovery Curves

Recovery Curves
(code) | (paper)

Multi-armed Bandits with Time Series Information

Regulating Greed Over Time
(code) | (paper)

Crime Series Analysis

Series Finder
(code) | (paper)

For detecting crime series.

ROC Flexibility Data

ROC Flexibility Data
Used for several ranking papers (data)

Interpretable Unsupervised Learning (Clustering, Density Estimation)

ClusteR-specific Assorted Feature selecTion (CRAFT)
(code) | (paper)

Clustering with cluster-specific feature selection.

Bayesian Case Model (BCM)
(code) | (paper)

Prototype clustering with cluster-specific feature selection.

Higher Dimensional Histograms
(code) | (paper) | (bib)

For density trees and density rule lists.

List-growing

Growing a List
(python code) | (paper) | (bib)

A search engine that performs set expansion. Note that this code is artificially slowed down by a restriction on the number of queries per minute, imposed by search engine companies. Unrestricted access to a search engine would eliminate this issue.