Whenever we design new algorithms, we write code to evaluate them and then apply them to problems in systems biology. Occasionally, an algorithm proves generally useful enough that we take the extra time to carefully re-implement it as a software package that is better documented, more efficient, more extensible, and more user-friendly than our standard research-grade code. The packages in this category are listed below. Each package has its own site, linked below. In each case, the software is available with complete source code under a non-commercial use license. If you are interested in commercial licensing opportunities, please contact us.
Banjo stands for Bayesian Network Inference with Java Objects. Banjo is a highly efficient, configurable, and extensible package for the inference of either static or dynamic Bayesian networks. Banjo is currently limited to discrete variables; however, it can discretize continuous data for you if you wish, and its modular, extensible design means that new components can be written to handle continuous variables. The modular design also allows the user to mix and match inference algorithm components to implement different learning procedures, ranging from simulated annealing with random local moves to greedy hill climbing with all local moves, and to create new ones.
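To make the idea of greedy hill climbing over all local moves concrete, here is a minimal sketch in Python. It is not Banjo's code or API (Banjo itself is written in Java); the function names, the BIC score, the restriction to edge additions and deletions, and the toy data set are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import permutations

def family_score(data, child, parents):
    """BIC score of one node given a parent list (an assumed metric, for illustration)."""
    n = len(data)
    config = lambda row: tuple(row[p] for p in parents)
    joint = Counter((config(row), row[child]) for row in data)
    marginal = Counter(config(row) for row in data)
    loglik = sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    n_states = len({row[child] for row in data})
    # Free parameters: (child states - 1) per observed parent configuration.
    return loglik - 0.5 * math.log(n) * (n_states - 1) * len(marginal)

def is_ancestor(parents, anc, node):
    """True if `anc` can reach `node` by following directed edges."""
    stack, seen = list(parents[node]), set()
    while stack:
        cur = stack.pop()
        if cur == anc:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return False

def hill_climb(data, n_vars, max_parents=2):
    """Greedy search: score every single-edge addition or deletion, apply the best."""
    parents = {v: set() for v in range(n_vars)}
    score = {v: family_score(data, v, []) for v in range(n_vars)}
    while True:
        best_gain, best_move = 1e-9, None
        for src, dst in permutations(range(n_vars), 2):
            if src in parents[dst]:
                candidate = parents[dst] - {src}          # delete src -> dst
            elif (len(parents[dst]) < max_parents
                  and not is_ancestor(parents, dst, src)):
                candidate = parents[dst] | {src}          # add src -> dst, no cycle
            else:
                continue
            gain = family_score(data, dst, sorted(candidate)) - score[dst]
            if gain > best_gain:
                best_gain, best_move = gain, (dst, candidate)
        if best_move is None:                             # local optimum reached
            return parents
        dst, candidate = best_move
        parents[dst] = candidate
        score[dst] = family_score(data, dst, sorted(candidate))

# Toy data (hypothetical): variables 0 and 1 are correlated, variable 2 is independent.
pair_counts = {(0, 0): 40, (0, 1): 10, (1, 0): 10, (1, 1): 40}
data = [(a, b, c) for (a, b), k in pair_counts.items()
        for c in (0, 1) for _ in range(k)]
learned = hill_climb(data, n_vars=3)
edges = {(p, child) for child, ps in learned.items() for p in ps}
print(edges)
```

On this toy data the search should connect variables 0 and 1 and leave variable 2 isolated. The direction of the edge is arbitrary, because both directions score equally under BIC. A simulated annealing variant would instead propose one random move at a time and occasionally accept moves that lower the score.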
SMLR stands for Sparse Multinomial Logistic Regression. SMLR is an efficient implementation of a true multiclass probabilistic classifier based on the well-studied multinomial logistic regression framework. We adopt a Bayesian perspective, however, which lets us incorporate a Laplacian prior (related to the LASSO) that promotes the learning of a sparse weight vector. The result is a classifier that can operate either directly on input features, performing automatic feature selection (embedded selection, rather than a filter or wrapper method), or with a kernel, performing automatic sample selection (much like a support vector machine). The objective function is convex, so any local optimum is also a global optimum. SMLR implements a suite of bound-optimization algorithms that we have developed to find this optimum efficiently, even when the number of samples or features is large (at least tens of thousands).
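The effect of the Laplacian prior can be illustrated with a small sketch. Maximizing the posterior under that prior is equivalent to minimizing the multinomial logistic loss plus an L1 penalty on the weights. The code below is not SMLR's implementation: it swaps in plain proximal gradient descent (soft-thresholding) in place of the bound-optimization algorithms, omits the bias term, and uses hypothetical function names, hyperparameters, and toy data.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink toward zero and clip at zero."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def fit_sparse_mlr(X, y, n_classes, lam=0.1, step=0.5, n_iter=500):
    """Minimize mean multinomial logistic loss + lam * L1(W) by proximal gradient."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(n_iter):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = softmax([sum(wk * xk for wk, xk in zip(w, x)) for w in W])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j in range(d):
                    grad[k][j] += err * x[j] / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        for k in range(n_classes):
            for j in range(d):
                W[k][j] = soft_threshold(W[k][j] - step * grad[k][j], step * lam)
    return W

def predict(W, x):
    """Index of the class with the highest linear score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for w in W]
    return scores.index(max(scores))

# Toy data (hypothetical): features 0 and 1 determine the class,
# feature 2 is balanced noise that carries no class information.
points = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
X = [[p0, p1, noise] for p0, p1 in points for noise in (1.0, -1.0)]
y = [0, 0, 1, 1, 2, 2]
W = fit_sparse_mlr(X, y, n_classes=3)
for k, row in enumerate(W):
    print(k, [round(w, 3) for w in row])
```

In this sketch the weights on the uninformative third feature stay at exactly zero for every class, which is the embedded feature selection described above. A plain L2 penalty would typically shrink such weights without zeroing them.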
PRIORITY is a tool for de novo motif discovery in the context of transcription factor (TF) binding sites. It implements a new approach to motif discovery in which informative priors over sequence positions are used to guide the search. Although this approach works with any motif model and any search or optimization strategy, the initial version of PRIORITY adopts a PSSM model and collapsed Gibbs sampling. PRIORITY is packaged with priors designed to measure how likely each sequence position is to be bound by each of three structural classes of TFs: basic leucine zipper, forkhead, and basic helix-loop-helix. In addition to discovering TF binding sites and a motif model for those binding sites, PRIORITY also predicts the structural class of the TF that recognizes them.
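The way a positional prior can steer a collapsed Gibbs sampler is sketched below. This is an illustration only, not PRIORITY's code: it assumes exactly one site per sequence, a uniform background, and a symmetric pseudocount, and every function name, parameter, and sequence is hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def motif_counts(seqs, starts, width, skip):
    """Per-column base counts from every sequence's current site except `skip`."""
    counts = [{b: 0 for b in ALPHABET} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i != skip:
            for j in range(width):
                counts[j][seq[s + j]] += 1
    return counts

def position_posterior(seq, counts, prior, width, pseudo=0.5, bg=0.25):
    """P(site start | other sites), proportional to prior * motif/background ratio."""
    weights = []
    for s in range(len(seq) - width + 1):
        log_ratio = 0.0
        for j in range(width):
            col = counts[j]
            # Collapsed (PSSM integrated out) predictive probability of this base.
            p = (col[seq[s + j]] + pseudo) / (sum(col.values()) + pseudo * len(ALPHABET))
            log_ratio += math.log(p / bg)
        weights.append(prior[s] * math.exp(log_ratio))
    total = sum(weights)
    return [w / total for w in weights]

def gibbs_motif_search(seqs, priors, width, n_sweeps=200, seed=0):
    """Resample each sequence's site in turn from its conditional posterior."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_sweeps):
        for i, seq in enumerate(seqs):
            counts = motif_counts(seqs, starts, width, skip=i)
            post = position_posterior(seq, counts, priors[i], width)
            starts[i] = rng.choices(range(len(post)), weights=post)[0]
    return starts

# Toy sequences (hypothetical) that each contain the word ACGT.
seqs = ["TTACGTTT", "ACGTGGGG", "CCCCACGT"]
width = 4
uniform = [[1.0 / (len(s) - width + 1)] * (len(s) - width + 1) for s in seqs]

# With the other two sites fixed on ACGT, a uniform prior favors position 2 ...
counts = motif_counts(seqs, [2, 0, 4], width, skip=0)
post = position_posterior(seqs[0], counts, uniform[0], width)
# ... while a prior concentrated on position 0 overrides the sequence evidence.
skewed = position_posterior(seqs[0], counts, [1.0, 0.0, 0.0, 0.0, 0.0], width)

starts = gibbs_motif_search(seqs, uniform, width, n_sweeps=20)
print(post.index(max(post)), skewed, starts)
```

The prior enters only as a multiplicative weight on each candidate position, so a uniform prior recovers an ordinary Gibbs motif sampler. An informative prior, such as one scoring how likely each position is to be bound by a given structural class, shifts the sampling toward those positions.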