

I am an assistant professor in the Computer Science Department at Duke University.

I received my Ph.D. from the Computer Science Department at Princeton University, where I was advised by Sanjeev Arora. Before joining Duke, I was a post-doc at Microsoft Research, New England.


My Research
I am broadly interested in theoretical computer science and machine learning. Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations of the data. How can we formalize hidden structures in the data, and how do we design efficient algorithms to find them? My research aims to answer these questions by studying problems that arise in analyzing text, images and other forms of data, using techniques such as non-convex optimization and tensor decompositions. See the Research page for more details.

My thesis: Provable Algorithms for Machine Learning Problems

Selected Publications
How to Escape Saddle Points Efficiently [arXiv] [abstract]
with Chi Jin, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan. To appear in ICML 2017.

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
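
To illustrate the idea (this is a toy sketch, not the paper's algorithm or its parameter choices; the function name, step size, and thresholds here are hypothetical), perturbed gradient descent takes plain gradient steps and adds a small random perturbation whenever the gradient is small, i.e., at a candidate saddle point:

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.1, radius=1e-2, g_thresh=1e-3,
                 t_wait=10, n_iters=500, seed=0):
    """Perturbed gradient descent (sketch): take plain gradient steps,
    and when the gradient is small -- a candidate saddle point -- add a
    small uniform perturbation to help escape strict saddles."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    last_perturb = -t_wait  # iteration of the most recent perturbation
    for t in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < g_thresh and t - last_perturb >= t_wait:
            x = x + rng.uniform(-radius, radius, size=x.shape)
            last_perturb = t
        x = x - eta * g
    return x

# Toy objective f(x, y) = x^2 - y^2 + y^4: a saddle point at the origin,
# local minima at (0, +/- 1/sqrt(2)). Plain gradient descent started at
# the origin stays stuck forever; the perturbation lets it escape.
grad_f = lambda v: np.array([2 * v[0], -2 * v[1] + 4 * v[1] ** 3])
x_star = perturbed_gd(grad_f, [0.0, 0.0])
```

On the toy objective the iterate escapes the saddle at the origin and converges to one of the two local minima (which are also global here).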

Matrix Completion has No Spurious Local Minimum [arXiv] [abstract]
with Jason D. Lee and Tengyu Ma. In NIPS 2016. Best Student Paper.

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for matrix completion has no spurious local minima -- all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve matrix completion with arbitrary initialization in polynomial time.
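
As a toy illustration of the phenomenon (a sketch under simplifying assumptions, not the paper's setting: here the ground truth is a rank-1 symmetric matrix, half the entries are observed, and the step size and iteration count are ad hoc), plain gradient descent on the non-convex objective succeeds from a random starting point:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Rank-1 symmetric ground truth M = u u^T, with a random half of the
# entries observed (mask symmetrized to keep the problem symmetric).
u = rng.standard_normal((n, 1))
M = u @ u.T
mask = rng.random((n, n)) < 0.5
mask = mask | mask.T

# Gradient descent on the non-convex objective
#   f(X) = 1/4 * || P_Omega(X X^T - M) ||_F^2
# from a small RANDOM initialization -- no careful initialization needed.
X = 0.1 * rng.standard_normal((n, 1))
eta = 0.01
for _ in range(3000):
    R = mask * (X @ X.T - M)  # residual on the observed entries only
    X = X - eta * R @ X       # grad f(X) = P_Omega(X X^T - M) X
```

Despite the non-convexity and the arbitrary initialization, the iterate fits the observed entries of M to high accuracy, consistent with the "no spurious local minima" result.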

A Practical Algorithm for Topic Modeling with Provable Guarantees [arXiv] [abstract] [code]
with Sanjeev Arora, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu. In ICML 2013.

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.
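
A key ingredient of provable topic inference under the separability assumption is finding "anchor words" (words that appear in only one topic). A simplified sketch of the greedy anchor-selection idea on synthetic data (the model sizes and the centering variant used here are illustrative choices, not the paper's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(1)
V, K = 12, 3  # vocabulary size and number of topics (toy scale)

# Separable topic-word matrix: words 0..K-1 are "anchor" words, each
# appearing in exactly one topic; all other words mix all topics.
A = rng.random((V, K))
A[:K, :] = 5.0 * np.eye(K)
A /= A.sum(axis=0)  # each column is a distribution over words

# Word-word co-occurrence moment Q = A R A^T for a topic-topic matrix R.
R = np.diag(rng.random(K)) + 0.1
Q = A @ R @ A.T
Qbar = Q / Q.sum(axis=1, keepdims=True)  # row-normalize

def find_anchors(Qbar, K):
    """Greedy anchor selection: every row of Qbar is a convex combination
    of the anchor rows, so the anchors are the vertices of the convex
    hull. Repeatedly pick the row farthest from the span of the rows
    chosen so far (Gram-Schmidt style projection)."""
    X = Qbar - Qbar.mean(axis=0)  # center the rows
    anchors = []
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(X, axis=1)))
        anchors.append(i)
        v = X[i] / np.linalg.norm(X[i])
        X = X - np.outer(X @ v, v)  # project out the chosen direction
    return sorted(anchors)
```

On this synthetic corpus the greedy procedure recovers exactly the planted anchor words 0, 1, 2.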

Tensor decompositions for learning latent variable models [arXiv] [abstract] [blog]
with Anima Anandkumar, Daniel Hsu, Sham M. Kakade, Matus Telgarsky. In JMLR, Vol. 15.

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models, including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation, which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
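
A minimal sketch of the tensor power method with deflation on an exactly orthogonally decomposable tensor (a noiseless toy instance; the restart and iteration counts here are arbitrary illustrative choices, and the robustness analysis in the paper handles the perturbed case):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3

# Orthogonally decomposable tensor T = sum_i lam_i a_i (x) a_i (x) a_i
# with orthonormal components a_i and positive weights lam_i.
A, _ = np.linalg.qr(rng.standard_normal((d, k)))
lam = np.array([3.0, 2.0, 1.0])
T = np.einsum('i,ai,bi,ci->abc', lam, A, A, A)

def top_eigenpair(T, n_restarts=10, n_iters=100, seed=0):
    """One stage of the tensor power method: from several random starts,
    iterate x <- T(I, x, x) / ||T(I, x, x)|| and keep the run with the
    largest value T(x, x, x)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        x = rng.standard_normal(T.shape[0])
        x /= np.linalg.norm(x)
        for _ in range(n_iters):
            x = np.einsum('abc,b,c->a', T, x, x)  # multilinear map T(I, x, x)
            x /= np.linalg.norm(x)
        val = np.einsum('abc,a,b,c->', T, x, x, x)
        if best is None or val > best[0]:
            best = (val, x)
    return best

# Recover all components by deflation: subtract each eigenpair and repeat.
pairs, Tres = [], T.copy()
for _ in range(k):
    val, x = top_eigenpair(Tres)
    pairs.append((val, x))
    Tres = Tres - val * np.einsum('a,b,c->abc', x, x, x)
```

On this exact instance the method recovers the weights 3, 2, 1 and the components a_i (up to ordering) to machine precision.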

Email: rongge AT cs DOT duke DOT edu
Tel: +1 (919) 660-7330
Mail: Duke University
Campus Box 90129
308 Research Drive (LSRC Building)
Room D226
Durham, NC 27708, USA