Home Research Publications Teaching


I am now an assistant professor at the Computer Science Department of Duke University.

I got my Ph.D. from the Computer Science Department of Princeton University. My advisor is Sanjeev Arora. I was a post-doc at Microsoft Research, New England

  • I will be at Simons Institute at Berkeley this semester (Spring 2017).
  • The TCS group at Duke University invites applications for a postdoctoral position.
  • Paper with Tengyu Ma (student author) and Jason Lee got NIPS best student paper award!

My Research
I am broadly interested in theoretical computer science and machine learning. Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations of the data. How can we formalize hidden structures in the data, and how do we design efficient algorithms to find them? My research aims to answer these questions by studying problems that arise in analyzing text, images and other forms of data, using techniques such as non-convex optimization and tensor decompositions. See the Research page for more details.

My thesis: Provable Algorithms for Machine Learning Problems

Selected Publications
Matrix Completion has No Spurious Local Minimum arxivabstract
with Jason D. Lee and Tengyu Ma. In NIPS 2016. Best Student Paper.

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for matrix completion has no spurious local minima -- all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve matrix completion with arbitrary initialization in polynomial time.

Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition arxiv abstractblog
with Furong Huang, Chi Jin and Yang Yuan. In COLT 2015.

We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points. In this paper we identify strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points.

Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.

A Practical Algorithm for Topic Modeling with Provable Guarantees arxivabstractcode
with Sanjeev Arora, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu, in ICML 2013
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.

Tensor decompositions for learning latent variable models arxiv abstractblog
with Anima Anandkumar, Daniel Hsu, Sham M. Kakade, Matus Telgarsky. In JMLR Vol 15.
This work considers a computationally and statistically efficient parameter estimation method
for a wide class of latent variable models|including Gaussian mixture models, hidden Markov
models, and latent Dirichlet allocation|which exploits a certain tensor structure in their loworder
observable moments (typically, of second- and third-order). Specifically, parameter estimation
is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric
tensor derived from the moments; this decomposition can be viewed as a natural generalization
of the singular value decomposition for matrices. Although tensor decompositions are generally
intractable to compute, the decomposition of these specially structured tensors can be efficiently
obtained by a variety of approaches, including power iterations and maximization approaches
(similar to the case of matrices). A detailed analysis of a robust tensor power method is provided,
establishing an analogue of Wedin's perturbation theorem for the singular vectors of
matrices. This implies a robust and computationally tractable estimation approach for several
popular latent variable models.

Email: rongge AT cs DOT duke DOT edu
Tel: +1 (919) 660-7330
Mail: Duke University
Campus Box 90129
308 Research Drive (LSRC Building)
Room D226 Durham, NC 27708 USA