Rong Ge, Duke University

News

Teaching: Due to the COVID-19 situation, I will no longer be teaching my graduate course Algorithmic Aspects of Machine Learning in Fall 2020. I will be co-teaching COMPSCI 330 with Debmalya Panigrahi in both the Fall 2020 and Spring 2021 semesters, and we will also be co-teaching a graduate seminar course in Spring 2021.
Not news anymore, but check out my book with Majid Janzamin, Anima Anandkumar, and Jean Kossaifi on tensor decompositions. Also check out the draft of a book on the theory of deep learning organized by Sanjeev Arora.
I visited IAS for the Special Year on Optimization, Statistics, and Theoretical Machine Learning.

My Research

I am broadly interested in theoretical computer science and machine learning. Modern machine learning algorithms such as deep learning try to automatically learn useful hidden representations of the data. How can we formalize hidden structures in the data, and how do we design efficient algorithms to find them? My research aims to answer these questions by studying problems that arise in analyzing text, images and other forms of data, using techniques such as non-convex optimization and tensor decompositions. See the Research page for more details.

My thesis: Provable Algorithms for Machine Learning Problems

Students & Post-docs

Current PhD students:
Abraham Frandsen
Xiang Wang
Keerti Anand (co-advised with Debmalya Panigrahi)
Chenwei Wu
Mo Zhou
Muthu Chidambaram (upcoming)

Post-docs:
Holden Lee (with Jianfeng Lu)
Yu Cheng (with many others in algorithms group, now faculty at UIC)

Selected Publications

How to Escape Saddle Points Efficiently

with Chi Jin, Praneeth Netrapalli, Sham M. Kakade, Michael I. Jordan. In ICML 2017.

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
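
As a rough illustration of the idea (not code from the paper), the sketch below runs plain and perturbed gradient descent on a toy two-dimensional function with a non-degenerate saddle at the origin and local minima at (0, ±1). The function, step size, perturbation radius, gradient threshold, and waiting period are all illustrative assumptions, and the sketch leaves out the paper's termination test.

```python
import math
import random

def f(p):
    # Toy objective (an assumption for illustration): saddle at (0, 0),
    # local minima at (0, 1) and (0, -1) with value -0.25.
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad(p):
    x, y = p
    return (x, y**3 - y)

def gradient_descent(p, step=0.1, iters=1000, radius=0.0,
                     grad_tol=1e-3, wait=50, seed=0):
    """Gradient descent; if radius > 0, add a random kick whenever the
    gradient is small and no kick was added in the last `wait` steps."""
    rng = random.Random(seed)
    last_kick = -wait
    for t in range(iters):
        gx, gy = grad(p)
        small = math.hypot(gx, gy) <= grad_tol
        if radius > 0 and small and t - last_kick >= wait:
            # Sample a perturbation uniformly from a disc of the given radius.
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            p = (p[0] + r * math.cos(angle), p[1] + r * math.sin(angle))
            last_kick = t
            gx, gy = grad(p)
        p = (p[0] - step * gx, p[1] - step * gy)
    return p

# Start on the line y = 0, which plain gradient descent never leaves.
plain = gradient_descent((1.0, 0.0))
perturbed = gradient_descent((1.0, 0.0), radius=0.01)
print("plain:    ", plain, f(plain))
print("perturbed:", perturbed, f(perturbed))
```

Started on the line y = 0, the plain iterates slide into the saddle and stay there, while a single small kick is enough for the perturbed iterates to pick up the negative-curvature direction and move toward one of the minima.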

Matrix Completion has No Spurious Local Minimum

with Jason D. Lee and Tengyu Ma. In NIPS 2016. Best Student Paper.

Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in proving various non-convex algorithms converge from a good initial point, it remains unclear why random or arbitrary initialization suffices in practice. We prove that the commonly used non-convex objective function for matrix completion has no spurious local minima -- all local minima must also be global. Therefore, many popular optimization algorithms such as (stochastic) gradient descent can provably solve matrix completion with arbitrary initialization in polynomial time.

A Practical Algorithm for Topic Modeling with Provable Guarantees

with Sanjeev Arora, Yoni Halpern, David Mimno, Ankur Moitra, David Sontag, Yichen Wu, Michael Zhu. In ICML 2013.

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.

Tensor decompositions for learning latent variable models

with Anima Anandkumar, Daniel Hsu, Sham M. Kakade, Matus Telgarsky. In JMLR Vol 15.

This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models (including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation) which exploits a certain tensor structure in their low-order observable moments (typically, of second- and third-order). Specifically, parameter estimation is reduced to the problem of extracting a certain (orthogonal) decomposition of a symmetric tensor derived from the moments; this decomposition can be viewed as a natural generalization of the singular value decomposition for matrices. Although tensor decompositions are generally intractable to compute, the decomposition of these specially structured tensors can be efficiently obtained by a variety of approaches, including power iterations and maximization approaches (similar to the case of matrices). A detailed analysis of a robust tensor power method is provided, establishing an analogue of Wedin's perturbation theorem for the singular vectors of matrices. This implies a robust and computationally tractable estimation approach for several popular latent variable models.
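
To make the power-iteration approach concrete, here is a minimal sketch (not the paper's implementation) of a tensor power method with random restarts and deflation, applied to a small symmetric third-order tensor built from known orthonormal components. The example tensor, the helper names, and the restart and iteration counts are assumptions chosen for illustration; the robust variant analyzed in the paper also handles noisy tensors, which this sketch does not attempt.

```python
import math
import random

def build_tensor(weights, vectors):
    """T = sum_i w_i * v_i (x) v_i (x) v_i, stored as nested lists."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T

def contract(T, u):
    """Return the vector T(I, u, u)."""
    d = len(u)
    return [sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
            for a in range(d)]

def normalize(u):
    n = math.sqrt(sum(x * x for x in u))
    return [x / n for x in u]

def power_iteration(T, u, iters=30):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)||; return (T(u, u, u), u)."""
    for _ in range(iters):
        u = normalize(contract(T, u))
    lam = sum(x * y for x, y in zip(u, contract(T, u)))
    return lam, u

def decompose(T, k, restarts=10, seed=0):
    """Extract k (eigenvalue, eigenvector) pairs by power iteration + deflation."""
    rng = random.Random(seed)
    d = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    pairs = []
    for _ in range(k):
        starts = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
                  for _ in range(restarts)]
        # Keep the restart with the largest value of T(u, u, u).
        lam, u = max((power_iteration(T, s) for s in starts),
                     key=lambda p: p[0])
        pairs.append((lam, u))
        # Deflate: subtract the recovered rank-one term.
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] -= lam * u[a] * u[b] * u[c]
    return pairs

s = 1.0 / math.sqrt(2.0)
weights = [3.0, 2.0, 1.0]
vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
T = build_tensor(weights, vectors)
pairs = decompose(T, k=3)
for lam, u in pairs:
    print(round(lam, 6), [round(x, 6) for x in u])
```

Because the tensor has odd order and positive weights, each recovered vector comes out with a definite sign, unlike eigenvectors of a matrix. The random restarts matter because a single power iteration converges to whichever component dominates its starting point, not necessarily the one with the largest weight.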

Workshops STOC2017, STOC2018

Contact

Email:	rongge AT cs DOT duke DOT edu
Tel:	+1 (919) 660-7330
Mail:	Duke University Campus Box 90129 308 Research Drive (LSRC Building) Room D226 Durham, NC 27708 USA