Understanding the "effective capacity" of deep nets via a compression approach
Sanjeev Arora, Princeton University and Institute for Advanced Study
Deep nets typically have far more parameters than training samples. Classical statistics and learning theory suggest they should be prone to overfitting (i.e., doing poorly on unseen data), but in practice this is not the case. Recent works try to explain this using PAC-Bayes and margin-based analyses, but these do not yet yield bounds on "effective capacity" (roughly, sample complexity) better than naive parameter counting.

We describe new methods to give estimates of “effective capacity” that are orders of magnitude better in practice than earlier attempts. These rely on new succinct reparametrizations of the trained net: a compression scheme that is explicit and efficient. Our results also provide some theoretical justification for the widespread empirical success in compressing deep nets. The analysis of correctness of our compression relies on some newly identified “noise stability” properties of trained deep nets, which are also experimentally verified. The study of these properties and the resulting generalization bounds are also extended to convolutional nets, which had been difficult in earlier attempts.

Towards ML You Can Rely On
Aleksander Madry, MIT

Machine learning has made significant progress over the last decade. In fact, many now believe that ML techniques are a “silver bullet”, capable of making progress on any real-world problem they are applied to.

But can we truly rely on this toolkit?

In this talk, I will discuss one of the key challenges to making ML dependable and secure: the widespread vulnerability of state-of-the-art classifiers to adversarial misclassification (aka adversarial examples). I will then describe a framework that enables us to reason about this vulnerability in a principled manner, as well as to develop methods for alleviating the problem it poses.
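
As a small illustration of what an adversarial example is, the sketch below applies the fast gradient sign method (a standard attack from the literature, not necessarily the framework described in the talk) to a toy linear classifier. The weights `w`, the input `x`, and the budget `eps` are made-up values chosen only for this illustration.

```python
import numpy as np

def logistic_loss_grad_x(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) with respect to the input x."""
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))

def fgsm(w, x, y, eps):
    """Move each coordinate of x by eps in the direction that increases the loss."""
    return x + eps * np.sign(logistic_loss_grad_x(w, x, y))

# Hypothetical toy classifier and input (illustrative values only).
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 0.2])
y = 1  # true label in {-1, +1}

x_adv = fgsm(w, x, y, eps=0.2)
print("clean score:      ", np.dot(w, x))
print("adversarial score:", np.dot(w, x_adv))
```

Each coordinate moves by at most `eps`, yet the score drops by `eps` times the l1 norm of `w`, which is enough here to flip the sign of the prediction.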

Is Depth Needed for Deep Learning? Circuit Complexity in Neural Networks
Ohad Shamir, Weizmann Institute of Science
Deep learning, as its name indicates, is based on training artificial neural networks with many layers. A key theoretical question is to understand why such depth is beneficial, and when it is provably necessary to express certain types of functions. In fact, this question is closely related to circuit complexity, which has long been studied in theoretical computer science, albeit for different reasons and for circuits that differ in some important ways from modern neural networks. Despite this similarity, the interaction between the circuit complexity and machine learning research communities is currently quite limited. In this talk, I'll survey some of the recent depth separation results developed in the machine learning community, and discuss open questions where insights from circuit complexity might help.
The talk is aimed at a general theoretical computer science audience, and no prior knowledge about deep learning will be assumed.

Algorithmic Regularization in Over-parameterized Matrix Recovery and Neural Networks with Quadratic Activations
Tengyu Ma, Facebook AI/Stanford
Over-parameterized models are widely and successfully used in deep learning, but their workings are far from understood. In many practical scenarios, the learned model generalizes to the test data, even though the hypothesis class contains a model that completely overfits the training data and no regularization is applied. 

In this talk, we will show that this phenomenon occurs in over-parameterized matrix recovery models as well, and prove that the gradient descent algorithm provides additional regularization power that prevents overfitting. The result can be extended to learning one-hidden-layer neural networks with quadratic activations. The key insight here is that gradient descent prefers searching through the set of low-complexity (that is, low-rank) models first, and converges to a low-complexity model with a good training error if such a model exists.

Based on joint work with Yuanzhi Li and Hongyang Zhang. https://arxiv.org/abs/1712.09203
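
The low-rank preference described in this abstract can be observed in a small numerical experiment. The following is a minimal sketch, not the authors' code: it assumes a rank-1 ground truth, symmetric Gaussian measurement matrices, a full d x d factor U initialized at a small multiple of the identity, and hand-picked values for the dimension, number of measurements, step size, and iteration count.

```python
import numpy as np

def make_problem(d=20, m=200, seed=0):
    """Rank-1 PSD target X_star and m linear measurements y_k = <A_k, X_star>."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    X_star = np.outer(u, u)
    G = rng.standard_normal((m, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2  # symmetric measurement matrices
    y = np.einsum("kij,ij->k", A, X_star)
    return A, y, X_star

def gradient_descent(A, y, alpha=1e-4, lr=0.05, steps=500):
    """Plain gradient descent on f(U) = (1/2m) * sum_k (<A_k, U U^T> - y_k)^2.

    U is a full d x d matrix (d^2 parameters, more than the m measurements),
    started at alpha * I with no explicit regularization.
    """
    m, d, _ = A.shape
    U = alpha * np.eye(d)
    for _ in range(steps):
        resid = np.einsum("kij,ij->k", A, U @ U.T) - y
        # For symmetric A_k the gradient is (2/m) * sum_k resid_k * A_k @ U.
        grad = (2.0 / m) * np.einsum("k,kij->ij", resid, A) @ U
        U -= lr * grad
    return U

A, y, X_star = make_problem()
U = gradient_descent(A, y)
X_hat = U @ U.T
rel_err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
eigs = np.linalg.eigvalsh(X_hat)  # ascending order
print("relative recovery error:", rel_err)
print("two largest eigenvalues of U U^T:", eigs[-1], eigs[-2])
```

With 400 free parameters and only 200 measurements, many matrices fit the data exactly, so a small recovery error and a single dominant eigenvalue would reflect the behavior of gradient descent from a small initialization rather than any explicit constraint.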

Generalization and Implicit Regularization in Deep Learning
Nathan Srebro, TTI Chicago

Do Deep Networks have Bad Local Minima? Brief survey on optimization landscape for neural networks
Rong Ge, Duke University
In practice, simple algorithms like gradient descent seem to be able to find optimal solutions for neural networks. One way to explain this is to study the optimization landscape: whether the objective function has any bad local optima. In this talk we will give a brief (and incomplete) survey of recent results on the optimization landscape of neural networks in several different settings.

On PAC learning and deep learning
Amit Daniely, Hebrew University of Jerusalem/Google Tel-Aviv
I will discuss the extent to which PAC learning, as we know it today, can form a theoretical basis for deep learning.