Understanding the "effective capacity" of deep nets via a compression approach
Sanjeev Arora, Princeton University and Institute for Advanced Study
Deep nets typically have far more parameters than training samples. Classical
statistics/learning theory suggests they should therefore be prone to
overfitting (i.e., doing poorly on unseen data), but in practice this is not
the case. Recent works try to give an explanation using PAC-Bayes and
margin-based analyses, but do not as yet yield bounds on "effective capacity"
(roughly, sample complexity) better than naive parameter counting.
We describe new methods to give
estimates of “effective capacity” that are orders of magnitude better in
practice than earlier attempts. These rely upon new succinct
reparametrizations of the trained net --- a compression scheme that is
explicit and efficient. Our results also provide some theoretical
justification for widespread empirical success in compressing deep
nets. Analysis of correctness of our compression relies upon some newly
identified “noise stability” properties of trained deep nets, which are
also experimentally verified. The study of these properties and
resulting generalization bounds are also extended to convolutional
nets, which had been difficult in earlier attempts.
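As a rough numerical illustration of the compression viewpoint (a generic truncated-SVD reparametrization, not the specific scheme or bound from the talk), the sketch below compresses a stand-in "trained" weight matrix to various ranks and checks how little the layer's outputs change on sample inputs; the matrix, its synthetic spectrum, and all sizes are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for a trained layer: a 512 x 512 weight matrix with a decaying
    # spectrum. (The actual analysis uses real trained layers; this one is synthetic.)
    d = 512
    Q1, _ = np.linalg.qr(rng.normal(size=(d, d)))
    Q2, _ = np.linalg.qr(rng.normal(size=(d, d)))
    W = Q1 @ np.diag(np.exp(-np.arange(d) / 50.0)) @ Q2.T

    def compress(W, rank):
        """Succinct reparametrization: keep only the top `rank` singular directions."""
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]

    X = rng.normal(size=(d, 1000))  # sample inputs to the layer
    for rank in (16, 64, 256):
        W_hat = compress(W, rank)
        rel_change = np.linalg.norm((W - W_hat) @ X) / np.linalg.norm(W @ X)
        print(f"rank {rank:3d}: relative change in layer outputs = {rel_change:.3f}")

If the outputs barely move under such aggressive compression, the compressed net has far fewer effective parameters, which is the intuition behind the improved capacity estimates.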
Towards ML You Can Rely On
Aleksander Madry, MIT
Machine learning has made significant progress over the last decade. In fact,
many now believe that ML techniques are a “silver bullet”, capable of
making progress on any real-world problem they are applied to.
But can we truly rely on this toolkit?
In this talk, I will discuss one of the key challenges to making ML
dependable and secure: the widespread vulnerability of state-of-the-art
classifiers to adversarial misclassification (aka adversarial
examples). I will then describe a framework that enables us to reason
about this vulnerability in a principled manner as well as develop
methods for alleviating the problem it poses.
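As a toy illustration of the vulnerability itself (a fast-gradient-sign-style perturbation of a linear classifier, not the principled framework from the talk), the sketch below flips a correct prediction using a small coordinate-wise perturbation; the classifier, the input, and the budget eps are all synthetic and hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 100
    w = rng.normal(size=d)   # stand-in for a "trained" linear classifier: predict sign(w . x)
    x = rng.normal(size=d)   # a test input
    y = np.sign(w @ x)       # treat the clean prediction as the correct label

    eps = 0.25               # per-coordinate (L-infinity) perturbation budget
    # Move every coordinate by eps in the direction that most decreases the margin.
    x_adv = x - eps * y * np.sign(w)

    print("clean margin    :", y * (w @ x))      # positive by construction
    print("perturbed margin:", y * (w @ x_adv))  # typically negative, i.e. misclassified
    print("max perturbation:", np.max(np.abs(x_adv - x)))

The same attack idea, applied with gradients of a deep network's loss, yields the adversarial examples discussed above.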
Is Depth Needed for Deep Learning? Circuit Complexity in Neural Networks
Ohad Shamir, Weizmann Institute of Science
Deep learning, as its name indicates, is based on training artificial neural
networks with many layers. A key theoretical question is to understand
why such depth is beneficial, and when it is provably necessary to
express certain types of functions. In fact, this question is closely
related to circuit complexity, which has long been studied in
theoretical computer science -- albeit for different reasons, and for
circuits which differ in some important ways from modern neural
networks. Despite this similarity, the interaction between the circuit
complexity and machine learning research communities is currently quite
limited. In this talk, I'll survey some of the recent depth separation
results developed in the machine learning community, and discuss open
questions where insights from circuit complexity might help.
The talk is aimed at a general theoretical computer science audience, and
no prior knowledge about deep learning will be assumed.
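As a small, self-contained taste of what a depth separation looks like (the classic tent-map construction in the spirit of Telgarsky's result, not necessarily one of the results surveyed in the talk): composing a map that two ReLU units represent exactly produces a number of oscillations growing exponentially with depth, which a shallow network provably needs exponentially many units to match. The depths and grid size below are illustrative.

    import numpy as np

    def tent(x):
        # t(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1]; exactly representable
        # by a single layer with two ReLU units as 2*relu(x) - 4*relu(x - 1/2).
        return 2 * np.minimum(x, 1 - x)

    grid = np.linspace(0.0, 1.0, 10000)
    for depth in (2, 4, 6, 8):
        y = grid.copy()
        for _ in range(depth):
            y = tent(y)                       # depth-fold composition
        slopes = np.sign(np.diff(y))
        extrema = int(np.sum(slopes[1:] != slopes[:-1]))
        print(f"depth {depth}: {extrema} interior extrema (grows like 2^depth)")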
Algorithmic Regularization in Over-parameterized Matrix Recovery and Neural Networks with Quadratic Activations
Tengyu Ma, Facebook AI/Stanford
Over-parameterized models are widely and successfully used in deep learning, but their
workings are far from understood. In many practical scenarios, the
learned model generalizes to the test data, even though the hypothesis
class contains a model that completely overfits the training data and
no regularization is applied.
In this talk, we will show
that such a phenomenon occurs in over-parameterized matrix recovery
models as well, and prove that the gradient descent algorithm provides
additional regularization power that prevents the overfitting. The
result can be extended to learning one-hidden-layer neural networks
with quadratic activations. The key insight here is that gradient
descent prefers searching through the set of low complexity (that is,
low-rank) models first, and converges to a low complexity model with a
good training error if such a model exists.
Based on joint work with Yuanzhi Li and Hongyang Zhang. https://arxiv.org/abs/1712.09203
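A minimal numerical sketch of this phenomenon (illustrative, not reproduced from the paper): gradient descent on an over-parameterized factorization M = UU^T, started from a small random initialization, recovers a low-rank ground truth from relatively few linear measurements, even though the parameterization could represent any positive semidefinite matrix. The sizes, step size, and iteration count below are hypothetical choices.

    import numpy as np

    rng = np.random.default_rng(0)
    d, r, m = 30, 2, 300                    # dimension, true rank, number of measurements

    # Low-rank ground truth M* = U* U*^T and random sensing matrices A_i.
    U_star = rng.normal(size=(d, r)) / np.sqrt(d)
    M_star = U_star @ U_star.T
    A = rng.normal(size=(m, d, d)) / np.sqrt(m)
    b = np.einsum('mij,ij->m', A, M_star)   # measurements <A_i, M*>

    # Over-parameterized factor: U is d x d, so U U^T can express any PSD matrix.
    U = 1e-3 * rng.normal(size=(d, d))      # small initialization
    lr = 0.05
    for _ in range(3000):
        residual = np.einsum('mij,ij->m', A, U @ U.T) - b
        G = np.einsum('m,mij->ij', residual, A)
        U -= lr * 2 * (G + G.T) @ U         # gradient of sum_i (<A_i, U U^T> - b_i)^2

    M_hat = U @ U.T
    svals = np.linalg.svd(M_hat, compute_uv=False)
    print("relative recovery error:", np.linalg.norm(M_hat - M_star) / np.linalg.norm(M_star))
    print("numerical rank of U U^T:", int(np.sum(svals > 1e-2 * svals[0])))  # typically close to r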
Generalization and Implicit Regularization in Deep Learning
Nathan Srebro, TTI Chicago
Do Deep Networks Have Bad Local Minima? A Brief Survey on the Optimization Landscape for Neural Networks
Rong Ge, Duke University
In practice, simple algorithms like gradient descent seem to be able to
find optimal solutions for neural networks. One way to explain this is
to study the optimization landscape: whether the objective function has
any bad local optima. In this talk we will give a brief (and
incomplete) survey on recent results for optimization landscape of
neural networks in several different settings.
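As a tiny, self-contained illustration of the kind of question studied in this line of work (not an example from the talk): train a small two-layer network with gradient descent and then inspect the eigenvalues of the loss Hessian at the point it settles on; a clearly negative eigenvalue would indicate a saddle point rather than a local minimum. The architecture, data, and hyperparameters below are all illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 2))             # toy inputs
    y = np.tanh(X @ np.array([1.0, -2.0]))   # toy regression targets

    def unpack(theta):
        W1 = theta[:6].reshape(3, 2)         # first layer: 2 -> 3 hidden units
        w2 = theta[6:]                       # second layer: 3 -> 1 output
        return W1, w2

    def loss(theta):
        W1, w2 = unpack(theta)
        hidden = np.tanh(X @ W1.T)           # shape (50, 3)
        return np.mean((hidden @ w2 - y) ** 2)

    def grad(theta, eps=1e-5):
        # Central-difference gradient; fine for a 9-parameter toy model.
        g = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
        return g

    theta = 0.5 * rng.normal(size=9)
    for _ in range(3000):                    # plain gradient descent
        theta -= 0.1 * grad(theta)

    # Finite-difference Hessian at the point gradient descent settled on.
    n, h = theta.size, 1e-4
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (grad(theta + e) - grad(theta - e)) / (2 * h)
    H = (H + H.T) / 2
    print("final training loss        :", loss(theta))
    print("smallest Hessian eigenvalue:", np.linalg.eigvalsh(H).min())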
On PAC learning and deep learning
Amit Daniely, Hebrew University of Jerusalem / Google Tel-Aviv
I will discuss the extent to which PAC learning, as we know it today, can form a theoretical basis for deep learning.