We analyze stochastic gradient descent for
functions. In many cases for non-convex functions the goal is to find a
reasonable local minimum, and the main concern is that gradient updates
are trapped in saddle points. In this paper we identify strict saddle
property for non-convex problem that allows for efficient optimization.
Using this property we show that stochastic gradient descent converges
to a local minimum in a polynomial number of iterations. To the best of
our knowledge this is the first work that gives global convergence
guarantees for stochastic gradient descent on non-convex functions with
exponentially many local minima and saddle points.
analysis can be applied to orthogonal tensor decomposition, which is
widely used in learning a rich class of latent variable models. We
propose a new optimization formulation for the tensor decomposition
problem that has strict saddle property. As a result we get the first
online algorithm for orthogonal tensor decomposition with global