

Marco Mondelli (IST Austria): Understanding Gradient Descent for Over-parameterized Deep Neural Networks

Location: MPI für Mathematik in den Naturwissenschaften Leipzig, video broadcast

Training a neural network is a non-convex problem that exhibits spurious and disconnected local minima. Yet, in practice, neural networks with millions of parameters are successfully optimized using gradient descent methods. In this talk, I will give some theoretical insight into why this is possible. First, I will show that the combination of stochastic gradient descent and over-parameterization makes the landscape of deep networks approximately connected and, therefore, more favorable to optimization. Then, I will focus on a special case (a two-layer network fitting a convex function) and provide a quantitative convergence result by exploiting the displacement convexity of a related Wasserstein gradient flow. Finally, I will return to deep networks and show that a single wide layer followed by a pyramidal topology suffices to guarantee the global convergence of gradient descent. [Based on joint work with Adel Javanmard, Andrea Montanari, Quynh Nguyen, and Alexander Shevchenko]
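To make the setting concrete, the following is a minimal sketch (not taken from the talk) of the special case mentioned above: plain gradient descent training an over-parameterized two-layer ReLU network to fit a one-dimensional convex target. The width, learning rate, target function, and initialization scaling are illustrative assumptions, not the parameters used in the speaker's results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterized regime: many more hidden units (m) than samples (n).
n, m = 50, 200
x = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
y = (x ** 2).ravel()                 # convex target f(x) = x^2 (an assumption)

W = rng.normal(0.0, 1.0, (1, m))     # input-to-hidden weights
b = rng.normal(0.0, 1.0, m)          # hidden biases
a = rng.normal(0.0, 1.0 / m, m)      # hidden-to-output weights, small scale

def forward(x):
    h = np.maximum(x @ W + b, 0.0)   # ReLU features, shape (n, m)
    return h, h @ a                  # hidden activations and prediction

lr = 0.01
for _ in range(5000):
    h, pred = forward(x)
    err = pred - y
    # Gradients of the mean-squared error with respect to all parameters.
    grad_a = h.T @ err / n
    dh = np.outer(err, a) * (h > 0)  # back-propagate through the ReLU
    grad_W = x.T @ dh / n
    grad_b = dh.sum(axis=0) / n
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

_, pred = forward(x)
mse = float(np.mean((pred - y) ** 2))
print(f"final MSE: {mse:.6f}")
```

Despite the non-convex loss, gradient descent from random initialization drives the training error close to zero in this wide regime, which is the phenomenon the convergence results in the talk make quantitative.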


Start: Aug. 4, 2020, 11 a.m.

End: Aug. 4, 2020, noon