Gradient Descent

The ResNet paper (He et al., 2015) introduced residual networks, which make very deep networks trainable by adding skip (identity shortcut) connections that let gradients flow directly through the layers, mitigating the vanishing-gradient and degradation problems. ResNets achieved state-of-the-art results on benchmarks such as ImageNet and have become foundational in modern deep learning.
Optimization methods have developed along a clear line: from general optimization, to convex optimization, to first-order methods, to gradient descent, to accelerated gradient descent, and on to adaptive methods such as AdaGrad and Adam, as sketched below.
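
To make two points on that line concrete, here is a minimal sketch of vanilla gradient descent and the Adam update rule applied to a toy quadratic objective. The objective, the matrix A, the vector b, and all hyperparameters (learning rate, beta values, step counts) are illustrative assumptions introduced for this example, not taken from the text above.

```python
# Minimal sketch: gradient descent vs. Adam on f(x) = 0.5 * x^T A x - b^T x.
# A, b, and the hyperparameters below are hypothetical, chosen for illustration.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
b = np.array([1.0, -2.0])

def grad(x):
    # Gradient of f(x) = 0.5 * x @ A @ x - b @ x
    return A @ x - b

def gradient_descent(x0, lr=0.1, steps=200):
    x = x0.copy()
    for _ in range(steps):
        x -= lr * grad(x)                 # x_{t+1} = x_t - lr * grad f(x_t)
    return x

def adam(x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    x = x0.copy()
    m = np.zeros_like(x)                  # first-moment (mean) estimate
    v = np.zeros_like(x)                  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)        # bias correction
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x0 = np.array([5.0, 5.0])
print("GD solution:  ", gradient_descent(x0))
print("Adam solution:", adam(x0))
print("Closed form:  ", np.linalg.solve(A, b))   # exact minimizer, A x* = b
```

On this strongly convex problem, plain gradient descent converges cleanly to the closed-form minimizer, while Adam with a fixed step size lands close to it; the contrast is only meant to show the shape of the two update rules, not to compare their quality on real training workloads.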