Gradient Descent

[Paper Exploration] Adam: A Method for Stochastic Optimization

From optimization, to convex optimization, to first-order optimization, to gradient descent, to accelerated gradient descent, to AdaGrad, and finally to Adam.
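
As a concrete anchor for the end of that progression, here is a minimal NumPy sketch of the Adam update rule from the paper. The toy quadratic objective, starting point, and step count are illustrative assumptions; the hyperparameter values are the paper's suggested defaults.

```python
import numpy as np

def grad(theta):
    # Gradient of the toy objective f(theta) = theta**2.
    return 2.0 * theta

# Suggested defaults from the Adam paper.
alpha, beta1, beta2, eps = 0.001, 0.9, 0.999, 1e-8

theta = 1.0   # illustrative starting point
m = v = 0.0   # first- and second-moment estimates

for t in range(1, 5001):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g       # exponential moving average of the gradient
    v = beta2 * v + (1 - beta2) * g * g   # exponential moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)          # bias correction for m
    v_hat = v / (1 - beta2 ** t)          # bias correction for v
    theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)

print(theta)  # converges toward the minimizer at 0
```

The per-parameter scaling by the second-moment estimate is what Adam inherits from AdaGrad; the momentum-like first moment and the bias corrections are what the paper adds on top.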


[Paper Exploration] Deep Residual Learning for Image Recognition

The paper introduces residual networks (ResNets), which make very deep networks substantially easier to train by adding skip (shortcut) connections, so that stacked layers learn residual functions F(x) and a block outputs F(x) + x. This design counters the degradation problem, in which deeper plain networks reach worse training accuracy than shallower ones, and eases gradient flow through the network. The approach achieved state-of-the-art performance on several benchmarks, including ImageNet, and has become foundational in modern deep learning.
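
As a sketch of the core idea, here is a minimal PyTorch residual block under simplifying assumptions: a fixed channel count, stride 1, and an identity shortcut with no projection, roughly following the two-convolution basic block described in the paper.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions plus an identity skip.

    Channel count and the absence of a projection shortcut are
    illustrative simplifications.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # F(x): the residual mapping
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                 # skip connection: F(x) + x

block = ResidualBlock(64)
y = block(torch.randn(1, 64, 32, 32))  # output shape matches input: (1, 64, 32, 32)
```

Because the shortcut is the identity, a block can fall back to passing its input through unchanged by driving F(x) toward zero, which is why stacking many such blocks does not make optimization harder the way stacking plain layers does.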