
Optimizer Shootout

Compare SGD, AdaGrad, RMSprop, and Adam on loss landscapes

Adaptive Learning Rates
Watch how different optimizers navigate the same loss landscape. SGD takes fixed-size steps; AdaGrad scales each parameter's step by its accumulated gradient history, which helps with sparse features but means steps only ever shrink; RMSprop replaces that accumulation with an exponential moving average (decay) so step sizes can recover; Adam combines momentum with RMSprop-style adaptive rates.
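
For reference, here are the update rules behind each method in their standard textbook form, with gradient g_t = ∇L(θ_t), learning rate η, and a small ε for numerical stability (the demo's exact constants may differ):

```latex
\begin{aligned}
\text{SGD:} \quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\text{AdaGrad:} \quad & G_t = G_{t-1} + g_t^2, \qquad
  \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t \\
\text{RMSprop:} \quad & v_t = \rho\, v_{t-1} + (1-\rho)\, g_t^2, \qquad
  \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t \\
\text{Adam:} \quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
  v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
  & \hat m_t = m_t / (1-\beta_1^t), \qquad
  \hat v_t = v_t / (1-\beta_2^t), \qquad
  \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat v_t} + \epsilon}\, \hat m_t
\end{aligned}
```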
[Interactive demo: "Loss vs Iteration" and "Loss Landscape" panels, with Learning Rate (default 0.01) and Noise Level (default 0.0) controls and live per-optimizer loss readouts for SGD, AdaGrad, RMSprop, and Adam.]
Key Differences (made concrete in the sketch below):
SGD: fixed learning rate; every parameter moves at the same step scale
AdaGrad: divides by the root of accumulated squared gradients, so effective steps shrink over time and can stall
RMSprop: exponential moving average of squared gradients (decay), so old history fades and steps don't vanish
Adam: momentum (first-moment average) plus adaptive rate (second-moment average), with bias correction
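
To make the list concrete, here is a minimal, self-contained Python sketch of all four update rules on a toy 2D quadratic landscape. The landscape L(x, y) = x² + 10y², the starting point, the step count, and the hyperparameters (ρ, β₁, β₂, ε) are illustrative assumptions, not the demo's actual settings:

```python
# Four optimizer update rules compared on a toy 2D quadratic loss.
import math

def loss(p):
    x, y = p
    return x**2 + 10 * y**2

def grad(p):
    x, y = p
    return [2 * x, 20 * y]

def run(optimizer, steps=100, lr=0.01):
    p = [2.0, 2.0]          # shared starting point for a fair comparison
    state = {}              # per-run optimizer state (accumulators, moments)
    for t in range(1, steps + 1):
        p = optimizer(p, grad(p), lr, state, t)
    return loss(p)

def sgd(p, g, lr, state, t):
    # Fixed learning rate: the same step scale in every direction.
    return [pi - lr * gi for pi, gi in zip(p, g)]

def adagrad(p, g, lr, state, t, eps=1e-8):
    # Accumulate squared gradients; the effective step size only shrinks.
    G = state.setdefault("G", [0.0, 0.0])
    for i, gi in enumerate(g):
        G[i] += gi * gi
    return [pi - lr * gi / (math.sqrt(Gi) + eps)
            for pi, gi, Gi in zip(p, g, G)]

def rmsprop(p, g, lr, state, t, rho=0.9, eps=1e-8):
    # Exponential moving average of squared gradients: old history decays.
    v = state.setdefault("v", [0.0, 0.0])
    for i, gi in enumerate(g):
        v[i] = rho * v[i] + (1 - rho) * gi * gi
    return [pi - lr * gi / (math.sqrt(vi) + eps)
            for pi, gi, vi in zip(p, g, v)]

def adam(p, g, lr, state, t, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum (first moment) plus adaptive rate (second moment),
    # both bias-corrected for their zero initialization.
    m = state.setdefault("m", [0.0, 0.0])
    v = state.setdefault("v2", [0.0, 0.0])
    out = []
    for i, gi in enumerate(g):
        m[i] = b1 * m[i] + (1 - b1) * gi
        v[i] = b2 * v[i] + (1 - b2) * gi * gi
        mhat = m[i] / (1 - b1**t)
        vhat = v[i] / (1 - b2**t)
        out.append(p[i] - lr * mhat / (math.sqrt(vhat) + eps))
    return out

for name, opt in [("SGD", sgd), ("AdaGrad", adagrad),
                  ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name:8s} loss after 100 steps: {run(opt):.6f}")
```

Printing the final losses mirrors the demo's per-optimizer readouts; which method ends up lowest depends on the landscape, the learning rate, and the noise level, which is exactly what the interactive controls let you explore.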