[Chart: loss vs. iteration for each optimizer]
Adaptive Learning Rates
Watch how different optimizers navigate loss landscapes: AdaGrad adapts per-parameter learning rates (useful for sparse features), RMSprop adds exponential decay to AdaGrad's accumulation, and Adam combines momentum with adaptive rates.
[Interactive demo: loss landscape with parameter controls; one trace per optimizer (SGD, AdaGrad, RMSprop, Adam), each reporting its current loss]
Key Differences:
• SGD: fixed learning rate for every parameter
• AdaGrad: accumulates squared gradients, so the effective rate only shrinks (can stall)
• RMSprop: exponential moving average of squared gradients, so old gradients decay away
• Adam: RMSprop-style adaptive rate plus momentum, with bias correction
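The differences above can be sketched as update rules on a single parameter. This is a minimal illustration (not any library's implementation), run on the quadratic loss f(w) = w², whose gradient is 2w; the hyperparameter values are illustrative choices.

```python
# One-parameter update rules for SGD, AdaGrad, RMSprop, and Adam.
# Hyperparameters (lr, beta, eps) below are illustrative, not canonical.

def grad(w):
    return 2.0 * w          # gradient of f(w) = w^2

def run(update, steps=200, w0=5.0):
    """Run one optimizer from w0 for a fixed number of steps."""
    w, state = w0, {}
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
    return w

def sgd(w, g, state, t, lr=0.1):
    return w - lr * g                               # fixed learning rate

def adagrad(w, g, state, t, lr=0.5, eps=1e-8):
    state["G"] = state.get("G", 0.0) + g * g        # accumulate squared gradients
    return w - lr * g / (state["G"] ** 0.5 + eps)   # effective rate only shrinks

def rmsprop(w, g, state, t, lr=0.1, beta=0.9, eps=1e-8):
    # exponential moving average: old squared gradients decay away
    state["v"] = beta * state.get("v", 0.0) + (1 - beta) * g * g
    return w - lr * g / (state["v"] ** 0.5 + eps)

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g        # momentum
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g    # adaptive rate
    m_hat = state["m"] / (1 - b1 ** t)                          # bias correction
    v_hat = state["v"] / (1 - b2 ** t)
    return w - lr * m_hat / (v_hat ** 0.5 + eps)

for name, update in [("SGD", sgd), ("AdaGrad", adagrad),
                     ("RMSprop", rmsprop), ("Adam", adam)]:
    print(f"{name}: w = {run(update):.6f}")
```

All four drive w toward the minimum at 0, but by different routes: SGD shrinks w geometrically, AdaGrad's steps get progressively smaller as squared gradients pile up, while RMSprop and Adam keep a roughly constant-sized step and can oscillate slightly around the minimum.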