March 13, 2023

2.3 Batch and Momentum

1 Batch

1.1 Review: Optimization with Batch

In mini-batch training, the data set is shuffled and split into batches. Each update computes the loss and gradient on a single batch, and one pass over all batches is an epoch; the batches are typically reshuffled between epochs.
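As a concrete illustration, here is a minimal NumPy sketch of mini-batch gradient descent on a least-squares loss; the loss function, learning rate, and batch size are placeholder assumptions, not the lecture's setup.

```python
import numpy as np

def minibatch_gd(X, y, w, lr=0.1, batch_size=32, epochs=10, seed=0):
    """Mini-batch gradient descent on a linear least-squares loss (illustrative)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                  # shuffle once per epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]     # indices of one batch
            err = X[b] @ w - y[b]
            grad = X[b].T @ err / len(b)          # gradient on this batch only
            w -= lr * grad                        # one update per batch
    return w
```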

1.2 Small Batch vs. Large Batch

A large batch computes the loss over many examples before each update, so steps are slow but stable; a small batch updates after only a few examples, so steps are quick but noisy.

1.2.1 Training Time

Parallel computing: on a GPU, the time for one update barely grows with batch size until the batch exceeds the hardware's parallel capacity. A small batch therefore needs far more updates to finish an epoch; e.g., with 60,000 training examples, batch size 1 means 60,000 updates per epoch while batch size 1,000 means only 60, so the larger batch can actually be faster per epoch. See the timing sketch below.
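A minimal CPU sketch of this tradeoff, using a matrix multiply as a stand-in for a forward pass; exact numbers depend on hardware, and on a GPU the flattening of per-update time is far more pronounced.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60_000, 784)).astype(np.float32)  # MNIST-sized data
W = rng.standard_normal((784, 10)).astype(np.float32)

for batch_size in (1, 10, 100, 1_000, 10_000):
    n_updates = len(X) // batch_size           # updates needed for one epoch
    timed = min(n_updates, 100)                # time at most 100 of them
    start = time.perf_counter()
    for i in range(timed):
        batch = X[i * batch_size:(i + 1) * batch_size]
        _ = batch @ W                          # stand-in for one forward pass
    per_update = (time.perf_counter() - start) / timed
    print(f"batch={batch_size:>6}: {per_update * 1e6:9.1f} us/update, "
          f"{n_updates:>6} updates/epoch")
```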

1.2.2 Performance

Counterintuitively, smaller batches tend to reach better accuracy on both training and validation data. One explanation is that the noisy per-batch gradients help optimization: when the full-batch gradient is stuck near a critical point, the losses of individual batches differ slightly, so some batch still gives a nonzero gradient and training keeps moving.

Flat minima are better than sharp minima: if the testing loss surface is slightly shifted relative to the training loss, a flat minimum still gives a low testing loss, while a sharp minimum can become very bad. Small-batch noise tends to kick the parameters out of sharp minima, so small batches favor flat minima and generalize better.

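A toy illustration of this argument (the two quadratics and the 0.5 shift are made-up numbers): modelling the train/test mismatch as a horizontal shift of the loss surface, the same shift costs far more at a sharp minimum.

```python
import numpy as np

# Two toy loss curves with equally deep minima at x = 0:
flat  = lambda x: 0.1 * x**2        # flat minimum (small curvature)
sharp = lambda x: 10.0 * x**2       # sharp minimum (large curvature)

# Model the train/test mismatch as a small horizontal shift of the surface.
shift = 0.5
print("testing loss at flat  minimum:", flat(0 - shift))   # 0.025
print("testing loss at sharp minimum:", sharp(0 - shift))  # 2.5
```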

1.3 Comparison

| | Small Batch | Large Batch |
| --- | --- | --- |
| Speed for one update (no parallel) | Faster | Slower |
| Speed for one update (with parallel) | Same | Same (if not too large) |
| Time for one epoch | Slower | Faster |
| Gradient | Noisy | Stable |
| Optimization | Better | Worse |
| Generalization | Better | Worse |


2 Momentum

2.1 (Vanilla) Gradient Descent

(Vanilla) gradient descent moves the parameters in the opposite direction of the gradient.
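With $\eta$ the learning rate and $\mathbf{g^i}$ the gradient at step $i$ (writing $\mathbf{\theta^i}$ for the parameters, a symbol assumed here rather than taken from the note), one step is:

$$
\begin{align}
\mathbf{\theta^{i+1}} &= \mathbf{\theta^i} - \eta\mathbf{g^i}
\end{align}
$$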


2.2 Gradient Descent + Momentum

Movement: the movement of the last step minus the gradient at the present step. The update direction is thus not the current gradient alone; it also carries the momentum of all previous steps.

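Concretely, with $\lambda$ the momentum weight (this recurrence is consistent with the expansion below; $\mathbf{\theta^i}$ is again an assumed symbol for the parameters):

$$
\begin{align}
\mathbf{m^{i+1}} &= \lambda\mathbf{m^i} - \eta\mathbf{g^i} \\
\mathbf{\theta^{i+1}} &= \mathbf{\theta^i} + \mathbf{m^{i+1}}
\end{align}
$$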

$\mathbf{m^i}$ is a weighted sum of all the previous gradients $\mathbf{g^0}, \mathbf{g^1}, \dots, \mathbf{g^{i-1}}$:

$$
\begin{align}
\mathbf{m^0} &= \mathbf{0} \\
\mathbf{m^1} &= -\eta\mathbf{g^0} \\
\mathbf{m^2} &= -\lambda\eta\mathbf{g^0} - \eta\mathbf{g^1} \\
&\;\;\vdots
\end{align}
$$
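A minimal NumPy sketch of gradient descent with momentum matching the recurrence above; the quadratic example loss and the hyperparameter values are illustrative assumptions.

```python
import numpy as np

def gd_with_momentum(grad, theta0, lr=0.01, lam=0.9, steps=100):
    """Gradient descent with momentum: m <- lam*m - lr*g, theta <- theta + m."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)          # m^0 = 0
    for _ in range(steps):
        g = grad(theta)               # gradient at the present step
        m = lam * m - lr * g          # last step's movement minus scaled gradient
        theta = theta + m
    return theta

# Example: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
print(gd_with_momentum(lambda t: 2 * t, [3.0, -2.0]))  # -> close to [0, 0]
```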
