Idea
Gradient descent is an optimization algorithm that minimizes a function by iteratively stepping in the direction of steepest descent, i.e., along the negative gradient. It is widely used in machine learning and deep learning to minimize the cost function and thereby improve model accuracy.
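As a toy illustration of the idea (not an example from the text above), the sketch below runs gradient descent on f(x) = x**2, whose derivative is 2*x; the starting point, learning rate, and step count are arbitrary choices.

```python
# Toy sketch: gradient descent on f(x) = x**2, whose derivative is 2*x.
# Starting point, learning rate, and step count are illustrative choices.

def gradient_descent_1d(x0=5.0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach the minimizer x = 0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x                   # gradient of f at the current x
        x -= learning_rate * grad      # move in the negative-gradient direction
    return x

print(gradient_descent_1d())           # prints a value very close to 0
```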
How It Works
- Initialize Parameters: Start with random values for the model parameters (weights).
- Compute Gradient: Calculate the gradient (partial derivatives) of the cost function with respect to the parameters. This tells you the direction of the steepest ascent.
- Update Parameters: Adjust the parameters in the opposite direction of the gradient to minimize the cost function:
  $\theta := \theta - \alpha \, \nabla_\theta J(\theta)$

  where $\theta$ is the parameter vector, $\alpha$ is the learning rate, and $\nabla_\theta J(\theta)$ is the gradient of the cost function.
- Iterate: Repeat the process until convergence (i.e., when changes in the cost function are negligible), as in the sketch below.
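Putting the four steps together, here is a minimal sketch of batch gradient descent for linear regression with a mean-squared-error cost; the synthetic data, learning rate, and stopping tolerance are illustrative assumptions rather than values from the text.

```python
import numpy as np

# A minimal sketch of batch gradient descent for linear regression with a
# mean-squared-error cost J(theta) = (1/2m) * ||X @ theta - y||^2.
# Learning rate, iteration cap, and tolerance are illustrative assumptions.

def batch_gradient_descent(X, y, learning_rate=0.1, max_iters=10_000, tol=1e-10, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)              # 1. initialize parameters randomly
    prev_cost = np.inf
    for _ in range(max_iters):
        errors = X @ theta - y
        grad = (X.T @ errors) / m           # 2. gradient of the cost w.r.t. theta
        theta -= learning_rate * grad       # 3. step opposite to the gradient
        cost = (errors @ errors) / (2 * m)
        if abs(prev_cost - cost) < tol:     # 4. stop when the cost barely changes
            break
        prev_cost = cost
    return theta

# Example usage on synthetic data generated as y = 2 + 3*x plus a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
X = np.column_stack([np.ones(100), x])      # bias column plus one feature
y = 2 + 3 * x + rng.normal(scale=0.05, size=100)
print(batch_gradient_descent(X, y))         # roughly [2, 3]
```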
Stochastic Gradient Descent
SGD calculates the gradient of the cost function using only a single, randomly selected training example in each iteration, which makes each update cheap but noisy.
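Below is a sketch of SGD in the same illustrative linear-regression setup as the batch example above; the learning rate and epoch count are again arbitrary assumptions.

```python
import numpy as np

# A sketch of SGD for the same linear-regression / MSE setup as above:
# one randomly chosen training example drives each parameter update.
# The learning rate and number of epochs are illustrative assumptions.

def stochastic_gradient_descent(X, y, learning_rate=0.01, epochs=100, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                  # random initialization
    for _ in range(epochs):
        for i in rng.permutation(m):            # visit examples in random order
            error = X[i] @ theta - y[i]         # gradient uses ONE example only
            theta -= learning_rate * error * X[i]
    return theta

# Usage: stochastic_gradient_descent(X, y) with the X, y from the batch example.
```

Because each step is based on a single example, the parameter trajectory is noisy, which is why a smaller learning rate is typically used than in the batch version.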
| Feature | Batch Gradient Descent (BGD) | Stochastic Gradient Descent (SGD) | Mini-Batch Gradient Descent |
|---|---|---|---|
| Data Used for Each Update | Entire training dataset | Single training example | Small batch of training examples |
| Computational Cost per Update | High | Low | Medium |
| Convergence Path | Stable, direct | Noisy, erratic | Balanced |
| Convergence Speed | Slow (for large datasets) | Fast | Faster than BGD, slower than SGD |
| Potential for Local Minima Escape | Low | High | Medium |
| Update Frequency | Once per epoch | Once per training example | Once per mini-batch |
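For completeness, here is a mini-batch sketch in the same illustrative setup; each update averages the gradient over a small shuffled slice of the data, and the batch size of 32 is an arbitrary choice.

```python
import numpy as np

# A sketch of mini-batch gradient descent in the same illustrative setup:
# each update averages the gradient over a small shuffled slice of the data.
# The batch size, learning rate, and epoch count are arbitrary choices.

def minibatch_gradient_descent(X, y, batch_size=32, learning_rate=0.1,
                               epochs=100, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                      # random initialization
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            errors = X[idx] @ theta - y[idx]
            grad = (X[idx].T @ errors) / len(idx)   # average over the mini-batch
            theta -= learning_rate * grad           # one update per mini-batch
    return theta
```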