Idea

Gradient descent is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, i.e., along the negative gradient. It is widely used in machine learning and deep learning to minimize the cost function, thereby improving model accuracy.

How It Works

  1. Initialize Parameters: Start with random values for the model parameters (weights).
  2. Compute Gradient: Calculate the gradient (partial derivatives) of the cost function with respect to the parameters. This tells you the direction of the steepest ascent.
  3. Update Parameters: Adjust the parameters in the opposite direction of the gradient to minimize the cost function:

     θ ← θ − η ∇J(θ)

     where θ denotes the parameters, η is the learning rate, and ∇J(θ) is the gradient of the cost function with respect to the parameters.
  4. Iterate: Repeat the process until convergence (i.e., until changes in the cost function become negligible). A minimal sketch of these steps follows the list.
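
The sketch below walks through steps 1–4 for a simple mean-squared-error cost on synthetic linear-regression data; the function name, learning rate, tolerance, and data are illustrative assumptions, not taken from the text above.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.1, n_iters=1000, tol=1e-8):
    """Minimize the MSE cost J(theta) = (1/2m) * ||X @ theta - y||^2."""
    m, n = X.shape
    theta = np.random.randn(n)           # 1. initialize parameters randomly
    prev_cost = np.inf
    for _ in range(n_iters):
        error = X @ theta - y
        grad = (X.T @ error) / m         # 2. gradient of the cost w.r.t. theta
        theta -= lr * grad               # 3. step opposite the gradient
        cost = 0.5 * np.mean(error ** 2)
        if abs(prev_cost - cost) < tol:  # 4. stop once the cost barely changes
            break
        prev_cost = cost
    return theta

# Example: recover the weights of y = 2*x0 - 3*x1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.normal(size=200)
print(batch_gradient_descent(X, y))      # approximately [ 2. -3.]
```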

Stochastic Gradient Descent

In contrast to batch gradient descent, SGD calculates the gradient of the cost function using only a single, randomly selected training example in each iteration, trading a noisier convergence path for much cheaper updates. The table below compares both with the mini-batch compromise.
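
Under the same assumed linear-regression cost as above, a per-example update might look like the following sketch (the shuffling scheme and the fixed learning rate are illustrative choices, not prescribed by the text):

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, n_epochs=20, seed=0):
    """One parameter update per randomly selected training example."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = rng.normal(size=n)
    for _ in range(n_epochs):
        for i in rng.permutation(m):              # visit examples in random order
            grad_i = (X[i] @ theta - y[i]) * X[i] # gradient from a single example
            theta -= lr * grad_i                  # noisy but cheap update
    return theta
```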

| Feature | Batch Gradient Descent (BGD) | Stochastic Gradient Descent (SGD) | Mini-Batch Gradient Descent |
|---|---|---|---|
| Data used for each update | Entire training dataset | Single training example | Small batch of training examples |
| Computational cost per update | High | Low | Medium |
| Convergence path | Stable, direct | Noisy, erratic | Balanced |
| Convergence speed | Slow (for large datasets) | Fast | Faster than BGD, slower than SGD |
| Potential to escape local minima | Low | High | Medium |
| Update frequency | Once per epoch | Once per training example | Once per mini-batch |
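
For completeness, mini-batch gradient descent can be sketched in the same way; the batch size of 32 and the other hyperparameters are illustrative assumptions:

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.05, batch_size=32, n_epochs=50, seed=0):
    """One parameter update per mini-batch of training examples."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = rng.normal(size=n)
    for _ in range(n_epochs):
        order = rng.permutation(m)
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ theta - y[idx]
            grad = (X[idx].T @ error) / len(idx)  # average gradient over the batch
            theta -= lr * grad
    return theta
```

Each epoch performs roughly m / batch_size updates, which is why its cost per update and convergence behaviour sit between the two extremes in the table.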