Idea
Gradient descent is an optimization algorithm that minimizes a function by iteratively stepping in the direction of steepest descent, i.e., along the negative gradient. It is widely used in machine learning and deep learning to minimize the cost function and thereby improve model accuracy.
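As a toy illustration of the idea (not an example from the text above), the sketch below runs gradient descent on f(x) = x**2, whose derivative is 2*x; the starting point, learning rate, and step count are arbitrary choices.

```python
# Toy sketch: gradient descent on f(x) = x**2, whose derivative is 2*x.
# Starting point, learning rate, and step count are illustrative choices.

def gradient_descent_1d(x0=5.0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach the minimizer x = 0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x                   # gradient of f at the current x
        x -= learning_rate * grad      # move in the negative-gradient direction
    return x

print(gradient_descent_1d())           # prints a value very close to 0
```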
How It Works
- Initialize Parameters: Start with random values for the model parameters (weights).
- Compute Gradient: Calculate the gradient (partial derivatives) of the cost function with respect to the parameters. This tells you the direction of the steepest ascent.
- Update Parameters: Adjust the parameters in the opposite direction of the gradient to minimize the cost function:
  $\theta := \theta - \alpha \, \nabla_\theta J(\theta)$

  where $\theta$ is the parameter vector, $\alpha$ is the learning rate, and $\nabla_\theta J(\theta)$ is the gradient of the cost function.
- Iterate: Repeat the process until convergence (i.e., when changes in the cost function are negligible), as in the sketch below.
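Putting the four steps together, here is a minimal sketch of batch gradient descent for linear regression with a mean-squared-error cost; the synthetic data, learning rate, and stopping tolerance are illustrative assumptions rather than values from the text.

```python
import numpy as np

# A minimal sketch of batch gradient descent for linear regression with a
# mean-squared-error cost J(theta) = (1/2m) * ||X @ theta - y||^2.
# Learning rate, iteration cap, and tolerance are illustrative assumptions.

def batch_gradient_descent(X, y, learning_rate=0.1, max_iters=10_000, tol=1e-10, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)              # 1. initialize parameters randomly
    prev_cost = np.inf
    for _ in range(max_iters):
        errors = X @ theta - y
        grad = (X.T @ errors) / m           # 2. gradient of the cost w.r.t. theta
        theta -= learning_rate * grad       # 3. step opposite to the gradient
        cost = (errors @ errors) / (2 * m)
        if abs(prev_cost - cost) < tol:     # 4. stop when the cost barely changes
            break
        prev_cost = cost
    return theta

# Example usage on synthetic data generated as y = 2 + 3*x plus a little noise.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
X = np.column_stack([np.ones(100), x])      # bias column plus one feature
y = 2 + 3 * x + rng.normal(scale=0.05, size=100)
print(batch_gradient_descent(X, y))         # roughly [2, 3]
```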
Stochastic Gradient Descent
SGD calculates the gradient of the cost function using only a single, randomly selected training example in each iteration, which makes each update cheap but noisy.
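Below is a sketch of SGD in the same illustrative linear-regression setup as the batch example above; the learning rate and epoch count are again arbitrary assumptions.

```python
import numpy as np

# A sketch of SGD for the same linear-regression / MSE setup as above:
# one randomly chosen training example drives each parameter update.
# The learning rate and number of epochs are illustrative assumptions.

def stochastic_gradient_descent(X, y, learning_rate=0.01, epochs=100, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                  # random initialization
    for _ in range(epochs):
        for i in rng.permutation(m):            # visit examples in random order
            error = X[i] @ theta - y[i]         # gradient uses ONE example only
            theta -= learning_rate * error * X[i]
    return theta

# Usage: stochastic_gradient_descent(X, y) with the X, y from the batch example.
```

Because each step is based on a single example, the parameter trajectory is noisy, which is why a smaller learning rate is typically used than in the batch version.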
| Feature | Batch Gradient Descent (BGD) | Stochastic Gradient Descent (SGD) | Mini-Batch Gradient Descent |
|---|---|---|---|
| Data Used for Each Update | Entire training dataset | Single training example | Small batch of training examples |
| Computational Cost per Update | High | Low | Medium |
| Convergence Path | Stable, direct | Noisy, erratic | Balanced |
| Convergence Speed | Slow (for large datasets) | Fast | Faster than BGD, slower than SGD |
| Potential for Local Minima Escape | Low | High | Medium |
| Update Frequency | Once per epoch | Once per training example | Once per mini-batch |
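For completeness, here is a mini-batch sketch in the same illustrative setup; each update averages the gradient over a small shuffled slice of the data, and the batch size of 32 is an arbitrary choice.

```python
import numpy as np

# A sketch of mini-batch gradient descent in the same illustrative setup:
# each update averages the gradient over a small shuffled slice of the data.
# The batch size, learning rate, and epoch count are arbitrary choices.

def minibatch_gradient_descent(X, y, batch_size=32, learning_rate=0.1,
                               epochs=100, seed=0):
    m, n = X.shape
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=n)                      # random initialization
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle once per epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            errors = X[idx] @ theta - y[idx]
            grad = (X[idx].T @ errors) / len(idx)   # average over the mini-batch
            theta -= learning_rate * grad           # one update per mini-batch
    return theta
```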