Gradient Descent
202502012203
tags: #machine-learning #optimization #algorithms
Gradient descent is an iterative optimization algorithm that finds a (local) minimum of a Cost Function by repeatedly taking steps proportional to the negative of the gradient at the current point.
How it works:
- Start with random parameters
- Calculate the gradient (slope) of the cost function
- Move in the opposite direction of the gradient
- Repeat until convergence (a minimal sketch of this loop follows below)
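A minimal sketch of that loop, assuming an illustrative one-variable cost J(θ) = (θ − 3)² whose gradient is 2(θ − 3); the function, starting point, and stopping rule are just examples:

```python
import random

def grad(theta):
    # Gradient of the illustrative cost J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = random.uniform(-10, 10)   # start with a random parameter
alpha = 0.1                       # learning rate
for _ in range(1000):
    g = grad(theta)               # slope of the cost at the current point
    theta = theta - alpha * g     # move opposite the gradient
    if abs(g) < 1e-6:             # stop when the gradient is (near) zero
        break
print(theta)                      # converges to ~3, the minimizer of J
```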
Key variants:
- Batch Gradient Descent: Uses entire dataset for each update
- Stochastic Gradient Descent (SGD): Uses one example at a time
- Mini-batch Gradient Descent: Uses small batches, balancing the efficiency of SGD with the stability of batch updates (see the sketch after this list)
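A sketch of how the variants differ in what data each update sees, using mean-squared-error linear regression as a stand-in cost; setting the batch size to 1 recovers SGD and setting it to the dataset size recovers batch gradient descent. The data and values here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # 200 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

theta = np.zeros(3)
alpha, batch_size = 0.05, 32                         # batch_size=1 -> SGD, =len(X) -> batch GD

for epoch in range(50):
    idx = rng.permutation(len(X))                    # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        grad = Xb.T @ (Xb @ theta - yb) / len(b)     # MSE gradient on the mini-batch only
        theta -= alpha * grad
print(theta)                                          # close to [1.0, -2.0, 0.5]
```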
The learning rate α controls the step size of each update, θ := θ − α∇J(θ). Too large a rate causes overshooting or divergence; too small a rate causes slow convergence.
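A quick illustration of both failure modes on the same one-variable cost as above (J(θ) = (θ − 3)², gradient 2(θ − 3)); the specific rates are arbitrary examples:

```python
def run(alpha, steps=50):
    theta = 10.0
    for _ in range(steps):
        theta -= alpha * 2.0 * (theta - 3.0)   # theta := theta - alpha * dJ/dtheta
    return theta

print(run(1.5))    # too large: every step overshoots and the iterates diverge
print(run(0.001))  # too small: still far from 3 after 50 steps
print(run(0.1))    # reasonable: converges close to 3
```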
For multiple parameters, Vectorization computes the whole gradient with matrix operations instead of per-parameter loops. Feature Scaling often helps gradient descent converge faster by bringing the input features onto comparable ranges.
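A sketch of both ideas together, assuming the same linear-regression cost as above: the gradient for all parameters is one matrix expression, and the features are standardized (zero mean, unit variance) before running gradient descent; the data and scales are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])   # wildly different feature scales
y = X @ np.array([2.0, 0.03, 50.0]) + rng.normal(scale=0.1, size=200)

# Feature scaling: standardize each column so the cost surface is better conditioned
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

theta = np.zeros(3)
alpha = 0.1
for _ in range(500):
    # Vectorized MSE gradient: all parameters updated from one matrix expression
    grad = X_scaled.T @ (X_scaled @ theta - y) / len(y)
    theta -= alpha * grad
print(theta)   # fitted coefficients in the scaled feature space
```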
Reference
Machine Learning Yearning by Andrew Ng