Gradient Descent
202502012203
tags: #machine-learning #optimization #algorithms
Gradient descent is an iterative optimization algorithm that finds a (local) minimum of a Cost Function by repeatedly taking steps proportional to the negative of the gradient at the current point.
How it works:
- Start with random parameters
- Calculate the gradient (slope) of the cost function
- Move in the opposite direction of the gradient
- Repeat until convergence (a minimal sketch of this loop follows below)
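A minimal sketch of that loop, assuming an illustrative one-variable cost J(θ) = (θ − 3)² whose gradient is 2(θ − 3); the function, starting point, and stopping rule are just examples:

```python
import random

def grad(theta):
    # Gradient of the illustrative cost J(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta = random.uniform(-10, 10)   # start with a random parameter
alpha = 0.1                       # learning rate
for _ in range(1000):
    g = grad(theta)               # slope of the cost at the current point
    theta = theta - alpha * g     # move opposite the gradient
    if abs(g) < 1e-6:             # stop when the gradient is (near) zero
        break
print(theta)                      # converges to ~3, the minimizer of J
```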
Key variants:
- Batch Gradient Descent: Uses entire dataset for each update
- Stochastic Gradient Descent (SGD): Uses one example at a time
- Mini-batch Gradient Descent: Uses small batches, balancing the efficiency of SGD with the stability of batch updates (see the sketch after this list)
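A sketch of how the variants differ in what data each update sees, using mean-squared-error linear regression as a stand-in cost; setting the batch size to 1 recovers SGD and setting it to the dataset size recovers batch gradient descent. The data and values here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # 200 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

theta = np.zeros(3)
alpha, batch_size = 0.05, 32                         # batch_size=1 -> SGD, =len(X) -> batch GD

for epoch in range(50):
    idx = rng.permutation(len(X))                    # shuffle each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        grad = Xb.T @ (Xb @ theta - yb) / len(b)     # MSE gradient on the mini-batch only
        theta -= alpha * grad
print(theta)                                          # close to [1.0, -2.0, 0.5]
```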
The learning rate α controls the step size of each update, θ := θ − α∇J(θ). Too large a rate causes overshooting or divergence; too small a rate causes slow convergence.
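A quick illustration of both failure modes on the same one-variable cost as above (J(θ) = (θ − 3)², gradient 2(θ − 3)); the specific rates are arbitrary examples:

```python
def run(alpha, steps=50):
    theta = 10.0
    for _ in range(steps):
        theta -= alpha * 2.0 * (theta - 3.0)   # theta := theta - alpha * dJ/dtheta
    return theta

print(run(1.5))    # too large: every step overshoots and the iterates diverge
print(run(0.001))  # too small: still far from 3 after 50 steps
print(run(0.1))    # reasonable: converges close to 3
```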
For multiple parameters, Vectorization computes the whole gradient with matrix operations instead of per-parameter loops. Feature Scaling often helps gradient descent converge faster by bringing the input features onto comparable ranges.
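A sketch of both ideas together, assuming the same linear-regression cost as above: the gradient for all parameters is one matrix expression, and the features are standardized (zero mean, unit variance) before running gradient descent; the data and scales are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 100.0, 0.01])   # wildly different feature scales
y = X @ np.array([2.0, 0.03, 50.0]) + rng.normal(scale=0.1, size=200)

# Feature scaling: standardize each column so the cost surface is better conditioned
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

theta = np.zeros(3)
alpha = 0.1
for _ in range(500):
    # Vectorized MSE gradient: all parameters updated from one matrix expression
    grad = X_scaled.T @ (X_scaled @ theta - y) / len(y)
    theta -= alpha * grad
print(theta)   # fitted coefficients in the scaled feature space
```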
Reference
Machine Learning Yearning by Andrew Ng