Regularization
202502012208
tags: #machine-learning #overfitting #optimization
Regularization adds a penalty term to the Cost Function to prevent overfitting by constraining model complexity.
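The general shape of a regularized objective, as a minimal NumPy sketch (the mean-squared-error base cost and the function/variable names are assumptions for illustration, not part of this note):

```python
import numpy as np

def regularized_cost(w, X, y, lam, penalty="l2"):
    """Sketch of J_reg(w) = J(w) + lam * Omega(w), with MSE as the base cost J."""
    residual = X @ w - y
    base_cost = np.mean(residual ** 2)      # J(w): unregularized cost
    if penalty == "l1":
        omega = np.sum(np.abs(w))           # L1 penalty: sum of absolute weights
    else:
        omega = np.sum(w ** 2)              # L2 penalty: sum of squared weights
    return base_cost + lam * omega
```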
Common regularization techniques:
L1 Regularization (Lasso):
- Adds the sum of the absolute values of the weights, λ · Σ|w_j|, to the cost
- Promotes sparsity (drives some weights exactly to zero)
- Useful for feature selection (see the sparsity sketch below)
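A small demonstration of the sparsity effect, assuming scikit-learn is available (Lasso stands in here for L1-regularized linear regression; the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first 3 of 10 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1)   # alpha plays the role of lambda
lasso.fit(X, y)

# Coefficients of irrelevant features are driven exactly to zero -> feature selection.
print(np.round(lasso.coef_, 3))
```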
L2 Regularization (Ridge):
- Adds the sum of the squared weights, λ · Σw_j², to the cost
- Shrinks weights towards zero but rarely makes them exactly zero
- More commonly used, and generally more stable than L1 when features are correlated (see the closed-form sketch below)
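A minimal NumPy sketch of the ridge closed-form solution (variable names are mine, and the bias term is ignored for brevity); increasing lam shrinks the weights but does not zero them out:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Larger lam -> smaller (but typically nonzero) weights:
# w_small = ridge_fit(X, y, lam=0.01)
# w_large = ridge_fit(X, y, lam=100.0)
```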
Dropout:
- Randomly zeroes a fraction of neuron activations on each training pass
- Prevents co-adaptation of features
- Specific to neural networks (see the inverted-dropout sketch below)
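A sketch of inverted dropout applied to a layer's activations (NumPy, hypothetical names); the 1/(1-p) rescaling during training means the layer is used unchanged at test time:

```python
import numpy as np

def dropout_forward(activations, p_drop, rng, training=True):
    """Inverted dropout: zero a random subset of activations during training."""
    if not training or p_drop == 0.0:
        return activations                                 # no-op at test time
    keep_prob = 1.0 - p_drop
    mask = rng.random(activations.shape) < keep_prob       # keep each unit with prob 1 - p_drop
    return activations * mask / keep_prob                  # rescale so expected activation is unchanged

# Example: drop roughly 30% of units in a hidden layer during one training step.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
dropped = dropout_forward(hidden, p_drop=0.3, rng=rng)
```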
Regularization strength is controlled by the hyperparameter λ (lambda). A higher λ means stronger regularization, which reduces variance but may increase bias. Tune λ on a validation set.
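A sketch of tuning λ on a held-out validation set, assuming scikit-learn, with Ridge standing in for any regularized model and synthetic data standing in for a real training set:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=300)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_lam, best_err = None, np.inf
for lam in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(X_train, y_train)        # alpha plays the role of lambda
    err = mean_squared_error(y_val, model.predict(X_val))
    if err < best_err:
        best_lam, best_err = lam, err

print("best lambda:", best_lam, "validation MSE:", round(best_err, 4))
```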
Regularization is most effective when combined with Feature Scaling, so the penalty treats all weights on a comparable scale; otherwise features with very different numeric ranges are penalized unevenly.
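One way to combine the two in practice, assuming scikit-learn: standardize the features inside a pipeline, so scaling is fit only on training data and the penalty sees comparably scaled weights.

```python
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler runs before Ridge, so the L2 penalty applies to weights
# of features that all have zero mean and unit variance.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# model.fit(X_train, y_train)
```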
Reference
Machine Learning Yearning by Andrew Ng