Why Gradient Descent Became Stochastic

Source: Towards Data Science · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Mathematics & Computational Sciences) · Depth: Intermediate, long

Summary

The article derives the parameters of a linear regression model in two ways, with the normal equation and with gradient descent, and then introduces stochastic gradient descent. It starts from a simple linear regression example and derives formulas for the slope and intercept, then generalizes to multiple features in matrix notation to obtain the normal equation. It points out that the normal equation is computationally expensive on large datasets because it requires a matrix inversion. Gradient descent is presented as an iterative alternative, with an explanation of its update rule and the critical role of the learning rate. Finally, Stochastic Gradient Descent (SGD) is introduced for very large datasets: it updates the parameters from a single observation at a time, in contrast to batch gradient descent, which uses the full dataset for every update. Mini-batch gradient descent is also mentioned.
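
As a rough illustration of the closed-form route described above, here is a minimal NumPy sketch. The function name `fit_normal_equation` and the synthetic data are assumptions made for this example and do not come from the original article. The sketch solves the linear system (XᵀX)β = Xᵀy instead of forming an explicit inverse, which is a common numerical choice, although the cost still grows quickly with the number of features.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Closed-form least squares: solve (X^T X) beta = X^T y."""
    # Prepend a column of ones so that beta[0] acts as the intercept.
    Xb = np.column_stack([np.ones(len(X)), X])
    # Solving the system avoids computing an explicit inverse of X^T X.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Synthetic data (illustrative only): y = 2 + 3x plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 + 3.0 * X[:, 0] + rng.normal(scale=0.01, size=200)

beta = fit_normal_equation(X, y)
print(beta)  # intercept and slope, expected to be close to 2 and 3
```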

Key takeaway

Machine learning engineers optimizing linear regression models should prefer iterative methods such as Gradient Descent or Stochastic Gradient Descent when working with large datasets. The Normal Equation gives a closed-form solution, but building and inverting the matrix it requires becomes computationally prohibitive with millions of data points or thousands of features. Tune the learning rate carefully so that training converges efficiently without overshooting the optimal parameters. This matters even more in deep learning, where closed-form solutions are rare.

Key insights

Gradient Descent and its variants offer scalable alternatives to the computationally intensive Normal Equation for optimizing linear regression on large datasets.
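
The variants mentioned here differ mainly in how many observations feed each parameter update. The sketch below is one possible way to express that with a single `batch_size` parameter: 1 gives stochastic gradient descent, the full dataset size gives batch gradient descent, and anything in between gives mini-batch gradient descent. The function name `minibatch_gd`, the hyperparameter values, and the noise-free synthetic data are assumptions chosen for illustration, not the article's own code.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, lr=0.1, epochs=50, seed=0):
    """Fit linear regression by gradient descent on shuffled batches."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([np.ones(len(X)), X])  # intercept column
    n, p = Xb.shape
    beta = np.zeros(p)
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = Xb[idx] @ beta - y[idx]
            # Gradient of the MSE computed on the current batch only.
            grad = 2.0 * Xb[idx].T @ residual / len(idx)
            beta -= lr * grad
    return beta

# Noise-free synthetic data (assumption for the demo): y = 2 + 3x.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 + 3.0 * X[:, 0]

beta_sgd = minibatch_gd(X, y, batch_size=1, epochs=20)          # stochastic
beta_mini = minibatch_gd(X, y, batch_size=32, epochs=200)       # mini-batch
beta_batch = minibatch_gd(X, y, batch_size=len(X), epochs=1000) # full batch
```

With a batch size of 1 each update is cheap but noisy, and with the full dataset each update is exact but expensive. On real, noisy data a constant learning rate typically leaves SGD fluctuating around the optimum rather than settling exactly on it.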

Method

Gradient Descent iteratively updates model parameters β using β := β - α∂MSE/∂β, where α is the learning rate and ∂MSE/∂β is the loss function's gradient.
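
The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the helper names `mse_gradient` and `gradient_descent`, the synthetic data, and the two learning rates are chosen for the example only. It uses the fact that, for MSE = mean((Xβ − y)²), the gradient is (2/n)·Xᵀ(Xβ − y).

```python
import numpy as np

def mse_gradient(Xb, y, beta):
    """Gradient of MSE = mean((Xb @ beta - y)**2) with respect to beta."""
    return 2.0 * Xb.T @ (Xb @ beta - y) / len(y)

def gradient_descent(Xb, y, alpha, steps):
    """Repeat the update beta := beta - alpha * dMSE/dbeta."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        beta = beta - alpha * mse_gradient(Xb, y, beta)
    return beta

# Noise-free synthetic data (assumption for the demo): y = 1 + 4x.
x = np.linspace(-1.0, 1.0, 101)
Xb = np.column_stack([np.ones_like(x), x])  # intercept column plus feature
y = 1.0 + 4.0 * x

# A moderate learning rate converges toward intercept 1 and slope 4.
beta_good = gradient_descent(Xb, y, alpha=0.1, steps=2000)

# A learning rate that is too large overshoots on every step and diverges.
beta_diverged = gradient_descent(Xb, y, alpha=1.5, steps=50)
```

Comparing `beta_good` with `beta_diverged` shows why the learning rate matters: the same update rule either settles on the least-squares solution or moves further from it at each step.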

Best for: AI Student, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.