Logistic Regression from Scratch
Summary
This article walks through implementing logistic regression from scratch for binary classification. It defines a model with a parameter vector β that, as in linear regression, produces a linear output for each n-dimensional data point, but here the goal is to assign the point to one of two classes. The core idea is to transform the model's linear output (z_i, ranging from -∞ to ∞) into a probability (p_i, ranging from 0 to 1) using the logistic (sigmoid) function, which is derived from the log-odds (logit). The model is then fitted by Maximum Likelihood Estimation (MLE), which amounts to minimizing the negative log-likelihood, also known as the Binary Cross-Entropy loss. Because this minimization has no closed-form solution, the article uses gradient descent to approximate the minimizer and derives the gradient of the loss function. Finally, it applies the model to the "Titanic - Machine Learning from Disaster" Kaggle competition data, reaching an accuracy of 77% after 4000 training steps with a learning rate of 0.005.
Key takeaway
For Machine Learning Engineers building binary classifiers, understanding the mathematical foundations of logistic regression is crucial. You should implement the sigmoid function and Binary Cross-Entropy loss, then apply gradient descent to iteratively optimize model parameters. This foundational knowledge will prepare you for more complex neural network architectures and help you debug model training issues effectively.
Key insights
Logistic regression classifies binary data by converting linear model outputs into probabilities via the sigmoid function.
Principles
- The linear output is treated as log-odds, and the sigmoid maps it to a probability.
- Maximum Likelihood Estimation optimizes model parameters.
- Gradient descent minimizes the loss iteratively, since no closed-form solution exists.
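The principles above can be written out as a short chain of equations. This is a sketch of the standard derivation rather than a reproduction of the article's own notation: β, z_i, and p_i come from the summary, while x_i (the i-th data point), y_i (its 0/1 label), and n (the number of data points) are symbols assumed here for illustration.

```latex
\begin{align}
z_i &= x_i^{\top}\beta
  && \text{linear output, } z_i \in (-\infty, \infty) \\
\log\frac{p_i}{1 - p_i} &= z_i
  \;\Longrightarrow\;
  p_i = \frac{1}{1 + e^{-z_i}}
  && \text{log-odds inverted by the sigmoid} \\
\mathcal{L}(\beta) &= -\frac{1}{n}\sum_{i=1}^{n}
  \Bigl[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \Bigr]
  && \text{mean negative log-likelihood (BCE)} \\
\nabla_{\beta}\,\mathcal{L}(\beta) &= \frac{1}{n}\sum_{i=1}^{n} (p_i - y_i)\, x_i
  && \text{gradient used by gradient descent}
\end{align}
```

The gradient line follows from the identity dp_i/dz_i = p_i(1 - p_i), which cancels the denominators that appear when the logarithms are differentiated.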
Method
Define a linear model, convert its output to probabilities using the sigmoid function, then minimize Binary Cross-Entropy loss via gradient descent to fit parameters.
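The method can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the original article's code: the function names, the toy data, and the zero initialization (chosen here for reproducibility) are all hypothetical, while the learning rate of 0.005 and the 4000 steps are the values reported in the summary.

```python
import numpy as np

def sigmoid(z):
    """Map linear outputs z in (-inf, inf) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(beta, X, y, eps=1e-12):
    """Mean Binary Cross-Entropy (negative log-likelihood divided by n)."""
    p = np.clip(sigmoid(X @ beta), eps, 1.0 - eps)  # clip to avoid log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def bce_gradient(beta, X, y):
    """Gradient of the mean BCE loss with respect to beta."""
    return X.T @ (sigmoid(X @ beta) - y) / len(y)

def fit(X, y, learning_rate=0.005, steps=4000):
    """Plain gradient descent; zero initialization is an assumption of this sketch."""
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        beta -= learning_rate * bce_gradient(beta, X, y)
    return beta

# Hypothetical toy data: one feature, label is 1 when the feature is positive.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])  # bias column of 1s plus the feature
y = (x > 0).astype(float)

beta = fit(X, y)
initial_loss = bce_loss(np.zeros(2), X, y)
final_loss = bce_loss(beta, X, y)
accuracy = np.mean((sigmoid(X @ beta) >= 0.5) == y)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy {accuracy:.2f}")
```

Predictions are obtained by thresholding the probability at 0.5, which is a common default rather than something the summary specifies.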
In practice
- Use `np.random.uniform` for initial model parameters.
- Divide the loss and gradient by the number of data points to prevent overflow.
- Add a bias term (column of 1s) to input data.
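The three tips above might look like this in code. Treat it as a rough sketch: the helper names, the toy data, and the initialization range of [-1, 1) are assumptions made for illustration, since the summary names `np.random.uniform` but not its arguments.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of 1s so the first parameter acts as the bias term."""
    return np.column_stack([np.ones(X.shape[0]), X])

def init_params(n_params, low=-1.0, high=1.0):
    """Draw initial parameters uniformly; the [low, high) range is assumed."""
    return np.random.uniform(low, high, size=n_params)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def summed_gradient(beta, X, y):
    """Gradient of the summed loss; its size grows with the number of rows."""
    return X.T @ (sigmoid(X @ beta) - y)

def mean_gradient(beta, X, y):
    """Same gradient divided by the number of data points."""
    return summed_gradient(beta, X, y) / X.shape[0]

# Hypothetical toy data: 1000 points with 2 features each.
np.random.seed(0)
features = np.random.normal(size=(1000, 2))
labels = (features[:, 0] + features[:, 1] > 0).astype(float)

X = add_bias(features)          # shape (1000, 3)
beta = init_params(X.shape[1])  # one parameter per column, bias included
g_sum = summed_gradient(beta, X, labels)
g_mean = mean_gradient(beta, X, labels)
print(np.linalg.norm(g_sum), np.linalg.norm(g_mean))
```

Averaging keeps the gradient's magnitude independent of the dataset size, so the same learning rate behaves similarly whether the data has hundreds or millions of rows.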
Topics
- Logistic Regression
- Binary Classification
- Gradient Descent
- Maximum Likelihood Estimation
- Sigmoid Function
Best for: AI Student, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.