Logistic Regression from Scratch

2026-03-21 · Source: Machine Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, medium

Summary

This article walks through implementing logistic regression from scratch for binary classification. It defines the model with a parameter vector β that, as in linear regression, produces a linear output z_i for each n-dimensional data point; the difference is that the final output is one of two discrete classes. The core step is transforming the linear output (z_i, ranging from −∞ to ∞) into a probability (p_i, ranging from 0 to 1) using the logistic (sigmoid) function, which is derived by treating z_i as the log-odds (logit) of p_i. The article then explains how to fit the model with Maximum Likelihood Estimation (MLE), which is equivalent to minimizing the negative log-likelihood, also known as Binary Cross-Entropy loss. Because this minimization has no closed-form solution, the article uses gradient descent to approximate the minimizer numerically and provides the mathematical derivation of the loss gradient. Finally, it applies the model to the "Titanic - Machine Learning from Disaster" Kaggle competition data, reporting an accuracy of 77% after 4000 training steps with a learning rate of 0.005.

Key takeaway

For Machine Learning Engineers building binary classifiers, understanding the mathematical foundations of logistic regression is crucial. You should implement the sigmoid function and Binary Cross-Entropy loss, then apply gradient descent to iteratively optimize model parameters. This foundational knowledge will prepare you for more complex neural network architectures and help you debug model training issues effectively.

Key insights

Logistic regression classifies binary data by converting linear model outputs into probabilities via the sigmoid function.
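As a rough illustration of this insight, the sketch below maps linear scores to probabilities and then to class labels. The function names, the 0.5 decision threshold, and the tanh-based formulation (chosen here because it avoids overflow for large |z|) are assumptions for illustration, not details taken from the original article.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores z to probabilities in (0, 1).

    Uses the identity sigmoid(z) = 0.5 * (1 + tanh(z / 2)), which is
    mathematically equal to 1 / (1 + exp(-z)) but does not overflow.
    """
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))

def predict_class(X, beta, threshold=0.5):
    """Return 0/1 labels; the 0.5 threshold is an assumed default."""
    return (sigmoid(X @ beta) >= threshold).astype(int)

# Tiny hypothetical example: two points, bias column first.
X_demo = np.array([[1.0, 2.0], [1.0, -2.0]])
beta_demo = np.array([0.0, 1.0])      # linear scores are z = [2, -2]
labels_demo = predict_class(X_demo, beta_demo)
```

A score of 0 corresponds to a probability of exactly 0.5, so thresholding the probability at 0.5 is the same as thresholding the linear output at 0.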

Principles

The linear output is treated as log-odds (logit), which the sigmoid inverts into a probability.
Maximum Likelihood Estimation optimizes model parameters.
Gradient descent minimizes the loss, which is convex but has no closed-form minimizer.
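The relations behind these principles can be written out as follows. The symbols z_i, p_i, and β follow the summary; x_i (feature vector), y_i (label in {0, 1}), and N (number of data points) are notation assumed here, and the 1/N averaging reflects the practice of dividing by the number of data points.

```latex
\begin{align*}
z_i &= \beta^\top x_i = \log\frac{p_i}{1-p_i}
  && \text{(linear output modeled as log-odds)} \\
p_i &= \sigma(z_i) = \frac{1}{1+e^{-z_i}}
  && \text{(sigmoid, the inverse of the logit)} \\
L(\beta) &= -\frac{1}{N}\sum_{i=1}^{N}\bigl[\,y_i\log p_i + (1-y_i)\log(1-p_i)\bigr]
  && \text{(negative log-likelihood, i.e. Binary Cross-Entropy)} \\
\nabla_\beta L &= \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)\,x_i
  && \text{(gradient used in each descent step)}
\end{align*}
```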

Method

Define a linear model, convert its output to probabilities using the sigmoid function, then minimize Binary Cross-Entropy loss via gradient descent to fit parameters.
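The method can be sketched end to end as below. This is a minimal illustration under stated assumptions, not the article's code: the function names, the synthetic one-feature dataset, the initialization range, and the `eps` clipping in the loss are all hypothetical choices. The default learning rate (0.005) and step count (4000) mirror the settings reported in the summary.

```python
import numpy as np

def sigmoid(z):
    # Overflow-safe form of 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def bce_loss(X, y, beta, eps=1e-12):
    """Mean Binary Cross-Entropy; eps clipping (assumed) avoids log(0)."""
    p = np.clip(sigmoid(X @ beta), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit(X, y, lr=0.005, steps=4000, seed=0):
    """Fit beta by plain gradient descent on the mean BCE loss."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(-0.5, 0.5, size=X.shape[1])  # assumed init range
    for _ in range(steps):
        p = sigmoid(X @ beta)
        grad = X.T @ (p - y) / len(y)   # gradient of the mean BCE loss
        beta -= lr * grad
    return beta

# Synthetic data (hypothetical): the label is 1 when the feature is positive.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])   # bias column of 1s + feature
y = (x > 0).astype(float)

beta = fit(X, y)
loss_before = bce_loss(X, y, np.zeros(2))   # equals log(2) at beta = 0
loss_after = bce_loss(X, y, beta)
accuracy = np.mean((sigmoid(X @ beta) >= 0.5) == (y == 1.0))
```

Comparing `loss_after` with `loss_before` gives a quick sanity check that the descent steps are moving in the right direction.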

In practice

Use `np.random.uniform` for initial model parameters.
Divide the loss and gradient by the number of data points to keep their magnitudes small and prevent overflow.
Add a bias term (column of 1s) to input data.
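The three tips above might look like this in NumPy. The helper names and the [-0.5, 0.5) initialization range are assumptions; only the use of `np.random.uniform`, the averaging over data points, and the column of 1s come from the summary.

```python
import numpy as np

def add_bias(X):
    """Prepend a column of 1s so that beta[0] acts as the intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def init_params(n_params, low=-0.5, high=0.5):
    """Random starting point; the range is an assumed choice."""
    return np.random.uniform(low, high, size=n_params)

def mean_gradient(X, y, beta):
    """BCE gradient divided by the number of data points."""
    p = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))   # sigmoid of the scores
    return X.T @ (p - y) / X.shape[0]

# Hypothetical raw features: 3 data points, 2 features each.
raw = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
y = np.array([0.0, 1.0, 1.0])
X = add_bias(raw)                     # shape (3, 3), first column all 1s
beta = init_params(X.shape[1])

# Averaging makes the gradient independent of dataset size:
# duplicating every row leaves the mean gradient unchanged.
g_once = mean_gradient(X, y, beta)
g_twice = mean_gradient(np.vstack([X, X]), np.concatenate([y, y]), beta)
```

Because the averaged gradient does not grow with the number of rows, the same learning rate behaves similarly on small and large datasets.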

Topics

Logistic Regression
Binary Classification
Gradient Descent
Maximum Likelihood Estimation
Sigmoid Function

Best for: AI Student, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.