How I Finally Understood Backpropagation by Deriving It by Hand

2026-06-22 · Source: Deep Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, short

Summary

The article presents a complete, hand-derived explanation of backpropagation for a small neural network, emphasizing the underlying calculus over memorized formulas. It computes ∂E/∂w for each weight, starting with the output layer. Applying the chain rule, ∂E/∂w₂ splits into three factors: ∂E/∂ŷ = −(y − ŷ), ∂ŷ/∂z₂ = ŷ(1 − ŷ), and ∂z₂/∂w₂ = a₁. Together they give ∂E/∂w₂ = −(y − ŷ) · ŷ(1 − ŷ) · a₁. These factors correspond to a squared-error loss E = ½(y − ŷ)² and a sigmoid output activation.

The explanation then extends to the hidden layer. There, ∂E/∂w₁ follows the longer dependency chain w₁ → z₁ → a₁ → z₂ → ŷ → E and picks up two more factors, ∂z₂/∂a₁ = w₂ and ∂a₁/∂z₁ = a₁(1 − a₁). The article defines δ₂ and δ₁ (the output and hidden deltas) to simplify the gradient expressions. It concludes that backpropagation is fundamentally the chain rule applied to a neural network's computational graph.
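A minimal Python sketch of this derivation follows. It assumes a scalar network with input x, one sigmoid hidden unit, one sigmoid output, squared error E = ½(y − ŷ)², and no biases. The input name x, the omitted biases, and the function name `backprop_scalar` are assumptions for illustration, not the article's code. Each chain-rule factor from the summary gets its own variable, so the code reads like the derivation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def backprop_scalar(x: float, y: float, w1: float, w2: float) -> tuple[float, float]:
    """Return (dE/dw1, dE/dw2) for a hypothetical network:
    z1 = w1*x, a1 = sigmoid(z1), z2 = w2*a1, y_hat = sigmoid(z2),
    E = 0.5 * (y - y_hat)**2. Biases are omitted (an assumption)."""
    # Forward pass: keep every intermediate value, because the backward pass needs them.
    z1 = w1 * x
    a1 = sigmoid(z1)
    z2 = w2 * a1
    y_hat = sigmoid(z2)

    # Output layer: the three chain-rule factors for dE/dw2.
    dE_dyhat = -(y - y_hat)          # ∂E/∂ŷ
    dyhat_dz2 = y_hat * (1 - y_hat)  # ∂ŷ/∂z₂ (sigmoid derivative)
    dz2_dw2 = a1                     # ∂z₂/∂w₂
    dE_dw2 = dE_dyhat * dyhat_dz2 * dz2_dw2

    # Hidden layer: extend the chain w1 -> z1 -> a1 -> z2 -> y_hat -> E.
    dz2_da1 = w2                     # ∂z₂/∂a₁
    da1_dz1 = a1 * (1 - a1)          # ∂a₁/∂z₁
    dz1_dw1 = x                      # ∂z₁/∂w₁ (assumes z1 = w1 * x)
    dE_dw1 = dE_dyhat * dyhat_dz2 * dz2_da1 * da1_dz1 * dz1_dw1

    return dE_dw1, dE_dw2
```

Calling `backprop_scalar(x=1.0, y=1.0, w1=0.5, w2=-0.3)` returns the pair (∂E/∂w₁, ∂E/∂w₂). The hidden-layer product reuses the first two output-layer factors. That reuse is what the delta terms make explicit.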

Key takeaway

For machine learning engineers or AI students who struggle with the mechanics of backpropagation, deriving it by hand is well worth the effort. Working the chain rule through to ∂E/∂w for each weight shows what a gradient means: how much the error changes when that weight changes. That understanding makes it much easier to debug implementations. Treat this as a foundational exercise; it is among the highest-return investments for truly understanding how neural networks behave.
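One concrete way a hand derivation helps with debugging is a gradient check. You compare the derived gradients against central finite differences of the loss. This sketch assumes a tiny network with one input x, one sigmoid hidden unit, one sigmoid output, squared error, and no biases. The function names and the delta-based form of the gradients are illustrative choices, not the article's code.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w1: float, w2: float, x: float, y: float) -> float:
    """Squared error E = 0.5*(y - y_hat)**2 of the hypothetical 1-1-1 sigmoid network."""
    a1 = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * a1)
    return 0.5 * (y - y_hat) ** 2


def analytic_grads(w1: float, w2: float, x: float, y: float) -> tuple[float, float]:
    """Hand-derived (dE/dw1, dE/dw2), written with output and hidden deltas."""
    a1 = sigmoid(w1 * x)
    y_hat = sigmoid(w2 * a1)
    delta2 = -(y - y_hat) * y_hat * (1 - y_hat)  # dE/dz2
    delta1 = delta2 * w2 * a1 * (1 - a1)         # dE/dz1
    return delta1 * x, delta2 * a1


def numeric_grads(w1: float, w2: float, x: float, y: float,
                  h: float = 1e-6) -> tuple[float, float]:
    """Central finite differences (E(w + h) - E(w - h)) / 2h for each weight."""
    g1 = (loss(w1 + h, w2, x, y) - loss(w1 - h, w2, x, y)) / (2 * h)
    g2 = (loss(w1, w2 + h, x, y) - loss(w1, w2 - h, x, y)) / (2 * h)
    return g1, g2


def gradient_check(w1: float, w2: float, x: float, y: float, tol: float = 1e-6) -> bool:
    """True if analytic and numeric gradients agree within a mixed absolute/relative tolerance."""
    for a, n in zip(analytic_grads(w1, w2, x, y), numeric_grads(w1, w2, x, y)):
        if abs(a - n) > tol * max(1.0, abs(a), abs(n)):
            return False
    return True
```

With a correct derivation, `gradient_check(0.5, -0.3, 1.0, 1.0)` returns `True`. Now suppose you drop a factor in `analytic_grads`, such as the `w2` in `delta1`. The check then reports a mismatch and points you at the layer whose derivation is wrong.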

Key insights

Deriving backpropagation by hand provides a fundamental understanding of neural network gradient computation.

Principles

Backpropagation applies the chain rule.
Each gradient measures how the error changes as its weight changes.
Deriving the equations by hand clarifies network behavior.

Method

The article details a step-by-step derivation of backpropagation for a small neural network. It computes ∂E/∂w for the output and hidden layers with the chain rule and defines delta terms to simplify the resulting expressions.
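Written out in full, one common way to set up the delta terms is shown here. Each δ is defined as the derivative of the error with respect to a layer's pre-activation, δ = ∂E/∂z. This convention may differ in sign or grouping from the article's. The input symbol x and the pre-activation z₁ = w₁x (no biases) are also assumptions, since the summary does not name the input.

```latex
\begin{aligned}
E &= \tfrac{1}{2}(y - \hat{y})^2, \quad \hat{y} = \sigma(z_2), \quad z_2 = w_2 a_1, \quad a_1 = \sigma(z_1), \quad z_1 = w_1 x \\[4pt]
\delta_2 &= \frac{\partial E}{\partial z_2}
  = \frac{\partial E}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial z_2}
  = -(y - \hat{y})\,\hat{y}(1 - \hat{y}) \\[4pt]
\frac{\partial E}{\partial w_2} &= \delta_2\,\frac{\partial z_2}{\partial w_2} = \delta_2\, a_1 \\[4pt]
\delta_1 &= \frac{\partial E}{\partial z_1}
  = \delta_2\,\frac{\partial z_2}{\partial a_1}\,\frac{\partial a_1}{\partial z_1}
  = \delta_2\, w_2\, a_1(1 - a_1) \\[4pt]
\frac{\partial E}{\partial w_1} &= \delta_1\,\frac{\partial z_1}{\partial w_1} = \delta_1\, x
\end{aligned}
```

Under this convention, each weight's gradient is its layer's delta times the input that weight multiplies. Each hidden delta comes from the delta of the layer above, so the full chain never has to be re-derived from scratch.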

In practice

Derive backpropagation for a small network.
Work through equations by hand.
Analyze dependency chains for gradients.
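Once the gradients are derived, they plug directly into gradient descent. The sketch assumes a hypothetical network with one input, one sigmoid hidden unit, one sigmoid output, squared error, and no biases. `train`, the learning rate `lr`, and the step count are illustrative choices, not values from the article.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(x: float, y: float, w1: float, w2: float,
          lr: float = 0.5, steps: int = 1000) -> tuple[float, float, list[float]]:
    """Plain gradient descent on the hypothetical 1-1-1 sigmoid network using
    the hand-derived gradients. Returns final weights and the per-step loss."""
    history: list[float] = []
    for _ in range(steps):
        # Forward pass; record the loss before this step's update.
        a1 = sigmoid(w1 * x)
        y_hat = sigmoid(w2 * a1)
        history.append(0.5 * (y - y_hat) ** 2)

        # Backward pass: output delta first, then the hidden delta (uses the current w2).
        delta2 = -(y - y_hat) * y_hat * (1 - y_hat)
        delta1 = delta2 * w2 * a1 * (1 - a1)

        # Update step: move each weight against its gradient dE/dw.
        w2 -= lr * delta2 * a1
        w1 -= lr * delta1 * x
    return w1, w2, history
```

For example, `w1, w2, hist = train(x=1.0, y=0.9, w1=0.5, w2=-0.3)` lets you compare `hist[0]` with `hist[-1]` to see the loss fall. Both deltas are computed before either weight changes. Updating `w2` first and then using it for `delta1` would mix old and new parameters within a single step.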

Topics

Backpropagation
Neural Networks
Gradient Descent
Chain Rule
Machine Learning Education
Calculus for AI

Best for: AI Student, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Deep Learning on Medium.