➡️ Lecture 4: Forward Propagation

How a neural network makes its first guess. We'll follow the data on its exciting journey through the network, from input to final prediction!

🤔Part 1: What is Forward Propagation?

Imagine you show a picture of a cat to a brand new, untrained neural network and ask, "What is this?" The network has no idea what a cat is. Its internal weights and biases are just random numbers. The process it uses to take that image and produce its very first, completely random guess is called Forward Propagation.

It's the essential first step in the life of a neural network. It's a one-way trip for information, starting from the raw input data (the pixels of the cat image), moving forward through all the hidden layers of neurons, and ending at the final, conclusive guess from the output layer. This entire process is a giant chain of calculations, where the output of one layer becomes the input for the next.

Analogy: The Gourmet Sandwich Factory

Think of a neural network as a high-tech factory that makes sandwiches. Forward propagation is the entire assembly line from start to finish.

  • Input Layer (The Loading Dock): The raw ingredients arrive. For a sandwich, this is your bread, lettuce, tomato, and cheese. For a neural network, this is your data, like `[hours_studied, hours_slept]`.
  • Hidden Layer 1 (The Prep Station): This station takes the raw ingredients and does the first transformation. A "Chopping Neuron" might process the lettuce and tomato. A "Cheese Slicing Neuron" handles the cheese. It turns basic ingredients into prepared ingredients.
  • Hidden Layer 2 (The Assembly Station): This station doesn't see the original ingredients, only the prepped ones from the previous station. It combines them in a specific order, perhaps adding sauces or spices. It creates a more complex product.
  • Output Layer (The Final Inspection): This station looks at the fully assembled sandwich and makes a final judgment. It outputs a label: "This is a Club Sandwich" or "This is a Veggie Delight". This is the network's final prediction.

Forward propagation is this complete, uninterrupted flow. No station can work until the one before it is finished. It’s the process of turning raw data into a sophisticated prediction.

👣Part 2: A Step-by-Step Journey

Let's follow a single piece of data through a simple network. Our goal is to predict if a student will pass an exam based on two features.

Our Sample Problem

Inputs (X): `[hours_studied, hours_slept]`

Output (Y): `[will_pass]` (1 for Yes, 0 for No)

Example Student: Studied for 5 hours, slept for 8 hours. So, our input vector is `X = [5, 8]`.

Step 1: The Input Layer to the First Hidden Layer

The journey begins. The input data `[5, 8]` is sent to every neuron in the first hidden layer. Each of these hidden neurons has its own unique set of weights and its own bias. Let's say our hidden layer has two neurons, H1 and H2. Each will perform its own calculation.

The formula is the same one we learned for a single neuron: the weighted sum + bias. We do this for every neuron in the layer.

$$ Z^{[1]} = X \cdot W^{[1]} + b^{[1]} $$

Simple Explanation:
• The superscript `[1]` tells us we are working on the calculations for layer 1.
• `X` is our input data, a matrix where each row is a sample. For our one student, it's `[5, 8]`.
• `W¹` is the weight matrix for layer 1. It contains all the connection strengths. If we have 2 inputs and 2 hidden neurons, this matrix will have 4 weights in a 2x2 shape.
• `b¹` is the bias vector for layer 1. Each hidden neuron (H1 and H2) gets its own personal bias to "nudge" its result.
• `Z¹` is the final result of this calculation: a vector containing the raw scores for both H1 and H2, before they are activated.
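Here's a minimal NumPy sketch of this step. The input is our example student, and the weights and biases are the same illustrative numbers used in Part 3 below:

```python
import numpy as np

# Example numbers from Part 3 of this lecture (the weights and biases are illustrative).
X  = np.array([[5.0, 8.0]])        # one student: [hours_studied, hours_slept], shape (1, 2)
W1 = np.array([[0.8, -0.5],
               [0.4,  0.9]])       # layer-1 weights: 2 inputs x 2 hidden neurons
b1 = np.array([[0.1, -0.2]])       # layer-1 biases: one per hidden neuron

Z1 = X @ W1 + b1                   # weighted sum + bias for H1 and H2 at once
print(Z1)                          # [[7.3 4.5]]
```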

Step 2: Applying the Activation Function

The raw scores in `Z¹` could be any number (e.g., `[7.3, -2.1]`). These aren't very useful for the next layer, so we squash them into a consistent range by passing them through an activation function (like Sigmoid). This forces every value to be between 0 and 1, turning the raw score into a meaningful signal strength.

$$ A^{[1]} = \sigma(Z^{[1]}) $$

Simple Explanation:
• We take the raw scores from the previous step, `Z¹`.
• We apply the Sigmoid function (σ) to every single number inside `Z¹`. A score of 7.3 might become ~0.999, and -2.1 might become ~0.11.
• `A¹` is the final output (the "activations") of the first hidden layer. This is the processed, meaningful information that gets passed forward to the next stage of the factory.
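A short sketch of this step, reusing the raw scores `[7.3, 4.5]` that the Part 3 numbers produce:

```python
import numpy as np

def sigmoid(z):
    # Squash every raw score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

Z1 = np.array([[7.3, 4.5]])        # raw scores from Step 1
A1 = sigmoid(Z1)                   # activations passed on to the next layer
print(A1)                          # approximately [[0.999 0.989]]
```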

Step 3: On to the Output Layer!

The process now repeats itself exactly, but for the next layer. The activated output of our hidden layer, `A¹`, now becomes the input for the final output layer.

$$ Z^{[2]} = A^{[1]} \cdot W^{[2]} + b^{[2]} $$
$$ \hat{y} = A^{[2]} = \sigma(Z^{[2]}) $$

Simple Explanation:
• We perform the same weighted sum, but this time using the weights (W²) and biases (b²) that connect the hidden layer to the output layer.
• The final result, `A²`, is our network's official prediction! We often give it a special name, `ŷ` ("y-hat"), to distinguish it from the true answer, `y`.
• If `ŷ` is 0.89, the network is formally guessing that there is an 89% probability that the student will pass. This is the finished sandwich, ready to be served.
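And a sketch of this final step, using the output-layer weights and bias from Part 3 below:

```python
import numpy as np

def sigmoid(z):
    # Squash each raw score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

A1 = np.array([[0.999, 0.989]])    # hidden-layer activations from Step 2
W2 = np.array([[ 1.2],
               [-0.8]])            # weights from the 2 hidden neurons to the 1 output neuron
b2 = np.array([[0.3]])             # output-neuron bias

Z2    = A1 @ W2 + b2               # the same weighted-sum-plus-bias pattern
y_hat = sigmoid(Z2)                # the network's prediction, y-hat
print(y_hat)                       # approximately [[0.67]] -> about a 67% chance of passing
```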

🔢Part 3: The Full Calculation: A Look at the Matrices

Let's visualize the entire calculation with the numbers from our interactive demo. This will make the matrix math crystal clear.

• Input X: `[5, 8]`
• Hidden Layer Weights W¹: `[[0.8, -0.5], [0.4, 0.9]]`
• Hidden Layer Biases b¹: `[0.1, -0.2]`

Calculation for Z¹

$$ Z^{[1]} = \begin{bmatrix} 5 & 8 \end{bmatrix} \cdot \begin{bmatrix} 0.8 & -0.5 \\ 0.4 & 0.9 \end{bmatrix} + \begin{bmatrix} 0.1 & -0.2 \end{bmatrix} $$

$$ Z^{[1]} = \begin{bmatrix} (5 \cdot 0.8 + 8 \cdot 0.4) & (5 \cdot -0.5 + 8 \cdot 0.9) \end{bmatrix} + \begin{bmatrix} 0.1 & -0.2 \end{bmatrix} $$

$$ Z^{[1]} = \begin{bmatrix} (4.0 + 3.2) & (-2.5 + 7.2) \end{bmatrix} + \begin{bmatrix} 0.1 & -0.2 \end{bmatrix} $$

$$ Z^{[1]} = \begin{bmatrix} 7.2 & 4.7 \end{bmatrix} + \begin{bmatrix} 0.1 & -0.2 \end{bmatrix} = \begin{bmatrix} 7.3 & 4.5 \end{bmatrix} $$

Calculation for A¹

$$ A^{[1]} = \sigma(Z^{[1]}) = \sigma(\begin{bmatrix} 7.3 & 4.5 \end{bmatrix}) = \begin{bmatrix} 0.999 & 0.989 \end{bmatrix} $$

• Input A¹: `[0.999, 0.989]`
• Output Layer Weights W²: `[[1.2], [-0.8]]`
• Output Layer Bias b²: `[0.3]`

Calculation for Z² and Final Prediction ŷ

$$ Z^{[2]} = \begin{bmatrix} 0.999 & 0.989 \end{bmatrix} \cdot \begin{bmatrix} 1.2 \\ -0.8 \end{bmatrix} + \begin{bmatrix} 0.3 \end{bmatrix} $$

$$ Z^{[2]} = \begin{bmatrix} (0.999 \cdot 1.2 + 0.989 \cdot -0.8) \end{bmatrix} + \begin{bmatrix} 0.3 \end{bmatrix} $$

$$ Z^{[2]} = \begin{bmatrix} 1.199 - 0.791 \end{bmatrix} + \begin{bmatrix} 0.3 \end{bmatrix} = \begin{bmatrix} 0.708 \end{bmatrix} $$

$$ \hat{y} = A^{[2]} = \sigma(0.708) = 0.670 $$
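To tie the whole calculation together, here is a minimal NumPy sketch of the complete forward pass; it reproduces the numbers above (up to rounding):

```python
import numpy as np

def sigmoid(z):
    # Squash each raw score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    # One forward pass: input -> hidden layer -> output layer.
    Z1 = X @ W1 + b1          # hidden-layer raw scores
    A1 = sigmoid(Z1)          # hidden-layer activations
    Z2 = A1 @ W2 + b2         # output-layer raw score
    A2 = sigmoid(Z2)          # final prediction, y-hat
    return Z1, A1, Z2, A2

X  = np.array([[5.0, 8.0]])                      # [hours_studied, hours_slept]
W1 = np.array([[0.8, -0.5], [0.4, 0.9]])
b1 = np.array([[0.1, -0.2]])
W2 = np.array([[1.2], [-0.8]])
b2 = np.array([[0.3]])

Z1, A1, Z2, y_hat = forward(X, W1, b1, W2, b2)
print(Z1)      # [[7.3 4.5]]
print(A1)      # ~[[0.999 0.989]]
print(Z2)      # ~[[0.708]]
print(y_hat)   # ~[[0.670]]
```

Packing the whole pass into one `forward` function makes the chain explicit: each layer's output (`A¹`) is literally the next layer's input.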

🚀Interactive Forward Propagation

Let's see it in action! Our simple network has 2 input neurons, a hidden layer with 2 neurons, and 1 output neuron. The weights and biases are already set. Change the student's study and sleep hours to see how the prediction changes!

The demo walks through the same three stages for whichever inputs you choose:

1. Hidden Layer Sums (Z¹): `Z = (Inputs * Weights) + Bias`, computed for neurons H1 and H2.
2. Hidden Layer Activations (A¹): `A = sigmoid(Z)`, applied to each hidden neuron's sum.
3. Output Layer (ŷ): combines the H1 and H2 outputs into the final sum Z² and the prediction A², which determines whether the network predicts that the student will pass.
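To reproduce the demo's behaviour in code, here is a small sketch that reuses the `forward` function and weights from the Part 3 sketch; the sample students and the 0.5 pass threshold are illustrative assumptions, not part of the original demo:

```python
# Reuses numpy, forward(), and W1, b1, W2, b2 from the Part 3 sketch above.
# The sample students and the 0.5 decision threshold are illustrative assumptions.
for hours_studied, hours_slept in [(1, 4), (5, 8), (9, 9)]:
    X = np.array([[float(hours_studied), float(hours_slept)]])
    *_, y_hat = forward(X, W1, b1, W2, b2)
    verdict = "PASS" if y_hat[0, 0] >= 0.5 else "FAIL"
    print(f"studied={hours_studied}h, slept={hours_slept}h -> "
          f"p(pass)={y_hat[0, 0]:.3f} ({verdict})")
```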