Lab 4: MNIST Digit Classification

🔢 Teach a computer to read handwriting using professional AI tools.

Libraries: TensorFlow, Keras • Estimated Time: 3 hours

Part 1: From "Scratch" to Pro Tools

In the last lab, you built a neural network from scratch using NumPy. You manually coded forward propagation, backward propagation, and the weight updates. It was a fantastic way to learn what's happening under the hood!

But in the real world, data scientists don't do that every time. They use powerful libraries that automate the process. Today, we'll use TensorFlow and Keras, the most popular tools for building neural networks.

Our New Toolkit:

  • TensorFlow: A powerful library from Google for high-performance numerical computation. Think of it as the engine of our race car.
  • Keras: A user-friendly API that runs on top of TensorFlow. It provides simple, building-block-like commands to create neural networks. Think of it as the easy-to-use steering wheel and dashboard for our race car. You tell Keras what you want, and it handles all the complex TensorFlow engine work for you.

Our mission is to solve the "Hello, World!" of computer vision: classifying the MNIST dataset of handwritten digits.

Part 2: Getting the Data

Keras makes it incredibly easy to access famous datasets like MNIST.

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST dataset from Keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

Just like that, we have our data! It's already split into two parts: a training set of 60,000 images (`x_train`, `y_train`) for teaching the network, and a test set of 10,000 images (`x_test`, `y_test`) for grading it at the end.

💡 Your Turn

Let's inspect our data. Use the `.shape` attribute to see the dimensions of the data. In a new Colab cell, type and run the following:

print(f"x_train shape: {x_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape}")
print(f"y_test shape: {y_test.shape}")

What does `(60000, 28, 28)` mean? It means we have 60,000 images, and each image is a 28x28 grid of pixels.
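If you want to poke at the data a bit more, here's a quick optional check you can run in another cell (the index `0` is arbitrary; any index up to 59999 works):

# Each image is a 28x28 grid; each label is a single integer
print(x_train[0].shape)  # (28, 28) - one image
print(y_train[0])        # 5 - the digit that image shows
print(x_test.shape[0])   # 10000 - images held back for testing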

Part 3: Preparing the Evidence

Before we feed our data to the network, we need to prepare it. This is one of the most important steps in any machine learning project.

3.1 Visualizing the Data

Let's look at one of the images to see what we're working with.

# Display the first image in the training data
plt.imshow(x_train[0], cmap='gray')
plt.show()
print(f"The label for the first image is: {y_train[0]}")
[Image: a grayscale handwritten digit 5]
The label for the first image is: 5

3.2 Normalizing the Pixel Values

The pixel values in our images range from 0 (black) to 255 (white). Neural networks work best when input values are small, typically between 0 and 1. So, we'll normalize the data by dividing every pixel value by 255.

# Normalize pixel values to be between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0
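A subtle side effect worth knowing: dividing by `255.0` also converts the arrays from integers to floating-point numbers, which is what the network expects. A quick optional check confirms both the new type and the new range:

# After normalization: floating-point values between 0.0 and 1.0
print(x_train.dtype)                 # float64
print(x_train.min(), x_train.max()) # 0.0 1.0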

💡 Your Turn

After running the code above, print a pixel from the middle of the first image to confirm it's been normalized. Type `print(x_train[0][10][10])`. You should see a number between 0 and 1, not a large value like 192.

3.3 Flattening the Images

Our network will use simple ("Dense") layers that expect a 1D list of numbers, not a 2D grid. We need to "flatten" each 28x28 image into a single array of 784 values.

# Flatten the images from 28x28 to 784
x_train = x_train.reshape(-1, 28*28)
x_test = x_test.reshape(-1, 28*28)

print(f"New x_train shape: {x_train.shape}")
New x_train shape: (60000, 784)
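As an aside, Keras can also do this flattening for you inside the model via a `keras.layers.Flatten()` layer, in which case you would skip the reshape above and feed in the raw 28x28 images. A minimal sketch of that alternative (we'll stick with the manual reshape in this lab):

# Alternative: let the model flatten each 28x28 image itself
alt_model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 -> 784
  keras.layers.Dense(units=128, activation='relu'),
  keras.layers.Dense(units=10, activation='softmax')
])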

Part 4: Assembling the Brain

Now for the fun part! With Keras, building a neural network is like stacking LEGO blocks. We'll use a `Sequential` model, which means a simple, layer-by-layer stack.

model = keras.Sequential([
  # Input Layer: We need to tell the model what shape the input data is (784 features)
  keras.layers.InputLayer(input_shape=(784,)),

  # Hidden Layer 1: A "Dense" layer means every neuron is connected to every neuron in the previous layer.
  # We'll use 128 neurons. The more neurons, the more complex patterns it can learn.
  # 'relu' (Rectified Linear Unit) is a common and effective activation function.
  keras.layers.Dense(units=128, activation='relu'),

  # Output Layer: It must have 10 neurons, one for each digit (0-9).
  # 'softmax' activation is perfect for classification. It turns the outputs into probabilities,
  # so all 10 neuron outputs will add up to 1. The highest probability is the model's prediction.
  keras.layers.Dense(units=10, activation='softmax')
])
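To see what softmax actually does, here's a tiny standalone example using the `tf` and `np` imports from Part 2 (the scores are made up):

# Softmax turns arbitrary scores into probabilities that sum to 1
scores = np.array([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(scores).numpy()
print(probs)        # approximately [[0.659 0.242 0.099]]
print(probs.sum())  # 1.0 (up to rounding)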

Model Summary

Let's print a summary of our architecture.

model.summary()
Model: "sequential" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 128) 100480 dense_1 (Dense) (None, 10) 1290 ================================================================= Total params: 101,770 Trainable params: 101,770 Non-trainable params: 0 _________________________________________________________________

Detective's Note: Look at "Param #". The first layer has 100,480 parameters! That's `784 inputs * 128 neurons + 128 biases`. Imagine calculating the gradient for all of those by hand! This is why we use Keras.
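You can verify that arithmetic yourself in a code cell:

# weights (one per input-to-neuron connection) plus one bias per neuron
print(784 * 128 + 128)  # 100480 parameters in the hidden layer
print(128 * 10 + 10)    # 1290 parameters in the output layer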

💡 Your Turn

What would the `Param #` be for the first dense layer if you changed it from `128` neurons to `64` neurons? Calculate it first (`784 * 64 + 64`), then change the code and run `model.summary()` to check your answer.

Part 5: Teaching the Brain

5.1 Compiling the Model

Before we can train, we need to "compile" the model. This is where we define the learning process.

model.compile(
  # Optimizer: This is the algorithm that performs the gradient descent. 'adam' is a popular and effective choice.
  optimizer='adam',

  # Loss Function: This measures how wrong the model's predictions are.
  # 'sparse_categorical_crossentropy' is used when you have multiple classes and the labels are integers (like 0, 1, 2...).
  loss='sparse_categorical_crossentropy',

  # Metrics: This is what we want to monitor during training. We want to see the 'accuracy'.
  metrics=['accuracy']
)
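The "sparse" part of the loss name matters: our labels are plain integers (`y_train[0]` is `5`), not one-hot vectors. If we had one-hot encoded the labels (for example with `keras.utils.to_categorical`), we would use `categorical_crossentropy` instead. A quick comparison, just for inspection:

# An integer label vs. its one-hot equivalent
print(y_train[0])                                   # 5
print(keras.utils.to_categorical(y_train[:1], 10))  # [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]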

5.2 Training the Model

Now we're ready to train! The `.fit()` method is where the magic happens. It will show the data to the network, calculate the loss, and update the weights over and over.

# Train the model
history = model.fit(
  x_train, y_train,
  epochs=5, # An epoch is one full pass through the entire training dataset.
  batch_size=32, # Process the data in batches of 32 images at a time.
  validation_split=0.2 # Use 20% of training data for validation during training.
)
Epoch 1/5
1500/1500 [==============================] - 5s 3ms/step - loss: 0.2871 - accuracy: 0.9184 - val_loss: 0.1554 - val_accuracy: 0.9557
Epoch 2/5
1500/1500 [==============================] - 4s 3ms/step - loss: 0.1268 - accuracy: 0.9631 - val_loss: 0.1119 - val_accuracy: 0.9673
...
Epoch 5/5
1500/1500 [==============================] - 4s 3ms/step - loss: 0.0519 - accuracy: 0.9840 - val_loss: 0.0818 - val_accuracy: 0.9753

Wow! After just 5 passes through the data, our model is achieving over 97% accuracy on the validation set! This is the power of TensorFlow and Keras.

Part 6: Grading the Test

Our model did well on the validation data, but the real test is how it performs on the `x_test` set, which it has never seen during training.

test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"\nTest Accuracy: {test_accuracy*100:.2f}%")
313/313 [==============================] - 1s 2ms/step - loss: 0.0768 - accuracy: 0.9759
Test Accuracy: 97.59%

Amazing! Now let's make a prediction on a single image and see the result.

# Select an image from the test set
test_image = x_test[0]
# The model expects a batch of images, so we add an extra dimension
test_image_batch = np.expand_dims(test_image, axis=0)
# Make a prediction
prediction = model.predict(test_image_batch)
predicted_label = np.argmax(prediction)

# Show the image and the prediction
plt.imshow(test_image.reshape(28,28), cmap='gray')
plt.show()
print(f"Model prediction: {predicted_label}")
print(f"Actual label: {y_test[0]}")
[Image: a grayscale handwritten digit 7]
Model prediction: 7
Actual label: 7
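The `prediction` variable actually holds all ten softmax probabilities; `np.argmax` simply picks the index of the largest one. You can inspect the full distribution like this (your exact numbers will vary from run to run):

# Show the model's confidence for each digit 0-9
for digit, prob in enumerate(prediction[0]):
  print(f"Digit {digit}: {prob:.4f}")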

💡 Your Turn

Copy the prediction code block above. Change the index from `[0]` to another number (e.g., `[25]`) to test a different image. Does the model get it right?

Part 7: Your Mission - Improve the Model

Assignment: Become an AI Architect

Your goal is to improve the test accuracy of our model. Can you get it above 98%? Experiment with the following ideas in your Colab notebook. Remember to rebuild and re-train the model after each change.

Ideas to Try:

  1. More Neurons: Change the number of units in the first Dense layer from `128` to `256`. Does a bigger layer help?
  2. Deeper Network: Add a second hidden Dense layer. After the first `Dense(128, ...)` layer, add another one, e.g., `keras.layers.Dense(units=64, activation='relu')`. Does making the network deeper improve performance?
  3. More Training: Increase the number of `epochs` from `5` to `10`. Does giving the model more time to learn help?
  4. The Dropout Technique: Overfitting is when a model learns the training data too well but fails on new data. A "Dropout" layer randomly "turns off" some neurons during training to prevent this. Try adding `keras.layers.Dropout(0.2)` after your Dense layer(s) to randomly drop 20% of the neurons during training; see the sketch right after this list.
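To make idea 4 concrete, here is one possible version of the model with Dropout added (a sketch only; the layer sizes and the 0.2 rate are starting points for you to tune, not the "right" answer):

# One experiment: a deeper network with Dropout after each hidden layer
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(784,)),
  keras.layers.Dense(units=128, activation='relu'),
  keras.layers.Dropout(0.2),  # randomly drops 20% of activations during training
  keras.layers.Dense(units=64, activation='relu'),
  keras.layers.Dropout(0.2),
  keras.layers.Dense(units=10, activation='softmax')
])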

For each experiment, record the final test accuracy in a text cell. Which combination of changes gave you the best result?

Part 8: Bonus - Fashion Police

Now that you can classify digits, let's try something a bit harder: classifying images of clothing! The Fashion-MNIST dataset has the exact same format as MNIST (28x28 grayscale images, so 784 pixels when flattened, and 10 classes), but it's a more challenging problem.

Kaggle & The Fashion-MNIST Dataset

This dataset is also built into Keras, making it easy to start. The labels are numbers from 0 to 9, corresponding to different clothing items like 'T-shirt/top', 'Trouser', 'Pullover', etc.
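Keras doesn't ship the class names with the dataset, so it's handy to keep them in a list; indexing the list with a label turns a number into a readable name. The standard label order is:

# Fashion-MNIST class names, in order of the integer labels 0-9
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[9])  # Ankle boot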

Your Challenge:

  1. Load the Data: In a new notebook, use `(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()` to get the data.
  2. Build and Train: Copy your best model architecture from the MNIST assignment. Preprocess the fashion data in the same way (normalize, flatten) and train your network on it.
  3. Evaluate: What test accuracy can you achieve on this harder dataset? It will likely be lower than what you got on MNIST. Can you tweak your model to get the best possible accuracy?

Part 9: Submission Guidelines

To complete this lab, please follow these instructions carefully.

  1. Complete all "Your Turn" tasks and the main assignment (Part 7) in a single Google Colab notebook. The Fashion-MNIST challenge (Part 8) is an optional bonus.
  2. In the assignment section, use Text Cells to clearly label each experiment you run and to report the final test accuracy for each one. Conclude with which model performed the best.
  3. Ensure all your code cells have been run so that their outputs and plots are visible.
  4. When you are finished, generate a shareable link. In Colab, click the "Share" button in the top right.
  5. In the popup, under "General access", change "Restricted" to "Anyone with the link" and ensure the role is set to "Viewer".
  6. Click "Copy link" and submit this link as your assignment.