Lab 5: Hyperparameter Optimization (Extended)

🎛️ Becoming the master chef of AI: Finding the perfect recipe for your model.

Libraries: Keras, Optuna • Estimated Time: 2+ hours

Part 1: What are Hyperparameters?

Imagine you're baking a cake (our neural network). The ingredients (like flour, sugar, eggs) are your data. The recipe instructions (mix, then bake) are your model code.

But what about the settings you can tweak? How long do you bake it for? At what temperature? How much baking powder do you use? These settings are the hyperparameters.

In AI, hyperparameters are the high-level settings you choose *before* training begins. They control the learning process itself.

Key Hyperparameters We'll Tune Today:

  • Learning Rate: How big of a step the model takes when adjusting its weights.
  • Number of Layers: How "deep" the network is.
  • Number of Neurons per Layer: How "wide" each layer is.
  • Optimizer: The specific algorithm used for gradient descent.
  • Dropout Rate: A regularization technique to prevent overfitting.

Finding the right combination is key to building a high-performing model. This process is called Hyperparameter Optimization or Tuning.

Part 2: Our Starting Point - A Baseline Model

Before we can tune, we need a starting point. Let's build a simple model for the Fashion-MNIST dataset. This will be our "baseline" that we'll try to improve.

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load and preprocess the Fashion-MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

def create_baseline_model():
  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax')
  ])

  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model

baseline_model = create_baseline_model()
baseline_model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)

loss, accuracy = baseline_model.evaluate(x_test, y_test, verbose=0)
print(f"Baseline Accuracy: {accuracy * 100:.2f}%")
Baseline Accuracy: 88.45%

Okay, our starting point is ~88.5% accuracy after 10 epochs. Our mission is to beat this!

Part 3: Manual Tuning - The Detective Work

Let's start by manually changing one hyperparameter at a time to build our intuition. This will show you why automation is so valuable.

💡 Your Turn #1: Tune the Learning Rate

The `adam` optimizer has a default learning rate of `0.001`. Is that the best? Copy the baseline model code and create a new optimizer with a different learning rate. Try a few values. What happens with a very small rate (`0.0001`) or a very large rate (`0.01`)?

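# Example for one learning rate
fast_optimizer = keras.optimizers.Adam(learning_rate=0.01)
model = create_baseline_model()  # fresh, untrained copy of the baseline architecture
model.compile(optimizer=fast_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])  # recompile with the new optimizer
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
print(f"Accuracy: {model.evaluate(x_test, y_test, verbose=0)[1] * 100:.2f}%")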
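To see why the step size matters, here is a minimal pure-Python sketch with no Keras involved. It runs plain gradient descent on the toy loss f(w) = (w - 3)^2. The function name `gradient_descent`, the starting point, and the three learning rates are illustrative choices, not part of the lab code.

```python
def gradient_descent(learning_rate, steps=20, w=0.0):
    """Minimize the toy loss f(w) = (w - 3)**2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2 * (w - 3)          # derivative of (w - 3)**2
        w -= learning_rate * grad   # step against the gradient
    return w

for lr in (0.01, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"lr={lr}: w={w:.4f}, distance from optimum={abs(w - 3):.4f}")
```

In 20 steps, the tiny rate barely moves toward the optimum at w = 3, and the moderate rate gets close. The rate above 1.0 overshoots further on every step and diverges. That threshold applies only to this toy loss. Real networks have their own limits, but the same three regimes appear.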

💡 Your Turn #2: Tune the Number of Neurons

Our baseline has one hidden layer with `128` neurons. Is this optimal? Try changing this number. Re-run the training with `64` neurons, and then with `256` neurons. Does a wider or narrower layer work better?

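# Example for changing neurons
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(units=256, activation='relu'), # Changed this line
  keras.layers.Dense(units=10, activation='softmax')
])
# Compile and train exactly like the baseline so only the width differs
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
print(f"Accuracy: {model.evaluate(x_test, y_test, verbose=0)[1] * 100:.2f}%")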

💡 Your Turn #3: Tune the Number of Layers

What if our model needs to be deeper to understand the data? Try adding a second hidden layer. Re-run the training with this new architecture. Does a deeper model improve accuracy?

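# Example for adding a layer
model = keras.Sequential([
  keras.layers.Flatten(input_shape=(28, 28)),
  keras.layers.Dense(units=128, activation='relu'),
  keras.layers.Dense(units=64, activation='relu'), # New hidden layer
  keras.layers.Dense(units=10, activation='softmax')
])
# Compile and train exactly like the baseline so only the depth differs
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
print(f"Accuracy: {model.evaluate(x_test, y_test, verbose=0)[1] * 100:.2f}%")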
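Width and depth are hyperparameters, and they also determine how many weights the model must learn. This stdlib-only sketch counts the trainable parameters of the Dense architectures from Your Turn #2 and #3. The totals should match what `model.summary()` reports for these models. The helper `count_dense_params` is a hypothetical illustration, not a Keras API.

```python
def count_dense_params(n_inputs, hidden_units, n_outputs=10):
    """Count weights + biases in a stack of Dense layers (Flatten adds none)."""
    total = 0
    fan_in = n_inputs
    for units in list(hidden_units) + [n_outputs]:
        total += fan_in * units + units  # weight matrix + bias vector
        fan_in = units
    return total

architectures = {
    "baseline (128)": [128],
    "narrow (64)": [64],
    "wide (256)": [256],
    "deeper (128, 64)": [128, 64],
}
for name, hidden in architectures.items():
    print(f"{name}: {count_dense_params(28 * 28, hidden):,} trainable parameters")
```

Adding the 64-unit second layer costs far fewer parameters than doubling the first layer's width. Most of the weights sit in the first layer, which connects to all 784 input pixels.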

After trying these, you've probably realized this manual process is slow and painful, because you are only changing one thing at a time. What if the best model has a small learning rate AND two layers? You'd have to test every combination. This is where automated tools shine.
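The cost of testing every combination grows multiplicatively. This quick stdlib count uses three hypothetical candidate values per hyperparameter to show how fast a manual grid gets out of hand.

```python
from itertools import product

# Hypothetical candidate values for a manual grid
learning_rates = [0.0001, 0.001, 0.01]
units = [64, 128, 256]
n_layers = [1, 2, 3]

grid = list(product(learning_rates, units, n_layers))
print(f"{len(grid)} combinations")  # each one is a full 10-epoch training run

# Adding just one more hyperparameter with 3 options triples the work
optimizers = ["adam", "rmsprop", "sgd"]
print(f"{len(grid) * len(optimizers)} combinations with optimizer choice")
```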

Part 4: Automated Tuning with Optuna 🤖

Optuna is a library that automates the search for the best hyperparameters. You define the *range* of values to test, and Optuna intelligently explores that range to find the best combination.
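For single-objective studies, Optuna's default sampler is TPE (Tree-structured Parzen Estimator). It uses the scores of earlier trials to decide where to sample next. To make the idea of an automated search concrete, here is the simplest version of the loop, plain random search, in pure Python: sample a configuration, score it, and keep the best.

This is not TPE. It is the baseline that smarter samplers try to beat. `toy_score` is a hypothetical stand-in for training a model and returning its accuracy.

```python
import math
import random

def toy_score(learning_rate, n_units):
    """Hypothetical stand-in for 'train a model and return validation accuracy'.
    Peaks at 0.9 when learning_rate=1e-3 and n_units=200."""
    lr_term = (math.log10(learning_rate) + 3) ** 2
    units_term = ((n_units - 200) / 100) ** 2
    return 0.9 - 0.01 * lr_term - 0.01 * units_term

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_score, best_params = -math.inf, None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform in [1e-5, 1e-1]
            "n_units": rng.randint(32, 256),             # inclusive range, like suggest_int
        }
        score = toy_score(**params)
        if score > best_score:                           # we are maximizing
            best_score, best_params = score, params
    return best_score, best_params

best_score, best_params = random_search(n_trials=20)
print(f"Best score: {best_score:.4f}  Best params: {best_params}")
```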

4.1 Installing Optuna

!pip install optuna

4.2 Defining the Objective Function

This function tells Optuna how to build and evaluate one version of our model.

import optuna

def objective(trial):
  # Suggest a learning rate from 1e-5 to 1e-1 (on a log scale)
  lr = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)
  # Suggest a number of neurons from 32 to 256
  n_units = trial.suggest_int('n_units', 32, 256)

  model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(units=n_units, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax')
  ])

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
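  # Score on a validation split (the last 20% of the training data), not on the test set:
  # choosing hyperparameters by test accuracy would make the final test score optimistic
  history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2, verbose=0)

  # Optuna maximizes whatever this function returns
  return history.history['val_accuracy'][-1]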
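The `log=True` flag in `suggest_float` matters because the learning-rate range spans four orders of magnitude. Conceptually, it samples from the range in the log domain, so each decade (1e-5 to 1e-4, 1e-4 to 1e-3, and so on) gets an equal share of trials. This stdlib sketch compares the two sampling schemes; the seed and sample count are arbitrary.

```python
import math
import random

rng = random.Random(42)
low, high, n = 1e-5, 1e-1, 10_000

# Uniform on the raw scale: most samples land in the top decade
linear = [rng.uniform(low, high) for _ in range(n)]
# Uniform on the log scale: each decade gets roughly an equal share
log_scaled = [math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(n)]

def share_below(samples, threshold):
    """Fraction of samples strictly below the threshold."""
    return sum(s < threshold for s in samples) / len(samples)

print(f"linear sampling: {share_below(linear, 1e-3):.1%} of samples below 1e-3")
print(f"log sampling:    {share_below(log_scaled, 1e-3):.1%} of samples below 1e-3")
```

With linear sampling, only about 1% of draws fall below 1e-3. The region at and below Adam's default learning rate would barely be explored.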

4.3 Running the Study

Now we create a "study" and tell Optuna to run our `objective` function. We'll run 20 trials.

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)

best_trial = study.best_trial
print(f"Best Accuracy: {best_trial.value}")
print(f"Best Params: {best_trial.params}")
Best Accuracy: 0.8915...
Best Params: {'learning_rate': 0.0008..., 'n_units': 210}

4.4 Visualizing the Results

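A major benefit of Optuna is its built-in visualization module, `optuna.visualization`, which produces interactive Plotly charts showing how the search progressed and which hyperparameters mattered most.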

# Shows how the best score improved over trials
optuna.visualization.plot_optimization_history(study).show()
[Figure: Optuna optimization history plot]

# Shows which hyperparameters were most important
optuna.visualization.plot_param_importances(study).show()
[Figure: Optuna parameter importances plot]

In our run, these plots showed that `learning_rate` mattered more than `n_units` and that the best results appeared early in the search. Your plots may look different, because each study samples different trials.
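For a maximized objective, the "Best Value" line in the optimization history plot is simply the running maximum of the trial scores. This stdlib sketch computes that line from hypothetical trial accuracies.

```python
from itertools import accumulate

# Hypothetical accuracies from 8 trials, in the order they ran
trial_values = [0.861, 0.874, 0.852, 0.889, 0.881, 0.870, 0.891, 0.885]

# Running maximum: the best score seen up to and including each trial
best_so_far = list(accumulate(trial_values, max))
print(best_so_far)  # [0.861, 0.874, 0.874, 0.889, 0.889, 0.889, 0.891, 0.891]
```

A line that stays flat for many trials means the search found its best region early, and extra trials are adding little.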

Part 5: Your Mission - The Ultimate Tuner

Assignment: Expand the Search Space

Your goal is to find an even better model by giving Optuna more hyperparameters to search through. Can you break 90% accuracy?

Your Task:

Modify the `objective` function to search for three more hyperparameters:

  1. Number of Layers: Use `trial.suggest_int('n_layers', 1, 3)`. Use a `for` loop to build the model with the chosen number of layers.
  2. Dropout Rate: For each hidden layer, add a Dropout layer. Suggest a rate with `trial.suggest_float('dropout', 0.1, 0.5)`. This helps prevent overfitting.
  3. Choice of Optimizer: Use `trial.suggest_categorical('optimizer', ['adam', 'rmsprop', 'sgd'])`.
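The loop in task 1 follows a simple pattern. For each hidden layer, add a Dense layer followed by a Dropout layer. Give each Dense layer its own parameter name, such as the `n_units_l{i}` naming used in Part 6, so Optuna can tune each width separately.

The pure-Python sketch below only builds a text description of that architecture. `describe_architecture` is a hypothetical helper. In your notebook, each string would instead be a `model.add(keras.layers.Dense(...))` or `model.add(keras.layers.Dropout(...))` call.

For task 3, note that `trial.suggest_categorical` returns the chosen string itself. You still need to turn it into an optimizer object that uses your learning rate, for example with a dict that maps each name to `keras.optimizers.Adam`, `keras.optimizers.RMSprop`, or `keras.optimizers.SGD`.

```python
def describe_architecture(n_layers, units_per_layer, dropout):
    """Hypothetical helper: list the layers as text.
    Flatten, then (Dense, Dropout) per hidden layer, then the output layer."""
    layers = ["Flatten(28x28)"]
    for i in range(n_layers):
        # Each hidden layer gets its own tunable width, e.g. 'n_units_l0', 'n_units_l1'
        layers.append(f"Dense({units_per_layer[i]}, relu) [n_units_l{i}]")
        layers.append(f"Dropout({dropout})")
    layers.append("Dense(10, softmax)")
    return layers

for line in describe_architecture(2, [128, 64], 0.3):
    print(line)
```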

Run the study again with this much larger search space for at least 50 trials. Report the best accuracy and the full set of best parameters you found.

Part 6: Going Further - Pruning Unpromising Trials

Running 50+ trials can be slow. What if a trial is obviously doing poorly after just a few epochs? It's a waste of time to finish training it. Pruning is a technique to automatically stop these unpromising trials early.

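We can add a "callback" to our `.fit()` method that reports the validation accuracy to Optuna after each epoch. The study's pruner compares that value with earlier trials. If the trial looks unpromising, the callback raises `optuna.TrialPruned` to stop training, and Optuna records the trial as pruned instead of failed.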
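The pruning decision itself is made by the study's pruner. By default, `optuna.create_study` uses `MedianPruner`. Roughly, it stops a trial when that trial's best intermediate value so far is worse than the median of earlier trials' values at the same step.

The function below is a simplified pure-Python sketch of that rule for a maximized metric. `should_prune` is a hypothetical name, and the history values are toy data. Optuna's real pruner also supports options such as warm-up steps and interval steps.

```python
import statistics

def should_prune(current, previous_trials, step, n_startup_trials=5):
    """Simplified median stopping rule for a metric we maximize (e.g. val_accuracy).

    current: intermediate values reported so far by the running trial (one per epoch)
    previous_trials: lists of intermediate values from earlier trials
    """
    if len(previous_trials) < n_startup_trials:
        return False  # not enough history to compare against yet
    peers = [t[step] for t in previous_trials if len(t) > step]
    if not peers:
        return False  # no earlier trial reached this step
    best_so_far = max(current[: step + 1])
    return best_so_far < statistics.median(peers)

# Toy val_accuracy histories from five earlier trials
history = [
    [0.80, 0.84, 0.86],
    [0.78, 0.83, 0.85],
    [0.81, 0.85, 0.87],
    [0.79, 0.82, 0.84],
    [0.82, 0.86, 0.88],
]
print(should_prune([0.70, 0.74], history, step=1))  # lags the median of 0.84 -> True
print(should_prune([0.81, 0.85], history, step=1))  # keeps up -> False
```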

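# Recent Optuna releases ship this callback in the separate optuna-integration package:
#   !pip install optuna-integration
from optuna.integration import TFKerasPruningCallback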

def objective_with_pruning(trial):
  # ... (Suggest all your hyperparameters here as in the assignment) ...
  n_layers = trial.suggest_int('n_layers', 1, 3)
  lr = trial.suggest_float('learning_rate', 1e-4, 1e-1, log=True)

  model = keras.Sequential()
  model.add(keras.layers.Flatten(input_shape=(28, 28)))
  for i in range(n_layers):
    n_units = trial.suggest_int(f'n_units_l{i}', 32, 256)
    model.add(keras.layers.Dense(n_units, activation='relu'))
  model.add(keras.layers.Dense(10, activation='softmax'))

  model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

  # Add the pruning callback
  pruning_callback = TFKerasPruningCallback(trial, 'val_accuracy')

  history = model.fit(
    x_train, y_train,
    epochs=20,
    validation_split=0.2, # Pruning requires a validation set
    callbacks=[pruning_callback],
    verbose=0
  )

  # Return the last reported validation accuracy
  return history.history['val_accuracy'][-1]

💡 Your Turn #4: Implement Pruning

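Integrate the pruning callback into your full assignment solution. Create a new study with `direction='maximize'`, since the objective returns validation accuracy and Optuna minimizes by default, and run it. You should see messages like `Trial 5 pruned` in the output. Does this speed up your search?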

Part 7: Bonus - The Housing Price Challenge

Classification is one type of problem. Another is Regression, where you predict a continuous number. The classic "House Prices" competition on Kaggle is a perfect place to apply your new tuning skills.

Kaggle: House Prices - Advanced Regression Techniques

The goal is to predict the final sale price of a house based on 79 features.

Your Challenge:

  1. Setup: This dataset is more complex. You will need to handle missing values (e.g., fill them with the mean/median) and convert categorical features into numbers (e.g., using `pd.get_dummies`). This preprocessing is a major part of the challenge.
  2. Modify the Objective Function:
    • The final layer must have 1 neuron with no activation: `keras.layers.Dense(units=1)`.
    • The loss function for regression is typically `mean_squared_error`.
    • The metric you want to MINIMIZE is the root mean squared error (RMSE). You can track it with `keras.metrics.RootMeanSquaredError()` and have your objective return it.
    • Set `direction='minimize'` when you create your Optuna study.
  3. Run the Study: Set up an Optuna study to find the best hyperparameters. Report the lowest error you achieve.
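RMSE is the square root of the mean squared error. It is expressed in the same units as the target (dollars here), and because the square root is monotonic, minimizing MSE also minimizes RMSE. The Kaggle competition scores submissions by the RMSE between the logarithms of predicted and actual prices, so you may want to evaluate, or even train, on log(SalePrice). Below is a stdlib sketch of both metrics; the prices and predictions are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the average squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def rmse_log(y_true, y_pred):
    """RMSE between log prices: a given percentage miss counts about the same on cheap and expensive houses."""
    return rmse([math.log(t) for t in y_true], [math.log(p) for p in y_pred])

# Hypothetical sale prices and predictions
prices = [200_000, 150_000, 400_000]
preds = [210_000, 140_000, 380_000]
print(f"RMSE:       {rmse(prices, preds):,.0f}")
print(f"RMSE (log): {rmse_log(prices, preds):.4f}")
```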

Part 8: Submission Guidelines

To complete this lab, please follow these instructions carefully.

  1. Complete all "Your Turn" tasks and the main assignment from Part 5 in a single Google Colab notebook. The Kaggle project is a bonus.
  2. For the assignment, make sure the output of your final Optuna study (with all hyperparameters and pruning) is visible, showing the final best value and best parameters.
  3. Add a Text Cell at the end summarizing the best accuracy you found and the hyperparameters that achieved it.
  4. Ensure all your code cells have been run so that their outputs and plots are visible.
  5. When you are finished, generate a shareable link: in Colab, click "Share" and set General access to "Anyone with the link" with the "Viewer" role.
  6. Click "Copy link" and submit this link as your assignment.