Lab 5: Hyperparameter Optimization (Extended)
🎛️ Becoming the master chef of AI: Finding the perfect recipe for
your model.
Libraries: Keras, Optuna • Estimated Time: 2+ hours
Part 1: What are Hyperparameters?
Imagine you're baking a cake (our neural network). The ingredients (like flour, sugar, eggs) are your
data. The recipe instructions (mix, then bake) are your model code.
But what about the settings you can tweak? How long do you bake it for? At what temperature?
How much baking powder do you use? These settings are the hyperparameters.
In AI, hyperparameters are the high-level settings you choose *before* training begins. They
control the learning process itself.
Key Hyperparameters We'll Tune Today:
- Learning Rate: How big a step the model takes each time it adjusts its weights.
- Number of Layers: How "deep" the network is.
- Number of Neurons per Layer: How "wide" each layer is.
- Optimizer: The specific algorithm used for gradient descent.
- Dropout Rate: The fraction of neurons randomly switched off during training, a regularization technique that helps prevent overfitting.
Finding the right combination is key to getting the best performance out of a model. This process is
called Hyperparameter Optimization or Tuning.
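To make the learning rate concrete before touching a neural network, here is a minimal sketch of plain gradient descent on a toy function. The function `f(w) = (w - 3)^2` and the helper name `gradient_descent` are illustrative assumptions and are not part of the lab code.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize the toy function f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)            # derivative of (w - 3)^2
        w = w - learning_rate * grad  # the learning rate scales the step
    return w

for lr in (0.01, 0.1, 1.1):
    w = gradient_descent(lr)
    print(f"lr={lr}: w={w:.3f}, distance from optimum={abs(w - 3):.3f}")
```

With the small rate the weight has barely moved toward 3 after 20 steps, with the moderate rate it is close, and with the large rate every step overshoots and the weight drifts further away. The same trade-off applies to the learning rate of a real optimizer.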
Part 2: Our Starting Point - A Baseline Model
Before we can tune, we need a starting point. Let's build a simple model for the Fashion-MNIST dataset.
This will be our "baseline" that we'll try to improve.
import tensorflow as tf
from tensorflow import keras
import numpy as np
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0
def create_baseline_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(units=128, activation='relu'),
        keras.layers.Dense(units=10, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model
baseline_model = create_baseline_model()
baseline_model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
loss, accuracy = baseline_model.evaluate(x_test, y_test, verbose=0)
print(f"Baseline Accuracy: {accuracy * 100:.2f}%")
Baseline Accuracy: 88.45%
Okay, our starting point is ~88.5% accuracy after 10 epochs (your exact number will vary slightly from run to run). Our mission is to beat this!
Part 3: Manual Tuning - The Detective Work
Let's start by manually changing one hyperparameter at a time to build our intuition. This will show you
why automation is so valuable.
💡 Your Turn #1: Tune the Learning Rate
The `adam` optimizer has a default learning rate of `0.001`. Is that the best? Copy the baseline
model code and create a new optimizer with a different learning rate. Try a few values. What happens
with a very slow rate (`0.0001`) or a very fast rate (`0.01`)?
fast_optimizer = keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=fast_optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
💡 Your Turn #2: Tune the Number of Neurons
Our baseline has one hidden layer with `128` neurons. Is this optimal? Try changing this number.
Re-run the training with `64` neurons, and then with `256` neurons. Does a wider or narrower layer
work better?
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(units=256, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax')
])
💡 Your Turn #3: Tune the Number of Layers
What if our model needs to be deeper to understand the data? Try adding a second hidden layer. Re-run
the training with this new architecture. Does a deeper model improve accuracy?
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(units=128, activation='relu'),
    keras.layers.Dense(units=64, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax')
])
After trying these, you've probably realized this manual process is slow and painful. You
are only changing one thing at a time. What if the best model has a slow learning rate AND two layers?
You'd have to test every combination. This is where automated tools shine.
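To see how quickly "every combination" grows, the short sketch below counts the models a full grid search would have to train. The candidate values in `grid` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical candidate values, three per hyperparameter.
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "n_layers": [1, 2, 3],
    "n_units": [64, 128, 256],
    "optimizer": ["adam", "rmsprop", "sgd"],
    "dropout": [0.1, 0.3, 0.5],
}

# Every combination of one value per hyperparameter.
combinations = list(product(*grid.values()))
print(f"{len(combinations)} models to train")  # 3**5 = 243
```

Even with only three options per setting, five hyperparameters already mean 243 full training runs, and each extra hyperparameter multiplies that number again.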
Part 4: Automated Tuning with Optuna 🤖
Optuna is a library that automates the search for the best hyperparameters. You define
the *range* of values to test, and Optuna intelligently explores that range to find the best
combination.
4.1 Installing Optuna
!pip install optuna
4.2 Defining the Objective Function
This function tells Optuna how to build and evaluate one version of our model.
import optuna
def objective(trial):
    # Ask Optuna for the hyperparameters to try in this trial.
    lr = trial.suggest_float('learning_rate', 1e-5, 1e-1, log=True)
    n_units = trial.suggest_int('n_units', 32, 256)
    # Build the same architecture as the baseline, with the suggested width.
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(units=n_units, activation='relu'),
        keras.layers.Dense(units=10, activation='softmax')
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
    # The returned value is the score Optuna tries to maximize.
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    return accuracy
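The `log=True` argument in the objective above asks for the learning rate to be sampled on a logarithmic scale. The plain-Python sketch below imitates that idea with the `random` module to show why it matters for a range like `1e-5` to `1e-1`. It is an illustration only, not Optuna's internal code, and the helper `share_below` is a made-up name.

```python
import math
import random

random.seed(0)
low, high = 1e-5, 1e-1
n = 10_000

# Linear scale: every value between low and high is equally likely.
linear = [random.uniform(low, high) for _ in range(n)]

# Log scale: sample the exponent uniformly, then map back.
log_scale = [
    math.exp(random.uniform(math.log(low), math.log(high))) for _ in range(n)
]

def share_below(samples, threshold=1e-3):
    """Fraction of samples smaller than the threshold."""
    return sum(s < threshold for s in samples) / len(samples)

print(f"linear: {share_below(linear):.1%} of samples below 1e-3")
print(f"log:    {share_below(log_scale):.1%} of samples below 1e-3")
```

On a linear scale only about 1% of samples fall below `1e-3`, so small learning rates would almost never be tried. On a log scale about half of them do, which spreads the trials evenly across the orders of magnitude.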
4.3 Running the Study
Now we create a "study" and tell Optuna to run our `objective` function. We'll run 20 trials.
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=20)
best_trial = study.best_trial
print(f"Best Accuracy: {best_trial.value}")
print(f"Best Params: {best_trial.params}")
Best Accuracy: 0.8915...
Best Params: {'learning_rate': 0.0008..., 'n_units': 210}
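One caveat about the number above: the objective scores every trial on `x_test`, so the best accuracy is selected on the test set and can be slightly optimistic. A common alternative, sketched here as an assumption and not what this lab's code does, is to hold out part of the training data for scoring trials and keep the test set for one final check. The arrays below are random stand-ins so the example runs without downloading anything.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays with the same layout as Fashion-MNIST images and labels.
x_all = rng.random((1000, 28, 28))
y_all = rng.integers(0, 10, size=1000)

# Shuffle once, then hold out 20% of the examples for scoring trials.
order = rng.permutation(len(x_all))
n_val = len(x_all) // 5
val_idx, train_idx = order[-n_val:], order[:-n_val]

x_tr, y_tr = x_all[train_idx], y_all[train_idx]
x_val, y_val = x_all[val_idx], y_all[val_idx]
print(x_tr.shape, x_val.shape)
```

With a split like this, the objective would train on `x_tr` and return the accuracy on `x_val`, and `x_test` would be evaluated only once with the best parameters.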
4.4 Visualizing the Results
A huge benefit of Optuna is its built-in visualization tools. They help you understand the search.
optuna.visualization.plot_optimization_history(study).show()
optuna.visualization.plot_param_importances(study).show()
From these plots, we can see the `learning_rate` was more important than `n_units` in our search, and the
best results were found early on.
Part 5: Your Mission - The Ultimate Tuner
Assignment: Expand the Search Space
Your goal is to find an even better model by giving Optuna more hyperparameters to
search through. Can you break 90% accuracy?
Your Task:
Modify the `objective` function to search for three more hyperparameters:
- Number of Layers: Use `trial.suggest_int('n_layers', 1, 3)`. Use a `for` loop
to build the model with the chosen number of layers.
- Dropout Rate: For each hidden layer, add a Dropout layer. Suggest a rate with
`trial.suggest_float('dropout', 0.1, 0.5)`. This helps prevent overfitting.
- Choice of Optimizer: Use `trial.suggest_categorical('optimizer', ['adam',
'rmsprop', 'sgd'])`.
Run the study again with this much larger search space for at least 50
trials. Report the best accuracy and the full set of best parameters you found.
Part 6: Going Further - Pruning Unpromising Trials
Running 50+ trials can be slow. What if a trial is obviously doing poorly after just a few epochs? It's a
waste of time to finish training it. Pruning is a technique to automatically stop these
unpromising trials early.
We can add a "callback" to our `.fit()` method that reports the validation accuracy to Optuna after each
epoch. If the trial looks unpromising compared with earlier trials, the callback raises `optuna.TrialPruned` and Optuna stops the trial.
from optuna.integration import TFKerasPruningCallback
def objective_with_pruning(trial):
    n_layers = trial.suggest_int('n_layers', 1, 3)
    lr = trial.suggest_float('learning_rate', 1e-4, 1e-1, log=True)
    model = keras.Sequential()
    model.add(keras.layers.Flatten(input_shape=(28, 28)))
    # Each hidden layer gets its own width, suggested under a unique name.
    for i in range(n_layers):
        n_units = trial.suggest_int(f'n_units_l{i}', 32, 256)
        model.add(keras.layers.Dense(n_units, activation='relu'))
    model.add(keras.layers.Dense(10, activation='softmax'))
    model.compile(optimizer=keras.optimizers.Adam(lr), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # Report val_accuracy after every epoch so Optuna can prune the trial.
    pruning_callback = TFKerasPruningCallback(trial, 'val_accuracy')
    history = model.fit(
        x_train, y_train,
        epochs=20,
        validation_split=0.2,
        callbacks=[pruning_callback],
        verbose=0
    )
    # Score the trial by the validation accuracy of the final epoch.
    return history.history['val_accuracy'][-1]
💡 Your Turn #4: Implement Pruning
Integrate the pruning callback into your full assignment solution. Create a new study and run it. You
should see messages like `Trial 5 pruned` in the output. Does this speed up your search?
Part 7: Bonus - The Housing Price Challenge
Classification is one type of problem. Another is Regression, where you predict a
continuous number. The classic "House Prices" competition on Kaggle is a perfect place to apply your new
tuning skills.
The goal is to predict the final sale price of a house based on 79 features.
Your Challenge:
- Setup: This dataset is more complex. You will need to handle missing values (e.g.,
fill them with the mean/median) and convert categorical features into numbers (e.g., using
`pd.get_dummies`). This preprocessing is a major part of the challenge.
- Modify the Objective Function:
- The final layer must have **1 neuron** with no activation: `keras.layers.Dense(units=1)`.
- The loss function for regression is typically `mean_squared_error`.
- The metric you want to MINIMIZE is `root_mean_squared_error`.
- Set `direction='minimize'` when you create your Optuna study.
- Run the Study: Set up an Optuna study to find the best hyperparameters. Report the
lowest error you achieve.
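The preprocessing step described above can be sketched on a tiny table before you try it on the real data. The frame below is a toy stand-in: the column names are borrowed for illustration and the values are made up, so treat every detail of it as an assumption.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Kaggle data; values are invented for illustration.
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", None],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Separate the target from the features.
y = df.pop("SalePrice").to_numpy(dtype="float32")

# Fill numeric gaps with the column median.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# One-hot encode text columns; dummy_na adds a column that flags missing values.
X = pd.get_dummies(df, dummy_na=True).to_numpy(dtype="float32")
print(X.shape, y.shape)
```

After this step `X` contains only numbers and no missing values, which is what a Keras model expects as input. Sale prices are large numbers, so scaling the features and the target is often worth trying as well.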
Part 8: Submission Guidelines
To complete this lab, please follow these instructions carefully.
- Complete all "Your Turn" tasks and the main assignment (Part 5) in a single Google Colab notebook. The
Kaggle project (Part 7) is a bonus.
- For the assignment, make sure the output of your final Optuna study (with all hyperparameters and
pruning) is visible, showing the final best value and best parameters.
- Add a Text Cell at the end summarizing the best accuracy you found and the hyperparameters that
achieved it.
- Ensure all your code cells have been run so that their outputs and plots are visible.
- When you are finished, generate a shareable link. In Colab, click "Share" and set
general access to "Anyone with the link" with the "Viewer" role.
- Click "Copy link" and submit this link as your assignment.