Hyperparameter tuning
Machine learning algorithms usually have many hyperparameters. You need to tune these: there is a recipe for it, but it is also partly an art.
Intended for: BSc, MSc, PhD
1. Definition of hyperparameter
A parameter is a quantity that we change/optimize during the learning process (based on the data). Example: the parameters in a neural network.
A hyperparameter is a quantity that influences how the learning process actually executes. Examples: the number of layers in a neural network, the learning rate, etc.
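To make the distinction concrete, here is a minimal sketch of a tiny linear model trained by gradient descent. The function name and the specific values are illustrative, not from the text above.

```python
def train_linear(xs, ys, learning_rate=0.01, n_steps=100):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # parameter: updated from the data during learning
    for _ in range(n_steps):  # learning_rate, n_steps: hyperparameters,
        # fixed before training and controlling how learning executes
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data generated by y = 2x
w = train_linear(xs, ys)  # w converges near 2.0
```

The weight `w` is a parameter (the data determines it), while `learning_rate` and `n_steps` are hyperparameters (you determine them, and they shape the learning process itself).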
2. Always tune your hyperparameters
Hyperparameters are absolutely crucial for the performance of a learning algorithm. If they are off, your algorithm will not work, no matter how smart it is. Therefore: always tune (at least the most important) hyperparameters.
If you did not tune your hyperparameters, you cannot conclude that your learning algorithm does not work.
3. Tune the most important parameters first
Alright, you know you need to tune hyperparameters, but where do you start? You typically have many hyperparameters, and you do not have the budget to vary all of them. Therefore, always tune the most important ones first. These include:
Learning rate: in any machine learning experiment with a learning rate, tune the learning rate first. Always. It is almost by definition the most important hyperparameter.
Exploration rate: in any reinforcement learning experiment, also tune the exploration rate (there is usually one parameter that scales the exploration-exploitation trade-off).
4. Perform a grid search (or more advanced method)
Once you have determined which hyperparameters you will tune, you have to decide on an actual approach.
The basic approach is a grid search. Take, for example, 3-4 values for the learning rate and 3-4 values for the exploration parameter, and try each combination (for a few repetitions).
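A plain grid search can be sketched as below. The `run_experiment` function is a hypothetical stand-in for your actual training and evaluation run; its scoring rule is invented here purely so the sketch runs end to end.

```python
import itertools
import random

def run_experiment(learning_rate, exploration_rate, seed):
    # Placeholder: substitute your real training + evaluation here.
    # This toy score peaks at lr=0.001, eps=0.1, with a little noise.
    random.seed(seed)
    return (-abs(learning_rate - 0.001)
            - abs(exploration_rate - 0.1)
            + random.gauss(0, 1e-4))

learning_rates = [0.1, 0.01, 0.001, 0.0001]   # varied in orders of magnitude
exploration_rates = [0.3, 0.1, 0.03]
n_repetitions = 3  # average over a few seeds per combination

results = {}
for lr, eps in itertools.product(learning_rates, exploration_rates):
    scores = [run_experiment(lr, eps, seed) for seed in range(n_repetitions)]
    results[(lr, eps)] = sum(scores) / len(scores)

best = max(results, key=results.get)  # best-scoring (lr, eps) combination
```

Note that the cost grows multiplicatively: 4 learning rates x 3 exploration rates x 3 repetitions is already 36 runs, which is why you only grid over the most important hyperparameters.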
You can also use random search, which may actually outperform grid search, especially when only a few of the hyperparameters really matter.
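A random search draws each candidate configuration independently instead of walking a grid. The sampling ranges below, and the choice to sample the learning rate log-uniformly, are assumptions for illustration.

```python
import math
import random

random.seed(0)  # fix the seed so the sampled configurations are reproducible

def sample_config():
    # Assumption: sample the learning rate log-uniformly in [1e-5, 1e-1],
    # so each order of magnitude is equally likely to be tried.
    log_lr = random.uniform(math.log(1e-5), math.log(1e-1))
    return {
        "learning_rate": math.exp(log_lr),
        "exploration_rate": random.uniform(0.01, 0.5),
    }

configs = [sample_config() for _ in range(12)]  # same budget as a 4x3 grid
```

With the same budget of 12 runs, random search tries 12 distinct learning rates instead of 4, which is where its advantage comes from when one hyperparameter dominates.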
There are more advanced ways for automated hyperparameter optimization (HPO): read this book (Chapter 1) for an overview.
5. Vary hyperparameters in orders of magnitude
When you try different values for hyperparameters, you usually want to vary them in orders of magnitude (this applies to many hyperparameters, such as the learning rate).
Example:
For the learning rate of a neural network, you try [0.01, 0.02, 0.03, 0.04]. They all give poor performance. The problem is that this linear scale searches within a single order of magnitude, and you were off this time.
Instead, you search over [0.01, 0.001, 0.0001, 0.00001]. You find that 0.0001 actually gives good performance: for this problem, the learning rate needs to be in that range.
6. Get intuition for reasonable values (hyperparameter optimization is more art than science)
This is probably the most difficult aspect of hyperparameter optimization. You will simply not have the budget to optimize over all hyperparameters. Therefore, you will have to fix many of them to 'reasonable' values, where you decide what counts as reasonable.
There is no good way to teach this: you gradually get this intuition over the years.
The best advice is to carefully check related methods (their papers and GitHub repositories) and see what hyperparameter settings they use. These can be a starting point for your own settings.
7. Other resources
Hyperparameter optimization is a crucial aspect of machine learning, yet we actually talk and write very little about it.
Consider this excellent blog for a rare exception on how to tune your hyperparameters for neural network training.