Extending Asynchronous Hyperband
Syne Tune provides powerful generic scheduler templates for popular methods like successive halving and Hyperband. These can be run with synchronous or asynchronous decision-making. The most important generic templates at the moment are:
- FIFOScheduler: Full evaluation scheduler, base class for many others. See also FIFOScheduler.
- HyperbandScheduler: Asynchronous successive halving and Hyperband. See also HyperbandScheduler.
- SynchronousHyperbandScheduler: Synchronous successive halving and Hyperband. See also SynchronousHyperbandScheduler.
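For orientation, constructing the asynchronous template could look like this. This is a sketch: the configuration space, metric name, and values are made up, and import paths may differ slightly between Syne Tune versions.

```python
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 27,  # maximum resource r_max, exposed via max_resource_attr
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="random",           # exchanged for a custom searcher below
    metric="val_error",          # name of the reported metric (made up here)
    mode="min",
    resource_attr="epoch",       # field of the result dict holding r
    max_resource_attr="epochs",  # config space entry holding r_max
)
```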
Chances are your idea for a new scheduler maps to one of these templates, in which case you can save a lot of time and headache by extending the template rather than reinventing the wheel. Due to Syne Tune's modular design of schedulers and their components (e.g., searchers, decision rules), you may even get more than you bargained for.
In this section, we will walk through an example of how to furnish the asynchronous successive halving scheduler with a specific searcher.
HyperbandScheduler
Details about asynchronous successive halving and Hyperband are given in the Multi-fidelity HPO tutorial. This is a multi-fidelity scheduler, where trials report intermediate results (e.g., validation error at the end of each epoch of training). We can formalize this notion by the concept of a resource \(r = 1, 2, 3, \dots\) (e.g., \(r\) is the number of epochs trained). A generic implementation of this method is provided in HyperbandScheduler.
Let us have a look at its arguments not shared with the base class FIFOScheduler:
- A mandatory argument is resource_attr, which is the name of a field in the result dictionary passed to scheduler.on_trial_report. This field contains the resource \(r\) for which metric values have been reported. For example, if a trial reports validation error at the end of the 5th epoch of training, result contains {resource_attr: 5}.
- We already noted the arguments max_resource_attr and max_t in FIFOScheduler. They are used to determine the maximum resource \(r_{max}\) (e.g., the total number of epochs a trial is trained for, if not stopped earlier). As discussed in detail here, it is best practice to reserve a field in the configuration space scheduler.config_space for \(r_{max}\). If this is done, its name should be passed in max_resource_attr. Every configuration sent to the training script then contains \(r_{max}\), which should not be hardcoded in the script (see the sketch after this list). Moreover, if max_resource_attr is used, a pause-and-resume scheduler (e.g., HyperbandScheduler with type="promotion") can modify this field in the configuration of a trial which is only to be run until a certain resource less than \(r_{max}\). If max_resource_attr is not used, \(r_{max}\) has to be passed explicitly via max_t.
- reduction_factor, grace_period, brackets are important parameters detailed in the tutorial. If brackets > 1, we run asynchronous Hyperband with this number of brackets, while for brackets == 1 (the default) we run asynchronous successive halving.
- As detailed in the tutorial, type determines whether the method uses early stopping (type="stopping") or pause-and-resume scheduling (type="promotion"). Further choices of type activate specific algorithms such as RUSH, PASHA, or cost-sensitive successive halving.
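To make these conventions concrete, here is a hedged sketch of a training script wired up for resource_attr="epoch" and max_resource_attr="epochs", matching the scheduler sketch above. The argument names are made up, and the val_error computation is a stand-in for real training:

```python
import argparse

from syne_tune import Reporter

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    # Filled in by the scheduler via max_resource_attr, so r_max is not
    # hardcoded here; a pause-and-resume scheduler may lower this value
    parser.add_argument("--epochs", type=int, required=True)
    args = parser.parse_args()

    report = Reporter()
    for epoch in range(1, args.epochs + 1):
        # Stand-in for one epoch of real training and validation
        val_error = 1.0 / (epoch * args.learning_rate + 1.0)
        # resource_attr="epoch" and metric="val_error" on the scheduler
        # side must match the keyword names used here
        report(epoch=epoch, val_error=val_error)
```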
Kernel Density Estimator Searcher
One of the most flexible ways of extending HyperbandScheduler is to provide it with a novel searcher. In order to understand how this is done, we will walk through MultiFidelityKernelDensityEstimator and KernelDensityEstimator.
This searcher implements suggest as in BOHB, as also detailed in this tutorial. In a nutshell, the searcher splits all observations into two parts (good and bad), depending on which side of a certain quantile their metric values lie, and fits kernel density estimators to these two subsets. It then makes decisions based on a particular ratio of these densities, which approximates a variant of the expected improvement acquisition function.
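As a stand-alone illustration of this decision rule, here is a hedged sketch. It uses scipy.stats.gaussian_kde for brevity, whereas Syne Tune's implementation uses a different KDE and its own encoding of configurations; all names here are made up:

```python
import numpy as np
from scipy.stats import gaussian_kde


def suggest_by_density_ratio(X, y, num_candidates=64, top_fraction=0.15, seed=0):
    """X: (n, d) encoded configs in [0, 1]^d, y: (n,) metric values (lower is better)."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    # Split observations at a quantile of the metric values
    threshold = np.quantile(y, top_fraction)
    good = X[y <= threshold]          # configurations with the best metrics
    bad = X[y > threshold]            # the rest
    # Fit one KDE per subset (assumes enough points on both sides);
    # gaussian_kde expects data of shape (d, n)
    kde_good = gaussian_kde(good.T)
    kde_bad = gaussian_kde(bad.T)
    # Sample random candidates and rank by the density ratio, which
    # approximates a variant of expected improvement
    candidates = rng.uniform(size=(num_candidates, X.shape[1]))
    scores = kde_good(candidates.T) / np.maximum(kde_bad(candidates.T), 1e-12)
    return candidates[np.argmax(scores)]
```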
We begin with the base class KernelDensityEstimator, which works with schedulers implementing TrialSchedulerWithSearcher (the most important one being FIFOScheduler), but already implements most of what is needed in the multi-fidelity context.
- The code does quite some bookkeeping concerned with mapping configurations to feature vectors. If you want to do this from scratch for your own searcher, we recommend using HyperparameterRanges. However, KernelDensityEstimator was extracted from the original BOHB implementation.
- Observation data is collected in self.X (feature vectors for configurations) and self.y (values for self._metric, negated if self.mode == "max"). In particular, the _update method simply appends new data to these members (see the sketch after this list).
- get_config fits KDEs to the good and bad parts of self.X, self.y. It then samples self.num_candidates configurations at random, evaluates the TPE acquisition function for each candidate, and returns the best one.
- configure_scheduler is a callback which allows the searcher to check whether its scheduler is compatible, and to depend on details of this scheduler. In our case, we check whether the scheduler implements TrialSchedulerWithSearcher, which is the minimum requirement for a searcher.
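Schematically, this bookkeeping boils down to the following simplified stand-in. The class and its feature encoding are made up for illustration and are much simpler than the real code:

```python
class KDEBookkeepingSketch:
    """Simplified stand-in for the bookkeeping in KernelDensityEstimator."""

    def __init__(self, metric: str, mode: str = "min"):
        self._metric = metric
        self.mode = mode
        self.X, self.y = [], []

    def _update(self, trial_id: str, config: dict, result: dict):
        # The real class encodes hyperparameters into numerical feature
        # vectors; here we just keep the raw values
        self.X.append(tuple(config.values()))
        metric_value = result[self._metric]
        # Internally the searcher minimizes, so negate for mode == "max"
        self.y.append(-metric_value if self.mode == "max" else metric_value)
```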
Note
Any scheduler configured by a searcher should inherit from TrialSchedulerWithSearcher, which mainly makes sure that configure_scheduler() is called before the searcher is first used. It is also strongly recommended to implement configure_scheduler for a new searcher, restricting usage to compatible schedulers.
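For a new searcher, such a compatibility check could look roughly as follows. This is a sketch: the import paths are assumptions that may differ between Syne Tune versions, and a real searcher would also implement get_config and _update:

```python
from syne_tune.optimizer.schedulers.scheduler_searcher import (
    TrialSchedulerWithSearcher,
)
from syne_tune.optimizer.schedulers.searchers import BaseSearcher


class MySearcher(BaseSearcher):
    # Only configure_scheduler is shown; get_config and _update are omitted
    def configure_scheduler(self, scheduler):
        # Fail fast if the scheduler cannot drive a searcher
        assert isinstance(scheduler, TrialSchedulerWithSearcher), (
            "MySearcher can only be used with schedulers implementing "
            "TrialSchedulerWithSearcher"
        )
        super().configure_scheduler(scheduler)
```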
The class MultiFidelityKernelDensityEstimator inherits from KernelDensityEstimator:
- On top of self.X and self.y, it also maintains the resource value \(r\) for each datapoint in self.resource_levels.
- get_config remains the same; only its subroutine train_kde for training the good and bad density models is modified. The idea is to fit these to data from a single rung level, namely the largest level at which we have observed at least self.num_min_data_points points (see the sketch after this list).
- configure_scheduler restricts usage to schedulers implementing MultiFidelitySchedulerMixin, which all multi-fidelity schedulers need to inherit from (examples are HyperbandScheduler for asynchronous Hyperband and SynchronousHyperbandScheduler for synchronous Hyperband). It also calls KernelDensityEstimator.configure_scheduler(). Moreover, self.resource_attr is obtained from the scheduler, so it does not have to be passed at construction.
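The rung-level selection just described could be sketched as follows (names follow the text above rather than the actual source code):

```python
import numpy as np


def training_data_for_kde(X, y, resource_levels, num_min_data_points):
    """Return data from the largest rung level with enough observations."""
    X, y = np.asarray(X), np.asarray(y)
    resource_levels = np.asarray(resource_levels)
    # Walk resource levels from largest to smallest
    for r in sorted(set(resource_levels.tolist()), reverse=True):
        mask = resource_levels == r
        if mask.sum() >= num_min_data_points:
            return X[mask], y[mask]
    # Not enough data at any single level; the caller falls back
    # (e.g., to suggesting a random configuration)
    return None
```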
Note
Any multi-fidelity scheduler configured by a searcher should inherit from both TrialSchedulerWithSearcher and MultiFidelitySchedulerMixin. The latter is a basic API to be implemented by multi-fidelity schedulers, which is used by the configure_scheduler of searchers specialized to multi-fidelity HPO. Doing so makes sure any new multi-fidelity scheduler can seamlessly be used with any such searcher.
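Continuing the MySearcher sketch from above, the multi-fidelity variant of the compatibility check could look like this (again, the import path and the resource_attr property are assumptions based on the text):

```python
from syne_tune.optimizer.schedulers.multi_fidelity import (
    MultiFidelitySchedulerMixin,
)


class MyMultiFidelitySearcher(MySearcher):
    def configure_scheduler(self, scheduler):
        super().configure_scheduler(scheduler)
        assert isinstance(scheduler, MultiFidelitySchedulerMixin), (
            "This searcher requires a multi-fidelity scheduler"
        )
        # Read the name of the resource field off the scheduler, so the
        # user does not have to pass it again at construction
        self._resource_attr = scheduler.resource_attr
```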
While being functional and simple, MultiFidelityKernelDensityEstimator does not showcase the full range of information exchanged between HyperbandScheduler and a searcher. In particular:
- register_pending is not implemented: BOHB does not take pending evaluations into account.
- remove_case and evaluation_failed are not implemented.
- get_state and clone_from_state are not implemented, so schedulers with this searcher are not properly serialized (a sketch of what this could look like is given below).
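For illustration, building on the simplified bookkeeping sketch from above, adding serialization support could look roughly like this. The method names follow the list above; the exact signatures in Syne Tune's searcher API may differ:

```python
class SerializableKDESketch(KDEBookkeepingSketch):
    """Sketch only: one way get_state / clone_from_state could be added."""

    def get_state(self) -> dict:
        # A real searcher would extend the state dict of its parent class
        return {
            "X": list(self.X),
            "y": list(self.y),
            "metric": self._metric,
            "mode": self.mode,
        }

    def clone_from_state(self, state: dict) -> "SerializableKDESketch":
        # Create a fresh searcher and restore the observation data
        new_searcher = SerializableKDESketch(state["metric"], state["mode"])
        new_searcher.X = list(state["X"])
        new_searcher.y = list(state["y"])
        return new_searcher
```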
For a more complete and advanced example, the reader is invited to study GPMultiFidelitySearcher and GPFIFOSearcher. These searchers take pending evaluations into account (by way of fantasizing). Moreover, they can be configured with a Gaussian process model and an acquisition function, which is optimized in a gradient-based manner.
Moreover, as already noted here, HyperbandScheduler also allows configuring the decision rule for stop/continue or pause/resume as part of on_trial_report. Examples for this are found in StoppingRungSystem, PromotionRungSystem, RUSHStoppingRungSystem, PASHARungSystem, and CostPromotionRungSystem.