Random Search
Random search is arguably the simplest HPO baseline. In a nutshell, _suggest samples a new configuration at random from the configuration space, much like our SimpleScheduler above, and on_trial_result does nothing except returning SchedulerDecision.CONTINUE. A slightly more advanced version would make sure that the same configuration is not suggested twice.
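For concreteness, here is a minimal sketch of such a stand-alone random search scheduler, in the spirit of SimpleScheduler. The class name RandomSearchSketch is made up for this example; the imports reflect the Syne Tune scheduler API as used earlier in this tutorial, but double-check them against your version.

```python
from typing import Any, Dict, Optional

from syne_tune.backend.trial_status import Trial
from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    SchedulerDecision,
    TrialScheduler,
    TrialSuggestion,
)


class RandomSearchSketch(TrialScheduler):
    """Hypothetical stand-alone random search, for illustration only."""

    def __init__(self, config_space: Dict[str, Any], metric: str):
        super().__init__(config_space)
        self.metric = metric

    def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Draw each hyperparameter independently at random from its domain
        config = {
            name: domain.sample() if isinstance(domain, Domain) else domain
            for name, domain in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
        # Random search never stops or pauses trials based on results
        return SchedulerDecision.CONTINUE

    def metric_names(self):
        return [self.metric]

    def metric_mode(self) -> str:
        return "min"
```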
In this section, we walk through the Syne Tune implementation of random search, thereby discussing some additional concepts. This will also be a first example of the modular concept just described: random search is implemented as the generic FIFOScheduler, configured by a RandomSearcher.
A self-contained implementation of random search would be shorter. On the other hand, as seen in syne_tune.optimizer.baselines, FIFOScheduler also powers GP-based Bayesian optimization, grid search, BORE, regularized evolution, and constrained BO, simply by specifying different searchers. A number of concepts, to be discussed here, have to be implemented only once and can be maintained much more easily.
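To make this modularity concrete, here is a sketch of the same FIFOScheduler driving two different methods. The config space and metric name are invented for illustration; searcher names other than "random" and "bayesopt" may vary between Syne Tune versions.

```python
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "momentum": uniform(0.0, 0.99),
}

# Random search: new configurations are sampled at random
random_search = FIFOScheduler(
    config_space, searcher="random", metric="validation_error", mode="min"
)
# GP-based Bayesian optimization: only the searcher name changes
bayesian_optimization = FIFOScheduler(
    config_space, searcher="bayesopt", metric="validation_error", mode="min"
)
```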
FIFOScheduler and RandomSearcher
We will have a close look at FIFOScheduler and RandomSearcher. Let us first consider the arguments of FIFOScheduler (a usage sketch follows the list):
- searcher, search_options: These are used to configure the scheduler with a searcher. For ease of use, searcher can be a name, and additional arguments can be passed via search_options. In this case, the searcher is created by a factory, as detailed below. Alternatively, searcher can also be a BaseSearcher object.
- metric, mode: As discussed above for SimpleScheduler.
- random_seed: Several pseudo-random number generators may be used in scheduler and searcher. Seeds for these are drawn from a random seed generator maintained in FIFOScheduler, whose seed can be passed here. As a general rule, all schedulers and searchers implemented in Syne Tune carefully manage such generators (and contributed schedulers are strongly encouraged to adopt this pattern).
- points_to_evaluate: A list of configurations (possibly partially specified) to be suggested first. This allows the user to initialize the search with default configurations, thereby injecting knowledge about the task. We strongly recommend that every scheduler support this mechanism. More details are given below.
- max_resource_attr, max_t: These arguments are relevant for multi-fidelity schedulers. Only one of them needs to be given. We recommend using max_resource_attr. More details are given below.
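Assuming an invented config space and metric name, the arguments just listed might be used as in the following sketch; the search_options key debug_log is one example of a searcher argument passed through the factory.

```python
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
}

scheduler = FIFOScheduler(
    config_space,
    searcher="random",  # resolved to a RandomSearcher by the factory
    search_options={"debug_log": True},  # extra arguments for the searcher
    metric="validation_error",
    mode="min",
    random_seed=31415927,  # seeds the scheduler's random seed generator
    points_to_evaluate=[
        {"learning_rate": 1e-3, "num_layers": 2},  # suggested first
        {"num_layers": 4},  # partial: learning_rate imputed by midpoint rule
    ],
)
```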
The most important use case is to configure FIFOScheduler with a new searcher, and we will concentrate on this one. First, the base class of all searchers is BaseSearcher (a skeleton sketch follows the list):
- points_to_evaluate: A list of configurations to be suggested first. This is initialized and (possibly) imputed in the base class, but needs to be used in child classes. Configurations in points_to_evaluate can be partially specified. Any hyperparameter missing in a configuration is imputed using a "midpoint" rule. For a numerical parameter, this is the middle of the range (in linear or log scale). For a categorical parameter, the first value is chosen. If points_to_evaluate is not given, the default is [dict()]: a single initial configuration is determined fully by the midpoint rule. In order not to use initial configurations at all, the user has to pass points_to_evaluate=[]. The imputation of configurations is done in the base class.
- configure_scheduler: Callback function which allows the searcher to configure itself depending on the scheduler. It also allows the searcher to reject schedulers it is not compatible with. This method is called automatically at the beginning of an experiment.
- get_config: This method is called by the scheduler in _suggest; it delegates the suggestion of a configuration for a new trial to the searcher.
- on_trial_result: This is called by the scheduler in its own on_trial_result, also passing the configuration of the current trial. If the searcher maintains a surrogate model (for example, based on a Gaussian process), it should update its model with result data if update == True. This is discussed in more detail below. Note that on_trial_result does not return anything: decisions on how to proceed with the trial are not made by the searcher.
- register_pending: Registers one (or more) pending evaluations, which are signals to the searcher that a trial has been started and will return an observation in the future. This is important in order to avoid redundant suggestions in model-based HPO.
- evaluation_failed: Called by the scheduler if a trial failed. Default searcher reactions are to remove pending evaluations and not to suggest the corresponding configuration again. More advanced constrained searchers may also try to avoid nearby configurations in the future.
- cleanup_pending: Removes all pending evaluations for a trial. This is called by the scheduler when a trial terminates.
- get_state, clone_from_state: Used in order to serialize and de-serialize the searcher.
- debug_log: There is some built-in support for detailed logging, embedded in FIFOScheduler and the Syne Tune searchers.
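As a sketch of this interface, a custom searcher could look as follows. Signatures are simplified, MySearcher is hypothetical, and the helper _restore_from_state is assumed to be provided by the base class; check BaseSearcher in your Syne Tune version before relying on the details.

```python
from typing import Any, Dict, Optional

from syne_tune.config_space import Domain
from syne_tune.optimizer.schedulers.searchers import BaseSearcher


class MySearcher(BaseSearcher):
    def __init__(self, config_space, metric, points_to_evaluate=None, **kwargs):
        super().__init__(
            config_space, metric=metric, points_to_evaluate=points_to_evaluate
        )
        self._my_metric = metric

    def get_config(self, **kwargs) -> Optional[Dict[str, Any]]:
        # Serve entries of points_to_evaluate first (imputed in the base class)
        config = self._next_initial_config()
        if config is None:
            # Fallback for this sketch: sample at random. A model-based
            # searcher would optimize an acquisition function here instead.
            config = {
                name: domain.sample() if isinstance(domain, Domain) else domain
                for name, domain in self.config_space.items()
            }
        return config

    def _update(self, trial_id: str, config: Dict[str, Any], result: Dict[str, Any]):
        # Called via on_trial_result when update == True. A model-based
        # searcher would update its surrogate model with the observation here.
        pass

    def register_pending(self, trial_id: str, config=None, milestone=None):
        # The trial has started; remember it so its configuration is not
        # suggested redundantly before its observation arrives.
        pass

    def evaluation_failed(self, trial_id: str):
        # Remove pending evaluations; avoid this configuration in the future.
        pass

    def clone_from_state(self, state: Dict[str, Any]) -> "MySearcher":
        # De-serialization: recreate the searcher, then restore mutable state
        # (assumes the _restore_from_state helper of the base class)
        new_searcher = MySearcher(self.config_space, metric=self._my_metric)
        new_searcher._restore_from_state(state)
        return new_searcher
```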
Below BaseSearcher, there is StochasticSearcher, which should be used by all searchers that make random decisions. It maintains a pseudo-random number generator and provides methods to serialize and de-serialize its state.
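As a minimal sketch (assuming that the managed generator is exposed as self.random_state, a numpy RandomState, and that Domain.sample accepts a random_state argument; both should be verified against your Syne Tune version), all random decisions in a subclass should go through this generator:

```python
from typing import Any, Dict, Optional

from syne_tune.config_space import Domain
from syne_tune.optimizer.schedulers.searchers import StochasticSearcher


class MyStochasticSearcher(StochasticSearcher):
    def get_config(self, **kwargs) -> Optional[Dict[str, Any]]:
        config = self._next_initial_config()
        if config is None:
            # All random draws use self.random_state, never the global
            # `random` or `numpy.random` state, so that runs are reproducible
            # and the generator state survives serialization
            config = {
                name: domain.sample(random_state=self.random_state)
                if isinstance(domain, Domain)
                else domain
                for name, domain in self.config_space.items()
            }
        return config
```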
StochasticAndFilterDuplicatesSearcher extends StochasticSearcher. It supports a number of features which are desirable for most searchers (a usage sketch follows the list):

- Seed management for random decisions.
- Avoiding suggesting the same configuration more than once. While we generally recommend using the default allow_duplicates == False, allowing for duplicates can be useful when dealing with configuration spaces of small finite size.
- Restricting the configurations which can be suggested to a finite set. This can be very useful when using tabulated blackboxes. It does not make sense for every scheduler though, as some rely on a continuous search over the configuration space. You can inherit from StochasticAndFilterDuplicatesSearcher and still not support this feature, by insisting on restrict_configurations == None.
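For illustration, here is how these features surface as constructor arguments, shown on RandomSearcher (discussed next), which inherits from this class; the exact signature may differ between Syne Tune versions.

```python
from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers.searchers import RandomSearcher

config_space = {"num_layers": randint(1, 4)}

searcher = RandomSearcher(
    config_space,
    metric="validation_error",  # invented metric name
    allow_duplicates=False,  # default: never suggest the same config twice
    # Only these configurations may be suggested (e.g., entries of a
    # tabulated blackbox); pass None to search the whole space:
    restrict_configurations=[{"num_layers": 1}, {"num_layers": 3}],
)
```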
All built-in Syne Tune searchers either inherit from this class or avoid duplicate suggestions in a different way. Finally, let us walk through RandomSearcher:
There are a few features beyond SimpleScheduler above. The searcher does not suggest the same configuration twice (if allow_duplicates == False), and it warns if a finite configuration space has been exhausted. It also uses HyperparameterRanges for random sampling and for comparing configurations (to spot duplicates); this is a useful helper class, also for encoding configurations as vectors. The logic for detecting duplicates is implemented in the base class StochasticAndFilterDuplicatesSearcher. Finally, debug_log is used for diagnostic logs.

- get_config: First asks for another entry from points_to_evaluate by way of _next_initial_config, then samples a new configuration at random. This is done without replacement if allow_duplicates == False, and with replacement otherwise. If successful, it also feeds debug_log.
- _update: This is not needed for random search, but is used here in order to feed debug_log.
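To close the loop, here is a usage sketch of random search end to end. The training script train.py is a placeholder for your own entry point reporting validation_error; the remaining imports are standard Syne Tune.

```python
from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
}

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train.py"),  # placeholder script
    scheduler=FIFOScheduler(
        config_space,
        searcher="random",
        metric="validation_error",
        mode="min",
        random_seed=31415927,
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()
```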