Linking in a New Searcher
At this point, you should have learned everything needed for implementing a new scheduler, or for modifying an existing template scheduler to your specific requirements. Say you have implemented a new searcher to be plugged into one of the existing generic schedulers. In this section, we look at how a new searcher can be made available in an easy-to-use fashion.
The Searcher Factory
Recall that our generic schedulers, such as FIFOScheduler or HyperbandScheduler, allow the user to choose a searcher via the string argument searcher, and to configure the searcher (away from defaults) by the dictionary argument search_options. While searcher can also be a BaseSearcher instance, it is simpler and more convenient to choose the searcher by name. For example:
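A minimal sketch (the configuration space, metric name, and search_options values are made up for illustration):

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

# Hypothetical configuration space; any Syne Tune configuration space works here
config_space = {
    "width": randint(1, 20),
    "learning_rate": uniform(1e-4, 1e-1),
}
scheduler = FIFOScheduler(
    config_space,
    metric="validation_error",
    mode="min",
    searcher="bayesopt",  # choose the searcher by name
    search_options={"num_init_random": 10},  # configure it away from defaults
)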
Choosing the searcher by name has several advantages:

* Generic schedulers only work with certain types of searchers. This consistency is checked when searcher is given by name; passing an incompatible searcher object instead may lead to subtle errors.
* Several arguments of a searcher are typically just the same as for the surrounding scheduler, or can be inferred from arguments of the scheduler. This can become complex for some searchers and leads to tedious boilerplate code if the searcher is created by hand.
* While not covered in this tutorial, constructing schedulers and searchers for Gaussian process based Bayesian optimization and its extensions to multi-fidelity scheduling, constrained, or cost-aware search is significantly more complex, as can be seen in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.
It is the purpose of searcher_factory() to create the correct BaseSearcher object for given scheduler arguments, including searcher (name) and search_options. Let us have a look at how the constructor of FIFOScheduler calls the factory. Scheduler arguments like metric, mode, points_to_evaluate are just passed through to the factory. We also need to set search_options["scheduler"] in order to tell searcher_factory which generic scheduler is calling it.
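In spirit, this amounts to something like the following hypothetical helper (make_searcher is not part of Syne Tune; the real FIFOScheduler constructor handles more arguments and error checking):

from syne_tune.optimizer.schedulers.searchers import searcher_factory


def make_searcher(
    searcher_name, config_space, metric, mode, points_to_evaluate, search_options=None
):
    # Scheduler arguments are passed through to the factory, and
    # search_options["scheduler"] identifies the calling generic scheduler
    search_options = dict() if search_options is None else dict(search_options)
    search_options["scheduler"] = "fifo"
    return searcher_factory(
        searcher_name,  # e.g., "random" or "bayesopt"
        config_space=config_space,
        metric=metric,
        mode=mode,
        points_to_evaluate=points_to_evaluate,
        **search_options,
    )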
The searcher_factory() code should be straightforward to understand and extend. Pick a name for your new searcher and set searcher_cls and supported_schedulers (the latter can be left as None if your searcher works with all generic schedulers). The constructor of your searcher needs to have the signature
def __init__(self, config_space: dict, metric: str, **kwargs):
Here, kwargs will be fed with search_options, but enriched with fields like mode, points_to_evaluate, random_seed_generator, scheduler. Your searcher is not required to make use of them, even though we strongly recommend supporting points_to_evaluate and making use of random_seed_generator (a minimal sketch is given after the list below). Here are some best practices for linking a new searcher into the factory:
* The Syne Tune code is written in a way which allows certain scenarios to be run with a restricted set of all possible dependencies (see FAQ). This is achieved by conditional imports. If your searcher requires dependencies beyond the core, please make sure to use try ... except ImportError as you see in the code.
* Try to make sure that your searcher also works without search_options being specified by the user. You will always have the fields contributed by the generic schedulers; for all others, your code should ideally come with sensible defaults.
* Make sure to implement the configure_scheduler method of your new searcher, restricting usage to supported scheduler types.
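To make this concrete, here is a minimal, hypothetical sketch of such a constructor. MySearcher and its num_samples argument are made up, the get_config and _update methods discussed earlier in this tutorial are omitted, and the exact arguments accepted by the BaseSearcher constructor may differ between Syne Tune versions:

from typing import Any, Dict

import numpy as np

from syne_tune.optimizer.schedulers.searchers import BaseSearcher


class MySearcher(BaseSearcher):
    def __init__(self, config_space: Dict[str, Any], metric: str, **kwargs):
        super().__init__(
            config_space,
            metric=metric,
            points_to_evaluate=kwargs.get("points_to_evaluate"),
        )
        self._mode = kwargs.get("mode", "min")
        # Draw the seed for this searcher from the scheduler's generator, so
        # that random seeds are managed consistently across Syne Tune
        random_seed_generator = kwargs.get("random_seed_generator")
        if random_seed_generator is not None:
            random_seed = random_seed_generator()
        else:
            random_seed = kwargs.get("random_seed", 31415927)
        self._random_state = np.random.RandomState(random_seed)
        # Searcher-specific arguments come with sensible defaults, so that the
        # searcher also works without search_options being specified
        self._num_samples = kwargs.get("num_samples", 10)

    def configure_scheduler(self, scheduler):
        from syne_tune.optimizer.schedulers import FIFOScheduler

        assert isinstance(
            scheduler, FIFOScheduler
        ), "MySearcher can only be used with FIFOScheduler"
        super().configure_scheduler(scheduler)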
The Baseline Wrappers
In order to facilitate choosing and configuring a scheduler along with its searcher, Syne Tune defines the most frequently used combinations in syne_tune.optimizer.baselines. The minimal signature of a baseline class is this:
def __init__(self, config_space: dict, metric: str, **kwargs):
Or, in the multi-objective case:
def __init__(self, config_space: dict, metric: List[str], **kwargs):
If the underlying scheduler maintains a searcher (as most schedulers do), arguments to the searcher (except for config_space, metric) are given in kwargs["search_options"]. If a scheduler is of multi-fidelity type, the minimal signature is:
def __init__(self, config_space: dict, metric: str, resource_attr: str, **kwargs):
If the scheduler accepts a random seed, this must be kwargs["random_seed"].
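For illustration, here is how a user would call an existing multi-fidelity baseline following this convention (a minimal sketch; the metric and resource attribute names are assumptions about the training script being tuned):

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "epochs": 27,
    "width": randint(1, 20),
    "learning_rate": uniform(1e-4, 1e-1),
}
scheduler = ASHA(
    config_space,
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    random_seed=31415927,  # ends up in kwargs["random_seed"]
)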
Several wrapper classes in syne_tune.optimizer.baselines have signatures with more arguments, which are either passed to the scheduler or to the searcher. For example, some wrappers make random_seed explicit in the signature, instead of having it in kwargs.
Note
If a scheduler maintains a searcher inside, and in particular if it simply configures FIFOScheduler or HyperbandScheduler with a new searcher, it is strongly recommended to adhere to the policy of specifying searcher arguments in kwargs["search_options"]. This simplifies enabling the new scheduler in the simple experimentation framework of syne_tune.experiments, and in general provides a common user experience across different schedulers.
Let us look at an example of a baseline wrapper whose underlying scheduler is of type FIFOScheduler with a specific searcher, which is not itself created via a searcher factory:
class REA(FIFOScheduler):
    """Regularized Evolution (REA).

    See :class:`~syne_tune.optimizer.schedulers.searchers.regularized_evolution.RegularizedEvolution`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metric to optimize
    :param population_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 100
    :param sample_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 10
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: str,
        population_size: int = 100,
        sample_size: int = 10,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["population_size"] = population_size
        searcher_kwargs["sample_size"] = sample_size
        super(REA, self).__init__(
            config_space=config_space,
            metric=metric,
            searcher=LegacyRegularizedEvolution(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )
def create_gaussian_process_estimator(
    config_space: Dict[str, Any],
    metric: str,
    random_seed: Optional[int] = None,
    search_options: Optional[Dict[str, Any]] = None,
) -> Estimator:
    scheduler = BayesianOptimization(
        config_space=config_space,
        metric=metric,
        random_seed=random_seed,
        search_options=search_options,
    )
    searcher = scheduler.searcher  # GPFIFOSearcher
    state_transformer = searcher.state_transformer  # ModelStateTransformer
    estimator = state_transformer.estimator  # GaussProcEmpiricalBayesEstimator
    # update the estimator properties
    estimator.active_metric = metric
    return estimator
class MORandomScalarizationBayesOpt(FIFOScheduler):
    """
    Uses :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveMultiSurrogateSearcher`
    with one standard GP surrogate model per metric (same as in
    :class:`BayesianOptimization`), together with the
    :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveLCBRandomLinearScalarization`
    acquisition function.

    If ``estimators`` is given, surrogate models are taken from there, and the
    default is used otherwise. This is useful if you have a good low-variance
    model for one of the objectives.

    :param config_space: Configuration space for evaluation function
    :param metric: Names of metrics to optimize
    :param mode: Modes of optimization. Defaults to "min" for all
    :param random_seed: Random seed, optional
    :param estimators: Use these surrogate models instead of the default GP
        one. Optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`. Here,
        ``kwargs["search_options"]`` is used to create the searcher and its
        GP surrogate models.
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        random_seed: Optional[int] = None,
        estimators: Optional[Dict[str, Estimator]] = None,
        **kwargs,
    ):
        try:
            from syne_tune.optimizer.schedulers.multiobjective import (
                MultiObjectiveMultiSurrogateSearcher,
                MultiObjectiveLCBRandomLinearScalarization,
            )
        except ImportError:
            raise

        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        if estimators is None:
            estimators = dict()
        else:
            estimators = estimators.copy()
        if isinstance(mode, str):
            mode = [mode] * len(metric)
        if "search_options" in kwargs:
            search_options = kwargs["search_options"].copy()
        else:
            search_options = dict()
        search_options["no_fantasizing"] = True
        for _metric in metric:
            if _metric not in estimators:
                estimators[_metric] = create_gaussian_process_estimator(
                    config_space=config_space,
                    metric=_metric,
                    search_options=search_options,
                )
        # Note: ``mode`` is dealt with in the ``update`` method of the MO
        # searcher, by converting the metrics. Internally, all metrics are
        # minimized
        searcher = MultiObjectiveMultiSurrogateSearcher(
            estimators=estimators,
            mode=mode,
            scoring_class=partial(
                MultiObjectiveLCBRandomLinearScalarization, random_seed=random_seed
            ),
            **searcher_kwargs,
        )
        super().__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=searcher,
            random_seed=random_seed,
            **kwargs,
        )
class NSGA2(FIFOScheduler):
    """
    See :class:`~syne_tune.optimizer.schedulers.searchers.RandomSearcher`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Names of metrics to optimize
    :param mode: Modes of optimization. Defaults to "min" for all
    :param population_size: The size of the population for NSGA-2
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        population_size: int = 20,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["mode"] = mode
        searcher_kwargs["population_size"] = population_size
        super(NSGA2, self).__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=NSGA2Searcher(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )
Note the following about the REA example above:

* The signature has config_space, metric, and random_seed. It also has two searcher arguments, population_size and sample_size.
* In order to compile the arguments searcher_kwargs for creating the searcher, we first call _create_searcher_kwargs(config_space, metric, random_seed, kwargs). Doing so is particularly important in order to ensure that random seeds are managed between scheduler and searcher in the same way across different Syne Tune schedulers.
* Next, the additional arguments population_size and sample_size need to be appended to these searcher arguments. Had we used kwargs["search_options"] instead, this would not be necessary.
* Finally, we create FIFOScheduler, passing config_space, metric, and the new searcher via searcher=LegacyRegularizedEvolution(**searcher_kwargs), and pass **kwargs at the end.
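Following the same pattern, a baseline wrapper for your own searcher might look as follows. This is a hypothetical sketch: MyMethod and its num_samples argument are made up, MySearcher refers to the searcher sketched earlier in this section, and _create_searcher_kwargs is an internal helper of syne_tune.optimizer.baselines:

from typing import Any, Dict, Optional

from syne_tune.optimizer.baselines import _create_searcher_kwargs
from syne_tune.optimizer.schedulers import FIFOScheduler


class MyMethod(FIFOScheduler):
    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: str,
        num_samples: int = 10,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["num_samples"] = num_samples
        super().__init__(
            config_space=config_space,
            metric=metric,
            # MySearcher is the hypothetical searcher sketched above
            searcher=MySearcher(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )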
Baselines and Benchmarking
As shown in other tutorials, a particularly convenient way to define and run experiments is to use the code in syne_tune.experiments. Once a new scheduler has a baseline wrapper, it is very easy to make it available there: you just need to add a wrapper in syne_tune.experiments.default_baselines. For the REA example above, this is:
from syne_tune.optimizer.baselines import REA as _REA


def REA(method_arguments: MethodArguments, **kwargs):
    return _REA(**_baseline_kwargs(method_arguments, kwargs))
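For the hypothetical MyMethod wrapper sketched above, the corresponding entry would look the same way (assuming MyMethod has been added to syne_tune.optimizer.baselines):

from syne_tune.optimizer.baselines import MyMethod as _MyMethod


def MyMethod(method_arguments: MethodArguments, **kwargs):
    return _MyMethod(**_baseline_kwargs(method_arguments, kwargs))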
Contribute your Extension
At this point, you are ready to plug in your latest idea and make it work in Syne Tune. Once it works well, we encourage you to contribute it back to the community. We are looking forward to your pull request.