Syne Tune: Large-Scale and Reproducible Hyperparameter Optimization

This package provides state-of-the-art algorithms for hyperparameter optimization (HPO) with the following key features:
Wide coverage (>20) of different HPO methods, including:
Asynchronous versions to maximize utilization and distributed versions (i.e., with multiple workers);
Multi-fidelity methods supporting model-based decisions (BOHB, MOBSTER, Hyper-Tune, DyHPO, BORE);
Hyperparameter transfer learning to speed up (repeated) tuning jobs;
Multi-objective optimizers that can tune multiple objectives simultaneously (such as accuracy and latency).
HPO can be run in different environments (locally, AWS, simulation) by changing just one line of code.
Out-of-the-box tabulated benchmarks that allow you to simulate results in seconds while preserving the real dynamics of asynchronous or synchronous HPO with any number of workers.
What’s New?
Andreas Mueller, co-creator and core contributor to scikit-learn, used Syne Tune extensively to optimize parameters of a hypernetwork which solves tabular classification tasks faster than state of the art boosted decision tree algorithms. Check out the video.
The experimentation framework of Syne Tune, providing easy access to all the different methods, execution backends, and ways to run many experiments in parallel, is now available in syne_tune.experiments; there is no need to install from source anymore. This framework is the best place to start serious experimentation work with Syne Tune.
New tutorial: Distributed Hyperparameter Tuning: Finding the Right Model can be Fast and Fun. It provides an overview of Syne Tune and its experimentation framework.
You can now create comparative plots, combining the results of many experiments, as shown here.
Local Backend supports training with more than one GPU per trial.
Speculative early checkpoint removal for asynchronous multi-fidelity optimization. Retaining all checkpoints often exhausts all available disk space when training large models. With this feature, Syne Tune automatically removes checkpoints that are unlikely to be needed. Details.
New multi-objective scheduler: LinearScalarizedScheduler. The method works by taking a multi-objective problem and turning it into a single-objective task by optimizing for a linear combination of all objectives. This wrapper works with all single-objective schedulers.
Support for the automatic termination criterion proposed by Makarova et al. Instead of defining a fixed number of iterations or a wall-clock time limit, we can set a threshold on how much worse we allow the final solution to be compared to the global optimum, so that the optimization process stops automatically once we find a solution that meets this criterion.
Installation
To install Syne Tune from pip, you can simply do:
pip install 'syne-tune[basic]'
For development, you need to install Syne Tune from source:
git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic,dev]'
This installs Syne Tune in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.
See our change log to check what has changed in the latest version.
In the examples above, Syne Tune is installed with the tag basic, which collects a reasonable number of dependencies. If you want to install all dependencies, replace basic with extra. You can further refine this selection by using partial dependencies.
What Is Hyperparameter Optimization?
Here is an introduction to hyperparameter optimization in the context of deep learning, which uses Syne Tune for some examples.
First Example
To enable tuning, you have to report metrics from a training script so that they can be communicated later to Syne Tune. This can be accomplished by simply calling report(epoch=epoch, loss=loss), as shown in this example:
import logging
import time
from syne_tune import Reporter
from argparse import ArgumentParser
if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    parser = ArgumentParser()
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--width", type=float)
    parser.add_argument("--height", type=float)
    args, _ = parser.parse_known_args()
    report = Reporter()
    for step in range(args.epochs):
        time.sleep(0.1)
        dummy_score = 1.0 / (0.1 + args.width * step / 100) + args.height * 0.1
        # Feed the score back to Syne Tune
        report(epoch=step + 1, mean_loss=dummy_score)
Once you have annotated your training script in this way, you can launch a tuning experiment as follows:
from pathlib import Path
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA
# Hyperparameter configuration space
config_space = {
    "width": randint(1, 20),
    "height": randint(1, 20),
    "epochs": 100,
}
# Scheduler (i.e., HPO algorithm)
scheduler = ASHA(
    config_space,
    metric="mean_loss",
    resource_attr="epoch",
    max_resource_attr="epochs",
    search_options={"debug_log": False},
)
entry_point = str(
    Path(__file__).parent
    / "training_scripts"
    / "height_example"
    / "train_height_simple.py"
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point=entry_point),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=30),
    n_workers=4,  # how many trials are evaluated in parallel
)
tuner.run()
This example runs ASHA asynchronously with n_workers=4 parallel workers for max_wallclock_time=30 seconds on the local machine it is called on (trial_backend=LocalBackend(entry_point=entry_point)).
Experimentation with Syne Tune
If you plan to use advanced features of Syne Tune, such as different execution backends or running experiments remotely, writing launcher scripts like examples/launch_height_simple.py can become tedious. Syne Tune provides an advanced experimentation framework, which you can learn about in this tutorial, or also in this one. Examples for the experimentation framework are given in benchmarking.examples and benchmarking.nursery.
Supported HPO Methods
The following hyperparameter optimization (HPO) methods are available in Syne Tune:
Method | Searcher | Asynchronous? | Multi-fidelity? | Transfer?
---|---|---|---|---
Grid Search | deterministic | yes | no | no
Random Search | random | yes | no | no
Bayesian Optimization | model-based | yes | no | no
BORE | model-based | yes | no | no
MedianStoppingRule | any | yes | yes | no
SyncHyperband | random | no | yes | no
SyncBOHB | model-based | no | yes | no
SyncMOBSTER | model-based | no | yes | no
ASHA | random | yes | yes | no
BOHB | model-based | yes | yes | no
MOBSTER | model-based | yes | yes | no
DEHB | evolutionary | no | yes | no
HyperTune | model-based | yes | yes | no
DyHPO * | model-based | yes | yes | no
ASHABORE | model-based | yes | yes | no
PASHA | random | yes | yes | no
REA | evolutionary | yes | no | no
KDE | model-based | yes | no | no
PBT | evolutionary | no | yes | no
ZeroShotTransfer | deterministic | yes | no | yes
ASHA-CTS | random | yes | yes | yes
RUSH | random | yes | yes | yes
BoundingBox | any | yes | yes | yes
*: We implement the model-based scheduling logic of DyHPO, but use the same Gaussian process surrogate models as MOBSTER and HyperTune. The original source code for the paper is here.
The searchers fall into four broad categories: deterministic, random, evolutionary, and model-based. The random searchers sample candidate hyperparameter configurations uniformly at random, while the model-based searchers sample them non-uniformly at random, according to a model (e.g., Gaussian process, density ratio estimator, etc.) and an acquisition function. The evolutionary searchers make use of an evolutionary algorithm.
Syne Tune also supports BoTorch searchers; see BoTorch.
Supported Multi-objective Optimization Methods
Method | Searcher | Asynchronous? | Multi-fidelity? | Transfer?
---|---|---|---|---
Constrained Bayesian Optimization | model-based | yes | no | no
MOASHA | random | yes | yes | no
NSGA-2 | evolutionary | no | no | no
MO Random Scalarization | model-based | yes | no | no
MO Linear Scalarization | model-based | yes | no | no
The HPO methods listed above can be used in a multi-objective setting by scalarization (LinearScalarizationPriority) or non-dominated sorting (NonDominatedPriority).
Security
See CONTRIBUTING for more information.
Citing Syne Tune
If you use Syne Tune in a scientific publication, please cite the following paper:
Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research
@inproceedings{
salinas2022syne,
title = {{Syne Tune}: A Library for Large Scale Hyperparameter Tuning and Reproducible Research},
author = {David Salinas and Matthias Seeger and Aaron Klein and Valerio Perrone and Martin Wistuba and Cedric Archambeau},
booktitle = {International Conference on Automated Machine Learning, AutoML 2022},
year = {2022},
url = {https://proceedings.mlr.press/v188/salinas22a.html}
}
License
This project is licensed under the Apache-2.0 License.
Frequently Asked Questions
Why should I use Syne Tune?
Hyperparameter Optimization (HPO) has been an important problem for many years, and a variety of commercial and open-source tools are available to help practitioners run HPO efficiently. Notable examples for open source tools are Ray Tune and Optuna. Here are some reasons why you may prefer Syne Tune over these alternatives:
Lightweight and platform-agnostic: Syne Tune is designed to work with different execution backends, so you are not locked into a particular distributed system architecture. Syne Tune runs with minimal dependencies.
Wide range of modalities: Syne Tune supports multi-fidelity HPO, constrained HPO, multi-objective HPO, transfer tuning, cost-aware HPO, population based training.
Simple, modular design: Rather than wrapping all sorts of other HPO frameworks, Syne Tune provides simple APIs and scheduler templates, which can easily be extended to your specific needs. Studying the code will allow you to understand what the different algorithms are doing, and how they differ from each other.
Industry-strength Bayesian optimization: Syne Tune has special support for Gaussian process based Bayesian optimization. The same code powers modalities like multi-fidelity HPO, constrained HPO, or cost-aware HPO, having been tried and tested for several years.
Support for distributed parallelized experimentation: We built Syne Tune to be able to move fast, using the parallel resources AWS SageMaker offers. Syne Tune allows ML/AI practitioners to easily set up and run studies with many experiments running in parallel.
Special support for researchers: Syne Tune allows for rapid development and comparison between different tuning algorithms. Its blackbox repository and simulator backend run realistic simulations of experiments many times faster than real time. Benchmarking is simple and efficient, and allows you to compare different methods apples-to-apples (same execution backend, implementations built from the same parts).
If you are an AWS customer, there are additional good reasons to use Syne Tune over the alternatives:
If you use AWS services or SageMaker frameworks day to day, Syne Tune works out of the box and fits into your normal workflow. It unlocks the power of distributed experimentation that SageMaker offers.
Syne Tune is developed in collaboration with the team behind the Automatic Model Tuning service.
What are the different installation options supported?
To install Syne Tune with minimal dependencies from pip, you can simply do:
pip install 'syne-tune'
If, in addition, you want to install our own Gaussian process based optimizers, Ray Tune, or the BORE optimizer, you can run pip install 'syne-tune[X]', where X can be:
gpsearchers: For built-in Gaussian process based optimizers (such as BayesianOptimization, MOBSTER, or HyperTune)
aws: AWS SageMaker dependencies. These are required for remote launching or for the SageMakerBackend
raytune: For Ray Tune optimizers (see RayTuneScheduler), installs all Ray Tune dependencies
benchmarks: For installing dependencies required to run all benchmarks locally (not needed for remote launching or SageMakerBackend)
blackbox-repository: Blackbox repository for simulated tuning
yahpo: YAHPO Gym surrogate blackboxes
kde: For BOHB (such as SyncBOHB, or FIFOScheduler or HyperbandScheduler with searcher="kde")
botorch: Bayesian optimization from BoTorch (see BoTorchSearcher)
dev: For developers who would like to extend Syne Tune
bore: For the BORE optimizer (see BORE)
There are also union tags you can use:
basic: Union of dependencies of a reasonable size (gpsearchers, kde, aws, moo, sklearn). Even if size does not matter for your local installation, you should consider basic for remote launching of experiments.
extra: Union of all dependencies listed above.
Our general recommendation is to use pip install 'syne-tune[basic]', then add:
dev if you aim to extend Syne Tune
benchmarks if you would like to run Syne Tune real benchmarks locally
blackbox-repository if you would like to run surrogate benchmarks with the simulator backend
visual if you would like to visualize results of experiments
In order to run schedulers which depend on BoTorch, you need to add botorch, and if you would like to run Ray Tune schedulers, you need to add raytune (both of these come with many dependencies). If the size of the installation is of no concern, just use pip install 'syne-tune[extra]'.
If you run code which needs dependencies you have not installed, a warning message tells you which tag is missing, and you can always install it later.
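For example, to combine several partial tags in one install (tag names as listed above):
pip install 'syne-tune[gpsearchers,kde,blackbox-repository]'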
To install the latest version from git, run the following:
pip install git+https://github.com/awslabs/syne-tune.git
For local development, we recommend using the following setup which will enable you to easily test your changes:
git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic,dev]'
This installs everything in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.
How can I run on AWS and SageMaker?
If you want to launch experiments or training jobs on SageMaker rather than on your local machine, you will need access to AWS and SageMaker on your machine. Make sure that:
awscli is installed (see this link).
AWS credentials have been set properly (see this link).
The necessary SageMaker role has been created (see this page for instructions; if you've created a SageMaker notebook in the past, this role should already have been created for you).
The following command should run without error if your credentials are available:
python -c "import boto3; print(boto3.client('sagemaker').list_training_jobs(MaxResults=1))"
You can also run the following example that evaluates trials on SageMaker to test your setup.
python examples/launch_height_sagemaker.py
What are the metrics reported by default when calling the Reporter?
Whenever you call the reporter to log a result, the worker time stamp, the worker time since the creation of the reporter, and the number of times the reporter was called are logged under the fields ST_WORKER_TIMESTAMP, ST_WORKER_TIME, and ST_WORKER_ITER. In addition, when running on SageMaker, a dollar-cost estimate is logged under the field ST_WORKER_COST.
You can simply call the reporter to see these metrics:
from syne_tune.report import Reporter
reporter = Reporter()
for step in range(3):
    reporter(step=step, metric=float(step) / 3)
# [tune-metric]: {"step": 0, "metric": 0.0, "st_worker_timestamp": 1644311849.6071281, "st_worker_time": 0.0001048670000045604, "st_worker_iter": 0}
# [tune-metric]: {"step": 1, "metric": 0.3333333333333333, "st_worker_timestamp": 1644311849.6071832, "st_worker_time": 0.00015910100000837701, "st_worker_iter": 1}
# [tune-metric]: {"step": 2, "metric": 0.6666666666666666, "st_worker_timestamp": 1644311849.60733, "st_worker_time": 0.00030723599996917983, "st_worker_iter": 2}
How can I utilize multiple GPUs?
To utilize multiple GPUs, you can use the local backend LocalBackend, which will run on the GPUs available on a local machine. You can also run on a remote AWS instance with multiple GPUs using the local backend and the remote launcher, see here, or run with the SageMakerBackend, which spins up one training job per trial.
When evaluating trials on a local machine with LocalBackend, each trial is by default allocated to the least occupied GPU by setting the CUDA_VISIBLE_DEVICES environment variable. When running on a machine with more than one GPU, you can adjust the number of GPUs assigned to each trial with num_gpus_per_trial. However, make sure that the product of n_workers and num_gpus_per_trial is not larger than the total number of GPUs, since otherwise trials will be delayed. You can also use gpus_to_use in order to restrict Syne Tune to a subset of the available GPUs.
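As a rough sketch (assuming a machine with four GPUs, and assuming these options are passed to the LocalBackend constructor; the entry point name is just a placeholder):
from syne_tune.backend import LocalBackend

# Each trial gets 2 GPUs; with n_workers=2 this uses GPUs 0-3 and nothing else
trial_backend = LocalBackend(
    entry_point="train_script.py",  # placeholder training script
    num_gpus_per_trial=2,
    gpus_to_use=[0, 1, 2, 3],  # assumed to be a list of GPU ids
)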
What is the default mode when performing optimization?
The default mode is "min" when performing optimization, so the target metric is minimized. The mode can be configured when instantiating a scheduler.
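For example, to maximize a validation accuracy instead (using the BayesianOptimization baseline as in the XGBoost example further below; config_space is assumed to be defined):
from syne_tune.optimizer.baselines import BayesianOptimization

# mode="max" tells the scheduler to maximize the reported metric
scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
)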
How are trials evaluated on a local machine?
When trials are executed locally (e.g., when LocalBackend is used), each trial is evaluated as a different sub-process. As such, the number of configurations evaluated concurrently (set by n_workers when creating the Tuner) should account for the capacity of the machine where the trials are executed.
Is the tuner checkpointed?
Yes. When performing the tuning, the tuner state is regularly saved on the experiment path under tuner.dill (every 10 seconds by default, which can be configured with results_update_interval when creating the Tuner). This allows using spot instances when running a tuning remotely with the remote launcher. It also allows resuming a past experiment or analysing the state of the scheduler at any point.
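For example (a sketch; trial_backend and scheduler are assumed to be defined as in the examples on this page):
from syne_tune import Tuner, StoppingCriterion

tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=3600),
    n_workers=4,
    results_update_interval=60,  # save results and tuner state every 60 seconds
)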
Where can I find the output of the tuning?
When running locally, the output of the tuning is saved under ~/syne-tune/{tuner-name}/ by default. When running remotely on SageMaker, the output of the tuning is saved under /opt/ml/checkpoints/ by default, and the tuning output is synced regularly to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/.
Can I resume a previous tuning job?
Yes. If you want to resume tuning, you can deserialize the tuner that is regularly checkpointed to disk, possibly after having modified some part of the scheduler or adapted the stopping condition to your needs. See examples/launch_resume_tuning.py for an example which resumes a previous tuning after updating the configuration space.
How can I change the default output folder where tuning results are stored?
To change the path where tuning results are written, you can set the environment variable SYNETUNE_FOLDER to the folder that you want. For instance, the following runs a tuning where result files are written under ~/new-syne-tune-folder:
export SYNETUNE_FOLDER="~/new-syne-tune-folder"
python examples/launch_height_baselines.py
You can also do the following for instance to permanently change the output folder of Syne Tune:
echo 'export SYNETUNE_FOLDER="~/new-syne-tune-folder"' >> ~/.bashrc && source ~/.bashrc
What does the output of the tuning contain?
Syne Tune stores the following files: metadata.json, results.csv.zip, and tuner.dill, which are respectively the metadata of the tuning job, the results obtained at each time step, and the state of the tuner. If you create the Tuner with save_tuner=False, the tuner.dill file is not written. The content of results.csv.zip can be customized.
How can I enable trial checkpointing?
Since trials may be paused and resumed (either by schedulers or when using spot instances), the user may checkpoint intermediate results to avoid starting computation from scratch. Model outputs and checkpoints must be written into a specific local path given by the command line argument ST_CHECKPOINT_DIR. Saving and loading model checkpoints from this directory enables saving and loading the state when the job is stopped or resumed (setting the folder correctly and uniquely per trial is the responsibility of the backend). Here is an example of a tuning script with checkpointing enabled:
import argparse
import json
import logging
import os
import time
from pathlib import Path
from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR
report = Reporter()
def load_checkpoint(checkpoint_path: Path):
    with open(checkpoint_path, "r") as f:
        return json.load(f)


def save_checkpoint(checkpoint_path: Path, epoch: int, value: float):
    os.makedirs(checkpoint_path.parent, exist_ok=True)
    with open(checkpoint_path, "w") as f:
        json.dump({"last_epoch": epoch, "last_value": value}, f)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument("--num-epochs", type=int, required=True)
    parser.add_argument("--multiplier", type=float, default=1)
    parser.add_argument("--sleep-time", type=float, default=0.1)
    # By convention, the path where to serialize and deserialize is given as st_checkpoint_dir
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)
    args, _ = parser.parse_known_args()
    num_epochs = args.num_epochs
    checkpoint_path = None
    start_epoch = 1
    current_value = 0
    checkpoint_dir = getattr(args, ST_CHECKPOINT_DIR)
    if checkpoint_dir is not None:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.json"
        if checkpoint_path.exists():
            state = load_checkpoint(checkpoint_path)
            logging.info(f"resuming from previous checkpoint {state}")
            start_epoch = state["last_epoch"] + 1
            current_value = state["last_value"]
    # Write dummy values for the loss to illustrate the ability to retrieve metrics;
    # this should be replaced by your training algorithm
    for current_epoch in range(start_epoch, num_epochs + 1):
        time.sleep(args.sleep_time)
        current_value = (current_value + 1) * args.multiplier
        if checkpoint_path is not None:
            save_checkpoint(checkpoint_path, current_epoch, current_value)
        report(train_acc=current_value, epoch=current_epoch)
When using the SageMaker backend, we use the SageMaker checkpoint mechanism under the hood to sync local checkpoints to S3. Checkpoints are synced to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/{trial-id}/, where sagemaker-default-bucket is the default bucket for SageMaker. A complete example is given by examples/launch_height_sagemaker_checkpoints.py.
The same mechanism is used to regularly write the
tuning results to S3 during remote tuning.
However, during remote tuning with the local backend, we do not want
checkpoints to be synced to S3, since they are only required temporarily on the
same instance. Syncing them to S3 would be costly and error-prone, because the
SageMaker mechanism is not intended to work with different processes writing to
and reading from the sync directory concurrently. In this case, we can switch
off syncing checkpoints to S3 (but not tuning results!) by setting
trial_backend_path=backend_path_not_synced_to_s3()
when creating the
Tuner
object. An example is
fine_tuning_transformer_glue/hpo_main.py.
It is also supported by default in the
experimentation framework and in
RemoteLauncher
.
There are some convenience functions which help you to implement checkpointing for your training script. Have a look at resnet_cifar10.py:
Checkpoints have to be written at the end of certain epochs (namely those after which the scheduler may pause the trial). This is dealt with by checkpoint_model_at_rung_level(config, save_model_fn, epoch). Here, epoch is the current epoch, allowing the function to decide whether to checkpoint or not. save_model_fn stores the current mutable state along with epoch to a local path (see below). Finally, config contains arguments provided by the scheduler (see below).
Before the training loop starts (and optionally), the mutable state to start from has to be loaded from a checkpoint. This is done by resume_from_checkpointed_model(config, load_model_fn). If the checkpoint has been loaded successfully, the training loop may start with epoch resume_from + 1 instead of 1. Here, load_model_fn loads the mutable state from a checkpoint in a local path, returning its epoch value if successful, which is returned as resume_from.
In general, load_model_fn and save_model_fn have to be provided as part of the script. For most PyTorch models, you can use pytorch_load_save_functions to this end. Typically, you will want to include the model, the optimizer, and the learning rate scheduler.
Finally, the scheduler provides additional information about checkpointing in
config
(most importantly, the path in
ST_CHECKPOINT_DIR
). You don’t have to worry about
this: add_checkpointing_to_argparse(parser)
adds corresponding arguments to
the parser.
How can I retrieve the best checkpoint obtained after tuning?
You can take a look at this example examples/launch_checkpoint_example.py which shows how to retrieve the best checkpoint obtained after tuning an XGBoost model.
How can I retrain the best model found after tuning?
You can call tuner.trial_backend.start_trial(config=tuner.best_config()) after tuning to retrain the best configuration. You can take a look at the example examples/launch_plot_example.py, which shows how to retrain the best model found while tuning.
Which schedulers make use of checkpointing?
Checkpointing means storing the state of a trial (i.e., model parameters, optimizer or learning rate scheduler parameters), so that it can be paused and potentially resumed at a later point in time, without having to start training from scratch. The following schedulers make use of checkpointing:
Promotion-based asynchronous Hyperband: HyperbandScheduler with type="promotion" or type="dyhpo", as well as other asynchronous multi-fidelity schedulers. The code runs without checkpointing, but in this case, any trial which is resumed is started from scratch. For example, if a trial was paused after 9 epochs of training and is resumed later, training starts from scratch and the first 9 epochs are wasted effort. Moreover, extra variance is introduced by starting from scratch, since weights may be initialized differently. It is not recommended to run promotion-based Hyperband without checkpointing.
Population-based training (PBT): PopulationBasedTraining does not work without checkpointing.
Synchronous Hyperband: SynchronousGeometricHyperbandScheduler, as well as other synchronous multi-fidelity schedulers. This code runs without checkpointing, but wastes effort in the same sense as promotion-based asynchronous Hyperband.
Checkpoints are filling up my disk. What can I do?
When tuning large models, checkpoints can be large, and with the local backend, these checkpoints are stored locally. With multi-fidelity methods, many trials may be started, and keeping all checkpoints (which is the default) may exceed the available disk space.
If the trial backend TrialBackend
is
created with delete_checkpoints=True
, Syne Tune removes the checkpoint of a
trial once it is stopped or completes. All remaining checkpoints are removed at
the end of the experiment. Moreover, a number of schedulers support early
checkpoint removal for paused trials when they cannot be resumed anymore.
For promotion-based asynchronous multi-fidelity schedulers (
ASHA,
MOBSTER,
HyperTune), any
paused trial can in principle be resumed in the future, and
delete_checkpoints=True
alone does not remove checkpoints. In this case,
you can activate speculative early checkpoint removal, by passing
early_checkpoint_removal_kwargs
when creating
HyperbandScheduler
(or
ASHA
,
MOBSTER
,
HyperTune
). This is a kwargs
dictionary with the following arguments:
max_num_checkpoints: This is mandatory. Maximum number of trials with checkpoints being retained. Once more than this number of trials with checkpoints are present, checkpoints are removed selectively. This number must be larger than the number of workers, since running trials will always write checkpoints.
approx_steps: Positive integer. The computation of the ranking score is a step-wise approximation, which gets more accurate for larger approx_steps. However, this computation scales cubically in approx_steps. The default is 25, which may be sufficient in most cases, but if you need to keep the number of checkpoints quite small, you may want to tune this parameter.
max_wallclock_time: Maximum time in seconds the experiment is run for. This is the same as passed to StoppingCriterion, and if you use an instance of this as stop_criterion passed to Tuner, the value is taken from there. Speculative checkpoint removal can only be used if the stopping criterion includes max_wallclock_time.
prior_beta_mean: The method depends on the probability of the event that a trial arriving at a rung ranks better than a random paused trial with checkpoint at this rung. These probabilities are estimated for each rung, but we need some initial guess. You are most likely fine with the default. A value < 1/2 is recommended.
prior_beta_size: See also prior_beta_mean. The initial guess is a Beta prior, defined in terms of mean and effective sample size (here). The smaller this positive number, the weaker the effect of the initial guess. You are most likely fine with the default.
min_data_at_rung: Also related to the estimators mentioned with prior_beta_mean. You are most likely fine with the default.
A complete example is
examples/launch_fashionmnist_checkpoint_removal.py.
For details on speculative checkpoint removal, look at
HyperbandRemoveCheckpointsCallback
.
Where can I find the output of my trials?
When running LocalBackend locally, results of trials are saved under ~/syne-tune/{tuner-name}/{trial-id}/ and contain the following files:
config.json: configuration that is being evaluated in the trial
std.err: standard error
std.out: standard output
In addition, all checkpointing files used by a training script, such as intermediate model checkpoints, will also be located there. This is exemplified in the following example:
tree ~/syne-tune/train-height-2022-01-12-11-08-40-971/
~/syne-tune/train-height-2022-01-12-11-08-40-971/
├── 0
│ ├── config.json
│ ├── std.err
│ ├── std.out
│ └── stop
├── 1
│ ├── config.json
│ ├── std.err
│ ├── std.out
│ └── stop
├── 2
│ ├── config.json
│ ├── std.err
│ ├── std.out
│ └── stop
├── 3
│ ├── config.json
│ ├── std.err
│ ├── std.out
│ └── stop
├── metadata.json
├── results.csv.zip
└── tuner.dill
When running tuning remotely with the remote launcher, only config.json
,
metadata.json
, results.csv.zip
and tuner.dill
are synced with S3
unless store_logs_localbackend=True is set when creating the Tuner, in which case the trial logs and information are also persisted.
Is the experimentation framework only useful to compare different HPO methods?
No, by all means no! Most of our users do not use it that way, but simply to speed up experimentation, often with a single HPO method, but many variants of their problem. More details about Syne Tune for rapid experimentation are provided here and here. Just to clarify:
We use the term benchmark to denote a tuning problem, consisting of some code for training and evaluation, plus some default configuration space (which can be changed to result in different variants of the benchmark).
While the code for the experimentation framework resides in syne_tune.experiments, we collect example benchmarks in benchmarking (only available if Syne Tune is installed from source). Many of the examples there are about comparison of different HPO methods, but some are not (for example, benchmarking.examples.demo_experiment).
In fact, while you do not have to use the experimentation framework to run studies in Syne Tune, it is much easier than maintaining your own launcher scripts and plotting code, so you are strongly encouraged to do so, whether your goal is benchmarking HPO methods or simply finding a good ML model for your current problem faster.
How can I plot the results of a tuning?
Some basic plots can be obtained via
ExperimentResult
. An example is given in
examples/launch_plot_results.py.
How can I plot comparative results across many experiments?
Syne Tune contains powerful plotting tools as part of the experimentation framework in syne_tune.experiments; these are detailed here. An example is provided as part of benchmarking/examples/benchmark_hypertune.
How can I specify additional tuning metadata?
By default, Syne Tune stores the time, the names and modes of the metrics being tuned, the name of the entry point, the backend name, and the scheduler name. You can also add custom metadata to your tuning job by setting metadata in Tuner as follows:
from syne_tune import Tuner
tuner = Tuner(
    ...
    tuner_name="plot-results-demo",
    metadata={"tag": "special-tag", "user": "alice"},
)
All Syne Tune and user metadata are saved when the tuner starts under metadata.json.
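To read the metadata back after the experiment, a short sketch (it assumes the experiment result returned by load_experiment exposes a metadata attribute):
from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment("plot-results-demo")
print(tuning_experiment.metadata)  # Syne Tune metadata plus the custom entries above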
How do I append additional information to the results which are stored?
Results are processed and stored by callbacks passed to
Tuner
, in particular see
StoreResultsCallback
. In order to add more
information, you can inherit from this class. An example is given in
StoreResultsAndModelParamsCallback
.
If you run experiments with tabulated benchmarks using the
BlackboxRepositoryBackend
, as demonstrated in
launch_nasbench201_simulated.py,
results are stored by
SimulatorCallback
instead, and you need to inherit from this class. An example is given in
SimulatorAndModelParamsCallback
.
I don’t want to wait, how can I launch the tuning on a remote machine?
Remote launching of experiments has a number of advantages:
The machine you are working on is not blocked
You can launch many experiments in parallel
You can launch experiments with any instance type you like, without having to provision them yourselves. For GPU instances, you do not have to worry about setting up CUDA, etc.
You can use the remote launcher to launch an experiment on a remote machine.
The remote launcher supports both LocalBackend
and
SageMakerBackend
. In the former case, multiple
trials will be evaluated on the remote machine (one use-case being to use a
beefy machine), in the latter case trials will be evaluated as separate
SageMaker training jobs. An example for running the remote launcher is
given in
launch_height_sagemaker_remotely.py.
Remote launching for experimentation is detailed in this tutorial or this tutorial.
How can I run many experiments in parallel?
You can remotely launch any number of experiments, which will then run in parallel, as detailed in this tutorial, see also these examples:
Local backend: benchmarking/examples/launch_local/
Simulator backend: benchmarking/examples/benchmark_dehb/
SageMaker backend: benchmarking/examples/launch_sagemaker/
Note
In order to run these examples, you need to have installed Syne Tune from source.
How can I access results after tuning remotely?
You can call load_experiment(), which will download files from S3 if the experiment is not found locally. You can also sync files directly from S3 to the ~/syne-tune/ folder in batch, for instance by running:
aws s3 sync s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/ ~/syne-tune/ --include "*" --exclude "*tuner.dill"
This gets all results without the tuner state; you can omit the include and exclude options if you also want to download the tuner state.
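For example (the tuner name is just an illustration, taken from the directory listing shown earlier on this page):
from syne_tune.experiments import load_experiment

# Downloads the experiment data from S3 if it is not found locally
tuning_experiment = load_experiment("train-height-2022-01-12-11-08-40-971")
print(tuning_experiment.best_config())
tuning_experiment.plot()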
How can I specify dependencies to remote launcher or when using the SageMaker backend?
When you run remote code, you often need to install packages (e.g., scipy) or have custom code available.
To install packages, you can add a file requirements.txt in the same folder as your endpoint script. All those packages will be installed by SageMaker when the Docker container starts.
To include custom code (for instance a library that you are working on), you can set the parameter dependencies on the remote launcher or on a SageMaker framework to a list of folders. The folders indicated will be compressed, sent to S3, and added to the Python path when the container starts. More details are given in this tutorial.
How can I benchmark different methods?
The most flexible way to do so is to write a custom launcher script, as detailed in this tutorial, see also these examples:
Local backend: benchmarking/examples/launch_local/
Simulator backend: benchmarking/examples/benchmark_dehb/
SageMaker backend: benchmarking/examples/launch_sagemaker/
Fine-tuning transformers: benchmarking/examples/fine_tuning_transformer_glue/
Hyper-Tune: benchmarking/examples/benchmark_hypertune/
Note
In order to run these examples, you need to have installed Syne Tune from source.
What different schedulers do you support? What are the main differences between them?
A succinct overview of supported schedulers is provided here.
Most methods can be accessed with short names from syne_tune.optimizer.baselines, which is the best place to start.
We refer to HPO algorithms as schedulers. A scheduler decides which configurations to assign to new trials, but also when to stop a running or resume a paused trial. Some schedulers delegate the first decision to a searcher. The most important differences between schedulers in the single-objective case are:
Does the scheduler stop trials early or pause and resume trials (HyperbandScheduler) or not (FIFOScheduler)? The former requires a resource dimension (e.g., number of epochs; size of training set) and slightly more elaborate reporting (e.g., evaluation after every epoch), but can outperform the latter by a large margin.
Does the searcher suggest new configurations by uniform random sampling (searcher="random") or by sequential model-based decision-making (searcher="bayesopt", searcher="kde", searcher="hypertune", searcher="botorch", searcher="dyhpo")? The latter can be more expensive if a lot of trials are run, but can also be more sample-efficient.
An overview of this landscape is given here.
Here is a tutorial for multi-fidelity schedulers. Further schedulers provided by Syne Tune include:
Bayesian optimization by density-ratio estimation: BORE
Regularized evolution: REA
Median stopping rule: MedianStoppingRule
How do I define the configuration space?
While the training script defines the function to be optimized, some
care needs to be taken to define the configuration space for the hyperparameter
optimization problem. This being a global optimization problem without
gradients easily available, it is most important to reduce the number of
parameters. A general recommendation is to use
streamline_config_space()
on your configuration space,
which does some automatic rewriting to enforce best practices. Details on how
to choose a configuration space, and on automatic rewriting, is given
here.
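A short sketch (the import path syne_tune.utils for streamline_config_space is an assumption; the hyperparameter names are illustrative):
from syne_tune.config_space import choice, loguniform, randint
from syne_tune.utils import streamline_config_space

config_space = {
    "learning_rate": loguniform(1e-6, 1e-3),
    "batch_size": choice([16, 32, 64, 128]),
    "weight_decay": loguniform(1e-6, 1e-2),
    "epochs": 100,
}
# Rewrites domains according to best practices (e.g., more suitable encodings)
config_space = streamline_config_space(config_space)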
A powerful approach is to run experiments in parallel. Namely, split your hyperparameters into groups A, B, such that HPO over B is tractable. Draw a set of N configurations from A at random, then start N HPO experiments in parallel, where in each of them the search space is over B only, while the parameters in A are fixed. Syne Tune supports massively parallel experimentation, see this tutorial.
How do I set arguments of multi-fidelity schedulers?
When running schedulers like ASHA
,
MOBSTER
,
HyperTune
,
SyncHyperband
,
or DEHB
, there are mandatory parameters
resource_attr
, max_resource_attr
, max_t
, max_resource_value
.
What are they for?
Full details are given in this tutorial. Multi-fidelity HPO needs metric values to be reported at regular intervals during training, for example after every epoch, or for successively larger training datasets. These reports are indexed by a resource value, which is a positive integer (for example, the number of epochs already trained).
resource_attr is the name of the resource attribute in the dictionary reported by the training script. For example, the script may report report(epoch=5, mean_loss=0.125) at the end of the 5th epoch, in which case resource_attr = "epoch".
The training script needs to know how many resources to spend overall. For example, a neural network training script needs to know how many epochs to maximally train for. It is best practice to pass this maximum resource value as a parameter into the script, which is done by making it part of the configuration space. In this case, max_resource_attr is the name of the attribute in the configuration space which contains the maximum resource value. For example, if your script should train for a maximum of 100 epochs (the scheduler may stop or pause it earlier, though), you could use config_space = dict(..., epochs=100), in which case max_resource_attr = "epochs".
Finally, you can also use max_t instead of max_resource_attr, even though this is not recommended. If you don't want to include the maximum resource value in your configuration space, you can pass the value directly as max_t. However, this can lead to avoidable errors, and may be inefficient for some schedulers.
Note
When creating a multi-fidelity scheduler, we recommend using max_resource_attr in favour of max_t or max_resource_value, as the latter are error-prone and may be less efficient for some schedulers.
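Putting this together for the train_height example used elsewhere on this page:
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "width": randint(1, 20),
    "height": randint(1, 20),
    "epochs": 100,  # maximum resource value, passed to the training script
}
scheduler = ASHA(
    config_space,
    metric="mean_loss",
    resource_attr="epoch",       # name under which the script reports the resource
    max_resource_attr="epochs",  # key of the maximum resource value in config_space
)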
Is my training script ready for multi-fidelity tuning?
A more detailed answer to this question is given in the multi-fidelity tutorial. In short:
You need to define the notion of resource for your script. Resource is a discrete variable (integer), so that time/costs scale linearly in it for every configuration. A common example is epochs of training for a neural network. You need to pass the name of this argument as max_resource_attr to the multi-fidelity scheduler.
One input argument to your script is the maximum number of resources. Your script loops over resources until this is reached, then terminates.
At the end of this resource loop (e.g., loop over training epochs), you report metrics. Here, you need to report the current resource level as well (e.g., number of epochs trained so far).
It is recommended to support checkpointing, as is detailed here.
Note
In pause-and-resume multi-fidelity schedulers, we know for how many
resources each training job runs, since it is paused at the next rung
level. Such schedulers will pass this resource level via
max_resource_attr
to the training script. This means that the
script terminates on its own and does not have to be stopped by the
trial execution backend.
How can I visualize the progress of my tuning experiment with Tensorboard?
To visualize the progress of Syne Tune in
Tensorboard, you can pass
the TensorboardCallback
to the
Tuner
object:
from syne_tune.callbacks import TensorboardCallback
tuner = Tuner(
...
callbacks=[TensorboardCallback()],
)
Note that you need to install TensorboardX to use this callback:
pip install tensorboardX
The callback will log all metrics that are reported in your training script via
the report(...)
function. Now, to open Tensorboard, run:
tensorboard --logdir ~/syne-tune/{tuner-name}/tensorboard_output
If you want to plot the cumulative optimum of the metric you want to optimize, you can pass the target_metric argument to TensorboardCallback. This will also report the best hyperparameter configuration found over time. A complete example is examples/launch_tensorboard_example.py.
How can I add a new scheduler?
This is explained in detail in this tutorial, and also in examples/launch_height_standalone_scheduler.
Please do consider contributing back your efforts to the Syne Tune community, thanks!
How can I add a new tabular or surrogate benchmark?
To add a new dataset of tabular evaluations, you need to:
write a blackbox recipe able to regenerate it by extending BlackboxRecipe. In particular, you need to provide the name of the blackbox, the reference so that users are prompted to cite the appropriate paper, and code that can generate it from scratch. See syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench for an example.
add your new recipe class in syne_tune.blackbox_repository.conversion_scripts.recipes to make it available in Syne Tune.
Further details are given here.
How can I reduce delays in starting trials with the SageMaker backend?
The SageMaker backend executes each trial as a SageMaker training job, which incurs start-up delays of up to several minutes. These delays can be reduced to about 20 seconds with SageMaker managed warm pools, as detailed in this tutorial or this example. We strongly recommend using managed warm pools with the SageMaker backend.
How can I pass lists or dictionaries to the training script?
By default, the hyperparameter configuration is passed to the training script
as command line arguments. This precludes parameters from having complex types,
such as lists or dictionaries. The configuration can also be passed as JSON
file, in which case its entries can have any type which is JSON-serializable.
This mode is activated with pass_args_as_json=True
when creating the trial
backend:
trial_backend = LocalBackend(
    entry_point=str(entry_point),
    pass_args_as_json=True,
)
The trial backend stores the configuration as JSON file and passes its filename as command line argument. In the training script, the configuration is loaded as follows:
parser = ArgumentParser()
# Append required argument(s):
add_config_json_to_argparse(parser)
args, _ = parser.parse_known_args()
# Loads config JSON and merges with ``args``
config = load_config_json(vars(args))
The complete example is
here.
Note that entries automatically appended to the configuration by Syne Tune, such
as ST_CHECKPOINT_DIR
, are passed as command line
arguments in any case.
How can I write extra results for an experiment?
By default, Syne Tune writes these result files at the end of an experiment.
Here, results.csv.zip
contains all data reported by training jobs, along
with time stamps. The contents of this dataframe can be customized, by adding
extra columns to it, as demonstrated in
examples/launch_height_extra_results.py.
Examples
Tune XGBoost
Install dependencies
%pip install 'syne-tune[basic]'
%pip install xgboost
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint, uniform, loguniform
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune.experiments import load_experiment
Define the training function
def train(n_estimators: int, max_depth: int, gamma: float, reg_lambda: float):
    '''Training function (the function to be tuned) with hyperparameters passed in as function arguments.

    This example demonstrates training an XGBoost model on the UCI ML hand-written digits dataset.
    Note that the training function must be totally self-contained as it needs to be serialized.
    Everything (including variables and dependencies) must be defined or imported inside the function scope.

    For more information on XGBoost's hyperparameters, see https://xgboost.readthedocs.io/en/stable/parameter.html
    For more information about the dataset, see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
    '''
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from syne_tune import Reporter
    import xgboost
    import numpy as np

    X, y = load_digits(return_X_y=True)
    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)
    report = Reporter()
    clf = xgboost.XGBClassifier(
        n_estimators=n_estimators,
        reg_lambda=reg_lambda,
        gamma=gamma,
        max_depth=max_depth,
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_val)
    accuracy = (np.equal(y_val, y_pred) * 1.0).mean()
    # report metrics back to syne tune
    report(accuracy=accuracy)
Define the tuning parameters
# Hyperparameter configuration space
config_space = {
    "max_depth": randint(1, 10),
    "gamma": uniform(1, 10),
    "reg_lambda": loguniform(0.0000001, 1),
    "n_estimators": randint(5, 15),
}

# Scheduler (i.e., HPO algorithm)
scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
)

tuner = Tuner(
    trial_backend=PythonBackend(tune_function=train, config_space=config_space),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=30),
    n_workers=4,  # how many trials are evaluated in parallel
)
Run the tuning
tuner.run()
tuning_experiment = load_experiment(tuner.name)
print(f"best result found: {tuning_experiment.best_config()}")
tuning_experiment.plot()
Launch HPO Experiment Locally
import logging
from pathlib import Path
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import (
RandomSearch,
ASHA,
)
from examples.training_scripts.height_example.train_height import (
RESOURCE_ATTR,
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
from syne_tune.try_import import try_import_gpsearchers_message
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_epochs = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_epochs,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
scheduler_kwargs = {
"config_space": config_space,
"metric": METRIC_ATTR,
"mode": METRIC_MODE,
"max_resource_attr": MAX_RESOURCE_ATTR,
}
schedulers = [
RandomSearch(**scheduler_kwargs),
ASHA(**scheduler_kwargs, resource_attr=RESOURCE_ATTR),
]
try:
from syne_tune.optimizer.baselines import BayesianOptimization
# example of setting additional kwargs arguments
schedulers.append(
BayesianOptimization(
**scheduler_kwargs,
search_options={"num_init_random": n_workers + 2},
)
)
from syne_tune.optimizer.baselines import MOBSTER
schedulers.append(MOBSTER(**scheduler_kwargs, resource_attr=RESOURCE_ATTR))
except Exception:
logging.info(try_import_gpsearchers_message())
for scheduler in schedulers:
logging.info(f"\n*** running scheduler {scheduler} ***\n")
trial_backend = LocalBackend(entry_point=str(entry_point))
stop_criterion = StoppingCriterion(
max_wallclock_time=20, min_metric_value={METRIC_ATTR: -6.0}
)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
Along with several of the examples below, this launcher script uses the following train_height.py training script:
"""
Example similar to Raytune, https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/skopt_example.py
"""
import logging
import time
from typing import Optional, Dict, Any
from syne_tune import Reporter
from argparse import ArgumentParser
from syne_tune.config_space import randint
report = Reporter()
RESOURCE_ATTR = "epoch"
METRIC_ATTR = "mean_loss"
METRIC_MODE = "min"
MAX_RESOURCE_ATTR = "steps"
def train_height(step: int, width: float, height: float) -> float:
return 100 / (10 + width * step) + 0.1 * height
def height_config_space(
max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
kwargs = {"sleep_time": sleep_time} if sleep_time is not None else dict()
return {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
**kwargs,
}
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
parser = ArgumentParser()
parser.add_argument("--" + MAX_RESOURCE_ATTR, type=int)
parser.add_argument("--width", type=float)
parser.add_argument("--height", type=float)
parser.add_argument("--sleep_time", type=float, default=0.1)
args, _ = parser.parse_known_args()
width = args.width
height = args.height
num_steps = getattr(args, MAX_RESOURCE_ATTR)
for step in range(num_steps):
# Sleep first, since results are returned at end of "epoch"
time.sleep(args.sleep_time)
# Feed the score back to Syne Tune.
dummy_score = train_height(step, width, height)
report(
**{
"step": step,
METRIC_ATTR: dummy_score,
RESOURCE_ATTR: step + 1,
}
)
Fine-Tuning Hugging Face Model for Sentiment Classification
"""
Example for how to fine-tune a DistilBERT model on the IMDB sentiment classification task using the Hugging Face SageMaker Framework.
"""
import logging
from pathlib import Path
from sagemaker.huggingface import HuggingFace
import syne_tune
from benchmarking.benchmark_definitions import distilbert_imdb_benchmark
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
HUGGINGFACE_LATEST_PYTORCH_VERSION,
HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
HUGGINGFACE_LATEST_PY_VERSION,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
# We pick the DistilBERT on IMDB benchmark
# The 'benchmark' dict contains arguments needed by scheduler and
# searcher (e.g., 'mode', 'metric'), along with suggested default values
# for other arguments (which you are free to override)
random_seed = 31415927
n_workers = 4
benchmark = distilbert_imdb_benchmark()
mode = benchmark.mode
metric = benchmark.metric
config_space = benchmark.config_space
# Define Hugging Face SageMaker estimator
root = Path(syne_tune.__path__[0]).parent
estimator = HuggingFace(
framework_version=HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
transformers_version=HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
pytorch_version=HUGGINGFACE_LATEST_PYTORCH_VERSION,
py_version=HUGGINGFACE_LATEST_PY_VERSION,
entry_point=str(benchmark.script),
base_job_name="hpo-transformer",
instance_type=benchmark.instance_type,
instance_count=1,
role=get_execution_role(),
dependencies=[root / "benchmarking"],
sagemaker_session=default_sagemaker_session(),
)
# SageMaker backend
trial_backend = SageMakerBackend(
sm_estimator=estimator,
metrics_names=[metric],
)
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=mode, metric=metric, random_seed=random_seed
)
stop_criterion = StoppingCriterion(
max_wallclock_time=3000
) # wall clock time can be increased to 1 hour for more performance
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
Requirements:
Running this script requires Syne Tune to be installed from source.
Runs on four ml.g4dn.xlarge instances.
In this example, we use the SageMaker backend together with the SageMaker Hugging Face framework in order to fine-tune a DistilBERT model on the IMDB sentiment classification task. This task is one of our built-in benchmarks. For other ways to run this benchmark on different backends or remotely, consult this tutorial.
A more advanced example for fine-tuning Hugging Face transformers is given here.
Launch HPO Experiment with Python Backend
"""
An example showing to launch a tuning of a python function ``train_height``.
"""
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA
def train_height(steps: int, width: float, height: float):
"""
The function to be tuned, note that import must be in PythonBackend and no global variable are allowed,
more details on requirements of tuned functions can be found in
:class:`~syne_tune.backend.PythonBackend`.
"""
import logging
from syne_tune import Reporter
import time
root = logging.getLogger()
root.setLevel(logging.INFO)
reporter = Reporter()
for step in range(steps):
dummy_score = (0.1 + width * step / 100) ** (-1) + height * 0.1
# Feed the score back to Syne Tune.
reporter(step=step, mean_loss=dummy_score, epoch=step + 1)
time.sleep(0.1)
if __name__ == "__main__":
import logging
root = logging.getLogger()
root.setLevel(logging.INFO)
max_steps = 100
n_workers = 4
metric = "mean_loss"
mode = "min"
max_resource_attr = "steps"
config_space = {
max_resource_attr: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
scheduler = ASHA(
config_space,
metric=metric,
max_resource_attr=max_resource_attr,
resource_attr="epoch",
mode=mode,
)
trial_backend = PythonBackend(tune_function=train_height, config_space=config_space)
stop_criterion = StoppingCriterion(
max_wallclock_time=10, min_metric_value={metric: -6.0}
)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
The Python backend does not need a separate training script.
Population-Based Training (PBT)
import logging
from pathlib import Path
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import PopulationBasedTraining
from syne_tune import Tuner
from syne_tune.config_space import loguniform
from syne_tune import StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
max_trials = 100
config_space = {
"lr": loguniform(0.0001, 0.02),
}
entry_point = (
Path(__file__).parent / "training_scripts" / "pbt_example" / "pbt_example.py"
)
trial_backend = LocalBackend(entry_point=str(entry_point))
mode = "max"
metric = "mean_accuracy"
time_attr = "training_iteration"
population_size = 2
pbt = PopulationBasedTraining(
config_space=config_space,
metric=metric,
resource_attr=time_attr,
population_size=population_size,
mode=mode,
max_t=200,
perturbation_interval=1,
)
local_tuner = Tuner(
trial_backend=trial_backend,
scheduler=pbt,
stop_criterion=StoppingCriterion(max_wallclock_time=20),
n_workers=population_size,
results_update_interval=1,
)
local_tuner.run()
This launcher script uses the following pbt_example.py training script:
import numpy as np
import argparse
import logging
import json
import os
import random
import time
from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR
report = Reporter()
def pbt_function(config):
"""Toy PBT problem for benchmarking adaptive learning rate.
The goal is to optimize this trainable's accuracy. The accuracy increases
fastest at the optimal lr, which is a function of the current accuracy.
The optimal lr schedule for this problem is the triangle wave as follows.
Note that many lr schedules for real models also follow this shape:
best lr
^
| /\
| / \
| / \
| / \
------------> accuracy
In this problem, using PBT with a population of 2-4 is sufficient to
roughly approximate this lr schedule. Higher population sizes will yield
faster convergence. Training will not converge without PBT.
"""
lr = config["lr"]
checkpoint_dir = config.get(ST_CHECKPOINT_DIR)
accuracy = 0.0 # end = 1000
start = 1
if checkpoint_dir and os.path.isdir(checkpoint_dir):
with open(os.path.join(checkpoint_dir, "checkpoint.json"), "r") as f:
state = json.loads(f.read())
accuracy = state["acc"]
start = state["step"]
midpoint = 100 # lr starts decreasing after acc > midpoint
q_tolerance = 3 # penalize exceeding lr by more than this multiple
noise_level = 2 # add gaussian noise to the acc increase
# triangle wave:
# - start at 0.001 @ t=0,
# - peak at 0.01 @ t=midpoint,
# - end at 0.001 @ t=midpoint * 2,
for step in range(start, 200):
if accuracy < midpoint:
optimal_lr = 0.01 * accuracy / midpoint
else:
optimal_lr = 0.01 - 0.01 * (accuracy - midpoint) / midpoint
optimal_lr = min(0.01, max(0.001, optimal_lr))
# Compute accuracy increase
q_err = max(lr, optimal_lr) / min(lr, optimal_lr)
if q_err < q_tolerance:
accuracy += (1.0 / q_err) * random.random()
elif lr > optimal_lr:
accuracy -= (q_err - q_tolerance) * random.random()
accuracy += noise_level * np.random.normal()
accuracy = max(0, accuracy)
# Save checkpoint
if checkpoint_dir is not None:
os.makedirs(os.path.join(checkpoint_dir), exist_ok=True)
path = os.path.join(checkpoint_dir, "checkpoint.json")
with open(path, "w") as f:
f.write(json.dumps({"acc": accuracy, "step": step}))
report(
mean_accuracy=accuracy,
cur_lr=lr,
training_iteration=step,
optimal_lr=optimal_lr, # for debugging
q_err=q_err, # for debugging
# done=accuracy > midpoint * 2 # this stops the training process
)
time.sleep(2)
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float)
parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)
args, _ = parser.parse_known_args()
params = vars(args)
pbt_function(params)
For this toy example, PBT is run with a population size of 2, so only two parallel workers are needed. In order to use PBT competitively, choose the SageMaker backend. Note that PBT requires your training script to support checkpointing.
Visualize Tuning Progress with Tensorboard
"""
Example showing how to visualize the HPO process of Syne Tune with Tensorboard.
Results will be stored in ~/syne-tune/{tuner_name}/tensorboard_output. To start
tensorboard, execute in a separate shell:
.. code:: bash
tensorboard --logdir ~/syne-tune/{tuner_name}/tensorboard_output
Open the displayed URL in the browser.
To use this functionality you need to install tensorboardX:
.. code:: bash
pip install tensorboardX
"""
import logging
from pathlib import Path
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from syne_tune.callbacks.tensorboard_callback import TensorboardCallback
from syne_tune.results_callback import StoreResultsCallback
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
random_seed = 31415927
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
trial_backend = LocalBackend(entry_point=entry_point)
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
)
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
n_workers=n_workers,
stop_criterion=stop_criterion,
results_update_interval=5,
# Passing ``callbacks`` overrides the default, which is a single StoreResultsCallback.
# Since we still want results written to disk, we add StoreResultsCallback explicitly.
callbacks=[
TensorboardCallback(target_metric=METRIC_ATTR, mode=METRIC_MODE),
StoreResultsCallback(),
],
tuner_name="tensorboardx-demo",
metadata={"description": "just an example"},
)
tuner.run()
Requirements:
- Needs tensorboardX to be installed: pip install tensorboardX
- Makes use of train_height.py.
Tensorboard visualization works by using a callback, for example TensorboardCallback, which is passed to the Tuner. In order to visualize other metrics, you may have to modify this callback.
Bayesian Optimization with Scikit-learn Based Surrogate Model
import copy
from pathlib import Path
from typing import Tuple
import logging
import numpy as np
from sklearn.linear_model import BayesianRidge
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl import (
EIAcquisitionFunction,
)
from syne_tune.optimizer.schedulers.searchers.sklearn import (
SKLearnSurrogateSearcher,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
SKLearnEstimator,
SKLearnPredictor,
)
class BayesianRidgePredictor(SKLearnPredictor):
"""
Predictor for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
"""
def __init__(self, ridge: BayesianRidge):
self.ridge = ridge
def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
return self.ridge.predict(X, return_std=True)
class BayesianRidgeEstimator(SKLearnEstimator):
"""
Estimator for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
None of the parameters of ``BayesianRidge`` are exposed here, so they are all
fixed up front.
"""
def __init__(self, *args, **kwargs):
self.ridge = BayesianRidge(*args, **kwargs)
def fit(
self, X: np.ndarray, y: np.ndarray, update_params: bool
) -> SKLearnPredictor:
self.ridge.fit(X, y.ravel())
return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_epochs = 100
n_workers = 4
config_space = {
"width": randint(1, 20),
"height": randint(1, 20),
MAX_RESOURCE_ATTR: 100,
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# We use ``FIFOScheduler`` with a specific searcher based on our surrogate
# model
searcher = SKLearnSurrogateSearcher(
config_space=config_space,
metric=METRIC_ATTR,
estimator=BayesianRidgeEstimator(),
scoring_class=EIAcquisitionFunction,
)
scheduler = FIFOScheduler(
config_space,
metric=METRIC_ATTR,
mode=METRIC_MODE,
max_resource_attr=MAX_RESOURCE_ATTR,
searcher=searcher,
)
tuner = Tuner(
trial_backend=LocalBackend(entry_point=entry_point),
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_wallclock_time=60),
n_workers=n_workers,
)
tuner.run()
Requirements:
- Needs scikit-learn to be installed. If you installed Syne Tune with sklearn or basic, this dependency is included.
In this example, a simple new surrogate model is implemented based on sklearn.linear_model.BayesianRidge, and Bayesian optimization is run with this surrogate model rather than a Gaussian process model.
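The same wrapper pattern works for other scikit-learn regressors, as long as the predictor can return a predictive mean and standard deviation. A minimal sketch (not part of the example above) using sklearn.gaussian_process.GaussianProcessRegressor, whose predict method also supports return_std=True:

import copy
from typing import Tuple

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)


class GaussianProcessPredictor(SKLearnPredictor):
    def __init__(self, gp: GaussianProcessRegressor):
        self.gp = gp

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        # Predictive mean and standard deviation, as required by the searcher
        return self.gp.predict(X, return_std=True)


class GaussianProcessEstimator(SKLearnEstimator):
    def __init__(self, *args, **kwargs):
        self.gp = GaussianProcessRegressor(*args, **kwargs)

    def fit(
        self, X: np.ndarray, y: np.ndarray, update_params: bool
    ) -> SKLearnPredictor:
        self.gp.fit(X, y.ravel())
        return GaussianProcessPredictor(gp=copy.deepcopy(self.gp))

An instance of GaussianProcessEstimator can then be passed as estimator to SKLearnSurrogateSearcher in place of BayesianRidgeEstimator above.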
Launch HPO Experiment with Simulator Backend
"""
Example for running the simulator backend on a tabulated benchmark
"""
import logging
from syne_tune.experiments.benchmark_definitions.nas201 import nas201_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import ASHA
from syne_tune import Tuner, StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
n_workers = 4
dataset_name = "cifar100"
benchmark = nas201_benchmark(dataset_name)
# Simulator backend specialized to tabulated blackboxes
max_resource_attr = benchmark.max_resource_attr
trial_backend = BlackboxRepositoryBackend(
elapsed_time_attr=benchmark.elapsed_time_attr,
max_resource_attr=max_resource_attr,
blackbox_name=benchmark.blackbox_name,
dataset=dataset_name,
)
# Asynchronous successive halving (ASHA)
blackbox = trial_backend.blackbox
scheduler = ASHA(
config_space=blackbox.configuration_space_with_max_resource_attr(
max_resource_attr
),
max_resource_attr=max_resource_attr,
resource_attr=blackbox.fidelity_name(),
mode=benchmark.mode,
metric=benchmark.metric,
search_options={"debug_log": False},
random_seed=random_seed,
)
max_wallclock_time = 3600
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
# Printing the status during tuning takes a lot of time, and so does
# storing results.
print_update_interval = 700
results_update_interval = 300
# It is important to set ``sleep_time`` to 0 here (mandatory for simulator
# backend)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=0,
results_update_interval=results_update_interval,
print_update_interval=print_update_interval,
# This callback is required in order to make things work with the
# simulator callback. It makes sure that results are stored with
# simulated time (rather than real time), and that the time_keeper
# is advanced properly whenever the tuner loop sleeps
callbacks=[SimulatorCallback()],
)
tuner.run()
Requirements:
- Syne Tune dependencies blackbox-repository need to be installed.
- Needs nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.
- If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.
In this example, we use the simulator backend with the NASBench-201 blackbox. Since time is simulated, we can use max_wallclock_time=3600 (one hour), but the experiment finishes in mere seconds. More details about the simulator backend are found in this tutorial.
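Because time is simulated, the timestamps stored with the results are simulated as well, so plots and dataframes behave as if the experiment had really run for an hour. A small sketch, assuming the tuner object from the example above is still in scope and that ST_TUNER_TIME is the name of the time column in the results dataframe:

from syne_tune.constants import ST_TUNER_TIME
from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment(tuner.name)
results = tuning_experiment.results  # pandas DataFrame with one row per reported result
# The time column contains simulated time, so it extends close to
# max_wallclock_time = 3600 even though the run took only a few real seconds
print(f"simulated duration: {results[ST_TUNER_TIME].max():.0f} seconds")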
Joint Tuning of Instance Type and Hyperparameters using MOASHA
"""
Example showing how to tune instance types and hyperparameters with a Sagemaker Framework.
"""
import logging
from pathlib import Path
from sagemaker.huggingface import HuggingFace
from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.instance_info import select_instance_type
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.config_space import loguniform, choice
from syne_tune.constants import (
ST_WORKER_TIME,
ST_WORKER_COST,
ST_INSTANCE_TYPE,
)
from syne_tune.optimizer.schedulers.multiobjective import MOASHA
from syne_tune.remote.constants import (
DEFAULT_CPU_INSTANCE_SMALL,
HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
HUGGINGFACE_LATEST_PYTORCH_VERSION,
HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
HUGGINGFACE_LATEST_PY_VERSION,
)
from syne_tune.remote.remote_launcher import RemoteLauncher
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
n_workers = 2
epochs = 4
# Select the instance types that are searched.
# Alternatively, you can define the instance list explicitly:
# :code:`instance_types = ["ml.c5.xlarge", "ml.m5.2xlarge"]`
instance_types = select_instance_type(min_gpu=1, max_cost_per_hour=5.0)
print(f"tuning over hyperparameters and instance types: {instance_types}")
# define a search space that contains hyperparameters (learning-rate, weight-decay) and instance-type.
config_space = {
ST_INSTANCE_TYPE: choice(instance_types),
"learning_rate": loguniform(1e-6, 1e-4),
"weight_decay": loguniform(1e-5, 1e-2),
"epochs": epochs,
"dataset_path": "./",
}
entry_point = (
Path(__file__).parent.parent
/ "benchmarking"
/ "training_scripts"
/ "distilbert_on_imdb"
/ "distilbert_on_imdb.py"
)
metric = "accuracy"
# Define a MOASHA scheduler that searches over the config space to maximise accuracy and minimize cost and time.
scheduler = MOASHA(
max_t=epochs,
time_attr="step",
metrics=[metric, ST_WORKER_COST, ST_WORKER_TIME],
mode=["max", "min", "min"],
config_space=config_space,
)
# Define the training function to be tuned; use the SageMaker backend to execute trials as separate training jobs
# (since they are quite expensive).
trial_backend = SageMakerBackend(
sm_estimator=HuggingFace(
framework_version=HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
transformers_version=HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
pytorch_version=HUGGINGFACE_LATEST_PYTORCH_VERSION,
py_version=HUGGINGFACE_LATEST_PY_VERSION,
entry_point=str(entry_point),
base_job_name="hpo-transformer",
# The instance type given here is overridden by Syne Tune with values sampled from ST_INSTANCE_TYPE.
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
max_run=3600,
role=get_execution_role(),
dependencies=[str(Path(__file__).parent.parent / "benchmarking")],
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
),
)
remote_launcher = RemoteLauncher(
tuner=Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_wallclock_time=3600, max_cost=10.0),
n_workers=n_workers,
sleep_time=5.0,
),
dependencies=[str(Path(__file__).parent.parent / "benchmarking")],
)
remote_launcher.run(wait=False)
Requirements:
- Needs code from benchmarking.training_scripts.distilbert_on_imdb, which requires Syne Tune to be installed from source.
- Runs training jobs on instances of type ml.g4dn.xlarge, ml.g5.xlarge, ml.g4dn.2xlarge, ml.p2.xlarge, ml.g5.2xlarge, ml.g5.4xlarge, ml.g4dn.4xlarge, ml.g5.8xlarge, ml.g4dn.8xlarge, ml.p3.2xlarge, ml.g5.16xlarge. This list of instance types to be searched over can be modified by the user.
In this example, we use the SageMaker backend together with the SageMaker Hugging Face framework in order to fine-tune a DistilBERT model on the IMDB sentiment classification task:
- Instead of optimizing a single objective, we use MOASHA in order to sample the Pareto frontier w.r.t. three objectives.
- We not only tune hyperparameters such as learning rate and weight decay, but also the AWS instance type to be used for training. Here, one of the objectives to minimize is the training cost (in dollars).
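With several objectives there is no single best configuration; instead, one typically extracts the set of non-dominated (Pareto-optimal) trials from the results. A minimal sketch using plain numpy, assuming the experiment results have been synced locally, that tuner_name holds the name of the tuning job, and that the accuracy column matches the metric name reported by the training script:

import numpy as np

from syne_tune.constants import ST_WORKER_COST, ST_WORKER_TIME
from syne_tune.experiments import load_experiment

results = load_experiment(tuner_name).results  # one row per reported result


def is_dominated(v: np.ndarray, values: np.ndarray) -> bool:
    # v is dominated if some row is at least as good everywhere and strictly better somewhere
    return bool(np.any(np.all(values <= v, axis=1) & np.any(values < v, axis=1)))


# Express all objectives as minimization problems (negate accuracy, which is maximized)
objectives = np.stack(
    [-results["accuracy"], results[ST_WORKER_COST], results[ST_WORKER_TIME]], axis=1
)
pareto_mask = np.array([not is_dominated(v, objectives) for v in objectives])
print(results[pareto_mask])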
Multi-objective Asynchronous Successive Halving (MOASHA)
"""
Example showing how to tune multiple objectives at once of an artificial function.
"""
import logging
from pathlib import Path
import numpy as np
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers.multiobjective import MOASHA
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import uniform
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
np.random.seed(0)
max_steps = 27
n_workers = 4
config_space = {
"steps": max_steps,
"theta": uniform(0, np.pi / 2),
"sleep_time": 0.01,
}
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "mo_artificial"
/ "mo_artificial.py"
)
mode = "min"
np.random.seed(0)
scheduler = MOASHA(
max_t=max_steps,
time_attr="step",
mode=mode,
metrics=["y1", "y2"],
config_space=config_space,
)
trial_backend = LocalBackend(entry_point=str(entry_point))
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=0.5,
)
tuner.run()
This launcher script uses the following mo_artificial.py training script:
import time
from argparse import ArgumentParser
import numpy as np
from syne_tune import Reporter
def f(t, theta):
# Function drawing upper-right circles with radius set to ``t`` and with center set at
# (-t, -t). ``t`` is interpreted as a fidelity and larger ``t`` corresponds to larger radius and better candidates.
# The optimal multiobjective solution should select theta uniformly from [0, pi/2].
return {
"y1": -t + t * np.cos(theta),
"y2": -t + t * np.sin(theta),
}
def plot_function():
import matplotlib.pyplot as plt
ts = np.linspace(0, 27, num=5)
thetas = np.linspace(0, 1) * np.pi / 2
y1s = []
y2s = []
for t in ts:
for theta in thetas:
res = f(t, theta)
y1s.append(res["y1"])
y2s.append(res["y2"])
plt.scatter(y1s, y2s)
plt.show()
if __name__ == "__main__":
# plot_function()
parser = ArgumentParser()
parser.add_argument("--steps", type=int, required=True)
parser.add_argument("--theta", type=float, required=True)
parser.add_argument("--sleep_time", type=float, required=False, default=0.1)
args, _ = parser.parse_known_args()
assert 0 <= args.theta < np.pi / 2
reporter = Reporter()
for step in range(args.steps):
y = f(t=step, theta=args.theta)
reporter(step=step, **y)
time.sleep(args.sleep_time)
PASHA: Efficient HPO and NAS with Progressive Resource Allocation
"""
Example for running PASHA on NASBench201
"""
import logging
from syne_tune.experiments.benchmark_definitions.nas201 import nas201_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import PASHA
from syne_tune import Tuner, StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.WARNING)
random_seed = 1
nb201_random_seed = 0
n_workers = 4
dataset_name = "cifar100"
benchmark = nas201_benchmark(dataset_name)
# simulator backend specialized to tabulated blackboxes
max_resource_attr = benchmark.max_resource_attr
trial_backend = BlackboxRepositoryBackend(
blackbox_name=benchmark.blackbox_name,
elapsed_time_attr=benchmark.elapsed_time_attr,
max_resource_attr=max_resource_attr,
dataset=dataset_name,
seed=nb201_random_seed,
)
blackbox = trial_backend.blackbox
scheduler = PASHA(
config_space=blackbox.configuration_space_with_max_resource_attr(
max_resource_attr
),
max_resource_attr=max_resource_attr,
resource_attr=blackbox.fidelity_name(),
mode=benchmark.mode,
metric=benchmark.metric,
random_seed=random_seed,
)
max_num_trials_started = 256
stop_criterion = StoppingCriterion(max_num_trials_started=max_num_trials_started)
# printing the status during tuning takes a lot of time, and so does
# storing results
print_update_interval = 700
results_update_interval = 300
# it is important to set ``sleep_time`` to 0 here (mandatory for simulator
# backend)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=0,
results_update_interval=results_update_interval,
print_update_interval=print_update_interval,
# this callback is required in order to make things work with the
# simulator callback. It makes sure that results are stored with
# simulated time (rather than real time), and that the time_keeper
# is advanced properly whenever the tuner loop sleeps
callbacks=[SimulatorCallback()],
)
tuner.run()
Requirements:
- Syne Tune dependencies blackbox-repository need to be installed.
- Needs nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.
PASHA typically uses max_num_trials_completed as the stopping criterion. After finding a strong configuration using PASHA, the next step is to fully train a model with the configuration.
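A minimal sketch of both points, assuming that StoppingCriterion accepts a max_num_trials_completed argument (as the note above suggests) and that the tuner object from the example is still in scope:

from syne_tune import StoppingCriterion
from syne_tune.experiments import load_experiment

# Alternative stopping criterion: stop once a fixed number of trials have completed
stop_criterion = StoppingCriterion(max_num_trials_completed=256)

# After tuning, hand the strongest configuration over to a full training run
best_config = load_experiment(tuner.name).best_config()
print(f"configuration to train to completion: {best_config}")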
Constrained Bayesian Optimization
"""
Example for running constrained Bayesian optimization on a toy example
"""
import logging
from pathlib import Path
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.config_space import uniform
from syne_tune import StoppingCriterion, Tuner
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
n_workers = 2
config_space = {
"x1": uniform(-5, 10),
"x2": uniform(0, 15),
"constraint_offset": 1.0, # the lower, the stricter
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "constrained_hpo"
/ "train_constrained_example.py"
)
mode = "max"
metric = "objective"
constraint_attr = "my_constraint_metric"
# Local backend
trial_backend = LocalBackend(entry_point=entry_point)
# Bayesian constrained optimization:
# :math:`max_x f(x), \mathrm{s.t.} c(x) <= 0`
# Here, ``metric`` represents :math:`f(x)`, ``constraint_attr`` represents
# :math:`c(x)`.
search_options = {
"num_init_random": n_workers,
"constraint_attr": constraint_attr,
}
scheduler = FIFOScheduler(
config_space,
searcher="bayesopt_constrained",
search_options=search_options,
mode=mode,
metric=metric,
random_seed=random_seed,
)
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
This launcher script uses the following train_constrained_example.py training script:
import logging
import numpy as np
from syne_tune import Reporter
from argparse import ArgumentParser
report = Reporter()
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.DEBUG)
parser = ArgumentParser()
parser.add_argument("--x1", type=float)
parser.add_argument("--x2", type=float)
parser.add_argument("--constraint_offset", type=float)
args, _ = parser.parse_known_args()
x1 = args.x1
x2 = args.x2
constraint_offset = args.constraint_offset
r = 6
objective_value = (
(x2 - (5.1 / (4 * np.pi**2)) * x1**2 + (5 / np.pi) * x1 - r) ** 2
+ 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1)
+ 10
)
constraint_value = (
x1 * 2.0 - constraint_offset
) # feasible iff x1 <= 0.5 * constraint_offset
report(objective=-objective_value, my_constraint_metric=constraint_value)
Restrict Scheduler to Tabulated Configurations with Simulator Backend
"""
Example for running the simulator backend on the "lcbench" tabulated
benchmark. The scheduler is restricted to work with the configurations
which have been evaluated under the benchmark.
"""
import logging
from syne_tune.experiments.benchmark_definitions.lcbench import lcbench_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune import Tuner, StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
n_workers = 4
dataset_name = "airlines"
benchmark = lcbench_benchmark(dataset_name)
# Simulator backend specialized to tabulated blackboxes
# Note: Even though ``lcbench_benchmark`` defines a surrogate, we
# do not use this here
max_resource_attr = benchmark.max_resource_attr
trial_backend = BlackboxRepositoryBackend(
elapsed_time_attr=benchmark.elapsed_time_attr,
max_resource_attr=max_resource_attr,
blackbox_name=benchmark.blackbox_name,
dataset=dataset_name,
)
# GP-based Bayesian optimization
# Using ``restrict_configurations``, we restrict the scheduler to only
# suggest configurations which have observations in the tabulated
# blackbox
blackbox = trial_backend.blackbox
restrict_configurations = blackbox.all_configurations()
scheduler = BayesianOptimization(
config_space=blackbox.configuration_space_with_max_resource_attr(
max_resource_attr
),
max_resource_attr=max_resource_attr,
mode=benchmark.mode,
metric=benchmark.metric,
random_seed=random_seed,
search_options=dict(restrict_configurations=restrict_configurations),
)
max_wallclock_time = 3600
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
# Printing the status during tuning takes a lot of time, and so does
# storing results.
print_update_interval = 700
results_update_interval = 300
# It is important to set ``sleep_time`` to 0 here (mandatory for simulator
# backend)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=0,
results_update_interval=results_update_interval,
print_update_interval=print_update_interval,
# This callback is required in order to make things work with the
# simulator callback. It makes sure that results are stored with
# simulated time (rather than real time), and that the time_keeper
# is advanced properly whenever the tuner loop sleeps
callbacks=[SimulatorCallback()],
)
tuner.run()
Requirements:
- Syne Tune dependencies blackbox-repository need to be installed.
- Needs lcbench blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.
- If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.
This example is similar to the one above, but here we use the tabulated LCBench benchmark, whose configuration space is infinite, and whose objective values have not been evaluated on a grid. With such a benchmark, we can either use a surrogate to interpolate objective values, or we can restrict the scheduler to only suggest configurations which have been observed in the benchmark. This example demonstrates the latter.
Since time is simulated, we can use max_wallclock_time=3600 (one hour), but the experiment finishes in mere seconds. More details about the simulator backend are found in this tutorial.
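For comparison, the surrogate alternative mentioned above would only change how the backend is constructed; the scheduler would then not need restrict_configurations. A hedged sketch, assuming BlackboxRepositoryBackend accepts surrogate and surrogate_kwargs arguments that select a scikit-learn regressor by class name (as done in the blackbox repository benchmark definitions):

from syne_tune.blackbox_repository import BlackboxRepositoryBackend

# Interpolate tabulated objective values with a 1-nearest-neighbour surrogate, so
# that arbitrary configurations from the configuration space can be evaluated
trial_backend = BlackboxRepositoryBackend(
    elapsed_time_attr=benchmark.elapsed_time_attr,
    max_resource_attr=max_resource_attr,
    blackbox_name=benchmark.blackbox_name,
    dataset=dataset_name,
    surrogate="KNeighborsRegressor",
    surrogate_kwargs={"n_neighbors": 1},
)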
Tuning Reinforcement Learning
"""
This launches a local HPO tuning the discount factor of PPO on cartpole.
To run this example, you should have installed dependencies in ``requirements.txt``.
"""
import logging
from pathlib import Path
import numpy as np
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import ASHA
import syne_tune.config_space as sp
from syne_tune import Tuner, StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
np.random.seed(0)
max_steps = 100
metric = "episode_reward_mean"
mode = "max"
max_resource_attr = "max_iterations"
trial_backend = LocalBackend(
entry_point=Path(__file__).parent
/ "training_scripts"
/ "rl_cartpole"
/ "train_cartpole.py"
)
scheduler = ASHA(
config_space={
max_resource_attr: max_steps,
"gamma": sp.uniform(0.5, 0.99),
"lr": sp.loguniform(1e-6, 1e-3),
},
metric=metric,
mode=mode,
max_resource_attr=max_resource_attr,
resource_attr="training_iter",
search_options={"debug_log": False},
)
stop_criterion = StoppingCriterion(max_wallclock_time=60)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=2,
)
tuner.run()
tuning_experiment = load_experiment(tuner.name)
print(f"best result found: {tuning_experiment.best_config()}")
tuning_experiment.plot()
This launcher script uses the following train_cartpole.py training script:
"""
Adapts the introductory example of RLlib that trains a Cartpole agent with PPO.
https://docs.ray.io/en/master/rllib/index.html
The input arguments learning rate and discount factor gamma can be tuned to maximize the mean episode reward.
"""
from argparse import ArgumentParser
from syne_tune import Reporter
from ray.rllib.algorithms.ppo import PPO
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--max_training_steps", type=int, default=100)
parser.add_argument("--lr", type=float, default=5e-5)
parser.add_argument("--gamma", type=float, default=0.99)
args, _ = parser.parse_known_args()
# Configure the algorithm.
config = {
# Environment (RLlib understands openAI gym registered strings).
"env": "CartPole-v0",
"num_workers": 2,
# Use "tf" for TensorFlow, "torch" for PyTorch, "tf2" for
# tf2.x eager execution
"framework": "torch",
"gamma": args.gamma,
"lr": args.lr,
}
trainer = PPO(config=config)
reporter = Reporter()
# Run it for max_training_steps iterations. A training iteration includes
# parallel sample collection by the environment workers as well as
# loss calculation on the collected batch and a model update.
# Episode reward mean is reported each time.
for i in range(args.max_training_steps):
results = trainer.train()
reporter(
training_iter=i + 1,
episode_reward_mean=results["episode_reward_mean"],
)
This training script requires the following dependencies to be installed:
tensorboardX==2.5.1
opencv-python
ray[rllib]==2.9.1
dm-tree==0.1.8
gymnasium==0.28.1
tensorflow==2.11.1
pygame==2.1.2
Launch HPO Experiment with SageMaker Backend
"""
Example showing how to run on Sagemaker with a Sagemaker Framework.
"""
import logging
import os
from pathlib import Path
from sagemaker.pytorch import PyTorch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
DEFAULT_CPU_INSTANCE_SMALL,
PYTORCH_LATEST_FRAMEWORK,
PYTORCH_LATEST_PY_VERSION,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_steps = 100
n_workers = 4
max_wallclock_time = 5 * 60
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
)
if "AWS_DEFAULT_REGION" not in os.environ:
os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
trial_backend = SageMakerBackend(
# we tune a PyTorch Framework from Sagemaker
sm_estimator=PyTorch(
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
framework_version=PYTORCH_LATEST_FRAMEWORK,
py_version=PYTORCH_LATEST_PY_VERSION,
entry_point=str(entry_point),
role=get_execution_role(),
max_run=10 * 60,
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
),
# Names of metrics to track. Each metric will be detected by SageMaker if it is written in the
# following form: "[RMSE]: 1.2"; see the training script for an example of how metrics are logged
metrics_names=[METRIC_ATTR],
)
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=5.0,
tuner_name="hpo-hyperband",
)
tuner.run()
Requirements:
- Access to AWS SageMaker. More details are provided in this tutorial.
- This example can be sped up by using SageMaker managed warm pools, as in this example.
- Makes use of train_height.py.
SageMaker Backend and Checkpointing
import logging
from pathlib import Path
from sagemaker.pytorch import PyTorch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import ASHA
from syne_tune.remote.constants import (
DEFAULT_CPU_INSTANCE_SMALL,
PYTORCH_LATEST_FRAMEWORK,
PYTORCH_LATEST_PY_VERSION,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_steps = 100
n_workers = 4
delete_checkpoints = True
max_wallclock_time = 5 * 60
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "checkpoint_example"
/ "train_height_checkpoint.py"
)
# ASHA promotion
scheduler = ASHA(
config_space,
metric=METRIC_ATTR,
mode=METRIC_MODE,
max_resource_attr=MAX_RESOURCE_ATTR,
resource_attr=RESOURCE_ATTR,
type="promotion",
search_options={"debug_log": True},
)
# SageMaker backend: We use the warm pool feature here
trial_backend = SageMakerBackend(
sm_estimator=PyTorch(
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
framework_version=PYTORCH_LATEST_FRAMEWORK,
py_version=PYTORCH_LATEST_PY_VERSION,
entry_point=str(entry_point),
role=get_execution_role(),
max_run=10 * 60,
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
keep_alive_period_in_seconds=60, # warm pool feature
),
metrics_names=[METRIC_ATTR],
delete_checkpoints=delete_checkpoints,
)
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=5.0,
tuner_name="height-sagemaker-checkpoints",
start_jobs_without_delay=False,
)
tuner.run()
Requirements:
- Access to AWS SageMaker.
This launcher script uses the following train_height_checkpoint.py training script:
import logging
import time
from typing import Optional, Dict, Any
import json
from pathlib import Path
import os
import numpy as np
from syne_tune import Reporter
from argparse import ArgumentParser
from syne_tune.config_space import randint
from syne_tune.constants import ST_CHECKPOINT_DIR
report = Reporter()
RESOURCE_ATTR = "epoch"
METRIC_ATTR = "mean_loss"
METRIC_MODE = "min"
MAX_RESOURCE_ATTR = "steps"
def load_checkpoint(checkpoint_path: Path) -> Dict[str, Any]:
with open(checkpoint_path, "r") as f:
return json.load(f)
def save_checkpoint(checkpoint_path: Path, epoch: int, value: float):
os.makedirs(checkpoint_path.parent, exist_ok=True)
with open(checkpoint_path, "w") as f:
json.dump({"epoch": epoch, "value": value}, f)
def train_height_delta(step: int, width: float, height: float, value: float) -> float:
"""
For the original example, we have that
.. math::
f(t + 1) - f(t) = f(t) \cdot \frac{w}{10 + w \cdot t},
f(0) = 10 + h / 10
We implement an incremental version with a stochastic term.
:param step: Step t, nonnegative int
:param width: Width w, nonnegative
:param height: Height h
:param value: Value :math:`f(t - 1)` if :math:`t > 0`
:return: New value :math:`f(t)`
"""
u = 1.0 - 0.1 * np.random.rand() # uniform(0.9, 1) multiplier
if step == 0:
return u * 10 + 0.1 * height
else:
return value * (1.0 + u * width / (width * (step - 1) + 10))
def height_config_space(
max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
kwargs = {"sleep_time": sleep_time} if sleep_time is not None else dict()
return {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
**kwargs,
}
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
parser = ArgumentParser()
parser.add_argument("--" + MAX_RESOURCE_ATTR, type=int)
parser.add_argument("--width", type=float)
parser.add_argument("--height", type=float)
parser.add_argument("--sleep_time", type=float, default=0.1)
parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)
args, _ = parser.parse_known_args()
width = args.width
height = args.height
checkpoint_dir = getattr(args, ST_CHECKPOINT_DIR)
num_steps = getattr(args, MAX_RESOURCE_ATTR)
start_step = 0
value = 0.0
if checkpoint_dir is not None:
checkpoint_path = Path(checkpoint_dir) / "checkpoint.json"
if checkpoint_path.exists():
state = load_checkpoint(checkpoint_path)
start_step = state["epoch"]
value = state["value"]
else:
checkpoint_path = None
for step in range(start_step, num_steps):
# Sleep first, since results are returned at end of "epoch"
time.sleep(args.sleep_time)
# Feed the score back to Syne Tune.
value = train_height_delta(step, width, height, value)
epoch = step + 1
if checkpoint_path is not None:
save_checkpoint(checkpoint_path, epoch, value)
report(
**{
"step": step,
METRIC_ATTR: value,
RESOURCE_ATTR: epoch,
}
)
Note that SageMakerBackend is configured to use SageMaker managed warm pools:
- keep_alive_period_in_seconds=60 in the definition of the SageMaker estimator
- start_jobs_without_delay=False when creating Tuner
Managed warm pools reduce both start-up and stop delays substantially; they are strongly recommended for multi-fidelity HPO with the SageMaker backend. More details are found in this tutorial.
Retrieving the Best Checkpoint
"""
An example showing how to retrieve the best checkpoint of an XGBoost model.
The script being tuned ``xgboost_checkpoint.py`` stores the checkpoint obtained after each trial evaluation.
After the tuning is done, this example loads the best checkpoint and evaluate the model.
"""
import logging
from pathlib import Path
from examples.training_scripts.xgboost.xgboost_checkpoint import evaluate_accuracy
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune import Tuner, StoppingCriterion
import syne_tune.config_space as cs
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
n_workers = 4
config_space = {
"max_depth": cs.randint(2, 5),
"gamma": cs.uniform(1, 9),
"reg_lambda": cs.loguniform(1e-6, 1),
"n_estimators": cs.randint(1, 10),
}
entry_point = (
Path(__file__).parent / "training_scripts" / "xgboost" / "xgboost_checkpoint.py"
)
trial_backend = LocalBackend(entry_point=str(entry_point))
tuner = Tuner(
trial_backend=trial_backend,
scheduler=BayesianOptimization(config_space, metric="merror", mode="min"),
stop_criterion=StoppingCriterion(max_wallclock_time=10),
n_workers=n_workers,
)
tuner.run()
exp = load_experiment(tuner.name)
best_config = exp.best_config()
checkpoint = trial_backend.checkpoint_trial_path(best_config["trial_id"])
assert checkpoint.exists()
print(f"Best config found {best_config} checkpointed at {checkpoint}")
print(
f"Retrieve best checkpoint and evaluate accuracy of best model: "
f"found {evaluate_accuracy(checkpoint_dir=checkpoint)}"
)
This launcher script uses the following xgboost_checkpoint.py training script:
import os
from argparse import ArgumentParser
from pathlib import Path
import numpy as np
import xgboost
from sklearn.datasets import load_digits
from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR
class SyneTuneCallback(xgboost.callback.TrainingCallback):
def __init__(self, error_metric: str) -> None:
self.reporter = Reporter()
self.error_metric = error_metric
def after_iteration(self, model, epoch, evals_log):
metrics = list(evals_log.values())[-1][self.error_metric]
self.reporter(**{self.error_metric: metrics[-1]})
pass
def train(
checkpoint_dir: str,
n_estimators: int,
max_depth: int,
gamma: float,
reg_lambda: float,
early_stopping_rounds: int = 5,
) -> None:
eval_metric = "merror"
early_stop = xgboost.callback.EarlyStopping(
rounds=early_stopping_rounds, save_best=True
)
X, y = load_digits(return_X_y=True)
clf = xgboost.XGBClassifier(
n_estimators=n_estimators,
reg_lambda=reg_lambda,
gamma=gamma,
max_depth=max_depth,
eval_metric=eval_metric,
callbacks=[early_stop, SyneTuneCallback(error_metric=eval_metric)],
)
clf.fit(
X,
y,
eval_set=[(X, y)],
)
print("Total boosted rounds:", clf.get_booster().num_boosted_rounds())
save_model(clf, checkpoint_dir=checkpoint_dir)
def save_model(clf, checkpoint_dir):
checkpoint_dir.mkdir(parents=True, exist_ok=True)
path = os.path.join(checkpoint_dir, "model.json")
clf.save_model(path)
def load_model(checkpoint_dir):
path = os.path.join(checkpoint_dir, "model.json")
loaded = xgboost.XGBClassifier()
loaded.load_model(path)
return loaded
def evaluate_accuracy(checkpoint_dir):
X, y = load_digits(return_X_y=True)
clf = load_model(checkpoint_dir=checkpoint_dir)
y_pred = clf.predict(X)
return (np.equal(y, y_pred) * 1.0).mean()
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("--max_depth", type=int, required=False, default=1)
parser.add_argument("--gamma", type=float, required=False, default=2)
parser.add_argument("--reg_lambda", type=float, required=False, default=0.001)
parser.add_argument("--n_estimators", type=int, required=False, default=10)
parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str, default="./")
args, _ = parser.parse_known_args()
checkpoint_dir = Path(vars(args)[ST_CHECKPOINT_DIR])
train(
checkpoint_dir=checkpoint_dir,
max_depth=args.max_depth,
gamma=args.gamma,
reg_lambda=args.reg_lambda,
n_estimators=args.n_estimators,
)
Launch with SageMaker Backend and Custom Docker Image
"""
Example showing how to run on Sagemaker with a custom docker image.
"""
import logging
from pathlib import Path
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.custom_framework import CustomFramework
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import DEFAULT_CPU_INSTANCE_SMALL
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
)
# indicate here an image_uri that is available in ecr, something like that "XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/my_image:latest"
image_uri = ...
trial_backend = SageMakerBackend(
sm_estimator=CustomFramework(
entry_point=entry_point,
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
role=get_execution_role(),
image_uri=image_uri,
max_run=10 * 60,
job_name_prefix="hpo-hyperband",
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
),
# Names of metrics to track. Each metric will be detected by SageMaker if it is written in the
# following form: "[RMSE]: 1.2"; see the training script for an example of how metrics are logged
metrics_names=[METRIC_ATTR],
)
stop_criterion = StoppingCriterion(max_wallclock_time=600)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
sleep_time=5.0,
)
tuner.run()
Requirements:
- Makes use of train_height.py.
- This example is incomplete. If your training script has dependencies which you would like to provide as a Docker image, you need to upload the image to ECR, after which you can refer to it with image_uri (see the sketch below).
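A minimal sketch of the usual ECR workflow for making such an image available; the account ID (123456789012), region (us-west-2), and repository name (my_image) are placeholders to adapt to your setup:

aws ecr create-repository --repository-name my_image --region us-west-2
aws ecr get-login-password --region us-west-2 | docker login --username AWS \
    --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
docker build -t my_image .
docker tag my_image:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/my_image:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/my_image:latest

The final image URI (the argument of docker push) is what you would pass as image_uri in the launcher script above.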
Launch Experiments Remotely on SageMaker
"""
This example shows how to launch a tuning job that will be executed on SageMaker rather than on your local machine.
"""
import logging
from pathlib import Path
from argparse import ArgumentParser
from sagemaker.pytorch import PyTorch
from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
DEFAULT_CPU_INSTANCE_SMALL,
PYTORCH_LATEST_FRAMEWORK,
PYTORCH_LATEST_PY_VERSION,
)
from syne_tune.remote.remote_launcher import RemoteLauncher
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
parser = ArgumentParser()
parser.add_argument("--use_sagemaker_backend", type=int, default=0)
args = parser.parse_args()
use_sagemaker_backend = bool(args.use_sagemaker_backend)
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# We can use the local or sagemaker backend when tuning remotely.
# Using the local backend means that the remote instance will evaluate the trials locally.
# Using the sagemaker backend means the remote instance will launch one sagemaker job per trial.
if use_sagemaker_backend:
trial_backend = SageMakerBackend(
sm_estimator=PyTorch(
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
framework_version=PYTORCH_LATEST_FRAMEWORK,
py_version=PYTORCH_LATEST_PY_VERSION,
entry_point=entry_point,
role=get_execution_role(),
max_run=10 * 60,
base_job_name="hpo-height",
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
),
)
else:
trial_backend = LocalBackend(entry_point=entry_point)
num_seeds = 1 if use_sagemaker_backend else 2
for seed in range(num_seeds):
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=seed
)
tuner = RemoteLauncher(
tuner=Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
n_workers=n_workers,
tuner_name="height-tuning",
stop_criterion=StoppingCriterion(max_wallclock_time=600),
),
# Extra arguments describing the resources of the remote tuning instance and whether we want to
# wait for the tuning to finish. The instance type where the tuning job runs can be different
# from the instance type used for evaluating the training jobs.
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
# We can specify a custom container to use with this launcher with <image_uri=TK>,
# otherwise a SageMaker pre-built image will be used
)
tuner.run(wait=False)
Requirements:
- Makes use of train_height.py.
This launcher script starts the HPO experiment as a SageMaker training job, which allows you to select any instance type you like, without blocking your local machine. This tutorial explains how to run many such remote experiments in parallel, in order to speed up comparisons between alternatives.
Launch HPO Experiment with Home-Made Scheduler
"""
Example showing how to implement a new Scheduler.
"""
import logging
from pathlib import Path
from typing import Optional, List, Dict, Any
import numpy as np
from syne_tune.backend import LocalBackend
from syne_tune.backend.trial_status import Trial
from syne_tune.optimizer.scheduler import (
TrialScheduler,
SchedulerDecision,
TrialSuggestion,
)
from syne_tune.tuner import Tuner
from syne_tune.stopping_criterion import StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
class SimpleScheduler(TrialScheduler):
def __init__(
self, config_space: Dict[str, Any], metric: str, mode: Optional[str] = None
):
super(SimpleScheduler, self).__init__(config_space=config_space)
self.metric = metric
self.mode = mode if mode is not None else "min"
self.sorted_results = []
def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
# Called when a slot exists to run a trial, here we simply draw a
# random candidate.
config = {
k: v.sample() if hasattr(v, "sample") else v
for k, v in self.config_space.items()
}
return TrialSuggestion.start_suggestion(config)
def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
# Given a new result, we decide whether the trial should stop or continue.
# In this case, we implement a naive strategy that stops if the result is worse than 80% of previous results.
# This is naive, as we do not account for the fact that trials improve with more steps.
new_metric = result[self.metric]
# insert new metric in sorted results
index = np.searchsorted(self.sorted_results, new_metric)
self.sorted_results = np.insert(self.sorted_results, index, new_metric)
normalized_rank = index / float(len(self.sorted_results))
if self.mode == "max":
normalized_rank = 1 - normalized_rank
if normalized_rank < 0.8:
return SchedulerDecision.CONTINUE
else:
logging.info(
f"see new results {new_metric} which rank {normalized_rank * 100}%, "
f"stopping it as it does not rank on the top 80%"
)
return SchedulerDecision.STOP
def metric_names(self) -> List[str]:
return [self.metric]
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
random_seed = 31415927
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Local backend
trial_backend = LocalBackend(entry_point=entry_point)
np.random.seed(random_seed)
scheduler = SimpleScheduler(
config_space=config_space, metric=METRIC_ATTR, mode=METRIC_MODE
)
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
Makes use of train_height.py.
For a more thorough introduction on how to develop new schedulers and searchers in Syne Tune, consider this tutorial.
Launch HPO Experiment on mlp_fashionmnist Benchmark
"""
Example for how to tune one of the benchmarks.
"""
import logging
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune import Tuner, StoppingCriterion
from benchmarking.benchmark_definitions.mlp_on_fashionmnist import (
mlp_fashionmnist_benchmark,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
# We pick the MLP on FashionMNIST benchmark
# The 'benchmark' dict contains arguments needed by scheduler and
# searcher (e.g., 'mode', 'metric'), along with suggested default values
# for other arguments (which you are free to override)
random_seed = 31415927
n_workers = 4
benchmark = mlp_fashionmnist_benchmark()
# If you don't like the default config_space, change it here. But let
# us use the default
config_space = benchmark.config_space
# Local backend
trial_backend = LocalBackend(entry_point=str(benchmark.script))
# GP-based Bayesian optimization searcher. Many options can be specified
# via ``search_options``, but let's use the defaults
searcher = "bayesopt"
search_options = {"num_init_random": n_workers + 2}
# Hyperband (or successive halving) scheduler of the stopping type.
# Together with 'bayesopt', this selects the MOBSTER algorithm.
# If you don't like the defaults suggested, just change them:
scheduler = HyperbandScheduler(
config_space,
searcher=searcher,
search_options=search_options,
max_resource_attr=benchmark.max_resource_attr,
resource_attr=benchmark.resource_attr,
mode=benchmark.mode,
metric=benchmark.metric,
random_seed=random_seed,
)
stop_criterion = StoppingCriterion(max_wallclock_time=120)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
Requirements:
- Needs the “mlp_fashionmnist” benchmark, which requires Syne Tune to have been installed from source.
In this example, we tune one of the built-in benchmark problems, which is useful in order to compare different HPO methods. More details on benchmarking are provided in this tutorial.
Transfer Tuning on NASBench-201
from typing import Dict
from syne_tune.blackbox_repository import load_blackbox, BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning import (
TransferLearningTaskEvaluations,
BoundingBox,
)
from syne_tune import StoppingCriterion, Tuner
def load_transfer_learning_evaluations(
blackbox_name: str, test_task: str, metric: str
) -> Dict[str, TransferLearningTaskEvaluations]:
bb_dict = load_blackbox(blackbox_name)
metric_index = [
i
for i, name in enumerate(bb_dict[test_task].objectives_names)
if name == metric
][0]
transfer_learning_evaluations = {
task: TransferLearningTaskEvaluations(
hyperparameters=bb.hyperparameters,
configuration_space=bb.configuration_space,
objectives_evaluations=bb.objectives_evaluations[
..., metric_index : metric_index + 1
],
objectives_names=[metric],
)
for task, bb in bb_dict.items()
if task != test_task
}
return transfer_learning_evaluations
if __name__ == "__main__":
blackbox_name = "nasbench201"
test_task = "cifar100"
elapsed_time_attr = "metric_elapsed_time"
metric = "metric_valid_error"
bb_dict = load_blackbox(blackbox_name)
transfer_learning_evaluations = load_transfer_learning_evaluations(
blackbox_name, test_task, metric
)
scheduler = BoundingBox(
scheduler_fun=lambda new_config_space, mode, metric: FIFOScheduler(
new_config_space,
points_to_evaluate=[],
searcher="random",
metric=metric,
mode=mode,
),
mode="min",
config_space=bb_dict[test_task].configuration_space,
metric=metric,
num_hyperparameters_per_task=10,
transfer_learning_evaluations=transfer_learning_evaluations,
)
stop_criterion = StoppingCriterion(max_wallclock_time=7200)
trial_backend = BlackboxRepositoryBackend(
blackbox_name=blackbox_name,
elapsed_time_attr=elapsed_time_attr,
dataset=test_task,
)
# It is important to set ``sleep_time`` to 0 here (mandatory for simulator backend)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=4,
sleep_time=0,
# This callback is required in order to make things work with the
# simulator callback. It makes sure that results are stored with
# simulated time (rather than real time), and that the time_keeper
# is advanced properly whenever the tuner loop sleeps
callbacks=[SimulatorCallback()],
)
tuner.run()
tuning_experiment = load_experiment(tuner.name)
print(tuning_experiment)
print(f"best result found: {tuning_experiment.best_config()}")
tuning_experiment.plot()
Requirements:
- Syne Tune dependencies blackbox-repository need to be installed.
- Needs nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.
- If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.
In this example, we use the simulator backend with the NASBench-201 blackbox. It serves as a simple demonstration of how evaluations from related tasks can be used to speed up HPO.
Transfer Learning Example
"""
Example collecting evaluations and using them for transfer learning on a
related task.
"""
from examples.training_scripts.height_example.train_height import (
height_config_space,
METRIC_ATTR,
METRIC_MODE,
)
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import BayesianOptimization, ZeroShotTransfer
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning import (
TransferLearningTaskEvaluations,
BoundingBox,
)
from syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher import (
QuantileBasedSurrogateSearcher,
)
import argparse
import copy
import numpy as np
from pathlib import Path
def add_labels(ax, conf_space, title):
ax.legend()
ax.set_xlabel("width")
ax.set_ylabel("height")
ax.set_xlim([conf_space["width"].lower - 1, conf_space["width"].upper + 1])
ax.set_ylim([conf_space["height"].lower - 10, conf_space["height"].upper + 10])
ax.set_title(title)
def scatter_space_exploration(ax, task_hyps, max_trials, label, color=None):
ax.scatter(
task_hyps["width"][:max_trials],
task_hyps["height"][:max_trials],
alpha=0.4,
label=label,
color=color,
)
colours = {
"BayesianOptimization": "C0",
"BoundingBox": "C1",
"ZeroShotTransfer": "C2",
"Quantiles": "C3",
}
def plot_last_task(max_trials, df, label, metric, color):
max_tr = min(max_trials, len(df))
plt.scatter(range(max_tr), df[metric][:max_tr], label=label, color=color)
plt.plot([np.min(df[metric][:ii]) for ii in range(1, max_trials + 1)], color=color)
def filter_completed(df):
# Filter out runs that didn't finish
return df[df["status"] == "Completed"].reset_index()
def extract_transferable_evaluations(df, metric, config_space):
"""
Take a dataframe from a tuner run, filter it and generate
TransferLearningTaskEvaluations from it
"""
filter_df = filter_completed(df)
return TransferLearningTaskEvaluations(
configuration_space=config_space,
hyperparameters=filter_df[config_space.keys()],
objectives_names=[metric],
# objectives_evaluations need to be of shape
# (num_evals, num_seeds, num_fidelities, num_objectives)
# We only have one seed, fidelity and objective
objectives_evaluations=np.array(filter_df[metric], ndmin=4).T,
)
def run_scheduler_on_task(entry_point, scheduler, max_trials):
"""
Take a scheduler and run it for max_trials on the backend specified by entry_point
Return a dataframe of the optimisation results
"""
tuner = Tuner(
trial_backend=LocalBackend(entry_point=str(entry_point)),
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_num_trials_finished=max_trials),
n_workers=4,
sleep_time=0.001,
)
tuner.run()
return tuner.tuning_status.get_dataframe()
def init_scheduler(
scheduler_str, max_steps, seed, mode, metric, transfer_learning_evaluations
):
"""
Initialise the scheduler
"""
kwargs = {
"metric": metric,
"config_space": height_config_space(max_steps=max_steps),
"mode": mode,
"random_seed": seed,
}
kwargs_w_trans = copy.deepcopy(kwargs)
kwargs_w_trans["transfer_learning_evaluations"] = transfer_learning_evaluations
if scheduler_str == "BayesianOptimization":
return BayesianOptimization(**kwargs)
if scheduler_str == "ZeroShotTransfer":
return ZeroShotTransfer(use_surrogates=True, **kwargs_w_trans)
if scheduler_str == "Quantiles":
return FIFOScheduler(
searcher=QuantileBasedSurrogateSearcher(**kwargs_w_trans),
**kwargs,
)
if scheduler_str == "BoundingBox":
kwargs_sched_fun = {key: kwargs[key] for key in kwargs if key != "config_space"}
kwargs_w_trans[
"scheduler_fun"
] = lambda new_config_space, mode, metric: BayesianOptimization(
new_config_space,
**kwargs_sched_fun,
)
del kwargs_w_trans["random_seed"]
return BoundingBox(**kwargs_w_trans)
raise ValueError("scheduler_str not recognised")
if __name__ == "__main__":
max_trials = 10
np.random.seed(1)
# Use train_height backend for our tests
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Collect evaluations on preliminary tasks
transfer_learning_evaluations = {}
for max_steps in range(1, 6):
scheduler = init_scheduler(
"BayesianOptimization",
max_steps=max_steps,
seed=np.random.randint(100),
mode=METRIC_MODE,
metric=METRIC_ATTR,
transfer_learning_evaluations=None,
)
print("Optimising preliminary task %s" % max_steps)
prev_task = run_scheduler_on_task(entry_point, scheduler, max_trials)
# Generate TransferLearningTaskEvaluations from previous task
transfer_learning_evaluations[max_steps] = extract_transferable_evaluations(
prev_task, METRIC_ATTR, scheduler.config_space
)
# Collect evaluations on transfer task
max_steps = 6
transfer_task_results = {}
labels = ["BayesianOptimization", "BoundingBox", "ZeroShotTransfer", "Quantiles"]
for scheduler_str in labels:
scheduler = init_scheduler(
scheduler_str,
max_steps=max_steps,
seed=max_steps,
mode=METRIC_MODE,
metric=METRIC_ATTR,
transfer_learning_evaluations=transfer_learning_evaluations,
)
print("Optimising transfer task using %s" % scheduler_str)
transfer_task_results[scheduler_str] = run_scheduler_on_task(
entry_point, scheduler, max_trials
)
# Optionally generate plots. Defaults to False
parser = argparse.ArgumentParser()
parser.add_argument(
"--generate_plots", action="store_true", help="generate optimisation plots."
)
args = parser.parse_args()
if args.generate_plots:
from syne_tune.try_import import try_import_visual_message
try:
import matplotlib.pyplot as plt
except ImportError:
print(try_import_visual_message())
print("Generating optimisation plots.")
""" Plot the results on the transfer task """
for label in labels:
plot_last_task(
max_trials,
transfer_task_results[label],
label=label,
metric=METRIC_ATTR,
color=colours[label],
)
plt.legend()
plt.ylabel(METRIC_ATTR)
plt.xlabel("Iteration")
plt.title("Transfer task (max_steps=6)")
plt.savefig("Transfer_task.png", bbox_inches="tight")
""" Plot the configs tried for the preliminary tasks """
fig, ax = plt.subplots()
for key in transfer_learning_evaluations:
scatter_space_exploration(
ax,
transfer_learning_evaluations[key].hyperparameters,
max_trials,
"Task %s" % key,
)
add_labels(
ax,
scheduler.config_space,
"Explored locations of BO for preliminary tasks",
)
plt.savefig("Configs_explored_preliminary.png", bbox_inches="tight")
""" Plot the configs tried for the transfer task """
fig, ax = plt.subplots()
# Plot the configs tried by the different schedulers on the transfer task
for label in labels:
finished_trials = filter_completed(transfer_task_results[label])
scatter_space_exploration(
ax, finished_trials, max_trials, label, color=colours[label]
)
# Plot the first config tested as a big square
ax.scatter(
finished_trials["width"][0],
finished_trials["height"][0],
marker="s",
color=colours[label],
s=100,
)
# Plot the optima from the preliminary tasks as black crosses
past_label = "Preliminary optima"
for key in transfer_learning_evaluations:
argmin = np.argmin(
transfer_learning_evaluations[key].objective_values(METRIC_ATTR)[
:max_trials, 0, 0
]
)
ax.scatter(
transfer_learning_evaluations[key].hyperparameters["width"][argmin],
transfer_learning_evaluations[key].hyperparameters["height"][argmin],
color="k",
marker="x",
label=past_label,
)
past_label = None
add_labels(ax, scheduler.config_space, "Explored locations for transfer task")
plt.savefig("Configs_explored_transfer.png", bbox_inches="tight")
Requirements:
Needs matplotlib to be installed if the plotting flag is given: pip install matplotlib. If you installed Syne Tune with visual or extra, this dependency is included.
An example of how to use evaluations collected in Syne Tune to run a transfer learning scheduler. Makes use of train_height.py. Used in the transfer learning tutorial.
To plot the figures, run as python launch_transfer_learning_example.py --generate_plots.
Plot Results of Tuning Experiment
import logging
from pathlib import Path
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
trial_backend = LocalBackend(entry_point=entry_point)
# Random search without stopping
scheduler = RandomSearch(
config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
)
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
n_workers=n_workers,
stop_criterion=stop_criterion,
results_update_interval=5,
tuner_name="plot-results-demo",
metadata={"description": "just an example"},
)
tuner.run()
# shows how to print the best configuration found from the tuner and retrains it
trial_id, best_config = tuner.best_config()
tuning_experiment = load_experiment(tuner.name)
# prints the best configuration found from experiment-results
print(f"best result found: {tuning_experiment.best_config()}")
# plots the best metric over time
tuning_experiment.plot()
# plots values found by all trials over time
tuning_experiment.plot_trials_over_time()
Requirements:
Needs matplotlib to be installed: pip install matplotlib. If you installed Syne Tune with visual or extra, this dependency is included.
Makes use of train_height.py.
Resume a Tuning Job
from syne_tune.config_space import randint
import shutil
from pathlib import Path
from syne_tune import StoppingCriterion
from syne_tune import Tuner
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import ASHA
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges
from syne_tune.util import random_string
def launch_first_tuning(experiment_name: str):
max_epochs = 100
metric = "mean_loss"
mode = "min"
config_space = {
"steps": max_epochs,
"width": randint(0, 10),
"height": randint(0, 10),
}
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
scheduler = ASHA(
config_space=config_space,
metric=metric,
mode=mode,
max_t=max_epochs,
search_options={"allow_duplicates": True},
resource_attr="epoch",
)
trial_backend = LocalBackend(entry_point=str(entry_point))
stop_criterion = StoppingCriterion(
max_num_trials_started=10,
)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=4,
tuner_name=experiment_name,
suffix_tuner_name=False,
)
tuner.run()
if __name__ == "__main__":
experiment_name = f"resume-tuning-example-{random_string(5)}"
# Launch a tuning, tuning results and checkpoints are written to disk
launch_first_tuning(experiment_name)
# Later loads an experiment from disk given the experiment name,
# in particular sets `load_tuner` to True to deserialize the Tuner
tuning_experiment = load_experiment(experiment_name, load_tuner=True)
# Copy the tuner as it will be modified when retuning
shutil.copy(
tuning_experiment.path / "tuner.dill",
tuning_experiment.path / "tuner-backup.dill",
)
# Update stop criterion to run the tuning a couple more trials than before
tuning_experiment.tuner.stop_criterion = StoppingCriterion(
max_num_trials_started=20
)
    # Define a new config space, for instance favoring a new part of the space based on data analysis
new_config_space = {
"steps": 100,
"width": randint(10, 20),
"height": randint(1, 10),
}
    # Update scheduler with random searcher to use the new configuration space.
    # For now we modify internals; adding a method `update_config_space` to RandomSearcher would be a cleaner option.
tuning_experiment.tuner.scheduler.config_space = new_config_space
tuning_experiment.tuner.scheduler.searcher._hp_ranges = make_hyperparameter_ranges(
new_config_space
)
tuning_experiment.tuner.scheduler.searcher.configure_scheduler(
tuning_experiment.tuner.scheduler
)
# Resume the tuning with the modified search space and stopping criterion
# The scheduler will now explore the updated search space
tuning_experiment.tuner.run()
Customize Results Written during an Experiment
from typing import Dict, Any, Optional, List
from pathlib import Path
import logging
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.constants import ST_TUNER_TIME
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import DyHPO
from syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo import (
DyHPORungSystem,
)
from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback
from syne_tune import Tuner, StoppingCriterion
# We would like to extract some extra information from the scheduler during the
# experiment. To this end, we implement a class for extracting this information
class DyHPOExtraResults(ExtraResultsComposer):
def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
scheduler = tuner.scheduler
assert isinstance(scheduler, DyHPO) # sanity check
# :class:`~syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.DyHPORungSystem`
# collects statistics about how often several types of decisions were made in
# ``on_task_schedule``
return scheduler.terminator._rung_systems[0].summary_schedule_records()
def keys(self) -> List[str]:
return DyHPORungSystem.summary_schedule_keys()
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
max_epochs = 100
n_workers = 4
# Hyperparameter configuration space
config_space = {
"width": randint(1, 20),
"height": randint(1, 20),
"epochs": 100,
}
# We use the DyHPO scheduler, since it records some interesting extra
    # information
scheduler = DyHPO(
config_space,
metric="mean_loss",
resource_attr="epoch",
max_resource_attr="epochs",
search_options={"debug_log": False},
grace_period=2,
)
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height_simple.py"
)
# Extra results are stored by the
# :class:`~syne_tune.results_callback.StoreResultsCallback`. In fact, they
# are appended to the default time-stamped results whenever a report is
# received.
extra_results_composer = DyHPOExtraResults()
callbacks = [StoreResultsCallback(extra_results_composer=extra_results_composer)]
tuner = Tuner(
trial_backend=LocalBackend(entry_point=entry_point),
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_wallclock_time=30),
n_workers=4, # how many trials are evaluated in parallel
callbacks=callbacks,
)
tuner.run()
# Let us have a look what was written. Here, we just look at the information
# at the end of the experiment
results_df = load_experiment(tuner.name).results
final_pos = results_df.loc[:, ST_TUNER_TIME].argmax()
final_row = dict(results_df.loc[final_pos])
extra_results_at_end = {
name: final_row[name] for name in extra_results_composer.keys()
}
print(f"\nExtra results at end of experiment:\n{extra_results_at_end}")
Makes use of train_height.py.
An example for how to append extra results to those written by default to results.csv.zip. This is done by customizing the StoreResultsCallback.
Pass Configuration as JSON File to Training Script
import os
import logging
from pathlib import Path
from argparse import ArgumentParser
from syne_tune.backend import LocalBackend, SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
get_execution_role,
default_sagemaker_session,
)
from syne_tune.optimizer.baselines import (
ASHA,
)
from syne_tune import Tuner, StoppingCriterion
from syne_tune.remote.constants import (
DEFAULT_CPU_INSTANCE_SMALL,
PYTORCH_LATEST_FRAMEWORK,
PYTORCH_LATEST_PY_VERSION,
)
from examples.training_scripts.height_example.train_height_config_json import (
height_config_space,
RESOURCE_ATTR,
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
parser = ArgumentParser()
parser.add_argument("--use_sagemaker_backend", type=int, default=0)
args = parser.parse_args()
use_sagemaker_backend = bool(args.use_sagemaker_backend)
random_seed = 31415927
max_epochs = 100
n_workers = 4
max_wallclock_time = 5 * 60 if use_sagemaker_backend else 10
config_space = height_config_space(max_epochs)
entry_point = (
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height_config_json.py"
)
scheduler = ASHA(
config_space,
metric=METRIC_ATTR,
mode=METRIC_MODE,
max_resource_attr=MAX_RESOURCE_ATTR,
resource_attr=RESOURCE_ATTR,
)
if not use_sagemaker_backend:
trial_backend = LocalBackend(
entry_point=str(entry_point),
pass_args_as_json=True,
)
else:
from sagemaker.pytorch import PyTorch
import syne_tune
if "AWS_DEFAULT_REGION" not in os.environ:
os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
trial_backend = SageMakerBackend(
sm_estimator=PyTorch(
entry_point=str(entry_point),
instance_type=DEFAULT_CPU_INSTANCE_SMALL,
instance_count=1,
framework_version=PYTORCH_LATEST_FRAMEWORK,
py_version=PYTORCH_LATEST_PY_VERSION,
role=get_execution_role(),
dependencies=syne_tune.__path__,
max_run=10 * 60,
sagemaker_session=default_sagemaker_session(),
disable_profiler=True,
debugger_hook_config=False,
keep_alive_period_in_seconds=60, # warm pool feature
),
metrics_names=[METRIC_ATTR],
pass_args_as_json=True,
)
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
start_jobs_without_delay=False,
)
tuner.run()
Requirements:
If use_sagemaker_backend = True, needs access to AWS SageMaker.
Makes use of the following train_height_config_json.py training script:
import logging
import time
from typing import Optional, Dict, Any
from argparse import ArgumentParser
from syne_tune import Reporter
from syne_tune.config_space import randint
from syne_tune.utils import add_config_json_to_argparse, load_config_json
report = Reporter()
RESOURCE_ATTR = "epoch"
METRIC_ATTR = "mean_loss"
METRIC_MODE = "min"
MAX_RESOURCE_ATTR = "steps"
def train_height(step: int, width: float, height: float) -> float:
return 100 / (10 + width * step) + 0.1 * height
def height_config_space(
max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
if sleep_time is None:
sleep_time = 0.1
return {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
"sleep_time": sleep_time,
"list_arg": ["this", "is", "a", "list", 1, 2, 3],
"dict_arg": {
"this": 27,
"is": [1, 2, 3],
"a": "dictionary",
"even": {
"a": 0,
"nested": 1,
"one": 2,
},
},
}
def _check_extra_args(config: Dict[str, Any]):
config_space = height_config_space(5)
for k in ("list_arg", "dict_arg"):
a, b = config[k], config_space[k]
assert a == b, (k, a, b)
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
parser = ArgumentParser()
# Append required argument(s):
add_config_json_to_argparse(parser)
args, _ = parser.parse_known_args()
# Loads config JSON and merges with ``args``
config = load_config_json(vars(args))
# Check that args with complex types have been received correctly
_check_extra_args(config)
width = config["width"]
height = config["height"]
sleep_time = config["sleep_time"]
num_steps = config[MAX_RESOURCE_ATTR]
for step in range(num_steps):
# Sleep first, since results are returned at end of "epoch"
time.sleep(sleep_time)
# Feed the score back to Syne Tune.
dummy_score = train_height(step, width, height)
report(
**{
"step": step,
METRIC_ATTR: dummy_score,
RESOURCE_ATTR: step + 1,
}
)
Speculative Early Checkpoint Removal
"""
Example for speculative checkpoint removal with asynchronous multi-fidelity
"""
from typing import Optional, Dict, Any, List
import logging
from syne_tune.backend import LocalBackend
from syne_tune.callbacks.hyperband_remove_checkpoints_callback import (
HyperbandRemoveCheckpointsCommon,
)
from syne_tune.constants import ST_TUNER_TIME
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import MOBSTER
from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback
from syne_tune.util import find_first_of_type
from syne_tune import Tuner, StoppingCriterion
from benchmarking.benchmark_definitions.mlp_on_fashionmnist import (
mlp_fashionmnist_benchmark,
)
# This is used to monitor what the checkpoint removal mechanism is doing and
# for writing out results. This is optional; the mechanism works without it.
class CPRemovalExtraResults(ExtraResultsComposer):
def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
callback = find_first_of_type(tuner.callbacks, HyperbandRemoveCheckpointsCommon)
return None if callback is None else callback.extra_results()
def keys(self) -> List[str]:
return HyperbandRemoveCheckpointsCommon.extra_results_keys()
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
random_seed = 31415927
n_workers = 4
max_num_checkpoints = 10
# This time may be too short to see positive effects:
max_wallclock_time = 1800
# Monitor how checkpoint removal is doing over time, appending this
# information to results.csv.zip?
monitor_cp_removal_in_results = True
# We pick the MLP on FashionMNIST benchmark
benchmark = mlp_fashionmnist_benchmark()
# Local backend
# By setting ``delete_checkpoints=True``, we ask for checkpoints to be removed
# once a trial cannot be resumed anymore
trial_backend = LocalBackend(
entry_point=str(benchmark.script),
delete_checkpoints=True,
)
# MOBSTER (model-based ASHA) with promotion scheduling (pause and resume).
# Checkpoints are written for each paused trial, and these are not removed,
# because in principle, every paused trial may be resumed in the future.
# If checkpoints are large, this may fill up your disk.
# Here, we use speculative checkpoint removal to keep the number of checkpoints
# to at most ``max_num_checkpoints``. To this end, paused trials are ranked by
# expected cost of removing their checkpoint.
scheduler = MOBSTER(
benchmark.config_space,
type="promotion",
max_resource_attr=benchmark.max_resource_attr,
resource_attr=benchmark.resource_attr,
mode=benchmark.mode,
metric=benchmark.metric,
random_seed=random_seed,
early_checkpoint_removal_kwargs=dict(
max_num_checkpoints=max_num_checkpoints,
),
)
stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
# The tuner activates early checkpoint removal iff
# ``trial_backend.delete_checkpoints``. In this case, it requests details
# from the scheduler (which is ``early_checkpoint_removal_kwargs`` in our
# case). Early checkpoint removal is done by appending a callback to those
# normally used with the tuner.
if monitor_cp_removal_in_results:
# We can monitor how well checkpoint removal is working by storing
# extra results (this is optional)
extra_results_composer = CPRemovalExtraResults()
callbacks = [
StoreResultsCallback(extra_results_composer=extra_results_composer)
]
else:
extra_results_composer = None
callbacks = None
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
callbacks=callbacks,
)
tuner.run()
if monitor_cp_removal_in_results:
# We have monitored how checkpoint removal has been doing over time. Here,
# we just look at the information at the end of the experiment
results_df = load_experiment(tuner.name).results
final_pos = results_df.loc[:, ST_TUNER_TIME].argmax()
final_row = dict(results_df.loc[final_pos])
extra_results_at_end = {
name: final_row[name] for name in extra_results_composer.keys()
}
logging.info(f"Extra results at end of experiment:\n{extra_results_at_end}")
# We can obtain additional details from the callback, which is the last one
# in ``tuner``
callback = find_first_of_type(tuner.callbacks, HyperbandRemoveCheckpointsCommon)
trials_resumed = callback.trials_resumed_without_checkpoint()
if trials_resumed:
logging.info(
f"The following {len(trials_resumed)} trials were resumed without a checkpoint:\n{trials_resumed}"
)
else:
logging.info("No trials were resumed without a checkpoint")
Requirements:
Needs the “mlp_fashionmnist” benchmark, which requires Syne Tune to have been installed from source.
This example uses the mlp_fashionmnist
benchmark. It runs for about 30
minutes. It demonstrates speculative early checkpoint removal for MOBSTER
with promotion scheduling (pause and resume).
Launch HPO Experiment with Ray Tune Scheduler
import logging
from pathlib import Path
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.search.skopt import SkOptSearch
import numpy as np
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import RayTuneScheduler
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
RESOURCE_ATTR,
METRIC_ATTR,
METRIC_MODE,
MAX_RESOURCE_ATTR,
)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.DEBUG)
random_seed = 31415927
max_steps = 100
n_workers = 4
config_space = {
MAX_RESOURCE_ATTR: max_steps,
"width": randint(0, 20),
"height": randint(-100, 100),
}
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Local backend
trial_backend = LocalBackend(entry_point=entry_point)
# Hyperband scheduler with SkOpt searcher
np.random.seed(random_seed)
ray_searcher = SkOptSearch()
ray_searcher.set_search_properties(
mode=METRIC_MODE,
metric=METRIC_ATTR,
config=RayTuneScheduler.convert_config_space(config_space),
)
ray_scheduler = AsyncHyperBandScheduler(
max_t=max_steps,
time_attr=RESOURCE_ATTR,
mode=METRIC_MODE,
metric=METRIC_ATTR,
)
scheduler = RayTuneScheduler(
config_space=config_space,
ray_scheduler=ray_scheduler,
ray_searcher=ray_searcher,
)
stop_criterion = StoppingCriterion(max_wallclock_time=20)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=n_workers,
)
tuner.run()
Makes use of train_height.py.
Stand-Alone Bayesian Optimization
import logging
from syne_tune.config_space import uniform, randint, choice
from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
dictionarize_objective,
)
from syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory import (
make_hyperparameter_ranges,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects import (
create_tuning_job_state,
)
from syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher import GPFIFOSearcher
from syne_tune.optimizer.schedulers.searchers.gp_searcher_utils import encode_state
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
random_seed = 31415927
# toy example of 3 hp's
config_space = {
"hp_1": uniform(-5.0, 5.0),
"hp_2": randint(-5, 5),
"hp_3": choice(["a", "b", "c"]),
}
hp_ranges = make_hyperparameter_ranges(config_space)
batch_size = 16
num_init_candidates_for_batch = 10
state = create_tuning_job_state(
hp_ranges=hp_ranges,
cand_tuples=[
(-3.0, -4, "a"),
(2.2, -3, "b"),
(-4.9, -1, "b"),
(-1.9, -1, "c"),
(-3.5, 3, "a"),
],
metrics=[dictionarize_objective(x) for x in (15.0, 27.0, 13.0, 39.0, 35.0)],
)
gp_searcher = GPFIFOSearcher(
state.hp_ranges.config_space,
points_to_evaluate=None,
random_seed=random_seed,
metric="objective",
debug_log=False,
)
gp_searcher_state = gp_searcher.get_state()
gp_searcher_state["state"] = encode_state(state)
gp_searcher = gp_searcher.clone_from_state(gp_searcher_state)
next_candidate_list = gp_searcher.get_batch_configs(
batch_size=batch_size,
num_init_candidates_for_batch=num_init_candidates_for_batch,
)
assert len(next_candidate_list) == batch_size
Syne Tune combines a scheduler (HPO algorithm) with a backend to provide a complete HPO solution. If you already have a system in place for job scheduling and managing the state of the tuning problem, you may want to call the scheduler on its own. This example demonstrates how to do this for Gaussian process based Bayesian optimization.
Ask Tell Interface
"""
This is an example of how to use syne-tune in the ask-tell mode.
In this setup the tuning loop and experiments are disentangled. The AskTell scheduler suggests new configurations
and the users themselves perform experiments to test the performance of each configuration.
Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.
In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration
(examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this
interface provides all the necessary flexibility.
"""
from typing import Dict
import datetime
import logging
import dill
import numpy as np
from syne_tune.backend.trial_status import Trial, Status, TrialResult
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import RandomSearch, BayesianOptimization
from syne_tune.optimizer.scheduler import TrialScheduler
class AskTellScheduler:
bscheduler: TrialScheduler
trial_counter: int
completed_experiments: Dict[int, TrialResult]
def __init__(self, base_scheduler: TrialScheduler):
self.bscheduler = base_scheduler
self.trial_counter = 0
self.completed_experiments = {}
def ask(self) -> Trial:
"""
Ask the scheduler for new trial to run
:return: Trial to run
"""
trial_suggestion = self.bscheduler.suggest(self.trial_counter)
trial = Trial(
trial_id=self.trial_counter,
config=trial_suggestion.config,
creation_time=datetime.datetime.now(),
)
self.trial_counter += 1
return trial
def tell(self, trial: Trial, experiment_result: Dict[str, float]):
"""
Feed experiment results back to the Scheduler
:param trial: Trial that was run
:param experiment_result: {metric: value} dictionary with experiment results
"""
trial_result = trial.add_results(
metrics=experiment_result,
status=Status.completed,
training_end_time=datetime.datetime.now(),
)
self.bscheduler.on_trial_complete(trial=trial, result=experiment_result)
self.completed_experiments[trial_result.trial_id] = trial_result
def best_trial(self, metric: str) -> TrialResult:
"""
Return the best trial according to the provided metric
"""
if self.bscheduler.mode == "max":
sign = 1.0
else:
sign = -1.0
return max(
[value for key, value in self.completed_experiments.items()],
key=lambda trial: sign * trial.metrics[metric],
)
def target_function(x, noise: bool = True):
fx = x * x + np.sin(x)
if noise:
sigma = np.cos(x) ** 2 + 0.01
noise = 0.1 * np.random.normal(loc=x, scale=sigma)
fx = fx + noise
return fx
def get_objective():
metric = "mean_loss"
mode = "min"
max_iterations = 100
config_space = {
"x": uniform(-1, 1),
}
return metric, mode, config_space, max_iterations
def plot_objective():
"""
In this function, we will inspect the objective by plotting the target function
:return:
"""
from syne_tune.try_import import try_import_visual_message
try:
import matplotlib.pyplot as plt
except ImportError:
print(try_import_visual_message())
metric, mode, config_space, max_iterations = get_objective()
plt.set_cmap("viridis")
x = np.linspace(config_space["x"].lower, config_space["x"].upper, 400)
fx = target_function(x, noise=False)
noise = 0.1 * np.cos(x) ** 2 + 0.01
plt.plot(x, fx, "r--", label="True value")
plt.fill_between(x, fx + noise, fx - noise, alpha=0.2, fc="r")
plt.legend()
plt.grid()
plt.show()
def tune_with_random_search() -> TrialResult:
metric, mode, config_space, max_iterations = get_objective()
scheduler = AskTellScheduler(
base_scheduler=RandomSearch(config_space, metric=metric, mode=mode)
)
for iter in range(max_iterations):
trial_suggestion = scheduler.ask()
test_result = target_function(**trial_suggestion.config)
scheduler.tell(trial_suggestion, {metric: test_result})
return scheduler.best_trial(metric)
def save_restart_with_gp() -> TrialResult:
metric, mode, config_space, max_iterations = get_objective()
scheduler = AskTellScheduler(
base_scheduler=BayesianOptimization(config_space, metric=metric, mode=mode)
)
for iter in range(int(max_iterations / 2)):
trial_suggestion = scheduler.ask()
test_result = target_function(**trial_suggestion.config)
scheduler.tell(trial_suggestion, {metric: test_result})
# --- The scheduler can be written to disk to pause experiment
output_path = "scheduler-checkpoint.dill"
with open(output_path, "wb") as f:
dill.dump(scheduler, f)
# --- The Scheduler can be read from disk at a later time to resume experiments
with open(output_path, "rb") as f:
scheduler = dill.load(f)
for iter in range(int(max_iterations / 2)):
trial_suggestion = scheduler.ask()
test_result = target_function(**trial_suggestion.config)
scheduler.tell(trial_suggestion, {metric: test_result})
return scheduler.best_trial(metric)
def tune_with_gp() -> TrialResult:
metric, mode, config_space, max_iterations = get_objective()
scheduler = AskTellScheduler(
base_scheduler=BayesianOptimization(config_space, metric=metric, mode=mode)
)
for iter in range(max_iterations):
trial_suggestion = scheduler.ask()
test_result = target_function(**trial_suggestion.config)
scheduler.tell(trial_suggestion, {metric: test_result})
return scheduler.best_trial(metric)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.WARN)
# plot_objective() # Please uncomment this to plot the objective
print("Random:", tune_with_random_search())
print("GP with restart:", save_restart_with_gp())
print("GP:", tune_with_gp())
This is an example of how to use syne-tune in the ask-tell mode. In this setup the tuning loop and experiments are disentangled. The AskTell scheduler suggests new configurations and the users themselves perform experiments to test the performance of each configuration. Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.
In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration (examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this interface provides all the necessary flexibility.
Ask Tell interface for Hyperband
"""
This is an example of how to use syne-tune in the ask-tell mode.
In this setup the tuning loop and experiments are disentangled. The AskTell scheduler suggests new configurations
and the users themselves perform experiments to test the performance of each configuration.
Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.
In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration
(examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this
interface provides all the necessary flexibility.
This is an extension of launch_ask_tell_scheduler.py to run multi-fidelity methods such as Hyperband.
"""
import logging
from typing import Tuple
import numpy as np
from examples.launch_ask_tell_scheduler import AskTellScheduler
from syne_tune.backend.trial_status import Trial, TrialResult
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import ASHA
from syne_tune.optimizer.scheduler import SchedulerDecision
def target_function(x, step: int = None, noise: bool = True):
fx = x * x + np.sin(x)
if noise:
sigma = np.cos(x) ** 2 + 0.01
noise = 0.1 * np.random.normal(loc=x, scale=sigma)
fx = fx + noise
if step is not None:
fx += step * 0.01
return fx
def get_objective():
metric = "mean_loss"
mode = "min"
max_iterations = 100
config_space = {
"x": uniform(-1, 1),
}
return metric, mode, config_space, max_iterations
def run_hyperband_step(
scheduler: AskTellScheduler, trial_suggestion: Trial, max_steps: int, metric: str
) -> Tuple[float, float]:
for step in range(1, max_steps):
test_result = target_function(**trial_suggestion.config, step=step)
decision = scheduler.bscheduler.on_trial_result(
trial_suggestion, {metric: test_result, "epoch": step}
)
if decision == SchedulerDecision.STOP:
break
return step, test_result
def tune_with_hyperband() -> TrialResult:
metric, mode, config_space, max_iterations = get_objective()
max_steps = 100
scheduler = AskTellScheduler(
base_scheduler=ASHA(
config_space,
metric=metric,
resource_attr="epoch",
max_t=max_steps,
mode=mode,
)
)
for iter in range(max_iterations):
trial_suggestion = scheduler.ask()
final_step, test_result = run_hyperband_step(
scheduler, trial_suggestion, max_steps, metric
)
scheduler.tell(trial_suggestion, {metric: test_result, "epoch": final_step})
return scheduler.best_trial(metric)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.WARN)
print("Hyperband:", tune_with_hyperband())
This is an extension of launch_ask_tell_scheduler.py to run multi-fidelity methods such as Hyperband.
Multi Objective Multi Surrogate (MSMOS) Searcher
from pathlib import Path
import numpy as np
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.baselines import MORandomScalarizationBayesOpt
def main():
random_seed = 6287623
# Hyperparameter configuration space
config_space = {
"steps": randint(0, 100),
"theta": uniform(0, np.pi / 2),
"sleep_time": 0.01,
}
metrics = ["y1", "y2"]
modes = ["min", "min"]
# Creates a FIFO scheduler with a ``MultiObjectiveMultiSurrogateSearcher``. The
# latter is configured by one default GP surrogate per objective, and with the
# ``MultiObjectiveLCBRandomLinearScalarization`` acquisition function.
scheduler = MORandomScalarizationBayesOpt(
config_space=config_space,
metric=metrics,
mode=modes,
random_seed=random_seed,
)
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "mo_artificial"
/ "mo_artificial.py"
)
tuner = Tuner(
trial_backend=LocalBackend(entry_point=entry_point),
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_wallclock_time=30),
n_workers=1, # how many trials are evaluated in parallel
)
tuner.run()
if __name__ == "__main__":
main()
This example shows how to use the multi-objective multi-surrogate (MSMOS) searcher to tune a multi-objective problem. Here, we use two Gaussian process regressors as the surrogate models and a lower confidence bound random scalarization as the acquisition function. That said, any Syne Tune Estimator can be used as a surrogate.
Basics of Syne Tune
This tutorial provides a first overview of Syne Tune. You will learn about the most important concepts of automated hyperparameter tuning, and how to make it work for your setup.
Note
In order to run the code coming with this tutorial, you need to have installed Syne Tune from source.
Concepts and Terminology
Syne Tune is a library for large-scale distributed hyperparameter optimization (HPO). Here is some basic terminology. A specific set of values for hyperparameters is called a configuration. The configuration space is the domain of a configuration, prescribing type and valid range of each hyperparameter. Finally, a trial refers to an evaluation of the underlying machine learning model on a given configuration. A trial may result in one or more observations, for example the validation error after each epoch of training the model. Some HPO algorithms may pause a trial and restart it later in time.
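To make this terminology concrete, here is a minimal sketch (the hyperparameter names and values are purely illustrative):
from syne_tune.config_space import randint, loguniform

# Configuration space: type and valid range of each hyperparameter
config_space = {
    "n_units": randint(4, 1024),           # integer, linear encoding
    "learning_rate": loguniform(1e-6, 1),  # float, logarithmic encoding
}

# A configuration is one specific set of values from this space
configuration = {"n_units": 128, "learning_rate": 0.01}

# A trial evaluates the model for this configuration; it may report one
# observation per epoch (e.g., validation error), and some schedulers can
# pause the trial and resume it later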
HPO experiments in Syne Tune involve the interplay between three components: Tuner, Backend, and Scheduler. There is also dedicated tooling for Benchmarking.
Tuner
The Tuner
orchestrates the overall search for the best
configuration. It does so by interacting with scheduler and backend. It
queries the scheduler for a new configuration to evaluate whenever a worker is
free, and passes this suggestion to the backend for the execution of this trial.
Scheduler
In Syne Tune, HPO algorithms are called schedulers (base class
TrialScheduler). They search for a new,
most promising configuration and suggest it as a new trial to the tuner. Some
schedulers may decide to resume a paused trial instead of suggesting a new one.
Schedulers may also be in charge of stopping running trials. Syne Tune supports
many schedulers, including
multi-fidelity methods.
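As a minimal sketch of this interface (using the RandomSearch baseline and an illustrative metric name; the full ask-tell loop is shown in the ask-tell example above):
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import RandomSearch

config_space = {"width": randint(0, 20), "height": randint(-100, 100)}
scheduler = RandomSearch(config_space, metric="mean_loss", mode="min")

# The tuner repeatedly asks the scheduler to suggest a configuration for a
# new trial, then reports results back (e.g., via on_trial_result or
# on_trial_complete), so the scheduler can make better-informed decisions
suggestion = scheduler.suggest(0)
print(suggestion.config)  # for example {"width": 7, "height": -31}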
Backend
The backend module is responsible for starting, stopping, pausing and resuming
trials, as well as accessing results reported by trials and their statuses (base
class TrialBackend). Syne Tune currently supports four
execution backends to facilitate experimentation: local backend,
Python backend, SageMaker backend, and simulator backend.
Recall that an HPO experiment is defined by two scripts. First, a launcher script
which configures the configuration space, the backend, and the scheduler, then
starts the tuning loop. Second, a training script, in which the machine learning
model of interest (e.g., a deep neural network, or gradient boosted decision trees)
is trained for a fixed hyperparameter configuration, and some validation metric is
reported, either at the end or after each epoch of training. It is the responsibility
of the backend to execute the training script for different configurations, often in
parallel, and to relay their reports back to the tuner.
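Here is a minimal sketch of what such a training script looks like; the toy objective and file name are illustrative placeholders, and the real scripts used in this tutorial are shown below:
# toy_training_script.py: what the backend executes for each trial
from argparse import ArgumentParser

from syne_tune import Reporter

if __name__ == "__main__":
    parser = ArgumentParser()
    # Hyperparameters arrive as command line arguments
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--learning_rate", type=float, required=True)
    args, _ = parser.parse_known_args()
    report = Reporter()
    for epoch in range(1, args.epochs + 1):
        # Placeholder for one epoch of real training
        validation_error = 1.0 / (1.0 + epoch * args.learning_rate)
        # Relay one observation per epoch back to the tuner
        report(epoch=epoch, validation_error=validation_error)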
Local Backend
Class LocalBackend. This backend runs
each training job locally, on the same machine as the tuner. Each training job is
run as a subprocess. Importantly, this means that the number of workers, as
specified by n_workers passed to Tuner, must be smaller or
equal to the number of independent resources on this machine, e.g. the number of
GPUs or CPU cores. Experiments with the local backend can either be launched on
your current machine (in which case it needs to have the resources you are
requesting, such as GPUs), or you can
launch the experiment remotely
as a SageMaker training job, using an instance type of your choice. The figure
below demonstrates the local backend. On the left, both scripts are executed on
the local machine, while on the right, scripts are run remotely.
Figures: local backend on a local machine (left); local backend when running on SageMaker (right).
Syne Tune supports rotating through multiple GPUs on the machine, assigning the next trial to the least busy GPU, i.e. the GPU with the fewest trials currently running.
The local backend is simple and has very small delays for starting, stopping, or resuming trials. However, it also has shortcomings. Most importantly, the number of trials which can run concurrently is limited by the resources of the chosen instance. If GPUs are required, each trial is limited to using a single GPU, so that several trials can run in parallel.
The Python backend (PythonBackend) is simply a wrapper around the local backend, which allows you to define an experiment in a single script (instead of two).
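A minimal sketch of how the Python backend can be used, assuming its constructor takes a tune_function and a config_space (as in the launch_height_python_backend.py example referenced below); the import happens inside the function, since the function is serialized and executed in a subprocess:
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint

config_space = {"width": randint(0, 20), "height": randint(-100, 100)}

def train_height(width: int, height: int):
    # Import inside the function: it is serialized and run in a subprocess
    from syne_tune import Reporter

    report = Reporter()
    # Toy objective standing in for real training code
    report(mean_loss=(height + 100) / 200 + 0.1 * width)

# Experiment definition and "training script" live in the same file
trial_backend = PythonBackend(tune_function=train_height, config_space=config_space)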
SageMaker Backend
Class SageMakerBackend. This backend
runs each trial evaluation as a separate SageMaker training job. Given sufficient
instance limits, you can run your experiments with any number of workers you like,
and each worker may use all resources on the executing instance. It is even
possible to execute trials on instances of different types, which allows for
joint tuning of hyperparameters and compute resources.
The figure below demonstrates the SageMaker backend. On the left, the launcher
script runs on the local machine, while on the right, it is run remotely.
Figures: SageMaker backend with tuner running locally (left); SageMaker backend with tuner running on SageMaker (right).
The SageMaker backend executes each trial as an independent SageMaker training job.
This allows you to use any instance type and configuration you like. Also, you
may use any of the SageMaker frameworks, from scikit-learn
over PyTorch
and TensorFlow
, up to dedicated frameworks for distributed training. You may
also
bring your own Docker image.
This backend is most suited to tune models for which training is fairly expensive. SageMaker training jobs incur certain delays for starting or stopping, which are not present in the local backend. The SageMaker backend can be sped up by using SageMaker managed warm pools.
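A condensed sketch of the SageMaker backend setup, based on the "Pass Configuration as JSON File to Training Script" example above (the entry point and metric name here are illustrative):
from sagemaker.pytorch import PyTorch

from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import get_execution_role
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    PYTORCH_LATEST_FRAMEWORK,
    PYTORCH_LATEST_PY_VERSION,
)

# Each trial runs as its own SageMaker training job; the warm pool setting
# keeps instances alive between jobs to reduce start-up delays
trial_backend = SageMakerBackend(
    sm_estimator=PyTorch(
        entry_point="train_height.py",  # illustrative training script
        instance_type=DEFAULT_CPU_INSTANCE_SMALL,
        instance_count=1,
        framework_version=PYTORCH_LATEST_FRAMEWORK,
        py_version=PYTORCH_LATEST_PY_VERSION,
        role=get_execution_role(),
        max_run=10 * 60,
        keep_alive_period_in_seconds=60,  # SageMaker managed warm pools
    ),
    metrics_names=["mean_loss"],
)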
Simulator Backend
Class BlackboxRepositoryBackend.
This backend is useful for comparing HPO methods, or variations of such methods.
It runs on a tabulated or surrogate benchmark, where validation metric data
typically obtained online by running a training script has been precomputed
offline. In a corporate setting, simulation experiments are useful for unit and
regression testing, but also to speed up evaluations of prototypes. More details
are given here, and in
this example.
The main advantage of the simulator backend is that it allows for realistic experimentation at very low cost, and running order of magnitude faster than real time. A drawback is the upfront cost of generating a tabulated benchmark of sufficient complexity to match the real problem of interest.
Importantly, Syne Tune is agnostic to which execution backend is being used. You
can easily switch between backends by changing the trial_backend argument in Tuner:
launch_height_baselines.py provides an example for launching experiments with the local backend
launch_height_python_backend.py provides an example for launching experiments with the Python backend
launch_height_sagemaker.py provides an example for launching experiments with the SageMaker backend
launch_nasbench201_simulated.py provides an example for launching experiments with the simulator backend
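For instance, switching from the local backend to the simulator backend amounts to swapping the trial_backend argument. Here is a sketch based on the NASBench-201 example above (the import path for BlackboxRepositoryBackend is an assumption, and the simulator additionally needs sleep_time=0 and the SimulatorCallback):
from syne_tune.backend import LocalBackend
# The import path below is an assumption; see the NASBench-201 example above
from syne_tune.blackbox_repository import BlackboxRepositoryBackend

# Local backend: run the training script as subprocesses on this machine
trial_backend = LocalBackend(entry_point="train_height.py")

# Simulator backend: replay a tabulated benchmark instead of training
# (the Tuner then also needs sleep_time=0 and callbacks=[SimulatorCallback()])
trial_backend = BlackboxRepositoryBackend(
    blackbox_name="nasbench201",
    elapsed_time_attr="metric_elapsed_time",
    dataset="cifar100",
)

# Everything else stays unchanged:
# tuner = Tuner(trial_backend=trial_backend, scheduler=scheduler, ...)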
Benchmarking
A benchmark is a collection of meta-datasets from different configuration spaces, where the exact dataset split, the evaluation protocol, and the performance measure are well-specified. Benchmarking allows for experimental reproducibility and assists us in comparing HPO methods on the specified configurations. Refer to this tutorial for a complete guide on benchmarking in Syne Tune.
Setting up the Problem
Running Example
For most of this tutorial, we will be concerned with one running example: tuning some hyperparameters of a two-layer perceptron on the FashionMNIST dataset.
Figures: FashionMNIST dataset; two-layer MLP.
This is not a particularly difficult problem. Due to its limited size, and the type of model, you can run it on a CPU instance. It is not a toy problem either. Depending on model size, training for the full number of epochs can take more than 90 minutes. We will present results obtained by running HPO for 3 hours, using 4 workers. In order to get best possible results with model-based HPO, you would have to run for longer.
Annotating the Training Script
You will normally start with some code to train a machine learning model, which comes with a number of free parameters you would like to tune. The goal is to obtain a trained (and tuned) model with low prediction error on future data from the same task. One way to do this is to split available data into disjoint training and validation sets, and to score a configuration (i.e., an instantiation of all hyperparameters) by first training on the training set, then computing the error on the validation set. This is what we will do here, while noting that there are other (more costly) scores we could have used instead (e.g., cross-validation). Here is an example:
import argparse
import logging
from benchmarking.training_scripts.mlp_on_fashion_mnist.mlp_on_fashion_mnist import (
download_data,
split_data,
model_and_optimizer,
train_model,
validate_model,
)
from syne_tune import Reporter
def objective(config): # [1]
# Download data
data_train = download_data(config)
# Report results to Syne Tune
report = Reporter()
# Split into training and validation set
train_loader, valid_loader = split_data(config, data_train)
# Create model and optimizer
state = model_and_optimizer(config)
# Training loop
for epoch in range(1, config["epochs"] + 1):
train_model(config, state, train_loader)
# Report validation accuracy to Syne Tune
# [2]
accuracy = validate_model(config, state, valid_loader)
report(accuracy=accuracy)
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
# [3]
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, required=True)
parser.add_argument("--dataset_path", type=str, required=True)
# Hyperparameters
parser.add_argument("--n_units_1", type=int, required=True)
parser.add_argument("--n_units_2", type=int, required=True)
parser.add_argument("--batch_size", type=int, required=True)
parser.add_argument("--dropout_1", type=float, required=True)
parser.add_argument("--dropout_2", type=float, required=True)
parser.add_argument("--learning_rate", type=float, required=True)
parser.add_argument("--weight_decay", type=float, required=True)
args, _ = parser.parse_known_args()
# Evaluate objective and report results to Syne Tune
objective(config=vars(args))
This script imports boilerplate code from mlp_on_fashionmnist.py. It is a typical script to train a neural network, using PyTorch:
[1] objective encodes the function we would like to optimize. It downloads the data, splits it into training and validation set, and constructs the model and optimizer. Next, the model is trained for config['epochs'] epochs. An epoch constitutes a partitioning of the training set into mini-batches of size config['batch_size'], presented to the stochastic gradient descent optimizer in a random ordering.
[2] Finally, once training is done, we compute the accuracy of the model on the validation set and report it back to Syne Tune. To this end, we create a callback (report = Reporter()) and call it once the training loop has finished, passing the validation accuracy (report(accuracy=accuracy)).
[3] Values in config are parameters of the training script. As is customary in SageMaker, these parameters are command line arguments to the script. A subset of these parameters are hyperparameters, namely the parameters we would like to tune. Our example has 7 hyperparameters, 3 of type int and 4 of type float. Another notable parameter is config['epochs'], the number of epochs to train. This is not a parameter to be tuned, even though it plays an important role when we get to early stopping methods below. If your training problem is iterative in nature, we recommend you include the number of iterations (or epochs) among the parameters to your script.
[4] Most hyperparameters determine the model, optimizer or learning rate scheduler. In model_and_optimizer, we can see that config['n_units_1'], config['n_units_2'] are the numbers of units in the first and second hidden layers of a multi-layer perceptron with ReLU activations and dropout (FashionMNIST inputs are 28-by-28 grey-scale images, and there are 10 classes). Also, config['learning_rate'] and config['weight_decay'] parameterize the Adam optimizer.
This script differs from a vanilla training script only by two lines, which create the reporter and call it at the end of training. Namely, we report the validation accuracy after training as report(accuracy=accuracy).
Note
By default, the configuration is passed to the training script as command line arguments. This precludes passing arguments of complex type, such as lists or dictionaries, as there is also a length limit to arguments. In order to get around these restrictions, you can also pass arguments via a JSON file.
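A short sketch of this mechanism, based on the "Pass Configuration as JSON File to Training Script" example above:
# In the launcher script: ask the backend to pass the configuration as JSON
from syne_tune.backend import LocalBackend

trial_backend = LocalBackend(
    entry_point="train_height_config_json.py",
    pass_args_as_json=True,
)

# In the training script: recover the configuration from the JSON file
from argparse import ArgumentParser
from syne_tune.utils import add_config_json_to_argparse, load_config_json

parser = ArgumentParser()
add_config_json_to_argparse(parser)    # appends the required argument(s)
args, _ = parser.parse_known_args()
config = load_config_json(vars(args))  # dict which may contain complex types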
Defining the Configuration Space
Having defined the objective, we still need to specify the space we would like to search over. We will use the following configuration space throughout this tutorial:
from syne_tune.config_space import randint, uniform, loguniform
# Configuration space (or search space)
config_space = {
"n_units_1": randint(4, 1024),
"n_units_2": randint(4, 1024),
"batch_size": randint(8, 128),
"dropout_1": uniform(0, 0.99),
"dropout_2": uniform(0, 0.99),
"learning_rate": loguniform(1e-6, 1),
"weight_decay": loguniform(1e-8, 1),
}
The configuration space is a dictionary with key names corresponding to command line input parameters of our training script. For each parameter you would like to tune, you need to specify a Domain, imported from syne_tune.config_space. A domain consists of a type (float, int, categorical), a range (inclusive on both ends), and an encoding (linear or logarithmic). In our example, n_units_1, n_units_2, batch_size are int with linear encoding (randint), dropout_1, dropout_2 are float with linear encoding (uniform), and learning_rate, weight_decay are float with logarithmic encoding (loguniform). We also need to specify upper and lower bounds: n_units_1 lies between 4 and 1024, and the range includes both boundary values.
Choosing a good configuration space for a given problem may require some iterations. Parameters like learning rate or regularization constants are often log-encoded, as best values may vary over several orders of magnitude and may be close to 0. On the other hand, probabilities are linearly encoded. Search ranges need to be chosen wide enough not to discount potentially useful values up front, but setting them overly large risks a long tuning time.
In general, the range definitions are more critical for methods based on random exploration than for model-based HPO methods. On the other hand, for model-based HPO we should avoid encoding finite-sized numerical ranges as categorical, and instead use one of the more specialized types in Syne Tune. More details on choosing the configuration space are provided here, where you will also learn about more types: categorical, finite range, and ordinal.
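As a hedged sketch of such domains (choice is used in examples above; the names finrange and ordinal are assumptions here, so check the configuration space documentation for the exact signatures):
from syne_tune.config_space import choice, finrange, ordinal

config_space_extra = {
    # Categorical: no ordering is assumed between the values
    "activation": choice(["relu", "tanh", "sigmoid"]),
    # Finite range: a fixed number of evenly spaced numerical values
    "momentum": finrange(0.1, 0.9, size=5),
    # Ordinal: categorical values with a meaningful ordering
    "batch_size": ordinal([8, 16, 32, 64, 128]),
}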
Finally, you can also tune only a subset of the hyperparameters of your training script, providing fixed (default) values for the remaining ones. For example, the following configuration space fixes the model architecture:
from syne_tune.config_space import randint, uniform, loguniform
config_space = {
'n_units_1': 512,
'n_units_2': 128,
'batch_size': randint(8, 128),
'dropout_1': uniform(0, 0.99),
'dropout_2': uniform(0, 0.99),
'learning_rate': loguniform(1e-6, 1),
'weight_decay': loguniform(1e-8, 1),
}
Random Search
Grid and Random Search
With our tuning problem well-defined, what are basic methods to solve it? The most frequently used baselines are grid search and random search. Both of them pick a sequence of hyperparameter configurations, evaluate the objective for all of them, and return the configuration which attained the best metric value. This sequence is chosen independently of any metric values received in the process, a property which not only renders these baselines very simple to implement, but also makes them embarrassingly parallel.
Figure: Grid and Random Search (figure by Bergstra & Bengio).
For grid search, we place a grid on each hyperparameter range, which is
uniformly or log-uniformly spaced. The product of these grids determines the
sequence, which can be traversed in regular and random ordering. An obvious
drawback of grid search is that the size of this sequence is exponential in the
number of hyperparameters. Simple “nested loop” implementations are
particularly problematic: if they are stopped early, HPs in outer loops are
sampled much worse than those in inner loops. As seen in the figure above,
grid search is particularly inefficient if some HPs are more important for the
objective values than others. For all of these reasons, grid search is not a
recommended baseline for HPO, unless very few parameters have to be tuned.
Nevertheless, Syne Tune provides an implementation in GridSearcher.
In random search, the sequence of configurations is chosen by independent sampling. In the simple case of interest here, each value in a configuration is chosen by sampling independently from the hyperparameter domain. Recall our search space:
from syne_tune.config_space import randint, uniform, loguniform
# Configuration space (or search space)
config_space = {
"n_units_1": randint(4, 1024),
"n_units_2": randint(4, 1024),
"batch_size": randint(8, 128),
"dropout_1": uniform(0, 0.99),
"dropout_2": uniform(0, 0.99),
"learning_rate": loguniform(1e-6, 1),
"weight_decay": loguniform(1e-8, 1),
}
Here, n_units_1 is sampled uniformly from 4,...,1024, while learning_rate is sampled log-uniformly from [1e-6, 1] (i.e., it is exp(u), where u is sampled uniformly in [-6 log(10), 0]). As seen in the figure above, random search in general does better than grid search when some HPs are more important than others.
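The following small NumPy illustration (not Syne Tune code) makes these two sampling schemes concrete:
import numpy as np

rng = np.random.default_rng(0)

# Log-uniform sample from [1e-6, 1]: draw u uniformly in [log(1e-6), 0]
# (note log(1e-6) = -6 * log(10)) and return exp(u)
u = rng.uniform(low=np.log(1e-6), high=0.0, size=5)
learning_rate_samples = np.exp(u)

# Uniform integer sample from 4, ..., 1024 (both boundary values included)
n_units_samples = rng.integers(low=4, high=1024, endpoint=True, size=5)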
Launcher Script for Random Search
Here is the launcher script we will use throughout this tutorial in order to run HPO experiments.
import logging
from argparse import ArgumentParser
from pathlib import Path
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import (
RandomSearch,
BayesianOptimization,
ASHA,
MOBSTER,
)
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint, uniform, loguniform
# Configuration space (or search space)
config_space = {
"n_units_1": randint(4, 1024),
"n_units_2": randint(4, 1024),
"batch_size": randint(8, 128),
"dropout_1": uniform(0, 0.99),
"dropout_2": uniform(0, 0.99),
"learning_rate": loguniform(1e-6, 1),
"weight_decay": loguniform(1e-8, 1),
}
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
# [1]
parser = ArgumentParser()
parser.add_argument(
"--method",
type=str,
choices=(
"RS",
"BO",
"ASHA-STOP",
"ASHA-PROM",
"MOBSTER-STOP",
"MOBSTER-PROM",
),
default="RS",
)
parser.add_argument(
"--random_seed",
type=int,
default=31415927,
)
parser.add_argument(
"--n_workers",
type=int,
default=4,
)
parser.add_argument(
"--max_wallclock_time",
type=int,
default=3 * 3600,
)
parser.add_argument(
"--experiment_tag",
type=str,
default="basic-tutorial",
)
args, _ = parser.parse_known_args()
# Here, we specify the training script we want to tune
# - `mode` and `metric` must match what is reported in the training script
# - Metrics need to be reported after each epoch, `resource_attr` must match
# what is reported in the training script
if args.method in ("RS", "BO"):
train_file = "traincode_report_end.py"
elif args.method.endswith("STOP"):
train_file = "traincode_report_eachepoch.py"
else:
train_file = "traincode_report_withcheckpointing.py"
entry_point = Path(__file__).parent / train_file
max_resource_level = 81 # Maximum number of training epochs
mode = "max"
metric = "accuracy"
resource_attr = "epoch"
max_resource_attr = "epochs"
# Additional fixed parameters [2]
config_space.update(
{
max_resource_attr: max_resource_level,
"dataset_path": "./",
}
)
# Local backend: Responsible for scheduling trials [3]
# The local backend runs trials as sub-processes on a single instance
trial_backend = LocalBackend(entry_point=str(entry_point))
# Scheduler: Depends on `args.method` [4]
scheduler = None
# Common scheduler kwargs
method_kwargs = dict(
metric=metric,
mode=mode,
random_seed=args.random_seed,
max_resource_attr=max_resource_attr,
search_options={"num_init_random": args.n_workers + 2},
)
sch_type = "promotion" if args.method.endswith("PROM") else "stopping"
if args.method == "RS":
scheduler = RandomSearch(config_space, **method_kwargs)
elif args.method == "BO":
scheduler = BayesianOptimization(config_space, **method_kwargs)
else:
# Multi-fidelity method
method_kwargs["resource_attr"] = resource_attr
if args.method.startswith("ASHA"):
scheduler = ASHA(config_space, type=sch_type, **method_kwargs)
elif args.method.startswith("MOBSTER"):
scheduler = MOBSTER(config_space, type=sch_type, **method_kwargs)
else:
raise NotImplementedError(args.method)
# Stopping criterion: We stop after `args.max_wallclock_time` seconds
# [5]
stop_criterion = StoppingCriterion(max_wallclock_time=args.max_wallclock_time)
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=args.n_workers,
tuner_name=args.experiment_tag,
metadata={
"seed": args.random_seed,
"algorithm": args.method,
"tag": args.experiment_tag,
},
)
tuner.run()
Random search is obtained by calling this script with --method RS
.
Let us walk through the script, keeping this special case in mind:
[1] The script comes with command line arguments: method selects the HPO method (random search is given by RS), n_workers is the number of evaluations which can be done in parallel, max_wallclock_time is the duration of the experiment, and results are stored under the tag experiment_tag.
[2] Recall that apart from the 7 hyperparameters, our training script needs two additional parameters, which are fixed throughout the experiment. In particular, we need to specify the number of epochs to train for in epochs. We set this value to max_resource_level = 81. Here, "resource" is a more general concept than "epoch", but for most of this tutorial, they can be considered to be the same. We need to extend config_space by these two additional parameters.
[3] Next, we need to choose a backend, which specifies how Syne Tune should execute our training jobs (also called trials). The simplest choice is the local backend, which runs trials as sub-processes on a single instance.
[4] Most important, we need to choose a scheduler, which is how HPO algorithms are referred to in Syne Tune. A scheduler needs to suggest configurations for new trials, and also to make scheduling decisions about running trials. Most schedulers supported in Syne Tune can be imported from syne_tune.optimizer.baselines. In our example, we use RandomSearch, see also RandomSearcher. Schedulers need to know how the target metric is referred to in the report call of the training script (metric), and whether this criterion is to be minimized or maximized (mode). If its decisions are randomized, random_seed controls this random sampling.
[5] Finally, we need to specify a stopping criterion. In our example, we run random search for max_wallclock_time seconds, the default being 3 hours. StoppingCriterion can also use other attributes, such as max_num_trials_started or max_num_trials_completed. If several attributes are used, the experiment stops as soon as any one of them is met (logical OR); see the sketch below.
Everything comes together in the Tuner. Here, we can also specify n_workers, the number of workers. This is the maximum number of trials which are run concurrently. For the local backend, concurrent trials share the resources of a single instance (e.g., CPU cores or GPUs), so the effective number of workers is limited in this way. To ensure you really use n_workers workers, make sure to pick an instance type which caters for your needs (e.g., no less than n_workers GPUs or CPU cores), and also make sure your training script does not grab all the resources. Finally, tuner.run() starts the HPO experiment.
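As a small, hedged illustration of such a combined stopping criterion (the concrete limits are made up for this example):
from syne_tune import StoppingCriterion

# The experiment stops as soon as either limit is reached (logical OR):
# 3 hours of wall-clock time or 100 completed trials, whichever comes first
stop_criterion = StoppingCriterion(
    max_wallclock_time=3 * 3600,
    max_num_trials_completed=100,
)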
Results for Random Search
Results for Random Search
Here is how random search performs on our running example. The x axis is wall-clock time, the y axis best validation error attained until then. Such “tuning curves” are among the best ways to compare different HPO methods, as they display the most relevant information, without hiding overheads due to synchronization requirements or decision-making.
We ran random search with 4 workers (n_workers = 4
) for 3 hours. In fact,
we repeated the experiments 50 times with different random seeds. The solid
line shows the median, the dashed lines the 25 and 75 percentiles. An important
take-away message is that HPO performance can vary substantially when repeated
randomly, especially when the experiment is stopped rather early. When
comparing methods, it is therefore important to run enough random repeats and
use appropriate statistical techniques which acknowledge the inherent random
fluctuations.
Note
In order to learn more about how to launch long-running HPO experiments many times in parallel on SageMaker, please have a look at this tutorial.
Recommendations
One important parameter of
RandomSearcher
(and the
other schedulers we use in this tutorial) we did not use is
points_to_evaluate
, which allows specifying initial configurations to
suggest first. For example:
first_config = dict(
n_units_1=128,
n_units_2=128,
batch_size=64,
dropout_1=0.5,
dropout_2=0.5,
learning_rate=1e-3,
weight_decay=0.01,
)
scheduler = RandomSearch(
config_space,
metric=metric,
mode=mode,
random_seed=random_seed,
points_to_evaluate=[first_config],
)
Here, first_config
is the first configuration to be suggested, while
subsequent ones are drawn at random. If the model you would like to tune comes
with some recommended defaults, you should use them in points_to_evaluate
,
in order to give random search a head start. In fact, points_to_evaluate
can contain more than one initial configuration; these are then suggested in
the order given there.
Note
Configurations in points_to_evaluate
need not be completely specified.
If values are missing, they are imputed by a mid-point rule. In fact, the default
for points_to_evaluate
is [dict()]
, namely one configuration where
all values are selected by the mid-point rule. If you want to run pure
random search from the start (which is not recommended), you need to set
points_to_evaluate=[]
. Details are provided
here.
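As a small, hedged illustration (reusing config_space, metric, and mode from the launcher script above): only learning_rate is specified in the first configuration below; all other values would be imputed by the mid-point rule.
from syne_tune.optimizer.baselines import RandomSearch

scheduler = RandomSearch(
    config_space,
    metric=metric,
    mode=mode,
    # Partially specified initial configuration; missing values are imputed.
    # Use points_to_evaluate=[] to start with pure random sampling instead.
    points_to_evaluate=[{"learning_rate": 1e-3}],
)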
Bayesian Optimization
Sequential Model-Based Search
With limited parallel computing resources, experiments are sequential processes, where trials are started and report results in some ordering. This means that when deciding on which configuration to explore with any given trial, we can make use of all metric results reported by earlier trials, given that they already finished. In the simplest case, with a single worker, a new trial can start only once all earlier trials finished. We should be able to use this information in order to make better and better decisions as the experiment proceeds.
To make this precise, at any given time when a worker becomes available, we need to decide which configuration to evaluate with the new trial, based on (a) which decisions have been made for all earlier trials, and (b) metric values reported by those earlier trials which have already finished. With more than one worker, the trial set for (a) can be larger than for (b), since some trials may still be running: their results are pending. It is important to take pending trials into account, since otherwise we risk querying our objective at redundant configurations. The best way to take information (a) and (b) into account is by way of a statistical model, leading to sequential model-based decision-making.
What is the challenge for making good next configuration decisions? Say we have already evaluated the objective at a number of configurations, chosen at random. One idea is to refine the search nearby the configuration which resulted in the best metric value so far, thereby exploiting our knowledge. Even without gradients, such local search can be highly effective. On the other hand, it risks getting stuck in a local optimum. Another extreme is random search, where we explore the objective all over the search space. Choosing between these two extremes, at any given point in time, is known as explore-exploit trade-off, and is fundamental to sequential model-based search.
What is Bayesian Optimization?
One of the oldest and most widely used instantiations of sequential model-based search is Bayesian optimization. There are a number of great tutorials and review articles on Bayesian optimization, and we won't repeat them here.
Most instances of Bayesian optimization work by modelling the objective as a function \(f(\mathbf{x})\), where \(\mathbf{x}\) is a configuration from the search space. Given such a probabilistic surrogate model, we can condition it on the observed metric data (b) in order to obtain a posterior distribution. Finally, we use this posterior distribution along with additional statistics obtained from the data (such as the best metric value attained so far) in order to compute an acquisition function \(a(\mathbf{x})\), an (approximate) maximum of which will be our suggested configuration. While \(a(\mathbf{x})\) can itself be difficult to optimize globally, it is available in closed form and can typically be differentiated w.r.t. \(\mathbf{x}\). Moreover, it is important to understand that \(a(\mathbf{x})\) is not an approximation to \(f(\mathbf{x})\), but instead scores the expected value of sampling the objective at \(\mathbf{x}\), thereby embodying the explore-exploit trade-off. In particular, once some \(\mathbf{x}_*\) is chosen and included into the set (a), \(a(\mathbf{x}_*)\) is much diminished.
The Bayesian optimization template requires us to make two choices:
Surrogate model: By far the most common choice is to use Gaussian process surrogate models (the tutorials linked above explain the basics of Gaussian processes). A Gaussian process is parameterized by a mean and a covariance (or kernel) function. In Syne Tune, the default corresponds to what is most frequently used in practice: Matern 5/2 kernel with automatic relevance determination (ARD). A nice side effect of this choice is that the model can learn about the relative relevance of each hyperparameter as more metric data is obtained, which allows this form of Bayesian optimization to render the curse of dimensionality much less severe than it is for random search.
Acquisition function: The default choice in Syne Tune corresponds to the most popular choice in practice: expected improvement.
GP-based Bayesian optimization is run by our
launcher script
with the argument --method BO
. Many options can be specified via
search_options
, but we use the defaults here. See
GPFIFOSearcher
for all
details. In our example, we set num_init_random
to n_workers + 2
, which
is the number of initial decisions made by random search, before switching
over to maximizing the acquisition function.
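If you prefer to construct this scheduler directly rather than through the launcher script, a minimal sketch mirroring its settings could look as follows (it reuses the configuration space and metric of this tutorial):
from syne_tune.optimizer.baselines import BayesianOptimization

n_workers = 4
scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
    random_seed=31415927,
    # Number of initial configurations drawn at random before the
    # acquisition function is maximized
    search_options={"num_init_random": n_workers + 2},
)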
Results for Bayesian Optimization
Results for Bayesian Optimization
Here is how Bayesian optimization performs on our running example, compared to random search. We used the same conditions (4 workers, 3 hours experiment time, 50 random repetitions).
In this particular setup, Bayesian optimization does not outperform random search after 3 hours. This is a rather common pattern. Bayesian optimization requires a certain amount of data in order to learn enough about the objective function (in particular, about which parameters are most relevant) in order to outperform random search by targeted exploration and exploitation. If we continued to 4 or 5 hours, we would see a significant difference.
Recommendations
Here, we collect some additional recommendations. Further details are found here.
Categorical Hyperparameters
While our running example does not have any, hyperparameters of categorical type are often used. For example:
from syne_tune.config_space import lograndint, choice
config_space = {
'n_units_1': lograndint(4, 1024),
# ...
'activation': choice(['ReLU', 'LeakyReLU', 'Softplus']),
}
Here, activation
could determine the type of activation function.
It is important to understand that in Bayesian optimization, a
categorical parameter is encoded as a vector in the multi-dimensional
unit cube: the encoding dimension is equal to the number of different
values. This ensures there is no ordering information between the
different values; each pair has the same distance in the encoding
space.
This is usually not what you want with numerical values, whose
ordering provides important information to the search. For example,
it sounds simpler to search over the finite range
choice([4, 8, 16, 32, 64, 128, 256, 512, 1024])
than over the infinite
lograndint(4, 1024)
for n_units_1
, but the opposite is the
case. The former occupies 9 dimensions, the latter 1 dimension in
the encoded space, and ordering information is lost for the former.
A better alternative is logfinrange(4, 1024, 9)
.
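To make this concrete, here is a small sketch of three ways to express the same value range for n_units_1, with the resulting encoding noted in the comments:
from syne_tune.config_space import choice, lograndint, logfinrange

# 9 categories, one-hot encoded into 9 dimensions; ordering information is lost
as_choice = choice([4, 8, 16, 32, 64, 128, 256, 512, 1024])

# Infinite integer range, encoded in a single dimension on a log scale
as_infinite = lograndint(4, 1024)

# Finite range with the same 9 values, also encoded in a single dimension
as_finite = logfinrange(4, 1024, 9, cast_int=True)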
Syne Tune provides a range of finite numerical domains in order to
avoid suboptimal performance of Bayesian optimization due to the uncritical
use of choice
. Since this is somewhat subtle, and you may also want
to import configuration spaces from other HPO libraries which do not
have these types, Syne Tune provides an automatic conversion logic
with streamline_config_space()
. Details are given
here.
Note
When using Bayesian optimization or any other model-based HPO method,
we strongly recommend to use
streamline_config_space()
in order to ensure that
your domains are chosen in a way that works best with internal encoding.
Speeding up Decision-Making
Gaussian process surrogate models have many crucial advantages over other probabilistic surrogate models typically used in machine learning. But they have one key disadvantage: inference computations scale cubically in the number of observations. For most HPO use cases, this is not a problem, since no more than a few hundred evaluations can be afforded.
Syne Tune allows you to control the number of observations the GP surrogate model
is fit to, via max_size_data_for_model
in search_options
. If the data
is larger, it is downsampled to this size. Sampling is controlled by another
argument max_size_top_fraction
. Namely, this fraction of entries in the
downsampled set are filled by those points in the full set with the best metric
values, while the remaining entries are sampled (with replacement) from the
rest of the full set. The default for max_size_data_for_model
is
DEFAULT_MAX_SIZE_DATA_FOR_MODEL
.
The feature is switched off by setting this to None
or a very large value,
but this is not recommended. Subsampling is repeated every time the surrogate
model is fit.
Beyond, there are some search_options
arguments you can use in order to
speed up Bayesian optimization. The most expensive part of making a decision
consists in refitting the parameters of the GP surrogate model, such as the ARD
parameters of the kernel. While this refitting is essential for good performance
with a small number of observations, it can be thinned out or even stopped when
the dataset gets large. You can use opt_skip_init_length
,
opt_skip_period
to this end (details are here).
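A hedged sketch of how these options can be passed (the concrete values are illustrative only, not recommendations):
from syne_tune.optimizer.baselines import BayesianOptimization

search_options = {
    "max_size_data_for_model": 500,  # subsample data for the GP fit beyond this size
    "max_size_top_fraction": 0.25,   # fraction of kept points taken from the best ones
    "opt_skip_init_length": 400,     # thin out refitting of GP parameters ...
    "opt_skip_period": 10,           # ... once the dataset grows large
}
scheduler = BayesianOptimization(
    config_space, metric="accuracy", mode="max", search_options=search_options
)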
Warping of Inputs
If you use input_warping=True
in search_options
, inputs are warped
before being fed into the covariance function, the effective kernel becomes
\(k(w(x), w(x'))\), where \(w(x)\) is a warping transform with two
non-negative parameters per component. These parameters are learned along with
other parameters of the surrogate model. Input warping allows the surrogate
model to represent non-stationary functions, while still keeping the numbers
of parameters small. Note that only such components of \(x\) are warped
which belong to non-categorical hyperparameters.
Box-Cox Transformation of Target Values
This option is available only for positive target values. If you use
boxcox_transform=True
in search_options
, target values are transformed
before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox
transform with a parameter \(\lambda\), which is learned alongside other
parameters of the surrogate model. The transform is \(\log y\) for
\(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\).
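Both features are switched on via search_options; a hedged sketch (other arguments as in the earlier examples):
from syne_tune.optimizer.baselines import BayesianOptimization

scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
    search_options={
        "input_warping": True,     # warp encoded inputs before the kernel
        "boxcox_transform": True,  # requires positive target values
    },
)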
Both input warping and Box-Cox transform of target values are combined in this paper:
Cowen-Rivers, A. et al.: HEBO: Pushing the Limits of Sample-efficient Hyper-parameter Optimisation. Journal of Artificial Intelligence Research 74 (2022), 1269-1349
However, they fit \(\lambda\) up front by maximizing the likelihood of the targets under a univariate Gaussian assumption for the latent \(z\), while we learn \(\lambda\) jointly with all other parameters.
Asynchronous Successive Halving and Hyperband
Early Stopping
Learning Curves (image from Aaron Klein)
In the methods discussed above, we train each model for 81 epochs before scoring it. This is expensive; it can take up to 1.5 hours. In order to figure out whether a configuration is pretty poor, do we really need to train it all the way to the end?
At least for most neural network training problems, the validation error after training only a few epochs can be a surprisingly strong signal separating the best from the worst configurations (see figure above). Therefore, if a certain trial shows worse performance after (say) 3 epochs than many others, we may just as well stop it early, allowing the worker to pick up another, potentially more rewarding task.
Synchronous Successive Halving and Hyperband
Successive halving is a simple, yet powerful scheduling method based on the idea of early stopping. Applied to our running example, we would start 81 trials with different, randomly chosen configurations. Computing validation errors after 1 epoch, we stop the 54 (or 2/3) worst performing trials, allowing the 27 (or 1/3) best performing trials to continue. This procedure is repeated after 3, 9, and 27 epochs, each time the 2/3 worst performing trials are stopped. This way, only a single trial runs all the way to 81 epochs. Its configuration has survived stopping decisions after 1, 3, 9, and 27 epochs, so likely is worth its running time.
In practice, concurrent execution has to be mapped to a small number of workers, and successive halving is implemented by pausing trials at rung levels (i.e., after 1, 3, 9, 27 epochs), and then resuming the top 1/3 to continue training until the next rung level. Pause and resume scheduling is implemented by checkpointing. We will ignore these details for now, but come back to them later. Ignoring practical details of scheduling, and assuming that training time per epoch is the same for each trial, the idea behind successive halving is to spend the same amount of time on trials stopped after 1, 3, 9, and 27 epochs, while making sure that at each rung level, the 2/3 worst performers are eliminated.
Successive halving has two parameters: the reduction factor (3 in our example), and the grace period (1 in our example). For a reduction factor 2, rung levels would be 1, 2, 4, 8, 16, 32, 64, and we would eliminate the 1/2 worst performers at each of them. The larger the reduction factor, the fewer rung levels, and the more aggressive the filtering at each of them. The default value in Syne Tune is 3, which seems to work well for most neural network tuning problems. The grace period is the lowest rung level. Its choice is more delicate. If set too large, the potential advantage of early stopping is lost, since even the worst trials are trained for this many epochs. If set too small, the validation errors at the lowest rung level are determined more by the random initial weights than the training data, and stopping decisions there will be arbitrary.
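The bookkeeping behind these numbers is simple enough to spell out; this sketch just reproduces the arithmetic of the example above (grace period 1, reduction factor 3, 81 epochs maximum):
grace_period, reduction_factor, max_epochs = 1, 3, 81

# Rung levels: grace_period, grace_period * 3, ... (strictly below max_epochs)
rung_levels = []
level = grace_period
while level < max_epochs:
    rung_levels.append(level)
    level *= reduction_factor
print(rung_levels)  # [1, 3, 9, 27]

# Starting with 81 trials, only the top 1/3 survives each rung
num_trials = 81
for rung in rung_levels:
    print(f"{num_trials} trials train up to epoch {rung}")
    num_trials //= reduction_factor
print(f"{num_trials} trial trains for the full {max_epochs} epochs")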
Hyperband, a generalization of successive halving, eliminates the grace
period as free parameter. In our example above, rung levels were
[1, 3, 9, 27, 81]
, and the grace period was 1. Hyperband defines
brackets as sub-sequences starting at 1, 3, 9, 27, 81, of size 5, 4, 3, 2, 1
respectively. Then, successive halving is run on each of these brackets in
sequence, where the number of trials started for each bracket is adjusted in
a way that roughly equalizes the total number of epochs trained in each bracket.
While successive halving and Hyperband are widely known, they do not work all
that well for hyperparameter tuning of neural network models. The main reason
for this is their synchronous nature of decision-making. If we think of rungs
as lists of slots, which are filled by metric results of trials getting there,
each rung has an a priori fixed size. In our successive halving example, rungs
at r = 1, 3, 9, 27, 81
epochs have sizes 81 / r
. Each rung is a
synchronization point. Before any trial can be resumed towards level 3, all
81 trials have to complete their first epoch. The progress of well-performing
trials is delayed, not only because workers are idle due to some trials
finishing faster than others, but also because of sequential computations (we
rarely have 81 workers available). At the other extreme, filling the final
rung requires a single trial to train for 54 epochs, while all other workers
are idle. This can be compensated to some extent by free workers running
trials for the next iteration already, but scheduling becomes rather complex at
this point. Syne Tune provides synchronous Hyperband as
SynchronousHyperbandScheduler
.
However, we can usually do much better with asynchronous scheduling.
Asynchronous Successive Halving
An asynchronous scheduler needs to be free of synchronization points. Whenever a worker becomes available, the decision about what it should do next must be instantaneous, based on the data available at that point in time. It is not hard to come up with an asynchronous variant of successive halving. In fact, it can be done in several ways.
Returning to our example, we pre-define a system of rungs at 1, 3, 9, 27
epochs as before, and we record metric values of trials reaching each rung.
However, instead of having fixed sizes up front, each rung is a growing list.
Whenever a trial reaches a rung (by having trained as many epochs as the rung
specifies), its metric value is entered into the sorted list. We can now
compute a predicate continue
which is true iff the new value lies in the
top 1/3.
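As a rough sketch (ignoring edge cases and how Syne Tune actually stores rung data), the predicate could be computed like this for a metric that is maximized:
def continue_trial(recorded_values, new_value, reduction_factor=3):
    """Is new_value among the top 1/reduction_factor of values at this rung?"""
    values = sorted(recorded_values + [new_value], reverse=True)
    cutoff = max(1, len(values) // reduction_factor)
    return new_value >= values[cutoff - 1]

# A trial reaching the rung at 1 epoch with validation accuracy 0.83:
print(continue_trial([0.91, 0.85, 0.78, 0.75, 0.60], 0.83))  # False: not in the top 1/3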
There are two variants of asynchronous successive halving (ASHA), with
different requirements on the backend. In the stopping variant, a trial
reaching a rung level is stopped and discarded if continue = False
,
otherwise it is allowed to continue. If there is not enough data at a rung, the
trial continues by default. The backend needs to be able to stop jobs at
random times.
In the promotion variant, a trial reaching a rung level is always paused,
its worker is released. Once a worker becomes available, all rungs are scanned
top down. If any paused trial with continue = True
is found, it is resumed
to train until the next rung level (e.g., a trial resumed at rung 3 trains
until 9 epochs): the trial is promoted to the next rung. If no paused trial
can be promoted, a new one is started from scratch. This ASHA variant requires
pause and resume scheduling. In particular, a trial needs to checkpoint its
state (at least at rung levels), and these checkpoints need to be accessible
to all workers. On the other hand, the backend never needs to stop running
trials, as the stopping condition for each training job is determined up
front.
Scripts for Asynchronous Successive Halving
In this section, we will focus on the stopping variant of ASHA, leaving
the promotion variant for later. First, we need to modify our training
script. In order to support early stopping decisions, it needs to compute and
report validation errors during training. Recall
traincode_report_end.py
used with random search and Bayesian optimization. We will replace
objective
with the following code snippet, giving rise to
traincode_report_eachepoch.py
:
def objective(config):
# Download data
data_train = download_data(config)
# Report results to Syne Tune
report = Reporter()
# Split into training and validation set
train_loader, valid_loader = split_data(config, data_train)
# Create model and optimizer
state = model_and_optimizer(config)
# Training loop
for epoch in range(1, config["epochs"] + 1):
train_model(config, state, train_loader)
# Report validation accuracy to Syne Tune
accuracy = validate_model(config, state, valid_loader)
report(epoch=epoch, accuracy=accuracy)
Instead of computing and reporting the validation error only after
config['epochs']
epochs, we do this at the end of each epoch. To
distinguish different reports, we also include epoch=epoch
in each report.
Here, epoch is called a resource attribute. For Syne Tune's asynchronous
Hyperband and related schedulers, resource attributes must have positive
integer values, which you can think of as "resources spent". For neural network
training, the resource attribute is typically "epochs trained".
This is the only modification we need. Curious readers may wonder why we report validation accuracy after every epoch, while ASHA really only needs to know it at rung levels. Indeed, with some extra effort, we could rewrite the script to compute and report validation metrics only at rung levels, and ASHA would work just the same. However, for most setups, training for an epoch is substantially more expensive than computing the validation error at the end, and we can keep our script simple. Moreover, Syne Tune provides some advanced model-based extensions of ASHA scheduling, which make good use of metric data reported at the end of every epoch.
Our launcher script
runs stopping-based ASHA with the argument --method ASHA-STOP
. Note that
the entry point is traincode_report_eachepoch.py
in this case, and the
scheduler is ASHA
. Also, we need to pass the name of the resource attribute
in resource_attr
. Finally, type="stopping"
selects the stopping
variant. Further details about ASHA and relevant additional arguments (for
which we use defaults here) are found in
this tutorial.
When you run this script, you will note that many more trials are started than for random search, and that the majority of trials are stopped after 1 or 3 epochs.
Results for Asynchronous Successive Halving
Results for Asynchronous Successive Halving
Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). ASHA stopping makes a big difference, outperforming random search and Bayesian optimization substantially. Early stopping can speed up neural network tuning dramatically, compared to standard scheduling.
If we ran for much longer, Bayesian optimization would eventually catch up with ASHA and even do better. But of course, wall-clock time matters: it is an important, if not the most important metric for automated tuning. The faster satisfying results are obtained, the more manual iterations over data, model types, and high level features can be afforded.
Model-Based Asynchronous Successive Halving
Extrapolating Learning Curves
Learning Curves (image from Aaron Klein)
By modelling metric data from earlier trials, Bayesian optimization learns to suggest more useful configurations down the line than randomly sampled ones. Since new configurations are sampled at random in ASHA, a natural question is how to combine it with Bayesian decision-making.
It is not immediately clear how to do this, since the data we observe per trial are not single numbers, but learning curves (see figure above). In fact, the most useful single function to model would be the validation error after the final epoch (81 in our example), but the whole point of early stopping scheduling is to query this function only very rarely. By the nature of successive halving scheduling, we observe at any point in time a lot more data for few epochs than for many. Therefore, Bayesian decision-making needs to incorporate some form of learning curve extrapolation.
One way to do so is to build a joint probabilistic model of all the data. The validation metric reported at the end of epoch \(r\) for configuration \(\mathbf{x}\) is denoted as \(f(\mathbf{x}, r)\). In order to allow for extrapolation from small \(r\) to \(r_{max}\) (81 in our example), our model needs to capture dependencies along epochs. Moreover, it also has to represent dependencies between learning curves for different configurations, since otherwise we cannot use it to score the value of a new configuration we have not seen data from before.
MOBSTER
A simple method combining ASHA with Bayesian optimization is MOBSTER. It restricts Bayesian decision-making to proposing configurations for new trials, leaving scheduling decisions for existing trials (e.g., stopping, pausing, promoting) to ASHA. Recall from Bayesian Optimization that we need two ingredients: a surrogate model \(f(\mathbf{x}, r)\) and an acquisition function \(a(\mathbf{x})\):
Surrogate model: MOBSTER uses joint surrogate models of \(f(\mathbf{x}, r)\) which start from a Gaussian process model over \(\mathbf{x}\) and extend it to learning curves, such that the distribution over \(f(\mathbf{x}, r)\) remains jointly Gaussian. This is done in several different ways, which are detailed below.
Acquisition function: MOBSTER adopts an idea from BOHB, where it is argued that the function of interest is really \(f(\mathbf{x}, r_{max})\) (where \(r_{max}\) is the full number of epochs), so expected improvement for this function would be a reasonable choice. However, this requires at least a small number of observations at this level. To this end, we use expected improvement for the function \(f(\mathbf{x}, r_{acq})\), where \(r_{acq}\) is the largest resource level for which a certain (small) number of observations are available.
These choices conveniently reduce MOBSTER to a Bayesian optimization searcher of a form similar to the one used without early stopping. One important difference is of course that a lot more data is available now, which has scaling implications for the surrogate model. More details about MOBSTER, and further options not discussed here, are given in this tutorial.
Our launcher script
runs stopping-based MOBSTER with the argument --method MOBSTER-STOP
. At
least if defaults are chosen, this is much the same as for ASHA-STOP
.
However, we can configure the surrogate model with a range of options, which
are detailed here.
Results for MOBSTER
Results for MOBSTER
Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). MOBSTER performs comparably to ASHA on this example. As with Bayesian optimization versus random search, it would need more time in order to make a real difference.
Results on NASBench201 (ImageNet-16)
We repeated this comparison on a harder benchmark problem:
NASBench-201, on the ImageNet-16
dataset. Here, r_max = 200
, and rung levels are 1, 3, 9, 27, 81, 200
.
We used 8 workers and 8 hours experiment time, and once more report median
and 25/75 percentiles over 50 repeats. Now, after about 5 hours, MOBSTER
starts to break away from ASHA and performs significantly better.
Results on NASBench201 (ImageNet-16)
In order to understand why MOBSTER outperforms ASHA, we can visualize the learning curves of trials. In these plots, neighboring trials are assigned different colors, circles mark rung levels, and diamonds mark final rung levels reached.
Learning curves of trials: ASHA | MOBSTER
We can see that ASHA continues to suggest poor configurations at a constant rate. While these are stopped after 1 epoch, they still take up valuable resources. In contrast, MOBSTER quickly learns how to avoid the worst configurations and spends available resource more effectively.
Promotion-based Scheduling
Pause and Resume. Checkpointing of Trials
As we have seen, one way to implement early stopping scheduling is to make trials report metrics at certain points (rung levels), and to stop them when their performance falls behind other trials. This is conceptually simple. A trial maps to a single training run, and it is very easy to annotate training code in order to support automated tuning.
Another idea is pause and resume. Here, a trial may be paused at the end of an epoch, releasing its worker. Any paused trial may be resumed later on when a worker becomes available, which means that it continues training where it left off when paused. Synchronous schedulers need pause and resume, since trials reach a synchronization point at different times, and earlier ones have to wait for the slowest one. For asynchronous schedulers, pause and resume is an alternative to stopping trials, which can often work better. While a paused trial needs no resources, it can be resumed later on, so its past training time is not wasted.
However, pause and resume needs more support from the training script, which
has to make sure that a paused trial can be resumed later on, continuing
training as if nothing happened in between. To this end, the state of the
training job has to be checkpointed (i.e., stored into a file). The
training script
has to be modified once more, by replacing objective
with this code:
import argparse
import logging
from benchmarking.training_scripts.mlp_on_fashion_mnist.mlp_on_fashion_mnist import (
download_data,
split_data,
model_and_optimizer,
train_model,
validate_model,
)
from syne_tune import Reporter
from syne_tune.utils import (
resume_from_checkpointed_model,
checkpoint_model_at_rung_level,
add_checkpointing_to_argparse,
pytorch_load_save_functions,
)
def objective(config):
# Download data
data_train = download_data(config)
# Report results to Syne Tune
report = Reporter()
# Split into training and validation set
train_loader, valid_loader = split_data(config, data_train)
# Create model and optimizer
state = model_and_optimizer(config)
# Checkpointing
load_model_fn, save_model_fn = pytorch_load_save_functions(
{"model": state["model"], "optimizer": state["optimizer"]}
)
# Resume from checkpoint (optional) [2]
resume_from = resume_from_checkpointed_model(config, load_model_fn)
# Training loop
for epoch in range(resume_from + 1, config["epochs"] + 1):
train_model(config, state, train_loader)
# Write checkpoint (optional) [1]
checkpoint_model_at_rung_level(config, save_model_fn, epoch)
# Report validation accuracy to Syne Tune
accuracy = validate_model(config, state, valid_loader)
report(epoch=epoch, accuracy=accuracy)
if __name__ == "__main__":
root = logging.getLogger()
root.setLevel(logging.INFO)
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, required=True)
parser.add_argument("--dataset_path", type=str, required=True)
# Hyperparameters
parser.add_argument("--n_units_1", type=int, required=True)
parser.add_argument("--n_units_2", type=int, required=True)
parser.add_argument("--batch_size", type=int, required=True)
parser.add_argument("--dropout_1", type=float, required=True)
parser.add_argument("--dropout_2", type=float, required=True)
parser.add_argument("--learning_rate", type=float, required=True)
parser.add_argument("--weight_decay", type=float, required=True)
# [3]
add_checkpointing_to_argparse(parser)
args, _ = parser.parse_known_args()
# Evaluate objective and report results to Syne Tune
objective(config=vars(args))
Checkpointing requires you to implement the following:
[1] A checkpoint has to be written at the end of each epoch. The precise content of the checkpoint depends on the training script, but it has to contain the epoch at which it was written. It is recommended to write the checkpoint before reporting metrics to Syne Tune, since otherwise the writing of the checkpoint may be jeopardized by Syne Tune trying to stop the script.
[2] A checkpoint is to be loaded just before the start of the training loop. If the checkpoint file is present and the state can be restored, the training loop starts with epoch resume_from + 1, where the checkpoint was written at the end of epoch resume_from. Otherwise, resume_from = 0, and the training loop starts from scratch.
[3] Checkpointing requires additional input arguments. You can add them by hand or use add_checkpointing_to_argparse. The most important one is the local directory name where the checkpoint should be written to or loaded from. A checkpoint may consist of several files. If this argument is not passed to the script, checkpointing is deactivated.
Syne Tune provides some helper functions for checkpointing, see FAQ.
checkpoint_model_at_rung_level(config, save_model_fn, epoch) stores a checkpoint at the end of epoch epoch. The main work is done by save_model_fn.
resume_from = resume_from_checkpointed_model(config, load_model_fn) loads a checkpoint and returns its epoch if successful; otherwise, 0 is returned. Again, load_model_fn does the main work.
pytorch_load_save_functions: If you use PyTorch, this provides save_model_fn and load_model_fn implementations that should work for you. In state_dict_objects, you pass a dict of PyTorch objects with a mutable state (look for load_state_dict and state_dict methods). Make sure to include all relevant objects (model, optimizer, learning rate scheduler). Optionally, mutable_state contains additional elementary variables.
Note that while checkpoints are written at the end of each epoch, the most recent one overwrites previous ones. In fact, for the purpose of pause and resume, checkpoints have to be written only at rung levels, because trials can only be paused there. Selective checkpointing could be supported by passing the rung levels to the training script, but this is currently not done in Syne Tune.
Our launcher script
runs promotion-based ASHA with the argument --method ASHA-PROM
, and
promotion-based MOBSTER with --method MOBSTER-PROM
:
Recall that the argument max_resource_attr for HyperbandScheduler allows the scheduler to infer the maximum resource level r_max. For promotion-based scheduling, this argument has a second function. Namely, it allows the scheduler to inform the training script until which epoch it has to train, so it does not have to be stopped anymore from the outside. For example, say that a trial paused at r=3 is promoted to run until the next rung level r=9. The scheduler calls the training script with config[max_resource_attr] = 9 (instead of 81). It is resumed from its r=3 checkpoint, runs epochs 4, 5, 6, 7, 8, 9, and then terminates by itself. If max_resource_attr is not used, training scripts are started to run until the end, and they need to be stopped by the backend. Depending on the backend, there can be a delay between a stopping signal being sent and a worker becoming available again, which is avoided if max_resource_attr is used. Moreover, future backends may be able to use the information on how long a resumed trial needs to run until paused for improved scheduling.
Syne Tune allows promotion-based schedulers to be used with training scripts which do not implement checkpointing. Our launcher script would just as well work with traincode_report_eachepoch.py. In this case, a trial to be resumed is started from scratch, and metric reports up to the resume epoch are ignored. For example, say a trial paused at r=3 is resumed. If the training script does not implement checkpointing, it will start from scratch and report for r = 1, 2, 3, 4, .... The scheduler discards the first 3 reports in this case. However, it is strongly recommended to implement checkpointing if promotion-based scheduling is to be used.
Early Removal of Checkpoints
By default, the checkpoints written by all trials are retained on disk (for a trial, later checkpoints overwrite earlier ones). When checkpoints are large and the local backend is used, this may result in a lot of disk space getting occupied, or even the disk filling up. Syne Tune supports checkpoints being removed once they are not needed anymore, or even speculatively, as is detailed here.
Results for promotion-based ASHA and MOBSTER
Results for promotion-based ASHA and MOBSTER
Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). These results are rather similar to what we obtained for stopping-based scheduling, except the random variations are somewhat larger for ASHA stopping than for ASHA promotion.
It is not a priori clear when stopping or promotion-based scheduling will work
better. When it comes to the backend, promotion-based scheduling needs
checkpointing, and the backend needs to efficiently handle the transfer of
checkpoints between workers. On the other hand, promotion-based scheduling does
not require the backend to stop jobs (see max_resource_attr
discussion
above), which can be subject to delays in some backends. Run with the local
backend, where delays play no role, stopping and promotion-based scheduling
can behave quite differently. In our experiments, we have often observed that
stopping can be more efficient at the beginning, while promotion has an edge
during later stages.
Our recommendation is to implement checkpointing in your training script, which gives you access to all Syne Tune schedulers, and then to gain some experience with what works best for your problem at hand.
SageMaker Backend
Limitations of the Local Backend
We have been using the local backend LocalBackend
in this tutorial so far. Due to its simplicity and very low overheads for
starting, stopping, or resuming trials, this is the preferred choice for
getting started. But with models and datasets getting larger, some
disadvantages become apparent:
All concurrent training jobs (as well as the tuning algorithm itself) are run as sub-processes on the same instance. This limits the number of workers by what the instance type offers. You can set n_workers to any value you like, but what you really get depends on available resources. If you want 4 GPU workers, your instance type needs to have at least 4 GPUs, and each training job can use only one of them.
It is hard to encapsulate dependencies of your training code. You need to specify them explicitly, and they need to be compatible with the Syne Tune dependencies. You cannot use Docker images.
You may be used to working with SageMaker frameworks, or even specialized setups such as distributed training. In such cases, it is hard to get tuning to work with the local backend.
Launcher Script for SageMaker Backend
Syne Tune offers the SageMaker backend
SageMakerBackend
as an alternative to the local one.
Using it requires some preparation, as is detailed
here.
Recall our
launcher script.
In order to use the SageMaker backend, we need to create trial_backend
differently:
trial_backend = SageMakerBackend(
# we tune a PyTorch Framework from Sagemaker
sm_estimator=PyTorch(
entry_point=entry_point.name,
source_dir=str(entry_point.parent),
instance_type="ml.c5.4xlarge",
instance_count=1,
role=get_execution_role(),
dependencies=[str(repository_root_path() / "benchmarking")],
max_run=int(1.05 * args.max_wallclock_time),
framework_version="1.7.1",
py_version="py3",
disable_profiler=True,
debugger_hook_config=False,
sagemaker_session=default_sagemaker_session(),
),
metrics_names=[metric],
)
In essence, the SageMakerBackend
is parameterized
with a SageMaker estimator, which executes the training script. In our example,
we use the PyTorch
SageMaker framework as a pre-built container for the
dependencies our training script requires. However, any other type of
SageMaker estimator
can be used here just as well. Finally, if you include any of the metrics reported
by your training script in metrics_names
, their values are visualized in the
dashboard for the SageMaker training job.
If your training script requires additional dependencies not contained in the
chosen SageMaker framework, you can specify those in a requirements.txt
file in the same directory as your training script (i.e., in the source_dir
of the SageMaker estimator). In our example, this file needs to contain the
filelock
dependency.
Note
This simple example avoids complications about writing results to S3 in a unified manner, or using special features of SageMaker which can speed up tuning substantially. For more information about the SageMaker backend, please consider this tutorial.
Outlook
Further Topics
We are at the end of this basic tutorial. There are many further topics we did not touch here. Some are established, but not basic, while others are still experimental. Here is an incomplete overview:
Running many experiments in parallel: We have stressed the importance of running repetitions of experiments, as results carry quite some stochastic variation. Also, there are higher-level decisions best done by trial-and-error, which can be seen as “outer loop random search”. Syne Tune offers facilities to launch many tuning experiments in parallel, as SageMaker training jobs. More details are found in this tutorial, see also the FAQ.
Multi-fidelity Schedulers: Syne Tune provides many more multi-fidelity schedulers than ASHA and MOBSTER. An overview and categorization of supported methods is given in this tutorial.
Population-based Training: This is a popular scheduler for tuning reinforcement learning, where optimization hyperparameters like the learning rate can be changed at certain points during training. An example is at examples/launch_pbt.py, see also PopulationBasedTraining. Note that checkpointing is mandatory for PBT.
Constrained HPO: In many applications, more than a single metric plays a role. With constrained HPO, you can maximize recall subject to a constraint on precision; minimize prediction latency subject to a constraint on accuracy; or maximize accuracy subject to a constraint on a fairness metric. Constrained HPO is a special case of Bayesian optimization, where searcher='bayesopt_constrained', and the name of the constraint metric (the constraint is feasible iff this metric is non-positive) must be given as constraint_attr in search_options. More details on constrained HPO and the methodology adopted in Syne Tune can be found here, see also ConstrainedGPFIFOSearcher.
Multi-objective HPO: Another way to approach tuning problems with multiple metrics is to sample the Pareto frontier, i.e., to identify configurations whose performance along one metric cannot be improved without degrading performance along another. Syne Tune provides a range of methodology in this direction. An example is at examples/launch_height_moasha.py. More details on multi-objective HPO and the methodology adopted in Syne Tune can be found here, see also MOASHA.
Transfer-learning Schedulers: Syne Tune provides several transfer-learning schedulers. To get started, check out this tutorial.
How to Choose a Configuration Space
One important step in applying hyperparameter optimization to your tuning
problem is to define a configuration space (or search space). Doing this
optimally for any given problem is more of an art than a science, but in this
tutorial you will learn about the basics and some gotchas. Syne Tune also
provides some logic in streamline_config_space()
to
automatically transform domains into forms more suitable for Bayesian
optimization; this is explained here as well.
Introduction
Here is an example for a configuration space:
from syne_tune.config_space import (
lograndint, uniform, loguniform, choice,
)
config_space = {
'n_units': lograndint(4, 1024),
'dropout': uniform(0, 0.9),
'learning_rate': loguniform(1e-6, 1),
'activation': choice(['relu', 'tanh']),
'epochs': 128,
}
Not all entries in config_space
need to be hyperparameters. For example,
epochs
is simply a constant passed to the training function. For every
hyperparameter, a domain has to be specified. The domain determines the value
range of the parameter and its internal encoding.
Each hyperparameter is independent of the other entries in config_space
. In
particular, the domain of a hyperparameter cannot depend on the value of
another. In fact, common actions involving a configuration space, such as
sampling, encoding, or decoding a configuration are done independently on its
hyperparameters.
Domains
A domain not only defines the value range of a parameter, but also its internal
encoding. The latter is important in order to define what uniform sampling
means, a basic component of many HPO algorithms. The following domains are
currently supported (for full details, see syne_tune.config_space
):
uniform(lower, upper): Real-valued uniform in [lower, upper].
loguniform(lower, upper): Real-valued log-uniform in [lower, upper]. More precisely, the value is exp(x), where x is drawn uniformly in [log(lower), log(upper)].
randint(lower, upper): Integer uniform in lower, ..., upper. The value range includes both lower and upper (unlike the Python range convention, where upper would not be included).
lograndint(lower, upper): Integer log-uniform in lower, ..., upper. More precisely, the value is int(round(exp(x))), where x is drawn uniformly in [log(lower - 0.5), log(upper + 0.5)].
choice(categories): Uniform from the finite list categories of values. Entries in categories should ideally be of type str, but types int and float are also allowed (the latter can lead to errors due to round-off).
ordinal(categories, kind): Variant of choice for which the order of entries in categories matters. For methods like Bayesian optimization, nearby elements in the list have closer encodings. Compared to choice, the encoding consists of a single number here. Different variants are implemented. If kind="equal" (general default), we use randint(0, len(categories) - 1) internally on the positions in categories, so that each category is chosen with the same probability. If kind="nn" (default if categories is strictly increasing and of type int or float), categories must contain strictly increasing int or float values; internally, we use uniform for an interval containing all values, and a real value is mapped to a category by nearest neighbor. If kind="nn-log", this is done in log space. logordinal(categories) is a synonym for ordinal(categories, kind="nn-log"). The latter two kinds are finite set versions of uniform and loguniform; the different categories are (in general) not chosen with equal probabilities.
finrange(lower, upper, size): Can be used as a finite analogue of uniform. Uniform from the finite range lower, ..., upper of size size, where entries are equally spaced. For example, finrange(0.5, 1.5, 3) means 0.5, 1.0, 1.5, and finrange(0.1, 1.0, 10) means 0.1, 0.2, ..., 1.0. We require that size >= 2. Note that both lower and upper are part of the value range.
logfinrange(lower, upper, size): Can be used as a finite analogue of loguniform. Values are exp(x), where x is drawn uniformly from the finite range log(lower), ..., log(upper) of size size (entries equally spaced). Note that both lower and upper are part of the value range.
By default, the value type for finrange
and logfinrange
is float
.
It can be changed to int
by the argument cast_int=True
. For example,
logfinrange(8, 256, 6, cast_int=True)
results in 8, 16, 32, 64, 128,
256
and value type int
, while logfinrange(8, 256, 6)
results in
8.0, 16.0, 32.0, 64.0, 128.0, 256.0
and value type float
.
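Putting a few of these finite domains together (the parameter names are just examples):
from syne_tune.config_space import finrange, logfinrange, logordinal

config_space = {
    # 8, 16, 32, 64, 128, 256 with value type int
    "batch_size": logfinrange(8, 256, 6, cast_int=True),
    # 0.5, 1.0, 1.5 with value type float
    "width_multiplier": finrange(0.5, 1.5, 3),
    # Irregularly spaced values, encoded in a single dimension on a log scale
    "num_blocks": logordinal([1, 2, 5, 10, 15, 30]),
}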
Recommendations
How to choose the domain for a given hyperparameter? Obviously, we want to
avoid illegal values: learning rates should be positive, probabilities lie
in [0, 1]
, and numbers of units must be integers. Apart from this, the
choice of domain is not always obvious, and different choices can affect
search performance significantly in some cases.
With streamline_config_space()
, Syne Tune provides some
logic which transforms domains into others more suitable for Bayesian
optimization. For example:
from syne_tune.config_space import randint, uniform, choice
from syne_tune.utils import streamline_config_space
config_space = {
'n_units': randint(4, 1024),
'dropout': uniform(0, 0.9),
'learning_rate': uniform(1e-6, 1),
    'weight_decay': choice([0.001, 0.01, 0.1, 1.0]),
'magic_constant': choice([1, 2, 5, 10, 15, 30]),
}
new_config_space = streamline_config_space(config_space)
# Results in:
# new_config_space = {
# 'n_units': lograndint(4, 1024),
# 'dropout': uniform(0, 0.9),
# 'learning_rate': loguniform(1e-6, 1),
#     'weight_decay': logfinrange(0.001, 1.0, 4),
# 'magic_constant': logordinal([1, 2, 5, 10, 15, 30]),
# }
Here, new_config_space
results in the same set of configurations, but the
internal encoding is more suitable for many of the model-based HPO methods in
Syne Tune. Why?
Avoid using choice (categorical) for numerical parameters. Many HPO algorithms make very good use of the information that a parameter is numerical and therefore has a linear ordering. They cannot do that if you do not tell them, and search performance will normally suffer. A good example is Bayesian optimization. Numerical parameters are encoded as themselves (the int domain is relaxed to the corresponding float interval), allowing the surrogate model (e.g., Gaussian process covariance kernel) to exploit ordering and distance in these numerical spaces. On the other hand, a categorical parameter with 10 different values is one-hot encoded to 10(!) dimensions in [0, 1]. Now, all pairs of distinct values have exactly the same distance in this embedding, so that any ordering or distance information is lost. Bayesian optimization does not perform well in general in high-dimensional embedding spaces. It is for this reason that streamline_config_space() converts the domains of weight_decay and magic_constant from choice to logfinrange and logordinal respectively.
Use infinite ranges. No competitive HPO algorithm ever enumerates all possible configurations and iterates over all of them. There is almost certainly no gain in restricting a learning rate to 5 values you just picked out of your hat, instead of just using the loguniform domain. However, there is a lot to be lost. First, if you use choice, Bayesian optimization may perform poorly. Second, you may just be wrong with your initial choice and have to do time-consuming extra steps of refinement.
For finite numerical domains, use finrange or logfinrange. If you insist on a finite range for a numerical parameter (in some cases, this may be the better choice), make use of finrange or logfinrange instead of choice, as alternatives to uniform and loguniform respectively. If your value spacing is not regular, you can use ordinal or logordinal. For example, choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]) can be replaced by logordinal([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]), which is what streamline_config_space() would do.
Use a log transform for parameters which may vary over several orders of magnitude. Examples are learning rates or regularization constants. In the example above, streamline_config_space() converts n_units from randint(4, 1024) to lograndint(4, 1024) and learning_rate from uniform(1e-6, 1) to loguniform(1e-6, 1).
Use points_to_evaluate. On top of refining your configuration space, we strongly recommend to specify initial default configurations via points_to_evaluate.
As a user, you can memorize all of this, or you can use streamline_config_space() and just do the following:
Use uniform for float values and randint for int values, and leave the decision about log scaling to the logic.
Use choice for each finite domain; just make sure that all entries have the same type (str, int, or float). streamline_config_space() will transform your choice into finrange, logfinrange, ordinal, or logordinal for value types float or int.
You should also use streamline_config_space()
when
importing configuration spaces from other HPO libraries, which may not support
the finite numerical domains Syne Tune has.
Note
The conversion of choice
to finrange
or logfinrange
in
streamline_config_space()
can be approximate. While
the list has the same size, some entries may be changed. For example,
choice([1, 2, 5, 10, 20, 50])
is replaced by logfinrange
with
values 1, 2, 5, 10, 22, 48
. If this is a problem for certain domains, use
the exclude_names
argument.
Finally, here is what streamline_config_space() is doing:
For a domain uniform(lower, upper) or randint(lower, upper): If lower > 0 and upper >= lower * 100, replace the domain by loguniform(lower, upper) or lograndint(lower, upper).
For a domain choice(categories), where all entries in categories are of type int or float: This domain is replaced by finrange, logfinrange, ordinal, or logordinal (with the same value type), depending on best fit. Namely, categories is sorted to \(x_0 < \dots < x_{n-1}\), and a linear function \(a * j + b, j = 0, \dots, n-1\) is fit to \([x_j]\), and to \([\log x_j]\) if \(x_0 > 0\). The quality of the fit is scored by \(R^2\); it determines logarithmic or linear encoding, and also the choice between finrange and ordinal. For ordinal, we always use kind="nn".
In order to exclude certain hyperparameters from replacements, pass their names in the exclude_names argument of streamline_config_space(); a short sketch follows.
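As a short sketch of excluding a hyperparameter from conversion (building on the hypothetical config_space from the earlier example):

# Convert all domains except "weight_decay", which is kept exactly as specified
new_config_space = streamline_config_space(
    config_space, exclude_names=["weight_decay"]
)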
Using the Built-in Schedulers
In this tutorial, you will learn how to use and configure the most important built-in HPO algorithms. Alternatively, you can also use most algorithms from Ray Tune.
This tutorial provides a walkthrough of some of the topics addressed here.
Schedulers and Searchers
The decision-making algorithms driving an HPO experiment are referred to as schedulers. As in Ray Tune, some of our schedulers are internally configured by a searcher. A scheduler interacts with the backend, making decisions on which configuration to evaluate next, and whether to stop, pause, or resume existing trials. It relays “next configuration” decisions to the searcher. Some searchers maintain a surrogate model which is fitted to metric data coming from evaluations.
Note
There are two ways to create many of the schedulers of Syne Tune:
Import wrapper class from syne_tune.optimizer.baselines, for example RandomSearch for random search
Use template classes FIFOScheduler or HyperbandScheduler together with the searcher argument, for example FIFOScheduler with searcher="random" for random search
Importing from syne_tune.optimizer.baselines
is often simpler. However,
in this tutorial, we will use the template classes in order to expose the
common structure and to explain arguments only once.
FIFOScheduler
This is the simplest kind of scheduler. It cannot stop or pause trials; each evaluation proceeds to the end. Depending on the searcher, this scheduler supports:
Random search [searcher="random"]
Bayesian optimization with Gaussian processes [searcher="bayesopt"]
Grid search [searcher="grid"]
TPE with kernel density estimators [searcher="kde"]
Constrained Bayesian optimization [searcher="bayesopt_constrained"]
Cost-aware Bayesian optimization [searcher="bayesopt_cost"]
Bore [searcher="bore"]
We will only consider the first two searchers in this tutorial. Here is a
launcher script using FIFOScheduler
:
import logging
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune import Tuner, StoppingCriterion
from benchmarking.benchmark_definitions import \
mlp_fashionmnist_benchmark
if __name__ == '__main__':
logging.getLogger().setLevel(logging.DEBUG)
n_workers = 4
max_wallclock_time = 120
# We pick the MLP on FashionMNIST benchmark
# The 'benchmark' object contains arguments needed by scheduler and
# searcher (e.g., 'mode', 'metric'), along with suggested default values
# for other arguments (which you are free to override)
benchmark = mlp_fashionmnist_benchmark()
config_space = benchmark.config_space
backend = LocalBackend(entry_point=benchmark.script)
# GP-based Bayesian optimization searcher. Many options can be specified
# via `search_options`, but let's use the defaults
searcher = "bayesopt"
search_options = {'num_init_random': n_workers + 2}
scheduler = FIFOScheduler(
config_space,
searcher=searcher,
search_options=search_options,
mode=benchmark.mode,
metric=benchmark.metric,
)
tuner = Tuner(
trial_backend=backend,
scheduler=scheduler,
stop_criterion=StoppingCriterion(
max_wallclock_time=max_wallclock_time
),
n_workers=n_workers,
)
tuner.run()
What happens in this launcher script?
We select the mlp_fashionmnist benchmark, adopting its default hyperparameter search space without modifications.
We select the local backend, which runs up to n_workers = 4 processes in parallel on the same instance.
We create a FIFOScheduler with searcher = "bayesopt". This means that new configurations to be evaluated are selected by Bayesian optimization, and all trials are run to the end. The scheduler needs to know the config_space, the name of the metric to tune (metric), and whether to minimize or maximize this metric (mode). For mlp_fashionmnist, we have metric = "accuracy" and mode = "max", so we select a configuration which maximizes accuracy.
Options for the searcher can be passed via search_options. We use defaults, except for changing num_init_random (see below) to the number of workers plus two.
Finally, we create the tuner, passing trial_backend, scheduler, as well as the stopping criterion for the experiment (stop after 120 seconds) and the number of workers. The experiment is started by tuner.run().
FIFOScheduler
provides the full range
of arguments. Here, we list the most important ones:
config_space: Hyperparameter search space. This argument is mandatory. Apart from hyperparameters to be searched over, the space may contain fixed parameters (such as epochs in the example above). A config passed to the training script is always extended by these fixed parameters. If you use a benchmark, you can use benchmark["config_space"] here, or you can modify this default search space.
searcher: Selects the searcher to be used (see below).
search_options: Options to configure the searcher (see below).
metric, mode: Name of the metric to tune (i.e., the key used in the report call by the training script), which is either to be minimized (mode="min") or maximized (mode="max"). If you use a benchmark, just use benchmark["metric"] and benchmark["mode"] here.
points_to_evaluate: Allows to specify a list of configurations which are evaluated first (see the sketch after this list). If your training code corresponds to some open source ML algorithm, you may want to use the defaults provided in the code. The entry (or entries) in points_to_evaluate do not have to specify values for all hyperparameters. For any hyperparameter not listed there, the following rule is used to choose a default. For float and int value types, the mid-point of the search range is used (in linear or log scaling). For categorical value types, the first entry in the value set is used. The default is a single config with all values chosen by the default rule. Pass an empty list in order to not specify any initial configs.
random_seed: Master random seed. Random sampling in schedulers and searchers is done by a number of numpy.random.RandomState generators, whose seeds are derived from random_seed. If not given, a random seed is sampled and printed in the log.
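As a sketch of how an initial default configuration might be passed (the hyperparameter names here are hypothetical; use the ones from your own config_space or benchmark):

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    metric="accuracy",
    mode="max",
    # Hypothetical partial config; hyperparameters not listed here are filled
    # in by the mid-point / first-category rule described above
    points_to_evaluate=[{"n_units": 128, "learning_rate": 1e-3}],
)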
Random Search
The simplest HPO baseline is random search, which you obtain with
searcher="random"
, or by using
RandomSearch
instead of
FIFOScheduler
. Search decisions are not based on past data; a new
configuration is chosen by sampling attribute values at random, from
distributions specified in config_space
. These distributions are detailed
here.
If points_to_evaluate
is specified, configurations are first taken from
this list before any are drawn at random. Options for configuring the searcher
are given in search_options
. These are:
debug_log: If True, a useful log output about the search progress is printed.
allow_duplicates: If True, the same configuration may be suggested more than once. The default is False, meaning that sampling is without replacement. A short sketch using these options follows this list.
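For illustration, a random searcher with these options could be configured as follows (config_space, metric, and mode as in the launcher script above):

searcher = "random"
search_options = {"debug_log": True, "allow_duplicates": False}
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,
    search_options=search_options,
    metric="accuracy",
    mode="max",
)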
Bayesian Optimization
Bayesian optimization is obtained by searcher='bayesopt'
, or by using
BayesianOptimization
instead of
FIFOScheduler
. More information about Bayesian optimization is provided
here.
Options for configuring the searcher are given in search_options
. These
include options for the random searcher.
GPFIFOSearcher
provides the
full range of arguments. We list the most important ones:
num_init_random: Number of initial configurations chosen at random (or via points_to_evaluate). In fact, the number of initial configurations is the maximum of this and the length of points_to_evaluate. Afterwards, configurations are chosen by Bayesian optimization (BO). In general, BO is only used once at least one metric value from past trials is available. We recommend setting this value to the number of workers plus two.
opt_nstarts, opt_maxiter: BO employs a Gaussian process surrogate model, whose own hyperparameters (e.g., kernel parameters, noise variance) are chosen by empirical Bayes optimization. In general, this is done whenever new data becomes available. It is the most expensive computation in each round. opt_maxiter is the maximum number of L-BFGS iterations. We run opt_nstarts such optimizations from random starting points and pick the best.
max_size_data_for_model, max_size_top_fraction: GP computations scale cubically with the number of observations, and decision-making can become very slow for too many trials. Whenever there are more than max_size_data_for_model observations, the dataset is downsampled to this size. Here, max_size_data_for_model * max_size_top_fraction of the entries correspond to the cases with the best metric values, while the remaining entries are drawn at random (without replacement) from all other cases. Defaults to DEFAULT_MAX_SIZE_DATA_FOR_MODEL.
opt_skip_init_length, opt_skip_period: Refitting the GP hyperparameters in each round can become expensive, especially when the number of observations grows large. If so, you can choose to do it only every opt_skip_period rounds. Skipping optimizations is done only once the number of observations is above opt_skip_init_length.
gp_base_kernel: Selects the covariance (or kernel) function to be used in the surrogate model. Current choices are "matern52-ard" (Matern 5/2 with automatic relevance determination; the default) and "matern52-noard" (Matern 5/2 without ARD).
acq_function: Selects the acquisition function to be used. Current choices are "ei" (negative expected improvement; the default) and "lcb" (lower confidence bound). The latter has the form \(\mu(x) - \kappa \sigma(x)\), where \(\mu(x)\), \(\sigma(x)\) are predictive mean and standard deviation, and \(\kappa > 0\) is a parameter, which can be passed via acq_function_kwargs={"kappa": 0.5} for \(\kappa = 0.5\) (see the sketch after this list).
input_warping: If this is True, inputs are warped before being fed into the covariance function; the effective kernel becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform with two non-negative parameters per component. These parameters are learned along with other parameters of the surrogate model. Input warping allows the surrogate model to represent non-stationary functions, while still keeping the number of parameters small. Note that only those components of \(x\) which belong to non-categorical hyperparameters are warped.
boxcox_transform: If this is True, target values are transformed before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive.
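For instance, a few of these options could be combined as follows; the particular values are illustrative, not recommendations:

search_options = {
    "num_init_random": n_workers + 2,
    "acq_function": "lcb",
    "acq_function_kwargs": {"kappa": 0.5},
    "input_warping": True,
}
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options=search_options,
    metric="accuracy",
    mode="max",
)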
HyperbandScheduler
This scheduler comes in at least two different variants: one may stop trials
early (type="stopping"
), the other may pause trials and resume them later
(type="promotion"
). For tuning neural network models, it tends to work
much better than FIFOScheduler
. You may have read about successive halving
and Hyperband before. Chances are you read about synchronous scheduling of
parallel evaluations, while both HyperbandScheduler
and FIFOScheduler
implement asynchronous scheduling, which can be substantially more
efficient. This tutorial provides
details about synchronous and asynchronous variants of successive halving and
Hyperband.
Here is a launcher script using
HyperbandScheduler
:
import logging
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune import Tuner, StoppingCriterion
from benchmarking.benchmark_definitions import \
mlp_fashionmnist_benchmark
if __name__ == '__main__':
logging.getLogger().setLevel(logging.DEBUG)
n_workers = 4
max_wallclock_time = 120
# We pick the MLP on FashionMNIST benchmark
# The 'benchmark' object contains arguments needed by scheduler and
# searcher (e.g., 'mode', 'metric'), along with suggested default values
# for other arguments (which you are free to override)
benchmark = mlp_fashionmnist_benchmark()
config_space = benchmark.config_space
backend = LocalBackend(entry_point=benchmark.script)
# MOBSTER: Combination of asynchronous successive halving with
# GP-based Bayesian optimization
searcher = 'bayesopt'
search_options = {'num_init_random': n_workers + 2}
scheduler = HyperbandScheduler(
config_space,
searcher=searcher,
search_options=search_options,
type="stopping",
max_resource_attr=benchmark.max_resource_attr,
resource_attr=benchmark.resource_attr,
mode=benchmark.mode,
metric=benchmark.metric,
grace_period=1,
reduction_factor=3,
)
tuner = Tuner(
trial_backend=backend,
scheduler=scheduler,
stop_criterion=StoppingCriterion(
max_wallclock_time=max_wallclock_time
),
n_workers=n_workers,
)
tuner.run()
Much of this launcher script is the same as for FIFOScheduler
, but
HyperbandScheduler
comes with a number
of extra arguments we will explain in the sequel (type
,
max_resource_attr
, grace_period
, reduction_factor
,
resource_attr
). The mlp_fashionmnist
benchmark trains a two-layer MLP
on FashionMNIST
(more details are
here). The accuracy is computed and
reported at the end of each epoch:
for epoch in range(resume_from + 1, config['epochs'] + 1):
train_model(config, state, train_loader)
accuracy = validate_model(config, state, valid_loader)
report(epoch=epoch, accuracy=accuracy)
While metric="accuracy"
is the criterion to be optimized,
resource_attr="epoch"
is the resource attribute. In the schedulers
discussed here, the resource attribute must be a positive integer.
HyperbandScheduler
maintains reported
metrics for all trials at certain rung levels (levels of resource attribute
epoch
at which scheduling decisions are done). When a trial reports
(epoch, accuracy)
for a rung level == epoch
, the scheduler makes a
decision whether to stop (pause) or continue. This decision is done based on
all accuracy
values encountered before at the same rung level. Whenever a
trial is stopped (or paused), the executing worker becomes available to evaluate
a different configuration.
Rung level spacing and stop/go decisions are determined by the parameters
max_resource_attr
, grace_period
, and reduction_factor
. The first
is the name of the attribute in config_space
which contains the maximum
number of epochs to train (max_resource_attr == "epochs"
in our
benchmark). This allows the training script to obtain
max_resource_value = config[max_resource_attr]
. Rung levels are
\(r_{min}, r_{min} \eta, r_{min} \eta^2, \dots, r_{max}\), where
\(r_{min}\) is grace_period
, \(\eta\) is reduction_factor
, and
\(r_{max}\) is max_resource_value
. In the example above,
max_resource_value = 81
, grace_period = 1
, and reduction_factor = 3
,
so that rung levels are 1, 3, 9, 27, 81. The spacing is such that stop/go
decisions are done less frequently for trials which already went further: they
have earned trust by not being stopped earlier. \(r_{max}\) need not be
of the form \(r_{min} \eta^k\). If max_resource_value = 56
in the
example above, the rung levels would be 1, 3, 9, 27, 56.
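As a quick illustration (this is not the library's internal code), the geometric rung spacing described above can be reproduced like this:

def rung_levels(grace_period, reduction_factor, max_resource_value):
    # Geometric spacing r_min, r_min * eta, r_min * eta^2, ..., capped at r_max
    levels, r = [], grace_period
    while r < max_resource_value:
        levels.append(r)
        r *= reduction_factor
    return levels + [max_resource_value]

print(rung_levels(1, 3, 81))  # [1, 3, 9, 27, 81]
print(rung_levels(1, 3, 56))  # [1, 3, 9, 27, 56]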
Given such a rung level spacing, stop/go decisions are done by comparing
accuracy
to the 1 / reduction_factor
quantile of values recorded at
the rung level. In the example above, our trial is stopped if accuracy
is
no better than the best 1/3 of previous values (the list includes the current
accuracy
value); otherwise it may continue.
Further details about HyperbandScheduler
and multi-fidelity HPO methods
are given in this tutorial.
HyperbandScheduler
provides the full
range of arguments. Here, we list the most important ones:
max_resource_attr, grace_period, reduction_factor: As detailed above, these determine the rung levels and the stop/go decisions. The resource attribute is a positive integer. We need reduction_factor >= 2. Note that instead of max_resource_attr, you can also use max_t, as detailed here.
rung_increment: This parameter can be used instead of reduction_factor (the latter takes precedence). In this case, rung levels are spaced linearly: \(r_{min} + j \nu, j = 0, 1, 2, \dots\), where \(\nu\) is rung_increment. The stop/go rule in the successive halving scheduler is set based on the ratio of successive rung levels.
rung_levels: Alternatively, the user can specify the list of rung levels directly (positive integers, strictly increasing); an example is sketched below. The stop/go rule in the successive halving scheduler is set based on the ratio of successive rung levels.
type: The most important values are "stopping", "promotion" (see above).
brackets: Number of brackets to be used in Hyperband. More details are found here. The default is 1 (successive halving).
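A minimal sketch of passing explicit rung levels; attribute and metric names follow the mlp_fashionmnist example above:

scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="promotion",
    rung_levels=[1, 2, 4, 8, 16, 32, 64],  # strictly increasing positive integers
    max_resource_attr="epochs",
    resource_attr="epoch",
    metric="accuracy",
    mode="max",
)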
Depending on the searcher, this scheduler supports:
Asynchronous successive halving (ASHA) [searcher="random"]
MOBSTER [searcher="bayesopt"]
Asynchronous BOHB [searcher="kde"]
Hyper-Tune [searcher="hypertune"]
Cost-aware Bayesian optimization [searcher="bayesopt_cost"]
Bore [searcher="bore"]
DyHPO [searcher="dyhpo", type="dyhpo"]
We will only consider the first two searchers in this tutorial.
Asynchronous Hyperband (ASHA)
If HyperbandScheduler
is configured
with a random searcher, we obtain ASHA, as proposed in
A System for Massively Parallel Hyperparameter Tuning.
More details are provided here.
Nothing much can be configured via search_options
in this case. The
arguments are the same as for random search with FIFOScheduler
.
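Putting this together, ASHA is just the template class with a random searcher. A minimal sketch, mirroring the launcher script above with the searcher swapped out:

scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="stopping",  # or "promotion" for the pause-and-resume variant
    max_resource_attr=benchmark.max_resource_attr,
    resource_attr=benchmark.resource_attr,
    grace_period=1,
    reduction_factor=3,
    metric=benchmark.metric,
    mode=benchmark.mode,
)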
Model-based Asynchronous Hyperband (MOBSTER)
If HyperbandScheduler
is configured with
a Bayesian optimization searcher, we obtain MOBSTER, as proposed in
Model-based Asynchronous Hyperparameter and Neural Architecture Search.
By default, MOBSTER uses a multi-task Gaussian process surrogate model for
metrics data observed at all resource levels. More details are provided
here.
Recommendations
Finally, we provide some general recommendations on how to use our built-in schedulers.
If you can afford it for your problem, random search is a useful baseline (RandomSearch). However, if even a single full evaluation takes a long time, try ASHA (ASHA) instead. The default for ASHA is type="stopping", but you should consider type="promotion" as well (more details on this choice are given here).
Use these baseline runs to get an idea how long your experiment needs to run. It is recommended to use a stopping criterion of the form stop_criterion=StoppingCriterion(max_wallclock_time=X), so that the experiment is stopped after X seconds.
If your tuning problem comes with an obvious resource parameter, make sure to implement it such that results are reported during the evaluation, not only at the end. When training a neural network model, choose the number of epochs as the resource. In other situations, choosing a resource parameter may be more difficult. Our schedulers require positive integers. Make sure that evaluations for the same configuration scale linearly in the resource parameter: an evaluation up to 2 * r should be roughly twice as expensive as one up to r.
If your problem has a resource parameter, always make sure to try HyperbandScheduler, which in many cases runs much faster than FIFOScheduler.
If you end up tuning the same ML algorithm or neural network model on different datasets, make sure to set points_to_evaluate appropriately. If the model comes from frequently used open source code, its built-in defaults will be a good choice. Any hyperparameter not covered in points_to_evaluate is set using a midpoint heuristic. While still better than choosing the first configuration at random, this may not be very good.
In general, the defaults should work well if your tuning problem is expensive enough (at least a minute per unit of r). In such cases, MOBSTER (MOBSTER) can outperform ASHA substantially. However, if your problem is cheap, so you can afford a lot of evaluations, the searchers based on GP surrogate models may end up expensive. In fact, once the number of evaluations surpasses a certain threshold, the data is filtered down before fitting the surrogate model (see here). You can adjust this threshold or change opt_skip_period in order to speed up MOBSTER.
Multi-Fidelity Hyperparameter Optimization
This tutorial provides an overview of multi-fidelity HPO algorithms implemented in Syne Tune. Multi-fidelity scheduling is one of the most successful recent ideas used to speed up HPO. You will learn about the differences and relationships between different methods, and how to choose the best approach for your own problems.
Note
In order to run the code in this tutorial, you need to have
installed the blackbox-repository
dependencies.
Introduction
In this section, we define and motivate some basic definitions. As this tutorial is mostly driven by examples, we will not go into much detail here.
What is Hyperparameter Optimization (HPO)?
In hyperparameter optimization (HPO), the goal is to minimize an a priori unknown function \(f(\mathbf{x})\) over a configuration space \(\mathbf{x}\in\mathcal{X}\). Here, \(\mathbf{x}\) is a hyperparameter configuration. For example, \(f(\mathbf{x})\) could be obtained by training a neural network model on a training dataset, then computing its error on a disjoint validation dataset. The hyperparameters may configure several aspects of this setup, for example:
Optimization parameters: Learning rate, batch size, momentum fraction, regularization constant, dropout fraction, choice of stochastic gradient descent (SGD) optimizer, warm-up ratio
Architecture parameters: Number of layers, width of layers, number of convolution filters, number of self-attention heads
If HPO ranges over architecture parameters, potentially including the operator types and connectivity of cells (or layers), it is also referred to as neural architecture search (NAS).
In general, HPO is a more difficult optimization problem than training for weights and biases, for a number of reasons:
Hyperparameters are often discrete (integer or categorical), so smooth optimization principles do not apply
HPO is the outer loop of a nested (or bi-level) optimization problem, where the inner loop consists of training for weights and biases. This means that an evaluation of \(f(\mathbf{x})\) can be very expensive (hours or even days)
The nested structure implies further difficulties. Training is non-deterministic (random initialization and mini-batch ordering), so \(f(\mathbf{x})\) is really a random function. Even for continuous hyperparameters, a gradient of \(f(\mathbf{x})\) is not tractable to obtain
For these reasons, a considerable amount of technology has so far been applied to the HPO problem. In the context of this tutorial, two directions are most relevant:
Saving compute resources and time by using partial evaluations of \(f(\mathbf{x})\) most of the time. Such evaluations are called low fidelity or low resource below
Fitting data from \(f(\mathbf{x})\) (and its lower fidelities) with a surrogate probabilistic model. The latter has properties that the real target function lacks (fast to evaluate; gradients can be computed), and this can efficiently guide the search. The main purpose of a surrogate model is to reduce the number of evaluations of \(f(\mathbf{x})\), while still finding a high quality optimum
Fidelities and Resources
In this section, we will introduce concepts of multi-fidelity hyperparameter optimization. Examples will be given further below. The reader may skip this section and return to it as a glossary.
An evaluation of \(f(\mathbf{x})\) requires a certain amount of compute resources and wallclock time. Most of this time is spent in training the model. In most cases, training resources and time can be broken down into units. For example:
Neural networks are trained for a certain number of epochs (i.e., sweeps over the training set). In this case, training for one epoch could be one resource unit. This resource unit will be used as running example in this tutorial.
Machine learning models can also be trained on subsets of the training set, in order to save resources. We could create a nested system of sets, where for simplicity all sizes are integer multiples of the smallest one. In this case, training on the smallest subset size is one resource unit.
We can decide the amount of resources when evaluating a configuration, giving rise to observations of \(f(\mathbf{x}, r)\), where \(r\in\{1, 2, 3, \dots, r_{max}\}\) denotes the resource used (e.g., number of epochs of training).
It is common to define \(f(\mathbf{x}, r_{max}) = f(\mathbf{x})\), so that the original criterion of interest has the largest resource that can be chosen. In this context, any \(f(\mathbf{x}, r)\) with \(r < r_{max}\) is called a low fidelity criterion w.r.t. \(f(\mathbf{x}, r_{max})\). The smaller \(r\), the lower the fidelity. A smaller resource requires less computation and waiting time, but it also produces a datapoint of less quality when approximating the target metric. Importantly, all methods discussed here make the following assumption:
For every fixed \(\mathbf{x}\), running time and compute cost of evaluating \(f(\mathbf{x}, r)\) scales roughly proportional to \(r\). If this is not the case for the natural resource unit in your problem, you need to map \(r\) to your unit in a non-linear way. Note that time may still strongly depend on the configuration \(\mathbf{x}\) itself.
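As a toy sketch of this assumption, suppose training-set fractions are the natural unit (the fractions here are made up); mapping them to integer resource levels relative to the smallest fraction keeps cost roughly proportional to \(r\):

# Hypothetical subset sizes; the smallest one defines a single resource unit
fractions = [0.125, 0.25, 0.5, 1.0]
resource_for_fraction = {f: round(f / fractions[0]) for f in fractions}
print(resource_for_fraction)  # {0.125: 1, 0.25: 2, 0.5: 4, 1.0: 8}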
Multi-Fidelity Scheduling
How could an existing HPO technology be extended in order to make use of multi-fidelity observations \(f(\mathbf{x}, r)\) at different resources? There are two basic principles which come to mind:
A priori decisions: Whenever a decision is required which configuration \(\mathbf{x}\) to evaluate next, the method also decides the resource \(r\) to be spent on that evaluation.
A posteriori decisions: Whenever a new configuration \(\mathbf{x}\) can be run, it is started without a definite amount of resource attached to it. After it spent some resources, its low-fidelity observations are compared to others who spent the same resource before. Decisions on stopping, or also on resuming, trials are taken based on the outcome of such comparisons.
While some work on multi-fidelity Bayesian optimization has chosen the former option, methods with a posteriori decision-making have been far more successful. All methods discussed in this tutorial adhere to the a posteriori principle for decisions which trials to stop or resume from a paused state. In the sequel, we will use the terminology scheduling decisions rather than a posteriori.
How to implement such scheduling decisions? In general, we need to compare a number of trials with each other on the basis of observations at a certain resource level \(r\) (or, more generally, on values up to \(r\)). In this tutorial, and in Syne Tune more generally, we use terminology defined in the ASHA publication. A rung is a list of trials \(\mathbf{x}_j\) and observations \(f(\mathbf{x}_j, r)\) at a certain resource level \(r\). This resource is also called rung level. In general, a decision on what to do with one or several trials in the rung is taken by sorting the rung members w.r.t. their metric values. A positive decision (i.e., continue, or resume) is taken if the trial ranks among the better ones (above a certain quantile), a negative one (i.e., stop, or keep paused) is taken otherwise.
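As an illustrative sketch (not Syne Tune's internal implementation) of such a rung-based decision: a trial reporting a value at a rung is compared against everything recorded there, and gets a positive decision only if it ranks in the top \(1/\eta\) fraction:

import numpy as np

def continue_trial(rung_values, new_value, eta=3, mode="min"):
    # rung_values: metric values recorded at this rung so far
    # new_value: the value reported by the trial being decided on
    values = np.asarray(list(rung_values) + [new_value], dtype=float)
    if mode == "max":
        values, new_value = -values, -new_value
    threshold = np.quantile(values, 1.0 / eta)
    return new_value <= threshold

# Example with validation errors: the new trial is in the top third, so it continues
print(continue_trial([0.30, 0.25, 0.40, 0.35, 0.28], 0.24))  # True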
More details will be given when we come to real examples below. Just a few remarks at this point, which will be substantiated with examples:
Modern successive halving methods innovated over earlier proposals by suggesting a geometric spacing of rung levels, and by calibrating the thresholds in scheduling decisions according to this spacing. For example, the median stopping rule predates successive halving, but is typically outperformed by ASHA (while MSR is implemented in Syne Tune, it is not discussed in this tutorial).
Scheduling decisions can either be made synchronously or asynchronously. In the former case, decisions are batched up for many trials, while in the latter case, decisions for each trial are made instantaneously.
Asynchronous scheduling can either be implemented as start-and-stop, or as pause-and-resume. In the former case, trials are started when workers become available, and they may be stopped at rung levels (and just continue otherwise). In pause-and-resume scheduling, any trial is always run until the next rung level and paused there. When a worker becomes available, it may be used to resume any of the paused trials, in case they compare well against peers at the same rung. These modalities place different requirements on the training script and the execution backend.
Setting up the Problem
If you have not done this before, it is recommended you first work through the Basics of Syne Tune tutorial, in order to become familiar with concepts such as configuration, configuration space, backend, scheduler.
Note
In this tutorial, we will use a surrogate benchmark in order to obtain
realistic results with little computation. To this end, you need
to have the blackbox-repository
dependencies installed, as detailed
here.
Note that
the first time you use a surrogate benchmark, its data files are downloaded
and stored to your S3 bucket; this can take a considerable amount of time.
The next time you use the benchmark, it is loaded from your local disk or
your S3 bucket, which is fast.
Running Example
For most of this tutorial, we will be concerned with one running example: the NASBench-201 benchmark. NASBench-201 is a frequently used neural architecture search benchmark with a configuration space of six categorical parameters, with five values each. The authors trained networks under all these configurations and provide metrics, such as training error, evaluation error and runtime after each epoch, free for researchers to use. In this tutorial, we make use of the CIFAR100 variant of this benchmark, where the model architectures have been trained on the CIFAR100 image classification dataset.
NASBench-201 is an example for a tabulated benchmark. Researchers can benchmark and compare HPO algorithms on the data without having to spend efforts to train models. They do not need expensive GPU computation in order to explore ideas or do comparative studies.
Syne Tune is particularly well suited to work with tabulated benchmarks. First, it contains a blackbox repository for maintenance and fast access to tabulated benchmarks. Second, it features a simulator backend which simulates training evaluations from a blackbox. The simulator backend can be used with any Syne Tune scheduler, and experiment runs are very close to what would be obtained by running training for real. In particular, the simulation maintains correct timings and temporal order of events. Importantly, time is simulated as well. Not only are experiments very cheap to run (on basic CPU hardware), they also finish many times faster than real time.
The Launcher Script
The most flexible way to run HPO experiments in Syne Tune is by writing a launcher script. In this tutorial, we will use the following launcher script.
import logging
from argparse import ArgumentParser
from syne_tune.experiments.benchmark_definitions import nas201_benchmark
from syne_tune.backend.simulator_backend.simulator_callback import (
SimulatorCallback,
)
from syne_tune.blackbox_repository.simulated_tabular_backend import (
BlackboxRepositoryBackend,
)
from syne_tune.optimizer.baselines import (
ASHA,
MOBSTER,
HyperTune,
SyncHyperband,
SyncBOHB,
SyncMOBSTER,
DEHB,
)
from syne_tune import Tuner, StoppingCriterion
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
parser = ArgumentParser()
parser.add_argument(
"--method",
type=str,
choices=(
"ASHA-STOP",
"ASHA-PROM",
"ASHA6-STOP",
"MOBSTER-JOINT",
"MOBSTER-INDEP",
"HYPERTUNE-INDEP",
"HYPERTUNE4-INDEP",
"HYPERTUNE-JOINT",
"SYNCHB",
"SYNCSH",
"SYNCMOBSTER",
"BOHB",
"DEHB",
),
default="ASHA-STOP",
)
parser.add_argument(
"--random_seed",
type=int,
default=31415927,
)
parser.add_argument(
"--experiment_tag",
type=str,
default="mf-tutorial",
)
parser.add_argument(
"--dataset",
type=str,
choices=("cifar10", "cifar100", "ImageNet16-120"),
default="cifar100",
)
args = parser.parse_args()
# [1]
# Setting up simulator backend for blackbox repository
# We use the NASBench201 blackbox for the training set `args.dataset`
benchmark = nas201_benchmark(args.dataset)
max_resource_attr = benchmark.max_resource_attr
trial_backend = BlackboxRepositoryBackend(
elapsed_time_attr=benchmark.elapsed_time_attr,
max_resource_attr=max_resource_attr,
blackbox_name=benchmark.blackbox_name,
dataset=benchmark.dataset_name,
surrogate=benchmark.surrogate,
surrogate_kwargs=benchmark.surrogate_kwargs,
)
# [2]
# Select configuration space for the benchmark. Here, we use the default
# for the blackbox
blackbox = trial_backend.blackbox
# Common scheduler kwargs
method_kwargs = dict(
metric=benchmark.metric,
mode=benchmark.mode,
resource_attr=blackbox.fidelity_name(),
random_seed=args.random_seed,
max_resource_attr=max_resource_attr,
)
# Insert maximum resource level into configuration space. Doing so is
# best practice and has advantages for pause-and-resume schedulers
config_space = blackbox.configuration_space_with_max_resource_attr(
max_resource_attr
)
scheduler = None
if args.method in {"ASHA-STOP", "ASHA-PROM", "ASHA6-STOP"}:
# [3]
# Scheduler: Asynchronous Successive Halving (ASHA)
# The 'stopping' variant stops trials which underperform compared to others
# at certain resource levels (called rungs).
# The 'promotion' variant pauses each trial at certain resource levels
# (called rungs). Trials which outperform others at the same rung, are
# promoted later on, to run to the next higher rung.
# We configure this scheduler with random search: configurations for new
# trials are drawn at random
scheduler = ASHA(
config_space,
type="promotion" if args.method == "ASHA-PROM" else "stopping",
brackets=6 if args.method == "ASHA6-STOP" else 1,
**method_kwargs,
)
elif args.method in {"MOBSTER-JOINT", "MOBSTER-INDEP"}:
# Scheduler: Asynchronous MOBSTER
# We configure the scheduler with GP-based Bayesian optimization, using
# the "gp_multitask" or the "gp_independent" surrogate model.
search_options = None
if args.method == "MOBSTER-INDEP":
search_options = {"model": "gp_independent"}
scheduler = MOBSTER(
config_space,
search_options=search_options,
type="promotion",
**method_kwargs,
)
elif args.method in {"HYPERTUNE-INDEP", "HYPERTUNE4-INDEP", "HYPERTUNE-JOINT"}:
# Scheduler: Hyper-Tune
# We configure the scheduler with GP-based Bayesian optimization, using
# the "gp_multitask" or the "gp_independent" surrogate model.
search_options = None
if args.method == "HYPERTUNE-JOINT":
search_options = {"model": "gp_multitask"}
scheduler = HyperTune(
config_space,
search_options=search_options,
type="promotion",
brackets=4 if args.method == "HYPERTUNE4-INDEP" else 1,
**method_kwargs,
)
elif args.method in {"SYNCHB", "SYNCSH"}:
# Scheduler: Synchronous successive halving or Hyperband
# We configure this scheduler with random search: configurations for new
# trials are drawn at random
scheduler = SyncHyperband(
config_space,
brackets=1 if args.method == "SYNCSH" else None,
**method_kwargs,
)
elif args.method == "SYNCMOBSTER":
# Scheduler: Synchronous MOBSTER
# We configure this scheduler with GP-BO search. The default surrogate
# model is "gp_independent": independent processes at each rung level,
# which share a common ARD kernel, but separate mean functions and
# covariance scales.
scheduler = SyncMOBSTER(
config_space,
**method_kwargs,
)
elif args.method == "BOHB":
# Scheduler: Synchronous BOHB
# We configure this scheduler with KDE search, which is using the
# "two-density" approximation of the EI acquisition function from
# TPE (Bergstra & Bengio).
scheduler = SyncBOHB(
config_space,
**method_kwargs,
)
elif args.method == "DEHB":
# Scheduler: Differential Evolution Hyperband (DEHB)
# We configure this scheduler with random search.
scheduler = DEHB(
config_space,
**method_kwargs,
)
stop_criterion = StoppingCriterion(
max_wallclock_time=benchmark.max_wallclock_time,
max_num_evaluations=benchmark.max_num_evaluations,
)
# [4]
tuner = Tuner(
trial_backend=trial_backend,
scheduler=scheduler,
stop_criterion=stop_criterion,
n_workers=benchmark.n_workers,
sleep_time=0,
callbacks=[SimulatorCallback()],
tuner_name=args.experiment_tag,
metadata={
"seed": args.random_seed,
"algorithm": args.method,
"tag": args.experiment_tag,
"benchmark": "nas201-" + args.dataset,
},
)
tuner.run()
Let us have a walk through this script, assuming it is called with the default
--method ASHA-STOP
:
If you worked through Basics of Syne Tune, you probably miss the training scripts. Since we use the simulator backend with a blackbox (NASBench-201), a training script is not required, since the backend is directly linked to the blackbox repository and obtains evaluation data from there.
[1] We first select the benchmark and create the simulator backend linked with this benchmark. Relevant properties of supported benchmarks are collected in syne_tune.experiments.benchmark_definitions, using SurrogateBenchmarkDefinition. Some properties are tied to the benchmark and must not be changed (elapsed_time_attr, metric, mode, blackbox_name, max_resource_attr). Other properties are default values suggested for the benchmark and may be changed by the user (n_workers, max_num_evaluations, max_wallclock_time, surrogate). Some of the blackboxes are not computed on a dense grid; they require a surrogate regression model in order to be functional. For such blackboxes, surrogate and surrogate_kwargs need to be considered. However, NASBench-201 comes with a finite configuration space, which has been sampled exhaustively.
[1] We then create the BlackboxRepositoryBackend. Instead of a training script, this backend needs information about the blackbox used for the simulation. elapsed_time_attr is the name of the elapsed time metric of the blackbox (time from start of training until end of epoch). max_resource_attr is the name of the maximum resource entry in the configuration space (more on this shortly).
[2] Next, we select the configuration space and determine some attribute names. With a tabulated benchmark, we are bound to use the configuration space coming with the blackbox, trial_backend.blackbox.configuration_space. If another configuration space is to be used, a surrogate regression model has to be specified. In this case, config_space_surrogate can be passed at the construction of BlackboxRepositoryBackend. Since NASBench-201 has a native finite configuration space, we can ignore this extra complexity in this tutorial. However, choosing a suitable configuration space and specifying a surrogate can be important for model-based HPO methods. Some more information is given here.
[2] We can determine resource_attr (the name of the resource attribute) from the blackbox as blackbox.fidelity_name(). Next, if max_resource_attr is specified, we attach the information about the largest resource level to the configuration space, via blackbox.configuration_space_with_max_resource_attr(max_resource_attr). Doing so is best practice in general. In the end, the training script needs to know how long to train for at most (i.e., the maximum number of epochs in our example), and this should not be hardcoded. Another advantage of attaching the maximum resource information to the configuration space is that pause-and-resume schedulers can use it to signal the training script how long to really run for. This is explained in more detail when we come to these schedulers. In short, we strongly recommend to use max_resource_attr and to configure schedulers with it.
[2] If max_resource_attr is not to be used, the scheduler needs to be passed the maximum resource value explicitly. For ASHA-STOP, this is the max_t attribute. This is not recommended, and not shown here.
[3] At this point, we create the multi-fidelity scheduler, which is ASHA in the default case. Most supported schedulers can easily be imported from syne_tune.optimizer.baselines, using common names.
[4] Finally, we create a stopping criterion and a Tuner. This should be well known from Basics of Syne Tune. One speciality here is that we require sleep_time=0 and callbacks=[SimulatorCallback()] for things to work out with the simulator backend. Namely, since time is simulated, the Tuner does not really have to sleep between its iterations (simulated time will be increased in distinct steps). Second, SimulatorCallback is needed for simulation of time. It is fine to add additional callbacks here, as long as SimulatorCallback is one of them.
The Blackbox Repository
Giving a detailed account of the blackbox repository is out of scope of this tutorial. If you run the launcher script above, you will be surprised how quickly it finishes. The only real time spent is on logging, fetching metric values from the blackbox, and running the scheduler code. Since the latter is very fast (mostly some random sampling and data organization), whole simulated HPO experiments with many parallel workers can be done in mere seconds.
However, when you run it for the very first time, you will have to wait for quite some time. This is because the blackbox repository downloads the raw data for the benchmark of your choice, processes it, and (optionally) stores it to your S3 bucket. It also stores a local copy. If the data is already in your S3 bucket, it will be downloaded from there when you run on a different instance; this is rather fast. But downloading and processing the raw data can take an hour or more for some of the blackboxes.
Synchronous Successive Halving and Hyperband
In this section, we will introduce some simple multi-fidelity HPO methods based on synchronous decision-making. Methods discussed here are not model-based, they suggest new configurations simply by drawing them uniformly at random from the configuration space, much like random search does.
Early Stopping Hyperparameter Configurations
The figure below depicts learning curves of a set of neural networks with different hyperparameter configurations trained for the same number of epochs. After a few epochs we are already able to visually distinguish between the well-performing and the poorly performing ones. However, the ordering is not perfect, and we might still require the full amount of 100 epochs to identify the best performing configuration.
Figure: Learning curves for randomly drawn hyperparameter configurations
The idea of early stopping based HPO methods is to free up compute resources by early stopping the evaluation of poorly performing configurations and allocate them to more promising ones. This speeds up the optimization process, since we have a higher throughput of configurations that we can try.
Recall the notation of resource from the introduction. In this tutorial, resource equates to epochs trained, so \(r=2\) refers to metric values evaluated at the end of the second epoch. The main objective of interest, validation error in our tutorial, is denoted by \(f(\mathbf{x}, r)\), where \(\mathbf{x}\) is the configuration, \(r\) the resource level. Our problem typically defines a maximum resource level \(r_{max}\), so that in general the goal is to find \(\mathbf{x}\) which minimizes \(f(\mathbf{x}, r_{max})\). In NASBench-201, the maximum number of epochs is \(r_{max} = 200\).
Synchronous Successive Halving
One of the simplest competitive multi-fidelity HPO methods is synchronous successive halving (SH). The basic idea is to start with \(N\) configurations randomly sampled from the configuration space, training each of them for \(r_{min}\) epochs only (e.g., \(r_{min} = 1\)). We then discard a fraction of the worst performing trials and train the remaining ones for longer. Iterating this process, fewer trials run for longer, until at least one trial reaches \(r_{max}\) epochs.
More formally, successive halving (SH) is parameterized by a minimum resource \(r_{min}\) (for example 1 epoch) and a halving constant \(\eta\in\{2, 3, \dots\}\). The defaults in Syne Tune are \(r_{min} = 1\) and \(\eta = 3\), and we will use these for now. Next, we define rung levels \(\mathcal{R} = \{ r_{min}, r_{min}\eta, r_{min}\eta^2, \dots \}\), so that all \(r\in \mathcal{R}\) satisfy \(r\le r_{max}\). In our example, \(\mathcal{R} = \{ 1, 3, 9, 27, 81 \}\). Moreover, the initial number of configurations is set to \(N = \eta^5 = 243\). In general, a trial is trained until reaching the next rung level, then evaluated there, and the validation errors of all trials at a rung level are used to decide which of them to discard. We start with running \(N\) trials until rung level \(r_{min}\). Sorting the validation errors, we keep the top \(1 / \eta\) fraction (i.e., \(N / \eta\) configurations) and discard all the rest. The surviving trials are trained for \(r_{min}\eta\) epochs, and the process is repeated. Synchronized at each rung level, a \(1 / \eta\) fraction of trials survives and finds its budget to be multiplied by \(\eta\). With this particular choice of \(N\), only a single trial will be trained to the full resource \(r_{max}\). In our example:
We first train 243 randomly chosen configurations for 1 epoch each
Once all of them are finished, we promote those 81 trials with the lowest validation errors to train for 3 epochs
Then, the 27 best-performing ones after 3 epochs are trained for 9 epochs
The 9 best ones after 9 epochs are trained for 27 epochs
The 3 best ones after 27 epochs are trained for 81 epochs
The single best configuration after 81 epochs is trained for 200 epochs
Finally, once one such round of SH is finished, we start the next round with a new set of initial configurations, until the total budget is spent.
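A quick back-of-the-envelope sketch reproducing the counts above:

eta, num_trials = 3, 243
for rung in [1, 3, 9, 27, 81, 200]:
    print(f"{num_trials} trial(s) trained up to {rung} epochs")
    num_trials //= eta
# 243 -> 81 -> 27 -> 9 -> 3 -> 1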
Our launcher script runs synchronous
successive halving if method="SYNCSH"
. The relevant parameters are
grace_period
( \(r_{min}\) ) and reduction_factor
( \(\eta\) ).
Moreover, for SH, we need to set brackets=1
, since otherwise an extension
called Hyperband is run (to be discussed shortly).
API docs:
Baseline:
SyncHyperband
Additional arguments:
SynchronousGeometricHyperbandScheduler
Synchronous SH employs pause-and-resume scheduling (see introduction). Once a trial reaches a rung level, it is paused there. This is because the decision of which trials to promote to the next rung level can only be taken once the current rung level is completely filled up: only then can we determine the top \(1 / \eta\) fraction of trials which are to be resumed. Syne Tune supports pause-and-resume schedulers with checkpointing. Namely, the state of a trial (e.g., weights of neural network model) is stored when it is paused. Once a trial is resumed, the checkpoint is loaded and training can resume from there. Say a trial is paused at \(r = 9\) and is later resumed towards \(r = 27\). With checkpointing, we have to train for \(27 - 9 = 18\) epochs only instead of 27 epochs for training from scratch. More details are given here. For tabulated benchmarks, checkpointing is supported by default.
Finally, it is important to understand in which sense the method detailed in this section is synchronous. This is because decision-making on which trials to resume is synchronized at certain points in time, namely when a rung level is completed. In general, a trial reaching a rung level has to be paused, because it is not the last one required to fill the rung. In our example, the rung at \(r = 1\) requires 243 trials to finish training for one epoch, so that 242 of them have to be paused for some time.
Synchronous decision-making does not mean that parallel compute resources (called workers in Syne Tune) need to sit idle. In Syne Tune, workers are asynchronously scheduled in general: whenever a worker finishes, it is assigned a new task immediately. Say a worker just finished, but we find all remaining slots in the current rung to be pending (meaning that other workers are evaluating trials that will end up there, but are not finished yet). We cannot resume a trial from this rung, because promotion decisions require all slots to be filled. In such cases, our implementation starts a new round of SH (or further contributes to a new round already started for the same reason).
In the sequel, the synchronous / asynchronous terminology always refers to decision-making, and not to scheduling of parallel resources.
Synchronous Hyperband
While SH can greatly improve upon random search, the choice of \(r_{min}\) can have an impact on its performance. If \(r_{min}\) is too small, our network might not have learned anything useful, and even the best configurations may be filtered out at random. If \(r_{min}\) is too large on the other hand, the benefits of early stopping may be greatly diminished.
Hyperband is an extension of SH that mitigates the risk of setting \(r_{min}\) too small. It runs SH as subroutine, where each round, called a bracket, balances between \(r_{min}\) and the number of initial configurations \(N\), such that the same total amount of resources is used. One round of Hyperband consists of a sequential loop over brackets.
The number of brackets can be chosen anywhere between 1 (i.e., SH) and the number of rung levels. In Syne Tune, the default number of brackets is the maximum. Without going into formal details, here are the brackets for our NASBench-201 example:
Bracket 0: \(r_{min} = 1, N = 243\)
Bracket 1: \(r_{min} = 3, N = 98\)
Bracket 2: \(r_{min} = 9, N = 41\)
Bracket 3: \(r_{min} = 27, N = 18\)
Bracket 4: \(r_{min} = 81, N = 9\)
Bracket 5: \(r_{min} = 200, N = 6\)
Our launcher script runs synchronous
Hyperband if method="SYNCHB"
. Since brackets
is not used when creating
SyncHyperband
, the maximum value 6 is chosen. We also use the default
values for grace_period
(1) and reduction_factor
(3).
API docs:
Baseline:
SyncHyperband
Additional arguments:
SynchronousGeometricHyperbandScheduler
The advantages of Hyperband over SH are mostly theoretical. In practice, while Hyperband can improve on SH if \(r_{min}\) chosen for SH is clearly too small, it tends to perform worse than SH if \(r_{min}\) is adequate. This disadvantage of Hyperband is somewhat mitigated in the Syne Tune implementation, where new brackets are started whenever workers cannot contribute to the current bracket (because remaining slots in the current rung are pending, see above).
Asynchronous Successive Halving
In this section, we will turn our attention to methods adopting asynchronous decision-making, which tend to be more efficient than their synchronous counterparts.
Asynchronous Successive Halving: Early Stopping Variant
In synchronous successive halving (SH), decisions on whether to promote a trial or not can be delayed for a long time. In our example, say we are lucky and sample an excellent configuration early on, among the 243 initial ones. In order to promote it to train for 81 epochs, we first need to train 243 trials for 1 epoch, then 81 for 3 epochs, 27 for 9 epochs, and 9 for 27 epochs. Our excellent trial will always be among the top \(1/3\) of others at these rung levels, but its progress through the rungs is severely delayed.
In asynchronous successive halving (ASHA), the aim is to promote promising configurations as early as possible. There are two different variants of ASHA, and we will begin with the (arguably) simpler one. Whenever a worker becomes available, a new configuration is sampled at random, and a new trial starts training from scratch. Whenever a trial reaches a rung level, a decision is made immediately on whether to stop training or let it continue. This decision is made based on all data available at the rung until now. If the trial is among the top \(1 / \eta\) fraction of configurations previously registered at this rung, it continues. Otherwise, it is stopped. As long as a rung has less than \(\eta\) trials, the default is to continue.
Different to synchronous SH, there are no fixed rung sizes. Instead, each rung grows over time. ASHA is free of synchronization points. Promising trials can be trained for many epochs without having to wait for delayed promotion decisions. While asynchronous decision-making can be much more efficient at running good configurations to the end, it runs the risk of making bad decisions based on too little data.
Our launcher script runs the stopping
variant of ASHA if method="ASHA-STOP"
.
API docs:
Baseline:
ASHA
Additional arguments:
HyperbandScheduler
(type="stopping"
selects the early stopping variant)
Asynchronous Successive Halving: Promotion Variant
In fact, the algorithm originally proposed as
ASHA is slightly different to what has
been detailed above. Instead of starting a trial once and rely on early
stopping, this promotion variant is of the pause-and-resume type. Namely,
whenever a trial reaches a rung, it is paused there. Whenever a worker becomes
available, all rungs are scanned top to bottom. If a paused trial is found
which lies in the top \(1 / \eta\) of all rung entries, it is promoted:
it may resume and train until the next rung level. If no promotable paused
trial is found, a new trial is started from scratch. Our
launcher script runs the promotion variant of ASHA if method="ASHA-PROM"
.
API docs:
Baseline:
ASHA
Additional arguments:
HyperbandScheduler
(type="promotion"
selects the promotion variant)
If these two variants (stopping and promotion) are compared under ideal conditions, one sometimes does better than the other, and vice versa. However, they come with different requirements. The promotion variant pauses and resumes trials, therefore benefits from checkpointing being implemented for the training code. If this is not the case, the stopping variant may be more attractive.
On the other hand, the stopping variant requires the backend to frequently stop
workers and bringing them back in order to start a new trial. For some
backends, the turn-around time for this process may be slow, in which case the
promotion type can be more attractive. In this context, it is important to
understand the relevance of passing max_resource_attr
to the scheduler
(and, in our case, also to the
BlackboxRepositoryBackend
).
Recall the discussion here. If the
configuration space contains an entry with the maximum resource, whose key is
passed to the scheduler as max_resource_attr
, the latter can modify this
value when calling the backend to start or resume a trial. For example, if a
trial is resumed at \(r = 3\) to train until \(r = 9\), the scheduler
passes a configuration to the backend with {max_resource_attr: 9}
. This
means that the training code knows how long it has to run, it does not have to
be stopped by the backend.
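The following sketch shows what this looks like from the training script's side; the attribute name "epochs", the dummy training step, and the omission of checkpoint handling are simplifications for illustration:

from syne_tune import Reporter

report = Reporter()

def objective(config):
    # A pause-and-resume scheduler that resumes this trial to run until r = 9
    # passes a configuration with config["epochs"] == 9, so the loop stops at
    # the right point without the backend having to stop the process.
    # (Checkpoint loading / resume_from handling is omitted for brevity.)
    for epoch in range(1, config["epochs"] + 1):
        accuracy = 1.0 - 1.0 / (epoch + 1)  # dummy stand-in for real training
        report(epoch=epoch, accuracy=accuracy)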
ASHA can be significantly accelerated by using PASHA (Progressive ASHA), which dynamically allocates the maximum resource for the tuning procedure depending on need. PASHA starts with a small initial amount of maximum resources and progressively increases it if the ranking of the configurations in the top two rungs has not stabilized. In practice, PASHA can lead to, for example, a 3x speedup compared to ASHA, and the speedup can be even higher for large datasets with millions of examples. A tutorial about PASHA is here.
Asynchronous Hyperband
Finally, ASHA can also be extended to use multiple brackets. Namely, whenever
a new trial is started, its bracket (or, equivalently, its \(r_{min}\)
value) is sampled randomly from a distribution. In Syne Tune, this distribution
is proportional to the rung sizes in synchronous Hyperband. In our example
with 6 brackets (see details here),
this distribution is \(P(r_{min}) = [1:243/415, 3:98/415, 9:41/415,
27:18/415, 81:9/415, 200:6/415]\). Our launcher script runs asynchronous Hyperband with 6
brackets if method="ASHA6-STOP"
.
API docs:
Baseline:
ASHA
Additional arguments:
HyperbandScheduler
(brackets
selects the number of brackets, defaults to 1)
As also noted in ASHA, the algorithm
often works best with a single bracket, so that brackets=1
is the default
in Syne Tune. However, we will see further below that model-based variants of
ASHA with multiple brackets can outperform the single-bracket version if the
distribution over \(r_{min}\) is adaptively chosen.
Finally, Syne Tune implements two variants of ASHA with brackets > 1
. In
the default variant, there is only a single system of rungs. For each new
trial, \(r_{min}\) is sampled to be equal to one of the rung levels, which
means the trial does not have to compete with others at rung levels
\(r < r_{min}\). The other variant is activated by passing
rung_system_per_bracket=True
to
HyperbandScheduler
. In this case, each
bracket has its own rung system, and trials started in one bracket only have
to compete with others in the same bracket.
Early Removal of Checkpoints
By default, the checkpoints written by all trials are retained on disk (for a trial, later checkpoints overwrite earlier ones). When checkpoints are large and the local backend is used, this may result in a lot of disk space getting occupied, or even the disk filling up. Syne Tune supports checkpoints being removed once they are not needed anymore, or even speculatively, as is detailed here.
Model-based Synchronous Hyperband
All methods considered so far have been extensions of random search by clever multi-fidelity scheduling. In this section, we consider combinations of Bayesian optimization with multi-fidelity scheduling, where configurations are chosen based on performance of previously chosen ones, rather than being sampled at random.
Basics of Syne Tune: Bayesian Optimization provides an introduction to Bayesian optimization in Syne Tune.
Synchronous BOHB
The first model-based method we consider is BOHB, which uses the TPE formulation of Bayesian optimization. In the latter, an approximation to the expected improvement (EI) acquisition function is interpreted via a ratio of two densities. BOHB uses kernel density estimators rather than tree Parzen estimators (as in TPE) to model the two densities.
BOHB uses the same scheduling mechanism (i.e., rung levels, promotion
decisions) as synchronous Hyperband (or SH), but it uses a model fit to past
data for suggesting the configuration of every new trial.
Recall that
validation error after \(r\) epochs is denoted by \(f(\mathbf{x}, r)\),
where \(\mathbf{x}\) is the configuration. BOHB fits KDEs separately to the
data obtained at each rung level. When a new configuration is to be suggested,
it first determines the largest rung level \(r_{acq}\) supported by enough
data for the two densities to be properly fit. It then makes a TPE decision at
this resource level. Our launcher script
runs synchronous BOHB if method="BOHB"
.
API docs:
Baseline: SyncBOHB
Additional arguments: SynchronousGeometricHyperbandScheduler
While BOHB is often more efficient than SYNCHB, it is held back by synchronous decision-making. Note that BOHB does not model the random function \(f(\mathbf{x}, r)\) directly, which makes it hard to properly react to pending evaluations, i.e. trials which have been started but did not return metric values yet. BOHB ignores pending evaluations if present, which could lead to redundant decisions being made if the number of workers (i.e., parallelization factor) is large.
Synchronous MOBSTER
Another model-based variant is synchronous MOBSTER. We will provide more details on MOBSTER below, when discussing model-based asynchronous methods.
Our launcher script runs synchronous
MOBSTER if method="SYNCMOBSTER"
. Note that the default surrogate model for
SyncMOBSTER
is gp_independent
, where the data at each rung level
is represented by an independent Gaussian process (more details are given
here).
It turns out that SyncMOBSTER
outperforms
SyncBOHB
substantially on the benchmark chosen here.
API docs:
Baseline: SyncMOBSTER
Additional arguments: SynchronousGeometricHyperbandScheduler
When running these experiments with the simulator backend, we notice that an
experiment suddenly takes quite some time to finish. It is still many times
faster than real time, but we now need many minutes instead of seconds. This
is a reminder that model-based decision-making can take time. In GP-based
Bayesian optimization, the hyperparameters of a Gaussian process model are fit
before every decision, and acquisition functions are optimized over many
candidates. On the real time scale (the x axis in our result plots), this time
is often well spent; after all, SyncMOBSTER outperforms SyncBOHB
significantly. But since decision-making computations cannot be tabulated, they
slow down the simulations.
As a consequence, we should be careful with result plots showing performance with respect to the number of training evaluations, as these hide both the time required to make decisions and potential inefficiencies in scheduling jobs in parallel. HPO methods should always be compared with real experiment time on the x axis, and the any-time performance of methods should be visualized by plotting curves, not just by quoting “final values”. Examples are provided here.
Note
Syne Tune allows to launch experiments remotely and in parallel in order to still obtain results rapidly, as is detailed here.
Differential Evolution Hyperband
Another recent model-based extension of synchronous Hyperband is
Differential Evolution Hyperband (DEHB).
DEHB is typically run with multiple brackets. A main difference to Hyperband
is that configurations promoted from a rung to the next are also modified by
an evolutionary rule, involving mutation, cross-over and selection. Since
configurations are not just sampled once, but potentially modified at every
rung, the hope is to find well-performing configurations faster. Our
launcher script runs DEHB if
method="DEHB"
.
API docs:
Baseline: DEHB
Additional arguments: GeometricDifferentialEvolutionHyperbandScheduler
The main feature of DEHB over synchronous Hyperband is that configurations can be modified at every rung. However, this feature also has a drawback: DEHB cannot make effective use of checkpointing. If a trial is resumed with a different configuration, starting from its most recent checkpoint is not admissible. However, our implementation is careful to make use of checkpointing in the very first bracket of DEHB, which is equivalent to a normal run of synchronous SH.
Model-based Asynchronous Hyperband
We have seen that asynchronous decision-making tends to outperform synchronous variants in practice, and model-based extensions of the latter can outperform random sampling of new configurations. In this section, we discuss combinations of Bayesian optimization with asynchronous decision-making, leading to the currently best performing multi-fidelity methods in Syne Tune.
All examples here can either be run in stopping or promotion mode of ASHA. We will use the promotion mode here (i.e., pause-and-resume scheduling).
Surrogate Models of Learning Curves
Recall that validation error after \(r\) epochs is denoted by \(f(\mathbf{x}, r)\), with \(\mathbf{x}\) the configuration. Here, \(r\mapsto f(\mathbf{x}, r)\) is called learning curve. A learning curve surrogate model predicts \(f(\mathbf{x}, r)\) from observed data. A difficult requirement in the context of multi-fidelity HPO is that observations are much more abundant at smaller resource levels \(r\), while predictions are more valuable at larger \(r\).
In the context of Gaussian process based Bayesian optimization, Syne Tune supports a number of different learning curve surrogate models. The type of model is selected upon construction of the scheduler:
from syne_tune.optimizer.baselines import MOBSTER

# config_space, benchmark, resource_attr, max_resource_attr, and random_seed
# are assumed to be defined as in the launcher script for this tutorial
scheduler = MOBSTER(
    config_space,
    type="promotion",
    search_options=dict(
        model="gp_multitask",
        gp_resource_kernel="exp-decay-sum",
    ),
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    random_seed=random_seed,
    max_resource_attr=max_resource_attr,
)
Here, options configuring the searcher are collected in search_options
. The
most important options are model
, selecting the type of surrogate model,
and gp_resource_kernel
selecting the covariance model in the case
model="gp_multitask"
.
Independent Processes at each Rung Level
A simple learning curve surrogate model is obtained by
search_options["model"] = "gp_independent"
. Here, \(f(\mathbf{x}, r)\)
at each rung level \(r\) is represented by an independent Gaussian process
model. The models have individual constant mean functions
\(\mu_r(\mathbf{x}) = \mu_r\) and covariance functions
\(k_r(\mathbf{x}, \mathbf{x}') = c_r k(\mathbf{x}, \mathbf{x}')\),
where \(k(\mathbf{x}, \mathbf{x}')\) is a Matern-5/2 ARD kernel without
variance parameter, which is shared between the models, and the \(c_r > 0\)
are individual variance parameters. The idea is that while validation errors at
different rung levels may be scaled and shifted, they should still exhibit
similar dependencies on the hyperparameters. The noise variance \(\sigma^2\)
used in the Gaussian likelihood is the same across all data. However, if
search_options["separate_noise_variances"] = True
, different noise
variances \(\sigma_r^2\) are used for data at different rung levels.
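For illustration, here is how the independent-GP model might be selected, with separate noise variances per rung level. The remaining arguments follow the hypothetical attribute names used in the earlier sketches, not the launcher script:
from syne_tune.optimizer.baselines import MOBSTER

scheduler = MOBSTER(
    config_space,
    type="promotion",
    search_options=dict(
        model="gp_independent",          # one GP per rung level
        separate_noise_variances=True,   # individual noise variance per rung level
    ),
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)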
Multi-Task Gaussian Process Models
A more advanced set of learning curve surrogate models is obtained by
search_options["model"] = "gp_multitask"
(which is the default for
asynchronous MOBSTER). In this case, a single Gaussian process model
represents \(f(\mathbf{x}, r)\) directly, with mean function
\(\mu(\mathbf{x}, r)\) and covariance function
\(k((\mathbf{x}, r), (\mathbf{x}', r'))\). The GP model is selected by
search_options["gp_resource_kernel"]
, currently supported options are
"exp-decay-sum"
, "exp-decay-combined"
, "exp-decay-delta1"
,
"freeze-thaw"
, "matern52"
, "matern52-res-warp"
,
"cross-validation"
. The default choice is "exp-decay-sum"
, which is
inspired by the exponential decay model proposed
here. Details about these different
models are given here and in
the source code.
Decision-making is somewhat more expensive with "gp_multitask"
than with
"gp_independent"
, because the notorious cubic scaling of GP inference
applies over observations made at all rung levels. However, the extra cost is
limited by the fact that most observations by far are made at the lowest
resource level \(r_{min}\) anyway.
Additive Gaussian Models
Two additional models are selected by
search_options["model"] = "gp_expdecay"
and
search_options["model"] = "gp_issm"
. The former is the exponential
decay model proposed here, the latter is
a variant thereof. These additive Gaussian models represent dependencies across
\(r\) in a cheaper way than in "gp_multitask"
, and they can be fit to
all observed data, not just at rung levels. Also, joint sampling is cheap.
However, at this point, additive Gaussian models remain experimental, and they will not be further discussed here. They can be used with MOBSTER, but not with Hyper-Tune.
Asynchronous MOBSTER
MOBSTER combines ASHA and
asynchronous Hyperband with GP-based Bayesian optimization. A Gaussian process
learning curve surrogate model is fit to the data at all rung levels, and
posterior predictive distributions are used in order to compute acquisition
function values and decide on which configuration to start next. We distinguish
between MOBSTER-JOINT with a GP multi-task model ("gp_multitask"
) and
MOBSTER-INDEP with an independent GP model ("gp_independent"
), as detailed
above. The acquisition function is expected improvement (EI) at the rung level
\(r_{acq}\) also used by BOHB.
Our launcher script runs (asynchronous)
MOBSTER-JOINT if method="MOBSTER-JOINT"
. The searcher can be configured
with search_options
, but MOBSTER-JOINT with the "exp-decay-sum"
covariance model is the default.
API docs:
Baseline: MOBSTER
Additional arguments: HyperbandScheduler
search_options: GPMultiFidelitySearcher
As shown below, MOBSTER can outperform ASHA significantly. This is achieved by starting many fewer trials that are stopped very early (after 1 epoch) due to poor performance. Essentially, MOBSTER rapidly learns some important properties about the NASBench-201 problem and avoids basic mistakes which random sampling of configurations runs into at a constant rate. While ASHA stops such poor trials early, they still take away resources, which MOBSTER can spend on longer evaluations of more promising configurations. This advantage of model-based over random-sampling-based multi-fidelity methods is even more pronounced when starting and stopping jobs comes with delays. Such delays are typically present in real-world distributed systems, but are absent in our simulations.
Different from BOHB, MOBSTER takes pending evaluations into account, i.e. trials which have been started but have not returned metric values yet. This is done by integrating out their metric values via Monte Carlo: we draw a number of joint samples over the pending targets and average the acquisition function over them. In the context of multi-fidelity, if a trial is running, a pending evaluation is registered for the next rung level it will reach.
Why is the surrogate model in MOBSTER-JOINT fit to the data at rung levels
only? After all, training scripts tend to report validation errors after each
epoch, so why not use all of this data? Syne Tune allows you to do so (for the
"gp_multitask" model), by passing searcher_data="all" when creating
the HyperbandScheduler (an intermediate choice is
searcher_data="rungs_and_last"). However, while this may
lead to a more accurate model, it also becomes more expensive to fit, and it
does not tend to make a difference, so the default searcher_data="rungs" is
recommended.
Finally, we can also combine ASHA with
BOHB decision-making, by choosing
searcher="kde"
in
HyperbandScheduler
. This is an
asynchronous version of BOHB.
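As a sketch, using the same hypothetical attribute names as above, this asynchronous BOHB variant could be configured as follows:
from syne_tune.optimizer.schedulers import HyperbandScheduler

# ASHA scheduling combined with BOHB (KDE-based) decision-making
scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",
    type="stopping",
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)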
MOBSTER-INDEP
Our launcher script runs
(asynchronous) MOBSTER-INDEP if method="MOBSTER-INDEP"
. The independent
GPs model is selected by search_options["model"] = "gp_independent"
.
MOBSTER tends to perform slightly better with a joint multi-task GP model than
with an independent GPs model, justifying the Syne Tune default. In our
experience so far, changing the covariance model in MOBSTER-JOINT has only
marginal impact.
API docs:
Baseline: MOBSTER
Additional arguments: HyperbandScheduler
search_options: GPMultiFidelitySearcher (here, we use search_options["model"] = "gp_independent")
MOBSTER and Hyperband
Just like ASHA can be run with multiple brackets,
so can MOBSTER, simply by selecting brackets
when creating
HyperbandScheduler
. In our experience so
far, just like with ASHA, MOBSTER tends to work best with a single bracket.
Controlling MOBSTER Computations
MOBSTER often outperforms ASHA substantially. However, when applied to a problem where many evaluations can be done, fitting the GP surrogate model to all observed data can become slow. In fact, Gaussian process inference scales cubically in the number of observations. The amount of computation spent by MOBSTER can be controlled:
Setting the limit max_size_data_for_model: Once the total number of observations is above this limit, the data is sampled down to this size. This is done in a way which retains all observations from trials which reached higher rung levels, while data from trials stopped early are more likely to be removed. This down sampling is redone every time the surrogate model is fit, so that new data (especially at higher rungs) is taken into account. Also, scheduling decisions about stopping, pausing, or promoting trials are always done based on all data. The default value for max_size_data_for_model is DEFAULT_MAX_SIZE_DATA_FOR_MODEL. It can be changed by passing search_options = {"max_size_data_for_model": XYZ} when creating the MOBSTER scheduler. You can switch off the limit mechanism by passing None or a very large value. As the current default value is on the smaller end, to ensure fast computations, you may want to experiment with larger values as well.
Parameters opt_skip_init_length, opt_skip_period: When fitting the GP surrogate model, the most expensive computation by far is refitting its own parameters, such as kernel parameters. The frequency of this computation can be regulated, as detailed here.
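As a sketch, both mechanisms can be configured via search_options. The particular values below are arbitrary examples, not recommendations, and the attribute names are the hypothetical ones used throughout this page:
from syne_tune.optimizer.baselines import MOBSTER

scheduler = MOBSTER(
    config_space,
    type="promotion",
    search_options={
        "max_size_data_for_model": 500,  # subsample data beyond this size
        "opt_skip_init_length": 150,     # refit GP parameters every time while data is small
        "opt_skip_period": 3,            # afterwards, refit only every 3rd update
    },
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)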
Hyper-Tune
Hyper-Tune is a model-based extension of
ASHA with some additional features compared to MOBSTER. It can be seen as
extending MOBSTER-INDEP (with the "gp_independent"
surrogate model) in two
ways. First, it uses an acquisition function based on an ensemble predictive
distribution, while MOBSTER relies on the \(r_{acq}\) heuristic from BOHB.
Second, if multiple brackets are used (Hyperband case), Hyper-Tune offers an
adaptive mechanism to sample the bracket for a new trial. Both extensions are
based on a quantification of consistency of data on different rung levels, which
is used to weight rung levels according to their reliability for making
decisions (namely, which configuration \(\mathbf{x}\) and bracket
\(r_{min}\) to associate with a new trial).
Our launcher script runs Hyper-Tune
if method="HYPERTUNE-INDEP"
. The searcher can be configured with
search_options
, but the independent GPs model "gp_independent"
is the
default. In this example, Hyper-Tune is using a single bracket, so the
difference to MOBSTER-INDEP is due to the ensemble predictive distribution for
the acquisition function.
Syne Tune also implements Hyper-Tune with the GP multi-task surrogate models
used in MOBSTER. In result plots for this tutorial, original Hyper-Tune is
called HYPERTUNE-INDEP, while this latter variant is called HYPERTUNE-JOINT.
Our launcher script runs this variant
if method="HYPERTUNE-JOINT"
.
API docs:
Baseline: HyperTune
Additional arguments: HyperbandScheduler
search_options: HyperTuneSearcher (search_options["model"] = "gp_independent" by default, but HYPERTUNE-JOINT uses "gp_multitask")
Finally, computations of Hyper-Tune can be controlled in the same way as in MOBSTER.
Hyper-Tune with Multiple Brackets
Just like ASHA and MOBSTER, Hyper-Tune can also be run with multiple brackets,
simply by using the brackets
argument of
HyperbandScheduler
. If brackets > 1
,
Hyper-Tune samples the bracket for a new trial from an adaptive distribution
closely related to the ensemble distribution used for acquisitions. Our
launcher script runs Hyper-Tune with 4
brackets if method="HYPERTUNE4-INDEP"
.
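As a sketch, Hyper-Tune with four brackets and the independent-GP surrogate (its default) might be created like this, again with the hypothetical attribute names used throughout this page:
from syne_tune.optimizer.baselines import HyperTune

scheduler = HyperTune(
    config_space,
    type="promotion",
    brackets=4,
    search_options={"model": "gp_independent"},
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)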
Recall that both ASHA and MOBSTER tend to work better for one than for multiple brackets. This may well be due to the fixed, non-adaptive distribution that brackets are sampled from. Ideally, a method would learn over time whether a low rung level tends to be reliable in predicting the ordering at higher ones, or whether it should rather be avoided (and \(r_{min}\) should be increased). This is what the adaptive mechanism in Hyper-Tune tries to do. In our comparisons, we find that HYPERTUNE-INDEP with multiple brackets can outperform MOBSTER-JOINT with a single bracket.
Details
In this section, we provide some details about Hyper-Tune and our implementation. The Hyper-Tune extensions are based on a quantification of consistency of data on different rung levels. For example, assume that \(r < r_{*}\) are two rung levels, with sufficiently many points at \(r_{*}\). If \(\mathcal{X}_{*}\) collects trials with data at \(r_{*}\), all these have also been observed at \(r\). Sampling \(f(\mathcal{X}_{*}, r)\) from the posterior distribution of the surrogate model, we can compare the ordering of these predictions at \(r\) with the ordering of observations at \(r_{*}\), using a pair-wise ranking loss. A large loss value means frequent cross-overs of learning curves between \(r\) and \(r_{*}\), and predictions at rung level \(r\) are unreliable when it comes to the ordering of trials \(\mathcal{X}_{*}\) at \(r_{*}\).
At any point during the algorithm, denote by \(r_{*}\) the largest rung
level with a sufficient number of observations (our implementation requires 6
points). Assuming that \(r_{*} > r_{min}\), we can estimate a distribution
\([\theta_r]\) over rung levels \(\mathcal{R}_{*} =
\{r\in\mathcal{R}\, |\, r\le r_{*}\}\) as follows. We draw \(S\) independent
samples from the model at these rung levels. For each sample \(s\), we
compute loss values \(l_{r, s}\) for \((r, r_{*})\) over all
\(r\in\mathcal{R}_{*}\), and determine the argmin
indicator
\([\text{I}_{l_{r, s} = m_s}]\), where
\(m_s = \text{min}(l_{r, s} | r\in\mathcal{R}_{*})\). The distribution
\([\theta_r]\) is obtained as normalized sum of these indicators over
\(s=1,\dots, S\). We also need to compute loss values \(l_{r_{*}, s}\);
this is done using a cross-validation approximation, see
here or the code in
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune
for details. In the beginning, with too little data at the second rung level,
we use \(\theta_{r_{min}} = 1\) and 0 elsewhere.
Decisions about a new configuration are based on an acquisition function over a predictive distribution indexed by \(\mathbf{x}\) alone. For Hyper-Tune, an ensemble distribution with weighting distribution \([\theta_r]\) is used. Sampling from this distribution works by first sampling \(r\sim [\theta_r]\), then \(f(\mathbf{x}) = f(\mathbf{x}, r)\) from the predictive distribution for that \(r\). This means that models from all rung levels are potentially used, weighted by how reliable they predict the ordering at the highest level \(r_{*}\) supported by data. In our experiments so far, this adaptive weighting can outperform the \(r_{acq}\) heuristic used in BOHB and MOBSTER.
Note that our implementation generalizes Hyper-Tune in that ranking losses and \([\theta_r]\) are estimated once \(r_{*} > r_{min}\) (i.e., once \(r_{*}\) is equal to the second rung level). In the original work, one has to wait until \(r_{*} = r_{max}\), i.e. the maximum rung level is supported by enough data. We find that for many expensive tuning problems, early decision-making can make a large difference, so if the Hyper-Tune extensions provide benefits, they should be used as early during the experiment as possible. For example, in the trial plots for Hyper-Tune shown above, it takes more than 10000 seconds for 6 trials to reach the full 200 epochs, so in the original variant of Hyper-Tune, advanced decision-making only starts when more than half of the experiment is already done.
If Hyper-Tune is used with more than one bracket, the \([\theta_r]\) is also used in order to sample the bracket for a new trial. To this end, we need to determine a distribution \(P(r)\) over all rung levels which feature as \(r_{min}\) in a bracket. In our NASBench-201 example, if Hyper-Tune is run with 5 brackets, the support of \(P(r)\) would be \(\mathcal{S} = \{1, 3, 9, 27, 81\}\). Also, denote the default distribution used in ASHA and MOBSTER by \(P_0(r)\). Let \(r_0 = \text{min}(r_{*}, \text{max}(\mathcal{S}))\). For \(r\in\mathcal{S}\), we define \(P(r) = M \theta_r / r\) for \(r\le r_0\), and \(P(r) = P_0(r)\) for \(r > r_0\), where \(M = \sum_{r\in\mathcal{S}, r\le r_0} P_0(r)\). In other words, we use \(\theta_r / r\) for rung levels supported by data, and the default \(P_0(r)\) elsewhere. Once more, this slightly generalizes Hyper-Tune.
DyHPO
DyHPO is another recent model-based
multi-fidelity method. It is a promotion-based scheduler like the ones above
with type="promotion", but differs from MOBSTER and Hyper-Tune in that
promotion decisions are based on the surrogate model, not on the
quantile-based rule of successive halving. In a nutshell:
Rung levels are equispaced: \(\mathcal{R} = \{ r_{min}, r_{min} + \nu, r_{min} + 2 \nu, \dots \}\). If \(r_{min} = \nu\), this means that a trial which is promoted or started from scratch always runs for \(\nu\) resources, independent of its current rung level.
Once a worker is free, we can either promote a paused trial or start a new one. In DyHPO, all paused trials compete with a number of new configurations for the next \(\nu\) resources to be spent. The scoring criterion is a special version of expected improvement, so depends on the surrogate model.
Different to MOBSTER, the surrogate model is used more frequently. Namely, in MOBSTER, if any trial can be promoted, the surrogate model is not accessed. This means that DyHPO comes with higher decision-making costs, which need to be controlled.
Since scoring trials paused at the highest rung populated so far requires extrapolation in terms of resource \(r\), it cannot be used with
search_options["model"] = "gp_independent"
. The other surrogate models are supported.
Our implementation of DyHPO differs from the published work in a number of important points:
DyHPO uses an advanced surrogate model based on a neural network covariance kernel which is fitted to the current data. Our implementation supports DyHPO with the GP surrogate models detailed above, except for "gp_independent".
Our decision rule is different from DyHPO as published, and can be seen as a hybrid between DyHPO and ASHA. Namely, we throw a coin with outcomes \(\{0, 1\}\), where the probability \(P_1\) of outcome 1 is configurable as probability_sh. If this gives 1, we try to promote a trial using the ASHA rule based on quantiles. Here, the quantile thresholds are adjusted to the linear spacing of rung levels. If no trial can be promoted this way, we fall back to the DyHPO rule. If the coin comes up 0, we use the DyHPO rule. The algorithm as published is obtained for \(P_1 = 0\). However, we find that a non-zero probability_sh is crucial for obtaining robust behaviour, since the original DyHPO rule on its own tends to start too many trials at the beginning before promoting any paused ones.
Since in DyHPO, the surrogate model is used more frequently than in MOBSTER, it is important to control surrogate model computations, as detailed above. Apart from the default for max_size_data_for_model, we also use opt_skip_period = 3 as the default for DyHPO.
API docs:
Baseline: DyHPO
Additional arguments: HyperbandScheduler
search_options: GPMultiFidelitySearcher (note that search_options["model"] must not be equal to "gp_independent")
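For illustration, a DyHPO scheduler might be created as follows, again with the hypothetical attribute names used throughout this page (recall that "gp_independent" is not supported here):
from syne_tune.optimizer.baselines import DyHPO

scheduler = DyHPO(
    config_space,
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    search_options={"model": "gp_multitask"},  # "gp_independent" is not supported
)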
Comparison of Methods
In this section, we present an empirical comparison of all methods discussed in this tutorial. The methodology of our study is as follows:
We use the NASBench-201 benchmark (CIFAR100 dataset)
All methods are run with a max_wallclock_time limit of 6 hours (or 21600 seconds). We plot the minimum validation error attained as a function of wallclock time (which, in our case, is simulated time)
Results are aggregated over a number of repetitions. The number of repetitions is 50 for SYNCSH, SYNCHB, BOHB, DEHB, ASHA-STOP, ASHA-PROM, ASHA6-STOP and SYNCMOBSTER, while MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP and HYPERTUNE-JOINT are repeated 30 times. Figures plot the interquartile mean in bold and a bootstrap 95% confidence interval for this estimator in dashed lines (the IQM is a robust estimator of the mean, but depends on more data than the median)
SYNCSH, ASHA-STOP, ASHA-PROM, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP use 1 bracket, HYPERTUNE4-INDEP, HYPERTUNE-JOINT use 4 brackets, and SYNCHB, BOHB, DEHB, SYNCMOBSTER use the maximum of 6 brackets
In SYNCSH, SYNCHB, ASHA-STOP, ASHA-PROM, ASHA6-STOP, new configurations are drawn at random, while BOHB, SYNCMOBSTER, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP, HYPERTUNE-JOINT are variants of Bayesian optimization. In DEHB, configurations in the first bracket are drawn at random, but in later brackets, they are evolved from earlier ones
ASHA-STOP, ASHA6-STOP use early stopping, while SYNCSH, SYNCHB, BOHB, SYNCMOBSTER, ASHA-PROM, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP, HYPERTUNE-JOINT use pause-and-resume. DEHB is a synchronous method, but does not resume trials from checkpoints (except in the very first bracket)
Here are results, grouped by synchronous decision-making, asynchronous decision-making (promotion type), and asynchronous decision-making (stopping type). ASHA-PROM results are repeated in all plots for reference.
Synchronous Multi-fidelity HPO
Asynchronous Multi-fidelity HPO (promotion)
Asynchronous Multi-fidelity HPO (stopping)
These results are obtained on a single benchmark with a rather small configuration space. Nevertheless, they are roughly in line with results we obtained on a larger range of benchmarks. A few conclusions can be drawn, which may help readers choosing the best HPO method and its configuration for their own problem.
Asynchronous methods generally outperform synchronous ones, in particular when it comes to any-time performance. A notable exception (on this benchmark) is SYNCMOBSTER, which performs on par with the best asynchronous methods.
Among the synchronous methods, SYNCMOBSTER performs best, followed by BOHB. SYNCHB and SYNCSH perform very similarly. The performance of DEHB is somewhat disappointing on this benchmark.
The best-performing methods on this benchmark are MOBSTER-JOINT and HYPERTUNE1-INDEP, with HYPERTUNE4-INDEP a close runner-up. For MOBSTER, the joint multi-task surrogate model should be preferred, while for HYPERTUNE, the independent GPs model works better.
On this benchmark, moving to multiple brackets does not pay off for the asynchronous methods. However, on benchmarks where the choice of \(r_{min}\) is more critical, moving beyond successive halving can be beneficial. In such cases, we currently recommend using HYPERTUNE-INDEP, whose adaptive weighting and bracket sampling is clearly more effective than the simpler heuristics used in Hyperband or BOHB.
Benchmarking in Syne Tune
Benchmarking refers to the comparison of a range of HPO algorithms on one or more tuning problems, or benchmarks. This tutorial provides an overview of tooling which facilitates benchmarking of HPO algorithms in Syne Tune. The same tooling can be used to rapidly create launcher scripts for any HPO experiment, allowing you to easily switch between local, SageMaker, and simulator backend. The tutorial also shows how any number of experiments can be run in parallel, in order to obtain desired results faster.
Note
In order to run the code in this tutorial, you need to have
installed Syne Tune from source.
Also, make sure to have installed the blackbox-repository
dependencies.
Note
Benchmarking (i.e., comparing different HPO methods) uses the Syne Tune
experimentation framework in syne_tune.experiments. In this framework,
a benchmark is simply a tuning problem endowed with some defaults.
There are other use cases of experimentation besides benchmarking (see
here and here), but the term benchmark for a tuning problem is used in
all of them.
Benchmarking with Simulator Backend
The fastest and cheapest way to compare a number of different HPO methods, or
variants thereof, is benchmarking with the simulator backend. In this case,
all training evaluations are simulated by querying metric and time values from
a tabulated blackbox or a surrogate model. Not only are expensive computations
on GPUs avoided, but the experiment also runs faster than real time. In some
cases, results for experiments with max_wallclock_time
of several hours
can be obtained in a few seconds.
Note
In order to use surrogate benchmarks and the simulator backend, you need
to have the blackbox-repository
dependencies installed, as detailed
here.
For the YAHPO blackbox, you also need the yahpo
dependencies. Note that
the first time you use a surrogate benchmark, its data files are downloaded
and stored to your S3 bucket, which can take a considerable amount of time.
The next time you use the benchmark, it is loaded from your local disk or
your S3 bucket, which is fast.
Note
The experimentation framework in syne_tune.experiments
which is used
here, is not limited to benchmarking (i.e., comparing the performance
between different HPO methods), but is also the default way to run many
experiments in parallel, say with different configuration spaces. This is
explained more in
this tutorial.
Defining the Experiment
As usual in Syne Tune, the experiment is defined by a number of scripts. We
will look at an example in
benchmarking/examples/benchmark_hypertune/.
Common code used in these benchmarks can be found in syne_tune.experiments
.
Local launcher:
syne_tune.experiments.launchers.hpo_main_simulator
Remote launcher:
syne_tune.experiments.launchers.launch_remote_simulator
Benchmark definitions:
syne_tune.experiments.benchmark_definitions
Let us look at the scripts in order, and how you can adapt them to your needs:
benchmarking/examples/benchmark_hypertune/baselines.py: Defines the HPO methods to take part in the experiment, in the form of a dictionary methods which maps method names to factory functions, which in turn map MethodArguments to scheduler objects. The MethodArguments class contains the union of attributes needed to configure schedulers. In particular, scheduler_kwargs contains constructor arguments. For your convenience, the mapping from MethodArguments to schedulers is defined for most baseline methods in syne_tune.experiments.default_baselines (as noted just below, this mapping involves merging argument dictionaries), but you can override arguments as well (for example, type in the examples here). Note that if you would like to compare different variants of a method, you need to create different entries in methods; for example, Methods.MOBSTER_JOINT and Methods.MOBSTER_INDEP are different variants of MOBSTER.
benchmarking/examples/benchmark_hypertune/benchmark_definitions.py: Defines the benchmarks to be considered in this experiment, in the form of a dictionary benchmark_definitions with values of type SurrogateBenchmarkDefinition. In general, you will just pick definitions from syne_tune.experiments.benchmark_definitions, unless you are using your own surrogate benchmark not contained in Syne Tune. But you can also modify parameters, for example surrogate and surrogate_kwargs in order to select a different surrogate model, or you can change the defaults for n_workers or max_wallclock_time.
benchmarking/examples/benchmark_hypertune/hpo_main.py: Script for launching experiments locally. All you typically need to do here is to import syne_tune.experiments.launchers.hpo_main_simulator and (optionally) to add additional command line arguments you would like to parameterize your experiment with. In our example here, we add two options: num_brackets, which configures Hyperband schedulers, and num_samples, which configures the Hyper-Tune methods only. Apart from extra_args, you also need to define map_method_args, which modifies method_kwargs (the arguments of MethodArguments) based on the extra arguments. Details for map_method_args are given just below. Finally, main() is called with your methods and benchmark_definitions dictionaries, and (optionally) with extra_args and map_method_args. We will see shortly how the launcher is called, and what happens inside.
benchmarking/examples/benchmark_hypertune/launch_remote.py: Script for launching experiments remotely, in that each experiment runs as its own SageMaker training job, in parallel with other experiments. You need to import syne_tune.experiments.launchers.launch_remote_simulator and pass the same methods, benchmark_definitions, extra_args as in benchmarking.examples.benchmark_hypertune.hpo_main. Moreover, you need to specify paths for source dependencies. If you installed Syne Tune from source, it is easiest to specify source_dependencies=benchmarking.__path__, as this allows access to all benchmarks and examples included there. On top of that, you can pass an indicator function is_expensive_method to tag the HPO methods which are themselves expensive to run. As detailed below, our script runs different seeds (repetitions) in parallel for expensive methods, but sequentially for cheap ones. We will see shortly how the launcher is called, and what happens inside.
benchmarking/examples/benchmark_hypertune/requirements.txt: Dependencies for hpo_main.py to be run remotely as a SageMaker training job, in the context of launching experiments remotely. In particular, this needs the dependencies of Syne Tune itself. A safe bet here is syne-tune[extra] and tqdm (which is the default if requirements.txt is missing). However, you can decrease startup time by narrowing down the dependencies you really need (see FAQ). In our example here, we need gpsearchers and kde for methods. For simulated experiments, you always need to have blackbox-repository here. In order to use YAHPO benchmarks, also add yahpo.
Specifying Extra Arguments
In many cases, you will want to run different methods using their default
arguments, or only change them as part of the definition in baselines.py
.
But sometimes, it can be useful to be able to set options via extra command line
arguments. This can be done via extra_args
and map_method_args
, which are
typically used in order to be able to configure scheduler arguments for certain
methods. But in principle, any argument of
MethodArguments
can be modified. Here,
extra_args
is simply extending arguments to the command line parser, where the name
field contains the name of the option without any leading "-".
map_method_args
has the signature
method_kwargs = map_method_args(args, method, method_kwargs)
Here, method_kwargs
are arguments of
MethodArguments
, which can be modified
by map_method_args
(the modified dictionary is returned). args
is the
result of command line parsing, and method
is the name of the method to
be constructed based on these arguments. The latter argument allows
map_method_args
to depend on the method. In our example
benchmarking/examples/benchmark_hypertune/hpo_main.py,
num_brackets
applies to all methods, while num_samples
only applies
to the variants of Hyper-Tune. Both arguments modify the dictionary
scheduler_kwargs
in MethodArguments
,
which contains constructor arguments for the scheduler.
Note the use of recursive_merge
. This means that the changes done in
map_method_args
are recursively merged into the prior method_kwargs
. In
our example, we may already have method_kwargs.scheduler_kwargs
or even
method_kwargs.scheduler_kwargs.search_options
. While the new settings here
take precedence, prior content of method_kwargs
not affected remains in
place. In the same way, extra arguments passed to baseline wrappers in
syne_tune.experiments.default_baselines
are recursively merged into the
arguments determined by the default logic.
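To make this concrete, here is a rough sketch of how extra_args and map_method_args might look for an option like num_brackets. The names and details are illustrative assumptions, not copied from the example script:
from typing import Any, Dict

from syne_tune.util import recursive_merge

extra_args = [
    # Adds the command line option --num_brackets
    dict(name="num_brackets", type=int, help="Number of brackets"),
]

def map_method_args(args, method: str, method_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Translate the extra command line argument into scheduler_kwargs of
    # MethodArguments; changes are recursively merged into method_kwargs
    if args.num_brackets is not None:
        new_kwargs = {"scheduler_kwargs": {"brackets": args.num_brackets}}
        method_kwargs = recursive_merge(method_kwargs, new_kwargs)
    return method_kwargs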
Note
map_method_args
is applied to rewrite method_kwargs
just before the
method is created. This means that all entries of
MethodArguments
can be modified from
their default values. You can also use map_method_args
independent of
extra_args
(however, if extra_args
is given, then map_method_args
must be given as well).
Writing Extra Results
By default, Syne Tune writes result files metadata.json
, results.csv.zip
,
and tuner.dill
for every experiment, see
here. Here,
results.csv.zip
contains all data reported by training jobs, along with
time stamps. The contents of this dataframe can be customized, by adding extra
columns to it. This is done by passing extra_results_composer
of type
ExtraResultsComposer
when creating the
StoreResultsCallback
callback, which
is passed in callbacks
to Tuner
. You can use this
mechanism by passing a ExtraResultsComposer
object as extra_results
to main
. This object extracts extra information
and returns it as dictionary, which is appended to the results dataframe. A
complete example is
benchmarking/examples/benchmark_dyhpo.
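As a rough sketch only: the exact ExtraResultsComposer interface, its import path, and the attribute accessed below are assumptions and may differ between Syne Tune versions, so treat this as a template rather than a definitive implementation:
from typing import Any, Dict, List, Optional

from syne_tune import Tuner
# Import path is an assumption; adjust to where ExtraResultsComposer lives in your version
from syne_tune.results_callback import ExtraResultsComposer

class MyExtraResults(ExtraResultsComposer):
    """Appends an extra column to the results dataframe (illustrative only)."""

    def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
        # Extract whatever extra information you need from the Tuner object;
        # the entry below is a hypothetical example
        return {"num_trials_started": len(tuner.trial_backend.trial_ids)}

    def keys(self) -> List[str]:
        return ["num_trials_started"]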
Launching Experiments Locally
Here is an example of how simulated experiments are launched locally (if you
installed Syne Tune from source, you need to start the script from the
benchmarking/examples
directory):
python benchmark_hypertune/hpo_main.py \
--experiment_tag tutorial-simulated --benchmark nas201-cifar100 \
--method ASHA --num_seeds 10
This call runs a number of experiments sequentially on the local machine:
experiment_tag: Results of experiments are written to ~/syne-tune/{experiment_tag}/*/{experiment_tag}-*/. This name should conform to S3 conventions (alphanumerical and -; no underscores).
benchmark: Selects the benchmark from the keys of benchmark_definitions. If this is not given, experiments for all keys in benchmark_definitions are run in sequence.
method: Selects the HPO method to run from the keys of methods. If this is not given, experiments for all keys in methods are run in sequence.
num_seeds: Each experiment is run num_seeds times with different seeds (0, ..., num_seeds - 1). Due to random factors both in training and tuning, a robust comparison of HPO methods requires such repetitions. Fortunately, these are cheap to obtain in the simulation context. Another parameter is start_seed (default: 0), giving seeds start_seed, ..., num_seeds - 1. For example, --start_seed 5 --num_seeds 6 runs for a single seed equal to 5. The dependence of random choices on the seed is detailed below.
max_wallclock_time, n_workers: These arguments overwrite the defaults specified in the benchmark definitions.
max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.
scale_max_wallclock_time: If 1, and if n_workers is given as an argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where A = n_workers and B = benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.
use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).
restrict_configurations: See below.
fcnet_ordinal: Applies to FCNet benchmarks only. The hyperparameter hp_init_lr has domain choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]). Since the parameter is really ordinal, this is not a good choice. With this option, the domain can be switched to different variants of ordinal. The default is nn-log, which is the domain logordinal([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]) (this is also the replacement which streamline_config_space() would do). In order to keep the original categorical domain, use --fcnet_ordinal none.
If you defined additional arguments via extra_args
, you can use them
here as well. For example, --num_brackets 3
would run all
multi-fidelity methods with 3 brackets (instead of the default 1).
Launching Experiments Remotely
There are some drawbacks of launching experiments locally. First, they block
the machine you launch from. Second, different experiments are run sequentially,
not in parallel. Remote launching has exactly the same parameters as launching
locally, but experiments are sliced along certain axes and run in parallel,
using a number of SageMaker training jobs. Here is an example (if you
installed Syne Tune from source, you need to start the script from the
benchmarking/examples
directory):
python benchmark_hypertune/launch_remote.py \
--experiment_tag tutorial-simulated --benchmark nas201-cifar100 \
--num_seeds 10
Since --method
is not used, we run experiments for all methods. Also, we
run experiments for 10 seeds. There are 7 methods, so the total number of
experiments is 70 (note that we select a single benchmark here). Running this
command will launch 43 SageMaker training jobs, which do the work in parallel.
Namely, for methods ASHA
, SYNCHB
, BOHB
, all 10 seeds are run
sequentially in a single SageMaker job, since our is_expensive_method
function returns False
for them. Simulating experiments is so fast for
these methods that it is best to run seeds sequentially. However, for
MOBSTER-JOINT
, MOBSTER-INDEP
, HYPERTUNE-INDEP
, HYPERTUNE-JOINT
,
our is_expensive_method
returns True
, and we use one SageMaker
training job for each seed, giving rise to 4 * 10 = 40
jobs running in
parallel. For these methods, the simulation time is quite a bit longer, because
decision making takes more time (these methods fit Gaussian process surrogate
models to data and optimize acquisition functions). Results are written to
~/syne-tune/{experiment_tag}/ASHA/
for the cheap method ASHA
, and to
~/syne-tune/{experiment_tag}/MOBSTER-INDEP-3/
for the expensive method
MOBSTER-INDEP
and seed 3.
The command above selected a single benchmark nas201-cifar100
. If
--benchmark
is not given, we iterate over all benchmarks in
benchmark_definitions
. This is done sequentially, which works fine for a
limited number of benchmarks.
However, you may want to run experiments on a large number of benchmarks, and
to this end also parallelize along the benchmark axis. To do so, you can pass
a nested dictionary as benchmark_definitions
. For example, we could use the
following:
from syne_tune.experiments.benchmark_definitions import (
nas201_benchmark_definitions,
fcnet_benchmark_definitions,
lcbench_selected_benchmark_definitions,
)
benchmark_definitions = {
"nas201": nas201_benchmark_definitions,
"fcnet": fcnet_benchmark_definitions,
"lcbench": lcbench_selected_benchmark_definitions,
}
In this case, experiments are sliced along the axis
("nas201", "fcnet", "lcbench")
to be run in parallel in different SageMaker
training jobs.
Dealing with ResourceLimitExceeded Errors
When launching many experiments in parallel, you may run into your AWS resource
limits, so that no more SageMaker training jobs can be run. The default behaviour
in this case is to wait for 10 minutes and try again. You can influence this by
--estimator_fit_backoff_wait_time <wait_time>
, where <wait_time>
is the
waiting time between attempts in seconds. If this is 0 or negative, the script
terminates with an error once your resource limits are reached.
Pitfalls of Experiments from Tabulated Blackboxes
Comparing HPO methods on tabulated benchmarks, using simulation, has obvious benefits. Costs are very low. Moreover, results are often obtained many times faster than real time. However, we recommend that you do not rely on this kind of benchmarking alone. Here are some pitfalls:
Tabulated benchmarks are often of limited complexity, because more complex benchmarks cannot be sampled exhaustively
Tabulated benchmarks do not reflect the stochasticity of real benchmarks (e.g., random weight initialization, random ordering of mini-batches)
While tabulated benchmarks like nas201 or fcnet are evaluated exhaustively or on a fine grid, other benchmarks (like lcbench) contain observations only at a set of randomly chosen configurations, while their configuration space is much larger or even infinite. For such benchmarks, you can either restrict the scheduler to suggest configurations only from the set supported by the benchmark (see the subsection just below), or you can use a surrogate model which interpolates observations from those contained in the benchmark to all others in the configuration space. Unfortunately, the choice of surrogate model can strongly affect the benchmark, for the same underlying data. As a general recommendation, you should be careful with surrogate benchmarks which offer a large configuration space, but are based on only medium amounts of real data.
Restricting Scheduler to Configurations of Tabulated Blackbox
For a tabulated benchmark like lcbench
, most entries of the configuration
space are not covered by data. For such, you can either use a surrogate, which
can be configured by attributes surrogate
, surrogate_kwargs
, and
add_surrogate_kwargs
of
SurrogateBenchmarkDefinition
.
Or you can restrict the scheduler to only suggest configurations covered by
data. The latter is done by the option --restrict_configurations 1
. The
advantage of doing so is that your comparison does not depend on the choice of
surrogate, but only on the benchmark data itself. However, there are also some
drawbacks:
This option is currently not supported for the following schedulers:
Grid Search
SyncBOHB
BOHB
DEHB
REA
KDE
PopulationBasedTraining
ZeroShotTransfer
ASHACTS
MOASHA
Schedulers like Gaussian process based Bayesian optimization typically use local gradient-based optimization of the acquisition function. This is not possible with --restrict_configurations 1. Instead, they evaluate the acquisition function at a finite number num_init_candidates of points and pick the best one.
In general, you should avoid using surrogate benchmarks which offer a large configuration space, but are based on only medium amounts of real data. When using --restrict_configurations 1 with such a benchmark, your methods may perform better than they should, just because they nearly sample the space exhaustively.
In general, --restrict_configurations 1
is supported for schedulers which
select the next configuration from a finite set. In contrast, methods like
DEHB
or BOHB
(or Bayesian optimization with local acquisition function
optimization) optimize over encoded vectors, then round the solution back to a
configuration. In order to use a tabulated benchmark like lcbench
with these
methods, you need to specify a surrogate. Maybe the least intrusive surrogate
is nearest neighbor. Here is the benchmark definition for lcbench
:
def lcbench_benchmark(dataset_name: str, datasets=None) -> SurrogateBenchmarkDefinition:
"""
The default is to use nearest neighbour regression with ``K=1``. If
you use a more sophisticated surrogate, it is recommended to also
define ``add_surrogate_kwargs``, for example:
.. code-block:: python
surrogate="RandomForestRegressor",
add_surrogate_kwargs={
"predict_curves": True,
"fit_differences": ["time"],
},
:param dataset_name: Value for ``dataset_name``
:param datasets: Used for transfer learning
:return: Definition of benchmark
"""
return SurrogateBenchmarkDefinition(
max_wallclock_time=7200,
n_workers=4,
elapsed_time_attr="time",
metric="val_accuracy",
mode="max",
blackbox_name="lcbench",
dataset_name=dataset_name,
surrogate="KNeighborsRegressor", # 1-nn surrogate
surrogate_kwargs={"n_neighbors": 1},
max_num_evaluations=4000,
datasets=datasets,
max_resource_attr="epochs",
)
The 1-NN surrogate is selected by surrogate="KNeighborsRegressor"
and setting
the number of nearest neighbors to 1. For each configuration, the surrogate finds
the nearest neighbor in the table (w.r.t. Euclidean distance between encoded
vectors) and returns its metric values.
Selecting Benchmarks from benchmark_definitions
Each family of tabulated (or surrogate) blackboxes accessible to the
benchmarking tooling discussed here is represented by a Python file in
syne_tune.experiments.benchmark_definitions (the same directory also
contains definitions for real benchmarks). For example:
NASBench201 (syne_tune.experiments.benchmark_definitions.nas201): Tabulated, no surrogate needed.
FCNet (syne_tune.experiments.benchmark_definitions.fcnet): Tabulated, no surrogate needed.
LCBench (syne_tune.experiments.benchmark_definitions.lcbench): Needs a surrogate model (scikit-learn regressor) to be selected.
YAHPO (syne_tune.experiments.benchmark_definitions.yahpo): Contains a number of blackboxes, some with a large number of instances. All these are surrogate benchmarks, with a special surrogate model.
Typically, a blackbox concerns a certain machine learning algorithm with a fixed configuration space. Many of them have been evaluated over a number of different datasets. Note that in YAHPO, a blackbox is called scenario, and a dataset is called instance, so that a scenario can have a certain number of instances. In our terminology, a tabulated benchmark is obtained by selecting a blackbox together with a dataset.
The files in syne_tune.experiments.benchmark_definitions typically contain:
Functions named *_benchmark, which map arguments (such as dataset_name) to a benchmark definition of type SurrogateBenchmarkDefinition, where * is the name of the blackbox (or scenario).
Dictionaries named *_benchmark_definitions with SurrogateBenchmarkDefinition values. If a blackbox has a lot of datasets, we also define a dictionary *_selected_benchmark_definitions, which selects benchmarks which are interesting (e.g., not all baselines achieving the same performance rapidly). In general, we recommend starting with these selected benchmarks.
The YAHPO Family
A rich source of blackbox surrogates in Syne Tune comes from
YAHPO, which is also detailed in
this paper. YAHPO contains a number of
blackboxes (called scenarios), some of which have a lot of datasets (called
instances). All our definitions are in
syne_tune.experiments.benchmark_definitions.yahpo
. Further details can
also be found in the import code
syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import
.
Here is an overview:
yahpo_nb301: NASBench301. Single scenario and instance.
yahpo_lcbench: LCBench. Same underlying data as our own LCBench, but a different surrogate model.
yahpo_iaml: Family of blackboxes, parameterized by ML method (yahpo_iaml_methods) and target metric (yahpo_iaml_metrics). Each of these has 4 datasets (OpenML datasets).
yahpo_rbv2: Family of blackboxes, parameterized by ML method (yahpo_rbv2_methods) and target metric (yahpo_rbv2_metrics). Each of these comes with a large number of datasets (OpenML datasets). Note that compared to YAHPO Gym, we filtered out scenarios which are invalid (e.g., F1 score 0, AUC/F1 equal to 1). We also determined useful max_wallclock_time values (yahpo_rbv2_max_wallclock_time), and selected benchmarks which show interesting behaviour (yahpo_rbv2_selected_instances).
Note
At present (YAHPO Gym v1.0), the yahpo_lcbench
surrogate has been
trained on invalid LCBench original data (namely, values for first and last
fidelity value have to be removed). As long as this is not fixed, we
recommend using our built-in lcbench
blackbox instead.
Note
In YAHPO Gym, yahpo_iaml and yahpo_rbv2 have a fidelity attribute
trainsize with values between 1/20 and 1, which is the fraction
of the full dataset the method has been trained on. Our import script
multiplies trainsize values by 20 and designates the type randint(1, 20),
since common Syne Tune multi-fidelity schedulers require resource_attr
values to be positive integers. yahpo_rbv2 has a second fidelity attribute
repl, whose value is constant 10; this is removed by our import script.
Benchmarking with Local Backend
A real benchmark (as opposed to a benchmark based on tabulated data or a surrogate model) is based on a training script, which is executed for each evaluation. The local backend is the default choice in Syne Tune for running on real benchmarks.
Note
While Syne Tune contains benchmark definitions for all surrogate benchmarks
in syne_tune.experiments.benchmark_definitions
, examples for real
benchmarks are only available when Syne Tune is installed from source.
They are located in benchmarking
.
Defining the Experiment
As usual in Syne Tune, the experiment is defined by a number of scripts.
We will look at an example in
benchmarking/examples/launch_local/.
Common code used in these benchmarks can be found in
syne_tune.experiments
:
Local launcher:
syne_tune.experiments.launchers.hpo_main_local
Remote launcher:
syne_tune.experiments.launchers.launch_remote_local
Definitions for real benchmarks:
benchmarking.benchmark_definitions
Let us look at the scripts in order, and how you can adapt them to your needs:
benchmarking/examples/launch_local/baselines.py: This is the same as in the simulator case.
benchmarking/examples/launch_local/hpo_main.py: This is the same as in the simulator case, but based on syne_tune.experiments.launchers.hpo_main_local. We will see shortly how the launcher is called, and what happens inside.
benchmarking/examples/launch_local/launch_remote.py: Much the same as in the simulator case, but based on syne_tune.experiments.launchers.launch_remote_local. We will see shortly how the launcher is called, and what happens inside. Note that source_dependencies=benchmarking.__path__, which allows the launcher script to access the training code and benchmark definitions.
benchmarking/examples/launch_local/requirements-synetune.txt: This file is for defining the requirements of the SageMaker training job in remote launching; it mainly has to contain the Syne Tune dependencies. Your training script may have additional dependencies, and they are combined with the ones here automatically, as detailed below.
Extra arguments can be specified by extra_args
, map_method_args
, and
extra results can be written using extra_results
, as is explained
here.
Launching Experiments Locally
Here is an example of how experiments with the local backend are launched locally:
python benchmarking/examples/launch_local/hpo_main.py \
--experiment_tag tutorial-local --benchmark resnet_cifar10 \
--method ASHA --num_seeds 1 --n_workers 1
This call runs a single experiment on the local machine (which needs to have a GPU, with PyTorch installed):
experiment_tag: Results of experiments are written to ~/syne-tune/{experiment_tag}/*/{experiment_tag}-*/. This name should conform to S3 conventions (alphanumerical and -; no underscores).
benchmark: Selects the benchmark from the keys of real_benchmark_definitions(). The default is resnet_cifar10.
method: Selects the HPO method to run from the keys of methods. If this is not given, experiments for all keys in methods are run in sequence.
num_seeds: Each experiment is run num_seeds times with different seeds (0, ..., num_seeds - 1). Due to random factors both in training and tuning, a robust comparison of HPO methods requires such repetitions. Another parameter is start_seed (default: 0), giving seeds start_seed, ..., num_seeds - 1. For example, --start_seed 5 --num_seeds 6 runs for a single seed equal to 5.
n_workers, max_wallclock_time: You can overwrite the default values for the selected benchmark by these command line arguments.
max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.
num_gpus_per_trial: If you run on an instance with more than one GPU, you can prescribe how many GPUs should be allocated to each trial. The default is 1. Note that if the product of n_workers and num_gpus_per_trial is larger than the number of GPUs on the instance, trials will be delayed.
gpus_to_use: Allows restricting the GPUs used by Syne Tune. For example, if your instance has 8 GPUs, but you want Syne Tune to use only the last four of them, use gpus_to_use=[4, 5, 6, 7].
delete_checkpoints: If 1, checkpoints of trials are removed whenever they are not needed anymore. The default is 0, in which case all checkpoints are retained.
scale_max_wallclock_time: If 1, and if n_workers is given as an argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where A = n_workers and B = benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.
use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).
If you defined additional arguments via extra_args
, you can use them here
as well.
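For instance, a variant of the call above that exercises several of these arguments could look as follows (the flag names come from the list above; the values are purely illustrative):
python benchmarking/examples/launch_local/hpo_main.py \
    --experiment_tag tutorial-local --benchmark resnet_cifar10 \
    --method ASHA --start_seed 0 --num_seeds 3 \
    --n_workers 2 --num_gpus_per_trial 2 \
    --delete_checkpoints 1 --scale_max_wallclock_time 1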
Note
When launching an experiment locally, you need to be on an instance which
supports the required computations (e.g., has 1 or more GPUs), and you need
to have installed all required dependencies, including those of the
SageMaker framework. In the example above, resnet_cifar10
uses the
PyTorch
framework, and n_workers=4
by default, which we overwrite by
n_workers=1
: you need to launch on a machine with 1 GPU, and with
PyTorch being installed and properly setup to run GPU computations. If you
cannot be bothered with all of this, please consider
remote launching as an
alternative. On the other hand, you can launch experiments locally without
using SageMaker (or AWS) at all.
Benchmark Definitions
In the example above, we select a benchmark via --benchmark resnet_cifar10
.
All currently included real benchmarks are collected in
real_benchmark_definitions()
,
a function which returns the dictionary of real benchmarks, configured by some
extra arguments. If you are happy with selecting one of these existing benchmarks,
you may safely skip this subsection.
For resnet_cifar10
, this selects
resnet_cifar10_benchmark()
,
which returns meta-data for the benchmark as a
RealBenchmarkDefinition
object. Here, the argument sagemaker_backend
is False
in our case,
since we use the local backend, and additional **kwargs
override arguments
of RealBenchmarkDefinition
. Important arguments are:
script: Absolute filename of the training script. If your script requires additional dependencies on top of the SageMaker framework, you need to specify them in requirements.txt in the same directory.
config_space: Configuration space; this must include max_resource_attr.
metric, mode, max_resource_attr, resource_attr: Names related to the benchmark, either of metrics reported (output) or of config_space entries (input).
max_wallclock_time, n_workers, max_num_evaluations: Defaults for tuner or stopping criterion, suggested for this benchmark.
instance_type: Suggested AWS instance type for this benchmark.
framework, estimator_kwargs: SageMaker framework and additional arguments to the SageMaker estimator.
Note that parameters like n_workers
, max_wallclock_time
, or
instance_type
are given default values here, which can be overwritten
by command line arguments. This is why the function signature ends with
**kwargs
, and we execute _kwargs.update(kwargs)
just before creating
the RealBenchmarkDefinition
object.
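As a rough illustration, here is a minimal sketch of such a benchmark definition function. It only mirrors the arguments discussed above; the import path of RealBenchmarkDefinition, the hypothetical training script my_training_script.py, and the configuration space are assumptions for illustration, not the actual resnet_cifar10 code:
from pathlib import Path

from syne_tune.config_space import loguniform, randint
from syne_tune.experiments.benchmark_definitions import RealBenchmarkDefinition


def my_benchmark(sagemaker_backend: bool = False, **kwargs) -> RealBenchmarkDefinition:
    # Hypothetical configuration space; "epochs" plays the role of max_resource_attr
    config_space = {
        "lr": loguniform(1e-4, 1e-1),
        "batch_size": randint(8, 256),
        "epochs": 27,
    }
    _kwargs = dict(
        script=str(Path(__file__).parent / "my_training_script.py"),
        config_space=config_space,
        metric="accuracy",
        mode="max",
        max_resource_attr="epochs",
        resource_attr="epoch",
        max_wallclock_time=3 * 3600,
        n_workers=4,
        # A single-GPU instance suffices if each worker gets its own instance
        instance_type="ml.g4dn.xlarge" if sagemaker_backend else "ml.g4dn.12xlarge",
        framework="PyTorch",
    )
    _kwargs.update(kwargs)  # command line arguments can overwrite these defaults
    return RealBenchmarkDefinition(**_kwargs)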
Launching Experiments Remotely
Remote launching is particularly convenient for experiments with the local backend, even if you just want to run a single experiment. For local launching, you need to be on an EC2 instance of the desired instance type, and Syne Tune has to be installed there along with all dependencies of your benchmark. None of this needs to be done for remote launching. Here is an example:
python benchmarking/examples/launch_local/launch_remote.py \
--experiment_tag tutorial-local --benchmark resnet_cifar10 \
--num_seeds 5
Since --method
is not used, we run experiments for all methods (RS
,
BO
, ASHA
, MOBSTER
), and for 5 seeds. These are 20 experiments,
which are mapped to 20 SageMaker training jobs. These will run on instances of
type ml.g4dn.12xlarge
, which is the default for resnet_cifar10
and the
local backend. Instances of this type have 4 GPUs, so we can use n_workers
up to 4 (the default being 4). Results are written to S3, using paths such as
syne-tune/{experiment_tag}/ASHA-3/
for method ASHA
and seed 3.
Finally, some readers may be puzzled why Syne Tune dependencies are defined in
benchmarking/examples/launch_local/requirements-synetune.txt
, and not in
requirements.txt
instead. The reason is that the dependencies of the SageMaker estimator for running the experiment locally are really the union of two such files: first, requirements-synetune.txt for the Syne Tune dependencies, and second, requirements.txt next to the training script. The remote launching script creates a requirements.txt file with this union in
benchmarking/examples/launch_local/
, which should not become part of the
repository.
Visualizing Tuning Metrics in the SageMaker Training Job Console
When experiments are launched remotely with the local or SageMaker backend, a
number of metrics are published to the SageMaker training job console (this
feature can be switched off with --remote_tuning_metrics 0
):
BEST_METRIC_VALUE: Best metric value attained so far
BEST_TRIAL_ID: ID of trial for best metric value so far
BEST_RESOURCE_VALUE: Resource value for best metric value so far
BEST_HP_PREFIX, followed by hyperparameter name: Hyperparameter value for best metric value so far
You can inspect these metrics in real time in AWS CloudWatch. To do so:
Locate the training job running your experiment in the AWS SageMaker console. Click on Training, then Training jobs, then on the job in the list. For the command above, the jobs are named like tutorial-local-RS-0-XyK8 (experiment tag, then method, then seed, then 4-character hash).
Under Metrics, you will see a number of entries, starting with best_metric_value and best_trial_id.
Further below, under Monitor, click on View algorithm metrics. This opens a CloudWatch dashboard.
At this point, you need to change a few defaults, since CloudWatch only samples metrics (by grepping the logs) every 5 minutes and then displays average values over the 5-minute window. Click on Browse and select the metrics you want to display. For now, select best_metric_value, best_trial_id, best_resource_value.
Click on Graphed metrics, and for every metric, select Period -> 30 seconds. Also, select Statistics -> Maximum for metrics best_trial_id, best_resource_value. For best_metric_value, select Statistics -> Minimum if your objective metric is minimized (mode="min"), and Statistics -> Maximum otherwise. In our resnet_cifar10 example, the objective is accuracy, to be maximized, so we select the latter.
Finally, select 10s for auto-refresh (the circle with arrow in the upper right corner), and change the temporal resolution by displaying 1h (top row).
This visualization shows you the best metric value attained so far, and which
trial attained it for which resource value (e.g., number of epochs). It can be
improved. For example, we could plot the curves in different axes. Also, we can
visualize the best hyperparameter configuration found so far. In the
resnet_cifar10
example, this is given by the metrics best_hp_lr
,
best_hp_batch_size
, best_hp_weight_decay
, best_hp_momentum
.
Random Seeds and Paired Comparisons
Random effects are the most important reason for variations in experimental outcomes, which is why a meaningful comparison of HPO methods requires a number of repetitions (also called seeds above). There are two types of random effects:
Randomness in the evaluation of the objective \(f(x)\) to optimize: repeated evaluations of \(f\) for the same configuration \(x\) result in different metric values. In neural network training, these variations originate from random weight initialization and the ordering of mini-batches.
Randomness in the HPO algorithm itself. This is evident for random search and ASHA, but just as well concerns Bayesian optimization, since the initial configurations are drawn at random, and the optimization of the acquisition function involves random choices as well.
Syne Tune allows the second source of randomness to be controlled by passing
a random seed to the scheduler at initialization. If random search is run
several times with the same random seed for the same configuration space,
exactly the same sequence of configurations is suggested. The same holds for ASHA.
When running random search and Bayesian optimization with the same random seed,
the initial configurations (which in BO are either taken from
points_to_evaluate
or drawn at random) are identical.
The scheduler random seed used in a benchmark experiment is a combination of
a master random seed and the seed number introduced above (the latter has
values \(0, 1, 2, \dots\)). The master random seed is passed to
launch_remote.py
or hpo_main.py
as --random_seed
. If no master
random seed is passed, it is drawn at random and output. The master random
seed is also written into metadata.json
as part of experimental results.
Importantly, the scheduler random seed is the same across different methods
for the same seed. This implements a practice called paired comparison,
whereby for each seed, different methods are fed with the same random number
sequence. This practice reduces variance between method outcomes, while
still taking account of randomness by running the experiment several times
(for different seeds \(0, 1, 2, \dots\)).
Note
When comparing several methods on the same benchmark, it is recommended
to (a) repeat the experiment several times (via --num_seeds
), and
to (b) use the same master random seed. If all comparisons are done
with a single call of launch_remote.py
or hpo_main.py
, this is
automatically the case, as the master random seed is drawn at random.
However, if the comparison extends over several calls, make sure to
note down the master random seed from the first call and pass this
value via --random_seed
to subsequent calls. The master random seed
is also stored as random_seed
in the metadata metadata.json
as
part of experimental results.
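For example, a follow-up call extending an earlier comparison could reuse the master random seed noted down from the first call (placeholder shown in angle brackets):
python benchmarking/examples/launch_local/launch_remote.py \
    --experiment_tag tutorial-local --benchmark resnet_cifar10 \
    --method MOBSTER --num_seeds 5 --random_seed <MASTER-SEED-FROM-FIRST-CALL>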
Benchmarking with SageMaker Backend
The SageMaker backend allows you to run distributed tuning across several instances, where the number of parallel evaluations is not limited by the configuration of an instance, but only by your compute budget.
Defining the Experiment
The scripts required to define an experiment are pretty much the same as in the
local backend case. We will look at an example in
benchmarking/examples/launch_sagemaker/.
Common code used in these benchmarks can be found in
syne_tune.experiments
:
Local launcher:
syne_tune.experiments.launchers.hpo_main_sagemaker
Remote launcher:
syne_tune.experiments.launchers.launch_remote_sagemaker
Definitions for real benchmarks:
benchmarking.benchmark_definitions
The scripts
benchmarking/examples/launch_sagemaker/baselines.py,
benchmarking/examples/launch_sagemaker/hpo_main.py, and
benchmarking/examples/launch_sagemaker/launch_remote.py
are identical in structure to what happens in the
local backend case, with the only
difference that syne_tune.experiments.launchers.hpo_main_sagemaker
or
syne_tune.experiments.launchers.launch_remote_sagemaker
are imported from. Moreover,
Syne Tune dependencies need to be specified in
benchmarking/examples/launch_sagemaker/requirements.txt.
In terms of benchmarks, the same definitions can be used for the SageMaker
backend, in particular you can select from
real_benchmark_definitions()
.
However, the functions there are called with sagemaker_backend=True
, which
can lead to different values in
RealBenchmarkDefinition
.
For example,
resnet_cifar10_benchmark()
returns instance_type=ml.g4dn.xlarge
for the SageMaker backend (1 GPU per
instance), but instance_type=ml.g4dn.12xlarge
for the local backend (4 GPUs
per instance). This is because for the local backend to support n_workers=4
,
the instance needs to have at least 4 GPUs, but for the SageMaker backend, each
worker uses its own instance, so a cheaper instance type can be used.
Extra arguments can be specified via extra_args and map_method_args, and extra results can be written using extra_results, as explained here.
Launching Experiments Locally
Here is an example of how experiments with the SageMaker backend are launched locally:
python benchmarking/examples/launch_sagemaker/hpo_main.py \
--experiment_tag tutorial-sagemaker --benchmark resnet_cifar10 \
--method ASHA --num_seeds 1
This call launches a single experiment on the local machine (however, each trial launches the training script as a SageMaker training job, using the instance type suggested for the benchmark). The command line arguments are the same as in the local backend case. Additional arguments are:
n_workers, max_wallclock_time: Overwrite the default values for the selected benchmark.
max_failures: Number of trials which can fail without terminating the entire experiment.
warm_pool: This flag is discussed below.
max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.
scale_max_wallclock_time: If 1, and if n_workers is given as argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where A = n_workers, B = benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.
use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).
If you defined additional arguments via extra_args
, you can use them here
as well.
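For instance, a variant overwriting some of these defaults might look as follows (values are illustrative only):
python benchmarking/examples/launch_sagemaker/hpo_main.py \
    --experiment_tag tutorial-sagemaker --benchmark resnet_cifar10 \
    --method ASHA --num_seeds 1 --n_workers 8 --max_failures 3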
Launching Experiments Remotely
SageMaker backend experiments can also be launched remotely, in which case each experiment is run in a SageMaker training job, using a cheap instance type, within which trials are executed as SageMaker training jobs as well. The usage is the same as in the local backend case.
When experiments are launched remotely with the SageMaker backend, a number of
metrics are published to the SageMaker training job console (this feature can
be switched off with --remote_tuning_metrics 0
). This is detailed
here.
Using SageMaker Managed Warm Pools
The SageMaker backend supports
SageMaker managed warm pools,
a recently launched feature of SageMaker. In a nutshell, this feature allows
customers to circumvent start-up delays for SageMaker training jobs which share
a similar configuration (e.g., framework) with earlier jobs which have already
terminated. For Syne Tune with the SageMaker backend, this translates to
experiments running faster or, for a fixed max_wallclock_time
, running more
trials. Warm pools are used if the command line argument --warm_pool 1
is
used with hpo_main.py
. For the example above:
python benchmarking/examples/launch_sagemaker/hpo_main.py \
--experiment_tag tutorial-sagemaker --benchmark resnet_cifar10 \
--method ASHA --num_seeds 1 --warm_pool 1
The warm pool feature is most useful with multi-fidelity HPO methods (such as
ASHA
and MOBSTER
in our example). Some points you should be aware of:
When using SageMaker managed warm pools with the SageMaker backend, it is important to use start_jobs_without_delay=False when creating the Tuner (a sketch is given after this list).
Warm pools are a billable resource, and you may incur extra costs arising from the fact that up to n_workers instances are kept running for about 10 minutes at the end of your experiment. You have to request warm pool quota increases for instance types you would like to use. For our example, you need to have quotas for (at least) four ml.g4dn.xlarge instances, both for training and warm pool usage.
As a sanity check, you can watch the training jobs in the console. You should see InUse and Reused in the Warm pool status column. Running the example above, the first 4 jobs should complete in about 7 to 8 minutes, while all subsequent jobs should take only 2 to 3 minutes.
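To illustrate the first point, here is a minimal sketch of a hand-written tuning loop with the SageMaker backend and start_jobs_without_delay=False. The training script, configuration space, metric names, role, and framework versions are placeholders, not part of the benchmarking scripts above:
from sagemaker.pytorch import PyTorch

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import SageMakerBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ASHA

# Hypothetical configuration space; "epochs" is the maximum resource
config_space = {"lr": loguniform(1e-4, 1e-1), "epochs": 27}

trial_backend = SageMakerBackend(
    sm_estimator=PyTorch(
        entry_point="my_training_script.py",  # placeholder training script
        instance_type="ml.g4dn.xlarge",
        instance_count=1,
        framework_version="1.13",  # illustrative versions
        py_version="py39",
        role="<SAGEMAKER-EXECUTION-ROLE>",
        max_run=2 * 3600,
    ),
)

tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=ASHA(
        config_space,
        metric="accuracy",
        mode="max",
        resource_attr="epoch",
        max_resource_attr="epochs",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=3600),
    n_workers=4,
    # Recommended when SageMaker managed warm pools are used
    start_jobs_without_delay=False,
)
tuner.run()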
Visualization of Results
As we have seen, Syne Tune is a powerful tool for running a large number of experiments in parallel, which can be used to compare different tuning algorithms, or to split a difficult tuning problem into smaller pieces, which can be worked on in parallel. In this section, we show how results of all experiments of such a comparative study can be visualized, using plotting facilities provided in Syne Tune.
Note
This section offers an example of the plotting facilities in Syne Tune. A more comprehensive tutorial is here.
A Comparative Study
For the purpose of this tutorial, we ran the setup of benchmarking/examples/benchmark_hypertune/, using 15 random repetitions (or seeds). This is the command:
python benchmarking/examples/benchmark_hypertune/launch_remote.py \
--experiment_tag docs-1 --random_seed 2965402734 --num_seeds 15
Note that we fix the seed here in order to obtain repeatable results. Recall from here that we compare 7 methods on 12 surrogate benchmarks:
Since 4 of the 7 methods are “expensive”, the above command launches 3 + 4 * 15 = 63 remote tuning jobs in parallel. Each of these jobs runs experiments for one method and all 12 benchmarks. For the “expensive” methods, each job runs a single seed, while for the remaining methods (ASHA, SYNCHB, BOHB), all seeds are run sequentially in a single job, so that a job for a “cheap” method runs 12 * 15 = 180 experiments sequentially.
The total number of experiment runs is 7 * 12 * 15 = 1260.
Results of these experiments are stored to S3, using paths such as <s3-root>/syne-tune/docs-1/ASHA/docs-1-<datetime>/ for ASHA (all seeds), or <s3-root>/syne-tune/docs-1/HYPERTUNE-INDEP-5/docs-1-<datetime>/ for seed 5 of HYPERTUNE-INDEP. Result files are metadata.json, results.csv.gz, and tuner.dill. The former two are required for plotting results.
Once all of this has finished, we are left with 3780 result files on S3. We will now show how these can be downloaded, processed, and visualized.
Visualization of Results
First, we need to download the results from S3 to the local disk. This can be
done by a command which is also printed at the end of launch_remote.py
:
aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-1/ ~/syne-tune/docs-1/ \
--exclude "*" --include "*metadata.json" --include "*results.csv.zip"
This command can also be run from inside the plotting code. Note that the
tuner.dill
result files are not downloaded, since they are not needed for
result visualization.
Here is the code for generating result plots for two of the benchmarks:
from typing import Dict, Any, Optional
import logging
from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
# The setup is the algorithm. No filtering
return metadata["algorithm"]
SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB")
def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
return int(metadata["algorithm"] in SETUPS_RIGHT)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_name = "docs-1"
experiment_names = (experiment_name,)
setups = list(methods.keys())
num_runs = 15
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
aggregate_mode="iqm_bootstrap",
grid=True,
)
# We would like two subplots (1 row, 2 columns), with MOBSTER and HYPERTUNE
# results on the left, and the remaining baselines on the right. Each
# column gets its own title, and legends are shown in both
plot_params.subplots = SubplotParameters(
nrows=1,
ncols=2,
kwargs=dict(sharey="all"),
titles=["Model-based Methods", "Baselines"],
legend_no=[0, 1],
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = ComparativeResults(
experiment_names=experiment_names,
setups=setups,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
metadata_to_subplot=metadata_to_subplot,
download_from_s3=download_from_s3,
)
# We can now create plots for the different benchmarks
# First: nas201-cifar100
benchmark_name = "nas201-cifar100"
benchmark = benchmark_definitions[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(0.265, 0.31),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./{experiment_name}-{benchmark_name}.png",
)
# Next: nas201-ImageNet16-120
benchmark_name = "nas201-ImageNet16-120"
benchmark = benchmark_definitions[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(0.535, 0.58),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./{experiment_name}-{benchmark_name}.png",
)
The figure for benchmark nas201-cifar-100
looks as follows:
Figure: Results for NASBench-201 (CIFAR-100)
There are two subfigures next to each other. Each contains a number of curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.
More generally, the data from our 1260 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.
The function metadata_to_setup maps the metadata stored for an experiment to the setup name, or to None if this experiment should be filtered out. In our basic case, the setup is simply the name of the tuning algorithm. Our benchmarking framework stores a host of information as metadata; the most useful keys for grouping are:
algorithm: Name of method (ASHA, MOBSTER-INDEP, … in our example)
tag: Experiment tag. This is docs-1 in our example. Becomes useful when we merge data from different studies in a single figure
benchmark: Benchmark name (nas201-cifar-100, … in our example)
n_workers: Number of workers
Other keys may be specific to algorithm.
Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 15 experiments, one for each seed. Each seed gives rise to a best metric value curve. A metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". For example, if your metric is accuracy in percent (from 0 to 100), then mode="max" and metric_multiplier=0.01, and the curve shows error in [0, 1]. However, if convert_to_min == False, metric_val is always converted as metric_multiplier * metric_val, so that larger is better if mode == "max".
These 15 curves are now interpolated to a common grid, and at each grid point, the 15 values (one for each seed) are aggregated into 3 values lower, aggregate, upper. In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode):
mean_and_ci: Mean and 0.95 normal confidence interval
iqm_bootstrap (default): Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate. These statistics are argued for in Agarwal et al.: Deep Reinforcement Learning at the Edge of the Statistical Precipice.
median_percentiles: Median and 25 (lower), 75 (upper) percentiles
Plotting starts with the creation of a ComparativeResults object. We need to pass the experiment names (or tags), the list of all setups, the number of runs (or seeds), the metadata_to_setup function, as well as default plot parameters in plot_params. See PlotParameters for full details about the latter. In our example, we set xlabel, aggregate_mode (see above), and enable a grid with grid=True. Note that these parameters can be extended and overwritten by parameters for each plot.
In our example, we separate the MOBSTER and HYPERTUNE setups from the baselines by using two subfigures. This is done by specifying plot_params.subplots and metadata_to_subplot. In the former, plot_params.subplots.nrows and plot_params.subplots.ncols are mandatory, providing the shape of the subplot arrangement. In plot_params.subplots.titles, we can provide titles for each column (which we do here). If given, this overrides plot_params.title. Also, plot_params.subplots.legend_no=[0, 1] asks for legends in both subplots (the default is no legend at all). For full details about these arguments, see SubplotParameters.
The creation of results does a number of things. First, if download_from_s3=True, result files are downloaded from S3. In our example, we assume this has already been done. Next, all result files are iterated over, all metadata.json are read, and an inverse index from benchmark name to paths, setup_name, and subplot_no is created. This process also checks that exactly num_runs experiments are present for every setup. For large studies, it frequently happens that too few or too many results are found. The warning outputs can be used for debugging.
Given results, we can create plots for every benchmark. In our example, this is done for nas201-cifar100 and nas201-ImageNet16-120, by calling results.plot(). Apart from the benchmark name, we also pass plot parameters in plot_params, which extend (and overwrite) those passed at construction. In particular, we need to pass metric and mode, which we can obtain from the benchmark description. Moreover, ylim is a sensible range for the vertical axis, which is different for every benchmark (this is optional).
If we pass file_name as argument to results.plot, the figure is stored in this file.
Note
Apart from plots comparing different setups, aggregated over multiple seeds, we can also visualize the learning curves per trial for a single experiment. Details are given in this tutorial.
Contributing Your Benchmark
In order to increase its scope and usefulness, Syne Tune greatly welcomes the contribution of new benchmarks, in particular in areas not yet well covered. In a nutshell, contributing a benchmark is pretty similar to a code contribution, but in this section, we provide some extra hints.
Contributing a Real Benchmark
In principle, a real benchmark consists of a Python script which runs evaluations, adhering to the conventions of Syne Tune. However, in order for your benchmark to be useful for the community, here are some extra requirements:
The benchmark should not be excessively expensive to run.
If your benchmark involves training a machine learning model, the code should work with the dependencies of a SageMaker framework. You can specify extra dependencies, but they should be small. While Syne Tune (and SageMaker) supports Docker containers, Syne Tune does not host them. At present, we also do not accept Dockerfile script contributions, since we cannot maintain them.
If your benchmark depends on data files, these must be hosted for public read access somewhere. Syne Tune cannot host data files, and will reject contributions with large files. If downloading and preprocessing the data for your benchmark takes too long, you may contribute an import script of a similar type to what is done in our syne_tune.blackbox_repository.
Let us have a look at the resnet_cifar10
benchmark as example of what needs
to be done:
resnet_cifar10.py: The training script for your benchmark should be in a subdirectory of benchmarking/training_scripts/. The same directory can contain a file requirements.txt with dependencies beyond the SageMaker framework you specify for your code. You are invited to study the code of resnet_cifar10.py in detail. Important points are:
Your script needs to report relevant metrics back to Syne Tune at the end of each epoch (or only once, at the end, if your script does not support multi-fidelity tuning), using an instance of Reporter (a minimal sketch is given after this list).
We strongly recommend that your script support checkpointing, and the resnet_cifar10 script is a good example of how to do this with PyTorch training scripts. If checkpointing is not supported, all pause-and-resume schedulers will run substantially slower than they really have to, because every resume operation requires them to train the model from scratch.
benchmarking.benchmark_definitions.resnet_cifar10: You need to define some meta-data for your benchmark in benchmarking.benchmark_definitions. This should be a function returning a RealBenchmarkDefinition object. Arguments should be a flag sagemaker_backend (True for SageMaker backend experiments, False otherwise), and **kwargs overwriting values in RealBenchmarkDefinition. Hints:
framework should be one of the SageMaker frameworks. You should also specify framework_version and py_version in the estimator_kwargs dict.
config_space is the configuration space for your benchmark. Please make sure to choose hyperparameter domains wisely.
instance_type, n_workers: You need to specify a default instance type and number of workers for experiments running your benchmark. If in doubt, choose instances with the lowest costs. Currently, most of our GPU benchmarks use ml.g4dn.xlarge, and CPU benchmarks use ml.c5.4xlarge. Note that for experiments with the local backend (sagemaker_backend=False), the instance type must offer at least n_workers GPUs or CPU cores. For example, ml.g4dn.xlarge only has 1 GPU, while ml.g4dn.12xlarge provides for n_workers=4.
max_wallclock_time is a default value for the length of experiments running your benchmark, a value which depends on instance_type, n_workers.
metric, mode, max_resource_attr, resource_attr are required parameters for your benchmark, which are arguments to schedulers.
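To make the reporting convention concrete, here is a minimal sketch of a training script using Reporter. The hyperparameters and the synthetic accuracy value are placeholders; see resnet_cifar10.py for a real script, including checkpointing:
import argparse

from syne_tune import Reporter


def objective(config):
    report = Reporter()
    for epoch in range(1, config["epochs"] + 1):
        # In a real benchmark, train for one epoch here and evaluate on the
        # validation set. We use a synthetic value so the sketch runs as-is.
        accuracy = 1.0 - config["lr"] / epoch
        # Report back to Syne Tune once per epoch; the names used here
        # ("epoch", "accuracy") must match resource_attr and metric in the
        # benchmark definition.
        report(epoch=epoch, accuracy=accuracy)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--lr", type=float, required=True)
    args, _ = parser.parse_known_args()
    objective(config=vars(args))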
Note
If you simply would like to run experiments with your own training code,
it is not necessary for you to use the benchmarking module at all. It
just makes comparisons to other built-in benchmarks easier. See
this tutorial for more details.
Role of benchmarking/nursery/
The best place to contribute a new benchmark, along with launcher scripts, is
to create a new package in benchmarking.nursery
. This package contains:
Training script and meta-data definition, as detailed above
Launcher scripts, as detailed in the remainder of this tutorial
Optionally, some scripts to visualize results
You are encouraged to run some experiments with your benchmark, involving a number of baseline HPO methods, and submit results along with your pull request.
Once your benchmark is in there, it may be used by the community. If others
find it useful, it can be graduated into
benchmarking.benchmark_definitions
,
benchmarking.training_scripts
, and benchmarking.examples
.
We are looking forward to your pull request.
Contributing a Tabulated Benchmark
Syne Tune contains a blackbox repository syne_tune.blackbox_repository
for maintaining and serving tabulated and surrogate benchmarks, as well as a
simulator backend (syne_tune.backend.simulator_backend
), which
simulates training evaluations from a blackbox. The simulator backend can be
used with any Syne Tune scheduler, and experiment runs are very close to what
would be obtained by running training for real. Since time is simulated as well,
not only are experiments very cheap to run (on basic CPU hardware), they also
finish many times faster than real time. An overview is given
here.
If you have the data for a tabulated benchmark, we strongly encourage you to
contribute an import script to Syne Tune.
Examples for such scripts are
syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import
,
syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import
,
syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import
,
syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import
,
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench
.
See also
FAQ.
Visualization of Results
Finding the best model to deploy for a task at hand is a semi-automated process. The data scientist runs a set of experiments in parallel and visualizes comparative results, based on which the next set of experiments is planned. Syne Tune not only allows you to run many experiments in parallel, but also provides tooling to rapidly create customized visualizations in order to gain insights for the next steps, or to present final results to clients. This tutorial provides an overview of visualization facilities.
Note
In order to run the code in this tutorial, you need to have
installed Syne Tune from source.
Also, make sure to have installed the blackbox-repository
dependencies.
Visualization of Results of a Single Experiment
In this section, we describe the setup to be used for this tutorial. Then, we show how the results of a single experiment can be visualized.
Note
This tutorial shares some content with this one, but is more comprehensive in terms of features.
Note
In this tutorial, we will use a surrogate benchmark in order to obtain
realistic results with little computation. To this end, you need
to have the blackbox-repository
dependencies installed, as detailed
here.
Note that
the first time you use a surrogate benchmark, its data files are downloaded
and stored to your S3 bucket; this can take a considerable amount of time.
The next time you use the benchmark, it is loaded from your local disk or
your S3 bucket, which is fast.
A Comparative Study
For the purpose of this tutorial, we ran the setup of benchmarking/examples/benchmark_hypertune/, using 15 random repetitions (or seeds). This is the command:
python benchmarking/examples/benchmark_hypertune/launch_remote.py \
--experiment_tag docs-1 --random_seed 2965402734 --num_seeds 15
Note that we fix the seed here in order to obtain repeatable results. Recall from here that we compare 7 methods on 12 surrogate benchmarks:
Since 4 of the 7 methods are “expensive”, the above command launches 3 + 4 * 15 = 63 remote tuning jobs in parallel. Each of these jobs runs experiments for one method and all 12 benchmarks. For the “expensive” methods, each job runs a single seed, while for the remaining methods (ASHA, SYNCHB, BOHB), all seeds are run sequentially in a single job, so that a job for a “cheap” method runs 12 * 15 = 180 experiments sequentially.
The total number of experiment runs is 7 * 12 * 15 = 1260.
Results of these experiments are stored to S3, using paths such as <s3-root>/syne-tune/docs-1/ASHA/docs-1-<datetime>/ for ASHA (all seeds), or <s3-root>/syne-tune/docs-1/HYPERTUNE-INDEP-5/docs-1-<datetime>/ for seed 5 of HYPERTUNE-INDEP. Result files are metadata.json, results.csv.gz, and tuner.dill. The former two are required for plotting results.
Once all of this has finished, we are left with 3780 result files on S3. First,
we need to download the results from S3 to the local disk. This can be done by
a command which is also printed at the end of launch_remote.py
:
aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-1/ ~/syne-tune/docs-1/ \
--exclude "*" --include "*metadata.json" --include "*results.csv.zip"
This command can also be run from inside the plotting code. Note that the
tuner.dill
result files are not downloaded, since they are not needed for
result visualization.
Visualization of a Single Experiment
For a single experiment, we can directly plot the best metric value obtained
as a function of wall-clock time. This can be done directly following the
experiment, as shown in
this example. In
our setup, experiments have been launched remotely, so in order to plot
results for a single experiment, we need to know the full tuner name.
Say, we would like to plot results of MOBSTER-JOINT
, seed=0
. The
names of single experiments are obtained by:
ls ~/syne-tune/docs-1/MOBSTER-JOINT-0/
There is one experiment per benchmark, starting with docs-1-nas201-ImageNet16-120-0
,
docs-1-nas201-cifar100-0
, docs-1-nas201-cifar10-0
, followed by date-time
strings. Once the tuner name is known, the following script plots the
desired curve and also displays the best configuration found:
from syne_tune.experiments import load_experiment
if __name__ == "__main__":
# Replace with name for your experiment:
# Run:
# ls ~/syne-tune/docs-1/MOBSTER-JOINT-0/
tuner_name = (
"docs-1/MOBSTER-JOINT-0/docs-1-nas201-cifar10-0-2023-04-15-11-35-31-201"
)
tuning_experiment = load_experiment(tuner_name)
print(tuning_experiment)
print(f"best result found: {tuning_experiment.best_config()}")
tuning_experiment.plot()
In general, you will have run more than one experiment. As in our study above, you may want to compare different methods, or variations of the tuning problem. You may want to draw conclusions by running on several benchmarks, and counter random effects by repeating experiments several times. In the next section, we show how comparative plots over many experiments can be created.
Visualization of Results from many Experiments
Apart from troubleshooting, visualizing the results of a single experiment is of limited use. In this section, we show how to create comparative plots, using results of many experiments. We will use results from the study detailed above.
A First Comparative Plot
Here is the code for generating result plots for two of the benchmarks:
from typing import Dict, Any, Optional
import logging
from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
# The setup is the algorithm. No filtering
return metadata["algorithm"]
SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB")
def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
return int(metadata["algorithm"] in SETUPS_RIGHT)
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_name = "docs-1"
experiment_names = (experiment_name,)
setups = list(methods.keys())
num_runs = 15
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
aggregate_mode="iqm_bootstrap",
grid=True,
)
# We would like two subplots (1 row, 2 columns), with MOBSTER and HYPERTUNE
# results on the left, and the remaining baselines on the right. Each
# column gets its own title, and legends are shown in both
plot_params.subplots = SubplotParameters(
nrows=1,
ncols=2,
kwargs=dict(sharey="all"),
titles=["Model-based Methods", "Baselines"],
legend_no=[0, 1],
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = ComparativeResults(
experiment_names=experiment_names,
setups=setups,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
metadata_to_subplot=metadata_to_subplot,
download_from_s3=download_from_s3,
)
# We can now create plots for the different benchmarks
# First: nas201-cifar100
benchmark_name = "nas201-cifar100"
benchmark = benchmark_definitions[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(0.265, 0.31),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./{experiment_name}-{benchmark_name}.png",
)
# Next: nas201-ImageNet16-120
benchmark_name = "nas201-ImageNet16-120"
benchmark = benchmark_definitions[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(0.535, 0.58),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./{experiment_name}-{benchmark_name}.png",
)
The figure for benchmark nas201-cifar-100
looks as follows:
Figure: Results for NASBench-201 (CIFAR-100)
There are two subfigures next to each other. Each contains a number of curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.
More generally, the data from our 1260 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.
The function metadata_to_setup maps the metadata stored for an experiment to the setup name, or to None if this experiment should be filtered out. In our basic case, the setup is simply the name of the tuning algorithm. Our experimentation framework stores a host of information as metadata; the most useful keys for grouping are:
algorithm: Name of method (ASHA, MOBSTER-INDEP, … in our example)
tag: Experiment tag. This is docs-1 in our example. Becomes useful when we merge data from different studies in a single figure
benchmark: Benchmark name (nas201-cifar-100, … in our example)
n_workers: Number of workers
Other keys may be specific to algorithm.
Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 15 experiments, one for each seed. Each seed gives rise to a best metric value curve. A metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". For example, if your metric is accuracy in percent (from 0 to 100), then mode="max" and metric_multiplier=0.01, and the curve shows error in [0, 1].
These 15 curves are now interpolated to a common grid, and at each grid point, the 15 values (one for each seed) are aggregated into 3 values lower, aggregate, upper. In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode):
mean_and_ci: Mean and 0.95 normal confidence interval
iqm_bootstrap (default): Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate. These statistics are argued for in Agarwal et al.: Deep Reinforcement Learning at the Edge of the Statistical Precipice.
median_percentiles: Median and 25 (lower), 75 (upper) percentiles
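The conversion rule described above can be sketched in a few lines of plain Python (the helper function below is ours, for illustration only, not a Syne Tune API):
def convert_metric_val(metric_val, mode, metric_multiplier=1.0, convert_to_min=True):
    # Plotted curves are "smaller is better", unless convert_to_min is switched off
    value = metric_multiplier * metric_val
    if convert_to_min and mode == "max":
        return 1.0 - value
    return value


# Accuracy of 87% with mode="max" and metric_multiplier=0.01 is shown
# as an error of about 0.13
print(convert_metric_val(87.0, mode="max", metric_multiplier=0.01))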
Plotting starts with the creation of a ComparativeResults object. We need to pass the experiment names (or tags), the list of all setups, the number of runs (or seeds), the metadata_to_setup function, as well as default plot parameters in plot_params. See PlotParameters for full details about the latter. In our example, we set xlabel, aggregate_mode (see above), and enable a grid with grid=True. Note that these parameters can be extended and overwritten by parameters for each plot.
In our example, we separate the MOBSTER and HYPERTUNE setups from the baselines by using two subfigures. This is done by specifying plot_params.subplots and metadata_to_subplot. In the former, plot_params.subplots.nrows and plot_params.subplots.ncols are mandatory, prescribing the shape of the subplot arrangement. In plot_params.subplots.titles, we can provide titles for each column (which we do here). If given, this overrides plot_params.title. Also, plot_params.subplots.legend_no=[0, 1] asks for legends in both subplots (the default is no legend at all). For full details about these arguments, see SubplotParameters.
The creation of results does a number of things. First, if download_from_s3=True, result files are downloaded from S3. In our example, we assume this has already been done. Next, all result files are iterated over, all metadata.json are read, and an inverse index from benchmark name to paths, setup_name, and subplot_no is created. This process also checks that exactly num_runs experiments are present for every setup. For large studies, it frequently happens that too few or too many results are found. The warning outputs can be used for debugging.
Given results, we can create plots for every benchmark. In our example, this is done for nas201-cifar100 and nas201-ImageNet16-120, by calling results.plot(). Apart from the benchmark name, we also pass plot parameters in plot_params, which extend (and overwrite) those passed at construction. In particular, we need to pass metric and mode, which we can obtain from the benchmark description. Moreover, ylim is a sensible range for the vertical axis, which is different for every benchmark (this is optional).
If we pass file_name as argument to results.plot, the figure is stored in this file. results.plot returns a dictionary, whose entries “fig” and “axs” contain the figure and its axes (subfigures), allowing for further fine-tuning.
Note
If subplots are used, the grouping is w.r.t. (subplot, setup)
, not
just by setup
. This means you can use the same setup name in
different subplots to show different data. For example, your study may
have run a range of methods under different conditions (say, using a
different number of workers). You can then map these conditions to
subplots and show the same setups in each subplot. In any case, the
mapping of setups to colors is fixed and the same in every subplot.
Note
Plotting features presented here can also be used to visualize results for a single seed. In this case, there are no error bars.
Additional Features
In this section, we discuss additional features, allowing you to customize your result plots.
Combining Results from Multiple Studies
HPO experiments are expensive to do, so you want to avoid re-running them for baselines over and over. Our plotting tools allow you to easily combine results across multiple studies.
As an example, say we would like to relate our docs-1
results to what
random search and Bayesian optimization do on the same benchmarks. These
baseline results were already obtained as part of an earlier study
baselines-1
, in which a number of methods were compared, among them RS
and BO
. As an additional complication, the earlier study used 30
repetitions (or seeds), while docs-1
uses 15. Here is the modification of
the code above in order to include these additional baseline results in the
plot on the right side. First, we need to replace metadata_to_setup
and
SETUPS_RIGHT
:
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
algorithm = metadata["algorithm"]
tag = metadata["tag"]
seed = int(metadata["seed"])
# Filter out experiments from "baselines-1" we don't want to compare
# against
if tag == "baselines-1" and (seed >= 15 or algorithm not in ("RS", "BO")):
return None
else:
return algorithm
SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB", "RS", "BO")
There are now two more setups, “RS” and “BO”, whose results come from the
earlier baselines-1
study. Now, ComparativeResults
has to be created
differently:
experiment_names = experiment_names + ("baselines-1",)
setups = setups + ["RS", "BO"]
results = ComparativeResults(
experiment_names=experiment_names,
setups=setups,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
metadata_to_subplot=metadata_to_subplot,
download_from_s3=download_from_s3,
)
Note
If you intend to combine results from several different studies, it is
recommended to use the same random seed (specified as --random_seed
),
which ensures that the same sequence of random numbers is used in each
experiment. This results in a so-called paired comparison, lowering the
random variations across setups. In our example, we would look up the
master random seed of the baselines-1
study and use this for docs-1
as well.
Add Performance of Initial Trials
When using HPO, you often have an idea about one or several default
configurations that should be tried first. In Syne Tune, such initial
configurations can be specified by points_to_evaluate
(see
here for details).
An obvious question to ask is how long it takes for a HPO method to find a
configuration which works significantly better than these initial ones.
We can visualize the performance of initial trials by specifying
plot_params.show_init_trials
of type
ShowTrialParameters
. In our docs-1
study,
points_to_evaluate
is not explicitly used, but the configuration of the
first trial is selected by a mid-point heuristic. Our plotting script from
above needs to be modified:
plot_params.show_init_trials = ShowTrialParameters(
setup_name="ASHA",
trial_id=0,
new_setup_name="default"
)
results = ComparativeResults(
experiment_names=experiment_names,
setups=setups,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
metadata_to_subplot=metadata_to_subplot,
download_from_s3=download_from_s3,
)
Since the ASHA
curve is plotted on the right side, this will add another
curve there with label default
. This curve shows the best metric value,
using data from the first trial only (trial_id == 0
). It is extended as a
flat constant line to the end of the horizontal range.
If you specify a number of initial configurations with points_to_evaluate
,
set ShowTrialParameters.trial_id
to their number minus 1. The initial trials
curve will use data from trials with ID less than or equal to this number.
Controlling Subplots
Our example above already creates two subplots, horizontally arranged, and we
discussed the role of metadata_to_subplot
. Here, we provide extra details
about fields in SubplotParameters
, the type
for plot_params.subplots
:
nrows, ncols: Shape of the subplot matrix. The total number of subplots is <= ncols * nrows.
kwargs: Contains further arguments to matplotlib.pyplot.subplots. For example, if sharey="all", the y tick labels are only created for the first column. If you use nrows > 1, you may want to share x tick labels as well, with sharex="all".
titles: If title_each_figure == False, this is a list of titles, one for each column. If title_each_figure == True, then titles contains a title for each subplot. If titles is not given, the global title plot_params.title is printed on top of the left-most column.
legend_no: List of subfigures in which the legend is shown. The default is not to show legends. In our example, there are different setups in each subplot, so we want a legend in each. If your subplots show the same setups under different conditions, you may want to show the legend in one of the subplots only, in which case legend_no contains a single number.
xlims: Use this if your subfigures have different x axis ranges. The global xlim is overwritten by (0, xlims[subplot_no]).
subplot_indices: Any given plot produced by plot() does not have to contain all subfigures. For example, you may want to group your results into 4 or 8 bins, then create a sequence of plots comparing pairs of them. If subplot_indices is given, it contains the subplot indices to be shown, in this order. Otherwise, this is \(0, 1, 2, \dots\). If this is given, then titles and xlims are relative to this list (in that xlims[i] corresponds to subfigure subplot_indices[i]), but legend_no is not.
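For instance, a sketch of a 2-by-2 arrangement using these fields could look as follows (titles and axis ranges are placeholders):
from syne_tune.experiments import PlotParameters, SubplotParameters

plot_params = PlotParameters(xlabel="wall-clock time", grid=True)
plot_params.subplots = SubplotParameters(
    nrows=2,
    ncols=2,
    kwargs=dict(sharey="all", sharex="all"),
    title_each_figure=True,
    titles=["Setup group A", "Setup group B", "Setup group C", "Setup group D"],
    legend_no=[0],  # show the legend in the first subfigure only
    xlims=[3600, 3600, 7200, 7200],  # one x axis range per subfigure (seconds)
)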
Plotting Derived Metrics
You can also plot metrics which are not directly contained in the results data
(as a column), but which can be computed from the results. To this end, you
can pass a dataframe column generator as dataframe_column_generator
to
plot()
. For example, assume
we run multi-objective HPO methods on a benchmark involving metrics cost
and latency
(mode="min"
for both of them). The final plot
command
would look like this:
from syne_tune.experiments.multiobjective import (
hypervolume_indicator_column_generator,
)
# ...
dataframe_column_generator = hypervolume_indicator_column_generator(
metrics_and_modes = [("cost", "min"), ("latency", "min")]
)
plot_params = PlotParameters(
metric="hypervolume_indicator",
mode="max",
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
dataframe_column_generator=dataframe_column_generator,
one_result_per_trial=True,
)
The mapping returned by hypervolume_indicator_column_generator() maps a results dataframe to a new column containing the best hypervolume indicator as a function of wall-clock time for the metrics cost and latency, which must be contained in the results dataframe.
The option one_result_per_trial=True of results.plot ensures that the result data is filtered, so that for each experiment, only the final row for each trial remains. This option is useful if the methods are single-fidelity, but results are reported after each epoch. The filtering makes sure that only results for the largest epoch are used for each trial. Since this is done before the best hypervolume indicator is computed, it can speed up the computation dramatically.
Filtering Experiments by DateTime Bounds
Results can be filtered out by having metadata_to_setup
or
metadata_to_subplot
return None
. This is particularly useful if results
from several studies are to be combined. Another way to filter experiments is
using the datetime_bounds
argument of
ComparativeResults
. A common use case is that
experiments for a large study have been launched in several stages, and those
of an early stage failed. If the corresponding result files are not removed on S3,
the creation of ComparativeResults
will complain about too many results
being found. datetime_bounds
is specified in terms of date-time strings of
the format ST_DATETIME_FORMAT
, which currently is
“YYYY-MM-DD-HH-MM-SS”. For example, if results are valid from
“2023-03-19-22-01-57” onwards, but invalid before, we can use
datetime_bounds=("2023-03-19-22-01-57", None)
. datetime_bounds
can also
be a dictionary with keys from experiment_names
, in which case bounds are
specific to different experiment prefixes.
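A sketch of both variants is given below; it reuses setups, metadata_to_setup, and plot_params from the plotting script above, and the date-time values are placeholders:
from syne_tune.experiments import ComparativeResults

# Keep only results created at or after the given point in time
results = ComparativeResults(
    experiment_names=("docs-1",),
    setups=setups,
    num_runs=15,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    datetime_bounds=("2023-03-19-22-01-57", None),
)

# Alternatively, bounds can be given per experiment prefix
results = ComparativeResults(
    experiment_names=("docs-1", "baselines-1"),
    setups=setups,
    num_runs=15,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    datetime_bounds={
        "docs-1": ("2023-03-19-22-01-57", None),
        "baselines-1": (None, "2023-03-01-00-00-00"),
    },
)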
Extract Meta-Data Values
Apart from plotting results, we can also retrieve meta-data values. This is
done by passing a list of meta-data key names via metadata_keys
when
creating ComparativeResults
. Afterwards, the
corresponding meta-data values can be queried by calling
results.metadata_values(benchmark_name)
. The result is a nested dictionary
result
, so that result[key][setup_name]
is a list of values, where
key
is the meta-data key from metadata_keys
, setup_name
is a setup
name. The list contains values from all experiments mapped to this
setup_name
. If you use the same setup names across different subplots,
set metadata_subplot_level=True
, in which case
results.metadata_values(benchmark_name)
returns
result[key][setup_name][subplot_no]
, so the grouping w.r.t. setup names
and subplots is used.
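A sketch of this usage, reusing the construction arguments from the plotting script above (the chosen keys n_workers and tag are examples of meta-data keys mentioned earlier):
results = ComparativeResults(
    experiment_names=experiment_names,
    setups=setups,
    num_runs=num_runs,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    metadata_keys=["n_workers", "tag"],
)
metadata = results.metadata_values("nas201-cifar100")
# metadata["n_workers"]["ASHA"] is a list with one value per experiment
# mapped to the setup "ASHA"
print(metadata["n_workers"]["ASHA"])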
Extract Final Values for Extra Results
Syne Tune allows extra results to be stored alongside the usual metrics data,
as shown in
examples/launch_height_extra_results.py.
These are simply additional columns in the result dataframe. In order to plot
them over time, you currently need to write your own plotting scripts. If the
best value over time approach of Syne Tune’s plotting tools makes sense for
any single column, you can just specify their name for plot_params.metric
and set plot_params.mode
accordingly.
However, in many cases it is sufficient to know final values for extra results,
grouped in the same way as everything else. For example, extra results may be
used to monitor some internals of the HPO method being used, in which case we
may be satisfied to see these statistics at the end of experiments. If
extra_results_keys
is used in
plot()
, the method returns
a nested dictionary extra_results
under key “extra_results”, so that
extra_results[setup_name][key]
contains a list of values (one for each
seed) for setup setup_name
and key
an extra result name from
extra_results_keys
. As above, if
metadata_subplot_level=True
at construction of
ComparativeResults
, the structure of the
dictionary is extra_results[setup_name][subplot_no][key]
.
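A sketch of retrieving such final values (the key name num_update_calls is a hypothetical extra-results column; the other arguments reuse the plotting script above):
plot_result = results.plot(
    benchmark_name="nas201-cifar100",
    plot_params=plot_params,
    extra_results_keys=["num_update_calls"],  # hypothetical extra-results column
)
extra_results = plot_result["extra_results"]
# One value per seed for the setup "ASHA"
print(extra_results["ASHA"]["num_update_calls"])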
Visualizing Learning Curves
We have seen how results from many experiments can be visualized jointly in order to compare different HPO methods, different variations of the benchmark (e.g., different configuration spaces), or both. In order to understand differences between two setups in a more fine-grained fashion, it can be useful to look at learning curve plots. In this section, we demonstrate Syne Tune tooling along this direction.
Why Does Hyper-Tune Outperform ASHA?
In our docs-1
study, HYPERTUNE-INDEP
significantly outperforms ASHA
.
The best metric value curve descends much faster initially, and also the final
performance at max_wallclock_time
is significantly better.
How can this difference be explained? Both methods use the same scheduling logic,
so differences are mostly due to how configurations of new trials are suggested.
In ASHA
, this is done by random sampling. In HYPERTUNE-INDEP
, independent
Gaussian process surrogate models are fitted on observations at each rung level,
and decisions are made based on an acquisition function which carefully weights
the input from each of these models (details are given
here). But how exactly does
this difference matter? We can find out by plotting learning curves of trials
for two experiments next to each other, ASHA
on the left, HYPERTUNE-INDEP
on the right. Here is the code for doing this:
from typing import Dict, Any, Optional

from syne_tune.experiments import (
    TrialsOfExperimentResults,
    PlotParameters,
    MultiFidelityParameters,
)
from benchmarking.examples.benchmark_hypertune.benchmark_definitions import (
    benchmark_definitions,
)

SETUPS_TO_COMPARE = ("ASHA", "HYPERTUNE-INDEP")


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    algorithm = metadata["algorithm"]
    return algorithm if algorithm in SETUPS_TO_COMPARE else None


if __name__ == "__main__":
    experiment_name = "docs-1"
    benchmark_name_to_plot = "nas201-cifar100"
    seed_to_plot = 7
    download_from_s3 = False  # Set ``True`` in order to download files from S3

    experiment_names = (experiment_name,)
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        grid=True,
    )
    # We need to provide details about rung levels of the multi-fidelity methods.
    # Also, all methods compared are pause-and-resume
    multi_fidelity_params = MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 81, 200],
        multifidelity_setups={name: True for name in SETUPS_TO_COMPARE},
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = TrialsOfExperimentResults(
        experiment_names=experiment_names,
        setups=SETUPS_TO_COMPARE,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        multi_fidelity_params=multi_fidelity_params,
        download_from_s3=download_from_s3,
    )

    # Create plot for certain benchmark and seed
    benchmark = benchmark_definitions[benchmark_name_to_plot]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
    )
    results.plot(
        benchmark_name=benchmark_name_to_plot,
        seed=seed_to_plot,
        plot_params=plot_params,
        file_name=f"./learncurves-{experiment_name}-{benchmark_name_to_plot}.png",
    )
The figure for benchmark nas201-cifar100
and seed=7
looks as follows:
Learning curves for NASBench-201 (CIFAR-100)
The class for creating learning curve plots is
TrialsOfExperimentResults
. It is quite similar
to ComparativeResults
, but there are differences:
For learning curve plots, each setup occupies its own subfigure. Also, the seed for each plot is fixed, so each subfigure is based on the results of a single experiment.
metadata_to_setup is used to filter out the experiments we want to compare. In this case, these are ASHA and HYPERTUNE-INDEP.
The default for plot_params.subplots is a single row of subfigures, one for each setup, with titles corresponding to setup names. In our example, we use this default. If you want to compare many setups, you can use an arrangement with multiple rows as well.
In learning curve plots, the trajectory of metric values for a trial is plotted in a different color per trial (more precisely, we cycle through a palette, so that eventually colors are repeated). The final metric value of a trial is marked with a diamond.
If comparing multi-fidelity methods (like ASHA, Hyper-Tune, MOBSTER), you should also specify multi_fidelity_params, passing the rung levels. In this case, metric values at rung levels are marked by a circle, or by a diamond if this is the final value for a trial.
If some of your multi-fidelity setups are of the pause-and-resume type (i.e., the evaluation of a trial can be paused and possibly resumed later on), list them in multi_fidelity_params.pause_resume_setups. Trajectories of pause-and-resume methods need to be plotted differently: there has to be a gap between the value at a rung level and the next one, instead of a line connecting them. In our example, all setups are pause-and-resume, and these gaps are clearly visible.
What do these plots tell us about the differences between ASHA
and
HYPERTUNE-INDEP
? First of all, HYPERTUNE-INDEP
has far fewer isolated
diamonds than ASHA
. These correspond to trials which are paused after one
epoch and never resumed. For ASHA
, both the rate of single diamonds and
their metric distribution remain stationary over time, while for
HYPERTUNE-INDEP
, the rate rapidly diminishes, and also the metric values
for single diamonds improve. This is what we would expect. ASHA
does not
learn anything from the past, and simply continues to suggest configurations
at random, while HYPERTUNE-INDEP
rapidly learns which parts of the
configuration space to avoid and does not repeat basic mistakes moving forward. This
means that overall, ASHA
wastes resources on starting poorly performing
trials over and over, while HYPERTUNE-INDEP
uses these resources in order
to resume training for more trials, thereby reaching better performances over
the same time horizon. These results were obtained in the context of simulated
experimentation, without delays for starting, pausing, or resuming trials. In
the presence of such delays, the advantage of model-based methods over ASHA
becomes more pronounced.
With specific visualizations, we can drill deeper to figure out what
HYPERTUNE-INDEP
learns about the configuration space. For example, the
configurations of all trials are stored in the results as well. Using these, we
can confirm that HYPERTUNE-INDEP
rapidly learns about basic properties of
the NASBench-201
configuration space, where certain connections are mandatory
for good results, and consistently chooses them after a short initial phase.
Rapid Experimentation with Syne Tune
The main goal of automated tuning is to help the user find and adjust the best machine learning model as quickly as possible, given some computing resources controlled by the user. Syne Tune contains tooling which can speed up this interactive process substantially. The user can launch many experiments in parallel, slicing the complete model selection and tuning problem into smaller parts. Comparative plots can be created from past experimental data and easily customized to specific needs.
Syne Tune's tooling for rapid experimentation is part of the benchmarking framework, which is covered in detail in this tutorial. However, as demonstrated here, this framework is useful for experimentation beyond the comparison of different HPO algorithms. This tutorial is self-contained, but the reader may want to consult the benchmarking tutorial for background information.
Note
The code used in this tutorial is contained in the
Syne Tune sources, it is not
installed by pip
. You can obtain this code by installing Syne Tune from
source, but the only code that is needed is in
benchmarking.examples.demo_experiment
. The final section also needs
code from benchmarking.nursery.odsc_tutorial
.
Also, make sure to have installed the blackbox-repository
dependencies.
Setting up an Experimental Study
Any statistical analysis consists of a sequence of experiments, where later ones are planned given outcomes of earlier ones. Parallelization can be used to speed up this process:
If outcomes or decision-making are randomized (e.g., training neural networks starts from random initial weights; HPO may suggest configurations drawn at random), it is important to repeat experiments several times in order to gain robust outcomes.
If a search problem becomes too big, it can be broken down into several parts, which can be worked on independently.
In this section, we describe the setup for a simple study, which can be used to showcase tooling in Syne Tune for splitting up a large problem into pieces, running random repetitions, writing out extra information, and creating customized comparative plots.
For simplicity, we use surrogate benchmarks from the fcnet
family, whereby
tuning is simulated. This is the
default configuration space for these benchmarks:
CONFIGURATION_SPACE = {
    "hp_activation_fn_1": choice(["tanh", "relu"]),
    "hp_activation_fn_2": choice(["tanh", "relu"]),
    "hp_batch_size": logfinrange(8, 64, 4, cast_int=True),
    "hp_dropout_1": finrange(0.0, 0.6, 3),
    "hp_dropout_2": finrange(0.0, 0.6, 3),
    "hp_init_lr": choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]),
    "hp_lr_schedule": choice(["cosine", "const"]),
    NUM_UNITS_1: logfinrange(16, 512, 6, cast_int=True),
    NUM_UNITS_2: logfinrange(16, 512, 6, cast_int=True),
}
Note
In the Syne Tune experimentation framework, a tuning problem (i.e., training and evaluation script or blackbox, together with defaults) is called a benchmark. This terminology is used even if the goal of experimentation is not benchmarking (i.e., comparing different HPO methods), as is the case in this tutorial here.
Note
The code used in this tutorial is contained in the
Syne Tune source, it is not
installed by pip
. You can obtain this code by installing Syne Tune from
source, but the only code that is needed is in
benchmarking.examples.demo_experiment
, so if you copy that out of the
repository, you do not need all the remaining source code.
Note
In order to use surrogate benchmarks and the simulator backend, you need
to have the blackbox-repository
dependencies installed, as detailed
here.
Note that the first time you use a surrogate benchmark, its data files are
downloaded and stored in your S3 bucket, which can take a considerable amount
of time. The next time you use the benchmark, it is loaded from your local
disk or your S3 bucket, which is fast.
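If you prefer to trigger this download once up front (instead of during the first experiment), a small warm-up script along the following lines can be used; it assumes the blackbox-repository dependencies are installed, and the exact dataset names may differ:
from syne_tune.blackbox_repository import load_blackbox

# The first call downloads and stores the blackbox; subsequent calls load it
# from local disk or S3
blackbox = load_blackbox("fcnet")
print(list(blackbox.keys()))  # one entry per dataset, e.g. "protein_structure"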
Modifying the Configuration Space
The hyperparameters hp_activation_fn_1
and hp_activation_fn_2
prescribe
the type of activation function in hidden layers 1 and 2. We can split the
overall tuning problem into smaller pieces by fixing these parameters to
constant values, considering relu
and tanh
networks independently. In our
study, we will compare the following methods:
ASHA-TANH, MOBSTER-TANH: Runs ASHA and MOBSTER on the simplified configuration space, where hp_activation_fn_1 = hp_activation_fn_2 = "tanh"
ASHA-RELU, MOBSTER-RELU: Runs ASHA and MOBSTER on the simplified configuration space, where hp_activation_fn_1 = hp_activation_fn_2 = "relu"
ASHA, MOBSTER: Runs ASHA and MOBSTER on the original configuration space
RS, BO: Runs baselines random search and Bayesian optimization on the original configuration space
Here is the script defining these alternatives:
import copy

from syne_tune.experiments.default_baselines import (
    RandomSearch,
    BayesianOptimization,
    ASHA,
    MOBSTER,
)
from syne_tune.experiments.baselines import MethodArguments


class Methods:
    RS = "RS"
    BO = "BO"
    ASHA = "ASHA"
    MOBSTER = "MOBSTER"
    ASHA_TANH = "ASHA-TANH"
    MOBSTER_TANH = "MOBSTER-TANH"
    ASHA_RELU = "ASHA-RELU"
    MOBSTER_RELU = "MOBSTER-RELU"


def _modify_config_space(
    method_arguments: MethodArguments, value: str
) -> MethodArguments:
    result = copy.copy(method_arguments)
    result.config_space = dict(
        method_arguments.config_space,
        hp_activation_fn_1=value,
        hp_activation_fn_2=value,
    )
    return result


methods = {
    Methods.RS: lambda method_arguments: RandomSearch(method_arguments),
    Methods.BO: lambda method_arguments: BayesianOptimization(method_arguments),
    Methods.ASHA: lambda method_arguments: ASHA(
        method_arguments,
        type="promotion",
    ),
    Methods.MOBSTER: lambda method_arguments: MOBSTER(
        method_arguments,
        type="promotion",
    ),
    # Fix activations to "tanh"
    Methods.ASHA_TANH: lambda method_arguments: ASHA(
        _modify_config_space(method_arguments, value="tanh"),
        type="promotion",
    ),
    Methods.MOBSTER_TANH: lambda method_arguments: MOBSTER(
        _modify_config_space(method_arguments, value="tanh"),
        type="promotion",
    ),
    # Fix activations to "relu"
    Methods.ASHA_RELU: lambda method_arguments: ASHA(
        _modify_config_space(method_arguments, value="relu"),
        type="promotion",
    ),
    Methods.MOBSTER_RELU: lambda method_arguments: MOBSTER(
        _modify_config_space(method_arguments, value="relu"),
        type="promotion",
    ),
}
Different methods are defined in the dictionary methods, as functions mapping method_arguments of type MethodArguments to a scheduler object. Here, method_arguments.config_space contains the default configuration space for the benchmark, where both hp_activation_fn_1 and hp_activation_fn_2 are hyperparameters of type choice(["tanh", "relu"]).
For ASHA-TANH, MOBSTER-TANH, ASHA-RELU, MOBSTER-RELU, we fix these parameters. This is done in _modify_config_space, where method_arguments.config_space is replaced by a configuration space in which the two hyperparameters are fixed, so methods do not search over them anymore (see the sketch below).
Another way to modify method_arguments just before a method is created is to use the map_extra_args argument of main(), as detailed here. This allows the modification to depend on extra command line arguments.
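To illustrate what _modify_config_space achieves, here is a tiny sketch (not part of the example code) of how replacing a domain by a constant value removes a hyperparameter from the search:
from syne_tune.config_space import choice

config_space = {
    "hp_activation_fn_1": choice(["tanh", "relu"]),
    "hp_activation_fn_2": choice(["tanh", "relu"]),
    "hp_lr_schedule": choice(["cosine", "const"]),
}
# Entries whose values are plain constants (not of type Domain) are passed
# through to the training script, but are not searched over
fixed_space = dict(
    config_space, hp_activation_fn_1="tanh", hp_activation_fn_2="tanh"
)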
Next, we define the benchmarks our study should run over. For our simple
example, we use the fcnet
benchmarks:
from syne_tune.experiments.benchmark_definitions import fcnet_benchmark_definitions
benchmark_definitions = fcnet_benchmark_definitions.copy()
This is where you would have to plug in your own benchmarks, namely your training script with a bit of metadata. Examples are provided here and here.
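If you only want to run a subset of these benchmarks, a simple filter over the dictionary is enough; for example (a sketch, restricting the study to two of the fcnet datasets):
from syne_tune.experiments.benchmark_definitions import fcnet_benchmark_definitions

benchmark_definitions = {
    name: benchmark
    for name, benchmark in fcnet_benchmark_definitions.items()
    if name in ("fcnet-protein", "fcnet-slice")
}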
Recording Extra Results
Next, we need to write the hpo_main.py
script which runs a single experiment.
As shown here,
this is mostly about selecting the correct main
function among
syne_tune.experiments.launchers.hpo_main_simulator.main()
,
syne_tune.experiments.launchers.hpo_main_local.main()
,
syne_tune.experiments.launchers.hpo_main_sagemaker.main()
, depending on the trial
backend we want to use. In our case, we also would like to record extra
information about the experiment. Here is the script:
from typing import Optional, Dict, Any, List

from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune import Tuner
from syne_tune.experiments.launchers.hpo_main_simulator import main
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune.results_callback import ExtraResultsComposer

RESOURCE_LEVELS = [1, 3, 9, 27, 81]


class RungLevelsExtraResults(ExtraResultsComposer):
    """
    We would like to monitor the sizes of rung levels over time. This is extra
    information which is normally not recorded and stored.
    """

    def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
        if not isinstance(tuner.scheduler, HyperbandScheduler):
            return None
        rung_information = tuner.scheduler.terminator.information_for_rungs()
        return {
            f"num_at_level{resource}": num_entries
            for resource, num_entries, _ in rung_information
            if resource in RESOURCE_LEVELS
        }

    def keys(self) -> List[str]:
        return [f"num_at_level{r}" for r in RESOURCE_LEVELS]


if __name__ == "__main__":
    extra_results = RungLevelsExtraResults()
    main(methods, benchmark_definitions, extra_results=extra_results)
As usual, we import syne_tune.experiments.launchers.hpo_main_simulator.main() (we use the simulator backend) and call it, passing our methods and benchmark_definitions. We also pass extra_results, since we would like to record extra results.
Note that apart from syne_tune imports, this script only does local imports. No other code from benchmarking is required.
A certain number of time-stamped results are recorded by default in results.csv.zip, details are here. In particular, all metric values reported for all trials are recorded.
In our example, we would also like to record information about the multi-fidelity schedulers ASHA and MOBSTER. As detailed in this tutorial, they record metric values of trials at the rung levels these trials reached (e.g., number of epochs trained), and decisions on which paused trial to promote to the next rung level are made by comparing its performance with all others in the same rung. The rungs grow over time, and we would like to record their respective sizes as a function of wall-clock time.
To this end, we create a subclass of ExtraResultsComposer, whose __call__ method extracts the desired information from the current Tuner object. In our example, we first test whether the current scheduler is ASHA or MOBSTER (recall that we also run RS and BO as baselines). If so, we extract the desired information and return it as a dictionary.
Finally, we create extra_results and pass it to the main function.
The outcome is that a number of additional columns are appended to the dataframe
stored in results.csv.zip
, at least for experiments with ASHA or
MOBSTER schedulers. Running this script launches an experiment locally (if you
installed Syne Tune from source, you need to start the script from the
benchmarking/examples
directory):
python demo_experiment/hpo_main.py --experiment_tag docs-2-debug
Running Experiments in Parallel
Running our hpo_main.py
script launches a single experiment on the local
machine, writing results to a local directory. This is nice for debugging, but
slow and cumbersome once we have convinced ourselves that the setup is working. We
will want to launch many experiments in parallel on AWS, and use our local
machine for other work.
Experiments with our setups RS, BO, ASHA-TANH, MOBSTER-TANH, ASHA-RELU, MOBSTER-RELU, ASHA, MOBSTER are independent and can be run in parallel.
We repeat each experiment 20 times, in order to quantify the random fluctuation in the results. These seeds are independent and can be run in parallel.
We could also run experiments with different benchmarks (i.e., datasets in
fcnet
) in parallel. But since a single simulated experiment is fast to do, we are not doing this here.
Running experiments in parallel requires a remote launcher script:
from pathlib import Path

from benchmark_definitions import benchmark_definitions
from baselines import methods
from syne_tune.experiments.launchers.launch_remote_simulator import launch_remote

if __name__ == "__main__":

    def _is_expensive_method(method: str) -> bool:
        return method.startswith("MOBSTER") or method == "BO"

    entry_point = Path(__file__).parent / "hpo_main.py"
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        is_expensive_method=_is_expensive_method,
    )
Again, we simply choose the correct launch_remote function among the launch_remote() variants for the simulator, local, and SageMaker backends, depending on the trial backend.
Note that apart from syne_tune imports, this script only does local imports. No other code from benchmarking is required.
In is_expensive_method, we pass a predicate on the method name. If is_expensive_method(method) is True, the 20 different seeds are run in parallel. Otherwise, they are run sequentially.
In our example, we know that BO and MOBSTER run quite a bit slower in the simulator than RS and ASHA, so we label the former as expensive. This means we have 4 expensive methods and 4 cheap ones, and our complete study will launch 4 + 4 * 20 = 84 SageMaker training jobs. Since fcnet contains four benchmarks, we run 8 * 20 * 4 = 640 experiments in total.
All of these experiments can be launched with a single command (if you
installed Syne Tune from source, you need to start the script from the
benchmarking/examples
directory):
python demo_experiment/launch_remote.py \
--experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20
If --random_seed
is not given, a master random seed is drawn at random,
printed and also stored in the metadata. If a study consists of launching
experiments in several steps, it is good practice to pass the same random seed
for each launch command. For example, you can run the first launch command
without passing a seed, then note the seed from the output and use it for
further launches.
Avoiding Costly Failures
In practice, with a new experimental setup, it is not a good idea to launch all experiments in one go. We recommend moving in stages.
First, if our benchmarks run locally as well, we should start with some local tests. For example:
python demo_experiment/hpo_main.py \
--experiment_tag docs-2-debug --random_seed 2465497701 \
--method ASHA-RELU --verbose 1
We can cycle through several methods and check whether anything breaks. Note that
--verbose 1
generates useful output about the progress of the method, which
can be used to check whether properties are the way we expect (for example,
"relu"
is chosen for the fixed hyperparameters). Results are stored locally
under ~/syne_tune/docs-2-debug/
.
Next, we launch the setup remotely, but for a single seed:
python demo_experiment/launch_remote.py \
--experiment_tag docs-2 --random_seed 2465497701 --num_seeds 1
This will start 8 SageMaker training jobs, one for each method, and with
seed=0
. Some of them, like RS
, ASHA
, ASHA-*
will finish very
rapidly, and it makes sense to quickly browse their logs, to check whether
desired properties are met.
Finally, if this looks good, we can launch all the rest:
python demo_experiment/launch_remote.py \
--experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20 \
--start_seed 1
This is launching all remaining experiments with seed
from 1 to 19.
Note
If something breaks when remotely launching for seed=0
, it may be that
results have already been written to S3. This is because results are written
out periodically. If you use the same tag docs-2
for initial debugging,
you will have to remove these results on S3, or otherwise be careful filtering
them out later on (this is discussed below).
In a large study consisting of many experiments, it can happen that some
experiments fail for reasons which do not invalidate results of the other ones.
If this happens, it is not a good idea, both time- and cost-wise, to start the
whole study from scratch. Instead, we recommend cleaning up and restarting only
the experiments which failed. For example, assume that in our study above,
the MOBSTER-TANH
experiments of seed == 13
failed:
We need to remove the incomplete results of these experiments, which would otherwise corrupt the final aggregate results. This can either be done by removing them on S3, or by advanced filtering (discussed below). In general, we recommend the former. For our example, the results to be removed are in s3://{sagemaker-default-bucket}/syne-tune/docs-2/MOBSTER-TANH-13/. Namely, since MOBSTER-TANH is an “expensive” method, results for different seeds are written to different subdirectories.
Next, we need to start the failed experiments again:
python demo_experiment/launch_remote.py \
--experiment_tag docs-2 --random_seed 2465497701 --num_seeds 14 \
--start_seed 13 --method MOBSTER-TANH
Instead, assume that the ASHA
experiments for seed == 13
failed. This is
a “cheap” method, so results for all seeds are written to
s3://{sagemaker-default-bucket}/syne-tune/docs-2/ASHA/
, into subdirectories
of the form docs-2-<benchmark>-<seed>-<datetime>
. Since this method is cheap,
we can rerun all its experiments, by first removing everything under
s3://{sagemaker-default-bucket}/syne-tune/docs-2/ASHA/
, then:
python demo_experiment/launch_remote.py \
--experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20 \
--method ASHA
Note
Don’t worry if you restart failed experiments without first removing their
incomplete results on S3. Due to the <datetime>
postfix of directory
names, results of a restart never conflict with older ones. However, once
you plot aggregate results, you will get a warning that too many results
have been found, along with where these results are located. At this point,
you can still remove the incomplete ones.
Visualization of Results
Once all results are obtained, we would rapidly like to create comparative plots.
In Syne Tune, each experiment stores two files, metadata.json
with metadata,
and results.csv.zip
containing time-stamped results. The
Tuner
object at the end of the experiment is also serialized
to tuner.dill
, but this is not needed here.
Note
This section offers an example of the plotting facilities in Syne Tune. More details are provided in this tutorial.
First, we need to download the results from S3 to the local disk. This can be
done by a command which is also printed at the end of launch_remote.py
:
aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-2/ ~/syne-tune/docs-2/ \
--exclude "*" --include "*metadata.json" --include "*results.csv.zip"
This command can also be run from inside the plotting code. Note that the
tuner.dill
result files are not downloaded, since they are not needed for
result visualization.
Here is the code for generating result plots for two of the benchmarks:
from typing import Dict, Any, Optional, List, Set
import logging

from baselines import methods
from benchmark_definitions import benchmark_definitions
from hpo_main import RungLevelsExtraResults
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    # The setup is the algorithm. No filtering
    return metadata["algorithm"]


SETUP_TO_SUBPLOT = {
    "ASHA": 0,
    "MOBSTER": 0,
    "ASHA-TANH": 1,
    "MOBSTER-TANH": 1,
    "ASHA-RELU": 2,
    "MOBSTER-RELU": 2,
    "RS": 3,
    "BO": 3,
}


def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
    return SETUP_TO_SUBPLOT[metadata["algorithm"]]


def _print_extra_results(
    extra_results: Dict[str, Dict[str, List[float]]],
    keys: List[str],
    skip_setups: Set[str],
):
    for setup_name, results_for_setup in extra_results.items():
        if setup_name not in skip_setups:
            print(f"[{setup_name}]:")
            for key in keys:
                values = results_for_setup[key]
                print(f" {key}: {[int(x) for x in values]}")


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_name = "docs-2"
    experiment_names = (experiment_name,)
    setups = list(methods.keys())
    num_runs = 20
    download_from_s3 = False  # Set ``True`` in order to download files from S3

    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # We would like four subplots (2 rows, 2 columns), each showing two setups.
    # Each subplot gets its own title, and legends are shown in each.
    plot_params.subplots = SubplotParameters(
        nrows=2,
        ncols=2,
        kwargs=dict(sharex="all", sharey="all"),
        titles=[
            "activations tuned",
            "activations = tanh",
            "activations = relu",
            "single fidelity",
        ],
        title_each_figure=True,
        legend_no=[0, 1, 2, 3],
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=setups,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        metadata_to_subplot=metadata_to_subplot,
        download_from_s3=download_from_s3,
    )
    # We can now create plots for the different benchmarks:
    # - We store the figures as PNG files
    # - We also load the extra results collected during the experiments
    #   (recall that we monitored sizes of rungs for ASHA and MOBSTER).
    #   Instead of plotting their values over time, we print out their
    #   values at the end of each experiment
    extra_results_keys = RungLevelsExtraResults().keys()
    skip_setups = {"RS", "BO"}

    # First: fcnet-protein
    benchmark_name = "fcnet-protein"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.22, 0.30),
    )
    extra_results = results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
        extra_results_keys=extra_results_keys,
    )["extra_results"]
    _print_extra_results(extra_results, extra_results_keys, skip_setups=skip_setups)
    # Next: fcnet-slice
    benchmark_name = "fcnet-slice"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.00025, 0.0012),
    )
    # Capture the returned extra results here as well, so that the printout
    # below refers to this benchmark rather than the previous one
    extra_results = results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
        extra_results_keys=extra_results_keys,
    )["extra_results"]
    _print_extra_results(extra_results, extra_results_keys, skip_setups=skip_setups)
The figure for benchmark fcnet-protein
looks as follows:
Results for FCNet (protein dataset)
Moreover, we obtain an output for extra results, as follows:
[ASHA]:
num_at_level1: [607, 630, 802, 728, 669, 689, 740, 610, 566, 724, 691, 812, 837, 786, 501, 642, 554, 625, 531, 672]
num_at_level3: [234, 224, 273, 257, 247, 238, 271, 222, 191, 256, 240, 273, 287, 272, 185, 227, 195, 216, 197, 241]
num_at_level9: [97, 81, 99, 95, 99, 99, 106, 92, 73, 98, 90, 95, 99, 98, 74, 86, 78, 82, 85, 101]
num_at_level27: [49, 36, 37, 36, 41, 47, 37, 43, 36, 35, 34, 37, 39, 39, 39, 44, 41, 30, 45, 49]
num_at_level81: [22, 17, 18, 15, 21, 22, 19, 26, 20, 15, 16, 13, 13, 23, 27, 29, 20, 17, 20, 26]
[MOBSTER]:
num_at_level1: [217, 311, 310, 353, 197, 96, 377, 135, 364, 336, 433, 374, 247, 282, 175, 302, 187, 225, 182, 240]
num_at_level3: [107, 133, 124, 138, 104, 64, 163, 72, 157, 132, 146, 140, 123, 112, 110, 129, 90, 100, 86, 126]
num_at_level9: [53, 62, 55, 59, 66, 51, 83, 47, 72, 55, 54, 59, 54, 51, 72, 65, 60, 49, 55, 70]
num_at_level27: [29, 34, 30, 26, 50, 37, 49, 31, 27, 25, 23, 28, 27, 28, 49, 33, 42, 27, 34, 45]
num_at_level81: [18, 20, 16, 14, 33, 25, 37, 24, 13, 17, 10, 14, 17, 20, 32, 24, 29, 15, 26, 31]
[ASHA-TANH]:
num_at_level1: [668, 861, 755, 775, 644, 916, 819, 710, 694, 870, 764, 786, 769, 710, 862, 807, 859, 699, 757, 794]
num_at_level3: [237, 295, 265, 272, 221, 311, 302, 246, 246, 294, 278, 280, 276, 240, 297, 290, 304, 258, 270, 279]
num_at_level9: [86, 112, 101, 97, 91, 104, 119, 90, 92, 104, 98, 96, 98, 90, 108, 120, 105, 109, 105, 102]
num_at_level27: [37, 47, 39, 39, 40, 39, 45, 44, 39, 41, 41, 44, 44, 40, 45, 43, 38, 53, 49, 39]
num_at_level81: [21, 16, 16, 16, 20, 16, 17, 18, 17, 14, 18, 21, 21, 20, 17, 19, 16, 19, 23, 20]
[MOBSTER-TANH]:
num_at_level1: [438, 594, 462, 354, 307, 324, 317, 359, 483, 523, 569, 492, 516, 391, 408, 565, 492, 322, 350, 479]
num_at_level3: [166, 206, 156, 135, 133, 127, 129, 131, 175, 211, 191, 165, 178, 169, 151, 204, 164, 122, 132, 205]
num_at_level9: [69, 75, 56, 54, 78, 60, 57, 60, 76, 80, 72, 56, 72, 103, 67, 77, 63, 48, 59, 92]
num_at_level27: [36, 35, 25, 28, 45, 37, 27, 36, 46, 27, 37, 26, 37, 58, 31, 36, 26, 28, 33, 39]
num_at_level81: [20, 13, 12, 11, 23, 20, 13, 20, 23, 10, 13, 9, 18, 31, 16, 18, 11, 16, 19, 21]
[ASHA-RELU]:
num_at_level1: [599, 670, 682, 817, 608, 585, 770, 397, 613, 721, 599, 601, 618, 718, 613, 674, 715, 638, 598, 652]
num_at_level3: [201, 246, 242, 277, 225, 209, 282, 140, 212, 245, 202, 205, 215, 245, 207, 239, 238, 224, 221, 234]
num_at_level9: [75, 94, 94, 100, 89, 92, 101, 60, 78, 89, 76, 82, 80, 98, 86, 96, 83, 84, 90, 91]
num_at_level27: [37, 43, 36, 34, 40, 45, 39, 35, 34, 31, 40, 40, 38, 39, 35, 34, 29, 34, 41, 35]
num_at_level81: [23, 19, 14, 13, 19, 21, 15, 24, 17, 13, 20, 18, 19, 18, 20, 16, 13, 15, 22, 17]
[MOBSTER-RELU]:
num_at_level1: [241, 319, 352, 438, 354, 386, 197, 262, 203, 387, 320, 139, 359, 401, 334, 294, 361, 403, 178, 141]
num_at_level3: [110, 156, 135, 166, 138, 143, 104, 124, 95, 136, 133, 71, 133, 151, 130, 122, 134, 151, 92, 74]
num_at_level9: [50, 83, 59, 75, 59, 55, 57, 72, 53, 53, 58, 40, 62, 63, 61, 54, 52, 65, 48, 47]
num_at_level27: [31, 51, 29, 31, 29, 23, 39, 38, 36, 20, 29, 36, 32, 29, 32, 29, 24, 27, 31, 34]
num_at_level81: [20, 35, 12, 11, 12, 15, 22, 18, 26, 12, 16, 27, 16, 15, 20, 15, 15, 13, 18, 22]
There are four subfigures arranged as a two-by-two matrix. Each contains two curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.
More generally, the data from our 640 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.
The function metadata_to_setup maps the metadata stored for an experiment to the setup name. In our basic case, the setup is simply the name of the method.
The function metadata_to_subplot maps the metadata to the subplot index (0, 1, 2, 3). We group setups with the same configuration space, but also split multi-fidelity and single-fidelity methods.
Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 20 experiments, one for each seed. These 20 curves are interpolated to a common grid, and at each grid point, the 20 values are aggregated into lower, aggregate, upper (see the sketch after this list). In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode).
We pass extra_results_keys to the plot() method in order to also retrieve extra results. This method returns a dictionary, whose “extra_results” entry is what we need.
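To make the aggregation step more concrete, here is a small self-contained sketch (plain NumPy, not Syne Tune code) of how best-value-so-far curves from several seeds can be interpolated to a common grid and aggregated into lower, aggregate, upper. It uses the mean and empirical quartiles; Syne Tune's iqm_bootstrap mode uses the interquartile mean with bootstrap confidence bands instead:
import numpy as np

# Toy data: per seed, (wall-clock time, best metric value found so far), mode="min"
curves = [
    (np.array([10.0, 50.0, 120.0]), np.array([0.90, 0.50, 0.40])),
    (np.array([15.0, 60.0, 110.0]), np.array([0.80, 0.60, 0.45])),
    (np.array([12.0, 40.0, 130.0]), np.array([0.85, 0.55, 0.35])),
]
# Common time grid covering the range of all seeds
grid = np.linspace(10.0, 130.0, num=25)
# Interpolate each best-so-far curve onto the grid
interpolated = np.stack(
    [np.interp(grid, t, y) for t, y in curves]
)  # shape (num_seeds, num_grid_points)
# Aggregate across seeds at each grid point
aggregate = interpolated.mean(axis=0)
lower = np.quantile(interpolated, 0.25, axis=0)
upper = np.quantile(interpolated, 0.75, axis=0)
print(grid[0], lower[0], aggregate[0], upper[0])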
Advanced Experimenting
Once you start to run many experiments, you will get better at avoiding wasteful repetitions. Here are some ways in which Syne Tune can support you.
Combining results from several studies: It often happens that results for a new idea need to be compared to baselines on a common set of benchmarks. You do not have to re-run baselines, but can easily combine older results with more recent ones. This is explained here.
When running many experiments, some may fail. Syne Tune supports you in not having to re-run everything from scratch. As already noted above, when creating aggregate plots, it is important not to use incomplete results stored for failed experiments. The cleanest way to do so is to remove these results on S3. Another option is to filter out corrupt results:
If you forget to remove such corrupt results, you will get a reminder when creating ComparativeResults. Since you pass the list of setup names and the number of seeds (in num_runs), you get a warning when too many experiments have been found, along with the path names.
Results are stored on S3, using object name prefixes of the form <s3-bucket>/syne-tune/docs-2/ASHA/docs-2-fcnet-protein-7-2023-04-20-15-20-18-456/ or <s3-bucket>/syne-tune/docs-2/MOBSTER-7/docs-2-fcnet-protein-7-2023-04-20-15-20-00-677/. The pattern is <tag>/<method>/<tag>-<benchmark>-<seed>-<datetime>/ for cheap methods, and <tag>/<method>-<seed>/<tag>-<benchmark>-<seed>-<datetime>/ for expensive methods.
Instead of removing corrupt results on S3, you can also filter them by datetime, using the datetime_bounds argument of ComparativeResults. This allows you to define an open or closed datetime range for results you want to keep. If your failed attempts precede the ones that finally worked out, this type of filtering can save you the headache of removing files on S3.
Warning: When you remove objects on S3 for some experiment tag, it is strongly recommended to remove all result files locally (so everything at ~/syne-tune/<tag>/) and sync them back from S3, using the command at the start of this section. aws s3 sync is prone to making mistakes otherwise, which are very hard to track down.
My Code Contains Packages
All code in benchmarking.examples.demo_experiment
is contained in a single
directory. If your code for launching experiments and defining benchmarks is
structured into packages, you need to follow some extra steps.
There are two choices you have:
Either you install Syne Tune from source. In this case, you can just keep your launcher scripts and benchmark definitions in there, and use absolute imports from benchmarking. One advantage of this is that you can use all benchmarks currently included in benchmarking.benchmark_definitions.
Or you do not install Syne Tune from source, in which case this section is for you.
We will use the example in benchmarking.nursery.odsc_tutorial
. More
details about this example are found in
this tutorial. We will not assume that Syne
Tune is installed from source, but just that the code from
benchmarking.nursery.odsc_tutorial
is present at <abspath>/odsc_tutorial/
.
The root package for this example is transformer_wikitext2
, in that all
imports start from there, for example:
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_local import main

if __name__ == "__main__":
    main(methods, benchmark_definitions)
The code has the following structure:
tree transformer_wikitext2/
transformer_wikitext2
├── __init__.py
├── baselines.py
├── benchmark_definitions.py
├── code
│   ├── __init__.py
│   ├── requirements.txt
│   ├── training_script.py
│   ├── training_script_no_checkpoints.py
│   ├── training_script_report_end.py
│   └── transformer_wikitext2_definition.py
├── local
│   ├── __init__.py
│   ├── hpo_main.py
│   ├── launch_remote.py
│   ├── plot_learning_curve_pairs.py
│   ├── plot_learning_curves.py
│   ├── plot_results.py
│   └── requirements-synetune.txt
└── sagemaker
    ├── __init__.py
    ├── hpo_main.py
    ├── launch_remote.py
    ├── plot_results.py
    └── requirements.txt
Training code and benchmark definition are in code
, launcher and plotting
scripts for the local backend in local
, and ditto for the SageMaker backend
in sagemaker
.
In order to run any of the scripts, the PYTHONPATH environment variable needs
to be extended as follows:
export PYTHONPATH="${PYTHONPATH}:<abspath>/odsc_tutorial/"
Here, you need to replace <abspath>
with the absolute path to odsc_tutorial
.
Once this is done, the following should work:
python transformer_wikitext2/local/hpo_main.py \
--experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1
Of course, this script needs all training script dependencies to be installed locally. If you work with SageMaker, it is much simpler to launch experiments remotely. The launcher script is as follows:
from pathlib import Path

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_local import launch_remote

if __name__ == "__main__":
    entry_point = Path(__file__).parent / "hpo_main.py"
    source_dependencies = [str(Path(__file__).parent.parent)]
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        source_dependencies=source_dependencies,
    )
Importantly, you need to set source_dependencies
in this script. Here,
source_dependencies = [str(Path(__file__).parent.parent)]
translates
to ["<abspath>/odsc_tutorial/transformer_wikitext2"]
. If you have
multiple root packages you want to import from, source_dependencies
must
contain all of them.
The following command should work now:
python transformer_wikitext2/local/launch_remote.py \
--experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1 \
--method BO
This should launch one SageMaker training job, which runs Bayesian optimization with 4 workers. You can also test remote launching with the SageMaker backend:
python transformer_wikitext2/sagemaker/launch_remote.py \
--experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1 \
--method BO --n_workers 2
This command should launch one SageMaker training job running Bayesian optimization with the SageMaker backend, meaning that at any given time, two worker training jobs are running.
How to Contribute a New Scheduler
This tutorial guides developers and researchers to contribute a new scheduler to Syne Tune, or to modify and extend an existing one.
We hope this information inspires you to give it a try. Please do consider contributing your efforts to Syne Tune:
Reproducible research: Syne Tune contains careful implementations of many baselines and SotA algorithms. Once your new method is in there, you can compare apples against apples (same backend, same benchmarks, same stopping rules) instead of apples against oranges.
Faster and cheaper: You have a great idea for a new scheduler? Test it right away on a large range of benchmarks. Use Syne Tune’s blackbox repository and simulator backend in order to dramatically cut compute costs and waiting time.
Impact: If you compared your method to a range of others, you know how hard it is to get full-fledged HPO code of others running. Why would it be any different for yours? We did a lot of the hard work already, why not benefit from that?
Your code is more awesome than ours? Great! Why not contribute your backend or your benchmarks to Syne Tune as well?
Note
In order to develop new methodology in Syne Tune, make sure to use an
installation from source.
In particular, you need to have installed the dev
dependencies.
A First Example
In this section, we start with a simple example and clarify some basic concepts.
If you have not done so, we recommend you have a look at Basics of Syne Tune in order to get familiar with basic concepts of Syne Tune.
First Example
A simple example for a new scheduler (called SimpleScheduler
) is given by
the script
examples/launch_height_standalone_scheduler.py.
All schedulers are subclasses of
TrialScheduler
. Important methods
include:
Constructor: Needs to be passed the configuration space. Most schedulers also have metric (name of the metric to be optimized) and mode (whether the metric is to be minimized or maximized; default is "min").
_suggest (internal version of suggest): Called by the Tuner whenever a worker is available. Returns the trial to execute next, which in most cases will start a new configuration using trial ID trial_id (as start_suggestion). Some schedulers may also suggest to resume a paused trial (as resume_suggestion). Our SimpleScheduler simply draws a new configuration at random from the configuration space.
on_trial_result: Called by the Tuner whenever a new result reported by a running trial has been received. Here, trial provides information about the trial (most important is trial.trial_id), and result contains the arguments passed to Reporter by the underlying training script. All but the simplest schedulers maintain a state which is modified based on this information. The scheduler also decides what to do with this trial, returning a SchedulerDecision to the Tuner, which in turn relays this decision to the backend. Our SimpleScheduler maintains a sorted list of all reported metric values in self.sorted_results. Whenever a trial reports a metric value which is worse than 4/5 of all previous reports (across all trials), the trial is stopped, otherwise it may continue. This is an example of a multi-fidelity scheduler, in that a trial reports results multiple times (for example, a script training a neural network may report validation errors at the end of each epoch). Even if your scheduler does not support a multi-fidelity setup, in that it does not make use of intermediate results, it should work properly with training scripts which report such results (e.g., after every epoch).
metric_names: Returns names of metrics which are relevant to this scheduler. These names appear as keys in the result dictionary passed to on_trial_result.
There are further methods in
TrialScheduler
, which will be discussed
in detail below. This simple scheduler is also
missing the points_to_evaluate
argument, which we recommend every new
scheduler to support, and which is discussed in more detail
here.
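To make these concepts concrete, here is a condensed sketch of a scheduler in the same spirit as SimpleScheduler. It follows the TrialScheduler API as described above; the full, authoritative version is in the example script, and details may differ between Syne Tune versions:
import bisect
from typing import Any, Dict, List, Optional

from syne_tune.backend.trial_status import Trial
from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    SchedulerDecision,
    TrialScheduler,
    TrialSuggestion,
)


class MySimpleScheduler(TrialScheduler):
    def __init__(self, config_space: Dict[str, Any], metric: str, mode: str = "min"):
        super().__init__(config_space=config_space)
        self.metric = metric
        self.mode = mode
        self.sorted_results: List[float] = []

    def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Draw a new configuration at random (constant entries are kept as is)
        config = {
            name: domain.sample() if isinstance(domain, Domain) else domain
            for name, domain in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
        value = float(result[self.metric])
        if self.mode == "max":
            value = -value  # so that smaller is always better below
        # Rank the new report among all values seen so far (across all trials)
        position = bisect.bisect_left(self.sorted_results, value)
        bisect.insort_left(self.sorted_results, value)
        num_results = len(self.sorted_results)
        # Stop the trial if its report is worse than 4/5 of all reports so far
        if num_results >= 5 and position > 0.8 * num_results:
            return SchedulerDecision.STOP
        return SchedulerDecision.CONTINUE

    def metric_names(self) -> List[str]:
        return [self.metric]

    def metric_mode(self) -> str:
        return self.mode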
Basic Concepts
Recall from Basics of Syne Tune that an HPO
experiment is run as interplay between a backend and a scheduler, which is
orchestrated by the Tuner
. The backend starts, stops,
pauses, or resumes training jobs and relays their reports. A trial abstracts
the evaluation of a hyperparameter configuration. There is a diverse range of
schedulers which can be implemented in Syne Tune, some examples are:
Simple “full evaluation” schedulers. These suggest configurations for new trials, but do not try to interact with running trials, even if the latter post intermediate results. A basic example is FIFOScheduler, to be discussed below.
Early-stopping schedulers. These require trials to post intermediate results (e.g., validation errors after every epoch), and their on_trial_result may stop underperforming trials early. An example is HyperbandScheduler with type="stopping".
Pause-and-resume schedulers. These require trials to post intermediate results (e.g., validation errors after every epoch). Their on_trial_result may pause trials at certain points in time, and their _suggest may decide to resume a paused trial instead of starting a new one. An example is HyperbandScheduler with type="promotion".
Note
The method on_trial_result()
returns a SchedulerDecision
,
signaling the tuner to continue, stop, or pause the reporting trial.
The difference between pause and stop is important. If a trial is stopped,
it cannot be resumed later on. In particular, its checkpoints may be removed
(if the backend is created with delete_checkpoints=True
). On the other
hand, if a trial is paused, it may be resumed in the future, and its most
recent checkpoint is retained (more details are given
here).
Asynchronous Job Execution
One important constraint on any scheduler to be run in Syne Tune is that calls
to both suggest
and on_trial_result
have to be non-blocking: they need
to return instantaneously, i.e. must not wait for some future events to happen.
This is to ensure that in the presence of several workers (i.e., parallel
execution resources), idle time is avoided: Syne Tune is always executing
parallel jobs asynchronously.
Unfortunately, many HPO algorithms proposed in the literature assume a synchronous job execution setup, often for conceptual simplicity (examples include successive halving and Hyperband, as well as batch suggestions for Bayesian optimization). In general, it just takes a little extra effort to implement non-blocking versions of these, and Syne Tune provides ample support code for doing so, as will be demonstrated in detail.
Searchers and Schedulers
Many HPO algorithms have a modular structure. They need to make decisions about
how to keep workers busy in order to obtain new information (suggest
), and
they need to react to new results posted by trials (on_trial_result
). Most
schedulers make these decisions following a general principle, such as:
Random search: New configurations are sampled at random.
Bayesian optimization: Surrogate models representing metrics are fit to result data, and they are used to make decisions (mostly suggest). Examples include Gaussian process based BO or TPE (Tree Parzen Estimator).
Evolutionary search: New configurations are obtained by mutating well-performing members of a population.
Once such internal structure is recognized, we can use it to expand the range of methods while maintaining simple, modular implementations. In Syne Tune, this is done by configuring generic schedulers with internal searchers. A basic example is given below, more advanced examples follow further below.
If you are familiar with Ray Tune, please note a difference in terminology. In Ray Tune, searcher and scheduler are two independent concepts, mapping to different decisions to be made by an HPO algorithm. In Syne Tune, the HPO algorithm is represented by the scheduler, which may have a searcher as component. We found that once model-based HPO is embraced (e.g., Bayesian optimization), this creates strong dependencies between suggest and stop or resume decisions, so that the supposed modularity does not really exist.
Maybe the most important recommendation for implementing a new scheduler in Syne Tune is this: be lazy!
Can your idea be implemented as a new searcher, to be plugged into an existing generic scheduler? Detailed examples are given here, here, and here.
Does your idea involve changing the stop/continue or pause/resume decisions in asynchronous successive halving or Hyperband? All you need to do is to implement a new
RungSystem
. Examples: StoppingRungSystem, PromotionRungSystem, RUSHStoppingRungSystem, PASHARungSystem, CostPromotionRungSystem.
Random Search
Random search is arguably the simplest HPO baseline. In a nutshell, _suggest
samples a new configuration at random from the configuration space, much like
our SimpleScheduler
above, and on_trial_result
does nothing except
returning SchedulerDecision.CONTINUE
. A slightly more advanced version
would make sure that the same configuration is not suggested twice.
In this section, we walk through the Syne Tune implementation of random search,
thereby discussing some additional concepts. This will also be a first example
of the modular concept just described: random search is implemented as generic
FIFOScheduler
configured by a
RandomSearcher
.
A self-contained implementation of random search would be shorter. On the other
hand, as seen in
syne_tune.optimizer.baselines
, FIFOScheduler
also powers GP-based
Bayesian optimization, grid search, BORE, regularized evolution and constrained
BO simply by specifying different searchers. A number of concepts, to be
discussed here, have to be implemented once only and can be maintained much more
easily.
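For orientation, this is how random search is typically obtained from the generic scheduler (a minimal sketch; the configuration space and metric name are made up):
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "width": randint(1, 20),
    "height": uniform(1.0, 20.0),
}
# searcher="random" selects RandomSearcher via the searcher factory; other
# names (e.g. "bayesopt") select other searchers and hence other HPO methods
scheduler = FIFOScheduler(
    config_space,
    searcher="random",
    metric="mean_loss",
    mode="min",
    random_seed=31415927,
)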
FIFOScheduler and RandomSearcher
We will have a close look at
FIFOScheduler
and
RandomSearcher
. Let us first
consider the arguments of FIFOScheduler
:
searcher
,search_options
: These are used to configure the scheduler with a searcher. For ease of use,searcher
can be a name, and additional arguments can be passed viasearch_options
. In this case, the searcher is created by a factory, as detailed below. Alternatively,searcher
can also be aBaseSearcher
object.metric
,mode
: As discussed above inSimpleScheduler
.random_seed
: Several pseudo-random number generators may be used in scheduler and searcher. Seeds for these are drawn from a random seed generator maintained inFIFOScheduler
, whose seed can be passed here. As a general rule, all schedulers and searchers implemented in Syne Tune carefully manage such generators (and contributed schedulers are strongly encourage to adopt this pattern).points_to_evaluate
: A list of configurations (possibly partially specified) to be suggested first. This allows the user to initialize the search by default configurations, thereby injecting knowledge about the task. We strongly recommend every scheduler to support this mechanism. More details are given below.max_resource_attr
,max_t
: These arguments are relevant for multi-fidelity schedulers. Only one of them needs to be given. We recommend to usemax_resource_attr
. More details are given below.
The most important use case is to configure FIFOScheduler
with a new
searcher, and we will concentrate on this one. First, the base class of all
searchers is BaseSearcher
:
points_to_evaluate
: A list of configurations to be suggested first. This is initialized and (possibly) imputed in the base class, but needs to be used in child classes. Configurations inpoints_to_evaluate
can be partially specified. Any hyperparameter missing in a configuration is imputed using a “midpoint” rule. For a numerical parameter, this is the middle of the range (in linear or log scale). For a categorical parameter, the first value is chosen. If points_to_evaluate
is not given, the default is[dict()]
: a single initial configuration is determined fully by the midpoint rule. In order not to use initial configurations at all, the user has to passpoints_to_evaluate=[]
. The imputation of configurations is done in the base class.configure_scheduler
: Callback function, allows the searcher to configure itself depending on the scheduler. It also allows the searcher to reject schedulers it is not compatible with. This method is called automatically at the beginning of an experiment.get_config
: This method is called by the scheduler in_suggest
, it delegates the suggestion of a configuration for a new trial to the searcher.on_trial_result
: This is called by the scheduler in its ownon_trial_result
, also passing the configuration of the current trial. If the searcher maintains a surrogate model (for example, based on a Gaussian process), it should update its model withresult
data iffupdate=True
. This is discussed in more detail below. Note thaton_trial_result
does not return anything: decisions on how to proceed with the trial are not done in the searcher.register_pending
: Registers one (or more) pending evaluations, which are signals to the searcher that a trial has been started and will return an observation in the future. This is important in order to avoid redundant suggestions in model-based HPO.evaluation_failed
: Called by the scheduler if a trial failed. Default searcher reactions are to remove pending evaluations and not to suggest the corresponding configuration again. More advanced constrained searchers may also try to avoid nearby configurations in the future.cleanup_pending
: Removes all pending evaluations for a trial. This is called by the scheduler when a trial terminates.get_state
,clone_from_state
: Used in order to serialize and de-serialize the searcherdebug_log
: There is some built-in support for a detailed log, embedded inFIFOScheduler
and the Syne Tune searchers.
Below BaseSearcher
, there is
StochasticSearcher
, which
should be used by all searchers which make random decisions. It maintains a PRN
generator and provides methods to serialize and de-serialize its state.
StochasticAndFilterDuplicatesSearcher
extends StochasticSearcher
. It supports a number of features which are
desirable for most searchers:
Seed management for random decisions.
Avoid suggesting the same configuration more than once. While we generally recommend using the default allow_duplicates == False, allowing duplicates can be useful when dealing with configuration spaces of small finite size.
Restrict configurations which can be suggested to a finite set. This can be very useful when using tabulated blackboxes. It does not make sense for every scheduler though, as some rely on a continuous search over the configuration space. You can inherit from StochasticAndFilterDuplicatesSearcher and still not support this feature, by insisting on restrict_configurations == None.
All built-in Syne Tune searchers either inherit from this class, or avoid
duplicate suggestions in a different way. Finally, let us walk through
RandomSearcher
:
There are a few features beyond
SimpleScheduler
above. The searcher does not suggest the same configuration twice (ifallow_duplicates == False
), and also warns if a finite configuration space has been exhausted. It also usesHyperparameterRanges
for random sampling and comparing configurations (to spot duplicates). This is a useful helper class, also for encoding configurations as vectors. The logic of detecting duplicates is implemented in the base classStochasticAndFilterDuplicatesSearcher
. Finally,debug_log
is used for diagnostic logs.get_config
first asks for another entry frompoints_to_evaluate
by way of_next_initial_config
. It then samples a new configuration at random. This is done without replacement ifallow_duplicates == False
, and with replacement otherwise. If successful, it also feedsdebug_log
._update
: This is not needed for random search, but is used here in order to feeddebug_log
.
The TrialScheduler API
In this section, we have a closer look at the
TrialScheduler
API, and how a scheduler
interacts with the trial backend.
Interaction between TrialScheduler and TrialBackend
Syne Tune supports a multitude of automatic tuning scenarios which embrace
asynchronous job execution. The goal of automatic tuning is to find a
configuration whose evaluation results in a sufficiently small (or large, if
mode="max"
) metric value, and to do so as fast as possible. This is done
by starting trials with promising configurations (suggest
), and
(optionally) by stopping or pausing trials which underperform. A certain
number of such evaluation (or training) jobs can be executed in parallel, on
separate workers (which can be different GPUs or CPU cores on the same
instance, or different instances).
In Syne Tune, this process is split between two entities: the trial backend
and the trial scheduler. The backend wraps the
training code to be executed for different configurations and is responsible
for starting jobs, as well as stopping, pausing, or resuming them. It also collects results
reported by the training jobs and relays them to the scheduler. In Syne Tune,
pause-and-resume scheduling is done via
checkpointing. While
code to write and load checkpoints locally must be provided by the training
script, the backend makes them available when needed. There are two basic
events which happen repeatedly during an HPO experiment, as orchestrated by the
Tuner
:
The Tuner polls the backend, which signals that one or more workers are available. For each free worker, it calls suggest(), asking for what to do next. As already seen in our first example, the scheduler will typically suggest a configuration for a new trial to be started. On the other hand, a pause-and-resume scheduler may also suggest resuming a trial which is currently paused (having been started, and then paused, in the past). Based on the scheduler response, the Tuner asks the backend to start a new trial, or to resume an existing one.
The Tuner polls the backend for new results reported since the last poll. For each such result, on_trial_result() is called. The scheduler decides what to do with the reporting trial. Based on this decision, the Tuner asks the backend to stop or pause the trial (or does nothing, in case the trial is to continue).
A schematic sketch of this loop is shown below.
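The following pseudo-code is only a schematic rendering of these two events; the backend-side helpers (free_workers, start_trial, resume_trial, fetch_new_results, stop_trial, pause_trial) and new_trial_id are hypothetical names used for illustration, not the real backend API:

while not stop_criterion_met():
    # Event 1: free workers ask for new work
    for _ in backend.free_workers():
        suggestion = scheduler.suggest(trial_id=new_trial_id())
        if suggestion is None:
            break  # nothing more to suggest
        if suggestion.spawn_new_trial_id:
            backend.start_trial(suggestion.config)
        else:
            backend.resume_trial(suggestion.checkpoint_trial_id)
    # Event 2: newly reported results are relayed to the scheduler
    for trial, result in backend.fetch_new_results():
        decision = scheduler.on_trial_result(trial, result)
        if decision == SchedulerDecision.STOP:
            backend.stop_trial(trial.trial_id)
        elif decision == SchedulerDecision.PAUSE:
            backend.pause_trial(trial.trial_id)
        # SchedulerDecision.CONTINUE: the trial keeps running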
The processing of these events is non-blocking and fully asynchronous, without any synchronization points. Depending on the backend, there can be substantial delays between a trial reporting a result and a stop or pause decision being executed. During this time, the training code simply continues; it may even report further results. Moreover, a worker may be idle between finishing an evaluation and starting or resuming another one, due to delays in the backend or even compute time for decisions in the scheduler. However, it will never be idle waiting for results from other trials.
TrialScheduler API
We now discuss additional aspects of the
TrialScheduler
API, beyond what has
already been covered here:
suggest returns a TrialSuggestion object with fields spawn_new_trial_id, checkpoint_trial_id, config. Here, start_suggestion() has spawn_new_trial_id=True and requires config. A new trial is to be started with configuration config. Typically, this trial starts training from scratch. However, some specific schedulers allow the trial to warm-start from a checkpoint written for a different trial (an example is PopulationBasedTraining). A pause-and-resume scheduler may also return resume_suggestion(), where spawn_new_trial_id=False and checkpoint_trial_id is mandatory. In this case, a currently paused trial with ID checkpoint_trial_id is to be resumed. Typically, the configuration of the trial does not change, but if config is used, the resumed trial is assigned a new configuration. However, for all schedulers currently implemented in Syne Tune, a trial's configuration never changes. A sketch of both cases is given after this list.
The only reason for suggest to return None is if no further suggestion can be made. This can happen if the configuration space has been exhausted. As discussed here, the scheduler cannot delay a suggest decision to a later point in time.
The helper methods _preprocess_config and _postprocess_config are used when interfacing with a searcher. Namely, the configuration space (member config_space) may contain any number of fixed attributes alongside the hyperparameters to be tuned (the latter have values of type Domain), and each hyperparameter has a specific value_type (mostly float, int, or str). Searchers require clean configurations, containing only hyperparameters with the correct value types, which is ensured by _preprocess_config. Also, _postprocess_config adds back the fixed attributes from config_space, unless they have already been set.
on_trial_add: This method is called by Tuner once a new trial has been scheduled to be started. In general, a scheduler may assume that if suggest returns start_suggestion(), the corresponding trial is going to be started, so on_trial_add is not mandatory.
on_trial_error: This method is called by Tuner if the backend reports a trial's evaluation to have failed. A useful reaction for the scheduler is to not propose this configuration again, and also to remove pending evaluations associated with this trial.
on_trial_complete: This method is called once a trial's evaluation is complete, without having been stopped early. The final reported result is passed here. Schedulers which ignore intermediate reports from trials may just implement this method and have on_trial_result return SchedulerDecision.CONTINUE. Multi-fidelity schedulers may ignore this method, since any reported result is transmitted via on_trial_result (the final result is transmitted twice, first via on_trial_result, then via on_trial_complete).
on_trial_remove is called when a trial gets stopped or paused, so it is not running anymore, but also did not finish naturally. Once more, this method is not mandatory.
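As an illustration, here is a hedged sketch of how a pause-and-resume scheduler might implement suggest in terms of the TrialSuggestion helpers discussed above; _paused_trial_to_resume and _next_config are hypothetical internal helpers:

def suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
    paused_trial_id = self._paused_trial_to_resume()  # hypothetical helper
    if paused_trial_id is not None:
        # Resume a paused trial from its checkpoint; its configuration is
        # left unchanged
        return TrialSuggestion.resume_suggestion(trial_id=paused_trial_id)
    config = self._next_config()  # hypothetical helper
    if config is None:
        return None  # configuration space exhausted, nothing to suggest
    # Start a new trial with this configuration
    return TrialSuggestion.start_suggestion(config)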
Wrapping External Scheduler Code
One of the most common instances of extending Syne Tune is wrapping external code. While there are comprehensive open source frameworks for HPO, many recent advanced algorithms are only available as research code, which typically ignores systems aspects such as distributed scheduling, or maintaining results in an interchangeable format. Due to the modular, backend-agnostic design of Syne Tune, external scheduler code is easily integrated, and can then be compared "apples to apples" against a host of baselines, be it by fast simulation on surrogate benchmarks, or distributed across several machines.
In this chapter, we will walk through an example of how to wrap Gaussian process based Bayesian optimization from BoTorch.
BoTorchSearcher
While Syne Tune supports Gaussian process based Bayesian optimization natively
via GPFIFOSearcher
, with
searcher="bayesopt"
in FIFOScheduler
,
you can also use BoTorch via
BoTorchSearcher
,
with searcher="botorch"
in
FIFOScheduler
.
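For instance, switching to the BoTorch searcher should only require changing the searcher name; the following is a hedged sketch, with the metric name as a placeholder:

from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space,
    searcher="botorch",
    metric="validation_error",
    mode="min",
)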
Before we look into the code, note that even though we wrap external HPO code, we still need to implement some details on our side:
We need to maintain the trials which have resulted in observations, as well as those which are pending (i.e., have been started, but have not yet returned an observation).
We need to provide the code for suggesting initial configurations, either drawing from points_to_evaluate or sampling at random.
We need to avoid duplicate suggestions if allow_duplicates == False.
BoTorch requires configurations to be encoded as vectors with values in \([0, 1]\). We need to provide this encoding and decoding as well.
Such details are often ignored in research code (in fact, most HPO code just implements the equivalent of get_config(), given all previous data), but they have robust and easy-to-use solutions in Syne Tune, as we demonstrate here. Here is _get_config():
def _get_config(self, trial_id: str, **kwargs) -> Optional[dict]:
    trial_id = int(trial_id)
    config_suggested = self._next_initial_config()
    if config_suggested is None:
        if len(self.objectives()) < self.num_minimum_observations:
            config_suggested = self._get_random_config()
        else:
            config_suggested = self._sample_next_candidate()
    if config_suggested is not None:
        self.trial_configs[trial_id] = config_suggested
    return config_suggested
First, self._next_initial_config() is called, which returns a configuration from points_to_evaluate if there is still one not yet returned, otherwise None.
Otherwise, if there are fewer than self.num_minimum_observations trials which have returned observations, we return a randomly sampled configuration (self._get_random_config()), otherwise one suggested by BoTorch (self._sample_next_candidate()).
Here, self._get_random_config() is implemented in the base class StochasticAndFilterDuplicatesSearcher and calls the same code as all other schedulers employing random suggestions in Syne Tune. In particular, this function allows an exclusion list of configurations to avoid to be passed.
The exclusion list self._excl_list is maintained in the base class StochasticAndFilterDuplicatesSearcher. If allow_duplicates == False, it contains all configurations suggested previously. Otherwise, it contains configurations of failed or pending trials, which we want to avoid in any case. The exclusion list is implemented as ExclusionList. Configurations are represented by hash strings which are independent of details such as floating point resolution.
If allow_duplicates == False and the configuration space is finite, it can happen that all configurations have already been suggested, in which case get_config returns None.
Finally, _get_config is called in get_config(), where, if allow_duplicates == False, the new configuration is added to the exclusion list.
In _sample_next_candidate(), the usage of self._restrict_configurations is of interest. It relates to the restrict_configurations argument. If this is not None, configurations are suggested from a finite set, namely those in self._restrict_configurations. If allow_duplicates == False, entries are removed from there once suggested. For our example, we need to avoid running a local optimization of the acquisition function (via optimize_acqf) in this case, and use _sample_and_pick_acq_best() instead. Since the latter uses self._get_random_config(), we are all set, as this already makes use of self._restrict_configurations.
Other methods are straightforward:
We also take care of pending evaluations (i.e., trials whose observations have not been reported yet). In register_pending(), the trial ID is added to self.pending_trials.
_update() stores the metric value from result[self._metric], where self._metric is the name of the primary metric. Also, the trial is removed from self.pending_trials, so it ceases to be pending.
By implementing evaluation_failed() and cleanup_pending(), we make sure that failed trials do not remain pending.
configure_scheduler() is a callback which allows the searcher to depend on its scheduler. In particular, the searcher should reject unsupported scheduler types. The base class implementation configure_scheduler() sets self._metric and self._mode from the corresponding attributes of the scheduler, so they do not have to be set at construction of the searcher.
Finally, all the code specific to BoTorch is located in
_sample_next_candidate()
and other internal methods. Importantly, BoTorch requires configurations to be
encoded as vectors with values in \([0, 1]\), which is done using the
self._hp_ranges
member, as is detailed below.
Note
When implementing a new searcher, whether from scratch or wrapping external
code, we recommend you use the base class
StochasticAndFilterDuplicatesSearcher
and implement the allow_duplicates
argument. This will also give you
proper random seed management and points_to_evaluate
. Instead of
get_config
, you implement the internal method _get_config
. If you need
to draw configurations at random, use the method _get_random_config
which
uses the built-in exclusion list, properly deals with configuration spaces
of finite size, and uses the random generator seeded in a consistent and
reproducible way.
We also recommend that you implement the restrict_configurations argument, unless this is hard to do for your scheduler. Often, a scheduler can be made to score a certain number of configurations and return the best. If so, use self._get_random_config() to select the configurations to score, which takes care of restrict_configurations.
HyperparameterRanges
Most model-based HPO algorithms require configurations to be encoded as vectors with values in \([0, 1]\). If \(\mathbf{u} = e(\mathbf{x})\) and \(\mathbf{x} = d(\mathbf{u})\) denote the encoding and decoding maps, where \(\mathbf{x}\in \mathcal{X}\) is a configuration and \(\mathbf{u} \in [0,1]^k\), then \(d(e(\mathbf{x})) = \mathbf{x}\) for every configuration \(\mathbf{x}\), and a random sample \(d(\mathbf{u})\), where the components of \(\mathbf{u}\) are sampled uniformly at random, is equivalent to a random sample from the configuration space, as defined by the hyperparameter domains.
With HyperparameterRanges
,
Syne Tune provides encoding and decoding for all domains in
syne_tune.config_space
(see this tutorial for
a summary). In fact, this API can be implemented in different ways, and the
factory function
make_hyperparameter_ranges()
can be used to create a HyperparameterRanges
object from a configuration
space.
to_ndarray() provides the encoding map \(e(\mathbf{x})\), and to_ndarray_matrix() encodes a list of configurations into a matrix.
from_ndarray() provides the decoding map \(d(\mathbf{u})\).
config_to_match_string() maps a configuration to a hash string which can be used to test for (approximate) equality (see the allow_duplicates discussion above).
A usage sketch is given below.
Apart from encoding and decoding, HyperparameterRanges provides further functionality, such as support for a resource attribute in model-based multi-fidelity schedulers, or the active_config_space feature, which is useful to support transfer tuning (i.e., HPO in the presence of evaluation data from earlier experiments with different configuration spaces).
Note
When implementing a new searcher or wrapping external code, we recommend you use HyperparameterRanges in order to encode and decode configurations as vectors, instead of writing this on your own. Doing so ensures that your searcher supports all hyperparameter domains offered by Syne Tune, even new ones potentially added in the future. If you do not like the built-in implementation of the HyperparameterRanges API, feel free to contribute a different one.
Managing Dependencies
External code can come with extra dependencies. For example, BoTorchSearcher
depends on torch
, botorch
, and gpytorch
. If you just use Syne Tune
for your own experiments, you do not have to worry about this. However, we
strongly encourage you to
contribute back your extension.
Since some applications of Syne Tune require a restricted set of dependencies, these are carefully managed. There are different installation options, each of which comes with a requirements.txt file (see setup.py for details).
First, check whether any of the installation options cover the dependencies of your extension (possibly a union of several of them). If so, please use conditional imports w.r.t. these (see below).
If the required dependencies are not covered, you can create a new installation option (say, foo), via requirements-foo.txt and a modification of setup.py. In this case, please also extend try_import by a function try_import_foo_message.
Once all required dependencies are covered by some installation option, wrap their imports as follows:
try:
    from foo import bar  # My dependencies
    # ...
except ImportError:
    print(try_import_foo_message())
Extending Asynchronous Hyperband
Syne Tune provides powerful generic scheduler templates for popular methods like successive halving and Hyperband. These can be run with synchronous or asynchronous decision-making. The most important generic templates at the moment are:
FIFOScheduler: Full evaluation scheduler, base class for many others. See also FIFOScheduler.
HyperbandScheduler: Asynchronous successive halving and Hyperband. See also HyperbandScheduler.
SynchronousHyperbandScheduler: Synchronous successive halving and Hyperband. See also SynchronousHyperbandScheduler.
Chances are your idea for a new scheduler maps to one of these templates, in which case you can save a lot of time and headache by just extending the template, rather than reinventing the wheel. Due to Syne Tune's modular design of schedulers and their components (e.g., searchers, decision rules), you may even get more than you bargained for.
In this section, we will walk through an example of how to furnish the asynchronous successive halving scheduler with a specific searcher.
HyperbandScheduler
Details about asynchronous successive halving and Hyperband are given in the
Multi-fidelity HPO tutorial. This is a
multi-fidelity scheduler, where trials report intermediate results (e.g.,
validation error at the end of each epoch of training). We can formalize this
notion by the concept of resource \(r = 1, 2, 3, \dots\) (e.g.,
\(r\) is the number of epochs trained). A generic implementation of this
method is provided in HyperbandScheduler.
Let us have a look at its arguments not shared with the base class FIFOScheduler:
A mandatory argument is resource_attr, which is the name of a field in the result dictionary passed to scheduler.on_trial_report. This field contains the resource \(r\) for which metric values have been reported. For example, if a trial reports validation error at the end of the 5th epoch of training, result contains {resource_attr: 5}.
We already noted the arguments max_resource_attr and max_t in FIFOScheduler. They are used to determine the maximum resource \(r_{max}\) (e.g., the total number of epochs a trial is to be trained, if not stopped before). As discussed in detail here, it is best practice to reserve a field in the configuration space scheduler.config_space to contain \(r_{max}\). If this is done, its name should be passed in max_resource_attr. Now, every configuration sent to the training script contains \(r_{max}\), which should not be hardcoded in the script. Moreover, if max_resource_attr is used, a pause-and-resume scheduler (e.g., HyperbandScheduler with type="promotion") can modify this field in the configuration of a trial which is only to be run until a certain resource less than \(r_{max}\). If max_resource_attr is not used, then \(r_{max}\) has to be passed explicitly via max_t.
reduction_factor, grace_period, brackets are important parameters detailed in the tutorial. If brackets > 1, we run asynchronous Hyperband with this number of brackets, while for brackets == 1 we run asynchronous successive halving (this is the default).
As detailed in the tutorial, type determines whether the method uses early stopping (type="stopping") or pause-and-resume scheduling (type="promotion"). Further choices of type activate specific algorithms such as RUSH, PASHA, or cost-sensitive successive halving. A construction sketch is given below.
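The following is a hedged construction sketch; the metric, mode, and the resource attribute names ("epoch", "epochs") are placeholders for whatever the training script reports:

from syne_tune.optimizer.schedulers import HyperbandScheduler

scheduler = HyperbandScheduler(
    config_space,            # assumed to contain an "epochs" entry holding r_max
    searcher="random",
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    type="promotion",        # pause-and-resume; "stopping" for early stopping
    grace_period=1,
    reduction_factor=3,
)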
Kernel Density Estimator Searcher
One of the most flexible ways of extending
HyperbandScheduler
is to provide it with
a novel searcher. In order to
understand how this is done, we will walk through
MultiFidelityKernelDensityEstimator
and
KernelDensityEstimator
.
This searcher implements suggest
as in
BOHB, as also detailed in
this tutorial. In a
nutshell, the searcher splits all observations into two parts (good and
bad), depending on metric values lying above or below a certain quantile, and
fits kernel density estimators to these two subsets. It then makes decisions
based on a particular ratio of these densities, which approximates a variant of the expected improvement acquisition function.
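Schematically, the scoring looks as follows; this is illustrative only (the actual logic lives in KernelDensityEstimator), and good_kde, bad_kde stand for the two fitted density models:

import numpy as np


def density_ratio_score(candidates: np.ndarray, good_kde, bad_kde) -> np.ndarray:
    # A larger ratio of "good" to "bad" density marks a more promising candidate
    bad_density = np.maximum(bad_kde.pdf(candidates), 1e-32)  # guard against zero
    return good_kde.pdf(candidates) / bad_density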
We begin with the base class
KernelDensityEstimator
,
which works with schedulers implementing
TrialSchedulerWithSearcher
(the most important one being FIFOScheduler
),
but already implements most of what is needed in the multi-fidelity context.
The code does quite some bookkeeping concerned with mapping configurations to feature vectors. If you want to do this from scratch for your searcher, we recommend using HyperparameterRanges. However, KernelDensityEstimator was extracted from the original BOHB implementation.
Observation data is collected in self.X (feature vectors for configurations) and self.y (values for self._metric, negated if self.mode == "max"). In particular, the _update method simply appends new data to these members.
get_config fits KDEs to the good and bad parts of self.X, self.y. It then samples self.num_candidates configurations at random, evaluates the TPE acquisition function for each candidate, and returns the best one.
configure_scheduler is a callback which allows the searcher to check whether its scheduler is compatible, and to depend on details of this scheduler. In our case, we check whether the scheduler implements TrialSchedulerWithSearcher, which is the minimum requirement for a searcher.
Note
Any scheduler configured by a searcher should inherit from
TrialSchedulerWithSearcher
,
which mainly makes sure that
configure_scheduler()
is called before the searcher is first used. It is also strongly recommended
to implement configure_scheduler
for a new searcher, restricting usage
to compatible schedulers.
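A hedged sketch of such a configure_scheduler implementation, for a hypothetical MySearcher and assuming TrialSchedulerWithSearcher has been imported, could look like this:

def configure_scheduler(self, scheduler):
    # Reject scheduler types the searcher does not support
    assert isinstance(scheduler, TrialSchedulerWithSearcher), (
        "MySearcher can only be used with schedulers implementing "
        "TrialSchedulerWithSearcher"
    )
    # The base class implementation copies self._metric and self._mode from
    # the scheduler
    super().configure_scheduler(scheduler)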
The class MultiFidelityKernelDensityEstimator inherits from KernelDensityEstimator:
On top of self.X and self.y, it also maintains resource values \(r\) for each datapoint in self.resource_levels.
get_config remains the same, only its subroutine train_kde for training the good and bad density models is modified. The idea is to fit these to data from a single rung level, namely the largest level at which we have observed at least self.num_min_data_points points.
configure_scheduler restricts usage to schedulers implementing MultiFidelitySchedulerMixin, which all multi-fidelity schedulers need to inherit from (examples are HyperbandScheduler for asynchronous Hyperband and SynchronousHyperbandScheduler for synchronous Hyperband). It also calls the base class configure_scheduler(). Moreover, self.resource_attr is obtained from the scheduler, so it does not have to be passed.
Note
Any multi-fidelity scheduler configured by a searcher should inherit from both
TrialSchedulerWithSearcher
and
MultiFidelitySchedulerMixin
.
The latter is a basic API to be implemented by multi-fidelity schedulers, which
is used by the configure_scheduler
of searchers specialized to multi-fidelity
HPO. Doing so makes sure any new multi-fidelity scheduler can seamlessly be
used with any such searcher.
While being functional and simple, the
MultiFidelityKernelDensityEstimator
does not showcase the full range of
information exchanged between HyperbandScheduler
and a searcher. In
particular:
register_pending: BOHB does not take pending evaluations into account.
remove_case, evaluation_failed are not implemented.
get_state, clone_from_state are not implemented, so schedulers with this searcher are not properly serialized.
For a more complete and advanced example, the reader is invited to study
GPMultiFidelitySearcher
and
GPFIFOSearcher
.
This searcher takes pending evaluations into account (by way of fantasizing).
Moreover, it can be configured with a Gaussian process model and an acquisition
function, which is optimized in a gradient-based manner.
Moreover, as already noted here,
HyperbandScheduler
also allows configuring the decision rule for
stop/continue or pause/resume as part of on_trial_report
. Examples for this
are found in
StoppingRungSystem
,
PromotionRungSystem
,
RUSHStoppingRungSystem
,
PASHARungSystem
,
CostPromotionRungSystem
.
Extending Synchronous Hyperband
In the previous section, we gave an example of how to extend asynchronous Hyperband with a new searcher. Syne Tune also provides a scheduler template for synchronous Hyperband. In this section, we will walk through an example of how to extend this template.
Our example here is somewhat more advanced than the one given for asynchronous Hyperband. In fact, we will walk through the implementation of Differential Evolution Hyperband (DEHB) in Syne Tune. Readers who are not interested in how to extend synchronous Hyperband may skip this section without loss.
Synchronous Hyperband
The differences between synchronous and asynchronous successive halving and
Hyperband are detailed in
this tutorial.
In a nutshell, synchronous Hyperband uses rung levels of a priori fixed size,
and decisions on which trials to promote to the next level are only done when
all slots in the current rung are filled. In other words, promotion decisions
are synchronized, while the execution of parallel jobs still happens
asynchronously. This requirement poses slight additional challenges for an
implementation, over what is said in
published work. We start with an
overview of
SynchronousHyperbandScheduler
.
Concepts such as resource, rung, bracket, grace period \(r_{min}\),
reduction factor \(\eta\) are detailed in
this tutorial.
SynchronousHyperbandBracket represents a bracket, consisting of a list of rungs, where each rung is defined by a pair (rung_size, level): rung_size is the number of slots, level the resource level. Any system of rungs is admissible, as long as rung_size is strictly decreasing and level is strictly increasing.
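For example (an illustrative choice, not a default), with grace period \(r_{min} = 1\), reduction factor \(\eta = 3\), and maximum resource 9, one admissible rung system is:

rungs = [(9, 1), (3, 3), (1, 9)]  # (rung_size, level): sizes decrease, levels increase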
Any active bracket (i.e., supporting running trials) has a self.current_rung, where not all slots are occupied.
A slot in the current rung can be occupied, pending, or free. A slot is free if it has not been associated with a trial yet. It is pending if it is associated with a trial, but the latter has not returned a metric value yet. It is occupied if it contains a metric value. A rung is worked on by turning free slots to pending (by associating them with a trial), and by turning pending slots to occupied when their trials return values.
next_free_slot: Returns SlotInRung information about the next free slot, or None if all slots are occupied or pending. This method is called as part of suggest.
on_result: This method is called as part of on_trial_result, when a trial reports the result a pending slot is waiting for. The corresponding slot becomes occupied. If this action renders the rung complete (i.e., all slots are occupied), then _promote_trials_at_rung_complete is called. This method increases self.current_rung and populates the trial_id fields with the top performers of the rung just completed. All slots in the new rung are free. Note that the trial_id fields of the first rung are assigned None at the beginning; they are set by the caller (using new trial_id values provided by the backend).
SynchronousHyperbandBracketManager maintains all brackets during an experiment. It is configured by a list of brackets, where each bracket has one rung fewer than its predecessor. The Hyperband algorithm cycles through this RungSystemsPerBracket in a round robin fashion. The bracket manager relays next_job and on_result calls to the correct SynchronousHyperbandBracket. The first bracket which is not yet complete is the primary bracket.
next_job: The preferred bracket to take the job (via next_free_slot) is the primary one. However, a bracket may not be able to take the job, because its current rung has no free slots (i.e., they are all occupied or pending). In this case, the manager scans successive brackets. If no existing bracket can take the job, a new bracket is created.
Given these classes,
SynchronousHyperbandScheduler
is straightforward. It is a pause-and-resume scheduler, and it implements the API
MultiFidelitySchedulerMixin
,
so that any searchers supporting multi-fidelity schedulers can be used. More
precisely, SynchronousHyperbandScheduler
inherits from
SynchronousHyperbandCommon
,
which derives from
TrialSchedulerWithSearcher
and
MultiFidelitySchedulerMixin
and collects some code used during construction.
_suggest polls self.bracket_manager.next_job(). If the SlotInRung returned has trial_id assigned, it corresponds to a trial to be promoted, so the decision is resume_suggestion(). Otherwise, the scheduler decides for start_suggestion() with a new trial_id, which also updates the SlotInRung.trial_id field. In any case, the scheduler maintains the currently pending slots in self._trial_to_pending_slot.
on_trial_result relays information back via self.bracket_manager.on_result((bracket_id, slot_in_rung)), as long as trial_id appears in self._trial_to_pending_slot and has reached its required rung level.
Differential Evolution Hyperband
We will now have a closer look at the implementation of
DEHB in Syne Tune, which is a
recent extension of synchronous Hyperband, where configurations of
trials are chosen by evolutionary computations (mutation, cross-over,
selection). This example is more advanced than the
one above, in that we need to do more than
furnishing
SynchronousHyperbandScheduler
with a new searcher. The only time when a searcher suggests configurations is
at the very start, when the first rung of the first bracket is filled. All
further configurations are obtained by evolutionary means.
The main difference between DEHB and synchronous Hyperband is how configurations to be evaluated in a rung are chosen, based on trials in the rung above and in earlier brackets. In synchronous Hyperband, we simply promote the best performing trials from the rung above. In particular, the configurations do not change, and trials paused in the rung above are resumed. In DEHB, this promotion process is more complicated, and importantly, it leads to new trials with different configurations. This means that trials are not resumed in DEHB. Moreover, each configuration attached to a trial is represented by an encoded vector with values in \([0, 1]\), where the mapping from vectors to configurations is not invertible if the configuration space contains discrete parameters. Much the same is done in Gaussian process based Bayesian optimization.
The very first bracket of DEHB is processed in the same way as in synchronous Hyperband, so assume the current bracket is not the first. This is how the configuration vector for a free slot in a rung is chosen:
Identify a mutation candidate set. If there is a rung above, this set contains the best performing trials from there, namely those that would be promoted in synchronous Hyperband. If there is no rung above, the set is the rung at the same level from the previous bracket. Now, if this set contains fewer than 3 entries, we add configurations from earlier trials at the same rung level (the global parent pool). This mutation candidate set is the same for all choices in the same rung.
Draw 3 configurations at random, without replacement, from the mutation candidate set and create a mutant as a linear combination of them.
Identify the target configuration from the same slot and rung level in the previous bracket. The candidate for the slot is obtained by cross-over between mutant and target, in that each entry of the vector is picked randomly from that position in one of the two. An evaluation is started for this candidate configuration (a schematic sketch of mutation and cross-over is given after this list).
Finally, there is selection. Once the slot is to be occupied, we compare metric values between target and candidate, and the better one gets assigned to the slot.
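Here is a purely schematic sketch of the mutation and cross-over steps on encoded vectors in \([0, 1]\); the function and the constants mutation_factor and crossover_prob are hypothetical stand-ins for the usual differential evolution parameters, not the Syne Tune implementation:

from typing import Optional

import numpy as np


def dehb_candidate(
    mutation_candidates: np.ndarray,  # shape (n, d), n >= 3, encoded parent configs
    target: np.ndarray,               # shape (d,), encoded target configuration
    mutation_factor: float = 0.5,
    crossover_prob: float = 0.5,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    if rng is None:
        rng = np.random.default_rng()
    # Mutation: linear combination of 3 parents drawn without replacement
    a, b, c = rng.choice(mutation_candidates, size=3, replace=False)
    mutant = np.clip(a + mutation_factor * (b - c), 0.0, 1.0)
    # Cross-over: each entry is taken from either mutant or target at random
    mask = rng.random(target.shape) < crossover_prob
    return np.where(mask, mutant, target)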
While this sounds quite foreign to what we saw
above, we can make
progress by associating each candidate vector arising from mutation and
cross-over with a new trial_id
. After all, in order to determine the
winner between candidate and target, we have to evaluate the former.
Once this is done, we can map mutation and cross-over to suggest
,
and selection to on_trial_report
. It becomes clear that we can use
most of the infrastructure for synchronous Hyperband without change.
DifferentialEvolutionHyperbandBracket
has only minor differences from SynchronousHyperbandBracket. First,
_promote_trials_at_rung_complete
does nothing, because promotion
(i.e., determining the trials for a rung from the one above) is a more
complex process now. In particular, the trial_id
fields of free
slots in the current rung are None
until they become occupied.
Second, top_list_for_previous_rung
returns the top performing trials
of the rung above the current one. This information is needed in order
to create the mutation candidate set. All other methods remain the same.
We still need to identify the next free slot (at the time of mutation
and cross-over), and need to write information back when a slot gets
occupied.
At this point, it is important to acknowledge some difficulties arising from asynchronous job execution. Namely, mutation and cross-over require the configurations for the mutation candidate set and target to have been determined before, and selection needs the metric value for the target. If this type of information is not present when we need it, we are not allowed to wait.
If the current rung is not the first in the bracket, we know that all slots in the rung above are occupied. After all, DEHB is still a synchronous HPO method.
The rung from which the target is chosen can be problematic, as it may not have been completely decided upon when mutation starts for the current rung. In this case, our implementation cycles back through the brackets until an assigned slot (i.e., not a free one) is found in the right place.
For this reason, it is possible in principle that the target
trial_id
changes between cross-over and selection. Also, in rare cases, the target may not have a metric at selection time. In this case, the candidate wins.
DifferentialEvolutionHyperbandBracketManager is very similar to SynchronousHyperbandBracketManager. Differences include:
The system of brackets is more rigid in DEHB, in that subsequent brackets are determined by the first one. In particular, later brackets have a smaller total budget, because rung sizes are inherited from the first bracket.
top_of_previous_rung helps choose the mutation candidate set. Its return values are cached.
trial_id_from_parent_slot selects the trial_id for the target for cross-over and selection.
DifferentialEvolutionHyperbandScheduler
implements the DEHB scheduler. Just like SynchronousHyperbandScheduler
, it
inherits from
SynchronousHyperbandCommon
,
which contains common code used by both of them.
On top of SynchronousHyperbandScheduler, it also maps trial_id to the encoded configuration in self._trial_info, and self._global_parent_pool maintains all completed trials at each rung level.
_suggest: We start by determining a free slot, then a configuration vector for the new trial, typically by mutation and cross-over. One difficulty is that this could end up suggesting a configuration already proposed before, because many encoded vectors map to the same configuration. In this case, we retry and may ultimately draw encoded configurations at random. Except for a special case in the very first bracket, we return with start_suggestion().
New encoded configurations are chosen only for the first rung of the first bracket. Our implementation allows a searcher to be specified for this choice. However, the default is to sample the new vector uniformly at random, see _encoded_config_from_searcher. Importantly, this is different from using searcher="random". The latter samples a configuration and maps it to an encoded vector, a process which has less entropy if discrete hyperparameters are present.
on_trial_result is similar to what happens in SynchronousHyperbandScheduler, except that selection happens as well. If the target wins in the selection, ext_slot.trial_id is changed to the target trial_id. In any case, we return SchedulerDecision.STOP, because the trial will not have to be resumed later on (except in the very first bracket).
Linking in a New Searcher
At this point, you should have learned everything needed for implementing a new scheduler, or for modifying an existing template scheduler to your special requirements. Say, you have implemented a new searcher to be plugged into one of the existing generic schedulers. In this section, we will look into how a new searcher can be made available in an easy-to-use fashion.
The Searcher Factory
Recall that our generic schedulers, such as FIFOScheduler or HyperbandScheduler, allow the user to choose a searcher via the string argument searcher, and to configure the searcher (away from defaults) by the dictionary argument search_options. While searcher can also be a BaseSearcher instance, it is simpler and more convenient to choose the searcher by name. For example:
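The following is a hedged sketch, with the metric name and the search_options entry as placeholders:

from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options={"num_init_random": 10},
    metric="validation_error",
    mode="min",
)

Choosing the searcher by name in this way, rather than constructing a BaseSearcher object yourself, has several advantages: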
Generic schedulers only work with certain types of searchers. This consistency is checked when searcher is given as a name, but passing an incompatible searcher object by hand may lead to subtle errors.
Several arguments of a searcher are typically just the same as for the surrounding scheduler, or can be inferred from arguments of the scheduler. This can become complex for some searchers and leads to difficult boilerplate code in case searcher is to be created by hand.
While not covered in this tutorial, constructing schedulers and searchers for Gaussian process based Bayesian optimization and its extensions to multi-fidelity scheduling, constrained or cost-aware search is significantly more complex, as can be seen in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.
It is the purpose of
searcher_factory()
to create the correct
BaseSearcher
object for given
scheduler arguments, including searcher
(name) and search_options
. Let
us have a look at how the constructor of
FIFOScheduler
calls the factory. We see
how scheduler arguments like metric
, mode
, points_to_evaluate
are
just passed through to the factory. We also need to set
search_options["scheduler"]
in order to tell searcher_factory
which
generic scheduler is calling it.
The
searcher_factory()
code should be straightforward to understand and extend. Pick a name for your
new searcher and set searcher_cls
and supported_schedulers
(the latter
can be left to None
if your searcher works with all generic schedulers). The
constructor of your searcher needs to have the signature
def __init__(self, config_space: dict, metric: str, **kwargs):
Here, kwargs
will be fed with search_options
, but enriched with fields
like mode
, points_to_evaluate
, random_seed_generator
, scheduler
.
Your searcher is not required to make use of them, even though we strongly recommend supporting points_to_evaluate and making use of
random_seed_generator
(as is
shown here). Here are
some best practices for linking a new searcher into the factory:
The Syne Tune code is written in a way which allows certain scenarios to be run with a restricted set of all possible dependencies (see FAQ). This is achieved by conditional imports. If your searcher requires dependencies beyond the core, please make sure to use try ... except ImportError as you see in the code.
Try to make sure that your searcher also works without search_options being specified by the user. You will always have the fields contributed by the generic schedulers, and for all others, your code should ideally come with sensible defaults.
Make sure to implement the configure_scheduler method of your new searcher, restricting usage to supported scheduler types.
The Baseline Wrappers
In order to facilitate choosing and configuring a scheduler along with its
searcher, Syne Tune defines the most frequently used combinations in
syne_tune.optimizer.baselines
. The minimal signature of a baseline
class is this:
def __init__(self, config_space: dict, metric: str, **kwargs):
Or, in the multi-objective case:
def __init__(self, config_space: dict, metric: List[str], **kwargs):
If the underlying scheduler maintains a searcher (as most schedulers do),
arguments to the searcher (except for config_space
, metric
) are
given in kwargs["search_options"]
. If a scheduler is of multi-fidelity
type, the minimal signature is:
def __init__(self, config_space: dict, metric: str, resource_attr: str, **kwargs):
If the scheduler accepts a random seed, this must be kwargs["random_seed"]
.
Several wrapper classes in syne_tune.optimizer.baselines
have signatures
with more arguments, which are either passed to the scheduler or to the searcher.
For example, some wrappers make random_seed
explicit in the signature,
instead of having it in kwargs
.
Note
If a scheduler maintains a searcher inside, and in particular if it simply configures FIFOScheduler or HyperbandScheduler with a new searcher, it is strongly recommended to adhere to the policy of specifying searcher arguments in kwargs["search_options"]. This simplifies enabling the new scheduler in the simple experimentation framework of syne_tune.experiments, and in general provides a common user experience across different schedulers.
Let us look at an example of a baseline wrapper whose underlying scheduler is
of type FIFOScheduler
with a specific
searcher, which is not itself created via a searcher factory:
class REA(FIFOScheduler):
    """Regularized Evolution (REA).

    See :class:`~syne_tune.optimizer.schedulers.searchers.regularized_evolution.RegularizedEvolution`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metric to optimize
    :param population_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 100
    :param sample_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 10
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: str,
        population_size: int = 100,
        sample_size: int = 10,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["population_size"] = population_size
        searcher_kwargs["sample_size"] = sample_size
        super(REA, self).__init__(
            config_space=config_space,
            metric=metric,
            searcher=RegularizedEvolution(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )
def create_gaussian_process_estimator(
    config_space: Dict[str, Any],
    metric: str,
    random_seed: Optional[int] = None,
    search_options: Optional[Dict[str, Any]] = None,
) -> Estimator:
    scheduler = BayesianOptimization(
        config_space=config_space,
        metric=metric,
        random_seed=random_seed,
        search_options=search_options,
    )
    searcher = scheduler.searcher  # GPFIFOSearcher
    state_transformer = searcher.state_transformer  # ModelStateTransformer
    estimator = state_transformer.estimator  # GaussProcEmpiricalBayesEstimator

    # update the estimator properties
    estimator.active_metric = metric
    return estimator
class MORandomScalarizationBayesOpt(FIFOScheduler):
    """
    Uses :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveMultiSurrogateSearcher`
    with one standard GP surrogate model per metric (same as in
    :class:`BayesianOptimization`), together with the
    :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveLCBRandomLinearScalarization`
    acquisition function.

    If `estimators` is given, surrogate models are taken from there, and the
    default is used otherwise. This is useful if you have a good low-variance
    model for one of the objectives.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metrics to optimize
    :param mode: Modes of optimization. Defaults to "min" for all
    :param random_seed: Random seed, optional
    :param estimators: Use these surrogate models instead of the default GP
        one. Optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`. Here,
        ``kwargs["search_options"]`` is used to create the searcher and its
        GP surrogate models.
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        random_seed: Optional[int] = None,
        estimators: Optional[Dict[str, Estimator]] = None,
        **kwargs,
    ):
        try:
            from syne_tune.optimizer.schedulers.multiobjective import (
                MultiObjectiveMultiSurrogateSearcher,
                MultiObjectiveLCBRandomLinearScalarization,
            )
        except ImportError:
            logging.info(try_import_moo_message())
            raise

        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        if estimators is None:
            estimators = dict()
        else:
            estimators = estimators.copy()
        if isinstance(mode, str):
            mode = [mode] * len(metric)
        if "search_options" in kwargs:
            search_options = kwargs["search_options"].copy()
        else:
            search_options = dict()
        search_options["no_fantasizing"] = True
        for _metric in metric:
            if _metric not in estimators:
                estimators[_metric] = create_gaussian_process_estimator(
                    config_space=config_space,
                    metric=_metric,
                    search_options=search_options,
                )
        # Note: ``mode`` is dealt with in the ``update`` method of the MO
        # searcher, by converting the metrics. Internally, all metrics are
        # minimized
        searcher = MultiObjectiveMultiSurrogateSearcher(
            estimators=estimators,
            mode=mode,
            scoring_class=partial(
                MultiObjectiveLCBRandomLinearScalarization, random_seed=random_seed
            ),
            **searcher_kwargs,
        )
        super().__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=searcher,
            random_seed=random_seed,
            **kwargs,
        )
class NSGA2(FIFOScheduler):
    """
    See :class:`~syne_tune.optimizer.schedulers.searchers.RandomSearcher`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metric to optimize
    :param population_size: The size of the population for NSGA-2
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        population_size: int = 20,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["mode"] = mode
        searcher_kwargs["population_size"] = population_size
        super(NSGA2, self).__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=NSGA2Searcher(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )
The signature has config_space, metric, and random_seed. It also has two searcher arguments, population_size and sample_size.
In order to compile the arguments searcher_kwargs for creating the searcher, we first call _create_searcher_kwargs(config_space, metric, random_seed, kwargs). Doing so is particularly important in order to ensure random seeds are managed between scheduler and searcher in the same way across different Syne Tune schedulers.
Next, the additional arguments population_size and sample_size need to be appended to these searcher arguments. Had we used kwargs["search_options"] instead, this would not be necessary.
Finally, we create FIFOScheduler, passing config_space, metric, and the new searcher via searcher=RegularizedEvolution(**searcher_kwargs), and pass **kwargs at the end.
Baselines and Benchmarking
As shown in this tutorial and
this tutorial, a particularly convenient
way to define and run experiments is using the code in
syne_tune.experiments
. Once a new scheduler has a baseline wrapper, it
is very easy to make it available there: you just need to add a wrapper in
syne_tune.experiments.default_baselines
. For the REA
example above,
this is:
from syne_tune.optimizer.baselines import REA as _REA


def REA(method_arguments: MethodArguments, **kwargs):
    return _REA(**_baseline_kwargs(method_arguments, kwargs))
Contribute your Extension
At this point, you are ready to plug in your latest idea and make it work in Syne Tune. Given that it works well, we would encourage you to contribute it back to the community. We are looking forward to your pull request.
Extending the Documentation
Syne Tune comes with an extensive amount of documentation:
User-facing APIs are commented in the code, using the reStructuredText format. This is used to generate the API Reference. Please refer to the code in order to understand our conventions. Please make sure that links to classes, methods, or functions work. In the presence of :math: expressions, the docstring should be raw: r""" ... """.
Examples in examples/ are working, documented scripts showcasing individual features. If you contribute a new example, please also link it in docs/source/examples.rst.
Frequently asked questions are at docs/source/faq.rst.
The table of all HPO algorithms is in docs/source/getting_started.rst. If you contribute a new HPO method, please add a row there. As explained above, please also extend baselines.
Tutorials are at docs/source/tutorials/. These are short chapters, explaining a concept in more detail than an example. A tutorial should be self-contained and come with functioning code, which can be run at reasonable time and cost. It may contain figures created with a larger effort.
Building the Documentation
You can build the documentation locally as follows. Make sure to have Syne
Tune installed with dev
dependencies:
cd docs
rm -rf source/_apidoc
make clean
make html
Then, open docs/build/html/index.html
in your browser.
The documentation is also built as part of our CI system, so you can inspect it as part of a pull request:
Move to the list of all checks (if the PR is in good shape, you should see All checks have passed)
Locate docs/readthedocs.org:syne-tune at the end of the list. Click on Details
Click on View docs just below Build took X seconds (do not click on the tall View Docs button upper right, this leads to the latest public docs)
When extending the documentation, please verify the following:
Check whether links work. They typically fail silently, possibly emitting a warning. Use proper links when referring to classes, modules, functions, methods, or constants, and check whether the links to the API Reference work.
Conventions
We use the following conventions to ensure that documentation stays up-to-date:
Use literalinclude for almost all code snippets. In general, the documentation shows code which is part of a functional script, which can either be in examples/, in benchmarking/examples/, or otherwise next to the documentation files.
Almost all code shown in the documentation is run as part of integration testing (.github/workflows/integ-tests.yml) or end-to-end testing (.github/workflows/end-to-end-tests.yml). If you contribute documentation with code, please insert your functional script into one of the two:
integ-tests.yml is run as part of our CI system. Code should run for no more than 30 seconds. It must not depend on data loaded from elsewhere, must not make use of surrogate blackboxes, and must not use SageMaker.
end-to-end-tests.yml is run manually on a regular basis, and in particular before a new release. Code may download files or depend on surrogate blackboxes. It may use SageMaker. Costs and runtime should be kept reasonable.
Links to other parts of the documentation should be used frequently. We use anonymous references (two trailing underscores).
Whenever mentioning a code construction (class, method, function, module, constant), please use a proper link with absolute module name and leading tilde. This allows interested readers to inspect API details and the code. When the same name is used several times in the same paragraph, it is sufficient to use a proper link for the first occurrence only.
How to Implement Bayesian Optimization
This tutorial can be seen as a more advanced successor to our developer tutorial. It provides an overview of how model-based search, and in particular Bayesian optimization, is implemented in Syne Tune, and how this code can be extended in order to fit your needs. The basic developer tutorial is a prerequisite for taking full advantage of the advanced tutorial here.
We hope this information inspires you to try extending Syne Tune's Bayesian optimization to your needs. Please do consider contributing your efforts back to Syne Tune.
Note
In order to develop new methodology in Syne Tune, make sure to use an
installation from source.
In particular, you need to have installed the dev
dependencies.
Overview of Module Structure
We begin with an overview of the module structure of the Bayesian optimization (BO) code in Syne Tune. Feel free to directly move to the first example and come back here for reference.
Recall that
Bayesian optimization is implemented in a searcher, which is a component of
a scheduler responsible for suggesting the next configuration to sample, given
data from earlier trials. While searchers using BO are located in
syne_tune.optimizer.schedulers.searchers
and submodules, the BO code
itself is found in syne_tune.optimizer.schedulers.searchers.bayesopt
.
Recall that
a typical BO algorithm is configured by a surrogate model and an acquisition
function. In Syne Tune, acquisition functions are implemented generically,
while (except for special cases) surrogate models can be grouped in two
different classes:
Gaussian process based surrogate models: Implementations in gpautograd.
Surrogate models based on scikit-learn like estimators: Implementations in sklearn.
The remaining code in syne_tune.optimizer.schedulers.searchers.bayesopt
is generic or wraps lower-level code. Submodules are as follows:
datatypes: Collects types related to maintaining data obtained from trials. The most important class is TuningJobState, which collects relevant data during an experiment. Note that other relevant classes are in syne_tune.optimizer.schedulers.searchers.utils, such as HyperparameterRanges, which wraps a configuration space and maps configurations to encoded vectors used as inputs to a surrogate model.
models: Contains a range of surrogate models, both for single and multi-fidelity tuning, along with the machinery to fit parameters of these models. In a nutshell, retraining of parameters and posterior computations for a surrogate model are defined in Estimator, which returns a Predictor to be used for posterior predictions, which in turn drive the optimization of an acquisition function. A model-based searcher interacts with a ModelStateTransformer, which maintains the state of the experiment (a TuningJobState object) and interacts with an Estimator. Subclasses of Estimator and Predictor are mainly wrappers of underlying code in gpautograd or sklearn. Details will be provided shortly. This module also contains a range of acquisition functions, mostly in meanstd_acqfunc.
tuning_algorithms: The Bayesian optimization logic resides here, mostly in BayesianOptimizationAlgorithm. Interfaces for all relevant concepts are defined in base_classes:
Predictor: Probabilistic predictor obtained from a surrogate model, to be plugged into an acquisition function.
AcquisitionFunction: Acquisition function, which is optimized in order to suggest the next configuration.
ScoringFunction: Base class of AcquisitionFunction which does not support gradient computations. Score functions can be used to rank a finite number of candidates.
LocalOptimizer: Local optimizer for minimizing the acquisition function.
gpautograd: The Gaussian process based surrogate models, defined in models, can be implemented in different ways. Syne Tune currently uses the lightweight autograd library, and the corresponding implementation lies in this module.
sklearn: Collects code required to implement surrogate models based on scikit-learn like estimators.
Note
The most low-level code for Gaussian process based Bayesian optimization is
contained in
gpautograd
, which
is specific to autograd and L-BFGS
optimization. Unless you want to implement a new kernel function, you
probably do not have to extend this code. As we will see, most extensions of
interest can be done in
models
(new
surrogate model, new acquisition function), or in
tuning_algorithms
(different BO workflow).
A Walk Through Bayesian Optimization
The key primitive of BO is to suggest a next configuration to evaluate the
unknown target function at (e.g., the validation error after training a
machine learning model with a hyperparameter configuration), based on all
data gathered about this function in the past. This primitive is triggered in
the get_config()
method of a BO searcher. It consists of two main steps:
Estimate surrogate model(s), given all data obtained. Often, a single surrogate model represents the target metric of interest, but in generalized setups such as multi-fidelity, constrained, or multi-objective BO, surrogate models may be fit to several metrics. A surrogate model provides predictive distributions for the metric it represents, at any configuration, which allows BO to explore the space of configurations not yet sampled at. For most built-in GP based surrogate models, estimation is done by maximizing the log marginal likelihood, as we see in more detail below.
Use probabilistic predictions of surrogate models to search for the best next configuration to sample at. This is done in
BayesianOptimizationAlgorithm
, and is the main focus here.
BayesianOptimizationAlgorithm can suggest a batch of num_requested_candidates > 1 configurations. If greedy_batch_selection == True, this is done greedily, one configuration at a time, yet diversity is maintained by inserting already suggested configurations as pending into the state. If greedy_batch_selection == False, we simply return the num_requested_candidates top-scoring configurations. For simplicity, we focus on num_requested_candidates == 1, so that a single configuration is suggested. This happens in several steps:
First, a list of num_initial_candidates initial configurations is drawn at random from initial_candidates_generator of type CandidateGenerator.
Next, these configurations are scored using initial_candidates_scorer of type ScoringFunction. This is a parent class of AcquisitionFunction, but acquisition functions support gradient computation as well. The scoring function typically depends on a predictor obtained from a surrogate model.
Finally, local optimization of an acquisition function is run, using an instance of LocalOptimizer, which depends on an acquisition function and one or more predictors. Local optimization is initialized with the top-scoring configuration from the previous step. If it fails or does not result in a configuration with a better acquisition value, then this initial configuration is returned. The final local optimization can be skipped by passing an instance of NoOptimization.
A schematic sketch of these steps is given below.
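Purely schematically, and not using the actual Syne Tune API, the three steps can be summarized as follows; the generator, scorer, and local optimizer are represented as plain callables, and lower scores are assumed to be better:

import numpy as np


def suggest_next_config(generate, score, local_optimize, num_initial_candidates):
    # Step 1: draw initial candidates at random
    candidates = generate(num_initial_candidates)
    # Step 2: score them and pick the best one
    scores = score(candidates)
    best_initial = candidates[int(np.argmin(scores))]
    # Step 3: local optimization of the acquisition function, seeded with the
    # top-scoring candidate; the initial candidate is kept if optimization
    # fails or does not improve the acquisition value
    return local_optimize(best_initial)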
This workflow offers a number of opportunities for customization:
The initial_candidates_generator by default draws configurations at random with replacement (checking for duplicates is expensive, and does not add value). This could be replaced by pseudo-random sampling with better coverage properties, or by Latin hypercube designs.
The initial_candidate_scorer is often the same as the acquisition function in the final local optimization. Other acquisition strategies, such as (independent) Thompson sampling, can be implemented here.
You may want to customize the acquisition function feeding into local optimization (and initial scoring); more details are provided below.
Implementing a Surrogate Model
In Bayesian optimization (BO), a surrogate model represents the data observed from a target metric so far, and its probabilistic predictions at new configurations (typically involving both predictive mean and variance) guide the search for the most informative next acquisition. In this section, we will show how surrogate models are implemented in Syne Tune, and give an example of how a novel model can be added.
Recall from above that Syne Tune offers surrogate models from two broad classes:
Gaussian process based models and scikit-learn estimator based models. Both
are implemented in terms of the same abstractions, Estimator and Predictor.
We will first walk through GP based surrogate models, then dive into an example
of how to implement a new scikit-learn estimator based model. More details
about how to extend GP based models are provided
further below.
Example
Before diving into details, let us look at a simple example for how to implement a new surrogate model in Syne Tune, of the scikit-learn estimator based type. It does not come with some of the complexities of Gaussian process based surrogate models, to be discussed below:
Fantasizing is not supported
MCMC (or ensemble predictions) is not supported
Gradient-based optimization of an acquisition function is not supported; instead, Bayesian optimization scores a finite number of candidates drawn at random and selects the best
The full example code is given
here.
We implement subclasses of
SKLearnPredictor
and
SKLearnEstimator
.
These are wrapped by
SKLearnPredictorWrapper
and
SKLearnEstimatorWrapper
.
import copy
from typing import Tuple

import numpy as np
from sklearn.linear_model import BayesianRidge

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)
class BayesianRidgePredictor(SKLearnPredictor):
"""
Predictor for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
"""
def __init__(self, ridge: BayesianRidge):
self.ridge = ridge
def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
return self.ridge.predict(X, return_std=True)
class BayesianRidgeEstimator(SKLearnEstimator):
"""
Estimator for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
None of the parameters of ``BayesianRidge`` are exposed here, so they are all
fixed up front.
"""
def __init__(self, *args, **kwargs):
self.ridge = BayesianRidge(*args, **kwargs)
def fit(
self, X: np.ndarray, y: np.ndarray, update_params: bool
) -> SKLearnPredictor:
self.ridge.fit(X, y.ravel())
return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))
The BayesianRidgeEstimator wraps the scikit-learn estimator sklearn.linear_model.BayesianRidge, which implements a form of Bayesian regression estimation. While this method has hyperparameters, they are set automatically in fit, so we do not need to expose them. The result of fit is a BayesianRidgePredictor instance, which wraps a copy of the fitted scikit-learn estimator.
In BayesianRidgePredictor, the predict method calls the equivalent method of the scikit-learn estimator with return_std=True, so that both predictive means and stddevs are returned.
The remaining launcher script is much the same as other examples, except that
FIFOScheduler
is used with a
particular searcher:
searcher = SKLearnSurrogateSearcher(
config_space=config_space,
metric=METRIC_ATTR,
estimator=BayesianRidgeEstimator(),
scoring_class=EIAcquisitionFunction,
)
SKLearnSurrogateSearcher needs a SKLearnEstimator object as estimator, as well as the choice of acquisition function as scoring_class.
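For orientation, here is a hedged sketch of how such a launcher could look end to end. The config space, training script name, and metric are illustrative placeholders (the linked example uses the height benchmark); SKLearnSurrogateSearcher, BayesianRidgeEstimator, and EIAcquisitionFunction are assumed to be imported or defined as in the example above:
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers import FIFOScheduler

# SKLearnSurrogateSearcher, BayesianRidgeEstimator, EIAcquisitionFunction:
# imported/defined as in the full example linked above.

# Illustrative stand-ins for the values used in the linked example script
config_space = {"steps": 100, "width": randint(0, 20), "height": randint(-100, 100)}
METRIC_ATTR = "mean_loss"  # assumed name of the metric reported by the training script
METRIC_MODE = "min"

searcher = SKLearnSurrogateSearcher(
    config_space=config_space,
    metric=METRIC_ATTR,
    estimator=BayesianRidgeEstimator(),
    scoring_class=EIAcquisitionFunction,
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_height.py"),  # placeholder script
    scheduler=FIFOScheduler(
        config_space,
        searcher=searcher,
        metric=METRIC_ATTR,
        mode=METRIC_MODE,
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=60),
    n_workers=4,
)
tuner.run()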
The Predictor Class
Scikit-learn based estimators are typically rather simple and rely on deterministic machine learning methods. Bayesian optimization is usually run with Bayesian models, where proper quantification of uncertainty takes center stage, and supporting these is a little more difficult.
In Bayesian statistics, (surrogate) models are conditioned on data in order to
obtain a posterior distribution, represented by a posterior state. Given this
state, probabilistic predictions can be done at arbitrary input points. This is
done by objects of type
Predictor
,
whose methods deal with predictions on new configurations.
Note
Before moving on, it is important to understand the difference between conditioning a probabilistic model on data in order to obtain a posterior distribution, with which probabilistic predictions (i.e., mean and variance) can be computed at input points, and learning (or fitting) the (hyper)parameters of the model. For a Bayesian surrogate model, the latter involves Markov Chain Monte Carlo or marginal likelihood optimization, which requires conditioning on data several times. For non-Bayesian models, parameters are often fit by cross-validation.
At this point, there are a number of relevant concepts:
A model can be “fitted” by Markov Chain Monte Carlo (MCMC), in which case its predictive distribution is an ensemble. This is why prediction methods return lists. In the default case (single model, no MCMC), these lists are of size one.
A model may support fantasizing in order to properly deal with pending configurations in the current state (see also register_pending in the discussion here). At least in the Gaussian process surrogate model case, fantasizing is done by drawing nf samples of target values for the pending configurations, then averaging predictions over this sample. The Gaussian predictive distributions in this average share the same variance, but have different means. A surrogate model which does not support fantasizing can ignore this extra complexity.
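To make the shapes concrete, here is a small NumPy-only illustration (plain Python, not Syne Tune API) of how a head computation averages over fantasy samples; compare the LCB acquisition function code further below, which uses np.mean(..., axis=1) in the same way:
import numpy as np

n, nf = 4, 3  # 4 candidate configurations, 3 fantasy samples
rng = np.random.default_rng(0)

# "mean" has one column per fantasy sample; "std" is shared across the sample
prediction = {"mean": rng.normal(size=(n, nf)), "std": rng.uniform(0.1, 1.0, size=n)}

# Example head: LCB values, averaged over the fantasy dimension
kappa = 1.0
lcb = np.mean(prediction["mean"] - prediction["std"][:, None] * kappa, axis=1)
print(lcb.shape)  # (4,) -- one score per candidate configuration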
Take the example of a basic Gaussian process surrogate model, which is behind
BayesianOptimization
. The predictor is
GaussProcPredictor
.
This class can serve models fit by marginal likelihood optimization (empirical
Bayes) or MCMC, but let us focus on the former. Predictions in this model are
based on a posterior state, which maintains a representation of the Gaussian
posterior distribution needed for probabilistic predictions. Say we would like
to predict at some configuration \(\mathbf{c}\). First, this
configuration is mapped to an (encoded) input vector \(\mathbf{x}\). Next,
predictive distributions are computed, using the posterior state:
\[ y(\mathbf{x}) \sim \mathcal{N}\left( \mu_j(\mathbf{x}), \sigma^2(\mathbf{x}) \right), \quad j = 1, \dots, n_f. \]
Here, nf
denotes the number of fantasy samples (nf=1
if fantasizing is not
supported). This is served by methods of
Predictor
:
hp_ranges_for_prediction: Returns an instance of HyperparameterRanges which is used to map a configuration \(\mathbf{c}\) to an encoded vector \(\mathbf{x}\).
predict: Given a matrix \(\mathbf{X}\) of input vectors (these are the rows \(\mathbf{x}_i\)), return a list of dictionaries. In our non-MCMC example, this list has length 1. The dictionary contains statistics of the predictive distribution. In our example, this would be predictive means (key “mean”) and predictive standard deviations (key “std”). More precisely, the entry for “mean” would be a matrix \([\mu_j(\mathbf{x}_i)]_{i,j}\) of shape (n, nf), where n is the number of input vectors, and the entry for “std” would be a vector \([\sigma(\mathbf{x}_i)]_i\) of shape (n,). If the surrogate model does not support fantasizing, the entry for “mean” is also a vector of shape (n,).
predict_candidates: Version of predict where the input is a list of configurations \([\mathbf{c}_j]\), which are first mapped to rows of the matrix \(\mathbf{X}\) by using hp_ranges_for_prediction.
keys_predict: Keys of the dictionaries returned by predict. If a surrogate model is to be used with a standard acquisition function, such as expected improvement, it needs to return at least means (“mean”) and standard deviations (“std”). However, in other contexts, a surrogate model may be deterministic, in which case only means (“mean”) are returned. This method allows an acquisition function to check whether it can work with the surrogate models passed to it.
backward_gradient: This method is needed in order to support local gradient-based optimization of an acquisition function, as discussed here. It is detailed below.
current_best: A number of acquisition functions depend on the incumbent, which is a smooth approximation to the best target value observed so far. Typically, this is implemented as \(\mathrm{min}(\mu_j(\mathbf{x}_i))\) over all inputs \(\mathbf{x}_i\) already sampled for previous trials. As with predict, this returns a list of vectors of shape (nf,), catering for fantasizing. If fantasizing is not supported, this is a list of scalars, and the list size is 1 for non-MCMC.
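As a framework-free illustration of this contract, here is a toy deterministic predictor that only returns means. It deliberately does not subclass the real Predictor (whose exact constructor is not shown here); method names and return structures follow the description above:
import numpy as np


class ToyDeterministicPredictor:
    """
    Toy sketch of the ``Predictor`` contract: deterministic model (no MCMC,
    no fantasizing), so ``predict`` returns a list with a single dictionary
    containing only "mean".
    """

    def __init__(self, weights: np.ndarray):
        self._weights = weights  # toy linear model mu(x) = w^T x

    def keys_predict(self):
        # Only means: such a model cannot serve acquisition functions needing "std"
        return {"mean"}

    def predict(self, inputs: np.ndarray):
        # inputs has shape (n, d); no MCMC, so the returned list has length 1,
        # and "mean" has shape (n,) since fantasizing is not supported
        return [{"mean": inputs @ self._weights}]

    def current_best(self):
        # Would normally return the smallest predictive mean over observed inputs;
        # a list of scalars (length 1 without MCMC)
        raise NotImplementedError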
Note
In fact,
GaussProcPredictor
inherits from
BasePredictor
,
which extends the base interface by some helper code to implement the
current_best
method.
Supporting Local Gradient-based Optimization
As discussed above, BO in Syne Tune supports local gradient-based optimization
of an acquisition function. This needs to be supported by an implementation of
Predictor
,
in terms of the backward_gradient
method.
In the most basic case, an acquisition function \(\alpha(\mathbf{x})\) has the following structure:
\[ \alpha(\mathbf{x}) = h(\mu(\mathbf{x}), \sigma(\mathbf{x})). \]
We ignore fantasizing here, otherwise \(\mu(\mathbf{x})\) becomes a vector. For gradient-based optimization, we need the derivative
\[ \frac{\partial\alpha}{\partial\mathbf{x}} = \frac{\partial\alpha}{\partial\mu}\, \frac{\partial\mu}{\partial\mathbf{x}} + \frac{\partial\alpha}{\partial\sigma}\, \frac{\partial\sigma}{\partial\mathbf{x}}. \]
The backward_gradient
method takes arguments \(\mathbf{x}\) (input
) and
a dictionary mapping “mean” to \(\partial\alpha/\partial\mu\) at
\(\mu = \mu(\mathbf{x})\), “std” to \(\partial\alpha/\partial\sigma\)
at \(\sigma = \sigma(\mathbf{x})\) (head_gradients
), and returns the
gradient \(\partial\alpha/\partial\mathbf{x}\).
Readers familiar with deep learning frameworks like PyTorch may wonder why
we do not simply combine the surrogate model and acquisition function into a
single function \(\alpha(\mathbf{x})\) and compute its gradient by reverse mode
differentiation. However, this would strongly couple the two concepts, in that
they would have to be implemented in the same auto-differentiation system.
Instead, backward_gradient
decouples the gradient computation into head
gradients for the acquisition function, which (as we will see) can be
implemented in native NumPy
, and backward_gradient
for the surrogate
model itself. For Syne Tune’s Gaussian process surrogate models, the latter
is implemented using autograd. If the
predict
method is implemented using this framework, gradients are
obtained automatically as usual.
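The following toy example (not Syne Tune API) illustrates this decoupling: the "surrogate model" part computes its gradients with autograd, while the acquisition head gradients are plain NumPy numbers, as for the LCB function \(\alpha = \mu - \kappa\sigma\):
import numpy as np
import autograd.numpy as anp
from autograd import grad


# Toy "surrogate model": mu(x) and sigma(x) written with autograd.numpy
def mu(x):
    return anp.sum(anp.square(x))


def sigma(x):
    return anp.exp(-anp.sum(anp.square(x)))


def backward_gradient(x, head_gradients):
    # Chain rule: dalpha/dx = (dalpha/dmu) * dmu/dx + (dalpha/dsigma) * dsigma/dx
    return head_gradients["mean"] * grad(mu)(x) + head_gradients["std"] * grad(sigma)(x)


# Head gradients for LCB, alpha = mu - kappa * sigma, computed without autograd
kappa = 1.0
head_gradients = {"mean": 1.0, "std": -kappa}
x = np.array([0.3, -0.2])
print(backward_gradient(x, head_gradients))  # gradient of alpha w.r.t. x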
ModelStateTransformer and Estimator
An instance of
Predictor
represents the posterior distribution of a model conditioned on observed data.
Where does this conditioning take place? Note that while machine learning
APIs like scikit-learn
couple fitting and prediction in a single API, these
two are decoupled in Syne Tune by design:
Estimator: The most important method is fit_from_state(). It computes the posterior state by conditioning on observed data; this state contains the sufficient statistics required for probabilistic predictions. Moreover, if update_params=True, this final conditioning is preceded by fitting the (hyper)parameters of the model (this is more expensive; if update_params=False, the current parameters are used without updating them).
Predictor: Wraps the posterior state computed by the Estimator and allows for predictions.
The fitting of surrogate models underlying a Bayesian optimization experiment
happens in
ModelStateTransformer
,
which interfaces between a model-based searcher and the surrogate model. The
ModelStateTransformer
maintains the state of the experiment, where all data
about observations and pending configurations are collected. Its
fit()
method triggers fitting the surrogate models to the current data (this step can
be skipped for computational savings) and computing their posterior states.
ModelStateTransformer
hands down these tasks to an object of type
Estimator
,
which is specific to the surrogate model being used. For our Gaussian process
example, this would be
GaussProcEmpiricalBayesEstimator
.
Here, parameters of the Gaussian process models (such as parameters of the
covariance function) are fitted by marginal likelihood maximization, and the
GP posterior state is computed.
Note
To summarize, if your surrogate model needs to be fit to data, you need to
implement a subclass of
Estimator
,
whose fit_from_state method takes in data in the form of a
TuningJobState
and returns a
Predictor
.
You can use
transform_state_to_data()
in order to convert the TuningJobState
object into the usual pair of
feature matrix features
and target vector targets
, along with
normalization of targets.
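As a hedged skeleton of what this looks like (a plain class here; a real implementation would subclass the Estimator base class above, and the exact signature of transform_state_to_data should be checked against the sources):
class MyEstimator:
    """
    Skeleton only. A real implementation subclasses ``Estimator`` and imports
    ``transform_state_to_data`` from the Syne Tune sources (paths omitted here).
    """

    def fit_from_state(self, state, update_params: bool):
        # ``state`` is a TuningJobState. ``transform_state_to_data`` (see above)
        # converts it into a feature matrix and a normalized target vector, e.g.:
        #   data = transform_state_to_data(state)             # assumed return value
        #   features, targets = data.features, data.targets   # assumed attribute names
        if update_params:
            # Fit the model (hyper)parameters here (e.g., by marginal likelihood
            # optimization); this step is skipped when update_params == False
            pass
        # Condition the model on the data to obtain the posterior state, wrap it
        # in your Predictor subclass, and return that predictor
        raise NotImplementedError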
Implementing Components of Bayesian Optimization
At this point, you should have obtained an overview of how Bayesian optimization (BO) is structured in Syne Tune, and understood how a new surrogate model can be implemented. In this section, we turn to other components of BO: the acquisition function, and the covariance kernel of the Gaussian process surrogate model. We also look inside the factory for creating Gaussian process based searchers.
Implementing an Acquisition Function
In Bayesian optimization, the next configuration to sample at is chosen by minimizing an acquisition function:
\[ \mathbf{x}_* \in \mathrm{argmin}_{\mathbf{x}}\, \alpha(\mathbf{x}). \]
In general, the acquisition function \(\alpha(\mathbf{x})\) is optimized over encoded vectors \(\mathbf{x}\), and the optimal \(\mathbf{x}_*\) is rounded back to a configuration. This allows for gradient-based optimization of \(\alpha(\mathbf{x})\).
In Syne Tune, acquisition functions are subclasses of
AcquisitionFunction
.
An acquisition function may depend on one or more surrogate models, being a function of the
predictive statistics returned by the predict
method of
Predictor
.
For a wide range of acquisition functions used in practice, we have
\[ \alpha(\mathbf{x}) = h(\mu(\mathbf{x}), \sigma(\mathbf{x})). \]
In other words, \(\alpha(\mathbf{x})\) is a function of the predictive
mean and standard deviation of a single surrogate model. This case is
covered by
MeanStdAcquisitionFunction
.
More generally, this class implements acquisition functions depending on one
or more surrogate models, each of which returns means and (optionally)
standard deviations in predict
. Given the generic code in Syne Tune, a
new acquisition function of this type is easy to implement. As an example,
consider the lower confidence bound (LCB) acquisition function:
\[ \alpha_{\mathrm{LCB}}(\mathbf{x}) = \mu(\mathbf{x}) - \kappa\, \sigma(\mathbf{x}), \quad \kappa > 0. \]
Here is the code:
class LCBAcquisitionFunction(MeanStdAcquisitionFunction):
r"""
Lower confidence bound (LCB) acquisition function:
.. math::
h(\mu, \sigma) = \mu - \kappa * \sigma
"""
def __init__(self, predictor: Predictor, kappa: float, active_metric: str = None):
super().__init__(predictor, active_metric)
assert isinstance(predictor, Predictor)
assert kappa > 0, "kappa must be positive"
self.kappa = kappa
def _head_needs_current_best(self) -> bool:
return False
def _compute_head(
self,
output_to_predictions: SamplePredictionsPerOutput,
current_best: Optional[np.ndarray],
) -> np.ndarray:
means, stds = self._extract_mean_and_std(output_to_predictions)
return np.mean(means - stds * self.kappa, axis=1)
def _compute_head_and_gradient(
self,
output_to_predictions: SamplePredictionsPerOutput,
current_best: Optional[np.ndarray],
) -> HeadWithGradient:
mean, std = self._extract_mean_and_std(output_to_predictions)
nf_mean = mean.size
dh_dmean = np.ones_like(mean) / nf_mean
dh_dstd = (-self.kappa) * np.ones_like(std)
return HeadWithGradient(
hval=np.mean(mean - std * self.kappa),
gradient={self.active_metric: dict(mean=dh_dmean, std=dh_dstd)},
)
An object is constructed by passing predictor (a Predictor) and kappa (the positive constant \(\kappa\)). The surrogate model must return means and standard deviations in its predict method.
_compute_head: This method computes \(\alpha(\mathbf{\mu}, \mathbf{\sigma})\), given means and standard deviations. The argument output_to_predictions is a dictionary of dictionaries. If the acquisition function depends on a dictionary of surrogate models, the first level corresponds to that. The second level corresponds to the statistics returned by predict. In the simple case here, the first level is a single entry with key INTERNAL_METRIC_NAME, and the second level uses keys “mean” and “std” for means \(\mathbf{\mu}\) and stddevs \(\mathbf{\sigma}\). Recall that due to fantasizing, the “mean” entry can be a (n, nf) matrix, in which case we compute the average along the columns. The argument current_best is needed only for acquisition functions which depend on the incumbent.
_compute_head_and_gradient: This method is needed for the computation of \(\partial\alpha/\partial\mathbf{x}\), for a single input \(\mathbf{x}\). Given the same arguments as _compute_head (but for \(n = 1\) inputs), it returns a HeadWithGradient object, whose hval entry is the same as the return value of _compute_head, whereas the gradient entry contains the head gradients which are passed to the backward_gradient method of the Predictor. This entry is a nested dictionary of the same structure as output_to_predictions. The head gradient for a single surrogate model (as in our example) has \(\partial\alpha/(\partial\mathbf{\mu})\) for “mean” and \(\partial\alpha/(\partial\mathbf{\sigma})\) for “std”. It is particularly simple for the LCB example.
_head_needs_current_best returns False, since the LCB acquisition function does not depend on the incumbent (i.e., the current best metric value), which means that the current_best arguments need not be provided.
Finally, a new acquisition function should be linked into
acquisition_function_factory()
,
so that users can select it via arguments acq_function
and
acq_function_kwargs
in
BayesianOptimization
. The factory code
is:
from functools import partial
from syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes import (
AcquisitionFunctionConstructor,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl import (
EIAcquisitionFunction,
LCBAcquisitionFunction,
)
SUPPORTED_ACQUISITION_FUNCTIONS = (
"ei",
"lcb",
)
def acquisition_function_factory(name: str, **kwargs) -> AcquisitionFunctionConstructor:
assert (
name in SUPPORTED_ACQUISITION_FUNCTIONS
), f"name = {name} not supported. Choose from:\n{SUPPORTED_ACQUISITION_FUNCTIONS}"
if name == "ei":
return EIAcquisitionFunction
else: # name == "lcb"
kappa = kwargs.get("kappa", 1.0)
return partial(LCBAcquisitionFunction, kappa=kappa)
Here, acq_function_kwargs
is passed as kwargs
. For our example,
acq_function="lcb"
. The user can pass a value for kappa
via
acq_function_kwargs={"kappa": 0.5}
.
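As a usage sketch (the config space and metric are placeholders, and we assume here that these options are forwarded to the searcher via search_options, as described in the factory section further below):
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import BayesianOptimization

config_space = {"x": uniform(0.0, 1.0)}  # placeholder search space

scheduler = BayesianOptimization(
    config_space,
    metric="objective",  # placeholder metric name
    mode="min",
    search_options={
        "acq_function": "lcb",
        "acq_function_kwargs": {"kappa": 0.5},
    },
)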
A slightly more involved example is
EIAcquisitionFunction
,
representing the expected improvement (EI) acquisition function, which is the
default choice for BayesianOptimization
in Syne Tune. This function depends on the incumbent, so current_best
needs
to be given. Note that if the means passed to _compute_head
have shape
(n, nf)
due to fantasies, then current_best
has shape (1, nf)
,
since the incumbent depends on the fantasy sample.
Acquisition functions can depend on more than one surrogate model. In such a
case, the predictor argument to their constructor is a dictionary, and the
key names of the corresponding models (or outputs) are also used in the
output_to_predictions arguments and head gradients:
EIpuAcquisitionFunction is an acquisition function for cost-aware HPO:
\[\alpha_{\mathrm{EIpu}}(\mathbf{x}) = \frac{\alpha_{\mathrm{EI}}(\mu_y(\mathbf{x}), \sigma_y(\mathbf{x}))}{\mu_c(\mathbf{x})^{\rho}}\]
Here, \((\mu_y, \sigma_y)\) are predictions from the surrogate model for the target function \(y(\mathbf{x})\), whereas \(\mu_c\) are mean predictions for the cost function \(c(\mathbf{x})\). The latter can be represented by a deterministic surrogate model, whose predict method only returns means as “mean”. In fact, the method _output_to_keys_predict specifies which moments are required from each surrogate model.
CEIAcquisitionFunction is an acquisition function for constrained HPO:
\[\alpha_{\mathrm{CEI}}(\mathbf{x}) = \alpha_{\mathrm{EI}}(\mu_y(\mathbf{x}), \sigma_y(\mathbf{x})) \cdot \mathbb{P}(c(\mathbf{x})\le 0).\]
Here, \(y(\mathbf{x})\) is the target function, \(c(\mathbf{x})\) is the constraint function. Both functions are represented by probabilistic surrogate models, whose predict method returns means and stddevs. We say that \(\mathbf{x}\) is feasible if \(c(\mathbf{x})\le 0\), and the goal is to minimize \(y(\mathbf{x})\) over feasible points. One difficulty with this acquisition function is that the incumbent in the EI term is computed only over observations which are feasible (so \(c_i\le 0\)). This means we cannot rely on the surrogate model for \(y(\mathbf{x})\) to provide the incumbent, but instead need to determine the feasible incumbent ourselves, in the _get_current_bests_internal method.
A final complication in
MeanStdAcquisitionFunction
arises if some or all surrogate models are MCMC ensembles. In such a case,
we average over the samples of each surrogate model involved, i.e., over their
Cartesian product. Inside this sum, the incumbent depends on the sample index
for each model. This is dealt with by
CurrentBestProvider
.
In the default case for an acquisition function which needs the incumbent
(such as, for example, EI), this value depends only on the model for the
active (target) metric, and
ActiveMetricCurrentBestProvider
is used.
Note
Acquisition function implementations are independent of which
auto-differentiation mechanism is used under the hood. Unlike
surrogate models, there is no acquisition function code in
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd
.
This is because the implementation only needs to provide head gradients
in compute_acq_with_gradient
, which are easy to derive and compute
for common acquisition functions.
Implementing a Covariance Function for GP Surrogate Models
A Gaussian process, modelling a random function \(y(\mathbf{x})\), is defined by a mean function \(\mu(\mathbf{x})\) and a covariance function (or kernel) \(k(\mathbf{x}, \mathbf{x}')\). While Syne Tune contains a number of different covariance functions for multi-fidelity HPO, where learning curves \(y(\mathbf{x}, r)\) are modelled, with \(r = 1, 2, \dots\) the number of epochs trained (details are provided here), it currently provides only the Matern 5/2 covariance function for models of \(y(\mathbf{x})\). A few comments up front:
Mean and covariance functions are parts of (Gaussian process) surrogate models. For these models, complex gradients are required for different purposes. First, our Bayesian optimization code supports gradient-based minimization of the acquisition function. Second, a surrogate model is fitted to observed data, which is typically done by gradient-based optimization (e.g., marginal likelihood optimization, empirical Bayes) or by gradient-based Markov Chain Monte Carlo (e.g., Hamiltonian Monte Carlo). This means that covariance function code must be written in a framework supporting automatic differentiation. In Syne Tune, this code resides in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd. It is based on autograd.
Covariance functions contain parameters to be fitted to observed data. Kernels in Syne Tune typically feature an overall output scale, as well as inverse bandwidths for the input. In the (so-called) automatic relevance determination parameterization, we use one inverse bandwidth per input vector component. This allows the surrogate model to learn the relevance of individual input components: if components are not relevant to explain the observed data, their inverse bandwidths can be driven to very small values. Syne Tune uses code extracted from MXNet Gluon for managing parameters. The base class KernelFunction derives from MeanFunction, which derives from Block. The main service of this class is to maintain a parameter dictionary, collecting all parameters in the current object and its members (recursively).
In order to understand how a new covariance function can be implemented, we will
go through the most important parts of
Matern52
.
This covariance function is defined as:
\[ k(\mathbf{x}, \mathbf{x}') = c \left( 1 + B + \frac{B^2}{3} \right) e^{-B}, \quad B = \sqrt{5}\, \|\mathbf{S} (\mathbf{x} - \mathbf{x}')\|. \]
Its parameters are the output scale \(c > 0\) and the inverse bandwidths
\(s_j > 0\), where \(\mathbf{S}\) is the
diagonal matrix with diagonal entries \(s_j\). If ARD == False
, there
is only a single bandwidth parameter \(s > 0\).
First, we need some includes:
import autograd.numpy as anp
from autograd.tracer import getval
from typing import Dict, Any
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants import (
INITIAL_COVARIANCE_SCALE,
INITIAL_INVERSE_BANDWIDTHS,
DEFAULT_ENCODING,
INVERSE_BANDWIDTHS_LOWER_BOUND,
INVERSE_BANDWIDTHS_UPPER_BOUND,
COVARIANCE_SCALE_LOWER_BOUND,
COVARIANCE_SCALE_UPPER_BOUND,
NUMERICAL_JITTER,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution import (
Uniform,
LogNormal,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon import Block
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers import (
encode_unwrap_parameter,
register_parameter,
create_encoding,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
MeanFunction,
)
Since a number of covariance functions are simple expressions of squared distances \(\|\mathbf{S} (\mathbf{x} - \mathbf{x}')\|^2\), Syne Tune contains a block for this one:
class SquaredDistance(Block):
r"""
Block that is responsible for the computation of matrices of squared
distances. The distances can possibly be weighted (e.g., ARD
parametrization). For instance:
.. math::
m_{i j} = \sum_{k=1}^d ib_k^2 (x_{1: i k} - x_{2: j k})^2
\mathbf{X}_1 = [x_{1: i j}],\quad \mathbf{X}_2 = [x_{2: i j}]
Here, :math:`[ib_k]` is the vector :attr:`inverse_bandwidth`.
if ``ARD == False``, ``inverse_bandwidths`` is equal to a scalar broadcast to the
d components (with ``d = dimension``, i.e., the number of features in ``X``).
:param dimension: Dimensionality :math:`d` of input vectors
:param ARD: Automatic relevance determination (``inverse_bandwidth`` vector
of size ``d``)? Defaults to ``False``
:param encoding_type: Encoding for ``inverse_bandwidth``. Defaults to
:const:`~syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.DEFAULT_ENCODING`
"""
def __init__(
self,
dimension: int,
ARD: bool = False,
encoding_type: str = DEFAULT_ENCODING,
**kwargs
):
super().__init__(**kwargs)
self.ARD = ARD
inverse_bandwidths_dimension = 1 if not ARD else dimension
self.encoding = create_encoding(
encoding_type,
INITIAL_INVERSE_BANDWIDTHS,
INVERSE_BANDWIDTHS_LOWER_BOUND,
INVERSE_BANDWIDTHS_UPPER_BOUND,
inverse_bandwidths_dimension,
Uniform(INVERSE_BANDWIDTHS_LOWER_BOUND, INVERSE_BANDWIDTHS_UPPER_BOUND),
)
with self.name_scope():
self.inverse_bandwidths_internal = register_parameter(
self.params,
"inverse_bandwidths",
self.encoding,
shape=(inverse_bandwidths_dimension,),
)
def _inverse_bandwidths(self):
return encode_unwrap_parameter(self.inverse_bandwidths_internal, self.encoding)
def forward(self, X1, X2):
"""Computes matrix of squared distances
:param X1: input matrix, shape ``(n1, d)``
:param X2: input matrix, shape ``(n2, d)``
"""
# In case inverse_bandwidths if of size (1, dimension), dimension>1,
# ARD is handled by broadcasting
inverse_bandwidths = anp.reshape(self._inverse_bandwidths(), (1, -1))
X1_scaled = anp.multiply(X1, inverse_bandwidths)
X1_squared_norm = anp.sum(anp.square(X1_scaled), axis=1)
if X2 is X1:
D = -2.0 * anp.dot(X1_scaled, anp.transpose(X1_scaled))
X2_squared_norm = X1_squared_norm
else:
X2_scaled = anp.multiply(X2, inverse_bandwidths)
D = -2.0 * anp.matmul(X1_scaled, anp.transpose(X2_scaled))
X2_squared_norm = anp.sum(anp.square(X2_scaled), axis=1)
D = D + anp.reshape(X1_squared_norm, (-1, 1))
D = D + anp.reshape(X2_squared_norm, (1, -1))
return anp.abs(D)
def get_params(self) -> Dict[str, Any]:
"""
Parameter keys are "inv_bw<k> "if ``dimension > 1``, and "inv_bw" if
``dimension == 1``.
"""
inverse_bandwidths = anp.reshape(self._inverse_bandwidths(), (-1,))
if inverse_bandwidths.size == 1:
return {"inv_bw": inverse_bandwidths[0]}
else:
return {
"inv_bw{}".format(k): inverse_bandwidths[k]
for k in range(inverse_bandwidths.size)
}
def set_params(self, param_dict: Dict[str, Any]):
dimension = self.encoding.dimension
if dimension == 1:
inverse_bandwidths = [param_dict["inv_bw"]]
else:
keys = ["inv_bw{}".format(k) for k in range(dimension)]
for k in keys:
assert k in param_dict, "'{}' not in param_dict = {}".format(
k, param_dict
)
inverse_bandwidths = [param_dict[k] for k in keys]
self.encoding.set(self.inverse_bandwidths_internal, inverse_bandwidths)
In the constructor, we create a parameter vector for the inverse bandwidths \([s_j]\), which can be just a scalar if ARD == False. In Syne Tune, each parameter has an encoding (e.g., identity or logarithmic), which includes a lower and upper bound, an initial value, as well as a prior distribution. The latter is used for regularization during optimization.
The most important method is forward. Given two matrices \(\mathbf{X}_1\), \(\mathbf{X}_2\), whose rows are input vectors, we compute the matrix \([\|\mathbf{x}_{1:i} - \mathbf{x}_{2:j}\|^2]_{i, j}\) of squared distances (weighted by the inverse bandwidths). Most importantly, we use anp = autograd.numpy here instead of numpy. These autograd wrappers ensure that automatic differentiation can be used in order to compute gradients w.r.t. leaf nodes in the computation graph spanned by the numpy computations. Also, note the use of encode_unwrap_parameter in _inverse_bandwidths to obtain the inverse bandwidth parameters as a numpy array. Finally, note that X1 and X2 can be the same object, in which case we can save compute time and create a smaller computation graph.
Each block in Syne Tune also provides get_params and set_params methods, which are used for serialization and deserialization.
Given this code, the implementation of
Matern52
is simple:
class Matern52(KernelFunction):
"""
Block that is responsible for the computation of Matern 5/2 kernel.
if ``ARD == False``, ``inverse_bandwidths`` is equal to a scalar broadcast to the
d components (with ``d = dimension``, i.e., the number of features in ``X``).
Arguments on top of base class :class:`SquaredDistance`:
:param has_covariance_scale: Kernel has covariance scale parameter? Defaults
to ``True``
"""
def __init__(
self,
dimension: int,
ARD: bool = False,
encoding_type: str = DEFAULT_ENCODING,
has_covariance_scale: bool = True,
**kwargs
):
super(Matern52, self).__init__(dimension, **kwargs)
self.has_covariance_scale = has_covariance_scale
self.squared_distance = SquaredDistance(
dimension=dimension, ARD=ARD, encoding_type=encoding_type
)
if has_covariance_scale:
self.encoding = create_encoding(
encoding_name=encoding_type,
init_val=INITIAL_COVARIANCE_SCALE,
constr_lower=COVARIANCE_SCALE_LOWER_BOUND,
constr_upper=COVARIANCE_SCALE_UPPER_BOUND,
dimension=1,
prior=LogNormal(0.0, 1.0),
)
with self.name_scope():
self.covariance_scale_internal = register_parameter(
self.params, "covariance_scale", self.encoding
)
@property
def ARD(self) -> bool:
return self.squared_distance.ARD
def _covariance_scale(self):
if self.has_covariance_scale:
return encode_unwrap_parameter(
self.covariance_scale_internal, self.encoding
)
else:
return 1.0
def forward(self, X1, X2):
"""Computes Matern 5/2 kernel matrix
:param X1: input matrix, shape ``(n1,d)``
:param X2: input matrix, shape ``(n2,d)``
"""
covariance_scale = self._covariance_scale()
X1 = self._check_input_shape(X1)
if X2 is not X1:
X2 = self._check_input_shape(X2)
D = 5.0 * self.squared_distance(X1, X2)
# Using the plain np.sqrt is numerically unstable for D ~ 0
# (non-differentiability)
# that's why we add NUMERICAL_JITTER
B = anp.sqrt(D + NUMERICAL_JITTER)
return anp.multiply((1.0 + B + D / 3.0) * anp.exp(-B), covariance_scale)
def diagonal(self, X):
X = self._check_input_shape(X)
covariance_scale = self._covariance_scale()
covariance_scale_times_ones = anp.multiply(
anp.ones((getval(X.shape[0]), 1)), covariance_scale
)
return anp.reshape(covariance_scale_times_ones, (-1,))
def diagonal_depends_on_X(self):
return False
def param_encoding_pairs(self):
result = [
(
self.squared_distance.inverse_bandwidths_internal,
self.squared_distance.encoding,
)
]
if self.has_covariance_scale:
result.insert(0, (self.covariance_scale_internal, self.encoding))
return result
def get_covariance_scale(self):
if self.has_covariance_scale:
return self._covariance_scale()[0]
else:
return 1.0
def set_covariance_scale(self, covariance_scale):
assert self.has_covariance_scale, "covariance_scale is fixed to 1"
self.encoding.set(self.covariance_scale_internal, covariance_scale)
def get_params(self) -> Dict[str, Any]:
result = self.squared_distance.get_params()
if self.has_covariance_scale:
result["covariance_scale"] = self.get_covariance_scale()
return result
def set_params(self, param_dict: Dict[str, Any]):
self.squared_distance.set_params(param_dict)
if self.has_covariance_scale:
self.set_covariance_scale(param_dict["covariance_scale"])
In the constructor, we create an object of type SquaredDistance. A nice feature of MXNet Gluon blocks is that the parameter dictionary of an object is automatically extended by the dictionaries of its members, so we don’t need to cater for that. Beware that this only works for members which are of type Block directly. If you use a list or dictionary containing such objects, you need to include their parameter dictionaries explicitly. Next, we also define a covariance scale parameter \(c > 0\), unless has_covariance_scale == False.
forward calls forward of the SquaredDistance object, then computes the kernel matrix, using anp = autograd.numpy once more.
diagonal returns the diagonal of the kernel matrix based on a matrix X of inputs. For this particular kernel, the diagonal does not depend on the content of X, but only on its shape, which is why diagonal_depends_on_X returns False.
Besides get_params and set_params, we also need to implement param_encoding_pairs, which is required by the optimization code used for fitting the surrogate model parameters.
At this point, you should not have any major difficulties implementing a new covariance function, such as the Gaussian kernel or the Matern kernel with parameter 3/2.
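As a sketch of such an extension, here is a Gaussian (RBF) kernel without a covariance scale parameter, following the Matern52 pattern above. It is a sketch only: the import of SquaredDistance from the kernel module and the availability of _check_input_shape on the base class are assumptions to verify against the sources:
import autograd.numpy as anp
from autograd.tracer import getval

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants import (
    DEFAULT_ENCODING,
)
# Assumption: SquaredDistance lives in the same module as Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    KernelFunction,
    SquaredDistance,
)


class GaussianKernel(KernelFunction):
    r"""
    Sketch of a Gaussian (RBF) kernel without covariance scale:
    .. math::
       k(\mathbf{x}, \mathbf{x}') = \exp\left( -\frac12 \|\mathbf{S}(\mathbf{x} - \mathbf{x}')\|^2 \right)
    """

    def __init__(self, dimension, ARD=False, encoding_type=DEFAULT_ENCODING, **kwargs):
        super().__init__(dimension, **kwargs)
        self.squared_distance = SquaredDistance(
            dimension=dimension, ARD=ARD, encoding_type=encoding_type
        )

    def forward(self, X1, X2):
        # Squared distances are already weighted by the inverse bandwidths
        X1 = self._check_input_shape(X1)
        if X2 is not X1:
            X2 = self._check_input_shape(X2)
        D = self.squared_distance(X1, X2)
        return anp.exp(-0.5 * D)

    def diagonal(self, X):
        # k(x, x) = 1 for every x, so the diagonal only depends on the shape of X
        X = self._check_input_shape(X)
        return anp.ones((getval(X.shape[0]),))

    def diagonal_depends_on_X(self):
        return False

    def param_encoding_pairs(self):
        # Only the inverse bandwidths of the squared distance block
        return [
            (
                self.squared_distance.inverse_bandwidths_internal,
                self.squared_distance.encoding,
            )
        ]

    def get_params(self):
        return self.squared_distance.get_params()

    def set_params(self, param_dict):
        self.squared_distance.set_params(param_dict)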
The Factory for Gaussian Process Searchers
Once a covariance function (or any other component of a surrogate model) has
been added, how is it accessed by a user? In general, all details about the
surrogate model are specified in search_options
passed to
FIFOScheduler
or
BayesianOptimization
. Available options
are documented in
GPFIFOSearcher
. Syne Tune
offers a range of searchers based on various Gaussian process surrogate models
(e.g., single fidelity, multi-fidelity, constrained, cost-aware). The code to
generate all required components for these searchers is bundled in
gp_searcher_factory
. For
each type of searcher, there is a factory function and a defaults function.
For BayesianOptimization
(which is
equivalent to FIFOScheduler
with
searcher="bayesopt"
), we have:
gp_fifo_searcher_factory(): Takes search_options as kwargs and returns the arguments for the GPFIFOSearcher constructor.
gp_fifo_searcher_defaults(): Provides default values and type constraints for search_options.
The searcher object is created in
searcher_factory()
.
Finally, search_options
are merged with default values, and searcher_factory
is called in the constructor of
FIFOScheduler
. This process keeps
things simple for the user, who just has to specify the type of searcher by
searcher
, and additional arguments by search_options
. For any argument
not provided there, a sensible default value is used.
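For example, surrogate model details could be selected like this (config space and metric are placeholders; the available keys are documented in GPFIFOSearcher):
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import BayesianOptimization

config_space = {"x": uniform(0.0, 1.0)}  # placeholder search space

scheduler = BayesianOptimization(
    config_space,
    metric="objective",  # placeholder metric name
    mode="min",
    search_options={
        "gp_base_kernel": "matern52-noard",  # select the base kernel by name
        "input_warping": True,  # warp non-categorical input coordinates
    },
)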
Factory and default functions in
gp_searcher_factory
are based
on common code in this module, which reflects the complexity of some of the
searchers, but is otherwise self-explanatory. As a continuation of the
previous section, suppose we had implemented a novel covariance function to
be used in GP-based Bayesian optimization. The user-facing argument to select
a kernel is gp_base_kernel
, its default value is “matern52-ard” (Matern
5/2 with ARD parameters). Here is the code for creating this covariance
function in gp_searcher_factory
:
def _create_base_gp_kernel(hp_ranges: HyperparameterRanges, **kwargs) -> KernelFunction:
"""
The default base kernel is :class:`Matern52` with ARD parameters.
But in the transfer learning case, the base kernel is a product of
two ``Matern52`` kernels, the first non-ARD over the categorical
parameter determining the task, the second ARD over the remaining
parameters.
"""
input_warping = kwargs.get("input_warping", False)
if kwargs.get("transfer_learning_task_attr") is not None:
if input_warping:
logger.warning(
"Cannot use input_warping=True together with transfer_learning_task_attr. Will use input_warping=False"
)
# Transfer learning: Specific base kernel
kernel = create_base_gp_kernel_for_warmstarting(hp_ranges, **kwargs)
else:
has_covariance_scale = kwargs.get("has_covariance_scale", True)
kernel = base_kernel_factory(
name=kwargs["gp_base_kernel"],
dimension=hp_ranges.ndarray_size,
has_covariance_scale=has_covariance_scale,
)
if input_warping:
# Use input warping on all coordinates which do not belong to a
# categorical hyperparameter
kernel = kernel_with_warping(kernel, hp_ranges)
if kwargs.get("debug_log", False) and isinstance(kernel, WarpedKernel):
ranges = [(warp.lower, warp.upper) for warp in kernel.warpings]
logger.info(
f"Creating base GP covariance kernel with input warping: ranges = {ranges}"
)
return kernel
Ignoring transfer_learning_task_attr, we first call base_kernel_factory to create the base kernel, passing kwargs["gp_base_kernel"] as its name.
Syne Tune also supports warping of the inputs to a kernel, which adds two more parameters for each component (components coming from categorical hyperparameters are not warped).
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
KernelFunction,
Matern52,
ExponentialDecayResourcesKernelFunction,
ExponentialDecayResourcesMeanFunction,
FreezeThawKernelFunction,
FreezeThawMeanFunction,
CrossValidationMeanFunction,
CrossValidationKernelFunction,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping import (
WarpedKernel,
Warping,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
MeanFunction,
)
SUPPORTED_BASE_MODELS = (
"matern52-ard",
"matern52-noard",
)
def base_kernel_factory(name: str, dimension: int, **kwargs) -> KernelFunction:
assert (
name in SUPPORTED_BASE_MODELS
), f"name = {name} not supported. Choose from:\n{SUPPORTED_BASE_MODELS}"
return Matern52(
dimension=dimension,
ARD=name == "matern52-ard",
has_covariance_scale=kwargs.get("has_covariance_scale", True),
)
base_kernel_factory creates the base kernel, based on its name (which must be in SUPPORTED_BASE_MODELS), the dimension of input vectors, as well as further parameters (has_covariance_scale in our example). Currently, Syne Tune only supports the Matern 5/2 kernel, with and without ARD.
Had we implemented a novel covariance function, we would have to select a new name, insert it into SUPPORTED_BASE_MODELS, and insert code into base_kernel_factory. Once this is done, the new base kernel can also be selected as a component in multi-fidelity or constrained Bayesian optimization.
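Concretely, with the (hypothetical) GaussianKernel sketch from the previous section and a new name "gaussian-rbf", the changes would look roughly like this (imports as in the factory code above):
# Hypothetical extension of the factory shown above; "gaussian-rbf" and
# GaussianKernel are illustrative and not part of Syne Tune
SUPPORTED_BASE_MODELS = (
    "matern52-ard",
    "matern52-noard",
    "gaussian-rbf",
)


def base_kernel_factory(name: str, dimension: int, **kwargs) -> KernelFunction:
    assert (
        name in SUPPORTED_BASE_MODELS
    ), f"name = {name} not supported. Choose from:\n{SUPPORTED_BASE_MODELS}"
    if name == "gaussian-rbf":
        return GaussianKernel(dimension=dimension, ARD=False)
    return Matern52(
        dimension=dimension,
        ARD=name == "matern52-ard",
        has_covariance_scale=kwargs.get("has_covariance_scale", True),
    )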
Combining a Gaussian Process Model from Components
We have already seen above how to implement a surrogate model from scratch. However, many Gaussian process models proposed in the Bayesian optimization literature are combinations of more basic underlying models. In this section, we show how such combinations are implemented in Syne Tune.
Note
When planning to implement a new Gaussian process model, you should first check whether the outcome is simply a Gaussian process with mean and covariance function arising from combinations of means and kernels of the components. If that is the case, it is often simpler and more efficient to implement a new mean and covariance function using existing code (as shown above), and to use a standard GP model with these functions.
Independent Processes for Multiple Fidelities
In this section, we will look at the example of
independent
,
providing a surrogate model for a set of functions
\(y(\mathbf{x}, r)\), where \(r\in \mathcal{R}\) is an integer from a
finite set. This model is used in the context of
multi-fidelity HPO.
Each \(y(\mathbf{x}, r)\) is represented by an independent Gaussian process,
with mean function \(\mu_r(\mathbf{x})\) and covariance function
\(c_r k(\mathbf{x}, \mathbf{x}')\). The covariance function \(k\) is
shared between all the processes, but the scale parameters \(c_r > 0\) are
different for each process. In multi-fidelity HPO, we observe more data at
smaller resource levels \(r\). Using the same ARD-parameterized kernel for
all processes allows statistical strength to be shared between the different
levels. The code in
independent
follows a useful pattern:
IndependentGPPerResourcePosteriorState: Posterior state, representing the posterior distribution after conditioning on data. This is used (a) to compute the log marginal likelihood for fitting the model parameters, and (b) for predictions driving the acquisition function optimization.
IndependentGPPerResourceMarginalLikelihood: Wraps code to generate the posterior state, and represents the negative log marginal likelihood function used to fit the model parameters.
IndependentGPPerResourceModel: Wraps code for creating the likelihood object. API towards higher level code.
The code of
IndependentGPPerResourcePosteriorState
is a simple reduction to
GaussProcPosteriorState
,
the posterior state for a basic Gaussian process. For example, here is the code
to compute the posterior state:
def _compute_states(
self,
features: np.ndarray,
targets: np.ndarray,
kernel: KernelFunction,
mean: Dict[int, MeanFunction],
covariance_scale: Dict[int, np.ndarray],
noise_variance: Dict[int, np.ndarray],
resource_attr_range: Tuple[int, int],
debug_log: bool = False,
):
features, resources = decode_extended_features(features, resource_attr_range)
self._states = dict()
for resource, mean_function in mean.items():
cov_scale = covariance_scale[resource]
rows = np.flatnonzero(resources == resource)
if rows.size > 0:
r_features = features[rows]
r_targets = targets[rows]
self._states[resource] = GaussProcPosteriorState(
features=r_features,
targets=r_targets,
mean=mean_function,
kernel=(kernel, cov_scale),
noise_variance=noise_variance[resource],
debug_log=debug_log,
)
mean and covariance_scale are dictionaries containing \(\mu_r\) and \(c_r\) respectively.
features are extended features of the form \((\mathbf{x}_i, r_i)\). The function decode_extended_features maps this to arrays \([\mathbf{x}_i]\) and \([r_i]\).
We compute separate posterior states for each level \(r\in\mathcal{R}\), using the data \((\mathbf{x}_i, y_i)\) for which \(r_i = r\).
Other methods of the base class
PosteriorStateWithSampleJoint
are implemented accordingly, reducing computations to the states for each level.
The code of
IndependentGPPerResourceMarginalLikelihood
is obvious, given the base class
MarginalLikelihood
.
The same holds for
IndependentGPPerResourceModel
,
given the base class
GaussianProcessOptimizeModel
.
One interesting feature is that the creation of the likelihood object is
delayed, because the set of rung levels \(\mathcal{R}\) of the multi-fidelity
scheduler needs to be known. The create_likelihood
method is called in
configure_scheduler()
,
a callback function with the scheduler as argument.
Since our independent GP model implements the APIs of
MarginalLikelihood
and
GaussianProcessOptimizeModel
,
we can plug it into generic code in syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model
,
which works as outlined
above.
In particular, the estimator
GaussProcEmpiricalBayesEstimator
accepts gp_model
of type
IndependentGPPerResourceModel
,
and it creates predictors of type
GaussProcPredictor
.
Overview of gpautograd
Most of the code in
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd
adheres to
the same pattern (posterior state, likelihood function, model wrapper):
Standard GP model: GaussProcPosteriorState, GaussianProcessMarginalLikelihood, GaussianProcessRegression. This also covers multi-task GP models for multi-fidelity, by way of extended configurations.
Independent GP models for multi-fidelity (example above): IndependentGPPerResourcePosteriorState, IndependentGPPerResourceMarginalLikelihood, IndependentGPPerResourceModel.
Hyper-Tune independent GP models for multi-fidelity: HyperTuneIndependentGPPosteriorState, HyperTuneIndependentGPMarginalLikelihood, HyperTuneIndependentGPModel.
Hyper-Tune multi-task GP models for multi-fidelity: HyperTuneJointGPPosteriorState, HyperTuneJointGPMarginalLikelihood, HyperTuneJointGPModel.
Linear state space learning curve models: IncrementalUpdateGPAdditivePosteriorState, GaussAdditiveMarginalLikelihood, GaussianProcessLearningCurveModel. This code is still experimental.
PASHA: Efficient HPO and NAS with Progressive Resource Allocation
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. PASHA is an approach designed to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. PASHA extends ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA.
What is PASHA?
The goal of PASHA is to identify well-performing configurations significantly faster than current methods, so that we can then retrain the model with the selected configuration (in practice on the combined training and validation sets). By giving preference to evaluating more configurations rather than evaluating them for longer than needed, PASHA can lead to significant speedups while achieving similar performance as existing methods.
PASHA is a variant of ASHA that starts with a small amount of initial resources and gradually increases them depending on the stability of configuration rankings in the top two rungs (rounds of promotion). Each time the ranking of configurations in the top two rungs becomes inconsistent, PASHA increases the maximum number of resources. This can be understood as “unlocking” a new rung level. An illustration of how PASHA stops early if the ranking of configurations has stabilized is shown in Figure 1.

Given that deep-learning algorithms typically rely on stochastic gradient descent, ranking inconsistencies can occur between similarly performing configurations. Hence, we need some benevolence in estimating the ranking. As a solution, PASHA uses a soft-ranking approach where we group configurations based on their validation performance metric (e.g. accuracy).
In soft ranking, configurations are still sorted by predictive performance but they are considered equivalent if the performance difference is smaller than a value \(\epsilon\) (or equal to it). Instead of producing a sorted list of configurations, this provides a list of lists where for every position of the ranking there is a list of equivalent configurations. The concept is explained graphically in Figure 2. The value of \(\epsilon\) is automatically estimated by measuring noise in rankings.

How well does PASHA work?
Experimental evaluation has shown that PASHA consistently leads to strong improvements in runtime while achieving similar accuracies as ASHA; for example, PASHA is three times faster than ASHA on NASBench201. Full experiments and further details are available in PASHA: Efficient HPO and NAS with Progressive Resource Allocation.
We provide an example script launch_pasha_nasbench201.py that shows how to run an experiment with PASHA on NASBench201.
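For orientation, here is a hedged sketch of how PASHA could be launched with the local backend. The config space, training script, metric, and attribute names are placeholders; in particular, we assume that PASHA is available in syne_tune.optimizer.baselines and that the training script reports the metric together with the resource attribute once per epoch:
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import PASHA

# Placeholder config space; "epochs" holds the maximum resource per trial
config_space = {
    "epochs": 100,
    "lr": loguniform(1e-4, 1e-1),
    "batch_size": randint(16, 256),
}

scheduler = PASHA(
    config_space,
    metric="val_accuracy",      # assumed metric reported by the training script
    mode="max",
    resource_attr="epoch",      # assumed name of the resource reported per epoch
    max_resource_attr="epochs", # key in config_space holding the maximum resource
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # placeholder script
    scheduler=scheduler,
    # Recommendation above: stop after a fixed number of evaluated configurations
    stop_criterion=StoppingCriterion(max_num_trials_finished=100),
    n_workers=4,
)
tuner.run()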
Recommendations
PASHA is particularly useful for large-scale datasets with millions of datapoints, where it can lead to e.g. 15x speedup compared to ASHA.
If only a few epochs are used for training, it is useful to define rung levels in terms of the number of datapoints processed rather than the number of epochs. This makes it possible for PASHA to stop the HPO significantly earlier and obtain a large speedup.
A suitable stopping criterion for PASHA is the number of configurations that have been evaluated so far, but stopping criteria based on wall-clock time can also be used. With a time-based criterion, PASHA makes the biggest impact when the stopping time is set to a small value.
Using Syne Tune for Transfer Learning
Transfer learning allows us to speed up our current optimisation by learning from related optimisation runs. For instance, imagine we want to change from a smaller to a larger model. We already have a collection of hyperparameter evaluations for the smaller model. Then we can use these to guide our hyperparameter optimisation of the larger model, for instance by starting with the configuration that performed best. Or imagine that we keep the same model, but add more training data or add another data feature. Then we expect good hyperparameter configurations on the previous training data to work well on the augmented data set as well.
Syne Tune includes implementations of several transfer learning schedulers; a list of available schedulers is given here. In this tutorial we look at three of them:
ZeroShotTransfer: Sequential Model-Free Hyperparameter Tuning. Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme. IEEE International Conference on Data Mining (ICDM) 2015. First we calculate the rank of each hyperparameter configuration on each previous task. Then we choose configurations in order to minimise the sum of the ranks across the previous tasks. The idea is to speed up optimisation by picking configurations with high ranks on previous tasks.
BoundingBox: Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning. Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton. NeurIPS 2019. We construct a smaller hyperparameter search space by taking the minimum box which contains the optimal configurations for the previous tasks. The idea is to speed up optimisation by not searching areas which have been suboptimal for all previous tasks.
Quantiles (quantile_based_searcher): A Quantile-based Approach for Hyperparameter Transfer Learning. David Salinas, Huibin Shen, Valerio Perrone. ICML 2020. We map the hyperparameter evaluations to quantiles for each task. Then we learn a distribution of quantiles given hyperparameters. Finally, we sample from the distribution and evaluate the best sample. The idea is to speed up optimisation by searching areas with high-ranking configurations but without enforcing hard limits on the search space.
We compare them to standard
BayesianOptimization
(BO).
We construct a set of tasks based on the
height example. We first collect
evaluations on five tasks, and then compare results on the sixth. We consider
the single-fidelity case. For each task we assume a budget of 10 (max_trials
)
evaluations.
We use BO on the preliminary tasks, and for the transfer task we compare BO,
ZeroShot, BoundingBox and Quantiles. The set of tasks is made by adjusting the
max_steps
parameter in the height example, but could correspond to adjusting
the training data instead.
The code is available
here.
Make sure to run it as
python launch_transfer_learning_example.py --generate_plots
if you want to generate the plots locally.
The optimisations vary between runs, so your plots might look
different.
In order to run our transfer learning schedulers we need to parse the output of
the tuner into a dict of
TransferLearningTaskEvaluations
.
We do this in the extract_transferable_evaluations
function.
def filter_completed(df):
# Filter out runs that didn't finish
return df[df["status"] == "Completed"].reset_index()
def extract_transferable_evaluations(df, metric, config_space):
"""
Take a dataframe from a tuner run, filter it and generate
TransferLearningTaskEvaluations from it
"""
filter_df = filter_completed(df)
return TransferLearningTaskEvaluations(
configuration_space=config_space,
hyperparameters=filter_df[config_space.keys()],
objectives_names=[metric],
# objectives_evaluations need to be of shape
# (num_evals, num_seeds, num_fidelities, num_objectives)
# We only have one seed, fidelity and objective
objectives_evaluations=np.array(filter_df[metric], ndmin=4).T,
)
We start by collecting evaluations by running BayesianOptimization
on
the five preliminary
tasks. We generate the different tasks by setting max_steps=1..5
in the
backend in init_scheduler
, giving five very similar tasks.
Once we have run BO on each task, we store the evaluations as TransferLearningTaskEvaluations.
def run_scheduler_on_task(entry_point, scheduler, max_trials):
"""
Take a scheduler and run it for max_trials on the backend specified by entry_point
Return a dataframe of the optimisation results
"""
tuner = Tuner(
trial_backend=LocalBackend(entry_point=str(entry_point)),
scheduler=scheduler,
stop_criterion=StoppingCriterion(max_num_trials_finished=max_trials),
n_workers=4,
sleep_time=0.001,
)
tuner.run()
return tuner.tuning_status.get_dataframe()
def init_scheduler(
scheduler_str, max_steps, seed, mode, metric, transfer_learning_evaluations
):
"""
Initialise the scheduler
"""
kwargs = {
"metric": metric,
"config_space": height_config_space(max_steps=max_steps),
"mode": mode,
"random_seed": seed,
}
kwargs_w_trans = copy.deepcopy(kwargs)
kwargs_w_trans["transfer_learning_evaluations"] = transfer_learning_evaluations
if scheduler_str == "BayesianOptimization":
return BayesianOptimization(**kwargs)
if scheduler_str == "ZeroShotTransfer":
return ZeroShotTransfer(use_surrogates=True, **kwargs_w_trans)
if scheduler_str == "Quantiles":
return FIFOScheduler(
searcher=QuantileBasedSurrogateSearcher(**kwargs_w_trans),
**kwargs,
)
if scheduler_str == "BoundingBox":
kwargs_sched_fun = {key: kwargs[key] for key in kwargs if key != "config_space"}
kwargs_w_trans[
"scheduler_fun"
] = lambda new_config_space, mode, metric: BayesianOptimization(
new_config_space,
**kwargs_sched_fun,
)
del kwargs_w_trans["random_seed"]
return BoundingBox(**kwargs_w_trans)
raise ValueError("scheduler_str not recognised")
if __name__ == "__main__":
max_trials = 10
np.random.seed(1)
# Use train_height backend for our tests
entry_point = str(
Path(__file__).parent
/ "training_scripts"
/ "height_example"
/ "train_height.py"
)
# Collect evaluations on preliminary tasks
transfer_learning_evaluations = {}
for max_steps in range(1, 6):
scheduler = init_scheduler(
"BayesianOptimization",
max_steps=max_steps,
seed=np.random.randint(100),
mode=METRIC_MODE,
metric=METRIC_ATTR,
transfer_learning_evaluations=None,
)
print("Optimising preliminary task %s" % max_steps)
prev_task = run_scheduler_on_task(entry_point, scheduler, max_trials)
# Generate TransferLearningTaskEvaluations from previous task
transfer_learning_evaluations[max_steps] = extract_transferable_evaluations(
prev_task, METRIC_ATTR, scheduler.config_space
)
Then we run different schedulers to compare on our transfer task with
max_steps=6
. For ZeroShotTransfer
we set use_surrogates=True
, meaning
that it uses an XGBoost model to estimate the rank of configurations, as we do
not have evaluations of the same configurations on all previous tasks.
# Collect evaluations on transfer task
max_steps = 6
transfer_task_results = {}
labels = ["BayesianOptimization", "BoundingBox", "ZeroShotTransfer", "Quantiles"]
for scheduler_str in labels:
scheduler = init_scheduler(
scheduler_str,
max_steps=max_steps,
seed=max_steps,
mode=METRIC_MODE,
metric=METRIC_ATTR,
transfer_learning_evaluations=transfer_learning_evaluations,
)
print("Optimising transfer task using %s" % scheduler_str)
transfer_task_results[scheduler_str] = run_scheduler_on_task(
entry_point, scheduler, max_trials
)
We plot the results on the transfer task. We see that the early performance of
the transfer schedulers is much better than standard BO. We only plot the first
max_trials
results. The transfer task is very similar to the preliminary
tasks, so we expect the transfer schedulers to do well. And that is what we see
in the plot below.
def add_labels(ax, conf_space, title):
ax.legend()
ax.set_xlabel("width")
ax.set_ylabel("height")
ax.set_xlim([conf_space["width"].lower - 1, conf_space["width"].upper + 1])
ax.set_ylim([conf_space["height"].lower - 10, conf_space["height"].upper + 10])
ax.set_title(title)
def scatter_space_exploration(ax, task_hyps, max_trials, label, color=None):
ax.scatter(
task_hyps["width"][:max_trials],
task_hyps["height"][:max_trials],
alpha=0.4,
label=label,
color=color,
)
colours = {
"BayesianOptimization": "C0",
"BoundingBox": "C1",
"ZeroShotTransfer": "C2",
"Quantiles": "C3",
}
def plot_last_task(max_trials, df, label, metric, color):
max_tr = min(max_trials, len(df))
plt.scatter(range(max_tr), df[metric][:max_tr], label=label, color=color)
plt.plot([np.min(df[metric][:ii]) for ii in range(1, max_trials + 1)], color=color)
# Optionally generate plots. Defaults to False
parser = argparse.ArgumentParser()
parser.add_argument(
"--generate_plots", action="store_true", help="generate optimisation plots."
)
args = parser.parse_args()
if args.generate_plots:
from syne_tune.try_import import try_import_visual_message
try:
import matplotlib.pyplot as plt
except ImportError:
print(try_import_visual_message())
print("Generating optimisation plots.")
""" Plot the results on the transfer task """
for label in labels:
plot_last_task(
max_trials,
transfer_task_results[label],
label=label,
metric=METRIC_ATTR,
color=colours[label],
)
plt.legend()
plt.ylabel(METRIC_ATTR)
plt.xlabel("Iteration")
plt.title("Transfer task (max_steps=6)")
plt.savefig("Transfer_task.png", bbox_inches="tight")

We also look at the parts of the search space explored. First by looking at the preliminary tasks.
""" Plot the configs tried for the preliminary tasks """
fig, ax = plt.subplots()
for key in transfer_learning_evaluations:
scatter_space_exploration(
ax,
transfer_learning_evaluations[key].hyperparameters,
max_trials,
"Task %s" % key,
)
add_labels(
ax,
scheduler.config_space,
"Explored locations of BO for preliminary tasks",
)
plt.savefig("Configs_explored_preliminary.png", bbox_inches="tight")

Then we look at the explored search space for the transfer task. For all the transfer methods, the first tested point (marked as a square) is closer to the previously explored optima (black crosses) than for BO, which starts by checking the middle of the search space.
""" Plot the configs tried for the transfer task """
fig, ax = plt.subplots()
# Plot the configs tried by the different schedulers on the transfer task
for label in labels:
finished_trials = filter_completed(transfer_task_results[label])
scatter_space_exploration(
ax, finished_trials, max_trials, label, color=colours[label]
)
# Plot the first config tested as a big square
ax.scatter(
finished_trials["width"][0],
finished_trials["height"][0],
marker="s",
color=colours[label],
s=100,
)
# Plot the optima from the preliminary tasks as black crosses
past_label = "Preliminary optima"
for key in transfer_learning_evaluations:
argmin = np.argmin(
transfer_learning_evaluations[key].objective_values(METRIC_ATTR)[
:max_trials, 0, 0
]
)
ax.scatter(
transfer_learning_evaluations[key].hyperparameters["width"][argmin],
transfer_learning_evaluations[key].hyperparameters["height"][argmin],
color="k",
marker="x",
label=past_label,
)
past_label = None
add_labels(ax, scheduler.config_space, "Explored locations for transfer task")
plt.savefig("Configs_explored_transfer.png", bbox_inches="tight")

Distributed Hyperparameter Tuning: Finding the Right Model can be Fast and Fun
These sections are part of a tutorial given at the Open Data Science Conference Europe in June 2023. They provide hands-on examples for distributed hyperparameter tuning, as well as links to further details for self-teaching.
Note
The code used in this tutorial is contained in the Syne Tune sources; it is not installed by pip. You can obtain this code by installing Syne Tune from source, but the only code that is needed is in benchmarking.nursery.odsc_tutorial.
You also need to have access to AWS SageMaker, and work through these setups.
Getting Started with Hyperparameter Tuning
In this section, you will learn what is needed to get hyperparameter tuning up and running. We will look at an example where a deep learning language model is trained on natural language text.
What is Hyperparameter Tuning?
When solving a business problem with machine learning, there are parts which can be automated by spending compute resources, and other parts which require human expert attention and choices to be made. By automating some of the more tedious parts of the latter, hyperparameter tuning shifts the needle between these cost factors. Like any other smart tool, it saves you time to concentrate on where your strengths really lie, and where you can create the most value.
At a high level, hyperparameter tuning finds configurations of a system which optimize a target metric (or several ones, as we will see later). We can try any configuration from a configuration space, but each evaluation of the system has a cost and takes time. The main challenge of hyperparameter tuning is to run as few trials as possible, so that total costs are minimal. Also, if possible, trials should be run in parallel, so that the total experiment time is minimal.
In this tutorial, we will mostly be focussed on making decisions and tuning free parameters in the context of training machine learning models on data, so their predictions can be used as part of a solution to a business problem. There are many other steps between the initial need and a deployed solution, such as understanding business requirements, collecting, cleaning and labeling data, monitoring and maintenance. Some of these can be addressed with automated tuning as well, others need different tools.
A common paradigm for decision-making and parameter tuning is to try a number of different configurations and select the best in the end.
A trial consists of training a model on a part of the data (the training data). Here, training is an automated process (for example, stochastic gradient descent on the weights and biases of a neural network model), given a configuration (e.g., what learning rate is used, what batch size, etc.). Then, the trained model is evaluated on another part of the data (validation data, disjoint from the training data), giving rise to a quality metric (e.g., validation error, AUC, F1), or even several ones. For small datasets, we can also use cross-validation, by repeating training and evaluation on a number of different splits and reporting the average of the validation metrics.
This metric value (or values) is the response of the system to a configuration. Note that the response is stochastic: if we run again with the same configuration, we may get a different value. This is because training has random elements (e.g., initial weights are sampled at random, and the ordering of the training data is shuffled).
Enough high level and definitions, let us dive into an example.
Annotating a Training Script
First, we need a script to execute a trial, by training a model and evaluating it. Since training models is bread and butter to machine learners, you will have no problem coming up with one. We start with an example: training_script_report_end.py. Ignoring the boilerplate, here are the important parts. First, we define the hyperparameters which should be optimized over:
from syne_tune import Reporter
from syne_tune.config_space import randint, uniform, loguniform, add_to_argparse
METRIC_NAME = "val_loss"
MAX_RESOURCE_ATTR = "epochs"
_config_space = {
"lr": loguniform(1e-6, 1e-3),
"dropout": uniform(0, 0.99),
"batch_size": randint(16, 48),
"momentum": uniform(0, 0.99),
"clip": uniform(0, 1),
}
The keys of _config_space are the hyperparameters we would like to tune (lr, dropout, batch_size, momentum, clip). It also defines their ranges and data types; we come back to this below. METRIC_NAME is the name of the target metric returned, and MAX_RESOURCE_ATTR is the key name for how many epochs to train.
Next, here is the function which executes a trial:
def objective(config):
torch.manual_seed(config["seed"])
use_cuda = config["use_cuda"]
if torch.cuda.is_available() and not use_cuda:
print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
device = torch.device("cuda" if use_cuda else "cpu")
# [1]
# Download data, setup data loaders
corpus = download_dataset(config)
ntokens = len(corpus.dictionary)
train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
valid_data = batchify(corpus.valid, bsz=10, device=device)
# Used for reporting metrics to Syne Tune
report = Reporter()
# [2]
# Create model and optimizer
model, optimizer, criterion = create_training_objects(config, ntokens, device)
# [3]
for epoch in range(1, config[MAX_RESOURCE_ATTR] + 1):
train(model, train_data, optimizer, criterion, config, ntokens, epoch)
# [4]
# Report validation loss back to Syne Tune
val_loss = evaluate(model, valid_data, criterion, config, ntokens)
report(**{METRIC_NAME: val_loss})
The input config to objective is a configuration dictionary, containing values for the hyperparameters and other fixed parameters (such as the number of epochs to train).
[1] We start with downloading training and validation data. The training data loader train_data depends on the hyperparameter config["batch_size"].
[2] Next, we create model and optimizer. This depends on the remaining hyperparameters in config.
[3] We then run config[MAX_RESOURCE_ATTR] epochs of training.
[4] Finally, we compute the error on the validation data and report it back to Syne Tune. The latter is done by creating report of type Reporter and calling it with a dictionary, using METRIC_NAME as key.
Finally, the script needs some command line arguments:
parser = argparse.ArgumentParser(
description="PyTorch Wikitext-2 Transformer Language Model",
formatter_class=argparse.RawTextHelpFormatter,
)
parser.add_argument(
"--" + MAX_RESOURCE_ATTR, type=int, default=40, help="upper epoch limit"
)
parser.add_argument("--use_cuda", type=int, default=1)
parser.add_argument(
"--input_data_dir",
type=str,
default="./",
help="location of the data corpus",
)
parser.add_argument(
"--optimizer_name", type=str, default="sgd", choices=["sgd", "adam"]
)
parser.add_argument("--bptt", type=int, default=35, help="sequence length")
parser.add_argument("--seed", type=int, default=1111, help="random seed")
parser.add_argument(
"--precision", type=str, default="float", help="float | double | half"
)
parser.add_argument(
"--log_interval",
type=int,
default=200,
help="report interval",
)
parser.add_argument("--d_model", type=int, default=256, help="width of the model")
parser.add_argument(
"--ffn_ratio", type=int, default=1, help="the ratio of d_ffn to d_model"
)
parser.add_argument("--nlayers", type=int, default=2, help="number of layers")
parser.add_argument(
"--nhead",
type=int,
default=2,
help="the number of heads in the encoder/decoder of the transformer model",
)
add_to_argparse(parser, _config_space)
args, _ = parser.parse_known_args()
args.use_cuda = bool(args.use_cuda)
objective(config=vars(args))
We use an argument parser parser. Hyperparameters can be added by add_to_argparse(parser, _config_space), given that the configuration space is defined in this script; otherwise you can do this manually. We also need some more inputs which are not hyperparameters, for example MAX_RESOURCE_ATTR.
You can also provide the input to a training script as a JSON file.
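As a rough sketch of how this could look — plain Python using only the standard library, not a specific Syne Tune helper; the file name and the call to objective are placeholders:
import argparse
import json

# Hypothetical sketch: read hyperparameters and fixed parameters from a JSON
# file passed on the command line, instead of declaring them one by one.
parser = argparse.ArgumentParser()
parser.add_argument("--config_json", type=str, default="config.json")
args, _ = parser.parse_known_args()
with open(args.config_json, "r") as f:
    config = json.load(f)
# objective(config=config)  # hand the dictionary to the training function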
Compared to a vanilla training script, we only added two lines: creating report and calling it for reporting the validation error at the end.
Choosing a Configuration Space
Apart from annotating a training script, making hyperparameters explicit as inputs, you also need to define a configuration space. In our example, we add this definition to the script, but you can also keep it separate and use the same training script with different configuration spaces:
_config_space = {
"lr": loguniform(1e-6, 1e-3),
"dropout": uniform(0, 0.99),
"batch_size": randint(16, 48),
"momentum": uniform(0, 0.99),
"clip": uniform(0, 1),
}
Each hyperparameter gets assigned a data type and a range. In this example, batch_size is an integer, while lr, dropout, momentum, and clip are floats. lr is encoded on a log scale.
Syne Tune provides a range of data types. Choosing them well requires a bit of attention; guidelines are given here.
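As a hedged illustration of the types available in syne_tune.config_space (the names and ranges below are made up for this sketch and are not part of the tutorial's benchmark):
from syne_tune.config_space import randint, lograndint, uniform, loguniform, choice

# Illustrative only: one entry per common data type in syne_tune.config_space.
example_config_space = {
    "lr": loguniform(1e-6, 1e-3),        # float, searched on a log scale
    "dropout": uniform(0.0, 0.99),       # float, linear scale
    "num_layers": randint(1, 8),         # int, linear scale
    "batch_size": lograndint(16, 256),   # int, log scale
    "optimizer_name": choice(["sgd", "adam"]),  # categorical
}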
Specifying Default Values
Once you have annotated your training script and chosen a configuration space,
you have specified all the input Syne Tune needs. You can now specify the
details about your tuning experiment in code, as discussed
here.
However, Syne Tune provides some tooling in syne_tune.experiments
which makes the
life of most users easier, and we will use this tooling in the rest of the
tutorial. To this end, we need to define some defaults about how experiments
are to be run (most of these can be overwritten by command line arguments):
from pathlib import Path
from transformer_wikitext2.code.training_script import (
_config_space,
METRIC_NAME,
RESOURCE_ATTR,
MAX_RESOURCE_ATTR,
)
from syne_tune.experiments.benchmark_definitions.common import RealBenchmarkDefinition
from syne_tune.remote.constants import (
DEFAULT_GPU_INSTANCE_1GPU,
DEFAULT_GPU_INSTANCE_4GPU,
)
def transformer_wikitext2_benchmark(sagemaker_backend: bool = False, **kwargs):
if sagemaker_backend:
instance_type = DEFAULT_GPU_INSTANCE_1GPU
else:
# For local backend, GPU cores serve different workers
instance_type = DEFAULT_GPU_INSTANCE_4GPU
fixed_parameters = dict(
**{MAX_RESOURCE_ATTR: 40},
d_model=256,
ffn_ratio=1,
nlayers=2,
nhead=2,
bptt=35,
optimizer_name="sgd",
input_data_dir="./",
use_cuda=1,
seed=1111,
precision="float",
log_interval=200,
)
config_space = {**_config_space, **fixed_parameters}
_kwargs = dict(
script=Path(__file__).parent / "training_script.py",
config_space=config_space,
metric=METRIC_NAME,
mode="min",
max_resource_attr=MAX_RESOURCE_ATTR,
resource_attr=RESOURCE_ATTR,
max_wallclock_time=5 * 3600,
n_workers=4,
instance_type=instance_type,
framework="PyTorch",
)
_kwargs.update(kwargs)
return RealBenchmarkDefinition(**_kwargs)
All you need to do is to provide a function (transformer_wikitext2_benchmark here) which returns an instance of RealBenchmarkDefinition.
The most important fields are:
- script: Filename of the training script.
- config_space: The configuration space to be used by default. This consists of two parts. First, the hyperparameters from _config_space, already discussed above. Second, fixed_parameters are passed to each trial as they are. In particular, we would like to train for 40 epochs, so we pass {MAX_RESOURCE_ATTR: 40}.
- metric, max_resource_attr, resource_attr: Names of inputs to and metrics reported from the training script. If mode == "max", the target metric metric is maximized; if mode == "min", it is minimized.
- max_wallclock_time: Wallclock time the experiment is going to run (5 hours in our example).
- n_workers: Maximum number of trials which run in parallel (4 in our example). The achievable degree of parallelism may be lower, depending on which execution backend is used and which hardware instance we run on.
Also, note the role of **kwargs in the function signature, which allows any of the default values (e.g., max_wallclock_time, n_workers, or instance_type) to be overwritten with command line arguments.
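As an aside, because of **kwargs the same defaults can also be overridden when calling the function in code. A minimal, hypothetical sketch (the particular values are illustrative only):
# Hypothetical usage of the benchmark definition above: overwrite two defaults.
benchmark = transformer_wikitext2_benchmark(
    sagemaker_backend=False,
    max_wallclock_time=3 * 3600,  # 3 hours instead of 5
    n_workers=2,                  # 2 workers instead of 4
)
print(benchmark.instance_type, benchmark.n_workers)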
Note
In the Syne Tune experimentation framework, a tuning problem (i.e., a training and evaluation script together with defaults) is called a benchmark. This terminology is used even if the goal of experimentation is not benchmarking (i.e., comparing different HPO methods), as is the case in this tutorial.
Multi-Fidelity Hyperparameter Tuning
In our example above, a transformer language model is trained for 40 epochs before being validated. If a configuration performs poorly, we would like to find this out earlier, since a lot of time can be saved by stopping poorly performing trials early. This is what multi-fidelity HPO methods do. There are different variants:
- Early stopping (“stopping” type): Trials are not just validated after 40 epochs, but at the end of every epoch. If a trial is performing worse than many others trained for the same number of epochs, it is stopped early.
- Pause and resume (“promotion” type): Trials are generally paused at the end of certain epochs, called rungs. A paused trial gets promoted (i.e., its training is resumed) if it does better than a majority of trials that reached the same rung.
Syne Tune provides a large number of multi-fidelity HPO methods; more details are given in this tutorial. In this section, you learn what needs to be done to support multi-fidelity hyperparameter tuning.
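To make the two variants concrete, here is a minimal sketch (not part of the tutorial code) of instantiating ASHA directly. It reuses _config_space, METRIC_NAME, MAX_RESOURCE_ATTR, and RESOURCE_ATTR from the training script above, and the keyword arguments shown should be treated as a sketch rather than a definitive recipe:
from syne_tune.optimizer.baselines import ASHA

# Sketch only: an asynchronous multi-fidelity scheduler for the benchmark above.
scheduler = ASHA(
    config_space={**_config_space, MAX_RESOURCE_ATTR: 40},
    metric=METRIC_NAME,
    mode="min",
    resource_attr=RESOURCE_ATTR,
    max_resource_attr=MAX_RESOURCE_ATTR,
    type="promotion",  # pause-and-resume; "stopping" gives the early-stopping variant
)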
Annotating a Training Script for Multi-fidelity Tuning
Clearly, the training script training_script_report_end.py won’t do for multi-fidelity tuning. These methods need to know validation errors of models after each epoch of training, while the script above only validates the model at the end, after 40 epochs of training. A small modification of our training script, training_script_no_checkpoints.py, enables multi-fidelity tuning. The relevant part is this:
def objective(config):
torch.manual_seed(config["seed"])
use_cuda = config["use_cuda"]
if torch.cuda.is_available() and not use_cuda:
print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
device = torch.device("cuda" if use_cuda else "cpu")
# Download data, setup data loaders
corpus = download_dataset(config)
ntokens = len(corpus.dictionary)
train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
valid_data = batchify(corpus.valid, bsz=10, device=device)
# Used for reporting metrics to Syne Tune
report = Reporter()
# Create model and optimizer
model, optimizer, criterion = create_training_objects(config, ntokens, device)
for epoch in range(1, config[MAX_RESOURCE_ATTR] + 1):
train(model, train_data, optimizer, criterion, config, ntokens, epoch)
val_loss = evaluate(model, valid_data, criterion, config, ntokens)
print("-" * 89)
print(
f"| end of epoch {epoch:3d} | valid loss {val_loss:5.2f} | "
f"valid ppl {np.exp(val_loss):8.2f}"
)
print("-" * 89)
# Report validation loss back to Syne Tune
report(**{RESOURCE_ATTR: epoch, METRIC_NAME: val_loss})
Instead of calling report only once at the end, we evaluate the model and report back at the end of each epoch. We also need to report the number of epochs done, using RESOURCE_ATTR as key. The execution backend receives these reports and relays them to the HPO method, which in turn decides whether the trial may continue or should be stopped.
Checkpointing
Instead of stopping underperforming trials, some multi-fidelity methods pause trials instead. Any paused trial can be resumed in the future if there is evidence that it outperforms the majority of other trials. If training is very expensive, pause-and-resume scheduling can work better than early stopping, because any pause decision can be revisited in the future, while a stopping decision is final. Moreover, pause-and-resume scheduling does not require trials to be stopped, which can carry delays in some execution backends.
However, pause-and-resume scheduling needs checkpointing in order to work well. Once a trial is paused, its mutable state is stored on disk. When a trial gets resumed, this state is loaded from disk, and training can resume exactly from where it stopped.
Checkpointing needs to be implemented as part of the training script. Fortunately, Syne Tune provides some tooling to simplify this. Another modification of our training script, training_script.py, enables checkpointing. The relevant part is this:
def objective(config):
torch.manual_seed(config["seed"])
use_cuda = config["use_cuda"]
if torch.cuda.is_available() and not use_cuda:
print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
device = torch.device("cuda" if use_cuda else "cpu")
# Download data, setup data loaders
corpus = download_dataset(config)
ntokens = len(corpus.dictionary)
train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
valid_data = batchify(corpus.valid, bsz=10, device=device)
# Used for reporting metrics to Syne Tune
report = Reporter()
# Create model and optimizer
model, optimizer, criterion = create_training_objects(config, ntokens, device)
# [3]
# Checkpointing
state_dict_objects = {
"model": model,
"optimizer": optimizer,
}
if config["precision"] == "half":
state_dict_objects["amp"] = amp
load_model_fn, save_model_fn = pytorch_load_save_functions(
state_dict_objects=state_dict_objects,
)
# [2]
# Resume from checkpoint
resume_from = resume_from_checkpointed_model(config, load_model_fn)
for epoch in range(resume_from + 1, config[MAX_RESOURCE_ATTR] + 1):
train(model, train_data, optimizer, criterion, config, ntokens, epoch)
val_loss = evaluate(model, valid_data, criterion, config, ntokens)
print("-" * 89)
print(
f"| end of epoch {epoch:3d} | valid loss {val_loss:5.2f} | "
f"valid ppl {np.exp(val_loss):8.2f}"
)
print("-" * 89)
# [1]
# Write checkpoint
checkpoint_model_at_rung_level(config, save_model_fn, epoch)
# Report validation loss back to Syne Tune
report(**{RESOURCE_ATTR: epoch, METRIC_NAME: val_loss})
Full details about supporting checkpointing are given in this tutorial. In a nutshell:
[1] Checkpoints have to be written at the end of each epoch, to a path passed as command line argument. A checkpoint needs to include the epoch number when it was written.
[2] Before the training loop starts, a checkpoint should be loaded from the same place. If one is found, the training loop skips all epochs already done. If not, it starts from scratch as usual.
[3] Syne Tune provides some checkpointing tooling for PyTorch models.
At this point, we have a final version, training_script.py, of our training script, which can be used with all HPO methods in Syne Tune. While earlier versions are simpler to implement, we recommend including reporting and checkpointing after every epoch in any training script you care about. When checkpoints become very large, you may run into problems with disk space, which can be dealt with as described here.
Note
The pause-and-resume HPO methods in Syne Tune also work if checkpointing is not implemented. However, this means that a trial which is resumed in fact starts training from scratch. The additional overhead makes running these methods less attractive. We strongly recommend implementing checkpointing.
Comparing Different HPO Methods
We have learned about different methods for hyperparameter tuning:
- RandomSearch: Sample configurations at random
- BayesianOptimization: Learn how to best sample by probabilistic modeling of past observations
- ASHA: Compare running trials with each other after certain numbers of epochs and stop those which underperform
- MOBSTER: Combine early stopping from ASHA with informed sampling from BayesianOptimization
How do these methods compare when applied to our transformer_wikitext2
tuning
problem? In this section, we look at comparative plots which can easily be
generated with Syne Tune.
Note
Besides MOBSTER, Syne Tune provides a number of additional state-of-the-art model-based variants of ASHA, such as HyperTune or DyHPO. Moreover, these methods can be configured in many ways, see this tutorial.
A Comparative Study
It is easy to compare different setups with each other in Syne Tune, be it a number of HPO methods, or the same method on different variations, such as different numbers of workers or different configuration spaces. First, we specify which methods to compare with each other:
from syne_tune.experiments.default_baselines import (
RandomSearch,
BayesianOptimization,
ASHA,
MOBSTER,
)
class Methods:
RS = "RS"
BO = "BO"
ASHA = "ASHA"
MOBSTER = "MOBSTER"
methods = {
Methods.RS: lambda method_arguments: RandomSearch(method_arguments),
Methods.BO: lambda method_arguments: BayesianOptimization(method_arguments),
Methods.ASHA: lambda method_arguments: ASHA(method_arguments, type="promotion"),
Methods.MOBSTER: lambda method_arguments: MOBSTER(
method_arguments, type="promotion"
),
}
We compare random search (RS), Bayesian Optimization (BO), ASHA (ASHA), and MOBSTER (MOBSTER), deviating from the defaults for each method only in that we use the promotion (or pause-and-resume) variant of the latter two. Next, we specify which benchmarks we would like to consider in our study:
from typing import Dict
from syne_tune.experiments.benchmark_definitions import RealBenchmarkDefinition
from transformer_wikitext2.code.transformer_wikitext2_definition import (
transformer_wikitext2_benchmark,
)
def benchmark_definitions(
sagemaker_backend: bool = False, **kwargs
) -> Dict[str, RealBenchmarkDefinition]:
return {
"transformer_wikitext2": transformer_wikitext2_benchmark(
sagemaker_backend=sagemaker_backend, **kwargs
),
}
The only benchmark we consider in this study is our transformer_wikitext2 tuning problem, with its default configuration space (in general, many benchmarks can be selected from benchmarking.benchmark_definitions.real_benchmark_definitions.real_benchmark_definitions()).
Our study has the following properties:
- We use LocalBackend as execution backend, which runs n_workers=4 trials as parallel processes (a rough sketch of this setup is shown after this list). The AWS instance type is instance_type="ml.g4dn.12xlarge", which provides 4 GPUs, one for each worker.
- We repeat each experiment 10 times with different random seeds, so that all in all, we run 40 experiments (4 methods, 10 seeds).
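The sketch below shows, in rough terms, what this local setup corresponds to when written directly against the core API instead of the experimentation framework; treat it as an approximation, not as what the framework does verbatim:
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import BayesianOptimization

# Approximate sketch of one of the 40 experiments: BO with 4 workers on the
# local backend, stopping after the benchmark's wall-clock budget.
benchmark = transformer_wikitext2_benchmark(sagemaker_backend=False)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point=str(benchmark.script)),
    scheduler=BayesianOptimization(
        config_space=benchmark.config_space,
        metric=benchmark.metric,
        mode=benchmark.mode,
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=benchmark.max_wallclock_time),
    n_workers=benchmark.n_workers,
)
tuner.run()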
These details are specified in scripts
hpo_main.py and
launch_remote.py, which we will
discuss in more detail in Module 2, along with the
choice of the execution backend. Once all experiments have finished (if all of
them are run in parallel, this takes a little more than max_wallclock_time
,
or 5 hours), we can visualize results.
Comparison of methods on the transformer_wikitext2 benchmark, using the local backend with 4 workers.
We can clearly see the benefits coming both from Bayesian optimization
(intelligent rather than random sampling) and multi-fidelity scheduling. A
combination of the two, MOBSTER
, provides both a rapid initial decrease
and the best performance after 5 hours.
Launching Experiments Remotely
As a machine learning practitioner, you operate in a highly competitive landscape. Your success depends to a large extent on whether you can decrease the time to the next decision. In this section, we discuss one important approach, namely how to increase the number of experiments run in parallel.
Note
Imports in our scripts are absolute against the root package
transformer_wikitext2
, so that only the code in
benchmarking.nursery.odsc_tutorial
has to be present. In order to run
them, you need to append <abspath>/odsc_tutorial/
to the PYTHONPATH
environment variable. This is required even if you have installed Syne Tune
from source.
Launching our Study
Here is how we specified and ran experiments of our study. First, we specify a script for launching experiments locally:
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_local import main
if __name__ == "__main__":
main(methods, benchmark_definitions)
This is very simple, as most work is done by the generic
syne_tune.experiments.launchers.hpo_main_local.main()
. Note that hpo_main_local
needs to be chosen, since we use the local backend.
This local launcher script can be used to configure your experiment via additional command line arguments, as is explained in detail here.
You can use hpo_main.py
to launch experiments locally, but they’ll run
sequentially, one after the other, and you need to have all dependencies
installed locally. A second script is needed in order to launch many
experiments in parallel:
from pathlib import Path
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_local import launch_remote
if __name__ == "__main__":
entry_point = Path(__file__).parent / "hpo_main.py"
source_dependencies = [str(Path(__file__).parent.parent)]
launch_remote(
entry_point=entry_point,
methods=methods,
benchmark_definitions=benchmark_definitions,
source_dependencies=source_dependencies,
)
Once more, all the hard work is done in syne_tune.experiments.launchers.launch_remote_local.launch_remote(), where launch_remote_local needs to be chosen for the local backend. Most importantly, our previous hpo_main.py is specified as entry_point here. Here is the command to run all experiments of our study in parallel (replace ... by the absolute path to odsc_tutorial):
export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/launch_remote.py \
--experiment_tag odsc-1 --benchmark transformer_wikitext2 --num_seeds 10
- This command launches 40 SageMaker training jobs, running 10 random repetitions (seeds) for each of the 4 methods specified in baselines.py.
- Each SageMaker training job uses one ml.g4dn.12xlarge AWS instance. You can only run all 40 jobs in parallel if your resource limit for this instance type is 40 or larger. Each training job will run a little longer than 5 hours, as specified by max_wallclock_time.
- You can use the --instance_type and --max_wallclock_time command line arguments to change these defaults. However, if you choose an instance type with fewer than 4 GPUs, the local backend will not be able to run 4 trials in parallel.
- If benchmark_definitions.py defines a single benchmark only, the --benchmark argument can also be dropped.
When using remote launching, results of your experiments are written to S3, to the default bucket for your AWS account. Once all jobs have finished (which takes a little more than 5 hours if you have sufficient limits, and otherwise longer), you can create the comparative plot shown above, using this script:
from typing import Dict, Any, Optional
import logging
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters
SETUPS = list(methods.keys())
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
return metadata["algorithm"]
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_names = ("odsc-1",)
num_runs = 10
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
aggregate_mode="iqm_bootstrap",
grid=True,
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = ComparativeResults(
experiment_names=experiment_names,
setups=SETUPS,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
download_from_s3=download_from_s3,
)
# Create comparative plot (single panel)
benchmark_name = "transformer_wikitext2"
benchmark = benchmark_definitions(sagemaker_backend=False)[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(5, 8),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./odsc-comparison-local-{benchmark_name}.png",
)
For details about visualization of results in Syne Tune, please consider this tutorial. In a nutshell, this is what happens:
Collect and filter results from all experiments of a study
Group them according to setup (HPO method here), aggregate over seeds
Create plot in which each setup is represented by a curve and confidence bars
Distributed Tuning
The second approach to shorten the time to the next decision is to decrease the time per experiment. This can be done, to some extent, by increasing the number of workers, i.e. the number of trials which are run in parallel. In this section, we show how this can be done.
Note
Imports in our scripts are absolute against the root package
transformer_wikitext2
, so that only the code in
benchmarking.nursery.odsc_tutorial
has to be present. In order to run
them, you need to append <abspath>/odsc_tutorial/
to the PYTHONPATH
environment variable. This is required even if you have installed Syne Tune
from source.
Comparing Different Numbers of Workers
Our study above was done with 4 workers. With the local backend, an experiment with all its workers runs on a single instance. We need to select an instance type with at least 4 GPUs, and each training script can use one of them only.
Syne Tune provides another backend, SageMakerBackend, which executes each trial as a separate SageMaker training job. This allows you to decouple the number of workers from the instance type. In fact, for this backend, the default instance type for our benchmark is ml.g4dn.xlarge, which has a single GPU and is cheaper to run than the ml.g4dn.12xlarge we used with the local backend above.
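For orientation, this is roughly how such a backend is constructed; the estimator settings (framework version, Python version, role handling) are assumptions and would need to be adapted to your account — the experimentation framework takes care of this for you:
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch
from syne_tune.backend import SageMakerBackend

# Rough sketch: each trial becomes its own SageMaker training job on a
# single-GPU instance, so the number of workers no longer depends on the
# instance size. Versions and the entry point path are placeholders.
trial_backend = SageMakerBackend(
    sm_estimator=PyTorch(
        entry_point="training_script.py",
        instance_type="ml.g4dn.xlarge",
        instance_count=1,
        role=get_execution_role(),
        max_run=6 * 3600,
        framework_version="1.13",
        py_version="py39",
    ),
    metrics_names=[METRIC_NAME],
)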
To showcase the SageMaker backend, we run a second study comparing our 4 methods RS, BO, ASHA, and MOBSTER using a variable number of workers (2, 4, 8). Here, max_wallclock_time is 5 hours for 4 and 8 workers, but double that (10 hours) for 2 workers. Using the SageMaker backend instead of the local one only requires a minimal change in the launcher scripts:
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_sagemaker import main
if __name__ == "__main__":
main(methods, benchmark_definitions)
from pathlib import Path
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_sagemaker import launch_remote
if __name__ == "__main__":
entry_point = Path(__file__).parent / "hpo_main.py"
source_dependencies = [str(Path(__file__).parent.parent)]
launch_remote(
entry_point=entry_point,
methods=methods,
benchmark_definitions=benchmark_definitions,
source_dependencies=source_dependencies,
)
We import from hpo_main_sagemaker
and launch_remote_sagemaker
instead
of hpo_main_local
and launch_remote_local
. Here is how the experiments
are launched (replace ...
by the absolute path to odsc_tutorial
):
export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python benchmarking/nursery/odsc_tutorial/transformer_wikitext2/sagemaker/launch_remote.py \
--experiment_tag tmlr-10 --benchmark transformer_wikitext2 \
--random_seed 2938702734 --scale_max_wallclock_time 1 \
--num_seeds 5 --n_workers <n-workers>
- Here, <n_workers> is 2, 4, 8 respectively.
- We run 5 random repetitions (seeds), therefore 20 experiments per value of <n_workers>.
- Running the experiments for <n_workers> requires a resource limit greater than or equal to <n_workers> * 20 for instance type ml.g4dn.xlarge. If your limit is less than this, you should launch fewer experiments in parallel, since otherwise most of the experiments will not be able to use <n_workers> workers.
- With --scale_max_wallclock_time 1, we adjust max_wallclock_time if n_workers is smaller than the default value (4) for our benchmark. In our example, the case --n_workers 2 runs for 10 hours instead of 5.
Once all experiments are finished, with results written to S3, we can create a plot comparing the performance across different numbers of workers, using the following script:
from typing import Dict, Any, Optional
import logging
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters
TMLR10_SETUPS = [
"2 workers",
"4 workers",
"8 workers",
]
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
return f"{metadata['n_workers']} workers"
TMLR10_METHOD_TO_SUBPLOT = {
"RS": 0,
"BO": 1,
"ASHA": 2,
"MOBSTER": 3,
}
def metadata_to_subplot(metadata: dict) -> Optional[int]:
return TMLR10_METHOD_TO_SUBPLOT[metadata["algorithm"]]
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_names = ("tmlr-10",)
num_runs = 5
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
aggregate_mode="iqm_bootstrap",
grid=True,
)
# We would like to have 4 subfigures, one for each method
plot_params.subplots = SubplotParameters(
nrows=2,
ncols=2,
kwargs=dict(sharex="all", sharey="all"),
titles=["RS", "BO", "ASHA", "MOBSTER"],
title_each_figure=True,
legend_no=[0],
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = ComparativeResults(
experiment_names=experiment_names,
setups=TMLR10_SETUPS,
num_runs=num_runs,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
metadata_to_subplot=metadata_to_subplot,
download_from_s3=download_from_s3,
)
# Create comparative plot (single panel)
benchmark_name = "transformer_wikitext2"
benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
ylim=(5, 8),
)
results.plot(
benchmark_name=benchmark_name,
plot_params=plot_params,
file_name=f"./odsc-comparison-sagemaker-{benchmark_name}.png",
)
For details about visualization of results in Syne Tune, please consider this tutorial. In a nutshell:
- Different from the plot above, we have four subplots here, one for each method. In each subplot, we compare results for different numbers of workers.
- metadata_to_subplot configures grouping w.r.t. subplot (depends on the method), while metadata_to_setup configures grouping w.r.t. each curve shown in each subplot (depends on n_workers).
Here is the plot:
Comparison of methods on the transformer_wikitext2 benchmark, using the SageMaker backend with 2, 4, 8 workers.
- In general, we obtain good results faster with more workers. However, especially for BO and MOBSTER, the improvements are less pronounced than one might expect.
- Our results counter a common misconception that, as we go to higher degrees of parallelization of trials, the internals of the HPO method do not matter anymore and one might as well use random search. This is certainly not the case for our problem, where BO with 2 workers attains better performance after 5 hours than RS with 8 workers, at a quarter of the cost.
Drilling Down on Performance Differences
Often, we would like to gain an understanding of why one method performs better than another on a given problem. In this section, we show another type of visualization which can shed some light on this question.
Plotting Learning Curves per Trial
A useful step towards understanding performance differences between setups is to look at the learning curves of trials. Here is a script for creating such plots for the methods compared in our study:
from typing import Dict, Any, Optional
import logging
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import (
TrialsOfExperimentResults,
PlotParameters,
MultiFidelityParameters,
SubplotParameters,
)
SETUPS = list(methods.keys())
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
return metadata["algorithm"]
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_names = ("odsc-1",)
seed_to_plot = 0
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
grid=True,
)
# We need to provide details about rung levels of the multi-fidelity methods.
# Also, all methods compared are pause-and-resume
multi_fidelity_params = MultiFidelityParameters(
rung_levels=[1, 3, 9, 27, 40],
multifidelity_setups={"ASHA": True, "MOBSTER": True},
)
# We would like to have 4 subfigures, one for each method
plot_params.subplots = SubplotParameters(
nrows=2,
ncols=2,
kwargs=dict(sharex="all", sharey="all"),
titles=SETUPS,
title_each_figure=True,
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = TrialsOfExperimentResults(
experiment_names=experiment_names,
setups=SETUPS,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
multi_fidelity_params=multi_fidelity_params,
download_from_s3=download_from_s3,
)
# Create plot for certain benchmark and seed
benchmark_name = "transformer_wikitext2"
benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
)
results.plot(
benchmark_name=benchmark_name,
seed=seed_to_plot,
plot_params=plot_params,
file_name=f"./odsc-learncurves-local-seed{seed_to_plot}.png",
)
Full details about visualization of results in Syne Tune are given in this tutorial. In a nutshell, this is what happens:
- The workflow is similar to comparative plots, but here, each setup occupies a different subfigure, and there is no aggregation over seeds (the seed has to be specified in results.plot).
- Two of the methods compared are multi-fidelity (ASHA, MOBSTER), which is why additional information has to be passed as multi_fidelity_params. This is because learning curves are plotted differently for single-fidelity methods, multi-fidelity methods of early-stopping type, and multi-fidelity methods of pause-and-resume type.
- With plot_params.subplots, we ask for a two-by-two matrix of subfigures. By default, subfigures are oriented as a single row.
Learning curves of trials for different methods on the transformer_wikitext2 benchmark, using the local backend with 4 workers.
Learning curves of different trials are plotted in different colors.
For ASHA and MOBSTER, learning curves are interrupted by pauses at rung levels, and in some cases resume later. Single markers are trials run for a single epoch only.
Comparing RS with BO, we see that BO learns to avoid early mistakes rapidly, while RS samples poorly performing configurations at a constant rate.
Comparing RS with ASHA, we see that ASHA stops poor trials early, so it can explore more configurations, but it still suffers from repeating mistakes over and over.
Comparing BO with MOBSTER, both clearly learn from the past. However, MOBSTER pauses suboptimal configurations earlier, which allows it to find very good configurations earlier than BO (in about half the time).
With a small modification of the script, we can plot pairs of subfigures for side-by-side comparisons:
from typing import Dict, Any, Optional
import logging
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import (
TrialsOfExperimentResults,
PlotParameters,
MultiFidelityParameters,
SubplotParameters,
)
SETUPS = list(methods.keys())
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
return metadata["algorithm"]
if __name__ == "__main__":
logging.getLogger().setLevel(logging.INFO)
experiment_names = ("odsc-1",)
seed_to_plot = 0
download_from_s3 = False # Set ``True`` in order to download files from S3
# Plot parameters across all benchmarks
plot_params = PlotParameters(
xlabel="wall-clock time",
grid=True,
ylim=(5, 13),
)
# We need to provide details about rung levels of the multi-fidelity methods.
# Also, all methods compared are pause-and-resume
multi_fidelity_params = MultiFidelityParameters(
rung_levels=[1, 3, 9, 27, 40],
multifidelity_setups={"ASHA": True, "MOBSTER": True},
)
# The creation of ``results`` downloads files from S3 (only if
# ``download_from_s3 == True``), reads the metadata and creates an inverse
# index. If any result files are missing, or there are too many of them,
# warning messages are printed
results = TrialsOfExperimentResults(
experiment_names=experiment_names,
setups=SETUPS,
metadata_to_setup=metadata_to_setup,
plot_params=plot_params,
multi_fidelity_params=multi_fidelity_params,
download_from_s3=download_from_s3,
)
# Create plots for certain benchmark and seed
benchmark_name = "transformer_wikitext2"
benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
# These parameters overwrite those given at construction
plot_params = PlotParameters(
metric=benchmark.metric,
mode=benchmark.mode,
)
for indices, name in [
([0, 1], "rs-vs-bo"),
([0, 2], "rs-vs-asha"),
([1, 3], "bo-vs-mobster"),
]:
plot_params.subplots = SubplotParameters(
nrows=1,
ncols=2,
kwargs=dict(sharey="all"),
subplot_indices=indices,
titles=[SETUPS[ind] for ind in indices],
)
results.plot(
benchmark_name=benchmark_name,
seed=seed_to_plot,
plot_params=plot_params,
file_name=f"./odsc-learncurves-{name}-seed{seed_to_plot}.png",
)
Videos featuring Syne Tune
Martin Wistuba: Hyperparameter Optimization for the Impatient (PyData 2023)
David Salinas: Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research
API Reference
benchmarking package
Subpackages
benchmarking.benchmark_definitions package
Submodules
benchmarking.benchmark_definitions.distilbert_on_imdb module
benchmarking.benchmark_definitions.finetune_transformer_glue module
- benchmarking.benchmark_definitions.finetune_transformer_glue.finetune_transformer_glue_benchmark(sagemaker_backend=False, choose_model=False, dataset='rte', model_type='bert-base-cased', num_train_epochs=3, train_valid_fraction=0.7, random_seed=31415927, **kwargs)[source]
This benchmark consists of fine-tuning a Hugging Face transformer model, selected from the zoo, on one of the GLUE benchmarks:
Wang et al. GLUE: A Multi-task Benchmark and Analysis Platform for Natural Language Understanding. ICLR 2019
- Parameters:
  - sagemaker_backend (bool) – Use SageMaker backend? This affects the choice of instance type. Defaults to False
  - choose_model (bool) – Should tuning involve selecting the best pre-trained model from PRETRAINED_MODELS? If so, the configuration space is extended by another choice variable. Defaults to False
  - dataset (str) – Name of GLUE task, from TASK2METRICSMODE. Defaults to “rte”
  - model_type (str) – Pre-trained model to be used. If choose_model is set, this is the model used in the first evaluation. Defaults to “bert-base-cased”
  - num_train_epochs (int) – Maximum number of epochs for fine-tuning. Defaults to 3
  - train_valid_fraction (float) – The original training set is split into a training and a validation part; this is the fraction of the training part
  - random_seed (int) – Random seed for training script
  - kwargs – Overwrites default params in the RealBenchmarkDefinition object returned
- Return type:
  RealBenchmarkDefinition
- benchmarking.benchmark_definitions.finetune_transformer_glue.finetune_transformer_glue_all_benchmarks(sagemaker_backend=False, model_type='bert-base-cased', num_train_epochs=3, train_valid_fraction=0.7, random_seed=31415927, **kwargs)[source]
- Return type:
Dict
[str
,RealBenchmarkDefinition
]
benchmarking.benchmark_definitions.finetune_transformer_swag module
- benchmarking.benchmark_definitions.finetune_transformer_swag.finetune_transformer_swag_benchmark(sagemaker_backend=False, num_train_epochs=3, per_device_train_batch_size=8, **kwargs)[source]
- Parameters:
sagemaker_backend (
bool
) – Use SageMaker backend? This affects the choice of instance type. Defaults toFalse
num_train_epochs (
int
) – Maximum number of epochs for fine-tuning. Defaults to 3per_device_train_batch_size (
int
) – Batch size per device. Defaults to 8kwargs – Overwrites default params in
RealBenchmarkDefinition
object returned
- Return type:
benchmarking.benchmark_definitions.lstm_wikitext2 module
benchmarking.benchmark_definitions.mlp_on_fashionmnist module
benchmarking.benchmark_definitions.real_benchmark_definitions module
- benchmarking.benchmark_definitions.real_benchmark_definitions.real_benchmark_definitions(sagemaker_backend=False, **kwargs)[source]
- Return type:
Dict
[str
,RealBenchmarkDefinition
]
benchmarking.benchmark_definitions.resnet_cifar10 module
benchmarking.benchmark_definitions.transformer_wikitext2 module
benchmarking.examples package
Subpackages
benchmarking.examples.benchmark_dehb package
Submodules
benchmarking.examples.benchmark_dehb.baselines module
benchmarking.examples.benchmark_dehb.benchmark_definitions module
benchmarking.examples.benchmark_dehb.hpo_main module
benchmarking.examples.benchmark_dehb.launch_remote module
benchmarking.examples.benchmark_dyhpo package
Submodules
benchmarking.examples.benchmark_hypertune package
Submodules
benchmarking.examples.benchmark_hypertune.baselines module
benchmarking.examples.benchmark_hypertune.benchmark_definitions module
benchmarking.examples.benchmark_hypertune.hpo_main module
benchmarking.examples.benchmark_hypertune.launch_remote module
benchmarking.examples.benchmark_hypertune.plot_results module
benchmarking.examples.benchmark_warping package
Submodules
benchmarking.examples.benchmark_warping.baselines module
- class benchmarking.examples.benchmark_warping.baselines.Methods[source]
Bases:
object
- RS = 'RS'
- ASHA = 'ASHA'
- BO = 'BO'
- BO_WARP = 'BO-WARP'
- BO_BOXCOX = 'BO-BOXCOX'
- BO_WARP_BOXCOX = 'BO-WARP-BOXCOX'
- MOBSTER = 'MOBSTER'
- MOBSTER_WARP = 'MOBSTER-WARP'
- MOBSTER_BOXCOX = 'MOBSTER-BOXCOX'
- MOBSTER_WARP_BOXCOX = 'MOBSTER-WARP-BOXCOX'
benchmarking.examples.benchmark_warping.benchmark_definitions module
benchmarking.examples.benchmark_warping.hpo_main module
benchmarking.examples.benchmark_warping.launch_remote module
benchmarking.examples.demo_experiment package
Submodules
benchmarking.examples.fine_tuning_transformer_glue package
Submodules
benchmarking.examples.fine_tuning_transformer_swag package
Submodules
benchmarking.examples.launch_local package
Submodules
benchmarking.examples.launch_sagemaker package
Submodules
benchmarking.training_scripts package
benchmarking.utils package
- benchmarking.utils.get_cost_model_for_batch_size(params, batch_size_key, batch_size_range)[source]
Returns cost model depending on the batch size only.
- Parameters:
params (
Dict
[str
,Any
]) – Command line argumentsbatch_size_key (
str
) – Name of batch size entry in configbatch_size_range (
Tuple
[int
,int
]) – (lower, upper) for batch size, both sides are inclusive
- Returns:
Cost model (or None if dependencies cannot be imported)
- class benchmarking.utils.StoreSearcherStatesCallback[source]
Bases: TunerCallback
Stores a list of searcher states alongside a tuning run. The list is extended by a new state whenever the TuningJobState has changed compared to the most recently added one. This callback is useful to create meaningful unit tests, by sampling a given searcher alongside a realistic experiment.
Works only for ModelBasedSearcher searchers. For other searchers, nothing is stored.
- on_trial_result(trial, status, result, decision)[source]
  Called when a new result (reported by a trial) is observed.
  The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).
  - Parameters:
    - trial (Trial) – Trial whose report has been received
    - status (str) – Status of trial before scheduler.on_trial_result has been called
    - result (Dict) – Result dict received
    - decision (str) – Decision returned by scheduler.on_trial_result
- property states
Submodules
benchmarking.utils.get_cost_model module
- benchmarking.utils.get_cost_model.get_cost_model_for_batch_size(params, batch_size_key, batch_size_range)[source]
Returns cost model depending on the batch size only.
- Parameters:
params (
Dict
[str
,Any
]) – Command line argumentsbatch_size_key (
str
) – Name of batch size entry in configbatch_size_range (
Tuple
[int
,int
]) – (lower, upper) for batch size, both sides are inclusive
- Returns:
Cost model (or None if dependencies cannot be imported)
benchmarking.utils.launch_sample_searcher_states module
This script launches an experiment for the purpose of sampling searcher states, which can then be used in unit tests.
benchmarking.utils.searcher_state_callback module
- class benchmarking.utils.searcher_state_callback.StoreSearcherStatesCallback[source]
Bases: TunerCallback
Stores a list of searcher states alongside a tuning run. The list is extended by a new state whenever the TuningJobState has changed compared to the most recently added one. This callback is useful to create meaningful unit tests, by sampling a given searcher alongside a realistic experiment.
Works only for ModelBasedSearcher searchers. For other searchers, nothing is stored.
- on_trial_result(trial, status, result, decision)[source]
  Called when a new result (reported by a trial) is observed.
  The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).
  - Parameters:
    - trial (Trial) – Trial whose report has been received
    - status (str) – Status of trial before scheduler.on_trial_result has been called
    - result (Dict) – Result dict received
    - decision (str) – Decision returned by scheduler.on_trial_result
- property states
setup module
syne_tune package
- class syne_tune.StoppingCriterion(max_wallclock_time=None, max_num_evaluations=None, max_num_trials_started=None, max_num_trials_completed=None, max_cost=None, max_num_trials_finished=None, min_metric_value=None, max_metric_value=None)[source]
Bases: object
Stopping criterion that can be used in a Tuner, for instance Tuner(stop_criterion=StoppingCriterion(max_wallclock_time=3600), ...). If several arguments are used, the combined criterion is true whenever one of the atomic criteria is true.
In principle, stop_criterion for Tuner can be any lambda function, but this class should be used with remote launching in order to ensure proper serialization.
- Parameters:
  - max_wallclock_time (Optional[float]) – Stop once this wallclock time is reached
  - max_num_evaluations (Optional[int]) – Stop once more than this number of metric records have been reported
  - max_num_trials_started (Optional[int]) – Stop once more than this number of trials have been started
  - max_num_trials_completed (Optional[int]) – Stop once more than this number of trials have been completed. This does not include trials which were stopped or failed
  - max_cost (Optional[float]) – Stop once total cost of evaluations larger than this value
  - max_num_trials_finished (Optional[int]) – Stop once more than this number of trials have finished (i.e., completed, stopped, failed, or stopping)
  - min_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value below a threshold
  - max_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value above a threshold
- max_wallclock_time: float = None
- max_num_evaluations: int = None
- max_num_trials_started: int = None
- max_num_trials_completed: int = None
- max_cost: float = None
- max_num_trials_finished: int = None
- min_metric_value: Optional[Dict[str, float]] = None
- max_metric_value: Optional[Dict[str, float]] = None
- class syne_tune.Tuner(trial_backend, scheduler, stop_criterion, n_workers, sleep_time=5.0, results_update_interval=10.0, print_update_interval=30.0, max_failures=1, tuner_name=None, asynchronous_scheduling=True, wait_trial_completion_when_stopping=False, callbacks=None, metadata=None, suffix_tuner_name=True, save_tuner=True, start_jobs_without_delay=True, trial_backend_path=None)[source]
Bases:
object
Controller of tuning loop, manages interplay between scheduler and trial backend. Also, stopping criterion and number of workers are maintained here.
- Parameters:
trial_backend (
TrialBackend
) – Backend for trial evaluationsscheduler (
TrialScheduler
) – Tuning algorithm for making decisions about which trials to start, stop, pause, or resumestop_criterion (
Callable
[[TuningStatus
],bool
]) – Tuning stops when this predicates returnsTrue
. Called in each iteration with the current tuning status. It is recommended to useStoppingCriterion
.n_workers (
int
) – Number of workers used here. Note that the backend needs to support (at least) this number of workers to be run in parallelsleep_time (
float
) – Time to sleep when all workers are busy. Defaults toDEFAULT_SLEEP_TIME
results_update_interval (
float
) – Frequency at which results are updated and stored (in seconds). Defaults to 10.print_update_interval (
float
) – Frequency at which result table is printed. Defaults to 30.max_failures (
int
) – This many trial execution failures are allowed before the tuning loop is aborted. Defaults to 1tuner_name (
Optional
[str
]) – Name associated with the tuning experiment, default to the name of the entrypoint. Must consists of alpha-digits characters, possibly separated by ‘-’. A postfix with a date time-stamp is added to ensure uniqueness.asynchronous_scheduling (
bool
) – Whether to use asynchronous scheduling when scheduling new trials. IfTrue
, trials are scheduled as soon as a worker is available. IfFalse
, the tuner waits that all trials are finished before scheduling a new batch of sizen_workers
. Default toTrue
.wait_trial_completion_when_stopping (
bool
) – How to deal with running trials when stopping criterion is met. IfTrue
, the tuner waits until all trials are finished. IfFalse
, all trials are terminated. Defaults toFalse
.callbacks (
Optional
[List
[TunerCallback
]]) – Called at certain times in the tuning loop, for example when a result is seen. The default callback stores results everyresults_update_interval
.metadata (
Optional
[dict
]) – Dictionary of user-metadata that will be persisted in{tuner_path}/{ST_METADATA_FILENAME}
, in addition to metadata provided by the user.SMT_TUNER_CREATION_TIMESTAMP
is always included which measures the time-stamp when the tuner started to run.suffix_tuner_name (
bool
) – IfTrue
, a timestamp is appended to the providedtuner_name
that ensures uniqueness, otherwise the name is left unchanged and is expected to be unique. Defaults toTrue
.save_tuner (
bool
) – IfTrue
, theTuner
object is serialized at the end of tuning, including its dependencies (e.g., scheduler). This allows all details of the experiment to be recovered. Defaults toTrue
.start_jobs_without_delay (
bool
) –Defaults to
True
. If this isTrue
, the tuner starts new jobs depending on scheduler decisions communicated to the backend. For example, if a trial has just been stopped (by callingbackend.stop_trial
), the tuner may start a new one immediately, even if the SageMaker training job is still busy due to stopping delays. This can lead to faster experiment runtime, because the backend is temporarily going over its budget.If set to
False
, the tuner always asks the backend for the number of busy workers, which guarantees that we never go over then_workers
budget. This makes a difference for backends where stopping or pausing trials is not immediate (e.g.,SageMakerBackend
). Not going over budget means thatn_workers
can be set up to the available quota, without running the risk of an exception due to the quota being exceeded. If you get such exceptions, we recommend usingstart_jobs_without_delay=False
. Also, if the SageMaker warm pool feature is used, it is recommended to setstart_jobs_without_delay=False
, since otherwise more thann_workers
warm pools will be started, because existing ones are busy with stopping when they should be reassigned.trial_backend_path (
Optional
[str
]) –If this is given, the path of
trial_backend
(where logs and checkpoints of trials are stored) is set to this. Otherwise, it is set toself.tuner_path
, so that per-trial information is written to the same path as tuning results.If the backend is
LocalBackend
and the experiment is run remotely, we recommend setting this, since otherwise checkpoints and logs are synced to S3, along with tuning results, which is costly and error-prone.
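A minimal usage sketch (the training script train_height.py, its config space, and the metric and resource names are assumptions; the ASHA baseline is used as an example scheduler):

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

config_space = {"width": randint(1, 20), "height": randint(1, 20), "epochs": 100}
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_height.py"),
    scheduler=ASHA(
        config_space,
        metric="validation_error",
        mode="min",
        resource_attr="epoch",
        max_resource_attr="epochs",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()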
- best_config(metric=0)[source]
- Parameters:
metric (
Union
[str
,int
,None
]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0, the first metric defined in the scheduler- Return type:
Tuple
[int
,Dict
[str
,Any
]]- Returns:
the best configuration found while tuning for the metric given and the associated trial-id
- class syne_tune.Reporter(add_time=True, add_cost=True)[source]
Bases:
object
Callback for reporting metric values from a training script back to Syne Tune. Example:
from syne_tune import Reporter

report = Reporter()
for epoch in range(1, epochs + 1):
    # ...
    report(epoch=epoch, accuracy=accuracy)
- Parameters:
add_time (
bool
) – If True (default), the time (in secs) since creation of theReporter
object is reported automatically asST_WORKER_TIME
add_cost (
bool
) – If True (default), estimated dollar cost since creation ofReporter
object is reported automatically asST_WORKER_COST
. This is available for SageMaker backend only. Requiresadd_time=True
.
-
add_time:
bool
= True
-
add_cost:
bool
= True
Subpackages
syne_tune.backend package
- class syne_tune.backend.LocalBackend(entry_point, delete_checkpoints=False, pass_args_as_json=False, rotate_gpus=True, num_gpus_per_trial=1, gpus_to_use=None)[source]
Bases:
TrialBackend
A backend running locally by spawning sub-processes concurrently. Note that no resource management is done, so the number of concurrent trials should be adjusted to the machine capacity.
Additional arguments on top of parent class
TrialBackend
:- Parameters:
entry_point (
str
) – Path to Python main file to be tunedrotate_gpus (
bool
) – In case several GPUs are present, each trial is scheduled on a different GPU. A new trial is preferentially scheduled on a free GPU, and otherwise the GPU with least prior assignments is chosen. IfFalse
, then all GPUs are used at the same time for all trials. Defaults toTrue
.num_gpus_per_trial (
int
) – Number of GPUs to be allocated to each trial. Must be not larger than the total number of GPUs available. Defaults to 1gpus_to_use (
Optional
[List
[int
]]) – If this is given, the backend only uses GPUs in this list (non-negative ints). Entries must be inrange(get_num_gpus())
. Defaults to using all GPUs.
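A construction sketch using the parameters above (the entry point name and GPU indices are assumptions):

from syne_tune.backend import LocalBackend

# Each trial is allocated 2 GPUs, chosen among GPUs 0-3 only
trial_backend = LocalBackend(
    entry_point="train_height.py",
    num_gpus_per_trial=2,
    gpus_to_use=[0, 1, 2, 3],
)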
- trial_path(trial_id)[source]
- Parameters:
trial_id (
int
) – ID of trial- Return type:
Path
- Returns:
Directory where files related to trial are written to
- checkpoint_trial_path(trial_id)[source]
- Parameters:
trial_id (
int
) – ID of trial- Return type:
Path
- Returns:
Directory where checkpoints for trial are written to and read from
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (
int
) – Source trial ID (copy from)tgt_trial_id (
int
) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (
int
) – ID of trial for which checkpoint files are deleted
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
- stdout(trial_id)[source]
Fetch
stdout
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch
stderr
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stderr)
- set_path(results_root=None, tuner_name=None)[source]
- Parameters:
results_root (
Optional
[str
]) – The local folder that should contain the results of the tuning experiment. Used byTuner
to indicate a desired path where the results should be written to. This is used to unify the location of backend files andTuner
results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.tuner_name (
Optional
[str
]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.
- class syne_tune.backend.PythonBackend(tune_function, config_space, rotate_gpus=True, delete_checkpoints=False)[source]
Bases:
LocalBackend
A backend that supports the tuning of Python functions (if you would rather tune an entry point script such as “train.py”, then you should use
LocalBackend
). The functiontune_function
should be serializable, should not reference any global variable or module and should have as arguments a subset of the keys ofconfig_space
. When deserializing, an md5 hash is checked to ensure consistency.For instance, the following function is a valid way of defining a backend on top of a simple function:
from syne_tune.backend import PythonBackend
from syne_tune.config_space import uniform

def f(x, epochs):
    import logging
    import time
    from syne_tune import Reporter

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    reporter = Reporter()
    for i in range(epochs):
        reporter(epoch=i + 1, y=x + i)

config_space = {
    "x": uniform(-10, 10),
    "epochs": 5,
}
backend = PythonBackend(tune_function=f, config_space=config_space)
See
examples/launch_height_python_backend.py
for a complete example.Additional arguments on top of parent class
LocalBackend
:- Parameters:
tune_function (
Callable
) – Python function to be tuned. The function must call Syne Tune reporter to report metrics and be serializable, imports should be performed inside the function body.config_space (
Dict
[str
,object
]) – Configuration space corresponding to arguments oftune_function
- property tune_function_path: Path
- set_path(results_root=None, tuner_name=None)[source]
- Parameters:
results_root (
Optional
[str
]) – The local folder that should contain the results of the tuning experiment. Used byTuner
to indicate a desired path where the results should be written to. This is used to unify the location of backend files andTuner
results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.tuner_name (
Optional
[str
]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.
- class syne_tune.backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]
Bases:
TrialBackend
This backend executes each trial evaluation as a separate SageMaker training job, using
sm_estimator
as estimator.Checkpoints are written to and loaded from S3, using
checkpoint_s3_uri
of the estimator.Compared to
LocalBackend
, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.This backend allows selecting the instance type and count for a trial evaluation, by passing values in the configuration, using names
ST_INSTANCE_TYPE
andST_INSTANCE_COUNT
. If these are given in the configuration, they overwrite the default insm_estimator
. This allows for tuning instance type and count along with the hyperparameter configuration.Additional arguments on top of parent class
TrialBackend
:- Parameters:
sm_estimator (
Framework
) – SageMaker estimator for trial evaluations.metrics_names (
Optional
[List
[str
]]) – Names of metrics passed toreport
, used to plot live curve in SageMaker (optional, only used for visualization)s3_path (
Optional
[str
]) – S3 base path used for checkpointing. The full path also involves the tuner name and thetrial_id
. The default base path is the S3 bucket associated with the SageMaker accountsagemaker_fit_kwargs – Extra arguments that are passed to
sagemaker.estimator.Framework
when fitting the job, for instance{'train': 's3://my-data-bucket/path/to/my/training/data'}
- property sm_client
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
- stdout(trial_id)[source]
Fetch
stdout
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch
stderr
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stderr)
- property source_dir: str | None
- set_entrypoint(entry_point)[source]
Update the entrypoint.
- Parameters:
entry_point (
str
) – New path of the entrypoint.
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (
int
) – Source trial ID (copy from)tgt_trial_id (
int
) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (
int
) – ID of trial for which checkpoint files are deleted
Subpackages
syne_tune.backend.python_backend package
Submodules
syne_tune.backend.python_backend.python_backend module
- class syne_tune.backend.python_backend.python_backend.PythonBackend(tune_function, config_space, rotate_gpus=True, delete_checkpoints=False)[source]
Bases:
LocalBackend
A backend that supports the tuning of Python functions (if you would rather tune an entry point script such as “train.py”, then you should use
LocalBackend
). The functiontune_function
should be serializable, should not reference any global variable or module and should have as arguments a subset of the keys ofconfig_space
. When deserializing, an md5 hash is checked to ensure consistency.For instance, the following function is a valid way of defining a backend on top of a simple function:
from syne_tune.backend import PythonBackend
from syne_tune.config_space import uniform

def f(x, epochs):
    import logging
    import time
    from syne_tune import Reporter

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    reporter = Reporter()
    for i in range(epochs):
        reporter(epoch=i + 1, y=x + i)

config_space = {
    "x": uniform(-10, 10),
    "epochs": 5,
}
backend = PythonBackend(tune_function=f, config_space=config_space)
See
examples/launch_height_python_backend.py
for a complete example.Additional arguments on top of parent class
LocalBackend
:- Parameters:
tune_function (
Callable
) – Python function to be tuned. The function must call Syne Tune reporter to report metrics and be serializable, imports should be performed inside the function body.config_space (
Dict
[str
,object
]) – Configuration space corresponding to arguments oftune_function
- property tune_function_path: Path
- set_path(results_root=None, tuner_name=None)[source]
- Parameters:
results_root (
Optional
[str
]) – The local folder that should contain the results of the tuning experiment. Used byTuner
to indicate a desired path where the results should be written to. This is used to unify the location of backend files andTuner
results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.tuner_name (
Optional
[str
]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.
syne_tune.backend.python_backend.python_entrypoint module
An entry point that loads a serialized function from PythonBackend
and executes it with the provided hyperparameters.
The md5 hash of the file is first checked before executing the deserialized function.
syne_tune.backend.sagemaker_backend package
Submodules
syne_tune.backend.sagemaker_backend.custom_framework module
- class syne_tune.backend.sagemaker_backend.custom_framework.CustomFramework(entry_point, image_uri, source_dir=None, hyperparameters=None, **kwargs)[source]
Bases:
Framework
- LATEST_VERSION = '0.1'
- create_model(model_server_workers=None, role=None, vpc_config_override='VPC_CONFIG_DEFAULT')[source]
Create a SageMaker
Model
object that can be deployed to anEndpoint
.- Args:
- **kwargs: Keyword arguments used by the implemented method for
creating the
Model
.
- Returns:
sagemaker.model.Model: A SageMaker
Model
object. SeeModel()
for full details.
syne_tune.backend.sagemaker_backend.instance_info module
- class syne_tune.backend.sagemaker_backend.instance_info.InstanceInfo(name, num_cpu, num_gpu, cost_per_hour)[source]
Bases:
object
-
name:
str
-
num_cpu:
int
-
num_gpu:
int
-
cost_per_hour:
float
- class syne_tune.backend.sagemaker_backend.instance_info.InstanceInfos[source]
Bases:
object
Utility to get information of an instance type (num cpu/gpu, cost per hour).
- syne_tune.backend.sagemaker_backend.instance_info.select_instance_type(min_gpu=0, max_gpu=16, min_cost_per_hour=None, max_cost_per_hour=None)[source]
- Parameters:
min_gpu (
int
) –max_gpu (
int
) –min_cost_per_hour (
Optional
[float
]) –max_cost_per_hour (
Optional
[float
]) –
- Return type:
List
[str
]- Returns:
a list of instance types that meet the required constraints on minimum/maximum number of GPUs and minimum/maximum cost per hour.
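A usage sketch (the constraint values are arbitrary examples):

from syne_tune.backend.sagemaker_backend.instance_info import select_instance_type

# Instance types with 1 to 8 GPUs, costing at most 10 USD per hour
candidates = select_instance_type(min_gpu=1, max_gpu=8, max_cost_per_hour=10.0)
print(candidates)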
syne_tune.backend.sagemaker_backend.sagemaker_backend module
- class syne_tune.backend.sagemaker_backend.sagemaker_backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]
Bases:
TrialBackend
This backend executes each trial evaluation as a separate SageMaker training job, using
sm_estimator
as estimator.Checkpoints are written to and loaded from S3, using
checkpoint_s3_uri
of the estimator.Compared to
LocalBackend
, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.This backend allows selecting the instance type and count for a trial evaluation, by passing values in the configuration, using names
ST_INSTANCE_TYPE
andST_INSTANCE_COUNT
. If these are given in the configuration, they overwrite the default insm_estimator
. This allows for tuning instance type and count along with the hyperparameter configuration.Additional arguments on top of parent class
TrialBackend
:- Parameters:
sm_estimator (
Framework
) – SageMaker estimator for trial evaluations.metrics_names (
Optional
[List
[str
]]) – Names of metrics passed toreport
, used to plot live curve in SageMaker (optional, only used for visualization)s3_path (
Optional
[str
]) – S3 base path used for checkpointing. The full path also involves the tuner name and thetrial_id
. The default base path is the S3 bucket associated with the SageMaker accountsagemaker_fit_kwargs – Extra arguments that are passed to
sagemaker.estimator.Framework
when fitting the job, for instance{'train': 's3://my-data-bucket/path/to/my/training/data'}
- property sm_client
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
- stdout(trial_id)[source]
Fetch
stdout
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch
stderr
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stderr)
- property source_dir: str | None
- set_entrypoint(entry_point)[source]
Update the entrypoint.
- Parameters:
entry_point (
str
) – New path of the entrypoint.
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (
int
) – Source trial ID (copy from)tgt_trial_id (
int
) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (
int
) – ID of trial for which checkpoint files are deleted
syne_tune.backend.sagemaker_backend.sagemaker_utils module
- syne_tune.backend.sagemaker_backend.sagemaker_utils.default_config()[source]
https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-throttlingexception/
- Return type:
Config
- Returns:
Default config which avoids throttling
- syne_tune.backend.sagemaker_backend.sagemaker_utils.get_log(jobname, log_client=None)[source]
- Parameters:
jobname (
str
) – name of a SageMaker training joblog_client – a log client, for instance
boto3.client('logs')
; if None, the client is instantiated with the default AWS configuration
- Return type:
List[str]
- Returns:
Lines appearing in the log of the SageMaker training job
- syne_tune.backend.sagemaker_backend.sagemaker_utils.sagemaker_search(trial_ids_and_names, sm_client=None, log_client=None)[source]
- Parameters:
trial_ids_and_names (
List
[Tuple
[int
,str
]]) – Trial ids and sagemaker jobnames to retrieve information fromsm_client – Sagemaker client used to search for jobs
log_client – Log client used to query job logs
- Return type:
List
[TrialResult
]- Returns:
list of dictionaries containing job information (status, creation time, metrics, hyperparameters, etc.). In terms of speed, around 100 jobs can be retrieved per second.
- syne_tune.backend.sagemaker_backend.sagemaker_utils.metric_definitions_from_names(metrics_names)[source]
- Parameters:
metrics_names (
List
[str
]) – names of the metrics present in the log.
Metrics must be written in the log as [metric-name]: value, for instance [accuracy]: 0.23
- Returns:
A list of metric dictionaries that can be passed to SageMaker so that metrics are parsed from logs; the list can be passed to metric_definitions in SageMaker.
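A usage sketch (the metric names are assumptions; they must match what the training script writes to the log via the reporter):

from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    metric_definitions_from_names,
)

metric_definitions = metric_definitions_from_names(["accuracy", "validation_error"])
# Can be passed as metric_definitions=... when constructing a SageMaker estimator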
- syne_tune.backend.sagemaker_backend.sagemaker_utils.add_metric_definitions_to_sagemaker_estimator(estimator, metrics_names)[source]
Adds metric definitions according to
metric_definitions_from_names()
toestimator
for each name inmetrics_names
. The regexp for each name is compatible with howReporter
outputs metric values.- Parameters:
estimator (
EstimatorBase
) – SageMaker estimatormetrics_names (
List
[str
]) – Names of metrics to be appended
- syne_tune.backend.sagemaker_backend.sagemaker_utils.sagemaker_fit(sm_estimator, hyperparameters, checkpoint_s3_uri=None, wait=False, job_name=None, *sagemaker_fit_args, **sagemaker_fit_kwargs)[source]
- Parameters:
sm_estimator (
Framework
) – sagemaker estimator to be fittedhyperparameters (
Dict
[str
,object
]) – dictionary of hyperparameters that are passed toentry_point_script
checkpoint_s3_uri (
Optional
[str
]) – checkpoint_s3_uri of Sagemaker Estimatorwait (
bool
) – whether to wait for job completionmetrics_names – names of metrics to track reported with
report.py
. In case those metrics are passed, their learning curves will be shown in the SageMaker console.
- Returns:
Name of the SageMaker job
- syne_tune.backend.sagemaker_backend.sagemaker_utils.get_execution_role()[source]
- Returns:
sagemaker execution role that is specified with the environment variable
AWS_ROLE
, if not specified then
we infer it by searching for the role associated with SageMaker. Note that
import sagemaker; sagemaker.get_execution_role()
does not return the right role outside of a Sagemaker notebook.
- syne_tune.backend.sagemaker_backend.sagemaker_utils.download_sagemaker_results(s3_path=None)[source]
Download results obtained after running tuning remotely on Sagemaker, e.g. when using
RemoteLauncher
.
- syne_tune.backend.sagemaker_backend.sagemaker_utils.map_identifier_limited_length(name, max_length=63, rnd_digits=4)[source]
If
name
is longer than max_length characters, it is mapped to a new identifier of length max_length
, being the concatenation of the firstmax_length - rnd_digits
characters ofname
, followed by a random string of length rnd_digits
.- Parameters:
name (
str
) – Identifier to be limited in lengthmax_length (
int
) – Maximum length for outputrnd_digits (
int
) – See above
- Return type:
str
- Returns:
See above
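A usage sketch (the input name is arbitrary):

from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    map_identifier_limited_length,
)

long_name = "my-tuning-experiment-" + "x" * 100
short_name = map_identifier_limited_length(long_name, max_length=63, rnd_digits=4)
assert len(short_name) <= 63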
- syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_copy_objects_recursively(s3_source_path, s3_target_path)[source]
Recursively copies objects from
s3_source_path
tos3_target_path
.We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed
action
call encountered).Note
This function should not be used to copy a large number of objects, as it is rather slow (one API call per object)
- Parameters:
s3_source_path (
str
) –s3_target_path (
str
) –
- Return type:
Dict
[str
,Any
]- Returns:
See above
- syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_delete_objects_recursively(s3_path)[source]
Recursively deletes objects from
s3_path
.We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed
action
call encountered).Note
This function should not be used to delete a large number of objects, as it is rather slow (one API call per object)
- Parameters:
s3_path (
str
) –- Return type:
Dict
[str
,Any
]- Returns:
See above
- syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_download_files_recursively(s3_source_path, target_path, valid_postfixes=None)[source]
Recursively downloads objects from
s3_source_path
and stores them locally as files belowtarget_path
We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed
action
call encountered).If
valid_postfixes
is given, only such objects are downloaded for whichobject_key.endswith(postfix)
for somepostfix in valid_postfixes
.Note
This function should not be used to download a large number of objects, as it is rather slow (one API call per object). In this case, running
aws s3 sync
can be much faster.- Parameters:
s3_source_path (
str
) – See abovetarget_path (
str
) – See abovevalid_postfixes (
Optional
[List
[str
]]) – See above, optional
- Return type:
Dict
[str
,Any
]- Returns:
See above
- syne_tune.backend.sagemaker_backend.sagemaker_utils.backend_path_not_synced_to_s3()[source]
When an experiment with the local backend is run remotely (as SageMaker training job), we do not want checkpoints to be synced to S3, since this is expensive and error-prone (since several trials may write checkpoints at the same time). Pass the returned path to
trial_backend_path
when constructing the Tuner.Here, we direct checkpoint writing to /opt/ml/input/data/, which is mounted on a partition with sufficient space. Different from /opt/ml/checkpoints, this directory is not synced to S3.
- Return type:
Path
- Returns:
Path to set in local backend
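A usage sketch (assuming trial_backend, scheduler, and stop_criterion have been constructed as elsewhere in this document; only the trial_backend_path argument is specific to this function):

from syne_tune import Tuner
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    backend_path_not_synced_to_s3,
)

tuner = Tuner(
    trial_backend=trial_backend,  # a LocalBackend, run remotely as a SageMaker training job
    scheduler=scheduler,
    stop_criterion=stop_criterion,
    n_workers=4,
    trial_backend_path=str(backend_path_not_synced_to_s3()),
)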
syne_tune.backend.simulator_backend package
Submodules
syne_tune.backend.simulator_backend.events module
- class syne_tune.backend.simulator_backend.events.Event(trial_id)[source]
Bases:
object
Base class for events dealt with in the simulator.
-
trial_id:
int
- class syne_tune.backend.simulator_backend.events.StartEvent(trial_id)[source]
Bases:
Event
Start training evaluation function for
trial_id
. In fact, the function is run completely, andOnTrialResultEvent
events and oneCompleteEvent
are generated.-
trial_id:
int
- class syne_tune.backend.simulator_backend.events.CompleteEvent(trial_id, status)[source]
Bases:
Event
Job for trial
trial_id
completes with statusstatus
. This is registered at the backend.-
status:
str
- class syne_tune.backend.simulator_backend.events.StopEvent(trial_id)[source]
Bases:
Event
Job for trial
trial_id
is stopped. This leads to all later events fortrial_id
to be deleted, and a newCompleteEvent
.-
trial_id:
int
- class syne_tune.backend.simulator_backend.events.OnTrialResultEvent(trial_id, result)[source]
Bases:
Event
Result reported by some worker arrives at the backend and is registered there.
-
result:
Dict
[str
,Any
]
- class syne_tune.backend.simulator_backend.events.SimulatorState(event_heap=None, events_added=0)[source]
Bases:
object
Maintains the state of the simulator, in particular the event heap.
event_heap
is the priority queue for events, the key being(time, cnt)
, wheretime
is the event time, andcnt
is a non-negative int used to break ties. When an event is added, thecnt
value is taken fromevents_added
. This means that ties are broken first-in-first-out.- push(event, event_time)[source]
Push new event onto heap
- Parameters:
event (
Event
) –event_time (
float
) –
syne_tune.backend.simulator_backend.simulator_backend module
- class syne_tune.backend.simulator_backend.simulator_backend.SimulatorConfig(delay_on_trial_result=0.05, delay_complete_after_final_report=0.05, delay_complete_after_stop=0.05, delay_start=0.05, delay_stop=0.05)[source]
Bases:
object
Configures the simulator:
- Parameters:
delay_on_trial_result (
float
) – Time fromreport
called on worker to result registered at backend, defaults toDEFAULT_DELAY
delay_complete_after_final_report (
float
) – Time from finalreport
called on worker to job completion being registered at backend. Defaults toDEFAULT_DELAY
delay_complete_after_stop (
float
) – Time from stop signal received at worker to job completion being registered at backend. Defaults toDEFAULT_DELAY
delay_start (
float
) – Time from start command being sent at backend and job starting on the worker (which is free). Defaults toDEFAULT_DELAY
delay_stop (
float
) – Time from stop signal being sent at backend to signal received at worker (which is running). Defaults toDEFAULT_DELAY
-
delay_on_trial_result:
float
= 0.05
-
delay_complete_after_final_report:
float
= 0.05
-
delay_complete_after_stop:
float
= 0.05
-
delay_start:
float
= 0.05
-
delay_stop:
float
= 0.05
- class syne_tune.backend.simulator_backend.simulator_backend.SimulatorBackend(entry_point, elapsed_time_attr, simulator_config=None, tuner_sleep_time=5.0, debug_resource_attr=None)[source]
Bases:
LocalBackend
This simulator backend drives experiments with tabulated training evaluation functions, which return their computation time rather than spend it. To this end, time (on the tuning instance) is simulated using a
time_keeper
and an event priority queue in_simulator_state
.Time is advanced both by
run()
waiting, and by non-negligible computations during the tuning loop (in particular, we take care ofscheduler.suggest
andscheduler.on_trial_result
there).When the
entry_point
script is executed, we wait for all results to be returned. In each result, the value for keyelapsed_time_attr
contains the time since start of the script. These values are used to place worker events on the simulated timeline (represented bysimulator_state
). NOTE: If a trial is resumed, the elapsed_time value contains the time since start of the last recent resume, NOT the cumulative time used by the trial.Each method call starts by advancing time by what was spent outside, since the last recent call to the backend. Then, all events in
simulator_state
are processed whose time is before the current time intime_keeper
. The method ends bytime_keeper.mark_exit()
.Note
In this basic version of the simulator backend, we still call a Python main function as a subprocess, which returns the requested metrics by looking them up or running a surrogate. This is flexible, but has the overhead of loading a table at every call. For fast and convenient simulations, use
BlackboxRepositoryBackend
after bringing your tabulated data or surrogate benchmark into the blackbox repository.- Parameters:
entry_point (
str
) – Python main file to be tuned (this should return all results directly, and report elapsed time in theelapsed_time_attr
field)elapsed_time_attr (
str
) – See abovesimulator_config (
Optional
[SimulatorConfig
]) – Parameters for simulator, optionaltuner_sleep_time (
float
) – Effective sleep time inrun()
. This information is needed inSimulatorCallback
. Defaults toDEFAULT_SLEEP_TIME
- property time_keeper: SimulatedTimeKeeper
- start_trial(config, checkpoint_trial_id=None)[source]
Start new trial with new trial ID
- Parameters:
config (
Dict
) – Configuration for new trialcheckpoint_trial_id (
Optional
[int
]) – If given, the new trial starts from the checkpoint written by this previous trial
- Return type:
- Returns:
New trial, which includes new trial ID
- fetch_status_results(trial_ids)[source]
- Parameters:
trial_ids (
List
[int
]) – Trials whose information should be fetched.- Return type:
(
Dict
[int
,Tuple
[Trial
,str
]],List
[Tuple
[int
,dict
]])- Returns:
A tuple containing 1) a dictionary from trial-id to Trial and status information; 2) a list of (trial-id, results) pairs for each new result emitted since the last call. The list of results is sorted by the worker time-stamp.
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
syne_tune.backend.simulator_backend.simulator_callback module
- class syne_tune.backend.simulator_backend.simulator_callback.SimulatorCallback(extra_results_composer=None)[source]
Bases:
StoreResultsCallback
Callback to be used in
run()
in order to support theSimulatorBackend
This is doing three things. First,
on_tuning_sleep()
is advancing thetime_keeper
of the simulator backend bytuner_sleep_time
(also defined in the backend). The real sleep time inTuner
must be 0.Second, we need to make sure that results written out are annotated by simulated time, not real time. This is already catered for by
SimulatorBackend
addingST_TUNER_TIME
entries to each result it receives.Third (and most subtle), we need to make sure the stop criterion in
run()
is using simulated time instead of real time when making a decision based onmax_wallclock_time
. By default,StoppingCriterion
takesTuningStatus
as an input, which counts real time and knows nothing about simulated time. To this end, we modifystop_criterion
of the tuner to instead depend on theST_TUNER_TIME
fields in the results received. This allows us to keep bothTuner
andTuningStatus
independent of the time keeper.- Parameters:
extra_results_composer (
Optional
[ExtraResultsComposer
]) – Optional. If given, this is called inon_trial_result()
, and the resulting dictionary is appended as extra columns to the results dataframe
syne_tune.backend.simulator_backend.time_keeper module
- class syne_tune.backend.simulator_backend.time_keeper.SimulatedTimeKeeper[source]
Bases:
TimeKeeper
Here, time is simulated. It needs to be advanced explicitly.
In addition,
mark_exit()
andreal_time_since_last_recent_exit()
are used to measure real time spent outside the backend (i.e., in the tuner loop and scheduler). Namely, every method ofSimulatorBackend
callsmark_exit()
before leaving, andreal_time_since_last_recent_exit()
at the start, advancing the time counter accordingly.- property start_time_stamp: datetime
- Returns:
Time stamp (datetime) of (last recent) call of
start_of_time
Submodules
syne_tune.backend.local_backend module
- class syne_tune.backend.local_backend.LocalBackend(entry_point, delete_checkpoints=False, pass_args_as_json=False, rotate_gpus=True, num_gpus_per_trial=1, gpus_to_use=None)[source]
Bases:
TrialBackend
A backend running locally by spawning sub-processes concurrently. Note that no resource management is done, so the number of concurrent trials should be adjusted to the machine capacity.
Additional arguments on top of parent class
TrialBackend
:- Parameters:
entry_point (
str
) – Path to Python main file to be tunedrotate_gpus (
bool
) – In case several GPUs are present, each trial is scheduled on a different GPU. A new trial is preferentially scheduled on a free GPU, and otherwise the GPU with least prior assignments is chosen. IfFalse
, then all GPUs are used at the same time for all trials. Defaults toTrue
.num_gpus_per_trial (
int
) – Number of GPUs to be allocated to each trial. Must be not larger than the total number of GPUs available. Defaults to 1gpus_to_use (
Optional
[List
[int
]]) – If this is given, the backend only uses GPUs in this list (non-negative ints). Entries must be inrange(get_num_gpus())
. Defaults to using all GPUs.
- trial_path(trial_id)[source]
- Parameters:
trial_id (
int
) – ID of trial- Return type:
Path
- Returns:
Directory where files related to trial are written to
- checkpoint_trial_path(trial_id)[source]
- Parameters:
trial_id (
int
) – ID of trial- Return type:
Path
- Returns:
Directory where checkpoints for trial are written to and read from
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (
int
) – Source trial ID (copy from)tgt_trial_id (
int
) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (
int
) – ID of trial for which checkpoint files are deleted
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
- stdout(trial_id)[source]
Fetch
stdout
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch
stderr
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stderr)
- set_path(results_root=None, tuner_name=None)[source]
- Parameters:
results_root (
Optional
[str
]) – The local folder that should contain the results of the tuning experiment. Used byTuner
to indicate a desired path where the results should be written to. This is used to unify the location of backend files andTuner
results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.tuner_name (
Optional
[str
]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.
syne_tune.backend.time_keeper module
- class syne_tune.backend.time_keeper.TimeKeeper[source]
Bases:
object
To be used by tuner, backend, and scheduler to measure time differences and wait for a specified amount of time. By centralizing this functionality here, we can support simulating experiments much faster than real time if the training evaluation function corresponds to a tabulated benchmark.
- class syne_tune.backend.time_keeper.RealTimeKeeper[source]
Bases:
TimeKeeper
syne_tune.backend.trial_backend module
- class syne_tune.backend.trial_backend.TrialBackend(delete_checkpoints=False, pass_args_as_json=False)[source]
Bases:
object
Interface for backend to execute evaluations of trials.
- Parameters:
delete_checkpoints (
bool
) – IfTrue
, the checkpoints written by a trial are deleted once the trial is stopped or is registered as completed. Checkpoints of paused trials may also be removed, if the scheduler supports early checkpoint removal. Also, as part ofstop_all()
called at the end of the tuning loop, all remaining checkpoints are deleted. Defaults toFalse
(no checkpoints are removed).pass_args_as_json (
bool
) – Normally, the hyperparameter configuration is passed as command line arguments to the trial evaluation script. This works if all hyperparameters have elementary types. Ifpass_args_as_json == True
, the configuration is instead written into a JSON file, whose name is passed as command line argumentST_CONFIG_JSON_FNAME_ARG
. The trial evaluation script then loads the configuration from this file. This allows the configuration to contain entries with complex types (e.g., lists or dictionaries), as long as they are JSON-serializable. Defaults toFalse
.
- start_trial(config, checkpoint_trial_id=None)[source]
Start new trial with new trial ID
- Parameters:
config (
Dict
[str
,Any
]) – Configuration for new trialcheckpoint_trial_id (
Optional
[int
]) – If given, the new trial starts from the checkpoint written by this previous trial
- Return type:
- Returns:
New trial, which includes new trial ID
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (
int
) – Source trial ID (copy from)tgt_trial_id (
int
) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (
int
) – ID of trial for which checkpoint files are deleted
- resume_trial(trial_id, new_config=None)[source]
Resume paused trial
- Parameters:
trial_id (
int
) – ID of (paused) trial to be resumednew_config (
Optional
[dict
]) – If given, the config maintained intrial.config
is replaced bynew_config
- Return type:
- Returns:
Information for resumed trial
- pause_trial(trial_id, result=None)[source]
Pauses a running trial
Checks that the operation is valid and calls backend internal implementation to actually pause the trial. If the status is queried after this function, it should be
"paused"
.- Parameters:
trial_id (
int
) – ID of trial to pauseresult (
Optional
[dict
]) – Result dict based on which scheduler decided to pause the trial
- stop_trial(trial_id, result=None)[source]
Stops (and terminates) a running trial
Checks that the operation is valid and calls backend internal implementation to actually stop the trial. If the status is queried after this function, it should be
"stopped"
.- Parameters:
trial_id (
int
) – ID of trial to stopresult (
Optional
[dict
]) – Result dict based on which scheduler decided to stop the trial
- fetch_status_results(trial_ids)[source]
- Parameters:
trial_ids (
List
[int
]) – Trials whose information should be fetched.- Return type:
(
Dict
[int
,Tuple
[Trial
,str
]],List
[Tuple
[int
,dict
]])- Returns:
A tuple containing 1) a dictionary from trial-id to Trial and status information; 2) a list of (trial-id, results) pairs for each new result emitted since the last call. The list of results is sorted by the worker time-stamp.
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is
in_progress
orstopping
. If the execution setup is able to runn_workers
jobs in parallel, then if this method returns a list of sizen
, the tuner may startn_workers - n
new jobs.- Return type:
List
[Tuple
[int
,str
]]- Returns:
List of
(trial_id, status)
- stdout(trial_id)[source]
Fetch
stdout
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch
stderr
log for trial- Parameters:
trial_id (
int
) – ID of trial- Return type:
List
[str
]- Returns:
Lines of the log of the trial (stderr)
- set_path(results_root=None, tuner_name=None)[source]
- Parameters:
results_root (
Optional
[str
]) – The local folder that should contain the results of the tuning experiment. Used byTuner
to indicate a desired path where the results should be written to. This is used to unify the location of backend files andTuner
results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.tuner_name (
Optional
[str
]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.
syne_tune.backend.trial_status module
- class syne_tune.backend.trial_status.Status[source]
Bases:
object
-
completed:
str
= 'Completed'
-
in_progress:
str
= 'InProgress'
-
failed:
str
= 'Failed'
-
paused:
str
= 'Paused'
-
stopped:
str
= 'Stopped'
-
stopping:
str
= 'Stopping'
- class syne_tune.backend.trial_status.Trial(trial_id, config, creation_time)[source]
Bases:
object
-
trial_id:
int
-
config:
Dict
[str
,object
]
-
creation_time:
datetime
- class syne_tune.backend.trial_status.TrialResult(trial_id, config, creation_time, metrics, status, training_end_time=None)[source]
Bases:
Trial
-
metrics:
List
[Dict
[str
,object
]]
-
status:
Literal
['Completed'
,'InProgress'
,'Failed'
,'Stopped'
,'Stopping'
]
-
training_end_time:
Optional
[datetime
] = None
- property seconds
- property cost
syne_tune.blackbox_repository package
- class syne_tune.blackbox_repository.BlackboxOffline(df_evaluations, configuration_space, fidelity_space=None, objectives_names=None, seed_col=None)[source]
Bases:
Blackbox
A blackbox obtained from offline evaluations. Each row of the dataframe should contain one evaluation for a fixed configuration, fidelity, and seed. The columns must correspond to the provided configuration and fidelity space; by default, all columns that are prefixed by
"metric_"
are assumed to be metrics but this can be overridden by providing metric columns.Additional arguments on top of parent class
Blackbox
:- Parameters:
df_evaluations (
DataFrame
) – Data frame with evaluations dataseed_col (
Optional
[str
]) – optional, can be used when multiple seeds are recorded
- hyperparameter_objectives_values(predict_curves=False)[source]
If
predict_curves
is False, the shape ofX
is(num_evals * num_seeds * num_fidelities, num_hps + 1)
, the shape ofy
is(num_evals * num_seeds * num_fidelities, num_objectives)
. This can be reshaped to(num_fidelities, num_seeds, num_evals, *)
. The final column ofX
is the fidelity value (only a single fidelity attribute is supported).If
predict_curves
is True, the shape ofX
is(num_evals * num_seeds, num_hps)
, the shape ofy
is(num_evals * num_seeds, num_fidelities * num_objectives)
. The latter can be reshaped to(num_seeds, num_evals, num_fidelities, num_objectives)
.- Returns:
a tuple of two dataframes
(X, y)
, whereX
contains hyperparameters values andy
contains objective values, this is used when fitting a surrogate model.
- syne_tune.blackbox_repository.deserialize(path)[source]
- Parameters:
path (
str
) – where to find blackbox serialized information (at least data.csv.zip and configspace.json)groupby_col – separate evaluations into a list of blackboxes with different tasks if the column is provided
- Return type:
Union
[Dict
[str
,BlackboxOffline
],BlackboxOffline
]- Returns:
list of blackboxes per task, or single blackbox in the case of a single task
- syne_tune.blackbox_repository.load_blackbox(name, skip_if_present=True, s3_root=None, generate_if_not_found=True, yahpo_kwargs=None, ignore_hash=True)[source]
- Parameters:
name (
str
) –name of a blackbox present in the repository, see
blackbox_list()
to get list of available blackboxes. Syne Tune currently provides the following blackboxes evaluations:”nasbench201”: 15625 multi-fidelity configurations of computer vision architectures evaluated on 3 datasets. NAS-Bench-201: Extending the scope of reproducible neural architecture search. Dong, X. and Yang, Y. 2020.
”fcnet”: 62208 multi-fidelity configurations of MLP evaluated on 4 datasets. Tabular benchmarks for joint architecture and hyperparameter optimization. Klein, A. and Hutter, F. 2019.
”lcbench”: 2000 multi-fidelity Pytorch model configurations evaluated on many datasets. Reference: Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. Lucas Zimmer, Marius Lindauer, Frank Hutter. 2020.
”icml-deepar”: 2420 single-fidelity configurations of DeepAR forecasting algorithm evaluated on 10 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
”icml-xgboost”: 5000 single-fidelity configurations of XGBoost evaluated on 9 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
”yahpo-*”: Number of different benchmarks from YAHPO Gym. Note that these blackboxes come with surrogates already, so no need to wrap them into
SurrogateBlackbox
skip_if_present (
bool
) – skip the download if the file locally existss3_root (
Optional
[str
]) – S3 root directory for blackbox repository. Defaults to S3 bucket name of SageMaker sessiongenerate_if_not_found (
bool
) – If the blackbox file is not present locally or on S3, should it be generated using its conversion script?yahpo_kwargs (
Optional
[dict
]) – For a YAHPO blackbox (name == "yahpo-*"
), these are additional arguments toinstantiate_yahpo
ignore_hash (
bool
) – do not check if hash of currently stored files matches the pre-computed hash. Be careful with this option. If hashes do not match, results might not be reproducible.
- Return type:
- Returns:
blackbox with the given name, download it if not present.
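A usage sketch (the blackbox name follows the list above; for a multi-task blackbox such as "fcnet", a dictionary from dataset name to blackbox is typically returned):

from syne_tune.blackbox_repository import load_blackbox

blackboxes = load_blackbox("fcnet")
print(list(blackboxes.keys()))  # one entry per dataset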
- syne_tune.blackbox_repository.blackbox_list()[source]
- Return type:
List
[str
]- Returns:
list of blackboxes available
- syne_tune.blackbox_repository.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]
Fits a blackbox surrogates that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation.
- Parameters:
blackbox (
Blackbox
) – the blackbox must implementhyperparameter_objectives_values()
so that input/output are passed to estimate the modelsurrogate – the model that is fitted to predict objectives given any configuration. Possible examples:
KNeighborsRegressor(n_neighbors=1)
,MLPRegressor()
or any estimator obeying Scikit-learn API. The model is fit on top of pipeline that applies basic feature-processing to convert rows inX
to vectors. We useconfiguration_space
to deduce the types of columns inX
(categorical parameters are one-hot encoded).configuration_space (
Optional
[dict
]) – configuration space for the resulting blackbox surrogate. The default isblackbox.configuration_space
. But note that ifblackbox
is tabular, the domains inblackbox.configuration_space
are typically categorical even for numerical parameters.predict_curves (
Optional
[bool
]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value isFalse
ifblackbox
is of typeBlackboxOffline
, otherwiseTrue
.separate_seeds (
bool
) – IfTrue
, seeds inblackbox
map to seeds in the surrogate blackbox, which fits different models to each seed. IfFalse
, the data fromblackbox
is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults toFalse
.fit_differences (
Optional
[List
[str
]]) – Names of objectives which are cumulative sums. For these objectives, the
data is transformed to finite differences before fitting the model. This is recommended forelapsed_time
objectives.
- Returns:
a blackbox where the output is obtained through the fitted surrogate
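A usage sketch (assuming scikit-learn is installed; the blackbox and dataset names are examples and follow load_blackbox() above):

from sklearn.neighbors import KNeighborsRegressor
from syne_tune.blackbox_repository import add_surrogate, load_blackbox

blackbox = load_blackbox("fcnet")["protein_structure"]
surrogate_blackbox = add_surrogate(
    blackbox,
    surrogate=KNeighborsRegressor(n_neighbors=1),
)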
- class syne_tune.blackbox_repository.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)[source]
Bases:
_BlackboxSimulatorBackend
Allows to simulate a blackbox from blackbox-repository, selected by
blackbox_name
. Seeexamples/launch_simulated_benchmark.py
for an example on how to use. If you want to add a new dataset, see the Adding a new dataset section ofsyne_tune/blackbox_repository/README.md
.In each result reported to the simulator backend, the value for key
elapsed_time_attr
must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this would be the time from start of training until the end of the epoch. If the blackbox contains this information in a column,elapsed_time_attr
should be its key.If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track at which resource level each trial is paused. Namely, once a trial is resumed, all results for resources smaller or equal to that level are ignored, which simulates the situation that training is resumed from a checkpoint. This feature relies on
result
to be passed topause_trial()
. If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens ifsupport_checkpointing
is False.Note
If the blackbox maintains cumulative time (elapsed_time), this is different from what
SimulatorBackend
requires forelapsed_time_attr
, if a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the last recent resume. This conversion is done here internally in_run_job_and_collect_results()
, which is called for each resume. This means that the fieldelapsed_time_attr
is not what is received from the blackbox table, but instead what the backend needs.max_resource_attr
plays the same role as inHyperbandScheduler
. If given, it is the key in a configurationconfig
for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).If
seed
is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all_run_job_and_collect_results()
calls for the same trial. This is important for pause and resume scheduling.- Parameters:
blackbox_name (
str
) – Name of a blackbox, must have been registered in blackbox repository.elapsed_time_attr (
str
) – Name of the column containing cumulative timemax_resource_attr (
Optional
[str
]) – See aboveseed (
Optional
[int
]) – If given, this seed is used for all trial evaluations. Otherwise, seed is sampled at random for each trial. Only relevant for blackboxes with multiple seedssupport_checkpointing (
bool
) – IfFalse
, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults toTrue
dataset (
Optional
[str
]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets)surrogate (
Optional
[str
]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator, see alsomake_surrogate()
. The model is fit on top of pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. Theconfiguration_space
hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).surrogate_kwargs (
Optional
[dict
]) – Arguments for the scikit-learn estimator, for instance{"n_neighbors": 1}
can be used ifsurrogate="KNeighborsRegressor"
is chosen. Ifblackbox_name
is a YAHPO blackbox, thensurrogate_kwargs
is passed asyahpo_kwargs
toload_blackbox()
. In this case,surrogate
is ignored (YAHPO always uses surrogates).config_space_surrogate (
Optional
[dict
]) – Ifsurrogate
is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.simulatorbackend_kwargs – Additional arguments to parent
SimulatorBackend
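A construction sketch (the blackbox name, dataset, and elapsed-time column are assumptions based on the nasbench201 blackbox; check the blackbox repository for the exact attribute names of your blackbox):

from syne_tune.blackbox_repository import BlackboxRepositoryBackend

trial_backend = BlackboxRepositoryBackend(
    blackbox_name="nasbench201",
    dataset="cifar100",
    elapsed_time_attr="metric_elapsed_time",
    max_resource_attr="epochs",
)

When simulating with this backend, the Tuner should use sleep_time=0 and the SimulatorCallback (see syne_tune.backend.simulator_backend.simulator_callback above).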
- class syne_tune.blackbox_repository.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)[source]
Bases:
_BlackboxSimulatorBackend
Version of
_BlackboxSimulatorBackend
, where the blackbox is given as explicitBlackbox
object. Seeexamples/launch_simulated_benchmark.py
for an example on how to use.Additional arguments on top of parent
_BlackboxSimulatorBackend
:- Parameters:
blackbox (
Blackbox
) – Blackbox to be used for simulation
Subpackages
syne_tune.blackbox_repository.conversion_scripts package
Subpackages
syne_tune.blackbox_repository.conversion_scripts.scripts package
Subpackages
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench package
Submodules
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.api module
- class syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.api.Benchmark(data_dir, cache=False, cache_dir='cached/')[source]
Bases:
object
API for TabularBench.
- query(dataset_name, tag, config_id)[source]
Query a run.
Keyword arguments:
dataset_name – str, the name of the dataset in the benchmark
tag – str, the tag you want to query
config_id – int, an identifier for which run you want to query; if too large, the last run is queried
- query_best(dataset_name, tag, criterion, position=0)[source]
Query the n-th best run. “Best” here means achieving the largest value at any epoch/step.
Keyword arguments:
dataset_name – str, the name of the dataset in the benchmark
tag – str, the tag you want to query
criterion – str, the tag you want to use for the ranking
position – int, an identifier for which position in the ranking you want to query
- get_config(dataset_name, config_id)[source]
Returns the configuration of a run specified by dataset name and config id
- plot_by_name(dataset_names, x_col, y_col, n_configs=10, show_best=False, xscale='linear', yscale='linear', criterion=None)[source]
Plot multiple datasets and multiple runs.
Keyword arguments:
dataset_names – list
x_col – str, tag to plot on x-axis
y_col – str, tag to plot on y-axis
n_configs – int, number of configs to plot for each dataset
show_best – bool, whether to show the n_configs best (according to query_best())
xscale – str, set xscale, options as in matplotlib: “linear”, “log”, “symlog”, “logit”, …
yscale – str, set yscale, options as in matplotlib: “linear”, “log”, “symlog”, “logit”, …
criterion – str, tag used as criterion for query_best()
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench module
- syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench.convert_task(bench, dataset_name)[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench.LCBenchRecipe[source]
Bases:
BlackboxRecipe
Submodules
syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import module
- Convert tabular data from
Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization Aaron Klein Frank Hutter https://arxiv.org/pdf/1905.04970.pdf.
- syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.convert_dataset(dataset_path, max_rows=None)[source]
- syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.plot_learning_curves()[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.FCNETRecipe[source]
Bases:
BlackboxRecipe
syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import module
- Convert evaluations from
A Quantile-based Approach for Hyperparameter Transfer Learning David Salinas Huibin Shen Valerio Perrone http://proceedings.mlr.press/v119/salinas20a/salinas20a.pdf
- syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.download(blackbox)[source]
- syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.serialize_deepar()[source]
- syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.serialize_xgboost()[source]
‘hp_log2_min_child_weight’, ‘hp_subsample’, ‘hp_colsample_bytree’, ‘hp_log2_gamma’, ‘hp_log2_lambda’, ‘hp_eta’, ‘hp_max_depth_index’, ‘hp_log2_alpha’, ‘metric_error’, ‘blackbox’, ‘task’
- class syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.XGBoostRecipe[source]
Bases:
BlackboxRecipe
- class syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.DeepARRecipe[source]
Bases:
BlackboxRecipe
syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import module
- syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.str_to_list(arch_str)[source]
- syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.convert_dataset(data, dataset)[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.NASBench201Recipe[source]
Bases:
BlackboxRecipe
syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import module
- syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.convert_task(task_data)[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.PD1Recipe[source]
Bases:
BlackboxRecipe
- syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.serialize(bb_dict, path, metadata=None)[source]
- syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.deserialize(path)[source]
Deserialize blackboxes contained in a path that were saved with serialize() above.
TODO: the API is currently dissonant with serialize(), deserialize() for BlackboxOffline, as serialize is a member function there. A possible way to unify is to have serialize also be a free function for BlackboxOffline.
- Parameters:
path (str) – a path that contains blackboxes that were saved with serialize()
- Return type:
Dict[str, BlackboxTabular]
- Returns:
a dictionary from task name to blackbox
syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import module
Wrap Surrogates from YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder, Bernd Bischl
- syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.download(target_path, version)[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.BlackBoxYAHPO(benchmark, fidelities=None)[source]
Bases:
Blackbox
A wrapper that allows putting a ‘YAHPO’ BenchmarkInstance into a Blackbox.
If
fidelities
is given, it restrictsfidelity_values
to these values. The sequence must consist of positive integers and be increasing. This works only if there is a single fidelity attribute with integer values (but note that for some specific YAHPO benchmarks, a fractional fidelity is transformed to an integer one). Even though YAHPO interpolates between fidelities, it can make sense to restrict them to the values which have really been acquired in the data. Note that this restricts multi-fidelity schedulers like
HyperbandScheduler
, in that all their rung levels have to be fidelity values. For example, for YAHPO
iaml
, the fidelitytrainsize
has been acquired at [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1], this is transformed to [1, 2, 4, 8, 12, 16, 20]. By default, the fidelity is represented bycs.randint(1, 20)
, but iffidelities
is passed, it usescs.ordinal(fidelities)
.- Parameters:
benchmark (
BenchmarkSet
) – YAHPOBenchmarkSet
fidelities (
Optional
[List
[int
]]) – See above
- property instances: array
- property fidelity_values: array
- Returns:
Fidelity values; or None if the blackbox has none
- property time_attribute: str
Name of the time column
- syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.cs_to_synetune(config_space)[source]
Convert ConfigSpace.ConfigSpace to a synetune configspace.
TODO: cover all possible hyperparameters of ConfigSpace.ConfigSpace; right now we only convert the ones we need.
- syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.instantiate_yahpo(scenario, check=False, fidelities=None)[source]
Instantiates a dict of
BlackBoxYAHPO
, one entry for each instance.- Parameters:
scenario (
str
) –check (
bool
) – If False,objective_function
of the blackbox does not check whether the input configuration is valid. This is faster, but calls fail silently if configurations are invalid.
- Returns:
- syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.serialize_yahpo(scenario, target_path, version='1.0')[source]
- class syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.YAHPORecipe(name)[source]
Bases:
BlackboxRecipe
Submodules
syne_tune.blackbox_repository.conversion_scripts.blackbox_recipe module
- class syne_tune.blackbox_repository.conversion_scripts.blackbox_recipe.BlackboxRecipe(name, cite_reference, hash=None)[source]
Bases:
object
- generate(s3_root=None)[source]
Generates the blackbox on disk, then uploads it to S3 if AWS is available.
- Parameters:
s3_root (Optional[str]) – S3 root to upload to, defaults to s3://{sagemaker-bucket}/blackbox-repository. If AWS is not available, this step is skipped and the dataset is just persisted locally.
syne_tune.blackbox_repository.conversion_scripts.recipes module
syne_tune.blackbox_repository.conversion_scripts.utils module
- syne_tune.blackbox_repository.conversion_scripts.utils.get_sub_directory_and_name(name)[source]
Blackboxes are either stored under “{blackbox-repository}/{name}” (such as fcnet, nas201, …) or under “{blackbox-repository}/{subdir}/{subname}” for all YAHPO benchmarks. In the YAHPO case, “yahpo-rbv2_xgboost” is for instance stored under “{blackbox-repository}/yahpo/rbv2_xgboost/”.
- Parameters:
name (str) – name of the blackbox, for instance “fcnet”, “lcbench” or “yahpo-rbv2_xgboost”
- Returns:
subdirectory and subname such that the blackbox should be stored under {blackbox_repository}/{subdir}/{name}
- syne_tune.blackbox_repository.conversion_scripts.utils.blackbox_local_path(name)[source]
- Return type:
Path
- syne_tune.blackbox_repository.conversion_scripts.utils.blackbox_s3_path(name, s3_root=None)[source]
- Return type:
Path
- syne_tune.blackbox_repository.conversion_scripts.utils.upload_blackbox(name, s3_root=None)[source]
Uploads a blackbox locally present in repository_path to S3.
- Parameters:
name (str) – folder must be available in repository_path/name
- syne_tune.blackbox_repository.conversion_scripts.utils.validate_hash(tgt_folder, original_hash)[source]
Computes the hash of the files in tgt_folder and validates it against the original hash.
- Parameters:
tgt_folder – target folder that contains the files of the original benchmark
original_hash – original sha256 hash
Submodules
syne_tune.blackbox_repository.blackbox module
- class syne_tune.blackbox_repository.blackbox.Blackbox(configuration_space, fidelity_space=None, objectives_names=None)[source]
Bases:
object
Interface designed to be compatible with
HPOBench- Parameters:
configuration_space (
Dict
[str
,Any
]) – Configuration space of blackbox.fidelity_space (
Optional
[dict
]) – Fidelity space for blackbox, optional.objectives_names (
Optional
[List
[str
]]) – Names of the metrics; by default, all columns prefixed by "metric_"
are considered to be metrics
- objective_function(configuration, fidelity=None, seed=None)[source]
Returns an evaluation of the blackbox.
First perform data check and then call
_objective_function()
that should be overridden in the child class.- Parameters:
configuration (
Dict
[str
,Any
]) – configuration to be evaluated, should belong toconfiguration_space
fidelity (
Union
[dict
,Number
,None
]) – not passing a fidelity is possible if either the blackbox does not have a fidelity space or if it has a single fidelity in its fidelity space. In the latter case, all fidelities are returned in form of a tensor with shape(num_fidelities, num_objectives)
.seed (
Optional
[int
]) – Only used if the blackbox defines multiple seeds
- Return type:
Union
[Dict
[str
,float
],ndarray
]- Returns:
dictionary of objectives evaluated or tensor with shape
(num_fidelities, num_objectives)
if no fidelity was given.
- hyperparameter_objectives_values(predict_curves=False)[source]
If
predict_curves
is False, the shape ofX
is(num_evals * num_seeds * num_fidelities, num_hps + 1)
, the shape ofy
is(num_evals * num_seeds * num_fidelities, num_objectives)
. This can be reshaped to(num_fidelities, num_seeds, num_evals, *)
. The final column ofX
is the fidelity value (only a single fidelity attribute is supported).If
predict_curves
is True, the shape ofX
is(num_evals * num_seeds, num_hps)
, the shape ofy
is(num_evals * num_seeds, num_fidelities * num_objectives)
. The latter can be reshaped to(num_seeds, num_evals, num_fidelities, num_objectives)
.- Return type:
Tuple
[DataFrame
,DataFrame
]- Returns:
a tuple of two dataframes
(X, y)
, whereX
contains hyperparameters values andy
contains objective values, this is used when fitting a surrogate model.
- property fidelity_values: array | None
- Returns:
Fidelity values; or None if the blackbox has none
- fidelity_name()[source]
Can only be used for blackboxes with a single fidelity attribute.
- Return type:
str
- Returns:
Name of fidelity attribute (must be single one)
- configuration_space_with_max_resource_attr(max_resource_attr)[source]
It is best practice to have one attribute in the configuration space to represent the maximum fidelity value used for evaluation (e.g., the maximum number of epochs).
- Parameters:
max_resource_attr (
str
) – Name of new attribute for maximum resource- Return type:
Dict
[str
,Any
]- Returns:
Configuration space augmented by the new attribute
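A minimal illustration of this interface follows; it assumes blackbox is some Blackbox instance with a single fidelity attribute, config belongs to its configuration space, the fidelity value 3 is valid for it, and "epochs" is just the attribute name chosen for this example.
# Hedged sketch of the Blackbox interface (names and values here are examples).
curve = blackbox.objective_function(configuration=config)            # all fidelities, shape (num_fidelities, num_objectives)
single = blackbox.objective_function(configuration=config, fidelity=3)  # one fidelity level
config_space = blackbox.configuration_space_with_max_resource_attr("epochs")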
- syne_tune.blackbox_repository.blackbox.from_function(configuration_space, eval_fun, fidelity_space=None, objectives_names=None)[source]
Helper to create a blackbox from a function, useful for testing or to wrap up real blackbox functions.
- Parameters:
configuration_space (
Dict
[str
,Any
]) – Configuration space for blackboxeval_fun (
Callable
) – Function that returns dictionary of objectives given configuration and fidelityfidelity_space (
Optional
[dict
]) – Fidelity space for blackboxobjectives_names (
Optional
[List
[str
]]) – Objectives returned by blackbox
- Return type:
- Returns:
Resulting blackbox wrapping
eval_fun
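For illustration, a minimal sketch of wrapping a simple function as a blackbox. It assumes eval_fun is called with the configuration (and fidelity, if a fidelity space is given); the metric name "metric_error" is our own choice for this example.
from syne_tune.blackbox_repository.blackbox import from_function
from syne_tune.config_space import uniform

config_space = {"x": uniform(-1.0, 1.0)}

def eval_fun(config, fidelity=None):
    # Toy quadratic objective; ignores the fidelity argument.
    return {"metric_error": (config["x"] - 0.3) ** 2}

blackbox = from_function(
    configuration_space=config_space,
    eval_fun=eval_fun,
    objectives_names=["metric_error"],
)
print(blackbox.objective_function({"x": 0.1}))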
syne_tune.blackbox_repository.blackbox_offline module
- class syne_tune.blackbox_repository.blackbox_offline.BlackboxOffline(df_evaluations, configuration_space, fidelity_space=None, objectives_names=None, seed_col=None)[source]
Bases:
Blackbox
A blackbox obtained from offline evaluations. Each row of the dataframe should contain one evaluation for a fixed configuration, fidelity and seed. The columns must correspond to the provided configuration and fidelity space; by default, all columns that are prefixed by
"metric_"
are assumed to be metrics but this can be overridden by providing metric columns.Additional arguments on top of parent class
Blackbox
:- Parameters:
df_evaluations (
DataFrame
) – Data frame with evaluations dataseed_col (
Optional
[str
]) – optional, can be used when multiple seeds are recorded
- hyperparameter_objectives_values(predict_curves=False)[source]
If
predict_curves
is False, the shape ofX
is(num_evals * num_seeds * num_fidelities, num_hps + 1)
, the shape ofy
is(num_evals * num_seeds * num_fidelities, num_objectives)
. This can be reshaped to(num_fidelities, num_seeds, num_evals, *)
. The final column ofX
is the fidelity value (only a single fidelity attribute is supported).If
predict_curves
is True, the shape ofX
is(num_evals * num_seeds, num_hps)
, the shape ofy
is(num_evals * num_seeds, num_fidelities * num_objectives)
. The latter can be reshaped to(num_seeds, num_evals, num_fidelities, num_objectives)
.- Returns:
a tuple of two dataframes
(X, y)
, whereX
contains hyperparameters values andy
contains objective values, this is used when fitting a surrogate model.
- syne_tune.blackbox_repository.blackbox_offline.serialize(bb_dict, path, categorical_cols=[])[source]
- Parameters:
bb_dict (
Dict
[str
,BlackboxOffline
]) –path (
str
) –categorical_cols (
List
[str
]) – optional; allows retrieving columns as categories, which drastically lowers the memory footprint when few distinct values are present
- Returns:
- syne_tune.blackbox_repository.blackbox_offline.deserialize(path)[source]
- Parameters:
path (
str
) – where to find blackbox serialized information (at least data.csv.zip and configspace.json)groupby_col – separate evaluations into a list of blackboxes with different tasks if the column is provided
- Return type:
Union
[Dict
[str
,BlackboxOffline
],BlackboxOffline
]- Returns:
list of blackboxes per task, or single blackbox in the case of a single task
syne_tune.blackbox_repository.blackbox_surrogate module
- class syne_tune.blackbox_repository.blackbox_surrogate.Columns(names=None)[source]
Bases:
BaseEstimator
,TransformerMixin
- class syne_tune.blackbox_repository.blackbox_surrogate.BlackboxSurrogate(X, y, configuration_space, objectives_names, fidelity_space=None, fidelity_values=None, surrogate=None, predict_curves=False, num_seeds=1, fit_differences=None, max_fit_samples=None, name=None)[source]
Bases:
Blackbox
Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation. To wrap an existing blackbox with a surrogate estimator, use
add_surrogate()
which automatically extracts X
,y
matrices from available blackbox evaluations.The surrogate regression model is provided by
surrogate
, it has to conform to the scikit-learn fit-predict API. Ifpredict_curves
isTrue
, the model maps features of the configuration to the whole curve over fidelities, separate for each metric and seed. This has several advantages. First, predictions are consistent: if all curves in the data respect a certain property which is retained under convex combinations, predictions have this property as well (examples: positivity, monotonicity). This is important forelapsed_time
metrics. The regression models are also fairly compact, and prediction is fast,max_fit_samples
is normally not needed.If
predict_curves
isFalse,
the model maps features from configuration and fidelity to metric values (univariate regression). In this case, properties like monotonicity are not retained. Also, training can take a long time and the trained models can be large. This difference only matters if there are fidelities. Otherwise, regression is always univariate.
If
num_seeds
is given, we maintain different surrogate models for each seed. Otherwise, a single surrogate model is fit to data across all seeds.If
fit_differences
is given, it contains names of objectives which are cumulative sums. For these objectives, they
data is transformed to finite differences before fitting the model. This is recommended forelapsed_time
objectives. This feature only matters if there are fidelities.Additional arguments on top of parent class
Blackbox
:- Parameters:
X (
DataFrame
) – dataframe containing hyperparameters values. Shape is(num_seeds * num_evals, num_hps)
ifpredict_curves
isTrue
,(num_fidelities * num_seeds * num_evals, num_hps)
otherwisey (
DataFrame
) – dataframe containing objectives values. Shape is(num_seeds * num_evals, num_fidelities * num_objectives)
ifpredict_curves
isTrue
, and(num_fidelities * num_seeds * num_evals, num_objectives)
otherwisesurrogate – the model that is fitted to predict objectives given any configuration, defaults to KNeighborsRegressor(n_neighbors=1). If
predict_curves
isTrue
, this must be multi-variate regression, i.e. accept target matrices infit
, where columns correspond to fidelities. Regression models from scikit-learn allow for that. Possible examples:KNeighborsRegressor(n_neighbors=1)
,MLPRegressor()
or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature-processing to convert rows in X
to vectors. We use the configuration_space hyperparameter types to deduce the types of columns in X
(for instance,Categorical
values are one-hot encoded).predict_curves (
bool
) – See above. Default isFalse
(backwards compatible)num_seeds (
int
) – See abovefit_differences (
Optional
[List
[str
]]) – See abovemax_fit_samples (
Optional
[int
]) – maximum number of samples to be fed to the surrogate estimator; if more data points than this number are passed, they are subsampled without replacement. If num_seeds
is used, this is a limit on the data per seedname (
Optional
[str
]) –
- property fidelity_values: array | None
- Returns:
Fidelity values; or None if the blackbox has none
- property num_fidelities: int
- static make_model_pipeline(configuration_space, fidelity_space, model, predict_curves=False)[source]
Create feature pipeline for scikit-learn model
- Parameters:
configuration_space – Configuration space
fidelity_space – Fidelity space
model – Scikit-learn model
predict_curves – Predict full curves?
- Returns:
Feature pipeline
- fit_surrogate(X, y)[source]
Fits a surrogate model to data from a blackbox. Here, the targets
y
can be a matrix with the number of columns equal to the number of fidelity values (thepredict_curves = True
case).- Return type:
- hyperparameter_objectives_values(predict_curves=False)[source]
If
predict_curves
is False, the shape ofX
is(num_evals * num_seeds * num_fidelities, num_hps + 1)
, the shape ofy
is(num_evals * num_seeds * num_fidelities, num_objectives)
. This can be reshaped to(num_fidelities, num_seeds, num_evals, *)
. The final column ofX
is the fidelity value (only a single fidelity attribute is supported).If
predict_curves
is True, the shape ofX
is(num_evals * num_seeds, num_hps)
, the shape ofy
is(num_evals * num_seeds, num_fidelities * num_objectives)
. The latter can be reshaped to(num_seeds, num_evals, num_fidelities, num_objectives)
.- Return type:
Tuple
[DataFrame
,DataFrame
]- Returns:
a tuple of two dataframes
(X, y)
, whereX
contains hyperparameters values andy
contains objective values, this is used when fitting a surrogate model.
- syne_tune.blackbox_repository.blackbox_surrogate.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]
Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation.
- Parameters:
blackbox (
Blackbox
) – the blackbox must implementhyperparameter_objectives_values()
so that input/output are passed to estimate the modelsurrogate – the model that is fitted to predict objectives given any configuration. Possible examples:
KNeighborsRegressor(n_neighbors=1)
,MLPRegressor()
or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature-processing to convert rows in X
to vectors. We useconfiguration_space
to deduce the types of columns inX
(categorical parameters are one-hot encoded).configuration_space (
Optional
[dict
]) – configuration space for the resulting blackbox surrogate. The default isblackbox.configuration_space
. But note that ifblackbox
is tabular, the domains inblackbox.configuration_space
are typically categorical even for numerical parameters.predict_curves (
Optional
[bool
]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value isFalse
ifblackbox
is of typeBlackboxOffline
, otherwiseTrue
.separate_seeds (
bool
) – IfTrue
, seeds inblackbox
map to seeds in the surrogate blackbox, which fits different models to each seed. IfFalse
, the data fromblackbox
is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults toFalse
.fit_differences (
Optional
[List
[str
]]) – Names of objectives which are cumulative sums. For these objectives, they
data is transformed to finite differences before fitting the model. This is recommended forelapsed_time
objectives.
- Returns:
a blackbox where the output is obtained through the fitted surrogate
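A minimal sketch of typical usage follows. It assumes the "fcnet" blackbox is available in your repository (it may be downloaded or generated on first use), and the task name "protein_structure" is just an example key of the returned dictionary.
from sklearn.neighbors import KNeighborsRegressor
from syne_tune.blackbox_repository import load_blackbox, add_surrogate

# Wrap a tabular blackbox with a nearest-neighbour surrogate so it can be
# evaluated at configurations not present in the table.
blackbox = load_blackbox("fcnet")["protein_structure"]
surrogate_blackbox = add_surrogate(
    blackbox, surrogate=KNeighborsRegressor(n_neighbors=1)
)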
syne_tune.blackbox_repository.blackbox_tabular module
- class syne_tune.blackbox_repository.blackbox_tabular.BlackboxTabular(hyperparameters, configuration_space, fidelity_space, objectives_evaluations, fidelity_values=None, objectives_names=None)[source]
Bases:
Blackbox
Blackbox that contains tabular evaluations (e.g., all hyperparameters evaluated on all fidelities). We use a separate class from
BlackboxOffline
, as performance improvements can be made by avoiding repeating hyperparameters and by storing all evaluations in a single table. Additional arguments on top of parent class
Blackbox
:- Parameters:
hyperparameters (
DataFrame
) – dataframe of hyperparameters, shape(num_evals, num_hps)
, columns must match hyperparameter names ofconfiguration_space
objectives_evaluations (
array
) – values of recorded objectives, must have shape(num_evals, num_seeds, num_fidelities, num_objectives)
fidelity_values (
Optional
[array
]) – values of thenum_fidelities
fidelities, defaults to
- property fidelity_values: array
- Returns:
Fidelity values; or None if the blackbox has none
- hyperparameter_objectives_values(predict_curves=False)[source]
If
predict_curves
is False, the shape ofX
is(num_evals * num_seeds * num_fidelities, num_hps + 1)
, the shape ofy
is(num_evals * num_seeds * num_fidelities, num_objectives)
. This can be reshaped to(num_fidelities, num_seeds, num_evals, *)
. The final column ofX
is the fidelity value (only a single fidelity attribute is supported).If
predict_curves
is True, the shape ofX
is(num_evals * num_seeds, num_hps)
, the shape ofy
is(num_evals * num_seeds, num_fidelities * num_objectives)
. The latter can be reshaped to(num_seeds, num_evals, num_fidelities, num_objectives)
.- Parameters:
predict_curves (
bool
) – See above. Default isFalse
- Return type:
Tuple
[DataFrame
,DataFrame
]- Returns:
Dataframes corresponding to
X
andy
- rename_objectives(objective_name_mapping)[source]
- Parameters:
objective_name_mapping (
Dict
[str
,str
]) – dictionary from old objective name to new one, old objective name must be present in the blackbox- Return type:
- Returns:
a blackbox with as many objectives as
objective_name_mapping
- all_configurations()[source]
This method is useful in order to set
restrict_configurations
inStochasticAndFilterDuplicatesSearcher
orGPFIFOSearcher
, which restricts the searcher to only return configurations in this set. This allows you to use a tabular blackbox without a surrogate.- Return type:
List
[Dict
[str
,Any
]]- Returns:
List of all hyperparameter configurations for which objective values can be returned
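A hedged sketch of one way this is commonly wired up: restrict a model-based searcher to the configurations present in a tabular blackbox, so no surrogate is needed. It assumes blackbox is a BlackboxTabular, that the searcher accepts "restrict_configurations" via search_options, and the metric name is only an example.
from syne_tune.optimizer.baselines import BayesianOptimization

scheduler = BayesianOptimization(
    config_space=blackbox.configuration_space,
    metric="metric_valid_loss",  # example metric name, check your blackbox
    search_options={"restrict_configurations": blackbox.all_configurations()},
)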
- syne_tune.blackbox_repository.blackbox_tabular.deserialize(path)[source]
Deserialize blackboxes contained in a path that were saved with
serialize()
above.TODO: the API is currently dissonant with
serialize()
,deserialize()
forBlackboxOffline
asserialize
is a member function there. A possible way to unify is to have serialize also be a free function forBlackboxOffline
.- Parameters:
path (
str
) – a path that contains blackboxes that were saved withserialize()
- Return type:
Dict
[str
,BlackboxTabular
]- Returns:
a dictionary from task name to blackbox
syne_tune.blackbox_repository.repository module
- syne_tune.blackbox_repository.repository.blackbox_list()[source]
- Return type:
List
[str
]- Returns:
list of blackboxes available
- syne_tune.blackbox_repository.repository.load_blackbox(name, skip_if_present=True, s3_root=None, generate_if_not_found=True, yahpo_kwargs=None, ignore_hash=True)[source]
- Parameters:
name (
str
) –name of a blackbox present in the repository, see
blackbox_list()
to get the list of available blackboxes. Syne Tune currently provides the following blackbox evaluations:”nasbench201”: 15625 multi-fidelity configurations of computer vision architectures evaluated on 3 datasets. NAS-Bench-201: Extending the scope of reproducible neural architecture search. Dong, X. and Yang, Y. 2020.
”fcnet”: 62208 multi-fidelity configurations of MLP evaluated on 4 datasets. Tabular benchmarks for joint architecture and hyperparameter optimization. Klein, A. and Hutter, F. 2019.
”lcbench”: 2000 multi-fidelity Pytorch model configurations evaluated on many datasets. Reference: Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. Lucas Zimmer, Marius Lindauer, Frank Hutter. 2020.
”icml-deepar”: 2420 single-fidelity configurations of DeepAR forecasting algorithm evaluated on 10 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
”icml-xgboost”: 5000 single-fidelity configurations of XGBoost evaluated on 9 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
”yahpo-*”: A number of different benchmarks from YAHPO Gym. Note that these blackboxes come with surrogates already, so there is no need to wrap them into
SurrogateBlackbox
skip_if_present (
bool
) – skip the download if the file locally existss3_root (
Optional
[str
]) – S3 root directory for blackbox repository. Defaults to S3 bucket name of SageMaker sessiongenerate_if_not_found (
bool
) – If the blackbox file is not present locally or on S3, should it be generated using its conversion script?yahpo_kwargs (
Optional
[dict
]) – For a YAHPO blackbox (name == "yahpo-*"
), these are additional arguments toinstantiate_yahpo
ignore_hash (
bool
) – do not check if hash of currently stored files matches the pre-computed hash. Be careful with this option. If hashes do not match, results might not be reproducible.
- Return type:
- Returns:
blackbox with the given name; it is downloaded if not present locally.
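A minimal sketch of loading and querying a blackbox. For tabular benchmarks such as "fcnet", load_blackbox returns a dictionary from task name to Blackbox; the task name below is an example.
from syne_tune.blackbox_repository import blackbox_list, load_blackbox

print(blackbox_list())                       # names of available blackboxes
blackboxes = load_blackbox("fcnet")          # dict: task name -> Blackbox
blackbox = blackboxes["protein_structure"]
config = blackbox.all_configurations()[0]
# Without a fidelity argument, results for all fidelities are returned as an
# array of shape (num_fidelities, num_objectives).
all_fidelity_results = blackbox.objective_function(config)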
syne_tune.blackbox_repository.serialize module
syne_tune.blackbox_repository.simulated_tabular_backend module
- syne_tune.blackbox_repository.simulated_tabular_backend.make_surrogate(surrogate=None, surrogate_kwargs=None)[source]
Creates a surrogate model (scikit-learn estimator)
- Parameters:
surrogate (
Optional
[str
]) – A model that is fitted to predict objectives given any configuration. Possible examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator. The model is fit on top of a pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. The configuration_space
hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).surrogate_kwargs (
Optional
[dict
]) – Arguments for the scikit-learn estimator, for instance{"n_neighbors": 1}
can be used ifsurrogate="KNeighborsRegressor"
is chosen.
- Returns:
Scikit-learn estimator representing surrogate model
- class syne_tune.blackbox_repository.simulated_tabular_backend.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)[source]
Bases:
_BlackboxSimulatorBackend
Allows simulating a blackbox from the blackbox repository, selected by
blackbox_name
. Seeexamples/launch_simulated_benchmark.py
for an example on how to use. If you want to add a new dataset, see the Adding a new dataset section ofsyne_tune/blackbox_repository/README.md
.In each result reported to the simulator backend, the value for key
elapsed_time_attr
must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this would be the time from start of training until the end of the epoch. If the blackbox contains this information in a column,elapsed_time_attr
should be its key.If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track at which resource level each trial is paused. Namely, once a trial is resumed, all results for resources smaller or equal to that level are ignored, which simulates the situation that training is resumed from a checkpoint. This feature relies on
result
to be passed topause_trial()
. If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens ifsupport_checkpointing
is False.Note
If the blackbox maintains cumulative time (elapsed_time), this is different from what
SimulatorBackend
requires forelapsed_time_attr
, if a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the last recent resume. This conversion is done here internally in_run_job_and_collect_results()
, which is called for each resume. This means that the fieldelapsed_time_attr
is not what is received from the blackbox table, but instead what the backend needs.max_resource_attr
plays the same role as inHyperbandScheduler
. If given, it is the key in a configurationconfig
for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).If
seed
is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all_run_job_and_collect_results()
calls for the same trial. This is important for pause and resume scheduling.- Parameters:
blackbox_name (
str
) – Name of a blackbox, must have been registered in blackbox repository.elapsed_time_attr (
str
) – Name of the column containing cumulative timemax_resource_attr (
Optional
[str
]) – See aboveseed (
Optional
[int
]) – If given, this seed is used for all trial evaluations. Otherwise, seed is sampled at random for each trial. Only relevant for blackboxes with multiple seedssupport_checkpointing (
bool
) – IfFalse
, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults toTrue
dataset (
Optional
[str
]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets)surrogate (
Optional
[str
]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator, see alsomake_surrogate()
. The model is fit on top of a pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. The configuration_space
hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).surrogate_kwargs (
Optional
[dict
]) – Arguments for the scikit-learn estimator, for instance{"n_neighbors": 1}
can be used ifsurrogate="KNeighborsRegressor"
is chosen. Ifblackbox_name
is a YAHPO blackbox, thensurrogate_kwargs
is passed asyahpo_kwargs
toload_blackbox()
. In this case,surrogate
is ignored (YAHPO always uses surrogates).config_space_surrogate (
Optional
[dict
]) – Ifsurrogate
is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.simulatorbackend_kwargs – Additional arguments to parent
SimulatorBackend
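A hedged sketch of simulating asynchronous multi-fidelity HPO on a tabular benchmark with this backend. The blackbox name, attribute names ("metric_elapsed_time", "hp_epoch", "metric_valid_error"), and the dataset follow common usage for "nasbench201" but should be checked against the blackbox you actually load.
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.optimizer.baselines import ASHA

backend = BlackboxRepositoryBackend(
    blackbox_name="nasbench201",
    dataset="cifar100",
    elapsed_time_attr="metric_elapsed_time",
)
blackbox = backend.blackbox
scheduler = ASHA(
    config_space=blackbox.configuration_space_with_max_resource_attr("epochs"),
    metric="metric_valid_error",
    resource_attr="hp_epoch",
    max_resource_attr="epochs",
)
tuner = Tuner(
    trial_backend=backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=3600),
    n_workers=4,
    sleep_time=0,                      # no real waiting in simulation
    callbacks=[SimulatorCallback()],   # advances simulated time
)
tuner.run()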
- class syne_tune.blackbox_repository.simulated_tabular_backend.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)[source]
Bases:
_BlackboxSimulatorBackend
Version of
_BlackboxSimulatorBackend
, where the blackbox is given as explicitBlackbox
object. Seeexamples/launch_simulated_benchmark.py
for an example on how to use.Additional arguments on top of parent
_BlackboxSimulatorBackend
:- Parameters:
blackbox (
Blackbox
) – Blackbox to be used for simulation
syne_tune.blackbox_repository.utils module
- syne_tune.blackbox_repository.utils.metrics_for_configuration(blackbox, config, resource_attr, fidelity_range=None, seed=None)[source]
Returns all results for configuration
config
at fidelities in rangefidelity_range
.- Parameters:
blackbox (
Blackbox
) – Blackboxconfig (
Dict
[str
,Any
]) – Configurationresource_attr (
str
) – Name of resource attributefidelity_range (
Optional
[Tuple
[float
,float
]]) – Range [min_f, max_f], only fidelities in this range (both ends inclusive) are returned. Default is no filteringseed (
Optional
[int
]) – Seed for queries to blackbox. Drawn at random if not given
- Return type:
List
[dict
]- Returns:
List of result dicts
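A small hedged sketch, assuming blackbox is a tabular blackbox already loaded as above; the resource attribute name is blackbox-specific and "hp_epoch" is only an example.
from syne_tune.blackbox_repository.utils import metrics_for_configuration

config = blackbox.all_configurations()[0]
# Full learning curve (one result dict per fidelity) for this configuration:
curve = metrics_for_configuration(
    blackbox=blackbox, config=config, resource_attr="hp_epoch"
)
print(len(curve), curve[0])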
syne_tune.callbacks package
- class syne_tune.callbacks.TensorboardCallback(ignore_metrics=None, target_metric=None, mode=None, log_hyperparameters=True)[source]
Bases:
TunerCallback
Logs relevant metrics reported from trial evaluations, so they can be visualized with Tensorboard.
- Parameters:
ignore_metrics (
Optional
[List
[str
]]) – Defines which metrics should be ignored. If None, all metrics are reported to Tensorboard.target_metric (
Optional
[str
]) – Defines the metric we aim to optimize. If this argument is set, we report the cumulative optimum of this metric as well as the optimal hyperparameters we have found so far.mode (
Optional
[str
) – Determines whether we maximize (“max”) or minimize (“min”) the target metric.log_hyperparameters (
bool
) – If set to True, we also log all hyperparameters specified in the configuration space.
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
Submodules
syne_tune.callbacks.hyperband_remove_checkpoints_callback module
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.TrialStatus[source]
Bases:
object
- RUNNING = 'RUNNING'
- PAUSED_WITH_CHECKPOINT = 'PAUSED-WITH-CP'
- PAUSED_NO_CHECKPOINT = 'PAUSED-NO-CP'
- STOPPED_OR_COMPLETED = 'STOPPED-COMPLETED'
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.BetaBinomialEstimator(beta_mean, beta_size)[source]
Bases:
object
Estimator of the probability \(p = P(X = 1)\) for a variable \(X\) with Bernoulli distribution. This is using a Beta prior, which is conjugate to the binomial likelihood. The prior is parameterized by effective sample size
beta_size
(\(a + b\)) and meanbeta_mean
(\(a / (a + b)\)).- property num_one: int
- property num_total
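For reference, the standard Beta–Binomial posterior mean (a textbook fact, not specific to this class): with prior parameters \(a = \mathrm{beta\_mean} \cdot \mathrm{beta\_size}\) and \(b = (1 - \mathrm{beta\_mean}) \cdot \mathrm{beta\_size}\), after observing \(k\) ones among \(n\) datapoints, the estimate is \(\hat p = (a + k) / (a + b + n)\).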
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.TrialInformation(trial_id, level, rank, rung_len, score_val=None)[source]
Bases:
object
-
trial_id:
str
-
level:
int
-
rank:
int
-
rung_len:
int
-
score_val:
Optional
[float
] = None
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsCommon(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode)[source]
Bases:
TunerCallback
Common base class for
HyperbandRemoveCheckpointsCallback
andHyperbandRemoveCheckpointsBaselineCallback
.- property num_checkpoints_removed: int
- on_loop_end()[source]
Called at end of each tuning loop iteration
This is done before the loop stopping condition is checked and acted upon.
- on_trial_complete(trial, result)[source]
Called when a trial completes (
Status.completed
)The arguments here also have been passed to
scheduler.on_trial_complete
, before this call here.- Parameters:
trial (
Trial
) – Trial that just completed.result (
Dict
[str
,Any
]) – Last result obtained.
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
- on_start_trial(trial)[source]
Called just after a new trial is started
- Parameters:
trial (
Trial
) – Trial which has just been started
- on_resume_trial(trial)[source]
Called just after a trial is resumed
- Parameters:
trial (
Trial
) – Trial which has just been resumed
- trials_resumed_without_checkpoint()[source]
- Return type:
List
[Tuple
[str
,int
]]- Returns:
List of
(trial_id, level)
for trials which were resumed, even though their checkpoint was removed
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsCallback(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode, approx_steps=25, prior_beta_mean=0.33, prior_beta_size=2, min_data_at_rung=5)[source]
Bases:
HyperbandRemoveCheckpointsCommon
Implements speculative early removal of checkpoints of paused trials for
HyperbandScheduler
(only for types which pause trials at rung levels).In this scheduler, any paused trial can in principle be resumed in the future, which is why we remove checkpoints speculatively. The idea is to keep the total number of checkpoints no larger than
max_num_checkpoints
. If this limit is reached, we rank all currently paused trials which still have a checkpoint and remove checkpoints for those with lowest scores. If a trial is resumed whose checkpoint has been removed, we have to train from scratch, at a cost proportional to the rung level the trial is paused at. The score is an approximation to this expected cost, the product of rung level and probability of getting resumed. This probability depends on the current rung size, the rank of the trial in the rung, and both the time spent and remaining for the experiment, so we needmax_wallclock_time
. Details are given in a technical report.The probability of getting resumed also depends on the probability \(p_r\) that a new trial arriving at rung \(r\) ranks better than an existing paused one with a checkpoint. These probabilities are estimated here. For each new arrival at a rung, we obtain one datapoint for every paused trial with checkpoint there. We use Bayesian estimators with Beta prior given by mean
prior_beta_mean
and sample sizeprior_beta_size
. The mean should be \(< 1/2\). We also run an estimator for an overall probability \(p\), which is fed by all datapoints. This estimator is used as long as there are fewer than min_data_at_rung datapoints at rung \(r\).- Parameters:
max_num_checkpoints (
int
) – Once the total number of checkpoints surpasses this number, we remove some.max_wallclock_time (
int
) – Maximum time of the experimentmetric (
str
) – Name of metric inresult
ofon_trial_result()
resource_attr (
str
) – Name of resource attribute inresult
ofon_trial_result()
mode (
str
) – “min” or “max”approx_steps (
int
) – Number of approximation steps in score computation. Computations scale cubically in this number. Defaults to 25prior_beta_mean (
float
) – Parameter of Beta prior for estimators. Defaults to 0.33prior_beta_size (
float
) – Parameter of Beta prior for estimators. Defaults to 2min_data_at_rung (
int
) – See above. Defaults to 5
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
- class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsBaselineCallback(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode, baseline=None)[source]
Bases:
HyperbandRemoveCheckpointsCommon
Implements some simple baselines to compare with
HyperbandRemoveCheckpointsCallback
.- Parameters:
max_num_checkpoints (
int
) – Once the total number of checkpoints surpasses this number, we remove some.max_wallclock_time (
int
) – Maximum time of the experimentmetric (
str
) – Name of metric inresult
ofon_trial_result()
resource_attr (
str
) – Name of resource attribute inresult
ofon_trial_result()
mode (
str
) – “min” or “max”baseline (
Optional
[str
]) –Type of baseline. Defaults to “by_level”
”random”: Select a random paused trial with checkpoint
”by_level”: Select the paused trial (with checkpoint) on the lowest rung level, and among those the one of worst rank
syne_tune.callbacks.hyperband_remove_checkpoints_score module
- syne_tune.callbacks.hyperband_remove_checkpoints_score.compute_probabilities_of_getting_resumed(ranks, rung_lens, prom_quants, p_vals, time_ratio, approx_steps)[source]
Computes an approximation to the probability of getting resumed under our independence assumptions. This approximation improves with larger
approx_steps
, but its cost scales cubically in this number.- Parameters:
ranks (
ndarray
) – Ranks \(k\), starting from 1 (smaller is better)rung_lens (
ndarray
) – Rung lengths \(n_r\)prom_quants (
ndarray
) – Promotion quantiles \(\alpha_r\)p_vals (
ndarray
) – Probabilities \(p_r\)time_ratio (
float
) – Ratio \(\beta\) between time left and time spentapprox_steps (
int
) – Number of approximation steps, see above
- Return type:
ndarray
- Returns:
Approximations of probability to get resumed
syne_tune.callbacks.remove_checkpoints_callback module
- class syne_tune.callbacks.remove_checkpoints_callback.RemoveCheckpointsCallback[source]
Bases:
TunerCallback
This implements early removal of checkpoints of paused trials. In order for this to work, the scheduler needs to implement
trials_checkpoints_can_be_removed()
.
- class syne_tune.callbacks.remove_checkpoints_callback.DefaultRemoveCheckpointsSchedulerMixin[source]
Bases:
RemoveCheckpointsSchedulerMixin
Implements general case of
RemoveCheckpointsSchedulerMixin
, where the callback is of typeRemoveCheckpointsCallback
. This means scheduler has to implementtrials_checkpoints_can_be_removed()
.- trials_checkpoints_can_be_removed()[source]
Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning
SchedulerDecision.STOP
inon_trial_result()
), its checkpoint is removed anyway, so its ID does not have to be returned here.- Return type:
List
[int
]- Returns:
IDs of paused trials for which checkpoints can be removed
- callback_for_checkpoint_removal(stop_criterion)[source]
- Parameters:
stop_criterion (
Callable
[[TuningStatus
],bool
]) – Stopping criterion, as passed toTuner
- Return type:
Optional
[TunerCallback
]- Returns:
CP removal callback, or
None
if CP removal is not activated
syne_tune.callbacks.tensorboard_callback module
- class syne_tune.callbacks.tensorboard_callback.TensorboardCallback(ignore_metrics=None, target_metric=None, mode=None, log_hyperparameters=True)[source]
Bases:
TunerCallback
Logs relevant metrics reported from trial evaluations, so they can be visualized with Tensorboard.
- Parameters:
ignore_metrics (
Optional
[List
[str
]]) – Defines which metrics should be ignored. If None, all metrics are reported to Tensorboard.target_metric (
Optional
[str
]) – Defines the metric we aim to optimize. If this argument is set, we report the cumulative optimum of this metric as well as the optimal hyperparameters we have found so far.mode (
Optional
[str
) – Determines whether we maximize (“max”) or minimize (“min”) the target metric.log_hyperparameters (
bool
) – If set to True, we also log all hyperparameters specified in the configuration space.
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
syne_tune.experiments package
- class syne_tune.experiments.ExperimentResult(name, results, metadata, tuner, path)[source]
Bases:
object
Wraps results dataframe and provides retrieval services.
- Parameters:
-
name:
str
-
results:
DataFrame
-
metadata:
Dict
[str
,Any
]
-
path:
Path
- plot_hypervolume(metrics_to_plot=None, reference_point=None, figure_path=None, **plt_kwargs)[source]
Plot best hypervolume value as a function of wallclock time
- Parameters:
reference_point (
Optional
[ndarray
]) – Reference point for hypervolume calculations. If None, the maximum value of each metric is used.figure_path (
Optional
[str
]) – If specified, defines the path where the figure will be saved. If None, the figure is shownplt_kwargs – Arguments to
matplotlib.pyplot.plot()
- plot(metric_to_plot=0, figure_path=None, **plt_kwargs)[source]
Plot best metric value as a function of wallclock time
- Parameters:
metric_to_plot (
Union
[str
,int
]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0, the first metric defined.figure_path (
Optional
[str
]) – If specified, defines the path where the figure will be saved. If None, the figure is shownplt_kwargs – Arguments to
matplotlib.pyplot.plot()
- plot_trials_over_time(metric_to_plot=0, figure_path=None, figsize=None)[source]
Plot trial results as a function of wallclock time
- Parameters:
metric_to_plot (
Union
[str
,int
]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0, the first metric defined.figure_path (
Optional
[str
]) – If specified, defines the path where the figure will be saved. If None, the figure is shownfigsize – width and height of figure
- best_config(metric=0)[source]
Return the best config found for the specified metric.
- Parameters:
metric (Union[str, int]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0, the first metric defined in the Scheduler
- Return type:
Dict
[str
,Any
]- Returns:
Configuration corresponding to best metric value
- syne_tune.experiments.load_experiment(tuner_name, download_if_not_found=True, load_tuner=False, local_path=None, experiment_name=None)[source]
Load results from an experiment
- Parameters:
tuner_name (
str
) – Name of a tuning experiment previously rundownload_if_not_found (
bool
) – If True, fetch results from S3 if not found locallyload_tuner (
bool
) – Whether to load the tuner in addition to metadata and resultslocal_path (
Optional
[str
]) – Path containing the experiment to load. If not specified,~/{SYNE_TUNE_FOLDER}/
is used.experiment_name (
Optional
[str
]) – If given, this is used as first directory.
- Return type:
- Returns:
Result object
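A minimal sketch of loading and inspecting an experiment; the tuner name below is a placeholder for the name printed when your tuning experiment was launched.
from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment("train-height-2023-06-01-12-00-00-000")
print(tuning_experiment.best_config())
tuning_experiment.plot()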
- syne_tune.experiments.get_metadata(path_filter=None, root=PosixPath('/home/docs/syne-tune'))[source]
Load meta-data for a number of experiments
- Parameters:
path_filter (
Optional
[Callable
[[str
],bool
]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.root (
Path
) – Root path for experiment results. Default isexperiment_path()
- Return type:
Dict
[str
,dict
]- Returns:
Dictionary from tuner name to metadata dict
- syne_tune.experiments.list_experiments(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
List experiments for which results are found
- Parameters:
path_filter (
Optional
[Callable
[[str
],bool
]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.experiment_filter (
Optional
[Callable
[[ExperimentResult
],bool
]]) – Filter onExperimentResult
, optionalroot (
Path
) – Root path for experiment results. Default is result ofexperiment_path()
load_tuner (
bool
) – Whether to load the tuner in addition to metadata and results
- Return type:
List
[ExperimentResult
]- Returns:
List of result objects
- syne_tune.experiments.load_experiments_df(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
- Parameters:
path_filter (
Optional
[Callable
[[str
],bool
]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.experiment_filter (
Optional
[Callable
[[ExperimentResult
],bool
]]) – Filter onExperimentResult
root (
Path
) – Root path for experiment results. Default isexperiment_path()
load_tuner (
bool
) – Whether to load the tuner in addition to metadata and results
- Return type:
DataFrame
- Returns:
Dataframe that contains all evaluations reported by tuners according to the filter given. The columns contain trial-id, the hyperparameters evaluated, and the metrics reported via
Reporter
. These metrics are collected automatically:
st_worker_time (time spent in the worker when the report was seen)
time (wallclock time measured by the tuner)
decision (decision taken by the scheduler when observing the result)
status (status of the trial that was shown to the tuner)
config_{xx} (configuration value for the hyperparameter {xx})
tuner_name (name passed when instantiating the Tuner)
entry_point_name, entry_point_path (name and path of the entry point that was tuned)
- class syne_tune.experiments.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]
Bases:
object
This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files
ST_METADATA_FILENAME
for metadata, andST_RESULTS_DATAFRAME_FILENAME
for time-stamped results.There is one comparative plot per benchmark (aggregation of results across benchmarks are not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as function of wall-clock time. The plot can also have several subplots, in which case results are first grouped into subplot number, then setup.
If
benchmark_key is None
, there is only a single benchmark, and all results are merged together.Both setup name and subplot number (optional) can be configured by the user, as function of metadata written for each experiment. The functions
metadata_to_setup
andmetadata_to_subplot
(optional) can also be used for filtering: results of experiments for which any of them returnsNone
, are not used.When grouping results w.r.t. benchmark name and setup name, we should end up with
num_runs
experiments. These are (typically) random repetitions with different seeds. If after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend to check the completeness of result files. Common reasons:Less than
num_runs
experiments found. Experiments failed, or files were not properly synced.More than
num_runs
experiments found. This happens if initial experiments for the study failed, but ended up writing results. This can be fixed by either removing the result files, or by usingdatetime_bounds
(since initial failed experiments ran first).
Result files have the path
f"{experiment_path()}{ename}/{patt}/{ename}-*/"
, where patt
is fromwith_subdirs
, andename
fromexperiment_names
. The default iswith_subdirs="*"
. Ifwith_subdirs
isNone
, result files have the pathf"{experiment_path()}{ename}-*/"
. Use this if your experiments have been run locally.If
datetime_bounds
is given, it contains a tuple of strings(lower_time, upper_time)
, or a dictionary mapping names fromexperiment_names
to such tuples. Both strings are time-stamps in the formatST_DATETIME_FORMAT
(example: “2023-03-19-22-01-57”), and each can beNone
as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), whereNone
means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.If
metadata_keys
is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved withmetadata_values()
. In fact,metadata_values(benchmark_name)
returns a nested dictionary, whereresult[key][setup_name]
is a list of values. Ifmetadata_subplot_level
isTrue
andmetadata_to_subplot
is given, the result structure isresult[key][setup_name][subplot_no]
. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.- Parameters:
experiment_names (
Tuple
[str
,...
]) – Tuple of experiment names (prefixes, without the timestamps)setups (
Iterable
[str
]) – Possible values of setup namesnum_runs (
int
) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See abovemetadata_to_setup (
Union
[Callable
[[Dict
[str
,Any
]],Optional
[str
]],Dict
[str
,Callable
[[Dict
[str
,Any
]],Optional
[str
]]]]) – See aboveplot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Can be overwritten inplot()
. SeePlotParameters
metadata_to_subplot (
Optional
[Callable
[[Dict
[str
,Any
]],Optional
[int
]]]) – See above. Optionalbenchmark_key (
Optional
[str
]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this isNone
, there is only a single benchmark, and all results are merged togetherwith_subdirs (
Union
[str
,List
[str
],None
]) – See above. Defaults to “*”datetime_bounds (
Union
[Tuple
[Optional
[str
],Optional
[str
]],Dict
[str
,Tuple
[Optional
[str
],Optional
[str
]]],None
]) – See abovemetadata_keys (
Optional
[List
[str
]]) – See abovemetadata_subplot_level (
bool
) – See above. Defaults toFalse
download_from_s3 (
bool
) – Should result files be downloaded from S3? This is supported only ifwith_subdirs
s3_bucket (
Optional
[str
]) – Only ifdownload_from_s3 == True
. If not given, the default bucket for the SageMaker session is used
- metadata_values(benchmark_name=None)[source]
The nested dictionary returned has the structure
result[key][setup_name]
, orresult[key][setup_name][subplot_no]
ifmetadata_subplot_level == True
.- Parameters:
benchmark_name (
Optional
[str
]) – Name of benchmark- Return type:
Dict
[str
,Any
]- Returns:
Nested dictionary with meta-data values
- plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]
Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).

If plot_params.show_init_trials is given, the best metric value curve for the data from trials <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance for one particular trial, for example the initial configuration (i.e., to show how much this can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.

If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].

If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.
- Parameters:
benchmark_name (
Optional
[str
]) – Name of benchmark for which to plot results. Not needed if there is only one benchmarkplot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.file_name (
Optional
[str
]) – If given, the figure is stored in a file of this nameextra_results_keys (
Optional
[List
[str
]]) – See above, optionaldataframe_column_generator (
Optional
[Callable
[[DataFrame
],Series
]]) – See above, optionalone_result_per_trial (
bool
) – IfTrue
, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.
- Return type:
Dict
[str
,Any
]- Returns:
Dictionary with “fig”, “axs” (for further processing). If extra_results_keys is given, there is an additional “extra_results” entry as stated above
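A hedged usage sketch, reusing the results object from the construction sketch further above; the benchmark and file names are placeholders:

    out = results.plot(
        benchmark_name="fcnet-protein",               # placeholder benchmark name
        plot_params=PlotParameters(ylim=(0.2, 0.5)),  # overrides values set at construction
        file_name="comparison-fcnet-protein.png",     # figure is also written to this file
    )
    fig, axs = out["fig"], out["axs"]  # returned for further processing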
- class syne_tune.experiments.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]
Bases:
object
Parameters specifying the figure.
If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".
- Parameters:
metric (
Optional
[str
]) – Name of metric, mandatorymode (
Optional
[str
]) – See above, “min” or “max”. Defaults to “min” if not giventitle (
Optional
[str
]) – Title of plot. Ifsubplots
is used, seeSubplotParameters
xlabel (
Optional
[str
]) – Label for x axis. Ifsubplots
is used, this is printed below each column. Defaults toDEFAULT_XLABEL
ylabel (
Optional
[str
]) – Label for y axis. Ifsubplots
is used, this is printed left of each rowxlim (
Optional
[Tuple
[float
,float
]]) –(x_min, x_max)
for x axis. Ifsubplots
is used, seeSubplotParameters
ylim (
Optional
[Tuple
[float
,float
]]) –(y_min, y_max)
for y axis.metric_multiplier (
Optional
[float
]) – See above. Defaults to 1convert_to_min (
Optional
[bool
]) – See above. Defaults toTrue
tick_params (
Optional
[Dict
[str
,Any
]]) – Params forax.tick_params
aggregate_mode (
Optional
[str
]) –How are values across seeds aggregated?
“mean_and_ci”: Mean and 0.95 normal confidence interval
“median_percentiles”: Median and 25, 75 percentiles
“iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate
Defaults to
DEFAULT_AGGREGATE_MODE
dpi (
Optional
[int
]) – Resolution of figure in DPI. Defaults to 200grid (
Optional
[bool
]) – Figure with grid? Defaults toFalse
subplots (
Optional
[SubplotParameters
]) – If given, the figure consists of several subplots. SeeSubplotParameters
show_init_trials (
Optional
[ShowTrialParameters
]) – SeeShowTrialParameters
-
metric:
str
= None
-
mode:
str
= None
-
title:
str
= None
-
xlabel:
str
= None
-
ylabel:
str
= None
-
xlim:
Tuple
[float
,float
] = None
-
ylim:
Tuple
[float
,float
] = None
-
metric_multiplier:
float
= None
-
convert_to_min:
bool
= None
-
tick_params:
Dict
[str
,Any
] = None
-
aggregate_mode:
str
= None
-
dpi:
int
= None
-
grid:
bool
= None
-
subplots:
SubplotParameters
= None
-
show_init_trials:
ShowTrialParameters
= None
- class syne_tune.experiments.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]
Bases:
object
Parameters specifying an arrangement of subplots. kwargs is mandatory.
- Parameters:
nrows (
Optional
[int
]) – Number of rows of subplot matrixncols (
Optional
[int
]) – Number of columns of subplot matrixtitles (
Optional
[List
[str
]]) – If given, these are titles for each column in the arrangement of subplots. Iftitle_each_figure == True
, these are titles for each subplot. Iftitles
is not given, thenPlotParameters.title
is printed on top of the leftmost columntitle_each_figure (
Optional
[bool
]) – Seetitles
, defaults toFalse
kwargs (
Optional
[Dict
[str
,Any
]]) – Extra arguments forplt.subplots
, apart from “nrows” and “ncols”legend_no (
Optional
[List
[int
]]) – Subplot indices where legend is to be shown. Defaults to[]
(no legends shown). This is not relative tosubplot_indices
xlims (
Optional
[List
[int
]]) – If this is given, must be a list with one entry per subfigure. In this case, the globalxlim
is overwritten by(0, xlims[subplot_no])
. Ifsubplot_indices
is given,xlims
must have the same length, andxlims[j]
refers to subplot indexsubplot_indices[j]
thensubplot_indices (
Optional
[List
[int
]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …
-
nrows:
int
= None
-
ncols:
int
= None
-
titles:
List
[str
] = None
-
title_each_figure:
bool
= None
-
kwargs:
Dict
[str
,Any
] = None
-
legend_no:
List
[int
] = None
-
xlims:
List
[int
] = None
-
subplot_indices:
List
[int
] = None
- class syne_tune.experiments.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]
Bases:
object
Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot where setup_name features. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.
- Parameters:
setup_name (
Optional
[str
]) – Setup from which the trial performance is takentrial_id (
Optional
[int
]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs<= trial_id
are shownnew_setup_name (
Optional
[str
]) – Name of the additional curve in legends
-
setup_name:
str
= None
-
trial_id:
int
= None
-
new_setup_name:
str
= None
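The three parameter classes above are typically combined into a single PlotParameters object. A minimal sketch, in which the metric name, limits, titles, and setup name are illustrative assumptions:

    from syne_tune.experiments import (
        PlotParameters,
        ShowTrialParameters,
        SubplotParameters,
    )

    plot_params = PlotParameters(
        metric="validation_error",   # assumed metric column name
        mode="min",
        xlim=(0, 3600),              # one hour of wall-clock time
        aggregate_mode="iqm_bootstrap",
        grid=True,
        # Two subplots side by side; "kwargs" is passed on to plt.subplots
        subplots=SubplotParameters(
            nrows=1,
            ncols=2,
            kwargs=dict(figsize=(10, 4)),
            titles=["4 workers", "8 workers"],  # illustrative column titles
            legend_no=[0],                      # legend only in the first subplot
        ),
        # Extra curve contrasting results against the initial configuration
        show_init_trials=ShowTrialParameters(
            setup_name="ASHA",                  # assumed setup name
            trial_id=0,
            new_setup_name="initial configuration",
        ),
    )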
- class syne_tune.experiments.TrialsOfExperimentResults(experiment_names, setups, metadata_to_setup, plot_params=None, multi_fidelity_params=None, benchmark_key='benchmark', seed_key='seed', with_subdirs='*', datetime_bounds=None, download_from_s3=False, s3_bucket=None)[source]
Bases:
object
This class loads, processes, and plots metric results for single experiments, where the curves for different trials have different colours.
Compared to ComparativeResults, each subfigure uses data from a single experiment (one benchmark, one seed, one setup). Both benchmark and seed need to be chosen in plot(). If there are different setups, they give rise to subfigures.

If plot_params.subplots is not given, the arrangement is one row with columns corresponding to setups, and setup names as titles. Specify plot_params.subplots in order to change this arrangement (e.g., to have more than one row). Setups can be selected by using plot_params.subplots.subplot_indices. Also, if plot_params.subplots.titles is not given, we use setup names, and each subplot gets its own title (plot_params.subplots.title_each_figure is ignored).

For plot_params, we use the same PlotParameters as in ComparativeResults, but some fields are not used here (title, aggregate_mode, show_one_trial, subplots.legend_no, subplots.xlims).
- Parameters:
experiment_names (
Tuple
[str
,...
]) – Tuple of experiment names (prefixes, without the timestamps)setups (
Iterable
[str
]) – Possible values of setup namesmetadata_to_setup (
Union
[Callable
[[Dict
[str
,Any
]],Optional
[str
]],Dict
[str
,Callable
[[Dict
[str
,Any
]],Optional
[str
]]]]) – See aboveplot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Can be overwritten inplot()
. SeePlotParameters
multi_fidelity_params (
Optional
[MultiFidelityParameters
]) – If given, we use a special variant tailored to multi-fidelity methods (seeplot()
).benchmark_key (
Optional
[str
]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this isNone
, there is only a single benchmark, and all results are merged togetherseed_key (
str
) – Key for seed in metadata files. Defaults to “seed”.with_subdirs (
Union
[str
,List
[str
],None
]) – See above. Defaults to “*”datetime_bounds (
Union
[Tuple
[Optional
[str
],Optional
[str
]],Dict
[str
,Tuple
[Optional
[str
],Optional
[str
]]],None
]) – See abovedownload_from_s3 (
bool
) – Should result files be downloaded from S3? This is supported only ifwith_subdirs
s3_bucket (
Optional
[str
]) – Only ifdownload_from_s3 == True
. If not given, the default bucket for the SageMaker session is used
- plot(benchmark_name=None, seed=0, plot_params=None, file_name=None)[source]
Creates a plot, whose subfigures show metric data from single experiments. In general:
Each trial has its own color, which is cycled through periodically. The cycling depends on the largest rung level for the trial. This is to avoid neighboring curves having the same color
For single-fidelity methods (default, multi_fidelity_params not given): The learning curve for a trial ends with ‘o’. If it reports only once at the end, this is all that is shown for the trial
For multi-fidelity methods:
Learning curves are plotted in contiguous chunks of execution. For pause-and-resume setups (those in multi_fidelity_params.pause_resume_setups), they are interrupted. Each chunk starts at the epoch after resume and ends at the epoch where the trial is paused
Values at rung levels are marked as ‘o’. If this is the furthest the trial got to, the marker is ‘D’ (diamond)
Results for different setups are plotted as subfigures, either using the setup in plot_params.subplots, or as columns of a single row.
- Parameters:
benchmark_name (
Optional
[str
]) – Name of benchmark for which to plot results. Not needed if there is only one benchmarkseed (
int
) – Seed number. Defaults to 0plot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.file_name (
Optional
[str
]) – If given, the figure is stored in a file of this name
- class syne_tune.experiments.MultiFidelityParameters(rung_levels, multifidelity_setups)[source]
Bases:
object
Parameters configuring the multi-fidelity version of TrialsOfExperimentResults. multifidelity_setups contains names of setups which are multi-fidelity, the remaining ones are single-fidelity. It can also be a dictionary, mapping a multi-fidelity setup name to True if this is a pause-and-resume method (these are visualized differently), False otherwise (early stopping method).
- Parameters:
rung_levels (
List
[int
]) – See above. Positive integers, increasingmultifidelity_setups (
Union
[List
[str
],Dict
[str
,bool
]]) – See above
-
rung_levels:
List
[int
]
-
multifidelity_setups:
Union
[List
[str
],Dict
[str
,bool
]]
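A hedged sketch combining the two classes above; the experiment tag, rung levels, setup names, and benchmark name are assumptions for illustration:

    from syne_tune.experiments import (
        MultiFidelityParameters,
        PlotParameters,
        TrialsOfExperimentResults,
    )

    multi_fidelity_params = MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 81],  # assumed rung levels (in epochs)
        # Early-stopping setup mapped to False, pause-and-resume setup to True
        multifidelity_setups={"ASHA-STOP": False, "ASHA-PROM": True},
    )

    per_trial = TrialsOfExperimentResults(
        experiment_names=("docs-demo-1",),  # placeholder experiment tag
        setups=["ASHA-STOP", "ASHA-PROM"],
        metadata_to_setup=lambda metadata: metadata.get("algorithm"),
        plot_params=PlotParameters(metric="validation_error", mode="min"),
        multi_fidelity_params=multi_fidelity_params,
    )
    per_trial.plot(
        benchmark_name="fcnet-protein",  # placeholder benchmark name
        seed=0,
        file_name="trials-fcnet-protein.png",
    )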
- syne_tune.experiments.hypervolume_indicator_column_generator(metrics_and_modes, reference_point=None, increment=1)[source]
Returns a generator for a new dataframe column containing the best hypervolume indicator as a function of wall-clock time, based on the metrics in metrics_and_modes (metric names correspond to column names in the dataframe). For a metric with mode == "max", we use its negative.

This mapping is used to create the dataframe_column_generator argument of plot(). Since the current implementation is not incremental and quite slow, if you plot results for single-fidelity HPO methods, it is strongly recommended to also use one_result_per_trial=True:

    results = ComparativeResults(...)
    dataframe_column_generator = hypervolume_indicator_column_generator(
        metrics_and_modes
    )
    plot_params = PlotParameters(
        metric="hypervolume_indicator",
        mode="max",
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        dataframe_column_generator=dataframe_column_generator,
        one_result_per_trial=True,
    )
- Parameters:
metrics_and_modes (
List
[Tuple
[str
,str
]]) – List of(metric, mode)
, see abovereference_point (
Optional
[ndarray
]) – Reference point for hypervolume computation. If not given, a default value is usedincrement (
int
) – If> 1
, the HV indicator is linearly interpolated, this is faster. Defaults to 1 (no interpolation)
- Returns:
Dataframe column generator
Subpackages
syne_tune.experiments.benchmark_definitions package
Submodules
syne_tune.experiments.benchmark_definitions.common module
- class syne_tune.experiments.benchmark_definitions.common.SurrogateBenchmarkDefinition(max_wallclock_time, n_workers, elapsed_time_attr, metric, mode, blackbox_name, dataset_name, max_num_evaluations=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, max_resource_attr=None, datasets=None, fidelities=None, points_to_evaluate=None)[source]
Bases:
object
Meta-data for a tabulated benchmark, served by the blackbox repository.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note
In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). They are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.
- Parameters:
max_wallclock_time (
float
) – Default value for stopping criterionn_workers (
int
) – Default value for tunerelapsed_time_attr (
str
) – Name of metric reportedmetric (
Union
[str
,List
[str
]]) – Name of metric reported (or list of several)mode (
Union
[str
,List
[str
]]) – “max” or “min” (or list of several)blackbox_name (
str
) – Name of blackbox, seeload_blackbox()
dataset_name (
str
) – Dataset (or instance) for blackboxmax_num_evaluations (
Optional
[int
]) – Default value for stopping criterionsurrogate (
Optional
[str
]) – Default value for surrogate to be used, seemake_surrogate()
. Otherwise: use no surrogatesurrogate_kwargs (
Optional
[dict
]) – Default value for arguments of surrogate, seemake_surrogate()
add_surrogate_kwargs (
Optional
[dict
]) – Arguments passed toadd_surrogate()
. Optional.max_resource_attr (
Optional
[str
]) – Internal name between backend and schedulerdatasets (
Optional
[List
[str
]]) – Used in transfer tuningfidelities (
Optional
[List
[int
]]) – If given, this is a strictly increasing subset of the fidelity values provided by the surrogate, and only those will be reportedpoints_to_evaluate (
Optional
[List
[Dict
[str
,Any
]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice serving this default configuration here.
-
max_wallclock_time:
float
-
n_workers:
int
-
elapsed_time_attr:
str
-
metric:
Union
[str
,List
[str
]]
-
mode:
Union
[str
,List
[str
]]
-
blackbox_name:
str
-
dataset_name:
str
-
max_num_evaluations:
Optional
[int
] = None
-
surrogate:
Optional
[str
] = None
-
surrogate_kwargs:
Optional
[dict
] = None
-
add_surrogate_kwargs:
Optional
[dict
] = None
-
max_resource_attr:
Optional
[str
] = None
-
datasets:
Optional
[List
[str
]] = None
-
fidelities:
Optional
[List
[int
]] = None
-
points_to_evaluate:
Optional
[List
[Dict
[str
,Any
]]] = None
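A hedged sketch of such a definition; the blackbox, dataset, and attribute names below are illustrative placeholders to be matched against what your blackbox repository actually provides:

    from syne_tune.experiments.benchmark_definitions.common import (
        SurrogateBenchmarkDefinition,
    )

    benchmark = SurrogateBenchmarkDefinition(
        max_wallclock_time=3600,
        n_workers=4,
        elapsed_time_attr="metric_elapsed_time",  # assumed attribute name
        metric="metric_valid_loss",               # assumed metric name
        mode="min",
        blackbox_name="fcnet",                    # assumed blackbox name
        dataset_name="protein_structure",         # assumed dataset name
        max_resource_attr="epochs",
    )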
- class syne_tune.experiments.benchmark_definitions.common.RealBenchmarkDefinition(script, config_space, max_wallclock_time, n_workers, instance_type, metric, mode, max_resource_attr, framework, resource_attr=None, estimator_kwargs=None, max_num_evaluations=None, points_to_evaluate=None)[source]
Bases:
object
Meta-data for a real benchmark, given by code.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note
In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). They are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.
- Parameters:
script (
Path
) – Absolute filename of training scriptconfig_space (
Dict
[str
,Any
]) – Default value for configuration space, must includemax_resource_attr
max_wallclock_time (
float
) – Default value for stopping criterionn_workers (
int
) – Default value for tunerinstance_type (
str
) – Default value for instance typemetric (
str
) – Name of metric reported (or list of several)mode (
str
) – “max” or “min” (or list of several)max_resource_attr (
str
) – Name ofconfig_space
entryframework (
str
) – SageMaker framework to be used forscript
. Additional dependencies inrequirements.txt
inscript.parent
resource_attr (Optional[str]) – Name of attribute reported (required for multi-fidelity)
estimator_kwargs (
Optional
[dict
]) – Additional arguments to SageMaker estimator, e.g.framework_version
max_num_evaluations (
Optional
[int
]) – Default value for stopping criterionpoints_to_evaluate (
Optional
[List
[Dict
[str
,Any
]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice serving this default configuration here.
-
script:
Path
-
config_space:
Dict
[str
,Any
]
-
max_wallclock_time:
float
-
n_workers:
int
-
instance_type:
str
-
metric:
str
-
mode:
str
-
max_resource_attr:
str
-
framework:
str
-
resource_attr:
Optional
[str
] = None
-
estimator_kwargs:
Optional
[dict
] = None
-
max_num_evaluations:
Optional
[int
] = None
-
points_to_evaluate:
Optional
[List
[Dict
[str
,Any
]]] = None
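A hedged sketch of a real benchmark definition; the script path, configuration space, instance type, and metric name are placeholders to be adapted to your own training code:

    from pathlib import Path

    from syne_tune.config_space import loguniform, randint
    from syne_tune.experiments.benchmark_definitions.common import (
        RealBenchmarkDefinition,
    )

    benchmark = RealBenchmarkDefinition(
        script=Path("training_script.py").absolute(),  # placeholder training script
        config_space={
            "epochs": 27,  # the entry named by max_resource_attr must be present
            "learning_rate": loguniform(1e-5, 1e-1),
            "batch_size": randint(8, 256),
        },
        max_wallclock_time=3600,
        n_workers=4,
        instance_type="ml.g4dn.xlarge",  # placeholder instance type
        metric="accuracy",               # assumed metric reported by the script
        mode="max",
        max_resource_attr="epochs",
        framework="PyTorch",
    )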
syne_tune.experiments.benchmark_definitions.fcnet module
syne_tune.experiments.benchmark_definitions.lcbench module
- syne_tune.experiments.benchmark_definitions.lcbench.lcbench_benchmark(dataset_name, datasets=None)[source]
The default is to use nearest neighbour regression with K=1. If you use a more sophisticated surrogate, it is recommended to also define add_surrogate_kwargs, for example:

    surrogate="RandomForestRegressor",
    add_surrogate_kwargs={
        "predict_curves": True,
        "fit_differences": ["time"],
    },
- Parameters:
dataset_name (
str
) – Value fordataset_name
datasets – Used for transfer learning
- Return type:
- Returns:
Definition of benchmark
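A brief usage sketch; the dataset name passed below is an assumed LCBench instance name, not a value prescribed by this function:

    from syne_tune.experiments.benchmark_definitions.lcbench import (
        lcbench_benchmark,
    )

    # Returns a SurrogateBenchmarkDefinition configured for this LCBench instance
    benchmark = lcbench_benchmark("Fashion-MNIST")  # assumed dataset name
    print(benchmark.blackbox_name, benchmark.metric, benchmark.mode)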
syne_tune.experiments.benchmark_definitions.nas201 module
syne_tune.experiments.benchmark_definitions.yahpo module
syne_tune.experiments.launchers package
Submodules
syne_tune.experiments.launchers.hpo_main_common module
- class syne_tune.experiments.launchers.hpo_main_common.Parameter(name, type, help, default, required=False)[source]
Bases:
object
-
name:
str
-
type:
Any
-
help:
str
-
default:
Any
-
required:
bool
= False
- class syne_tune.experiments.launchers.hpo_main_common.ConfigDict(**kwargs)[source]
Bases:
object
Dictionary with arguments for launcher scripts. Expected params as Parameter(name, type, default value)
- check_if_all_paremeters_present(desired_parameters)[source]
Verify that all the parameters present in desired_parameters can be found in this ConfigDict
- extra_parameters()[source]
Return all parameters beyond those required. Required are the defaults and those requested in argparse
- Return type:
List
[Dict
[str
,Any
]]
- expand_base_arguments(extra_base_arguments)[source]
Expand the list of base arguments for this experiment with those in extra_base_arguments
- static from_argparse(extra_args=None)[source]
Build the configuration dict from command line arguments
- Parameters:
extra_args (
Optional
[List
[Dict
[str
,Any
]]]) – Extra arguments for command line parser. Optional- Return type:
- syne_tune.experiments.launchers.hpo_main_common.get_metadata(seed, method, experiment_tag, benchmark_name, random_seed, max_size_data_for_model=None, benchmark=None, extra_metadata=None)[source]
Returns default value for metadata passed to Tuner.
- Parameters:
seed (
int
) – Seed of repetitionmethod (
str
) – Name of methodexperiment_tag (
str
) – Tag of experimentbenchmark_name (
str
) – Name of benchmarkrandom_seed (
int
) – Master random seedmax_size_data_for_model (
Optional
[int
]) – Limits number of datapoints for surrogate model of BO, MOBSTER or HyperTunebenchmark (
Union
[SurrogateBenchmarkDefinition
,RealBenchmarkDefinition
,None
]) – Optional. Take n_workers, max_wallclock_time from there
extra_metadata (
Optional
[Dict
[str
,Any
]]) –metadata
updated by these at the end. Optional
- Return type:
Dict
[str
,Any
]- Returns:
Default
metadata
dictionary
- syne_tune.experiments.launchers.hpo_main_common.extra_metadata(args, extra_args)[source]
- Return type:
Dict
[str
,Any
]
syne_tune.experiments.launchers.hpo_main_local module
- syne_tune.experiments.launchers.hpo_main_local.get_benchmark(configuration, benchmark_definitions, **benchmark_kwargs)[source]
If configuration.benchmark is None and benchmark_definitions maps to a single benchmark, configuration.benchmark is set to its key.
- Return type:
- syne_tune.experiments.launchers.hpo_main_local.create_objects_for_tuner(configuration, methods, method, benchmark, master_random_seed, seed, verbose, extra_tuning_job_metadata=None, map_method_args=None, extra_results=None, num_gpus_per_trial=1)[source]
- Return type:
Dict
[str
,Any
]
- syne_tune.experiments.launchers.hpo_main_local.start_experiment_local_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None)[source]
Runs a sequence of experiments with the local backend sequentially. The loop runs over methods selected from methods and repetitions. map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows for extra flexibility to specify specific arguments for chosen methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline.

Note
When this is launched remotely as entry point of a SageMaker training job (command line --launched_remotely 1), the backend is configured to write logs and checkpoints to a directory which is not synced to S3. This is different to the tuner path, which is “/opt/ml/checkpoints”, so that tuning results are synced to S3. Syncing checkpoints to S3 is not recommended (it is slow and can lead to failures, since several worker processes write to the same synced directory).
- Parameters:
configuration (
ConfigDict
) – ConfigDict with parameters of the experiment. Must contain all parameters from LOCAL_BACKEND_EXTRA_PARAMETERSmethods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructors.benchmark_definitions (
Callable
[...
,Dict
[str
,RealBenchmarkDefinition
]]) – Definitions of benchmarks; one is selected from command line argumentsextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframemap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above, optionalextra_tuning_job_metadata (
Optional
[Dict
[str
,Any
]]) – Metadata added to the tuner, can be used to manage results
- syne_tune.experiments.launchers.hpo_main_local.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None)[source]
Runs a sequence of experiments with the local backend sequentially. The loop runs over methods selected from methods and repetitions, both controlled by command line arguments. map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.
- Parameters:
methods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructorsbenchmark_definitions (
Callable
[...
,Dict
[str
,RealBenchmarkDefinition
]]) – Definitions of benchmarks; one is selected from command line argumentsextra_args (
Optional
[List
[Dict
[str
,Any
]]]) – Extra arguments for command line parser. Optionalmap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above, optionalextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframe
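A typical study-specific launcher script built on this entry point is sketched below; the imported baselines and benchmark definition modules are hypothetical parts of your own study, not of Syne Tune:

    # hpo_main.py -- hedged sketch of a study-specific launcher script
    from syne_tune.experiments.launchers.hpo_main_local import main

    # Hypothetical study modules: `methods` maps baseline names to
    # Callable[[MethodArguments], TrialScheduler], and `benchmark_definitions`
    # maps benchmark names to RealBenchmarkDefinition (see above)
    from baselines import methods                             # hypothetical module
    from benchmark_definitions import benchmark_definitions   # hypothetical module

    if __name__ == "__main__":
        main(methods, benchmark_definitions)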
syne_tune.experiments.launchers.hpo_main_sagemaker module
- syne_tune.experiments.launchers.hpo_main_sagemaker.start_experiment_sagemaker_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None)[source]
Runs an experiment with the SageMaker backend.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows for extra flexibility to specify specific arguments for chosen methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline.
- Parameters:
configuration (
ConfigDict
) – ConfigDict with parameters of the experiment. Must contain all parameters from SAGEMAKER_BACKEND_EXTRA_PARAMETERSmethods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructors.benchmark_definitions (
Callable
[...
,Dict
[str
,RealBenchmarkDefinition
]]) – Definitions of benchmarks; one is selected from command line argumentsextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframemap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above, optionalextra_tuning_job_metadata (
Optional
[Dict
[str
,Any
]]) – Metadata added to the tuner, can be used to manage results
- syne_tune.experiments.launchers.hpo_main_sagemaker.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None)[source]
Runs an experiment with the SageMaker backend.

Command line arguments must specify a single benchmark, method, and seed; for example --method ASHA --num_seeds 5 --start_seed 4 starts an experiment with seed=4, or --method ASHA --num_seeds 1 starts an experiment with seed=0. Here, ASHA must be a key in methods.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.
- Parameters:
methods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructorsbenchmark_definitions (
Callable
[...
,Dict
[str
,RealBenchmarkDefinition
]]) – Definitions of benchmark; one is selected from command line argumentsextra_args (
Optional
[List
[Dict
[str
,Any
]]]) – Extra arguments for command line parser. Optionalmap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above. Needed ifextra_args
is givenextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframe
syne_tune.experiments.launchers.hpo_main_simulator module
- syne_tune.experiments.launchers.hpo_main_simulator.is_dict_of_dict(benchmark_definitions)[source]
- Return type:
bool
- syne_tune.experiments.launchers.hpo_main_simulator.get_transfer_learning_evaluations(blackbox_name, test_task, datasets, n_evals=None)[source]
- Parameters:
blackbox_name (str) – name of blackbox
test_task (str) – task where the performance would be tested, it is excluded from transfer-learning evaluations
datasets (Optional[List[str]]) – subset of datasets to consider, only evaluations from those datasets are provided to transfer-learning methods. If none, all datasets are used
n_evals (Optional[int]) – maximum number of evaluations to be returned
- Return type:
Dict[str, Any]
- syne_tune.experiments.launchers.hpo_main_simulator.start_experiment_simulated_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None, use_transfer_learning=False)[source]
Runs a sequence of experiments with the simulator backend sequentially. The loop runs over methods selected from methods, repetitions, and benchmarks selected from benchmark_definitions. map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows for extra flexibility to specify specific arguments for chosen methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline.
- Parameters:
configuration (
ConfigDict
) – ConfigDict with parameters of the experiment. Must contain all parameters from LOCAL_LOCAL_SIMULATED_BENCHMARK_REQUIRED_PARAMETERSmethods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructors.benchmark_definitions (
Union
[Dict
[str
,SurrogateBenchmarkDefinition
],Dict
[str
,Dict
[str
,SurrogateBenchmarkDefinition
]]]) – Definitions of benchmarks; one is selected from command line argumentsextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframemap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above, optionalextra_tuning_job_metadata (
Optional
[Dict
[str
,Any
]]) – Metadata added to the tuner, can be used to manage resultsuse_transfer_learning (
bool
) – If True, we use transfer tuning. Defaults to False
- syne_tune.experiments.launchers.hpo_main_simulator.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None, use_transfer_learning=False)[source]
Runs a sequence of experiments with the simulator backend sequentially. The loop runs over methods selected from methods, repetitions, and benchmarks selected from benchmark_definitions, with the range being controlled by command line arguments. map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.
- Parameters:
methods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructorsbenchmark_definitions (
Union
[Dict
[str
,SurrogateBenchmarkDefinition
],Dict
[str
,Dict
[str
,SurrogateBenchmarkDefinition
]]]) – Definitions of benchmarksextra_args (
Optional
[List
[Dict
[str
,Any
]]]) – Extra arguments for command line parser. Optionalmap_method_args (
Optional
[Callable
[[ConfigDict
,str
,Dict
[str
,Any
]],Dict
[str
,Any
]]]) – See above. Needed ifextra_args
givenextra_results (
Optional
[ExtraResultsComposer
]) – If given, this is used to append extra information to the results dataframeuse_transfer_learning (
bool
) – If True, we use transfer tuning. Defaults to False
syne_tune.experiments.launchers.launch_remote_common module
- syne_tune.experiments.launchers.launch_remote_common.sagemaker_estimator_args(entry_point, experiment_tag, tuner_name, benchmark=None, sagemaker_backend=False, source_dependencies=None)[source]
Returns SageMaker estimator keyword arguments for remote tuning job.
Note: We switch off the SageMaker profiler and debugger, as they are not needed, consume extra resources, and may introduce instabilities.
- Parameters:
entry_point (
Path
) – Script for running HPO experiment, used forentry_point
andsource_dir
argumentsexperiment_tag (
str
) – Tag of experiment, used to createcheckpoint_s3_uri
tuner_name (
str
) – Name of tuner, used to createcheckpoint_s3_uri
benchmark (
Union
[SurrogateBenchmarkDefinition
,RealBenchmarkDefinition
,None
]) – Benchmark definition, optionalsagemaker_backend (
bool
) – Is remote tuning job running the SageMaker backend? If not, it either runs local or simulator backend. Defaults toFalse
source_dependencies (
Optional
[List
[str
]]) – If given, these are additional source dependencies passed to the SageMaker estimator
- Return type:
Dict
[str
,Any
]- Returns:
Keyword arguments for SageMaker estimator
- syne_tune.experiments.launchers.launch_remote_common.fit_sagemaker_estimator(backoff_wait_time, estimator, ntimes_resource_wait=100, **kwargs)[source]
Runs estimator.fit(**kwargs). If backoff_wait_time > 0, we make sure that if fit fails with ClientError of type “ResourceLimitExceeded”, we wait for backoff_wait_time seconds and try again (up to ntimes_resource_wait times).

If backoff_wait_time <= 0, the call of fit is not wrapped.
- Parameters:
backoff_wait_time (
int
) – See above.estimator (
EstimatorBase
) – SageMaker estimator to callfit
forntimes_resource_wait (
int
) – Maximum number of retrieskwargs – Arguments for
estimator.fit
syne_tune.experiments.launchers.launch_remote_local module
syne_tune.experiments.launchers.launch_remote_sagemaker module
syne_tune.experiments.launchers.launch_remote_simulator module
- syne_tune.experiments.launchers.launch_remote_simulator.get_hyperparameters(seed, method, experiment_tag, random_seed, configuration)[source]
Compose hyperparameters for SageMaker training job
- Parameters:
seed (
int
) – Seed of repetitionmethod (
str
) – Method nameexperiment_tag (
str
) – Tag of experimentrandom_seed (
int
) – Master random seedconfiguration (
ConfigDict
) – Configuration for the job
- Return type:
Dict
[str
,Any
]- Returns:
Dictionary of hyperparameters
- syne_tune.experiments.launchers.launch_remote_simulator.launch_remote(entry_point, methods, benchmark_definitions, source_dependencies=None, extra_args=None, is_expensive_method=None)[source]
Launches a sequence of SageMaker training jobs, each running an experiment with the simulator backend.

The loop runs over methods selected from methods. Different repetitions (seeds) are run sequentially in the remote job. However, if is_expensive_method(method_name) is true, we launch different remote jobs for every seed for this particular method. This is to cater for methods which are themselves expensive to run (e.g., involving Gaussian process based Bayesian optimization).

If benchmark_definitions is a single-level dictionary and no benchmark is selected on the command line, then all benchmarks are run sequentially in the remote job. However, if benchmark_definitions is two-level nested, we loop over the outer level and start separate remote jobs, each of which iterates over its inner level of benchmarks. This is useful if the number of benchmarks to iterate over is large.
- Parameters:
entry_point (
Path
) – Script for running the experimentmethods (
Dict
[str
,Any
]) – Dictionary with method constructors; one is selected from command line argumentsbenchmark_definitions (
Union
[Dict
[str
,SurrogateBenchmarkDefinition
],Dict
[str
,Dict
[str
,SurrogateBenchmarkDefinition
]]]) – Definitions of benchmarks, can be nested (see above)source_dependencies (
Optional
[List
[str
]]) – If given, these are source dependencies for the SageMaker estimator, on top of Syne Tune itselfextra_args (
Optional
[List
[Dict
[str
,Any
]]]) – Extra arguments for command line parser, optionalis_expensive_method (
Optional
[Callable
[[str
],bool
]]) – See above. The default is a predicate always returning False (no method is expensive)
- syne_tune.experiments.launchers.launch_remote_simulator.launch_remote_experiments_simulator(configuration, entry_point, methods, benchmark_definitions, source_dependencies, is_expensive_method=None)[source]
Launches a sequence of SageMaker training jobs, each running an experiment with the simulator backend.

The loop runs over methods selected from methods. Different repetitions (seeds) are run sequentially in the remote job. However, if is_expensive_method(method_name) is true, we launch different remote jobs for every seed for this particular method. This is to cater for methods which are themselves expensive to run (e.g., involving Gaussian process based Bayesian optimization).

If benchmark_definitions is a single-level dictionary and no benchmark is selected on the command line, then all benchmarks are run sequentially in the remote job. However, if benchmark_definitions is two-level nested, we loop over the outer level and start separate remote jobs, each of which iterates over its inner level of benchmarks. This is useful if the number of benchmarks to iterate over is large.
- Parameters:
configuration (
ConfigDict
) – ConfigDict with parameters of the benchmark. Must contain all parameters from hpo_main_simulator.LOCAL_LOCAL_SIMULATED_BENCHMARK_REQUIRED_PARAMETERSentry_point (
Path
) – Script for running the experimentmethods (
Dict
[str
,Callable
[[MethodArguments
],TrialScheduler
]]) – Dictionary with method constructors; one is selected from command line argumentsbenchmark_definitions (
Union
[Dict
[str
,SurrogateBenchmarkDefinition
],Dict
[str
,Dict
[str
,SurrogateBenchmarkDefinition
]]]) – Definitions of benchmarks; one is selected from command line argumentsis_expensive_method (
Optional
[Callable
[[str
],bool
]]) – See above. The default is a predicate always returning False (no method is expensive)
syne_tune.experiments.launchers.utils module
- syne_tune.experiments.launchers.utils.sync_from_s3_command(experiment_name, s3_bucket=None)[source]
- Return type:
str
- syne_tune.experiments.launchers.utils.message_sync_from_s3(experiment_tag)[source]
- Return type:
str
- syne_tune.experiments.launchers.utils.combine_requirements_txt(synetune_requirements_file, script)[source]
- Return type:
Path
syne_tune.experiments.visualization package
Submodules
syne_tune.experiments.visualization.aggregate_results module
- syne_tune.experiments.visualization.aggregate_results.fill_trajectory(performance_list, time_list, replace_nan=nan)[source]
- Return type:
(
ndarray
,ndarray
)
- syne_tune.experiments.visualization.aggregate_results.compute_mean_and_ci(metrics_runs, time)[source]
Aggregate is the mean, error bars are empirical estimate of 95% confidence interval for the true mean.
Note: Error bar scale depends on the number of runs n via 1 / sqrt(n).
- Return type:
Dict
[str
,ndarray
]
- syne_tune.experiments.visualization.aggregate_results.compute_median_percentiles(metrics_runs, time)[source]
Aggregate is the median, error bars are 25 and 75 percentiles.
Note: Error bar scale does not depend on number of runs.
- Return type:
Dict
[str
,ndarray
]
- syne_tune.experiments.visualization.aggregate_results.compute_iqm_bootstrap(metrics_runs, time)[source]
The aggregate is the interquartile mean (IQM). Error bars are bootstrap estimate of 95% confidence interval for true IQM. This is the normal interval, based on the bootstrap variance estimate. While other bootstrap CI estimates are available, they are more expensive to compute.
Note: Error bar scale depends on the number of runs n via 1 / sqrt(n).
- Return type:
Dict
[str
,ndarray
]
syne_tune.experiments.visualization.multiobjective module
- syne_tune.experiments.visualization.multiobjective.hypervolume_indicator_column_generator(metrics_and_modes, reference_point=None, increment=1)[source]
Returns a generator for a new dataframe column containing the best hypervolume indicator as a function of wall-clock time, based on the metrics in metrics_and_modes (metric names correspond to column names in the dataframe). For a metric with mode == "max", we use its negative.

This mapping is used to create the dataframe_column_generator argument of plot(). Since the current implementation is not incremental and quite slow, if you plot results for single-fidelity HPO methods, it is strongly recommended to also use one_result_per_trial=True:

    results = ComparativeResults(...)
    dataframe_column_generator = hypervolume_indicator_column_generator(
        metrics_and_modes
    )
    plot_params = PlotParameters(
        metric="hypervolume_indicator",
        mode="max",
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        dataframe_column_generator=dataframe_column_generator,
        one_result_per_trial=True,
    )
- Parameters:
metrics_and_modes (
List
[Tuple
[str
,str
]]) – List of(metric, mode)
, see abovereference_point (
Optional
[ndarray
]) – Reference point for hypervolume computation. If not given, a default value is usedincrement (
int
) – If> 1
, the HV indicator is linearly interpolated, this is faster. Defaults to 1 (no interpolation)
- Returns:
Dataframe column generator
syne_tune.experiments.visualization.pareto_set module
- syne_tune.experiments.visualization.pareto_set.get_pareto_optimal(costs)[source]
Find the Pareto-optimal points
- Parameters:
costs (ndarray) – (n_points, m_cost_values) array
- Returns:
(n_points, 1) indicator if point is on pareto front or not.
- syne_tune.experiments.visualization.pareto_set.get_pareto_set(results, metrics, mode='min')[source]
Returns a subset of the results frame consisting of all Pareto optimal points.
- Parameters:
results (DataFrame) – Experiment results dataframe generated by the Tuner object
metrics (List[str]) – List that contains all metrics that should be optimized
mode (Union[str, List[str], None]) – Defines for each metric whether to maximize or minimize
- Returns:
DataFrame with Pareto set
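A brief sketch of these helpers on synthetic data; the metric column names in the commented get_pareto_set call are assumptions:

    import numpy as np

    from syne_tune.experiments.visualization.pareto_set import (
        get_pareto_optimal,
        get_pareto_set,
    )

    # Synthetic two-objective costs; smaller is better in both columns
    costs = np.array([
        [0.10, 5.0],
        [0.20, 2.0],
        [0.15, 6.0],  # dominated by the first point
        [0.30, 1.0],
    ])
    mask = get_pareto_optimal(costs)  # boolean indicator per point
    print(mask)

    # On a results dataframe produced by the Tuner, the Pareto set over two
    # reported metrics (assumed column names) could be extracted with:
    # pareto_df = get_pareto_set(results_df, metrics=["error", "latency"], mode="min")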
syne_tune.experiments.visualization.plot_per_trial module
- class syne_tune.experiments.visualization.plot_per_trial.MultiFidelityParameters(rung_levels, multifidelity_setups)[source]
Bases:
object
Parameters configuring the multi-fidelity version of
TrialsOfExperimentResults
.multifidelity_setups
contains names of setups which are multi-fidelity, the remaining ones are single-fidelity. It can also be a dictionary, mapping a multi-fidelity setup name toTrue
if this is a pause-and-resume method (these are visualized differently),False
otherwise (early stopping method).- Parameters:
rung_levels (
List
[int
]) – See above. Positive integers, increasingmultifidelity_setups (
Union
[List
[str
],Dict
[str
,bool
]]) – See above
-
rung_levels:
List
[int
]
-
multifidelity_setups:
Union
[List
[str
],Dict
[str
,bool
]]
- class syne_tune.experiments.visualization.plot_per_trial.TrialsOfExperimentResults(experiment_names, setups, metadata_to_setup, plot_params=None, multi_fidelity_params=None, benchmark_key='benchmark', seed_key='seed', with_subdirs='*', datetime_bounds=None, download_from_s3=False, s3_bucket=None)[source]
Bases:
object
This class loads, processes, and plots metric results for single experiments, where the curves for different trials have different colours.
Compared to ComparativeResults, each subfigure uses data from a single experiment (one benchmark, one seed, one setup). Both benchmark and seed need to be chosen in plot(). If there are different setups, they give rise to subfigures.

If plot_params.subplots is not given, the arrangement is one row with columns corresponding to setups, and setup names as titles. Specify plot_params.subplots in order to change this arrangement (e.g., to have more than one row). Setups can be selected by using plot_params.subplots.subplot_indices. Also, if plot_params.subplots.titles is not given, we use setup names, and each subplot gets its own title (plot_params.subplots.title_each_figure is ignored).

For plot_params, we use the same PlotParameters as in ComparativeResults, but some fields are not used here (title, aggregate_mode, show_one_trial, subplots.legend_no, subplots.xlims).
- Parameters:
experiment_names (
Tuple
[str
,...
]) – Tuple of experiment names (prefixes, without the timestamps)setups (
Iterable
[str
]) – Possible values of setup namesmetadata_to_setup (
Union
[Callable
[[Dict
[str
,Any
]],Optional
[str
]],Dict
[str
,Callable
[[Dict
[str
,Any
]],Optional
[str
]]]]) – See aboveplot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Can be overwritten inplot()
. SeePlotParameters
multi_fidelity_params (
Optional
[MultiFidelityParameters
]) – If given, we use a special variant tailored to multi-fidelity methods (seeplot()
).benchmark_key (
Optional
[str
]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this isNone
, there is only a single benchmark, and all results are merged togetherseed_key (
str
) – Key for seed in metadata files. Defaults to “seed”.with_subdirs (
Union
[str
,List
[str
],None
]) – See above. Defaults to “*”datetime_bounds (
Union
[Tuple
[Optional
[str
],Optional
[str
]],Dict
[str
,Tuple
[Optional
[str
],Optional
[str
]]],None
]) – See abovedownload_from_s3 (
bool
) – Should result files be downloaded from S3? This is supported only ifwith_subdirs
s3_bucket (
Optional
[str
]) – Only ifdownload_from_s3 == True
. If not given, the default bucket for the SageMaker session is used
- plot(benchmark_name=None, seed=0, plot_params=None, file_name=None)[source]
Creates a plot, whose subfigures show metric data from single experiments. In general:
Each trial has its own color, which is cycled through periodically. The cycling depends on the largest rung level for the trial. This is to avoid neighboring curves having the same color
For single-fidelity methods (default, multi_fidelity_params not given): The learning curve for a trial ends with ‘o’. If it reports only once at the end, this is all that is shown for the trial
For multi-fidelity methods:
Learning curves are plotted in contiguous chunks of execution. For pause-and-resume setups (those in multi_fidelity_params.pause_resume_setups), they are interrupted. Each chunk starts at the epoch after resume and ends at the epoch where the trial is paused
Values at rung levels are marked as ‘o’. If this is the furthest the trial got to, the marker is ‘D’ (diamond)
Results for different setups are plotted as subfigures, either using the setup in plot_params.subplots, or as columns of a single row.
- Parameters:
benchmark_name (
Optional
[str
]) – Name of benchmark for which to plot results. Not needed if there is only one benchmarkseed (
int
) – Seed number. Defaults to 0plot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.file_name (
Optional
[str
]) – If given, the figure is stored in a file of this name
syne_tune.experiments.visualization.plotting module
- class syne_tune.experiments.visualization.plotting.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]
Bases:
object
Parameters specifying an arrangement of subplots. kwargs is mandatory.
- Parameters:
nrows (
Optional
[int
]) – Number of rows of subplot matrixncols (
Optional
[int
]) – Number of columns of subplot matrixtitles (
Optional
[List
[str
]]) – If given, these are titles for each column in the arrangement of subplots. Iftitle_each_figure == True
, these are titles for each subplot. Iftitles
is not given, thenPlotParameters.title
is printed on top of the leftmost columntitle_each_figure (
Optional
[bool
]) – Seetitles
, defaults toFalse
kwargs (
Optional
[Dict
[str
,Any
]]) – Extra arguments forplt.subplots
, apart from “nrows” and “ncols”legend_no (
Optional
[List
[int
]]) – Subplot indices where legend is to be shown. Defaults to[]
(no legends shown). This is not relative tosubplot_indices
xlims (
Optional
[List
[int
]]) – If this is given, must be a list with one entry per subfigure. In this case, the globalxlim
is overwritten by(0, xlims[subplot_no])
. Ifsubplot_indices
is given,xlims
must have the same length, andxlims[j]
refers to subplot indexsubplot_indices[j]
thensubplot_indices (
Optional
[List
[int
]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …
-
nrows:
int
= None
-
ncols:
int
= None
-
titles:
List
[str
] = None
-
title_each_figure:
bool
= None
-
kwargs:
Dict
[str
,Any
] = None
-
legend_no:
List
[int
] = None
-
xlims:
List
[int
] = None
-
subplot_indices:
List
[int
] = None
- class syne_tune.experiments.visualization.plotting.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]
Bases:
object
Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot where setup_name features. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.
- Parameters:
setup_name (
Optional
[str
]) – Setup from which the trial performance is takentrial_id (
Optional
[int
]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs<= trial_id
are shownnew_setup_name (
Optional
[str
]) – Name of the additional curve in legends
-
setup_name:
str
= None
-
trial_id:
int
= None
-
new_setup_name:
str
= None
- class syne_tune.experiments.visualization.plotting.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]
Bases:
object
Parameters specifying the figure.
If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".
- Parameters:
metric (
Optional
[str
]) – Name of metric, mandatorymode (
Optional
[str
]) – See above, “min” or “max”. Defaults to “min” if not giventitle (
Optional
[str
]) – Title of plot. Ifsubplots
is used, seeSubplotParameters
xlabel (
Optional
[str
]) – Label for x axis. Ifsubplots
is used, this is printed below each column. Defaults toDEFAULT_XLABEL
ylabel (
Optional
[str
]) – Label for y axis. Ifsubplots
is used, this is printed left of each rowxlim (
Optional
[Tuple
[float
,float
]]) –(x_min, x_max)
for x axis. Ifsubplots
is used, seeSubplotParameters
ylim (
Optional
[Tuple
[float
,float
]]) –(y_min, y_max)
for y axis.metric_multiplier (
Optional
[float
]) – See above. Defaults to 1convert_to_min (
Optional
[bool
]) – See above. Defaults toTrue
tick_params (
Optional
[Dict
[str
,Any
]]) – Params forax.tick_params
aggregate_mode (
Optional
[str
]) –How are values across seeds aggregated?
“mean_and_ci”: Mean and 0.95 normal confidence interval
“median_percentiles”: Median and 25, 75 percentiles
“iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate
Defaults to
DEFAULT_AGGREGATE_MODE
dpi (
Optional
[int
]) – Resolution of figure in DPI. Defaults to 200grid (
Optional
[bool
]) – Figure with grid? Defaults toFalse
subplots (
Optional
[SubplotParameters
]) – If given, the figure consists of several subplots. SeeSubplotParameters
show_init_trials (
Optional
[ShowTrialParameters
]) – SeeShowTrialParameters
-
metric:
str
= None
-
mode:
str
= None
-
title:
str
= None
-
xlabel:
str
= None
-
ylabel:
str
= None
-
xlim:
Tuple
[float
,float
] = None
-
ylim:
Tuple
[float
,float
] = None
-
metric_multiplier:
float
= None
-
convert_to_min:
bool
= None
-
tick_params:
Dict
[str
,Any
] = None
-
aggregate_mode:
str
= None
-
dpi:
int
= None
-
grid:
bool
= None
-
subplots:
SubplotParameters
= None
-
show_init_trials:
ShowTrialParameters
= None
- syne_tune.experiments.visualization.plotting.group_results_dataframe(df)[source]
- Return type:
Dict
[Tuple
[int
,str
],List
[Tuple
[str
,DataFrame
]]]
- syne_tune.experiments.visualization.plotting.filter_final_row_per_trial(grouped_dfs)[source]
We filter rows such that only one row per trial ID remains, namely the one with the largest time stamp. This makes sense for single-fidelity methods, where reports have still been done after every epoch.
- Return type:
Dict
[Tuple
[int
,str
],List
[Tuple
[str
,DataFrame
]]]
- syne_tune.experiments.visualization.plotting.enrich_results(grouped_dfs, column_name, dataframe_column_generator)[source]
- Return type:
Dict
[Tuple
[int
,str
],List
[Tuple
[str
,DataFrame
]]]
- class syne_tune.experiments.visualization.plotting.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]
Bases:
object
This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.

There is one comparative plot per benchmark (aggregation of results across benchmarks is not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as a function of wall-clock time. The plot can also have several subplots, in which case results are first grouped into subplot number, then setup.

If benchmark_key is None, there is only a single benchmark, and all results are merged together.

Both setup name and subplot number (optional) can be configured by the user, as a function of metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which any of them returns None are not used.

When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of result files. Common reasons:

Less than num_runs experiments found. Experiments failed, or files were not properly synced.
More than num_runs experiments found. This happens if initial experiments for the study failed, but ended up writing results. This can be fixed by either removing the result files, or by using datetime_bounds (since initial failed experiments ran first).

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.
- Parameters:
experiment_names (
Tuple
[str
,...
]) – Tuple of experiment names (prefixes, without the timestamps)setups (
Iterable
[str
]) – Possible values of setup namesnum_runs (
int
) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See abovemetadata_to_setup (
Union
[Callable
[[Dict
[str
,Any
]],Optional
[str
]],Dict
[str
,Callable
[[Dict
[str
,Any
]],Optional
[str
]]]]) – See aboveplot_params (
Optional
[PlotParameters
]) – Parameters controlling the plot. Can be overwritten inplot()
. SeePlotParameters
metadata_to_subplot (
Optional
[Callable
[[Dict
[str
,Any
]],Optional
[int
]]]) – See above. Optionalbenchmark_key (
Optional
[str
]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this isNone
, there is only a single benchmark, and all results are merged togetherwith_subdirs (
Union
[str
,List
[str
],None
]) – See above. Defaults to “*”datetime_bounds (
Union
[Tuple
[Optional
[str
],Optional
[str
]],Dict
[str
,Tuple
[Optional
[str
],Optional
[str
]]],None
]) – See abovemetadata_keys (
Optional
[List
[str
]]) – See abovemetadata_subplot_level (
bool
) – See above. Defaults toFalse
download_from_s3 (
bool
) – Should result files be downloaded from S3? This is supported only ifwith_subdirs
s3_bucket (
Optional
[str
]) – Only ifdownload_from_s3 == True
. If not given, the default bucket for the SageMaker session is used
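As a concrete illustration, the sketch below sets up a ComparativeResults instance for a hypothetical study; the experiment names, the "algorithm" metadata key, and the setup names are made-up examples, and the import path simply mirrors the module documented here:

from typing import Any, Dict, Optional

from syne_tune.experiments.visualization.plotting import ComparativeResults

# Map the metadata of each experiment to a setup name (here: the method used).
# Returning None would filter the experiment out of the comparison.
def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return metadata.get("algorithm")  # "algorithm" is a hypothetical metadata key

results = ComparativeResults(
    experiment_names=("docs-example-1", "docs-example-2"),  # hypothetical prefixes
    setups=["RS", "ASHA"],   # possible setup names returned by metadata_to_setup
    num_runs=5,              # expected number of seeds per setup
    metadata_to_setup=metadata_to_setup,
)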
- metadata_values(benchmark_name=None)[source]
The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.
- Parameters:
benchmark_name (Optional[str]) – Name of benchmark
- Return type:
Dict[str, Any]
- Returns:
Nested dictionary with meta-data values
- plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]
Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).
If plot_params.show_init_trials is given, the best metric value curve for the data from trials <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance for one particular trial, for example the initial configuration (i.e., to show how much this can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.
If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].
If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.
- Parameters:
benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark
plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.
file_name (Optional[str]) – If given, the figure is stored in a file of this name
extra_results_keys (Optional[List[str]]) – See above, optional
dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional
one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.
- Return type:
Dict[str, Any]
- Returns:
Dictionary with "fig", "axs" (for further processing). If extra_results_keys is given, an "extra_results" entry is included as stated above
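A minimal usage sketch of plot(), assuming the ComparativeResults object from the previous example; the benchmark name and file name are placeholders, and PlotParameters is assumed to accept keyword arguments for the documented fields:

from syne_tune.experiments.visualization.plotting import PlotParameters

# "benchmark-1" is a hypothetical benchmark name written into the experiment metadata.
out = results.plot(
    benchmark_name="benchmark-1",
    plot_params=PlotParameters(grid=True),  # "grid" is one of the documented fields
    file_name="comparison.png",             # optional: also store the figure
)
fig, axs = out["fig"], out["axs"]           # returned for further processing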
syne_tune.experiments.visualization.results_utils module
- syne_tune.experiments.visualization.results_utils.create_index_for_result_files(experiment_names, metadata_to_setup, metadata_to_subplot=None, metadata_keys=None, metadata_subplot_level=False, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, seed_key=None)[source]
Helper function for ComparativeResults.
Runs over all result directories for experiments of a comparative study. For each experiment, we read the metadata file, extract the benchmark name (key benchmark_key), and use metadata_to_setup, metadata_to_subplot to map the metadata to setup name and subplot index. If any of the two returns None, the result is not used. Otherwise, we enter (result_path, setup_name, subplot_no) into the list for the benchmark name. Here, result_path is the result path for the experiment, without the experiment_path() prefix. The index returned is the dictionary from benchmark names to these lists. It allows loading results specifically for each benchmark, without having to load and parse the metadata files again.
If benchmark_key is None, the returned index is a dictionary with a single element only, and the metadata files need not contain an entry for benchmark name.
Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". This is an older convention, which makes it harder to sync files from S3; it is not recommended.
If metadata_keys is given, it contains a list of keys into the metadata. In this case, a nested dictionary metadata_values is returned, where metadata_values[benchmark_name][key][setup_name] contains a list of metadata values for this benchmark, key in metadata_keys, and setup name. In this case, if metadata_subplot_level is True and metadata_to_subplot is given, metadata_values has the structure metadata_values[benchmark_name][key][setup_name][subplot_no]. This should be set if different subplots share the same setup names.
If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping experiment names (from experiment_names) to such tuples. Both strings are time stamps in the format ST_DATETIME_FORMAT (example: "2023-03-19-22-01-57"), and each can be None as well. This serves to filter out any result whose time stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.
If seed_key is given, the returned index is a dictionary with keys (benchmark_name, seed), where seed is the value corresponding to seed_key in the metadata dict. This mode is needed for plots focusing on a single experiment.
- Parameters:
experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)
metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above
metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional
metadata_keys (Optional[List[str]]) – See above. Optional
metadata_subplot_level (bool) – See above. Defaults to False
benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to "benchmark"
with_subdirs (Union[str, List[str], None]) – See above. Defaults to "*"
datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above
seed_key (Optional[str]) – See above
- Return type:
Union[Dict[str, Any], Dict[Tuple[str, int], Any]]
- Returns:
Dictionary; entry "index" for the index (see above); entry "setup_names" for the setup names encountered; entry "metadata_values", see metadata_keys
- syne_tune.experiments.visualization.results_utils.load_results_dataframe_per_benchmark(experiment_list)[source]
Helper function for ComparativeResults.
Loads time-stamped results for all experiments in experiment_list and returns them in a single dataframe with additional columns "setup_name", "subplot_no", "tuner_name", whose values are constant across data for one experiment, allowing for later grouping.
- Parameters:
experiment_list (List[Tuple[str, str, int]]) – Information about experiments, see create_index_for_result_files()
- Return type:
Optional[DataFrame]
- Returns:
Dataframe with all results combined
- syne_tune.experiments.visualization.results_utils.download_result_files_from_s3(experiment_names, s3_bucket=None)[source]
Downloads result files from S3. This works only if the result objects on S3 have prefixes f"{s3_experiment_path(s3_bucket)}{ename}/", where ename is in experiment_names. Only files with names ST_METADATA_FILENAME and ST_RESULTS_DATAFRAME_FILENAME are downloaded.
- Parameters:
experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)
s3_bucket (Optional[str]) – If not given, the default bucket for the SageMaker session is used
Submodules
syne_tune.experiments.baselines module
- class syne_tune.experiments.baselines.MethodArguments(config_space, metric, mode, random_seed, resource_attr, max_resource_attr=None, scheduler_kwargs=None, transfer_learning_evaluations=None, use_surrogates=False, fcnet_ordinal=None, num_gpus_per_trial=1)[source]
Bases:
object
Arguments for creating HPO method (scheduler). We collect the union of optional arguments for all use cases here.
- Parameters:
config_space (Dict[str, Any]) – Configuration space (typically taken from benchmark definition)
metric (str) – Name of metric to optimize
mode (str) – Whether metric is minimized ("min") or maximized ("max")
random_seed (int) – Different for different repetitions
resource_attr (str) – Name of resource attribute
max_resource_attr (Optional[str]) – Name of max_resource_value in config_space. One of max_resource_attr, max_t is mandatory
scheduler_kwargs (Optional[Dict[str, Any]]) – If given, overwrites defaults of scheduler arguments
transfer_learning_evaluations (Optional[Dict[str, Any]]) – Support for transfer learning. Only for simulator backend experiments right now
use_surrogates (bool) – For simulator backend experiments, defaults to False
fcnet_ordinal (Optional[str]) – Only for simulator backend and fcnet blackbox. This blackbox is tabulated with finite domains, one of which has irregular spacing. If fcnet_ordinal="none", this is left as categorical, otherwise we use ordinal encoding with kind=fcnet_ordinal.
num_gpus_per_trial (int) – Only for local backend and GPU training. Number of GPUs assigned to a trial. This is passed here because it needs to be written into the configuration space for some benchmarks. Defaults to 1
- config_space: Dict[str, Any]
- metric: str
- mode: str
- random_seed: int
- resource_attr: str
- max_resource_attr: Optional[str] = None
- scheduler_kwargs: Optional[Dict[str, Any]] = None
- transfer_learning_evaluations: Optional[Dict[str, Any]] = None
- use_surrogates: bool = False
- fcnet_ordinal: Optional[str] = None
- num_gpus_per_trial: int = 1
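For illustration, a MethodArguments instance for a hypothetical benchmark might be assembled as in the following sketch; the configuration space, metric and attribute names are invented for the example:

from syne_tune.config_space import loguniform, randint
from syne_tune.experiments.baselines import MethodArguments

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),  # hypothetical hyperparameters
    "num_layers": randint(1, 4),
    "epochs": 27,                              # maximum resource, fixed in the space
}

args = MethodArguments(
    config_space=config_space,
    metric="val_loss",           # hypothetical metric reported by the benchmark
    mode="min",
    random_seed=31415927,
    resource_attr="epoch",
    max_resource_attr="epochs",  # points at the fixed entry above
)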
- syne_tune.experiments.baselines.default_arguments(args, extra_args)[source]
- Return type:
Dict[str, Any]
- syne_tune.experiments.baselines.convert_categorical_to_ordinal(config_space)[source]
- Parameters:
config_space (Dict[str, Any]) – Configuration space
- Return type:
Dict[str, Any]
- Returns:
New configuration space where all categorical domains are replaced by ordinal ones (with kind="equal")
- syne_tune.experiments.baselines.convert_categorical_to_ordinal_numeric(config_space, kind, do_convert=None)[source]
Converts categorical domains to ordinal ones, of type kind. This is not done if kind="none", or if do_convert(config_space) == False.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
kind (Optional[str]) – Type of ordinal, or "none"
do_convert (Optional[Callable[[Dict[str, Any]], bool]]) – See above. The default is testing for the config space of the fcnet blackbox
- Return type:
Dict[str, Any]
- Returns:
New configuration space
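A short usage sketch with an invented configuration space; kind="equal" is the value mentioned above for convert_categorical_to_ordinal(), and do_convert is passed explicitly because the default only targets the fcnet blackbox space:

from syne_tune.config_space import choice, uniform
from syne_tune.experiments.baselines import convert_categorical_to_ordinal_numeric

config_space = {
    "batch_size": choice([8, 16, 32, 64]),  # categorical with numeric values
    "dropout": uniform(0.0, 0.5),
}

# Replace categorical domains by ordinal ones; with kind="none" the space
# would be returned unchanged.
new_space = convert_categorical_to_ordinal_numeric(
    config_space,
    kind="equal",
    do_convert=lambda cs: True,  # default only converts the fcnet blackbox space
)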
syne_tune.experiments.default_baselines module
syne_tune.experiments.experiment_result module
- class syne_tune.experiments.experiment_result.ExperimentResult(name, results, metadata, tuner, path)[source]
Bases:
object
Wraps results dataframe and provides retrieval services.
- Parameters:
- name: str
- results: DataFrame
- metadata: Dict[str, Any]
- path: Path
- plot_hypervolume(metrics_to_plot=None, reference_point=None, figure_path=None, **plt_kwargs)[source]
Plot best hypervolume value as a function of wall-clock time
- Parameters:
reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum values of each metric are used.
figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown
plt_kwargs – Arguments to matplotlib.pyplot.plot()
- plot(metric_to_plot=0, figure_path=None, **plt_kwargs)[source]
Plot best metric value as a function of wall-clock time
- Parameters:
metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (first metric defined)
figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown
plt_kwargs – Arguments to matplotlib.pyplot.plot()
- plot_trials_over_time(metric_to_plot=0, figure_path=None, figsize=None)[source]
Plot results of all trials as a function of wall-clock time
- Parameters:
metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (first metric defined)
figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown
figsize – Width and height of figure
- best_config(metric=0)[source]
Return the best config found for the specified metric
- Parameters:
metric (Union[str, int]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0 (first metric defined in the scheduler)
- Return type:
Dict[str, Any]
- Returns:
Configuration corresponding to best metric value
- syne_tune.experiments.experiment_result.download_single_experiment(tuner_name, s3_bucket=None, experiment_name=None)[source]
Downloads results from S3 of a tuning experiment
- Parameters:
tuner_name (str) – Name of tuner to be retrieved.
s3_bucket (Optional[str]) – If not given, the default bucket for the SageMaker session is used
experiment_name (Optional[str]) – If given, this is used as first directory.
- syne_tune.experiments.experiment_result.load_experiment(tuner_name, download_if_not_found=True, load_tuner=False, local_path=None, experiment_name=None)[source]
Load results from an experiment
- Parameters:
tuner_name (str) – Name of a tuning experiment previously run
download_if_not_found (bool) – If True, fetch results from S3 if not found locally
load_tuner (bool) – Whether to load the tuner in addition to metadata and results
local_path (Optional[str]) – Path containing the experiment to load. If not specified, ~/{SYNE_TUNE_FOLDER}/ is used.
experiment_name (Optional[str]) – If given, this is used as first directory.
- Return type:
ExperimentResult
- Returns:
Result object
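For example, results of a finished tuning job can be loaded and inspected as follows; the tuner name is a placeholder for the name printed by the Tuner when the experiment starts:

from syne_tune.experiments import load_experiment

# Replace with the tuner name of your own experiment,
# e.g. "train-height-2023-06-01-12-00-00-000"
tuning_experiment = load_experiment("my-tuner-name")
print(tuning_experiment.best_config())  # best configuration found
tuning_experiment.plot()                # best metric value over wall-clock time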
- syne_tune.experiments.experiment_result.get_metadata(path_filter=None, root=PosixPath('/home/docs/syne-tune'))[source]
Load metadata for a number of experiments
- Parameters:
path_filter (Optional[Callable[[str], bool]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.
root (Path) – Root path for experiment results. Default is experiment_path()
- Return type:
Dict[str, dict]
- Returns:
Dictionary from tuner name to metadata dict
- syne_tune.experiments.experiment_result.list_experiments(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
List experiments for which results are found
- Parameters:
path_filter (Optional[Callable[[str], bool]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.
experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult, optional
root (Path) – Root path for experiment results. Default is the result of experiment_path()
load_tuner (bool) – Whether to load the tuner in addition to metadata and results
- Return type:
List[ExperimentResult]
- Returns:
List of result objects
- syne_tune.experiments.experiment_result.load_experiments_df(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
- Parameters:
path_filter (Optional[Callable[[str], bool]]) – If passed, then only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.
experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult
root (Path) – Root path for experiment results. Default is experiment_path()
load_tuner (bool) – Whether to load the tuner in addition to metadata and results
- Return type:
DataFrame
- Returns:
Dataframe that contains all evaluations reported by tuners according to the filters given. The columns contain trial-id, the hyperparameters evaluated, and the metrics reported via Reporter. The following metrics are collected automatically:
st_worker_time (time spent in the worker when the report was seen)
time (wall-clock time measured by the tuner)
decision: decision taken by the scheduler when observing the result
status: status of the trial that was shown to the tuner
config_{xx}: configuration value for the hyperparameter {xx}
tuner_name: name passed when instantiating the Tuner
entry_point_name, entry_point_path: name and path of the entry point that was tuned
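A sketch of filtering with path_filter; the name fragment used in the filter is a made-up example:

from syne_tune.experiments import load_experiments_df

# Keep only experiments whose path contains a given substring (hypothetical).
df = load_experiments_df(path_filter=lambda path: "my-study" in path)
print(df.columns)                        # trial-id, config_*, metrics, decision, status, ...
print(df.groupby("tuner_name").size())   # number of reported rows per experiment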
syne_tune.experiments.util module
syne_tune.optimizer package
Subpackages
syne_tune.optimizer.schedulers package
- class syne_tune.optimizer.schedulers.FIFOScheduler(config_space, **kwargs)[source]
Bases:
TrialSchedulerWithSearcher
Scheduler which executes trials in submission order.
This is the most basic scheduler template. It can be configured to many use cases by choosing searcher along with search_options.
- Parameters:
config_space (Dict[str, Any]) – Configuration space for evaluation function
searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_FIFO. Defaults to "random" (i.e., random search)
search_options (Dict[str, Any], optional) – If searcher is str, these arguments are passed to searcher_factory()
metric (str or List[str]) – Name of metric to optimize, key in results obtained via on_trial_result. For multi-objective schedulers, this can also be a list
mode (str or List[str], optional) – "min" if metric is minimized, "max" if metric is maximized, defaults to "min". This can also be a list if metric is a list
points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If not given, this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified. Note: If searcher is of type BaseSearcher, points_to_evaluate must be set there.
random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.
max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If this is given, max_t is not needed. We recommend using max_resource_attr over max_t. If given, we use it to infer max_resource_level. It is also used to limit trial executions in promotion-based multi-fidelity schedulers (see HyperbandScheduler, type="promotion").
max_t (int, optional) – Value for max_resource_level. Needed for schedulers which make use of intermediate reports via on_trial_result. If this is not given, we try to infer its value from config_space (see ResourceLevelsScheduler), checking config_space["epochs"], config_space["max_t"], and config_space["max_epochs"]. If max_resource_attr is given, we use the value config_space[max_resource_attr]. But if max_t is given here, it takes precedence.
time_keeper (TimeKeeper, optional) – This will be used for timing here (see _elapsed_time). The time keeper has to be started at the beginning of the experiment. If not given, we use a local time keeper here, which is started with the first call to _suggest(). Can also be set after construction, with set_time_keeper(). Note: If you use SimulatorBackend, you need to pass its time_keeper here.
- property searcher: BaseSearcher | None
- set_time_keeper(time_keeper)[source]
Assign time keeper after construction.
This is possible only if the time keeper was not assigned at construction, and the experiment has not yet started.
- Parameters:
time_keeper (TimeKeeper) – Time keeper to be used
- on_trial_result(trial, result)[source]
We simply relay result to the searcher. Other decisions are done in on_trial_complete.
- Return type:
str
- metric_names()[source]
- Return type:
List[str]
- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)
- class syne_tune.optimizer.schedulers.HyperbandScheduler(config_space, **kwargs)[source]
Bases:
FIFOScheduler
,MultiFidelitySchedulerMixin
,RemoveCheckpointsSchedulerMixin
Implements different variants of asynchronous Hyperband
See type for the different variants. One implementation detail is that, when using multiple brackets, task allocation to brackets is done randomly, based on a distribution which can be configured.
For definitions of concepts (bracket, rung, milestone), see
Li, Jamieson, Rostamizadeh, Gonina, Hardt, Recht, Talwalkar (2018): A System for Massively Parallel Hyperparameter Tuning
or
Tiao, Klein, Lienart, Archambeau, Seeger (2020): Model-based Asynchronous Hyperparameter and Neural Architecture Search
Note: This scheduler requires both metric and resource_attr to be returned by the reporter. Here, resource values must be positive int. If resource_attr == "epoch", this should be the number of epochs done, starting from 1 (not the epoch number, starting from 0).
Rung levels and promotion quantiles
Rung levels are values of the resource attribute at which stop/go decisions are made for jobs, comparing their metric against others at the same level. These rung levels (positive, strictly increasing) can be specified via rung_levels, the largest must be <= max_t. If rung_levels is not given, they are specified by grace_period and reduction_factor or rung_increment:
- If \(r_{min}\) is grace_period and \(\eta\) is reduction_factor, then rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\). This is the default choice for successive halving (Hyperband).
- If rung_increment is given, but not reduction_factor, then rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), where \(\nu\) is rung_increment.
If rung_levels is given, then grace_period, reduction_factor, rung_increment are ignored. If they are given, a warning is logged.
The rung levels determine the quantiles to be used in the stop/go decisions. If rung levels are \(r_j\), define \(q_j = r_j / r_{j+1}\). \(q_j\) is the promotion quantile at rung level \(r_j\). On average, a fraction of \(q_j\) jobs can continue, the remaining ones are stopped (or paused). In the default successive halving case, we have \(q_j = 1/\eta\) for all \(j\).
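The default rule can be spelled out in a few lines; the following standalone sketch just reproduces the formula above for grace_period=1, reduction_factor=3 and max_t=81 (illustrative numbers), and is not the scheduler's internal code:

def successive_halving_rung_levels(grace_period: int, reduction_factor: float, max_t: int):
    # r_min * eta^j, rounded, for j = 0, 1, ... while strictly below max_t
    levels, j = [], 0
    while True:
        level = int(round(grace_period * reduction_factor**j))
        if level >= max_t:
            break
        levels.append(level)
        j += 1
    return levels

print(successive_halving_rung_levels(1, 3, 81))  # [1, 3, 9, 27]
# Promotion quantiles q_j = r_j / r_{j+1}, i.e. 1/eta = 1/3 in this default case.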
Cost-aware schedulers or searchers
Some schedulers (e.g., type == "cost_promotion") or searchers may depend on cost values (with key cost_attr) reported alongside the target metric. For promotion-based scheduling, a trial may pause and resume several times. The cost received in on_trial_result only counts the cost since the last resume. We maintain the sum of such costs in _cost_offset(), and append a new entry to result in on_trial_result with the total cost. If the evaluation function does not implement checkpointing, once a trial is resumed, it has to start from scratch. We detect this in on_trial_result and reset the cost offset to 0 (if the trial runs from scratch, the cost reported needs no offset added).
Note: This process requires cost_attr to be set.
Pending evaluations
The searcher is notified, by searcher.register_pending calls, of (trial, resource) pairs for which evaluations are running, and a result is expected in the future. These pending evaluations can be used by the searcher in order to direct sampling elsewhere.
The choice of pending evaluations depends on searcher_data. If equal to "rungs", pending evaluations sit only at rung levels, because observations are only used there. In the other cases, pending evaluations sit at all resource levels for which observations are obtained. For example, if a trial is at rung level \(r\) and continues towards the next rung level \(r_{next}\): if searcher_data == "rungs", searcher.register_pending is called for \(r_{next}\) only, while for other searcher_data values, pending evaluations are registered for \(r + 1, r + 2, \dots, r_{next}\). However, if in this case register_pending_myopic is True, we instead call searcher.register_pending for \(r + 1\) when each observation is obtained (not just at a rung level). This leads to fewer pending evaluations at any one time. On the other hand, when a trial is continued at a rung level, we already know it will emit observations up to the next rung level, so it seems more "correct" to register all these pending evaluations in one go.
Additional arguments on top of parent class FIFOScheduler:
- Parameters:
searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to "random" (i.e., random search)
resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result, defaults to "epoch"
grace_period (int, optional) – Minimum resource to be used for a job. Ignored if rung_levels is given. Defaults to 1
reduction_factor (float, optional) – Parameter to determine rung levels. Ignored if rung_levels is given. Must be \(\ge 2\), defaults to 3
rung_increment (int, optional) – Parameter to determine rung levels. Ignored if rung_levels or reduction_factor are given. Must be positive
rung_levels (List[int], optional) – If given, prescribes the set of rung levels to be used. Must contain positive integers, strictly increasing. This information overrides grace_period, reduction_factor, rung_increment. Note that the stop/promote rule in the successive halving scheduler is set based on the ratio of successive rung levels.
brackets (int, optional) – Number of brackets to be used in Hyperband. Each bracket has a different grace period, all share max_t and reduction_factor. If brackets == 1 (default), we run asynchronous successive halving.
type (str, optional) – Type of Hyperband scheduler. Defaults to "stopping". Supported values (see also subclasses of RungSystem):
- stopping: A config eval is executed by a single task. The task is stopped at a milestone if its metric is worse than a fraction of those who reached the milestone earlier, otherwise it continues. See StoppingRungSystem.
- promotion: A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than a fraction of others who reached the milestone. If no config can be promoted, a new one is chosen. See PromotionRungSystem.
- cost_promotion: This is a cost-aware variant of "promotion", see CostPromotionRungSystem for details. In this case, costs must be reported under the name rung_system_kwargs["cost_attr"] in results.
- pasha: Similar to promotion type Hyperband, but it progressively expands the available resources until the ranking of configurations stabilizes.
- rush_stopping: A variation of the stopping scheduler which requires passing rung_system_kwargs and points_to_evaluate. The first rung_system_kwargs["num_threshold_candidates"] of points_to_evaluate will enforce stricter rules on which task is continued. See RUSHStoppingRungSystem and RUSHScheduler.
- rush_promotion: Same as rush_stopping but for promotion, see RUSHPromotionRungSystem
- dyhpo: A model-based scheduler, which can be seen as an extension of "promotion" with rung_increment rather than reduction_factor, see DynamicHPOSearcher
cost_attr (str, optional) – Required if the scheduler itself uses a cost metric (i.e., type="cost_promotion"), or if the searcher uses a cost metric. See also header comment.
searcher_data (str, optional) – Relevant only if a model-based searcher is used. Example: for NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:
- "rungs" (default): Only results at rung levels. Cheapest
- "all": All results. Most expensive
- "rungs_and_last": Results at rung levels, plus the most recent result. This means that in between rung levels, only the most recent result is used by the searcher. This is in between
Note: For a Gaussian additive learning curve surrogate model, this has to be set to "all".
register_pending_myopic (bool, optional) – See above. Used only if searcher_data != "rungs". Defaults to False
rung_system_per_bracket (bool, optional) – This concerns Hyperband with brackets > 1. Defaults to False. When starting a job for a new config, it is assigned a randomly sampled bracket. The larger the bracket, the larger the grace period for the config. If rung_system_per_bracket == True, we maintain separate rung level systems for each bracket, so that configs only compete with others started in the same bracket. If rung_system_per_bracket == False, we use a single rung level system, so that all configs compete with each other. In this case, the bracket of a config only determines the initial grace period, i.e. the first milestone at which it starts competing with others. This is the default. The concept of brackets in Hyperband is meant to hedge against overly aggressive filtering in successive halving, based on low fidelity criteria. In practice, successive halving (i.e., brackets = 1) often works best in the asynchronous case (as implemented here). If brackets > 1, the hedging is stronger if rung_system_per_bracket is True.
do_snapshots (bool, optional) – Support snapshots? If True, a snapshot of all running tasks and rung levels is returned by _promote_trial(). This snapshot is passed to searcher.get_config. Defaults to False. Note: Currently, only the stopping variant supports snapshots.
rung_system_kwargs (Dict[str, Any], optional) – Arguments passed to the rung system:
- num_threshold_candidates: Used if type in ["rush_promotion", "rush_stopping"]. The first num_threshold_candidates in points_to_evaluate enforce stricter requirements on the continuation of training tasks. See RUSHScheduler.
- probability_sh: Used if type == "dyhpo". In DyHPO, we typically score all paused trials against a number of new configurations, and the winner is either resumed or started (new trial). However, with the probability given here, we instead try to promote a trial as if type == "promotion". If no trial can be promoted, we fall back to the DyHPO logic. Use this to make DyHPO robust against starting too many new trials, because all paused ones score poorly (this happens especially at the beginning).
early_checkpoint_removal_kwargs (Dict[str, Any], optional) – If given, speculative early removal of checkpoints is done, see HyperbandRemoveCheckpointsCallback. The constructor arguments for the HyperbandRemoveCheckpointsCallback must be given here, if they cannot be inferred (key max_num_checkpoints is mandatory). This feature is used only for scheduler types which pause and resume trials.
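Putting it together, an asynchronous successive halving setup (promotion type) might look like the sketch below; the configuration space, metric and resource names are again placeholders:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),  # hypothetical hyperparameters
    "num_layers": randint(1, 4),
    "epochs": 81,                              # maximum number of epochs
}

scheduler = HyperbandScheduler(
    config_space,
    type="promotion",            # pause-and-resume variant (ASHA)
    searcher="random",
    metric="val_loss",           # reported by the training script every epoch
    mode="min",
    resource_attr="epoch",       # resource values start at 1
    max_resource_attr="epochs",  # points at the fixed entry above
    grace_period=1,
    reduction_factor=3,
)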
- does_pause_resume()[source]
- Return type:
bool
- Returns:
Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?
- property rung_levels: List[int]
Note that all entries of rung_levels are smaller than max_t (or config_space[max_resource_attr]): rung levels are resource levels where stop/go decisions are made. In particular, if rung_levels is passed at construction with rung_levels[-1] == max_t, this last entry is stripped off.
- Returns:
Rung levels (strictly increasing, positive ints)
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- property resource_attr: str
- Returns:
Name of resource attribute in reported results
- property max_resource_level: int
- Returns:
Maximum resource level
- property searcher_data: str
- Returns:
Relevant only if a model-based searcher is used. Example: for NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:
"rungs": Only results at rung levels. Cheapest
"all": All results. Most expensive
"rungs_and_last": Results at rung levels plus the most recent one. Not available for all multi-fidelity schedulers
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (Trial) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
We simply relay result to the searcher. Other decisions are done in on_trial_complete.
- Return type:
str
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().
- Parameters:
trial (Trial) – Trial to be removed
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.
- Parameters:
trial (Trial) – Trial which is completing
result (Dict[str, Any]) – Result dictionary
- callback_for_checkpoint_removal(stop_criterion)[source]
- Parameters:
stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner
- Return type:
Optional[TunerCallback]
- Returns:
CP removal callback, or None if CP removal is not activated
- class syne_tune.optimizer.schedulers.MedianStoppingRule(scheduler, resource_attr, running_average=True, metric=None, grace_time=1, grace_population=5, rank_cutoff=0.5)[source]
Bases:
TrialScheduler
Applies the median stopping rule on top of an existing scheduler.
If the result at a time step ranks below the cutoff quantile of other results observed at this time step, the trial is interrupted; otherwise, the wrapped scheduler is called to make the stopping decision.
Suggest decisions are left to the wrapped scheduler.
The mode of the wrapped scheduler is used.
Reference:
Google Vizier: A Service for Black-Box Optimization. Golovin et al., 2017. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2017, Pages 1487–1495
- Parameters:
scheduler (TrialScheduler) – Scheduler to be called for trial suggestion or when the median-stopping-rule decision is to continue.
resource_attr (str) – Key in the reported dictionary that accounts for the resource (e.g. epoch).
running_average (bool) – If True, then uses the running average of observations instead of raw observations. Defaults to True
metric (Optional[str]) – Metric to be considered, defaults to scheduler.metric
grace_time (Optional[int]) – Median stopping rule is only applied for results whose resource_attr exceeds this amount. Defaults to 1
grace_population (int) – Median stopping rule is only applied when at least grace_population results have been observed at a resource level. Defaults to 5
rank_cutoff (float) – Results whose quantiles are below this level are discarded. Defaults to 0.5 (median)
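For example, the rule can be layered on top of a random-search FIFOScheduler as in this sketch; the metric and resource names are placeholders:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler, MedianStoppingRule

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),  # hypothetical hyperparameters
    "num_layers": randint(1, 4),
}

base_scheduler = FIFOScheduler(
    config_space, searcher="random", metric="val_loss", mode="min"
)
scheduler = MedianStoppingRule(
    scheduler=base_scheduler,
    resource_attr="epoch",   # key reported alongside the metric
    grace_time=2,            # do not interrupt before 2 resource units
)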
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.
- Parameters:
trial (Trial) – Trial for which results are reported
result (Dict) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- grace_condition(time_step)[source]
- Parameters:
time_step (float) – Value result[self.resource_attr]
- Return type:
bool
- Returns:
Decide for continue?
- class syne_tune.optimizer.schedulers.PopulationBasedTraining(config_space, custom_explore_fn=None, **kwargs)[source]
Bases:
FIFOScheduler
Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:
https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html
PBT was originally presented in the following paper: Jaderberg et al., Population Based Training of Neural Networks (2017), https://arxiv.org/abs/1711.09846
Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker now continues with the latest checkpoint of this new model but mutates the hyperparameters.
The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the value (probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly. A sketch of this mutation rule is shown after the parameter list below.
Note: While this is implemented as a child of FIFOScheduler, we require searcher="random" (default), since the current code only supports a random searcher.
Additional arguments on top of parent class FIFOScheduler:
- Parameters:
resource_attr (str) – Name of resource attribute in results obtained via on_trial_result, defaults to "time_total_s"
population_size (int, optional) – Size of the population, defaults to 4
perturbation_interval (float, optional) – Models will be considered for perturbation at this interval of resource_attr. Note that perturbation incurs checkpoint overhead, so you shouldn't set this to be too frequent. Defaults to 60
quantile_fraction (float, optional) – Parameters are transferred from the top quantile_fraction fraction of trials to the bottom quantile_fraction fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25
resample_probability (float, optional) – The probability of resampling from the original distribution when applying _explore(). If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25
custom_explore_fn (function, optional) – Custom exploration function. This function is invoked as f(config) instead of the built-in perturbations, and should return config updated as needed. If this is given, resample_probability is not used
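The sketch below spells out the mutation rule described above for a single configuration; it is an illustration only, not the scheduler's internal code, and the config and search space are invented:

import random

def mutate_config(config, config_space, resample_probability=0.25):
    # Illustrative perturbation rule: resample with some probability, otherwise
    # multiply numerical values by 1.2 or 0.8; categorical values are resampled.
    new_config = dict(config)
    for name, domain in config_space.items():
        if isinstance(domain, list):                 # categorical (here: a plain list)
            new_config[name] = random.choice(domain)
        elif random.random() < resample_probability:
            low, high = domain                       # numerical range (low, high)
            new_config[name] = random.uniform(low, high)
        else:
            factor = 1.2 if random.random() < 0.5 else 0.8
            low, high = domain
            new_config[name] = min(max(config[name] * factor, low), high)
    return new_config

space = {"learning_rate": (1e-5, 1e-1), "optimizer": ["sgd", "adam"]}
print(mutate_config({"learning_rate": 0.01, "optimizer": "sgd"}, space))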
- class syne_tune.optimizer.schedulers.RayTuneScheduler(config_space, ray_scheduler=None, ray_searcher=None, points_to_evaluate=None)[source]
Bases:
TrialScheduler
Allow using Ray Tune schedulers and searchers. Any searcher/scheduler should work, except those which need access to TrialRunner (e.g., PBT); this feature is not implemented in Syne Tune.
If ray_searcher is not given (defaults to random searcher), initial configurations to evaluate can be passed in points_to_evaluate. If ray_searcher is given, this argument is ignored (needs to be passed to ray_searcher at construction). Note: Use impute_points_to_evaluate() in order to preprocess points_to_evaluate specified by the user or the benchmark.
- Parameters:
config_space (Dict) – Configuration space
ray_scheduler – Ray Tune scheduler, defaults to FIFO scheduler
ray_searcher (Optional[Searcher]) – Ray Tune searcher, defaults to random search
points_to_evaluate (Optional[List[Dict]]) – See above
- RT_FIFOScheduler
alias of
FIFOScheduler
- RT_Searcher
alias of
Searcher
- class RandomSearch(config_space, points_to_evaluate, mode)[source]
Bases:
Searcher
- suggest(trial_id)[source]
Queries the algorithm to retrieve the next set of parameters.
- Return type:
Optional[Dict]
- Arguments:
trial_id: Trial ID used for subsequent notifications.
- Returns:
dict | FINISHED | None: Configuration for a trial, if possible. If FINISHED is returned, Tune will be notified that no more suggestions/configurations will be provided. If None is returned, Tune will skip the querying of the searcher for this step.
- on_trial_complete(trial_id, result=None, error=False)[source]
Notification for the completion of trial.
Typically, this method is used for notifying the underlying optimizer of the result.
- Args:
trial_id: A unique string ID for the trial.
result: Dictionary of metrics for current training progress. Note that the result dict may include NaNs or may not include the optimization metric. It is up to the subclass implementation to preprocess the result to avoid breaking the optimization process. Upon errors, this may also be None.
error: True if the training process raised an error.
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by suggest.
- Parameters:
trial (Trial) – Trial to be added
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (Trial) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.
- Parameters:
trial (Trial) – Trial for which results are reported
result (Dict) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.
- Parameters:
trial (Trial) – Trial which is completing
result (Dict) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().
- Parameters:
trial (Trial) – Trial to be removed
- metric_names()[source]
- Return type:
List[str]
- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- static convert_config_space(config_space)[source]
Converts config_space from our type to the one of Ray Tune.
Note: randint(lower, upper) in Ray Tune has exclusive upper, while it is inclusive for us. On the other hand, lograndint(lower, upper) has inclusive upper in Ray Tune as well.
- Parameters:
config_space – Configuration space
- Returns:
config_space converted into Ray Tune type
Subpackages
syne_tune.optimizer.schedulers.multiobjective package
- class syne_tune.optimizer.schedulers.multiobjective.MOASHA(config_space, metrics, mode=None, time_attr='training_iteration', multiobjective_priority=None, max_t=100, grace_period=1, reduction_factor=3, brackets=1)[source]
Bases:
TrialScheduler
Implements MultiObjective Asynchronous Successive HAlving with different multiobjective sort options. References:
A multi-objective perspective on jointly tuning hardware and hyperparameters. David Salinas, Valerio Perrone, Cedric Archambeau and Olivier Cruchant. NAS workshop, ICLR 2021.
and
Multi-objective multi-fidelity hyperparameter optimization with application to fairness. Robin Schmucker, Michele Donini, Valerio Perrone, Cédric Archambeau.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
metrics (List[str]) – List of metric names MOASHA optimizes over
mode (Union[str, List[str], None]) – One of {"min", "max"} or a list of these values (same size as metrics). Determines whether objectives are minimized or maximized. Defaults to "min"
time_attr (str) – A training result attribute to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress; the only requirement is that the attribute should increase monotonically. Defaults to "training_iteration"
multiobjective_priority (Optional[MOPriority]) – The multi-objective priority that is used to sort multi-objective candidates. We support several choices such as non-dominated sort or linear scalarization; the default is non-dominated sort.
max_t (int) – Maximum time units per trial. Trials will be stopped after max_t time units (determined by time_attr) have passed. Defaults to 100
grace_period (int) – Only stop trials at least this old in time. The units are the same as the attribute named by time_attr. Defaults to 1
reduction_factor (float) – Used to set halving rate and amount. This is simply a unit-less scalar. Defaults to 3
brackets (int) – Number of brackets. Each bracket has a different grace_period and number of rung levels. Defaults to 1
- metric_names()[source]
- Return type:
List[str]
- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by suggest.
- Parameters:
trial (Trial) – Trial to be added
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.
- Parameters:
trial (Trial) – Trial for which results are reported
result (Dict[str, Any]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.
- Parameters:
trial (Trial) – Trial which is completing
result (Dict[str, Any]) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().
- Parameters:
trial (Trial) – Trial to be removed
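As an illustration of MOASHA, tuning two hypothetical objectives (validation error and latency) could be configured as in this sketch:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.multiobjective import MOASHA

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),  # hypothetical hyperparameters
    "num_layers": randint(1, 4),
}

scheduler = MOASHA(
    config_space,
    metrics=["val_error", "latency"],  # both reported by the training script
    mode="min",                        # same mode for both objectives
    time_attr="epoch",                 # monotonically increasing progress attribute
    max_t=27,
    grace_period=1,
    reduction_factor=3,
)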
- class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveRegularizedEvolution(config_space, metric, mode, points_to_evaluate=None, population_size=100, sample_size=10, multiobjective_priority=None, **kwargs)[source]
Bases:
RegularizedEvolution
Adapts regularized evolution algorithm by Real et al. to the multi-objective setting. Elements in the populations are scored via a multi-objective priority that is set to non-dominated sort by default. Parents are sampled from the population based on this score.
Additional arguments on top of parent class syne_tune.optimizer.schedulers.searchers.StochasticSearcher:
- Parameters:
mode (Union[List[str], str]) – Mode to use for the metric given, can be "min" or "max", defaults to "min"
population_size (int) – Size of the population, defaults to 100
sample_size (int) – Size of the candidate set to obtain a parent for the mutation, defaults to 10
- class syne_tune.optimizer.schedulers.multiobjective.NSGA2Searcher(config_space, metric, mode='min', points_to_evaluate=None, population_size=20, **kwargs)[source]
Bases:
StochasticSearcher
This is a wrapper around the NSGA-2 [1] implementation of pymoo [2].
[1] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comp, 6(2):182–197, April 2002.
[2] J. Blank and K. Deb. pymoo: Multi-Objective Optimization in Python. IEEE Access, 2020.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
metric (List[str]) – Name of metric passed to update(). Can be obtained from scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.
points_to_evaluate (Optional[List[dict]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.
mode (Union[List[str], str]) – Should metric be minimized ("min", default) or maximized ("max"). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized
population_size (int) – Size of the population
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query _next_initial_config() for initial configs to return first.
- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional[Dict[str, Any]]
- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- class syne_tune.optimizer.schedulers.multiobjective.LinearScalarizedScheduler(config_space, metric, mode='min', scalarization_weights=None, base_scheduler_factory=None, **base_scheduler_kwargs)[source]
Bases:
TrialScheduler
Scheduler with linear scalarization of multiple objectives
This method optimizes a single objective equal to a linear scalarization of the given objectives. The scalarized single objective is named "scalarized_<metric1>_<metric2>_..._<metricN>".
- Parameters:
base_scheduler_factory (Optional[Callable[[Any], TrialScheduler]]) – Factory method for the single-objective scheduler used on the scalarized objective. It will be initialized inside this scheduler. Defaults to FIFOScheduler.
config_space (Dict[str, Any]) – Configuration space for evaluation function
metric (List[str]) – Names of metrics to optimize
mode (Union[List[str], str]) – Modes of metrics to optimize ("min" or "max"). All must be matching.
scalarization_weights (Union[ndarray, List[float], None]) – Weights used to scalarize objectives. Defaults to an array of 1s
base_scheduler_kwargs – Additional arguments to base_scheduler_factory beyond config_space, metric, mode
scalarization_weights:
ndarray
-
single_objective_metric:
str
-
base_scheduler:
TrialScheduler
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner. See the docstring of the chosen base_scheduler for details
- on_trial_error(trial)[source]
Called when a trial has failed. See the docstring of the chosen base_scheduler for details
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial. See the docstring of the chosen base_scheduler for details
- Return type:
str
- on_trial_complete(trial, result)[source]
Notification for the completion of trial. See the docstring of the chosen base_scheduler for details
- on_trial_remove(trial)[source]
Called to remove trial. See the docstring of the chosen base_scheduler for details
- trials_checkpoints_can_be_removed()[source]
See the docstring of the chosen base_scheduler for details
- Return type:
List[int]
- Returns:
IDs of paused trials for which checkpoints can be removed
- class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveMultiSurrogateSearcher(config_space, metric, estimators, mode='min', points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]
Bases:
BayesianOptimizationSearcher
Multi Objective Multi Surrogate Searcher for FIFO scheduler
This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a scikit-learn estimator based surrogate model.
Additional arguments on top of parent class StochasticSearcher:
- Parameters:
estimator – Instance of SKLearnEstimator to be used as surrogate model
scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.
num_initial_candidates (int) – Number of candidates sampled for scoring with the acquisition function.
num_initial_random_choices (int) – Number of randomly chosen candidates before the surrogate model is used.
allow_duplicates (bool) – If True, allow the same candidate to be selected more than once.
restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state – See above
- Returns:
New searcher object
- class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveLCBRandomLinearScalarization(predictor, active_metric=None, weights_sampler=None, kappa=0.5, normalize_acquisition=True, random_seed=None)[source]
Bases:
ScoringFunction
Note: This is the multi-objective random scalarization scoring function based on the work of Paria et al. [1]. It uses the lower confidence bound (LCB) as acquisition for the scalarized objective, \(h(\mu, \sigma) = \mu - \kappa \sigma\).
[1] Paria, Biswajit, Kirthevasan Kandasamy and Barnabás Póczos. A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations. Conference on Uncertainty in Artificial Intelligence (2018).
- Parameters:
predictor (Dict[str, Predictor]) – Surrogate predictor for statistics of the predictive distribution
weights_sampler (Optional[Callable[[], Dict[str, float]]]) – Callable that generates weights for each objective. When called, it returns a dictionary mapping metric name to scalarization weight, as in {<name of metric 1>: <weight for metric 1>, <name of metric 2>: <weight for metric 2>, …}
kappa (float) – Hyperparameter used for the LCB portion of the scoring
normalize_acquisition (bool) – If True, use rank-normalization on the acquisition function results before weighting.
random_seed (Optional[int]) – Random seed used for the default weights_sampler if none is provided.
- score(candidates, predictor=None)[source]
- Parameters:
candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed
predictor (Optional[Dict[str, Predictor]]) – Overrides the default predictor
- Return type: List[float]
- Returns: List of score values, length of candidates
Submodules
syne_tune.optimizer.schedulers.multiobjective.linear_scalarizer module
- class syne_tune.optimizer.schedulers.multiobjective.linear_scalarizer.LinearScalarizedScheduler(config_space, metric, mode='min', scalarization_weights=None, base_scheduler_factory=None, **base_scheduler_kwargs)[source]
Bases:
TrialScheduler
Scheduler with linear scalarization of multiple objectives
This method optimizes a single objective equal to the linear scalarization of the given objectives. The scalarized single objective is named:
"scalarized_<metric1>_<metric2>_..._<metricN>"
.- Parameters:
base_scheduler_factory (Optional[Callable[[Any], TrialScheduler]]) – Factory method for the single-objective scheduler used on the scalarized objective. It will be initialized inside this scheduler. Defaults to FIFOScheduler.
config_space (Dict[str, Any]) – Configuration space for evaluation function
metric (List[str]) – Names of metrics to optimize
mode (Union[List[str], str]) – Modes of metrics to optimize ("min" or "max"). All must be matching.
scalarization_weights (Union[ndarray, List[float], None]) – Weights used to scalarize objectives. Defaults to an array of 1s
base_scheduler_kwargs – Additional arguments to base_scheduler_factory beyond config_space, metric, mode
- scalarization_weights: ndarray
- single_objective_metric: str
- base_scheduler: TrialScheduler
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner. See the docstring of the chosen base_scheduler for details
- on_trial_error(trial)[source]
Called when a trial has failed. See the docstring of the chosen base_scheduler for details
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial. See the docstring of the chosen base_scheduler for details
- Return type:
str
- on_trial_complete(trial, result)[source]
Notification for the completion of trial. See the docstring of the chosen base_scheduler for details
- on_trial_remove(trial)[source]
Called to remove trial. See the docstring of the chosen base_scheduler for details
- trials_checkpoints_can_be_removed()[source]
See the docstring of the chosen base_scheduler for details
- Return type: List[int]
- Returns: IDs of paused trials for which checkpoints can be removed
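A minimal usage sketch follows. The metric names ("validation_error", "latency") and hyperparameter ranges are placeholders, not part of the API; both metrics are minimized, and scheduling decisions are delegated to the default FIFOScheduler.
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers.multiobjective.linear_scalarizer import LinearScalarizedScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "dropout": uniform(0.0, 0.5),
}

# Optimizes the single objective "scalarized_validation_error_latency"
scheduler = LinearScalarizedScheduler(
    config_space=config_space,
    metric=["validation_error", "latency"],
    mode="min",
    scalarization_weights=[1.0, 0.5],
)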
syne_tune.optimizer.schedulers.multiobjective.moasha module
- class syne_tune.optimizer.schedulers.multiobjective.moasha.MOASHA(config_space, metrics, mode=None, time_attr='training_iteration', multiobjective_priority=None, max_t=100, grace_period=1, reduction_factor=3, brackets=1)[source]
Bases:
TrialScheduler
Implements multi-objective Asynchronous Successive Halving with different multi-objective sort options. References:
A multi-objective perspective on jointly tuning hardware and hyperparameters. David Salinas, Valerio Perrone, Cedric Archambeau and Olivier Cruchant. NAS workshop, ICLR 2021.
and
Multi-objective multi-fidelity hyperparameter optimization with application to fairness. Robin Schmucker, Michele Donini, Valerio Perrone, Cédric Archambeau.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
metrics (List[str]) – List of metric names MOASHA optimizes over
mode (Union[str, List[str], None]) – One of {"min", "max"} or a list of these values (same size as metrics). Determines whether objectives are minimized or maximized. Defaults to "min"
time_attr (str) – A training result attribute to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress; the only requirement is that the attribute increases monotonically. Defaults to "training_iteration"
multiobjective_priority (Optional[MOPriority]) – The multi-objective priority used to sort multi-objective candidates. Several choices are supported, such as non-dominated sort or linear scalarization; the default is non-dominated sort.
max_t (int) – Maximum time units per trial. Trials are stopped after max_t time units (determined by time_attr) have passed. Defaults to 100
grace_period (int) – Only stop trials at least this old in time. The units are the same as the attribute named by time_attr. Defaults to 1
reduction_factor (float) – Used to set halving rate and amount. This is simply a unit-less scalar. Defaults to 3
brackets (int) – Number of brackets. Each bracket has a different grace_period and number of rung levels. Defaults to 1
- metric_names()[source]
- Return type: List[str]
- Returns: List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by suggest.
- Parameters:
trial (Trial) – Trial to be added
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.
- Parameters:
trial (Trial) – Trial for which results are reported
result (Dict[str, Any]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.
- Parameters:
trial (Trial) – Trial which is completing
result (Dict[str, Any]) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().
- Parameters:
trial (Trial) – Trial to be removed
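A minimal usage sketch follows. The training script (train_script.py), the metric names, and the reported resource attribute "epoch" are assumptions; the script must report both metrics and the resource attribute at the end of each epoch.
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.multiobjective.moasha import MOASHA

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 10,
}

# Multi-objective ASHA over validation error and latency (both minimized)
scheduler = MOASHA(
    config_space,
    metrics=["validation_error", "latency"],
    mode="min",
    time_attr="epoch",
    max_t=10,
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()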
syne_tune.optimizer.schedulers.multiobjective.multi_objective_regularized_evolution module
- class syne_tune.optimizer.schedulers.multiobjective.multi_objective_regularized_evolution.MultiObjectiveRegularizedEvolution(config_space, metric, mode, points_to_evaluate=None, population_size=100, sample_size=10, multiobjective_priority=None, **kwargs)[source]
Bases:
RegularizedEvolution
Adapts the regularized evolution algorithm by Real et al. to the multi-objective setting. Elements in the population are scored via a multi-objective priority that is set to non-dominated sort by default. Parents are sampled from the population based on this score.
Additional arguments on top of parent class syne_tune.optimizer.schedulers.searchers.StochasticSearcher:
- Parameters:
mode (Union[List[str], str]) – Mode to use for the given metric, can be "min" or "max", defaults to "min"
population_size (int) – Size of the population, defaults to 100
sample_size (int) – Size of the candidate set used to obtain a parent for the mutation, defaults to 10
syne_tune.optimizer.schedulers.multiobjective.multi_surrogate_multi_objective_searcher module
- class syne_tune.optimizer.schedulers.multiobjective.multi_surrogate_multi_objective_searcher.MultiObjectiveMultiSurrogateSearcher(config_space, metric, estimators, mode='min', points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]
Bases:
BayesianOptimizationSearcher
Multi Objective Multi Surrogate Searcher for FIFO scheduler
This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a scikit-learn estimator surrogate model.
Additional arguments on top of parent class StochasticSearcher:
- Parameters:
estimator – Instance of SKLearnEstimator to be used as surrogate model
scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.
num_initial_candidates (int) – Number of candidates sampled for scoring with the acquisition function.
num_initial_random_choices (int) – Number of randomly chosen candidates before the surrogate model is used.
allow_duplicates (bool) – If True, allow the same candidate to be selected more than once.
restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state – See above
- Returns:
New searcher object
syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority module
- class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.MOPriority(metrics=None)[source]
Bases:
object
- class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.LinearScalarizationPriority(metrics=None, weights=None)[source]
Bases:
MOPriority
- class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.FixedObjectivePriority(metrics=None, dim=None)[source]
Bases:
MOPriority
- class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.NonDominatedPriority(metrics=None, dim=0, max_num_samples=None)[source]
Bases:
MOPriority
syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority module
- syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.pareto_efficient(X)[source]
Evaluates for each allocation in the provided array whether it is Pareto efficient. The costs are assumed to be improved by lowering them (e.g., lower is better).
- Return type:
ndarray
Parameters
- X: np.ndarray [N, D]
The allocations to check where N is the number of allocations and D the number of costs per allocation.
Returns
- np.ndarray [N]
A boolean array, indicating for each allocation whether it is Pareto efficient.
- syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.compute_epsilon_net(X, dim=None)[source]
Outputs an order of the items in the provided array such that the items are spaced well. This means that after choosing a seed item, the next item is chosen to be the farthest from the seed item. The third item is then chosen to maximize the distance to the existing points and so on.
This algorithm is taken from “Nearest-Neighbor Searching and Metric Space Dimensions” (Clarkson, 2005, p.17).
- Return type:
ndarray
Parameters
- X: np.ndarray [N, D]
The items to sparsify where N is the number of items and D their dimensionality.
- dim: Optional[int], default: None
The index of the dimension to use for choosing the seed item. If None, an item is chosen at random; otherwise the item with the lowest value in the specified dimension is used.
Returns
- np.ndarray [N]
A list of item indices, defining a sparsified order of the items.
- syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.nondominated_sort(X, dim=None, max_items=None, flatten=True)[source]
Performs a multi-objective sort by iteratively computing the Pareto front and sparsifying the items within the Pareto front. This is a non-dominated sort leveraging an epsilon-net.
- Return type: Union[List[int], List[List[int]]]
Parameters
- X: np.ndarray [N, D]
The multi-dimensional items to sort.
- dim: Optional[int], default: None
The feature (metric) to prefer when ranking items within the Pareto front. If None, items are chosen randomly.
- max_items: Optional[int], default: None
The maximum number of items that should be returned. When this is None, all items are sorted.
- flatten: bool, default: True
Whether to flatten the resulting array.
Returns
- Union[List[int], List[List[int]]]
The indices of the sorted items, either globally or within each Pareto front, depending on the value of flatten.
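A small sketch on synthetic two-objective costs (both minimized); the array values are made up for illustration.
import numpy as np

from syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority import (
    pareto_efficient,
    nondominated_sort,
)

X = np.array([
    [0.2, 5.0],   # Pareto efficient
    [0.3, 2.0],   # Pareto efficient
    [0.4, 4.0],   # dominated by [0.3, 2.0]
    [0.1, 8.0],   # Pareto efficient
])

mask = pareto_efficient(X)      # boolean array of shape (4,)
order = nondominated_sort(X)    # item indices, Pareto-front items first
print(mask, order)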
syne_tune.optimizer.schedulers.multiobjective.nsga2_searcher module
- class syne_tune.optimizer.schedulers.multiobjective.nsga2_searcher.NSGA2Searcher(config_space, metric, mode='min', points_to_evaluate=None, population_size=20, **kwargs)[source]
Bases:
StochasticSearcher
This is a wrapper around the NSGA-2 [1] implementation of pymoo [2].
[1] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comp, 6(2):182–197, April 2002.
[2] J. Blank and K. Deb. pymoo: Multi-Objective Optimization in Python. IEEE Access, 2020.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
metric (List[str]) – Name of metric passed to update(). Can be obtained from the scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.
points_to_evaluate (Optional[List[dict]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.
mode (Union[List[str], str]) – Should the metric be minimized ("min", default) or maximized ("max"). In the case of multi-objective optimization, mode can be a list defining for each metric whether it is minimized or maximized
population_size (int) – Size of the population
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query _next_initial_config() for initial configs to return first.
- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type: Optional[Dict[str, Any]]
- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
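A hedged construction sketch follows (requires pymoo to be installed). The metric names and ranges are placeholders; in practice the searcher is driven by a scheduler rather than called directly, so the direct get_config() call with a hypothetical trial_id is only illustrative.
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.multiobjective.nsga2_searcher import NSGA2Searcher

config_space = {
    "learning_rate": uniform(1e-4, 1e-1),
    "num_layers": randint(1, 8),
}

searcher = NSGA2Searcher(
    config_space,
    metric=["validation_error", "latency"],
    mode=["min", "min"],
    population_size=20,
)
# Normally called by the scheduler; shown here only to illustrate the interface
config = searcher.get_config(trial_id="0")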
syne_tune.optimizer.schedulers.multiobjective.random_scalarization module
- class syne_tune.optimizer.schedulers.multiobjective.random_scalarization.MultiObjectiveLCBRandomLinearScalarization(predictor, active_metric=None, weights_sampler=None, kappa=0.5, normalize_acquisition=True, random_seed=None)[source]
Bases:
ScoringFunction
Note: This is the multi-objective random scalarization scoring function based on the work of Paria et al. [1]. It uses the lower confidence bound (LCB) as acquisition for the scalarized objective, \(h(\mu, \sigma) = \mu - \kappa \sigma\).
[1] Paria, Biswajit, Kirthevasan Kandasamy and Barnabás Póczos. A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations. Conference on Uncertainty in Artificial Intelligence (2018).
- Parameters:
predictor (Dict[str, Predictor]) – Surrogate predictor for statistics of the predictive distribution
weights_sampler (Optional[Callable[[], Dict[str, float]]]) – Callable that generates weights for each objective. When called, it returns a dictionary mapping metric name to scalarization weight, as in {<name of metric 1>: <weight for metric 1>, <name of metric 2>: <weight for metric 2>, …}
kappa (float) – Hyperparameter used for the LCB portion of the scoring
normalize_acquisition (bool) – If True, use rank-normalization on the acquisition function results before weighting.
random_seed (Optional[int]) – Random seed used for the default weights_sampler if none is provided.
- score(candidates, predictor=None)[source]
- Parameters:
candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed
predictor (Optional[Dict[str, Predictor]]) – Overrides the default predictor
- Return type: List[float]
- Returns: List of score values, length of candidates
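A sketch of a custom weights_sampler (hypothetical metric names). The predictor dictionary itself is supplied by the Bayesian optimization machinery, so only the sampler is shown; it would be passed as weights_sampler=weights_sampler when the scoring function is instantiated.
import numpy as np

def weights_sampler():
    # Draw random scalarization weights for two objectives and map metric name -> weight
    w = np.random.dirichlet([1.0, 1.0])
    return {"validation_error": float(w[0]), "latency": float(w[1])}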
syne_tune.optimizer.schedulers.multiobjective.utils module
- syne_tune.optimizer.schedulers.multiobjective.utils.default_reference_point(results_array)[source]
- Return type:
ndarray
- syne_tune.optimizer.schedulers.multiobjective.utils.hypervolume(results_array, reference_point=None)[source]
Compute the hypervolume of all results based on reference points
- Parameters:
results_array (ndarray) – Array with experiment results ordered by time, with shape (npoints, ndimensions).
reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum values of each dimension of results_array are used.
- Return type: float
- Returns: Hypervolume indicator
- syne_tune.optimizer.schedulers.multiobjective.utils.linear_interpolate(hv_indicator, indices)[source]
- syne_tune.optimizer.schedulers.multiobjective.utils.hypervolume_cumulative(results_array, reference_point=None, increment=1)[source]
Compute the cumulative hypervolume of all results based on a reference point. Returns an array with hypervolumes given by an increasing range of points: return_array[idx] = hypervolume(results_array[0 : (idx + 1)]).
The current implementation is slow, since the hypervolume indicator is not computed incrementally. A workaround is to use increment > 1, in which case the HV indicator is only computed every increment-th entry and linearly interpolated in between.
- Parameters:
results_array (ndarray) – Array with experiment results ordered by time, with shape (npoints, ndimensions).
reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum values of each dimension of results_array are used.
- Return type: ndarray
- Returns: Cumulative hypervolume array, shape (npoints,)
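A small sketch on synthetic bi-objective results (both metrics minimized); the values and the reference point are made up for illustration.
import numpy as np

from syne_tune.optimizer.schedulers.multiobjective.utils import (
    hypervolume,
    hypervolume_cumulative,
)

results = np.array([
    [0.9, 4.0],
    [0.5, 3.0],
    [0.4, 3.5],
    [0.3, 2.5],
])

# Hypervolume dominated by the results with respect to a fixed reference point
hv = hypervolume(results, reference_point=np.array([1.0, 5.0]))
# Cumulative hypervolume over time; defaults to the per-dimension maxima as reference
hv_over_time = hypervolume_cumulative(results)
print(hv, hv_over_time)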
syne_tune.optimizer.schedulers.neuralbands package
- class syne_tune.optimizer.schedulers.neuralbands.NeuralbandScheduler(config_space, gamma=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]
Bases:
NeuralbandSchedulerBase
NeuralBand is a neural-bandit based HPO algorithm for the multi-fidelity setting. It uses a budget-aware neural network together with a feedback perturbation to efficiently explore the input space across fidelities. NeuralBand uses a novel configuration selection criterion to actively choose the configuration in each trial and incrementally exploits the knowledge of every past trial.
- Parameters:
config_space (Dict) – Configuration space
gamma (float) – Controls aggressiveness of the configuration selection criterion
nu (float) – Controls aggressiveness of perturbing feedback for exploration
step_size (int) – Number of trials after which the network is trained again
max_while_loop (int) – Maximal number of times we can draw a configuration from the configuration space
kwargs –
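A hedged construction sketch follows. It assumes that the remaining keyword arguments are forwarded to the underlying HyperbandScheduler (NeuralbandScheduler derives from NeuralbandSchedulerBase, which is based on HyperbandScheduler), so the usual multi-fidelity arguments such as metric, resource_attr and max_t are shown; metric names and ranges are placeholders, and PyTorch is required.
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.neuralbands import NeuralbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

scheduler = NeuralbandScheduler(
    config_space,
    gamma=0.01,
    nu=0.01,
    # assumed to be forwarded to the underlying HyperbandScheduler
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_t=27,
)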
Submodules
syne_tune.optimizer.schedulers.neuralbands.networks module
- class syne_tune.optimizer.schedulers.neuralbands.networks.NetworkExploitation(dim, hidden_size=100)[source]
Bases:
Module
- forward(x1, b)[source]
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
syne_tune.optimizer.schedulers.neuralbands.neuralband module
- syne_tune.optimizer.schedulers.neuralbands.neuralband.is_continue_decision(trial_decision)[source]
- Return type:
bool
- class syne_tune.optimizer.schedulers.neuralbands.neuralband.NeuralbandScheduler(config_space, gamma=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]
Bases:
NeuralbandSchedulerBase
NeuralBand is a neural-bandit based HPO algorithm for the multi-fidelity setting. It uses a budget-aware neural network together with a feedback perturbation to efficiently explore the input space across fidelities. NeuralBand uses a novel configuration selection criterion to actively choose the configuration in each trial and incrementally exploits the knowledge of every past trial.
- Parameters:
config_space (Dict) – Configuration space
gamma (float) – Controls aggressiveness of the configuration selection criterion
nu (float) – Controls aggressiveness of perturbing feedback for exploration
step_size (int) – Number of trials after which the network is trained again
max_while_loop (int) – Maximal number of times we can draw a configuration from the configuration space
kwargs –
syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement module
- syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.is_continue_decision(trial_decision)[source]
- Return type:
bool
- class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandSchedulerBase(config_space, step_size, max_while_loop, **kwargs)[source]
Bases:
HyperbandScheduler
- class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandEGreedyScheduler(config_space, epsilon=0.1, step_size=30, max_while_loop=100, **kwargs)[source]
Bases:
NeuralbandSchedulerBase
- class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandTSScheduler(config_space, lamdba=0.1, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]
Bases:
NeuralbandSchedulerBase
- class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandUCBScheduler(config_space, lamdba=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]
Bases:
NeuralbandSchedulerBase
syne_tune.optimizer.schedulers.searchers package
- class syne_tune.optimizer.schedulers.searchers.BaseSearcher(config_space, metric, points_to_evaluate=None, mode='min')[source]
Bases:
object
Base class of searchers, which are components of schedulers responsible for implementing get_config().
Note
This is an abstract base class. In order to implement a new searcher, try to start from StochasticAndFilterDuplicatesSearcher or StochasticSearcher, which implement generally useful properties.
- Parameters:
config_space (Dict[str, Any]) – Configuration space
metric (Union[List[str], str]) – Name of metric passed to update(). Can be obtained from the scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.
points_to_evaluate (Optional[List[Dict[str, Any]]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.
mode (Union[List[str], str]) – Should the metric be minimized ("min", default) or maximized ("max"). In the case of multi-objective optimization, mode can be a list defining for each metric whether it is minimized or maximized
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (TrialScheduler) – Scheduler the searcher is used with.
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query _next_initial_config() for initial configs to return first.
- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type: Optional[Dict[str, Any]]
- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- on_trial_result(trial_id, config, result, update)[source]
Inform searcher about result
The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any); otherwise result is an intermediate result not modelled.
The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.
- Parameters:
trial_id (str) – See on_trial_result()
config (Dict[str, Any]) – See on_trial_result()
result (Dict[str, Any]) – See on_trial_result()
update (bool) – Should the surrogate model be updated?
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (str) – ID of trial to be registered as pending evaluation
config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.
milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that the model registers (config, milestone) as pending.
- remove_case(trial_id, **kwargs)[source]
Remove data case previously appended by _update().
For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.
- Parameters:
trial_id (str) – ID of trial whose data is to be removed
kwargs – Extra arguments, optional
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (str) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial trial_id.
This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (str) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- model_parameters()[source]
- Returns:
Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
- get_state()[source]
Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.
- Return type: Dict[str, Any]
- Returns: Pickle-able mutable state of searcher
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state (Dict[str, Any]) – See above
New searcher object
- property debug_log: DebugLogPrinter | None
Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.
- Returns: debug_log object, or None (not supported)
- syne_tune.optimizer.schedulers.searchers.impute_points_to_evaluate(points_to_evaluate, config_space)[source]
Transforms the points_to_evaluate argument to BaseSearcher. Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. Also, duplicate entries are filtered out. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.
- Parameters:
points_to_evaluate (Optional[List[Dict[str, Any]]]) – Argument to BaseSearcher
config_space (Dict[str, Any]) – Configuration space
- Return type: List[Dict[str, Any]]
- Returns: List of fully specified initial configs
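A minimal sketch (hyperparameter names and ranges are placeholders): partial and empty configs are completed with the midpoint heuristic.
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers import impute_points_to_evaluate

config_space = {"learning_rate": uniform(0.0, 1.0), "batch_size": randint(16, 256)}

# Missing hyperparameters are filled in; duplicates are filtered out
configs = impute_points_to_evaluate([{"learning_rate": 0.1}, {}], config_space)
print(configs)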
- class syne_tune.optimizer.schedulers.searchers.StochasticSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BaseSearcher
Base class of searchers which use random decisions. Creates the random_state member, which must be used for all random draws.
Making proper use of this interface allows us to run experiments with control of random seeds, e.g. for paired comparisons or integration testing.
Additional arguments on top of parent class BaseSearcher:
- Parameters:
random_seed_generator (RandomSeedGenerator, optional) – If given, the random seed is drawn from there
random_seed (int, optional) – Used if random_seed_generator is not given.
- class syne_tune.optimizer.schedulers.searchers.StochasticAndFilterDuplicatesSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]
Bases:
StochasticSearcher
Base class for searchers with the following properties:
Random decisions use a common random_state
Maintains an exclusion list to filter out duplicates in get_config() if allow_duplicates == False. If this is True, duplicates are not filtered, and the exclusion list is used only to avoid configurations of failed trials.
If restrict_configurations is given, this is a list of configurations, and the searcher only suggests configurations from there. If allow_duplicates == False, entries are popped off this list once suggested. points_to_evaluate is filtered to only contain entries in this set.
In order to make use of these features:
Reject configurations in get_config() if should_not_suggest() returns True. If the configuration is drawn at random, use _get_random_config(), which incorporates this filtering
Implement _get_config() instead of get_config(). The latter adds the new config to the exclusion list if allow_duplicates == False
Note: Not all searchers which filter duplicates make use of this class.
Additional arguments on top of parent class StochasticSearcher:
- Parameters:
allow_duplicates (Optional[bool]) – See above. Defaults to False
restrict_configurations (Optional[List[Dict[str, Any]]]) – See above, optional
- property allow_duplicates: bool
- should_not_suggest(config)[source]
- Parameters:
config (Dict[str, Any]) – Configuration
- Return type: bool
- Returns: Whether get_config() should not suggest this configuration
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query _next_initial_config() for initial configs to return first.
- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type: Optional[Dict[str, Any]]
- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (str) – ID of trial to be registered as pending evaluation
config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.
milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that the model registers (config, milestone) as pending.
- syne_tune.optimizer.schedulers.searchers.extract_random_seed(**kwargs)[source]
- Return type: (int, Dict[str, Any])
- class syne_tune.optimizer.schedulers.searchers.RandomSearcher(config_space, metric, points_to_evaluate=None, debug_log=False, resource_attr=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
Searcher which randomly samples configurations to try next.
Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:
- Parameters:
debug_log (Union[bool, DebugLogPrinter]) – If True, debug log printing is activated. Logs which configs are chosen when, and which metric values are obtained. Defaults to False
resource_attr (Optional[str]) – Optional. Key in result passed to _update() for the resource value (for multi-fidelity schedulers)
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (TrialScheduler) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state (Dict[str, Any]) – See above
New searcher object
- property debug_log
Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.
- Returns: debug_log object, or None (not supported)
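A minimal usage sketch follows. In practice RandomSearcher is usually created internally by passing searcher="random" to a scheduler; metric name and hyperparameter ranges are placeholders.
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

# Creates a RandomSearcher internally and samples configurations at random
scheduler = FIFOScheduler(
    config_space,
    searcher="random",
    metric="validation_error",
    mode="min",
)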
- class syne_tune.optimizer.schedulers.searchers.GridSearcher(config_space, metric, points_to_evaluate=None, num_samples=None, shuffle_config=True, allow_duplicates=False, **kwargs)[source]
Bases:
StochasticSearcher
Searcher that samples configurations from an equally spaced grid over config_space.
It first evaluates configurations defined in points_to_evaluate and then continues with the remaining points from the grid.
Additional arguments on top of parent class StochasticSearcher.
- Parameters:
num_samples (Optional[Dict[str, int]]) – Dictionary, optional. Number of samples per hyperparameter. This is required for hyperparameters of type float, optional for integer hyperparameters, and will be ignored for other types (categorical, scalar). If left unspecified, a default value of DEFAULT_NSAMPLE will be used for float parameters, and the smaller of DEFAULT_NSAMPLE and the integer range will be used for integer parameters.
shuffle_config (bool) – If True (default), the order of configurations suggested after those specified in points_to_evaluate is shuffled. Otherwise, the order will follow the Cartesian product of the configurations.
allow_duplicates (bool) – If True, get_config() may return the same configuration more than once. Defaults to False
- get_config(**kwargs)[source]
Select the next configuration from the grid.
This is done without replacement, so previously returned configs are not suggested again.
- Return type: Optional[dict]
- Returns: A new configuration that is valid, or None if no new config can be suggested. The returned configuration is a dictionary that maps hyperparameters to their values.
- get_state()[source]
Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.
- Return type: Dict[str, Any]
- Returns: Pickle-able mutable state of searcher
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state (Dict[str, Any]) – See above
New searcher object
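A minimal sketch instantiating the searcher directly (metric name and hyperparameter ranges are placeholders): the float hyperparameter gets an explicit grid size via num_samples.
from syne_tune.config_space import choice, uniform
from syne_tune.optimizer.schedulers.searchers import GridSearcher

config_space = {
    "optimizer": choice(["sgd", "adam"]),
    "learning_rate": uniform(1e-4, 1e-1),
}

searcher = GridSearcher(
    config_space,
    metric="validation_error",
    num_samples={"learning_rate": 5},  # 5 grid points for the float parameter
)
config = searcher.get_config()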
- syne_tune.optimizer.schedulers.searchers.searcher_factory(searcher_name, **kwargs)[source]
Factory for searcher objects
This function creates searcher objects from a string argument name and additional kwargs. It is typically called in the constructor of a scheduler (see FIFOScheduler), which provides most of the required kwargs.
- Parameters:
searcher_name (str) – Value of the searcher argument to the scheduler (see FIFOScheduler)
kwargs – Arguments to the BaseSearcher constructor
- Return type:
- Returns:
New searcher object
- class syne_tune.optimizer.schedulers.searchers.ModelBasedSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
StochasticSearcher
Common code for surrogate model based searchers
If num_initial_random_choices > 0, initial configurations are drawn using an internal RandomSearcher object, which is created in _assign_random_searcher(). This internal random searcher shares random_state with the searcher here. This ensures that if ModelBasedSearcher and RandomSearcher objects are created with the same random_seed and points_to_evaluate argument, initial configurations are identical until _get_config_modelbased() kicks in.
Note that this works because random_state is only used in the internal random searcher until _get_config_modelbased() is first called.
- on_trial_result(trial_id, config, result, update)[source]
Inform searcher about result
The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any); otherwise result is an intermediate result not modelled.
The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.
- Parameters:
trial_id (str) – See on_trial_result()
config (Dict[str, Any]) – See on_trial_result()
result (Dict[str, Any]) – See on_trial_result()
update (bool) – Should the surrogate model be updated?
- get_config(**kwargs)[source]
Runs Bayesian optimization in order to suggest the next config to evaluate.
- Return type: Optional[Dict[str, Any]]
- Returns:
Next config to evaluate at
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- model_parameters()[source]
- Returns:
Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
- get_state()[source]
The mutable state consists of the GP model parameters, the TuningJobState, and the skip_optimization predicate (which can have a mutable state). We assume that skip_optimization can be pickled.
Note that we do not have to store the state of _random_searcher, since this internal searcher shares its random_state with the searcher here.
- Return type: Dict[str, Any]
- property debug_log
Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.
- Returns: debug_log object, or None (not supported)
- class syne_tune.optimizer.schedulers.searchers.BayesianOptimizationSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
ModelBasedSearcher
Common Code for searchers using Bayesian optimization
We implement Bayesian optimization, based on a model factory which parameterizes the state transformer. This implementation works with any type of surrogate model and acquisition function, which are compatible with each other.
The following happens in get_config():
For the first num_init_random calls, a config is drawn at random (after points_to_evaluate, which are included in the num_init_random initial ones). Afterwards, Bayesian optimization is used, unless there are no finished evaluations yet (a surrogate model cannot be used with no data at all)
For BO, model hyperparameters are refit first. This step can be skipped (see opt_skip_* parameters).
Next, the BO decision is made based on BayesianOptimizationAlgorithm. This involves sampling num_init_candidates configs at random, ranking them with a scoring function (initial_scoring), and finally running local optimization starting from the top-scoring config.
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (TrialScheduler) – Scheduler the searcher is used with.
- register_pending(trial_id, config=None, milestone=None)[source]
Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.
- get_batch_configs(batch_size, num_init_candidates_for_batch=None, **kwargs)[source]
Asks for a batch of
batch_size
configurations to be suggested. This is roughly equivalent to callingget_config
batch_size
times, marking the suggested configs as pending in the state (but the state is not modified here). This means the batch is chosen sequentially, at about the cost of callingget_config
batch_size
times.If
num_init_candidates_for_batch
is given, it is used instead ofnum_init_candidates
for the selection of all but the first config in the batch. In order to speed up batch selection, choosenum_init_candidates_for_batch
smaller thannum_init_candidates
.If less than
batch_size
configs are returned, the search space has been exhausted.Note: Batch selection does not support
debug_log
right now: make sure to switch this off when creating scheduler and searcher.- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]
- class syne_tune.optimizer.schedulers.searchers.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]
Bases:
BayesianOptimizationSearcher
Gaussian process Bayesian optimization for FIFO scheduler
This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.
It is not recommended to create GPFIFOSearcher objects directly, but rather to create FIFOScheduler objects with searcher="bayesopt", and to pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.
Most of the implementation is generic in BayesianOptimizationSearcher.
Note: If metric values are to be maximized (mode="max" in the scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.
Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing (i.e., target values are drawn from the current posterior, and acquisition functions are averaged over this sample, see num_fantasy_samples).
The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.
Note that the full logic of construction based on arguments is given in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.
Additional arguments on top of parent class StochasticSearcher:
- Parameters:
clone_from_state (bool) – Internal argument, do not use
resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If
resource_attr
andcost_attr
are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data.cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether
resource_attr
is given, cost values are read from each report or only at the end.num_init_random (int, optional) – Number of initial
get_config()
calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults toDEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS
num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in
get_config
. Defaults toDEFAULT_NUM_INITIAL_CANDIDATES
num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations), defaults to 20
no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to
False
input_warping (bool, optional) – If
True
, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See alsoWarping
. Coordinates which belong to categorical hyperparameters, are not warped. Defaults toFalse
.boxcox_transform (bool, optional) – If
True
, target values are transformed before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults toFalse
.gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are
SUPPORTED_BASE_MODELS
. Defaults to “matern52-ard” (Matern 5/2 with automatic relevance determination).acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are
SUPPORTED_ACQUISITION_FUNCTIONS
. Defaults to “ei” (expected improvement acquisition function).acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters, they can be passed here. If none are given, default values are used.
initial_scoring (str, optional) –
Scoring function to rank initial candidates (local optimization of EI is started from top scorer):
”thompson_indep”: Independent Thompson sampling; randomized score, which can increase exploration
”acq_func”: score is the same (EI) acquisition function which is used for local optimization afterwards
Defaults to
DEFAULT_INITIAL_SCORING
skip_local_optimization (bool, optional) – If
True
, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case,initial_scoring="acq_func"
makes most sense, otherwise the acquisition function will not be used. Defaults to Falseopt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2
opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50
opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If
True
, each fitting is started from the previous optimum. Not recommended in general. Defaults toFalse
opt_verbose (bool, optional) – Parameter for surrogate model fitting. If
True
, lots of output. Defaults toFalse
max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on this value. If there are more observations, they are down sampled, see
SubsampleSingleFidelityStateConverter
for details. This down sampling is repeated every time the model is fit. Theopt_skip_*
predicates are evaluated before the state is downsampled. PassNone
not to apply such a threshold. The default isDEFAULT_MAX_SIZE_DATA_FOR_MODEL
.max_size_top_fraction (float, optional) – Only used if
max_size_data_for_model
is set. This fraction of the down sampled set is filled with the top entries in the full set, the remaining ones are sampled at random from the full set, seeSubsampleSingleFidelityStateConverter
for details. Defaults to 0.25.opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as number of observations below this threshold. Defaults to 150
opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If
>1
, and number of observations aboveopt_skip_init_length
, fitting is done only K-th call, and skipped otherwise. Defaults to 1 (no skipping)allow_duplicates (bool, optional) – If
True
,get_config()
may return the same configuration more than once. Defaults toFalse
restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This needs
skip_local_optimization == True
. Ifallow_duplicates == False
, entries are popped off this list once suggested.map_reward (str or
MapReward
, optional) –In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization is minimizing the criterion.
map_reward
converts from metric to internal criterion:”minus_x”:
criterion = -metric
”<a>_minus_x”:
criterion = <a> - metric
. For example “1_minus_x” maps accuracy to zero-one error
From a technical standpoint, it does not matter what is chosen here, because criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before fitted with a Gaussian process. Defaults to “1_minus_x”
transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end,
config_space
must contain a categorical parameter of nametransfer_learning_task_attr
, whose range are all task IDs. Also,transfer_learning_active_task
must denote the active task, andtransfer_learning_active_config_space
is used asactive_config_space
argument inHyperparameterRanges
. This allows us to use a narrower search space for the active task than for the union of all tasks (config_space
must be that), which is needed if some configurations of non-active tasks lie outside of the ranges inactive_config_space
. One of the implications is thatfilter_observed_data()
is selecting configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task.transfer_learning_active_task (str, optional) – See
transfer_learning_task_attr
.transfer_learning_active_config_space (Dict[str, Any], optional) – See
transfer_learning_task_attr
. If not given,config_space
is the search space for the active task as well. This active config space need not contain thetransfer_learning_task_attr
parameter. In fact, this parameter is set to a categorical withtransfer_learning_active_task
as single value, so that new configs are chosen for the active task only.transfer_learning_model (str, optional) –
See
transfer_learning_task_attr
. Specifies the surrogate model to be used for transfer learning:”matern52_product”: Kernel is product of Matern 5/2 (not ARD) on
transfer_learning_task_attr
and Matern 5/2 (ARD) on the rest. Assumes that data from same task are more closely related than data from different tasks”matern52_same”: Kernel is Matern 5/2 (ARD) on the rest of the variables,
transfer_learning_task_attr
is ignored. Assumes that data from all tasks can be merged together
Defaults to “matern52_product”
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state – See above
- Returns:
New searcher object
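A minimal sketch of the recommended route (FIFOScheduler with searcher="bayesopt"); metric name, hyperparameter ranges, and the chosen search_options values are placeholders.
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

# Creates a GPFIFOSearcher internally; extra searcher arguments go into search_options
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options={
        "num_init_random": 5,     # random configs before the GP model kicks in
        "input_warping": True,
    },
    metric="validation_error",
    mode="min",
)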
- class syne_tune.optimizer.schedulers.searchers.GPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
GPFIFOSearcher
Gaussian process Bayesian optimization for asynchronous Hyperband scheduler.
This searcher must be used with a scheduler of type MultiFidelitySchedulerMixin. It provides a novel combination of Bayesian optimization, based on a Gaussian process surrogate model, with Hyperband scheduling. In particular, observations across resource levels are modelled jointly.
It is not recommended to create GPMultiFidelitySearcher objects directly, but rather to create HyperbandScheduler objects with searcher="bayesopt", and to pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.
Most of the GPFIFOSearcher comments apply here as well. In multi-fidelity HPO, we optimize a function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the configuration, \(r\) the resource (or time) attribute. The latter must be a positive integer. In most applications, resource_attr == "epoch", and the resource is the number of epochs already trained.
If model == "gp_multitask" (default), we model the function \(f(\mathbf{x}, r)\) jointly over all resource levels \(r\) at which it is observed (but see searcher_data in HyperbandScheduler). The kernel and mean function of our surrogate model are over \((\mathbf{x}, r)\). The surrogate model is selected by gp_resource_kernel. More details about the supported kernels are in:
Tiao, Klein, Lienart, Archambeau, Seeger (2020). Model-based Asynchronous Hyperparameter and Neural Architecture Search.
The acquisition function (EI) which is optimized in get_config() is obtained by fixing the resource level \(r\) to a value which is determined depending on the current state. If resource_acq == "bohb", \(r\) is the largest value <= max_t where we have seen \(\ge \mathrm{dimension}(\mathbf{x})\) metric values. If resource_acq == "first", \(r\) is the first milestone which config \(\mathbf{x}\) would reach when started.
Additional arguments on top of parent class GPFIFOSearcher.
- Parameters:
model (str, optional) –
Selects surrogate model (learning curve model) to be used. Choices are:
”gp_multitask” (default): GP multi-task surrogate model
”gp_independent”: Independent GPs for each rung level, sharing an ARD kernel
”gp_issm”: Gaussian-additive model of ISSM type
”gp_expdecay”: Gaussian-additive model of exponential decay type (as in Freeze Thaw Bayesian Optimization)
gp_resource_kernel (str, optional) – Only relevant for model == "gp_multitask". Surrogate model over the criterion function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the config, \(r\) the resource. Note that \(\mathbf{x}\) is encoded to be a vector with entries in [0, 1], and \(r\) is linearly mapped to [0, 1], while the criterion data is normalized to mean 0, variance 1. The reference above provides details on the models supported here. For the exponential decay kernel, the base kernel over \(\mathbf{x}\) is Matern 5/2 ARD. See SUPPORTED_RESOURCE_MODELS for supported choices. Defaults to "exp-decay-sum"
resource_acq (str, optional) – Only relevant for model in {"gp_multitask", "gp_independent"}. Determines how the EI acquisition function is used. Values: "bohb", "first". Defaults to "bohb"
max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleMultiFidelityStateConverter for details. This downsampling is repeated every time the model is fit, which ensures that the most recent data is taken into account. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None to not apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL
opt_skip_num_max_resource (bool, optional) – Parameter for surrogate model fitting, skip predicate. If True, and the number of observations is above opt_skip_init_length, fitting is done only when there is a new datapoint at r = max_t, and skipped otherwise. Defaults to False
issm_gamma_one (bool, optional) – Only relevant for model == "gp_issm". If True, the gamma parameter of the ISSM is fixed to 1, otherwise it is optimized over. Defaults to False
expdecay_normalize_inputs (bool, optional) – Only relevant for model == "gp_expdecay". If True, resource values r are normalized to [0, 1] as input to the exponential decay surrogate model. Defaults to False
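A minimal usage sketch, constructing the recommended HyperbandScheduler front end rather than the searcher directly. The config space, metric name, and attribute names below are placeholders, and constructor arguments may differ slightly across Syne Tune versions:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

# Hypothetical config space; "epochs" is passed to the training script as the maximum resource
config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
    "epochs": 81,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="bayesopt",          # selects GPMultiFidelitySearcher internally
    search_options={
        "model": "gp_multitask",  # joint GP surrogate over (x, r)
        "gp_resource_kernel": "exp-decay-sum",
        "resource_acq": "bohb",
    },
    metric="validation_error",    # metric reported by the training script
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)

Passing arguments through search_options keeps the searcher construction consistent with gp_searcher_factory.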
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- register_pending(trial_id, config=None, milestone=None)[source]
Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (
str
) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- remove_case(trial_id, **kwargs)[source]
Remove a data case previously appended by _update().
For searchers which maintain the dataset of all cases (reports) passed to update, this method allows one case to be removed from the dataset.
- Parameters:
trial_id (
str
) – ID of trial whose data is to be removedkwargs – Extra arguments, optional
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
- Parameters:
state – See above
- Returns:
New searcher object
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt package
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common module
- syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.dictionarize_objective(x)[source]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.TrialEvaluations(trial_id, metrics)[source]
Bases:
object
For each fixed k,
metrics[k]
is either a single value or a dict. The latter is used, for example, for multi-fidelity schedulers, wheremetrics[k][str(r)]
is the value at resource levelr
.-
trial_id:
str
-
metrics:
Dict
[str
,Union
[float
,Dict
[str
,float
]]]
- num_cases(metric_name='target', resource=None)[source]
Counts the number of observations for metric
metric_name
.- Parameters:
metric_name (
str
) – Defaults toINTERNAL_METRIC_NAME
resource (
Optional
[int
]) – In the multi-fidelity case, we only count observations at this resource level
- Return type:
int
- Returns:
Number of observations
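As an illustration of the dict-valued metrics layout described above, here is a small sketch (trial ID and metric values are made up):

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
    TrialEvaluations,
)

# Multi-fidelity case: values of metric "target" are keyed by str(resource_level)
ev = TrialEvaluations(
    trial_id="0",
    metrics={"target": {"1": 0.9, "3": 0.7, "9": 0.55}},
)
print(ev.num_cases(metric_name="target"))              # 3 observations in total
print(ev.num_cases(metric_name="target", resource=3))  # 1 observation at resource level 3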
- class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.PendingEvaluation(trial_id, resource=None)[source]
Bases:
object
Maintains information for pending candidates (i.e., candidates which have been queried for labeling, but for which target feedback has not yet been obtained).
The minimum information is the candidate which has been queried.
- property trial_id: str
- property resource: int | None
- class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.FantasizedPendingEvaluation(trial_id, fantasies, resource=None)[source]
Bases:
PendingEvaluation
Here, latent target values are integrated out by Monte Carlo samples, also called “fantasies”.
- property fantasies
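A brief sketch of how a pending evaluation is represented (trial ID and resource level are made up):

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
    PendingEvaluation,
)

# Trial "3" has been started (or promoted) to run up to resource level (epoch) 9,
# but no metric value has been reported for that level yet
pending = PendingEvaluation(trial_id="3", resource=9)
print(pending.trial_id, pending.resource)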
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext.ExtendedConfiguration(hp_ranges, resource_attr_key, resource_attr_range)[source]
Bases:
object
This class facilitates handling extended configs, which consist of a normal config and a resource attribute.
The config space hp_ranges is extended by an additional resource attribute. Note that this is not a hyperparameter we optimize over, but it is under the control of the scheduler. Its allowed range is
[1, resource_attr_range[1]]
, which can be larger than[resource_attr_range[0], resource_attr_range[1]]
. This is because extended configs with resource values outside of resource_attr_range may arise (for example, in the early stopping context, we may receive data fromepoch < resource_attr_range[0]
).- get(config, resource)[source]
Create extended config with resource added.
- Parameters:
config (
Dict
[str
,Union
[int
,float
,str
]]) – Non-extended configresource (
int
) – Resource value
- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
Extended config
- remove_resource(config_ext)[source]
Strips away resource attribute and returns normal config. If
config_ext
is already normal, it is returned as is.- Parameters:
config_ext (
Dict
[str
,Union
[int
,float
,str
]]) – Extended config- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
config_ext
without resource attribute
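A sketch of extending and stripping configs. The hp_ranges object describing the non-extended config space is assumed to exist; its construction is omitted here:

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext import (
    ExtendedConfiguration,
)

config_ext_handler = ExtendedConfiguration(
    hp_ranges=hp_ranges,           # HyperparameterRanges over the normal config space (assumed)
    resource_attr_key="epoch",
    resource_attr_range=(1, 81),
)

config = {"learning_rate": 0.01, "num_layers": 4}
config_with_resource = config_ext_handler.get(config, resource=27)
# config_with_resource == {"learning_rate": 0.01, "num_layers": 4, "epoch": 27}
assert config_ext_handler.remove_resource(config_with_resource) == config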
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.tuning_job_state module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.tuning_job_state.TuningJobState(hp_ranges, config_for_trial, trials_evaluations, failed_trials=None, pending_evaluations=None)[source]
Bases:
object
Collects all data determining the state of a tuning experiment. Trials are indexed by
trial_id
. The configurations associated with trials are listed inconfig_for_trial
.trials_evaluations
contains observations,failed_trials
lists trials for which evaluations have failed,pending_evaluations
lists trials for which observations are pending.trials_evaluations
may store values for different metrics in each record, and each such value may be a dict (see:class:TrialEvaluations
). For example, for multi-fidelity schedulers,trials_evaluations[i].metrics[k][str(r)]
is the value for metric k and trialtrials_evaluations[i].trial_id
observed at resource level r.- metrics_for_trial(trial_id, config=None)[source]
Helper for inserting new entry into
trials_evaluations
. Iftrial_id
is already contained there, the correspondingeval.metrics
is returned. Otherwise, a new entrynew_eval
is appended totrials_evaluations
and itsnew_eval.metrics
is returned (emptydict
). In the latter case,config
needs to be passed, because it may not yet feature inconfig_for_trial
.- Return type:
Union
[float
,Dict
[str
,float
]]
- num_observed_cases(metric_name='target', resource=None)[source]
Counts the number of observations for metric
metric_name
.- Parameters:
metric_name (
str
) – Defaults toINTERNAL_METRIC_NAME
resource (
Optional
[int
]) – In the multi-fidelity case, we only count observations at this resource level
- Return type:
int
- Returns:
Number of observations
- observed_data_for_metric(metric_name='target', resource_attr_name=None)[source]
Extracts datapoints from
trials_evaluations
for particular metricmetric_name
, in the form of a list of configs and a list of metric values. Ifmetric_name
is a dict-valued metric, the dict keys must be resource values, and the returned configs are extended. Here, the name of the resource attribute can be passed inresource_attr_name
(if not given, it can be obtained fromhp_ranges
if this is extended).Note: Implements the default behaviour, namely to return extended configs for dict-valued metrics, which also require
hp_ranges
to be extended. This is not correct for some specific multi-fidelity surrogate models, which should access the data directly.- Parameters:
metric_name (
str
) –resource_attr_name (
Optional
[str
]) –
- Return type:
(
List
[Dict
[str
,Union
[int
,float
,str
]]],List
[float
])- Returns:
configs, metric_values
- is_labeled(trial_id, metric_name='target', resource=None)[source]
Checks whether
trial_id
has observed data undermetric_name
. Ifresource
is given, the observation must be at that resource level.- Return type:
bool
- append_pending(trial_id, config=None, resource=None)[source]
Appends new pending evaluation. If the trial has not been registered here,
config
must be given. Otherwise, it is ignored.
- pending_configurations(resource_attr_name=None)[source]
Returns list of configurations corresponding to pending evaluations. If the latter have resource values, the configs are extended.
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]
- all_configurations(filter_observed_data=None)[source]
Returns list of configurations for all trials represented here, whether observed, pending, or failed. If
filter_observed_data
is given, the configurations for observed trials are filtered with this predicate.- Parameters:
filter_observed_data (
Optional
[Callable
[[Dict
[str
,Union
[int
,float
,str
]]],bool
]]) – See above, optional- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
List of all configurations
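For illustration, a minimal sketch of assembling a state with one trial that has two observations (again, the hp_ranges object is assumed to exist and is not constructed here):

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
    TrialEvaluations,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.tuning_job_state import (
    TuningJobState,
)

state = TuningJobState(
    hp_ranges=hp_ranges,  # assumed HyperparameterRanges over the config space
    config_for_trial={"0": {"learning_rate": 0.01, "num_layers": 4}},
    trials_evaluations=[
        TrialEvaluations(trial_id="0", metrics={"target": {"1": 0.9, "3": 0.7}})
    ],
    failed_trials=[],
    pending_evaluations=[],
)
print(state.num_observed_cases())         # 2 observations for metric "target"
print(state.is_labeled("0", resource=3))  # True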
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd package
- exception syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.SliceException[source]
Bases:
Exception
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneDistributionArguments(num_samples, num_brackets=None)[source]
Bases:
object
-
num_samples:
int
-
num_brackets:
Optional
[int
] = None
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneModelMixin(hypertune_distribution_args)[source]
Bases:
object
- hypertune_bracket_distribution()[source]
Distribution \([w_k]\) of support size num_supp_brackets, where num_supp_brackets <= args.num_brackets (the latter defaults to the maximum if not given) is chosen as large as possible such that the first num_supp_brackets brackets each have >= 6 labeled datapoints.
If num_supp_brackets < args.num_brackets, the distribution must be extended to full size before being used to sample the next bracket.
- Return type:
Optional
[ndarray
]
- hypertune_ensemble_distribution()[source]
Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.
- Return type:
Optional
[Dict
[int
,float
]]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneIndependentGPModel(kernel, mean_factory, resource_attr_range, hypertune_distribution_args, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
IndependentGPPerResourceModel
,HyperTuneModelMixin
Variant of
IndependentGPPerResourceModel
which implements additional features of the Hyper-Tune algorithm, see:
Yang Li et al.
Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale
VLDB 2022
Our implementation differs from the Hyper-Tune paper in a number of ways. Most importantly, their method requires a sufficient number of observed points at the starting rung of the highest bracket. In contrast, we estimate ranking loss values as soon as the starting rung of the second bracket is sufficiently occupied. This allows us to estimate only the head of the distribution (over all brackets with sufficiently occupied starting rungs), and we use the default distribution over the remaining tail. Eventually, we do the same as Hyper-Tune, but we move away from the default distribution earlier on.
- Parameters:
hypertune_distribution_args (
HyperTuneDistributionArguments
) – Parameters for Hyper-Tune
- create_likelihood(rung_levels)[source]
Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.
Note: last entry of
rung_levels
must bemax_t
, even if this is not a rung level in Hyperband.- Parameters:
rung_levels (
List
[int
]) – Rung levels
- hypertune_ensemble_distribution()[source]
Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.
- Return type:
Optional
[Dict
[int
,float
]]
- fit(data)[source]
Fit the model parameters by optimizing the marginal likelihood, and set posterior states.
We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.
- Parameters:
data (
Dict
[str
,Any
]) – Input data
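A construction sketch, assuming Matern52 and ScalarMeanFunction are used for the shared kernel and the per-rung mean functions. The import path of ScalarMeanFunction, the kernel dimension, and the chosen rung levels are assumptions:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model import (
    HyperTuneDistributionArguments,
    HyperTuneIndependentGPModel,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    ScalarMeanFunction,  # assumed location of a simple constant mean function
)

model = HyperTuneIndependentGPModel(
    kernel=Matern52(dimension=3, ARD=True),           # shared kernel over encoded configs
    mean_factory=lambda resource: ScalarMeanFunction(),
    resource_attr_range=(1, 81),
    hypertune_distribution_args=HyperTuneDistributionArguments(
        num_samples=50, num_brackets=4
    ),
)
# The likelihood is created only once the rung levels of the Hyperband scheduler
# are known; the last entry must be max_t
model.create_likelihood(rung_levels=[1, 3, 9, 27, 81])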
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneJointGPModel(kernel, resource_attr_range, hypertune_distribution_args, mean=None, target_transform=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
GaussianProcessRegression
,HyperTuneModelMixin
Variant of
GaussianProcessRegression
which implements additional features of the Hyper-Tune algorithm, seeYang Li et al Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale VLDB 2022
See also
HyperTuneIndependentGPModel
- Parameters:
hypertune_distribution_args (
HyperTuneDistributionArguments
) – Parameters for Hyper-Tune
- create_likelihood(rung_levels)[source]
Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.
Note: last entry of
rung_levels
must bemax_t
, even if this is not a rung level in Hyperband.- Parameters:
rung_levels (
List
[int
]) – Rung levels
- hypertune_ensemble_distribution()[source]
Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.
- Return type:
Optional
[Dict
[int
,float
]]
- fit(data)[source]
Fit the model parameters by optimizing the marginal likelihood, and set posterior states.
We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.
- Parameters:
data (
Dict
[str
,Any
]) – Input data
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood.HyperTuneIndependentGPMarginalLikelihood(kernel, mean, resource_attr_range, ensemble_distribution, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, encoding_type=None, **kwargs)[source]
Bases:
IndependentGPPerResourceMarginalLikelihood
Variant of
IndependentGPPerResourceMarginalLikelihood
, which has the same internal model and marginal likelihood function, but whose posterior state is of type HyperTuneIndependentGPPosteriorState, which uses an ensemble predictive distribution whose weighting distribution has to be passed at construction.
- property ensemble_distribution: Dict[int, float]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood.HyperTuneJointGPMarginalLikelihood(kernel, mean, resource_attr_range, ensemble_distribution, target_transform=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]
Bases:
GaussianProcessMarginalLikelihood
Variant of
GaussianProcessMarginalLikelihood
, which has the same internal model and marginal likelihood function, but whose posterior state is of type HyperTuneJointGPPosteriorState, which uses an ensemble predictive distribution whose weighting distribution has to be passed at construction.
- property ensemble_distribution: Dict[int, float]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.assert_ensemble_distribution(distribution, all_resources)[source]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.HyperTuneIndependentGPPosteriorState(features, targets, kernel, mean, covariance_scale, noise_variance, resource_attr_range, ensemble_distribution, debug_log=False)[source]
Bases:
IndependentGPPerResourcePosteriorState
Special case of
IndependentGPPerResourcePosteriorState
, where methodspredict
,backward_gradient
,sample_marginals
,sample_joint
are over a random function \(f_{MF}(x)\), obtained by first sampling the resource level \(r \sim [\theta_r]\) and then using \(f_{MF}(x) = f(x, r)\). Predictive means and variances are
\[\mu_{MF}(x) = \sum_r \theta_r \mu(x, r), \qquad \sigma_{MF}^2(x) = \sum_r \theta_r^2 \sigma^2(x, r).\]
Here, \([\theta_r]\) is a distribution over a subset of rung levels.
Note: This posterior state is unusual, in that
sample_marginals
,sample_joint
have to work both with (a) extended inputs (x, r) and (b) non-extended inputs x. For case (a), they behave like the superclass methods, this is needed to support fitting model parameters, for example for drawing fantasy samples. For case (b), they use the ensemble distribution detailed above, which supports optimizing the acquisition function.- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
If
test_features
are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.- Return type:
ndarray
- sample_joint(test_features, num_samples=1, random_state=None)[source]
If
test_features
are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.- Return type:
ndarray
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.HyperTuneJointGPPosteriorState(features, targets, mean, kernel, noise_variance, resource_attr_range, ensemble_distribution, debug_log=False)[source]
Bases:
GaussProcPosteriorState
Special case of
GaussProcPosteriorState
, where methodspredict
,backward_gradient
,sample_marginals
,sample_joint
are over a random function \(f_{MF}(x)\), obtained by first sampling the resource level \(r \sim [\theta_r]\) and then using \(f_{MF}(x) = f(x, r)\). Predictive means and variances are
\[\mu_{MF}(x) = \sum_r \theta_r \mu(x, r), \qquad \sigma_{MF}^2(x) = \sum_r \theta_r^2 \sigma^2(x, r).\]
Here, \([\theta_r]\) is a distribution over a subset of rung levels.
Note: This posterior state is unusual, in that
sample_marginals
,sample_joint
have to work both with (a) extended inputs (x, r) and (b) non-extended inputs x. For case (a), they behave like the superclass methods, this is needed to support fitting model parameters, for example for drawing fantasy samples. For case (b), they use the ensemble distribution detailed above, which supports optimizing the acquisition function.- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
If
test_features
are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.- Return type:
ndarray
- sample_joint(test_features, num_samples=1, random_state=None)[source]
If
test_features
are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.- Return type:
ndarray
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.ExtendFeaturesByResourceMixin(resource, resource_attr_range)[source]
Bases:
object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.PosteriorStateClampedResource(poster_state_extended, resource, resource_attr_range)[source]
Bases:
PosteriorStateWithSampleJoint
,ExtendFeaturesByResourceMixin
Converts posterior state of
PosteriorStateWithSampleJoint
over extended inputs into posterior state over non-extended inputs, where the resource attribute is clamped to a fixed value.- Parameters:
poster_state_extended (
PosteriorStateWithSampleJoint
) – Posterior state over extended inputsresource (
int
) – Value to which resource attribute is clampedresource_attr_range (
Tuple
[int
,int
]) – \((r_{min}, r_{max})\)
- property num_data
- property num_features
- property num_fantasies
- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Marginal samples, (num_test, num_samples)
- sample_joint(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Joint samples, (num_test, num_samples)
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.MeanFunctionClampedResource(mean_extended, resource, resource_attr_range, **kwargs)[source]
Bases:
MeanFunction
,ExtendFeaturesByResourceMixin
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.KernelFunctionClampedResource(kernel_extended, resource, resource_attr_range, **kwargs)[source]
Bases:
KernelFunction
,ExtendFeaturesByResourceMixin
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- set_params(param_dict)[source]
- Parameters:
param_dict (
Dict
[str
,Any
]) – Dictionary with new hyperparameter values- Returns:
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.GaussProcPosteriorStateAndRungLevels(poster_state, rung_levels)[source]
Bases:
PosteriorStateWithSampleJoint
- property poster_state: GaussProcPosteriorState
- property num_data
- property num_features
- property num_fantasies
- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Marginal samples, (num_test, num_samples)
- sample_joint(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Joint samples, (num_test, num_samples)
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- property rung_levels: List[int]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.hypertune_ranking_losses(poster_state, data, num_samples, resource_attr_range, random_state=None)[source]
Samples ranking loss values as defined in the Hyper-Tune paper. We return a matrix of size
(num_supp_levels, num_samples)
, where num_supp_levels <= len(poster_state.rung_levels) is the number of rung levels supported by at least 6 labeled datapoints.
The loss values depend on the cases in
data
at the levelposter_state.rung_levels[num_supp_levels - 1]
. We must havenum_supp_levels >= 2
.Loss values at this highest supported level are estimated by cross-validation (so the data at this level is split into training and test, where the training part is used to obtain the posterior state). The number of CV folds is
<= 5
, and such that each fold has at least two points.- Parameters:
poster_state (
Union
[IndependentGPPerResourcePosteriorState
,GaussProcPosteriorStateAndRungLevels
]) – Posterior state over rung levelsdata (
Dict
[str
,Any
]) – Training datanum_samples (
int
) – Number of independent loss samplesresource_attr_range (
Tuple
[int
,int
]) –(r_min, r_max)
random_state (
Optional
[RandomState
]) – PRNG state
- Return type:
ndarray
- Returns:
See above
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.number_supported_levels_and_data_highest_level(rung_levels, data, resource_attr_range)[source]
Finds
num_supp_levels
as the maximum value such that all rung levels up to it have >= 6 labeled datapoints. The set of labeled datapoints at level num_supp_levels - 1 is returned as well.
If
num_supp_levels == 1
, no level except for the lowest has>= 6
datapoints. In this case,data_max_resource
returned is invalid.- Parameters:
rung_levels (
List
[int
]) – Rung levelsdata (
Dict
[str
,Any
]) – Training data (only data at highest level is used)resource_attr_range (
Tuple
[int
,int
]) –(r_min, r_max)
- Return type:
Tuple
[int
,dict
]- Returns:
(num_supp_levels, data_max_resource)
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.gpind_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.gpind_model.IndependentGPPerResourceModel(kernel, mean_factory, resource_attr_range, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
GaussianProcessOptimizeModel
GP multi-fidelity model over f(x, r), where for each r, f(x, r) is represented by an independent GP. The different processes share the same kernel, but have their own mean functions mu_r and covariance scales c_r.
The likelihood object is not created at construction, but only with
create_likelihood
. This is because we need to know the rung levels of the Hyperband scheduler.- Parameters:
kernel (
KernelFunction
) – Kernel function without covariance scale, shared by models for all resources rmean_factory (
Callable
[[int
],MeanFunction
]) – Factory function for mean functions mu_r(x)resource_attr_range (
Tuple
[int
,int
]) – (r_min, r_max)target_transform (
Optional
[ScalarTargetTransform
]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Shared across different \(r\). Defaults to the identityseparate_noise_variances (
bool
) – Separate noise variance for each r? Otherwise, noise variance is sharedinitial_noise_variance (
Optional
[float
]) – Initial value for noise variance parameterinitial_covariance_scale (
Optional
[float
]) – Initial value for covariance scale parameters c_roptimization_config (
Optional
[OptimizationConfig
]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.random_seed – Random seed to be used (optional)
fit_reset_params (
bool
) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values
- create_likelihood(rung_levels)[source]
Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.
Note: last entry of
rung_levels
must bemax_t
, even if this is not a rung level in Hyperband.- Parameters:
rung_levels (
List
[int
]) – Rung levels
- property likelihood: MarginalLikelihood
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.likelihood module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.likelihood.IndependentGPPerResourceMarginalLikelihood(kernel, mean, resource_attr_range, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, encoding_type=None, **kwargs)[source]
Bases:
MarginalLikelihood
Marginal likelihood for GP multi-fidelity model over \(f(x, r)\), where for each \(r\), \(f(x, r)\) is represented by an independent GP. The different processes share the same kernel, but have their own mean functions \(\mu_r\) and covariance scales \(c_r\). If
separate_noise_variances == True
, each process has its own noise variance, otherwise all processes share the same noise variance.- Parameters:
kernel (
KernelFunction
) – Shared kernel function \(k(x, x')\)mean (
Dict
[int
,MeanFunction
]) – Maps rung level \(r\) to mean function \(\mu_r\)resource_attr_range (
Tuple
[int
,int
]) – \((r_{min}, r_{max})\)target_transform (
Optional
[ScalarTargetTransform
]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Shared across different \(r\). Defaults to the identityseparate_noise_variances (
bool
) – See above. Defaults toFalse
initial_noise_variance (
Optional
[float
]) – Initial value for noise variance(s). Defaults toINITIAL_NOISE_VARIANCE
initial_covariance_scale (
Optional
[float
]) – Initial value for covariance scales. Defaults toINITIAL_COVARIANCE_SCALE
encoding_type (
Optional
[str
]) – Encoding used for noise variance(s) and covariance scales. Defaults toDEFAULT_ENCODING
- forward(data)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.posterior_state module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.posterior_state.IndependentGPPerResourcePosteriorState(features, targets, kernel, mean, covariance_scale, noise_variance, resource_attr_range, debug_log=False)[source]
Bases:
PosteriorStateWithSampleJoint
Posterior state for model over f(x, r), where for a fixed set of resource levels r, each f(x, r) is represented by an independent Gaussian process. These processes share a common covariance function k(x, x'), but can have their own mean functions mu_r and covariance scales c_r. They can also have their own noise variances, or the noise variance is shared.
Attention: Predictions can only be done at (x, r) where r has at least one training datapoint. This is because a posterior state cannot represent the prior.
- property num_data
- property num_features
- property num_fantasies
- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
Different to
predict
, entries intest_features
may have resources not covered by data in posterior state. For such entries, we return the prior mean. We do not sample from the prior. Ifsample_marginals
is used to draw fantasy values, this corresponds to the Kriging believer heuristic.- Return type:
ndarray
- sample_joint(test_features, num_samples=1, random_state=None)[source]
Different to
predict
, entries intest_features
may have resources not covered by data in posterior state. For such entries, we return the prior mean. We do not sample from the prior. Ifsample_joint
is used to draw fantasy values, this corresponds to the Kriging believer heuristic.- Return type:
ndarray
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel package
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.KernelFunction(dimension, **kwargs)[source]
Bases:
MeanFunction
Base class of kernel (or covariance) function \(k(x, x')\)
- Parameters:
dimension (
int
) – Dimensionality of input points after encoding intondarray
- property dimension
- Returns:
Dimension d of input points
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.Matern52(dimension, ARD=False, encoding_type='logarithm', has_covariance_scale=True, **kwargs)[source]
Bases:
KernelFunction
Block that is responsible for the computation of Matern 5/2 kernel.
if
ARD == False
,inverse_bandwidths
is equal to a scalar broadcast to the d components (withd = dimension
, i.e., the number of features inX
).Arguments on top of base class
SquaredDistance
:- Parameters:
has_covariance_scale (
bool
) – Kernel has covariance scale parameter? Defaults toTrue
- property ARD: bool
- forward(X1, X2)[source]
Computes Matern 5/2 kernel matrix
- Parameters:
X1 – input matrix, shape
(n1,d)
X2 – input matrix, shape
(n2,d)
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
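A short sketch of evaluating this kernel on encoded inputs. The inputs here are random placeholders, and calling the kernel object is assumed to invoke forward:

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52

kernel = Matern52(dimension=2, ARD=True)
X1 = np.random.uniform(size=(5, 2))  # 5 encoded configs with 2 features each
X2 = np.random.uniform(size=(3, 2))

K = kernel(X1, X2)          # Matern 5/2 kernel matrix, shape (5, 3)
diag = kernel.diagonal(X1)  # diagonal of k(X1, X1), shape (5,)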
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, delta_fixed_value=None, delta_init=0.5, max_metric_value=1.0, **kwargs)[source]
Bases:
KernelFunction
Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:
Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization.
The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here. Details in:
Tiao, Klein, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter Optimization
We implement a new family of kernel functions, for which the additive Freeze-Thaw kernel is one instance (
delta == 0
). The kernel has parametersalpha
,mean_lam
,gamma > 0
, and0 <= delta <= 1
. Note thatbeta = alpha / mean_lam
is used in the Freeze-Thaw paper (the Gamma distribution overlambda
is parameterized differently). The additive Freeze-Thaw kernel is obtained fordelta == 0
(usedelta_fixed_value = 0
).In fact, this class is configured with a kernel and a mean function over inputs
x
(dimensiond
) and represents a kernel (and mean function) over inputs(x, r)
(dimensiond + 1
), where the resource attributer >= 0
is last.- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
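A construction sketch for this kernel and its matching mean function over extended inputs (x, r). The ScalarMeanFunction import path is an assumption:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    ExponentialDecayResourcesKernelFunction,
    ExponentialDecayResourcesMeanFunction,
    Matern52,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    ScalarMeanFunction,  # assumed location
)

kernel_x = Matern52(dimension=2, ARD=True)   # kernel over configs x
mean_x = ScalarMeanFunction()                # mean function over configs x
# Kernel over (x, r); delta_fixed_value=0 recovers the additive Freeze-Thaw kernel
kernel = ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, delta_fixed_value=0.0)
mean = ExponentialDecayResourcesMeanFunction(kernel=kernel)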
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FabolasKernelFunction(dimension=1, encoding_type='logarithm', u1_init=1.0, u3_init=0.0, **kwargs)[source]
Bases:
KernelFunction
The kernel function proposed in:
Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, in AISTATS 2017. ArXiv:1605.07079 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1605.07079
Please note this is only one of the components of the factorized kernel proposed in the paper. This is the finite-rank ("degenerate") kernel for modelling data subset fraction sizes. Defined as:
\[k(x, y) = (U \phi(x))^T (U \phi(y)), \quad x, y \in [0, 1], \quad \phi(x) = [1, (1 - x)^2]^T, \quad U = \begin{pmatrix} u_1 & u_3 \\ 0 & u_2 \end{pmatrix} \text{ upper triangular}, \quad u_1, u_2 > 0.\]
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ProductKernelFunction(kernel1, kernel2, name_prefixes=None, **kwargs)[source]
Bases:
KernelFunction
Given two kernel functions K1, K2, this class represents the product kernel function given by
\[((x_1, x_2), (y_1, y_2)) \mapsto K(x_1, y_1) \cdot K(x_2, y_2)\]We assume that parameters of K1 and K2 are disjoint.
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
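A construction sketch, assuming the product kernel splits its input vectors into the first kernel1.dimension features for K1 and the remaining features for K2:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    Matern52,
    ProductKernelFunction,
)

# K1 over the first 2 features, K2 over the last feature; the product kernel is
# then (assumed to be) a kernel over 3-dimensional inputs
kernel = ProductKernelFunction(Matern52(dimension=2), Matern52(dimension=1))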
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, max_metric_value=1.0, **kwargs)[source]
Bases:
KernelFunction
Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:
Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization. ArXiv:1406.3896 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1406.3896
The argument in that paper actually justifies using a non-zero mean function (see
ExponentialDecayResourcesMeanFunction
) and centralizing the kernel proposed there. This is done here.As in the Freeze-Thaw paper, learning curves for different configs are conditionally independent.
This class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.
Note: This kernel is mostly for debugging! Its conditional independence assumptions allow for faster inference, as implemented in
GaussProcExpDecayPosteriorState
.- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationKernelFunction(kernel_main, kernel_residual, mean_main, num_folds, **kwargs)[source]
Bases:
KernelFunction
Kernel function suitable for \(f(x, r)\) being the average of
r
validation metrics evaluated on different (train, validation) splits.More specifically, there are ‘num_folds`` such splits, and \(f(x, r)\) is the average over the first
r
of them.We model the score on fold
k
as \(e_k(x) = f(x) + g_k(x)\), where \(f(x)\) and the \(g_k(x)\) are a priori independent Gaussian processes with kernelskernel_main
andkernel_residual
(all \(g_k\) share the same kernel). Moreover, the \(g_k\) are zero-mean, while \(f(x)\) may have a mean function. Then:\[ \begin{align}\begin{aligned}f(x, r) = r^{-1} sum_{k \le r} e_k(x),\\k((x, r), (x', r')) = k_{main}(x, x') + \frac{k_{residual}(x, x')}{\mathrm{max}(r, r')}\end{aligned}\end{align} \]Note that
kernel_main
,kernel_residual
are over inputs \(x\) (dimensiond
), while the kernel represented here is over inputs \((x, r)\) of dimensiond + 1
, where the resource attribute \(r\) (number of folds) is last.Inputs are encoded. We assume a linear encoding for r with bounds 1 and
num_folds
. TODO: Right now, all HPs are encoded, and the resource attribute counts as an HP, even if it is not optimized over. This creates a dependence on how inputs are encoded.
- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
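A construction sketch for the cross-validation kernel. The ScalarMeanFunction import path is an assumption, and num_folds is a placeholder:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    CrossValidationKernelFunction,
    Matern52,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    ScalarMeanFunction,  # assumed location
)

# f(x, r) is the average of the first r of num_folds=10 validation scores
kernel = CrossValidationKernelFunction(
    kernel_main=Matern52(dimension=2, ARD=True),  # kernel of the shared component f(x)
    kernel_residual=Matern52(dimension=2),        # kernel shared by the fold residuals g_k(x)
    mean_main=ScalarMeanFunction(),               # mean function of f(x)
    num_folds=10,
)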
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.RangeKernelFunction(dimension, kernel, start, **kwargs)[source]
Bases:
KernelFunction
Given kernel function
K
and rangeR
, this class represents\[(x, y) \mapsto K(x_R, y_R)\]- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.KernelFunction(dimension, **kwargs)[source]
Bases:
MeanFunction
Base class of kernel (or covariance) function \(k(x, x')\)
- Parameters:
dimension (
int
) – Dimensionality of input points after encoding intondarray
- property dimension
- Returns:
Dimension d of input points
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.SquaredDistance(dimension, ARD=False, encoding_type='logarithm', **kwargs)[source]
Bases:
Block
Block that is responsible for the computation of matrices of squared distances. The distances can possibly be weighted (e.g., ARD parametrization). For instance:
\[ \begin{align}\begin{aligned}m_{i j} = \sum_{k=1}^d ib_k^2 (x_{1: i k} - x_{2: j k})^2\\\mathbf{X}_1 = [x_{1: i j}],\quad \mathbf{X}_2 = [x_{2: i j}]\end{aligned}\end{align} \]Here, \([ib_k]\) is the vector
inverse_bandwidth
. ifARD == False
,inverse_bandwidths
is equal to a scalar broadcast to the d components (withd = dimension
, i.e., the number of features inX
).- Parameters:
dimension (
int
) – Dimensionality \(d\) of input vectorsARD (
bool
) – Automatic relevance determination (inverse_bandwidth
vector of sized
)? Defaults toFalse
encoding_type (
str
) – Encoding forinverse_bandwidth
. Defaults toDEFAULT_ENCODING
- forward(X1, X2)[source]
Computes matrix of squared distances
- Parameters:
X1 – input matrix, shape
(n1, d)
X2 – input matrix, shape
(n2, d)
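A short sketch computing such a squared-distance matrix on placeholder inputs (calling the block is assumed to invoke forward):

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base import (
    SquaredDistance,
)

sqd = SquaredDistance(dimension=2, ARD=False)  # single inverse bandwidth, broadcast over features
X1 = np.random.uniform(size=(4, 2))
X2 = np.random.uniform(size=(6, 2))
D2 = sqd(X1, X2)  # matrix of (weighted) squared distances, shape (4, 6)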
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.Matern52(dimension, ARD=False, encoding_type='logarithm', has_covariance_scale=True, **kwargs)[source]
Bases:
KernelFunction
Block that is responsible for the computation of Matern 5/2 kernel.
if
ARD == False
,inverse_bandwidths
is equal to a scalar broadcast to the d components (withd = dimension
, i.e., the number of features inX
).Arguments on top of base class
SquaredDistance
:- Parameters:
has_covariance_scale (
bool
) – Kernel has covariance scale parameter? Defaults toTrue
- property ARD: bool
- forward(X1, X2)[source]
Computes Matern 5/2 kernel matrix
- Parameters:
X1 – input matrix, shape
(n1,d)
X2 – input matrix, shape
(n2,d)
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.decode_resource_values(res_encoded, num_folds)[source]
We assume the resource attribute
r
is encoded asrandint(1, num_folds)
. Internally,r
is taken as value in the real interval[0.5, num_folds + 0.5]
, which is linearly transformed to[0, 1]
for encoding.- Parameters:
res_encoded – Encoded values
r
num_folds – Maximum number of folds
- Returns:
Original values
r
(not rounded toint
)
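A small sketch of the mapping described above. The expected value is inferred from the docstring: an encoded value e should map back to 0.5 + e * num_folds:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation import (
    decode_resource_values,
)

# With num_folds=10, encoded values in [0, 1] map back to the real interval [0.5, 10.5],
# so an encoded value of 0.35 should decode to 0.5 + 0.35 * 10 = 4.0 (not rounded to int)
print(decode_resource_values(0.35, num_folds=10))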
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.CrossValidationKernelFunction(kernel_main, kernel_residual, mean_main, num_folds, **kwargs)[source]
Bases:
KernelFunction
Kernel function suitable for \(f(x, r)\) being the average of
r
validation metrics evaluated on different (train, validation) splits.More specifically, there are ‘num_folds`` such splits, and \(f(x, r)\) is the average over the first
r
of them.We model the score on fold
k
as \(e_k(x) = f(x) + g_k(x)\), where \(f(x)\) and the \(g_k(x)\) are a priori independent Gaussian processes with kernelskernel_main
andkernel_residual
(all \(g_k\) share the same kernel). Moreover, the \(g_k\) are zero-mean, while \(f(x)\) may have a mean function. Then:\[ \begin{align}\begin{aligned}f(x, r) = r^{-1} sum_{k \le r} e_k(x),\\k((x, r), (x', r')) = k_{main}(x, x') + \frac{k_{residual}(x, x')}{\mathrm{max}(r, r')}\end{aligned}\end{align} \]Note that
kernel_main
,kernel_residual
are over inputs \(x\) (dimensiond
), while the kernel represented here is over inputs \((x, r)\) of dimensiond + 1
, where the resource attribute \(r\) (number of folds) is last.Inputs are encoded. We assume a linear encoding for r with bounds 1 and
num_folds
. TODO: Right now, all HPs are encoded, and the resource attribute counts as an HP, even if it is not optimized over. This creates a dependence on how inputs are encoded.
- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args (list of NDArray) – Input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.CrossValidationMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using
NDArray
. Only accepts positional arguments. Parameters ———- *args : list of NDArrayInput tensors.
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay.ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, delta_fixed_value=None, delta_init=0.5, max_metric_value=1.0, **kwargs)[source]
Bases:
KernelFunction
Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:
Swersky, K., Snoek, J., & Adams, R. P. (2014).Freeze-Thaw Bayesian Optimization.The argument in that paper actually justifies using a non-zero mean function (see
ExponentialDecayResourcesMeanFunction
) and centralizing the kernel proposed there. This is done here. Details in:Tiao, Klein, Archambeau, Seeger (2020)Model-based Asynchronous Hyperparameter OptimizationWe implement a new family of kernel functions, for which the additive Freeze-Thaw kernel is one instance (
delta == 0
). The kernel has parametersalpha
,mean_lam
,gamma > 0
, and0 <= delta <= 1
. Note thatbeta = alpha / mean_lam
is used in the Freeze-Thaw paper (the Gamma distribution overlambda
is parameterized differently). The additive Freeze-Thaw kernel is obtained fordelta == 0
(usedelta_fixed_value = 0
).In fact, this class is configured with a kernel and a mean function over inputs
x
(dimensiond
) and represents a kernel (and mean function) over inputs(x, r)
(dimensiond + 1
), where the resource attributer >= 0
is last.- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using
NDArray
. Only accepts positional arguments. Parameters ———- *args : list of NDArrayInput tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay.ExponentialDecayResourcesMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using
NDArray
. Only accepts positional arguments. Parameters ———- *args : list of NDArrayInput tensors.
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.fabolas module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.fabolas.FabolasKernelFunction(dimension=1, encoding_type='logarithm', u1_init=1.0, u3_init=0.0, **kwargs)[source]
Bases:
KernelFunction
The kernel function proposed in:
Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, in AISTATS 2017. ArXiv:1605.07079 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1605.07079
Please note this is only one of the components of the factorized kernel proposed in the paper. This is the finite-rank ("degenerate") kernel for modelling data subset fraction sizes. Defined as:
\[k(x, y) = (U \phi(x))^T (U \phi(y)), \quad x, y \in [0, 1], \quad \phi(x) = [1, (1 - x)^2]^T, \quad U = \begin{pmatrix} u_1 & u_3 \\ 0 & u_2 \end{pmatrix} \text{ upper triangular}, \quad u_1, u_2 > 0.\]
- forward(X1, X2)[source]
Overrides to implement forward computation using
NDArray
. Only accepts positional arguments. Parameters ———- *args : list of NDArrayInput tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw.FreezeThawKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, max_metric_value=1.0, **kwargs)[source]
Bases:
KernelFunction
Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:
Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization. ArXiv:1406.3896 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1406.3896
The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here. As in the Freeze-Thaw paper, learning curves for different configs are conditionally independent.
This class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.
Note: This kernel is mostly for debugging! Its conditional independence assumptions allow for faster inference, as implemented in GaussProcExpDecayPosteriorState.
- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on X.
- Returns:
Does diagonal() depend on X?
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw.FreezeThawMeanFunction(kernel, **kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.product_kernel module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.product_kernel.ProductKernelFunction(kernel1, kernel2, name_prefixes=None, **kwargs)[source]
Bases:
KernelFunction
Given two kernel functions K1, K2, this class represents the product kernel function given by
\[((x_1, x_2), (y_1, y_2)) \mapsto K_1(x_1, y_1) \cdot K_2(x_2, y_2)\]
We assume that the parameters of K1 and K2 are disjoint.
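As a plain NumPy sketch (using a stand-in squared-exponential kernel for K1 and K2, not this class's implementation), the product kernel matrix over paired inputs is the elementwise product of the two component kernel matrices:

import numpy as np

def rbf(X, Y, lengthscale=1.0):
    # stand-in kernel for K1 and K2
    sqdist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sqdist / lengthscale ** 2)

X1, Y1 = np.random.rand(5, 3), np.random.rand(4, 3)   # first input components
X2, Y2 = np.random.rand(5, 2), np.random.rand(4, 2)   # second input components
K = rbf(X1, Y1) * rbf(X2, Y2)                          # product kernel matrix, shape (5, 4)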
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on X.
- Returns:
Does diagonal() depend on X?
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.range_kernel module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.range_kernel.RangeKernelFunction(dimension, kernel, start, **kwargs)[source]
Bases:
KernelFunction
Given kernel function K and range R, this class represents
\[(x, y) \mapsto K(x_R, y_R)\]
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on X.
- Returns:
Does diagonal() depend on X?
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ZeroKernel(dimension, **kwargs)[source]
Bases:
KernelFunction
Constant zero kernel. This works only in the context used here: we do not return matrices or vectors, but zero scalars.
- forward(X1, X2, **kwargs)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on X.
- Returns:
Does diagonal() depend on X?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ZeroMean(**kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ExponentialDecayBaseKernelFunction(r_max, r_min=1, normalize_inputs=False, **kwargs)[source]
Bases:
KernelFunction
Implements the exponential decay kernel k_r(r, r') from the Freeze-Thaw paper, corresponding to ExponentialDecayResourcesKernelFunction with delta=0 and no x attributes.
Note: Inputs r lie in [r_min, r_max]. Optionally, they are normalized to [0, 1].
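For orientation, the exponential decay covariance from the Freeze-Thaw paper has the form below (with \(\beta = \alpha / \text{mean\_lam}\) in the parameterization used here); see the paper for the full derivation:
\[k_r(r, r') = \frac{\beta^{\alpha}}{(r + r' + \beta)^{\alpha}}\]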
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on X.
- Returns:
Does diagonal() depend on X?
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.logdet_cholfact_cov_resource(likelihood)[source]
Computes the additional log(det(Lbar)) term. This is sum_i log(det(Lbar_i)), where Lbar_i is the upper-left submatrix of likelihood['lfact_all'], with size likelihood['ydims'][i].
- Parameters:
likelihood (
Dict
) – Result ofresource_kernel_likelihood_computations
- Return type:
float
- Returns:
log(det(Lbar))
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_precomputations(targets)[source]
Precomputations required by resource_kernel_likelihood_computations.
Importantly, prepare_data orders datapoints by nonincreasing number of targets ydims[i]. For 0 <= j < ydim_max, ydim_max = ydims[0] = max(ydims), num_configs[j] is the number of datapoints i for which ydims[i] > j.
yflat is a flat matrix (rows corresponding to fantasy samples; column vector if no fantasizing) consisting of ydim_max parts, where part j is of size num_configs[j] and contains y[j] for those targets i counted in num_configs[j].
- Parameters:
targets (
List
[ndarray
]) – Targets from data representation returned byprepare_data
- Return type:
Dict
- Returns:
See above
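A small worked example of the layout described above, with hypothetical target values (this is illustrative only, not the actual return dict of this function):

import numpy as np

# Targets for 4 configs, already ordered by nonincreasing length (as prepare_data does)
targets = [np.array([0.9, 0.7, 0.6]), np.array([0.8, 0.75]),
           np.array([0.85, 0.7]), np.array([0.95])]
ydims = [y.size for y in targets]                                    # [3, 2, 2, 1]
ydim_max = ydims[0]
num_configs = [sum(d > j for d in ydims) for j in range(ydim_max)]   # [4, 3, 1]
# Part j of yflat stacks y[j] for the first num_configs[j] configs:
yflat = np.concatenate(
    [np.array([targets[i][j] for i in range(num_configs[j])]) for j in range(ydim_max)]
)
# yflat == [0.9, 0.8, 0.85, 0.95, 0.7, 0.75, 0.7, 0.6]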
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_computations(precomputed, res_kernel, noise_variance, skip_c_d=False)[source]
Given precomputed from resource_kernel_likelihood_precomputations and resource kernel function res_kernel, compute quantities required for inference and marginal likelihood computation, pertaining to the likelihood of an additive model, as in the Freeze-Thaw paper.
Note that res_kernel takes raw (unnormalized) r as inputs. The code here works for any resource kernel and mean function, not just for ExponentialDecayBaseKernelFunction.
Results returned are:
- c: n-vector [c_i]
- d: n-vector [d_i], positive
- vtv: n-vector [|v_i|^2]
- wtv: (n, F) matrix [(W_i)^T v_i], F the number of fantasy samples
- wtw: n-vector [|w_i|^2] (only if no fantasizing)
- lfact_all: Cholesky factor for kernel matrix
- ydims: Target vector sizes (copy from precomputed)
- Parameters:
precomputed (
Dict
) – Output ofresource_kernel_likelihood_precomputations
res_kernel (
ExponentialDecayBaseKernelFunction
) – Kernel k(r, r’) over resourcesnoise_variance – Noise variance sigma^2
skip_c_d (
bool
) – If True, c and d are not computed
- Return type:
Dict
- Returns:
Quantities required for inference and learning criterion
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_slow_computations(targets, res_kernel, noise_variance, skip_c_d=False)[source]
Naive implementation of resource_kernel_likelihood_computations, which does not require precomputations but is somewhat slower. Here, results are computed one datapoint at a time, instead of in bulk.
This code is used in unit testing only.
- Return type:
Dict
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.predict_posterior_marginals_extended(poster_state, mean, kernel, test_features, resources, res_kernel)[source]
These are posterior marginals on f_r = h + g_r variables, where (x, r) are zipped from test_features, resources.
posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.
- Parameters:
poster_state (
Dict
) – Posterior statemean – Mean function
kernel – Kernel function
test_features – Feature matrix for test points (not extended)
resources (
List
[int
]) – Resource values corresponding to rows oftest_features
res_kernel (
ExponentialDecayBaseKernelFunction
) – Kernel k(r, r’) over resources
- Returns:
posterior_means, posterior_variances
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.sample_posterior_joint(poster_state, mean, kernel, feature, targets, res_kernel, noise_variance, lfact_all, means_all, random_state, num_samples=1)[source]
Given
poster_state
for some data plus one additional configuration with data (feature
,targets
), draw joint samples of unobserved targets for this configuration.targets
may be empty, but must not be complete (there must be some unobserved targets). The additional configuration must not be in the dataset used to computeposter_state
.If
targets
correspond to resource values range(r_min, r_obs), we sample latent target values y_r corresponding to range(r_obs, r_max+1), returning a dict with [y_r] undery
(matrix withnum_samples
columns).- Parameters:
poster_state (
Dict
) – Posterior state for datamean – Mean function
kernel – Kernel function
feature – Features for additional config
targets (
ndarray
) – Target values for additional configres_kernel (
ExponentialDecayBaseKernelFunction
) – Kernel k(r, r’) over resourcesnoise_variance – Noise variance sigma^2
lfact_all – Cholesky factor of complete resource kernel matrix
means_all – See
lfact_all
random_state (
RandomState
) – numpy.random.RandomStatenum_samples (
int
) – Number of joint samples to draw (default: 1)
- Return type:
Dict
- Returns:
See above
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model.GaussianProcessLearningCurveModel(kernel, res_model, mean=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
GaussianProcessOptimizeModel
Represents joint Gaussian model of learning curves over a number of configurations. The model has an additive form:
f(x, r) = g(r | x) + h(x),
where h(x) is a Gaussian process model for function values at r_max, and the g(r | x) are independent Gaussian models. Right now, g(r | x) can be:
- Innovation state space model (ISSM) of a particular power-law decay form. For this one, g(r_max | x) = 0 for all x. Used if res_model is of type ISSModelParameters
- Gaussian process model with exponential decay covariance function. This is essentially the model from the Freeze-Thaw paper, see also ExponentialDecayResourcesKernelFunction. Used if res_model is of type ExponentialDecayBaseKernelFunction
Importantly, inference scales cubically only in the number of configurations, not in the number of observations.
Details about ISSMs in general are found in
Hyndman, R., Koehler, A., Ord, J., and Snyder, R. Forecasting with Exponential Smoothing: The State Space Approach. Springer, 2008
- Parameters:
kernel (
KernelFunction
) – Kernel function k(X, X’)res_model (
Union
[ISSModelParameters
,ExponentialDecayBaseKernelFunction
]) – Model for g(r | x)mean (
Optional
[MeanFunction
]) – Mean function mu(X)initial_noise_variance (
Optional
[float
]) – A scalar to initialize the value of the residual noise varianceoptimization_config (
Optional
[OptimizationConfig
]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.random_seed – Random seed to be used (optional)
fit_reset_params (
bool
) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values
- property likelihood: MarginalLikelihood
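A minimal construction sketch for this model (assuming the Matern52 kernel exported from the gpautograd kernel package; illustrative only, not a complete training example):

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model import (
    GaussianProcessLearningCurveModel,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params import (
    IndependentISSModelParameters,
)

kernel = Matern52(dimension=3)                # h(x): GP over encoded configs (d = 3 here)
res_model = IndependentISSModelParameters()   # g(r | x): ISSM with scalar alpha, beta
model = GaussianProcessLearningCurveModel(kernel=kernel, res_model=res_model)
# model.fit(data) expects the data dict produced by learncurve.issm.prepare_data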
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.prepare_data(state, config_space_ext, active_metric, normalize_targets=False, do_fantasizing=False)[source]
Prepares data in state for further processing. The entries configs, targets of the result dict are lists with one entry per trial, sorted in decreasing order of the number of target values. features is the feature matrix corresponding to configs. If normalize_targets is True, the target values are normalized to mean 0, variance 1 (over all values), and mean_targets, std_targets are returned.
If do_fantasizing is True, state.pending_evaluations is also taken into account. Entries there have to be of type FantasizedPendingEvaluation. Also, in terms of their resource levels, they need to be adjacent to observed entries, so there are no gaps. In this case, the entries of the targets list are matrices, each column corresponding to a fantasy sample.
Note: If normalize_targets, mean and stddev are computed over observed values only. Also, fantasy values in state.pending_evaluations are not normalized, because they are assumed to be sampled from the posterior with normalized targets as well.
- Parameters:
state (
TuningJobState
) –TuningJobState
with dataconfig_space_ext (
ExtendedConfiguration
) – Extended config spaceactive_metric (
str
) –normalize_targets (
bool
) – See abovedo_fantasizing (
bool
) – See above
- Return type:
Dict
- Returns:
See above
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.prepare_data_with_pending(state, config_space_ext, active_metric, normalize_targets=False)[source]
Similar to prepare_data with do_fantasizing=False, but two dicts are returned: the first for trials without pending evaluations, the second for trials with pending evaluations. The latter dict also contains trials which have pending, but no observed, evaluations. The second dict has the additional entry num_pending, which lists the number of pending evals for each trial. These evals must be contiguous and adjacent with observed evals, so that the union of observed and pending evals is contiguous (in terms of resource levels).
- Parameters:
state (
TuningJobState
) – Seeprepare_data
config_space_ext (
ExtendedConfiguration
) – Seeprepare_data
active_metric (
str
) – Seeprepare_data
normalize_targets (
bool
) – Seeprepare_data
- Return type:
(
Dict
,Dict
)- Returns:
See above
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_precomputations(targets, r_min)[source]
Precomputations required by issm_likelihood_computations.
Importantly, prepare_data orders datapoints by nonincreasing number of targets ydims[i]. For 0 <= j < ydim_max, ydim_max = ydims[0] = max(ydims), num_configs[j] is the number of datapoints i for which ydims[i] > j.
deltay is a flat matrix (rows corresponding to fantasy samples; column vector if no fantasizing) consisting of ydim_max parts, where part j is of size num_configs[j] and contains y[j] - y[j-1] for those targets i counted in num_configs[j], the term needed in the recurrence to compute w[j]. logr is a flat vector consisting of ydim_max - 1 parts, where part j (starting from 1) is of size num_configs[j] and contains the logarithmic term for computing a[j-1] and e[j].
- Parameters:
targets (
List
[ndarray
]) – Targets from data representation returned byprepare_data
r_min (
int
) – Value of r_min, as returned byprepare_data
- Return type:
Dict
- Returns:
See above
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_computations(precomputed, issm_params, r_min, r_max, skip_c_d=False)[source]
Given precomputed from issm_likelihood_precomputations and ISSM parameters issm_params, compute quantities required for inference and marginal likelihood computation, pertaining to the ISSM likelihood.
The index for r is range(r_min, r_max + 1). Observations must be contiguous from r_min. The ISSM parameters are:
- alpha: n-vector, negative
- beta: n-vector
- gamma: scalar, positive
Results returned are:
- c: n-vector [c_i], negative
- d: n-vector [d_i], positive
- vtv: n-vector [|v_i|^2]
- wtv: (n, F) matrix [(W_i)^T v_i], F the number of fantasy samples
- wtw: n-vector [|w_i|^2] (only if no fantasizing)
- Parameters:
precomputed (
Dict
) – Output ofissm_likelihood_precomputations
issm_params (
Dict
) – Parameters of ISSM likelihoodr_min (
int
) – Smallest resource valuer_max (
int
) – Largest resource valueskip_c_d (
bool
) – If True, c and d are not computed
- Return type:
Dict
- Returns:
Quantities required for inference and learning criterion
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.posterior_computations(features, mean, kernel, issm_likelihood, noise_variance)[source]
Computes posterior state (required for predictions) and negative log marginal likelihood (returned in criterion). The latter is computed only when there is no fantasizing (i.e., if issm_likelihood contains wtw).
- Parameters:
features – Input matrix X
mean – Mean function
kernel – Kernel function
issm_likelihood (
Dict
) – Outcome ofissm_likelihood_computations
noise_variance – Variance of ISSM innovations
- Return type:
Dict
- Returns:
Internal posterior state
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.predict_posterior_marginals(poster_state, mean, kernel, test_features)[source]
These are posterior marginals on the h variable, whereas the full model is for f_r = h + g_r (additive).
posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.
- Parameters:
poster_state (
Dict
) – Posterior statemean – Mean function
kernel – Kernel function
test_features – Feature matrix for test points (not extended)
- Returns:
posterior_means, posterior_variances
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.sample_posterior_marginals(poster_state, mean, kernel, test_features, random_state, num_samples=1)[source]
We sample from posterior marginals on the h variable; see also
predict_posterior_marginals
.
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.predict_posterior_marginals_extended(poster_state, mean, kernel, test_features, resources, issm_params, r_min, r_max)[source]
These are posterior marginals on f_r = h + g_r variables, where (x, r) are zipped from test_features, resources. issm_params are likelihood parameters for the test configs.
posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.
- Parameters:
poster_state (
Dict
) – Posterior statemean – Mean function
kernel – Kernel function
test_features – Feature matrix for test points (not extended)
resources (
List
[int
]) – Resource values corresponding to rows oftest_features
issm_params (
Dict
) – See abover_min (
int
) –r_max (
int
) –
- Returns:
posterior_means, posterior_variances
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.sample_posterior_joint(poster_state, mean, kernel, feature, targets, issm_params, r_min, r_max, random_state, num_samples=1)[source]
Given poster_state for some data plus one additional configuration with data (feature, targets, issm_params), draw joint samples of the latent variables not fixed by the data, and of the latent target values. targets may be empty, but must not reach all the way to r_max. The additional configuration must not be in the dataset used to compute poster_state.
If targets correspond to resource values range(r_min, r_obs), we sample latent target values y_r corresponding to range(r_obs, r_max+1) and latent function values f_r corresponding to range(r_obs-1, r_max+1), unless r_obs = r_min (i.e., targets empty), in which case both the [y_r] and [f_r] ranges are range(r_min, r_max+1). We return a dict with [f_r] under f, [y_r] under y. These are matrices with num_samples columns.
- Parameters:
Dict
) – Posterior state for datamean – Mean function
kernel – Kernel function
feature – Features for additional config
targets (
ndarray
) – Target values for additional configissm_params (
Dict
) – Likelihood parameters for additional configr_min (
int
) – Smallest resource valuer_max (
int
) – Largest resource valuerandom_state (
RandomState
) – numpy.random.RandomStatenum_samples (
int
) – Number of joint samples to draw (default: 1)
- Return type:
Dict
- Returns:
See above
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_slow_computations(targets, issm_params, r_min, r_max, skip_c_d=False)[source]
Naive implementation of issm_likelihood_computations, which does not require precomputations but is much slower. Here, results are computed one datapoint at a time, instead of in bulk.
This code is used in unit testing, and called from sample_posterior_joint.
- Return type:
Dict
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.update_posterior_state(poster_state, kernel, feature, d_new, s_new, r2_new)[source]
Incremental update of posterior state, given data for one additional configuration. The new datapoint gives rise to a new row/column of the Cholesky factor. r2vec and svec are extended by r2_new, s_new respectively. r4vec and pvec are extended and all entries change. The new datapoint is represented by feature, d_new, s_new, r2_new.
Note: The field criterion is not updated, but set to np.nan.
- Parameters:
poster_state (
Dict
) – Posterior state for datakernel – Kernel function
feature – Features for additional config
d_new – See above
s_new – See above
r2_new – See above
- Return type:
Dict
- Returns:
Updated posterior state
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.update_posterior_pvec(poster_state, kernel, feature, d_new, s_new, r2_new)[source]
Part of
update_posterior_state
, just returns the new p vector.- Parameters:
poster_state (
Dict
) – Seeupdate_posterior_state
kernel – See
update_posterior_state
feature – See
update_posterior_state
d_new – See
update_posterior_state
s_new – See
update_posterior_state
r2_new – See
update_posterior_state
- Return type:
ndarray
- Returns:
New p vector, as flat vector
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.likelihood module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.likelihood.GaussAdditiveMarginalLikelihood(kernel, res_model, mean=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]
Bases:
MarginalLikelihood
Marginal likelihood of joint learning curve model, where each curve is modelled as sum of a Gaussian process over x (for the value at r_max) and a Gaussian model over r.
The latter res_model is either an ISSM or another Gaussian process with exponential decay covariance function.
- Parameters:
kernel (
KernelFunction
) – Kernel function k(x, x’)res_model (
Union
[ISSModelParameters
,ExponentialDecayBaseKernelFunction
]) – Gaussian model over rmean (
Optional
[MeanFunction
]) – Mean function mu(x)initial_noise_variance – A scalar to initialize the value of the residual noise variance
- forward(data)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- param_encoding_pairs()[source]
Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings
- Return type:
List
[tuple
]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params.ISSModelParameters(gamma_is_one=False, **kwargs)[source]
Bases:
MeanFunction
Maintains parameters of an ISSM of a particular power-law decay form.
For each configuration, we have alpha < 0 and beta. These may depend on the input feature x (encoded configuration):
(alpha, beta) = F(x; params),
where params are the internal parameters to be learned.
There is also gamma > 0, which can be fixed to 1.
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params.IndependentISSModelParameters(gamma_is_one=False, **kwargs)[source]
Bases:
ISSModelParameters
Most basic implementation, where alpha, beta are scalars, independent of the configuration.
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcAdditivePosteriorState(data, mean, kernel, noise_variance, **kwargs)[source]
Bases:
PosteriorState
Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The (additive) model is the sum of a Gaussian process model for function values at r_max and independent Gaussian models over r only.
Importantly, inference scales cubically only in the number of configurations, not in the number of observations.
- property num_data
- property num_features
- property num_fantasies
- predict(test_features)[source]
We compute marginals over f(x, r), where test_features are extended features.
Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.
- Parameters:
test_features (
ndarray
) – Extended features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Marginal samples, (num_test, num_samples)
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this has to be called for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- sample_curves(data, num_samples=1, random_state=None)[source]
Given data from one or more configurations (as returned by issm.prepare_data), for each config, sample a curve from the joint posterior (predictive) distribution over latent targets. The curve for each config in data may be partly observed, but must not be fully observed. Samples for the different configs are independent. None of the configs in data must appear in the dataset used to compute the posterior state.
The result is a list of dicts, one for each config. If, for a config, targets in data are given for resource values range(r_min, r_obs), the dict entry y is a joint sample [y_r], r in range(r_obs, r_max+1). For some subclasses (e.g., ISSM), there is also an entry f with a joint sample [f_r], r in range(r_obs-1, r_max+1), the latent function values before noise. These entries are matrices with num_samples columns, which are independent (the joint dependence is along the rows).
- Parameters:
data (
Dict
[str
,Any
]) – Data for configs to predict atnum_samples (
int
) – Number of samples to draw from each curverandom_state (
Optional
[RandomState
]) – PRNG state to be used for sampling
- Return type:
List
[dict
]- Returns:
See above
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.IncrementalUpdateGPAdditivePosteriorState(data, mean, kernel, noise_variance, **kwargs)[source]
Bases:
GaussProcAdditivePosteriorState
Extension of GaussProcAdditivePosteriorState which allows for incremental updating (a single config added to the dataset). This is required for simulation-based scoring, and to support fantasizing.
- update_pvec(feature, targets)[source]
Part of update: only updates the prediction vector p. This cannot be used to update p for several new datapoints.
- Parameters:
feature (
ndarray
) –targets (
ndarray
) –
- Return type:
ndarray
- Returns:
New p vector
- sample_and_update_for_pending(data_pending, sample_all_nonobserved=False, random_state=None)[source]
This function is needed for sampling fantasy targets, and also to support simulation-based scoring. issm.prepare_data_with_pending creates two data dicts data_nopending, data_pending, the first for configs with observed data but no pending evals, the second for configs with pending evals. You create the state with data_nopending, then call this method with data_pending.
This method iterates over configs (or trials) in data_pending. For each config, it draws a joint sample of some non-observed targets, then updates the state conditioned on observed and sampled targets (by calling update). If sample_all_nonobserved is False, the number of targets sampled is the entry in data_pending['num_pending']. Otherwise, targets are sampled for all non-observed positions.
The method returns the list of sampled target vectors, and the state at the end (like update does as well).
- Parameters:
data_pending (
Dict
[str
,Any
]) – See abovesample_all_nonobserved (
bool
) – See aboverandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
(
List
[ndarray
], IncrementalUpdateGPAdditivePosteriorState)- Returns:
pending_targets, final_state
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcISSMPosteriorState(data, mean, kernel, iss_model, noise_variance, **kwargs)[source]
Bases:
IncrementalUpdateGPAdditivePosteriorState
Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The model is the sum of a Gaussian process model for function values at r_max and independent Gaussian linear innovation state space models (ISSMs) of a particular power law decay form.
- predict(test_features)[source]
We compute marginals over f(x, r), where test_features are extended features.
Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.
- Parameters:
test_features (
ndarray
) – Extended features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcExpDecayPosteriorState(data, mean, kernel, res_kernel, noise_variance, **kwargs)[source]
Bases:
IncrementalUpdateGPAdditivePosteriorState
Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The model is the sum of a Gaussian process model for function values at r_max and independent Gaussian processes over r, using an exponential decay covariance function. The latter is shared between all configs.
This is essentially the model from the Freeze Thaw paper (see also
ExponentialDecayResourcesKernelFunction
).- predict(test_features)[source]
We compute marginals over f(x, r), where test_features are extended features.
Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.
- Parameters:
test_features (
ndarray
) – Extended features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.OptimizationConfig(lbfgs_tol, lbfgs_maxiter, verbose, n_starts)[source]
Bases:
object
- lbfgs_tol: float
- lbfgs_maxiter: int
- verbose: bool
- n_starts: int
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.MCMCConfig(n_samples, n_burnin, n_thinning)[source]
Bases:
object
n_samples is the total number of samples drawn. The first n_burnin of these are dropped (burn-in), and every n_thinning-th sample of the rest is returned. This means we return (n_samples - n_burnin) // n_thinning samples.
- n_samples: int
- n_burnin: int
- n_thinning: int
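For example, with the default values shown later for GPRegressionMCMC (n_samples=300, n_burnin=250, n_thinning=5), the number of retained samples works out as follows (a quick illustrative check):

n_samples, n_burnin, n_thinning = 300, 250, 5
num_retained = (n_samples - n_burnin) // n_thinning
print(num_retained)  # 10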
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.AddJitterOp(*args, **kwargs)
Finds a small jitter to add to the diagonal of a square matrix to render the matrix positive definite (in the sense that linalg.potrf works).
Given input x (positive semi-definite matrix) and sigsq_init (nonnegative scalar), find sigsq_final (nonnegative scalar) so that sigsq_final = sigsq_init + jitter, jitter >= 0, and x + sigsq_final * Id is positive definite (so that the potrf call works). We return the matrix x + sigsq_final * Id, for which potrf has not failed.
For the gradient, the dependence of jitter on the inputs is ignored.
The values tried for sigsq_final are sigsq_init, sigsq_init + initial_jitter * (jitter_growth ** k), k = 0, 1, 2, ..., where initial_jitter = initial_jitter_factor * max(mean(diag(x)), 1).
Note: The scaling of initial_jitter with mean(diag(x)) is taken from GPy. The rationale is that the largest eigenvalue of x is >= mean(diag(x)), and likely of this magnitude.
There is no guarantee that the Cholesky factor returned is well-conditioned enough for subsequent computations to be reliable. A better solution would be to estimate the condition number of the Cholesky factor, and to add jitter until this is bounded below a threshold we tolerate. See:
Higham, N. A Survey of Condition Number Estimation for Triangular Matrices. MIMS EPrint: 2007.10
Algorithm 4.1 could work for us.
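A plain NumPy sketch of the jitter schedule described above (illustrative only; max_tries is a hypothetical cap, and the real op is an autograd primitive whose gradient ignores the jitter):

import numpy as np

def add_jitter(x, sigsq_init, initial_jitter_factor=1e-9, jitter_growth=10.0, max_tries=10):
    # Try sigsq_init first, then grow the jitter geometrically until Cholesky succeeds
    initial_jitter = initial_jitter_factor * max(np.mean(np.diag(x)), 1.0)
    jitters = [0.0] + [initial_jitter * jitter_growth ** k for k in range(max_tries)]
    for jitter in jitters:
        mat = x + (sigsq_init + jitter) * np.eye(x.shape[0])
        try:
            np.linalg.cholesky(mat)  # stands in for linalg.potrf
            return mat
        except np.linalg.LinAlgError:
            continue
    raise RuntimeError("could not render the matrix positive definite")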
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.flatten_and_concat(x, sigsq_init)[source]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.cholesky_factorization(*args, **kwargs)
Replacement for autograd.numpy.linalg.cholesky(). Our backward (vjp) is faster and simpler, while somewhat less general (it only works if a.ndim == 2).
See https://arxiv.org/abs/1710.08717 for a derivation of the backward (vjp) expression.
- Parameters:
a – Symmetric positive definite matrix A
- Returns:
Lower-triangular Cholesky factor L of A
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Distribution[source]
Bases:
object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Gamma(mean, alpha)[source]
Bases:
Distribution
Gamma(mean, alpha):
\[p(x) = C(\alpha, \beta)\, x^{\alpha - 1} e^{-\beta x}, \qquad \beta = \alpha / \mathrm{mean}, \qquad C(\alpha, \beta) = \beta^{\alpha} / \Gamma(\alpha)\]
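A quick check of this parameterization against SciPy (beta = alpha / mean, i.e., scale = mean / alpha; purely illustrative):

import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.stats import gamma

mean, alpha = 2.0, 3.0
beta = alpha / mean
x = np.linspace(0.1, 5.0, 50)
pdf = beta ** alpha / gamma_fn(alpha) * x ** (alpha - 1) * np.exp(-beta * x)
assert np.allclose(pdf, gamma.pdf(x, a=alpha, scale=1.0 / beta))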
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Uniform(lower, upper)[source]
Bases:
Distribution
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Normal(mean, sigma)[source]
Bases:
Distribution
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.LogNormal(mean, sigma)[source]
Bases:
Distribution
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Horseshoe(s)[source]
Bases:
Distribution
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon module
Gluon APIs for autograd
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.Block(prefix=None, params=None)[source]
Bases:
object
Base class for all neural network layers and models. Your models should subclass this class.
Block
can be nested recursively in a tree structure. You can create and assign childBlock
as regular attributes:
import mxnet as mx
from mxnet.gluon import Block, nn
from mxnet import ndarray as F

class Model(Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def forward(self, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(F.zeros((10, 10), ctx=mx.cpu(0)))
Child
Block
assigned this way will be registered andcollect_params()
will collect their Parameters recursively. You can also manually register child blocks withregister_child()
.
Parameters
- prefix : str
Prefix acts like a name space. All children blocks created in parent block’s
name_scope()
will have parent block’s prefix in their name. Please refer to naming tutorial for more info on prefix and naming.- paramsParameterDict or None
ParameterDict
for sharing weights with the newBlock
. For example, if you wantdense1
to sharedense0
’s weights, you can do:
dense0 = nn.Dense(20)
dense1 = nn.Dense(20, params=dense0.collect_params())
- name_scope()[source]
Returns a name space object managing a child
Block
and parameter names. Should be used within awith
statement:
with self.name_scope():
    self.dense = nn.Dense(20)
Please refer to the naming tutorial for more info on prefix and naming.
- property params
Returns this
Block
’s parameter dictionary (does not include its children’s parameters).
- collect_params(select=None)[source]
Returns a
ParameterDict
containing thisBlock
and all of its children’s Parameters(default), also can returns the selectParameterDict
which match some given regular expressions. For example, collect the specified parameters in [‘conv1_weight’, ‘conv1_bias’, ‘fc_weight’, ‘fc_bias’]:model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')
or collect all parameters whose names end with ‘weight’ or ‘bias’, this can be done using regular expressions:
model.collect_params('.*weight|.*bias')
Parameters
- selectstr
regular expressions
Returns
The selected
ParameterDict
- register_child(block, name=None)[source]
Registers block as a child of self.
Block
s assigned to self as attributes will be registered automatically.
- apply(fn)[source]
Applies
fn
recursively to every child block as well as self.
Parameters
- fn : callable
Function to be applied to each submodule, of form
fn(block)
.Returns
this block
- initialize(init=None, ctx=None, verbose=False, force_reinit=False)[source]
Initializes
Parameter
s of thisBlock
and its children. Equivalent toblock.collect_params().initialize(...)
Parameters ———- init : InitializerGlobal default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.- ctxContext or list of Context
Keeps a copy of Parameters on one or many context(s).
- verbosebool, default False
Whether to verbosely print out details on initialization.
- force_reinitbool, default False
Whether to force re-initialization if parameter is already initialized.
- cast(dtype)[source]
Cast this Block to use another data type.
Parameters
- dtype : str or numpy.dtype
The new data type.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.Parameter(name, grad_req='write', shape=None, dtype=<class 'numpy.float64'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')[source]
Bases:
object
A Container holding parameters (weights) of Blocks.
Parameter
holds a copy of the parameter on eachContext
after it is initialized withParameter.initialize(...)
. Ifgrad_req
is not'null'
, it will also hold a gradient array on eachContext
:
x = np.zeros((16, 100))
w = Parameter('fc_weight', shape=(16, 100), init=np.random.uniform)
b = Parameter('fc_bias', shape=(100,), init=np.random.uniform)
w.initialize()
b.initialize()
z = x + w.data()
Parameters
- namestr
Name of this parameter.
- grad_req{‘write’, ‘add’, ‘null’}, default ‘write’
Specifies how to update gradient to grad arrays. -
'write'
means everytime gradient is written to gradNDArray
. -'add'
means everytime gradient is added to the gradNDArray
. You needto manually call
zero_grad()
to clear the gradient buffer before each iteration when using this option.‘null’ means gradient is not requested for this parameter. gradient arrays will not be allocated.
- shapeint or tuple of int, default None
Shape of this parameter. By default shape is not specified. Parameter with unknown shape can be used for
Symbol
API, butinit
will throw an error when usingNDArray
API.- dtypenumpy.dtype or str, default ‘float64’
Data type of this parameter. For example,
numpy.float64
or'float64'
.- lr_multfloat, default 1.0
Learning rate multiplier. Learning rate will be multiplied by lr_mult when updating this parameter with optimizer.
- wd_multfloat, default 1.0
Weight decay multiplier (L2 regularizer coefficient). Works similar to lr_mult.
- initInitializer, default None
Initializer of this parameter. Will use the global initializer by default.
- stype: {‘default’, ‘row_sparse’, ‘csr’}, defaults to ‘default’.
The storage type of the parameter.
- grad_stype: {‘default’, ‘row_sparse’, ‘csr’}, defaults to ‘default’.
The storage type of the parameter’s gradient.
Attributes
- grad_req{‘write’, ‘add’, ‘null’}
This can be set before or after initialization. Setting
grad_req
to'null'
withx.grad_req = 'null'
saves memory and computation when you don’t need gradient w.r.t x.- lr_multfloat
Local learning rate multiplier for this Parameter. The actual learning rate is calculated with
learning_rate * lr_mult
. You can set it withparam.lr_mult = 2.0
- wd_multfloat
Local weight decay multiplier for this Parameter.
- property grad_req
- property dtype
The type of the parameter. Setting the dtype value is equivalent to casting the value of the parameter
- property shape
The shape of the parameter. By default, an unknown dimension size is 0. However, when the NumPy semantic is turned on, unknown dimension size is -1.
- initialize(init=None, ctx=None, default_init=None, force_reinit=False)[source]
Initializes parameter and gradient arrays. Only used for
NDArray
API. Parameters ———- init : InitializerThe initializer to use. Overrides
Parameter.init()
and default_init.- ctxContext or list of Context, defaults to
context.current_context()
. Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context. .. note:
Copies are independent arrays. User is responsible for keeping their values consistent when updating. Normally :py:class:`gluon.Trainer` does this for you.
- default_initInitializer
Default initializer is used when both
init()
andParameter.init()
areNone
.- force_reinitbool, default False
Whether to force re-initialization if parameter is already initialized.
Examples
>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(ctx=mx.cpu(0))
>>> weight.data()
[[-0.01068833 0.01729892]
 [ 0.02042518 -0.01618656]]
<NDArray 2x2 @cpu(0)>
>>> weight.grad()
[[ 0. 0.]
 [ 0. 0.]]
<NDArray 2x2 @cpu(0)>
>>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>
- ctxContext or list of Context, defaults to
- reset_ctx(ctx)[source]
Re-assign Parameter to other contexts. Parameters ———- ctx : Context or list of Context, default
context.current_context()
.Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.
- data(ctx=None)[source]
Returns a copy of this parameter on one context. Must have been initialized on this context before. For sparse parameters, use
Parameter.row_sparse_data()
instead. Parameters ———- ctx : ContextDesired context.
Returns
NDArray on ctx
- list_data()[source]
Returns copies of this parameter on all contexts, in the same order as creation. For sparse parameters, use
Parameter.list_row_sparse_data()
instead. Returns ——- list of NDArrays
- grad(ctx=None)[source]
Returns a gradient buffer for this parameter on one context. Parameters ———- ctx : Context
Desired context.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.ParameterDict(prefix='', shared=None)[source]
Bases:
object
A dictionary managing a set of parameters. Parameters ———- prefix : str, default
''
The prefix to be prepended to all Parameters’ names created by this dict.
- sharedParameterDict or None
If not
None
, when this dict’sget()
method creates a new parameter, will first try to retrieve it from “shared” dict. Usually used for sharing parameters with another Block.
- property prefix
Prefix of this dict. It will be prepended to
the names of Parameter objects created with get
.
- get(name, **kwargs)[source]
Retrieves a
Parameter
with nameself.prefix+name
. If not found,get()
will first try to retrieve it from “shared” dict. If still not found,get()
will create a newParameter
with key-word arguments and insert it to self. Parameters ———- name : strName of the desired Parameter. It will be prepended with this dictionary’s prefix.
Returns
- Parameter
The created or retrieved
Parameter
.
- initialize(init=None, ctx=None, verbose=False, force_reinit=False)[source]
Initializes all Parameters managed by this dictionary to be used for
NDArray
API. It has no effect when usingSymbol
API. Parameters ———- init : InitializerGlobal default Initializer to be used when
Parameter.init()
isNone
. Otherwise,Parameter.init()
takes precedence.- ctxContext or list of Context
Keeps a copy of Parameters on one or many context(s).
- verbosebool, default False
Whether to verbosely print out details on initialization.
- force_reinitbool, default False
Whether to force re-initialization if parameter is already initialized.
- reset_ctx(ctx)[source]
Re-assign all Parameters to other contexts. Parameters ———- ctx : Context or list of Context, default
context.current_context()
.Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.
- list_ctx()[source]
Returns a list of all the contexts on which the underlying Parameters are initialized.
- setattr(name, value)[source]
Set an attribute to a new value for all Parameters. For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:
model.collect_params().setattr('grad_req', 'null')
or change the learning rate multiplier:
model.collect_params().setattr('lr_mult', 0.5)
Parameters
- namestr
Name of the attribute.
- valuevalid type for attribute name
The new value for the attribute.
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.ConstantPositiveVector(param_name, encoding, size_cols, **kwargs)[source]
Bases:
Block
Represents constant vector, with positive entry value represented as Gluon parameter, to be used in the context of wrapper classes in
gluon_blocks.py
. Shape,dtype
, and context are determined from the features argument:If
features.shape = (n, d)
:shape = (d, 1)
ifsize_cols = True
(number cols of features)shape = (n, 1)
ifsize_cols = False
(number rows of features)dtype = features.dtype
,ctx = features.ctx
Encoding and internal Gluon parameter: The positive scalar parameter is encoded via encoding (see
ScalarEncodingBase
). The internal Gluon parameter (before encoding) has thename param_name + "_internal"
.- forward(features, param_internal)[source]
Returns constant positive vector
If
features.shape = (n, d)
, the shape of the vector returned is(d, 1)
ifsize_cols = True
,(n, 1)
otherwise.- Parameters:
features – Matrix for shape, dtype, ctx
param_internal – Unwrapped parameter
- Returns:
Constant positive vector
- switch_updating(flag)[source]
Is the underlying parameter updated during learning?
By default, the parameter takes part in learning (its
grad_req
attribute is ‘write’). Forflag == False
, the attribute is flipped to ‘null’, and the parameter remains constant during learning.- Parameters:
flag – Update parameter during learning?
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.PositiveScalarEncoding(lower, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]
Bases:
ScalarEncodingBase
Provides encoding for positive scalar and vector:
param > lower
. Here,param
is represented asgluon.Parameter
. Theparam
is with shape(dimension,)
wheredimension
is 1 by default.The encoding is given as:
param = softrelu(param_internal) + lower
,softrelu(x) = log(1 + exp(x))
If
constr_upper
is used, the constraintparam_internal < dec(constr_upper)
can be enforced by an optimizer. Since
dec
is increasing, this translates toparam < constr_upper
. Note: Whilelower
is enforced by the encoding, the upper bound is not; it has to be enforced by an optimizer.
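A minimal NumPy sketch of this encoding and its inverse (illustrative only; the class itself wraps a Gluon parameter):

import numpy as np

def decode(param_internal, lower):
    # param = softrelu(param_internal) + lower > lower
    return np.log1p(np.exp(param_internal)) + lower

def encode(param, lower):
    # inverse of softrelu: log(exp(y) - 1)
    return np.log(np.expm1(param - lower))

p = decode(np.array([-2.0, 0.0, 3.0]), lower=1e-3)
assert np.allclose(encode(p, lower=1e-3), [-2.0, 0.0, 3.0])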
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.IdentityScalarEncoding(constr_lower=None, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]
Bases:
ScalarEncodingBase
Identity encoding for scalar and vector:
param = param_internal
This does not ensure that param is positive! Use this only if positivity is otherwise guaranteed.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.LogarithmScalarEncoding(constr_lower=None, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]
Bases:
ScalarEncodingBase
Logarithmic encoding for scalar and vector:
param = exp(param_internal), param_internal = log(param)
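A one-line sanity check of this encoding (illustrative only):

import numpy as np

param_internal = np.array([-1.0, 0.0, 2.0])
param = np.exp(param_internal)                     # always positive
assert np.allclose(np.log(param), param_internal)  # log inverts exp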
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.unwrap_parameter(param_internal, some_arg=None)[source]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.encode_unwrap_parameter(param_internal, encoding, some_arg=None)[source]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.param_to_pretty_string(gluon_param, encoding)[source]
Take a gluon parameter and transform it to a string amenable to plotting. If need be, the gluon parameter is appropriately encoded (e.g., log-exp transform).
- Parameters:
gluon_param (
Parameter
) – gluon parameterencoding (
ScalarEncodingBase
) – object in charge of encoding/decoding the gluon_param
- Return type:
str
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model.GaussianProcessModel(random_seed=None)[source]
Bases:
object
Base class for Gaussian-linear models which support parameter fitting and prediction.
- property random_state: RandomState
- property states: List[PosteriorState] | None
- Returns:
Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)
- fit(data)[source]
Adjust model parameters based on training data
data
. Can be done via optimization or MCMC sampling. The posterior states are computed at the end as well.- Parameters:
data (
Dict
[str
,Any
]) – Training data
- recompute_states(data)[source]
Recomputes posterior states for current model parameters.
- Parameters:
data (
Dict
[str
,Any
]) – Training data
- predict(features_test)[source]
Compute the posterior mean(s) and variance(s) for the points in features_test. If the posterior state is based on m target vectors, a (n, m) matrix is returned for posterior means.
- Parameters:
features_test (
ndarray
) – Data matrix X_test of size (n, d) (type np.ndarray) for which n predictions are made- Returns:
posterior_means, posterior_variances
- sample_marginals(features_test, num_samples=1)[source]
Draws marginal samples from the predictive distribution at n test points. Note that we concatenate the samples for each state. Let n_states = len(self._states).
If the posterior state is based on m > 1 target vectors, a (n, m, num_samples * n_states) tensor is returned, for m == 1 we return a (n, num_samples * n_states) matrix.
- Parameters:
features_test (
ndarray
) – Test input points, shape (n, d)num_samples (
int
) – Number of samples
- Returns:
Samples with shape (n, num_samples * n_states) or (n, m, num_samples * n_states) if m > 1
- sample_joint(features_test, num_samples=1)[source]
Draws joint samples from the predictive distribution at n test points. This scales cubically with n. The posterior state must be based on a single target vector (m > 1 is not supported).
- Parameters:
features_test (
ndarray
) – Test input points, shape (n, d)num_samples (
int
) – Number of samples
- Returns:
Samples, shape (n, num_samples)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model.GaussianProcessOptimizeModel(optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
GaussianProcessModel
Base class for models where parameters are fit by maximizing the marginal likelihood.
- property states: List[PosteriorState] | None
- Returns:
Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)
- property likelihood: MarginalLikelihood
- fit(data)[source]
Fit the model parameters by optimizing the marginal likelihood, and set posterior states.
We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.
- Parameters:
data (
Dict
[str
,Any
]) – Input data
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression.GaussianProcessRegression(kernel, mean=None, target_transform=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]
Bases:
GaussianProcessOptimizeModel
Gaussian Process Regression
Takes as input a mean function (which depends on X only) and a kernel function.
- Parameters:
kernel (
KernelFunction
) – Kernel functionmean (
Optional
[MeanFunction
]) – Mean function which depends on the input X only (by default, a scalar fitted while optimizing the likelihood)target_transform (
Optional
[ScalarTargetTransform
]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Defaults to the identityinitial_noise_variance (
Optional
[float
]) – Initial value for noise variance parameteroptimization_config (
Optional
[OptimizationConfig
]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.random_seed – Random seed to be used (optional)
fit_reset_params (
bool
) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values
- property likelihood: MarginalLikelihood
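A minimal usage sketch on synthetic data (assuming the Matern52 kernel exported from the gpautograd kernel package and a training data dict with features/targets entries; treat this as illustrative rather than canonical):

import numpy as np
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression import (
    GaussianProcessRegression,
)

features = np.random.rand(20, 2)                       # 20 encoded configs in [0, 1]^2
targets = np.sin(features.sum(axis=1, keepdims=True))  # toy objective values, shape (20, 1)
model = GaussianProcessRegression(kernel=Matern52(dimension=2))
model.fit({"features": features, "targets": targets})  # assumed data dict layout
predictions = model.predict(np.random.rand(5, 2))      # posterior means and variances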
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gpr_mcmc module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gpr_mcmc.GPRegressionMCMC(build_kernel, mcmc_config=MCMCConfig(n_samples=300, n_burnin=250, n_thinning=5), random_seed=None)[source]
Bases:
GaussianProcessModel
- property states: List[GaussProcPosteriorState] | None
- Returns:
Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)
- property number_samples: int
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood.MarginalLikelihood(prefix=None, params=None)[source]
Bases:
Block
Interface for marginal likelihood of Gaussian-linear model.
- forward(data)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray. Input tensors.
- param_encoding_pairs()[source]
Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings
- Return type:
List
[tuple
]
- box_constraints_internal()[source]
- Return type:
Dict
[str
,Tuple
[float
,float
]]- Returns:
Box constraints for all the underlying parameters
- reset_params(random_state)[source]
Reset hyperparameters to their initial values (or resample them).
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood.GaussianProcessMarginalLikelihood(kernel, mean=None, target_transform=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]
Bases:
MarginalLikelihood
Marginal likelihood of Gaussian process with Gaussian likelihood
- Parameters:
kernel (
KernelFunction
) – Kernel functionmean (
Optional
[MeanFunction
]) – Mean function which depends on the input X only (by default, a scalar fitted while optimizing the likelihood)target_transform (
Optional
[ScalarTargetTransform
]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Defaults to the identityinitial_noise_variance – A scalar to initialize the value of the residual noise variance
- forward(data)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray input tensors
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.MeanFunction(**kwargs)[source]
Bases:
Block
Mean function, parameterizing a surrogate model together with a kernel function.
Note: KernelFunction also inherits from this interface.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.ScalarMeanFunction(initial_mean_value=0.0, **kwargs)[source]
Bases:
MeanFunction
Mean function defined as a scalar (fitted while optimizing the marginal likelihood).
- Parameters:
initial_mean_value – A scalar to initialize the value of the mean
- forward(X)[source]
Actual computation of the scalar mean function. We compute mean_value * vector_of_ones, whose dimension is given by the first column of X.
- Parameters:
X – input data of size (n,d) for which we want to compute the mean (here, only useful to extract the right dimension)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.ZeroMeanFunction(**kwargs)[source]
Bases:
MeanFunction
- forward(X)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray input tensors
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.apply_lbfgs(exec_func, param_dict, bounds, **kwargs)[source]
Run SciPy L-BFGS-B on criterion given by autograd code
Run SciPy L-BFGS-B in order to minimize criterion given by autograd code. Criterion and gradient are computed by:
crit_val, gradient = exec_func(param_vec)
Given an autograd expression, use make_scipy_objective to obtain exec_func. param_vec must correspond to the parameter dictionary param_dict via ParamVecDictConverter. The initial param_vec is taken from param_dict, and final values are written back to param_dict (conversions are done by ParamVecDictConverter).
L-BFGS-B allows box constraints [a, b] for any coordinate. Here, None stands for -infinity (a) or +infinity (b). The default is (None, None), so no constraints. In bounds, box constraints can be specified per argument (the constraint applies to all coordinates of the argument). Pass {} for no constraints.
- Parameters:
exec_func – Function to compute criterion and gradient
param_dict – See above
bounds – See above
- Returns:
None, or dict with info about exception caught
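The bounds convention here mirrors SciPy's own L-BFGS-B interface, where None stands for an unbounded side. The following SciPy-only sketch (it does not call apply_lbfgs itself; the quadratic criterion is a stand-in for autograd code) shows the same convention with a function returning value and gradient:

import numpy as np
from scipy.optimize import minimize

def exec_func(param_vec):
    # Criterion value and gradient of a simple quadratic, standing in for autograd code
    crit_val = float(np.sum((param_vec - 1.0) ** 2))
    gradient = 2.0 * (param_vec - 1.0)
    return crit_val, gradient

x0 = np.zeros(3)
# (None, None) means unconstrained; the first coordinate is kept >= 0.5 here
bounds = [(0.5, None), (None, None), (None, None)]
result = minimize(exec_func, x0, jac=True, method="L-BFGS-B", bounds=bounds)
print(result.x)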
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.apply_lbfgs_with_multiple_starts(exec_func, param_dict, bounds, random_state, n_starts=5, **kwargs)[source]
When dealing with non-convex problems (e.g., optimizing the marginal likelihood), we typically need to start from various starting points. This function applies this logic around apply_lbfgs, randomizing the starting points around the initial values provided in param_dict (see below “copy_of_initial_param_dict”).
The first optimization happens exactly at param_dict, so that the case n_starts=1 exactly coincides with the previously used apply_lbfgs. Importantly, the communication with the L-BFGS solver happens via param_dict, hence all the operations with respect to param_dict are inplace.
We catch exceptions and return ret_infos about these. If none of the restarts worked, param_dict is not modified.
- Parameters:
exec_func – see above
param_dict – see above
bounds – see above
random_state – RandomState for sampling
n_starts – Number of times we start an optimization with L-BFGS (must be >= 1)
- Returns:
List ret_infos of length n_starts. Entry is None if optimization worked, or otherwise has dict with info about exception caught
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.add_regularizer_to_criterion(criterion, crit_args)[source]
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.create_lbfgs_arguments(criterion, crit_args, verbose=False)[source]
Creates SciPy optimizer objective and param_dict for criterion function.
- Parameters:
criterion (
MarginalLikelihood
) – Learning criterion (nullary)crit_args (
list
) – Arguments for criterion.forward
- Returns:
scipy_objective, param_dict
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.PosteriorState[source]
Bases:
object
Interface for posterior state of Gaussian-linear model.
- property num_data
- property num_features
- property num_fantasies
- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Marginal samples, (num_test, num_samples)
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.PosteriorStateWithSampleJoint[source]
Bases:
PosteriorState
- sample_joint(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Joint samples, (num_test, num_samples)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.GaussProcPosteriorState(features, targets, mean, kernel, noise_variance, debug_log=False, **kwargs)[source]
Bases:
PosteriorStateWithSampleJoint
Represents the posterior state for a Gaussian process regression model. Note that members are immutable. If the posterior state is to be updated, a new object is created and returned. A minimal usage sketch follows the method entries below.
- property num_data
- property num_features
- property num_fantasies
- neg_log_likelihood()[source]
Works only if fantasy samples are not used (single targets vector).
- Return type:
ndarray
- predict(test_features)[source]
Computes marginal statistics (means, variances) for a number of test features.
- Parameters:
test_features (
ndarray
) – Features for test configs- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
posterior_means, posterior_variances
- sample_marginals(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Marginal samples, (num_test, num_samples)
- backward_gradient(input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.
- Parameters:
input (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- sample_joint(test_features, num_samples=1, random_state=None)[source]
See comments of
predict
.- Parameters:
test_features (
ndarray
) – Input points for test configsnum_samples (
int
) – Number of samplesrandom_state (
Optional
[RandomState
]) – PRNG
- Return type:
ndarray
- Returns:
Joint samples, (num_test, num_samples)
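A minimal usage sketch, assuming Matern52 from the gpautograd.kernel module and a small noise variance passed as a one-element array (the exact expected type of noise_variance is an assumption here):

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    ScalarMeanFunction,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state import (
    GaussProcPosteriorState,
)

features = np.random.uniform(size=(10, 2))      # encoded inputs, shape (n, d)
targets = np.random.normal(size=(10, 1))        # single target vector, m = 1

state = GaussProcPosteriorState(
    features=features,
    targets=targets,
    mean=ScalarMeanFunction(),
    kernel=Matern52(dimension=2),
    noise_variance=np.array([1e-3]),            # type/shape is an assumption
)

test_features = np.random.uniform(size=(4, 2))
means, variances = state.predict(test_features)
marginal_draws = state.sample_marginals(test_features, num_samples=3)  # documented shape (num_test, num_samples)
joint_draws = state.sample_joint(test_features, num_samples=3)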
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.backward_gradient_given_predict(predict_func, input, head_gradients, mean_data, std_data)[source]
Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.
The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.
- Parameters:
predict_func (
Callable
[[ndarray
],Tuple
[ndarray
,ndarray
]]) – Function mapping input x to mean, varianceinput (
ndarray
) – Single input point x, shape (d,)head_gradients (
Dict
[str
,ndarray
]) – See Predictor.backward_gradientmean_data (
float
) – Mean used to normalize targetsstd_data (
float
) – Stddev used to normalize targets
- Return type:
ndarray
- Returns:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.IncrementalUpdateGPPosteriorState(features, targets, mean, kernel, noise_variance, **kwargs)[source]
Bases:
GaussProcPosteriorState
Extension of GaussProcPosteriorState which allows for incremental updating, given that a single data case is appended to the training set.
In order not to mutate members, the update method returns a new object.
- update(feature, target)[source]
- Parameters:
feature (
ndarray
) – Additional input xstar, shape (1, d)target (
ndarray
) – Additional target ystar, shape (1, m)
- Return type:
- Returns:
Posterior state for increased data set
- sample_and_update(feature, mean_impute_mask=None, random_state=None)[source]
Draw target(s), shape (1, m), from the current posterior state, then update the state based on these. The main computation of lvec is shared between the two. If mean_impute_mask is given, it is a boolean vector of size m (number of columns of pred_mat). Columns j of target, where mean_impute_mask[j] is true, are set to the predictive mean (instead of being sampled).
- Parameters:
feature (
ndarray
) – Additional input xstar, shape (1, d)mean_impute_mask – See above
random_state (
Optional
[RandomState
]) – PRN generator
- Return type:
(
ndarray
, IncrementalUpdateGPPosteriorState)- Returns:
target, poster_state_new
- expand_fantasies(num_fantasies)[source]
If this posterior has been created with a single targets vector, shape (n, 1), use this to duplicate this vector m = num_fantasies times. Call this method before fantasy targets are appended by update.
- Parameters:
num_fantasies (
int
) – Number m of fantasy samples- Return type:
- Returns:
New state with targets duplicated m times
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_computations(features, targets, mean, kernel, noise_variance, debug_log=False)[source]
Given input matrix X (features), target matrix Y (targets), mean and kernel function, compute posterior state {L, P}, where L is the Cholesky factor of
k(X, X) + sigsq_final * I
- and
L P = Y - mean(X)
Here, sigsq_final >= noise_variance is minimal such that the Cholesky factorization does not fail.
- Parameters:
features – Input matrix X (n, d)
targets – Target matrix Y (n, m)
mean (
MeanFunction
) – Mean functionkernel (
Union
[KernelFunction
,Tuple
[KernelFunction
,ndarray
]]) – Kernel function, or tuplenoise_variance – Noise variance (may be increased)
debug_log (
bool
) – Debug output during add_jitter CustomOp?
- Returns:
L, P
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.predict_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features)[source]
Computes posterior means and variances for test_features. If pred_mat is a matrix, so will be posterior_means, but not posterior_variances. Reflects the fact that for GP regression and fixed hyperparameters, the posterior mean depends on the targets y, but the posterior covariance does not.
- Parameters:
features – Training inputs
mean (
MeanFunction
) – Mean functionkernel (
Union
[KernelFunction
,Tuple
[KernelFunction
,ndarray
]]) – Kernel function, or tuplechol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
- Returns:
posterior_means, posterior_variances
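In plain NumPy terms, the posterior state and the marginal predictions above amount to the following sketch. The names kernel_fn, mean_X and mean_test are placeholders for illustration, not names from this module:

import numpy as np

def gp_posterior_sketch(X, Y, mean_X, kernel_fn, noise_variance, X_test, mean_test):
    # Posterior state {L, P}: L is the Cholesky factor of k(X, X) + noise * I,
    # and P solves L P = Y - mean(X)
    K = kernel_fn(X, X)
    L = np.linalg.cholesky(K + noise_variance * np.eye(X.shape[0]))
    P = np.linalg.solve(L, Y - mean_X)

    # Marginal predictions at test points
    K_tx = kernel_fn(X_test, X)                  # (n_test, n)
    V = np.linalg.solve(L, K_tx.T)               # (n, n_test)
    posterior_means = V.T @ P + mean_test        # depends on the targets Y
    prior_diag = np.diag(kernel_fn(X_test, X_test))
    posterior_variances = prior_diag - np.sum(V ** 2, axis=0)  # independent of Y
    return posterior_means, posterior_variances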
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]
Draws num_samples samples from the product of marginals of the posterior over input points test_features. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).
- Parameters:
features – Training inputs
mean (
MeanFunction
) – Mean functionkernel (
Union
[KernelFunction
,Tuple
[KernelFunction
,ndarray
]]) – Kernel function, or tuplechol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
num_samples (
int
) – Number of samples to draw
- Returns:
Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_joint(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]
Draws num_samples samples from the joint posterior distribution over inputs test_features. This is done by computing the mean and covariance matrix of this posterior, and using the Cholesky decomposition of the latter. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).
- Parameters:
features – Training inputs
mean (
MeanFunction
) – Mean functionkernel (
Union
[KernelFunction
,Tuple
[KernelFunction
,ndarray
]]) – Kernel function, or tuplechol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
num_samples (
int
) – Number of samples to draw
- Returns:
Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, target, lvec=None)[source]
Incremental update of posterior state (Cholesky factor, prediction matrix), given one datapoint (feature, target).
Note: noise_variance is the initial value, before any jitter may have been added to compute chol_fact. Here, we add the minimum amount of jitter such that the new diagonal entry of the Cholesky factor is >= MIN_CHOLESKY_DIAGONAL_VALUE. This means that if cholesky_update is used several times, we in fact add a diagonal (but not spherical) jitter matrix.
- Parameters:
features – Shape (n, d)
chol_fact – Shape (n, n)
pred_mat – Shape (n, m)
mean (
MeanFunction
) –kernel (
Union
[KernelFunction
,Tuple
[KernelFunction
,ndarray
]]) –noise_variance –
feature – Shape (1, d)
target – Shape (1, m)
lvec – If given, this is the new column of the Cholesky factor except the diagonal entry. If not, this is computed here
- Returns:
chol_fact_new (n+1, n+1), pred_mat_new (n+1, m)
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.SliceSampler(log_density, scale, random_state)[source]
Bases:
object
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.gen_random_direction(dimension, random_state)[source]
- Return type:
ndarray
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.ScalarTargetTransform(**kwargs)[source]
Bases:
MeanFunction
Interface for invertible transforms of scalar target values.
forward() maps original target values \(y\) to latent target values \(z\); the latter are typically modelled as Gaussian.
negative_log_jacobian() returns the term to be added to \(-\log P(z)\), where \(z\) is mapped from \(y\), in order to obtain \(-\log P(y)\).
- forward(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Transformed latent target vector \(z\)
- negative_log_jacobian(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.IdentityTargetTransform(**kwargs)[source]
Bases:
ScalarTargetTransform
- forward(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Transformed latent target vector \(z\)
- negative_log_jacobian(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)
- inverse(latents)[source]
- Parameters:
latents – Latent target vector \(z\)
- Returns:
Corresponding target vector \(y\)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.BoxCoxTargetTransform(initial_boxcox_lambda=None, **kwargs)[source]
Bases:
ScalarTargetTransform
The Box-Cox transform for \(y > 0\) is parameterized in terms of \(\lambda\):
\[ \begin{align}\begin{aligned}z = T(y, \lambda) = \frac{y^{\lambda} - 1}{\lambda},\quad \lambda\ne 0\\T(y, \lambda=0) = \log y\end{aligned}\end{align} \]One difficulty is that expressions involve division by \(\lambda\). Our implementation separates between (1) \(\lambda \ge \varepsilon\), (2) \(\lambda\le -\varepsilon\), and (3) \(-\varepsilon < \lambda < \varepsilon\), where \(\varepsilon\) is
BOXCOX_LAMBDA_EPS
. In case (3), we use the approximation \(z \approx u + \lambda u^2/2\), where \(u = \log y\).Note that we require \(1 + z\lambda > 0\), which restricts \(z\) if \(\lambda\ne 0\).
Note
Targets must be positive. They are thresholded at
BOXCOX_TARGET_THRES
, so negative targets do not raise an error.
The Box-Cox transform has been proposed in the context of Bayesian optimization by:
Cowen-Rivers, A. et al.: HEBO: Pushing the Limits of Sample-efficient Hyper-parameter Optimisation. Journal of Artificial Intelligence Research 74 (2022), 1269-1349.
However, they decouple the transformation of targets from fitting the remaining surrogate model parameters, which is possible only under a simplifying assumption (namely, that targets after the transform are modelled i.i.d. by a single univariate Gaussian). Instead, we treat \(\lambda\) as just one more parameter to fit along with all the others.
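A NumPy sketch of the forward transform, including the approximation near \(\lambda = 0\); the epsilon constant below is a placeholder for illustration, not the internal BOXCOX_LAMBDA_EPS value:

import numpy as np

LAMBDA_EPS = 1e-6  # placeholder for BOXCOX_LAMBDA_EPS

def boxcox_forward(y, lam):
    """Map positive targets y to latent values z = (y^lam - 1) / lam."""
    u = np.log(y)
    if abs(lam) < LAMBDA_EPS:
        # Case (3): near lam = 0, use the approximation z ~= u + lam * u^2 / 2
        return u + lam * u ** 2 / 2.0
    # Cases (1) and (2): the exact Box-Cox expression
    return (np.power(y, lam) - 1.0) / lam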
- param_encoding_pairs()[source]
- Returns list of tuples
(param_internal, encoding)
over all Gluon parameters maintained here.
- Returns:
List [(param_internal, encoding)]
- set_params(param_dict)[source]
- Parameters:
param_dict (
Dict
[str
,Any
]) – Dictionary with new hyperparameter values- Returns:
- negative_log_jacobian(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)
- forward(targets)[source]
- Parameters:
targets – Target vector \(y\) in original form
- Returns:
Transformed latent target vector \(z\)
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.Warping(dimension, coordinate_range=None, encoding_type='logarithm', **kwargs)[source]
Bases:
MeanFunction
Warping transform on contiguous range of feature \(x\). Each warped coordinate has two independent warping parameters.
If \(x = [x_1, \dots, x_d]\) and
coordinate_range = (l, r)
, the warping transform operates on \([x_l, \dots, x_{r-1}]\). The default for coordinate_range is the full range, and we must have l < r. The block is the identity on all remaining coordinates. Input coordinates are assumed to lie in \([0, 1]\). The warping transform on each coordinate is due to Kumaraswamy:\[warp(x_j) = 1 - (1 - r(x_j)^{a_j})^{b_j}.\]Here, \(r(x_j)\) linearly maps \([0, 1]\) to \([\epsilon, 1 - \epsilon]\) for a small \(\epsilon > 0\), which avoids numerical issues when taking derivatives. A NumPy sketch of this per-coordinate warp follows this class entry.
- Parameters:
dimension (
int
) – Dimension \(d\) of inputcoordinate_range (
Optional
[Tuple
[int
,int
]]) – Range(l, r)
, see above. Default is(0, dimension)
, so the full rangeencoding_type (
str
) – Encoding type
- forward(x)[source]
Actual computation of the warping transformation (see details above)
- Parameters:
x – Input data, shape
(n, d)
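A NumPy sketch of the per-coordinate Kumaraswamy warp described above; the epsilon value below is a placeholder for the internally used constant:

import numpy as np

EPS = 1e-7  # placeholder for the small epsilon used to avoid boundary issues

def kumaraswamy_warp(x, a, b):
    """Warp a coordinate x in [0, 1]: 1 - (1 - r(x)^a)^b, with r mapping to [eps, 1-eps]."""
    r = EPS + (1.0 - 2.0 * EPS) * x
    return 1.0 - (1.0 - r ** a) ** b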
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.warpings_for_hyperparameters(hp_ranges)[source]
It is customary to warp hyperparameters which are not categorical. This function creates warpings based on your configuration space.
- Parameters:
hp_ranges (
HyperparameterRanges
) – Encoding of configuration space- Return type:
List
[Warping
]- Returns:
To be used as
warpings
inWarpedKernel
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.kernel_with_warping(kernel, hp_ranges)[source]
Note that the coordinates corresponding to categorical parameters are not warped.
- Parameters:
kernel (
KernelFunction
) – Kernel \(k(x, x')\) without warpinghp_ranges (
HyperparameterRanges
) – Encoding of configuration space
- Return type:
- Returns:
Kernel with warping
- class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.WarpedKernel(kernel, warpings, **kwargs)[source]
Bases:
KernelFunction
Block that composes warping with an arbitrary kernel. We allow for a list of warping transforms, so that a non-contiguous set of input coordinates can be warped.
It is customary to warp hyperparameters which are not categorical. You can use
kernel_with_warping()
to furnish a kernel with warping for all non-categorical hyperparameters.- Parameters:
kernel (
KernelFunction
) – Kernel \(k(x, x')\)warpings (
List
[Warping
]) – List of warping transforms, which are applied sequentially. Ranges of different entries should be non-overlapping, this is not checked.
- forward(X1, X2)[source]
Overrides to implement forward computation using NDArray. Only accepts positional arguments.
- Parameters:
*args – List of NDArray input tensors
- diagonal(X)[source]
- Parameters:
X – Input data, shape
(n, d)
- Returns:
Diagonal of \(k(X, X)\), shape
(n,)
- diagonal_depends_on_X()[source]
For stationary kernels, diagonal does not depend on
X
- Returns:
Does
diagonal()
depend onX
?
syne_tune.optimizer.schedulers.searchers.bayesopt.models package
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model.CostValue(c0, c1)[source]
Bases:
object
Represents cost value \((c_0(x), c_1(x))\):
\(c_0(x)\): Startup cost for evaluation at config \(x\)
\(c_1(x)\): Cost per unit of resource \(r\) at config \(x\)
Our assumption is that, under the model, an evaluation at \(x\) until resource level \(r = 1, 2, 3, \dots\) costs \(c(x, r) = c_0(x) + r c_1(x)\)
-
c0:
float
-
c1:
float
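Under this assumption, a predicted total cost for training up to resource level r is obtained as in the following sketch (the numbers are arbitrary):

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model import (
    CostValue,
)

cost = CostValue(c0=1.5, c1=0.25)   # startup cost and per-resource cost for some config x
r = 8                               # resource level, e.g. number of epochs
total_cost = cost.c0 + r * cost.c1  # c(x, r) = c0(x) + r * c1(x)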
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model.CostModel[source]
Bases:
object
Interface for (temporal) cost model in the context of multi-fidelity HPO. We assume there are configurations \(x\) and resource levels \(r\) (for example, number of epochs). Here, \(r\) is a positive int. This can be seen as a simplified version of a surrogate model, mainly used to draw (jointly dependent) values from the posterior over cost values \((c_0(x), c_1(x))\).
Note: The model may be random (in which case joint samples are drawn from the posterior) or deterministic (in which case the model is fitted to data, and the cost values returned are deterministic).
A cost model has an inner state, which is set by calling
update()
passing a dataset. This inner state is then used whensample_joint()
is called.- property cost_metric_name: str
- Returns:
Name of metric in
TrialEvaluations
of cases inTuningJobState
- update(state)[source]
Update inner representation in order to be ready to return cost value samples.
Note: The metric cost_metric_name must be dict-valued in
state
, with keys being resource values \(r\). In order to support a proper estimation of \(c_0\) and \(c_1\), there should (ideally) be entries with the same \(x\) and different resource levels \(r\). The likelihood function takes into account that \(c(x, r) = c_0(x) + r c_1(x)\).- Parameters:
state (
TuningJobState
) – Current dataset (onlytrials_evaluations
is used)
- resample()[source]
For a random cost model, the state is resampled, such that calls of joint_sample before and after are conditionally independent. Normally, successive calls of sample_joint are jointly dependent. For example, for a linear model, the state resampled here would be the weight vector, which is then used in
sample_joint()
.For a deterministic cost model, this method does nothing.
- sample_joint(candidates)[source]
Draws cost values \((c_0(x), c_1(x))\) for candidates (non-extended).
If the model is random, the sampling is done jointly. Also, if
sample_joint()
is called multiple times, the posterior is to be updated after each call, such that the sample over the union of candidates over all calls is drawn jointly (but seeresample()
). Also, if measurement noise is allowed in update, this noise is not added here. A sample from \(c(x, r)\) is obtained as \(c_0(x) + r c_1(x)\). If the model is deterministic, the model determined inupdate()
is just evaluated.- Parameters:
candidates (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – Non-extended configs- Return type:
List
[CostValue
]- Returns:
List of \((c_0(x), c_1(x))\)
- static event_time(start_time, level, next_milestone, cost)[source]
If a task reported its last recent value at
start_time
at levellevel
, return time of reaching levelnext_milestone
, given costcost
.- Parameters:
start_time (
float
) – See abovelevel (
int
) – See abovenext_milestone (
int
) – See abovecost (
CostValue
) – See above
- Return type:
float
- Returns:
Time of reaching
next_milestone
under cost model
- predict_times(candidates, resources, cost_values, start_time=0)[source]
Given configs \(x\), resource values \(r\) and cost values returned by
sample_joint()
, compute time predictions for when each config \(x\) reaches its resource level \(r\) if started atstart_time
.- Parameters:
candidates (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – Configsresources (
List
[int
]) – Resource levelscost_values (
List
[CostValue
]) – Cost values fromsample_joint()
start_time (
float
) – See above
- Return type:
List
[float
]- Returns:
Predicted times
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.LinearCostModel[source]
Bases:
CostModel
Deterministic cost model where both
c0(x)
andc1(x)
are linear models of the formc0(x) = np.dot(features0(x), weights0)
,c1(x) = np.dot(features1(x), weights1)
The feature maps
features0
,features1
are supplied by subclasses. The weights are fit by ridge regression, using scikit.learn.RidgeCV; the regularization constant is set by LOO cross-validation. A sketch of this joint ridge fit is given below.
- property cost_metric_name: str
- Returns:
Name of metric in
TrialEvaluations
of cases inTuningJobState
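Since \(c(x, r) = c_0(x) + r c_1(x)\) is linear in the stacked features [features0(x), r * features1(x)], both weight vectors can be fit with a single ridge regression. The sketch below illustrates this general idea on synthetic data; it is a sketch of the approach, not the internal implementation of this class:

import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic observations: rows are (config, resource level) pairs
features0 = np.random.uniform(size=(50, 4))    # features0(x) per observation
features1 = np.random.uniform(size=(50, 6))    # features1(x) per observation
resources = np.random.randint(1, 10, size=50)  # resource level r per observation
costs = np.random.uniform(1.0, 5.0, size=50)   # observed c(x, r)

# Stack [features0(x), r * features1(x)] so that one ridge fit recovers both
# weight vectors; RidgeCV's default picks the penalty by efficient LOO CV
design = np.hstack([features0, resources[:, None] * features1])
reg = RidgeCV(alphas=np.logspace(-3, 3, 13), fit_intercept=False).fit(design, costs)
weights0, weights1 = reg.coef_[:4], reg.coef_[4:]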
- feature_matrices(candidates)[source]
Has to be supplied by subclasses
- Parameters:
candidates (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – List of n candidate configs (non-extended)- Return type:
(
ndarray
,ndarray
)- Returns:
Feature matrices
features0
(n, dim0)
,features1
(n, dim1)
- update(state)[source]
Update inner representation in order to be ready to return cost value samples.
Note: The metric cost_metric_name must be dict-valued in
state
, with keys being resource values \(r\). In order to support a proper estimation of \(c_0\) and \(c_1\), there should (ideally) be entries with the same \(x\) and different resource levels \(r\). The likelihood function takes into account that \(c(x, r) = c_0(x) + r c_1(x)\).- Parameters:
state (
TuningJobState
) – Current dataset (onlytrials_evaluations
is used)
- sample_joint(candidates)[source]
Draws cost values \((c_0(x), c_1(x))\) for candidates (non-extended).
If the model is random, the sampling is done jointly. Also, if
sample_joint()
is called multiple times, the posterior is to be updated after each call, such that the sample over the union of candidates over all calls is drawn jointly (but seeresample()
). Also, if measurement noise is allowed in update, this noise is not added here. A sample from \(c(x, r)\) is obtained as \(c_0(x) + r c_1(x)\). If the model is deterministic, the model determined inupdate()
is just evaluated.- Parameters:
candidates (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – Non-extended configs- Return type:
List
[CostValue
]- Returns:
List of \((c_0(x), c_1(x))\)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.MLPLinearCostModel(num_inputs, num_outputs, num_hidden_layers, hidden_layer_width, batch_size, bs_exponent=None, extra_mlp=False, c0_mlp_feature=False, expected_hidden_layer_width=None)[source]
Bases:
LinearCostModel
Deterministic linear cost model for multi-layer perceptron.
If config is a HP configuration,
num_hidden_layers(config)
is the number of hidden layers,hidden_layer_width(config, layer)
is the number of units in hidden layerlayer
(0-based),batch_size(config)
is the batch size.If
expected_hidden_layer_width
is given, it mapslayer
(0-based) to expected layer width under random sampling. In this case, all MLP features are normalized to expected value 1 under random sampling (but ignoringbs_exponent
if != 1). Note: If needed, we could incorporate bs_exponent in general. If batch_size was uniform between a and b:\[\mathrm{E}\left[ bs^{bs_{exp} - 1} \right] = \frac{b^{bs_{exp}} - a^{bs_{exp}}}{bs_{exp} \, (b - a)}\]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.FixedLayersMLPCostModel(num_inputs, num_outputs, num_units_keys=None, bs_exponent=None, extra_mlp=False, c0_mlp_feature=False, expected_hidden_layer_width=None)[source]
Bases:
MLPLinearCostModel
Linear cost model for MLP with
num_hidden_layers
hidden layers.Constructs expected_hidden_layer_width function from the training evaluation function. Works because
impute_points_to_evaluate
imputes with the expected value under random sampling.- Parameters:
config_space (
Dict
) – Configuration spacenum_units_keys (
List
[str
]) – Keys intoconfig_space
for number of units of different layers
- Returns:
expected_hidden_layer_width
,exp_vals
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.NASBench201LinearCostModel(config_keys, map_config_values, conv_separate_features, count_sum)[source]
Bases:
LinearCostModel
Deterministic linear cost model for NASBench201.
The cell graph is:
node1 = x0(node0)
node2 = x1(node0) + x2(node1)
node3 = x3(node0) + x4(node1) + x5(node2)
config_keys
contains attribute names ofx0, ..., x5
in a config, in this ordering.map_config_values
maps values in the config (for fields corresponding tox0, ..., x5
) to entries ofOp
.- Parameters:
config_keys (
Tuple
[str
,...
]) – See abovemap_config_values (
Dict
[str
,int
]) – See aboveconv_separate_features (
bool
) – If True, we use separate features fornor_conv_1x1
,nor_conv_3x3
(c1
has 4 features). Otherwise, these two are captured by a single features (c1
has 3 features)count_sum (
bool
) – If True, we use an additional feature for pointwise sum operators inside a cell (there are between 0 and 3)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.BiasOnlyLinearCostModel[source]
Bases:
LinearCostModel
Simple baseline:
features0(x) = [1], features1(x) = [1]
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model.ScikitLearnCostModel(model_type=None)[source]
Bases:
NonLinearCostModel
Deterministic cost model, where
c0(x) = b0
(constant), andc1(x)
is given by ascikit.learn
(orscipy
) regression model. Parameters areb0
and those of the regression model.- Parameters:
model_type (
Optional
[str
]) – Regression model forc1(x)
- transform_dataset(dataset, num_data0, res_min)[source]
Transforms dataset (see
_data_for_c1_regression()
) into a dataset representation (dict), which is used askwargs
infit_regressor()
.- Parameters:
dataset (
List
[Tuple
[Dict
[str
,Union
[int
,float
,str
]],float
]]) –num_data0 (
int
) –res_min (
int
) –
- Return type:
Dict
[str
,Any
]- Returns:
Used as kwargs in fit_regressor
- static fit_regressor(b0, **kwargs)[source]
Given value for
b0
, fits regressor to dataset specified via kwargs (seetransform_dataset()
). Returns the criterion function value forb0
as well as the fitted regression model.- Parameters:
b0 (
float
) –kwargs –
- Returns:
fval, model
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model.UnivariateSplineCostModel(scalar_attribute, input_range, spline_degree=3)[source]
Bases:
NonLinearCostModel
Here,
c1(x)
is given by a univariate spline (UnivariateSpline
), where a single scalar is extracted from x.In the second part of the dataset (
pos >= num_data0
), duplicate entries with the same config in dataset are grouped into one, using the mean as target value, and a weight equal to the number of duplicates. This still leaves duplicates in the overall dataset, one in data0, the other indata1
, but spline smoothing can deal with this.- transform_dataset(dataset, num_data0, res_min)[source]
Transforms dataset (see
_data_for_c1_regression()
) into a dataset representation (dict), which is used askwargs
infit_regressor()
.- Parameters:
dataset (
List
[Tuple
[Dict
[str
,Union
[int
,float
,str
]],float
]]) –num_data0 (
int
) –res_min (
int
) –
- Return type:
Dict
[str
,Any
]- Returns:
Used as kwargs in fit_regressor
- static fit_regressor(b0, **kwargs)[source]
Given value for
b0
, fits regressor to dataset specified via kwargs (seetransform_dataset()
). Returns the criterion function value forb0
as well as the fitted regression model.- Parameters:
b0 (
float
) –kwargs –
- Returns:
fval, model
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.models.acqfunc_factory module
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.acqfunc_factory.acquisition_function_factory(name, **kwargs)[source]
- Return type:
Callable
[[Any
],AcquisitionFunction
]
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model.CostFixedResourcePredictor(state, model, fixed_resource, num_samples=1)[source]
Bases:
BasePredictor
Wraps cost model \(c(x, r)\) of
CostModel
to be used as surrogate model, where predictions are done at r =fixed_resource
.Note: For random cost models, we approximate expectations in
predict
by resamplingnum_samples
times (should be 1 for deterministic cost models).Note: Since this is a generic wrapper, we assume for
backward_gradient
that the gradient contribution through the cost model vanishes. For special cost models, the mapping from encoded input to predictive means may be differentiable, and prediction code inautograd
may be available. For such cost models, this wrapper should not be used, andbackward_gradient
should be implemented properly.- Parameters:
state (
TuningJobState
) – TuningJobSubStatemodel (
CostModel
) – Model parameters must have been fitfixed_resource (
int
) – \(c(x, r)\) is predicted for this resource level rnum_samples (
int
) – Number of samples drawn inpredict()
. Use this for random cost models only
- static keys_predict()[source]
Keys of signals returned by
predict()
.Note: In order to work with
AcquisitionFunction
implementations, the following signals are required:“mean”: Predictive mean
“std”: Predictive standard deviation
- Return type:
Set
[str
]- Returns:
Set of keys for
dict
returned bypredict()
- predict(inputs)[source]
Returns signals which are statistics of the predictive distribution at input points
inputs
. By default:“mean”: Predictive means. If the model supports fantasizing with a number
nf
of fantasies, this has shape(n, nf)
, otherwise(n,)
“std”: Predictive stddevs, shape
(n,)
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Parameters:
inputs (
ndarray
) – Input points, shape(n, d)
- Return type:
List
[Dict
[str
,ndarray
]]- Returns:
List of
dict
with keyskeys_predict()
, of length the number of MCMC samples, or length 1 for empirical Bayes
- backward_gradient(input, head_gradients)[source]
The gradient contribution through the cost model is blocked.
- Return type:
List
[ndarray
]
- predict_mean_current_candidates()[source]
Returns the predictive mean (signal with key ‘mean’) at all current candidates in the state (observed, pending).
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Return type:
List
[ndarray
]- Returns:
List of predictive means
- current_best()[source]
Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Return type:
List
[ndarray
]- Returns:
Incumbent, see above
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model.CostEstimator(model, fixed_resource, num_samples=1)[source]
Bases:
Estimator
The name of the cost metric is
model.cost_metric_name
.- Parameters:
model (
CostModel
) – CostModel to be wrappedfixed_resource (
int
) – \(c(x, r)\) is predicted for this resource level rnum_samples (
int
) – Number of samples drawn inpredict()
. Use this for random cost models only
- property fixed_resource: int
syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.Estimator[source]
Bases:
object
Interface for surrogate models used in
ModelStateTransformer
.In general, a surrogate model is probabilistic (or Bayesian), in that predictions are driven by a posterior distribution, represented in a posterior state of type
Predictor
. The model may also come with tunable (hyper)parameters, such as for example covariance function parameters for a Gaussian process model. These parameters can be accessed withget_params()
,set_params()
.- fit_from_state(state, update_params)[source]
Creates a
Predictor
object based on data instate
. For a Bayesian model, this involves computing the posterior state, which is wrapped in thePredictor
object.If the model also has (hyper)parameters, these are learned iff
update_params == True
. Otherwise, these parameters are not changed, but only the posterior state is computed. The idea is that in general, model fitting is much more expensive than just creating the final posterior state (or predictor). It then makes sense to partly work with stale model parameters.If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the
update_params
argument. A short usage sketch follows this method entry.
- Parameters:
state (
TuningJobState
) – Current data model parameters are to be fit on, and the posterior state is to be computed fromupdate_params (
bool
) – See above
- Return type:
- Returns:
Predictor, wrapping the posterior state
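The typical call pattern is sketched below; estimator stands for any concrete Estimator and state for an existing TuningJobState (constructing the state itself is out of scope here):

import numpy as np

def fit_and_predict(estimator, state, test_inputs: np.ndarray):
    """Fit model parameters on `state`, then return predictive means at `test_inputs`.

    `estimator` is any concrete Estimator and `state` a TuningJobState; both are
    assumed to exist already.
    """
    predictor = estimator.fit_from_state(state, update_params=True)
    stats_per_sample = predictor.predict(test_inputs)  # list of dicts with "mean", "std"
    return [stats["mean"] for stats in stats_per_sample]  # length 1 unless MCMC is used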
- property debug_log: DebugLogPrinter | None
- configure_scheduler(scheduler)[source]
Called by
configure_scheduler()
of searchers which make use of an class:Estimator
. Allows the estimator to depend on parameters of the scheduler.- Parameters:
scheduler – Scheduler object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.TransformedData(features, targets, mean, std)[source]
Bases:
object
-
features:
ndarray
-
targets:
ndarray
-
mean:
float
-
std:
float
-
features:
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.transform_state_to_data(state, active_metric=None, normalize_targets=True, num_fantasy_samples=1)[source]
Transforms
TuningJobState
objectstate
to features and targets. The former are encoded vectors fromstate.hp_ranges
. The latter are normalized to zero mean, unit variance ifnormalize_targets == True
, in which case the original mean and stddev is also returned.If
state.pending_evaluations
is not empty, it must contain entries of typeFantasizedPendingEvaluation
, which contain the fantasy samples. This is the case only for internal states.- Parameters:
state (
TuningJobState
) –TuningJobState
to transformactive_metric (
Optional
[str
]) – Name of target metric (optional)normalize_targets (
bool
) – Normalize targets? Defaults toTrue
num_fantasy_samples (
int
) – Number of fantasy samples. Defaults to 1
- Return type:
- Returns:
Transformed data
syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_mcmc_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_mcmc_model.GaussProcMCMCEstimator(gpmodel, active_metric='target', normalize_targets=True, debug_log=None, filter_observed_data=None, hp_ranges_for_prediction=None)[source]
Bases:
GaussProcEstimator
We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.
We draw one fantasy sample per MCMC sample here. This could be extended by sampling
> 1
fantasy samples for each MCMC sample.- Parameters:
gpmodel (
GPRegressionMCMC
) – GPRegressionMCMC modelactive_metric (
str
) – Name of the metric to optimize.normalize_targets (
bool
) – Normalize target values instate.trials_evaluations
?
syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcPredictor(state, gpmodel, fantasy_samples, active_metric=None, normalize_mean=0.0, normalize_std=1.0, filter_observed_data=None, hp_ranges_for_prediction=None)[source]
Bases:
BasePredictor
Gaussian process surrogate model, where model parameters are either fit by marginal likelihood maximization (e.g.,
GaussianProcessRegression
), or integrated out by MCMC sampling (e.g.,GPRegressionMCMC
).Both
state
andgpmodel
are immutable. If parameters of the latter are to be fit, this has to be done before.fantasy_samples
contains the sampled (normalized) target values for pending configs. Onlyactive_metric
target values are considered. The target values for a pending config are a flat vector. If MCMC is used, its length is a multiple of the number of MCMC samples, containing the fantasy values for MCMC sample 0, sample 1, …- Parameters:
state (
TuningJobState
) – TuningJobSubStategpmodel (
Union
[GaussianProcessRegression
,GPRegressionMCMC
,IndependentGPPerResourceModel
,HyperTuneIndependentGPModel
,HyperTuneJointGPModel
]) – Model parameters must have been fit and/or posterior states been computedfantasy_samples (
List
[FantasizedPendingEvaluation
]) – See aboveactive_metric (
Optional
[str
]) – Name of the metric to optimize.normalize_mean (
float
) – Mean used to normalize targetsnormalize_std (
float
) – Stddev used to normalize targets
- hp_ranges_for_prediction()[source]
- Return type:
- Returns:
Feature generator to be used for
inputs
inpredict()
- predict(inputs)[source]
Returns signals which are statistics of the predictive distribution at input points
inputs
. By default:“mean”: Predictive means. If the model supports fantasizing with a number
nf
of fantasies, this has shape(n, nf)
, otherwise(n,)
“std”: Predictive stddevs, shape
(n,)
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Parameters:
inputs (
ndarray
) – Input points, shape(n, d)
- Return type:
List
[Dict
[str
,ndarray
]]- Returns:
List of
dict
with keyskeys_predict()
, of length the number of MCMC samples, or length 1 for empirical Bayes
- backward_gradient(input, head_gradients)[source]
Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.Lists have
> 1
entry if MCMC is used, otherwise they are all size 1.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
List
[Dict
[str
,ndarray
]]) – See above
- Return type:
List
[ndarray
]- Returns:
Gradient \(\nabla_x f(x)\) (several if MCMC is used)
- property posterior_states: List[PosteriorState] | None
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcEstimator(gpmodel, active_metric, normalize_targets=True, debug_log=None, filter_observed_data=None, no_fantasizing=False, hp_ranges_for_prediction=None)[source]
Bases:
Estimator
We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.
- Parameters:
gpmodel (
Union
[GaussianProcessRegression
,GPRegressionMCMC
,IndependentGPPerResourceModel
,HyperTuneIndependentGPModel
,HyperTuneJointGPModel
]) – Internal modelactive_metric (
str
) – Name of the metric to optimize.normalize_targets (
bool
) – Normalize observed target values?debug_log (
Optional
[DebugLogPrinter
]) – DebugLogPrinter (optional)filter_observed_data (
Optional
[Callable
[[Dict
[str
,Union
[int
,float
,str
]]],bool
]]) – Filter for observed data before computing incumbentno_fantasizing (
bool
) – If True, pending evaluations in the state are simply ignored, fantasizing is not done (not recommended)hp_ranges_for_prediction (
Optional
[HyperparameterRanges
]) – If given,GaussProcPredictor
should use this instead ofstate.hp_ranges
- property debug_log: DebugLogPrinter | None
- property gpmodel: GaussianProcessRegression | GPRegressionMCMC | IndependentGPPerResourceModel | HyperTuneIndependentGPModel | HyperTuneJointGPModel
- fit_from_state(state, update_params)[source]
Parameters of
self._gpmodel
are optimized iffupdate_params
. This requiresstate
to contain labeled examples.If
self.state.pending_evaluations
is not empty, we proceed as follows (return type: Predictor):
Compute posterior for state without pending evals
Draw fantasy values for pending evals
Recompute posterior (without fitting)
- configure_scheduler(scheduler)[source]
Called by
configure_scheduler()
of searchers which make use of an class:Estimator
. Allows the estimator to depend on parameters of the scheduler.- Parameters:
scheduler – Scheduler object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcEmpiricalBayesEstimator(gpmodel, num_fantasy_samples, active_metric='target', normalize_targets=True, debug_log=None, filter_observed_data=None, no_fantasizing=False, hp_ranges_for_prediction=None)[source]
Bases:
GaussProcEstimator
We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.
- Parameters:
gpmodel (
Union
[GaussianProcessRegression
,GPRegressionMCMC
,IndependentGPPerResourceModel
,HyperTuneIndependentGPModel
,HyperTuneJointGPModel
]) –GaussianProcessRegression
modelnum_fantasy_samples (
int
) – See aboveactive_metric (
str
) – Name of the metric to optimize.normalize_targets (
bool
) – Normalize target values instate.candidate_evaluations
?
syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model.GaussProcAdditivePredictor(state, gpmodel, fantasy_samples, active_metric, filter_observed_data=None, normalize_mean=0.0, normalize_std=1.0)[source]
Bases:
BasePredictor
Gaussian Process additive surrogate model, where model parameters are fit by marginal likelihood maximization.
Note:
predict_mean_current_candidates()
callspredict()
for all observed and pending extended configs. This may not be exactly correct, becausepredict()
is not meant to be used for configs which have observations (it IS correct at \(r = r_{max}\)).fantasy_samples
contains the sampled (normalized) target values for pending configs. Onlyactive_metric
target values are considered. The target values for a pending config are a flat vector.- Parameters:
state (
TuningJobState
) – TuningJobSubStategpmodel (
GaussianProcessLearningCurveModel
) – Parameters must have been fitfantasy_samples (
List
[FantasizedPendingEvaluation
]) – See aboveactive_metric (
str
) – See parent classfilter_observed_data (
Optional
[Callable
[[Dict
[str
,Union
[int
,float
,str
]]],bool
]]) – See parent classnormalize_mean (
float
) – Mean used to normalize targetsnormalize_std (
float
) – Stddev used to normalize targets
- predict(inputs)[source]
Input features
inputs
are w.r.t. extended configs(x, r)
.- Parameters:
inputs (
ndarray
) – Input features- Return type:
List
[Dict
[str
,ndarray
]]- Returns:
Predictive means, stddevs
- backward_gradient(input, head_gradients)[source]
Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.Lists have
> 1
entry if MCMC is used, otherwise they are all size 1.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
List
[Dict
[str
,ndarray
]]) – See above
- Return type:
List
[ndarray
]- Returns:
Gradient \(\nabla_x f(x)\) (several if MCMC is used)
- property posterior_states: List[GaussProcAdditivePosteriorState] | None
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model.GaussProcAdditiveEstimator(gpmodel, num_fantasy_samples, active_metric, config_space_ext, normalize_targets=False, debug_log=None, filter_observed_data=None)[source]
Bases:
Estimator
If
num_fantasy_samples > 0
, we draw this many fantasy targets independently, while each sample is dependent over all pending evaluations. Ifnum_fantasy_samples == 0
, pending evaluations instate
are ignored.- Parameters:
gpmodel (
GaussianProcessLearningCurveModel
) – GaussianProcessLearningCurveModelnum_fantasy_samples (
int
) – See aboveactive_metric (
str
) – Name of the metric to optimize.config_space_ext (
ExtendedConfiguration
) – ExtendedConfigurationnormalize_targets (
bool
) – Normalize observed target values?debug_log (
Optional
[DebugLogPrinter
]) – DebugLogPrinter (optional)filter_observed_data (
Optional
[Callable
[[Dict
[str
,Union
[int
,float
,str
]]],bool
]]) – Filter for observed data before computing incumbent
- property debug_log: DebugLogPrinter | None
- fit_from_state(state, update_params)[source]
Creates a
Predictor
object based on data instate
. For a Bayesian model, this involves computing the posterior state, which is wrapped in thePredictor
object.If the model also has (hyper)parameters, these are learned iff
update_params == True
. Otherwise, these parameters are not changed, but only the posterior state is computed. The idea is that in general, model fitting is much more expensive than just creating the final posterior state (or predictor). It then makes sense to partly work with stale model parameters.If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the
update_params
argument.
- Parameters:
state (
TuningJobState
) – Current data model parameters are to be fit on, and the posterior state is to be computed fromupdate_params (
bool
) – See above
- Return type:
- Returns:
Predictor, wrapping the posterior state
- predictor_for_fantasy_samples(state, fantasy_samples)[source]
Same as
model
withfit_params=False
, butfantasy_samples
are passed in, rather than sampled here.- Parameters:
state (
TuningJobState
) – Seemodel
fantasy_samples (
List
[FantasizedPendingEvaluation
]) – See above
- Return type:
- Returns:
See
model
- configure_scheduler(scheduler)[source]
Called by
configure_scheduler()
of searchers which make use of an class:Estimator
. Allows the estimator to depend on parameters of the scheduler.- Parameters:
scheduler – Scheduler object
syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory module
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory.base_kernel_factory(name, dimension, **kwargs)[source]
- Return type:
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory.resource_kernel_factory(name, kernel_x, mean_x, **kwargs)[source]
Given kernel function
kernel_x
and mean functionmean_x
over configx
, create kernel and mean functions over(x, r)
, wherer
is the resource attribute (nonnegative scalar, usually in[0, 1]
).Note: For
name in ["matern52", "matern52-res-warp"]
, ifkernel_x
is of typeWarpedKernel
, the resulting kernel inherits this warping.- Parameters:
name (
str
) – Selects resource kernel typekernel_x (
KernelFunction
) – Kernel function over configsx
mean_x (
MeanFunction
) – Mean function over configsx
kwargs – Extra arguments (optional)
- Return type:
- Returns:
(res_kernel, res_mean)
, both over(x, r)
syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.HeadWithGradient(hval, gradient)[source]
Bases:
object
gradient
maps each output model to a dict of head gradients, whose keys are those used bypredict
(e.g.,mean
,std
)-
hval:
ndarray
-
gradient:
Dict
[str
,Dict
[str
,ndarray
]]
-
hval:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.CurrentBestProvider[source]
Bases:
object
Helper class for
MeanStdAcquisitionFunction
. Thecurrent_best
values required incompute_acq()
andcompute_acq_with_gradient()
may depend on the MCMC sample index for each model (if none of the models use MCMC, this index is always(0, 0, ..., 0)
).
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.NoneCurrentBestProvider[source]
Bases:
CurrentBestProvider
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.ActiveMetricCurrentBestProvider(active_metric_current_best)[source]
Bases:
CurrentBestProvider
Default implementation in which
current_best
depends on the active metric only.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.MeanStdAcquisitionFunction(predictor, active_metric=None)[source]
Bases:
AcquisitionFunction
Base class for standard acquisition functions which depend on predictive mean and stddev. Subclasses have to implement the head and its derivatives w.r.t. mean and std:
\[f(x, \mathrm{model}) = h(\mathrm{mean}, \mathrm{std}, \mathrm{model.current\_best}())\]
If model is a Predictor, then active_metric is ignored. If model is a dict mapping output names to models, then active_metric must be given.
Note that acquisition functions will always be minimized!
- compute_acq(inputs, predictor=None)[source]
Note: If inputs has shape (d,), it is taken to be (1, d)
syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.EIAcquisitionFunction(predictor, active_metric=None, jitter=0.01, debug_collect_stats=False)[source]
Bases:
MeanStdAcquisitionFunction
Minus expected improvement acquisition function (minus because the convention is to always minimize acquisition functions)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.LCBAcquisitionFunction(predictor, kappa, active_metric=None)[source]
Bases:
MeanStdAcquisitionFunction
Lower confidence bound (LCB) acquisition function:
\[h(\mu, \sigma) = \mu - \kappa \cdot \sigma\]
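For illustration, the LCB head and its partial derivatives with respect to the predictive statistics can be written in a few lines of NumPy. This is a standalone sketch following the formula above, not the Syne Tune class; the function names are ours.
import numpy as np

def lcb_head(mean: np.ndarray, std: np.ndarray, kappa: float = 1.0) -> np.ndarray:
    # LCB head h(mu, sigma) = mu - kappa * sigma; smaller is better,
    # matching the convention that acquisition functions are minimized.
    return mean - kappa * std

def lcb_head_gradient(mean: np.ndarray, std: np.ndarray, kappa: float = 1.0):
    # Partial derivatives of the head w.r.t. the predictive statistics,
    # as needed for reverse-mode gradients of the acquisition function.
    return {"mean": np.ones_like(mean), "std": -kappa * np.ones_like(std)}

# Example: scores for three candidates
mu = np.array([0.3, 0.25, 0.4])
sigma = np.array([0.05, 0.2, 0.1])
print(lcb_head(mu, sigma, kappa=1.0))  # [0.25, 0.05, 0.3]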
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.EIpuAcquisitionFunction(predictor, active_metric=None, exponent_cost=1.0, jitter=0.01)[source]
Bases:
MeanStdAcquisitionFunction
Minus cost-aware expected improvement acquisition function.
This is defined as
\[\mathrm{EIpu}(x) = \frac{\mathrm{EI}(x)}{\mathrm{power}(\mathrm{cost}(x), \mathrm{exponent\_cost})},\]
where \(\mathrm{EI}(x)\) is expected improvement, \(\mathrm{cost}(x)\) is the predictive mean of a cost model, and exponent_cost is an exponent in \((0, 1]\), which scales the influence of the cost term on the acquisition function.
Note: Two metrics are expected in the model output: the main objective and the cost. The main objective needs to be indicated as active_metric when initializing EIpuAcquisitionFunction. The cost is automatically assumed to be the other metric.
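A minimal NumPy/SciPy sketch of the EIpu head, assuming Gaussian predictive statistics for the objective and a positive predictive cost mean; the function names and jitter handling are illustrative, not taken from the Syne Tune implementation.
import numpy as np
from scipy.stats import norm

def expected_improvement(mean, std, best, jitter=0.01):
    # Standard EI for minimization: E[max(best - f(x), 0)] under N(mean, std^2)
    u = (best - mean - jitter) / std
    return std * (u * norm.cdf(u) + norm.pdf(u))

def eipu(mean, std, best, cost_mean, exponent_cost=1.0, jitter=0.01):
    # EIpu(x) = EI(x) / cost(x)^exponent_cost; returned negated here to match
    # the convention that acquisition functions are minimized.
    ei = expected_improvement(mean, std, best, jitter)
    return -ei / np.power(cost_mean, exponent_cost)

print(eipu(mean=np.array([0.3]), std=np.array([0.1]), best=0.28,
           cost_mean=np.array([2.0]), exponent_cost=0.5))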
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.ConstraintCurrentBestProvider(current_best_list, num_samples_active)[source]
Bases:
CurrentBestProvider
Here,
current_best
depends on two predictors, for active and constraint metric.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.CEIAcquisitionFunction(predictor, active_metric=None, jitter=0.01)[source]
Bases:
MeanStdAcquisitionFunction
Minus constrained expected improvement acquisition function. (minus because the convention is to always minimize the acquisition function)
This is defined as CEI(x) = EI(x) * P(c(x) <= 0), where EI is the standard expected improvement with respect to the current feasible best, and P(c(x) <= 0) is the probability that the hyperparameter configuration x satisfies the constraint modeled by c(x).
If there are no feasible hyperparameters yet, the current feasible best is undefined. Thus, CEI is reduced to the P(c(x) <= 0) term until a feasible configuration is found.
Two metrics are expected in the model output: the main objective and the constraint metric. The main objective needs to be indicated as active_metric when initializing CEIAcquisitionFunction. The constraint is automatically assumed to be the other metric.
References on CEI:
Gardner et al., Bayesian Optimization with Inequality Constraints, ICML 2014, and
Gelbart et al., Bayesian Optimization with Unknown Constraints, UAI 2014.
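The following standalone sketch illustrates the CEI formula above, under the assumption that both the objective and the constraint are modeled by Gaussian predictive distributions and that feasibility means c(x) <= 0; names are illustrative and this is not the Syne Tune class.
import numpy as np
from scipy.stats import norm

def constrained_ei(obj_mean, obj_std, feasible_best, con_mean, con_std, jitter=0.01):
    # P(c(x) <= 0) under a Gaussian constraint model
    prob_feasible = norm.cdf(-con_mean / con_std)
    if feasible_best is None:
        # No feasible configuration yet: fall back to the feasibility term alone
        return -prob_feasible
    u = (feasible_best - obj_mean - jitter) / obj_std
    ei = obj_std * (u * norm.cdf(u) + norm.pdf(u))
    # Negated, since acquisition functions are minimized
    return -ei * prob_feasible

print(constrained_ei(np.array([0.3]), np.array([0.1]), 0.28,
                     np.array([-0.5]), np.array([0.2])))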
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.get_quantiles(acquisition_par, fmin, m, s)[source]
Quantiles of the Gaussian distribution, useful to determine the acquisition function values.
- Parameters:
acquisition_par – parameter of the acquisition function
fmin – current minimum.
m – vector of means.
s – vector of standard deviations.
syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_base module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_base.BasePredictor(state, active_metric=None, filter_observed_data=None)[source]
Bases:
Predictor
Base class for (most) Predictor implementations; provides common code.
- property filter_observed_data: Callable[[Dict[str, int | float | str]], bool] | None
- predict_mean_current_candidates()[source]
Returns the predictive mean (signal with key ‘mean’) at all current candidates in the state (observed, pending).
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Return type:
List
[ndarray
]- Returns:
List of predictive means
- current_best()[source]
Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Return type:
List
[ndarray
]- Returns:
Incumbent, see above
syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipOptimizationPredicate[source]
Bases:
object
Interface for skip_optimization predicate in ModelStateTransformer.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.NeverSkipPredicate[source]
Bases:
SkipOptimizationPredicate
Hyperparameter optimization is never skipped.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.AlwaysSkipPredicate[source]
Bases:
SkipOptimizationPredicate
Hyperparameter optimization is always skipped.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipPeriodicallyPredicate(init_length, period, metric_name='target')[source]
Bases:
SkipOptimizationPredicate
Let N be the number of labeled points for metric metric_name. Optimizations are not skipped if N < init_length. Afterwards, we increase a counter whenever N is larger than in the previous call. With respect to this counter, optimizations are done every period times; in between, they are skipped.
- Parameters:
init_length (int) – See above
period (int) – See above
metric_name (str) – Name of internal metric. Defaults to INTERNAL_METRIC_NAME.
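A standalone sketch of the counting logic described above; it is not the Syne Tune class, it works directly on the number of labeled points, and the exact phase of the first refit after init_length is an assumption.
class SkipPeriodically:
    # Never skip while fewer than init_length labeled points exist; afterwards,
    # refit only every period-th time the number of labeled points grows.
    def __init__(self, init_length: int, period: int):
        self.init_length = init_length
        self.period = period
        self._counter = 0
        self._last_num = None

    def skip(self, num_labeled: int) -> bool:
        if num_labeled < self.init_length:
            return False
        if self._last_num is None or num_labeled > self._last_num:
            self._last_num = num_labeled
            self._counter += 1
        # Optimize every `period` increments of the counter, skip in between
        return (self._counter % self.period) != 0

pred = SkipPeriodically(init_length=3, period=2)
print([pred.skip(n) for n in [1, 2, 3, 4, 5, 6]])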
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipNoMaxResourcePredicate(init_length, max_resource, metric_name='target')[source]
Bases:
SkipOptimizationPredicate
This predicate works for multi-fidelity HPO, see for example GPMultiFidelitySearcher.
We track the number of labeled datapoints at resource level max_resource. HP optimization is skipped if the total number N of labeled cases is N >= init_length, and if the number of max_resource cases has not increased since the most recent optimization.
This means that as long as the dataset only grows w.r.t. cases at lower resources than max_resource, HP optimization is not triggered.
- Parameters:
init_length (int) – See above
max_resource (int) – See above
metric_name (str) – Name of internal metric. Defaults to INTERNAL_METRIC_NAME.
syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer.StateForModelConverter[source]
Bases:
object
Interface for state converters (optionally) used in
ModelStateTransformer
. These are applied to a state before being passed to the model for fitting and predictions. The main use case is to filter down data if fitting the model scales super-linearly.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer.ModelStateTransformer(estimator, init_state, skip_optimization=None, state_converter=None)[source]
Bases:
object
This class maintains the TuningJobState object alongside an HPO experiment, and manages the reaction to changes of this state. In particular, it provides a fitted surrogate model on demand, which encapsulates the GP posterior.
The state transformer is generic; it uses Estimator for anything specific to the model type. skip_optimization is a predicate depending on the state, determining what is done at the next call of model. If False, the model parameters are refit; otherwise, the current ones are not changed (which is usually faster, but risks staleness).
We also track the observed data state.trials_evaluations. If this did not change since the most recent model() call, we do not refit the model parameters. This is based on the assumption that model parameter fitting only depends on state.trials_evaluations (observed data), not on other fields (e.g., pending evaluations).
If given, state_converter maps the state to another one which is then passed to the model for fitting and predictions. One important use case is filtering down data when model fitting is superlinear. Another is to convert multi-fidelity setups to be used with single-fidelity models inside.
Note that estimator and skip_optimization can also be a dictionary mapping output names to models. In that case, the state is shared, but the models for each output metric are updated independently.
- Parameters:
estimator (Union[Estimator, Dict[str, Estimator]]) – Surrogate model(s)
init_state (TuningJobState) – Initial tuning job state
skip_optimization (Union[SkipOptimizationPredicate, Dict[str, SkipOptimizationPredicate], None]) – Skip optimization predicate (see above). Defaults to None (fitting is never skipped)
state_converter (Optional[StateForModelConverter]) – See above, optional
- property state: TuningJobState
- property use_single_model: bool
- property skip_optimization: SkipOptimizationPredicate | Dict[str, SkipOptimizationPredicate]
- fit(**kwargs)[source]
If
skip_optimization
is given, it overrides theself._skip_optimization
predicate.
- append_trial(trial_id, config=None, resource=None)[source]
Appends new pending evaluation to the state.
- Parameters:
trial_id (str) – ID of trial
config (Optional[Dict[str, Union[int, float, str]]]) – Must be given if this trial does not yet feature in the state
resource (Optional[int]) – Must be given in the multi-fidelity case, to specify at which resource level the evaluation is pending
- drop_pending_evaluation(trial_id, resource=None)[source]
Drop pending evaluation from state. If it is not listed as pending, nothing is done
- Parameters:
trial_id (
str
) – ID of trialresource (
Optional
[int
]) – Must be given in the multi-fidelity case, to specify at which resource level the evaluation is pending
- Return type:
bool
- remove_observed_case(trial_id, metric_name='target', key=None)[source]
Removes specific observation from the state.
- Parameters:
trial_id (
str
) – ID of trialmetric_name (
str
) – Name of internal metrickey (
Optional
[str
]) – Must be given in the multi-fidelity case
- label_trial(data, config=None)[source]
Adds observed data for a trial. If it has observations in the state already,
data.metrics are appended. Otherwise, a new entry is appended. If new observations replace pending evaluations, these are removed.
config must be passed if the trial has not yet been registered in the state (this happens normally with the append_trial call). If already registered, config is ignored.
- filter_pending_evaluations(filter_pred)[source]
Filters state.pending_evaluations with filter_pred.
- Parameters:
filter_pred (Callable[[PendingEvaluation], bool]) – Filtering predicate
syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model.SKLearnPredictorWrapper(sklearn_predictor, state, active_metric=None)[source]
Bases:
BasePredictor
Wrapper class for sklearn predictors returned by fit_from_state of SKLearnEstimatorWrapper.
- predict(inputs)[source]
Returns signals which are statistics of the predictive distribution at input points inputs. By default:
“mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)
“std”: Predictive stddevs, shape (n,)
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Parameters:
inputs (ndarray) – Input points, shape (n, d)
- Return type:
List[Dict[str, ndarray]]
- Returns:
List of dict with keys keys_predict(), of length the number of MCMC samples, or length 1 for empirical Bayes
- backward_gradient(input, head_gradients)[source]
Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This uses reverse mode differentiation; the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
List
[Dict
[str
,ndarray
]]) – See above
- Return type:
List
[ndarray
]- Returns:
Gradient \(\nabla f(x)\) (one-length list)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model.SKLearnEstimatorWrapper(sklearn_estimator, active_metric=None, *args, **kwargs)[source]
Bases:
Estimator
Wrapper class for sklearn estimators.
- fit_from_state(state, update_params)[source]
Creates a Predictor object based on data in state.
If the model also has hyperparameters, these are learned iff update_params == True. Otherwise, these parameters are not changed, but only the posterior state is computed. If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the update_params argument.
If self.state.pending_evaluations is not empty, we compute the posterior for the state without pending evaluations. This method can be overwritten for other behaviour, such as the one found in fit_from_state().
- Parameters:
state (TuningJobState) – Current data which the model parameters are to be fit on, and the posterior state is to be computed from
update_params (bool) – See above
- Return type:
- Returns:
Predictor, wrapping the posterior state
syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity module
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.cap_size_tuning_job_state(state, max_size, random_state=None)[source]
Returns a state which is identical to state, except that the trials_evaluations are replaced by a subset, so the total number of metric values is <= max_size. Filtering is done by preserving data from trials which have observations at the higher resource levels. For some trials, we may remove values at low resources, but keep values at higher ones, in order to meet the max_size constraint.
- Parameters:
state (TuningJobState) – Original state to filter down
max_size (int) – Maximum number of observed metric values in new state
random_state (Optional[RandomState]) – Used for random sampling. Defaults to numpy.random.
- Return type:
- Returns:
New state meeting the max_size constraint. This is a copy of state even if state already meets the constraint.
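One simple way to realize this preference for high-resource observations is sketched below on plain Python data structures (a dict mapping trial IDs to lists of (resource, value) pairs); it is a simplification of the actual function, which operates on TuningJobState.
def cap_observations(trial_data, max_size):
    # trial_data: dict trial_id -> list of (resource, metric_value) pairs.
    # Keep observations at the highest resource levels first, so trials that
    # progressed furthest retain their data, and low-resource values are
    # dropped first when the budget max_size is exceeded.
    flat = [
        (resource, trial_id, value)
        for trial_id, obs in trial_data.items()
        for resource, value in obs
    ]
    flat.sort(key=lambda t: t[0], reverse=True)  # highest resources first
    kept = flat[:max_size]
    result = {trial_id: [] for trial_id in trial_data}
    for resource, trial_id, value in sorted(kept):
        result[trial_id].append((resource, value))
    return {k: v for k, v in result.items() if v}

data = {"t0": [(1, 0.9), (3, 0.7), (9, 0.5)], "t1": [(1, 0.8)], "t2": [(1, 0.95), (3, 0.75)]}
print(cap_observations(data, max_size=4))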
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.SubsampleMultiFidelityStateConverter(max_size, random_state=None)[source]
Bases:
StateForModelConverter
Converts the state by (possibly) downsampling the observations so that their total number is <= max_size. This is done in a way that trials with observations in higher rung levels are retained (with all their data), so observations are preferentially removed at low levels, and from trials which do not have observations higher up.
This state converter makes sense if observed data is only used at geometrically spaced rung levels, so the number of observations per trial remains small. If a trial accumulates on the order of max_resource_level observations, it does not work, because it ends up retaining densely sampled observations from very few trials. Use SubsampleMFDenseDataStateConverter in such a case.
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.sparsify_tuning_job_state(state, max_size, grace_period, reduction_factor)[source]
Does the first step of state conversion in SubsampleMFDenseDataStateConverter, in that dense observations are sparsified w.r.t. a geometrically spaced rung level system.
- Parameters:
state (TuningJobState) – Original state to filter down
max_size (int) – Maximum number of observed metric values in new state
grace_period (int) – Minimum resource level \(r_{min}\)
reduction_factor (float) – Reduction factor \(\eta\)
- Return type:
- Returns:
New state which either meets the
max_size
constraint, or is maximally sparsified
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.SubsampleMFDenseDataStateConverter(max_size, grace_period=None, reduction_factor=None, random_state=None)[source]
Bases:
SubsampleMultiFidelityStateConverter
Variant of SubsampleMultiFidelityStateConverter, which has the same goal, but does subsampling in a different way. The current default for most GP-based multi-fidelity algorithms (e.g., MOBSTER, Hyper-Tune) is to use observations only at geometrically spaced rung levels (such as 1, 3, 9, …), and SubsampleMultiFidelityStateConverter makes sense there.
But for some (e.g., DyHPO), observations are recorded at all (or linearly spaced) resource levels, so there is much more data for trials which progressed further. Here, we do the state conversion in two steps, always stopping the process once the target size max_size is reached. We assume a geometric rung level spacing, given by grace_period and reduction_factor, only for the purpose of state conversion. In the first step, we sparsify the observations. If each rung level \(r_k\) defines a bucket \(B_k = \{r_{k-1} + 1, \dots, r_k\}\), each trial should have at most one observation in each bucket. Sparsification is done top down. If the result of this first step is still larger than max_size, we continue with subsampling as in SubsampleMultiFidelityStateConverter.
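For illustration, the geometric rung levels and the buckets \(B_k\) assumed by this converter can be derived from grace_period and reduction_factor as sketched below; the helper names are ours, not part of the Syne Tune API.
def geometric_rung_levels(grace_period: int, reduction_factor: float, max_resource: int):
    # Rung levels r_k = grace_period * reduction_factor**k, capped at max_resource
    levels, r = [], float(grace_period)
    while round(r) <= max_resource:
        levels.append(int(round(r)))
        r *= reduction_factor
    return levels

def bucket_of(resource: int, rung_levels):
    # Index k of the bucket B_k = {r_{k-1} + 1, ..., r_k} containing `resource`
    for k, r_k in enumerate(rung_levels):
        if resource <= r_k:
            return k
    return len(rung_levels)  # beyond the largest rung level

levels = geometric_rung_levels(grace_period=1, reduction_factor=3, max_resource=27)
print(levels)                                             # [1, 3, 9, 27]
print([bucket_of(r, levels) for r in (1, 2, 3, 10, 30)])  # [0, 1, 1, 3, 4]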
syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity module
- syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity.cap_size_tuning_job_state(state, max_size, mode, top_fraction, random_state=None)[source]
Returns a state which is identical to state, except that the trials_evaluations are replaced by a subset, so the total number of metric values is <= max_size.
- Parameters:
state (TuningJobState) – Original state to filter down
max_size (int) – Maximum number of observed metric values in new state
mode (str) – “min” or “max”
top_fraction (float) – See above
random_state (Optional[RandomState]) – Used for random sampling. Defaults to numpy.random.
- Return type:
- Returns:
New state meeting the max_size constraint. This is a copy of state even if state already meets the constraint.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity.SubsampleSingleFidelityStateConverter(max_size, mode, top_fraction, random_state=None)[source]
Bases:
StateForModelConverter
Converts the state by (possibly) downsampling the observations so that their total number is <= max_size. If len(state) > max_size, the subset is sampled as follows: max_size * top_fraction slots are filled with the best observations. The remainder is sampled without replacement from the remaining observations.
- Parameters:
max_size (int) – Maximum number of observed metric values in new state
mode (str) – “min” or “max”
top_fraction (float) – See above
random_state (Optional[RandomState]) – Used for random sampling. Can also be set with set_random_state()
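A standalone sketch of the subsampling rule described above, operating on plain (trial_id, value) pairs rather than on TuningJobState; names are illustrative.
import numpy as np

def subsample_observations(values, max_size, mode="min", top_fraction=0.25, seed=0):
    # values: list of (trial_id, metric_value) pairs
    if len(values) <= max_size:
        return list(values)
    rng = np.random.RandomState(seed)
    order = sorted(values, key=lambda x: x[1], reverse=(mode == "max"))
    num_top = int(max_size * top_fraction)
    top, rest = order[:num_top], order[num_top:]
    # Fill the remainder by sampling without replacement from the rest
    idx = rng.choice(len(rest), size=max_size - num_top, replace=False)
    return top + [rest[i] for i in idx]

data = [(f"trial_{i}", v) for i, v in enumerate(np.random.RandomState(1).rand(20))]
print(len(subsample_observations(data, max_size=8)))  # 8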
syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn package
- class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.SKLearnPredictor[source]
Bases:
object
Base class for predictors generated by scikit-learn based estimators of SKLearnEstimator.
This is only for predictors which return means and stddevs in predict().
- predict(X)[source]
Returns signals which are statistics of the predictive distribution at input points X.
- Parameters:
X – Input points, shape (n, d)
- Return type:
Tuple[ndarray, ndarray]
- Returns:
(means, stds), where predictive means means and predictive stddevs stds have shape (n,)
- backward_gradient(input, head_gradients)[source]
Needs to be implemented only if gradient-based local optimization of an acquisition function is supported.
Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This uses reverse mode differentiation; the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
Dict
[str
,ndarray
]) – See above
- Return type:
ndarray
- Returns:
Gradient \(\nabla f(x)\)
- class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.SKLearnEstimator[source]
Bases:
object
Base class for scikit-learn based estimators, giving rise to surrogate models for Bayesian optimization.
- fit(X, y, update_params)[source]
Implements fit_from_state(), given transformed data. Here, y is normalized (zero mean, unit variance) iff normalize_targets == True.
X (
ndarray
) – Feature matrix, shape(n_samples, n_features)
y (
ndarray
) – Target values, shape(n_samples,)
update_params (
bool
) – Should model (hyper)parameters be updated? Ignored if estimator has no hyperparameters
- Return type:
- Returns:
Predictor, wrapping the posterior state
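As a rough sketch of how these interfaces can be implemented, the following wraps scikit-learn's BayesianRidge, which provides predictive means and stddevs. The wrapper class names are ours, and the glue needed to plug the estimator into a searcher (e.g., via SKLearnEstimatorWrapper) is not shown.
import numpy as np
from sklearn.linear_model import BayesianRidge

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)

class BayesianRidgePredictor(SKLearnPredictor):
    def __init__(self, ridge: BayesianRidge):
        self.ridge = ridge

    def predict(self, X: np.ndarray):
        # BayesianRidge returns predictive means and stddevs, both of shape (n,)
        return self.ridge.predict(X, return_std=True)

class BayesianRidgeEstimator(SKLearnEstimator):
    def __init__(self, ridge: BayesianRidge = None):
        self.ridge = ridge if ridge is not None else BayesianRidge()

    def fit(self, X: np.ndarray, y: np.ndarray, update_params: bool) -> SKLearnPredictor:
        # BayesianRidge has no expensive hyperparameters, so update_params is ignored
        self.ridge.fit(X, y.ravel())
        return BayesianRidgePredictor(self.ridge)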
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.estimator module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.estimator.SKLearnEstimator[source]
Bases:
object
Base class for scikit-learn based estimators, giving rise to surrogate models for Bayesian optimization.
- fit(X, y, update_params)[source]
Implements
fit_from_state()
, given transformed data. Here,y
is normalized (zero mean, unit variance) iffnormalize_targets == True
.- Parameters:
X (
ndarray
) – Feature matrix, shape(n_samples, n_features)
y (
ndarray
) – Target values, shape(n_samples,)
update_params (
bool
) – Should model (hyper)parameters be updated? Ignored if estimator has no hyperparameters
- Return type:
- Returns:
Predictor, wrapping the posterior state
syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.predictor module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.predictor.SKLearnPredictor[source]
Bases:
object
Base class for predictors generated by scikit-learn based estimators of SKLearnEstimator.
This is only for predictors which return means and stddevs in predict().
- predict(X)[source]
Returns signals which are statistics of the predictive distribution at input points
inputs
.- Parameters:
inputs – Input points, shape
(n, d)
- Return type:
Tuple
[ndarray
,ndarray
]- Returns:
(means, stds)
, where predictive meansmeans
and predictive stddevsstds
have shape(n,)
- backward_gradient(input, head_gradients)[source]
Needs to be implemented only if gradient-based local optimization of an acquisition function is supported.
Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
Dict
[str
,ndarray
]) – See above
- Return type:
ndarray
- Returns:
Gradient \(\nabla f(x)\)
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes module
- syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.assign_active_metric(predictor, active_metric)[source]
Checks that active_metric is provided when predictor consists of multiple output predictors. Otherwise, just sets active_metric to the only predictor output name available.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.NextCandidatesAlgorithm[source]
Bases:
object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.Predictor(state, active_metric=None)[source]
Bases:
object
Base class for probabilistic predictors used in Bayesian optimization. They support marginal predictions feeding into an acquisition function, as well as computing gradients of an acquisition function w.r.t. inputs.
In general, a predictor is created by an estimator. It wraps a posterior state, which allows for probabilistic predictions on arbitrary inputs.
- Parameters:
state (
TuningJobState
) – Tuning job stateactive_metric (
Optional
[str
]) – Name of internal objective
- keys_predict()[source]
Keys of signals returned by predict().
Note: In order to work with AcquisitionFunction implementations, the following signals are required:
“mean”: Predictive mean
“std”: Predictive standard deviation
- Return type:
Set
[str
]- Returns:
Set of keys for
dict
returned bypredict()
- predict(inputs)[source]
Returns signals which are statistics of the predictive distribution at input points inputs. By default:
“mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)
“std”: Predictive stddevs, shape (n,)
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Parameters:
inputs (
ndarray
) – Input points, shape(n, d)
- Return type:
List
[Dict
[str
,ndarray
]]- Returns:
List of
dict
with keyskeys_predict()
, of length the number of MCMC samples, or length 1 for empirical Bayes
- hp_ranges_for_prediction()[source]
- Return type:
- Returns:
Feature generator to be used for
inputs
inpredict()
- predict_candidates(candidates)[source]
Convenience variant of
predict()
- Parameters:
candidates (
Iterable
[Dict
[str
,Union
[int
,float
,str
]]]) – List of configurations- Return type:
List
[Dict
[str
,ndarray
]]- Returns:
Same as
predict()
- current_best()[source]
Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.
If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.
- Return type:
List
[ndarray
]- Returns:
Incumbent, see above
- backward_gradient(input, head_gradients)[source]
Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This uses reverse mode differentiation; the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by
predict()
for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.Lists have
> 1
entry if MCMC is used, otherwise they are all size 1.- Parameters:
input (
ndarray
) – Single input point \(x\), shape(d,)
head_gradients (
List
[Dict
[str
,ndarray
]]) – See above
- Return type:
List
[ndarray
]- Returns:
Gradient \(\nabla_x f(x)\) (several if MCMC is used)
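The chain rule behind backward_gradient can be illustrated with a toy one-dimensional "predictor" whose mean and std are analytic functions of the input; this is purely hypothetical code, not a Syne Tune predictor.
import numpy as np

# Toy "predictor" on 1-d inputs with analytic statistics:
#   mean(x) = x^2,  std(x) = exp(-x)
def stats(x):
    return {"mean": x ** 2, "std": np.exp(-x)}

def stats_input_gradients(x):
    return {"mean": 2 * x, "std": -np.exp(-x)}

def backward_gradient(x, head_gradients):
    # Chain rule: grad f(x) = sum_k (d f / d stat_k) * (d stat_k / d x)
    grads = stats_input_gradients(x)
    return sum(head_gradients[k] * grads[k] for k in head_gradients)

# Head gradients for the LCB head h(mean, std) = mean - kappa * std
kappa = 1.0
x = np.array([0.5])
head_grads = {"mean": np.ones_like(x), "std": -kappa * np.ones_like(x)}
print(backward_gradient(x, head_grads))  # 2*0.5 + 1.0*exp(-0.5) ≈ 1.6065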
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.ScoringFunction(predictor=None, active_metric=None)[source]
Bases:
object
Class to score candidates. As opposed to acquisition functions, scores do not support gradient computation. Note that scores are always minimized.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.AcquisitionFunction(predictor=None, active_metric=None)[source]
Bases:
ScoringFunction
Base class for acquisition functions \(f(x)\).
- Parameters:
- compute_acq(inputs, predictor=None)[source]
Note: If inputs has shape (d,), it is taken to be (1, d)
- compute_acq_with_gradient(input, predictor=None)[source]
For a single input point \(x\), compute acquisition function value \(f(x)\) and gradient \(\nabla_x f(x)\).
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.LocalOptimizer(hp_ranges, predictor, acquisition_class, active_metric=None)[source]
Bases:
object
Class that tries to find a local candidate with a better score, typically using a local optimization method such as L-BFGS. It would normally encapsulate an acquisition function and predictor.
acquisition_class contains the type of the acquisition function (subclass of AcquisitionFunction). It can also be a tuple of the form (type, kwargs), where kwargs are extra arguments to the class constructor.
- Parameters:
hp_ranges (
HyperparameterRanges
) – Feature generator for configurationspredictor (
Union
[Predictor
,Dict
[str
,Predictor
]]) – Predictor(s) for acquisition functionacquisition_class (
Callable
[[Any
],AcquisitionFunction
]) – See aboveactive_metric (
Optional
[str
]) – Name of internal metric
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.CandidateGenerator[source]
Bases:
object
Class to generate candidates from which to start the local minimization, typically random candidates or some form of more uniformly spaced variation, such as a Latin hypercube or a Sobol sequence.
- generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
- Parameters:
num_cands (int) – Number of candidates to generate
exclusion_list (Optional[ExclusionList]) – If given, these candidates must not be returned
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
List of num_cands candidates. If exclusion_list is given, the number of candidates returned can be < num_cands
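A minimal sketch of bulk candidate generation over a finite list of configuration dicts, with exclusion handled by simple membership filtering; it ignores the ExclusionList and HyperparameterRanges machinery and is not the Syne Tune implementation.
import numpy as np

def generate_candidates_en_bulk(base_set, num_cands, exclusion_list=None, seed=0):
    # base_set: list of configuration dicts; exclusion_list: configs to avoid
    rng = np.random.RandomState(seed)
    excluded = exclusion_list or []
    allowed = [c for c in base_set if c not in excluded]
    # May return fewer than num_cands if the exclusion list is large
    num = min(num_cands, len(allowed))
    idx = rng.choice(len(allowed), size=num, replace=False)
    return [allowed[i] for i in idx]

configs = [{"lr": lr, "bs": bs} for lr in (0.01, 0.1, 1.0) for bs in (32, 64)]
print(generate_candidates_en_bulk(configs, num_cands=3, exclusion_list=[{"lr": 0.1, "bs": 64}]))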
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm.BayesianOptimizationAlgorithm(initial_candidates_generator, initial_candidates_scorer, num_initial_candidates, local_optimizer, pending_candidate_state_transformer, exclusion_candidates, num_requested_candidates, greedy_batch_selection, duplicate_detector, num_initial_candidates_for_batch=None, sample_unique_candidates=False, debug_log=None)[source]
Bases:
NextCandidatesAlgorithm
Core logic of the Bayesian optimization algorithm
- Parameters:
initial_candidates_generator (
CandidateGenerator
) – generator of candidatesinitial_scoring_function – scoring function used to rank the initial candidates. Note: If a batch is selected in one go (
num_requested_candidates > 1
,greedy_batch_selection == False
), this function should encourage diversity among its top scorers. In general, greedy batch selection is recommended.num_initial_candidates (
int
) – how many initial candidates to generate, if possiblelocal_optimizer (
LocalOptimizer
) – local optimizer which starts from score minimizer. If a batch is selected in one go (not greedily), then local optimizations are started from the topnum_requested_candidates
ranked candidates (after scoring)pending_candidate_state_transformer (
Optional
[ModelStateTransformer
]) – Once a candidate is selected, it becomes pending, and the state is transformed by appending information. This is done by the transformer. This is object is needed only ifnext_candidates()
goes through more than one outer iterations (i.e., ifgreedy_batch_selection == True
andnum_requested_candidates > 1
. Otherwise, None can be passed here. Note: Model updates (by the state transformer) for batch candidates beyond the first do not involve fitting hyperparameters, so they are usually cheap.exclusion_candidates (
ExclusionList
) – Set of candidates that should not be returned, because they are already labeled, currently pending, or have failednum_requested_candidates (
int
) – number of candidates to returngreedy_batch_selection (
bool
) – If True andnum_requested_candidates > 1
, we generate, order, and locally optimize for each single candidate to be selected. Otherwise, this is done just once, andnum_requested_candidates
are extracted in one go. Note: If this is True,pending_candidate_state_transformer
is needed.duplicate_detector (
DuplicateDetector
) – used to make sure no candidates equal to already evaluated ones is returnednum_initial_candidates_for_batch (
Optional
[int
]) – This is used only ifnum_requested_candidates > 1
andgreedy_batch_selection == True
. In this case,num_initial_candidates_for_batch
overridesnum_initial_candidates
when selecting all but the first candidate for the batch. Typically,num_initial_candidates
is larger thannum_initial_candidates_for_batch
in this case, which speeds up selecting large batches, but still select the first candidate thoroughlysample_unique_candidates (
bool
) – IfTrue
, we check that initial candidates sampled at random are unique and disjoint from the exclusion list. This can be expensive. Defaults toFalse
debug_log (
Optional
[DebugLogPrinter
]) – If aDebugLogPrinter
object is passed here, it is used to write log messages
-
initial_candidates_generator:
CandidateGenerator
-
initial_candidates_scorer:
ScoringFunction
-
num_initial_candidates:
int
-
local_optimizer:
LocalOptimizer
-
pending_candidate_state_transformer:
Optional
[ModelStateTransformer
]
-
exclusion_candidates:
ExclusionList
-
num_requested_candidates:
int
-
greedy_batch_selection:
bool
-
duplicate_detector:
DuplicateDetector
-
num_initial_candidates_for_batch:
Optional
[int
] = None
-
sample_unique_candidates:
bool
= False
-
debug_log:
Optional
[DebugLogPrinter
] = None
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.IndependentThompsonSampling(predictor=None, active_metric=None, random_state=None)[source]
Bases:
ScoringFunction
Note: This is not Thompson sampling, but rather a variant called “independent Thompson sampling”, where means and variances are drawn from the marginal rather than the joint distribution. This is cheap, but incorrect. In fact, the larger the number of candidates, the more likely it is that the winning configuration arises from pure chance.
- Parameters:
- score(candidates, predictor=None)[source]
- Parameters:
candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed
predictor (Optional[Predictor]) – Overrides default predictor
- Return type:
List
[float
]- Returns:
List of score values, length of
candidates
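The "independent Thompson sampling" score described above amounts to one independent Gaussian draw per candidate from its marginal predictive distribution; a minimal sketch (names ours, smaller scores are better):
import numpy as np

def independent_thompson_scores(means, stds, random_state=None):
    # One independent draw per candidate from its marginal N(mean, std^2).
    # The candidate with the smallest sampled value wins (scores are minimized).
    rng = random_state or np.random.RandomState()
    return means + stds * rng.standard_normal(size=means.shape)

means = np.array([0.30, 0.25, 0.40])
stds = np.array([0.05, 0.20, 0.10])
print(independent_thompson_scores(means, stds, np.random.RandomState(0)))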
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.LBFGSOptimizeAcquisition(hp_ranges, predictor, acquisition_class, active_metric=None)[source]
Bases:
LocalOptimizer
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.NoOptimization(*args, **kwargs)[source]
Bases:
LocalOptimizer
- optimize(candidate, predictor=None)[source]
Run local optimization, starting from
candidate
- Parameters:
candidate (
Dict
[str
,Union
[int
,float
,str
]]) – Starting pointpredictor (
Optional
[Predictor
]) – Overridesself.predictor
- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
Configuration found by local optimization
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.RandomStatefulCandidateGenerator(hp_ranges, random_state)[source]
Bases:
CandidateGenerator
This generator maintains a random state, so if
generate_candidates()
is called several times, different sequences are returned.- Parameters:
hp_ranges (
HyperparameterRanges
) – Feature generator for configurationsrandom_state (
RandomState
) – PRN generator
- generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
- Parameters:
num_cands (
int
) – Number of candidates to generateexclusion_list – If given, these candidates must not be returned
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
List of
num_cands
candidates. Ifexclusion_list
is given, the number of candidates returned can be< num_cands
- syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.generate_unique_candidates(candidates_generator, num_candidates, exclusion_candidates)[source]
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.RandomFromSetCandidateGenerator(base_set, random_state, ext_config=None)[source]
Bases:
CandidateGenerator
In this generator, candidates are sampled from a given set.
- Parameters:
base_set (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – Set of all configurations to sample fromrandom_state (
RandomState
) – PRN generatorext_config (
Optional
[Dict
[str
,Union
[int
,float
,str
]]]) – If given, each configuration is updated with this dictionary before being returned
- generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
- Parameters:
num_cands (
int
) – Number of candidates to generateexclusion_list – If given, these candidates must not be returned
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
List of
num_cands
candidates. Ifexclusion_list
is given, the number of candidates returned can be< num_cands
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetector[source]
Bases:
object
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetectorNoDetection[source]
Bases:
DuplicateDetector
- class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetectorIdentical[source]
Bases:
DuplicateDetector
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.defaults module
syne_tune.optimizer.schedulers.searchers.bayesopt.utils package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy module
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.ThreeHumpCamel[source]
Bases:
object
- property search_space
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.branin_function(x1, x2, r=6)[source]
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.Branin[source]
Bases:
object
- property search_space
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.BraninWithR(r)[source]
Bases:
Branin
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.Ackley[source]
Bases:
object
- property search_space
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.SimpleQuadratic[source]
Bases:
object
- property search_space
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.evaluate_blackbox(bb_func, inputs)[source]
- Return type:
ndarray
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.sample_data(bb_cls, num_train, num_grid, expand_datadct=True)[source]
- Return type:
Dict
[str
,Any
]
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.expand_data(data)[source]
Appends derived entries to data dict, which have non-elementary types.
- Return type:
Dict
[str
,Any
]
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.data_to_state(data)[source]
- Return type:
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.decode_inputs(inputs, ss_limits)[source]
- Return type:
(
List
[Dict
[str
,Union
[int
,float
,str
]]],Dict
)
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.debug_log module
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects module
Object definitions that are used for testing.
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.build_kernel(state, do_warping=False)[source]
- Return type:
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.default_gpmodel(state, random_seed, optimization_config)[source]
- Return type:
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.default_gpmodel_mcmc(state, random_seed, mcmc_config)[source]
- Return type:
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.RepeatedCandidateGenerator(n_unique_candidates)[source]
Bases:
CandidateGenerator
Generates candidates from a fixed set. Used to test the deduplication logic.
- class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.Quadratic3d(local_minima, active_metric, metric_names)[source]
Bases:
object
- property search_space
- property f_min
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.tuples_to_configs(config_tpls, hp_ranges)[source]
Many unit tests write configs as tuples.
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.create_exclusion_set(candidates_tpl, hp_ranges, is_dict=False)[source]
Creates exclusion list from set of tuples.
- Return type:
- syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.create_tuning_job_state(hp_ranges, cand_tuples, metrics, pending_tuples=None, failed_tuples=None)[source]
Builds
TuningJobState
from basics, where configs are given as tuples or as dicts.NOTE: We assume that all configs in the different lists are different!
- Return type:
syne_tune.optimizer.schedulers.searchers.bore package
- class syne_tune.optimizer.schedulers.searchers.bore.Bore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
Implements “Bayesian optimization by Density Ratio Estimation” as described in the following paper:
BORE: Bayesian Optimization by Density-Ratio Estimation, Tiao, Louis C. and Klein, Aaron and Seeger, Matthias W. and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio, Proceedings of the 38th International Conference on Machine Learning
Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:
- Parameters:
mode (Optional[str]) – Can be “min” (default) or “max”.
gamma (Optional[float]) – Defines the percentile, i.e., how many percent of configurations are used to model \(l(x)\). Defaults to 0.25
calibrate (Optional[bool]) – If set to true, we calibrate the predictions of the classifier via CV. Defaults to False
classifier (Optional[str]) – The binary classifier to model the acquisition function. Choices: {"mlp", "gp", "xgboost", "rf", "logreg"}. Defaults to “xgboost”
acq_optimizer (Optional[str]) – The optimization method to maximize the acquisition function. Choices: {"de", "rs", "rs_with_replacement"}. Defaults to “rs”
feval_acq (Optional[int]) – Maximum allowed function evaluations of the acquisition function. Defaults to 500
random_prob (Optional[float]) – Probability of returning a random configuration (epsilon greedy). Defaults to 0
init_random (Optional[int]) – get_config() returns randomly drawn configurations until at least init_random observations have been recorded in update(). After that, the BORE algorithm is used. Defaults to 6
classifier_kwargs (Optional[dict]) – Parameters for classifier. Optional
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
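The density-ratio idea behind the Bore searcher described above can be sketched in a few lines: label the best gamma fraction of observed configurations as positives, fit a binary classifier, and use its (negated) predicted probability of the positive class as the acquisition score. This standalone sketch uses scikit-learn's RandomForestClassifier purely for illustration; it is not the Syne Tune implementation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bore_acquisition(X_observed, y_observed, X_candidates, gamma=0.25, mode="min"):
    # Split observations into "good" (top gamma fraction) vs "rest" and fit
    # a classifier; its probability of the "good" class acts as the acquisition.
    y = np.asarray(y_observed, dtype=float)
    tau = np.quantile(y, gamma if mode == "min" else 1.0 - gamma)
    labels = (y <= tau) if mode == "min" else (y >= tau)
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X_observed, labels.astype(int))
    # Negated, so that the best candidate minimizes the score
    return -clf.predict_proba(X_candidates)[:, 1]

rng = np.random.RandomState(0)
X_obs = rng.rand(30, 2)
y_obs = ((X_obs - 0.5) ** 2).sum(axis=1)   # toy objective, minimized at (0.5, 0.5)
X_cand = rng.rand(5, 2)
print(bore_acquisition(X_obs, y_obs, X_cand))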
- class syne_tune.optimizer.schedulers.searchers.bore.MultiFidelityBore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, resource_attr='epoch', **kwargs)[source]
Bases:
Bore
Adapts BORE (Tiao et al.) for the multi-fidelity Hyperband setting following BOHB (Falkner et al.). Once we collected enough data points on the smallest resource level, we fit a probabilistic classifier and sample from it until we have a sufficient amount of data points for the next higher resource level. We then refit the classifier on the data of this resource level. These steps are iterated until we reach the highest resource level. References:
BORE: Bayesian Optimization by Density-Ratio Estimation, Tiao, Louis C. and Klein, Aaron and Seeger, Matthias W. and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio, Proceedings of the 38th International Conference on Machine Learning, and
BOHB: Robust and Efficient Hyperparameter Optimization at Scale, S. Falkner and A. Klein and F. Hutter, Proceedings of the 35th International Conference on Machine Learning
Additional arguments on top of parent class Bore:
- Parameters:
resource_attr (
str
) – Name of resource attribute. Defaults to “epoch”
Submodules
syne_tune.optimizer.schedulers.searchers.bore.bore module
- class syne_tune.optimizer.schedulers.searchers.bore.bore.Bore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
Implements “Bayesian optimization by Density Ratio Estimation” as described in the following paper:
BORE: Bayesian Optimization by Density-Ratio Estimation,Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, FabioProceedings of the 38th International Conference on Machine LearningAdditional arguments on top of parent class
StochasticAndFilterDuplicatesSearcher
:- Parameters:
mode (
Optional
[str
]) – Can be “min” (default) or “max”.gamma (
Optional
[float
]) – Defines the percentile, i.e how many percent of configurations are used to model \(l(x)\). Defaults to 0.25calibrate (
Optional
[bool
]) – If set to true, we calibrate the predictions of the classifier via CV. Defaults to Falseclassifier (
Optional
[str
]) – The binary classifier to model the acquisition function. Choices:{"mlp", "gp", "xgboost", "rf", "logreg"}
. Defaults to “xgboost”acq_optimizer (
Optional
[str
]) – The optimization method to maximize the acquisition function. Choices:{"de", "rs", "rs_with_replacement"}
. Defaults to “rs”feval_acq (
Optional
[int
]) – Maximum allowed function evaluations of the acquisition function. Defaults to 500random_prob (
Optional
[float
]) – probability for returning a random configurations (epsilon greedy). Defaults to 0init_random (
Optional
[int
]) –get_config()
returns randomly drawn configurations until at leastinit_random
observations have been recorded inupdate()
. After that, the BORE algorithm is used. Defaults to 6classifier_kwargs (
Optional
[dict
]) – Parameters for classifier. Optional
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.bore.de module
syne_tune.optimizer.schedulers.searchers.bore.gp_classififer module
syne_tune.optimizer.schedulers.searchers.bore.mlp_classififer module
syne_tune.optimizer.schedulers.searchers.bore.multi_fidelity_bore module
- class syne_tune.optimizer.schedulers.searchers.bore.multi_fidelity_bore.MultiFidelityBore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, resource_attr='epoch', **kwargs)[source]
Bases:
Bore
Adapts BORE (Tiao et al.) for the multi-fidelity Hyperband setting following BOHB (Falkner et al.). Once we collected enough data points on the smallest resource level, we fit a probabilistic classifier and sample from it until we have a sufficient amount of data points for the next higher resource level. We then refit the classifier on the data of this resource level. These steps are iterated until we reach the highest resource level. References:
BORE: Bayesian Optimization by Density-Ratio Estimation,Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, FabioProceedings of the 38th International Conference on Machine Learningand
BOHB: Robust and Efficient Hyperparameter Optimization at ScaleS. Falkner and A. Klein and F. HutterProceedings of the 35th International Conference on Machine LearningAdditional arguments on top of parent class
Bore
:- Parameters:
resource_attr (
str
) – Name of resource attribute. Defaults to “epoch”
syne_tune.optimizer.schedulers.searchers.botorch package
- class syne_tune.optimizer.schedulers.searchers.botorch.BoTorchSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=False, restrict_configurations=None, mode='min', num_init_random=3, no_fantasizing=False, max_num_observations=200, input_warping=True, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
A searcher that suggests configurations using BoTorch to build a GP surrogate and optimize an acquisition function. qExpectedImprovement is used for the acquisition function, given that it supports pending evaluations.
Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:
- Parameters:
mode (str) – “min” (default) or “max”
num_init_random (int) – get_config() returns randomly drawn configurations until at least init_random observations have been recorded in update(). After that, the BoTorch algorithm is used. Defaults to 3
no_fantasizing (bool) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False
max_num_observations (Optional[int]) – Maximum number of observations to use when fitting the GP. If the number of observations gets larger than this number, then data is subsampled. If None, then all data is used to fit the GP. Defaults to 200
input_warping (bool) – Whether to apply input warping when fitting the GP. Defaults to True
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[dict
]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that model registers(config, milestone)
as pending.
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (str) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- class syne_tune.optimizer.schedulers.searchers.botorch.BotorchSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BoTorchSearcher
Downwards compatibility. Please use
BoTorchSearcher
instead
Submodules
syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher module
- class syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher.BoTorchSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=False, restrict_configurations=None, mode='min', num_init_random=3, no_fantasizing=False, max_num_observations=200, input_warping=True, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
A searcher that suggest configurations using BOTORCH to build GP surrogate and optimize acquisition function.
qExpectedImprovement
is used for the acquisition function, given that it supports pending evaluations.Additional arguments on top of parent class
StochasticAndFilterDuplicatesSearcher
:- Parameters:
mode (
str
) – “min” (default) or “max”num_init_random (
int
) –get_config()
returns randomly drawn configurations until at leastinit_random
observations have been recorded inupdate()
. After that, the BOTorch algorithm is used. Defaults to 3no_fantasizing (
bool
) – IfTrue
, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults toFalse
max_num_observations (
Optional
[int
]) – Maximum number of observation to use when fitting the GP. If the number of observations gets larger than this number, then data is subsampled. IfNone
, then all data is used to fit the GP. Defaults to 200input_warping (
bool
) – Whether to apply input warping when fitting the GP. Defaults toTrue
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[dict
]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attain, so that the model registers (config, milestone)
as pending.
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (
str
) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- class syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher.BotorchSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BoTorchSearcher
Backwards compatibility. Please use
BoTorchSearcher
instead
syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher module
- syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.parse_value(val)[source]
- syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.configs_from_df(df)[source]
- Return type:
List
[dict
]
- class syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.BoTorchTransfer(config_space, metric, transfer_learning_evaluations, new_task_id, random_seed=None, encode_tasks_ordinal=False, **kwargs)[source]
Bases:
BoTorch
- class syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.BoTorchTransferSearcher(config_space, metric, transfer_learning_evaluations, new_task_id, points_to_evaluate=None, allow_duplicates=False, num_init_random=0, encode_tasks_ordinal=False, **kwargs)[source]
Bases:
BoTorchSearcher
syne_tune.optimizer.schedulers.searchers.constrained package
- class syne_tune.optimizer.schedulers.searchers.constrained.ConstrainedGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPFIFOSearcher
Gaussian process-based constrained hyperparameter optimization (to be used with
FIFOScheduler
).Additional arguments on top of parent class
MultiModelGPFIFOSearcher
:- Parameters:
constraint_attr – Name of constraint metric in
report
passed to_update()
.
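A minimal sketch of how constrained Bayesian optimization is typically set up (metric names are placeholders; the training script is assumed to report both the objective and the constraint metric, with the usual convention that the constraint is satisfied when its reported value is non-positive):
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "dropout": uniform(0.0, 0.9),
}
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt_constrained",  # selects this constrained GP searcher
    metric="accuracy",
    mode="max",
    # "my_constraint_metric" is a placeholder name reported by the training script
    search_options={"constraint_attr": "my_constraint_metric"},
)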
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
Submodules
syne_tune.optimizer.schedulers.searchers.constrained.constrained_gp_fifo_searcher module
- class syne_tune.optimizer.schedulers.searchers.constrained.constrained_gp_fifo_searcher.ConstrainedGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPFIFOSearcher
Gaussian process-based constrained hyperparameter optimization (to be used with
FIFOScheduler
).Additional arguments on top of parent class
MultiModelGPFIFOSearcher
:- Parameters:
constraint_attr – Name of constraint metric in
report
passed to_update()
.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.cost_aware package
- class syne_tune.optimizer.schedulers.searchers.cost_aware.CostAwareGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPFIFOSearcher
Gaussian process-based cost-aware hyperparameter optimization (to be used with
FIFOScheduler
). The searcher requires a cost metric, which is given bycost_attr
.Implements two different variants. If
resource_attr
is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given bykwargs["cost_model"]
.If
resource_attr
is not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.Note: The presence or absence of
resource_attr
decides on which variant is used here. Ifresource_attr
is given,cost_model
must be given as well.Additional arguments on top of parent class
GPFIFOSearcher
:- Parameters:
cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether
resource_attr
is given, cost values are read from each report or only at the end.resource_attr (str, optional) – Name of resource attribute in reports, optional. If this is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
cost_model
. If not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.cost_model (
CostModel
, optional) – Needed ifresource_attr
is given, model for \(c(x, r)\). Ignored ifresource_attr
is not given, since \(c(x)\) is represented by a default GP surrogate model.
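A hedged sketch of the variant without resource_attr, where the cost (for example, elapsed training time) is reported once per trial and modeled as \(c(x)\). The searcher string "bayesopt_cost_aware" and the metric names are assumptions for illustration, not taken from the signature above:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "batch_size": randint(16, 256),
}
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt_cost_aware",  # assumption: selects the cost-aware GP searcher
    metric="validation_error",
    mode="min",
    # "elapsed_time" is a placeholder cost metric reported at the end of each trial
    search_options={"cost_attr": "elapsed_time"},
)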
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
- class syne_tune.optimizer.schedulers.searchers.cost_aware.CostAwareGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPMultiFidelitySearcher
Gaussian process-based cost-aware multi-fidelity hyperparameter optimization (to be used with
HyperbandScheduler
). The searcher requires a cost metric, which is given bycost_attr
.The acquisition function used here is the same as in
GPMultiFidelitySearcher
, but expected improvement (EI) is replaced by EIpu (seeEIpuAcquisitionFunction
).Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
kwargs["cost_model"]
.Additional arguments on top of parent class
GPMultiFidelitySearcher
:- Parameters:
cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether
resource_attr
is given, cost values are read from each report or only at the end.resource_attr (str) – Name of resource attribute in reports. Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
cost_model
.cost_model (
CostModel
, optional) – Model for \(c(x, r)\)
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
Submodules
syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher module
- class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher.MultiModelGPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]
Bases:
GPFIFOSearcher
Superclass for multi-model extensions of
GPFIFOSearcher
. We first call_create_internal()
passing factory andskip_optimization
predicate for theINTERNAL_METRIC_NAME
model, then replace the state transformer by a multi-model one.
- class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher.CostAwareGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPFIFOSearcher
Gaussian process-based cost-aware hyperparameter optimization (to be used with
FIFOScheduler
). The searcher requires a cost metric, which is given bycost_attr
.Implements two different variants. If
resource_attr
is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given bykwargs["cost_model"]
.If
resource_attr
is not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.Note: The presence or absence of
resource_attr
decides on which variant is used here. Ifresource_attr
is given,cost_model
must be given as well.Additional arguments on top of parent class
GPFIFOSearcher
:- Parameters:
cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether
resource_attr
is given, cost values are read from each report or only at the end.resource_attr (str, optional) – Name of resource attribute in reports, optional. If this is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
cost_model
. If not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.cost_model (
CostModel
, optional) – Needed ifresource_attr
is given, model for \(c(x, r)\). Ignored ifresource_attr
is not given, since \(c(x)\) is represented by a default GP surrogate model.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher module
- class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher.MultiModelGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
GPMultiFidelitySearcher
Superclass for multi-model extensions of
GPMultiFidelitySearcher
. We first call_create_internal()
passing factory andskip_optimization
predicate for theINTERNAL_METRIC_NAME
model, then replace the state transformer by a multi-model one.
- class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher.CostAwareGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
MultiModelGPMultiFidelitySearcher
Gaussian process-based cost-aware multi-fidelity hyperparameter optimization (to be used with
HyperbandScheduler
). The searcher requires a cost metric, which is given bycost_attr
.The acquisition function used here is the same as in
GPMultiFidelitySearcher
, but expected improvement (EI) is replaced by EIpu (seeEIpuAcquisitionFunction
).Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
kwargs["cost_model"]
.Additional arguments on top of parent class
GPMultiFidelitySearcher
:- Parameters:
cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether
resource_attr
is given, cost values are read from each report or only at the end.resource_attr (str) – Name of resource attribute in reports. Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by
cost_model
.cost_model (
CostModel
, optional) – Model for \(c(x, r)\)
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.dyhpo package
- class syne_tune.optimizer.schedulers.searchers.dyhpo.DynamicHPOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BaseSearcher
Supports model-based decisions in the DyHPO algorithm proposed by Wistuba et al. (see
DyHPORungSystem
).It is not recommended to create
DynamicHPOSearcher
searcher objects directly, but rather to createHyperbandScheduler
objects withsearcher="dyhpo"
andtype="dyhpo"
, and passing arguments here insearch_options
. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory
to create components in a consistent way.This searcher is special, in that it contains a searcher of type
GPMultiFidelitySearcher
. Also, its model-based scoring is not triggered byget_config()
, but rather when the scheduler tries to find a trial which can be promoted. At this point,score_paused_trials_and_new_configs()
is called, which scores all paused trials along with new configurations. Depending on which scores best, a paused trial is resumed, or a trial with a new configuration is started. Since all the work is already done in score_paused_trials_and_new_configs()
, the implementation ofget_config()
becomes trivial. See alsoDyHPORungSystem
. Extra points:The number of new configurations scored in
score_paused_trials_and_new_configs()
is the maximum ofnum_init_candidates
and the number of paused trials scored as wellThe parameters of the surrogate model are not refit in every call of
score_paused_trials_and_new_configs()
, but only when a new configuration was chosen as the top scorer in the most recent call. The aim is to refit at a similar frequency to MOBSTER, where decisions on whether to resume a trial are not done in a model-based way.
This searcher must be used with
HyperbandScheduler
andtype="dyhpo"
. It has the same constructor parameters asGPMultiFidelitySearcher
. Of these, the following are not used, but need to be given valid values: resource_acq, initial_scoring, skip_local_optimization.
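For illustration (metric, resource, and hyperparameter names are placeholders), a typical DyHPO setup looks as follows; note that both searcher and type are set to "dyhpo", as recommended above:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

# The training script is assumed to report "validation_loss" once per epoch,
# together with the resource attribute "epoch"
config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_units": randint(32, 512),
    "epochs": 81,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="dyhpo",
    type="dyhpo",
    metric="validation_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)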
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[dict
]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- on_trial_result(trial_id, config, result, update)[source]
Inform searcher about result
The scheduler passes every result. If
update == True
, the searcher should update its surrogate model (if any), otherwiseresult
is an intermediate result not modelled.The default implementation calls
_update()
ifupdate == True
. It can be overwritten by searchers which also react to intermediate results.- Parameters:
trial_id (
str
) – Seeon_trial_result()
config (
Dict
[str
,Any
]) – Seeon_trial_result()
result (
Dict
[str
,Any
]) – Seeon_trial_result()
update (
bool
) – Should surrogate model be updated?
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[dict
]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attain, so that the model registers (config, milestone)
as pending.
- remove_case(trial_id, **kwargs)[source]
Remove data case previously appended by
_update()
For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.
- Parameters:
trial_id (
str
) – ID of trial whose data is to be removedkwargs – Extra arguments, optional
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (
str
) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- model_parameters()[source]
- Returns:
Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
- score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id)[source]
This method computes acquisition scores for a number of extended configs \((x, r)\). The acquisition score \(EI(x | r)\) is expected improvement (EI) at resource level \(r\). Here, the incumbent used in EI is the best value attained at level \(r\), or the best value overall if there is no data yet at that level. There are two types of configs being scored:
Paused trials: Passed by
paused_trials
as tuples(trial_id, resource)
, whereresource
is the level to be attained by the trial if it was resumedNew configurations drawn at random. For these, the score is EI at \(r\) equal to
min_resource
We return a dictionary. If a paused trial wins, its
trial_id
is returned with key “trial_id”. If a new configuration wins, this configuration is returned with key “config”.Note: As long as the internal searcher still returns configs from
points_to_evaluate
or drawn at random, this method always returns this config with key “config”. Scoring and considering paused trials is only done afterwards.- Parameters:
paused_trials (
List
[Tuple
[str
,int
,int
]]) – See above. Can be emptymin_resource (
int
) – Smallest resource levelnew_trial_id (
str
) – ID of new trial to be started in case a new configuration wins
- Return type:
Dict
[str
,Any
]- Returns:
Dictionary, see above
- get_state()[source]
Together with
clone_from_state()
, this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.- Return type:
Dict
[str
,Any
]- Returns:
Pickle-able mutable state of searcher
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- property debug_log: DebugLogPrinter | None
Some subclasses support writing a debug log, using
DebugLogPrinter
. SeeRandomSearcher
for an example.- Returns:
debug_log
object, or None (not supported)
Submodules
syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher module
- class syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher.MyGPMultiFidelitySearcher(config_space, **kwargs)[source]
Bases:
GPMultiFidelitySearcher
This wrapper is for convenience, to avoid having to depend on internal concepts of
GPMultiFidelitySearcher
.- score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id, skip_optimization)[source]
See
DynamicHPOSearcher.score_paused_trials_and_new_configs()
. Ifskip_optimization == True
, this is passed to the posterior state computation, and refitting of the surrogate model is skipped. Otherwise, nothing is passed, so the built-inskip_optimization
logic is used.- Return type:
Dict
[str
,Any
]
- class syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher.DynamicHPOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BaseSearcher
Supports model-based decisions in the DyHPO algorithm proposed by Wistuba et al. (see
DyHPORungSystem
).It is not recommended to create
DynamicHPOSearcher
searcher objects directly, but rather to createHyperbandScheduler
objects withsearcher="dyhpo"
andtype="dyhpo"
, and passing arguments here insearch_options
. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory
to create components in a consistent way.This searcher is special, in that it contains a searcher of type
GPMultiFidelitySearcher
. Also, its model-based scoring is not triggered byget_config()
, but rather when the scheduler tries to find a trial which can be promoted. At this point,score_paused_trials_and_new_configs()
is called, which scores all paused trials along with new configurations. Depending on which scores best, a paused trial is resumed, or a trial with a new configuration is started. Since all the work is already done in score_paused_trials_and_new_configs()
, the implementation ofget_config()
becomes trivial. See alsoDyHPORungSystem
. Extra points:The number of new configurations scored in
score_paused_trials_and_new_configs()
is the maximum ofnum_init_candidates
and the number of paused trials scored as wellThe parameters of the surrogate model are not refit in every call of
score_paused_trials_and_new_configs()
, but only when a new configuration was chosen as the top scorer in the most recent call. The aim is to refit at a similar frequency to MOBSTER, where decisions on whether to resume a trial are not done in a model-based way.
This searcher must be used with
HyperbandScheduler
andtype="dyhpo"
. It has the same constructor parameters asGPMultiFidelitySearcher
. Of these, the following are not used, but need to be given valid values:resource_acq
,initial_scoring
,skip_local_optimization
.- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[dict
]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- on_trial_result(trial_id, config, result, update)[source]
Inform searcher about result
The scheduler passes every result. If
update == True
, the searcher should update its surrogate model (if any), otherwiseresult
is an intermediate result not modelled.The default implementation calls
_update()
ifupdate == True
. It can be overwritten by searchers which also react to intermediate results.- Parameters:
trial_id (
str
) – Seeon_trial_result()
config (
Dict
[str
,Any
]) – Seeon_trial_result()
result (
Dict
[str
,Any
]) – Seeon_trial_result()
update (
bool
) – Should surrogate model be updated?
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[dict
]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attain, so that the model registers (config, milestone)
as pending.
- remove_case(trial_id, **kwargs)[source]
Remove data case previously appended by
_update()
For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.
- Parameters:
trial_id (
str
) – ID of trial whose data is to be removedkwargs – Extra arguments, optional
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (
str
) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- model_parameters()[source]
- Returns:
Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
- score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id)[source]
This method computes acquisition scores for a number of extended configs \((x, r)\). The acquisition score \(EI(x | r)\) is expected improvement (EI) at resource level \(r\). Here, the incumbent used in EI is the best value attained at level \(r\), or the best value overall if there is no data yet at that level. There are two types of configs being scored:
Paused trials: Passed by
paused_trials
as tuples(trial_id, resource)
, whereresource
is the level to be attained by the trial if it was resumedNew configurations drawn at random. For these, the score is EI at \(r\) equal to
min_resource
We return a dictionary. If a paused trial wins, its
trial_id
is returned with key “trial_id”. If a new configuration wins, this configuration is returned with key “config”.Note: As long as the internal searcher still returns configs from
points_to_evaluate
or drawn at random, this method always returns this config with key “config”. Scoring and considering paused trials is only done afterwards.- Parameters:
paused_trials (
List
[Tuple
[str
,int
,int
]]) – See above. Can be emptymin_resource (
int
) – Smallest resource levelnew_trial_id (
str
) – ID of new trial to be started in case a new configuration wins
- Return type:
Dict
[str
,Any
]- Returns:
Dictionary, see above
- get_state()[source]
Together with
clone_from_state()
, this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.- Return type:
Dict
[str
,Any
]- Returns:
Pickle-able mutable state of searcher
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- property debug_log: DebugLogPrinter | None
Some subclasses support writing a debug log, using
DebugLogPrinter
. SeeRandomSearcher
for an example.- Returns:
debug_log
object, or None (not supported)
syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo module
- class syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.ScheduleDecision[source]
Bases:
object
- PROMOTE_SH = 0
- PROMOTE_DYHPO = 1
- START_DYHPO = 2
- class syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.DyHPORungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, searcher, probability_sh, random_state)[source]
Bases:
PromotionRungSystem
Implements the logic which decides which paused trial to promote to the next resource level, or alternatively which configuration to start as a new trial, proposed in:
Wistuba, M., Kadra, A., Grabocka, J.: Dynamic and Efficient Gray-Box Hyperparameter Optimization for Deep Learning. We do promotion-based scheduling, as in
PromotionRungSystem
. In fact, we run the successive halving rule inon_task_schedule()
with probabilityprobability_sh
, and the DyHPO logic otherwise, or if the SH rule does not promote a trial. This mechanism (not contained in the paper) ensures that trials are promoted eventually, even if DyHPO only starts new trials.Since
HyperbandScheduler
was designed for promotion decisions to be separate from decisions about new configs, the overall workflow is a bit tricky:In
FIFOScheduler._suggest()
, we first callpromote_trial_id, extra_kwargs = self._promote_trial()
. Ifpromote_trial_id != None
, this trial is promoted. Otherwise, we callconfig = self.searcher.get_config(**extra_kwargs, trial_id=trial_id)
and start a new trial with this config. In most cases,_promote_trial()
makes a promotion decision without using the searcher.Here, we use the fact that information can be passed from
_promote_trial()
toself.searcher.get_config
viaextra_kwargs
. Namely, HyperbandScheduler._promote_trial() calls on_task_schedule()
here, which callsscore_paused_trials_and_new_configs()
, where everything happens.First, all paused trials are scored w.r.t. the value of running them for one more unit of resource. Also, a number of random configs are scored w.r.t. the value of running them to the minimum resource.
If the winning config is from a paused trial, this is resumed. If the winning config is a new one,
on_task_schedule()
returns this config using a special keyKEY_NEW_CONFIGURATION
. This dict becomes part ofextra_kwargs
and is passed toself.searcher.get_config
get_config()
is trivial. It obtains an argument of nameKEY_NEW_CONFIGURATION
and returns its value, which is the winning config to be started as a new trial
We can ignore
rung_levels
andpromote_quantiles
, they are not used. For each trial, we only need to maintain the resource level at which it is paused.- on_task_schedule(new_trial_id)[source]
The main decision making happens here. We collect
(trial_id, resource)
for all paused trials and callsearcher
. The searcher scores all these trials along with a certain number of randomly drawn new configurations.If one of the paused trials has the best score, we return its
trial_id
along with extra information, so it gets promoted. If one of the new configurations has the best score, we return this configuration. In this case, a new trial is started with this configuration.Note: For this scheduler type,
kwargs
must contain the trial ID of the new trial to be started, in case none can be promoted.- Return type:
Dict
[str
,Any
]
- property schedule_records: List[Tuple[str, int, int]]
syne_tune.optimizer.schedulers.searchers.hypertune package
- class syne_tune.optimizer.schedulers.searchers.hypertune.HyperTuneSearcher(config_space, **kwargs)[source]
Bases:
GPMultiFidelitySearcher
Implements Hyper-Tune as extension of
GPMultiFidelitySearcher
, seeHyperTuneIndependentGPModel
for references. Two modifications:New brackets are sampled from a model-based distribution \([w_k]\)
The acquisition function is fed with predictive means and variances from a mixture over rung level distributions, weighted by \([\theta_k]\)
It is not recommended to create
HyperTuneSearcher
searcher objects directly, but rather to createHyperbandScheduler
objects withsearcher="hypertune"
, and passing arguments here insearch_options
. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory
to create components in a consistent way.The following arguments of the parent class are not relevant here, and are ignored:
gp_resource_kernel
,resource_acq
,issm_gamma_one
,expdecay_normalize_inputs
.Additional arguments on top of parent class
GPMultiFidelitySearcher
:- Parameters:
model (str, optional) –
Selects surrogate model (learning curve model) to be used. Choices are:
”gp_multitask”: GP multi-task surrogate model
”gp_independent” (default): Independent GPs for each rung level, sharing an ARD kernel
The default is “gp_independent” (as in the Hyper-Tune paper), which is different from the default in
GPMultiFidelitySearcher
(which is “gp_multitask”). “gp_issm”, “gp_expdecay” are not supported here. hypertune_distribution_num_samples (int, optional) – Parameter for estimating the distribution, given by \([\theta_k]\). Defaults to 50
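As a sketch (metric, resource, and hyperparameter names are placeholders), Hyper-Tune is typically requested through the scheduler; using several brackets lets the method exploit its model-based bracket distribution \([w_k]\):
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_units": randint(32, 512),
    "epochs": 81,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="hypertune",
    metric="validation_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    brackets=4,  # more than one bracket, so the distribution [w_k] matters
    search_options={"model": "gp_independent"},
)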
Submodules
syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_bracket_distribution module
- class syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_bracket_distribution.HyperTuneBracketDistribution[source]
Bases:
DefaultHyperbandBracketDistribution
Represents the adaptive distribution over brackets [w_k].
syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_searcher module
- class syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_searcher.HyperTuneSearcher(config_space, **kwargs)[source]
Bases:
GPMultiFidelitySearcher
Implements Hyper-Tune as extension of
GPMultiFidelitySearcher
, seeHyperTuneIndependentGPModel
for references. Two modifications:New brackets are sampled from a model-based distribution \([w_k]\)
The acquisition function is fed with predictive means and variances from a mixture over rung level distributions, weighted by \([\theta_k]\)
It is not recommended to create
HyperTuneSearcher
searcher objects directly, but rather to createHyperbandScheduler
objects withsearcher="hypertune"
, and passing arguments here insearch_options
. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory
to create components in a consistent way.The following arguments of the parent class are not relevant here, and are ignored:
gp_resource_kernel
,resource_acq
,issm_gamma_one
,expdecay_normalize_inputs
.Additional arguments on top of parent class
GPMultiFidelitySearcher
:- Parameters:
model (str, optional) –
Selects surrogate model (learning curve model) to be used. Choices are:
”gp_multitask”: GP multi-task surrogate model
”gp_independent” (default): Independent GPs for each rung level, sharing an ARD kernel
The default is “gp_independent” (as in the Hyper-Tune paper), which is different from the default in
GPMultiFidelitySearcher
(which is “gp_multitask”). “gp_issm”, “gp_expdecay” are not supported here. hypertune_distribution_num_samples (int, optional) – Parameter for estimating the distribution, given by \([\theta_k]\). Defaults to 50
syne_tune.optimizer.schedulers.searchers.kde package
- class syne_tune.optimizer.schedulers.searchers.kde.KernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
Fits two kernel density estimators (KDE) to model the density of the top N configurations as well as the density of the configurations that are not among the top N, respectively. New configurations are sampled by optimizing the ratio of these two densities. KDE as a model for Bayesian optimization was originally proposed by Bergstra et al. Compared to their original implementation (TPE), we use multivariate instead of univariate KDE, as proposed by Falkner et al. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster
Algorithms for Hyper-Parameter Optimization. J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl. Proceedings of the 24th International Conference on Advances in Neural Information Processing Systems, and
BOHB: Robust and Efficient Hyperparameter Optimization at Scale. S. Falkner, A. Klein, F. Hutter. Proceedings of the 35th International Conference on Machine Learning. Note:
restrict_configurations
is not supported here, this would require reimplementing the selection of configs in_get_config()
.Additional arguments on top of parent class
StochasticAndFilterDuplicatesSearcher
:- Parameters:
mode (
Optional
[str
]) – Mode to use for the metric given, can be “min” or “max”. Is obtained from scheduler inconfigure_scheduler()
. Defaults to “min”num_min_data_points (
Optional
[int
]) – Minimum number of data points that we use to fit the KDEs. As long as fewer observations have been received in update()
, randomly drawn configurations are returned inget_config()
. If set toNone
, we set this to the number of hyperparameters. Defaults toNone
.top_n_percent (
Optional
[int
]) – Determines how many datapoints we use to fit the first KDE model for modeling the well performing configurations. Defaults to 15min_bandwidth (
Optional
[float
]) – The minimum bandwidth for the KDE models. Defaults to 1e-3num_candidates (
Optional
[int
]) – Number of candidates that are sampled to optimize the acquisition function. Defaults to 64bandwidth_factor (
Optional
[int
]) – We sample continuous hyperparameters from a truncated Normal. This factor is multiplied by the bandwidth to define the standard deviation of this truncated Normal. Defaults to 3random_fraction (
Optional
[float
]) – Defines the fraction of configurations that are drawn uniformly at random instead of sampling from the model. Defaults to 0.33
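A minimal sketch (placeholder metric and hyperparameter names); the entries of search_options map to the constructor arguments documented above:
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "momentum": uniform(0.1, 0.99),
}
scheduler = FIFOScheduler(
    config_space,
    searcher="kde",  # selects this kernel density estimator searcher
    metric="validation_loss",
    mode="min",
    search_options={"top_n_percent": 15, "random_fraction": 0.33},
)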
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- class syne_tune.optimizer.schedulers.searchers.kde.MultiFidelityKernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, resource_attr=None, **kwargs)[source]
Bases:
KernelDensityEstimator
Adapts
KernelDensityEstimator
to the multi-fidelity setting as proposed by Falkner et al., such that we can use it with Hyperband. Following Falkner et al., we fit the KDE only on the highest resource level where we have at least num_min_data_points. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster
BOHB: Robust and Efficient Hyperparameter Optimization at Scale. S. Falkner, A. Klein, F. Hutter. Proceedings of the 35th International Conference on Machine Learning.
Additional arguments on top of parent class
KernelDensityEstimator
:- Parameters:
resource_attr (
Optional
[str
]) – Name of resource attribute. Defaults toscheduler.resource_attr
inconfigure_scheduler()
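A BOHB-style sketch (placeholder names), combining the KDE searcher with Hyperband scheduling; with a multi-fidelity scheduler, the multi-fidelity variant of the KDE searcher is expected to be selected:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_units": randint(32, 512),
    "epochs": 27,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",
    metric="validation_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
)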
Submodules
syne_tune.optimizer.schedulers.searchers.kde.kde_searcher module
- class syne_tune.optimizer.schedulers.searchers.kde.kde_searcher.KernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, **kwargs)[source]
Bases:
StochasticAndFilterDuplicatesSearcher
Fits two kernel density estimators (KDE) to model the density of the top N configurations as well as the density of the configurations that are not among the top N, respectively. New configurations are sampled by optimizing the ratio of these two densities. KDE as a model for Bayesian optimization was originally proposed by Bergstra et al. Compared to their original implementation (TPE), we use multivariate instead of univariate KDE, as proposed by Falkner et al. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster
Algorithms for Hyper-Parameter Optimization. J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl. Proceedings of the 24th International Conference on Advances in Neural Information Processing Systems, and
BOHB: Robust and Efficient Hyperparameter Optimization at Scale. S. Falkner, A. Klein, F. Hutter. Proceedings of the 35th International Conference on Machine Learning. Note:
restrict_configurations
is not supported here, this would require reimplementing the selection of configs in_get_config()
.Additional arguments on top of parent class
StochasticAndFilterDuplicatesSearcher
:- Parameters:
mode (
Optional
[str
]) – Mode to use for the metric given, can be “min” or “max”. Is obtained from scheduler inconfigure_scheduler()
. Defaults to “min”num_min_data_points (
Optional
[int
]) – Minimum number of data points that we use to fit the KDEs. As long as fewer observations have been received in update()
, randomly drawn configurations are returned inget_config()
. If set toNone
, we set this to the number of hyperparameters. Defaults toNone
.top_n_percent (
Optional
[int
]) – Determines how many datapoints we use to fit the first KDE model for modeling the well performing configurations. Defaults to 15min_bandwidth (
Optional
[float
]) – The minimum bandwidth for the KDE models. Defaults to 1e-3num_candidates (
Optional
[int
]) – Number of candidates that are sampled to optimize the acquisition function. Defaults to 64bandwidth_factor (
Optional
[int
]) – We sample continuous hyperparameters from a truncated Normal. This factor is multiplied by the bandwidth to define the standard deviation of this truncated Normal. Defaults to 3random_fraction (
Optional
[float
]) – Defines the fraction of configurations that are drawn uniformly at random instead of sampling from the model. Defaults to 0.33
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.kde.multi_fidelity_kde_searcher module
- class syne_tune.optimizer.schedulers.searchers.kde.multi_fidelity_kde_searcher.MultiFidelityKernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, resource_attr=None, **kwargs)[source]
Bases:
KernelDensityEstimator
Adapts
KernelDensityEstimator
to the multi-fidelity setting as proposed by Falkner et al., such that we can use it with Hyperband. Following Falkner et al., we fit the KDE only on the highest resource level where we have at least num_min_data_points. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster
BOHB: Robust and Efficient Hyperparameter Optimization at Scale. S. Falkner, A. Klein, F. Hutter. Proceedings of the 35th International Conference on Machine Learning.
Additional arguments on top of parent class
KernelDensityEstimator
:- Parameters:
resource_attr (
Optional
[str
]) – Name of resource attribute. Defaults toscheduler.resource_attr
inconfigure_scheduler()
syne_tune.optimizer.schedulers.searchers.sklearn package
- class syne_tune.optimizer.schedulers.searchers.sklearn.SKLearnSurrogateSearcher(config_space, metric, estimator, points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]
Bases:
BayesianOptimizationSearcher
SKLearn Surrogate Bayesian optimization for FIFO scheduler
This searcher must be used with
FIFOScheduler
. It provides Bayesian optimization, based on a surrogate model defined through a scikit-learn estimator. Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
estimator (
SKLearnEstimator
) – Instance ofSKLearnEstimator
to be used as surrogate modelscoring_class (
Optional
[Callable
[[Any
],ScoringFunction
]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. IfNone
, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.num_initial_candidates (
int
) – Number of candidates sampled for scoring with acquisition function.num_initial_random_choices (
int
) – Number of randomly chosen candidates before surrogate model is used.allow_duplicates (
bool
) – IfTrue
, allow for the same candidate to be selected more than once.restrict_configurations (
Optional
[List
[Dict
[str
,Any
]]]) – If given, the searcher only suggests configurations from this list. Ifallow_duplicates == False
, entries are popped off this list once suggested.
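A sketch of the wiring (the estimator below is a placeholder for any SKLearnEstimator implementation; metric and hyperparameter names are made up for illustration):
from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.sklearn import SKLearnSurrogateSearcher

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "dropout": uniform(0.0, 0.9),
}
# Placeholder: replace with a concrete SKLearnEstimator implementation, for
# example one wrapping a scikit-learn regressor that predicts means and
# standard deviations
my_estimator = ...
searcher = SKLearnSurrogateSearcher(
    config_space,
    metric="validation_loss",
    estimator=my_estimator,
    num_initial_random_choices=5,
)
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,  # the searcher object is passed directly to the scheduler
    metric="validation_loss",
    mode="min",
)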
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
Submodules
syne_tune.optimizer.schedulers.searchers.sklearn.sklearn_surrogate_searcher module
- class syne_tune.optimizer.schedulers.searchers.sklearn.sklearn_surrogate_searcher.SKLearnSurrogateSearcher(config_space, metric, estimator, points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]
Bases:
BayesianOptimizationSearcher
SKLearn Surrogate Bayesian optimization for FIFO scheduler
This searcher must be used with
FIFOScheduler
. It provides Bayesian optimization, based on a surrogate model defined through a scikit-learn estimator. Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
estimator (
SKLearnEstimator
) – Instance ofSKLearnEstimator
to be used as surrogate modelscoring_class (
Optional
[Callable
[[Any
],ScoringFunction
]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. IfNone
, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.num_initial_candidates (
int
) – Number of candidates sampled for scoring with acquisition function.num_initial_random_choices (
int
) – Number of randomly chosen candidates before surrogate model is used.allow_duplicates (
bool
) – IfTrue
, allow for the same candidate to be selected more than once.restrict_configurations (
Optional
[List
[Dict
[str
,Any
]]]) – If given, the searcher only suggests configurations from this list. Ifallow_duplicates == False
, entries are popped off this list once suggested.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
syne_tune.optimizer.schedulers.searchers.utils package
- class syne_tune.optimizer.schedulers.searchers.utils.HyperparameterRanges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
Bases:
object
Wraps configuration space, provides services around encoding of hyperparameters (mapping configurations to
[0, 1]
vectors and vice versa).If
name_last_pos
is given, the hyperparameter of that name is assigned the final position in the vector returned byto_ndarray()
. This can be used to single out the (time) resource for a GP model, where that component has to come last.If in this case (
name_last_pos
given),value_for_last_pos
is also given, some methods are modified:random_config()
samples a config as normal, but then overwrites thename_last_pos
component byvalue_for_last_pos
get_ndarray_bounds()
works as normal, but returns bound(a, a)
for the name_last_pos component
, where a is the internal value corresponding tovalue_for_last_pos
The use case is HPO with a resource attribute. This attribute should be fixed when optimizing the acquisition function, but can take different values in the evaluation data (coming from all previous searches).
If
active_config_space
is given, it contains a subset of non-constant hyperparameters inconfig_space
, and the range of each entry is a subset of the range of the correspondingconfig_space
entry. These active ranges affect the choice of new configs (by sampling). While the internal encoding is based on original ranges, search is restricted to active ranges (e.g., optimization of surrogate model). This option is required to implement transfer tuning, where domain ranges inconfig_space
may be narrower than what data from past tuning jobs requires.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space. Constant hyperparameters are filtered out herename_last_pos (
Optional
[str
]) – See above, optionalvalue_for_last_pos – See above, optional
active_config_space (
Optional
[dict
]) – See above, optionalprefix_keys (
Optional
[List
[str
]]) – If given, these keys intoconfig_space
come first in the internal ordering, which determines the internal encoding. Optional
- property internal_keys: List[str]
- property config_space_for_sampling: Dict[str, Any]
- to_ndarray(config)[source]
Map configuration to
[0, 1]
encoded vector- Parameters:
config (
Dict
[str
,Union
[int
,float
,str
]]) – Configuration to encode- Return type:
ndarray
- Returns:
Encoded vector
- to_ndarray_matrix(configs)[source]
Map configurations to
[0, 1]
encoded matrix- Parameters:
configs (
Iterable
[Dict
[str
,Union
[int
,float
,str
]]]) – Configurations to encode- Return type:
ndarray
- Returns:
Matrix of encoded vectors (rows)
- property ndarray_size: int
- Returns:
Dimensionality of encoded vector returned by
to_ndarray
- from_ndarray(enc_config)[source]
Maps encoded vector back to configuration (can involve rounding)
The encoded vector
enc_config
needs to be in the image of to_ndarray
. In fact, any[0, 1]
valued vector of dimensionalityndarray_size
is allowed.- Parameters:
enc_config (
ndarray
) – Encoded vector- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
Configuration corresponding to encoded vector
- property encoded_ranges: Dict[str, Tuple[int, int]]
Encoded ranges are
[0, 1]
or closed subintervals thereof, in caseactive_config_space
is used.- Returns:
Ranges of hyperparameters in the encoded ndarray representation
- random_config(random_state)[source]
Draws random configuration
- Parameters:
random_state (
RandomState
) – Random state- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
Random configuration
- random_configs(random_state, num_configs)[source]
Draws random configurations
- Parameters:
random_state – Random state
num_configs (
int
) – Number of configurations to sample
- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
Random configurations
- get_ndarray_bounds()[source]
- Return type:
List
[Tuple
[float
,float
]]- Returns:
List of
(lower, upper)
bounds for each dimension in encoded vector representation.
- filter_for_last_pos_value(configs)[source]
If
is_attribute_fixed
,configs
is filtered by removing entries whosename_last_pos attribute
value is different fromvalue_for_last_pos
. Otherwise, it is returned unchanged.- Parameters:
configs (
List
[Dict
[str
,Union
[int
,float
,str
]]]) – List of configs to be filtered- Return type:
List
[Dict
[str
,Union
[int
,float
,str
]]]- Returns:
Filtered list of configs
- config_to_tuple(config, keys=None, skip_last=False)[source]
- Parameters:
config (
Dict
[str
,Union
[int
,float
,str
]]) – Configurationkeys (
Optional
[List
[str
]]) – Overrides_internal_keys
skip_last (
bool
) – If True andname_last_pos
is used, the corresponding attribute is skipped, so that config and tuple are non-extended
- Return type:
Tuple
[Union
[str
,int
,float
],...
]- Returns:
Tuple representation
- tuple_to_config(config_tpl, keys=None, skip_last=False)[source]
Reverse of
config_to_tuple()
.- Parameters:
config_tpl (
Tuple
[Union
[str
,int
,float
],...
]) – Tuple representationkeys (
Optional
[List
[str
]]) – Overrides_internal_keys
skip_last (
bool
) – If True andname_last_pos
is used, the corresponding attribute is skipped, so that config and tuple are non-extended
- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
Configuration corresponding to
config_tpl
- config_to_match_string(config, keys=None, skip_last=False)[source]
Maps configuration to match string, used to compare for approximate equality. Two configurations are considered to be different if their match strings are not the same.
- Parameters:
config (
Dict
[str
,Union
[int
,float
,str
]]) – Configurationkeys (
Optional
[List
[str
]]) – Overrides_internal_keys
skip_last (
bool
) – If True andname_last_pos
is used, the corresponding attribute is skipped, so that config and match string are non-extended
- Return type:
str
- Returns:
Match string
- syne_tune.optimizer.schedulers.searchers.utils.make_hyperparameter_ranges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
Default method to create
HyperparameterRanges
fromconfig_space
- Parameters:
config_space (
Dict
) – Configuration spacename_last_pos (
Optional
[str
]) – SeeHyperparameterRanges
, optionalvalue_for_last_pos – See
HyperparameterRanges
, optionalactive_config_space (
Optional
[Dict
]) – SeeHyperparameterRanges
, optionalprefix_keys (
Optional
[List
[str
]]) – SeeHyperparameterRanges
, optional
- Return type:
- Returns:
New object
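A short sketch of the encode/decode round trip (hyperparameter names are placeholders):
from numpy.random import RandomState

from syne_tune.config_space import choice, loguniform, randint
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
    "activation": choice(["relu", "tanh"]),
}
hp_ranges = make_hyperparameter_ranges(config_space)

config = hp_ranges.random_config(RandomState(0))
encoded = hp_ranges.to_ndarray(config)     # vector with entries in [0, 1]
decoded = hp_ranges.from_ndarray(encoded)  # back to a configuration (may involve rounding)
print(hp_ranges.ndarray_size, encoded.shape, decoded)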
- class syne_tune.optimizer.schedulers.searchers.utils.HyperparameterRangesImpl(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
  Bases: HyperparameterRanges
  Basic implementation of HyperparameterRanges.
  - Parameters:
    config_space (Dict[str, Any]) – Configuration space
    name_last_pos (Optional[str]) – See HyperparameterRanges, optional
    value_for_last_pos – See HyperparameterRanges, optional
    active_config_space (Optional[Dict[str, Any]]) – See HyperparameterRanges, optional
    prefix_keys (Optional[List[str]]) – See HyperparameterRanges, optional
  - property ndarray_size: int
    - Returns: Dimensionality of encoded vector returned by to_ndarray
  - to_ndarray(config)[source]
    Maps a configuration to a [0, 1] encoded vector.
    - Parameters:
      config (Dict[str, Union[int, float, str]]) – Configuration to encode
    - Return type: ndarray
    - Returns: Encoded vector
  - from_ndarray(enc_config)[source]
    Maps an encoded vector back to a configuration (can involve rounding). The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.
    - Parameters:
      enc_config (ndarray) – Encoded vector
    - Return type: Dict[str, Union[int, float, str]]
    - Returns: Configuration corresponding to encoded vector
  - property encoded_ranges: Dict[str, Tuple[int, int]]
    Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.
    - Returns: Ranges of hyperparameters in the encoded ndarray representation
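Since the encode/decode round trip is central to all model-based searchers, a minimal sketch may help. It assumes a toy configuration space built from syne_tune.config_space primitives and only uses the documented make_hyperparameter_ranges, ndarray_size, to_ndarray and from_ndarray calls; names and values are made up for illustration.

from syne_tune.config_space import uniform, randint, choice
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

# Toy configuration space (placeholder names)
config_space = {
    "learning_rate": uniform(1e-4, 1e-1),
    "num_layers": randint(1, 8),
    "activation": choice(["relu", "tanh"]),
}
hp_ranges = make_hyperparameter_ranges(config_space)

config = {"learning_rate": 0.01, "num_layers": 4, "activation": "relu"}
enc = hp_ranges.to_ndarray(config)      # vector with entries in [0, 1]
print(hp_ranges.ndarray_size)           # dimensionality of the encoding
roundtrip = hp_ranges.from_ndarray(enc) # may involve rounding for int/categorical values
print(roundtrip)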
Submodules
syne_tune.optimizer.schedulers.searchers.utils.common module
syne_tune.optimizer.schedulers.searchers.utils.default_arguments module
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.CheckType[source]
  Bases: object
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Float(lower=None, upper=None)[source]
  Bases: CheckType
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Integer(lower=None, upper=None)[source]
  Bases: CheckType
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.IntegerOrNone(lower=None, upper=None)[source]
  Bases: Integer
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Categorical(choices)[source]
  Bases: CheckType
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.String[source]
  Bases: CheckType
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Boolean[source]
  Bases: CheckType
- class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Dictionary[source]
  Bases: CheckType
- syne_tune.optimizer.schedulers.searchers.utils.default_arguments.check_and_merge_defaults(options, mandatory, default_options, constraints=None, dict_name=None)[source]
  First, check that all keys in mandatory appear in options. Second, create result_options by merging options and default_options, where entries in options have precedence. Finally, if constraints is given, this is used to check validity of values.
  - Parameters:
    options (Dict[str, Any]) – Input arguments
    mandatory (Set[str]) – Set of mandatory argument names
    default_options (Dict[str, Any]) – Default values for options
    constraints (Optional[Dict[str, CheckType]]) – See above, optional
    dict_name (Optional[str]) – Prefix used in assert messages, optional
  - Return type: Dict[str, Any]
  - Returns: Output arguments
- syne_tune.optimizer.schedulers.searchers.utils.default_arguments.filter_by_key(options, remove_keys)[source]
  Filter options by removing entries whose keys are in remove_keys. Used to filter kwargs passed to a constructor, before passing it to the superclass constructor.
  - Parameters:
    options (Dict[str, Any]) – Arguments to be filtered
    remove_keys (Set[str]) – See above
  - Return type: Dict[str, Any]
  - Returns: Filtered options
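A short usage sketch of these two helpers, with made-up option names and values, may clarify how defaults, mandatory keys and constraints interact. It relies only on the signatures documented above.

from syne_tune.optimizer.schedulers.searchers.utils.default_arguments import (
    Integer,
    check_and_merge_defaults,
    filter_by_key,
)

options = {"num_init_random": 5, "metric": "val_loss", "debug": True}
merged = check_and_merge_defaults(
    options,
    mandatory={"metric"},                                # must appear in options
    default_options={"num_init_random": 3, "opt_maxiter": 50},
    constraints={"num_init_random": Integer(1, 100)},    # value checked against this type
    dict_name="search_options",
)
# merged contains num_init_random=5 (options win), plus opt_maxiter=50 from the defaults

kwargs_for_super = filter_by_key(merged, remove_keys={"debug"})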
syne_tune.optimizer.schedulers.searchers.utils.exclusion_list module
- class syne_tune.optimizer.schedulers.searchers.utils.exclusion_list.ExclusionList(hp_ranges, configurations=None)[source]
  Bases: object
  Maintains an exclusion list of configs, to avoid choosing configs several times. In fact, self.excl_set maintains a set of match strings.
  The exclusion list contains non-extended configs, but it can be fed with and queried with extended configs. In that case, the resource attribute is removed from the config.
  - Parameters:
    hp_ranges (HyperparameterRanges) – Encodes configurations to vectors
    configurations (Union[List[Dict[str, Union[int, float, str]]], Set[str], None]) – Initial configurations. Default is empty
- class syne_tune.optimizer.schedulers.searchers.utils.exclusion_list.ExclusionListFromState(state, filter_observed_data=None)[source]
  Bases: ExclusionList
syne_tune.optimizer.schedulers.searchers.utils.hp_ranges module
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges.HyperparameterRanges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
  Bases: object
  Wraps the configuration space and provides services around encoding of hyperparameters (mapping configurations to [0, 1] vectors and vice versa).
  If name_last_pos is given, the hyperparameter of that name is assigned the final position in the vector returned by to_ndarray(). This can be used to single out the (time) resource for a GP model, where that component has to come last.
  If in this case (name_last_pos given), value_for_last_pos is also given, some methods are modified:
  - random_config() samples a config as normal, but then overwrites the name_last_pos component by value_for_last_pos
  - get_ndarray_bounds() works as normal, but returns bound (a, a) for the name_last_pos component, where a is the internal value corresponding to value_for_last_pos
  The use case is HPO with a resource attribute. This attribute should be fixed when optimizing the acquisition function, but can take different values in the evaluation data (coming from all previous searches).
  If active_config_space is given, it contains a subset of the non-constant hyperparameters in config_space, and the range of each entry is a subset of the range of the corresponding config_space entry. These active ranges affect the choice of new configs (by sampling). While the internal encoding is based on the original ranges, search is restricted to the active ranges (e.g., optimization of the surrogate model). This option is required to implement transfer tuning, where domain ranges in config_space may be narrower than what data from past tuning jobs requires.
  - Parameters:
    config_space (Dict[str, Any]) – Configuration space. Constant hyperparameters are filtered out here
    name_last_pos (Optional[str]) – See above, optional
    value_for_last_pos – See above, optional
    active_config_space (Optional[dict]) – See above, optional
    prefix_keys (Optional[List[str]]) – If given, these keys into config_space come first in the internal ordering, which determines the internal encoding. Optional
  - property internal_keys: List[str]
  - property config_space_for_sampling: Dict[str, Any]
  - to_ndarray(config)[source]
    Maps a configuration to a [0, 1] encoded vector.
    - Parameters:
      config (Dict[str, Union[int, float, str]]) – Configuration to encode
    - Return type: ndarray
    - Returns: Encoded vector
  - to_ndarray_matrix(configs)[source]
    Maps configurations to a [0, 1] encoded matrix.
    - Parameters:
      configs (Iterable[Dict[str, Union[int, float, str]]]) – Configurations to encode
    - Return type: ndarray
    - Returns: Matrix of encoded vectors (rows)
  - property ndarray_size: int
    - Returns: Dimensionality of encoded vector returned by to_ndarray
  - from_ndarray(enc_config)[source]
    Maps an encoded vector back to a configuration (can involve rounding). The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.
    - Parameters:
      enc_config (ndarray) – Encoded vector
    - Return type: Dict[str, Union[int, float, str]]
    - Returns: Configuration corresponding to encoded vector
  - property encoded_ranges: Dict[str, Tuple[int, int]]
    Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.
    - Returns: Ranges of hyperparameters in the encoded ndarray representation
  - random_config(random_state)[source]
    Draws a random configuration.
    - Parameters:
      random_state (RandomState) – Random state
    - Return type: Dict[str, Union[int, float, str]]
    - Returns: Random configuration
  - random_configs(random_state, num_configs)[source]
    Draws random configurations.
    - Parameters:
      random_state – Random state
      num_configs (int) – Number of configurations to sample
    - Return type: List[Dict[str, Union[int, float, str]]]
    - Returns: Random configurations
  - get_ndarray_bounds()[source]
    - Return type: List[Tuple[float, float]]
    - Returns: List of (lower, upper) bounds for each dimension in the encoded vector representation.
  - filter_for_last_pos_value(configs)[source]
    If is_attribute_fixed, configs is filtered by removing entries whose name_last_pos attribute value is different from value_for_last_pos. Otherwise, it is returned unchanged.
    - Parameters:
      configs (List[Dict[str, Union[int, float, str]]]) – List of configs to be filtered
    - Return type: List[Dict[str, Union[int, float, str]]]
    - Returns: Filtered list of configs
  - config_to_tuple(config, keys=None, skip_last=False)[source]
    - Parameters:
      config (Dict[str, Union[int, float, str]]) – Configuration
      keys (Optional[List[str]]) – Overrides _internal_keys
      skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended
    - Return type: Tuple[Union[str, int, float], ...]
    - Returns: Tuple representation
  - tuple_to_config(config_tpl, keys=None, skip_last=False)[source]
    Reverse of config_to_tuple().
    - Parameters:
      config_tpl (Tuple[Union[str, int, float], ...]) – Tuple representation
      keys (Optional[List[str]]) – Overrides _internal_keys
      skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended
    - Return type: Dict[str, Union[int, float, str]]
    - Returns: Configuration corresponding to config_tpl
  - config_to_match_string(config, keys=None, skip_last=False)[source]
    Maps a configuration to a match string, used to compare for approximate equality. Two configurations are considered to be different if their match strings are not the same.
    - Parameters:
      config (Dict[str, Union[int, float, str]]) – Configuration
      keys (Optional[List[str]]) – Overrides _internal_keys
      skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and match string are non-extended
    - Return type: str
    - Returns: Match string
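The active_config_space mechanism described above can be exercised directly with the documented methods. The following is a minimal sketch with a made-up one-dimensional space: sampling is restricted to the active subrange, while the encoding bounds reflect the corresponding sub-interval of [0, 1].

from numpy.random import RandomState
from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

config_space = {"learning_rate": uniform(1e-5, 1.0)}          # wide original range
active_config_space = {"learning_rate": uniform(1e-3, 1e-1)}  # narrower active range

hp_ranges = make_hyperparameter_ranges(
    config_space, active_config_space=active_config_space
)
# New configs are drawn from the active range, while encoding stays w.r.t. the original range
configs = hp_ranges.random_configs(RandomState(0), num_configs=3)
bounds = hp_ranges.get_ndarray_bounds()  # sub-interval of [0, 1] for the narrowed dimension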
syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory module
- syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory.make_hyperparameter_ranges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
  Default method to create HyperparameterRanges from config_space.
  - Parameters:
    config_space (Dict) – Configuration space
    name_last_pos (Optional[str]) – See HyperparameterRanges, optional
    value_for_last_pos – See HyperparameterRanges, optional
    active_config_space (Optional[Dict]) – See HyperparameterRanges, optional
    prefix_keys (Optional[List[str]]) – See HyperparameterRanges, optional
  - Return type: HyperparameterRanges
  - Returns: New object
syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl module
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRange(name)[source]
  Bases: object
  - property name: str
- syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.scale_from_zero_one(value, lower_bound, upper_bound, scaling, lower_internal, upper_internal)[source]
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeContinuous(name, lower_bound, upper_bound, scaling, active_lower_bound=None, active_upper_bound=None)[source]
  Bases: HyperparameterRange
  Real-valued hyperparameter. If active_lower_bound and/or active_upper_bound are given, the feasible interval for values of new configs is reduced, but data can still contain configs with values in [lower_bound, upper_bound], and internal encoding is done w.r.t. this original range.
  - Parameters:
    name (str) – Name of hyperparameter
    lower_bound (float) – Lower bound (included)
    upper_bound (float) – Upper bound (included)
    scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).
    active_lower_bound (Optional[float]) – See above
    active_upper_bound (Optional[float]) – See above
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeInteger(name, lower_bound, upper_bound, scaling, active_lower_bound=None, active_upper_bound=None)[source]
  Bases: HyperparameterRange
  Integer-valued hyperparameter. Both bounds are included in the valid values. Under the hood, generates a continuous range from lower_bound - 0.5 to upper_bound + 0.5. See the docs for the continuous hyperparameter for more information.
  - Parameters:
    name (str) – Name of hyperparameter
    lower_bound (int) – Lower bound (integer, included)
    upper_bound (int) – Upper bound (integer, included)
    scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).
    active_lower_bound (Optional[int]) – See above
    active_upper_bound (Optional[int]) – See above
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeFiniteRange(name, lower_bound, upper_bound, size, scaling, cast_int=False)[source]
  Bases: HyperparameterRange
  Finite-range numerical hyperparameter, see FiniteRange. Internally, we use an int with linear scaling.
  Note: Different from HyperparameterRangeContinuous, we require that lower_bound < upper_bound and size >= 2.
  - Parameters:
    name (str) – Name of hyperparameter
    lower_bound (float) – Lower bound (included)
    upper_bound (float) – Upper bound (included)
    size (int) – Number of values in range
    scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).
    cast_int (bool) – If True, values are cast to int
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategorical(name, choices)[source]
  Bases: HyperparameterRange
  Base class for categorical hyperparameters.
  - Parameters:
    name (str) – Name of hyperparameter
    choices (Tuple[Any, ...]) – Values parameter can take
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategoricalNonBinary(name, choices, active_choices=None)[source]
  Bases: HyperparameterRangeCategorical
  Can take on a discrete set of values. We use one-hot encoding internally. If the value range has size 2, it is more efficient to use HyperparameterRangeCategoricalBinary.
  - Parameters:
    name (str) – Name of hyperparameter
    choices (Tuple[Any, ...]) – Values parameter can take
    active_choices (Optional[Tuple[Any, ...]]) – If given, must be a nonempty subset of choices.
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategoricalBinary(name, choices, active_choices=None)[source]
  Bases: HyperparameterRangeCategorical
  Here, the value range must be of size 2. The internal encoding is a single int, so 1 dimension instead of 2.
  - Parameters:
    name (str) – Name of hyperparameter
    choices (Tuple[Any, ...]) – Values parameter can take (must be size 2)
    active_choices (Optional[Tuple[Any, ...]]) – If given, must be a nonempty subset of choices.
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeOrdinalEqual(name, choices, active_choices=None)[source]
  Bases: HyperparameterRangeCategorical
  Ordinal hyperparameter, equal-distance encoding. See also Ordinal.
  - Parameters:
    name (str) – Name of hyperparameter
    choices (Tuple[Any, ...]) – Values parameter can take
    active_choices (Optional[Tuple[Any, ...]]) – If given, must be a nonempty contiguous subsequence of choices.
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeOrdinalNearestNeighbor(name, choices, log_scale=False, active_choices=None)[source]
  Bases: HyperparameterRangeCategorical
  Ordinal hyperparameter, nearest-neighbour encoding. See also OrdinalNearestNeighbor.
  - Parameters:
    name (str) – Name of hyperparameter
    choices (Tuple[Any, ...]) – Values parameter can take (numerical values, strictly increasing, size >= 2)
    log_scale (bool) – If True, nearest neighbour is done in log space (choices must be positive)
    active_choices (Optional[Tuple[Any, ...]]) – If given, must be a nonempty contiguous subsequence of choices.
  - property log_scale: bool
- class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangesImpl(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]
  Bases: HyperparameterRanges
  Basic implementation of HyperparameterRanges.
  - Parameters:
    config_space (Dict[str, Any]) – Configuration space
    name_last_pos (Optional[str]) – See HyperparameterRanges, optional
    value_for_last_pos – See HyperparameterRanges, optional
    active_config_space (Optional[Dict[str, Any]]) – See HyperparameterRanges, optional
    prefix_keys (Optional[List[str]]) – See HyperparameterRanges, optional
  - property ndarray_size: int
    - Returns: Dimensionality of encoded vector returned by to_ndarray
  - to_ndarray(config)[source]
    Maps a configuration to a [0, 1] encoded vector.
    - Parameters:
      config (Dict[str, Union[int, float, str]]) – Configuration to encode
    - Return type: ndarray
    - Returns: Encoded vector
  - from_ndarray(enc_config)[source]
    Maps an encoded vector back to a configuration (can involve rounding). The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.
    - Parameters:
      enc_config (ndarray) – Encoded vector
    - Return type: Dict[str, Union[int, float, str]]
    - Returns: Configuration corresponding to encoded vector
  - property encoded_ranges: Dict[str, Tuple[int, int]]
    Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.
    - Returns: Ranges of hyperparameters in the encoded ndarray representation
- syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.decode_extended_features(features_ext, resource_attr_range)[source]
  Given a matrix of features from extended configs, corresponding to ExtendedConfiguration, split it into a feature matrix from normal configs and resource values.
  - Parameters:
    features_ext (ndarray) – Matrix of features from extended configs
    resource_attr_range (Tuple[int, int]) – (r_min, r_max)
  - Return type: (ndarray, ndarray)
  - Returns: (features, resources)
syne_tune.optimizer.schedulers.searchers.utils.scaling module
syne_tune.optimizer.schedulers.searchers.utils.warmstarting module
- syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_hp_ranges_for_warmstarting(**kwargs)[source]
  See GPFIFOSearcher for details on "transfer_learning_task_attr", "transfer_learning_active_task", "transfer_learning_active_config_space" as optional fields in kwargs. If given, they determine active_config_space and prefix_keys of the hp_ranges created here, and they also place constraints on config_space.
  This function is not only called in gp_searcher_factory to create hp_ranges for a new GPFIFOSearcher object. It is also needed to create the TuningJobState object containing the data to be used in warmstarting.
  - Return type: HyperparameterRanges
- syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_filter_observed_data_for_warmstarting(**kwargs)[source]
  See GPFIFOSearcher for details on "transfer_learning_task_attr", "transfer_learning_active_task" as optional fields in kwargs.
  - Return type: Optional[Callable[[Dict[str, Union[int, float, str]]], bool]]
- syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_base_gp_kernel_for_warmstarting(hp_ranges, **kwargs)[source]
  In the transfer learning case, the base kernel is a product of two Matern52 kernels, the first non-ARD over the categorical parameter determining the task, the second ARD over the remaining parameters.
  - Return type:
Submodules
syne_tune.optimizer.schedulers.searchers.bracket_distribution module
- class syne_tune.optimizer.schedulers.searchers.bracket_distribution.BracketDistribution[source]
  Bases: object
  Configures asynchronous multi-fidelity schedulers such as HyperbandScheduler with a distribution over brackets. This distribution can be fixed up front, or change adaptively during the course of an experiment. It has an effect only if the scheduler is run with more than one bracket.
- class syne_tune.optimizer.schedulers.searchers.bracket_distribution.DefaultHyperbandBracketDistribution[source]
  Bases: BracketDistribution
  Implements the default bracket distribution, where the probability for each bracket is proportional to the number of slots in each bracket in synchronous Hyperband.
syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher module
- class syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]
  Bases: BayesianOptimizationSearcher
  Gaussian process Bayesian optimization for the FIFO scheduler.
  This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.
  It is not recommended to create GPFIFOSearcher objects directly; rather, create FIFOScheduler objects with searcher="bayesopt" and pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.
  Most of the implementation is generic in BayesianOptimizationSearcher.
  Note: If metric values are to be maximized (mode="max" in scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.
  Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing (i.e., target values are drawn from the current posterior, and acquisition functions are averaged over this sample, see num_fantasy_samples).
  The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.
  Note that the full logic of construction based on arguments is given in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.
  Additional arguments on top of parent class StochasticSearcher:
  - Parameters:
    clone_from_state (bool) – Internal argument, do not use
    resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If resource_attr and cost_attr are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data.
    cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether resource_attr is given, cost values are read from each report or only at the end.
    num_init_random (int, optional) – Number of initial get_config() calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults to DEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS
    num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in get_config. Defaults to DEFAULT_NUM_INITIAL_CANDIDATES
    num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations), defaults to 20
    no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False
    input_warping (bool, optional) – If True, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See also Warping. Coordinates which belong to categorical hyperparameters are not warped. Defaults to False.
    boxcox_transform (bool, optional) – If True, target values are transformed before being fitted with a Gaussian marginal likelihood. This uses the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults to False.
    gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are SUPPORTED_BASE_MODELS. Defaults to "matern52-ard" (Matern 5/2 with automatic relevance determination).
    acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are SUPPORTED_ACQUISITION_FUNCTIONS. Defaults to "ei" (expected improvement acquisition function).
    acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters, which can be passed here. If none are given, default values are used.
    initial_scoring (str, optional) – Scoring function to rank initial candidates (local optimization of EI is started from the top scorer):
      "thompson_indep": Independent Thompson sampling; randomized score, which can increase exploration
      "acq_func": Score is the same (EI) acquisition function which is used for local optimization afterwards
      Defaults to DEFAULT_INITIAL_SCORING
    skip_local_optimization (bool, optional) – If True, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case, initial_scoring="acq_func" makes most sense, otherwise the acquisition function will not be used. Defaults to False
    opt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2
    opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50
    opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If True, each fitting is started from the previous optimum. Not recommended in general. Defaults to False
    opt_verbose (bool, optional) – Parameter for surrogate model fitting. If True, lots of output. Defaults to False
    max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleSingleFidelityStateConverter for details. This downsampling is repeated every time the model is fit. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.
    max_size_top_fraction (float, optional) – Only used if max_size_data_for_model is set. This fraction of the downsampled set is filled with the top entries in the full set, the remaining ones are sampled at random from the full set, see SubsampleSingleFidelityStateConverter for details. Defaults to 0.25.
    opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as the number of observations is below this threshold. Defaults to 150
    opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If >1, and the number of observations is above opt_skip_init_length, fitting is done only every opt_skip_period-th call, and skipped otherwise. Defaults to 1 (no skipping)
    allow_duplicates (bool, optional) – If True, get_config() may return the same configuration more than once. Defaults to False
    restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This needs skip_local_optimization == True. If allow_duplicates == False, entries are popped off this list once suggested.
    map_reward (str or MapReward, optional) – In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization minimizes the criterion. map_reward converts from metric to internal criterion:
      "minus_x": criterion = -metric
      "<a>_minus_x": criterion = <a> - metric. For example, "1_minus_x" maps accuracy to zero-one error
      From a technical standpoint, it does not matter what is chosen here, because the criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before being fitted with a Gaussian process. Defaults to "1_minus_x"
    transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end, config_space must contain a categorical parameter of name transfer_learning_task_attr, whose range are all task IDs. Also, transfer_learning_active_task must denote the active task, and transfer_learning_active_config_space is used as active_config_space argument in HyperparameterRanges. This allows us to use a narrower search space for the active task than for the union of all tasks (config_space must be that), which is needed if some configurations of non-active tasks lie outside of the ranges in active_config_space. One of the implications is that filter_observed_data() selects configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task.
    transfer_learning_active_task (str, optional) – See transfer_learning_task_attr.
    transfer_learning_active_config_space (Dict[str, Any], optional) – See transfer_learning_task_attr. If not given, config_space is the search space for the active task as well. This active config space need not contain the transfer_learning_task_attr parameter. In fact, this parameter is set to a categorical with transfer_learning_active_task as single value, so that new configs are chosen for the active task only.
    transfer_learning_model (str, optional) – See transfer_learning_task_attr. Specifies the surrogate model to be used for transfer learning:
      "matern52_product": Kernel is a product of Matern 5/2 (not ARD) on transfer_learning_task_attr and Matern 5/2 (ARD) on the rest. Assumes that data from the same task are more closely related than data from different tasks
      "matern52_same": Kernel is Matern 5/2 (ARD) on the rest of the variables, transfer_learning_task_attr is ignored. Assumes that data from all tasks can be merged together
      Defaults to "matern52_product"
  - clone_from_state(state)[source]
    Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
    Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
    - Parameters:
      state – See above
    - Returns: New searcher object
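As recommended above, GPFIFOSearcher is normally created indirectly via the scheduler. The following is a hedged sketch of that route; the training script path, metric name, and configuration space are placeholders, and search_options only uses arguments documented above.

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 8),
    "epochs": 10,
}
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",                              # creates a GPFIFOSearcher internally
    search_options={"num_init_random": 5, "opt_nstarts": 2},
    metric="val_loss",
    mode="min",
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # placeholder script
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
# tuner.run()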
syne_tune.optimizer.schedulers.searchers.gp_multifidelity_searcher module
- class syne_tune.optimizer.schedulers.searchers.gp_multifidelity_searcher.GPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
  Bases: GPFIFOSearcher
  Gaussian process Bayesian optimization for the asynchronous Hyperband scheduler.
  This searcher must be used with a scheduler of type MultiFidelitySchedulerMixin. It provides a novel combination of Bayesian optimization, based on a Gaussian process surrogate model, with Hyperband scheduling. In particular, observations across resource levels are modelled jointly.
  It is not recommended to create GPMultiFidelitySearcher objects directly; rather, create HyperbandScheduler objects with searcher="bayesopt" and pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.
  Most of the GPFIFOSearcher comments apply here as well. In multi-fidelity HPO, we optimize a function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the configuration, \(r\) the resource (or time) attribute. The latter must be a positive integer. In most applications, resource_attr == "epoch", and the resource is the number of epochs already trained.
  If model == "gp_multitask" (default), we model the function \(f(\mathbf{x}, r)\) jointly over all resource levels \(r\) at which it is observed (but see searcher_data in HyperbandScheduler). The kernel and mean function of our surrogate model are over \((\mathbf{x}, r)\). The surrogate model is selected by gp_resource_kernel. More details about the supported kernels are given in:
  Tiao, Klein, Lienart, Archambeau, Seeger (2020)
  Model-based Asynchronous Hyperparameter and Neural Architecture Search
  The acquisition function (EI), which is optimized in get_config(), is obtained by fixing the resource level \(r\) to a value which is determined depending on the current state. If resource_acq == "bohb", \(r\) is the largest value <= max_t where we have seen \(\ge \mathrm{dimension}(\mathbf{x})\) metric values. If resource_acq == "first", \(r\) is the first milestone which config \(\mathbf{x}\) would reach when started.
  Additional arguments on top of parent class GPFIFOSearcher.
  - Parameters:
    model (str, optional) – Selects the surrogate model (learning curve model) to be used. Choices are:
      "gp_multitask" (default): GP multi-task surrogate model
      "gp_independent": Independent GPs for each rung level, sharing an ARD kernel
      "gp_issm": Gaussian-additive model of ISSM type
      "gp_expdecay": Gaussian-additive model of exponential decay type (as in Freeze-Thaw Bayesian Optimization)
    gp_resource_kernel (str, optional) – Only relevant for model == "gp_multitask". Surrogate model over the criterion function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the config, \(r\) the resource. Note that \(\mathbf{x}\) is encoded to be a vector with entries in [0, 1], and \(r\) is linearly mapped to [0, 1], while the criterion data is normalized to mean 0, variance 1. The reference above provides details on the models supported here. For the exponential decay kernel, the base kernel over \(\mathbf{x}\) is Matern 5/2 ARD. See SUPPORTED_RESOURCE_MODELS for supported choices. Defaults to "exp-decay-sum"
    resource_acq (str, optional) – Only relevant for model in {"gp_multitask", "gp_independent"}. Determines how the EI acquisition function is used. Values: "bohb", "first". Defaults to "bohb"
    max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleMultiFidelityStateConverter for details. This downsampling is repeated every time the model is fit, which ensures that the most recent data is taken into account. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.
    opt_skip_num_max_resource (bool, optional) – Parameter for surrogate model fitting, skip predicate. If True, and the number of observations is above opt_skip_init_length, fitting is done only when there is a new datapoint at r = max_t, and skipped otherwise. Defaults to False
    issm_gamma_one (bool, optional) – Only relevant for model == "gp_issm". If True, the gamma parameter of the ISSM is fixed to 1, otherwise it is optimized over. Defaults to False
    expdecay_normalize_inputs (bool, optional) – Only relevant for model == "gp_expdecay". If True, resource values r are normalized to [0, 1] as input to the exponential decay surrogate model. Defaults to False
  - configure_scheduler(scheduler)[source]
    Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
    - Parameters:
      scheduler (TrialScheduler) – Scheduler the searcher is used with.
  - register_pending(trial_id, config=None, milestone=None)[source]
    Registers a trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.
  - evaluation_failed(trial_id)[source]
    Called by the scheduler if an evaluation job for a trial failed.
    The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
    - Parameters:
      trial_id (str) – ID of trial whose evaluation failed
  - cleanup_pending(trial_id)[source]
    Removes all pending evaluations for trial trial_id.
    This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
    - Parameters:
      trial_id (str) – ID of trial whose pending evaluations should be cleared
  - remove_case(trial_id, **kwargs)[source]
    Removes a data case previously appended by _update().
    For searchers which maintain the dataset of all cases (reports) passed to update, this method allows to remove one case from the dataset.
    - Parameters:
      trial_id (str) – ID of trial whose data is to be removed
      kwargs – Extra arguments, optional
  - clone_from_state(state)[source]
    Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
    Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
    - Parameters:
      state – See above
    - Returns: New searcher object
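A hedged sketch of the recommended route for this searcher follows; it wires GPMultiFidelitySearcher into HyperbandScheduler via searcher="bayesopt" (this combination is often referred to as MOBSTER). The configuration space, metric, and resource attribute names are placeholders, and search_options only uses arguments documented above.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 8),
    "epochs": 27,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="bayesopt",                                  # creates a GPMultiFidelitySearcher internally
    search_options={"model": "gp_multitask", "resource_acq": "bohb"},
    metric="val_loss",
    mode="min",
    resource_attr="epoch",          # reported by the training script at the end of each epoch
    max_resource_attr="epochs",     # key in config_space holding the maximum resource
)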
syne_tune.optimizer.schedulers.searchers.gp_searcher_factory module
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_fifo_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see FIFOScheduler).
  Extensions of kwargs by the scheduler:
  - scheduler: Name of scheduler ("fifo", "hyperband_*")
  - config_space: Configuration space
  Only Hyperband schedulers:
  - resource_attr: Name of resource (or time) attribute
  - max_epochs: Maximum resource value
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_multifidelity_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see HyperbandScheduler).
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.constrained_gp_fifo_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see FIFOScheduler).
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_coarse_gp_fifo_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see FIFOScheduler).
  This is for the coarse-grained variant, where costs \(c(x)\) are obtained together with metric values and are given a GP surrogate model.
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_fine_gp_fifo_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see FIFOScheduler).
  This is for the fine-grained variant, where costs \(c(x, r)\) are obtained with each report and are represented by a CostModel surrogate model.
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_multifidelity_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see HyperbandScheduler).
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.hypertune_searcher_factory(**kwargs)[source]
  Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by the scheduler (see HyperbandScheduler).
  - Parameters:
    kwargs – search_options coming from scheduler
  - Return type: Dict[str, Any]
  - Returns: kwargs for _create_internal()
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_fifo_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for GPFIFOSearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_multifidelity_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for GPMultiFidelitySearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.constrained_gp_fifo_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for ConstrainedGPFIFOSearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_fifo_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for CostAwareGPFIFOSearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_multifidelity_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for CostAwareGPMultiFidelitySearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
- syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.hypertune_searcher_defaults(kwargs)[source]
  Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for HyperTuneSearcher.
  - Return type: (Set[str], dict, dict)
  - Returns: (mandatory, default_options, config_space)
syne_tune.optimizer.schedulers.searchers.gp_searcher_utils module
- class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.MapReward(forward, reverse)[source]
  Bases: object
  - forward: Callable[[float], float]
  - reverse: Callable[[float], float]
- syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.map_reward_const_minus_x(const=1.0)[source]
  Factory for the map_reward argument in GPMultiFidelitySearcher.
  - Return type: MapReward
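A small hedged sketch of MapReward, using only the forward and reverse fields documented above (the example values are made up):

from syne_tune.optimizer.schedulers.searchers.gp_searcher_utils import map_reward_const_minus_x

map_reward = map_reward_const_minus_x(const=1.0)  # criterion = 1 - metric
criterion = map_reward.forward(0.9)               # e.g. accuracy 0.9 -> zero-one error 0.1
metric_back = map_reward.reverse(criterion)       # maps back to 0.9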
- syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.encode_state(state)[source]
  - Return type: Dict[str, Any]
- syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.decode_state(enc_state, hp_ranges)[source]
  - Return type:
- syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.decode_state_from_old_encoding(enc_state, hp_ranges)[source]
  Decodes TuningJobState from an encoding done for the old definition of TuningJobState. Code maintained for backwards compatibility.
  Note: Since the old TuningJobState did not contain trial_id, we need to make them up here. We assign these IDs in the order candidate_evaluations, failed_candidates, pending_candidates, matching for duplicates.
  - Parameters:
    enc_state (Dict[str, Any]) –
    hp_ranges (HyperparameterRanges) –
  - Return type:
  - Returns:
- class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionMap[source]
  Bases: object
  In order to use a standard acquisition function (like expected improvement) for multi-fidelity HPO, we need to decide at which r_acq we would like to evaluate the AF, w.r.t. the posterior distribution over f(x, r=r_acq). This decision can depend on the current state.
- class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionBOHB(threshold, active_metric='target')[source]
  Bases: ResourceForAcquisitionMap
  Implements a heuristic proposed in the BOHB paper: r_acq is the largest r such that we have at least threshold observations at r. If there are fewer than threshold observations at all levels, the smallest level is returned.
- class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionFirstMilestone[source]
  Bases: ResourceForAcquisitionMap
  Here, r_acq is the smallest rung level to be attained by a config started from scratch.
- class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionFinal(r_max)[source]
  Bases: ResourceForAcquisitionMap
  Here, r_acq = r_max is the largest resource level.
syne_tune.optimizer.schedulers.searchers.model_based_searcher module
- syne_tune.optimizer.schedulers.searchers.model_based_searcher.check_initial_candidates_scorer(initial_scoring)[source]
  - Return type: str
- class syne_tune.optimizer.schedulers.searchers.model_based_searcher.ModelBasedSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
  Bases: StochasticSearcher
  Common code for surrogate model based searchers.
  If num_initial_random_choices > 0, initial configurations are drawn using an internal RandomSearcher object, which is created in _assign_random_searcher(). This internal random searcher shares random_state with the searcher here. This ensures that if ModelBasedSearcher and RandomSearcher objects are created with the same random_seed and points_to_evaluate argument, initial configurations are identical until _get_config_modelbased() kicks in.
  Note that this works because random_state is only used in the internal random searcher until _get_config_modelbased() is first called.
  - on_trial_result(trial_id, config, result, update)[source]
    Informs the searcher about a result.
    The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.
    The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.
    - Parameters:
      trial_id (str) – See on_trial_result()
      config (Dict[str, Any]) – See on_trial_result()
      result (Dict[str, Any]) – See on_trial_result()
      update (bool) – Should surrogate model be updated?
  - get_config(**kwargs)[source]
    Runs Bayesian optimization in order to suggest the next config to evaluate.
    - Return type: Optional[Dict[str, Any]]
    - Returns: Next config to evaluate at
  - dataset_size()[source]
    - Returns: Size of dataset a model is fitted to, or 0 if no model is fitted to data
  - model_parameters()[source]
    - Returns: Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
  - get_state()[source]
    The mutable state consists of the GP model parameters, the TuningJobState, and the skip_optimization predicate (which can have a mutable state). We assume that skip_optimization can be pickled.
    Note that we do not have to store the state of _random_searcher, since this internal searcher shares its random_state with the searcher here.
    - Return type: Dict[str, Any]
  - property debug_log
    Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.
    - Returns: debug_log object or None (not supported)
- syne_tune.optimizer.schedulers.searchers.model_based_searcher.create_initial_candidates_scorer(initial_scoring, predictor, acquisition_class, random_state, active_metric='target')[source]
  - Return type:
- class syne_tune.optimizer.schedulers.searchers.model_based_searcher.BayesianOptimizationSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
  Bases: ModelBasedSearcher
  Common code for searchers using Bayesian optimization.
  We implement Bayesian optimization, based on a model factory which parameterizes the state transformer. This implementation works with any type of surrogate model and acquisition function, which are compatible with each other.
  The following happens in get_config():
  - For the first num_init_random calls, a config is drawn at random (after points_to_evaluate, which are included in the num_init_random initial ones). Afterwards, Bayesian optimization is used, unless there are no finished evaluations yet (a surrogate model cannot be used with no data at all)
  - For BO, model hyperparameters are refit first. This step can be skipped (see the opt_skip_* parameters).
  - Next, the BO decision is made based on BayesianOptimizationAlgorithm. This involves sampling num_init_candidates configs at random, ranking them with a scoring function (initial_scoring), and finally running local optimization starting from the top-scoring config.
  - configure_scheduler(scheduler)[source]
    Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
    - Parameters:
      scheduler (TrialScheduler) – Scheduler the searcher is used with.
  - register_pending(trial_id, config=None, milestone=None)[source]
    Registers a trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.
  - get_batch_configs(batch_size, num_init_candidates_for_batch=None, **kwargs)[source]
    Asks for a batch of batch_size configurations to be suggested. This is roughly equivalent to calling get_config batch_size times, marking the suggested configs as pending in the state (but the state is not modified here). This means the batch is chosen sequentially, at about the cost of calling get_config batch_size times.
    If num_init_candidates_for_batch is given, it is used instead of num_init_candidates for the selection of all but the first config in the batch. In order to speed up batch selection, choose num_init_candidates_for_batch smaller than num_init_candidates.
    If fewer than batch_size configs are returned, the search space has been exhausted.
    Note: Batch selection does not support debug_log right now: make sure to switch this off when creating scheduler and searcher. A usage sketch follows below.
    - Return type: List[Dict[str, Union[int, float, str]]]
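A hedged sketch of the batch suggestion described above. It assumes a FIFOScheduler with searcher="bayesopt" has already been created as scheduler (see the GPFIFOSearcher example earlier) and that the searcher object is reachable via the scheduler's searcher attribute; only the documented get_batch_configs signature is used.

# `scheduler` is assumed to exist already; its searcher is a BayesianOptimizationSearcher
searcher = scheduler.searcher

# Roughly equivalent to calling get_config() four times; a smaller
# num_init_candidates_for_batch speeds up selection of configs 2..4.
batch = searcher.get_batch_configs(batch_size=4, num_init_candidates_for_batch=50)
if len(batch) < 4:
    print("Search space exhausted")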
syne_tune.optimizer.schedulers.searchers.random_grid_searcher module
- class syne_tune.optimizer.schedulers.searchers.random_grid_searcher.RandomSearcher(config_space, metric, points_to_evaluate=None, debug_log=False, resource_attr=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]
  Bases: StochasticAndFilterDuplicatesSearcher
  Searcher which randomly samples configurations to try next.
  Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:
  - Parameters:
    debug_log (Union[bool, DebugLogPrinter]) – If True, debug log printing is activated. Logs which configs are chosen when, and which metric values are obtained. Defaults to False
    resource_attr (Optional[str]) – Optional. Key in result passed to _update() for resource value (for multi-fidelity schedulers)
  - configure_scheduler(scheduler)[source]
    Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
    - Parameters:
      scheduler (TrialScheduler) – Scheduler the searcher is used with.
  - clone_from_state(state)[source]
    Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
    Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
    - Parameters:
      state (Dict[str, Any]) – See above
    - Returns: New searcher object
  - property debug_log
    Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.
    - Returns: debug_log object or None (not supported)
- class syne_tune.optimizer.schedulers.searchers.random_grid_searcher.GridSearcher(config_space, metric, points_to_evaluate=None, num_samples=None, shuffle_config=True, allow_duplicates=False, **kwargs)[source]
  Bases: StochasticSearcher
  Searcher that samples configurations from an equally spaced grid over config_space.
  It first evaluates configurations defined in points_to_evaluate and then continues with the remaining points from the grid.
  Additional arguments on top of parent class StochasticSearcher.
  - Parameters:
    num_samples (Optional[Dict[str, int]]) – Dictionary, optional. Number of samples per hyperparameter. This is required for hyperparameters of type float, optional for integer hyperparameters, and will be ignored for other types (categorical, scalar). If left unspecified, a default value of DEFAULT_NSAMPLE will be used for float parameters, and the smaller of DEFAULT_NSAMPLE and the integer range will be used for integer parameters.
    shuffle_config (bool) – If True (default), the order of configurations suggested after those specified in points_to_evaluate is shuffled. Otherwise, the order will follow the Cartesian product of the configurations.
    allow_duplicates (bool) – If True, get_config() may return the same configuration more than once. Defaults to False
  - get_config(**kwargs)[source]
    Select the next configuration from the grid.
    This is done without replacement, so previously returned configs are not suggested again.
    - Return type: Optional[dict]
    - Returns: A new configuration that is valid, or None if no new config can be suggested. The returned configuration is a dictionary that maps hyperparameters to its values.
  - get_state()[source]
    Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.
    - Return type: Dict[str, Any]
    - Returns: Pickle-able mutable state of searcher
  - clone_from_state(state)[source]
    Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
    Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.
    - Parameters:
      state (Dict[str, Any]) – See above
    - Returns: New searcher object
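A hedged sketch of GridSearcher using only the constructor arguments and get_config() documented above; the configuration space and metric name are placeholders.

from syne_tune.config_space import uniform, choice
from syne_tune.optimizer.schedulers.searchers.random_grid_searcher import GridSearcher

config_space = {
    "learning_rate": uniform(1e-4, 1e-1),
    "optimizer": choice(["sgd", "adam"]),
}
searcher = GridSearcher(
    config_space,
    metric="val_loss",
    num_samples={"learning_rate": 5},  # 5 grid points for the float parameter
    shuffle_config=False,              # keep Cartesian-product order
)
config = searcher.get_config()         # next grid point, or None when the grid is exhausted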
syne_tune.optimizer.schedulers.searchers.regularized_evolution module
- class syne_tune.optimizer.schedulers.searchers.regularized_evolution.PopulationElement(result=None, score=0, config=None)[source]
Bases:
object
-
result:
Dict
[str
,Any
] = None
-
score:
int
= 0
-
config:
Dict
[str
,Any
] = None
-
result:
- class syne_tune.optimizer.schedulers.searchers.regularized_evolution.RegularizedEvolution(config_space, metric, points_to_evaluate=None, population_size=100, sample_size=10, **kwargs)[source]
Bases:
StochasticSearcher
Implements the regularized evolution algorithm. The original implementation only considers categorical hyperparameters. For integer and float parameters we sample a new value uniformly at random. Reference:
Real, E., Aggarwal, A., Huang, Y., and Le, Q. V.Regularized Evolution for Image Classifier Architecture Search.In Proceedings of the Conference on Artificial Intelligence (AAAI’19)The code is based one the original regularized evolution open-source implementation: https://colab.research.google.com/github/google-research/google-research/blob/master/evolution/regularized_evolution_algorithm/regularized_evolution.ipynb
Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
mode – Mode to use for the metric given, can be “min” or “max”, defaults to “min”
population_size (
int
) – Size of the population, defaults to 100sample_size (
int
) – Size of the candidate set to obtain a parent for the mutation, defaults to 10
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[dict
]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
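As an illustration, the following sketch plugs RegularizedEvolution into a FIFOScheduler. Passing a searcher instance (rather than a string) to FIFOScheduler, and the hyperparameter and metric names used here, are assumptions for this sketch, not guarantees of the API.

from syne_tune.config_space import choice, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.regularized_evolution import RegularizedEvolution

config_space = {
    "cell_type": choice(["conv3x3", "conv5x5", "skip"]),
    "dropout": uniform(0.0, 0.5),
}
searcher = RegularizedEvolution(
    config_space,
    metric="validation_error",
    mode="min",
    population_size=20,  # smaller than the default of 100, for a short run
    sample_size=5,
)
scheduler = FIFOScheduler(
    config_space, searcher=searcher, metric="validation_error", mode="min"
)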
syne_tune.optimizer.schedulers.searchers.searcher module
- syne_tune.optimizer.schedulers.searchers.searcher.impute_points_to_evaluate(points_to_evaluate, config_space)[source]
Transforms
points_to_evaluate
argument toBaseSearcher
. Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. Also, duplicate entries are filtered out. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.- Parameters:
points_to_evaluate (
Optional
[List
[Dict
[str
,Any
]]]) – Argument toBaseSearcher
config_space (
Dict
[str
,Any
]) – Configuration space
- Return type:
List
[Dict
[str
,Any
]]- Returns:
List of fully specified initial configs
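A small sketch of the imputation behaviour described above; the hyperparameter names are made up for illustration.

from syne_tune.config_space import choice, randint, uniform
from syne_tune.optimizer.schedulers.searchers.searcher import impute_points_to_evaluate

config_space = {
    "lr": uniform(1e-4, 1e-1),
    "batch_size": randint(16, 256),
    "optimizer": choice(["sgd", "adam"]),
}
# Missing values are filled in by the midpoint heuristic; the two empty
# dicts map to the same default config, so the duplicate is filtered out
points = impute_points_to_evaluate([{"lr": 0.01}, {}, {}], config_space)
print(points)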
- class syne_tune.optimizer.schedulers.searchers.searcher.BaseSearcher(config_space, metric, points_to_evaluate=None, mode='min')[source]
Bases:
object
Base class of searchers, which are components of schedulers responsible for implementing
get_config()
.Note
This is an abstract base class. In order to implement a new searcher, try to start from
StochasticAndFilterDuplicatesSearcher
orStochasticSearcher
, which implement generally useful properties.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration spacemetric (
Union
[List
[str
],str
]) –Name of metric passed to
update()
. Can be obtained from scheduler inconfigure_scheduler()
. In the case of multi-objective optimization,metric is a list of strings specifying all objectives to be optimized.
points_to_evaluate (
Optional
[List
[Dict
[str
,Any
]]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. IfNone
(default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.mode (
Union
[List
[str
],str
]) – Should metric be minimized (“min”, default) or maximized (“max”). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized
- configure_scheduler(scheduler)[source]
Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.
- Parameters:
scheduler (
TrialScheduler
) – Scheduler the searcher is used with.
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[Dict
[str
,Any
]]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- on_trial_result(trial_id, config, result, update)[source]
Inform searcher about result
The scheduler passes every result. If
update == True
, the searcher should update its surrogate model (if any), otherwiseresult
is an intermediate result not modelled.The default implementation calls
_update()
ifupdate == True
. It can be overwritten by searchers which also react to intermediate results.- Parameters:
trial_id (
str
) – Seeon_trial_result()
config (
Dict
[str
,Any
]) – Seeon_trial_result()
result (
Dict
[str
,Any
]) – Seeon_trial_result()
update (
bool
) – Should surrogate model be updated?
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[Dict
[str
,Any
]]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that model registers(config, milestone)
as pending.
- remove_case(trial_id, **kwargs)[source]
Remove data case previously appended by
_update()
For searchers which maintain the dataset of all cases (reports) passed to update, this method allows to remove one case from the dataset.
- Parameters:
trial_id (
str
) – ID of trial whose data is to be removedkwargs – Extra arguments, optional
- evaluation_failed(trial_id)[source]
Called by scheduler if an evaluation job for a trial failed.
The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).
- Parameters:
trial_id (
str
) – ID of trial whose evaluation failed
- cleanup_pending(trial_id)[source]
Removes all pending evaluations for trial
trial_id
.This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.
- Parameters:
trial_id (
str
) – ID of trial whose pending evaluations should be cleared
- dataset_size()[source]
- Returns:
Size of dataset a model is fitted to, or 0 if no model is fitted to data
- model_parameters()[source]
- Returns:
Dictionary with current model (hyper)parameter values if this is supported; otherwise empty
- get_state()[source]
Together with
clone_from_state()
, this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.- Return type:
Dict
[str
,Any
]- Returns:
Pickle-able mutable state of searcher
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- property debug_log: DebugLogPrinter | None
Some subclasses support writing a debug log, using
DebugLogPrinter
. SeeRandomSearcher
for an example.- Returns:
debug_log object, or None (not supported)
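To make the base-class contract more concrete, here is a hedged sketch of a custom searcher built on StochasticSearcher. The use of _next_initial_config(), random_state, and the _update() hook follows the descriptions above; sampling via Domain.sample(random_state=...) is an assumption about the config-space API.

from syne_tune.optimizer.schedulers.searchers.searcher_base import StochasticSearcher

class GreedyRandomSearcher(StochasticSearcher):
    """Hypothetical searcher: random suggestions, tracks the best metric seen."""

    def __init__(self, config_space, metric, **kwargs):
        super().__init__(config_space, metric=metric, **kwargs)
        self._metric_name = metric
        self._best_val = None

    def get_config(self, **kwargs):
        # Serve points_to_evaluate first, as recommended above
        config = self._next_initial_config()
        if config is None:
            config = {
                name: domain.sample(random_state=self.random_state)
                if hasattr(domain, "sample")
                else domain  # constant entries in the config space
                for name, domain in self.config_space.items()
            }
        return config

    def _update(self, trial_id, config, result):
        val = result[self._metric_name]
        if self._best_val is None or val < self._best_val:
            self._best_val = val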
syne_tune.optimizer.schedulers.searchers.searcher_base module
- syne_tune.optimizer.schedulers.searchers.searcher_base.extract_random_seed(**kwargs)[source]
- Return type:
(
int
,Dict
[str
,Any
])
- syne_tune.optimizer.schedulers.searchers.searcher_base.sample_random_configuration(hp_ranges, random_state, exclusion_list=None)[source]
Samples a configuration from
config_space
at random.- Parameters:
hp_ranges (
HyperparameterRanges
) – Used for sampling configurationsrandom_state (
RandomState
) – PRN generatorexclusion_list (
Optional
[ExclusionList
]) – Configurations not to be returned
- Return type:
Optional
[Dict
[str
,Any
]]- Returns:
New configuration, or
None
if configuration space has been exhausted
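For illustration, a sketch of drawing a random configuration directly. The helper make_hyperparameter_ranges is assumed to live in syne_tune.optimizer.schedulers.searchers.utils; check your version.

import numpy as np

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.searcher_base import sample_random_configuration
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

config_space = {"lr": uniform(1e-4, 1e-1), "batch_size": randint(16, 256)}
hp_ranges = make_hyperparameter_ranges(config_space)
random_state = np.random.RandomState(31)
config = sample_random_configuration(hp_ranges, random_state)
print(config)  # e.g. {"lr": 0.023, "batch_size": 112}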
- class syne_tune.optimizer.schedulers.searchers.searcher_base.StochasticSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]
Bases:
BaseSearcher
Base class of searchers which use random decisions. Creates the
random_state
member, which must be used for all random draws.Making proper use of this interface allows us to run experiments with control of random seeds, e.g. for paired comparisons or integration testing.
Additional arguments on top of parent class
BaseSearcher
:- Parameters:
random_seed_generator (
RandomSeedGenerator
, optional) – If given, random seed is drawn from thererandom_seed (int, optional) – Used if
random_seed_generator
is not given.
- class syne_tune.optimizer.schedulers.searchers.searcher_base.StochasticAndFilterDuplicatesSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]
Bases:
StochasticSearcher
Base class for searchers with the following properties:
Random decisions use common
random_state
Maintains an exclusion list to filter out duplicates in get_config() if allow_duplicates == False. If this is True, duplicates are not filtered, and the exclusion list is used only to avoid configurations of failed trials.

If
restrict_configurations
is given, this is a list of configurations, and the searcher only suggests configurations from there. Ifallow_duplicates == False
, entries are popped off this list once suggested.points_to_evaluate
is filtered to only contain entries in this set.
In order to make use of these features:
Reject configurations in
get_config()
ifshould_not_suggest()
returnsTrue
. If the configuration is drawn at random, use_get_random_config()
, which incorporates this filteringImplement
_get_config()
instead ofget_config()
. The latter adds the new config to the exclusion list ifallow_duplicates == False
Note: Not all searchers which filter duplicates make use of this class.
Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
allow_duplicates (
Optional
[bool
]) – See above. Defaults toFalse
restrict_configurations (
Optional
[List
[Dict
[str
,Any
]]]) – See above, optional
- property allow_duplicates: bool
- should_not_suggest(config)[source]
- Parameters:
config (
Dict
[str
,Any
]) – Configuration- Return type:
bool
- Returns:
get_config()
should not suggest this configuration?
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[Dict
[str
,Any
]]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.
- register_pending(trial_id, config=None, milestone=None)[source]
Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.
- Parameters:
trial_id (
str
) – ID of trial to be registered as pending evaluationconfig (
Optional
[Dict
[str
,Any
]]) – Iftrial_id
has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.milestone (
Optional
[int
]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that model registers(config, milestone)
as pending.
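Following the recipe above, a minimal sketch of a subclass: it implements _get_config() and draws candidates with _get_random_config(), which applies the exclusion-list filtering. Whether the base class also serves points_to_evaluate before calling _get_config() is not shown here and may depend on the version.

from syne_tune.optimizer.schedulers.searchers.searcher_base import (
    StochasticAndFilterDuplicatesSearcher,
)

class FilteredRandomSearcher(StochasticAndFilterDuplicatesSearcher):
    def _get_config(self, **kwargs):
        # _get_random_config() rejects configurations for which
        # should_not_suggest() returns True (duplicates, failed trials)
        return self._get_random_config()

    def _update(self, trial_id, config, result):
        pass  # model-free searcher: observations are not used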
syne_tune.optimizer.schedulers.searchers.searcher_callback module
- class syne_tune.optimizer.schedulers.searchers.searcher_callback.StoreResultsAndModelParamsCallback(add_wallclock_time=True)[source]
Bases:
StoreResultsCallback
Extends
StoreResultsCallback
by also storing the current surrogate model parameters inon_trial_result()
. This works for schedulers with model-based searchers. For other schedulers, this callback behaves the same as the superclass.- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
- class syne_tune.optimizer.schedulers.searchers.searcher_callback.SimulatorAndModelParamsCallback[source]
Bases:
SimulatorCallback
Extends
SimulatorCallback
by also storing the current surrogate model parameters inon_trial_result()
. This works for schedulers with model-based searchers. For other schedulers, this callback behaves the same as the superclass.- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
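As an illustration, either callback is passed to the Tuner via its callbacks argument. The entry point train_script.py is a hypothetical training script; the Tuner, LocalBackend, and StoppingCriterion wiring follows the common Syne Tune pattern but is a sketch, not a verbatim recipe from this page.

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.searcher_callback import (
    StoreResultsAndModelParamsCallback,
)

config_space = {"lr": uniform(1e-4, 1e-1)}
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # hypothetical script
    scheduler=FIFOScheduler(
        config_space, searcher="bayesopt", metric="validation_error", mode="min"
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
    callbacks=[StoreResultsAndModelParamsCallback()],
)
tuner.run()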
syne_tune.optimizer.schedulers.searchers.searcher_factory module
- syne_tune.optimizer.schedulers.searchers.searcher_factory.searcher_factory(searcher_name, **kwargs)[source]
Factory for searcher objects
This function creates searcher objects from string argument name and additional kwargs. It is typically called in the constructor of a scheduler (see
FIFOScheduler
), which provides most of the requiredkwargs
.- Parameters:
searcher_name (
str
) – Value ofsearcher
argument to scheduler (seeFIFOScheduler
)kwargs – Argument to
BaseSearcher
constructor
- Return type:
- Returns:
New searcher object
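A short sketch of calling the factory directly; which kwargs are required depends on the searcher, and config_space plus metric are assumed here to be sufficient for "random".

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers.searchers.searcher_factory import searcher_factory

searcher = searcher_factory(
    "random",
    config_space={"lr": uniform(1e-4, 1e-1)},
    metric="validation_error",
    mode="min",
)
print(searcher.get_config())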
syne_tune.optimizer.schedulers.synchronous package
- class syne_tune.optimizer.schedulers.synchronous.SynchronousHyperbandScheduler(config_space, bracket_rungs, **kwargs)[source]
Bases:
SynchronousHyperbandCommon
,DefaultRemoveCheckpointsSchedulerMixin
Synchronous Hyperband. Compared to
HyperbandScheduler
, this also schedules jobs asynchronously, but decision-making is synchronized, in that trials are only promoted to the next milestone once the rung they are currently paused at is completely occupied.

Our implementation never delays scheduling of a job. If the currently active bracket does not accept jobs, we assign the job to a later bracket. This means that at any point in time, several brackets can be active, but jobs are preferentially assigned to the first one (the “primary” active bracket).
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionbracket_rungs (
List
[List
[Tuple
[int
,int
]]]) – Determines rung level systems for each bracket, seeSynchronousHyperbandBracketManager
metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Searcher for
get_config
decisions. Passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_HYPERBAND
. Defaults to “random” (i.e., random search)search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. IfNone
(default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
- property rung_levels: List[int]
- Returns:
Rung levels (positive int; increasing), may or may not include
max_resource_level
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_error(trial)[source]
Given the
trial
is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.
- metric_names()[source]
- Return type:
List
[str
]- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective metric (for example, for sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- trials_checkpoints_can_be_removed()[source]
Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning
SchedulerDecision.STOP
inon_trial_result()
), its checkpoint is removed anyway, so its ID does not have to be returned here.- Return type:
List
[int
]- Returns:
IDs of paused trials for which checkpoints can be removed
- class syne_tune.optimizer.schedulers.synchronous.SynchronousGeometricHyperbandScheduler(config_space, **kwargs)[source]
Bases:
SynchronousHyperbandScheduler
Special case of
SynchronousHyperbandScheduler
with rung system defined by geometric sequences (seeSynchronousHyperbandRungSystem.geometric()
). This is the most frequently used case.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionmetric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1
reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3
brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.
searcher (str, optional) – Selects searcher. Passed to
searcher_factory()
. Defaults to “random”search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
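For orientation, a sketch of running this scheduler end to end. train_script.py is a hypothetical training script reporting "epoch" and "validation_error" via syne_tune.Reporter; the backend and tuner wiring follow the usual Syne Tune pattern but are not taken verbatim from this page.

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous import SynchronousGeometricHyperbandScheduler

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 27,  # maximum resource, passed via max_resource_attr
}
scheduler = SynchronousGeometricHyperbandScheduler(
    config_space,
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # hypothetical script
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=1800),
    n_workers=4,
)
tuner.run()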
- class syne_tune.optimizer.schedulers.synchronous.DifferentialEvolutionHyperbandScheduler(config_space, rungs_first_bracket, num_brackets_per_iteration=None, **kwargs)[source]
Bases:
SynchronousHyperbandCommon
Differential Evolution Hyperband, as proposed in
DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization. Noor Awad, Neeratyoy Mallik, Frank Hutter. IJCAI 30 (2021), pages 2147-2153.

We implement DEHB as a variant of synchronous Hyperband, which may differ slightly from the implementation of the authors. Main differences to synchronous Hyperband:
In DEHB, trials are not paused and potentially promoted (except in the very first bracket). Therefore, checkpointing is not used (except in the very first bracket, if
support_pause_resume
isTrue
)Only the initial configurations are drawn at random (or drawn from the searcher). Whenever possible, new configurations (in their internal encoding) are derived from earlier ones by way of differential evolution
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionrungs_first_bracket (
List
[Tuple
[int
,int
]]) – Determines rung level systems for each bracket, seeDifferentialEvolutionHyperbandBracketManager
num_brackets_per_iteration (
Optional
[int
]) – Number of brackets per iteration. The algorithm cycles through these brackets in one iteration. If not given, the maximum number is used (i.e.,len(rungs_first_bracket)
)metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Searcher for
get_config
decisions. Passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_HYPERBAND
. Ifsearcher == "random_encoded"
(default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters inconfig_space
. Note thatpoints_to_evaluate
is still used in this case.search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
. Note: Ifsearch_options["allow_duplicates"] == True
, thensuggest()
may return a configuration more than oncemode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Default to 0.5
crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5
support_pause_resume (bool, optional) – If
True
,_suggest()
supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults toTrue
. Note: The resumed trial still gets assigned a newtrial_id
, but it starts from the earlier checkpoint.searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
- MAX_RETRIES = 50
- property rung_levels: List[int]
- Returns:
Rung levels (positive int; increasing), may or may not include
max_resource_level
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_error(trial)[source]
Given the
trial
is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.
- class syne_tune.optimizer.schedulers.synchronous.GeometricDifferentialEvolutionHyperbandScheduler(config_space, **kwargs)[source]
Bases:
DifferentialEvolutionHyperbandScheduler
Special case of
DifferentialEvolutionHyperbandScheduler
with rung system defined by geometric sequences. This is the most frequently used case.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functiongrace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1
reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3
brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.
metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Selects searcher. Passed to
searcher_factory()
. If searcher == "random_encoded"
(default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters inconfig_space
. Note thatpoints_to_evaluate
is still used in this case.search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Default to 0.5
crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5
support_pause_resume (bool, optional) – If
True
,_suggest()
supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults toTrue
. Note: The resumed trial still gets assigned a newtrial_id
, but it starts from the earlier checkpoint.
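The geometric DEHB variant is constructed the same way; a brief sketch follows (the hyperparameter and metric names are illustrative, not prescribed by the API).

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous import (
    GeometricDifferentialEvolutionHyperbandScheduler,
)

config_space = {"lr": loguniform(1e-5, 1e-1), "batch_size": randint(16, 256), "epochs": 27}
scheduler = GeometricDifferentialEvolutionHyperbandScheduler(
    config_space,
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
    mutation_factor=0.5,        # DE rand/1 mutation factor F
    crossover_probability=0.5,  # probability p in the crossover operation
)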
Submodules
syne_tune.optimizer.schedulers.synchronous.dehb module
- class syne_tune.optimizer.schedulers.synchronous.dehb.TrialInformation(encoded_config, level, metric_val=None)[source]
Bases:
object
Information the scheduler maintains per trial.
- encoded_config: ndarray
- level: int
- metric_val: Optional[float] = None
- class syne_tune.optimizer.schedulers.synchronous.dehb.ExtendedSlotInRung(bracket_id, slot_in_rung)[source]
Bases:
object
Extends
SlotInRung
mostly for convenience
- class syne_tune.optimizer.schedulers.synchronous.dehb.DifferentialEvolutionHyperbandScheduler(config_space, rungs_first_bracket, num_brackets_per_iteration=None, **kwargs)[source]
Bases:
SynchronousHyperbandCommon
Differential Evolution Hyperband, as proposed in
DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization. Noor Awad, Neeratyoy Mallik, Frank Hutter. IJCAI 30 (2021), pages 2147-2153.

We implement DEHB as a variant of synchronous Hyperband, which may differ slightly from the implementation of the authors. Main differences to synchronous Hyperband:
In DEHB, trials are not paused and potentially promoted (except in the very first bracket). Therefore, checkpointing is not used (except in the very first bracket, if
support_pause_resume
isTrue
)Only the initial configurations are drawn at random (or drawn from the searcher). Whenever possible, new configurations (in their internal encoding) are derived from earlier ones by way of differential evolution
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionrungs_first_bracket (
List
[Tuple
[int
,int
]]) – Determines rung level systems for each bracket, seeDifferentialEvolutionHyperbandBracketManager
num_brackets_per_iteration (
Optional
[int
]) – Number of brackets per iteration. The algorithm cycles through these brackets in one iteration. If not given, the maximum number is used (i.e.,len(rungs_first_bracket)
)metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Searcher for
get_config
decisions. Passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_HYPERBAND
. Ifsearcher == "random_encoded"
(default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters inconfig_space
. Note thatpoints_to_evaluate
is still used in this case.search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
. Note: Ifsearch_options["allow_duplicates"] == True
, thensuggest()
may return a configuration more than oncemode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Default to 0.5
crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5
support_pause_resume (bool, optional) – If
True
,_suggest()
supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults toTrue
. Note: The resumed trial still gets assigned a newtrial_id
, but it starts from the earlier checkpoint.searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
- MAX_RETRIES = 50
- property rung_levels: List[int]
- Returns:
Rung levels (positive int; increasing), may or may not include
max_resource_level
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_error(trial)[source]
Given the
trial
is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.
syne_tune.optimizer.schedulers.synchronous.dehb_bracket module
- class syne_tune.optimizer.schedulers.synchronous.dehb_bracket.DifferentialEvolutionHyperbandBracket(rungs, mode)[source]
Bases:
SynchronousBracket
Represents a bracket in Differential Evolution Hyperband (DEHB).
There are a number of differences to brackets in standard synchronous Hyperband (SynchronousHyperbandBracket):

on_result(): result.trial_id overwrites trial_id in the rung, even if the latter is not None

Promotions are not triggered automatically when a rung is complete
Some additional methods
- property num_rungs: int
syne_tune.optimizer.schedulers.synchronous.dehb_bracket_manager module
- class syne_tune.optimizer.schedulers.synchronous.dehb_bracket_manager.DifferentialEvolutionHyperbandBracketManager(rungs_first_bracket, mode, num_brackets_per_iteration=None)[source]
Bases:
SynchronousHyperbandBracketManager
Special case of
SynchronousHyperbandBracketManager
to manage DEHB brackets (typeDifferentialEvolutionHyperbandBracket
).In DEHB, the list of brackets is determined by the first one and the number of brackets. Also, later brackets have less total budget, because the size of a rung is determined by its level, independent of the bracket. This is different to what is done in synchronous Hyperband, where the rungs of later brackets have larger sizes, so the total budget of each bracket is the same.
We also need additional methods to access trial_id’s in specific rungs, as well as entries of the top lists for completed rungs. This is because DEHB controls the creation of new configurations at higher rungs, while synchronous Hyperband relies on automatic promotion from lower rungs.
- trial_id_from_parent_slot(bracket_id, level, slot_index)[source]
The parent slot has the same slot index and rung level in the largest bracket
< bracket_id
with a trial_id not None. If no such slot exists, None is returned. For a cross-over or selection operation, the target is chosen from the parent slot.- Return type:
Optional
[int
]
syne_tune.optimizer.schedulers.synchronous.hyperband module
- class syne_tune.optimizer.schedulers.synchronous.hyperband.SynchronousHyperbandCommon(config_space, **kwargs)[source]
Bases:
TrialSchedulerWithSearcher
,MultiFidelitySchedulerMixin
Common code for
_create_internal()
inSynchronousHyperbandScheduler
andDifferentialEvolutionHyperbandScheduler
- property searcher: BaseSearcher | None
- property resource_attr: str
- Returns:
Name of resource attribute in reported results
- property max_resource_level: int
- Returns:
Maximum resource level
- property searcher_data: str
- Returns:
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels.searcher_data
determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensiveget_config()
may become. Choices:”rungs”: Only results at rung levels. Cheapest
”all”: All results. Most expensive
”rungs_and_last”: Results at rung levels plus last recent one. Not available for all multi-fidelity schedulers
- class syne_tune.optimizer.schedulers.synchronous.hyperband.SynchronousHyperbandScheduler(config_space, bracket_rungs, **kwargs)[source]
Bases:
SynchronousHyperbandCommon
,DefaultRemoveCheckpointsSchedulerMixin
Synchronous Hyperband. Compared to
HyperbandScheduler
, this also schedules jobs asynchronously, but decision-making is synchronized, in that trials are only promoted to the next milestone once the rung they are currently paused at is completely occupied.

Our implementation never delays scheduling of a job. If the currently active bracket does not accept jobs, we assign the job to a later bracket. This means that at any point in time, several brackets can be active, but jobs are preferentially assigned to the first one (the “primary” active bracket).
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionbracket_rungs (
List
[List
[Tuple
[int
,int
]]]) – Determines rung level systems for each bracket, seeSynchronousHyperbandBracketManager
metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Searcher for
get_config
decisions. Passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_HYPERBAND
. Defaults to “random” (i.e., random search)search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. IfNone
(default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
- property rung_levels: List[int]
- Returns:
Rung levels (positive int; increasing), may or may not include
max_resource_level
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_error(trial)[source]
Given the
trial
is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.
- metric_names()[source]
- Return type:
List
[str
]- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective metric (for example, for sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- trials_checkpoints_can_be_removed()[source]
Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning
SchedulerDecision.STOP
inon_trial_result()
), its checkpoint is removed anyway, so its ID does not have to be returned here.- Return type:
List
[int
]- Returns:
IDs of paused trials for which checkpoints can be removed
syne_tune.optimizer.schedulers.synchronous.hyperband_bracket module
- class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SlotInRung(rung_index, level, slot_index, trial_id, metric_val)[source]
Bases:
object
Used to communicate slot positions and content for them.
- rung_index: int
- level: int
- slot_index: int
- trial_id: Optional[int]
- metric_val: Optional[float]
- class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SynchronousBracket(mode)[source]
Bases:
object
Base class for a single bracket in synchronous Hyperband algorithms.
A bracket consists of a list of rungs. Each rung consists of a number of slots and a resource level (called rung level). The larger the rung level, the smaller the number of slots.
A slot is occupied (by a metric value), free, or pending. A pending slot has already been returned by
next_free_slot()
. Slots in the lowest rung (smallest rung level, largest size) are filled first. At any point in time, only slots in the lowest not fully occupied rung can be filled. If there are no free slots in the current rung, but there are pending ones, the bracket is blocked, and another bracket needs to be worked on.- property num_rungs: int
- num_pending_slots()[source]
- Return type:
int
- Returns:
Number of pending slots (these have been returned by next_free_slot(), but are not yet occupied)
- next_free_slot()[source]
- Return type:
Optional
[SlotInRung
]
- on_result(result)[source]
Provides result for slot previously requested by
next_free_slot
. Here,result.metric
is written to the slot in order to make it occupied. Also,result.trial_id
is written there.We normally return
None
. But if the result passed completes the current rung, this triggers the creation of a child run which consists of promoted trials from the current rung. In this case, we return the IDs of trials which have not been promoted. This is used in for early removal of checkpoints, seetrials_checkpoints_can_be_removed()
.- Parameters:
result (
SlotInRung
) – See above- Return type:
Optional
[List
[int
]]- Returns:
See above
- class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SynchronousHyperbandBracket(rungs, mode)[source]
Bases:
SynchronousBracket
Represents a bracket in standard synchronous Hyperband.
When a rung is fully occupied, slots for the next rung are assigned with the trial_id’s having the best metric values. At any point in time, only slots in the lowest not fully occupied rung can be filled.
- property num_rungs: int
- syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.get_top_list(rung, new_len, mode)[source]
Returns the list of IDs of the new_len trials which should be promoted, because they performed best. We also return the list of IDs of the remaining trials, which are not to be promoted.
rung (
List
[Tuple
[int
,float
]]) – Current rung which has just been completednew_len (
int
) – Size of new rungmode (
str
) – “min” or “max”
- Return type:
(
List
[int
],List
[int
])- Returns:
(top_list, remaining_list)
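A small sketch of the promotion selection; the (trial_id, metric_value) pairs below are made up.

from syne_tune.optimizer.schedulers.synchronous.hyperband_bracket import get_top_list

# One completed rung: (trial_id, metric_value) pairs
rung = [(0, 0.42), (1, 0.35), (2, 0.51), (3, 0.29), (4, 0.61)]
top, remaining = get_top_list(rung, new_len=2, mode="min")
print(top)        # the 2 best trial IDs, e.g. [3, 1] (order may differ)
print(remaining)  # trial IDs not promoted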
syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager module
- class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager.SynchronousHyperbandBracketManager(bracket_rungs, mode)[source]
Bases:
object
Maintains all brackets, relays requests for another job and report of result to one of the brackets.
Each bracket contains a number of rungs, the largest bracket having max_num_rungs of them. A bracket with k rungs has offset max_num_rungs - k. Hyperband cycles through brackets with offsets 0, ..., num_brackets - 1, where num_brackets <= max_num_rungs.

At any given time, one bracket is primary, and all other active brackets are secondary. Jobs are preferentially assigned to the primary bracket, but if its current rung has no free slots (all are pending), secondary brackets are considered.
Each bracket has a
bracket_id
(nonnegative int). The primary bracket always has the lowest id of all active ones. For job assignment, we iterate over active brackets starting from the primary, and assign the job to the first bracket which has a free slot. If none of the active brackets have a free slot, a new bracket is created.- Parameters:
bracket_rungs (
List
[List
[Tuple
[int
,int
]]]) – Rungs for successive brackets, from largest to smallestmode (
str
) – Criterion is minimized (‘min’) or maximized (‘max’)
- property bracket_rungs: List[List[Tuple[int, int]]]
- level_to_prev_level(bracket_id, level)[source]
- Parameters:
bracket_id (
int
) –level (
int
) – Level in bracket
- Return type:
int
- Returns:
Previous level; or 0
- next_job()[source]
Called by scheduler to request a new job. Jobs are preferentially assigned to the primary bracket, which has the lowest id among all active brackets. If the primary bracket does not accept jobs (because all remaining slots are already pending), further active brackets are polled. If none of the active brackets accept jobs, a new bracket is created.
The job description returned is (bracket_id, slot_in_rung), where
slot_in_rung
isSlotInRung
, containing the info of what is to be done (trial_id
,level
fields). It is this entry which has to be returned in on_result(), with the metric_val field set. If the job returned here has trial_id == None, it comes from the lowest rung of its bracket, and the trial_id has to be set as well when returning the record in on_result().
Tuple
[int
,SlotInRung
]- Returns:
Tuple
(bracket_id, slot_in_rung)
- on_result(result)[source]
Called by scheduler to provide result for previously requested job. See
next_job()
.- Parameters:
result (
Tuple
[int
,SlotInRung
]) – Tuple(bracket_id, slot_in_rung)
- Return type:
Optional
[List
[int
]]- Returns:
See
on_result()
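To make the request/report cycle concrete, here is a hedged sketch: the bracket rungs are built with SynchronousHyperbandRungSystem.geometric() (documented further below), and SlotInRung is assumed to be a mutable dataclass whose trial_id and metric_val fields the scheduler fills in.

from syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager import (
    SynchronousHyperbandBracketManager,
)
from syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system import (
    SynchronousHyperbandRungSystem,
)

bracket_rungs = SynchronousHyperbandRungSystem.geometric(
    min_resource=1, max_resource=9, reduction_factor=3, num_brackets=1
)
manager = SynchronousHyperbandBracketManager(bracket_rungs, mode="min")

bracket_id, slot = manager.next_job()  # trial_id is None in the lowest rung
slot.trial_id = 0                      # scheduler starts a new trial for this slot
slot.metric_val = 0.42                 # metric reported at resource level slot.level
manager.on_result((bracket_id, slot))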
syne_tune.optimizer.schedulers.synchronous.hyperband_impl module
- class syne_tune.optimizer.schedulers.synchronous.hyperband_impl.SynchronousGeometricHyperbandScheduler(config_space, **kwargs)[source]
Bases:
SynchronousHyperbandScheduler
Special case of
SynchronousHyperbandScheduler
with rung system defined by geometric sequences (seeSynchronousHyperbandRungSystem.geometric()
). This is the most frequently used case.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functionmetric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1
reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3
brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.
searcher (str, optional) – Selects searcher. Passed to
searcher_factory()
. Defaults to “random”search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.
resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Default to “epoch”searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
- class syne_tune.optimizer.schedulers.synchronous.hyperband_impl.GeometricDifferentialEvolutionHyperbandScheduler(config_space, **kwargs)[source]
Bases:
DifferentialEvolutionHyperbandScheduler
Special case of
DifferentialEvolutionHyperbandScheduler
with rung system defined by geometric sequences. This is the most frequently used case.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for trial evaluation functiongrace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1
reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3
brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.
metric (str) – Name of metric to optimize, key in results obtained via
on_trial_result()
searcher (str, optional) – Selects searcher. Passed to
searcher_factory()
. If searcher == "random_encoded"
(default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters inconfig_space
. Note thatpoints_to_evaluate
is still used in this case.search_options (Dict[str, Any], optional) – Passed to
searcher_factory()
.mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”
points_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_level (int, optional) – Largest rung level, corresponds to
max_t
inFIFOScheduler
. Must be positive int larger thangrace_period
. If this is not given, it is inferred like inFIFOScheduler
. In particular, it is not needed ifmax_resource_attr
is given.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can be more efficient.
resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result()
. The type of resource must be int. Defaults to “epoch”mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Defaults to 0.5
crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5
support_pause_resume (bool, optional) – If
True
,_suggest()
supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults toTrue
. Note: The resumed trial still gets assigned a newtrial_id
, but it starts from the earlier checkpoint.
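As an orientation, here is a minimal construction sketch; the config space, metric name, and the "epochs" entry are illustrative assumptions, not prescribed by this API:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous.hyperband_impl import (
    GeometricDifferentialEvolutionHyperbandScheduler,
)

# Hypothetical config space; "epochs" is assumed to be read by the training script
config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 27,
}
scheduler = GeometricDifferentialEvolutionHyperbandScheduler(
    config_space,
    metric="validation_error",  # assumed key reported via on_trial_result()
    mode="min",
    grace_period=1,
    reduction_factor=3,
    max_resource_attr="epochs",
    resource_attr="epoch",
)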
syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system module
- class syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system.SynchronousHyperbandRungSystem[source]
Bases:
object
Collects factory methods for
RungSystemsPerBracket
rung systems to be used inSynchronousHyperbandBracketManager
.- static geometric(min_resource, max_resource, reduction_factor, num_brackets=None)[source]
This is the geometric progression setup from the original papers on successive halving and Hyperband.
If
s_max = ceil(log(max_resource / min_resource) / log(reduction_factor))
, there can be at mosts_max + 1
brackets. Here, bracket s hasr_num = s_max - s + 1
rungs, and the size of rung r in bracket s isn(r,s) = ceil( (s_max + 1) / r_num) * power(reduction_factor, r_num - r - 1)
- Parameters:
min_resource (
int
) – Smallest resource level (positive int)max_resource (
int
) – Largest resource level (positive int)reduction_factor (
float
) – Approximate ratio between successive rung levelsnum_brackets (
Optional
[int
]) – Number of brackets. If not given, the maximum number of brackets is used. Pass 1 for successive halving
- Return type:
List
[List
[Tuple
[int
,int
]]]- Returns:
Rung system
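As a worked example of the rung-size formula above (illustrative numbers only), the following sketch prints the rung sizes of each bracket:
import math

min_resource, max_resource, reduction_factor = 1, 27, 3
s_max = math.ceil(math.log(max_resource / min_resource) / math.log(reduction_factor))  # 3
for s in range(s_max + 1):
    r_num = s_max - s + 1
    sizes = [
        math.ceil((s_max + 1) / r_num) * reduction_factor ** (r_num - r - 1)
        for r in range(r_num)
    ]
    print(f"bracket {s}: rung sizes {sizes}")
# Prints [27, 9, 3, 1], [18, 6, 2], [6, 2], [4] for brackets 0 to 3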
syne_tune.optimizer.schedulers.transfer_learning package
- class syne_tune.optimizer.schedulers.transfer_learning.TransferLearningTaskEvaluations(configuration_space, hyperparameters, objectives_names, objectives_evaluations)[source]
Bases:
object
Class that contains offline evaluations for a task that can be used for transfer learning. Args:
configuration_space: Dict, the configuration space that was used when sampling evaluations.
hyperparameters: pd.DataFrame, the hyperparameter values that were acquired; all keys of the configuration space should appear as columns.
objectives_names: List[str], the names of the objectives that were acquired.
objectives_evaluations: np.array, values of recorded objectives, must have shape
(num_evals, num_seeds, num_fidelities, num_objectives)
-
configuration_space:
Dict
-
hyperparameters:
DataFrame
-
objectives_names:
List
[str
]
-
objectives_evaluations:
array
- top_k_hyperparameter_configurations(k, mode, objective)[source]
Returns the best k hyperparameter configurations.
- Parameters:
k (int) – The number of top hyperparameters to return.
mode (str) – ‘min’ or ‘max’, indicating the type of optimization problem.
objective (str) – The objective to consider for ranking hyperparameters.
- Return type:
List[Dict[str, Any]]
- Returns:
List of hyperparameters in order.
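A minimal sketch of constructing such an object from toy data; the hyperparameter and objective names below are illustrative, not required by the class:
import numpy as np
import pandas as pd
from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers.transfer_learning import (
    TransferLearningTaskEvaluations,
)

# Toy data: 4 evaluations, 1 seed, 1 fidelity, 1 objective
config_space = {"lr": uniform(0.0, 1.0)}
task_evals = TransferLearningTaskEvaluations(
    configuration_space=config_space,
    hyperparameters=pd.DataFrame({"lr": [0.1, 0.2, 0.5, 0.9]}),
    objectives_names=["validation_error"],
    objectives_evaluations=np.random.rand(4, 1, 1, 1),
)
best_two = task_evals.top_k_hyperparameter_configurations(
    k=2, mode="min", objective="validation_error"
)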
- class syne_tune.optimizer.schedulers.transfer_learning.TransferLearningMixin(config_space, transfer_learning_evaluations, metric_names, **kwargs)[source]
Bases:
object
- top_k_hyperparameter_configurations_per_task(transfer_learning_evaluations, num_hyperparameters_per_task, mode, metric)[source]
Returns the best hyperparameter configurations for each task.
- Parameters:
transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Set of candidates to choose from.
num_hyperparameters_per_task (int) – The number of top hyperparameters per task to return.
mode (str) – ‘min’ or ‘max’, indicating the type of optimization problem.
metric (str) – The metric to consider for ranking hyperparameters.
- Return type:
Dict[str, List[Dict[str, Any]]]
- Returns:
Dict which maps from task name to list of hyperparameters in order.
- class syne_tune.optimizer.schedulers.transfer_learning.BoundingBox(scheduler_fun, config_space, metric, transfer_learning_evaluations, mode=None, num_hyperparameters_per_task=1)[source]
Bases:
TransferLearningMixin
,TrialScheduler
Simple baseline that computes a bounding-box of the best candidate found in previous tasks to restrict the search space to only good candidates. The bounding-box is obtained by restricting to the min-max of the best numerical hyperparameters and restricting to the set of the best candidates on categorical parameters. Reference:
Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning.Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton.NeurIPS 2019.scheduler_fun
is used to create the scheduler to be used here, feeding it with the modified config space. Any additional scheduler arguments (such aspoints_to_evaluate
) should be encoded inside this function. Example:
from syne_tune.optimizer.baselines import RandomSearch

def scheduler_fun(new_config_space: Dict[str, Any], mode: str, metric: str):
    return RandomSearch(new_config_space, metric, mode)

bb_scheduler = BoundingBox(scheduler_fun, ...)
Here,
bb_scheduler
represents random search, where the hyperparameter ranges are restricted to contain the best evaluations of previous tasks, as provided bytransfer_learning_evaluations
.- Parameters:
scheduler_fun (
Callable
[[dict
,str
,str
],TrialScheduler
]) – Maps tuple of configuration space (dict), mode (str), metric (str) to a scheduler. This is required since the final configuration space is known only after computing a bounding-box.config_space (
Dict
[str
,Any
]) – Initial configuration space to consider, will be updated to the bounding of the best evaluations of previous tasksmetric (
str
) – Objective name to optimize, must be present in transfer learning evaluations.mode (
Optional
[str
]) – Mode to be considered, default to “min”.transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.num_hyperparameters_per_task (
int
) – Number of the best configurations to use per task when computing the bounding box, defaults to 1.
- suggest(trial_id)[source]
Returns a suggestion for a new trial, or one to be resumed
This method returns
suggestion
of typeTrialSuggestion
(unless there is no config left to explore, and None is returned).If
suggestion.spawn_new_trial_id
isTrue
, a new trial is to be started with configsuggestion.config
. Typically, this new trial is started from scratch. But ifsuggestion.checkpoint_trial_id
is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has IDtrial_id
.If
suggestion.spawn_new_trial_id
isFalse
, an existing and currently paused trial is to be resumed, whose ID issuggestion.checkpoint_trial_id
. If this trial has a checkpoint, we start from there. In this case,suggestion.config
is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten bysuggestion.config
(seeHyperbandScheduler
withtype="promotion"
for an example why this can be useful).Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.
- Parameters:
trial_id (
int
) – ID for new trial to be started (ignored if existing trial to be resumed)- Return type:
Optional
[TrialSuggestion
]- Returns:
Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by
suggest
.- Parameters:
trial (
Trial
) – Trial to be added
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
[str
,Any
]) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call
on_trial_complete()
.- Parameters:
trial (
Trial
) – Trial to be removed
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- class syne_tune.optimizer.schedulers.transfer_learning.RUSHScheduler(config_space, transfer_learning_evaluations, metric, type='stopping', points_to_evaluate=None, custom_rush_points=None, num_hyperparameters_per_task=1, **kwargs)[source]
Bases:
TransferLearningMixin
,HyperbandScheduler
A transfer learning variation of Hyperband which uses previously well-performing hyperparameter configurations as an initialization. The best hyperparameter configuration of each individual task provided is evaluated. The one among them which performs best on the current task will serve as a hurdle and is used to prune other candidates. This changes the standard successive halving promotion as follows. As usual, only the top-performing fraction is promoted to the next rung level. However, these candidates need to be at least as good as the hurdle configuration to be promoted. In practice this means that much fewer candidates can be promoted. Reference:
A resource-efficient method for repeated HPO and NAS.Giovanni Zappella, David Salinas, Cédric Archambeau.AutoML workshop @ ICML 2021.Additional arguments on top of parent class
HyperbandScheduler
.- Parameters:
transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.points_to_evaluate (
Optional
[List
[dict
]]) – If given, these configurations are evaluated aftercustom_rush_points
and configurations inferred fromtransfer_learning_evaluations
. These points are not used to prune any configurations.custom_rush_points (
Optional
[List
[dict
]]) – If given, these configurations are evaluated first, in addition to top performing configurations from other tasks and also serve to preemptively prune underperforming configurationsnum_hyperparameters_per_task (
int
) – The number of top hyperparameter configurations to consider per task. Defaults to 1
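A minimal construction sketch; the config space, metric name, and the toy offline evaluations are illustrative assumptions (in practice, transfer_learning_evaluations is loaded from past tuning jobs):
import numpy as np
import pandas as pd
from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.transfer_learning import (
    RUSHScheduler,
    TransferLearningTaskEvaluations,
)

# Illustrative config space; "epochs" is assumed to be consumed by the training script
config_space = {"lr": loguniform(1e-5, 1e-1), "epochs": 27}
# Toy offline evaluations for one previous task
prev_task = TransferLearningTaskEvaluations(
    configuration_space=config_space,
    hyperparameters=pd.DataFrame({"lr": [1e-4, 1e-3, 1e-2], "epochs": [27, 27, 27]}),
    objectives_names=["validation_error"],
    objectives_evaluations=np.random.rand(3, 1, 1, 1),
)
scheduler = RUSHScheduler(
    config_space,
    transfer_learning_evaluations={"task-0": prev_task},
    metric="validation_error",
    mode="min",
    type="stopping",
    max_resource_attr="epochs",
    resource_attr="epoch",
)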
Subpackages
syne_tune.optimizer.schedulers.transfer_learning.quantile_based package
Submodules
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms module
- class syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms.GaussianTransform(y, random_state=None)[source]
Bases:
object
Transform data into Gaussian by applying \(\psi = \Phi^{-1} \circ F\), where \(F\) is the truncated empirical CDF.
- Parameters:
y (array) – shape (n, dim)
random_state (Optional[RandomState]) – If specified, randomize the rank when consecutive values exist between extreme values. If None, use the lowest rank of duplicated values.
- static z_transform(series, values_sorted, random_state=None)[source]
- Parameters:
series – shape (n, dim)
values_sorted – series sorted on the first axis
random_state (
Optional
[RandomState
]) – if not None, ranks are drawn uniformly for values with consecutive ranges
- Returns:
data with same shape as input series where distribution is normalized on all dimensions
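To make the transform \(\psi = \Phi^{-1} \circ F\) concrete, here is a small numpy/scipy sketch of the underlying idea (rank-based empirical CDF followed by the inverse Gaussian CDF). It illustrates the formula, not the class API, and the truncation constant is an arbitrary choice for the sketch:
import numpy as np
from scipy.stats import norm

def gaussian_rank_transform(y: np.ndarray) -> np.ndarray:
    # Empirical CDF via ranks (ties broken by position), truncated away from 0 and 1,
    # then mapped through the inverse Gaussian CDF
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1
    cdf = np.clip(ranks / n, 1.0 / (4 * n), 1.0 - 1.0 / (4 * n))
    return norm.ppf(cdf)

print(gaussian_rank_transform(np.array([0.3, 1.2, 0.7, 5.0, 2.2])))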
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher module
- syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.extract_input_output(transfer_learning_evaluations, normalization, random_state)[source]
- syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.fit_model(config_space, transfer_learning_evaluations, normalization, max_fit_samples, random_state, model=XGBRegressor(...))[source]
- syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.eval_model(model_pipeline, X, y)[source]
- syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.subsample(X, y, max_samples=10000, random_state=None)[source]
Subsample both X and y to max_samples elements. If max_samples is not set, X and y are returned as is; if it is set, the index of X is reset.
- Return type:
Tuple[DataFrame, array]
- Returns:
(X, y) with max_samples sampled elements.
- class syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.QuantileBasedSurrogateSearcher(config_space, metric, transfer_learning_evaluations, mode=None, max_fit_samples=100000, normalization='gaussian', **kwargs)[source]
Bases:
StochasticSearcher
Implements the transfer-learning method:
A Quantile-based Approach for Hyperparameter Transfer Learning.David Salinas, Huibin Shen, Valerio Perrone.ICML 2020.This is the Copula Thompson Sampling approach described in the paper where a surrogate is fitted on the transfer learning data to predict mean/variance of configuration performance given a hyperparameter. The surrogate is then sampled from, and the best configurations are returned as the next candidates to evaluate.
Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
mode (
Optional
[str
]) – Whether to minimize or maximize, default to “min”.transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.max_fit_samples (
int
) – Maximum number of samples to use when fitting the method. Defaults to 100000normalization (
str
) – Defaults to “gaussian”, which first computes the rank and then applies the Gaussian inverse CDF. “standard” applies just standard normalization (remove mean and divide by variance) but can perform significantly worse.
- clone_from_state(state)[source]
Together with
get_state()
, this is needed in order to store and re-create the mutable state of the searcher.Given state as returned by
get_state()
, this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards,self
is not used anymore.- Parameters:
state (
Dict
[str
,Any
]) – See above- Returns:
New searcher object
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[dict
]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if the searcher never suggests the same config more than once, and all configs in the (finite) search space are exhausted.
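A minimal usage sketch, plugging the searcher into FIFOScheduler; the config space, metric name, and the toy transfer learning data are illustrative assumptions:
import numpy as np
import pandas as pd
from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning import TransferLearningTaskEvaluations
from syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher import (
    QuantileBasedSurrogateSearcher,
)

config_space = {"lr": loguniform(1e-5, 1e-1)}
# Toy offline evaluations for one previous task
prev_task = TransferLearningTaskEvaluations(
    configuration_space=config_space,
    hyperparameters=pd.DataFrame({"lr": [1e-4, 1e-3, 1e-2]}),
    objectives_names=["validation_error"],
    objectives_evaluations=np.random.rand(3, 1, 1, 1),
)
searcher = QuantileBasedSurrogateSearcher(
    config_space=config_space,
    metric="validation_error",
    transfer_learning_evaluations={"task-0": prev_task},
    mode="min",
)
scheduler = FIFOScheduler(
    config_space, searcher=searcher, metric="validation_error", mode="min"
)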
Submodules
syne_tune.optimizer.schedulers.transfer_learning.bounding_box module
- class syne_tune.optimizer.schedulers.transfer_learning.bounding_box.BoundingBox(scheduler_fun, config_space, metric, transfer_learning_evaluations, mode=None, num_hyperparameters_per_task=1)[source]
Bases:
TransferLearningMixin
,TrialScheduler
Simple baseline that computes a bounding-box of the best candidate found in previous tasks to restrict the search space to only good candidates. The bounding-box is obtained by restricting to the min-max of the best numerical hyperparameters and restricting to the set of the best candidates on categorical parameters. Reference:
Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning.Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton.NeurIPS 2019.scheduler_fun
is used to create the scheduler to be used here, feeding it with the modified config space. Any additional scheduler arguments (such aspoints_to_evaluate
) should be encoded inside this function. Example:
from syne_tune.optimizer.baselines import RandomSearch

def scheduler_fun(new_config_space: Dict[str, Any], mode: str, metric: str):
    return RandomSearch(new_config_space, metric, mode)

bb_scheduler = BoundingBox(scheduler_fun, ...)
Here,
bb_scheduler
represents random search, where the hyperparameter ranges are restricted to contain the best evaluations of previous tasks, as provided bytransfer_learning_evaluations
.- Parameters:
scheduler_fun (
Callable
[[dict
,str
,str
],TrialScheduler
]) – Maps tuple of configuration space (dict), mode (str), metric (str) to a scheduler. This is required since the final configuration space is known only after computing a bounding-box.config_space (
Dict
[str
,Any
]) – Initial configuration space to consider, will be updated to the bounding of the best evaluations of previous tasksmetric (
str
) – Objective name to optimize, must be present in transfer learning evaluations.mode (
Optional
[str
]) – Mode to be considered, default to “min”.transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.num_hyperparameters_per_task (
int
) – Number of the best configurations to use per task when computing the bounding box, defaults to 1.
- suggest(trial_id)[source]
Returns a suggestion for a new trial, or one to be resumed
This method returns
suggestion
of typeTrialSuggestion
(unless there is no config left to explore, and None is returned).If
suggestion.spawn_new_trial_id
isTrue
, a new trial is to be started with configsuggestion.config
. Typically, this new trial is started from scratch. But ifsuggestion.checkpoint_trial_id
is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has IDtrial_id
.If
suggestion.spawn_new_trial_id
isFalse
, an existing and currently paused trial is to be resumed, whose ID issuggestion.checkpoint_trial_id
. If this trial has a checkpoint, we start from there. In this case,suggestion.config
is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten bysuggestion.config
(seeHyperbandScheduler
withtype="promotion"
for an example why this can be useful).Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.
- Parameters:
trial_id (
int
) – ID for new trial to be started (ignored if existing trial to be resumed)- Return type:
Optional
[TrialSuggestion
]- Returns:
Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by
suggest
.- Parameters:
trial (
Trial
) – Trial to be added
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
[str
,Any
]) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call
on_trial_complete()
.- Parameters:
trial (
Trial
) – Trial to be removed
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
syne_tune.optimizer.schedulers.transfer_learning.rush module
- class syne_tune.optimizer.schedulers.transfer_learning.rush.RUSHScheduler(config_space, transfer_learning_evaluations, metric, type='stopping', points_to_evaluate=None, custom_rush_points=None, num_hyperparameters_per_task=1, **kwargs)[source]
Bases:
TransferLearningMixin
,HyperbandScheduler
A transfer learning variation of Hyperband which uses previously well-performing hyperparameter configurations as an initialization. The best hyperparameter configuration of each individual task provided is evaluated. The one among them which performs best on the current task will serve as a hurdle and is used to prune other candidates. This changes the standard successive halving promotion as follows. As usual, only the top-performing fraction is promoted to the next rung level. However, these candidates need to be at least as good as the hurdle configuration to be promoted. In practice this means that much fewer candidates can be promoted. Reference:
A resource-efficient method for repeated HPO and NAS.Giovanni Zappella, David Salinas, Cédric Archambeau.AutoML workshop @ ICML 2021.Additional arguments on top of parent class
HyperbandScheduler
.- Parameters:
transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.points_to_evaluate (
Optional
[List
[dict
]]) – If given, these configurations are evaluated aftercustom_rush_points
and configurations inferred fromtransfer_learning_evaluations
. These points are not used to prune any configurations.custom_rush_points (
Optional
[List
[dict
]]) – If given, these configurations are evaluated first, in addition to top performing configurations from other tasks and also serve to preemptively prune underperforming configurationsnum_hyperparameters_per_task (
int
) – The number of top hyperparameter configurations to consider per task. Defaults to 1
syne_tune.optimizer.schedulers.transfer_learning.zero_shot module
- class syne_tune.optimizer.schedulers.transfer_learning.zero_shot.ZeroShotTransfer(config_space, metric, transfer_learning_evaluations, mode='min', sort_transfer_learning_evaluations=True, use_surrogates=False, **kwargs)[source]
Bases:
TransferLearningMixin
,StochasticSearcher
A zero-shot transfer hyperparameter optimization method which jointly selects configurations that minimize the average rank obtained on historic metadata (
transfer_learning_evaluations
). This is a searcher which can be used withFIFOScheduler
. Reference:Sequential Model-Free Hyperparameter Tuning.Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme.IEEE International Conference on Data Mining (ICDM) 2015.Additional arguments on top of parent class
StochasticSearcher
:- Parameters:
transfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.mode (
str
) – Whether to minimize (“min”, default) or maximize (“max”)sort_transfer_learning_evaluations (
bool
) – UseFalse
if the hyperparameters for each task intransfer_learning_evaluations
are already in the same order. If set toTrue
, hyperparameters are sorted. Defaults toTrue
use_surrogates (
bool
) – If the same configuration is not evaluated on all tasks, set this toTrue
. This will generate a set of configurations and will impute their performance using surrogate models. Defaults toFalse
- get_config(**kwargs)[source]
Suggest a new configuration.
Note: Query
_next_initial_config()
for initial configs to return first.- Parameters:
kwargs – Extra information may be passed from scheduler to searcher
- Return type:
Optional
[dict
]- Returns:
New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if the searcher never suggests the same config more than once, and all configs in the (finite) search space are exhausted.
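A minimal usage sketch with FIFOScheduler; the config space, metric name, and the toy metadata are illustrative assumptions:
import numpy as np
import pandas as pd
from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning import TransferLearningTaskEvaluations
from syne_tune.optimizer.schedulers.transfer_learning.zero_shot import ZeroShotTransfer

config_space = {"lr": loguniform(1e-5, 1e-1)}
# Toy metadata: two tasks, each with the same three configurations evaluated
tasks = {
    f"task-{i}": TransferLearningTaskEvaluations(
        configuration_space=config_space,
        hyperparameters=pd.DataFrame({"lr": [1e-4, 1e-3, 1e-2]}),
        objectives_names=["validation_error"],
        objectives_evaluations=np.random.rand(3, 1, 1, 1),
    )
    for i in range(2)
}
searcher = ZeroShotTransfer(
    config_space=config_space,
    metric="validation_error",
    transfer_learning_evaluations=tasks,
    mode="min",
)
scheduler = FIFOScheduler(
    config_space, searcher=searcher, metric="validation_error", mode="min"
)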
syne_tune.optimizer.schedulers.utils package
Submodules
syne_tune.optimizer.schedulers.utils.simple_profiler module
- class syne_tune.optimizer.schedulers.utils.simple_profiler.ProfilingBlock(meta, time_stamp, durations)[source]
Bases:
object
-
meta:
Dict
[str
,Any
]
-
time_stamp:
float
-
durations:
Dict
[str
,List
[float
]]
- class syne_tune.optimizer.schedulers.utils.simple_profiler.SimpleProfiler[source]
Bases:
object
Useful to profile time of recurring computations, for example
get_config
calls in searchers.Measurements are divided into blocks. A block is started by
begin_block
. Each block stores meta data, a time stamp whenbegin_block
was called (relative to the time stamp for the first block, which is 0), and a dict of lists of durations, whose keys are tags. A tag corresponds to a range of code to be profiled. It may be executed many times within a block, therefore lists of durations.Tags can have multiple levels of prefixes, corresponding to brackets.
syne_tune.optimizer.schedulers.utils.successive_halving module
- syne_tune.optimizer.schedulers.utils.successive_halving.successive_halving_rung_levels(rung_levels, grace_period, reduction_factor, rung_increment, max_t)[source]
Creates
rung_levels
fromgrace_period
,reduction_factor
Note: If
rung_levels
is given andrung_levels[-1] == max_t
, we strip off this final entry, so that all rung levels are< max_t
.- Parameters:
rung_levels (
Optional
[List
[int
]]) – If given, this is returned (but see above)grace_period (
int
) – SeeHyperbandScheduler
reduction_factor (
Optional
[float
]) – SeeHyperbandScheduler
rung_increment (
Optional
[int
]) – SeeHyperbandScheduler
max_t (
int
) – SeeHyperbandScheduler
- Return type:
List
[int
]- Returns:
List of rung levels
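A small usage sketch of this helper; the keyword arguments follow the signature above, and the expected output is based on the geometric rule described for HyperbandScheduler:
from syne_tune.optimizer.schedulers.utils.successive_halving import (
    successive_halving_rung_levels,
)

levels = successive_halving_rung_levels(
    rung_levels=None,
    grace_period=1,
    reduction_factor=3,
    rung_increment=None,
    max_t=81,
)
print(levels)  # expected to be the geometric sequence 1, 3, 9, 27 (all < max_t)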
Submodules
syne_tune.optimizer.schedulers.fifo module
- class syne_tune.optimizer.schedulers.fifo.FIFOScheduler(config_space, **kwargs)[source]
Bases:
TrialSchedulerWithSearcher
Scheduler which executes trials in submission order.
This is the most basic scheduler template. It can be configured for many use cases by choosing
searcher
along withsearch_options
.- Parameters:
config_space (Dict[str, Any]) – Configuration space for evaluation function
searcher (str or
BaseSearcher
) – Searcher forget_config
decisions. String values are passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_FIFO
. Defaults to “random” (i.e., random search)search_options (Dict[str, Any], optional) – If searcher is
str
, these arguments are passed tosearcher_factory()
metric (str or List[str]) – Name of metric to optimize, key in results obtained via
on_trial_result
. For multi-objective schedulers, this can also be a listmode (str or List[str], optional) – “min” if
metric
is minimized, “max” ifmetric
is maximized, defaults to “min”. This can also be a list ifmetric
is a listpoints_to_evaluate (
List[dict]
, optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If not given, this is mapped to[dict()]
, a single default config determined by the midpoint heuristic. If[]
(empty list), no initial configurations are specified. Note: Ifsearcher
is of typeBaseSearcher
,points_to_evaluate
must be set there.random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using
RandomSeedGenerator
. If not given, the master random seed is drawn at random here.max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If this is given,
max_t
is not needed. We recommend to usemax_resource_attr
overmax_t
. If given, we use it to infermax_resource_level
. It is also used to limit trial executions in promotion-based multi-fidelity schedulers (see class:HyperbandScheduler
,type="promotion"
).max_t (int, optional) – Value for
max_resource_level
. Needed for schedulers which make use of intermediate reports viaon_trial_result
. If this is not given, we try to infer its value fromconfig_space
(seeResourceLevelsScheduler
), checking config_space["epochs"]
,config_space["max_t"]
, andconfig_space["max_epochs"]
. Ifmax_resource_attr
is given, we use the valueconfig_space[max_resource_attr]
. But ifmax_t
is given here, it takes precedence.time_keeper (
TimeKeeper
, optional) – This will be used for timing here (see_elapsed_time
). The time keeper has to be started at the beginning of the experiment. If not given, we use a local time keeper here, which is started with the first call to_suggest()
. Can also be set after construction, withset_time_keeper()
. Note: If you useSimulatorBackend
, you need to pass itstime_keeper
here.
- property searcher: BaseSearcher | None
- set_time_keeper(time_keeper)[source]
Assign time keeper after construction.
This is possible only if the time keeper was not assigned at construction, and the experiment has not yet started.
- Parameters:
time_keeper (
TimeKeeper
) – Time keeper to be used
- on_trial_result(trial, result)[source]
We simply relay
result
to the searcher. Other decisions are done inon_trial_complete
.- Return type:
str
- metric_names()[source]
- Return type:
List
[str
]- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective metric (for example, for sampling the Pareto front)
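A minimal construction sketch; the config space and metric name are illustrative assumptions:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

# Illustrative config space and metric name
config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
}
scheduler = FIFOScheduler(
    config_space,
    searcher="random",          # string is passed to searcher_factory()
    metric="validation_error",  # assumed key reported by the training script
    mode="min",
)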
syne_tune.optimizer.schedulers.hyperband module
- syne_tune.optimizer.schedulers.hyperband.is_continue_decision(trial_decision)[source]
- Return type:
bool
- class syne_tune.optimizer.schedulers.hyperband.TrialInformation(config, time_stamp, bracket, keep_case, trial_decision, reported_result=None, largest_update_resource=None)[source]
Bases:
object
The scheduler maintains information about all trials it has been dealing with so far.
trial_decision
is the current status of the trial.keep_case
is relevant only ifsearcher_data == "rungs_and_last"
.largest_update_resource
is the largest resource level for which the searcher was updated, or None.reported_result
contains the most recently reported result, or None (task was started, but did not report anything yet). Only contains attributesself.metric
andself._resource_attr
.-
config:
Dict
[str
,Any
]
-
time_stamp:
float
-
bracket:
int
-
keep_case:
bool
-
trial_decision:
str
-
reported_result:
Optional
[dict
] = None
-
largest_update_resource:
Optional
[int
] = None
- class syne_tune.optimizer.schedulers.hyperband.HyperbandScheduler(config_space, **kwargs)[source]
Bases:
FIFOScheduler
,MultiFidelitySchedulerMixin
,RemoveCheckpointsSchedulerMixin
Implements different variants of asynchronous Hyperband
See
type
for the different variants. One implementation detail is when using multiple brackets, task allocation to bracket is done randomly, based on a distribution which can be configured.For definitions of concepts (bracket, rung, milestone), see
Li, Jamieson, Rostamizadeh, Gonina, Hardt, Recht, Talwalkar (2018)A System for Massively Parallel Hyperparameter Tuningor
Tiao, Klein, Lienart, Archambeau, Seeger (2020)Model-based Asynchronous Hyperparameter and Neural Architecture SearchNote
This scheduler requires both
metric
andresource_attr
to be returned by the reporter. Here, resource values must be positive int. Ifresource_attr == "epoch"
, this should be the number of epochs done, starting from 1 (not the epoch number, starting from 0).Rung levels and promotion quantiles
Rung levels are values of the resource attribute at which stop/go decisions are made for jobs, comparing their metric against others at the same level. These rung levels (positive, strictly increasing) can be specified via
rung_levels
, the largest must be<= max_t
. Ifrung_levels
is not given, they are specified bygrace_period
andreduction_factor
orrung_increment
:If \(r_{min}\) is
grace_period
, \(\eta\) isreduction_factor
, then rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\). This is the default choice for successive halving (Hyperband).If
rung_increment
is given, but notreduction_factor
, then rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), where \(\nu\) isrung_increment
.
If
rung_levels
is given, thengrace_period
,reduction_factor
,rung_increment
are ignored. If they are given, a warning is logged.The rung levels determine the quantiles to be used in the stop/go decisions. If rung levels are \(r_j\), define \(q_j = r_j / r_{j+1}\). \(q_j\) is the promotion quantile at rung level \(r_j\). On average, a fraction of \(q_j\) jobs can continue, the remaining ones are stopped (or paused). In the default successive halving case, we have \(q_j = 1/\eta\) for all \(j\).
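As a concrete example of these formulas: with grace_period = 1, reduction_factor = 3 and max_t = 81, the rung levels are 1, 3, 9, 27, and every promotion quantile is \(q_j = 1/3\), so on average one third of the trials reaching a rung continue (or are promoted).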
Cost-aware schedulers or searchers
Some schedulers (e.g.,
type == "cost_promotion"
) or searchers may depend on cost values (with keycost_attr
) reported alongside the target metric. For promotion-based scheduling, a trial may pause and resume several times. The cost received inon_trial_result
only counts the cost since the last resume. We maintain the sum of such costs in_cost_offset()
, and append a new entry toresult
inon_trial_result
with the total cost. If the evaluation function does not implement checkpointing, once a trial is resumed, it has to start from scratch. We detect this inon_trial_result
and reset the cost offset to 0 (if the trial runs from scratch, the cost reported needs no offset added).Note
This process requires
cost_attr
to be setPending evaluations
The searcher is notified, by
searcher.register_pending
calls, of (trial, resource) pairs for which evaluations are running, and a result is expected in the future. These pending evaluations can be used by the searcher in order to direct sampling elsewhere.The choice of pending evaluations depends on
searcher_data
. If equal to “rungs”, pending evaluations sit only at rung levels, because observations are only used there. In the other cases, pending evaluations sit at all resource levels for which observations are obtained. For example, if a trial is at rung level \(r\) and continues towards the next rung level \(r_{next}\), ifsearcher_data == "rungs"
,searcher.register_pending
is called for \(r_{next}\) only, while for othersearcher_data
values, pending evaluations are registered for \(r + 1, r + 2, \dots, r_{next}\). However, if in this case,register_pending_myopic
isTrue
, we instead callsearcher.register_pending
for \(r + 1\) when each observation is obtained (not just at a rung level). This leads to less pending evaluations at any one time. On the other hand, when a trial is continued at a rung level, we already know it will emit observations up to the next rung level, so it seems more “correct” to register all these pending evaluations in one go.Additional arguments on top of parent class
FIFOScheduler
:- Parameters:
searcher (str or
BaseSearcher
) – Searcher forget_config
decisions. String values are passed tosearcher_factory()
along withsearch_options
and extra information. Supported values:SUPPORTED_SEARCHERS_HYPERBAND
. Defaults to “random” (i.e., random search)resource_attr (str, optional) – Name of resource attribute in results obtained via
on_trial_result
, defaults to “epoch”grace_period (int, optional) – Minimum resource to be used for a job. Ignored if
rung_levels
is given. Defaults to 1reduction_factor (float, optional) – Parameter to determine rung levels. Ignored if
rung_levels
is given. Must be \(\ge 2\), defaults to 3rung_increment (int, optional) – Parameter to determine rung levels. Ignored if
rung_levels
orreduction_factor
are given. Must be positiverung_levels (
List[int]
, optional) – If given, prescribes the set of rung levels to be used. Must contain positive integers, strictly increasing. This information overridesgrace_period
,reduction_factor
,rung_increment
. Note that the stop/promote rule in the successive halving scheduler is set based on the ratio of successive rung levels.brackets (int, optional) – Number of brackets to be used in Hyperband. Each bracket has a different grace period, all share
max_t
andreduction_factor
. Ifbrackets == 1
(default), we run asynchronous successive halving.type (str, optional) –
Type of Hyperband scheduler. Defaults to “stopping”. Supported values (see also subclasses of
RungSystem
):stopping: A config eval is executed by a single task. The task is stopped at a milestone if its metric is worse than a fraction of those who reached the milestone earlier, otherwise it continues. See
StoppingRungSystem
.promotion: A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than a fraction of others who reached the milestone. If no config can be promoted, a new one is chosen. See
PromotionRungSystem
.cost_promotion: This is a cost-aware variant of ‘promotion’, see
CostPromotionRungSystem
for details. In this case, costs must be reported under the namerung_system_kwargs["cost_attr"]
in results.pasha: Similar to promotion type Hyperband, but it progressively expands the available resources until the ranking of configurations stabilizes.
rush_stopping: A variation of the stopping scheduler which requires passing
rung_system_kwargs
andpoints_to_evaluate
. The firstrung_system_kwargs["num_threshold_candidates"]
ofpoints_to_evaluate
will enforce stricter rules on which task is continued. SeeRUSHStoppingRungSystem
andRUSHScheduler
.rush_promotion: Same as
rush_stopping
but for promotion, seeRUSHPromotionRungSystem
dyhpo: A model-based scheduler, which can be seen as an extension of “promotion” with
rung_increment
rather thanreduction_factor
, seeDynamicHPOSearcher
cost_attr (str, optional) – Required if the scheduler itself uses a cost metric (i.e.,
type="cost_promotion"
), or if the searcher uses a cost metric. See also header comment.searcher_data (str, optional) –
Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:
”rungs” (default): Only results at rung levels. Cheapest
”all”: All results. Most expensive
”rungs_and_last”: Results at rung levels, plus the most recent result. This means that in between rung levels, only the most recent result is used by the searcher. This is in between the other two choices in cost
Note: For a Gaussian additive learning curve surrogate model, this has to be set to ‘all’.
register_pending_myopic (bool, optional) – See above. Used only if
searcher_data != "rungs"
. Defaults toFalse
rung_system_per_bracket (bool, optional) – This concerns Hyperband with
brackets > 1
. Defaults toFalse
. When starting a job for a new config, it is assigned a randomly sampled bracket. The larger the bracket, the larger the grace period for the config. Ifrung_system_per_bracket == True
, we maintain separate rung level systems for each bracket, so that configs only compete with others started in the same bracket. Ifrung_system_per_bracket == False
, we use a single rung level system, so that all configs compete with each other. In this case, the bracket of a config only determines the initial grace period, i.e. the first milestone at which it starts competing with others. This is the default. The concept of brackets in Hyperband is meant to hedge against overly aggressive filtering in successive halving, based on low fidelity criteria. In practice, successive halving (i.e.,brackets = 1
) often works best in the asynchronous case (as implemented here). Ifbrackets > 1
, the hedging is stronger ifrung_system_per_bracket
isTrue
.do_snapshots (bool, optional) – Support snapshots? If
True
, a snapshot of all running tasks and rung levels is returned by_promote_trial()
. This snapshot is passed tosearcher.get_config
. Defaults toFalse
. Note: Currently, only the stopping variant supports snapshots.rung_system_kwargs (Dict[str, Any], optional) –
Arguments passed to the rung system:
num_threshold_candidates: Used if type in ["rush_promotion", "rush_stopping"]. The first
num_threshold_candidates
inpoints_to_evaluate
enforce stricter requirements to the continuation of training tasks. SeeRUSHScheduler
.probability_sh: Used if
type == "dyhpo"
. In DyHPO, we typically compare all paused trials against a number of new configurations, and the winner is either resumed or started (new trial). However, with the probability given here, we instead try to promote a trial as iftype == "promotion"
. If no trial can be promoted, we fall back to the DyHPO logic. Use this to make DyHPO robust against starting too many new trials, because all paused ones score poorly (this happens especially at the beginning).
early_checkpoint_removal_kwargs (Dict[str, Any], optional) – If given, speculative early removal of checkpoints is done, see
HyperbandRemoveCheckpointsCallback
. The constructor arguments for theHyperbandRemoveCheckpointsCallback
must be given here, if they cannot be inferred (keymax_num_checkpoints
is mandatory). This feature is used only for scheduler types which pause and resume trials.
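A minimal construction sketch for the promotion variant; the config space, metric, and resource names are illustrative assumptions:
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

# Illustrative config space; the training script is assumed to read "epochs" and to
# report "validation_error" together with "epoch" after every epoch
config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 81,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="promotion",  # asynchronous successive halving with pause and resume (ASHA)
    metric="validation_error",
    mode="min",
    max_resource_attr="epochs",
    resource_attr="epoch",
    grace_period=1,
    reduction_factor=3,
)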
- does_pause_resume()[source]
- Return type:
bool
- Returns:
Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?
- property rung_levels: List[int]
Note that all entries of
rung_levels
are smaller thanmax_t
(orconfig_space[max_resource_attr]
): rung levels are resource levels where stop/go decisions are made. In particular, ifrung_levels
is passed at construction withrung_levels[-1] == max_t
, this last entry is stripped off.- Returns:
Rung levels (strictly increasing, positive ints)
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
- property resource_attr: str
- Returns:
Name of resource attribute in reported results
- property max_resource_level: int
- Returns:
Maximum resource level
- property searcher_data: str
- Returns:
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels.searcher_data
determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensiveget_config()
may become. Choices:”rungs”: Only results at rung levels. Cheapest
”all”: All results. Most expensive
”rungs_and_last”: Results at rung levels plus the most recent one. Not available for all multi-fidelity schedulers
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
We simply relay
result
to the searcher. Other decisions are done inon_trial_complete
.- Return type:
str
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call
on_trial_complete()
.- Parameters:
trial (
Trial
) – Trial to be removed
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
[str
,Any
]) – Result dictionary
- callback_for_checkpoint_removal(stop_criterion)[source]
- Parameters:
stop_criterion (
Callable
[[TuningStatus
],bool
]) – Stopping criterion, as passed toTuner
- Return type:
Optional
[TunerCallback
]- Returns:
CP removal callback, or
None
if CP removal is not activated
- class syne_tune.optimizer.schedulers.hyperband.HyperbandBracketManager(scheduler_type, resource_attr, metric, mode, max_t, rung_levels, brackets, rung_system_per_bracket, cost_attr, random_seed, rung_system_kwargs, scheduler)[source]
Bases:
object
Maintains rung level systems for range of brackets. Differences depending on
scheduler_type
manifest themselves mostly at the level of the rung level system itself.- Parameters:
scheduler_type (
str
) – SeeHyperbandScheduler
.resource_attr (
str
) – SeeHyperbandScheduler
.metric (
str
) – SeeHyperbandScheduler
.mode (
str
) – SeeHyperbandScheduler
.max_t (
int
) – SeeHyperbandScheduler
.rung_levels (
List
[int
]) – SeeHyperbandScheduler
.brackets (
int
) – SeeHyperbandScheduler
.rung_system_per_bracket (
bool
) – SeeHyperbandScheduler
.cost_attr (
str
) – Overrides entry inrung_system_kwargs
random_seed (
int
) – Random seed for bracket samplingrung_system_kwargs (
Dict
[str
,Any
]) – Arguments passed to the rung systemscheduler (
HyperbandScheduler
) – The scheduler is needed in order to sample a bracket, and also some rung level systems need more information from the scheduler
- static does_pause_resume(scheduler_type)[source]
- Return type:
bool
- Returns:
Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?
- on_task_add(trial_id, **kwargs)[source]
Called when new task is started (can be new trial or trial being resumed).
Since the bracket has already been sampled, not much is done here. We return the list of milestones for this bracket in reverse (decreasing) order. The first entry is
max_t
, even if it is not a rung level in the bracket. This list contains the resource levels the task would reach if it ran tomax_t
without being stopped.- Parameters:
trial_id (
str
) – ID of trialkwargs – Further arguments passed to
rung_sys.on_task_add
- Return type:
List
[int
]- Returns:
List of milestones in decreasing order, where max_t is first
- on_task_report(trial_id, result)[source]
This method is called whenever a new report is received. It returns a dictionary with all the information needed for making decisions (e.g., stop / continue task, update model, etc). Keys are:
task_continues
: Should task continue or stop/pause?milestone_reached
: True if rung level (ormax_t
) is hitnext_milestone
: If hitrung level < max_t
, this is the subsequent rung level (otherwise: None)bracket_id
: Bracket in which the task is running
- Parameters:
trial_id (
str
) – ID of trialresult (
Dict
[str
,Any
]) – Results reported
- Return type:
Dict
[str
,Any
]- Returns:
See above
- on_task_remove(trial_id)[source]
Called when trial is stopped or completes
- Parameters:
trial_id – ID of trial
- on_task_schedule(new_trial_id)[source]
Samples bracket for task to be scheduled. Check whether any paused trial in that bracket can be promoted. If so, its
trial_id
is returned. We also returnextra_kwargs
to be used in_promote_trial
. This contains the bracket which was sampled (key “bracket”).Note:
extra_kwargs
can return information also iftrial_id = None
is returned. This information is passed toget_config
of the searcher.- Parameters:
new_trial_id (
str
) – ID for new trial as passed to_suggest()
- Return type:
(
Optional
[str
],dict
)- Returns:
(trial_id, extra_kwargs)
- paused_trials(resource=None)[source]
Only for pause and resume schedulers (
does_pause_resume()
returnsTrue
), where trials can be paused at certain rung levels only. Ifresource
is not given, returns list of all paused trials(trial_id, rank, metric_val, level)
, wherelevel
is the rung level, andrank
is the rank of the trial in the rung (0 for the best metric value). Ifresource
is given, only the paused trials in the rung of this level are returned.- Parameters:
resource (
Optional
[int
]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned- Return type:
List
[Tuple
[str
,int
,float
,int
]]- Returns:
See above
syne_tune.optimizer.schedulers.hyperband_checkpoint_removal module
- syne_tune.optimizer.schedulers.hyperband_checkpoint_removal.create_callback_for_checkpoint_removal(callback_kwargs, stop_criterion)[source]
- Return type:
Optional
[TunerCallback
]
syne_tune.optimizer.schedulers.hyperband_cost_promotion module
- class syne_tune.optimizer.schedulers.hyperband_cost_promotion.CostPromotionRungEntry(trial_id, metric_val, cost_val, was_promoted=False)[source]
Bases:
PromotionRungEntry
Appends
cost_val
to the superclass. This is the cost value \(c(x, r)\) recorded for the trial at the resource level.
- class syne_tune.optimizer.schedulers.hyperband_cost_promotion.CostPromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, cost_attr, max_t)[source]
Bases:
PromotionRungSystem
Cost-aware extension of promotion-based asynchronous Hyperband (ASHA).
This code is equivalent to the base
PromotionRungSystem
, except the “promotable” condition in_find_promotable_trial()
is replaced.When a config \(\mathbf{x}\) reaches rung level \(r\), the result includes a metric \(f(\mathbf{x}, r)\), but also a cost \(c(\mathbf{x}, r)\). The latter is the cost (e.g., training time) spent to reach level \(r\).
Consider all trials that reached rung level \(r\) (whether promoted from there or still paused there), ordered w.r.t. \(f(\mathbf{x}, r)\), best first, and let their number be \(N\). Define
\[C(r, k) = \sum_{i\le k} c(\mathbf{x}_i, r)\]For a promotion quantile \(q\), define
\[K = \max_k \mathrm{I}[ C(r, k) \le q C(r, N) ]\]Any trial not yet promoted and ranked \(\le K\) can be promoted. As usual, we scan rungs from the top. If several trials are promotable, the one with the best metric value is promoted.
Note that costs \(c(\mathbf{x}, r)\) reported via
cost_attr
need to be total costs of a trial. If the trial is paused and resumed, partial costs have to be added up. SeeHyperbandScheduler
for how this works.
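A small numpy sketch of the rank cutoff \(K\) defined above; the cost values are made up for illustration, and this is not the library code:
import numpy as np

# Costs c(x_i, r) of the N trials that reached rung r, ordered best metric first
costs = np.array([10.0, 12.0, 9.0, 30.0, 25.0])
q = 1 / 3  # promotion quantile at this rung

cumulative = np.cumsum(costs)  # C(r, k) for k = 1, ..., N
K = int(np.sum(cumulative <= q * cumulative[-1]))
print(K)  # here K = 2: only trials ranked 1 or 2 (and not yet promoted) are promotable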
syne_tune.optimizer.schedulers.hyperband_pasha module
- class syne_tune.optimizer.schedulers.hyperband_pasha.PASHARungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]
Bases:
PromotionRungSystem
Implements PASHA algorithm. PASHA is a more efficient version of ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. Experimental evaluation has shown PASHA consumes significantly fewer computational resources than ASHA.
- For more details, see the paper:
- Bohdal, Balles, Wistuba, Ermis, Archambeau, Zappella (2023)PASHA: Efficient HPO and NAS with Progressive Resource Allocation
syne_tune.optimizer.schedulers.hyperband_promotion module
- class syne_tune.optimizer.schedulers.hyperband_promotion.PromotionRungEntry(trial_id, metric_val, was_promoted=False)[source]
Bases:
RungEntry
Appends
was_promoted
to the superclass. This isTrue
iff the trial has been promoted from this rung. Otherwise, the trial is paused at this rung.
- class syne_tune.optimizer.schedulers.hyperband_promotion.PromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]
Bases:
RungSystem
Implements the promotion logic for an asynchronous variant of Hyperband, known as ASHA:
In ASHA, configs sit paused at milestones (rung levels) until they get promoted, which means that a free task picks up their evaluation until the next milestone.
The rule to decide whether a paused trial is promoted (or remains paused) is the same as in
StoppingRungSystem
, except that “continues” becomes “gets promoted”. If several paused trials in a rung can be promoted, the one with the best metric value is chosen.
resume_from
. If the trial evaluation function does not implement pause & resume, it needs to start training from scratch, in which case metrics are reported for every epoch, also those< resume_from
. At least for some modes of fitting the searcher model to data, this would lead to duplicate target values for the same extended config \((x, r)\), which we want to avoid. The solution is to maintainresume_from
in the data for the terminator (see_running
). Given this, we can report inon_task_report()
that the current metric data should not be used for the searcher model (ignore_data = True
), namely as long as the evaluation has not yet gone beyond levelresume_from
.- on_task_schedule(new_trial_id)[source]
Used to implement
_promote_trial()
. Searches through rungs to find a trial which can be promoted. If one is found, we return thetrial_id
and other info (current milestone, milestone to be promoted to). We also mark the trial as being promoted at the rung level it sits right now.- Return type:
Dict
[str
,Any
]
- on_task_add(trial_id, skip_rungs, **kwargs)[source]
Called when new task is started. Depending on
kwargs["new_config"]
, this could start an evaluation (True
) or promote an existing config to the next milestone (False
). In the latter case,kwargs
contains additional information about the promotion (in “milestone”, “resume_from”).- Parameters:
trial_id (
str
) – ID of trial to be startedskip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this taskkwargs – Additional arguments
- on_task_report(trial_id, result, skip_rungs)[source]
Decision on whether task may continue (
task_continues=True
), or should be paused (task_continues=False
).milestone_reached
is a flag whether resource coincides with a milestone. For this scheduler, we have thattask_continues == not milestone_reached
,since a trial is always paused at a milestone.
ignore_data
is True if a result is received from a resumed trial at a level<= resume_from
. This happens if checkpointing is not implemented (or not used), because resumed trials are started from scratch then. These metric values should in general be ignored.- Parameters:
trial_id (
str
) – ID of trial which reported resultsresult (
Dict
[str
,Any
]) – Reported metricsskip_rungs (
int
) – This number of smallest rung levels are not considered milestones for this task
- Return type:
Dict
[str
,Any
]- Returns:
dict(task_continues, milestone_reached, next_milestone, ignore_data)
- on_task_remove(trial_id)[source]
Called when task is removed.
- Parameters:
trial_id (
str
) – ID of trial which is to be removed
- static does_pause_resume()[source]
- Return type:
bool
- Returns:
Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?
- support_early_checkpoint_removal()[source]
- Return type:
bool
- Returns:
Do we support early checkpoint removal via
paused_trials()
?
- paused_trials(resource=None)[source]
Only for pause and resume schedulers (
does_pause_resume()
returnsTrue
), where trials can be paused at certain rung levels only. Ifresource
is not given, returns list of all paused trials(trial_id, rank, metric_val, level)
, wherelevel
is the rung level, andrank
is the rank of the trial in the rung (0 for the best metric value). Ifresource
is given, only the paused trials in the rung of this level are returned. Ifresource
is not a rung level, the returned list is empty.- Parameters:
resource (
Optional
[int
]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned- Return type:
List
[Tuple
[str
,int
,float
,int
]]- Returns:
See above
syne_tune.optimizer.schedulers.hyperband_rush module
- class syne_tune.optimizer.schedulers.hyperband_rush.RUSHDecider(num_threshold_candidates, mode)[source]
Bases:
object
Implements the additional decision logic according to the RUSH algorithm. It is used as part of
RUSHStoppingRungSystem
andRUSHPromotionRungSystem
. Reference:A resource-efficient method for repeated HPO and NAS.Giovanni Zappella, David Salinas, Cédric Archambeau.AutoML workshop @ ICML 2021.For a more detailed description, refer to
RUSHScheduler
.- Parameters:
num_threshold_candidates (
int
) – Number of threshold candidatesmode (
str
) – “min” or “max”
- class syne_tune.optimizer.schedulers.hyperband_rush.RUSHStoppingRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, num_threshold_candidates)[source]
Bases:
StoppingRungSystem
Implementation for RUSH algorithm, stopping variant.
Additional arguments on top of base class
StoppingRungSystem
:- Parameters:
num_threshold_candidates (
int
) – Number of threshold candidates
- class syne_tune.optimizer.schedulers.hyperband_rush.RUSHPromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, num_threshold_candidates)[source]
Bases:
PromotionRungSystem
Implementation for RUSH algorithm, promotion variant.
Additional arguments on top of base class
PromotionRungSystem
:- Parameters:
num_threshold_candidates (
int
) – Number of threshold candidates
syne_tune.optimizer.schedulers.hyperband_stopping module
- class syne_tune.optimizer.schedulers.hyperband_stopping.RungEntry(trial_id, metric_val)[source]
Bases:
object
Represents entry in a rung. This class is extended by rung level systems which need to maintain more information per entry.
- Parameters:
trial_id (
str
) – ID of trialmetric_val (
float
) – Metric value
- class syne_tune.optimizer.schedulers.hyperband_stopping.Rung(level, prom_quant, mode, data=None)[source]
Bases:
object
- Parameters:
level (
int
) – Rung level \(r_j\)prom_quant (
float
) – promotion quantile \(q_j\)data (
Optional
[List
[RungEntry
]]) – Data of all previous jobs reaching the level. This list is kept sorted w.r.t.metric_val
, so that best values come first
- quantile()[source]
Returns same value as
numpy.quantile(metric_vals, q)
, wheremetric_vals
are the metric values indata
, andq = prom_quant
ifmode == "min"
, q = 1 - prom_quant
otherwise. Iflen(data) < 2
, we returnNone
. See the numpy.quantile documentation. The default for
numpy.quantile
ismethod="linear"
.- Return type:
Optional
[float
]- Returns:
See above
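For illustration, the semantics of quantile() can be sketched in plain Python (a minimal sketch, not the library implementation; metric_vals stands for the metric values stored in data):

import numpy as np

def rung_quantile(metric_vals, prom_quant, mode):
    # Mirrors the description above: with fewer than two entries, return None
    if len(metric_vals) < 2:
        return None
    q = prom_quant if mode == "min" else 1.0 - prom_quant
    # numpy.quantile defaults to method="linear"
    return float(np.quantile(metric_vals, q))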
- class syne_tune.optimizer.schedulers.hyperband_stopping.RungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]
Bases:
object
Terminology: Trials emit results at certain resource levels (e.g., epoch numbers). Some resource levels are rung levels, this is where scheduling decisions (stop, continue or pause, resume) are taken. For a running trial, the next rung level (or
max_t
) it will reach is called its next milestone.Note that
rung_levels
,promote_quantiles
can be empty. All entries ofrung_levels
are smaller thanmax_t
.- Parameters:
rung_levels (
List
[int
]) – List of rung levels (positive int, increasing)promote_quantiles (
List
[float
]) – List of promotion quantiles at each rung levelmetric (
str
) – Name of metric to optimizemode (
str
) – “min” or “max”resource_attr (
str
) – Name of resource attributemax_t (
int
) – Largest resource level
- on_task_schedule(new_trial_id)[source]
Called when new task is to be scheduled.
For a promotion-based rung system, check whether any trial can be promoted. If so, return dict with keys “trial_id”, “resume_from” (rung level where trial is paused), “milestone” (next rung level the trial will reach, or None).
If no trial can be promoted, or if the rung system is not promotion-based, the returned dictionary must not contain the “trial_id” key. It is nevertheless passed back via
extra_kwargs
inon_task_schedule()
. The default is to return an empty dictionary, but some special subclasses can use this to return information in case a trial is not promoted.- Parameters:
new_trial_id (
str
) – ID for new trial as passed to_suggest()
. Only needed by specific subclasses- Return type:
Dict
[str
,Any
]- Returns:
See above
- on_task_add(trial_id, skip_rungs, **kwargs)[source]
Called when new task is started.
- Parameters:
trial_id (
str
) – ID of trial to be startedskip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this taskkwargs – Additional arguments
- on_task_report(trial_id, result, skip_rungs)[source]
Called when a trial reports metric results.
Returns dict with keys “milestone_reached” (trial reaches its milestone), “task_continues” (trial should continue; otherwise it is stopped or paused), “next_milestone” (next milestone it will reach, or None). For certain subclasses, there may be additional entries.
- Parameters:
trial_id (
str
) – ID of trial which reported resultsresult (
Dict
[str
,Any
]) – Reported metricsskip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this task
- Return type:
Dict
[str
,Any
]- Returns:
See above
- on_task_remove(trial_id)[source]
Called when task is removed.
- Parameters:
trial_id (
str
) – ID of trial which is to be removed
- get_first_milestone(skip_rungs)[source]
- Parameters:
skip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this task- Return type:
int
- Returns:
First milestone to be considered
- get_milestones(skip_rungs)[source]
- Parameters:
skip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this task- Return type:
List
[int
]- Returns:
All milestones to be considered, in decreasing order; does not include
max_t
- snapshot_rungs(skip_rungs)[source]
A snapshot is a list of rung levels with entries
(level, data)
, ordered from top to bottom (largest rung first).- Parameters:
skip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this task- Return type:
List
[Tuple
[int
,List
[RungEntry
]]]- Returns:
Snapshot (see above)
- static does_pause_resume()[source]
- Return type:
bool
- Returns:
Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?
- support_early_checkpoint_removal()[source]
- Return type:
bool
- Returns:
Do we support early checkpoint removal via
paused_trials()
?
- paused_trials(resource=None)[source]
Only for pause and resume schedulers (
does_pause_resume()
returnsTrue
), where trials can be paused at certain rung levels only. Ifresource
is not given, returns list of all paused trials(trial_id, rank, metric_val, level)
, wherelevel
is the rung level, andrank
is the rank of the trial in the rung (0 for the best metric value). Ifresource
is given, only the paused trials in the rung of this level are returned. Ifresource
is not a rung level, the returned list is empty.- Parameters:
resource (
Optional
[int
]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned- Return type:
List
[Tuple
[str
,int
,float
,int
]]- Returns:
See above
- class syne_tune.optimizer.schedulers.hyperband_stopping.StoppingRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]
Bases:
RungSystem
The decision on whether a trial \(\mathbf{x}\) continues or is stopped at a rung level \(r\), is taken in
on_task_report()
. To this end, the metric value \(f(\mathbf{x}, r)\) is inserted into \(r.data\). Then:
\[\mathrm{continues}(\mathbf{x}, r)\; \Leftrightarrow\; f(\mathbf{x}, r) \le \mathrm{np.quantile}(r.data, r.prom\_quant)\]
in case mode == "min". See also _task_continues().
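In plain Python, the stopping rule above amounts to the following check (a sketch under the definitions given here; the branch for mode == "max" is the symmetric assumption, not spelled out in the text):

import numpy as np

def task_continues(metric_val, rung_metric_vals, prom_quant, mode="min"):
    # With fewer than two recorded values there is no basis for stopping
    if len(rung_metric_vals) < 2:
        return True
    if mode == "min":
        return metric_val <= np.quantile(rung_metric_vals, prom_quant)
    # mode == "max": mirror image, using the upper quantile
    return metric_val >= np.quantile(rung_metric_vals, 1.0 - prom_quant)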
- on_task_schedule(new_trial_id)[source]
Called when new task is to be scheduled.
For a promotion-based rung system, check whether any trial can be promoted. If so, return dict with keys “trial_id”, “resume_from” (rung level where trial is paused), “milestone” (next rung level the trial will reach, or None).
If no trial can be promoted, or if the rung system is not promotion-based, the returned dictionary must not contain the “trial_id” key. It is nevertheless passed back via
extra_kwargs
inon_task_schedule()
. The default is to return an empty dictionary, but some special subclasses can use this to return information in case a trial is not promoted.- Parameters:
new_trial_id (
str
) – ID for new trial as passed to_suggest()
. Only needed by specific subclasses- Return type:
Dict
[str
,Any
]- Returns:
See above
- on_task_report(trial_id, result, skip_rungs)[source]
Called when a trial reports metric results.
Returns dict with keys “milestone_reached” (trial reaches its milestone), “task_continues” (trial should continue; otherwise it is stopped or paused), “next_milestone” (next milestone it will reach, or None). For certain subclasses, there may be additional entries.
- Parameters:
trial_id (
str
) – ID of trial which reported resultsresult (
Dict
[str
,Any
]) – Reported metricsskip_rungs (
int
) – This number of the smallest rung levels are not considered milestones for this task
- Return type:
Dict
[str
,Any
]- Returns:
See above
syne_tune.optimizer.schedulers.median_stopping_rule module
- class syne_tune.optimizer.schedulers.median_stopping_rule.MedianStoppingRule(scheduler, resource_attr, running_average=True, metric=None, grace_time=1, grace_population=5, rank_cutoff=0.5)[source]
Bases:
TrialScheduler
Applies median stopping rule in top of an existing scheduler.
If the result at a time step ranks below the cutoff of other results observed at this time step, the trial is interrupted; otherwise, the wrapped scheduler is called to make the stopping decision.
Suggest decisions are left to the wrapped scheduler.
The mode of the wrapped scheduler is used.
Reference:
Google Vizier: A Service for Black-Box Optimization.Golovin et al. 2017.Proceedings of the 23rd ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, August 2017Pages 1487–1495- Parameters:
scheduler (
TrialScheduler
) – Scheduler to be called for trial suggestion or when median-stopping-rule decision is to continue.resource_attr (
str
) – Key in the reported dictionary that accounts for the resource (e.g. epoch).running_average (
bool
) – IfTrue
, then uses the running average of observation instead of raw observations. Defaults toTrue
metric (
Optional
[str
]) – Metric to be considered, defaults toscheduler.metric
grace_time (
Optional
[int
]) – Median stopping rule is only applied for results whoseresource_attr
exceeds this amount. Defaults to 1grace_population (
int
) – Median stopping rule when at leastgrace_population
have been observed at a resource level. Defaults to 5rank_cutoff (
float
) – Results whose quantiles are below this level are discarded. Defaults to 0.5 (median)
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- grace_condition(time_step)[source]
- Parameters:
time_step (
float
) – Valueresult[self.resource_attr]
- Return type:
bool
- Returns:
Decide for continue?
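Usage sketch for MedianStoppingRule (hedged: the search space, metric name, and resource attribute are placeholders for what your training script reports):

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.optimizer.schedulers.median_stopping_rule import MedianStoppingRule

# Hypothetical search space; the training script is assumed to report
# "val_loss" once per "epoch".
config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 128),
    "epochs": 20,
}

base_scheduler = RandomSearch(config_space, metric="val_loss", mode="min")
scheduler = MedianStoppingRule(
    scheduler=base_scheduler,  # suggest decisions are delegated to this scheduler
    resource_attr="epoch",     # key for the resource in reported results
    grace_time=2,              # only apply the rule beyond 2 epochs
)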
syne_tune.optimizer.schedulers.multi_fidelity module
- class syne_tune.optimizer.schedulers.multi_fidelity.MultiFidelitySchedulerMixin[source]
Bases:
object
Declares properties which are required for multi-fidelity schedulers.
- property resource_attr: str
- Returns:
Name of resource attribute in reported results
- property max_resource_level: int
- Returns:
Maximum resource level
- property rung_levels: List[int]
- Returns:
Rung levels (positive int; increasing), may or may not include
max_resource_level
- property searcher_data: str
- Returns:
Relevant only if a model-based searcher is used. Example: For NN tuning and
resource_attr == "epoch"
, we receive a result for each epoch, but not all epoch values are also rung levels.searcher_data
determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensiveget_config()
may become. Choices:”rungs”: Only results at rung levels. Cheapest
”all”: All results. Most expensive
”rungs_and_last”: Results at rung levels plus last recent one. Not available for all multi-fidelity schedulers
- property num_brackets: int
- Returns:
Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1
syne_tune.optimizer.schedulers.pbt module
- class syne_tune.optimizer.schedulers.pbt.PBTTrialState(trial, last_score=None, last_checkpoint=None, last_perturbation_time=0, stopped=False)[source]
Bases:
object
Internal PBT state tracked per-trial.
-
last_score:
float
= None
-
last_checkpoint:
int
= None
-
last_perturbation_time:
int
= 0
-
stopped:
bool
= False
- class syne_tune.optimizer.schedulers.pbt.PopulationBasedTraining(config_space, custom_explore_fn=None, **kwargs)[source]
Bases:
FIFOScheduler
Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:
https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html
PBT was originally presented in the following paper:
Population Based Training of Neural Networks.Jaderberg et al., 2017.arXiv:1711.09846
Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker now continues with the latest checkpoint of this new model but mutates the hyperparameters.
The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the value (with probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly.
Note: While this is implemented as child of
FIFOScheduler
, we requiresearcher="random"
(default), since the current code only supports a random searcher.Additional arguments on top of parent class
FIFOScheduler
.- Parameters:
resource_attr (str) – Name of resource attribute in results obtained via
on_trial_result
, defaults to “time_total_s”population_size (int, optional) – Size of the population, defaults to 4
perturbation_interval (float, optional) – Models will be considered for perturbation at this interval of
resource_attr
. Note that perturbation incurs checkpoint overhead, so you shouldn’t set this to be too frequent. Defaults to 60quantile_fraction (float, optional) – Parameters are transferred from the top
quantile_fraction
fraction of trials to the bottomquantile_fraction
fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25resample_probability (float, optional) – The probability of resampling from the original distribution when applying
_explore()
. If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25custom_explore_fn (function, optional) – Custom exploration function. This function is invoked as
f(config)
instead of the built-in perturbations, and should returnconfig
updated as needed. If this is given,resample_probability
is not used
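A construction sketch for PopulationBasedTraining (hedged: metric and resource names are placeholders; the remaining keyword arguments are passed on to FIFOScheduler):

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.pbt import PopulationBasedTraining

config_space = {
    "lr": loguniform(1e-5, 1e-1),  # perturbed by PBT while training runs
    "epochs": 50,
}

scheduler = PopulationBasedTraining(
    config_space=config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",    # milestone counter reported by the trial
    population_size=4,
    perturbation_interval=5,  # consider perturbation every 5 epochs
)

Note that PBT relies on the training script implementing checkpointing, since exploitation copies the latest checkpoint of a better-performing trial.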
syne_tune.optimizer.schedulers.random_seeds module
syne_tune.optimizer.schedulers.ray_scheduler module
- class syne_tune.optimizer.schedulers.ray_scheduler.RayTuneScheduler(config_space, ray_scheduler=None, ray_searcher=None, points_to_evaluate=None)[source]
Bases:
TrialScheduler
Allow using Ray scheduler and searcher. Any searcher/scheduler should work, except those which need access to
TrialRunner
(e.g., PBT); this feature is not implemented in Syne Tune.If
ray_searcher
is not given (defaults to random searcher), initial configurations to evaluate can be passed inpoints_to_evaluate
. Ifray_searcher
is given, this argument is ignored (needs to be passed toray_searcher
at construction). Note: Useimpute_points_to_evaluate()
in order to preprocesspoints_to_evaluate
specified by the user or the benchmark.- Parameters:
config_space (
Dict
) – Configuration spaceray_scheduler – Ray scheduler, defaults to FIFO scheduler
ray_searcher (
Optional
[Searcher
]) – Ray searcher, defaults to random searchpoints_to_evaluate (
Optional
[List
[Dict
]]) – See above
- RT_FIFOScheduler
alias of
FIFOScheduler
- RT_Searcher
alias of
Searcher
- class RandomSearch(config_space, points_to_evaluate, mode)[source]
Bases:
Searcher
- suggest(trial_id)[source]
Queries the algorithm to retrieve the next set of parameters.
- Return type:
Optional
[Dict
]
- Arguments:
trial_id: Trial ID used for subsequent notifications.
- Returns:
- dict | FINISHED | None: Configuration for a trial, if possible.
If FINISHED is returned, Tune will be notified that no more suggestions/configurations will be provided. If None is returned, Tune will skip the querying of the searcher for this step.
- on_trial_complete(trial_id, result=None, error=False)[source]
Notification for the completion of trial.
Typically, this method is used for notifying the underlying optimizer of the result.
- Args:
trial_id: A unique string ID for the trial. result: Dictionary of metrics for current training progress.
Note that the result dict may include NaNs or may not include the optimization metric. It is up to the subclass implementation to preprocess the result to avoid breaking the optimization process. Upon errors, this may also be None.
error: True if the training process raised an error.
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by
suggest
.- Parameters:
trial (
Trial
) – Trial to be added
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call
on_trial_complete()
.- Parameters:
trial (
Trial
) – Trial to be removed
- metric_names()[source]
- Return type:
List
[str
]- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)
- metric_mode()[source]
- Return type:
str
- Returns:
“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned
- static convert_config_space(config_space)[source]
Converts config_space from our type to the one of Ray Tune.
Note:
randint(lower, upper)
in Ray Tune has exclusiveupper
, while this is inclusive for us. On the other hand,lograndint(lower, upper)
has inclusiveupper
in Ray Tune as well.- Parameters:
config_space – Configuration space
- Returns:
config_space
converted into Ray Tune type
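For illustration, wrapping a Ray Tune scheduler could look as follows (a sketch assuming Ray Tune is installed; metric and attribute names are placeholders, and note the inclusive-upper randint semantics mentioned above):

from ray.tune.schedulers import AsyncHyperBandScheduler

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.ray_scheduler import RayTuneScheduler

# Syne Tune space: randint(1, 10) includes 10; convert_config_space() maps
# it to Ray Tune's exclusive-upper randint internally.
config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 10),
    "epochs": 20,
}

scheduler = RayTuneScheduler(
    config_space=config_space,
    ray_scheduler=AsyncHyperBandScheduler(
        max_t=20, time_attr="epoch", metric="val_loss", mode="min"
    ),
)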
syne_tune.optimizer.schedulers.remove_checkpoints module
- class syne_tune.optimizer.schedulers.remove_checkpoints.RemoveCheckpointsSchedulerMixin[source]
Bases:
object
Methods to be implemented by pause-and-resume schedulers (in that
on_trial_result()
can returnSchedulerDecision.PAUSE
) which support early removal of checkpoints. Typically, model checkpoints are retained for paused trials, because they may get resumed later on. This can lead to the disk filling up, so removing checkpoints which are no longer needed, can be important.Early checkpoint removal is implemented as a callback used with
Tuner
, which is created bycallback_for_checkpoint_removal()
here.- callback_for_checkpoint_removal(stop_criterion)[source]
- Parameters:
stop_criterion (
Callable
[[TuningStatus
],bool
]) – Stopping criterion, as passed toTuner
- Return type:
Optional
[TunerCallback
]- Returns:
CP removal callback, or
None
if CP removal is not activated
syne_tune.optimizer.schedulers.scheduler_searcher module
- class syne_tune.optimizer.schedulers.scheduler_searcher.TrialSchedulerWithSearcher(config_space, **kwargs)[source]
Bases:
TrialScheduler
Base class for trial schedulers which have a
BaseSearcher
membersearcher
. This searcher has a methodconfigure_scheduler()
which has to be called before the searcher is first used.We also collect common code here:
Determine
max_resource_level
if not explicitly givenMaster seed,
random_seed_generator
- property searcher: BaseSearcher | None
- suggest(trial_id)[source]
Returns a suggestion for a new trial, or one to be resumed
This method returns
suggestion
of typeTrialSuggestion
(unless there is no config left to explore, and None is returned).If
suggestion.spawn_new_trial_id
isTrue
, a new trial is to be started with configsuggestion.config
. Typically, this new trial is started from scratch. But ifsuggestion.checkpoint_trial_id
is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has IDtrial_id
.If
suggestion.spawn_new_trial_id
isFalse
, an existing and currently paused trial is to be resumed, whose ID issuggestion.checkpoint_trial_id
. If this trial has a checkpoint, we start from there. In this case,suggestion.config
is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten bysuggestion.config
(seeHyperbandScheduler
withtype="promotion"
for an example why this can be useful).Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.
- Parameters:
trial_id (
int
) – ID for new trial to be started (ignored if existing trial to be resumed)- Return type:
Optional
[TrialSuggestion
]- Returns:
Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
[str
,Any
]) – Result dictionary
syne_tune.optimizer.schedulers.smac_scheduler module
Submodules
syne_tune.optimizer.baselines module
- class syne_tune.optimizer.baselines.RandomSearch(config_space, metric, **kwargs)[source]
Bases:
FIFOScheduler
Random search.
See
RandomSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizekwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.GridSearch(config_space, metric, **kwargs)[source]
Bases:
FIFOScheduler
Grid search.
See
GridSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizekwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.BayesianOptimization(config_space, metric, **kwargs)[source]
Bases:
FIFOScheduler
Gaussian process based Bayesian optimization.
See
GPFIFOSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizekwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.ASHA(config_space, metric, resource_attr, **kwargs)[source]
Bases:
HyperbandScheduler
Asynchronous Successive Halving (ASHA).
One of
max_t
,max_resource_attr
needs to be inkwargs
. Fortype="promotion"
, the latter is more useful.See also
HyperbandScheduler
forkwargs
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
HyperbandScheduler
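As a usage sketch for these baselines (hedged: train_model.py is a placeholder training script assumed to report "val_loss" and "epoch" via syne_tune.Reporter):

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 128),
    "epochs": 27,  # also passed to the training script
}

scheduler = ASHA(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",  # preferred over max_t for promotion, see above
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_model.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()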
- class syne_tune.optimizer.baselines.MOBSTER(config_space, metric, resource_attr, **kwargs)[source]
Bases:
HyperbandScheduler
Model-based Asynchronous Multi-fidelity Optimizer (MOBSTER).
One of
max_t
,max_resource_attr
needs to be inkwargs
. Fortype="promotion"
, the latter is more useful, see alsoHyperbandScheduler
.MOBSTER can be run with different surrogate models. The model is selected by
search_options["model"]
inkwargs
. The default is"gp_multitask"
(jointly dependent multi-task GP model), another useful choice is"gp_independent"
(independent GP models at each rung level, with shared ARD kernel).See also:
HyperbandScheduler
forkwargs
parametersGPMultiFidelitySearcher
forkwargs["search_options"]
parameters
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
HyperbandScheduler
- class syne_tune.optimizer.baselines.HyperTune(config_space, metric, resource_attr, **kwargs)[source]
Bases:
HyperbandScheduler
One of
max_t
,max_resource_attr
needs to be inkwargs
. Fortype="promotion"
, the latter is more useful, see alsoHyperbandScheduler
.Hyper-Tune is a model-based variant of ASHA with more than one bracket. It can be seen as extension of MOBSTER and can be used with
search_options["model"]
inkwargs
being"gp_independent"
or"gp_multitask"
. It has a model-based way to sample the bracket for every new trial, as well as an ensemble predictive distribution feeding into the acquisition function. Our implementation is based on:Yang Li et alHyper-Tune: Towards Efficient Hyper-parameter Tuning at ScaleVLDB 2022See also:
HyperbandScheduler
forkwargs
parametersHyperTuneSearcher
forkwargs["search_options"]
parametersHyperTuneIndependentGPModel
for implementation
- Parameters:
config_space (
Dict
) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
HyperbandScheduler
- class syne_tune.optimizer.baselines.DyHPO(config_space, metric, resource_attr, probability_sh=None, **kwargs)[source]
Bases:
HyperbandScheduler
Dynamic Gray-Box Hyperparameter Optimization (DyHPO)
One of
max_t
,max_resource_attr
needs to be inkwargs
. The latter is more useful (DyHPO is a pause-resume scheduler), see alsoHyperbandScheduler
.DyHPO can be run with the same surrogate models as
MOBSTER
, butsearch_options["model"] != "gp_independent"
. This is because DyHPO requires extrapolation to resource levels without any data, which cannot sensibly be done with independent GPs per resource level. Compared toMOBSTER
orHyperTune
, DyHPO is typically run with linearly spaced rung levels (the default being 1, 2, 3, …). Decisions whether to promote a paused trial are folded together with suggesting a new configuration, both are model-based. Our implementation is based onWistuba, M. and Kadra, A. and Grabocka, J.Dynamic and Efficient Gray-Box Hyperparameter Optimization for Deep LearningHowever, there are important differences:
We do not implement their surrogate model based on a neural network kernel, but instead just use the surrogate models we provide for
MOBSTER
as wellWe implement a hybrid of DyHPO with the asynchronous successive halving rule for promoting trials, controlled by
probability_sh
. With this probability, we promote a trial via the SH rule. This mitigates the issue that DyHPO tends to start many trials initially, because due to lack of any data at higher rungs, the score values for promoting a trial are much worse than those for starting a new one.
See
HyperbandScheduler
forkwargs
parameters, andGPMultiFidelitySearcher
forkwargs["search_options"]
parameters. The following parameters are most important for DyHPO:rung_increment
(andgrace_period
): These parameters determine the rung level spacing. DyHPO is run with linearly spaced rung levels \(r_{min} + k\nu\), where \(r_{min}\) is
grace_period
and \(\nu\) is
rung_increment
. The default is 2. probability_sh
: See comment. The smaller this probability, the closer the method is to the published original, which tends to start many more trials than promote paused ones. On the other hand, if this probability is close to 1, you may as well run MOBSTER. The default isDEFAULT_SH_PROBABILITY
.search_options["opt_skip_period"]
: DyHPO can be quite a bit slower than MOBSTER, because the GP surrogate model is used more frequently. It can be sped up a bit by changingopt_skip_period
(general default is 1). The default here is 3.
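A construction sketch for DyHPO (hedged: metric names are placeholders; grace_period and rung_increment are passed through to HyperbandScheduler as described above):

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import DyHPO

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "epochs": 27,
}

scheduler = DyHPO(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,    # r_min: smallest rung level
    rung_increment=2,  # nu: spacing of the linearly spaced rung levels
    search_options={"opt_skip_period": 3},  # matches the default noted above
)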
- class syne_tune.optimizer.baselines.PASHA(config_space, metric, resource_attr, **kwargs)[source]
Bases:
HyperbandScheduler
Progressive ASHA.
One of
max_t
,max_resource_attr
needs to be inkwargs
. The latter is more useful, see alsoHyperbandScheduler
.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
HyperbandScheduler
- class syne_tune.optimizer.baselines.BOHB(config_space, metric, resource_attr, **kwargs)[source]
Bases:
HyperbandScheduler
Asynchronous BOHB
Combines
ASHA
with TPE-like Bayesian optimization, using kernel density estimators.One of
max_t
,max_resource_attr
needs to be inkwargs
. Fortype="promotion"
, the latter is more useful, see alsoHyperbandScheduler
.See
MultiFidelityKernelDensityEstimator
forkwargs["search_options"]
parameters, andHyperbandScheduler
forkwargs
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
HyperbandScheduler
- class syne_tune.optimizer.baselines.SyncHyperband(config_space, metric, resource_attr, **kwargs)[source]
Bases:
SynchronousGeometricHyperbandScheduler
Synchronous Hyperband.
One of
max_resource_level
,max_resource_attr
needs to be inkwargs
. The latter is more useful, see alsoHyperbandScheduler
.If
kwargs["brackets"]
is not given, the maximum number of brackets is used. Choosekwargs["brackets"] = 1
for synchronous successive halving.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
SynchronousGeometricHyperbandScheduler
- class syne_tune.optimizer.baselines.SyncBOHB(config_space, metric, resource_attr, **kwargs)[source]
Bases:
SynchronousGeometricHyperbandScheduler
Synchronous BOHB.
Combines
SyncHyperband
with TPE-like Bayesian optimization, using kernel density estimators.One of
max_resource_level
,max_resource_attr
needs to be inkwargs
. The latter is more useful, see alsoHyperbandScheduler
.If
kwargs["brackets"]
is not given, the maximum number of brackets is used. Choosekwargs["brackets"] = 1
for synchronous successive halving.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
SynchronousGeometricHyperbandScheduler
- class syne_tune.optimizer.baselines.DEHB(config_space, metric, resource_attr, **kwargs)[source]
Bases:
GeometricDifferentialEvolutionHyperbandScheduler
Differential Evolution Hyperband (DEHB).
Combines
SyncHyperband
with ideas from evolutionary algorithms.One of
max_resource_level
,max_resource_attr
needs to be inkwargs
. The latter is more useful, see alsoHyperbandScheduler
.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
SynchronousGeometricHyperbandScheduler
- class syne_tune.optimizer.baselines.SyncMOBSTER(config_space, metric, resource_attr, **kwargs)[source]
Bases:
SynchronousGeometricHyperbandScheduler
Synchronous MOBSTER.
Combines
SyncHyperband
with Gaussian process based Bayesian optimization, just likeMOBSTER
builds on top ofASHA
in the asynchronous case.One of
max_resource_level
,max_resource_attr
needs to be inkwargs
. The latter is more useful, see alsoHyperbandScheduler
.If
kwargs["brackets"]
is not given, the maximum number of brackets is used. Choosekwargs["brackets"] = 1
for synchronous successive halving.The default surrogate model (
search_options["model"]
inkwargs
) is"gp_independent"
, different toMOBSTER
.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributekwargs – Additional arguments to
SynchronousGeometricHyperbandScheduler
- class syne_tune.optimizer.baselines.BORE(config_space, metric, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
Bayesian Optimization by Density-Ratio Estimation (BORE).
See
Bore
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizerandom_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.ASHABORE(config_space, metric, resource_attr, random_seed=None, **kwargs)[source]
Bases:
HyperbandScheduler
Model-based ASHA with BORE searcher
See
MultiFidelityBore
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributerandom_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.BoTorch(config_space, metric, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
Bayesian Optimization using BoTorch
See
BoTorchSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizerandom_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.REA(config_space, metric, population_size=100, sample_size=10, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
Regularized Evolution (REA).
See
RegularizedEvolution
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizepopulation_size (
int
) – SeeRegularizedEvolution
. Defaults to 100sample_size (
int
) – SeeRegularizedEvolution
. Defaults to 10random_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- syne_tune.optimizer.baselines.create_gaussian_process_estimator(config_space, metric, random_seed=None, search_options=None)[source]
- Return type:
- class syne_tune.optimizer.baselines.MORandomScalarizationBayesOpt(config_space, metric, mode='min', random_seed=None, estimators=None, **kwargs)[source]
Bases:
FIFOScheduler
Uses
MultiObjectiveMultiSurrogateSearcher
with one standard GP surrogate model per metric (same as inBayesianOptimization
), together with theMultiObjectiveLCBRandomLinearScalarization
acquisition function.If
estimators
is given, surrogate models are taken from there, and the default is used otherwise. This is useful if you have a good low-variance model for one of the objectives.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
List
[str
]) – Name of metrics to optimizemode (
Union
[List
[str
],str
]) – Modes of optimization. Defaults to “min” for allrandom_seed (
Optional
[int
]) – Random seed, optionalestimators (
Optional
[Dict
[str
,Estimator
]]) – Use these surrogate models instead of the default GP one. Optionalkwargs – Additional arguments to
FIFOScheduler
. Here,kwargs["search_options"]
is used to create the searcher and its GP surrogate models.
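A construction sketch for MORandomScalarizationBayesOpt (hedged: the two metrics and their modes are placeholders for what the training script reports):

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import MORandomScalarizationBayesOpt

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

# Jointly tune accuracy (to be maximized) and latency (to be minimized)
scheduler = MORandomScalarizationBayesOpt(
    config_space,
    metric=["accuracy", "latency"],
    mode=["max", "min"],
)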
- class syne_tune.optimizer.baselines.NSGA2(config_space, metric, mode='min', population_size=20, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
See
RandomSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
List
[str
]) – Name of metric to optimizepopulation_size (
int
) – The size of the population for NSGA-2random_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.MOREA(config_space, metric, mode='min', population_size=100, sample_size=10, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
See
RandomSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
List
[str
]) – Name of metric to optimizepopulation_size (
int
) – SeeRegularizedEvolution
. Defaults to 100sample_size (
int
) – SeeRegularizedEvolution
. Defaults to 10random_seed (
Optional
[int
]) – Random seed, optionalkwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.MOLinearScalarizationBayesOpt(config_space, metric, scalarization_weights=None, **kwargs)[source]
Bases:
LinearScalarizedScheduler
Uses
LinearScalarizedScheduler
together with a default GP surrogate model.See
GPFIFOSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
List
[str
]) – Name of metric to optimizescalarization_weights (
Optional
[List
[float
]]) – Positive weight used for the scalarization. Defaults to all 1kwargs – Additional arguments to
FIFOScheduler
-
scalarization_weights:
ndarray
-
single_objective_metric:
str
-
base_scheduler:
TrialScheduler
- class syne_tune.optimizer.baselines.ConstrainedBayesianOptimization(config_space, metric, constraint_attr, **kwargs)[source]
Bases:
FIFOScheduler
Constrained Bayesian Optimization.
See
ConstrainedGPFIFOSearcher
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeconstraint_attr (
str
) – Name of constraint metrickwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.ZeroShotTransfer(config_space, transfer_learning_evaluations, metric, mode='min', sort_transfer_learning_evaluations=True, use_surrogates=False, random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
A zero-shot transfer hyperparameter optimization method which jointly selects configurations that minimize the average rank obtained on historic metadata (transfer_learning_evaluations). Reference:
Sequential Model-Free Hyperparameter Tuning.Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme.IEEE International Conference on Data Mining (ICDM) 2015.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functiontransfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.metric (
str
) – Name of metric to optimizemode (
str
) – Whether to minimize (min) or maximize (max)sort_transfer_learning_evaluations (
bool
) – UseFalse
if the hyperparameters for each task intransfer_learning_evaluations
are already in the same order. If set toTrue
, hyperparameters are sorted.use_surrogates (
bool
) – If the same configuration is not evaluated on all tasks, set this toTrue
. This will generate a set of configurations and will impute their performance using surrogate models.random_seed (
Optional
[int
]) – Used for randomly sampling candidates. Only used ifuse_surrogates=True
.kwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.ASHACTS(config_space, metric, resource_attr, transfer_learning_evaluations, mode='min', random_seed=None, **kwargs)[source]
Bases:
HyperbandScheduler
Runs ASHA where the searcher is done with the transfer-learning method:
A Quantile-based Approach for Hyperparameter Transfer Learning.David Salinas, Huibin Shen, Valerio Perrone.ICML 2020.This is the Copula Thompson Sampling approach described in the paper where a surrogate is fitted on the transfer learning data to predict mean and variance of configuration performance given a hyperparameter. The surrogate is then sampled from, and the best configurations are returned as next candidate to evaluate.
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizeresource_attr (
str
) – Name of resource attributetransfer_learning_evaluations (
Dict
[str
,TransferLearningTaskEvaluations
]) – Dictionary from task name to offline evaluations.mode (
str
) – Whether to minimize (min) or maximize (max)random_seed (
Optional
[int
]) – Used for randomly sampling candidateskwargs – Additional arguments to
HyperbandScheduler
- class syne_tune.optimizer.baselines.KDE(config_space, metric, **kwargs)[source]
Bases:
FIFOScheduler
Single-fidelity variant of BOHB
Combines
FIFOScheduler
with TPE-like Bayesian optimization, using kernel density estimators.See
KernelDensityEstimator
forkwargs["search_options"]
parameters.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space for evaluation functionmetric (
str
) – Name of metric to optimizekwargs – Additional arguments to
FIFOScheduler
- class syne_tune.optimizer.baselines.CQR(config_space, metric, mode='min', random_seed=None, **kwargs)[source]
Bases:
FIFOScheduler
- Single-fidelity Conformal Quantile Regression approach proposed in:
- Optimizing Hyperparameters with Conformal Quantile Regression.David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau.ICML 2023.
The method predicts quantile performance with gradient-boosted trees and calibrates predictions with conformal prediction.
- class syne_tune.optimizer.baselines.ASHACQR(config_space, metric, resource_attr, mode='min', random_seed=None, **kwargs)[source]
Bases:
HyperbandScheduler
- Multi-fidelity Conformal Quantile Regression approach proposed in:
- Optimizing Hyperparameters with Conformal Quantile Regression.David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau.ICML 2023.
The method predicts quantile performance with gradient-boosted trees and calibrates predictions with conformal prediction.
syne_tune.optimizer.scheduler module
- class syne_tune.optimizer.scheduler.SchedulerDecision[source]
Bases:
object
Possible return values of
TrialScheduler.on_trial_result()
, signalling to the tuner how to proceed with the reporting trial.The difference between
PAUSE
andSTOP
is important. If a trial is stopped, it cannot be resumed afterwards. Its checkpoints may be deleted. If a trial is paused, it may be resumed in the future, and its most recent checkpoint should be retained.- CONTINUE = 'CONTINUE'
Status for continuing trial execution
- PAUSE = 'PAUSE'
Status for pausing trial execution
- STOP = 'STOP'
Status for stopping trial execution
- class syne_tune.optimizer.scheduler.TrialSuggestion(spawn_new_trial_id=True, checkpoint_trial_id=None, config=None)[source]
Bases:
object
Suggestion returned by
TrialScheduler.suggest()
- Parameters:
spawn_new_trial_id (
bool
) – Whether a newtrial_id
should be used.checkpoint_trial_id (
Optional
[int
]) – Checkpoint of this trial ID should be used to resume from. Ifspawn_new_trial_id
isFalse
, then the trialcheckpoint_trial_id
is resumed with its previous checkpoint.config (
Optional
[dict
]) – The configuration which should be evaluated.
-
spawn_new_trial_id:
bool
= True
-
checkpoint_trial_id:
Optional
[int
] = None
-
config:
Optional
[dict
] = None
- static start_suggestion(config, checkpoint_trial_id=None)[source]
Suggestion to start new trial
- Parameters:
config (
Dict
[str
,Any
]) – Configuration to use for the new trial.checkpoint_trial_id (
Optional
[int
]) – Use checkpoint of this trial when starting the new trial (otherwise, it is started from scratch).
- Return type:
TrialSuggestion
- Returns:
A trial decision that consists in starting a new trial (which would receive a new trial-id).
- static resume_suggestion(trial_id, config=None)[source]
Suggestion to resume a paused trial
- Parameters:
trial_id (
int
) – ID of trial to be resumed (from its checkpoint)config (
Optional
[dict
]) – Configuration to use for resumed trial
- Return type:
TrialSuggestion
- Returns:
A trial decision that consists in resuming trial
trial-id
withconfig
if provided, or the previous configuration used if not provided.
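For illustration (hypothetical configuration values):

from syne_tune.optimizer.scheduler import TrialSuggestion

# Start a new trial with a given configuration
start = TrialSuggestion.start_suggestion(config={"lr": 0.01, "epochs": 10})

# Resume paused trial 3 from its checkpoint, keeping its previous configuration
resume = TrialSuggestion.resume_suggestion(trial_id=3)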
- class syne_tune.optimizer.scheduler.TrialScheduler(config_space)[source]
Bases:
object
Schedulers maintain and drive the logic of an experiment, deciding which configurations to evaluate in new trials and which running trials to stop early.
Some schedulers support pausing and resuming trials. In this case, they also drive the decision when to restart a paused trial.
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space
- suggest(trial_id)[source]
Returns a suggestion for a new trial, or one to be resumed
This method returns
suggestion
of typeTrialSuggestion
(unless there is no config left to explore, and None is returned).If
suggestion.spawn_new_trial_id
isTrue
, a new trial is to be started with configsuggestion.config
. Typically, this new trial is started from scratch. But ifsuggestion.checkpoint_trial_id
is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has IDtrial_id
.If
suggestion.spawn_new_trial_id
isFalse
, an existing and currently paused trial is to be resumed, whose ID issuggestion.checkpoint_trial_id
. If this trial has a checkpoint, we start from there. In this case,suggestion.config
is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten bysuggestion.config
(seeHyperbandScheduler
withtype="promotion"
for an example why this can be useful).Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.
- Parameters:
trial_id (
int
) – ID for new trial to be started (ignored if existing trial to be resumed)- Return type:
Optional
[TrialSuggestion
]- Returns:
Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned
- on_trial_add(trial)[source]
Called when a new trial is added to the trial runner.
Additions are normally triggered by
suggest
.- Parameters:
trial (
Trial
) – Trial to be added
- on_trial_error(trial)[source]
Called when a trial has failed.
- Parameters:
trial (
Trial
) – Trial for which error is reported.
- on_trial_result(trial, result)[source]
Called on each intermediate result reported by a trial.
At this point, the trial scheduler can make a decision by returning one of
SchedulerDecision.CONTINUE
,SchedulerDecision.PAUSE
, orSchedulerDecision.STOP
. This will only be called when the trial is currently running.- Parameters:
trial (
Trial
) – Trial for which results are reportedresult (
Dict
[str
,Any
]) – Result dictionary
- Return type:
str
- Returns:
Decision what to do with the trial
- on_trial_complete(trial, result)[source]
Notification for the completion of trial.
Note that
on_trial_result()
is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignoreon_trial_result()
and just useresult
here.- Parameters:
trial (
Trial
) – Trial which is completingresult (
Dict
[str
,Any
]) – Result dictionary
- on_trial_remove(trial)[source]
Called to remove trial.
This is called when the trial is in PAUSED or PENDING state. Otherwise, call
on_trial_complete()
.- Parameters:
trial (
Trial
) – Trial to be removed
- metric_names()[source]
- Return type:
List
[str
]- Returns:
List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)
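To make this contract concrete, here is a minimal custom scheduler sketch (hypothetical, not part of the library): it samples configurations at random and never pauses or stops trials; metric_mode() is assumed to follow the same convention as shown for RayTuneScheduler above.

from typing import Any, Dict, List, Optional

from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    SchedulerDecision,
    TrialScheduler,
    TrialSuggestion,
)


class PlainRandomScheduler(TrialScheduler):
    """Suggests i.i.d. random configurations, lets every trial run to the end."""

    def __init__(self, config_space: Dict[str, Any], metric: str, mode: str = "min"):
        super().__init__(config_space)
        self.metric = metric
        self.mode = mode

    def suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Sample each hyperparameter domain; constants are passed through as-is
        config = {
            name: value.sample() if isinstance(value, Domain) else value
            for name, value in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial, result: Dict[str, Any]) -> str:
        return SchedulerDecision.CONTINUE  # never pause or stop early

    def metric_names(self) -> List[str]:
        return [self.metric]

    def metric_mode(self) -> str:
        return self.mode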
syne_tune.remote package
Submodules
syne_tune.remote.constants module
syne_tune.remote.estimators module
- syne_tune.remote.estimators.instance_sagemaker_estimator(**kwargs)[source]
Returns SageMaker estimator to be used for simulator back-end experiments and for remote launching of SageMaker back-end experiments.
- Parameters:
kwargs – Extra arguments to SageMaker estimator
- Returns:
SageMaker estimator
- syne_tune.remote.estimators.basic_cpu_instance_sagemaker_estimator(**kwargs)[source]
Returns SageMaker estimator to be used for simulator back-end experiments and for remote launching of SageMaker back-end experiments.
- Parameters:
kwargs – Extra arguments to SageMaker estimator
- Returns:
SageMaker estimator
- syne_tune.remote.estimators.pytorch_estimator(**estimator_kwargs)[source]
Get the PyTorch sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
- Parameters:
estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html
- Return type:
PyTorch
- Returns:
PyTorch estimator
- syne_tune.remote.estimators.huggingface_estimator(**estimator_kwargs)[source]
Get the Huggingface sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
- Parameters:
estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html
- Return type:
HuggingFace
- Returns:
HuggingFace estimator
- syne_tune.remote.estimators.sklearn_estimator(**estimator_kwargs)[source]
Get the Scikit-learn sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
- Parameters:
estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html
- Return type:
SKLearn
- Returns:
SKLearn estimator
- syne_tune.remote.estimators.mxnet_estimator(**estimator_kwargs)[source]
Get the MXNet sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md
- Parameters:
estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html
- Return type:
MXNet
- Returns:
MXNet estimator
syne_tune.remote.remote_launcher module
- class syne_tune.remote.remote_launcher.RemoteLauncher(tuner, role=None, instance_type='ml.c5.4xlarge', dependencies=None, store_logs_localbackend=False, log_level=None, s3_path=None, no_tuner_logging=False, publish_tuning_metrics=True, **estimator_kwargs)[source]
Bases:
object
This class allows launching a tuning job remotely. The remote tuning job may use either the local backend (in which case the remote instance will be used to evaluate trials) or the SageMaker backend, in which case the remote instance will spawn one SageMaker job per trial.
- Parameters:
tuner (
Tuner
) – Tuner that should be run remotely on an
instance. Note thatStoppingCriterion
should be used for theTuner
rather than a lambda function to ensure serialization.role (
Optional
[str
]) – SageMaker role to be used to launch the remote tuning instance.instance_type (
str
) – Instance where the tuning is going to happen. Defaults to “ml.c5.4xlarge”dependencies (
Optional
[List
[str
]]) – List of folders that should be included as dependencies for the backend script to runestimator_kwargs – Extra arguments for creating the SageMaker estimator for the tuning code.
store_logs_localbackend (
bool
) – Whether to sync logs and checkpoints to S3 when using the local backend. When using SageMaker backend, logs are persisted by SageMaker. UsingTrue
can lead to failure with large checkpoints. Defaults to
log_level (
Optional
[int
]) – Logging level. Default islogging.INFO
, whilelogging.DEBUG
gives more messagess3_path (
Optional
[str
]) – S3 base path used for checkpointing, outputs of tuning will be stored under{s3_path}/{tuner_name}
. The logs of the local backend are only stored ifstore_logs_localbackend
is True. Defaults tos3_experiment_path()
no_tuner_logging (
bool
) – IfTrue
, the logging level forsyne_tune.tuner
is set tologging.ERROR
. Defaults toFalse
publish_tuning_metrics (
bool
) – IfTrue
, a number of tuning metrics (seeRemoteTuningMetricsCallback
) are reported and displayed in the SageMaker training job console. This is modifyingtuner
, in the sense that a callback is appended totuner.callbacks
. Defaults toTrue
.
- run(wait=True)[source]
- Parameters:
wait (
bool
) – Whether the call should wait until the job completes (default:True
). If False the call returns once the tuning job is scheduled on SageMaker.
- prepare_upload()[source]
Prepares the files that need to be uploaded by SageMaker so that the tuning job can happen. This includes 1) the entrypoint script of the backend, and 2) the tuner that needs to run remotely.
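Usage sketch (hedged: tuner is a Tuner built as in the ASHA example earlier on this page, and the dependency folder is a placeholder):

from syne_tune.remote.remote_launcher import RemoteLauncher

# "tuner" is a syne_tune.Tuner whose stop_criterion is a StoppingCriterion
# (not a lambda), so that it can be serialized for remote execution.
remote_launcher = RemoteLauncher(
    tuner=tuner,
    instance_type="ml.c5.4xlarge",
    dependencies=["training_scripts/"],  # hypothetical local folder
)
remote_launcher.run(wait=False)  # returns once the SageMaker job is scheduled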
syne_tune.remote.remote_main module
Entrypoint script that allows launching a tuning job remotely. It loads the tuner from a specified path and then runs it.
syne_tune.remote.remote_metrics_callback module
- class syne_tune.remote.remote_metrics_callback.RemoteTuningMetricsCallback(metric, mode, config_space=None, resource_attr=None)[source]
Bases:
TunerCallback
Reports metrics related to the experiment run by
Tuner
. With remote tuning, if these metrics are registered with the SageMaker estimator running the experiment, they are visualized in the SageMaker console. Metrics reported are:BEST_METRIC_VALUE
: Best value ofmetric
reported to tuner so farBEST_TRIAL_ID
: ID of trial for which the best metric value was reported so farBEST_RESOURCE_VALUE
: Resource value for which the best metric value was reported so far. Only ifresource_attr
is givenIf
config_space
is given, then for each hyperparametername
in there (entry with domain), we add a metricBEST_HP_PREFIX + name
. However, at mostMAX_METRICS_SUPPORTED_BY_SAGEMAKER
are supported
- register_metrics_with_estimator(estimator)[source]
Registers metrics reported here at SageMaker estimator
estimator
. This should be the one which runs the remote experiment.Note: The total number of metric definitions must not exceed
MAX_METRICS_SUPPORTED_BY_SAGEMAKER
. Otherwise, only the initial part ofmetric_names
is registered.- Parameters:
estimator (
EstimatorBase
) – SageMaker estimator to run the experiment
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
syne_tune.remote.scheduling module
syne_tune.utils package
- syne_tune.utils.add_checkpointing_to_argparse(parser)[source]
To be called for the argument parser in the endpoint script. Arguments added here are optional. If checkpointing is not supported, they are simply not parsed.
- Parameters:
parser (
ArgumentParser
) – Parser to add extra arguments to
- syne_tune.utils.resume_from_checkpointed_model(config, load_model_fn)[source]
Checks whether there is a checkpoint to be resumed from. If so, the checkpoint is loaded by calling
load_model_fn
. This function takes a local pathname (to which it appends a filename) and returns resume_from, the resource value (e.g., epoch) at which the checkpoint was written. If it fails to load the checkpoint, it may return 0, in which case training starts from scratch. This resume_from value is returned. If checkpointing is not supported in
config
, or no checkpoint is found, resume_from = 0 is returned.- Parameters:
config (
Dict
[str
,Any
]) – Configuration the training script is called withload_model_fn (
Callable
[[str
],int
]) – See above, must returnresume_from
. Seepytorch_load_save_functions()
for an example
- Return type:
int
- Returns:
resume_from
(0 if no checkpoint has been loaded)
- syne_tune.utils.checkpoint_model_at_rung_level(config, save_model_fn, resource)[source]
If checkpointing is supported, checks whether a checkpoint is to be written. This is the case if the checkpoint dir is set in
config
. A checkpoint is written by callingsave_model_fn
, passing the local pathname and resource.Note: Why is
resource
passed here? In the future, we want to support writing checkpoints only for certain resource levels. This is useful if writing the checkpoint is expensive compared to the time needed to run one resource unit.- Parameters:
config (
Dict
[str
,Any
]) – Configuration the training script is called withsave_model_fn (
Callable
[[str
,int
],Any
]) – See above. Seepytorch_load_save_functions()
for an exampleresource (
int
) – Current resource level (e.g., number of epochs done)
- syne_tune.utils.pytorch_load_save_functions(state_dict_objects, mutable_state=None, fname='checkpoint.json')[source]
Provides default
load_model_fn
,save_model_fn
functions for standard PyTorch models (arguments toresume_from_checkpointed_model()
,checkpoint_model_at_rung_level()
).- Parameters:
state_dict_objects (
Dict
[str
,Any
]) – Dict of PyTorch objects implementingstate_dict
andload_state_dict
mutable_state (
Optional
[dict
]) – Optional. Additional dict with elementary value typesfname (
str
) – Name of local file (path is taken from config)
- Returns:
load_model_fn, save_model_fn
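To show how these helpers combine in a training script, here is a minimal sketch for a PyTorch model; the model, optimizer, and the per-epoch training step are placeholders.

from argparse import ArgumentParser

import torch

from syne_tune import Reporter
from syne_tune.utils import (
    add_checkpointing_to_argparse,
    checkpoint_model_at_rung_level,
    pytorch_load_save_functions,
    resume_from_checkpointed_model,
)

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=1e-3)
    add_checkpointing_to_argparse(parser)  # adds the (optional) checkpoint dir argument
    config = vars(parser.parse_args())

    model = torch.nn.Linear(10, 1)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    # Objects implementing state_dict / load_state_dict
    load_fn, save_fn = pytorch_load_save_functions(
        {"model": model, "optimizer": optimizer}
    )

    report = Reporter()
    # Load a checkpoint if one exists; resume_from is the epoch it was written at
    resume_from = resume_from_checkpointed_model(config, load_fn)
    for epoch in range(resume_from + 1, config["epochs"] + 1):
        loss = 1.0 / epoch  # placeholder for one epoch of training
        checkpoint_model_at_rung_level(config, save_fn, epoch)
        report(epoch=epoch, loss=loss)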
- syne_tune.utils.add_config_json_to_argparse(parser)[source]
To be called for the argument parser in the endpoint script.
- Parameters:
parser (
ArgumentParser
) – Parser to add extra arguments to
- syne_tune.utils.load_config_json(args)[source]
Loads configuration from JSON file and returns the union with
args
.- Parameters:
args (
Dict
[str
,Any
]) – Arguments returned byArgumentParser
, as dictionary- Return type:
Dict
[str
,Any
]- Returns:
Combined configuration dictionary
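A minimal sketch of an endpoint script using these two helpers. The hyperparameter names lr and epochs are hypothetical, and the trial backend on the tuner side is assumed to be configured to pass arguments as a JSON file (e.g., via its pass_args_as_json option) rather than as individual command line arguments.

from argparse import ArgumentParser

from syne_tune import Reporter
from syne_tune.utils import add_config_json_to_argparse, load_config_json

if __name__ == "__main__":
    parser = ArgumentParser()
    add_config_json_to_argparse(parser)  # adds the config JSON filename argument
    args, _ = parser.parse_known_args()
    config = load_config_json(vars(args))  # union of args and JSON file content

    report = Reporter()
    # "lr" and "epochs" are hypothetical keys of the configuration space
    for epoch in range(1, config["epochs"] + 1):
        report(epoch=epoch, loss=1.0 / (epoch * config["lr"]))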
- syne_tune.utils.streamline_config_space(config_space, exclude_names=None, verbose=False)[source]
Given a configuration space
config_space
, this function returns a new configuration space where some domains may have been replaced by approximately equivalent ones, which are however better suited for Bayesian optimization. Entries with key inexclude_names
are not replaced.See
convert_domain()
for what replacement rules may be applied.- Parameters:
config_space (
Dict
[str
,Any
]) – Original configuration spaceexclude_names (
Optional
[List
[str
]]) – Do not convert entries with these keysverbose (
bool
) – Log output for replaced domains? Defaults toFalse
- Return type:
Dict
[str
,Any
]- Returns:
Streamlined configuration space
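For illustration, a small sketch; the hyperparameter names are arbitrary, and the exact replacements depend on the rules in convert_domain().

from syne_tune.config_space import choice, uniform
from syne_tune.utils import streamline_config_space

config_space = {
    # Numerical categorical: may be replaced by a (log)finrange or (log)ordinal domain
    "batch_size": choice([16, 32, 64, 128, 256]),
    "dropout": uniform(0.0, 0.5),
    "optimizer": choice(["sgd", "adam"]),  # non-numerical, left unchanged
}
streamlined = streamline_config_space(
    config_space, exclude_names=["dropout"], verbose=True
)
for name, domain in streamlined.items():
    print(name, domain)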
Submodules
syne_tune.utils.checkpoint module
- syne_tune.utils.checkpoint.add_checkpointing_to_argparse(parser)[source]
To be called for the argument parser in the endpoint script. Arguments added here are optional. If checkpointing is not supported, they are simply not parsed.
- Parameters:
parser (
ArgumentParser
) – Parser to add extra arguments to
- syne_tune.utils.checkpoint.resume_from_checkpointed_model(config, load_model_fn)[source]
Checks whether there is a checkpoint to be resumed from. If so, the checkpoint is loaded by calling
load_model_fn
. This function takes a local pathname (to which it appends a filename) and returns resume_from, the resource value (e.g., epoch) at which the checkpoint was written. If it fails to load the checkpoint, it may return 0, in which case training starts from scratch. This resume_from value is returned. If checkpointing is not supported in
config
, or no checkpoint is found, resume_from = 0 is returned.- Parameters:
config (
Dict
[str
,Any
]) – Configuration the training script is called withload_model_fn (
Callable
[[str
],int
]) – See above, must returnresume_from
. Seepytorch_load_save_functions()
for an example
- Return type:
int
- Returns:
resume_from
(0 if no checkpoint has been loaded)
- syne_tune.utils.checkpoint.checkpoint_model_at_rung_level(config, save_model_fn, resource)[source]
If checkpointing is supported, checks whether a checkpoint is to be written. This is the case if the checkpoint dir is set in
config
. A checkpoint is written by callingsave_model_fn
, passing the local pathname and resource.Note: Why is
resource
passed here? In the future, we want to support writing checkpoints only for certain resource levels. This is useful if writing the checkpoint is expensive compared to the time needed to run one resource unit.- Parameters:
config (
Dict
[str
,Any
]) – Configuration the training script is called withsave_model_fn (
Callable
[[str
,int
],Any
]) – See above. Seepytorch_load_save_functions()
for an exampleresource (
int
) – Current resource level (e.g., number of epochs done)
- syne_tune.utils.checkpoint.pytorch_load_save_functions(state_dict_objects, mutable_state=None, fname='checkpoint.json')[source]
Provides default
load_model_fn
,save_model_fn
functions for standard PyTorch models (arguments toresume_from_checkpointed_model()
,checkpoint_model_at_rung_level()
).- Parameters:
state_dict_objects (
Dict
[str
,Any
]) – Dict of PyTorch objects implementingstate_dict
andload_state_dict
mutable_state (
Optional
[dict
]) – Optional. Additional dict with elementary value typesfname (
str
) – Name of local file (path is taken from config)
- Returns:
load_model_fn, save_model_fn
syne_tune.utils.config_as_json module
syne_tune.utils.convert_domain module
- syne_tune.utils.convert_domain.fit_to_regular_grid(x)[source]
Computes the least squares fit of \(a * j + b\) to
x[j]
, where \(j = 0,\dots, n-1\). Returns the LS estimate of a
,b
, and the coefficient of determination \(R^2\).- Parameters:
x (
ndarray
) – Strictly increasing sequence- Return type:
Dict
[str
,float
]- Returns:
See above
- syne_tune.utils.convert_domain.convert_choice_domain(domain, name=None)[source]
If the choice domain
domain
has more than 2 numerical values, it is converted tofinrange()
,logfinrange()
,ordinal()
, orlogordinal()
. Otherwise,domain
is returned as is.The idea is to compute the least squares fit \(a * j + b\) to
x[j]
, where x
are the sorted values or their logs (if all values are positive). If this fit is very close (judged by the coefficient of determination \(R^2\)), we use the equispaced types finrange
orlogfinrange
, otherwise we useordinal
orlogordinal
.- Return type:
- syne_tune.utils.convert_domain.convert_linear_to_log_domain(domain, name=None)[source]
- Return type:
- syne_tune.utils.convert_domain.convert_domain(domain, name=None)[source]
If one of the following rules applies,
domain
is converted and returned, otherwise it is returned as is.domain
is categorical, its values are numerical. This is converted tofinrange()
,logfinrange()
,ordinal()
, orlogordinal()
. We fit the values or their logs to the closest regular grid, converting to(log)finrange
if the least squares fit to the grid is good enough, otherwise to(log)ordinal
, whereordinal
is withkind="nn"
. Note that the conversion to(log)finrange
may result in slightly different values.domain
is float or int
. This is converted to the same type, but in log scale, if the current scale is linear,lower
is positive, and the ratioupper / lower
is larger thanUPPER_LOWER_RATIO_THRESHOLD
.
- syne_tune.utils.convert_domain.streamline_config_space(config_space, exclude_names=None, verbose=False)[source]
Given a configuration space
config_space
, this function returns a new configuration space where some domains may have been replaced by approximately equivalent ones, which are however better suited for Bayesian optimization. Entries with key inexclude_names
are not replaced.See
convert_domain()
for what replacement rules may be applied.- Parameters:
config_space (
Dict
[str
,Any
]) – Original configuration spaceexclude_names (
Optional
[List
[str
]]) – Do not convert entries with these keysverbose (
bool
) – Log output for replaced domains? Defaults toFalse
- Return type:
Dict
[str
,Any
]- Returns:
Streamlined configuration space
syne_tune.utils.parse_bool module
Submodules
syne_tune.config_space module
- class syne_tune.config_space.Domain[source]
Bases:
object
Base class to specify a type and valid range to sample parameters from.
This base class is implemented by parameter spaces, like float ranges (
Float
), integer ranges (Integer
), or categorical variables (Categorical
). TheDomain
object contains information about valid values (e.g. minimum and maximum values), and exposes methods that allow specification of specific samplers (e.g.uniform()
orloguniform()
).- sampler = None
- default_sampler_cls = None
- property value_type
- Returns:
Type of values (one of
str
,float
,int
)
- cast(value)[source]
- Parameters:
value – Value to cast
- Returns:
value
cast to domain. For a finite domain, this can involve rounding
- sample(spec=None, size=1, random_state=None)[source]
- Parameters:
spec (
Union
[List
[dict
],dict
,None
]) – Passed to samplersize (
int
) – Number of values to sample, defaults to 1random_state (
Optional
[RandomState
]) – PRN generator
- Return type:
Union
[Any
,List
[Any
]]- Returns:
Single value (
size == 1
) or list (size > 1
)
- is_valid(value)[source]
- Parameters:
value (
Any
) – Value to test- Returns:
Is
value
a valid value in domain?
- property domain_str
- match_string(value)[source]
Returns string representation of
value
(which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g.,Integer
,Categorical
), this matches for exact equality.- Parameters:
value (
Any
) – Value of domain type (usecast()
to be safe)- Return type:
str
- Returns:
String representation useful for matching
- class syne_tune.config_space.LogUniform(base=2.718281828459045)[source]
Bases:
Sampler
Note: We keep the argument
base
for compatibility with Ray Tune. Sincebase
has no effect on the distribution, we don’t use it internally.
- class syne_tune.config_space.Float(lower, upper)[source]
Bases:
Domain
Continuous value in closed interval
[lower, upper]
.- Parameters:
lower (
float
) – Lower bound (included)upper (
float
) – Upper bound (included)
- default_sampler_cls
alias of
_Uniform
- property value_type
- Returns:
Type of values (one of
str
,float
,int
)
- is_valid(value)[source]
- Parameters:
value (
float
) – Value to test- Returns:
Is
value
a valid value in domain?
- property domain_str
- match_string(value)[source]
Returns string representation of
value
(which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g.,Integer
,Categorical
), this matches for exact equality.- Parameters:
value – Value of domain type (use
cast()
to be safe)- Return type:
str
- Returns:
String representation useful for matching
- class syne_tune.config_space.Integer(lower, upper)[source]
Bases:
Domain
Integer value in closed interval
[lower, upper]
. Note thatupper
is included.- Parameters:
lower (
int
) – Lower bound (included)upper (
int
) – Upper bound (included)
- default_sampler_cls
alias of
_Uniform
- property value_type
- Returns:
Type of values (one of
str
,float
,int
)
- cast(value)[source]
- Parameters:
value – Value to cast
- Returns:
value
cast to domain. For a finite domain, this can involve rounding
- is_valid(value)[source]
- Parameters:
value (
int
) – Value to test- Returns:
Is
value
a valid value in domain?
- property domain_str
- match_string(value)[source]
Returns string representation of
value
(which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g.,Integer
,Categorical
), this matches for exact equality.- Parameters:
value – Value of domain type (use
cast()
to be safe)- Return type:
str
- Returns:
String representation useful for matching
- class syne_tune.config_space.Categorical(categories)[source]
Bases:
Domain
Value from finite set, whose values do not have a total ordering. For values with an ordering, use
Ordinal
.- Parameters:
categories (
Sequence
) – Finite sequence, all entries must have same type
- default_sampler_cls
alias of
_Uniform
- is_valid(value)[source]
- Parameters:
value (
Any
) – Value to test- Returns:
Is
value
a valid value in domain?
- property value_type
- Returns:
Type of values (one of
str
,float
,int
)
- property domain_str
- cast(value)[source]
- Parameters:
value – Value to cast
- Returns:
value
cast to domain. For a finite domain, this can involve rounding
- match_string(value)[source]
Returns string representation of
value
(which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g.,Integer
,Categorical
), this matches for exact equality.- Parameters:
value – Value of domain type (use
cast()
to be safe)- Return type:
str
- Returns:
String representation useful for matching
- class syne_tune.config_space.Ordinal(categories)[source]
Bases:
Categorical
Represents an ordered set. As far as random sampling is concerned, this type is equivalent to
Categorical
, but when used in methods that require encodings (or distances), nearby values have closer encodings.- Parameters:
categories (
Sequence
) – Finite sequence, all entries must have same type
- class syne_tune.config_space.OrdinalNearestNeighbor(categories, log_scale=False)[source]
Bases:
Ordinal
Different type for ordered set of numerical values (int or float). Essentially, the finite set is represented by a real-valued interval containing all values, and random sampling draws a value from this interval and rounds it to the nearest value in
categories
. Iflog_scale
is True, all of this happens in log scale. Unless values are equidistant, this is different fromOrdinal
.- Parameters:
categories (
Sequence
) – Finite sequence, must be strictly increasing, value type must befloat
orint
. Iflog_scale=True
, values must be positivelog_scale (
bool
) – Encoding and NN matching in log domain?
- property lower_int: float | None
- property upper_int: float | None
- property categories_int: ndarray | None
- cast(value)[source]
- Parameters:
value – Value to cast
- Returns:
value
cast to domain. For a finite domain, this can involve rounding
- sample(spec=None, size=1, random_state=None)[source]
- Parameters:
spec (
Union
[List
[dict
],dict
,None
]) – Passed to samplersize (
int
) – Number of values to sample, defaults to 1random_state (
Optional
[RandomState
]) – PRN generator
- Return type:
Union
[Any
,List
[Any
]]- Returns:
Single value (
size == 1
) or list (size > 1
)
- class syne_tune.config_space.FiniteRange(lower, upper, size, log_scale=False, cast_int=False)[source]
Bases:
Domain
Represents a finite range
[lower, ..., upper]
withsize
values equally spaced in linear or log domain. Ifcast_int
, the value type is int (rounding after the transform).- Parameters:
lower (
float
) – Lower bound (included)upper (
float
) – Upper bound (included)size (
int
) – Number of valueslog_scale (
bool
) – Equal spacing in log domain?cast_int (
bool
) – Value type isint
(float
otherwise)
- property values
- property value_type
- Returns:
Type of values (one of
str
,float
,int
)
- cast(value)[source]
- Parameters:
value – Value to cast
- Returns:
value
cast to domain. For a finite domain, this can involve rounding
- sample(spec=None, size=1, random_state=None)[source]
- Parameters:
spec (
Union
[List
[dict
],dict
,None
]) – Passed to samplersize (
int
) – Number of values to sample, defaults to 1random_state (
Optional
[RandomState
]) – PRN generator
- Return type:
Union
[Any
,List
[Any
]]- Returns:
Single value (
size == 1
) or list (size > 1
)
- property domain_str
- match_string(value)[source]
Returns string representation of
value
(which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g.,Integer
,Categorical
), this matches for exact equality.- Parameters:
value – Value of domain type (use
cast()
to be safe)- Return type:
str
- Returns:
String representation useful for matching
- syne_tune.config_space.uniform(lower, upper)[source]
Uniform float value between
lower
andupper
- Parameters:
lower (
float
) – Lower bound (included)upper (
float
) – Upper bound (included)
- Returns:
Float
object
- syne_tune.config_space.loguniform(lower, upper)[source]
Log-uniform float value between
lower
andupper
Sampling is done as
exp(x)
, where x is uniform betweenlog(lower)
andlog(upper)
.- Parameters:
lower (
float
) – Lower bound (included; positive)upper (
float
) – Upper bound (included; positive)
- Returns:
Float
object
- syne_tune.config_space.randint(lower, upper)[source]
Uniform integer between
lower
andupper
lower
andupper
are inclusive. This differs from Ray Tune, where upper
is exclusive.- Parameters:
lower (
int
) – Lower bound (included)upper (
int
) – Upper bound (included)
- Returns:
Integer
object
- syne_tune.config_space.lograndint(lower, upper)[source]
Log-uniform integer between
lower
andupper
lower
andupper
are inclusive. Note: Ray Tune has an argumentbase
here, but since this does not affect the distribution, we drop it.- Parameters:
lower (
int
) – Lower bound (included)upper (
int
) – Upper bound (included)
- Returns:
Integer
object
- syne_tune.config_space.choice(categories)[source]
Uniform over list of categories
- Parameters:
categories (
list
) – Sequence of values, all entries must have the same type- Returns:
Categorical
object
- syne_tune.config_space.ordinal(categories, kind=None)[source]
Ordinal value from list
categories
. Different variants are selected bykind
.For
kind == "equal"
, sampling is the same as forchoice
, and the internal encoding is by int (first value maps to 0, second to 1, …).For
kind == "nn"
, the finite set is represented by a real-valued interval containing all values, and random sampling draws a value from this interval and rounds it to the nearest value incategories
. This behaves like a finite version ofuniform
orrandint
. Forkind == "nn-log"
, nearest neighbour rounding happens in log space, which behaves like a finite version of loguniform()
or lograndint()
. You can also use the synonymlogordinal()
. For this type, values incategories
must be int or float and strictly increasing, and also positive ifkind == "nn-log"
.- Parameters:
categories (
list
) – Sequence of values, all entries must have the same typekind (
Optional
[str
]) – Can be “equal”, “nn”, “nn-log”
- Returns:
Ordinal
orOrdinalNearestNeighbor
object
- syne_tune.config_space.logordinal(categories)[source]
Corresponds to
ordinal()
withkind="nn-log"
, so that nearest neighbour mapping happens in log scale. Values incategories
must be int or float, strictly increasing, and positive.- Parameters:
categories (
list
) – Sequence of values, strictly increasing, of typefloat
orint
, all positive- Returns:
OrdinalNearestNeighbor
object
- syne_tune.config_space.finrange(lower, upper, size, cast_int=False)[source]
Finite range
[lower, ..., upper]
withsize
entries, which are equally spaced. Finite alternative touniform()
.- Parameters:
lower (
float
) – Smallest feasible valueupper (
float
) – Largest feasible valuesize (
int
) – Size of (finite) domain, must be >= 2cast_int (
bool
) – Values rounded and cast to int?
- Returns:
FiniteRange
object
- syne_tune.config_space.logfinrange(lower, upper, size, cast_int=False)[source]
Finite range
[lower, ..., upper]
withsize
entries, which are equally spaced in the log domain. Finite alternative tologuniform()
.- Parameters:
lower (
float
) – Smallest feasible value (positive)upper (
float
) – Largest feasible value (positive)size (
int
) – Size of (finite) domain, must be >= 2cast_int (
bool
) – Values rounded and cast to int?
- Returns:
FiniteRange
object
- syne_tune.config_space.is_log_space(domain)[source]
- Parameters:
domain (
Domain
) – Hyperparameter type- Return type:
bool
- Returns:
Logarithmic encoding?
- syne_tune.config_space.is_uniform_space(domain)[source]
- Parameters:
domain (
Domain
) – Hyperparameter type- Return type:
bool
- Returns:
Linear (uniform) encoding?
- syne_tune.config_space.add_to_argparse(parser, config_space)[source]
Use this to prepare argument parser in endpoint script, for the non-fixed parameters in
config_space
.- Parameters:
parser (
ArgumentParser
) –argparse.ArgumentParser
objectconfig_space (
Dict
[str
,Any
]) – Configuration space (modified)
- syne_tune.config_space.cast_config_values(config, config_space)[source]
Returns config with keys, values of
config
, but values are cast to their specific types.- Parameters:
config (
Dict
[str
,Any
]) – Config whose values are to be castconfig_space (
Dict
[str
,Any
]) – Configuration space
- Return type:
Dict
[str
,Any
]- Returns:
New config with values cast to correct types
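A minimal sketch of an endpoint script using add_to_argparse() and cast_config_values(); the configuration space and hyperparameter names are placeholders, and in practice the config space is typically defined in a module shared between the tuner and the training script.

from argparse import ArgumentParser

from syne_tune import Reporter
from syne_tune.config_space import (
    add_to_argparse,
    cast_config_values,
    loguniform,
    randint,
)

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 10,  # constant, not added by add_to_argparse
}

if __name__ == "__main__":
    parser = ArgumentParser()
    add_to_argparse(parser, config_space)  # one argument per non-fixed hyperparameter
    args, _ = parser.parse_known_args()
    config = cast_config_values(vars(args), config_space)  # values cast to domain types

    report = Reporter()
    for epoch in range(1, config_space["epochs"] + 1):
        report(epoch=epoch, loss=1.0 / (epoch * config["lr"]))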
- syne_tune.config_space.non_constant_hyperparameter_keys(config_space)[source]
- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration space- Return type:
List
[str
]- Returns:
Keys corresponding to (non-fixed) hyperparameters
- syne_tune.config_space.config_space_size(config_space, upper_limit=1048576)[source]
Counts the number of distinct configurations in the configuration space
config_space
. If this is infinite (due to real-valued parameters) or larger thanupper_limit
, None is returned.- Parameters:
config_space (
Dict
[str
,Any
]) – Configuration spaceupper_limit (
int
) – See above. Defaults to2**20
- Return type:
Optional
[int
]- Returns:
Number of distinct configurations; or
None
if infinite or more thanupper_limit
- syne_tune.config_space.config_to_match_string(config, config_space, keys)[source]
Maps configuration to a match string, which can be used to compare configs for (approximate) equality. Only keys in
keys
are used, in that ordering.- Parameters:
config (
Dict
[str
,Any
]) – Configuration to be encoded in match stringconfig_space (
Dict
[str
,Any
]) – Configuration spacekeys (
List
[str
]) – Keys of parameters to be encoded
- Return type:
str
- Returns:
Match string
- syne_tune.config_space.to_dict(x)[source]
We assume that for each
Domain
subclass, the__init__()
kwargs are also members, and all other members start with_
.
- syne_tune.config_space.config_space_to_json_dict(config_space)[source]
Converts
config_space
into a dictionary that can be saved as a json file.- Parameters:
config_space (
Dict
[str
,Union
[Domain
,int
,float
,str
]]) – Configuration space- Return type:
Dict
[str
,Union
[int
,float
,str
]]- Returns:
JSON-serializable dictionary representing
config_space
- syne_tune.config_space.config_space_from_json_dict(config_space_dict)[source]
Converts the given dictionary into a Syne Tune search space.
Reverse of
config_space_to_json_dict()
.- Parameters:
config_space_dict (
Dict
[str
,Union
[int
,float
,str
]]) – JSON-serializable dict, as output byconfig_space_to_json_dict()
- Return type:
Dict
[str
,Union
[Domain
,int
,float
,str
]]- Returns:
Configuration space corresponding to
config_space_dict
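A small round-trip sketch for these two functions; the file name config_space.json is arbitrary.

import json

from syne_tune.config_space import (
    choice,
    config_space_from_json_dict,
    config_space_to_json_dict,
    loguniform,
)

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "optimizer": choice(["sgd", "adam"]),
    "epochs": 10,
}

# Serialize to a JSON file ...
with open("config_space.json", "w") as f:
    json.dump(config_space_to_json_dict(config_space), f)

# ... and restore an equivalent configuration space later
with open("config_space.json") as f:
    restored = config_space_from_json_dict(json.load(f))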
- syne_tune.config_space.restrict_domain(numerical_domain, lower, upper)[source]
Restricts a numerical domain to be in the range
[lower, upper]
- syne_tune.config_space.quniform(lower, upper, q)[source]
Sample a quantized float value uniformly between
lower
andupper
.Sampling from
tune.uniform(1, 10)
is equivalent to sampling from np.random.uniform(1, 10)
The value will be quantized, i.e. rounded to an integer increment of
q
. Quantization makes the upper bound inclusive.
- syne_tune.config_space.reverseloguniform(lower, upper)[source]
Values 0 <= x < 1, internally represented as -log(1 - x)
- Parameters:
lower (
float
) – Lower boundary of the output interval (e.g. 0.99)upper (
float
) – Upper boundary of the output interval (e.g. 0.9999)- Returns:
Float
object
- syne_tune.config_space.qloguniform(lower, upper, q)[source]
Sugar for sampling in different orders of magnitude.
The value will be quantized, i.e. rounded to an integer increment of
q
. Quantization makes the upper bound inclusive.- Parameters:
lower (
float
) – Lower boundary of the output interval (e.g. 1e-4)upper (
float
) – Upper boundary of the output interval (e.g. 1e-2)q (
float
) – Quantization number. The result will be rounded to an integer increment of this value.
syne_tune.constants module
Collects constants to be shared between core code and tuning scripts or benchmarks.
- syne_tune.constants.SYNE_TUNE_ENV_FOLDER = 'SYNETUNE_FOLDER'
Environment variable that allows overriding the default library folder
- syne_tune.constants.SYNE_TUNE_DEFAULT_FOLDER = 'syne-tune'
Name of default library folder used if the env variable is not defined
- syne_tune.constants.ST_WORKER_ITER = 'st_worker_iter'
Number of times reporter was called
- syne_tune.constants.ST_WORKER_TIMESTAMP = 'st_worker_timestamp'
Time stamp when worker was called
- syne_tune.constants.ST_WORKER_TIME = 'st_worker_time'
Time since creation of reporter
- syne_tune.constants.ST_WORKER_COST = 'st_worker_cost'
Estimate of dollar cost spent so far
- syne_tune.constants.ST_INSTANCE_TYPE = 'st_instance_type'
Instance type to be used for job execution (SageMaker backend)
- syne_tune.constants.ST_INSTANCE_COUNT = 'st_instance_count'
Number of instances to be used for job execution (SageMaker backend)
- syne_tune.constants.ST_CHECKPOINT_DIR = 'st_checkpoint_dir'
Name of config key for checkpoint directory
- syne_tune.constants.ST_CONFIG_JSON_FNAME_ARG = 'st_config_json_filename'
Name of config key for config JSON file
- syne_tune.constants.ST_REMOTE_UPLOAD_DIR_NAME = 'tuner'
Name for
upload_dir
inRemoteTuner
- syne_tune.constants.ST_RESULTS_DATAFRAME_FILENAME = 'results.csv.zip'
Name for results dataframe stored in
StoreResultsCallback
- syne_tune.constants.ST_METADATA_FILENAME = 'metadata.json'
Name for metadata file stored in
Tuner
- syne_tune.constants.ST_TUNER_DILL_FILENAME = 'tuner.dill'
Name for final tuner object file stored in
Tuner
- syne_tune.constants.ST_DATETIME_FORMAT = '%Y-%m-%d-%H-%M-%S'
Datetime format used in result path names
- syne_tune.constants.MAX_METRICS_SUPPORTED_BY_SAGEMAKER = 40
Max number of metrics allowed for estimator
- syne_tune.constants.TUNER_DEFAULT_SLEEP_TIME = 5.0
Default value for
sleep_time
syne_tune.num_gpu module
Adapted from https://github.com/aws/sagemaker-rl-container/blob/master/src/vw-serving/src/vw_serving/sagemaker/gpu.py so as not to run in shell mode, which is insecure.
syne_tune.report module
- class syne_tune.report.Reporter(add_time=True, add_cost=True)[source]
Bases:
object
Callback for reporting metric values from a training script back to Syne Tune. Example:
from syne_tune import Reporter

report = Reporter()
for epoch in range(1, epochs + 1):
    # ...
    report(epoch=epoch, accuracy=accuracy)
- Parameters:
add_time (
bool
) – If True (default), the time (in secs) since creation of theReporter
object is reported automatically asST_WORKER_TIME
add_cost (
bool
) – If True (default), estimated dollar cost since creation ofReporter
object is reported automatically asST_WORKER_COST
. This is available for SageMaker backend only. Requiresadd_time=True
.
-
add_time:
bool
= True
-
add_cost:
bool
= True
syne_tune.results_callback module
- class syne_tune.results_callback.ExtraResultsComposer[source]
Bases:
object
Base class for
extra_results_composer
argument inStoreResultsCallback
. Extracts extra results inStoreResultsCallback.on_trial_result()
and returns them as a dictionary to be appended to the results dataframe. Why don’t we use a lambda function instead? We would like the tuner, with all its dependent objects, to be dill serializable, and lambda functions are not.
- class syne_tune.results_callback.StoreResultsCallback(add_wallclock_time=True, extra_results_composer=None)[source]
Bases:
TunerCallback
Default implementation of
TunerCallback
which records all reported results and allows storing them as a CSV file.- Parameters:
add_wallclock_time (
bool
) – IfTrue
, wallclock time since call ofon_tuning_start
is stored asST_TUNER_TIME
.extra_results_composer (
Optional
[ExtraResultsComposer
]) – Optional. If given, this is called inon_trial_result()
, and the resulting dictionary is appended as extra columns to the results dataframe
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
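A sketch of adding extra columns to the stored results, under the assumption that ExtraResultsComposer subclasses implement __call__(tuner) returning a dictionary and keys() listing the extra column names, and that the running Tuner exposes its TuningStatus as tuner.tuning_status; treat these names as assumptions rather than guaranteed API.

from typing import Dict, List, Optional

from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback


class NumTrialsComposer(ExtraResultsComposer):
    """Appends the current number of started trials to every results row."""

    def __call__(self, tuner) -> Optional[Dict[str, int]]:
        # Assumes the tuner maintains a TuningStatus as `tuning_status`
        return {"num_trials_started": tuner.tuning_status.num_trials_started}

    def keys(self) -> List[str]:
        return ["num_trials_started"]


callback = StoreResultsCallback(extra_results_composer=NumTrialsComposer())
# Pass it to the tuner: Tuner(..., callbacks=[callback])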
syne_tune.stopping_criterion module
- class syne_tune.stopping_criterion.StoppingCriterion(max_wallclock_time=None, max_num_evaluations=None, max_num_trials_started=None, max_num_trials_completed=None, max_cost=None, max_num_trials_finished=None, min_metric_value=None, max_metric_value=None)[source]
Bases:
object
Stopping criterion that can be used in a Tuner, for instance
Tuner(stop_criterion=StoppingCriterion(max_wallclock_time=3600), ...)
.If several arguments are used, the combined criterion is true whenever one of the atomic criteria is true.
In principle,
stop_criterion
forTuner
can be any lambda function, but this class should be used with remote launching in order to ensure proper serialization.- Parameters:
max_wallclock_time (
Optional
[float
]) – Stop once this wallclock time is reachedmax_num_evaluations (
Optional
[int
]) – Stop once more than this number of metric records have been reportedmax_num_trials_started (
Optional
[int
]) – Stop once more than this number of trials have been startedmax_num_trials_completed (
Optional
[int
]) – Stop once more than this number of trials have been completed. This does not include trials which were stopped or failedmax_cost (
Optional
[float
]) – Stop once total cost of evaluations larger than this valuemax_num_trials_finished (
Optional
[int
]) – Stop once more than this number of trials have finished (i.e., completed, stopped, failed, or stopping)min_metric_value (
Optional
[Dict
[str
,float
]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value below a thresholdmax_metric_value (
Optional
[Dict
[str
,float
]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value above a threshold
-
max_wallclock_time:
float
= None
-
max_num_evaluations:
int
= None
-
max_num_trials_started:
int
= None
-
max_num_trials_completed:
int
= None
-
max_cost:
float
= None
-
max_num_trials_finished:
int
= None
-
min_metric_value:
Optional
[Dict
[str
,float
]] = None
-
max_metric_value:
Optional
[Dict
[str
,float
]] = None
- class syne_tune.stopping_criterion.PlateauStopper(metric, std=0.001, num_trials=10, mode='min', patience=0)[source]
Bases:
object
Stops the experiment when a metric has plateaued across N consecutive trials for more iterations than specified by the patience parameter. This code is inspired by Ray Tune.
syne_tune.try_import module
syne_tune.tuner module
- class syne_tune.tuner.Tuner(trial_backend, scheduler, stop_criterion, n_workers, sleep_time=5.0, results_update_interval=10.0, print_update_interval=30.0, max_failures=1, tuner_name=None, asynchronous_scheduling=True, wait_trial_completion_when_stopping=False, callbacks=None, metadata=None, suffix_tuner_name=True, save_tuner=True, start_jobs_without_delay=True, trial_backend_path=None)[source]
Bases:
object
Controller of the tuning loop, managing the interplay between the scheduler and the trial backend. The stopping criterion and the number of workers are also maintained here.
- Parameters:
trial_backend (
TrialBackend
) – Backend for trial evaluationsscheduler (
TrialScheduler
) – Tuning algorithm for making decisions about which trials to start, stop, pause, or resumestop_criterion (
Callable
[[TuningStatus
],bool
]) – Tuning stops when this predicates returnsTrue
. Called in each iteration with the current tuning status. It is recommended to useStoppingCriterion
.n_workers (
int
) – Number of workers used here. Note that the backend needs to support (at least) this number of workers to be run in parallelsleep_time (
float
) – Time to sleep when all workers are busy. Defaults toDEFAULT_SLEEP_TIME
results_update_interval (
float
) – Frequency at which results are updated and stored (in seconds). Defaults to 10.print_update_interval (
float
) – Frequency at which result table is printed. Defaults to 30.max_failures (
int
) – This many trial execution failures are allowed before the tuning loop is aborted. Defaults to 1tuner_name (
Optional
[str
]) – Name associated with the tuning experiment, defaults to the name of the entrypoint. Must consist of alphanumeric characters, possibly separated by ‘-’. A postfix with a date time-stamp is added to ensure uniqueness.asynchronous_scheduling (
bool
) – Whether to use asynchronous scheduling when scheduling new trials. IfTrue
, trials are scheduled as soon as a worker is available. IfFalse
, the tuner waits until all trials are finished before scheduling a new batch of size n_workers
. Defaults to True
.wait_trial_completion_when_stopping (
bool
) – How to deal with running trials when stopping criterion is met. IfTrue
, the tuner waits until all trials are finished. IfFalse
, all trials are terminated. Defaults toFalse
.callbacks (
Optional
[List
[TunerCallback
]]) – Called at certain times in the tuning loop, for example when a result is seen. The default callback stores results everyresults_update_interval
.metadata (
Optional
[dict
]) – Dictionary of user-metadata that will be persisted in{tuner_path}/{ST_METADATA_FILENAME}
, in addition to metadata provided by the user. ST_TUNER_CREATION_TIMESTAMP
is always included, which records the time stamp at which the tuner started to run.suffix_tuner_name (
bool
) – IfTrue
, a timestamp is appended to the providedtuner_name
that ensures uniqueness, otherwise the name is left unchanged and is expected to be unique. Defaults toTrue
.save_tuner (
bool
) – IfTrue
, theTuner
object is serialized at the end of tuning, including its dependencies (e.g., scheduler). This allows all details of the experiment to be recovered. Defaults toTrue
.start_jobs_without_delay (
bool
) –Defaults to
True
. If this isTrue
, the tuner starts new jobs depending on scheduler decisions communicated to the backend. For example, if a trial has just been stopped (by callingbackend.stop_trial
), the tuner may start a new one immediately, even if the SageMaker training job is still busy due to stopping delays. This can lead to faster experiment runtime, because the backend is temporarily going over its budget.If set to
False
, the tuner always asks the backend for the number of busy workers, which guarantees that we never go over then_workers
budget. This makes a difference for backends where stopping or pausing trials is not immediate (e.g.,SageMakerBackend
). Not going over budget means thatn_workers
can be set up to the available quota, without running the risk of an exception due to the quota being exceeded. If you get such exceptions, we recommend using start_jobs_without_delay=False
. Also, if the SageMaker warm pool feature is used, it is recommended to set start_jobs_without_delay=False
, since otherwise more than n_workers
warm pools will be started, because existing ones are busy with stopping when they should be reassigned.trial_backend_path (
Optional
[str
]) –If this is given, the path of
trial_backend
(where logs and checkpoints of trials are stored) is set to this. Otherwise, it is set toself.tuner_path
, so that per-trial information is written to the same path as tuning results.If the backend is
LocalBackend
and the experiment is run remotely, we recommend setting this, since otherwise checkpoints and logs are synced to S3, along with tuning results, which is costly and error-prone.
- best_config(metric=0)[source]
- Parameters:
metric (
Union
[str
,int
,None
]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0, the first metric defined in the scheduler- Return type:
Tuple
[int
,Dict
[str
,Any
]]- Returns:
The trial ID and the best configuration found while tuning for the given metric
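A minimal end-to-end sketch of constructing and running a Tuner with the local backend and an ASHA scheduler; train_script.py, the configuration space, and the reported metrics epoch and validation_error are placeholders.

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 27,
}

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=ASHA(
        config_space,
        metric="validation_error",
        mode="min",
        resource_attr="epoch",
        max_resource_attr="epochs",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=1800),
    n_workers=4,
)
tuner.run()

trial_id, best_config = tuner.best_config()
print(trial_id, best_config)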
syne_tune.tuner_callback module
- class syne_tune.tuner_callback.TunerCallback[source]
Bases:
object
Allows user of
Tuner
to monitor progress, store additional results, etc.- on_tuning_end()[source]
Called once the tuning loop terminates
This is called before
Tuner
object is serialized (optionally), and also before running jobs are stopped.
- on_loop_start()[source]
Called at start of each tuning loop iteration
Every iteration starts with fetching new results from the backend. This callback is invoked before those results are fetched.
- on_loop_end()[source]
Called at end of each tuning loop iteration
This is done before the loop stopping condition is checked and acted upon.
- on_fetch_status_results(trial_status_dict, new_results)[source]
Called just after
trial_backend.fetch_status_results
- Parameters:
trial_status_dict (
Dict
[int
,Tuple
[Trial
,str
]]) – Result offetch_status_results
new_results (
List
[Tuple
[int
,dict
]]) – Result offetch_status_results
- on_trial_complete(trial, result)[source]
Called when a trial completes (
Status.completed
)The arguments here also have been passed to
scheduler.on_trial_complete
, before this call here.- Parameters:
trial (
Trial
) – Trial that just completed.result (
Dict
[str
,Any
]) – Last result obtained.
- on_trial_result(trial, status, result, decision)[source]
Called when a new result (reported by a trial) is observed
The arguments here are inputs or outputs of
scheduler.on_trial_result
(called just before).- Parameters:
trial (
Trial
) – Trial whose report has been receivedstatus (
str
) – Status of trial beforescheduler.on_trial_result
has been calledresult (
Dict
[str
,Any
]) – Result dict receiveddecision (
str
) – Decision returned byscheduler.on_trial_result
- on_tuning_sleep(sleep_time)[source]
Called just after tuner has slept, because no worker was available
- Parameters:
sleep_time (
float
) – Time (in secs) for which tuner has just slept
syne_tune.tuning_status module
- class syne_tune.tuning_status.MetricsStatistics[source]
Bases:
object
Maintains simple running statistics (min/max/sum/count) of the metrics provided. Statistics are tracked for numeric types only; the types of the first metrics added define the types used.
- class syne_tune.tuning_status.TuningStatus(metric_names)[source]
Bases:
object
Information about a tuning job, used to display progress or to decide whether to stop the tuning job.
- Parameters:
metric_names (
List
[str
]) – Names of metrics reported
- update(trial_status_dict, new_results)[source]
Updates the tuning status given new statuses and results.
- Parameters:
trial_status_dict (
Dict
[int
,Tuple
[Trial
,str
]]) – Dictionary mapping trial ID toTrial
object and statusnew_results (
List
[Tuple
[int
,dict
]]) – New results, along with trial IDs
- mark_running_job_as_stopped()[source]
Updates the status of all trials still running, marking them as stopped.
- property num_trials_started
- Returns:
Number of trials which have been started
- property num_trials_completed
- Returns:
Number of trials which have been completed
- property num_trials_failed
- Returns:
Number of trials which have failed
- property num_trials_finished
- Returns:
Number of trials that finished, i.e. that completed, were stopped or are stopping, or failed
- property num_trials_running
- Returns:
Number of trials currently running
- property wallclock_time
- Returns:
the wallclock time spent in the tuner
- property user_time
- Returns:
the total user time spent in the workers
- property cost
- Returns:
the estimated dollar-cost spent while tuning
- syne_tune.tuning_status.print_best_metric_found(tuning_status, metric_names, mode=None)[source]
Prints trial status summary and the best metric found.
- Parameters:
tuning_status (
TuningStatus
) – Current tuning statusmetric_names (
List
[str
]) – Plot results for first metric in this listmode (
Optional
[str
]) – “min” or “max”
- Return type:
Optional
[Tuple
[int
,float
]]- Returns:
trial-id and value of the best metric found
syne_tune.util module
- class syne_tune.util.RegularCallback(callback, call_seconds_frequency)[source]
Bases:
object
Calls the callback function at most once every
call_seconds_frequency
seconds.- Parameters:
callback (
callable
) – Callback objectcall_seconds_frequency (
float
) – Wait time between subsequent calls
- syne_tune.util.experiment_path(tuner_name=None, local_path=None)[source]
Return the path of an experiment which is used both by
Tuner
and to collect results of experiments.- Parameters:
tuner_name (
Optional
[str
]) – Name of a tuning experimentlocal_path (
Optional
[str
]) – Local path where results should be saved when running locally outside of SageMaker. If not specified, then the environment variable"SYNETUNE_FOLDER"
is used if defined, otherwise ~/syne-tune/
is used. Defining the environment variable"SYNETUNE_FOLDER"
allows overriding the default path.
- Return type:
Path
- Returns:
Path where to write logs and results for Syne Tune tuner. On SageMaker, results are written to
"/opt/ml/checkpoints/"
so that files are persisted continuously to S3 by SageMaker.
- syne_tune.util.s3_experiment_path(s3_bucket=None, experiment_name=None, tuner_name=None)[source]
Returns S3 path for storing results and checkpoints.
- Parameters:
s3_bucket (
Optional
[str
]) – If not given, the default bucket for the SageMaker session is usedexperiment_name (
Optional
[str
]) – If given, this is used as first directorytuner_name (
Optional
[str
]) – If given, this is used as second directory
- Return type:
str
- Returns:
S3 path, ending on “/”
- syne_tune.util.name_from_base(base, default, max_length=63)[source]
Append a timestamp to the provided string.
This function ensures that the total length of the resulting string is not longer than the specified max length, trimming the input parameter if necessary.
- Parameters:
base (
Optional
[str
]) – String used as prefix to generate the unique namedefault (
str
) – String used if base is None
max_length (
int
) – Maximum length for the resulting string (default: 63)
- Return type:
str
- Returns:
Input parameter with appended timestamp
- syne_tune.util.repository_root_path()[source]
- Return type:
Path
- Returns:
Returns path including
syne_tune
,examples
,benchmarking
- syne_tune.util.script_checkpoint_example_path()[source]
- Return type:
Path
- Returns:
Path of checkpoint example
- syne_tune.util.script_height_example_path()[source]
- Return type:
Path
- Returns:
Path of
train_height
example
- syne_tune.util.is_increasing(lst)[source]
- Parameters:
lst (
List
[Union
[float
,int
]]) – List of float or int entries- Return type:
bool
- Returns:
Is
lst
strictly increasing?
- syne_tune.util.is_positive_integer(lst)[source]
- Parameters:
lst (
List
[int
]) – List of int entries- Return type:
bool
- Returns:
Are all entries of
lst
of typeint
and positive?
- syne_tune.util.is_integer(lst)[source]
- Parameters:
lst (
list
) – List of entries- Return type:
bool
- Returns:
Are all entries of
lst
of typeint
?
- syne_tune.util.dump_json_with_numpy(x, filename=None)[source]
Serializes dictionary
x
in JSON, taking into account NumPy-specific value types such as np.int64
.- Parameters:
x (
dict
) – Dictionary to serialize or encodefilename (
Union
[str
,Path
,None
]) – Name of file to store JSON to. Optional. If not given, the JSON encoding is returned as string
- Return type:
Optional
[str
]- Returns:
If
filename is None
, JSON encoding is returned
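A small usage sketch; the dictionary contents are arbitrary.

import numpy as np

from syne_tune.util import dump_json_with_numpy

results = {"best_trial_id": np.int64(3), "best_error": np.float64(0.071)}
# Plain json.dumps would fail on the np.int64 value
print(dump_json_with_numpy(results))  # returns the JSON encoding as string
dump_json_with_numpy(results, filename="results.json")  # or writes it to a file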
- syne_tune.util.dict_get(params, key, default)[source]
Returns
params[key]
if this exists and is not None, anddefault
otherwise. Note that this is not the same asparams.get(key, default)
. Namely, ifparams[key]
is equal to None, this would return None, but this method returnsdefault
.This function is particularly helpful when dealing with a dict returned by
argparse.ArgumentParser
. Wheneverkey
is added as argument to the parser, but a value is not provided, this leads toparams[key] = None
.- Return type:
Any
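A small example of the difference to dict.get:

from syne_tune.util import dict_get

params = {"max_epochs": None, "batch_size": 64}
print(params.get("max_epochs", 100))        # None: key exists, value is None
print(dict_get(params, "max_epochs", 100))  # 100: None is treated as missing
print(dict_get(params, "batch_size", 32))   # 64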
- syne_tune.util.recursive_merge(a, b, stop_keys=None)[source]
Merge dictionaries
a
andb
, whereb
takes precedence. We typically use this to modify a dictionarya
, sob
is smaller thana
. Further recursion is stopped on any node with key instop_keys
. Use this for dictionary-valued entries not to be merged, but to be replaced by what is inb
.- Parameters:
a (
Dict
[str
,Any
]) – Dictionaryb (
Dict
[str
,Any
]) – Dictionary (can be empty)stop_keys (
Optional
[List
[str
]]) – See above, optional
- Return type:
Dict
[str
,Any
]- Returns:
Merged dictionary
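An illustrative sketch; the dictionary contents are arbitrary, and the expected outcomes are indicated in the comments.

from syne_tune.util import recursive_merge

defaults = {
    "n_workers": 4,
    "scheduler": {
        "grace_period": 1,
        "searcher": {"name": "random", "num_init_random": 5},
    },
}
overrides = {
    "n_workers": 8,
    "scheduler": {"grace_period": 3, "searcher": {"name": "bayesopt"}},
}

merged = recursive_merge(defaults, overrides)
# "searcher" is merged: {"name": "bayesopt", "num_init_random": 5}

replaced = recursive_merge(defaults, overrides, stop_keys=["searcher"])
# "searcher" is replaced by the value from `overrides`: {"name": "bayesopt"}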
- syne_tune.util.metric_name_mode(metric_names, metric_mode, metric)[source]
Retrieve the metric mode given a metric queried by either index or name.
- Parameters:
metric_names (
List
[str
]) – Metric names defined in a schedulermetric_mode (
Union
[str
,List
[str
]]) – Metric mode or modes of a schedulermetric (
Union
[str
,int
]) – Index or name of the selected metric
- Returns:
Name and mode of the queried metric
- Return type:
Tuple
[str
,str
]