Syne Tune: Large-Scale and Reproducible Hyperparameter Optimization


This package provides state-of-the-art algorithms for hyperparameter optimization (HPO) with the following key features:

  • Wide coverage (>20) of different HPO methods, including:

    • Asynchronous versions to maximize utilization and distributed versions (i.e., with multiple workers);

    • Multi-fidelity methods supporting model-based decisions (BOHB, MOBSTER, Hyper-Tune, DyHPO, BORE);

    • Hyperparameter transfer learning to speed up (repeated) tuning jobs;

    • Multi-objective optimizers that can tune multiple objectives simultaneously (such as accuracy and latency).

  • HPO can be run in different environments (locally, AWS, simulation) by changing just one line of code.

  • Out-of-the-box tabulated benchmarks that allow you to simulate results in seconds while preserving the real dynamics of asynchronous or synchronous HPO with any number of workers.

What’s New?

  • Andreas Mueller, core contributor to scikit-learn, used Syne Tune extensively to optimize the parameters of a hypernetwork which solves tabular classification tasks faster than state-of-the-art boosted decision tree algorithms. Check out the video.

  • The experimentation framework of Syne Tune, which provides easy access to all the different methods, execution backends, and ways to run many experiments in parallel, is now available in syne_tune.experiments; there is no need to install from source anymore. This framework is the best place to start serious experimentation work with Syne Tune.

  • New tutorial: Distributed Hyperparameter Tuning: Finding the Right Model can be Fast and Fun. Provides an overview of Syne Tune and its experimentation framework.

  • You can now create comparative plots, combining the results of many experiments, as shown here.

  • Local Backend supports training with more than one GPU per trial.

  • Speculative early checkpoint removal for asynchronous multi-fidelity optimization. Retaining all checkpoints often exhausts all available disk space when training large models. With this feature, Syne Tune automatically removes checkpoints that are unlikely to be needed. Details.

  • New Multi-Objective Scheduler: LinearScalarizedScheduler. It turns a multi-objective problem into a single-objective task by optimizing a linear combination of all objectives. This wrapper works with all single-objective schedulers.

  • Support for the automatic termination criterion proposed by Makarova et al. Instead of defining a fixed number of iterations or a wall-clock time limit, we can set a threshold on how much worse we allow the final solution to be compared to the global optimum, and the optimization process stops automatically once a solution meeting this criterion is found.

Installation

To install Syne Tune from pip, you can simply do:

pip install 'syne-tune[basic]'

For development, you need to install Syne Tune from source:

git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic,dev]'

This installs Syne Tune in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.

See our change log to check what has changed in the latest version.

In the examples above, Syne Tune is installed with the tag basic, which collects a reasonable number of dependencies. If you want to install all dependencies, replace basic with extra. You can further refine this selection by using partial dependencies.

What Is Hyperparameter Optimization?

Here is an introduction to hyperparameter optimization in the context of deep learning, which uses Syne Tune for some examples.

First Example

To enable tuning, you have to report metrics from a training script so that they can be communicated to Syne Tune. This can be accomplished by just calling report(epoch=epoch, loss=loss), as shown in this example:

train_height_simple.py
import logging
import time

from syne_tune import Reporter
from argparse import ArgumentParser

if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    parser = ArgumentParser()
    parser.add_argument("--epochs", type=int)
    parser.add_argument("--width", type=float)
    parser.add_argument("--height", type=float)
    args, _ = parser.parse_known_args()
    report = Reporter()

    for step in range(args.epochs):
        time.sleep(0.1)
        dummy_score = 1.0 / (0.1 + args.width * step / 100) + args.height * 0.1
        # Feed the score back to Syne Tune
        report(epoch=step + 1, mean_loss=dummy_score)

Once you have annotated your training script in this way, you can launch a tuning experiment as follows:

launch_height_simple.py
from pathlib import Path

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

# Hyperparameter configuration space
config_space = {
    "width": randint(1, 20),
    "height": randint(1, 20),
    "epochs": 100,
}
# Scheduler (i.e., HPO algorithm)
scheduler = ASHA(
    config_space,
    metric="mean_loss",
    resource_attr="epoch",
    max_resource_attr="epochs",
    search_options={"debug_log": False},
)

entry_point = str(
    Path(__file__).parent
    / "training_scripts"
    / "height_example"
    / "train_height_simple.py"
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point=entry_point),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=30),
    n_workers=4,  # how many trials are evaluated in parallel
)
tuner.run()

This example runs ASHA with n_workers=4 workers evaluating trials asynchronously in parallel, for max_wallclock_time=30 seconds, on the local machine it is called on (trial_backend=LocalBackend(entry_point=entry_point)).
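
After the experiment finishes, you can inspect the results, for example with load_experiment (as in the XGBoost example further below):

from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment(tuner.name)
print(f"best result found: {tuning_experiment.best_config()}")
tuning_experiment.plot()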

Experimentation with Syne Tune

If you plan to use advanced features of Syne Tune, such as different execution backends or running experiments remotely, writing launcher scripts like examples/launch_height_simple.py can become tedious. Syne Tune provides an advanced experimentation framework, which you can learn about in this tutorial, or also in this one. Examples for the experimentation framework are given in benchmarking.examples and benchmarking.nursery.

Supported HPO Methods

The following hyperparameter optimization (HPO) methods are available in Syne Tune:

Method                   | Reference                | Searcher      | Asynchronous? | Multi-fidelity? | Transfer?
-------------------------|--------------------------|---------------|---------------|-----------------|----------
Grid Search              |                          | deterministic | yes           | no              | no
Random Search            | Bergstra, et al. (2011)  | random        | yes           | no              | no
Bayesian Optimization    | Snoek, et al. (2012)     | model-based   | yes           | no              | no
BORE                     | Tiao, et al. (2021)      | model-based   | yes           | no              | no
MedianStoppingRule       | Golovin, et al. (2017)   | any           | yes           | yes             | no
SyncHyperband            | Li, et al. (2018)        | random        | no            | yes             | no
SyncBOHB                 | Falkner, et al. (2018)   | model-based   | no            | yes             | no
SyncMOBSTER              | Klein, et al. (2020)     | model-based   | no            | yes             | no
ASHA                     | Li, et al. (2019)        | random        | yes           | yes             | no
BOHB                     | Falkner, et al. (2018)   | model-based   | yes           | yes             | no
MOBSTER                  | Klein, et al. (2020)     | model-based   | yes           | yes             | no
DEHB                     | Awad, et al. (2021)      | evolutionary  | no            | yes             | no
HyperTune                | Li, et al. (2022)        | model-based   | yes           | yes             | no
DyHPO *                  | Wistuba, et al. (2022)   | model-based   | yes           | yes             | no
ASHABORE                 | Tiao, et al. (2021)      | model-based   | yes           | yes             | no
PASHA                    | Bohdal, et al. (2022)    | random        | yes           | yes             | no
REA                      | Real, et al. (2019)      | evolutionary  | yes           | no              | no
KDE                      | Falkner, et al. (2018)   | model-based   | yes           | no              | no
PopulationBasedTraining  | Jaderberg, et al. (2017) | evolutionary  | no            | yes             | no
ZeroShotTransfer         | Wistuba, et al. (2015)   | deterministic | yes           | no              | yes
ASHA-CTS (ASHACTS)       | Salinas, et al. (2021)   | random        | yes           | yes             | yes
RUSH (RUSHScheduler)     | Zappella, et al. (2021)  | random        | yes           | yes             | yes
BoundingBox              | Perrone, et al. (2019)   | any           | yes           | yes             | yes

*: We implement the model-based scheduling logic of DyHPO, but use the same Gaussian process surrogate models as MOBSTER and HyperTune. The original source code for the paper is here.

The searchers fall into four broad categories: deterministic, random, evolutionary, and model-based. The random searchers sample candidate hyperparameter configurations uniformly at random, while the model-based searchers sample them non-uniformly at random, according to a model (e.g., Gaussian process, density ratio estimator) and an acquisition function. The evolutionary searchers make use of an evolutionary algorithm.

Syne Tune also supports BoTorch searchers, see BoTorch.

Supported Multi-objective Optimization Methods

Method                          | Reference                | Searcher     | Asynchronous? | Multi-fidelity? | Transfer?
--------------------------------|--------------------------|--------------|---------------|-----------------|----------
ConstrainedBayesianOptimization | Gardner, et al. (2014)   | model-based  | yes           | no              | no
MOASHA                          | Schmucker, et al. (2021) | random       | yes           | yes             | no
NSGA2                           | Deb, et al. (2002)       | evolutionary | no            | no              | no
MORandomScalarizationBayesOpt   | Paria, et al. (2018)     | model-based  | yes           | no              | no
MOLinearScalarizationBayesOpt   |                          | model-based  | yes           | no              | no

The HPO methods listed above can also be used in a multi-objective setting via scalarization (LinearScalarizationPriority) or non-dominated sorting (NonDominatedPriority).
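
To get an intuition for the scalarization approach, you can also compute a linear combination of your objectives directly in the training script and let any single-objective scheduler optimize it. Here is a minimal sketch (metric names, weights, and values are purely illustrative; LinearScalarizedScheduler performs this kind of wrapping for you on the scheduler side):

from syne_tune import Reporter

report = Reporter()
accuracy, latency = 0.92, 0.05  # stand-ins for values measured by your training code
weights = (1.0, -10.0)          # user-chosen trade-off between the objectives
scalarized = weights[0] * accuracy + weights[1] * latency
# Report the raw objectives along with the scalarized value; a single-objective
# scheduler can then optimize metric="scalarized" with mode="max".
report(accuracy=accuracy, latency=latency, scalarized=scalarized)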

Security

See CONTRIBUTING for more information.

Citing Syne Tune

If you use Syne Tune in a scientific publication, please cite the following paper:

Syne Tune: A Library for Large Scale Hyperparameter Tuning and Reproducible Research

@inproceedings{
    salinas2022syne,
    title = {{Syne Tune}: A Library for Large Scale Hyperparameter Tuning and Reproducible Research},
    author = {David Salinas and Matthias Seeger and Aaron Klein and Valerio Perrone and Martin Wistuba and Cedric Archambeau},
    booktitle = {International Conference on Automated Machine Learning, AutoML 2022},
    year = {2022},
    url = {https://proceedings.mlr.press/v188/salinas22a.html}
}

License

This project is licensed under the Apache-2.0 License.

Frequently Asked Questions

Why should I use Syne Tune?

Hyperparameter Optimization (HPO) has been an important problem for many years, and a variety of commercial and open-source tools are available to help practitioners run HPO efficiently. Notable examples of open-source tools are Ray Tune and Optuna. Here are some reasons why you may prefer Syne Tune over these alternatives:

  • Lightweight and platform-agnostic: Syne Tune is designed to work with different execution backends, so you are not locked into a particular distributed system architecture. Syne Tune runs with minimal dependencies.

  • Wide range of modalities: Syne Tune supports multi-fidelity HPO, constrained HPO, multi-objective HPO, transfer tuning, cost-aware HPO, and population-based training.

  • Simple, modular design: Rather than wrapping all sorts of other HPO frameworks, Syne Tune provides simple APIs and scheduler templates, which can easily be extended to your specific needs. Studying the code will allow you to understand what the different algorithms are doing, and how they differ from each other.

  • Industry-strength Bayesian optimization: Syne Tune has special support for Gaussian process based Bayesian optimization. The same code powers modalities like multi-fidelity HPO, constrained HPO, or cost-aware HPO, having been tried and tested for several years.

  • Support for distributed parallelized experimentation: We built Syne Tune to be able to move fast, using the parallel resources AWS SageMaker offers. Syne Tune allows ML/AI practitioners to easily set up and run studies with many experiments running in parallel.

  • Special support for researchers: Syne Tune allows for rapid development and comparison between different tuning algorithms. Its blackbox repository and simulator backend run realistic simulations of experiments many times faster than real time. Benchmarking is simple, efficient, and allows you to compare different methods apples to apples (same execution backend, implementations built from the same parts).

If you are an AWS customer, there are additional good reasons to use Syne Tune over the alternatives:

  • If you use AWS services or SageMaker frameworks day to day, Syne Tune works out of the box and fits into your normal workflow. It unlocks the power of distributed experimentation that SageMaker offers.

  • Syne Tune is developed in collaboration with the team behind the Automatic Model Tuning service.

What are the different installation options supported?

To install Syne Tune with minimal dependencies from pip, you can simply do:

pip install 'syne-tune'

If, in addition, you want to install our own Gaussian process based optimizers, Ray Tune, or the BORE optimizer, you can run pip install 'syne-tune[X]' where X can be:

  • gpsearchers: For built-in Gaussian process based optimizers (such as BayesianOptimization, MOBSTER, or HyperTune)

  • aws: AWS SageMaker dependencies. These are required for remote launching or for the SageMakerBackend

  • raytune: For Ray Tune optimizers (see RayTuneScheduler), installs all Ray Tune dependencies

  • benchmarks: For installing dependencies required to run all benchmarks locally (not needed for remote launching or SageMakerBackend)

  • blackbox-repository: Blackbox repository for simulated tuning

  • yahpo: YAHPO Gym surrogate blackboxes

  • kde: For BOHB (such as SyncBOHB, or FIFOScheduler or HyperbandScheduler with searcher="kde")

  • botorch: Bayesian optimization from BoTorch (see BoTorchSearcher)

  • dev: For developers who would like to extend Syne Tune

  • bore: For Bore optimizer (see BORE)

There are also union tags you can use:

  • basic: Union of dependencies of a reasonable size (gpsearchers, kde, aws, moo, sklearn). Even if size does not matter for your local installation, you should consider basic for remote launching of experiments.

  • extra: Union of all dependencies listed above.

Our general recommendation is to use pip install 'syne-tune[basic]', then add

  • dev if you aim to extend Syne Tune

  • benchmarks if you would like to run Syne Tune's real benchmarks locally

  • blackbox-repository if you like to run surrogate benchmarks with the simulator backend

  • visual if you like to visualize results of experiments

In order to run schedulers which depend on BoTorch, you need to add botorch, and if you would like to run Ray Tune schedulers, you need to add raytune (both of these come with many dependencies). If the size of the installation is of no concern, just use pip install 'syne-tune[extra]'.
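
For example, tags can be combined freely:

pip install 'syne-tune[basic,dev]'
pip install 'syne-tune[basic,benchmarks,blackbox-repository]'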

If you run code which needs dependencies you have not installed, a warning message tells you which tag is missing, and you can always install it later.

To install the latest version from git, run the following:

pip install git+https://github.com/awslabs/syne-tune.git

For local development, we recommend using the following setup which will enable you to easily test your changes:

git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic,dev]'

This installs everything in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.

How can I run on AWS and SageMaker?

If you want to launch experiments or training jobs on SageMaker rather than on your local machine, you will need access to AWS and SageMaker on your machine. Make sure that:

  • awscli is installed (see this link)

  • AWS credentials have been set properly (see this link).

  • The necessary SageMaker role has been created (see this page for instructions; if you've created a SageMaker notebook in the past, this role should already have been created for you).

The following command should run without error if your credentials are available:

python -c "import boto3; print(boto3.client('sagemaker').list_training_jobs(MaxResults=1))"

You can also run the following example that evaluates trials on SageMaker to test your setup.

python examples/launch_height_sagemaker.py

What are the metrics reported by default when calling the Reporter?

Whenever you call the reporter to log a result, the worker time-stamp, the worker time since the creation of the reporter and the number of times the reporter was called are logged under the fields ST_WORKER_TIMESTAMP, ST_WORKER_TIME, and ST_WORKER_ITER. In addition, when running on SageMaker, a dollar-cost estimate is logged under the field ST_WORKER_COST.

To see this behavior, you can simply call the reporter to see those metrics:

from syne_tune.report import Reporter
reporter = Reporter()
for step in range(3):
    reporter(step=step, metric=float(step) / 3)

# [tune-metric]: {"step": 0, "metric": 0.0, "st_worker_timestamp": 1644311849.6071281, "st_worker_time": 0.0001048670000045604, "st_worker_iter": 0}
# [tune-metric]: {"step": 1, "metric": 0.3333333333333333, "st_worker_timestamp": 1644311849.6071832, "st_worker_time": 0.00015910100000837701, "st_worker_iter": 1}
# [tune-metric]: {"step": 2, "metric": 0.6666666666666666, "st_worker_timestamp": 1644311849.60733, "st_worker_time": 0.00030723599996917983, "st_worker_iter": 2}

How can I utilize multiple GPUs?

To utilize multiple GPUs, you can use the local backend LocalBackend, which will run on the GPUs available on a local machine. You can also run on a remote AWS instance with multiple GPUs using the local backend and the remote launcher, see here, or run with the SageMakerBackend, which spins up one training job per trial.

When evaluating trials on a local machine with LocalBackend, each trial is by default allocated to the least occupied GPU by setting the CUDA_VISIBLE_DEVICES environment variable. When running on a machine with more than one GPU, you can adjust the number of GPUs assigned to each trial with num_gpus_per_trial. However, make sure that the product of n_workers and num_gpus_per_trial is not larger than the total number of GPUs, since otherwise trials will be delayed. You can also use gpus_to_use in order to restrict Syne Tune to a subset of the available GPUs.
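
For example (the entry point is a placeholder; the list format for gpus_to_use is an assumption, so check the LocalBackend API of your version):

from syne_tune.backend import LocalBackend

trial_backend = LocalBackend(
    entry_point="train_script.py",  # hypothetical training script
    num_gpus_per_trial=2,           # each trial gets two GPUs
    gpus_to_use=[0, 1, 2, 3],       # restrict Syne Tune to these GPUs
)
# With n_workers=2 in the Tuner, 2 trials x 2 GPUs = 4 GPUs are busy at a time.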

What is the default mode when performing optimization?

The default mode is "min" when performing optimization, so the target metric is minimized. The mode can be configured when instantiating a scheduler.
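
For example, to maximize a validation accuracy rather than minimize a loss (the metric name and configuration space are illustrative):

from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

config_space = {"width": randint(1, 20), "height": randint(1, 20), "epochs": 100}
scheduler = ASHA(
    config_space,
    metric="accuracy",
    mode="max",  # the default is "min"
    resource_attr="epoch",
    max_resource_attr="epochs",
)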

How are trials evaluated on a local machine?

When trials are executed locally (e.g., when LocalBackend is used), each trial is evaluated as a separate sub-process. As such, the number of concurrent configurations evaluated at the same time (set by n_workers when creating the Tuner) should account for the capacity of the machine where the trials are executed.

Is the tuner checkpointed?

Yes. When performing the tuning, the tuner state is regularly saved on the experiment path under tuner.dill (every 10 seconds, which can be configured with results_update_interval when creating the Tuner). This allows the use of spot instances when running a tuning remotely with the remote launcher. It also allows you to resume a past experiment or analyse the state of the scheduler at any point.

Where can I find the output of the tuning?

When running locally, the output of the tuning is saved under ~/syne-tune/{tuner-name}/ by default. When running remotely on SageMaker, the output of the tuning is saved under /opt/ml/checkpoints/ by default and the tuning output is synced regularly to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/.

Can I resume a previous tuning job?

Yes. If you want to resume tuning, you can deserialize the tuner that is regularly checkpointed to disk, possibly after having modified some part of the scheduler or adapted the stopping condition to your needs. See examples/launch_resume_tuning.py for an example which resumes a previous tuning after having updated the configuration space.

How can I change the default output folder where tuning results are stored?

To change the path where tuning results are written, you can set the environment variable SYNETUNE_FOLDER to the folder that you want.

For instance, the following runs a tuning where result files are written under ~/new-syne-tune-folder:

export SYNETUNE_FOLDER="~/new-syne-tune-folder"
python examples/launch_height_baselines.py

You can also do the following for instance to permanently change the output folder of Syne Tune:

echo 'export SYNETUNE_FOLDER="~/new-syne-tune-folder"' >> ~/.bashrc && source ~/.bashrc

What does the output of the tuning contain?

Syne Tune stores the following files: metadata.json, results.csv.zip, and tuner.dill, which contain respectively the metadata of the tuning job, the results obtained at each time step, and the state of the tuner. If you create the Tuner with save_tuner=False, the tuner.dill file is not written. The content of results.csv.zip can be customized.

How can I enable trial checkpointing?

Since trials may be paused and resumed (either by schedulers or when using spot instances), the user may checkpoint intermediate results to avoid starting computation from scratch. Model outputs and checkpoints must be written into a specific local path given by the command line argument ST_CHECKPOINT_DIR. Saving and loading model checkpoints from this directory makes it possible to restore the state when the job is stopped and resumed (setting the folder correctly and uniquely per trial is the responsibility of the backend). Here is an example of a tuning script with checkpointing enabled:

examples/training_scripts/checkpoint_example/checkpoint_example.py
import argparse
import json
import logging
import os
import time
from pathlib import Path

from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR


report = Reporter()


def load_checkpoint(checkpoint_path: Path):
    with open(checkpoint_path, "r") as f:
        return json.load(f)


def save_checkpoint(checkpoint_path: Path, epoch: int, value: float):
    os.makedirs(checkpoint_path.parent, exist_ok=True)
    with open(checkpoint_path, "w") as f:
        json.dump({"last_epoch": epoch, "last_value": value}, f)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = argparse.ArgumentParser()
    parser.add_argument("--num-epochs", type=int, required=True)
    parser.add_argument("--multiplier", type=float, default=1)
    parser.add_argument("--sleep-time", type=float, default=0.1)

    # By convention, the path used for serializing and deserializing checkpoints is passed as st_checkpoint_dir
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)

    args, _ = parser.parse_known_args()

    num_epochs = args.num_epochs
    checkpoint_path = None
    start_epoch = 1
    current_value = 0
    checkpoint_dir = getattr(args, ST_CHECKPOINT_DIR)
    if checkpoint_dir is not None:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.json"
        if checkpoint_path.exists():
            state = load_checkpoint(checkpoint_path)
            logging.info(f"resuming from previous checkpoint {state}")
            start_epoch = state["last_epoch"] + 1
            current_value = state["last_value"]

    # Write dummy loss values to illustrate the ability to retrieve metrics;
    # this should be replaced by your actual training algorithm
    for current_epoch in range(start_epoch, num_epochs + 1):
        time.sleep(args.sleep_time)
        current_value = (current_value + 1) * args.multiplier
        if checkpoint_path is not None:
            save_checkpoint(checkpoint_path, current_epoch, current_value)
        report(train_acc=current_value, epoch=current_epoch)

When using the SageMaker backend, we use the SageMaker checkpoint mechanism under the hood to sync local checkpoints to S3. Checkpoints are synced to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/{trial-id}/, where sagemaker-default-bucket is the default bucket for SageMaker. A complete example is given by examples/launch_height_sagemaker_checkpoints.py.

The same mechanism is used to regularly write the tuning results to S3 during remote tuning. However, during remote tuning with the local backend, we do not want checkpoints to be synced to S3, since they are only required temporarily on the same instance. Syncing them to S3 would be costly and error-prone, because the SageMaker mechanism is not intended to work with different processes writing to and reading from the sync directory concurrently. In this case, we can switch off syncing checkpoints to S3 (but not tuning results!) by setting trial_backend_path=backend_path_not_synced_to_s3() when creating the Tuner object. An example is fine_tuning_transformer_glue/hpo_main.py. It is also supported by default in the experimentation framework and in RemoteLauncher.

There are some convenience functions which help you to implement checkpointing for your training script. Have a look at resnet_cifar10.py:

  • Checkpoints have to be written at the end of certain epochs (namely those after which the scheduler may pause the trial). This is dealt with by checkpoint_model_at_rung_level(config, save_model_fn, epoch). Here, epoch is the current epoch, allowing the function to decide whether to checkpoint or not. save_model_fn stores the current mutable state along with epoch to a local path (see below). Finally, config contains arguments provided by the scheduler (see below).

  • Before the training loop starts (and optionally), the mutable state to start from has to be loaded from a checkpoint. This is done by resume_from_checkpointed_model(config, load_model_fn). If the checkpoint has been loaded successfully, the training loop may start with epoch resume_from + 1 instead of 1. Here, load_model_fn loads the mutable state from a checkpoint in a local path, returning its epoch value if successful, which is returned as resume_from.

In general, load_model_fn and save_model_fn have to be provided as part of the script. For most PyTorch models, you can use pytorch_load_save_functions to this end. In general, you will want to include the model, the optimizer, and the learning rate scheduler.

Finally, the scheduler provides additional information about checkpointing in config (most importantly, the path in ST_CHECKPOINT_DIR). You don’t have to worry about this: add_checkpointing_to_argparse(parser) adds corresponding arguments to the parser.
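
For orientation, here is a rough PyTorch-flavoured sketch of what saving and loading a checkpoint amounts to. This is not the code of resnet_cifar10.py, and the function signatures below are illustrative rather than the exact ones expected by the helpers above; the checkpoint filename is arbitrary:

import os
from pathlib import Path

import torch


def save_checkpoint(checkpoint_dir: str, epoch: int, model, optimizer, lr_scheduler):
    # Store the mutable state together with the epoch it corresponds to
    os.makedirs(checkpoint_dir, exist_ok=True)
    torch.save(
        {
            "epoch": epoch,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "lr_scheduler": lr_scheduler.state_dict(),
        },
        Path(checkpoint_dir) / "checkpoint.pt",
    )


def load_checkpoint(checkpoint_dir: str, model, optimizer, lr_scheduler) -> int:
    # Returns the epoch stored in the checkpoint, or 0 if no checkpoint is found
    path = Path(checkpoint_dir) / "checkpoint.pt"
    if not path.exists():
        return 0
    state = torch.load(path)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    lr_scheduler.load_state_dict(state["lr_scheduler"])
    return state["epoch"]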

How can I retrieve the best checkpoint obtained after tuning?

You can take a look at this example examples/launch_checkpoint_example.py which shows how to retrieve the best checkpoint obtained after tuning an XGBoost model.

How can I retrain the best model found after tuning?

You can call tuner.trial_backend.start_trial(config=tuner.best_config()) after tuning to retrain the best configuration. Take a look at the example examples/launch_plot_example.py, which shows how to retrain the best model found while tuning.

Which schedulers make use of checkpointing?

Checkpointing means storing the state of a trial (i.e., model parameters, optimizer or learning rate scheduler parameters), so that it can be paused and potentially resumed at a later point in time, without having to start training from scratch. The following schedulers make use of checkpointing:

  • Promotion-based asynchronous Hyperband: HyperbandScheduler with type="promotion" or type="dyhpo", as well as other asynchronous multi-fidelity schedulers. The code runs without checkpointing, but in this case, any trial which is resumed is started from scratch. For example, if a trial was paused after 9 epochs of training and is resumed later, training starts from scratch and the first 9 epochs are wasted effort. Moreover, extra variance is introduced by starting from scratch, since weights may be initialized differently. Running promotion-based Hyperband without checkpointing is not recommended.

  • Population-based training (PBT): PopulationBasedTraining does not work without checkpointing.

  • Synchronous Hyperband: SynchronousGeometricHyperbandScheduler, as well as other synchronous multi-fidelity schedulers. This code runs without checkpointing, but wastes effort in the same sense as promotion-based asynchronous Hyperband.

Checkpoints are filling up my disk. What can I do?

When tuning large models, checkpoints can be large, and with the local backend, these checkpoints are stored locally. With multi-fidelity methods, many trials may be started, and keeping all checkpoints (which is the default) may exceed the available disk space.

If the trial backend TrialBackend is created with delete_checkpoints=True, Syne Tune removes the checkpoint of a trial once it is stopped or completes. All remaining checkpoints are removed at the end of the experiment. Moreover, a number of schedulers support early checkpoint removal for paused trials when they cannot be resumed anymore.

For promotion-based asynchronous multi-fidelity schedulers (ASHA, MOBSTER, HyperTune), any paused trial can in principle be resumed in the future, and delete_checkpoints=True alone does not remove checkpoints. In this case, you can activate speculative early checkpoint removal by passing early_checkpoint_removal_kwargs when creating HyperbandScheduler (or ASHA, MOBSTER, HyperTune). This is a kwargs dictionary with the following arguments:

  • max_num_checkpoints: This is mandatory. Maximum number of trials with checkpoints being retained. Once more than this number of trials with checkpoints are present, checkpoints are removed selectively. This number must be larger than the number of workers, since running trials will always write checkpoints.

  • approx_steps: Positive integer. The computation of the ranking score is a step-wise approximation, which gets more accurate for larger approx_steps. However, this computation scales cubically in approx_steps. The default is 25, which may be sufficient in most cases, but if you need to keep the number of checkpoints quite small, you may want to tune this parameter.

  • max_wallclock_time: Maximum time in seconds the experiment is run for. This is the same as passed to StoppingCriterion, and if you use an instance of this as stop_criterion passed to Tuner, the value is taken from there. Speculative checkpoint removal can only be used if the stopping criterion includes max_wallclock_time.

  • prior_beta_mean: The method depends on the probability of the event that a trial arriving at a rung ranks better than a random paused trial with a checkpoint at this rung. These probabilities are estimated for each rung, but we need some initial guess. You are most likely fine with the default. A value < 1/2 is recommended.

  • prior_beta_size: See also prior_beta_mean. The initial guess is a Beta prior, defined in terms of mean and effective sample size (here). The smaller this positive number, the weaker the effect of the initial guess. You are most likely fine with the default.

  • min_data_at_rung: Also related to the estimators mentioned with prior_beta_mean. You are most likely fine with the default.

A complete example is examples/launch_fashionmnist_checkpoint_removal.py. For details on speculative checkpoint removal, look at HyperbandRemoveCheckpointsCallback.
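
Putting the pieces above together, a minimal sketch (the training script path and all values are illustrative):

from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA

config_space = {"width": randint(1, 20), "height": randint(1, 20), "epochs": 100}
trial_backend = LocalBackend(
    entry_point="train_script.py",  # hypothetical training script
    delete_checkpoints=True,        # remove checkpoints of stopped or completed trials
)
scheduler = ASHA(
    config_space,
    metric="mean_loss",
    resource_attr="epoch",
    max_resource_attr="epochs",
    early_checkpoint_removal_kwargs=dict(
        max_num_checkpoints=10,     # must be larger than the number of workers
        max_wallclock_time=1800,    # same value as in your StoppingCriterion
    ),
)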

Where can I find the output of my trials?

When running LocalBackend locally, the results of each trial are saved under ~/syne-tune/{tuner-name}/{trial-id}/, which contains the following files:

  • config.json: configuration that is being evaluated in the trial

  • std.err: standard error

  • std.out: standard output

In addition, any checkpoint files used by the training script, such as intermediate model checkpoints, will also be located there. This is illustrated by the following example:

tree ~/syne-tune/train-height-2022-01-12-11-08-40-971/
~/syne-tune/train-height-2022-01-12-11-08-40-971/
├── 0
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 1
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 2
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 3
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── metadata.json
├── results.csv.zip
└── tuner.dill

When running tuning remotely with the remote launcher, only config.json, metadata.json, results.csv.zip and tuner.dill are synced with S3, unless store_logs_localbackend=True is set when creating the Tuner, in which case the trial logs and information are also persisted.

Is the experimentation framework only useful to compare different HPO methods?

No, by all means no! Most of our users do not use it that way, but simply to speed up experimentation, often with a single HPO method but many variants of their problem. More details about Syne Tune for rapid experimentation are provided here and here. Just to clarify:

  • We use the term benchmark to denote a tuning problem, consisting of some code for training and evaluation, plus some default configuration space (which can be changed to result in different variants of the benchmark).

  • While the code for the experimentation framework resides in syne_tune.experiments, we collect example benchmarks in benchmarking (only available if Syne Tune is installed from source). Many of the examples there are about comparison of different HPO methods, but some are not (for example, benchmarking.examples.demo_experiment).

  • In fact, while you do not have to use the experimentation framework to run studies in Syne Tune, it is much easier than maintaining your own launcher scripts and plotting code, so you are strongly encouraged to do so, whether your goal is benchmarking HPO methods or simply finding a good ML model for your current problem faster.

How can I plot the results of a tuning?

Some basic plots can be obtained via ExperimentResult. An example is given in examples/launch_plot_results.py.

How can I plot comparative results across many experiments?

Syne Tune contains powerful plotting tools as part of the experimentation framework in syne_tune.experiments; these are detailed here. An example is provided as part of benchmarking/examples/benchmark_hypertune.

How can I specify additional tuning metadata?

By default, Syne Tune stores the time, the names and modes of the metrics being tuned, the name of the entry point, the backend name, and the scheduler name. You can also add custom metadata to your tuning job by setting metadata in Tuner as follows:

from syne_tune import Tuner

tuner = Tuner(
    ...
    tuner_name="plot-results-demo",
    metadata={"tag": "special-tag", "user": "alice"},
)

All Syne Tune and user metadata are saved when the tuner starts under metadata.json.

How do I append additional information to the results which are stored?

Results are processed and stored by callbacks passed to Tuner, in particular see StoreResultsCallback. In order to add more information, you can inherit from this class. An example is given in StoreResultsAndModelParamsCallback.

If you run experiments with tabulated benchmarks using the BlackboxRepositoryBackend, as demonstrated in launch_nasbench201_simulated.py, results are stored by SimulatorCallback instead, and you need to inherit from this class. An example is given in SimulatorAndModelParamsCallback.

I don’t want to wait, how can I launch the tuning on a remote machine?

Remote launching of experiments has a number of advantages:

  • The machine you are working on is not blocked

  • You can launch many experiments in parallel

  • You can launch experiments with any instance type you like, without having to provision them yourself. For GPU instances, you do not have to worry about setting up CUDA, etc.

You can use the remote launcher to launch an experiment on a remote machine. The remote launcher supports both LocalBackend and SageMakerBackend. In the former case, multiple trials will be evaluated on the remote machine (one use-case being to use a beefy machine), in the latter case trials will be evaluated as separate SageMaker training jobs. An example for running the remote launcher is given in launch_height_sagemaker_remotely.py.

Remote launching for experimentation is detailed in this tutorial or this tutorial.

How can I run many experiments in parallel?

You can remotely launch any number of experiments, which will then run in parallel, as detailed in this tutorial, see also these examples:

Note

In order to run these examples, you need to have installed Syne Tune from source.

How can I access results after tuning remotely?

You can either call load_experiment(), which will download files from S3 if the experiment is not found locally, or you can sync files from S3 directly to the ~/syne-tune/ folder in batch, for instance by running:

aws s3 sync s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/ ~/syne-tune/  --include "*"  --exclude "*tuner.dill"

This retrieves all results without the tuner state (you can omit the include and exclude flags if you also want to include the tuner state).

How can I specify dependencies to remote launcher or when using the SageMaker backend?

When you run remote code, you often need to install packages (e.g., scipy) or have custom code available.

  • To install packages, you can add a file requirements.txt in the same folder as your entry point script. All those packages will be installed by SageMaker when the Docker container starts.

  • To include custom code (for instance, a library that you are working on), you can set the parameter dependencies on the remote launcher or on a SageMaker framework to a list of folders. The folders indicated will be compressed, sent to S3, and added to the Python path when the container starts. More details are given in this tutorial.

How can I benchmark different methods?

The most flexible way to do so is to write a custom launcher script, as detailed in this tutorial, see also these examples:

Note

In order to run these examples, you need to have installed Syne Tune from source.

What different schedulers do you support? What are the main differences between them?

A succinct overview of supported schedulers is provided here.

Most methods can be accessed with short names from syne_tune.optimizer.baselines, which is the best place to start.

We refer to HPO algorithms as schedulers. A scheduler decides which configurations to assign to new trials, but also when to stop a running or resume a paused trial. Some schedulers delegate the first decision to a searcher. The most important differences between schedulers in the single-objective case are:

  • Does the scheduler stop trials early or pause and resume trials (HyperbandScheduler), or not (FIFOScheduler)? The former requires a resource dimension (e.g., number of epochs; size of training set) and slightly more elaborate reporting (e.g., evaluation after every epoch), but can outperform the latter by a large margin.

  • Does the searcher suggest new configurations by uniform random sampling (searcher="random") or by sequential model-based decision-making (searcher="bayesopt", searcher="kde", searcher="hypertune", searcher="botorch", searcher="dyhpo")? The latter can be more expensive if a lot of trials are run, but can also be more sample-efficient.

An overview of this landscape is given here.
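
As an illustration of these two axes (a sketch; the metric names and configuration space are illustrative):

from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers import FIFOScheduler, HyperbandScheduler

config_space = {"width": randint(1, 20), "height": randint(1, 20), "epochs": 100}

# No early stopping, model-based search:
bo_scheduler = FIFOScheduler(
    config_space, searcher="bayesopt", metric="mean_loss", mode="min"
)
# Pause-and-resume multi-fidelity scheduling with random search (this is ASHA):
asha_scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="promotion",
    metric="mean_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)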

Here is a tutorial for multi-fidelity schedulers. Further schedulers provided by Syne Tune include:

How do I define the configuration space?

While the training script defines the function to be optimized, some care needs to be taken to define the configuration space for the hyperparameter optimization problem. Since this is a global optimization problem without gradients easily available, it is most important to reduce the number of parameters. A general recommendation is to use streamline_config_space() on your configuration space, which does some automatic rewriting to enforce best practices. Details on how to choose a configuration space, and on automatic rewriting, are given here.
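
For example, a configuration space mixing numerical and categorical parameters might look as follows (the streamline_config_space import path is an assumption here; check the configuration space documentation for your version):

from syne_tune.config_space import choice, loguniform, randint, uniform
from syne_tune.utils import streamline_config_space  # import path assumed

config_space = {
    "epochs": 100,                            # constant, used as max_resource_attr
    "batch_size": choice([32, 64, 128]),
    "learning_rate": loguniform(1e-5, 1e-1),
    "dropout": uniform(0.0, 0.5),
    "num_layers": randint(1, 4),
}
# Automatic rewriting to enforce best practices, as described above
config_space = streamline_config_space(config_space)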

A powerful approach is to run experiments in parallel. Namely, split your hyperparameters into groups A, B, such that HPO over B is tractable. Draw a set of N configurations from A at random, then start N HPO experiments in parallel, where in each of them the search space is over B only, while the parameters in A are fixed. Syne Tune supports massively parallel experimentation, see this tutorial.

How do I set arguments of multi-fidelity schedulers?

When running schedulers like ASHA, MOBSTER, HyperTune, SyncHyperband, or DEHB, there are mandatory parameters resource_attr, max_resource_attr, max_t, max_resource_value. What are they for?

Full details are given in this tutorial. Multi-fidelity HPO needs metric values to be reported at regular intervals during training, for example after every epoch, or for successively larger training datasets. These reports are indexed by a resource value, which is a positive integer (for example, the number of epochs already trained).

  • resource_attr is the name of the resource attribute in the dictionary reported by the training script. For example, the script may report report(epoch=5, mean_loss=0.125) at the end of the 5-th epoch, in which case resource_attr = "epoch".

  • The training script needs to know how many resources to spend overall. For example, a neural network training script needs to know how many epochs to maximally train for. It is best practice to pass this maximum resource value as parameter into the script, which is done by making it part of the configuration space. In this case, max_resource_attr is the name of the attribute in the configuration space which contains the maximum resource value. For example, if your script should train for a maximum of 100 epochs (the scheduler may stop or pause it before, though), you could use config_space = dict(..., epochs=100), in which case max_resource_attr = "epochs".

  • Finally, you can also use max_t instead of max_resource_attr, even though this is not recommended. If you don’t want to include the maximum resource value in your configuration space, you can pass the value directly as max_t. However, this can lead to avoidable errors, and may be inefficient for some schedulers.

Note

When creating a multi-fidelity scheduler, we recommend using max_resource_attr in favour of max_t or max_resource_value, as the latter are error-prone and may be less efficient for some schedulers.

Is my training script ready for multi-fidelity tuning?

A more detailed answer to this question is given in the multi-fidelity tutorial. In short:

  • You need to define the notion of resource for your script. Resource is a discrete variable (integer), so that time/costs scale linearly in it for every configuration. A common example is epochs of training for a neural network. You need to pass the name of this argument as max_resource_attr to the multi-fidelity scheduler.

  • One input argument to your script is the maximum number of resources. Your script loops over resources until this is reached, then terminates (see the skeleton sketched after this list).

  • At the end of this resource loop (e.g., loop over training epochs), you report metrics. Here, you need to report the current resource level as well (e.g., number of epochs trained so far).

  • It is recommended to support checkpointing, as is detailed here.
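
Putting these points together, here is a minimal skeleton of such a training script (without checkpointing; the argument names and train_one_epoch are hypothetical placeholders):

from argparse import ArgumentParser

from syne_tune import Reporter


def train_one_epoch(learning_rate: float) -> float:
    # Placeholder for your actual training code; returns a validation loss
    return 1.0 / (1.0 + learning_rate)


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--epochs", type=int)  # maximum resource, passed via the config space
    parser.add_argument("--learning_rate", type=float)
    args, _ = parser.parse_known_args()
    report = Reporter()
    for epoch in range(1, args.epochs + 1):
        loss = train_one_epoch(args.learning_rate)
        # Report the metric together with the current resource level
        report(epoch=epoch, mean_loss=loss)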

Note

In pause-and-resume multi-fidelity schedulers, we know for how many resources each training job runs, since it is paused at the next rung level. Such schedulers will pass this resource level via max_resource_attr to the training script. This means that the script terminates on its own and does not have to be stopped by the trial execution backend.

How can I visualize the progress of my tuning experiment with Tensorboard?

To visualize the progress of Syne Tune in Tensorboard, you can pass the TensorboardCallback to the Tuner object:

from syne_tune.callbacks import TensorboardCallback

tuner = Tuner(
    ...
    callbacks=[TensorboardCallback()],
)

Note that you need to install tensorboardX to use this callback:

pip install tensorboardX

The callback will log all metrics that are reported in your training script via the report(...) function. Now, to open Tensorboard, run:

tensorboard --logdir ~/syne-tune/{tuner-name}/tensorboard_output

If you want to plot the cumulative optimum of the metric you want to optimize, you can pass the target_metric argument to TensorboardCallback. This will also report the best found hyperparameter configuration over time. A complete example is examples/launch_tensorboard_example.py.
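
For example (the metric name is illustrative):

from syne_tune.callbacks import TensorboardCallback

tuner = Tuner(
    ...
    callbacks=[TensorboardCallback(target_metric="mean_loss")],
)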

How can I add a new scheduler?

This is explained in detail in this tutorial, and also in examples/launch_height_standalone_scheduler.

Please do consider contributing back your efforts to the Syne Tune community, thanks!

How can I add a new tabular or surrogate benchmark?

To add a new dataset of tabular evaluations, a few steps are required.

Further details are given here.

How can I reduce delays in starting trials with the SageMaker backend?

The SageMaker backend executes each trial as a SageMaker training job, which incurs start-up delays of up to several minutes. These delays can be reduced to about 20 seconds with SageMaker managed warm pools, as detailed in this tutorial or this example. We strongly recommend using managed warm pools with the SageMaker backend.

How can I pass lists or dictionaries to the training script?

By default, the hyperparameter configuration is passed to the training script as command line arguments. This precludes parameters from having complex types, such as lists or dictionaries. The configuration can also be passed as JSON file, in which case its entries can have any type which is JSON-serializable. This mode is activated with pass_args_as_json=True when creating the trial backend:

examples/launch_height_config_json.py
        trial_backend = LocalBackend(
            entry_point=str(entry_point),
            pass_args_as_json=True,
        )

The trial backend stores the configuration as JSON file and passes its filename as command line argument. In the training script, the configuration is loaded as follows:

examples/training_scripts/height_example/train_height_config_json.py
    parser = ArgumentParser()
    # Append required argument(s):
    add_config_json_to_argparse(parser)
    args, _ = parser.parse_known_args()
    # Loads config JSON and merges with ``args``
    config = load_config_json(vars(args))

The complete example is here. Note that entries automatically appended to the configuration by Syne Tune, such as ST_CHECKPOINT_DIR, are passed as command line arguments in any case.

How can I write extra results for an experiment?

By default, Syne Tune writes its result files at the end of an experiment. Here, results.csv.zip contains all data reported by training jobs, along with time stamps. The contents of this dataframe can be customized by adding extra columns to it, as demonstrated in examples/launch_height_extra_results.py.

Examples

Tune XGBoost

Install dependencies
%pip install 'syne-tune[basic]'
%pip install xgboost
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint, uniform, loguniform
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune.experiments import load_experiment
Define the training function
def train(n_estimators: int, max_depth: int, gamma: float, reg_lambda: float):
    ''' Training function (the function to be tuned) with hyperparameters passed in as function arguments

    This example demonstrates training an XGBoost model on the UCI ML hand-written digits dataset.

    Note that the training function must be totally self-contained as it needs to be serialized.
    Everything (including variables and dependencies) must be defined or imported inside the function scope.

    For more information on XGBoost's hyperparameters, see https://xgboost.readthedocs.io/en/stable/parameter.html
    For more information about the dataset, see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html
    '''

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from syne_tune import Reporter
    import xgboost
    import numpy as np

    X, y = load_digits(return_X_y=True)

    X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

    report = Reporter()

    clf = xgboost.XGBClassifier(
        n_estimators=n_estimators,
        reg_lambda=reg_lambda,
        gamma=gamma,
        max_depth=max_depth,
    )
    clf.fit(X_train, y_train)

    y_pred = clf.predict(X_val)
    accuracy = (np.equal(y_val, y_pred) * 1.0).mean()

    # Report the metric back to Syne Tune
    report(accuracy=accuracy)
Define the tuning parameters
# Hyperparameter configuration space
config_space = {
    "max_depth": randint(1,10),
    "gamma": uniform(1,10),
    "reg_lambda": loguniform(.0000001, 1),
    "n_estimators": randint(5, 15)
}

# Scheduler (i.e., HPO algorithm)
scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max"
)

tuner = Tuner(
    trial_backend=PythonBackend(tune_function=train, config_space=config_space),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=30),
    n_workers=4,  # how many trials are evaluated in parallel
)
Run the tuning
tuner.run()

tuning_experiment = load_experiment(tuner.name)

print(f"best result found: {tuning_experiment.best_config()}")

tuning_experiment.plot()

Launch HPO Experiment Locally

examples/launch_height_baselines.py
import logging
from pathlib import Path

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import (
    RandomSearch,
    ASHA,
)
from examples.training_scripts.height_example.train_height import (
    RESOURCE_ATTR,
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)
from syne_tune.try_import import try_import_gpsearchers_message


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_epochs = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_epochs,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    scheduler_kwargs = {
        "config_space": config_space,
        "metric": METRIC_ATTR,
        "mode": METRIC_MODE,
        "max_resource_attr": MAX_RESOURCE_ATTR,
    }
    schedulers = [
        RandomSearch(**scheduler_kwargs),
        ASHA(**scheduler_kwargs, resource_attr=RESOURCE_ATTR),
    ]
    try:
        from syne_tune.optimizer.baselines import BayesianOptimization

        # example of setting additional kwargs arguments
        schedulers.append(
            BayesianOptimization(
                **scheduler_kwargs,
                search_options={"num_init_random": n_workers + 2},
            )
        )
        from syne_tune.optimizer.baselines import MOBSTER

        schedulers.append(MOBSTER(**scheduler_kwargs, resource_attr=RESOURCE_ATTR))
    except Exception:
        logging.info(try_import_gpsearchers_message())

    for scheduler in schedulers:
        logging.info(f"\n*** running scheduler {scheduler} ***\n")

        trial_backend = LocalBackend(entry_point=str(entry_point))

        stop_criterion = StoppingCriterion(
            max_wallclock_time=20, min_metric_value={METRIC_ATTR: -6.0}
        )
        tuner = Tuner(
            trial_backend=trial_backend,
            scheduler=scheduler,
            stop_criterion=stop_criterion,
            n_workers=n_workers,
        )

        tuner.run()

Along with several of the examples below, this launcher script is using the following train_height.py training script:

examples/training_scripts/height_example/train_height.py
"""
Example similar to Raytune, https://github.com/ray-project/ray/blob/master/python/ray/tune/examples/skopt_example.py
"""
import logging
import time
from typing import Optional, Dict, Any

from syne_tune import Reporter
from argparse import ArgumentParser

from syne_tune.config_space import randint


report = Reporter()


RESOURCE_ATTR = "epoch"

METRIC_ATTR = "mean_loss"

METRIC_MODE = "min"

MAX_RESOURCE_ATTR = "steps"


def train_height(step: int, width: float, height: float) -> float:
    return 100 / (10 + width * step) + 0.1 * height


def height_config_space(
    max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
    kwargs = {"sleep_time": sleep_time} if sleep_time is not None else dict()
    return {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
        **kwargs,
    }


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = ArgumentParser()
    parser.add_argument("--" + MAX_RESOURCE_ATTR, type=int)
    parser.add_argument("--width", type=float)
    parser.add_argument("--height", type=float)
    parser.add_argument("--sleep_time", type=float, default=0.1)

    args, _ = parser.parse_known_args()

    width = args.width
    height = args.height
    num_steps = getattr(args, MAX_RESOURCE_ATTR)
    for step in range(num_steps):
        # Sleep first, since results are returned at end of "epoch"
        time.sleep(args.sleep_time)
        # Feed the score back to Syne Tune.
        dummy_score = train_height(step, width, height)
        report(
            **{
                "step": step,
                METRIC_ATTR: dummy_score,
                RESOURCE_ATTR: step + 1,
            }
        )

Fine-Tuning Hugging Face Model for Sentiment Classification

examples/launch_huggingface_classification.py
"""
Example for how to fine-tune a DistilBERT model on the IMDB sentiment classification task using the Hugging Face SageMaker Framework.
"""
import logging
from pathlib import Path

from sagemaker.huggingface import HuggingFace

import syne_tune
from benchmarking.benchmark_definitions import distilbert_imdb_benchmark
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
    HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
    HUGGINGFACE_LATEST_PYTORCH_VERSION,
    HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
    HUGGINGFACE_LATEST_PY_VERSION,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    # We pick the DistilBERT on IMDB benchmark
    # The 'benchmark' dict contains arguments needed by scheduler and
    # searcher (e.g., 'mode', 'metric'), along with suggested default values
    # for other arguments (which you are free to override)
    random_seed = 31415927
    n_workers = 4
    benchmark = distilbert_imdb_benchmark()
    mode = benchmark.mode
    metric = benchmark.metric
    config_space = benchmark.config_space

    # Define Hugging Face SageMaker estimator
    root = Path(syne_tune.__path__[0]).parent
    estimator = HuggingFace(
        framework_version=HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
        transformers_version=HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
        pytorch_version=HUGGINGFACE_LATEST_PYTORCH_VERSION,
        py_version=HUGGINGFACE_LATEST_PY_VERSION,
        entry_point=str(benchmark.script),
        base_job_name="hpo-transformer",
        instance_type=benchmark.instance_type,
        instance_count=1,
        role=get_execution_role(),
        dependencies=[root / "benchmarking"],
        sagemaker_session=default_sagemaker_session(),
    )

    # SageMaker backend
    trial_backend = SageMakerBackend(
        sm_estimator=estimator,
        metrics_names=[metric],
    )

    # Random search without stopping
    scheduler = RandomSearch(
        config_space, mode=mode, metric=metric, random_seed=random_seed
    )

    stop_criterion = StoppingCriterion(
        max_wallclock_time=3000
    )  # wall clock time can be increased to 1 hour for more performance
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )

    tuner.run()

Requirements:

In this example, we use the SageMaker backend together with the SageMaker Hugging Face framework in order to fine-tune a DistilBERT model on the IMDB sentiment classification task. This task is one of our built-in benchmarks. For other ways to run this benchmark on different backends or remotely, consult this tutorial.

A more advanced example for fine-tuning Hugging Face transformers is given here.
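
The training script tuned here reports its metric through the Syne Tune Reporter, which is how the SageMaker backend and the scheduler receive results. Below is a minimal sketch of this reporting contract only (it is not the actual benchmarking.training_scripts.distilbert_on_imdb script; the metric name accuracy, the number of epochs, and the loop body are placeholders):

from syne_tune import Reporter

report = Reporter()
num_epochs = 3  # placeholder value

for epoch in range(1, num_epochs + 1):
    # Placeholder for the real work: the actual script fine-tunes the model
    # for one epoch and computes validation accuracy here
    eval_accuracy = 0.5 + 0.1 * epoch
    # One call per evaluation; the reported keys must include the metric
    # the scheduler optimizes
    report(epoch=epoch, accuracy=eval_accuracy)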

Launch HPO Experiment with Python Backend

examples/launch_height_python_backend.py
"""
An example showing how to launch the tuning of a Python function ``train_height``.
"""

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import ASHA


def train_height(steps: int, width: float, height: float):
    """
    The function to be tuned. Note that imports must be done inside the function
    and no global variables are allowed; more details on the requirements of
    tuned functions can be found in :class:`~syne_tune.backend.PythonBackend`.
    """
    import logging
    from syne_tune import Reporter
    import time

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    reporter = Reporter()
    for step in range(steps):
        dummy_score = (0.1 + width * step / 100) ** (-1) + height * 0.1
        # Feed the score back to Syne Tune.
        reporter(step=step, mean_loss=dummy_score, epoch=step + 1)
        time.sleep(0.1)


if __name__ == "__main__":
    import logging

    root = logging.getLogger()
    root.setLevel(logging.INFO)

    max_steps = 100
    n_workers = 4
    metric = "mean_loss"
    mode = "min"
    max_resource_attr = "steps"

    config_space = {
        max_resource_attr: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }

    scheduler = ASHA(
        config_space,
        metric=metric,
        max_resource_attr=max_resource_attr,
        resource_attr="epoch",
        mode=mode,
    )

    trial_backend = PythonBackend(tune_function=train_height, config_space=config_space)

    stop_criterion = StoppingCriterion(
        max_wallclock_time=10, min_metric_value={metric: -6.0}
    )
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )
    tuner.run()

The Python backend does not need a separate training script.
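
To make the contract concrete, here is a minimal sketch with a hypothetical objective: the function arguments must match the keys of the configuration space, and all imports used by the function must happen inside it, since PythonBackend serializes the function and runs it in separate processes.

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import RandomSearch


def objective(x: float, shift: float):
    # Imports must live inside the tuned function
    from syne_tune import Reporter

    reporter = Reporter()
    reporter(loss=(x - shift) ** 2)


# Keys of the config space map to the arguments of ``objective``; ``shift``
# is passed as a constant
config_space = {"x": uniform(-1.0, 1.0), "shift": 0.5}
tuner = Tuner(
    trial_backend=PythonBackend(tune_function=objective, config_space=config_space),
    scheduler=RandomSearch(config_space, metric="loss", mode="min"),
    stop_criterion=StoppingCriterion(max_wallclock_time=5),
    n_workers=2,
)
tuner.run()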

Population-Based Training (PBT)

examples/launch_pbt.py
import logging
from pathlib import Path

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import PopulationBasedTraining
from syne_tune import Tuner
from syne_tune.config_space import loguniform
from syne_tune import StoppingCriterion


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)

    max_trials = 100

    config_space = {
        "lr": loguniform(0.0001, 0.02),
    }

    entry_point = (
        Path(__file__).parent / "training_scripts" / "pbt_example" / "pbt_example.py"
    )
    trial_backend = LocalBackend(entry_point=str(entry_point))

    mode = "max"
    metric = "mean_accuracy"
    time_attr = "training_iteration"
    population_size = 2

    pbt = PopulationBasedTraining(
        config_space=config_space,
        metric=metric,
        resource_attr=time_attr,
        population_size=population_size,
        mode=mode,
        max_t=200,
        perturbation_interval=1,
    )

    local_tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=pbt,
        stop_criterion=StoppingCriterion(max_wallclock_time=20),
        n_workers=population_size,
        results_update_interval=1,
    )

    local_tuner.run()

This launcher script is using the following pbt_example.py training script:

examples/training_scripts/pbt_example/pbt_example.py
import numpy as np
import argparse
import logging
import json
import os
import random
import time

from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR

report = Reporter()


def pbt_function(config):
    """Toy PBT problem for benchmarking adaptive learning rate.

    The goal is to optimize this trainable's accuracy. The accuracy increases
    fastest at the optimal lr, which is a function of the current accuracy.

    The optimal lr schedule for this problem is the triangle wave as follows.
    Note that many lr schedules for real models also follow this shape:

     best lr
      ^
      |    /\
      |   /  \
      |  /    \
      | /      \
      ------------> accuracy

    In this problem, using PBT with a population of 2-4 is sufficient to
    roughly approximate this lr schedule. Higher population sizes will yield
    faster convergence. Training will not converge without PBT.
    """
    lr = config["lr"]
    checkpoint_dir = config.get(ST_CHECKPOINT_DIR)
    accuracy = 0.0  # end = 1000
    start = 1
    if checkpoint_dir and os.path.isdir(checkpoint_dir):
        with open(os.path.join(checkpoint_dir, "checkpoint.json"), "r") as f:
            state = json.loads(f.read())
            accuracy = state["acc"]
            start = state["step"]

    midpoint = 100  # lr starts decreasing after acc > midpoint
    q_tolerance = 3  # penalize exceeding lr by more than this multiple
    noise_level = 2  # add gaussian noise to the acc increase
    # triangle wave:
    #  - start at 0.001 @ t=0,
    #  - peak at 0.01 @ t=midpoint,
    #  - end at 0.001 @ t=midpoint * 2,
    for step in range(start, 200):
        if accuracy < midpoint:
            optimal_lr = 0.01 * accuracy / midpoint
        else:
            optimal_lr = 0.01 - 0.01 * (accuracy - midpoint) / midpoint
        optimal_lr = min(0.01, max(0.001, optimal_lr))
        # Compute accuracy increase
        q_err = max(lr, optimal_lr) / min(lr, optimal_lr)
        if q_err < q_tolerance:
            accuracy += (1.0 / q_err) * random.random()
        elif lr > optimal_lr:
            accuracy -= (q_err - q_tolerance) * random.random()
        accuracy += noise_level * np.random.normal()
        accuracy = max(0, accuracy)
        # Save checkpoint
        if checkpoint_dir is not None:
            os.makedirs(os.path.join(checkpoint_dir), exist_ok=True)
            path = os.path.join(checkpoint_dir, "checkpoint.json")
            with open(path, "w") as f:
                f.write(json.dumps({"acc": accuracy, "step": step}))

        report(
            mean_accuracy=accuracy,
            cur_lr=lr,
            training_iteration=step,
            optimal_lr=optimal_lr,  # for debugging
            q_err=q_err,  # for debugging
            # done=accuracy > midpoint * 2  # this stops the training process
        )
        time.sleep(2)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float)
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)

    args, _ = parser.parse_known_args()

    params = vars(args)
    pbt_function(params)

For this toy example, PBT is run with a population size of 2, so only two parallel workers are needed. In order to use PBT competitively, choose the SageMaker backend. Note that PBT requires your training script to support checkpointing.
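
The essential checkpointing pattern is sketched below (the file name checkpoint.json and the state fields simply mirror the script above): the script restores its state from the directory passed via the st_checkpoint_dir argument and writes it back after every step, which is what allows PBT to copy checkpoints from strong trials to weak ones.

import argparse
import json
import os

from syne_tune.constants import ST_CHECKPOINT_DIR

parser = argparse.ArgumentParser()
parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)
args, _ = parser.parse_known_args()
checkpoint_dir = vars(args)[ST_CHECKPOINT_DIR]


def load_state() -> dict:
    # Resume from the last checkpoint, which may have been written by a
    # different trial after a PBT exploit step
    if checkpoint_dir is not None and os.path.isdir(checkpoint_dir):
        with open(os.path.join(checkpoint_dir, "checkpoint.json")) as f:
            return json.load(f)
    return {"step": 0, "acc": 0.0}


def save_state(step: int, acc: float):
    # Persist the state after every step so PBT can clone it into other trials
    if checkpoint_dir is not None:
        os.makedirs(checkpoint_dir, exist_ok=True)
        with open(os.path.join(checkpoint_dir, "checkpoint.json"), "w") as f:
            json.dump({"step": step, "acc": acc}, f)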

Visualize Tuning Progress with Tensorboard

examples/launch_tensorboard_example.py
"""
Example showing how to visualize the HPO process of Syne Tune with Tensorboard.
Results will be stored in ~/syne-tune/{tuner_name}/tensorboard_output. To start
tensorboard, execute in a separate shell:

.. code:: bash

   tensorboard --logdir ~/syne-tune/{tuner_name}/tensorboard_output

Open the displayed URL in the browser.

To use this functionality you need to install tensorboardX:

.. code:: bash

   pip install tensorboardX

"""

import logging
from pathlib import Path

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from syne_tune.callbacks.tensorboard_callback import TensorboardCallback
from syne_tune.results_callback import StoreResultsCallback
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    trial_backend = LocalBackend(entry_point=entry_point)

    # Random search without stopping
    scheduler = RandomSearch(
        config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        n_workers=n_workers,
        stop_criterion=stop_criterion,
        results_update_interval=5,
        # Adding the TensorboardCallback overwrites the default callback, which
        # consists of the StoreResultsCallback. To also store results on disk,
        # we add StoreResultsCallback here explicitly.
        callbacks=[
            TensorboardCallback(target_metric=METRIC_ATTR, mode=METRIC_MODE),
            StoreResultsCallback(),
        ],
        tuner_name="tensorboardx-demo",
        metadata={"description": "just an example"},
    )

    tuner.run()

Requirements:

  • Needs tensorboardX to be installed: pip install tensorboardX.

Makes use of train_height.py.

Tensorboard visualization works by using a callback, for example TensorboardCallback, which is passed to the Tuner. In order to visualize other metrics, you may have to modify this callback.
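
For illustration, here is a hedged sketch of such a customization: a hypothetical callback that writes several reported metrics to tensorboardX. It assumes the TunerCallback.on_trial_result(trial, status, result, decision) hook; check the signature against your installed Syne Tune version, as this class is not part of the library.

from typing import Any, Dict, List

from tensorboardX import SummaryWriter

from syne_tune.backend.trial_status import Trial
from syne_tune.tuner_callback import TunerCallback


class MultiMetricTensorboardCallback(TunerCallback):
    """Hypothetical callback logging every metric in ``metrics`` to Tensorboard."""

    def __init__(self, metrics: List[str], logdir: str = "tensorboard_output"):
        self._metrics = metrics
        self._writer = SummaryWriter(logdir)
        self._step = 0

    def on_trial_result(
        self, trial: Trial, status: str, result: Dict[str, Any], decision: str
    ):
        # Assumed hook: called whenever a trial reports a result
        for name in self._metrics:
            if name in result:
                self._writer.add_scalar(
                    f"trial_{trial.trial_id}/{name}", result[name], self._step
                )
        self._step += 1

Such a callback would then be passed to the Tuner via callbacks=[...], in place of or next to TensorboardCallback.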

Bayesian Optimization with Scikit-learn Based Surrogate Model

examples/launch_sklearn_surrogate_bo.py
import copy
from pathlib import Path
from typing import Tuple
import logging

import numpy as np
from sklearn.linear_model import BayesianRidge

from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)
from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl import (
    EIAcquisitionFunction,
)
from syne_tune.optimizer.schedulers.searchers.sklearn import (
    SKLearnSurrogateSearcher,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)


class BayesianRidgePredictor(SKLearnPredictor):
    """
    Predictor for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
    """

    def __init__(self, ridge: BayesianRidge):
        self.ridge = ridge

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        return self.ridge.predict(X, return_std=True)


class BayesianRidgeEstimator(SKLearnEstimator):
    """
    Estimator for surrogate model given by ``sklearn.linear_model.BayesianRidge``.

    None of the parameters of ``BayesianRidge`` are exposed here, so they are all
    fixed up front.
    """

    def __init__(self, *args, **kwargs):
        self.ridge = BayesianRidge(*args, **kwargs)

    def fit(
        self, X: np.ndarray, y: np.ndarray, update_params: bool
    ) -> SKLearnPredictor:
        self.ridge.fit(X, y.ravel())
        return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_epochs = 100
    n_workers = 4

    config_space = {
        "width": randint(1, 20),
        "height": randint(1, 20),
        MAX_RESOURCE_ATTR: 100,
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # We use ``FIFOScheduler`` with a specific searcher based on our surrogate
    # model
    searcher = SKLearnSurrogateSearcher(
        config_space=config_space,
        metric=METRIC_ATTR,
        estimator=BayesianRidgeEstimator(),
        scoring_class=EIAcquisitionFunction,
    )
    scheduler = FIFOScheduler(
        config_space,
        metric=METRIC_ATTR,
        mode=METRIC_MODE,
        max_resource_attr=MAX_RESOURCE_ATTR,
        searcher=searcher,
    )

    tuner = Tuner(
        trial_backend=LocalBackend(entry_point=entry_point),
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(max_wallclock_time=60),
        n_workers=n_workers,
    )

    tuner.run()

Requirements:

  • Needs scikit-learn to be installed. If you installed Syne Tune with the sklearn or basic extras, this dependency is included.

In this example, a simple new surrogate model is implemented based on sklearn.linear_model.BayesianRidge, and Bayesian optimization is run with this surrogate model rather than a Gaussian process model.
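
The same wrapper pattern carries over to any scikit-learn regressor that can return predictive standard deviations. As a sketch under that assumption, sklearn.gaussian_process.GaussianProcessRegressor could be plugged in as follows (these classes are illustrative, not part of Syne Tune):

import copy
from typing import Tuple

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)


class GaussianProcessPredictor(SKLearnPredictor):
    def __init__(self, gp: GaussianProcessRegressor):
        self.gp = gp

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        # ``return_std=True`` yields predictive means and standard deviations
        return self.gp.predict(X, return_std=True)


class GaussianProcessEstimator(SKLearnEstimator):
    def __init__(self, **kwargs):
        self.gp = GaussianProcessRegressor(**kwargs)

    def fit(
        self, X: np.ndarray, y: np.ndarray, update_params: bool
    ) -> SKLearnPredictor:
        self.gp.fit(X, y.ravel())
        return GaussianProcessPredictor(gp=copy.deepcopy(self.gp))

Passing GaussianProcessEstimator() as estimator to SKLearnSurrogateSearcher in the script above would then run Bayesian optimization with this surrogate instead.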

Launch HPO Experiment with Simulator Backend

examples/launch_nasbench201_simulated.py
"""
Example for running the simulator backend on a tabulated benchmark
"""
import logging

from syne_tune.experiments.benchmark_definitions.nas201 import nas201_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import ASHA
from syne_tune import Tuner, StoppingCriterion


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    n_workers = 4
    dataset_name = "cifar100"
    benchmark = nas201_benchmark(dataset_name)

    # Simulator backend specialized to tabulated blackboxes
    max_resource_attr = benchmark.max_resource_attr
    trial_backend = BlackboxRepositoryBackend(
        elapsed_time_attr=benchmark.elapsed_time_attr,
        max_resource_attr=max_resource_attr,
        blackbox_name=benchmark.blackbox_name,
        dataset=dataset_name,
    )

    # Asynchronous successive halving (ASHA)
    blackbox = trial_backend.blackbox
    scheduler = ASHA(
        config_space=blackbox.configuration_space_with_max_resource_attr(
            max_resource_attr
        ),
        max_resource_attr=max_resource_attr,
        resource_attr=blackbox.fidelity_name(),
        mode=benchmark.mode,
        metric=benchmark.metric,
        search_options={"debug_log": False},
        random_seed=random_seed,
    )

    max_wallclock_time = 3600
    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    # Printing the status during tuning takes a lot of time, and so does
    # storing results.
    print_update_interval = 700
    results_update_interval = 300
    # It is important to set ``sleep_time`` to 0 here (mandatory for simulator
    # backend)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=0,
        results_update_interval=results_update_interval,
        print_update_interval=print_update_interval,
        # This callback is required in order to make things work with the
        # simulator callback. It makes sure that results are stored with
        # simulated time (rather than real time), and that the time_keeper
        # is advanced properly whenever the tuner loop sleeps
        callbacks=[SimulatorCallback()],
    )
    tuner.run()

Requirements:

  • The Syne Tune blackbox-repository dependencies need to be installed.

  • Needs the nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.

  • If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.

In this example, we use the simulator backend with the NASBench-201 blackbox. Since time is simulated, we can use max_wallclock_time=3600 (one hour), but the experiment finishes in mere seconds. More details about the simulator backend are found in this tutorial.
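
Once the simulated experiment has finished, results can be inspected in the usual way. A short sketch, reusing the tuner object from the script above (results are written under ~/syne-tune/{tuner.name} by default):

from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment(tuner.name)
# Best configuration found, together with its metric value
print(tuning_experiment.best_config())
# ``results`` is a dataframe with one row per reported result; since the
# simulator backend is used, time stamps refer to simulated time
print(tuning_experiment.results.head())
tuning_experiment.plot()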

Joint Tuning of Instance Type and Hyperparameters using MOASHA

examples/launch_moasha_instance_tuning.py
"""
Example showing how to tune instance types and hyperparameters with a Sagemaker Framework.
"""
import logging
from pathlib import Path

from sagemaker.huggingface import HuggingFace

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.instance_info import select_instance_type
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.config_space import loguniform, choice
from syne_tune.constants import (
    ST_WORKER_TIME,
    ST_WORKER_COST,
    ST_INSTANCE_TYPE,
)
from syne_tune.optimizer.schedulers.multiobjective import MOASHA
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
    HUGGINGFACE_LATEST_PYTORCH_VERSION,
    HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
    HUGGINGFACE_LATEST_PY_VERSION,
)
from syne_tune.remote.remote_launcher import RemoteLauncher

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    n_workers = 2
    epochs = 4

    # Select the instance types that are searched.
    # Alternatively, you can define the instance list explicitly:
    # :code:`instance_types = ["ml.c5.xlarge", "ml.m5.2xlarge"]`
    instance_types = select_instance_type(min_gpu=1, max_cost_per_hour=5.0)

    print(f"tuning over hyperparameters and instance types: {instance_types}")

    # define a search space that contains hyperparameters (learning-rate, weight-decay) and instance-type.
    config_space = {
        ST_INSTANCE_TYPE: choice(instance_types),
        "learning_rate": loguniform(1e-6, 1e-4),
        "weight_decay": loguniform(1e-5, 1e-2),
        "epochs": epochs,
        "dataset_path": "./",
    }
    entry_point = (
        Path(__file__).parent.parent
        / "benchmarking"
        / "training_scripts"
        / "distilbert_on_imdb"
        / "distilbert_on_imdb.py"
    )
    metric = "accuracy"

    # Define a MOASHA scheduler that searches over the config space to maximise accuracy and minimize cost and time.
    scheduler = MOASHA(
        max_t=epochs,
        time_attr="step",
        metrics=[metric, ST_WORKER_COST, ST_WORKER_TIME],
        mode=["max", "min", "min"],
        config_space=config_space,
    )

    # Define the training function to be tuned, use the SageMaker backend to
    # execute trials as separate training jobs (since they are quite expensive).
    trial_backend = SageMakerBackend(
        sm_estimator=HuggingFace(
            framework_version=HUGGINGFACE_LATEST_FRAMEWORK_VERSION,
            transformers_version=HUGGINGFACE_LATEST_TRANSFORMERS_VERSION,
            pytorch_version=HUGGINGFACE_LATEST_PYTORCH_VERSION,
            py_version=HUGGINGFACE_LATEST_PY_VERSION,
            entry_point=str(entry_point),
            base_job_name="hpo-transformer",
            # The instance type given here is overridden by Syne Tune with values sampled from ST_INSTANCE_TYPE.
            instance_type=DEFAULT_CPU_INSTANCE_SMALL,
            instance_count=1,
            max_run=3600,
            role=get_execution_role(),
            dependencies=[str(Path(__file__).parent.parent / "benchmarking")],
            sagemaker_session=default_sagemaker_session(),
            disable_profiler=True,
            debugger_hook_config=False,
        ),
    )

    remote_launcher = RemoteLauncher(
        tuner=Tuner(
            trial_backend=trial_backend,
            scheduler=scheduler,
            stop_criterion=StoppingCriterion(max_wallclock_time=3600, max_cost=10.0),
            n_workers=n_workers,
            sleep_time=5.0,
        ),
        dependencies=[str(Path(__file__).parent.parent / "benchmarking")],
    )

    remote_launcher.run(wait=False)

Requirements:

  • Needs code from benchmarking.training_scripts.distilbert_on_imdb, which requires Syne Tune to be installed from source.

  • Access to AWS SageMaker

  • Runs training jobs on instances of type ml.g4dn.xlarge, ml.g5.xlarge, ml.g4dn.2xlarge, ml.p2.xlarge, ml.g5.2xlarge, ml.g5.4xlarge, ml.g4dn.4xlarge, ml.g5.8xlarge, ml.g4dn.8xlarge, ml.p3.2xlarge, ml.g5.16xlarge. This list of instance types to be searched over can be modified by the user.

In this example, we use the SageMaker backend together with the SageMaker Hugging Face framework in order to fine-tune a DistilBERT model on the IMDB sentiment classification task:

  • Instead of optimizing a single objective, we use MOASHA in order to sample the Pareto frontier w.r.t. three objectives

  • We not only tune hyperparameters such as learning rate and weight decay, but also the AWS instance type to be used for training. Here, one of the objectives to minimize is the training cost (in dollars).
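
Since MOASHA returns a set of trade-off solutions rather than a single best one, the Pareto front can be extracted from the tuning results afterwards. A sketch with a small helper, assuming the objectives appear as columns of the results dataframe under the metric names used above:

import numpy as np
import pandas as pd

from syne_tune.constants import ST_WORKER_COST, ST_WORKER_TIME


def pareto_front(df: pd.DataFrame, objectives, modes) -> pd.DataFrame:
    """Keep only rows that are not dominated with respect to all objectives."""
    # Convert every objective to minimization
    values = np.stack(
        [
            df[col].values if mode == "min" else -df[col].values
            for col, mode in zip(objectives, modes)
        ],
        axis=1,
    )
    keep = []
    for row in values:
        # A row is dominated if some other row is at least as good everywhere
        # and strictly better somewhere
        dominated = np.any(
            np.all(values <= row, axis=1) & np.any(values < row, axis=1)
        )
        keep.append(not dominated)
    return df[keep]


# Hypothetical usage on the results of the experiment above:
# from syne_tune.experiments import load_experiment
# results = load_experiment("<tuner name>").results
# front = pareto_front(
#     results,
#     objectives=["accuracy", ST_WORKER_COST, ST_WORKER_TIME],
#     modes=["max", "min", "min"],
# )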

Multi-objective Asynchronous Successive Halving (MOASHA)

examples/launch_height_moasha.py
"""
Example showing how to tune multiple objectives of an artificial function at once.
"""
import logging
from pathlib import Path

import numpy as np

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers.multiobjective import MOASHA
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import uniform


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    np.random.seed(0)

    max_steps = 27
    n_workers = 4

    config_space = {
        "steps": max_steps,
        "theta": uniform(0, np.pi / 2),
        "sleep_time": 0.01,
    }
    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "mo_artificial"
        / "mo_artificial.py"
    )
    mode = "min"

    np.random.seed(0)
    scheduler = MOASHA(
        max_t=max_steps,
        time_attr="step",
        mode=mode,
        metrics=["y1", "y2"],
        config_space=config_space,
    )
    trial_backend = LocalBackend(entry_point=str(entry_point))

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=0.5,
    )
    tuner.run()

This launcher script is using the following mo_artificial.py training script:

examples/training_scripts/mo_artificial/mo_artificial.py
import time
from argparse import ArgumentParser

import numpy as np

from syne_tune import Reporter


def f(t, theta):
    # Function drawing upper-right circles with radius set to ``t`` and with center set at
    # (-t, -t). ``t`` is interpreted as a fidelity and larger ``t`` corresponds to larger radius and better candidates.
    # The optimal multiobjective solution should select theta uniformly from [0, pi/2].
    return {
        "y1": -t + t * np.cos(theta),
        "y2": -t + t * np.sin(theta),
    }


def plot_function():
    import matplotlib.pyplot as plt

    ts = np.linspace(0, 27, num=5)
    thetas = np.linspace(0, 1) * np.pi / 2
    y1s = []
    y2s = []
    for t in ts:
        for theta in thetas:
            res = f(t, theta)
            y1s.append(res["y1"])
            y2s.append(res["y2"])
    plt.scatter(y1s, y2s)
    plt.show()


if __name__ == "__main__":
    # plot_function()
    parser = ArgumentParser()
    parser.add_argument("--steps", type=int, required=True)
    parser.add_argument("--theta", type=float, required=True)
    parser.add_argument("--sleep_time", type=float, required=False, default=0.1)
    args, _ = parser.parse_known_args()

    assert 0 <= args.theta < np.pi / 2
    reporter = Reporter()
    for step in range(args.steps):
        y = f(t=step, theta=args.theta)
        reporter(step=step, **y)
        time.sleep(args.sleep_time)

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

examples/launch_pasha_nasbench201.py
"""
Example for running PASHA on NASBench201
"""
import logging

from syne_tune.experiments.benchmark_definitions.nas201 import nas201_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import PASHA
from syne_tune import Tuner, StoppingCriterion


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.WARNING)

    random_seed = 1
    nb201_random_seed = 0
    n_workers = 4
    dataset_name = "cifar100"
    benchmark = nas201_benchmark(dataset_name)

    # simulator backend specialized to tabulated blackboxes
    max_resource_attr = benchmark.max_resource_attr
    trial_backend = BlackboxRepositoryBackend(
        blackbox_name=benchmark.blackbox_name,
        elapsed_time_attr=benchmark.elapsed_time_attr,
        max_resource_attr=max_resource_attr,
        dataset=dataset_name,
        seed=nb201_random_seed,
    )

    blackbox = trial_backend.blackbox
    scheduler = PASHA(
        config_space=blackbox.configuration_space_with_max_resource_attr(
            max_resource_attr
        ),
        max_resource_attr=max_resource_attr,
        resource_attr=blackbox.fidelity_name(),
        mode=benchmark.mode,
        metric=benchmark.metric,
        random_seed=random_seed,
    )

    max_num_trials_started = 256
    stop_criterion = StoppingCriterion(max_num_trials_started=max_num_trials_started)
    # printing the status during tuning takes a lot of time, and so does
    # storing results
    print_update_interval = 700
    results_update_interval = 300
    # it is important to set ``sleep_time`` to 0 here (mandatory for simulator
    # backend)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=0,
        results_update_interval=results_update_interval,
        print_update_interval=print_update_interval,
        # this callback is required in order to make things work with the
        # simulator callback. It makes sure that results are stored with
        # simulated time (rather than real time), and that the time_keeper
        # is advanced properly whenever the tuner loop sleeps
        callbacks=[SimulatorCallback()],
    )

    tuner.run()

Requirements:

  • The Syne Tune blackbox-repository dependencies need to be installed.

  • Needs the nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.

PASHA typically uses max_num_trials_completed as the stopping criterion. After finding a strong configuration using PASHA, the next step is to fully train a model with the configuration.
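
A short sketch of both steps, reusing names from the script above (max_num_trials_completed is assumed to replace max_num_trials_started in the stopping criterion):

from syne_tune import StoppingCriterion
from syne_tune.experiments import load_experiment

# Stop once a given number of trials has completed, rather than counting
# started trials as in the script above
stop_criterion = StoppingCriterion(max_num_trials_completed=256)

# After ``tuner.run()`` has finished, retrieve the configuration found by
# PASHA; it would then be trained to completion outside of the tuning loop
best_config = load_experiment(tuner.name).best_config()
print(f"Configuration to train to completion: {best_config}")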

Constrained Bayesian Optimization

examples/launch_bayesopt_constrained.py
"""
Example for running constrained Bayesian optimization on a toy example
"""
import logging
from pathlib import Path

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.config_space import uniform
from syne_tune import StoppingCriterion, Tuner


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    n_workers = 2

    config_space = {
        "x1": uniform(-5, 10),
        "x2": uniform(0, 15),
        "constraint_offset": 1.0,  # the lower, the stricter
    }

    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "constrained_hpo"
        / "train_constrained_example.py"
    )
    mode = "max"
    metric = "objective"
    constraint_attr = "my_constraint_metric"

    # Local backend
    trial_backend = LocalBackend(entry_point=entry_point)

    # Bayesian constrained optimization:
    #   :math:`\max_x f(x), \quad \mathrm{s.t.}\; c(x) \leq 0`
    # Here, ``metric`` represents :math:`f(x)`, ``constraint_attr`` represents
    # :math:`c(x)`.
    search_options = {
        "num_init_random": n_workers,
        "constraint_attr": constraint_attr,
    }
    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt_constrained",
        search_options=search_options,
        mode=mode,
        metric=metric,
        random_seed=random_seed,
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )

    tuner.run()

This launcher script is using the following train_constrained_example.py training script:

examples/training_scripts/constrained_hpo/train_constrained_example.py
import logging
import numpy as np

from syne_tune import Reporter
from argparse import ArgumentParser


report = Reporter()


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)

    parser = ArgumentParser()
    parser.add_argument("--x1", type=float)
    parser.add_argument("--x2", type=float)
    parser.add_argument("--constraint_offset", type=float)

    args, _ = parser.parse_known_args()

    x1 = args.x1
    x2 = args.x2
    constraint_offset = args.constraint_offset
    r = 6
    objective_value = (
        (x2 - (5.1 / (4 * np.pi**2)) * x1**2 + (5 / np.pi) * x1 - r) ** 2
        + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1)
        + 10
    )
    constraint_value = (
        x1 * 2.0 - constraint_offset
    )  # feasible iff x1 <= 0.5 * constraint_offset
    report(objective=-objective_value, my_constraint_metric=constraint_value)

Restrict Scheduler to Tabulated Configurations with Simulator Backend

examples/launch_lcbench_simulated.py
"""
Example for running the simulator backend on the "lcbench" tabulated
benchmark. The scheduler is restricted to work with the configurations
which have been evaluated under the benchmark.
"""
import logging

from syne_tune.experiments.benchmark_definitions.lcbench import lcbench_benchmark
from syne_tune.blackbox_repository import BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune import Tuner, StoppingCriterion


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    n_workers = 4
    dataset_name = "airlines"
    benchmark = lcbench_benchmark(dataset_name)

    # Simulator backend specialized to tabulated blackboxes
    # Note: Even though ``lcbench_benchmark`` defines a surrogate, we
    # do not use this here
    max_resource_attr = benchmark.max_resource_attr
    trial_backend = BlackboxRepositoryBackend(
        elapsed_time_attr=benchmark.elapsed_time_attr,
        max_resource_attr=max_resource_attr,
        blackbox_name=benchmark.blackbox_name,
        dataset=dataset_name,
    )

    # GP-based Bayesian optimization
    # Using ``restrict_configurations``, we restrict the scheduler to only
    # suggest configurations which have observations in the tabulated
    # blackbox
    blackbox = trial_backend.blackbox
    restrict_configurations = blackbox.all_configurations()
    scheduler = BayesianOptimization(
        config_space=blackbox.configuration_space_with_max_resource_attr(
            max_resource_attr
        ),
        max_resource_attr=max_resource_attr,
        mode=benchmark.mode,
        metric=benchmark.metric,
        random_seed=random_seed,
        search_options=dict(restrict_configurations=restrict_configurations),
    )

    max_wallclock_time = 3600
    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    # Printing the status during tuning takes a lot of time, and so does
    # storing results.
    print_update_interval = 700
    results_update_interval = 300
    # It is important to set ``sleep_time`` to 0 here (mandatory for simulator
    # backend)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=0,
        results_update_interval=results_update_interval,
        print_update_interval=print_update_interval,
        # This callback is required in order to make things work with the
        # simulator callback. It makes sure that results are stored with
        # simulated time (rather than real time), and that the time_keeper
        # is advanced properly whenever the tuner loop sleeps
        callbacks=[SimulatorCallback()],
    )
    tuner.run()

Requirements:

  • The Syne Tune blackbox-repository dependencies need to be installed.

  • Needs the lcbench blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.

  • If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.

This example is similar to the one above, but here we use the tabulated LCBench benchmark, whose configuration space is infinite, and whose objective values have not been evaluated on a grid. With such a benchmark, we can either use a surrogate to interpolate objective values, or we can restrict the scheduler to only suggest configurations which have been observed in the benchmark. This example demonstrates the latter.

Since time is simulated, we can use max_wallclock_time=3600 (one hour), but the experiment finishes in mere seconds. More details about the simulator backend are found in this tutorial.

Tuning Reinforcement Learning

examples/launch_rl_tuning.py
"""
This launches a local HPO tuning the discount factor of PPO on cartpole.
To run this example, you should have installed dependencies in ``requirements.txt``.
"""
import logging
from pathlib import Path

import numpy as np

from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import ASHA
import syne_tune.config_space as sp
from syne_tune import Tuner, StoppingCriterion

if __name__ == "__main__":

    logging.getLogger().setLevel(logging.DEBUG)
    np.random.seed(0)
    max_steps = 100
    metric = "episode_reward_mean"
    mode = "max"
    max_resource_attr = "max_iterations"

    trial_backend = LocalBackend(
        entry_point=Path(__file__).parent
        / "training_scripts"
        / "rl_cartpole"
        / "train_cartpole.py"
    )

    scheduler = ASHA(
        config_space={
            max_resource_attr: max_steps,
            "gamma": sp.uniform(0.5, 0.99),
            "lr": sp.loguniform(1e-6, 1e-3),
        },
        metric=metric,
        mode=mode,
        max_resource_attr=max_resource_attr,
        resource_attr="training_iter",
        search_options={"debug_log": False},
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=60)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=2,
    )

    tuner.run()

    tuning_experiment = load_experiment(tuner.name)

    print(f"best result found: {tuning_experiment.best_config()}")

    tuning_experiment.plot()

This launcher script is using the following train_cartpole.py training script:

examples/training_scripts/rl_cartpole/train_cartpole.py
"""
Adapts the introductory example of RLlib that trains a Cartpole with PPO.
https://docs.ray.io/en/master/rllib/index.html
The input arguments learning rate and discount factor gamma can be tuned to maximize the episode mean reward.
"""
from argparse import ArgumentParser
from syne_tune import Reporter
from ray.rllib.algorithms.ppo import PPO

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--max_training_steps", type=int, default=100)
    parser.add_argument("--lr", type=float, default=5e-5)
    parser.add_argument("--gamma", type=float, default=0.99)
    args, _ = parser.parse_known_args()

    # Configure the algorithm.
    config = {
        # Environment (RLlib understands openAI gym registered strings).
        "env": "CartPole-v0",
        "num_workers": 2,
        # Use "tf" for TensorFlow, "torch" for PyTorch, "tf2" for
        # tf2.x eager execution
        "framework": "torch",
        "gamma": args.gamma,
        "lr": args.lr,
    }

    trainer = PPO(config=config)

    reporter = Reporter()
    # Run it for n max_training_steps iterations. A training iteration includes
    # parallel sample collection by the environment workers as well as
    # loss calculation on the collected batch and a model update.
    # Episode reward mean is reported each time.
    for i in range(args.max_training_steps):
        results = trainer.train()
        reporter(
            training_iter=i + 1,
            episode_reward_mean=results["episode_reward_mean"],
        )

This training script requires the following dependencies to be installed:

examples/training_scripts/rl_cartpole/requirements.txt
tensorboardX==2.5.1
opencv-python
ray[rllib]==2.9.1
dm-tree==0.1.8
gymnasium==0.28.1
tensorflow==2.11.1
pygame==2.1.2

Launch HPO Experiment with SageMaker Backend

examples/launch_height_sagemaker.py
"""
Example showing how to run on Sagemaker with a Sagemaker Framework.
"""
import logging
import os
from pathlib import Path

from sagemaker.pytorch import PyTorch

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    PYTORCH_LATEST_FRAMEWORK,
    PYTORCH_LATEST_PY_VERSION,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4
    max_wallclock_time = 5 * 60

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Random search without stopping
    scheduler = RandomSearch(
        config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
    )
    if "AWS_DEFAULT_REGION" not in os.environ:
        os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

    trial_backend = SageMakerBackend(
        # we tune a PyTorch Framework from Sagemaker
        sm_estimator=PyTorch(
            instance_type=DEFAULT_CPU_INSTANCE_SMALL,
            instance_count=1,
            framework_version=PYTORCH_LATEST_FRAMEWORK,
            py_version=PYTORCH_LATEST_PY_VERSION,
            entry_point=str(entry_point),
            role=get_execution_role(),
            max_run=10 * 60,
            sagemaker_session=default_sagemaker_session(),
            disable_profiler=True,
            debugger_hook_config=False,
        ),
        # Names of metrics to track. Each metric will be detected by SageMaker if it is written in the
        # following form: "[RMSE]: 1.2"; see train_height.py for an example of how metrics are logged
        metrics_names=[METRIC_ATTR],
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=5.0,
        tuner_name="hpo-hyperband",
    )

    tuner.run()

Requirements:

  • Access to AWS SageMaker.

Makes use of train_height.py.

SageMaker Backend and Checkpointing

examples/launch_height_sagemaker_checkpoints.py
import logging
from pathlib import Path

from sagemaker.pytorch import PyTorch

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
    RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import ASHA
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    PYTORCH_LATEST_FRAMEWORK,
    PYTORCH_LATEST_PY_VERSION,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4
    delete_checkpoints = True
    max_wallclock_time = 5 * 60

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "checkpoint_example"
        / "train_height_checkpoint.py"
    )

    # ASHA promotion
    scheduler = ASHA(
        config_space,
        metric=METRIC_ATTR,
        mode=METRIC_MODE,
        max_resource_attr=MAX_RESOURCE_ATTR,
        resource_attr=RESOURCE_ATTR,
        type="promotion",
        search_options={"debug_log": True},
    )
    # SageMaker backend: We use the warm pool feature here
    trial_backend = SageMakerBackend(
        sm_estimator=PyTorch(
            instance_type=DEFAULT_CPU_INSTANCE_SMALL,
            instance_count=1,
            framework_version=PYTORCH_LATEST_FRAMEWORK,
            py_version=PYTORCH_LATEST_PY_VERSION,
            entry_point=str(entry_point),
            role=get_execution_role(),
            max_run=10 * 60,
            sagemaker_session=default_sagemaker_session(),
            disable_profiler=True,
            debugger_hook_config=False,
            keep_alive_period_in_seconds=300,  # warm pool feature
        ),
        metrics_names=[METRIC_ATTR],
        delete_checkpoints=delete_checkpoints,
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=5.0,
        tuner_name="height-sagemaker-checkpoints",
        start_jobs_without_delay=False,
    )

    tuner.run()

Requirements:

  • Access to AWS SageMaker.

This launcher script is using the following train_height_checkpoint.py training script:

examples/training_scripts/checkpoint_example/train_height_checkpoint.py
import logging
import time
from typing import Optional, Dict, Any
import json
from pathlib import Path
import os
import numpy as np

from syne_tune import Reporter
from argparse import ArgumentParser

from syne_tune.config_space import randint
from syne_tune.constants import ST_CHECKPOINT_DIR


report = Reporter()


RESOURCE_ATTR = "epoch"

METRIC_ATTR = "mean_loss"

METRIC_MODE = "min"

MAX_RESOURCE_ATTR = "steps"


def load_checkpoint(checkpoint_path: Path) -> Dict[str, Any]:
    with open(checkpoint_path, "r") as f:
        return json.load(f)


def save_checkpoint(checkpoint_path: Path, epoch: int, value: float):
    os.makedirs(checkpoint_path.parent, exist_ok=True)
    with open(checkpoint_path, "w") as f:
        json.dump({"epoch": epoch, "value": value}, f)


def train_height_delta(step: int, width: float, height: float, value: float) -> float:
    """
    For the original example, we have that

    .. math::
       f(t + 1) - f(t) = f(t) \cdot \frac{w}{10 + w \cdot t},

       f(0) = 10 + h / 10

    We implement an incremental version with a stochastic term.

    :param step: Step t, nonnegative int
    :param width: Width w, nonnegative
    :param height: Height h
    :param value: Value :math:`f(t - 1)` if :math:`t > 0`
    :return: New value :math:`f(t)`
    """
    u = 1.0 - 0.1 * np.random.rand()  # uniform(0.9, 1) multiplier
    if step == 0:
        return u * 10 + 0.1 * height
    else:
        return value * (1.0 + u * width / (width * (step - 1) + 10))


def height_config_space(
    max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
    kwargs = {"sleep_time": sleep_time} if sleep_time is not None else dict()
    return {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
        **kwargs,
    }


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = ArgumentParser()
    parser.add_argument("--" + MAX_RESOURCE_ATTR, type=int)
    parser.add_argument("--width", type=float)
    parser.add_argument("--height", type=float)
    parser.add_argument("--sleep_time", type=float, default=0.1)
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)

    args, _ = parser.parse_known_args()

    width = args.width
    height = args.height
    checkpoint_dir = getattr(args, ST_CHECKPOINT_DIR)
    num_steps = getattr(args, MAX_RESOURCE_ATTR)
    start_step = 0
    value = 0.0
    if checkpoint_dir is not None:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.json"
        if checkpoint_path.exists():
            state = load_checkpoint(checkpoint_path)
            start_step = state["epoch"]
            value = state["value"]
    else:
        checkpoint_path = None

    for step in range(start_step, num_steps):
        # Sleep first, since results are returned at end of "epoch"
        time.sleep(args.sleep_time)
        # Feed the score back to Syne Tune.
        value = train_height_delta(step, width, height, value)
        epoch = step + 1
        if checkpoint_path is not None:
            save_checkpoint(checkpoint_path, epoch, value)
        report(
            **{
                "step": step,
                METRIC_ATTR: value,
                RESOURCE_ATTR: epoch,
            }
        )

Note that SageMakerBackend is configured to use SageMaker managed warm pools:

  • keep_alive_period_in_seconds=300 in the definition of the SageMaker estimator

  • start_jobs_without_delay=False when creating Tuner

Managed warm pools reduce both start-up and stop delays substantially; they are strongly recommended for multi-fidelity HPO with the SageMaker backend. More details are found in this tutorial.

Retrieving the Best Checkpoint

examples/launch_checkpoint_example.py
"""
An example showing how to retrieve the best checkpoint of an XGBoost model.
The script being tuned ``xgboost_checkpoint.py`` stores the checkpoint obtained after each trial evaluation.
After the tuning is done, this example loads the best checkpoint and evaluates the model.
"""

import logging
from pathlib import Path

from examples.training_scripts.xgboost.xgboost_checkpoint import evaluate_accuracy
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import BayesianOptimization
from syne_tune import Tuner, StoppingCriterion
import syne_tune.config_space as cs


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    n_workers = 4

    config_space = {
        "max_depth": cs.randint(2, 5),
        "gamma": cs.uniform(1, 9),
        "reg_lambda": cs.loguniform(1e-6, 1),
        "n_estimators": cs.randint(1, 10),
    }

    entry_point = (
        Path(__file__).parent / "training_scripts" / "xgboost" / "xgboost_checkpoint.py"
    )

    trial_backend = LocalBackend(entry_point=str(entry_point))

    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=BayesianOptimization(config_space, metric="merror", mode="min"),
        stop_criterion=StoppingCriterion(max_wallclock_time=10),
        n_workers=n_workers,
    )

    tuner.run()

    exp = load_experiment(tuner.name)
    best_config = exp.best_config()
    checkpoint = trial_backend.checkpoint_trial_path(best_config["trial_id"])
    assert checkpoint.exists()

    print(f"Best config found {best_config} checkpointed at {checkpoint}")

    print(
        f"Retrieve best checkpoint and evaluate accuracy of best model: "
        f"found {evaluate_accuracy(checkpoint_dir=checkpoint)}"
    )

This launcher script is using the following xgboost_checkpoint.py training script:

examples/training_scripts/xgboost/xgboost_checkpoint.py
import os
from argparse import ArgumentParser
from pathlib import Path

import numpy as np
import xgboost
from sklearn.datasets import load_digits

from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR


class SyneTuneCallback(xgboost.callback.TrainingCallback):
    def __init__(self, error_metric: str) -> None:
        self.reporter = Reporter()
        self.error_metric = error_metric

    def after_iteration(self, model, epoch, evals_log):
        metrics = list(evals_log.values())[-1][self.error_metric]
        self.reporter(**{self.error_metric: metrics[-1]})
        pass


def train(
    checkpoint_dir: str,
    n_estimators: int,
    max_depth: int,
    gamma: float,
    reg_lambda: float,
    early_stopping_rounds: int = 5,
) -> None:
    eval_metric = "merror"
    early_stop = xgboost.callback.EarlyStopping(
        rounds=early_stopping_rounds, save_best=True
    )
    X, y = load_digits(return_X_y=True)

    clf = xgboost.XGBClassifier(
        n_estimators=n_estimators,
        reg_lambda=reg_lambda,
        gamma=gamma,
        max_depth=max_depth,
        eval_metric=eval_metric,
        callbacks=[early_stop, SyneTuneCallback(error_metric=eval_metric)],
    )
    clf.fit(
        X,
        y,
        eval_set=[(X, y)],
    )
    print("Total boosted rounds:", clf.get_booster().num_boosted_rounds())

    save_model(clf, checkpoint_dir=checkpoint_dir)


def save_model(clf, checkpoint_dir):
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = os.path.join(checkpoint_dir, "model.json")
    clf.save_model(path)


def load_model(checkpoint_dir):
    path = os.path.join(checkpoint_dir, "model.json")
    loaded = xgboost.XGBClassifier()
    loaded.load_model(path)
    return loaded


def evaluate_accuracy(checkpoint_dir):
    X, y = load_digits(return_X_y=True)

    clf = load_model(checkpoint_dir=checkpoint_dir)
    y_pred = clf.predict(X)
    return (np.equal(y, y_pred) * 1.0).mean()


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--max_depth", type=int, required=False, default=1)
    parser.add_argument("--gamma", type=float, required=False, default=2)
    parser.add_argument("--reg_lambda", type=float, required=False, default=0.001)
    parser.add_argument("--n_estimators", type=int, required=False, default=10)
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str, default="./")

    args, _ = parser.parse_known_args()

    checkpoint_dir = Path(vars(args)[ST_CHECKPOINT_DIR])

    train(
        checkpoint_dir=checkpoint_dir,
        max_depth=args.max_depth,
        gamma=args.gamma,
        reg_lambda=args.reg_lambda,
        n_estimators=args.n_estimators,
    )

Launch with SageMaker Backend and Custom Docker Image

examples/launch_height_sagemaker_custom_image.py
"""
Example showing how to run on Sagemaker with a custom docker image.
"""
import logging
from pathlib import Path

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.custom_framework import CustomFramework
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import DEFAULT_CPU_INSTANCE_SMALL

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Random search without stopping
    scheduler = RandomSearch(
        config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
    )

    # Specify an image_uri that is available in ECR, something like "XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/my_image:latest"
    image_uri = ...

    trial_backend = SageMakerBackend(
        sm_estimator=CustomFramework(
            entry_point=entry_point,
            instance_type=DEFAULT_CPU_INSTANCE_SMALL,
            instance_count=1,
            role=get_execution_role(),
            image_uri=image_uri,
            max_run=10 * 60,
            job_name_prefix="hpo-hyperband",
            sagemaker_session=default_sagemaker_session(),
            disable_profiler=True,
            debugger_hook_config=False,
        ),
        # Names of metrics to track. Each metric will be detected by SageMaker if it is written in the
        # following form: "[RMSE]: 1.2"; see train_height.py for an example of how metrics are logged
        metrics_names=[METRIC_ATTR],
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=600)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        sleep_time=5.0,
    )

    tuner.run()

Requirements:

  • Access to AWS SageMaker.

  • This example is incomplete. If your training script has dependencies which you would like to provide as a Docker image, you need to upload the image to ECR, after which you can refer to it with image_uri.

Makes use of train_height.py.

Launch Experiments Remotely on SageMaker

examples/launch_height_sagemaker_remotely.py
"""
This example shows how to launch a tuning job that will be executed on Sagemaker rather than on your local machine.
"""
import logging
from pathlib import Path
from argparse import ArgumentParser

from sagemaker.pytorch import PyTorch

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    PYTORCH_LATEST_FRAMEWORK,
    PYTORCH_LATEST_PY_VERSION,
)
from syne_tune.remote.remote_launcher import RemoteLauncher

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    parser = ArgumentParser()
    parser.add_argument("--use_sagemaker_backend", type=int, default=0)
    args = parser.parse_args()
    use_sagemaker_backend = bool(args.use_sagemaker_backend)

    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # We can use the local or sagemaker backend when tuning remotely.
    # Using the local backend means that the remote instance will evaluate the trials locally.
    # Using the sagemaker backend means the remote instance will launch one sagemaker job per trial.
    if use_sagemaker_backend:
        trial_backend = SageMakerBackend(
            sm_estimator=PyTorch(
                instance_type=DEFAULT_CPU_INSTANCE_SMALL,
                instance_count=1,
                framework_version=PYTORCH_LATEST_FRAMEWORK,
                py_version=PYTORCH_LATEST_PY_VERSION,
                entry_point=entry_point,
                role=get_execution_role(),
                max_run=10 * 60,
                base_job_name="hpo-height",
                sagemaker_session=default_sagemaker_session(),
                disable_profiler=True,
                debugger_hook_config=False,
            ),
        )
    else:
        trial_backend = LocalBackend(entry_point=entry_point)

    num_seeds = 1 if use_sagemaker_backend else 2
    for seed in range(num_seeds):
        # Random search without stopping
        scheduler = RandomSearch(
            config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=seed
        )

        tuner = RemoteLauncher(
            tuner=Tuner(
                trial_backend=trial_backend,
                scheduler=scheduler,
                n_workers=n_workers,
                tuner_name="height-tuning",
                stop_criterion=StoppingCriterion(max_wallclock_time=600),
            ),
            # Extra arguments describing the resources of the remote tuning instance and whether we want to
            # wait for the tuning to finish. The instance type on which the tuning job runs can be different
            # from the instance type used for evaluating the training jobs.
            instance_type=DEFAULT_CPU_INSTANCE_SMALL,
            # We can specify a custom container to use with this launcher with <image_uri=TK>,
            # otherwise a SageMaker pre-built image will be used
        )

        tuner.run(wait=False)

Requirements:

Makes use of train_height.py.

This launcher script starts the HPO experiment as a SageMaker training job, which allows you to select any instance type you like, without blocking your local machine. This tutorial explains how to run many such remote experiments in parallel, in order to speed up comparisons between alternatives.

Launch HPO Experiment with Home-Made Scheduler

examples/launch_height_standalone_scheduler.py
"""
Example showing how to implement a new Scheduler.
"""
import logging
from pathlib import Path
from typing import Optional, List, Dict, Any

import numpy as np

from syne_tune.backend import LocalBackend
from syne_tune.backend.trial_status import Trial
from syne_tune.optimizer.scheduler import (
    TrialScheduler,
    SchedulerDecision,
    TrialSuggestion,
)
from syne_tune.tuner import Tuner
from syne_tune.stopping_criterion import StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)


class SimpleScheduler(TrialScheduler):
    def __init__(
        self, config_space: Dict[str, Any], metric: str, mode: Optional[str] = None
    ):
        super(SimpleScheduler, self).__init__(config_space=config_space)
        self.metric = metric
        self.mode = mode if mode is not None else "min"
        self.sorted_results = []

    def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Called when a slot is available to run a trial; here we simply draw a
        # random candidate.
        config = {
            k: v.sample() if hasattr(v, "sample") else v
            for k, v in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
        # Given a new result, we decide whether the trial should stop or continue.
        # Here we implement a naive strategy that stops a trial if its latest result
        # is worse than 80% of previous results. This is naive because it does not
        # account for the fact that a trial improves with more steps.

        new_metric = result[self.metric]

        # insert new metric in sorted results
        index = np.searchsorted(self.sorted_results, new_metric)
        self.sorted_results = np.insert(self.sorted_results, index, new_metric)
        normalized_rank = index / float(len(self.sorted_results))

        if self.mode == "max":
            normalized_rank = 1 - normalized_rank

        if normalized_rank < 0.8:
            return SchedulerDecision.CONTINUE
        else:
            logging.info(
                f"observed new result {new_metric}, which is worse than "
                f"{normalized_rank * 100:.0f}% of previous results; "
                f"stopping the trial as it is not in the top 80%"
            )
            return SchedulerDecision.STOP

    def metric_names(self) -> List[str]:
        return [self.metric]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Local backend
    trial_backend = LocalBackend(entry_point=entry_point)

    np.random.seed(random_seed)
    scheduler = SimpleScheduler(
        config_space=config_space, metric=METRIC_ATTR, mode=METRIC_MODE
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )

    tuner.run()

Makes use of train_height.py.

For a more thorough introduction to developing new schedulers and searchers in Syne Tune, see this tutorial.

Launch HPO Experiment on mlp_fashionmnist Benchmark

examples/launch_fashionmnist.py
"""
Example for how to tune one of the benchmarks.
"""
import logging

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune import Tuner, StoppingCriterion

from benchmarking.benchmark_definitions.mlp_on_fashionmnist import (
    mlp_fashionmnist_benchmark,
)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)

    # We pick the MLP on FashionMNIST benchmark
    # The 'benchmark' dict contains arguments needed by scheduler and
    # searcher (e.g., 'mode', 'metric'), along with suggested default values
    # for other arguments (which you are free to override)
    random_seed = 31415927
    n_workers = 4
    benchmark = mlp_fashionmnist_benchmark()

    # If you don't like the default config_space, change it here. But let
    # us use the default
    config_space = benchmark.config_space

    # Local backend
    trial_backend = LocalBackend(entry_point=str(benchmark.script))

    # GP-based Bayesian optimization searcher. Many options can be specified
    # via ``search_options``, but let's use the defaults
    searcher = "bayesopt"
    search_options = {"num_init_random": n_workers + 2}
    # Hyperband (or successive halving) scheduler of the stopping type.
    # Together with 'bayesopt', this selects the MOBSTER algorithm.
    # If you don't like the defaults suggested, just change them:
    scheduler = HyperbandScheduler(
        config_space,
        searcher=searcher,
        search_options=search_options,
        max_resource_attr=benchmark.max_resource_attr,
        resource_attr=benchmark.resource_attr,
        mode=benchmark.mode,
        metric=benchmark.metric,
        random_seed=random_seed,
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=120)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )

    tuner.run()

Requirements:

  • Needs “mlp_fashionmnist” benchmark, which requires Syne Tune to have been installed from source.

In this example, we tune one of the built-in benchmark problems, which is useful in order to compare different HPO methods. More details on benchmarking are provided in this tutorial.

Transfer Tuning on NASBench-201

examples/launch_nas201_transfer_learning.py
from typing import Dict

from syne_tune.blackbox_repository import load_blackbox, BlackboxRepositoryBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning import (
    TransferLearningTaskEvaluations,
    BoundingBox,
)
from syne_tune import StoppingCriterion, Tuner


def load_transfer_learning_evaluations(
    blackbox_name: str, test_task: str, metric: str
) -> Dict[str, TransferLearningTaskEvaluations]:
    bb_dict = load_blackbox(blackbox_name)
    metric_index = [
        i
        for i, name in enumerate(bb_dict[test_task].objectives_names)
        if name == metric
    ][0]
    transfer_learning_evaluations = {
        task: TransferLearningTaskEvaluations(
            hyperparameters=bb.hyperparameters,
            configuration_space=bb.configuration_space,
            objectives_evaluations=bb.objectives_evaluations[
                ..., metric_index : metric_index + 1
            ],
            objectives_names=[metric],
        )
        for task, bb in bb_dict.items()
        if task != test_task
    }
    return transfer_learning_evaluations


if __name__ == "__main__":
    blackbox_name = "nasbench201"
    test_task = "cifar100"
    elapsed_time_attr = "metric_elapsed_time"
    metric = "metric_valid_error"

    bb_dict = load_blackbox(blackbox_name)
    transfer_learning_evaluations = load_transfer_learning_evaluations(
        blackbox_name, test_task, metric
    )

    scheduler = BoundingBox(
        scheduler_fun=lambda new_config_space, mode, metric: FIFOScheduler(
            new_config_space,
            points_to_evaluate=[],
            searcher="random",
            metric=metric,
            mode=mode,
        ),
        mode="min",
        config_space=bb_dict[test_task].configuration_space,
        metric=metric,
        num_hyperparameters_per_task=10,
        transfer_learning_evaluations=transfer_learning_evaluations,
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=7200)

    trial_backend = BlackboxRepositoryBackend(
        blackbox_name=blackbox_name,
        elapsed_time_attr=elapsed_time_attr,
        dataset=test_task,
    )

    # It is important to set ``sleep_time`` to 0 here (mandatory for simulator backend)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=4,
        sleep_time=0,
        # This callback is required in order to make things work with the
        # simulator callback. It makes sure that results are stored with
        # simulated time (rather than real time), and that the time_keeper
        # is advanced properly whenever the tuner loop sleeps
        callbacks=[SimulatorCallback()],
    )
    tuner.run()

    tuning_experiment = load_experiment(tuner.name)
    print(tuning_experiment)

    print(f"best result found: {tuning_experiment.best_config()}")

    tuning_experiment.plot()

Requirements:

  • The blackbox-repository dependencies of Syne Tune need to be installed.

  • Needs the nasbench201 blackbox to be downloaded and preprocessed. This can take quite a while when done for the first time.

  • If AWS SageMaker is used or an S3 bucket is accessible, the blackbox files are uploaded to your S3 bucket.

In this example, we use the simulator backend with the NASBench-201 blackbox. It serves as a simple demonstration of how evaluations from related tasks can be used to speed up HPO.

Transfer Learning Example

examples/launch_transfer_learning_example.py
"""
Example collecting evaluations and using them for transfer learning on a
related task.
"""
from examples.training_scripts.height_example.train_height import (
    height_config_space,
    METRIC_ATTR,
    METRIC_MODE,
)

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import BayesianOptimization, ZeroShotTransfer
from syne_tune.optimizer.schedulers import FIFOScheduler

from syne_tune.optimizer.schedulers.transfer_learning import (
    TransferLearningTaskEvaluations,
    BoundingBox,
)

from syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher import (
    QuantileBasedSurrogateSearcher,
)

import argparse
import copy
import numpy as np
from pathlib import Path


def add_labels(ax, conf_space, title):
    ax.legend()
    ax.set_xlabel("width")
    ax.set_ylabel("height")
    ax.set_xlim([conf_space["width"].lower - 1, conf_space["width"].upper + 1])
    ax.set_ylim([conf_space["height"].lower - 10, conf_space["height"].upper + 10])
    ax.set_title(title)


def scatter_space_exploration(ax, task_hyps, max_trials, label, color=None):
    ax.scatter(
        task_hyps["width"][:max_trials],
        task_hyps["height"][:max_trials],
        alpha=0.4,
        label=label,
        color=color,
    )


colours = {
    "BayesianOptimization": "C0",
    "BoundingBox": "C1",
    "ZeroShotTransfer": "C2",
    "Quantiles": "C3",
}


def plot_last_task(max_trials, df, label, metric, color):
    max_tr = min(max_trials, len(df))
    plt.scatter(range(max_tr), df[metric][:max_tr], label=label, color=color)
    plt.plot([np.min(df[metric][:ii]) for ii in range(1, max_trials + 1)], color=color)


def filter_completed(df):
    # Filter out runs that didn't finish
    return df[df["status"] == "Completed"].reset_index()


def extract_transferable_evaluations(df, metric, config_space):
    """
    Take a dataframe from a tuner run, filter it and generate
    TransferLearningTaskEvaluations from it
    """
    filter_df = filter_completed(df)

    return TransferLearningTaskEvaluations(
        configuration_space=config_space,
        hyperparameters=filter_df[config_space.keys()],
        objectives_names=[metric],
        # objectives_evaluations need to be of shape
        # (num_evals, num_seeds, num_fidelities, num_objectives)
        # We only have one seed, fidelity and objective
        objectives_evaluations=np.array(filter_df[metric], ndmin=4).T,
    )


def run_scheduler_on_task(entry_point, scheduler, max_trials):
    """
    Take a scheduler and run it for max_trials on the backend specified by entry_point
    Return a dataframe of the optimisation results
    """
    tuner = Tuner(
        trial_backend=LocalBackend(entry_point=str(entry_point)),
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(max_num_trials_finished=max_trials),
        n_workers=4,
        sleep_time=0.001,
    )
    tuner.run()

    return tuner.tuning_status.get_dataframe()


def init_scheduler(
    scheduler_str, max_steps, seed, mode, metric, transfer_learning_evaluations
):
    """
    Initialise the scheduler
    """
    kwargs = {
        "metric": metric,
        "config_space": height_config_space(max_steps=max_steps),
        "mode": mode,
        "random_seed": seed,
    }
    kwargs_w_trans = copy.deepcopy(kwargs)
    kwargs_w_trans["transfer_learning_evaluations"] = transfer_learning_evaluations

    if scheduler_str == "BayesianOptimization":
        return BayesianOptimization(**kwargs)

    if scheduler_str == "ZeroShotTransfer":
        return ZeroShotTransfer(use_surrogates=True, **kwargs_w_trans)

    if scheduler_str == "Quantiles":
        return FIFOScheduler(
            searcher=QuantileBasedSurrogateSearcher(**kwargs_w_trans),
            **kwargs,
        )

    if scheduler_str == "BoundingBox":
        kwargs_sched_fun = {key: kwargs[key] for key in kwargs if key != "config_space"}
        kwargs_w_trans[
            "scheduler_fun"
        ] = lambda new_config_space, mode, metric: BayesianOptimization(
            new_config_space,
            **kwargs_sched_fun,
        )
        del kwargs_w_trans["random_seed"]
        return BoundingBox(**kwargs_w_trans)
    raise ValueError("scheduler_str not recognised")


if __name__ == "__main__":

    max_trials = 10
    np.random.seed(1)
    # Use train_height backend for our tests
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Collect evaluations on preliminary tasks
    transfer_learning_evaluations = {}
    for max_steps in range(1, 6):
        scheduler = init_scheduler(
            "BayesianOptimization",
            max_steps=max_steps,
            seed=np.random.randint(100),
            mode=METRIC_MODE,
            metric=METRIC_ATTR,
            transfer_learning_evaluations=None,
        )

        print("Optimising preliminary task %s" % max_steps)
        prev_task = run_scheduler_on_task(entry_point, scheduler, max_trials)

        # Generate TransferLearningTaskEvaluations from previous task
        transfer_learning_evaluations[max_steps] = extract_transferable_evaluations(
            prev_task, METRIC_ATTR, scheduler.config_space
        )

    # Collect evaluations on transfer task
    max_steps = 6
    transfer_task_results = {}
    labels = ["BayesianOptimization", "BoundingBox", "ZeroShotTransfer", "Quantiles"]
    for scheduler_str in labels:
        scheduler = init_scheduler(
            scheduler_str,
            max_steps=max_steps,
            seed=max_steps,
            mode=METRIC_MODE,
            metric=METRIC_ATTR,
            transfer_learning_evaluations=transfer_learning_evaluations,
        )
        print("Optimising transfer task using %s" % scheduler_str)
        transfer_task_results[scheduler_str] = run_scheduler_on_task(
            entry_point, scheduler, max_trials
        )

    # Optionally generate plots. Defaults to False
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--generate_plots", action="store_true", help="generate optimisation plots."
    )
    args = parser.parse_args()

    if args.generate_plots:
        from syne_tune.try_import import try_import_visual_message

        try:
            import matplotlib.pyplot as plt
        except ImportError:
            print(try_import_visual_message())

        print("Generating optimisation plots.")
        """ Plot the results on the transfer task """
        for label in labels:
            plot_last_task(
                max_trials,
                transfer_task_results[label],
                label=label,
                metric=METRIC_ATTR,
                color=colours[label],
            )
        plt.legend()
        plt.ylabel(METRIC_ATTR)
        plt.xlabel("Iteration")
        plt.title("Transfer task (max_steps=6)")
        plt.savefig("Transfer_task.png", bbox_inches="tight")

        """ Plot the configs tried for the preliminary tasks """
        fig, ax = plt.subplots()
        for key in transfer_learning_evaluations:
            scatter_space_exploration(
                ax,
                transfer_learning_evaluations[key].hyperparameters,
                max_trials,
                "Task %s" % key,
            )
        add_labels(
            ax,
            scheduler.config_space,
            "Explored locations of BO for preliminary tasks",
        )
        plt.savefig("Configs_explored_preliminary.png", bbox_inches="tight")

        """ Plot the configs tried for the transfer task """
        fig, ax = plt.subplots()

        # Plot the configs tried by the different schedulers on the transfer task
        for label in labels:
            finished_trials = filter_completed(transfer_task_results[label])
            scatter_space_exploration(
                ax, finished_trials, max_trials, label, color=colours[label]
            )

            # Plot the first config tested as a big square
            ax.scatter(
                finished_trials["width"][0],
                finished_trials["height"][0],
                marker="s",
                color=colours[label],
                s=100,
            )

        # Plot the optima from the preliminary tasks as black crosses
        past_label = "Preliminary optima"
        for key in transfer_learning_evaluations:
            argmin = np.argmin(
                transfer_learning_evaluations[key].objective_values(METRIC_ATTR)[
                    :max_trials, 0, 0
                ]
            )
            ax.scatter(
                transfer_learning_evaluations[key].hyperparameters["width"][argmin],
                transfer_learning_evaluations[key].hyperparameters["height"][argmin],
                color="k",
                marker="x",
                label=past_label,
            )
            past_label = None
        add_labels(ax, scheduler.config_space, "Explored locations for transfer task")
        plt.savefig("Configs_explored_transfer.png", bbox_inches="tight")

Requirements:

  • Needs matplotlib to be installed if the plotting flag is given: pip install matplotlib. If you installed Syne Tune with visual or extra, this dependency is included.

An example of how to use evaluations collected in Syne Tune to run a transfer learning scheduler. Makes use of train_height.py. Used in the transfer learning tutorial. To plot the figures, run as python launch_transfer_learning_example.py --generate_plots.

Plot Results of Tuning Experiment

examples/launch_plot_results.py
import logging
from pathlib import Path

from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    trial_backend = LocalBackend(entry_point=entry_point)

    # Random search without stopping
    scheduler = RandomSearch(
        config_space, mode=METRIC_MODE, metric=METRIC_ATTR, random_seed=random_seed
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        n_workers=n_workers,
        stop_criterion=stop_criterion,
        results_update_interval=5,
        tuner_name="plot-results-demo",
        metadata={"description": "just an example"},
    )

    tuner.run()

    # Shows how to retrieve the best configuration found by the tuner (e.g., to retrain it)
    trial_id, best_config = tuner.best_config()

    tuning_experiment = load_experiment(tuner.name)

    # prints the best configuration found from experiment-results
    print(f"best result found: {tuning_experiment.best_config()}")

    # plots the best metric over time
    tuning_experiment.plot()

    # plots values found by all trials over time
    tuning_experiment.plot_trials_over_time()

Requirements:

  • Needs matplotlib to be installed: pip install matplotlib. If you installed Syne Tune with visual or extra, this dependency is included.

Makes use of train_height.py.

Resume a Tuning Job

examples/launch_resume_tuning.py
from syne_tune.config_space import randint

import shutil
from pathlib import Path

from syne_tune import StoppingCriterion
from syne_tune import Tuner
from syne_tune.backend import LocalBackend
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import ASHA
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges
from syne_tune.util import random_string


def launch_first_tuning(experiment_name: str):
    max_epochs = 100
    metric = "mean_loss"
    mode = "min"
    config_space = {
        "steps": max_epochs,
        "width": randint(0, 10),
        "height": randint(0, 10),
    }

    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    scheduler = ASHA(
        config_space=config_space,
        metric=metric,
        mode=mode,
        max_t=max_epochs,
        search_options={"allow_duplicates": True},
        resource_attr="epoch",
    )

    trial_backend = LocalBackend(entry_point=str(entry_point))

    stop_criterion = StoppingCriterion(
        max_num_trials_started=10,
    )
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=4,
        tuner_name=experiment_name,
        suffix_tuner_name=False,
    )

    tuner.run()


if __name__ == "__main__":
    experiment_name = f"resume-tuning-example-{random_string(5)}"

    # Launch a tuning, tuning results and checkpoints are written to disk
    launch_first_tuning(experiment_name)

    # Later, load the experiment from disk given the experiment name;
    # in particular, set ``load_tuner=True`` to deserialize the Tuner
    tuning_experiment = load_experiment(experiment_name, load_tuner=True)

    # Copy the tuner as it will be modified when retuning
    shutil.copy(
        tuning_experiment.path / "tuner.dill",
        tuning_experiment.path / "tuner-backup.dill",
    )

    # Update the stop criterion to run the tuning for more trials than before (here, 20 instead of 10 started trials)
    tuning_experiment.tuner.stop_criterion = StoppingCriterion(
        max_num_trials_started=20
    )

    # Define a new configuration space, for instance favoring a new part of the space based on data analysis
    new_config_space = {
        "steps": 100,
        "width": randint(10, 20),
        "height": randint(1, 10),
    }

    # Update the scheduler's random searcher to use the new configuration space.
    # For now we modify internals; adding a method ``update_config_space`` to RandomSearcher would be a cleaner option.
    tuning_experiment.tuner.scheduler.config_space = new_config_space
    tuning_experiment.tuner.scheduler.searcher._hp_ranges = make_hyperparameter_ranges(
        new_config_space
    )
    tuning_experiment.tuner.scheduler.searcher.configure_scheduler(
        tuning_experiment.tuner.scheduler
    )

    # Resume the tuning with the modified search space and stopping criterion
    # The scheduler will now explore the updated search space
    tuning_experiment.tuner.run()

Customize Results Written during an Experiment

examples/launch_height_extra_results.py
from typing import Dict, Any, Optional, List
from pathlib import Path
import logging

from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.constants import ST_TUNER_TIME
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import DyHPO
from syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo import (
    DyHPORungSystem,
)
from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback
from syne_tune import Tuner, StoppingCriterion


# We would like to extract some extra information from the scheduler during the
# experiment. To this end, we implement a class for extracting this information
class DyHPOExtraResults(ExtraResultsComposer):
    def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
        scheduler = tuner.scheduler
        assert isinstance(scheduler, DyHPO)  # sanity check
        # :class:`~syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.DyHPORungSystem`
        # collects statistics about how often several types of decisions were made in
        # ``on_task_schedule``
        return scheduler.terminator._rung_systems[0].summary_schedule_records()

    def keys(self) -> List[str]:
        return DyHPORungSystem.summary_schedule_keys()


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927
    max_epochs = 100
    n_workers = 4
    # Hyperparameter configuration space
    config_space = {
        "width": randint(1, 20),
        "height": randint(1, 20),
        "epochs": 100,
    }

    # We use the DyHPO scheduler, since it records some interesting extra
    # information
    scheduler = DyHPO(
        config_space,
        metric="mean_loss",
        resource_attr="epoch",
        max_resource_attr="epochs",
        search_options={"debug_log": False},
        grace_period=2,
    )
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height_simple.py"
    )

    # Extra results are stored by the
    # :class:`~syne_tune.results_callback.StoreResultsCallback`. In fact, they
    # are appended to the default time-stamped results whenever a report is
    # received.
    extra_results_composer = DyHPOExtraResults()
    callbacks = [StoreResultsCallback(extra_results_composer=extra_results_composer)]
    tuner = Tuner(
        trial_backend=LocalBackend(entry_point=entry_point),
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(max_wallclock_time=30),
        n_workers=4,  # how many trials are evaluated in parallel
        callbacks=callbacks,
    )
    tuner.run()

    # Let us have a look at what was written. Here, we just look at the information
    # at the end of the experiment
    results_df = load_experiment(tuner.name).results
    final_pos = results_df.loc[:, ST_TUNER_TIME].argmax()
    final_row = dict(results_df.loc[final_pos])
    extra_results_at_end = {
        name: final_row[name] for name in extra_results_composer.keys()
    }
    print(f"\nExtra results at end of experiment:\n{extra_results_at_end}")

Makes use of train_height_simple.py.

An example of how to append extra results to those written by default to results.csv.zip. This is done by customizing the StoreResultsCallback.

Pass Configuration as JSON File to Training Script

examples/launch_height_config_json.py
import os
import logging
from pathlib import Path
from argparse import ArgumentParser

from syne_tune.backend import LocalBackend, SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    get_execution_role,
    default_sagemaker_session,
)
from syne_tune.optimizer.baselines import (
    ASHA,
)

from syne_tune import Tuner, StoppingCriterion
from syne_tune.remote.constants import (
    DEFAULT_CPU_INSTANCE_SMALL,
    PYTORCH_LATEST_FRAMEWORK,
    PYTORCH_LATEST_PY_VERSION,
)
from examples.training_scripts.height_example.train_height_config_json import (
    height_config_space,
    RESOURCE_ATTR,
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    parser = ArgumentParser()
    parser.add_argument("--use_sagemaker_backend", type=int, default=0)
    args = parser.parse_args()
    use_sagemaker_backend = bool(args.use_sagemaker_backend)

    random_seed = 31415927
    max_epochs = 100
    n_workers = 4
    max_wallclock_time = 5 * 60 if use_sagemaker_backend else 10

    config_space = height_config_space(max_epochs)
    entry_point = (
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height_config_json.py"
    )

    scheduler = ASHA(
        config_space,
        metric=METRIC_ATTR,
        mode=METRIC_MODE,
        max_resource_attr=MAX_RESOURCE_ATTR,
        resource_attr=RESOURCE_ATTR,
    )

    if not use_sagemaker_backend:
        trial_backend = LocalBackend(
            entry_point=str(entry_point),
            pass_args_as_json=True,
        )
    else:
        from sagemaker.pytorch import PyTorch
        import syne_tune

        if "AWS_DEFAULT_REGION" not in os.environ:
            os.environ["AWS_DEFAULT_REGION"] = "us-west-2"
        trial_backend = SageMakerBackend(
            sm_estimator=PyTorch(
                entry_point=str(entry_point),
                instance_type=DEFAULT_CPU_INSTANCE_SMALL,
                instance_count=1,
                framework_version=PYTORCH_LATEST_FRAMEWORK,
                py_version=PYTORCH_LATEST_PY_VERSION,
                role=get_execution_role(),
                dependencies=syne_tune.__path__,
                max_run=10 * 60,
                sagemaker_session=default_sagemaker_session(),
                disable_profiler=True,
                debugger_hook_config=False,
                keep_alive_period_in_seconds=60,  # warm pool feature
            ),
            metrics_names=[METRIC_ATTR],
            pass_args_as_json=True,
        )

    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        start_jobs_without_delay=False,
    )

    tuner.run()

Requirements:

Makes use of the following train_height_config_json.py training script:

examples/training_scripts/height_example/train_height_config_json.py
import logging
import time
from typing import Optional, Dict, Any
from argparse import ArgumentParser

from syne_tune import Reporter
from syne_tune.config_space import randint
from syne_tune.utils import add_config_json_to_argparse, load_config_json


report = Reporter()


RESOURCE_ATTR = "epoch"

METRIC_ATTR = "mean_loss"

METRIC_MODE = "min"

MAX_RESOURCE_ATTR = "steps"


def train_height(step: int, width: float, height: float) -> float:
    return 100 / (10 + width * step) + 0.1 * height


def height_config_space(
    max_steps: int, sleep_time: Optional[float] = None
) -> Dict[str, Any]:
    if sleep_time is None:
        sleep_time = 0.1
    return {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
        "sleep_time": sleep_time,
        "list_arg": ["this", "is", "a", "list", 1, 2, 3],
        "dict_arg": {
            "this": 27,
            "is": [1, 2, 3],
            "a": "dictionary",
            "even": {
                "a": 0,
                "nested": 1,
                "one": 2,
            },
        },
    }


def _check_extra_args(config: Dict[str, Any]):
    config_space = height_config_space(5)
    for k in ("list_arg", "dict_arg"):
        a, b = config[k], config_space[k]
        assert a == b, (k, a, b)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = ArgumentParser()
    # Append required argument(s):
    add_config_json_to_argparse(parser)
    args, _ = parser.parse_known_args()
    # Loads config JSON and merges with ``args``
    config = load_config_json(vars(args))

    # Check that args with complex types have been received correctly
    _check_extra_args(config)
    width = config["width"]
    height = config["height"]
    sleep_time = config["sleep_time"]
    num_steps = config[MAX_RESOURCE_ATTR]
    for step in range(num_steps):
        # Sleep first, since results are returned at end of "epoch"
        time.sleep(sleep_time)
        # Feed the score back to Syne Tune.
        dummy_score = train_height(step, width, height)
        report(
            **{
                "step": step,
                METRIC_ATTR: dummy_score,
                RESOURCE_ATTR: step + 1,
            }
        )

Speculative Early Checkpoint Removal

examples/launch_fashionmnist_checkpoint_removal.py
"""
Example for speculative checkpoint removal with asynchronous multi-fidelity HPO
"""
from typing import Optional, Dict, Any, List
import logging

from syne_tune.backend import LocalBackend
from syne_tune.callbacks.hyperband_remove_checkpoints_callback import (
    HyperbandRemoveCheckpointsCommon,
)
from syne_tune.constants import ST_TUNER_TIME
from syne_tune.experiments import load_experiment
from syne_tune.optimizer.baselines import MOBSTER
from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback
from syne_tune.util import find_first_of_type
from syne_tune import Tuner, StoppingCriterion

from benchmarking.benchmark_definitions.mlp_on_fashionmnist import (
    mlp_fashionmnist_benchmark,
)


# This is used to monitor what the checkpoint removal mechanism is doing, and to
# write out results. This is optional; the mechanism works without it.
class CPRemovalExtraResults(ExtraResultsComposer):
    def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
        callback = find_first_of_type(tuner.callbacks, HyperbandRemoveCheckpointsCommon)
        return None if callback is None else callback.extra_results()

    def keys(self) -> List[str]:
        return HyperbandRemoveCheckpointsCommon.extra_results_keys()


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)
    random_seed = 31415927
    n_workers = 4
    max_num_checkpoints = 10
    # This time may be too short to see positive effects:
    max_wallclock_time = 1800
    # Monitor how checkpoint removal is doing over time, appending this
    # information to results.csv.zip?
    monitor_cp_removal_in_results = True

    # We pick the MLP on FashionMNIST benchmark
    benchmark = mlp_fashionmnist_benchmark()

    # Local backend
    # By setting ``delete_checkpoints=True``, we ask for checkpoints to be removed
    # once a trial cannot be resumed anymore
    trial_backend = LocalBackend(
        entry_point=str(benchmark.script),
        delete_checkpoints=True,
    )

    # MOBSTER (model-based ASHA) with promotion scheduling (pause and resume).
    # Checkpoints are written for each paused trial, and these are not removed,
    # because in principle, every paused trial may be resumed in the future.
    # If checkpoints are large, this may fill up your disk.
    # Here, we use speculative checkpoint removal to keep the number of checkpoints
    # to at most ``max_num_checkpoints``. To this end, paused trials are ranked by
    # expected cost of removing their checkpoint.
    scheduler = MOBSTER(
        benchmark.config_space,
        type="promotion",
        max_resource_attr=benchmark.max_resource_attr,
        resource_attr=benchmark.resource_attr,
        mode=benchmark.mode,
        metric=benchmark.metric,
        random_seed=random_seed,
        early_checkpoint_removal_kwargs=dict(
            max_num_checkpoints=max_num_checkpoints,
        ),
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=max_wallclock_time)
    # The tuner activates early checkpoint removal iff
    # ``trial_backend.delete_checkpoints``. In this case, it requests details
    # from the scheduler (which is ``early_checkpoint_removal_kwargs`` in our
    # case). Early checkpoint removal is done by appending a callback to those
    # normally used with the tuner.
    if monitor_cp_removal_in_results:
        # We can monitor how well checkpoint removal is working by storing
        # extra results (this is optional)
        extra_results_composer = CPRemovalExtraResults()
        callbacks = [
            StoreResultsCallback(extra_results_composer=extra_results_composer)
        ]
    else:
        extra_results_composer = None
        callbacks = None
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
        callbacks=callbacks,
    )
    tuner.run()

    if monitor_cp_removal_in_results:
        # We have monitored how checkpoint removal has been doing over time. Here,
        # we just look at the information at the end of the experiment
        results_df = load_experiment(tuner.name).results
        final_pos = results_df.loc[:, ST_TUNER_TIME].argmax()
        final_row = dict(results_df.loc[final_pos])
        extra_results_at_end = {
            name: final_row[name] for name in extra_results_composer.keys()
        }
        logging.info(f"Extra results at end of experiment:\n{extra_results_at_end}")

    # We can obtain additional details from the callback, which is the last one
    # in ``tuner``
    callback = find_first_of_type(tuner.callbacks, HyperbandRemoveCheckpointsCommon)
    trials_resumed = callback.trials_resumed_without_checkpoint()
    if trials_resumed:
        logging.info(
            f"The following {len(trials_resumed)} trials were resumed without a checkpoint:\n{trials_resumed}"
        )
    else:
        logging.info("No trials were resumed without a checkpoint")

Requirements:

  • Needs “mlp_fashionmnist” benchmark, which requires Syne Tune to have been installed from source.

This example uses the mlp_fashionmnist benchmark. It runs for about 30 minutes. It demonstrates speculative early checkpoint removal for MOBSTER with promotion scheduling (pause and resume).

Launch HPO Experiment with Ray Tune Scheduler

examples/launch_height_ray.py
import logging
from pathlib import Path

from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.search.skopt import SkOptSearch
import numpy as np

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import RayTuneScheduler
from syne_tune import Tuner, StoppingCriterion
from syne_tune.config_space import randint
from examples.training_scripts.height_example.train_height import (
    RESOURCE_ATTR,
    METRIC_ATTR,
    METRIC_MODE,
    MAX_RESOURCE_ATTR,
)

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.DEBUG)

    random_seed = 31415927
    max_steps = 100
    n_workers = 4

    config_space = {
        MAX_RESOURCE_ATTR: max_steps,
        "width": randint(0, 20),
        "height": randint(-100, 100),
    }
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Local backend
    trial_backend = LocalBackend(entry_point=entry_point)

    # Hyperband scheduler with SkOpt searcher
    np.random.seed(random_seed)
    ray_searcher = SkOptSearch()
    ray_searcher.set_search_properties(
        mode=METRIC_MODE,
        metric=METRIC_ATTR,
        config=RayTuneScheduler.convert_config_space(config_space),
    )

    ray_scheduler = AsyncHyperBandScheduler(
        max_t=max_steps,
        time_attr=RESOURCE_ATTR,
        mode=METRIC_MODE,
        metric=METRIC_ATTR,
    )

    scheduler = RayTuneScheduler(
        config_space=config_space,
        ray_scheduler=ray_scheduler,
        ray_searcher=ray_searcher,
    )

    stop_criterion = StoppingCriterion(max_wallclock_time=20)
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=n_workers,
    )

    tuner.run()

Makes use of train_height.py.

Stand-Alone Bayesian Optimization

examples/launch_standalone_bayesian_optimization.py
import logging

from syne_tune.config_space import uniform, randint, choice

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
    dictionarize_objective,
)
from syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory import (
    make_hyperparameter_ranges,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects import (
    create_tuning_job_state,
)
from syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher import GPFIFOSearcher
from syne_tune.optimizer.schedulers.searchers.gp_searcher_utils import encode_state


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)

    random_seed = 31415927

    # toy example of 3 hp's
    config_space = {
        "hp_1": uniform(-5.0, 5.0),
        "hp_2": randint(-5, 5),
        "hp_3": choice(["a", "b", "c"]),
    }
    hp_ranges = make_hyperparameter_ranges(config_space)
    batch_size = 16
    num_init_candidates_for_batch = 10
    state = create_tuning_job_state(
        hp_ranges=hp_ranges,
        cand_tuples=[
            (-3.0, -4, "a"),
            (2.2, -3, "b"),
            (-4.9, -1, "b"),
            (-1.9, -1, "c"),
            (-3.5, 3, "a"),
        ],
        metrics=[dictionarize_objective(x) for x in (15.0, 27.0, 13.0, 39.0, 35.0)],
    )

    gp_searcher = GPFIFOSearcher(
        state.hp_ranges.config_space,
        points_to_evaluate=None,
        random_seed=random_seed,
        metric="objective",
        debug_log=False,
    )
    gp_searcher_state = gp_searcher.get_state()
    gp_searcher_state["state"] = encode_state(state)
    gp_searcher = gp_searcher.clone_from_state(gp_searcher_state)

    next_candidate_list = gp_searcher.get_batch_configs(
        batch_size=batch_size,
        num_init_candidates_for_batch=num_init_candidates_for_batch,
    )

    assert len(next_candidate_list) == batch_size

Syne Tune combines a scheduler (HPO algorithm) with a backend to provide a complete HPO solution. If you already have a system in place for job scheduling and managing the state of the tuning problem, you may want to call the scheduler on its own. This example demonstrates how to do this for Gaussian process based Bayesian optimization.

Ask Tell Interface

examples/launch_ask_tell_scheduler.py
"""
This is an example of how to use syne-tune in ask-tell mode.
In this setup the tuning loop and experiments are disentangled. The AskTellScheduler suggests new configurations
and the users themselves perform experiments to test the performance of each configuration.
Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.


In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration
(examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this
interface provides all the necessary flexibility
"""
from typing import Dict
import datetime
import logging

import dill
import numpy as np

from syne_tune.backend.trial_status import Trial, Status, TrialResult
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import RandomSearch, BayesianOptimization
from syne_tune.optimizer.scheduler import TrialScheduler


class AskTellScheduler:
    bscheduler: TrialScheduler
    trial_counter: int
    completed_experiments: Dict[int, TrialResult]

    def __init__(self, base_scheduler: TrialScheduler):
        self.bscheduler = base_scheduler
        self.trial_counter = 0
        self.completed_experiments = {}

    def ask(self) -> Trial:
        """
        Ask the scheduler for a new trial to run
        :return: Trial to run
        """
        trial_suggestion = self.bscheduler.suggest(self.trial_counter)
        trial = Trial(
            trial_id=self.trial_counter,
            config=trial_suggestion.config,
            creation_time=datetime.datetime.now(),
        )
        self.trial_counter += 1
        return trial

    def tell(self, trial: Trial, experiment_result: Dict[str, float]):
        """
        Feed experiment results back to the Scheduler

        :param trial: Trial that was run
        :param experiment_result: {metric: value} dictionary with experiment results
        """
        trial_result = trial.add_results(
            metrics=experiment_result,
            status=Status.completed,
            training_end_time=datetime.datetime.now(),
        )
        self.bscheduler.on_trial_complete(trial=trial, result=experiment_result)
        self.completed_experiments[trial_result.trial_id] = trial_result

    def best_trial(self, metric: str) -> TrialResult:
        """
        Return the best trial according to the provided metric
        """
        if self.bscheduler.mode == "max":
            sign = 1.0
        else:
            sign = -1.0

        return max(
            [value for key, value in self.completed_experiments.items()],
            key=lambda trial: sign * trial.metrics[metric],
        )


def target_function(x, noise: bool = True):
    fx = x * x + np.sin(x)
    if noise:
        sigma = np.cos(x) ** 2 + 0.01
        noise = 0.1 * np.random.normal(loc=x, scale=sigma)
        fx = fx + noise

    return fx


def get_objective():
    metric = "mean_loss"
    mode = "min"
    max_iterations = 100
    config_space = {
        "x": uniform(-1, 1),
    }
    return metric, mode, config_space, max_iterations


def plot_objective():
    """
    In this function, we inspect the objective by plotting the target function
    """
    from syne_tune.try_import import try_import_visual_message

    try:
        import matplotlib.pyplot as plt
    except ImportError:
        print(try_import_visual_message())

    metric, mode, config_space, max_iterations = get_objective()

    plt.set_cmap("viridis")
    x = np.linspace(config_space["x"].lower, config_space["x"].upper, 400)
    fx = target_function(x, noise=False)
    noise = 0.1 * np.cos(x) ** 2 + 0.01

    plt.plot(x, fx, "r--", label="True value")
    plt.fill_between(x, fx + noise, fx - noise, alpha=0.2, fc="r")
    plt.legend()
    plt.grid()
    plt.show()


def tune_with_random_search() -> TrialResult:
    metric, mode, config_space, max_iterations = get_objective()
    scheduler = AskTellScheduler(
        base_scheduler=RandomSearch(config_space, metric=metric, mode=mode)
    )
    for iter in range(max_iterations):
        trial_suggestion = scheduler.ask()
        test_result = target_function(**trial_suggestion.config)
        scheduler.tell(trial_suggestion, {metric: test_result})
    return scheduler.best_trial(metric)


def save_restart_with_gp() -> TrialResult:
    metric, mode, config_space, max_iterations = get_objective()
    scheduler = AskTellScheduler(
        base_scheduler=BayesianOptimization(config_space, metric=metric, mode=mode)
    )
    for iter in range(int(max_iterations / 2)):
        trial_suggestion = scheduler.ask()
        test_result = target_function(**trial_suggestion.config)
        scheduler.tell(trial_suggestion, {metric: test_result})

    # --- The scheduler can be written to disk to pause experiment
    output_path = "scheduler-checkpoint.dill"
    with open(output_path, "wb") as f:
        dill.dump(scheduler, f)

    # --- The Scheduler can be read from disk at a later time to resume experiments
    with open(output_path, "rb") as f:
        scheduler = dill.load(f)

    for iter in range(int(max_iterations / 2)):
        trial_suggestion = scheduler.ask()
        test_result = target_function(**trial_suggestion.config)
        scheduler.tell(trial_suggestion, {metric: test_result})
    return scheduler.best_trial(metric)


def tune_with_gp() -> TrialResult:
    metric, mode, config_space, max_iterations = get_objective()
    scheduler = AskTellScheduler(
        base_scheduler=BayesianOptimization(config_space, metric=metric, mode=mode)
    )
    for iter in range(max_iterations):
        trial_suggestion = scheduler.ask()
        test_result = target_function(**trial_suggestion.config)
        scheduler.tell(trial_suggestion, {metric: test_result})
    return scheduler.best_trial(metric)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.WARN)
    # plot_objective() # Please uncomment this to plot the objective
    print("Random:", tune_with_random_search())
    print("GP with restart:", save_restart_with_gp())
    print("GP:", tune_with_gp())

This is an example of how to use syne-tune in ask-tell mode. In this setup the tuning loop and experiments are disentangled. The AskTellScheduler suggests new configurations and the users themselves perform experiments to test the performance of each configuration. Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.

In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration (examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this interface provides all the necessary flexibility.

Ask Tell interface for Hyperband

examples/launch_ask_tell_scheduler_hyperband.py

"""
This is an example of how to use syne-tune in ask-tell mode.
In this setup the tuning loop and experiments are disentangled. The AskTellScheduler suggests new configurations
and the users themselves perform experiments to test the performance of each configuration.
Once done, the user feeds the result back into the scheduler, which uses the data to suggest better configurations.


In some cases, the experiments needed for function evaluations can be very complex and require extra orchestration
(examples vary from setting up jobs on non-AWS clusters to running physical lab experiments), in which case this
interface provides all the necessary flexibility

This is an extension of launch_ask_tell_scheduler.py to run multi-fidelity methods such as Hyperband
"""

import logging
from typing import Tuple

import numpy as np

from examples.launch_ask_tell_scheduler import AskTellScheduler
from syne_tune.backend.trial_status import Trial, TrialResult
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import ASHA
from syne_tune.optimizer.scheduler import SchedulerDecision


def target_function(x, step: int = None, noise: bool = True):
    fx = x * x + np.sin(x)
    if noise:
        sigma = np.cos(x) ** 2 + 0.01
        noise = 0.1 * np.random.normal(loc=x, scale=sigma)
        fx = fx + noise

    if step is not None:
        fx += step * 0.01

    return fx


def get_objective():
    metric = "mean_loss"
    mode = "min"
    max_iterations = 100
    config_space = {
        "x": uniform(-1, 1),
    }
    return metric, mode, config_space, max_iterations


def run_hyperband_step(
    scheduler: AskTellScheduler, trial_suggestion: Trial, max_steps: int, metric: str
) -> Tuple[float, float]:
    for step in range(1, max_steps):
        test_result = target_function(**trial_suggestion.config, step=step)
        decision = scheduler.bscheduler.on_trial_result(
            trial_suggestion, {metric: test_result, "epoch": step}
        )
        if decision == SchedulerDecision.STOP:
            break
    return step, test_result


def tune_with_hyperband() -> TrialResult:
    metric, mode, config_space, max_iterations = get_objective()
    max_steps = 100

    scheduler = AskTellScheduler(
        base_scheduler=ASHA(
            config_space,
            metric=metric,
            resource_attr="epoch",
            max_t=max_steps,
            mode=mode,
        )
    )
    for iter in range(max_iterations):
        trial_suggestion = scheduler.ask()
        final_step, test_result = run_hyperband_step(
            scheduler, trial_suggestion, max_steps, metric
        )
        scheduler.tell(trial_suggestion, {metric: test_result, "epoch": final_step})
    return scheduler.best_trial(metric)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.WARN)
    print("Hyperband:", tune_with_hyperband())

This is an extension of launch_ask_tell_scheduler.py to run multi-fidelity methods such as Hyperband.

Multi Objective Multi Surrogate (MSMOS) Searcher

examples/launch_mb_mo_optimization.py
from pathlib import Path

import numpy as np

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.baselines import MORandomScalarizationBayesOpt


def main():
    random_seed = 6287623
    # Hyperparameter configuration space
    config_space = {
        "steps": randint(0, 100),
        "theta": uniform(0, np.pi / 2),
        "sleep_time": 0.01,
    }
    metrics = ["y1", "y2"]
    modes = ["min", "min"]

    # Creates a FIFO scheduler with a ``MultiObjectiveMultiSurrogateSearcher``. The
    # latter is configured by one default GP surrogate per objective, and with the
    # ``MultiObjectiveLCBRandomLinearScalarization`` acquisition function.
    scheduler = MORandomScalarizationBayesOpt(
        config_space=config_space,
        metric=metrics,
        mode=modes,
        random_seed=random_seed,
    )

    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "mo_artificial"
        / "mo_artificial.py"
    )
    tuner = Tuner(
        trial_backend=LocalBackend(entry_point=entry_point),
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(max_wallclock_time=30),
        n_workers=1,  # how many trials are evaluated in parallel
    )
    tuner.run()


if __name__ == "__main__":
    main()

This example shows how to use the multi-objective multi-surrogate (MSMOS) searcher to tune a multi-objective problem. In this example, we use two Gaussian process regressors as the surrogate models and rely on the lower confidence bound with random scalarization as the acquisition function. With that in mind, any Syne Tune Estimator can be used as a surrogate.

Basics of Syne Tune

This tutorial provides a first overview of Syne Tune. You will learn about the most important concepts of automated hyperparameter tuning, and how to make it work for your setup.

Note

In order to run the code coming with this tutorial, you need to have installed Syne Tune from source.

Concepts and Terminology

Syne Tune is a library for large-scale distributed hyperparameter optimization (HPO). Here is some basic terminology. A specific set of values for hyperparameters is called a configuration. The configuration space is the domain of a configuration, prescribing the type and valid range of each hyperparameter. Finally, a trial refers to an evaluation of the underlying machine learning model on a given configuration. A trial may result in one or more observations, for example the validation error after each epoch of training the model. Some HPO algorithms may pause a trial and restart it later in time.
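
As a minimal illustration of these terms (the hyperparameter names below are invented for the example), a configuration space is a dictionary mapping hyperparameter names to domains from syne_tune.config_space, and a configuration is one concrete draw from it:

from syne_tune.config_space import randint, uniform, choice

# Configuration space: prescribes type and valid range of each hyperparameter
config_space = {
    "learning_rate": uniform(1e-6, 1e-3),
    "num_layers": randint(1, 8),
    "optimizer": choice(["adam", "sgd"]),
}

# A configuration is one concrete set of values; here we simply sample one at random
config = {
    name: domain.sample() if hasattr(domain, "sample") else domain
    for name, domain in config_space.items()
}
print(config)  # e.g. {'learning_rate': 0.00042, 'num_layers': 3, 'optimizer': 'adam'}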

HPO experiments in Syne Tune involve the interplay between three components: Tuner, Backend, and Scheduler. There is also dedicated tooling for Benchmarking.

Tuner

The Tuner orchestrates the overall search for the best configuration. It does so by interacting with scheduler and backend. It queries the scheduler for a new configuration to evaluate whenever a worker is free, and passes this suggestion to the backend for the execution of this trial.
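
A minimal sketch of this interplay, reusing the height example from the listings above (the entry point path is shortened here, and the metric names must match what your training script reports):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import RandomSearch

config_space = {
    "steps": 100,
    "width": randint(0, 20),
    "height": randint(-100, 100),
}

# The tuner ties together a backend (how trials are executed) and a scheduler
# (which configurations to try), and stops once the stopping criterion is met
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_height.py"),
    scheduler=RandomSearch(config_space, metric="mean_loss", mode="min"),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,  # number of trials evaluated in parallel
)
tuner.run()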

Scheduler

In Syne Tune, HPO algorithms are called schedulers (base class TrialScheduler). They search for a new, most promising configuration and suggest it as a new trial to the tuner. Some schedulers may decide to resume a paused trial instead of suggesting a new one. Schedulers may also be in charge of stopping running trials. Syne Tune supports many schedulers, including multi-fidelity methods.

Backend

The backend module is responsible for starting, stopping, pausing and resuming trials, as well as accessing results reported by trials and their statuses (base class TrialBackend). Syne Tune currently supports four execution backends to facilitate experimentation: local backend, Python backend, SageMaker backend, and simulator backend. Recall that an HPO experiment is defined by two scripts. First, a launcher script, which configures the configuration space, the backend, and the scheduler, then starts the tuning loop. Second, a training script, in which the machine learning model of interest (e.g., a deep neural network, or gradient boosted decision trees) is trained for a fixed hyperparameter configuration, and some validation metric is reported, either at the end or after each epoch of training. It is the responsibility of the backend to execute the training script for different configurations, often in parallel, and to relay their reports back to the tuner.

Local Backend

Class LocalBackend. This backend runs each training job locally, on the same machine as the tuner. Each training job is run as a subprocess. Importantly, this means that the number of workers, as specified by n_workers passed to Tuner, must be less than or equal to the number of independent resources on this machine, e.g. the number of GPUs or CPU cores. Experiments with the local backend can either be launched on your current machine (in which case it needs to have the resources you are requesting, such as GPUs), or you can launch the experiment remotely as a SageMaker training job, using an instance type of your choice. The figures below illustrate the local backend. On the left, both scripts are executed on the local machine, while on the right, scripts are run remotely.

Local backend on a local machine (left); local backend when running on SageMaker (right)

Syne Tune supports rotating among multiple GPUs on the machine, assigning the next trial to the least busy GPU, i.e. the GPU with the smallest number of trials currently running.

The local backend is simple and has very small delays for starting, stopping, or resuming trials. However, it also has shortcomings. Most importantly, the number of trials which can run concurrently is limited by the resources of the chosen instance. If GPUs are required, each trial is limited to using a single GPU, so that several trials can run in parallel.

The Python backend (PythonBackend) is simply a wrapper around the local backend, which allows you to define an experiment in a single script (instead of two).
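For illustration, here is a minimal sketch of this pattern, loosely based on the Syne Tune examples; the toy training function train_fn and its parameters width and steps are invented for this illustration:

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import PythonBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import RandomSearch


def train_fn(width: int, steps: int):
    # Training code lives inside the function; imports must be local,
    # since the function is serialized and executed in a subprocess
    import time
    from syne_tune import Reporter

    report = Reporter()
    for step in range(1, steps + 1):
        time.sleep(0.1)
        report(step=step, mean_loss=1.0 / (width + step))


config_space = {"width": randint(1, 20), "steps": 30}
tuner = Tuner(
    trial_backend=PythonBackend(tune_function=train_fn, config_space=config_space),
    scheduler=RandomSearch(config_space, metric="mean_loss", mode="min"),
    stop_criterion=StoppingCriterion(max_wallclock_time=60),
    n_workers=2,
)
tuner.run()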

SageMaker Backend

Class SageMakerBackend. This backend runs each trial evaluation as a separate SageMaker training job. Given sufficient instance limits, you can run your experiments with any number of workers you like, and each worker may use all resources on the executing instance. It is even possible to execute trials on instances of different types, which allows for joint tuning of hyperparameters and compute resources. The figure below demonstrates the SageMaker backend. On the left, the launcher script runs on the local machine, while on the right, it is run remotely.

SageMaker backend with tuner running locally (left); SageMaker backend with tuner running on SageMaker (right)

The SageMaker backend executes each trial as an independent SageMaker training job. This allows you to use any instance type and configuration you like. Also, you may use any of the SageMaker frameworks, from scikit-learn to PyTorch and TensorFlow, up to dedicated frameworks for distributed training. You may also bring your own Docker image.

This backend is most suited to tune models for which training is fairly expensive. SageMaker training jobs incur certain delays for starting or stopping, which are not present in the local backend. The SageMaker backend can be sped up by using SageMaker managed warm pools.

Simulator Backend

Class BlackboxRepositoryBackend. This backend is useful for comparing HPO methods, or variations of such methods. It runs on a tabulated or surrogate benchmark, where validation metric data, which would normally be obtained online by running a training script, has been precomputed offline. In a corporate setting, simulation experiments are useful for unit and regression testing, but also for speeding up the evaluation of prototypes. More details are given here, and in this example.

The main advantage of the simulator backend is that it allows for realistic experimentation at very low cost, running orders of magnitude faster than real time. A drawback is the upfront cost of generating a tabulated benchmark of sufficient complexity to match the real problem of interest.

Importantly, Syne Tune is agnostic to which execution backend is being used. You can easily switch between backends by changing the trial_backend argument in Tuner:
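The following is a minimal sketch (train_script.py is a placeholder for your training script, and random search stands in for any scheduler):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.baselines import RandomSearch

config_space = {"n_units": randint(4, 1024), "epochs": 81}
scheduler = RandomSearch(config_space, metric="accuracy", mode="max")

# Local backend: trials run as subprocesses on this machine
trial_backend = LocalBackend(entry_point="train_script.py")
# To switch backends, replace the line above, e.g. with the SageMaker backend:
#   trial_backend = SageMakerBackend(sm_estimator=..., metrics_names=["accuracy"])

tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)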

Benchmarking

A benchmark is a collection of meta-datasets from different configuration spaces, where the exact dataset split, the evaluation protocol, and the performance measure are well-specified. Benchmarking allows for experimental reproducibility and assists us in comparing HPO methods on the specified configurations. Refer to this tutorial for a complete guide on benchmarking in Syne Tune.

Setting up the Problem

Running Example

For most of this tutorial, we will be concerned with one running example: tuning some hyperparameters of a two-layer perceptron on the FashionMNIST dataset.

FashionMNIST (left); two-layer MLP (right)

This is not a particularly difficult problem. Due to its limited size, and the type of model, you can run it on a CPU instance. It is not a toy problem either. Depending on model size, training for the full number of epochs can take more than 90 minutes. We will present results obtained by running HPO for 3 hours, using 4 workers. In order to get best possible results with model-based HPO, you would have to run for longer.

Annotating the Training Script

You will normally start with some code to train a machine learning model, which comes with a number of free parameters you would like to tune. The goal is to obtain a trained (and tuned) model with low prediction error on future data from the same task. One way to do this is to split available data into disjoint training and validation sets, and to score a configuration (i.e., an instantiation of all hyperparameters) by first training on the training set, then computing the error on the validation set. This is what we will do here, while noting that there are other (more costly) scores we could have used instead (e.g., cross-validation). Here is an example:

traincode_report_end.py
import argparse
import logging

from benchmarking.training_scripts.mlp_on_fashion_mnist.mlp_on_fashion_mnist import (
    download_data,
    split_data,
    model_and_optimizer,
    train_model,
    validate_model,
)
from syne_tune import Reporter


def objective(config):  # [1]
    # Download data
    data_train = download_data(config)
    # Report results to Syne Tune
    report = Reporter()
    # Split into training and validation set
    train_loader, valid_loader = split_data(config, data_train)
    # Create model and optimizer
    state = model_and_optimizer(config)
    # Training loop
    for epoch in range(1, config["epochs"] + 1):
        train_model(config, state, train_loader)

    # Report validation accuracy to Syne Tune
    # [2]
    accuracy = validate_model(config, state, valid_loader)
    report(accuracy=accuracy)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    # [3]
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--dataset_path", type=str, required=True)
    # Hyperparameters
    parser.add_argument("--n_units_1", type=int, required=True)
    parser.add_argument("--n_units_2", type=int, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--dropout_1", type=float, required=True)
    parser.add_argument("--dropout_2", type=float, required=True)
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--weight_decay", type=float, required=True)

    args, _ = parser.parse_known_args()
    # Evaluate objective and report results to Syne Tune
    objective(config=vars(args))

This script imports boilerplate code from mlp_on_fashion_mnist.py. It is a typical script to train a neural network, using PyTorch:

  • [1] objective is encoding the function we would like to optimize. It downloads the data, splits it into training and validation set, and constructs the model and optimizer. Next, the model is trained for config['epochs'] epochs. An epoch constitutes a partitioning of the training set into mini-batches of size config['batch_size'], presented to the stochastic gradient descent optimizer in a random ordering.

  • [2] Finally, once training is done, we compute the accuracy of the model on the validation set and report it back to Syne Tune. To this end, we create a callback (report = Reporter()) and call it once the training loop finished, passing the validation accuracy (report(accuracy=accuracy)).

  • [3] Values in config are parameters of the training script. As is customary in SageMaker, these parameters are command line arguments to the script. A subset of these parameters are hyperparameters, namely the parameters we would like to tune. Our example has 7 hyperparameters, 3 of type int and 4 of type float. Another notable parameter is config['epochs'], the number of epochs to train. This is not a parameter to be tuned, even though it plays an important role when we get to early stopping methods below. If your training problem is iterative in nature, we recommend you include the number of iterations (or epochs) among the parameters to your script.

  • [4] Most hyperparameters determine the model, optimizer or learning rate scheduler. In model_and_optimizer, we can see that config['n_units_1'], config['n_units_2'] are the number of units in the first and second hidden layer of a multi-layer perceptron with ReLU activations and dropout (FashionMNIST inputs are 28-by-28 grey-scale images, and there are 10 classes). Also, config['learning_rate'] and config['weight_decay'] parameterize the Adam optimizer.

This script differs from a vanilla training script by only two lines, which create the reporter and call it at the end of training. Namely, we report the validation accuracy after training as report(accuracy=accuracy).

Note

By default, the configuration is passed to the training script as command line arguments. This precludes passing arguments of complex type, such as lists or dictionaries, as there is also a length limit to arguments. In order to get around these restrictions, you can also pass arguments via a JSON file.

Defining the Configuration Space

Having defined the objective, we still need to specify the space we would like to search over. We will use the following configuration space throughout this tutorial:

hpo_main.py: Configuration space
from syne_tune.config_space import randint, uniform, loguniform


# Configuration space (or search space)
config_space = {
    "n_units_1": randint(4, 1024),
    "n_units_2": randint(4, 1024),
    "batch_size": randint(8, 128),
    "dropout_1": uniform(0, 0.99),
    "dropout_2": uniform(0, 0.99),
    "learning_rate": loguniform(1e-6, 1),
    "weight_decay": loguniform(1e-8, 1),
}


The configuration space is a dictionary with key names corresponding to command line input parameters of our training script. For each parameter you would like to tune, you need to specify a Domain, imported from syne_tune.config_space. A domain consists of a type (float, int, categorical), a range (inclusive on both ends), and an encoding (linear or logarithmic). In our example, n_units_1, n_units_2, batch_size are int with linear encoding (randint), dropout_1, dropout_2 are float with linear encoding (uniform), and learning_rate, weight_decay are float with logarithmic encoding (loguniform). We also need to specify upper and lower bounds: n_units_1 lies between 4 and 1024, the range includes both boundary values.

Choosing a good configuration space for a given problem may require some iterations. Parameters like learning rate or regularization constants are often log-encoded, as best values may vary over several orders of magnitude and may be close to 0. On the other hand, probabilities are linearly encoded. Search ranges need to be chosen wide enough not to discount potentially useful values up front, but setting them overly large risks a long tuning time.

In general, the range definitions are more critical for methods based on random exploration than for model-based HPO methods. On the other hand, we should avoid encoding finite-sized numerical ranges as categorical for model-based HPO, instead using one of the more specialized types in Syne Tune. More details on choosing the configuration space are provided here, where you will also learn about more types: categorical, finite range, and ordinal.

Finally, you can also tune only a subset of the hyperparameters of your training script, providing fixed (default) values for the remaining ones. For example, the following configuration space fixes the model architecture:

from syne_tune.config_space import randint, uniform, loguniform

config_space = {
    'n_units_1': 512,
    'n_units_2': 128,
    'batch_size': randint(8, 128),
    'dropout_1': uniform(0, 0.99),
    'dropout_2': uniform(0, 0.99),
    'learning_rate': loguniform(1e-6, 1),
    'weight_decay': loguniform(1e-8, 1),
}

Bayesian Optimization

What is Bayesian Optimization?

One of the oldest and most widely used instantiations of sequential model-based search is Bayesian optimization. There are a number of great tutorials and review articles on Bayesian optimization, and we won’t repeat their content here.

Most instances of Bayesian optimization work by modelling the objective as a function \(f(\mathbf{x})\), where \(\mathbf{x}\) is a configuration from the search space. Given such a probabilistic surrogate model, we can condition it on the observed metric data in order to obtain a posterior distribution. Finally, we use this posterior distribution along with additional statistics obtained from the data (such as, for example, the best metric value attained so far) in order to compute an acquisition function \(a(\mathbf{x})\), an (approximate) maximum of which will be our suggested configuration. While \(a(\mathbf{x})\) can itself be difficult to globally optimize, it is available in closed form and can typically be differentiated w.r.t. \(\mathbf{x}\). Moreover, it is important to understand that \(a(\mathbf{x})\) is not an approximation to \(f(\mathbf{x})\), but instead scores the expected value of sampling the objective at \(\mathbf{x}\), thereby embodying the explore-exploit trade-off. In particular, once some \(\mathbf{x}_*\) is chosen and its observation is included into the data, \(a(\mathbf{x}_*)\) is much diminished.

The Bayesian optimization template requires us to make two choices:

  • Surrogate model: By far the most common choice is to use Gaussian process surrogate models (the tutorials linked above explain the basics of Gaussian processes). A Gaussian process is parameterized by a mean and a covariance (or kernel) function. In Syne Tune, the default corresponds to what is most frequently used in practice: Matern 5/2 kernel with automatic relevance determination (ARD). A nice side effect of this choice is that the model can learn about the relative relevance of each hyperparameter as more metric data is obtained, which allows this form of Bayesian optimization to render the curse of dimensionality much less severe than it is for random search.

  • Acquisition function: The default choice in Syne Tune corresponds to the most popular choice in practice: expected improvement.

GP-based Bayesian optimization is run by our launcher script with the argument --method BO. Many options can be specified via search_options, but we use the defaults here. See GPFIFOSearcher for all details. In our example, we set num_init_random to n_workers + 2, which is the number of initial decisions made by random search, before switching over to maximizing the acquisition function.
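If you prefer to construct the scheduler directly rather than through the launcher script, a sketch might look as follows (config_space is the tutorial's configuration space from above; otherwise defaults are used):

from syne_tune.optimizer.baselines import BayesianOptimization

n_workers = 4
scheduler = BayesianOptimization(
    config_space,   # the tutorial's configuration space from above
    metric="accuracy",
    mode="max",
    search_options={"num_init_random": n_workers + 2},
)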

Results for Bayesian Optimization

Here is how Bayesian optimization performs on our running example, compared to random search. We used the same conditions (4 workers, 3 hours experiment time, 50 random repetitions).

In this particular setup, Bayesian optimization does not outperform random search after 3 hours. This is a rather common pattern. Bayesian optimization requires a certain amount of data in order to learn enough about the objective function (in particular, about which parameters are most relevant) in order to outperform random search by targeted exploration and exploitation. If we continued to 4 or 5 hours, we would see a significant difference.

Recommendations

Here, we collect some additional recommendations. Further details are found here.

Categorical Hyperparameters

While our running example does not have any, hyperparameters of categorical type are often used. For example:

from syne_tune.config_space import lograndint, choice

config_space = {
    'n_units_1': lograndint(4, 1024),
    # ...
    'activation': choice(['ReLU', 'LeakyReLU', 'Softplus']),
}

Here, activation could determine the type of activation function. It is important to understand that in Bayesian optimization, a categorical parameter is encoded as a vector in the multi-dimensional unit cube: the encoding dimension is equal to the number of different values. This ensures there is no ordering information between the different values; each pair has the same distance in the encoding space.

This is usually not what you want with numerical values, whose ordering provides important information to the search. For example, it sounds simpler to search over the finite range choice([4, 8, 16, 32, 64, 128, 256, 512, 1024]) than over the infinite lograndint(4, 1024) for n_units_1, but the opposite is the case. The former occupies 9 dimensions, the latter 1 dimension in the encoded space, and ordering information is lost for the former. A better alternative is logfinrange(4, 1024, 9).

Syne Tune provides a range of finite numerical domains in order to avoid suboptimal performance of Bayesian optimization due to the uncritical use of choice. Since this is somewhat subtle, and you may also want to import configuration spaces from other HPO libraries which do not have these types, Syne Tune provides an automatic conversion logic with streamline_config_space(). Details are given here.

Note

When using Bayesian optimization or any other model-based HPO method, we strongly recommend using streamline_config_space() in order to ensure that your domains are chosen in a way that works best with the internal encoding.

Speeding up Decision-Making

Gaussian process surrogate models have many crucial advantages over other probabilistic surrogate models typically used in machine learning. But they have one key disadvantage: inference computations scale cubically in the number of observations. For most HPO use cases, this is not a problem, since no more than a few hundred evaluations can be afforded.

Syne Tune allows to control the number of observations the GP surrogate model is fit to, via max_size_data_for_model in search_options. If the data is larger, it is downsampled to this size. Sampling is controlled by another argument max_size_top_fraction. Namely, this fraction of entries in the downsampled set are filled by those points in the full set with the best metric values, while the remaining entries are sampled (with replacement) from the rest of the full set. The default for max_size_data_for_model is DEFAULT_MAX_SIZE_DATA_FOR_MODEL. The feature is switched off by setting this to None or a very large value, but this is not recommended. Subsampling is repeated every time the surrogate model is fit.

Beyond this, there are some search_options arguments you can use in order to speed up Bayesian optimization. The most expensive part of making a decision consists in refitting the parameters of the GP surrogate model, such as the ARD parameters of the kernel. While this refitting is essential for good performance with a small number of observations, it can be thinned out or even stopped when the dataset gets large. You can use opt_skip_init_length, opt_skip_period to this end (details are here).
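As an illustration, these options can be passed via search_options; the following is a sketch, and the specific values are arbitrary:

search_options = {
    # fit the GP surrogate to at most this many (subsampled) observations
    "max_size_data_for_model": 500,
    # refit GP hyperparameters every time only while the dataset is small ...
    "opt_skip_init_length": 400,
    # ... afterwards, thin out refitting to every 5th update
    "opt_skip_period": 5,
}
scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
    search_options=search_options,
)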

Warping of Inputs

If you use input_warping=True in search_options, inputs are warped before being fed into the covariance function; the effective kernel becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform with two non-negative parameters per component. These parameters are learned along with the other parameters of the surrogate model. Input warping allows the surrogate model to represent non-stationary functions, while still keeping the number of parameters small. Note that only those components of \(x\) which belong to non-categorical hyperparameters are warped.

Box-Cox Transformation of Target Values

This option is available only for positive target values. If you use boxcox_transform=True in search_options, target values are transformed before being fitted with a Gaussian marginal likelihood. This uses the Box-Cox transform \(z = (y^\lambda - 1)/\lambda\) with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\).

Both input warping and Box-Cox transform of target values are combined in this paper:

Cowen-Rivers, A. et al.
HEBO: Pushing the Limits of Sample-efficient Hyper-parameter Optimisation
Journal of Artificial Intelligence Research 74 (2022), 1269-1349

However, they fit \(\lambda\) up front by maximizing the likelihood of the targets under a univariate Gaussian assumption for the latent \(z\), while we learn \(\lambda\) jointly with all other parameters.
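Both options are switched on via search_options; here is a minimal sketch, reusing the Bayesian optimization setup from above:

scheduler = BayesianOptimization(
    config_space,
    metric="accuracy",
    mode="max",
    search_options={
        "input_warping": True,     # warp encoded inputs of the GP kernel
        "boxcox_transform": True,  # Box-Cox transform of (positive) target values
    },
)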

Asynchronous Successive Halving and Hyperband

Early Stopping

Learning Curves (image from Aaron Klein)

In the methods discussed above, we train each model for 81 epochs before scoring it. This is expensive: it can take up to 1.5 hours. In order to figure out whether a configuration is poor, do we really need to train it all the way to the end?

At least for most neural network training problems, the validation error after training only a few epochs can be a surprisingly strong signal separating the best from the worst configurations (see figure above). Therefore, if a certain trial shows worse performance after (say) 3 epochs than many others, we may just as well stop it early, allowing the worker to pick up another, potentially more rewarding task.

Synchronous Successive Halving and Hyperband

Successive halving is a simple, yet powerful scheduling method based on the idea of early stopping. Applied to our running example, we would start 81 trials with different, randomly chosen configurations. Computing validation errors after 1 epoch, we stop the 54 (or 2/3) worst performing trials, allowing the 27 (or 1/3) best performing trials to continue. This procedure is repeated after 3, 9, and 27 epochs, each time the 2/3 worst performing trials are stopped. This way, only a single trial runs all the way to 81 epochs. Its configuration has survived stopping decisions after 1, 3, 9, and 27 epochs, so it is likely worth its running time.

In practice, concurrent execution has to be mapped to a small number of workers, and successive halving is implemented by pausing trials at rung levels (i.e., after 1, 3, 9, 27 epochs), and then resuming the top 1/3 to continue training until the next rung level. Pause and resume scheduling is implemented by checkpointing. We will ignore these details for now, but come back to them later. Ignoring practical details of scheduling, and assuming that training time per epoch is the same for each trial, the idea behind successive halving is to spend the same amount of time on trials stopped after 1, 3, 9, and 27 epochs, while making sure that at each rung level, the 2/3 worst performers are eliminated.

Successive halving has two parameters: the reduction factor (3 in our example), and the grace period (1 in our example). For a reduction factor 2, rung levels would be 1, 2, 4, 8, 16, 32, 64, and we would eliminate the 1/2 worst performers at each of them. The larger the reduction factor, the fewer rung levels, and the more aggressive the filtering at each of them. The default value in Syne Tune is 3, which seems to work well for most neural network tuning problems. The grace period is the lowest rung level. Its choice is more delicate. If set too large, the potential advantage of early stopping is lost, since even the worst trials are trained for this many epochs. If set too small, the validation errors at the lowest rung level are determined more by the random initial weights than the training data, and stopping decisions there will be arbitrary.
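To make the interplay of grace period and reduction factor concrete, here is a small sketch (plain arithmetic, not Syne Tune code) computing the rung levels below the maximum resource:

def rung_levels(grace_period: int, reduction_factor: int, max_resource: int):
    # Rung levels are grace_period * reduction_factor^k, below max_resource
    levels, r = [], grace_period
    while r < max_resource:
        levels.append(r)
        r *= reduction_factor
    return levels


print(rung_levels(1, 3, 81))  # [1, 3, 9, 27]
print(rung_levels(1, 2, 81))  # [1, 2, 4, 8, 16, 32, 64]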

Hyperband, a generalization of successive halving, eliminates the grace period as free parameter. In our example above, rung levels were [1, 3, 9, 27, 81], and the grace period was 1. Hyperband defines brackets as sub-sequences starting at 1, 3, 9, 27, 81, of size 5, 4, 3, 2, 1 respectively. Then, successive halving is run on each of these brackets in sequence, where the number of trials started for each bracket is adjusted in a way that roughly equalizes the total number of epochs trained in each bracket.

While successive halving and Hyperband are widely known, they do not work all that well for hyperparameter tuning of neural network models. The main reason for this is their synchronous nature of decision-making. If we think of rungs as lists of slots, which are filled by metric results of trials getting there, each rung has an a priori fixed size. In our successive halving example, rungs at r = 1, 3, 9, 27, 81 epochs have sizes 81 / r. Each rung is a synchronization point. Before any trial can be resumed towards level 3, all 81 trials have to complete their first epoch. The progress of well-performing trials is delayed, not only because workers are idle due to some trials finishing faster than others, but also because of sequential computations (we rarely have 81 workers available). At the other extreme, filling the final rung requires a single trial to train for 54 epochs, while all other workers are idle. This can be compensated to some extent by free workers running trials for the next iteration already, but scheduling becomes rather complex at this point. Syne Tune provides synchronous Hyperband as SynchronousHyperbandScheduler. However, we can usually do much better with asynchronous scheduling.

Asynchronous Successive Halving

An asynchronous scheduler needs to be free of synchronization points. Whenever a worker becomes available, the decision what it should do next must be instantaneous, based on the data available at that point in time. It is not hard to come up with an asynchronous variant of successive halving. In fact, it can be done in several ways.

Returning to our example, we pre-define a system of rungs at 1, 3, 9, 27 epochs as before, and we record metric values of trials reaching each rung. However, instead of having fixed sizes up front, each rung is a growing list. Whenever a trial reaches a rung (by having trained as many epochs as the rung specifies), its metric value is entered into the sorted list. We can now compute a predicate continue which is true iff the new value lies in the top 1/3.
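As a sketch (not Syne Tune internals), the continue predicate for a new metric value arriving at a rung could be computed like this:

def continue_trial(rung_values, new_value, reduction_factor=3, mode="min"):
    # rung_values: metric values of all trials that reached this rung so far
    values = sorted(rung_values + [new_value], reverse=(mode == "max"))
    cutoff = len(values) // reduction_factor  # size of the top 1/3
    if cutoff == 0:
        return True  # not enough data at this rung: continue by default
    return new_value in values[:cutoff]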

There are two variants of asynchronous successive halving (ASHA), with different requirements on the backend. In the stopping variant, a trial reaching a rung level is stopped and discarded if continue = False, otherwise it is allowed to continue. If there is not enough data at a rung, the trial continues by default. The backend needs to be able to stop jobs at random times.

In the promotion variant, a trial reaching a rung level is always paused, its worker is released. Once a worker becomes available, all rungs are scanned top down. If any paused trial with continue = True is found, it is resumed to train until the next rung level (e.g., a trial resumed at rung 3 trains until 9 epochs): the trial is promoted to the next rung. If no paused trial can be promoted, a new one is started from scratch. This ASHA variant requires pause and resume scheduling. In particular, a trial needs to checkpoint its state (at least at rung levels), and these checkpoints need to be accessible to all workers. On the other hand, the backend never needs to stop running trials, as the stopping condition for each training job is determined up front.

Scripts for Asynchronous Successive Halving

In this section, we will focus on the stopping variant of ASHA, leaving the promotion variant for later. First, we need to modify our training script. In order to support early stopping decisions, it needs to compute and report validation errors during training. Recall traincode_report_end.py used with random search and Bayesian optimization. We will replace objective with the following code snippet, giving rise to traincode_report_eachepoch.py:

traincode_report_eachepoch.py (relevant part)
def objective(config):
    # Download data
    data_train = download_data(config)
    # Report results to Syne Tune
    report = Reporter()
    # Split into training and validation set
    train_loader, valid_loader = split_data(config, data_train)
    # Create model and optimizer
    state = model_and_optimizer(config)
    # Training loop
    for epoch in range(1, config["epochs"] + 1):
        train_model(config, state, train_loader)
        # Report validation accuracy to Syne Tune
        accuracy = validate_model(config, state, valid_loader)
        report(epoch=epoch, accuracy=accuracy)


Instead of computing and reporting the validation error only after config['epochs'] epochs, we do this at the end of each epoch. To distinguish different reports, we also include epoch=epoch in each report. Here, epoch is called a resource attribute. For Syne Tune’s asynchronous Hyperband and related schedulers, resource attributes must have positive integer values, which you can think of as “resources spent”. For neural network training, the resource attribute is typically “epochs trained”.

This is the only modification we need. Curious readers may wonder why we report validation accuracy after every epoch, while ASHA really only needs to know it at rung levels. Indeed, with some extra effort, we could rewrite the script to compute and report validation metrics only at rung levels, and ASHA would work just the same. However, for most setups, training for an epoch is substantially more expensive than computing the validation error at the end, and we can keep our script simple. Moreover, Syne Tune provides some advanced model-based extensions of ASHA scheduling, which make good use of metric data reported at the end of every epoch.

Our launcher script runs stopping-based ASHA with the argument --method ASHA-STOP. Note that the entry point is traincode_report_eachepoch.py in this case, and the scheduler is ASHA. Also, we need to pass the name of the resource attribute in resource_attr. Finally, type="stopping" selects the stopping variant. Further details about ASHA and relevant additional arguments (for which we use defaults here) are found in this tutorial.
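If you want to set this up without the launcher script, a sketch of the scheduler and tuner might look as follows (using the ASHA baseline; config_space is the tutorial's configuration space, extended by the constant entry "epochs"):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import ASHA

scheduler = ASHA(
    config_space,               # must also contain the constant entry "epochs": 81
    metric="accuracy",
    mode="max",
    resource_attr="epoch",      # name used in report(epoch=..., accuracy=...)
    max_resource_attr="epochs",
    type="stopping",            # stopping variant of ASHA
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="traincode_report_eachepoch.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=3 * 3600),
    n_workers=4,
)
tuner.run()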

When you run this script, you will note that many more trials are started than for random search, and that the majority of trials are stopped after 1 or 3 epochs.

Results for Asynchronous Successive Halving

Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). ASHA stopping makes a big difference, outperforming random search and Bayesian optimization substantially. Early stopping can speed up neural network tuning dramatically, compared to standard scheduling.

If we ran for much longer, Bayesian optimization would eventually catch up with ASHA and even do better. But of course, wall-clock time matters: it is an important, if not the most important metric for automated tuning. The faster satisfying results are obtained, the more manual iterations over data, model types, and high level features can be afforded.

Model-Based Asynchronous Successive Halving

Extrapolating Learning Curves

Learning Curves (image from Aaron Klein)

By modelling metric data from earlier trials, Bayesian optimization learns to suggest more useful configurations down the line than randomly sampled ones. Since new configurations are sampled at random in ASHA, a natural question is how to combine it with Bayesian decision-making.

It is not immediately clear how to do this, since the data we observe per trial are not single numbers, but learning curves (see figure above). In fact, the most useful single function to model would be the validation error after the final epoch (81 in our example), but the whole point of early stopping scheduling is to query this function only very rarely. By the nature of successive halving scheduling, we observe at any point in time a lot more data for few epochs than for many. Therefore, Bayesian decision-making needs to incorporate some form of learning curve extrapolation.

One way to do so is to build a joint probabilistic model of all the data. The validation metric reported at the end of epoch \(r\) for configuration \(\mathbf{x}\) is denoted as \(f(\mathbf{x}, r)\). In order to allow for extrapolation from small \(r\) to \(r_{max}\) (81 in our example), our model needs to capture dependencies along epochs. Moreover, it also has to represent dependencies between learning curves for different configurations, since otherwise we cannot use it to score the value of a new configuration we have not seen data from before.

MOBSTER

A simple method combining ASHA with Bayesian optimization is MOBSTER. It restricts Bayesian decision-making to proposing configurations for new trials, leaving scheduling decisions for existing trials (e.g., stopping, pausing, promoting) to ASHA. Recall from Bayesian Optimization that we need two ingredients: a surrogate model \(f(\mathbf{x}, r)\) and an acquisition function \(a(\mathbf{x})\):

  • Surrogate model: MOBSTER uses joint surrogate models of \(f(\mathbf{x}, r)\) which start from a Gaussian process model over \(\mathbf{x}\) and extend it to learning curves, such that the distribution over \(f(\mathbf{x}, r)\) remains jointly Gaussian. This is done in several different ways, which are detailed below.

  • Acquisition function: MOBSTER adopts an idea from BOHB, where it is argued that the function of interest is really \(f(\mathbf{x}, r_{max})\) (where \(r_{max}\) is the full number of epochs), so expected improvement for this function would be a reasonable choice. However, this requires at least a small number of observations at this level. To this end, we use expected improvement for the function \(f(\mathbf{x}, r_{acq})\), where \(r_{acq}\) is the largest resource level for which a certain (small) number of observations are available.

These choices conveniently reduce MOBSTER to a Bayesian optimization searcher of much the same form as without early stopping. One important difference is of course that a lot more data is available now, which has scaling implications for the surrogate model. More details about MOBSTER, and further options not discussed here, are given in this tutorial.

Our launcher script runs stopping-based MOBSTER with the argument --method MOBSTER-STOP. At least if defaults are chosen, this is much the same as for ASHA-STOP. However, we can configure the surrogate model with a range of options, which are detailed here.
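Analogously to ASHA above, here is a sketch using the MOBSTER baseline (same arguments; surrogate model options would go into search_options):

from syne_tune.optimizer.baselines import MOBSTER

scheduler = MOBSTER(
    config_space,
    metric="accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
    type="stopping",
    # search_options={...},  # surrogate model options, see the links above
)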

Results for MOBSTER

Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). MOBSTER performs comparably to ASHA on this example. As with Bayesian optimization versus random search, it would need more time in order to make a real difference.

Results on NASBench201 (ImageNet-16)

We repeated this comparison on a harder benchmark problem: NASBench-201, on the ImageNet-16 dataset. Here, r_max = 200, and rung levels are 1, 3, 9, 27, 81, 200. We used 8 workers and 8 hours experiment time, and once more report median and 25/75 percentiles over 50 repeats. Now, after about 5 hours, MOBSTER starts to break away from ASHA and performs significantly better.

Results on NASBench201 (ImageNet-16)

In order to understand why MOBSTER outperforms ASHA, we can visualize the learning curves of trials. In these plots, neighboring trials are assigned different colors, circles mark rung levels, and diamonds mark final rung levels reached.

Learning curves of trials: ASHA (left), MOBSTER (right)

We can see that ASHA continues to suggest poor configurations at a constant rate. While these are stopped after 1 epoch, they still take up valuable resources. In contrast, MOBSTER quickly learns how to avoid the worst configurations and spends the available resources more effectively.

Promotion-based Scheduling

Pause and Resume. Checkpointing of Trials

As we have seen, one way to implement early stopping scheduling is to make trials report metrics at certain points (rung levels), and to stop them when their performance falls behind other trials. This is conceptually simple. A trial maps to a single training run, and it is very easy to annotate training code in order to support automated tuning.

Another idea is pause and resume. Here, a trial may be paused at the end of an epoch, releasing its worker. Any paused trial may be resumed later on when a worker becomes available, which means that it continues training where it left off when paused. Synchronous schedulers need pause and resume, since trials reach a synchronization point at different times, and earlier ones have to wait for the slowest one. For asynchronous schedulers, pause and resume is an alternative to stopping trials, which can often work better. While a paused trial needs no resources, it can be resumed later on, so its past training time is not wasted.

However, pause and resume needs more support from the training script, which has to make sure that a paused trial can be resumed later on, continuing training as if nothing happened in between. To this end, the state of the training job has to be checkpointed (i.e., stored into a file). The training script has to be modified once more, by replacing objective with this code:

traincode_report_withcheckpointing.py
import argparse
import logging

from benchmarking.training_scripts.mlp_on_fashion_mnist.mlp_on_fashion_mnist import (
    download_data,
    split_data,
    model_and_optimizer,
    train_model,
    validate_model,
)
from syne_tune import Reporter
from syne_tune.utils import (
    resume_from_checkpointed_model,
    checkpoint_model_at_rung_level,
    add_checkpointing_to_argparse,
    pytorch_load_save_functions,
)


def objective(config):
    # Download data
    data_train = download_data(config)
    # Report results to Syne Tune
    report = Reporter()
    # Split into training and validation set
    train_loader, valid_loader = split_data(config, data_train)
    # Create model and optimizer
    state = model_and_optimizer(config)
    # Checkpointing
    load_model_fn, save_model_fn = pytorch_load_save_functions(
        {"model": state["model"], "optimizer": state["optimizer"]}
    )
    # Resume from checkpoint (optional)  [2]
    resume_from = resume_from_checkpointed_model(config, load_model_fn)
    # Training loop
    for epoch in range(resume_from + 1, config["epochs"] + 1):
        train_model(config, state, train_loader)
        # Write checkpoint (optional)  [1]
        checkpoint_model_at_rung_level(config, save_model_fn, epoch)
        # Report validation accuracy to Syne Tune
        accuracy = validate_model(config, state, valid_loader)
        report(epoch=epoch, accuracy=accuracy)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--dataset_path", type=str, required=True)
    # Hyperparameters
    parser.add_argument("--n_units_1", type=int, required=True)
    parser.add_argument("--n_units_2", type=int, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--dropout_1", type=float, required=True)
    parser.add_argument("--dropout_2", type=float, required=True)
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--weight_decay", type=float, required=True)
    # [3]
    add_checkpointing_to_argparse(parser)

    args, _ = parser.parse_known_args()
    # Evaluate objective and report results to Syne Tune
    objective(config=vars(args))

Checkpointing requires you to implement the following:

  • [1] A checkpoint has to be written at the end of each epoch. The precise content of the checkpoint depends on the training script, but it has to contain the epoch at which it was written. It is recommended to write the checkpoint before reporting metrics to Syne Tune, since otherwise the writing of the checkpoint may be jeopardized by Syne Tune trying to stop the script.

  • [2] A checkpoint is to be loaded just before the start of the training loop. If the checkpoint file is present and the state can be restored, the training loop starts with the epoch resume_from + 1, where the checkpoint was written at the end of epoch resume_from. Otherwise, resume_from = 0, and the training loop starts from scratch.

  • [3] Checkpointing requires additional input arguments. You can add them by hand or use add_checkpointing_to_argparse. The most important is the local directory name where the checkpoint should be written or loaded from. A checkpoint may consist of different files. If this argument is not passed to the script, checkpointing is deactivated.

Syne Tune provides some helper functions for checkpointing, see FAQ.

  • checkpoint_model_at_rung_level(config, save_model_fn, epoch) stores a checkpoint at the end of epoch epoch. The main work is done by save_model_fn.

  • resume_from = resume_from_checkpointed_model(config, load_model_fn) loads a checkpoint, and returns its epoch if successful. Otherwise, 0 is returned. Again, load_model_fn does the main work.

  • pytorch_load_save_functions: If you use PyTorch, this is providing save_model_fn, load_model_fn that should work for you. In state_dict_objects, you pass a dict of PyTorch objects with a mutable state (look for load_state_dict, state_dict methods). Make sure to include all relevant objects (model, algorithm, learning rate scheduler). Optionally, mutable_state contains additional elementary variables.

Note that while checkpoints are written at the end of each epoch, the most recent one overwrites previous ones. In fact, for the purpose of pause and resume, checkpoints have to be written only at rung levels, because trials can only be paused there. Selective checkpointing could be supported by passing the rung levels to the training script, but this is currently not done in Syne Tune.

Our launcher script runs promotion-based ASHA with the argument --method ASHA-PROM, and promotion-based MOBSTER with --method MOBSTER-PROM:

  • Recall that the argument max_resource_attr for HyperbandScheduler allows the scheduler to infer the maximum resource level r_max. For promotion-based scheduling, this argument has a second function. Namely, it allows the scheduler to inform the training script until which epoch it has to train, so it does not have to be stopped from the outside anymore. For example, say that a trial paused at r=3 is promoted to run until the next rung level r=9. The scheduler calls the training script with config[max_resource_attr] = 9 (instead of 81). It is resumed from its r=3 checkpoint, runs epochs 4, 5, 6, 7, 8, 9, then terminates by itself. If max_resource_attr is not used, training scripts are started to run until the end, and they need to be stopped by the backend. Depending on the backend, there can be a delay between a stopping signal being sent and a worker becoming available again, which is avoided if max_resource_attr is used. Moreover, future backends may be able to use the information on how long a resumed trial needs to run until paused for improved scheduling. A sketch of such a setup is given after this list.

  • Syne Tune allows promotion-based schedulers to be used with training scripts which do not implement checkpointing. Our launcher script would just as well work with traincode_report_eachepoch.py. In this case, a trial to be resumed is started from scratch, and metric reports up to the resume epoch are ignored. For example, say a trial paused at r=3 is resumed. If the training script does not implement checkpointing, it will start from scratch and report for r = 1, 2, 3, 4, .... The scheduler discards the first 3 reports in this case. However, it is strongly recommended to implement checkpointing if promotion-based scheduling is to be used.
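As referenced in the first item above, here is a sketch of a promotion-based setup with max_resource_attr, using the checkpointing script from this section (argument names as in the ASHA example earlier; values otherwise illustrative):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.optimizer.baselines import ASHA

scheduler = ASHA(
    config_space,                # contains the constant entry "epochs": 81
    metric="accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",  # lets the scheduler rewrite "epochs" on resume
    type="promotion",            # pause-and-resume variant
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="traincode_report_withcheckpointing.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=3 * 3600),
    n_workers=4,
)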

Early Removal of Checkpoints

By default, the checkpoints written by all trials are retained on disk (for a trial, later checkpoints overwrite earlier ones). When checkpoints are large and the local backend is used, this may result in a lot of disk space getting occupied, or even the disk filling up. Syne Tune supports checkpoints being removed once they are not needed anymore, or even speculatively, as is detailed here.

Results for promotion-based ASHA and MOBSTER

Here are results for our running example (4 workers; 3 hours; median, 25/75 percentiles over 50 repeats). These results are rather similar to what we obtained for stopping-based scheduling, except the random variations are somewhat larger for ASHA stopping than for ASHA promotion.

It is not a priori clear whether stopping or promotion-based scheduling will work better. When it comes to the backend, promotion-based scheduling needs checkpointing, and the backend needs to efficiently handle the transfer of checkpoints between workers. On the other hand, promotion-based scheduling does not require the backend to stop jobs (see the max_resource_attr discussion above), which can be subject to delays in some backends. Even when run with the local backend, where delays play no role, stopping and promotion-based scheduling can behave quite differently. In our experiments, we have often observed that stopping can be more efficient at the beginning, while promotion has an edge during later stages.

Our recommendation is to implement checkpointing in your training script, which gives you access to all Syne Tune schedulers, and then to gain some experience with what works best for your problem at hand.

SageMaker Backend

Limitations of the Local Backend

We have been using the local backend LocalBackend in this tutorial so far. Due to its simplicity and very low overheads for starting, stopping, or resuming trials, this is the preferred choice for getting started. But with models and datasets getting larger, some disadvantages become apparent:

  • All concurrent training jobs (as well as the tuning algorithm itself) are run as subprocesses on the same instance. This limits the number of workers by what is offered by the instance type. You can set n_workers to any value you like, but what you really get depends on available resources. If you want 4 GPU workers, your instance type needs to have at least 4 GPUs, and each training job can use only one of them.

  • It is hard to encapsulate dependencies of your training code. You need to specify them explicitly, and they need to be compatible with the Syne Tune dependencies. You cannot use Docker images.

  • You may be used to working with SageMaker frameworks, or even specialized setups such as distributed training. In such cases, it is hard to get tuning to work with the local backend.

Launcher Script for SageMaker Backend

Syne Tune offers the SageMaker backend SageMakerBackend as alternative to the local one. Using it requires some preparation, as is detailed here.

Recall our launcher script. In order to use the SageMaker backend, we need to create trial_backend differently:

trial_backend = SageMakerBackend(
    # we tune a PyTorch Framework from Sagemaker
    sm_estimator=PyTorch(
        entry_point=entry_point.name,
        source_dir=str(entry_point.parent),
        instance_type="ml.c5.4xlarge",
        instance_count=1,
        role=get_execution_role(),
        dependencies=[str(repository_root_path() / "benchmarking")],
        max_run=int(1.05 * args.max_wallclock_time),
        framework_version="1.7.1",
        py_version="py3",
        disable_profiler=True,
        debugger_hook_config=False,
        sagemaker_session=default_sagemaker_session(),
    ),
    metrics_names=[metric],
)

In essence, the SageMakerBackend is parameterized with a SageMaker estimator, which executes the training script. In our example, we use the PyTorch SageMaker framework as a pre-built container for the dependencies our training script requires. However, any other type of SageMaker estimator can be used here just as well. Finally, if you include any of the metrics reported by your training script in metrics_names, their values are visualized in the dashboard for the SageMaker training job.

If your training script requires additional dependencies not contained in the chosen SageMaker framework, you can specify those in a requirements.txt file in the same directory as your training script (i.e., in the source_dir of the SageMaker estimator). In our example, this file needs to contain the filelock dependency.

Note

This simple example avoids complications about writing results to S3 in a unified manner, or using special features of SageMaker which can speed up tuning substantially. For more information about the SageMaker backend, please consider this tutorial.

Outlook

Further Topics

We are at the end of this basic tutorial. There are many further topics we did not touch here. Some are established, but not basic, while others are still experimental. Here is an incomplete overview:

  • Running many experiments in parallel: We have stressed the importance of running repetitions of experiments, as results carry quite some stochastic variation. Also, there are higher-level decisions best done by trial-and-error, which can be seen as “outer loop random search”. Syne Tune offers facilities to launch many tuning experiments in parallel, as SageMaker training jobs. More details are found in this tutorial, see also the FAQ.

  • Multi-fidelity Schedulers: Syne Tune provides many more multi-fidelity schedulers than ASHA and MOBSTER. An overview and categorization of supported methods is given in this tutorial.

  • Population-based Training: This is a popular scheduler for tuning reinforcement learning, where optimization hyperparameters like learning rate can be changed at certain points during the training. An example is at examples/launch_pbt.py, see also PopulationBasedTraining. Note that checkpointing is mandatory for PBT.

  • Constrained HPO: In many applications, more than a single metric plays a role. With constrained HPO, you can maximize recall subject to a constraint on precision; minimize prediction latency subject to a constraint on accuracy; or maximize accuracy subject to a constraint on a fairness metric. Constrained HPO is a special case of Bayesian optimization, where searcher='bayesopt_constrained', and the name of the constraint metric (the constraint is feasible iff this metric is non-positive) must be given as constraint_attr in search_options; see the sketch after this list. More details on constrained HPO and the methodology adopted in Syne Tune can be found here, see also ConstrainedGPFIFOSearcher.

  • Multi-objective HPO: Another way to approach tuning problems with multiple metrics is trying to sample the Pareto frontier, i.e. identifying configurations whose performance along one metric cannot be improved without degrading performance along another. Syne Tune provides a range of methodology in this direction. An example is at examples/launch_height_moasha.py. More details on multi-objective HPO and methodology adopted in Syne Tune can be found here, see also MOASHA.

  • Transfer-learning Schedulers: Syne Tune provides several transfer-learning schedulers. To get started check out this tutorial.
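For the constrained HPO item above, here is a sketch of the searcher setup; the constraint metric name my_constraint_metric is a placeholder, and your training script would have to report it alongside the objective:

from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt_constrained",
    # the training script must report "my_constraint_metric";
    # the constraint is feasible iff this metric is non-positive
    search_options={"constraint_attr": "my_constraint_metric"},
    metric="accuracy",
    mode="max",
)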

How to Choose a Configuration Space

One important step in applying hyperparameter optimization to your tuning problem is to define a configuration space (or search space). Doing this optimally for any given problem is more of an art than a science, but in this tutorial you will learn about the basics and some gotchas. Syne Tune also provides some logic in streamline_config_space() to automatically transform domains into forms more suitable for Bayesian optimization; this is explained here as well.

Introduction

Here is an example for a configuration space:

from syne_tune.config_space import (
    lograndint, uniform, loguniform, choice,
)

config_space = {
    'n_units': lograndint(4, 1024),
    'dropout': uniform(0, 0.9),
    'learning_rate': loguniform(1e-6, 1),
    'activation': choice(['relu', 'tanh']),
    'epochs': 128,
}

Not all entries in config_space need to be hyperparameters. For example, epochs is simply a constant passed to the training function. For every hyperparameter, a domain has to be specified. The domain determines the value range of the parameter and its internal encoding.

Each hyperparameter is independent of the other entries in config_space. In particular, the domain of a hyperparameter cannot depend on the value of another. In fact, common actions involving a configuration space, such as sampling, encoding, or decoding a configuration are done independently on its hyperparameters.

Domains

A domain not only defines the value range of a parameter, but also its internal encoding. The latter is important in order to define what uniform sampling means, a basic component of many HPO algorithms. The following domains are currently supported (for full details, see syne_tune.config_space):

  • uniform(lower, upper): Real-valued uniform in [lower, upper]

  • loguniform(lower, upper): Real-valued log-uniform in [lower, upper]. More precisely, the value is exp(x), where x is drawn uniformly in [log(lower), log(upper)].

  • randint(lower, upper): Integer uniform in lower, ..., upper. The value range includes both lower and upper (difference to Python range convention, where upper would not be included).

  • lograndint(lower, upper): Integer log-uniform in lower, ..., upper. More precisely, the value is int(round(exp(x))), where x is drawn uniformly in [log(lower - 0.5), log(upper + 0.5)].

  • choice(categories): Uniform from the finite list categories of values. Entries in categories should ideally be of type str, but types int and float are also allowed (the latter can lead to errors due to round-off).

  • ordinal(categories, kind): Variant of choice for which the order of entries in categories matters. For methods like Bayesian optimization, nearby elements in the list have closer encodings. Compared to choice, the encoding consists of a single number here. Different variants are implemented. If kind="equal" (general default), we use randint(0, len(categories) - 1) internally on the positions in categories, so that each category is chosen with the same probability. If kind="nn" (default if categories strictly increasing and of type int or float), categories must contain strictly increasing int or float values. Internally, we use uniform for an interval containing all values, a real value is mapped to a category by nearest neighbor. If kind="nn-log", this is done in the log space. logordinal(categories) is a synonym for ordinal(categories, kind="nn-log"). The latter two kinds are finite set versions of uniform, loguniform, the different categories are (in general) not chosen with equal probabilities.

  • finrange(lower, upper, size): Can be used as finite analogue of uniform. Uniform from the finite range lower, ..., upper of size size, where entries are equally spaced. For example, finrange(0.5, 1.5, 3) means 0.5, 1.0, 1.5, and finrange(0.1, 1.0, 10) means 0.1, 0.2, ..., 1.0. We require that size >= 2. Note that both lower and upper are part of the value range.

  • logfinrange(lower, upper, size): Can be used as finite analogue of loguniform. Values are exp(x), where x is drawn uniformly from the finite range log(lower), ..., log(upper) of size size (entries equally spaced). Note that both lower and upper are part of the value range.

By default, the value type for finrange and logfinrange is float. It can be changed to int by the argument cast_int=True. For example, logfinrange(8, 256, 6, cast_int=True) results in 8, 16, 32, 64, 128, 256 and value type int, while logfinrange(8, 256, 6) results in 8.0, 16.0, 32.0, 64.0, 128.0, 256.0 and value type float.
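
To make the finite numerical domains more concrete, here is a small sketch of a configuration space using them; the hyperparameter names are made up for illustration.

from syne_tune.config_space import finrange, logfinrange, ordinal, logordinal

config_space = {
    # 0.1, 0.2, ..., 1.0 (equally spaced, value type float)
    "momentum": finrange(0.1, 1.0, 10),
    # 8, 16, 32, 64, 128, 256 (equally spaced in log space, value type int)
    "batch_size": logfinrange(8, 256, 6, cast_int=True),
    # irregular spacing, nearest-neighbor encoding in log space
    "weight_decay": logordinal([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]),
    # ordered categories, each chosen with equal probability
    "num_layers": ordinal([2, 4, 8, 16], kind="equal"),
}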

Recommendations

How to choose the domain for a given hyperparameter? Obviously, we want to avoid illegal values: learning rates should be positive, probabilities lie in [0, 1], and numbers of units must be integers. Apart from this, the choice of domain is not always obvious, and different choices can affect search performance significantly in some cases.

With streamline_config_space(), Syne Tune provides some logic which transforms domains into others more suitable for Bayesian optimization. For example:

from syne_tune.config_space import randint, uniform, choice
from syne_tune.utils import streamline_config_space

config_space = {
    'n_units': randint(4, 1024),
    'dropout': uniform(0, 0.9),
    'learning_rate': uniform(1e-6, 1),
    'weight_decay': choice([0.001, 0.01, 0.1, 1.0]),
    'magic_constant': choice([1, 2, 5, 10, 15, 30]),
}
new_config_space = streamline_config_space(config_space)
# Results in:
# new_config_space = {
#     'n_units': lograndint(4, 1024),
#     'dropout': uniform(0, 0.9),
#     'learning_rate': loguniform(1e-6, 1),
#     'weight_decay': logfinrange(0.001, 1.0, 4),
#     'magic_constant': logordinal([1, 2, 5, 10, 15, 30]),
# }

Here, new_config_space results in the same set of configurations, but the internal encoding is more suitable for many of the model-based HPO methods in Syne Tune. Why?

  • Avoid using choice (categorical) for numerical parameters. Many HPO algorithms make very good use of the information that a parameter is numerical, therefore has a linear ordering. They cannot do that if you do not tell them, and search performance will normally suffer. A good example is Bayesian optimization. Numerical parameters are encoded as themselves (the int domain is relaxed to the corresponding float interval), allowing the surrogate model (e.g., Gaussian process covariance kernel) to exploit ordering and distance in these numerical spaces. On the other hand, a categorical parameter with 10 different values is one-hot encoded to 10(!) dimensions in [0, 1]. Now, all pairs of distinct values have exactly the same distance in this embedding, so that any ordering or distance information is lost. Bayesian optimization does not perform well in general in high-dimensional embedding spaces.

    It is for this reason that streamline_config_space() converts the domains of weight_decay and magic_constant from choice to logfinrange and logordinal respectively.

  • Use infinite ranges. No competitive HPO algorithm ever enumerates all possible configurations and iterates over all of them. There is almost certainly no gain in restricting a learning rate to 5 values you just picked out of your hat, instead of just using the loguniform domain. However, there is a lot to be lost. First, if you use choice, Bayesian optimization may perform poorly. Second, you may just be wrong with your initial choice and have to do time-consuming extra steps of refinement.

  • For finite numerical domains, use finrange or logfinrange. If you insist on a finite range (in some cases, this may be the better choice) for a numerical parameter, make use of finrange or logfinrange instead of choice, as alternatives to uniform and loguniform respectively. If your value spacing is not regular, you can use ordinal or logordinal. For example, choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]) can be replaced by logordinal([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]), which is what streamline_config_space() would do.

  • Use a log transform for parameters which may vary over several orders of magnitude. Examples are learning rates or regularization constants. In the example above, streamline_config_space() converts n_units from randint(4, 1024) to lograndint(4, 1024) and learning_rate from uniform(1e-6, 1) to loguniform(1e-6, 1).

  • Use points_to_evaluate. On top of refining your configuration space, we strongly recommend specifying initial default configurations via points_to_evaluate.

As a user, you can memorize all of this, or you can use streamline_config_space() and just do the following:

  • Use uniform for float values, randint for int values, and leave the decision for log scaling to the logic.

  • Use choice for each finite domain; just make sure that all entries have the same type (str, int, or float). streamline_config_space() will transform your choice into finrange, logfinrange, ordinal, or logordinal for value types float or int.

You should also use streamline_config_space() when importing configuration spaces from other HPO libraries, which may not support the finite numerical domains Syne Tune has.

Note

The conversion of choice to finrange or logfinrange in streamline_config_space() can be approximate. While the list has the same size, some entries may be changed. For example, choice([1, 2, 5, 10, 20, 50]) is replaced by logfinrange with values 1, 2, 5, 10, 22, 48. If this is a problem for certain domains, use the exclude_names argument.

Finally, here is what streamline_config_space() is doing:

  • For a domain uniform(lower, upper) or randint(lower, upper): If lower > 0 and upper >= lower * 100, replace domain by loguniform(lower, upper) or lograndint(lower, upper).

  • For a domain choice(categories), where all entries in categories are of type int or float: This domain is replaced by finrange, logfinrange, ordinal, or logordinal (with the same value type), depending on best fit. Namely, categories is sorted to \(x_0 < \dots < x_{n-1}\), and a linear function \(a * j + b, j = 0,\dots, n-1\) is fit to \([x_j]\), and to \([\log x_j]\) if \(x_0 > 0\). The quality of the fit is scored by \(R^2\), it determines logarithmic or linear encoding, and also the choice between finrange and ordinal. For ordinal, we always use kind="nn".

  • In order to exclude certain hyperparameters from replacements, pass their names in the exclude_names argument of streamline_config_space().
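
For instance, the following sketch keeps a choice domain untouched via exclude_names, while the other domains are still streamlined (hyperparameter names as in the example above).

from syne_tune.config_space import choice, randint
from syne_tune.utils import streamline_config_space

config_space = {
    "n_units": randint(4, 1024),
    "magic_constant": choice([1, 2, 5, 10, 15, 30]),
}
new_config_space = streamline_config_space(
    config_space, exclude_names=["magic_constant"]
)
# "n_units" becomes lograndint(4, 1024) (since 1024 >= 4 * 100), while
# "magic_constant" keeps its original choice domain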

Using the Built-in Schedulers

In this tutorial, you will learn how to use and configure the most important built-in HPO algorithms. Alternatively, you can also use most algorithms from Ray Tune.

This tutorial provides a walkthrough of some of the topics addressed here.

Schedulers and Searchers

The decision-making algorithms driving an HPO experiment are referred to as schedulers. As in Ray Tune, some of our schedulers are internally configured by a searcher. A scheduler interacts with the backend, making decisions on which configuration to evaluate next, and whether to stop, pause or resume existing trials. It relays “next configuration” decisions to the searcher. Some searchers maintain a surrogate model which is fitted to metric data coming from evaluations.

Note

There are two ways to create many of the schedulers of Syne Tune:

  • By configuring a template class such as FIFOScheduler or HyperbandScheduler with a searcher;

  • By importing a wrapper class (such as BayesianOptimization or ASHA) from syne_tune.optimizer.baselines.

Importing from syne_tune.optimizer.baselines is often simpler. However, in this tutorial, we will use the template classes in order to expose the common structure and to explain arguments only once.

FIFOScheduler

This is the simplest kind of scheduler. It cannot stop or pause trials; each evaluation proceeds to the end. Depending on the searcher, this scheduler supports:

  • Random search [searcher="random"]

  • Bayesian optimization with Gaussian processes [searcher="bayesopt"]

  • Grid search [searcher="grid"]

  • TPE with kernel density estimators [searcher="kde"]

  • Constrained Bayesian optimization [searcher="bayesopt_constrained"]

  • Cost-aware Bayesian optimization [searcher="bayesopt_cost"]

  • Bore [searcher="bore"]

We will only consider the first two searchers in this tutorial. Here is a launcher script using FIFOScheduler:

import logging

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune import Tuner, StoppingCriterion

from benchmarking.benchmark_definitions import \
    mlp_fashionmnist_benchmark


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.DEBUG)
    n_workers = 4
    max_wallclock_time = 120

    # We pick the MLP on FashionMNIST benchmark
    # The 'benchmark' object contains arguments needed by scheduler and
    # searcher (e.g., 'mode', 'metric'), along with suggested default values
    # for other arguments (which you are free to override)
    benchmark = mlp_fashionmnist_benchmark()
    config_space = benchmark.config_space

    backend = LocalBackend(entry_point=benchmark.script)

    # GP-based Bayesian optimization searcher. Many options can be specified
    # via `search_options`, but let's use the defaults
    searcher = "bayesopt"
    search_options = {'num_init_random': n_workers + 2}
    scheduler = FIFOScheduler(
        config_space,
        searcher=searcher,
        search_options=search_options,
        mode=benchmark.mode,
        metric=benchmark.metric,
    )

    tuner = Tuner(
        trial_backend=backend,
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(
            max_wallclock_time=max_wallclock_time
        ),
        n_workers=n_workers,
    )

    tuner.run()

What happens in this launcher script?

  • We select the mlp_fashionmnist benchmark, adopting its default hyperparameter search space without modifications.

  • We select the local backend, which runs up to n_workers = 4 processes in parallel on the same instance.

  • We create a FIFOScheduler with searcher = "bayesopt". This means that new configurations to be evaluated are selected by Bayesian optimization, and all trials are run to the end. The scheduler needs to know the config_space, the name of metric to tune (metric) and whether to minimize or maximize this metric (mode). For mlp_fashionmnist, we have metric = "accuracy" and mode = "max", so we select a configuration which maximizes accuracy.

  • Options for the searcher can be passed via search_options. We use defaults, except for changing num_init_random (see below) to the number of workers plus two.

  • Finally, we create the tuner, passing trial_backend, scheduler, as well as the stopping criterion for the experiment (stop after 120 seconds) and the number of workers. The experiment is started by tuner.run().

The full range of arguments is documented in FIFOScheduler. Here, we list the most important ones:

  • config_space: Hyperparameter search space. This argument is mandatory. Apart from hyperparameters to be searched over, the space may contain fixed parameters (such as epochs in the example above). A config passed to the training script is always extended by these fixed parameters. If you use a benchmark, you can use benchmark.config_space here, or you can modify this default search space.

  • searcher: Selects searcher to be used (see below).

  • search_options: Options to configure the searcher (see below).

  • metric, mode: Name of the metric to tune (i.e., the key used in the report call by the training script), which is either to be minimized (mode="min") or maximized (mode="max"). If you use a benchmark, just use benchmark.metric and benchmark.mode here.

  • points_to_evaluate: Allows you to specify a list of configurations which are evaluated first (see the sketch after this list). If your training code corresponds to some open source ML algorithm, you may want to use the defaults provided in the code. The entry (or entries) in points_to_evaluate do not have to specify values for all hyperparameters. For any hyperparameter not listed there, the following rule is used to choose a default: for float and int value types, the mid-point of the search range is used (in linear or log scaling); for categorical value types, the first entry in the value set is used. The default is a single config with all values chosen by the default rule. Pass an empty list in order to not specify any initial configs.

  • random_seed: Master random seed. Random sampling in schedulers and searchers is done by a number of numpy.random.RandomState generators, whose seeds are derived from random_seed. If not given, a random seed is sampled and printed in the log.
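
To illustrate points_to_evaluate, here is a hedged sketch using a configuration space like the one from the introduction; the default values and the metric name are placeholders. The first entry specifies all hyperparameters, while the second one only fixes learning_rate, so the remaining values are filled in by the default rule described above.

from syne_tune.config_space import lograndint, uniform, loguniform, choice
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "n_units": lograndint(4, 1024),
    "dropout": uniform(0, 0.9),
    "learning_rate": loguniform(1e-6, 1),
    "activation": choice(["relu", "tanh"]),
    "epochs": 128,
}
# First evaluate a full default configuration, then one where only
# learning_rate is given; unspecified values are filled in by the
# mid-point / first-category rule
points_to_evaluate = [
    {"n_units": 128, "dropout": 0.1, "learning_rate": 1e-3, "activation": "relu"},
    {"learning_rate": 1e-4},
]
scheduler = FIFOScheduler(
    config_space,
    searcher="random",
    metric="accuracy",  # placeholder metric name
    mode="max",
    points_to_evaluate=points_to_evaluate,
)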

Bayesian Optimization

Bayesian optimization is obtained by searcher='bayesopt', or by using BayesianOptimization instead of FIFOScheduler. More information about Bayesian optimization is provided here.
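
As a sketch of this equivalence (the configuration space and metric name are placeholders), the two constructions below should result in the same method:

from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.baselines import BayesianOptimization

config_space = {
    "learning_rate": loguniform(1e-6, 1),
    "dropout": uniform(0, 0.9),
}

# Template class, configured with the "bayesopt" searcher ...
scheduler = FIFOScheduler(
    config_space, searcher="bayesopt", metric="accuracy", mode="max"
)
# ... or, equivalently, the wrapper from syne_tune.optimizer.baselines
scheduler = BayesianOptimization(config_space, metric="accuracy", mode="max")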

Options for configuring the searcher are given in search_options. These include the options of the random searcher. The full range of arguments is documented in GPFIFOSearcher. We list the most important ones (a configuration sketch follows the list):

  • num_init_random: Number of initial configurations chosen at random (or via points_to_evaluate). In fact, the number of initial configurations is the maximum of this and the length of points_to_evaluate. Afterwards, configurations are chosen by Bayesian optimization (BO). In general, BO is only used once at least one metric value from past trials is available. We recommend setting this value to the number of workers plus two.

  • opt_nstarts, opt_maxiter: BO employs a Gaussian process surrogate model, whose own hyperparameters (e.g., kernel parameters, noise variance) are chosen by empirical Bayes (marginal likelihood optimization). In general, this is done whenever new data becomes available; it is the most expensive computation in each round. opt_maxiter is the maximum number of L-BFGS iterations. We run opt_nstarts such optimizations from random starting points and pick the best.

  • max_size_data_for_model, max_size_top_fraction: GP computations scale cubically with the number of observations, and decision making can become very slow for too many trials. Whenever there are more than max_size_data_for_model observations, the dataset is downsampled to this size. Here, max_size_data_for_model * max_size_top_fraction of the entries correspond to the cases with the best metric values, while the remaining entries are drawn at random (without replacement) from all other cases. Defaults to DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • opt_skip_init_length, opt_skip_period: Refitting the GP hyperparameters in each round can become expensive, especially when the number of observations grows large. If so, you can choose to do it only every opt_skip_period rounds. Skipping optimizations is done only once the number of observations is above opt_skip_init_length.

  • gp_base_kernel: Selects the covariance (or kernel) function to be used in the surrogate model. Current choices are “matern52-ard” (Matern 5/2 with automatic relevance determination; the default) and “matern52-noard” (Matern 5/2 without ARD).

  • acq_function: Selects the acquisition function to be used. Current choices are “ei” (negative expected improvement; the default) and “lcb” (lower confidence bound). The latter has the form \(\mu(x) - \kappa \sigma(x)\), where \(\mu(x)\), \(\sigma(x)\) are predictive mean and standard deviation, and \(\kappa > 0\) is a parameter, which can be passed via acq_function_kwargs={"kappa": 0.5} for \(\kappa = 0.5\).

  • input_warping: If this is True, inputs are warped before being fed into the covariance function: the effective kernel becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform with two non-negative parameters per component. These parameters are learned along with other parameters of the surrogate model. Input warping allows the surrogate model to represent non-stationary functions, while still keeping the number of parameters small. Note that only the components of \(x\) belonging to non-categorical hyperparameters are warped.

  • boxcox_transform: If this is True, target values are transformed before being fitted with a Gaussian marginal likelihood. This uses the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive.
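
Putting some of these options together, a search_options dictionary might look as follows; this is a sketch, the configuration space and metric name are placeholders, and the option values are for illustration only.

from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1),
    "dropout": uniform(0, 0.9),
}
search_options = {
    "num_init_random": 6,                   # number of workers plus two
    "opt_nstarts": 2,                       # restarts for GP hyperparameter fitting
    "opt_maxiter": 50,                      # L-BFGS iterations per restart
    "acq_function": "lcb",                  # lower confidence bound instead of EI
    "acq_function_kwargs": {"kappa": 0.5},
    "input_warping": True,                  # non-stationary surrogate model
}
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options=search_options,
    metric="accuracy",
    mode="max",
)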

HyperbandScheduler

This scheduler comes in at least two different variants: one may stop trials early (type="stopping"), the other may pause trials and resume them later (type="promotion"). For tuning neural network models, it tends to work much better than FIFOScheduler. You may have read about successive halving and Hyperband before. Chances are you read about synchronous scheduling of parallel evaluations, while both HyperbandScheduler and FIFOScheduler implement asynchronous scheduling, which can be substantially more efficient. This tutorial provides details about synchronous and asynchronous variants of successive halving and Hyperband.

Here is a launcher script using HyperbandScheduler:

import logging

from syne_tune.backend import LocalBackend
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune import Tuner, StoppingCriterion

from benchmarking.benchmark_definitions import \
    mlp_fashionmnist_benchmark

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.DEBUG)
    n_workers = 4
    max_wallclock_time = 120

    # We pick the MLP on FashionMNIST benchmark
    # The 'benchmark' object contains arguments needed by scheduler and
    # searcher (e.g., 'mode', 'metric'), along with suggested default values
    # for other arguments (which you are free to override)
    benchmark = mlp_fashionmnist_benchmark()
    config_space = benchmark.config_space

    backend = LocalBackend(entry_point=benchmark.script)

    # MOBSTER: Combination of asynchronous successive halving with
    # GP-based Bayesian optimization
    searcher = 'bayesopt'
    search_options = {'num_init_random': n_workers + 2}
    scheduler = HyperbandScheduler(
        config_space,
        searcher=searcher,
        search_options=search_options,
        type="stopping",
        max_resource_attr=benchmark.max_resource_attr,
        resource_attr=benchmark.resource_attr,
        mode=benchmark.mode,
        metric=benchmark.metric,
        grace_period=1,
        reduction_factor=3,
    )

    tuner = Tuner(
        trial_backend=backend,
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(
            max_wallclock_time=max_wallclock_time
        ),
        n_workers=n_workers,
    )

    tuner.run()

Much of this launcher script is the same as for FIFOScheduler, but HyperbandScheduler comes with a number of extra arguments we will explain in the sequel (type, max_resource_attr, grace_period, reduction_factor, resource_attr). The mlp_fashionmnist benchmark trains a two-layer MLP on FashionMNIST (more details are here). The accuracy is computed and reported at the end of each epoch:

for epoch in range(resume_from + 1, config['epochs'] + 1):
    train_model(config, state, train_loader)
    accuracy = validate_model(config, state, valid_loader)
    report(epoch=epoch, accuracy=accuracy)

While metric="accuracy" is the criterion to be optimized, resource_attr="epoch" is the resource attribute. In the schedulers discussed here, the resource attribute must be a positive integer.

HyperbandScheduler maintains reported metrics for all trials at certain rung levels (levels of resource attribute epoch at which scheduling decisions are done). When a trial reports (epoch, accuracy) for a rung level == epoch, the scheduler makes a decision whether to stop (pause) or continue. This decision is done based on all accuracy values encountered before at the same rung level. Whenever a trial is stopped (or paused), the executing worker becomes available to evaluate a different configuration.

Rung level spacing and stop/go decisions are determined by the parameters max_resource_attr, grace_period, and reduction_factor. The first is the name of the attribute in config_space which contains the maximum number of epochs to train (max_resource_attr == "epochs" in our benchmark). This allows the training script to obtain max_resource_value = config[max_resource_attr]. Rung levels are \(r_{min}, r_{min} \eta, r_{min} \eta^2, \dots, r_{max}\), where \(r_{min}\) is grace_period, \(\eta\) is reduction_factor, and \(r_{max}\) is max_resource_value. In the example above, max_resource_value = 81, grace_period = 1, and reduction_factor = 3, so that rung levels are 1, 3, 9, 27, 81. The spacing is such that stop/go decisions are done less frequently for trials which already went further: they have earned trust by not being stopped earlier. \(r_{max}\) need not be of the form \(r_{min} \eta^k\). If max_resource_value = 56 in the example above, the rung levels would be 1, 3, 9, 27, 56.
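
The rung level computation described above can be written out in a few lines; this is an illustrative sketch, not Syne Tune code.

def rung_levels(grace_period, reduction_factor, max_resource_value):
    # Geometric spacing r_min, r_min * eta, r_min * eta^2, ..., capped at r_max
    levels = []
    resource = grace_period
    while resource < max_resource_value:
        levels.append(resource)
        resource *= reduction_factor
    return levels + [max_resource_value]

print(rung_levels(1, 3, 81))  # [1, 3, 9, 27, 81]
print(rung_levels(1, 3, 56))  # [1, 3, 9, 27, 56]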

Given such a rung level spacing, stop/go decisions are done by comparing accuracy to the 1 / reduction_factor quantile of values recorded at the rung level. In the example above, our trial is stopped if its accuracy does not place it among the best 1/3 of values recorded at the rung (the list includes the current accuracy value); otherwise it continues.

Further details about HyperbandScheduler and multi-fidelity HPO methods are given in this tutorial. The full range of arguments is documented in HyperbandScheduler. Here, we list the most important ones (a configuration sketch follows the list):

  • max_resource_attr, grace_period, reduction_factor: As detailed above, these determine the rung levels and the stop/go decisions. The resource attribute is a positive integer. We need reduction_factor >= 2. Note that instead of max_resource_attr, you can also use max_t, as detailed here.

  • rung_increment: This parameter can be used instead of reduction_factor (the latter takes precedence). In this case, rung levels are spaced linearly: \(r_{min} + j \nu, j = 0, 1, 2, \dots\), where \(\nu\) is rung_increment. The stop/go rule in the successive halving scheduler is set based on the ratio of successive rung levels.

  • rung_levels: Alternatively, the user can specify the list of rung levels directly (positive integers, strictly increasing). The stop/go rule in the successive halving scheduler is set based on the ratio of successive rung levels.

  • type: The most important values are "stopping", "promotion" (see above).

  • brackets: Number of brackets to be used in Hyperband. More details are found here. The default is 1 (successive halving).
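
As a sketch of how these arguments fit together (configuration space, metric name, and attribute names are placeholders), rung levels can also be specified explicitly instead of via grace_period and reduction_factor:

from syne_tune.config_space import loguniform, lograndint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1),
    "n_units": lograndint(4, 1024),
    "epochs": 81,  # maximum resource, referenced by max_resource_attr
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="promotion",
    rung_levels=[1, 3, 9, 27, 81],  # explicit, strictly increasing positive integers
    max_resource_attr="epochs",
    resource_attr="epoch",
    metric="accuracy",
    mode="max",
)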

Depending on the searcher, this scheduler supports asynchronous successive halving (ASHA) [searcher="random"] and MOBSTER [searcher="bayesopt"], among others. We will only consider these two searchers in this tutorial.

Asynchronous Hyperband (ASHA)

If HyperbandScheduler is configured with a random searcher, we obtain ASHA, as proposed in A System for Massively Parallel Hyperparameter Tuning. More details are provided here. Nothing much can be configured via search_options in this case. The arguments are the same as for random search with FIFOScheduler.
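
In other words, the two constructions in the following sketch should amount to the same method (configuration space, metric name, and attribute names are placeholders):

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune.optimizer.baselines import ASHA

config_space = {"learning_rate": loguniform(1e-6, 1), "epochs": 81}
common = dict(
    max_resource_attr="epochs",
    resource_attr="epoch",
    metric="accuracy",
    mode="max",
)

# HyperbandScheduler with a random searcher ...
scheduler = HyperbandScheduler(
    config_space, searcher="random", type="stopping", **common
)
# ... is the ASHA baseline
scheduler = ASHA(config_space, type="stopping", **common)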

Model-based Asynchronous Hyperband (MOBSTER)

If HyperbandScheduler is configured with a Bayesian optimization searcher, we obtain MOBSTER, as proposed in Model-based Asynchronous Hyperparameter and Neural Architecture Search. By default, MOBSTER uses a multi-task Gaussian process surrogate model for metrics data observed at all resource levels. More details are provided here.

Recommendations

Finally, we provide some general recommendations on how to use our built-in schedulers.

  • If you can afford it for your problem, random search is a useful baseline (RandomSearch). However, if even a single full evaluation takes a long time, try ASHA (ASHA) instead. The default for ASHA is type="stopping", but you should consider type="promotion" as well (more details on this choice are given here).

  • Use these baseline runs to get an idea how long your experiment needs to run. It is recommended to use a stopping criterion of the form stop_criterion=StoppingCriterion(max_wallclock_time=X), so that the experiment is stopped after X seconds.

  • If your tuning problem comes with an obvious resource parameter, make sure to implement it such that results are reported during the evaluation, not only at the end. When training a neural network model, choose the number of epochs as resource. In other situations, choosing a resource parameter may be more difficult. Our schedulers require positive integers. Make sure that evaluations for the same configuration scale linearly in the resource parameter: an evaluation up to 2 * r should be roughly twice as expensive as one up to r.

  • If your problem has a resource parameter, always make sure to try HyperbandScheduler, which in many cases runs much faster than FIFOScheduler.

  • If you end up tuning the same ML algorithm or neural network model on different datasets, make sure to set points_to_evaluate appropriately. If the model comes from frequently used open source code, its built-in defaults will be a good choice. Any hyperparameter not covered in points_to_evaluate is set using a midpoint heuristic. While still better than choosing the first configuration at random, this may not be very good.

  • In general, the defaults should work well if your tuning problem is expensive enough (at least a minute per unit of r). In such cases, MOBSTER (MOBSTER) can outperform ASHA substantially. However, if your problem is cheap, so you can afford a lot of evaluations, the searchers based on GP surrogate models can become expensive. In fact, once the number of evaluations surpasses a certain threshold, the data is filtered down before fitting the surrogate model (see here). You can adjust this threshold or change opt_skip_period in order to speed up MOBSTER.

Multi-Fidelity Hyperparameter Optimization

This tutorial provides an overview of multi-fidelity HPO algorithms implemented in Syne Tune. Multi-fidelity scheduling is one of the most successful recent ideas used to speed up HPO. You will learn about the differences and relationships between different methods, and how to choose the best approach for your own problems.

Note

In order to run the code in this tutorial, you need to have installed the blackbox-repository dependencies.

Introduction

In this section, we define and motivate some basic definitions. As this tutorial is mostly driven by examples, we will not go into much detail here.

What is Hyperparameter Optimization (HPO)?

In hyperparameter optimization (HPO), the goal is to minimize an a priori unknown function \(f(\mathbf{x})\) over a configuration space \(\mathbf{x}\in\mathcal{X}\). Here, \(\mathbf{x}\) is a hyperparameter configuration. For example, \(f(\mathbf{x})\) could be obtained by training a neural network model on a training dataset, then computing its error on a disjoint validation dataset. The hyperparameters may configure several aspects of this setup, for example:

  • Optimization parameters: Learning rate, batch size, momentum fraction, regularization constant, dropout fraction, choice of stochastic gradient descent (SGD) optimizer, warm-up ratio

  • Architecture parameters: Number of layers, width of layers, number of convolution filters, number of self-attention heads

If HPO ranges over architecture parameters, potentially including the operator types and connectivity of cells (or layers), it is also referred to as neural architecture search (NAS).

In general, HPO is a more difficult optimization problem than training for weights and biases, for a number of reasons:

  • Hyperparameters are often discrete (integer or categorical), so smooth optimization principles do not apply

  • HPO is the outer loop of a nested (or bi-level) optimization problem, where the inner loop consists of training for weights and biases. This means that an evaluation of \(f(\mathbf{x})\) can be very expensive (hours or even days)

  • The nested structure implies further difficulties. Training is non-deterministic (random initialization and mini-batch ordering), so \(f(\mathbf{x})\) is really a random function. Even for continuous hyperparameters, a gradient of \(f(\mathbf{x})\) is not tractable to obtain

For these reasons, a considerable amount of technology has so far been applied to the HPO problem. In the context of this tutorial, two directions are most relevant:

  • Saving compute resources and time by using partial evaluations of \(f(\mathbf{x})\) most of the time. Such evaluations are called low fidelity or low resource below

  • Fitting data from \(f(\mathbf{x})\) (and its lower fidelities) with a surrogate probabilistic model. The latter has properties that the real target function lacks (fast to evaluate; gradients can be computed), and this can efficiently guide the search. The main purpose of a surrogate model is to reduce the number of evaluations of \(f(\mathbf{x})\), while still finding a high quality optimum

Fidelities and Resources

In this section, we will introduce concepts of multi-fidelity hyperparameter optimization. Examples will be given further below. The reader may skip this section and return to it as a glossary.

An evaluation of \(f(\mathbf{x})\) requires a certain amount of compute resources and wallclock time. Most of this time is spent in training the model. In most cases, training resources and time can be broken down into units. For example:

  • Neural networks are trained for a certain number of epochs (i.e., sweeps over the training set). In this case, training for one epoch could be one resource unit. This resource unit will be used as running example in this tutorial.

  • Machine learning models can also be trained on subsets of the training set, in order to save resources. We could create a nested system of sets, where for simplicity all sizes are integer multiples of the smallest one. In this case, training on the smallest subset size is one resource unit.

We can decide the amount of resources when evaluating a configuration, giving rise to observations of \(f(\mathbf{x}, r)\), where \(r\in\{1, 2, 3, \dots, r_{max}\}\) denotes the resource used (e.g., number of epochs of training).

It is common to define \(f(\mathbf{x}, r_{max}) = f(\mathbf{x})\), so that the original criterion of interest has the largest resource that can be chosen. In this context, any \(f(\mathbf{x}, r)\) with \(r < r_{max}\) is called a low fidelity criterion w.r.t. \(f(\mathbf{x}, r_{max})\). The smaller \(r\), the lower the fidelity. A smaller resource requires less computation and waiting time, but it also produces a datapoint of less quality when approximating the target metric. Importantly, all methods discussed here make the following assumption:

  • For every fixed \(\mathbf{x}\), running time and compute cost of evaluating \(f(\mathbf{x}, r)\) scales roughly proportional to \(r\). If this is not the case for the natural resource unit in your problem, you need to map \(r\) to your unit in a non-linear way. Note that time may still strongly depend on the configuration \(\mathbf{x}\) itself.

Multi-Fidelity Scheduling

How could an existing HPO technology be extended in order to make use of multi-fidelity observations \(f(\mathbf{x}, r)\) at different resources? There are two basic principles which come to mind:

  • A priori decisions: Whenever a decision is required which configuration \(\mathbf{x}\) to evaluate next, the method also decides the resource \(r\) to be spent on that evaluation.

  • A posteriori decisions: Whenever a new configuration \(\mathbf{x}\) can be run, it is started without a definite amount of resource attached to it. After it has spent some resources, its low-fidelity observations are compared to those of other trials which spent the same resource before. Decisions on stopping or resuming trials are taken based on the outcome of such comparisons.

While some work on multi-fidelity Bayesian optimization has chosen the former option, methods with a posteriori decision-making have been far more successful. All methods discussed in this tutorial adhere to the a posteriori principle for decisions which trials to stop or resume from a paused state. In the sequel, we will use the terminology scheduling decisions rather than a posteriori.

How to implement such scheduling decisions? In general, we need to compare a number of trials with each other on the basis of observations at a certain resource level \(r\) (or, more generally, on values up to \(r\)). In this tutorial, and in Syne Tune more generally, we use terminology defined in the ASHA publication. A rung is a list of trials \(\mathbf{x}_j\) and observations \(f(\mathbf{x}_j, r)\) at a certain resource level \(r\). This resource is also called rung level. In general, a decision on what to do with one or several trials in the rung is taken by sorting the rung members w.r.t. their metric values. A positive decision (i.e., continue, or resume) is taken if the trial ranks among the better ones (above a certain quantile); a negative one (i.e., stop, or keep paused) is taken otherwise.

More details will be given when we come to real examples below. Just a few remarks at this point, which will be substantiated with examples:

  • Modern successive halving methods innovated over earlier proposals by suggesting a geometric spacing of rung levels, and by calibrating the thresholds in scheduling decisions according to this spacing. For example, the median stopping rule predates successive halving, but is typically outperformed by ASHA (while MSR is implemented in Syne Tune, it is not discussed in this tutorial).

  • Scheduling decisions can either be made synchronously or asynchronously. In the former case, decisions are batched up for many trials, while in the latter case, decisions for each trial are made instantaneously.

  • Asynchronous scheduling can either be implemented as start-and-stop, or as pause-and-resume. In the former case, trials are started when workers become available, and they may be stopped at rung levels (and just continue otherwise). In pause-and-resume scheduling, any trial is always run until the next rung level and paused there. When a worker becomes available, it may be used to resume any of the paused trials, in case they compare well against peers at the same rung. These modalities place different requirements on the training script and the execution backend.

Setting up the Problem

If you have not done this before, it is recommended you first work through the Basics of Syne Tune tutorial, in order to become familiar with concepts such as configuration, configuration space, backend, scheduler.

Note

In this tutorial, we will use a surrogate benchmark in order to obtain realistic results with little computation. To this end, you need to have the blackbox-repository dependencies installed, as detailed here. Note that the first time you use a surrogate benchmark, its data files are downloaded and stored to your S3 bucket; this can take a considerable amount of time. The next time you use the benchmark, it is loaded from your local disk or your S3 bucket, which is fast.

Running Example

For most of this tutorial, we will be concerned with one running example: the NASBench-201 benchmark. NASBench-201 is a frequently used neural architecture search benchmark with a configuration space of six categorical parameters, with five values each. The authors trained networks under all these configurations and provide metrics, such as training error, evaluation error and runtime after each epoch, free for researchers to use. In this tutorial, we make use of the CIFAR100 variant of this benchmark, where the model architectures have been trained on the CIFAR100 image classification dataset.

NASBench-201 is an example of a tabulated benchmark. Researchers can benchmark and compare HPO algorithms on the data without having to spend effort training models. They do not need expensive GPU computation in order to explore ideas or do comparative studies.

Syne Tune is particularly well suited to work with tabulated benchmarks. First, it contains a blackbox repository for maintenance and fast access to tabulated benchmarks. Second, it features a simulator backend which simulates training evaluations from a blackbox. The simulator backend can be used with any Syne Tune scheduler, and experiment runs are very close to what would be obtained by running training for real. In particular, the simulation maintains correct timings and temporal order of events. Importantly, time is simulated as well. Not only are experiments very cheap to run (on basic CPU hardware), they also finish many times faster than real time.

The Launcher Script

The most flexible way to run HPO experiments in Syne Tune is by writing a launcher script. In this tutorial, we will use the following launcher script.

hpo_main.py
import logging
from argparse import ArgumentParser

from syne_tune.experiments.benchmark_definitions import nas201_benchmark
from syne_tune.backend.simulator_backend.simulator_callback import (
    SimulatorCallback,
)
from syne_tune.blackbox_repository.simulated_tabular_backend import (
    BlackboxRepositoryBackend,
)
from syne_tune.optimizer.baselines import (
    ASHA,
    MOBSTER,
    HyperTune,
    SyncHyperband,
    SyncBOHB,
    SyncMOBSTER,
    DEHB,
)
from syne_tune import Tuner, StoppingCriterion

if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    parser = ArgumentParser()
    parser.add_argument(
        "--method",
        type=str,
        choices=(
            "ASHA-STOP",
            "ASHA-PROM",
            "ASHA6-STOP",
            "MOBSTER-JOINT",
            "MOBSTER-INDEP",
            "HYPERTUNE-INDEP",
            "HYPERTUNE4-INDEP",
            "HYPERTUNE-JOINT",
            "SYNCHB",
            "SYNCSH",
            "SYNCMOBSTER",
            "BOHB",
            "DEHB",
        ),
        default="ASHA-STOP",
    )
    parser.add_argument(
        "--random_seed",
        type=int,
        default=31415927,
    )
    parser.add_argument(
        "--experiment_tag",
        type=str,
        default="mf-tutorial",
    )
    parser.add_argument(
        "--dataset",
        type=str,
        choices=("cifar10", "cifar100", "ImageNet16-120"),
        default="cifar100",
    )
    args = parser.parse_args()

    # [1]
    # Setting up simulator backend for blackbox repository
    # We use the NASBench201 blackbox for the training set `args.dataset`
    benchmark = nas201_benchmark(args.dataset)
    max_resource_attr = benchmark.max_resource_attr
    trial_backend = BlackboxRepositoryBackend(
        elapsed_time_attr=benchmark.elapsed_time_attr,
        max_resource_attr=max_resource_attr,
        blackbox_name=benchmark.blackbox_name,
        dataset=benchmark.dataset_name,
        surrogate=benchmark.surrogate,
        surrogate_kwargs=benchmark.surrogate_kwargs,
    )

    # [2]
    # Select configuration space for the benchmark. Here, we use the default
    # for the blackbox
    blackbox = trial_backend.blackbox
    # Common scheduler kwargs
    method_kwargs = dict(
        metric=benchmark.metric,
        mode=benchmark.mode,
        resource_attr=blackbox.fidelity_name(),
        random_seed=args.random_seed,
        max_resource_attr=max_resource_attr,
    )
    # Insert maximum resource level into configuration space. Doing so is
    # best practice and has advantages for pause-and-resume schedulers
    config_space = blackbox.configuration_space_with_max_resource_attr(
        max_resource_attr
    )

    scheduler = None
    if args.method in {"ASHA-STOP", "ASHA-PROM", "ASHA6-STOP"}:
        # [3]
        # Scheduler: Asynchronous Successive Halving (ASHA)
        # The 'stopping' variant stops trials which underperform compared to others
        # at certain resource levels (called rungs).
        # The 'promotion' variant pauses each trial at certain resource levels
        # (called rungs). Trials which outperform others at the same rung, are
        # promoted later on, to run to the next higher rung.
        # We configure this scheduler with random search: configurations for new
        # trials are drawn at random
        scheduler = ASHA(
            config_space,
            type="promotion" if args.method == "ASHA-PROM" else "stopping",
            brackets=6 if args.method == "ASHA6-STOP" else 1,
            **method_kwargs,
        )
    elif args.method in {"MOBSTER-JOINT", "MOBSTER-INDEP"}:
        # Scheduler: Asynchronous MOBSTER
        # We configure the scheduler with GP-based Bayesian optimization, using
        # the "gp_multitask" or the "gp_independent" surrogate model.
        search_options = None
        if args.method == "MOBSTER-INDEP":
            search_options = {"model": "gp_independent"}
        scheduler = MOBSTER(
            config_space,
            search_options=search_options,
            type="promotion",
            **method_kwargs,
        )
    elif args.method in {"HYPERTUNE-INDEP", "HYPERTUNE4-INDEP", "HYPERTUNE-JOINT"}:
        # Scheduler: Hyper-Tune
        # We configure the scheduler with GP-based Bayesian optimization, using
        # the "gp_multitask" or the "gp_independent" surrogate model.
        search_options = None
        if args.method == "HYPERTUNE-JOINT":
            search_options = {"model": "gp_multitask"}
        scheduler = HyperTune(
            config_space,
            search_options=search_options,
            type="promotion",
            brackets=4 if args.method == "HYPERTUNE4-INDEP" else 1,
            **method_kwargs,
        )
    elif args.method in {"SYNCHB", "SYNCSH"}:
        # Scheduler: Synchronous successive halving or Hyperband
        # We configure this scheduler with random search: configurations for new
        # trials are drawn at random
        scheduler = SyncHyperband(
            config_space,
            brackets=1 if args.method == "SYNCSH" else None,
            **method_kwargs,
        )
    elif args.method == "SYNCMOBSTER":
        # Scheduler: Synchronous MOBSTER
        # We configure this scheduler with GP-BO search. The default surrogate
        # model is "gp_independent": independent processes at each rung level,
        # which share a common ARD kernel, but separate mean functions and
        # covariance scales.
        scheduler = SyncMOBSTER(
            config_space,
            **method_kwargs,
        )
    elif args.method == "BOHB":
        # Scheduler: Synchronous BOHB
        # We configure this scheduler with KDE search, which uses the
        # "two-density" approximation of the EI acquisition function from
        # TPE (Bergstra et al.).
        scheduler = SyncBOHB(
            config_space,
            **method_kwargs,
        )
    elif args.method == "DEHB":
        # Scheduler: Differential Evolution Hyperband (DEHB)
        # We configure this scheduler with random search.
        scheduler = DEHB(
            config_space,
            **method_kwargs,
        )

    stop_criterion = StoppingCriterion(
        max_wallclock_time=benchmark.max_wallclock_time,
        max_num_evaluations=benchmark.max_num_evaluations,
    )

    # [4]
    tuner = Tuner(
        trial_backend=trial_backend,
        scheduler=scheduler,
        stop_criterion=stop_criterion,
        n_workers=benchmark.n_workers,
        sleep_time=0,
        callbacks=[SimulatorCallback()],
        tuner_name=args.experiment_tag,
        metadata={
            "seed": args.random_seed,
            "algorithm": args.method,
            "tag": args.experiment_tag,
            "benchmark": "nas201-" + args.dataset,
        },
    )

    tuner.run()

Let us walk through this script, assuming it is called with the default --method ASHA-STOP:

  • If you worked through Basics of Syne Tune, you may notice that the training script is missing. Since we use the simulator backend with a blackbox (NASBench-201), no training script is required: the backend is directly linked to the blackbox repository and obtains evaluation data from there.

  • [1] We first select the benchmark and create the simulator backend linked with this benchmark. Relevant properties of supported benchmarks are collected in syne_tune.experiments.benchmark_definitions, using SurrogateBenchmarkDefinition. Some properties are tied to the benchmark and must not be changed (elapsed_time_attr, metric, mode, blackbox_name, max_resource_attr). Other properties are default values suggested for the benchmark and may be changed by the user (n_workers, max_num_evaluations, max_wallclock_time, surrogate). Some of the blackboxes are not computed on a dense grid; they require a surrogate regression model in order to be functional, and for those, surrogate and surrogate_kwargs need to be considered. However, NASBench-201 comes with a finite configuration space, which has been sampled exhaustively.

  • [1] We then create the BlackboxRepositoryBackend. Instead of a training script, this backend needs information about the blackbox used for the simulation. elapsed_time_attr is the name of the elapsed time metric of the blackbox (time from start of training until end of epoch). max_resource_attr is the name of the maximum resource entry in the configuration space (more on this shortly).

  • [2] Next, we select the configuration space and determine some attribute names. With a tabulated benchmark, we are bound to use the configuration space coming with the blackbox, trial_backend.blackbox.configuration_space. If another configuration space is to be used, a surrogate regression model has to be specified. In this case, config_space_surrogate can be passed at the construction of BlackboxRepositoryBackend. Since NASBench-201 has a native finite configuration space, we can ignore this extra complexity in this tutorial. However, choosing a suitable configuration space and specifying a surrogate can be important for model-based HPO methods. Some more information is given here.

  • [2] We can determine resource_attr (name of resource attribute) from the blackbox as blackbox.fidelity_name(). Next, if max_resource_attr is specified, we attach the information about the largest resource level to the configuration space, via blackbox.configuration_space_with_max_resource_attr(max_resource_attr). Doing so is best practice in general. In the end, the training script needs to know how long to train for at most (i.e., the maximum number of epochs in our example); this should not be hardcoded. Another advantage of attaching the maximum resource information to the configuration space is that pause-and-resume schedulers can use it to signal the training script how long to really run for. This is explained in more detail when we come to these schedulers. In short, we strongly recommend using max_resource_attr and configuring schedulers with it.

  • [2] If max_resource_attr is not to be used, the scheduler needs to be passed the maximum resource value explicitly. For ASHA-STOP, this is the max_t attribute. This is not recommended, and not shown here.

  • [3] At this point, we create the multi-fidelity scheduler, which is ASHA in the default case. Most supported schedulers can easily be imported from syne_tune.optimizer.baselines, using common names.

  • [4] Finally, we create a stopping criterion and a Tuner. This should be well known from Basics of Syne Tune. One speciality here is that we require sleep_time=0 and callbacks=[SimulatorCallback()] for things to work out with the simulator backend. First, since time is simulated, the Tuner does not really have to sleep between its iterations (simulated time is advanced in discrete steps). Second, SimulatorCallback is needed for the simulation of time. It is fine to add additional callbacks here, as long as SimulatorCallback is one of them.

The Blackbox Repository

Giving a detailed account of the blackbox repository is out of scope of this tutorial. If you run the launcher script above, you will be surprised how quickly it finishes. The only real time spent is on logging, fetching metric values from the blackbox, and running the scheduler code. Since the latter is very fast (mostly some random sampling and data organization), whole simulated HPO experiments with many parallel workers can be done in mere seconds.

However, when you run it for the very first time, you will have to wait for quite some time. This is because the blackbox repository downloads the raw data for the benchmark of your choice, processes it, and (optionally) stores it to your S3 bucket. It also stores a local copy. If the data is already in your S3 bucket, it will be downloaded from there when you run on a different instance; this is rather fast. But downloading and processing the raw data can take an hour or more for some of the blackboxes.

Synchronous Successive Halving and Hyperband

In this section, we will introduce some simple multi-fidelity HPO methods based on synchronous decision-making. Methods discussed here are not model-based; they suggest new configurations simply by drawing them uniformly at random from the configuration space, much like random search does.

Early Stopping Hyperparameter Configurations

The figure below depicts learning curves of a set of neural networks with different hyperparameter configurations trained for the same number of epochs. After a few epochs we are already able to visually distinguish between the well-performing and the poorly performing ones. However, the ordering is not perfect, and we might still require the full amount of 100 epochs to identify the best performing configuration.

Figure: Learning curves for randomly drawn hyperparameter configurations

The idea of early stopping based HPO methods is to free up compute resources by early stopping the evaluation of poorly performing configurations and allocate them to more promising ones. This speeds up the optimization process, since we have a higher throughput of configurations that we can try.

Recall the notation of resource from the introduction. In this tutorial, resource equates to epochs trained, so \(r=2\) refers to metric values evaluated at the end of the second epoch. The main objective of interest, validation error in our tutorial, is denoted by \(f(\mathbf{x}, r)\), where \(\mathbf{x}\) is the configuration, \(r\) the resource level. Our problem typically defines a maximum resource level \(r_{max}\), so that in general the goal is to find \(\mathbf{x}\) which minimizes \(f(\mathbf{x}, r_{max})\). In NASBench-201, the maximum number of epochs is \(r_{max} = 200\).

Synchronous Successive Halving

One of the simplest competitive multi-fidelity HPO methods is synchronous successive halving (SH). The basic idea is to start with \(N\) configurations randomly sampled from the configuration space, training each of them for \(r_{min}\) epochs only (e.g., \(r_{min} = 1\)). We then discard a fraction of the worst performing trials and train the remaining ones for longer. Iterating this process, fewer trials run for longer, until at least one trial reaches \(r_{max}\) epochs.

More formally, successive halving (SH) is parameterized by a minimum resource \(r_{min}\) (for example 1 epoch) and a halving constant \(\eta\in\{2, 3, \dots\}\). The defaults in Syne Tune are \(r_{min} = 1\) and \(\eta = 3\), and we will use these for now. Next, we define rung levels \(\mathcal{R} = \{ r_{min}, r_{min}\eta, r_{min}\eta^2, \dots \}\), so that all \(r\in \mathcal{R}\) satisfy \(r\le r_{max}\). In our example, \(\mathcal{R} = \{ 1, 3, 9, 27, 81 \}\). Moreover, the initial number of configurations is set to \(N = \eta^5 = 243\). In general, a trial is trained until reaching the next rung level, where it is evaluated, and the validation errors of all trials at a rung level are used to decide which of them to discard. We start with running \(N\) trials until rung level \(r_{min}\). Sorting the validation errors, we keep the top \(1 / \eta\) fraction (i.e., \(N / \eta\) configurations) and discard all the rest. The surviving trials are trained for \(r_{min}\eta\) epochs, and the process is repeated. Synchronized at each rung level, a \(1 / \eta\) fraction of trials survives and finds its budget multiplied by \(\eta\). With this particular choice of \(N\), only a single trial will be trained to the full resource \(r_{max}\). In our example:

  • We first train 243 randomly chosen configurations for 1 epoch each

  • Once all of them are finished, we promote those 81 trials with the lowest validation errors to train for 3 epochs

  • Then, the 27 best-performing ones after 3 epochs are trained for 9 epochs

  • The 9 best ones after 9 epochs are trained for 27 epochs

  • The 3 best ones after 27 epochs are trained for 81 epochs

  • The single best configuration after 81 epochs is trained for 200 epochs

Finally, once one such round of SH is finished, we start the next round with a new set of initial configurations, until the total budget is spent.
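
The arithmetic of one round of SH can be reproduced in a few lines; this is an illustrative sketch, not Syne Tune code.

eta = 3                               # halving constant
num_trials = eta ** 5                 # 243 initial configurations
rung_levels = [1, 3, 9, 27, 81, 200]  # r_max = 200 for NASBench-201

for rung in rung_levels:
    print(f"train {num_trials} trial(s) up to {rung} epoch(s)")
    num_trials //= eta                # only the top 1/eta fraction survives
# train 243 trial(s) up to 1 epoch(s)
# train 81 trial(s) up to 3 epoch(s)
# train 27 trial(s) up to 9 epoch(s)
# train 9 trial(s) up to 27 epoch(s)
# train 3 trial(s) up to 81 epoch(s)
# train 1 trial(s) up to 200 epoch(s)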

Our launcher script runs synchronous successive halving if method="SYNCSH". The relevant parameters are grace_period ( \(r_{min}\) ) and reduction_factor ( \(\eta\) ). Moreover, for SH, we need to set brackets=1, since otherwise an extension called Hyperband is run (to be discussed shortly).


Synchronous SH employs pause-and-resume scheduling (see introduction). Once a trial reaches a rung level, it is paused there. This is because the decision of which trials to promote to the next rung level can only be taken once the current rung level is completely filled up: only then can we determine the top \(1 / \eta\) fraction of trials which are to be resumed. Syne Tune supports pause-and-resume schedulers with checkpointing. Namely, the state of a trial (e.g., weights of neural network model) is stored when it is paused. Once a trial is resumed, the checkpoint is loaded and training can resume from there. Say a trial is paused at \(r = 9\) and is later resumed towards \(r = 27\). With checkpointing, we have to train for \(27 - 9 = 18\) epochs only instead of 27 epochs for training from scratch. More details are given here. For tabulated benchmarks, checkpointing is supported by default.

Finally, it is important to understand in which sense the method detailed in this section is synchronous. This is because decision-making on which trials to resume is synchronized at certain points in time, namely when a rung level is completed. In general, a trial reaching a rung level has to be paused, because it is not the last one required to fill the rung. In our example, the rung at \(r = 1\) requires 243 trials to finish training for one epoch, so that 242 of them have to be paused for some time.

Synchronous decision-making does not mean that parallel compute resources (called workers in Syne Tune) need to sit idle. In Syne Tune, workers are asynchronously scheduled in general: whenever a worker finishes, it is assigned a new task immediately. Say a worker just finished, but we find all remaining slots in the current rung to be pending (meaning that other workers evaluate trials to end up there, but are not finished yet). We cannot resume a trial from this rung, because promotion decisions require all slots to be filled. In such cases, our implementation starts a new round of SH (or further contributes to a new round already started for the same reason).

In the sequel, the synchronous / asynchronous terminology always refers to decision-making, and not to scheduling of parallel resources.

Synchronous Hyperband

While SH can greatly improve upon random search, the choice of \(r_{min}\) can have an impact on its performance. If \(r_{min}\) is too small, our network might not have learned anything useful, and even the best configurations may be filtered out at random. If \(r_{min}\) is too large on the other hand, the benefits of early stopping may be greatly diminished.

Hyperband is an extension of SH that mitigates the risk of setting \(r_{min}\) too small. It runs SH as a subroutine, where each round, called a bracket, balances between \(r_{min}\) and the number of initial configurations \(N\), such that the same total amount of resources is used. One round of Hyperband consists of a sequential loop over brackets.

The number of brackets can be chosen anywhere between 1 (i.e., SH) and the number of rung levels. In Syne Tune, the default number of brackets is the maximum. Without going into formal details, here are the brackets for our NASBench-201 example:

  • Bracket 0: \(r_{min} = 1, N = 243\)

  • Bracket 1: \(r_{min} = 3, N = 98\)

  • Bracket 2: \(r_{min} = 9, N = 41\)

  • Bracket 3: \(r_{min} = 27, N = 18\)

  • Bracket 4: \(r_{min} = 81, N = 9\)

  • Bracket 5: \(r_{min} = 200, N = 6\)

Our launcher script runs synchronous Hyperband if method="SYNCHB". Since brackets is not used when creating SyncHyperband, the maximum value 6 is chosen. We also use the default values for grace_period (1) and reduction_factor (3).

API docs:

The advantages of Hyperband over SH are mostly theoretical. In practice, while Hyperband can improve on SH if \(r_{min}\) chosen for SH is clearly too small, it tends to perform worse than SH if \(r_{min}\) is adequate. This disadvantage of Hyperband is somewhat mitigated in the Syne Tune implementation, where new brackets are started whenever workers cannot contribute to the current bracket (because remaining slots in the current rung are pending, see above).

Asynchronous Successive Halving

In this section, we will turn our attention to methods adopting asynchronous decision-making, which tend to be more efficient than their synchronous counterparts.

Asynchronous Successive Halving: Early Stopping Variant

In synchronous successive halving (SH), decisions on whether to promote a trial or not can be delayed for a long time. In our example, say we are lucky and sample an excellent configuration early on, among the 243 initial ones. In order to promote it to train for 81 epochs, we first need to train 243 trials for 1 epoch, then 81 for 3 epochs, 27 for 9 epochs, and 9 for 27 epochs. Our excellent trial will always be among the top \(1/3\) at these rung levels, but its progress through the rungs is severely delayed.

In asynchronous successive halving (ASHA), the aim is to promote promising configurations as early as possible. There are two different variants of ASHA, and we will begin with the (arguably) simpler one. Whenever a worker becomes available, a new configuration is sampled at random, and a new trial starts training from scratch. Whenever a trial reaches a rung level, a decision is made immediately on whether to stop training or let it continue. This decision is made based on all data available at the rung until now. If the trial is among the top \(1 / \eta\) fraction of configurations previously registered at this rung, it continues. Otherwise, it is stopped. As long as a rung has fewer than \(\eta\) trials, the default is to continue.

Different to synchronous SH, there are no fixed rung sizes. Instead, each rung grows over time. ASHA is free of synchronization points. Promising trials can be trained for many epochs without having to wait for delayed promotion decisions. While asynchronous decision-making can be much more efficient at running good configurations to the end, it runs the risk of making bad decisions based on too little data.

Our launcher script runs the stopping variant of ASHA if method="ASHA-STOP".
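As a rough sketch (assuming the same config_space and benchmark variables as elsewhere in this tutorial), the stopping variant can be requested as follows:

from syne_tune.optimizer.baselines import ASHA

scheduler = ASHA(
    config_space,
    type="stopping",  # early stopping variant
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    max_resource_attr=max_resource_attr,
)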

API docs:

  • Baseline: ASHA

  • Additional arguments: HyperbandScheduler (type="stopping" selects the early stopping variant)

Asynchronous Successive Halving: Promotion Variant

In fact, the algorithm originally proposed as ASHA is slightly different from what has been detailed above. Instead of starting a trial once and relying on early stopping, this promotion variant is of the pause-and-resume type. Namely, whenever a trial reaches a rung, it is paused there. Whenever a worker becomes available, all rungs are scanned top to bottom. If a paused trial is found which lies in the top \(1 / \eta\) of all rung entries, it is promoted: it may resume and train until the next rung level. If no promotable paused trial is found, a new trial is started from scratch. Our launcher script runs the promotion variant of ASHA if method="ASHA-PROM".

API docs:

If these two variants (stopping and promotion) are compared under ideal conditions, one sometimes does better than the other, and vice versa. However, they come with different requirements. The promotion variant pauses and resumes trials, therefore benefits from checkpointing being implemented for the training code. If this is not the case, the stopping variant may be more attractive.

On the other hand, the stopping variant requires the backend to frequently stop workers and bring them back in order to start new trials. For some backends, the turn-around time for this process may be slow, in which case the promotion type can be more attractive. In this context, it is important to understand the relevance of passing max_resource_attr to the scheduler (and, in our case, also to the BlackboxRepositoryBackend). Recall the discussion here. If the configuration space contains an entry with the maximum resource, whose key is passed to the scheduler as max_resource_attr, the latter can modify this value when calling the backend to start or resume a trial. For example, if a trial is resumed at \(r = 3\) to train until \(r = 9\), the scheduler passes a configuration to the backend with {max_resource_attr: 9}. This means that the training code knows how long it has to run, so it does not have to be stopped by the backend.
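To make this concrete, here is a small, self-contained sketch with a hypothetical configuration space, in which the entry "epochs" plays the role of the maximum resource and its key is passed as max_resource_attr (the hyperparameter names and ranges are made up for illustration):

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import ASHA

max_resource_attr = "epochs"
config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_layers": randint(1, 8),
    max_resource_attr: 200,  # constant entry: maximum number of epochs
}
scheduler = ASHA(
    config_space,
    type="promotion",  # pause-and-resume variant
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr=max_resource_attr,
)
# If a trial is resumed at r = 3 to train until r = 9, the scheduler passes a
# configuration with config["epochs"] == 9 to the backend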

ASHA can be significantly accelerated by using PASHA (Progressive ASHA), which dynamically allocates the maximum resources for the tuning procedure depending on need. PASHA starts with a small initial amount of maximum resources and progressively increases them if the ranking of the configurations in the top two rungs has not stabilized. In practice, PASHA typically leads to around a 3x speedup compared to ASHA, and the speedup can be even higher for large datasets with millions of examples. A tutorial about PASHA is here.

Asynchronous Hyperband

Finally, ASHA can also be extended to use multiple brackets. Namely, whenever a new trial is started, its bracket (or, equivalently, its \(r_{min}\) value) is sampled randomly from a distribution. In Syne Tune, this distribution is proportional to the rung sizes in synchronous Hyperband. In our example with 6 brackets (see details here), this distribution is \(P(r_{min}) = [1:243/415, 3:98/415, 9:41/415, 27:18/415, 81:9/415, 200:6/415]\). Our launcher script runs asynchronous Hyperband with 6 brackets if method="ASHA6-STOP".
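The following few lines simply spell out how this distribution is obtained from the rung sizes of our example:

# Bracket distribution proportional to the synchronous Hyperband rung sizes
rung_sizes = {1: 243, 3: 98, 9: 41, 27: 18, 81: 9, 200: 6}
total = sum(rung_sizes.values())  # 415
bracket_distribution = {r_min: size / total for r_min, size in rung_sizes.items()}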

API docs:

  • Baseline: ASHA

  • Additional arguments: HyperbandScheduler (brackets selects the number of brackets, defaults to 1)

As also noted in ASHA, the algorithm often works best with a single bracket, so that brackets=1 is the default in Syne Tune. However, we will see further below that model-based variants of ASHA with multiple brackets can outperform the single-bracket version if the distribution over \(r_{min}\) is adaptively chosen.

Finally, Syne Tune implements two variants of ASHA with brackets > 1. In the default variant, there is only a single system of rungs. For each new trial, \(r_{min}\) is sampled to be equal to one of the rung levels, which means the trial does not have to compete with others at rung levels \(r < r_{min}\). The other variant is activated by passing rung_system_per_bracket=True to HyperbandScheduler. In this case, each bracket has its own rung system, and trials started in one bracket only have to compete with others in the same bracket.
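As a sketch (again assuming the usual config_space and benchmark variables), this second variant could be selected as follows:

from syne_tune.optimizer.schedulers import HyperbandScheduler

scheduler = HyperbandScheduler(
    config_space,
    searcher="random",
    type="stopping",
    brackets=6,
    rung_system_per_bracket=True,  # one rung system per bracket
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    max_resource_attr=max_resource_attr,
)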

Early Removal of Checkpoints

By default, the checkpoints written by all trials are retained on disk (for a trial, later checkpoints overwrite earlier ones). When checkpoints are large and the local backend is used, this may result in a lot of disk space getting occupied, or even the disk filling up. Syne Tune supports checkpoints being removed once they are not needed anymore, or even speculatively, as is detailed here.

Model-based Synchronous Hyperband

All methods considered so far have been extensions of random search by clever multi-fidelity scheduling. In this section, we consider combinations of Bayesian optimization with multi-fidelity scheduling, where configurations are chosen based on performance of previously chosen ones, rather than being sampled at random.

Basics of Syne Tune: Bayesian Optimization provides an introduction to Bayesian optimization in Syne Tune.

Synchronous BOHB

The first model-based method we consider is BOHB, which uses the TPE formulation of Bayesian optimization. In the latter, an approximation to the expected improvement (EI) acquisition function is interpreted via a ratio of two densities. BOHB uses kernel density estimators rather than tree Parzen estimators (as in TPE) to model the two densities.

BOHB uses the same scheduling mechanism (i.e., rung levels, promotion decisions) as synchronous Hyperband (or SH), but it uses a model fit to past data for suggesting the configuration of every new trial. Recall that validation error after \(r\) epochs is denoted by \(f(\mathbf{x}, r)\), where \(\mathbf{x}\) is the configuration. BOHB fits KDEs separately to the data obtained at each rung level. When a new configuration is to be suggested, it first determines the largest rung level \(r_{acq}\) supported by enough data for the two densities to be properly fit. It then makes a TPE decision at this resource level. Our launcher script runs synchronous BOHB if method="BOHB".

API docs:

While BOHB is often more efficient than SYNCHB, it is held back by synchronous decision-making. Note that BOHB does not model the random function \(f(\mathbf{x}, r)\) directly, which makes it hard to properly react to pending evaluations, i.e., trials which have been started but have not returned metric values yet. BOHB ignores pending evaluations if present, which can lead to redundant decisions being made if the number of workers (i.e., the parallelization factor) is large.

Synchronous MOBSTER

Another model-based variant is synchronous MOBSTER. We will provide more details on MOBSTER below, when discussing model-based asynchronous methods.

Our launcher script runs synchronous MOBSTER if method="SYNCMOBSTER". Note that the default surrogate model for SyncMOBSTER is gp_independent, where the data at each rung level is represented by an independent Gaussian process (more details are given here). It turns out that SyncMOBSTER outperforms SyncBOHB substantially on the benchmark chosen here.

API docs:

When running these experiments with the simulator backend, we notice that an experiment suddenly takes quite some time to finish. While still many times faster than real time, we now need many minutes instead of seconds. This is a reminder that model-based decision-making can take time. In GP-based Bayesian optimization, hyperparameters of a Gaussian process model are fit for every decision, and acquisition functions are optimized over many candidates. On the real time scale (the x axis in our result plots), this time is often well spent. After all, SyncMOBSTER outperforms SyncBOHB significantly. But since decision-making computations cannot be tabulated, they slow down the simulations.

As a consequence, we should be careful with result plots showing performance with respect to the number of training evaluations, since these hide both the time required to make decisions and potential inefficiencies in scheduling jobs in parallel. HPO methods should always be compared with real experiment time on the x axis, and the any-time performance of methods should be visualized by plotting curves, not just by quoting “final values”. Examples are provided here.

Note

Syne Tune allows you to launch experiments remotely and in parallel, so that results are still obtained rapidly, as is detailed here.

Differential Evolution Hyperband

Another recent model-based extension of synchronous Hyperband is Differential Evolution Hyperband (DEHB). DEHB is typically run with multiple brackets. A main difference to Hyperband is that configurations promoted from a rung to the next are also modified by an evolutionary rule, involving mutation, cross-over and selection. Since configurations are not just sampled once, but potentially modified at every rung, the hope is to find well-performing configurations faster. Our launcher script runs DEHB if method="DEHB".

API docs:

The main feature of DEHB over synchronous Hyperband is that configurations can be modified at every rung. However, this feature also has a drawback. Namely, DEHB cannot make effective use of checkpointing. If a trial is resumed with a different configuration, starting from its most recent checkpoint is not admissible. However, our implementation is careful to make use of checkpointing in the very first bracket of DEHB, which is equivalent to a normal run of synchronous SH.

Model-based Asynchronous Hyperband

We have seen that asynchronous decision-making tends to outperform synchronous variants in practice, and that model-based extensions of synchronous methods can outperform random sampling of new configurations. In this section, we discuss combinations of Bayesian optimization with asynchronous decision-making, leading to the currently best performing multi-fidelity methods in Syne Tune.

All examples here can either be run in stopping or promotion mode of ASHA. We will use the promotion mode here (i.e., pause-and-resume scheduling).

Surrogate Models of Learning Curves

Recall that validation error after \(r\) epochs is denoted by \(f(\mathbf{x}, r)\), with \(\mathbf{x}\) the configuration. Here, \(r\mapsto f(\mathbf{x}, r)\) is called a learning curve. A learning curve surrogate model predicts \(f(\mathbf{x}, r)\) from observed data. A difficult requirement in the context of multi-fidelity HPO is that observations are much more abundant at smaller resource levels \(r\), while predictions are more valuable at larger \(r\).

In the context of Gaussian process based Bayesian optimization, Syne Tune supports a number of different learning curve surrogate models. The type of model is selected upon construction of the scheduler:

from syne_tune.optimizer.baselines import MOBSTER

scheduler = MOBSTER(
    config_space,
    type="promotion",
    search_options=dict(
        model="gp_multitask",
        gp_resource_kernel="exp-decay-sum",
    ),
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    random_seed=random_seed,
    max_resource_attr=max_resource_attr,
)

Here, options configuring the searcher are collected in search_options. The most important options are model, selecting the type of surrogate model, and gp_resource_kernel selecting the covariance model in the case model="gp_multitask".

Independent Processes at each Rung Level

A simple learning curve surrogate model is obtained by search_options["model"] = "gp_independent". Here, \(f(\mathbf{x}, r)\) at each rung level \(r\) is represented by an independent Gaussian process model. The models have individual constant mean functions \(\mu_r(\mathbf{x}) = \mu_r\) and covariance functions \(k_r(\mathbf{x}, \mathbf{x}') = c_r k(\mathbf{x}, \mathbf{x}')\), where \(k(\mathbf{x}, \mathbf{x}')\) is a Matern-5/2 ARD kernel without variance parameter, which is shared between the models, and the \(c_r > 0\) are individual variance parameters. The idea is that while validation errors at different rung levels may be scaled and shifted, they should still exhibit similar dependencies on the hyperparameters. The noise variance \(\sigma^2\) used in the Gaussian likelihood is the same across all data. However, if search_options["separate_noise_variances"] = True, different noise variances \(\sigma_r^2\) are used for data at different rung levels.
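For example, the following search_options dictionary (to be passed to the scheduler, as in the snippet above) would select this model with separate noise variances:

search_options = dict(
    model="gp_independent",
    separate_noise_variances=True,  # noise variance sigma_r^2 per rung level
)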

Multi-Task Gaussian Process Models

A more advanced set of learning curve surrogate models is obtained by search_options["model"] = "gp_multitask" (which is the default for asynchronous MOBSTER). In this case, a single Gaussian process model represents \(f(\mathbf{x}, r)\) directly, with mean function \(\mu(\mathbf{x}, r)\) and covariance function \(k((\mathbf{x}, r), (\mathbf{x}', r'))\). The GP model is selected by search_options["gp_resource_kernel"], currently supported options are "exp-decay-sum", "exp-decay-combined", "exp-decay-delta1", "freeze-thaw", "matern52", "matern52-res-warp", "cross-validation". The default choice is "exp-decay-sum", which is inspired by the exponential decay model proposed here. Details about these different models are given here and in the source code.

Decision-making is somewhat more expensive with "gp_multitask" than with "gp_independent", because the notorious cubic scaling of GP inference applies over observations made at all rung levels. However, the extra cost is limited by the fact that most observations by far are made at the lowest resource level \(r_{min}\) anyway.

Additive Gaussian Models

Two additional models are selected by search_options["model"] = "gp_expdecay" and search_options["model"] = "gp_issm". The former is the exponential decay model proposed here, the latter is a variant thereof. These additive Gaussian models represent dependencies across \(r\) in a cheaper way than in "gp_multitask", and they can be fit to all observed data, not just at rung levels. Also, joint sampling is cheap.

However, at this point, additive Gaussian models remain experimental, and they will not be further discussed here. They can be used with MOBSTER, but not with Hyper-Tune.

Asynchronous MOBSTER

MOBSTER combines ASHA and asynchronous Hyperband with GP-based Bayesian optimization. A Gaussian process learning curve surrogate model is fit to the data at all rung levels, and posterior predictive distributions are used in order to compute acquisition function values and decide on which configuration to start next. We distinguish between MOBSTER-JOINT with a GP multi-task model ("gp_multitask") and MOBSTER-INDEP with an independent GP model ("gp_independent"), as detailed above. The acquisition function is expected improvement (EI) at the rung level \(r_{acq}\) also used by BOHB.

Our launcher script runs (asynchronous) MOBSTER-JOINT if method="MOBSTER-JOINT". The searcher can be configured with search_options, but MOBSTER-JOINT with the "exp-decay-sum" covariance model is the default.

API docs:

As shown below, MOBSTER can outperform ASHA significantly. This is achieved by starting many fewer trials that stop very early (after 1 epoch) due to poor performance. Essentially, MOBSTER rapidly learns some important properties about the NASBench-201 problem and avoids basic mistakes which random sampling of configurations runs into at a constant rate. While ASHA stops such poor trials early, they still take away resources, which MOBSTER can spend on longer evaluations of more promising configurations. This advantage of model-based over random sampling based multi-fidelity methods is even more pronounced when starting and stopping jobs comes with delays. Such delays are typically present in real world distributed systems, but are absent in our simulations.

Different to BOHB, MOBSTER takes into account pending evaluations, i.e., trials which have been started but have not returned metric values yet. This is done by integrating out their metric values by Monte Carlo: we draw a certain number of joint samples over pending targets and average the acquisition function over these. In the context of multi-fidelity, if a trial is running, a pending evaluation is registered for the next rung level it will reach.

Why is the surrogate model in MOBSTER-JOINT fit to the data at rung levels only? After all, training scripts tend to report validation errors after each epoch, so why not use all this data? Syne Tune allows you to do so (for the "gp_multitask" model) by passing searcher_data="all" when creating the HyperbandScheduler (an intermediate choice is searcher_data="rungs_and_last"). However, while this may lead to a more accurate model, it also becomes more expensive to fit, and it does not tend to make a difference, so the default searcher_data="rungs" is recommended.

Finally, we can also combine ASHA with BOHB decision-making, by choosing searcher="kde" in HyperbandScheduler. This is an asynchronous version of BOHB.
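A sketch of this combination (under the same assumptions on the surrounding variables as above) looks as follows:

from syne_tune.optimizer.schedulers import HyperbandScheduler

# Asynchronous BOHB: ASHA scheduling with KDE-based decision-making
scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",
    type="stopping",
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    max_resource_attr=max_resource_attr,
)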

MOBSTER-INDEP

Our launcher script runs (asynchronous) MOBSTER-INDEP if method="MOBSTER-INDEP". The independent GPs model is selected by search_options["model"] = "gp_independent". MOBSTER tends to perform slightly better with a joint multi-task GP model than with an independent GPs model, justifying the Syne Tune default. In our experience so far, changing the covariance model in MOBSTER-JOINT has only marginal impact.

API docs:

MOBSTER and Hyperband

Just like ASHA can be run with multiple brackets, so can MOBSTER, simply by selecting brackets when creating HyperbandScheduler. In our experience so far, just like with ASHA, MOBSTER tends to work best with a single bracket.

Controlling MOBSTER Computations

MOBSTER often outperforms ASHA substantially. However, when applied to a problem where many evaluations can be done, fitting the GP surrogate model to all observed data can become slow. In fact, Gaussian process inference scales cubically in the number of observations. The amount of computation spent by MOBSTER can be controlled:

  • Setting the limit max_size_data_for_model: Once the total number of observations is above this limit, the data is sampled down to this size. This is done in a way which retains all observations from trials which reached higher rung levels, while data from trials stopped early is more likely to be removed. This down-sampling is redone every time the surrogate model is fit, so that new data (especially at higher rungs) is taken into account. Also, scheduling decisions about stopping, pausing, or promoting trials are always based on all data.

    The default value for max_size_data_for_model is DEFAULT_MAX_SIZE_DATA_FOR_MODEL. It can be changed by passing search_options = {"max_size_data_for_model": XYZ} when creating the MOBSTER scheduler. You can switch off the limit mechanism by passing None or a very large value. Since the current default value is on the smaller end in order to ensure fast computations, you may want to experiment with larger values as well.

  • Parameters opt_skip_init_length, opt_skip_period: When fitting the GP surrogate model, the most expensive computation by far is refitting its own parameters, such as kernel parameters. The frequency of this computation can be regulated, as detailed here. A sketch combining both mechanisms is shown below.
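Here is a sketch combining both mechanisms; the numerical values are made up for illustration, and the exact semantics of the opt_skip_* options are described in the linked documentation:

from syne_tune.optimizer.baselines import MOBSTER

scheduler = MOBSTER(
    config_space,
    type="promotion",
    search_options=dict(
        max_size_data_for_model=500,  # down-sample data beyond this size
        opt_skip_init_length=400,     # threshold controlling when refits may be skipped
        opt_skip_period=10,           # refit GP hyperparameters only every 10th update
    ),
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    max_resource_attr=max_resource_attr,
)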

Hyper-Tune

Hyper-Tune is a model-based extension of ASHA with some additional features compared to MOBSTER. It can be seen as extending MOBSTER-INDEP (with the "gp_independent" surrogate model) in two ways. First, it uses an acquisition function based on an ensemble predictive distribution, while MOBSTER relies on the \(r_{acq}\) heuristic from BOHB. Second, if multiple brackets are used (Hyperband case), Hyper-Tune offers an adaptive mechanism to sample the bracket for a new trial. Both extensions are based on a quantification of consistency of data on different rung levels, which is used to weight rung levels according to their reliability for making decisions (namely, which configuration \(\mathbf{x}\) and bracket \(r_{min}\) to associate with a new trial).

Our launcher script runs Hyper-Tune if method="HYPERTUNE-INDEP". The searcher can be configured with search_options, but the independent GPs model "gp_independent" is the default. In this example, Hyper-Tune is using a single bracket, so the difference to MOBSTER-INDEP is due to the ensemble predictive distribution for the acquisition function.

Syne Tune also implements Hyper-Tune with the GP multi-task surrogate models used in MOBSTER. In result plots for this tutorial, original Hyper-Tune is called HYPERTUNE-INDEP, while this latter variant is called HYPERTUNE-JOINT. Our launcher script runs this variant if method="HYPERTUNE-JOINT".
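As a sketch (with the usual assumptions on the surrounding variables), HYPERTUNE-INDEP and HYPERTUNE-JOINT differ only in the surrogate model selected via search_options:

from syne_tune.optimizer.baselines import HyperTune

scheduler = HyperTune(
    config_space,
    type="promotion",
    search_options=dict(model="gp_independent"),  # "gp_multitask" for HYPERTUNE-JOINT
    metric=benchmark.metric,
    mode=benchmark.mode,
    resource_attr=resource_attr,
    max_resource_attr=max_resource_attr,
)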

API docs:

Finally, computations of Hyper-Tune can be controlled in the same way as in MOBSTER.

Hyper-Tune with Multiple Brackets

Just like ASHA and MOBSTER, Hyper-Tune can also be run with multiple brackets, simply by using the brackets argument of HyperbandScheduler. If brackets > 1, Hyper-Tune samples the bracket for a new trial from an adaptive distribution closely related to the ensemble distribution used for acquisitions. Our launcher script runs Hyper-Tune with 4 brackets if method="HYPERTUNE4-INDEP".

Recall that both ASHA and MOBSTER tend to work better for one than for multiple brackets. This may well be due to the fixed, non-adaptive distribution that brackets are sampled from. Ideally, a method would learn over time whether a low rung level tends to be reliable in predicting the ordering at higher ones, or whether it should rather be avoided (and \(r_{min}\) should be increased). This is what the adaptive mechanism in Hyper-Tune tries to do. In our comparisons, we find that HYPERTUNE-INDEP with multiple brackets can outperform MOBSTER-JOINT with a single bracket.

Details

In this section, we provide some details about Hyper-Tune and our implementation. The Hyper-Tune extensions are based on a quantification of consistency of data on different rung levels. For example, assume that \(r < r_{*}\) are two rung levels, with sufficiently many points at \(r_{*}\). If \(\mathcal{X}_{*}\) collects trials with data at \(r_{*}\), all these have also been observed at \(r\). Sampling \(f(\mathcal{X}_{*}, r)\) from the posterior distribution of the surrogate model, we can compare the ordering of these predictions at \(r\) with the ordering of observations at \(r_{*}\), using a pair-wise ranking loss. A large loss value means frequent cross-overs of learning curves between \(r\) and \(r_{*}\), and predictions at rung level \(r\) are unreliable when it comes to the ordering of trials \(\mathcal{X}_{*}\) at \(r_{*}\).

At any point during the algorithm, denote by \(r_{*}\) the largest rung level with a sufficient number of observations (our implementation requires 6 points). Assuming that \(r_{*} > r_{min}\), we can estimate a distribution \([\theta_r]\) over rung levels \(\mathcal{R}_{*} = \{r\in\mathcal{R}\, |\, r\le r_{*}\}\) as follows. We draw \(S\) independent samples from the model at these rung levels. For each sample \(s\), we compute loss values \(l_{r, s}\) for \((r, r_{*})\) over all \(r\in\mathcal{R}_{*}\), and determine the argmin indicator \([\text{I}_{l_{r, s} = m_s}]\), where \(m_s = \text{min}(l_{r, s} | r\in\mathcal{R}_{*})\). The distribution \([\theta_r]\) is obtained as normalized sum of these indicators over \(s=1,\dots, S\). We also need to compute loss values \(l_{r_{*}, s}\), this is done using a cross-validation approximation, see here or the code in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune for details. In the beginning, with too little data at the second rung level, we use \(\theta_{r_{min}} = 1\) and 0 elsewhere.

Decisions about a new configuration are based on an acquisition function over a predictive distribution indexed by \(\mathbf{x}\) alone. For Hyper-Tune, an ensemble distribution with weighting distribution \([\theta_r]\) is used. Sampling from this distribution works by first sampling \(r\sim [\theta_r]\), then \(f(\mathbf{x}) = f(\mathbf{x}, r)\) from the predictive distribution for that \(r\). This means that models from all rung levels are potentially used, weighted by how reliable they predict the ordering at the highest level \(r_{*}\) supported by data. In our experiments so far, this adaptive weighting can outperform the \(r_{acq}\) heuristic used in BOHB and MOBSTER.

Note that our implementation generalizes Hyper-Tune in that ranking losses and \([\theta_r]\) are estimated once \(r_{*} > r_{min}\) (i.e., once \(r_{*}\) is equal to the second rung level). In the original work, one has to wait until \(r_{*} = r_{max}\), i.e. the maximum rung level is supported by enough data. We find that for many expensive tuning problems, early decision-making can make a large difference, so if the Hyper-Tune extensions provide benefits, they should be used as early during the experiment as possible. For example, in the trial plots for Hyper-Tune shown above, it takes more than 10000 seconds for 6 trials to reach the full 200 epochs, so in the original variant of Hyper-Tune, advanced decision-making only starts when more than half of the experiment is already done.

If Hyper-Tune is used with more than one bracket, the \([\theta_r]\) is also used in order to sample the bracket for a new trial. To this end, we need to determine a distribution \(P(r)\) over all rung levels which feature as \(r_{min}\) in a bracket. In our NASBench-201 example, if Hyper-Tune is run with 5 brackets, the support of \(P(r)\) would be \(\mathcal{S} = \{1, 3, 9, 27, 81\}\). Also, denote the default distribution used in ASHA and MOBSTER by \(P_0(r)\). Let \(r_0 = \text{min}(r_{*}, \text{max}(\mathcal{S}))\). For \(r\in\mathcal{S}\), we define \(P(r) = M \theta_r / r\) for \(r\le r_0\), and \(P(r) = P_0(r)\) for \(r > r_0\), where \(M = \sum_{r\in\mathcal{S}, r\le r_0} P_0(r)\). In other words, we use \(\theta_r / r\) for rung levels supported by data, and the default \(P_0(r)\) elsewhere. Once more, this slightly generalizes Hyper-Tune.

DyHPO

DyHPO is another recent model-based multi-fidelity method. It is a promotion-based scheduler, like the schedulers above with type="promotion", but it differs from MOBSTER and Hyper-Tune in that promotion decisions are based on the surrogate model rather than on the quantile-based rule of successive halving. In a nutshell:

  • Rung levels are equi-spaced: \(\mathcal{R} = \{ r_{min}, r_{min} + \nu, r_{min} + 2 \nu, \dots \}\). If \(r_{min} = \nu\), this means that a trial which is promoted or started from scratch always runs for \(\nu\) resources, independent of its current rung level.

  • Once a worker is free, we can either promote a paused trial or start a new one. In DyHPO, all paused trials compete with a number of new configurations for the next \(\nu\) resources to be spent. The scoring criterion is a special version of expected improvement, so depends on the surrogate model.

  • Different to MOBSTER, the surrogate model is used more frequently. Namely, in MOBSTER, if any trial can be promoted, the surrogate model is not accessed. This means that DyHPO comes with higher decision-making costs, which need to be controlled.

  • Since scoring trials paused at the highest rung populated so far requires extrapolation in terms of resource \(r\), it cannot be used with search_options["model"] = "gp_independent". The other surrogate models are supported.

Our implementation of DyHPO differs from the published work in a number of important points:

  • DyHPO uses an advanced surrogate model based on a neural network covariance kernel which is fitted to the current data. Our implementation supports DyHPO with the GP surrogate models detailed above, except for "gp_independent".

  • Our decision rule is different from DyHPO as published, and can be seen as a hybrid between DyHPO and ASHA. Namely, we flip a coin with outcomes \(\{0, 1\}\), where the probability \(P_1\) of outcome 1 is configurable as probability_sh. If the coin comes up 1, we try to promote a trial using the ASHA rule based on quantiles. Here, the quantile thresholds are adjusted to the linear spacing of rung levels. If no trial can be promoted this way, we fall back to the DyHPO rule. If the coin comes up 0, we use the DyHPO rule. The algorithm as published is obtained for \(P_1 = 0\). However, we find that a non-zero probability_sh is crucial for obtaining robust behaviour, since the original DyHPO rule on its own tends to start too many trials at the beginning before promoting any paused ones.

  • Since in DyHPO, the surrogate model is used more frequently than in MOBSTER, it is important to control surrogate model computations, as detailed above. Apart from the default for max_size_data_for_model, we also use opt_skip_period = 3 as default for DyHPO.

API docs:

Comparison of Methods

In this section, we present an empirical comparison of all methods discussed in this tutorial. The methodology of our study is as follows:

  • We use the NASBench-201 benchmark (CIFAR100 dataset)

  • All methods are run with a max_wallclock_time limit of 6 hours (or 21600 seconds). We plot minimum validation error attained as function of wallclock time (which, in our case, is simulated time)

  • Results are aggregated over a number of repetitions. The number of repetitions is 50 for SYNCSH, SYNCHB, BOHB, DEHB, ASHA-STOP, ASHA-PROM, ASHA6-STOP and SYNCMOBSTER, while MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP and HYPERTUNE-JOINT are repeated 30 times. Figures plot the interquartile mean in bold and a bootstrap 95% confidence interval for this estimator in dashed lines (the IQM is a robust estimator of the mean, but depends on more data than the median)

  • SYNCSH, ASHA-STOP, ASHA-PROM, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP use 1 bracket, HYPERTUNE4-INDEP, HYPERTUNE-JOINT use 4 brackets, and SYNCHB, BOHB, DEHB, SYNCMOBSTER use the maximum of 6 brackets

  • In SYNCSH, SYNCHB, ASHA-STOP, ASHA-PROM, ASHA6-STOP, new configurations are drawn at random, while BOHB, SYNCMOBSTER, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP, HYPERTUNE-JOINT are variants of Bayesian optimization. In DEHB, configurations in the first bracket are drawn at random, but in later brackets, they are evolved from earlier ones

  • ASHA-STOP, ASHA6-STOP use early stopping, while SYNCSH, SYNCHB, BOHB, SYNCMOBSTER, ASHA-PROM, MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE1-INDEP, HYPERTUNE4-INDEP, HYPERTUNE-JOINT use pause-and-resume. DEHB is a synchronous method, but does not resume trials from checkpoints (except in the very first bracket)

Here are results, grouped by synchronous decision-making, asynchronous decision-making (promotion type), and asynchronous decision-making (stopping type). ASHA-PROM results are repeated in all plots for reference.

Figure: Synchronous Multi-fidelity HPO

Figure: Asynchronous Multi-fidelity HPO (promotion)

Figure: Asynchronous Multi-fidelity HPO (stopping)

These results are obtained on a single benchmark with a rather small configuration space. Nevertheless, they are roughly in line with results we obtained on a larger range of benchmarks. A few conclusions can be drawn, which may help readers choosing the best HPO method and its configuration for their own problem.

  • Asynchronous methods outperform synchronous ones in general, in particular when it comes to any-time performance. A notable exception (on this benchmark) is SYNCMOBSTER, which performs on par with the best asynchronous methods.

  • Among the synchronous methods, SYNCMOBSTER performs best, followed by BOHB. SYNCHB and SYNCSH perform very similarly. The performance of DEHB is somewhat disappointing on this benchmark.

  • The best-performing methods on this benchmark are MOBSTER-JOINT and HYPERTUNE1-INDEP, with HYPERTUNE4-INDEP a close runner-up. For MOBSTER, the joint multi-task surrogate model should be preferred, while for HYPERTUNE, the independent GPs model works better.

  • On this benchmark, moving to multiple brackets does not pay off for the asynchronous methods. However, on benchmarks where the choice of \(r_{min}\) is more critical, moving beyond successive halving can be beneficial. In such cases, we currently recommend HYPERTUNE-INDEP, whose adaptive weighting and bracket sampling is clearly more effective than the simpler heuristics used in Hyperband or BOHB.

Benchmarking in Syne Tune

Benchmarking refers to the comparison of a range of HPO algorithms on one or more tuning problems, or benchmarks. This tutorial provides an overview of tooling which facilitates benchmarking of HPO algorithms in Syne Tune. The same tooling can be used to rapidly create launcher scripts for any HPO experiment, allowing you to easily switch between local, SageMaker, and simulator backend. The tutorial also shows how any number of experiments can be run in parallel, in order to obtain desired results faster.

Note

In order to run the code in this tutorial, you need to have installed Syne Tune from source. Also, make sure to have installed the blackbox-repository dependencies.

Note

Benchmarking (i.e., comparing different HPO methods) uses the Syne Tune experimentation framework in syne_tune.experiments. In this framework, a benchmark is simply a tuning problem endowed with some defaults. There are other use cases of experimentation than benchmarking (see here and here), but the term benchmark for tuning problem is used in all of them.

Benchmarking with Simulator Backend

The fastest and cheapest way to compare a number of different HPO methods, or variants thereof, is benchmarking with the simulator backend. In this case, all training evaluations are simulated by querying metric and time values from a tabulated blackbox or a surrogate model. Not only are expensive computations on GPUs avoided, but the experiment also runs faster than real time. In some cases, results for experiments with a max_wallclock_time of several hours can be obtained in a few seconds.

Note

In order to use surrogate benchmarks and the simulator backend, you need to have the blackbox-repository dependencies installed, as detailed here. For the YAHPO blackbox, you also need the yahpo dependencies. Note that the first time you use a surrogate benchmark, its data files are downloaded and stored to your S3 bucket, which can take a considerable amount of time. The next time you use the benchmark, it is loaded from your local disk or your S3 bucket, which is fast.

Note

The experimentation framework in syne_tune.experiments, which is used here, is not limited to benchmarking (i.e., comparing the performance between different HPO methods), but is also the default way to run many experiments in parallel, say with different configuration spaces. This is explained more in this tutorial.

Defining the Experiment

As usual in Syne Tune, the experiment is defined by a number of scripts. We will look at an example in benchmarking/examples/benchmark_hypertune/. Common code used in these benchmarks can be found in syne_tune.experiments.

Let us look at the scripts in order, and how you can adapt them to your needs:

  • benchmarking/examples/benchmark_hypertune/baselines.py: Defines the HPO methods to take part in the experiment, in the form of a dictionary methods which maps method names to factory functions, which in turn map MethodArguments to scheduler objects. The MethodArguments class contains the union of attributes needed to configure schedulers. In particular, scheduler_kwargs contains constructor arguments. For your convenience, the mapping from MethodArguments to schedulers is defined for most baseline methods in syne_tune.experiments.default_baselines (as noted just below, this mapping involves merging argument dictionaries), but you can override arguments as well (for example, type in the examples here). Note that if you would like to compare different variants of a method, you need to create different entries in methods; for example, Methods.MOBSTER_JOINT and Methods.MOBSTER_INDEP are different variants of MOBSTER. A hypothetical sketch of such a baselines.py is shown after this list.

  • benchmarking/examples/benchmark_hypertune/benchmark_definitions.py: Defines the benchmarks to be considered in this experiment, in the form of a dictionary benchmark_definitions with values of type SurrogateBenchmarkDefinition. In general, you will just pick definitions from syne_tune.experiments.benchmark_definitions, unless you are using your own surrogate benchmark not contained in Syne Tune. But you can also modify parameters, for example surrogate and surrogate_kwargs in order to select a different surrogate model, or you can change the defaults for n_workers or max_wallclock_time.

  • benchmarking/examples/benchmark_hypertune/hpo_main.py: Script for launching experiments locally. All you typically need to do here is to import syne_tune.experiments.launchers.hpo_main_simulator and (optionally) to add additional command line arguments you would like to parameterize your experiment with. In our example here, we add two options, num_brackets which configures Hyperband schedulers, and num_samples which configures the Hyper-Tune methods only. Apart from extra_args, you also need to define map_method_args, which modifies method_kwargs (the arguments of MethodArguments) based on the extra arguments. Details for map_method_args are given just below. Finally, main() is called with your methods and benchmark_definitions dictionaries, and (optionally) with extra_args and map_method_args. We will see shortly how the launcher is called, and what happens inside.

  • benchmarking/examples/benchmark_hypertune/launch_remote.py: Script for launching experiments remotely, in that each experiment runs as its own SageMaker training job, in parallel with other experiments. You need to import syne_tune.experiments.launchers.launch_remote_simulator and pass the same methods, benchmark_definitions, extra_args as in benchmarking.examples.benchmark_hypertune.hpo_main. Moreover, you need to specify paths for source dependencies. If you installed Syne Tune from sources, it is easiest to specify source_dependencies=benchmarking.__path__, as this allows access to all benchmarks and examples included there. On top of that, you can pass an indicator function is_expensive_method to tag the HPO methods which are themselves expensive to run. As detailed below, our script runs different seeds (repetitions) in parallel for expensive methods, but sequentially for cheap ones. We will see shortly how the launcher is called, and what happens inside.

  • benchmarking/examples/benchmark_hypertune/requirements.txt: Dependencies for hpo_main.py to be run remotely as SageMaker training job, in the context of launching experiments remotely. In particular, this needs the dependencies of Syne Tune itself. A safe bet here is syne-tune[extra] and tqdm (which is the default if requirements.txt is missing). However, you can decrease startup time by narrowing down the dependencies you really need (see FAQ). In our example here, we need gpsearchers and kde for methods. For simulated experiments, you always need to have blackbox-repository here. In order to use YAHPO benchmarks, also add yahpo.
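Here is a hypothetical, much abbreviated sketch of what a baselines.py could look like. The factory wrappers ASHA and MOBSTER are assumed to be available in syne_tune.experiments.default_baselines; names and import paths may differ slightly from the actual example.

from syne_tune.experiments.default_baselines import ASHA, MOBSTER


class Methods:
    ASHA = "ASHA"
    MOBSTER_JOINT = "MOBSTER-JOINT"
    MOBSTER_INDEP = "MOBSTER-INDEP"


methods = {
    Methods.ASHA: lambda method_arguments: ASHA(
        method_arguments, type="promotion"
    ),
    Methods.MOBSTER_JOINT: lambda method_arguments: MOBSTER(
        method_arguments, type="promotion"
    ),
    Methods.MOBSTER_INDEP: lambda method_arguments: MOBSTER(
        method_arguments,
        type="promotion",
        search_options=dict(model="gp_independent"),
    ),
}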

Specifying Extra Arguments

In many cases, you will want to run different methods using their default arguments, or only change them as part of the definition in baselines.py. But sometimes, it can be useful to be able to set options via extra command line arguments. This can be done via extra_args and map_method_args, which are typically used in order to be able to configure scheduler arguments for certain methods. But in principle, any argument of MethodArguments can be modified. Here, extra_args is simply extending arguments to the command line parser, where the name field contains the name of the option without any leading "-".

map_method_args has the signature

method_kwargs = map_method_args(args, method, method_kwargs)

Here, method_kwargs are arguments of MethodArguments, which can be modified by map_method_args (the modified dictionary is returned). args is the result of command line parsing, and method is the name of the method to be constructed based on these arguments. The latter argument allows map_method_args to depend on the method. In our example benchmarking/examples/benchmark_hypertune/hpo_main.py, num_brackets applies to all methods, while num_samples only applies to the variants of Hyper-Tune. Both arguments modify the dictionary scheduler_kwargs in MethodArguments, which contains constructor arguments for the scheduler.

Note the use of recursive_merge. This means that the changes done in map_method_args are recursively merged into the prior method_kwargs. In our example, we may already have method_kwargs.scheduler_kwargs or even method_kwargs.scheduler_kwargs.search_options. While the new settings here take precedence, prior content of method_kwargs not affected remains in place. In the same way, extra arguments passed to baseline wrappers in syne_tune.experiments.default_baselines are recursively merged into the arguments determined by the default logic.
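The following is a hypothetical map_method_args along these lines. It assumes that extra_args defined the options num_brackets and num_samples, that recursive_merge can be imported from syne_tune.util, and that the search option name used for Hyper-Tune is as given here; all of these are assumptions rather than verbatim code from the example.

from typing import Any, Dict

from syne_tune.util import recursive_merge


def map_method_args(args, method: str, method_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    scheduler_kwargs = dict()
    if args.num_brackets is not None:
        scheduler_kwargs["brackets"] = args.num_brackets
    if args.num_samples is not None and method.startswith("HYPERTUNE"):
        scheduler_kwargs["search_options"] = dict(
            hypertune_distribution_num_samples=args.num_samples
        )
    if scheduler_kwargs:
        # Recursively merge the new settings into the prior method_kwargs
        method_kwargs = recursive_merge(
            method_kwargs, dict(scheduler_kwargs=scheduler_kwargs)
        )
    return method_kwargs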

Note

map_method_args is applied to rewrite method_kwargs just before the method is created. This means that all entries of MethodArguments can be modified from their default values. You can also use map_method_args independent of extra_args (however, if extra_args is given, then map_method_args must be given as well).

Writing Extra Results

By default, Syne Tune writes result files metadata.json, results.csv.zip, and tuner.dill for every experiment, see here. Here, results.csv.zip contains all data reported by training jobs, along with time stamps. The contents of this dataframe can be customized by adding extra columns to it. This is done by passing extra_results_composer of type ExtraResultsComposer when creating the StoreResultsCallback callback, which is passed in callbacks to Tuner. You can use this mechanism by passing an ExtraResultsComposer object as extra_results to main. This object extracts extra information and returns it as a dictionary, which is appended to the results dataframe. A complete example is benchmarking/examples/benchmark_dyhpo.

Launching Experiments Locally

Here is an example of how simulated experiments are launched locally (if you installed Syne Tune from source, you need to start the script from the benchmarking/examples directory):

python benchmark_hypertune/hpo_main.py \
  --experiment_tag tutorial-simulated --benchmark nas201-cifar100 \
  --method ASHA --num_seeds 10

This call runs a number of experiments sequentially on the local machine:

  • experiment_tag: Results of experiments are written to ~/syne-tune/{experiment_tag}/*/{experiment_tag}-*/. This name should conform to S3 conventions (alphanumerical and -; no underscores).

  • benchmark: Selects benchmark from keys of benchmark_definitions. If this is not given, experiments for all keys in benchmark_definitions are run in sequence.

  • method: Selects HPO method to run from keys of methods. If this is not given, experiments for all keys in methods are run in sequence.

  • num_seeds: Each experiment is run num_seeds times with different seeds (0, ..., num_seeds - 1). Due to random factors both in training and tuning, a robust comparison of HPO methods requires such repetitions. Fortunately, these are cheap to obtain in the simulation context. Another parameter is start_seed (default: 0), giving seeds start_seed, ..., num_seeds - 1. For example, --start_seed 5  --num_seeds 6 runs for a single seed equal to 5. The dependence of random choices on the seed is detailed below.

  • max_wallclock_time, n_workers: These arguments overwrite the defaults specified in the benchmark definitions.

  • max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.

  • scale_max_wallclock_time: If 1, and if n_workers is given as argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where \(A\) is n_workers and \(B\) is benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.

  • use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).

  • restrict_configurations: See below.

  • fcnet_ordinal: Applies to FCNet benchmarks only. The hyperparameter hp_init_lr has domain choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]). Since the parameter is really ordinal, this is not a good choice. With this option, the domain can be switched to different variants of ordinal. The default is nn-log, which is the domain logordinal([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]) (this is also the replacement which streamline_config_space() would do). In order to keep the original categorical domain, use --fcnet_ordinal none.
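For illustration, these are the two extreme choices for the hp_init_lr domain (choice and logordinal are both part of syne_tune.config_space):

from syne_tune.config_space import choice, logordinal

values = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]
hp_init_lr_none = choice(values)       # --fcnet_ordinal none (original categorical domain)
hp_init_lr_nnlog = logordinal(values)  # --fcnet_ordinal nn-log (default)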

If you defined additional arguments via extra_args, you can use them here as well. For example, --num_brackets 3 would run all multi-fidelity methods with 3 brackets (instead of the default 1).

Launching Experiments Remotely

There are some drawbacks of launching experiments locally. First, they block the machine you launch from. Second, different experiments are run sequentially, not in parallel. Remote launching has exactly the same parameters as launching locally, but experiments are sliced along certain axes and run in parallel, using a number of SageMaker training jobs. Here is an example (if you installed Syne Tune from source, you need to start the script from the benchmarking/examples directory):

python benchmark_hypertune/launch_remote.py \
  --experiment_tag tutorial-simulated --benchmark nas201-cifar100 \
  --num_seeds 10

Since --method is not used, we run experiments for all methods. Also, we run experiments for 10 seeds. There are 7 methods, so the total number of experiments is 70 (note that we select a single benchmark here). Running this command will launch 43 SageMaker training jobs, which do the work in parallel. Namely, for the methods ASHA, SYNCHB, BOHB, all 10 seeds are run sequentially in a single SageMaker job, since our is_expensive_method function returns False for them. Simulating experiments is so fast for these methods that it is best to run seeds sequentially. However, for MOBSTER-JOINT, MOBSTER-INDEP, HYPERTUNE-INDEP, HYPERTUNE-JOINT, our is_expensive_method returns True, and we use one SageMaker training job for each seed, giving rise to 4 * 10 = 40 jobs running in parallel. For these methods, the simulation time is quite a bit longer, because decision-making takes more time (these methods fit Gaussian process surrogate models to data and optimize acquisition functions). Results are written to ~/syne-tune/{experiment_tag}/ASHA/ for the cheap method ASHA, and to ~/syne-tune/{experiment_tag}/MOBSTER-INDEP-3/ for the expensive method MOBSTER-INDEP and seed 3.

The command above selected a single benchmark nas201-cifar100. If --benchmark is not given, we iterate over all benchmarks in benchmark_definitions. This is done sequentially, which works fine for a limited number of benchmarks.

However, you may want to run experiments on a large number of benchmarks, and to this end also parallelize along the benchmark axis. To do so, you can pass a nested dictionary as benchmark_definitions. For example, we could use the following:

from syne_tune.experiments.benchmark_definitions import (
    nas201_benchmark_definitions,
    fcnet_benchmark_definitions,
    lcbench_selected_benchmark_definitions,
)

benchmark_definitions = {
    "nas201": nas201_benchmark_definitions,
    "fcnet": fcnet_benchmark_definitions,
    "lcbench": lcbench_selected_benchmark_definitions,
}

In this case, experiments are sliced along the axis ("nas201", "fcnet", "lcbench") to be run in parallel in different SageMaker training jobs.

Dealing with ResourceLimitExceeded Errors

When launching many experiments in parallel, you may run into your AWS resource limits, so that no more SageMaker training jobs can be run. The default behaviour in this case is to wait for 10 minutes and try again. You can influence this by --estimator_fit_backoff_wait_time <wait_time>, where <wait_time> is the waiting time between attempts in seconds. If this is 0 or negative, the script terminates with an error once your resource limits are reached.

Pitfalls of Experiments from Tabulated Blackboxes

Comparing HPO methods on tabulated benchmarks, using simulation, has obvious benefits. Costs are very low. Moreover, results are often obtained many times faster than in real time. However, we recommend not to rely on this kind of benchmarking alone. Here are some pitfalls:

  • Tabulated benchmarks are often of limited complexity, because more complex benchmarks cannot be sampled exhaustively

  • Tabulated benchmarks do not reflect the stochasticity of real benchmarks (e.g., random weight initialization, random ordering of mini-batches)

  • While tabulated benchmarks like nas201 or fcnet are evaluated exhaustively or on a fine grid, other benchmarks (like lcbench) contain observations only at a set of randomly chosen configurations, while their configuration space is much larger or even infinite. For such benchmarks, you can either restrict the scheduler to suggest configurations only from the set supported by the benchmark (see subsection just below), or you can use a surrogate model which interpolates observations from those contained in the benchmark to all others in the configuration space. Unfortunately, the choice of surrogate model can strongly affect the benchmark, for the same underlying data. As a general recommendation, you should be careful with surrogate benchmarks which offer a large configuration space, but are based on only medium amounts of real data.

Restricting Scheduler to Configurations of Tabulated Blackbox

For a tabulated benchmark like lcbench, most entries of the configuration space are not covered by data. For such benchmarks, you can either use a surrogate, which can be configured by the attributes surrogate, surrogate_kwargs, and add_surrogate_kwargs of SurrogateBenchmarkDefinition. Or you can restrict the scheduler to only suggest configurations covered by data. The latter is done by the option --restrict_configurations 1. The advantage of doing so is that your comparison does not depend on the choice of surrogate, but only on the benchmark data itself. However, there are also some drawbacks:

  • This option is currently not supported for the following schedulers:

    • Grid Search

    • SyncBOHB

    • BOHB

    • DEHB

    • REA

    • KDE

    • PopulationBasedTraining

    • ZeroShotTransfer

    • ASHACTS

    • MOASHA

  • Schedulers like Gaussian process based Bayesian optimization typically use local gradient-based optimization of the acquisition function. This is not possible with --restrict_configurations 1. Instead, they evaluate the acquisition function at a finite number num_init_candidates of points and pick the best one

  • In general, you should avoid using surrogate benchmarks which offer a large configuration space, but are based on only medium amounts of real data. When using --restrict_configurations 1 with such a benchmark, your methods may perform better than they should, just because they nearly sample the space exhaustively

In general, --restrict_configurations 1 is supported for schedulers which select the next configuration from a finite set. In contrast, methods like DEHB or BOHB (or Bayesian optimization with local acquisition function optimization) optimize over encoded vectors, then round the solution back to a configuration. In order to use a tabulated benchmark like lcbench with these methods, you need to specify a surrogate. Maybe the least intrusive surrogate is nearest neighbor. Here is the benchmark definition for lcbench:

syne_tune/experiments/benchmark_definitions/lcbench.py
def lcbench_benchmark(dataset_name: str, datasets=None) -> SurrogateBenchmarkDefinition:
    """
    The default is to use nearest neighbour regression with ``K=1``. If
    you use a more sophisticated surrogate, it is recommended to also
    define ``add_surrogate_kwargs``, for example:

    .. code-block:: python

       surrogate="RandomForestRegressor",
       add_surrogate_kwargs={
           "predict_curves": True,
           "fit_differences": ["time"],
       },

    :param dataset_name: Value for ``dataset_name``
    :param datasets: Used for transfer learning
    :return: Definition of benchmark
    """
    return SurrogateBenchmarkDefinition(
        max_wallclock_time=7200,
        n_workers=4,
        elapsed_time_attr="time",
        metric="val_accuracy",
        mode="max",
        blackbox_name="lcbench",
        dataset_name=dataset_name,
        surrogate="KNeighborsRegressor",  # 1-nn surrogate
        surrogate_kwargs={"n_neighbors": 1},
        max_num_evaluations=4000,
        datasets=datasets,
        max_resource_attr="epochs",
    )


The 1-NN surrogate is selected by surrogate="KNeighborsRegressor" and setting the number of nearest neighbors to 1. For each configuration, the surrogate finds the nearest neighbor in the table (w.r.t. Euclidean distance between encoded vectors) and returns its metric values.

Selecting Benchmarks from benchmark_definitions

Each family of tabulated (or surrogate) blackboxes accessible to the benchmarking tooling discussed here is represented by a Python file in syne_tune.experiments.benchmark_definitions (the same directory also contains definitions for real benchmarks).

Typically, a blackbox concerns a certain machine learning algorithm with a fixed configuration space. Many of them have been evaluated over a number of different datasets. Note that in YAHPO, a blackbox is called scenario, and a dataset is called instance, so that a scenario can have a certain number of instances. In our terminology, a tabulated benchmark is obtained by selecting a blackbox together with a dataset.

The files in syne_tune.experiments.benchmark_definitions typically contain:

  • Functions named *_benchmark, which map arguments (such as dataset_name) to a benchmark definition of type SurrogateBenchmarkDefinition, where * is the name of the blackbox (or scenario).

  • Dictionaries named *_benchmark_definitions with SurrogateBenchmarkDefinition values. If a blackbox has a lot of datasets, we also define a dictionary *_selected_benchmark_definitions, which selects benchmarks that are interesting (e.g., ones where not all baselines rapidly reach the same performance). In general, we recommend starting with these selected benchmarks.

The YAHPO Family

A rich source of blackbox surrogates in Syne Tune comes from YAHPO, which is also detailed in this paper. YAHPO contains a number of blackboxes (called scenarios), some of which come with a large number of datasets (called instances). All our definitions are in syne_tune.experiments.benchmark_definitions.yahpo. Further details can also be found in the import code syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import. Here is an overview:

  • yahpo_nb301: NASBench301. Single scenario and instance.

  • yahpo_lcbench: LCBench. Same underlying data as our own LCBench, but different surrogate model.

  • yahpo_iaml: Family of blackboxes, parameterized by ML method (yahpo_iaml_methods) and target metric (yahpo_iaml_metrics). Each of these has 4 datasets (OpenML datasets).

  • yahpo_rbv2: Family of blackboxes, parameterized by ML method (yahpo_rbv2_methods) and target metric (yahpo_rbv2_metrics). Each of these comes with a large number of datasets (OpenML datasets). Note that compared to YAHPO Gym, we filtered out scenarios which are invalid (e.g., F1 score 0, AUC/F1 equal to 1). We also determined useful max_wallclock_time values (yahpo_rbv2_max_wallclock_time) and selected benchmarks which show interesting behaviour (yahpo_rbv2_selected_instances).

Note

At present (YAHPO Gym v1.0), the yahpo_lcbench surrogate has been trained on invalid original LCBench data (the values for the first and last fidelity should have been removed). As long as this is not fixed, we recommend using our built-in lcbench blackbox instead.

Note

In YAHPO Gym, yahpo_iaml and yahpo_rbv2 have a fidelity attribute trainsize with values between 1/20 and 1, which is the fraction of the full dataset the method has been trained on. Our import script multiplies trainsize values by 20 and designates the type randint(1, 20), since common Syne Tune multi-fidelity schedulers require resource_attr values to be positive integers. yahpo_rbv2 has a second fidelity attribute repl, whose value is constant (10); it is removed by our import script.

Benchmarking with Local Backend

A real benchmark (as opposed to a benchmark based on tabulated data or a surrogate model) is based on a training script, which is executed for each evaluation. The local backend is the default choice in Syne Tune for running on real benchmarks.

Note

While Syne Tune contains benchmark definitions for all surrogate benchmarks in syne_tune.experiments.benchmark_definitions, examples for real benchmarks are only available when Syne Tune is installed from source. They are located in benchmarking.

Defining the Experiment

As usual in Syne Tune, the experiment is defined by a number of scripts. We will look at an example in benchmarking/examples/launch_local/. Common code used in these benchmarks can be found in syne_tune.experiments:

Let us look at the scripts in order, and how you can adapt them to your needs:

Extra arguments can be specified via extra_args and map_method_args, and extra results can be written using extra_results, as explained here.

Launching Experiments Locally

Here is an example of how experiments with the local backend are launched locally:

python benchmarking/examples/launch_local/hpo_main.py \
  --experiment_tag tutorial-local --benchmark resnet_cifar10 \
  --method ASHA --num_seeds 1 --n_workers 1

This call runs a single experiment on the local machine (which needs to have a GPU, with PyTorch installed):

  • experiment_tag: Results of experiments are written to ~/syne-tune/{experiment_tag}/*/{experiment_tag}-*/. This name should conform to S3 conventions (alphanumeric characters and -; no underscores).

  • benchmark: Selects benchmark from keys of real_benchmark_definitions(). The default is resnet_cifar10.

  • method: Selects HPO method to run from keys of methods. If this is not given, experiments for all keys in methods are run in sequence.

  • num_seeds: Each experiment is run num_seeds times with different seeds (0, ..., num_seeds - 1). Due to random factors both in training and tuning, a robust comparison of HPO methods requires such repetitions. Another parameter is start_seed (default: 0), giving seeds start_seed, ..., num_seeds - 1. For example, --start_seed 5 --num_seeds 6 runs for a single seed equal to 5.

  • n_workers, max_wallclock_time: You can overwrite the default values for the selected benchmark by these command line arguments.

  • max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.

  • num_gpus_per_trial: If you run on an instance with more than one GPU, you can prescribe how many GPUs should be allocated to each trial. The default is 1. Note that if the product of n_workers and num_gpus_per_trial is larger than the number of GPUs on the instance, trials will be delayed.

  • gpus_to_use: Allows restricting the GPUs used by Syne Tune. For example, if your instance has 8 GPUs, but you only want to use the last four of them, use gpus_to_use=[4, 5, 6, 7].

  • delete_checkpoints: If 1, checkpoints of trials are removed as soon as they are not needed anymore. The default is 0, in which case all checkpoints are retained.

  • scale_max_wallclock_time: If 1, and if n_workers is given as an argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where \(A\) is n_workers and \(B\) is benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.

  • use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).

If you defined additional arguments via extra_args, you can use them here as well.
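For example, the following call combines several of the options listed above (the argument values are chosen only for illustration):

python benchmarking/examples/launch_local/hpo_main.py \
  --experiment_tag tutorial-local --benchmark resnet_cifar10 \
  --method ASHA --num_seeds 2 --n_workers 2 \
  --num_gpus_per_trial 2 --delete_checkpoints 1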

Note

When launching an experiment locally, you need to be on an instance which supports the required computations (e.g., has one or more GPUs), and you need to have installed all required dependencies, including those of the SageMaker framework. In the example above, resnet_cifar10 uses the PyTorch framework, and n_workers=4 by default, which we overwrite with n_workers=1: you need to launch on a machine with at least 1 GPU, with PyTorch installed and properly set up to run GPU computations. If you would rather not deal with all of this, please consider remote launching as an alternative. On the other hand, you can launch experiments locally without using SageMaker (or AWS) at all.

Benchmark Definitions

In the example above, we select a benchmark via --benchmark resnet_cifar10. All currently included real benchmarks are collected in real_benchmark_definitions(), a function which returns the dictionary of real benchmarks, configured by some extra arguments. If you are happy with selecting one of these existing benchmarks, you may safely skip this subsection.

For resnet_cifar10, this selects resnet_cifar10_benchmark(), which returns meta-data for the benchmark as a RealBenchmarkDefinition object. Here, the argument sagemaker_backend is False in our case, since we use the local backend, and additional **kwargs override arguments of RealBenchmarkDefinition. Important arguments are:

  • script: Absolute filename of the training script. If your script requires additional dependencies on top of the SageMaker framework, you need to specify them in requirements.txt in the same directory.

  • config_space: Configuration space; this must include max_resource_attr.

  • metric, mode, max_resource_attr, resource_attr: Names related to the benchmark, either of metrics reported (output) or of config_space entries (input).

  • max_wallclock_time, n_workers, max_num_evaluations: Defaults for tuner or stopping criterion, suggested for this benchmark.

  • instance_type: Suggested AWS instance type for this benchmark.

  • framework, estimator_kwargs: SageMaker framework and additional arguments to SageMaker estimator.

Note that parameters like n_workers, max_wallclock_time, or instance_type are given default values here, which can be overwritten by command line arguments. This is why the function signature ends with **kwargs, and we execute _kwargs.update(kwargs) just before creating the RealBenchmarkDefinition object.
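To make this pattern concrete, here is a minimal sketch of such a benchmark definition function. All names (my_benchmark, my_training_script.py), the hyperparameter ranges, the framework versions, and the import path of RealBenchmarkDefinition are assumptions for illustration; the fields mirror the arguments described above.

from pathlib import Path

from syne_tune.config_space import loguniform, randint
from syne_tune.experiments.benchmark_definitions.common import RealBenchmarkDefinition


def my_benchmark(sagemaker_backend: bool = False, **kwargs) -> RealBenchmarkDefinition:
    # Hypothetical configuration space; "epochs" is referenced by ``max_resource_attr``
    config_space = {
        "epochs": 27,
        "lr": loguniform(1e-4, 1e-1),
        "batch_size": randint(16, 256),
    }
    _kwargs = dict(
        script=Path(__file__).parent / "my_training_script.py",
        config_space=config_space,
        metric="val_accuracy",
        mode="max",
        max_resource_attr="epochs",
        resource_attr="epoch",
        max_wallclock_time=3600,
        n_workers=4,
        # SageMaker backend: one worker per instance, so 1 GPU is enough;
        # local backend: the single instance needs at least ``n_workers`` GPUs
        instance_type="ml.g4dn.xlarge" if sagemaker_backend else "ml.g4dn.12xlarge",
        framework="PyTorch",
        estimator_kwargs=dict(framework_version="1.13.1", py_version="py39"),
    )
    _kwargs.update(kwargs)  # command line arguments overwrite these defaults
    return RealBenchmarkDefinition(**_kwargs)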

Launching Experiments Remotely

Remote launching is particularly convenient for experiments with the local backend, even if you just want to run a single experiment. For local launching, you need to be on an EC2 instance of the desired instance type, and Syne Tune has to be installed there along with all dependencies of your benchmark. None of this needs to be done for remote launching. Here is an example:

python benchmarking/examples/launch_local/launch_remote.py \
  --experiment_tag tutorial-local --benchmark resnet_cifar10 \
  --num_seeds 5

Since --method is not used, we run experiments for all methods (RS, BO, ASHA, MOBSTER), and for 5 seeds. These are 20 experiments, which are mapped to 20 SageMaker training jobs. These will run on instances of type ml.g4dn.12xlarge, which is the default for resnet_cifar10 and the local backend. Instances of this type have 4 GPUs, so we can use n_workers up to 4 (the default being 4). Results are written to S3, using paths such as syne-tune/{experiment_tag}/ASHA-3/ for method ASHA and seed 3.

Finally, some readers may be puzzled why Syne Tune dependencies are defined in benchmarking/examples/launch_local/requirements-synetune.txt, and not in requirements.txt instead. The reason is that the dependencies of the SageMaker estimator running the experiment locally are really the union of two such files: first, requirements-synetune.txt for the Syne Tune dependencies, and second, requirements.txt next to the training script. The remote launching script creates a requirements.txt file with this union in benchmarking/examples/launch_local/, which should not become part of the repository.

Visualizing Tuning Metrics in the SageMaker Training Job Console

When experiments are launched remotely with the local or SageMaker backend, a number of metrics are published to the SageMaker training job console (this feature can be switched off with --remote_tuning_metrics 0):

  • BEST_METRIC_VALUE: Best metric value attained so far

  • BEST_TRIAL_ID: ID of trial for best metric value so far

  • BEST_RESOURCE_VALUE: Resource value for best metric value so far

  • BEST_HP_PREFIX, followed by hyperparameter name: Hyperparameter value for best metric value so far

You can inspect these metrics in real time in AWS CloudWatch. To do so:

  • Locate the training job running your experiment in the AWS SageMaker console. Click on Training, then Training jobs, then on the job in the list. For the command above, the jobs are named like tutorial-local-RS-0-XyK8 (experiment tag, then method, then seed, then 4-character hash).

  • Under Metrics, you will see a number of entries, starting with best_metric_value and best_trial_id.

  • Further below, under Monitor, click on View algorithm metrics. This opens a CloudWatch dashboard.

  • At this point, you need to change a few defaults: CloudWatch only samples metrics (by grepping the logs) every 5 minutes and then displays average values over the 5-minute window. Click on Browse and select the metrics you want to display. For now, select best_metric_value, best_trial_id, best_resource_value.

  • Click on Graphed metrics, and for every metric, select Period -> 30 seconds. Also, select Statistics -> Maximum for metrics best_trial_id, best_resource_value. For best_metric_value, select Statistics -> Minimum if your objective metric is minimized (mode="min"), and Statistics -> Maximum otherwise. In our resnet_cifar10 example, the objective is accuracy, to be maximized, so we select the latter.

  • Finally, select 10s for auto-refresh (the circle with arrow in the upper right corner), and change the temporal resolution by displaying 1h (top row).

This visualization shows you the best metric value attained so far, and which trial attained it for which resource value (e.g., number of epochs). It can be improved. For example, we could plot the curves in different axes. Also, we can visualize the best hyperparameter configuration found so far. In the resnet_cifar10 example, this is given by the metrics best_hp_lr, best_hp_batch_size, best_hp_weight_decay, best_hp_momentum.

Random Seeds and Paired Comparisons

Random effects are the most important source of variation in experimental outcomes, which is why a meaningful comparison of HPO methods requires a number of repetitions (also called seeds above). There are two types of random effects:

  • Randomness in the evaluation of the objective \(f(x)\) to optimize: repeated evaluations of \(f\) for the same configuration \(x\) result in different metric values. In neural network training, these variations originate from random weight initialization and the ordering of mini-batches.

  • Randomness in the HPO algorithm itself. This is evident for random search and ASHA, but just as well concerns Bayesian optimization, since the initial configurations are drawn at random, and the optimization of the acquisition function involves random choices as well.

Syne Tune allows the second source of randomness to be controlled by passing a random seed to the scheduler at initialization. If random search is run several times with the same random seed for the same configuration space, exactly the same sequence of configurations is suggested. The same holds for ASHA. When running random search and Bayesian optimization with the same random seed, the initial configurations (which in BO are either taken from points_to_evaluate or drawn at random) are identical.

The scheduler random seed used in a benchmark experiment is a combination of a master random seed and the seed number introduced above (the latter has values \(0, 1, 2, \dots\)). The master random seed is passed to launch_remote.py or hpo_main.py as --random_seed. If no master random seed is passed, it is drawn at random and output. The master random seed is also written into metadata.json as part of experimental results. Importantly, the scheduler random seed is the same across different methods for the same seed. This implements a practice called paired comparison, whereby for each seed, different methods are fed with the same random number sequence. This practice reduces variance between method outcomes, while still taking account of randomness by running the experiment several times (for different seeds \(0, 1, 2, \dots\)).

Note

When comparing several methods on the same benchmark, it is recommended to (a) repeat the experiment several times (via --num_seeds), and to (b) use the same master random seed. If all comparisons are done with a single call of launch_remote.py or hpo_main.py, this is automatically the case, as the master random seed is drawn at random. However, if the comparison extends over several calls, make sure to note down the master random seed from the first call and pass this value via --random_seed to subsequent calls. The master random seed is also stored as random_seed in the metadata metadata.json as part of experimental results.
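For example (the seed value in the second call stands for whatever value the first call printed; it is only a placeholder here):

python benchmarking/examples/launch_local/launch_remote.py \
  --experiment_tag tutorial-local --benchmark resnet_cifar10 \
  --method ASHA --num_seeds 5
python benchmarking/examples/launch_local/launch_remote.py \
  --experiment_tag tutorial-local --benchmark resnet_cifar10 \
  --method MOBSTER --num_seeds 5 --random_seed 2965402734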

Benchmarking with SageMaker Backend

The SageMaker backend allows you to run distributed tuning across several instances, where the number of parallel evaluations is not limited by the configuration of an instance, but only by your compute budget.

Defining the Experiment

The scripts required to define an experiment are pretty much the same as in the local backend case. We will look at an example in benchmarking/examples/launch_sagemaker/. Common code used in these benchmarks can be found in syne_tune.experiments:

The scripts benchmarking/examples/launch_sagemaker/baselines.py, benchmarking/examples/launch_sagemaker/hpo_main.py, and benchmarking/examples/launch_sagemaker/launch_remote.py are identical in structure to the local backend case, the only difference being that they import from syne_tune.experiments.launchers.hpo_main_sagemaker and syne_tune.experiments.launchers.launch_remote_sagemaker, respectively. Moreover, Syne Tune dependencies need to be specified in benchmarking/examples/launch_sagemaker/requirements.txt.

In terms of benchmarks, the same definitions can be used for the SageMaker backend, in particular you can select from real_benchmark_definitions(). However, the functions there are called with sagemaker_backend=True, which can lead to different values in RealBenchmarkDefinition. For example, resnet_cifar10_benchmark() returns instance_type=ml.g4dn.xlarge for the SageMaker backend (1 GPU per instance), but instance_type=ml.g4dn.12xlarge for the local backend (4 GPUs per instance). This is because for the local backend to support n_workers=4, the instance needs to have at least 4 GPUs, but for the SageMaker backend, each worker uses its own instance, so a cheaper instance type can be used.
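As a quick illustration, the two defaults can be compared directly. This is a sketch; the exact import path of resnet_cifar10_benchmark is an assumption based on the module names used in this tutorial.

from benchmarking.benchmark_definitions.resnet_cifar10 import resnet_cifar10_benchmark

# Instance type defaults differ between the two backends
print(resnet_cifar10_benchmark(sagemaker_backend=True).instance_type)   # ml.g4dn.xlarge
print(resnet_cifar10_benchmark(sagemaker_backend=False).instance_type)  # ml.g4dn.12xlarge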

Extra arguments can be specified via extra_args and map_method_args, and extra results can be written using extra_results, as explained here.

Launching Experiments Locally

Here is an example of how experiments with the SageMaker backend are launched locally:

python benchmarking/examples/launch_sagemaker/hpo_main.py \
  --experiment_tag tutorial-sagemaker --benchmark resnet_cifar10 \
  --method ASHA --num_seeds 1

This call launches a single experiment on the local machine (however, each trial launches the training script as a SageMaker training job, using the instance type suggested for the benchmark). The command line arguments are the same as in the local backend case. Additional arguments are:

  • n_workers, max_wallclock_time: Overwrite the default values for the selected benchmark.

  • max_failures: Number of trials which can fail without terminating the entire experiment.

  • warm_pool: This flag is discussed below.

  • max_size_data_for_model: Parameter for Bayesian optimization, MOBSTER or Hyper-Tune, see here and here.

  • scale_max_wallclock_time: If 1, and if n_workers is given as an argument, but not max_wallclock_time, the benchmark default benchmark.max_wallclock_time is multiplied by \(B / \min(A, B)\), where \(A\) is n_workers and \(B\) is benchmark.n_workers. This means we run for longer if n_workers < benchmark.n_workers, but keep benchmark.max_wallclock_time the same otherwise.

  • use_long_tuner_name_prefix: If 1, results for an experiment are written to a directory whose prefix is f"{experiment_tag}-{benchmark_name}-{seed}", followed by a postfix containing date-time and a 3-digit hash. If 0, the prefix is experiment_tag only. The default is 1 (long prefix).

If you defined additional arguments via extra_args, you can use them here as well.

Launching Experiments Remotely

SageMaker backend experiments can also be launched remotely, in which case each experiment runs in a SageMaker training job, using a cheap instance type, within which trials are executed as SageMaker training jobs as well. The usage is the same as in the local backend case.

When experiments are launched remotely with the SageMaker backend, a number of metrics are published to the SageMaker training job console (this feature can be switched off with --remote_tuning_metrics 0). This is detailed here.

Using SageMaker Managed Warm Pools

The SageMaker backend supports SageMaker managed warm pools, a recently launched feature of SageMaker. In a nutshell, this feature allows customers to circumvent start-up delays for SageMaker training jobs which share a similar configuration (e.g., framework) with earlier jobs which have already terminated. For Syne Tune with the SageMaker backend, this translates to experiments running faster or, for a fixed max_wallclock_time, running more trials. Warm pools are used if the command line argument --warm_pool 1 is used with hpo_main.py. For the example above:

python benchmarking/examples/launch_sagemaker/hpo_main.py \
  --experiment_tag tutorial-sagemaker --benchmark resnet_cifar10 \
  --method ASHA --num_seeds 1 --warm_pool 1

The warm pool feature is most useful with multi-fidelity HPO methods (such as ASHA and MOBSTER in our example). Some points you should be aware of:

  • When using SageMaker managed warm pools with the SageMaker backend, it is important to use start_jobs_without_delay=False when creating the Tuner.

  • Warm pools are a billable resource, and you may incur extra costs arising from the fact that up to n_workers instances are kept running for about 10 minutes at the end of your experiment. You have to request warm pool quota increases for instance types you would like to use. For our example, you need to have quotas for (at least) four ml.g4dn.xlarge instances, both for training and warm pool usage.

  • As a sanity check, you can watch the training jobs in the console. You should see InUse and Reused in the Warm pool status column. Running the example above, the first 4 jobs should complete in about 7 to 8 minutes, while all subsequent jobs should take only 2 to 3 minutes.

Visualization of Results

As we have seen, Syne Tune is a powerful tool for running a large number of experiments in parallel, which can be used to compare different tuning algorithms, or to split a difficult tuning problem into smaller pieces, which can be worked on in parallel. In this section, we show how results of all experiments of such a comparative study can be visualized, using plotting facilities provided in Syne Tune.

Note

This section offers an example of the plotting facilities in Syne Tune. A more comprehensive tutorial is here.

A Comparative Study

For the purpose of this tutorial, we ran the setup of benchmarking/examples/benchmark_hypertune/, using 15 random repetitions (or seeds). This is the command:

python benchmarking/examples/benchmark_hypertune/launch_remote.py \
  --experiment_tag docs-1 --random_seed 2965402734 --num_seeds 15

Note that we fix the seed here in order to obtain repeatable results. Recall from here that we compare 7 methods on 12 surrogate benchmarks:

  • Since 4 of the 7 methods are “expensive”, the above command launches 3 + 4 * 15 = 63 remote tuning jobs in parallel. Each of these jobs runs experiments for one method and all 12 benchmarks. For the “expensive” methods, each job runs a single seed, while for the remaining methods (ASHA, SYNCHB, BOHB), all seeds are run sequentially in a single job, so that a job for a “cheap” method runs 12 * 15 = 180 experiments sequentially.

  • The total number of experiment runs is 7 * 12 * 15 = 1260

  • Results of these experiments are stored to S3, using paths such as <s3-root>/syne-tune/docs-1/ASHA/docs-1-<datetime>/ for ASHA (all seeds), or <s3-root>/syne-tune/docs-1/HYPERTUNE-INDEP-5/docs-1-<datetime>/ for seed 5 of HYPERTUNE-INDEP. Result files are metadata.json, results.csv.zip, and tuner.dill. The former two are required for plotting results.

Once all of this has finished, we are left with 3780 result files on S3. We will now show how these can be downloaded, processed, and visualized.

Visualization of Results

First, we need to download the results from S3 to the local disk. This can be done by a command which is also printed at the end of launch_remote.py:

aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-1/ ~/syne-tune/docs-1/ \
  --exclude "*" --include "*metadata.json" --include "*results.csv.zip"

This command can also be run from inside the plotting code. Note that the tuner.dill result files are not downloaded, since they are not needed for result visualization.

Here is the code for generating result plots for two of the benchmarks:

benchmarking/examples/benchmark_hypertune/plot_results.py
from typing import Dict, Any, Optional
import logging

from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    # The setup is the algorithm. No filtering
    return metadata["algorithm"]


SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB")


def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
    return int(metadata["algorithm"] in SETUPS_RIGHT)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_name = "docs-1"
    experiment_names = (experiment_name,)
    setups = list(methods.keys())
    num_runs = 15
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # We would like two subplots (1 row, 2 columns), with MOBSTER and HYPERTUNE
    # results on the left, and the remaining baselines on the right. Each
    # column gets its own title, and legends are shown in both
    plot_params.subplots = SubplotParameters(
        nrows=1,
        ncols=2,
        kwargs=dict(sharey="all"),
        titles=["Model-based Methods", "Baselines"],
        legend_no=[0, 1],
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=setups,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        metadata_to_subplot=metadata_to_subplot,
        download_from_s3=download_from_s3,
    )
    # We can now create plots for the different benchmarks
    # First: nas201-cifar100
    benchmark_name = "nas201-cifar100"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.265, 0.31),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
    )
    # Next: nas201-ImageNet16-120
    benchmark_name = "nas201-ImageNet16-120"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.535, 0.58),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
    )

The figure for benchmark nas201-cifar-100 looks as follows:

Results for NASBench-201 (CIFAR-100)

  • There are two subfigures next to each other. Each contains a number of curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.

  • More generally, the data from our 1260 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.

  • The function metadata_to_setup maps the metadata stored for an experiment to the setup name, or to None if this experiment should be filtered out. In our basic case, the setup is simply the name of the tuning algorithm. Our benchmarking framework stores a host of information as metadata; the most useful keys for grouping are:

    • algorithm: Name of method (ASHA, MOBSTER-INDEP, … in our example)

    • tag: Experiment tag. This is docs-1 in our example. Becomes useful when we merge data from different studies in a single figure

    • benchmark: Benchmark name (nas201-cifar-100, … in our example)

    • n_workers: Number of workers

    Other keys may be specific to algorithm.

  • Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 15 experiments, one for each seed. Each seed gives rise to a best metric value curve. A metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". For example, if your metric is accuracy in percent (from 0 to 100), then mode="max" and metric_multiplier=0.01, and the curve shows error in [0, 1]. However, if convert_to_min == False, metric_val is always converted as metric_multiplier * metric_val, so that larger is better if mode == "max".

  • These 15 curves are now interpolated to a common grid, and at each grid point, the 15 values (one for each seed) are aggregated into 3 values lower, aggregate, upper. In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode):

  • Plotting starts with the creation of a ComparativeResults object. We need to pass the experiment names (or tags), the list of all setups, the number of runs (or seeds), the metadata_to_setup function, as well as default plot parameters in plot_params. See PlotParameters for full details about the latter. In our example, we set xlabel, aggregate_mode (see above), and enable a grid with grid=True. Note that these parameters can be extended and overwritten by parameters for each plot.

  • In our example, we separate the MOBSTER and HYPERTUNE setups from the baselines, by using two subfigures. This is done by specifying plot_params.subplots and metadata_to_subplot. In the former, plot_params.subplots.nrows and plot_params.subplots.ncols are mandatory, providing the shape of the subplot arrangement. In plot_params.subplots.titles, we can provide titles for each column (which we do here). If given, this overrides plot_params.title. Also, plot_params.subplots.legend_no=[0, 1] asks for legends in both subplots (the default is no legend at all). For full details about these arguments, see SubplotParameters

  • The creation of results does a number of things. First, if download_from_s3=True, result files are downloaded from S3. In our example, we assume this has already been done. Next, all result files are iterated over, all metadata.json are read, and an inverse index from benchmark name to paths, setup_name, and subplot_no is created. This process also checks that exactly num_runs experiments are present for every setup. For large studies, it frequently happens that too few or too many results are found. The warning outputs can be used for debugging.

  • Given results, we can create plots for every benchmark. In our example, this is done for nas201-cifar100 and nas201-ImageNet16-120, by calling results.plot(). Apart from the benchmark name, we also pass plot parameters in plot_params, which extend (and overwrite) those passed at construction. In particular, we need to pass metric and mode, which we can obtain from the benchmark description. Moreover, ylim is a sensible range for the vertical axis, which is different for every benchmark (this is optional).

  • If we pass file_name as argument to results.plot, the figure is stored in this file.

Note

Apart from plots comparing different setups, aggregated over multiple seeds, we can also visualize the learning curves per trial for a single experiment. Details are given in this tutorial.

Contributing Your Benchmark

In order to increase its scope and usefulness, Syne Tune greatly welcomes the contribution of new benchmarks, in particular in areas not yet well covered. In a nutshell, contributing a benchmark is pretty similar to a code contribution, but in this section, we provide some extra hints.

Contributing a Real Benchmark

In principle, a real benchmark consists of a Python script which runs evaluations, adhering to the conventions of Syne Tune. However, in order for your benchmark to be useful for the community, here are some extra requirements:

  • The benchmark should not be excessively expensive to run

  • If your benchmark involves training a machine learning model, the code should work with the dependencies of a SageMaker framework. You can specify extra dependencies, but they should be small. While Syne Tune (and SageMaker) supports Docker containers, Syne Tune is not hosting them. At present, we also do not accept Dockerfile script contributions, since we cannot maintain them.

  • If your benchmark depends on data files, these must be hosted for public read access somewhere. Syne Tune cannot host data files, and will reject contributions with large files. If downloading and preprocessing the data for your benchmark takes too long, you may contribute an import script of a similar type to what is done in our syne_tune.blackbox_repository.

Let us have a look at the resnet_cifar10 benchmark as example of what needs to be done:

  • resnet_cifar10.py: The training script for your benchmark should be in a subdirectory of benchmarking/training_scripts/. The same directory can contain a file requirements.txt with dependencies beyond the SageMaker framework you specify for your code. You are invited to study the code resnet_cifar10.py in detail. Important points are:

    • Your script needs to report relevant metrics back to Syne Tune at the end of each epoch (or only once, at the end, if your script does not support multi-fidelity tuning), using an instance of Reporter (a minimal sketch is given after this list).

    • We strongly recommend that your script support checkpointing, and the resnet_cifar10 script is a good example of how to do this with PyTorch training scripts. If checkpointing is not supported, all pause-and-resume schedulers will run substantially slower than they have to, because every resume operation has to train the model from scratch.

  • benchmarking.benchmark_definitions.resnet_cifar10: You need to define some meta-data for your benchmark in benchmarking.benchmark_definitions. This should be a function returning a RealBenchmarkDefinition object. Arguments should be a flag sagemaker_backend (True for SageMaker backend experiment, False otherwise), and **kwargs overwriting values in RealBenchmarkDefinition. Hints:

    • framework should be one of the SageMaker frameworks. You should also specify framework_version and py_version in the estimator_kwargs dict.

    • config_space is the configuration space for your benchmark. Please make sure to choose hyperparameter domains wisely.

    • instance_type, n_workers: You need to specify a default instance type and number of workers for experiments running your benchmark. If in doubt, choose instances with the lowest costs. Currently, most of our GPU benchmarks use ml.g4dn.xlarge, and CPU benchmarks use ml.c5.4xlarge. Note that for experiments with the local backend (sagemaker_backend=False), the instance type must offer at least n_workers GPUs or CPU cores. For example, ml.g4dn.xlarge only has 1 GPU, while ml.g4dn.12xlarge provides for n_workers=4.

    • max_wallclock_time is a default value for the length of experiments running your benchmark, a value which depends on instance_type and n_workers.

    • metric, mode, max_resource_attr, resource_attr are required parameters for your benchmark, which are arguments to schedulers.
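To illustrate the reporting requirement mentioned in the list above, here is a minimal sketch of a training script skeleton. The hyperparameter names and the dummy "training" loop are placeholders; only the use of Reporter follows the Syne Tune convention.

import argparse

from syne_tune import Reporter

if __name__ == "__main__":
    # Hyperparameters and the resource attribute arrive as command line arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--lr", type=float, required=True)
    args = parser.parse_args()

    report = Reporter()
    for epoch in range(1, args.epochs + 1):
        # ... train for one epoch and evaluate (placeholder computation) ...
        val_accuracy = 1.0 - 1.0 / (epoch * args.lr + 1.0)  # dummy metric
        # Report metrics back to Syne Tune at the end of each epoch
        report(epoch=epoch, val_accuracy=val_accuracy)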

Note

If you simply would like to run experiments with your own training code, it is not necessary to use the benchmarking module at all. It just makes comparisons to other built-in benchmarks easier. See this tutorial for more details.

Role of benchmarking/nursery/

The best place to contribute a new benchmark, along with launcher scripts, is to create a new package in benchmarking.nursery. This package contains:

  • Training script and meta-data definition, as detailed above

  • Launcher scripts, as detailed in the remainder of this tutorial

  • Optionally, some scripts to visualize results

You are encouraged to run some experiments with your benchmark, involving a number of baseline HPO methods, and submit results along with your pull request.

Once your benchmark is in there, it may be used by the community. If others find it useful, it can be graduated into benchmarking.benchmark_definitions, benchmarking.training_scripts, and benchmarking.examples.

We are looking forward to your pull request.

Contributing a Tabulated Benchmark

Syne Tune contains a blackbox repository syne_tune.blackbox_repository for maintaining and serving tabulated and surrogate benchmarks, as well as a simulator backend (syne_tune.backend.simulator_backend), which simulates training evaluations from a blackbox. The simulator backend can be used with any Syne Tune scheduler, and experiment runs are very close to what would be obtained by running training for real. Since time is simulated as well, not only are experiments very cheap to run (on basic CPU hardware), they also finish many times faster than real time. An overview is given here.

If you have the data for a tabulated benchmark, we strongly encourage you to contribute an import script to Syne Tune. Examples for such scripts are syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import, syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import, syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import, syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import, syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench. See also FAQ.

Visualization of Results

Finding the best model to deploy for a task at hand is a semi-automated process. The data scientist runs a set of experiments in parallel and visualizes comparative results, based on which the next set of experiments is planned. Syne Tune not only allows you to run many experiments in parallel, but also provides tooling to rapidly create customized visualizations, in order to gain insights for the next steps, or to present final results to clients. This tutorial provides an overview of the visualization facilities.

Note

In order to run the code in this tutorial, you need to have installed Syne Tune from source. Also, make sure to have installed the blackbox-repository dependencies.

Visualizing Results of a Single Experiment

In this section, we describe the setup to be used for this tutorial. Then, we show how the results of a single experiment can be visualized.

Note

This tutorial shares some content with this one, but is more comprehensive in terms of features.

Note

In this tutorial, we will use a surrogate benchmark in order to obtain realistic results with little computation. To this end, you need to have the blackbox-repository dependencies installed, as detailed here. Note that the first time you use a surrogate benchmark, its data files are downloaded and stored to your S3 bucket; this can take a considerable amount of time. The next time you use the benchmark, it is loaded from your local disk or your S3 bucket, which is fast.

A Comparative Study

For the purpose of this tutorial, we ran the setup of benchmarking/examples/benchmark_hypertune/, using 15 random repetitions (or seeds). This is the command:

python benchmarking/examples/benchmark_hypertune/launch_remote.py \
  --experiment_tag docs-1 --random_seed 2965402734 --num_seeds 15

Note that we fix the seed here in order to obtain repeatable results. Recall from here that we compare 7 methods on 12 surrogate benchmarks:

  • Since 4 of the 7 methods are “expensive”, the above command launches 3 + 4 * 15 = 63 remote tuning jobs in parallel. Each of these jobs runs experiments for one method and all 12 benchmarks. For the “expensive” methods, each job runs a single seed, while for the remaining methods (ASHA, SYNCHB, BOHB), all seeds are run sequentially in a single job, so that a job for a “cheap” method runs 12 * 15 = 180 experiments sequentially.

  • The total number of experiment runs is 7 * 12 * 15 = 1260

  • Results of these experiments are stored to S3, using paths such as <s3-root>/syne-tune/docs-1/ASHA/docs-1-<datetime>/ for ASHA (all seeds), or <s3-root>/syne-tune/docs-1/HYPERTUNE-INDEP-5/docs-1-<datetime>/ for seed 5 of HYPERTUNE-INDEP. Result files are metadata.json, results.csv.zip, and tuner.dill. The former two are required for plotting results.

Once all of this has finished, we are left with 3780 result files on S3. First, we need to download the results from S3 to the local disk. This can be done by a command which is also printed at the end of launch_remote.py:

aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-1/ ~/syne-tune/docs-1/ \
  --exclude "*" --include "*metadata.json" --include "*results.csv.zip"

This command can also be run from inside the plotting code. Note that the tuner.dill result files are not downloaded, since they are not needed for result visualization.

Visualization of a Single Experiment

For a single experiment, we can directly plot the best metric value obtained as a function of wall-clock time. This can be done directly following the experiment, as shown in this example. In our setup, experiments have been launched remotely, so in order to plot results for a single experiment, we need to know the full tuner name. Say, we would like to plot results of MOBSTER-JOINT, seed=0. The names of single experiments are obtained by:

ls ~/syne-tune/docs-1/MOBSTER-JOINT-0/

There is one experiment per benchmark, starting with docs-1-nas201-ImageNet16-120-0, docs-1-nas201-cifar100-0, docs-1-nas201-cifar10-0, followed by date-time strings. Once the tuner name is known, the following script plots the desired curve and also displays the best configuration found:

code/plot_single_experiment_results.py
from syne_tune.experiments import load_experiment


if __name__ == "__main__":
    # Replace with name for your experiment:
    # Run:
    #    ls ~/syne-tune/docs-1/MOBSTER-JOINT-0/
    tuner_name = (
        "docs-1/MOBSTER-JOINT-0/docs-1-nas201-cifar10-0-2023-04-15-11-35-31-201"
    )

    tuning_experiment = load_experiment(tuner_name)
    print(tuning_experiment)

    print(f"best result found: {tuning_experiment.best_config()}")

    tuning_experiment.plot()

In general, you will have run more than one experiment. As in our study above, you may want to compare different methods, or variations of the tuning problem. You may want to draw conclusions by running on several benchmarks, and counter random effects by repeating experiments several times. In the next section, we show how comparative plots over many experiments can be created.

Visualization of Results from many Experiments

Apart from troubleshooting, visualizing the results of a single experiment is of limited use. In this section, we show how to create comparative plots, using results of many experiments. We will use results from the study detailed above.

A First Comparative Plot

Here is the code for generating result plots for two of the benchmarks:

benchmarking/examples/benchmark_hypertune/plot_results.py
from typing import Dict, Any, Optional
import logging

from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    # The setup is the algorithm. No filtering
    return metadata["algorithm"]


SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB")


def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
    return int(metadata["algorithm"] in SETUPS_RIGHT)


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_name = "docs-1"
    experiment_names = (experiment_name,)
    setups = list(methods.keys())
    num_runs = 15
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # We would like two subplots (1 row, 2 columns), with MOBSTER and HYPERTUNE
    # results on the left, and the remaining baselines on the right. Each
    # column gets its own title, and legends are shown in both
    plot_params.subplots = SubplotParameters(
        nrows=1,
        ncols=2,
        kwargs=dict(sharey="all"),
        titles=["Model-based Methods", "Baselines"],
        legend_no=[0, 1],
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=setups,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        metadata_to_subplot=metadata_to_subplot,
        download_from_s3=download_from_s3,
    )
    # We can now create plots for the different benchmarks
    # First: nas201-cifar100
    benchmark_name = "nas201-cifar100"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.265, 0.31),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
    )
    # Next: nas201-ImageNet16-120
    benchmark_name = "nas201-ImageNet16-120"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.535, 0.58),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
    )

The figure for benchmark nas201-cifar-100 looks as follows:

Results for NASBench-201 (CIFAR-100)

  • There are two subfigures next to each other. Each contains a number of curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.

  • More generally, the data from our 1260 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.

  • The function metadata_to_setup maps the metadata stored for an experiment to the setup name, or to None if this experiment should be filtered out. In our basic case, the setup is simply the name of the tuning algorithm. Our experimentation framework stores a host of information as metadata; the most useful keys for grouping are:

    • algorithm: Name of method (ASHA, MOBSTER-INDEP, … in our example)

    • tag: Experiment tag. This is docs-1 in our example. Becomes useful when we merge data from different studies in a single figure

    • benchmark: Benchmark name (nas201-cifar-100, … in our example)

    • n_workers: Number of workers

    Other keys may be specific to algorithm.

  • Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 15 experiments, one for each seed. Each seed gives rise to a best metric value curve. A metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". For example, if your metric is accuracy in percent (from 0 to 100), then mode="max" and metric_multiplier=0.01, and the curve shows error in [0, 1].

  • These 15 curves are now interpolated to a common grid, and at each grid point, the 15 values (one for each seed) are aggregated into 3 values lower, aggregate, upper. In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode):

  • Plotting starts with the creation of a ComparativeResults object. We need to pass the experiment names (or tags), the list of all setups, the number of runs (or seeds), the metadata_to_setup function, as well as default plot parameters in plot_params. See PlotParameters for full details about the latter. In our example, we set xlabel, aggregate_mode (see above), and enable a grid with grid=True. Note that these parameters can be extended and overwritten by parameters for each plot.

  • In our example, we separate the MOBSTER and HYPERTUNE setups from the baselines, by using two subfigures. This is done by specifying plot_params.subplots and metadata_to_subplot. In the former, plot_params.subplots.nrows and plot_params.subplots.ncols are mandatory, prescribing the shape of the subplot arrangement. In plot_params.subplots.titles, we can provide titles for each column (which we do here). If given, this overrides plot_params.title. Also, plot_params.subplots.legend_no=[0, 1] asks for legends in both subplots (the default is no legend at all). For full details about these arguments, see SubplotParameters.

  • The creation of results does a number of things. First, if download_from_s3=True, result files are downloaded from S3. In our example, we assume this has already been done. Next, all result files are iterated over, all metadata.json are read, and an inverse index from benchmark name to paths, setup_name, and subplot_no is created. This process also checks that exactly num_runs experiments are present for every setup. For large studies, it frequently happens that too few or too many results are found. The warning outputs can be used for debugging.

  • Given results, we can create plots for every benchmark. In our example, this is done for nas201-cifar100 and nas201-ImageNet16-120, by calling results.plot(). Apart from the benchmark name, we also pass plot parameters in plot_params, which extend (and overwrite) those passed at construction. In particular, we need to pass metric and mode, which we can obtain from the benchmark description. Moreover, ylim is a sensible range for the vertical axis, which is different for every benchmark (this is optional).

  • If we pass file_name as argument to results.plot, the figure is stored in this file.

  • results.plot returns a dictionary, whose entries “fig” and “axs” contain the figure and its axes (subfigures), allowing for further fine-tuning.

Note

If subplots are used, the grouping is w.r.t. (subplot, setup), not just by setup. This means you can use the same setup name in different subplots to show different data. For example, your study may have run a range of methods under different conditions (say, using a different number of workers). You can then map these conditions to subplots and show the same setups in each subplot. In any case, the mapping of setups to colors is fixed and the same in every subplot.

Note

Plotting features presented here can also be used to visualize results for a single seed. In this case, there are no error bars.

Additional Features

In this section, we discuss additional features, allowing you to customize your result plots.

Combining Results from Multiple Studies

HPO experiments are expensive to do, so you want to avoid re-running them for baselines over and over. Our plotting tools allow you to easily combine results across multiple studies.

As an example, say we would like to relate our docs-1 results to what random search and Bayesian optimization do on the same benchmarks. These baseline results were already obtained as part of an earlier study baselines-1, in which a number of methods were compared, among them RS and BO. As an additional complication, the earlier study used 30 repetitions (or seeds), while docs-1 uses 15. Here is the modification of the code above in order to include these additional baseline results in the plot on the right side. First, we need to replace metadata_to_setup and SETUPS_RIGHT:

def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    algorithm = metadata["algorithm"]
    tag = metadata["tag"]
    seed = int(metadata["seed"])
    # Filter out experiments from "baselines-1" we don't want to compare
    # against
    if tag == "baselines-1" and (seed >= 15 or algorithm not in ("RS", "BO")):
        return None
    else:
        return algorithm


SETUPS_RIGHT = ("ASHA", "SYNCHB", "BOHB", "RS", "BO")

There are now two more setups, “RS” and “BO”, whose results come from the earlier baselines-1 study. Now, ComparativeResults has to be created differently:

experiment_names = experiment_names + ("baselines-1",)
setups = setups + ["RS", "BO"]
results = ComparativeResults(
    experiment_names=experiment_names,
    setups=setups,
    num_runs=num_runs,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    metadata_to_subplot=metadata_to_subplot,
    download_from_s3=download_from_s3,
)

Note

If you intend to combine results from several different studies, it is recommended to use the same random seed (specified as --random_seed), which ensures that the same sequence of random numbers is used in each experiment. This results in a so-called paired comparison, lowering the random variations across setups. In our example, we would look up the master random seed of the baselines-1 study and use this for docs-1 as well.

Add Performance of Initial Trials

When using HPO, you often have an idea about one or several default configurations that should be tried first. In Syne Tune, such initial configurations can be specified by points_to_evaluate (see here for details). An obvious question to ask is how long it takes for an HPO method to find a configuration which works significantly better than these initial ones.

We can visualize the performance of initial trials by specifying plot_params.show_init_trials of type ShowTrialParameters. In our docs-1 study, points_to_evaluate is not explicitly used, but the configuration of the first trial is selected by a mid-point heuristic. Our plotting script from above needs to be modified:

plot_params.show_init_trials = ShowTrialParameters(
    setup_name="ASHA",
    trial_id=0,
    new_setup_name="default"
)
results = ComparativeResults(
    experiment_names=experiment_names,
    setups=setups,
    num_runs=num_runs,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    metadata_to_subplot=metadata_to_subplot,
    download_from_s3=download_from_s3,
)

Since the ASHA curve is plotted on the right side, this will add another curve there with label default. This curve shows the best metric value, using data from the first trial only (trial_id == 0). It is extended as a flat constant line to the end of the horizontal range.

If you specify a number of initial configurations with points_to_evaluate, set ShowTrialParameters.trial_id to their number minus 1. The initial trials curve will then use data from trials with ID less than or equal to this number.
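For example, if three initial configurations are specified via points_to_evaluate, the snippet above would change as follows (a sketch; the setup and label names are just for illustration):

plot_params.show_init_trials = ShowTrialParameters(
    setup_name="ASHA",
    trial_id=2,  # trials 0, 1, 2 correspond to the three initial configurations
    new_setup_name="defaults",
)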

Controlling Subplots

Our example above already creates two subplots, horizontally arranged, and we discussed the role of metadata_to_subplot. Here, we provide extra details about fields in SubplotParameters, the type for plot_params.subplots:

  • nrows, ncols: Shape of subplot matrix. The total number of subplots is <= ncols * nrows. kwargs contains further arguments to matplotlib.pyplot.subplots. For example, if sharey="all", the y tick labels are only created for the first column. If you use nrows > 1, you may want to share x tick labels as well, with sharex="all".

  • titles: If title_each_figure == False, this is a list of titles, one for each column. If title_each_figure == True, then titles contains a title for each subplot. If titles is not given, the global title plot_params.title is printed on top of the left-most column.

  • legend_no: List of subfigures in which the legend is shown. The default is not to show legends. In our example, there are different setups in each subplot, so we want a legend in each. If your subplots show the same setups under different conditions, you may want to show the legend in one of the subplots only, in which case legend_no contains a single number.

  • xlims: Use this if your subfigures should have different x axis ranges. The global xlim is overwritten by (0, xlims[subplot_no]).

  • subplot_indices: Any given plot produced by plot() does not have to contain all subfigures. For example, you may want to group your results into 4 or 8 bins, then create a sequence of plots comparing pairs of them. If subplot_indices is given, it contains the subplot indices to be shown, and in this order. Otherwise, this is \(0, 1, 2, \dots\). If subplot_indices is given, then titles and xlims are relative to this list (in that xlims[i] corresponds to subfigure subplot_indices[i]), but legend_no is not.
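To illustrate several of these fields together, here is a sketch in which the same setups are shown in two subplots under different conditions (all values are assumptions for illustration):

plot_params.subplots = SubplotParameters(
    nrows=1,
    ncols=2,
    kwargs=dict(sharey="all"),
    titles=["4 workers", "8 workers"],  # one title per column
    legend_no=[0],       # same setups in both subplots, so one legend suffices
    xlims=[7200, 3600],  # x ranges (0, 7200) and (0, 3600)
)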

Plotting Derived Metrics

You can also plot metrics which are not directly contained in the results data (as a column), but which can be computed from the results. To this end, you can pass a dataframe column generator as dataframe_column_generator to plot(). For example, assume we run multi-objective HPO methods on a benchmark involving metrics cost and latency (mode="min" for both of them). The final plot command would look like this:

from syne_tune.experiments.multiobjective import (
    hypervolume_indicator_column_generator,
)

# ...

dataframe_column_generator = hypervolume_indicator_column_generator(
    metrics_and_modes=[("cost", "min"), ("latency", "min")]
)
plot_params = PlotParameters(
    metric="hypervolume_indicator",
    mode="max",
)
results.plot(
    benchmark_name=benchmark_name,
    plot_params=plot_params,
    dataframe_column_generator=dataframe_column_generator,
    one_result_per_trial=True,
)
  • The mapping returned by hypervolume_indicator_column_generator() maps a results dataframe to a new column containing the best hypervolume indicator as a function of wall-clock time for the metrics cost and latency, which must be contained in the results dataframe.

  • The option one_result_per_trial=True of results.plot ensures that the result data is filtered, so that for each experiment, only the final row for each trial remains. This option is useful if the methods are single-fidelity, but results are reported after each epoch. The filtering makes sure that only results for the largest epoch are used for each trial. Since this is done before the best hypervolume indicator is computed, it can speed up the computation dramatically.

Filtering Experiments by DateTime Bounds

Results can be filtered out by having metadata_to_setup or metadata_to_subplot return None. This is particularly useful if results from several studies are to be combined. Another way to filter experiments is using the datetime_bounds argument of ComparativeResults. A common use case is that experiments for a large study have been launched in several stages, and those of an early stage failed. If the corresponding result files are not removed on S3, the creation of ComparativeResults will complain about too many results being found. datetime_bounds is specified in terms of date-time strings of the format ST_DATETIME_FORMAT, which currently is “YYYY-MM-DD-HH-MM-SS”. For example, if results are valid from “2023-03-19-22-01-57” onwards, but invalid before, we can use datetime_bounds=("2023-03-19-22-01-57", None). datetime_bounds can also be a dictionary with keys from experiment_names, in which case bounds are specific to different experiment prefixes.
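
For example, a hypothetical filter keeping only results from a given point in time onwards (the bound values below are made up) could look as follows:

results = ComparativeResults(
    experiment_names=experiment_names,
    setups=setups,
    num_runs=num_runs,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    datetime_bounds=("2023-03-19-22-01-57", None),
    download_from_s3=download_from_s3,
)

# Alternatively: bounds specific to each experiment prefix
datetime_bounds = {
    "baselines-1": (None, "2023-03-10-00-00-00"),
    "docs-1": ("2023-03-19-22-01-57", None),
}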

Extract Meta-Data Values

Apart from plotting results, we can also retrieve meta-data values. This is done by passing a list of meta-data key names via metadata_keys when creating ComparativeResults. Afterwards, the corresponding meta-data values can be queried by calling results.metadata_values(benchmark_name). The result is a nested dictionary result, so that result[key][setup_name] is a list of values, where key is a meta-data key from metadata_keys and setup_name is a setup name. The list contains values from all experiments mapped to this setup_name. If you use the same setup names across different subplots, set metadata_subplot_level=True, in which case results.metadata_values(benchmark_name) returns result[key][setup_name][subplot_no], so that grouping is done w.r.t. both setup names and subplots.
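
A short sketch of this workflow; max_wallclock_time is assumed here to be a key present in the experiment metadata, for illustration only:

results = ComparativeResults(
    experiment_names=experiment_names,
    setups=setups,
    num_runs=num_runs,
    metadata_to_setup=metadata_to_setup,
    plot_params=plot_params,
    metadata_keys=["max_wallclock_time"],
    download_from_s3=download_from_s3,
)
result = results.metadata_values(benchmark_name)
# One value per experiment mapped to setup "ASHA"
values_for_asha = result["max_wallclock_time"]["ASHA"]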

Extract Final Values for Extra Results

Syne Tune allows extra results to be stored alongside the usual metrics data, as shown in examples/launch_height_extra_results.py. These are simply additional columns in the result dataframe. In order to plot them over time, you currently need to write your own plotting scripts. If the best value over time approach of Syne Tune’s plotting tools makes sense for any single column, you can just specify their name for plot_params.metric and set plot_params.mode accordingly.

However, in many cases it is sufficient to know final values for extra results, grouped in the same way as everything else. For example, extra results may be used to monitor some internals of the HPO method being used, in which case we may be satisfied to see these statistics at the end of experiments. If extra_results_keys is used in plot(), the method returns a nested dictionary extra_results under key “extra_results”, so that extra_results[setup_name][key] contains a list of values (one for each seed) for setup setup_name and key an extra result name from extra_results_keys. As above, if metadata_subplot_level=True at construction of ComparativeResults, the structure of the dictionary is extra_results[setup_name][subplot_no][key].
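
In code, this could look as follows (a sketch; the extra result names are made up and depend on what was recorded during your experiments):

extra_results = results.plot(
    benchmark_name=benchmark_name,
    plot_params=plot_params,
    extra_results_keys=["num_at_level1", "num_at_level3"],
)["extra_results"]
# One value per seed for setup "ASHA" and extra result "num_at_level1"
values = extra_results["ASHA"]["num_at_level1"]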

Visualizing Learning Curves

We have seen how results from many experiments can be visualized jointly in order to compare different HPO methods, different variations of the benchmark (e.g., different configuration spaces), or both. In order to understand differences between two setups in a more fine-grained fashion, it can be useful to look at learning curve plots. In this section, we demonstrate Syne Tune tooling along this direction.

Why Does Hyper-Tune Outperform ASHA?

In our docs-1 study, HYPERTUNE-INDEP significantly outperforms ASHA. The best metric value curve descends much faster initially, and also the final performance at max_wallclock_time is significantly better.

How can this difference be explained? Both methods use the same scheduling logic, so differences are mostly due to how configurations of new trials are suggested. In ASHA, this is done by random sampling. In HYPERTUNE-INDEP, independent Gaussian process surrogate models are fitted on observations at each rung level, and decisions are made based on an acquisition function which carefully weights the input from each of these models (details are given here). But how exactly does this difference matter? We can find out by plotting learning curves of trials for two experiments next to each other, ASHA on the left, HYPERTUNE-INDEP on the right. Here is the code for doing this:

code/plot_learning_curves.py
from typing import Dict, Any, Optional

from syne_tune.experiments import (
    TrialsOfExperimentResults,
    PlotParameters,
    MultiFidelityParameters,
)
from benchmarking.examples.benchmark_hypertune.benchmark_definitions import (
    benchmark_definitions,
)


SETUPS_TO_COMPARE = ("ASHA", "HYPERTUNE-INDEP")


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    algorithm = metadata["algorithm"]
    return algorithm if algorithm in SETUPS_TO_COMPARE else None


if __name__ == "__main__":
    experiment_name = "docs-1"
    benchmark_name_to_plot = "nas201-cifar100"
    seed_to_plot = 7
    download_from_s3 = False  # Set ``True`` in order to download files from S3

    experiment_names = (experiment_name,)
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        grid=True,
    )
    # We need to provide details about rung levels of the multi-fidelity methods.
    # Also, all methods compared are pause-and-resume
    multi_fidelity_params = MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 81, 200],
        multifidelity_setups={name: True for name in SETUPS_TO_COMPARE},
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = TrialsOfExperimentResults(
        experiment_names=experiment_names,
        setups=SETUPS_TO_COMPARE,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        multi_fidelity_params=multi_fidelity_params,
        download_from_s3=download_from_s3,
    )

    # Create plot for certain benchmark and seed
    benchmark = benchmark_definitions[benchmark_name_to_plot]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
    )
    results.plot(
        benchmark_name=benchmark_name_to_plot,
        seed=seed_to_plot,
        plot_params=plot_params,
        file_name=f"./learncurves-{experiment_name}-{benchmark_name_to_plot}.png",
    )

The figure for benchmark nas201-cifar-100 and seed=7 looks as follows:

Learning curves for nas201-cifar-100

Learning curves for NASBench-201 (CIFAR-100), seed=7

The class for creating learning curve plots is TrialsOfExperimentResults. It is quite similar to ComparativeResults, but there are differences:

  • For learning curve plots, each setup occupies its own subfigure. Also, the seed for each plot is fixed, so each subfigure is based on the results for a single experiment.

  • metadata_to_setup is used to select the experiments we want to compare; all others are filtered out. In this case, these are ASHA and HYPERTUNE-INDEP.

  • The default for plot_params.subplots is a single row of subfigures, one for each setup, and titles correspond to setup names. In our example, we use this default. If you want to compare many setups, you can use an arrangement with multiple rows as well.

  • In learning curve plots, the trajectory of metric values for a trial is plotted in a different color per trial (more precisely, we cycle through a palette, so that eventually colors are repeated). The final metric value of a trial is marked with a diamond.

  • If comparing multi-fidelity methods (like ASHA, Hyper-Tune, MOBSTER), you should also specify multi_fidelity_params, passing the rung levels. In this case, metric values at rung levels are marked by a circle, or by a diamond if this is the final value for a trial.

  • If some of your multi-fidelity setups are of the pause-and-resume type (i.e., the evaluation of a trial can be paused and possibly resumed later on), list them in multi_fidelity_params.pause_resume_setups. Trajectories of pause-and-resume methods need to be plotted differently: there has to be a gap between the value at a rung level and the next one, instead of a line connecting them. In our example, all setups are pause-and-resume, and these gaps are clearly visible.

What do these plots tell us about the differences between ASHA and HYPERTUNE-INDEP? First of all, HYPERTUNE-INDEP has far fewer isolated diamonds than ASHA. These correspond to trials which are paused after one epoch and never resumed. For ASHA, both the rate of single diamonds and their metric distribution remain stationary over time, while for HYPERTUNE-INDEP, the rate rapidly diminishes, and the metric values for single diamonds improve as well. This is what we would expect. ASHA does not learn anything from the past, and simply continues to suggest configurations at random, while HYPERTUNE-INDEP rapidly learns which parts of the configuration space to avoid and does not repeat basic mistakes moving forward. This means that overall, ASHA wastes resources on starting poorly performing trials over and over, while HYPERTUNE-INDEP uses these resources to resume training for more trials, thereby reaching better performance over the same time horizon. These results were obtained in the context of simulated experimentation, without delays for starting, pausing, or resuming trials. In the presence of such delays, the advantage of model-based methods over ASHA becomes even more pronounced.

With specific visualizations, we can drill deeper to figure out what HYPERTUNE-INDEP learns about the configuration space. For example, the configurations of all trials are stored in the results as well. Using these, we can confirm that HYPERTUNE-INDEP rapidly learns about basic properties of the NASBench-201 configuration space, where certain connections are mandatory for good results, and consistently chooses them after a short initial phase.

Rapid Experimentation with Syne Tune

The main goal of automated tuning is to help the user to find and adjust the best machine learning model as quickly as possible, given some computing resources controlled by the user. Syne Tune contains some tooling which can speed up this interactive process substantially. The user can launch many experiments in parallel, slicing the complete model selection and tuning problem into smaller parts. Comparative plots can be created from past experimental data and easily customized to specific needs.

Syne Tune’s tooling for rapid experimentation is part of the benchmarking framework, which is covered in detail in this tutorial. However, as demonstrated here, this framework is useful for experimentation beyond the comparison of different HPO algorithms. The tutorial here is self-contained, but the reader may want to consult the benchmarking tutorial for background information.

Note

The code used in this tutorial is contained in the Syne Tune sources; it is not installed by pip. You can obtain this code by installing Syne Tune from source, but the only code that is needed is in benchmarking.examples.demo_experiment. The final section also needs code from benchmarking.nursery.odsc_tutorial.

Also, make sure to have installed the blackbox-repository dependencies.

Setting up an Experimental Study

Any experimental study consists of a sequence of experiments, where later ones are planned based on the outcomes of earlier ones. Parallelization can be used to speed up this process:

  • If outcomes or decision-making are randomized (e.g., training neural networks starts from random initial weights; HPO may suggest configurations drawn at random), it is important to repeat experiments several times in order to gain robust outcomes.

  • If a search problem becomes too big, it can be broken down into several parts, which can be worked on independently.

In this section, we describe the setup for a simple study, which can be used to showcase tooling in Syne Tune for splitting up a large problem into pieces, running random repetitions, writing out extra information, and creating customized comparative plots.

For simplicity, we use surrogate benchmarks from the fcnet family, whereby tuning is simulated. This is the default configuration space for these benchmarks:

syne_tune/blackbox_repository/conversion_scripts/scripts/fcnet_import.py
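# Note: choice, finrange, logfinrange are hyperparameter domains from
# syne_tune.config_space; NUM_UNITS_1, NUM_UNITS_2 are string constants
# defined earlier in this module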
CONFIGURATION_SPACE = {
    "hp_activation_fn_1": choice(["tanh", "relu"]),
    "hp_activation_fn_2": choice(["tanh", "relu"]),
    "hp_batch_size": logfinrange(8, 64, 4, cast_int=True),
    "hp_dropout_1": finrange(0.0, 0.6, 3),
    "hp_dropout_2": finrange(0.0, 0.6, 3),
    "hp_init_lr": choice([0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]),
    "hp_lr_schedule": choice(["cosine", "const"]),
    NUM_UNITS_1: logfinrange(16, 512, 6, cast_int=True),
    NUM_UNITS_2: logfinrange(16, 512, 6, cast_int=True),
}


Note

In the Syne Tune experimentation framework, a tuning problem (i.e., training and evaluation script or blackbox, together with defaults) is called a benchmark. This terminology is used even if the goal of experimentation is not benchmarking (i.e., comparing different HPO methods), as is the case in this tutorial here.

Note

The code used in this tutorial is contained in the Syne Tune sources; it is not installed by pip. You can obtain this code by installing Syne Tune from source, but the only code that is needed is in benchmarking.examples.demo_experiment, so if you copy that out of the repository, you do not need all the remaining source code.

Note

In order to use surrogate benchmarks and the simulator backend, you need to have the blackbox-repository dependencies installed, as detailed here. Note that the first time you use a surrogate benchmark, its data files are downloaded and stored to your S3 bucket, which can take a considerable amount of time. The next time you use the benchmark, it is loaded from your local disk or your S3 bucket, which is fast.

Modifying the Configuration Space

The hyperparameters hp_activation_fn_1 and hp_activation_fn_2 prescribe the type of activation function in hidden layers 1 and 2. We can split the overall tuning problem into smaller pieces by fixing these parameters, considering relu and tanh networks independently. In our study, we will compare the following methods:

  • ASHA-TANH, MOBSTER-TANH: Runs ASHA and MOBSTER on the simplified configuration space, where hp_activation_fn_1 = hp_activation_fn_2 = "tanh"

  • ASHA-RELU, MOBSTER-RELU: Runs ASHA and MOBSTER on the simplified configuration space, where hp_activation_fn_1 = hp_activation_fn_2 = "relu"

  • ASHA, MOBSTER: Runs ASHA and MOBSTER on the original configuration space

  • RS, BO: Runs baselines random search and Bayesian optimization on the original configuration space

Here is the script defining these alternatives:

benchmarking/examples/demo_experiment/baselines.py
import copy

from syne_tune.experiments.default_baselines import (
    RandomSearch,
    BayesianOptimization,
    ASHA,
    MOBSTER,
)
from syne_tune.experiments.baselines import MethodArguments


class Methods:
    RS = "RS"
    BO = "BO"
    ASHA = "ASHA"
    MOBSTER = "MOBSTER"
    ASHA_TANH = "ASHA-TANH"
    MOBSTER_TANH = "MOBSTER-TANH"
    ASHA_RELU = "ASHA-RELU"
    MOBSTER_RELU = "MOBSTER-RELU"


def _modify_config_space(
    method_arguments: MethodArguments, value: str
) -> MethodArguments:
    result = copy.copy(method_arguments)
    result.config_space = dict(
        method_arguments.config_space,
        hp_activation_fn_1=value,
        hp_activation_fn_2=value,
    )
    return result


methods = {
    Methods.RS: lambda method_arguments: RandomSearch(method_arguments),
    Methods.BO: lambda method_arguments: BayesianOptimization(method_arguments),
    Methods.ASHA: lambda method_arguments: ASHA(
        method_arguments,
        type="promotion",
    ),
    Methods.MOBSTER: lambda method_arguments: MOBSTER(
        method_arguments,
        type="promotion",
    ),
    # Fix activations to "tanh"
    Methods.ASHA_TANH: lambda method_arguments: ASHA(
        _modify_config_space(method_arguments, value="tanh"),
        type="promotion",
    ),
    Methods.MOBSTER_TANH: lambda method_arguments: MOBSTER(
        _modify_config_space(method_arguments, value="tanh"),
        type="promotion",
    ),
    # Fix activations to "relu"
    Methods.ASHA_RELU: lambda method_arguments: ASHA(
        _modify_config_space(method_arguments, value="relu"),
        type="promotion",
    ),
    Methods.MOBSTER_RELU: lambda method_arguments: MOBSTER(
        _modify_config_space(method_arguments, value="relu"),
        type="promotion",
    ),
}
  • Different methods are defined in dictionary methods, as functions mapping method_arguments of type MethodArguments to a scheduler object. Here, method_arguments.config_space contains the default configuration space for the benchmark, where both hp_activation_fn_1 and hp_activation_fn_2 are hyperparameters of type choice(["tanh", "relu"]).

  • For ASHA-TANH, MOBSTER-TANH, ASHA-RELU, MOBSTER-RELU, we fix these parameters. This is done in _modify_config_space, where method_arguments.config_space is replaced by a configuration space where the two hyperparameters are fixed (so methods do not search over them anymore).

  • Another way to modify method_arguments just before a method is created is to use the map_extra_args argument of main(), as detailed here. This allows the modification to depend on extra command line arguments.

Next, we define the benchmarks our study should run over. For our simple example, we use the fcnet benchmarks:

benchmarking/examples/demo_experiment/benchmark_definitions.py
from syne_tune.experiments.benchmark_definitions import fcnet_benchmark_definitions


benchmark_definitions = fcnet_benchmark_definitions.copy()

This is where you would have to plug in your own benchmarks, namely your training script with a bit of metadata. Examples are provided here and here.
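
If you only want to run a subset of these benchmarks, a simple sketch is to filter the dictionary (the two names below are the benchmarks plotted later in this tutorial):

benchmark_definitions = {
    name: definition
    for name, definition in fcnet_benchmark_definitions.items()
    if name in ("fcnet-protein", "fcnet-slice")
}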

Recording Extra Results

Next, we need to write the hpo_main.py script which runs a single experiment. As shown here, this is mostly about selecting the correct main function among syne_tune.experiments.launchers.hpo_main_simulator.main(), syne_tune.experiments.launchers.hpo_main_local.main(), syne_tune.experiments.launchers.hpo_main_sagemaker.main(), depending on the trial backend we want to use. In our case, we also would like to record extra information about the experiment. Here is the script:

benchmarking/examples/demo_experiment/hpo_main.py
from typing import Optional, Dict, Any, List

from baselines import methods
from benchmark_definitions import benchmark_definitions
from syne_tune import Tuner
from syne_tune.experiments.launchers.hpo_main_simulator import main
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune.results_callback import ExtraResultsComposer


RESOURCE_LEVELS = [1, 3, 9, 27, 81]


class RungLevelsExtraResults(ExtraResultsComposer):
    """
    We would like to monitor the sizes of rung levels over time. This is an extra
    information normally not recorded and stored.
    """

    def __call__(self, tuner: Tuner) -> Optional[Dict[str, Any]]:
        if not isinstance(tuner.scheduler, HyperbandScheduler):
            return None
        rung_information = tuner.scheduler.terminator.information_for_rungs()
        return {
            f"num_at_level{resource}": num_entries
            for resource, num_entries, _ in rung_information
            if resource in RESOURCE_LEVELS
        }

    def keys(self) -> List[str]:
        return [f"num_at_level{r}" for r in RESOURCE_LEVELS]


if __name__ == "__main__":
    extra_results = RungLevelsExtraResults()
    main(methods, benchmark_definitions, extra_results=extra_results)
  • As usual, we import syne_tune.experiments.launchers.hpo_main_simulator.main() (we use the simulator backend) and call it, passing our methods and benchmark_definitions. We also pass extra_results, since we would like to record extra results.

  • Note that apart from syne_tune imports, this script is only doing local imports. No other code from benchmarking is required.

  • A certain number of time-stamped results are recorded by default in results.csv.zip, details are here. In particular, all metric values reported for all trials are recorded.

  • In our example, we would also like to record information about the multi-fidelity schedulers ASHA and MOBSTER. As detailed in this tutorial, they record metric values for trials at the different rung levels these trials reach (e.g., number of epochs trained), and decisions on which paused trial to promote to the next rung level are made by comparing its performance with all others in the same rung. The rungs grow over time, and we would like to record their respective sizes as a function of wall-clock time.

  • To this end, we create a subclass of ExtraResultsComposer, whose __call__ method extracts the desired information from the current Tuner object. In our example, we first test whether the current scheduler is ASHA or MOBSTER (recall that we also run RS and BO as baselines). If so, we extract the desired information and return it as a dictionary.

  • Finally, we create extra_results and pass it to the main function.

The outcome is that a number of additional columns are appended to the dataframe stored in results.csv.zip, at least for experiments with ASHA or MOBSTER schedulers. Running this script launches an experiment locally (if you installed Syne Tune from source, you need to start the script from the benchmarking/examples directory):

python demo_experiment/hpo_main.py --experiment_tag docs-2-debug
Running Experiments in Parallel

Running our hpo_main.py script launches a single experiment on the local machine, writing results to a local directory. This is nice for debugging, but slow and cumbersome once we have convinced ourselves that the setup is working. We will want to launch many experiments in parallel on AWS, and use our local machine for other work.

  • Experiments with our setups RS, BO, ASHA-TANH, MOBSTER-TANH, ASHA-RELU, MOBSTER-RELU, ASHA, MOBSTER are independent and can be run in parallel.

  • We repeat each experiment 20 times, in order to quantify the random fluctuation in the results. These seeds are independent and can be run in parallel.

  • We could also run experiments with different benchmarks (i.e., datasets in fcnet) in parallel. But since a single simulated experiment is fast to do, we are not doing this here.

Running experiments in parallel requires a remote launcher script:

benchmarking/examples/demo_experiment/launch_remote.py
from pathlib import Path

from benchmark_definitions import benchmark_definitions
from baselines import methods
from syne_tune.experiments.launchers.launch_remote_simulator import launch_remote


if __name__ == "__main__":

    def _is_expensive_method(method: str) -> bool:
        return method.startswith("MOBSTER") or method == "BO"

    entry_point = Path(__file__).parent / "hpo_main.py"
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        is_expensive_method=_is_expensive_method,
    )
  • Again, we simply choose the correct launch_remote function among syne_tune.experiments.launchers.launch_remote_simulator.launch_remote(), syne_tune.experiments.launchers.launch_remote_local.launch_remote(), and syne_tune.experiments.launchers.launch_remote_sagemaker.launch_remote(), depending on the trial backend.

  • Note that apart from syne_tune imports, this script is only doing local imports. No other code from benchmarking is required.

  • Via is_expensive_method, we pass a predicate on the method name. If is_expensive_method(method) is True, the 20 different seeds are run in parallel, each in its own SageMaker training job. Otherwise, they are run sequentially within a single job.

  • In our example, we know that BO and MOBSTER run quite a bit slower in the simulator than RS and ASHA, so we label the former as expensive. This means we have 4 expensive methods and 4 cheap ones, and our complete study will launch 4 + 4 * 20 = 84 SageMaker training jobs (one job per cheap method, running all seeds sequentially, plus one job per seed for each expensive method). Since fcnet contains four benchmarks, we run 8 * 20 * 4 = 640 experiments in total.

All of these experiments can be launched with a single command (if you installed Syne Tune from source, you need to start the script from the benchmarking/examples directory):

python demo_experiment/launch_remote.py \
  --experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20

If --random_seed is not given, a master random seed is drawn at random, printed and also stored in the metadata. If a study consists of launching experiments in several steps, it is good practice to pass the same random seed for each launch command. For example, you can run the first launch command without passing a seed, then note the seed from the output and use it for further launches.

Avoiding Costly Failures

In practice, with a new experimental setup, it is not a good idea to launch all experiments in one go. We recommend moving in stages.

First, if our benchmarks run locally as well, we should start with some local tests. For example:

python demo_experiment/hpo_main.py \
  --experiment_tag docs-2-debug --random_seed 2465497701 \
  --method ASHA-RELU --verbose 1

We can cycle through several methods and check whether anything breaks. Note that --verbose 1 generates useful output about the progress of the method, which can be used to check whether properties are the way we expect (for example, "relu" is chosen for the fixed hyperparameters). Results are stored locally under ~/syne_tune/docs-2-debug/.

Next, we launch the setup remotely, but for a single seed:

python demo_experiment/launch_remote.py \
  --experiment_tag docs-2 --random_seed 2465497701 --num_seeds 1

This will start 8 SageMaker training jobs, one for each method, with seed=0. Some of them, such as RS, ASHA, and the ASHA-* variants, will finish very rapidly, and it makes sense to quickly browse their logs to check whether the desired properties are met.

Finally, if this looks good, we can launch all the rest:

python demo_experiment/launch_remote.py \
  --experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20 \
  --start_seed 1

This launches all remaining experiments, with seeds ranging from 1 to 19.

Note

If something breaks when remotely launching for seed=0, it may be that results have already been written to S3, because results are written out periodically. If you use the same tag docs-2 for initial debugging, you will have to remove these results on S3, or otherwise be careful to filter them out later on (this is discussed below).

In a large study consisting of many experiments, it can happen that some experiments fail for reasons which do not invalidate the results of the others. If this happens, it is not a good idea, in terms of both time and cost, to restart the whole study from scratch. Instead, we recommend cleaning up and restarting only the experiments which failed. For example, assume that in our study above, the MOBSTER-TANH experiments for seed == 13 failed:

  • We need to remove incomplete results of these experiments, which can corrupt final aggregate results otherwise. This can either be done by removing them on S3, or by advanced filtering (discussed below). In general, we recommend the former. For our example, the results to be removed are in s3://{sagemaker-default-bucket}/syne-tune/docs-2/MOBSTER-TANH-13/. Namely, since MOBSTER-TANH is an “expensive” method, results for different seeds are written to different subdirectories.

  • Next, we need to start the failed experiments again:

python demo_experiment/launch_remote.py \
  --experiment_tag docs-2 --random_seed 2465497701 --num_seeds 14 \
  --start_seed 13 --method MOBSTER-TANH

Assume instead that the ASHA experiments for seed == 13 failed. This is a “cheap” method, so results for all seeds are written to s3://{sagemaker-default-bucket}/syne-tune/docs-2/ASHA/, into subdirectories of the form docs-2-<benchmark>-<seed>-<datetime>. Since this method is cheap, we can rerun all its experiments by first removing everything under s3://{sagemaker-default-bucket}/syne-tune/docs-2/ASHA/, then:

python demo_experiment/launch_remote.py \
  --experiment_tag docs-2 --random_seed 2465497701 --num_seeds 20 \
  --method ASHA

Note

Don’t worry if you restart failed experiments without first removing their incomplete results on S3. Due to the <datetime> postfix of directory names, results of a restart never conflict with older ones. However, once you plot aggregate results, you will get a warning that too many results have been found, along with where these results are located. At this point, you can still remove the incomplete ones.

Visualization of Results

Once all results are obtained, we would like to rapidly create comparative plots. In Syne Tune, each experiment stores two files: metadata.json with metadata, and results.csv.zip containing time-stamped results. The Tuner object at the end of the experiment is also serialized to tuner.dill, but this is not needed here.

Note

This section offers an example of the plotting facilities in Syne Tune. More details are provided in this tutorial.

First, we need to download the results from S3 to the local disk. This can be done by a command which is also printed at the end of launch_remote.py:

aws s3 sync s3://<BUCKET-NAME>/syne-tune/docs-2/ ~/syne-tune/docs-2/ \
  --exclude "*" --include "*metadata.json" --include "*results.csv.zip"

This command can also be run from inside the plotting code. Note that the tuner.dill result files are not downloaded, since they are not needed for result visualization.

Here is the code for generating result plots for two of the benchmarks:

benchmarking/examples/demo_experiment/plot_results.py
from typing import Dict, Any, Optional, List, Set
import logging

from baselines import methods
from benchmark_definitions import benchmark_definitions
from hpo_main import RungLevelsExtraResults
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    # The setup is the algorithm. No filtering
    return metadata["algorithm"]


SETUP_TO_SUBPLOT = {
    "ASHA": 0,
    "MOBSTER": 0,
    "ASHA-TANH": 1,
    "MOBSTER-TANH": 1,
    "ASHA-RELU": 2,
    "MOBSTER-RELU": 2,
    "RS": 3,
    "BO": 3,
}


def metadata_to_subplot(metadata: Dict[str, Any]) -> Optional[int]:
    return SETUP_TO_SUBPLOT[metadata["algorithm"]]


def _print_extra_results(
    extra_results: Dict[str, Dict[str, List[float]]],
    keys: List[str],
    skip_setups: Set[str],
):
    for setup_name, results_for_setup in extra_results.items():
        if setup_name not in skip_setups:
            print(f"[{setup_name}]:")
            for key in keys:
                values = results_for_setup[key]
                print(f"  {key}: {[int(x) for x in values]}")


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_name = "docs-2"
    experiment_names = (experiment_name,)
    setups = list(methods.keys())
    num_runs = 20
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # We would like four subplots (2 rows, 2 columns), each showing two setups.
    # Each subplot gets its own title, and legends are shown in each.
    plot_params.subplots = SubplotParameters(
        nrows=2,
        ncols=2,
        kwargs=dict(sharex="all", sharey="all"),
        titles=[
            "activations tuned",
            "activations = tanh",
            "activations = relu",
            "single fidelity",
        ],
        title_each_figure=True,
        legend_no=[0, 1, 2, 3],
    )

    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=setups,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        metadata_to_subplot=metadata_to_subplot,
        download_from_s3=download_from_s3,
    )

    # We can now create plots for the different benchmarks:
    # - We store the figures as PNG files
    # - We also load the extra results collected during the experiments
    #   (recall that we monitored sizes of rungs for ASHA and MOBSTER).
    #   Instead of plotting their values over time, we print out their
    #   values at the end of each experiment
    extra_results_keys = RungLevelsExtraResults().keys()
    skip_setups = {"RS", "BO"}
    # First: fcnet-protein
    benchmark_name = "fcnet-protein"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.22, 0.30),
    )
    extra_results = results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
        extra_results_keys=extra_results_keys,
    )["extra_results"]
    _print_extra_results(extra_results, extra_results_keys, skip_setups=skip_setups)
    # Next: fcnet-slice
    benchmark_name = "fcnet-slice"
    benchmark = benchmark_definitions[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(0.00025, 0.0012),
    )
    extra_results = results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./{experiment_name}-{benchmark_name}.png",
        extra_results_keys=extra_results_keys,
    )["extra_results"]
    _print_extra_results(extra_results, extra_results_keys, skip_setups=skip_setups)

The figure for benchmark fcnet-protein looks as follows:

Results for fcnet-protein

Results for FCNet (protein dataset)

Moreover, we obtain an output for extra results, as follows:

[ASHA]:
  num_at_level1: [607, 630, 802, 728, 669, 689, 740, 610, 566, 724, 691, 812, 837, 786, 501, 642, 554, 625, 531, 672]
  num_at_level3: [234, 224, 273, 257, 247, 238, 271, 222, 191, 256, 240, 273, 287, 272, 185, 227, 195, 216, 197, 241]
  num_at_level9: [97, 81, 99, 95, 99, 99, 106, 92, 73, 98, 90, 95, 99, 98, 74, 86, 78, 82, 85, 101]
  num_at_level27: [49, 36, 37, 36, 41, 47, 37, 43, 36, 35, 34, 37, 39, 39, 39, 44, 41, 30, 45, 49]
  num_at_level81: [22, 17, 18, 15, 21, 22, 19, 26, 20, 15, 16, 13, 13, 23, 27, 29, 20, 17, 20, 26]
[MOBSTER]:
  num_at_level1: [217, 311, 310, 353, 197, 96, 377, 135, 364, 336, 433, 374, 247, 282, 175, 302, 187, 225, 182, 240]
  num_at_level3: [107, 133, 124, 138, 104, 64, 163, 72, 157, 132, 146, 140, 123, 112, 110, 129, 90, 100, 86, 126]
  num_at_level9: [53, 62, 55, 59, 66, 51, 83, 47, 72, 55, 54, 59, 54, 51, 72, 65, 60, 49, 55, 70]
  num_at_level27: [29, 34, 30, 26, 50, 37, 49, 31, 27, 25, 23, 28, 27, 28, 49, 33, 42, 27, 34, 45]
  num_at_level81: [18, 20, 16, 14, 33, 25, 37, 24, 13, 17, 10, 14, 17, 20, 32, 24, 29, 15, 26, 31]
[ASHA-TANH]:
  num_at_level1: [668, 861, 755, 775, 644, 916, 819, 710, 694, 870, 764, 786, 769, 710, 862, 807, 859, 699, 757, 794]
  num_at_level3: [237, 295, 265, 272, 221, 311, 302, 246, 246, 294, 278, 280, 276, 240, 297, 290, 304, 258, 270, 279]
  num_at_level9: [86, 112, 101, 97, 91, 104, 119, 90, 92, 104, 98, 96, 98, 90, 108, 120, 105, 109, 105, 102]
  num_at_level27: [37, 47, 39, 39, 40, 39, 45, 44, 39, 41, 41, 44, 44, 40, 45, 43, 38, 53, 49, 39]
  num_at_level81: [21, 16, 16, 16, 20, 16, 17, 18, 17, 14, 18, 21, 21, 20, 17, 19, 16, 19, 23, 20]
[MOBSTER-TANH]:
  num_at_level1: [438, 594, 462, 354, 307, 324, 317, 359, 483, 523, 569, 492, 516, 391, 408, 565, 492, 322, 350, 479]
  num_at_level3: [166, 206, 156, 135, 133, 127, 129, 131, 175, 211, 191, 165, 178, 169, 151, 204, 164, 122, 132, 205]
  num_at_level9: [69, 75, 56, 54, 78, 60, 57, 60, 76, 80, 72, 56, 72, 103, 67, 77, 63, 48, 59, 92]
  num_at_level27: [36, 35, 25, 28, 45, 37, 27, 36, 46, 27, 37, 26, 37, 58, 31, 36, 26, 28, 33, 39]
  num_at_level81: [20, 13, 12, 11, 23, 20, 13, 20, 23, 10, 13, 9, 18, 31, 16, 18, 11, 16, 19, 21]
[ASHA-RELU]:
  num_at_level1: [599, 670, 682, 817, 608, 585, 770, 397, 613, 721, 599, 601, 618, 718, 613, 674, 715, 638, 598, 652]
  num_at_level3: [201, 246, 242, 277, 225, 209, 282, 140, 212, 245, 202, 205, 215, 245, 207, 239, 238, 224, 221, 234]
  num_at_level9: [75, 94, 94, 100, 89, 92, 101, 60, 78, 89, 76, 82, 80, 98, 86, 96, 83, 84, 90, 91]
  num_at_level27: [37, 43, 36, 34, 40, 45, 39, 35, 34, 31, 40, 40, 38, 39, 35, 34, 29, 34, 41, 35]
  num_at_level81: [23, 19, 14, 13, 19, 21, 15, 24, 17, 13, 20, 18, 19, 18, 20, 16, 13, 15, 22, 17]
[MOBSTER-RELU]:
  num_at_level1: [241, 319, 352, 438, 354, 386, 197, 262, 203, 387, 320, 139, 359, 401, 334, 294, 361, 403, 178, 141]
  num_at_level3: [110, 156, 135, 166, 138, 143, 104, 124, 95, 136, 133, 71, 133, 151, 130, 122, 134, 151, 92, 74]
  num_at_level9: [50, 83, 59, 75, 59, 55, 57, 72, 53, 53, 58, 40, 62, 63, 61, 54, 52, 65, 48, 47]
  num_at_level27: [31, 51, 29, 31, 29, 23, 39, 38, 36, 20, 29, 36, 32, 29, 32, 29, 24, 27, 31, 34]
  num_at_level81: [20, 35, 12, 11, 12, 15, 22, 18, 26, 12, 16, 27, 16, 15, 20, 15, 15, 13, 18, 22]
  • There are four subfigures arranged as a two-by-two matrix. Each contains two curves in bold, along with confidence intervals. The horizontal axis depicts wall-clock time, and on the vertical axis, we show the best metric value found until this time.

  • More generally, the data from our 640 experiments can be grouped w.r.t. subplot, then setup. Each setup gives rise to one curve (bold, with confidence band). Subplots are optional; the default is to plot a single figure.

  • The function metadata_to_setup maps the metadata stored for an experiment to the setup name. In our basic case, the setup is simply the name of the method.

  • The function metadata_to_subplot maps the metadata to the subplot index (0, 1, 2, 3). We group setups with the same configuration space, but also split multi-fidelity and single-fidelity methods.

  • Once the data is grouped w.r.t. benchmark, then subplot (optional), then setup, we should be left with 20 experiments, one for each seed. These 20 curves are now interpolated to a common grid, and at each grid point, the 20 values are aggregated into lower, aggregate, upper. In the figure, aggregate is shown in bold, and lower, upper in dashed. Different aggregation modes are supported (selected by plot_params.aggregate_mode).

  • We pass extra_results_keys to the plot() method in order to also retrieve extra results. This method returns a dictionary, whose “extra_results” entry is what we need.

Advanced Experimenting

Once you start to run many experiments, you will get better at avoiding wasteful repetitions. Here are some ways in which Syne Tune can support you.

  • Combining results from several studies: It often happens that results for a new idea need to be compared to baselines on a common set of benchmarks. You do not have to re-run baselines, but can easily combine older results with more recent ones. This is explained here.

  • When running many experiments, some may fail. Syne Tune supports you in not having to re-run everything from scratch. As already noted above, when creating aggregate plots, it is important not to use incomplete results stored for failed experiments. The cleanest way to do so is to remove these results on S3. Another option is to filter out corrupt results:

    • If you forget about removing such corrupt results, you will get a reminder when creating ComparativeResults. Since you pass the list of setup names and the number of seeds (in num_runs), you get a warning when too many experiments have been found, along with the path names.

    • Results are stored on S3, using object name prefixes of the form <s3-bucket>/syne-tune/docs-2/ASHA/docs-2-fcnet-protein-7-2023-04-20-15-20-18-456/ or <s3-bucket>/syne-tune/docs-2/MOBSTER-7/docs-2-fcnet-protein-7-2023-04-20-15-20-00-677/. The pattern is <tag>/<method>/<tag>-<benchmark>-<seed>-<datetime>/ for cheap methods, and <tag>/<method>-<seed>/<tag>-<benchmark>-<seed>-<datetime>/ for expensive methods.

    • Instead of removing corrupt results on S3, you can also filter them by datetime, using the datetime_bounds argument of ComparativeResults. This allows you to define an open or closed datetime range for results you want to keep. If your failed attempts precede the ones that finally worked out, this type of filtering can save you the headache of removing files on S3.

    • Warning: When you remove objects on S3 for some experiment tag, it is strongly recommended to also remove all result files locally (so everything at ~/syne-tune/<tag>/) and sync them back from S3, using the command at the start of this section. Otherwise, aws s3 sync is prone to making mistakes, which are very hard to track down.

My Code Contains Packages

All code in benchmarking.examples.demo_experiment is contained in a single directory. If your code for launching experiments and defining benchmarks is structured into packages, you need to follow some extra steps.

You have two choices:

  • Either, you install Syne Tune from source. In this case, you can just keep your launcher scripts and benchmark definitions in there, and use absolute imports from benchmarking. One advantage of this is that you can use all benchmarks currently included in benchmarking.benchmark_definitions.

  • Or you do not install Syne Tune from source, in which case this section is for you.

We will use the example in benchmarking.nursery.odsc_tutorial. More details about this example are found in this tutorial. We will not assume that Syne Tune is installed from source, but just that the code from benchmarking.nursery.odsc_tutorial is present at <abspath>/odsc_tutorial/.

The root package for this example is transformer_wikitext2, meaning that all imports start from there. For example:

transformer_wikitext2/local/hpo_main.py
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_local import main


if __name__ == "__main__":
    main(methods, benchmark_definitions)

The code has the following structure:

tree transformer_wikitext2/
transformer_wikitext2
├── __init__.py
├── baselines.py
├── benchmark_definitions.py
├── code
│   ├── __init__.py
│   ├── requirements.txt
│   ├── training_script.py
│   ├── training_script_no_checkpoints.py
│   ├── training_script_report_end.py
│   └── transformer_wikitext2_definition.py
├── local
│   ├── __init__.py
│   ├── hpo_main.py
│   ├── launch_remote.py
│   ├── plot_learning_curve_pairs.py
│   ├── plot_learning_curves.py
│   ├── plot_results.py
│   └── requirements-synetune.txt
└── sagemaker
    ├── __init__.py
    ├── hpo_main.py
    ├── launch_remote.py
    ├── plot_results.py
    └── requirements.txt

Training code and benchmark definition are in code, launcher and plotting scripts for the local backend in local, and ditto for the SageMaker backend in sagemaker.

In order to run any of the scripts, the PYTHONPATH environment variable needs to be extended as follows:

export PYTHONPATH="${PYTHONPATH}:<abspath>/odsc_tutorial/"

Here, you need to replace <abspath> with the absolute path to odsc_tutorial. Once this is done, the following should work:

python transformer_wikitext2/local/hpo_main.py \
  --experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1

Of course, this script needs all training script dependencies to be installed locally. If you work with SageMaker, it is much simpler to launch experiments remotely. The launcher script is as follows:

transformer_wikitext2/local/launch_remote.py
from pathlib import Path

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_local import launch_remote


if __name__ == "__main__":
    entry_point = Path(__file__).parent / "hpo_main.py"
    source_dependencies = [str(Path(__file__).parent.parent)]
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        source_dependencies=source_dependencies,
    )

Importantly, you need to set source_dependencies in this script. Here, source_dependencies = [str(Path(__file__).parent.parent)] translates to ["<abspath>/odsc_tutorial/transformer_wikitext2"]. If you have multiple root packages you want to import from, source_dependencies must contain all of them.
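
For example, with a second, hypothetical root package my_other_package located next to transformer_wikitext2, the list would become:

source_dependencies = [
    "<abspath>/odsc_tutorial/transformer_wikitext2",
    "<abspath>/odsc_tutorial/my_other_package",  # hypothetical second package
]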

The following command should work now:

python transformer_wikitext2/local/launch_remote.py \
  --experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1 \
  --method BO

This should launch one SageMaker training job, which runs Bayesian optimization with 4 workers. You can also test remote launching with the SageMaker backend:

python transformer_wikitext2/sagemaker/launch_remote.py \
  --experiment_tag mydebug --benchmark transformer_wikitext2 --num_seeds 1 \
  --method BO --n_workers 2

This command should launch one SageMaker training job running Bayesian optimization with the SageMaker backend, meaning that at any given time, two worker training jobs are running.

How to Contribute a New Scheduler

This tutorial guides developers and researchers to contribute a new scheduler to Syne Tune, or to modify and extend an existing one.

We hope this information inspires you to give it a try. Please do consider contributing your efforts to Syne Tune:

  • Reproducible research: Syne Tune contains careful implementations of many baselines and SotA algorithms. Once your new method is in there, you can compare apples against apples (same backend, same benchmarks, same stopping rules) instead of apples against oranges.

  • Faster and cheaper: You have a great idea for a new scheduler? Test it right away on a large range of benchmarks. Use Syne Tune’s blackbox repository and simulator backend in order to dramatically cut compute costs and waiting time.

  • Impact: If you compared your method to a range of others, you know how hard it is to get full-fledged HPO code of others running. Why would it be any different for yours? We did a lot of the hard work already, why not benefit from that?

  • Your code is more awesome than ours? Great! Why not contribute your backend or your benchmarks to Syne Tune as well?

Note

In order to develop new methodology in Syne Tune, make sure to use an installation from source. In particular, you need to have installed the dev dependencies.

A First Example

In this section, we start with a simple example and clarify some basic concepts.

If you have not done so, we recommend you have a look at Basics of Syne Tune in order to get familiar with basic concepts of Syne Tune.

First Example

A simple example for a new scheduler (called SimpleScheduler) is given by the script examples/launch_height_standalone_scheduler.py. All schedulers are subclasses of TrialScheduler. Important methods include:

  • Constructor: Needs to be passed the configuration space. Most schedulers also have metric (name of metric to be optimized) and mode (whether metric is to be minimized or maximized; default is "min").

  • _suggest (internal version of suggest): Called by the Tuner whenever a worker is available. Returns trial to execute next, which in most cases will start a new configuration using trial ID trial_id (as start_suggestion). Some schedulers may also suggest to resume a paused trial (as resume_suggestion). Our SimpleScheduler simply draws a new configuration at random from the configuration space.

  • on_trial_result: Called by the Tuner whenever a new result reported by a running trial has been received. Here, trial provides information about the trial (most important is trial.trial_id), and result contains the arguments passed to Reporter by the underlying training script. All but the simplest schedulers maintain a state which is modified based on this information. The scheduler also decides what to do with this trial, returning a SchedulerDecision to the Tuner, which in turn relays this decision to the backend. Our SimpleScheduler maintains a sorted list of all metric values reported in self.sorted_results. Whenever a trial reports a metric value which is worse than 4/5 of all previous reports (across all trials), the trial is stopped, otherwise it may continue. This is an example for a multi-fidelity scheduler, in that a trial reports results multiple times (for example, a script training a neural network may report validation errors at the end of each epoch). Even if your scheduler does not support a multi-fidelity setup, in that it does not make use of intermediate results, it should work properly with training scripts which report such results (e.g., after every epoch).

  • metric_names: Returns names of metrics which are relevant to this scheduler. These names appear as keys in the result dictionary passed to on_trial_result.

There are further methods in TrialScheduler, which will be discussed in detail below. This simple scheduler is also missing the points_to_evaluate argument, which we recommend every new scheduler support, and which is discussed in more detail here.
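
To make these concepts concrete, here is a minimal sketch of a pure random search scheduler, including points_to_evaluate support. It is a simplified variant written for this tutorial, not the SimpleScheduler from the example script; it only assumes the API described above (TrialScheduler, TrialSuggestion, SchedulerDecision):

from typing import Any, Dict, List, Optional

from syne_tune.backend.trial_status import Trial
from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    SchedulerDecision,
    TrialScheduler,
    TrialSuggestion,
)


class RandomSearchSketch(TrialScheduler):
    """Suggests configurations at random; never stops or pauses trials."""

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: str,
        mode: str = "min",
        points_to_evaluate: Optional[List[dict]] = None,
    ):
        super().__init__(config_space)
        self.metric = metric
        self.mode = mode
        # Initial configurations to be suggested before random ones
        self._points_to_evaluate = list(points_to_evaluate or [])

    def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        if self._points_to_evaluate:
            config = self._points_to_evaluate.pop(0)
        else:
            # Sample each hyperparameter; keep fixed attributes as they are
            config = {
                name: value.sample() if isinstance(value, Domain) else value
                for name, value in self.config_space.items()
            }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
        # Every reporting trial is allowed to continue
        return SchedulerDecision.CONTINUE

    def metric_names(self) -> List[str]:
        return [self.metric]

    def metric_mode(self) -> str:
        return self.mode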

Basic Concepts

Recall from Basics of Syne Tune that an HPO experiment is run as interplay between a backend and a scheduler, which is orchestrated by the Tuner. The backend starts, stops, pauses, or resumes training jobs and relays their reports. A trial abstracts the evaluation of a hyperparameter configuration. There is a diverse range of schedulers which can be implemented in Syne Tune, some examples are:

  • Simple “full evaluation” schedulers. These suggest configurations for new trials, but do not try to interact with running trials, even if the latter post intermediate results. A basic example is FIFOScheduler, to be discussed below.

  • Early-stopping schedulers. These require trials to post intermediate results (e.g., validation errors after every epoch), and their on_trial_result may stop underperforming trials early. An example is HyperbandScheduler with type="stopping".

  • Pause-and-resume schedulers. These require trials to post intermediate results (e.g., validation errors after every epoch). Their on_trial_result may pause trials at certain points in time, and their _suggest may decide to resume a paused trial instead of starting a new one. An example is HyperbandScheduler with type="promotion".

Note

The method on_trial_result() returns a SchedulerDecision, signaling the tuner to continue, stop, or pause the reporting trial. The difference between pause and stop is important. If a trial is stopped, it cannot be resumed later on. In particular, its checkpoints may be removed (if the backend is created with delete_checkpoints=True). On the other hand, if a trial is paused, it may be resumed in the future, and its most recent checkpoint is retained (more details are given here).

Asynchronous Job Execution

One important constraint on any scheduler to be run in Syne Tune is that calls to both suggest and on_trial_result have to be non-blocking: they need to return instantaneously, i.e. must not wait for some future events to happen. This is to ensure that in the presence of several workers (i.e., parallel execution resources), idle time is avoided: Syne Tune always executes parallel jobs asynchronously.

Unfortunately, many HPO algorithms proposed in the literature assume a synchronous job execution setup, often for conceptual simplicity (examples include successive halving and Hyperband, as well as batch suggestions for Bayesian optimization). In general, it just takes a little extra effort to implement non-blocking versions of these, and Syne Tune provides ample support code for doing so, as will be demonstrated in detail.

Searchers and Schedulers

Many HPO algorithms have a modular structure. They need to make decisions about how to keep workers busy in order to obtain new information (suggest), and they need to react to new results posted by trials (on_trial_result). Most schedulers make these decisions following a general principle, such as:

  • Random search: New configurations are sampled at random.

  • Bayesian optimization: Surrogate models representing metrics are fit to result data, and they are used to make decisions (mostly suggest). Examples include Gaussian process based BO or TPE (Tree Parzen Estimator).

  • Evolutionary search: New configurations are obtained by mutating well-performing members of a population.

Once such internal structure is recognized, we can use it to expand the range of methods while maintaining simple, modular implementations. In Syne Tune, this is done by configuring generic schedulers with internal searchers. A basic example is given below, more advanced examples follow further below.
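
As a quick illustration (a sketch with a made-up configuration space and metric name), the same generic FIFOScheduler can run random search or Gaussian process based Bayesian optimization simply by swapping its searcher:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {"lr": loguniform(1e-4, 1e-1)}  # made-up configuration space

# Random search: new configurations are sampled at random
scheduler = FIFOScheduler(
    config_space, searcher="random", metric="validation_error", mode="min"
)

# Bayesian optimization: a Gaussian process surrogate model drives suggest
scheduler = FIFOScheduler(
    config_space, searcher="bayesopt", metric="validation_error", mode="min"
)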

If you are familiar with Ray Tune, please note a difference in terminology. In Ray Tune, searcher and scheduler are two independent concepts, mapping to different decisions to be made by an HPO algorithm. In Syne Tune, the HPO algorithm is represented by the scheduler, which may have a searcher as component. We found that once model-based HPO is embraced (e.g., Bayesian optimization), this creates strong dependencies between suggest and stop or resume decisions, so that the supposed modularity does not really exist.

Maybe the most important recommendation for implementing a new scheduler in Syne Tune is this: be lazy!

The TrialScheduler API

In this section, we have a closer look at the TrialScheduler API, and how a scheduler interacts with the trial backend.

Interaction between TrialScheduler and TrialBackend

Syne Tune supports a multitude of automatic tuning scenarios which embrace asynchronous job execution. The goal of automatic tuning is to find a configuration whose evaluation results in a sufficiently small (or large, if mode="max") metric value, and to do so as fast as possible. This is done by starting trials with promising configurations (suggest), and (optionally) by stopping or pausing trials which underperform. A certain number of such evaluation (or training) jobs can be executed in parallel, on separate workers (which can be different GPUs or CPU cores on the same instance, or different instances).

In Syne Tune, this process is split between two entities: the trial backend and the trial scheduler. The backend wraps the training code to be executed for different configurations and is responsible for starting jobs, as well as stopping, pausing, or resuming them. It also collects results reported by the training jobs and relays them to the scheduler. In Syne Tune, pause-and-resume scheduling is done via checkpointing. While code to write and load checkpoints locally must be provided by the training script, the backend makes them available when needed. There are two basic events which happen repeatedly during an HPO experiment, as orchestrated by the Tuner:

  • The Tuner polls the backend, which signals that one or more workers are available. For each free worker, it calls suggest(), asking for what to do next. As already seen in our first example, the scheduler will typically suggest a configuration for a new trial to be started. On the other hand, a pause-and-resume scheduler may also suggest to resume a trial which is currently paused (having been started, and then paused, in the past). Based on the scheduler response, the Tuner asks the backend to start a new trial, or to resume an existing one.

  • The Tuner polls the backend for new results, reported since the most recent poll. For each such result, on_trial_result() is called. The scheduler decides what to do with the reporting trial. Based on this decision, the Tuner asks the backend to stop or pause the trial (or does nothing, in case the trial is to continue).

The processing of these events is non-blocking and fully asynchronous, without any synchronization points. Depending on the backend, there can be substantial delays between a trial reporting a result and a stop or pause decision being executed. During this time, the training code simply continues; it may even report further results. Moreover, a worker may be idle between finishing an evaluation and starting or resuming another one, due to delays in the backend or even compute time for decisions in the scheduler. However, it will never be idle having to wait for results from other trials.
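This division of labor is reflected in how an experiment is set up. The sketch below assumes a hypothetical training script train_script.py which reports validation_error together with the epoch number; the scheduler choice (ASHA) and all names are placeholders, not a prescribed setup.

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ASHA

max_epochs = 10
config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "epochs": max_epochs,
}
tuner = Tuner(
    # The backend wraps the training code: it starts, stops, pauses, resumes trials
    trial_backend=LocalBackend(entry_point="train_script.py"),
    # The scheduler decides which configurations to run, and for how long
    scheduler=ASHA(
        config_space,
        metric="validation_error",
        mode="min",
        resource_attr="epoch",
        max_resource_attr="epochs",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,  # number of trials evaluated in parallel
)
tuner.run()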

TrialScheduler API

We now discuss additional aspects of the TrialScheduler API, beyond what has already been covered here:

  • suggest returns a TrialSuggestion object with fields spawn_new_trial_id, checkpoint_trial_id, config. Here, start_suggestion() has spawn_new_trial_id=True and requires config. A new trial is to be started with configuration config. Typically, this trial starts training from scratch. However, some specific schedulers allow the trial to warm-start from a checkpoint written for a different trial (an example is PopulationBasedTraining). A pause-and-resume scheduler may also return resume_suggestion(), where spawn_new_trial_id=False and checkpoint_trial_id is mandatory. In this case, a currently paused trial with ID checkpoint_trial_id is to be resumed. Typically, the configuration of the trial does not change, but if config is used, the resumed trial is assigned a new configuration. However, for all schedulers currently implemented in Syne Tune, a trial’s configuration never changes.

  • The only reason for suggest to return None is if no further suggestion can be made. This can happen if the configuration space has been exhausted. As discussed here, the scheduler cannot delay a suggest decision to a later point in time.

  • The helper methods _preprocess_config and _postprocess_config are used when interfacing with a searcher. Namely, the configuration space (member config_space) may contain any number of fixed attributes alongside the hyperparameters to be tuned (the latter have values of type Domain), and each hyperparameter has a specific value_type (mostly float, int or str). Searchers require clean configurations, containing only hyperparameters with the correct value types, which is ensured by _preprocess_config. Also, _postprocess_config adds back the fixed attributes from config_space, unless they have already been set.

  • on_trial_add: This method is called by Tuner once a new trial has been scheduled to be started. In general, a scheduler may assume that if suggest returns start_suggestion(), the corresponding trial is going to be started, so on_trial_add is not mandatory.

  • on_trial_error: This method is called by Tuner if the backend reports a trial’s evaluation to have failed. A useful reaction for the scheduler is to not propose this configuration again, and also to remove pending evaluations associated with this trial.

  • on_trial_complete: This method is called once a trial’s evaluation is complete, without having been stopped early. The final reported result is passed here. Schedulers which ignore intermediate reports from trials may just implement this method and have on_trial_result return SchedulerDecision.CONTINUE. Multi-fidelity schedulers may ignore this method, since any reported result is transmitted via on_trial_result (the final result is transmitted twice, first via on_trial_result, then via on_trial_complete).

  • on_trial_remove is called when a trial gets stopped or paused, so is not running anymore, but also did not finish naturally. Once more, this method is not mandatory.
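To tie these methods together, here is a condensed sketch of a scheduler implementing this API: random suggestions, no stopping or pausing. It is similar in spirit to the first example referenced earlier; random seed handling and state serialization are omitted for brevity.

from typing import Optional, List, Dict, Any

from syne_tune.backend.trial_status import Trial
from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    TrialScheduler,
    TrialSuggestion,
    SchedulerDecision,
)


class SimpleRandomScheduler(TrialScheduler):
    """Toy scheduler: starts trials with random configurations, never stops them."""

    def __init__(self, config_space: Dict[str, Any], metric: str, mode: str = "min"):
        super().__init__(config_space)
        self.metric = metric
        self.mode = mode

    def _suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Sample each hyperparameter from its domain; fixed attributes are copied
        config = {
            name: domain.sample() if isinstance(domain, Domain) else domain
            for name, domain in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial: Trial, result: Dict[str, Any]) -> str:
        # A real scheduler could return STOP or PAUSE based on the result
        return SchedulerDecision.CONTINUE

    def metric_names(self) -> List[str]:
        return [self.metric]

    def metric_mode(self) -> str:
        return self.mode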

Wrapping External Scheduler Code

One of the most common instances of extending Syne Tune is wrapping external code. While there are comprehensive open source frameworks for HPO, many recent advanced algorithms are only available as research code, which typically ignores systems aspects such as distributed scheduling, or does not maintain results in an interchangeable format. Due to the modular, backend-agnostic design of Syne Tune, external scheduler code is easily integrated, and can then be compared “apples to apples” against a host of baselines, be it by fast simulation on surrogate benchmarks, or distributed across several machines.

In this chapter, we will walk through an example of how to wrap Gaussian process based Bayesian optimization from BoTorch.

BoTorchSearcher

While Syne Tune supports Gaussian process based Bayesian optimization natively via GPFIFOSearcher, with searcher="bayesopt" in FIFOScheduler, you can also use BoTorch via BoTorchSearcher, with searcher="botorch" in FIFOScheduler.

Before we look into the code, note that even though we wrap external HPO code, we still need to implement some details on our side:

  • We need to maintain the trials which have resulted in observations, as well as those which are pending (i.e., have been started, but have not yet returned an observation).

  • We need to provide the code for suggesting initial configurations, either drawing from points_to_evaluate, or sampling at random.

  • We need to avoid duplicate suggestions if allow_duplicates == False.

  • BoTorch requires configurations to be encoded as vectors with values in \([0, 1]\). We need to provide this encoding and decoding as well.

Such details are often ignored in research code (in fact, most HPO code just implements the equivalent of get_config(), given all previous data), but they have robust and easy-to-use solutions in Syne Tune, as we demonstrate here. Let us start with _get_config():

syne_tune/optimizer/schedulers/searchers/botorch/botorch_searcher.py
    def _get_config(self, trial_id: str, **kwargs) -> Optional[dict]:
        trial_id = int(trial_id)
        config_suggested = self._next_initial_config()

        if config_suggested is None:
            if len(self.objectives()) < self.num_minimum_observations:
                config_suggested = self._get_random_config()
            else:
                config_suggested = self._sample_next_candidate()

        if config_suggested is not None:
            self.trial_configs[trial_id] = config_suggested

        return config_suggested

  • First, self._next_initial_config() is called, which returns a configuration from points_to_evaluate if there is still one not yet returned, otherwise None.

  • Otherwise, if fewer than self.num_minimum_observations trials have returned observations, we return a randomly sampled configuration (self._get_random_config()); otherwise one suggested by BoTorch (self._sample_next_candidate()).

  • Here, self._get_random_config() is implemented in the base class StochasticAndFilterDuplicatesSearcher and calls the same code as all other schedulers employing random suggestions in Syne Tune. In particular, this function allows passing an exclusion list of configurations to avoid.

  • The exclusion list self._excl_list is maintained in the base class StochasticAndFilterDuplicatesSearcher. If allow_duplicates == False, it contains all configurations suggested previously. Otherwise, it contains configurations of failed or pending trials, which we want to avoid in any case. The exclusion list is implemented as ExclusionList. Configurations are represented by hash strings which are independent of details such as floating point resolution.

  • If allow_duplicates == False and the configuration space is finite, it can happen that all configurations have already been suggested, in which case get_config returns None.

  • Finally, _get_config is called in get_config(), where if allow_duplicates == False, the new configuration is added to the exclusion list.

  • In _sample_next_candidate(), the usage of self._restrict_configurations is of interest. It relates to the restrict_configurations argument. If this is not None, configurations are suggested from a finite set, namely those in self._restrict_configurations. If allow_duplicates == False, entries are removed from there once suggested. For our example, we need to avoid doing a local optimization of the acquisition function (via optimize_acqf) in this case, but use _sample_and_pick_acq_best() instead. Since the latter uses self._get_random_config(), we are all set, since this makes use of self._restrict_configurations already.

Other methods are straightforward:

  • We also take care of pending evaluations (i.e. trials whose observations have not been reported yet). In register_pending(), the trial ID is added to self.pending_trials.

  • _update() stores the metric value from result[self._metric], where self._metric is the name of the primary metric. Also, the trial is removed from self.pending_trials, so it ceases to be pending.

  • By implementing evaluation_failed() and cleanup_pending(), we make sure that failed trials do not remain pending.

  • configure_scheduler() is a callback which allows the searcher to depend on its scheduler. In particular, the searcher should reject non-supported scheduler types. The base class implementation configure_scheduler() sets self._metric and self._mode from the corresponding attributes of the scheduler, so they do not have to be set at construction of the searcher.

Finally, all the code specific to BoTorch is located in _sample_next_candidate() and other internal methods. Importantly, BoTorch requires configurations to be encoded as vectors with values in \([0, 1]\), which is done using the self._hp_ranges member, as is detailed below.

Note

When implementing a new searcher, whether from scratch or wrapping external code, we recommend you use the base class StochasticAndFilterDuplicatesSearcher and implement the allow_duplicates argument. This will also give you proper random seed management and points_to_evaluate. Instead of get_config, you implement the internal method _get_config. If you need to draw configurations at random, use the method _get_random_config which uses the built-in exclusion list, properly deals with configuration spaces of finite size, and uses the random generator seeded in a consistent and reproducible way.

We also recommend that you implement the restrict_configurations argument, unless this is hard to do for your scheduler. Often, a scheduler can be made to score a certain number of configurations and return the best. If so, you use self._get_random_config() to select the configurations to score, which takes care of restrict_configurations.

HyperparameterRanges

Most model-based HPO algorithms require configurations to be encoded as vectors with values in \([0, 1]\). If \(\mathbf{u} = e(\mathbf{x})\) and \(\mathbf{x} = d(\mathbf{u})\) denote encoding and decoding map, where \(\mathbf{x}\in \mathcal{X}\) is a configuration and \(\mathbf{u} \in [0,1]^k\), then \(d(e(\mathbf{x})) = \mathbf{x}\) for every configuration \(\mathbf{x}\), and a random sample \(d(\mathbf{u})\), where the components of \(\mathbf{u}\) are sampled uniformly at random, is equivalent to a random sample from the configuration space, as defined by the hyperparameter domains.

With HyperparameterRanges, Syne Tune provides encoding and decoding for all domains in syne_tune.config_space (see this tutorial for a summary). In fact, this API can be implemented in different ways, and the factory function make_hyperparameter_ranges() can be used to create a HyperparameterRanges object from a configuration space.

  • to_ndarray() provides the encoding map \(e(\mathbf{x})\), and to_ndarray_matrix() encodes a list of configurations into a matrix.

  • from_ndarray() provides the decoding map \(d(\mathbf{u})\).

  • config_to_match_string() maps a configuration to a hash string which can be used to test for (approximate) equality (see allow_duplicates discussion above).

Apart from encoding and decoding, HyperparameterRanges provides further functionalities, such as support for a resource attribute in model-based multi-fidelity schedulers, or the active_config_space feature which is useful to support transfer tuning (i.e., HPO in the presence of evaluation data from earlier experiments with different configuration spaces).
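As a quick illustration, here is a sketch of encoding and decoding with this API. The configuration space is made up, and the import path of make_hyperparameter_ranges is the one we believe is current; it may differ between versions.

from syne_tune.config_space import choice, randint, uniform
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

config_space = {
    "learning_rate": uniform(1e-6, 1e-3),
    "num_layers": randint(1, 8),
    "activation": choice(["relu", "tanh"]),
}
hp_ranges = make_hyperparameter_ranges(config_space)

config = {"learning_rate": 1e-4, "num_layers": 4, "activation": "tanh"}
encoded = hp_ranges.to_ndarray(config)        # vector with values in [0, 1]
decoded = hp_ranges.from_ndarray(encoded)     # maps back to a configuration
match_str = hp_ranges.config_to_match_string(config)  # hash string for duplicate checks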

Note

When implementing a new searcher or wrapping external code, we recommend you use HyperparameterRanges in order to encode and decode configurations as vectors, instead of writing this on your own. Doing so ensures that your searcher supports all hyperparameter domains offered by Syne Tune, even new ones potentially added in the future. If you do not like the built-in implementation of the HyperparameterRanges API, feel free to contribute a different one.

Managing Dependencies

External code can come with extra dependencies. For example, BoTorchSearcher depends on torch, botorch, and gpytorch. If you just use Syne Tune for your own experiments, you do not have to worry about this. However, we strongly encourage you to contribute back your extension.

Since some applications of Syne Tune must run with a restricted set of dependencies, these are carefully managed. There are different installation options, each of which comes with a requirements.txt file (see setup.py for details).

  • First, check whether any of the installation options cover the dependencies of your extension (possibly a union of several of them). If so, please use conditional imports w.r.t. these (see below).

  • If the required dependencies are not covered, you can create a new installation option (say, foo), via requirements-foo.txt and a modification of setup.py. In this case, please also extend try_import by a function try_import_foo_message.

Once all required dependencies are covered by some installation option, wrap their imports as follows:

try:
    from foo import bar  # My dependencies
    # ...
except ImportError:
    print(try_import_foo_message())

Extending Asynchronous Hyperband

Syne Tune provides powerful generic scheduler templates for popular methods like successive halving and Hyperband. These can be run with synchronous or asynchronous decision-making. The most important generic templates at the moment are HyperbandScheduler (asynchronous successive halving and Hyperband) and SynchronousHyperbandScheduler (synchronous Hyperband), both discussed below.

Chances are your idea for a new scheduler maps to one of these templates, in which case you can save a lot of time and headache by extending the template, rather than reinventing the wheel. Due to Syne Tune’s modular design of schedulers and their components (e.g., searchers, decision rules), you may even get more than you bargained for.

In this section, we will walk through an example of how to furnish the asynchronous successive halving scheduler with a specific searcher.

HyperbandScheduler

Details about asynchronous successive halving and Hyperband are given in the Multi-fidelity HPO tutorial. This is a multi-fidelity scheduler, where trials report intermediate results (e.g., validation error at the end of each epoch of training). We can formalize this notion by the concept of a resource \(r = 1, 2, 3, \dots\) (e.g., \(r\) is the number of epochs trained). A generic implementation of this method is provided in HyperbandScheduler. Let us have a look at its arguments not shared with the base class FIFOScheduler:

  • A mandatory argument is resource_attr, which is the name of a field in the result dictionary passed to scheduler.on_trial_result. This field contains the resource \(r\) for which metric values have been reported. For example, if a trial reports validation error at the end of the 5-th epoch of training, result contains {resource_attr: 5}.

  • We already noted the arguments max_resource_attr and max_t in FIFOScheduler. They are used to determine the maximum resource \(r_{max}\) (e.g., the total number of epochs a trial is to be trained, if not stopped before). As discussed in detail here, it is best practice to reserve a field in the configuration space scheduler.config_space to contain \(r_{max}\). If this is done, its name should be passed in max_resource_attr. Now, every configuration sent to the training script contains \(r_{max}\), which should not be hardcoded in the script. Moreover, if max_resource_attr is used, a pause-and-resume scheduler (e.g., HyperbandScheduler with type="promotion") can modify this field in the configuration of a trial which is only to be run until a certain resource less than \(r_{max}\). If max_resource_attr is not used, then \(r_{max}\) has to be passed explicitly via max_t (see also the construction sketch after this list).

  • reduction_factor, grace_period, brackets are important parameters detailed in the tutorial. If brackets > 1, we run asynchronous Hyperband with this number of brackets, while for brackets == 1 we run asynchronous successive halving (this is the default).

  • As detailed in the tutorial, type determines whether the method uses early stopping (type="stopping") or pause-and-resume scheduling (type="promotion"). Further choices of type activate specific algorithms such as RUSH, PASHA, or cost-sensitive successive halving.
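Putting these arguments together, here is a sketch of constructing HyperbandScheduler for pause-and-resume scheduling. Configuration space, metric, and attribute names are placeholders; the training script is assumed to report validation_error along with the epoch number under the name epoch.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

max_epochs = 81
config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
    # Best practice: r_max lives in the configuration space (see max_resource_attr)
    "epochs": max_epochs,
}
scheduler = HyperbandScheduler(
    config_space,
    type="promotion",            # pause-and-resume; "stopping" for early stopping
    searcher="random",
    metric="validation_error",
    mode="min",
    resource_attr="epoch",       # field reported by the training script
    max_resource_attr="epochs",  # name of the r_max entry in config_space
    grace_period=1,
    reduction_factor=3,
)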

Kernel Density Estimator Searcher

One of the most flexible ways of extending HyperbandScheduler is to provide it with a novel searcher. In order to understand how this is done, we will walk through MultiFidelityKernelDensityEstimator and KernelDensityEstimator. This searcher implements suggest as in BOHB, as also detailed in this tutorial. In a nutshell, the searcher splits all observations into two parts (good and bad), depending on metric values lying above or below a certain quantile, and fits kernel density estimators to these two subsets. It then makes decisions based on a particular ratio of these densities, which approximates a variant of the expected improvement acquisition function.
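If you simply want to use this searcher, it can be selected by name when creating HyperbandScheduler (this is the BOHB combination); the configuration space and names below are placeholders:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-6, 1e-2),
    "num_layers": randint(1, 8),
    "epochs": 81,
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",              # kernel density estimator searcher
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)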

We begin with the base class KernelDensityEstimator, which works with schedulers implementing TrialSchedulerWithSearcher (the most important one being FIFOScheduler), but already implements most of what is needed in the multi-fidelity context.

  • The code does quite some bookkeeping concerned with mapping configurations to feature vectors. If you want to do this from scratch for your searcher, we recommend using HyperparameterRanges. However, KernelDensityEstimator was extracted from the original BOHB implementation.

  • Observation data is collected in self.X (feature vectors for configurations) and self.y (values for self._metric, negated if self.mode == "max"). In particular, the _update method simply appends new data to these members.

  • get_config fits KDEs to the good and bad parts of self.X, self.y. It then samples self.num_candidates configurations at random, evaluates the TPE acquisition function for each candidate, and returns the best one.

  • configure_scheduler is a callback which allows the searcher to check whether its scheduler is compatible, and to depend on details of this scheduler. In our case, we check whether the scheduler implements TrialSchedulerWithSearcher, which is the minimum requirement for a searcher.

Note

Any scheduler configured by a searcher should inherit from TrialSchedulerWithSearcher, which mainly makes sure that configure_scheduler() is called before the searcher is first used. It is also strongly recommended to implement configure_scheduler for a new searcher, restricting usage to compatible schedulers.

The class MultiFidelityKernelDensityEstimator inherits from KernelDensityEstimator:

  • On top of self.X and self.y, it also maintains resource values \(r\) for each datapoint in self.resource_levels.

  • get_config remains the same, only its subroutine train_kde for training the good and bad density models is modified. The idea is to fit these to data from a single rung level, namely the largest level at which we have observed at least self.num_min_data_points points.

  • configure_scheduler restricts usage to schedulers implementing MultiFidelitySchedulerMixin, which all multi-fidelity schedulers need to inherit from (examples are HyperbandScheduler for asynchronous Hyperband and SynchronousHyperbandScheduler for synchronous Hyperband). It also calls configure_scheduler(). Moreover, self.resource_attr is obtained from the scheduler, so does not have to be passed.

Note

Any multi-fidelity scheduler configured by a searcher should inherit from both TrialSchedulerWithSearcher and MultiFidelitySchedulerMixin. The latter is a basic API to be implemented by multi-fidelity schedulers, which is used by the configure_scheduler of searchers specialized to multi-fidelity HPO. Doing so makes sure any new multi-fidelity scheduler can seamlessly be used with any such searcher.

While being functional and simple, the MultiFidelityKernelDensityEstimator does not showcase the full range of information exchanged between HyperbandScheduler and a searcher. In particular:

  • register_pending: BOHB does not take pending evaluations into account.

  • remove_case, evaluation_failed are not implemented.

  • get_state, clone_from_state are not implemented, so schedulers with this searcher are not properly serialized.

For a more complete and advanced example, the reader is invited to study GPMultiFidelitySearcher and GPFIFOSearcher. These searchers take pending evaluations into account (by way of fantasizing). Moreover, they can be configured with a Gaussian process model and an acquisition function, which is optimized in a gradient-based manner.

Moreover, as already noted here, HyperbandScheduler also allows configuring the decision rule for stop/continue or pause/resume as part of on_trial_result. Examples for this are found in StoppingRungSystem, PromotionRungSystem, RUSHStoppingRungSystem, PASHARungSystem, CostPromotionRungSystem.

Extending Synchronous Hyperband

In the previous section, we gave an example of how to extend asynchronous Hyperband with a new searcher. Syne Tune also provides a scheduler template for synchronous Hyperband. In this section, we will walk through an example of how to extend this template.

Our example here is somewhat more advanced than the one given for asynchronous Hyperband. In fact, we will walk through the implementation of Differential Evolution Hyperband (DEHB) in Syne Tune. Readers who are not interested in how to extend synchronous Hyperband may skip this section without loss.

Synchronous Hyperband

The differences between synchronous and asynchronous successive halving and Hyperband are detailed in this tutorial. In a nutshell, synchronous Hyperband uses rung levels of a priori fixed size, and decisions on which trials to promote to the next level are only done when all slots in the current rung are filled. In other words, promotion decisions are synchronized, while the execution of parallel jobs still happens asynchronously. This requirement poses slight additional challenges for an implementation, over what is said in published work. We start with an overview of SynchronousHyperbandScheduler. Concepts such as resource, rung, bracket, grace period \(r_{min}\), reduction factor \(\eta\) are detailed in this tutorial.

SynchronousHyperbandBracket represents a bracket, consisting of a list of rungs, where each rung is defined by (rung_size, level): rung_size is the number of slots, level the resource level. Any system of rungs is admissible, as long as rung_size is strictly decreasing and level is strictly increasing (a sketch of a typical geometric layout is given after the list below).

  • Any active bracket (i.e., supporting running trials) has a self.current_rung, where not all slots are occupied.

  • A slot in the current rung can be occupied, pending, or free. A slot is free if it has not been associated with a trial yet. It is pending if it is associated with a trial, but the latter has not returned a metric value yet. It is occupied if it contains a metric value. A rung is worked on by turning free slots to pending by associating them with a trial, and turning pending slots to occupied when their trials return values.

  • next_free_slot: Returns SlotInRung information about the next free slot, or None if all slots are occupied or pending. This method is called as part of suggest.

  • on_result: This method is called as part of on_trial_result, when a trial reports the result a pending slot is waiting for. The corresponding slot becomes occupied. If this action renders the rung complete (i.e., all slots are occupied), then _promote_trials_at_rung_complete is called. This method increases self.current_rung and populates the trial_id fields by the top performers of the rung just completed. All slots in the new rung are free. Note that the trial_id fields of the first rung are assigned to None at the beginning, they are set by the caller (using new trial_id values provided by the backend).
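To make the rung layout concrete, here is a small, self-contained sketch (plain Python, not Syne Tune code) of a typical geometric layout satisfying these constraints, for grace period \(r_{min} = 1\), reduction factor \(\eta = 3\), and maximum resource 27. Each bracket drops the lowest rung of its predecessor; Syne Tune’s default rung sizes may differ in details.

def hyperband_rung_systems(r_min: int, eta: int, r_max: int):
    # Geometrically spaced resource levels: r_min, r_min * eta, ..., r_max
    levels = []
    r = r_min
    while r < r_max:
        levels.append(r)
        r *= eta
    levels.append(r_max)
    num_rungs = len(levels)
    # Bracket b starts at level b and shrinks rung sizes by a factor eta per rung
    return [
        [
            (eta ** (num_rungs - b - 1 - k), level)
            for k, level in enumerate(levels[b:])
        ]
        for b in range(num_rungs)
    ]


for bracket, rungs in enumerate(hyperband_rung_systems(r_min=1, eta=3, r_max=27)):
    print(f"bracket {bracket}: {rungs}")
# bracket 0: [(27, 1), (9, 3), (3, 9), (1, 27)]
# bracket 1: [(9, 3), (3, 9), (1, 27)]
# bracket 2: [(3, 9), (1, 27)]
# bracket 3: [(1, 27)]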

SynchronousHyperbandBracketManager maintains all brackets during an experiment. It is configured by a list of brackets, where each bracket has one rung fewer than its predecessor. The Hyperband algorithm cycles through this RungSystemsPerBracket in a round robin fashion. The bracket manager relays next_job and on_result calls to the correct SynchronousHyperbandBracket. The first bracket which is not yet complete is the primary bracket.

  • next_job: The preferred bracket to take the job (via next_free_slot) is the primary one. However, a bracket may not be able to take the job, because its current rung has no free slots (i.e., they are all occupied or pending). In this case, the manager scans successive brackets. If no existing bracket can take the job, a new bracket is created.

Given these classes, SynchronousHyperbandScheduler is straightforward. It is a pause-and-resume scheduler, and it implements the API MultiFidelitySchedulerMixin, so that any searchers supporting multi-fidelity schedulers can be used. More precisely, SynchronousHyperbandScheduler inherits from SynchronousHyperbandCommon, which derives from TrialSchedulerWithSearcher and MultiFidelitySchedulerMixin and collects some code used during construction.

  • _suggest polls self.bracket_manager.next_job(). If the SlotInRung returned has trial_id assigned, it corresponds to a trial to be promoted, so the decision is resume_suggestion(). Otherwise, the scheduler decides for start_suggestion() with a new trial_id, which also updates the SlotInRung.trial_id field. In any case, the scheduler maintains the currently pending slots in self._trial_to_pending_slot.

  • on_trial_result relays information back via self.bracket_manager.on_result((bracket_id, slot_in_rung)), as long as trial_id appears in self._trial_to_pending_slot and has reached its required rung level.

Differential Evolution Hyperband

We will now have a closer look at the implementation of DEHB in Syne Tune, which is a recent extension of synchronous Hyperband, where configurations of trials are chosen by evolutionary computations (mutation, cross-over, selection). This example is more advanced than the one above, in that we need to do more than furnishing SynchronousHyperbandScheduler with a new searcher. The only time when a searcher suggests configurations is at the very start, when the first rung of the first bracket is filled. All further configurations are obtained by evolutionary means.

The main difference between DEHB and synchronous Hyperband is how configurations to be evaluated in a rung are chosen, based on trials in the rung above and in earlier brackets. In synchronous Hyperband, we simply promote the best performing trials from the rung above. In particular, the configurations do not change, and trials paused in the rung above are resumed. In DEHB, this promotion process is more complicated, and importantly, it leads to new trials with different configurations. This means that trials are not resumed in DEHB. Moreover, each configuration attached to a trial is represented by an encoded vector with values in \([0, 1]\), where the mapping from vectors to configurations is not invertible if the configuration space contains discrete parameters. Much the same is done in Gaussian process based Bayesian optimization.

The very first bracket of DEHB is processed in the same way as in synchronous Hyperband, so assume the current bracket is not the first. This is how the configuration vector for a free slot in a rung is chosen:

  • Identify a mutation candidate set. If there is a rung above, this set contains the best performing trials from there, namely those that would be promoted in synchronous Hyperband. If there is no rung above, the set is the rung with the same level from the previous bracket. Now, if this set contains fewer than 3 entries, we add configurations from earlier trials at the same rung level (the global parent pool). This mutation candidate set is the same for all choices in the same rung.

  • Draw 3 configurations at random, without replacement, from the mutation candidate set and create a mutant as a linear combination of them (see the sketch after this list).

  • Identify the target configuration from the same slot and rung level in the previous bracket. The candidate for the slot is obtained by cross-over between mutant and target, in that each entry of the vector is picked randomly from that position in one of the two. An evaluation is started for this candidate configuration.

  • Finally, there is selection. Once the slot is to be occupied, we compare metric values between target and candidate, and the better one gets assigned to the slot.
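The mutation and cross-over arithmetic described above can be summarized in a short sketch. This is plain NumPy, not the Syne Tune implementation, and the mutation factor and cross-over probability are hypothetical defaults.

import numpy as np


def mutate_and_crossover(
    parents, target, mutation_factor=0.5, crossover_prob=0.5, rng=None
):
    """DE-style mutation of three parents, then binomial cross-over with the target.

    All arguments are encoded configuration vectors with values in [0, 1].
    """
    if rng is None:
        rng = np.random.default_rng()
    a, b, c = parents  # three distinct members of the mutation candidate set
    # Mutation: linear combination of the parents, clipped back to [0, 1]
    mutant = np.clip(a + mutation_factor * (b - c), 0.0, 1.0)
    # Cross-over: each entry is taken from the mutant or from the target at random
    take_from_mutant = rng.random(target.shape) < crossover_prob
    return np.where(take_from_mutant, mutant, target)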

While this sounds quite foreign to what we saw above, we can make progress by associating each candidate vector arising from mutation and cross-over with a new trial_id. After all, in order to determine the winner between candidate and target, we have to evaluate the former. Once this is done, we can map mutation and cross-over to suggest, and selection to on_trial_report. It becomes clear that we can use most of the infrastructure for synchronous Hyperband without change.

DifferentialEvolutionHyperbandBracket has only minor differences to SynchronousHyperbandBracket. First, _promote_trials_at_rung_complete does nothing, because promotion (i.e., determining the trials for a rung from the one above) is a more complex process now. In particular, the trial_id fields of free slots in the current rung are None until they become occupied. Second, top_list_for_previous_rung returns the top performing trials of the rung above the current one. This information is needed in order to create the mutation candidate set. All other methods remain the same. We still need to identify the next free slot (at the time of mutation and cross-over), and need to write information back when a slot gets occupied.

At this point, it is important to acknowledge some difficulties arising from asynchronous job execution. Namely, mutation and cross-over require the configurations for the mutation candidate set and target to have been determined before, and selection needs the metric value for the target. If this type of information is not present when we need it, we are not allowed to wait.

  • If the current rung is not the first in the bracket, we know that all slots in the rung above are occupied. After all, DEHB is still a synchronous HPO method.

  • The rung from where to choose the target can be problematic, as it may not have been decided upon completely when mutation starts for the current rung. In this case, our implementation cycles back through the brackets until an assigned slot (i.e., not free) is found in the right place.

  • For this reason, it is possible in principle that the target trial_id changes between cross-over and selection. Also, in rare cases, the target may not have a metric at selection time. In this case, the candidate wins.

DifferentialEvolutionHyperbandBracketManager is very similar to SynchronousHyperbandBracketManager. Differences include:

  • The system of brackets is more rigid in DEHB, in that subsequent brackets are determined by the first one. In particular, later brackets have a smaller total budget, because rung sizes are inherited from the first bracket.

  • top_of_previous_rung helps choosing the mutation candidate set. Its return values are cached.

  • trial_id_from_parent_slot selects the trial_id for the target for cross-over and selection.

DifferentialEvolutionHyperbandScheduler implements the DEHB scheduler. Just like SynchronousHyperbandScheduler, it inherits from SynchronousHyperbandCommon, which contains common code used by both of them.

  • On top of SynchronousHyperbandScheduler, it also maps trial_id to encoded configuration in self._trial_info, and self._global_parent_pool maintains all completed trials at each rung level.

  • _suggest: We start by determining a free slot, then a configuration vector for the new trial, typically by mutation and cross-over. One difficulty is that this could end up suggesting a configuration already proposed before, because many encoded vectors map to the same configuration. In this case, we retry and may ultimately draw encoded configs at random. Except for a special case in the very first bracket, we return with start_suggestion().

  • New encoded configurations are chosen only for the first rung of the first bracket. Our implementation allows a searcher to be specified for this choice. However, the default is to sample the new vector uniformly at random, see _encoded_config_from_searcher. Importantly, this is different from using searcher="random". The latter samples a configuration and maps it to an encoded vector, a process which has less entropy if discrete hyperparameters are present.

  • on_trial_result is similar to what happens in SynchronousHyperbandScheduler, except that selection is happening as well. If the target wins in the selection, ext_slot.trial_id is changed to the target trial_id. In any case, we return SchedulerDecision.STOP because the trial will not have to be resumed later on (except in the very first bracket).

Linking in a New Searcher

At this point, you should have learned everything needed for implementing a new scheduler, or for modifying an existing template scheduler to your special requirements. Say, you have implemented a new searcher to be plugged into one of the existing generic schedulers. In this section, we will look into how a new searcher can be made available in an easy-to-use fashion.

The Searcher Factory

Recall that our generic schedulers, such as FIFOScheduler or HyperbandScheduler, allow the user to choose a searcher via the string argument searcher, and to configure the searcher (away from defaults) by the dictionary argument search_options. While searcher can also be a BaseSearcher instance, it is simpler and more convenient to choose the searcher by name, for several reasons:

  • Generic schedulers only work with certain types of searchers. This compatibility is checked if searcher is given by name, but passing an incompatible searcher object may lead to subtle errors.

  • Several arguments of a searcher are typically just the same as for the surrounding scheduler, or can be inferred from arguments of the scheduler. This can become complex for some searchers and leads to difficult boilerplate code if the searcher is to be created by hand.

  • While not covered in this tutorial, constructing schedulers and searchers for Gaussian process based Bayesian optimization and its extensions to multi-fidelity scheduling, constrained or cost-aware search is significantly more complex, as can be seen in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.

It is the purpose of searcher_factory() to create the correct BaseSearcher object for given scheduler arguments, including searcher (name) and search_options. Let us have a look how the constructor of FIFOScheduler calls the factory. We see how scheduler arguments like metric, mode, points_to_evaluate are just passed through to the factory. We also need to set search_options["scheduler"] in order to tell searcher_factory which generic scheduler is calling it.

The searcher_factory() code should be straightforward to understand and extend. Pick a name for your new searcher and set searcher_cls and supported_schedulers (the latter can be left to None if your searcher works with all generic schedulers). The constructor of your searcher needs to have the signature

def __init__(self, config_space: dict, metric: str, **kwargs):

Here, kwargs will be fed with search_options, but enriched with fields like mode, points_to_evaluate, random_seed_generator, scheduler. Your searcher is not required to make use of them, even though we strongly recommend supporting points_to_evaluate and making use of random_seed_generator (as is shown here). Here are some best practices for linking a new searcher into the factory:

  • The Syne Tune code is written in a way which allows certain scenarios to be run with a restricted set of all possible dependencies (see FAQ). This is achieved by conditional imports. If your searcher requires dependencies beyond the core, please make sure to use try ... except ImportError as you see in the code.

  • Try to make sure that your searcher also works without search_options being specified by the user. You will always have the fields contributed by the generic schedulers, and for all others, your code should ideally come with sensible defaults.

  • Make sure to implement the configure_scheduler method of your new searcher, restricting usage to supported scheduler types.

The Baseline Wrappers

In order to facilitate choosing and configuring a scheduler along with its searcher, Syne Tune defines the most frequently used combinations in syne_tune.optimizer.baselines. The minimal signature of a baseline class is this:

def __init__(self, config_space: dict, metric: str, **kwargs):

Or, in the multi-objective case:

def __init__(self, config_space: dict, metric: List[str], **kwargs):

If the underlying scheduler maintains a searcher (as most schedulers do), arguments to the searcher (except for config_space, metric) are given in kwargs["search_options"]. If a scheduler is of multi-fidelity type, the minimal signature is:

def __init__(self, config_space: dict, metric: str, resource_attr: str, **kwargs):

If the scheduler accepts a random seed, this must be kwargs["random_seed"]. Several wrapper classes in syne_tune.optimizer.baselines have signatures with more arguments, which are either passed to the scheduler or to the searcher. For example, some wrappers make random_seed explicit in the signature, instead of having it in kwargs.

Note

If a scheduler maintains a searcher inside, and in particular if it simply configures FIFOScheduler or HyperbandScheduler with a new searcher, it is strongly recommended to adhere to the policy of specifying searcher arguments in kwargs["search_options"]. This simplifies enabling the new scheduler in the simple experimentation framework of syne_tune.experiments, and in general provides a common user experience across different schedulers.

Let us look at an example of a baseline wrapper whose underlying scheduler is of type FIFOScheduler with a specific searcher, which is not itself created via a searcher factory:

syne_tune/optimizer/baselines.py – REA
class REA(FIFOScheduler):
    """Regularized Evolution (REA).

    See :class:`~syne_tune.optimizer.schedulers.searchers.regularized_evolution.RegularizedEvolution`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metric to optimize
    :param population_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 100
    :param sample_size: See
        :class:`~syne_tune.optimizer.schedulers.searchers.RegularizedEvolution`.
        Defaults to 10
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: str,
        population_size: int = 100,
        sample_size: int = 10,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["population_size"] = population_size
        searcher_kwargs["sample_size"] = sample_size
        super(REA, self).__init__(
            config_space=config_space,
            metric=metric,
            searcher=RegularizedEvolution(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )


def create_gaussian_process_estimator(
    config_space: Dict[str, Any],
    metric: str,
    random_seed: Optional[int] = None,
    search_options: Optional[Dict[str, Any]] = None,
) -> Estimator:
    scheduler = BayesianOptimization(
        config_space=config_space,
        metric=metric,
        random_seed=random_seed,
        search_options=search_options,
    )
    searcher = scheduler.searcher  # GPFIFOSearcher
    state_transformer = searcher.state_transformer  # ModelStateTransformer
    estimator = state_transformer.estimator  # GaussProcEmpiricalBayesEstimator

    # update the estimator properties
    estimator.active_metric = metric
    return estimator


class MORandomScalarizationBayesOpt(FIFOScheduler):
    """
    Uses :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveMultiSurrogateSearcher`
    with one standard GP surrogate model per metric (same as in
    :class:`BayesianOptimization`, together with the
    :class:`~syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveLCBRandomLinearScalarization`
    acquisition function.

    If `estimators` is given, surrogate models are taken from there, and the
    default is used otherwise. This is useful if you have a good low-variance
    model for one of the objectives.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metrics to optimize
    :param mode: Modes of optimization. Defaults to "min" for all
    :param random_seed: Random seed, optional
    :param estimators: Use these surrogate models instead of the default GP
        one. Optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`. Here,
        ``kwargs["search_options"]`` is used to create the searcher and its
        GP surrogate models.
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        random_seed: Optional[int] = None,
        estimators: Optional[Dict[str, Estimator]] = None,
        **kwargs,
    ):
        try:
            from syne_tune.optimizer.schedulers.multiobjective import (
                MultiObjectiveMultiSurrogateSearcher,
                MultiObjectiveLCBRandomLinearScalarization,
            )
        except ImportError:
            logging.info(try_import_moo_message())
            raise

        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )

        if estimators is None:
            estimators = dict()
        else:
            estimators = estimators.copy()
        if isinstance(mode, str):
            mode = [mode] * len(metric)
        if "search_options" in kwargs:
            search_options = kwargs["search_options"].copy()
        else:
            search_options = dict()
        search_options["no_fantasizing"] = True
        for _metric in metric:
            if _metric not in estimators:
                estimators[_metric] = create_gaussian_process_estimator(
                    config_space=config_space,
                    metric=_metric,
                    search_options=search_options,
                )
        # Note: ``mode`` is dealt with in the ``update`` method of the MO
        # searcher, by converting the metrics. Internally, all metrics are
        # minimized
        searcher = MultiObjectiveMultiSurrogateSearcher(
            estimators=estimators,
            mode=mode,
            scoring_class=partial(
                MultiObjectiveLCBRandomLinearScalarization, random_seed=random_seed
            ),
            **searcher_kwargs,
        )
        super().__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=searcher,
            random_seed=random_seed,
            **kwargs,
        )


class NSGA2(FIFOScheduler):
    """
    See :class:`~syne_tune.optimizer.schedulers.searchers.RandomSearcher`
    for ``kwargs["search_options"]`` parameters.

    :param config_space: Configuration space for evaluation function
    :param metric: Name of metric to optimize
    :param population_size: The size of the population for NSGA-2
    :param random_seed: Random seed, optional
    :param kwargs: Additional arguments to
        :class:`~syne_tune.optimizer.schedulers.FIFOScheduler`
    """

    def __init__(
        self,
        config_space: Dict[str, Any],
        metric: List[str],
        mode: Union[List[str], str] = "min",
        population_size: int = 20,
        random_seed: Optional[int] = None,
        **kwargs,
    ):
        searcher_kwargs = _create_searcher_kwargs(
            config_space, metric, random_seed, kwargs
        )
        searcher_kwargs["mode"] = mode
        searcher_kwargs["population_size"] = population_size
        super(NSGA2, self).__init__(
            config_space=config_space,
            metric=metric,
            mode=mode,
            searcher=NSGA2Searcher(**searcher_kwargs),
            random_seed=random_seed,
            **kwargs,
        )


  • The signature of REA has config_space, metric, and random_seed. It also has two searcher arguments, population_size and sample_size.

  • In order to compile the arguments searcher_kwargs for creating the searcher, we first call _create_searcher_kwargs(config_space, metric, random_seed, kwargs). Doing so is particularly important in order to ensure random seeds are managed between scheduler and searcher in the same way across different Syne Tune schedulers.

  • Next, the additional arguments population_size and sample_size need to be appended to these searcher arguments. Had we used kwargs["search_options"] instead, this would not be necessary.

  • Finally, we create FIFOScheduler, passing config_space, metric, as well as the new searcher via searcher=RegularizedEvolution(**searcher_kwargs), and finally pass **kwargs at the end.

Baselines and Benchmarking

As shown in this tutorial and this tutorial, a particularly convenient way to define and run experiments is using the code in syne_tune.experiments. Once a new scheduler has a baseline wrapper, it is very easy to make it available there: you just need to add a wrapper in syne_tune.experiments.default_baselines. For the REA example above, this is:

from syne_tune.optimizer.baselines import REA as _REA

def REA(method_arguments: MethodArguments, **kwargs):
    return _REA(**_baseline_kwargs(method_arguments, kwargs))

Contribute your Extension

At this point, you are ready to plug in your latest idea and make it work in Syne Tune. Given that it works well, we would encourage you to contribute it back to the community. We are looking forward to your pull request.

Extending the Documentation

Syne Tune comes with an extensive amount of documentation:

  • User-facing APIs are commented in the code, using the reStructuredText format. This is used to generate the API Reference. Please refer to the code in order to understand our conventions. Please make sure that links to classes, methods, or functions work. In the presence of :math: expressions, the docstring should be raw: r""" ... """.

  • Examples in examples/ are working, documented scripts showcasing individual features. If you contribute a new example, please also link it in docs/source/examples.rst.

  • Frequently asked questions at docs/source/faq.rst.

  • Table of all HPO algorithms in docs/source/getting_started.rst. If you contribute a new HPO method, please add a row there. As explained above, please also extend baselines.

  • Tutorials at docs/source/tutorials/. These are short chapters, explaining a concept in more detail than an example. A tutorial should be self-contained and come with functioning code, which can be run in a reasonable amount of time and cost. It may contain figures created with a larger effort.

Building the Documentation

You can build the documentation locally as follows. Make sure to have Syne Tune installed with dev dependencies:

cd docs
rm -rf source/_apidoc
make clean
make html

Then, open docs/build/html/index.html in your browser.

The documentation is also built as part of our CI system, so you can inspect it as part of a pull request:

  • Move to the list of all checks (if the PR is in good shape, you should see All checks have passed)

  • Locate docs/readthedocs.org:syne-tune at the end of the list. Click on Details

  • Click on View docs just below Build took X seconds (do not click on the tall View Docs button upper right, this leads to the latest public docs)

When extending the documentation, please verify the following:

  • Check whether links work. They typically fail silently, possibly emitting a warning. Use proper links when referring to classes, modules, functions, methods, or constants, and check whether the links to the API Reference work.

Conventions

We use the following conventions to ensure that documentation stays up-to-date:

  • Use literalinclude for almost all code snippets. In general, the documentation is showing code which is part of a functional script, which can either be in examples/, in benchmarking/examples/, or otherwise next to the documentation files.

  • Almost all code shown in the documentation is run as part of integration testing (.github/workflows/integ-tests.yml) or end-to-end testing (.github/workflows/end-to-end-tests.yml). If you contribute documentation with code, please insert your functional script into one of the two:

    • integ-tests.yml is run as part of our CI system. Code should run for no more than 30 seconds. It must not depend on data loaded from elsewhere, and not make use of surrogate blackboxes. It must not use SageMaker.

    • end-to-end-tests.yml is run manually on a regular basis, and in particular before a new release. Code may download files or depend on surrogate blackboxes. It may use SageMaker. Costs and runtime should be kept reasonable.

  • Links to other parts of the documentation should be used frequently. We use anonymous references (two trailing underscores).

  • Whenever mentioning a code construction (class, method, function, module, constant), please use a proper link with absolute module name and leading tilde. This allows interested readers to inspect API details and the code. When the same name is used several times in the same paragraph, it is sufficient to use a proper link for the first occurrence only.

How to Implement Bayesian Optimization

This tutorial can be seen as a more advanced successor to our developer tutorial. It provides an overview of how model-based search, and in particular Bayesian optimization, is implemented in Syne Tune, and how this code can be extended in order to fit your needs. The basic developer tutorial is a prerequisite to take full advantage of the advanced tutorial here.

We hope this information inspires you to try extending Syne Tune’s Bayesian optimization to your needs. Please do consider contributing your efforts to Syne Tune.

Note

In order to develop new methodology in Syne Tune, make sure to use an installation from source. In particular, you need to have installed the dev dependencies.

Overview of Module Structure

We begin with an overview of the module structure of the Bayesian optimization (BO) code in Syne Tune. Feel free to directly move to the first example and come back here for reference.

Recall that Bayesian optimization is implemented in a searcher, which is a component of a scheduler responsible for suggesting the next configuration to sample, given data from earlier trials. While searchers using BO are located in syne_tune.optimizer.schedulers.searchers and submodules, the BO code itself is found in syne_tune.optimizer.schedulers.searchers.bayesopt. Recall that a typical BO algorithm is configured by a surrogate model and an acquisition function. In Syne Tune, acquisition functions are implemented generically, while (except for special cases) surrogate models can be grouped in two different classes:

  • Gaussian process based surrogate models: Implementations in gpautograd.

  • Surrogate models based on scikit-learn like estimators: Implementations in sklearn.

The remaining code in syne_tune.optimizer.schedulers.searchers.bayesopt is generic or wraps lower-level code. Submodules are as follows:

  • datatypes: Collects types related to maintaining data obtained from trials. The most important class is TuningJobState, which collects relevant data during an experiment. Note that other relevant classes are in syne_tune.optimizer.schedulers.searchers.utils, such as HyperparameterRanges, which wraps a configuration space and maps configurations to encoded vectors used as inputs to a surrogate model.

  • models: Contains a range of surrogate models, both for single and multi-fidelity tuning, along with the machinery to fit parameters of these models. In a nutshell, retraining of parameters and posterior computations for a surrogate model are defined in Estimator, which returns a Predictor to be used for posterior predictions, which in turn drive the optimization of an acquisition function. A model-based searcher interacts with a ModelStateTransformer, which maintains the state of the experiment (a TuningJobState object) and interacts with an Estimator. Subclasses of Estimator and Predictor are mainly wrappers of underlying code in gpautograd or sklearn. Details will be provided shortly. This module also contains a range of acquisition functions, mostly in meanstd_acqfunc.

  • tuning_algorithms: The Bayesian optimization logic resides here, mostly in BayesianOptimizationAlgorithm. Interfaces for all relevant concepts are defined in base_classes:

    • Predictor: Probabilistic predictor obtained from surrogate model, to be plugged into acquisition function.

    • AcquisitionFunction: Acquisition function, which is optimized in order to suggest the next configuration.

    • ScoringFunction: Base class of AcquisitionFunction which does not support gradient computations. Score functions can be used to rank a finite number of candidates.

    • LocalOptimizer: Local optimizer for minimizing the acquisition function.

  • gpautograd: The Gaussian process based surrogate models, defined in models, can be implemented in different ways. Syne Tune currently uses the lightweight autograd library, and the corresponding implementation lies in this module.

  • sklearn: Collects code required to implement surrogate models based on scikit-learn like estimators.

Note

The most low-level code for Gaussian process based Bayesian optimization is contained in gpautograd, which is specific to autograd and L-BFGS optimization. Unless you want to implement a new kernel function, you probably do not have to extend this code. As we will see, most extensions of interest can be done in models (new surrogate model, new acquisition function), or in tuning_algorithms (different BO workflow).

A Walk Through Bayesian Optimization

The key primitive of BO is to suggest a next configuration to evaluate the unknown target function at (e.g., the validation error after training a machine learning model with a hyperparameter configuration), based on all data gathered about this function in the past. This primitive is triggered in the get_config() method of a BO searcher. It consists of two main steps:

  • Estimate surrogate model(s), given all data obtained. Often, a single surrogate model represents the target metric of interest, but in generalized setups such as multi-fidelity, constrained, or multi-objective BO, surrogate models may be fit to several metrics. A surrogate model provides predictive distributions for the metric it represents, at any configuration, which allows BO to explore the space of configurations not yet sampled at. For most built-in GP based surrogate models, estimation is done by maximizing the log marginal likelihood, as we see in more detail below.

  • Use probabilistic predictions of surrogate models to search for the best next configuration to sample at. This is done in BayesianOptimizationAlgorithm, and is the main focus here.

BayesianOptimizationAlgorithm can suggest a batch of num_requested_candidates > 1 configurations. If greedy_batch_selection == True, this is done greedily, one configuration at a time, and diversity is maintained by inserting already suggested configurations as pending into the state. If greedy_batch_selection == False, we simply return the num_requested_candidates top-scoring configurations. For simplicity, we focus on num_requested_candidates == 1, so that a single configuration is suggested. This happens in several steps:

  • First, a list of num_initial_candidates initial configurations is drawn at random from initial_candidates_generator of type CandidateGenerator.

  • Next, these configurations are scored using initial_candidates_scorer of type ScoringFunction. This is a parent class of AcquisitionFunction, but acquisition functions support gradient computation as well. The scoring function typically depends on a predictor obtained from a surrogate model.

  • Finally, local optimization of an acquisition function is run, using an instance of LocalOptimizer, which depends on an acquisition function and one or more predictors. Local optimization is initialized with the top-scoring configuration from the previous step. If it fails or does not result in a configuration with a better acquisition value, then this initial configuration is returned. The final local optimization can be skipped by passing an instance of NoOptimization.

This workflow offers a number of opportunities for customization:

  • The initial_candidates_generator by default draws configurations at random with replacement (checking for duplicates is expensive, and does not add value). This could be replaced by pseudo-random sampling with better coverage properties, or by Latin hypercube designs.

  • The initial_candidates_scorer is often the same as the acquisition function used in the final local optimization. Other acquisition strategies, such as (independent) Thompson sampling, can be implemented here.

  • You may want to customize the acquisition function feeding into local optimization (and initial scoring), more details are provided below.

Implementing a Surrogate Model

In Bayesian optimization (BO), a surrogate model represents the data observed from a target metric so far, and its probabilistic predictions at new configurations (typically involving both predictive mean and variance) guide the search for the most informative next acquisition. In this section, we will show how surrogate models are implemented in Syne Tune, and give an example of how a novel model can be added.

Recall from above that Syne Tune offers surrogate models from two broad classes: Gaussian process based models and scikit-learn estimator based models. Both are implemented in terms of the same abstractions, Estimator and Predictor. We will first walk through GP based surrogate models, then dive into an example of how to implement a new scikit-learn estimator based model. More details about how to extend GP based models are provided further below.

Example

Before diving into details, let us look at a simple example of how to implement a new surrogate model in Syne Tune, of the scikit-learn estimator based type. It does not come with some of the complexities of Gaussian process based surrogate models, to be discussed below:

  • Fantasizing is not supported

  • MCMC (or ensemble predictions) is not supported

  • Gradient-based optimization of an acquisition function is not supported; instead, Bayesian optimization scores a finite number of candidates drawn at random and selects the best

The full example code is given here. We implement subclasses of SKLearnPredictor and SKLearnEstimator. These are wrapped by SKLearnPredictorWrapper and SKLearnEstimatorWrapper.

examples/launch_sklearn_surrogate_bo.py
import copy
from typing import Tuple

import numpy as np
from sklearn.linear_model import BayesianRidge

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn import (
    SKLearnEstimator,
    SKLearnPredictor,
)


class BayesianRidgePredictor(SKLearnPredictor):
    """
    Predictor for surrogate model given by ``sklearn.linear_model.BayesianRidge``.
    """

    def __init__(self, ridge: BayesianRidge):
        self.ridge = ridge

    def predict(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        return self.ridge.predict(X, return_std=True)


class BayesianRidgeEstimator(SKLearnEstimator):
    """
    Estimator for surrogate model given by ``sklearn.linear_model.BayesianRidge``.

    None of the parameters of ``BayesianRidge`` are exposed here, so they are all
    fixed up front.
    """

    def __init__(self, *args, **kwargs):
        self.ridge = BayesianRidge(*args, **kwargs)

    def fit(
        self, X: np.ndarray, y: np.ndarray, update_params: bool
    ) -> SKLearnPredictor:
        self.ridge.fit(X, y.ravel())
        return BayesianRidgePredictor(ridge=copy.deepcopy(self.ridge))


  • The BayesianRidgeEstimator wraps the scikit-learn estimator sklearn.linear_model.BayesianRidge, which implements a form of Bayesian regression estimation. While this method has hyperparameters, they are automatically set in fit, so we do not need to make them explicit. The result of fit is a BayesianRidgePredictor instance which wraps a copy of the fitted scikit-learn estimator.

  • In BayesianRidgePredictor, the predict method calls predict of the wrapped scikit-learn estimator with return_std=True, so that both predictive means and standard deviations are returned.

The remaining launcher script is much the same as other examples, except that FIFOScheduler is used with a particular searcher:

examples/launch_sklearn_surrogate_bo.py
    searcher = SKLearnSurrogateSearcher(
        config_space=config_space,
        metric=METRIC_ATTR,
        estimator=BayesianRidgeEstimator(),
        scoring_class=EIAcquisitionFunction,
    )
The Predictor Class

Scikit-learn based estimators are typically rather simple and based on deterministic machine learning methods. Bayesian optimization is usually run with Bayesian models, where proper quantification of uncertainty is center-stage, and supporting these is a little more difficult.

In Bayesian statistics, (surrogate) models are conditioned on data in order to obtain a posterior distribution, represented by a posterior state. Given this state, probabilistic predictions can be done at arbitrary input points. This is done by objects of type Predictor, whose methods deal with predictions on new configurations.

Note

Before moving on, it is important to understand the difference between conditioning a probabilistic model on data in order to obtain a posterior distribution, with which probabilistic predictions (i.e., mean and variance) can be computed at input points, and learning (or fitting) the (hyper)parameters of the model. For a Bayesian surrogate model, the latter involves Markov Chain Monte Carlo or marginal likelihood optimization, which requires conditioning on data several times. For non-Bayesian models, parameters are often fit by cross-validation.

At this point, there are a number of relevant concepts:

  • A model can be “fitted” by Markov Chain Monte Carlo (MCMC), in which case its predictive distribution is an ensemble. This is why prediction methods return lists. In the default case (single model, no MCMC), these lists are of size one.

  • A model may support fantasizing in order to properly deal with pending configurations in the current state (see also register_pending in the discussion here). At least in the Gaussian process surrogate model case, fantasizing is done by drawing nf samples of target values for the pending configurations, then averaging predictions over these samples. The Gaussian predictive distributions in this average share the same variance, but have different means. A surrogate model which does not support fantasizing can ignore this extra complexity.

Take the example of a basic Gaussian process surrogate model, which is behind BayesianOptimization. The predictor is GaussProcPredictor. This class can serve models fit by marginal likelihood optimization (empirical Bayes) or MCMC, but let us focus on the former. Predictions in this model are based on a posterior state, which maintains a representation of the Gaussian posterior distribution needed for probabilistic predictions. Say we would like to do a prediction at some configuration \(\mathbf{c}\). First, this configuration is mapped to an (encoded) input vector \(\mathbf{x}\). Next, predictive distributions are computed, using the posterior state:

\[P(y | \mathbf{x}) = \left[ \mathcal{N}(y | \mu_j(\mathbf{x}), \sigma^2(\mathbf{x})) \right],\quad j=1,\dots, \mathrm{nf}.\]

Here, nf denotes the number of fantasy samples (nf=1 if fantasizing is not supported). This is served by methods of Predictor:

  • hp_ranges_for_prediction: Returns instance of HyperparameterRanges which is used to map a configuration \(\mathbf{c}\) to an encoded vector \(\mathbf{x}\).

  • predict: Given a matrix \(\mathbf{X}\) of input vectors (these are the rows \(\mathbf{x}_i\)), return a list of dictionaries. In our non-MCMC example, this list has length 1. The dictionary contains statistics of the predictive distribution. In our example, this would be predictive means (key “mean”) and predictive standard deviations (key “std”). More precisely, the entry for “mean” would be a matrix \([\mu_j(\mathbf{x}_i)]_{i,j}\) of shape (n, nf), where n is the number of input vectors, and the entry for “std” would be a vector \([\sigma(\mathbf{x}_i)]_i\) of shape (n,). If the surrogate model does not support fantasizing, the entry for “mean” is also a vector of shape (n,).

  • predict_candidates: Version of predict, where the input is a list of configurations \([\mathbf{c}_j]\), which are first mapped to rows of the matrix \(\mathbf{X}\) by using hp_ranges_for_prediction.

  • keys_predict: Keys of dictionaries returned by predict. If a surrogate model is to be used with a standard acquisition function, such as expected improvement, it needs to return at least means (“mean”) and standard deviations (“std”). However, in other contexts, a surrogate model may be deterministic, in which case only means (“mean”) are returned. This method allows an acquisition function to check whether it can work with surrogate models passed to it.

  • backward_gradient: This method is needed in order to support local gradient-based optimization of an acquisition function, as discussed here. It is detailed below.

  • current_best: A number of acquisition functions depend on the incumbent, which is a smooth approximation to the best target value observed so far. Typically, this is implemented as \(\mathrm{min}(\mu_j(\mathbf{x}_i))\) over all inputs \(\mathbf{x}_i\) already sampled for previous trials. As with predict, this returns a list of vectors of shape (nf,), catering for fantasizing. If fantasizing is not supported, this is a list of scalars, and the list size is 1 for non-MCMC.

Note

In fact, GaussProcPredictor inherits from BasePredictor, which extends the base interface by some helper code to implement the current_best method.
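
To make these shapes concrete, here is a toy illustration in plain NumPy (not Syne Tune code) of what predict might return for a single non-MCMC model with n = 4 inputs and nf = 3 fantasy samples, and how an LCB-style acquisition head would average over the fantasy dimension:

import numpy as np

n, nf = 4, 3  # number of input vectors, number of fantasy samples
rng = np.random.default_rng(0)

# ``predict`` returns a list of dictionaries (length 1 without MCMC); with
# fantasizing, "mean" has shape (n, nf), while "std" has shape (n,)
predictions = [
    {
        "mean": rng.normal(size=(n, nf)),
        "std": np.abs(rng.normal(size=(n,))),
    }
]

# An LCB-style acquisition head averages over the fantasy dimension
kappa = 1.0
means, stds = predictions[0]["mean"], predictions[0]["std"]
scores = np.mean(means - kappa * stds[:, np.newaxis], axis=1)
print(scores.shape)  # (4,): one score per input vector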

Supporting Local Gradient-based Optimization

As discussed above, BO in Syne Tune supports local gradient-based optimization of an acquisition function. This needs to be supported by an implementation of Predictor, in terms of the backward_gradient method.

In the most basic case, an acquisition function \(\alpha(\mathbf{x})\) has the following structure:

\[\alpha(\mathbf{x}) = \alpha(\mu(\mathbf{x}), \sigma(\mathbf{x})).\]

We ignore fantasizing here, otherwise \(\mu(\mathbf{x})\) becomes a vector. For gradient-based optimization, we need derivatives

\[\frac{\partial\alpha}{\partial\mathbf{x}} = \frac{\partial\alpha}{\partial\mu} \frac{\partial\mu}{\partial\mathbf{x}} + \frac{\partial\alpha}{\partial\sigma} \frac{\partial\sigma}{\partial\mathbf{x}}.\]

The backward_gradient method takes arguments \(\mathbf{x}\) (input) and a dictionary mapping “mean” to \(\partial\alpha/\partial\mu\) at \(\mu = \mu(\mathbf{x})\), “std” to \(\partial\alpha/\partial\sigma\) at \(\sigma = \sigma(\mathbf{x})\) (head_gradients), and returns the gradient \(\partial\alpha/\partial\mathbf{x}\).

Readers familiar with deep learning frameworks like PyTorch may wonder why we don’t just combine surrogate model and acquisition function into forming \(\alpha(\mathbf{x})\), and compute its gradient by reverse mode differentiation. However, this would strongly couple the two concepts, in that they would have to be implemented in the same auto-differentiation system. Instead, backward_gradient decouples the gradient computation into head gradients for the acquisition function, which (as we will see) can be implemented in native NumPy, and backward_gradient for the surrogate model itself. For Syne Tune’s Gaussian process surrogate models, the latter is implemented using autograd. If the predict method is implemented using this framework, gradients are obtained automatically as usual.
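
To illustrate this decoupling, here is a minimal sketch (with toy mean and stddev functions, not a real Syne Tune predictor) of how backward_gradient can be realized with autograd: the head gradients are treated as constants, and the chain rule above falls out of differentiating a scalar formed from the predictive mean and standard deviation:

import autograd.numpy as anp
from autograd import grad


# Toy predictive mean and stddev as differentiable functions of the encoded input x
def predict_mean(x):
    return anp.sum(anp.sin(x))


def predict_std(x):
    return anp.sqrt(anp.sum(x ** 2) + 1.0)


def backward_gradient(x, head_gradients):
    """Returns d(alpha)/dx, given d(alpha)/d(mean) and d(alpha)/d(std) at x."""

    def scalar(x_):
        # Head gradients are constants w.r.t. x_, so differentiating this
        # scalar implements the chain rule displayed above
        return (
            head_gradients["mean"] * predict_mean(x_)
            + head_gradients["std"] * predict_std(x_)
        )

    return grad(scalar)(x)


x = anp.array([0.3, -0.7, 1.2])
print(backward_gradient(x, head_gradients={"mean": 1.0, "std": -1.0}))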

ModelStateTransformer and Estimator

An instance of Predictor represents the posterior distribution of a model conditioned on observed data. Where does this conditioning take place? Note that while machine learning APIs like scikit-learn couple fitting and prediction in a single API, these two are decoupled in Syne Tune by design:

  • Estimator: The most important method is fit_from_state(). It computes the posterior state by conditioning on observed data, which are sufficient statistics required for probabilistic predictions. Moreover, if update_params=True, this final conditioning is preceded by fitting the (hyper)parameters of the model (this is more expensive, and if update_params=False, the current parameters are used without updating them).

  • Predictor: Wraps the posterior state computed by the Estimator, allows for predictions.

The fitting of surrogate models underlying a Bayesian optimization experiment happens in ModelStateTransformer, which interfaces between a model-based searcher and the surrogate model. The ModelStateTransformer maintains the state of the experiment, where all data about observations and pending configurations are collected. Its fit() method triggers fitting the surrogate models to the current data (this step can be skipped for computational savings) and computing their posterior states.

ModelStateTransformer hands down these tasks to an object of type Estimator, which is specific to the surrogate model being used. For our Gaussian process example, this would be GaussProcEmpiricalBayesEstimator. Here, parameters of the Gaussian process models (such as parameters of the covariance function) are fitted by marginal likelihood maximization, and the GP posterior state is computed.

Note

To summarize, if your surrogate model needs to be fit to data, you need to implement a subclass of Estimator, whose fit_from_state method takes in data in form of a TuningJobState and returns a Predictor. You can use transform_state_to_data() in order to convert the TuningJobState object into the usual pair of feature matrix features and target vector targets, along with normalization of targets.

Implementing Components of Bayesian Optimization

At this point, you should have obtained an overview of how Bayesian optimization (BO) is structured in Syne Tune, and understood how a new surrogate model can be implemented. In this section, we turn to other components of BO: the acquisition function, and the covariance kernel of the Gaussian process surrogate model. We also look inside the factory for creating Gaussian process based searchers.

Implementing an Acquisition Function

In Bayesian optimization, the next configuration to sample at is chosen by minimizing an acquisition function:

\[\mathbf{x}_* = \mathrm{argmin}_{\mathbf{x}} \alpha(\mathbf{x})\]

In general, the acquisition function \(\alpha(\mathbf{x})\) is optimized over encoded vectors \(\mathbf{x}\), and the optimal \(\mathbf{x}_*\) is rounded back to a configuration. This allows for gradient-based optimization of \(\alpha(\mathbf{x})\).

In Syne Tune, acquisition functions are subclasses of AcquisitionFunction. An acquisition function may depend on one or more surrogate models, by being a function of the predictive statistics returned by the predict method of Predictor. For a wide range of acquisition functions used in practice, we have that

\[\alpha(\mathbf{x}) = \alpha(\mu(\mathbf{x}), \sigma(\mathbf{x})).\]

In other words, \(\alpha(\mathbf{x})\) is a function of the predictive mean and standard deviation of a single surrogate model. This case is covered by MeanStdAcquisitionFunction. More generally, this class implements acquisition functions depending on one or more surrogate models, each of which returns means and (optionally) standard deviations in predict. Given the generic code in Syne Tune, a new acquisition function of this type is easy to implement. As an example, consider the lower confidence bound (LCB) acquisition function:

\[\alpha_{\mathrm{LCB}}(\mathbf{x}) = \mu(\mathbf{x}) - \kappa \sigma(\mathbf{x}),\quad \kappa > 0.\]

Here is the code:

bayesopt/models/meanstd_acqfunc_impl.py
class LCBAcquisitionFunction(MeanStdAcquisitionFunction):
    r"""
    Lower confidence bound (LCB) acquisition function:

    .. math::

       h(\mu, \sigma) = \mu - \kappa * \sigma
    """

    def __init__(self, predictor: Predictor, kappa: float, active_metric: str = None):
        super().__init__(predictor, active_metric)
        assert isinstance(predictor, Predictor)
        assert kappa > 0, "kappa must be positive"
        self.kappa = kappa

    def _head_needs_current_best(self) -> bool:
        return False

    def _compute_head(
        self,
        output_to_predictions: SamplePredictionsPerOutput,
        current_best: Optional[np.ndarray],
    ) -> np.ndarray:
        means, stds = self._extract_mean_and_std(output_to_predictions)
        return np.mean(means - stds * self.kappa, axis=1)

    def _compute_head_and_gradient(
        self,
        output_to_predictions: SamplePredictionsPerOutput,
        current_best: Optional[np.ndarray],
    ) -> HeadWithGradient:
        mean, std = self._extract_mean_and_std(output_to_predictions)
        nf_mean = mean.size

        dh_dmean = np.ones_like(mean) / nf_mean
        dh_dstd = (-self.kappa) * np.ones_like(std)
        return HeadWithGradient(
            hval=np.mean(mean - std * self.kappa),
            gradient={self.active_metric: dict(mean=dh_dmean, std=dh_dstd)},
        )


  • An object is constructed by passing predictor (a Predictor) and kappa (the positive constant \(\kappa\)). The surrogate model must return means and standard deviations in its predict method.

  • _compute_head: This method computes \(\alpha(\mathbf{\mu}, \mathbf{\sigma})\), given means and standard deviations. The argument output_to_predictions is a dictionary of dictionaries. If the acquisition function depends on a dictionary of surrogate models, the first level corresponds to that. The second level corresponds to the statistics returned by predict. In the simple case here, the first level is a single entry with key INTERNAL_METRIC_NAME, and the second level uses keys “mean” and “std” for means \(\mathbf{\mu}\) and stddevs \(\mathbf{\sigma}\). Recall that due to fantasizing, the “mean” entry can be a (n, nf) matrix, in which case we compute the average along the columns. The argument current_best is needed only for acquisition functions which depend on the incumbent.

  • _compute_head_and_gradient: This method is needed for the computation of \(\partial\alpha/\partial\mathbf{x}\), for a single input \(\mathbf{x}\). Given the same arguments as _compute_head (but for \(n = 1\) inputs), it returns a HeadWithGradient object, whose hval entry is the same as the return value of _compute_head, whereas the gradient entry contains the head gradients which are passed to the backward_gradient method of the Predictor. This entry is a nested dictionary of the same structure as output_to_predictions. The head gradient for a single surrogate model (as in our example) has \(\partial\alpha/(\partial\mathbf{\mu})\) for “mean” and \(\partial\alpha/(\partial\mathbf{\sigma})\) for “std”. It is particularly simple for the LCB example.

  • _head_needs_current_best returns False, since the LCB acquisition function does not depend on the incumbent (i.e., the current best metric value), which means that the current_best argument need not be provided.

Finally, a new acquisition function should be linked into acquisition_function_factory(), so that users can select it via arguments acq_function and acq_function_kwargs in BayesianOptimization. The factory code is:

bayesopt/models/acqfunc_factory.py
from functools import partial

from syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes import (
    AcquisitionFunctionConstructor,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl import (
    EIAcquisitionFunction,
    LCBAcquisitionFunction,
)


SUPPORTED_ACQUISITION_FUNCTIONS = (
    "ei",
    "lcb",
)


def acquisition_function_factory(name: str, **kwargs) -> AcquisitionFunctionConstructor:
    assert (
        name in SUPPORTED_ACQUISITION_FUNCTIONS
    ), f"name = {name} not supported. Choose from:\n{SUPPORTED_ACQUISITION_FUNCTIONS}"
    if name == "ei":
        return EIAcquisitionFunction
    else:  # name == "lcb"
        kappa = kwargs.get("kappa", 1.0)
        return partial(LCBAcquisitionFunction, kappa=kappa)

Here, acq_function_kwargs is passed as kwargs. For our example, acq_function="lcb". The user can pass a value for kappa via acq_function_kwargs={"kappa": 0.5}.
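
For illustration, a user could then select the new acquisition function roughly as follows (a sketch; the config space and metric name are placeholders, and the arguments are forwarded through search_options as described in the factory section below):

from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import BayesianOptimization

config_space = {"x": uniform(0.0, 1.0)}  # placeholder configuration space

scheduler = BayesianOptimization(
    config_space,
    metric="validation_error",  # placeholder metric name
    mode="min",
    search_options={
        "acq_function": "lcb",
        "acq_function_kwargs": {"kappa": 0.5},
    },
)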

A slightly more involved example is EIAcquisitionFunction, representing the expected improvement (EI) acquisition function, which is the default choice for BayesianOptimization in Syne Tune. This function depends on the incumbent, so current_best needs to be given. Note that if the means passed to _compute_head have shape (n, nf) due to fantasies, then current_best has shape (1, nf), since the incumbent depends on the fantasy sample.

Acquisition functions can depend on more than one surrogate model. In such a case, the predictor argument to their constructor is a dictionary, and the key names of the corresponding models (or outputs) are also used in the output_to_predictions arguments and head gradients:

  • EIpuAcquisitionFunction is an acquisition function for cost-aware HPO:

    \[\alpha_{\mathrm{EIpu}}(\mathbf{x}) = \frac{\alpha_{\mathrm{EI}}(\mu_y(\mathbf{x}), \sigma_y(\mathbf{x}))}{\mu_c(\mathbf{x})^{\rho}}\]

    Here, \((\mu_y, \sigma_y)\) are predictions from the surrogate model for the target function \(y(\mathbf{x})\), whereas \(\mu_c\) are mean predictions for the cost function \(c(\mathbf{x})\). The latter can be represented by a deterministic surrogate model, whose predict method only returns means as “mean”. In fact, the method _output_to_keys_predict specifies which moments are required from each surrogate model.

  • CEIAcquisitionFunction is an acquisition function for constrained HPO:

    \[\alpha_{\mathrm{CEI}}(\mathbf{x}) = \alpha_{\mathrm{EI}}(\mu_y(\mathbf{x}), \sigma_y(\mathbf{x})) \cdot \mathbb{P}(c(\mathbf{x})\le 0).\]

    Here, \(y(\mathbf{x})\) is the target function, \(c(\mathbf{x})\) is the constraint function. Both functions are represented by probabilistic surrogate models, whose predict method returns means and stddevs. We say that \(\mathbf{x}\) is feasible if \(c(\mathbf{x})\le 0\), and the goal is to minimize \(y(\mathbf{x})\) over feasible points.

    One difficulty with this acquisition function is that the incumbent in the EI term is computed only over observations which are feasible (so \(c_i\le 0\)). This means we cannot rely on the surrogate model for \(y(\mathbf{x})\) to provide the incumbent, but instead need to determine the feasible incumbent ourselves, in the _get_current_bests_internal method.

A final complication in MeanStdAcquisitionFunction arises if some or all surrogate models are MCMC ensembles. In such a case, we average over the samples of each surrogate model involved. Inside this sum over the Cartesian product of samples, the incumbent depends on the sample index for each model. This is dealt with by CurrentBestProvider. In the default case for an acquisition function which needs the incumbent (such as, for example, EI), this value depends only on the model for the active (target) metric, and ActiveMetricCurrentBestProvider is used.

Note

Acquisition function implementations are independent of which auto-differentiation mechanism is used under the hood. Different to surrogate models, there is no acquisition function code in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd. This is because the implementation only needs to provide head gradients in compute_acq_with_gradient, which are easy to derive and compute for common acquisition functions.

Implementing a Covariance Function for GP Surrogate Models

A Gaussian process, modelling a random function \(y(\mathbf{x})\), is defined by a mean function \(\mu(\mathbf{x})\) and a covariance function (or kernel) \(k(\mathbf{x}, \mathbf{x}')\). While Syne Tune contains a number of different covariance functions for multi-fidelity HPO, where learning curves \(y(\mathbf{x}, r)\) are modelled, with \(r = 1, 2, \dots\) the number of epochs trained (details are provided here), it currently provides only the Matern 5/2 covariance function for models of \(y(\mathbf{x})\). A few comments up front:

  • Mean and covariance functions are parts of (Gaussian process) surrogate models. For these models, complex gradients are required for different purposes. First, our Bayesian optimization code supports gradient-based minimization of the acquisition function. Second, a surrogate model is fitted to observed data, which is typically done by gradient-based optimization (e.g., marginal likelihood optimization, empirical Bayes) or by gradient-based Markov Chain Monte Carlo (e.g., Hamiltonian Monte Carlo). This means that covariance function code must be written in a framework supporting automatic differentiation. In Syne Tune, this code resides in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd. It is based on autograd.

  • Covariance functions contain parameters to be fitted to observed data. Kernels in Syne Tune typically feature an overall output scale, as well as inverse bandwidths for the input. In the (so called) automatic relevance determination parameterization, we use one inverse bandwidth per input vector component. This allows the surrogate model to learn the relevance of certain input components: if components are not relevant to explain the observed data, their inverse bandwidths can be driven to very small values. Syne Tune uses code extracted from MXNet Gluon for managing parameters. The base class KernelFunction derives from MeanFunction, which derives from Block. The main service of this class is to maintain a parameter dictionary, collecting all parameters in the current object and its members (recursively).

In order to understand how a new covariance function can be implemented, we will go through the most important parts of Matern52. This covariance function is defined as:

\[k(\mathbf{x}, \mathbf{x}') = c \left( 1 + d + d^2/3 \right) e^{-d}, \quad d = \sqrt{5} \|\mathbf{S} (\mathbf{x} - \mathbf{x}')\|.\]

Its parameters are the output scale \(c > 0\) and the inverse bandwidths \(s_j > 0\), where \(\mathbf{S}\) is the diagonal matrix with diagonal entries \(s_j\). If ARD == False, there is only a single bandwidth parameter \(s > 0\).

First, we need some includes:

bayesopt/gpautograd/kernel/base.py – includes
import autograd.numpy as anp
from autograd.tracer import getval
from typing import Dict, Any

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants import (
    INITIAL_COVARIANCE_SCALE,
    INITIAL_INVERSE_BANDWIDTHS,
    DEFAULT_ENCODING,
    INVERSE_BANDWIDTHS_LOWER_BOUND,
    INVERSE_BANDWIDTHS_UPPER_BOUND,
    COVARIANCE_SCALE_LOWER_BOUND,
    COVARIANCE_SCALE_UPPER_BOUND,
    NUMERICAL_JITTER,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution import (
    Uniform,
    LogNormal,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon import Block
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers import (
    encode_unwrap_parameter,
    register_parameter,
    create_encoding,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    MeanFunction,
)


Since a number of covariance functions are simple expressions of squared distances \(\|\mathbf{S} (\mathbf{x} - \mathbf{x}')\|^2\), Syne Tune contains a block for this one:

bayesopt/gpautograd/kernel/base.py – SquaredDistance
class SquaredDistance(Block):
    r"""
    Block that is responsible for the computation of matrices of squared
    distances. The distances can possibly be weighted (e.g., ARD
    parametrization). For instance:

    .. math::

       m_{i j} = \sum_{k=1}^d ib_k^2 (x_{1: i k} - x_{2: j k})^2

       \mathbf{X}_1 = [x_{1: i j}],\quad \mathbf{X}_2 = [x_{2: i j}]

    Here, :math:`[ib_k]` is the vector :attr:`inverse_bandwidth`.
    if ``ARD == False``, ``inverse_bandwidths`` is equal to a scalar broadcast to the
    d components (with ``d = dimension``, i.e., the number of features in ``X``).

    :param dimension: Dimensionality :math:`d` of input vectors
    :param ARD: Automatic relevance determination (``inverse_bandwidth`` vector
        of size ``d``)? Defaults to ``False``
    :param encoding_type: Encoding for ``inverse_bandwidth``. Defaults to
        :const:`~syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.DEFAULT_ENCODING`
    """

    def __init__(
        self,
        dimension: int,
        ARD: bool = False,
        encoding_type: str = DEFAULT_ENCODING,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.ARD = ARD
        inverse_bandwidths_dimension = 1 if not ARD else dimension
        self.encoding = create_encoding(
            encoding_type,
            INITIAL_INVERSE_BANDWIDTHS,
            INVERSE_BANDWIDTHS_LOWER_BOUND,
            INVERSE_BANDWIDTHS_UPPER_BOUND,
            inverse_bandwidths_dimension,
            Uniform(INVERSE_BANDWIDTHS_LOWER_BOUND, INVERSE_BANDWIDTHS_UPPER_BOUND),
        )

        with self.name_scope():
            self.inverse_bandwidths_internal = register_parameter(
                self.params,
                "inverse_bandwidths",
                self.encoding,
                shape=(inverse_bandwidths_dimension,),
            )

    def _inverse_bandwidths(self):
        return encode_unwrap_parameter(self.inverse_bandwidths_internal, self.encoding)

    def forward(self, X1, X2):
        """Computes matrix of squared distances

        :param X1: input matrix, shape ``(n1, d)``
        :param X2: input matrix, shape ``(n2, d)``
        """
        # In case inverse_bandwidths is of size (1, dimension), dimension > 1,
        # ARD is handled by broadcasting
        inverse_bandwidths = anp.reshape(self._inverse_bandwidths(), (1, -1))

        X1_scaled = anp.multiply(X1, inverse_bandwidths)
        X1_squared_norm = anp.sum(anp.square(X1_scaled), axis=1)
        if X2 is X1:
            D = -2.0 * anp.dot(X1_scaled, anp.transpose(X1_scaled))
            X2_squared_norm = X1_squared_norm
        else:
            X2_scaled = anp.multiply(X2, inverse_bandwidths)
            D = -2.0 * anp.matmul(X1_scaled, anp.transpose(X2_scaled))
            X2_squared_norm = anp.sum(anp.square(X2_scaled), axis=1)
        D = D + anp.reshape(X1_squared_norm, (-1, 1))
        D = D + anp.reshape(X2_squared_norm, (1, -1))

        return anp.abs(D)

    def get_params(self) -> Dict[str, Any]:
        """
        Parameter keys are "inv_bw<k>" if ``dimension > 1``, and "inv_bw" if
        ``dimension == 1``.
        """
        inverse_bandwidths = anp.reshape(self._inverse_bandwidths(), (-1,))
        if inverse_bandwidths.size == 1:
            return {"inv_bw": inverse_bandwidths[0]}
        else:
            return {
                "inv_bw{}".format(k): inverse_bandwidths[k]
                for k in range(inverse_bandwidths.size)
            }

    def set_params(self, param_dict: Dict[str, Any]):
        dimension = self.encoding.dimension
        if dimension == 1:
            inverse_bandwidths = [param_dict["inv_bw"]]
        else:
            keys = ["inv_bw{}".format(k) for k in range(dimension)]
            for k in keys:
                assert k in param_dict, "'{}' not in param_dict = {}".format(
                    k, param_dict
                )
            inverse_bandwidths = [param_dict[k] for k in keys]
        self.encoding.set(self.inverse_bandwidths_internal, inverse_bandwidths)


  • In the constructor, we create a parameter vector for the inverse bandwidths \([s_j]\), which can be just a scalar if ARD == False. In Syne Tune, each parameter has an encoding (e.g., identity or logarithmic), which includes a lower and upper bound, an initial value, as well as a prior distribution. The latter is used for regularization during optimization.

  • The most important method is forward. Given two matrices \(\mathbf{X}_1\), \(\mathbf{X}_2\), whose rows are input vectors, we compute the matrix \([\|\mathbf{x}_{1:i} - \mathbf{x}_{2:j}\|^2]_{i, j}\) of squared distances. Most importantly, we use anp = autograd.numpy here instead of numpy. These autograd wrappers ensure that automatic differentiation can be used in order to compute gradients w.r.t. leaf nodes in the computation graph spanned by the numpy computations. Also, note the use of encode_unwrap_parameter in _inverse_bandwidths to obtain the inverse bandwidth parameters as a numpy array. Finally, note that X1 and X2 can be the same object, in which case we can save compute time and create a smaller computation graph.

  • Each block in Syne Tune also provides get_params and set_params methods, which are used for serialization and deserialization.

Given this code, the implementation of Matern52 is simple:

bayesopt/gpautograd/kernel/base.py – Matern52
class Matern52(KernelFunction):
    """
    Block that is responsible for the computation of Matern 5/2 kernel.

    if ``ARD == False``, ``inverse_bandwidths`` is equal to a scalar broadcast to the
    d components (with ``d = dimension``, i.e., the number of features in ``X``).

    Arguments on top of base class :class:`SquaredDistance`:

    :param has_covariance_scale: Kernel has covariance scale parameter? Defaults
        to ``True``
    """

    def __init__(
        self,
        dimension: int,
        ARD: bool = False,
        encoding_type: str = DEFAULT_ENCODING,
        has_covariance_scale: bool = True,
        **kwargs
    ):
        super(Matern52, self).__init__(dimension, **kwargs)
        self.has_covariance_scale = has_covariance_scale
        self.squared_distance = SquaredDistance(
            dimension=dimension, ARD=ARD, encoding_type=encoding_type
        )
        if has_covariance_scale:
            self.encoding = create_encoding(
                encoding_name=encoding_type,
                init_val=INITIAL_COVARIANCE_SCALE,
                constr_lower=COVARIANCE_SCALE_LOWER_BOUND,
                constr_upper=COVARIANCE_SCALE_UPPER_BOUND,
                dimension=1,
                prior=LogNormal(0.0, 1.0),
            )
            with self.name_scope():
                self.covariance_scale_internal = register_parameter(
                    self.params, "covariance_scale", self.encoding
                )

    @property
    def ARD(self) -> bool:
        return self.squared_distance.ARD

    def _covariance_scale(self):
        if self.has_covariance_scale:
            return encode_unwrap_parameter(
                self.covariance_scale_internal, self.encoding
            )
        else:
            return 1.0

    def forward(self, X1, X2):
        """Computes Matern 5/2 kernel matrix

        :param X1: input matrix, shape ``(n1,d)``
        :param X2: input matrix, shape ``(n2,d)``
        """
        covariance_scale = self._covariance_scale()
        X1 = self._check_input_shape(X1)
        if X2 is not X1:
            X2 = self._check_input_shape(X2)
        D = 5.0 * self.squared_distance(X1, X2)
        # Using the plain np.sqrt is numerically unstable for D ~ 0
        # (non-differentiability)
        # that's why we add NUMERICAL_JITTER
        B = anp.sqrt(D + NUMERICAL_JITTER)
        return anp.multiply((1.0 + B + D / 3.0) * anp.exp(-B), covariance_scale)

    def diagonal(self, X):
        X = self._check_input_shape(X)
        covariance_scale = self._covariance_scale()
        covariance_scale_times_ones = anp.multiply(
            anp.ones((getval(X.shape[0]), 1)), covariance_scale
        )

        return anp.reshape(covariance_scale_times_ones, (-1,))

    def diagonal_depends_on_X(self):
        return False

    def param_encoding_pairs(self):
        result = [
            (
                self.squared_distance.inverse_bandwidths_internal,
                self.squared_distance.encoding,
            )
        ]
        if self.has_covariance_scale:
            result.insert(0, (self.covariance_scale_internal, self.encoding))
        return result

    def get_covariance_scale(self):
        if self.has_covariance_scale:
            return self._covariance_scale()[0]
        else:
            return 1.0

    def set_covariance_scale(self, covariance_scale):
        assert self.has_covariance_scale, "covariance_scale is fixed to 1"
        self.encoding.set(self.covariance_scale_internal, covariance_scale)

    def get_params(self) -> Dict[str, Any]:
        result = self.squared_distance.get_params()
        if self.has_covariance_scale:
            result["covariance_scale"] = self.get_covariance_scale()
        return result

    def set_params(self, param_dict: Dict[str, Any]):
        self.squared_distance.set_params(param_dict)
        if self.has_covariance_scale:
            self.set_covariance_scale(param_dict["covariance_scale"])

  • In the constructor, we create an object of type SquaredDistance. A nice feature of MXNet Gluon blocks is that the parameter dictionary of an object is automatically extended by the dictionaries of members, so we don’t need to cater for that. Beware that this only works for members which are of type Block directly. If you use a list or dictionary containing such objects, you need to include their parameter dictionaries explicitly. Next, we also define a covariance scale parameter \(c > 0\), unless has_covariance_scale == False.

  • forward calls forward of the SquaredDistance object, then computes the kernel matrix, using anp = autograd.numpy once more.

  • diagonal returns the diagonal of the kernel matrix based on a matrix X of inputs. For this particular kernel, the diagonal does not depend on the content of X, but only its shape, which is why diagonal_depends_on_X returns False.

  • Besides get_params and set_params, we also need to implement param_encoding_pairs, which is required by the optimization code used for fitting the surrogate model parameters.

At this point, you should not have any major difficulties implementing a new covariance function, such as the Gaussian kernel or the Matern kernel with parameter 3/2.
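
For instance, a squared-exponential (Gaussian) kernel \(k(\mathbf{x}, \mathbf{x}') = c\, e^{-\|\mathbf{S}(\mathbf{x} - \mathbf{x}')\|^2 / 2}\) could be sketched as follows. This class is not part of Syne Tune; for brevity, it inherits the parameter handling (covariance scale, squared distance, get_params, set_params, param_encoding_pairs) from Matern52 and only overrides forward, whereas a standalone implementation would follow the full pattern shown above:

import autograd.numpy as anp

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    Matern52,
)


class GaussianKernel(Matern52):
    r"""
    Squared-exponential (Gaussian) kernel (sketch, not part of Syne Tune):

    .. math::

       k(\mathbf{x}, \mathbf{x}') = c \exp\left(
           -\frac{1}{2} \|\mathbf{S} (\mathbf{x} - \mathbf{x}')\|^2 \right)
    """

    def forward(self, X1, X2):
        covariance_scale = self._covariance_scale()
        X1 = self._check_input_shape(X1)
        if X2 is not X1:
            X2 = self._check_input_shape(X2)
        # ``SquaredDistance`` returns the matrix of squared weighted distances
        D = self.squared_distance(X1, X2)
        return anp.multiply(anp.exp(-0.5 * D), covariance_scale)

Since the diagonal of this kernel equals the covariance scale, just as for Matern 5/2, the inherited diagonal and diagonal_depends_on_X methods remain correct.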

The Factory for Gaussian Process Searchers

Once a covariance function (or any other component of a surrogate model) has been added, how is it accessed by a user? In general, all details about the surrogate model are specified in search_options passed to FIFOScheduler or BayesianOptimization. Available options are documented in GPFIFOSearcher. Syne Tune offers a range of searchers based on various Gaussian process surrogate models (e.g., single fidelity, multi-fidelity, constrained, cost-aware). The code to generate all required components for these searchers is bundled in gp_searcher_factory; for each type of searcher, there is a factory function and a defaults function, and BayesianOptimization (which is equivalent to FIFOScheduler with searcher="bayesopt") has its own such pair.

The searcher object itself is created in searcher_factory(), which is called in the constructor of FIFOScheduler after search_options have been merged with default values. This process keeps things simple for the user, who just has to specify the type of searcher by searcher, and additional arguments by search_options. For any argument not provided there, a sensible default value is used.

Factory and defaults functions in gp_searcher_factory are based on common code in this module, which reflects the complexity of some of the searchers, but is otherwise self-explanatory. As a continuation of the previous section, suppose we had implemented a novel covariance function to be used in GP-based Bayesian optimization. The user-facing argument to select a kernel is gp_base_kernel; its default value is “matern52-ard” (Matern 5/2 with ARD parameters). Here is the code for creating this covariance function in gp_searcher_factory:

gp_searcher_factory.py
def _create_base_gp_kernel(hp_ranges: HyperparameterRanges, **kwargs) -> KernelFunction:
    """
    The default base kernel is :class:`Matern52` with ARD parameters.
    But in the transfer learning case, the base kernel is a product of
    two ``Matern52`` kernels, the first non-ARD over the categorical
    parameter determining the task, the second ARD over the remaining
    parameters.
    """
    input_warping = kwargs.get("input_warping", False)
    if kwargs.get("transfer_learning_task_attr") is not None:
        if input_warping:
            logger.warning(
                "Cannot use input_warping=True together with transfer_learning_task_attr. Will use input_warping=False"
            )
        # Transfer learning: Specific base kernel
        kernel = create_base_gp_kernel_for_warmstarting(hp_ranges, **kwargs)
    else:
        has_covariance_scale = kwargs.get("has_covariance_scale", True)
        kernel = base_kernel_factory(
            name=kwargs["gp_base_kernel"],
            dimension=hp_ranges.ndarray_size,
            has_covariance_scale=has_covariance_scale,
        )
        if input_warping:
            # Use input warping on all coordinates which do not belong to a
            # categorical hyperparameter
            kernel = kernel_with_warping(kernel, hp_ranges)
            if kwargs.get("debug_log", False) and isinstance(kernel, WarpedKernel):
                ranges = [(warp.lower, warp.upper) for warp in kernel.warpings]
                logger.info(
                    f"Creating base GP covariance kernel with input warping: ranges = {ranges}"
                )
    return kernel


  • Ignoring transfer_learning_task_attr, we first call base_kernel_factory to create the base kernel, passing kwargs["gp_base_kernel"] as its name.

  • Syne Tune also supports warping of the inputs to a kernel, which adds two more parameters for each component (except those coming from categorical hyperparameters, which are not warped).
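
Both of these options are exposed via search_options, for example (a sketch, analogous to the acquisition function example earlier in this tutorial):

# Passed to ``BayesianOptimization`` (or ``FIFOScheduler``) via ``search_options``
search_options = {
    "gp_base_kernel": "matern52-noard",  # select the base kernel by name
    "input_warping": True,  # warp non-categorical input coordinates
}

The base kernel itself is created by base_kernel_factory: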

bayesopt/models/kernel_factory.py
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
    KernelFunction,
    Matern52,
    ExponentialDecayResourcesKernelFunction,
    ExponentialDecayResourcesMeanFunction,
    FreezeThawKernelFunction,
    FreezeThawMeanFunction,
    CrossValidationMeanFunction,
    CrossValidationKernelFunction,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping import (
    WarpedKernel,
    Warping,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
    MeanFunction,
)


SUPPORTED_BASE_MODELS = (
    "matern52-ard",
    "matern52-noard",
)


def base_kernel_factory(name: str, dimension: int, **kwargs) -> KernelFunction:
    assert (
        name in SUPPORTED_BASE_MODELS
    ), f"name = {name} not supported. Choose from:\n{SUPPORTED_BASE_MODELS}"
    return Matern52(
        dimension=dimension,
        ARD=name == "matern52-ard",
        has_covariance_scale=kwargs.get("has_covariance_scale", True),
    )


  • base_kernel_factory creates the base kernel, based on its name (which must be in SUPPORTED_BASE_MODELS), the dimension of input vectors, as well as further parameters (has_covariance_scale in our example). Currently, Syne Tune only supports the Matern 5/2 kernel, with and without ARD.

  • Had we implemented a novel covariance function, we would have to select a new name, insert it into SUPPORTED_BASE_MODELS, and insert code into base_kernel_factory, as sketched below. Once this is done, the new base kernel can also be selected as a component in multi-fidelity or constrained Bayesian optimization.
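
Continuing with the hypothetical GaussianKernel sketched in the previous section, the modification could look roughly as follows (imports as in the listing above; “gaussian” is a made-up name):

SUPPORTED_BASE_MODELS = (
    "matern52-ard",
    "matern52-noard",
    "gaussian",  # hypothetical new entry
)


def base_kernel_factory(name: str, dimension: int, **kwargs) -> KernelFunction:
    assert (
        name in SUPPORTED_BASE_MODELS
    ), f"name = {name} not supported. Choose from:\n{SUPPORTED_BASE_MODELS}"
    has_covariance_scale = kwargs.get("has_covariance_scale", True)
    if name == "gaussian":
        # ``GaussianKernel`` is the kernel sketched in the previous section
        return GaussianKernel(
            dimension=dimension,
            ARD=True,
            has_covariance_scale=has_covariance_scale,
        )
    return Matern52(
        dimension=dimension,
        ARD=name == "matern52-ard",
        has_covariance_scale=has_covariance_scale,
    )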

Combining a Gaussian Process Model from Components

We have already seen above how to implement a surrogate model from scratch. However, many Gaussian process models proposed in the Bayesian optimization literature are combinations of more basic underlying models. In this section, we show how such combinations are implemented in Syne Tune.

Note

When planning to implement a new Gaussian process model, you should first check whether the outcome is simply a Gaussian process with mean and covariance function arising from combinations of means and kernels of the components. If that is the case, it is often simpler and more efficient to implement a new mean and covariance function using existing code (as shown above), and to use a standard GP model with these functions.

Independent Processes for Multiple Fidelities

In this section, we will look at the example of the independent module, which provides a surrogate model for a set of functions \(y(\mathbf{x}, r)\), where \(r\in \mathcal{R}\) is an integer from a finite set. This model is used in the context of multi-fidelity HPO. Each \(y(\mathbf{x}, r)\) is represented by an independent Gaussian process, with mean function \(\mu_r(\mathbf{x})\) and covariance function \(c_r k(\mathbf{x}, \mathbf{x}')\). The covariance function \(k\) is shared between all the processes, but the scale parameters \(c_r > 0\) are different for each process. In multi-fidelity HPO, we observe more data at smaller resource levels \(r\). Using the same ARD-parameterized kernel for all processes allows statistical strength to be shared between the different levels. The code in independent follows a useful pattern:

  • IndependentGPPerResourcePosteriorState: Posterior state, representing the posterior distribution after conditioning on data. This is used (a) to compute the log marginal likelihood for fitting the model parameters, and (b) for predictions driving the acquisition function optimization.

  • IndependentGPPerResourceMarginalLikelihood: Wraps code to generate posterior state, and represents the negative log marginal likelihood function used to fit the model parameters.

  • IndependentGPPerResourceModel: Wraps code for creating the likelihood object. API towards higher level code.

The code of IndependentGPPerResourcePosteriorState is a simple reduction to GaussProcPosteriorState, the posterior state for a basic Gaussian process. For example, here is the code to compute the posterior state:

bayesopt/gpautograd/independent/posterior_state.py
    def _compute_states(
        self,
        features: np.ndarray,
        targets: np.ndarray,
        kernel: KernelFunction,
        mean: Dict[int, MeanFunction],
        covariance_scale: Dict[int, np.ndarray],
        noise_variance: Dict[int, np.ndarray],
        resource_attr_range: Tuple[int, int],
        debug_log: bool = False,
    ):
        features, resources = decode_extended_features(features, resource_attr_range)
        self._states = dict()
        for resource, mean_function in mean.items():
            cov_scale = covariance_scale[resource]
            rows = np.flatnonzero(resources == resource)
            if rows.size > 0:
                r_features = features[rows]
                r_targets = targets[rows]
                self._states[resource] = GaussProcPosteriorState(
                    features=r_features,
                    targets=r_targets,
                    mean=mean_function,
                    kernel=(kernel, cov_scale),
                    noise_variance=noise_variance[resource],
                    debug_log=debug_log,
                )

  • mean and covariance_scale are dictionaries containing \(\mu_r\) and \(c_r\) respectively.

  • features are extended features of the form \((\mathbf{x}_i, r_i)\). The function decode_extended_features maps this to arrays \([\mathbf{x}_i]\) and \([r_i]\).

  • We compute separate posterior states for each level \(r\in\mathcal{R}\), using the data \((\mathbf{x}_i, y_i)\) for which \(r_i = r\).

  • Other methods of the base class PosteriorStateWithSampleJoint are implemented accordingly, reducing computations to the states for each level.

The code of IndependentGPPerResourceMarginalLikelihood is straightforward, given the base class MarginalLikelihood. The same holds for IndependentGPPerResourceModel, given the base class GaussianProcessOptimizeModel. One interesting feature is that the creation of the likelihood object is delayed, because the set of rung levels \(\mathcal{R}\) of the multi-fidelity scheduler needs to be known. The create_likelihood method is called in configure_scheduler(), a callback function with the scheduler as argument.

Since our independent GP model implements the APIs of MarginalLikelihood and GaussianProcessOptimizeModel, we can plug it into generic code in syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model, which works as outlined above. In particular, the estimator GaussProcEmpiricalBayesEstimator accepts gp_model of type IndependentGPPerResourceModel, and it creates predictors of type GaussProcPredictor.

Overview of gpautograd

Most of the code in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd adheres to the same pattern: a posterior state, a likelihood function, and a model wrapper.

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. PASHA is an approach designed to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. PASHA extends ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA.

What is PASHA?

The goal of PASHA is to identify well-performing configurations significantly faster than current methods, so that we can then retrain the model with the selected configuration (in practice on the combined training and validation sets). By giving preference to evaluating more configurations rather than evaluating them for longer than needed, PASHA can lead to significant speedups while achieving similar performance as existing methods.

PASHA is a variant of ASHA that starts with a small amount of initial resources and gradually increases them depending on the stability of configuration rankings in the top two rungs (rounds of promotion). Each time the ranking of configurations in the top two rungs becomes inconsistent, PASHA increases the maximum number of resources. This can be understood as “unlocking” a new rung level. An illustration of how PASHA stops early if the ranking of configurations has stabilized is shown in Figure 1.

_images/pasha_illustration.png

Given that deep-learning algorithms typically rely on stochastic gradient descent, ranking inconsistencies can occur between similarly performing configurations. Hence, we need some leniency in estimating the ranking. As a solution, PASHA uses a soft-ranking approach where we group configurations based on their validation performance metric (e.g., accuracy).

In soft ranking, configurations are still sorted by predictive performance but they are considered equivalent if the performance difference is smaller than a value \(\epsilon\) (or equal to it). Instead of producing a sorted list of configurations, this provides a list of lists where for every position of the ranking there is a list of equivalent configurations. The concept is explained graphically in Figure 2. The value of \(\epsilon\) is automatically estimated by measuring noise in rankings.

_images/soft_ranking.png
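
To make the soft-ranking idea concrete, here is a toy illustration (plain Python, not PASHA's actual implementation; in PASHA, \(\epsilon\) is estimated automatically from the noise in rankings):

def soft_rank(results, epsilon):
    """
    ``results``: maps configuration name to validation accuracy (higher is better).
    Returns a list of lists: for every position of the ranking, the list of
    configurations whose accuracy is within ``epsilon`` of the configuration
    at that position.
    """
    ranked = sorted(results, key=results.get, reverse=True)
    return [
        [other for other in ranked if abs(results[other] - results[name]) <= epsilon]
        for name in ranked
    ]


results = {"A": 0.92, "B": 0.91, "C": 0.85}
print(soft_rank(results, epsilon=0.02))
# [['A', 'B'], ['A', 'B'], ['C']]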

How well does PASHA work?

Experimental evaluation has shown that PASHA consistently leads to strong improvements in runtime, while achieving similar accuracies as ASHA. For example, PASHA is about three times faster than ASHA on NASBench201. Full experiments and further details are available in PASHA: Efficient HPO and NAS with Progressive Resource Allocation.

We provide an example script launch_pasha_nasbench201.py that shows how to run an experiment with PASHA on NASBench201.
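
For orientation, launching PASHA looks roughly as follows (a sketch only: the config space, metric, resource attribute names, and training script are placeholders, and argument names should be checked against the example script):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import PASHA

# Placeholder config space; "epochs" holds the maximum resource level
config_space = {
    "learning_rate": uniform(1e-4, 1e-1),
    "epochs": 27,
}

scheduler = PASHA(
    config_space,
    metric="accuracy",  # placeholder: metric reported by the training script
    mode="max",
    resource_attr="epoch",  # placeholder: resource attribute reported per epoch
    max_resource_attr="epochs",
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # placeholder script
    scheduler=scheduler,
    # Number of finished configurations is a suitable stopping criterion for PASHA
    stop_criterion=StoppingCriterion(max_num_trials_finished=64),
    n_workers=4,
)
tuner.run()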

Recommendations

  • PASHA is particularly useful for large-scale datasets with millions of datapoints, where it can lead to, for example, a 15x speedup compared to ASHA.

  • If only a few epochs are used for training, it is useful to define rung levels in terms of the number of datapoints processed rather than the number of epochs. This makes it possible for PASHA to stop the HPO significantly earlier and obtain a large speedup.

  • A suitable stopping criterion for PASHA is the number of configurations evaluated so far, but stopping criteria based on wall-clock time can also be used. With time-based criteria, PASHA only makes an impact if the stopping time is chosen to be a small value.

Using Syne Tune for Transfer Learning

Transfer learning allows us to speed up our current optimisation by learning from related optimisation runs. For instance, imagine we want to change from a smaller to a larger model. We already have a collection of hyperparameter evaluations for the smaller model. Then we can use these to guide our hyperparameter optimisation of the larger model, for instance by starting with the configuration that performed best. Or imagine that we keep the same model, but add more training data or add another data feature. Then we expect good hyperparameter configurations on the previous training data to work well on the augmented data set as well.

Syne Tune includes implementations of several transfer learning schedulers; a list of available schedulers is given here. In this tutorial we look at three of them:

  • ZeroShotTransfer
    Sequential Model-Free Hyperparameter Tuning.
    Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme.
    IEEE International Conference on Data Mining (ICDM) 2015.

    First we calculate the rank of each hyperparameter configuration on each previous task. Then we choose configurations in order to minimise the sum of the ranks across the previous tasks. The idea is to speed up optimisation by picking configurations with high ranks on previous tasks.
  • BoundingBox
    Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning.
    Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton.
    NeurIPS 2019.

    We construct a smaller hyperparameter search space by taking the minimum box which contains the optimal configurations for the previous tasks. The idea is to speed up optimisation by not searching areas which have been suboptimal for all previous tasks.
  • Quantiles (quantile_based_searcher)
    A Quantile-based Approach for Hyperparameter Transfer Learning.
    David Salinas, Huibin Shen, Valerio Perrone.
    ICML 2020.

    We map the hyperparameter evaluations to quantiles for each task. Then we learn a distribution of quantiles given hyperparameters. Finally, we sample from the distribution and evaluate the best sample. The idea is to speed up optimisation by searching areas with high-ranking configurations but without enforcing hard limits on the search space.

We compare them to standard BayesianOptimization (BO).

We construct a set of tasks based on the height example. We first collect evaluations on five tasks, and then compare results on the sixth. We consider the single-fidelity case. For each task we assume a budget of 10 (max_trials) evaluations. We use BO on the preliminary tasks, and for the transfer task we compare BO, ZeroShot, BoundingBox and Quantiles. The set of tasks is made by adjusting the max_steps parameter in the height example, but could correspond to adjusting the training data instead.

The code is available here. Make sure to run it as python launch_transfer_learning_example.py --generate_plots if you want to generate the plots locally. The optimisations vary between runs, so your plots might look different.

In order to run our transfer learning schedulers we need to parse the output of the tuner into a dict of TransferLearningTaskEvaluations. We do this in the extract_transferable_evaluations function.

Code to prepare evaluations from previous tasks for transfer learning.
def filter_completed(df):
    # Filter out runs that didn't finish
    return df[df["status"] == "Completed"].reset_index()


def extract_transferable_evaluations(df, metric, config_space):
    """
    Take a dataframe from a tuner run, filter it and generate
    TransferLearningTaskEvaluations from it
    """
    filter_df = filter_completed(df)

    return TransferLearningTaskEvaluations(
        configuration_space=config_space,
        hyperparameters=filter_df[config_space.keys()],
        objectives_names=[metric],
        # objectives_evaluations need to be of shape
        # (num_evals, num_seeds, num_fidelities, num_objectives)
        # We only have one seed, fidelity and objective
        objectives_evaluations=np.array(filter_df[metric], ndmin=4).T,
    )


We start by collecting evaluations by running BayesianOptimization on the five preliminary tasks. We generate the different tasks by setting max_steps=1..5 in the backend in init_scheduler, giving five very similar tasks. Once we have run BO on the task we store the evaluations as TransferLearningTaskEvaluations.

Code to initialise schedulers, use it to optimise a task and collect evaluations on preliminary tasks.
def run_scheduler_on_task(entry_point, scheduler, max_trials):
    """
    Take a scheduler and run it for max_trials on the backend specified by entry_point
    Return a dataframe of the optimisation results
    """
    tuner = Tuner(
        trial_backend=LocalBackend(entry_point=str(entry_point)),
        scheduler=scheduler,
        stop_criterion=StoppingCriterion(max_num_trials_finished=max_trials),
        n_workers=4,
        sleep_time=0.001,
    )
    tuner.run()

    return tuner.tuning_status.get_dataframe()


def init_scheduler(
    scheduler_str, max_steps, seed, mode, metric, transfer_learning_evaluations
):
    """
    Initialise the scheduler
    """
    kwargs = {
        "metric": metric,
        "config_space": height_config_space(max_steps=max_steps),
        "mode": mode,
        "random_seed": seed,
    }
    kwargs_w_trans = copy.deepcopy(kwargs)
    kwargs_w_trans["transfer_learning_evaluations"] = transfer_learning_evaluations

    if scheduler_str == "BayesianOptimization":
        return BayesianOptimization(**kwargs)

    if scheduler_str == "ZeroShotTransfer":
        return ZeroShotTransfer(use_surrogates=True, **kwargs_w_trans)

    if scheduler_str == "Quantiles":
        return FIFOScheduler(
            searcher=QuantileBasedSurrogateSearcher(**kwargs_w_trans),
            **kwargs,
        )

    if scheduler_str == "BoundingBox":
        kwargs_sched_fun = {key: kwargs[key] for key in kwargs if key != "config_space"}
        kwargs_w_trans[
            "scheduler_fun"
        ] = lambda new_config_space, mode, metric: BayesianOptimization(
            new_config_space,
            **kwargs_sched_fun,
        )
        del kwargs_w_trans["random_seed"]
        return BoundingBox(**kwargs_w_trans)
    raise ValueError("scheduler_str not recognised")


if __name__ == "__main__":

    max_trials = 10
    np.random.seed(1)
    # Use train_height backend for our tests
    entry_point = str(
        Path(__file__).parent
        / "training_scripts"
        / "height_example"
        / "train_height.py"
    )

    # Collect evaluations on preliminary tasks
    transfer_learning_evaluations = {}
    for max_steps in range(1, 6):
        scheduler = init_scheduler(
            "BayesianOptimization",
            max_steps=max_steps,
            seed=np.random.randint(100),
            mode=METRIC_MODE,
            metric=METRIC_ATTR,
            transfer_learning_evaluations=None,
        )

        print("Optimising preliminary task %s" % max_steps)
        prev_task = run_scheduler_on_task(entry_point, scheduler, max_trials)

        # Generate TransferLearningTaskEvaluations from previous task
        transfer_learning_evaluations[max_steps] = extract_transferable_evaluations(
            prev_task, METRIC_ATTR, scheduler.config_space
        )

Then we run different schedulers to compare on our transfer task with max_steps=6. For ZeroShotTransfer we set use_surrogates=True, meaning that it uses an XGBoost model to estimate the rank of configurations, as we do not have evaluations of the same configurations on all previous tasks.

Code to run schedulers on transfer task.
    # Collect evaluations on transfer task
    max_steps = 6
    transfer_task_results = {}
    labels = ["BayesianOptimization", "BoundingBox", "ZeroShotTransfer", "Quantiles"]
    for scheduler_str in labels:
        scheduler = init_scheduler(
            scheduler_str,
            max_steps=max_steps,
            seed=max_steps,
            mode=METRIC_MODE,
            metric=METRIC_ATTR,
            transfer_learning_evaluations=transfer_learning_evaluations,
        )
        print("Optimising transfer task using %s" % scheduler_str)
        transfer_task_results[scheduler_str] = run_scheduler_on_task(
            entry_point, scheduler, max_trials
        )

We plot the results on the transfer task, showing only the first max_trials results. Since the transfer task is very similar to the preliminary tasks, we expect the transfer schedulers to do well, and that is what we see in the plot below: their early performance is much better than that of standard BO.

Plotting helper code.
def add_labels(ax, conf_space, title):
    ax.legend()
    ax.set_xlabel("width")
    ax.set_ylabel("height")
    ax.set_xlim([conf_space["width"].lower - 1, conf_space["width"].upper + 1])
    ax.set_ylim([conf_space["height"].lower - 10, conf_space["height"].upper + 10])
    ax.set_title(title)


def scatter_space_exploration(ax, task_hyps, max_trials, label, color=None):
    ax.scatter(
        task_hyps["width"][:max_trials],
        task_hyps["height"][:max_trials],
        alpha=0.4,
        label=label,
        color=color,
    )


colours = {
    "BayesianOptimization": "C0",
    "BoundingBox": "C1",
    "ZeroShotTransfer": "C2",
    "Quantiles": "C3",
}


def plot_last_task(max_trials, df, label, metric, color):
    max_tr = min(max_trials, len(df))
    plt.scatter(range(max_tr), df[metric][:max_tr], label=label, color=color)
    plt.plot([np.min(df[metric][:ii]) for ii in range(1, max_trials + 1)], color=color)


Code to plot results on transfer task.
    # Optionally generate plots. Defaults to False
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--generate_plots", action="store_true", help="generate optimisation plots."
    )
    args = parser.parse_args()

    if args.generate_plots:
        from syne_tune.try_import import try_import_visual_message

        try:
            import matplotlib.pyplot as plt
        except ImportError:
            print(try_import_visual_message())

        print("Generating optimisation plots.")
        """ Plot the results on the transfer task """
        for label in labels:
            plot_last_task(
                max_trials,
                transfer_task_results[label],
                label=label,
                metric=METRIC_ATTR,
                color=colours[label],
            )
        plt.legend()
        plt.ylabel(METRIC_ATTR)
        plt.xlabel("Iteration")
        plt.title("Transfer task (max_steps=6)")
        plt.savefig("Transfer_task.png", bbox_inches="tight")
_images/Transfer_task.png

We also look at the parts of the search space explored. First by looking at the preliminary tasks.

Code to plot the configurations tried for the preliminary tasks.
        """ Plot the configs tried for the preliminary tasks """
        fig, ax = plt.subplots()
        for key in transfer_learning_evaluations:
            scatter_space_exploration(
                ax,
                transfer_learning_evaluations[key].hyperparameters,
                max_trials,
                "Task %s" % key,
            )
        add_labels(
            ax,
            scheduler.config_space,
            "Explored locations of BO for preliminary tasks",
        )
        plt.savefig("Configs_explored_preliminary.png", bbox_inches="tight")
_images/Configs_explored_preliminary.png

Then we look at the explored search space for the transfer task. For all the transfer methods, the first tested point (marked as a square) is closer to the previously explored optima (black crosses) than for BO, which starts by checking the middle of the search space.

Code to plot the configurations tried for the transfer task.
        """ Plot the configs tried for the transfer task """
        fig, ax = plt.subplots()

        # Plot the configs tried by the different schedulers on the transfer task
        for label in labels:
            finished_trials = filter_completed(transfer_task_results[label])
            scatter_space_exploration(
                ax, finished_trials, max_trials, label, color=colours[label]
            )

            # Plot the first config tested as a big square
            ax.scatter(
                finished_trials["width"][0],
                finished_trials["height"][0],
                marker="s",
                color=colours[label],
                s=100,
            )

        # Plot the optima from the preliminary tasks as black crosses
        past_label = "Preliminary optima"
        for key in transfer_learning_evaluations:
            argmin = np.argmin(
                transfer_learning_evaluations[key].objective_values(METRIC_ATTR)[
                    :max_trials, 0, 0
                ]
            )
            ax.scatter(
                transfer_learning_evaluations[key].hyperparameters["width"][argmin],
                transfer_learning_evaluations[key].hyperparameters["height"][argmin],
                color="k",
                marker="x",
                label=past_label,
            )
            past_label = None
        add_labels(ax, scheduler.config_space, "Explored locations for transfer task")
        plt.savefig("Configs_explored_transfer.png", bbox_inches="tight")
_images/Configs_explored_transfer.png

Distributed Hyperparameter Tuning: Finding the Right Model can be Fast and Fun

These sections are part of a tutorial given at the Open Data Science Conference Europe in June 2023. They provide hands-on examples for distributed hyperparameter tuning, as well as links to further details for self-teaching.

Note

The code used in this tutorial is contained in the Syne Tune sources; it is not installed by pip. You can obtain this code by installing Syne Tune from source, but the only code that is needed is in benchmarking.nursery.odsc_tutorial. You also need to have access to AWS SageMaker and work through these setups.

Getting Started with Hyperparameter Tuning

In this section, you will learn what is needed to get hyperparameter tuning up and running. We will look at an example where a deep learning language model is trained on natural language text.

What is Hyperparameter Tuning?

When solving a business problem with machine learning, there are parts which can be automated by spending compute resources, and other parts which require human expert attention and choices to be made. By automating some of the more tedious parts of the latter, hyperparameter tuning shifts the balance between these cost factors. Like any other smart tool, it saves you time to concentrate on where your strengths really lie, and where you can create the most value.

At a high level, hyperparameter tuning finds configurations of a system which optimize a target metric (or several ones, as we will see later). We can try any configuration from a configuration space, but each evaluation of the system has a cost and takes time. The main challenge of hyperparameter tuning is to run as few trials as possible, so that total costs are minimal. Also, if possible, trials should be run in parallel, so that the total experiment time is minimal.

In this tutorial, we will mostly be focussed on making decisions and tuning free parameters in the context of training machine learning models on data, so their predictions can be used as part of a solution to a business problem. There are many other steps between the initial need and a deployed solution, such as understanding business requirements, collecting, cleaning and labeling data, monitoring and maintenance. Some of these can be addressed with automated tuning as well, others need different tools.

A common paradigm for decision-making and parameter tuning is to try a number of different configurations and select the best in the end.

  • A trial consists of training a model on a part of the data (the training data). Here, training is an automated process (for example, stochastic gradient descent on weight and biases of a neural network model), given a configuration (e.g., what learning rate is used, what batch size, etc.). Then, the trained model is evaluated on another part of the data (validation data, disjoint from training data), giving rise to a quality metric (e.g., validation error, AUC, F1), or even several ones. For small datasets, we can also use cross-validation, by repeating training and evaluation on a number of different splits, reporting the average of validation metrics.

  • This metric value (or values) is the response of the system to a configuration. Note that the response is stochastic: if we run again with the same configuration, we may get a different value. This is because training has random elements (e.g., initial weights are sampled, ordering of training data).

Enough high level and definitions, let us dive into an example.

Annotating a Training Script

First, we need a script to execute a trial, by training a model and evaluating it. Since training models is bread and butter to machine learners, you will have no problem coming up with one. We start with an example: training_script_report_end.py. Ignoring the boilerplate, here are the important parts. First, we define the hyperparameters which should be optimized over:

transformer_wikitext2/code/training_script_report_end.py – hyperparameters
from syne_tune import Reporter
from syne_tune.config_space import randint, uniform, loguniform, add_to_argparse


METRIC_NAME = "val_loss"

MAX_RESOURCE_ATTR = "epochs"


_config_space = {
    "lr": loguniform(1e-6, 1e-3),
    "dropout": uniform(0, 0.99),
    "batch_size": randint(16, 48),
    "momentum": uniform(0, 0.99),
    "clip": uniform(0, 1),
}


  • The keys of _config_space are the hyperparameters we would like to tune (lr, dropout, batch_size, momentum, clip). It also defines their ranges and data types; we come back to this below.

  • METRIC_NAME is the name of the target metric returned, MAX_RESOURCE_ATTR the key name for how many epochs to train.

Next, here is the function which executes a trial:

transformer_wikitext2/code/training_script_report_end.py – objective
def objective(config):
    torch.manual_seed(config["seed"])
    use_cuda = config["use_cuda"]
    if torch.cuda.is_available() and not use_cuda:
        print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
    device = torch.device("cuda" if use_cuda else "cpu")
    # [1]
    # Download data, setup data loaders
    corpus = download_dataset(config)
    ntokens = len(corpus.dictionary)
    train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
    valid_data = batchify(corpus.valid, bsz=10, device=device)
    # Used for reporting metrics to Syne Tune
    report = Reporter()
    # [2]
    # Create model and optimizer
    model, optimizer, criterion = create_training_objects(config, ntokens, device)
    # [3]
    for epoch in range(1, config[MAX_RESOURCE_ATTR] + 1):
        train(model, train_data, optimizer, criterion, config, ntokens, epoch)
    # [4]
    # Report validation loss back to Syne Tune
    val_loss = evaluate(model, valid_data, criterion, config, ntokens)
    report(**{METRIC_NAME: val_loss})
  • The input config to objective is a configuration dictionary, containing values for the hyperparameters and other fixed parameters (such as the number of epochs to train).

  • [1] We start with downloading training and validation data. The training data loader train_data depends on hyperparameter config["batch_size"].

  • [2] Next, we create model and optimizer. This depends on the remaining hyperparameters in config.

  • [3] We then run config[MAX_RESOURCE_ATTR] epochs of training.

  • [4] Finally, we compute the error on the validation data and report it back to Syne Tune. The latter is done by creating report of type Reporter and calling it with a dictionary, using METRIC_NAME as key.

Finally, the script needs some command line arguments:

transformer_wikitext2/code/training_script_report_end.py – command line arguments
    parser = argparse.ArgumentParser(
        description="PyTorch Wikitext-2 Transformer Language Model",
        formatter_class=argparse.RawTextHelpFormatter,
    )
    parser.add_argument(
        "--" + MAX_RESOURCE_ATTR, type=int, default=40, help="upper epoch limit"
    )
    parser.add_argument("--use_cuda", type=int, default=1)
    parser.add_argument(
        "--input_data_dir",
        type=str,
        default="./",
        help="location of the data corpus",
    )
    parser.add_argument(
        "--optimizer_name", type=str, default="sgd", choices=["sgd", "adam"]
    )
    parser.add_argument("--bptt", type=int, default=35, help="sequence length")
    parser.add_argument("--seed", type=int, default=1111, help="random seed")
    parser.add_argument(
        "--precision", type=str, default="float", help="float | double | half"
    )
    parser.add_argument(
        "--log_interval",
        type=int,
        default=200,
        help="report interval",
    )
    parser.add_argument("--d_model", type=int, default=256, help="width of the model")
    parser.add_argument(
        "--ffn_ratio", type=int, default=1, help="the ratio of d_ffn to d_model"
    )
    parser.add_argument("--nlayers", type=int, default=2, help="number of layers")
    parser.add_argument(
        "--nhead",
        type=int,
        default=2,
        help="the number of heads in the encoder/decoder of the transformer model",
    )
    add_to_argparse(parser, _config_space)

    args, _ = parser.parse_known_args()
    args.use_cuda = bool(args.use_cuda)

    objective(config=vars(args))
  • We use an argument parser parser. Hyperparameters can be added with add_to_argparse(parser, _config_space) if the configuration space is defined in this script; otherwise, you can add them manually. We also need some further inputs which are not hyperparameters, for example MAX_RESOURCE_ATTR.

You can also provide the input to a training script as a JSON file.
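
For instance, a script could read its configuration from a JSON file whose path is passed on the command line. The following is only a hand-rolled sketch of that pattern using the standard library; the --config_json argument name is illustrative and not part of the Syne Tune API.

import argparse
import json


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Illustrative argument name, not a Syne Tune convention
    parser.add_argument("--config_json", type=str, required=True)
    args, _ = parser.parse_known_args()
    with open(args.config_json, "r") as f:
        # Dictionary with hyperparameters and fixed parameters
        config = json.load(f)
    # objective() as defined in the training script above
    objective(config=config)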

Compared to a vanilla training script, we only added two lines: creating report and calling it to report the validation error at the end.

Choosing a Configuration Space

Apart from annotating a training script, making hyperparameters explicit as inputs, you also need to define a configuration space. In our example, we add this definition to the script, but you can also keep it separate and use the same training script with different configuration spaces:

transformer_wikitext2/code/training_script_report_end.py – configuration space
_config_space = {
    "lr": loguniform(1e-6, 1e-3),
    "dropout": uniform(0, 0.99),
    "batch_size": randint(16, 48),
    "momentum": uniform(0, 0.99),
    "clip": uniform(0, 1),
}


  • Each hyperparameter gets assigned a data type and a range. In this example, batch_size is an integer, while lr, dropout, momentum, clip are floats. lr is encoded on a log scale.

Syne Tune provides a range of data types. Choosing them well requires a bit of attention, guidelines are given here.
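
For illustration, here are a few of the domain types Syne Tune offers; this is a non-exhaustive sketch with made-up ranges, see the linked guidelines for how to choose between them.

from syne_tune.config_space import (
    randint,  # integer, uniform over [lower, upper]
    lograndint,  # integer, uniform on a log scale
    uniform,  # float, uniform over [lower, upper]
    loguniform,  # float, uniform on a log scale
    choice,  # categorical, uniform over a finite set
)

example_space = {
    "batch_size": lograndint(16, 256),
    "num_layers": randint(1, 8),
    "dropout": uniform(0.0, 0.99),
    "lr": loguniform(1e-6, 1e-1),
    "optimizer_name": choice(["sgd", "adam"]),
}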

Specifying Default Values

Once you have annotated your training script and chosen a configuration space, you have specified all the input Syne Tune needs. You can now specify the details about your tuning experiment in code, as discussed here. However, Syne Tune provides some tooling in syne_tune.experiments which makes the life of most users easier, and we will use this tooling in the rest of the tutorial. To this end, we need to define some defaults about how experiments are to be run (most of these can be overwritten by command line arguments):

transformer_wikitext2/code/transformer_wikitext2_definition.py
from pathlib import Path

from transformer_wikitext2.code.training_script import (
    _config_space,
    METRIC_NAME,
    RESOURCE_ATTR,
    MAX_RESOURCE_ATTR,
)
from syne_tune.experiments.benchmark_definitions.common import RealBenchmarkDefinition
from syne_tune.remote.constants import (
    DEFAULT_GPU_INSTANCE_1GPU,
    DEFAULT_GPU_INSTANCE_4GPU,
)


def transformer_wikitext2_benchmark(sagemaker_backend: bool = False, **kwargs):
    if sagemaker_backend:
        instance_type = DEFAULT_GPU_INSTANCE_1GPU
    else:
        # For local backend, GPU cores serve different workers
        instance_type = DEFAULT_GPU_INSTANCE_4GPU
    fixed_parameters = dict(
        **{MAX_RESOURCE_ATTR: 40},
        d_model=256,
        ffn_ratio=1,
        nlayers=2,
        nhead=2,
        bptt=35,
        optimizer_name="sgd",
        input_data_dir="./",
        use_cuda=1,
        seed=1111,
        precision="float",
        log_interval=200,
    )
    config_space = {**_config_space, **fixed_parameters}
    _kwargs = dict(
        script=Path(__file__).parent / "training_script.py",
        config_space=config_space,
        metric=METRIC_NAME,
        mode="min",
        max_resource_attr=MAX_RESOURCE_ATTR,
        resource_attr=RESOURCE_ATTR,
        max_wallclock_time=5 * 3600,
        n_workers=4,
        instance_type=instance_type,
        framework="PyTorch",
    )
    _kwargs.update(kwargs)
    return RealBenchmarkDefinition(**_kwargs)

All you need to do is to provide a function (transformer_wikitext2_benchmark here) which returns an instance of RealBenchmarkDefinition. The most important fields are:

  • script: Filename of training script.

  • config_space: The configuration space to be used by default. This consists of two parts. First, the hyperparameters from _config_space, already discussed above. Second, the fixed_parameters, which are passed to each trial as they are. In particular, we would like to train for 40 epochs, so pass {MAX_RESOURCE_ATTR: 40}.

  • metric, max_resource_attr, resource_attr: Names of inputs to and metrics reported from the training script. If mode == "max", the target metric metric is maximized, if mode == "min", it is minimized.

  • max_wallclock_time: Wallclock time the experiment is going to run (5 hours in our example).

  • n_workers: Maximum number of trials which run in parallel (4 in our example). The achievable degree of parallelism may be lower, depending on which execution backend is used and which hardware instance we run on.

Also, note the role of **kwargs in the function signature, which allows any of the default values (e.g., max_wallclock_time, n_workers, or instance_type) to be overwritten, for instance via command line arguments.
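
For example, when calling the function directly, any default can be replaced by a keyword argument. A small usage sketch based on the definition above:

from transformer_wikitext2.code.transformer_wikitext2_definition import (
    transformer_wikitext2_benchmark,
)

# Same defaults as above, but a shorter experiment with more workers
benchmark = transformer_wikitext2_benchmark(
    sagemaker_backend=False,
    max_wallclock_time=3 * 3600,
    n_workers=8,
)
print(benchmark.instance_type, benchmark.n_workers)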

Note

In the Syne Tune experimentation framework, a tuning problem (i.e., a training and evaluation script together with defaults) is called a benchmark. This terminology is used even if the goal of experimentation is not benchmarking (i.e., comparing different HPO methods), as is the case in this tutorial.

Multi-Fidelity Hyperparameter Tuning

In our example above, a transformer language model is trained for 40 epochs before being validated. If a configuration performs poorly, we should find out earlier, and a lot of time could be saved by stopping poorly performing trials early. This is what multi-fidelity HPO methods are doing. There are different variants:

  • Early stopping (“stopping” type): Trials are not just validated after 40 epochs, but at the end of every epoch. If a trial is performing worse than many others trained for the same number of epochs, it is stopped early.

  • Pause and resume (“promotion” type): Trials are generally paused at the end of certain epochs, called rungs. A paused trial gets promoted (i.e., its training is resumed) if it does better than a majority of trials that have reached the same rung.

Syne Tune provides a large number of multi-fidelity HPO methods, more details are given in this tutorial. In this section, you learn what needs to be done to support multi-fidelity hyperparameter tuning.
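
As a concrete illustration of the two variants above, here is a minimal sketch which configures the ASHA baseline once in “stopping” and once in “promotion” mode. It assumes that the ASHA baseline accepts these arguments and that the training script reports its epoch count under the key "epoch"; see the referenced tutorial for authoritative usage.

from syne_tune.config_space import randint, uniform, loguniform
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "lr": loguniform(1e-6, 1e-3),
    "dropout": uniform(0, 0.99),
    "batch_size": randint(16, 48),
    "epochs": 40,  # maximum number of epochs, passed to the training script
}
common = dict(
    metric="val_loss",
    mode="min",
    resource_attr="epoch",  # key reported by the training script (assumed)
    max_resource_attr="epochs",
)
# Early stopping: underperforming trials are terminated at rung levels
asha_stopping = ASHA(config_space, type="stopping", **common)
# Pause and resume: trials are paused at rung levels and may be resumed later
asha_promotion = ASHA(config_space, type="promotion", **common)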

Annotating a Training Script for Multi-fidelity Tuning

Clearly, the training script training_script_report_end.py won’t do for multi-fidelity tuning. These methods need to know validation errors of models after each epoch of training, while the script above only validates the model at the end, after 40 epochs of training. A small modification of our training script, training_script_no_checkpoints.py, enables multi-fidelity tuning. The relevant part is this:

transformer_wikitext2/code/training_script_no_checkpoints.py – objective
def objective(config):
    torch.manual_seed(config["seed"])
    use_cuda = config["use_cuda"]
    if torch.cuda.is_available() and not use_cuda:
        print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
    device = torch.device("cuda" if use_cuda else "cpu")
    # Download data, setup data loaders
    corpus = download_dataset(config)
    ntokens = len(corpus.dictionary)
    train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
    valid_data = batchify(corpus.valid, bsz=10, device=device)
    # Used for reporting metrics to Syne Tune
    report = Reporter()
    # Create model and optimizer
    model, optimizer, criterion = create_training_objects(config, ntokens, device)

    for epoch in range(1, config[MAX_RESOURCE_ATTR] + 1):
        train(model, train_data, optimizer, criterion, config, ntokens, epoch)
        val_loss = evaluate(model, valid_data, criterion, config, ntokens)
        print("-" * 89)
        print(
            f"| end of epoch {epoch:3d} | valid loss {val_loss:5.2f} | "
            f"valid ppl {np.exp(val_loss):8.2f}"
        )
        print("-" * 89)
        # Report validation loss back to Syne Tune
        report(**{RESOURCE_ATTR: epoch, METRIC_NAME: val_loss})

Instead of calling report only once, at the end, we evaluate the model and report back at the end of each epoch. We also need to report the number of epochs done, using RESOURCE_ATTR as key. The execution backend receives these reports and relays them to the HPO method, which in turn makes a decision whether the trial may continue or should be stopped.

Checkpointing

Instead of stopping underperforming trials, some multi-fidelity methods rather pause trials. Any paused trial can be resumed in the future if there is evidence that it outperforms the majority of other trials. If training is very expensive, pause-and-resume scheduling can work better than early stopping, because any pause decision can be revisited in the future, while a stopping decision is final. Moreover, pause-and-resume scheduling does not require trials to be stopped, which can carry delays in some execution backends.

However, pause-and-resume scheduling needs checkpointing in order to work well. Once a trial is paused, its mutable state is stored on disk. When a trial gets resumed, this state is loaded from disk, and training resumes exactly from where it stopped.

Checkpointing needs to be implemented as part of the training script. Fortunately, Syne Tune provides some tooling to simplify this. Another modification of our training script, training_script.py, enables checkpointing. The relevant part is this:

transformer_wikitext2/code/training_script.py – objective
def objective(config):
    torch.manual_seed(config["seed"])
    use_cuda = config["use_cuda"]
    if torch.cuda.is_available() and not use_cuda:
        print("WARNING: You have a CUDA device, so you should run with --use-cuda 1")
    device = torch.device("cuda" if use_cuda else "cpu")
    # Download data, setup data loaders
    corpus = download_dataset(config)
    ntokens = len(corpus.dictionary)
    train_data = batchify(corpus.train, bsz=config["batch_size"], device=device)
    valid_data = batchify(corpus.valid, bsz=10, device=device)
    # Used for reporting metrics to Syne Tune
    report = Reporter()
    # Create model and optimizer
    model, optimizer, criterion = create_training_objects(config, ntokens, device)
    # [3]
    # Checkpointing
    state_dict_objects = {
        "model": model,
        "optimizer": optimizer,
    }
    if config["precision"] == "half":
        state_dict_objects["amp"] = amp
    load_model_fn, save_model_fn = pytorch_load_save_functions(
        state_dict_objects=state_dict_objects,
    )
    # [2]
    # Resume from checkpoint
    resume_from = resume_from_checkpointed_model(config, load_model_fn)

    for epoch in range(resume_from + 1, config[MAX_RESOURCE_ATTR] + 1):
        train(model, train_data, optimizer, criterion, config, ntokens, epoch)
        val_loss = evaluate(model, valid_data, criterion, config, ntokens)
        print("-" * 89)
        print(
            f"| end of epoch {epoch:3d} | valid loss {val_loss:5.2f} | "
            f"valid ppl {np.exp(val_loss):8.2f}"
        )
        print("-" * 89)
        # [1]
        # Write checkpoint
        checkpoint_model_at_rung_level(config, save_model_fn, epoch)
        # Report validation loss back to Syne Tune
        report(**{RESOURCE_ATTR: epoch, METRIC_NAME: val_loss})

Full details about supporting checkpointing are given in this tutorial. In a nutshell:

  • [1] Checkpoints have to be written at the end of each epoch, to a path passed as a command line argument. A checkpoint needs to include the epoch number when it was written.

  • [2] Before the training loop starts, a checkpoint should be loaded from the same place. If one is found, the training loop skips all epochs already done. If not, it starts from scratch as usual.

  • [3] Syne Tune provides some checkpointing tooling for PyTorch models.

At this point, we have a final version, training_script.py, of our training script, which can be used with all HPO methods in Syne Tune. While earlier versions are simpler to implement, we recommend including reporting and checkpointing after every epoch in any training script you care about. When checkpoints become very large, you may run into problems with disk space, which can be dealt with as described here.

Note

The pause-and-resume HPO methods in Syne Tune also work if checkpointing is not implemented. However, this means that a trial to be resumed in fact starts training from scratch. The additional overhead makes running these methods less attractive. We strongly recommend implementing checkpointing.

Comparing Different HPO Methods

We have learned about different methods for hyperparameter tuning:

  • RandomSearch: Sample configurations at random

  • BayesianOptimization: Learn how to best sample by probabilistic modeling of past observations

  • ASHA: Compare running trials with each other after certain numbers of epochs and stop those which underperform

  • MOBSTER: Combine early stopping from ASHA with informed sampling from BayesianOptimization

How do these methods compare when applied to our transformer_wikitext2 tuning problem? In this section, we look at comparative plots which can easily be generated with Syne Tune.

Note

Besides MOBSTER, Syne Tune provides a number of additional state-of-the-art model-based variants of ASHA, such as HyperTune or DyHPO. Moreover, these methods can be configured in many ways, see this tutorial.

A Comparative Study

It is easy to compare different setups with each other in Syne Tune, be it a number of HPO methods, or the same method under different variations, such as different numbers of workers or different configuration spaces. First, we specify which methods to compare with each other:

transformer_wikitext2/baselines.py
from syne_tune.experiments.default_baselines import (
    RandomSearch,
    BayesianOptimization,
    ASHA,
    MOBSTER,
)


class Methods:
    RS = "RS"
    BO = "BO"
    ASHA = "ASHA"
    MOBSTER = "MOBSTER"


methods = {
    Methods.RS: lambda method_arguments: RandomSearch(method_arguments),
    Methods.BO: lambda method_arguments: BayesianOptimization(method_arguments),
    Methods.ASHA: lambda method_arguments: ASHA(method_arguments, type="promotion"),
    Methods.MOBSTER: lambda method_arguments: MOBSTER(
        method_arguments, type="promotion"
    ),
}

We compare random search (RS), Bayesian Optimization (BO), ASHA (ASHA), and MOBSTER (MOBSTER), deviating from the defaults for each method only in that we use the promotion (or pause-and-resume) variant of the latter two. Next, we specify which benchmarks we would like to consider in our study:

transformer_wikitext2/benchmark_definitions.py
from typing import Dict

from syne_tune.experiments.benchmark_definitions import RealBenchmarkDefinition
from transformer_wikitext2.code.transformer_wikitext2_definition import (
    transformer_wikitext2_benchmark,
)


def benchmark_definitions(
    sagemaker_backend: bool = False, **kwargs
) -> Dict[str, RealBenchmarkDefinition]:
    return {
        "transformer_wikitext2": transformer_wikitext2_benchmark(
            sagemaker_backend=sagemaker_backend, **kwargs
        ),
    }

The only benchmark we consider in this study is our transformer_wikitext2 tuning problem, with its default configuration space (in general, many benchmarks can be selected from benchmarking.benchmark_definitions.real_benchmark_definitions.real_benchmark_definitions()). Our study has the following properties:

  • We use LocalBackend as execution backend, which runs n_workers=4 trials as parallel processes. The AWS instance type is instance_type="ml.g4dn.12xlarge", which provides 4 GPUs, one for each worker.

  • We repeat each experiment 10 times with different random seeds, so that all in all, we run 40 experiments (4 methods, 10 seeds).
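
As an aside, the real_benchmark_definitions() function mentioned above returns a dictionary from benchmark name to RealBenchmarkDefinition, so the available built-in benchmarks can be listed in a couple of lines (a sketch; it assumes the benchmarking sources are on the Python path):

from benchmarking.benchmark_definitions.real_benchmark_definitions import (
    real_benchmark_definitions,
)

# Maps benchmark name to its RealBenchmarkDefinition
benchmarks = real_benchmark_definitions(sagemaker_backend=False)
print(sorted(benchmarks.keys()))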

These details are specified in scripts hpo_main.py and launch_remote.py, which we will discuss in more detail below (see Launching Experiments Remotely), along with the choice of the execution backend. Once all experiments have finished (if all of them are run in parallel, this takes a little more than max_wallclock_time, or 5 hours), we can visualize results.

Local transformer_wikitext2

Comparison of methods on transformer_wikitext2 benchmark, using the local backend with 4 workers.

We can clearly see the benefits coming both from Bayesian optimization (intelligent rather than random sampling) and multi-fidelity scheduling. A combination of the two, MOBSTER, provides both a rapid initial decrease and the best performance after 5 hours.

Launching Experiments Remotely

As a machine learning practitioner, you operate in a highly competitive landscape. Your success depends to a large extent on whether you can decrease the time to the next decision. In this section, we discuss one important approach, namely how to increase the number of experiments run in parallel.

Note

Imports in our scripts are absolute against the root package transformer_wikitext2, so that only the code in benchmarking.nursery.odsc_tutorial has to be present. In order to run them, you need to append <abspath>/odsc_tutorial/ to the PYTHONPATH environment variable. This is required even if you have installed Syne Tune from source.

Launching our Study

Here is how we specified and ran experiments of our study. First, we specify a script for launching experiments locally:

transformer_wikitext2/local/hpo_main.py
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_local import main


if __name__ == "__main__":
    main(methods, benchmark_definitions)

This is very simple, as most work is done by the generic syne_tune.experiments.launchers.hpo_main_local.main(). Note that hpo_main_local needs to be chosen, since we use the local backend.

This local launcher script can be used to configure your experiment, given additional command line arguments, as is explained in detail here.
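
For instance, a single local run could be launched along the following lines; the argument values are illustrative, and the full set of supported command line arguments is described in the linked documentation (replace ... by the absolute path to odsc_tutorial):

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/hpo_main.py \
  --experiment_tag odsc-local-test --benchmark transformer_wikitext2 \
  --num_seeds 1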

You can use hpo_main.py to launch experiments locally, but they’ll run sequentially, one after the other, and you need to have all dependencies installed locally. A second script is needed in order to launch many experiments in parallel:

transformer_wikitext2/local/launch_remote.py
from pathlib import Path

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_local import launch_remote


if __name__ == "__main__":
    entry_point = Path(__file__).parent / "hpo_main.py"
    source_dependencies = [str(Path(__file__).parent.parent)]
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        source_dependencies=source_dependencies,
    )

Once more, all the hard work is done in syne_tune.experiments.launchers.launch_remote_local.launch_remote(), where launch_remote_local needs to be chosen for the local backend. Most important is that our previous hpo_main.py is specified as entry_point here. Here is the command to run all experiments of our study in parallel (replace ... by the absolute path to odsc_tutorial):

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/launch_remote.py \
  --experiment_tag odsc-1 --benchmark transformer_wikitext2 --num_seeds 10
  • This command launches 40 SageMaker training jobs, running 10 random repetitions (seeds) for each of the 4 methods specified in baselines.py.

  • Each SageMaker training job uses one ml.g4dn.12xlarge AWS instance. You can only run all 40 jobs in parallel if your resource limit for this instance type is 40 or larger. Each training job will run a little longer than 5 hours, as specified by max_wallclock_time.

  • You can use --instance_type and --max_wallclock_time command line arguments to change these defaults. However, if you choose an instance type with less than 4 GPUs, the local backend will not be able to run 4 trials in parallel.

  • If benchmark_definitions.py defines a single benchmark only, the --benchmark argument can also be dropped.

When using remote launching, results of your experiments are written to S3, to the default bucket for your AWS account. Once all jobs have finished (which takes a little more than 5 hours if you have sufficient limits, and otherwise longer), you can create the comparative plot shown above, using this script:

transformer_wikitext2/local/plot_results.py
from typing import Dict, Any, Optional
import logging

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters


SETUPS = list(methods.keys())


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return metadata["algorithm"]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_names = ("odsc-1",)
    num_runs = 10
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=SETUPS,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        download_from_s3=download_from_s3,
    )
    # Create comparative plot (single panel)
    benchmark_name = "transformer_wikitext2"
    benchmark = benchmark_definitions(sagemaker_backend=False)[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(5, 8),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./odsc-comparison-local-{benchmark_name}.png",
    )

For details about visualization of results in Syne Tune, please consider this tutorial. In a nutshell, this is what happens:

  • Collect and filter results from all experiments of a study

  • Group them according to setup (HPO method here), aggregate over seeds

  • Create plot in which each setup is represented by a curve and confidence bars

Distributed Tuning

The second approach to shorten the time to the next decision is to decrease the time per experiment. This can be done, to some extent, by increasing the number of workers, i.e. the number of trials which are run in parallel. In this section, we show how this can be done.

Note

Imports in our scripts are absolute against the root package transformer_wikitext2, so that only the code in benchmarking.nursery.odsc_tutorial has to be present. In order to run them, you need to append <abspath>/odsc_tutorial/ to the PYTHONPATH environment variable. This is required even if you have installed Syne Tune from source.

Comparing Different Numbers of Workers

Our study above was done with 4 workers. With the local backend, an experiment with all its workers runs on a single instance. We need to select an instance type with at least 4 GPUs, and each training script can only use one of them.

Syne Tune provides another backend, SageMakerBackend, which executes each trial as a separate SageMaker training job. This allows you to decouple the number of workers from the instance type. In fact, for this backend, the default instance type for our benchmark is ml.g4dn.xlarge, which has a single GPU and is cheaper to run than the ml.g4dn.12xlarge we used with the local backend above.

In order to showcase the SageMaker backend, we run a second study comparing our 4 methods RS, BO, ASHA, and MOBSTER using a variable number of workers (2, 4, 8). Here, max_wallclock_time is 5 hours for 4 and 8 workers, but double that (10 hours) for 2 workers. Using the SageMaker backend instead of the local one only requires a minimal change in the launcher scripts:

transformer_wikitext2/sagemaker/hpo_main.py
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_sagemaker import main


if __name__ == "__main__":
    main(methods, benchmark_definitions)
transformer_wikitext2/sagemaker/launch_remote.py
from pathlib import Path

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_sagemaker import launch_remote


if __name__ == "__main__":
    entry_point = Path(__file__).parent / "hpo_main.py"
    source_dependencies = [str(Path(__file__).parent.parent)]
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        source_dependencies=source_dependencies,
    )

We import from hpo_main_sagemaker and launch_remote_sagemaker instead of hpo_main_local and launch_remote_local. Here is how the experiments are launched (replace ... by the absolute path to odsc_tutorial):

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python benchmarking/nursery/odsc_tutorial/transformer_wikitext2/sagemaker/launch_remote.py \
  --experiment_tag tmlr-10 --benchmark transformer_wikitext2 \
  --random_seed 2938702734 --scale_max_wallclock_time 1 \
  --num_seeds 5 --n_workers <n_workers>

Here, <n_workers> is 2, 4, or 8, respectively.

  • We run 5 random repetitions (seeds), therefore 20 experiments per value of <n_workers>.

  • Running the experiments for <n_workers> requires a resource limit larger or equal to <n_workers> * 20 for instance type ml.g4dn.xlarge. If your limit is less than this, you should launch fewer experiments in parallel, since otherwise most of the experiments will not be able to use <n_workers> workers.

  • With --scale_max_wallclock_time 1, we adjust max_wallclock_time if n_workers is smaller than the default value (4) for our benchmark. In our example, the case --n_workers 2 runs for 10 hours instead of 5.

Once all experiments are finished, with results written to S3, we can create a plot comparing the performance across different numbers of workers, using the following script:

transformer_wikitext2/sagemaker/plot_results.py
from typing import Dict, Any, Optional
import logging

from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters, SubplotParameters


TMLR10_SETUPS = [
    "2 workers",
    "4 workers",
    "8 workers",
]


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return f"{metadata['n_workers']} workers"


TMLR10_METHOD_TO_SUBPLOT = {
    "RS": 0,
    "BO": 1,
    "ASHA": 2,
    "MOBSTER": 3,
}


def metadata_to_subplot(metadata: dict) -> Optional[int]:
    return TMLR10_METHOD_TO_SUBPLOT[metadata["algorithm"]]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_names = ("tmlr-10",)
    num_runs = 5
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # We would like to have 4 subfigures, one for each method
    plot_params.subplots = SubplotParameters(
        nrows=2,
        ncols=2,
        kwargs=dict(sharex="all", sharey="all"),
        titles=["RS", "BO", "ASHA", "MOBSTER"],
        title_each_figure=True,
        legend_no=[0],
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=TMLR10_SETUPS,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        metadata_to_subplot=metadata_to_subplot,
        download_from_s3=download_from_s3,
    )
    # Create comparative plot (single panel)
    benchmark_name = "transformer_wikitext2"
    benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(5, 8),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./odsc-comparison-sagemaker-{benchmark_name}.png",
    )

For details about visualization of results in Syne Tune, please consider this tutorial. In a nutshell:

  • Different to the plot above, we have four subplots here, one for each method. In each subplot, we compare results for different numbers of workers.

  • metadata_to_subplot configures grouping w.r.t. subplot (depends on method), while metadata_to_setup configures grouping w.r.t. each curve shown in each subplot (depends on n_workers).

Here is the plot:

SageMaker transformer_wikitext2

Comparison of methods on transformer_wikitext2 benchmark, using the SageMaker backend with 2, 4, 8 workers.

  • In general, we obtain good results faster with more workers. However, especially for BO and MOBSTER, the improvements are less pronounced than one might expect.

  • Our results counter a common misconception that, as we go to higher degrees of parallelization of trials, the internals of the HPO method no longer matter and one might as well use random search. This is certainly not the case for our problem, where BO with 2 workers attains better performance after 5 hours than RS with 8 workers, at a quarter of the cost.

Drilling Down on Performance Differences

Often, we would like to gain an understanding about why one method performs better than another on a given problem. In this section, we show another type of visualization which can shed some light on this question.

Plotting Learning Curves per Trial

A useful step towards understanding performance differences between setups is to look at the learning curves of trials. Here is a script for creating such plots for the methods compared in our study:

transformer_wikitext2/local/plot_learning_curves.py
from typing import Dict, Any, Optional
import logging

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import (
    TrialsOfExperimentResults,
    PlotParameters,
    MultiFidelityParameters,
    SubplotParameters,
)


SETUPS = list(methods.keys())


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return metadata["algorithm"]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_names = ("odsc-1",)
    seed_to_plot = 0
    download_from_s3 = False  # Set ``True`` in order to download files from S3

    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        grid=True,
    )
    # We need to provide details about rung levels of the multi-fidelity methods.
    # Also, all methods compared are pause-and-resume
    multi_fidelity_params = MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 40],
        multifidelity_setups={"ASHA": True, "MOBSTER": True},
    )
    # We would like to have 4 subfigures, one for each method
    plot_params.subplots = SubplotParameters(
        nrows=2,
        ncols=2,
        kwargs=dict(sharex="all", sharey="all"),
        titles=SETUPS,
        title_each_figure=True,
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = TrialsOfExperimentResults(
        experiment_names=experiment_names,
        setups=SETUPS,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        multi_fidelity_params=multi_fidelity_params,
        download_from_s3=download_from_s3,
    )

    # Create plot for certain benchmark and seed
    benchmark_name = "transformer_wikitext2"
    benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
    )
    results.plot(
        benchmark_name=benchmark_name,
        seed=seed_to_plot,
        plot_params=plot_params,
        file_name=f"./odsc-learncurves-local-seed{seed_to_plot}.png",
    )

Full details about visualization of results in Syne Tune are given in this tutorial. In a nutshell, this is what happens:

  • The workflow is similar to comparative plots, but here, each setup occupies a different subfigure, and there is no aggregation over seeds (the seed has to be specified in results.plot).

  • Two of the methods compared are multi-fidelity (ASHA, MOBSTER), which is why additional information has to be passed as multi_fidelity_params. This is because learning curves are plotted differently for single-fidelity methods, multi-fidelity methods of early-stopping type, and multi-fidelity methods of pause-and-resume type.

  • With plot_params.subplots, we ask for a two-by-two matrix of subfigures. By default, subfigures are oriented as a single row.

Learning curves transformer_wikitext2

Learning curves of trials for different methods on transformer_wikitext2 benchmark, using the local backend with 4 workers.

  • Learning curves of different trials are plotted in different colors.

  • For ASHA and MOBSTER, learning curves are interrupted by pauses at rung levels, and in some cases resume later. Single markers are trials run for a single epoch only.

  • Comparing RS with BO, we see that BO learns to avoid early mistakes rapidly, while RS samples poorly performing configurations at a constant rate.

  • Comparing RS with ASHA, we see that ASHA stops poor trials early, so can explore more configurations, but still suffers from repeating mistakes over and over.

  • Comparing BO with MOBSTER, both clearly learn from the past. However, MOBSTER pauses suboptimal configurations earlier, which allows it to find very good configurations earlier than BO (in about half the time).

With a small modification of the script, we can plot pairs of subfigures for side-by-side comparisons:

transformer_wikitext2/local/plot_learning_curve_pairs.py
from typing import Dict, Any, Optional
import logging

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import (
    TrialsOfExperimentResults,
    PlotParameters,
    MultiFidelityParameters,
    SubplotParameters,
)


SETUPS = list(methods.keys())


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return metadata["algorithm"]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_names = ("odsc-1",)
    seed_to_plot = 0
    download_from_s3 = False  # Set ``True`` in order to download files from S3

    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        grid=True,
        ylim=(5, 13),
    )
    # We need to provide details about rung levels of the multi-fidelity methods.
    # Also, all methods compared are pause-and-resume
    multi_fidelity_params = MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 40],
        multifidelity_setups={"ASHA": True, "MOBSTER": True},
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = TrialsOfExperimentResults(
        experiment_names=experiment_names,
        setups=SETUPS,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        multi_fidelity_params=multi_fidelity_params,
        download_from_s3=download_from_s3,
    )

    # Create plots for certain benchmark and seed
    benchmark_name = "transformer_wikitext2"
    benchmark = benchmark_definitions(sagemaker_backend=True)[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
    )
    for indices, name in [
        ([0, 1], "rs-vs-bo"),
        ([0, 2], "rs-vs-asha"),
        ([1, 3], "bo-vs-mobster"),
    ]:
        plot_params.subplots = SubplotParameters(
            nrows=1,
            ncols=2,
            kwargs=dict(sharey="all"),
            subplot_indices=indices,
            titles=[SETUPS[ind] for ind in indices],
        )
        results.plot(
            benchmark_name=benchmark_name,
            seed=seed_to_plot,
            plot_params=plot_params,
            file_name=f"./odsc-learncurves-{name}-seed{seed_to_plot}.png",
        )

Videos featuring Syne Tune

API Reference

benchmarking package

Subpackages
benchmarking.benchmark_definitions package
Submodules
benchmarking.benchmark_definitions.distilbert_on_imdb module
benchmarking.benchmark_definitions.distilbert_on_imdb.distilbert_imdb_benchmark(sagemaker_backend=False, **kwargs)[source]
benchmarking.benchmark_definitions.finetune_transformer_glue module
benchmarking.benchmark_definitions.finetune_transformer_glue.finetune_transformer_glue_benchmark(sagemaker_backend=False, choose_model=False, dataset='rte', model_type='bert-base-cased', num_train_epochs=3, train_valid_fraction=0.7, random_seed=31415927, **kwargs)[source]

This benchmark consists of fine-tuning a Hugging Face transformer model, selected from the zoo, on one of the GLUE benchmarks:

Wang et al.
GLUE: A Multi-task Benchmark and Analysis Platform for Natural
Language Understanding
ICLR 2019
Parameters:
  • sagemaker_backend (bool) – Use SageMaker backend? This affects the choice of instance type. Defaults to False

  • choose_model (bool) – Should tuning involve selecting the best pre-trained model from PRETRAINED_MODELS? If so, the configuration space is extended by another choice variable. Defaults to False

  • dataset (str) – Name of GLUE task, from TASK2METRICSMODE. Defaults to “rte”

  • model_type (str) – Pre-trained model to be used. If choose_model is set, this is the model used in the first evaluation. Defaults to “bert-base-cased”

  • num_train_epochs (int) – Maximum number of epochs for fine-tuning. Defaults to 3

  • train_valid_fraction (float) – The original training set is split into a training and a validation part; this is the fraction allocated to the training part

  • random_seed (int) – Random seed for training script

  • kwargs – Overwrites default params in RealBenchmarkDefinition object returned

Return type:

RealBenchmarkDefinition
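
For illustration, a minimal sketch of how this definition might be used; the choice of GLUE task and number of epochs is arbitrary here, and "mrpc" is assumed to be among the tasks in TASK2METRICSMODE:

from benchmarking.benchmark_definitions.finetune_transformer_glue import (
    finetune_transformer_glue_benchmark,
)

# Fine-tune on the "mrpc" GLUE task and also tune over the pre-trained model
benchmark = finetune_transformer_glue_benchmark(
    choose_model=True,
    dataset="mrpc",
    num_train_epochs=3,
)
# Metric name and optimization mode can then be passed on to a scheduler
print(benchmark.metric, benchmark.mode)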

benchmarking.benchmark_definitions.finetune_transformer_glue.finetune_transformer_glue_all_benchmarks(sagemaker_backend=False, model_type='bert-base-cased', num_train_epochs=3, train_valid_fraction=0.7, random_seed=31415927, **kwargs)[source]
Return type:

Dict[str, RealBenchmarkDefinition]

benchmarking.benchmark_definitions.finetune_transformer_swag module
benchmarking.benchmark_definitions.finetune_transformer_swag.finetune_transformer_swag_benchmark(sagemaker_backend=False, num_train_epochs=3, per_device_train_batch_size=8, **kwargs)[source]
Parameters:
  • sagemaker_backend (bool) – Use SageMaker backend? This affects the choice of instance type. Defaults to False

  • num_train_epochs (int) – Maximum number of epochs for fine-tuning. Defaults to 3

  • per_device_train_batch_size (int) – Batch size per device. Defaults to 8

  • kwargs – Overwrites default params in RealBenchmarkDefinition object returned

Return type:

RealBenchmarkDefinition

benchmarking.benchmark_definitions.lstm_wikitext2 module
benchmarking.benchmark_definitions.lstm_wikitext2.lstm_wikitext2_benchmark(sagemaker_backend=False, **kwargs)[source]
benchmarking.benchmark_definitions.mlp_on_fashionmnist module
benchmarking.benchmark_definitions.mlp_on_fashionmnist.mlp_fashionmnist_benchmark(sagemaker_backend=False, **kwargs)[source]
benchmarking.benchmark_definitions.real_benchmark_definitions module
benchmarking.benchmark_definitions.real_benchmark_definitions.real_benchmark_definitions(sagemaker_backend=False, **kwargs)[source]
Return type:

Dict[str, RealBenchmarkDefinition]

benchmarking.benchmark_definitions.resnet_cifar10 module
benchmarking.benchmark_definitions.resnet_cifar10.resnet_cifar10_benchmark(sagemaker_backend=False, **kwargs)[source]
benchmarking.benchmark_definitions.transformer_wikitext2 module
benchmarking.benchmark_definitions.transformer_wikitext2.transformer_wikitext2_benchmark(sagemaker_backend=False, **kwargs)[source]
benchmarking.examples package
Subpackages
benchmarking.examples.benchmark_dehb package
Submodules
benchmarking.examples.benchmark_dehb.baselines module
class benchmarking.examples.benchmark_dehb.baselines.Methods[source]

Bases: object

ASHA = 'ASHA'
SYNCHB = 'SYNCHB'
DEHB = 'DEHB'
BOHB = 'BOHB'
ASHA_ORD = 'ASHA-ORD'
SYNCHB_ORD = 'SYNCHB-ORD'
DEHB_ORD = 'DEHB-ORD'
BOHB_ORD = 'BOHB-ORD'
ASHA_STOP = 'ASHA-STOP'
SYNCMOBSTER = 'SYNCMOBSTER'
benchmarking.examples.benchmark_dehb.baselines.conv_numeric_then_rest(margs)[source]
Return type:

Dict[str, Any]

benchmarking.examples.benchmark_dehb.benchmark_definitions module
benchmarking.examples.benchmark_dehb.hpo_main module
benchmarking.examples.benchmark_dehb.launch_remote module
benchmarking.examples.benchmark_dyhpo package
Submodules
benchmarking.examples.benchmark_dyhpo.baselines module
class benchmarking.examples.benchmark_dyhpo.baselines.Methods[source]

Bases: object

BO = 'BO'
ASHA = 'ASHA'
MOBSTER = 'MOBSTER'
HYPERTUNE = 'HyperTune'
DYHPO = 'DYHPO'
benchmarking.examples.benchmark_dyhpo.benchmark_definitions module
benchmarking.examples.benchmark_dyhpo.hpo_main module
benchmarking.examples.benchmark_dyhpo.launch_remote module
benchmarking.examples.benchmark_hypertune package
Submodules
benchmarking.examples.benchmark_hypertune.baselines module
class benchmarking.examples.benchmark_hypertune.baselines.Methods[source]

Bases: object

ASHA = 'ASHA'
MOBSTER_JOINT = 'MOBSTER-JOINT'
MOBSTER_INDEP = 'MOBSTER-INDEP'
HYPERTUNE_INDEP = 'HYPERTUNE-INDEP'
HYPERTUNE_JOINT = 'HYPERTUNE-JOINT'
SYNCHB = 'SYNCHB'
BOHB = 'BOHB'
benchmarking.examples.benchmark_hypertune.benchmark_definitions module
benchmarking.examples.benchmark_hypertune.hpo_main module
benchmarking.examples.benchmark_hypertune.launch_remote module
benchmarking.examples.benchmark_hypertune.plot_results module
benchmarking.examples.benchmark_warping package
Submodules
benchmarking.examples.benchmark_warping.baselines module
class benchmarking.examples.benchmark_warping.baselines.Methods[source]

Bases: object

RS = 'RS'
ASHA = 'ASHA'
BO = 'BO'
BO_WARP = 'BO-WARP'
BO_BOXCOX = 'BO-BOXCOX'
BO_WARP_BOXCOX = 'BO-WARP-BOXCOX'
MOBSTER = 'MOBSTER'
MOBSTER_WARP = 'MOBSTER-WARP'
MOBSTER_BOXCOX = 'MOBSTER-BOXCOX'
MOBSTER_WARP_BOXCOX = 'MOBSTER-WARP-BOXCOX'
benchmarking.examples.benchmark_warping.benchmark_definitions module
benchmarking.examples.benchmark_warping.hpo_main module
benchmarking.examples.benchmark_warping.launch_remote module
benchmarking.examples.demo_experiment package
Submodules
benchmarking.examples.demo_experiment.baselines module
class benchmarking.examples.demo_experiment.baselines.Methods[source]

Bases: object

RS = 'RS'
BO = 'BO'
ASHA = 'ASHA'
MOBSTER = 'MOBSTER'
ASHA_TANH = 'ASHA-TANH'
MOBSTER_TANH = 'MOBSTER-TANH'
ASHA_RELU = 'ASHA-RELU'
MOBSTER_RELU = 'MOBSTER-RELU'
benchmarking.examples.demo_experiment.benchmark_definitions module
benchmarking.examples.demo_experiment.hpo_main module
benchmarking.examples.demo_experiment.launch_remote module
benchmarking.examples.demo_experiment.plot_results module
benchmarking.examples.fine_tuning_transformer_glue package
Submodules
benchmarking.examples.fine_tuning_transformer_glue.baselines module
class benchmarking.examples.fine_tuning_transformer_glue.baselines.Methods[source]

Bases: object

BO = 'BO'
MOBSTER = 'MOBSTER'
benchmarking.examples.fine_tuning_transformer_glue.hpo_main module
benchmarking.examples.fine_tuning_transformer_glue.hpo_main.map_method_args(args, method, method_kwargs)[source]
Return type:

Dict[str, Any]

benchmarking.examples.fine_tuning_transformer_glue.launch_remote module
benchmarking.examples.fine_tuning_transformer_glue.plot_results module
benchmarking.examples.fine_tuning_transformer_glue.plot_results.metadata_to_setup(metadata)[source]
Return type:

Optional[str]

benchmarking.examples.fine_tuning_transformer_swag package
Submodules
benchmarking.examples.fine_tuning_transformer_swag.baselines module
class benchmarking.examples.fine_tuning_transformer_swag.baselines.Methods[source]

Bases: object

BO = 'BO'
MOBSTER = 'MOBSTER'
benchmarking.examples.fine_tuning_transformer_swag.hpo_main module
benchmarking.examples.fine_tuning_transformer_swag.hpo_main.map_method_args(args, method, method_kwargs)[source]
Return type:

Dict[str, Any]

benchmarking.examples.fine_tuning_transformer_swag.launch_remote module
benchmarking.examples.launch_local package
Submodules
benchmarking.examples.launch_local.baselines module
class benchmarking.examples.launch_local.baselines.Methods[source]

Bases: object

RS = 'RS'
BO = 'BO'
ASHA = 'ASHA'
MOBSTER = 'MOBSTER'
benchmarking.examples.launch_local.hpo_main module
benchmarking.examples.launch_local.launch_remote module
benchmarking.examples.launch_sagemaker package
Submodules
benchmarking.examples.launch_sagemaker.baselines module
class benchmarking.examples.launch_sagemaker.baselines.Methods[source]

Bases: object

RS = 'RS'
BO = 'BO'
ASHA = 'ASHA'
MOBSTER = 'MOBSTER'
benchmarking.examples.launch_sagemaker.hpo_main module
benchmarking.examples.launch_sagemaker.launch_remote module
benchmarking.training_scripts package
benchmarking.utils package
benchmarking.utils.get_cost_model_for_batch_size(params, batch_size_key, batch_size_range)[source]

Returns a cost model that depends on the batch size only.

Parameters:
  • params (Dict[str, Any]) – Command line arguments

  • batch_size_key (str) – Name of batch size entry in config

  • batch_size_range (Tuple[int, int]) – (lower, upper) for batch size, both sides are inclusive

Returns:

Cost model (or None if dependencies cannot be imported)
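
A minimal sketch of a call; the empty params dict, the key name, and the range are placeholders, since in practice params would be the parsed command line arguments of the training script:

from benchmarking.utils import get_cost_model_for_batch_size

cost_model = get_cost_model_for_batch_size(
    params={},                    # placeholder for parsed command line arguments
    batch_size_key="batch_size",  # name of the batch size entry in the config
    batch_size_range=(16, 512),   # inclusive lower and upper bounds
)
if cost_model is None:
    print("cost model dependencies could not be imported")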

class benchmarking.utils.StoreSearcherStatesCallback[source]

Bases: TunerCallback

Stores a list of searcher states alongside a tuning run. The list is extended by a new state whenever the TuningJobState has changed compared to the most recently added one.

This callback is useful to create meaningful unit tests, by sampling a given searcher alongside a realistic experiment.

Works only for ModelBasedSearcher searchers. For other searchers, nothing is stored.
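
A sketch of how the callback might be attached to a tuning run; trial_backend, scheduler, and stop_criterion are placeholders for whatever the experiment already defines, and the scheduler must use a model-based searcher:

from benchmarking.utils import StoreSearcherStatesCallback
from syne_tune import Tuner

callback = StoreSearcherStatesCallback()
tuner = Tuner(
    trial_backend=trial_backend,    # placeholder: backend of the experiment
    scheduler=scheduler,            # placeholder: scheduler with a model-based searcher
    stop_criterion=stop_criterion,  # placeholder: e.g. StoppingCriterion(...)
    n_workers=4,
    callbacks=[callback],
)
tuner.run()
print(f"number of searcher states sampled: {len(callback.states)}")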

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner (Tuner) – Tuner object

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

property states
searcher_state_as_code(pos, add_info=False)[source]
Submodules
benchmarking.utils.get_cost_model module
benchmarking.utils.get_cost_model.get_cost_model_for_batch_size(params, batch_size_key, batch_size_range)[source]

Returns a cost model that depends on the batch size only.

Parameters:
  • params (Dict[str, Any]) – Command line arguments

  • batch_size_key (str) – Name of batch size entry in config

  • batch_size_range (Tuple[int, int]) – (lower, upper) for batch size, both sides are inclusive

Returns:

Cost model (or None if dependencies cannot be imported)

benchmarking.utils.launch_sample_searcher_states module

This script launches an experiment for the purpose of sampling searcher states, which can then be used in unit tests.

benchmarking.utils.searcher_state_callback module
class benchmarking.utils.searcher_state_callback.StoreSearcherStatesCallback[source]

Bases: TunerCallback

Stores a list of searcher states alongside a tuning run. The list is extended by a new state whenever the TuningJobState has changed compared to the most recently added one.

This callback is useful to create meaningful unit tests, by sampling a given searcher alongside a realistic experiment.

Works only for ModelBasedSearcher searchers. For other searchers, nothing is stored.

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner (Tuner) – Tuner object

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

property states
searcher_state_as_code(pos, add_info=False)[source]

setup module

syne_tune package

class syne_tune.StoppingCriterion(max_wallclock_time=None, max_num_evaluations=None, max_num_trials_started=None, max_num_trials_completed=None, max_cost=None, max_num_trials_finished=None, min_metric_value=None, max_metric_value=None)[source]

Bases: object

Stopping criterion that can be used in a Tuner, for instance Tuner(stop_criterion=StoppingCriterion(max_wallclock_time=3600), ...).

If several arguments are used, the combined criterion is true whenever one of the atomic criteria is true.

In principle, stop_criterion for Tuner can be any lambda function, but this class should be used with remote launching in order to ensure proper serialization.

Parameters:
  • max_wallclock_time (Optional[float]) – Stop once this wallclock time is reached

  • max_num_evaluations (Optional[int]) – Stop once more than this number of metric records have been reported

  • max_num_trials_started (Optional[int]) – Stop once more than this number of trials have been started

  • max_num_trials_completed (Optional[int]) – Stop once more than this number of trials have been completed. This does not include trials which were stopped or failed

  • max_cost (Optional[float]) – Stop once the total cost of evaluations exceeds this value

  • max_num_trials_finished (Optional[int]) – Stop once more than this number of trials have finished (i.e., completed, stopped, failed, or stopping)

  • min_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value below a threshold

  • max_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value above a threshold

max_wallclock_time: float = None
max_num_evaluations: int = None
max_num_trials_started: int = None
max_num_trials_completed: int = None
max_cost: float = None
max_num_trials_finished: int = None
min_metric_value: Optional[Dict[str, float]] = None
max_metric_value: Optional[Dict[str, float]] = None
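
For example, the following criterion (with a hypothetical metric name validation_error) stops tuning after one hour of wall-clock time, or as soon as some trial reports a validation error below 0.05, whichever happens first:

from syne_tune import StoppingCriterion

stop_criterion = StoppingCriterion(
    max_wallclock_time=3600,
    min_metric_value={"validation_error": 0.05},
)
# Passed to the tuner as Tuner(..., stop_criterion=stop_criterion)
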
class syne_tune.Tuner(trial_backend, scheduler, stop_criterion, n_workers, sleep_time=5.0, results_update_interval=10.0, print_update_interval=30.0, max_failures=1, tuner_name=None, asynchronous_scheduling=True, wait_trial_completion_when_stopping=False, callbacks=None, metadata=None, suffix_tuner_name=True, save_tuner=True, start_jobs_without_delay=True, trial_backend_path=None)[source]

Bases: object

Controller of tuning loop, manages interplay between scheduler and trial backend. Also, stopping criterion and number of workers are maintained here.

Parameters:
  • trial_backend (TrialBackend) – Backend for trial evaluations

  • scheduler (TrialScheduler) – Tuning algorithm for making decisions about which trials to start, stop, pause, or resume

  • stop_criterion (Callable[[TuningStatus], bool]) – Tuning stops when this predicate returns True. Called in each iteration with the current tuning status. It is recommended to use StoppingCriterion.

  • n_workers (int) – Number of workers used here. Note that the backend needs to support (at least) this number of workers to be run in parallel

  • sleep_time (float) – Time to sleep when all workers are busy. Defaults to DEFAULT_SLEEP_TIME

  • results_update_interval (float) – Frequency at which results are updated and stored (in seconds). Defaults to 10.

  • print_update_interval (float) – Frequency at which result table is printed. Defaults to 30.

  • max_failures (int) – This many trial execution failures are allowed before the tuning loop is aborted. Defaults to 1

  • tuner_name (Optional[str]) – Name associated with the tuning experiment, defaults to the name of the entrypoint. Must consist of alphanumeric characters, possibly separated by ‘-’. A postfix with a date time-stamp is added to ensure uniqueness.

  • asynchronous_scheduling (bool) – Whether to use asynchronous scheduling when scheduling new trials. If True, trials are scheduled as soon as a worker is available. If False, the tuner waits until all trials are finished before scheduling a new batch of size n_workers. Defaults to True.

  • wait_trial_completion_when_stopping (bool) – How to deal with running trials when stopping criterion is met. If True, the tuner waits until all trials are finished. If False, all trials are terminated. Defaults to False.

  • callbacks (Optional[List[TunerCallback]]) – Called at certain times in the tuning loop, for example when a result is seen. The default callback stores results every results_update_interval.

  • metadata (Optional[dict]) – Dictionary of user-metadata that will be persisted in {tuner_path}/{ST_METADATA_FILENAME}, in addition to metadata provided by the user. ST_TUNER_CREATION_TIMESTAMP is always included, which records the time-stamp at which the tuner started to run.

  • suffix_tuner_name (bool) – If True, a timestamp is appended to the provided tuner_name that ensures uniqueness, otherwise the name is left unchanged and is expected to be unique. Defaults to True.

  • save_tuner (bool) – If True, the Tuner object is serialized at the end of tuning, including its dependencies (e.g., scheduler). This allows all details of the experiment to be recovered. Defaults to True.

  • start_jobs_without_delay (bool) –

    Defaults to True. If this is True, the tuner starts new jobs depending on scheduler decisions communicated to the backend. For example, if a trial has just been stopped (by calling backend.stop_trial), the tuner may start a new one immediately, even if the SageMaker training job is still busy due to stopping delays. This can lead to faster experiment runtime, because the backend is temporarily going over its budget.

    If set to False, the tuner always asks the backend for the number of busy workers, which guarantees that we never go over the n_workers budget. This makes a difference for backends where stopping or pausing trials is not immediate (e.g., SageMakerBackend). Not going over budget means that n_workers can be set up to the available quota, without running the risk of an exception due to the quota being exceeded. If you get such exceptions, we recommend using start_jobs_without_delay=False. Also, if the SageMaker warm pool feature is used, it is recommended to set start_jobs_without_delay=False, since otherwise more than n_workers warm pools will be started, because existing ones are still busy with stopping when they should be reassigned.

  • trial_backend_path (Optional[str]) –

    If this is given, the path of trial_backend (where logs and checkpoints of trials are stored) is set to this. Otherwise, it is set to self.tuner_path, so that per-trial information is written to the same path as tuning results.

    If the backend is LocalBackend and the experiment is run remotely, we recommend setting this, since otherwise checkpoints and logs are synced to S3 along with tuning results, which is costly and error-prone.

run()[source]

Launches the tuning.

save(folder=None)[source]
static load(tuner_path)[source]
best_config(metric=0)[source]
Parameters:

metric (Union[str, int, None]) – Indicates which metric to use, can be the index or the name of the metric. Defaults to 0, the first metric defined in the scheduler

Return type:

Tuple[int, Dict[str, Any]]

Returns:

The ID of the best trial and the corresponding best configuration found while tuning for the given metric
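
To make the interplay concrete, here is a minimal sketch of a complete tuning loop with the local backend and random search. The entry point script and the metric name mean_loss are hypothetical; the script is assumed to report mean_loss via Reporter (see below):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.baselines import RandomSearch

config_space = {
    "steps": 100,
    "width": randint(1, 20),
    "height": uniform(-100.0, 100.0),
}
scheduler = RandomSearch(config_space, metric="mean_loss", mode="min")
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_height.py"),  # hypothetical script
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()
# Best trial ID and configuration for the first metric defined in the scheduler
trial_id, best_config = tuner.best_config()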

class syne_tune.Reporter(add_time=True, add_cost=True)[source]

Bases: object

Callback for reporting metric values from a training script back to Syne Tune. Example:

from syne_tune import Reporter

report = Reporter()
for epoch in range(1, epochs + 1):
    # ...
    report(epoch=epoch, accuracy=accuracy)
Parameters:
  • add_time (bool) – If True (default), the time (in secs) since creation of the Reporter object is reported automatically as ST_WORKER_TIME

  • add_cost (bool) – If True (default), estimated dollar cost since creation of Reporter object is reported automatically as ST_WORKER_COST. This is available for SageMaker backend only. Requires add_time=True.

add_time: bool = True
add_cost: bool = True
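
For completeness, here is a sketch of what the training script side could look like; the script name, argument names, and the dummy objective are hypothetical, and hyperparameters arrive as command line arguments when the local backend is used:

# Hypothetical contents of train_height.py
import argparse
from syne_tune import Reporter

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--steps", type=int)
    parser.add_argument("--width", type=float)
    parser.add_argument("--height", type=float)
    args = parser.parse_args()

    report = Reporter()
    for step in range(1, args.steps + 1):
        # Dummy objective standing in for a real training and validation loop
        mean_loss = 1.0 / (0.1 + args.width * step / args.steps) + 0.1 * args.height
        report(step=step, mean_loss=mean_loss)
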
Subpackages
syne_tune.backend package
class syne_tune.backend.LocalBackend(entry_point, delete_checkpoints=False, pass_args_as_json=False, rotate_gpus=True, num_gpus_per_trial=1, gpus_to_use=None)[source]

Bases: TrialBackend

A backend running locally by spawning sub-processes concurrently. Note that no resource management is done, so the number of concurrent trials should be adjusted to the machine capacity.

Additional arguments on top of parent class TrialBackend:

Parameters:
  • entry_point (str) – Path to Python main file to be tuned

  • rotate_gpus (bool) – In case several GPUs are present, each trial is scheduled on a different GPU. A new trial is preferentially scheduled on a free GPU, and otherwise the GPU with the fewest prior assignments is chosen. If False, then all GPUs are used at the same time for all trials. Defaults to True.

  • num_gpus_per_trial (int) – Number of GPUs to be allocated to each trial. Must not be larger than the total number of GPUs available. Defaults to 1

  • gpus_to_use (Optional[List[int]]) – If this is given, the backend only uses GPUs in this list (non-negative ints). Entries must be in range(get_num_gpus()). Defaults to using all GPUs.
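
For instance, a sketch of a backend that assigns two GPUs per trial and is restricted to the first four GPUs of the machine; the entry point name is hypothetical:

from syne_tune.backend import LocalBackend

backend = LocalBackend(
    entry_point="train.py",    # hypothetical training script
    num_gpus_per_trial=2,      # each trial is allocated two GPUs
    gpus_to_use=[0, 1, 2, 3],  # only these GPUs are used, so at most two trials run concurrently
    delete_checkpoints=True,   # remove checkpoints of stopped or completed trials
)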

trial_path(trial_id)[source]
Parameters:

trial_id (int) – ID of trial

Return type:

Path

Returns:

Directory where files related to trial are written to

checkpoint_trial_path(trial_id)[source]
Parameters:

trial_id (int) – ID of trial

Return type:

Path

Returns:

Directory where checkpoints for trial are written to and read from

copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)

set_path(results_root=None, tuner_name=None)[source]
Parameters:
  • results_root (Optional[str]) – The local folder that should contain the results of the tuning experiment. Used by Tuner to indicate a desired path where the results should be written to. This is used to unify the location of backend files and Tuner results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.

  • tuner_name (Optional[str]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.

entrypoint_path()[source]
Return type:

Path

Returns:

Entrypoint path of script to be executed

set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

class syne_tune.backend.PythonBackend(tune_function, config_space, rotate_gpus=True, delete_checkpoints=False)[source]

Bases: LocalBackend

A backend that supports the tuning of Python functions (if you would rather tune an entry point script such as “train.py”, you should use LocalBackend). The function tune_function should be serializable, should not reference any global variable or module, and should take as arguments a subset of the keys of config_space. When deserializing, an md5 hash is checked to ensure consistency.

For instance, the following is a valid way of defining a backend on top of a simple function:

from syne_tune.backend import PythonBackend
from syne_tune.config_space import uniform

def f(x, epochs):
    import logging
    import time
    from syne_tune import Reporter
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    reporter = Reporter()
    for i in range(epochs):
        reporter(epoch=i + 1, y=x + i)

config_space = {
    "x": uniform(-10, 10),
    "epochs": 5,
}
backend = PythonBackend(tune_function=f, config_space=config_space)

See examples/launch_height_python_backend.py for a complete example.

Additional arguments on top of parent class LocalBackend:

Parameters:
  • tune_function (Callable) – Python function to be tuned. The function must call the Syne Tune reporter to report metrics and be serializable; imports should be performed inside the function body.

  • config_space (Dict[str, object]) – Configuration space corresponding to arguments of tune_function

property tune_function_path: Path
set_path(results_root=None, tuner_name=None)[source]
Parameters:
  • results_root (Optional[str]) – The local folder that should contain the results of the tuning experiment. Used by Tuner to indicate a desired path where the results should be written to. This is used to unify the location of backend files and Tuner results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.

  • tuner_name (Optional[str]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.

save_tune_function(tune_function)[source]
class syne_tune.backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]

Bases: TrialBackend

This backend executes each trial evaluation as a separate SageMaker training job, using sm_estimator as estimator.

Checkpoints are written to and loaded from S3, using checkpoint_s3_uri of the estimator.

Compared to LocalBackend, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.

This backend allows selecting the instance type and count for a trial evaluation by passing values in the configuration, using the names ST_INSTANCE_TYPE and ST_INSTANCE_COUNT. If these are given in the configuration, they overwrite the default in sm_estimator. This allows for tuning instance type and count along with the hyperparameter configuration.

Additional arguments on top of parent class TrialBackend:

Parameters:
  • sm_estimator (Framework) – SageMaker estimator for trial evaluations.

  • metrics_names (Optional[List[str]]) – Names of metrics passed to report, used to plot live curve in SageMaker (optional, only used for visualization)

  • s3_path (Optional[str]) – S3 base path used for checkpointing. The full path also involves the tuner name and the trial_id. The default base path is the S3 bucket associated with the SageMaker account

  • sagemaker_fit_kwargs – Extra arguments passed to sagemaker.estimator.Framework when fitting the job, for instance {'train': 's3://my-data-bucket/path/to/my/training/data'}
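
A sketch of a typical construction; the entry point, framework versions, instance type, and metric name are examples only, and the estimator could be any SageMaker Framework:

from sagemaker.pytorch import PyTorch
from syne_tune.backend import SageMakerBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import get_execution_role

estimator = PyTorch(
    entry_point="train.py",       # hypothetical training script
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    role=get_execution_role(),
    framework_version="1.13",
    py_version="py39",
    max_run=3600,
)
backend = SageMakerBackend(
    sm_estimator=estimator,
    metrics_names=["mean_loss"],  # hypothetical metric reported by the script
)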

property sm_client
add_metric_definitions_to_sagemaker_estimator(metrics_names)[source]
busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)

property source_dir: str | None
set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

entrypoint_path()[source]
Return type:

Path

Returns:

Entrypoint path of script to be executed

initialize_sagemaker_session()[source]
copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

set_path(results_root=None, tuner_name=None)[source]

For this backend, it is mandatory to call this method passing tuner_name before the backend is used. results_root is ignored here.

on_tuner_save()[source]

Called at the end of save().

Subpackages
syne_tune.backend.python_backend package
Submodules
syne_tune.backend.python_backend.python_backend module
syne_tune.backend.python_backend.python_backend.file_md5(filename)[source]
Return type:

str

class syne_tune.backend.python_backend.python_backend.PythonBackend(tune_function, config_space, rotate_gpus=True, delete_checkpoints=False)[source]

Bases: LocalBackend

A backend that supports the tuning of Python functions (if you would rather tune an entry point script such as “train.py”, you should use LocalBackend). The function tune_function should be serializable, should not reference any global variable or module, and should take as arguments a subset of the keys of config_space. When deserializing, an md5 hash is checked to ensure consistency.

For instance, the following is a valid way of defining a backend on top of a simple function:

from syne_tune.backend import PythonBackend
from syne_tune.config_space import uniform

def f(x, epochs):
    import logging
    import time
    from syne_tune import Reporter
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    reporter = Reporter()
    for i in range(epochs):
        reporter(epoch=i + 1, y=x + i)

config_space = {
    "x": uniform(-10, 10),
    "epochs": 5,
}
backend = PythonBackend(tune_function=f, config_space=config_space)

See examples/launch_height_python_backend.py for a complete example.

Additional arguments on top of parent class LocalBackend:

Parameters:
  • tune_function (Callable) – Python function to be tuned. The function must call the Syne Tune reporter to report metrics and be serializable; imports should be performed inside the function body.

  • config_space (Dict[str, object]) – Configuration space corresponding to arguments of tune_function

property tune_function_path: Path
set_path(results_root=None, tuner_name=None)[source]
Parameters:
  • results_root (Optional[str]) – The local folder that should contain the results of the tuning experiment. Used by Tuner to indicate a desired path where the results should be written to. This is used to unify the location of backend files and Tuner results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.

  • tuner_name (Optional[str]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.

save_tune_function(tune_function)[source]
syne_tune.backend.python_backend.python_entrypoint module

An entry point that loads a serialized function from PythonBackend and executes it with the provided hyperparameters. The md5 hash of the file is first checked before executing the deserialized function.

syne_tune.backend.sagemaker_backend package
Submodules
syne_tune.backend.sagemaker_backend.custom_framework module
class syne_tune.backend.sagemaker_backend.custom_framework.CustomFramework(entry_point, image_uri, source_dir=None, hyperparameters=None, **kwargs)[source]

Bases: Framework

LATEST_VERSION = '0.1'
create_model(model_server_workers=None, role=None, vpc_config_override='VPC_CONFIG_DEFAULT')[source]

Create a SageMaker Model object that can be deployed to an Endpoint.

Args:
**kwargs: Keyword arguments used by the implemented method for creating the Model.

Returns:

sagemaker.model.Model: A SageMaker Model object. See Model() for full details.

syne_tune.backend.sagemaker_backend.instance_info module
class syne_tune.backend.sagemaker_backend.instance_info.InstanceInfo(name, num_cpu, num_gpu, cost_per_hour)[source]

Bases: object

name: str
num_cpu: int
num_gpu: int
cost_per_hour: float
class syne_tune.backend.sagemaker_backend.instance_info.InstanceInfos[source]

Bases: object

Utility to get information about an instance type (number of CPUs/GPUs, cost per hour).

syne_tune.backend.sagemaker_backend.instance_info.select_instance_type(min_gpu=0, max_gpu=16, min_cost_per_hour=None, max_cost_per_hour=None)[source]
Parameters:
  • min_gpu (int) –

  • max_gpu (int) –

  • min_cost_per_hour (Optional[float]) –

  • max_cost_per_hour (Optional[float]) –

Return type:

List[str]

Returns:

A list of instance types that meet the required constraints on minimum/maximum number of GPUs and minimum/maximum cost per hour.
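
For example, with arbitrarily chosen constraint values:

from syne_tune.backend.sagemaker_backend.instance_info import select_instance_type

# Instance types with at least one GPU, costing at most 5 USD per hour
candidates = select_instance_type(min_gpu=1, max_cost_per_hour=5.0)
print(candidates)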

syne_tune.backend.sagemaker_backend.sagemaker_backend module
class syne_tune.backend.sagemaker_backend.sagemaker_backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]

Bases: TrialBackend

This backend executes each trial evaluation as a separate SageMaker training job, using sm_estimator as estimator.

Checkpoints are written to and loaded from S3, using checkpoint_s3_uri of the estimator.

Compared to LocalBackend, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.

This backend allows selecting the instance type and count for a trial evaluation by passing values in the configuration, using the names ST_INSTANCE_TYPE and ST_INSTANCE_COUNT. If these are given in the configuration, they overwrite the default in sm_estimator. This allows for tuning instance type and count along with the hyperparameter configuration.

Additional arguments on top of parent class TrialBackend:

Parameters:
  • sm_estimator (Framework) – SageMaker estimator for trial evaluations.

  • metrics_names (Optional[List[str]]) – Names of metrics passed to report, used to plot live curve in SageMaker (optional, only used for visualization)

  • s3_path (Optional[str]) – S3 base path used for checkpointing. The full path also involves the tuner name and the trial_id. The default base path is the S3 bucket associated with the SageMaker account

  • sagemaker_fit_kwargs – Extra arguments passed to sagemaker.estimator.Framework when fitting the job, for instance {'train': 's3://my-data-bucket/path/to/my/training/data'}

property sm_client
add_metric_definitions_to_sagemaker_estimator(metrics_names)[source]
busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)

property source_dir: str | None
set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

entrypoint_path()[source]
Return type:

Path

Returns:

Entrypoint path of script to be executed

initialize_sagemaker_session()[source]
copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

set_path(results_root=None, tuner_name=None)[source]

For this backend, it is mandatory to call this method passing tuner_name before the backend is used. results_root is ignored here.

on_tuner_save()[source]

Called at the end of save().

syne_tune.backend.sagemaker_backend.sagemaker_utils module
syne_tune.backend.sagemaker_backend.sagemaker_utils.default_config()[source]

https://aws.amazon.com/premiumsupport/knowledge-center/sagemaker-python-throttlingexception/

Return type:

Config

Returns:

Default config which avoids throttling

syne_tune.backend.sagemaker_backend.sagemaker_utils.default_sagemaker_session()[source]
syne_tune.backend.sagemaker_backend.sagemaker_utils.get_log(jobname, log_client=None)[source]
Parameters:
  • jobname (str) – name of a sagemaker training job

  • log_client – a log client, for instance boto3.client('logs'). If None, the client is instantiated with the default AWS configuration

Return type:

List[str]

Returns:

Lines appearing in the log of the SageMaker training job

syne_tune.backend.sagemaker_backend.sagemaker_utils.decode_sagemaker_hyperparameter(hp)[source]
Parameters:
  • trial_ids_and_names (List[Tuple[int, str]]) – Trial ids and sagemaker jobnames to retrieve information from

  • sm_client – Sagemaker client used to search for jobs

  • sm_client – Log client used to query job logs

Return type:

List[TrialResult]

Returns:

List of dictionaries containing job information (status, creation time, metrics, hyperparameters, etc.)

In terms of speed, around 100 jobs can be retrieved per second.

syne_tune.backend.sagemaker_backend.sagemaker_utils.metric_definitions_from_names(metrics_names)[source]
Parameters:

metrics_names (List[str]) – Names of the metrics present in the log. Metrics must be written in the log as [metric-name]: value, for instance [accuracy]: 0.23

Returns:

A list of metric dictionaries that can be passed to SageMaker so that metrics are parsed from logs; the list can be passed to metric_definitions in sagemaker.

syne_tune.backend.sagemaker_backend.sagemaker_utils.add_metric_definitions_to_sagemaker_estimator(estimator, metrics_names)[source]

Adds metric definitions according to metric_definitions_from_names() to estimator for each name in metrics_names. The regexp for each name is compatible with how Reporter outputs metric values.

Parameters:
  • estimator (EstimatorBase) – SageMaker estimator

  • metrics_names (List[str]) – Names of metrics to be appended

syne_tune.backend.sagemaker_backend.sagemaker_utils.add_syne_tune_dependency(sm_estimator)[source]
syne_tune.backend.sagemaker_backend.sagemaker_utils.sagemaker_fit(sm_estimator, hyperparameters, checkpoint_s3_uri=None, wait=False, job_name=None, *sagemaker_fit_args, **sagemaker_fit_kwargs)[source]
Parameters:
  • sm_estimator (Framework) – sagemaker estimator to be fitted

  • hyperparameters (Dict[str, object]) – dictionary of hyperparameters that are passed to entry_point_script

  • checkpoint_s3_uri (Optional[str]) – checkpoint_s3_uri of Sagemaker Estimator

  • wait (bool) – whether to wait for job completion

  • metrics_names – Names of metrics to track, reported with report.py. In case those metrics are passed, their learning curves will be shown in the SageMaker console

Returns:

Name of the SageMaker job

syne_tune.backend.sagemaker_backend.sagemaker_utils.get_execution_role()[source]
Returns:

The SageMaker execution role specified by the environment variable AWS_ROLE; if it is not specified, we infer it by searching for the role associated with SageMaker. Note that import sagemaker; sagemaker.get_execution_role() does not return the right role outside of a SageMaker notebook.

syne_tune.backend.sagemaker_backend.sagemaker_utils.untar(filename)[source]
syne_tune.backend.sagemaker_backend.sagemaker_utils.download_sagemaker_results(s3_path=None)[source]

Download results obtained after running tuning remotely on Sagemaker, e.g. when using RemoteLauncher.

syne_tune.backend.sagemaker_backend.sagemaker_utils.map_identifier_limited_length(name, max_length=63, rnd_digits=4)[source]

If name is longer than max_length characters, it is mapped to a new identifier of length max_length, consisting of the first max_length - rnd_digits characters of name, followed by a random string of length rnd_digits.

Parameters:
  • name (str) – Identifier to be limited in length

  • max_length (int) – Maximum length for output

  • rnd_digits (int) – See above

Return type:

str

Returns:

See above

syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_copy_objects_recursively(s3_source_path, s3_target_path)[source]

Recursively copies objects from s3_source_path to s3_target_path.

We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed action call encountered).

Note

This function should not be used to copy a large number of objects, as it is rather slow (one API call per object)

Parameters:
  • s3_source_path (str) –

  • s3_target_path (str) –

Return type:

Dict[str, Any]

Returns:

See above

syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_delete_objects_recursively(s3_path)[source]

Recursively deletes objects from s3_path.

We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed action call encountered).

Note

This function should not be used to delete a large number of objects, as it is rather slow (one API call per object)

Parameters:

s3_path (str) –

Return type:

Dict[str, Any]

Returns:

See above

syne_tune.backend.sagemaker_backend.sagemaker_utils.s3_download_files_recursively(s3_source_path, target_path, valid_postfixes=None)[source]

Recursively downloads objects from s3_source_path and stores them locally as files below target_path

We return a dict with ‘num_action_calls’, ‘num_successful_action_calls’, ‘first_error_message’ (the error message for the first failed action call encountered).

If valid_postfixes is given, only such objects are downloaded for which object_key.endswith(postfix) for some postfix in valid_postfixes.

Note

This function should not be used to download a large number of objects, as it is rather slow (one API call per object). In this case, running aws s3 sync can be much faster.

Parameters:
  • s3_source_path (str) – See above

  • target_path (str) – See above

  • valid_postfixes (Optional[List[str]]) – See above, optional

Return type:

Dict[str, Any]

Returns:

See above

syne_tune.backend.sagemaker_backend.sagemaker_utils.backend_path_not_synced_to_s3()[source]

When an experiment with the local backend is run remotely (as a SageMaker training job), we do not want checkpoints to be synced to S3, since this is expensive and error-prone (several trials may write checkpoints at the same time). Pass the returned path to trial_backend_path when constructing the Tuner.

Here, we direct checkpoint writing to /opt/ml/input/data/, which is mounted on a partition with sufficient space. Different to /opt/ml/checkpoints, this directory is not synced to S3.

Return type:

Path

Returns:

Path to set in local backend
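
A sketch of how this might be wired into the tuner when the local backend runs remotely; scheduler and stop_criterion are placeholders, and the entry point name is hypothetical:

from syne_tune import Tuner
from syne_tune.backend import LocalBackend
from syne_tune.backend.sagemaker_backend.sagemaker_utils import (
    backend_path_not_synced_to_s3,
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train.py"),  # hypothetical script
    scheduler=scheduler,            # placeholder: any TrialScheduler
    stop_criterion=stop_criterion,  # placeholder: e.g. StoppingCriterion(...)
    n_workers=4,
    trial_backend_path=str(backend_path_not_synced_to_s3()),
)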

syne_tune.backend.simulator_backend package
Submodules
syne_tune.backend.simulator_backend.events module
class syne_tune.backend.simulator_backend.events.Event(trial_id)[source]

Bases: object

Base class for events dealt with in the simulator.

trial_id: int
class syne_tune.backend.simulator_backend.events.StartEvent(trial_id)[source]

Bases: Event

Start training evaluation function for trial_id. In fact, the function is run completely, and OnTrialResultEvent events and one CompleteEvent are generated.

trial_id: int
class syne_tune.backend.simulator_backend.events.CompleteEvent(trial_id, status)[source]

Bases: Event

Job for trial trial_id completes with status status. This is registered at the backend.

status: str
class syne_tune.backend.simulator_backend.events.StopEvent(trial_id)[source]

Bases: Event

Job for trial trial_id is stopped. This leads to all later events for trial_id to be deleted, and a new CompleteEvent.

trial_id: int
class syne_tune.backend.simulator_backend.events.OnTrialResultEvent(trial_id, result)[source]

Bases: Event

Result reported by some worker arrives at the backend and is registered there.

result: Dict[str, Any]
class syne_tune.backend.simulator_backend.events.SimulatorState(event_heap=None, events_added=0)[source]

Bases: object

Maintains the state of the simulator, in particular the event heap.

event_heap is the priority queue for events, the key being (time, cnt), where time is the event time, and cnt is a non-negative int used to break ties. When an event is added, the cnt value is taken from events_added. This means that ties are broken first-in-first-out.
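
An illustrative sketch (not the actual implementation) of how such a (time, cnt) key breaks ties first-in-first-out with a standard heap:

import heapq

event_heap = []
events_added = 0  # counter providing the tie-breaking component of the key

def push(event, event_time):
    global events_added
    heapq.heappush(event_heap, ((event_time, events_added), event))
    events_added += 1

push("start trial 0", 1.0)
push("report trial 0", 1.0)  # same event time as above
# The event pushed first is popped first, since its cnt value is smaller
assert heapq.heappop(event_heap)[1] == "start trial 0"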

push(event, event_time)[source]

Push new event onto heap

Parameters:
  • event (Event) –

  • event_time (float) –

remove_events(trial_id)[source]

Remove all events with trial_id equal to trial_id.

Parameters:

trial_id (int) –

next_until(time_until)[source]

Returns (and pops) event on top of heap, if event time is <= time_until. Otherwise, returns None.

Parameters:

time_until (float) –

Return type:

Optional[Tuple[float, Event]]

Returns:

(event_time, event) for the event on top of the heap if its time is <= time_until, otherwise None

syne_tune.backend.simulator_backend.simulator_backend module
class syne_tune.backend.simulator_backend.simulator_backend.SimulatorConfig(delay_on_trial_result=0.05, delay_complete_after_final_report=0.05, delay_complete_after_stop=0.05, delay_start=0.05, delay_stop=0.05)[source]

Bases: object

Configures the simulator:

Parameters:
  • delay_on_trial_result (float) – Time from report called on worker to result registered at backend, defaults to DEFAULT_DELAY

  • delay_complete_after_final_report (float) – Time from final report called on worker to job completion being registered at backend. Defaults to DEFAULT_DELAY

  • delay_complete_after_stop (float) – Time from stop signal received at worker to job completion being registered at backend. Defaults to DEFAULT_DELAY

  • delay_start (float) – Time from start command being sent at backend and job starting on the worker (which is free). Defaults to DEFAULT_DELAY

  • delay_stop (float) – Time from stop signal being sent at backend to signal received at worker (which is running). Defaults to DEFAULT_DELAY

delay_on_trial_result: float = 0.05
delay_complete_after_final_report: float = 0.05
delay_complete_after_stop: float = 0.05
delay_start: float = 0.05
delay_stop: float = 0.05
class syne_tune.backend.simulator_backend.simulator_backend.SimulatorBackend(entry_point, elapsed_time_attr, simulator_config=None, tuner_sleep_time=5.0, debug_resource_attr=None)[source]

Bases: LocalBackend

This simulator backend drives experiments with tabulated training evaluation functions, which return their computation time rather than spend it. To this end, time (on the tuning instance) is simulated using a time_keeper and an event priority queue in _simulator_state.

Time is advanced both by run() waiting, and by non-negligible computations during the tuning loop (in particular, we take care of scheduler.suggest and scheduler.on_trial_result there).

When the entry_point script is executed, we wait for all results to be returned. In each result, the value for key elapsed_time_attr contains the time since start of the script. These values are used to place worker events on the simulated timeline (represented by simulator_state). NOTE: If a trial is resumed, the elapsed_time value contains the time since the start of the most recent resume, NOT the cumulative time used by the trial.

Each method call starts by advancing time by what was spent outside, since the last recent call to the backend. Then, all events in simulator_state are processed whose time is before the current time in time_keeper. The method ends by time_keeper.mark_exit().

Note

In this basic version of the simulator backend, we still call a Python main function as a subprocess, which returns the requested metrics by looking them up or running a surrogate. This is flexible, but has the overhead of loading a table at every call. For fast and convenient simulations, use BlackboxRepositoryBackend after bringing your tabulated data or surrogate benchmark into the blackbox repository.

Parameters:
  • entry_point (str) – Python main file to be tuned (this should return all results directly, and report elapsed time in the elapsed_time_attr field)

  • elapsed_time_attr (str) – See above

  • simulator_config (Optional[SimulatorConfig]) – Parameters for simulator, optional

  • tuner_sleep_time (float) – Effective sleep time in run(). This information is needed in SimulatorCallback. Defaults to DEFAULT_SLEEP_TIME
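
Putting the pieces together, a sketch of a simulated experiment; the tabulated benchmark script and the elapsed-time attribute name are hypothetical, scheduler is a placeholder, and note that the real sleep time must be 0 and SimulatorCallback must be passed (see below):

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend.simulator_backend.simulator_backend import SimulatorBackend
from syne_tune.backend.simulator_backend.simulator_callback import SimulatorCallback

backend = SimulatorBackend(
    entry_point="tabulated_benchmark.py",  # hypothetical tabulated benchmark script
    elapsed_time_attr="elapsed_time",      # hypothetical field name in reported results
)
tuner = Tuner(
    trial_backend=backend,
    scheduler=scheduler,  # placeholder: any TrialScheduler
    stop_criterion=StoppingCriterion(max_wallclock_time=3 * 3600),  # simulated time
    n_workers=4,
    sleep_time=0,  # real sleep time must be 0 when simulating
    callbacks=[SimulatorCallback()],
)
tuner.run()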

property time_keeper: SimulatedTimeKeeper
start_trial(config, checkpoint_trial_id=None)[source]

Start new trial with new trial ID

Parameters:
  • config (Dict) – Configuration for new trial

  • checkpoint_trial_id (Optional[int]) – If given, the new trial starts from the checkpoint written by this previous trial

Return type:

Trial

Returns:

New trial, which includes new trial ID

fetch_status_results(trial_ids)[source]
Parameters:

trial_ids (List[int]) – Trials whose information should be fetched.

Return type:

(Dict[int, Tuple[Trial, str]], List[Tuple[int, dict]])

Returns:

A tuple containing 1) a dictionary from trial-id to Trial and status information; 2) a list of (trial-id, results) pairs for each new result emitted since the last call. The list of results is sorted by the worker time-stamp.

busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

syne_tune.backend.simulator_backend.simulator_callback module
class syne_tune.backend.simulator_backend.simulator_callback.SimulatorCallback(extra_results_composer=None)[source]

Bases: StoreResultsCallback

Callback to be used in run() in order to support the SimulatorBackend.

This is doing three things. First, on_tuning_sleep() is advancing the time_keeper of the simulator backend by tuner_sleep_time (also defined in the backend). The real sleep time in Tuner must be 0.

Second, we need to make sure that results written out are annotated by simulated time, not real time. This is already catered for by SimulatorBackend adding ST_TUNER_TIME entries to each result it receives.

Third (and most subtle), we need to make sure the stop criterion in run() is using simulated time instead of real time when making a decision based on max_wallclock_time. By default, StoppingCriterion takes TuningStatus as an input, which counts real time and knows nothing about simulated time. To this end, we modify stop_criterion of the tuner to instead depend on the ST_TUNER_TIME fields in the results received. This allows us to keep both Tuner and TuningStatus independent of the time keeper.

Parameters:

extra_results_composer (Optional[ExtraResultsComposer]) – Optional. If given, this is called in on_trial_result(), and the resulting dictionary is appended as extra columns to the results dataframe

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner (Tuner) – Tuner object

on_tuning_sleep(sleep_time)[source]

Called just after tuner has slept, because no worker was available

Parameters:

sleep_time (float) – Time (in secs) for which tuner has just slept

on_tuning_end()[source]

Called once the tuning loop terminates

This is called before Tuner object is serialized (optionally), and also before running jobs are stopped.

syne_tune.backend.simulator_backend.time_keeper module
class syne_tune.backend.simulator_backend.time_keeper.SimulatedTimeKeeper[source]

Bases: TimeKeeper

Here, time is simulated. It needs to be advanced explicitly.

In addition, mark_exit() and real_time_since_last_recent_exit() are used to measure real time spent outside the backend (i.e., in the tuner loop and scheduler). Namely, every method of SimulatorBackend calls mark_exit() before leaving, and real_time_since_last_recent_exit() at the start, advancing the time counter accordingly.

property start_time_stamp: datetime
Returns:

Time stamp (datetime) of the most recent call of start_of_time

start_of_time()[source]

Called at the start of the experiment. Can be called multiple times if several experiments are run in sequence.

time()[source]
Return type:

float

Returns:

Time elapsed since the start of the experiment

time_stamp()[source]
Return type:

datetime

Returns:

Timestamp (datetime) corresponding to time()

advance(step)[source]

Advance time by step. For real time, this means we sleep for step seconds.

advance_to(to_time)[source]
mark_exit()[source]
real_time_since_last_recent_exit()[source]
Return type:

float

Submodules
syne_tune.backend.local_backend module
class syne_tune.backend.local_backend.LocalBackend(entry_point, delete_checkpoints=False, pass_args_as_json=False, rotate_gpus=True, num_gpus_per_trial=1, gpus_to_use=None)[source]

Bases: TrialBackend

A backend running locally by spawning sub-processes concurrently. Note that no resource management is done, so the number of concurrent trials should be adjusted to the machine capacity.

Additional arguments on top of parent class TrialBackend:

Parameters:
  • entry_point (str) – Path to Python main file to be tuned

  • rotate_gpus (bool) – In case several GPUs are present, each trial is scheduled on a different GPU. A new trial is preferentially scheduled on a free GPU, and otherwise the GPU with the fewest prior assignments is chosen. If False, then all GPUs are used at the same time for all trials. Defaults to True.

  • num_gpus_per_trial (int) – Number of GPUs to be allocated to each trial. Must not be larger than the total number of GPUs available. Defaults to 1

  • gpus_to_use (Optional[List[int]]) – If this is given, the backend only uses GPUs in this list (non-negative ints). Entries must be in range(get_num_gpus()). Defaults to using all GPUs.

trial_path(trial_id)[source]
Parameters:

trial_id (int) – ID of trial

Return type:

Path

Returns:

Directory where files related to trial are written to

checkpoint_trial_path(trial_id)[source]
Parameters:

trial_id (int) – ID of trial

Return type:

Path

Returns:

Directory where checkpoints for trial are written to and read from

copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)

set_path(results_root=None, tuner_name=None)[source]
Parameters:
  • results_root (Optional[str]) – The local folder that should contain the results of the tuning experiment. Used by Tuner to indicate a desired path where the results should be written to. This is used to unify the location of backend files and Tuner results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.

  • tuner_name (Optional[str]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.

entrypoint_path()[source]
Return type:

Path

Returns:

Entrypoint path of script to be executed

set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

syne_tune.backend.time_keeper module
class syne_tune.backend.time_keeper.TimeKeeper[source]

Bases: object

To be used by tuner, backend, and scheduler to measure time differences and wait for a specified amount of time. By centralizing this functionality here, we can support simulating experiments much faster than real time if the training evaluation function corresponds to a tabulated benchmark.

start_of_time()[source]

Called at the start of the experiment. Can be called multiple times if several experiments are run in sequence.

time()[source]
Return type:

float

Returns:

Time elapsed since the start of the experiment

time_stamp()[source]
Return type:

datetime

Returns:

Timestamp (datetime) corresponding to time()

advance(step)[source]

Advance time by step. For real time, this means we sleep for step seconds.

class syne_tune.backend.time_keeper.RealTimeKeeper[source]

Bases: TimeKeeper

start_of_time()[source]

Called at the start of the experiment. Can be called multiple times if several experiments are run in sequence.

time()[source]
Return type:

float

Returns:

Time elapsed since the start of the experiment

time_stamp()[source]
Return type:

datetime

Returns:

Timestamp (datetime) corresponding to time()

advance(step)[source]

Advance time by step. For real time, this means we sleep for step seconds.
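
A minimal sketch of how the real-time implementation can be used on its own (normally, the Tuner manages the time keeper for you):

from syne_tune.backend.time_keeper import RealTimeKeeper

time_keeper = RealTimeKeeper()
time_keeper.start_of_time()      # mark the start of the experiment
time_keeper.advance(1.5)         # for real time, this sleeps for 1.5 seconds
print(time_keeper.time())        # seconds elapsed since start_of_time()
print(time_keeper.time_stamp())  # datetime corresponding to time()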

syne_tune.backend.trial_backend module
class syne_tune.backend.trial_backend.TrialBackend(delete_checkpoints=False, pass_args_as_json=False)[source]

Bases: object

Interface for backend to execute evaluations of trials.

Parameters:
  • delete_checkpoints (bool) – If True, the checkpoints written by a trial are deleted once the trial is stopped or is registered as completed. Checkpoints of paused trials may also be removed, if the scheduler supports early checkpoint removal. Also, as part of stop_all() called at the end of the tuning loop, all remaining checkpoints are deleted. Defaults to False (no checkpoints are removed).

  • pass_args_as_json (bool) – Normally, the hyperparameter configuration is passed as command line arguments to the trial evaluation script. This works if all hyperparameters have elementary types. If pass_args_as_json == True, the configuration is instead written into a JSON file, whose name is passed as command line argument ST_CONFIG_JSON_FNAME_ARG. The trial evaluation script then loads the configuration from this file (see the sketch after this list). This allows the configuration to contain entries with complex types (e.g., lists or dictionaries), as long as they are JSON-serializable. Defaults to False.

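As a hedged sketch of the pass_args_as_json mechanism, a trial evaluation script could read its configuration as follows. This assumes that ST_CONFIG_JSON_FNAME_ARG is importable from syne_tune.constants and contains no characters rewritten by argparse.

import argparse
import json

from syne_tune.constants import ST_CONFIG_JSON_FNAME_ARG  # assumed import location

parser = argparse.ArgumentParser()
parser.add_argument(f"--{ST_CONFIG_JSON_FNAME_ARG}", type=str, required=True)
args, _ = parser.parse_known_args()
with open(getattr(args, ST_CONFIG_JSON_FNAME_ARG)) as f:
    config = json.load(f)  # entries may be lists or dicts, not just elementary types
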
start_trial(config, checkpoint_trial_id=None)[source]

Start new trial with new trial ID

Parameters:
  • config (Dict[str, Any]) – Configuration for new trial

  • checkpoint_trial_id (Optional[int]) – If given, the new trial starts from the checkpoint written by this previous trial

Return type:

TrialResult

Returns:

New trial, which includes new trial ID

copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

resume_trial(trial_id, new_config=None)[source]

Resume paused trial

Parameters:
  • trial_id (int) – ID of (paused) trial to be resumed

  • new_config (Optional[dict]) – If given, the config maintained in trial.config is replaced by new_config

Return type:

TrialResult

Returns:

Information for resumed trial

pause_trial(trial_id, result=None)[source]

Pauses a running trial

Checks that the operation is valid and calls backend internal implementation to actually pause the trial. If the status is queried after this function, it should be "paused".

Parameters:
  • trial_id (int) – ID of trial to pause

  • result (Optional[dict]) – Result dict based on which scheduler decided to pause the trial

stop_trial(trial_id, result=None)[source]

Stops (and terminates) a running trial

Checks that the operation is valid and calls the backend internal implementation to actually stop the trial. If the status is queried after this function, it should be "stopped".

Parameters:
  • trial_id (int) – ID of trial to stop

  • result (Optional[dict]) – Result dict based on which scheduler decided to stop the trial

new_trial_id()[source]
Return type:

int

fetch_status_results(trial_ids)[source]
Parameters:

trial_ids (List[int]) – Trials whose information should be fetched.

Return type:

(Dict[int, Tuple[Trial, str]], List[Tuple[int, dict]])

Returns:

A tuple containing 1) a dictionary from trial-id to Trial and status information; 2) a list of (trial-id, results) pairs for each new result emitted since the last call. The list of results is sorted by the worker time-stamp.

busy_trial_ids()[source]

Returns list of ids for currently busy trials

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)

stop_all()[source]

Stop all trials which are in progress.

set_path(results_root=None, tuner_name=None)[source]
Parameters:
  • results_root (Optional[str]) – The local folder that should contain the results of the tuning experiment. Used by Tuner to indicate a desired path where the results should be written to. This is used to unify the location of backend files and Tuner results when possible (in the local backend). By default, the backend does not do anything since not all backends may be able to unify their file locations.

  • tuner_name (Optional[str]) – Name of the tuner, can be used for instance to save checkpoints on remote storage.

entrypoint_path()[source]
Return type:

Path

Returns:

Entrypoint path of script to be executed

set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

on_tuner_save()[source]

Called at the end of save().

syne_tune.backend.trial_status module
class syne_tune.backend.trial_status.Status[source]

Bases: object

completed: str = 'Completed'
in_progress: str = 'InProgress'
failed: str = 'Failed'
paused: str = 'Paused'
stopped: str = 'Stopped'
stopping: str = 'Stopping'
class syne_tune.backend.trial_status.Trial(trial_id, config, creation_time)[source]

Bases: object

trial_id: int
config: Dict[str, object]
creation_time: datetime
add_results(metrics, status, training_end_time)[source]
class syne_tune.backend.trial_status.TrialResult(trial_id, config, creation_time, metrics, status, training_end_time=None)[source]

Bases: Trial

metrics: List[Dict[str, object]]
status: Literal['Completed', 'InProgress', 'Failed', 'Stopped', 'Stopping']
training_end_time: Optional[datetime] = None
property seconds
property cost
syne_tune.blackbox_repository package
class syne_tune.blackbox_repository.BlackboxOffline(df_evaluations, configuration_space, fidelity_space=None, objectives_names=None, seed_col=None)[source]

Bases: Blackbox

A blackbox obtained from offline evaluations. Each row of the dataframe should contain one evaluation, given a fixed configuration, fidelity, and seed. The columns must correspond to the provided configuration and fidelity space; by default, all columns prefixed by "metric_" are assumed to be metrics, but this can be overridden by providing metric columns. A construction sketch is given below the parameter list.

Additional arguments on top of parent class Blackbox:

Parameters:
  • df_evaluations (DataFrame) – Data frame with evaluations data

  • seed_col (Optional[str]) – optional, can be used when multiple seeds are recorded

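A minimal construction sketch with hypothetical data (column names and values are assumptions; lookup semantics follow the class description above):

import pandas as pd

from syne_tune.blackbox_repository import BlackboxOffline
from syne_tune.config_space import randint, uniform

# One row per (configuration, fidelity) evaluation; "metric_" columns are objectives
df = pd.DataFrame(
    {
        "lr": [0.01, 0.1, 0.01, 0.1],
        "epoch": [1, 1, 2, 2],
        "metric_error": [0.50, 0.40, 0.30, 0.35],
    }
)
blackbox = BlackboxOffline(
    df_evaluations=df,
    configuration_space={"lr": uniform(0.001, 1.0)},
    fidelity_space={"epoch": randint(1, 2)},
)
print(blackbox.objective_function({"lr": 0.01}, fidelity={"epoch": 2}))
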
hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Returns:

a tuple of two dataframes (X, y), where X contains hyperparameters values and y contains objective values, this is used when fitting a surrogate model.

syne_tune.blackbox_repository.deserialize(path)[source]
Parameters:
  • path (str) – where to find blackbox serialized information (at least data.csv.zip and configspace.json)

  • groupby_col – if this column is provided, evaluations are separated into one blackbox per task

Return type:

Union[Dict[str, BlackboxOffline], BlackboxOffline]

Returns:

Dictionary from task name to blackbox, or a single blackbox in the case of a single task

syne_tune.blackbox_repository.load_blackbox(name, skip_if_present=True, s3_root=None, generate_if_not_found=True, yahpo_kwargs=None, ignore_hash=True)[source]
Parameters:
  • name (str) –

    name of a blackbox present in the repository, see blackbox_list() to get list of available blackboxes. Syne Tune currently provides the following blackboxes evaluations:

    • ”nasbench201”: 15625 multi-fidelity configurations of computer vision architectures evaluated on 3 datasets. NAS-Bench-201: Extending the scope of reproducible neural architecture search. Dong, X. and Yang, Y. 2020.

    • ”fcnet”: 62208 multi-fidelity configurations of MLP evaluated on 4 datasets. Tabular benchmarks for joint architecture and hyperparameter optimization. Klein, A. and Hutter, F. 2019.

    • ”lcbench”: 2000 multi-fidelity PyTorch model configurations evaluated on many datasets. Reference: Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. Lucas Zimmer, Marius Lindauer, Frank Hutter. 2020.

    • ”icml-deepar”: 2420 single-fidelity configurations of DeepAR forecasting algorithm evaluated on 10 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.

    • ”icml-xgboost”: 5000 single-fidelity configurations of XGBoost evaluated on 9 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.

    • ”yahpo-*”: A number of different benchmarks from YAHPO Gym. Note that these blackboxes already come with surrogates, so there is no need to wrap them into SurrogateBlackbox

  • skip_if_present (bool) – skip the download if the file locally exists

  • s3_root (Optional[str]) – S3 root directory for blackbox repository. Defaults to S3 bucket name of SageMaker session

  • generate_if_not_found (bool) – If the blackbox file is not present locally or on S3, should it be generated using its conversion script?

  • yahpo_kwargs (Optional[dict]) – For a YAHPO blackbox (name == "yahpo-*"), these are additional arguments to instantiate_yahpo

  • ignore_hash (bool) – do not check if hash of currently stored files matches the pre-computed hash. Be careful with this option. If hashes do not match, results might not be reproducible.

Return type:

Union[Dict[str, Blackbox], Blackbox]

Returns:

Blackbox with the given name; it is downloaded if not present.

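A minimal usage sketch (for multi-task blackboxes such as ”fcnet”, the result is a dictionary from task name to Blackbox):

from syne_tune.blackbox_repository import load_blackbox

blackboxes = load_blackbox("fcnet")  # downloads or generates the data if not present
for task_name, blackbox in blackboxes.items():
    print(task_name, blackbox.configuration_space)
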
syne_tune.blackbox_repository.blackbox_list()[source]
Return type:

List[str]

Returns:

list of blackboxes available

syne_tune.blackbox_repository.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]

Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation.

Parameters:
  • blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values() so that input/output are passed to estimate the model

  • surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature-processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).

  • configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. But note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.

  • predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.

  • separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits different models to each seed. If False, the data from blackbox is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults to False.

  • fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.

Returns:

a blackbox where the output is obtained through the fitted surrogate

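A minimal usage sketch (the "fcnet" task name used here is an assumption; check load_blackbox() for the available tasks):

from sklearn.neighbors import KNeighborsRegressor

from syne_tune.blackbox_repository import add_surrogate, load_blackbox

# Wrap a tabular blackbox with a 1-nearest-neighbor surrogate, so it can be
# queried at configurations not present in the table
blackbox = load_blackbox("fcnet")["protein_structure"]  # task name is an assumption
surrogate_blackbox = add_surrogate(
    blackbox, surrogate=KNeighborsRegressor(n_neighbors=1)
)
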
class syne_tune.blackbox_repository.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)[source]

Bases: _BlackboxSimulatorBackend

Allows simulating a blackbox from the blackbox repository, selected by blackbox_name. See examples/launch_simulated_benchmark.py for an example on how to use it. If you want to add a new dataset, see the Adding a new dataset section of syne_tune/blackbox_repository/README.md.

In each result reported to the simulator backend, the value for key elapsed_time_attr must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this would be the time from start of training until the end of the epoch. If the blackbox contains this information in a column, elapsed_time_attr should be its key.

If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track at which resource level each trial is paused. Namely, once a trial is resumed, all results for resources smaller or equal to that level are ignored, which simulates the situation that training is resumed from a checkpoint. This feature relies on result to be passed to pause_trial(). If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens if support_checkpointing is False.

Note

If the blackbox maintains cumulative time (elapsed_time), this is different from what SimulatorBackend requires for elapsed_time_attr, if a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the most recent resume. This conversion is done internally in _run_job_and_collect_results(), which is called for each resume. This means that the field elapsed_time_attr is not what is received from the blackbox table, but instead what the backend needs.

max_resource_attr plays the same role as in HyperbandScheduler. If given, it is the key in a configuration config for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).

If seed is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all _run_job_and_collect_results() calls for the same trial. This is important for pause and resume scheduling.

Parameters:
  • blackbox_name (str) – Name of a blackbox, must have been registered in blackbox repository.

  • elapsed_time_attr (str) – Name of the column containing cumulative time

  • max_resource_attr (Optional[str]) – See above

  • seed (Optional[int]) – If given, this seed is used for all trial evaluations. Otherwise, seed is sampled at random for each trial. Only relevant for blackboxes with multiple seeds

  • support_checkpointing (bool) – If False, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults to True

  • dataset (Optional[str]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets)

  • surrogate (Optional[str]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator, see also make_surrogate(). The model is fit on top of a pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).

  • surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator, for instance {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen. If blackbox_name is a YAHPO blackbox, then surrogate_kwargs is passed as yahpo_kwargs to load_blackbox(). In this case, surrogate is ignored (YAHPO always uses surrogates).

  • config_space_surrogate (Optional[dict]) – If surrogate is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.

  • simulatorbackend_kwargs – Additional arguments to parent SimulatorBackend

property blackbox: Blackbox
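
A minimal construction sketch (the dataset name and the elapsed-time column name are assumptions; they depend on the blackbox metadata):

from syne_tune.blackbox_repository import BlackboxRepositoryBackend

trial_backend = BlackboxRepositoryBackend(
    blackbox_name="nasbench201",
    dataset="cifar10",                        # assumption: one of the blackbox's datasets
    elapsed_time_attr="metric_elapsed_time",  # assumption: column with cumulative time
)

When passed to a Tuner, results are simulated from the tabulated evaluations, so experiments with many workers can run in seconds.
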
class syne_tune.blackbox_repository.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)[source]

Bases: _BlackboxSimulatorBackend

Version of _BlackboxSimulatorBackend, where the blackbox is given as explicit Blackbox object. See examples/launch_simulated_benchmark.py for an example on how to use.

Additional arguments on top of parent _BlackboxSimulatorBackend:

Parameters:

blackbox (Blackbox) – Blackbox to be used for simulation

property blackbox: Blackbox
Subpackages
syne_tune.blackbox_repository.conversion_scripts package
Subpackages
syne_tune.blackbox_repository.conversion_scripts.scripts package
Subpackages
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench package
Submodules
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.api module
class syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.api.Benchmark(data_dir, cache=False, cache_dir='cached/')[source]

Bases: object

API for TabularBench.

query(dataset_name, tag, config_id)[source]

Query a run.

Keyword arguments:
  • dataset_name – str, the name of the dataset in the benchmark
  • tag – str, the tag you want to query
  • config_id – int, an identifier for which run you want to query; if too large, the last run is queried

query_best(dataset_name, tag, criterion, position=0)[source]

Query the n-th best run. “Best” here means achieving the largest value at any epoch/step.

Keyword arguments:
  • dataset_name – str, the name of the dataset in the benchmark
  • tag – str, the tag you want to query
  • criterion – str, the tag you want to use for the ranking
  • position – int, an identifier for which position in the ranking you want to query

get_queriable_tags(dataset_name=None, config_id=None)[source]

Returns a list of all queriable tags

get_dataset_names()[source]

Returns a list of all available dataset names, as defined on OpenML

get_openml_task_ids()[source]

Returns a list of openml task ids

get_number_of_configs(dataset_name)[source]

Returns the number of configurations for a dataset

get_config(dataset_name, config_id)[source]

Returns the configuration of a run specified by dataset name and config id

plot_by_name(dataset_names, x_col, y_col, n_configs=10, show_best=False, xscale='linear', yscale='linear', criterion=None)[source]

Plot multiple datasets and multiple runs.

Keyword arguments:
  • dataset_names – list
  • x_col – str, tag to plot on the x-axis
  • y_col – str, tag to plot on the y-axis
  • n_configs – int, number of configs to plot for each dataset
  • show_best – bool, whether to show the n_configs best (according to query_best())
  • xscale – str, x-axis scale; options as in matplotlib: “linear”, “log”, “symlog”, “logit”, …
  • yscale – str, y-axis scale; options as in matplotlib: “linear”, “log”, “symlog”, “logit”, …
  • criterion – str, tag used as criterion for query_best()

syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench module
syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench.convert_task(bench, dataset_name)[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.lcbench.lcbench.LCBenchRecipe[source]

Bases: BlackboxRecipe

Submodules
syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import module
Convert tabular data from

Tabular Benchmarks for Joint Architecture and Hyperparameter Optimization. Aaron Klein, Frank Hutter. https://arxiv.org/pdf/1905.04970.pdf

syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.convert_dataset(dataset_path, max_rows=None)[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.generate_fcnet()[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.plot_learning_curves()[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.fcnet_import.FCNETRecipe[source]

Bases: BlackboxRecipe

syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import module
Convert evaluations from

A Quantile-based Approach for Hyperparameter Transfer Learning. David Salinas, Huibin Shen, Valerio Perrone. http://proceedings.mlr.press/v119/salinas20a/salinas20a.pdf

syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.download(blackbox)[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.serialize_deepar()[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.serialize_xgboost()[source]

‘hp_log2_min_child_weight’, ‘hp_subsample’, ‘hp_colsample_bytree’, ‘hp_log2_gamma’, ‘hp_log2_lambda’, ‘hp_eta’, ‘hp_max_depth_index’, ‘hp_log2_alpha’, ‘metric_error’, ‘blackbox’, ‘task’

class syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.XGBoostRecipe[source]

Bases: BlackboxRecipe

class syne_tune.blackbox_repository.conversion_scripts.scripts.icml2020_import.DeepARRecipe[source]

Bases: BlackboxRecipe

syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import module
syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.str_to_list(arch_str)[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.convert_dataset(data, dataset)[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.nasbench201_import.NASBench201Recipe[source]

Bases: BlackboxRecipe

syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import module
syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.convert_task(task_data)[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.PD1Recipe[source]

Bases: BlackboxRecipe

syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.serialize(bb_dict, path, metadata=None)[source]
syne_tune.blackbox_repository.conversion_scripts.scripts.pd1_import.deserialize(path)[source]

Deserialize blackboxes contained in a path that were saved with serialize() above.

TODO: the API is currently dissonant with serialize(), deserialize() for BlackboxOffline, as serialize is a member function there. A possible way to unify is to have serialize also be a free function for BlackboxOffline.

Parameters:

path (str) – a path that contains blackboxes that were saved with serialize()

Return type:

Dict[str, BlackboxTabular]

Returns:

a dictionary from task name to blackbox

syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import module

Wrap surrogates from YAHPO Gym - An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. Florian Pfisterer, Lennart Schneider, Julia Moosbauer, Martin Binder, Bernd Bischl.

syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.download(target_path, version)[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.BlackBoxYAHPO(benchmark, fidelities=None)[source]

Bases: Blackbox

A wrapper that allows putting a ‘YAHPO’ BenchmarkInstance into a Blackbox.

If fidelities is given, it restricts fidelity_values to these values. The sequence must consist of positive ints and be increasing. This works only if there is a single fidelity attribute with integer values (but note that for some specific YAHPO benchmarks, a fractional fidelity is transformed to an integer one).

Even though YAHPO interpolates between fidelities, it can make sense to restrict them to the values which have really been acquired in the data. Note that this restricts multi-fidelity schedulers like HyperbandScheduler, in that all their rung levels have to be fidelity values.

For example, for YAHPO iaml, the fidelity trainsize has been acquired at [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1], this is transformed to [1, 2, 4, 8, 12, 16, 20]. By default, the fidelity is represented by cs.randint(1, 20), but if fidelities is passed, it uses cs.ordinal(fidelities).

Parameters:
  • benchmark (BenchmarkSet) – YAHPO BenchmarkSet

  • fidelities (Optional[List[int]]) – See above

set_instance(instance)[source]

Set an instance for the underlying YAHPO Benchmark.

property instances: array
property fidelity_values: array
Returns:

Fidelity values; or None if the blackbox has none

property time_attribute: str

Name of the time column

syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.cs_to_synetune(config_space)[source]

Convert ConfigSpace.ConfigSpace to a synetune configspace.

TODO: cover all possible hyperparameters of ConfigSpace.ConfigSpace; right now we only convert the ones we need.

syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.instantiate_yahpo(scenario, check=False, fidelities=None)[source]

Instantiates a dict of BlackBoxYAHPO, one entry for each instance.

Parameters:
  • scenario (str) –

  • check (bool) – If False, objective_function of the blackbox does not check whether the input configuration is valid. This is faster, but calls fail silently if configurations are invalid.

Returns:

syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.serialize_yahpo(scenario, target_path, version='1.0')[source]
class syne_tune.blackbox_repository.conversion_scripts.scripts.yahpo_import.YAHPORecipe(name)[source]

Bases: BlackboxRecipe

Submodules
syne_tune.blackbox_repository.conversion_scripts.blackbox_recipe module
class syne_tune.blackbox_repository.conversion_scripts.blackbox_recipe.BlackboxRecipe(name, cite_reference, hash=None)[source]

Bases: object

generate(s3_root=None)[source]

Generates the blackbox on disk, then uploads it to S3 if AWS is available.

Parameters:

s3_root (Optional[str]) – S3 root to upload to, defaults to s3://{sagemaker-bucket}/blackbox-repository. If AWS is not available, this step is skipped and the dataset is just persisted locally.

syne_tune.blackbox_repository.conversion_scripts.recipes module
syne_tune.blackbox_repository.conversion_scripts.utils module
syne_tune.blackbox_repository.conversion_scripts.utils.s3_blackbox_folder(s3_root=None)[source]
syne_tune.blackbox_repository.conversion_scripts.utils.get_sub_directory_and_name(name)[source]

Blackboxes are either stored under “{blackbox-repository}/{name}” (such as fcnet, nas201, …) or under “{blackbox-repository}/{subdir}/{subname}” for all YAHPO benchmarks. In the YAHPO case, “yahpo-rbv2_xgboost” is for instance stored under “{blackbox-repository}/yahpo/rbv2_xgboost/”.

Parameters:

name (str) – name of the blackbox, for instance “fcnet”, “lcbench” or “yahpo-rbv2_xgboost”

Returns:

subdirectory and subname such that the blackbox should be stored under {blackbox_repository}/{subdir}/{name}

syne_tune.blackbox_repository.conversion_scripts.utils.blackbox_local_path(name)[source]
Return type:

Path

syne_tune.blackbox_repository.conversion_scripts.utils.blackbox_s3_path(name, s3_root=None)[source]
Return type:

Path

syne_tune.blackbox_repository.conversion_scripts.utils.upload_blackbox(name, s3_root=None)[source]

Uploads a blackbox locally present in repository_path to S3.

Parameters:

name (str) – folder must be available in repository_path/name

syne_tune.blackbox_repository.conversion_scripts.utils.download_file(source, destination)[source]
syne_tune.blackbox_repository.conversion_scripts.utils.compute_hash_binary(filename)[source]
syne_tune.blackbox_repository.conversion_scripts.utils.compute_hash_benchmark(tgt_folder)[source]
syne_tune.blackbox_repository.conversion_scripts.utils.validate_hash(tgt_folder, original_hash)[source]

Computes the hash of the files in tgt_folder and validates it against the original hash.

Parameters:
  • tgt_folder – target folder that contains the files of the original benchmark

  • original_hash – original SHA-256 hash

Submodules
syne_tune.blackbox_repository.blackbox module
class syne_tune.blackbox_repository.blackbox.Blackbox(configuration_space, fidelity_space=None, objectives_names=None)[source]

Bases: object

Interface designed to be compatible with

Parameters:
  • configuration_space (Dict[str, Any]) – Configuration space of blackbox.

  • fidelity_space (Optional[dict]) – Fidelity space for blackbox, optional.

  • objectives_names (Optional[List[str]]) – Names of the metrics, by default consider all metrics prefixed by "metric_" to be metrics

objective_function(configuration, fidelity=None, seed=None)[source]

Returns an evaluation of the blackbox.

First performs a data check and then calls _objective_function(), which should be overridden in the child class.

Parameters:
  • configuration (Dict[str, Any]) – configuration to be evaluated, should belong to configuration_space

  • fidelity (Union[dict, Number, None]) – not passing a fidelity is possible if either the blackbox does not have a fidelity space or if it has a single fidelity in its fidelity space. In the latter case, all fidelities are returned in the form of a tensor with shape (num_fidelities, num_objectives).

  • seed (Optional[int]) – Only used if the blackbox defines multiple seeds

Return type:

Union[Dict[str, float], ndarray]

Returns:

dictionary of objectives evaluated or tensor with shape (num_fidelities, num_objectives) if no fidelity was given.

hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Return type:

Tuple[DataFrame, DataFrame]

Returns:

a tuple of two dataframes (X, y), where X contains hyperparameters values and y contains objective values, this is used when fitting a surrogate model.

property fidelity_values: array | None
Returns:

Fidelity values; or None if the blackbox has none

fidelity_name()[source]

Can only be used for blackboxes with a single fidelity attribute.

Return type:

str

Returns:

Name of fidelity attribute (must be single one)

configuration_space_with_max_resource_attr(max_resource_attr)[source]

It is best practice to have one attribute in the configuration space to represent the maximum fidelity value used for evaluation (e.g., the maximum number of epochs).

Parameters:

max_resource_attr (str) – Name of new attribute for maximum resource

Return type:

Dict[str, Any]

Returns:

Configuration space augmented by the new attribute

syne_tune.blackbox_repository.blackbox.from_function(configuration_space, eval_fun, fidelity_space=None, objectives_names=None)[source]

Helper to create a blackbox from a function, useful for tests or for wrapping up real blackbox functions.

Parameters:
  • configuration_space (Dict[str, Any]) – Configuration space for blackbox

  • eval_fun (Callable) – Function that returns dictionary of objectives given configuration and fidelity

  • fidelity_space (Optional[dict]) – Fidelity space for blackbox

  • objectives_names (Optional[List[str]]) – Objectives returned by blackbox

Return type:

Blackbox

Returns:

Resulting blackbox wrapping eval_fun

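A minimal sketch of wrapping an analytic function; the eval_fun call signature (configuration, fidelity, seed) is an assumption based on the description above:

from syne_tune.blackbox_repository.blackbox import from_function
from syne_tune.config_space import uniform


def eval_fun(config, fidelity=None, seed=None):
    # Hypothetical objective: a simple quadratic in the single hyperparameter "x"
    return {"metric_value": (config["x"] - 1.0) ** 2}


blackbox = from_function(
    configuration_space={"x": uniform(-5.0, 5.0)},
    eval_fun=eval_fun,
    objectives_names=["metric_value"],
)
print(blackbox.objective_function({"x": 0.5}))
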
syne_tune.blackbox_repository.blackbox_offline module
class syne_tune.blackbox_repository.blackbox_offline.BlackboxOffline(df_evaluations, configuration_space, fidelity_space=None, objectives_names=None, seed_col=None)[source]

Bases: Blackbox

A blackbox obtained from offline evaluations. Each row of the dataframe should contain one evaluation, given a fixed configuration, fidelity, and seed. The columns must correspond to the provided configuration and fidelity space; by default, all columns prefixed by "metric_" are assumed to be metrics, but this can be overridden by providing metric columns.

Additional arguments on top of parent class Blackbox:

Parameters:
  • df_evaluations (DataFrame) – Data frame with evaluations data

  • seed_col (Optional[str]) – optional, can be used when multiple seeds are recorded

hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Returns:

a tuple of two dataframes (X, y), where X contains hyperparameters values and y contains objective values, this is used when fitting a surrogate model.

syne_tune.blackbox_repository.blackbox_offline.serialize(bb_dict, path, categorical_cols=[])[source]
Parameters:
  • bb_dict (Dict[str, BlackboxOffline]) –

  • path (str) –

  • categorical_cols (List[str]) – optional, allows retrieving columns as categories, which drastically lowers the memory footprint when few distinct values are present

Returns:

syne_tune.blackbox_repository.blackbox_offline.deserialize(path)[source]
Parameters:
  • path (str) – where to find blackbox serialized information (at least data.csv.zip and configspace.json)

  • groupby_col – if this column is provided, evaluations are separated into one blackbox per task

Return type:

Union[Dict[str, BlackboxOffline], BlackboxOffline]

Returns:

Dictionary from task name to blackbox, or a single blackbox in the case of a single task

syne_tune.blackbox_repository.blackbox_surrogate module
class syne_tune.blackbox_repository.blackbox_surrogate.Columns(names=None)[source]

Bases: BaseEstimator, TransformerMixin

fit(*args, **kwargs)[source]
transform(X)[source]
class syne_tune.blackbox_repository.blackbox_surrogate.BlackboxSurrogate(X, y, configuration_space, objectives_names, fidelity_space=None, fidelity_values=None, surrogate=None, predict_curves=False, num_seeds=1, fit_differences=None, max_fit_samples=None, name=None)[source]

Bases: Blackbox

Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation. To wrap an existing blackbox with a surrogate estimator, use add_surrogate(), which automatically extracts the X, y matrices from available blackbox evaluations.

The surrogate regression model is provided by surrogate; it has to conform to the scikit-learn fit-predict API. If predict_curves is True, the model maps features of the configuration to the whole curve over fidelities, separately for each metric and seed. This has several advantages. First, predictions are consistent: if all curves in the data respect a certain property which is retained under convex combinations, predictions have this property as well (examples: positivity, monotonicity). This is important for elapsed_time metrics. The regression models are also fairly compact, and prediction is fast, so max_fit_samples is normally not needed.

If predict_curves is False, the model maps features from configuration and fidelity to metric values (univariate regression). In this case, properties like monotonicity are not retained. Also, training can take long and the trained models can be large.

This difference only matters if there are fidelities. Otherwise, regression is always univariate.

If num_seeds is given, we maintain different surrogate models for each seed. Otherwise, a single surrogate model is fit to data across all seeds.

If fit_differences is given, it contains names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives. This feature only matters if there are fidelities.

Additional arguments on top of parent class Blackbox:

Parameters:
  • X (DataFrame) – dataframe containing hyperparameters values. Shape is (num_seeds * num_evals, num_hps) if predict_curves is True, (num_fidelities * num_seeds * num_evals, num_hps) otherwise

  • y (DataFrame) – dataframe containing objectives values. Shape is (num_seeds * num_evals, num_fidelities * num_objectives) if predict_curves is True, and (num_fidelities * num_seeds * num_evals, num_objectives) otherwise

  • surrogate – the model that is fitted to predict objectives given any configuration, defaults to KNeighborsRegressor(n_neighbors=1). If predict_curves is True, this must be multi-variate regression, i.e. accept target matrices in fit, where columns correspond to fidelities. Regression models from scikit-learn allow for that. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature-processing to convert rows in X to vectors. We use the configuration_space hyperparameter types to deduce the types of columns in X (for instance, categorical values are one-hot encoded).

  • predict_curves (bool) – See above. Default is False (backwards compatible)

  • num_seeds (int) – See above

  • fit_differences (Optional[List[str]]) – See above

  • max_fit_samples (Optional[int]) – maximum number of samples to be fed to the surrogate estimator; if more data points than this number are passed, they are subsampled without replacement. If num_seeds is used, this is a limit on the data per seed

  • name (Optional[str]) –

property fidelity_values: array | None
Returns:

Fidelity values; or None if the blackbox has none

property num_fidelities: int
static make_model_pipeline(configuration_space, fidelity_space, model, predict_curves=False)[source]

Create feature pipeline for scikit-learn model

Parameters:
  • configuration_space – Configuration space

  • fidelity_space – Fidelity space

  • model – Scikit-learn model

  • predict_curves – Predict full curves?

Returns:

Feature pipeline

fit_surrogate(X, y)[source]

Fits a surrogate model to data from a blackbox. Here, the targets y can be a matrix with the number of columns equal to the number of fidelity values (the predict_curves = True case).

Return type:

Blackbox

hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Return type:

Tuple[DataFrame, DataFrame]

Returns:

a tuple of two dataframes (X, y), where X contains hyperparameters values and y contains objective values, this is used when fitting a surrogate model.

syne_tune.blackbox_repository.blackbox_surrogate.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]

Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation.

Parameters:
  • blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values() so that input/output are passed to estimate the model

  • surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature-processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).

  • configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. But note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.

  • predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.

  • separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits different models to each seed. If False, the data from blackbox is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults to False.

  • fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.

Returns:

a blackbox where the output is obtained through the fitted surrogate

syne_tune.blackbox_repository.blackbox_tabular module
class syne_tune.blackbox_repository.blackbox_tabular.BlackboxTabular(hyperparameters, configuration_space, fidelity_space, objectives_evaluations, fidelity_values=None, objectives_names=None)[source]

Bases: Blackbox

Blackbox that contains tabular evaluations (e.g., all hyperparameters evaluated on all fidelities). We use a separate class from BlackboxOffline, as performance improvements can be made by avoiding repeating hyperparameters and by storing all evaluations in a single table.

Additional arguments on top of parent class Blackbox:

Parameters:
  • hyperparameters (DataFrame) – dataframe of hyperparameters, shape (num_evals, num_hps), columns must match hyperparameter names of configuration_space

  • objectives_evaluations (array) – values of recorded objectives, must have shape (num_evals, num_seeds, num_fidelities, num_objectives)

  • fidelity_values (Optional[array]) – values of the num_fidelities fidelities, default to [1, ..., num_fidelities]

property fidelity_values: array
Returns:

Fidelity values; or None if the blackbox has none

hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Parameters:

predict_curves (bool) – See above. Default is False

Return type:

Tuple[DataFrame, DataFrame]

Returns:

Dataframes corresponding to X and y

rename_objectives(objective_name_mapping)[source]
Parameters:

objective_name_mapping (Dict[str, str]) – dictionary from old objective name to new one, old objective name must be present in the blackbox

Return type:

BlackboxTabular

Returns:

a blackbox with as many objectives as objective_name_mapping

all_configurations()[source]

This method is useful for setting restrict_configurations in StochasticAndFilterDuplicatesSearcher or GPFIFOSearcher, which restricts the searcher to only return configurations from this set. This allows you to use a tabular blackbox without a surrogate (see the sketch below).

Return type:

List[Dict[str, Any]]

Returns:

List of all hyperparameter configurations for which objective values can be returned

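A hedged sketch following the note above; passing restrict_configurations via search_options and the metric name "metric_error" are assumptions:

from syne_tune.optimizer.baselines import BayesianOptimization

# blackbox is assumed to be a BlackboxTabular instance, e.g. loaded via load_blackbox()
scheduler = BayesianOptimization(
    config_space=blackbox.configuration_space,
    metric="metric_error",
    search_options={"restrict_configurations": blackbox.all_configurations()},
)
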
syne_tune.blackbox_repository.blackbox_tabular.serialize(bb_dict, path, metadata=None)[source]
syne_tune.blackbox_repository.blackbox_tabular.deserialize(path)[source]

Deserialize blackboxes contained in a path that were saved with serialize() above.

TODO: the API is currently dissonant with serialize(), deserialize() for BlackboxOffline as serialize is a member function there. A possible way to unify is to have serialize also be a free function for BlackboxOffline.

Parameters:

path (str) – a path that contains blackboxes that were saved with serialize()

Return type:

Dict[str, BlackboxTabular]

Returns:

a dictionary from task name to blackbox

syne_tune.blackbox_repository.repository module
syne_tune.blackbox_repository.repository.blackbox_list()[source]
Return type:

List[str]

Returns:

list of blackboxes available

syne_tune.blackbox_repository.repository.load_blackbox(name, skip_if_present=True, s3_root=None, generate_if_not_found=True, yahpo_kwargs=None, ignore_hash=True)[source]
Parameters:
  • name (str) –

    name of a blackbox present in the repository, see blackbox_list() to get list of available blackboxes. Syne Tune currently provides the following blackboxes evaluations:

    • ”nasbench201”: 15625 multi-fidelity configurations of computer vision architectures evaluated on 3 datasets. NAS-Bench-201: Extending the scope of reproducible neural architecture search. Dong, X. and Yang, Y. 2020.

    • ”fcnet”: 62208 multi-fidelity configurations of MLP evaluated on 4 datasets. Tabular benchmarks for joint architecture and hyperparameter optimization. Klein, A. and Hutter, F. 2019.

    • ”lcbench”: 2000 multi-fidelity PyTorch model configurations evaluated on many datasets. Reference: Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. Lucas Zimmer, Marius Lindauer, Frank Hutter. 2020.

    • ”icml-deepar”: 2420 single-fidelity configurations of DeepAR forecasting algorithm evaluated on 10 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.

    • ”icml-xgboost”: 5000 single-fidelity configurations of XGBoost evaluated on 9 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.

    • ”yahpo-*”: A number of different benchmarks from YAHPO Gym. Note that these blackboxes already come with surrogates, so there is no need to wrap them into SurrogateBlackbox

  • skip_if_present (bool) – skip the download if the file locally exists

  • s3_root (Optional[str]) – S3 root directory for blackbox repository. Defaults to S3 bucket name of SageMaker session

  • generate_if_not_found (bool) – If the blackbox file is not present locally or on S3, should it be generated using its conversion script?

  • yahpo_kwargs (Optional[dict]) – For a YAHPO blackbox (name == "yahpo-*"), these are additional arguments to instantiate_yahpo

  • ignore_hash (bool) – do not check if hash of currently stored files matches the pre-computed hash. Be careful with this option. If hashes do not match, results might not be reproducible.

Return type:

Union[Dict[str, Blackbox], Blackbox]

Returns:

Blackbox with the given name; it is downloaded if not present.

syne_tune.blackbox_repository.repository.check_blackbox_local_files(tgt_folder)[source]

Checks whether the files of the blackbox name are present in repository_path

Return type:

bool

syne_tune.blackbox_repository.serialize module
syne_tune.blackbox_repository.serialize.serialize_configspace(path, configuration_space, fidelity_space=None)[source]
syne_tune.blackbox_repository.serialize.deserialize_configspace(path)[source]
syne_tune.blackbox_repository.serialize.serialize_metadata(path, metadata)[source]
syne_tune.blackbox_repository.serialize.deserialize_metadata(path)[source]
syne_tune.blackbox_repository.simulated_tabular_backend module
syne_tune.blackbox_repository.simulated_tabular_backend.make_surrogate(surrogate=None, surrogate_kwargs=None)[source]

Creates a surrogate model (scikit-learn estimator)

Parameters:
  • surrogate (Optional[str]) – A model that is fitted to predict objectives given any configuration. Possible examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator. The model is fit on top of a pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).

  • surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator, for instance {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen.

Returns:

Scikit-learn estimator representing surrogate model

class syne_tune.blackbox_repository.simulated_tabular_backend.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)[source]

Bases: _BlackboxSimulatorBackend

Allows simulating a blackbox from the blackbox repository, selected by blackbox_name. See examples/launch_simulated_benchmark.py for an example on how to use it. If you want to add a new dataset, see the Adding a new dataset section of syne_tune/blackbox_repository/README.md.

In each result reported to the simulator backend, the value for key elapsed_time_attr must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this would be the time from start of training until the end of the epoch. If the blackbox contains this information in a column, elapsed_time_attr should be its key.

If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track at which resource level each trial is paused. Namely, once a trial is resumed, all results for resources smaller or equal to that level are ignored, which simulates the situation that training is resumed from a checkpoint. This feature relies on result to be passed to pause_trial(). If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens if support_checkpointing is False.

Note

If the blackbox maintains cumulative time (elapsed_time), this is different from what SimulatorBackend requires for elapsed_time_attr, if a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the most recent resume. This conversion is done internally in _run_job_and_collect_results(), which is called for each resume. This means that the field elapsed_time_attr is not what is received from the blackbox table, but instead what the backend needs.

max_resource_attr plays the same role as in HyperbandScheduler. If given, it is the key in a configuration config for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).

If seed is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all _run_job_and_collect_results() calls for the same trial. This is important for pause and resume scheduling.

Parameters:
  • blackbox_name (str) – Name of a blackbox, must have been registered in blackbox repository.

  • elapsed_time_attr (str) – Name of the column containing cumulative time

  • max_resource_attr (Optional[str]) – See above

  • seed (Optional[int]) – If given, this seed is used for all trial evaluations. Otherwise, seed is sampled at random for each trial. Only relevant for blackboxes with multiple seeds

  • support_checkpointing (bool) – If False, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults to True

  • dataset (Optional[str]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets)

  • surrogate (Optional[str]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, which would enable using the corresponding scikit-learn estimator, see also make_surrogate(). The model is fit on top of a pipeline that applies basic feature-processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).

  • surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator, for instance {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen. If blackbox_name is a YAHPO blackbox, then surrogate_kwargs is passed as yahpo_kwargs to load_blackbox(). In this case, surrogate is ignored (YAHPO always uses surrogates).

  • config_space_surrogate (Optional[dict]) – If surrogate is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.

  • simulatorbackend_kwargs – Additional arguments to parent SimulatorBackend

property blackbox: Blackbox
class syne_tune.blackbox_repository.simulated_tabular_backend.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)[source]

Bases: _BlackboxSimulatorBackend

Version of _BlackboxSimulatorBackend, where the blackbox is given as explicit Blackbox object. See examples/launch_simulated_benchmark.py for an example on how to use.

Additional arguments on top of parent _BlackboxSimulatorBackend:

Parameters:

blackbox (Blackbox) – Blackbox to be used for simulation

property blackbox: Blackbox
syne_tune.blackbox_repository.utils module
syne_tune.blackbox_repository.utils.metrics_for_configuration(blackbox, config, resource_attr, fidelity_range=None, seed=None)[source]

Returns all results for configuration config at fidelities in range fidelity_range.

Parameters:
  • blackbox (Blackbox) – Blackbox

  • config (Dict[str, Any]) – Configuration

  • resource_attr (str) – Name of resource attribute

  • fidelity_range (Optional[Tuple[float, float]]) – Range [min_f, max_f], only fidelities in this range (both ends inclusive) are returned. Default is no filtering

  • seed (Optional[int]) – Seed for queries to blackbox. Drawn at random if not given

Return type:

List[dict]

Returns:

List of result dicts
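
A short usage sketch; loading the blackbox and the attribute names below are assumptions (substitute the blackbox, dataset, and resource attribute you actually use):

from syne_tune.blackbox_repository import load_blackbox
from syne_tune.blackbox_repository.utils import metrics_for_configuration

# Placeholder blackbox and dataset names
blackbox = load_blackbox("fcnet")["protein_structure"]
# Draw a random configuration (assumes all entries of the configuration space are domains)
config = {name: domain.sample() for name, domain in blackbox.configuration_space.items()}
results = metrics_for_configuration(
    blackbox=blackbox,
    config=config,
    resource_attr="hp_epoch",   # assumed name of the resource (fidelity) attribute
    fidelity_range=(1, 50),     # both ends inclusive
)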

syne_tune.callbacks package
class syne_tune.callbacks.TensorboardCallback(ignore_metrics=None, target_metric=None, mode=None, log_hyperparameters=True)[source]

Bases: TunerCallback

Logs relevant metrics reported from trial evaluations, so they can be visualized with Tensorboard.

Parameters:
  • ignore_metrics (Optional[List[str]]) – Defines which metrics should be ignored. If None, all metrics are reported to Tensorboard.

  • target_metric (Optional[str]) – Defines the metric we aim to optimize. If this argument is set, we report the cumulative optimum of this metric as well as the optimal hyperparameters we have found so far.

  • mode (Optional[str]) – Determines whether we maximize (“max”) or minimize (“min”) the target metric.

  • log_hyperparameters (bool) – If set to True, we also log all hyperparameters specified in the configuration space.
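
For example, the callback can be passed to the Tuner via its callbacks argument (the backend, scheduler, stopping criterion, and metric name below are assumed to be defined elsewhere):

from syne_tune import Tuner
from syne_tune.callbacks import TensorboardCallback

tuner = Tuner(
    trial_backend=trial_backend,    # assumed to exist
    scheduler=scheduler,            # assumed to exist
    stop_criterion=stop_criterion,  # assumed to exist
    n_workers=4,
    callbacks=[TensorboardCallback(target_metric="validation_error", mode="min")],
)
tuner.run()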

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tunerTuner object

on_tuning_end()[source]

Called once the tuning loop terminates

This is called before Tuner object is serialized (optionally), and also before running jobs are stopped.

Submodules
syne_tune.callbacks.hyperband_remove_checkpoints_callback module
class syne_tune.callbacks.hyperband_remove_checkpoints_callback.TrialStatus[source]

Bases: object

RUNNING = 'RUNNING'
PAUSED_WITH_CHECKPOINT = 'PAUSED-WITH-CP'
PAUSED_NO_CHECKPOINT = 'PAUSED-NO-CP'
STOPPED_OR_COMPLETED = 'STOPPED-COMPLETED'
class syne_tune.callbacks.hyperband_remove_checkpoints_callback.BetaBinomialEstimator(beta_mean, beta_size)[source]

Bases: object

Estimator of the probability \(p = P(X = 1)\) for a variable \(X\) with Bernoulli distribution. This is using a Beta prior, which is conjugate to the binomial likelihood. The prior is parameterized by effective sample size beta_size (\(a + b\)) and mean beta_mean (\(a / (a + b)\)).
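
As a small numerical illustration of this parameterization (not the class implementation itself): with prior mean m and size s, the Beta parameters are a = m * s and b = (1 - m) * s, and after observing num_one successes out of num_total Bernoulli draws, the posterior mean is (a + num_one) / (s + num_total):

# Illustration of the Beta-Binomial posterior mean under this parameterization
beta_mean, beta_size = 0.33, 2.0   # prior mean a / (a + b) and prior size a + b
a = beta_mean * beta_size          # a = 0.66
num_one, num_total = 3, 10         # observed Bernoulli data: 3 ones out of 10 draws
posterior_mean = (a + num_one) / (beta_size + num_total)   # = 3.66 / 12 = 0.305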

update(data)[source]
property num_one: int
property num_total
posterior_mean()[source]
Return type:

float

class syne_tune.callbacks.hyperband_remove_checkpoints_callback.TrialInformation(trial_id, level, rank, rung_len, score_val=None)[source]

Bases: object

trial_id: str
level: int
rank: int
rung_len: int
score_val: Optional[float] = None
class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsCommon(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode)[source]

Bases: TunerCallback

Common base class for HyperbandRemoveCheckpointsCallback and HyperbandRemoveCheckpointsBaselineCallback.

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tunerTuner object

property num_checkpoints_removed: int
on_loop_end()[source]

Called at end of each tuning loop iteration

This is done before the loop stopping condition is checked and acted upon.

on_trial_complete(trial, result)[source]

Called when a trial completes (Status.completed)

The arguments here also have been passed to scheduler.on_trial_complete, before this call here.

Parameters:
  • trial (Trial) – Trial that just completed.

  • result (Dict[str, Any]) – Last result obtained.

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

on_start_trial(trial)[source]

Called just after a new trial is started

Parameters:

trial (Trial) – Trial which has just been started

on_resume_trial(trial)[source]

Called just after a trial is resumed

Parameters:

trial (Trial) – Trial which has just been resumed

trials_resumed_without_checkpoint()[source]
Return type:

List[Tuple[str, int]]

Returns:

List of (trial_id, level) for trials which were resumed, even though their checkpoint was removed

extra_results()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary containing information which can be appended to results written out

static extra_results_keys()[source]
Return type:

List[str]

class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsCallback(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode, approx_steps=25, prior_beta_mean=0.33, prior_beta_size=2, min_data_at_rung=5)[source]

Bases: HyperbandRemoveCheckpointsCommon

Implements speculative early removal of checkpoints of paused trials for HyperbandScheduler (only for types which pause trials at rung levels).

In this scheduler, any paused trial can in principle be resumed in the future, which is why we remove checkpoints speculatively. The idea is to keep the total number of checkpoints no larger than max_num_checkpoints. If this limit is reached, we rank all currently paused trials which still have a checkpoint and remove checkpoints for those with lowest scores. If a trial is resumed whose checkpoint has been removed, we have to train from scratch, at a cost proportional to the rung level the trial is paused at. The score is an approximation to this expected cost, the product of rung level and probability of getting resumed. This probability depends on the current rung size, the rank of the trial in the rung, and both the time spent and remaining for the experiment, so we need max_wallclock_time. Details are given in a technical report.

The probability of getting resumed also depends on the probability \(p_r\) that a new trial arriving at rung \(r\) ranks better than an existing paused one with a checkpoint. These probabilities are estimated here. For each new arrival at a rung, we obtain one datapoint for every paused trial with checkpoint there. We use Bayesian estimators with Beta prior given by mean prior_beta_mean and sample size prior_beta_size. The mean should be \(< 1/2\). We also run an estimator for an overall probability \(p\), which is fed by all datapoints. This estimator is used as long as there are fewer than min_data_at_rung datapoints at rung \(r\).
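
As a rough illustration of the ranking (a simplification of what the callback actually computes): the expected cost of dropping a paused trial's checkpoint is approximately the rung level it is paused at times its estimated probability of being resumed, and checkpoints with the smallest expected cost are removed first. The numbers below are made up:

# Simplified illustration of ranking paused trials for checkpoint removal
paused_trials = [
    # (trial_id, rung level, estimated probability of getting resumed)
    ("0", 1, 0.60),
    ("3", 3, 0.10),
    ("7", 9, 0.02),
]
scores = {trial_id: level * p_resume for trial_id, level, p_resume in paused_trials}
cheapest = min(scores, key=scores.get)  # -> "7": losing this checkpoint costs least in expectation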

Parameters:
  • max_num_checkpoints (int) – Once the total number of checkpoints surpasses this number, we remove some.

  • max_wallclock_time (int) – Maximum time of the experiment

  • metric (str) – Name of metric in result of on_trial_result()

  • resource_attr (str) – Name of resource attribute in result of on_trial_result()

  • mode (str) – “min” or “max”

  • approx_steps (int) – Number of approximation steps in score computation. Computations scale cubically in this number. Defaults to 25

  • prior_beta_mean (float) – Parameter of Beta prior for estimators. Defaults to 0.33

  • prior_beta_size (float) – Parameter of Beta prior for estimators. Defaults to 2

  • min_data_at_rung (int) – See above. Defaults to 5

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tunerTuner object

estimator_for_rung(level)[source]
Return type:

BetaBinomialEstimator

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

class syne_tune.callbacks.hyperband_remove_checkpoints_callback.HyperbandRemoveCheckpointsBaselineCallback(max_num_checkpoints, max_wallclock_time, metric, resource_attr, mode, baseline=None)[source]

Bases: HyperbandRemoveCheckpointsCommon

Implements some simple baselines to compare with HyperbandRemoveCheckpointsCallback.

Parameters:
  • max_num_checkpoints (int) – Once the total number of checkpoints surpasses this number, we remove some.

  • max_wallclock_time (int) – Maximum time of the experiment

  • metric (str) – Name of metric in result of on_trial_result()

  • resource_attr (str) – Name of resource attribute in result of on_trial_result()

  • mode (str) – “min” or “max”

  • baseline (Optional[str]) –

    Type of baseline. Defaults to “by_level”

    • ”random”: Select random paused trial with checkpoint

    • ”by_level”: Select paused trial (with checkpoint) on lowest rung level,

      and then of worst rank

syne_tune.callbacks.hyperband_remove_checkpoints_score module
syne_tune.callbacks.hyperband_remove_checkpoints_score.compute_probabilities_of_getting_resumed(ranks, rung_lens, prom_quants, p_vals, time_ratio, approx_steps)[source]

Computes an approximation to the probability of getting resumed under our independence assumptions. This approximation improves with larger approx_steps, but its cost scales cubically in this number.

Parameters:
  • ranks (ndarray) – Ranks \(k\), starting from 1 (smaller is better)

  • rung_lens (ndarray) – Rung lengths \(n_r\)

  • prom_quants (ndarray) – Promotion quantiles \(\alpha_r\)

  • p_vals (ndarray) – Probabilities \(p_r\)

  • time_ratio (float) – Ratio \(\beta\) between time left and time spent

  • approx_steps (int) – Number of approximation steps, see above

Return type:

ndarray

Returns:

Approximations of probability to get resumed

syne_tune.callbacks.remove_checkpoints_callback module
class syne_tune.callbacks.remove_checkpoints_callback.RemoveCheckpointsCallback[source]

Bases: TunerCallback

This implements early removal of checkpoints of paused trials. In order for this to work, the scheduler needs to implement trials_checkpoints_can_be_removed().

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tunerTuner object

on_loop_end()[source]

Called at end of each tuning loop iteration

This is done before the loop stopping condition is checked and acted upon.

class syne_tune.callbacks.remove_checkpoints_callback.DefaultRemoveCheckpointsSchedulerMixin[source]

Bases: RemoveCheckpointsSchedulerMixin

Implements general case of RemoveCheckpointsSchedulerMixin, where the callback is of type RemoveCheckpointsCallback. This means scheduler has to implement trials_checkpoints_can_be_removed().

trials_checkpoints_can_be_removed()[source]

Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning SchedulerDecision.STOP in on_trial_result()), its checkpoint is removed anyway, so its ID does not have to be returned here.

Return type:

List[int]

Returns:

IDs of paused trials for which checkpoints can be removed
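
A minimal sketch of a scheduler hooking into this mechanism. The class and its bookkeeping of non-resumable trials are hypothetical; the only contract taken from the documentation above is that each removable trial ID is returned once:

from typing import List

from syne_tune.callbacks.remove_checkpoints_callback import (
    DefaultRemoveCheckpointsSchedulerMixin,
)

class MyCheckpointAwareScheduler(DefaultRemoveCheckpointsSchedulerMixin):
    """Hypothetical scheduler; only the checkpoint removal hook is sketched here."""

    def __init__(self):
        # IDs of paused trials which can no longer be resumed (hypothetical bookkeeping)
        self._non_resumable_trials: List[int] = []
        self._already_reported = set()

    def trials_checkpoints_can_be_removed(self) -> List[int]:
        # Return each removable trial ID only once
        new_ids = [
            trial_id
            for trial_id in self._non_resumable_trials
            if trial_id not in self._already_reported
        ]
        self._already_reported.update(new_ids)
        return new_ids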

callback_for_checkpoint_removal(stop_criterion)[source]
Parameters:

stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner

Return type:

Optional[TunerCallback]

Returns:

CP removal callback, or None if CP removal is not activated

syne_tune.callbacks.tensorboard_callback module
class syne_tune.callbacks.tensorboard_callback.TensorboardCallback(ignore_metrics=None, target_metric=None, mode=None, log_hyperparameters=True)[source]

Bases: TunerCallback

Logs relevant metrics reported from trial evaluations, so they can be visualized with Tensorboard.

Parameters:
  • ignore_metrics (Optional[List[str]]) – Defines which metrics should be ignored. If None, all metrics are reported to Tensorboard.

  • target_metric (Optional[str]) – Defines the metric we aim to optimize. If this argument is set, we report the cumulative optimum of this metric as well as the optimal hyperparameters we have found so far.

  • mode (Optional[str]) – Determines whether we maximize (“max”) or minimize (“min”) the target metric.

  • log_hyperparameters (bool) – If set to True, we also log all hyperparameters specified in the configuration space.

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tunerTuner object

on_tuning_end()[source]

Called once the tuning loop terminates

This is called before Tuner object is serialized (optionally), and also before running jobs are stopped.

syne_tune.experiments package
class syne_tune.experiments.ExperimentResult(name, results, metadata, tuner, path)[source]

Bases: object

Wraps results dataframe and provides retrieval services.

Parameters:
  • name (str) – Name of experiment

  • results (DataFrame) – Dataframe containing results of experiment

  • metadata (Dict[str, Any]) – Metadata stored along with results

  • tuner (Tuner) – Tuner object stored along with results

  • path (Path) – local path where the experiment is stored

name: str
results: DataFrame
metadata: Dict[str, Any]
tuner: Tuner
path: Path
creation_date()[source]
Returns:

Timestamp when Tuner was created

plot_hypervolume(metrics_to_plot=None, reference_point=None, figure_path=None, **plt_kwargs)[source]

Plot best hypervolume value as function of wallclock time

Parameters:
  • reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum values of each metric is used.

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot(metric_to_plot=0, figure_path=None, **plt_kwargs)[source]

Plot best metric value as function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot_trials_over_time(metric_to_plot=0, figure_path=None, figsize=None)[source]

Plot trial results as a function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • figsize – width and height of figure

metric_mode()[source]
Return type:

Union[str, List[str]]

metric_names()[source]
Return type:

List[str]

entrypoint_name()[source]
Return type:

str

best_config(metric=0)[source]

Return the best config found for the specified metric.

Parameters:

metric (Union[str, int]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0 (the first metric defined in the Scheduler)

Return type:

Dict[str, Any]

Returns:

Configuration corresponding to best metric value

syne_tune.experiments.load_experiment(tuner_name, download_if_not_found=True, load_tuner=False, local_path=None, experiment_name=None)[source]

Load results from an experiment

Parameters:
  • tuner_name (str) – Name of a tuning experiment previously run

  • download_if_not_found (bool) – If True, fetch results from S3 if not found locally

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

  • local_path (Optional[str]) – Path containing the experiment to load. If not specified, ~/{SYNE_TUNE_FOLDER}/ is used.

  • experiment_name (Optional[str]) – If given, this is used as first directory.

Return type:

ExperimentResult

Returns:

Result object
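
Typical usage (the tuner name below is a placeholder for the name of an experiment you have run):

from syne_tune.experiments import load_experiment

experiment = load_experiment("my-tuning-experiment")   # placeholder tuner name
print(experiment.best_config())   # best configuration for the first metric
experiment.plot()                 # best metric value as function of wallclock time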

syne_tune.experiments.get_metadata(path_filter=None, root=PosixPath('/home/docs/syne-tune'))[source]

Load meta-data for a number of experiments

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed then only experiments whose path matching the filter are kept. This allows rapid filtering in the presence of many experiments.

  • root (Path) – Root path for experiment results. Default is experiment_path()

Return type:

Dict[str, dict]

Returns:

Dictionary from tuner name to metadata dict

syne_tune.experiments.list_experiments(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]

List experiments for which results are found

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed then only experiments whose path matching the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult, optional

  • root (Path) – Root path for experiment results. Default is result of experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

List[ExperimentResult]

Returns:

List of result objects

syne_tune.experiments.load_experiments_df(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed then only experiments whose path matching the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult

  • root (Path) – Root path for experiment results. Default is experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

DataFrame

Returns:

Dataframe that contains all evaluations reported by tuners according to the given filters. The columns contain the trial id, the hyperparameters evaluated, and the metrics reported via Reporter. The following columns are collected automatically:

  • st_worker_time – time spent in the worker when the report was seen

  • time – wallclock time measured by the tuner

  • decision – decision taken by the scheduler when observing the result

  • status – status of the trial that was shown to the tuner

  • config_{xx} – configuration value for the hyperparameter {xx}

  • tuner_name – name passed when instantiating the Tuner

  • entry_point_name, entry_point_path – name and path of the entry point that was tuned
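
For example, to collect all evaluations whose experiment path contains a given tag (the tag is a placeholder):

from syne_tune.experiments import load_experiments_df

df = load_experiments_df(path_filter=lambda path: "my-experiment-tag" in path)
print(df.columns)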

class syne_tune.experiments.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.

There is one comparative plot per benchmark (aggregation of results across benchmarks are not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as function of wall-clock time. The plot can also have several subplots, in which case results are first grouped into subplot number, then setup.

If benchmark_key is None, there is only a single benchmark, and all results are merged together.

Both setup name and subplot number (optional) can be configured by the user, as a function of the metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which any of them returns None are not used.

When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of the result files. Common reasons:

  • Fewer than num_runs experiments found. Experiments failed, or files were not properly synced.

  • More than num_runs experiments found. This happens if initial experiments for the study failed, but ended up writing results. This can be fixed by either removing the result files, or by using datetime_bounds (since initial failed experiments ran first).

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • num_runs (int) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See above

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • metadata_keys (Optional[List[str]]) – See above

  • metadata_subplot_level (bool) – See above. Defaults to False

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used
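
A hedged sketch of setting up such a comparison; the experiment names, setup names, benchmark name, and the "algorithm" metadata key are assumptions about how your experiments were tagged:

from syne_tune.experiments import ComparativeResults, PlotParameters

def metadata_to_setup(metadata):
    # Assumes each experiment stored its method name under the metadata key "algorithm";
    # returning None would filter the experiment out
    return metadata.get("algorithm")

results = ComparativeResults(
    experiment_names=("my-study",),        # placeholder experiment name prefix
    setups=["ASHA", "MOBSTER"],            # placeholder setup names
    num_runs=5,                            # expected number of seeds per setup
    metadata_to_setup=metadata_to_setup,
    plot_params=PlotParameters(metric="validation_error", mode="min"),
)
results.plot(benchmark_name="my-benchmark", file_name="comparison.png")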

metadata_values(benchmark_name=None)[source]

The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.

Parameters:

benchmark_name (Optional[str]) – Name of benchmark

Return type:

Dict[str, Any]

Returns:

Nested dictionary with meta-data values

plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]

Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).

If plot_params.show_init_trials is given, the best metric value curve for the data from trials with ID <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance of one particular trial, for example the initial configuration (i.e., to show how much it can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.

If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].

If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name

  • extra_results_keys (Optional[List[str]]) – See above, optional

  • dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional

  • one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.

Return type:

Dict[str, Any]

Returns:

Dictionary with “fig”, “axs” (for further processing). If extra_results_keys, “extra_results” entry as stated above

class syne_tune.experiments.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]

Bases: object

Parameters specifying the figure.

If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".

Parameters:
  • metric (Optional[str]) – Name of metric, mandatory

  • mode (Optional[str]) – See above, “min” or “max”. Defaults to “min” if not given

  • title (Optional[str]) – Title of plot. If subplots is used, see SubplotParameters

  • xlabel (Optional[str]) – Label for x axis. If subplots is used, this is printed below each column. Defaults to DEFAULT_XLABEL

  • ylabel (Optional[str]) – Label for y axis. If subplots is used, this is printed left of each row

  • xlim (Optional[Tuple[float, float]]) – (x_min, x_max) for x axis. If subplots is used, see SubplotParameters

  • ylim (Optional[Tuple[float, float]]) – (y_min, y_max) for y axis.

  • metric_multiplier (Optional[float]) – See above. Defaults to 1

  • convert_to_min (Optional[bool]) – See above. Defaults to True

  • tick_params (Optional[Dict[str, Any]]) – Params for ax.tick_params

  • aggregate_mode (Optional[str]) –

    How are values across seeds aggregated?

    • ”mean_and_ci”: Mean and 0.95 normal confidence interval

    • ”median_percentiles”: Median and 25, 75 percentiles

    • ”iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate

    Defaults to DEFAULT_AGGREGATE_MODE

  • dpi (Optional[int]) – Resolution of figure in DPI. Defaults to 200

  • grid (Optional[bool]) – Figure with grid? Defaults to False

  • subplots (Optional[SubplotParameters]) – If given, the figure consists of several subplots. See SubplotParameters

  • show_init_trials (Optional[ShowTrialParameters]) – See ShowTrialParameters

metric: str = None
mode: str = None
title: str = None
xlabel: str = None
ylabel: str = None
xlim: Tuple[float, float] = None
ylim: Tuple[float, float] = None
metric_multiplier: float = None
convert_to_min: bool = None
tick_params: Dict[str, Any] = None
aggregate_mode: str = None
dpi: int = None
grid: bool = None
subplots: SubplotParameters = None
show_init_trials: ShowTrialParameters = None
merge_defaults(default_params)[source]
Return type:

PlotParameters

class syne_tune.experiments.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]

Bases: object

Parameters specifying an arrangement of subplots. kwargs is mandatory.

Parameters:
  • nrows (Optional[int]) – Number of rows of subplot matrix

  • ncols (Optional[int]) – Number of columns of subplot matrix

  • titles (Optional[List[str]]) – If given, these are titles for each column in the arrangement of subplots. If title_each_figure == True, these are titles for each subplot. If titles is not given, then PlotParameters.title is printed on top of the leftmost column

  • title_each_figure (Optional[bool]) – See titles, defaults to False

  • kwargs (Optional[Dict[str, Any]]) – Extra arguments for plt.subplots, apart from “nrows” and “ncols”

  • legend_no (Optional[List[int]]) – Subplot indices where legend is to be shown. Defaults to [] (no legends shown). This is not relative to subplot_indices

  • xlims (Optional[List[int]]) – If this is given, must be a list with one entry per subfigure. In this case, the global xlim is overwritten by (0, xlims[subplot_no]). If subplot_indices is given, xlims must have the same length, and xlims[j] refers to subplot index subplot_indices[j] then

  • subplot_indices (Optional[List[int]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …

nrows: int = None
ncols: int = None
titles: List[str] = None
title_each_figure: bool = None
kwargs: Dict[str, Any] = None
legend_no: List[int] = None
xlims: List[int] = None
subplot_indices: List[int] = None
merge_defaults(default_params)[source]
Return type:

SubplotParameters

class syne_tune.experiments.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]

Bases: object

Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot in which setup_name appears. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.

Parameters:
  • setup_name (Optional[str]) – Setup from which the trial performance is taken

  • trial_id (Optional[int]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs <= trial_id are shown

  • new_setup_name (Optional[str]) – Name of the additional curve in legends

setup_name: str = None
trial_id: int = None
new_setup_name: str = None
merge_defaults(default_params)[source]
Return type:

ShowTrialParameters

class syne_tune.experiments.TrialsOfExperimentResults(experiment_names, setups, metadata_to_setup, plot_params=None, multi_fidelity_params=None, benchmark_key='benchmark', seed_key='seed', with_subdirs='*', datetime_bounds=None, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots metric results for single experiments, where the curves for different trials have different colours.

Compared to ComparativeResults, each subfigure uses data from a single experiment (one benchmark, one seed, one setup). Both benchmark and seed need to be chosen in plot(). If there are different setups, they give rise to subfigures.

If plot_params.subplots is not given, the arrangement is one row with columns corresponding to setups, and setup names as titles. Specify plot_params.subplots in order to change this arrangement (e.g., to have more than one row). Setups can be selected by using plot_params.subplots.subplot_indices. Also, if plot_params.subplots.titles is not given, we use setup names, and each subplot gets its own title (plot_params.subplots.title_each_figure is ignored).

For plot_params, we use the same PlotParameters as in ComparativeResults, but some fields are not used here (title, aggregate_mode, show_init_trials, subplots.legend_no, subplots.xlims).

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • multi_fidelity_params (Optional[MultiFidelityParameters]) – If given, we use a special variant tailored to multi-fidelity methods (see plot()).

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • seed_key (str) – Key for seed in metadata files. Defaults to “seed”.

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used

plot(benchmark_name=None, seed=0, plot_params=None, file_name=None)[source]

Creates a plot whose subfigures show metric data from single experiments. In general:

  • Each trial has its own color, which is cycled through periodically. The cycling depends on the largest rung level for the trial. This avoids neighboring curves having the same color

For single-fidelity methods (default, multi_fidelity_params not given):

  • The learning curve for a trial ends with ‘o’. If it reports only once at the end, this is all that is shown for the trial

For multi-fidelity methods:

  • Learning curves are plotted in contiguous chunks of execution. For pause and resume setups (those in multi_fidelity_params.pause_resume_setups), they are interrupted. Each chunk starts at the epoch after resume and ends at the epoch where the trial is paused

  • Values at rung levels are marked as ‘o’. If this is the furthest the trial got to, the marker is ‘D’ (diamond)

Results for different setups are plotted as subfigures, either using the setup in plot_params.subplots, or as columns of a single row.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • seed (int) – Seed number. Defaults to 0

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name
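
A brief usage sketch, reusing the placeholder names from the ComparativeResults example above:

from syne_tune.experiments import TrialsOfExperimentResults, PlotParameters

trial_plots = TrialsOfExperimentResults(
    experiment_names=("my-study",),   # placeholder experiment name prefix
    setups=["ASHA", "MOBSTER"],       # placeholder setup names
    metadata_to_setup=lambda metadata: metadata.get("algorithm"),  # assumed metadata key
    plot_params=PlotParameters(metric="validation_error", mode="min"),
)
trial_plots.plot(benchmark_name="my-benchmark", seed=0, file_name="trials.png")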

class syne_tune.experiments.MultiFidelityParameters(rung_levels, multifidelity_setups)[source]

Bases: object

Parameters configuring the multi-fidelity version of TrialsOfExperimentResults.

multifidelity_setups contains names of setups which are multi-fidelity, the remaining ones are single-fidelity. It can also be a dictionary, mapping a multi-fidelity setup name to True if this is a pause-and-resume method (these are visualized differently), False otherwise (early stopping method).

Parameters:
  • rung_levels (List[int]) – See above. Positive integers, increasing

  • multifidelity_setups (Union[List[str], Dict[str, bool]]) – See above

rung_levels: List[int]
multifidelity_setups: Union[List[str], Dict[str, bool]]
check_params(setups)[source]
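
For instance (the rung levels and setup names are placeholders; the dictionary form marks which multi-fidelity setups are pause-and-resume):

from syne_tune.experiments import MultiFidelityParameters

multi_fidelity_params = MultiFidelityParameters(
    rung_levels=[1, 3, 9, 27, 81],
    multifidelity_setups={
        "MOBSTER-promotion": True,   # pause-and-resume method (placeholder name)
        "ASHA-stopping": False,      # early stopping method (placeholder name)
    },
)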
syne_tune.experiments.hypervolume_indicator_column_generator(metrics_and_modes, reference_point=None, increment=1)[source]

Returns generator for new dataframe column containing the best hypervolume indicator as function of wall-clock time, based on the metrics in metrics_and_modes (metric names correspond to column names in the dataframe). For a metric with mode == "max", we use its negative.

This mapping is used to create the dataframe_column_generator argument of plot(). Since the current implementation is not incremental and quite slow, if you plot results for single-fidelity HPO methods, it is strongly recommended to also use one_result_per_trial=True:

results = ComparativeResults(...)
dataframe_column_generator = hypervolume_indicator_column_generator(
    metrics_and_modes
)
plot_params = PlotParameters(
    metric="hypervolume_indicator",
    mode="max",
)
results.plot(
    benchmark_name=benchmark_name,
    plot_params=plot_params,
    dataframe_column_generator=dataframe_column_generator,
    one_result_per_trial=True,
)
Parameters:
  • metrics_and_modes (List[Tuple[str, str]]) – List of (metric, mode), see above

  • reference_point (Optional[ndarray]) – Reference point for hypervolume computation. If not given, a default value is used

  • increment (int) – If > 1, the HV indicator is linearly interpolated, this is faster. Defaults to 1 (no interpolation)

Returns:

Dataframe column generator

Subpackages
syne_tune.experiments.benchmark_definitions package
Submodules
syne_tune.experiments.benchmark_definitions.common module
class syne_tune.experiments.benchmark_definitions.common.SurrogateBenchmarkDefinition(max_wallclock_time, n_workers, elapsed_time_attr, metric, mode, blackbox_name, dataset_name, max_num_evaluations=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, max_resource_attr=None, datasets=None, fidelities=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for tabulated benchmark, served by the blackbox repository.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • elapsed_time_attr (str) – Name of the metric reporting elapsed time

  • metric (Union[str, List[str]]) – Name of metric reported (or list of several)

  • mode (Union[str, List[str]]) – “max” or “min” (or list of several)

  • blackbox_name (str) – Name of blackbox, see load_blackbox()

  • dataset_name (str) – Dataset (or instance) for blackbox

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • surrogate (Optional[str]) – Default value for surrogate to be used, see make_surrogate(). Otherwise: use no surrogate

  • surrogate_kwargs (Optional[dict]) – Default value for arguments of surrogate, see make_surrogate()

  • add_surrogate_kwargs (Optional[dict]) – Arguments passed to add_surrogate(). Optional.

  • max_resource_attr (Optional[str]) – Internal name between backend and scheduler

  • datasets (Optional[List[str]]) – Used in transfer tuning

  • fidelities (Optional[List[int]]) – If given, this is a strictly increasing subset of the fidelity values provided by the surrogate, and only those will be reported

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to provide this default configuration here.

max_wallclock_time: float
n_workers: int
elapsed_time_attr: str
metric: Union[str, List[str]]
mode: Union[str, List[str]]
blackbox_name: str
dataset_name: str
max_num_evaluations: Optional[int] = None
surrogate: Optional[str] = None
surrogate_kwargs: Optional[dict] = None
add_surrogate_kwargs: Optional[dict] = None
max_resource_attr: Optional[str] = None
datasets: Optional[List[str]] = None
fidelities: Optional[List[int]] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
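
As a sketch of how such a definition might look; the concrete metric names, column names, and values below are illustrative rather than a definition shipped with Syne Tune:

from syne_tune.experiments.benchmark_definitions.common import (
    SurrogateBenchmarkDefinition,
)

my_benchmark = SurrogateBenchmarkDefinition(
    max_wallclock_time=3600,
    n_workers=4,
    elapsed_time_attr="metric_elapsed_time",  # illustrative column name
    metric="metric_valid_loss",               # illustrative metric name
    mode="min",
    blackbox_name="fcnet",                    # must be available in the blackbox repository
    dataset_name="protein_structure",
    max_resource_attr="epochs",
)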
class syne_tune.experiments.benchmark_definitions.common.RealBenchmarkDefinition(script, config_space, max_wallclock_time, n_workers, instance_type, metric, mode, max_resource_attr, framework, resource_attr=None, estimator_kwargs=None, max_num_evaluations=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for real benchmark, given by code.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • script (Path) – Absolute filename of training script

  • config_space (Dict[str, Any]) – Default value for configuration space, must include max_resource_attr

  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • instance_type (str) – Default value for instance type

  • metric (str) – Name of metric reported (or list of several)

  • mode (str) – “max” or “min” (or list of several)

  • max_resource_attr (str) – Name of config_space entry

  • framework (str) – SageMaker framework to be used for script. Additional dependencies in requirements.txt in script.parent

  • resource_attr (Optional[str]) – Name of attribute reported (required for multi-fidelity)

  • estimator_kwargs (Optional[dict]) – Additional arguments to SageMaker estimator, e.g. framework_version

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to provide this default configuration here.

script: Path
config_space: Dict[str, Any]
max_wallclock_time: float
n_workers: int
instance_type: str
metric: str
mode: str
max_resource_attr: str
framework: str
resource_attr: Optional[str] = None
estimator_kwargs: Optional[dict] = None
max_num_evaluations: Optional[int] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
syne_tune.experiments.benchmark_definitions.fcnet module
syne_tune.experiments.benchmark_definitions.fcnet.fcnet_benchmark(dataset_name)[source]
syne_tune.experiments.benchmark_definitions.lcbench module
syne_tune.experiments.benchmark_definitions.lcbench.lcbench_benchmark(dataset_name, datasets=None)[source]

The default is to use nearest neighbour regression with K=1. If you use a more sophisticated surrogate, it is recommended to also define add_surrogate_kwargs, for example:

surrogate="RandomForestRegressor",
add_surrogate_kwargs={
    "predict_curves": True,
    "fit_differences": ["time"],
},
Parameters:
  • dataset_name (str) – Value for dataset_name

  • datasets – Used for transfer learning

Return type:

SurrogateBenchmarkDefinition

Returns:

Definition of benchmark

syne_tune.experiments.benchmark_definitions.nas201 module
syne_tune.experiments.benchmark_definitions.nas201.nas201_benchmark(dataset_name)[source]
syne_tune.experiments.benchmark_definitions.yahpo module
syne_tune.experiments.launchers package
Submodules
syne_tune.experiments.launchers.hpo_main_common module
syne_tune.experiments.launchers.hpo_main_common.str2bool(v)[source]
Return type:

bool

class syne_tune.experiments.launchers.hpo_main_common.Parameter(name, type, help, default, required=False)[source]

Bases: object

name: str
type: Any
help: str
default: Any
required: bool = False
class syne_tune.experiments.launchers.hpo_main_common.ConfigDict(**kwargs)[source]

Bases: object

Dictionary with arguments for launcher scripts. Expected parameters are given as Parameter(name, type, default value).

check_if_all_paremeters_present(desired_parameters)[source]

Verify that all the parameters present in desired_parameters can be found in this ConfigDict

extra_parameters()[source]

Return all parameters beyond those required. Required parameters are the defaults and those requested via argparse.

Return type:

List[Dict[str, Any]]

expand_base_arguments(extra_base_arguments)[source]

Expand the list of base arguments for this experiment with those in extra_base_arguments

static from_argparse(extra_args=None)[source]

Build the configuration dict from command line arguments

Parameters:

extra_args (Optional[List[Dict[str, Any]]]) – Extra arguments for command line parser. Optional

Return type:

ConfigDict

static from_dict(loaded_config=None)[source]

Read the config from a dictionary

Return type:

ConfigDict

syne_tune.experiments.launchers.hpo_main_common.set_logging_level(args)[source]
syne_tune.experiments.launchers.hpo_main_common.get_metadata(seed, method, experiment_tag, benchmark_name, random_seed, max_size_data_for_model=None, benchmark=None, extra_metadata=None)[source]

Returns default value for metadata passed to Tuner.

Parameters:
  • seed (int) – Seed of repetition

  • method (str) – Name of method

  • experiment_tag (str) – Tag of experiment

  • benchmark_name (str) – Name of benchmark

  • random_seed (int) – Master random seed

  • max_size_data_for_model (Optional[int]) – Limits number of datapoints for surrogate model of BO, MOBSTER or HyperTune

  • benchmark (Union[SurrogateBenchmarkDefinition, RealBenchmarkDefinition, None]) – Optional. Take n_workers, max_wallclock_time from there

  • extra_metadata (Optional[Dict[str, Any]]) – If given, the metadata dictionary is updated with these entries at the end. Optional

Return type:

Dict[str, Any]

Returns:

Default metadata dictionary

syne_tune.experiments.launchers.hpo_main_common.extra_metadata(args, extra_args)[source]
Return type:

Dict[str, Any]

syne_tune.experiments.launchers.hpo_main_common.config_from_argparse(extra_args, backend_specific_args)[source]

Define the configuration dict based on extra arguments

Return type:

ConfigDict

syne_tune.experiments.launchers.hpo_main_local module
syne_tune.experiments.launchers.hpo_main_local.get_benchmark(configuration, benchmark_definitions, **benchmark_kwargs)[source]

If configuration.benchmark is None and benchmark_definitions maps to a single benchmark, configuration.benchmark is set to its key.

Return type:

RealBenchmarkDefinition

syne_tune.experiments.launchers.hpo_main_local.create_objects_for_tuner(configuration, methods, method, benchmark, master_random_seed, seed, verbose, extra_tuning_job_metadata=None, map_method_args=None, extra_results=None, num_gpus_per_trial=1)[source]
Return type:

Dict[str, Any]

syne_tune.experiments.launchers.hpo_main_local.start_experiment_local_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None)[source]

Runs a sequence of experiments with the local backend sequentially. The loop runs over methods selected from methods and over repetitions.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows for extra flexibility to specify arguments for particular methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. A sketch is given below.
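
The following sketch illustrates the idea; the method name and the modified entry of method_kwargs are made-up examples, standing in for whatever arguments your study needs to change:

def map_method_args(configuration, method, method_kwargs):
    # Hypothetical: change one entry of method_kwargs for a single baseline only;
    # "num_brackets" stands for whichever entry you actually want to modify
    if method == "MY-METHOD":
        method_kwargs = dict(method_kwargs, num_brackets=1)
    return method_kwargs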

Note

When this is launched remotely as entry point of a SageMaker training job (command line --launched_remotely 1), the backend is configured to write logs and checkpoints to a directory which is not synced to S3. This is different to the tuner path, which is “/opt/ml/checkpoints”, so that tuning results are synced to S3. Syncing checkpoints to S3 is not recommended (it is slow and can lead to failures, since several worker processes write to the same synced directory).

Parameters:
  • configuration (ConfigDict) – ConfigDict with parameters of the experiment. Must contain all parameters from LOCAL_BACKEND_EXTRA_PARAMETERS

  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors.

  • benchmark_definitions (Callable[..., Dict[str, RealBenchmarkDefinition]]) – Definitions of benchmarks; one is selected from command line arguments

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above, optional

  • extra_tuning_job_metadata (Optional[Dict[str, Any]]) – Metadata added to the tuner, can be used to manage results

syne_tune.experiments.launchers.hpo_main_local.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None)[source]

Runs sequence of experiments with local backend sequentially. The loop runs over methods selected from methods and repetitions, both controlled by command line arguments.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.

Parameters:
  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors

  • benchmark_definitions (Callable[..., Dict[str, RealBenchmarkDefinition]]) – Definitions of benchmarks; one is selected from command line arguments

  • extra_args (Optional[List[Dict[str, Any]]]) – Extra arguments for command line parser. Optional

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above, optional

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe
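
A minimal launcher script built on this entry point might look like the following sketch; the my_study modules providing the method constructors and benchmark definitions are hypothetical and stand in for your own study code:

# hpo_main.py -- hypothetical launcher for a study with the local backend
from syne_tune.experiments.launchers.hpo_main_local import main

from my_study.baselines import methods                             # hypothetical module
from my_study.benchmark_definitions import benchmark_definitions   # hypothetical module

if __name__ == "__main__":
    main(methods, benchmark_definitions)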

syne_tune.experiments.launchers.hpo_main_sagemaker module
syne_tune.experiments.launchers.hpo_main_sagemaker.start_experiment_sagemaker_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None)[source]

Runs experiment with SageMaker backend.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows for extra flexibility to specify arguments for particular methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline.

Parameters:
  • configuration (ConfigDict) – ConfigDict with parameters of the experiment. Must contain all parameters from SAGEMAKER_BACKEND_EXTRA_PARAMETERS

  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors.

  • benchmark_definitions (Callable[..., Dict[str, RealBenchmarkDefinition]]) – Definitions of benchmarks; one is selected from command line arguments

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above, optional

  • extra_tuning_job_metadata (Optional[Dict[str, Any]]) – Metadata added to the tuner, can be used to manage results

syne_tune.experiments.launchers.hpo_main_sagemaker.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None)[source]

Runs experiment with SageMaker backend.

Command line arguments must specify a single benchmark, method, and seed, for example --method ASHA --num_seeds 5 --start_seed 4 starts experiment with seed=4, or --method ASHA --num_seeds 1 starts experiment with seed=0. Here, ASHA must be key in methods.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.

Parameters:
  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors

  • benchmark_definitions (Callable[..., Dict[str, RealBenchmarkDefinition]]) – Definitions of benchmark; one is selected from command line arguments

  • extra_args (Optional[List[Dict[str, Any]]]) – Extra arguments for command line parser. Optional

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above. Needed if extra_args is given

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe

syne_tune.experiments.launchers.hpo_main_simulator module
syne_tune.experiments.launchers.hpo_main_simulator.is_dict_of_dict(benchmark_definitions)[source]
Return type:

bool

syne_tune.experiments.launchers.hpo_main_simulator.get_transfer_learning_evaluations(blackbox_name, test_task, datasets, n_evals=None)[source]
Parameters:
  • blackbox_name (str) – Name of blackbox

  • test_task (str) – Task where the performance would be tested; it is excluded from transfer-learning evaluations

  • datasets (Optional[List[str]]) – Subset of datasets to consider; only evaluations from those datasets are provided to transfer-learning methods. If None, all datasets are used

  • n_evals (Optional[int]) – Maximum number of evaluations to be returned

Return type:

Dict[str, Any]

syne_tune.experiments.launchers.hpo_main_simulator.start_experiment_simulated_backend(configuration, methods, benchmark_definitions, extra_results=None, map_method_args=None, extra_tuning_job_metadata=None, use_transfer_learning=False)[source]

Runs a sequence of experiments with the simulator backend sequentially. The loop runs over methods selected from methods, over repetitions, and over benchmarks selected from benchmark_definitions.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration and the method. This allows extra flexibility, for example to pass specific arguments only to selected methods. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline.

Parameters:
  • configuration (ConfigDict) – ConfigDict with parameters of the experiment. Must contain all parameters from LOCAL_LOCAL_SIMULATED_BENCHMARK_REQUIRED_PARAMETERS

  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors.

  • benchmark_definitions (Union[Dict[str, SurrogateBenchmarkDefinition], Dict[str, Dict[str, SurrogateBenchmarkDefinition]]]) – Definitions of benchmarks; one is selected from command line arguments

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above, optional

  • extra_tuning_job_metadata (Optional[Dict[str, Any]]) – Metadata added to the tuner, can be used to manage results

  • use_transfer_learning (bool) – If True, we use transfer tuning. Defaults to False

syne_tune.experiments.launchers.hpo_main_simulator.main(methods, benchmark_definitions, extra_args=None, map_method_args=None, extra_results=None, use_transfer_learning=False)[source]

Runs a sequence of experiments with the simulator backend sequentially. The loop runs over methods selected from methods, repetitions, and benchmarks selected from benchmark_definitions, with the range being controlled by command line arguments.

map_method_args can be used to modify method_kwargs for constructing MethodArguments, depending on configuration returned by parse_args() and the method. Its signature is method_kwargs = map_method_args(configuration, method, method_kwargs), where method is the name of the baseline. It is called just before the method is created.

Parameters:
  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors

  • benchmark_definitions (Union[Dict[str, SurrogateBenchmarkDefinition], Dict[str, Dict[str, SurrogateBenchmarkDefinition]]]) – Definitions of benchmarks

  • extra_args (Optional[List[Dict[str, Any]]]) – Extra arguments for command line parser. Optional

  • map_method_args (Optional[Callable[[ConfigDict, str, Dict[str, Any]], Dict[str, Any]]]) – See above. Needed if extra_args is given

  • extra_results (Optional[ExtraResultsComposer]) – If given, this is used to append extra information to the results dataframe

  • use_transfer_learning (bool) – If True, we use transfer tuning. Defaults to False
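
A minimal launcher script built on this entry point might look as follows. This is a sketch: the module my_benchmark_definitions and its benchmark_definitions dictionary are assumptions, and only one method is registered:

# hpo_main.py -- sketch of a launcher script for the simulator backend
from syne_tune.experiments.launchers.hpo_main_simulator import main
from syne_tune.experiments.default_baselines import ASHA

# Hypothetical module providing Dict[str, SurrogateBenchmarkDefinition]
from my_benchmark_definitions import benchmark_definitions

methods = {
    # Extra keyword arguments for the scheduler could be passed inside the lambda
    "ASHA": lambda method_arguments: ASHA(method_arguments),
}

if __name__ == "__main__":
    main(methods, benchmark_definitions)

Command line arguments (for example --method, --num_seeds, and the benchmark selection) then control which experiments of the study are run, as described above.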

syne_tune.experiments.launchers.launch_remote_common module
syne_tune.experiments.launchers.launch_remote_common.sagemaker_estimator_args(entry_point, experiment_tag, tuner_name, benchmark=None, sagemaker_backend=False, source_dependencies=None)[source]

Returns SageMaker estimator keyword arguments for remote tuning job.

Note: We switch off the SageMaker profiler and debugger, since neither is needed; they consume extra resources and may introduce instabilities.

Parameters:
  • entry_point (Path) – Script for running HPO experiment, used for entry_point and source_dir arguments

  • experiment_tag (str) – Tag of experiment, used to create checkpoint_s3_uri

  • tuner_name (str) – Name of tuner, used to create checkpoint_s3_uri

  • benchmark (Union[SurrogateBenchmarkDefinition, RealBenchmarkDefinition, None]) – Benchmark definition, optional

  • sagemaker_backend (bool) – Is the remote tuning job running the SageMaker backend? If not, it runs either the local or the simulator backend. Defaults to False

  • source_dependencies (Optional[List[str]]) – If given, these are additional source dependencies passed to the SageMaker estimator

Return type:

Dict[str, Any]

Returns:

Keyword arguments for SageMaker estimator

syne_tune.experiments.launchers.launch_remote_common.fit_sagemaker_estimator(backoff_wait_time, estimator, ntimes_resource_wait=100, **kwargs)[source]

Runs estimator.fit(**kwargs). If backoff_wait_time > 0 and fit fails with a ClientError of type “ResourceLimitExceeded”, we wait for backoff_wait_time seconds and try again (up to ntimes_resource_wait times).

If backoff_wait_time <= 0, the call of fit is not wrapped.

Parameters:
  • backoff_wait_time (int) – See above.

  • estimator (EstimatorBase) – SageMaker estimator to call fit for

  • ntimes_resource_wait (int) – Maximum number of retries

  • kwargs – Arguments for estimator.fit
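
The retry behavior described above roughly corresponds to the following sketch (illustrative only, not the library code):

import time
from botocore.exceptions import ClientError

def fit_with_backoff(estimator, backoff_wait_time, ntimes_resource_wait=100, **kwargs):
    # Retry estimator.fit if the account hits its SageMaker resource limit
    for _ in range(ntimes_resource_wait):
        try:
            estimator.fit(**kwargs)
            return
        except ClientError as err:
            if err.response["Error"]["Code"] != "ResourceLimitExceeded":
                raise
            time.sleep(backoff_wait_time)
    estimator.fit(**kwargs)  # last attempt; let any error propagate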

syne_tune.experiments.launchers.launch_remote_local module
syne_tune.experiments.launchers.launch_remote_sagemaker module
syne_tune.experiments.launchers.launch_remote_simulator module
syne_tune.experiments.launchers.launch_remote_simulator.get_hyperparameters(seed, method, experiment_tag, random_seed, configuration)[source]

Compose hyperparameters for SageMaker training job

Parameters:
  • seed (int) – Seed of repetition

  • method (str) – Method name

  • experiment_tag (str) – Tag of experiment

  • random_seed (int) – Master random seed

  • configuration (ConfigDict) – Configuration for the job

Return type:

Dict[str, Any]

Returns:

Dictionary of hyperparameters

syne_tune.experiments.launchers.launch_remote_simulator.launch_remote(entry_point, methods, benchmark_definitions, source_dependencies=None, extra_args=None, is_expensive_method=None)[source]

Launches sequence of SageMaker training jobs, each running an experiment with the simulator backend.

The loop runs over methods selected from methods. Different repetitions (seeds) are run sequentially in the remote job. However, if is_expensive_method(method_name) is true, we launch different remote jobs for every seed for this particular method. This is to cater for methods which are themselves expensive to run (e.g., involving Gaussian process based Bayesian optimization).

If benchmark_definitions is a single-level dictionary and no benchmark is selected on the command line, then all benchmarks are run sequentially in the remote job. However, if benchmark_definitions is two-level nested, we loop over the outer level and start separate remote jobs, each of which iterates over its inner level of benchmarks. This is useful if the number of benchmarks to iterate over is large.

Parameters:
  • entry_point (Path) – Script for running the experiment

  • methods (Dict[str, Any]) – Dictionary with method constructors; one is selected from command line arguments

  • benchmark_definitions (Union[Dict[str, SurrogateBenchmarkDefinition], Dict[str, Dict[str, SurrogateBenchmarkDefinition]]]) – Definitions of benchmarks, can be nested (see above)

  • source_dependencies (Optional[List[str]]) – If given, these are source dependencies for the SageMaker estimator, on top of Syne Tune itself

  • extra_args (Optional[List[Dict[str, Any]]]) – Extra arguments for command line parser, optional

  • is_expensive_method (Optional[Callable[[str], bool]]) – See above. The default is a predicate always returning False (no method is expensive)

syne_tune.experiments.launchers.launch_remote_simulator.launch_remote_experiments_simulator(configuration, entry_point, methods, benchmark_definitions, source_dependencies, is_expensive_method=None)[source]

Launches sequence of SageMaker training jobs, each running an experiment with the simulator backend.

The loop runs over methods selected from methods. Different repetitions (seeds) are run sequentially in the remote job. However, if is_expensive_method(method_name) is true, we launch different remote jobs for every seed for this particular method. This is to cater for methods which are themselves expensive to run (e.g., involving Gaussian process based Bayesian optimization).

If benchmark_definitions is a single-level dictionary and no benchmark is selected on the command line, then all benchmarks are run sequentially in the remote job. However, if benchmark_definitions is two-level nested, we loop over the outer level and start separate remote jobs, each of which iterates over its inner level of benchmarks. This is useful if the number of benchmarks to iterate over is large.

Parameters:
  • configuration (ConfigDict) – ConfigDict with parameters of the benchmark. Must contain all parameters from hpo_main_simulator.LOCAL_LOCAL_SIMULATED_BENCHMARK_REQUIRED_PARAMETERS

  • entry_point (Path) – Script for running the experiment

  • methods (Dict[str, Callable[[MethodArguments], TrialScheduler]]) – Dictionary with method constructors; one is selected from command line arguments

  • benchmark_definitions (Union[Dict[str, SurrogateBenchmarkDefinition], Dict[str, Dict[str, SurrogateBenchmarkDefinition]]]) – Definitions of benchmarks; one is selected from command line arguments

  • is_expensive_method (Optional[Callable[[str], bool]]) – See above. The default is a predicate always returning False (no method is expensive)

syne_tune.experiments.launchers.utils module
syne_tune.experiments.launchers.utils.filter_none(a)[source]
Return type:

dict

syne_tune.experiments.launchers.utils.sync_from_s3_command(experiment_name, s3_bucket=None)[source]
Return type:

str

syne_tune.experiments.launchers.utils.message_sync_from_s3(experiment_tag)[source]
Return type:

str

syne_tune.experiments.launchers.utils.combine_requirements_txt(synetune_requirements_file, script)[source]
Return type:

Path

syne_tune.experiments.launchers.utils.ERR_MSG(fname)[source]
Return type:

str

syne_tune.experiments.launchers.utils.find_or_create_requirements_txt(entry_point, requirements_fname=None)[source]
Return type:

Path

syne_tune.experiments.launchers.utils.get_master_random_seed(random_seed)[source]
Return type:

int

syne_tune.experiments.launchers.utils.effective_random_seed(master_random_seed, seed)[source]
Return type:

int

syne_tune.experiments.visualization package
Submodules
syne_tune.experiments.visualization.aggregate_results module
syne_tune.experiments.visualization.aggregate_results.fill_trajectory(performance_list, time_list, replace_nan=nan)[source]
Return type:

(ndarray, ndarray)

syne_tune.experiments.visualization.aggregate_results.compute_mean_and_ci(metrics_runs, time)[source]

The aggregate is the mean; error bars are an empirical estimate of the 95% confidence interval for the true mean.

Note: Error bar scale depends on number of runs n via 1 / sqrt(n).

Return type:

Dict[str, ndarray]

syne_tune.experiments.visualization.aggregate_results.compute_median_percentiles(metrics_runs, time)[source]

The aggregate is the median; error bars are the 25th and 75th percentiles.

Note: Error bar scale does not depend on number of runs.

Return type:

Dict[str, ndarray]

syne_tune.experiments.visualization.aggregate_results.compute_iqm_bootstrap(metrics_runs, time)[source]

The aggregate is the interquartile mean (IQM). Error bars are bootstrap estimate of 95% confidence interval for true IQM. This is the normal interval, based on the bootstrap variance estimate. While other bootstrap CI estimates are available, they are more expensive to compute.

Note: Error bar scale depends on number of runs n via 1 / sqrt(n).

Return type:

Dict[str, ndarray]

syne_tune.experiments.visualization.aggregate_results.aggregate_and_errors_over_time(errors, runtimes, mode='mean_and_ci')[source]
Return type:

Dict[str, ndarray]

syne_tune.experiments.visualization.multiobjective module
syne_tune.experiments.visualization.multiobjective.hypervolume_indicator_column_generator(metrics_and_modes, reference_point=None, increment=1)[source]

Returns generator for new dataframe column containing the best hypervolume indicator as function of wall-clock time, based on the metrics in metrics_and_modes (metric names correspond to column names in the dataframe). For a metric with mode == "max", we use its negative.

This mapping is used to create the dataframe_column_generator argument of plot(). Since the current implementation is not incremental and quite slow, if you plot results for single-fidelity HPO methods, it is strongly recommended to also use one_result_per_trial=True:

results = ComparativeResults(...)
dataframe_column_generator = hypervolume_indicator_column_generator(
    metrics_and_modes
)
plot_params = PlotParameters(
    metric="hypervolume_indicator",
    mode="max",
)
results.plot(
    benchmark_name=benchmark_name,
    plot_params=plot_params,
    dataframe_column_generator=dataframe_column_generator,
    one_result_per_trial=True,
)
Parameters:
  • metrics_and_modes (List[Tuple[str, str]]) – List of (metric, mode), see above

  • reference_point (Optional[ndarray]) – Reference point for hypervolume computation. If not given, a default value is used

  • increment (int) – If > 1, the HV indicator is linearly interpolated, this is faster. Defaults to 1 (no interpolation)

Returns:

Dataframe column generator

syne_tune.experiments.visualization.pareto_set module
syne_tune.experiments.visualization.pareto_set.get_pareto_optimal(costs)[source]

Find the Pareto-optimal points.

Parameters:

costs (ndarray) – (n_points, m_cost_values) array

Returns:

(n_points, 1) indicator whether each point is on the Pareto front
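
For intuition, a NumPy sketch of the dominance check underlying such a function (assuming all costs are to be minimized) could look like this:

import numpy as np

def pareto_optimal_mask(costs: np.ndarray) -> np.ndarray:
    # costs: (n_points, m_cost_values); smaller is better in every column.
    # A point is Pareto optimal if no other point is at least as good in all
    # objectives and strictly better in at least one.
    n = costs.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(costs, i, axis=0)
        dominated = np.any(
            np.all(others <= costs[i], axis=1) & np.any(others < costs[i], axis=1)
        )
        mask[i] = not dominated
    return mask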

syne_tune.experiments.visualization.pareto_set.get_pareto_set(results, metrics, mode='min')[source]

Returns a subset of the results frame consisting of all Pareto optimal points.

Parameters:
  • results (DataFrame) – Experiment results dataframe generated by the Tuner object

  • metrics (List[str]) – List of all metrics that should be optimized

  • mode (Union[str, List[str], None]) – Defines for each metric whether to maximize or minimize

Returns:

DataFrame with the Pareto set

syne_tune.experiments.visualization.plot_per_trial module
class syne_tune.experiments.visualization.plot_per_trial.MultiFidelityParameters(rung_levels, multifidelity_setups)[source]

Bases: object

Parameters configuring the multi-fidelity version of TrialsOfExperimentResults.

multifidelity_setups contains names of setups which are multi-fidelity, the remaining ones are single-fidelity. It can also be a dictionary, mapping a multi-fidelity setup name to True if this is a pause-and-resume method (these are visualized differently), False otherwise (early stopping method).

Parameters:
  • rung_levels (List[int]) – See above. Positive integers, increasing

  • multifidelity_setups (Union[List[str], Dict[str, bool]]) – See above

rung_levels: List[int]
multifidelity_setups: Union[List[str], Dict[str, bool]]
check_params(setups)[source]
class syne_tune.experiments.visualization.plot_per_trial.TrialsOfExperimentResults(experiment_names, setups, metadata_to_setup, plot_params=None, multi_fidelity_params=None, benchmark_key='benchmark', seed_key='seed', with_subdirs='*', datetime_bounds=None, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots metric results for single experiments, where the curves for different trials have different colours.

Compared to ComparativeResults, each subfigure uses data from a single experiment (one benchmark, one seed, one setup). Both benchmark and seed need to be chosen in plot(). If there are different setups, they give rise to subfigures.

If plot_params.subplots is not given, the arrangement is one row with columns corresponding to setups, and setup names as titles. Specify plot_params.subplots in order to change this arrangement (e.g., to have more than one row). Setups can be selected by using plot_params.subplots.subplot_indices. Also, if plot_params.subplots.titles is not given, we use setup names, and each subplot gets its own title (plot_params.subplots.title_each_figure is ignored).

For plot_params, we use the same PlotParameters as in ComparativeResults, but some fields are not used here (title, aggregate_mode, show_one_trial, subplots.legend_no, subplots.xlims).

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • multi_fidelity_params (Optional[MultiFidelityParameters]) – If given, we use a special variant tailored to multi-fidelity methods (see plot()).

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • seed_key (str) – Key for seed in metadata files. Defaults to “seed”.

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs is given

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used

plot(benchmark_name=None, seed=0, plot_params=None, file_name=None)[source]

Creates a plot whose subfigures show metric data from single experiments. In general:

  • Each trial has its own color, which is cycled through periodically. The cycling depends on the largest rung level for the trial; this avoids neighboring curves having the same color

For single-fidelity methods (default, multi_fidelity_params not given):

  • The learning curve for a trial ends with ‘o’. If it reports only once at the end, this is all that is shown for the trial

For multi-fidelity methods:

  • Learning curves are plotted in contiguous chunks of execution. For pause-and-resume setups (those in multi_fidelity_params.pause_resume_setups), they are interrupted. Each chunk starts at the epoch after resume and ends at the epoch where the trial is paused

  • Values at rung levels are marked as ‘o’. If this is the furthest the trial got to, the marker is ‘D’ (diamond)

Results for different setups are plotted as subfigures, either using the setup in plot_params.subplots, or as columns of a single row.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • seed (int) – Seed number. Defaults to 0

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name
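
A minimal usage sketch; the experiment name, setups, metadata key, benchmark name, and metric below are placeholders:

from syne_tune.experiments.visualization.plot_per_trial import (
    TrialsOfExperimentResults,
)
from syne_tune.experiments.visualization.plotting import PlotParameters

def metadata_to_setup(metadata):
    # Map experiment metadata to a setup name; return None to filter the experiment out
    return metadata.get("algorithm")  # "algorithm" is a placeholder metadata key

results = TrialsOfExperimentResults(
    experiment_names=("docs-1",),
    setups=("ASHA", "MOBSTER"),
    metadata_to_setup=metadata_to_setup,
    plot_params=PlotParameters(metric="metric_valid_error", mode="min"),
)
results.plot(benchmark_name="fcnet-protein", seed=0, file_name="trials.png")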

syne_tune.experiments.visualization.plotting module
class syne_tune.experiments.visualization.plotting.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]

Bases: object

Parameters specifying an arrangement of subplots. kwargs is mandatory.

Parameters:
  • nrows (Optional[int]) – Number of rows of subplot matrix

  • ncols (Optional[int]) – Number of columns of subplot matrix

  • titles (Optional[List[str]]) – If given, these are titles for each column in the arrangement of subplots. If title_each_figure == True, these are titles for each subplot. If titles is not given, then PlotParameters.title is printed on top of the leftmost column

  • title_each_figure (Optional[bool]) – See titles, defaults to False

  • kwargs (Optional[Dict[str, Any]]) – Extra arguments for plt.subplots, apart from “nrows” and “ncols”

  • legend_no (Optional[List[int]]) – Subplot indices where legend is to be shown. Defaults to [] (no legends shown). This is not relative to subplot_indices

  • xlims (Optional[List[int]]) – If this is given, must be a list with one entry per subfigure. In this case, the global xlim is overwritten by (0, xlims[subplot_no]). If subplot_indices is given, xlims must have the same length, and xlims[j] refers to subplot index subplot_indices[j] then

  • subplot_indices (Optional[List[int]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …

nrows: int = None
ncols: int = None
titles: List[str] = None
title_each_figure: bool = None
kwargs: Dict[str, Any] = None
legend_no: List[int] = None
xlims: List[int] = None
subplot_indices: List[int] = None
merge_defaults(default_params)[source]
Return type:

SubplotParameters

class syne_tune.experiments.visualization.plotting.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]

Bases: object

Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot in which setup_name appears. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.

Parameters:
  • setup_name (Optional[str]) – Setup from which the trial performance is taken

  • trial_id (Optional[int]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs <= trial_id are shown

  • new_setup_name (Optional[str]) – Name of the additional curve in legends

setup_name: str = None
trial_id: int = None
new_setup_name: str = None
merge_defaults(default_params)[source]
Return type:

ShowTrialParameters

class syne_tune.experiments.visualization.plotting.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]

Bases: object

Parameters specifying the figure.

If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".
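
As a worked example with hypothetical numbers: for an accuracy reported in percent, with mode == "max", metric_multiplier=0.01, and convert_to_min == True, the conversion is

metric_val = 93.0                # accuracy in percent (hypothetical)
plotted = 1 - 0.01 * metric_val  # = 0.07, an error rate; smaller is better in the plot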

Parameters:
  • metric (Optional[str]) – Name of metric, mandatory

  • mode (Optional[str]) – See above, “min” or “max”. Defaults to “min” if not given

  • title (Optional[str]) – Title of plot. If subplots is used, see SubplotParameters

  • xlabel (Optional[str]) – Label for x axis. If subplots is used, this is printed below each column. Defaults to DEFAULT_XLABEL

  • ylabel (Optional[str]) – Label for y axis. If subplots is used, this is printed left of each row

  • xlim (Optional[Tuple[float, float]]) – (x_min, x_max) for x axis. If subplots is used, see SubplotParameters

  • ylim (Optional[Tuple[float, float]]) – (y_min, y_max) for y axis.

  • metric_multiplier (Optional[float]) – See above. Defaults to 1

  • convert_to_min (Optional[bool]) – See above. Defaults to True

  • tick_params (Optional[Dict[str, Any]]) – Params for ax.tick_params

  • aggregate_mode (Optional[str]) –

    How are values across seeds aggregated?

    • ”mean_and_ci”: Mean and 0.95 normal confidence interval

    • ”median_percentiles”: Median and 25th, 75th percentiles

    • ”iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate

    Defaults to DEFAULT_AGGREGATE_MODE

  • dpi (Optional[int]) – Resolution of figure in DPI. Defaults to 200

  • grid (Optional[bool]) – Figure with grid? Defaults to False

  • subplots (Optional[SubplotParameters]) – If given, the figure consists of several subplots. See SubplotParameters

  • show_init_trials (Optional[ShowTrialParameters]) – See ShowTrialParameters

metric: str = None
mode: str = None
title: str = None
xlabel: str = None
ylabel: str = None
xlim: Tuple[float, float] = None
ylim: Tuple[float, float] = None
metric_multiplier: float = None
convert_to_min: bool = None
tick_params: Dict[str, Any] = None
aggregate_mode: str = None
dpi: int = None
grid: bool = None
subplots: SubplotParameters = None
show_init_trials: ShowTrialParameters = None
merge_defaults(default_params)[source]
Return type:

PlotParameters

syne_tune.experiments.visualization.plotting.group_results_dataframe(df)[source]
Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

syne_tune.experiments.visualization.plotting.filter_final_row_per_trial(grouped_dfs)[source]

We filter rows such that only one row per trial ID remains, namely the one with the largest time stamp. This makes sense for single-fidelity methods, where reports may still have been made after every epoch.

Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

syne_tune.experiments.visualization.plotting.enrich_results(grouped_dfs, column_name, dataframe_column_generator)[source]
Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

class syne_tune.experiments.visualization.plotting.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.

There is one comparative plot per benchmark (aggregation of results across benchmarks is not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as function of wall-clock time. The plot can also have several subplots, in which case results are first grouped into subplot number, then setup.

If benchmark_key is None, there is only a single benchmark, and all results are merged together.

Both setup name and subplot number (optional) can be configured by the user, as a function of the metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which either of them returns None are not used.

When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of result files. Common reasons:

  • Fewer than num_runs experiments found. Experiments failed, or files were not properly synced.

  • More than num_runs experiments found. This happens if initial experiments for the study failed, but ended up writing results. This can be fixed by either removing the result files, or by using datetime_bounds (since initial failed experiments ran first).

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • num_runs (int) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See above

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • metadata_keys (Optional[List[str]]) – See above

  • metadata_subplot_level (bool) – See above. Defaults to False

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs is given

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used

metadata_values(benchmark_name=None)[source]

The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.

Parameters:

benchmark_name (Optional[str]) – Name of benchmark

Return type:

Dict[str, Any]

Returns:

Nested dictionary with meta-data values

plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]

Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).

If plot_params.show_init_trials is given, the best metric value curve for the data from trials with ID <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance of one particular trial, for example the initial configuration (i.e., to show how much this can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.

If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].

If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name

  • extra_results_keys (Optional[List[str]]) – See above, optional

  • dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional

  • one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.

Return type:

Dict[str, Any]

Returns:

Dictionary with “fig”, “axs” (for further processing). If extra_results_keys, “extra_results” entry as stated above
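
A minimal usage sketch, with placeholder experiment names, setups, metric, and benchmark:

from syne_tune.experiments.visualization.plotting import (
    ComparativeResults,
    PlotParameters,
)

def metadata_to_setup(metadata):
    # Placeholder: derive the setup name from a metadata field; None filters the run out
    return metadata.get("algorithm")

results = ComparativeResults(
    experiment_names=("docs-1",),
    setups=("ASHA", "MOBSTER"),
    num_runs=5,
    metadata_to_setup=metadata_to_setup,
    plot_params=PlotParameters(metric="metric_valid_error", mode="min"),
)
out = results.plot(benchmark_name="fcnet-protein", file_name="comparison.png")
fig, axs = out["fig"], out["axs"]  # for further processing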

syne_tune.experiments.visualization.results_utils module
syne_tune.experiments.visualization.results_utils.create_index_for_result_files(experiment_names, metadata_to_setup, metadata_to_subplot=None, metadata_keys=None, metadata_subplot_level=False, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, seed_key=None)[source]

Helper function for ComparativeResults.

Runs over all result directories for experiments of a comparative study. For each experiment, we read the metadata file, extract the benchmark name (key benchmark_key), and use metadata_to_setup, metadata_to_subplot to map the metadata to setup name and subplot index. If either of the two returns None, the result is not used. Otherwise, we enter (result_path, setup_name, subplot_no) into the list for the benchmark name. Here, result_path is the result path for the experiment, without the experiment_path() prefix. The index returned is the dictionary from benchmark names to these lists. It allows loading results specifically for each benchmark, and we do not have to load and parse the metadata files again.

If benchmark_key is None, the returned index is a dictionary with a single element only, and the metadata files need not contain an entry for benchmark name.

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". This is an older convention, which makes it harder to sync files from S3; it is not recommended.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, a nested dictionary metadata_values is returned, where metadata_values[benchmark_name][key][setup_name] contains a list of metadata values for this benchmark, key in metadata_keys, and setup name. In this case, if metadata_subplot_level is True and metadata_to_subplot is given, metadata_values has the structure metadata_values[benchmark_name][key][setup_name][subplot_no]. This should be set if different subplots share the same setup names.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping experiment names (from experiment_names) to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If seed_key is given, the returned index is a dictionary with keys (benchmark_name, seed), where seed is the value corresponding to seed_key in the metadata dict. This mode is needed for plots focusing on a single experiment.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional

  • metadata_keys (Optional[List[str]]) – See above. Optional

  • metadata_subplot_level (bool) – See above. Defaults to False

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • seed_key (Optional[str]) – See above

Return type:

Union[Dict[str, Any], Dict[Tuple[str, int], Any]]

Returns:

Dictionary; entry “index” for index (see above); entry “setup_names” for setup names encountered; entry “metadata_values” see metadata_keys

syne_tune.experiments.visualization.results_utils.load_results_dataframe_per_benchmark(experiment_list)[source]

Helper function for ComparativeResults.

Loads time-stamped results for all experiments in experiment_list and returns them in a single dataframe with additional columns “setup_name”, “subplot_no”, “tuner_name”, whose values are constant across data for one experiment, allowing for later grouping.

Parameters:

experiment_list (List[Tuple[str, str, int]]) – Information about experiments, see create_index_for_result_files()

Return type:

Optional[DataFrame]

Returns:

Dataframe with all results combined

syne_tune.experiments.visualization.results_utils.download_result_files_from_s3(experiment_names, s3_bucket=None)[source]

Downloads result files from S3. This works only if the result objects on S3 have prefixes f"{s3_experiment_path(s3_bucket)}{ename}/", where ename is in experiment_names. Only files with names ST_METADATA_FILENAME and ST_RESULTS_DATAFRAME_FILENAME are downloaded.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • s3_bucket (Optional[str]) – If not given, the default bucket for the SageMaker session is used

Submodules
syne_tune.experiments.baselines module
class syne_tune.experiments.baselines.MethodArguments(config_space, metric, mode, random_seed, resource_attr, max_resource_attr=None, scheduler_kwargs=None, transfer_learning_evaluations=None, use_surrogates=False, fcnet_ordinal=None, num_gpus_per_trial=1)[source]

Bases: object

Arguments for creating HPO method (scheduler). We collect the union of optional arguments for all use cases here.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space (typically taken from benchmark definition)

  • metric (str) – Name of metric to optimize

  • mode (str) – Whether metric is minimized (“min”) or maximized (“max”)

  • random_seed (int) – Different for different repetitions

  • resource_attr (str) – Name of resource attribute

  • max_resource_attr (Optional[str]) – Name of max_resource_value in config_space. One of max_resource_attr, max_t is mandatory

  • scheduler_kwargs (Optional[Dict[str, Any]]) – If given, overwrites defaults of scheduler arguments

  • transfer_learning_evaluations (Optional[Dict[str, Any]]) – Support for transfer learning. Only for simulator backend experiments right now

  • use_surrogates (bool) – For simulator backend experiments, defaults to False

  • fcnet_ordinal (Optional[str]) – Only for simulator backend and fcnet blackbox. This blackbox is tabulated with finite domains, one of which has irregular spacing. If fcnet_ordinal="none", this is left as categorical, otherwise we use ordinal encoding with kind=fcnet_ordinal.

  • num_gpus_per_trial (int) – Only for local backend and GPU training. Number of GPUs assigned to a trial. This is passed here, because it needs to be written into the configuration space for some benchmarks. Defaults to 1

config_space: Dict[str, Any]
metric: str
mode: str
random_seed: int
resource_attr: str
max_resource_attr: Optional[str] = None
scheduler_kwargs: Optional[Dict[str, Any]] = None
transfer_learning_evaluations: Optional[Dict[str, Any]] = None
use_surrogates: bool = False
fcnet_ordinal: Optional[str] = None
num_gpus_per_trial: int = 1
syne_tune.experiments.baselines.default_arguments(args, extra_args)[source]
Return type:

Dict[str, Any]

syne_tune.experiments.baselines.convert_categorical_to_ordinal(config_space)[source]
Parameters:

config_space (Dict[str, Any]) – Configuration space

Return type:

Dict[str, Any]

Returns:

New configuration space where all categorical domains are replaced by ordinal ones (with kind="equal")

syne_tune.experiments.baselines.convert_categorical_to_ordinal_numeric(config_space, kind, do_convert=None)[source]

Converts categorical domains to ordinal ones, of type kind. This is not done if kind="none", or if do_convert(config_space) == False.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • kind (Optional[str]) – Type of ordinal, or "none"

  • do_convert (Optional[Callable[[Dict[str, Any]], bool]]) – See above. The default is testing for the config space of the fcnet blackbox

Return type:

Dict[str, Any]

Returns:

New configuration space

syne_tune.experiments.default_baselines module
syne_tune.experiments.default_baselines.RandomSearch(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.GridSearch(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.BayesianOptimization(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.KDE(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.BORE(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.BoTorch(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.REA(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.ConstrainedBayesianOptimization(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.ASHA(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.MOBSTER(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.HyperTune(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.BOHB(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.DyHPO(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.ASHABORE(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.SyncHyperband(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.SyncBOHB(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.DEHB(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.SyncMOBSTER(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.MOREA(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.LSOBO(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.NSGA2(method_arguments, **kwargs)[source]
syne_tune.experiments.default_baselines.MORandomScalarizationBayesOpt(method_arguments, **kwargs)[source]
syne_tune.experiments.experiment_result module
class syne_tune.experiments.experiment_result.ExperimentResult(name, results, metadata, tuner, path)[source]

Bases: object

Wraps results dataframe and provides retrieval services.

Parameters:
  • name (str) – Name of experiment

  • results (DataFrame) – Dataframe containing results of experiment

  • metadata (Dict[str, Any]) – Metadata stored along with results

  • tuner (Tuner) – Tuner object stored along with results

  • path (Path) – local path where the experiment is stored

name: str
results: DataFrame
metadata: Dict[str, Any]
tuner: Tuner
path: Path
creation_date()[source]
Returns:

Timestamp when Tuner was created

plot_hypervolume(metrics_to_plot=None, reference_point=None, figure_path=None, **plt_kwargs)[source]

Plot best hypervolume value as function of wallclock time

Parameters:
  • reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum value of each metric is used.

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot(metric_to_plot=0, figure_path=None, **plt_kwargs)[source]

Plot best metric value as function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot_trials_over_time(metric_to_plot=0, figure_path=None, figsize=None)[source]

Plot trial results as a function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • figsize – width and height of figure

metric_mode()[source]
Return type:

Union[str, List[str]]

metric_names()[source]
Return type:

List[str]

entrypoint_name()[source]
Return type:

str

best_config(metric=0)[source]

Return the best config found for the specified metric.

Parameters:

metric (Union[str, int]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0 (the first metric defined in the Scheduler)

Return type:

Dict[str, Any]

Returns:

Configuration corresponding to best metric value

syne_tune.experiments.experiment_result.download_single_experiment(tuner_name, s3_bucket=None, experiment_name=None)[source]

Downloads the results of a tuning experiment from S3

Parameters:
  • tuner_name (str) – Name of tuner to be retrieved.

  • s3_bucket (Optional[str]) – If not given, the default bucket for the SageMaker session is used

  • experiment_name (Optional[str]) – If given, this is used as first directory.

syne_tune.experiments.experiment_result.load_experiment(tuner_name, download_if_not_found=True, load_tuner=False, local_path=None, experiment_name=None)[source]

Load results from an experiment

Parameters:
  • tuner_name (str) – Name of a tuning experiment previously run

  • download_if_not_found (bool) – If True, fetch results from S3 if not found locally

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

  • local_path (Optional[str]) – Path containing the experiment to load. If not specified, ~/{SYNE_TUNE_FOLDER}/ is used.

  • experiment_name (Optional[str]) – If given, this is used as first directory.

Return type:

ExperimentResult

Returns:

Result object
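
A typical usage sketch (the tuner name is a placeholder; use the name printed when the tuning job was started):

from syne_tune.experiments import load_experiment

tuning_experiment = load_experiment("train-height-2023-03-19-22-01-57")
print(tuning_experiment.best_config())  # best configuration found
tuning_experiment.plot()                # best metric value over wallclock time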

syne_tune.experiments.experiment_result.get_metadata(path_filter=None, root=PosixPath('/home/docs/syne-tune'))[source]

Load meta-data for a number of experiments

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • root (Path) – Root path for experiment results. Default is experiment_path()

Return type:

Dict[str, dict]

Returns:

Dictionary from tuner name to metadata dict

syne_tune.experiments.experiment_result.list_experiments(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]

List experiments for which results are found

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult, optional

  • root (Path) – Root path for experiment results. Default is result of experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

List[ExperimentResult]

Returns:

List of result objects

syne_tune.experiments.experiment_result.load_experiments_df(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If passed, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult

  • root (Path) – Root path for experiment results. Default is experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

DataFrame

Returns:

Dataframe that contains all evaluations reported by tuners according to the filter given. The columns contain trial-id, hyperparameters evaluated, and metrics reported via Reporter. The following columns are collected automatically:

  • st_worker_time: time spent in the worker when the report was seen

  • time: wallclock time measured by the tuner

  • decision: decision taken by the scheduler when observing the result

  • status: status of the trial as shown to the tuner

  • config_{xx}: configuration value for the hyperparameter {xx}

  • tuner_name: name passed when instantiating the Tuner

  • entry_point_name, entry_point_path: name and path of the entry point that was tuned

syne_tune.experiments.util module
syne_tune.optimizer package
Subpackages
syne_tune.optimizer.schedulers package
class syne_tune.optimizer.schedulers.FIFOScheduler(config_space, **kwargs)[source]

Bases: TrialSchedulerWithSearcher

Scheduler which executes trials in submission order.

This is the most basic scheduler template. It can be configured for many use cases by choosing searcher along with search_options.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_FIFO. Defaults to “random” (i.e., random search)

  • search_options (Dict[str, Any], optional) – If searcher is str, these arguments are passed to searcher_factory()

  • metric (str or List[str]) – Name of metric to optimize, key in results obtained via on_trial_result. For multi-objective schedulers, this can also be a list

  • mode (str or List[str], optional) – “min” if metric is minimized, “max” if metric is maximized, defaults to “min”. This can also be a list if metric is a list

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If not given, this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified. Note: If searcher is of type BaseSearcher, points_to_evaluate must be set there.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If this is given, max_t is not needed. We recommend using max_resource_attr over max_t. If given, we use it to infer max_resource_level. It is also used to limit trial executions in promotion-based multi-fidelity schedulers (see HyperbandScheduler, type="promotion").

  • max_t (int, optional) – Value for max_resource_level. Needed for schedulers which make use of intermediate reports via on_trial_result. If this is not given, we try to infer its value from config_space (see ResourceLevelsScheduler), checking config_space["epochs"], config_space["max_t"], and config_space["max_epochs"]. If max_resource_attr is given, we use the value config_space[max_resource_attr]. But if max_t is given here, it takes precedence.

  • time_keeper (TimeKeeper, optional) – This will be used for timing here (see _elapsed_time). The time keeper has to be started at the beginning of the experiment. If not given, we use a local time keeper here, which is started with the first call to _suggest(). Can also be set after construction, with set_time_keeper(). Note: If you use SimulatorBackend, you need to pass its time_keeper here.
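
A construction sketch; the configuration space and metric name below are placeholders:

from syne_tune.config_space import randint, loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler

# Placeholder configuration space of the training script
config_space = {
    "n_units": randint(4, 1024),
    "learning_rate": loguniform(1e-6, 1e-1),
    "epochs": 100,
}

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",        # model-based search; "random" is the default
    metric="validation_error",  # must be reported by the training script
    mode="min",
    random_seed=31415927,
)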

property searcher: BaseSearcher | None
set_time_keeper(time_keeper)[source]

Assign time keeper after construction.

This is possible only if the time keeper was not assigned at construction, and the experiment has not yet started.

Parameters:

time_keeper (TimeKeeper) – Time keeper to be used

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

class syne_tune.optimizer.schedulers.HyperbandScheduler(config_space, **kwargs)[source]

Bases: FIFOScheduler, MultiFidelitySchedulerMixin, RemoveCheckpointsSchedulerMixin

Implements different variants of asynchronous Hyperband

See type for the different variants. One implementation detail: when multiple brackets are used, tasks are allocated to brackets at random, based on a distribution which can be configured.

For definitions of concepts (bracket, rung, milestone), see

Li, Jamieson, Rostamizadeh, Gonina, Hardt, Recht, Talwalkar (2018)
A System for Massively Parallel Hyperparameter Tuning

or

Tiao, Klein, Lienart, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter and Neural Architecture Search

Note

This scheduler requires both metric and resource_attr to be returned by the reporter. Here, resource values must be positive int. If resource_attr == "epoch", this should be the number of epochs done, starting from 1 (not the epoch number, starting from 0).

Rung levels and promotion quantiles

Rung levels are values of the resource attribute at which stop/go decisions are made for jobs, comparing their metric against others at the same level. These rung levels (positive, strictly increasing) can be specified via rung_levels, the largest must be <= max_t. If rung_levels is not given, they are specified by grace_period and reduction_factor or rung_increment:

  • If \(r_{min}\) is grace_period, \(\eta\) is reduction_factor, then rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\). This is the default choice for successive halving (Hyperband).

  • If rung_increment is given, but not reduction_factor, then rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), where \(\nu\) is rung_increment.

If rung_levels is given, then grace_period, reduction_factor, rung_increment are ignored. If they are given, a warning is logged.

The rung levels determine the quantiles to be used in the stop/go decisions. If rung levels are \(r_j\), define \(q_j = r_j / r_{j+1}\). \(q_j\) is the promotion quantile at rung level \(r_j\). On average, a fraction of \(q_j\) jobs can continue, the remaining ones are stopped (or paused). In the default successive halving case, we have \(q_j = 1/\eta\) for all \(j\).
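
A short worked example with hypothetical values: for grace_period=1, reduction_factor=3, and max_t=81, the default rung levels and promotion quantiles are

# Successive-halving rung levels: round(r_min * eta**j) for j = 0, 1, ...
r_min, eta, max_t = 1, 3, 81  # hypothetical values
rung_levels = []
r = r_min
while r <= max_t:
    rung_levels.append(round(r))
    r *= eta
print(rung_levels)  # [1, 3, 9, 27, 81]; promotion quantile at each rung is 1/eta = 1/3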

Cost-aware schedulers or searchers

Some schedulers (e.g., type == "cost_promotion") or searchers may depend on cost values (with key cost_attr) reported alongside the target metric. For promotion-based scheduling, a trial may pause and resume several times. The cost received in on_trial_result only counts the cost since the last resume. We maintain the sum of such costs in _cost_offset(), and append a new entry to result in on_trial_result with the total cost. If the evaluation function does not implement checkpointing, once a trial is resumed, it has to start from scratch. We detect this in on_trial_result and reset the cost offset to 0 (if the trial runs from scratch, the cost reported needs no offset added).

Note

This process requires cost_attr to be set

Pending evaluations

The searcher is notified, by searcher.register_pending calls, of (trial, resource) pairs for which evaluations are running, and a result is expected in the future. These pending evaluations can be used by the searcher in order to direct sampling elsewhere.

The choice of pending evaluations depends on searcher_data. If equal to “rungs”, pending evaluations sit only at rung levels, because observations are only used there. In the other cases, pending evaluations sit at all resource levels for which observations are obtained. For example, if a trial is at rung level \(r\) and continues towards the next rung level \(r_{next}\), then if searcher_data == "rungs", searcher.register_pending is called for \(r_{next}\) only, while for other searcher_data values, pending evaluations are registered for \(r + 1, r + 2, \dots, r_{next}\). However, if register_pending_myopic is True in this case, we instead call searcher.register_pending for \(r + 1\) when each observation is obtained (not just at a rung level). This leads to fewer pending evaluations at any one time. On the other hand, when a trial is continued at a rung level, we already know it will emit observations up to the next rung level, so it seems more “correct” to register all these pending evaluations in one go.

Additional arguments on top of parent class FIFOScheduler:

Parameters:
  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to “random” (i.e., random search)

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result, defaults to “epoch”

  • grace_period (int, optional) – Minimum resource to be used for a job. Ignored if rung_levels is given. Defaults to 1

  • reduction_factor (float, optional) – Parameter to determine rung levels. Ignored if rung_levels is given. Must be \(\ge 2\), defaults to 3

  • rung_increment (int, optional) – Parameter to determine rung levels. Ignored if rung_levels or reduction_factor are given. Must be positive

  • rung_levels (List[int], optional) – If given, prescribes the set of rung levels to be used. Must contain positive integers, strictly increasing. This information overrides grace_period, reduction_factor, rung_increment. Note that the stop/promote rule in the successive halving scheduler is set based on the ratio of successive rung levels.

  • brackets (int, optional) – Number of brackets to be used in Hyperband. Each bracket has a different grace period, all share max_t and reduction_factor. If brackets == 1 (default), we run asynchronous successive halving.

  • type (str, optional) –

    Type of Hyperband scheduler. Defaults to “stopping”. Supported values (see also subclasses of RungSystem):

    • stopping: A config eval is executed by a single task. The task is stopped at a milestone if its metric is worse than a fraction of those who reached the milestone earlier, otherwise it continues. See StoppingRungSystem.

    • promotion: A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than a fraction of others who reached the milestone. If no config can be promoted, a new one is chosen. See PromotionRungSystem.

    • cost_promotion: This is a cost-aware variant of ‘promotion’, see CostPromotionRungSystem for details. In this case, costs must be reported under the name rung_system_kwargs["cost_attr"] in results.

    • pasha: Similar to promotion type Hyperband, but it progressively expands the available resources until the ranking of configurations stabilizes.

    • rush_stopping: A variation of the stopping scheduler which requires passing rung_system_kwargs and points_to_evaluate. The first rung_system_kwargs["num_threshold_candidates"] of points_to_evaluate will enforce stricter rules on which task is continued. See RUSHStoppingRungSystem and RUSHScheduler.

    • rush_promotion: Same as rush_stopping but for promotion, see RUSHPromotionRungSystem

    • dyhpo: A model-based scheduler, which can be seen as extension of “promotion” with rung_increment rather than reduction_factor, see DynamicHPOSearcher

  • cost_attr (str, optional) – Required if the scheduler itself uses a cost metric (i.e., type="cost_promotion"), or if the searcher uses a cost metric. See also header comment.

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    • ”rungs_and_last”: Results at rung levels, plus the most recent result. This means that in between rung levels, only the most recent result is used by the searcher. This is in between

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to ‘all’.

  • register_pending_myopic (bool, optional) – See above. Used only if searcher_data != "rungs". Defaults to False

  • rung_system_per_bracket (bool, optional) – This concerns Hyperband with brackets > 1. Defaults to False. When starting a job for a new config, it is assigned a randomly sampled bracket. The larger the bracket, the larger the grace period for the config. If rung_system_per_bracket == True, we maintain separate rung level systems for each bracket, so that configs only compete with others started in the same bracket. If rung_system_per_bracket == False, we use a single rung level system, so that all configs compete with each other. In this case, the bracket of a config only determines the initial grace period, i.e. the first milestone at which it starts competing with others. This is the default. The concept of brackets in Hyperband is meant to hedge against overly aggressive filtering in successive halving, based on low fidelity criteria. In practice, successive halving (i.e., brackets = 1) often works best in the asynchronous case (as implemented here). If brackets > 1, the hedging is stronger if rung_system_per_bracket is True.

  • do_snapshots (bool, optional) – Support snapshots? If True, a snapshot of all running tasks and rung levels is returned by _promote_trial(). This snapshot is passed to searcher.get_config. Defaults to False. Note: Currently, only the stopping variant supports snapshots.

  • rung_system_kwargs (Dict[str, Any], optional) –

    Arguments passed to the rung system:

    • num_threshold_candidates: Used if type in ["rush_promotion", "rush_stopping"]. The first num_threshold_candidates in points_to_evaluate enforce stricter requirements on the continuation of training tasks. See RUSHScheduler.

    • probability_sh: Used if type == "dyhpo". In DyHPO, we typically score all paused trials against a number of new configurations, and the winner is either resumed or started (new trial). However, with the probability given here, we instead try to promote a trial as if type == "promotion". If no trial can be promoted, we fall back to the DyHPO logic. Use this to make DyHPO robust against starting too many new trials, because all paused ones score poorly (this happens especially at the beginning).

  • early_checkpoint_removal_kwargs (Dict[str, Any], optional) – If given, speculative early removal of checkpoints is done, see HyperbandRemoveCheckpointsCallback. The constructor arguments for the HyperbandRemoveCheckpointsCallback must be given here, if they cannot be inferred (key max_num_checkpoints is mandatory). This feature is used only for scheduler types which pause and resume trials.
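
A minimal construction sketch for the default stopping variant (asynchronous successive halving), assuming a training script which reports a metric named "validation_error" together with an "epoch" resource; the search space and all names are illustrative:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

# Illustrative search space; "epochs" is simply passed through to the training script
config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
    "epochs": 27,
}

scheduler = HyperbandScheduler(
    config_space,
    metric="validation_error",   # reported by the training script (assumed name)
    mode="min",
    resource_attr="epoch",
    max_t=27,                    # largest resource level
    grace_period=1,              # minimum resource (r_min)
    reduction_factor=3,          # eta
    type="stopping",             # or "promotion", "pasha", "dyhpo", ...
    searcher="random",           # model-based searchers are also supported
)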

does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

property rung_levels: List[int]

Note that all entries of rung_levels are smaller than max_t (or config_space[max_resource_attr]): rung levels are resource levels where stop/go decisions are made. In particular, if rung_levels is passed at construction with rung_levels[-1] == max_t, this last entry is stripped off.

Returns:

Rung levels (strictly increasing, positive ints)

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

property resource_attr: str
Returns:

Name of resource attribute in reported results

property max_resource_level: int
Returns:

Maximum resource level

property searcher_data: str
Returns:

Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:

  • ”rungs”: Only results at rung levels. Cheapest

  • ”all”: All results. Most expensive

  • ”rungs_and_last”: Results at rung levels, plus the most recent one. Not available for all multi-fidelity schedulers

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

callback_for_checkpoint_removal(stop_criterion)[source]
Parameters:

stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner

Return type:

Optional[TunerCallback]

Returns:

CP removal callback, or None if CP removal is not activated

class syne_tune.optimizer.schedulers.MedianStoppingRule(scheduler, resource_attr, running_average=True, metric=None, grace_time=1, grace_population=5, rank_cutoff=0.5)[source]

Bases: TrialScheduler

Applies the median stopping rule on top of an existing scheduler.

  • If the result at a time step ranks below the cutoff of other results observed at the same time step, the trial is interrupted; otherwise, the wrapped scheduler is called to make the stopping decision.

  • Suggest decisions are left to the wrapped scheduler.

  • The mode of the wrapped scheduler is used.

Reference:

Google Vizier: A Service for Black-Box Optimization.
Golovin et al. 2017.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, August 2017
Pages 1487–1495
Parameters:
  • scheduler (TrialScheduler) – Scheduler to be called for trial suggestion or when median-stopping-rule decision is to continue.

  • resource_attr (str) – Key in the reported dictionary that accounts for the resource (e.g. epoch).

  • running_average (bool) – If True, then uses the running average of observations instead of raw observations. Defaults to True

  • metric (Optional[str]) – Metric to be considered, defaults to scheduler.metric

  • grace_time (Optional[int]) – Median stopping rule is only applied for results whose resource_attr exceeds this amount. Defaults to 1

  • grace_population (int) – The median stopping rule is only applied once at least grace_population results have been observed at a resource level. Defaults to 5

  • rank_cutoff (float) – Results whose quantiles are below this level are discarded. Defaults to 0.5 (median)
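
A hedged usage sketch, wrapping a random-search FIFOScheduler; the search space, metric, and resource names are placeholders:

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler, MedianStoppingRule

config_space = {"x": uniform(0.0, 1.0), "epochs": 10}  # illustrative

base_scheduler = FIFOScheduler(
    config_space, metric="loss", mode="min", searcher="random"
)
scheduler = MedianStoppingRule(
    scheduler=base_scheduler,
    resource_attr="epoch",    # key reported alongside the metric
    grace_time=2,             # apply the rule only after 2 epochs
    grace_population=5,       # ... and once 5 results exist at that level
)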

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

grace_condition(time_step)[source]
Parameters:

time_step (float) – Value result[self.resource_attr]

Return type:

bool

Returns:

Whether the trial should continue because the grace condition holds (too little resource or too few results to apply the median stopping rule)

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

class syne_tune.optimizer.schedulers.PopulationBasedTraining(config_space, custom_explore_fn=None, **kwargs)[source]

Bases: FIFOScheduler

Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:

https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html

PBT was originally presented in the following paper:

Jaderberg et. al.
Population Based Training of Neural Networks

Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker now continues with the latest checkpoint of this new model but mutates the hyperparameters.

The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the value (probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly.

Note: While this is implemented as child of FIFOScheduler, we require searcher="random" (default), since the current code only supports a random searcher.

Additional arguments on top of parent class FIFOScheduler.

Parameters:
  • resource_attr (str) – Name of resource attribute in results obtained via on_trial_result, defaults to “time_total_s”

  • population_size (int, optional) – Size of the population, defaults to 4

  • perturbation_interval (float, optional) – Models will be considered for perturbation at this interval of resource_attr. Note that perturbation incurs checkpoint overhead, so you shouldn’t set this to be too frequent. Defaults to 60

  • quantile_fraction (float, optional) – Parameters are transferred from the top quantile_fraction fraction of trials to the bottom quantile_fraction fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25

  • resample_probability (float, optional) – The probability of resampling from the original distribution when applying _explore(). If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25

  • custom_explore_fn (function, optional) – Custom exploration function. This function is invoked as f(config) instead of the built-in perturbations, and should return config updated as needed. If this is given, resample_probability is not used
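
A hedged construction sketch; the metric, resource, and hyperparameter names are illustrative, and metric/mode are arguments of the parent FIFOScheduler:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import PopulationBasedTraining

config_space = {"learning_rate": loguniform(1e-4, 1e-1), "max_steps": 100}

scheduler = PopulationBasedTraining(
    config_space,
    metric="mean_loss",         # reported by the training script (assumed name)
    mode="min",
    resource_attr="step",       # defaults to "time_total_s"
    population_size=4,
    perturbation_interval=10,   # consider exploit/explore every 10 steps
)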

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

class syne_tune.optimizer.schedulers.RayTuneScheduler(config_space, ray_scheduler=None, ray_searcher=None, points_to_evaluate=None)[source]

Bases: TrialScheduler

Allows using Ray Tune schedulers and searchers. Any searcher/scheduler should work, except those which need access to TrialRunner (e.g., PBT); this feature is not implemented in Syne Tune.

If ray_searcher is not given (defaults to random searcher), initial configurations to evaluate can be passed in points_to_evaluate. If ray_searcher is given, this argument is ignored (needs to be passed to ray_searcher at construction). Note: Use impute_points_to_evaluate() in order to preprocess points_to_evaluate specified by the user or the benchmark.

Parameters:
  • config_space (Dict) – Configuration space

  • ray_scheduler – Ray scheduler, defaults to FIFO scheduler

  • ray_searcher (Optional[Searcher]) – Ray searcher, defaults to random search

  • points_to_evaluate (Optional[List[Dict]]) – See above

RT_FIFOScheduler

alias of FIFOScheduler

RT_Searcher

alias of Searcher

class RandomSearch(config_space, points_to_evaluate, mode)[source]

Bases: Searcher

suggest(trial_id)[source]

Queries the algorithm to retrieve the next set of parameters.

Return type:

Optional[Dict]

Arguments:

trial_id: Trial ID used for subsequent notifications.

Returns:
dict | FINISHED | None: Configuration for a trial, if possible.

If FINISHED is returned, Tune will be notified that no more suggestions/configurations will be provided. If None is returned, Tune will skip the querying of the searcher for this step.

on_trial_complete(trial_id, result=None, error=False)[source]

Notification for the completion of trial.

Typically, this method is used for notifying the underlying optimizer of the result.

Args:

trial_id: A unique string ID for the trial.

result: Dictionary of metrics for current training progress. Note that the result dict may include NaNs or may not include the optimization metric. It is up to the subclass implementation to preprocess the result to avoid breaking the optimization process. Upon errors, this may also be None.

error: True if the training process raised an error.

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

static convert_config_space(config_space)[source]

Converts config_space from our type to the one of Ray Tune.

Note: randint(lower, upper) in Ray Tune has exclusive upper, while this is inclusive for us. On the other hand, lograndint(lower, upper) has inclusive upper in Ray Tune as well.

Parameters:

config_space – Configuration space

Returns:

config_space converted into Ray Tune type
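
A short conversion sketch (requires Ray Tune to be installed; the hyperparameter names are arbitrary). The inclusive/exclusive randint boundaries mentioned above are handled by the conversion itself:

from syne_tune.config_space import lograndint, randint, uniform
from syne_tune.optimizer.schedulers import RayTuneScheduler

# Syne Tune domains: randint(16, 64) includes 64
st_space = {
    "batch_size": randint(16, 64),
    "num_layers": lograndint(1, 16),
    "dropout": uniform(0.0, 0.5),
}

# Equivalent Ray Tune search space, usable with Ray searchers/schedulers
ray_space = RayTuneScheduler.convert_config_space(st_space)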

Subpackages
syne_tune.optimizer.schedulers.multiobjective package
class syne_tune.optimizer.schedulers.multiobjective.MOASHA(config_space, metrics, mode=None, time_attr='training_iteration', multiobjective_priority=None, max_t=100, grace_period=1, reduction_factor=3, brackets=1)[source]

Bases: TrialScheduler

Implements MultiObjective Asynchronous Successive HAlving with different multiobjective sort options. References:

A multi-objective perspective on jointly tuning hardware and hyperparameters
David Salinas, Valerio Perrone, Cedric Archambeau and Olivier Cruchant
NAS workshop, ICLR2021.

and

Multi-objective multi-fidelity hyperparameter optimization with application to fairness
Robin Schmucker, Michele Donini, Valerio Perrone, Cédric Archambeau
Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metrics (List[str]) – List of metric names MOASHA optimizes over

  • mode (Union[str, List[str], None]) – One of {"min", "max"} or a list of these values (same size as metrics). Determines whether objectives are minimized or maximized. Defaults to “min”

  • time_attr (str) – A training result attr to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress, the only requirement is that the attribute should increase monotonically. Defaults to “training_iteration”

  • multiobjective_priority (Optional[MOPriority]) – The multiobjective priority that is used to sort multiobjective candidates. We support several choices such as non-dominated sort or linear scalarization, default is non-dominated sort.

  • max_t (int) – max time units per trial. Trials will be stopped after max_t time units (determined by time_attr) have passed. Defaults to 100

  • grace_period (int) – Only stop trials at least this old in time. The units are the same as the attribute named by time_attr. Defaults to 1

  • reduction_factor (float) – Used to set halving rate and amount. This is simply a unit-less scalar. Defaults to 3

  • brackets (int) – Number of brackets. Each bracket has a different grace_period and number of rung levels. Defaults to 1
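
A hedged construction sketch; the metric names, resource attribute, and search space are illustrative:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.multiobjective import MOASHA

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
    "epochs": 100,
}

scheduler = MOASHA(
    config_space,
    metrics=["validation_error", "prediction_latency"],  # illustrative names
    mode=["min", "min"],
    time_attr="epoch",        # resource attribute reported by the training script
    max_t=100,
    grace_period=1,
    reduction_factor=3,
)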

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveRegularizedEvolution(config_space, metric, mode, points_to_evaluate=None, population_size=100, sample_size=10, multiobjective_priority=None, **kwargs)[source]

Bases: RegularizedEvolution

Adapts the regularized evolution algorithm of Real et al. to the multi-objective setting. Elements in the population are scored via a multi-objective priority, which is set to non-dominated sort by default. Parents are sampled from the population based on this score.

Additional arguments on top of parent class syne_tune.optimizer.schedulers.searchers.StochasticSearcher:

Parameters:
  • mode (Union[List[str], str]) – Mode to use for the metric given, can be “min” or “max”, defaults to “min”

  • population_size (int) – Size of the population, defaults to 100

  • sample_size (int) – Size of the candidate set to obtain a parent for the mutation, defaults to 10

class syne_tune.optimizer.schedulers.multiobjective.NSGA2Searcher(config_space, metric, mode='min', points_to_evaluate=None, population_size=20, **kwargs)[source]

Bases: StochasticSearcher

This is a wrapper around the NSGA-2 [1] implementation of pymoo [2].

[1] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan.
A fast and elitist multiobjective genetic algorithm: nsga-II.
Trans. Evol. Comp, 6(2):182–197, April 2002.
[2] J. Blank and K. Deb
pymoo: Multi-Objective Optimization in Python
IEEE Access, 2020
Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metric (List[str]) –

    Name of metric passed to update(). Can be obtained from scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.

  • points_to_evaluate (Optional[List[dict]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • mode (Union[List[str], str]) – Should metric be minimized (“min”, default) or maximized (“max”). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized

  • population_size (int) – Size of the population
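
A hedged instantiation sketch (pymoo must be installed; the metric names are illustrative). In practice the searcher is driven by a scheduler, which calls get_config() rather than the user calling it directly:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.multiobjective import NSGA2Searcher

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

searcher = NSGA2Searcher(
    config_space,
    metric=["validation_error", "prediction_latency"],  # illustrative names
    mode=["min", "min"],
    population_size=20,
)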

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

class syne_tune.optimizer.schedulers.multiobjective.LinearScalarizedScheduler(config_space, metric, mode='min', scalarization_weights=None, base_scheduler_factory=None, **base_scheduler_kwargs)[source]

Bases: TrialScheduler

Scheduler with linear scalarization of multiple objectives

This method optimizes a single objective equal to a linear scalarization of the given objectives. The scalarized single objective is named: "scalarized_<metric1>_<metric2>_..._<metricN>".

Parameters:
  • base_scheduler_factory (Optional[Callable[[Any], TrialScheduler]]) – Factory method for the single-objective scheduler used on the scalarized objective. It will be initialized inside this scheduler. Defaults to FIFOScheduler.

  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • mode (Union[List[str], str]) – Modes of metrics to optimize (“min” or “max”). All modes must be the same.

  • scalarization_weights (Union[ndarray, List[float], None]) – Weights used to scalarize objectives. Defaults to an array of 1s

  • base_scheduler_kwargs – Additional arguments to base_scheduler_factory beyond config_space, metric, mode

scalarization_weights: ndarray
single_objective_metric: str
base_scheduler: TrialScheduler
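
A hedged construction sketch with two illustrative metrics and equal weights; the searcher argument is forwarded to the default FIFOScheduler base scheduler via base_scheduler_kwargs:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.multiobjective import LinearScalarizedScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_units": randint(32, 512),
}

scheduler = LinearScalarizedScheduler(
    config_space,
    metric=["validation_error", "prediction_latency"],  # illustrative names
    mode="min",
    scalarization_weights=[1.0, 1.0],
    searcher="random",   # forwarded to the FIFOScheduler base scheduler
)
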
on_trial_add(trial)[source]

Called when a new trial is added to the trial runner. See the docstring of the chosen base_scheduler for details

on_trial_error(trial)[source]

Called when a trial has failed. See the docstring of the chosen base_scheduler for details

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial. See the docstring of the chosen base_scheduler for details

Return type:

str

on_trial_complete(trial, result)[source]

Notification for the completion of trial. See the docstring of the chosen base_scheduler for details

on_trial_remove(trial)[source]

Called to remove trial. See the docstring of the chosen base_scheduler for details

trials_checkpoints_can_be_removed()[source]

See the docstring of the chosen base_scheduler for details

Return type:

List[int]

Returns:

IDs of paused trials for which checkpoints can be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names.

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”.

metadata()[source]
Return type:

Dict[str, Any]

Returns:

Metadata of the scheduler

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveMultiSurrogateSearcher(config_space, metric, estimators, mode='min', points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

Multi Objective Multi Surrogate Searcher for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization based on a scikit-learn estimator surrogate model.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • estimator – Instance of SKLearnEstimator to be used as surrogate model

  • scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.

  • num_initial_candidates (int) – Number of candidates sampled for scoring with acquisition function.

  • num_initial_random_choices (int) – Number of randomly chosen candidates before surrogate model is used.

  • allow_duplicates (bool) – If True, allow for the same candidate to be selected more than once.

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

class syne_tune.optimizer.schedulers.multiobjective.MultiObjectiveLCBRandomLinearScalarization(predictor, active_metric=None, weights_sampler=None, kappa=0.5, normalize_acquisition=True, random_seed=None)[source]

Bases: ScoringFunction

Note: This is the multi-objective random scalarization scoring function based on the work of Paria et al. [1]. This scoring function uses the Lower Confidence Bound as the acquisition for the scalarized objective \(h(\mu, \sigma) = \mu - \kappa * \sigma\)

[1] Paria, Biswajit, Kirthevasan Kandasamy and Barnabás Póczos.
A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations.
Conference on Uncertainty in Artificial Intelligence (2018).
Parameters:
  • predictor (Dict[str, Predictor]) – Surrogate predictor for statistics of predictive distribution

  • weights_sampler (Optional[Callable[[], Dict[str, float]]]) –

    Callable that can generate weights for each objective. Once called, it returns a dictionary mapping metric name to scalarization weight, as in {<name of metric 1>: <weight for metric 1>, <name of metric 2>: <weight for metric 2>, ...}

  • kappa (float) – Hyperparameter used for the LCB portion of the scoring

  • normalize_acquisition (bool) – If True, use rank-normalization on the acquisition function results before weighting.

  • random_seed (Optional[int]) – The random seed used for default weights_sampler if not provided.

score(candidates, predictor=None)[source]
Parameters:
  • candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed

  • predictor (Optional[Dict[str, Predictor]]) – Overrides default predictor

Return type:

List[float]

Returns:

List of score values, length of candidates

Submodules
syne_tune.optimizer.schedulers.multiobjective.linear_scalarizer module
class syne_tune.optimizer.schedulers.multiobjective.linear_scalarizer.LinearScalarizedScheduler(config_space, metric, mode='min', scalarization_weights=None, base_scheduler_factory=None, **base_scheduler_kwargs)[source]

Bases: TrialScheduler

Scheduler with linear scalarization of multiple objectives

This method optimizes a single objective equal to a linear scalarization of the given objectives. The scalarized single objective is named: "scalarized_<metric1>_<metric2>_..._<metricN>".

Parameters:
  • base_scheduler_factory (Optional[Callable[[Any], TrialScheduler]]) – Factory method for the single-objective scheduler used on the scalarized objective. It will be initialized inside this scheduler. Defaults to FIFOScheduler.

  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • mode (Union[List[str], str]) – Modes of metrics to optimize (“min” or “max”). All modes must be the same.

  • scalarization_weights (Union[ndarray, List[float], None]) – Weights used to scalarize objectives. Defaults to an array of 1s

  • base_scheduler_kwargs – Additional arguments to base_scheduler_factory beyond config_space, metric, mode

scalarization_weights: ndarray
single_objective_metric: str
base_scheduler: TrialScheduler
on_trial_add(trial)[source]

Called when a new trial is added to the trial runner. See the docstring of the chosen base_scheduler for details

on_trial_error(trial)[source]

Called when a trial has failed. See the docstring of the chosen base_scheduler for details

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial. See the docstring of the chosen base_scheduler for details

Return type:

str

on_trial_complete(trial, result)[source]

Notification for the completion of trial. See the docstring of the chosen base_scheduler for details

on_trial_remove(trial)[source]

Called to remove trial. See the docstring of the chosen base_scheduler for details

trials_checkpoints_can_be_removed()[source]

See the docstring of the chosen base_scheduler for details

Return type:

List[int]

Returns:

IDs of paused trials for which checkpoints can be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names.

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”.

metadata()[source]
Return type:

Dict[str, Any]

Returns:

Metadata of the scheduler

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

syne_tune.optimizer.schedulers.multiobjective.moasha module
class syne_tune.optimizer.schedulers.multiobjective.moasha.MOASHA(config_space, metrics, mode=None, time_attr='training_iteration', multiobjective_priority=None, max_t=100, grace_period=1, reduction_factor=3, brackets=1)[source]

Bases: TrialScheduler

Implements MultiObjective Asynchronous Successive HAlving with different multiobjective sort options. References:

A multi-objective perspective on jointly tuning hardware and hyperparameters
David Salinas, Valerio Perrone, Cedric Archambeau and Olivier Cruchant
NAS workshop, ICLR2021.

and

Multi-objective multi-fidelity hyperparameter optimization with application to fairness
Robin Schmucker, Michele Donini, Valerio Perrone, Cédric Archambeau
Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metrics (List[str]) – List of metric names MOASHA optimizes over

  • mode (Union[str, List[str], None]) – One of {"min", "max"} or a list of these values (same size as metrics). Determines whether objectives are minimized or maximized. Defaults to “min”

  • time_attr (str) – A training result attr to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress, the only requirement is that the attribute should increase monotonically. Defaults to “training_iteration”

  • multiobjective_priority (Optional[MOPriority]) – The multiobjective priority that is used to sort multiobjective candidates. We support several choices such as non-dominated sort or linear scalarization, default is non-dominated sort.

  • max_t (int) – max time units per trial. Trials will be stopped after max_t time units (determined by time_attr) have passed. Defaults to 100

  • grace_period (int) – Only stop trials at least this old in time. The units are the same as the attribute named by time_attr. Defaults to 1

  • reduction_factor (float) – Used to set halving rate and amount. This is simply a unit-less scalar. Defaults to 3

  • brackets (int) – Number of brackets. Each bracket has a different grace_period and number of rung levels. Defaults to 1

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

syne_tune.optimizer.schedulers.multiobjective.multi_objective_regularized_evolution module
class syne_tune.optimizer.schedulers.multiobjective.multi_objective_regularized_evolution.MultiObjectiveRegularizedEvolution(config_space, metric, mode, points_to_evaluate=None, population_size=100, sample_size=10, multiobjective_priority=None, **kwargs)[source]

Bases: RegularizedEvolution

Adapts the regularized evolution algorithm of Real et al. to the multi-objective setting. Elements in the population are scored via a multi-objective priority, which is set to non-dominated sort by default. Parents are sampled from the population based on this score.

Additional arguments on top of parent class syne_tune.optimizer.schedulers.searchers.StochasticSearcher:

Parameters:
  • mode (Union[List[str], str]) – Mode to use for the metric given, can be “min” or “max”, defaults to “min”

  • population_size (int) – Size of the population, defaults to 100

  • sample_size (int) – Size of the candidate set to obtain a parent for the mutation, defaults to 10

syne_tune.optimizer.schedulers.multiobjective.multi_surrogate_multi_objective_searcher module
class syne_tune.optimizer.schedulers.multiobjective.multi_surrogate_multi_objective_searcher.MultiObjectiveMultiSurrogateSearcher(config_space, metric, estimators, mode='min', points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

Multi Objective Multi Surrogate Searcher for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization based on a scikit-learn estimator surrogate model.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • estimator – Instance of SKLearnEstimator to be used as surrogate model

  • scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.

  • num_initial_candidates (int) – Number of candidates sampled for scoring with acquisition function.

  • num_initial_random_choices (int) – Number of randomly chosen candidates before surrogate model is used.

  • allow_duplicates (bool) – If True, allow for the same candidate to be selected more than once.

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority module
class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.MOPriority(metrics=None)[source]

Bases: object

priority_unsafe(objectives)[source]
Return type:

array

class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.LinearScalarizationPriority(metrics=None, weights=None)[source]

Bases: MOPriority

priority_unsafe(objectives)[source]
Return type:

array

class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.FixedObjectivePriority(metrics=None, dim=None)[source]

Bases: MOPriority

priority_unsafe(objectives)[source]
Return type:

array

class syne_tune.optimizer.schedulers.multiobjective.multiobjective_priority.NonDominatedPriority(metrics=None, dim=0, max_num_samples=None)[source]

Bases: MOPriority

priority_unsafe(objectives)[source]
Return type:

array

syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority module
syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.pareto_efficient(X)[source]

Evaluates for each allocation in the provided array whether it is Pareto efficient. The costs are assumed to be improved by lowering them (i.e., lower is better).

Return type:

ndarray

Parameters
X: np.ndarray [N, D]

The allocations to check where N is the number of allocations and D the number of costs per allocation.

Returns
np.ndarray [N]

A boolean array, indicating for each allocation whether it is Pareto efficient.

syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.compute_epsilon_net(X, dim=None)[source]

Outputs an order of the items in the provided array such that the items are spaced well. This means that after choosing a seed item, the next item is chosen to be the farthest from the seed item. The third item is then chosen to maximize the distance to the existing points and so on.

This algorithm is taken from “Nearest-Neighbor Searching and Metric Space Dimensions” (Clarkson, 2005, p.17).

Return type:

ndarray

Parameters
X: np.ndarray [N, D]

The items to sparsify where N is the number of items and D their dimensionality.

dim: Optional[int], default: None

The index of the dimension which to use to choose the seed item. If None, an item is chosen at random, otherwise the item with the lowest value in the specified dimension is used.

Returns
np.ndarray [N]

A list of item indices, defining a sparsified order of the items.

syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority.nondominated_sort(X, dim=None, max_items=None, flatten=True)[source]

Performs a multi-objective sort by iteratively computing the Pareto front and sparsifying the items within the Pareto front. This is a non-dominated sort leveraging an epsilon-net.

Return type:

Union[List[int], List[List[int]]]

Parameters
X: np.ndarray [N, D]

The multi-dimensional items to sort.

dim: Optional[int], default: None

The feature (metric) to prefer when ranking items within the Pareto front. If None, items are chosen randomly.

max_items: Optional[int], default: None

The maximum number of items that should be returned. When this is None, all items are sorted.

flatten: bool, default: True

Whether to flatten the resulting array.

Returns
Union[List[int], List[List[int]]]

The indices of the sorted items, either globally or within each Pareto front, depending on the value of flatten.
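
A small numpy sketch of pareto_efficient() and nondominated_sort() (the cost values are illustrative; lower is better in both dimensions):

import numpy as np

from syne_tune.optimizer.schedulers.multiobjective.non_dominated_priority import (
    nondominated_sort,
    pareto_efficient,
)

# Four allocations with two costs each (lower is better in both dimensions)
X = np.array([
    [0.1, 4.0],
    [0.2, 3.0],
    [0.3, 5.0],   # dominated by both [0.1, 4.0] and [0.2, 3.0]
    [0.4, 1.0],
])

print(pareto_efficient(X))                # [ True  True False  True]
print(nondominated_sort(X, max_items=3))  # indices of 3 well-spread, non-dominated items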

syne_tune.optimizer.schedulers.multiobjective.nsga2_searcher module
class syne_tune.optimizer.schedulers.multiobjective.nsga2_searcher.NSGA2Searcher(config_space, metric, mode='min', points_to_evaluate=None, population_size=20, **kwargs)[source]

Bases: StochasticSearcher

This is a wrapper around the NSGA-2 [1] implementation of pymoo [2].

[1] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan.
A fast and elitist multiobjective genetic algorithm: nsga-II.
Trans. Evol. Comp, 6(2):182–197, April 2002.
[2] J. Blank and K. Deb
pymoo: Multi-Objective Optimization in Python
IEEE Access, 2020
Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metric (List[str]) –

    Name of metric passed to update(). Can be obtained from scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.

  • points_to_evaluate (Optional[List[dict]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • mode (Union[List[str], str]) – Should metric be minimized (“min”, default) or maximized (“max”). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized

  • population_size (int) – Size of the population

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

syne_tune.optimizer.schedulers.multiobjective.random_scalarization module
class syne_tune.optimizer.schedulers.multiobjective.random_scalarization.MultiObjectiveLCBRandomLinearScalarization(predictor, active_metric=None, weights_sampler=None, kappa=0.5, normalize_acquisition=True, random_seed=None)[source]

Bases: ScoringFunction

Note: This is the multi-objective random scalarization scoring function based on the work of Paria et al. [1]. This scoring function uses the Lower Confidence Bound as the acquisition for the scalarized objective \(h(\mu, \sigma) = \mu - \kappa * \sigma\)

[1] Paria, Biswajit, Kirthevasan Kandasamy and Barnabás Póczos.
A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations.
Conference on Uncertainty in Artificial Intelligence (2018).
Parameters:
  • predictor (Dict[str, Predictor]) – Surrogate predictor for statistics of predictive distribution

  • weights_sampler (Optional[Callable[[], Dict[str, float]]]) –

    Callable that can generate weights for each objective. Once called, it returns a dictionary mapping metric name to scalarization weight, as in {<name of metric 1>: <weight for metric 1>, <name of metric 2>: <weight for metric 2>, ...}

  • kappa (float) – Hyperparameter used for the LCB portion of the scoring

  • normalize_acquisition (bool) – If True, use rank-normalization on the acquisition function results before weighting.

  • random_seed (Optional[int]) – The random seed used for default weights_sampler if not provided.

score(candidates, predictor=None)[source]
Parameters:
  • candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed

  • predictor (Optional[Dict[str, Predictor]]) – Overrides default predictor

Return type:

List[float]

Returns:

List of score values, length of candidates

syne_tune.optimizer.schedulers.multiobjective.utils module
syne_tune.optimizer.schedulers.multiobjective.utils.default_reference_point(results_array)[source]
Return type:

ndarray

syne_tune.optimizer.schedulers.multiobjective.utils.hypervolume(results_array, reference_point=None)[source]

Compute the hypervolume of all results with respect to a reference point

Parameters:
  • results_array (ndarray) – Array with experiment results ordered by time with shape (npoints, ndimensions).

  • reference_point (Optional[ndarray]) – Reference point for the hypervolume calculation. If None, the maximum values of each dimension of results_array are used.

Return type:

float

Returns:

Hypervolume indicator

syne_tune.optimizer.schedulers.multiobjective.utils.linear_interpolate(hv_indicator, indices)[source]
syne_tune.optimizer.schedulers.multiobjective.utils.hypervolume_cumulative(results_array, reference_point=None, increment=1)[source]

Compute the cumulative hypervolume of all results with respect to a reference point. Returns an array of hypervolumes over an increasing range of points: return_array[idx] = hypervolume(results_array[0 : (idx + 1)]).

The current implementation is rather slow, since the hypervolume indicator is not computed incrementally. A workaround for now is to use increment > 1, in which case the hypervolume indicator is only computed for every increment-th entry and linearly interpolated in between.

Parameters:
  • results_array (ndarray) – Array with experiment results ordered by time with shape (npoints, ndimensions).

  • reference_point (Optional[ndarray]) – Reference point for the hypervolume calculation. If None, the maximum values of each dimension of results_array are used.

Return type:

ndarray

Returns:

Cumulative hypervolume array, shape (npoints,)
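
A small numpy sketch of both utilities (the result values are illustrative, and both objectives are treated as minimized):

import numpy as np

from syne_tune.optimizer.schedulers.multiobjective.utils import (
    hypervolume,
    hypervolume_cumulative,
)

# Two-objective results, ordered by time
results = np.array([
    [1.0, 4.0],
    [2.0, 2.0],
    [4.0, 1.0],
])
reference_point = np.array([5.0, 5.0])

hv = hypervolume(results, reference_point)
hv_curve = hypervolume_cumulative(results, reference_point)
print(hv)         # dominated volume w.r.t. the reference point
print(hv_curve)   # hv_curve[-1] equals hv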

syne_tune.optimizer.schedulers.neuralbands package
class syne_tune.optimizer.schedulers.neuralbands.NeuralbandScheduler(config_space, gamma=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]

Bases: NeuralbandSchedulerBase

NeuralBand is a neural-bandit based HPO algorithm for the multi-fidelity setting. It uses a budget-aware neural network together with a feedback perturbation to efficiently explore the input space across fidelities. NeuralBand uses a novel configuration selection criterion to actively choose the configuration in each trial and incrementally exploits the knowledge of every past trial.

Parameters:
  • config_space (Dict) –

  • gamma (float) – Control aggressiveness of configuration selection criterion

  • nu (float) – Control aggressiveness of perturbing feedback for exploration

  • step_size (int) – The network is trained once every step_size trials

  • max_while_loop (int) – Maximal number of times we can draw a configuration from the configuration space

  • kwargs

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

Submodules
syne_tune.optimizer.schedulers.neuralbands.networks module
class syne_tune.optimizer.schedulers.neuralbands.networks.NetworkExploitation(dim, hidden_size=100)[source]

Bases: Module

forward(x1, b)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class syne_tune.optimizer.schedulers.neuralbands.networks.Exploitation(dim, lr=0.001, hidden=100)[source]

Bases: object

add_data(x, reward)[source]
predict(x)[source]
Return type:

Tensor

train()[source]
Return type:

float

syne_tune.optimizer.schedulers.neuralbands.neuralband module
syne_tune.optimizer.schedulers.neuralbands.neuralband.is_continue_decision(trial_decision)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.neuralbands.neuralband.NeuralbandScheduler(config_space, gamma=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]

Bases: NeuralbandSchedulerBase

NeuralBand is a neural-bandit based HPO algorithm for the multi-fidelity setting. It uses a budget-aware neural network together with a feedback perturbation to efficiently explore the input space across fidelities. NeuralBand uses a novel configuration selection criterion to actively choose the configuration in each trial and incrementally exploits the knowledge of every past trial.

Parameters:
  • config_space (Dict) –

  • gamma (float) – Control aggressiveness of configuration selection criterion

  • nu (float) – Control aggressiveness of perturbing feedback for exploration

  • step_size (int) – The network is trained once every step_size trials

  • max_while_loop (int) – Maximal number of times we can draw a configuration from the configuration space

  • kwargs

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement module
syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.is_continue_decision(trial_decision)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandSchedulerBase(config_space, step_size, max_while_loop, **kwargs)[source]

Bases: HyperbandScheduler

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandEGreedyScheduler(config_space, epsilon=0.1, step_size=30, max_while_loop=100, **kwargs)[source]

Bases: NeuralbandSchedulerBase

class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandTSScheduler(config_space, lamdba=0.1, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]

Bases: NeuralbandSchedulerBase

class syne_tune.optimizer.schedulers.neuralbands.neuralband_supplement.NeuralbandUCBScheduler(config_space, lamdba=0.01, nu=0.01, step_size=30, max_while_loop=100, **kwargs)[source]

Bases: NeuralbandSchedulerBase

syne_tune.optimizer.schedulers.searchers package
class syne_tune.optimizer.schedulers.searchers.BaseSearcher(config_space, metric, points_to_evaluate=None, mode='min')[source]

Bases: object

Base class of searchers, which are components of schedulers responsible for implementing get_config().

Note

This is an abstract base class. In order to implement a new searcher, try to start from StochasticAndFilterDuplicatesSearcher or StochasticSearcher, which implement generally useful properties.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metric (Union[List[str], str]) –

    Name of metric passed to update(). Can be obtained from scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • mode (Union[List[str], str]) – Should metric be minimized (“min”, default) or maximized (“max”). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that model registers (config, milestone) as pending.

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

property debug_log: DebugLogPrinter | None

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)
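
To make this contract concrete, here is a minimal sketch of a custom searcher built on StochasticSearcher (as suggested in the note above). The internal helper _next_initial_config(), the _update() signature, and Domain.sample(random_state=...) are assumptions inferred from the descriptions in this section; verify them against the source before relying on this sketch.

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers import StochasticSearcher

class ToyRandomSearcher(StochasticSearcher):
    # Toy searcher: suggests random configs and tracks the best metric value seen

    def __init__(self, config_space, metric, points_to_evaluate=None, **kwargs):
        super().__init__(
            config_space, metric=metric, points_to_evaluate=points_to_evaluate, **kwargs
        )
        self._metric_name = metric
        self._best_value = None

    def get_config(self, **kwargs):
        # Serve points_to_evaluate first (internal helper, see get_config() above)
        config = self._next_initial_config()
        if config is None:
            # Draw each hyperparameter from its domain, using the shared random_state
            config = {
                name: domain.sample(random_state=self.random_state)
                if hasattr(domain, "sample")
                else domain
                for name, domain in self.config_space.items()
            }
        return config

    def _update(self, trial_id, config, result):
        # Called via on_trial_result(..., update=True); signature assumed
        value = result[self._metric_name]
        if self._best_value is None or value < self._best_value:
            self._best_value = value

    # get_state() / clone_from_state() would also be needed for checkpointing,
    # as described above; they are omitted in this sketch.

searcher = ToyRandomSearcher(
    config_space={"learning_rate": uniform(1e-3, 1e-1), "batch_size": randint(16, 256)},
    metric="validation_error",
)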

syne_tune.optimizer.schedulers.searchers.impute_points_to_evaluate(points_to_evaluate, config_space)[source]

Transforms points_to_evaluate argument to BaseSearcher. Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. Also, duplicate entries are filtered out. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

Parameters:
  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Argument to BaseSearcher

  • config_space (Dict[str, Any]) – Configuration space

Return type:

List[Dict[str, Any]]

Returns:

List of fully specified initial configs
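
For illustration, a short sketch of calling this helper directly; the concrete imputed values depend on the hyperparameter domains, so the comment below is indicative only.

from syne_tune.config_space import choice, randint, uniform
from syne_tune.optimizer.schedulers.searchers import impute_points_to_evaluate

config_space = {
    "learning_rate": uniform(0.0, 1.0),
    "num_layers": randint(1, 5),
    "activation": choice(["relu", "tanh"]),
}

# Partially specified configs: missing values are filled in by the midpoint
# heuristic, and duplicate entries are filtered out
points = impute_points_to_evaluate([{"learning_rate": 0.1}, {}], config_space)
# -> two fully specified configs, e.g. the first one keeps learning_rate=0.1
#    and receives midpoint values for num_layers and activation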

class syne_tune.optimizer.schedulers.searchers.StochasticSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BaseSearcher

Base class of searchers which use random decisions. Creates the random_state member, which must be used for all random draws.

Making proper use of this interface allows us to run experiments with control of random seeds, e.g. for paired comparisons or integration testing.

Additional arguments on top of parent class BaseSearcher:

Parameters:
  • random_seed_generator (RandomSeedGenerator, optional) – If given, random seed is drawn from there

  • random_seed (int, optional) – Used if random_seed_generator is not given.

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

set_random_state(random_state)[source]
class syne_tune.optimizer.schedulers.searchers.StochasticAndFilterDuplicatesSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]

Bases: StochasticSearcher

Base class for searchers with the following properties:

  • Random decisions use common random_state

  • Maintains an exclusion list to filter out duplicates in get_config() if allow_duplicates == False. If this is True, duplicates are not filtered, and the exclusion list is used only to avoid configurations of failed trials.

  • If restrict_configurations is given, this is a list of configurations, and the searcher only suggests configurations from there. If allow_duplicates == False, entries are popped off this list once suggested. points_to_evaluate is filtered to only contain entries in this set.

In order to make use of these features:

  • Reject configurations in get_config() if should_not_suggest() returns True. If the configuration is drawn at random, use _get_random_config(), which incorporates this filtering

  • Implement _get_config() instead of get_config(). The latter adds the new config to the exclusion list if allow_duplicates == False

Note: Not all searchers which filter duplicates make use of this class.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • allow_duplicates (Optional[bool]) – See above. Defaults to False

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – See above, optional

property allow_duplicates: bool
should_not_suggest(config)[source]
Parameters:

config (Dict[str, Any]) – Configuration

Return type:

bool

Returns:

True if get_config() should not suggest this configuration

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will attend, so that model registers (config, milestone) as pending.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

syne_tune.optimizer.schedulers.searchers.extract_random_seed(**kwargs)[source]
Return type:

(int, Dict[str, Any])

class syne_tune.optimizer.schedulers.searchers.RandomSearcher(config_space, metric, points_to_evaluate=None, debug_log=False, resource_attr=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Searcher which randomly samples configurations to try next.

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • debug_log (Union[bool, DebugLogPrinter]) – If True, debug log printing is activated. Logs which configs are chosen when, and which metric values are obtained. Defaults to False

  • resource_attr (Optional[str]) – Optional. Key in result passed to _update() for resource value (for multi-fidelity schedulers)

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

property debug_log

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)
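
In normal use this searcher is created by passing searcher="random" to a scheduler; the standalone sketch below is only for illustration. It assumes that configure_scheduler() is not strictly required for this toy loop, and uses the on_trial_result() signature documented above.

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers import RandomSearcher

config_space = {
    "learning_rate": uniform(1e-3, 1e-1),
    "batch_size": randint(16, 256),
}

searcher = RandomSearcher(
    config_space,
    metric="validation_error",
    points_to_evaluate=[{"learning_rate": 0.01, "batch_size": 64}],
)

config = searcher.get_config()  # first call returns the initial config given above
# ... evaluate the config, then report the observation back:
searcher.on_trial_result(
    trial_id="0",
    config=config,
    result={"validation_error": 0.23},
    update=True,
)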

class syne_tune.optimizer.schedulers.searchers.GridSearcher(config_space, metric, points_to_evaluate=None, num_samples=None, shuffle_config=True, allow_duplicates=False, **kwargs)[source]

Bases: StochasticSearcher

Searcher that samples configurations from an equally spaced grid over config_space.

It first evaluates configurations defined in points_to_evaluate and then continues with the remaining points from the grid.

Additional arguments on top of parent class StochasticSearcher.

Parameters:
  • num_samples (Optional[Dict[str, int]]) – Dictionary, optional. Number of samples per hyperparameter. This is required for hyperparameters of type float, optional for integer hyperparameters, and will be ignored for other types (categorical, scalar). If left unspecified, a default value of DEFAULT_NSAMPLE will be used for float parameters, and the smallest of DEFAULT_NSAMPLE and integer range will be used for integer parameters.

  • shuffle_config (bool) – If True (default), the order of configurations suggested after those specified in points_to_evaluate is shuffled. Otherwise, the order will follow the Cartesian product of the configurations.

  • allow_duplicates (bool) – If True, get_config() may return the same configuration more than once. Defaults to False

get_config(**kwargs)[source]

Select the next configuration from the grid.

This is done without replacement, so previously returned configs are not suggested again.

Return type:

Optional[dict]

Returns:

A new configuration that is valid, or None if no new config can be suggested. The returned configuration is a dictionary that maps hyperparameter names to their values.

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object
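
A sketch of enumerating a small grid; points_to_evaluate=[] is passed so that only grid points are suggested, and the expected total in the final comment assumes the default grid construction described above.

from syne_tune.config_space import choice, uniform
from syne_tune.optimizer.schedulers.searchers import GridSearcher

config_space = {
    "learning_rate": uniform(1e-4, 1e-1),
    "optimizer": choice(["adam", "sgd"]),
}

searcher = GridSearcher(
    config_space,
    metric="validation_error",
    num_samples={"learning_rate": 5},  # required for float hyperparameters
    shuffle_config=False,              # follow the Cartesian product order
    points_to_evaluate=[],             # no extra initial configs
)

configs = []
config = searcher.get_config()
while config is not None:  # None signals that the grid is exhausted
    configs.append(config)
    config = searcher.get_config()
# Expected: 5 learning rates x 2 optimizer choices = 10 configurations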

syne_tune.optimizer.schedulers.searchers.searcher_factory(searcher_name, **kwargs)[source]

Factory for searcher objects

This function creates searcher objects from string argument name and additional kwargs. It is typically called in the constructor of a scheduler (see FIFOScheduler), which provides most of the required kwargs.

Parameters:
  • searcher_name (str) – Value of searcher argument to scheduler (see FIFOScheduler)

  • kwargs – Argument to BaseSearcher constructor

Return type:

BaseSearcher

Returns:

New searcher object
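
The factory is normally invoked inside a scheduler constructor, but it can also be called directly, as sketched below. Whether a particular searcher needs additional scheduler information passed through kwargs is not covered in this section, so treat the call as illustrative.

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers.searchers import searcher_factory

# Same string that would be passed as searcher=... to a scheduler
searcher = searcher_factory(
    "random",
    config_space={"learning_rate": uniform(1e-3, 1e-1)},
    metric="validation_error",
    mode="min",
)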

class syne_tune.optimizer.schedulers.searchers.ModelBasedSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: StochasticSearcher

Common code for surrogate model based searchers

If num_initial_random_choices > 0, initial configurations are drawn using an internal RandomSearcher object, which is created in _assign_random_searcher(). This internal random searcher shares random_state with the searcher here. This ensures that if ModelBasedSearcher and RandomSearcher objects are created with the same random_seed and points_to_evaluate argument, initial configurations are identical until _get_config_modelbased() kicks in.

Note that this works because random_state is only used in the internal random searcher until _get_config_modelbased() is first called.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

get_config(**kwargs)[source]

Runs Bayesian optimization in order to suggest the next config to evaluate.

Return type:

Optional[Dict[str, Any]]

Returns:

Next config to evaluate at

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

set_params(param_dict)[source]
get_state()[source]

The mutable state consists of the GP model parameters, the TuningJobState, and the skip_optimization predicate (which can have a mutable state). We assume that skip_optimization can be pickled.

Note that we do not have to store the state of _random_searcher, since this internal searcher shares its random_state with the searcher here.

Return type:

Dict[str, Any]

property debug_log

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

class syne_tune.optimizer.schedulers.searchers.BayesianOptimizationSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: ModelBasedSearcher

Common Code for searchers using Bayesian optimization

We implement Bayesian optimization, based on a model factory which parameterizes the state transformer. This implementation works with any type of surrogate model and acquisition function, which are compatible with each other.

The following happens in get_config():

  • For the first num_init_random calls, a config is drawn at random (after points_to_evaluate, which are included in the num_init_random initial ones). Afterwards, Bayesian optimization is used, unless there are no finished evaluations yet (a surrogate model cannot be used with no data at all)

  • For BO, model hyperparameters are refit first. This step can be skipped (see opt_skip_* parameters).

  • Next, the BO decision is made based on BayesianOptimizationAlgorithm. This involves sampling num_init_candidates configs at random, ranking them with a scoring function (initial_scoring), and finally running local optimization starting from the top-scoring config.

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

register_pending(trial_id, config=None, milestone=None)[source]

Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.

get_batch_configs(batch_size, num_init_candidates_for_batch=None, **kwargs)[source]

Asks for a batch of batch_size configurations to be suggested. This is roughly equivalent to calling get_config batch_size times, marking the suggested configs as pending in the state (but the state is not modified here). This means the batch is chosen sequentially, at about the cost of calling get_config batch_size times.

If num_init_candidates_for_batch is given, it is used instead of num_init_candidates for the selection of all but the first config in the batch. In order to speed up batch selection, choose num_init_candidates_for_batch smaller than num_init_candidates.

If less than batch_size configs are returned, the search space has been exhausted.

Note: Batch selection does not support debug_log right now: make sure to switch this off when creating scheduler and searcher.

Return type:

List[Dict[str, Union[int, float, str]]]

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed
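
A sketch of requesting a batch of suggestions from a Bayesian optimization searcher attached to a scheduler. The FIFOScheduler constructor arguments and the scheduler.searcher attribute are assumptions beyond this class reference; debug_log is switched off as required by the note above.

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space={"learning_rate": uniform(1e-3, 1e-1)},
    searcher="bayesopt",
    search_options={"debug_log": False},  # batch selection does not support debug_log
    metric="validation_error",
    mode="min",
)

# Assumed: the scheduler exposes its searcher instance
batch = scheduler.searcher.get_batch_configs(batch_size=4)
# The list may contain fewer than 4 configs if the search space is exhausted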

class syne_tune.optimizer.schedulers.searchers.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

Gaussian process Bayesian optimization for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.

It is not recommended to create GPFIFOSearcher objects directly. Rather, create FIFOScheduler objects with searcher="bayesopt" and pass arguments here via search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

Most of the implementation is generic in BayesianOptimizationSearcher.

Note: If metric values are to be maximized (mode="max" in scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.

Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing (i.e., target values are drawn from the current posterior, and acquisition functions are averaged over this sample, see num_fantasy_samples).

The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.

Note that the full logic of construction based on arguments is given in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • clone_from_state (bool) – Internal argument, do not use

  • resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If resource_attr and cost_attr are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data.

  • cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • num_init_random (int, optional) – Number of initial get_config() calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults to DEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS

  • num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in get_config. Defaults to DEFAULT_NUM_INITIAL_CANDIDATES

  • num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations), defaults to 20

  • no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False

  • input_warping (bool, optional) – If True, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See also Warping. Coordinates which belong to categorical hyperparameters are not warped. Defaults to False.

  • boxcox_transform (bool, optional) – If True, target values are transformed before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults to False.

  • gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are SUPPORTED_BASE_MODELS. Defaults to “matern52-ard” (Matern 5/2 with automatic relevance determination).

  • acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are SUPPORTED_ACQUISITION_FUNCTIONS. Defaults to “ei” (expected improvement acquisition function).

  • acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters, they can be passed here. If none are given, default values are used.

  • initial_scoring (str, optional) –

    Scoring function to rank initial candidates (local optimization of EI is started from top scorer):

    • ”thompson_indep”: Independent Thompson sampling; randomized score, which can increase exploration

    • ”acq_func”: score is the same (EI) acquisition function which is used for local optimization afterwards

    Defaults to DEFAULT_INITIAL_SCORING

  • skip_local_optimization (bool, optional) – If True, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case, initial_scoring="acq_func" makes most sense, otherwise the acquisition function will not be used. Defaults to False

  • opt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2

  • opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50

  • opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If True, each fitting is started from the previous optimum. Not recommended in general. Defaults to False

  • opt_verbose (bool, optional) – Parameter for surrogate model fitting. If True, lots of output. Defaults to False

  • max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled; see SubsampleSingleFidelityStateConverter for details. This downsampling is repeated every time the model is fit. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None to not apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • max_size_top_fraction (float, optional) – Only used if max_size_data_for_model is set. This fraction of the down sampled set is filled with the top entries in the full set, the remaining ones are sampled at random from the full set, see SubsampleSingleFidelityStateConverter for details. Defaults to 0.25.

  • opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as number of observations below this threshold. Defaults to 150

  • opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If >1, and the number of observations is above opt_skip_init_length, fitting is done only on every opt_skip_period-th call, and skipped otherwise. Defaults to 1 (no skipping)

  • allow_duplicates (bool, optional) – If True, get_config() may return the same configuration more than once. Defaults to False

  • restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This needs skip_local_optimization == True. If allow_duplicates == False, entries are popped off this list once suggested.

  • map_reward (str or MapReward, optional) –

    In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization is minimizing the criterion. map_reward converts from metric to internal criterion:

    • ”minus_x”: criterion = -metric

    • ”<a>_minus_x”: criterion = <a> - metric. For example “1_minus_x” maps accuracy to zero-one error

    From a technical standpoint, it does not matter what is chosen here, because the criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before being fitted with a Gaussian process. Defaults to “1_minus_x”

  • transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end, config_space must contain a categorical parameter of name transfer_learning_task_attr, whose range are all task IDs. Also, transfer_learning_active_task must denote the active task, and transfer_learning_active_config_space is used as active_config_space argument in HyperparameterRanges. This allows us to use a narrower search space for the active task than for the union of all tasks (config_space must be that), which is needed if some configurations of non-active tasks lie outside of the ranges in active_config_space. One of the implications is that filter_observed_data() is selecting configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task.

  • transfer_learning_active_task (str, optional) – See transfer_learning_task_attr.

  • transfer_learning_active_config_space (Dict[str, Any], optional) – See transfer_learning_task_attr. If not given, config_space is the search space for the active task as well. This active config space need not contain the transfer_learning_task_attr parameter. In fact, this parameter is set to a categorical with transfer_learning_active_task as single value, so that new configs are chosen for the active task only.

  • transfer_learning_model (str, optional) –

    See transfer_learning_task_attr. Specifies the surrogate model to be used for transfer learning:

    • ”matern52_product”: Kernel is product of Matern 5/2 (not ARD) on transfer_learning_task_attr and Matern 5/2 (ARD) on the rest. Assumes that data from same task are more closely related than data from different tasks

    • ”matern52_same”: Kernel is Matern 5/2 (ARD) on the rest of the variables, transfer_learning_task_attr is ignored. Assumes that data from all tasks can be merged together

    Defaults to “matern52_product”

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
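
Following the recommendation above, the searcher is not instantiated directly; instead a FIFOScheduler is created with searcher="bayesopt", and the arguments documented here are passed through search_options. The scheduler constructor arguments (metric, mode) are assumptions beyond this class reference.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_units": randint(32, 512),
}

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    search_options={
        # GPFIFOSearcher arguments documented above
        "num_init_random": 10,
        "acq_function": "ei",
        "input_warping": True,
        "max_size_data_for_model": 500,
    },
    metric="validation_error",
    mode="min",
)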

class syne_tune.optimizer.schedulers.searchers.GPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: GPFIFOSearcher

Gaussian process Bayesian optimization for asynchronous Hyperband scheduler.

This searcher must be used with a scheduler of type MultiFidelitySchedulerMixin. It provides a novel combination of Bayesian optimization, based on a Gaussian process surrogate model, with Hyperband scheduling. In particular, observations across resource levels are modelled jointly.

It is not recommended to create GPMultiFidelitySearcher objects directly, but rather to create HyperbandScheduler objects with searcher="bayesopt" and pass arguments here via search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

Most of GPFIFOSearcher comments apply here as well. In multi-fidelity HPO, we optimize a function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the configuration, \(r\) the resource (or time) attribute. The latter must be a positive integer. In most applications, resource_attr == "epoch", and the resource is the number of epochs already trained.

If model == "gp_multitask" (default), we model the function \(f(\mathbf{x}, r)\) jointly over all resource levels \(r\) at which it is observed (but see searcher_data in HyperbandScheduler). The kernel and mean function of our surrogate model are over \((\mathbf{x}, r)\). The surrogate model is selected by gp_resource_kernel. More details about the supported kernels is in:

Tiao, Klein, Lienart, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter and Neural Architecture Search

The acquisition function (EI), which is optimized in get_config(), is obtained by fixing the resource level \(r\) to a value which is determined depending on the current state. If resource_acq == "bohb", \(r\) is the largest value <= max_t, where we have seen \(\ge \mathrm{dimension}(\mathbf{x})\) metric values. If resource_acq == "first", \(r\) is the first milestone which config \(\mathbf{x}\) would reach when started.

Additional arguments on top of parent class GPFIFOSearcher.

Parameters:
  • model (str, optional) –

    Selects surrogate model (learning curve model) to be used. Choices are:

    • ”gp_multitask” (default): GP multi-task surrogate model

    • ”gp_independent”: Independent GPs for each rung level, sharing an ARD kernel

    • ”gp_issm”: Gaussian-additive model of ISSM type

    • ”gp_expdecay”: Gaussian-additive model of exponential decay type (as in Freeze Thaw Bayesian Optimization)

  • gp_resource_kernel (str, optional) – Only relevant for model == "gp_multitask". Surrogate model over criterion function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the config, \(r\) the resource. Note that \(\mathbf{x}\) is encoded to be a vector with entries in [0, 1], and \(r\) is linearly mapped to [0, 1], while the criterion data is normalized to mean 0, variance 1. The reference above provides details on the models supported here. For the exponential decay kernel, the base kernel over \(\mathbf{x}\) is Matern 5/2 ARD. See SUPPORTED_RESOURCE_MODELS for supported choices. Defaults to “exp-decay-sum”

  • resource_acq (str, optional) – Only relevant for model in {"gp_multitask", "gp_independent"}. Determines how the EI acquisition function is used. Values: “bohb”, “first”. Defaults to “bohb”

  • max_size_data_for_model (int, optional) –

    If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled; see SubsampleMultiFidelityStateConverter for details. This downsampling is repeated every time the model is fit, which ensures that the most recent data is taken into account. The opt_skip_* predicates are evaluated before the state is downsampled.

    Pass None not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • opt_skip_num_max_resource (bool, optional) – Parameter for surrogate model fitting, skip predicate. If True, and number of observations above opt_skip_init_length, fitting is done only when there is a new datapoint at r = max_t, and skipped otherwise. Defaults to False

  • issm_gamma_one (bool, optional) – Only relevant for model == "gp_issm". If True, the gamma parameter of the ISSM is fixed to 1, otherwise it is optimized over. Defaults to False

  • expdecay_normalize_inputs (bool, optional) – Only relevant for model == "gp_expdecay". If True, resource values r are normalized to [0, 1] as input to the exponential decay surrogate model. Defaults to False

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

register_pending(trial_id, config=None, milestone=None)[source]

Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
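
As recommended above, this searcher is created indirectly by passing searcher="bayesopt" to HyperbandScheduler and selecting the surrogate model through search_options. The scheduler arguments (metric, mode, resource_attr, max_resource_attr) are assumptions beyond this class reference.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_units": randint(32, 512),
    "epochs": 27,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="bayesopt",
    search_options={
        # GPMultiFidelitySearcher arguments documented above
        "model": "gp_independent",
        "resource_acq": "bohb",
    },
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)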

Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt package
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common module
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.dictionarize_objective(x)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.TrialEvaluations(trial_id, metrics)[source]

Bases: object

For each fixed k, metrics[k] is either a single value or a dict. The latter is used, for example, for multi-fidelity schedulers, where metrics[k][str(r)] is the value at resource level r.

trial_id: str
metrics: Dict[str, Union[float, Dict[str, float]]]
num_cases(metric_name='target', resource=None)[source]

Counts the number of observations for metric metric_name.

Parameters:
  • metric_name (str) – Defaults to INTERNAL_METRIC_NAME

  • resource (Optional[int]) – In the multi-fidelity case, we only count observations at this resource level

Return type:

int

Returns:

Number of observations
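
A small sketch of a multi-fidelity record and how num_cases() counts observations; the expected values in the comments follow from the description above.

from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common import (
    TrialEvaluations,
)

# Metric "target" observed at resource levels 1 and 3 for a single trial
evals = TrialEvaluations(
    trial_id="0",
    metrics={"target": {"1": 0.42, "3": 0.31}},
)

evals.num_cases(metric_name="target")              # expected: 2 (all resource levels)
evals.num_cases(metric_name="target", resource=3)  # expected: 1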

class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.PendingEvaluation(trial_id, resource=None)[source]

Bases: object

Maintains information for pending candidates (i.e., candidates which have been queried for labeling, but target feedback has not yet been obtained).

The minimum information is the candidate which has been queried.

property trial_id: str
property resource: int | None
class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.common.FantasizedPendingEvaluation(trial_id, fantasies, resource=None)[source]

Bases: PendingEvaluation

Here, latent target values are integrated out by Monte Carlo samples, also called “fantasies”.

property fantasies
syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext module
class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext.ExtendedConfiguration(hp_ranges, resource_attr_key, resource_attr_range)[source]

Bases: object

This class facilitates handling extended configs, which consist of a normal config and a resource attribute.

The config space hp_ranges is extended by an additional resource attribute. Note that this is not a hyperparameter we optimize over, but it is under the control of the scheduler. Its allowed range is [1, resource_attr_range[1]], which can be larger than [resource_attr_range[0], resource_attr_range[1]]. This is because extended configs with resource values outside of resource_attr_range may arise (for example, in the early stopping context, we may receive data from epoch < resource_attr_range[0]).

get(config, resource)[source]

Create extended config with resource added.

Parameters:
  • config (Dict[str, Union[int, float, str]]) – Non-extended config

  • resource (int) – Resource value

Return type:

Dict[str, Union[int, float, str]]

Returns:

Extended config

remove_resource(config_ext)[source]

Strips away resource attribute and returns normal config. If config_ext is already normal, it is returned as is.

Parameters:

config_ext (Dict[str, Union[int, float, str]]) – Extended config

Return type:

Dict[str, Union[int, float, str]]

Returns:

config_ext without resource attribute

split(config_ext)[source]

Split extended config into normal config and resource value.

Parameters:

config_ext (Dict[str, Union[int, float, str]]) – Extended config

Return type:

(Dict[str, Union[int, float, str]], int)

Returns:

(config, resource_value)

get_resource(config_ext)[source]
Parameters:

config_ext (Dict[str, Union[int, float, str]]) – Extended config

Return type:

int

Returns:

Value of resource attribute
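
A sketch of converting between normal and extended configs. The helper make_hyperparameter_ranges and its import path are assumptions not documented in this section; the ExtendedConfiguration calls follow the signatures above.

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.config_ext import (
    ExtendedConfiguration,
)
# Assumed helper for building hp_ranges from a config space (not documented here)
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

hp_ranges = make_hyperparameter_ranges({
    "learning_rate": uniform(1e-3, 1e-1),
    "num_units": randint(32, 512),
})
config_ext_handler = ExtendedConfiguration(
    hp_ranges, resource_attr_key="epoch", resource_attr_range=(1, 27)
)

config = {"learning_rate": 0.01, "num_units": 128}
config_ext = config_ext_handler.get(config, resource=3)   # adds "epoch": 3
config2, resource = config_ext_handler.split(config_ext)  # back to (config, 3)
assert config_ext_handler.remove_resource(config_ext) == config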

syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.tuning_job_state module
class syne_tune.optimizer.schedulers.searchers.bayesopt.datatypes.tuning_job_state.TuningJobState(hp_ranges, config_for_trial, trials_evaluations, failed_trials=None, pending_evaluations=None)[source]

Bases: object

Collects all data determining the state of a tuning experiment. Trials are indexed by trial_id. The configurations associated with trials are listed in config_for_trial. trials_evaluations contains observations, failed_trials lists trials for which evaluations have failed, pending_evaluations lists trials for which observations are pending.

trials_evaluations may store values for different metrics in each record, and each such value may be a dict (see TrialEvaluations). For example, for multi-fidelity schedulers, trials_evaluations[i].metrics[k][str(r)] is the value for metric k and trial trials_evaluations[i].trial_id observed at resource level r.

static empty_state(hp_ranges)[source]
Return type:

TuningJobState

metrics_for_trial(trial_id, config=None)[source]

Helper for inserting new entry into trials_evaluations. If trial_id is already contained there, the corresponding eval.metrics is returned. Otherwise, a new entry new_eval is appended to trials_evaluations and its new_eval.metrics is returned (empty dict). In the latter case, config needs to be passed, because it may not yet feature in config_for_trial.

Return type:

Union[float, Dict[str, float]]

num_observed_cases(metric_name='target', resource=None)[source]

Counts the number of observations for metric metric_name.

Parameters:
  • metric_name (str) – Defaults to INTERNAL_METRIC_NAME

  • resource (Optional[int]) – In the multi-fidelity case, we only count observations at this resource level

Return type:

int

Returns:

Number of observations

observed_data_for_metric(metric_name='target', resource_attr_name=None)[source]

Extracts datapoints from trials_evaluations for particular metric metric_name, in the form of a list of configs and a list of metric values. If metric_name is a dict-valued metric, the dict keys must be resource values, and the returned configs are extended. Here, the name of the resource attribute can be passed in resource_attr_name (if not given, it can be obtained from hp_ranges if this is extended).

Note: Implements the default behaviour, namely to return extended configs for dict-valued metrics, which also require hp_ranges to be extended. This is not correct for some specific multi-fidelity surrogate models, which should access the data directly.

Parameters:
  • metric_name (str) –

  • resource_attr_name (Optional[str]) –

Return type:

(List[Dict[str, Union[int, float, str]]], List[float])

Returns:

configs, metric_values

is_pending(trial_id, resource=None)[source]
Return type:

bool

is_labeled(trial_id, metric_name='target', resource=None)[source]

Checks whether trial_id has observed data under metric_name. If resource is given, the observation must be at that resource level.

Return type:

bool

append_pending(trial_id, config=None, resource=None)[source]

Appends new pending evaluation. If the trial has not been registered here, config must be given. Otherwise, it is ignored.

remove_pending(trial_id, resource=None)[source]
Return type:

bool

pending_configurations(resource_attr_name=None)[source]

Returns list of configurations corresponding to pending evaluations. If the latter have resource values, the configs are extended.

Return type:

List[Dict[str, Union[int, float, str]]]

all_configurations(filter_observed_data=None)[source]

Returns list of configurations for all trials represented here, whether observed, pending, or failed. If filter_observed_data is given, the configurations for observed trials are filtered with this predicate.

Parameters:

filter_observed_data (Optional[Callable[[Dict[str, Union[int, float, str]]], bool]]) – See above, optional

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

List of all configurations

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd package
exception syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.SliceException[source]

Bases: Exception

Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneDistributionArguments(num_samples, num_brackets=None)[source]

Bases: object

num_samples: int
num_brackets: Optional[int] = None
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneModelMixin(hypertune_distribution_args)[source]

Bases: object

hypertune_bracket_distribution()[source]

Distribution [w_k] of support size num_supp_brackets, where num_supp_brackets <= args.num_brackets (the latter defaults to the maximum number of brackets if not given) is chosen as large as possible such that the first num_supp_brackets brackets each have >= 6 labeled datapoints.

If num_supp_brackets < args.num_brackets, the distribution must be extended to full size before being used to sample the next bracket.

Return type:

Optional[ndarray]

hypertune_ensemble_distribution()[source]

Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.

Return type:

Optional[Dict[int, float]]

fit_distributions(poster_state, data, resource_attr_range, random_state)[source]
Return type:

Optional[Dict[int, float]]

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneIndependentGPModel(kernel, mean_factory, resource_attr_range, hypertune_distribution_args, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: IndependentGPPerResourceModel, HyperTuneModelMixin

Variant of IndependentGPPerResourceModel which implements additional features of the Hyper-Tune algorithm, see

Yang Li et al
Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale
VLDB 2022

Our implementation differs from the Hyper-Tune paper in a number of ways. Most importantly, their method requires a sufficient number of observed points at the starting rung of the highest bracket. In contrast, we estimate ranking loss values already when the starting rung of the 2nd bracket is sufficiently occupied. This allows us to estimate the head of the distribution only (over all brackets with sufficiently occupied starting rungs), and we use the default distribution over the remaining tail. Eventually, we do the same as Hyper-Tune, but we move away from the default distribution earlier on.

Parameters:

hypertune_distribution_args (HyperTuneDistributionArguments) – Parameters for Hyper-Tune

create_likelihood(rung_levels)[source]

Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.

Note: last entry of rung_levels must be max_t, even if this is not a rung level in Hyperband.

Parameters:

rung_levels (List[int]) – Rung levels

hypertune_ensemble_distribution()[source]

Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.

Return type:

Optional[Dict[int, float]]

fit(data)[source]

Fit the model parameters by optimizing the marginal likelihood, and set posterior states.

We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.

Parameters:

data (Dict[str, Any]) – Input data

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.gp_model.HyperTuneJointGPModel(kernel, resource_attr_range, hypertune_distribution_args, mean=None, target_transform=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: GaussianProcessRegression, HyperTuneModelMixin

Variant of GaussianProcessRegression which implements additional features of the Hyper-Tune algorithm, see

Yang Li et al
Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale
VLDB 2022

See also HyperTuneIndependentGPModel

Parameters:

hypertune_distribution_args (HyperTuneDistributionArguments) – Parameters for Hyper-Tune

create_likelihood(rung_levels)[source]

Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.

Note: last entry of rung_levels must be max_t, even if this is not a rung level in Hyperband.

Parameters:

rung_levels (List[int]) – Rung levels

hypertune_ensemble_distribution()[source]

Distribution [theta_r] which is used to create an ensemble predictive distribution fed into the acquisition function. The ensemble distribution runs over all sufficiently supported rung levels, independent of the number of brackets.

Return type:

Optional[Dict[int, float]]

fit(data)[source]

Fit the model parameters by optimizing the marginal likelihood, and set posterior states.

We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.

Parameters:

data (Dict[str, Any]) – Input data

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood.HyperTuneIndependentGPMarginalLikelihood(kernel, mean, resource_attr_range, ensemble_distribution, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, encoding_type=None, **kwargs)[source]

Bases: IndependentGPPerResourceMarginalLikelihood

Variant of IndependentGPPerResourceMarginalLikelihood, which has the same internal model and marginal likelihood function, but whose posterior state is of type HyperTuneIndependentGPPosteriorState, which uses an ensemble predictive distribution whose weighting distribution has to be passed at construction.

property ensemble_distribution: Dict[int, float]
set_ensemble_distribution(distribution)[source]
get_posterior_state(data)[source]
Return type:

PosteriorState

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.likelihood.HyperTuneJointGPMarginalLikelihood(kernel, mean, resource_attr_range, ensemble_distribution, target_transform=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]

Bases: GaussianProcessMarginalLikelihood

Variant of GaussianProcessMarginalLikelihood, which has the same internal model and marginal likelihood function, but whose posterior state is of type HyperTuneJointGPPosteriorState, which uses an ensemble predictive distribution whose weighting distribution has to be passed at construction.

property ensemble_distribution: Dict[int, float]
set_ensemble_distribution(distribution)[source]
get_posterior_state(data)[source]
Return type:

PosteriorState

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.assert_ensemble_distribution(distribution, all_resources)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.HyperTuneIndependentGPPosteriorState(features, targets, kernel, mean, covariance_scale, noise_variance, resource_attr_range, ensemble_distribution, debug_log=False)[source]

Bases: IndependentGPPerResourcePosteriorState

Special case of IndependentGPPerResourcePosteriorState, where methods predict, backward_gradient, sample_marginals, sample_joint are over a random function \(f_{MF}(x)\), obtained by first sampling the resource level \(r \sim [\theta_r]\), then use \(f_{MF}(x) = f(x, r)\). Predictive means and variances are:

\(\mu_{MF}(x) = \sum_r \theta_r \mu(x, r), \qquad \sigma_{MF}^2(x) = \sum_r \theta_r^2 \sigma^2(x, r)\)

Here, \([\theta_k]\) is a distribution over a subset of rung levels.

Note: This posterior state is unusual, in that sample_marginals, sample_joint have to work both with (a) extended inputs (x, r) and (b) non-extended inputs x. For case (a), they behave like the superclass methods, this is needed to support fitting model parameters, for example for drawing fantasy samples. For case (b), they use the ensemble distribution detailed above, which supports optimizing the acquisition function.

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

If test_features are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.

Return type:

ndarray

sample_joint(test_features, num_samples=1, random_state=None)[source]

If test_features are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.

Return type:

ndarray

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this method has to be called for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.posterior_state.HyperTuneJointGPPosteriorState(features, targets, mean, kernel, noise_variance, resource_attr_range, ensemble_distribution, debug_log=False)[source]

Bases: GaussProcPosteriorState

Special case of GaussProcPosteriorState, where the methods predict, backward_gradient, sample_marginals, sample_joint are over a random function \(f_{MF}(x)\), obtained by first sampling the resource level \(r \sim [\theta_r]\) and then setting \(f_{MF}(x) = f(x, r)\). Predictive means and variances are:

\[\mu_{MF}(x) = \sum_r \theta_r \mu(x, r), \qquad \sigma_{MF}^2(x) = \sum_r \theta_r^2 \sigma^2(x, r)\]

Here, \([\theta_r]\) is a distribution over a subset of rung levels.

Note: This posterior state is unusual, in that sample_marginals, sample_joint have to work both with (a) extended inputs (x, r) and (b) non-extended inputs x. For case (a), they behave like the superclass methods, this is needed to support fitting model parameters, for example for drawing fantasy samples. For case (b), they use the ensemble distribution detailed above, which supports optimizing the acquisition function.

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

If test_features are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.

Return type:

ndarray

sample_joint(test_features, num_samples=1, random_state=None)[source]

If test_features are non-extended features (no resource attribute), we sample from the ensemble predictive distribution. Otherwise, we call the superclass method.

Return type:

ndarray

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.

The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.ExtendFeaturesByResourceMixin(resource, resource_attr_range)[source]

Bases: object

extend_features_by_resource(test_features)[source]
Return type:

ndarray

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.PosteriorStateClampedResource(poster_state_extended, resource, resource_attr_range)[source]

Bases: PosteriorStateWithSampleJoint, ExtendFeaturesByResourceMixin

Converts posterior state of PosteriorStateWithSampleJoint over extended inputs into posterior state over non-extended inputs, where the resource attribute is clamped to a fixed value.

Parameters:
  • poster_state_extended (PosteriorStateWithSampleJoint) – Posterior state over extended inputs

  • resource (int) – Value to which resource attribute is clamped

  • resource_attr_range (Tuple[int, int]) – \((r_{min}, r_{max})\)

property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]
Return type:

ndarray

Returns:

Negative log marginal likelihood

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Marginal samples, (num_test, num_samples)

sample_joint(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Joint samples, (num_test, num_samples)

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.MeanFunctionClampedResource(mean_extended, resource, resource_attr_range, **kwargs)[source]

Bases: MeanFunction, ExtendFeaturesByResourceMixin

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.KernelFunctionClampedResource(kernel_extended, resource, resource_attr_range, **kwargs)[source]

Bases: KernelFunction, ExtendFeaturesByResourceMixin

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.GaussProcPosteriorStateAndRungLevels(poster_state, rung_levels)[source]

Bases: PosteriorStateWithSampleJoint

property poster_state: GaussProcPosteriorState
property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]
Return type:

ndarray

Returns:

Negative log marginal likelihood

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Marginal samples, (num_test, num_samples)

sample_joint(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Joint samples, (num_test, num_samples)

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

property rung_levels: List[int]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.hypertune_ranking_losses(poster_state, data, num_samples, resource_attr_range, random_state=None)[source]

Samples ranking loss values as defined in the Hyper-Tune paper. We return a matrix of size (num_supp_levels, num_samples), where num_supp_levels <= len(poster_state.rung_levels) is the number of rung levels supported by at least 6 labeled datapoints.

The loss values depend on the cases in data at the level poster_state.rung_levels[num_supp_levels - 1]. We must have num_supp_levels >= 2.

Loss values at this highest supported level are estimated by cross-validation (so the data at this level is split into training and test, where the training part is used to obtain the posterior state). The number of CV folds is <= 5, and such that each fold has at least two points.

Parameters:
Return type:

ndarray

Returns:

See above
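
As a generic illustration of the kind of ranking loss described above (a pairwise disagreement count between sampled function values and observed metric values; see the Hyper-Tune paper and the source for the exact definition used here), consider the following numpy sketch:

    import numpy as np

    def pairwise_ranking_loss(f_samples, y_true):
        # f_samples: (num_samples, n) function values sampled from the posterior
        # y_true: (n,) observed metric values at the highest supported rung level
        # Returns one loss per sample: the fraction of datapoint pairs whose
        # ordering under the sample disagrees with the observed ordering.
        # Assumes n >= 2.
        num_samples, n = f_samples.shape
        losses = np.zeros(num_samples)
        num_pairs = n * (n - 1) / 2
        for i in range(n):
            for j in range(i + 1, n):
                disagree = (f_samples[:, i] < f_samples[:, j]) != (y_true[i] < y_true[j])
                losses += disagree
        return losses / num_pairs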

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.hypertune.utils.number_supported_levels_and_data_highest_level(rung_levels, data, resource_attr_range)[source]

Finds num_supp_levels as maximum such that rung levels up to there have >= 6 labeled datapoints. The set of labeled datapoints of level num_supp_levels - 1 is returned as well.

If num_supp_levels == 1, no level except for the lowest has >= 6 datapoints. In this case, data_max_resource returned is invalid.

Parameters:
  • rung_levels (List[int]) – Rung levels

  • data (Dict[str, Any]) – Training data (only data at highest level is used)

  • resource_attr_range (Tuple[int, int]) – (r_min, r_max)

Return type:

Tuple[int, dict]

Returns:

(num_supp_levels, data_max_resource)

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.gpind_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.gpind_model.IndependentGPPerResourceModel(kernel, mean_factory, resource_attr_range, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: GaussianProcessOptimizeModel

GP multi-fidelity model over f(x, r), where for each r, f(x, r) is represented by an independent GP. The different processes share the same kernel, but have their own mean functions mu_r and covariance scales c_r.

The likelihood object is not created at construction, but only with create_likelihood. This is because we need to know the rung levels of the Hyperband scheduler.

Parameters:
  • kernel (KernelFunction) – Kernel function without covariance scale, shared by models for all resources r

  • mean_factory (Callable[[int], MeanFunction]) – Factory function for mean functions mu_r(x)

  • resource_attr_range (Tuple[int, int]) – (r_min, r_max)

  • target_transform (Optional[ScalarTargetTransform]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Shared across different \(r\). Defaults to the identity

  • separate_noise_variances (bool) – Separate noise variance for each r? Otherwise, noise variance is shared

  • initial_noise_variance (Optional[float]) – Initial value for noise variance parameter

  • initial_covariance_scale (Optional[float]) – Initial value for covariance scale parameters c_r

  • optimization_config (Optional[OptimizationConfig]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.

  • random_seed – Random seed to be used (optional)

  • fit_reset_params (bool) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values

create_likelihood(rung_levels)[source]

Delayed creation of likelihood, needs to know rung levels of Hyperband scheduler.

Note: last entry of rung_levels must be max_t, even if this is not a rung level in Hyperband.

Parameters:

rung_levels (List[int]) – Rung levels
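
A minimal construction sketch based on the signatures above. The import of ScalarMeanFunction is an assumption (a scalar mean function from the gpautograd mean module); adjust it and the rung levels to your setup:

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.gpind_model import (
        IndependentGPPerResourceModel,
    )
    # Assumption: a scalar mean function is available from this module
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import ScalarMeanFunction

    kernel = Matern52(dimension=3, ARD=True)  # shared across resource levels
    model = IndependentGPPerResourceModel(
        kernel=kernel,
        mean_factory=lambda resource: ScalarMeanFunction(),  # one mu_r per rung level
        resource_attr_range=(1, 81),
    )
    # Likelihood creation is delayed until the rung levels of the Hyperband
    # scheduler are known; the last entry must be max_t (here 81)
    model.create_likelihood(rung_levels=[1, 3, 9, 27, 81])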

property likelihood: MarginalLikelihood
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.likelihood module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.likelihood.IndependentGPPerResourceMarginalLikelihood(kernel, mean, resource_attr_range, target_transform=None, separate_noise_variances=False, initial_noise_variance=None, initial_covariance_scale=None, encoding_type=None, **kwargs)[source]

Bases: MarginalLikelihood

Marginal likelihood for GP multi-fidelity model over \(f(x, r)\), where for each \(r\), \(f(x, r)\) is represented by an independent GP. The different processes share the same kernel, but have their own mean functions \(\mu_r\) and covariance scales \(c_r\). If separate_noise_variances == True, each process has its own noise variance, otherwise all processes share the same noise variance.

Parameters:
  • kernel (KernelFunction) – Shared kernel function \(k(x, x')\)

  • mean (Dict[int, MeanFunction]) – Maps rung level \(r\) to mean function \(\mu_r\)

  • resource_attr_range (Tuple[int, int]) – \((r_{min}, r_{max})\)

  • target_transform (Optional[ScalarTargetTransform]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Shared across different \(r\). Defaults to the identity

  • separate_noise_variances (bool) – See above. Defaults to False

  • initial_noise_variance (Optional[float]) – Initial value for noise variance(s). Defaults to INITIAL_NOISE_VARIANCE

  • initial_covariance_scale (Optional[float]) – Initial value for covariance scales. Defaults to INITIAL_COVARIANCE_SCALE

  • encoding_type (Optional[str]) – Encoding used for noise variance(s) and covariance scales. Defaults to DEFAULT_ENCODING

get_posterior_state(data)[source]
Return type:

PosteriorState

forward(data)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]

Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings

Return type:

List[tuple]

get_noise_variance(as_ndarray=False)[source]
get_covariance_scale(resource, as_ndarray=False)[source]
set_covariance_scale(resource, val)[source]
get_params()[source]
Return type:

Dict[str, ndarray]

set_params(param_dict)[source]
on_fit_start(data)[source]

Called at the beginning of fit.

Parameters:

data (Dict[str, Any]) – Argument passed to fit

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.posterior_state module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.independent.posterior_state.IndependentGPPerResourcePosteriorState(features, targets, kernel, mean, covariance_scale, noise_variance, resource_attr_range, debug_log=False)[source]

Bases: PosteriorStateWithSampleJoint

Posterior state for model over f(x, r), where for a fixed set of resource levels r, each f(x, r) is represented by an independent Gaussian process. These processes share a common covariance function k(x, x'), but can have their own mean functions mu_r and covariance scales c_r. They can also have their own noise variances, or the noise variance is shared.

Attention: Predictions can only be done at (x, r) where r has at least one training datapoint. This is because a posterior state cannot represent the prior.

state(resource)[source]
Return type:

GaussProcPosteriorState

property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]
Return type:

ndarray

Returns:

Negative log marginal likelihood

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

Different to predict, entries in test_features may have resources not covered by data in posterior state. For such entries, we return the prior mean. We do not sample from the prior. If sample_marginals is used to draw fantasy values, this corresponds to the Kriging believer heuristic.

Return type:

ndarray

sample_joint(test_features, num_samples=1, random_state=None)[source]

Different to predict, entries in test_features may have resources not covered by data in posterior state. For such entries, we return the prior mean. We do not sample from the prior. If sample_joint is used to draw fantasy values, this corresponds to the Kriging believer heuristic.

Return type:

ndarray

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, have to call this for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel package
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.KernelFunction(dimension, **kwargs)[source]

Bases: MeanFunction

Base class of kernel (or covariance) function \(k(x, x')\)

Parameters:

dimension (int) – Dimensionality of input points after encoding into ndarray

property dimension
Returns:

Dimension d of input points

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.Matern52(dimension, ARD=False, encoding_type='logarithm', has_covariance_scale=True, **kwargs)[source]

Bases: KernelFunction

Block that is responsible for the computation of Matern 5/2 kernel.

if ARD == False, inverse_bandwidths is equal to a scalar broadcast to the d components (with d = dimension, i.e., the number of features in X).

Arguments on top of base class SquaredDistance:

Parameters:

has_covariance_scale (bool) – Kernel has covariance scale parameter? Defaults to True

property ARD: bool
forward(X1, X2)[source]

Computes Matern 5/2 kernel matrix

Parameters:
  • X1 – input matrix, shape (n1,d)

  • X2 – input matrix, shape (n2,d)
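
For reference, a standalone numpy sketch of the standard Matern 5/2 form with inverse bandwidths and covariance scale as described above (illustrative only; the exact internal parameterization and encoding may differ):

    import numpy as np

    def matern52_matrix(X1, X2, inverse_bandwidths, covariance_scale=1.0):
        # Weighted distances d[i, j] = sqrt(sum_k ib_k^2 * (X1[i, k] - X2[j, k]) ** 2)
        diff = X1[:, None, :] - X2[None, :, :]
        d = np.sqrt(np.sum((inverse_bandwidths ** 2) * diff ** 2, axis=-1))
        # Standard Matern 5/2 kernel
        return covariance_scale * (1.0 + np.sqrt(5.0) * d + 5.0 * d ** 2 / 3.0) * np.exp(-np.sqrt(5.0) * d)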

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_covariance_scale()[source]
set_covariance_scale(covariance_scale)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, delta_fixed_value=None, delta_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014).
Freeze-Thaw Bayesian Optimization.

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here. Details in:

Tiao, Klein, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter Optimization

We implement a new family of kernel functions, for which the additive Freeze-Thaw kernel is one instance (delta == 0). The kernel has parameters alpha, mean_lam, gamma > 0, and 0 <= delta <= 1. Note that beta = alpha / mean_lam is used in the Freeze-Thaw paper (the Gamma distribution over lambda is parameterized differently). The additive Freeze-Thaw kernel is obtained for delta == 0 (use delta_fixed_value = 0).

In fact, this class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are “alpha”, “mean_lam”, “gamma”, “delta” (only if not fixed to delta_fixed_value), as well as those of self.kernel_x (prefix “kernelx_”) and of self.mean_x (prefix “meanx_”).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FabolasKernelFunction(dimension=1, encoding_type='logarithm', u1_init=1.0, u3_init=0.0, **kwargs)[source]

Bases: KernelFunction

The kernel function proposed in:

Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, in AISTATS 2017. ArXiv:1605.07079 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1605.07079

Please note this is only one of the components of the factorized kernel proposed in the paper. This is the finite-rank (“degenerate”) kernel for modelling data subset fraction sizes. Defined as:

\[k(x, y) = (U \phi(x))^T (U \phi(y)), \quad x, y \in [0, 1], \quad \phi(x) = [1, (1 - x)^2]^T,\]

where \(U = [[u_1, u_3], [0, u_2]]\) is upper triangular, with \(u_1, u_2 > 0\).
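
A direct numpy transcription of this definition (illustrative only, not the library implementation):

    import numpy as np

    def fabolas_kernel(x, y, u1, u2, u3):
        # k(x, y) = (U phi(x))^T (U phi(y)), with phi(x) = [1, (1 - x)^2]^T and
        # U = [[u1, u3], [0, u2]] upper triangular, u1, u2 > 0, x, y in [0, 1]
        U = np.array([[u1, u3], [0.0, u2]])
        phi_x = np.array([1.0, (1.0 - x) ** 2])
        phi_y = np.array([1.0, (1.0 - y) ** 2])
        return float((U @ phi_x) @ (U @ phi_y))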

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ProductKernelFunction(kernel1, kernel2, name_prefixes=None, **kwargs)[source]

Bases: KernelFunction

Given two kernel functions K1, K2, this class represents the product kernel function given by

\[((x_1, x_2), (y_1, y_2)) \mapsto K_1(x_1, y_1) \cdot K_2(x_2, y_2)\]

We assume that parameters of K1 and K2 are disjoint.

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: We assume that K1 and K2 have disjoint parameters, otherwise there will be a redundancy here.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization. ArXiv:1406.3896 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1406.3896

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here.

As in the Freeze-Thaw paper, learning curves for different configs are conditionally independent.

This class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.

Note: This kernel is mostly for debugging! Its conditional independence assumptions allow for faster inference, as implemented in GaussProcExpDecayPosteriorState.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are alpha, mean_lam, gamma, delta (only if not fixed to delta_fixed_value), as well as those of self.kernel_x (prefix ‘kernelx_’) and of self.mean_x (prefix ‘meanx_’).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationKernelFunction(kernel_main, kernel_residual, mean_main, num_folds, **kwargs)[source]

Bases: KernelFunction

Kernel function suitable for \(f(x, r)\) being the average of r validation metrics evaluated on different (train, validation) splits.

More specifically, there are num_folds such splits, and \(f(x, r)\) is the average over the first r of them.

We model the score on fold k as \(e_k(x) = f(x) + g_k(x)\), where \(f(x)\) and the \(g_k(x)\) are a priori independent Gaussian processes with kernels kernel_main and kernel_residual (all \(g_k\) share the same kernel). Moreover, the \(g_k\) are zero-mean, while \(f(x)\) may have a mean function. Then:

\[f(x, r) = r^{-1} \sum_{k \le r} e_k(x), \qquad k((x, r), (x', r')) = k_{main}(x, x') + \frac{k_{residual}(x, x')}{\mathrm{max}(r, r')}\]

Note that kernel_main, kernel_residual are over inputs \(x\) (dimension d), while the kernel represented here is over inputs \((x, r)\) of dimension d + 1, where the resource attribute \(r\) (number of folds) is last.

Inputs are encoded. We assume a linear encoding for r with bounds 1 and num_folds. TODO: Right now, all HPs are encoded, and the resource attribute counts as HP, even if it is not optimized over. This creates a dependence on how inputs are encoded.
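
The kernel combination above can be sketched in numpy as follows, given kernel matrices of kernel_main and kernel_residual over the x inputs and the corresponding resource (fold count) vectors; this is illustrative only, not the library implementation:

    import numpy as np

    def cv_kernel_matrix(k_main, k_residual, r1, r2):
        # k_main, k_residual: (n1, n2) kernel matrices over the x inputs
        # r1: (n1,) resources (number of folds), r2: (n2,) resources
        # k((x, r), (x', r')) = k_main(x, x') + k_residual(x, x') / max(r, r')
        max_r = np.maximum(r1[:, None], r2[None, :])
        return k_main + k_residual / max_r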

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.RangeKernelFunction(dimension, kernel, start, **kwargs)[source]

Bases: KernelFunction

Given kernel function K and range R, this class represents

\[(x, y) \mapsto K(x_R, y_R)\]
forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: We assume that K1 and K2 have disjoint parameters, otherwise there will be a redundancy here.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.KernelFunction(dimension, **kwargs)[source]

Bases: MeanFunction

Base class of kernel (or covariance) function \(k(x, x')\)

Parameters:

dimension (int) – Dimensionality of input points after encoding into ndarray

property dimension
Returns:

Dimension d of input points

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.SquaredDistance(dimension, ARD=False, encoding_type='logarithm', **kwargs)[source]

Bases: Block

Block that is responsible for the computation of matrices of squared distances. The distances can possibly be weighted (e.g., ARD parametrization). For instance:

\[ \begin{align}\begin{aligned}m_{i j} = \sum_{k=1}^d ib_k^2 (x_{1: i k} - x_{2: j k})^2\\\mathbf{X}_1 = [x_{1: i j}],\quad \mathbf{X}_2 = [x_{2: i j}]\end{aligned}\end{align} \]

Here, \([ib_k]\) is the vector inverse_bandwidth. if ARD == False, inverse_bandwidths is equal to a scalar broadcast to the d components (with d = dimension, i.e., the number of features in X).

Parameters:
  • dimension (int) – Dimensionality \(d\) of input vectors

  • ARD (bool) – Automatic relevance determination (inverse_bandwidth vector of size d)? Defaults to False

  • encoding_type (str) – Encoding for inverse_bandwidth. Defaults to DEFAULT_ENCODING

forward(X1, X2)[source]

Computes matrix of squared distances

Parameters:
  • X1 – input matrix, shape (n1, d)

  • X2 – input matrix, shape (n2, d)
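
A minimal numpy sketch of the weighted squared-distance matrix defined above (illustrative only):

    import numpy as np

    def squared_distance_matrix(X1, X2, inverse_bandwidths):
        # m[i, j] = sum_k ib_k^2 * (X1[i, k] - X2[j, k]) ** 2
        diff = X1[:, None, :] - X2[None, :, :]  # shape (n1, n2, d)
        return np.sum((inverse_bandwidths ** 2) * diff ** 2, axis=-1)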

get_params()[source]

Parameter keys are “inv_bw<k>” if dimension > 1, and “inv_bw” if dimension == 1.

Return type:

Dict[str, Any]

set_params(param_dict)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.base.Matern52(dimension, ARD=False, encoding_type='logarithm', has_covariance_scale=True, **kwargs)[source]

Bases: KernelFunction

Block that is responsible for the computation of Matern 5/2 kernel.

if ARD == False, inverse_bandwidths is equal to a scalar broadcast to the d components (with d = dimension, i.e., the number of features in X).

Arguments on top of base class SquaredDistance:

Parameters:

has_covariance_scale (bool) – Kernel has covariance scale parameter? Defaults to True

property ARD: bool
forward(X1, X2)[source]

Computes Matern 5/2 kernel matrix

Parameters:
  • X1 – input matrix, shape (n1,d)

  • X2 – input matrix, shape (n2,d)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_covariance_scale()[source]
set_covariance_scale(covariance_scale)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.decode_resource_values(res_encoded, num_folds)[source]

We assume the resource attribute r is encoded as randint(1, num_folds). Internally, r is taken as a value in the real interval [0.5, num_folds + 0.5], which is linearly transformed to [0, 1] for encoding.

Parameters:
  • res_encoded – Encoded values r

  • num_folds – Maximum number of folds

Returns:

Original values r (not rounded to int)
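
One plausible reading of this mapping, shown only to illustrate the description above (the library function may differ in details such as clipping):

    def decode_resource_sketch(res_encoded, num_folds):
        # An encoded value in [0, 1] maps linearly back to the interval
        # [0.5, num_folds + 0.5]; the result is not rounded to an integer
        return 0.5 + res_encoded * num_folds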

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.CrossValidationKernelFunction(kernel_main, kernel_residual, mean_main, num_folds, **kwargs)[source]

Bases: KernelFunction

Kernel function suitable for \(f(x, r)\) being the average of r validation metrics evaluated on different (train, validation) splits.

More specifically, there are num_folds such splits, and \(f(x, r)\) is the average over the first r of them.

We model the score on fold k as \(e_k(x) = f(x) + g_k(x)\), where \(f(x)\) and the \(g_k(x)\) are a priori independent Gaussian processes with kernels kernel_main and kernel_residual (all \(g_k\) share the same kernel). Moreover, the \(g_k\) are zero-mean, while \(f(x)\) may have a mean function. Then:

\[f(x, r) = r^{-1} \sum_{k \le r} e_k(x), \qquad k((x, r), (x', r')) = k_{main}(x, x') + \frac{k_{residual}(x, x')}{\mathrm{max}(r, r')}\]

Note that kernel_main, kernel_residual are over inputs \(x\) (dimension d), while the kernel represented here is over inputs \((x, r)\) of dimension d + 1, where the resource attribute \(r\) (number of folds) is last.

Inputs are encoded. We assume a linear encoding for r with bounds 1 and num_folds. TODO: Right now, all HPs are encoded, and the resource attribute counts as HP, even if it is not optimized over. This creates a dependence on how inputs are encoded.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.cross_validation.CrossValidationMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay.ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, delta_fixed_value=None, delta_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014).
Freeze-Thaw Bayesian Optimization.

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here. Details in:

Tiao, Klein, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter Optimization

We implement a new family of kernel functions, for which the additive Freeze-Thaw kernel is one instance (delta == 0). The kernel has parameters alpha, mean_lam, gamma > 0, and 0 <= delta <= 1. Note that beta = alpha / mean_lam is used in the Freeze-Thaw paper (the Gamma distribution over lambda is parameterized differently). The additive Freeze-Thaw kernel is obtained for delta == 0 (use delta_fixed_value = 0).

In fact, this class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are “alpha”, “mean_lam”, “gamma”, “delta” (only if not fixed to delta_fixed_value), as well as those of self.kernel_x (prefix “kernelx_”) and of self.mean_x (prefix “meanx_”).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.exponential_decay.ExponentialDecayResourcesMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.fabolas module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.fabolas.FabolasKernelFunction(dimension=1, encoding_type='logarithm', u1_init=1.0, u3_init=0.0, **kwargs)[source]

Bases: KernelFunction

The kernel function proposed in:

Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, in AISTATS 2017. ArXiv:1605.07079 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1605.07079

Please note this is only one of the components of the factorized kernel proposed in the paper. This is the finite-rank (“degenerate”) kernel for modelling data subset fraction sizes. Defined as:

\[k(x, y) = (U \phi(x))^T (U \phi(y)), \quad x, y \in [0, 1], \quad \phi(x) = [1, (1 - x)^2]^T,\]

where \(U = [[u_1, u_3], [0, u_2]]\) is upper triangular, with \(u_1, u_2 > 0\).

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw.FreezeThawKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization. ArXiv:1406.3896 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1406.3896

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here.

As in the Freeze-Thaw paper, learning curves for different configs are conditionally independent.

This class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.

Note: This kernel is mostly for debugging! Its conditional independence assumptions allow for faster inference, as implemented in GaussProcExpDecayPosteriorState.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are alpha, mean_lam, gamma, delta (only if not fixed to delta_fixed_value), as well as those of self.kernel_x (prefix ‘kernelx_’) and of self.mean_x (prefix ‘meanx_’).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.freeze_thaw.FreezeThawMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.product_kernel module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.product_kernel.ProductKernelFunction(kernel1, kernel2, name_prefixes=None, **kwargs)[source]

Bases: KernelFunction

Given two kernel functions K1, K2, this class represents the product kernel function given by

\[((x_1, x_2), (y_1, y_2)) \mapsto K_1(x_1, y_1) \cdot K_2(x_2, y_2)\]

We assume that parameters of K1 and K2 are disjoint.

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: We assume that K1 and K2 have disjoint parameters, otherwise there will be a redundancy here.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.range_kernel module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.range_kernel.RangeKernelFunction(dimension, kernel, start, **kwargs)[source]

Bases: KernelFunction

Given kernel function K and range R, this class represents

\[(x, y) \mapsto K(x_R, y_R)\]
forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: We assume that K1 and K2 have disjoint parameters, otherwise there will be a redundancy here.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ZeroKernel(dimension, **kwargs)[source]

Bases: KernelFunction

Constant zero kernel. This works only in the context used here: we do not return matrices or vectors, but zero scalars.

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ZeroMean(**kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.ExponentialDecayBaseKernelFunction(r_max, r_min=1, normalize_inputs=False, **kwargs)[source]

Bases: KernelFunction

Implements exponential decay kernel k_r(r, r’) from the Freeze-Thaw paper, corresponding to ExponentialDecayResourcesKernelFunction with delta=0 and no x attributes.

Note: Inputs r lie in [r_min, r_max]. Optionally, they are normalized to [0, 1].

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

mean_function(X)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.logdet_cholfact_cov_resource(likelihood)[source]

Computes the additional log(det(Lbar)) term. This is sum_i log(det(Lbar_i)), where Lbar_i is upper left submatrix of likelihood['lfact_all'], with size likelihood['ydims'][i].

Parameters:

likelihood (Dict) – Result of resource_kernel_likelihood_computations

Return type:

float

Returns:

log(det(Lbar))

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_precomputations(targets)[source]

Precomputations required by resource_kernel_likelihood_computations.

Importantly, prepare_data orders datapoints by nonincreasing number of targets ydims[i]. For 0 <= j < ydim_max, ydim_max = ydims[0] = max(ydims), num_configs[j] is the number of datapoints i for which ydims[i] > j. yflat is a flat matrix (rows corresponding to fantasy samples; column vector if no fantasizing) consisting of ydim_max parts, where part j is of size num_configs[j] and contains y[j] for targets of those i counted in num_configs[j].

Parameters:

targets (List[ndarray]) – Targets from data representation returned by prepare_data

Return type:

Dict

Returns:

See above
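
The num_configs bookkeeping can be illustrated with a small numpy example (illustrative only, the values are hypothetical):

    import numpy as np

    # ydims[i]: number of observed targets for datapoint i, ordered nonincreasingly
    ydims = np.array([5, 4, 4, 2, 1])
    ydim_max = int(ydims[0])
    # num_configs[j]: number of datapoints i with ydims[i] > j, for 0 <= j < ydim_max
    num_configs = [int(np.sum(ydims > j)) for j in range(ydim_max)]
    # num_configs == [5, 4, 3, 3, 1]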

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_computations(precomputed, res_kernel, noise_variance, skip_c_d=False)[source]

Given precomputed from resource_kernel_likelihood_precomputations and resource kernel function res_kernel, compute quantities required for inference and marginal likelihood computation, pertaining to the likelihood of an additive model, as in the Freeze-Thaw paper.

Note that res_kernel takes raw (unnormalized) r as inputs. The code here works for any resource kernel and mean function, not just for ExponentialDecayBaseKernelFunction.

Results returned are:

  • c: n vector [c_i]

  • d: n vector [d_i], positive

  • vtv: n vector [|v_i|^2]

  • wtv: (n, F) matrix [(W_i)^T v_i], F number of fantasy samples

  • wtw: n vector [|w_i|^2] (only if no fantasizing)

  • lfact_all: Cholesky factor for kernel matrix

  • ydims: Target vector sizes (copy from precomputed)

Parameters:
  • precomputed (Dict) – Output of resource_kernel_likelihood_precomputations

  • res_kernel (ExponentialDecayBaseKernelFunction) – Kernel k(r, r’) over resources

  • noise_variance – Noise variance sigma^2

  • skip_c_d (bool) – If True, c and d are not computed

Return type:

Dict

Returns:

Quantities required for inference and learning criterion

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.resource_kernel_likelihood_slow_computations(targets, res_kernel, noise_variance, skip_c_d=False)[source]

Naive implementation of resource_kernel_likelihood_computations, which does not require precomputations, but is somewhat slower. Here, results are computed one datapoint at a time, instead of in bulk.

This code is used in unit testing only.

Return type:

Dict

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.predict_posterior_marginals_extended(poster_state, mean, kernel, test_features, resources, res_kernel)[source]

These are posterior marginals on f_r = h + g_r variables, where (x, r) are zipped from test_features, resources. posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.

Parameters:
  • poster_state (Dict) – Posterior state

  • mean – Mean function

  • kernel – Kernel function

  • test_features – Feature matrix for test points (not extended)

  • resources (List[int]) – Resource values corresponding to rows of test_features

  • res_kernel (ExponentialDecayBaseKernelFunction) – Kernel k(r, r’) over resources

Returns:

posterior_means, posterior_variances

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.freeze_thaw.sample_posterior_joint(poster_state, mean, kernel, feature, targets, res_kernel, noise_variance, lfact_all, means_all, random_state, num_samples=1)[source]

Given poster_state for some data plus one additional configuration with data (feature, targets), draw joint samples of unobserved targets for this configuration. targets may be empty, but must not be complete (there must be some unobserved targets). The additional configuration must not be in the dataset used to compute poster_state.

If targets correspond to resource values range(r_min, r_obs), we sample latent target values y_r corresponding to range(r_obs, r_max+1), returning a dict with [y_r] under y (matrix with num_samples columns).

Parameters:
  • poster_state (Dict) – Posterior state for data

  • mean – Mean function

  • kernel – Kernel function

  • feature – Features for additional config

  • targets (ndarray) – Target values for additional config

  • res_kernel (ExponentialDecayBaseKernelFunction) – Kernel k(r, r’) over resources

  • noise_variance – Noise variance sigma^2

  • lfact_all – Cholesky factor of complete resource kernel matrix

  • means_all – See lfact_all

  • random_state (RandomState) – numpy.random.RandomState

  • num_samples (int) – Number of joint samples to draw (default: 1)

Return type:

Dict

Returns:

See above

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model.GaussianProcessLearningCurveModel(kernel, res_model, mean=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: GaussianProcessOptimizeModel

Represents joint Gaussian model of learning curves over a number of configurations. The model has an additive form:

f(x, r) = g(r | x) + h(x),

where h(x) is a Gaussian process model for function values at r_max, and the g(r | x) are independent Gaussian models. Right now, g(r | x) can be:

  • Innovation state space model (ISSM) of a particular power-law decay form. For this one, g(r_max | x) = 0 for all x. Used if res_model is of type ISSModelParameters

  • Gaussian process model with exponential decay covariance function. This is essentially the model from the Freeze-Thaw paper, see also ExponentialDecayResourcesKernelFunction. Used if res_model is of type ExponentialDecayBaseKernelFunction

Importantly, inference scales cubically only in the number of configurations, not in the number of observations.

Details about ISSMs in general are found in

Hyndman, R., Koehler, A., Ord, J., and Snyder, R. Forecasting with Exponential Smoothing: The State Space Approach. Springer, 2008

Parameters:
  • kernel (KernelFunction) – Kernel function k(X, X’)

  • res_model (Union[ISSModelParameters, ExponentialDecayBaseKernelFunction]) – Model for g(r | x)

  • mean (Optional[MeanFunction]) – Mean function mu(X)

  • initial_noise_variance (Optional[float]) – A scalar to initialize the value of the residual noise variance

  • optimization_config (Optional[OptimizationConfig]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.

  • random_seed – Random seed to be used (optional)

  • fit_reset_params (bool) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values

property likelihood: MarginalLikelihood
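
A hedged construction sketch, shown only to illustrate how kernel and res_model fit together; the Matern52 import path and its dimension argument are assumptions, while the remaining paths and the constructor signature follow this reference:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params import (
    IndependentISSModelParameters,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.gpiss_model import (
    GaussianProcessLearningCurveModel,
)

model = GaussianProcessLearningCurveModel(
    kernel=Matern52(dimension=3),               # k(x, x') for h(x)
    res_model=IndependentISSModelParameters(),  # ISSM parameters for g(r | x)
    random_seed=31415,
)
# model.fit(data) then optimizes the marginal likelihood; data is expected to be
# a dict in the format produced by issm.prepare_data (documented below).
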
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.prepare_data(state, config_space_ext, active_metric, normalize_targets=False, do_fantasizing=False)[source]

Prepares data in state for further processing. The entries configs and targets of the result dict are lists with one entry per trial, sorted in decreasing order of the number of target values. features is the feature matrix corresponding to configs. If normalize_targets is True, the target values are normalized to mean 0, variance 1 (over all values), and mean_targets, std_targets are returned.

If do_fantasizing is True, state.pending_evaluations is also taken into account. Entries there have to be of type FantasizedPendingEvaluation. Also, in terms of their resource levels, they need to be adjacent to observed entries, so there are no gaps. In this case, the entries of the targets list are matrices, each column corresponding to a fantasy sample.

Note: If normalize_targets, mean and stddev are computed over observed values only. Also, fantasy values in state.pending_evaluations are not normalized, because they are assumed to be sampled from the posterior with normalized targets as well.

Parameters:
  • state (TuningJobState) – TuningJobState with data

  • config_space_ext (ExtendedConfiguration) – Extended config space

  • active_metric (str) –

  • normalize_targets (bool) – See above

  • do_fantasizing (bool) – See above

Return type:

Dict

Returns:

See above
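
A tiny, self-contained sketch of the normalization convention from the note above (statistics are computed over observed values only); targets_per_trial is an illustrative placeholder:

import numpy as np

targets_per_trial = [np.array([0.9, 0.7, 0.6]), np.array([0.8, 0.75])]  # observed targets per trial
observed = np.concatenate(targets_per_trial)
mean_targets, std_targets = observed.mean(), max(observed.std(), 1e-9)
normalized = [(y - mean_targets) / std_targets for y in targets_per_trial]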

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.prepare_data_with_pending(state, config_space_ext, active_metric, normalize_targets=False)[source]

Similar to prepare_data with do_fantasizing=False, but two dicts are returned, the first for trials without pending evaluations, the second for trials with pending evaluations. The latter dict also contains trials which have pending, but no observed evaluations. The second dict has the additional entry num_pending, which lists the number of pending evals for each trial. These evals must be contiguous and adjacent to observed evals, so that the union of observed and pending evals is contiguous (when it comes to resource levels).

Parameters:
  • state (TuningJobState) – See prepare_data

  • config_space_ext (ExtendedConfiguration) – See prepare_data

  • active_metric (str) – See prepare_data

  • normalize_targets (bool) – See prepare_data

Return type:

(Dict, Dict)

Returns:

See above

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_precomputations(targets, r_min)[source]

Precomputations required by issm_likelihood_computations.

Importantly, prepare_data orders datapoints by nonincreasing number of targets ydims[i]. For 0 <= j < ydim_max, ydim_max = ydims[0] = max(ydims), num_configs[j] is the number of datapoints i for which ydims[i] > j. deltay is a flat matrix (rows corresponding to fantasy samples; column vector if no fantasizing) consisting of ydim_max parts, where part j is of size num_configs[j] and contains y[j] - y[j-1] for targets of those i counted in num_configs[j], the term needed in the recurrence to compute w[j]. logr is a flat vector consisting of ydim_max - 1 parts, where part j (starting from 1) is of size num_configs[j] and contains the logarithmic term for computing a[j-1] and e[j].

Parameters:
  • targets (List[ndarray]) – Targets from data representation returned by prepare_data

  • r_min (int) – Value of r_min, as returned by prepare_data

Return type:

Dict

Returns:

See above

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_computations(precomputed, issm_params, r_min, r_max, skip_c_d=False)[source]

Given precomputed from issm_likelihood_precomputations and ISSM parameters issm_params, compute quantities required for inference and marginal likelihood computation, pertaining to the ISSM likelihood.

The index for r is range(r_min, r_max + 1). Observations must be contiguous from r_min. The ISSM parameters are:
  • alpha: n-vector, negative
  • beta: n-vector
  • gamma: scalar, positive

Results returned are:
  • c: n-vector [c_i], negative
  • d: n-vector [d_i], positive
  • vtv: n-vector [|v_i|^2]
  • wtv: (n, F) matrix [(W_i)^T v_i], where F is the number of fantasy samples
  • wtw: n-vector [|w_i|^2] (only if no fantasizing)

Parameters:
  • precomputed (Dict) – Output of issm_likelihood_precomputations

  • issm_params (Dict) – Parameters of ISSM likelihood

  • r_min (int) – Smallest resource value

  • r_max (int) – Largest resource value

  • skip_c_d (bool) – If True, c and d are not computed

Return type:

Dict

Returns:

Quantities required for inference and learning criterion

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.posterior_computations(features, mean, kernel, issm_likelihood, noise_variance)[source]

Computes posterior state (required for predictions) and negative log marginal likelihood (returned in criterion). The latter is computed only when there is no fantasizing (i.e., if issm_likelihood contains wtw).

Parameters:
  • features – Input matrix X

  • mean – Mean function

  • kernel – Kernel function

  • issm_likelihood (Dict) – Outcome of issm_likelihood_computations

  • noise_variance – Variance of ISSM innovations

Return type:

Dict

Returns:

Internal posterior state

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.predict_posterior_marginals(poster_state, mean, kernel, test_features)[source]

These are posterior marginals on the h variable, whereas the full model is for f_r = h + g_r (additive). posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.

Parameters:
  • poster_state (Dict) – Posterior state

  • mean – Mean function

  • kernel – Kernel function

  • test_features – Feature matrix for test points (not extended)

Returns:

posterior_means, posterior_variances

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.sample_posterior_marginals(poster_state, mean, kernel, test_features, random_state, num_samples=1)[source]

We sample from posterior marginals on the h variable, see also predict_posterior_marginals.

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.predict_posterior_marginals_extended(poster_state, mean, kernel, test_features, resources, issm_params, r_min, r_max)[source]

These are posterior marginals on f_r = h + g_r variables, where (x, r) are zipped from test_features, resources. issm_params are likelihood parameters for the test configs. posterior_means is a (n, F) matrix, where F is the number of fantasy samples, or F == 1 without fantasizing.

Parameters:
  • poster_state (Dict) – Posterior state

  • mean – Mean function

  • kernel – Kernel function

  • test_features – Feature matrix for test points (not extended)

  • resources (List[int]) – Resource values corresponding to rows of test_features

  • issm_params (Dict) – See above

  • r_min (int) –

  • r_max (int) –

Returns:

posterior_means, posterior_variances

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.sample_posterior_joint(poster_state, mean, kernel, feature, targets, issm_params, r_min, r_max, random_state, num_samples=1)[source]

Given poster_state for some data plus one additional configuration with data (feature, targets, issm_params), draw joint samples of the latent variables not fixed by the data, and of the latent target values. targets may be empty, but must not reach all the way to r_max. The additional configuration must not be in the dataset used to compute poster_state.

If targets correspond to resource values range(r_min, r_obs), we sample latent target values y_r corresponding to range(r_obs, r_max+1) and latent function values f_r corresponding to range(r_obs-1, r_max+1), unless r_obs = r_min (i.e. targets empty), in which case both [y_r] and [f_r] ranges in range(r_min, r_max+1). We return a dict with [f_r] under f, [y_r] under y. These are matrices with num_samples columns.

Parameters:
  • poster_state (Dict) – Posterior state for data

  • mean – Mean function

  • kernel – Kernel function

  • feature – Features for additional config

  • targets (ndarray) – Target values for additional config

  • issm_params (Dict) – Likelihood parameters for additional config

  • r_min (int) – Smallest resource value

  • r_max (int) – Largest resource value

  • random_state (RandomState) – numpy.random.RandomState

  • num_samples (int) – Number of joint samples to draw (default: 1)

Return type:

Dict

Returns:

See above

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.issm_likelihood_slow_computations(targets, issm_params, r_min, r_max, skip_c_d=False)[source]

Naive implementation of issm_likelihood_computations, which does not require precomputations, but is much slower. Here, results are computed one datapoint at a time, instead of in bulk.

This code is used in unit testing, and called from sample_posterior_joint.

Return type:

Dict

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.update_posterior_state(poster_state, kernel, feature, d_new, s_new, r2_new)[source]

Incremental update of posterior state, given data for one additional configuration. The new datapoint gives rise to a new row/column of the Cholesky factor. r2vec and svec are extended by r2_new, s_new respectively. r4vec and pvec are extended and all entries change. The new datapoint is represented by feature, d_new, s_new, r2_new.

Note: The field criterion is not updated, but set to np.nan.

Parameters:
  • poster_state (Dict) – Posterior state for data

  • kernel – Kernel function

  • feature – Features for additional config

  • d_new – See above

  • s_new – See above

  • r2_new – See above

Return type:

Dict

Returns:

Updated posterior state

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.issm.update_posterior_pvec(poster_state, kernel, feature, d_new, s_new, r2_new)[source]

Part of update_posterior_state, just returns the new p vector.

Parameters:
  • poster_state (Dict) – See update_posterior_state

  • kernel – See update_posterior_state

  • feature – See update_posterior_state

  • d_new – See update_posterior_state

  • s_new – See update_posterior_state

  • r2_new – See update_posterior_state

Return type:

ndarray

Returns:

New p vector, as flat vector

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.likelihood module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.likelihood.GaussAdditiveMarginalLikelihood(kernel, res_model, mean=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]

Bases: MarginalLikelihood

Marginal likelihood of joint learning curve model, where each curve is modelled as sum of a Gaussian process over x (for the value at r_max) and a Gaussian model over r.

The latter res_model is either an ISSM or another Gaussian process with exponential decay covariance function.

Parameters:
get_posterior_state(data)[source]
Return type:

PosteriorState

forward(data)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]

Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings

Return type:

List[tuple]

get_noise_variance(as_ndarray=False)[source]
get_params()[source]
Return type:

Dict[str, Any]

set_params(param_dict)[source]
data_precomputations(data, overwrite=False)[source]

Some models require precomputations based on data. Precomputed variables are appended to data. This is done only if not already included in data, unless overwrite is True.

Parameters:
  • data (Dict[str, Any]) –

  • overwrite (bool) –

on_fit_start(data)[source]

Called at the beginning of fit.

Parameters:

data (Dict[str, Any]) – Argument passed to fit

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params.ISSModelParameters(gamma_is_one=False, **kwargs)[source]

Bases: MeanFunction

Maintains parameters of an ISSM of a particular power-law decay form.

For each configuration, we have alpha < 0 and beta. These may depend on the input feature x (encoded configuration):

(alpha, beta) = F(x; params),

where params are the internal parameters to be learned.

There is also gamma > 0, which can be fixed to 1.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_gamma()[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_gamma(val)[source]
set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

get_issm_params(features)[source]

Given feature matrix X, returns ISSM parameters which configure the likelihood: alpha, beta vectors (size n), gamma scalar.

Parameters:

features – Feature matrix X, (n, d)

Return type:

Dict[str, Any]

Returns:

Dict with alpha, beta, gamma

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.model_params.IndependentISSModelParameters(gamma_is_one=False, **kwargs)[source]

Bases: ISSModelParameters

Most basic implementation, where alpha, beta are scalars, independent of the configuration.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_alpha()[source]
get_beta()[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_alpha(val)[source]
set_beta(val)[source]
set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

get_issm_params(features)[source]

Given feature matrix X, returns ISSM parameters which configure the likelihood: alpha, beta vectors (size n), gamma scalar.

Parameters:

features – Feature matrix X, (n, d)

Return type:

Dict[str, Any]

Returns:

Dict with alpha, beta, gamma

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcAdditivePosteriorState(data, mean, kernel, noise_variance, **kwargs)[source]

Bases: PosteriorState

Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The (additive) model is the sum of a Gaussian process model for function values at r_max and independent Gaussian models over r only.

Importantly, inference scales cubically only in the number of configurations, not in the number of observations.

property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]
Return type:

ndarray

Returns:

Negative log marginal likelihood

predict(test_features)[source]

We compute marginals over f(x, r), where test_features are extended features. Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.

Parameters:

test_features (ndarray) – Extended features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Marginal samples, (num_test, num_samples)

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this method has to be called for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

sample_curves(data, num_samples=1, random_state=None)[source]

Given data from one or more configurations (as returned by issm.prepare_data), for each config, sample a curve from the joint posterior (predictive) distribution over latent targets. The curve for each config in data may be partly observed, but must not be fully observed. Samples for the different configs are independent. None of the configs in data must appear in the dataset used to compute the posterior state.

The result is a list of dict, one for each config. If for a config, targets in data are given for resource values range(r_min, r_obs), the dict entry y is a joint sample [y_r], r in range(r_obs, r_max+1). For some subclasses (e.g., ISSM), there is also an entry f with a joint sample [f_r], r in range(r_obs-1, r_max+1), the latent function values before noise. These entries are matrices with num_samples columns, which are independent (the joint dependence is along the rows).

Parameters:
  • data (Dict[str, Any]) – Data for configs to predict at

  • num_samples (int) – Number of samples to draw from each curve

  • random_state (Optional[RandomState]) – PRNG state to be used for sampling

Return type:

List[dict]

Returns:

See above
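
A hedged usage sketch of the return structure described above; poster_state and data are placeholders for an existing posterior state and a data dict as returned by issm.prepare_data:

samples_per_config = poster_state.sample_curves(data, num_samples=5)
for entry in samples_per_config:
    y_samples = entry["y"]                # joint samples of future targets, 5 columns
    mean_curve = y_samples.mean(axis=1)   # average over the 5 joint samples
    if "f" in entry:                      # latent function values (e.g., for ISSM)
        f_samples = entry["f"]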

static has_precomputations(data)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.IncrementalUpdateGPAdditivePosteriorState(data, mean, kernel, noise_variance, **kwargs)[source]

Bases: GaussProcAdditivePosteriorState

Extension of GaussProcAdditivePosteriorState which allows for incremental updating (single config added to the dataset). This is required for simulation-based scoring, and for support of fantasizing.

update(feature, targets)[source]
Return type:

IncrementalUpdateGPAdditivePosteriorState

update_pvec(feature, targets)[source]

Part of update: Only update prediction vector p. This cannot be used to update p for several new datapoints.

Parameters:
  • feature (ndarray) –

  • targets (ndarray) –

Return type:

ndarray

Returns:

New p vector

sample_and_update_for_pending(data_pending, sample_all_nonobserved=False, random_state=None)[source]

This function is needed for sampling fantasy targets, and also to support simulation-based scoring.

issm.prepare_data_with_pending creates two data dicts data_nopending, data_pending, the first for configs with observed data, but no pending evals, the second for configs with pending evals. You create the state with data_nopending, then call this method with data_pending.

This method iterates over configs (or trials) in data_pending. For each config, it draws a joint sample of some non-observed targets, then updates the state conditioned on observed and sampled targets (by calling update). If sample_all_nonobserved is False, the number of targets sampled is the entry in data_pending['num_pending']. Otherwise, targets are sampled for all non-observed positions.

The method returns the list of sampled target vectors, and the state at the end (like update does as well).

Parameters:
  • data_pending (Dict[str, Any]) – See above

  • sample_all_nonobserved (bool) – See above

  • random_state (Optional[RandomState]) – PRNG

Return type:

(List[ndarray], IncrementalUpdateGPAdditivePosteriorState)

Returns:

pending_targets, final_state
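
A hedged sketch of this workflow, using GaussProcISSMPosteriorState as a concrete subclass; state, config_space_ext, mean, kernel, iss_model, noise_variance and the metric name are placeholders assumed to exist:

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve import issm
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state import (
    GaussProcISSMPosteriorState,
)

data_nopending, data_pending = issm.prepare_data_with_pending(
    state, config_space_ext, active_metric="mean_loss"
)
poster_state = GaussProcISSMPosteriorState(
    data_nopending, mean, kernel, iss_model, noise_variance
)
pending_targets, final_state = poster_state.sample_and_update_for_pending(
    data_pending, sample_all_nonobserved=False
)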

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcISSMPosteriorState(data, mean, kernel, iss_model, noise_variance, **kwargs)[source]

Bases: IncrementalUpdateGPAdditivePosteriorState

Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The model is the sum of a Gaussian process model for function values at r_max and independent Gaussian linear innovation state space models (ISSMs) of a particular power law decay form.

static has_precomputations(data)[source]
Return type:

bool

predict(test_features)[source]

We compute marginals over f(x, r), where test_features are extended features. Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.

Parameters:

test_features (ndarray) – Extended features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

static data_precomputations(data)[source]
update(feature, targets)[source]
Return type:

IncrementalUpdateGPAdditivePosteriorState

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.learncurve.posterior_state.GaussProcExpDecayPosteriorState(data, mean, kernel, res_kernel, noise_variance, **kwargs)[source]

Bases: IncrementalUpdateGPAdditivePosteriorState

Represent posterior state for joint Gaussian model of learning curves over a number of configurations. The model is the sum of a Gaussian process model for function values at r_max and independent Gaussian processes over r, using an exponential decay covariance function. The latter is shared between all configs.

This is essentially the model from the Freeze Thaw paper (see also ExponentialDecayResourcesKernelFunction).

static has_precomputations(data)[source]
Return type:

bool

predict(test_features)[source]

We compute marginals over f(x, r), where test_features are extended features. Note: The test configs must not overlap with any in the training set. Otherwise, at least if r != r_max, the predictive distributions computed here may be wrong.

Parameters:

test_features (ndarray) – Extended features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

static data_precomputations(data)[source]
update(feature, targets)[source]
Return type:

IncrementalUpdateGPAdditivePosteriorState

Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.OptimizationConfig(lbfgs_tol, lbfgs_maxiter, verbose, n_starts)[source]

Bases: object

lbfgs_tol: float
lbfgs_maxiter: int
verbose: bool
n_starts: int
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants.MCMCConfig(n_samples, n_burnin, n_thinning)[source]

Bases: object

n_samples is the total number of samples drawn. The first n_burnin of these are dropped (burn-in), and every n_thinning of the rest is returned. This means we return (n_samples - n_burnin) // n_thinning samples.

n_samples: int
n_burnin: int
n_thinning: int
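
As a worked example of the retention rule (using the default values that appear below for GPRegressionMCMC):

from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.constants import MCMCConfig

config = MCMCConfig(n_samples=300, n_burnin=250, n_thinning=5)
retained = (config.n_samples - config.n_burnin) // config.n_thinning  # (300 - 250) // 5 == 10
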
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.AddJitterOp(*args, **kwargs)

Finds a small jitter to add to the diagonal of a square matrix in order to render it positive definite (so that linalg.potrf works).

Given input x (positive semi-definite matrix) and sigsq_init (nonneg scalar), find sigsq_final (nonneg scalar), so that:

sigsq_final = sigsq_init + jitter, jitter >= 0,
x + sigsq_final * Id positive definite (so that potrf call works)

We return the matrix x + sigsq_final * Id, for which potrf has not failed.

For the gradient, the dependence of jitter on the inputs is ignored.

The values tried for sigsq_final are:

sigsq_init, sigsq_init + initial_jitter * (jitter_growth ** k), k = 0, 1, 2, ...,
initial_jitter = initial_jitter_factor * max(mean(diag(x)), 1)

Note: The scaling of initial_jitter with mean(diag(x)) is taken from GPy. The rationale is that the largest eigenvalue of x is >= mean(diag(x)), and likely of this magnitude.

There is no guarantee that the Cholesky factor returned is well-conditioned enough for subsequent computations to be reliable. A better solution would be to estimate the condition number of the Cholesky factor, and to add jitter until this is bounded below a threshold we tolerate. See

Higham, N.
A Survey of Condition Number Estimation for Triangular Matrices
MIMS EPrint: 2007.10

Algorithm 4.1 could work for us.
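
A hedged, numpy-only sketch of the jitter schedule described above (not the actual implementation); initial_jitter_factor, jitter_growth and max_tries are illustrative names, and numpy.linalg.cholesky stands in for the internal potrf call:

import numpy as np

def add_jitter_sketch(x, sigsq_init, initial_jitter_factor=1e-9, jitter_growth=10.0, max_tries=20):
    # Try sigsq_init first, then grow the jitter geometrically as described above.
    initial_jitter = initial_jitter_factor * max(np.mean(np.diag(x)), 1.0)
    n = x.shape[0]
    jitter = 0.0
    for k in range(max_tries):
        sigsq_final = sigsq_init + jitter
        try:
            np.linalg.cholesky(x + sigsq_final * np.eye(n))
            return x + sigsq_final * np.eye(n)
        except np.linalg.LinAlgError:
            jitter = initial_jitter * (jitter_growth ** k)
    raise RuntimeError("could not render matrix positive definite")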

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.flatten_and_concat(x, sigsq_init)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.custom_op.cholesky_factorization(*args, **kwargs)

Replacement for autograd.numpy.linalg.cholesky(). Our backward (vjp) is faster and simpler, while somewhat less general (only works if a.ndim == 2).

See https://arxiv.org/abs/1710.08717 for derivation of backward (vjp) expression.

Parameters:

a – Symmetric positive definite matrix A

Returns:

Lower-triangular Cholesky factor L of A

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Distribution[source]

Bases: object

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Gamma(mean, alpha)[source]

Bases: Distribution

Gamma(mean, alpha):

p(x) = C(alpha, beta) x^{alpha - 1} exp( -beta x), beta = alpha / mean, C(alpha, beta) = beta^alpha / Gamma(alpha)

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.
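
A small, self-contained sketch of the negative log density implied by this parameterization (summing over entries for non-scalar x, as stated above); scipy is used here only for the log-gamma function, while the actual class works on autograd arrays:

import numpy as np
from scipy.special import gammaln

def gamma_neg_log_density(x, mean, alpha):
    beta = alpha / mean
    log_c = alpha * np.log(beta) - gammaln(alpha)  # log C(alpha, beta)
    return -np.sum(log_c + (alpha - 1.0) * np.log(x) - beta * x)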

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Uniform(lower, upper)[source]

Bases: Distribution

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Normal(mean, sigma)[source]

Bases: Distribution

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.LogNormal(mean, sigma)[source]

Bases: Distribution

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.distribution.Horseshoe(s)[source]

Bases: Distribution

negative_log_density(x)[source]

Negative log density. lower and upper limits are ignored. If x is not a scalar, the distribution is i.i.d. over all entries.

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon module

Gluon APIs for autograd

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.Block(prefix=None, params=None)[source]

Bases: object

Base class for all neural network layers and models. Your models should subclass this class. Block can be nested recursively in a tree structure. You can create and assign child Block as regular attributes:

import mxnet as mx
from mxnet.gluon import Block, nn
from mxnet import ndarray as F

class Model(Block):
    def __init__(self, **kwargs):
        super(Model, self).__init__(**kwargs)
        # use name_scope to give child Blocks appropriate names.
        with self.name_scope():
            self.dense0 = nn.Dense(20)
            self.dense1 = nn.Dense(20)

    def forward(self, x):
        x = F.relu(self.dense0(x))
        return F.relu(self.dense1(x))

model = Model()
model.initialize(ctx=mx.cpu(0))
model(F.zeros((10, 10), ctx=mx.cpu(0)))

Child Blocks assigned this way will be registered, and collect_params() will collect their Parameters recursively. You can also manually register child blocks with register_child().

Parameters:
  • prefix (str) – Prefix acts like a name space. All children blocks created in the parent block's name_scope() will have the parent block's prefix in their name. Please refer to the naming tutorial for more info on prefix and naming.

  • params (ParameterDict or None) – ParameterDict for sharing weights with the new Block. For example, if you want dense1 to share dense0's weights, you can do:

dense0 = nn.Dense(20)
dense1 = nn.Dense(20, params=dense0.collect_params())
property prefix

Prefix of this Block.

property name

Name of this Block, without ‘_’ in the end.

name_scope()[source]

Returns a name space object managing a child Block and parameter names. Should be used within a with statement:

with self.name_scope():
    self.dense = nn.Dense(20)

Please refer to the naming tutorial for more info on prefix and naming.

property params

Returns this Block’s parameter dictionary (does not include its children’s parameters).

collect_params(select=None)[source]

Returns a ParameterDict containing the Parameters of this Block and all of its children (default). Alternatively, it can return a ParameterDict with only those Parameters that match given regular expressions. For example, collect the specified parameters in ['conv1_weight', 'conv1_bias', 'fc_weight', 'fc_bias']:

model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')

or collect all parameters whose names end with 'weight' or 'bias'; this can be done using regular expressions:

model.collect_params('.*weight|.*bias')
Parameters:

select (str) – Regular expressions

Returns:

The selected ParameterDict

register_child(block, name=None)[source]

Registers block as a child of self. Blocks assigned to self as attributes will be registered automatically.

apply(fn)[source]

Applies fn recursively to every child block as well as self.

Parameters:

fn (callable) – Function to be applied to each submodule, of form fn(block).

Returns:

this block

initialize(init=None, ctx=None, verbose=False, force_reinit=False)[source]

Initializes Parameters of this Block and its children. Equivalent to block.collect_params().initialize(...).

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if the parameter is already initialized.

hybridize(active=True, **kwargs)[source]

Please refer to the description of HybridBlock.hybridize().

cast(dtype)[source]

Cast this Block to use another data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

forward(*args)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

hybrid_forward(*args)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.Parameter(name, grad_req='write', shape=None, dtype=<class 'numpy.float64'>, lr_mult=1.0, wd_mult=1.0, init=None, allow_deferred_init=False, differentiable=True, stype='default', grad_stype='default')[source]

Bases: object

A Container holding parameters (weights) of Blocks. Parameter holds a copy of the parameter on each Context after it is initialized with Parameter.initialize(...). If grad_req is not 'null', it will also hold a gradient array on each Context:

import numpy as np

x = np.zeros((16, 100))
w = Parameter('fc_weight', shape=(16, 100), init=np.random.uniform)
w.initialize()
z = x + w.data()
Parameters:
  • name (str) – Name of this parameter.

  • grad_req ({'write', 'add', 'null'}, default 'write') – Specifies how to update the gradient array: 'write' means the gradient is written to the grad NDArray every time; 'add' means the gradient is added to the grad NDArray every time, and you need to call zero_grad() manually to clear the gradient buffer before each iteration; 'null' means no gradient is requested for this parameter, and gradient arrays will not be allocated.

  • shape (int or tuple of int, default None) – Shape of this parameter. By default, the shape is not specified. A Parameter with unknown shape can be used for the Symbol API, but init will throw an error when using the NDArray API.

  • dtype (numpy.dtype or str, default 'float64') – Data type of this parameter. For example, numpy.float64 or 'float64'.

  • lr_mult (float, default 1.0) – Learning rate multiplier. The learning rate will be multiplied by lr_mult when updating this parameter with an optimizer.

  • wd_mult (float, default 1.0) – Weight decay multiplier (L2 regularizer coefficient). Works similarly to lr_mult.

  • init (Initializer, default None) – Initializer of this parameter. Will use the global initializer by default.

  • stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter.

  • grad_stype ({'default', 'row_sparse', 'csr'}, default 'default') – The storage type of the parameter's gradient.

Attributes:
  • grad_req ({'write', 'add', 'null'}) – This can be set before or after initialization. Setting grad_req to 'null' with x.grad_req = 'null' saves memory and computation when you don't need the gradient w.r.t. x.

  • lr_mult (float) – Local learning rate multiplier for this Parameter. The actual learning rate is calculated with learning_rate * lr_mult. You can set it with param.lr_mult = 2.0.

  • wd_mult (float) – Local weight decay multiplier for this Parameter.

property grad_req
property dtype

The type of the parameter. Setting the dtype value is equivalent to casting the value of the parameter

property shape

The shape of the parameter. By default, an unknown dimension size is 0. However, when the NumPy semantic is turned on, unknown dimension size is -1.

initialize(init=None, ctx=None, default_init=None, force_reinit=False)[source]

Initializes parameter and gradient arrays. Only used for the NDArray API.

Parameters:
  • init (Initializer) – The initializer to use. Overrides Parameter.init() and default_init.

  • ctx (Context or list of Context, defaults to context.current_context()) – Initialize Parameter on given context. If ctx is a list of Context, a copy will be made for each context. Note: copies are independent arrays; the user is responsible for keeping their values consistent when updating. Normally gluon.Trainer does this for you.

  • default_init (Initializer) – Default initializer used when both init() and Parameter.init() are None.

  • force_reinit (bool, default False) – Whether to force re-initialization if the parameter is already initialized.

Examples
>>> weight = mx.gluon.Parameter('weight', shape=(2, 2))
>>> weight.initialize(ctx=mx.cpu(0))
>>> weight.data()
[[-0.01068833  0.01729892]
 [ 0.02042518 -0.01618656]]
<NDArray 2x2 @cpu(0)>
>>> weight.grad()
[[ 0.  0.]
 [ 0.  0.]]
<NDArray 2x2 @cpu(0)>
>>> weight.initialize(ctx=[mx.gpu(0), mx.gpu(1)])
>>> weight.data(mx.gpu(0))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(0)>
>>> weight.data(mx.gpu(1))
[[-0.00873779 -0.02834515]
 [ 0.05484822 -0.06206018]]
<NDArray 2x2 @gpu(1)>
reset_ctx(ctx)[source]

Re-assign Parameter to other contexts.

Parameters:

ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.

set_data(data)[source]

Sets this parameter’s value on all contexts.

data(ctx=None)[source]

Returns a copy of this parameter on one context. Must have been initialized on this context before. For sparse parameters, use Parameter.row_sparse_data() instead.

Parameters:

ctx (Context) – Desired context.

Returns:

NDArray on ctx

list_data()[source]

Returns copies of this parameter on all contexts, in the same order as creation. For sparse parameters, use Parameter.list_row_sparse_data() instead.

Returns:

List of NDArrays

grad(ctx=None)[source]

Returns a gradient buffer for this parameter on one context.

Parameters:

ctx (Context) – Desired context.

list_grad()[source]

Returns gradient buffers on all contexts, in the same order as values().

list_ctx()[source]

Returns a list of contexts this parameter is initialized on.

zero_grad()[source]

Sets gradient buffer on all contexts to 0. No action is taken if parameter is uninitialized or doesn’t require gradient.

cast(dtype)[source]

Cast data and gradient of this Parameter to a new data type.

Parameters:

dtype (str or numpy.dtype) – The new data type.

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon.ParameterDict(prefix='', shared=None)[source]

Bases: object

A dictionary managing a set of parameters.

Parameters:
  • prefix (str, default '') – The prefix to be prepended to the names of all Parameters created by this dict.

  • shared (ParameterDict or None) – If not None, when this dict's get() method creates a new parameter, it will first try to retrieve it from the "shared" dict. Usually used for sharing parameters with another Block.

items()[source]
keys()[source]
values()[source]
property prefix

Prefix of this dict. It will be prepended to the names of Parameters created with get().

get(name, **kwargs)[source]

Retrieves a Parameter with name self.prefix+name. If not found, get() will first try to retrieve it from the "shared" dict. If still not found, get() will create a new Parameter with the keyword arguments and insert it into self.

Parameters:
  • name (str) – Name of the desired Parameter. It will be prepended with this dictionary's prefix.

  • **kwargs (Dict[str, Any]) – The rest of the keyword arguments for the created Parameter.

Returns:

The created or retrieved Parameter.

update(other)[source]

Copies all Parameters in other to self.

initialize(init=None, ctx=None, verbose=False, force_reinit=False)[source]

Initializes all Parameters managed by this dictionary to be used for the NDArray API. It has no effect when using the Symbol API.

Parameters:
  • init (Initializer) – Global default Initializer to be used when Parameter.init() is None. Otherwise, Parameter.init() takes precedence.

  • ctx (Context or list of Context) – Keeps a copy of Parameters on one or many context(s).

  • verbose (bool, default False) – Whether to verbosely print out details on initialization.

  • force_reinit (bool, default False) – Whether to force re-initialization if the parameter is already initialized.

reset_ctx(ctx)[source]

Re-assign all Parameters to other contexts.

Parameters:

ctx (Context or list of Context, default context.current_context()) – Assign Parameter to given context. If ctx is a list of Context, a copy will be made for each context.

list_ctx()[source]

Returns a list of all the contexts on which the underlying Parameters are initialized.

setattr(name, value)[source]

Set an attribute to a new value for all Parameters. For example, set grad_req to null if you don’t need gradient w.r.t a model’s Parameters:

model.collect_params().setattr('grad_req', 'null')

or change the learning rate multiplier:

model.collect_params().setattr('lr_mult', 0.5)

Parameters:
  • name (str) – Name of the attribute.

  • value (valid type for attribute name) – The new value for the attribute.

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.ConstantPositiveVector(param_name, encoding, size_cols, **kwargs)[source]

Bases: Block

Represents constant vector, with positive entry value represented as Gluon parameter, to be used in the context of wrapper classes in gluon_blocks.py. Shape, dtype, and context are determined from the features argument:

  • If features.shape = (n, d): shape = (d, 1) if size_cols = True (number of columns of features), or shape = (n, 1) if size_cols = False (number of rows of features)

  • dtype = features.dtype, ctx = features.ctx

Encoding and internal Gluon parameter: The positive scalar parameter is encoded via encoding (see ScalarEncodingBase). The internal Gluon parameter (before encoding) has the name param_name + "_internal".

forward(features, param_internal)[source]

Returns constant positive vector

If features.shape = (n, d), the shape of the vector returned is (d, 1) if size_cols = True, (n, 1) otherwise.

Parameters:
  • features – Matrix for shape, dtype, ctx

  • param_internal – Unwrapped parameter

Returns:

Constant positive vector

set(val)[source]
get()[source]
get_box_constraints_internal()[source]
log_parameters()[source]
get_parameters()[source]
switch_updating(flag)[source]

Is the underlying parameter updated during learning?

By default, the parameter takes part in learning (its grad_req attribute is ‘write’). For flag == False, the attribute is flipped to ‘null’, and the parameter remains constant during learning.

Parameters:

flag – Update parameter during learning?

has_regularizer()[source]
eval_regularizer(features)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.PositiveScalarEncoding(lower, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]

Bases: ScalarEncodingBase

Provides encoding for positive scalar and vector: param > lower. Here, param is represented as a gluon.Parameter with shape (dimension,), where dimension is 1 by default.

The encoding is given as:

param = softrelu(param_internal) + lower, softrelu(x) = log(1 + exp(x))

If constr_upper is used, the constraint

param_internal < dec(constr_upper)

can be enforced by an optimizer. Since dec is increasing, this translates to param < constr_upper. Note: While lower is enforced by the encoding, the upper bound is not; it has to be enforced by an optimizer.

get(param_internal)[source]
decode(val, name)[source]
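
A small, self-contained sketch of the encoding and its inverse as stated above; the class itself operates on Gluon parameters, and the function names here are illustrative:

import numpy as np

def encode(param_internal, lower):
    return np.log1p(np.exp(param_internal)) + lower   # softrelu(x) + lower

def decode(param, lower):
    return np.log(np.expm1(param - lower))            # inverse of softrelu

x = 0.7
assert np.isclose(decode(encode(x, lower=1e-3), lower=1e-3), x)
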
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.IdentityScalarEncoding(constr_lower=None, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]

Bases: ScalarEncodingBase

Identity encoding for scalar and vector:

param = param_internal

This does not ensure that param is positive! Use this only if positivity is otherwise guaranteed.

get(param_internal)[source]
decode(val, name)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.LogarithmScalarEncoding(constr_lower=None, constr_upper=None, init_val=None, regularizer=None, dimension=1)[source]

Bases: ScalarEncodingBase

Logarithmic encoding for scalar and vector:

param = exp(param_internal), param_internal = log(param)

get(param_internal)[source]
decode(val, name)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.unwrap_parameter(param_internal, some_arg=None)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.encode_unwrap_parameter(param_internal, encoding, some_arg=None)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.param_to_pretty_string(gluon_param, encoding)[source]

Take a gluon parameter and transform it to a string amenable to plotting. If need be, the gluon parameter is appropriately encoded (e.g., log-exp transform).

Parameters:
  • gluon_param (Parameter) – gluon parameter

  • encoding (ScalarEncodingBase) – object in charge of encoding/decoding the gluon_param

Return type:

str

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.register_parameter(params, name, encoding, shape=(1, ), dtype=<class 'numpy.float64'>)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gluon_blocks_helpers.create_encoding(encoding_name, init_val, constr_lower, constr_upper, dimension, prior)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model.GaussianProcessModel(random_seed=None)[source]

Bases: object

Base class for Gaussian-linear models which support parameter fitting and prediction.

property random_state: RandomState
property states: List[PosteriorState] | None
Returns:

Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)

fit(data)[source]

Adjust model parameters based on training data data. Can be done via optimization or MCMC sampling. The posterior states are computed at the end as well.

Parameters:

data (Dict[str, Any]) – Training data

recompute_states(data)[source]

Recomputes posterior states for current model parameters.

Parameters:

data (Dict[str, Any]) – Training data

predict(features_test)[source]

Compute the posterior mean(s) and variance(s) for the points in features_test. If the posterior state is based on m target vectors, a (n, m) matrix is returned for posterior means.

Parameters:

features_test (ndarray) – Data matrix X_test of size (n, d) (type np.ndarray) for which n predictions are made

Returns:

posterior_means, posterior_variances

multiple_targets()[source]
Returns:

Posterior state based on multiple (fantasized) target vectors?

sample_marginals(features_test, num_samples=1)[source]

Draws marginal samples from the predictive distribution at n test points. The samples from the different posterior states are concatenated; let n_states = len(self._states).

If the posterior state is based on m > 1 target vectors, a (n, m, num_samples * n_states) tensor is returned, for m == 1 we return a (n, num_samples * n_states) matrix.

Parameters:
  • features_test (ndarray) – Test input points, shape (n, d)

  • num_samples (int) – Number of samples

Returns:

Samples with shape (n, num_samples * n_states) or (n, m, num_samples * n_states) if m > 1

sample_joint(features_test, num_samples=1)[source]

Draws joint samples from the predictive distribution at n test points. This scales cubically with n. The posterior state must be based on a single target vector (m > 1 is not supported).

Parameters:
  • features_test (ndarray) – Test input points, shape (n, d)

  • num_samples (int) – Number of samples

Returns:

Samples, shape (n, num_samples)

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_model.GaussianProcessOptimizeModel(optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: GaussianProcessModel

Base class for models where parameters are fit by maximizing the marginal likelihood.

property states: List[PosteriorState] | None
Returns:

Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)

property likelihood: MarginalLikelihood
fit(data)[source]

Fit the model parameters by optimizing the marginal likelihood, and set posterior states.

We catch exceptions during the optimization restarts. If any restarts fail, log messages are written. If all restarts fail, the current parameters are not changed.

Parameters:

data (Dict[str, Any]) – Input data

recompute_states(data)[source]

Recomputes posterior states for current model parameters.

Parameters:

data (Dict[str, Any]) – Training data

get_params()[source]
Return type:

Dict[str, Any]

set_params(param_dict)[source]
reset_params()[source]

Reset hyperparameters to their initial values (or resample them).

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression.GaussianProcessRegression(kernel, mean=None, target_transform=None, initial_noise_variance=None, optimization_config=None, random_seed=None, fit_reset_params=True)[source]

Bases: GaussianProcessOptimizeModel

Gaussian Process Regression

Takes as input a mean function (which depends on X only) and a kernel function.

Parameters:
  • kernel (KernelFunction) – Kernel function

  • mean (Optional[MeanFunction]) – Mean function which depends on the input X only (by default, a scalar fitted while optimizing the likelihood)

  • target_transform (Optional[ScalarTargetTransform]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Defaults to the identity

  • initial_noise_variance (Optional[float]) – Initial value for noise variance parameter

  • optimization_config (Optional[OptimizationConfig]) – Configuration that specifies the behavior of the optimization of the marginal likelihood.

  • random_seed – Random seed to be used (optional)

  • fit_reset_params (bool) – Reset parameters to initial values before running ‘fit’? If False, ‘fit’ starts from the current values

property likelihood: MarginalLikelihood
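
A hedged usage sketch; the Matern52 import path and the keys of the training data dict ('features', 'targets') are assumptions, while the constructor, fit, and predict follow the documentation above:

import numpy as np
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gp_regression import (
    GaussianProcessRegression,
)

X = np.random.rand(20, 3)                       # 20 points, 3 encoded hyperparameters
y = np.sin(X.sum(axis=1, keepdims=True))        # toy targets, shape (20, 1)

model = GaussianProcessRegression(kernel=Matern52(dimension=3), random_seed=0)
model.fit({"features": X, "targets": y})        # optimizes the marginal likelihood
prediction = model.predict(np.random.rand(5, 3))  # posterior means and variances
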
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gpr_mcmc module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.gpr_mcmc.GPRegressionMCMC(build_kernel, mcmc_config=MCMCConfig(n_samples=300, n_burnin=250, n_thinning=5), random_seed=None)[source]

Bases: GaussianProcessModel

property states: List[GaussProcPosteriorState] | None
Returns:

Current posterior states (one per MCMC sample; just a single state if model parameters are optimized)

property number_samples: int
fit(data)[source]

Adjust model parameters based on training data data. Can be done via optimization or MCMC sampling. The posterior states are computed at the end as well.

Parameters:

data (Dict[str, Any]) – Training data

recompute_states(data)[source]

Supports fantasizing, in that targets can be a matrix. Then, ycols = targets.shape[1] must be a multiple of self.number_samples.

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood.MarginalLikelihood(prefix=None, params=None)[source]

Bases: Block

Interface for marginal likelihood of Gaussian-linear model.

get_posterior_state(data)[source]
Return type:

PosteriorState

forward(data)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]

Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings

Return type:

List[tuple]

box_constraints_internal()[source]
Return type:

Dict[str, Tuple[float, float]]

Returns:

Box constraints for all the underlying parameters

get_noise_variance(as_ndarray=False)[source]
get_params()[source]
Return type:

Dict[str, ndarray]

set_params(param_dict)[source]
reset_params(random_state)[source]

Reset hyperparameters to their initial values (or resample them).

data_precomputations(data, overwrite=False)[source]

Some models require precomputations based on data. Precomputed variables are appended to data. This is done only if not already included in data, unless overwrite is True.

Parameters:
  • data (Dict[str, Any]) –

  • overwrite (bool) –

on_fit_start(data)[source]

Called at the beginning of fit.

Parameters:

data (Dict[str, Any]) – Argument passed to fit

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.likelihood.GaussianProcessMarginalLikelihood(kernel, mean=None, target_transform=None, initial_noise_variance=None, encoding_type=None, **kwargs)[source]

Bases: MarginalLikelihood

Marginal likelihood of Gaussian process with Gaussian likelihood

Parameters:
  • kernel (KernelFunction) – Kernel function

  • mean (Optional[MeanFunction]) – Mean function which depends on the input X only (by default, a scalar fitted while optimizing the likelihood)

  • target_transform (Optional[ScalarTargetTransform]) – Invertible transform of target values y to latent values z, which are then modelled as Gaussian. Defaults to the identity

  • initial_noise_variance – A scalar to initialize the value of the residual noise variance

static assert_data_entries(data)[source]
get_posterior_state(data)[source]
Return type:

PosteriorState

forward(data)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]

Return a list of tuples with the Gluon parameters of the likelihood and their respective encodings

Return type:

List[tuple]

get_noise_variance(as_ndarray=False)[source]
get_params()[source]
Return type:

Dict[str, ndarray]

set_params(param_dict)[source]
on_fit_start(data)[source]

Called at the beginning of fit.

Parameters:

data (Dict[str, Any]) – Argument passed to fit

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.MeanFunction(**kwargs)[source]

Bases: Block

Mean function, parameterizing a surrogate model together with a kernel function.

Note: KernelFunction also inherits from this interface.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.ScalarMeanFunction(initial_mean_value=0.0, **kwargs)[source]

Bases: MeanFunction

Mean function defined as a scalar (fitted while optimizing the marginal likelihood).

Parameters:

initial_mean_value – A scalar to initialize the value of the mean

forward(X)[source]

Actual computation of the scalar mean function. We compute mean_value * vector_of_ones, whose dimensions are given by the first column of X.

Parameters:

X – input data of size (n,d) for which we want to compute the mean (here, only useful to extract the right dimension)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_mean_value()[source]
set_mean_value(mean_value)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean.ZeroMeanFunction(**kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.apply_lbfgs(exec_func, param_dict, bounds, **kwargs)[source]

Run SciPy L-BFGS-B on criterion given by autograd code

Run SciPy L-BFGS-B in order to minimize criterion given by autograd code. Criterion and gradient are computed by:

crit_val, gradient = exec_func(param_vec)

Given an autograd expression, use make_scipy_objective to obtain exec_func. param_vec must correspond to the parameter dictionary param_dict via ParamVecDictConverter. The initial param_vec is taken from param_dict, and final values are written back to param_dict (conversions are done by ParamVecDictConverter).

L-BFGS-B allows box constraints [a, b] for any coordinate. Here, None stands for -infinity (a) or +infinity (b). The default is (None, None), so no constraints. In bounds, box constraints can be specified per argument (the constraint applies to all coordinates of the argument). Pass {} for no constraints.

Parameters:
  • exec_func – Function to compute criterion and gradient

  • param_dict – See above

  • bounds – See above

Returns:

None, or dict with info about exception caught
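
For illustration, here is a minimal, self-contained sketch of the exec_func contract, applied to a toy quadratic and passed directly to SciPy's L-BFGS-B (in Syne Tune, exec_func would instead be produced by make_scipy_objective from autograd code):

import numpy as np
from scipy.optimize import minimize

def exec_func(param_vec):
    # Returns (criterion value, gradient), as expected by the L-BFGS-B driver
    crit_val = float(np.sum((param_vec - 1.0) ** 2))
    gradient = 2.0 * (param_vec - 1.0)
    return crit_val, gradient

x0 = np.zeros(3)
bounds = [(None, None), (0.0, None), (None, 2.0)]  # per-coordinate box constraints
result = minimize(exec_func, x0, jac=True, method="L-BFGS-B", bounds=bounds)
print(result.x)  # close to [1, 1, 1]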

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.apply_lbfgs_with_multiple_starts(exec_func, param_dict, bounds, random_state, n_starts=5, **kwargs)[source]

When dealing with non-convex problems (e.g., optimizing the marginal likelihood), we typically need to start from various starting points. This function applies this logic around apply_lbfgs, randomizing the starting points around the initial values provided in param_dict (see below “copy_of_initial_param_dict”).

The first optimization happens exactly at param_dict, so that the case n_starts=1 exactly coincides with the previously used apply_lbfgs. Importantly, the communication with the L-BFGS solver happens via param_dict, hence all the operations with respect to param_dict are inplace.

We catch exceptions and return ret_infos about these. If none of the restarts worked, param_dict is not modified.

Parameters:
  • exec_func – see above

  • param_dict – see above

  • bounds – see above

  • random_state – RandomState for sampling

  • n_starts – Number of times we start an optimization with L-BFGS (must be >= 1)

Returns:

List ret_infos of length n_starts. Entry is None if optimization worked, or otherwise has dict with info about exception caught
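
A minimal sketch of the multi-start pattern described above, written directly against scipy.optimize.minimize; the helper name lbfgs_with_restarts and the Gaussian perturbation of the starting point are illustrative assumptions, not Syne Tune internals:

import numpy as np
from scipy.optimize import minimize

def lbfgs_with_restarts(exec_func, x0, bounds, random_state, n_starts=5, perturb_scale=0.1):
    best, ret_infos = None, []
    for i in range(n_starts):
        # The first start is exactly at x0; later starts are randomized around it
        start = x0 if i == 0 else x0 + perturb_scale * random_state.standard_normal(x0.shape)
        try:
            res = minimize(exec_func, start, jac=True, method="L-BFGS-B", bounds=bounds)
            ret_infos.append(None)
            if best is None or res.fun < best.fun:
                best = res
        except Exception as ex:
            ret_infos.append({"type": type(ex).__name__, "msg": str(ex)})
    return best, ret_infos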

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.add_regularizer_to_criterion(criterion, crit_args)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.optimization_utils.create_lbfgs_arguments(criterion, crit_args, verbose=False)[source]

Creates SciPy optimizer objective and param_dict for criterion function.

Parameters:
  • criterion (MarginalLikelihood) – Learning criterion (nullary)

  • crit_args (list) – Arguments for criterion.forward

Returns:

scipy_objective, param_dict

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.PosteriorState[source]

Bases: object

Interface for posterior state of Gaussian-linear model.

property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]
Return type:

ndarray

Returns:

Negative log marginal likelihood

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Marginal samples, (num_test, num_samples)

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this has to be called for every sample.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.PosteriorStateWithSampleJoint[source]

Bases: PosteriorState

sample_joint(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Joint samples, (num_test, num_samples)

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.GaussProcPosteriorState(features, targets, mean, kernel, noise_variance, debug_log=False, **kwargs)[source]

Bases: PosteriorStateWithSampleJoint

Represent posterior state for Gaussian process regression model. Note that members are immutable. If the posterior state is to be updated, a new object is created and returned.

property num_data
property num_features
property num_fantasies
neg_log_likelihood()[source]

Works only if fantasy samples are not used (single targets vector).

Return type:

ndarray

predict(test_features)[source]

Computes marginal statistics (means, variances) for a number of test features.

Parameters:

test_features (ndarray) – Features for test configs

Return type:

Tuple[ndarray, ndarray]

Returns:

posterior_means, posterior_variances

sample_marginals(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Marginal samples, (num_test, num_samples)

backward_gradient(input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this has to be called for every sample.

The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.

Parameters:
  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

sample_joint(test_features, num_samples=1, random_state=None)[source]

See comments of predict.

Parameters:
  • test_features (ndarray) – Input points for test configs

  • num_samples (int) – Number of samples

  • random_state (Optional[RandomState]) – PRNG

Return type:

ndarray

Returns:

Joint samples, (num_test, num_samples)

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.backward_gradient_given_predict(predict_func, input, head_gradients, mean_data, std_data)[source]

Implements Predictor.backward_gradient, see comments there. This is for a single posterior state. If the Predictor uses MCMC, this has to be called for every sample.

The posterior represented here is based on normalized data, while the acquisition function is based on the de-normalized predictive distribution, which is why we need ‘mean_data’, ‘std_data’ here.

Parameters:
  • predict_func (Callable[[ndarray], Tuple[ndarray, ndarray]]) – Function mapping input x to mean, variance

  • input (ndarray) – Single input point x, shape (d,)

  • head_gradients (Dict[str, ndarray]) – See Predictor.backward_gradient

  • mean_data (float) – Mean used to normalize targets

  • std_data (float) – Stddev used to normalize targets

Return type:

ndarray

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_state.IncrementalUpdateGPPosteriorState(features, targets, mean, kernel, noise_variance, **kwargs)[source]

Bases: GaussProcPosteriorState

Extension of GaussProcPosteriorState which allows for incremental updating, given that a single data case is appended to the training set.

In order not to mutate members, the update method returns a new object.

update(feature, target)[source]
Parameters:
  • feature (ndarray) – Additional input xstar, shape (1, d)

  • target (ndarray) – Additional target ystar, shape (1, m)

Return type:

IncrementalUpdateGPPosteriorState

Returns:

Posterior state for increased data set

sample_and_update(feature, mean_impute_mask=None, random_state=None)[source]

Draw target(s), shape (1, m), from current posterior state, then update state based on these. The main computation of lvec is shared among the two. If mean_impute_mask is given, it is a boolean vector of size m (number of columns of pred_mat). Columns j of target, where mean_impute_mask[j] is true, are set to the predictive mean (instead of being sampled).

Parameters:
  • feature (ndarray) – Additional input xstar, shape (1, d)

  • mean_impute_mask – See above

  • random_state (Optional[RandomState]) – PRN generator

Return type:

(ndarray, IncrementalUpdateGPPosteriorState)

Returns:

target, poster_state_new

expand_fantasies(num_fantasies)[source]

If this posterior has been created with a single targets vector, shape (n, 1), use this to duplicate this vector m = num_fantasies times. Call this method before fantasy targets are appended by update.

Parameters:

num_fantasies (int) – Number m of fantasy samples

Return type:

IncrementalUpdateGPPosteriorState

Returns:

New state with targets duplicated m times

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils module
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_computations(features, targets, mean, kernel, noise_variance, debug_log=False)[source]

Given input matrix X (features), target matrix Y (targets), mean and kernel function, compute posterior state {L, P}, where L is the Cholesky factor of

k(X, X) + sigsq_final * I

and

L P = Y - mean(X)

Here, sigsq_final >= noise_variance is minimal such that the Cholesky factorization does not fail.

Parameters:
  • features – Input matrix X (n, d)

  • targets – Target matrix Y (n, m)

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • noise_variance – Noise variance (may be increased)

  • debug_log (bool) – Debug output during add_jitter CustomOp?

Returns:

L, P
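
A NumPy/SciPy sketch of this computation, assuming the kernel matrix k(X, X) and the mean vector mean(X) have already been evaluated as plain arrays (the jitter loop that increases sigsq_final on failure is omitted):

import numpy as np
from scipy.linalg import solve_triangular

def cholesky_state(kernel_matrix, mean_vector, targets, noise_variance):
    n = kernel_matrix.shape[0]
    # L: Cholesky factor of k(X, X) + sigsq_final * I
    chol_fact = np.linalg.cholesky(kernel_matrix + noise_variance * np.eye(n))
    # P: solution of L P = Y - mean(X), with targets of shape (n, m)
    pred_mat = solve_triangular(chol_fact, targets - mean_vector[:, None], lower=True)
    return chol_fact, pred_mat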

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.predict_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features)[source]

Computes posterior means and variances for test_features. If pred_mat is a matrix, so will be posterior_means, but not posterior_variances. Reflects the fact that for GP regression and fixed hyperparameters, the posterior mean depends on the targets y, but the posterior covariance does not.

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

Returns:

posterior_means, posterior_variances
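
A sketch of the underlying linear algebra, under the assumption that k_train_test = k(X, X_test) of shape (n, n_test), k_test_diag = diag(k(X_test, X_test)), and test_mean = mean(X_test) are precomputed arrays, and that the returned variances are those of the latent function (no observation noise added):

import numpy as np
from scipy.linalg import solve_triangular

def predict_marginals(chol_fact, pred_mat, k_train_test, k_test_diag, test_mean):
    v = solve_triangular(chol_fact, k_train_test, lower=True)    # L^{-1} k(X, X_test)
    posterior_means = v.T @ pred_mat + test_mean[:, None]        # shape (n_test, m)
    posterior_variances = k_test_diag - np.sum(v ** 2, axis=0)   # shape (n_test,)
    return posterior_means, np.maximum(posterior_variances, 1e-12)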

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]

Draws num_samples samples from the product of marginals of the posterior over input points test_features. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

  • num_samples (int) – Number of samples to draw

Returns:

Samples, shape (n_test, num_samples) or (n_test, m, num_samples)

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_joint(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]

Draws num_samples samples from the joint posterior distribution over inputs test_features. This is done by computing mean and covariance matrix of this posterior, and using the Cholesky decomposition of the latter. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

  • num_samples (int) – Number of samples to draw

Returns:

Samples, shape (n_test, num_samples) or (n_test, m, num_samples)

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, target, lvec=None)[source]

Incremental update of posterior state (Cholesky factor, prediction matrix), given one datapoint (feature, target).

Note: noise_variance is the initial value, before any jitter may have been added to compute chol_fact. Here, we add the minimum amount of jitter such that the new diagonal entry of the Cholesky factor is >= MIN_CHOLESKY_DIAGONAL_VALUE. This means that if cholesky_update is used several times, we in fact add a diagonal (but not spherical) jitter matrix.

Parameters:
  • features – Shape (n, d)

  • chol_fact – Shape (n, n)

  • pred_mat – Shape (n, m)

  • mean (MeanFunction) –

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) –

  • noise_variance

  • feature – Shape (1, d)

  • target – Shape (1, m)

  • lvec – If given, this is the new column of the Cholesky factor except the diagonal entry. If not, this is computed here

Returns:

chol_fact_new (n+1, n+1), pred_mat_new (n+1, m)
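
A sketch of the rank-one extension, assuming k_train_new = k(X, feature) and k_new_new = k(feature, feature) are precomputed, and using a small fixed floor on the new diagonal entry in place of MIN_CHOLESKY_DIAGONAL_VALUE:

import numpy as np
from scipy.linalg import solve_triangular

def cholesky_update_sketch(chol_fact, pred_mat, k_train_new, k_new_new,
                           noise_variance, target_new, mean_new):
    lvec = solve_triangular(chol_fact, k_train_new, lower=True)     # new off-diagonal column
    diag = np.sqrt(max(float(k_new_new + noise_variance - lvec @ lvec), 1e-10))
    n = chol_fact.shape[0]
    chol_new = np.block([[chol_fact, np.zeros((n, 1))],
                         [lvec[None, :], np.array([[diag]])]])
    # Last row of L_new P_new = Y_new - mean(X_new) determines the new row of P
    p_new = (target_new - mean_new - lvec @ pred_mat) / diag
    pred_new = np.vstack([pred_mat, p_new[None, :]])
    return chol_new, pred_new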

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_and_cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, random_state, mean_impute_mask=None)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.negative_log_marginal_likelihood(chol_fact, pred_mat)[source]

The marginal likelihood is only computed if pred_mat has a single column (not for fantasy sample case).
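
With {L, P} as defined for cholesky_computations() above, the criterion reduces to the standard Gaussian expression; a sketch, assuming a single target column:

import numpy as np

def neg_log_marginal_likelihood_sketch(chol_fact, pred_mat):
    n = pred_mat.shape[0]
    quad_term = 0.5 * float(pred_mat[:, 0] @ pred_mat[:, 0])    # 0.5 (y - m)^T K^{-1} (y - m)
    log_det_term = float(np.sum(np.log(np.diag(chol_fact))))    # 0.5 log det K
    return quad_term + log_det_term + 0.5 * n * np.log(2.0 * np.pi)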

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.SliceSampler(log_density, scale, random_state)[source]

Bases: object

sample(init_sample, num_samples, burn, thin)[source]
Return type:

List[ndarray]

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.gen_random_direction(dimension, random_state)[source]
Return type:

ndarray

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.slice_sampler_step_out(log_pivot, scale, sliced_log_density, random_state)[source]
Return type:

Tuple[float, float]

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.slice.slice_sampler_step_in(lower_bound, upper_bound, log_pivot, sliced_log_density, random_state)[source]

Find the right amount of movement along with a random_direction

Return type:

float
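
For orientation, a minimal, self-contained univariate slice sampler with the step-out and step-in (shrinkage) phases that the functions above implement for the multivariate case; names and defaults here are illustrative only:

import numpy as np

def slice_sample_1d(log_density, init_sample, scale, num_samples, random_state, burn=100, thin=2):
    samples, x = [], float(init_sample)
    for it in range(burn + num_samples * thin):
        log_pivot = log_density(x) + np.log(random_state.uniform())   # slice level under the density
        # Step out: grow the bracket until both ends fall below the slice
        lower = x - scale * random_state.uniform()
        upper = lower + scale
        while log_density(lower) > log_pivot:
            lower -= scale
        while log_density(upper) > log_pivot:
            upper += scale
        # Step in (shrinkage): propose until a point lies inside the slice
        while True:
            candidate = random_state.uniform(lower, upper)
            if log_density(candidate) > log_pivot:
                x = candidate
                break
            if candidate < x:
                lower = candidate
            else:
                upper = candidate
        if it >= burn and (it - burn) % thin == 0:
            samples.append(x)
    return samples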

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.ScalarTargetTransform(**kwargs)[source]

Bases: MeanFunction

Interface for invertible transforms of scalar target values.

forward() maps original target values \(y\) to latent target values \(z\); the latter are typically modelled as Gaussian. negative_log_jacobian() returns the term to be added to \(-\log P(z)\), where \(z\) is mapped from \(y\), in order to obtain \(-\log P(y)\).

forward(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Transformed latent target vector \(z\)

negative_log_jacobian(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)

inverse(latents)[source]
Parameters:

latents – Latent target vector \(z\)

Returns:

Corresponding target vector \(y\)

on_fit_start(targets)[source]

This is called just before the surrogate model optimization starts.

Parameters:

targets – Target vector \(y\) in original form

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.IdentityTargetTransform(**kwargs)[source]

Bases: ScalarTargetTransform

forward(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Transformed latent target vector \(z\)

negative_log_jacobian(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)

inverse(latents)[source]
Parameters:

latents – Latent target vector \(z\)

Returns:

Corresponding target vector \(y\)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.target_transform.BoxCoxTargetTransform(initial_boxcox_lambda=None, **kwargs)[source]

Bases: ScalarTargetTransform

The Box-Cox transform for \(y > 0\) is parameterized in terms of \(\lambda\):

\[ \begin{align}\begin{aligned}z = T(y, \lambda) = \frac{y^{\lambda} - 1}{\lambda},\quad \lambda\ne 0\\T(y, \lambda=0) = \log y\end{aligned}\end{align} \]

One difficulty is that expressions involve division by \(\lambda\). Our implementation separates between (1) \(\lambda \ge \varepsilon\), (2) \(\lambda\le -\varepsilon\), and (3) \(-\varepsilon < \lambda < \varepsilon\), where \(\varepsilon\) is BOXCOX_LAMBDA_EPS. In case (3), we use the approximation \(z \approx u + \lambda u^2/2\), where \(u = \log y\).

Note that we require \(1 + z\lambda > 0\), which restricts \(z\) if \(\lambda\ne 0\).

Note

Targets must be positive. They are thresholded at BOXCOX_TARGET_THRES, so negative targets do not raise an error.

The Box-Cox transform has been proposed in the context of Bayesian optimization by

Cowen-Rivers, A. et.al.
HEBO: Pushing the Limits of Sample-efficient Hyper-parameter Optimisation
Journal of Artificial Intelligence Research 74 (2022), 1269-1349

However, they decouple the transformation of targets from fitting the remaining surrogate model parameters, which is possible only under a simplifying assumption (namely, that targets after transform are modelled i.i.d. by a single univariate Gaussian). Instead, we treat \(\lambda\) as just one more parameter to fit along with all the others.
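
A sketch of the forward and inverse maps with the small-\(\lambda\) approximation, where the epsilon and the positivity threshold are illustrative stand-ins for BOXCOX_LAMBDA_EPS and BOXCOX_ZLAMBDA_THRES:

import numpy as np

LAMBDA_EPS = 1e-6   # stand-in for BOXCOX_LAMBDA_EPS

def boxcox_forward(y, lam):
    u = np.log(y)
    if abs(lam) < LAMBDA_EPS:
        return u + lam * u ** 2 / 2.0                 # approximation for lambda close to 0
    return (np.power(y, lam) - 1.0) / lam

def boxcox_inverse(z, lam):
    if abs(lam) < LAMBDA_EPS:
        return np.exp(z * (1.0 - z * lam / 2.0))
    zl = np.maximum(1.0 + z * lam, 1e-10)             # enforce 1 + z * lambda > 0
    return np.exp(np.log(zl) / lam)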

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_boxcox_lambda()[source]
set_boxcox_lambda(boxcox_lambda)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

negative_log_jacobian(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Term to add to \(-\log P(z)\) to obtain \(-\log P(y)\)

forward(targets)[source]
Parameters:

targets – Target vector \(y\) in original form

Returns:

Transformed latent target vector \(z\)

inverse(latents)[source]

The inverse is \(\exp( \log(1 + z\lambda) / \lambda )\). For \(\lambda\approx 0\), we use \(\exp( z (1 - z\lambda/2) )\).

We also need \(1 + z\lambda > 0\), so we use the maximum of \(z\lambda\) and BOXCOX_ZLAMBDA_THRES.

on_fit_start(targets)[source]

We only optimize boxcox_lambda once there are no less than BOXCOX_LAMBDA_OPT_MIN_NUMDATA data points. Otherwise, it remains fixed to its initial value.

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping module
class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.Warping(dimension, coordinate_range=None, encoding_type='logarithm', **kwargs)[source]

Bases: MeanFunction

Warping transform on contiguous range of feature \(x\). Each warped coordinate has two independent warping parameters.

If \(x = [x_1, \dots, x_d]\) and coordinate_range = (l, r), the warping transform operates on \([x_l, \dots, x_{r-1}]\). The default for coordinate_range is the full range, and we must have l < r. The block is the identity on all remaining coordinates. Input coordinates are assumed to lie in \([0, 1]\). The warping transform on each coordinate is due to Kumaraswamy:

\[warp(x_j) = 1 - (1 - r(x_j)^{a_j})^{b_j}.\]

Here, \(r(x_j)\) linearly maps \([0, 1]\) to \([\epsilon, 1 - \epsilon]\) for a small \(\epsilon > 0\), which avoids numerical issues when taking derivatives.
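
A NumPy sketch of the per-coordinate Kumaraswamy warp, with an illustrative value for the small epsilon:

import numpy as np

EPS = 1.0e-7   # illustrative value of the small epsilon used by r(x_j)

def kumaraswamy_warp(x, a, b):
    # x in [0, 1]; a, b > 0 are the two warping parameters of the coordinate
    r = EPS + (1.0 - 2.0 * EPS) * x
    return 1.0 - np.power(1.0 - np.power(r, a), b)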

Parameters:
  • dimension (int) – Dimension \(d\) of input

  • coordinate_range (Optional[Tuple[int, int]]) – Range (l, r), see above. Default is (0, dimension), so the full range

  • encoding_type (str) – Encoding type

forward(x)[source]

Actual computation of the warping transformation (see details above)

Parameters:

x – Input data, shape (n, d)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.warpings_for_hyperparameters(hp_ranges)[source]

It is customary to warp hyperparameters which are not categorical. This function creates warpings based on your configuration space.

Parameters:

hp_ranges (HyperparameterRanges) – Encoding of configuration space

Return type:

List[Warping]

Returns:

To be used as warpings in WarpedKernel

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.kernel_with_warping(kernel, hp_ranges)[source]

Note that the coordinates corresponding to categorical parameters are not warped.

Parameters:
  • kernel (KernelFunction) – Kernel function to be furnished with warping

  • hp_ranges (HyperparameterRanges) – Encoding of configuration space

Return type:

KernelFunction

Returns:

Kernel with warping

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.warping.WarpedKernel(kernel, warpings, **kwargs)[source]

Bases: KernelFunction

Block that composes warping with an arbitrary kernel. We allow for a list of warping transforms, so that a non-contiguous set of input coordinates can be warped.

It is customary to warp hyperparameters which are not categorical. You can use kernel_with_warping() to furnish a kernel with warping for all non-categorical hyperparameters.

Parameters:
  • kernel (KernelFunction) – Kernel \(k(x, x')\)

  • warpings (List[Warping]) – List of warping transforms, which are applied sequentially. Ranges of different entries should be non-overlapping, this is not checked.

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args (list of NDArray) – Input tensors.

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Returns:

syne_tune.optimizer.schedulers.searchers.bayesopt.models package
Subpackages
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model.CostValue(c0, c1)[source]

Bases: object

Represents cost value \((c_0(x), c_1(x))\):

  • \(c_0(x)\): Startup cost for evaluation at config \(x\)

  • \(c_1(x)\): Cost per unit of resource \(r\) at config \(x\)

Our assumption is that, under the model, an evaluation at \(x\) until resource level \(r = 1, 2, 3, \dots\) costs \(c(x, r) = c_0(x) + r c_1(x)\)

c0: float
c1: float
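
For example, evaluating the assumed cost model \(c(x, r) = c_0(x) + r c_1(x)\) for a few resource levels (the numbers are illustrative):

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model import CostValue

cost = CostValue(c0=30.0, c1=2.5)          # e.g., 30s startup cost, 2.5s per epoch
for r in (1, 4, 16):
    print(r, cost.c0 + r * cost.c1)        # c(x, r) = c0(x) + r * c1(x)
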
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.cost_model.CostModel[source]

Bases: object

Interface for (temporal) cost model in the context of multi-fidelity HPO. We assume there are configurations \(x\) and resource levels \(r\) (for example, number of epochs). Here, \(r\) is a positive int. This can be seen as a simplified version of a surrogate model, which is mainly used to draw (jointly dependent) values from the posterior over cost values \((c_0(x), c_1(x))\).

Note: The model may be random (in which case joint samples are drawn from the posterior) or deterministic (in which case the model is fitted to data, and the cost values returned are deterministic).

A cost model has an inner state, which is set by calling update() passing a dataset. This inner state is then used when sample_joint() is called.

property cost_metric_name: str
Returns:

Name of metric in TrialEvaluations of cases in TuningJobState

update(state)[source]

Update inner representation in order to be ready to return cost value samples.

Note: The metric cost_metric_name must be dict-valued in state, with keys being resource values \(r\). In order to support a proper estimation of \(c_0\) and \(c_1\), there should (ideally) be entries with the same \(x\) and different resource levels \(r\). The likelihood function takes into account that \(c(x, r) = c_0(x) + r c_1(x)\).

Parameters:

state (TuningJobState) – Current dataset (only trials_evaluations is used)

resample()[source]

For a random cost model, the state is resampled, such that calls of sample_joint() before and after are conditionally independent. Normally, successive calls of sample_joint are jointly dependent. For example, for a linear model, the state resampled here would be the weight vector, which is then used in sample_joint().

For a deterministic cost model, this method does nothing.

sample_joint(candidates)[source]

Draws cost values \((c_0(x), c_1(x))\) for candidates (non-extended).

If the model is random, the sampling is done jointly. Also, if sample_joint() is called multiple times, the posterior is to be updated after each call, such that the sample over the union of candidates over all calls is drawn jointly (but see resample()). Also, if measurement noise is allowed in update, this noise is not added here. A sample from \(c(x, r)\) is obtained as \(c_0(x) + r c_1(x)\). If the model is deterministic, the model determined in update() is just evaluated.

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – Non-extended configs

Return type:

List[CostValue]

Returns:

List of \((c_0(x), c_1(x))\)

static event_time(start_time, level, next_milestone, cost)[source]

If a task reported its last recent value at start_time at level level, return time of reaching level next_milestone, given cost cost.

Parameters:
  • start_time (float) – See above

  • level (int) – See above

  • next_milestone (int) – See above

  • cost (CostValue) – See above

Return type:

float

Returns:

Time of reaching next_milestone under cost model

predict_times(candidates, resources, cost_values, start_time=0)[source]

Given configs \(x\), resource values \(r\) and cost values returned by sample_joint(), compute time predictions for when each config \(x\) reaches its resource level \(r\) if started at start_time.

Parameters:
  • candidates (List[Dict[str, Union[int, float, str]]]) – Configs

  • resources (List[int]) – Resource levels

  • cost_values (List[CostValue]) – Cost values from sample_joint()

  • start_time (float) – See above

Return type:

List[float]

Returns:

Predicted times

syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.LinearCostModel[source]

Bases: CostModel

Deterministic cost model where both c0(x) and c1(x) are linear models of the form

c0(x) = np.dot(features0(x), weights0),
c1(x) = np.dot(features1(x), weights1)

The feature maps features0, features1 are supplied by subclasses. The weights are fit by ridge regression, using scikit-learn's RidgeCV; the regularization constant is set by leave-one-out cross-validation.
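
A sketch of the fitting step on synthetic feature matrices, using scikit-learn's RidgeCV with its default leave-one-out scheme (the feature maps and data are made up for illustration):

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
features0 = rng.uniform(size=(50, 3))      # stands in for features0(x), one row per config
features1 = rng.uniform(size=(50, 4))      # stands in for features1(x)
targets0 = features0 @ np.array([1.0, 0.5, 2.0]) + 0.01 * rng.standard_normal(50)
targets1 = features1 @ np.array([0.2, 0.1, 0.3, 0.4]) + 0.01 * rng.standard_normal(50)

model0 = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(features0, targets0)
model1 = RidgeCV(alphas=np.logspace(-3, 3, 13)).fit(features1, targets1)
# For a new config x with feature rows f0, f1: c0(x) = model0.predict(f0), c1(x) = model1.predict(f1)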

property cost_metric_name: str
Returns:

Name of metric in TrialEvaluations of cases in TuningJobState

feature_matrices(candidates)[source]

Has to be supplied by subclasses

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – List of n candidate configs (non-extended)

Return type:

(ndarray, ndarray)

Returns:

Feature matrices features0 (n, dim0), features1 (n, dim1)

update(state)[source]

Update inner representation in order to be ready to return cost value samples.

Note: The metric cost_metric_name must be dict-valued in state, with keys being resource values \(r\). In order to support a proper estimation of \(c_0\) and \(c_1\), there should (ideally) be entries with the same \(x\) and different resource levels \(r\). The likelihood function takes into account that \(c(x, r) = c_0(x) + r c_1(x)\).

Parameters:

state (TuningJobState) – Current dataset (only trials_evaluations is used)

sample_joint(candidates)[source]

Draws cost values \((c_0(x), c_1(x))\) for candidates (non-extended).

If the model is random, the sampling is done jointly. Also, if sample_joint() is called multiple times, the posterior is to be updated after each call, such that the sample over the union of candidates over all calls is drawn jointly (but see resample()). Also, if measurement noise is allowed in update, this noise is not added here. A sample from \(c(x, r)\) is obtained as \(c_0(x) + r c_1(x)\). If the model is deterministic, the model determined in update() is just evaluated.

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – Non-extended configs

Return type:

List[CostValue]

Returns:

List of \((c_0(x), c_1(x))\)

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.MLPLinearCostModel(num_inputs, num_outputs, num_hidden_layers, hidden_layer_width, batch_size, bs_exponent=None, extra_mlp=False, c0_mlp_feature=False, expected_hidden_layer_width=None)[source]

Bases: LinearCostModel

Deterministic linear cost model for multi-layer perceptron.

If config is a HP configuration, num_hidden_layers(config) is the number of hidden layers, hidden_layer_width(config, layer) is the number of units in hidden layer layer (0-based), batch_size(config) is the batch size.

If expected_hidden_layer_width is given, it maps layer (0-based) to expected layer width under random sampling. In this case, all MLP features are normalized to expected value 1 under random sampling (but ignoring bs_exponent if != 1). Note: If needed, we could incorporate bs_exponent in general. If batch_size was uniform between a and b:

\[ \mathrm{E}\left[ bs^{bs_{exp} - 1} \right] = \frac{ b^{bs_{exp}} - a^{bs_{exp}} }{ bs_{exp} \, (b - a) } \]
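
A quick numerical check of this expectation for an illustrative uniform range of batch sizes:

import numpy as np

a, b, bs_exp = 16.0, 256.0, 0.5
closed_form = (b ** bs_exp - a ** bs_exp) / (bs_exp * (b - a))
monte_carlo = np.mean(np.random.default_rng(0).uniform(a, b, size=100_000) ** (bs_exp - 1.0))
print(closed_form, monte_carlo)   # the two estimates should agree closely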

feature_matrices(candidates)[source]

Has to be supplied by subclasses

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – List of n candidate configs (non-extended)

Return type:

(ndarray, ndarray)

Returns:

Feature matrices features0 (n, dim0), features1 (n, dim1)

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.FixedLayersMLPCostModel(num_inputs, num_outputs, num_units_keys=None, bs_exponent=None, extra_mlp=False, c0_mlp_feature=False, expected_hidden_layer_width=None)[source]

Bases: MLPLinearCostModel

Linear cost model for MLP with num_hidden_layers hidden layers.

static get_expected_hidden_layer_width(config_space, num_units_keys)[source]

Constructs expected_hidden_layer_width function from the training evaluation function. Works because impute_points_to_evaluate imputes with the expected value under random sampling.

Parameters:
  • config_space (Dict) – Configuration space

  • num_units_keys (List[str]) – Keys into config_space for number of units of different layers

Returns:

expected_hidden_layer_width, exp_vals

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.NASBench201LinearCostModel(config_keys, map_config_values, conv_separate_features, count_sum)[source]

Bases: LinearCostModel

Deterministic linear cost model for NASBench201.

The cell graph is:

node1 = x0(node0)
node2 = x1(node0) + x2(node1)
node3 = x3(node0) + x4(node1) + x5(node2)

config_keys contains attribute names of x0, ..., x5 in a config, in this ordering. map_config_values maps values in the config (for fields corresponding to x0, ..., x5) to entries of Op.

Parameters:
  • config_keys (Tuple[str, ...]) – See above

  • map_config_values (Dict[str, int]) – See above

  • conv_separate_features (bool) – If True, we use separate features for nor_conv_1x1, nor_conv_3x3 (c1 has 4 features). Otherwise, these two are captured by a single feature (c1 has 3 features)

  • count_sum (bool) – If True, we use an additional feature for pointwise sum operators inside a cell (there are between 0 and 3)

class Op(value)[source]

Bases: IntEnum

An enumeration.

SKIP_CONNECT = 0
NONE = 1
NOR_CONV_1x1 = 2
NOR_CONV_3x3 = 3
AVG_POOL_3x3 = 4
feature_matrices(candidates)[source]

Has to be supplied by subclasses

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – List of n candidate configs (non-extended)

Return type:

(ndarray, ndarray)

Returns:

Feature matrices features0 (n, dim0), features1 (n, dim1)

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.linear_cost_model.BiasOnlyLinearCostModel[source]

Bases: LinearCostModel

Simple baseline: features0(x) = [1], features1(x) = [1]

feature_matrices(candidates)[source]

Has to be supplied by subclasses

Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – List of n candidate configs (non-extended)

Return type:

(ndarray, ndarray)

Returns:

Feature matrices features0 (n, dim0), features1 (n, dim1)

syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model.ScikitLearnCostModel(model_type=None)[source]

Bases: NonLinearCostModel

Deterministic cost model, where c0(x) = b0 (constant), and c1(x) is given by a scikit-learn (or SciPy) regression model. Parameters are b0 and those of the regression model.

Parameters:

model_type (Optional[str]) – Regression model for c1(x)

transform_dataset(dataset, num_data0, res_min)[source]

Transforms dataset (see _data_for_c1_regression()) into a dataset representation (dict), which is used as kwargs in fit_regressor().

Parameters:
  • dataset (List[Tuple[Dict[str, Union[int, float, str]], float]]) –

  • num_data0 (int) –

  • res_min (int) –

Return type:

Dict[str, Any]

Returns:

Used as kwargs in fit_regressor

static fit_regressor(b0, **kwargs)[source]

Given value for b0, fits regressor to dataset specified via kwargs (see transform_dataset()). Returns the criterion function value for b0 as well as the fitted regression model.

Parameters:
  • b0 (float) –

  • kwargs

Returns:

fval, model

predict_c1_values(candidates)[source]
Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – Test configs

Returns:

Corresponding c1 values

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost.sklearn_cost_model.UnivariateSplineCostModel(scalar_attribute, input_range, spline_degree=3)[source]

Bases: NonLinearCostModel

Here, c1(x) is given by a univariate spline (UnivariateSpline), where a single scalar is extracted from x.

In the second part of the dataset (pos >= num_data0), duplicate entries with the same config in dataset are grouped into one, using the mean as target value, and a weight equal to the number of duplicates. This still leaves duplicates in the overall dataset, one in data0, the other in data1, but spline smoothing can deal with this.

transform_dataset(dataset, num_data0, res_min)[source]

Transforms dataset (see _data_for_c1_regression()) into a dataset representation (dict), which is used as kwargs in fit_regressor().

Parameters:
  • dataset (List[Tuple[Dict[str, Union[int, float, str]], float]]) –

  • num_data0 (int) –

  • res_min (int) –

Return type:

Dict[str, Any]

Returns:

Used as kwargs in fit_regressor

static fit_regressor(b0, **kwargs)[source]

Given value for b0, fits regressor to dataset specified via kwargs (see transform_dataset()). Returns the criterion function value for b0 as well as the fitted regression model.

Parameters:
  • b0 (float) –

  • kwargs

Returns:

fval, model

predict_c1_values(candidates)[source]
Parameters:

candidates (List[Dict[str, Union[int, float, str]]]) – Test configs

Returns:

Corresponding c1 values

Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.models.acqfunc_factory module
syne_tune.optimizer.schedulers.searchers.bayesopt.models.acqfunc_factory.acquisition_function_factory(name, **kwargs)[source]
Return type:

Callable[[Any], AcquisitionFunction]

syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model.CostFixedResourcePredictor(state, model, fixed_resource, num_samples=1)[source]

Bases: BasePredictor

Wraps cost model \(c(x, r)\) of CostModel to be used as surrogate model, where predictions are done at r = fixed_resource.

Note: For random cost models, we approximate expectations in predict by resampling num_samples times (should be 1 for deterministic cost models).

Note: Since this is a generic wrapper, we assume for backward_gradient that the gradient contribution through the cost model vanishes. For special cost models, the mapping from encoded input to predictive means may be differentiable, and prediction code in autograd may be available. For such cost models, this wrapper should not be used, and backward_gradient should be implemented properly.

Parameters:
  • state (TuningJobState) – TuningJobSubState

  • model (CostModel) – Model parameters must have been fit

  • fixed_resource (int) – \(c(x, r)\) is predicted for this resource level r

  • num_samples (int) – Number of samples drawn in predict(). Use this for random cost models only

static keys_predict()[source]

Keys of signals returned by predict().

Note: In order to work with AcquisitionFunction implementations, the following signals are required:

  • “mean”: Predictive mean

  • “std”: Predictive standard deviation

Return type:

Set[str]

Returns:

Set of keys for dict returned by predict()

predict(inputs)[source]

Returns signals which are statistics of the predictive distribution at input points inputs. By default:

  • “mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)

  • “std”: Predictive stddevs, shape (n,)

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Parameters:

inputs (ndarray) – Input points, shape (n, d)

Return type:

List[Dict[str, ndarray]]

Returns:

List of dict with keys keys_predict(), of length the number of MCMC samples, or length 1 for empirical Bayes

backward_gradient(input, head_gradients)[source]

The gradient contribution through the cost model is blocked.

Return type:

List[ndarray]

predict_mean_current_candidates()[source]

Returns the predictive mean (signal with key ‘mean’) at all current candidates in the state (observed, pending).

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Return type:

List[ndarray]

Returns:

List of predictive means

current_best()[source]

Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Return type:

List[ndarray]

Returns:

Incumbent, see above

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.cost_fifo_model.CostEstimator(model, fixed_resource, num_samples=1)[source]

Bases: Estimator

The name of the cost metric is model.cost_metric_name.

Parameters:
  • model (CostModel) – CostModel to be wrapped

  • fixed_resource (int) – \(c(x, r)\) is predicted for this resource level r

  • num_samples (int) – Number of samples drawn in predict(). Use this for random cost models only

get_params()[source]
Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict – New model parameters

property fixed_resource: int
set_fixed_resource(resource)[source]
fit_from_state(state, update_params)[source]

Models of type CostModel do not have hyperparameters to be fit, so update_params is ignored here.

Return type:

Predictor

syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.Estimator[source]

Bases: object

Interface for surrogate models used in ModelStateTransformer.

In general, a surrogate model is probabilistic (or Bayesian), in that predictions are driven by a posterior distribution, represented in a posterior state of type Predictor. The model may also come with tunable (hyper)parameters, such as covariance function parameters for a Gaussian process model. These parameters can be accessed with get_params(), set_params().

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – New model parameters

fit_from_state(state, update_params)[source]

Creates a Predictor object based on data in state. For a Bayesian model, this involves computing the posterior state, which is wrapped in the Predictor object.

If the model also has (hyper)parameters, these are learned iff update_params == True. Otherwise, these parameters are not changed, but only the posterior state is computed. The idea is that in general, model fitting is much more expensive than just creating the final posterior state (or predictor). It then makes sense to partly work with stale model parameters.

If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the update_params argument.

Parameters:
  • state (TuningJobState) – Current data model parameters are to be fit on, and the posterior state is to be computed from

  • update_params (bool) – See above

Return type:

Predictor

Returns:

Predictor, wrapping the posterior state
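
To illustrate the update_params pattern (expensive refit of hyperparameters vs. cheap recomputation of the posterior state), here is a small self-contained toy class; it mirrors the behaviour described above but is not the actual Syne Tune interface, and all names are illustrative:

import numpy as np

class ToyEstimator:
    def __init__(self):
        self._params = {"noise_variance": 1.0}

    def get_params(self):
        return dict(self._params)

    def set_params(self, param_dict):
        self._params.update(param_dict)

    def fit_from_state(self, targets, update_params):
        if update_params:
            # Expensive step: refit the (hyper)parameters (here, a trivial plug-in estimate)
            self._params["noise_variance"] = float(np.var(targets))
        # Cheap step: recompute the "posterior state" with possibly stale parameters
        posterior_mean = float(np.mean(targets))
        return posterior_mean, self._params["noise_variance"]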

property debug_log: DebugLogPrinter | None
configure_scheduler(scheduler)[source]

Called by configure_scheduler() of searchers which make use of an Estimator. Allows the estimator to depend on parameters of the scheduler.

Parameters:

scheduler – Scheduler object

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.TransformedData(features, targets, mean, std)[source]

Bases: object

features: ndarray
targets: ndarray
mean: float
std: float
syne_tune.optimizer.schedulers.searchers.bayesopt.models.estimator.transform_state_to_data(state, active_metric=None, normalize_targets=True, num_fantasy_samples=1)[source]

Transforms TuningJobState object state to features and targets. The former are encoded vectors from state.hp_ranges. The latter are normalized to zero mean, unit variance if normalize_targets == True, in which case the original mean and stddev are also returned.

If state.pending_evaluations is not empty, it must contain entries of type FantasizedPendingEvaluation, which contain the fantasy samples. This is the case only for internal states.

Parameters:
  • state (TuningJobState) – TuningJobState to transform

  • active_metric (Optional[str]) – Name of target metric (optional)

  • normalize_targets (bool) – Normalize targets? Defaults to True

  • num_fantasy_samples (int) – Number of fantasy samples. Defaults to 1

Return type:

TransformedData

Returns:

Transformed data

syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_mcmc_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_mcmc_model.GaussProcMCMCEstimator(gpmodel, active_metric='target', normalize_targets=True, debug_log=None, filter_observed_data=None, hp_ranges_for_prediction=None)[source]

Bases: GaussProcEstimator

We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.

We draw one fantasy sample per MCMC sample here. This could be extended by sampling > 1 fantasy samples for each MCMC sample.

Parameters:
  • gpmodel (GPRegressionMCMC) – GPRegressionMCMC model

  • active_metric (str) – Name of the metric to optimize.

  • normalize_targets (bool) – Normalize target values in state.trials_evaluations?

get_params()[source]
Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict – New model parameters

syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcPredictor(state, gpmodel, fantasy_samples, active_metric=None, normalize_mean=0.0, normalize_std=1.0, filter_observed_data=None, hp_ranges_for_prediction=None)[source]

Bases: BasePredictor

Gaussian process surrogate model, where model parameters are either fit by marginal likelihood maximization (e.g., GaussianProcessRegression), or integrated out by MCMC sampling (e.g., GPRegressionMCMC).

Both state and gpmodel are immutable. If parameters of the latter are to be fit, this has to be done before.

fantasy_samples contains the sampled (normalized) target values for pending configs. Only active_metric target values are considered. The target values for a pending config are a flat vector. If MCMC is used, its length is a multiple of the number of MCMC samples, containing the fantasy values for MCMC sample 0, sample 1, …

Parameters:
hp_ranges_for_prediction()[source]
Return type:

HyperparameterRanges

Returns:

Feature generator to be used for inputs in predict()

predict(inputs)[source]

Returns signals which are statistics of the predictive distribution at input points inputs. By default:

  • “mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)

  • “std”: Predictive stddevs, shape (n,)

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Parameters:

inputs (ndarray) – Input points, shape (n, d)

Return type:

List[Dict[str, ndarray]]

Returns:

List of dict with keys keys_predict(), of length the number of MCMC samples, or length 1 for empirical Bayes

backward_gradient(input, head_gradients)[source]

Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Lists have > 1 entry if MCMC is used, otherwise they are all size 1.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (List[Dict[str, ndarray]]) – See above

Return type:

List[ndarray]

Returns:

Gradient \(\nabla_x f(x)\) (several if MCMC is used)

does_mcmc()[source]
property posterior_states: List[PosteriorState] | None
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcEstimator(gpmodel, active_metric, normalize_targets=True, debug_log=None, filter_observed_data=None, no_fantasizing=False, hp_ranges_for_prediction=None)[source]

Bases: Estimator

We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.

Parameters:
property debug_log: DebugLogPrinter | None
property gpmodel: GaussianProcessRegression | GPRegressionMCMC | IndependentGPPerResourceModel | HyperTuneIndependentGPModel | HyperTuneJointGPModel
fit_from_state(state, update_params)[source]

Parameters of self._gpmodel are optimized iff update_params. This requires state to contain labeled examples.

If self.state.pending_evaluations is not empty, we proceed as follows:

  • Compute posterior for state without pending evals

  • Draw fantasy values for pending evals

  • Recompute posterior (without fitting)

Return type:

Predictor

configure_scheduler(scheduler)[source]

Called by configure_scheduler() of searchers which make use of an Estimator. Allows the estimator to depend on parameters of the scheduler.

Parameters:

scheduler – Scheduler object

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model.GaussProcEmpiricalBayesEstimator(gpmodel, num_fantasy_samples, active_metric='target', normalize_targets=True, debug_log=None, filter_observed_data=None, no_fantasizing=False, hp_ranges_for_prediction=None)[source]

Bases: GaussProcEstimator

We support pending evaluations via fantasizing. Note that state does not contain the fantasy values, but just the pending configs. Fantasy values are sampled here.

Parameters:
get_params()[source]
Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict – New model parameters

syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model.GaussProcAdditivePredictor(state, gpmodel, fantasy_samples, active_metric, filter_observed_data=None, normalize_mean=0.0, normalize_std=1.0)[source]

Bases: BasePredictor

Gaussian Process additive surrogate model, where model parameters are fit by marginal likelihood maximization.

Note: predict_mean_current_candidates() calls predict() for all observed and pending extended configs. This may not be exactly correct, because predict() is not meant to be used for configs which have observations (it IS correct at \(r = r_{max}\)).

fantasy_samples contains the sampled (normalized) target values for pending configs. Only active_metric target values are considered. The target values for a pending config are a flat vector.

Parameters:
  • state (TuningJobState) – TuningJobSubState

  • gpmodel (GaussianProcessLearningCurveModel) – Parameters must have been fit

  • fantasy_samples (List[FantasizedPendingEvaluation]) – See above

  • active_metric (str) – See parent class

  • filter_observed_data (Optional[Callable[[Dict[str, Union[int, float, str]]], bool]]) – See parent class

  • normalize_mean (float) – Mean used to normalize targets

  • normalize_std (float) – Stddev used to normalize targets

predict(inputs)[source]

Input features inputs are w.r.t. extended configs (x, r).

Parameters:

inputs (ndarray) – Input features

Return type:

List[Dict[str, ndarray]]

Returns:

Predictive means, stddevs

backward_gradient(input, head_gradients)[source]

Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Lists have > 1 entry if MCMC is used, otherwise they are all size 1.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (List[Dict[str, ndarray]]) – See above

Return type:

List[ndarray]

Returns:

Gradient \(\nabla_x f(x)\) (several if MCMC is used)

does_mcmc()[source]
property posterior_states: List[GaussProcAdditivePosteriorState] | None
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.gpiss_model.GaussProcAdditiveEstimator(gpmodel, num_fantasy_samples, active_metric, config_space_ext, normalize_targets=False, debug_log=None, filter_observed_data=None)[source]

Bases: Estimator

If num_fantasy_samples > 0, we draw this many fantasy targets independently, while each sample is dependent over all pending evaluations. If num_fantasy_samples == 0, pending evaluations in state are ignored.

Parameters:
  • gpmodel (GaussianProcessLearningCurveModel) – GaussianProcessLearningCurveModel

  • num_fantasy_samples (int) – See above

  • active_metric (str) – Name of the metric to optimize.

  • config_space_ext (ExtendedConfiguration) – ExtendedConfiguration

  • normalize_targets (bool) – Normalize observed target values?

  • debug_log (Optional[DebugLogPrinter]) – DebugLogPrinter (optional)

  • filter_observed_data (Optional[Callable[[Dict[str, Union[int, float, str]]], bool]]) – Filter for observed data before computing incumbent

property debug_log: DebugLogPrinter | None
get_params()[source]
Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict – New model parameters

fit_from_state(state, update_params)[source]

Creates a Predictor object based on data in state. For a Bayesian model, this involves computing the posterior state, which is wrapped in the Predictor object.

If the model also has (hyper)parameters, these are learned iff update_params == True. Otherwise, these parameters are not changed, but only the posterior state is computed. The idea is that in general, model fitting is much more expensive than just creating the final posterior state (or predictor). It then makes sense to partly work with stale model parameters.

If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the update_params argument.

Parameters:
  • state (TuningJobState) – Current data model parameters are to be fit on, and the posterior state is to be computed from

  • update_params (bool) – See above

Return type:

Predictor

Returns:

Predictor, wrapping the posterior state

predictor_for_fantasy_samples(state, fantasy_samples)[source]

Same as fit_from_state() with update_params=False, but fantasy_samples are passed in rather than sampled here.

Return type:

Predictor

Returns:

See fit_from_state()

configure_scheduler(scheduler)[source]

Called by configure_scheduler() of searchers which make use of an Estimator. Allows the estimator to depend on parameters of the scheduler.

Parameters:

scheduler – Scheduler object

syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory module
syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory.base_kernel_factory(name, dimension, **kwargs)[source]
Return type:

KernelFunction

syne_tune.optimizer.schedulers.searchers.bayesopt.models.kernel_factory.resource_kernel_factory(name, kernel_x, mean_x, **kwargs)[source]

Given kernel function kernel_x and mean function mean_x over config x, create kernel and mean functions over (x, r), where r is the resource attribute (nonnegative scalar, usually in [0, 1]).

Note: For name in ["matern52", "matern52-res-warp"], if kernel_x is of type WarpedKernel, the resulting kernel inherits this warping.

Parameters:
  • name (str) – Selects resource kernel type

  • kernel_x (KernelFunction) – Kernel function over configs x

  • mean_x (MeanFunction) – Mean function over configs x

  • kwargs – Extra arguments (optional)

Return type:

(KernelFunction, MeanFunction)

Returns:

(res_kernel, res_mean), both over (x, r)

syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.HeadWithGradient(hval, gradient)[source]

Bases: object

gradient maps each output model to a dict of head gradients, whose keys are those used by predict (e.g., mean, std)

hval: ndarray
gradient: Dict[str, Dict[str, ndarray]]
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.CurrentBestProvider[source]

Bases: object

Helper class for MeanStdAcquisitionFunction. The current_best values required in compute_acq() and compute_acq_with_gradient() may depend on the MCMC sample index for each model (if none of the models use MCMC, this index is always (0, 0, ..., 0)).

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.NoneCurrentBestProvider[source]

Bases: CurrentBestProvider

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.ActiveMetricCurrentBestProvider(active_metric_current_best)[source]

Bases: CurrentBestProvider

Default implementation in which current_best depends on the active metric only.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc.MeanStdAcquisitionFunction(predictor, active_metric=None)[source]

Bases: AcquisitionFunction

Base class for standard acquisition functions which depend on predictive mean and stddev. Subclasses have to implement the head and its derivatives w.r.t. mean and std:

\[f(x, \mathrm{model}) = h(\mathrm{mean}, \mathrm{std}, \mathrm{model.current\_best}())\]

If model is a Predictor, then active_metric is ignored. If model is a dict mapping output names to models, then active_metric must be given.

Note that acquisition functions will always be minimized!

compute_acq(inputs, predictor=None)[source]

Note: If inputs has shape (d,), it is taken to be (1, d)

Parameters:
  • inputs (ndarray) – Encoded input points, shape (n, d)

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – If given, overrides self.predictor

Return type:

ndarray

Returns:

Acquisition function values, shape (n,)

compute_acq_with_gradient(input, predictor=None)[source]

For a single input point \(x\), compute acquisition function value \(f(x)\) and gradient \(\nabla_x f(x)\).

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – If given, overrides self.predictor

Return type:

(float, ndarray)

Returns:

\((f(x), \nabla_x f(x))\)

syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.EIAcquisitionFunction(predictor, active_metric=None, jitter=0.01, debug_collect_stats=False)[source]

Bases: MeanStdAcquisitionFunction

Minus expected improvement acquisition function (minus because the convention is to always minimize acquisition functions)

debug_stats_message()[source]
Return type:

str

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.LCBAcquisitionFunction(predictor, kappa, active_metric=None)[source]

Bases: MeanStdAcquisitionFunction

Lower confidence bound (LCB) acquisition function:

\[h(\mu, \sigma) = \mu - \kappa \cdot \sigma\]
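
As a plain NumPy illustration of this head (not the class implementation), smaller values are better, matching the convention that acquisition functions are minimized:

import numpy as np

def lcb_head(mean, std, kappa=1.0):
    # h(mu, sigma) = mu - kappa * sigma, evaluated on predictive means and stddevs
    return mean - kappa * std

lcb = lcb_head(np.array([0.3, 0.5]), np.array([0.1, 0.4]), kappa=2.0)
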
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.EIpuAcquisitionFunction(predictor, active_metric=None, exponent_cost=1.0, jitter=0.01)[source]

Bases: MeanStdAcquisitionFunction

Minus cost-aware expected improvement acquisition function.

This is defined as

\[\mathrm{EIpu}(x) = \frac{\mathrm{EI}(x)}{\mathrm{cost}(x)^{\mathrm{exponent\_cost}}},\]

where \(\mathrm{EI}(x)\) is expected improvement, \(\mathrm{cost}(x)\) is the predictive mean of a cost model, and exponent_cost is an exponent in \((0, 1]\).

exponent_cost scales the influence of the cost term on the acquisition function. See also:

Lee et al.
Cost-aware Bayesian Optimization

Note: two metrics are expected in the model output: the main objective and the cost. The main objective needs to be indicated as active_metric when initializing EIpuAcquisitionFunction. The cost is automatically assumed to be the other metric.

Parameters:
  • predictor (Union[Predictor, Dict[str, Predictor]]) – Predictors for main objective and cost

  • active_metric (Optional[str]) – Name of main objective

  • exponent_cost (float) – Exponent for cost in denominator. Defaults to 1

  • jitter (float) – Jitter factor, must be positive. Defaults to 0.01
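
As a plain NumPy illustration of the EIpu formula above (not the class implementation), expected-improvement values are divided by the predicted cost raised to exponent_cost:

import numpy as np

def eipu_head(ei, cost, exponent_cost=1.0):
    # ei: expected improvement values, shape (n,)
    # cost: predictive means of the cost model, shape (n,), assumed positive
    return ei / np.power(cost, exponent_cost)
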

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.ConstraintCurrentBestProvider(current_best_list, num_samples_active)[source]

Bases: CurrentBestProvider

Here, current_best depends on two predictors, for active and constraint metric.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.CEIAcquisitionFunction(predictor, active_metric=None, jitter=0.01)[source]

Bases: MeanStdAcquisitionFunction

Minus constrained expected improvement acquisition function (minus because the convention is to always minimize acquisition functions).

This is defined as \(\mathrm{CEI}(x) = \mathrm{EI}(x) \cdot P(c(x) \leq 0)\), where \(\mathrm{EI}\) is the standard expected improvement with respect to the current feasible best, and \(P(c(x) \leq 0)\) is the probability that the hyperparameter configuration \(x\) satisfies the constraint modeled by \(c(x)\).

If there are no feasible hyperparameters yet, the current feasible best is undefined. Thus, CEI is reduced to the P(c(x) <= 0) term until a feasible configuration is found.

Two metrics are expected in the model output: the main objective and the constraint metric. The main objective needs to be indicated as active_metric when initializing CEIAcquisitionFunction. The constraint is automatically assumed to be the other metric.

References on CEI:

Gardner et al.
Bayesian Optimization with Inequality Constraints
ICML 2014

and

Gelbart et al.
Bayesian Optimization with Unknown Constraints
UAI 2014.
Parameters:
  • predictor (Union[Predictor, Dict[str, Predictor]]) – Predictors for main objective and constraint metric

  • active_metric (Optional[str]) – Name of main objective

  • jitter (float) – Jitter factor, must be positive. Defaults to 0.01
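
The definition \(\mathrm{CEI}(x) = \mathrm{EI}(x) \cdot P(c(x) \leq 0)\) above can be illustrated in plain NumPy/SciPy, assuming a Gaussian predictive distribution for the constraint metric with mean mu_c and stddev sigma_c (illustration only, not the class implementation):

import numpy as np
from scipy.stats import norm

def cei_head(ei, mu_c, sigma_c):
    # P(c(x) <= 0) under a Gaussian predictive distribution N(mu_c, sigma_c^2)
    prob_feasible = norm.cdf(-mu_c / sigma_c)
    return ei * prob_feasible
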

syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl.get_quantiles(acquisition_par, fmin, m, s)[source]

Quantiles of the Gaussian distribution, useful to determine the acquisition function values.

Parameters:
  • acquisition_par – parameter of the acquisition function

  • fmin – current minimum.

  • m – vector of means.

  • s – vector of standard deviations.

syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_base module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_base.BasePredictor(state, active_metric=None, filter_observed_data=None)[source]

Bases: Predictor

Base class for (most) Predictor implementations, provides common code.

property filter_observed_data: Callable[[Dict[str, int | float | str]], bool] | None
set_filter_observed_data(filter_observed_data)[source]
predict_mean_current_candidates()[source]

Returns the predictive mean (signal with key ‘mean’) at all current candidates in the state (observed, pending).

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Return type:

List[ndarray]

Returns:

List of predictive means

current_best()[source]

Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Return type:

List[ndarray]

Returns:

Incumbent, see above

syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipOptimizationPredicate[source]

Bases: object

Interface for skip_optimization predicate in ModelStateTransformer.

reset()[source]

If there is an internal state, reset it to its initial value

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.NeverSkipPredicate[source]

Bases: SkipOptimizationPredicate

Hyperparameter optimization is never skipped.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.AlwaysSkipPredicate[source]

Bases: SkipOptimizationPredicate

Hyperparameter optimization is always skipped.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipPeriodicallyPredicate(init_length, period, metric_name='target')[source]

Bases: SkipOptimizationPredicate

Let N be the number of labeled points for metric metric_name. Optimizations are not skipped while N < init_length. Afterwards, we increase a counter whenever N is larger than in the previous call. With respect to this counter, the optimization is run once every period calls; in between, it is skipped.

Parameters:
  • init_length (int) – See above

  • period (int) – See above

  • metric_name (str) – Name of internal metric. Defaults to INTERNAL_METRIC_NAME.

reset()[source]

If there is an internal state, reset it to its initial value
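
Custom skip policies can be added by subclassing SkipOptimizationPredicate. A minimal sketch, assuming (as the class descriptions above suggest) that the predicate is invoked as a callable on the current TuningJobState and returns True when hyperparameter fitting should be skipped; the class and its threshold are hypothetical:

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt import (
    SkipOptimizationPredicate,
)

class SkipWhenLargePredicate(SkipOptimizationPredicate):
    # Hypothetical predicate: skip refitting once many trials have observations
    def __init__(self, max_fit_size=200):
        self._max_fit_size = max_fit_size

    def __call__(self, state):
        # True means: skip hyperparameter optimization for this update
        return len(state.trials_evaluations) > self._max_fit_size

    def reset(self):
        pass  # no internal state to reset
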

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_skipopt.SkipNoMaxResourcePredicate(init_length, max_resource, metric_name='target')[source]

Bases: SkipOptimizationPredicate

This predicate works for multi-fidelity HPO, see for example GPMultiFidelitySearcher.

We track the number of labeled datapoints at resource level max_resource. Hyperparameter optimization is skipped if the total number of labeled cases is at least init_length and the number of max_resource cases has not increased since the most recent optimization.

This means that as long as the dataset only grows w.r.t. cases at lower resources than max_resource, this does not trigger HP optimization.

Parameters:
  • init_length (int) – See above

  • max_resource (int) – See above

  • metric_name (str) – Name of internal metric. Defaults to INTERNAL_METRIC_NAME.

reset()[source]

If there is an internal state, reset it to its initial value

syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer.StateForModelConverter[source]

Bases: object

Interface for state converters (optionally) used in ModelStateTransformer. These are applied to a state before being passed to the model for fitting and predictions. The main use case is to filter down data if fitting the model scales super-linearly.

set_random_state(random_state)[source]

Some state converters use random sampling. For these, the random state has to be set before first usage.

Parameters:

random_state (RandomState) – Random state to be used internally

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer.ModelStateTransformer(estimator, init_state, skip_optimization=None, state_converter=None)[source]

Bases: object

This class maintains the TuningJobState object alongside an HPO experiment, and manages the reaction to changes of this state. In particular, it provides a fitted surrogate model on demand, which encapsulates the GP posterior.

The state transformer is generic, it uses Estimator for anything specific to the model type.

skip_optimization is a predicate depending on the state, which determines what happens at the next call of fit(). If it evaluates to False, the model parameters are refit; otherwise, the current ones are kept (which is usually faster, but risks staleness).

We also track the observed data state.trials_evaluations. If this did not change since the most recent fit() call, we do not refit the model parameters. This is based on the assumption that model parameter fitting depends only on state.trials_evaluations (observed data), not on other fields (e.g., pending evaluations).

If given, state_converter maps the state to another one which is then passed to the model for fitting and predictions. One important use case is filtering down data when model fitting is superlinear. Another is to convert multi-fidelity setups to be used with single-fidelity models inside.

Note that estimator and skip_optimization can also be a dictionary mapping output names to models. In that case, the state is shared but the models for each output metric are updated independently.

Parameters:
  • estimator (Union[Estimator, Dict[str, Estimator]]) – Estimator for the surrogate model, or dictionary mapping output names to estimators, see above

  • init_state (TuningJobState) – Initial tuning job state

  • skip_optimization (Union[SkipOptimizationPredicate, Dict[str, SkipOptimizationPredicate], None]) – Predicate determining whether model parameter fitting is skipped, see above

  • state_converter (Optional[StateForModelConverter]) – If given, the state is converted before being passed to the model, see above
property state: TuningJobState
property use_single_model: bool
property estimator: Estimator | Dict[str, Estimator]
property skip_optimization: SkipOptimizationPredicate | Dict[str, SkipOptimizationPredicate]
fit(**kwargs)[source]

If skip_optimization is given, it overrides the self._skip_optimization predicate.

Return type:

Union[Predictor, Dict[str, Predictor]]

Returns:

Fitted surrogate model for current state in the standard single model case; in the multi-model case, it returns a dictionary mapping output names to surrogate model instances for current state (shared across models).

get_params()[source]
set_params(param_dict)[source]
append_trial(trial_id, config=None, resource=None)[source]

Appends new pending evaluation to the state.

Parameters:
  • trial_id (str) – ID of trial

  • config (Optional[Dict[str, Union[int, float, str]]]) – Must be given if this trial does not yet feature in the state

  • resource (Optional[int]) – Must be given in the multi-fidelity case, to specify at which resource level the evaluation is pending

drop_pending_evaluation(trial_id, resource=None)[source]

Drop pending evaluation from state. If it is not listed as pending, nothing is done

Parameters:
  • trial_id (str) – ID of trial

  • resource (Optional[int]) – Must be given in the multi-fidelity case, to specify at which resource level the evaluation is pending

Return type:

bool

remove_observed_case(trial_id, metric_name='target', key=None)[source]

Removes specific observation from the state.

Parameters:
  • trial_id (str) – ID of trial

  • metric_name (str) – Name of internal metric

  • key (Optional[str]) – Must be given in the multi-fidelity case

label_trial(data, config=None)[source]

Adds observed data for a trial. If it has observations in the state already, data.metrics are appended. Otherwise, a new entry is appended. If new observations replace pending evaluations, these are removed.

config must be passed if the trial has not yet been registered in the state (this happens normally with the append_trial call). If already registered, config is ignored.

filter_pending_evaluations(filter_pred)[source]

Filters state.pending_evaluations with filter_pred.

Parameters:

filter_pred (Callable[[PendingEvaluation], bool]) – Filtering predicate

mark_trial_failed(trial_id)[source]
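
As a hedged usage sketch (the estimator and the initial TuningJobState are assumed to have been created elsewhere), the transformer is typically driven by registering pending trials and requesting a fitted predictor on demand:

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer import (
    ModelStateTransformer,
)

# Assumed available: `estimator` (an Estimator), `init_state` (a TuningJobState),
# and a configuration dictionary `config`
transformer = ModelStateTransformer(estimator=estimator, init_state=init_state)

# Register a new pending evaluation for trial "0"
transformer.append_trial(trial_id="0", config=config)

# Predictor wrapping the posterior state; parameters are refit
# unless the skip_optimization predicate says otherwise
predictor = transformer.fit()

# Drop the pending evaluation again, e.g., if the trial failed
transformer.drop_pending_evaluation(trial_id="0")
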
syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model module
class syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model.SKLearnPredictorWrapper(sklearn_predictor, state, active_metric=None)[source]

Bases: BasePredictor

Wrapper class for sklearn predictors returned by fit_from_state of SKLearnEstimatorWrapper.

predict(inputs)[source]

Returns signals which are statistics of the predictive distribution at input points inputs. By default:

  • “mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)

  • “std”: Predictive stddevs, shape (n,)

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Parameters:

inputs (ndarray) – Input points, shape (n, d)

Return type:

List[Dict[str, ndarray]]

Returns:

List of dict with keys keys_predict(), of length the number of MCMC samples, or length 1 for empirical Bayes

backward_gradient(input, head_gradients)[source]

Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (List[Dict[str, ndarray]]) – See above

Return type:

List[ndarray]

Returns:

Gradient \(\nabla f(x)\) (one-length list)

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.sklearn_model.SKLearnEstimatorWrapper(sklearn_estimator, active_metric=None, *args, **kwargs)[source]

Bases: Estimator

Wrapper class for sklearn estimators.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Current tunable model parameters

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – New model parameters

fit_from_state(state, update_params)[source]

Creates a Predictor object based on data in state.

If the model also has hyperparameters, these are learned iff update_params == True. Otherwise, these parameters are not changed, but only the posterior state is computed. If your surrogate model is not Bayesian, or does not have hyperparameters, you can ignore the update_params argument.

If self.state.pending_evaluations is not empty, we compute the posterior for the state without pending evaluations. This method can be overridden to implement a different behavior.

Parameters:
  • state (TuningJobState) – Current data model parameters are to be fit on, and the posterior state is to be computed from

  • update_params (bool) – See above

Return type:

Predictor

Returns:

Predictor, wrapping the posterior state

syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity module
syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.cap_size_tuning_job_state(state, max_size, random_state=None)[source]

Returns state which is identical to state, except that the trials_evaluations are replaced by a subset so the total number of metric values is <= max_size. Filtering is done by preserving data from trials which have observations at the higher resource levels. For some trials, we may remove values at low resources, but keep values at higher ones, in order to meet the max_size constraint.

Parameters:
  • state (TuningJobState) – Original state to filter down

  • max_size (int) – Maximum number of observed metric values in new state

  • random_state (Optional[RandomState]) – Used for random sampling. Defaults to numpy.random.

Return type:

TuningJobState

Returns:

New state meeting the max_size constraint. This is a copy of state even if this meets the constraint already.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.SubsampleMultiFidelityStateConverter(max_size, random_state=None)[source]

Bases: StateForModelConverter

Converts state by (possibly) downsampling the observations so that their total number is <= max_size. This is done in a way that trials with observations at higher rung levels are retained (with all their data), so observations are preferentially removed at low levels, and from trials which do not have observations higher up.

This state converter makes sense if observed data is only used at geometrically spaced rung levels, so the number of observations per trial remains small. If a trial accumulates on the order of max_resource_level observations, this converter does not work well, because it ends up retaining densely sampled observations from very few trials. Use SubsampleMFDenseDataStateConverter in such a case.

set_random_state(random_state)[source]

Some state converters use random sampling. For these, the random state has to be set before first usage.

Parameters:

random_state (RandomState) – Random state to be used internally
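
A minimal sketch of plugging this converter into ModelStateTransformer (see the model_transformer module above); estimator and init_state are assumed to exist:

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.model_transformer import (
    ModelStateTransformer,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity import (
    SubsampleMultiFidelityStateConverter,
)

# Keep at most 500 observed metric values when fitting the surrogate model
converter = SubsampleMultiFidelityStateConverter(max_size=500)
transformer = ModelStateTransformer(
    estimator=estimator,      # assumed: an Estimator created elsewhere
    init_state=init_state,    # assumed: a TuningJobState created elsewhere
    state_converter=converter,
)
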

syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.sparsify_tuning_job_state(state, max_size, grace_period, reduction_factor)[source]

Performs the first step of state conversion in SubsampleMFDenseDataStateConverter: dense observations are sparsified w.r.t. a geometrically spaced rung level system.

Parameters:
  • state (TuningJobState) – Original state to filter down

  • max_size (int) – Maximum number of observed metric values in new state

  • grace_period (int) – Minimum resource level \(r_{min}\)

  • reduction_factor (float) – Reduction factor \(\eta\)

Return type:

TuningJobState

Returns:

New state which either meets the max_size constraint, or is maximally sparsified

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_multi_fidelity.SubsampleMFDenseDataStateConverter(max_size, grace_period=None, reduction_factor=None, random_state=None)[source]

Bases: SubsampleMultiFidelityStateConverter

Variant of SubsampleMultiFidelityStateConverter, which has the same goal, but does subsampling in a different way. The current default for most GP-based multi-fidelity algorithms (e.g., MOBSTER, Hyper-Tune) is to use observations only at geometrically spaced rung levels (such as 1, 3, 9, …), in which case SubsampleMultiFidelityStateConverter is the appropriate choice.

But for some (e.g., DyHPO), observations are recorded at all (or linearly spaced) resource levels, so there is much more data for trials which progressed further. Here, we do the state conversion in two steps, always stopping the process once the target size max_size is reached. We assume a geometric rung level spacing, given by grace_period and reduction_factor, only for the purpose of state conversion. In the first step, we sparsify the observations: if each rung level \(r_k\) defines a bucket \(B_k = \{r_{k-1} + 1, \dots, r_k\}\), each trial should have at most one observation in each bucket. Sparsification is done top down. If the result of this first step is still larger than max_size, we continue with subsampling as in SubsampleMultiFidelityStateConverter.

syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity module
syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity.cap_size_tuning_job_state(state, max_size, mode, top_fraction, random_state=None)[source]

Returns state which is identical to state, except that the trials_evaluations are replaced by a subset so the total number of metric values is <= max_size.

Parameters:
  • state (TuningJobState) – Original state to filter down

  • max_size (int) – Maximum number of observed metric values in new state

  • mode (str) – “min” or “max”

  • top_fraction (float) – See above

  • random_state (Optional[RandomState]) – Used for random sampling. Defaults to numpy.random.

Return type:

TuningJobState

Returns:

New state meeting the max_size constraint. This is a copy of state even if this meets the constraint already.

class syne_tune.optimizer.schedulers.searchers.bayesopt.models.subsample_state_single_fidelity.SubsampleSingleFidelityStateConverter(max_size, mode, top_fraction, random_state=None)[source]

Bases: StateForModelConverter

Converts state by (possibly) downsampling the observations so that their total number is <= max_size. If len(state) > max_size, the subset is sampled as follows: max_size * top_fraction slots are filled with the best observations, and the remainder is sampled without replacement from the remaining observations.

Parameters:
  • max_size (int) – Maximum number of observed metric values in new state

  • mode (str) – “min” or “max”

  • top_fraction (float) – See above

  • random_state (Optional[RandomState]) – Used for random sampling. Can also be set with set_random_state()

set_random_state(random_state)[source]

Some state converters use random sampling. For these, the random state has to be set before first usage.

Parameters:

random_state (RandomState) – Random state to be used internally

syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn package
class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.SKLearnPredictor[source]

Bases: object

Base class for predictors generated by scikit-learn based estimators of SKLearnEstimator.

This is only for predictors that return means and stddevs in predict().

predict(X)[source]

Returns signals which are statistics of the predictive distribution at input points X.

Parameters:

X (ndarray) – Input points, shape (n, d)

Return type:

Tuple[ndarray, ndarray]

Returns:

(means, stds), where predictive means means and predictive stddevs stds have shape (n,)

backward_gradient(input, head_gradients)[source]

Needs to be implemented only if gradient-based local optimization of an acquisition function is supported.

Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (Dict[str, ndarray]) – See above

Return type:

ndarray

Returns:

Gradient \(\nabla f(x)\)

class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.SKLearnEstimator[source]

Bases: object

Base class for scikit-learn based estimators, giving rise to surrogate models for Bayesian optimization.

fit(X, y, update_params)[source]

Implements fit_from_state(), given transformed data. Here, y is normalized (zero mean, unit variance) iff normalize_targets == True.

Parameters:
  • X (ndarray) – Feature matrix, shape (n_samples, n_features)

  • y (ndarray) – Target values, shape (n_samples,)

  • update_params (bool) – Should model (hyper)parameters be updated? Ignored if estimator has no hyperparameters

Return type:

SKLearnPredictor

Returns:

Predictor, wrapping the posterior state

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Current model hyperparameters

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – New model hyperparameters

property normalize_targets: bool
Returns:

Should targets in state be normalized before calling fit()?
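
As a hedged sketch of how these interfaces might be implemented (not part of Syne Tune itself), the following wraps scikit-learn's BayesianRidge, which provides predictive means and stddevs via predict(X, return_std=True):

import numpy as np
from sklearn.linear_model import BayesianRidge

from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.estimator import (
    SKLearnEstimator,
)
from syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.predictor import (
    SKLearnPredictor,
)

class BayesianRidgePredictor(SKLearnPredictor):
    # Hypothetical predictor wrapping a fitted BayesianRidge model
    def __init__(self, ridge):
        self.ridge = ridge

    def predict(self, X):
        # Returns (means, stds), each of shape (n,)
        return self.ridge.predict(X, return_std=True)

class BayesianRidgeEstimator(SKLearnEstimator):
    # Hypothetical estimator: refits BayesianRidge on the transformed data
    def fit(self, X, y, update_params):
        # update_params is ignored; the model is cheap to refit every time
        return BayesianRidgePredictor(BayesianRidge().fit(X, y))

Such an estimator would then be wrapped by SKLearnEstimatorWrapper (see the sklearn_model module above) for use inside Bayesian optimization.
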

Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.estimator module
class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.estimator.SKLearnEstimator[source]

Bases: object

Base class for scikit-learn based estimators, giving rise to surrogate models for Bayesian optimization.

fit(X, y, update_params)[source]

Implements fit_from_state(), given transformed data. Here, y is normalized (zero mean, unit variance) iff normalize_targets == True.

Parameters:
  • X (ndarray) – Feature matrix, shape (n_samples, n_features)

  • y (ndarray) – Target values, shape (n_samples,)

  • update_params (bool) – Should model (hyper)parameters be updated? Ignored if estimator has no hyperparameters

Return type:

SKLearnPredictor

Returns:

Predictor, wrapping the posterior state

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Current model hyperparameters

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – New model hyperparameters

property normalize_targets: bool
Returns:

Should targets in state be normalized before calling fit()?

syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.predictor module
class syne_tune.optimizer.schedulers.searchers.bayesopt.sklearn.predictor.SKLearnPredictor[source]

Bases: object

Base class for predictors generated by scikit-learn based estimators of SKLearnEstimator.

This is only for predictors that return means and stddevs in predict().

predict(X)[source]

Returns signals which are statistics of the predictive distribution at input points X.

Parameters:

X (ndarray) – Input points, shape (n, d)

Return type:

Tuple[ndarray, ndarray]

Returns:

(means, stds), where predictive means means and predictive stddevs stds have shape (n,)

backward_gradient(input, head_gradients)[source]

Needs to be implemented only if gradient-based local optimization of an acquisition function is supported.

Computes the gradient \(\nabla f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (Dict[str, ndarray]) – See above

Return type:

ndarray

Returns:

Gradient \(\nabla f(x)\)

syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes module
syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.assign_active_metric(predictor, active_metric)[source]

Checks that active_metric is provided when predictor consists of multiple output predictors. Otherwise, just sets active_metric to the only predictor output name available.

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.NextCandidatesAlgorithm[source]

Bases: object

next_candidates()[source]
Return type:

List[Dict[str, Union[int, float, str]]]

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.Predictor(state, active_metric=None)[source]

Bases: object

Base class for probabilistic predictors used in Bayesian optimization. They support marginal predictions feeding into an acquisition function, as well as computing gradients of an acquisition function w.r.t. inputs.

In general, a predictor is created by an estimator. It wraps a posterior state, which allows for probabilistic predictions on arbitrary inputs.

Parameters:
  • state (TuningJobState) – Tuning job state

  • active_metric (Optional[str]) – Name of internal objective

keys_predict()[source]

Keys of signals returned by predict().

Note: In order to work with AcquisitionFunction implementations, the following signals are required:

  • “mean”: Predictive mean

  • “std”: Predictive standard deviation

Return type:

Set[str]

Returns:

Set of keys for dict returned by predict()

predict(inputs)[source]

Returns signals which are statistics of the predictive distribution at input points inputs. By default:

  • “mean”: Predictive means. If the model supports fantasizing with a number nf of fantasies, this has shape (n, nf), otherwise (n,)

  • “std”: Predictive stddevs, shape (n,)

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Parameters:

inputs (ndarray) – Input points, shape (n, d)

Return type:

List[Dict[str, ndarray]]

Returns:

List of dict with keys keys_predict(), of length the number of MCMC samples, or length 1 for empirical Bayes

hp_ranges_for_prediction()[source]
Return type:

HyperparameterRanges

Returns:

Feature generator to be used for inputs in predict()

predict_candidates(candidates)[source]

Convenience variant of predict()

Parameters:

candidates (Iterable[Dict[str, Union[int, float, str]]]) – List of configurations

Return type:

List[Dict[str, ndarray]]

Returns:

Same as predict()

current_best()[source]

Returns the so-called incumbent, to be used in acquisition functions such as expected improvement. This is the minimum of predictive means (signal with key “mean”) at all current candidate locations (both state.trials_evaluations and state.pending_evaluations). Normally, a scalar is returned, but if the model supports fantasizing and the state contains pending evaluations, there is one incumbent per fantasy sample, so a vector is returned.

If the hyperparameters of the surrogate model are being optimized (e.g., by empirical Bayes), the returned list has length 1. If its hyperparameters are averaged over by MCMC, the returned list has one entry per MCMC sample.

Return type:

List[ndarray]

Returns:

Incumbent, see above

backward_gradient(input, head_gradients)[source]

Computes the gradient \(\nabla_x f(x)\) for an acquisition function \(f(x)\), where \(x\) is a single input point. This is using reverse mode differentiation, the head gradients are passed by the acquisition function. The head gradients are \(\partial_k f\), where \(k\) runs over the statistics returned by predict() for the single input point \(x\). The shape of head gradients is the same as the shape of the statistics.

Lists have > 1 entry if MCMC is used, otherwise they are all size 1.

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • head_gradients (List[Dict[str, ndarray]]) – See above

Return type:

List[ndarray]

Returns:

Gradient \(\nabla_x f(x)\) (several if MCMC is used)

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.ScoringFunction(predictor=None, active_metric=None)[source]

Bases: object

Class to score candidates. As opposed to acquisition functions, scores do not support gradient computation. Note that scores are always minimized.

score(candidates, predictor=None)[source]
Parameters:
  • candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Overrides default predictor

Return type:

List[float]

Returns:

List of score values, length of candidates

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.AcquisitionFunction(predictor=None, active_metric=None)[source]

Bases: ScoringFunction

Base class for acquisition functions \(f(x)\).

Parameters:
  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Predictor(s) from surrogate model

  • active_metric (Optional[str]) – Name of internal metric

compute_acq(inputs, predictor=None)[source]

Note: If inputs has shape (d,), it is taken to be (1, d)

Parameters:
  • inputs (ndarray) – Encoded input points, shape (n, d)

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – If given, overrides self.predictor

Return type:

ndarray

Returns:

Acquisition function values, shape (n,)

compute_acq_with_gradient(input, predictor=None)[source]

For a single input point \(x\), compute acquisition function value \(f(x)\) and gradient \(\nabla_x f(x)\).

Parameters:
  • input (ndarray) – Single input point \(x\), shape (d,)

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – If given, overrides self.predictor

Return type:

Tuple[float, ndarray]

Returns:

\((f(x), \nabla_x f(x))\)

score(candidates, predictor=None)[source]
Parameters:
  • candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Overrides default predictor

Return type:

List[float]

Returns:

List of score values, length of candidates
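
A hedged usage sketch: given a predictor obtained from an estimator (assumed available here), an acquisition function such as EIAcquisitionFunction from the meanstd_acqfunc_impl module above can be evaluated on encoded input points:

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bayesopt.models.meanstd_acqfunc_impl import (
    EIAcquisitionFunction,
)

# Assumed available: `predictor`, e.g., from Estimator.fit_from_state(...)
acq = EIAcquisitionFunction(predictor)

inputs = np.random.rand(10, 4)    # 10 encoded configurations with 4 features
values = acq.compute_acq(inputs)  # shape (10,); smaller is better (minimized)

x = inputs[0]                                  # single point, shape (d,)
fval, grad = acq.compute_acq_with_gradient(x)
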

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.LocalOptimizer(hp_ranges, predictor, acquisition_class, active_metric=None)[source]

Bases: object

Class that tries to find a local candidate with a better score, typically using a local optimization method such as L-BFGS. It would normally encapsulate an acquisition function and predictor.

acquisition_class contains the type of the acquisition function (subclass of AcquisitionFunction). It can also be a tuple of the form (type, kwargs), where kwargs are extra arguments to the class constructor.

Parameters:
  • hp_ranges (HyperparameterRanges) – Feature generator for configurations

  • predictor (Union[Predictor, Dict[str, Predictor]]) – Predictor(s) for acquisition function

  • acquisition_class (Callable[[Any], AcquisitionFunction]) – See above

  • active_metric (Optional[str]) – Name of internal metric

optimize(candidate, predictor=None)[source]

Run local optimization, starting from candidate

Parameters:
  • candidate (Dict[str, Union[int, float, str]]) – Starting point

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Overrides self.predictor

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration found by local optimization

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.base_classes.CandidateGenerator[source]

Bases: object

Class to generate candidates from which to start the local minimization, typically random candidates or some more uniformly spaced variation, such as a Latin hypercube or a Sobol sequence.

generate_candidates()[source]
Return type:

Iterator[Dict[str, Union[int, float, str]]]

generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
Parameters:
  • num_cands (int) – Number of candidates to generate

  • exclusion_list (Optional[ExclusionList]) – If given, these candidates must not be returned

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

List of num_cands candidates. If exclusion_list is given, the number of candidates returned can be < num_cands

syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm module
class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm.BayesianOptimizationAlgorithm(initial_candidates_generator, initial_candidates_scorer, num_initial_candidates, local_optimizer, pending_candidate_state_transformer, exclusion_candidates, num_requested_candidates, greedy_batch_selection, duplicate_detector, num_initial_candidates_for_batch=None, sample_unique_candidates=False, debug_log=None)[source]

Bases: NextCandidatesAlgorithm

Core logic of the Bayesian optimization algorithm

Parameters:
  • initial_candidates_generator (CandidateGenerator) – generator of candidates

  • initial_candidates_scorer (ScoringFunction) – Scoring function used to rank the initial candidates. Note: If a batch is selected in one go (num_requested_candidates > 1, greedy_batch_selection == False), this function should encourage diversity among its top scorers. In general, greedy batch selection is recommended.

  • num_initial_candidates (int) – how many initial candidates to generate, if possible

  • local_optimizer (LocalOptimizer) – local optimizer which starts from score minimizer. If a batch is selected in one go (not greedily), then local optimizations are started from the top num_requested_candidates ranked candidates (after scoring)

  • pending_candidate_state_transformer (Optional[ModelStateTransformer]) – Once a candidate is selected, it becomes pending, and the state is transformed by appending information. This is done by this transformer. The object is needed only if next_candidates() goes through more than one outer iteration (i.e., if greedy_batch_selection == True and num_requested_candidates > 1); otherwise, None can be passed here. Note: Model updates (by the state transformer) for batch candidates beyond the first do not involve fitting hyperparameters, so they are usually cheap.

  • exclusion_candidates (ExclusionList) – Set of candidates that should not be returned, because they are already labeled, currently pending, or have failed

  • num_requested_candidates (int) – number of candidates to return

  • greedy_batch_selection (bool) – If True and num_requested_candidates > 1, we generate, score, and locally optimize candidates separately for each candidate to be selected. Otherwise, this is done just once, and num_requested_candidates are extracted in one go. Note: If this is True, pending_candidate_state_transformer is needed.

  • duplicate_detector (DuplicateDetector) – Used to make sure that no candidate equal to an already evaluated one is returned

  • num_initial_candidates_for_batch (Optional[int]) – This is used only if num_requested_candidates > 1 and greedy_batch_selection == True. In this case, num_initial_candidates_for_batch overrides num_initial_candidates when selecting all but the first candidate for the batch. Typically, num_initial_candidates is larger than num_initial_candidates_for_batch in this case, which speeds up selecting large batches while still selecting the first candidate thoroughly

  • sample_unique_candidates (bool) – If True, we check that initial candidates sampled at random are unique and disjoint from the exclusion list. This can be expensive. Defaults to False

  • debug_log (Optional[DebugLogPrinter]) – If a DebugLogPrinter object is passed here, it is used to write log messages

initial_candidates_generator: CandidateGenerator
initial_candidates_scorer: ScoringFunction
num_initial_candidates: int
local_optimizer: LocalOptimizer
pending_candidate_state_transformer: Optional[ModelStateTransformer]
exclusion_candidates: ExclusionList
num_requested_candidates: int
greedy_batch_selection: bool
duplicate_detector: DuplicateDetector
num_initial_candidates_for_batch: Optional[int] = None
sample_unique_candidates: bool = False
debug_log: Optional[DebugLogPrinter] = None
next_candidates()[source]
Return type:

List[Dict[str, Union[int, float, str]]]
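
Schematically, the greedy batch selection described by the parameters above can be pictured as follows; generate_initial_candidates and score are hypothetical helpers standing in for the candidate generator and initial scorer, and this sketch is not the actual implementation:

def greedy_batch_selection(
    num_requested_candidates,
    num_initial_candidates,
    generate_initial_candidates,  # hypothetical: draws candidates, honors exclusion list
    score,                        # hypothetical: per-candidate score, lower is better
    local_optimizer,              # LocalOptimizer, see base_classes above
    duplicate_detector,           # DuplicateDetector
    exclusion_candidates,
    evaluated_configs,
    state_transformer=None,       # ModelStateTransformer or None
):
    # Schematic greedy batch loop; one outer iteration per requested candidate
    batch = []
    for i in range(num_requested_candidates):
        candidates = generate_initial_candidates(num_initial_candidates, exclusion_candidates)
        best = min(candidates, key=score)
        candidate = local_optimizer.optimize(best)
        if not duplicate_detector.contains(evaluated_configs, candidate):
            batch.append(candidate)
            if state_transformer is not None:
                # Register as pending so later iterations account for it
                state_transformer.append_trial(trial_id=str(i), config=candidate)
    return batch
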

syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components module
class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.IndependentThompsonSampling(predictor=None, active_metric=None, random_state=None)[source]

Bases: ScoringFunction

Note: This is not Thompson sampling, but rather a variant called “independent Thompson sampling”, where means and variances are drawn from the marginal rather than the joint distribution. This is cheap, but incorrect. In fact, the larger the number of candidates, the more likely the winning configuration arises from pure chance.

Parameters:
  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Surrogate predictor for statistics of predictive distribution

  • random_state (Optional[RandomState]) – PRN generator

score(candidates, predictor=None)[source]
Parameters:
  • candidates (Iterable[Dict[str, Union[int, float, str]]]) – Configurations for which scores are to be computed

  • predictor (Optional[Predictor]) – Overrides default predictor

Return type:

List[float]

Returns:

List of score values, length of candidates

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.LBFGSOptimizeAcquisition(hp_ranges, predictor, acquisition_class, active_metric=None)[source]

Bases: LocalOptimizer

optimize(candidate, predictor=None)[source]

Run local optimization, starting from candidate

Parameters:
  • candidate (Dict[str, Union[int, float, str]]) – Starting point

  • predictor (Union[Predictor, Dict[str, Predictor], None]) – Overrides self.predictor

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration found by local optimization

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.NoOptimization(*args, **kwargs)[source]

Bases: LocalOptimizer

optimize(candidate, predictor=None)[source]

Run local optimization, starting from candidate

Parameters:
  • candidate (Dict[str, Union[int, float, str]]) – Starting point

  • predictor (Optional[Predictor]) – Overrides self.predictor

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration found by local optimization

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.RandomStatefulCandidateGenerator(hp_ranges, random_state)[source]

Bases: CandidateGenerator

This generator maintains a random state, so if generate_candidates() is called several times, different sequences are returned.

Parameters:
  • hp_ranges (HyperparameterRanges) – Feature generator for configurations

  • random_state (RandomState) – PRN generator

generate_candidates()[source]
Return type:

Iterator[Dict[str, Union[int, float, str]]]

generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
Parameters:
  • num_cands (int) – Number of candidates to generate

  • exclusion_list – If given, these candidates must not be returned

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

List of num_cands candidates. If exclusion_list is given, the number of candidates returned can be < num_cands

syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.generate_unique_candidates(candidates_generator, num_candidates, exclusion_candidates)[source]
Return type:

List[Dict[str, Union[int, float, str]]]

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.RandomFromSetCandidateGenerator(base_set, random_state, ext_config=None)[source]

Bases: CandidateGenerator

In this generator, candidates are sampled from a given set.

Parameters:
  • base_set (List[Dict[str, Union[int, float, str]]]) – Set of all configurations to sample from

  • random_state (RandomState) – PRN generator

  • ext_config (Optional[Dict[str, Union[int, float, str]]]) – If given, each configuration is updated with this dictionary before being returned

generate_candidates()[source]
Return type:

Iterator[Dict[str, Union[int, float, str]]]

generate_candidates_en_bulk(num_cands, exclusion_list=None)[source]
Parameters:
  • num_cands (int) – Number of candidates to generate

  • exclusion_list – If given, these candidates must not be returned

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

List of num_cands candidates. If exclusion_list is given, the number of candidates returned can be < num_cands
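
A hedged usage sketch for this generator, drawing candidates from a fixed list of configurations:

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components import (
    RandomFromSetCandidateGenerator,
)

# Example base set of configurations (hypothetical hyperparameter names)
base_set = [{"lr": 0.01, "batch_size": 32}, {"lr": 0.1, "batch_size": 64}]
generator = RandomFromSetCandidateGenerator(
    base_set=base_set,
    random_state=np.random.RandomState(0),
)
candidates = generator.generate_candidates_en_bulk(num_cands=2)
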

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetector[source]

Bases: object

contains(existing_candidates, new_candidate)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetectorNoDetection[source]

Bases: DuplicateDetector

contains(existing_candidates, new_candidate)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.bo_algorithm_components.DuplicateDetectorIdentical[source]

Bases: DuplicateDetector

contains(existing_candidates, new_candidate)[source]
Return type:

bool

syne_tune.optimizer.schedulers.searchers.bayesopt.tuning_algorithms.defaults module
syne_tune.optimizer.schedulers.searchers.bayesopt.utils package
Submodules
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy module
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.ThreeHumpCamel[source]

Bases: object

property search_space
evaluate(x1, x2)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.branin_function(x1, x2, r=6)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.Branin[source]

Bases: object

property search_space
evaluate(x1, x2)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.BraninWithR(r)[source]

Bases: Branin

evaluate(x1, x2)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.Ackley[source]

Bases: object

property search_space
evaluate(x1, x2)[source]
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.SimpleQuadratic[source]

Bases: object

property search_space
evaluate(x1, x2)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.evaluate_blackbox(bb_func, inputs)[source]
Return type:

ndarray

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.sample_data(bb_cls, num_train, num_grid, expand_datadct=True)[source]
Return type:

Dict[str, Any]

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.expand_data(data)[source]

Appends derived entries to data dict, which have non-elementary types.

Return type:

Dict[str, Any]

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.data_to_state(data)[source]
Return type:

TuningJobState

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.decode_inputs(inputs, ss_limits)[source]
Return type:

(List[Dict[str, Union[int, float, str]]], Dict)

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.assert_equal_candidates(candidates1, candidates2, hp_ranges, decimal=5)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.comparison_gpy.assert_equal_randomstate(randomstate1, randomstate2)[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.debug_log module
class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.debug_log.DebugLogPrinter[source]

Bases: object

Supports a concise debug log. In particular, information about get_config is displayed in a single block. For that, different parts are first collected until the end of get_config.

start_get_config(gc_type, trial_id)[source]
set_final_config(config)[source]
set_state(state)[source]
set_targets(targets)[source]
set_model_params(params)[source]
set_fantasies(fantasies)[source]
set_init_config(config, top_scores=None)[source]
set_num_evaluations(num_evals)[source]
append_extra(extra)[source]
write_block()[source]
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects module

Object definitions that are used for testing.

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.build_kernel(state, do_warping=False)[source]
Return type:

KernelFunction

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.default_gpmodel(state, random_seed, optimization_config)[source]
Return type:

GaussianProcessRegression

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.default_gpmodel_mcmc(state, random_seed, mcmc_config)[source]
Return type:

GPRegressionMCMC

class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.RepeatedCandidateGenerator(n_unique_candidates)[source]

Bases: CandidateGenerator

Generates candidates from a fixed set. Used to test the deduplication logic.

generate_candidates()[source]
Return type:

Iterator[Dict[str, Union[int, float, str]]]

class syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.Quadratic3d(local_minima, active_metric, metric_names)[source]

Bases: object

property search_space
property f_min
syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.tuples_to_configs(config_tpls, hp_ranges)[source]

Many unit tests write configs as tuples.

Return type:

List[Dict[str, Union[int, float, str]]]

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.create_exclusion_set(candidates_tpl, hp_ranges, is_dict=False)[source]

Creates exclusion list from set of tuples.

Return type:

ExclusionList

syne_tune.optimizer.schedulers.searchers.bayesopt.utils.test_objects.create_tuning_job_state(hp_ranges, cand_tuples, metrics, pending_tuples=None, failed_tuples=None)[source]

Builds TuningJobState from basics, where configs are given as tuples or as dicts.

NOTE: We assume that all configs in the different lists are different!

Return type:

TuningJobState

syne_tune.optimizer.schedulers.searchers.bore package
class syne_tune.optimizer.schedulers.searchers.bore.Bore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Implements “Bayesian optimization by Density Ratio Estimation” as described in the following paper:

BORE: Bayesian Optimization by Density-Ratio Estimation,
Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio
Proceedings of the 38th International Conference on Machine Learning

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (Optional[str]) – Can be “min” (default) or “max”.

  • gamma (Optional[float]) – Defines the percentile, i.e., how many percent of configurations are used to model \(l(x)\). Defaults to 0.25

  • calibrate (Optional[bool]) – If set to true, we calibrate the predictions of the classifier via CV. Defaults to False

  • classifier (Optional[str]) – The binary classifier to model the acquisition function. Choices: {"mlp", "gp", "xgboost", "rf", "logreg"}. Defaults to “xgboost”

  • acq_optimizer (Optional[str]) – The optimization method to maximize the acquisition function. Choices: {"de", "rs", "rs_with_replacement"}. Defaults to “rs”

  • feval_acq (Optional[int]) – Maximum allowed function evaluations of the acquisition function. Defaults to 500

  • random_prob (Optional[float]) – Probability of returning a random configuration (epsilon-greedy). Defaults to 0

  • init_random (Optional[int]) – get_config() returns randomly drawn configurations until at least init_random observations have been recorded in update(). After that, the BORE algorithm is used. Defaults to 6

  • classifier_kwargs (Optional[dict]) – Parameters for classifier. Optional

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

class syne_tune.optimizer.schedulers.searchers.bore.MultiFidelityBore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, resource_attr='epoch', **kwargs)[source]

Bases: Bore

Adapts BORE (Tiao et al.) to the multi-fidelity Hyperband setting, following BOHB (Falkner et al.). Once we have collected enough data points at the smallest resource level, we fit a probabilistic classifier and sample from it until we have a sufficient number of data points for the next higher resource level. We then refit the classifier on the data of this resource level. These steps are iterated until we reach the highest resource level. References:

BORE: Bayesian Optimization by Density-Ratio Estimation,
Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio
Proceedings of the 38th International Conference on Machine Learning

and

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Additional arguments on top of parent class Bore:

Parameters:

resource_attr (str) – Name of resource attribute. Defaults to “epoch”
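
The multi-fidelity variant is intended for HyperbandScheduler. A sketch under the assumption that the training script reports a "validation_error" metric and an "epoch" resource attribute; names and the value of max_t are placeholders.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler
from syne_tune.optimizer.schedulers.searchers.bore import MultiFidelityBore

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
}

searcher = MultiFidelityBore(
    config_space,
    metric="validation_error",
    mode="min",
    resource_attr="epoch",   # must match the attribute reported by the training script
)

scheduler = HyperbandScheduler(
    config_space,
    searcher=searcher,       # assumption: a searcher instance is accepted here
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_t=27,                # largest resource level (here: number of epochs)
)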

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

Submodules
syne_tune.optimizer.schedulers.searchers.bore.bore module
class syne_tune.optimizer.schedulers.searchers.bore.bore.Bore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Implements “Bayesian optimization by Density Ratio Estimation” as described in the following paper:

BORE: Bayesian Optimization by Density-Ratio Estimation,
Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio
Proceedings of the 38th International Conference on Machine Learning

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (Optional[str]) – Can be “min” (default) or “max”.

  • gamma (Optional[float]) – Defines the percentile, i.e., the fraction of configurations that are used to model \(l(x)\). Defaults to 0.25

  • calibrate (Optional[bool]) – If set to True, we calibrate the predictions of the classifier via cross-validation. Defaults to False

  • classifier (Optional[str]) – The binary classifier to model the acquisition function. Choices: {"mlp", "gp", "xgboost", "rf", "logreg"}. Defaults to “xgboost”

  • acq_optimizer (Optional[str]) – The optimization method to maximize the acquisition function. Choices: {"de", "rs", "rs_with_replacement"}. Defaults to “rs”

  • feval_acq (Optional[int]) – Maximum allowed function evaluations of the acquisition function. Defaults to 500

  • random_prob (Optional[float]) – Probability of returning a random configuration (epsilon-greedy). Defaults to 0

  • init_random (Optional[int]) – get_config() returns randomly drawn configurations until at least init_random observations have been recorded in update(). After that, the BORE algorithm is used. Defaults to 6

  • classifier_kwargs (Optional[dict]) – Parameters for classifier. Optional

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.bore.de module
class syne_tune.optimizer.schedulers.searchers.bore.de.DifferentialevolutionOptimizer(f, lower, upper, fevals, strategy='best1', bin=1)[source]

Bases: object

evolve(j)[source]
run()[source]
syne_tune.optimizer.schedulers.searchers.bore.gp_classififer module
syne_tune.optimizer.schedulers.searchers.bore.mlp_classififer module
class syne_tune.optimizer.schedulers.searchers.bore.mlp_classififer.MLP(n_inputs, n_hidden=32, epochs=100, learning_rate=0.001, activation='relu')[source]

Bases: object

fit(X, y)[source]
predict_proba(X)[source]
predict(X)[source]
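
This MLP is the binary classifier BORE uses internally when classifier="mlp". A toy sketch of fitting it directly; the array shapes and the interpretation of predict_proba() are assumptions.

import numpy as np

from syne_tune.optimizer.schedulers.searchers.bore.mlp_classififer import MLP

rng = np.random.RandomState(0)
X = rng.rand(64, 3)                     # assumed shape: (num_samples, n_inputs)
y = (X[:, 0] > 0.5).astype(np.float64)  # binary labels ("good" vs. "bad" configurations)

clf = MLP(n_inputs=3, n_hidden=16, epochs=10, learning_rate=1e-3)
clf.fit(X, y)
probs = clf.predict_proba(X)            # assumed: probabilities for the positive class
labels = clf.predict(X)
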
syne_tune.optimizer.schedulers.searchers.bore.multi_fidelity_bore module
class syne_tune.optimizer.schedulers.searchers.bore.multi_fidelity_bore.MultiFidelityBore(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, gamma=None, calibrate=None, classifier=None, acq_optimizer=None, feval_acq=None, random_prob=None, init_random=None, classifier_kwargs=None, resource_attr='epoch', **kwargs)[source]

Bases: Bore

Adapts BORE (Tiao et al.) to the multi-fidelity Hyperband setting, following BOHB (Falkner et al.). Once we have collected enough data points at the smallest resource level, we fit a probabilistic classifier and sample from it until we have a sufficient number of data points for the next higher resource level. We then refit the classifier on the data of this resource level. These steps are iterated until we reach the highest resource level. References:

BORE: Bayesian Optimization by Density-Ratio Estimation,
Tiao, Louis C and Klein, Aaron and Seeger, Matthias W and Bonilla, Edwin V. and Archambeau, Cedric and Ramos, Fabio
Proceedings of the 38th International Conference on Machine Learning

and

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Additional arguments on top of parent class Bore:

Parameters:

resource_attr (str) – Name of resource attribute. Defaults to “epoch”

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

syne_tune.optimizer.schedulers.searchers.botorch package
class syne_tune.optimizer.schedulers.searchers.botorch.BoTorchSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=False, restrict_configurations=None, mode='min', num_init_random=3, no_fantasizing=False, max_num_observations=200, input_warping=True, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

A searcher that suggests configurations using BoTorch to build a GP surrogate model and optimize an acquisition function.

qExpectedImprovement is used for the acquisition function, given that it supports pending evaluations.

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (str) – “min” (default) or “max”

  • num_init_random (int) – get_config() returns randomly drawn configurations until at least num_init_random observations have been recorded in update(). After that, the BoTorch algorithm is used. Defaults to 3

  • no_fantasizing (bool) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False

  • max_num_observations (Optional[int]) – Maximum number of observations to use when fitting the GP. If the number of observations gets larger than this number, then data is subsampled. If None, then all data is used to fit the GP. Defaults to 200

  • input_warping (bool) – Whether to apply input warping when fitting the GP. Defaults to True
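
A sketch of using this searcher with FIFOScheduler; it requires BoTorch to be installed, and the metric and hyperparameter names are placeholders.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.botorch import BoTorchSearcher

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
}

searcher = BoTorchSearcher(
    config_space,
    metric="validation_error",
    mode="min",
    num_init_random=5,          # random configurations before the GP model is used
    max_num_observations=200,   # subsample the data beyond this size when fitting the GP
    input_warping=True,
)

scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,          # assumption: a searcher instance is accepted here
    metric="validation_error",
    mode="min",
)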

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

num_suggestions()[source]
register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[dict]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

objectives()[source]
metric_names()[source]
Return type:

List[str]

metric_mode()[source]
Return type:

str

class syne_tune.optimizer.schedulers.searchers.botorch.BotorchSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BoTorchSearcher

Retained for backwards compatibility. Please use BoTorchSearcher instead.

Submodules
syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher module
class syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher.BoTorchSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=False, restrict_configurations=None, mode='min', num_init_random=3, no_fantasizing=False, max_num_observations=200, input_warping=True, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

A searcher that suggests configurations using BoTorch to build a GP surrogate model and optimize an acquisition function.

qExpectedImprovement is used for the acquisition function, given that it supports pending evaluations.

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (str) – “min” (default) or “max”

  • num_init_random (int) – get_config() returns randomly drawn configurations until at least num_init_random observations have been recorded in update(). After that, the BoTorch algorithm is used. Defaults to 3

  • no_fantasizing (bool) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False

  • max_num_observations (Optional[int]) – Maximum number of observations to use when fitting the GP. If the number of observations gets larger than this number, then data is subsampled. If None, then all data is used to fit the GP. Defaults to 200

  • input_warping (bool) – Whether to apply input warping when fitting the GP. Defaults to True

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

num_suggestions()[source]
register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[dict]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

objectives()[source]
metric_names()[source]
Return type:

List[str]

metric_mode()[source]
Return type:

str

class syne_tune.optimizer.schedulers.searchers.botorch.botorch_searcher.BotorchSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BoTorchSearcher

Retained for backwards compatibility. Please use BoTorchSearcher instead.

syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher module
syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.parse_value(val)[source]
syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.configs_from_df(df)[source]
Return type:

List[dict]

class syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.BoTorchTransfer(config_space, metric, transfer_learning_evaluations, new_task_id, random_seed=None, encode_tasks_ordinal=False, **kwargs)[source]

Bases: BoTorch

class syne_tune.optimizer.schedulers.searchers.botorch.botorch_transfer_searcher.BoTorchTransferSearcher(config_space, metric, transfer_learning_evaluations, new_task_id, points_to_evaluate=None, allow_duplicates=False, num_init_random=0, encode_tasks_ordinal=False, **kwargs)[source]

Bases: BoTorchSearcher

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

objectives()[source]
syne_tune.optimizer.schedulers.searchers.constrained package
class syne_tune.optimizer.schedulers.searchers.constrained.ConstrainedGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPFIFOSearcher

Gaussian process-based constrained hyperparameter optimization (to be used with FIFOScheduler).

Additional arguments on top of parent class MultiModelGPFIFOSearcher:

Parameters:

constraint_attr – Name of constraint metric in report passed to _update().
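
A sketch, assuming the training script reports both the objective ("validation_error") and a constraint metric ("constraint_value"); all names are placeholders, and direct construction of the searcher is shown only for illustration.

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.constrained import ConstrainedGPFIFOSearcher

config_space = {"learning_rate": loguniform(1e-5, 1e-1)}

searcher = ConstrainedGPFIFOSearcher(
    config_space,
    metric="validation_error",
    constraint_attr="constraint_value",  # name of the constraint metric in each report
)

scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,   # assumption: a searcher instance is accepted here
    metric="validation_error",
    mode="min",
)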

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

Submodules
syne_tune.optimizer.schedulers.searchers.constrained.constrained_gp_fifo_searcher module
class syne_tune.optimizer.schedulers.searchers.constrained.constrained_gp_fifo_searcher.ConstrainedGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPFIFOSearcher

Gaussian process-based constrained hyperparameter optimization (to be used with FIFOScheduler).

Additional arguments on top of parent class MultiModelGPFIFOSearcher:

Parameters:

constraint_attr – Name of constraint metric in report passed to _update().

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.cost_aware package
class syne_tune.optimizer.schedulers.searchers.cost_aware.CostAwareGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPFIFOSearcher

Gaussian process-based cost-aware hyperparameter optimization (to be used with FIFOScheduler). The searcher requires a cost metric, which is given by cost_attr.

Implements two different variants. If resource_attr is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by kwargs["cost_model"].

If resource_attr is not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.

Note: The presence or absence of resource_attr decides which variant is used. If resource_attr is given, cost_model must be given as well.

Additional arguments on top of parent class GPFIFOSearcher:

Parameters:
  • cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • resource_attr (str, optional) – Name of resource attribute in reports, optional. If this is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by cost_model. If not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.

  • cost_model (CostModel, optional) – Needed if resource_attr is given, model for \(c(x, r)\). Ignored if resource_attr is not given, since \(c(x)\) is represented by a default GP surrogate model.
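
A sketch of the simpler \(c(x)\) variant, where resource_attr is not given and no cost_model is needed; "elapsed_time" stands in for whatever cost attribute the training script reports, and all names are placeholders.

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.cost_aware import CostAwareGPFIFOSearcher

config_space = {"learning_rate": loguniform(1e-5, 1e-1)}

# Without resource_attr, cost is modeled as c(x) by a default GP surrogate model
searcher = CostAwareGPFIFOSearcher(
    config_space,
    metric="validation_error",
    cost_attr="elapsed_time",   # cost read at the end of each trial, like the primary metric
)

scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,   # assumption: a searcher instance is accepted here
    metric="validation_error",
    mode="min",
)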

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

class syne_tune.optimizer.schedulers.searchers.cost_aware.CostAwareGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPMultiFidelitySearcher

Gaussian process-based cost-aware multi-fidelity hyperparameter optimization (to be used with HyperbandScheduler). The searcher requires a cost metric, which is given by cost_attr.

The acquisition function used here is the same as in GPMultiFidelitySearcher, but expected improvement (EI) is replaced by EIpu (see EIpuAcquisitionFunction).

Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by kwargs["cost_model"].

Additional arguments on top of parent class GPMultiFidelitySearcher:

Parameters:
  • cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • resource_attr (str) – Name of resource attribute in reports. Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by cost_model.

  • cost_model (CostModel, optional) – Model for \(c(x, r)\)

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

Submodules
syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher module
class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher.MultiModelGPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]

Bases: GPFIFOSearcher

Superclass for multi-model extensions of GPFIFOSearcher. We first call _create_internal() passing factory and skip_optimization predicate for the INTERNAL_METRIC_NAME model, then replace the state transformer by a multi-model one.

class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_fifo_searcher.CostAwareGPFIFOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPFIFOSearcher

Gaussian process-based cost-aware hyperparameter optimization (to be used with FIFOScheduler). The searcher requires a cost metric, which is given by cost_attr.

Implements two different variants. If resource_attr is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by kwargs["cost_model"].

If resource_attr is not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.

Note: The presence or absence of resource_attr decides which variant is used. If resource_attr is given, cost_model must be given as well.

Additional arguments on top of parent class GPFIFOSearcher:

Parameters:
  • cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • resource_attr (str, optional) – Name of resource attribute in reports, optional. If this is given, cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by cost_model. If not given, cost values are read only at the end (just like the primary metric) and cost is modeled as \(c(x)\), using a default GP surrogate model.

  • cost_model (CostModel, optional) – Needed if resource_attr is given, model for \(c(x, r)\). Ignored if resource_attr is not given, since \(c(x)\) is represented by a default GP surrogate model.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher module
class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher.MultiModelGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: GPMultiFidelitySearcher

Superclass for multi-model extensions of GPMultiFidelitySearcher. We first call _create_internal() passing factory and skip_optimization predicate for the INTERNAL_METRIC_NAME model, then replace the state transformer by a multi-model one.

class syne_tune.optimizer.schedulers.searchers.cost_aware.cost_aware_gp_multifidelity_searcher.CostAwareGPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: MultiModelGPMultiFidelitySearcher

Gaussian process-based cost-aware multi-fidelity hyperparameter optimization (to be used with HyperbandScheduler). The searcher requires a cost metric, which is given by cost_attr.

The acquisition function used here is the same as in GPMultiFidelitySearcher, but expected improvement (EI) is replaced by EIpu (see EIpuAcquisitionFunction).

Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by kwargs["cost_model"].

Additional arguments on top of parent class GPMultiFidelitySearcher:

Parameters:
  • cost_attr (str) – Mandatory. Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • resource_attr (str) – Name of resource attribute in reports. Cost values are read from each report and cost is modeled as \(c(x, r)\), the cost model being given by cost_model.

  • cost_model (CostModel, optional) – Model for \(c(x, r)\)

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.dyhpo package
class syne_tune.optimizer.schedulers.searchers.dyhpo.DynamicHPOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BaseSearcher

Supports model-based decisions in the DyHPO algorithm proposed by Wistuba et al. (see DyHPORungSystem).

It is not recommended to create DynamicHPOSearcher objects directly, but rather to create HyperbandScheduler objects with searcher="dyhpo" and type="dyhpo", passing arguments of this searcher in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

This searcher is special, in that it contains a searcher of type GPMultiFidelitySearcher. Also, its model-based scoring is not triggered by get_config(), but rather when the scheduler tries to find a trial which can be promoted. At this point, score_paused_trials_and_new_configs() is called, which scores all paused trials along with new configurations. Depending on which is the best scorer, a paused trial is resumed, or a trial with a new configuration is started. Since all the work is already done in score_paused_trials_and_new_configs(), the implementation of get_config() becomes trivial. See also DyHPORungSystem. Extra points:

  • The number of new configurations scored in score_paused_trials_and_new_configs() is the maximum of num_init_candidates and the number of paused trials scored as well

  • The parameters of the surrogate model are not refit in every call of score_paused_trials_and_new_configs(), but only when, in the most recent call, a new configuration was chosen as the top scorer. The aim is to refit at a frequency similar to MOBSTER, where decisions on whether to resume a trial are not done in a model-based way.

This searcher must be used with HyperbandScheduler and type="dyhpo". It has the same constructor parameters as GPMultiFidelitySearcher. Of these, the following are not used, but need to be given valid values: resource_acq, initial_scoring, skip_local_optimization.
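
Following the recommendation above, the searcher is selected by name when creating the scheduler. A sketch with placeholder metric, resource, and hyperparameter names; the search_options entry is only an example of an argument forwarded to the searcher.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 27,   # maximum number of epochs, read by the training script
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="dyhpo",
    type="dyhpo",
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    search_options={"num_init_candidates": 10},  # forwarded to DynamicHPOSearcher
)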

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[dict]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[dict]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows to remove one case from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id)[source]

This method computes acquisition scores for a number of extended configs \((x, r)\). The acquisition score \(EI(x | r)\) is expected improvement (EI) at resource level \(r\). Here, the incumbent used in EI is the best value attained at level \(r\), or the best value overall if there is no data yet at that level. There are two types of configs being scored:

  • Paused trials: Passed by paused_trials as tuples (trial_id, resource), where resource is the level to be attained by the trial if it was resumed

  • New configurations drawn at random. For these, the score is EI at \(r\) equal to min_resource

We return a dictionary. If a paused trial wins, its trial_id is returned with key “trial_id”. If a new configuration wins, this configuration is returned with key “config”.

Note: As long as the internal searcher still returns configs from points_to_evaluate or drawn at random, this method always returns this config with key “config”. Scoring and considering paused trials is only done afterwards.

Parameters:
  • paused_trials (List[Tuple[str, int, int]]) – See above. Can be empty

  • min_resource (int) – Smallest resource level

  • new_trial_id (str) – ID of new trial to be started in case a new configuration wins

Return type:

Dict[str, Any]

Returns:

Dictionary, see above

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

property debug_log: DebugLogPrinter | None

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

Submodules
syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher module
class syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher.MyGPMultiFidelitySearcher(config_space, **kwargs)[source]

Bases: GPMultiFidelitySearcher

This wrapper is for convenience, to avoid having to depend on internal concepts of GPMultiFidelitySearcher.

score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id, skip_optimization)[source]

See DynamicHPOSearcher.score_paused_trials_and_new_configs(). If skip_optimization == True, this is passed to the posterior state computation, and refitting of the surrogate model is skipped. Otherwise, nothing is passed, so the built-in skip_optimization logic is used.

Return type:

Dict[str, Any]

class syne_tune.optimizer.schedulers.searchers.dyhpo.dyhpo_searcher.DynamicHPOSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BaseSearcher

Supports model-based decisions in the DyHPO algorithm proposed by Wistuba et al. (see DyHPORungSystem).

It is not recommended to create DynamicHPOSearcher objects directly, but rather to create HyperbandScheduler objects with searcher="dyhpo" and type="dyhpo", passing arguments of this searcher in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

This searcher is special, in that it contains a searcher of type GPMultiFidelitySearcher. Also, its model-based scoring is not triggered by get_config(), but rather when the scheduler tries to find a trial which can be promoted. At this point, score_paused_trials_and_new_configs() is called, which scores all paused trials along with new configurations. Depending on which is the best scorer, a paused trial is resumed, or a trial with a new configuration is started. Since all the work is already done in score_paused_trials_and_new_configs(), the implementation of get_config() becomes trivial. See also DyHPORungSystem. Extra points:

  • The number of new configurations scored in score_paused_trials_and_new_configs() is the maximum of num_init_candidates and the number of paused trials scored as well

  • The parameters of the surrogate model are not refit in every call of score_paused_trials_and_new_configs(), but only when, in the most recent call, a new configuration was chosen as the top scorer. The aim is to refit at a frequency similar to MOBSTER, where decisions on whether to resume a trial are not done in a model-based way.

This searcher must be used with HyperbandScheduler and type="dyhpo". It has the same constructor parameters as GPMultiFidelitySearcher. Of these, the following are not used, but need to be given valid values: resource_acq, initial_scoring, skip_local_optimization.

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[dict]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[dict]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows to remove one case from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

score_paused_trials_and_new_configs(paused_trials, min_resource, new_trial_id)[source]

This method computes acquisition scores for a number of extended configs \((x, r)\). The acquisition score \(EI(x | r)\) is expected improvement (EI) at resource level \(r\). Here, the incumbent used in EI is the best value attained at level \(r\), or the best value overall if there is no data yet at that level. There are two types of configs being scored:

  • Paused trials: Passed by paused_trials as tuples (trial_id, resource), where resource is the level to be attained by the trial if it was resumed

  • New configurations drawn at random. For these, the score is EI at \(r\) equal to min_resource

We return a dictionary. If a paused trial wins, its trial_id is returned with key “trial_id”. If a new configuration wins, this configuration is returned with key “config”.

Note: As long as the internal searcher still returns configs from points_to_evaluate or drawn at random, this method always returns this config with key “config”. Scoring and considering paused trials is only done afterwards.

Parameters:
  • paused_trials (List[Tuple[str, int, int]]) – See above. Can be empty

  • min_resource (int) – Smallest resource level

  • new_trial_id (str) – ID of new trial to be started in case a new configuration wins

Return type:

Dict[str, Any]

Returns:

Dictionary, see above

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

property debug_log: DebugLogPrinter | None

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo module
class syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.ScheduleDecision[source]

Bases: object

PROMOTE_SH = 0
PROMOTE_DYHPO = 1
START_DYHPO = 2
class syne_tune.optimizer.schedulers.searchers.dyhpo.hyperband_dyhpo.DyHPORungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, searcher, probability_sh, random_state)[source]

Bases: PromotionRungSystem

Implements the logic which decides which paused trial to promote to the next resource level, or alternatively which configuration to start as a new trial, proposed in:

Wistuba, M. and Kadra, A. and Grabocka, J.
Dynamic and Efficient Gray-Box Hyperparameter Optimization for Deep Learning

We do promotion-based scheduling, as in PromotionRungSystem. In fact, we run the successive halving rule in on_task_schedule() with probability probability_sh, and the DyHPO logic otherwise, or if the SH rule does not promote a trial. This mechanism (not contained in the paper) ensures that trials are promoted eventually, even if DyHPO only starts new trials.

Since HyperbandScheduler was designed for promotion decisions to be separate from decisions about new configs, the overall workflow is a bit tricky:

  • In FIFOScheduler._suggest(), we first call promote_trial_id, extra_kwargs = self._promote_trial(). If promote_trial_id != None, this trial is promoted. Otherwise, we call config = self.searcher.get_config(**extra_kwargs, trial_id=trial_id) and start a new trial with this config. In most cases, _promote_trial() makes a promotion decision without using the searcher.

  • Here, we use the fact that information can be passed from _promote_trial() to self.searcher.get_config via extra_kwargs. Namely, HyperbandScheduler._promote_trial() calls on_task_schedule() here, which calls score_paused_trials_and_new_configs(), where everything happens.

  • First, all paused trials are scored w.r.t. the value of running them for one more unit of resource. Also, a number of random configs are scored w.r.t. the value of running them to the minimum resource.

  • If the winning config is from a paused trial, this is resumed. If the winning config is a new one, on_task_schedule() returns this config using a special key KEY_NEW_CONFIGURATION. This dict becomes part of extra_kwargs and is passed to self.searcher.get_config

  • get_config() is trivial: it obtains an argument named KEY_NEW_CONFIGURATION and returns its value, which is the winning config to be started as a new trial

We can ignore rung_levels and promote_quantiles, they are not used. For each trial, we only need to maintain the resource level at which it is paused.

on_task_schedule(new_trial_id)[source]

The main decision making happens here. We collect (trial_id, resource) for all paused trials and call searcher. The searcher scores all these trials along with a certain number of randomly drawn new configurations.

If one of the paused trials has the best score, we return its trial_id along with extra information, so it gets promoted. If one of the new configurations has the best score, we return this configuration. In this case, a new trial is started with this configuration.

Note: For this scheduler type, kwargs must contain the trial ID of the new trial to be started, in case none can be promoted.

Return type:

Dict[str, Any]

property schedule_records: List[Tuple[str, int, int]]
static summary_schedule_keys()[source]
Return type:

List[str]

summary_schedule_records()[source]
Return type:

Dict[str, Any]

support_early_checkpoint_removal()[source]

Early checkpoint removal is currently not supported for DyHPO.

Return type:

bool

syne_tune.optimizer.schedulers.searchers.hypertune package
class syne_tune.optimizer.schedulers.searchers.hypertune.HyperTuneSearcher(config_space, **kwargs)[source]

Bases: GPMultiFidelitySearcher

Implements Hyper-Tune as extension of GPMultiFidelitySearcher, see HyperTuneIndependentGPModel for references. Two modifications:

  • New brackets are sampled from a model-based distribution \([w_k]\)

  • The acquisition function is fed with predictive means and variances from a mixture over rung level distributions, weighted by \([\theta_k]\)

It is not recommended to create HyperTuneSearcher objects directly, but rather to create HyperbandScheduler objects with searcher="hypertune", passing arguments of this searcher in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

The following arguments of the parent class are not relevant here, and are ignored: gp_resource_kernel, resource_acq, issm_gamma_one, expdecay_normalize_inputs.

Additional arguments on top of parent class GPMultiFidelitySearcher:

Parameters:
  • model (str, optional) –

    Selects surrogate model (learning curve model) to be used. Choices are:

    • ”gp_multitask”: GP multi-task surrogate model

    • ”gp_independent” (default): Independent GPs for each rung level, sharing an ARD kernel

    The default is “gp_independent” (as in the Hyper-Tune paper), which is different to the default in GPMultiFidelitySearcher (which is “gp_multitask”). “gp_issm”, “gp_expdecay” not supported here.

  • hypertune_distribution_num_samples (int, optional) – Parameter for estimating the distribution, given by \([\theta_k]\). Defaults to 50
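
As recommended above, Hyper-Tune is selected via the scheduler. A sketch with placeholder metric, resource, and hyperparameter names; the number of brackets is an assumption chosen for illustration.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 27,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="hypertune",
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    brackets=4,   # Hyper-Tune samples among several brackets
    search_options={
        "model": "gp_independent",
        "hypertune_distribution_num_samples": 50,
    },
)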

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

Submodules
syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_bracket_distribution module
class syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_bracket_distribution.HyperTuneBracketDistribution[source]

Bases: DefaultHyperbandBracketDistribution

Represents the adaptive distribution over brackets \([w_k]\).

configure(scheduler)[source]

This method is called by the scheduler just after self.searcher.configure_scheduler. The searcher must be accessible via self.searcher. The __call__() method cannot be used before this method has been called.

syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_searcher module
class syne_tune.optimizer.schedulers.searchers.hypertune.hypertune_searcher.HyperTuneSearcher(config_space, **kwargs)[source]

Bases: GPMultiFidelitySearcher

Implements Hyper-Tune as extension of GPMultiFidelitySearcher, see HyperTuneIndependentGPModel for references. Two modifications:

  • New brackets are sampled from a model-based distribution \([w_k]\)

  • The acquisition function is fed with predictive means and variances from a mixture over rung level distributions, weighted by \([\theta_k]\)

It is not recommended to create HyperTuneSearcher objects directly, but rather to create HyperbandScheduler objects with searcher="hypertune", passing arguments of this searcher in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

The following arguments of the parent class are not relevant here, and are ignored: gp_resource_kernel, resource_acq, issm_gamma_one, expdecay_normalize_inputs.

Additional arguments on top of parent class GPMultiFidelitySearcher:

Parameters:
  • model (str, optional) –

    Selects surrogate model (learning curve model) to be used. Choices are:

    • ”gp_multitask”: GP multi-task surrogate model

    • ”gp_independent” (default): Independent GPs for each rung level, sharing an ARD kernel

    The default is “gp_independent” (as in the Hyper-Tune paper), which is different to the default in GPMultiFidelitySearcher (which is “gp_multitask”). “gp_issm”, “gp_expdecay” not supported here.

  • hypertune_distribution_num_samples (int, optional) – Parameter for estimating the distribution, given by \([\theta_k]\). Defaults to 50

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

syne_tune.optimizer.schedulers.searchers.kde package
class syne_tune.optimizer.schedulers.searchers.kde.KernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Fits two kernel density estimators (KDE) to model the density of the top N configurations as well as the density of the configurations that are not among the top N, respectively. New configurations are sampled by optimizing the ratio of these two densities. KDE as a model for Bayesian optimization was originally proposed by Bergstra et al. Compared to their original implementation, TPE, we use multivariate instead of univariate KDEs, as proposed by Falkner et al. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster

Algorithms for Hyper-Parameter Optimization
J. Bergstra and R. Bardenet and Y. Bengio and B. Kégl
Proceedings of the 24th International Conference on Advances in Neural Information Processing Systems

and

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Note: restrict_configurations is not supported here; supporting it would require reimplementing the selection of configs in _get_config().

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (Optional[str]) – Mode to use for the metric given, can be “min” or “max”. Is obtained from scheduler in configure_scheduler(). Defaults to “min”

  • num_min_data_points (Optional[int]) – Minimum number of data points that we use to fit the KDEs. As long as fewer observations have been received in update(), randomly drawn configurations are returned in get_config(). If set to None, we set this to the number of hyperparameters. Defaults to None.

  • top_n_percent (Optional[int]) – Determines the percentage of datapoints used to fit the first KDE model, which models the well-performing configurations. Defaults to 15

  • min_bandwidth (Optional[float]) – The minimum bandwidth for the KDE models. Defaults to 1e-3

  • num_candidates (Optional[int]) – Number of candidates that are sampled to optimize the acquisition function. Defaults to 64

  • bandwidth_factor (Optional[int]) – We sample continuous hyperparameters from a truncated Normal. This factor is multiplied by the bandwidth to define the standard deviation of this truncated Normal. Defaults to 3

  • random_fraction (Optional[float]) – Defines the fraction of configurations that are drawn uniformly at random instead of sampling from the model. Defaults to 0.33
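
A sketch of using the KDE searcher with FIFOScheduler. The string shortcut searcher="kde" is assumed to be available; otherwise a KernelDensityEstimator instance can be passed instead. Metric and hyperparameter names are placeholders.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
}

scheduler = FIFOScheduler(
    config_space,
    searcher="kde",              # assumption: string shortcut for KernelDensityEstimator
    metric="validation_error",
    mode="min",
    search_options={
        "top_n_percent": 15,     # percentage of datapoints modeled as well-performing
        "min_bandwidth": 1e-3,
        "random_fraction": 0.33,
    },
)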

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

class syne_tune.optimizer.schedulers.searchers.kde.MultiFidelityKernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, resource_attr=None, **kwargs)[source]

Bases: KernelDensityEstimator

Adapts KernelDensityEstimator to the multi-fidelity setting as proposed by Falkner et al., such that we can use it with Hyperband. Following Falkner et al., we fit the KDE only on the highest resource level where we have at least num_min_data_points observations. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Additional arguments on top of parent class KernelDensityEstimator:

Parameters:

resource_attr (Optional[str]) – Name of resource attribute. Defaults to scheduler.resource_attr in configure_scheduler()
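
The multi-fidelity variant is paired with HyperbandScheduler, which corresponds to BOHB. A sketch with placeholder names, under the assumption that searcher="kde" resolves to this class when used with HyperbandScheduler.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 27,
}

scheduler = HyperbandScheduler(
    config_space,
    searcher="kde",              # assumption: resolves to MultiFidelityKernelDensityEstimator here
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
)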

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

Submodules
syne_tune.optimizer.schedulers.searchers.kde.kde_searcher module
class syne_tune.optimizer.schedulers.searchers.kde.kde_searcher.KernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Fits two kernel density estimators (KDE) to model the density of the top N configurations as well as the density of the configurations that are not among the top N, respectively. New configurations are sampled by optimizing the ratio of these two densities. KDE as a model for Bayesian optimization was originally proposed by Bergstra et al. Compared to their original implementation, TPE, we use multivariate instead of univariate KDEs, as proposed by Falkner et al. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster

Algorithms for Hyper-Parameter Optimization
J. Bergstra and R. Bardenet and Y. Bengio and B. Kégl
Proceedings of the 24th International Conference on Advances in Neural Information Processing Systems

and

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Note: restrict_configurations is not supported here, this would require reimplementing the selection of configs in _get_config().

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • mode (Optional[str]) – Mode to use for the metric given, can be “min” or “max”. Is obtained from scheduler in configure_scheduler(). Defaults to “min”

  • num_min_data_points (Optional[int]) – Minimum number of data points that we use to fit the KDEs. As long as fewer observations have been received in update(), randomly drawn configurations are returned in get_config(). If set to None, we set this to the number of hyperparameters. Defaults to None.

  • top_n_percent (Optional[int]) – Determines how many datapoints we use to fit the first KDE model, which models the well-performing configurations. Defaults to 15

  • min_bandwidth (Optional[float]) – The minimum bandwidth for the KDE models. Defaults to 1e-3

  • num_candidates (Optional[int]) – Number of candidates that are sampled to optimize the acquisition function. Defaults to 64

  • bandwidth_factor (Optional[int]) – Continuous hyperparameters are sampled from a truncated Normal. The bandwidth is multiplied by this factor to define the standard deviation of this truncated Normal. Defaults to 3

  • random_fraction (Optional[float]) – Defines the fraction of configurations that are drawn uniformly at random instead of sampling from the model. Defaults to 0.33

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.kde.multi_fidelity_kde_searcher module
class syne_tune.optimizer.schedulers.searchers.kde.multi_fidelity_kde_searcher.MultiFidelityKernelDensityEstimator(config_space, metric, points_to_evaluate=None, allow_duplicates=None, mode=None, num_min_data_points=None, top_n_percent=None, min_bandwidth=None, num_candidates=None, bandwidth_factor=None, random_fraction=None, resource_attr=None, **kwargs)[source]

Bases: KernelDensityEstimator

Adapts KernelDensityEstimator to the multi-fidelity setting as proposed by Falkner et al., so that it can be used with Hyperband. Following Falkner et al., we fit the KDE only on the highest resource level where we have at least num_min_data_points observations. Code is based on the implementation by Falkner et al.: https://github.com/automl/HpBandSter/tree/master/hpbandster

BOHB: Robust and Efficient Hyperparameter Optimization at Scale
S. Falkner and A. Klein and F. Hutter
Proceedings of the 35th International Conference on Machine Learning

Additional arguments on top of parent class KernelDensityEstimator:

Parameters:

resource_attr (Optional[str]) – Name of resource attribute. Defaults to scheduler.resource_attr in configure_scheduler()

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

syne_tune.optimizer.schedulers.searchers.sklearn package
class syne_tune.optimizer.schedulers.searchers.sklearn.SKLearnSurrogateSearcher(config_space, metric, estimator, points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

SKLearn Surrogate Bayesian optimization for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a surrogate model built from a scikit-learn estimator.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • estimator (SKLearnEstimator) – Instance of SKLearnEstimator to be used as surrogate model

  • scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.

  • num_initial_candidates (int) – Number of candidates sampled for scoring with acquisition function.

  • num_initial_random_choices (int) – Number of randomly chosen candidates before surrogate model is used.

  • allow_duplicates (bool) – If True, allow for the same candidate to be selected more than once.

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
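A minimal sketch of wiring this searcher into a FIFOScheduler. Here, my_estimator stands for a user-provided SKLearnEstimator instance (for example, one wrapping a scikit-learn regressor); its construction is not shown, and the config space and metric name are illustrative:

from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.sklearn import SKLearnSurrogateSearcher

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "momentum": uniform(0.1, 0.99),
}

# my_estimator: hypothetical user-defined SKLearnEstimator instance
# (construction not shown here)
searcher = SKLearnSurrogateSearcher(
    config_space=config_space,
    metric="validation_error",
    estimator=my_estimator,
    num_initial_random_choices=5,  # random configs before the surrogate is used
)

# the searcher object can be passed to the scheduler directly
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,
    metric="validation_error",
    mode="min",
)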

Submodules
syne_tune.optimizer.schedulers.searchers.sklearn.sklearn_surrogate_searcher module
class syne_tune.optimizer.schedulers.searchers.sklearn.sklearn_surrogate_searcher.SKLearnSurrogateSearcher(config_space, metric, estimator, points_to_evaluate=None, scoring_class=None, num_initial_candidates=250, num_initial_random_choices=3, allow_duplicates=False, restrict_configurations=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

SKLearn Surrogate Bayesian optimization for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a surrogate model built from a scikit-learn estimator.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • estimator (SKLearnEstimator) – Instance of SKLearnEstimator to be used as surrogate model

  • scoring_class (Optional[Callable[[Any], ScoringFunction]]) – The scoring function (or acquisition function) class and any extra parameters used to instantiate it. If None, expected improvement (EI) is used. Note that the acquisition function is not locally optimized with this searcher.

  • num_initial_candidates (int) – Number of candidates sampled for scoring with acquisition function.

  • num_initial_random_choices (int) – Number of randomly chosen candidates before surrogate model is used.

  • allow_duplicates (bool) – If True, allow for the same candidate to be selected more than once.

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – If given, the searcher only suggests configurations from this list. If allow_duplicates == False, entries are popped off this list once suggested.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.utils package
class syne_tune.optimizer.schedulers.searchers.utils.HyperparameterRanges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Bases: object

Wraps configuration space, provides services around encoding of hyperparameters (mapping configurations to [0, 1] vectors and vice versa).

If name_last_pos is given, the hyperparameter of that name is assigned the final position in the vector returned by to_ndarray(). This can be used to single out the (time) resource for a GP model, where that component has to come last.

If in this case (name_last_pos given), value_for_last_pos is also given, some methods are modified:

  • random_config() samples a config as normal, but then overwrites the name_last_pos component by value_for_last_pos

  • get_ndarray_bounds() works as normal, but returns bound (a, a) for name_last_pos component, where a is the internal value corresponding to value_for_last_pos

The use case is HPO with a resource attribute. This attribute should be fixed when optimizing the acquisition function, but can take different values in the evaluation data (coming from all previous searches).

If active_config_space is given, it contains a subset of non-constant hyperparameters in config_space, and the range of each entry is a subset of the range of the corresponding config_space entry. These active ranges affect the choice of new configs (by sampling). While the internal encoding is based on original ranges, search is restricted to active ranges (e.g., optimization of surrogate model). This option is required to implement transfer tuning, where domain ranges in config_space may be narrower than what data from past tuning jobs requires.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space. Constant hyperparameters are filtered out here

  • name_last_pos (Optional[str]) – See above, optional

  • value_for_last_pos – See above, optional

  • active_config_space (Optional[dict]) – See above, optional

  • prefix_keys (Optional[List[str]]) – If given, these keys into config_space come first in the internal ordering, which determines the internal encoding. Optional

property internal_keys: List[str]
property config_space_for_sampling: Dict[str, Any]
to_ndarray(config)[source]

Map configuration to [0, 1] encoded vector

Parameters:

config (Dict[str, Union[int, float, str]]) – Configuration to encode

Return type:

ndarray

Returns:

Encoded vector

to_ndarray_matrix(configs)[source]

Map configurations to [0, 1] encoded matrix

Parameters:

configs (Iterable[Dict[str, Union[int, float, str]]]) – Configurations to encode

Return type:

ndarray

Returns:

Matrix of encoded vectors (rows)

property ndarray_size: int
Returns:

Dimensionality of encoded vector returned by to_ndarray

from_ndarray(enc_config)[source]

Maps encoded vector back to configuration (can involve rounding)

The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.

Parameters:

enc_config (ndarray) – Encoded vector

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to encoded vector

property encoded_ranges: Dict[str, Tuple[int, int]]

Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.

Returns:

Ranges of hyperparameters in the encoded ndarray representation

is_attribute_fixed()[source]
Returns:

Is last position attribute fixed?

random_config(random_state)[source]

Draws random configuration

Parameters:

random_state (RandomState) – Random state

Return type:

Dict[str, Union[int, float, str]]

Returns:

Random configuration

random_configs(random_state, num_configs)[source]

Draws random configurations

Parameters:
  • random_state – Random state

  • num_configs (int) – Number of configurations to sample

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

Random configurations

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

Returns:

List of (lower, upper) bounds for each dimension in encoded vector representation.

filter_for_last_pos_value(configs)[source]

If is_attribute_fixed, configs is filtered by removing entries whose name_last_pos attribute value is different from value_for_last_pos. Otherwise, it is returned unchanged.

Parameters:

configs (List[Dict[str, Union[int, float, str]]]) – List of configs to be filtered

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

Filtered list of configs

config_to_tuple(config, keys=None, skip_last=False)[source]
Parameters:
  • config (Dict[str, Union[int, float, str]]) – Configuration

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended

Return type:

Tuple[Union[str, int, float], ...]

Returns:

Tuple representation

tuple_to_config(config_tpl, keys=None, skip_last=False)[source]

Reverse of config_to_tuple().

Parameters:
  • config_tpl (Tuple[Union[str, int, float], ...]) – Tuple representation

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to config_tpl

config_to_match_string(config, keys=None, skip_last=False)[source]

Maps configuration to match string, used to compare for approximate equality. Two configurations are considered to be different if their match strings are not the same.

Parameters:
  • config (Dict[str, Union[int, float, str]]) – Configuration

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and match string are non-extended

Return type:

str

Returns:

Match string

syne_tune.optimizer.schedulers.searchers.utils.make_hyperparameter_ranges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Default method to create HyperparameterRanges from config_space

Parameters:
Return type:

HyperparameterRanges

Returns:

New object
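A small sketch of the encoding round trip, using the factory above (the config space is an illustrative assumption):

from syne_tune.config_space import choice, loguniform, randint
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 8),
    "optimizer": choice(["sgd", "adam"]),
}
hp_ranges = make_hyperparameter_ranges(config_space)

config = {"learning_rate": 1e-3, "num_layers": 4, "optimizer": "adam"}
enc = hp_ranges.to_ndarray(config)      # [0, 1]-encoded vector of size ndarray_size
decoded = hp_ranges.from_ndarray(enc)   # back to a configuration (may involve rounding)
print(hp_ranges.ndarray_size, decoded)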

class syne_tune.optimizer.schedulers.searchers.utils.HyperparameterRangesImpl(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Bases: HyperparameterRanges

Basic implementation of HyperparameterRanges.

Parameters:
property ndarray_size: int
Returns:

Dimensionality of encoded vector returned by to_ndarray

to_ndarray(config)[source]

Map configuration to [0, 1] encoded vector

Parameters:

config (Dict[str, Union[int, float, str]]) – Configuration to encode

Return type:

ndarray

Returns:

Encoded vector

from_ndarray(enc_config)[source]

Maps encoded vector back to configuration (can involve rounding)

The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.

Parameters:

enc_config (ndarray) – Encoded vector

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to encoded vector

property encoded_ranges: Dict[str, Tuple[int, int]]

Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.

Returns:

Ranges of hyperparameters in the encoded ndarray representation

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

Returns:

List of (lower, upper) bounds for each dimension in encoded vector representation.

class syne_tune.optimizer.schedulers.searchers.utils.LinearScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

class syne_tune.optimizer.schedulers.searchers.utils.LogScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

class syne_tune.optimizer.schedulers.searchers.utils.ReverseLogScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

syne_tune.optimizer.schedulers.searchers.utils.get_scaling(hp_range)[source]
Return type:

Scaling
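A small sketch of how the scaling helpers behave: to_internal() maps a hyperparameter value to its internal representation and from_internal() inverts it (for LogScaling, presumably a log/exp pair):

from syne_tune.optimizer.schedulers.searchers.utils import LinearScaling, LogScaling

lin = LinearScaling()
log = LogScaling()

# each scaling and its inverse form a round trip
assert lin.from_internal(lin.to_internal(0.5)) == 0.5
x = 1e-3
assert abs(log.from_internal(log.to_internal(x)) - x) < 1e-10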

Submodules
syne_tune.optimizer.schedulers.searchers.utils.common module
syne_tune.optimizer.schedulers.searchers.utils.default_arguments module
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.CheckType[source]

Bases: object

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Float(lower=None, upper=None)[source]

Bases: CheckType

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Integer(lower=None, upper=None)[source]

Bases: CheckType

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.IntegerOrNone(lower=None, upper=None)[source]

Bases: Integer

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Categorical(choices)[source]

Bases: CheckType

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.String[source]

Bases: CheckType

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Boolean[source]

Bases: CheckType

assert_valid(key, value)[source]
class syne_tune.optimizer.schedulers.searchers.utils.default_arguments.Dictionary[source]

Bases: CheckType

assert_valid(key, value)[source]
syne_tune.optimizer.schedulers.searchers.utils.default_arguments.check_and_merge_defaults(options, mandatory, default_options, constraints=None, dict_name=None)[source]

First, check that all keys in mandatory appear in options. Second, create result_options by merging options and default_options, where entries in options have precedence. Finally, if constraints is given, this is used to check validity of values.

Parameters:
  • options (Dict[str, Any]) – Input arguments

  • mandatory (Set[str]) – Set of mandatory argument names

  • default_options (Dict[str, Any]) – Default values for options

  • constraints (Optional[Dict[str, CheckType]]) – See above, optional

  • dict_name (Optional[str]) – Prefix used in assert messages, optional

Return type:

Dict[str, Any]

Returns:

Output arguments

syne_tune.optimizer.schedulers.searchers.utils.default_arguments.filter_by_key(options, remove_keys)[source]

Filter options by removing entries whose keys are in remove_keys. Used to filter kwargs passed to a constructor, before passing it to the superclass constructor.

Parameters:
  • options (Dict[str, Any]) – Arguments to be filtered

  • remove_keys (Set[str]) – See above

Return type:

Dict[str, Any]

Returns:

Filtered options
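A small sketch of how these helpers are typically combined (the option names, defaults, and constraints are made up for illustration):

from syne_tune.optimizer.schedulers.searchers.utils.default_arguments import (
    Categorical,
    Integer,
    check_and_merge_defaults,
    filter_by_key,
)

options = {"num_init_random": 5, "acq_function": "ei", "extra_arg": 42}

merged = check_and_merge_defaults(
    options,
    mandatory={"acq_function"},
    default_options={"num_init_random": 3, "opt_maxiter": 50},
    constraints={
        "num_init_random": Integer(lower=1),
        "acq_function": Categorical(choices=("ei", "lcb")),
    },
    dict_name="search_options",
)
# expected: entries in options take precedence over default_options, so
# merged == {"num_init_random": 5, "acq_function": "ei", "opt_maxiter": 50, "extra_arg": 42}

remaining = filter_by_key(options, remove_keys={"extra_arg"})
# expected: {"num_init_random": 5, "acq_function": "ei"}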

syne_tune.optimizer.schedulers.searchers.utils.default_arguments.assert_no_invalid_options(options, all_keys, name)[source]
syne_tune.optimizer.schedulers.searchers.utils.exclusion_list module
class syne_tune.optimizer.schedulers.searchers.utils.exclusion_list.ExclusionList(hp_ranges, configurations=None)[source]

Bases: object

Maintains an exclusion list of configs, to avoid choosing the same config several times. Internally, self.excl_set maintains a set of match strings.

The exclusion list contains non-extended configs, but it can be fed with and queried with extended configs. In that case, the resource attribute is removed from the config.

Parameters:
  • hp_ranges (HyperparameterRanges) – Encodes configurations to vectors

  • configurations (Union[List[Dict[str, Union[int, float, str]]], Set[str], None]) – Initial configurations. Default is empty

contains(config)[source]
Return type:

bool

add(config)[source]
copy()[source]
Return type:

ExclusionList

config_space_exhausted()[source]
Return type:

bool

get_state()[source]
Return type:

Dict[str, Any]

clone_from_state(state)[source]
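A small usage sketch (the config space is an illustrative assumption):

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.utils import make_hyperparameter_ranges
from syne_tune.optimizer.schedulers.searchers.utils.exclusion_list import ExclusionList

hp_ranges = make_hyperparameter_ranges(
    {"learning_rate": uniform(0.0, 1.0), "num_layers": randint(1, 4)}
)
excl = ExclusionList(hp_ranges)

config = {"learning_rate": 0.1, "num_layers": 2}
excl.add(config)                            # stored via its match string
assert excl.contains(config)
assert not excl.config_space_exhausted()    # continuous dimension cannot be exhausted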
class syne_tune.optimizer.schedulers.searchers.utils.exclusion_list.ExclusionListFromState(state, filter_observed_data=None)[source]

Bases: ExclusionList

syne_tune.optimizer.schedulers.searchers.utils.hp_ranges module
class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges.HyperparameterRanges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Bases: object

Wraps configuration space, provides services around encoding of hyperparameters (mapping configurations to [0, 1] vectors and vice versa).

If name_last_pos is given, the hyperparameter of that name is assigned the final position in the vector returned by to_ndarray(). This can be used to single out the (time) resource for a GP model, where that component has to come last.

If in this case (name_last_pos given), value_for_last_pos is also given, some methods are modified:

  • random_config() samples a config as normal, but then overwrites the name_last_pos component by value_for_last_pos

  • get_ndarray_bounds() works as normal, but returns bound (a, a) for name_last_pos component, where a is the internal value corresponding to value_for_last_pos

The use case is HPO with a resource attribute. This attribute should be fixed when optimizing the acquisition function, but can take different values in the evaluation data (coming from all previous searches).

If active_config_space is given, it contains a subset of non-constant hyperparameters in config_space, and the range of each entry is a subset of the range of the corresponding config_space entry. These active ranges affect the choice of new configs (by sampling). While the internal encoding is based on original ranges, search is restricted to active ranges (e.g., optimization of surrogate model). This option is required to implement transfer tuning, where domain ranges in config_space may be narrower than what data from past tuning jobs requires.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space. Constant hyperparameters are filtered out here

  • name_last_pos (Optional[str]) – See above, optional

  • value_for_last_pos – See above, optional

  • active_config_space (Optional[dict]) – See above, optional

  • prefix_keys (Optional[List[str]]) – If given, these keys into config_space come first in the internal ordering, which determines the internal encoding. Optional

property internal_keys: List[str]
property config_space_for_sampling: Dict[str, Any]
to_ndarray(config)[source]

Map configuration to [0, 1] encoded vector

Parameters:

config (Dict[str, Union[int, float, str]]) – Configuration to encode

Return type:

ndarray

Returns:

Encoded vector

to_ndarray_matrix(configs)[source]

Map configurations to [0, 1] encoded matrix

Parameters:

configs (Iterable[Dict[str, Union[int, float, str]]]) – Configurations to encode

Return type:

ndarray

Returns:

Matrix of encoded vectors (rows)

property ndarray_size: int
Returns:

Dimensionality of encoded vector returned by to_ndarray

from_ndarray(enc_config)[source]

Maps encoded vector back to configuration (can involve rounding)

The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.

Parameters:

enc_config (ndarray) – Encoded vector

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to encoded vector

property encoded_ranges: Dict[str, Tuple[int, int]]

Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.

Returns:

Ranges of hyperparameters in the encoded ndarray representation

is_attribute_fixed()[source]
Returns:

Is last position attribute fixed?

random_config(random_state)[source]

Draws random configuration

Parameters:

random_state (RandomState) – Random state

Return type:

Dict[str, Union[int, float, str]]

Returns:

Random configuration

random_configs(random_state, num_configs)[source]

Draws random configurations

Parameters:
  • random_state – Random state

  • num_configs (int) – Number of configurations to sample

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

Random configurations

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

Returns:

List of (lower, upper) bounds for each dimension in encoded vector representation.

filter_for_last_pos_value(configs)[source]

If is_attribute_fixed, configs is filtered by removing entries whose name_last_pos attribute value is different from value_for_last_pos. Otherwise, it is returned unchanged.

Parameters:

configs (List[Dict[str, Union[int, float, str]]]) – List of configs to be filtered

Return type:

List[Dict[str, Union[int, float, str]]]

Returns:

Filtered list of configs

config_to_tuple(config, keys=None, skip_last=False)[source]
Parameters:
  • config (Dict[str, Union[int, float, str]]) – Configuration

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended

Return type:

Tuple[Union[str, int, float], ...]

Returns:

Tuple representation

tuple_to_config(config_tpl, keys=None, skip_last=False)[source]

Reverse of config_to_tuple().

Parameters:
  • config_tpl (Tuple[Union[str, int, float], ...]) – Tuple representation

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and tuple are non-extended

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to config_tpl

config_to_match_string(config, keys=None, skip_last=False)[source]

Maps configuration to match string, used to compare for approximate equality. Two configurations are considered to be different if their match strings are not the same.

Parameters:
  • config (Dict[str, Union[int, float, str]]) – Configuration

  • keys (Optional[List[str]]) – Overrides _internal_keys

  • skip_last (bool) – If True and name_last_pos is used, the corresponding attribute is skipped, so that config and match string are non-extended

Return type:

str

Returns:

Match string

syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory module
syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_factory.make_hyperparameter_ranges(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Default method to create HyperparameterRanges from config_space

Parameters:
Return type:

HyperparameterRanges

Returns:

New object

syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl module
class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRange(name)[source]

Bases: object

property name: str
to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(cand_ndarray)[source]
Return type:

Union[str, int, float]

ndarray_size()[source]
Return type:

int

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.scale_from_zero_one(value, lower_bound, upper_bound, scaling, lower_internal, upper_internal)[source]
class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeContinuous(name, lower_bound, upper_bound, scaling, active_lower_bound=None, active_upper_bound=None)[source]

Bases: HyperparameterRange

Real-valued hyperparameter. If active_lower_bound and/or active_upper_bound are given, the feasible interval for values of new configs is reduced, but data can still contain configs with values in [lower_bound, upper_bound], and the internal encoding is done w.r.t. this original range.

Parameters:
  • name (str) – Name of hyperparameter

  • lower_bound (float) – Lower bound (included)

  • upper_bound (float) – Upper bound (included)

  • scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).

  • active_lower_bound (Optional[float]) – See above

  • active_upper_bound (Optional[float]) – See above

to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeInteger(name, lower_bound, upper_bound, scaling, active_lower_bound=None, active_upper_bound=None)[source]

Bases: HyperparameterRange

Integer-valued hyperparameter. Both bounds are included in the valid values. Under the hood, this generates a continuous range from lower_bound - 0.5 to upper_bound + 0.5. See the docs for the continuous hyperparameter for more information.

Parameters:
  • name (str) – Name of hyperparameter

  • lower_bound (int) – Lower bound (integer, included)

  • upper_bound (int) – Upper bound (integer, included)

  • scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).

  • active_lower_bound (Optional[int]) – See above

  • active_upper_bound (Optional[int]) – See above

property scaling: Scaling
to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]
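A small sketch of the integer encoding described above (the parameter name and bounds are illustrative):

from syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl import (
    HyperparameterRangeInteger,
)
from syne_tune.optimizer.schedulers.searchers.utils.scaling import LinearScaling

hp_range = HyperparameterRangeInteger(
    name="num_layers", lower_bound=1, upper_bound=8, scaling=LinearScaling()
)
enc = hp_range.to_ndarray(4)        # encoded into [0, 1] via the underlying continuous range
value = hp_range.from_ndarray(enc)  # rounds back to the integer 4
print(enc, value, hp_range.get_ndarray_bounds())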

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeFiniteRange(name, lower_bound, upper_bound, size, scaling, cast_int=False)[source]

Bases: HyperparameterRange

Finite range numerical hyperparameter, see FiniteRange. Internally, we use an int with linear scaling.

Note: Different from HyperparameterRangeContinuous, we require that lower_bound < upper_bound and size >= 2.

Parameters:
  • name (str) – Name of hyperparameter

  • lower_bound (float) – Lower bound (included)

  • upper_bound (float) – Upper bound (included)

  • size (int) – Number of values in range

  • scaling (Scaling) – Determines internal representation, whereby parameter = scaling(internal).

  • cast_int (bool) – If True, values are cast to int

property scaling: Scaling
to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategorical(name, choices)[source]

Bases: HyperparameterRange

Base class for categorical hyperparameter.

Parameters:
  • name (str) – Name of hyperparameter

  • choices (Tuple[Any, ...]) – Values parameter can take

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategoricalNonBinary(name, choices, active_choices=None)[source]

Bases: HyperparameterRangeCategorical

Can take on a discrete set of values. We use one-hot encoding internally. If the value range has size 2, it is more efficient to use HyperparameterRangeCategoricalBinary.

Parameters:
  • name (str) – Name of hyperparameter

  • choices (Tuple[Any, ...]) – Values parameter can take

  • active_choices (Optional[Tuple[Any, ...]]) – If given, must be nonempty subset of choices.

ndarray_size()[source]
Return type:

int

to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(cand_ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeCategoricalBinary(name, choices, active_choices=None)[source]

Bases: HyperparameterRangeCategorical

Here, the value range must be of size 2. The internal encoding is a single int, so one dimension instead of two.

Parameters:
  • name (str) – Name of hyperparameter

  • choices (Tuple[Any, ...]) – Values parameter can take (must be size 2)

  • active_choices (Optional[Tuple[Any, ...]]) – If given, must be nonempty subset of choices.

to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(cand_ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeOrdinalEqual(name, choices, active_choices=None)[source]

Bases: HyperparameterRangeCategorical

Ordinal hyperparameter, equal distance encoding. See also Ordinal.

Parameters:
  • name (str) – Name of hyperparameter

  • choices (Tuple[Any, ...]) – Values parameter can take

  • active_choices (Optional[Tuple[Any, ...]]) – If given, must be nonempty contiguous subsequence of choices.

to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(cand_ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangeOrdinalNearestNeighbor(name, choices, log_scale=False, active_choices=None)[source]

Bases: HyperparameterRangeCategorical

Ordinal hyperparameter, nearest neighbour encoding. See also OrdinalNearestNeighbor.

Parameters:
  • name (str) – Name of hyperparameter

  • choices (Tuple[Any, ...]) – Values parameter can take (numerical values, strictly increasing, size >= 2)

  • log_scale (bool) – If True, nearest neighbour done in log (choices must be positive)

  • active_choices (Optional[Tuple[Any, ...]]) – If given, must be nonempty contiguous subsequence of choices.

property log_scale: bool
to_ndarray(hp)[source]
Return type:

ndarray

from_ndarray(cand_ndarray)[source]
Return type:

Union[str, int, float]

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

class syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.HyperparameterRangesImpl(config_space, name_last_pos=None, value_for_last_pos=None, active_config_space=None, prefix_keys=None)[source]

Bases: HyperparameterRanges

Basic implementation of HyperparameterRanges.

Parameters:
property ndarray_size: int
Returns:

Dimensionality of encoded vector returned by to_ndarray

to_ndarray(config)[source]

Map configuration to [0, 1] encoded vector

Parameters:

config (Dict[str, Union[int, float, str]]) – Configuration to encode

Return type:

ndarray

Returns:

Encoded vector

from_ndarray(enc_config)[source]

Maps encoded vector back to configuration (can involve rounding)

The encoded vector enc_config need not be in the image of to_ndarray. In fact, any [0, 1] valued vector of dimensionality ndarray_size is allowed.

Parameters:

enc_config (ndarray) – Encoded vector

Return type:

Dict[str, Union[int, float, str]]

Returns:

Configuration corresponding to encoded vector

property encoded_ranges: Dict[str, Tuple[int, int]]

Encoded ranges are [0, 1] or closed subintervals thereof, in case active_config_space is used.

Returns:

Ranges of hyperparameters in the encoded ndarray representation

get_ndarray_bounds()[source]
Return type:

List[Tuple[float, float]]

Returns:

List of (lower, upper) bounds for each dimension in encoded vector representation.

syne_tune.optimizer.schedulers.searchers.utils.hp_ranges_impl.decode_extended_features(features_ext, resource_attr_range)[source]

Given a matrix of features from extended configs (corresponding to ExtendedConfiguration), split it into a feature matrix from normal configs and a vector of resource values.

Parameters:
  • features_ext (ndarray) – Matrix of features from extended configs

  • resource_attr_range (Tuple[int, int]) – (r_min, r_max)

Return type:

(ndarray, ndarray)

Returns:

(features, resources)

syne_tune.optimizer.schedulers.searchers.utils.scaling module
class syne_tune.optimizer.schedulers.searchers.utils.scaling.Scaling[source]

Bases: object

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

class syne_tune.optimizer.schedulers.searchers.utils.scaling.LinearScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

class syne_tune.optimizer.schedulers.searchers.utils.scaling.LogScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

class syne_tune.optimizer.schedulers.searchers.utils.scaling.ReverseLogScaling[source]

Bases: Scaling

to_internal(value)[source]
Return type:

float

from_internal(value)[source]
Return type:

float

syne_tune.optimizer.schedulers.searchers.utils.scaling.get_scaling(hp_range)[source]
Return type:

Scaling

syne_tune.optimizer.schedulers.searchers.utils.warmstarting module
syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_hp_ranges_for_warmstarting(**kwargs)[source]

See GPFIFOSearcher for details on “transfer_learning_task_attr”, “transfer_learning_active_task”, “transfer_learning_active_config_space” as optional fields in kwargs. If given, they determine active_config_space and prefix_keys of hp_ranges created here, and they also place constraints on config_space.

This function is not only called in gp_searcher_factory to create hp_ranges for a new GPFIFOSearcher object. It is also needed to create the TuningJobState object containing the data to be used in warmstarting.

Return type:

HyperparameterRanges

syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_filter_observed_data_for_warmstarting(**kwargs)[source]

See GPFIFOSearcher for details on “transfer_learning_task_attr”, “transfer_learning_active_task” as optional fields in kwargs.

Return type:

Optional[Callable[[Dict[str, Union[int, float, str]]], bool]]

syne_tune.optimizer.schedulers.searchers.utils.warmstarting.create_base_gp_kernel_for_warmstarting(hp_ranges, **kwargs)[source]

In the transfer learning case, the base kernel is a product of two Matern52 kernels, the first non-ARD over the categorical parameter determining the task, the second ARD over the remaining parameters.

Return type:

KernelFunction

Submodules
syne_tune.optimizer.schedulers.searchers.bracket_distribution module
class syne_tune.optimizer.schedulers.searchers.bracket_distribution.BracketDistribution[source]

Bases: object

Configures asynchronous multi-fidelity schedulers such as HyperbandScheduler with a distribution over brackets. This distribution can be fixed up front, or change adaptively during the course of an experiment. It has an effect only if the scheduler is run with more than one bracket.

configure(scheduler)[source]

This method is called by the scheduler just after self.searcher.configure_scheduler. The searcher must be accessible via self.searcher. The __call__() method cannot be used before this method has been called.

class syne_tune.optimizer.schedulers.searchers.bracket_distribution.DefaultHyperbandBracketDistribution[source]

Bases: BracketDistribution

Implements the default bracket distribution, where the probability of each bracket is proportional to the number of slots it has in synchronous Hyperband.

configure(scheduler)[source]

This method is called by the scheduler just after self.searcher.configure_scheduler. The searcher must be accessible via self.searcher. The __call__() method cannot be used before this method has been called.

syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher module
class syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

Gaussian process Bayesian optimization for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.

It is not recommended to create GPFIFOSearcher objects directly, but rather to create FIFOScheduler objects with searcher="bayesopt" and to pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

Most of the implementation is generic in BayesianOptimizationSearcher.

Note: If metric values are to be maximized (mode="max" in scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.

Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing (i.e., target values are drawn from the current posterior, and acquisition functions are averaged over this sample, see num_fantasy_samples).

The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.

Note that the full logic of construction based on arguments is given in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • clone_from_state (bool) – Internal argument, do not use

  • resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If resource_attr and cost_attr are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data.

  • cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • num_init_random (int, optional) – Number of initial get_config() calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults to DEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS

  • num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in get_config. Defaults to DEFAULT_NUM_INITIAL_CANDIDATES

  • num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations), defaults to 20

  • no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False

  • input_warping (bool, optional) – If True, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See also Warping. Coordinates which belong to categorical hyperparameters are not warped. Defaults to False.

  • boxcox_transform (bool, optional) – If True, target values are transformed before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults to False.

  • gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are SUPPORTED_BASE_MODELS. Defaults to “matern52-ard” (Matern 5/2 with automatic relevance determination).

  • acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are SUPPORTED_ACQUISITION_FUNCTIONS. Defaults to “ei” (expected improvement acquisition function).

  • acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters, they can be passed here. If none are given, default values are used.

  • initial_scoring (str, optional) –

    Scoring function to rank initial candidates (local optimization of EI is started from top scorer):

    • ”thompson_indep”: Independent Thompson sampling; randomized score, which can increase exploration

    • ”acq_func”: score is the same (EI) acquisition function which is used for local optimization afterwards

    Defaults to DEFAULT_INITIAL_SCORING

  • skip_local_optimization (bool, optional) – If True, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case, initial_scoring="acq_func" makes most sense, otherwise the acquisition function will not be used. Defaults to False

  • opt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2

  • opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50

  • opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If True, each fitting is started from the previous optimum. Not recommended in general. Defaults to False

  • opt_verbose (bool, optional) – Parameter for surrogate model fitting. If True, lots of output. Defaults to False

  • max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleSingleFidelityStateConverter for details. This downsampling is repeated every time the model is fit. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • max_size_top_fraction (float, optional) – Only used if max_size_data_for_model is set. This fraction of the down sampled set is filled with the top entries in the full set, the remaining ones are sampled at random from the full set, see SubsampleSingleFidelityStateConverter for details. Defaults to 0.25.

  • opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as the number of observations is below this threshold. Defaults to 150

  • opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If >1, and the number of observations is above opt_skip_init_length, fitting is done only every opt_skip_period-th call and skipped otherwise. Defaults to 1 (no skipping)

  • allow_duplicates (bool, optional) – If True, get_config() may return the same configuration more than once. Defaults to False

  • restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This needs skip_local_optimization == True. If allow_duplicates == False, entries are popped off this list once suggested.

  • map_reward (str or MapReward, optional) –

    In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization is minimizing the criterion. map_reward converts from metric to internal criterion:

    • ”minus_x”: criterion = -metric

    • ”<a>_minus_x”: criterion = <a> - metric. For example “1_minus_x” maps accuracy to zero-one error

    From a technical standpoint, it does not matter what is chosen here, because criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before fitted with a Gaussian process. Defaults to “1_minus_x”

  • transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end, config_space must contain a categorical parameter of name transfer_learning_task_attr, whose range are all task IDs. Also, transfer_learning_active_task must denote the active task, and transfer_learning_active_config_space is used as active_config_space argument in HyperparameterRanges. This allows us to use a narrower search space for the active task than for the union of all tasks (config_space must be that), which is needed if some configurations of non-active tasks lie outside of the ranges in active_config_space. One of the implications is that filter_observed_data() is selecting configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task.

  • transfer_learning_active_task (str, optional) – See transfer_learning_task_attr.

  • transfer_learning_active_config_space (Dict[str, Any], optional) – See transfer_learning_task_attr. If not given, config_space is the search space for the active task as well. This active config space need not contain the transfer_learning_task_attr parameter. In fact, this parameter is set to a categorical with transfer_learning_active_task as single value, so that new configs are chosen for the active task only.

  • transfer_learning_model (str, optional) –

    See transfer_learning_task_attr. Specifies the surrogate model to be used for transfer learning:

    • ”matern52_product”: Kernel is product of Matern 5/2 (not ARD) on transfer_learning_task_attr and Matern 5/2 (ARD) on the rest. Assumes that data from same task are more closely related than data from different tasks

    • ”matern52_same”: Kernel is Matern 5/2 (ARD) on the rest of the variables, transfer_learning_task_attr is ignored. Assumes that data from all tasks can be merged together

    Defaults to “matern52_product”

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
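A minimal sketch of the recommended way to use this searcher: create a FIFOScheduler with searcher="bayesopt" and pass the arguments documented above via search_options (config space and metric name are illustrative assumptions):

from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "momentum": uniform(0.1, 0.99),
}

# searcher="bayesopt" creates a GPFIFOSearcher internally, configured
# consistently by gp_searcher_factory
scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",
    metric="validation_error",
    mode="min",
    search_options={
        "num_init_random": 10,
        "input_warping": True,
        "acq_function": "ei",
    },
)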

syne_tune.optimizer.schedulers.searchers.gp_multifidelity_searcher module
class syne_tune.optimizer.schedulers.searchers.gp_multifidelity_searcher.GPMultiFidelitySearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: GPFIFOSearcher

Gaussian process Bayesian optimization for asynchronous Hyperband scheduler.

This searcher must be used with a scheduler of type MultiFidelitySchedulerMixin. It provides a novel combination of Bayesian optimization, based on a Gaussian process surrogate model, with Hyperband scheduling. In particular, observations across resource levels are modelled jointly.

It is not recommended to create GPMultiFidelitySearcher objects directly, but rather to create HyperbandScheduler objects with searcher="bayesopt" and to pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.

Most of the comments for GPFIFOSearcher apply here as well. In multi-fidelity HPO, we optimize a function \(f(\mathbf{x}, r)\), where \(\mathbf{x}\) is the configuration and \(r\) the resource (or time) attribute. The latter must be a positive integer. In most applications, resource_attr == "epoch", and the resource is the number of epochs already trained.

If model == "gp_multitask" (default), we model the function \(f(\mathbf{x}, r)\) jointly over all resource levels \(r\) at which it is observed (but see searcher_data in HyperbandScheduler). The kernel and mean function of our surrogate model are over \((\mathbf{x}, r)\). The surrogate model is selected by gp_resource_kernel. More details about the supported kernels are given in:

Tiao, Klein, Lienart, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter and Neural Architecture Search

The acquisition function (EI), which is optimized in get_config(), is obtained by fixing the resource level \(r\) to a value determined by the current state. If resource_acq == "bohb", \(r\) is the largest value <= max_t where we have seen \(\ge \mathrm{dimension}(\mathbf{x})\) metric values. If resource_acq == "first", \(r\) is the first milestone which config \(\mathbf{x}\) would reach when started.

Additional arguments on top of parent class GPFIFOSearcher.

Parameters:
  • model (str, optional) –

    Selects surrogate model (learning curve model) to be used. Choices are:

    • ”gp_multitask” (default): GP multi-task surrogate model

    • ”gp_independent”: Independent GPs for each rung level, sharing an ARD kernel

    • ”gp_issm”: Gaussian-additive model of ISSM type

    • ”gp_expdecay”: Gaussian-additive model of exponential decay type (as in Freeze Thaw Bayesian Optimization)

  • gp_resource_kernel (str, optional) – Only relevant for model == "gp_multitask". Surrogate model over criterion function \(f(\mathbf{x}, r)\), \(\mathbf{x}\) the config, \(r\) the resource. Note that \(\mathbf{x}\) is encoded to be a vector with entries in [0, 1], and \(r\) is linearly mapped to [0, 1], while the criterion data is normalized to mean 0, variance 1. The reference above provides details on the models supported here. For the exponential decay kernel, the base kernel over \(\mathbf{x}\) is Matern 5/2 ARD. See SUPPORTED_RESOURCE_MODELS for supported choices. Defaults to “exp-decay-sum”

  • resource_acq (str, optional) – Only relevant for model in {"gp_multitask", "gp_independent"}. Determines how the EI acquisition function is used. Values: “bohb”, “first”. Defaults to “bohb”

  • max_size_data_for_model (int, optional) –

If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleMultiFidelityStateConverter for details. This downsampling is repeated every time the model is fit, which ensures that the most recent data is taken into account. The opt_skip_* predicates are evaluated before the state is downsampled.

    Pass None not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • opt_skip_num_max_resource (bool, optional) – Parameter for surrogate model fitting, skip predicate. If True, and number of observations above opt_skip_init_length, fitting is done only when there is a new datapoint at r = max_t, and skipped otherwise. Defaults to False

  • issm_gamma_one (bool, optional) – Only relevant for model == "gp_issm". If True, the gamma parameter of the ISSM is fixed to 1, otherwise it is optimized over. Defaults to False

  • expdecay_normalize_inputs (bool, optional) – Only relevant for model == "gp_expdecay". If True, resource values r are normalized to [0, 1] as input to the exponential decay surrogate model. Defaults to False

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

register_pending(trial_id, config=None, milestone=None)[source]

Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows one case to be removed from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
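Mirroring the recommendation above, a minimal sketch creating a HyperbandScheduler with searcher="bayesopt", which uses this searcher internally (metric, resource attribute, and config space are illustrative assumptions):

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import HyperbandScheduler

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 256),
    "epochs": 81,
}

# searcher="bayesopt" with HyperbandScheduler creates a GPMultiFidelitySearcher
scheduler = HyperbandScheduler(
    config_space,
    searcher="bayesopt",
    resource_attr="epoch",   # resource attribute reported by the training script
    max_t=81,                # maximum resource level
    metric="validation_error",
    mode="min",
    search_options={
        "model": "gp_multitask",
        "gp_resource_kernel": "exp-decay-sum",
    },
)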

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory module
syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_fifo_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see FIFOScheduler).

Extensions of kwargs by the scheduler:

  • scheduler: Name of scheduler ("fifo", "hyperband_*")

  • config_space: Configuration space

Only Hyperband schedulers:

  • resource_attr: Name of resource (or time) attribute

  • max_epochs: Maximum resource value

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_multifidelity_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see HyperbandScheduler).

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.constrained_gp_fifo_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see FIFOScheduler).

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_coarse_gp_fifo_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see FIFOScheduler).

This is for the coarse-grained variant, where costs \(c(x)\) are obtained together with metric values and are given a GP surrogate model.

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_fine_gp_fifo_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see FIFOScheduler).

This is for the fine-grained variant, where costs \(c(x, r)\) are obtained with each report and are represented by a CostModel surrogate model.

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_multifidelity_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see HyperbandScheduler).

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.hypertune_searcher_factory(**kwargs)[source]

Returns kwargs for _create_internal(), based on kwargs equal to search_options passed to and extended by scheduler (see HyperbandScheduler).

Parameters:

kwargs – search_options coming from scheduler

Return type:

Dict[str, Any]

Returns:

kwargs for _create_internal()

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_fifo_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for GPFIFOSearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.gp_multifidelity_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for GPMultiFidelitySearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.constrained_gp_fifo_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for ConstrainedGPFIFOSearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_fifo_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for CostAwareGPFIFOSearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.cost_aware_gp_multifidelity_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for CostAwareGPMultiFidelitySearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_factory.hypertune_searcher_defaults(kwargs)[source]

Returns mandatory, default_options, config_space for check_and_merge_defaults() to be applied to search_options for HyperTuneSearcher.

Return type:

(Set[str], dict, dict)

Returns:

(mandatory, default_options, config_space)

syne_tune.optimizer.schedulers.searchers.gp_searcher_utils module
class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.MapReward(forward, reverse)[source]

Bases: object

forward: Callable[[float], float]
reverse: Callable[[float], float]
syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.map_reward_const_minus_x(const=1.0)[source]

Factory for map_reward argument in GPMultiFidelitySearcher.

Return type:

MapReward
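
As an illustrative sketch, assuming the factory implements the mapping y to const - y in both directions, a map that turns accuracy (to be maximized) into a criterion to be minimized could look as follows:

from syne_tune.optimizer.schedulers.searchers.gp_searcher_utils import (
    MapReward,
    map_reward_const_minus_x,
)

# Internal criterion is 1 - accuracy (to be minimized)
map_reward = map_reward_const_minus_x(const=1.0)
# Hand-rolled equivalent with explicit forward/reverse callables
map_reward_manual = MapReward(forward=lambda y: 1.0 - y, reverse=lambda y: 1.0 - y)
# Should hold under the assumption stated above
assert abs(map_reward.forward(0.9) - map_reward_manual.forward(0.9)) < 1e-12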

syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.encode_state(state)[source]
Return type:

Dict[str, Any]

syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.decode_state(enc_state, hp_ranges)[source]
Return type:

TuningJobState

syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.decode_state_from_old_encoding(enc_state, hp_ranges)[source]

Decodes TuningJobState from encoding done for the old definition of TuningJobState. Code maintained for backwards compatibility.

Note: Since the old TuningJobState did not contain trial_id, we need to make them up here. We assign these IDs in the order candidate_evaluations, failed_candidates, pending_candidates, matching for duplicates.

Return type:

TuningJobState

class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionMap[source]

Bases: object

In order to use a standard acquisition function (like expected improvement) for multi-fidelity HPO, we need to decide at which r_acq we would like to evaluate the AF, w.r.t. the posterior distribution over f(x, r=r_acq). This decision can depend on the current state.

class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionBOHB(threshold, active_metric='target')[source]

Bases: ResourceForAcquisitionMap

Implements a heuristic proposed in the BOHB paper: r_acq is the largest r such that we have at least threshold observations at r. If there are fewer than threshold observations at all levels, the smallest level is returned.

class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionFirstMilestone[source]

Bases: ResourceForAcquisitionMap

Here, r_acq is the smallest rung level to be attained by a config started from scratch.

class syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.ResourceForAcquisitionFinal(r_max)[source]

Bases: ResourceForAcquisitionMap

Here, r_acq = r_max is the largest resource level.

syne_tune.optimizer.schedulers.searchers.gp_searcher_utils.resource_for_acquisition_factory(kwargs, hp_ranges)[source]
Return type:

ResourceForAcquisitionMap

syne_tune.optimizer.schedulers.searchers.model_based_searcher module
syne_tune.optimizer.schedulers.searchers.model_based_searcher.check_initial_candidates_scorer(initial_scoring)[source]
Return type:

str

class syne_tune.optimizer.schedulers.searchers.model_based_searcher.ModelBasedSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: StochasticSearcher

Common code for surrogate model based searchers

If num_initial_random_choices > 0, initial configurations are drawn using an internal RandomSearcher object, which is created in _assign_random_searcher(). This internal random searcher shares random_state with the searcher here. This ensures that if ModelBasedSearcher and RandomSearcher objects are created with the same random_seed and points_to_evaluate argument, initial configurations are identical until _get_config_modelbased() kicks in.

Note that this works because random_state is only used in the internal random searcher until _get_config_modelbased() is first called.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

get_config(**kwargs)[source]

Runs Bayesian optimization in order to suggest the next config to evaluate.

Return type:

Optional[Dict[str, Any]]

Returns:

Next config to evaluate at

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

set_params(param_dict)[source]
get_state()[source]

The mutable state consists of the GP model parameters, the TuningJobState, and the skip_optimization predicate (which can have a mutable state). We assume that skip_optimization can be pickled.

Note that we do not have to store the state of _random_searcher, since this internal searcher shares its random_state with the searcher here.

Return type:

Dict[str, Any]

property debug_log

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

syne_tune.optimizer.schedulers.searchers.model_based_searcher.create_initial_candidates_scorer(initial_scoring, predictor, acquisition_class, random_state, active_metric='target')[source]
Return type:

ScoringFunction

class syne_tune.optimizer.schedulers.searchers.model_based_searcher.BayesianOptimizationSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: ModelBasedSearcher

Common Code for searchers using Bayesian optimization

We implement Bayesian optimization, based on a model factory which parameterizes the state transformer. This implementation works with any type of surrogate model and acquisition function, which are compatible with each other.

The following happens in get_config():

  • For the first num_init_random calls, a config is drawn at random (after points_to_evaluate, which are included in the num_init_random initial ones). Afterwards, Bayesian optimization is used, unless there are no finished evaluations yet (a surrogate model cannot be used with no data at all)

  • For BO, model hyperparameters are refit first. This step can be skipped (see opt_skip_* parameters).

  • Next, the BO decision is made based on BayesianOptimizationAlgorithm. This involves sampling num_init_candidates configs at random, ranking them with a scoring function (initial_scoring), and finally running local optimization starting from the top-scoring config.
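
A sketch of how these steps are typically configured in practice, going through FIFOScheduler with searcher="bayesopt"; num_init_random and num_init_candidates are search_options entries, and "val_loss" is a placeholder metric name:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space={"lr": loguniform(1e-5, 1e-1), "batch_size": randint(16, 128)},
    searcher="bayesopt",
    search_options={
        "num_init_random": 5,        # random configs before BO kicks in
        "num_init_candidates": 250,  # candidates scored before local optimization
    },
    metric="val_loss",
    mode="min",
)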

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

register_pending(trial_id, config=None, milestone=None)[source]

Registers trial as pending. This means the corresponding evaluation task is running. Once it finishes, update is called for this trial.

get_batch_configs(batch_size, num_init_candidates_for_batch=None, **kwargs)[source]

Asks for a batch of batch_size configurations to be suggested. This is roughly equivalent to calling get_config batch_size times, marking the suggested configs as pending in the state (but the state is not modified here). This means the batch is chosen sequentially, at about the cost of calling get_config batch_size times.

If num_init_candidates_for_batch is given, it is used instead of num_init_candidates for the selection of all but the first config in the batch. In order to speed up batch selection, choose num_init_candidates_for_batch smaller than num_init_candidates.

If less than batch_size configs are returned, the search space has been exhausted.

Note: Batch selection does not support debug_log right now: make sure to switch this off when creating scheduler and searcher.

Return type:

List[Dict[str, Union[int, float, str]]]
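
A hypothetical sketch of batch suggestion through the scheduler's searcher attribute, assuming the searcher created for searcher="bayesopt" derives from BayesianOptimizationSearcher; debug_log is switched off as noted above, and with no results reported yet the searcher falls back to random choices:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers import FIFOScheduler

scheduler = FIFOScheduler(
    config_space={"lr": loguniform(1e-5, 1e-1), "batch_size": randint(16, 128)},
    searcher="bayesopt",
    search_options={"debug_log": False},  # batch selection does not support debug_log
    metric="val_loss",
    mode="min",
)
# Suggest four configurations in one go; all but the first are selected
# from a smaller candidate pool in order to speed up batch selection
configs = scheduler.searcher.get_batch_configs(
    batch_size=4, num_init_candidates_for_batch=50
)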

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

syne_tune.optimizer.schedulers.searchers.random_grid_searcher module
class syne_tune.optimizer.schedulers.searchers.random_grid_searcher.RandomSearcher(config_space, metric, points_to_evaluate=None, debug_log=False, resource_attr=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]

Bases: StochasticAndFilterDuplicatesSearcher

Searcher which randomly samples configurations to try next.

Additional arguments on top of parent class StochasticAndFilterDuplicatesSearcher:

Parameters:
  • debug_log (Union[bool, DebugLogPrinter]) – If True, debug log printing is activated. Logs which configs are chosen when, and which metric values are obtained. Defaults to False

  • resource_attr (Optional[str]) – Optional. Key in result passed to _update() for resource value (for multi-fidelity schedulers)
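
A minimal sketch of standalone use; in a tuning job the scheduler drives these calls, and the metric name and trial ID below are placeholders:

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.random_grid_searcher import RandomSearcher

config_space = {"lr": uniform(1e-4, 1e-1), "batch_size": randint(16, 128)}
searcher = RandomSearcher(config_space, metric="val_loss", debug_log=True)
# The scheduler normally passes the trial ID when asking for a config
config = searcher.get_config(trial_id="0")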

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

property debug_log

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

class syne_tune.optimizer.schedulers.searchers.random_grid_searcher.GridSearcher(config_space, metric, points_to_evaluate=None, num_samples=None, shuffle_config=True, allow_duplicates=False, **kwargs)[source]

Bases: StochasticSearcher

Searcher that samples configurations from an equally spaced grid over config_space.

It first evaluates configurations defined in points_to_evaluate and then continues with the remaining points from the grid.

Additional arguments on top of parent class StochasticSearcher.

Parameters:
  • num_samples (Optional[Dict[str, int]]) – Dictionary, optional. Number of samples per hyperparameter. This is required for hyperparameters of type float, optional for integer hyperparameters, and will be ignored for other types (categorical, scalar). If left unspecified, a default value of DEFAULT_NSAMPLE will be used for float parameters, and the smaller of DEFAULT_NSAMPLE and the size of the integer range will be used for integer parameters.

  • shuffle_config (bool) – If True (default), the order of configurations suggested after those specified in points_to_evaluate is shuffled. Otherwise, the order will follow the Cartesian product of the configurations.

  • allow_duplicates (bool) – If True, get_config() may return the same configuration more than once. Defaults to False
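
A sketch of grid search over a mixed space; num_samples only matters for the float hyperparameter here, while the categorical values are enumerated, and "val_loss" is a placeholder metric:

from syne_tune.config_space import choice, uniform
from syne_tune.optimizer.schedulers.searchers.random_grid_searcher import GridSearcher

config_space = {"lr": uniform(1e-4, 1e-1), "optimizer": choice(["adam", "sgd"])}
# 5 grid points along lr, crossed with the two categorical values
searcher = GridSearcher(config_space, metric="val_loss", num_samples={"lr": 5})
config = searcher.get_config()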

get_config(**kwargs)[source]

Select the next configuration from the grid.

This is done without replacement, so previously returned configs are not suggested again.

Return type:

Optional[dict]

Returns:

A new configuration that is valid, or None if no new config can be suggested. The returned configuration is a dictionary that maps hyperparameters to their values.

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.regularized_evolution module
class syne_tune.optimizer.schedulers.searchers.regularized_evolution.PopulationElement(result=None, score=0, config=None)[source]

Bases: object

result: Dict[str, Any] = None
score: int = 0
config: Dict[str, Any] = None
class syne_tune.optimizer.schedulers.searchers.regularized_evolution.RegularizedEvolution(config_space, metric, points_to_evaluate=None, population_size=100, sample_size=10, **kwargs)[source]

Bases: StochasticSearcher

Implements the regularized evolution algorithm. The original implementation only considers categorical hyperparameters. For integer and float parameters we sample a new value uniformly at random. Reference:

Real, E., Aggarwal, A., Huang, Y., and Le, Q. V.
Regularized Evolution for Image Classifier Architecture Search.
In Proceedings of the Conference on Artificial Intelligence (AAAI’19)

The code is based on the original regularized evolution open-source implementation: https://colab.research.google.com/github/google-research/google-research/blob/master/evolution/regularized_evolution_algorithm/regularized_evolution.ipynb

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • mode – Mode to use for the metric given, can be “min” or “max”, defaults to “min”

  • population_size (int) – Size of the population, defaults to 100

  • sample_size (int) – Size of the candidate set to obtain a parent for the mutation, defaults to 10
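
A sketch of constructing the searcher directly; the metric name and population settings are illustrative:

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.regularized_evolution import (
    RegularizedEvolution,
)

config_space = {"lr": uniform(1e-4, 1e-1), "num_layers": randint(1, 8)}
searcher = RegularizedEvolution(
    config_space,
    metric="val_loss",
    mode="min",
    population_size=20,
    sample_size=5,
)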

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[dict]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

syne_tune.optimizer.schedulers.searchers.searcher module
syne_tune.optimizer.schedulers.searchers.searcher.impute_points_to_evaluate(points_to_evaluate, config_space)[source]

Transforms points_to_evaluate argument to BaseSearcher. Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. Also, duplicate entries are filtered out. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

Parameters:
  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Argument to BaseSearcher

  • config_space (Dict[str, Any]) – Configuration space

Return type:

List[Dict[str, Any]]

Returns:

List of fully specified initial configs
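
A small sketch of the midpoint imputation; the exact midpoints depend on the hyperparameter domains, and the names used here are placeholders:

from syne_tune.config_space import randint, uniform
from syne_tune.optimizer.schedulers.searchers.searcher import impute_points_to_evaluate

config_space = {"lr": uniform(1e-4, 1e-1), "batch_size": randint(16, 128)}
# The missing "batch_size" in the first config and both values in the empty
# dict are filled in by the midpoint heuristic; duplicates would be dropped
points = impute_points_to_evaluate([{"lr": 0.01}, {}], config_space)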

class syne_tune.optimizer.schedulers.searchers.searcher.BaseSearcher(config_space, metric, points_to_evaluate=None, mode='min')[source]

Bases: object

Base class of searchers, which are components of schedulers responsible for implementing get_config().

Note

This is an abstract base class. In order to implement a new searcher, try to start from StochasticAndFilterDuplicatesSearcher or StochasticSearcher, which implement generally useful properties.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • metric (Union[List[str], str]) –

    Name of metric passed to update(). Can be obtained from scheduler in configure_scheduler(). In the case of multi-objective optimization, metric is a list of strings specifying all objectives to be optimized.

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • mode (Union[List[str], str]) – Should metric be minimized (“min”, default) or maximized (“max”). In the case of multi-objective optimization, mode can be a list defining for each metric if it is minimized or maximized

configure_scheduler(scheduler)[source]

Some searchers need to obtain information from the scheduler they are used with, in order to configure themselves. This method has to be called before the searcher can be used.

Parameters:

scheduler (TrialScheduler) – Scheduler the searcher is used with.

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

on_trial_result(trial_id, config, result, update)[source]

Inform searcher about result

The scheduler passes every result. If update == True, the searcher should update its surrogate model (if any), otherwise result is an intermediate result not modelled.

The default implementation calls _update() if update == True. It can be overwritten by searchers which also react to intermediate results.

Parameters:
  • trial_id (str) – See on_trial_result()

  • config (Dict[str, Any]) – See on_trial_result()

  • result (Dict[str, Any]) – See on_trial_result()

  • update (bool) – Should surrogate model be updated?

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

remove_case(trial_id, **kwargs)[source]

Remove data case previously appended by _update()

For searchers which maintain the dataset of all cases (reports) passed to update, this method allows removing one case from the dataset.

Parameters:
  • trial_id (str) – ID of trial whose data is to be removed

  • kwargs – Extra arguments, optional

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

cleanup_pending(trial_id)[source]

Removes all pending evaluations for trial trial_id.

This should be called after an evaluation terminates. For various reasons (e.g., termination due to convergence), pending candidates for this evaluation may still be present.

Parameters:

trial_id (str) – ID of trial whose pending evaluations should be cleared

dataset_size()[source]
Returns:

Size of dataset a model is fitted to, or 0 if no model is fitted to data

model_parameters()[source]
Returns:

Dictionary with current model (hyper)parameter values if this is supported; otherwise empty

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object
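
A sketch of the intended round trip, using a random searcher as a stand-in and a placeholder metric name; get_state() must be pickle-able, and after clone_from_state() the original object should no longer be used:

import pickle

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers.searchers.random_grid_searcher import RandomSearcher

searcher = RandomSearcher({"lr": uniform(1e-4, 1e-1)}, metric="val_loss")
# Serialize the mutable part of the searcher state ...
blob = pickle.dumps(searcher.get_state())
# ... and later re-create an equivalent searcher from it
restored_searcher = searcher.clone_from_state(pickle.loads(blob))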

property debug_log: DebugLogPrinter | None

Some subclasses support writing a debug log, using DebugLogPrinter. See RandomSearcher for an example.

Returns:

debug_log object, or None (not supported)

syne_tune.optimizer.schedulers.searchers.searcher_base module
syne_tune.optimizer.schedulers.searchers.searcher_base.extract_random_seed(**kwargs)[source]
Return type:

(int, Dict[str, Any])

syne_tune.optimizer.schedulers.searchers.searcher_base.sample_random_configuration(hp_ranges, random_state, exclusion_list=None)[source]

Samples a configuration from config_space at random.

Parameters:
  • hp_ranges (HyperparameterRanges) – Used for sampling configurations

  • random_state (RandomState) – PRN generator

  • exclusion_list (Optional[ExclusionList]) – Configurations not to be returned

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration, or None if configuration space has been exhausted

class syne_tune.optimizer.schedulers.searchers.searcher_base.StochasticSearcher(config_space, metric, points_to_evaluate=None, **kwargs)[source]

Bases: BaseSearcher

Base class of searchers which use random decisions. Creates the random_state member, which must be used for all random draws.

Making proper use of this interface allows us to run experiments with control of random seeds, e.g. for paired comparisons or integration testing.

Additional arguments on top of parent class BaseSearcher:

Parameters:
  • random_seed_generator (RandomSeedGenerator, optional) – If given, random seed is drawn from there

  • random_seed (int, optional) – Used if random_seed_generator is not given.

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

set_random_state(random_state)[source]
class syne_tune.optimizer.schedulers.searchers.searcher_base.StochasticAndFilterDuplicatesSearcher(config_space, metric, points_to_evaluate=None, allow_duplicates=None, restrict_configurations=None, **kwargs)[source]

Bases: StochasticSearcher

Base class for searchers with the following properties:

  • Random decisions use common random_state

  • Maintains an exclusion list to filter out duplicates in get_config() if allow_duplicates == False. If this is True, duplicates are not filtered, and the exclusion list is used only to avoid configurations of failed trials.

  • If restrict_configurations is given, this is a list of configurations, and the searcher only suggests configurations from there. If allow_duplicates == False, entries are popped off this list once suggested. points_to_evaluate is filtered to only contain entries in this set.

In order to make use of these features:

  • Reject configurations in get_config() if should_not_suggest() returns True. If the configuration is drawn at random, use _get_random_config(), which incorporates this filtering

  • Implement _get_config() instead of get_config(). The latter adds the new config to the exclusion list if allow_duplicates == False

Note: Not all searchers which filter duplicates make use of this class.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • allow_duplicates (Optional[bool]) – See above. Defaults to False

  • restrict_configurations (Optional[List[Dict[str, Any]]]) – See above, optional
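
Putting the points above together, a minimal custom searcher might look like the following sketch; the class name is made up, and it assumes _get_random_config() and _update() have the signatures used by the built-in searchers (get_state() and clone_from_state() would also be needed for full checkpointing support):

from typing import Any, Dict, Optional

from syne_tune.optimizer.schedulers.searchers.searcher_base import (
    StochasticAndFilterDuplicatesSearcher,
)


class MyRandomLikeSearcher(StochasticAndFilterDuplicatesSearcher):
    # Draws configurations at random; duplicate filtering and the exclusion
    # list are handled by the base class, because we implement _get_config()
    # rather than get_config()
    def _get_config(self, **kwargs) -> Optional[Dict[str, Any]]:
        return self._get_random_config()

    def _update(self, trial_id: str, config: Dict[str, Any], result: Dict[str, Any]):
        pass  # a purely random searcher ignores observed results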

property allow_duplicates: bool
should_not_suggest(config)[source]
Parameters:

config (Dict[str, Any]) – Configuration

Return type:

bool

Returns:

True if get_config() should not suggest this configuration

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[Dict[str, Any]]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

register_pending(trial_id, config=None, milestone=None)[source]

Signals to searcher that evaluation for trial has started, but not yet finished, which allows model-based searchers to register this evaluation as pending.

Parameters:
  • trial_id (str) – ID of trial to be registered as pending evaluation

  • config (Optional[Dict[str, Any]]) – If trial_id has not been registered with the searcher, its configuration must be passed here. Ignored otherwise.

  • milestone (Optional[int]) – For multi-fidelity schedulers, this is the next rung level the evaluation will reach, so that the model registers (config, milestone) as pending.

evaluation_failed(trial_id)[source]

Called by scheduler if an evaluation job for a trial failed.

The searcher should react appropriately (e.g., remove pending evaluations for this trial, not suggest the configuration again).

Parameters:

trial_id (str) – ID of trial whose evaluation failed

get_state()[source]

Together with clone_from_state(), this is needed in order to store and re-create the mutable state of the searcher. The state returned here must be pickle-able.

Return type:

Dict[str, Any]

Returns:

Pickle-able mutable state of searcher

syne_tune.optimizer.schedulers.searchers.searcher_callback module
class syne_tune.optimizer.schedulers.searchers.searcher_callback.StoreResultsAndModelParamsCallback(add_wallclock_time=True)[source]

Bases: StoreResultsCallback

Extends StoreResultsCallback by also storing the current surrogate model parameters in on_trial_result(). This works for schedulers with model-based searchers. For other schedulers, this callback behaves the same as the superclass.

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner – Tuner object

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result
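
A sketch of passing this callback to the Tuner; the training script path and the metric name are placeholders:

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler
from syne_tune.optimizer.schedulers.searchers.searcher_callback import (
    StoreResultsAndModelParamsCallback,
)

scheduler = FIFOScheduler(
    config_space={"lr": uniform(1e-4, 1e-1)},
    searcher="bayesopt",  # model-based searcher, so model parameters are stored
    metric="val_loss",
    mode="min",
)
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # placeholder script
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
    callbacks=[StoreResultsAndModelParamsCallback()],
)
tuner.run()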

class syne_tune.optimizer.schedulers.searchers.searcher_callback.SimulatorAndModelParamsCallback[source]

Bases: SimulatorCallback

Extends SimulatorCallback by also storing the current surrogate model parameters in on_trial_result(). This works for schedulers with model-based searchers. For other schedulers, this callback behaves the same as the superclass.

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner – Tuner object

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

syne_tune.optimizer.schedulers.searchers.searcher_factory module
syne_tune.optimizer.schedulers.searchers.searcher_factory.searcher_factory(searcher_name, **kwargs)[source]

Factory for searcher objects

This function creates searcher objects from string argument name and additional kwargs. It is typically called in the constructor of a scheduler (see FIFOScheduler), which provides most of the required kwargs.

Parameters:
  • searcher_name (str) – Value of searcher argument to scheduler (see FIFOScheduler)

  • kwargs – Argument to BaseSearcher constructor

Return type:

BaseSearcher

Returns:

New searcher object
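
In typical use, the factory is invoked indirectly by passing a string for searcher, as in the following sketch with a placeholder metric name:

from syne_tune.config_space import uniform
from syne_tune.optimizer.schedulers import FIFOScheduler

# The scheduler invokes searcher_factory(searcher_name, **kwargs) internally;
# kwargs bundle config_space, metric, mode, and any search_options entries
scheduler = FIFOScheduler(
    config_space={"lr": uniform(1e-4, 1e-1)},
    searcher="random",
    search_options={"debug_log": True},
    metric="val_loss",
    mode="min",
)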

syne_tune.optimizer.schedulers.synchronous package
class syne_tune.optimizer.schedulers.synchronous.SynchronousHyperbandScheduler(config_space, bracket_rungs, **kwargs)[source]

Bases: SynchronousHyperbandCommon, DefaultRemoveCheckpointsSchedulerMixin

Synchronous Hyperband. Compared to HyperbandScheduler, this also schedules jobs asynchronously, but decision-making is synchronized, in that trials are only promoted to the next milestone once the rung they are currently paused at is completely occupied.

Our implementation never delays scheduling of a job. If the currently active bracket does not accept jobs, we assign the job to a later bracket. This means that at any point in time, several brackets can be active, but jobs are preferentially assigned to the first one (the “primary” active bracket).

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • bracket_rungs (List[List[Tuple[int, int]]]) – Determines rung level systems for each bracket, see SynchronousHyperbandBracketManager

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Searcher for get_config decisions. Passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to “random” (i.e., random search)

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped by the scheduler, which can be more efficient.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Defaults to “epoch”

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.

property rung_levels: List[int]
Returns:

Rung levels (positive int; increasing), may or may not include max_resource_level

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_error(trial)[source]

Given the trial is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

trials_checkpoints_can_be_removed()[source]

Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning SchedulerDecision.STOP in on_trial_result()), its checkpoint is removed anyway, so its ID does not have to be returned here.

Return type:

List[int]

Returns:

IDs of paused trials for which checkpoints can be removed

class syne_tune.optimizer.schedulers.synchronous.SynchronousGeometricHyperbandScheduler(config_space, **kwargs)[source]

Bases: SynchronousHyperbandScheduler

Special case of SynchronousHyperbandScheduler with rung system defined by geometric sequences (see SynchronousHyperbandRungSystem.geometric()). This is the most frequently used case.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1

  • reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3

  • brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.

  • searcher (str, optional) – Selects searcher. Passed to searcher_factory(). Defaults to “random”

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped by the scheduler, which can be more efficient.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Defaults to “epoch”

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
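
A sketch of the most common usage; the metric and attribute names are placeholders for what the training script reports:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous import (
    SynchronousGeometricHyperbandScheduler,
)

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "batch_size": randint(16, 128),
    "epochs": 27,  # read by the training script as the maximum resource
}
scheduler = SynchronousGeometricHyperbandScheduler(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
)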

class syne_tune.optimizer.schedulers.synchronous.DifferentialEvolutionHyperbandScheduler(config_space, rungs_first_bracket, num_brackets_per_iteration=None, **kwargs)[source]

Bases: SynchronousHyperbandCommon

Differential Evolution Hyperband, as proposed in

DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization
Noor Awad, Neeratyoy Mallik, Frank Hutter
IJCAI 30 (2021), pages 2147-2153

We implement DEHB as a variant of synchronous Hyperband, which may differ slightly from the implementation of the authors. Main differences to synchronous Hyperband:

  • In DEHB, trials are not paused and potentially promoted (except in the very first bracket). Therefore, checkpointing is not used (except in the very first bracket, if support_pause_resume is True)

  • Only the initial configurations are drawn at random (or drawn from the searcher). Whenever possible, new configurations (in their internal encoding) are derived from earlier ones by way of differential evolution

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • rungs_first_bracket (List[Tuple[int, int]]) – Determines rung level systems for each bracket, see DifferentialEvolutionHyperbandBracketManager

  • num_brackets_per_iteration (Optional[int]) – Number of brackets per iteration. The algorithm cycles through these brackets in one iteration. If not given, the maximum number is used (i.e., len(rungs_first_bracket))

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Searcher for get_config decisions. Passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. If searcher == "random_encoded" (default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters in config_space. Note that points_to_evaluate is still used in this case.

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory(). Note: If search_options["allow_duplicates"] == True, then suggest() may return a configuration more than once

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped by the scheduler, which can be more efficient.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Defaults to “epoch”

  • mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Defaults to 0.5

  • crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5

  • support_pause_resume (bool, optional) – If True, _suggest() supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults to True. Note: The resumed trial still gets assigned a new trial_id, but it starts from the earlier checkpoint.

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.

MAX_RETRIES = 50
property rung_levels: List[int]
Returns:

Rung levels (positive int; increasing), may or may not include max_resource_level

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_error(trial)[source]

Given the trial is currently pending, we send a result at its milestone for metric value NaN. Such trials are ranked after all others and will most likely not be promoted.

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

class syne_tune.optimizer.schedulers.synchronous.GeometricDifferentialEvolutionHyperbandScheduler(config_space, **kwargs)[source]

Bases: DifferentialEvolutionHyperbandScheduler

Special case of DifferentialEvolutionHyperbandScheduler with rung system defined by geometric sequences. This is the most frequently used case.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1

  • reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3

  • brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Selects searcher. Passed to searcher_factory(). If searcher == "random_encoded" (default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters in config_space. Note that points_to_evaluate is still used in this case.

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped by the scheduler, which can be more efficient.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Defaults to “epoch”

  • mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Defaults to 0.5

  • crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5

  • support_pause_resume (bool, optional) – If True, _suggest() supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults to True. Note: The resumed trial still gets assigned a new trial_id, but it starts from the earlier checkpoint.
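
A sketch along the same lines as for the synchronous Hyperband scheduler above, again with placeholder metric and attribute names:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous import (
    GeometricDifferentialEvolutionHyperbandScheduler,
)

scheduler = GeometricDifferentialEvolutionHyperbandScheduler(
    config_space={
        "lr": loguniform(1e-5, 1e-1),
        "num_layers": randint(1, 8),
        "epochs": 27,  # maximum resource, read by the training script
    },
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    searcher="random_encoded",
)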

Submodules
syne_tune.optimizer.schedulers.synchronous.dehb module
class syne_tune.optimizer.schedulers.synchronous.dehb.TrialInformation(encoded_config, level, metric_val=None)[source]

Bases: object

Information the scheduler maintains per trial.

encoded_config: ndarray
level: int
metric_val: Optional[float] = None
class syne_tune.optimizer.schedulers.synchronous.dehb.ExtendedSlotInRung(bracket_id, slot_in_rung)[source]

Bases: object

Extends SlotInRung mostly for convenience

slot_in_rung()[source]
Return type:

SlotInRung

class syne_tune.optimizer.schedulers.synchronous.dehb.DifferentialEvolutionHyperbandScheduler(config_space, rungs_first_bracket, num_brackets_per_iteration=None, **kwargs)[source]

Bases: SynchronousHyperbandCommon

Differential Evolution Hyperband, as proposed in

DEHB: Evolutionary Hyperband for Scalable, Robust and Efficient Hyperparameter Optimization
Noor Awad, Neeratyoy Mallik, Frank Hutter
IJCAI 30 (2021), pages 2147-2153

We implement DEHB as a variant of synchronous Hyperband, which may differ slightly from the implementation of the authors. Main differences to synchronous Hyperband:

  • In DEHB, trials are not paused and potentially promoted (except in the very first bracket). Therefore, checkpointing is not used (except in the very first bracket, if support_pause_resume is True)

  • Only the initial configurations are drawn at random (or drawn from the searcher). Whenever possible, new configurations (in their internal encoding) are derived from earlier ones by way of differential evolution

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • rungs_first_bracket (List[Tuple[int, int]]) – Determines rung level systems for each bracket, see DifferentialEvolutionHyperbandBracketManager

  • num_brackets_per_iteration (Optional[int]) – Number of brackets per iteration. The algorithm cycles through these brackets in one iteration. If not given, the maximum number is used (i.e., len(rungs_first_bracket))

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Searcher for get_config decisions. Passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. If searcher == "random_encoded" (default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters in config_space. Note that points_to_evaluate is still used in this case.

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory(). Note: If search_options["allow_duplicates"] == True, then suggest() may return a configuration more than once

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped by the scheduler, which can be more efficient.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Default to “epoch”

  • mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Default to 0.5

  • crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5

  • support_pause_resume (bool, optional) – If True, _suggest() supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults to True. Note: The resumed trial still gets assigned a new trial_id, but it starts from the earlier checkpoint.

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.

MAX_RETRIES = 50
property rung_levels: List[int]
Returns:

Rung levels (positive int; increasing), may or may not include max_resource_level

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_error(trial)[source]

Given that the trial is currently pending, we send a result at its milestone with metric value NaN. Such trials are ranked after all others and will most likely not be promoted.

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuinely multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

syne_tune.optimizer.schedulers.synchronous.dehb_bracket module
class syne_tune.optimizer.schedulers.synchronous.dehb_bracket.DifferentialEvolutionHyperbandBracket(rungs, mode)[source]

Bases: SynchronousBracket

Represents a bracket in Differential Evolution Hyperband (DEHB).

There are a number of differences to brackets in standard synchronous Hyperband (SynchronousHyperbandBracket):

  • on_result(): result.trial_id overwrites trial_id in rung even if latter is not None.

  • Promotions are not triggered automatically when a rung is complete

  • Some additional methods

property num_rungs: int
size_of_current_rung()[source]
Return type:

int

trial_id_for_slot(rung_index, slot_index)[source]
Return type:

Optional[int]

top_list_for_previous_rung()[source]

Returns list of trial_ids corresponding to best scoring entries in rung below the currently active one (which must not be the base rung). The list is of the size of the current rung.

Return type:

List[int]

syne_tune.optimizer.schedulers.synchronous.dehb_bracket_manager module
class syne_tune.optimizer.schedulers.synchronous.dehb_bracket_manager.DifferentialEvolutionHyperbandBracketManager(rungs_first_bracket, mode, num_brackets_per_iteration=None)[source]

Bases: SynchronousHyperbandBracketManager

Special case of SynchronousHyperbandBracketManager to manage DEHB brackets (type DifferentialEvolutionHyperbandBracket).

In DEHB, the list of brackets is determined by the first one and the number of brackets. Also, later brackets have less total budget, because the size of a rung is determined by its level, independent of the bracket. This is different to what is done in synchronous Hyperband, where the rungs of later brackets have larger sizes, so the total budget of each bracket is the same.

We also need additional methods to access trial_id’s in specific rungs, as well as entries of the top lists for completed rungs. This is because DEHB controls the creation of new configurations at higher rungs, while synchronous Hyperband relies on automatic promotion from lower rungs.

size_of_current_rung(bracket_id)[source]
Return type:

int

trial_id_from_parent_slot(bracket_id, level, slot_index)[source]

The parent slot has the same slot index and rung level in the largest bracket < bracket_id with a trial_id not None. If no such slot exists, None is returned. For a cross-over or selection operation, the target is chosen from the parent slot.

Return type:

Optional[int]

top_of_previous_rung(bracket_id, pos)[source]

For the current rung in bracket bracket_id, consider the slots of the previous rung (below) in sorted order. We return the trial_id of position pos (so for pos=0, the best entry).

Return type:

int

syne_tune.optimizer.schedulers.synchronous.hyperband module
class syne_tune.optimizer.schedulers.synchronous.hyperband.SynchronousHyperbandCommon(config_space, **kwargs)[source]

Bases: TrialSchedulerWithSearcher, MultiFidelitySchedulerMixin

Common code for _create_internal() in SynchronousHyperbandScheduler and DifferentialEvolutionHyperbandScheduler

property searcher: BaseSearcher | None
property resource_attr: str
Returns:

Name of resource attribute in reported results

property max_resource_level: int
Returns:

Maximum resource level

property searcher_data: str
Returns:

Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:

  • ”rungs”: Only results at rung levels. Cheapest

  • ”all”: All results. Most expensive

  • ”rungs_and_last”: Results at rung levels plus last recent one. Not available for all multi-fidelity schedulers

class syne_tune.optimizer.schedulers.synchronous.hyperband.SynchronousHyperbandScheduler(config_space, bracket_rungs, **kwargs)[source]

Bases: SynchronousHyperbandCommon, DefaultRemoveCheckpointsSchedulerMixin

Synchronous Hyperband. Compared to HyperbandScheduler, this also schedules jobs asynchronously, but decision-making is synchronized, in that trials are only promoted to the next milestone once the rung they are currently paused at is completely occupied.

Our implementation never delays scheduling of a job. If the currently active bracket does not accept jobs, we assign the job to a later bracket. This means that at any point in time, several brackets can be active, but jobs are preferentially assigned to the first one (the “primary” active bracket).

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • bracket_rungs (List[List[Tuple[int, int]]]) – Determines rung level systems for each bracket, see SynchronousHyperbandBracketManager

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Searcher for get_config decisions. Passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to “random” (i.e., random search)

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Default to “epoch”

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.

property rung_levels: List[int]
Returns:

Rung levels (positive int; increasing), may or may not include max_resource_level

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_error(trial)[source]

Given that the trial is currently pending, we send a result at its milestone with metric value NaN. Such trials are ranked after all others and will most likely not be promoted.

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuinely multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

trials_checkpoints_can_be_removed()[source]

Supports the general case (see header comment). This method returns IDs of paused trials for which checkpoints can safely be removed. These trials either cannot be resumed anymore, or it is very unlikely they will be resumed. Any trial ID needs to be returned only once, not over and over. If a trial gets stopped (by returning SchedulerDecision.STOP in on_trial_result()), its checkpoint is removed anyway, so its ID does not have to be returned here.

Return type:

List[int]

Returns:

IDs of paused trials for which checkpoints can be removed

syne_tune.optimizer.schedulers.synchronous.hyperband_bracket module
class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SlotInRung(rung_index, level, slot_index, trial_id, metric_val)[source]

Bases: object

Used to communicate slot positions and their content.

rung_index: int
level: int
slot_index: int
trial_id: Optional[int]
metric_val: Optional[float]
class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SynchronousBracket(mode)[source]

Bases: object

Base class for a single bracket in synchronous Hyperband algorithms.

A bracket consists of a list of rungs. Each rung consists of a number of slots and a resource level (called rung level). The larger the rung level, the smaller the number of slots.

A slot is occupied (by a metric value), free, or pending. A pending slot has already been returned by next_free_slot(). Slots in the lowest rung (smallest rung level, largest size) are filled first. At any point in time, only slots in the lowest not fully occupied rung can be filled. If there are no free slots in the current rung, but there are pending ones, the bracket is blocked, and another bracket needs to be worked on.

static assert_check_rungs(rungs)[source]
property num_rungs: int
is_bracket_complete()[source]
Return type:

bool

num_pending_slots()[source]
Return type:

int

Returns:

Number of pending slots (returned by next_free_slot(), but not yet occupied)

next_free_slot()[source]
Return type:

Optional[SlotInRung]

on_result(result)[source]

Provides result for slot previously requested by next_free_slot. Here, result.metric is written to the slot in order to make it occupied. Also, result.trial_id is written there.

We normally return None. But if the result passed completes the current rung, this triggers the creation of a child rung which consists of promoted trials from the current rung. In this case, we return the IDs of trials which have not been promoted. This is used for early removal of checkpoints, see trials_checkpoints_can_be_removed().

Parameters:

result (SlotInRung) – See above

Return type:

Optional[List[int]]

Returns:

See above

class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.SynchronousHyperbandBracket(rungs, mode)[source]

Bases: SynchronousBracket

Represents a bracket in standard synchronous Hyperband.

When a rung is fully occupied, slots for the next rung are assigned with the trial_id’s having the best metric values. At any point in time, only slots in the lowest not fully occupied rung can be filled.

property num_rungs: int
syne_tune.optimizer.schedulers.synchronous.hyperband_bracket.get_top_list(rung, new_len, mode)[source]

Returns the list of IDs of the new_len trials which should be promoted, because they performed best. We also return the list of IDs of the remaining trials, which are not to be promoted.

Parameters:
  • rung (List[Tuple[int, float]]) – Current rung which has just been completed

  • new_len (int) – Size of new rung

  • mode (str) – “min” or “max”

Return type:

(List[int], List[int])

Returns:

(top_list, remaining_list)

syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager module
class syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager.SynchronousHyperbandBracketManager(bracket_rungs, mode)[source]

Bases: object

Maintains all brackets, relays requests for another job and report of result to one of the brackets.

Each bracket contains a number of rungs, the largest one max_num_rungs. A bracket with k rungs has offset max_num_rungs - k. Hyperband cycles through brackets with offset 0, ..., num_brackets - 1, where num_brackets <= max_num_rungs.

At any given time, one bracket is primary, all other active brackets are secondary. Jobs are preferentially assigned to the primary bracket, but if its current rung has no free slots (all are pending), secondary brackets are considered.

Each bracket has a bracket_id (nonnegative int). The primary bracket always has the lowest id of all active ones. For job assignment, we iterate over active brackets starting from the primary, and assign the job to the first bracket which has a free slot. If none of the active brackets have a free slot, a new bracket is created.

Parameters:
  • bracket_rungs (List[List[Tuple[int, int]]]) – Rungs for successive brackets, from largest to smallest

  • mode (str) – Criterion is minimized (‘min’) or maximized (‘max’)

property bracket_rungs: List[List[Tuple[int, int]]]
level_to_prev_level(bracket_id, level)[source]
Parameters:
  • bracket_id (int) –

  • level (int) – Level in bracket

Return type:

int

Returns:

Previous level; or 0

next_job()[source]

Called by scheduler to request a new job. Jobs are preferentially assigned to the primary bracket, which has the lowest id among all active brackets. If the primary bracket does not accept jobs (because all remaining slots are already pending), further active brackets are polled. If none of the active brackets accept jobs, a new bracket is created.

The job description returned is (bracket_id, slot_in_rung), where slot_in_rung is a SlotInRung, containing the information of what is to be done (trial_id, level fields). It is this entry which has to be returned in on_result(), with the metric_val field set. If the job returned here has trial_id == None, it comes from the lowest rung of its bracket, and the trial_id field has to be set as well when returning the record in on_result().

Return type:

Tuple[int, SlotInRung]

Returns:

Tuple (bracket_id, slot_in_rung)

on_result(result)[source]

Called by scheduler to provide result for previously requested job. See next_job().

Parameters:

result (Tuple[int, SlotInRung]) – Tuple (bracket_id, slot_in_rung)

Return type:

Optional[List[int]]

Returns:

See SynchronousBracket.on_result()
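
To make the next_job() / on_result() protocol concrete, here is a minimal sketch that drives a bracket manager by hand, using random numbers as stand-ins for metric values. It assumes rung systems built with SynchronousHyperbandRungSystem.geometric() (documented further below); only the interplay of the two calls is the point here:

import random

from syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system import (
    SynchronousHyperbandRungSystem,
)
from syne_tune.optimizer.schedulers.synchronous.hyperband_bracket_manager import (
    SynchronousHyperbandBracketManager,
)

bracket_rungs = SynchronousHyperbandRungSystem.geometric(
    min_resource=1, max_resource=9, reduction_factor=3, num_brackets=1
)
manager = SynchronousHyperbandBracketManager(bracket_rungs, mode="min")

next_trial_id = 0
for _ in range(6):
    bracket_id, slot_in_rung = manager.next_job()
    if slot_in_rung.trial_id is None:
        # Slot in the lowest rung of its bracket: a new trial_id must be assigned
        slot_in_rung.trial_id = next_trial_id
        next_trial_id += 1
    # Stand-in for training the trial up to resource level slot_in_rung.level
    slot_in_rung.metric_val = random.random()
    manager.on_result((bracket_id, slot_in_rung))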

syne_tune.optimizer.schedulers.synchronous.hyperband_impl module
class syne_tune.optimizer.schedulers.synchronous.hyperband_impl.SynchronousGeometricHyperbandScheduler(config_space, **kwargs)[source]

Bases: SynchronousHyperbandScheduler

Special case of SynchronousHyperbandScheduler with rung system defined by geometric sequences (see SynchronousHyperbandRungSystem.geometric()). This is the most frequently used case.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1

  • reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3

  • brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.

  • searcher (str, optional) – Selects searcher. Passed to searcher_factory(). Defaults to “random”

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Default to “epoch”

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to “all”.
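
A minimal usage sketch. The training script train_script.py and the metric name validation_accuracy are hypothetical; the script is assumed to report validation_accuracy once per epoch via syne_tune.Reporter:

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous.hyperband_impl import (
    SynchronousGeometricHyperbandScheduler,
)

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 27,  # read by the training script as the maximum number of epochs
}

scheduler = SynchronousGeometricHyperbandScheduler(
    config_space,
    metric="validation_accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()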

class syne_tune.optimizer.schedulers.synchronous.hyperband_impl.GeometricDifferentialEvolutionHyperbandScheduler(config_space, **kwargs)[source]

Bases: DifferentialEvolutionHyperbandScheduler

Special case of DifferentialEvolutionHyperbandScheduler with rung system defined by geometric sequences. This is the most frequently used case.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for trial evaluation function

  • grace_period (int, optional) – Smallest (resource) rung level. Must be positive int. Defaults to 1

  • reduction_factor (float, optional) – Approximate ratio of successive rung levels. Must be >= 2. Defaults to 3

  • brackets (int, optional) – Number of brackets to be used. The default is to use the maximum number of brackets per iteration. Pass 1 for successive halving.

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result()

  • searcher (str, optional) – Selects searcher. Passed to searcher_factory(). If searcher == "random_encoded" (default), the encoded configs are sampled directly, each entry independently from U([0, 1]). This distribution has higher entropy than for “random” if there are discrete hyperparameters in config_space. Note that points_to_evaluate is still used in this case.

  • search_options (Dict[str, Any], optional) – Passed to searcher_factory().

  • mode (str, optional) – Mode to use for the metric given, can be “min” (default) or “max”

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If None (default), this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_level (int, optional) – Largest rung level, corresponds to max_t in FIFOScheduler. Must be positive int larger than grace_period. If this is not given, it is inferred like in FIFOScheduler. In particular, it is not needed if max_resource_attr is given.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If given, trials need not be stopped, which can run more efficiently.

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result(). The type of resource must be int. Default to “epoch”

  • mutation_factor (float, optional) – In \((0, 1]\). Factor \(F\) used in the rand/1 mutation operation of DE. Default to 0.5

  • crossover_probability (float, optional) – In \((0, 1)\). Probability \(p\) used in crossover operation (child entries are chosen with probability \(p\)). Defaults to 0.5

  • support_pause_resume (bool, optional) – If True, _suggest() supports pause and resume in the first bracket (this is the default). If the objective supports checkpointing, this is made use of. Defaults to True. Note: The resumed trial still gets assigned a new trial_id, but it starts from the earlier checkpoint.
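
A minimal construction sketch; config_space, metric name, and resource attribute are placeholders, and the scheduler is then passed to a Tuner exactly as in the SynchronousGeometricHyperbandScheduler example above:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.synchronous.hyperband_impl import (
    GeometricDifferentialEvolutionHyperbandScheduler,
)

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 27,
}

scheduler = GeometricDifferentialEvolutionHyperbandScheduler(
    config_space,
    metric="validation_accuracy",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
    # Default: only initial configs are sampled, later ones come from differential evolution
    searcher="random_encoded",
)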

syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system module
class syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system.SynchronousHyperbandRungSystem[source]

Bases: object

Collects factory methods for RungSystemsPerBracket rung systems to be used in SynchronousHyperbandBracketManager.

static geometric(min_resource, max_resource, reduction_factor, num_brackets=None)[source]

This is the geometric progression setup from the original papers on successive halving and Hyperband.

If \(s_{max} = \lceil \log(\mathrm{max\_resource} / \mathrm{min\_resource}) / \log(\eta) \rceil\), where \(\eta\) is reduction_factor, there can be at most \(s_{max} + 1\) brackets. Here, bracket \(s\) has \(r_{num} = s_{max} - s + 1\) rungs, and the size of rung \(r\) in bracket \(s\) is

\(n(r, s) = \lceil (s_{max} + 1) / r_{num} \rceil \cdot \eta^{r_{num} - r - 1}\)

Parameters:
  • min_resource (int) – Smallest resource level (positive int)

  • max_resource (int) – Largest resource level (positive int)

  • reduction_factor (float) – Approximate ratio between successive rung levels

  • num_brackets (Optional[int]) – Number of brackets. If not given, the maximum number of brackets is used. Pass 1 for successive halving

Return type:

List[List[Tuple[int, int]]]

Returns:

Rung system
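
A small sketch of calling this factory (the printed rung sizes and levels follow the formula above):

from syne_tune.optimizer.schedulers.synchronous.hyperband_rung_system import (
    SynchronousHyperbandRungSystem,
)

bracket_rungs = SynchronousHyperbandRungSystem.geometric(
    min_resource=1, max_resource=81, reduction_factor=3
)
# One rung system per bracket; each bracket is a list of (int, int) rung tuples
for bracket_id, rungs in enumerate(bracket_rungs):
    print(f"bracket {bracket_id}: {rungs}")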

syne_tune.optimizer.schedulers.transfer_learning package
class syne_tune.optimizer.schedulers.transfer_learning.TransferLearningTaskEvaluations(configuration_space, hyperparameters, objectives_names, objectives_evaluations)[source]

Bases: object

Class that contains offline evaluations for a task that can be used for transfer learning.

Parameters:
  • configuration_space (Dict) – The configuration space that was used when sampling evaluations

  • hyperparameters (pd.DataFrame) – The hyperparameter values that were acquired; all keys of configuration_space should appear as columns

  • objectives_names (List[str]) – The names of the objectives that were acquired

  • objectives_evaluations (np.array) – Values of recorded objectives, must have shape (num_evals, num_seeds, num_fidelities, num_objectives)

configuration_space: Dict
hyperparameters: DataFrame
objectives_names: List[str]
objectives_evaluations: array
objective_values(objective_name)[source]
Return type:

array

objective_index(objective_name)[source]
Return type:

int

top_k_hyperparameter_configurations(k, mode, objective)[source]

Returns the best k hyperparameter configurations.

Parameters:
  • k (int) – The number of top hyperparameters to return

  • mode (str) – “min” or “max”, indicating the type of optimization problem

  • objective (str) – The objective to consider for ranking hyperparameters

Return type:

List[Dict[str, Any]]

Returns:

List of hyperparameters in order
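
A minimal sketch constructing task evaluations from toy data (random values stand in for real offline results; the metric name is a placeholder):

import numpy as np
import pandas as pd

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.transfer_learning import (
    TransferLearningTaskEvaluations,
)

num_evals, num_seeds, num_fidelities, num_objectives = 20, 1, 1, 1
config_space = {"learning_rate": loguniform(1e-4, 1e-1)}

task_evals = TransferLearningTaskEvaluations(
    configuration_space=config_space,
    hyperparameters=pd.DataFrame(
        {"learning_rate": np.random.uniform(1e-4, 1e-1, size=num_evals)}
    ),
    objectives_names=["validation_error"],
    objectives_evaluations=np.random.rand(
        num_evals, num_seeds, num_fidelities, num_objectives
    ),
)
top_configs = task_evals.top_k_hyperparameter_configurations(
    k=3, mode="min", objective="validation_error"
)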

class syne_tune.optimizer.schedulers.transfer_learning.TransferLearningMixin(config_space, transfer_learning_evaluations, metric_names, **kwargs)[source]

Bases: object

metric_names()[source]
Return type:

List[str]

top_k_hyperparameter_configurations_per_task(transfer_learning_evaluations, num_hyperparameters_per_task, mode, metric)[source]

Returns the best hyperparameter configurations for each task.

Parameters:
  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Set of candidates to choose from

  • num_hyperparameters_per_task (int) – The number of top hyperparameters per task to return

  • mode (str) – “min” or “max”, indicating the type of optimization problem

  • metric (str) – The metric to consider for ranking hyperparameters

Return type:

Dict[str, List[Dict[str, Any]]]

Returns:

Dict which maps from task name to list of hyperparameters in order

class syne_tune.optimizer.schedulers.transfer_learning.BoundingBox(scheduler_fun, config_space, metric, transfer_learning_evaluations, mode=None, num_hyperparameters_per_task=1)[source]

Bases: TransferLearningMixin, TrialScheduler

Simple baseline that computes a bounding-box of the best candidate found in previous tasks to restrict the search space to only good candidates. The bounding-box is obtained by restricting to the min-max of the best numerical hyperparameters and restricting to the set of the best candidates on categorical parameters. Reference:

Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning.
Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton.
NeurIPS 2019.

scheduler_fun is used to create the scheduler to be used here, feeding it with the modified config space. Any additional scheduler arguments (such as points_to_evaluate) should be encoded inside this function. Example:

from syne_tune.optimizer.baselines import RandomSearch

def scheduler_fun(new_config_space: Dict[str, Any], mode: str, metric: str):
    return RandomSearch(new_config_space, metric, mode)

bb_scheduler = BoundingBox(scheduler_fun, ...)

Here, bb_scheduler represents random search, where the hyperparameter ranges are restricted to contain the best evaluations of previous tasks, as provided by transfer_learning_evaluations.

Parameters:
  • scheduler_fun (Callable[[dict, str, str], TrialScheduler]) – Maps tuple of configuration space (dict), mode (str), metric (str) to a scheduler. This is required since the final configuration space is known only after computing a bounding-box.

  • config_space (Dict[str, Any]) – Initial configuration space to consider, will be updated to the bounding of the best evaluations of previous tasks

  • metric (str) – Objective name to optimize, must be present in transfer learning evaluations.

  • mode (Optional[str]) – Mode to be considered, default to “min”.

  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • num_hyperparameters_per_task (int) – Number of the best configurations to use per task when computing the bounding box, defaults to 1.

suggest(trial_id)[source]

Returns a suggestion for a new trial, or one to be resumed

This method returns suggestion of type TrialSuggestion (unless there is no config left to explore, and None is returned).

If suggestion.spawn_new_trial_id is True, a new trial is to be started with config suggestion.config. Typically, this new trial is started from scratch. But if suggestion.checkpoint_trial_id is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has ID trial_id.

If suggestion.spawn_new_trial_id is False, an existing and currently paused trial is to be resumed, whose ID is suggestion.checkpoint_trial_id. If this trial has a checkpoint, we start from there. In this case, suggestion.config is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten by suggestion.config (see HyperbandScheduler with type="promotion" for an example why this can be useful).

Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.

Parameters:

trial_id (int) – ID for new trial to be started (ignored if existing trial to be resumed)

Return type:

Optional[TrialSuggestion]

Returns:

Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

class syne_tune.optimizer.schedulers.transfer_learning.RUSHScheduler(config_space, transfer_learning_evaluations, metric, type='stopping', points_to_evaluate=None, custom_rush_points=None, num_hyperparameters_per_task=1, **kwargs)[source]

Bases: TransferLearningMixin, HyperbandScheduler

A transfer learning variation of Hyperband which uses previously well-performing hyperparameter configurations as an initialization. The best hyperparameter configuration of each individual task provided is evaluated. The one among them which performs best on the current task will serve as a hurdle and is used to prune other candidates. This changes the standard successive halving promotion as follows. As usual, only the top-performing fraction is promoted to the next rung level. However, these candidates need to be at least as good as the hurdle configuration to be promoted. In practice this means that much fewer candidates can be promoted. Reference:

A resource-efficient method for repeated HPO and NAS.
Giovanni Zappella, David Salinas, Cédric Archambeau.
AutoML workshop @ ICML 2021.

Additional arguments on top of parent class HyperbandScheduler.

Parameters:
  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • points_to_evaluate (Optional[List[dict]]) – If given, these configurations are evaluated after custom_rush_points and configurations inferred from transfer_learning_evaluations. These points are not used to prune any configurations.

  • custom_rush_points (Optional[List[dict]]) – If given, these configurations are evaluated first, in addition to top performing configurations from other tasks and also serve to preemptively prune underperforming configurations

  • num_hyperparameters_per_task (int) – The number of top hyperparameter configurations to consider per task. Defaults to 1
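
A minimal sketch; the toy evaluations below reuse the pattern from the TransferLearningTaskEvaluations sketch above, and metric name and resource attribute are placeholders:

import numpy as np
import pandas as pd

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.transfer_learning import (
    RUSHScheduler,
    TransferLearningTaskEvaluations,
)

config_space = {"learning_rate": loguniform(1e-4, 1e-1)}

# Toy offline evaluations for one previous task (random stand-ins for real results)
prev_task = TransferLearningTaskEvaluations(
    configuration_space=config_space,
    hyperparameters=pd.DataFrame(
        {"learning_rate": np.random.uniform(1e-4, 1e-1, size=20)}
    ),
    objectives_names=["validation_error"],
    objectives_evaluations=np.random.rand(20, 1, 1, 1),
)

scheduler = RUSHScheduler(
    config_space,
    transfer_learning_evaluations={"previous-task": prev_task},
    metric="validation_error",
    mode="min",
    resource_attr="epoch",
    max_t=27,
)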

Subpackages
syne_tune.optimizer.schedulers.transfer_learning.quantile_based package
Submodules
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms module
class syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms.GaussianTransform(y, random_state=None)[source]

Bases: object

Transforms data into Gaussian by applying \(\psi = \Phi^{-1} \circ F\), where \(F\) is the truncated empirical CDF.

Parameters:
  • y (array) – shape (n, dim)

  • random_state (Optional[RandomState]) – If specified, the rank is randomized when consecutive values exist between extreme values. If None, the lowest rank of duplicated values is used.

static z_transform(series, values_sorted, random_state=None)[source]
Parameters:
  • series – shape (n, dim)

  • values_sorted – series sorted on the first axis

  • random_state (Optional[RandomState]) – if not None, ranks are drawn uniformly for values with consecutive ranges

Returns:

data with same shape as input series where distribution is normalized on all dimensions

transform(y)[source]
Parameters:

y (array) – shape (n, dim)

Returns:

shape (n, dim), distributed along a normal
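
A minimal sketch on toy data:

import numpy as np

from syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms import (
    GaussianTransform,
)

# Skewed toy objective values, shape (n, dim)
y = np.random.exponential(size=(100, 1))
psi = GaussianTransform(y)
z = psi.transform(y)  # roughly standard normal along each dimension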

class syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms.StandardTransform(y)[source]

Bases: object

transform(y)[source]
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.normalization_transforms.from_string(name, random_state=None)[source]
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher module
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.extract_input_output(transfer_learning_evaluations, normalization, random_state)[source]
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.fit_model(config_space, transfer_learning_evaluations, normalization, max_fit_samples, random_state, model=XGBRegressor(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...))[source]
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.eval_model(model_pipeline, X, y)[source]
syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.subsample(X, y, max_samples=10000, random_state=None)[source]

Subsample both X and y to max_samples elements. If max_samples is not set, then X and y are returned as such; if it is set, the index of X is reset.

Return type:

Tuple[DataFrame, array]

Returns:

(X, y) with max_samples sampled elements

class syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher.QuantileBasedSurrogateSearcher(config_space, metric, transfer_learning_evaluations, mode=None, max_fit_samples=100000, normalization='gaussian', **kwargs)[source]

Bases: StochasticSearcher

Implements the transfer-learning method:

A Quantile-based Approach for Hyperparameter Transfer Learning.
David Salinas, Huibin Shen, Valerio Perrone.
ICML 2020.

This is the Copula Thompson Sampling approach described in the paper, where a surrogate is fitted on the transfer learning data to predict the mean/variance of configuration performance given a hyperparameter configuration. The surrogate is then sampled from, and the best configurations are returned as the next candidates to evaluate.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • mode (Optional[str]) – Whether to minimize or maximize, default to “min”.

  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • max_fit_samples (int) – Maximum number of samples to use when fitting the method. Defaults to 100000

  • normalization (str) – Default to “gaussian” which first computes the rank and then applies Gaussian inverse CDF. “standard” applies just standard normalization (remove mean and divide by variance) but can perform significantly worse.
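
A minimal sketch of plugging this searcher into FIFOScheduler; transfer_learning_evaluations is assumed to be a dict from task name to TransferLearningTaskEvaluations, built for instance as in the sketch for that class above, and the metric name is a placeholder:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.fifo import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning.quantile_based.quantile_based_searcher import (
    QuantileBasedSurrogateSearcher,
)

config_space = {"learning_rate": loguniform(1e-4, 1e-1)}

# transfer_learning_evaluations: Dict[str, TransferLearningTaskEvaluations],
# assumed to be built as in the TransferLearningTaskEvaluations sketch above
searcher = QuantileBasedSurrogateSearcher(
    config_space=config_space,
    metric="validation_error",
    transfer_learning_evaluations=transfer_learning_evaluations,
    mode="min",
)
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,
    metric="validation_error",
    mode="min",
)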

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state (Dict[str, Any]) – See above

Returns:

New searcher object

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[dict]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

Submodules
syne_tune.optimizer.schedulers.transfer_learning.bounding_box module
class syne_tune.optimizer.schedulers.transfer_learning.bounding_box.BoundingBox(scheduler_fun, config_space, metric, transfer_learning_evaluations, mode=None, num_hyperparameters_per_task=1)[source]

Bases: TransferLearningMixin, TrialScheduler

Simple baseline that computes a bounding-box of the best candidate found in previous tasks to restrict the search space to only good candidates. The bounding-box is obtained by restricting to the min-max of the best numerical hyperparameters and restricting to the set of the best candidates on categorical parameters. Reference:

Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning.
Valerio Perrone, Huibin Shen, Matthias Seeger, Cédric Archambeau, Rodolphe Jenatton.
NeurIPS 2019.

scheduler_fun is used to create the scheduler to be used here, feeding it with the modified config space. Any additional scheduler arguments (such as points_to_evaluate) should be encoded inside this function. Example:

from syne_tune.optimizer.baselines import RandomSearch

def scheduler_fun(new_config_space: Dict[str, Any], mode: str, metric: str):
    return RandomSearch(new_config_space, metric, mode)

bb_scheduler = BoundingBox(scheduler_fun, ...)

Here, bb_scheduler represents random search, where the hyperparameter ranges are restricted to contain the best evaluations of previous tasks, as provided by transfer_learning_evaluations.

Parameters:
  • scheduler_fun (Callable[[dict, str, str], TrialScheduler]) – Maps tuple of configuration space (dict), mode (str), metric (str) to a scheduler. This is required since the final configuration space is known only after computing a bounding-box.

  • config_space (Dict[str, Any]) – Initial configuration space to consider, will be updated to the bounding of the best evaluations of previous tasks

  • metric (str) – Objective name to optimize, must be present in transfer learning evaluations.

  • mode (Optional[str]) – Mode to be considered, default to “min”.

  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • num_hyperparameters_per_task (int) – Number of the best configurations to use per task when computing the bounding box, defaults to 1.

suggest(trial_id)[source]

Returns a suggestion for a new trial, or one to be resumed

This method returns suggestion of type TrialSuggestion (unless there is no config left to explore, and None is returned).

If suggestion.spawn_new_trial_id is True, a new trial is to be started with config suggestion.config. Typically, this new trial is started from scratch. But if suggestion.checkpoint_trial_id is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has ID trial_id.

If suggestion.spawn_new_trial_id is False, an existing and currently paused trial is to be resumed, whose ID is suggestion.checkpoint_trial_id. If this trial has a checkpoint, we start from there. In this case, suggestion.config is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten by suggestion.config (see HyperbandScheduler with type="promotion" for an example why this can be useful).

Apart from the HP config, additional fields can be appended to the dict; these are passed to the trial function as well.

Parameters:

trial_id (int) – ID for new trial to be started (ignored if existing trial to be resumed)

Return type:

Optional[TrialSuggestion]

Returns:

Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

syne_tune.optimizer.schedulers.transfer_learning.rush module
class syne_tune.optimizer.schedulers.transfer_learning.rush.RUSHScheduler(config_space, transfer_learning_evaluations, metric, type='stopping', points_to_evaluate=None, custom_rush_points=None, num_hyperparameters_per_task=1, **kwargs)[source]

Bases: TransferLearningMixin, HyperbandScheduler

A transfer learning variation of Hyperband which uses previously well-performing hyperparameter configurations as an initialization. The best hyperparameter configuration of each individual task provided is evaluated. The one among them which performs best on the current task will serve as a hurdle and is used to prune other candidates. This changes the standard successive halving promotion as follows. As usual, only the top-performing fraction is promoted to the next rung level. However, these candidates need to be at least as good as the hurdle configuration to be promoted. In practice this means that much fewer candidates can be promoted. Reference:

A resource-efficient method for repeated HPO and NAS.
Giovanni Zappella, David Salinas, Cédric Archambeau.
AutoML workshop @ ICML 2021.

Additional arguments on top of parent class HyperbandScheduler.

Parameters:
  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • points_to_evaluate (Optional[List[dict]]) – If given, these configurations are evaluated after custom_rush_points and configurations inferred from transfer_learning_evaluations. These points are not used to prune any configurations.

  • custom_rush_points (Optional[List[dict]]) – If given, these configurations are evaluated first, in addition to top performing configurations from other tasks and also serve to preemptively prune underperforming configurations

  • num_hyperparameters_per_task (int) – The number of top hyperparameter configurations to consider per task. Defaults to 1

syne_tune.optimizer.schedulers.transfer_learning.zero_shot module
class syne_tune.optimizer.schedulers.transfer_learning.zero_shot.ZeroShotTransfer(config_space, metric, transfer_learning_evaluations, mode='min', sort_transfer_learning_evaluations=True, use_surrogates=False, **kwargs)[source]

Bases: TransferLearningMixin, StochasticSearcher

A zero-shot transfer hyperparameter optimization method which jointly selects configurations that minimize the average rank obtained on historic metadata (transfer_learning_evaluations). This is a searcher which can be used with FIFOScheduler. Reference:

Sequential Model-Free Hyperparameter Tuning.
Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme.
IEEE International Conference on Data Mining (ICDM) 2015.

Additional arguments on top of parent class StochasticSearcher:

Parameters:
  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • mode (str) – Whether to minimize (“min”, default) or maximize (“max”)

  • sort_transfer_learning_evaluations (bool) – Use False if the hyperparameters for each task in transfer_learning_evaluations are already in the same order. If set to True, hyperparameters are sorted. Defaults to True

  • use_surrogates (bool) – If the same configuration is not evaluated on all tasks, set this to True. This will generate a set of configurations and will impute their performance using surrogate models. Defaults to False
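
A minimal sketch; as with the quantile-based searcher above, transfer_learning_evaluations is assumed to be a dict from task name to TransferLearningTaskEvaluations, and the metric name is a placeholder:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.fifo import FIFOScheduler
from syne_tune.optimizer.schedulers.transfer_learning.zero_shot import ZeroShotTransfer

config_space = {"learning_rate": loguniform(1e-4, 1e-1)}

# transfer_learning_evaluations: Dict[str, TransferLearningTaskEvaluations],
# assumed to be built as in the TransferLearningTaskEvaluations sketch above
searcher = ZeroShotTransfer(
    config_space=config_space,
    metric="validation_error",
    transfer_learning_evaluations=transfer_learning_evaluations,
    mode="min",
)
scheduler = FIFOScheduler(
    config_space,
    searcher=searcher,
    metric="validation_error",
    mode="min",
)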

get_config(**kwargs)[source]

Suggest a new configuration.

Note: Query _next_initial_config() for initial configs to return first.

Parameters:

kwargs – Extra information may be passed from scheduler to searcher

Return type:

Optional[dict]

Returns:

New configuration. The searcher may return None if a new configuration cannot be suggested. In this case, the tuning will stop. This happens if searchers never suggest the same config more than once, and all configs in the (finite) search space are exhausted.

syne_tune.optimizer.schedulers.utils package
Submodules
syne_tune.optimizer.schedulers.utils.simple_profiler module
class syne_tune.optimizer.schedulers.utils.simple_profiler.ProfilingBlock(meta, time_stamp, durations)[source]

Bases: object

meta: Dict[str, Any]
time_stamp: float
durations: Dict[str, List[float]]
class syne_tune.optimizer.schedulers.utils.simple_profiler.SimpleProfiler[source]

Bases: object

Useful to profile time of recurring computations, for example get_config calls in searchers.

Measurements are divided into blocks. A block is started by begin_block. Each block stores meta data, a time stamp when begin_block was called (relative to the time stamp for the first block, which is 0), and a dict of lists of durations, whose keys are tags. A tag corresponds to a range of code to be profiled. It may be executed many times within a block, therefore lists of durations.

Tags can have multiple levels of prefixes, corresponding to brackets.

begin_block(meta)[source]
push_prefix(prefix)[source]
pop_prefix()[source]
start(tag)[source]
stop(tag)[source]
clear()[source]
records_as_dict()[source]

Returns records as a dict of lists, which can be converted into a Pandas dataframe via pandas.DataFrame.from_dict(…). Each entry corresponds to a column.

Return type:

Dict[str, Any]
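
A minimal sketch; the block metadata and the tag name are arbitrary:

import time

from syne_tune.optimizer.schedulers.utils.simple_profiler import SimpleProfiler

profiler = SimpleProfiler()
profiler.begin_block(meta={"num_observations": 10})

profiler.start("fit_model")
time.sleep(0.01)  # stand-in for the code range being profiled
profiler.stop("fit_model")

records = profiler.records_as_dict()  # dict of lists, one entry per column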

syne_tune.optimizer.schedulers.utils.successive_halving module
syne_tune.optimizer.schedulers.utils.successive_halving.successive_halving_rung_levels(rung_levels, grace_period, reduction_factor, rung_increment, max_t)[source]

Creates rung_levels from grace_period, reduction_factor

Note: If rung_levels is given and rung_levels[-1] == max_t, we strip off this final entry, so that all rung levels are < max_t.

Parameters:
  • rung_levels – If given, these rung levels are used directly (positive ints, strictly increasing)

  • grace_period – Smallest rung level \(r_{min}\), used if rung_levels is not given

  • reduction_factor – If given, rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\), with \(\eta\) the reduction factor

  • rung_increment – If given (and reduction_factor is not), rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), with \(\nu\) the increment

  • max_t – Largest resource level; returned rung levels are < max_t

Return type:

List[int]

Returns:

List of rung levels

Submodules
syne_tune.optimizer.schedulers.fifo module
class syne_tune.optimizer.schedulers.fifo.FIFOScheduler(config_space, **kwargs)[source]

Bases: TrialSchedulerWithSearcher

Scheduler which executes trials in submission order.

This is the most basic scheduler template. It can be configured to many use cases by choosing searcher along with search_options.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_FIFO. Defaults to “random” (i.e., random search)

  • search_options (Dict[str, Any], optional) – If searcher is str, these arguments are passed to searcher_factory()

  • metric (str or List[str]) – Name of metric to optimize, key in results obtained via on_trial_result. For multi-objective schedulers, this can also be a list

  • mode (str or List[str], optional) – “min” if metric is minimized, “max” if metric is maximized, defaults to “min”. This can also be a list if metric is a list

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If not given, this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified. Note: If searcher is of type BaseSearcher, points_to_evaluate must be set there.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If this is given, max_t is not needed. We recommend using max_resource_attr over max_t. If given, we use it to infer max_resource_level. It is also used to limit trial executions in promotion-based multi-fidelity schedulers (see HyperbandScheduler, type="promotion").

  • max_t (int, optional) – Value for max_resource_level. Needed for schedulers which make use of intermediate reports via on_trial_result. If this is not given, we try to infer its value from config_space (see ResourceLevelsScheduler), checking config_space["epochs"], config_space["max_t"], and config_space["max_epochs"]. If max_resource_attr is given, we use the value config_space[max_resource_attr]. But if max_t is given here, it takes precedence.

  • time_keeper (TimeKeeper, optional) – This will be used for timing here (see _elapsed_time). The time keeper has to be started at the beginning of the experiment. If not given, we use a local time keeper here, which is started with the first call to _suggest(). Can also be set after construction, with set_time_keeper(). Note: If you use SimulatorBackend, you need to pass its time_keeper here.
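
A minimal usage sketch. The training script train_script.py and the metric name validation_error are placeholders; the script is assumed to report validation_error via syne_tune.Reporter:

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.fifo import FIFOScheduler

config_space = {
    "learning_rate": loguniform(1e-4, 1e-1),
    "num_layers": randint(1, 4),
    "epochs": 10,
}

scheduler = FIFOScheduler(
    config_space,
    searcher="bayesopt",  # or "random" for random search
    metric="validation_error",
    mode="min",
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()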

property searcher: BaseSearcher | None
set_time_keeper(time_keeper)[source]

Assign time keeper after construction.

This is possible only if the time keeper was not assigned at construction, and the experiment has not yet started.

Parameters:

time_keeper (TimeKeeper) – Time keeper to be used

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuinely multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

syne_tune.optimizer.schedulers.hyperband module
syne_tune.optimizer.schedulers.hyperband.is_continue_decision(trial_decision)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.hyperband.TrialInformation(config, time_stamp, bracket, keep_case, trial_decision, reported_result=None, largest_update_resource=None)[source]

Bases: object

The scheduler maintains information about all trials it has been dealing with so far. trial_decision is the current status of the trial. keep_case is relevant only if searcher_data == "rungs_and_last". largest_update_resource is the largest resource level for which the searcher was updated, or None. reported_result contains the most recently reported result, or None (the task was started, but has not reported anything yet); it only contains the attributes for self.metric and self._resource_attr.

config: Dict[str, Any]
time_stamp: float
bracket: int
keep_case: bool
trial_decision: str
reported_result: Optional[dict] = None
largest_update_resource: Optional[int] = None
restart(time_stamp)[source]
class syne_tune.optimizer.schedulers.hyperband.HyperbandScheduler(config_space, **kwargs)[source]

Bases: FIFOScheduler, MultiFidelitySchedulerMixin, RemoveCheckpointsSchedulerMixin

Implements different variants of asynchronous Hyperband

See type for the different variants. One implementation detail: when using multiple brackets, tasks are allocated to brackets at random, based on a distribution which can be configured.

For definitions of concepts (bracket, rung, milestone), see

Li, Jamieson, Rostamizadeh, Gonina, Hardt, Recht, Talwalkar (2018)
A System for Massively Parallel Hyperparameter Tuning

or

Tiao, Klein, Lienart, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter and Neural Architecture Search

Note

This scheduler requires both metric and resource_attr to be returned by the reporter. Here, resource values must be positive int. If resource_attr == "epoch", this should be the number of epochs done, starting from 1 (not the epoch number, starting from 0).

Rung levels and promotion quantiles

Rung levels are values of the resource attribute at which stop/go decisions are made for jobs, comparing their metric against others at the same level. These rung levels (positive, strictly increasing) can be specified via rung_levels, the largest must be <= max_t. If rung_levels is not given, they are specified by grace_period and reduction_factor or rung_increment:

  • If \(r_{min}\) is grace_period, \(\eta\) is reduction_factor, then rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\). This is the default choice for successive halving (Hyperband).

  • If rung_increment is given, but not reduction_factor, then rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), where \(\nu\) is rung_increment.

If rung_levels is given, then grace_period, reduction_factor, rung_increment are ignored. If they are given, a warning is logged.

The rung levels determine the quantiles to be used in the stop/go decisions. If rung levels are \(r_j\), define \(q_j = r_j / r_{j+1}\). \(q_j\) is the promotion quantile at rung level \(r_j\). On average, a fraction of \(q_j\) jobs can continue, the remaining ones are stopped (or paused). In the default successive halving case, we have \(q_j = 1/\eta\) for all \(j\).
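To make this concrete, here is a small standalone sketch (not Syne Tune code) which derives geometric rung levels and the corresponding promotion quantiles from grace_period, reduction_factor, and max_t in the default successive-halving setup:

# Rung levels r_j = round(r_min * eta^j), strictly below max_t;
# promotion quantiles q_j = r_j / r_{j+1}.
grace_period, reduction_factor, max_t = 1, 3, 81

rung_levels = []
j = 0
while True:
    level = int(round(grace_period * reduction_factor ** j))
    if level >= max_t:
        break
    rung_levels.append(level)
    j += 1

promotion_quantiles = [
    r / r_next for r, r_next in zip(rung_levels, rung_levels[1:] + [max_t])
]
print(rung_levels)          # [1, 3, 9, 27]
print(promotion_quantiles)  # [0.333..., 0.333..., 0.333..., 0.333...]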

Cost-aware schedulers or searchers

Some schedulers (e.g., type == "cost_promotion") or searchers may depend on cost values (with key cost_attr) reported alongside the target metric. For promotion-based scheduling, a trial may pause and resume several times. The cost received in on_trial_result only counts the cost since the last resume. We maintain the sum of such costs in _cost_offset(), and append a new entry to result in on_trial_result with the total cost. If the evaluation function does not implement checkpointing, once a trial is resumed, it has to start from scratch. We detect this in on_trial_result and reset the cost offset to 0 (if the trial runs from scratch, the cost reported needs no offset added).

Note

This process requires cost_attr to be set
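As a toy illustration of this bookkeeping (hypothetical helper functions, not the scheduler's actual members), the offset could be maintained per trial as follows:

cost_offset = {}  # trial_id -> cost accumulated before the most recent pause

def on_pause(trial_id: str, cost_since_resume: float) -> None:
    # Accumulate the cost of the segment that just finished
    cost_offset[trial_id] = cost_offset.get(trial_id, 0.0) + cost_since_resume

def total_cost(trial_id: str, reported_cost: float, from_scratch: bool) -> float:
    # If the trial restarted from scratch (no checkpointing), the reported
    # cost already covers everything, so the offset is reset to zero
    if from_scratch:
        cost_offset[trial_id] = 0.0
    return cost_offset.get(trial_id, 0.0) + reported_cost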

Pending evaluations

The searcher is notified, by searcher.register_pending calls, of (trial, resource) pairs for which evaluations are running, and a result is expected in the future. These pending evaluations can be used by the searcher in order to direct sampling elsewhere.

The choice of pending evaluations depends on searcher_data. If equal to “rungs”, pending evaluations sit only at rung levels, because observations are only used there. In the other cases, pending evaluations sit at all resource levels for which observations are obtained. For example, if a trial is at rung level \(r\) and continues towards the next rung level \(r_{next}\), if searcher_data == "rungs", searcher.register_pending is called for \(r_{next}\) only, while for other searcher_data values, pending evaluations are registered for \(r + 1, r + 2, \dots, r_{next}\). However, if in this case, register_pending_myopic is True, we instead call searcher.register_pending for \(r + 1\) when each observation is obtained (not just at a rung level). This leads to fewer pending evaluations at any one time. On the other hand, when a trial is continued at a rung level, we already know it will emit observations up to the next rung level, so it seems more “correct” to register all these pending evaluations in one go.

Additional arguments on top of parent class FIFOScheduler:

Parameters:
  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to “random” (i.e., random search)

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result, defaults to “epoch”

  • grace_period (int, optional) – Minimum resource to be used for a job. Ignored if rung_levels is given. Defaults to 1

  • reduction_factor (float, optional) – Parameter to determine rung levels. Ignored if rung_levels is given. Must be \(\ge 2\), defaults to 3

  • rung_increment (int, optional) – Parameter to determine rung levels. Ignored if rung_levels or reduction_factor are given. Must be positive

  • rung_levels (List[int], optional) – If given, prescribes the set of rung levels to be used. Must contain positive integers, strictly increasing. This information overrides grace_period, reduction_factor, rung_increment. Note that the stop/promote rule in the successive halving scheduler is set based on the ratio of successive rung levels.

  • brackets (int, optional) – Number of brackets to be used in Hyperband. Each bracket has a different grace period, all share max_t and reduction_factor. If brackets == 1 (default), we run asynchronous successive halving.

  • type (str, optional) –

    Type of Hyperband scheduler. Defaults to “stopping”. Supported values (see also subclasses of RungSystem):

    • stopping: A config eval is executed by a single task. The task is stopped at a milestone if its metric is worse than a fraction of those who reached the milestone earlier, otherwise it continues. See StoppingRungSystem.

    • promotion: A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than a fraction of others who reached the milestone. If no config can be promoted, a new one is chosen. See PromotionRungSystem.

    • cost_promotion: This is a cost-aware variant of ‘promotion’, see CostPromotionRungSystem for details. In this case, costs must be reported under the name rung_system_kwargs["cost_attr"] in results.

    • pasha: Similar to promotion type Hyperband, but it progressively expands the available resources until the ranking of configurations stabilizes.

    • rush_stopping: A variation of the stopping scheduler which requires passing rung_system_kwargs and points_to_evaluate. The first rung_system_kwargs["num_threshold_candidates"] of points_to_evaluate will enforce stricter rules on which task is continued. See RUSHStoppingRungSystem and RUSHScheduler.

    • rush_promotion: Same as rush_stopping but for promotion, see RUSHPromotionRungSystem

    • dyhpo: A model-based scheduler, which can be seen as an extension of “promotion” with rung_increment rather than reduction_factor, see DynamicHPOSearcher

  • cost_attr (str, optional) – Required if the scheduler itself uses a cost metric (i.e., type="cost_promotion"), or if the searcher uses a cost metric. See also header comment.

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    • ”rungs_and_last”: Results at rung levels, plus the most recent result. This means that in between rung levels, only the most recent result is used by the searcher. This is in between

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to ‘all’.

  • register_pending_myopic (bool, optional) – See above. Used only if searcher_data != "rungs". Defaults to False

  • rung_system_per_bracket (bool, optional) – This concerns Hyperband with brackets > 1. Defaults to False. When starting a job for a new config, it is assigned a randomly sampled bracket. The larger the bracket, the larger the grace period for the config. If rung_system_per_bracket == True, we maintain separate rung level systems for each bracket, so that configs only compete with others started in the same bracket. If rung_system_per_bracket == False, we use a single rung level system, so that all configs compete with each other. In this case, the bracket of a config only determines the initial grace period, i.e. the first milestone at which it starts competing with others. This is the default. The concept of brackets in Hyperband is meant to hedge against overly aggressive filtering in successive halving, based on low fidelity criteria. In practice, successive halving (i.e., brackets = 1) often works best in the asynchronous case (as implemented here). If brackets > 1, the hedging is stronger if rung_system_per_bracket is True.

  • do_snapshots (bool, optional) – Support snapshots? If True, a snapshot of all running tasks and rung levels is returned by _promote_trial(). This snapshot is passed to searcher.get_config. Defaults to False. Note: Currently, only the stopping variant supports snapshots.

  • rung_system_kwargs (Dict[str, Any], optional) –

    Arguments passed to the rung system:

    • num_threshold_candidates: Used if type in ["rush_promotion", "rush_stopping"]. The first num_threshold_candidates in points_to_evaluate enforce stricter requirements on the continuation of training tasks. See RUSHScheduler.

    • probability_sh: Used if type == "dyhpo". In DyHPO, we typically score all paused trials against a number of new configurations, and the winner is either resumed or started (as a new trial). However, with the probability given here, we instead try to promote a trial as if type == "promotion". If no trial can be promoted, we fall back to the DyHPO logic. Use this to make DyHPO robust against starting too many new trials because all paused ones score poorly (this happens especially at the beginning).

  • early_checkpoint_removal_kwargs (Dict[str, Any], optional) – If given, speculative early removal of checkpoints is done, see HyperbandRemoveCheckpointsCallback. The constructor arguments for the HyperbandRemoveCheckpointsCallback must be given here, if they cannot be inferred (key max_num_checkpoints is mandatory). This feature is used only for scheduler types which pause and resume trials.
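For orientation, here is a minimal construction sketch. Metric and config names are illustrative assumptions; the training script is assumed to report "val_acc" per "epoch" and to read "epochs" from its config:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.schedulers.hyperband import HyperbandScheduler

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 8),
    "epochs": 81,                   # read via max_resource_attr below
}
scheduler = HyperbandScheduler(
    config_space,
    searcher="random",              # e.g. "bayesopt" gives MOBSTER-style decisions
    type="promotion",               # pause-and-resume (ASHA promotion) variant
    metric="val_acc",
    mode="max",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,
    reduction_factor=3,
)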

does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

property rung_levels: List[int]

Note that all entries of rung_levels are smaller than max_t (or config_space[max_resource_attr]): rung levels are resource levels where stop/go decisions are made. In particular, if rung_levels is passed at construction with rung_levels[-1] == max_t, this last entry is stripped off.

Returns:

Rung levels (strictly increasing, positive ints)

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

property resource_attr: str
Returns:

Name of resource attribute in reported results

property max_resource_level: int
Returns:

Maximum resource level

property searcher_data: str
Returns:

Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:

  • ”rungs”: Only results at rung levels. Cheapest

  • ”all”: All results. Most expensive

  • ”rungs_and_last”: Results at rung levels plus the most recent one. Not available for all multi-fidelity schedulers

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

callback_for_checkpoint_removal(stop_criterion)[source]
Parameters:

stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner

Return type:

Optional[TunerCallback]

Returns:

CP removal callback, or None if CP removal is not activated

class syne_tune.optimizer.schedulers.hyperband.HyperbandBracketManager(scheduler_type, resource_attr, metric, mode, max_t, rung_levels, brackets, rung_system_per_bracket, cost_attr, random_seed, rung_system_kwargs, scheduler)[source]

Bases: object

Maintains rung level systems for a range of brackets. Differences depending on scheduler_type manifest themselves mostly at the level of the rung level system itself.

Parameters:
static does_pause_resume(scheduler_type)[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

on_task_add(trial_id, **kwargs)[source]

Called when new task is started (can be new trial or trial being resumed).

Since the bracket has already been sampled, not much is done here. We return the list of milestones for this bracket in reverse (decreasing) order. The first entry is max_t, even if it is not a rung level in the bracket. This list contains the resource levels the task would reach if it ran to max_t without being stopped.

Parameters:
  • trial_id (str) – ID of trial

  • kwargs – Further arguments passed to rung_sys.on_task_add

Return type:

List[int]

Returns:

List of milestones in decreasing order, where max_t is first

on_task_report(trial_id, result)[source]

This method is called whenever a new report is received. It returns a dictionary with all the information needed for making decisions (e.g., stop / continue task, update model, etc). Keys are:

  • task_continues: Should task continue or stop/pause?

  • milestone_reached: True if rung level (or max_t) is hit

  • next_milestone: If hit rung level < max_t, this is the subsequent rung level (otherwise: None)

  • bracket_id: Bracket in which the task is running

Parameters:
  • trial_id (str) – ID of trial

  • result (Dict[str, Any]) – Results reported

Return type:

Dict[str, Any]

Returns:

See above

on_task_remove(trial_id)[source]

Called when trial is stopped or completes

Parameters:

trial_id – ID of trial

on_task_schedule(new_trial_id)[source]

Samples a bracket for the task to be scheduled, then checks whether any paused trial in that bracket can be promoted. If so, its trial_id is returned. We also return extra_kwargs to be used in _promote_trial. This contains the bracket which was sampled (key “bracket”).

Note: extra_kwargs can contain information even if trial_id = None is returned. This information is passed to get_config of the searcher.

Parameters:

new_trial_id (str) – ID for new trial as passed to _suggest()

Return type:

(Optional[str], dict)

Returns:

(trial_id, extra_kwargs)

snapshot_rungs(bracket_id)[source]
paused_trials(resource=None)[source]

Only for pause and resume schedulers (does_pause_resume() returns True), where trials can be paused at certain rung levels only. If resource is not given, returns list of all paused trials (trial_id, rank, metric_val, level), where level is the rung level, and rank is the rank of the trial in the rung (0 for the best metric value). If resource is given, only the paused trials in the rung of this level are returned.

Parameters:

resource (Optional[int]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned

Return type:

List[Tuple[str, int, float, int]]

Returns:

See above

information_for_rungs()[source]
Return type:

List[Tuple[int, int, float]]

Returns:

List of (resource, num_entries, prom_quant), where resource is a rung level, num_entries the number of entries in the rung, and prom_quant the promotion quantile

support_early_checkpoint_removal()[source]
Return type:

bool

syne_tune.optimizer.schedulers.hyperband_checkpoint_removal module
syne_tune.optimizer.schedulers.hyperband_checkpoint_removal.create_callback_for_checkpoint_removal(callback_kwargs, stop_criterion)[source]
Return type:

Optional[TunerCallback]

syne_tune.optimizer.schedulers.hyperband_cost_promotion module
class syne_tune.optimizer.schedulers.hyperband_cost_promotion.CostPromotionRungEntry(trial_id, metric_val, cost_val, was_promoted=False)[source]

Bases: PromotionRungEntry

Appends cost_val to the superclass. This is the cost value \(c(x, r)\) recorded for the trial at the resource level.

class syne_tune.optimizer.schedulers.hyperband_cost_promotion.CostPromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, cost_attr, max_t)[source]

Bases: PromotionRungSystem

Cost-aware extension of promotion-based asynchronous Hyperband (ASHA).

This code is equivalent to the base PromotionRungSystem, except that the “promotable” condition in _find_promotable_trial() is replaced.

When a config \(\mathbf{x}\) reaches rung level \(r\), the result includes a metric \(f(\mathbf{x}, r)\), but also a cost \(c(\mathbf{x}, r)\). The latter is the cost (e.g., training time) spent to reach level \(r\).

Consider all trials that reached rung level \(r\) (whether promoted from there or still paused there), ordered w.r.t. \(f(\mathbf{x}, r)\), best first, and let their number be \(N\). Define

\[C(r, k) = \sum_{i\le k} c(\mathbf{x}_i, r)\]

For a promotion quantile \(q\), define

\[K = \max\{ k : C(r, k) \le q\, C(r, N) \}\]

Any trial not yet promoted and ranked \(\le K\) can be promoted. As usual, we scan rungs from the top. If several trials are promotable, the one with the best metric value is promoted.

Note that costs \(c(\mathbf{x}, r)\) reported via cost_attr need to be total costs of a trial. If the trial is paused and resumed, partial costs have to be added up. See HyperbandScheduler for how this works.
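A toy numeric illustration of this rule (standalone sketch, not the rung system's code):

# Trials at a rung, already sorted by metric value (best first), with the
# total cost c(x_i, r) each one spent to reach the rung.
costs = [2.0, 3.0, 1.0, 4.0]
q = 1.0 / 3.0                      # promotion quantile at this rung

budget = q * sum(costs)            # q * C(r, N)
cum, K = 0.0, 0
for k, c in enumerate(costs, start=1):
    cum += c                       # C(r, k)
    if cum <= budget:
        K = k                      # largest k with C(r, k) <= q * C(r, N)
print(K)  # 1: only the top-ranked trial is promotable under this budget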

syne_tune.optimizer.schedulers.hyperband_pasha module
class syne_tune.optimizer.schedulers.hyperband_pasha.PASHARungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]

Bases: PromotionRungSystem

Implements the PASHA algorithm. PASHA is a more efficient version of ASHA, able to dynamically allocate the maximum resource for the tuning procedure depending on need. Experimental evaluation has shown that PASHA consumes significantly fewer computational resources than ASHA.

For more details, see the paper:
Bohdal, Balles, Wistuba, Ermis, Archambeau, Zappella (2023)
PASHA: Efficient HPO and NAS with Progressive Resource Allocation
on_task_report(trial_id, result, skip_rungs)[source]

Apart from calling the superclass method, we also check the rankings and decide whether to increase the current maximum resource.

Return type:

Dict[str, Any]

syne_tune.optimizer.schedulers.hyperband_promotion module
class syne_tune.optimizer.schedulers.hyperband_promotion.PromotionRungEntry(trial_id, metric_val, was_promoted=False)[source]

Bases: RungEntry

Appends was_promoted to the superclass. This is True iff the trial has been promoted from this rung. Otherwise, the trial is paused at this rung.

class syne_tune.optimizer.schedulers.hyperband_promotion.PromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]

Bases: RungSystem

Implements the promotion logic for an asynchronous variant of Hyperband, known as ASHA:

Li et al.
A System for Massively Parallel Hyperparameter Tuning

In ASHA, configs sit paused at milestones (rung levels) until they get promoted, which means that a free task picks up their evaluation until the next milestone.

The rule to decide whether a paused trial is promoted (or remains paused) is the same as in StoppingRungSystem, except that continues becomes gets promoted. If several paused trials in a rung can be promoted, the one with the best metric value is chosen.

Note: Say that an evaluation is resumed from level resume_from. If the trial evaluation function does not implement pause & resume, it needs to start training from scratch, in which case metrics are reported for every epoch, also those < resume_from. At least for some modes of fitting the searcher model to data, this would lead to duplicate target values for the same extended config \((x, r)\), which we want to avoid. The solution is to maintain resume_from in the data for the terminator (see _running). Given this, we can report in on_task_report() that the current metric data should not be used for the searcher model (ignore_data = True), namely as long as the evaluation has not yet gone beyond level resume_from.
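A minimal sketch of the ignore_data rule described in this note (hypothetical standalone function, not the rung system's method):

from typing import Optional

def ignore_data(resource: int, resume_from: Optional[int]) -> bool:
    # Results from a resumed trial that restarted from scratch duplicate
    # earlier observations up to (and including) resume_from
    return resume_from is not None and resource <= resume_from

print(ignore_data(3, resume_from=5))  # True: below the old resume point
print(ignore_data(7, resume_from=5))  # False: genuinely new observation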

on_task_schedule(new_trial_id)[source]

Used to implement _promote_trial(). Searches through rungs to find a trial which can be promoted. If one is found, we return the trial_id and other info (current milestone, milestone to be promoted to). We also mark the trial as being promoted at the rung level it sits right now.

Return type:

Dict[str, Any]

on_task_add(trial_id, skip_rungs, **kwargs)[source]

Called when new task is started. Depending on kwargs["new_config"], this could start an evaluation (True) or promote an existing config to the next milestone (False). In the latter case, kwargs contains additional information about the promotion (in “milestone”, “resume_from”).

Parameters:
  • trial_id (str) – ID of trial to be started

  • skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

  • kwargs – Additional arguments

on_task_report(trial_id, result, skip_rungs)[source]

Decision on whether task may continue (task_continues=True), or should be paused (task_continues=False). milestone_reached is a flag whether resource coincides with a milestone. For this scheduler, we have that

task_continues == not milestone_reached,

since a trial is always paused at a milestone.

ignore_data is True if a result is received from a resumed trial at a level <= resume_from. This happens if checkpointing is not implemented (or not used), because resumed trials are started from scratch then. These metric values should in general be ignored.

Parameters:
  • trial_id (str) – ID of trial which reported results

  • result (Dict[str, Any]) – Reported metrics

  • skip_rungs (int) – This number of smallest rung levels are not considered milestones for this task

Return type:

Dict[str, Any]

Returns:

dict(task_continues, milestone_reached, next_milestone, ignore_data)

on_task_remove(trial_id)[source]

Called when task is removed.

Parameters:

trial_id (str) – ID of trial which is to be removed

static does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

support_early_checkpoint_removal()[source]
Return type:

bool

Returns:

Do we support early checkpoint removal via paused_trials()?

paused_trials(resource=None)[source]

Only for pause and resume schedulers (does_pause_resume() returns True), where trials can be paused at certain rung levels only. If resource is not given, returns list of all paused trials (trial_id, rank, metric_val, level), where level is the rung level, and rank is the rank of the trial in the rung (0 for the best metric value). If resource is given, only the paused trials in the rung of this level are returned. If resource is not a rung level, the returned list is empty.

Parameters:

resource (Optional[int]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned

Return type:

List[Tuple[str, int, float, int]]

Returns:

See above

syne_tune.optimizer.schedulers.hyperband_rush module
class syne_tune.optimizer.schedulers.hyperband_rush.RUSHDecider(num_threshold_candidates, mode)[source]

Bases: object

Implements the additional decision logic according to the RUSH algorithm. It is used as part of RUSHStoppingRungSystem and RUSHPromotionRungSystem. Reference:

A resource-efficient method for repeated HPO and NAS.
Giovanni Zappella, David Salinas, Cédric Archambeau.
AutoML workshop @ ICML 2021.

For a more detailed description, refer to RUSHScheduler.

Parameters:
  • num_threshold_candidates (int) – Number of threshold candidates

  • mode (str) – “min” or “max”

task_continues(task_continues, trial_id, metric_val, resource)[source]
Return type:

bool

class syne_tune.optimizer.schedulers.hyperband_rush.RUSHStoppingRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, num_threshold_candidates)[source]

Bases: StoppingRungSystem

Implementation for RUSH algorithm, stopping variant.

Additional arguments on top of base class StoppingRungSystem:

Parameters:

num_threshold_candidates (int) – Number of threshold candidates

class syne_tune.optimizer.schedulers.hyperband_rush.RUSHPromotionRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t, num_threshold_candidates)[source]

Bases: PromotionRungSystem

Implementation for RUSH algorithm, promotion variant.

Additional arguments on top of base class PromotionRungSystem:

Parameters:

num_threshold_candidates (int) – Number of threshold candidates

syne_tune.optimizer.schedulers.hyperband_stopping module
class syne_tune.optimizer.schedulers.hyperband_stopping.RungEntry(trial_id, metric_val)[source]

Bases: object

Represents entry in a rung. This class is extended by rung level systems which need to maintain more information per entry.

Parameters:
  • trial_id (str) – ID of trial

  • metric_val (float) – Metric value

class syne_tune.optimizer.schedulers.hyperband_stopping.Rung(level, prom_quant, mode, data=None)[source]

Bases: object

Parameters:
  • level (int) – Rung level \(r_j\)

  • prom_quant (float) – promotion quantile \(q_j\)

  • data (Optional[List[RungEntry]]) – Data of all previous jobs reaching the level. This list is kept sorted w.r.t. metric_val, so that best values come first

add(entry)[source]
pop(pos)[source]
Return type:

RungEntry

quantile()[source]

Returns the same value as numpy.quantile(metric_vals, q), where metric_vals are the metric values in data, and q = prom_quant if mode == "min", q = 1 - prom_quant otherwise. If len(data) < 2, we return None.

Note that the default for numpy.quantile is method="linear".

Return type:

Optional[float]

Returns:

See above
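A standalone sketch of this computation (hypothetical function; the library method operates on the rung's own data):

import numpy as np

def rung_quantile(metric_vals, prom_quant, mode="min"):
    if len(metric_vals) < 2:
        return None                       # not enough data for a threshold
    q = prom_quant if mode == "min" else 1.0 - prom_quant
    return float(np.quantile(metric_vals, q))  # default method="linear"

print(rung_quantile([0.31, 0.25, 0.40, 0.28], prom_quant=1 / 3))  # ~0.28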

class syne_tune.optimizer.schedulers.hyperband_stopping.RungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]

Bases: object

Terminology: Trials emit results at certain resource levels (e.g., epoch numbers). Some resource levels are rung levels, this is where scheduling decisions (stop, continue or pause, resume) are taken. For a running trial, the next rung level (or max_t) it will reach is called its next milestone.

Note that rung_levels, promote_quantiles can be empty. All entries of rung_levels are smaller than max_t.

Parameters:
  • rung_levels (List[int]) – List of rung levels (positive int, increasing)

  • promote_quantiles (List[float]) – List of promotion quantiles at each rung level

  • metric (str) – Name of metric to optimize

  • mode (str) – “min” or “max”

  • resource_attr (str) – Name of resource attribute

  • max_t (int) – Largest resource level

on_task_schedule(new_trial_id)[source]

Called when new task is to be scheduled.

For a promotion-based rung system, check whether any trial can be promoted. If so, return dict with keys “trial_id”, “resume_from” (rung level where trial is paused), “milestone” (next rung level the trial will reach, or None).

If no trial can be promoted, or if the rung system is not promotion-based, the returned dictionary must not contain the “trial_id” key. It is nevertheless passed back via extra_kwargs in on_task_schedule(). The default is to return an empty dictionary, but some special subclasses can use this to return information in case a trial is not promoted.

Parameters:

new_trial_id (str) – ID for new trial as passed to _suggest(). Only needed by specific subclasses

Return type:

Dict[str, Any]

Returns:

See above

on_task_add(trial_id, skip_rungs, **kwargs)[source]

Called when new task is started.

Parameters:
  • trial_id (str) – ID of trial to be started

  • skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

  • kwargs – Additional arguments

on_task_report(trial_id, result, skip_rungs)[source]

Called when a trial reports metric results.

Returns dict with keys “milestone_reached” (trial reaches its milestone), “task_continues” (trial should continue; otherwise it is stopped or paused), “next_milestone” (next milestone it will reach, or None). For certain subclasses, there may be additional entries.

Parameters:
  • trial_id (str) – ID of trial which reported results

  • result (Dict[str, Any]) – Reported metrics

  • skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

Return type:

Dict[str, Any]

Returns:

See above

on_task_remove(trial_id)[source]

Called when task is removed.

Parameters:

trial_id (str) – ID of trial which is to be removed

get_first_milestone(skip_rungs)[source]
Parameters:

skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

Return type:

int

Returns:

First milestone to be considered

get_milestones(skip_rungs)[source]
Parameters:

skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

Return type:

List[int]

Returns:

All milestones to be considered, in decreasing order; does not include max_t

snapshot_rungs(skip_rungs)[source]

A snapshot is a list of rung levels with entries (level, data), ordered from top to bottom (largest rung first).

Parameters:

skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

Return type:

List[Tuple[int, List[RungEntry]]]

Returns:

Snapshot (see above)

static does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

support_early_checkpoint_removal()[source]
Return type:

bool

Returns:

Do we support early checkpoint removal via paused_trials()?

paused_trials(resource=None)[source]

Only for pause and resume schedulers (does_pause_resume() returns True), where trials can be paused at certain rung levels only. If resource is not given, returns list of all paused trials (trial_id, rank, metric_val, level), where level is the rung level, and rank is the rank of the trial in the rung (0 for the best metric value). If resource is given, only the paused trials in the rung of this level are returned. If resource is not a rung level, the returned list is empty.

Parameters:

resource (Optional[int]) – If given, paused trials of only this rung level are returned. Otherwise, all paused trials are returned

Return type:

List[Tuple[str, int, float, int]]

Returns:

See above

information_for_rungs()[source]
Return type:

List[Tuple[int, int, float]]

Returns:

List of (resource, num_entries, prom_quant), where resource is a rung level, num_entries the number of entries in the rung, and prom_quant the promotion quantile

class syne_tune.optimizer.schedulers.hyperband_stopping.StoppingRungSystem(rung_levels, promote_quantiles, metric, mode, resource_attr, max_t)[source]

Bases: RungSystem

The decision on whether a trial \(\mathbf{x}\) continues or is stopped at a rung level \(r\), is taken in on_task_report(). To this end, the metric value \(f(\mathbf{x}, r)\) is inserted into \(r.data\). Then:

\[\mathrm{continues}(\mathbf{x}, r)\; \Leftrightarrow\; f(\mathbf{x}, r) \le \mathrm{np.quantile}(r.data, r.prom\_quant)\]

in case mode == "min". See also _task_continues().
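A toy version of this decision (standalone illustration for mode == "min"; the behavior when fewer than two results are recorded at the rung is an assumption):

import numpy as np

def continues(metric_val, rung_metric_vals, prom_quant):
    if len(rung_metric_vals) < 2:
        return True                       # assumed: no threshold yet, keep going
    return metric_val <= np.quantile(rung_metric_vals, prom_quant)

print(continues(0.26, [0.31, 0.25, 0.40, 0.28], prom_quant=1 / 3))  # True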

on_task_schedule(new_trial_id)[source]

Called when new task is to be scheduled.

For a promotion-based rung system, check whether any trial can be promoted. If so, return dict with keys “trial_id”, “resume_from” (rung level where trial is paused), “milestone” (next rung level the trial will reach, or None).

If no trial can be promoted, or if the rung system is not promotion-based, the returned dictionary must not contain the “trial_id” key. It is nevertheless passed back via extra_kwargs in on_task_schedule(). The default is to return an empty dictionary, but some special subclasses can use this to return information in case a trial is not promoted.

Parameters:

new_trial_id (str) – ID for new trial as passed to _suggest(). Only needed by specific subclasses

Return type:

Dict[str, Any]

Returns:

See above

on_task_report(trial_id, result, skip_rungs)[source]

Called when a trial reports metric results.

Returns dict with keys “milestone_reached” (trial reaches its milestone), “task_continues” (trial should continue; otherwise it is stopped or paused), “next_milestone” (next milestone it will reach, or None). For certain subclasses, there may be additional entries.

Parameters:
  • trial_id (str) – ID of trial which reported results

  • result (Dict[str, Any]) – Reported metrics

  • skip_rungs (int) – This number of the smallest rung levels are not considered milestones for this task

Return type:

Dict[str, Any]

Returns:

See above

static does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

syne_tune.optimizer.schedulers.median_stopping_rule module
class syne_tune.optimizer.schedulers.median_stopping_rule.MedianStoppingRule(scheduler, resource_attr, running_average=True, metric=None, grace_time=1, grace_population=5, rank_cutoff=0.5)[source]

Bases: TrialScheduler

Applies the median stopping rule on top of an existing scheduler.

  • If the result at a time step ranks below the cutoff of other results observed at this time step, the trial is interrupted; otherwise, the wrapped scheduler is called to make the stopping decision.

  • Suggest decisions are left to the wrapped scheduler.

  • The mode of the wrapped scheduler is used.

Reference:

Google Vizier: A Service for Black-Box Optimization.
Golovin et al. 2017.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, August 2017
Pages 1487–1495
Parameters:
  • scheduler (TrialScheduler) – Scheduler to be called for trial suggestion or when median-stopping-rule decision is to continue.

  • resource_attr (str) – Key in the reported dictionary that accounts for the resource (e.g. epoch).

  • running_average (bool) – If True, then uses the running average of observation instead of raw observations. Defaults to True

  • metric (Optional[str]) – Metric to be considered, defaults to scheduler.metric

  • grace_time (Optional[int]) – Median stopping rule is only applied for results whose resource_attr exceeds this amount. Defaults to 1

  • grace_population (int) – The median stopping rule is only applied once at least grace_population results have been observed at a resource level. Defaults to 5

  • rank_cutoff (float) – Results whose quantiles are below this level are discarded. Defaults to 0.5 (median)
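A hedged usage sketch (config space, metric, and resource names below are illustrative assumptions):

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import RandomSearch
from syne_tune.optimizer.schedulers.median_stopping_rule import MedianStoppingRule

config_space = {"lr": loguniform(1e-4, 1e-1), "epochs": 10}
base_scheduler = RandomSearch(config_space, metric="val_loss", mode="min")
scheduler = MedianStoppingRule(
    scheduler=base_scheduler,   # suggest decisions are delegated to this scheduler
    resource_attr="epoch",      # key reported by the training script
    grace_time=1,               # apply the rule only after the first epoch
)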

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

grace_condition(time_step)[source]
Parameters:

time_step (float) – Value result[self.resource_attr]

Return type:

bool

Returns:

Decide for continue?

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

syne_tune.optimizer.schedulers.multi_fidelity module
class syne_tune.optimizer.schedulers.multi_fidelity.MultiFidelitySchedulerMixin[source]

Bases: object

Declares properties which are required for multi-fidelity schedulers.

property resource_attr: str
Returns:

Name of resource attribute in reported results

property max_resource_level: int
Returns:

Maximum resource level

property rung_levels: List[int]
Returns:

Rung levels (positive int; increasing), may or may not include max_resource_level

property searcher_data: str
Returns:

Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:

  • ”rungs”: Only results at rung levels. Cheapest

  • ”all”: All results. Most expensive

  • ”rungs_and_last”: Results at rung levels plus the most recent one. Not available for all multi-fidelity schedulers

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

syne_tune.optimizer.schedulers.pbt module
class syne_tune.optimizer.schedulers.pbt.PBTTrialState(trial, last_score=None, last_checkpoint=None, last_perturbation_time=0, stopped=False)[source]

Bases: object

Internal PBT state tracked per-trial.

trial: Trial
last_score: float = None
last_checkpoint: int = None
last_perturbation_time: int = 0
stopped: bool = False
class syne_tune.optimizer.schedulers.pbt.PopulationBasedTraining(config_space, custom_explore_fn=None, **kwargs)[source]

Bases: FIFOScheduler

Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:

https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html

PBT was originally presented in the following paper:

Jaderberg et. al.
Population Based Training of Neural Networks

Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker then continues with the latest checkpoint of this new model, but mutates the hyperparameters.

The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the value (probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly.
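A toy sketch of this mutation rule (standalone illustration, not the library's _explore implementation):

import random

def mutate_value(value, sample_fn, resample_probability=0.25):
    # With some probability, draw a fresh value from the original distribution
    if random.random() < resample_probability:
        return sample_fn()
    # Otherwise perturb: increment (x1.2) or decrement (x0.8) with equal chance
    factor = 1.2 if random.random() < 0.5 else 0.8
    return value * factor

# Example: mutate a learning rate whose domain is log-uniform on [1e-4, 1e-1]
print(mutate_value(0.01, sample_fn=lambda: 10 ** random.uniform(-4, -1)))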

Note: While this is implemented as a child of FIFOScheduler, we require searcher="random" (default), since the current code only supports a random searcher.

Additional arguments on top of parent class FIFOScheduler.

Parameters:
  • resource_attr (str) – Name of resource attribute in results obtained via on_trial_result, defaults to “time_total_s”

  • population_size (int, optional) – Size of the population, defaults to 4

  • perturbation_interval (float, optional) – Models will be considered for perturbation at this interval of resource_attr. Note that perturbation incurs checkpoint overhead, so you shouldn’t set this to be too frequent. Defaults to 60

  • quantile_fraction (float, optional) – Parameters are transferred from the top quantile_fraction fraction of trials to the bottom quantile_fraction fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25

  • resample_probability (float, optional) – The probability of resampling from the original distribution when applying _explore(). If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25

  • custom_explore_fn (function, optional) – Custom exploration function. This function is invoked as f(config) instead of the built-in perturbations, and should return config updated as needed. If this is given, resample_probability is not used
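A minimal construction sketch (metric name, intervals, and the max_t budget are assumptions; the training script must support checkpointing and report the configured resource_attr):

from syne_tune.config_space import loguniform, uniform
from syne_tune.optimizer.schedulers.pbt import PopulationBasedTraining

config_space = {
    "lr": loguniform(1e-4, 1e-1),
    "momentum": uniform(0.8, 0.99),
}
scheduler = PopulationBasedTraining(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="time_total_s",
    population_size=4,
    perturbation_interval=60,
    max_t=600,                      # assumed wall-clock budget per trial (seconds)
)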

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

syne_tune.optimizer.schedulers.random_seeds module
syne_tune.optimizer.schedulers.random_seeds.generate_random_seed(random_state=numpy.random)[source]
Return type:

int

class syne_tune.optimizer.schedulers.random_seeds.RandomSeedGenerator(master_seed)[source]

Bases: object

syne_tune.optimizer.schedulers.ray_scheduler module
class syne_tune.optimizer.schedulers.ray_scheduler.RayTuneScheduler(config_space, ray_scheduler=None, ray_searcher=None, points_to_evaluate=None)[source]

Bases: TrialScheduler

Allow using Ray scheduler and searcher. Any searcher/scheduler should work, except those which need access to TrialRunner (e.g., PBT); this feature is not implemented in Syne Tune.

If ray_searcher is not given (defaults to random searcher), initial configurations to evaluate can be passed in points_to_evaluate. If ray_searcher is given, this argument is ignored (needs to be passed to ray_searcher at construction). Note: Use impute_points_to_evaluate() in order to preprocess points_to_evaluate specified by the user or the benchmark.

Parameters:
  • config_space (Dict) – Configuration space

  • ray_scheduler – Ray scheduler, defaults to FIFO scheduler

  • ray_searcher (Optional[Searcher]) – Ray searcher, defaults to random search

  • points_to_evaluate (Optional[List[Dict]]) – See above

RT_FIFOScheduler

alias of FIFOScheduler

RT_Searcher

alias of Searcher

class RandomSearch(config_space, points_to_evaluate, mode)[source]

Bases: Searcher

suggest(trial_id)[source]

Queries the algorithm to retrieve the next set of parameters.

Return type:

Optional[Dict]

Arguments:

trial_id: Trial ID used for subsequent notifications.

Returns:
dict | FINISHED | None: Configuration for a trial, if possible.

If FINISHED is returned, Tune will be notified that no more suggestions/configurations will be provided. If None is returned, Tune will skip the querying of the searcher for this step.

on_trial_complete(trial_id, result=None, error=False)[source]

Notification for the completion of trial.

Typically, this method is used for notifying the underlying optimizer of the result.

Args:

  • trial_id: A unique string ID for the trial.

  • result: Dictionary of metrics for current training progress. Note that the result dict may include NaNs or may not include the optimization metric. It is up to the subclass implementation to preprocess the result to avoid breaking the optimization process. Upon errors, this may also be None.

  • error: True if the training process raised an error.

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

static convert_config_space(config_space)[source]

Converts config_space from our type to the one of Ray Tune.

Note: randint(lower, upper) in Ray Tune has exclusive upper, while this is inclusive for us. On the other hand, lograndint(lower, upper) has inclusive upper in Ray Tune as well.

Parameters:

config_space – Configuration space

Returns:

config_space converted into Ray Tune type
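A small illustration of the randint boundary difference noted above (the conversion shown is an assumption about how bounds would be shifted, not the library's exact code):

# Syne Tune's randint(lower, upper) includes upper; Ray Tune's excludes it.
def ray_randint_bounds(lower: int, upper: int) -> tuple:
    return lower, upper + 1   # bounds to pass to ray.tune.randint

print(ray_randint_bounds(1, 10))  # (1, 11): both describe the set {1, ..., 10}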

syne_tune.optimizer.schedulers.remove_checkpoints module
class syne_tune.optimizer.schedulers.remove_checkpoints.RemoveCheckpointsSchedulerMixin[source]

Bases: object

Methods to be implemented by pause-and-resume schedulers (in that on_trial_result() can return SchedulerDecision.PAUSE) which support early removal of checkpoints. Typically, model checkpoints are retained for paused trials, because they may get resumed later on. This can lead to the disk filling up, so removing checkpoints which are no longer needed can be important.

Early checkpoint removal is implemented as a callback used with Tuner, which is created by callback_for_checkpoint_removal() here.

callback_for_checkpoint_removal(stop_criterion)[source]
Parameters:

stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner

Return type:

Optional[TunerCallback]

Returns:

CP removal callback, or None if CP removal is not activated

syne_tune.optimizer.schedulers.scheduler_searcher module
class syne_tune.optimizer.schedulers.scheduler_searcher.TrialSchedulerWithSearcher(config_space, **kwargs)[source]

Bases: TrialScheduler

Base class for trial schedulers which have a BaseSearcher member searcher. This searcher has a method configure_scheduler() which has to be called before the searcher is first used.

We also collect common code here:

  • Determine max_resource_level if not explicitly given

  • Master seed, random_seed_generator

property searcher: BaseSearcher | None
suggest(trial_id)[source]

Returns a suggestion for a new trial, or one to be resumed

This method returns a suggestion of type TrialSuggestion (unless there is no config left to explore, in which case None is returned).

If suggestion.spawn_new_trial_id is True, a new trial is to be started with config suggestion.config. Typically, this new trial is started from scratch. But if suggestion.checkpoint_trial_id is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has ID trial_id.

If suggestion.spawn_new_trial_id is False, an existing and currently paused trial is to be resumed, whose ID is suggestion.checkpoint_trial_id. If this trial has a checkpoint, we start from there. In this case, suggestion.config is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten by suggestion.config (see HyperbandScheduler with type="promotion" for an example why this can be useful).

Apart from the HP config, additional fields can be appended to the dict; these are passed to the trial function as well.

Parameters:

trial_id (int) – ID for new trial to be started (ignored if existing trial to be resumed)

Return type:

Optional[TrialSuggestion]

Returns:

Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

syne_tune.optimizer.schedulers.smac_scheduler module
Submodules
syne_tune.optimizer.baselines module
class syne_tune.optimizer.baselines.RandomSearch(config_space, metric, **kwargs)[source]

Bases: FIFOScheduler

Random search.

See RandomSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.GridSearch(config_space, metric, **kwargs)[source]

Bases: FIFOScheduler

Grid search.

See GridSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.BayesianOptimization(config_space, metric, **kwargs)[source]

Bases: FIFOScheduler

Gaussian process based Bayesian optimization.

See GPFIFOSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.ASHA(config_space, metric, resource_attr, **kwargs)[source]

Bases: HyperbandScheduler

Asynchronous Successive Halving (ASHA).

One of max_t, max_resource_attr needs to be in kwargs. For type="promotion", the latter is more useful.

See also HyperbandScheduler for kwargs parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to HyperbandScheduler
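A hedged end-to-end sketch, assuming a training script train_script.py that reports "val_loss" per "epoch" and reads "epochs" from its config (all of these names are illustrative):

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import ASHA

config_space = {
    "lr": loguniform(1e-4, 1e-1),
    "batch_size": randint(16, 128),
    "epochs": 27,
}
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=ASHA(
        config_space,
        metric="val_loss",
        mode="min",
        resource_attr="epoch",
        max_resource_attr="epochs",
        type="promotion",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()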

class syne_tune.optimizer.baselines.MOBSTER(config_space, metric, resource_attr, **kwargs)[source]

Bases: HyperbandScheduler

Model-based Asynchronous Multi-fidelity Optimizer (MOBSTER).

One of max_t, max_resource_attr needs to be in kwargs. For type="promotion", the latter is more useful, see also HyperbandScheduler.

MOBSTER can be run with different surrogate models. The model is selected by search_options["model"] in kwargs. The default is "gp_multitask" (jointly dependent multi-task GP model), another useful choice is "gp_independent" (independent GP models at each rung level, with shared ARD kernel).

See also:

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to HyperbandScheduler

class syne_tune.optimizer.baselines.HyperTune(config_space, metric, resource_attr, **kwargs)[source]

Bases: HyperbandScheduler

One of max_t, max_resource_attr needs to be in kwargs. For type="promotion", the latter is more useful, see also HyperbandScheduler.

Hyper-Tune is a model-based variant of ASHA with more than one bracket. It can be seen as an extension of MOBSTER and can be used with search_options["model"] in kwargs being "gp_independent" or "gp_multitask". It has a model-based way to sample the bracket for every new trial, as well as an ensemble predictive distribution feeding into the acquisition function. Our implementation is based on:

Yang Li et al
Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale
VLDB 2022

See also:

Parameters:
  • config_space (Dict) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to HyperbandScheduler

class syne_tune.optimizer.baselines.DyHPO(config_space, metric, resource_attr, probability_sh=None, **kwargs)[source]

Bases: HyperbandScheduler

Dynamic Gray-Box Hyperparameter Optimization (DyHPO)

One of max_t, max_resource_attr needs to be in kwargs. The latter is more useful (DyHPO is a pause-resume scheduler), see also HyperbandScheduler.

DyHPO can be run with the same surrogate models as MOBSTER, except that search_options["model"] = "gp_independent" is not supported. This is because DyHPO requires extrapolation to resource levels without any data, which cannot sensibly be done with independent GPs per resource level. Compared to MOBSTER or HyperTune, DyHPO is typically run with linearly spaced rung levels (the default being 1, 2, 3, …). Decisions whether to promote a paused trial are folded together with suggesting a new configuration; both are model-based. Our implementation is based on

Wistuba, M. and Kadra, A. and Grabocka, J.
Dynamic and Efficient Gray-Box Hyperparameter Optimization for Deep Learning

However, there are important differences:

  • We do not implement their surrogate model based on a neural network kernel, but instead just use the surrogate models we provide for MOBSTER as well

  • We implement a hybrid of DyHPO with the asynchronous successive halving rule for promoting trials, controlled by probability_sh. With this probability, we promote a trial via the SH rule. This mitigates the issue that DyHPO tends to start many trials initially, because due to lack of any data at higher rungs, the score values for promoting a trial are much worse than those for starting a new one.

See HyperbandScheduler for kwargs parameters, and GPMultiFidelitySearcher for kwargs["search_options"] parameters. The following parameters are most important for DyHPO:

  • rung_increment (and grace_period): These parameters determine the rung level spacing. DyHPO is run with linearly spaced rung levels \(r_{min} + k \nu\), where \(r_{min}\) is grace_period and \(\nu\) is rung_increment. The default is 2.

  • probability_sh: See comment. The smaller this probability, the closer the method is to the published original, which tends to start many more trials than promote paused ones. On the other hand, if this probability is close to 1, you may as well run MOBSTER. The default is DEFAULT_SH_PROBABILITY.

  • search_options["opt_skip_period"]: DyHPO can be quite a bit slower than MOBSTER, because the GP surrogate model is used more frequently. It can be sped up a bit by changing opt_skip_period (general default is 1). The default here is 3.
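
The following hedged sketch shows how these parameters might be set when constructing DyHPO; the metric and resource names are placeholders, and the concrete values are examples rather than recommendations.

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import DyHPO

config_space = {"learning_rate": loguniform(1e-6, 1e-1), "epochs": 50}
scheduler = DyHPO(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    grace_period=1,        # r_min: first rung level
    rung_increment=2,      # nu: spacing between rung levels
    probability_sh=0.3,    # example value: chance of promoting via the SH rule
    search_options={"opt_skip_period": 3},
)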

class syne_tune.optimizer.baselines.PASHA(config_space, metric, resource_attr, **kwargs)[source]

Bases: HyperbandScheduler

Progressive ASHA.

One of max_t, max_resource_attr needs to be in kwargs. The latter is more useful, see also HyperbandScheduler.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to HyperbandScheduler

class syne_tune.optimizer.baselines.BOHB(config_space, metric, resource_attr, **kwargs)[source]

Bases: HyperbandScheduler

Asynchronous BOHB

Combines ASHA with TPE-like Bayesian optimization, using kernel density estimators.

One of max_t, max_resource_attr needs to be in kwargs. For type="promotion", the latter is more useful, see also HyperbandScheduler.

See MultiFidelityKernelDensityEstimator for kwargs["search_options"] parameters, and HyperbandScheduler for kwargs parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to HyperbandScheduler

class syne_tune.optimizer.baselines.SyncHyperband(config_space, metric, resource_attr, **kwargs)[source]

Bases: SynchronousGeometricHyperbandScheduler

Synchronous Hyperband.

One of max_resource_level, max_resource_attr needs to be in kwargs. The latter is more useful, see also HyperbandScheduler.

If kwargs["brackets"] is not given, the maximum number of brackets is used. Choose kwargs["brackets"] = 1 for synchronous successive halving.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to SynchronousGeometricHyperbandScheduler
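
As a hedged sketch (metric and resource names are placeholders), synchronous successive halving is obtained by passing a single bracket:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import SyncHyperband

config_space = {"learning_rate": loguniform(1e-6, 1e-1), "epochs": 81}
scheduler = SyncHyperband(
    config_space,
    metric="val_loss",
    mode="min",
    resource_attr="epoch",
    max_resource_attr="epochs",
    brackets=1,  # a single bracket turns Hyperband into successive halving
)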

class syne_tune.optimizer.baselines.SyncBOHB(config_space, metric, resource_attr, **kwargs)[source]

Bases: SynchronousGeometricHyperbandScheduler

Synchronous BOHB.

Combines SyncHyperband with TPE-like Bayesian optimization, using kernel density estimators.

One of max_resource_level, max_resource_attr needs to be in kwargs. The latter is more useful, see also HyperbandScheduler.

If kwargs["brackets"] is not given, the maximum number of brackets is used. Choose kwargs["brackets"] = 1 for synchronous successive halving.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to SynchronousGeometricHyperbandScheduler

class syne_tune.optimizer.baselines.DEHB(config_space, metric, resource_attr, **kwargs)[source]

Bases: GeometricDifferentialEvolutionHyperbandScheduler

Differential Evolution Hyperband (DEHB).

Combines SyncHyperband with ideas from evolutionary algorithms.

One of max_resource_level, max_resource_attr needs to be in kwargs. The latter is more useful, see also HyperbandScheduler.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to SynchronousGeometricHyperbandScheduler

class syne_tune.optimizer.baselines.SyncMOBSTER(config_space, metric, resource_attr, **kwargs)[source]

Bases: SynchronousGeometricHyperbandScheduler

Synchronous MOBSTER.

Combines SyncHyperband with Gaussian process based Bayesian optimization, just like MOBSTER builds on top of ASHA in the asynchronous case.

One of max_resource_level, max_resource_attr needs to be in kwargs. The latter is more useful, see also HyperbandScheduler.

If kwargs["brackets"] is not given, the maximum number of brackets is used. Choose kwargs["brackets"] = 1 for synchronous successive halving.

The default surrogate model (search_options["model"] in kwargs) is "gp_independent", different to MOBSTER.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • kwargs – Additional arguments to SynchronousGeometricHyperbandScheduler

class syne_tune.optimizer.baselines.BORE(config_space, metric, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

Bayesian Optimization by Density-Ratio Estimation (BORE).

See Bore for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.ASHABORE(config_space, metric, resource_attr, random_seed=None, **kwargs)[source]

Bases: HyperbandScheduler

Model-based ASHA with BORE searcher

See MultiFidelityBore for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.BoTorch(config_space, metric, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

Bayesian Optimization using BoTorch

See BoTorchSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.REA(config_space, metric, population_size=100, sample_size=10, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

Regularized Evolution (REA).

See RegularizedEvolution for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • population_size (int) – See RegularizedEvolution. Defaults to 100

  • sample_size (int) – See RegularizedEvolution. Defaults to 10

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

syne_tune.optimizer.baselines.create_gaussian_process_estimator(config_space, metric, random_seed=None, search_options=None)[source]
Return type:

Estimator

class syne_tune.optimizer.baselines.MORandomScalarizationBayesOpt(config_space, metric, mode='min', random_seed=None, estimators=None, **kwargs)[source]

Bases: FIFOScheduler

Uses MultiObjectiveMultiSurrogateSearcher with one standard GP surrogate model per metric (same as in BayesianOptimization), together with the MultiObjectiveLCBRandomLinearScalarization acquisition function.

If estimators is given, surrogate models are taken from there, and the default is used otherwise. This is useful if you have a good low-variance model for one of the objectives.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • mode (Union[List[str], str]) – Modes of optimization. Defaults to “min” for all

  • random_seed (Optional[int]) – Random seed, optional

  • estimators (Optional[Dict[str, Estimator]]) – Use these surrogate models instead of the default GP one. Optional

  • kwargs – Additional arguments to FIFOScheduler. Here, kwargs["search_options"] is used to create the searcher and its GP surrogate models.
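
A hedged construction sketch; the two objectives "accuracy" and "latency" are placeholders for whatever the training script reports.

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import MORandomScalarizationBayesOpt

config_space = {"learning_rate": loguniform(1e-6, 1e-1), "num_layers": randint(1, 8)}
scheduler = MORandomScalarizationBayesOpt(
    config_space,
    metric=["accuracy", "latency"],
    mode=["max", "min"],  # maximize accuracy, minimize latency
)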

class syne_tune.optimizer.baselines.NSGA2(config_space, metric, mode='min', population_size=20, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

See RandomSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • population_size (int) – The size of the population for NSGA-2

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.MOREA(config_space, metric, mode='min', population_size=100, sample_size=10, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

See RandomSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • population_size (int) – See RegularizedEvolution. Defaults to 100

  • sample_size (int) – See RegularizedEvolution. Defaults to 10

  • random_seed (Optional[int]) – Random seed, optional

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.MOLinearScalarizationBayesOpt(config_space, metric, scalarization_weights=None, **kwargs)[source]

Bases: LinearScalarizedScheduler

Uses LinearScalarizedScheduler together with a default GP surrogate model.

See GPFIFOSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (List[str]) – Names of metrics to optimize

  • scalarization_weights (Optional[List[float]]) – Positive weight used for the scalarization. Defaults to all 1

  • kwargs – Additional arguments to FIFOScheduler

scalarization_weights: ndarray
single_objective_metric: str
base_scheduler: TrialScheduler
class syne_tune.optimizer.baselines.ConstrainedBayesianOptimization(config_space, metric, constraint_attr, **kwargs)[source]

Bases: FIFOScheduler

Constrained Bayesian Optimization.

See ConstrainedGPFIFOSearcher for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • constraint_attr (str) – Name of constraint metric

  • kwargs – Additional arguments to FIFOScheduler
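
A hedged construction sketch; "val_loss" and "constraint_value" are placeholder names for metrics the training script would report, the latter encoding whether the constraint is satisfied.

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ConstrainedBayesianOptimization

config_space = {"learning_rate": loguniform(1e-6, 1e-1)}
scheduler = ConstrainedBayesianOptimization(
    config_space,
    metric="val_loss",
    constraint_attr="constraint_value",
    mode="min",
)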

class syne_tune.optimizer.baselines.ZeroShotTransfer(config_space, transfer_learning_evaluations, metric, mode='min', sort_transfer_learning_evaluations=True, use_surrogates=False, random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

A zero-shot transfer hyperparameter optimization method which jointly selects configurations that minimize the average rank obtained on historic metadata (transfer_learning_evaluations). Reference:

Sequential Model-Free Hyperparameter Tuning.
Martin Wistuba, Nicolas Schilling, Lars Schmidt-Thieme.
IEEE International Conference on Data Mining (ICDM) 2015.
Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • metric (str) – Name of metric to optimize

  • mode (str) – Whether to minimize (min) or maximize (max)

  • sort_transfer_learning_evaluations (bool) – Use False if the hyperparameters for each task in transfer_learning_evaluations are already in the same order. If set to True, hyperparameters are sorted.

  • use_surrogates (bool) – If the same configuration is not evaluated on all tasks, set this to True. This will generate a set of configurations and will impute their performance using surrogate models.

  • random_seed (Optional[int]) – Used for randomly sampling candidates. Only used if use_surrogates=True.

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.ASHACTS(config_space, metric, resource_attr, transfer_learning_evaluations, mode='min', random_seed=None, **kwargs)[source]

Bases: HyperbandScheduler

Runs ASHA where the searcher is done with the transfer-learning method:

A Quantile-based Approach for Hyperparameter Transfer Learning.
David Salinas, Huibin Shen, Valerio Perrone.
ICML 2020.

This is the Copula Thompson Sampling approach described in the paper, where a surrogate is fitted on the transfer learning data to predict mean and variance of configuration performance given a hyperparameter. The surrogate is then sampled from, and the best configurations are returned as the next candidates to evaluate.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • resource_attr (str) – Name of resource attribute

  • transfer_learning_evaluations (Dict[str, TransferLearningTaskEvaluations]) – Dictionary from task name to offline evaluations.

  • mode (str) – Whether to minimize (min) or maximize (max)

  • random_seed (Optional[int]) – Used for randomly sampling candidates

  • kwargs – Additional arguments to HyperbandScheduler

class syne_tune.optimizer.baselines.KDE(config_space, metric, **kwargs)[source]

Bases: FIFOScheduler

Single-fidelity variant of BOHB

Combines FIFOScheduler with TPE-like Bayesian optimization, using kernel density estimators.

See KernelDensityEstimator for kwargs["search_options"] parameters.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • metric (str) – Name of metric to optimize

  • kwargs – Additional arguments to FIFOScheduler

class syne_tune.optimizer.baselines.CQR(config_space, metric, mode='min', random_seed=None, **kwargs)[source]

Bases: FIFOScheduler

Single-fidelity Conformal Quantile Regression approach proposed in:
Optimizing Hyperparameters with Conformal Quantile Regression.
David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau.
ICML 2023.

The method predicts quantile performance with gradient-boosted trees and calibrates predictions with conformal prediction.

class syne_tune.optimizer.baselines.ASHACQR(config_space, metric, resource_attr, mode='min', random_seed=None, **kwargs)[source]

Bases: HyperbandScheduler

Multi-fidelity Conformal Quantile Regression approach proposed in:
Optimizing Hyperparameters with Conformal Quantile Regression.
David Salinas, Jacek Golebiowski, Aaron Klein, Matthias Seeger, Cedric Archambeau.
ICML 2023.

The method predicts quantile performance with gradient-boosted trees and calibrates predictions with conformal prediction.

syne_tune.optimizer.scheduler module
class syne_tune.optimizer.scheduler.SchedulerDecision[source]

Bases: object

Possible return values of TrialScheduler.on_trial_result(), which signal to the tuner how to proceed with the reporting trial.

The difference between PAUSE and STOP is important. If a trial is stopped, it cannot be resumed afterwards. Its checkpoints may be deleted. If a trial is paused, it may be resumed in the future, and its most recent checkpoint should be retained.

CONTINUE = 'CONTINUE'

Status for continuing trial execution

PAUSE = 'PAUSE'

Status for pausing trial execution

STOP = 'STOP'

Status for stopping trial execution

class syne_tune.optimizer.scheduler.TrialSuggestion(spawn_new_trial_id=True, checkpoint_trial_id=None, config=None)[source]

Bases: object

Suggestion returned by TrialScheduler.suggest()

Parameters:
  • spawn_new_trial_id (bool) – Whether a new trial_id should be used.

  • checkpoint_trial_id (Optional[int]) – Checkpoint of this trial ID should be used to resume from. If spawn_new_trial_id is False, then the trial checkpoint_trial_id is resumed with its previous checkpoint.

  • config (Optional[dict]) – The configuration which should be evaluated.

spawn_new_trial_id: bool = True
checkpoint_trial_id: Optional[int] = None
config: Optional[dict] = None
static start_suggestion(config, checkpoint_trial_id=None)[source]

Suggestion to start new trial

Parameters:
  • config (Dict[str, Any]) – Configuration to use for the new trial.

  • checkpoint_trial_id (Optional[int]) – Use checkpoint of this trial when starting the new trial (otherwise, it is started from scratch).

Return type:

TrialSuggestion

Returns:

A trial decision that consists of starting a new trial (which receives a new trial ID).

static resume_suggestion(trial_id, config=None)[source]

Suggestion to resume a paused trial

Parameters:
  • trial_id (int) – ID of trial to be resumed (from its checkpoint)

  • config (Optional[dict]) – Configuration to use for resumed trial

Return type:

TrialSuggestion

Returns:

A trial decision that consists of resuming trial trial_id with config if provided, or with its previously used configuration if not.
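
A tiny usage sketch of the two factory methods; the configuration values and trial IDs are made up.

from syne_tune.optimizer.scheduler import TrialSuggestion

start = TrialSuggestion.start_suggestion(config={"learning_rate": 0.01, "epochs": 27})
warm_start = TrialSuggestion.start_suggestion(
    config={"learning_rate": 0.001, "epochs": 27},
    checkpoint_trial_id=3,  # start the new trial from the checkpoint of trial 3
)
resume = TrialSuggestion.resume_suggestion(trial_id=3)  # resume trial 3 with its previous config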

class syne_tune.optimizer.scheduler.TrialScheduler(config_space)[source]

Bases: object

Schedulers maintain and drive the logic of an experiment, making decisions which configs to evaluate in new trials, and which trials to stop early.

Some schedulers support pausing and resuming trials. In this case, they also drive the decision when to restart a paused trial.

Parameters:

config_space (Dict[str, Any]) – Configuration space

suggest(trial_id)[source]

Returns a suggestion for a new trial, or one to be resumed

This method returns suggestion of type TrialSuggestion (unless there is no config left to explore, and None is returned).

If suggestion.spawn_new_trial_id is True, a new trial is to be started with config suggestion.config. Typically, this new trial is started from scratch. But if suggestion.checkpoint_trial_id is given, the trial is to be (warm)started from the checkpoint written for the trial with this ID. The new trial has ID trial_id.

If suggestion.spawn_new_trial_id is False, an existing and currently paused trial is to be resumed, whose ID is suggestion.checkpoint_trial_id. If this trial has a checkpoint, we start from there. In this case, suggestion.config is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten by suggestion.config (see HyperbandScheduler with type="promotion" for an example why this can be useful).

Apart from the HP config, additional fields can be appended to the dict, these are passed to the trial function as well.

Parameters:

trial_id (int) – ID for new trial to be started (ignored if existing trial to be resumed)

Return type:

Optional[TrialSuggestion]

Returns:

Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, one sampling the Pareto front)

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

metadata()[source]
Return type:

Dict[str, Any]

Returns:

Metadata for the scheduler

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool
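
To illustrate how TrialSuggestion, SchedulerDecision, and the callback methods fit together, here is a hedged sketch of a toy scheduler that samples configurations at random and never stops trials early. It assumes the base class stores the constructor argument as self.config_space; production schedulers would normally build on FIFOScheduler rather than subclass TrialScheduler directly.

from typing import Any, Dict, List, Optional

from syne_tune.config_space import Domain
from syne_tune.optimizer.scheduler import (
    SchedulerDecision,
    TrialScheduler,
    TrialSuggestion,
)


class ToyRandomScheduler(TrialScheduler):
    """Samples each new configuration at random; never pauses or stops trials."""

    def __init__(self, config_space: Dict[str, Any], metric: str, mode: str = "min"):
        super().__init__(config_space)
        self.metric = metric
        self.mode = mode

    def suggest(self, trial_id: int) -> Optional[TrialSuggestion]:
        # Sample every hyperparameter domain; fixed entries (constants) pass through
        config = {
            name: value.sample() if isinstance(value, Domain) else value
            for name, value in self.config_space.items()
        }
        return TrialSuggestion.start_suggestion(config)

    def on_trial_result(self, trial, result: Dict[str, Any]) -> str:
        return SchedulerDecision.CONTINUE  # let every trial run to its end

    def metric_names(self) -> List[str]:
        return [self.metric]

    def metric_mode(self) -> str:
        return self.mode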

syne_tune.remote package
Submodules
syne_tune.remote.constants module
syne_tune.remote.estimators module
syne_tune.remote.estimators.instance_sagemaker_estimator(**kwargs)[source]

Returns SageMaker estimator to be used for simulator back-end experiments and for remote launching of SageMaker back-end experiments.

Parameters:

kwargs – Extra arguments to SageMaker estimator

Returns:

SageMaker estimator

syne_tune.remote.estimators.basic_cpu_instance_sagemaker_estimator(**kwargs)[source]

Returns SageMaker estimator to be used for simulator back-end experiments and for remote launching of SageMaker back-end experiments.

Parameters:

kwargs – Extra arguments to SageMaker estimator

Returns:

SageMaker estimator

syne_tune.remote.estimators.pytorch_estimator(**estimator_kwargs)[source]

Get the PyTorch sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Parameters:

estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html

Return type:

PyTorch

Returns:

PyTorch estimator

syne_tune.remote.estimators.huggingface_estimator(**estimator_kwargs)[source]

Get the Huggingface sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Parameters:

estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/sagemaker.huggingface.html

Return type:

HuggingFace

Returns:

HuggingFace estimator

syne_tune.remote.estimators.sklearn_estimator(**estimator_kwargs)[source]

Get the Scikit-learn sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Parameters:

estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html

Return type:

SKLearn

Returns:

SKLearn estimator

syne_tune.remote.estimators.mxnet_estimator(**estimator_kwargs)[source]

Get the MXNet sagemaker estimator with the most up-to-date framework version. List of available containers: https://github.com/aws/deep-learning-containers/blob/master/available_images.md

Parameters:

estimator_kwargs – Estimator parameters as discussed in https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html

Return type:

MXNet

Returns:

MXNet estimator

syne_tune.remote.remote_launcher module
class syne_tune.remote.remote_launcher.RemoteLauncher(tuner, role=None, instance_type='ml.c5.4xlarge', dependencies=None, store_logs_localbackend=False, log_level=None, s3_path=None, no_tuner_logging=False, publish_tuning_metrics=True, **estimator_kwargs)[source]

Bases: object

This class allows launching a tuning job remotely. The remote tuning job may use either the local backend (in which case the remote instance evaluates the trials) or the SageMaker backend (in which case the remote instance spawns one SageMaker job per trial).

Parameters:
  • tuner (Tuner) – Tuner that should be run remotely on a instance_type instance. Note that StoppingCriterion should be used for the Tuner rather than a lambda function to ensure serialization.

  • role (Optional[str]) – SageMaker role to be used to launch the remote tuning instance.

  • instance_type (str) – Instance where the tuning is going to happen. Defaults to “ml.c5.4xlarge”

  • dependencies (Optional[List[str]]) – List of folders that should be included as dependencies for the backend script to run

  • estimator_kwargs – Extra arguments for creating the SageMaker estimator for the tuning code.

  • store_logs_localbackend (bool) – Whether to sync logs and checkpoints to S3 when using the local backend. When using the SageMaker backend, logs are persisted by SageMaker. Using True can lead to failure with large checkpoints. Defaults to False

  • log_level (Optional[int]) – Logging level. Default is logging.INFO, while logging.DEBUG gives more messages

  • s3_path (Optional[str]) – S3 base path used for checkpointing, outputs of tuning will be stored under {s3_path}/{tuner_name}. The logs of the local backend are only stored if store_logs_localbackend is True. Defaults to s3_experiment_path()

  • no_tuner_logging (bool) – If True, the logging level for syne_tune.tuner is set to logging.ERROR. Defaults to False

  • publish_tuning_metrics (bool) – If True, a number of tuning metrics (see RemoteTuningMetricsCallback) are reported and displayed in the SageMaker training job console. This is modifying tuner, in the sense that a callback is appended to tuner.callbacks. Defaults to True.
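
A hedged end-to-end sketch: "train_script.py", the metric names, and the time budget are placeholders. Note that the Tuner uses a StoppingCriterion rather than a lambda, so that it can be serialized for remote execution.

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ASHA
from syne_tune.remote.remote_launcher import RemoteLauncher

config_space = {"learning_rate": loguniform(1e-6, 1e-1), "epochs": 27}
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),  # placeholder script
    scheduler=ASHA(
        config_space,
        metric="val_loss",
        mode="min",
        resource_attr="epoch",
        max_resource_attr="epochs",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=3600),
    n_workers=4,
)
RemoteLauncher(tuner=tuner, instance_type="ml.c5.4xlarge").run(wait=False)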

is_lambda(f)[source]
Parameters:

f – Object to test

Returns:

True iff f is a lambda function

run(wait=True)[source]
Parameters:

wait (bool) – Whether the call should wait until the job completes (default: True). If False the call returns once the tuning job is scheduled on SageMaker.

prepare_upload()[source]

Prepares the files that need to be uploaded by SageMaker so that the tuning job can run. This includes 1) the entrypoint script of the backend, and 2) the tuner that needs to run remotely.

get_source_dir()[source]
Return type:

Path

is_source_dir_specified()[source]
Return type:

bool

update_backend_with_remote_paths()[source]

Update the paths of the backend of the endpoint script and source dir with their remote location.

upload_dir()[source]
Return type:

Path

remote_script_dir()[source]
Return type:

Path

launch_tuning_job_on_sagemaker(wait)[source]
clean_requirements_file()[source]
syne_tune.remote.remote_launcher.syne_tune_image_uri()[source]
Return type:

str

Returns:

Syne Tune Docker URI; if not present, tries to build it and raises an error if this fails.

syne_tune.remote.remote_main module

Entrypoint script that allows launching a tuning job remotely. It loads the tuner from a specified path, then runs it.

syne_tune.remote.remote_main.decode_bool(hp)[source]
syne_tune.remote.remote_metrics_callback module
class syne_tune.remote.remote_metrics_callback.RemoteTuningMetricsCallback(metric, mode, config_space=None, resource_attr=None)[source]

Bases: TunerCallback

Reports metrics related to the experiment run by Tuner. With remote tuning, if these metrics are registered with the SageMaker estimator running the experiment, they are visualized in the SageMaker console. Metrics reported are:

  • BEST_METRIC_VALUE: Best value of metric reported to tuner so far

  • BEST_TRIAL_ID: ID of trial for which the best metric value was reported so far

  • BEST_RESOURCE_VALUE: Resource value for which the best metric value was reported so far. Only if resource_attr is given

  • If config_space is given, then for each hyperparameter name in there (entry with domain), we add a metric BEST_HP_PREFIX + name. However, at most MAX_METRICS_SUPPORTED_BY_SAGEMAKER are supported

static get_metric_names(config_space, resource_attr=None)[source]
register_metrics_with_estimator(estimator)[source]

Registers metrics reported here at SageMaker estimator estimator. This should be the one which runs the remote experiment.

Note: The total number of metric definitions must not exceed MAX_METRICS_SUPPORTED_BY_SAGEMAKER. Otherwise, only the initial part of metric_names is registered.

Parameters:

estimator (EstimatorBase) – SageMaker estimator to run the experiment

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner – Tuner object

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

syne_tune.remote.scheduling module
syne_tune.remote.scheduling.backoff(errorname, ntimes_resource_wait=100, length2sleep=600)[source]

Decorator that backs off for a fixed amount of seconds after a given error is detected

syne_tune.utils package
syne_tune.utils.add_checkpointing_to_argparse(parser)[source]

To be called for the argument parser in the endpoint script. Arguments added here are optional. If checkpointing is not supported, they are simply not parsed.

Parameters:

parser (ArgumentParser) – Parser to add extra arguments to

syne_tune.utils.resume_from_checkpointed_model(config, load_model_fn)[source]

Checks whether there is a checkpoint to be resumed from. If so, the checkpoint is loaded by calling load_model_fn. This function takes a local pathname (to which it appends a filename) and returns resume_from, the resource value (e.g., epoch) at which the checkpoint was written. If it fails to load the checkpoint, it may return 0, in which case resuming from a checkpoint is skipped. This resume_from value is returned.

If checkpointing is not supported in config, or no checkpoint is found, resume_from = 0 is returned.

Parameters:
  • config (Dict[str, Any]) – Configuration the training script is called with

  • load_model_fn (Callable[[str], int]) – See above, must return resume_from. See pytorch_load_save_functions() for an example

Return type:

int

Returns:

resume_from (0 if no checkpoint has been loaded)

syne_tune.utils.checkpoint_model_at_rung_level(config, save_model_fn, resource)[source]

If checkpointing is supported, checks whether a checkpoint is to be written. This is the case if the checkpoint dir is set in config. A checkpoint is written by calling save_model_fn, passing the local pathname and resource.

Note: Why is resource passed here? In the future, we want to support writing checkpoints only for certain resource levels. This is useful if writing the checkpoint is expensive compared to the time needed to run one resource unit.

Parameters:
  • config (Dict[str, Any]) – Configuration the training script is called with

  • save_model_fn (Callable[[str, int], Any]) – See above. See pytorch_load_save_functions() for an example

  • resource (int) – Current resource level (e.g., number of epochs done)

syne_tune.utils.pytorch_load_save_functions(state_dict_objects, mutable_state=None, fname='checkpoint.json')[source]

Provides default load_model_fn, save_model_fn functions for standard PyTorch models (arguments to resume_from_checkpointed_model(), checkpoint_model_at_rung_level()).

Parameters:
  • state_dict_objects (Dict[str, Any]) – Dict of PyTorch objects implementing state_dict and load_state_dict

  • mutable_state (Optional[dict]) – Optional. Additional dict with elementary value types

  • fname (str) – Name of local file (path is taken from config)

Returns:

load_model_fn, save_model_fn
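
The following hedged sketch shows how these helpers typically fit into a PyTorch training script; the model, hyperparameter names, and the reported metric are placeholders.

from argparse import ArgumentParser

from torch import nn
from torch.optim import SGD

from syne_tune import Reporter
from syne_tune.utils import (
    add_checkpointing_to_argparse,
    checkpoint_model_at_rung_level,
    pytorch_load_save_functions,
    resume_from_checkpointed_model,
)

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    add_checkpointing_to_argparse(parser)  # optional checkpointing arguments
    config = vars(parser.parse_args())

    model = nn.Linear(10, 1)  # stand-in for a real model
    optimizer = SGD(model.parameters(), lr=config["learning_rate"])
    load_model_fn, save_model_fn = pytorch_load_save_functions(
        {"model": model, "optimizer": optimizer}
    )
    report = Reporter()

    # resume_from is 0 if no checkpoint was found or checkpointing is unsupported
    resume_from = resume_from_checkpointed_model(config, load_model_fn)
    for epoch in range(resume_from + 1, config["epochs"] + 1):
        # ... one epoch of training goes here ...
        checkpoint_model_at_rung_level(config, save_model_fn, epoch)
        report(epoch=epoch, val_loss=0.0)  # replace 0.0 with the real metric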

syne_tune.utils.parse_bool(val)[source]
Return type:

bool

syne_tune.utils.add_config_json_to_argparse(parser)[source]

To be called for the argument parser in the endpoint script.

Parameters:

parser (ArgumentParser) – Parser to add extra arguments to

syne_tune.utils.load_config_json(args)[source]

Loads configuration from JSON file and returns the union with args.

Parameters:

args (Dict[str, Any]) – Arguments returned by ArgumentParser, as dictionary

Return type:

Dict[str, Any]

Returns:

Combined configuration dictionary
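
A hedged sketch of an endpoint script using this mechanism; "learning_rate" is a hypothetical hyperparameter.

from argparse import ArgumentParser

from syne_tune.utils import add_config_json_to_argparse, load_config_json

if __name__ == "__main__":
    parser = ArgumentParser()
    add_config_json_to_argparse(parser)
    args = vars(parser.parse_args())
    config = load_config_json(args)  # union of JSON content and parsed arguments
    learning_rate = config["learning_rate"]  # hypothetical hyperparameter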

syne_tune.utils.streamline_config_space(config_space, exclude_names=None, verbose=False)[source]

Given a configuration space config_space, this function returns a new configuration space where some domains may have been replaced by approximately equivalent ones, which are however better suited for Bayesian optimization. Entries with key in exclude_names are not replaced.

See convert_domain() for what replacement rules may be applied.

Parameters:
  • config_space (Dict[str, Any]) – Original configuration space

  • exclude_names (Optional[List[str]]) – Do not convert entries with these keys

  • verbose (bool) – Log output for replaced domains? Defaults to False

Return type:

Dict[str, Any]

Returns:

Streamlined configuration space
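
A hedged usage sketch; the hyperparameter names are made up, and whether a given entry is converted depends on the rules in convert_domain().

from syne_tune.config_space import choice, uniform
from syne_tune.utils import streamline_config_space

config_space = {
    "batch_size": choice([8, 16, 32, 64, 128]),  # numerical values on a regular log grid
    "weight_decay": uniform(1e-6, 1.0),          # large upper/lower ratio, linear scale
    "optimizer": choice(["sgd", "adam"]),        # non-numerical: left unchanged
}
new_space = streamline_config_space(
    config_space, exclude_names=["optimizer"], verbose=True
)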

Submodules
syne_tune.utils.checkpoint module
syne_tune.utils.checkpoint.add_checkpointing_to_argparse(parser)[source]

To be called for the argument parser in the endpoint script. Arguments added here are optional. If checkpointing is not supported, they are simply not parsed.

Parameters:

parser (ArgumentParser) – Parser to add extra arguments to

syne_tune.utils.checkpoint.resume_from_checkpointed_model(config, load_model_fn)[source]

Checks whether there is a checkpoint to be resumed from. If so, the checkpoint is loaded by calling load_model_fn. This function takes a local pathname (to which it appends a filename) and returns resume_from, the resource value (e.g., epoch) at which the checkpoint was written. If it fails to load the checkpoint, it may return 0, in which case resuming from a checkpoint is skipped. This resume_from value is returned.

If checkpointing is not supported in config, or no checkpoint is found, resume_from = 0 is returned.

Parameters:
  • config (Dict[str, Any]) – Configuration the training script is called with

  • load_model_fn (Callable[[str], int]) – See above, must return resume_from. See pytorch_load_save_functions() for an example

Return type:

int

Returns:

resume_from (0 if no checkpoint has been loaded)

syne_tune.utils.checkpoint.checkpoint_model_at_rung_level(config, save_model_fn, resource)[source]

If checkpointing is supported, checks whether a checkpoint is to be written. This is the case if the checkpoint dir is set in config. A checkpoint is written by calling save_model_fn, passing the local pathname and resource.

Note: Why is resource passed here? In the future, we want to support writing checkpoints only for certain resource levels. This is useful if writing the checkpoint is expensive compared to the time needed to run one resource unit.

Parameters:
  • config (Dict[str, Any]) – Configuration the training script is called with

  • save_model_fn (Callable[[str, int], Any]) – See above. See pytorch_load_save_functions() for an example

  • resource (int) – Current resource level (e.g., number of epochs done)

syne_tune.utils.checkpoint.pytorch_load_save_functions(state_dict_objects, mutable_state=None, fname='checkpoint.json')[source]

Provides default load_model_fn, save_model_fn functions for standard PyTorch models (arguments to resume_from_checkpointed_model(), checkpoint_model_at_rung_level()).

Parameters:
  • state_dict_objects (Dict[str, Any]) – Dict of PyTorch objects implementing state_dict and load_state_dict

  • mutable_state (Optional[dict]) – Optional. Additional dict with elementary value types

  • fname (str) – Name of local file (path is taken from config)

Returns:

load_model_fn, save_model_fn

syne_tune.utils.config_as_json module
syne_tune.utils.config_as_json.add_config_json_to_argparse(parser)[source]

To be called for the argument parser in the endpoint script.

Parameters:

parser (ArgumentParser) – Parser to add extra arguments to

syne_tune.utils.config_as_json.load_config_json(args)[source]

Loads configuration from JSON file and returns the union with args.

Parameters:

args (Dict[str, Any]) – Arguments returned by ArgumentParser, as dictionary

Return type:

Dict[str, Any]

Returns:

Combined configuration dictionary

syne_tune.utils.convert_domain module
syne_tune.utils.convert_domain.fit_to_regular_grid(x)[source]

Computes the least squares fit of \(a * j + b\) to x[j], where \(j = 0, \dots, n-1\). Returns the LS estimates of a, b, and the coefficient of determination \(R^2\).

Parameters:

x (ndarray) – Strictly increasing sequence

Return type:

Dict[str, float]

Returns:

See above

syne_tune.utils.convert_domain.convert_choice_domain(domain, name=None)[source]

If the choice domain domain has more than 2 numerical values, it is converted to finrange(), logfinrange(), ordinal(), or logordinal(). Otherwise, domain is returned as is.

The idea is to compute the least squares fit \(a * j + b\) to x[j], where x are the sorted values or their logs (if all values are positive). If this fit is very close (judged by the coefficient of determination \(R^2\)), we use the equispaced types finrange or logfinrange, otherwise we use ordinal or logordinal.

Return type:

Domain

syne_tune.utils.convert_domain.convert_linear_to_log_domain(domain, name=None)[source]
Return type:

Domain

syne_tune.utils.convert_domain.convert_domain(domain, name=None)[source]

If one of the following rules apply, domain is converted and returned, otherwise it is returned as is.

  • domain is categorical, its values are numerical. This is converted to finrange(), logfinrange(), ordinal(), or logordinal(). We fit the values or their logs to the closest regular grid, converting to (log)finrange if the least squares fit to the grid is good enough, otherwise to (log)ordinal, where ordinal is with kind="nn". Note that the conversion to (log)finrange may result in slightly different values.

  • domain is float or int. This is converted to the same type, but in log scale, if the current scale is linear, lower is positive, and the ratio upper / lower is larger than UPPER_LOWER_RATIO_THRESHOLD.

Parameters:

domain (Domain) – Original domain

Return type:

Domain

Returns:

Streamlined domain

syne_tune.utils.convert_domain.streamline_config_space(config_space, exclude_names=None, verbose=False)[source]

Given a configuration space config_space, this function returns a new configuration space where some domains may have been replaced by approximately equivalent ones, which are however better suited for Bayesian optimization. Entries with key in exclude_names are not replaced.

See convert_domain() for what replacement rules may be applied.

Parameters:
  • config_space (Dict[str, Any]) – Original configuration space

  • exclude_names (Optional[List[str]]) – Do not convert entries with these keys

  • verbose (bool) – Log output for replaced domains? Defaults to False

Return type:

Dict[str, Any]

Returns:

Streamlined configuration space

syne_tune.utils.parse_bool module
syne_tune.utils.parse_bool.parse_bool(val)[source]
Return type:

bool

Submodules
syne_tune.config_space module
class syne_tune.config_space.Domain[source]

Bases: object

Base class to specify a type and valid range to sample parameters from.

This base class is implemented by parameter spaces, like float ranges (Float), integer ranges (Integer), or categorical variables (Categorical). The Domain object contains information about valid values (e.g. minimum and maximum values), and exposes methods that allow specification of specific samplers (e.g. uniform() or loguniform()).

sampler = None
default_sampler_cls = None
property value_type
Returns:

Type of values (one of str, float, int)

cast(value)[source]
Parameters:

value – Value to cast

Returns:

value cast to domain. For a finite domain, this can involve rounding

set_sampler(sampler, allow_override=False)[source]
get_sampler()[source]
Return type:

Sampler

sample(spec=None, size=1, random_state=None)[source]
Parameters:
  • spec (Union[List[dict], dict, None]) – Passed to sampler

  • size (int) – Number of values to sample, defaults to 1

  • random_state (Optional[RandomState]) – PRN generator

Return type:

Union[Any, List[Any]]

Returns:

Single value (size == 1) or list (size > 1)

is_grid()[source]
Return type:

bool

is_function()[source]
Return type:

bool

is_valid(value)[source]
Parameters:

value (Any) – Value to test

Returns:

Is value a valid value in domain?

property domain_str
match_string(value)[source]

Returns string representation of value (which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g., Integer, Categorical), this matches for exact equality.

Parameters:

value (Any) – Value of domain type (use cast() to be safe)

Return type:

str

Returns:

String representation useful for matching

class syne_tune.config_space.Sampler[source]

Bases: object

sample(domain, spec=None, size=1, random_state=None)[source]
class syne_tune.config_space.BaseSampler[source]

Bases: Sampler

class syne_tune.config_space.Uniform[source]

Bases: Sampler

class syne_tune.config_space.LogUniform(base=2.718281828459045)[source]

Bases: Sampler

Note: We keep the argument base for compatibility with Ray Tune. Since base has no effect on the distribution, we don’t use it internally.

class syne_tune.config_space.Normal(mean=0.0, sd=0.0)[source]

Bases: Sampler

class syne_tune.config_space.Grid[source]

Bases: Sampler

Dummy sampler used for grid search

sample(domain, spec=None, size=1, random_state=None)[source]
class syne_tune.config_space.Float(lower, upper)[source]

Bases: Domain

Continuous value in closed interval [lower, upper].

Parameters:
  • lower (float) – Lower bound (included)

  • upper (float) – Upper bound (included)

default_sampler_cls

alias of _Uniform

property value_type
Returns:

Type of values (one of str, float, int)

uniform()[source]
loguniform()[source]
reverseloguniform()[source]
normal(mean=0.0, sd=1.0)[source]
quantized(q)[source]
is_valid(value)[source]
Parameters:

value (float) – Value to test

Returns:

Is value a valid value in domain?

property domain_str
match_string(value)[source]

Returns string representation of value (which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g., Integer, Categorical), this matches for exact equality.

Parameters:

value – Value of domain type (use cast() to be safe)

Return type:

str

Returns:

String representation useful for matching

class syne_tune.config_space.Integer(lower, upper)[source]

Bases: Domain

Integer value in closed interval [lower, upper]. Note that upper is included.

Parameters:
  • lower (int) – Lower bound (included)

  • upper (int) – Upper bound (included)

default_sampler_cls

alias of _Uniform

property value_type
Returns:

Type of values (one of str, float, int)

cast(value)[source]
Parameters:

value – Value to cast

Returns:

value cast to domain. For a finite domain, this can involve rounding

quantized(q)[source]
uniform()[source]
loguniform()[source]
is_valid(value)[source]
Parameters:

value (int) – Value to test

Returns:

Is value a valid value in domain?

property domain_str
match_string(value)[source]

Returns string representation of value (which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g., Integer, Categorical), this matches for exact equality.

Parameters:

value – Value of domain type (use cast() to be safe)

Return type:

str

Returns:

String representation useful for matching

class syne_tune.config_space.Categorical(categories)[source]

Bases: Domain

Value from finite set, whose values do not have a total ordering. For values with an ordering, use Ordinal.

Parameters:

categories (Sequence) – Finite sequence, all entries must have same type

default_sampler_cls

alias of _Uniform

uniform()[source]
grid()[source]
is_valid(value)[source]
Parameters:

value (Any) – Value to test

Returns:

Is value a valid value in domain?

property value_type
Returns:

Type of values (one of str, float, int)

property domain_str
cast(value)[source]
Parameters:

value – Value to cast

Returns:

value cast to domain. For a finite domain, this can involve rounding

match_string(value)[source]

Returns string representation of value (which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g., Integer, Categorical), this matches for exact equality.

Parameters:

value – Value of domain type (use cast() to be safe)

Return type:

str

Returns:

String representation useful for matching

class syne_tune.config_space.Ordinal(categories)[source]

Bases: Categorical

Represents an ordered set. As far as random sampling is concerned, this type is equivalent to Categorical, but when used in methods that require encodings (or distances), nearby values have closer encodings.

Parameters:

categories (Sequence) – Finite sequence, all entries must have same type

class syne_tune.config_space.OrdinalNearestNeighbor(categories, log_scale=False)[source]

Bases: Ordinal

Different type for ordered set of numerical values (int or float). Essentially, the finite set is represented by a real-valued interval containing all values, and random sampling draws a value from this interval and rounds it to the nearest value in categories. If log_scale is True, all of this happens in log scale. Unless values are equidistant, this is different from Ordinal.

Parameters:
  • categories (Sequence) – Finite sequence, must be strictly increasing, value type must be float or int. If log_scale=True, values must be positive

  • log_scale (bool) – Encoding and NN matching in log domain?

property lower_int: float | None
property upper_int: float | None
property categories_int: ndarray | None
cast_int(value_int)[source]
cast(value)[source]
Parameters:

value – Value to cast

Returns:

value cast to domain. For a finite domain, this can involve rounding

set_sampler(sampler, allow_override=False)[source]
get_sampler()[source]
sample(spec=None, size=1, random_state=None)[source]
Parameters:
  • spec (Union[List[dict], dict, None]) – Passed to sampler

  • size (int) – Number of values to sample, defaults to 1

  • random_state (Optional[RandomState]) – PRN generator

Return type:

Union[Any, List[Any]]

Returns:

Single value (size == 1) or list (size > 1)

class syne_tune.config_space.FiniteRange(lower, upper, size, log_scale=False, cast_int=False)[source]

Bases: Domain

Represents a finite range [lower, ..., upper] with size values equally spaced in linear or log domain. If cast_int, the value type is int (rounding after the transform).

Parameters:
  • lower (float) – Lower bound (included)

  • upper (float) – Upper bound (included)

  • size (int) – Number of values

  • log_scale (bool) – Equal spacing in log domain?

  • cast_int (bool) – Value type is int (float otherwise)

property values
property value_type
Returns:

Type of values (one of str, float, int)

cast(value)[source]
Parameters:

value – Value to cast

Returns:

value cast to domain. For a finite domain, this can involve rounding

set_sampler(sampler, allow_override=False)[source]
get_sampler()[source]
sample(spec=None, size=1, random_state=None)[source]
Parameters:
  • spec (Union[List[dict], dict, None]) – Passed to sampler

  • size (int) – Number of values to sample, defaults to 1

  • random_state (Optional[RandomState]) – PRN generator

Return type:

Union[Any, List[Any]]

Returns:

Single value (size == 1) or list (size > 1)

property domain_str
match_string(value)[source]

Returns string representation of value (which must be of domain type) which is to match configurations for (approximate) equality. For discrete types (e.g., Integer, Categorical), this matches for exact equality.

Parameters:

value – Value of domain type (use cast() to be safe)

Return type:

str

Returns:

String representation useful for matching

syne_tune.config_space.uniform(lower, upper)[source]

Uniform float value between lower and upper

Parameters:
  • lower (float) – Lower bound (included)

  • upper (float) – Upper bound (included)

Returns:

Float object

syne_tune.config_space.loguniform(lower, upper)[source]

Log-uniform float value between lower and upper

Sampling is done as exp(x), where x is uniform between log(lower) and log(upper).

Parameters:
  • lower (float) – Lower bound (included; positive)

  • upper (float) – Upper bound (included; positive)

Returns:

Float object

syne_tune.config_space.randint(lower, upper)[source]

Uniform integer between lower and upper

lower and upper are inclusive. This differs from Ray Tune, where upper is exclusive.

Parameters:
  • lower (int) – Lower bound (included)

  • upper (int) – Upper bound (included)

Returns:

Integer object

syne_tune.config_space.lograndint(lower, upper)[source]

Log-uniform integer between lower and upper

lower and upper are inclusive. Note: Ray Tune has an argument base here, but since this does not affect the distribution, we drop it.

Parameters:
  • lower (int) – Lower bound (included)

  • upper (int) – Upper bound (included)

Returns:

Integer object

syne_tune.config_space.choice(categories)[source]

Uniform over list of categories

Parameters:

categories (list) – Sequence of values, all entries must have the same type

Returns:

Categorical object

syne_tune.config_space.ordinal(categories, kind=None)[source]

Ordinal value from list categories. Different variants are selected by kind.

For kind == "equal", sampling is the same as for choice, and the internal encoding is by int (first value maps to 0, second to 1, …).

For kind == "nn", the finite set is represented by a real-valued interval containing all values, and random sampling draws a value from this interval and rounds it to the nearest value in categories. This behaves like a finite version of uniform or randint. For kind == "nn-log", nearest neighbour rounding happens in log space, which behaves like a finite version of loguniform() or lograndint(). You can also use the synonym logordinal(). For this type, values in categories must be int or float and strictly increasing, and also positive if kind == "nn-log".

Parameters:
  • categories (list) – Sequence of values, all entries must have the same type

  • kind (Optional[str]) – Can be “equal”, “nn”, “nn-log”

Returns:

Ordinal or OrdinalNearestNeighbor object

syne_tune.config_space.logordinal(categories)[source]

Corresponds to ordinal() with kind="nn-log", so that nearest neighbour mapping happens in log scale. Values in categories must be int or float, strictly increasing, and positive.

Parameters:

categories (list) – Sequence of values, strictly increasing, of type float or int, all positive

Returns:

OrdinalNearestNeighbor object

syne_tune.config_space.finrange(lower, upper, size, cast_int=False)[source]

Finite range [lower, ..., upper] with size entries, which are equally spaced. Finite alternative to uniform().

Parameters:
  • lower (float) – Smallest feasible value

  • upper (float) – Largest feasible value

  • size (int) – Size of (finite) domain, must be >= 2

  • cast_int (bool) – Values rounded and cast to int?

Returns:

FiniteRange object

syne_tune.config_space.logfinrange(lower, upper, size, cast_int=False)[source]

Finite range [lower, ..., upper] with size entries, which are equally spaced in the log domain. Finite alternative to loguniform().

Parameters:
  • lower (float) – Smallest feasible value (positive)

  • upper (float) – Largest feasible value (positive)

  • size (int) – Size of (finite) domain, must be >= 2

  • cast_int (bool) – Values rounded and cast to int?

Returns:

FiniteRange object
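
An illustrative configuration space combining the constructors above; the hyperparameter names are made up.

from syne_tune.config_space import (
    choice,
    finrange,
    logordinal,
    loguniform,
    randint,
    uniform,
)

config_space = {
    "learning_rate": loguniform(1e-6, 1e-1),      # float, log scale
    "momentum": uniform(0.0, 0.99),               # float, linear scale
    "num_layers": randint(1, 8),                  # int, both bounds inclusive
    "batch_size": logordinal([16, 32, 64, 128]),  # finite set, NN matching in log space
    "dropout": finrange(0.0, 0.5, size=6),        # 0.0, 0.1, ..., 0.5
    "optimizer": choice(["sgd", "adam", "adamw"]),
    "epochs": 30,                                 # constants are passed through unchanged
}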

syne_tune.config_space.is_log_space(domain)[source]
Parameters:

domain (Domain) – Hyperparameter type

Return type:

bool

Returns:

Logarithmic encoding?

syne_tune.config_space.is_reverse_log_space(domain)[source]
Return type:

bool

syne_tune.config_space.is_uniform_space(domain)[source]
Parameters:

domain (Domain) – Hyperparameter type

Return type:

bool

Returns:

Linear (uniform) encoding?

syne_tune.config_space.add_to_argparse(parser, config_space)[source]

Use this to prepare argument parser in endpoint script, for the non-fixed parameters in config_space.

Parameters:
  • parser (ArgumentParser) – argparse.ArgumentParser object

  • config_space (Dict[str, Any]) – Configuration space (modified)

syne_tune.config_space.cast_config_values(config, config_space)[source]

Returns config with keys, values of config, but values are cast to their specific types.

Parameters:
  • config (Dict[str, Any]) – Config whose values are to be cast

  • config_space (Dict[str, Any]) – Configuration space

Return type:

Dict[str, Any]

Returns:

New config with values cast to correct types
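
A hedged sketch tying add_to_argparse() and cast_config_values() together in an endpoint script; the hyperparameter names are made up.

from argparse import ArgumentParser

from syne_tune.config_space import (
    add_to_argparse,
    cast_config_values,
    loguniform,
    randint,
)

config_space = {
    "learning_rate": loguniform(1e-6, 1e-1),
    "epochs": randint(1, 50),
    "seed": 0,  # fixed value: not added to the parser
}

parser = ArgumentParser()
add_to_argparse(parser, config_space)  # adds --learning_rate and --epochs
args = parser.parse_args(["--learning_rate", "0.001", "--epochs", "10"])
config = cast_config_values(vars(args), config_space)  # strings cast back to float/int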

syne_tune.config_space.non_constant_hyperparameter_keys(config_space)[source]
Parameters:

config_space (Dict[str, Any]) – Configuration space

Return type:

List[str]

Returns:

Keys corresponding to (non-fixed) hyperparameters

syne_tune.config_space.config_space_size(config_space, upper_limit=1048576)[source]

Counts the number of distinct configurations in the configuration space config_space. If this is infinite (due to real-valued parameters) or larger than upper_limit, None is returned.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space

  • upper_limit (int) – See above. Defaults to 2**20

Return type:

Optional[int]

Returns:

Number of distinct configurations; or None if infinite or more than upper_limit

syne_tune.config_space.config_to_match_string(config, config_space, keys)[source]

Maps configuration to a match string, which can be used to compare configs for (approximate) equality. Only keys in keys are used, in that ordering.

Parameters:
  • config (Dict[str, Any]) – Configuration to be encoded in match string

  • config_space (Dict[str, Any]) – Configuration space

  • keys (List[str]) – Keys of parameters to be encoded

Return type:

str

Returns:

Match string

syne_tune.config_space.to_dict(x)[source]

We assume that for each Domain subclass, the __init__() kwargs are also members, and all other members start with _.

Parameters:

x (Domain) – Domain object

Return type:

Dict[str, Any]

Returns:

Representation as dict

syne_tune.config_space.from_dict(d)[source]
Parameters:

d (Dict[str, Any]) – Representation of Domain object as dict

Return type:

Domain

Returns:

Decoded Domain object

syne_tune.config_space.config_space_to_json_dict(config_space)[source]

Converts config_space into a dictionary that can be saved as a json file.

Parameters:

config_space (Dict[str, Union[Domain, int, float, str]]) – Configuration space

Return type:

Dict[str, Union[int, float, str]]

Returns:

JSON-serializable dictionary representing config_space

syne_tune.config_space.config_space_from_json_dict(config_space_dict)[source]

Converts the given dictionary into a Syne Tune search space.

Reverse of config_space_to_json_dict().

Parameters:

config_space_dict (Dict[str, Union[int, float, str]]) – JSON-serializable dict, as output by config_space_to_json_dict()

Return type:

Dict[str, Union[Domain, int, float, str]]

Returns:

Configuration space corresponding to config_space_dict
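
A round-trip sketch: serialize a configuration space to JSON and restore it.

import json

from syne_tune.config_space import (
    config_space_from_json_dict,
    config_space_to_json_dict,
    loguniform,
    randint,
)

config_space = {"learning_rate": loguniform(1e-5, 1e-1), "epochs": randint(1, 50), "seed": 0}
as_json = json.dumps(config_space_to_json_dict(config_space))
restored = config_space_from_json_dict(json.loads(as_json))
assert set(restored.keys()) == set(config_space.keys())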

syne_tune.config_space.restrict_domain(numerical_domain, lower, upper)[source]

Restricts a numerical domain to be in the range [lower, upper]

Parameters:
  • numerical_domain (Domain) – Numerical domain

  • lower (float) – Lower bound

  • upper (float) – Upper bound

Return type:

Domain

Returns:

Restricted domain

class syne_tune.config_space.Quantized(sampler, q)[source]

Bases: Sampler

get_sampler()[source]
sample(domain, spec=None, size=1, random_state=None)[source]
syne_tune.config_space.quniform(lower, upper, q)[source]

Sample a quantized float value uniformly between lower and upper.

Sampling from tune.uniform(1, 10) is equivalent to sampling from np.random.uniform(1, 10).

The value will be quantized, i.e. rounded to an integer increment of q. Quantization makes the upper bound inclusive.

syne_tune.config_space.reverseloguniform(lower, upper)[source]

Values 0 <= x < 1, internally represented as -log(1 - x)

Parameters:
  • lower (float) – Lower boundary of the output interval (e.g. 0.99)

  • upper (float) – Upper boundary of the output interval (e.g. 0.9999)

Returns:

Float object

syne_tune.config_space.qloguniform(lower, upper, q)[source]

Sugar for sampling in different orders of magnitude.

The value will be quantized, i.e. rounded to an integer increment of q. Quantization makes the upper bound inclusive.

Parameters:
  • lower (float) – Lower boundary of the output interval (e.g. 1e-4)

  • upper (float) – Upper boundary of the output interval (e.g. 1e-2)

  • q (float) – Quantization number. The result will be rounded to an integer increment of this value.

syne_tune.config_space.qrandint(lower, upper, q=1)[source]

Sample an integer value uniformly between lower and upper.

lower is inclusive, upper is also inclusive (!).

The value will be quantized, i.e. rounded to an integer increment of q. Quantization makes the upper bound inclusive.

syne_tune.config_space.qlograndint(lower, upper, q)[source]

Sample an integer value log-uniformly between lower and upper

lower is inclusive, upper is also inclusive (!).

The value will be quantized, i.e. rounded to an integer increment of q. Quantization makes the upper bound inclusive.
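
The quantized constructors above return Domain objects which can be sampled directly; a small sketch:

from syne_tune.config_space import qloguniform, qrandint, quniform

dropout = quniform(0.0, 0.5, 0.05)   # multiples of 0.05, upper bound included
lr = qloguniform(1e-4, 1e-1, 1e-4)   # log-scale, rounded to multiples of 1e-4
layers = qrandint(2, 8, 2)           # even integers 2, 4, 6, 8

print(dropout.sample(), lr.sample(), layers.sample())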

syne_tune.constants module

Collects constants to be shared between core code and tuning scripts or benchmarks.

syne_tune.constants.SYNE_TUNE_ENV_FOLDER = 'SYNETUNE_FOLDER'

Environment variable that allows overriding the default library folder

syne_tune.constants.SYNE_TUNE_DEFAULT_FOLDER = 'syne-tune'

Name of default library folder used if the env variable is not defined

syne_tune.constants.ST_WORKER_ITER = 'st_worker_iter'

Number of times reporter was called

syne_tune.constants.ST_WORKER_TIMESTAMP = 'st_worker_timestamp'

Time stamp when worker was called

syne_tune.constants.ST_WORKER_TIME = 'st_worker_time'

Time since creation of reporter

syne_tune.constants.ST_WORKER_COST = 'st_worker_cost'

Estimate of dollar cost spent so far

syne_tune.constants.ST_INSTANCE_TYPE = 'st_instance_type'

Instance type to be used for job execution (SageMaker backend)

syne_tune.constants.ST_INSTANCE_COUNT = 'st_instance_count'

Number of instances to be used for job execution (SageMaker backend)

syne_tune.constants.ST_SAGEMAKER_METRIC_TAG = 'tune-metric'

Tag for log lines used in Reporter

syne_tune.constants.ST_CHECKPOINT_DIR = 'st_checkpoint_dir'

Name of config key for checkpoint directory

syne_tune.constants.ST_CONFIG_JSON_FNAME_ARG = 'st_config_json_filename'

Name of config key for config JSON file

syne_tune.constants.ST_REMOTE_UPLOAD_DIR_NAME = 'tuner'

Name for upload_dir in RemoteTuner

syne_tune.constants.ST_RESULTS_DATAFRAME_FILENAME = 'results.csv.zip'

Name for results dataframe stored in StoreResultsCallback

syne_tune.constants.ST_METADATA_FILENAME = 'metadata.json'

Name for metadata file stored in Tuner

syne_tune.constants.ST_TUNER_DILL_FILENAME = 'tuner.dill'

Name for final tuner object file stored in Tuner

syne_tune.constants.ST_DATETIME_FORMAT = '%Y-%m-%d-%H-%M-%S'

Datetime format used in result path names

syne_tune.constants.MAX_METRICS_SUPPORTED_BY_SAGEMAKER = 40

Max number of metrics allowed for estimator

syne_tune.constants.TUNER_DEFAULT_SLEEP_TIME = 5.0

Default value for sleep_time

syne_tune.num_gpu module

Adapted from https://github.com/aws/sagemaker-rl-container/blob/master/src/vw-serving/src/vw_serving/sagemaker/gpu.py, modified so as not to run in shell mode, which is insecure.

syne_tune.num_gpu.get_num_gpus()[source]

Returns the number of available GPUs based on configuration parameters and available hardware GPU devices. GPUs are detected by running "nvidia-smi --list-gpus" as a subprocess.

Return type:

int

Returns:

Number of GPUs

syne_tune.report module
class syne_tune.report.Reporter(add_time=True, add_cost=True)[source]

Bases: object

Callback for reporting metric values from a training script back to Syne Tune. Example:

from syne_tune import Reporter

report = Reporter()
for epoch in range(1, epochs + 1):
    # ...
    report(epoch=epoch, accuracy=accuracy)
Parameters:
  • add_time (bool) – If True (default), the time (in secs) since creation of the Reporter object is reported automatically as ST_WORKER_TIME

  • add_cost (bool) – If True (default), estimated dollar cost since creation of Reporter object is reported automatically as ST_WORKER_COST. This is available for SageMaker backend only. Requires add_time=True.

add_time: bool = True
add_cost: bool = True
syne_tune.report.retrieve(log_lines)[source]

Retrieves metrics reported with _report_logger() given log lines.

Parameters:

log_lines (List[str]) – Lines in log file to be scanned for metric reports

Return type:

List[Dict[str, float]]

Returns:

list of metrics retrieved from the log lines.

syne_tune.results_callback module
class syne_tune.results_callback.ExtraResultsComposer[source]

Bases: object

Base class for the extra_results_composer argument in StoreResultsCallback. Extracts extra results in StoreResultsCallback.on_trial_result() and returns them as a dictionary to be appended to the results dataframe.

Why don’t we use a lambda function instead? We would like the tuner, with all its dependent objects, to be dill serializable, and lambda functions are not.

keys()[source]
Return type:

List[str]

Returns:

Key names of dictionaries returned in __call__(), or [] if nothing is returned
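
A minimal sketch of a custom composer (the class name is hypothetical; it assumes tuner.tuning_status is available when __call__() is invoked during the tuning loop):

from syne_tune.results_callback import ExtraResultsComposer, StoreResultsCallback

class NumTrialsComposer(ExtraResultsComposer):
    # Appends the number of started trials as an extra column to every result row
    def __call__(self, tuner):
        return {"num_trials_started": tuner.tuning_status.num_trials_started}

    def keys(self):
        return ["num_trials_started"]

callback = StoreResultsCallback(extra_results_composer=NumTrialsComposer())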

class syne_tune.results_callback.StoreResultsCallback(add_wallclock_time=True, extra_results_composer=None)[source]

Bases: TunerCallback

Default implementation of TunerCallback which records all reported results and allows storing them as a CSV file.

Parameters:
  • add_wallclock_time (bool) – If True, wallclock time since call of on_tuning_start is stored as ST_TUNER_TIME.

  • extra_results_composer (Optional[ExtraResultsComposer]) – Optional. If given, this is called in on_trial_result(), and the resulting dictionary is appended as extra columns to the results dataframe

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

store_results()[source]

Stores current results into a CSV file named {tuner.tuner_path}/{ST_RESULTS_DATAFRAME_FILENAME}.

dataframe()[source]
Return type:

DataFrame

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner – Tuner object

on_tuning_end()[source]

Called once the tuning loop terminates

This is called before Tuner object is serialized (optionally), and also before running jobs are stopped.

syne_tune.stopping_criterion module
class syne_tune.stopping_criterion.StoppingCriterion(max_wallclock_time=None, max_num_evaluations=None, max_num_trials_started=None, max_num_trials_completed=None, max_cost=None, max_num_trials_finished=None, min_metric_value=None, max_metric_value=None)[source]

Bases: object

Stopping criterion that can be used in a Tuner, for instance Tuner(stop_criterion=StoppingCriterion(max_wallclock_time=3600), ...).

If several arguments are used, the combined criterion is true whenever one of the atomic criteria is true.

In principle, stop_criterion for Tuner can be any lambda function, but this class should be used with remote launching in order to ensure proper serialization.
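
Example (a sketch; the metric name validation_error is a placeholder). Several criteria can be combined, and tuning stops as soon as any one of them is met:

from syne_tune import StoppingCriterion

stop_criterion = StoppingCriterion(
    max_wallclock_time=3600,                      # stop after one hour, or
    max_num_trials_completed=100,                 # after 100 completed trials, or
    min_metric_value={"validation_error": 0.05},  # once a good enough result is reported
)
# Tuner(..., stop_criterion=stop_criterion, ...)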

Parameters:
  • max_wallclock_time (Optional[float]) – Stop once this wallclock time is reached

  • max_num_evaluations (Optional[int]) – Stop once more than this number of metric records have been reported

  • max_num_trials_started (Optional[int]) – Stop once more than this number of trials have been started

  • max_num_trials_completed (Optional[int]) – Stop once more than this number of trials have been completed. This does not include trials which were stopped or failed

  • max_cost (Optional[float]) – Stop once total cost of evaluations larger than this value

  • max_num_trials_finished (Optional[int]) – Stop once more than this number of trials have finished (i.e., completed, stopped, failed, or stopping)

  • min_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value below a threshold

  • max_metric_value (Optional[Dict[str, float]]) – Dictionary with thresholds for selected metrics. Stop once an evaluation reports a metric value above a threshold

max_wallclock_time: float = None
max_num_evaluations: int = None
max_num_trials_started: int = None
max_num_trials_completed: int = None
max_cost: float = None
max_num_trials_finished: int = None
min_metric_value: Optional[Dict[str, float]] = None
max_metric_value: Optional[Dict[str, float]] = None
class syne_tune.stopping_criterion.PlateauStopper(metric, std=0.001, num_trials=10, mode='min', patience=0)[source]

Bases: object

Stops the experiment when a metric has plateaued across N consecutive trials for more than the number of iterations specified by the patience parameter. This code is inspired by Ray Tune.

syne_tune.try_import module
syne_tune.try_import.try_import_gpsearchers_message()[source]
Return type:

str

syne_tune.try_import.try_import_kde_message()[source]
Return type:

str

syne_tune.try_import.try_import_bore_message()[source]
Return type:

str

syne_tune.try_import.try_import_raytune_message()[source]
Return type:

str

syne_tune.try_import.try_import_benchmarks_message()[source]
Return type:

str

syne_tune.try_import.try_import_aws_message()[source]
Return type:

str

syne_tune.try_import.try_import_botorch_message()[source]
Return type:

str

syne_tune.try_import.try_import_blackbox_repository_message()[source]
Return type:

str

syne_tune.try_import.try_import_yahpo_message()[source]
Return type:

str

syne_tune.try_import.try_import_moo_message()[source]
Return type:

str

syne_tune.try_import.try_import_visual_message()[source]
Return type:

str

syne_tune.try_import.try_import_sklearn_message()[source]
Return type:

str

syne_tune.try_import.try_import_backends_message()[source]
Return type:

str

syne_tune.tuner module
class syne_tune.tuner.Tuner(trial_backend, scheduler, stop_criterion, n_workers, sleep_time=5.0, results_update_interval=10.0, print_update_interval=30.0, max_failures=1, tuner_name=None, asynchronous_scheduling=True, wait_trial_completion_when_stopping=False, callbacks=None, metadata=None, suffix_tuner_name=True, save_tuner=True, start_jobs_without_delay=True, trial_backend_path=None)[source]

Bases: object

Controller of tuning loop, manages interplay between scheduler and trial backend. Also, stopping criterion and number of workers are maintained here.
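
A minimal usage sketch (train_script.py is a hypothetical training script which reports a metric named accuracy via Reporter):

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import RandomSearch

config_space = {
    "lr": loguniform(1e-5, 1e-1),
    "epochs": randint(1, 10),
}
tuner = Tuner(
    trial_backend=LocalBackend(entry_point="train_script.py"),
    scheduler=RandomSearch(config_space, metric="accuracy", mode="max"),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()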

Parameters:
  • trial_backend (TrialBackend) – Backend for trial evaluations

  • scheduler (TrialScheduler) – Tuning algorithm for making decisions about which trials to start, stop, pause, or resume

  • stop_criterion (Callable[[TuningStatus], bool]) – Tuning stops when this predicate returns True. Called in each iteration with the current tuning status. It is recommended to use StoppingCriterion.

  • n_workers (int) – Number of workers used here. Note that the backend needs to support (at least) this number of workers to be run in parallel

  • sleep_time (float) – Time to sleep when all workers are busy. Defaults to DEFAULT_SLEEP_TIME

  • results_update_interval (float) – Frequency at which results are updated and stored (in seconds). Defaults to 10.

  • print_update_interval (float) – Frequency at which result table is printed. Defaults to 30.

  • max_failures (int) – This many trial execution failures are allowed before the tuning loop is aborted. Defaults to 1

  • tuner_name (Optional[str]) – Name associated with the tuning experiment; defaults to the name of the entrypoint. Must consist of alphanumeric characters, possibly separated by ‘-’. A postfix with a date timestamp is added to ensure uniqueness.

  • asynchronous_scheduling (bool) – Whether to use asynchronous scheduling when scheduling new trials. If True, trials are scheduled as soon as a worker is available. If False, the tuner waits until all trials of the current batch are finished before scheduling a new batch of size n_workers. Defaults to True.

  • wait_trial_completion_when_stopping (bool) – How to deal with running trials when stopping criterion is met. If True, the tuner waits until all trials are finished. If False, all trials are terminated. Defaults to False.

  • callbacks (Optional[List[TunerCallback]]) – Called at certain times in the tuning loop, for example when a result is seen. The default callback stores results every results_update_interval.

  • metadata (Optional[dict]) – Dictionary of user-metadata that will be persisted in {tuner_path}/{ST_METADATA_FILENAME}, in addition to metadata provided by Syne Tune. ST_TUNER_CREATION_TIMESTAMP is always included, which records the timestamp when the tuner started to run.

  • suffix_tuner_name (bool) – If True, a timestamp is appended to the provided tuner_name that ensures uniqueness, otherwise the name is left unchanged and is expected to be unique. Defaults to True.

  • save_tuner (bool) – If True, the Tuner object is serialized at the end of tuning, including its dependencies (e.g., scheduler). This allows all details of the experiment to be recovered. Defaults to True.

  • start_jobs_without_delay (bool) –

    Defaults to True. If this is True, the tuner starts new jobs depending on scheduler decisions communicated to the backend. For example, if a trial has just been stopped (by calling backend.stop_trial), the tuner may start a new one immediately, even if the SageMaker training job is still busy due to stopping delays. This can lead to faster experiment runtime, because the backend is temporarily going over its budget.

    If set to False, the tuner always asks the backend for the number of busy workers, which guarantees that we never go over the n_workers budget. This makes a difference for backends where stopping or pausing trials is not immediate (e.g., SageMakerBackend). Not going over budget means that n_workers can be set up to the available quota, without running the risk of an exception due to the quota being exceeded. If you get such exceptions, we recommend using start_jobs_without_delay=False. Also, if the SageMaker warm pool feature is used, it is recommended to set start_jobs_without_delay=False, since otherwise more than n_workers warm pools will be started, because existing ones are busy with stopping when they should be reassigned.

  • trial_backend_path (Optional[str]) –

    If this is given, the path of trial_backend (where logs and checkpoints of trials are stored) is set to this. Otherwise, it is set to self.tuner_path, so that per-trial information is written to the same path as tuning results.

    If the backend is LocalBackend and the experiment is run remotely, we recommend setting this, since otherwise checkpoints and logs are synced to S3, along with tuning results, which is costly and error-prone.

run()[source]

Launches the tuning.

save(folder=None)[source]
static load(tuner_path)[source]
best_config(metric=0)[source]
Parameters:

metric (Union[str, int, None]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0, the first metric defined in the scheduler

Return type:

Tuple[int, Dict[str, Any]]

Returns:

Trial-id and best configuration found while tuning for the given metric

syne_tune.tuner_callback module
class syne_tune.tuner_callback.TunerCallback[source]

Bases: object

Allows user of Tuner to monitor progress, store additional results, etc.

on_tuning_start(tuner)[source]

Called at start of tuning loop

Parameters:

tuner – Tuner object

on_tuning_end()[source]

Called once the tuning loop terminates

This is called before Tuner object is serialized (optionally), and also before running jobs are stopped.

on_loop_start()[source]

Called at start of each tuning loop iteration

Every iteration starts with fetching new results from the backend; this callback is called just before that happens.

on_loop_end()[source]

Called at end of each tuning loop iteration

This is done before the loop stopping condition is checked and acted upon.

on_fetch_status_results(trial_status_dict, new_results)[source]

Called just after trial_backend.fetch_status_results

Parameters:
  • trial_status_dict (Dict[int, Tuple[Trial, str]]) – Result of fetch_status_results

  • new_results (List[Tuple[int, dict]]) – Result of fetch_status_results

on_trial_complete(trial, result)[source]

Called when a trial completes (Status.completed)

The arguments here have also been passed to scheduler.on_trial_complete just before this call.

Parameters:
  • trial (Trial) – Trial that just completed.

  • result (Dict[str, Any]) – Last result obtained.

on_trial_result(trial, status, result, decision)[source]

Called when a new result (reported by a trial) is observed

The arguments here are inputs or outputs of scheduler.on_trial_result (called just before).

Parameters:
  • trial (Trial) – Trial whose report has been received

  • status (str) – Status of trial before scheduler.on_trial_result has been called

  • result (Dict[str, Any]) – Result dict received

  • decision (str) – Decision returned by scheduler.on_trial_result

on_tuning_sleep(sleep_time)[source]

Called just after tuner has slept, because no worker was available

Parameters:

sleep_time (float) – Time (in secs) for which tuner has just slept

on_start_trial(trial)[source]

Called just after a new trial is started

Parameters:

trial (Trial) – Trial which has just been started

on_resume_trial(trial)[source]

Called just after a trial is resumed

Parameters:

trial (Trial) – Trial which has just been resumed
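
A minimal sketch of a custom callback (the class name is hypothetical) which prints every incoming result; it can be passed to Tuner via the callbacks argument. Note that passing callbacks likely replaces the default StoreResultsCallback, so include that one explicitly if results should still be stored:

from syne_tune.tuner_callback import TunerCallback

class PrintResultsCallback(TunerCallback):
    # Prints each reported result together with the scheduler decision
    def on_trial_result(self, trial, status, result, decision):
        print(f"trial {trial.trial_id} [{status}]: {result} -> {decision}")

# Tuner(..., callbacks=[StoreResultsCallback(), PrintResultsCallback()], ...)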

syne_tune.tuning_status module
class syne_tune.tuning_status.MetricsStatistics[source]

Bases: object

Maintains simple running statistics (min/max/sum/count) of reported metrics. Statistics are tracked for numeric types only; the type of each metric is determined by the first value added for it.

add(metrics)[source]
class syne_tune.tuning_status.TuningStatus(metric_names)[source]

Bases: object

Information about a tuning job, used to display progress or to decide whether to stop the tuning job.

Parameters:

metric_names (List[str]) – Names of metrics reported

update(trial_status_dict, new_results)[source]

Updates the tuning status given new statuses and results.

Parameters:
  • trial_status_dict (Dict[int, Tuple[Trial, str]]) – Dictionary mapping trial ID to Trial object and status

  • new_results (List[Tuple[int, dict]]) – New results, along with trial IDs

mark_running_job_as_stopped()[source]

Updates the status of all trials still running, marking them as stopped.

property num_trials_started
Returns:

Number of trials which have been started

property num_trials_completed
Returns:

Number of trials which have been completed

property num_trials_failed
Returns:

Number of trials which have failed

property num_trials_finished
Returns:

Number of trials that finished (i.e., completed, were stopped or are stopping, or failed)

property num_trials_running
Returns:

Number of trials currently running

property wallclock_time
Returns:

the wallclock time spent in the tuner

property user_time
Returns:

the total user time spent in the workers

property cost
Returns:

the estimated dollar-cost spent while tuning

get_dataframe()[source]
Return type:

DataFrame

Returns:

Information about all trials as dataframe

syne_tune.tuning_status.print_best_metric_found(tuning_status, metric_names, mode=None)[source]

Prints trial status summary and the best metric found.

Parameters:
  • tuning_status (TuningStatus) – Current tuning status

  • metric_names (List[str]) – Plot results for first metric in this list

  • mode (Optional[str]) – “min” or “max”

Return type:

Optional[Tuple[int, float]]

Returns:

trial-id and value of the best metric found

syne_tune.util module
class syne_tune.util.RegularCallback(callback, call_seconds_frequency)[source]

Bases: object

Calls the callback function at most once every call_seconds_frequency seconds.

Parameters:
  • callback (callable) – Callback object

  • call_seconds_frequency (float) – Wait time between subsequent calls

syne_tune.util.experiment_path(tuner_name=None, local_path=None)[source]

Return the path of an experiment which is used both by Tuner and to collect results of experiments.

Parameters:
  • tuner_name (Optional[str]) – Name of a tuning experiment

  • local_path (Optional[str]) – Local path where results should be saved when running locally outside of SageMaker. If not specified, the environment variable "SYNETUNE_FOLDER" is used if defined, otherwise ~/syne-tune/ is used. Defining the environment variable "SYNETUNE_FOLDER" allows overriding the default path.

Return type:

Path

Returns:

Path where to write logs and results for Syne Tune tuner. On SageMaker, results are written to "/opt/ml/checkpoints/" so that files are persisted continuously to S3 by SageMaker.

syne_tune.util.s3_experiment_path(s3_bucket=None, experiment_name=None, tuner_name=None)[source]

Returns S3 path for storing results and checkpoints.

Parameters:
  • s3_bucket (Optional[str]) – If not given, the default bucket for the SageMaker session is used

  • experiment_name (Optional[str]) – If given, this is used as first directory

  • tuner_name (Optional[str]) – If given, this is used as second directory

Return type:

str

Returns:

S3 path, ending on “/”

syne_tune.util.check_valid_sagemaker_name(name)[source]
syne_tune.util.sanitize_sagemaker_name(name)[source]
Return type:

str

syne_tune.util.name_from_base(base, default, max_length=63)[source]

Append a timestamp to the provided string.

This function ensures that the total length of the resulting string is not longer than the specified maximum length, trimming the input parameter if necessary.

Parameters:
  • base (Optional[str]) – String used as prefix to generate the unique name

  • default (str) – String used if base is None

  • max_length (int) – Maximum length for the resulting string (default: 63)

Return type:

str

Returns:

Input parameter with appended timestamp

syne_tune.util.random_string(length)[source]
Return type:

str

syne_tune.util.repository_root_path()[source]
Return type:

Path

Returns:

Returns path including syne_tune, examples, benchmarking

syne_tune.util.script_checkpoint_example_path()[source]
Return type:

Path

Returns:

Path of checkpoint example

syne_tune.util.script_height_example_path()[source]
Return type:

Path

Returns:

Path of train_height example

syne_tune.util.catchtime(name)[source]
Return type:

float

syne_tune.util.is_increasing(lst)[source]
Parameters:

lst (List[Union[float, int]]) – List of float or int entries

Return type:

bool

Returns:

Is lst strictly increasing?

syne_tune.util.is_positive_integer(lst)[source]
Parameters:

lst (List[int]) – List of int entries

Return type:

bool

Returns:

Are all entries of lst of type int and positive?

syne_tune.util.is_integer(lst)[source]
Parameters:

lst (list) – List of entries

Return type:

bool

Returns:

Are all entries of lst of type int?

syne_tune.util.dump_json_with_numpy(x, filename=None)[source]

Serializes dictionary x as JSON, taking into account NumPy-specific value types such as np.int64.

Parameters:
  • x (dict) – Dictionary to serialize or encode

  • filename (Union[str, Path, None]) – Name of file to store JSON to. Optional. If not given, the JSON encoding is returned as string

Return type:

Optional[str]

Returns:

If filename is None, JSON encoding is returned

syne_tune.util.dict_get(params, key, default)[source]

Returns params[key] if this exists and is not None, and default otherwise. Note that this is not the same as params.get(key, default): if params[key] is None, the latter returns None, whereas this method returns default.

This function is particularly helpful when dealing with a dict returned by argparse.ArgumentParser. Whenever key is added as argument to the parser, but a value is not provided, this leads to params[key] = None.

Return type:

Any
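
A small sketch illustrating the difference from dict.get:

from syne_tune.util import dict_get

params = {"batch_size": None}
print(params.get("batch_size", 32))        # None
print(dict_get(params, "batch_size", 32))  # 32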

syne_tune.util.recursive_merge(a, b, stop_keys=None)[source]

Merge dictionaries a and b, where b takes precedence. We typically use this to modify a dictionary a, so b is smaller than a. Further recursion is stopped on any node whose key is in stop_keys. Use this for dictionary-valued entries which should not be merged, but replaced by what is in b.

Parameters:
  • a (Dict[str, Any]) – Dictionary

  • b (Dict[str, Any]) – Dictionary (can be empty)

  • stop_keys (Optional[List[str]]) – See above, optional

Return type:

Dict[str, Any]

Returns:

Merged dictionary
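
A small sketch (the dictionary contents are made up for illustration):

from syne_tune.util import recursive_merge

a = {"scheduler": {"type": "fifo", "mode": "min"}, "n_workers": 4}
b = {"scheduler": {"mode": "max"}}

print(recursive_merge(a, b))
# {'scheduler': {'type': 'fifo', 'mode': 'max'}, 'n_workers': 4}

print(recursive_merge(a, b, stop_keys=["scheduler"]))
# {'scheduler': {'mode': 'max'}, 'n_workers': 4}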

syne_tune.util.find_first_of_type(a, typ)[source]
Return type:

Optional[Any]

syne_tune.util.metric_name_mode(metric_names, metric_mode, metric)[source]

Retrieve the metric mode given a metric queried by either index or name.

Parameters:
  • metric_names (List[str]) – Metric names defined in a scheduler

  • metric_mode (Union[str, List[str]]) – Metric mode or modes of a scheduler

  • metric (Union[str, int]) – Index or name of the selected metric

Return type:

Tuple[str, str]

Returns:

Name and mode of the queried metric
