Why should I use Syne Tune?

Hyperparameter Optimization (HPO) has been an important problem for many years, and a variety of commercial and open-source tools are available to help practitioners run HPO efficiently. Notable examples of open-source tools are Ray Tune and Optuna. Here are some reasons why you may prefer Syne Tune over these alternatives:

  • Lightweight and platform-agnostic: Syne Tune is designed to work with different execution backends, so you are not locked into a particular distributed system architecture. Syne Tune runs with minimal dependencies.

  • Wide range of modalities: Syne Tune supports multi-fidelity HPO, constrained HPO, multi-objective HPO, transfer tuning, cost-aware HPO, and population-based training.

  • Simple, modular design: Rather than wrapping all sorts of other HPO frameworks, Syne Tune provides simple APIs and scheduler templates, which can easily be extended to your specific needs. Studying the code will allow you to understand what the different algorithms are doing, and how they differ from each other.

  • Industry-strength Bayesian optimization: Syne Tune has special support for Gaussian process based Bayesian optimization. The same code powers modalities like multi-fidelity HPO, constrained HPO, or cost-aware HPO, having been tried and tested for several years.

  • Support for distributed parallelized experimentation: We built Syne Tune to be able to move fast, using the parallel resources AWS SageMaker offers. Syne Tune allows ML/AI practitioners to easily set up and run studies with many experiments running in parallel.

  • Special support for researchers: Syne Tune allows for rapid development and comparison of different tuning algorithms. Its blackbox repository and simulator backend run realistic simulations of experiments many times faster than real time. Benchmarking is simple and efficient, and allows you to compare different methods apples to apples (same execution backend, implementations built from the same parts).

If you are an AWS customer, there are additional good reasons to use Syne Tune over the alternatives:

  • If you use AWS services or SageMaker frameworks day to day, Syne Tune works out of the box and fits into your normal workflow. It unlocks the power of distributed experimentation that SageMaker offers.

  • Syne Tune is developed in collaboration with the team behind the Automatic Model Tuning service.

What are the different installation options supported?

To install Syne Tune with minimal dependencies from pip, you can simply do:

pip install 'syne-tune'

If you additionally want to install our own Gaussian process based optimizers, Ray Tune, or the BORE optimizer, you can run pip install 'syne-tune[X]', where X can be:

  • gpsearchers: For built-in Gaussian process based optimizers (such as BayesianOptimization, MOBSTER, or HyperTune)

  • aws: AWS SageMaker dependencies. These are required for remote launching or for the SageMakerBackend

  • raytune: For Ray Tune optimizers (see RayTuneScheduler), installs all Ray Tune dependencies

  • benchmarks: For installing dependencies required to run all benchmarks locally (not needed for remote launching or SageMakerBackend)

  • blackbox-repository: Blackbox repository for simulated tuning

  • yahpo: YAHPO Gym surrogate blackboxes

  • kde: For BOHB (such as SyncBOHB, or FIFOScheduler or HyperbandScheduler with searcher="kde")

  • botorch: Bayesian optimization from BoTorch (see BoTorchSearcher)

  • dev: For developers who would like to extend Syne Tune

  • bore: For Bore optimizer (see BORE)

There are also union tags you can use:

  • basic: Union of dependencies of a reasonable size (gpsearchers, kde, aws, moo, sklearn). Even if size does not matter for your local installation, you should consider basic for remote launching of experiments.

  • extra: Union of all dependencies listed above.

Our general recommendation is to use pip install 'syne-tune[basic]', then add

  • dev if you aim to extend Syne Tune

  • benchmarks if you would like to run real benchmarks in Syne Tune locally

  • blackbox-repository if you would like to run surrogate benchmarks with the simulator backend

  • visual if you would like to visualize results of experiments

In order to run schedulers which depend on BoTorch, you need to add botorch, and if you would like to run Ray Tune schedulers, you need to add raytune (both of these come with many dependencies). If the size of the installation is of no concern, just use pip install 'syne-tune[extra]'.

If you run code which needs dependencies you have not installed, a warning message tells you which tag is missing, and you can always install it later.

To install the latest version from git, run the following:

pip install git+https://github.com/awslabs/syne-tune.git

For local development, we recommend using the following setup which will enable you to easily test your changes:

git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
python3 -m venv st_venv
. st_venv/bin/activate
pip install --upgrade pip
pip install -e '.[basic,dev]'

This installs everything in a virtual environment st_venv. Remember to activate this environment before working with Syne Tune. We also recommend building the virtual environment from scratch now and then, in particular when you pull a new release, as dependencies may have changed.

How can I run on AWS and SageMaker?

If you want to launch experiments or training jobs on SageMaker rather than on your local machine, you will need access to AWS and SageMaker on your machine. Make sure that:

  • awscli is installed (see this link)

  • AWS credentials have been set properly (see this link).

  • The necessary SageMaker role has been created (see this page for instructions; if you have created a SageMaker notebook in the past, this role should already have been created for you).

The following command should run without error if your credentials are available:

python -c "import boto3; print(boto3.client('sagemaker').list_training_jobs(MaxResults=1))"

You can also test your setup by running the following example, which evaluates trials on SageMaker:

python examples/launch_height_sagemaker.py

What are the metrics reported by default when calling the Reporter?

Whenever you call the reporter to log a result, the worker time-stamp, the worker time since the creation of the reporter, and the number of times the reporter was called are logged under the fields ST_WORKER_TIMESTAMP, ST_WORKER_TIME, and ST_WORKER_ITER. In addition, when running on SageMaker, a dollar-cost estimate is logged under the field ST_WORKER_COST.

To see this behavior, you can simply call the reporter and inspect the logged metrics:

from syne_tune.report import Reporter
reporter = Reporter()
for step in range(3):
    reporter(step=step, metric=float(step) / 3)

# [tune-metric]: {"step": 0, "metric": 0.0, "st_worker_timestamp": 1644311849.6071281, "st_worker_time": 0.0001048670000045604, "st_worker_iter": 0}
# [tune-metric]: {"step": 1, "metric": 0.3333333333333333, "st_worker_timestamp": 1644311849.6071832, "st_worker_time": 0.00015910100000837701, "st_worker_iter": 1}
# [tune-metric]: {"step": 2, "metric": 0.6666666666666666, "st_worker_timestamp": 1644311849.60733, "st_worker_time": 0.00030723599996917983, "st_worker_iter": 2}

How can I utilize multiple GPUs?

To utilize multiple GPUs, you can use the local backend LocalBackend, which runs on the GPUs available on a local machine. You can also run on a remote AWS instance with multiple GPUs using the local backend and the remote launcher (see here), or run with the SageMakerBackend, which spins up one training job per trial.

When evaluating trials on a local machine with LocalBackend, by default each trial is allocated to the least occupied GPU by setting the CUDA_VISIBLE_DEVICES environment variable. When running on a machine with more than one GPU, you can adjust the number of GPUs assigned to each trial with num_gpus_per_trial. However, make sure that the product of n_workers and num_gpus_per_trial is not larger than the total number of GPUs, since otherwise trials will be delayed. You can also use gpus_to_use to restrict Syne Tune to a subset of the available GPUs, as in the sketch below.
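
Below is a minimal sketch of this setup. The training script name, configuration space, metric names, and time budget are placeholders, and we assume num_gpus_per_trial and gpus_to_use are accepted as keyword arguments by LocalBackend, as described above:

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ASHA

# Placeholder configuration space; "epochs" is the maximum resource value
config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "epochs": 10,
}

trial_backend = LocalBackend(
    entry_point="train_script.py",  # placeholder script reporting per-epoch metrics
    num_gpus_per_trial=2,           # each trial is assigned 2 GPUs (assumed keyword argument)
    gpus_to_use=[0, 1, 2, 3],       # restrict Syne Tune to these GPUs (assumed keyword argument)
)

tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=ASHA(
        config_space,
        metric="val_loss",
        resource_attr="epoch",
        max_resource_attr="epochs",
        mode="min",
    ),
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=2,  # 2 workers * 2 GPUs per trial = 4 GPUs used in total
)
tuner.run()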

What is the default mode when performing optimization?

The default mode is "min" when performing optimization, so the target metric is minimized. The mode can be configured when instantiating a scheduler.
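
For example (a hedged sketch; the configuration space and metric name are placeholders):

from syne_tune.config_space import uniform
from syne_tune.optimizer.baselines import RandomSearch

config_space = {"x": uniform(0.0, 1.0)}  # placeholder configuration space
# Maximize the reported "accuracy" metric instead of minimizing it
scheduler = RandomSearch(config_space, metric="accuracy", mode="max")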

How are trials evaluated on a local machine?

When trials are executed locally (e.g., with LocalBackend), each trial is evaluated in a separate sub-process. As such, the number of configurations evaluated concurrently (set by n_workers when creating the Tuner) should account for the capacity of the machine on which the trials are executed.

Is the tuner checkpointed?

Yes. During tuning, the tuner state is regularly saved to the experiment path under tuner.dill (every 10 seconds by default, configurable with results_update_interval when creating the Tuner). This allows the use of spot instances when running a tuning remotely with the remote launcher. It also allows you to resume a past experiment or analyse the state of the scheduler at any point.

Where can I find the output of the tuning?

When running locally, the output of the tuning is saved under ~/syne-tune/{tuner-name}/ by default. When running remotely on SageMaker, the output of the tuning is saved under /opt/ml/checkpoints/ by default and the tuning output is synced regularly to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/.

Can I resume a previous tuning job?

Yes. If you want to resume tuning, you can deserialize the tuner that is regularly checkpointed to disk, possibly after modifying parts of the scheduler or adapting the stopping condition to your needs. See examples/launch_resume_tuning.py for an example which resumes a previous tuning after updating the configuration space.
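
As an illustration only, here is a hedged sketch which assumes the tuner object has been serialized with dill to tuner.dill, as described above. The experiment name and the stop_criterion attribute are assumptions; examples/launch_resume_tuning.py shows the supported way to do this:

from pathlib import Path

import dill

from syne_tune import StoppingCriterion

# Placeholder path of a previous experiment (see "Where can I find the output of the tuning?")
tuner_path = Path.home() / "syne-tune" / "my-previous-tuning"
with open(tuner_path / "tuner.dill", "rb") as f:
    tuner = dill.load(f)

# Optionally adapt the stopping condition before resuming (attribute name assumed)
tuner.stop_criterion = StoppingCriterion(max_wallclock_time=3600)
tuner.run()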

How can I change the default output folder where tuning results are stored?

To change the path where tuning results are written, you can set the environment variable SYNETUNE_FOLDER to the folder that you want.

For instance, the following runs a tuning where result files are written under ~/new-syne-tune-folder:

export SYNETUNE_FOLDER="~/new-syne-tune-folder"
python examples/launch_height_baselines.py

You can also do the following to change the output folder of Syne Tune permanently:

echo 'export SYNETUNE_FOLDER="~/new-syne-tune-folder"' >> ~/.bashrc && source ~/.bashrc

What does the output of the tuning contain?

Syne Tune stores the following files: metadata.json, results.csv.zip, and tuner.dill, which contain, respectively, the metadata of the tuning job, the results obtained at each time step, and the state of the tuner. If you create the Tuner with save_tuner=False, the tuner.dill file is not written. The content of results.csv.zip can be customized.

How can I enable trial checkpointing?

Since trials may be paused and resumed (either by schedulers or when using spot instances), the user may checkpoint intermediate results to avoid restarting computation from scratch. Model outputs and checkpoints must be written to a specific local path given by the command line argument ST_CHECKPOINT_DIR. Saving and loading model checkpoints from this directory allows the state to be preserved when a job is stopped and resumed (setting the folder correctly and uniquely per trial is the responsibility of the backend). Here is an example of a tuning script with checkpointing enabled:

examples/training_scripts/checkpoint_example/checkpoint_example.py
import argparse
import json
import logging
import os
import time
from pathlib import Path

from syne_tune import Reporter
from syne_tune.constants import ST_CHECKPOINT_DIR


report = Reporter()


def load_checkpoint(checkpoint_path: Path):
    with open(checkpoint_path, "r") as f:
        return json.load(f)


def save_checkpoint(checkpoint_path: Path, epoch: int, value: float):
    os.makedirs(checkpoint_path.parent, exist_ok=True)
    with open(checkpoint_path, "w") as f:
        json.dump({"last_epoch": epoch, "last_value": value}, f)


if __name__ == "__main__":
    root = logging.getLogger()
    root.setLevel(logging.INFO)

    parser = argparse.ArgumentParser()
    parser.add_argument("--num-epochs", type=int, required=True)
    parser.add_argument("--multiplier", type=float, default=1)
    parser.add_argument("--sleep-time", type=float, default=0.1)

    # By convention, the path where to serialize and deserialize checkpoints is given as st_checkpoint_dir
    parser.add_argument(f"--{ST_CHECKPOINT_DIR}", type=str)

    args, _ = parser.parse_known_args()

    num_epochs = args.num_epochs
    checkpoint_path = None
    start_epoch = 1
    current_value = 0
    checkpoint_dir = getattr(args, ST_CHECKPOINT_DIR)
    if checkpoint_dir is not None:
        checkpoint_path = Path(checkpoint_dir) / "checkpoint.json"
        if checkpoint_path.exists():
            state = load_checkpoint(checkpoint_path)
            logging.info(f"resuming from previous checkpoint {state}")
            start_epoch = state["last_epoch"] + 1
            current_value = state["last_value"]

    # Write dummy values for the loss to illustrate SageMaker's ability to retrieve metrics;
    # replace this with your actual training algorithm
    for current_epoch in range(start_epoch, num_epochs + 1):
        time.sleep(args.sleep_time)
        current_value = (current_value + 1) * args.multiplier
        if checkpoint_path is not None:
            save_checkpoint(checkpoint_path, current_epoch, current_value)
        report(train_acc=current_value, epoch=current_epoch)

When using the SageMaker backend, we use the SageMaker checkpoint mechanism under the hood to sync local checkpoints to S3. Checkpoints are synced to s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/{trial-id}/, where sagemaker-default-bucket is the default bucket for SageMaker. A complete example is given by examples/launch_height_sagemaker_checkpoints.py.

The same mechanism is used to regularly write the tuning results to S3 during remote tuning. However, during remote tuning with the local backend, we do not want checkpoints to be synced to S3, since they are only required temporarily on the same instance. Syncing them to S3 would be costly and error-prone, because the SageMaker mechanism is not intended to work with different processes writing to and reading from the sync directory concurrently. In this case, we can switch off syncing checkpoints to S3 (but not tuning results!) by setting trial_backend_path=backend_path_not_synced_to_s3() when creating the Tuner object. An example is fine_tuning_transformer_glue/hpo_main.py. It is also supported by default in the experimentation framework and in RemoteLauncher.

There are some convenience functions which help you to implement checkpointing for your training script. Have a look at resnet_cifar10.py:

  • Checkpoints have to be written at the end of certain epochs (namely those after which the scheduler may pause the trial). This is dealt with by checkpoint_model_at_rung_level(config, save_model_fn, epoch). Here, epoch is the current epoch, allowing the function to decide whether to checkpoint or not. save_model_fn stores the current mutable state along with epoch to a local path (see below). Finally, config contains arguments provided by the scheduler (see below).

  • Before the training loop starts (and optionally), the mutable state to start from has to be loaded from a checkpoint. This is done by resume_from_checkpointed_model(config, load_model_fn). If the checkpoint has been loaded successfully, the training loop may start with epoch resume_from + 1 instead of 1. Here, load_model_fn loads the mutable state from a checkpoint in a local path, returning its epoch value if successful, which is returned as resume_from.

In general, load_model_fn and save_model_fn have to be provided as part of the script. For most PyTorch models, you can use pytorch_load_save_functions to this end. In general, you will want to include the model, the optimizer, and the learning rate scheduler.

Finally, the scheduler provides additional information about checkpointing in config (most importantly, the path in ST_CHECKPOINT_DIR). You don’t have to worry about this: add_checkpointing_to_argparse(parser) adds corresponding arguments to the parser.

How can I retrieve the best checkpoint obtained after tuning?

You can take a look at this example examples/launch_checkpoint_example.py which shows how to retrieve the best checkpoint obtained after tuning an XGBoost model.

How can I retrain the best model found after tuning?

You can call tuner.trial_backend.start_trial(config=tuner.best_config()) after tuning to retrain the best configuration. Take a look at examples/launch_plot_example.py, which shows how to retrain the best model found while tuning.

Which schedulers make use of checkpointing?

Checkpointing means storing the state of a trial (i.e., model parameters, optimizer or learning rate scheduler parameters), so that it can be paused and potentially resumed at a later point in time, without having to start training from scratch. The following schedulers make use of checkpointing:

  • Promotion-based asynchronous Hyperband: HyperbandScheduler with type="promotion" or type="dyhpo", as well as other asynchronous multi-fidelity schedulers. The code runs without checkpointing, but in this case, any trial which is resumed is started from scratch. For example, if a trial was paused after 9 epochs of training and is resumed later, training starts from scratch and the first 9 epochs are wasted effort. Moreover, extra variance is introduced by starting from scratch, since weights may be initialized differently. It is not recommended running promotion-based Hyperband without checkpointing.

  • Population-based training (PBT): PopulationBasedTraining does not work without checkpointing.

  • Synchronous Hyperband: SynchronousGeometricHyperbandScheduler, as well as other synchronous multi-fidelity schedulers. This code runs without checkpointing, but wastes effort in the same sense as promotion-based asynchronous Hyperband.

Checkpoints are filling up my disk. What can I do?

When tuning large models, checkpoints can be large, and with the local backend, these checkpoints are stored locally. With multi-fidelity methods, many trials may be started, and keeping all checkpoints (which is the default) may exceed the available disk space.

If the trial backend TrialBackend is created with delete_checkpoints=True, Syne Tune removes the checkpoint of a trial once it is stopped or completes. All remaining checkpoints are removed at the end of the experiment. Moreover, a number of schedulers support early checkpoint removal for paused trials when they cannot be resumed anymore.

For promotion-based asynchronous multi-fidelity schedulers (ASHA, MOBSTER, HyperTune), any paused trial can in principle be resumed in the future, and delete_checkpoints=True alone does not remove checkpoints. In this case, you can activate speculative early checkpoint removal by passing early_checkpoint_removal_kwargs when creating HyperbandScheduler (or ASHA, MOBSTER, HyperTune). This is a kwargs dictionary with the following arguments:

  • max_num_checkpoints: This is mandatory. Maximum number of trials with checkpoints being retained. Once more than this number of trials with checkpoints are present, checkpoints are removed selectively. This number must be larger than the number of workers, since running trials will always write checkpoints.

  • approx_steps: Positive integer. The computation of the ranking score is a step-wise approximation, which gets more accurate for larger approx_steps. However, this computation scales cubically in approx_steps. The default is 25, which may be sufficient in most cases, but if you need to keep the number of checkpoints quite small, you may want to tune this parameter.

  • max_wallclock_time: Maximum time in seconds the experiment is run for. This is the same as passed to StoppingCriterion, and if you use an instance of this as stop_criterion passed to Tuner, the value is taken from there. Speculative checkpoint removal can only be used if the stopping criterion includes max_wallclock_time.

  • prior_beta_mean: The method depends on the probability of the event that a trial arriving at a rung ranks better than a random paused trial with a checkpoint at this rung. These probabilities are estimated for each rung, but we need some initial guess. You are most likely fine with the default. A value < 1/2 is recommended.

  • prior_beta_size: See also prior_beta_mean. The initial guess is a Beta prior, defined in terms of mean and effective sample size (here). The smaller this positive number, the weaker the effect of the initial guess. You are most likely fine with the default.

  • min_data_at_rung: Also related to the estimators mentioned with prior_beta_mean. You are most likely fine with the default.

A complete example is examples/launch_fashionmnist_checkpoint_removal.py. For details on speculative checkpoint removal, look at HyperbandRemoveCheckpointsCallback.
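
Here is a hedged sketch of activating both delete_checkpoints and speculative early checkpoint removal. The training script, configuration space, metrics, and budgets are placeholders, and we assume that the ASHA baseline forwards early_checkpoint_removal_kwargs to HyperbandScheduler:

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import ASHA

max_wallclock_time = 3600  # seconds; also passed to the scheduler below
config_space = {"learning_rate": loguniform(1e-5, 1e-1), "epochs": 27}  # placeholder

trial_backend = LocalBackend(
    entry_point="train_script.py",  # placeholder training script
    delete_checkpoints=True,        # remove checkpoints of stopped/completed trials
)
scheduler = ASHA(
    config_space,
    metric="val_loss",
    resource_attr="epoch",
    max_resource_attr="epochs",
    mode="min",
    early_checkpoint_removal_kwargs=dict(
        max_num_checkpoints=10,                 # retain at most 10 checkpoints (must exceed n_workers)
        max_wallclock_time=max_wallclock_time,  # must match the stopping criterion
    ),
)
tuner = Tuner(
    trial_backend=trial_backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=max_wallclock_time),
    n_workers=4,
)
tuner.run()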

Where can I find the output of my trials?

When running LocalBackend locally, results of trials are saved under ~/syne-tune/{tuner-name}/{trial-id}/ and contain the following files:

  • config.json: configuration that is being evaluated in the trial

  • std.err: standard error

  • std.out: standard output

In addition, all checkpointing files used by a training script, such as intermediate model checkpoints, will also be located there. This is exemplified in the following example:

tree ~/syne-tune/train-height-2022-01-12-11-08-40-971/
~/syne-tune/train-height-2022-01-12-11-08-40-971/
├── 0
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 1
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 2
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── 3
│   ├── config.json
│   ├── std.err
│   ├── std.out
│   └── stop
├── metadata.json
├── results.csv.zip
└── tuner.dill

When running tuning remotely with the remote launcher, only config.json, metadata.json, results.csv.zip, and tuner.dill are synced with S3, unless store_logs_localbackend=True is set when creating the Tuner, in which case the trial logs and information are also persisted.

Is the experimentation framework only useful to compare different HPO methods?

No, by all means no! Most of our users do not use it that way, but simply to speed up experimentation, often with a single HPO method but many variants of their problem. More details about Syne Tune for rapid experimentation are provided here and here. Just to clarify:

  • We use the term benchmark to denote a tuning problem, consisting of some code for training and evaluation, plus some default configuration space (which can be changed to result in different variants of the benchmark).

  • While the code for the experimentation framework resides in syne_tune.experiments, we collect example benchmarks in benchmarking (only available if Syne Tune is installed from source). Many of the examples there are about comparison of different HPO methods, but some are not (for example, benchmarking.examples.demo_experiment).

  • In fact, while you do not have to use the experimentation framework to run studies in Syne Tune, it is much easier than maintaining your own launcher scripts and plotting code, so you are strongly encouraged to do so, whether your goal is benchmarking HPO methods or simply finding a good ML model for your current problem faster.

How can I plot the results of a tuning?

Some basic plots can be obtained via ExperimentResult. An example is given in examples/launch_plot_results.py.

How can I plot comparative results across many experiments?

Syne Tune contains powerful plotting tools as part of the experimentation framework in syne_tune.experiments; these are detailed here. An example is provided as part of benchmarking/examples/benchmark_hypertune.

How can I specify additional tuning metadata?

By default, Syne Tune stores the time, the names and modes of the metrics being tuned, the name of the entry point, the backend name, and the scheduler name. You can also add custom metadata to your tuning job by setting metadata in Tuner as follows:

from syne_tune import Tuner

tuner = Tuner(
    ...
    tuner_name="plot-results-demo",
    metadata={"tag": "special-tag", "user": "alice"},
)

All Syne Tune and user metadata are saved when the tuner starts under metadata.json.

How do I append additional information to the results which are stored?

Results are processed and stored by callbacks passed to Tuner, in particular see StoreResultsCallback. In order to add more information, you can inherit from this class. An example is given in StoreResultsAndModelParamsCallback.

If you run experiments with tabulated benchmarks using the BlackboxRepositoryBackend, as demonstrated in launch_nasbench201_simulated.py, results are stored by SimulatorCallback instead, and you need to inherit from this class. An example is given in SimulatorAndModelParamsCallback.
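
For the standard case (StoreResultsCallback), a hedged sketch of such a subclass could look as follows. We assume the callback interface exposes an on_trial_result(trial, status, result, decision) hook, and the import path may differ between versions; StoreResultsAndModelParamsCallback shows the actual pattern:

from syne_tune.results_callback import StoreResultsCallback  # import path assumed


class StoreExtraInfoCallback(StoreResultsCallback):
    def on_trial_result(self, trial, status, result, decision):
        # Add extra columns before the result is stored (keys and values are placeholders)
        result = dict(result, num_params=1234567, experiment_tag="my-study")
        super().on_trial_result(trial, status, result, decision)


# Pass the callback when creating the Tuner:
# tuner = Tuner(..., callbacks=[StoreExtraInfoCallback()])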

I don’t want to wait, how can I launch the tuning on a remote machine?

Remote launching of experiments has a number of advantages:

  • The machine you are working on is not blocked

  • You can launch many experiments in parallel

  • You can launch experiments with any instance type you like, without having to provision it yourself. For GPU instances, you do not have to worry about setting up CUDA, etc.

You can use the remote launcher to launch an experiment on a remote machine. The remote launcher supports both LocalBackend and SageMakerBackend. In the former case, multiple trials will be evaluated on the remote machine (one use case being to use a beefy machine); in the latter case, trials will be evaluated as separate SageMaker training jobs. An example for running the remote launcher is given in launch_height_sagemaker_remotely.py.

Remote launching for experimentation is detailed in this tutorial or this tutorial.

How can I run many experiments in parallel?

You can remotely launch any number of experiments, which will then run in parallel, as detailed in this tutorial; see also these examples.

Note

In order to run these examples, you need to have installed Syne Tune from source.

How can I access results after tuning remotely?

You can call load_experiment(), which will download files from S3 if the experiment is not found locally. You can also sync files from S3 directly into the ~/syne-tune/ folder in batch, for instance by running:

aws s3 sync s3://{sagemaker-default-bucket}/syne-tune/{tuner-name}/ ~/syne-tune/  --include "*"  --exclude "*tuner.dill"

This fetches all results without the tuner state (you can omit the --include and --exclude flags if you also want to download the tuner state).

How can I specify dependencies to remote launcher or when using the SageMaker backend?

When you run remote code, you often need to install packages (e.g., scipy) or have custom code available.

  • To install packages, you can add a file requirements.txt in the same folder as your entry point script. All those packages will be installed by SageMaker when the Docker container starts.

  • To include custom code (for instance, a library that you are working on), you can set the parameter dependencies on the remote launcher or on a SageMaker framework to a list of folders. The folders indicated will be compressed, sent to S3, and added to the Python path when the container starts. More details are given in this tutorial; a sketch follows below.
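
For instance, with the SageMaker backend, the sketch below passes custom code folders via the dependencies argument of a SageMaker framework estimator. Script names, folders, instance type, framework versions, and the metric name are placeholders, and we assume SageMakerBackend accepts the estimator via sm_estimator:

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

from syne_tune.backend import SageMakerBackend

estimator = PyTorch(
    entry_point="train_script.py",  # placeholder training script
    source_dir="my_project",        # placeholder folder; may contain a requirements.txt
    dependencies=["my_library"],    # placeholder folders: compressed, sent to S3, put on the Python path
    role=get_execution_role(),      # resolved inside SageMaker; pass an IAM role ARN otherwise
    instance_type="ml.g4dn.xlarge",
    instance_count=1,
    framework_version="1.13",       # placeholder versions; pick ones supported in your region
    py_version="py39",
)
trial_backend = SageMakerBackend(
    sm_estimator=estimator,
    metrics_names=["val_loss"],     # placeholder metric reported by the script
)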

How can I benchmark different methods?

The most flexible way to do so is to write a custom launcher script, as detailed in this tutorial; see also these examples.

Note

In order to run these examples, you need to have installed Syne Tune from source.

What different schedulers do you support? What are the main differences between them?

A succinct overview of supported schedulers is provided here.

Most methods can be accessed under short names by importing from syne_tune.optimizer.baselines, which is the best place to start.

We refer to HPO algorithms as schedulers. A scheduler decides which configurations to assign to new trials, but also when to stop a running or resume a paused trial. Some schedulers delegate the first decision to a searcher. The most important differences between schedulers in the single-objective case are:

  • Does the scheduler stop trials early or pause and resume trials (HyperbandScheduler), or not (FIFOScheduler)? The former requires a resource dimension (e.g., number of epochs; size of training set) and slightly more elaborate reporting (e.g., evaluation after every epoch), but can outperform the latter by a large margin.

  • Does the searcher suggest new configurations by uniform random sampling (searcher="random") or by sequential model-based decision-making (searcher="bayesopt", searcher="kde", searcher="hypertune", searcher="botorch", searcher="dyhpo")? The latter can be more expensive if a lot of trials are run, but can also be more sample-efficient.

An overview of this landscape is given here.

Here is a tutorial for multi-fidelity schedulers. Syne Tune provides a number of further schedulers beyond those discussed above; a brief sketch of instantiating some common baselines follows.
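
This is a hedged illustration only; the configuration space and metric names are placeholders:

from syne_tune.config_space import loguniform, randint
from syne_tune.optimizer.baselines import ASHA, BayesianOptimization, RandomSearch

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "num_layers": randint(1, 8),
    "epochs": 50,  # maximum resource value for multi-fidelity schedulers
}

# No early stopping, random sampling (FIFOScheduler with searcher="random")
scheduler = RandomSearch(config_space, metric="val_loss", mode="min")

# No early stopping, model-based search (FIFOScheduler with searcher="bayesopt")
scheduler = BayesianOptimization(config_space, metric="val_loss", mode="min")

# Multi-fidelity scheduling based on per-epoch reports (HyperbandScheduler)
scheduler = ASHA(
    config_space,
    metric="val_loss",
    resource_attr="epoch",
    max_resource_attr="epochs",
    mode="min",
)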

How do I define the configuration space?

While the training script defines the function to be optimized, some care needs to be taken to define the configuration space for the hyperparameter optimization problem. This being a global optimization problem without gradients easily available, it is most important to reduce the number of parameters. A general recommendation is to use streamline_config_space() on your configuration space, which does some automatic rewriting to enforce best practices. Details on how to choose a configuration space, and on automatic rewriting, are given here.
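
A hedged sketch of this rewriting, assuming streamline_config_space can be imported from syne_tune.utils (the hyperparameter names and ranges are placeholders):

from syne_tune.config_space import choice, loguniform, uniform
from syne_tune.utils import streamline_config_space

config_space = {
    "learning_rate": uniform(1e-6, 1.0),      # wide positive range: better encoded log-uniformly
    "weight_decay": loguniform(1e-8, 1e-2),
    "batch_size": choice([16, 32, 64, 128]),  # ordered numerical values
}
# Returns a rewritten configuration space which follows best practices
config_space = streamline_config_space(config_space)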

A powerful approach is to run experiments in parallel. Namely, split your hyperparameters into groups A, B, such that HPO over B is tractable. Draw a set of N configurations from A at random, then start N HPO experiments in parallel, where in each of them the search space is over B only, while the parameters in A are fixed. Syne Tune supports massively parallel experimentation, see this tutorial.

How do I set arguments of multi-fidelity schedulers?

When running schedulers like ASHA, MOBSTER, HyperTune, SyncHyperband, or DEHB, there are parameters resource_attr, max_resource_attr, max_t, and max_resource_value. What are they for?

Full details are given in this tutorial. Multi-fidelity HPO needs metric values to be reported at regular intervals during training, for example after every epoch, or for successively larger training datasets. These reports are indexed by a resource value, which is a positive integer (for example, the number of epochs already trained).

  • resource_attr is the name of the resource attribute in the dictionary reported by the training script. For example, the script may report report(epoch=5, mean_loss=0.125) at the end of the 5-th epoch, in which case resource_attr = "epoch".

  • The training script needs to know how many resources to spend overall. For example, a neural network training script needs to know how many epochs to maximally train for. It is best practice to pass this maximum resource value as parameter into the script, which is done by making it part of the configuration space. In this case, max_resource_attr is the name of the attribute in the configuration space which contains the maximum resource value. For example, if your script should train for a maximum of 100 epochs (the scheduler may stop or pause it before, though), you could use config_space = dict(..., epochs=100), in which case max_resource_attr = "epochs".

  • Finally, you can also use max_t instead of max_resource_attr, even though this is not recommended. If you don’t want to include the maximum resource value in your configuration space, you can pass the value directly as max_t. However, this can lead to avoidable errors, and may be inefficient for some schedulers.

Note

When creating a multi-fidelity scheduler, we recommend using max_resource_attr in favour of max_t or max_resource_value, as the latter is error-prone and may be less efficient for some schedulers.
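
Putting this together, here is a hedged sketch. Names and values are placeholders, and the training script is assumed to call report(epoch=..., mean_loss=...) after every epoch:

from syne_tune.config_space import loguniform
from syne_tune.optimizer.baselines import MOBSTER

config_space = {
    "learning_rate": loguniform(1e-5, 1e-1),
    "epochs": 100,  # maximum resource value, passed to the training script
}
scheduler = MOBSTER(
    config_space,
    metric="mean_loss",
    resource_attr="epoch",       # name of the resource field in report(...)
    max_resource_attr="epochs",  # name of the entry in config_space holding the maximum resource
    mode="min",
)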

Is my training script ready for multi-fidelity tuning?

A more detailed answer to this question is given in the multi-fidelity tutorial. In short:

  • You need to define the notion of resource for your script. Resource is a discrete variable (integer), so that time/costs scale linearly in it for every configuration. A common example is epochs of training for a neural network. You need to pass the name of this argument as max_resource_attr to the multi-fidelity scheduler.

  • One input argument to your script is the maximum number of resources. Your script loops over resources until this is reached, then terminates.

  • At the end of each unit of this resource loop (e.g., after each training epoch), you report metrics. Here, you need to report the current resource level as well (e.g., the number of epochs trained so far).

  • It is recommended to support checkpointing, as is detailed here.

Note

In pause-and-resume multi-fidelity schedulers, we know for how many resources each training job runs, since it is paused at the next rung level. Such schedulers will pass this resource level via max_resource_attr to the training script. This means that the script terminates on its own and does not have to be stopped by the trial execution backend.

How can I visualize the progress of my tuning experiment with Tensorboard?

To visualize the progress of Syne Tune in Tensorboard, you can pass the TensorboardCallback to the Tuner object:

from syne_tune.callbacks import TensorboardCallback

tuner = Tuner(
    ...
    callbacks=[TensorboardCallback()],
)

Note that you need to install tensorboardX to use this callback:

pip install tensorboardX

The callback will log all metrics that are reported in your training script via the report(...) function. Now, to open Tensorboard, run:

tensorboard --logdir ~/syne-tune/{tuner-name}/tensorboard_output

If you want to plot the cumulative optimum of the metric you want to optimize, you can pass the target_metric argument to TensorboardCallback. This will also report the best hyperparameter configuration found over time. A complete example is examples/launch_tensorboard_example.py.

How can I add a new scheduler?

This is explained in detail in this tutorial, and also in examples/launch_height_standalone_scheduler.

Please do consider contributing back your efforts to the Syne Tune community, thanks!

How can I add a new tabular or surrogate benchmark?

To add a new dataset of tabular evaluations, a number of steps are required. Further details are given here.

How can I reduce delays in starting trials with the SageMaker backend?

The SageMaker backend executes each trial as a SageMaker training job, which incurs start-up delays of up to several minutes. These delays can be reduced to about 20 seconds with SageMaker managed warm pools, as detailed in this tutorial or this example. We strongly recommend using managed warm pools with the SageMaker backend.

How can I pass lists or dictionaries to the training script?

By default, the hyperparameter configuration is passed to the training script as command line arguments. This precludes parameters from having complex types, such as lists or dictionaries. The configuration can also be passed as a JSON file, in which case its entries can have any type which is JSON-serializable. This mode is activated with pass_args_as_json=True when creating the trial backend:

examples/launch_height_config_json.py
        trial_backend = LocalBackend(
            entry_point=str(entry_point),
            pass_args_as_json=True,
        )

The trial backend stores the configuration as JSON file and passes its filename as command line argument. In the training script, the configuration is loaded as follows:

examples/training_scripts/height_example/train_height_config_json.py
    parser = ArgumentParser()
    # Append required argument(s):
    add_config_json_to_argparse(parser)
    args, _ = parser.parse_known_args()
    # Loads config JSON and merges with ``args``
    config = load_config_json(vars(args))

The complete example is here. Note that entries automatically appended to the configuration by Syne Tune, such as ST_CHECKPOINT_DIR, are passed as command line arguments in any case.

How can I write extra results for an experiment?

By default, Syne Tune writes a number of result files at the end of an experiment. Among these, results.csv.zip contains all data reported by training jobs, along with time stamps. The contents of this dataframe can be customized by adding extra columns to it, as demonstrated in examples/launch_height_extra_results.py.