Launching Experiments Remotely

As a machine learning practitioner, you operate in a highly competitive landscape. Your success depends to a large extent on how quickly you can move from one experiment to the next decision. In this section, we discuss one important approach, namely how to increase the number of experiments run in parallel.

Note

Imports in our scripts are absolute against the root package transformer_wikitext2, so that only the code in benchmarking.nursery.odsc_tutorial has to be present. In order to run them, you need to append <abspath>/odsc_tutorial/ to the PYTHONPATH environment variable. This is required even if you have installed Syne Tune from source.

Launching our Study

Here is how we specified and ran experiments of our study. First, we specify a script for launching experiments locally:

transformer_wikitext2/local/hpo_main.py
from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.hpo_main_local import main


if __name__ == "__main__":
    main(methods, benchmark_definitions)

This is very simple, as most work is done by the generic syne_tune.experiments.launchers.hpo_main_local.main(). Note that hpo_main_local needs to be chosen, since we use the local backend.

This local launcher script accepts additional command line arguments, which configure your experiment; they are explained in detail here.
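
For instance, a single debugging run on your local machine could look as follows. This is a sketch only: the experiment tag odsc-debug is hypothetical, --method (restricting the run to one method) and --num_seeds follow the standard hpo_main_local command line interface, and ASHA stands in for one of the keys of methods in baselines.py.

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/hpo_main.py \
  --experiment_tag odsc-debug --benchmark transformer_wikitext2 \
  --method ASHA --num_seeds 1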

You can use hpo_main.py to launch experiments locally, but they’ll run sequentially, one after the other, and you need to have all dependencies installed locally. A second script is needed in order to launch many experiments in parallel:

transformer_wikitext2/local/launch_remote.py
from pathlib import Path

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments.launchers.launch_remote_local import launch_remote


if __name__ == "__main__":
    entry_point = Path(__file__).parent / "hpo_main.py"
    source_dependencies = [str(Path(__file__).parent.parent)]
    launch_remote(
        entry_point=entry_point,
        methods=methods,
        benchmark_definitions=benchmark_definitions,
        source_dependencies=source_dependencies,
    )

Once more, all the hard work is done in syne_tune.experiments.launchers.launch_remote_local.launch_remote(), where launch_remote_local needs to be chosen for the local backend. Most importantly, our previous hpo_main.py is specified as entry_point here. Here is the command to run all experiments of our study in parallel (replace ... by the absolute path to odsc_tutorial):

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/launch_remote.py \
  --experiment_tag odsc-1 --benchmark transformer_wikitext2 --num_seeds 10
  • This command launches 40 SageMaker training jobs, running 10 random repetitions (seeds) for each of the 4 methods specified in baselines.py.

  • Each SageMaker training job uses one ml.g4dn.12xlarge AWS instance. You can only run all 40 jobs in parallel if your resource limit for this instance type is 40 or larger. Each training job will run a little longer than 5 hours, as specified by max_wallclock_time.

  • You can use the --instance_type and --max_wallclock_time command line arguments to change these defaults (see the example command after this list). However, if you choose an instance type with fewer than 4 GPUs, the local backend will not be able to run 4 trials in parallel.

  • If benchmark_definitions.py defines a single benchmark only, the --benchmark argument can also be dropped.
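
For example, the following sketch launches the same study under a new (hypothetical) tag odsc-2, makes the default instance type explicit, and caps each experiment at 3 hours; --max_wallclock_time is given in seconds:

python transformer_wikitext2/local/launch_remote.py \
  --experiment_tag odsc-2 --benchmark transformer_wikitext2 --num_seeds 10 \
  --instance_type ml.g4dn.12xlarge --max_wallclock_time 10800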

When using remote launching, results of your experiments are written to S3, to the default bucket for your AWS account. Once all jobs have finished (which takes a little more than 5 hours if you have sufficient limits, and otherwise longer), you can create the comparative plot shown above, using this script:

transformer_wikitext2/local/plot_results.py
from typing import Dict, Any, Optional
import logging

from transformer_wikitext2.baselines import methods
from transformer_wikitext2.benchmark_definitions import benchmark_definitions
from syne_tune.experiments import ComparativeResults, PlotParameters


SETUPS = list(methods.keys())


def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
    return metadata["algorithm"]


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    experiment_names = ("odsc-1",)
    num_runs = 10
    download_from_s3 = False  # Set ``True`` in order to download files from S3
    # Plot parameters across all benchmarks
    plot_params = PlotParameters(
        xlabel="wall-clock time",
        aggregate_mode="iqm_bootstrap",
        grid=True,
    )
    # The creation of ``results`` downloads files from S3 (only if
    # ``download_from_s3 == True``), reads the metadata and creates an inverse
    # index. If any result files are missing, or there are too many of them,
    # warning messages are printed
    results = ComparativeResults(
        experiment_names=experiment_names,
        setups=SETUPS,
        num_runs=num_runs,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,
        download_from_s3=download_from_s3,
    )
    # Create comparative plot (single panel)
    benchmark_name = "transformer_wikitext2"
    benchmark = benchmark_definitions(sagemaker_backend=False)[benchmark_name]
    # These parameters overwrite those given at construction
    plot_params = PlotParameters(
        metric=benchmark.metric,
        mode=benchmark.mode,
        ylim=(5, 8),
    )
    results.plot(
        benchmark_name=benchmark_name,
        plot_params=plot_params,
        file_name=f"./odsc-comparison-local-{benchmark_name}.png",
    )

For details about visualization of results in Syne Tune, please consult this tutorial. In a nutshell, this is what happens:

  • Collect and filter results from all experiments of a study

  • Group them by setup (HPO method here) and aggregate over seeds

  • Create a plot in which each setup is represented by a curve with confidence bars
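
Assuming PYTHONPATH is set as above and all training jobs have finished, the plotting script is run locally without further arguments; set download_from_s3 = True in the script for the first run, so that the result files are fetched from S3. A typical invocation looks like this:

export PYTHONPATH="${PYTHONPATH}:/.../odsc_tutorial/"
python transformer_wikitext2/local/plot_results.py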