syne_tune.experiments package

class syne_tune.experiments.ExperimentResult(name, results, metadata, tuner, path)[source]

Bases: object

Wraps results dataframe and provides retrieval services.

Parameters:
  • name (str) – Name of experiment

  • results (DataFrame) – Dataframe containing results of experiment

  • metadata (Dict[str, Any]) – Metadata stored along with results

  • tuner (Tuner) – Tuner object stored along with results

  • path (Path) – local path where the experiment is stored

name: str
results: DataFrame
metadata: Dict[str, Any]
tuner: Tuner
path: Path
creation_date()[source]
Returns:

Timestamp when Tuner was created

plot_hypervolume(metrics_to_plot=None, reference_point=None, figure_path=None, **plt_kwargs)[source]

Plot best hypervolume value as a function of wallclock time

Parameters:
  • reference_point (Optional[ndarray]) – Reference point for hypervolume calculations. If None, the maximum value of each metric is used.

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot(metric_to_plot=0, figure_path=None, **plt_kwargs)[source]

Plot best metric value as a function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • plt_kwargs – Arguments to matplotlib.pyplot.plot()

plot_trials_over_time(metric_to_plot=0, figure_path=None, figsize=None)[source]

Plot trial results as a function of wallclock time

Parameters:
  • metric_to_plot (Union[str, int]) – Indicates which metric to plot; can be the index or the name of the metric. Defaults to 0 (the first metric defined)

  • figure_path (Optional[str]) – If specified, defines the path where the figure will be saved. If None, the figure is shown

  • figsize – width and height of figure

metric_mode()[source]
Return type:

Union[str, List[str]]

metric_names()[source]
Return type:

List[str]

entrypoint_name()[source]
Return type:

str

best_config(metric=0)[source]

Return the best config found for the specified metric.

Parameters:

metric (Union[str, int]) – Indicates which metric to use; can be the index or the name of the metric. Defaults to 0 (the first metric defined in the Scheduler)

Return type:

Dict[str, Any]

Returns:

Configuration corresponding to best metric value

syne_tune.experiments.load_experiment(tuner_name, download_if_not_found=True, load_tuner=False, local_path=None, experiment_name=None)[source]

Load results from an experiment

Parameters:
  • tuner_name (str) – Name of a tuning experiment previously run

  • download_if_not_found (bool) – If True, fetch results from S3 if not found locally

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

  • local_path (Optional[str]) – Path containing the experiment to load. If not specified, ~/{SYNE_TUNE_FOLDER}/ is used.

  • experiment_name (Optional[str]) – If given, this is used as first directory.

Return type:

ExperimentResult

Returns:

Result object
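
A minimal usage sketch; the tuner name below is a placeholder for an experiment you have actually run:

from syne_tune.experiments import load_experiment

# "my-tuning-experiment" is a hypothetical tuner name
tuning_experiment = load_experiment("my-tuning-experiment")
print(tuning_experiment.metric_names())
print(tuning_experiment.best_config())
# Plot the best metric value over wallclock time; shown interactively since figure_path=None
tuning_experiment.plot()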

syne_tune.experiments.get_metadata(path_filter=None, root=PosixPath('/home/docs/syne-tune'))[source]

Load meta-data for a number of experiments

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If given, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • root (Path) – Root path for experiment results. Default is experiment_path()

Return type:

Dict[str, dict]

Returns:

Dictionary from tuner name to metadata dict
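
A short sketch of scanning metadata without loading full results; the path substring is hypothetical:

from syne_tune.experiments import get_metadata

# Keep only experiments whose path contains the hypothetical substring "bench-1"
metadata = get_metadata(path_filter=lambda path: "bench-1" in path)
for tuner_name, meta in metadata.items():
    print(tuner_name, meta)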

syne_tune.experiments.list_experiments(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]

List experiments for which results are found

Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If given, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult, optional

  • root (Path) – Root path for experiment results. Default is result of experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

List[ExperimentResult]

Returns:

List of result objects
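
A hedged sketch of filtering experiments; the path substring and the "algorithm" metadata key are assumptions about what your experiments store:

from syne_tune.experiments import list_experiments

experiments = list_experiments(
    path_filter=lambda path: "2023-03" in path,  # hypothetical path substring
    experiment_filter=lambda exp: exp.metadata.get("algorithm") == "BO",  # assumed metadata key
)
for exp in experiments:
    print(exp.name, exp.best_config())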

syne_tune.experiments.load_experiments_df(path_filter=None, experiment_filter=None, root=PosixPath('/home/docs/syne-tune'), load_tuner=False)[source]
Parameters:
  • path_filter (Optional[Callable[[str], bool]]) – If given, only experiments whose path matches the filter are kept. This allows rapid filtering in the presence of many experiments.

  • experiment_filter (Optional[Callable[[ExperimentResult], bool]]) – Filter on ExperimentResult

  • root (Path) – Root path for experiment results. Default is experiment_path()

  • load_tuner (bool) – Whether to load the tuner in addition to metadata and results

Return type:

DataFrame

Returns:

Dataframe that contains all evaluations reported by tuners according to the filter given. The columns contain the trial-id, the hyperparameters evaluated, and the metrics reported via Reporter. The following columns are collected automatically:

  • st_worker_time – time spent in the worker when the report was seen

  • time – wallclock time measured by the tuner

  • decision – decision taken by the scheduler when observing the result

  • status – status of the trial that was shown to the tuner

  • config_{xx} – configuration value for the hyperparameter {xx}

  • tuner_name – name passed when instantiating the Tuner

  • entry_point_name, entry_point_path – name and path of the entry point that was tuned
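
As an illustration (the "validation_error" column is a placeholder for whatever metric your training script reports, and the metadata key is assumed), the combined dataframe can be post-processed with standard pandas operations:

from syne_tune.experiments import load_experiments_df

df = load_experiments_df(
    experiment_filter=lambda exp: exp.metadata.get("benchmark") == "my-benchmark",  # assumed metadata key
)
# Best value reported per tuning experiment (assuming smaller is better)
print(df.groupby("tuner_name")["validation_error"].min())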

class syne_tune.experiments.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.

There is one comparative plot per benchmark (aggregation of results across benchmarks is not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as a function of wall-clock time. The plot can also have several subplots, in which case results are first grouped by subplot number, then by setup.

If benchmark_key is None, there is only a single benchmark, and all results are merged together.

Both setup name and subplot number (optional) can be configured by the user, as a function of the metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which either of them returns None are not used.

When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of the result files. Common reasons are:

  • Fewer than num_runs experiments found: experiments failed, or files were not properly synced.

  • More than num_runs experiments found: this happens if initial experiments for the study failed but still wrote result files. This can be fixed either by removing those result files or by using datetime_bounds (since the failed initial experiments ran first).

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • num_runs (int) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See above

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • metadata_keys (Optional[List[str]]) – See above

  • metadata_subplot_level (bool) – See above. Defaults to False

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used
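
A minimal sketch of setting up a comparison; experiment, setup, and metric names are placeholders, and the "algorithm" metadata key is an assumption about what your experiments store:

from syne_tune.experiments import ComparativeResults, PlotParameters

def metadata_to_setup(metadata):
    # Map each experiment to a setup name; returning None drops the experiment
    return metadata.get("algorithm")  # assumed metadata key

results = ComparativeResults(
    experiment_names=("docs-comparison",),  # placeholder experiment name prefix
    setups=["RS", "BO"],                    # placeholder setup names
    num_runs=5,
    metadata_to_setup=metadata_to_setup,
    plot_params=PlotParameters(metric="validation_error", mode="min"),
)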

metadata_values(benchmark_name=None)[source]

The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.

Parameters:

benchmark_name (Optional[str]) – Name of benchmark

Return type:

Dict[str, Any]

Returns:

Nested dictionary with meta-data values

plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]

Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).

If plot_params.show_init_trials is given, the best metric value curve for data from trials with ID <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots that contain this setup. This is useful to contrast the performance of methods against the performance of one particular trial, for example the initial configuration (i.e., to show how much this can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.

If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].

If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name

  • extra_results_keys (Optional[List[str]]) – See above, optional

  • dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional

  • one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.

Return type:

Dict[str, Any]

Returns:

Dictionary with “fig”, “axs” (for further processing). If extra_results_keys is given, it also contains an “extra_results” entry as stated above
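
Continuing the sketch above (benchmark and file names are placeholders), a plot that also collects extra results per setup and seed:

out = results.plot(
    benchmark_name="my-benchmark",    # omit if there is only a single benchmark
    file_name="comparison.png",
    extra_results_keys=["trial_id"],  # any column of the result dataframe
)
fig, axs = out["fig"], out["axs"]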

class syne_tune.experiments.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]

Bases: object

Parameters specifying the figure.

If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max". For example, an accuracy value metric_val = 0.93 with mode == "max", metric_multiplier = 1, and convert_to_min == True is plotted as 1 - 0.93 = 0.07 (i.e., as an error), so that smaller is better.

Parameters:
  • metric (Optional[str]) – Name of metric, mandatory

  • mode (Optional[str]) – See above, “min” or “max”. Defaults to “min” if not given

  • title (Optional[str]) – Title of plot. If subplots is used, see SubplotParameters

  • xlabel (Optional[str]) – Label for x axis. If subplots is used, this is printed below each column. Defaults to DEFAULT_XLABEL

  • ylabel (Optional[str]) – Label for y axis. If subplots is used, this is printed left of each row

  • xlim (Optional[Tuple[float, float]]) – (x_min, x_max) for x axis. If subplots is used, see SubplotParameters

  • ylim (Optional[Tuple[float, float]]) – (y_min, y_max) for y axis.

  • metric_multiplier (Optional[float]) – See above. Defaults to 1

  • convert_to_min (Optional[bool]) – See above. Defaults to True

  • tick_params (Optional[Dict[str, Any]]) – Params for ax.tick_params

  • aggregate_mode (Optional[str]) –

    How are values across seeds aggregated?

    • “mean_and_ci”: Mean and 0.95 normal confidence interval

    • “median_percentiles”: Median and 25, 75 percentiles

    • “iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate

    Defaults to DEFAULT_AGGREGATE_MODE

  • dpi (Optional[int]) – Resolution of figure in DPI. Defaults to 200

  • grid (Optional[bool]) – Figure with grid? Defaults to False

  • subplots (Optional[SubplotParameters]) – If given, the figure consists of several subplots. See SubplotParameters

  • show_init_trials (Optional[ShowTrialParameters]) – See ShowTrialParameters

metric: str = None
mode: str = None
title: str = None
xlabel: str = None
ylabel: str = None
xlim: Tuple[float, float] = None
ylim: Tuple[float, float] = None
metric_multiplier: float = None
convert_to_min: bool = None
tick_params: Dict[str, Any] = None
aggregate_mode: str = None
dpi: int = None
grid: bool = None
subplots: SubplotParameters = None
show_init_trials: ShowTrialParameters = None
merge_defaults(default_params)[source]
Return type:

PlotParameters

class syne_tune.experiments.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]

Bases: object

Parameters specifying an arrangement of subplots. kwargs is mandatory.

Parameters:
  • nrows (Optional[int]) – Number of rows of subplot matrix

  • ncols (Optional[int]) – Number of columns of subplot matrix

  • titles (Optional[List[str]]) – If given, these are titles for each column in the arrangement of subplots. If title_each_figure == True, these are titles for each subplot. If titles is not given, then PlotParameters.title is printed on top of the leftmost column

  • title_each_figure (Optional[bool]) – See titles, defaults to False

  • kwargs (Optional[Dict[str, Any]]) – Extra arguments for plt.subplots, apart from “nrows” and “ncols”

  • legend_no (Optional[List[int]]) – Subplot indices where legend is to be shown. Defaults to [] (no legends shown). This is not relative to subplot_indices

  • xlims (Optional[List[int]]) – If this is given, it must be a list with one entry per subfigure. In this case, the global xlim is overwritten by (0, xlims[subplot_no]). If subplot_indices is given, xlims must have the same length, and xlims[j] then refers to subplot index subplot_indices[j]

  • subplot_indices (Optional[List[int]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …

nrows: int = None
ncols: int = None
titles: List[str] = None
title_each_figure: bool = None
kwargs: Dict[str, Any] = None
legend_no: List[int] = None
xlims: List[int] = None
subplot_indices: List[int] = None
merge_defaults(default_params)[source]
Return type:

SubplotParameters

class syne_tune.experiments.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]

Bases: object

Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot in which setup_name features. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.

Parameters:
  • setup_name (Optional[str]) – Setup from which the trial performance is taken

  • trial_id (Optional[int]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs <= trial_id are shown

  • new_setup_name (Optional[str]) – Name of the additional curve in legends

setup_name: str = None
trial_id: int = None
new_setup_name: str = None
merge_defaults(default_params)[source]
Return type:

ShowTrialParameters
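
A hedged sketch combining these dataclasses; metric and setup names are placeholders:

from syne_tune.experiments import (
    PlotParameters,
    ShowTrialParameters,
    SubplotParameters,
)

plot_params = PlotParameters(
    metric="validation_error",  # placeholder metric name
    mode="min",
    subplots=SubplotParameters(
        nrows=1,
        ncols=2,
        kwargs=dict(figsize=(10, 4)),  # extra arguments passed to plt.subplots
        legend_no=[0],                 # show the legend in the first subplot only
    ),
    show_init_trials=ShowTrialParameters(
        setup_name="BO",               # placeholder setup name
        trial_id=0,
        new_setup_name="initial design",
    ),
)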

class syne_tune.experiments.TrialsOfExperimentResults(experiment_names, setups, metadata_to_setup, plot_params=None, multi_fidelity_params=None, benchmark_key='benchmark', seed_key='seed', with_subdirs='*', datetime_bounds=None, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots metric results for single experiments, where the curves for different trials have different colours.

Compared to ComparativeResults, each subfigure uses data from a single experiment (one benchmark, one seed, one setup). Both benchmark and seed need to be chosen in plot(). If there are different setups, they give rise to subfigures.

If plot_params.subplots is not given, the arrangement is one row with columns corresponding to setups, and setup names as titles. Specify plot_params.subplots in order to change this arrangement (e.g., to have more than one row). Setups can be selected by using plot_params.subplots.subplot_indices. Also, if plot_params.subplots.titles is not given, we use setup names, and each subplot gets its own title (plot_params.subplots.title_each_figure is ignored).

For plot_params, we use the same PlotParameters as in ComparativeResults, but some fields are not used here (title, aggregate_mode, show_init_trials, subplots.legend_no, subplots.xlims).

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • multi_fidelity_params (Optional[MultiFidelityParameters]) – If given, we use a special variant tailored to multi-fidelity methods (see plot()).

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • seed_key (str) – Key for seed in metadata files. Defaults to “seed”.

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used
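
A minimal sketch with placeholder names and rung levels; MultiFidelityParameters is documented further below:

from syne_tune.experiments import (
    MultiFidelityParameters,
    PlotParameters,
    TrialsOfExperimentResults,
)

results = TrialsOfExperimentResults(
    experiment_names=("docs-comparison",),  # placeholder experiment name prefix
    setups=["ASHA", "BO"],                  # placeholder setup names
    metadata_to_setup=lambda metadata: metadata.get("algorithm"),  # assumed metadata key
    plot_params=PlotParameters(metric="validation_error", mode="min"),
    multi_fidelity_params=MultiFidelityParameters(
        rung_levels=[1, 3, 9, 27, 81],        # placeholder rung levels
        multifidelity_setups={"ASHA": True},  # ASHA treated as pause-and-resume (assumption)
    ),
)
results.plot(benchmark_name="my-benchmark", seed=0, file_name="trials.png")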

plot(benchmark_name=None, seed=0, plot_params=None, file_name=None)[source]

Creates a plot whose subfigures show metric data from single experiments. In general:

  • Each trial has its own color, which is cycled through periodically. The cycling depends on the largest rung level for the trial, which avoids neighboring curves having the same color

For single-fidelity methods (default, multi_fidelity_params not given):

  • The learning curve for a trial ends with ‘o’. If it reports only once at the end, this is all that is shown for the trial

For multi-fidelity methods:

  • Learning curves are plotted in contiguous chunks of execution. For pause and resume setups (those in multi_fidelity_params.pause_resume_setups), they are interrupted. Each chunk starts at the epoch after resume and ends at the epoch where the trial is paused

  • Values at rung levels are marked as ‘o’. If this is the furthest the trial got to, the marker is ‘D’ (diamond)

Results for different setups are plotted as subfigures, either using the setup in plot_params.subplots, or as columns of a single row.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • seed (int) – Seed number. Defaults to 0

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name

class syne_tune.experiments.MultiFidelityParameters(rung_levels, multifidelity_setups)[source]

Bases: object

Parameters configuring the multi-fidelity version of TrialsOfExperimentResults.

multifidelity_setups contains the names of setups which are multi-fidelity; the remaining ones are single-fidelity. It can also be a dictionary, mapping a multi-fidelity setup name to True if this is a pause-and-resume method (these are visualized differently), and to False otherwise (early stopping method).

Parameters:
  • rung_levels (List[int]) – See above. Positive integers, increasing

  • multifidelity_setups (Union[List[str], Dict[str, bool]]) – See above

rung_levels: List[int]
multifidelity_setups: Union[List[str], Dict[str, bool]]
check_params(setups)[source]
syne_tune.experiments.hypervolume_indicator_column_generator(metrics_and_modes, reference_point=None, increment=1)[source]

Returns a generator for a new dataframe column containing the best hypervolume indicator as a function of wall-clock time, based on the metrics in metrics_and_modes (metric names correspond to column names in the dataframe). For a metric with mode == "max", we use its negative.

This mapping is used to create the dataframe_column_generator argument of plot(). Since the current implementation is not incremental and quite slow, if you plot results for single-fidelity HPO methods, it is strongly recommended to also use one_result_per_trial=True:

from syne_tune.experiments import (
    ComparativeResults,
    PlotParameters,
    hypervolume_indicator_column_generator,
)

results = ComparativeResults(...)
dataframe_column_generator = hypervolume_indicator_column_generator(
    metrics_and_modes
)
plot_params = PlotParameters(
    metric="hypervolume_indicator",
    mode="max",
)
results.plot(
    benchmark_name=benchmark_name,
    plot_params=plot_params,
    dataframe_column_generator=dataframe_column_generator,
    one_result_per_trial=True,
)
Parameters:
  • metrics_and_modes (List[Tuple[str, str]]) – List of (metric, mode), see above

  • reference_point (Optional[ndarray]) – Reference point for hypervolume computation. If not given, a default value is used

  • increment (int) – If > 1, the HV indicator is linearly interpolated, which is faster. Defaults to 1 (no interpolation)

Returns:

Dataframe column generator

Subpackages

Submodules