syne_tune.experiments.visualization.plotting module

class syne_tune.experiments.visualization.plotting.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]

Bases: object

Parameters specifying an arrangement of subplots. kwargs is mandatory.

Parameters:
  • nrows (Optional[int]) – Number of rows of subplot matrix

  • ncols (Optional[int]) – Number of columns of subplot matrix

  • titles (Optional[List[str]]) – If given, these are titles for each column in the arrangement of subplots. If title_each_figure == True, these are titles for each subplot. If titles is not given, then PlotParameters.title is printed on top of the leftmost column

  • title_each_figure (Optional[bool]) – See titles, defaults to False

  • kwargs (Optional[Dict[str, Any]]) – Extra arguments for plt.subplots, apart from “nrows” and “ncols”

  • legend_no (Optional[List[int]]) – Subplot indices where legend is to be shown. Defaults to [] (no legends shown). This is not relative to subplot_indices

  • xlims (Optional[List[int]]) – If this is given, it must be a list with one entry per subfigure. In this case, the global xlim is overwritten by (0, xlims[subplot_no]). If subplot_indices is given, xlims must have the same length, and xlims[j] then refers to subplot index subplot_indices[j]

  • subplot_indices (Optional[List[int]]) – If this is given, we only plot subfigures with indices in this list, and in this order. Otherwise, we plot subfigures 0, 1, 2, …

nrows: int = None
ncols: int = None
titles: List[str] = None
title_each_figure: bool = None
kwargs: Dict[str, Any] = None
legend_no: List[int] = None
xlims: List[int] = None
subplot_indices: List[int] = None
merge_defaults(default_params)[source]
Return type:

SubplotParameters
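
For illustration, the following sketch (all values are made up for this example) configures a single row of three subplots, passing figure size and axis sharing via kwargs and showing the legend only in the first subplot:

    from syne_tune.experiments.visualization.plotting import SubplotParameters

    # A 1 x 3 arrangement; titles and figsize are illustrative values
    subplots = SubplotParameters(
        nrows=1,
        ncols=3,
        titles=["benchmark A", "benchmark B", "benchmark C"],
        kwargs=dict(figsize=(14, 4), sharey="all"),
        legend_no=[0],  # show the legend in the first subplot only
    )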

class syne_tune.experiments.visualization.plotting.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]

Bases: object

Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot in which setup_name features. This curve shows the best metric value found among trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.

Parameters:
  • setup_name (Optional[str]) – Setup from which the trial performance is taken

  • trial_id (Optional[int]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs <= trial_id are shown

  • new_setup_name (Optional[str]) – Name of the additional curve in legends

setup_name: str = None
trial_id: int = None
new_setup_name: str = None
merge_defaults(default_params)[source]
Return type:

ShowTrialParameters
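
A minimal sketch (the setup name and legend label are assumptions for the example) which adds a curve for the initial trial of one setup:

    from syne_tune.experiments.visualization.plotting import ShowTrialParameters

    # Show the best value found by trial 0 of the (hypothetical) setup "ASHA",
    # labeled "initial design" in the legend
    show_init_trials = ShowTrialParameters(
        setup_name="ASHA",
        trial_id=0,
        new_setup_name="initial design",
    )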

class syne_tune.experiments.visualization.plotting.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]

Bases: object

Parameters specifying the figure.

If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".

Parameters:
  • metric (Optional[str]) – Name of metric, mandatory

  • mode (Optional[str]) – See above, “min” or “max”. Defaults to “min” if not given

  • title (Optional[str]) – Title of plot. If subplots is used, see SubplotParameters

  • xlabel (Optional[str]) – Label for x axis. If subplots is used, this is printed below each column. Defaults to DEFAULT_XLABEL

  • ylabel (Optional[str]) – Label for y axis. If subplots is used, this is printed left of each row

  • xlim (Optional[Tuple[float, float]]) – (x_min, x_max) for x axis. If subplots is used, see SubplotParameters

  • ylim (Optional[Tuple[float, float]]) – (y_min, y_max) for y axis.

  • metric_multiplier (Optional[float]) – See above. Defaults to 1

  • convert_to_min (Optional[bool]) – See above. Defaults to True

  • tick_params (Optional[Dict[str, Any]]) – Params for ax.tick_params

  • aggregate_mode (Optional[str]) –

    How are values across seeds aggregated?

    • ”mean_and_ci”: Mean and 0.95 normal confidence interval

    • ”median_percentiles”: Median and 25, 75 percentiles

    • ”iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate

    Defaults to DEFAULT_AGGREGATE_MODE

  • dpi (Optional[int]) – Resolution of figure in DPI. Defaults to 200

  • grid (Optional[bool]) – Figure with grid? Defaults to False

  • subplots (Optional[SubplotParameters]) – If given, the figure consists of several subplots. See SubplotParameters

  • show_init_trials (Optional[ShowTrialParameters]) – See ShowTrialParameters

metric: str = None
mode: str = None
title: str = None
xlabel: str = None
ylabel: str = None
xlim: Tuple[float, float] = None
ylim: Tuple[float, float] = None
metric_multiplier: float = None
convert_to_min: bool = None
tick_params: Dict[str, Any] = None
aggregate_mode: str = None
dpi: int = None
grid: bool = None
subplots: SubplotParameters = None
show_init_trials: ShowTrialParameters = None
merge_defaults(default_params)[source]
Return type:

PlotParameters
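
A hedged sketch tying the pieces together; the metric name, axis labels, and limits are assumptions for the example, and subplots, show_init_trials refer to the sketches above:

    from syne_tune.experiments.visualization.plotting import PlotParameters

    plot_params = PlotParameters(
        metric="validation_error",  # assumed metric name reported by the experiments
        mode="min",
        xlabel="wall-clock time (s)",
        ylabel="validation error",
        xlim=(0, 3600),
        aggregate_mode="iqm_bootstrap",
        grid=True,
        subplots=subplots,                  # see SubplotParameters sketch above
        show_init_trials=show_init_trials,  # see ShowTrialParameters sketch above
    )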

syne_tune.experiments.visualization.plotting.group_results_dataframe(df)[source]
Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

syne_tune.experiments.visualization.plotting.filter_final_row_per_trial(grouped_dfs)[source]

We filter rows such that only one row per trial ID remains, namely the one with the largest time stamp. This makes sense for single-fidelity methods, whose training scripts may nevertheless have reported results after every epoch.

Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

syne_tune.experiments.visualization.plotting.enrich_results(grouped_dfs, column_name, dataframe_column_generator)[source]
Return type:

Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]

class syne_tune.experiments.visualization.plotting.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]

Bases: object

This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.

There is one comparative plot per benchmark (aggregation of results across benchmarks is not supported here). Results are grouped by setup (which usually corresponds to a method), and summary statistics are shown for each setup as a function of wall-clock time. The plot can also have several subplots, in which case results are first grouped by subplot number, then by setup.

If benchmark_key is None, there is only a single benchmark, and all results are merged together.

Both setup name and subplot number (optional) can be configured by the user, as functions of the metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which either of them returns None are not used.

When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of the result files. Common reasons are:

  • Fewer than num_runs experiments found. Experiments failed, or files were not properly synced.

  • More than num_runs experiments found. This happens if initial experiments for the study failed, but ended up writing results. This can be fixed by either removing the result files, or by using datetime_bounds (since initial failed experiments ran first).

Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is a pattern from with_subdirs, and ename a name from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.

If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.

If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.

Parameters:
  • experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)

  • setups (Iterable[str]) – Possible values of setup names

  • num_runs (int) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See above

  • metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters

  • metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional

  • benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together

  • with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”

  • datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above

  • metadata_keys (Optional[List[str]]) – See above

  • metadata_subplot_level (bool) – See above. Defaults to False

  • download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs is given

  • s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used
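
A hedged construction sketch. The experiment name prefixes, setup names, and the metadata key "algorithm" are assumptions for the example; metadata_to_setup must match whatever metadata your experiments actually wrote:

    from typing import Any, Dict, Optional

    from syne_tune.experiments.visualization.plotting import ComparativeResults

    setups = ["RS", "BO", "ASHA"]  # assumed setup (method) names

    def metadata_to_setup(metadata: Dict[str, Any]) -> Optional[str]:
        # "algorithm" is a hypothetical metadata key; returning None
        # filters the corresponding experiment out
        algorithm = metadata.get("algorithm")
        return algorithm if algorithm in setups else None

    results = ComparativeResults(
        experiment_names=("docs-1", "docs-2"),  # hypothetical prefixes
        setups=setups,
        num_runs=10,
        metadata_to_setup=metadata_to_setup,
        plot_params=plot_params,      # see PlotParameters sketch above
        metadata_keys=["n_workers"],  # hypothetical metadata key, see metadata_values()
        datetime_bounds=("2023-03-19-00-00-00", None),  # drop results from earlier attempts
    )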

metadata_values(benchmark_name=None)[source]

The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.

Parameters:

benchmark_name (Optional[str]) – Name of benchmark

Return type:

Dict[str, Any]

Returns:

Nested dictionary with meta-data values
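
Continuing the construction sketch above (which passed metadata_keys=["n_workers"], a hypothetical key), metadata values can be retrieved per benchmark; the benchmark name below is made up:

    values = results.metadata_values(benchmark_name="fcnet-protein")
    # values["n_workers"]["ASHA"] is a list with one entry per experiment;
    # with metadata_subplot_level=True it becomes values["n_workers"]["ASHA"][subplot_no]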

plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]

Create comparative plot from results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).

If plot_params.show_init_trials is given, the best metric value curve for data from trials with ID <= plot_params.show_init_trials.trial_id of a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance of one particular trial, for example the initial configuration (i.e., to show how much can be gained over it). The final metric value of this extra curve is extended until the end of the horizontal range, in order to keep it visible. The corresponding curve is labeled plot_params.show_init_trials.new_setup_name in the legend.

If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].

If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.

Parameters:
  • benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark

  • plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction.

  • file_name (Optional[str]) – If given, the figure is stored in a file of this name

  • extra_results_keys (Optional[List[str]]) – See above, optional

  • dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional

  • one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch.

Return type:

Dict[str, Any]

Returns:

Dictionary with entries “fig” and “axs” (for further processing). If extra_results_keys is given, there is also an “extra_results” entry as stated above
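
A hedged sketch of a plot() call, continuing the construction example above. The benchmark name and the "accuracy" column used by the column generator are assumptions; "trial_id" is used as an extra results key, since it is a column of Syne Tune results dataframes:

    from pandas import DataFrame, Series

    def dataframe_column_generator(df: DataFrame) -> Series:
        # Derived metric written into the column named plot_params.metric
        # ("validation_error" in the sketch above); "accuracy" is an assumed
        # column of the results dataframe
        return 1.0 - df["accuracy"]

    result = results.plot(
        benchmark_name="fcnet-protein",   # hypothetical benchmark name
        file_name="./comparison.png",
        extra_results_keys=["trial_id"],  # value at the largest time stamp, per setup and seed
        dataframe_column_generator=dataframe_column_generator,
        one_result_per_trial=False,
    )
    fig, axs = result["fig"], result["axs"]
    extra_results = result["extra_results"]  # present since extra_results_keys was given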