syne_tune.experiments.visualization.plotting module
- class syne_tune.experiments.visualization.plotting.SubplotParameters(nrows=None, ncols=None, titles=None, title_each_figure=None, kwargs=None, legend_no=None, xlims=None, subplot_indices=None)[source]
Bases: object
Parameters specifying an arrangement of subplots. kwargs is mandatory.
- Parameters:
  - nrows (Optional[int]) – Number of rows of the subplot matrix
  - ncols (Optional[int]) – Number of columns of the subplot matrix
  - titles (Optional[List[str]]) – If given, these are titles for each column in the arrangement of subplots. If title_each_figure == True, they are titles for each subplot. If titles is not given, then PlotParameters.title is printed on top of the leftmost column
  - title_each_figure (Optional[bool]) – See titles. Defaults to False
  - kwargs (Optional[Dict[str, Any]]) – Extra arguments for plt.subplots, apart from “nrows” and “ncols”
  - legend_no (Optional[List[int]]) – Subplot indices where a legend is to be shown. Defaults to [] (no legends shown). These indices are not relative to subplot_indices
  - xlims (Optional[List[int]]) – If given, this must be a list with one entry per subfigure; the global xlim is overwritten by (0, xlims[subplot_no]). If subplot_indices is given, xlims must have the same length, and xlims[j] refers to subplot index subplot_indices[j]
  - subplot_indices (Optional[List[int]]) – If given, only subfigures with indices in this list are plotted, in this order. Otherwise, subfigures 0, 1, 2, … are plotted
- nrows: int = None
- ncols: int = None
- titles: List[str] = None
- title_each_figure: bool = None
- kwargs: Dict[str, Any] = None
- legend_no: List[int] = None
- xlims: List[int] = None
- subplot_indices: List[int] = None
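As an illustration, an arrangement of one row with three subplots could be specified as follows. This is a hypothetical sketch: the titles are invented, and the import path is an assumption (construction happens inside a function so the sketch does not require Syne Tune at import time):

```python
# Hypothetical sketch: one row of three subplots sharing their y axis.
# Titles are invented; assumes Syne Tune is installed (the import is kept
# inside the function so the sketch stays self-contained).

def make_subplot_params():
    from syne_tune.experiments import SubplotParameters

    return SubplotParameters(
        nrows=1,
        ncols=3,
        kwargs=dict(sharey="all"),  # passed through to plt.subplots
        titles=["bench-A", "bench-B", "bench-C"],
        legend_no=[0],  # show the legend in the leftmost subplot only
    )
```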
- class syne_tune.experiments.visualization.plotting.ShowTrialParameters(setup_name=None, trial_id=None, new_setup_name=None)[source]
Bases: object
Parameters specifying the show_init_trials feature. This feature adds one more curve to each subplot in which setup setup_name features. This curve shows the best metric value found for trials with ID <= trial_id. The right-most value is extended as a constant line across the remainder of the x-axis, for better visibility.
- Parameters:
  - setup_name (Optional[str]) – Setup from which the trial performance is taken
  - trial_id (Optional[int]) – ID of trial. Defaults to 0. If this is positive, data from trials with IDs <= trial_id are shown
  - new_setup_name (Optional[str]) – Name of the additional curve in legends
- setup_name: str = None
- trial_id: int = None
- new_setup_name: str = None
- class syne_tune.experiments.visualization.plotting.PlotParameters(metric=None, mode=None, title=None, xlabel=None, ylabel=None, xlim=None, ylim=None, metric_multiplier=None, convert_to_min=None, tick_params=None, aggregate_mode=None, dpi=None, grid=None, subplots=None, show_init_trials=None)[source]
Bases: object
Parameters specifying the figure.
If convert_to_min == True, then smaller is better in plots. An original metric value metric_val is converted as metric_multiplier * metric_val if mode == "min", and as 1 - metric_multiplier * metric_val if mode == "max". If convert_to_min == False, we always convert as metric_multiplier * metric_val, so that larger is better if mode == "max".
- Parameters:
  - metric (Optional[str]) – Name of metric, mandatory
  - mode (Optional[str]) – See above, “min” or “max”. Defaults to “min” if not given
  - title (Optional[str]) – Title of plot. If subplots is used, see SubplotParameters
  - xlabel (Optional[str]) – Label for x axis. If subplots is used, this is printed below each column. Defaults to DEFAULT_XLABEL
  - ylabel (Optional[str]) – Label for y axis. If subplots is used, this is printed left of each row
  - xlim (Optional[Tuple[float, float]]) – (x_min, x_max) for x axis. If subplots is used, see SubplotParameters
  - ylim (Optional[Tuple[float, float]]) – (y_min, y_max) for y axis
  - metric_multiplier (Optional[float]) – See above. Defaults to 1
  - convert_to_min (Optional[bool]) – See above. Defaults to True
  - tick_params (Optional[Dict[str, Any]]) – Params for ax.tick_params
  - aggregate_mode (Optional[str]) – How are values across seeds aggregated?
    - “mean_and_ci”: Mean and 0.95 normal confidence interval
    - “median_percentiles”: Median and 25, 75 percentiles
    - “iqm_bootstrap”: Interquartile mean and 0.95 confidence interval based on the bootstrap variance estimate
    Defaults to DEFAULT_AGGREGATE_MODE
  - dpi (Optional[int]) – Resolution of figure in DPI. Defaults to 200
  - grid (Optional[bool]) – Figure with grid? Defaults to False
  - subplots (Optional[SubplotParameters]) – If given, the figure consists of several subplots. See SubplotParameters
  - show_init_trials (Optional[ShowTrialParameters]) – See ShowTrialParameters
- metric: str = None
- mode: str = None
- title: str = None
- xlabel: str = None
- ylabel: str = None
- xlim: Tuple[float, float] = None
- ylim: Tuple[float, float] = None
- metric_multiplier: float = None
- convert_to_min: bool = None
- tick_params: Dict[str, Any] = None
- aggregate_mode: str = None
- dpi: int = None
- grid: bool = None
- subplots: SubplotParameters = None
- show_init_trials: ShowTrialParameters = None
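The conversion rule described above can be written out as a small helper. This is a sketch for illustration only, not Syne Tune's internal code:

```python
# Sketch of the metric conversion described above (not Syne Tune internals).
def convert_metric_val(
    metric_val: float,
    mode: str = "min",
    metric_multiplier: float = 1.0,
    convert_to_min: bool = True,
) -> float:
    """Map an original metric value to the value that is plotted."""
    val = metric_multiplier * metric_val
    if convert_to_min and mode == "max":
        # With convert_to_min == True, larger original values come out
        # smaller, so that "smaller is better" holds in the plot
        return 1.0 - val
    return val
```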
- syne_tune.experiments.visualization.plotting.group_results_dataframe(df)[source]
- Return type: Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]
- syne_tune.experiments.visualization.plotting.filter_final_row_per_trial(grouped_dfs)[source]
We filter rows such that only one row per trial ID remains, namely the one with the largest time stamp. This makes sense for single-fidelity methods, where the training script may still have reported results after every epoch.
- Return type: Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]
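The filtering idea can be sketched with plain lists of dicts in place of the result DataFrames: per trial ID, keep only the row with the largest time stamp. The column names "trial_id" and "st_tuner_time" are assumptions for this sketch:

```python
# Minimal sketch of keeping only the final row per trial. Rows stand in for
# DataFrame rows; column names "trial_id" and "st_tuner_time" are assumed.

def final_row_per_trial(rows):
    latest = {}
    for row in rows:
        tid = row["trial_id"]
        # Keep the row with the largest time stamp for each trial ID
        if tid not in latest or row["st_tuner_time"] > latest[tid]["st_tuner_time"]:
            latest[tid] = row
    return list(latest.values())
```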
- syne_tune.experiments.visualization.plotting.enrich_results(grouped_dfs, column_name, dataframe_column_generator)[source]
- Return type: Dict[Tuple[int, str], List[Tuple[str, DataFrame]]]
- class syne_tune.experiments.visualization.plotting.ComparativeResults(experiment_names, setups, num_runs, metadata_to_setup, plot_params=None, metadata_to_subplot=None, benchmark_key='benchmark', with_subdirs='*', datetime_bounds=None, metadata_keys=None, metadata_subplot_level=False, download_from_s3=False, s3_bucket=None)[source]
Bases: object
This class loads, processes, and plots results of a comparative study, combining several experiments for different methods, seeds, and benchmarks (optional). Note that an experiment corresponds to one run of HPO, resulting in files ST_METADATA_FILENAME for metadata, and ST_RESULTS_DATAFRAME_FILENAME for time-stamped results.
There is one comparative plot per benchmark (aggregation of results across benchmarks is not supported here). Results are grouped by setup (which usually equates to method), and then summary statistics are shown for each setup as a function of wall-clock time. The plot can also have several subplots, in which case results are first grouped into subplot number, then setup.
If benchmark_key is None, there is only a single benchmark, and all results are merged together.
Both setup name and subplot number (optional) can be configured by the user, as a function of the metadata written for each experiment. The functions metadata_to_setup and metadata_to_subplot (optional) can also be used for filtering: results of experiments for which any of them returns None are not used.
When grouping results w.r.t. benchmark name and setup name, we should end up with num_runs experiments. These are (typically) random repetitions with different seeds. If, after grouping, a different number of experiments is found for some setup, a warning message is printed. In this case, we recommend checking the completeness of result files. Common reasons:
- Less than num_runs experiments found: Experiments failed, or files were not properly synced.
- More than num_runs experiments found: This happens if initial experiments for the study failed, but still ended up writing results. This can be fixed either by removing the result files, or by using datetime_bounds (since initial failed experiments ran first).
Result files have the path f"{experiment_path()}{ename}/{patt}/{ename}-*/", where patt is from with_subdirs, and ename from experiment_names. The default is with_subdirs="*". If with_subdirs is None, result files have the path f"{experiment_path()}{ename}-*/". Use this if your experiments have been run locally.
If datetime_bounds is given, it contains a tuple of strings (lower_time, upper_time), or a dictionary mapping names from experiment_names to such tuples. Both strings are time-stamps in the format ST_DATETIME_FORMAT (example: “2023-03-19-22-01-57”), and each can be None as well. This serves to filter out any result whose time-stamp does not fall within the interval (both sides are inclusive), where None means the interval is open on that side. This feature is useful to filter out results of erroneous attempts.
If metadata_keys is given, it contains a list of keys into the metadata. In this case, metadata values for these keys are extracted and can be retrieved with metadata_values(). In fact, metadata_values(benchmark_name) returns a nested dictionary, where result[key][setup_name] is a list of values. If metadata_subplot_level is True and metadata_to_subplot is given, the result structure is result[key][setup_name][subplot_no]. This should be set if different subplots share the same setup names, since otherwise metadata values are only grouped by setup name.
- Parameters:
  - experiment_names (Tuple[str, ...]) – Tuple of experiment names (prefixes, without the timestamps)
  - setups (Iterable[str]) – Possible values of setup names
  - num_runs (int) – When grouping results w.r.t. benchmark name and setup name, we should end up with this many experiments. See above
  - metadata_to_setup (Union[Callable[[Dict[str, Any]], Optional[str]], Dict[str, Callable[[Dict[str, Any]], Optional[str]]]]) – See above
  - plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Can be overwritten in plot(). See PlotParameters
  - metadata_to_subplot (Optional[Callable[[Dict[str, Any]], Optional[int]]]) – See above. Optional
  - benchmark_key (Optional[str]) – Key for benchmark in metadata files. Defaults to “benchmark”. If this is None, there is only a single benchmark, and all results are merged together
  - with_subdirs (Union[str, List[str], None]) – See above. Defaults to “*”
  - datetime_bounds (Union[Tuple[Optional[str], Optional[str]], Dict[str, Tuple[Optional[str], Optional[str]]], None]) – See above
  - metadata_keys (Optional[List[str]]) – See above
  - metadata_subplot_level (bool) – See above. Defaults to False
  - download_from_s3 (bool) – Should result files be downloaded from S3? This is supported only if with_subdirs is given
  - s3_bucket (Optional[str]) – Only if download_from_s3 == True. If not given, the default bucket for the SageMaker session is used
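A typical study setup might look as follows. All experiment names, setup names, and metadata keys below are invented for illustration, and the import path is an assumption; the construction is kept inside a function so the sketch does not require Syne Tune at import time:

```python
# Hypothetical study setup. Experiment and setup names are invented; assumes
# Syne Tune is installed (the import is kept inside the function).

def metadata_to_setup(metadata):
    # Map an experiment's metadata to a setup name; returning None filters
    # the experiment out. The "algorithm" key is an assumption.
    return metadata.get("algorithm")

def load_results():
    from syne_tune.experiments import ComparativeResults

    return ComparativeResults(
        experiment_names=("docs-study-1",),
        setups=["RS", "BO", "ASHA"],
        num_runs=5,
        metadata_to_setup=metadata_to_setup,
        # Ignore results written before this time-stamp (format
        # ST_DATETIME_FORMAT); the open upper end keeps everything after it
        datetime_bounds=("2023-03-19-00-00-00", None),
    )
```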
- metadata_values(benchmark_name=None)[source]
The nested dictionary returned has the structure result[key][setup_name], or result[key][setup_name][subplot_no] if metadata_subplot_level == True.
- Parameters:
  - benchmark_name (Optional[str]) – Name of benchmark
- Return type: Dict[str, Any]
- Returns: Nested dictionary with metadata values
- plot(benchmark_name=None, plot_params=None, file_name=None, extra_results_keys=None, dataframe_column_generator=None, one_result_per_trial=False)[source]
Create a comparative plot from the results of all experiments collected at construction, for benchmark benchmark_name (if there is a single benchmark only, this need not be given).
If plot_params.show_init_trials is given, the best metric value curve for the data from trials <= plot_params.show_init_trials.trial_id in a particular setup plot_params.show_init_trials.setup_name is shown in all subplots the setup is contained in. This is useful to contrast the performance of methods against the performance for one particular trial, for example the initial configuration (i.e., to show how much this can be improved upon). The final metric value of this extra curve is extended until the end of the horizontal range, in order to make it visible. The corresponding curve is labeled with plot_params.show_init_trials.new_setup_name in the legend.
If extra_results_keys is given, these are column names in the result dataframe. For each setup and seed, we collect the values for the largest time stamp. We return a nested dictionary extra_results, so that extra_results[setup_name][key] contains values (over seeds), where key is in extra_results_keys. If metadata_subplot_level is True and metadata_to_subplot is given, the structure is extra_results[setup_name][subplot_no][key].
If dataframe_column_generator is given, it maps a result dataframe for a single experiment to a new column named plot_params.metric. This is applied before computing cumulative maximum or minimum and aggregation over seeds. This way, we can plot derived metrics which are not contained in the results as columns. Note that the transformed dataframe is not retained.
- Parameters:
  - benchmark_name (Optional[str]) – Name of benchmark for which to plot results. Not needed if there is only one benchmark
  - plot_params (Optional[PlotParameters]) – Parameters controlling the plot. Values provided here overwrite values provided at construction
  - file_name (Optional[str]) – If given, the figure is stored in a file of this name
  - extra_results_keys (Optional[List[str]]) – See above, optional
  - dataframe_column_generator (Optional[Callable[[DataFrame], Series]]) – See above, optional
  - one_result_per_trial (bool) – If True, results for each experiment are filtered down to one row per trial (the one with the largest time stamp). This is useful for results from a single-fidelity method, where the training script reported results after every epoch
- Return type: Dict[str, Any]
- Returns: Dictionary with “fig”, “axs” (for further processing). If extra_results_keys is given, an “extra_results” entry as stated above
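Putting it together, a plot for one benchmark could be created as sketched below. The benchmark name, metric name, and output file are invented; results is assumed to be a ComparativeResults instance, and the import path is an assumption:

```python
# Sketch: create the comparative plot for one benchmark and save it to a
# file. Benchmark and metric names are invented; `results` is assumed to be
# a ComparativeResults instance.

def plot_benchmark(results):
    from syne_tune.experiments import PlotParameters

    plot_params = PlotParameters(metric="metric_valid_loss", mode="min")
    out = results.plot(
        benchmark_name="fcnet-protein",
        plot_params=plot_params,
        file_name="./comparison-fcnet-protein.png",
    )
    # out["fig"] and out["axs"] allow further processing of the figure
    return out
```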