syne_tune.blackbox_repository.simulated_tabular_backend module
- syne_tune.blackbox_repository.simulated_tabular_backend.make_surrogate(surrogate=None, surrogate_kwargs=None)
Creates a surrogate model (scikit-learn estimator).
- Parameters:
surrogate (Optional[str]) – A model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, each of which selects the corresponding scikit-learn estimator. The model is fit on top of a pipeline that applies basic feature processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).
surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator. For instance, {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen.
- Returns:
Scikit-learn estimator representing surrogate model
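The feature processing described for surrogate can be pictured with a small, standard-library-only sketch. It is a conceptual illustration, not Syne Tune's implementation: the real surrogate is a scikit-learn pipeline, and the names CONFIG_SPACE, encode and predict_1nn are hypothetical. It one-hot encodes a categorical hyperparameter and predicts with the single nearest neighbour, which is the behaviour that {"n_neighbors": 1} requests from a k-nearest-neighbours regressor.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical configuration space: a list marks a categorical hyperparameter
# (with its allowed values), the string "float" marks a numerical one.
CONFIG_SPACE = {
    "learning_rate": "float",
    "optimizer": ["adam", "sgd"],
}


def encode(config: Dict[str, object]) -> List[float]:
    """One-hot encode categorical hyperparameters, pass numerical ones through."""
    vector: List[float] = []
    for name, domain in CONFIG_SPACE.items():
        if isinstance(domain, list):
            vector.extend(1.0 if config[name] == value else 0.0 for value in domain)
        else:
            vector.append(float(config[name]))
    return vector


def predict_1nn(
    train: Sequence[Tuple[Dict[str, object], float]],
    query: Dict[str, object],
) -> float:
    """Return the objective of the closest training configuration (1 neighbour)."""
    q = encode(query)

    def squared_distance(item: Tuple[Dict[str, object], float]) -> float:
        x = encode(item[0])
        return sum((a - b) ** 2 for a, b in zip(x, q))

    return min(train, key=squared_distance)[1]


# Tabulated (configuration, objective) pairs, standing in for a blackbox table.
train = [
    ({"learning_rate": 0.1, "optimizer": "adam"}, 0.20),
    ({"learning_rate": 0.1, "optimizer": "sgd"}, 0.35),
    ({"learning_rate": 0.9, "optimizer": "adam"}, 0.50),
]
# A configuration that is not in the table is mapped to its nearest neighbour.
prediction = predict_1nn(train, {"learning_rate": 0.15, "optimizer": "sgd"})
```

The point of the sketch is that a surrogate lets the simulator return an objective for configurations that are not rows of the table.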
- class syne_tune.blackbox_repository.simulated_tabular_backend.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)
Bases: _BlackboxSimulatorBackend
Simulates a blackbox from the blackbox repository, selected by blackbox_name. See examples/launch_simulated_benchmark.py for an example of how to use it. To add a new dataset, see the “Adding a new dataset” section of syne_tune/blackbox_repository/README.md.
In each result reported to the simulator backend, the value for key elapsed_time_attr must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this is the time from the start of training until the end of the epoch. If the blackbox contains this information in a column, elapsed_time_attr should be its key.
If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track the resource level at which each trial is paused. Once a trial is resumed, all results for resources less than or equal to that level are ignored, which simulates resuming training from a checkpoint. This feature relies on result being passed to pause_trial(). If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens if support_checkpointing is False.
Note
If the blackbox maintains cumulative time (elapsed_time), this differs from what SimulatorBackend requires for elapsed_time_attr when a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the most recent resume. This conversion is done internally in _run_job_and_collect_results(), which is called for each resume. This means that the field elapsed_time_attr is not what is received from the blackbox table, but what the backend needs.
max_resource_attr plays the same role as in HyperbandScheduler. If given, it is the key in a configuration config for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).
If seed is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all _run_job_and_collect_results() calls for the same trial. This is important for pause-and-resume scheduling.
- Parameters:
blackbox_name (str) – Name of a blackbox, must have been registered in blackbox repository.
elapsed_time_attr (str) – Name of the column containing cumulative time.
max_resource_attr (Optional[str]) – See above.
seed (Optional[int]) – If given, this seed is used for all trial evaluations. Otherwise, seed is sampled at random for each trial. Only relevant for blackboxes with multiple seeds.
support_checkpointing (bool) – If False, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults to True.
dataset (Optional[str]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets).
surrogate (Optional[str]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: “KNeighborsRegressor”, “MLPRegressor”, “XGBRegressor”, each of which selects the corresponding scikit-learn estimator; see also make_surrogate(). The model is fit on top of a pipeline that applies basic feature processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).
surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator. For instance, {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen. If blackbox_name is a YAHPO blackbox, then surrogate_kwargs is passed as yahpo_kwargs to load_blackbox(). In this case, surrogate is ignored (YAHPO always uses surrogates).
config_space_surrogate (Optional[dict]) – If surrogate is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.
simulatorbackend_kwargs – Additional arguments to parent SimulatorBackend.
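The pause-and-resume handling described above for BlackboxRepositoryBackend (dropping results up to the paused resource level and converting cumulative time to time since the most recent resume) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of _run_job_and_collect_results(): the helper results_after_resume, its arguments, and the column names epoch, elapsed_time and error are all hypothetical.

```python
from typing import Dict, List, Optional


def results_after_resume(
    rows: List[Dict[str, float]],
    resource_attr: str,
    elapsed_time_attr: str,
    resume_from: Optional[int] = None,
) -> List[Dict[str, float]]:
    """Hypothetical helper emulating a resume from a checkpoint.

    ``rows`` holds one result per resource level, with cumulative elapsed
    time as it might be stored in a blackbox table.
    """
    if resume_from is None:
        # No paused level is known: the trial is run from scratch.
        return [dict(row) for row in rows]
    # Cumulative time already spent when the trial was paused.
    offset = max(
        (row[elapsed_time_attr] for row in rows if row[resource_attr] <= resume_from),
        default=0.0,
    )
    # Keep only results above the paused level, with time re-based to the resume.
    return [
        {**row, elapsed_time_attr: row[elapsed_time_attr] - offset}
        for row in rows
        if row[resource_attr] > resume_from
    ]


# Made-up table rows for a single configuration, one row per epoch.
table = [
    {"epoch": 1, "elapsed_time": 10.0, "error": 0.40},
    {"epoch": 2, "elapsed_time": 21.0, "error": 0.31},
    {"epoch": 3, "elapsed_time": 33.0, "error": 0.27},
]
# The trial was paused after epoch 2 and is now resumed.
resumed = results_after_resume(table, "epoch", "elapsed_time", resume_from=2)
```

Here resumed contains only the epoch 3 result, and its elapsed_time is 12.0 (33.0 minus the 21.0 already spent before the pause). With resume_from=None, which corresponds to a missing result in pause_trial() or to support_checkpointing=False, all three rows are returned unchanged.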
- class syne_tune.blackbox_repository.simulated_tabular_backend.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)
Bases: _BlackboxSimulatorBackend
Version of _BlackboxSimulatorBackend where the blackbox is given as an explicit Blackbox object. See examples/launch_simulated_benchmark.py for an example of how to use it.
Additional arguments on top of parent _BlackboxSimulatorBackend:
- Parameters:
blackbox (Blackbox) – Blackbox to be used for simulation.