syne_tune.blackbox_repository package
- class syne_tune.blackbox_repository.BlackboxOffline(df_evaluations, configuration_space, fidelity_space=None, objectives_names=None, seed_col=None)[source]
Bases: Blackbox
A blackbox obtained from offline evaluations. Each row of the dataframe should contain one evaluation for a fixed configuration, fidelity and seed. The columns must correspond to the provided configuration and fidelity spaces. By default, all columns prefixed by "metric_" are assumed to be metrics, but this can be overridden by providing metric columns.
Additional arguments on top of parent class Blackbox:
- Parameters:
df_evaluations (DataFrame) – Data frame with evaluations data
seed_col (Optional[str]) – optional, can be used when multiple seeds are recorded
- hyperparameter_objectives_values(predict_curves=False)[source]
If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), and the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).
If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), and the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).
- Returns:
a tuple of two dataframes (X, y), where X contains hyperparameter values and y contains objective values. This is used when fitting a surrogate model.
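The reshape statements above pin down how rows are ordered. The following is a minimal sketch, assuming the reshape is row-major (C order, the NumPy default) and using small hypothetical sizes. Under that assumption, with predict_curves=False the fidelity index varies slowest and the evaluation index varies fastest. The helper names flat_row and curve_row are made up for illustration and are not part of Syne Tune.

```python
from itertools import product

# Hypothetical sizes, chosen only for illustration.
num_fidelities, num_seeds, num_evals = 2, 3, 4


def flat_row(fidelity: int, seed: int, evaluation: int) -> int:
    """Row of X / y holding (fidelity, seed, evaluation) when predict_curves=False.

    Assumes a row-major reshape to (num_fidelities, num_seeds, num_evals, *).
    """
    return (fidelity * num_seeds + seed) * num_evals + evaluation


def curve_row(seed: int, evaluation: int) -> int:
    """Row of X / y holding (seed, evaluation) when predict_curves=True.

    Assumes a row-major reshape to (num_seeds, num_evals, ...).
    """
    return seed * num_evals + evaluation


# Enumerating the indices in row-major order visits the rows in sequence.
order = list(product(range(num_fidelities), range(num_seeds), range(num_evals)))
rows = [flat_row(f, s, e) for f, s, e in order]
```

In this layout, all rows for one fidelity level are contiguous, and within a fidelity level all rows for one seed are contiguous.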
- syne_tune.blackbox_repository.deserialize(path)[source]
- Parameters:
path (str) – where to find blackbox serialized information (at least data.csv.zip and configspace.json)
groupby_col – if provided, evaluations are separated by this column into one blackbox per task
- Return type:
Union[Dict[str, BlackboxOffline], BlackboxOffline]
- Returns:
dictionary of blackboxes per task, or a single blackbox in the case of a single task
- syne_tune.blackbox_repository.load_blackbox(name, custom_repo_id=None, yahpo_kwargs=None, local_files_only=False, force_download=False, **snapshot_download_kwargs)[source]
- Parameters:
name (str) – name of a blackbox present in the repository, see blackbox_list() to get the list of available blackboxes. Syne Tune currently provides evaluations for the following blackboxes:
"nasbench201": 15625 multi-fidelity configurations of computer vision architectures evaluated on 3 datasets. NAS-Bench-201: Extending the scope of reproducible neural architecture search. Dong, X. and Yang, Y. 2020.
"fcnet": 62208 multi-fidelity configurations of MLP evaluated on 4 datasets. Tabular benchmarks for joint architecture and hyperparameter optimization. Klein, A. and Hutter, F. 2019.
"lcbench": 2000 multi-fidelity PyTorch model configurations evaluated on many datasets. Reference: Auto-PyTorch: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. Lucas Zimmer, Marius Lindauer, Frank Hutter. 2020.
"icml-deepar": 2420 single-fidelity configurations of DeepAR forecasting algorithm evaluated on 10 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
"pd1": 23 multi-fidelity benchmarks for hyperparameter optimization of neural networks for image classification. Pre-trained Gaussian processes for Bayesian optimization. Wang, Z. and Dahl G. and Swersky K. and Lee C. and Nado Z. and Gilmer J. and Snoek J. and Ghahramani Z. 2021.
"icml-xgboost": 5000 single-fidelity configurations of XGBoost evaluated on 9 datasets. A quantile-based approach for hyperparameter transfer learning. Salinas, D., Shen, H., and Perrone, V. 2021.
"yahpo-*": A number of different benchmarks from YAHPO Gym. Note that these blackboxes come with surrogates already, so there is no need to wrap them into SurrogateBlackbox.
"hpob_*": ca. 6.34 million evaluations distributed on 16 search spaces and 101 datasets. HPO-B: A Large-Scale Reproducible Benchmark for Black-Box HPO based on OpenML. S. Arango, H. Jomaa, M. Wistuba, J. Grabocka, 2021.
"tabrepo-*": TabRepo contains the predictions and metrics of 1530 models evaluated on 211 classification and regression datasets. TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications. D. Salinas, N. Erickson, 2024.
custom_repo_id (Optional[str]) – custom Hugging Face repo id to use, defaults to the Syne Tune hub
yahpo_kwargs (Optional[dict]) – For a YAHPO blackbox (name == "yahpo-*"), these are additional arguments to instantiate_yahpo
local_files_only (bool) – whether to use local files with no internet check on the Hub
force_download (bool) – forces files to be downloaded
snapshot_download_kwargs – keyword arguments for snapshot_download (other than local_files_only and force_download)
- Return type:
- Returns:
blackbox with the given name, download it if not present.
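A call to load_blackbox can be wrapped so that callers always get a single blackbox back. The sketch below assumes that, like deserialize, load_blackbox may return either a single blackbox or a dictionary keyed by task. Both select_task and load_task (including its loader argument) are hypothetical helpers written for this illustration, not part of Syne Tune.

```python
from typing import Any, Callable, Optional


def select_task(result: Any, task: Optional[str] = None) -> Any:
    """Return one blackbox from a loader result (hypothetical helper).

    ``result`` is either a single blackbox or a dict mapping task names to
    blackboxes (an assumption about the loader's return value).
    """
    if not isinstance(result, dict):
        return result  # single-task blackbox, nothing to select
    if task is None:
        if len(result) == 1:
            return next(iter(result.values()))
        raise ValueError(f"Please choose a task from {sorted(result)}")
    if task not in result:
        raise KeyError(f"Unknown task {task!r}, available: {sorted(result)}")
    return result[task]


def load_task(
    name: str,
    task: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Any:
    """Load blackbox ``name`` and pick ``task``; ``loader`` defaults to load_blackbox."""
    if loader is None:
        # Imported lazily so the helper can be exercised without Syne Tune installed.
        from syne_tune.blackbox_repository import load_blackbox as loader
    # Extra keyword arguments (e.g. local_files_only) are forwarded unchanged.
    return select_task(loader(name, **kwargs), task)
```

Passing a stand-in loader makes the selection logic easy to check in isolation; in normal use the loader argument would be omitted so that load_blackbox is called.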
- syne_tune.blackbox_repository.blackbox_list()[source]
- Return type:
List[str]
- Returns:
list of blackboxes available
- syne_tune.blackbox_repository.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]
Fits a blackbox surrogate that can be evaluated anywhere, which can be useful for supporting interpolation/extrapolation.
- Parameters:
blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values() so that input/output are passed to estimate the model
surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).
configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. But note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.
predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.
separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits different models to each seed. If False, the data from blackbox is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults to False.
fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.
- Returns:
a blackbox where the output is obtained through the fitted surrogate
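The fit_differences transform can be illustrated in isolation. This is a minimal sketch of the idea, not Syne Tune's implementation: a cumulative objective such as elapsed_time is converted to per-step increments before fitting, and increments are summed back into a cumulative curve afterwards. The function names and sample values are made up for illustration.

```python
from itertools import accumulate
from typing import List


def to_differences(cumulative: List[float]) -> List[float]:
    """Convert a cumulative curve over fidelities into per-step increments."""
    previous = [0.0] + cumulative[:-1]
    return [cur - prev for cur, prev in zip(cumulative, previous)]


def to_cumulative(differences: List[float]) -> List[float]:
    """Inverse transform: sum increments back into a cumulative curve."""
    return list(accumulate(differences))


# Hypothetical elapsed_time values (seconds) after epochs 1 to 4.
elapsed_time = [10.0, 21.0, 33.0, 46.0]
increments = to_differences(elapsed_time)
restored = to_cumulative(increments)
```

One motivation for working with increments: summing non-negative increments always yields a non-decreasing curve, which independent predictions of the cumulative value at each fidelity do not guarantee.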
- class syne_tune.blackbox_repository.BlackboxRepositoryBackend(blackbox_name, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, dataset=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, config_space_surrogate=None, **simulatorbackend_kwargs)[source]
Bases: _BlackboxSimulatorBackend
Allows simulating a blackbox from the blackbox repository, selected by blackbox_name. See examples/launch_simulated_benchmark.py for a usage example. If you want to add a new dataset, see the "Adding a new dataset" section of syne_tune/blackbox_repository/README.md.
In each result reported to the simulator backend, the value for key elapsed_time_attr must be the time since the start of the evaluation. For example, if resource (or fidelity) equates to epochs trained, this would be the time from the start of training until the end of the epoch. If the blackbox contains this information in a column, elapsed_time_attr should be its key.
If this backend is used with pause-and-resume multi-fidelity scheduling, it needs to track the resource level at which each trial is paused. Namely, once a trial is resumed, all results for resources smaller than or equal to that level are ignored, which simulates the situation where training is resumed from a checkpoint. This feature relies on result being passed to pause_trial(). If this is not done, the backend cannot know from which resource level to resume a trial, so it starts the trial from scratch (which is equivalent to no checkpointing). The same happens if support_checkpointing is False.
Note
If the blackbox maintains cumulative time (elapsed_time), this is different from what SimulatorBackend requires for elapsed_time_attr if a pause-and-resume scheduler is used. Namely, the backend requires the time since the start of the most recent resume. This conversion is done internally in _run_job_and_collect_results(), which is called for each resume. This means that the field elapsed_time_attr is not what is received from the blackbox table, but instead what the backend needs.
max_resource_attr plays the same role as in HyperbandScheduler. If given, it is the key in a configuration config for the maximum resource. This is used by schedulers which limit each evaluation by setting this argument (e.g., promotion-based Hyperband).
If seed is given, entries of the blackbox are queried for this seed. Otherwise, a seed is drawn at random for every trial, but the same seed is used for all _run_job_and_collect_results() calls for the same trial. This is important for pause-and-resume scheduling.
- Parameters:
blackbox_name (str) – Name of a blackbox, must have been registered in the blackbox repository.
elapsed_time_attr (str) – Name of the column containing cumulative time
max_resource_attr (Optional[str]) – See above
seed (Optional[int]) – If given, this seed is used for all trial evaluations. Otherwise, a seed is sampled at random for each trial. Only relevant for blackboxes with multiple seeds
support_checkpointing (bool) – If False, the simulation does not do checkpointing, so resumed trials are started from scratch. Defaults to True
dataset (Optional[str]) – Selects different versions of the blackbox (typically, the same ML model has been trained on different datasets)
surrogate (Optional[str]) – Optionally, a model that is fitted to predict objectives given any configuration. Examples: "KNeighborsRegressor", "MLPRegressor", "XGBRegressor", which would enable using the corresponding scikit-learn estimator, see also make_surrogate(). The model is fit on top of a pipeline that applies basic feature processing to convert hyperparameter rows in X to vectors. The configuration_space hyperparameter types are used to deduce the types of columns in X (for instance, categorical hyperparameters are one-hot encoded).
surrogate_kwargs (Optional[dict]) – Arguments for the scikit-learn estimator, for instance {"n_neighbors": 1} can be used if surrogate="KNeighborsRegressor" is chosen. If blackbox_name is a YAHPO blackbox, then surrogate_kwargs is passed as yahpo_kwargs to load_blackbox(). In this case, surrogate is ignored (YAHPO always uses surrogates).
config_space_surrogate (Optional[dict]) – If surrogate is given, this is the configuration space for the surrogate blackbox. If not given, the space of the original blackbox is used. However, its numerical parameters have finite domains (categorical or ordinal), which is usually not what we want for a surrogate.
simulatorbackend_kwargs – Additional arguments to parent SimulatorBackend
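The pause-and-resume behavior described for this backend can be sketched without the backend itself. The following is an illustrative model, not the code in _run_job_and_collect_results(): given a blackbox curve with cumulative elapsed time per resource level, it drops results at or below the level where the trial was paused and reports time relative to the most recent resume. The function name, the "epoch" resource key and the sample values are assumptions made for this example.

```python
from typing import Dict, List, Optional


def results_after_resume(
    curve: List[Dict[str, float]],
    resume_from: Optional[int],
    resource_attr: str = "epoch",
    elapsed_time_attr: str = "elapsed_time",
) -> List[Dict[str, float]]:
    """Simulate which results a (resumed) trial reports.

    ``curve`` holds one result per resource level with cumulative elapsed time.
    ``resume_from`` is the resource level at which the trial was paused, or
    None if the trial starts from scratch (no checkpoint available).
    """
    offset = 0.0
    if resume_from is not None:
        # Time spent up to the checkpoint is not spent again after resuming.
        for result in curve:
            if result[resource_attr] == resume_from:
                offset = result[elapsed_time_attr]
    reported = []
    for result in curve:
        if resume_from is not None and result[resource_attr] <= resume_from:
            continue  # already covered by the checkpoint
        adjusted = dict(result)
        adjusted[elapsed_time_attr] = result[elapsed_time_attr] - offset
        reported.append(adjusted)
    return reported


# Hypothetical blackbox curve: cumulative seconds after epochs 1 to 4.
curve = [
    {"epoch": 1, "elapsed_time": 10.0},
    {"epoch": 2, "elapsed_time": 21.0},
    {"epoch": 3, "elapsed_time": 33.0},
    {"epoch": 4, "elapsed_time": 46.0},
]
resumed = results_after_resume(curve, resume_from=2)
from_scratch = results_after_resume(curve, resume_from=None)
```

Calling the function with resume_from=None mirrors the case where result was not passed to pause_trial() or support_checkpointing is False: every resource level is reported again, with the original cumulative times.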
- class syne_tune.blackbox_repository.UserBlackboxBackend(blackbox, elapsed_time_attr, max_resource_attr=None, seed=None, support_checkpointing=True, **simulatorbackend_kwargs)[source]
Bases: _BlackboxSimulatorBackend
Version of _BlackboxSimulatorBackend, where the blackbox is given as an explicit Blackbox object. See examples/launch_simulated_benchmark.py for a usage example.
Additional arguments on top of parent _BlackboxSimulatorBackend:
- Parameters:
blackbox (Blackbox) – Blackbox to be used for simulation
Subpackages
- syne_tune.blackbox_repository.conversion_scripts package
- Subpackages
- Submodules
Submodules
- syne_tune.blackbox_repository.blackbox module
- syne_tune.blackbox_repository.blackbox_offline module
- syne_tune.blackbox_repository.blackbox_surrogate module
- syne_tune.blackbox_repository.blackbox_tabular module
- syne_tune.blackbox_repository.repository module
- syne_tune.blackbox_repository.serialize module
- syne_tune.blackbox_repository.simulated_tabular_backend module
- syne_tune.blackbox_repository.utils module