syne_tune.blackbox_repository.blackbox_surrogate module
- class syne_tune.blackbox_repository.blackbox_surrogate.Columns(names=None)
Bases: BaseEstimator, TransformerMixin
- class syne_tune.blackbox_repository.blackbox_surrogate.BlackboxSurrogate(X, y, configuration_space, objectives_names, fidelity_space=None, fidelity_values=None, surrogate=None, predict_curves=False, num_seeds=1, fit_differences=None, max_fit_samples=None, name=None)
Bases: Blackbox
Fits a blackbox surrogate that can be evaluated anywhere, which is useful for supporting interpolation and extrapolation. To wrap an existing blackbox with a surrogate estimator, use add_surrogate(), which automatically extracts the X, y matrices from the available blackbox evaluations.
The surrogate regression model is provided by surrogate and has to conform to the scikit-learn fit-predict API. If predict_curves is True, the model maps features of the configuration to the whole curve over fidelities, separately for each metric and seed. This has several advantages. First, predictions are consistent: if all curves in the data respect a certain property which is retained under convex combinations, predictions have this property as well (examples: positivity, monotonicity). This is important for elapsed_time metrics. Second, the regression models are fairly compact and prediction is fast, so max_fit_samples is normally not needed.
If predict_curves is False, the model maps features from configuration and fidelity to metric values (univariate regression). In this case, properties like monotonicity are not retained. Also, training can take long and the trained models can be large.
This difference only matters if there are fidelities. Otherwise, regression is always univariate.
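The consistency argument can be illustrated with a small NumPy sketch. It is not Syne Tune code: the curves and weights are made-up values, and the weighted average stands in for any predictor whose output is a convex combination of training curves (a k-nearest-neighbours regressor with uniform weights is one such predictor).

```python
import numpy as np

# Three made-up elapsed_time curves over 5 fidelities (e.g. epochs).
# Each curve is positive and non-decreasing, as cumulative time must be.
curves = np.array([
    [1.0, 2.0, 3.5, 5.0, 7.0],
    [0.5, 1.5, 2.0, 4.0, 4.5],
    [2.0, 2.5, 3.0, 3.5, 6.0],
])

# Convex combination: non-negative weights that sum to one (illustrative values).
weights = np.array([0.2, 0.5, 0.3])
predicted_curve = weights @ curves

# The combined curve inherits both properties from the training curves.
is_positive = bool(np.all(predicted_curve > 0))
is_monotone = bool(np.all(np.diff(predicted_curve) >= 0))
print(predicted_curve, is_positive, is_monotone)
```

A univariate model that predicts each (configuration, fidelity) pair independently gives no such guarantee, which is why the curve-based mode is attractive for elapsed_time.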
If num_seeds is given, we maintain different surrogate models for each seed. Otherwise, a single surrogate model is fit to data across all seeds.
If fit_differences is given, it contains names of objectives which are cumulative sums. For these objectives, the data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives. This feature only matters if there are fidelities.
Additional arguments on top of parent class Blackbox:
- Parameters:
X (DataFrame) – dataframe containing hyperparameter values. Shape is (num_seeds * num_evals, num_hps) if predict_curves is True, (num_fidelities * num_seeds * num_evals, num_hps) otherwise
y (DataFrame) – dataframe containing objective values. Shape is (num_seeds * num_evals, num_fidelities * num_objectives) if predict_curves is True, and (num_fidelities * num_seeds * num_evals, num_objectives) otherwise
surrogate – the model that is fitted to predict objectives given any configuration, defaults to KNeighborsRegressor(n_neighbors=1). If predict_curves is True, this must be a multivariate regression model, i.e. accept target matrices in fit, where columns correspond to fidelities. Regression models from scikit-learn allow for that. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use the hyperparameter types in configuration_space to deduce the types of columns in X (for instance, Categorical values are one-hot encoded).
predict_curves (bool) – See above. Default is False (backwards compatible)
num_seeds (int) – See above
fit_differences (Optional[List[str]]) – See above
max_fit_samples (Optional[int]) – maximum number of samples to be fed to the surrogate estimator. If more data points than this number are passed, they are subsampled without replacement. If num_seeds is used, this is a limit on the data per seed
name (Optional[str]) –
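The effect of max_fit_samples can be sketched as follows. The helper subsample_rows, its seed argument and the toy arrays are hypothetical and only illustrate subsampling without replacement; the library's own implementation may differ in detail.

```python
import numpy as np


def subsample_rows(X, y, max_fit_samples, seed=0):
    """Return at most ``max_fit_samples`` aligned rows of X and y.

    Rows are drawn without replacement. If ``max_fit_samples`` is None or
    the data is already small enough, the inputs are returned unchanged.
    """
    num_rows = X.shape[0]
    if max_fit_samples is None or num_rows <= max_fit_samples:
        return X, y
    rng = np.random.default_rng(seed)
    keep = rng.choice(num_rows, size=max_fit_samples, replace=False)
    return X[keep], y[keep]


# Toy data: row i of X is [2 * i, 2 * i + 1], and y[i] == i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_sub, y_sub = subsample_rows(X, y, max_fit_samples=4)
print(X_sub.shape, y_sub.shape)
```

Indexing X and y with the same index array keeps features and targets aligned, which matters more than the particular random generator used.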
- property fidelity_values: array | None
- Returns:
Fidelity values; or None if the blackbox has none
- property num_fidelities: int
- static make_model_pipeline(configuration_space, fidelity_space, model, predict_curves=False)
Create feature pipeline for scikit-learn model
- Parameters:
configuration_space – Configuration space
fidelity_space – Fidelity space
model – Scikit-learn model
predict_curves – Predict full curves?
- Returns:
Feature pipeline
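A feature pipeline of this kind might look as follows in plain scikit-learn. This is a sketch under assumptions, not the pipeline that make_model_pipeline builds: the column names, the StandardScaler for the numerical column and the toy data are illustrative choices. Only the one-hot encoding of categorical columns is taken from the documentation of this module.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy hyperparameter table with one numerical and one categorical column.
X = pd.DataFrame(
    {"lr": [0.1, 0.01, 0.001], "optimizer": ["adam", "sgd", "adam"]}
)
# Toy targets: one row per configuration, one column per fidelity
# (the multivariate case used when curves are predicted).
Y = np.array([[0.9, 0.5], [0.7, 0.4], [0.8, 0.3]])

features = ColumnTransformer(
    [
        ("numerical", StandardScaler(), ["lr"]),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["optimizer"]),
    ],
    sparse_threshold=0.0,  # always return a dense feature matrix
)
pipeline = Pipeline(
    [("features", features), ("model", KNeighborsRegressor(n_neighbors=1))]
)

pipeline.fit(X, Y)
predictions = pipeline.predict(X)
print(predictions.shape)
```

With a 1-nearest-neighbour model and distinct training rows, predictions at the training configurations reproduce the training targets, which makes this a convenient sanity check for the feature encoding.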
- fit_surrogate(X, y)
Fits a surrogate model to data from a blackbox. Here, the targets y can be a matrix with the number of columns equal to the number of fidelity values (the predict_curves = True case).
- Return type:
- hyperparameter_objectives_values(predict_curves=False)
If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).
If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).
- Return type:
Tuple[DataFrame, DataFrame]
- Returns:
a tuple of two dataframes (X, y), where X contains hyperparameter values and y contains objective values. This is used when fitting a surrogate model.
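The two target layouts can be made concrete with NumPy. The sizes and values below are made up, and the row ordering is an assumption inferred from the reshape targets quoted above rather than taken from the library code.

```python
import numpy as np

num_evals, num_seeds, num_fidelities, num_objectives = 4, 2, 3, 2

# Made-up objective values indexed as (seed, eval, fidelity, objective).
values = np.arange(
    num_seeds * num_evals * num_fidelities * num_objectives, dtype=float
).reshape(num_seeds, num_evals, num_fidelities, num_objectives)

# predict_curves=True layout: one row per (seed, eval); fidelities and
# objectives are flattened into the columns.
y_curves = values.reshape(num_seeds * num_evals, num_fidelities * num_objectives)

# predict_curves=False layout: one row per (fidelity, seed, eval).
y_flat = values.transpose(2, 0, 1, 3).reshape(-1, num_objectives)

# Undo both layouts with the reshapes quoted in the text.
restored_from_curves = y_curves.reshape(
    num_seeds, num_evals, num_fidelities, num_objectives
)
restored_from_flat = y_flat.reshape(
    num_fidelities, num_seeds, num_evals, num_objectives
)
print(y_curves.shape, y_flat.shape)
```

Both layouts hold the same numbers; they differ only in whether the fidelity axis is folded into the columns or into the rows.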
- syne_tune.blackbox_repository.blackbox_surrogate.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)
Fits a blackbox surrogate that can be evaluated anywhere, which is useful for supporting interpolation and extrapolation.
- Parameters:
blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values() so that inputs and outputs can be passed to estimate the model
surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor() or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).
configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. But note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.
predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.
separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits different models to each seed. If False, the data from blackbox is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults to False.
fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.
- Returns:
a blackbox where the output is obtained through the fitted surrogate
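The finite-difference transform behind fit_differences can be sketched with NumPy. The numbers are made up, and treating the first fidelity as a difference from zero is an assumption of this sketch, not a statement about how the library handles the first entry.

```python
import numpy as np

# Made-up cumulative elapsed_time curves: 2 configurations x 4 fidelities.
elapsed_time = np.array([
    [10.0, 21.0, 33.0, 46.0],
    [5.0, 9.0, 14.0, 20.0],
])

# Forward transform: per-fidelity increments. prepend=0.0 keeps the first
# entry, so the output has the same shape as the input.
increments = np.diff(elapsed_time, axis=1, prepend=0.0)

# A model fit on increments predicts increments; cumulative sums map such
# predictions back to elapsed_time curves.
reconstructed = np.cumsum(increments, axis=1)
print(increments)
```

Fitting on increments means that any model predicting non-negative increments yields a non-decreasing elapsed_time curve after the cumulative sum.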