syne_tune.blackbox_repository.blackbox_surrogate module
- class syne_tune.blackbox_repository.blackbox_surrogate.Columns(names=None)
Bases: BaseEstimator, TransformerMixin
- class syne_tune.blackbox_repository.blackbox_surrogate.BlackboxSurrogate(X, y, configuration_space, objectives_names, fidelity_space=None, fidelity_values=None, surrogate=None, predict_curves=False, num_seeds=1, fit_differences=None, max_fit_samples=None, name=None)
Bases: Blackbox
Fits a blackbox surrogate that can be evaluated at any configuration, which is useful for supporting interpolation/extrapolation. To wrap an existing blackbox with a surrogate estimator, use add_surrogate(), which automatically extracts the X, y matrices from the available blackbox evaluations.
The surrogate regression model is provided by surrogate; it has to conform to the scikit-learn fit-predict API. If predict_curves is True, the model maps features of the configuration to the whole curve over fidelities, separately for each metric and seed. This has several advantages. First, predictions are consistent: if all curves in the data respect a certain property which is retained under convex combinations, predictions have this property as well (examples: positivity, monotonicity). This is important for elapsed_time metrics. Second, the regression models are fairly compact and prediction is fast, so max_fit_samples is normally not needed.
If predict_curves is False, the model maps features from configuration and fidelity to metric values (univariate regression). In this case, properties like monotonicity are not retained. Also, training can take long and the trained models can be large.
This difference only matters if there are fidelities. Otherwise, regression is always univariate.
If num_seeds is given, we maintain different surrogate models for each seed. Otherwise, a single surrogate model is fit to data across all seeds.
If fit_differences is given, it contains names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives. This feature only matters if there are fidelities.
Additional arguments on top of parent class Blackbox:
- Parameters:
X (DataFrame) – dataframe containing hyperparameter values. Shape is (num_seeds * num_evals, num_hps) if predict_curves is True, (num_fidelities * num_seeds * num_evals, num_hps) otherwise
y (DataFrame) – dataframe containing objective values. Shape is (num_seeds * num_evals, num_fidelities * num_objectives) if predict_curves is True, and (num_fidelities * num_seeds * num_evals, num_objectives) otherwise
surrogate – the model that is fitted to predict objectives given any configuration; defaults to KNeighborsRegressor(n_neighbors=1). If predict_curves is True, this must be a multivariate regression model, i.e. accept target matrices in fit, where columns correspond to fidelities. Regression models from scikit-learn allow for that. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor(), or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use the hyperparameter types in configuration_space to deduce the types of columns in X (for instance, Categorical values are one-hot encoded).
predict_curves (bool) – See above. Default is False (backwards compatible)
num_seeds (int) – See above
fit_differences (Optional[List[str]]) – See above
max_fit_samples (Optional[int]) – maximum number of samples to be fed to the surrogate estimator. If more data points than this number are passed, they are subsampled without replacement. If num_seeds is used, this is a limit on the data per seed
name (Optional[str])
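The finite-difference transformation behind fit_differences can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper names to_differences and from_differences and the toy elapsed_time values are made up for illustration, and the sketch assumes the first difference is taken against zero.

```python
import numpy as np


def to_differences(curves):
    """Turn cumulative curves (one per row) into per-fidelity increments."""
    # prepend=0.0 keeps the first column unchanged (an assumption of this sketch)
    return np.diff(curves, axis=1, prepend=0.0)


def from_differences(increments):
    """Invert the transformation by cumulative summation."""
    return np.cumsum(increments, axis=1)


# Hypothetical elapsed_time curves: 2 evaluations, 4 fidelities (e.g. epochs)
elapsed_time = np.array([
    [10.0, 21.0, 30.0, 42.0],
    [5.0, 9.0, 15.0, 20.0],
])
increments = to_differences(elapsed_time)

# A convex combination of non-negative increments stays non-negative,
# so the curve rebuilt from it is non-decreasing.
blended = from_differences(increments.mean(axis=0, keepdims=True))
```

A model fit on increments and summed back in this way cannot produce an elapsed_time curve that decreases, as long as its predicted increments are non-negative.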
- property fidelity_values: array | None
- Returns:
Fidelity values; or None if the blackbox has none
- property num_fidelities: int
- static make_model_pipeline(configuration_space, fidelity_space, model, predict_curves=False)
Create feature pipeline for scikit-learn model
- Parameters:
configuration_space – Configuration space
fidelity_space – Fidelity space
model – Scikit-learn model
predict_curves – Predict full curves?
- Returns:
Feature pipeline
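A minimal sketch of the kind of feature pipeline described here, built from standard scikit-learn components. It is not the code behind make_model_pipeline: the column names activation and lr, the toy data, and the use of ColumnTransformer with OneHotEncoder are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical configurations: one categorical and one numerical hyperparameter
X = pd.DataFrame({
    "activation": ["relu", "tanh", "relu"],
    "lr": [0.1, 0.1, 0.01],
})
# One row per configuration, one column per fidelity (curve targets)
Y = np.array([
    [0.9, 0.6, 0.4],
    [0.8, 0.7, 0.5],
    [1.0, 0.9, 0.8],
])

# One-hot encode the categorical column, pass the numerical one through
features = ColumnTransformer([
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["activation"]),
    ("numerical", "passthrough", ["lr"]),
])
pipeline = Pipeline([
    ("features", features),
    ("model", KNeighborsRegressor(n_neighbors=1)),
])
pipeline.fit(X, Y)

# Predict a whole curve for a configuration that is not in the table
query = pd.DataFrame({"activation": ["relu"], "lr": [0.09]})
curve = pipeline.predict(query)
```

With a single nearest neighbor, the predicted curve is the stored curve of the closest configuration, here the first row.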
- fit_surrogate(X, y)
Fits a surrogate model to data from a blackbox. Here, the targets y can be a matrix with the number of columns equal to the number of fidelity values (the predict_curves = True case).
- Return type:
- hyperparameter_objectives_values(predict_curves=False)
If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).
If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).
- Return type:
Tuple[DataFrame, DataFrame]
- Returns:
a tuple of two dataframes (X, y), where X contains hyperparameter values and y contains objective values; this is used when fitting a surrogate model.
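The two layouts can be made concrete with placeholder arrays. The sketch below only demonstrates the shapes and reshapes stated above; the sizes are arbitrary, the arrays stand in for the values of y, and it assumes NumPy's default row-major order matches the row ordering of the returned dataframes.

```python
import numpy as np

num_fidelities, num_seeds, num_evals, num_objectives = 3, 2, 4, 2

# Placeholder for y in the predict_curves=False layout:
# one row per (fidelity, seed, evaluation), one column per objective
y_flat = np.arange(
    num_evals * num_seeds * num_fidelities * num_objectives, dtype=float
).reshape(num_evals * num_seeds * num_fidelities, num_objectives)
y_by_fidelity = y_flat.reshape(
    num_fidelities, num_seeds, num_evals, num_objectives
)

# Placeholder for y in the predict_curves=True layout:
# one row per (seed, evaluation), columns hold the curves of all objectives
y_curves = np.zeros((num_evals * num_seeds, num_fidelities * num_objectives))
y_by_seed = y_curves.reshape(
    num_seeds, num_evals, num_fidelities, num_objectives
)

# Curve of objective 0 for evaluation 1 under seed 0, one value per fidelity
curve = y_by_seed[0, 1, :, 0]
```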
- syne_tune.blackbox_repository.blackbox_surrogate.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)
Fits a blackbox surrogate that can be evaluated at any configuration, which is useful for supporting interpolation/extrapolation.
- Parameters:
blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values(), so that inputs and outputs can be passed to fit the model
surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor(), or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).
configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. Note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.
predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.
separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits a different model to each seed. If False, the data from blackbox is merged across all seeds, and the surrogate represents a single seed. The latter provides more data for fitting the surrogate model, but the variation between seeds is lost in the surrogate. Defaults to False.
fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.
- Returns:
a blackbox where the output is obtained through the fitted surrogate