syne_tune.blackbox_repository.blackbox_surrogate module

class syne_tune.blackbox_repository.blackbox_surrogate.Columns(names=None)[source]

Bases: BaseEstimator, TransformerMixin

fit(*args, **kwargs)[source]

transform(X)[source]

class syne_tune.blackbox_repository.blackbox_surrogate.BlackboxSurrogate(X, y, configuration_space, objectives_names, fidelity_space=None, fidelity_values=None, surrogate=None, predict_curves=False, num_seeds=1, fit_differences=None, max_fit_samples=None, name=None)[source]

Bases: Blackbox

Fits a blackbox surrogate that can be evaluated anywhere, which is useful for supporting interpolation and extrapolation. To wrap an existing blackbox with a surrogate estimator, use add_surrogate(), which automatically extracts the X, y matrices from the available blackbox evaluations.

The surrogate regression model is provided by surrogate and has to conform to the scikit-learn fit-predict API. If predict_curves is True, the model maps features of the configuration to the whole curve over fidelities, separately for each metric and seed. This has several advantages. First, predictions are consistent: if all curves in the data respect a certain property which is retained under convex combinations, predictions have this property as well (examples: positivity, monotonicity). This is important for elapsed_time metrics. Second, the regression models are fairly compact and prediction is fast, so max_fit_samples is normally not needed.
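
The consistency argument can be illustrated with a small, dependency-free sketch. It assumes a predictor whose output is a convex combination of training curves (a nearest-neighbour average is one such case); whether a particular regression model behaves this way depends on the model. The curves, weights, and helper names below are hypothetical.

```python
def average_curves(curves, weights):
    """Convex combination of curves: weights are non-negative and sum to 1."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-12
    num_fidelities = len(curves[0])
    return [
        sum(w * curve[r] for w, curve in zip(weights, curves))
        for r in range(num_fidelities)
    ]


def is_non_decreasing(curve):
    """True if every value is at least as large as its predecessor."""
    return all(a <= b for a, b in zip(curve, curve[1:]))


# Hypothetical elapsed_time curves over 4 fidelities: positive and non-decreasing
elapsed_time_curves = [
    [1.0, 2.5, 4.0, 6.0],
    [0.5, 1.0, 1.5, 2.0],
    [2.0, 2.0, 3.0, 7.0],
]
weights = [0.5, 0.25, 0.25]  # illustrative weights only

predicted = average_curves(elapsed_time_curves, weights)
print(predicted, is_non_decreasing(predicted))
```

Because each fidelity is averaged with the same non-negative weights, the predicted curve stays positive and non-decreasing whenever all input curves are.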

If predict_curves is False, the model maps features from configuration and fidelity to metric values (univariate regression). In this case, properties like monotonicity are not retained. Also, training can take long and the trained models can be large.

This difference only matters if there are fidelities. Otherwise, regression is always univariate.

If num_seeds is given, we maintain different surrogate models for each seed. Otherwise, a single surrogate model is fit to data across all seeds.

If fit_differences is given, it contains names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives. This feature only matters if there are fidelities.
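
A minimal sketch of the finite-difference transform, under the assumption that the first entry of the curve is kept unchanged and later entries are replaced by increments (how the library treats the first entry is not stated here). The function names and values are hypothetical.

```python
from itertools import accumulate


def to_differences(cumulative):
    """Keep the first entry, then store the increment between neighbours."""
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def from_differences(differences):
    """Invert the transform by taking cumulative sums."""
    return list(accumulate(differences))


# Hypothetical elapsed_time curve (a cumulative sum over fidelities)
elapsed_time = [3.0, 7.0, 12.0, 20.0]

diffs = to_differences(elapsed_time)
restored = from_differences(diffs)
print(diffs, restored)
```

One reason this helps for elapsed_time: as long as the predicted differences are non-negative, summing them back up gives a non-decreasing curve.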

Additional arguments on top of parent class Blackbox:

Parameters:

X (DataFrame) – dataframe containing hyperparameter values. Shape is (num_seeds * num_evals, num_hps) if predict_curves is True, and (num_fidelities * num_seeds * num_evals, num_hps) otherwise
y (DataFrame) – dataframe containing objective values. Shape is (num_seeds * num_evals, num_fidelities * num_objectives) if predict_curves is True, and (num_fidelities * num_seeds * num_evals, num_objectives) otherwise
surrogate – the model that is fitted to predict objectives given any configuration. Defaults to KNeighborsRegressor(n_neighbors=1). If predict_curves is True, this must be a multivariate regression model, i.e. one that accepts target matrices in fit, where columns correspond to fidelities. Regression models from scikit-learn allow for that. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor(), or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use the hyperparameter types in configuration_space to deduce the types of columns in X (for instance, Categorical values are one-hot encoded).
predict_curves (bool) – See above. Default is False (backwards compatible)
num_seeds (int) – See above
fit_differences (Optional[List[str]]) – See above
max_fit_samples (Optional[int]) – maximum number of samples to be fed to the surrogate estimator. If more data points than this number are passed, they are subsampled without replacement. If num_seeds is used, this is a limit on the data per seed
name (Optional[str]) –
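
To make the requirement on surrogate concrete, here is a toy estimator with the fit-predict shape described above, where fit accepts a target matrix with one column per fidelity. It is a dependency-free illustration of the interface only, not a drop-in replacement: in practice one would pass a scikit-learn estimator such as KNeighborsRegressor(n_neighbors=1). The class name and data are hypothetical.

```python
class NearestNeighborRegressor:
    """Toy 1-nearest-neighbour regressor exposing fit and predict."""

    def fit(self, X, y):
        # X: rows of numeric features; y: rows of targets (one column per fidelity)
        self._X = [list(row) for row in X]
        self._y = [list(row) for row in y]
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # Squared Euclidean distance to every training row
            distances = [
                sum((a - b) ** 2 for a, b in zip(query, row)) for row in self._X
            ]
            nearest = distances.index(min(distances))
            predictions.append(list(self._y[nearest]))
        return predictions


# Hypothetical data: one hyperparameter, curves over 3 fidelities
X_train = [[0.1], [0.5], [0.9]]
y_train = [[0.9, 0.7, 0.6], [0.8, 0.5, 0.3], [0.95, 0.9, 0.85]]

model = NearestNeighborRegressor().fit(X_train, y_train)
curve = model.predict([[0.45]])[0]
print(curve)
```

With a single neighbour, each predicted curve is a copy of a training curve, so any property shared by all training curves carries over to predictions.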

property fidelity_values: array | None

Returns: Fidelity values, or None if the blackbox has none

property num_fidelities: int

static make_model_pipeline(configuration_space, fidelity_space, model, predict_curves=False)[source]

Create feature pipeline for scikit-learn model

Parameters:

configuration_space – Configuration space
fidelity_space – Fidelity space
model – Scikit-learn model
predict_curves – Predict full curves?

Returns:

Feature pipeline
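
As a rough picture of what such feature processing involves, the sketch below one-hot encodes categorical columns and passes numeric columns through. This is an assumption-laden illustration: the actual pipeline may apply further steps (for example scaling) that are not described here, and the helper names and example row are hypothetical.

```python
def one_hot(value, categories):
    """Encode value as a 0/1 indicator vector over the given categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row, categorical_columns):
    """Convert a configuration into a flat numeric feature vector.

    row: dict mapping hyperparameter name to value
    categorical_columns: dict mapping name to its list of categories
    """
    features = []
    for name, value in row.items():
        if name in categorical_columns:
            features.extend(one_hot(value, categorical_columns[name]))
        else:
            features.append(float(value))
    return features


# Hypothetical configuration with one numeric and one categorical parameter
row = {"learning_rate": 0.01, "optimizer": "adam"}
categorical_columns = {"optimizer": ["sgd", "adam"]}

vector = encode_row(row, categorical_columns)
print(vector)
```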

fit_surrogate(X, y)[source]

Fits a surrogate model to data from a blackbox. Here, the targets y can be a matrix with the number of columns equal to the number of fidelity values (the predict_curves = True case).

Return type: Blackbox

hyperparameter_objectives_values(predict_curves=False)[source]

If predict_curves is False, the shape of X is (num_evals * num_seeds * num_fidelities, num_hps + 1), the shape of y is (num_evals * num_seeds * num_fidelities, num_objectives). This can be reshaped to (num_fidelities, num_seeds, num_evals, *). The final column of X is the fidelity value (only a single fidelity attribute is supported).

If predict_curves is True, the shape of X is (num_evals * num_seeds, num_hps), the shape of y is (num_evals * num_seeds, num_fidelities * num_objectives). The latter can be reshaped to (num_seeds, num_evals, num_fidelities, num_objectives).

Return type: Tuple[DataFrame, DataFrame]
Returns: a tuple of two dataframes (X, y), where X contains hyperparameter values and y contains objective values. This is used when fitting a surrogate model.
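
The reshapes described above can be spelled out as index arithmetic. The sketch assumes a row-major (C-order) layout, which is the NumPy default for reshape; the text only states the target shapes, so the ordering is an assumption. Function names and sizes are hypothetical.

```python
def row_index_univariate(f, s, e, num_seeds, num_evals):
    """Row of X and y for (fidelity f, seed s, evaluation e), predict_curves=False.

    Matches a row-major reshape to (num_fidelities, num_seeds, num_evals, *).
    """
    return (f * num_seeds + s) * num_evals + e


def position_curves(s, e, f, o, num_evals, num_objectives):
    """(row, column) in y for predict_curves=True.

    Matches a row-major reshape of y to
    (num_seeds, num_evals, num_fidelities, num_objectives).
    """
    row = s * num_evals + e
    column = f * num_objectives + o
    return row, column


# Hypothetical sizes
num_fidelities, num_seeds, num_evals, num_objectives = 3, 2, 4, 2

last_row = row_index_univariate(2, 1, 3, num_seeds, num_evals)
row, column = position_curves(1, 3, 2, 1, num_evals, num_objectives)
print(last_row, row, column)
```

In the univariate layout every (fidelity, seed, evaluation) triple gets its own row, while in the curve layout the fidelities move into the columns of y.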

syne_tune.blackbox_repository.blackbox_surrogate.add_surrogate(blackbox, surrogate=None, configuration_space=None, predict_curves=None, separate_seeds=False, fit_differences=None)[source]

Fits a blackbox surrogate that can be evaluated anywhere, which is useful for supporting interpolation and extrapolation.

Parameters:

blackbox (Blackbox) – the blackbox must implement hyperparameter_objectives_values(), so that its inputs and outputs can be used to fit the model
surrogate – the model that is fitted to predict objectives given any configuration. Possible examples: KNeighborsRegressor(n_neighbors=1), MLPRegressor(), or any estimator obeying the scikit-learn API. The model is fit on top of a pipeline that applies basic feature processing to convert rows in X to vectors. We use configuration_space to deduce the types of columns in X (categorical parameters are one-hot encoded).
configuration_space (Optional[dict]) – configuration space for the resulting blackbox surrogate. The default is blackbox.configuration_space. But note that if blackbox is tabular, the domains in blackbox.configuration_space are typically categorical even for numerical parameters.
predict_curves (Optional[bool]) – If True, the surrogate uses multivariate regression to predict metric curves over fidelities. If False, fidelity is used as input. The latter can lead to inconsistent predictions along fidelity and is typically more expensive. If not given, the default value is False if blackbox is of type BlackboxOffline, otherwise True.
separate_seeds (bool) – If True, seeds in blackbox map to seeds in the surrogate blackbox, which fits different models to each seed. If False, the data from blackbox is merged for all seeds, and the surrogate represents a single seed. The latter provides more data for the surrogate model to be fit, but the variation between seeds is lost in the surrogate. Defaults to False.
fit_differences (Optional[List[str]]) – Names of objectives which are cumulative sums. For these objectives, the y data is transformed to finite differences before fitting the model. This is recommended for elapsed_time objectives.

Returns:

a blackbox where the output is obtained through the fitted surrogate