Combining a Gaussian Process Model from Components
We have already seen above how to implement a surrogate model from scratch. However, many Gaussian process models proposed in the Bayesian optimization literature are combinations of more basic underlying models. In this section, we show how such combinations are implemented in Syne Tune.
Note
When planning to implement a new Gaussian process model, you should first check whether the outcome is simply a Gaussian process with mean and covariance function arising from combinations of means and kernels of the components. If that is the case, it is often simpler and more efficient to implement a new mean and covariance function using existing code (as shown above), and to use a standard GP model with these functions.
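As a minimal illustration (plain NumPy, deliberately not using Syne Tune classes): the sum of two covariance functions is again a covariance function, so a model combining two GPs additively is itself a GP with the summed kernel, and no new model class is needed.

import numpy as np

def exp_kernel(x1: np.ndarray, x2: np.ndarray, lengthscale: float) -> np.ndarray:
    # Exponential kernel exp(-||x - x'||_1 / lengthscale): a valid covariance
    # function (product of one-dimensional Ornstein-Uhlenbeck kernels)
    dists = np.abs(x1[:, None, :] - x2[None, :, :]).sum(axis=-1)
    return np.exp(-dists / lengthscale)

def composite_kernel(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    # Weighted sum of two kernels with different length scales: again a valid
    # kernel, so a standard GP model can use it directly
    return exp_kernel(x1, x2, 1.0) + 0.5 * exp_kernel(x1, x2, 5.0)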
Independent Processes for Multiple Fidelities
In this section, we will look at the example of independent, providing a surrogate model for a set of functions \(y(\mathbf{x}, r)\), where \(r\in \mathcal{R}\) is an integer from a finite set. This model is used in the context of multi-fidelity HPO.
Each \(y(\mathbf{x}, r)\) is represented by an independent Gaussian process, with mean function \(\mu_r(\mathbf{x})\) and covariance function \(c_r k(\mathbf{x}, \mathbf{x}')\). The covariance function \(k\) is shared between all the processes, but the scale parameters \(c_r > 0\) are different for each process. In multi-fidelity HPO, we observe more data at smaller resource levels \(r\). Using the same ARD-parameterized kernel for all processes allows sharing statistical strength between the different levels. The code in independent follows a useful pattern:
- IndependentGPPerResourcePosteriorState: Posterior state, representing the posterior distribution after conditioning on data. This is used (a) to compute the log marginal likelihood for fitting the model parameters, and (b) for predictions driving the acquisition function optimization.
- IndependentGPPerResourceMarginalLikelihood: Wraps code to generate the posterior state, and represents the negative log marginal likelihood function used to fit the model parameters.
- IndependentGPPerResourceModel: Wraps code for creating the likelihood object. API towards higher-level code.
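Schematically, this three-class pattern can be summarized as follows. This is a minimal sketch with hypothetical names; the real classes carry many more methods and parameters.

class MyPosteriorState:
    # Posterior distribution after conditioning on data. Drives both the
    # criterion value during fitting and predictions for acquisition functions
    def __init__(self, features, targets, **model_params): ...

    def neg_log_likelihood(self): ...

    def predict(self, test_features): ...

class MyMarginalLikelihood:
    # Owns the model parameters; creates posterior states from data and
    # represents the negative log marginal likelihood criterion
    def get_posterior_state(self, data) -> MyPosteriorState: ...

class MyModel:
    # Wraps the likelihood object; the API towards higher-level code, which
    # triggers parameter fitting and posterior computations
    def fit(self, data): ...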
The code of IndependentGPPerResourcePosteriorState is a simple reduction to GaussProcPosteriorState, the posterior state for a basic Gaussian process. For example, here is the code to compute the posterior state:
def _compute_states(
    self,
    features: np.ndarray,
    targets: np.ndarray,
    kernel: KernelFunction,
    mean: Dict[int, MeanFunction],
    covariance_scale: Dict[int, np.ndarray],
    noise_variance: Dict[int, np.ndarray],
    resource_attr_range: Tuple[int, int],
    debug_log: bool = False,
):
    features, resources = decode_extended_features(features, resource_attr_range)
    self._states = dict()
    for resource, mean_function in mean.items():
        cov_scale = covariance_scale[resource]
        rows = np.flatnonzero(resources == resource)
        if rows.size > 0:
            r_features = features[rows]
            r_targets = targets[rows]
            self._states[resource] = GaussProcPosteriorState(
                features=r_features,
                targets=r_targets,
                mean=mean_function,
                kernel=(kernel, cov_scale),
                noise_variance=noise_variance[resource],
                debug_log=debug_log,
            )
- mean and covariance_scale are dictionaries containing \(\mu_r\) and \(c_r\) respectively.
- features are extended features of the form \((\mathbf{x}_i, r_i)\). The function decode_extended_features maps this to arrays \([\mathbf{x}_i]\) and \([r_i]\).
- We compute separate posterior states for each level \(r\in\mathcal{R}\), using the data \((\mathbf{x}_i, y_i)\) for which \(r_i = r\).
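The following sketch conveys what decode_extended_features does conceptually. The column layout and normalization assumed here are illustrative only (we pretend the resource level occupies the last column, scaled to \([0, 1]\) over resource_attr_range); consult the Syne Tune source for the actual encoding.

import numpy as np
from typing import Tuple

def decode_extended_features_sketch(
    features_ext: np.ndarray, resource_attr_range: Tuple[int, int]
) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical layout: configurations x_i in all but the last column,
    # resource r_i in the last column, normalized to [0, 1] over [r_min, r_max]
    r_min, r_max = resource_attr_range
    features = features_ext[:, :-1]
    resources = np.rint(
        r_min + features_ext[:, -1] * (r_max - r_min)
    ).astype(int)
    return features, resources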
Other methods of the base class PosteriorStateWithSampleJoint are implemented accordingly, reducing computations to the states for each level.
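For example, predictive means and variances can be obtained by dispatching each extended test point to the posterior state of its resource level. A sketch of this reduction (names hypothetical, fantasy-sample handling omitted):

import numpy as np

def predict_per_level(states: dict, features: np.ndarray, resources: np.ndarray):
    # states maps each resource level r to a per-level posterior state whose
    # predict() returns (means, variances); rows are dispatched by level
    means = np.zeros(features.shape[0])
    variances = np.zeros(features.shape[0])
    for resource, state in states.items():
        rows = np.flatnonzero(resources == resource)
        if rows.size > 0:
            means[rows], variances[rows] = state.predict(features[rows])
    return means, variances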
The code of IndependentGPPerResourceMarginalLikelihood is straightforward, given the base class MarginalLikelihood. The same holds for IndependentGPPerResourceModel, given the base class GaussianProcessOptimizeModel.
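This reduction works because for independent processes, the joint marginal likelihood factorizes over resource levels, so the overall negative log marginal likelihood is simply a sum of the per-level criteria. A sketch (method name hypothetical):

def neg_log_likelihood_per_level(states: dict) -> float:
    # Independence across levels: the negative log of a product of per-level
    # marginal likelihoods is the sum of the per-level negative logs
    return sum(state.neg_log_likelihood() for state in states.values())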
One interesting feature is that the creation of the likelihood object is delayed, because the set of rung levels \(\mathcal{R}\) of the multi-fidelity scheduler needs to be known first. The create_likelihood method is called in configure_scheduler(), a callback function which receives the scheduler as argument.
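Such delayed creation can be realized by storing a factory in the model and instantiating the likelihood only once the rung levels are available. A minimal sketch of this pattern (names hypothetical):

class LazyLikelihoodModel:
    def __init__(self, likelihood_factory):
        # likelihood_factory: callable mapping rung levels to a likelihood
        self._likelihood_factory = likelihood_factory
        self._likelihood = None

    def create_likelihood(self, rung_levels):
        # Called once the scheduler (and hence its rung levels) is known,
        # e.g. from a configure_scheduler() callback
        self._likelihood = self._likelihood_factory(rung_levels)

    @property
    def likelihood(self):
        assert self._likelihood is not None, "call create_likelihood() first"
        return self._likelihood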
Since our independent GP model implements the APIs of MarginalLikelihood and GaussianProcessOptimizeModel, we can plug it into the generic code in syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model, which works as outlined above. In particular, the estimator GaussProcEmpiricalBayesEstimator accepts a gp_model of type IndependentGPPerResourceModel, and it creates predictors of type GaussProcPredictor.
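The division of labor here is that an estimator fits model parameters on data and returns a predictor, which the acquisition function then queries. A generic sketch of this contract (not the actual Syne Tune signatures):

class Estimator:
    # Fits model parameters (e.g. by marginal likelihood optimization) and
    # hands out a predictor conditioned on the data
    def __init__(self, model):
        self._model = model

    def fit(self, features, targets) -> "Predictor":
        self._model.fit(features, targets)
        return Predictor(self._model)

class Predictor:
    # Thin wrapper exposing posterior predictions to acquisition functions
    def __init__(self, model):
        self._model = model

    def predict(self, test_features):
        return self._model.predict(test_features)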
Overview of gpautograd
Most of the code in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd adheres to the same pattern (posterior state, likelihood function, model wrapper):
- Standard GP model: GaussProcPosteriorState, GaussianProcessMarginalLikelihood, GaussianProcessRegression. This also covers multi-task GP models for multi-fidelity, by way of extended configurations.
- Independent GP models for multi-fidelity (example above): IndependentGPPerResourcePosteriorState, IndependentGPPerResourceMarginalLikelihood, IndependentGPPerResourceModel.
- Hyper-Tune independent GP models for multi-fidelity: HyperTuneIndependentGPPosteriorState, HyperTuneIndependentGPMarginalLikelihood, HyperTuneIndependentGPModel.
- Hyper-Tune multi-task GP models for multi-fidelity: HyperTuneJointGPPosteriorState, HyperTuneJointGPMarginalLikelihood, HyperTuneJointGPModel.
- Linear state space learning curve models: IncrementalUpdateGPAdditivePosteriorState, GaussAdditiveMarginalLikelihood, GaussianProcessLearningCurveModel. This code is still experimental.