Combining a Gaussian Process Model from Components
We have already seen above how to implement a surrogate model from scratch. However, many Gaussian process models proposed in the Bayesian optimization literature are combinations of more basic underlying models. In this section, we show how such combinations are implemented in Syne Tune.
Note
When planning to implement a new Gaussian process model, you should first check whether the outcome is simply a Gaussian process with mean and covariance function arising from combinations of means and kernels of the components. If that is the case, it is often simpler and more efficient to implement a new mean and covariance function using existing code (as shown above), and to use a standard GP model with these functions.
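As a minimal illustration (plain NumPy, deliberately not using Syne Tune classes): the sum of two covariance functions is again a covariance function, so a model combining two GPs additively is itself a GP with the summed kernel, and no new model class is needed.

import numpy as np

def exp_kernel(x1: np.ndarray, x2: np.ndarray, lengthscale: float) -> np.ndarray:
    # Exponential kernel exp(-||x - x'||_1 / lengthscale): a valid covariance
    # function (product of one-dimensional Ornstein-Uhlenbeck kernels)
    dists = np.abs(x1[:, None, :] - x2[None, :, :]).sum(axis=-1)
    return np.exp(-dists / lengthscale)

def composite_kernel(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    # Weighted sum of two kernels with different length scales: again a valid
    # kernel, so a standard GP model can use it directly
    return exp_kernel(x1, x2, 1.0) + 0.5 * exp_kernel(x1, x2, 5.0)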
Independent Processes for Multiple Fidelities
In this section, we will look at the example of independent, providing a surrogate model for a set of functions \(y(\mathbf{x}, r)\), where \(r\in \mathcal{R}\) is an integer from a finite set. This model is used in the context of multi-fidelity HPO.
Each \(y(\mathbf{x}, r)\) is represented by an independent Gaussian process, with mean function \(\mu_r(\mathbf{x})\) and covariance function \(c_r k(\mathbf{x}, \mathbf{x}')\). The covariance function \(k\) is shared between all the processes, but the scale parameters \(c_r > 0\) are different for each process. In multi-fidelity HPO, we observe more data at smaller resource levels \(r\). Using the same ARD-parameterized kernel for all processes allows sharing statistical strength between the different levels. The code in independent follows a useful pattern:
- IndependentGPPerResourcePosteriorState: Posterior state, representing the posterior distribution after conditioning on data. This is used (a) to compute the log marginal likelihood for fitting the model parameters, and (b) for predictions driving the acquisition function optimization.
- IndependentGPPerResourceMarginalLikelihood: Wraps code to generate the posterior state, and represents the negative log marginal likelihood function used to fit the model parameters.
- IndependentGPPerResourceModel: Wraps code for creating the likelihood object. API towards higher-level code.
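Schematically, this three-class pattern can be summarized as follows. This is a minimal sketch with hypothetical names; the real classes carry many more methods and parameters.

class MyPosteriorState:
    # Posterior distribution after conditioning on data. Drives both the
    # criterion value during fitting and predictions for acquisition functions
    def __init__(self, features, targets, **model_params): ...

    def neg_log_likelihood(self): ...

    def predict(self, test_features): ...

class MyMarginalLikelihood:
    # Owns the model parameters; creates posterior states from data and
    # represents the negative log marginal likelihood criterion
    def get_posterior_state(self, data) -> MyPosteriorState: ...

class MyModel:
    # Wraps the likelihood object; the API towards higher-level code, which
    # triggers parameter fitting and posterior computations
    def fit(self, data): ...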
The code of IndependentGPPerResourcePosteriorState is a simple reduction to GaussProcPosteriorState, the posterior state for a basic Gaussian process. For example, here is the code to compute the posterior state:
def _compute_states(
    self,
    features: np.ndarray,
    targets: np.ndarray,
    kernel: KernelFunction,
    mean: Dict[int, MeanFunction],
    covariance_scale: Dict[int, np.ndarray],
    noise_variance: Dict[int, np.ndarray],
    resource_attr_range: Tuple[int, int],
    debug_log: bool = False,
):
    features, resources = decode_extended_features(features, resource_attr_range)
    self._states = dict()
    for resource, mean_function in mean.items():
        cov_scale = covariance_scale[resource]
        rows = np.flatnonzero(resources == resource)
        if rows.size > 0:
            r_features = features[rows]
            r_targets = targets[rows]
            self._states[resource] = GaussProcPosteriorState(
                features=r_features,
                targets=r_targets,
                mean=mean_function,
                kernel=(kernel, cov_scale),
                noise_variance=noise_variance[resource],
                debug_log=debug_log,
            )
- mean and covariance_scale are dictionaries containing \(\mu_r\) and \(c_r\) respectively.
- features are extended features of the form \((\mathbf{x}_i, r_i)\). The function decode_extended_features maps this to arrays \([\mathbf{x}_i]\) and \([r_i]\).
- We compute separate posterior states for each level \(r\in\mathcal{R}\), using the data \((\mathbf{x}_i, y_i)\) for which \(r_i = r\).
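The following sketch conveys what decode_extended_features does conceptually. The column layout and normalization assumed here are illustrative only (we pretend the resource level occupies the last column, scaled to \([0, 1]\) over resource_attr_range); consult the Syne Tune source for the actual encoding.

import numpy as np
from typing import Tuple

def decode_extended_features_sketch(
    features_ext: np.ndarray, resource_attr_range: Tuple[int, int]
) -> Tuple[np.ndarray, np.ndarray]:
    # Hypothetical layout: configurations x_i in all but the last column,
    # resource r_i in the last column, normalized to [0, 1] over [r_min, r_max]
    r_min, r_max = resource_attr_range
    features = features_ext[:, :-1]
    resources = np.rint(
        r_min + features_ext[:, -1] * (r_max - r_min)
    ).astype(int)
    return features, resources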
Other methods of the base class PosteriorStateWithSampleJoint are implemented accordingly, reducing computations to the states for each level.
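For example, predictive means and variances can be obtained by dispatching each extended test point to the posterior state of its resource level. A sketch of this reduction (names hypothetical, fantasy-sample handling omitted):

import numpy as np

def predict_per_level(states: dict, features: np.ndarray, resources: np.ndarray):
    # states maps each resource level r to a per-level posterior state whose
    # predict() returns (means, variances); rows are dispatched by level
    means = np.zeros(features.shape[0])
    variances = np.zeros(features.shape[0])
    for resource, state in states.items():
        rows = np.flatnonzero(resources == resource)
        if rows.size > 0:
            means[rows], variances[rows] = state.predict(features[rows])
    return means, variances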
The code of IndependentGPPerResourceMarginalLikelihood is straightforward, given the base class MarginalLikelihood. The same holds for IndependentGPPerResourceModel, given the base class GaussianProcessOptimizeModel.
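This reduction works because for independent processes, the joint marginal likelihood factorizes over resource levels, so the overall negative log marginal likelihood is simply a sum of the per-level criteria. A sketch (method name hypothetical):

def neg_log_likelihood_per_level(states: dict) -> float:
    # Independence across levels: the negative log of a product of per-level
    # marginal likelihoods is the sum of the per-level negative logs
    return sum(state.neg_log_likelihood() for state in states.values())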
One interesting feature is that the creation of the likelihood object is delayed, because the set of rung levels \(\mathcal{R}\) of the multi-fidelity scheduler needs to be known first. The create_likelihood method is called in configure_scheduler(), a callback function which receives the scheduler as argument.
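Such delayed creation can be realized by storing a factory in the model and instantiating the likelihood only once the rung levels are available. A minimal sketch of this pattern (names hypothetical):

class LazyLikelihoodModel:
    def __init__(self, likelihood_factory):
        # likelihood_factory: callable mapping rung levels to a likelihood
        self._likelihood_factory = likelihood_factory
        self._likelihood = None

    def create_likelihood(self, rung_levels):
        # Called once the scheduler (and hence its rung levels) is known,
        # e.g. from a configure_scheduler() callback
        self._likelihood = self._likelihood_factory(rung_levels)

    @property
    def likelihood(self):
        assert self._likelihood is not None, "call create_likelihood() first"
        return self._likelihood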
Since our independent GP model implements the APIs of MarginalLikelihood and GaussianProcessOptimizeModel, we can plug it into the generic code in syne_tune.optimizer.schedulers.searchers.bayesopt.models.gp_model, which works as outlined above. In particular, the estimator GaussProcEmpiricalBayesEstimator accepts a gp_model of type IndependentGPPerResourceModel, and it creates predictors of type GaussProcPredictor.
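The division of labor here is that an estimator fits model parameters on data and returns a predictor, which the acquisition function then queries. A generic sketch of this contract (not the actual Syne Tune signatures):

class Estimator:
    # Fits model parameters (e.g. by marginal likelihood optimization) and
    # hands out a predictor conditioned on the data
    def __init__(self, model):
        self._model = model

    def fit(self, features, targets) -> "Predictor":
        self._model.fit(features, targets)
        return Predictor(self._model)

class Predictor:
    # Thin wrapper exposing posterior predictions to acquisition functions
    def __init__(self, model):
        self._model = model

    def predict(self, test_features):
        return self._model.predict(test_features)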
Overview of gpautograd
Most of the code in syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd adheres to the same pattern (posterior state, likelihood function, model wrapper):
- Standard GP model: GaussProcPosteriorState, GaussianProcessMarginalLikelihood, GaussianProcessRegression. This also covers multi-task GP models for multi-fidelity, by way of extended configurations.
- Independent GP models for multi-fidelity (example above): IndependentGPPerResourcePosteriorState, IndependentGPPerResourceMarginalLikelihood, IndependentGPPerResourceModel.
- Hyper-Tune independent GP models for multi-fidelity: HyperTuneIndependentGPPosteriorState, HyperTuneIndependentGPMarginalLikelihood, HyperTuneIndependentGPModel.
- Hyper-Tune multi-task GP models for multi-fidelity: HyperTuneJointGPPosteriorState, HyperTuneJointGPMarginalLikelihood, HyperTuneJointGPModel.
- Linear state space learning curve models: IncrementalUpdateGPAdditivePosteriorState, GaussAdditiveMarginalLikelihood, GaussianProcessLearningCurveModel. This code is still experimental.