syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel package

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.KernelFunction(dimension, **kwargs)[source]

Bases: MeanFunction

Base class of kernel (or covariance) function \(k(x, x')\)

Parameters:

dimension (int) – Dimensionality of input points after encoding into ndarray

property dimension
Returns:

Dimension d of input points

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?
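
The distinction made by diagonal_depends_on_X() can be exploited by callers to avoid recomputing a constant diagonal. A minimal caller-side sketch (hypothetical helper, not part of the API), assuming numpy inputs:

    import numpy as np

    def kernel_diagonal(kernel, X):
        # KernelFunction subclasses expose diagonal() and diagonal_depends_on_X()
        if kernel.diagonal_depends_on_X():
            return kernel.diagonal(X)
        # Stationary case: the diagonal is constant, so evaluate it on a single
        # row and broadcast to all n points
        value = kernel.diagonal(X[:1])
        return np.full(X.shape[0], value[0])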

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.Matern52(dimension, ARD=False, encoding_type='logarithm', has_covariance_scale=True, **kwargs)[source]

Bases: KernelFunction

Block responsible for the computation of the Matern 5/2 kernel.

If ARD == False, inverse_bandwidths is a scalar, broadcast to the d components (with d = dimension, i.e., the number of features in X).

Arguments on top of base class SquaredDistance:

Parameters:

has_covariance_scale (bool) – Kernel has covariance scale parameter? Defaults to True
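
A minimal usage sketch, assuming that calling the block dispatches to forward(), that parameters are initialized at construction, and that plain numpy arrays are accepted as inputs:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        Matern52,
    )

    # Matern 5/2 kernel with ARD (one inverse bandwidth per input dimension)
    kernel = Matern52(dimension=3, ARD=True)

    X1 = np.random.uniform(size=(5, 3))  # n1 = 5 points, d = 3
    X2 = np.random.uniform(size=(4, 3))  # n2 = 4 points

    K = kernel(X1, X2)          # kernel matrix, shape (5, 4)
    diag = kernel.diagonal(X1)  # shape (5,); constant, since the kernel is stationary

    # Hyperparameters can be read and written as plain dictionaries
    params = kernel.get_params()
    kernel.set_params(params)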

property ARD: bool
forward(X1, X2)[source]

Computes Matern 5/2 kernel matrix

Parameters:
  • X1 – input matrix, shape (n1,d)

  • X2 – input matrix, shape (n2,d)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_covariance_scale()[source]
set_covariance_scale(covariance_scale)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, delta_fixed_value=None, delta_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014).
Freeze-Thaw Bayesian Optimization.

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here. Details in:

Tiao, Klein, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter Optimization

We implement a new family of kernel functions, of which the additive Freeze-Thaw kernel is one instance (delta == 0). The kernel has parameters alpha, mean_lam, gamma > 0, and 0 <= delta <= 1. Note that beta = alpha / mean_lam is used in the Freeze-Thaw paper (the Gamma distribution over lambda is parameterized differently). The additive Freeze-Thaw kernel is obtained for delta == 0 (use delta_fixed_value = 0).

In fact, this class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.
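
A construction sketch, assuming that ScalarMeanFunction from the sibling mean module can serve as mean_x, that calling the block dispatches to forward(), and that numpy arrays are accepted:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        ExponentialDecayResourcesKernelFunction,
        Matern52,
    )
    # Assumption: ScalarMeanFunction is provided by the sibling mean module
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
        ScalarMeanFunction,
    )

    d = 3
    kernel_x = Matern52(dimension=d, ARD=True)
    mean_x = ScalarMeanFunction()

    # delta_fixed_value=0 recovers the additive Freeze-Thaw kernel
    kernel = ExponentialDecayResourcesKernelFunction(
        kernel_x, mean_x, delta_fixed_value=0.0
    )

    # Extended inputs (x, r) of dimension d + 1, resource r >= 0 in the last column
    X = np.concatenate(
        [np.random.uniform(size=(5, d)), np.arange(1, 6).reshape(-1, 1)], axis=1
    )
    K = kernel(X, X)  # shape (5, 5)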

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are “alpha”, “mean_lam”, “gamma”, “delta” (only if not fixed to delta_fixed_value), as well as those of self.kernel_x (prefix “kernelx_”) and of self.mean_x (prefix “meanx_”).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ExponentialDecayResourcesMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FabolasKernelFunction(dimension=1, encoding_type='logarithm', u1_init=1.0, u3_init=0.0, **kwargs)[source]

Bases: KernelFunction

The kernel function proposed in:

Klein, A., Falkner, S., Bartels, S., Hennig, P., & Hutter, F. (2016). Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets, in AISTATS 2017. ArXiv:1605.07079 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1605.07079

Please note this is only one of the components of the factorized kernel proposed in the paper. This is the finite-rank (“degenerate”) kernel for modelling data subset fraction sizes. Defined as:

\[k(x, y) = (U \phi(x))^T (U \phi(y)), \quad x, y \in [0, 1], \quad \phi(x) = [1, (1 - x)^2]^T, \quad U = \begin{bmatrix} u_1 & u_3 \\ 0 & u_2 \end{bmatrix},\]

where \(U\) is upper triangular with \(u_1, u_2 > 0\).
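
To illustrate the formula above, here is a plain numpy computation of this rank-2 kernel for some hypothetical values of u1, u2, u3 (an illustration of the math only, not the class API):

    import numpy as np

    # Hypothetical parameter values; the class learns u1, u2, u3 itself
    u1, u2, u3 = 1.0, 0.5, 0.0
    U = np.array([[u1, u3], [0.0, u2]])  # upper triangular, u1, u2 > 0

    def phi(x):
        return np.array([1.0, (1.0 - x) ** 2])

    def k(x, y):
        # k(x, y) = (U phi(x))^T (U phi(y)) for x, y in [0, 1]
        return (U @ phi(x)) @ (U @ phi(y))

    print(k(0.25, 0.5))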

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.ProductKernelFunction(kernel1, kernel2, name_prefixes=None, **kwargs)[source]

Bases: KernelFunction

Given two kernel functions K1, K2, this class represents the product kernel function given by

\[((x_1, x_2), (y_1, y_2)) \mapsto K_1(x_1, y_1) \cdot K_2(x_2, y_2)\]

We assume that parameters of K1 and K2 are disjoint.
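
A construction sketch. Assumptions (not confirmed by this page): the product kernel's inputs are the concatenated columns of its factors, so the first kernel sees the leading columns and the second the remaining ones; calling the block dispatches to forward():

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        Matern52,
        ProductKernelFunction,
    )

    k1 = Matern52(dimension=2)  # acts on x_1 (first two columns, by assumption)
    k2 = Matern52(dimension=1)  # acts on x_2 (last column, by assumption)
    kernel = ProductKernelFunction(k1, k2)

    X = np.random.uniform(size=(6, 3))
    K = kernel(X, X)  # entrywise product K1(x_1, y_1) * K2(x_2, y_2)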

forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: We assume that K1 and K2 have disjoint parameters, otherwise there will be a redundancy here.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawKernelFunction(kernel_x, mean_x, encoding_type='logarithm', alpha_init=1.0, mean_lam_init=0.5, gamma_init=0.5, max_metric_value=1.0, **kwargs)[source]

Bases: KernelFunction

Variant of the kernel function for modeling exponentially decaying learning curves, proposed in:

Swersky, K., Snoek, J., & Adams, R. P. (2014). Freeze-Thaw Bayesian Optimization. ArXiv:1406.3896 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1406.3896

The argument in that paper actually justifies using a non-zero mean function (see ExponentialDecayResourcesMeanFunction) and centralizing the kernel proposed there. This is done here.

As in the Freeze-Thaw paper, learning curves for different configs are conditionally independent.

This class is configured with a kernel and a mean function over inputs x (dimension d) and represents a kernel (and mean function) over inputs (x, r) (dimension d + 1), where the resource attribute r >= 0 is last.

Note: This kernel is mostly for debugging! Its conditional independence assumptions allow for faster inference, as implemented in GaussProcExpDecayPosteriorState.
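
A construction sketch, under the same assumptions as for ExponentialDecayResourcesKernelFunction above (ScalarMeanFunction from the sibling mean module as mean_x, direct calls dispatching to forward()):

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        FreezeThawKernelFunction,
        Matern52,
    )
    # Assumption: ScalarMeanFunction is provided by the sibling mean module
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
        ScalarMeanFunction,
    )

    d = 2
    kernel = FreezeThawKernelFunction(
        kernel_x=Matern52(dimension=d, ARD=True), mean_x=ScalarMeanFunction()
    )

    # Extended inputs (x, r), resource r >= 0 in the last column
    X = np.concatenate(
        [np.random.uniform(size=(3, d)), np.array([[1.0], [2.0], [4.0]])], axis=1
    )
    K = kernel(X, X)
    mu = kernel.mean_function(X)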

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]

Parameter keys are “alpha”, “mean_lam”, “gamma”, as well as those of self.kernel_x (prefix “kernelx_”) and of self.mean_x (prefix “meanx_”).

Return type:

Dict[str, Any]

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.FreezeThawMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationMeanFunction(kernel, **kwargs)[source]

Bases: MeanFunction

forward(X)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.CrossValidationKernelFunction(kernel_main, kernel_residual, mean_main, num_folds, **kwargs)[source]

Bases: KernelFunction

Kernel function suitable for \(f(x, r)\) being the average of r validation metrics evaluated on different (train, validation) splits.

More specifically, there are num_folds such splits, and \(f(x, r)\) is the average over the first r of them.

We model the score on fold k as \(e_k(x) = f(x) + g_k(x)\), where \(f(x)\) and the \(g_k(x)\) are a priori independent Gaussian processes with kernels kernel_main and kernel_residual (all \(g_k\) share the same kernel). Moreover, the \(g_k\) are zero-mean, while \(f(x)\) may have a mean function. Then:

\[f(x, r) = r^{-1} \sum_{k \le r} e_k(x),\]

\[k((x, r), (x', r')) = k_{\mathrm{main}}(x, x') + \frac{k_{\mathrm{residual}}(x, x')}{\mathrm{max}(r, r')}\]

Note that kernel_main, kernel_residual are over inputs \(x\) (dimension d), while the kernel represented here is over inputs \((x, r)\) of dimension d + 1, where the resource attribute \(r\) (number of folds) is last.

Inputs are encoded. We assume a linear encoding for r with bounds 1 and num_folds. TODO: Right now, all HPs are encoded, and the resource attribute counts as an HP, even if it is not optimized over. This creates a dependence on how inputs are encoded.
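
A construction sketch (same assumptions as above about ScalarMeanFunction and direct block calls; the encoding of the resource column is glossed over here):

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        CrossValidationKernelFunction,
        Matern52,
    )
    # Assumption: ScalarMeanFunction is provided by the sibling mean module
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import (
        ScalarMeanFunction,
    )

    d, num_folds = 3, 5
    kernel = CrossValidationKernelFunction(
        kernel_main=Matern52(dimension=d, ARD=True),
        kernel_residual=Matern52(dimension=d),
        mean_main=ScalarMeanFunction(),
        num_folds=num_folds,
    )

    # Inputs (x, r): hyperparameters in the first d columns, number of folds r
    # (between 1 and num_folds) in the last column
    X = np.concatenate(
        [np.random.uniform(size=(4, d)), np.array([[1.0], [2.0], [3.0], [5.0]])],
        axis=1,
    )
    K = kernel(X, X)  # shape (4, 4)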

forward(X1, X2, **kwargs)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]
Returns list of tuples

(param_internal, encoding)

over all Gluon parameters maintained here.

Returns:

List [(param_internal, encoding)]

mean_function(X)[source]
get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

class syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel.RangeKernelFunction(dimension, kernel, start, **kwargs)[source]

Bases: KernelFunction

Given kernel function K and range R, this class represents

\[(x, y) \mapsto K(x_R, y_R)\]
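
A construction sketch. Assumption (not confirmed by this page): the range R consists of kernel.dimension consecutive columns starting at start, so columns 1 and 2 of the 4-dimensional input are used below; direct calls dispatch to forward():

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import (
        Matern52,
        RangeKernelFunction,
    )

    inner = Matern52(dimension=2)
    kernel = RangeKernelFunction(dimension=4, kernel=inner, start=1)

    X = np.random.uniform(size=(5, 4))
    K = kernel(X, X)  # by assumption, equals the inner kernel on X[:, 1:3]
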
forward(X1, X2)[source]

Overrides to implement forward computation using NDArray. Only accepts positional arguments.

Parameters:

*args – Input tensors (list of NDArray)

diagonal(X)[source]
Parameters:

X – Input data, shape (n, d)

Returns:

Diagonal of \(k(X, X)\), shape (n,)

diagonal_depends_on_X()[source]

For stationary kernels, diagonal does not depend on X

Returns:

Does diagonal() depend on X?

param_encoding_pairs()[source]

Note: The parameters returned here are those of the kernel K wrapped by this class.

get_params()[source]
Return type:

Dict[str, Any]

Returns:

Dictionary with hyperparameter values

set_params(param_dict)[source]
Parameters:

param_dict (Dict[str, Any]) – Dictionary with new hyperparameter values

Submodules