syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils module

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_computations(features, targets, mean, kernel, noise_variance, debug_log=False)[source]

Given the input matrix X (features), the target matrix Y (targets), a mean function, and a kernel function, compute the posterior state {L, P}, where L is the Cholesky factor of

k(X, X) + sigsq_final * I

and

L P = Y - mean(X)

Here, sigsq_final >= noise_variance is minimal such that the Cholesky factorization does not fail.

Parameters:
  • features – Input matrix X (n, d)

  • targets – Target matrix Y (n, m)

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • noise_variance – Noise variance (may be increased)

  • debug_log (bool) – If True, emit debug output during the add_jitter CustomOp. Defaults to False.

Returns:

L, P
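
A minimal usage sketch. The Matern52 and ScalarMeanFunction imports from the sibling kernel and mean modules, and passing noise_variance as a plain float, are assumptions not documented on this page:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import ScalarMeanFunction
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        cholesky_computations,
    )

    features = np.random.uniform(size=(20, 3))  # X, shape (n, d)
    targets = np.random.normal(size=(20, 1))    # Y, shape (n, m)
    mean = ScalarMeanFunction()
    kernel = Matern52(dimension=3)

    # chol_fact is the Cholesky factor L of k(X, X) + sigsq_final * I,
    # pred_mat is the solution P of L P = Y - mean(X)
    chol_fact, pred_mat = cholesky_computations(
        features, targets, mean, kernel, noise_variance=1e-3
    )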

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.predict_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features)[source]

Computes posterior means and variances for test_features. If pred_mat is a matrix, posterior_means is a matrix as well, while posterior_variances remains a vector. This reflects the fact that, for GP regression with fixed hyperparameters, the posterior mean depends on the targets y, whereas the posterior covariance does not.

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

Returns:

posterior_means, posterior_variances
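
A sketch of predicting at new inputs, reusing the setup from the cholesky_computations sketch above (Matern52, ScalarMeanFunction and the float noise_variance remain assumptions):

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.kernel import Matern52
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.mean import ScalarMeanFunction
    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        cholesky_computations,
        predict_posterior_marginals,
    )

    features = np.random.uniform(size=(20, 3))
    targets = np.random.normal(size=(20, 1))
    mean, kernel = ScalarMeanFunction(), Matern52(dimension=3)
    chol_fact, pred_mat = cholesky_computations(
        features, targets, mean, kernel, noise_variance=1e-3
    )

    test_features = np.random.uniform(size=(5, 3))
    post_means, post_vars = predict_posterior_marginals(
        features, mean, kernel, chol_fact, pred_mat, test_features
    )
    # post_means has one column per column of pred_mat; post_vars contains one
    # marginal variance per test point and does not depend on the targets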

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]

Draws num_samples samples from the product of the marginals of the posterior at the input points test_features. If pred_mat is a matrix with m columns, the returned samples have shape (n_test, m, num_samples).

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

  • random_state – PRNG state used for drawing the samples

  • num_samples (int) – Number of samples to draw

Returns:

Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
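
A sketch continuing from the predict_posterior_marginals example above; using a numpy RandomState for random_state is an assumption:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        sample_posterior_marginals,
    )

    # features, mean, kernel, chol_fact, pred_mat, test_features as in the sketch above
    random_state = np.random.RandomState(0)
    samples = sample_posterior_marginals(
        features, mean, kernel, chol_fact, pred_mat, test_features,
        random_state=random_state, num_samples=10,
    )
    # shape (n_test, num_samples) here, since pred_mat has a single column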

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_joint(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]

Draws num_samples samples from the joint posterior distribution over the input points test_features. This is done by computing the mean and covariance matrix of this posterior and using the Cholesky decomposition of the latter. If pred_mat is a matrix with m columns, the returned samples have shape (n_test, m, num_samples).

Parameters:
  • features – Training inputs

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

  • test_features – Test inputs

  • random_state – PRNG state used for drawing the samples

  • num_samples (int) – Number of samples to draw

Returns:

Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
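
A sketch under the same assumptions as for sample_posterior_marginals above:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        sample_posterior_joint,
    )

    # features, mean, kernel, chol_fact, pred_mat, test_features as above
    random_state = np.random.RandomState(0)
    joint_samples = sample_posterior_joint(
        features, mean, kernel, chol_fact, pred_mat, test_features,
        random_state=random_state, num_samples=10,
    )
    # unlike sample_posterior_marginals, each sample is drawn jointly, so values
    # are correlated across the test points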

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, target, lvec=None)[source]

Incremental update of posterior state (Cholesky factor, prediction matrix), given one datapoint (feature, target).

Note: noise_variance is the initial value, before any jitter may have been added to compute chol_fact. Here, we add the minimum amount of jitter such that the new diagonal entry of the Cholesky factor is >= MIN_CHOLESKY_DIAGONAL_VALUE. This means that if cholesky_update is used several times, we in fact add a diagonal (but not spherical) jitter matrix.

Parameters:
  • features – Shape (n, d)

  • chol_fact – Shape (n, n)

  • pred_mat – Shape (n, m)

  • mean (MeanFunction) – Mean function

  • kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple

  • noise_variance – Noise variance (see note above)

  • feature – Shape (1, d)

  • target – Shape (1, m)

  • lvec – If given, this is the new column of the Cholesky factor, except for the diagonal entry. If not given, it is computed here

Returns:

chol_fact_new (n+1, n+1), pred_mat_new (n+1, m)
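
A sketch continuing from the cholesky_computations example above. Appending new_feature to features before subsequent calls is an assumption about how the updated state is meant to be reused:

    import numpy as np

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        cholesky_update,
    )

    # features (n, d), mean, kernel, chol_fact (n, n), pred_mat (n, m) as above
    new_feature = np.random.uniform(size=(1, 3))  # shape (1, d)
    new_target = np.random.normal(size=(1, 1))    # shape (1, m)
    chol_fact_new, pred_mat_new = cholesky_update(
        features, mean, kernel, chol_fact, pred_mat, noise_variance=1e-3,
        feature=new_feature, target=new_target,
    )
    features_new = np.vstack([features, new_feature])  # keep inputs in sync with the new state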

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_and_cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, random_state, mean_impute_mask=None)[source]

syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.negative_log_marginal_likelihood(chol_fact, pred_mat)[source]

The marginal likelihood is only computed if pred_mat has a single column (not in the fantasy sample case).

Parameters:
  • chol_fact – Part L of posterior state

  • pred_mat – Part P of posterior state

Returns:

Negative log marginal likelihood
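
A sketch continuing from the cholesky_computations example above, whose pred_mat has a single column:

    from syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils import (
        negative_log_marginal_likelihood,
    )

    # chol_fact, pred_mat as returned by cholesky_computations above
    nlml = negative_log_marginal_likelihood(chol_fact, pred_mat)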