syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils module
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_computations(features, targets, mean, kernel, noise_variance, debug_log=False)[source]
Given input matrix X (features), target matrix Y (targets), mean and kernel function, compute posterior state {L, P}, where L is the Cholesky factor of

k(X, X) + sigsq_final * I

and P solves

L P = Y - mean(X)

Here, sigsq_final >= noise_variance is minimal such that the Cholesky factorization does not fail.
- Parameters:
features – Input matrix X (n, d)
targets – Target matrix Y (n, m)
mean (MeanFunction) – Mean function
kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple
noise_variance – Noise variance (may be increased)
debug_log (bool) – Debug output during add_jitter CustomOp?
- Returns:
L, P
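The computation behind this function can be sketched in plain numpy. This is a minimal sketch, not the actual implementation: `kernel_fn` is assumed to be a plain callable returning the kernel matrix, the mean is reduced to a scalar, and the loop of the real code that increases the noise variance (jitter) until the factorization succeeds is omitted.

```python
import numpy as np
from scipy.linalg import solve_triangular

def cholesky_computations_sketch(features, targets, mean_value, kernel_fn,
                                 noise_variance):
    # Kernel matrix k(X, X), shape (n, n)
    kmat = kernel_fn(features, features)
    n = kmat.shape[0]
    # Lower-triangular Cholesky factor L of k(X, X) + noise_variance * I.
    # The real implementation retries with increased jitter on failure;
    # here we assume the factorization succeeds on the first try.
    chol_fact = np.linalg.cholesky(kmat + noise_variance * np.eye(n))
    # Solve L P = Y - mean(X) for P by forward substitution
    pred_mat = solve_triangular(chol_fact, targets - mean_value, lower=True)
    return chol_fact, pred_mat
```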
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.predict_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features)[source]
Computes posterior means and variances for test_features. If pred_mat is a matrix, posterior_means is a matrix as well, whereas posterior_variances is always a vector. This reflects the fact that for GP regression with fixed hyperparameters, the posterior mean depends on the targets y, but the posterior covariance does not.
- Parameters:
features – Training inputs
mean (MeanFunction) – Mean function
kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple
chol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
- Returns:
posterior_means, posterior_variances
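The standard GP prediction equations this function implements can be sketched as follows (again a simplified sketch with a plain kernel callable and a scalar mean; note that the variances never touch pred_mat, which is why they do not depend on the targets):

```python
import numpy as np
from scipy.linalg import solve_triangular

def predict_posterior_marginals_sketch(features, mean_value, kernel_fn,
                                       chol_fact, pred_mat, test_features):
    # v = L^{-1} k(X, X_test), shape (n, n_test)
    vvec = solve_triangular(
        chol_fact, kernel_fn(test_features, features).T, lower=True)
    # Posterior means: m(x*) + v^T P, shape (n_test, m)
    posterior_means = mean_value + vvec.T @ pred_mat
    # Posterior variances: k(x*, x*) - ||v||^2, shape (n_test,).
    # The full test kernel matrix is built here only to take its diagonal.
    prior_variances = np.diag(kernel_fn(test_features, test_features))
    posterior_variances = prior_variances - (vvec ** 2).sum(axis=0)
    return posterior_means, posterior_variances
```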
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_marginals(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]
Draws num_samples samples from the product of marginals of the posterior over the input points test_features. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).
- Parameters:
features – Training inputs
mean (MeanFunction) – Mean function
kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple
chol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
num_samples (
int
) – Number of samples to draw
- Returns:
Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
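Since each marginal is an independent Gaussian, the sampling step itself is simple; a sketch, assuming the marginal means (n_test, m) and variances (n_test,) have already been computed as by predict_posterior_marginals:

```python
import numpy as np

def sample_posterior_marginals_sketch(posterior_means, posterior_variances,
                                      random_state, num_samples=1):
    n_test, m = posterior_means.shape
    # Clip at zero: tiny negative variances can arise numerically
    stds = np.sqrt(np.maximum(posterior_variances, 0.0))
    eps = random_state.standard_normal((n_test, m, num_samples))
    # Each marginal is sampled independently: mean + std * eps
    return posterior_means[:, :, None] + stds[:, None, None] * eps
```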
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.sample_posterior_joint(features, mean, kernel, chol_fact, pred_mat, test_features, random_state, num_samples=1)[source]
Draws num_samples samples from the joint posterior distribution over the inputs test_features. This is done by computing the mean and covariance matrix of this posterior, and using the Cholesky decomposition of the latter. If pred_mat is a matrix with m columns, the samples returned have shape (n_test, m, num_samples).
- Parameters:
features – Training inputs
mean (MeanFunction) – Mean function
kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple
chol_fact – Part L of posterior state
pred_mat – Part P of posterior state
test_features – Test inputs
num_samples (
int
) – Number of samples to draw
- Returns:
Samples, shape (n_test, num_samples) or (n_test, m, num_samples)
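The mean/covariance/Cholesky recipe described above can be sketched like this (same simplifying assumptions as before: plain kernel callable, scalar mean; the 1e-9 jitter constant is an assumption, not the package's actual value):

```python
import numpy as np
from scipy.linalg import solve_triangular

def sample_posterior_joint_sketch(features, mean_value, kernel_fn, chol_fact,
                                  pred_mat, test_features, random_state,
                                  num_samples=1):
    # v = L^{-1} k(X, X_test), shape (n, n_test)
    vvec = solve_triangular(
        chol_fact, kernel_fn(test_features, features).T, lower=True)
    # Posterior mean (n_test, m) and covariance (n_test, n_test)
    post_mean = mean_value + vvec.T @ pred_mat
    post_cov = kernel_fn(test_features, test_features) - vvec.T @ vvec
    n_test, m = post_mean.shape
    # Cholesky of the posterior covariance, with a small jitter for stability
    lfact = np.linalg.cholesky(post_cov + 1e-9 * np.eye(n_test))
    eps = random_state.standard_normal((n_test, m, num_samples))
    # Correlated samples: mean + L_post @ eps along the test-point axis
    return post_mean[:, :, None] + np.einsum("ij,jkl->ikl", lfact, eps)
```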
- syne_tune.optimizer.schedulers.searchers.bayesopt.gpautograd.posterior_utils.cholesky_update(features, mean, kernel, chol_fact, pred_mat, noise_variance, feature, target, lvec=None)[source]
Incremental update of posterior state (Cholesky factor, prediction matrix), given one datapoint (feature, target).
Note: noise_variance is the initial value, before any jitter may have been added to compute chol_fact. Here, we add the minimum amount of jitter such that the new diagonal entry of the Cholesky factor is >= MIN_CHOLESKY_DIAGONAL_VALUE. This means that if cholesky_update is used several times, we in fact add a diagonal (but not spherical) jitter matrix.
- Parameters:
features – Shape (n, d)
chol_fact – Shape (n, n)
pred_mat – Shape (n, m)
mean (MeanFunction) – Mean function
kernel (Union[KernelFunction, Tuple[KernelFunction, ndarray]]) – Kernel function, or tuple
noise_variance – Noise variance (initial value, see note above)
feature – Shape (1, d)
target – Shape (1, m)
lvec – If given, this is the new column of the Cholesky factor, except for its diagonal entry. If not given, it is computed here
- Returns:
chol_fact_new (n+1, n+1), pred_mat_new (n+1, m)
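The incremental update amounts to appending one row to L and one entry per column to P; a numpy sketch under the usual simplifications (plain kernel callable, scalar mean, and an assumed value for MIN_CHOLESKY_DIAGONAL_VALUE, whose real value lives elsewhere in the package):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Assumed lower bound for the new Cholesky diagonal entry (hypothetical value)
MIN_CHOLESKY_DIAGONAL_VALUE = 1e-5

def cholesky_update_sketch(features, mean_value, kernel_fn, chol_fact,
                           pred_mat, noise_variance, feature, target):
    n = chol_fact.shape[0]
    # New off-diagonal column: solve L lvec = k(X, x_new), shape (n, 1)
    lvec = solve_triangular(
        chol_fact, kernel_fn(features, feature), lower=True)
    knew = kernel_fn(feature, feature).item() + noise_variance
    # New diagonal entry, clipped from below: repeated clipping is what adds
    # the diagonal (but not spherical) jitter mentioned in the note
    diag_sq = knew - (lvec.T @ lvec).item()
    diag_new = max(np.sqrt(max(diag_sq, 0.0)), MIN_CHOLESKY_DIAGONAL_VALUE)
    chol_new = np.zeros((n + 1, n + 1))
    chol_new[:n, :n] = chol_fact
    chol_new[n, :n] = lvec[:, 0]
    chol_new[n, n] = diag_new
    # Extend P so that chol_new @ pred_new = [Y; y_new] - mean still holds
    p_new = (target - mean_value - lvec.T @ pred_mat) / diag_new
    pred_new = np.vstack([pred_mat, p_new])
    return chol_new, pred_new
```

When no clipping occurs, the result agrees exactly with recomputing the Cholesky factorization from scratch on the extended dataset.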