syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher module
- class syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]
Bases: BayesianOptimizationSearcher
Gaussian process Bayesian optimization for FIFO scheduler
This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.
It is not recommended to create GPFIFOSearcher objects directly. Rather, create FIFOScheduler objects with searcher="bayesopt", and pass arguments here in search_options. This will use the appropriate functions from syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way; a minimal sketch follows below. Most of the implementation is generic in BayesianOptimizationSearcher.
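The following is a minimal sketch of the recommended construction; the search space, metric name, and option values are hypothetical and for illustration only:

    from syne_tune.config_space import loguniform, randint
    from syne_tune.optimizer.schedulers import FIFOScheduler

    # Hypothetical search space and metric name, for illustration only
    config_space = {
        "learning_rate": loguniform(1e-6, 1e-2),
        "num_layers": randint(1, 8),
    }

    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt",  # creates a GPFIFOSearcher internally
        search_options={
            "num_init_random": 5,       # random get_config() calls before model-based search
            "num_fantasy_samples": 20,  # posterior samples for pending evaluations
        },
        metric="validation_error",
        mode="min",
    )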
Note: If metric values are to be maximized (mode="max" in scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.
Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing: target values are drawn from the current posterior, and acquisition functions are averaged over this sample (see num_fantasy_samples); a conceptual sketch follows below.
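The following sketch illustrates the idea of fantasizing; it is not Syne Tune's internal implementation, and uses scikit-learn as a stand-in surrogate model on synthetic data:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expected_improvement(mu, sigma, best):
        # Standard EI for minimization
        sigma = np.maximum(sigma, 1e-12)
        z = (best - mu) / sigma
        return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    rng = np.random.default_rng(0)
    X_obs = rng.uniform(0, 1, size=(8, 1))          # completed evaluations
    y_obs = np.sin(6 * X_obs[:, 0]) + 0.1 * rng.normal(size=8)
    X_pend = rng.uniform(0, 1, size=(3, 1))         # pending (tasks still running)
    X_cand = np.linspace(0, 1, 200).reshape(-1, 1)  # candidates to score

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X_obs, y_obs)

    num_fantasy_samples = 20
    acq = np.zeros(len(X_cand))
    for s in range(num_fantasy_samples):
        # Draw fantasy targets for pending configs from the current posterior
        y_fant = gp.sample_y(X_pend, random_state=s).ravel()
        # Refit on observed data plus fantasized pending observations
        gp_f = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp_f.fit(np.vstack([X_obs, X_pend]), np.concatenate([y_obs, y_fant]))
        mu, sigma = gp_f.predict(X_cand, return_std=True)
        acq += expected_improvement(mu, sigma, best=np.min(y_obs))
    acq /= num_fantasy_samples  # acquisition averaged over the fantasy sample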
The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.
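For reference, the EI criterion mentioned above has the standard form (for minimization)

\[
\mathrm{EI}(x) = \mathbb{E}\left[\max\bigl(0,\, y^{*} - f(x)\bigr)\right]
= \bigl(y^{*} - \mu(x)\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{y^{*} - \mu(x)}{\sigma(x)},
\]

where \(\mu(x)\) and \(\sigma(x)\) are the posterior mean and standard deviation, \(y^{*}\) is the best criterion value observed so far, and \(\Phi\), \(\varphi\) denote the standard normal CDF and PDF.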
Note that the full logic of construction based on arguments is given in syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.
Additional arguments on top of parent class StochasticSearcher:
- Parameters:
clone_from_state (bool) – Internal argument, do not use
resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If resource_attr and cost_attr are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data
cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether resource_attr is given, cost values are read from each report or only at the end
num_init_random (int, optional) – Number of initial get_config() calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults to DEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS
num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in get_config(). Defaults to DEFAULT_NUM_INITIAL_CANDIDATES
num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations). Defaults to 20
no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False
input_warping (bool, optional) – If True, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See also Warping. Coordinates which belong to categorical hyperparameters are not warped. Defaults to False
boxcox_transform (bool, optional) – If True, target values are transformed before being fitted with a Gaussian marginal likelihood. This uses the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \((y^\lambda - 1)/\lambda\) in general: \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults to False
gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are SUPPORTED_BASE_MODELS. Defaults to "matern52-ard" (Matern 5/2 with automatic relevance determination)
acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are SUPPORTED_ACQUISITION_FUNCTIONS. Defaults to "ei" (expected improvement acquisition function)
acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters; they can be passed here. If none are given, default values are used
initial_scoring (str, optional) – Scoring function to rank initial candidates (local optimization of EI is started from the top scorer):
- "thompson_indep": Independent Thompson sampling; randomized score, which can increase exploration
- "acq_func": Score is the same (EI) acquisition function which is used for local optimization afterwards
Defaults to DEFAULT_INITIAL_SCORING
skip_local_optimization (bool, optional) – If True, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case, initial_scoring="acq_func" makes most sense; otherwise, the acquisition function will not be used. Defaults to False
opt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2
opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50
opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If True, each fitting is started from the previous optimum. Not recommended in general. Defaults to False
opt_verbose (bool, optional) – Parameter for surrogate model fitting. If True, lots of output is printed. Defaults to False
max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled; see SubsampleSingleFidelityStateConverter for details. This downsampling is repeated every time the model is fit. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None in order not to apply such a threshold. Defaults to DEFAULT_MAX_SIZE_DATA_FOR_MODEL
max_size_top_fraction (float, optional) – Only used if max_size_data_for_model is set. This fraction of the downsampled set is filled with the top entries in the full set; the remaining ones are sampled at random from the full set, see SubsampleSingleFidelityStateConverter for details. Defaults to 0.25
opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as the number of observations is below this threshold. Defaults to 150
opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If this value K is > 1, and the number of observations is above opt_skip_init_length, fitting is done only on every K-th call, and skipped otherwise. Defaults to 1 (no skipping)
allow_duplicates (bool, optional) – If True, get_config() may return the same configuration more than once. Defaults to False
restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This requires skip_local_optimization == True. If allow_duplicates == False, entries are popped off this list once suggested. See the sketch after this list for an example
map_reward (str or MapReward, optional) – In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization is minimizing the criterion. map_reward converts from metric to internal criterion:
- "minus_x": criterion = -metric
- "<a>_minus_x": criterion = <a> - metric. For example, "1_minus_x" maps accuracy to zero-one error
From a technical standpoint, it does not matter what is chosen here, because the criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before being fitted with a Gaussian process. Defaults to "1_minus_x"
transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end, config_space must contain a categorical parameter of name transfer_learning_task_attr, whose range are all task IDs. Also, transfer_learning_active_task must denote the active task, and transfer_learning_active_config_space is used as active_config_space argument in HyperparameterRanges. This allows us to use a narrower search space for the active task than for the union of all tasks (config_space must be that), which is needed if some configurations of non-active tasks lie outside of the ranges in active_config_space. One of the implications is that filter_observed_data() selects configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task
transfer_learning_active_task (str, optional) – See transfer_learning_task_attr
transfer_learning_active_config_space (Dict[str, Any], optional) – See transfer_learning_task_attr. If not given, config_space is the search space for the active task as well. This active config space need not contain the transfer_learning_task_attr parameter. In fact, this parameter is set to a categorical with transfer_learning_active_task as single value, so that new configs are chosen for the active task only
transfer_learning_model (str, optional) – See transfer_learning_task_attr. Specifies the surrogate model to be used for transfer learning:
- "matern52_product": Kernel is product of Matern 5/2 (not ARD) on transfer_learning_task_attr and Matern 5/2 (ARD) on the rest. Assumes that data from the same task are more closely related than data from different tasks
- "matern52_same": Kernel is Matern 5/2 (ARD) on the rest of the variables; transfer_learning_task_attr is ignored. Assumes that data from all tasks can be merged together
Defaults to "matern52_product"
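As referenced in the restrict_configurations entry above, the following sketch (with a hypothetical candidate list and metric name) restricts the searcher to a fixed list of configurations:

    from syne_tune.config_space import choice
    from syne_tune.optimizer.schedulers import FIFOScheduler

    # Hypothetical fixed candidate list; keys must match config_space
    candidates = [
        {"learning_rate": 1e-3, "num_layers": 2},
        {"learning_rate": 1e-2, "num_layers": 4},
        {"learning_rate": 1e-4, "num_layers": 8},
    ]

    config_space = {
        "learning_rate": choice([1e-4, 1e-3, 1e-2]),
        "num_layers": choice([2, 4, 8]),
    }

    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt",
        search_options={
            "restrict_configurations": candidates,
            # restrict_configurations requires skipping local optimization
            "skip_local_optimization": True,
            "initial_scoring": "acq_func",  # sensible when local optimization is skipped
        },
        metric="validation_error",
        mode="min",
    )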
- clone_from_state(state)[source]
Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.
Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore (a usage sketch follows below).
is not used anymore.- Parameters:
state – See above
- Returns:
New searcher object
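A sketch of the intended round trip, assuming searcher is an existing GPFIFOSearcher instance:

    # `searcher` is assumed to be an existing GPFIFOSearcher instance
    state = searcher.get_state()  # pickle-able snapshot of the mutable state

    # ... serialize `state`, e.g., to checkpoint the tuning job ...

    # Re-create a searcher from the snapshot; `searcher` itself should
    # not be used anymore after this call
    new_searcher = searcher.clone_from_state(state)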