syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher module

class syne_tune.optimizer.schedulers.searchers.gp_fifo_searcher.GPFIFOSearcher(config_space, metric, points_to_evaluate=None, clone_from_state=False, **kwargs)[source]

Bases: BayesianOptimizationSearcher

Gaussian process Bayesian optimization for FIFO scheduler

This searcher must be used with FIFOScheduler. It provides Bayesian optimization, based on a Gaussian process surrogate model.

It is not recommended to create GPFIFOSearcher objects directly. Instead, create FIFOScheduler objects with searcher="bayesopt" and pass arguments here via search_options. This will use the appropriate functions from :mod:syne_tune.optimizer.schedulers.searchers.gp_searcher_factory to create components in a consistent way.
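For instance, a minimal sketch of the recommended construction (the config space, metric name, and option values are illustrative placeholders, not prescribed by this API):

    from syne_tune.config_space import loguniform, uniform
    from syne_tune.optimizer.schedulers import FIFOScheduler

    # Hypothetical search space; adapt to your training script
    config_space = {
        "learning_rate": loguniform(1e-5, 1e-1),
        "momentum": uniform(0.0, 0.99),
    }
    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt",  # creates a GPFIFOSearcher internally
        search_options={"num_init_random": 10},  # arguments documented below
        metric="validation_error",
        mode="min",
    )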

Most of the implementation is generic in BayesianOptimizationSearcher.

Note: If metric values are to be maximized (mode="max" in scheduler), the searcher uses map_reward to map metric values to internal criterion values, and minimizes the latter. The default choice is to multiply values by -1.

Pending configurations (for which evaluation tasks are currently running) are dealt with by fantasizing (i.e., target values are drawn from the current posterior, and acquisition functions are averaged over this sample, see num_fantasy_samples).

The GP surrogate model uses a Matern 5/2 covariance function with automatic relevance determination (ARD) of input attributes, and a constant mean function. The acquisition function is expected improvement (EI). All hyperparameters of the surrogate model are estimated by empirical Bayes (maximizing the marginal likelihood). In general, this hyperparameter fitting is the most expensive part of a get_config() call.
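For reference, the Matern 5/2 kernel with ARD has the standard form (up to the exact parameterization used in the library):

\[
k(x, x') = \sigma^2 \left(1 + \sqrt{5}\, r + \tfrac{5}{3} r^2\right) e^{-\sqrt{5}\, r},
\qquad
r^2 = \sum_{j} \frac{(x_j - x'_j)^2}{\ell_j^2},
\]

where each input dimension \(j\) has its own length scale \(\ell_j\), learned by empirical Bayes along with the other hyperparameters; dimensions with large \(\ell_j\) contribute little to the kernel, which is what "automatic relevance determination" refers to.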

Note that the full logic of construction based on arguments is given in :mod:syne_tune.optimizer.schedulers.searchers.gp_searcher_factory. In particular, see gp_fifo_searcher_defaults() for default values.

Additional arguments on top of parent class StochasticSearcher (see the usage sketch after this list):

Parameters:
  • clone_from_state (bool) – Internal argument, do not use

  • resource_attr (str, optional) – Name of resource attribute in reports. This is optional here, but required for multi-fidelity searchers. If resource_attr and cost_attr are given, cost values are read from each report and stored in the state. This allows cost models to be fit on more data.

  • cost_attr (str, optional) – Name of cost attribute in data obtained from reporter (e.g., elapsed training time). Needed only by cost-aware searchers. Depending on whether resource_attr is given, cost values are read from each report or only at the end.

  • num_init_random (int, optional) – Number of initial get_config() calls for which randomly sampled configs are returned. Afterwards, the model-based searcher is used. Defaults to DEFAULT_NUM_INITIAL_RANDOM_EVALUATIONS

  • num_init_candidates (int, optional) – Number of initial candidates sampled at random in order to seed the model-based search in get_config. Defaults to DEFAULT_NUM_INITIAL_CANDIDATES

  • num_fantasy_samples (int, optional) – Number of samples drawn for fantasizing (latent target values for pending evaluations), defaults to 20

  • no_fantasizing (bool, optional) – If True, fantasizing is not done and pending evaluations are ignored. This may lead to loss of diversity in decisions. Defaults to False

  • input_warping (bool, optional) – If True, we use a warping transform, so the kernel function becomes \(k(w(x), w(x'))\), where \(w(x)\) is a warping transform parameterized by two non-negative numbers per component, which are learned as hyperparameters. See also Warping. Coordinates which belong to categorical hyperparameters are not warped. Defaults to False.

  • boxcox_transform (bool, optional) – If True, target values are transformed before being fitted with a Gaussian marginal likelihood. This is using the Box-Cox transform with a parameter \(\lambda\), which is learned alongside other parameters of the surrogate model. The transform is \(\log y\) for \(\lambda = 0\), and \(y - 1\) for \(\lambda = 1\). This option requires the targets to be positive. Defaults to False.

  • gp_base_kernel (str, optional) – Selects the covariance (or kernel) function to be used. Supported choices are SUPPORTED_BASE_MODELS. Defaults to “matern52-ard” (Matern 5/2 with automatic relevance determination).

  • acq_function (str, optional) – Selects the acquisition function to be used. Supported choices are SUPPORTED_ACQUISITION_FUNCTIONS. Defaults to “ei” (expected improvement acquisition function).

  • acq_function_kwargs (dict, optional) – Some acquisition functions have additional parameters, they can be passed here. If none are given, default values are used.

  • initial_scoring (str, optional) –

    Scoring function to rank initial candidates (local optimization of EI is started from top scorer):

    • “thompson_indep”: Independent Thompson sampling; randomized score, which can increase exploration

    • “acq_func”: score is the same (EI) acquisition function which is used for local optimization afterwards

    Defaults to DEFAULT_INITIAL_SCORING

  • skip_local_optimization (bool, optional) – If True, the local gradient-based optimization of the acquisition function is skipped, and the top-ranked initial candidate (after initial scoring) is returned instead. In this case, initial_scoring="acq_func" makes most sense, otherwise the acquisition function will not be used. Defaults to False

  • opt_nstarts (int, optional) – Parameter for surrogate model fitting. Number of random restarts. Defaults to 2

  • opt_maxiter (int, optional) – Parameter for surrogate model fitting. Maximum number of iterations per restart. Defaults to 50

  • opt_warmstart (bool, optional) – Parameter for surrogate model fitting. If True, each fitting is started from the previous optimum. Not recommended in general. Defaults to False

  • opt_verbose (bool, optional) – Parameter for surrogate model fitting. If True, lots of output. Defaults to False

  • max_size_data_for_model (int, optional) – If this is set, we limit the number of observations the surrogate model is fitted on to this value. If there are more observations, they are downsampled, see SubsampleSingleFidelityStateConverter for details. This downsampling is repeated every time the model is fit. The opt_skip_* predicates are evaluated before the state is downsampled. Pass None in order not to apply such a threshold. The default is DEFAULT_MAX_SIZE_DATA_FOR_MODEL.

  • max_size_top_fraction (float, optional) – Only used if max_size_data_for_model is set. This fraction of the downsampled set is filled with the top entries of the full set; the remaining entries are sampled at random from the full set, see SubsampleSingleFidelityStateConverter for details. Defaults to 0.25.

  • opt_skip_init_length (int, optional) – Parameter for surrogate model fitting, skip predicate. Fitting is never skipped as long as the number of observations is below this threshold. Defaults to 150

  • opt_skip_period (int, optional) – Parameter for surrogate model fitting, skip predicate. If >1, and the number of observations is above opt_skip_init_length, fitting is done only on every opt_skip_period-th call, and skipped otherwise. Defaults to 1 (no skipping)

  • allow_duplicates (bool, optional) – If True, get_config() may return the same configuration more than once. Defaults to False

  • restrict_configurations (List[dict], optional) – If given, the searcher only suggests configurations from this list. This needs skip_local_optimization == True. If allow_duplicates == False, entries are popped off this list once suggested.

  • map_reward (str or MapReward, optional) –

    In the scheduler, the metric may be minimized or maximized, but internally, Bayesian optimization is minimizing the criterion. map_reward converts from metric to internal criterion:

    • “minus_x”: criterion = -metric

    • “<a>_minus_x”: criterion = <a> - metric. For example, “1_minus_x” maps accuracy to zero-one error

    From a technical standpoint, it does not matter what is chosen here, because the criterion is only used internally. Also note that criterion data is always normalized to mean 0, variance 1 before being fitted with a Gaussian process. Defaults to “1_minus_x”

  • transfer_learning_task_attr (str, optional) – Used to support transfer HPO, where the state contains observed data from several tasks, one of which is the active one. To this end, config_space must contain a categorical parameter of name transfer_learning_task_attr, whose range is the set of all task IDs. Also, transfer_learning_active_task must denote the active task, and transfer_learning_active_config_space is used as active_config_space argument in HyperparameterRanges. This allows us to use a narrower search space for the active task than for the union of all tasks (which config_space must cover), which is needed if some configurations of non-active tasks lie outside of the ranges in active_config_space. One of the implications is that filter_observed_data() selects configs of the active task, so that incumbents or exclusion lists are restricted to data from the active task.

  • transfer_learning_active_task (str, optional) – See transfer_learning_task_attr.

  • transfer_learning_active_config_space (Dict[str, Any], optional) – See transfer_learning_task_attr. If not given, config_space is the search space for the active task as well. This active config space need not contain the transfer_learning_task_attr parameter. In fact, this parameter is set to a categorical with transfer_learning_active_task as single value, so that new configs are chosen for the active task only.

  • transfer_learning_model (str, optional) –

    See transfer_learning_task_attr. Specifies the surrogate model to be used for transfer learning:

    • “matern52_product”: Kernel is the product of Matern 5/2 (not ARD) on transfer_learning_task_attr and Matern 5/2 (ARD) on the rest. Assumes that data from the same task are more closely related than data from different tasks

    • “matern52_same”: Kernel is Matern 5/2 (ARD) on the remaining variables, transfer_learning_task_attr is ignored. Assumes that data from all tasks can be merged together

    Defaults to “matern52_product”
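The options above are passed via search_options of FIFOScheduler. A sketch with illustrative values (the config space is a placeholder, as above):

    search_options = {
        "num_init_random": 5,
        "num_fantasy_samples": 30,   # average acquisition over 30 posterior samples
        "input_warping": True,       # warp kernel inputs
        "boxcox_transform": True,    # requires positive target values
        "gp_base_kernel": "matern52-ard",
        "acq_function": "ei",
        "opt_skip_period": 3,        # refit GP hyperparameters only every 3rd call
        "max_size_data_for_model": 500,
        "map_reward": "1_minus_x",   # maps accuracy to zero-one error
    }
    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt",
        search_options=search_options,
        metric="accuracy",
        mode="max",
    )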

clone_from_state(state)[source]

Together with get_state(), this is needed in order to store and re-create the mutable state of the searcher.

Given state as returned by get_state(), this method combines the non-pickle-able part of the immutable state from self with state and returns the corresponding searcher clone. Afterwards, self is not used anymore.

Parameters:

state – See above

Returns:

New searcher object
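A sketch of the intended round trip, where searcher is an existing GPFIFOSearcher instance (the use of pickle here is illustrative; any serialization of the state returned by get_state() works):

    import pickle

    # Persist the mutable searcher state
    state = searcher.get_state()
    with open("searcher_state.pkl", "wb") as f:
        pickle.dump(state, f)

    # Later, e.g., after restarting the process: restore into a clone.
    # The original searcher must not be used afterwards.
    with open("searcher_state.pkl", "rb") as f:
        state = pickle.load(f)
    new_searcher = searcher.clone_from_state(state)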