syne_tune.optimizer.schedulers.pbt module

class syne_tune.optimizer.schedulers.pbt.PBTTrialState(trial, last_score=None, last_checkpoint=None, last_perturbation_time=0, stopped=False)[source]

Bases: object

Internal PBT state tracked per-trial.

trial: Trial
last_score: float = None
last_checkpoint: int = None
last_perturbation_time: int = 0
stopped: bool = False
class syne_tune.optimizer.schedulers.pbt.PopulationBasedTraining(config_space, metric, resource_attr, max_t=100, custom_explore_fn=None, do_minimize=True, random_seed=None, population_size=4, perturbation_interval=60, quantile_fraction=0.25, resample_probability=0.25, searcher_kwargs=None)[source]

Bases: TrialScheduler

Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:

https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html

PBT was originally presented in the following paper:

Jaderberg et al.
Population Based Training of Neural Networks

Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker then continues from the latest checkpoint of this new model, but with mutated hyperparameters.

The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the current value (probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly.
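As an illustration, here is a minimal sketch of such a mutation step. The function name explore_config is hypothetical and not part of the Syne Tune API; a function of this shape could also be passed as custom_explore_fn (see below). Resampling is omitted for brevity, since it depends on the config space definition:

import random

def explore_config(config: dict, resample_probability: float = 0.25) -> dict:
    # Hypothetical sketch of the perturbation described above; this is not
    # the actual Syne Tune implementation.
    new_config = dict(config)
    for name, value in config.items():
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            # Categorical values would be resampled uniformly from their
            # domain; omitted here for brevity.
            continue
        if random.random() < resample_probability:
            # With probability resample_probability, resample from the
            # original distribution (omitted; requires the config space).
            continue
        # Otherwise multiply by 1.2 or 0.8 with equal probability.
        factor = 1.2 if random.random() < 0.5 else 0.8
        new_config[name] = value * factor
    return new_config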

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for the evaluation function.

  • metric (str) – Name of metric to optimize, key in results obtained via on_trial_result.

  • resource_attr (str) – Name of resource attribute in results obtained via on_trial_result, defaults to “time_total_s”

  • max_t (int) – Maximum time units per trial. Trials will be stopped after max_t time units (as measured by resource_attr) have passed. Defaults to 100

  • custom_explore_fn (Optional[Callable[[dict], dict]]) – Custom exploration function. This function is invoked as f(config) instead of the built-in perturbations, and should return config updated as needed. If this is given, resample_probability is not used

  • do_minimize (Optional[bool]) – If True, we minimize the objective function specified by metric. Defaults to True.

  • random_seed (Optional[int]) – Seed for initializing random number generators.

  • population_size (int) – Size of the population, defaults to 4

  • perturbation_interval (int) – Models will be considered for perturbation at this interval of resource_attr. Note that perturbation incurs checkpoint overhead, so you shouldn’t set this to be too frequent. Defaults to 60

  • quantile_fraction (float) – Parameters are transferred from the top quantile_fraction fraction of trials to the bottom quantile_fraction fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25

  • resample_probability (float) – The probability of resampling from the original distribution when applying _explore(). If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25
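For orientation, constructing the scheduler and running it with a Tuner could look roughly as follows. This is a sketch only: the training script pbt_example.py, the hyperparameter lr, and the reported keys mean_loss and epoch are assumptions for illustration.

from syne_tune import Tuner, StoppingCriterion
from syne_tune.backend import LocalBackend
from syne_tune.config_space import loguniform
from syne_tune.optimizer.schedulers.pbt import PopulationBasedTraining

# Hypothetical search space; the training script is assumed to report
# "mean_loss" together with the resource attribute "epoch" at every epoch.
config_space = {"lr": loguniform(1e-4, 1e-1)}

scheduler = PopulationBasedTraining(
    config_space=config_space,
    metric="mean_loss",
    resource_attr="epoch",
    max_t=100,                 # stop trials after 100 resource units
    do_minimize=True,
    population_size=4,
    perturbation_interval=10,  # consider perturbation every 10 epochs
    quantile_fraction=0.25,
    resample_probability=0.25,
)

tuner = Tuner(
    trial_backend=LocalBackend(entry_point="pbt_example.py"),
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=600),
    n_workers=4,
)
tuner.run()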

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict[str, Any]) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial
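The returned string is one of the SchedulerDecision constants and is normally consumed by the Tuner. Schematically, a caller could interpret it as in the following sketch; handle_decision is a hypothetical helper, and the real bookkeeping happens inside the Tuner and trial backend:

from syne_tune.optimizer.scheduler import SchedulerDecision

def handle_decision(decision: str) -> None:
    # Sketch of how the decision returned by on_trial_result could be acted
    # upon; not part of the scheduler itself.
    if decision == SchedulerDecision.CONTINUE:
        print("let the trial keep training until the next milestone")
    elif decision == SchedulerDecision.PAUSE:
        print("checkpoint and pause the trial; it may be resumed later")
    elif decision == SchedulerDecision.STOP:
        print("terminate the trial")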

suggest()[source]

Returns a suggestion for a new trial, or one to be resumed

This method returns a suggestion of type TrialSuggestion (unless there is no config left to explore, in which case None is returned).

If suggestion.spawn_new_trial_id is True, a new trial is to be started with config suggestion.config. Typically, this new trial is started from scratch. But if suggestion.checkpoint_trial_id is given, the trial is to be warm-started from the checkpoint written for the trial with this ID. The new trial is assigned a new trial ID.

If suggestion.spawn_new_trial_id is False, an existing and currently paused trial is to be resumed, whose ID is suggestion.checkpoint_trial_id. If this trial has a checkpoint, we start from there. In this case, suggestion.config is optional. If not given (default), the config of the resumed trial does not change. Otherwise, its config is overwritten by suggestion.config (see HyperbandScheduler with type="promotion" for an example why this can be useful).

Apart from the HP config, additional fields can be appended to the dict; these are passed to the trial function as well.

Return type:

Optional[TrialSuggestion]

Returns:

Suggestion for a trial to be started or to be resumed, see above. If no suggestion can be made, None is returned
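For reference, the two kinds of suggestions described above can be constructed roughly as follows. This is a sketch assuming the TrialSuggestion helpers in syne_tune.optimizer.scheduler; the configs and trial IDs are made up:

from syne_tune.optimizer.scheduler import TrialSuggestion

# New trial started from scratch with the given config
start = TrialSuggestion.start_suggestion(config={"lr": 0.01})

# New trial, warm-started from the checkpoint written by trial 3
warm_start = TrialSuggestion(
    spawn_new_trial_id=True, checkpoint_trial_id=3, config={"lr": 0.012}
)

# Resume paused trial 5 from its checkpoint, keeping its config unchanged
resume = TrialSuggestion.resume_suggestion(trial_id=5)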

metadata()[source]
Return type:

Dict[str, Any]

Returns:

Metadata for the scheduler

metric_names()[source]
Return type:

List[str]

metric_mode()[source]
Return type:

str