syne_tune.optimizer.schedulers package

class syne_tune.optimizer.schedulers.FIFOScheduler(config_space, **kwargs)[source]

Bases: TrialSchedulerWithSearcher

Scheduler which executes trials in submission order.

This is the most basic scheduler template. It can be configured for many use cases by choosing searcher along with search_options; a usage sketch follows the parameter list below.

Parameters:
  • config_space (Dict[str, Any]) – Configuration space for evaluation function

  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_FIFO. Defaults to “random” (i.e., random search)

  • search_options (Dict[str, Any], optional) – If searcher is str, these arguments are passed to searcher_factory()

  • metric (str or List[str]) – Name of metric to optimize, key in results obtained via on_trial_result. For multi-objective schedulers, this can also be a list

  • mode (str or List[str], optional) – “min” if metric is minimized, “max” if metric is maximized, defaults to “min”. This can also be a list if metric is a list

  • points_to_evaluate (List[dict], optional) – List of configurations to be evaluated initially (in that order). Each config in the list can be partially specified, or even be an empty dict. For each hyperparameter not specified, the default value is determined using a midpoint heuristic. If not given, this is mapped to [dict()], a single default config determined by the midpoint heuristic. If [] (empty list), no initial configurations are specified. Note: If searcher is of type BaseSearcher, points_to_evaluate must be set there.

  • random_seed (int, optional) – Master random seed. Generators used in the scheduler or searcher are seeded using RandomSeedGenerator. If not given, the master random seed is drawn at random here.

  • max_resource_attr (str, optional) – Key name in config for fixed attribute containing the maximum resource. If this is given, max_t is not needed. We recommend using max_resource_attr over max_t. If given, we use it to infer max_resource_level. It is also used to limit trial executions in promotion-based multi-fidelity schedulers (see HyperbandScheduler with type="promotion").

  • max_t (int, optional) – Value for max_resource_level. Needed for schedulers which make use of intermediate reports via on_trial_result. If this is not given, we try to infer its value from config_space (see ResourceLevelsScheduler), checking config_space["epochs"], config_space["max_t"], and config_space["max_epochs"] in turn. If max_resource_attr is given, we use the value config_space[max_resource_attr]. But if max_t is given here, it takes precedence.

  • time_keeper (TimeKeeper, optional) – This will be used for timing here (see _elapsed_time). The time keeper has to be started at the beginning of the experiment. If not given, we use a local time keeper here, which is started with the first call to _suggest(). Can also be set after construction, with set_time_keeper(). Note: If you use SimulatorBackend, you need to pass its time_keeper here.
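
A minimal usage sketch (the hyperparameter names, metric name, and search_options values are illustrative assumptions, not prescribed by this API):

    from syne_tune.config_space import loguniform, uniform
    from syne_tune.optimizer.schedulers import FIFOScheduler

    # Hypothetical configuration space; "epochs" is a fixed attribute which
    # the training script reads as its resource limit
    config_space = {
        "learning_rate": loguniform(1e-6, 1e-2),
        "momentum": uniform(0.1, 0.99),
        "epochs": 27,
    }

    scheduler = FIFOScheduler(
        config_space,
        searcher="bayesopt",                    # string passed to searcher_factory()
        search_options={"num_init_random": 5},  # assumed searcher option
        metric="validation_error",              # key reported by the training script
        mode="min",
        random_seed=31415927,
    )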

property searcher: BaseSearcher | None
set_time_keeper(time_keeper)[source]

Assign time keeper after construction.

This is possible only if the time keeper was not assigned at construction, and the experiment has not yet started.

Parameters:

time_keeper (TimeKeeper) – Time keeper to be used

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

Union[str, List[str]]

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

is_multiobjective_scheduler()[source]

Return True if a scheduler is multi-objective.

Return type:

bool

class syne_tune.optimizer.schedulers.HyperbandScheduler(config_space, **kwargs)[source]

Bases: FIFOScheduler, MultiFidelitySchedulerMixin, RemoveCheckpointsSchedulerMixin

Implements different variants of asynchronous Hyperband

See type for the different variants. One implementation detail: when multiple brackets are used, tasks are allocated to brackets at random, based on a distribution which can be configured.

For definitions of concepts (bracket, rung, milestone), see

Li, Jamieson, Rostamizadeh, Gonina, Hardt, Recht, Talwalkar (2018)
A System for Massively Parallel Hyperparameter Tuning

or

Tiao, Klein, Lienart, Archambeau, Seeger (2020)
Model-based Asynchronous Hyperparameter and Neural Architecture Search

Note

This scheduler requires both metric and resource_attr to be returned by the reporter. Here, resource values must be positive int. If resource_attr == "epoch", this should be the number of epochs done, starting from 1 (not the epoch number, starting from 0).

Rung levels and promotion quantiles

Rung levels are values of the resource attribute at which stop/go decisions are made for jobs, comparing their metric against others at the same level. These rung levels (positive, strictly increasing) can be specified via rung_levels, the largest must be <= max_t. If rung_levels is not given, they are specified by grace_period and reduction_factor or rung_increment:

  • If \(r_{min}\) is grace_period, \(\eta\) is reduction_factor, then rung levels are \(\mathrm{round}(r_{min} \eta^j), j=0, 1, \dots\). This is the default choice for successive halving (Hyperband).

  • If rung_increment is given, but not reduction_factor, then rung levels are \(r_{min} + j \nu, j=0, 1, \dots\), where \(\nu\) is rung_increment.

If rung_levels is given, then grace_period, reduction_factor, rung_increment are ignored. If they are given, a warning is logged.

The rung levels determine the quantiles to be used in the stop/go decisions. If rung levels are \(r_j\), define \(q_j = r_j / r_{j+1}\). \(q_j\) is the promotion quantile at rung level \(r_j\). On average, a fraction \(q_j\) of jobs can continue, the remaining ones are stopped (or paused). In the default successive halving case, we have \(q_j = 1/\eta\) for all \(j\).
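
As an illustration (plain Python, not the library's internal code), the default rule for grace_period=1, reduction_factor=3, max_t=81 yields rung levels 1, 3, 9, 27 with promotion quantiles of 1/3 at every rung:

    # Illustrative sketch of the default rung level rule
    grace_period, reduction_factor, max_t = 1, 3, 81

    rung_levels = []
    j = 0
    while round(grace_period * reduction_factor**j) < max_t:
        rung_levels.append(round(grace_period * reduction_factor**j))
        j += 1
    print(rung_levels)  # [1, 3, 9, 27]

    # Promotion quantile at rung r_j is q_j = r_j / r_{j+1} (with max_t as the
    # level after the final rung); here each q_j equals 1/3
    levels = rung_levels + [max_t]
    quantiles = [levels[i] / levels[i + 1] for i in range(len(levels) - 1)]
    print(quantiles)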

Cost-aware schedulers or searchers

Some schedulers (e.g., type == "cost_promotion") or searchers may depend on cost values (with key cost_attr) reported alongside the target metric. For promotion-based scheduling, a trial may pause and resume several times. The cost received in on_trial_result only counts the cost since the last resume. We maintain the sum of such costs in _cost_offset(), and append a new entry to result in on_trial_result with the total cost. If the evaluation function does not implement checkpointing, once a trial is resumed, it has to start from scratch. We detect this in on_trial_result and reset the cost offset to 0 (if the trial runs from scratch, the cost reported needs no offset added).

Note

This process requires cost_attr to be set

Pending evaluations

The searcher is notified, by searcher.register_pending calls, of (trial, resource) pairs for which evaluations are running, and a result is expected in the future. These pending evaluations can be used by the searcher in order to direct sampling elsewhere.

The choice of pending evaluations depends on searcher_data. If equal to “rungs”, pending evaluations sit only at rung levels, because observations are only used there. In the other cases, pending evaluations sit at all resource levels for which observations are obtained. For example, if a trial is at rung level \(r\) and continues towards the next rung level \(r_{next}\), if searcher_data == "rungs", searcher.register_pending is called for \(r_{next}\) only, while for other searcher_data values, pending evaluations are registered for \(r + 1, r + 2, \dots, r_{next}\). However, if in this case, register_pending_myopic is True, we instead call searcher.register_pending for \(r + 1\) when each observation is obtained (not just at a rung level). This leads to fewer pending evaluations at any one time. On the other hand, when a trial is continued at a rung level, we already know it will emit observations up to the next rung level, so it seems more “correct” to register all these pending evaluations in one go.
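
As a small illustration (plain Python, not library calls), here are the resource levels which would receive pending evaluations when a trial at rung \(r = 3\) is continued towards \(r_{next} = 9\):

    # Illustrative sketch: levels at which register_pending would be called
    # for a trial continued from rung r=3 to r_next=9
    r, r_next = 3, 9
    searcher_data = "all"            # one of "rungs", "all", "rungs_and_last"
    register_pending_myopic = False

    if searcher_data == "rungs":
        pending = [r_next]                        # only the next rung level
    elif register_pending_myopic:
        pending = [r + 1]                         # one level at a time, per observation
    else:
        pending = list(range(r + 1, r_next + 1))  # all levels up to the next rung
    print(pending)  # [4, 5, 6, 7, 8, 9]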

Additional arguments on top of parent class FIFOScheduler:

Parameters:
  • searcher (str or BaseSearcher) – Searcher for get_config decisions. String values are passed to searcher_factory() along with search_options and extra information. Supported values: SUPPORTED_SEARCHERS_HYPERBAND. Defaults to “random” (i.e., random search)

  • resource_attr (str, optional) – Name of resource attribute in results obtained via on_trial_result, defaults to “epoch”

  • grace_period (int, optional) – Minimum resource to be used for a job. Ignored if rung_levels is given. Defaults to 1

  • reduction_factor (float, optional) – Parameter to determine rung levels. Ignored if rung_levels is given. Must be \(\ge 2\), defaults to 3

  • rung_increment (int, optional) – Parameter to determine rung levels. Ignored if rung_levels or reduction_factor are given. Must be positive

  • rung_levels (List[int], optional) – If given, prescribes the set of rung levels to be used. Must contain positive integers, strictly increasing. This information overrides grace_period, reduction_factor, rung_increment. Note that the stop/promote rule in the successive halving scheduler is set based on the ratio of successive rung levels.

  • brackets (int, optional) – Number of brackets to be used in Hyperband. Each bracket has a different grace period, all share max_t and reduction_factor. If brackets == 1 (default), we run asynchronous successive halving.

  • type (str, optional) –

    Type of Hyperband scheduler. Defaults to “stopping”. Supported values (see also subclasses of RungSystem):

    • stopping: A config eval is executed by a single task. The task is stopped at a milestone if its metric is worse than a fraction of those who reached the milestone earlier, otherwise it continues. See StoppingRungSystem.

    • promotion: A config eval may be associated with multiple tasks over its lifetime. It is never terminated, but may be paused. Whenever a task becomes available, it may promote a config to the next milestone, if better than a fraction of others who reached the milestone. If no config can be promoted, a new one is chosen. See PromotionRungSystem.

    • cost_promotion: This is a cost-aware variant of ‘promotion’, see CostPromotionRungSystem for details. In this case, costs must be reported under the name rung_system_kwargs["cost_attr"] in results.

    • pasha: Similar to promotion type Hyperband, but it progressively expands the available resources until the ranking of configurations stabilizes.

    • rush_stopping: A variation of the stopping scheduler which requires passing rung_system_kwargs and points_to_evaluate. The first rung_system_kwargs["num_threshold_candidates"] of points_to_evaluate will enforce stricter rules on which task is continued. See RUSHStoppingRungSystem and RUSHScheduler.

    • rush_promotion: Same as rush_stopping but for promotion, see RUSHPromotionRungSystem

    • dyhpo: A model-based scheduler, which can be seen as an extension of “promotion” with rung_increment rather than reduction_factor, see DynamicHPOSearcher

  • cost_attr (str, optional) – Required if the scheduler itself uses a cost metric (i.e., type="cost_promotion"), or if the searcher uses a cost metric. See also header comment.

  • searcher_data (str, optional) –

    Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config may become. Choices:

    • ”rungs” (default): Only results at rung levels. Cheapest

    • ”all”: All results. Most expensive

    • ”rungs_and_last”: Results at rung levels, plus the most recent result. This means that in between rung levels, only the most recent result is used by the searcher. This is in between

    Note: For a Gaussian additive learning curve surrogate model, this has to be set to ‘all’.

  • register_pending_myopic (bool, optional) – See above. Used only if searcher_data != "rungs". Defaults to False

  • rung_system_per_bracket (bool, optional) – This concerns Hyperband with brackets > 1. Defaults to False. When starting a job for a new config, it is assigned a randomly sampled bracket. The larger the bracket, the larger the grace period for the config. If rung_system_per_bracket == True, we maintain separate rung level systems for each bracket, so that configs only compete with others started in the same bracket. If rung_system_per_bracket == False, we use a single rung level system, so that all configs compete with each other. In this case, the bracket of a config only determines the initial grace period, i.e. the first milestone at which it starts competing with others. This is the default. The concept of brackets in Hyperband is meant to hedge against overly aggressive filtering in successive halving, based on low fidelity criteria. In practice, successive halving (i.e., brackets = 1) often works best in the asynchronous case (as implemented here). If brackets > 1, the hedging is stronger if rung_system_per_bracket is True.

  • do_snapshots (bool, optional) – Support snapshots? If True, a snapshot of all running tasks and rung levels is returned by _promote_trial(). This snapshot is passed to searcher.get_config. Defaults to False. Note: Currently, only the stopping variant supports snapshots.

  • rung_system_kwargs (Dict[str, Any], optional) –

    Arguments passed to the rung system:

    • num_threshold_candidates: Used if type in ["rush_promotion", "rush_stopping"]. The first num_threshold_candidates in points_to_evaluate enforce stricter requirements on the continuation of training tasks. See RUSHScheduler.

    • probability_sh: Used if type == "dyhpo". In DyHPO, we typically score all paused trials against a number of new configurations, and the winner is either resumed or started (new trial). However, with the probability given here, we instead try to promote a trial as if type == "promotion". If no trial can be promoted, we fall back to the DyHPO logic. Use this to make DyHPO robust against starting too many new trials because all paused ones score poorly (this happens especially at the beginning).

  • early_checkpoint_removal_kwargs (Dict[str, Any], optional) – If given, speculative early removal of checkpoints is done, see HyperbandRemoveCheckpointsCallback. The constructor arguments for the HyperbandRemoveCheckpointsCallback must be given here, if they cannot be inferred (key max_num_checkpoints is mandatory). This feature is used only for scheduler types which pause and resume trials.
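
A minimal usage sketch for asynchronous successive halving (ASHA); hyperparameter and metric names are illustrative assumptions:

    from syne_tune.config_space import loguniform, randint
    from syne_tune.optimizer.schedulers import HyperbandScheduler

    max_epochs = 81
    config_space = {
        "learning_rate": loguniform(1e-6, 1e-2),
        "batch_size": randint(16, 256),
        "epochs": max_epochs,  # fixed attribute read by the training script
    }

    scheduler = HyperbandScheduler(
        config_space,
        searcher="random",
        type="stopping",             # or "promotion" for pause-and-resume scheduling
        metric="validation_error",   # reported by the training script
        mode="min",
        resource_attr="epoch",       # resource key in each reported result
        max_resource_attr="epochs",  # key in config_space holding the maximum resource
        grace_period=1,
        reduction_factor=3,
    )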

does_pause_resume()[source]
Return type:

bool

Returns:

Is this variant doing pause and resume scheduling, in the sense that trials can be paused and resumed later?

property rung_levels: List[int]

Note that all entries of rung_levels are smaller than max_t (or config_space[max_resource_attr]): rung levels are resource levels where stop/go decisions are made. In particular, if rung_levels is passed at construction with rung_levels[-1] == max_t, this last entry is stripped off.

Returns:

Rung levels (strictly increasing, positive ints)

property num_brackets: int
Returns:

Number of brackets (i.e., rung level systems). If the scheduler does not use brackets, it has to return 1

property resource_attr: str
Returns:

Name of resource attribute in reported results

property max_resource_level: int
Returns:

Maximum resource level

property searcher_data: str
Returns:

Relevant only if a model-based searcher is used. Example: For NN tuning and resource_attr == "epoch", we receive a result for each epoch, but not all epoch values are also rung levels. searcher_data determines which of these results are passed to the searcher. As a rule, the more data the searcher receives, the better its fit, but also the more expensive get_config() may become. Choices:

  • ”rungs”: Only results at rung levels. Cheapest

  • ”all”: All results. Most expensive

  • ”rungs_and_last”: Results at rung levels, plus the most recent one. Not available for all multi-fidelity schedulers

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict[str, Any]) – Result dictionary

callback_for_checkpoint_removal(stop_criterion)[source]
Parameters:

stop_criterion (Callable[[TuningStatus], bool]) – Stopping criterion, as passed to Tuner

Return type:

Optional[TunerCallback]

Returns:

CP removal callback, or None if CP removal is not activated

class syne_tune.optimizer.schedulers.MedianStoppingRule(scheduler, resource_attr, running_average=True, metric=None, grace_time=1, grace_population=5, rank_cutoff=0.5)[source]

Bases: TrialScheduler

Applies the median stopping rule on top of an existing scheduler.

  • If a result at a time step ranks below the cutoff of other results observed at the same time step, the trial is interrupted; otherwise, the wrapped scheduler is called to make the stopping decision (see the sketch after this list).

  • Suggest decisions are left to the wrapped scheduler.

  • The mode of the wrapped scheduler is used.
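
A sketch of the decision rule (plain Python, not the library's implementation), assuming mode="min" and the default rank_cutoff=0.5:

    import numpy as np

    def should_stop(new_value, values_at_same_level, rank_cutoff=0.5):
        # Stop if the new result is worse than the rank_cutoff quantile of all
        # results observed so far at this resource level (mode="min" assumed)
        cutoff = np.quantile(values_at_same_level, rank_cutoff)
        return new_value > cutoff

    print(should_stop(0.9, [0.2, 0.4, 0.5, 0.7, 0.8]))  # True -> interrupt the trial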

Reference:

Google Vizier: A Service for Black-Box Optimization.
Golovin et al. 2017.
Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, August 2017
Pages 1487–1495
Parameters:
  • scheduler (TrialScheduler) – Scheduler to be called for trial suggestion or when median-stopping-rule decision is to continue.

  • resource_attr (str) – Key in the reported dictionary that accounts for the resource (e.g. epoch).

  • running_average (bool) – If True, the running average of observations is used instead of raw observations. Defaults to True

  • metric (Optional[str]) – Metric to be considered, defaults to scheduler.metric

  • grace_time (Optional[int]) – Median stopping rule is only applied for results whose resource_attr exceeds this amount. Defaults to 1

  • grace_population (int) – The median stopping rule is only applied once at least grace_population results have been observed at a resource level. Defaults to 5

  • rank_cutoff (float) – Results whose quantiles are below this level are discarded. Defaults to 0.5 (median)
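
A minimal usage sketch wrapping a FIFO scheduler (hyperparameter and metric names are illustrative assumptions):

    from syne_tune.config_space import loguniform
    from syne_tune.optimizer.schedulers import FIFOScheduler, MedianStoppingRule

    config_space = {"learning_rate": loguniform(1e-6, 1e-2), "epochs": 27}
    base_scheduler = FIFOScheduler(
        config_space, searcher="random", metric="validation_error", mode="min"
    )
    scheduler = MedianStoppingRule(
        scheduler=base_scheduler,
        resource_attr="epoch",  # resource key reported by the training script
        grace_time=2,           # only apply the rule once 2 epochs are reported
    )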

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

grace_condition(time_step)[source]
Parameters:

time_step (float) – Value result[self.resource_attr]

Return type:

bool

Returns:

Decide for continue?

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

class syne_tune.optimizer.schedulers.PopulationBasedTraining(config_space, custom_explore_fn=None, **kwargs)[source]

Bases: FIFOScheduler

Implements the Population Based Training (PBT) algorithm. This is an adapted version of the Ray Tune implementation:

https://docs.ray.io/en/latest/tune/tutorials/tune-advanced-tutorial.html

PBT was originally presented in the following paper:

Jaderberg et. al.
Population Based Training of Neural Networks

Population based training (PBT) maintains a population of models spread across an asynchronous set of workers and dynamically adjusts their hyperparameters during training. Every time a worker reaches a user-defined milestone, it returns the performance of the currently evaluated network. If the network is within the top percentile of the population, the worker resumes its training until the next milestone. If not, PBT selects a model from the top percentile uniformly at random. The worker then continues with the latest checkpoint of this new model but mutates the hyperparameters.

The mutation happens as follows. For each hyperparameter, we either resample its value uniformly at random, or otherwise increment (multiply by 1.2) or decrement (multiply by 0.8) the value (probability 0.5 each). For categorical hyperparameters, the value is always resampled uniformly.
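
A sketch of this perturbation for a single numerical hyperparameter (plain Python, not the library's implementation; the resample probability corresponds to the resample_probability parameter below):

    import random

    def perturb(value, resample, resample_probability=0.25):
        # resample: callable drawing a fresh value from the original distribution
        if random.random() < resample_probability:
            return resample()
        # otherwise increment (x1.2) or decrement (x0.8) with probability 0.5 each
        factor = 1.2 if random.random() < 0.5 else 0.8
        return value * factor

    # e.g. a momentum hyperparameter with a uniform(0.1, 0.99) prior
    print(perturb(0.9, lambda: random.uniform(0.1, 0.99)))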

Note: While this is implemented as a child of FIFOScheduler, we require searcher="random" (default), since the current code only supports a random searcher.

Additional arguments on top of parent class FIFOScheduler.

Parameters:
  • resource_attr (str) – Name of resource attribute in results obtained via on_trial_result, defaults to “time_total_s”

  • population_size (int, optional) – Size of the population, defaults to 4

  • perturbation_interval (float, optional) – Models will be considered for perturbation at this interval of resource_attr. Note that perturbation incurs checkpoint overhead, so you shouldn’t set this to be too frequent. Defaults to 60

  • quantile_fraction (float, optional) – Parameters are transferred from the top quantile_fraction fraction of trials to the bottom quantile_fraction fraction. Needs to be between 0 and 0.5. Setting it to 0 essentially implies doing no exploitation at all. Defaults to 0.25

  • resample_probability (float, optional) – The probability of resampling from the original distribution when applying _explore(). If not resampled, the value will be perturbed by a factor of 1.2 or 0.8 if continuous, or changed to an adjacent value if discrete. Defaults to 0.25

  • custom_explore_fn (function, optional) – Custom exploration function. This function is invoked as f(config) instead of the built-in perturbations, and should return config updated as needed. If this is given, resample_probability is not used
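
A minimal usage sketch (hyperparameter and metric names are illustrative assumptions):

    from syne_tune.config_space import loguniform
    from syne_tune.optimizer.schedulers import PopulationBasedTraining

    config_space = {"learning_rate": loguniform(1e-6, 1e-2)}
    scheduler = PopulationBasedTraining(
        config_space,
        metric="mean_loss",
        mode="min",
        max_t=200,                     # maximum value of the resource attribute
        resource_attr="time_total_s",  # reported by the training script
        population_size=4,
        perturbation_interval=60,      # consider perturbation every 60 resource units
    )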

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_result(trial, result)[source]

We simply relay result to the searcher. Other decisions are done in on_trial_complete.

Return type:

str

class syne_tune.optimizer.schedulers.RayTuneScheduler(config_space, ray_scheduler=None, ray_searcher=None, points_to_evaluate=None)[source]

Bases: TrialScheduler

Allows using Ray Tune schedulers and searchers. Any searcher/scheduler should work, except those which need access to TrialRunner (e.g., PBT); this feature is not implemented in Syne Tune.

If ray_searcher is not given (defaults to random searcher), initial configurations to evaluate can be passed in points_to_evaluate. If ray_searcher is given, this argument is ignored (needs to be passed to ray_searcher at construction). Note: Use impute_points_to_evaluate() in order to preprocess points_to_evaluate specified by the user or the benchmark.

Parameters:
  • config_space (Dict) – Configuration space

  • ray_scheduler – Ray scheduler, defaults to FIFO scheduler

  • ray_searcher (Optional[Searcher]) – Ray searcher, defaults to random search

  • points_to_evaluate (Optional[List[Dict]]) – See above
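
A minimal usage sketch wrapping a Ray Tune scheduler (requires Ray Tune to be installed; hyperparameter, metric, and argument values are illustrative assumptions):

    from ray.tune.schedulers import AsyncHyperBandScheduler

    from syne_tune.config_space import loguniform
    from syne_tune.optimizer.schedulers import RayTuneScheduler

    config_space = {"learning_rate": loguniform(1e-6, 1e-2), "epochs": 27}
    ray_scheduler = AsyncHyperBandScheduler(
        metric="validation_error", mode="min", time_attr="epoch", max_t=27
    )
    scheduler = RayTuneScheduler(config_space, ray_scheduler=ray_scheduler)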

RT_FIFOScheduler

alias of FIFOScheduler

RT_Searcher

alias of Searcher

class RandomSearch(config_space, points_to_evaluate, mode)[source]

Bases: Searcher

suggest(trial_id)[source]

Queries the algorithm to retrieve the next set of parameters.

Return type:

Optional[Dict]

Arguments:

trial_id: Trial ID used for subsequent notifications.

Returns:
dict | FINISHED | None: Configuration for a trial, if possible.

If FINISHED is returned, Tune will be notified that no more suggestions/configurations will be provided. If None is returned, Tune will skip the querying of the searcher for this step.

on_trial_complete(trial_id, result=None, error=False)[source]

Notification for the completion of trial.

Typically, this method is used for notifying the underlying optimizer of the result.

Args:

trial_id: A unique string ID for the trial.

result: Dictionary of metrics for current training progress. Note that the result dict may include NaNs or may not include the optimization metric. It is up to the subclass implementation to preprocess the result to avoid breaking the optimization process. Upon errors, this may also be None.

error: True if the training process raised an error.

on_trial_add(trial)[source]

Called when a new trial is added to the trial runner.

Additions are normally triggered by suggest.

Parameters:

trial (Trial) – Trial to be added

on_trial_error(trial)[source]

Called when a trial has failed.

Parameters:

trial (Trial) – Trial for which error is reported.

on_trial_result(trial, result)[source]

Called on each intermediate result reported by a trial.

At this point, the trial scheduler can make a decision by returning one of SchedulerDecision.CONTINUE, SchedulerDecision.PAUSE, or SchedulerDecision.STOP. This will only be called when the trial is currently running.

Parameters:
  • trial (Trial) – Trial for which results are reported

  • result (Dict) – Result dictionary

Return type:

str

Returns:

Decision what to do with the trial

on_trial_complete(trial, result)[source]

Notification for the completion of trial.

Note that on_trial_result() is called with the same result before. However, if the scheduler only uses one final report from each trial, it may ignore on_trial_result() and just use result here.

Parameters:
  • trial (Trial) – Trial which is completing

  • result (Dict) – Result dictionary

on_trial_remove(trial)[source]

Called to remove trial.

This is called when the trial is in PAUSED or PENDING state. Otherwise, call on_trial_complete().

Parameters:

trial (Trial) – Trial to be removed

metric_names()[source]
Return type:

List[str]

Returns:

List of metric names. The first one is the target metric optimized over, unless the scheduler is a genuine multi-objective scheduler (for example, for sampling the Pareto front)

metric_mode()[source]
Return type:

str

Returns:

“min” if target metric is minimized, otherwise “max”. Here, “min” should be the default. For a genuine multi-objective scheduler, a list of modes is returned

static convert_config_space(config_space)[source]

Converts config_space from our type to the one of Ray Tune.

Note: randint(lower, upper) in Ray Tune has an exclusive upper bound, while ours is inclusive. On the other hand, lograndint(lower, upper) has an inclusive upper bound in Ray Tune as well, so no adjustment is needed there.

Parameters:

config_space – Configuration space

Returns:

config_space converted into Ray Tune type
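
A small sketch of the boundary convention mentioned in the note above (domain name and bounds are illustrative):

    from syne_tune.config_space import randint
    from syne_tune.optimizer.schedulers import RayTuneScheduler

    # Syne Tune: randint(1, 10) samples from {1, ..., 10} (inclusive upper bound).
    # The converted Ray Tune domain must therefore use an exclusive upper bound
    # of 11 to cover the same set of values.
    ray_config_space = RayTuneScheduler.convert_config_space({"batch_size": randint(1, 10)})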

Subpackages

Submodules