syne_tune.experiments.benchmark_definitions.common module

class syne_tune.experiments.benchmark_definitions.common.SurrogateBenchmarkDefinition(max_wallclock_time, n_workers, elapsed_time_attr, metric, mode, blackbox_name, dataset_name, max_num_evaluations=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, max_resource_attr=None, datasets=None, fidelities=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for tabulated benchmark, served by the blackbox repository.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.
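
For instance (a minimal sketch; the metric names below are placeholders, not fixed Syne Tune names):

    # Single-objective benchmark:
    metric = "metric_valid_error"
    mode = "min"

    # Multi-objective benchmark (e.g., cost-aware HPO): metric is a list of
    # objective names; mode is a list of the same length, or a single scalar
    # that applies to all objectives.
    metric = ["metric_valid_error", "metric_cost"]
    mode = ["min", "min"]  # or simply mode = "min"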

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other): many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • elapsed_time_attr (str) – Name of metric reporting elapsed time

  • metric (Union[str, List[str]]) – Name of metric reported (or list of several)

  • mode (Union[str, List[str]]) – “max” or “min” (or list of several)

  • blackbox_name (str) – Name of blackbox, see load_blackbox()

  • dataset_name (str) – Dataset (or instance) for blackbox

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • surrogate (Optional[str]) – Default value for surrogate to be used, see make_surrogate(). If not given, no surrogate is used

  • surrogate_kwargs (Optional[dict]) – Default value for arguments of surrogate, see make_surrogate()

  • add_surrogate_kwargs (Optional[dict]) – Arguments passed to add_surrogate(). Optional.

  • max_resource_attr (Optional[str]) – Internal name between backend and scheduler

  • datasets (Optional[List[str]]) – Used in transfer tuning

  • fidelities (Optional[List[int]]) – If given, this is a strictly increasing subset of the fidelity values provided by the surrogate, and only those will be reported

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to pass this default configuration here.

max_wallclock_time: float
n_workers: int
elapsed_time_attr: str
metric: Union[str, List[str]]
mode: Union[str, List[str]]
blackbox_name: str
dataset_name: str
max_num_evaluations: Optional[int] = None
surrogate: Optional[str] = None
surrogate_kwargs: Optional[dict] = None
add_surrogate_kwargs: Optional[dict] = None
max_resource_attr: Optional[str] = None
datasets: Optional[List[str]] = None
fidelities: Optional[List[int]] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
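
A minimal construction sketch. The values below are illustrative assumptions, not defaults shipped with Syne Tune; blackbox, dataset, and metric names must match what the blackbox repository serves via load_blackbox():

    from syne_tune.experiments.benchmark_definitions.common import (
        SurrogateBenchmarkDefinition,
    )

    # Illustrative values only: blackbox_name, dataset_name, and the metric
    # names are assumptions and must agree with the tabulated blackbox.
    benchmark = SurrogateBenchmarkDefinition(
        max_wallclock_time=3600,          # default stopping criterion (seconds)
        n_workers=4,                      # default number of parallel workers
        elapsed_time_attr="metric_elapsed_time",
        metric="metric_valid_error",
        mode="min",
        blackbox_name="nasbench201",
        dataset_name="cifar10",
        surrogate="KNeighborsRegressor",  # optional: interpolate between tabulated points
        surrogate_kwargs={"n_neighbors": 1},
        max_resource_attr="epochs",
    )
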
class syne_tune.experiments.benchmark_definitions.common.RealBenchmarkDefinition(script, config_space, max_wallclock_time, n_workers, instance_type, metric, mode, max_resource_attr, framework, resource_attr=None, estimator_kwargs=None, max_num_evaluations=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for real benchmark, given by code.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other): many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • script (Path) – Absolute filename of training script

  • config_space (Dict[str, Any]) – Default value for configuration space, must include max_resource_attr

  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • instance_type (str) – Default value for instance type

  • metric (str) – Name of metric reported (or list of several)

  • mode (str) – “max” or “min” (or list of several)

  • max_resource_attr (str) – Name of config_space entry

  • framework (str) – SageMaker framework to be used for the script. Additional dependencies are listed in requirements.txt in script.parent

  • resource_attr (Optional[str]) – Name of attribute reported (required for multi-fidelity)

  • estimator_kwargs (Optional[dict]) – Additional arguments to SageMaker estimator, e.g. framework_version

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to pass this default configuration here.

script: Path
config_space: Dict[str, Any]
max_wallclock_time: float
n_workers: int
instance_type: str
metric: str
mode: str
max_resource_attr: str
framework: str
resource_attr: Optional[str] = None
estimator_kwargs: Optional[dict] = None
max_num_evaluations: Optional[int] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
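
A minimal construction sketch. The training script path, configuration space, metric names, and estimator arguments below are illustrative assumptions and must match your own benchmark code:

    from pathlib import Path

    from syne_tune.config_space import loguniform, randint
    from syne_tune.experiments.benchmark_definitions.common import (
        RealBenchmarkDefinition,
    )

    # Illustrative values only: "train_model.py", the metric names, and the
    # framework version are placeholders for your own training code.
    config_space = {
        "n_units": randint(4, 1024),
        "learning_rate": loguniform(1e-6, 1.0),
        "epochs": 81,                       # entry named by max_resource_attr
    }
    benchmark = RealBenchmarkDefinition(
        script=Path(__file__).parent / "train_model.py",
        config_space=config_space,
        max_wallclock_time=3 * 3600,
        n_workers=4,
        instance_type="ml.c5.4xlarge",
        metric="accuracy",
        mode="max",
        max_resource_attr="epochs",
        framework="PyTorch",
        resource_attr="epoch",              # only needed for multi-fidelity methods
        estimator_kwargs={"framework_version": "1.13", "py_version": "py39"},
    )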