syne_tune.experiments.benchmark_definitions.common module

class syne_tune.experiments.benchmark_definitions.common.SurrogateBenchmarkDefinition(max_wallclock_time, n_workers, elapsed_time_attr, metric, mode, blackbox_name, dataset_name, max_num_evaluations=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, max_resource_attr=None, datasets=None, fidelities=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for tabulated benchmark, served by the blackbox repository.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.
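
For instance (a minimal sketch; the metric names below are placeholders, not fixed Syne Tune names):

    # Single-objective benchmark:
    metric = "metric_valid_error"
    mode = "min"

    # Multi-objective benchmark (e.g., cost-aware HPO): metric is a list of
    # objective names; mode is a list of the same length, or a single scalar
    # that applies to all objectives.
    metric = ["metric_valid_error", "metric_cost"]
    mode = ["min", "min"]  # or simply mode = "min"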

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other): many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • elapsed_time_attr (str) – Name of metric reporting elapsed time

  • metric (Union[str, List[str]]) – Name of metric reported (or list of several)

  • mode (Union[str, List[str]]) – “max” or “min” (or list of several)

  • blackbox_name (str) – Name of blackbox, see load_blackbox()

  • dataset_name (str) – Dataset (or instance) for blackbox

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • surrogate (Optional[str]) – Default value for surrogate to be used, see make_surrogate(). If not given, no surrogate is used

  • surrogate_kwargs (Optional[dict]) – Default value for arguments of surrogate, see make_surrogate()

  • add_surrogate_kwargs (Optional[dict]) – Arguments passed to add_surrogate(). Optional.

  • max_resource_attr (Optional[str]) – Internal name between backend and scheduler

  • datasets (Optional[List[str]]) – Used in transfer tuning

  • fidelities (Optional[List[int]]) – If given, this is a strictly increasing subset of the fidelity values provided by the surrogate, and only those will be reported

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to pass this default configuration here.

max_wallclock_time: float
n_workers: int
elapsed_time_attr: str
metric: Union[str, List[str]]
mode: Union[str, List[str]]
blackbox_name: str
dataset_name: str
max_num_evaluations: Optional[int] = None
surrogate: Optional[str] = None
surrogate_kwargs: Optional[dict] = None
add_surrogate_kwargs: Optional[dict] = None
max_resource_attr: Optional[str] = None
datasets: Optional[List[str]] = None
fidelities: Optional[List[int]] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
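
A minimal construction sketch. The values below are illustrative assumptions, not defaults shipped with Syne Tune; blackbox, dataset, and metric names must match what the blackbox repository serves via load_blackbox():

    from syne_tune.experiments.benchmark_definitions.common import (
        SurrogateBenchmarkDefinition,
    )

    # Illustrative values only: blackbox_name, dataset_name, and the metric
    # names are assumptions and must agree with the tabulated blackbox.
    benchmark = SurrogateBenchmarkDefinition(
        max_wallclock_time=3600,          # default stopping criterion (seconds)
        n_workers=4,                      # default number of parallel workers
        elapsed_time_attr="metric_elapsed_time",
        metric="metric_valid_error",
        mode="min",
        blackbox_name="nasbench201",
        dataset_name="cifar10",
        surrogate="KNeighborsRegressor",  # optional: interpolate between tabulated points
        surrogate_kwargs={"n_neighbors": 1},
        max_resource_attr="epochs",
    )
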
class syne_tune.experiments.benchmark_definitions.common.RealBenchmarkDefinition(script, config_space, max_wallclock_time, n_workers, instance_type, metric, mode, max_resource_attr, framework, resource_attr=None, estimator_kwargs=None, max_num_evaluations=None, points_to_evaluate=None)[source]

Bases: object

Meta-data for real benchmark, given by code.

For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.

Note

In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other): many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.

Parameters:
  • script (Path) – Absolute filename of training script

  • config_space (Dict[str, Any]) – Default value for configuration space, must include max_resource_attr

  • max_wallclock_time (float) – Default value for stopping criterion

  • n_workers (int) – Default value for tuner

  • instance_type (str) – Default value for instance type

  • metric (str) – Name of metric reported (or list of several)

  • mode (str) – “max” or “min” (or list of several)

  • max_resource_attr (str) – Name of config_space entry

  • framework (str) – SageMaker framework to be used for the script. Additional dependencies are listed in requirements.txt in script.parent

  • resource_attr (Optional[str]) – Name of attribute reported (required for multi-fidelity)

  • estimator_kwargs (Optional[dict]) – Additional arguments to SageMaker estimator, e.g. framework_version

  • max_num_evaluations (Optional[int]) – Default value for stopping criterion

  • points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to pass this default configuration here.

script: Path
config_space: Dict[str, Any]
max_wallclock_time: float
n_workers: int
instance_type: str
metric: str
mode: str
max_resource_attr: str
framework: str
resource_attr: Optional[str] = None
estimator_kwargs: Optional[dict] = None
max_num_evaluations: Optional[int] = None
points_to_evaluate: Optional[List[Dict[str, Any]]] = None
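
A minimal construction sketch. The training script path, configuration space, metric names, and estimator arguments below are illustrative assumptions and must match your own benchmark code:

    from pathlib import Path

    from syne_tune.config_space import loguniform, randint
    from syne_tune.experiments.benchmark_definitions.common import (
        RealBenchmarkDefinition,
    )

    # Illustrative values only: "train_model.py", the metric names, and the
    # framework version are placeholders for your own training code.
    config_space = {
        "n_units": randint(4, 1024),
        "learning_rate": loguniform(1e-6, 1.0),
        "epochs": 81,                       # entry named by max_resource_attr
    }
    benchmark = RealBenchmarkDefinition(
        script=Path(__file__).parent / "train_model.py",
        config_space=config_space,
        max_wallclock_time=3 * 3600,
        n_workers=4,
        instance_type="ml.c5.4xlarge",
        metric="accuracy",
        mode="max",
        max_resource_attr="epochs",
        framework="PyTorch",
        resource_attr="epoch",              # only needed for multi-fidelity methods
        estimator_kwargs={"framework_version": "1.13", "py_version": "py39"},
    )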