syne_tune.experiments.benchmark_definitions.common module
- class syne_tune.experiments.benchmark_definitions.common.SurrogateBenchmarkDefinition(max_wallclock_time, n_workers, elapsed_time_attr, metric, mode, blackbox_name, dataset_name, max_num_evaluations=None, surrogate=None, surrogate_kwargs=None, add_surrogate_kwargs=None, max_resource_attr=None, datasets=None, fidelities=None, points_to_evaluate=None)[source]
Bases: object
Meta-data for tabulated benchmark, served by the blackbox repository.
For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.
Note
In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.
- Parameters:
  - max_wallclock_time (float) – Default value for stopping criterion
  - n_workers (int) – Default value for tuner
  - elapsed_time_attr (str) – Name of metric reported
  - metric (Union[str, List[str]]) – Name of metric reported (or list of several)
  - mode (Union[str, List[str]]) – “max” or “min” (or list of several)
  - blackbox_name (str) – Name of blackbox, see load_blackbox()
  - dataset_name (str) – Dataset (or instance) for blackbox
  - max_num_evaluations (Optional[int]) – Default value for stopping criterion
  - surrogate (Optional[str]) – Default value for surrogate to be used, see make_surrogate(). If not given, no surrogate is used
  - surrogate_kwargs (Optional[dict]) – Default value for arguments of surrogate, see make_surrogate()
  - add_surrogate_kwargs (Optional[dict]) – Arguments passed to add_surrogate(). Optional.
  - max_resource_attr (Optional[str]) – Internal name between backend and scheduler
  - datasets (Optional[List[str]]) – Used in transfer tuning
  - fidelities (Optional[List[int]]) – If given, this is a strictly increasing subset of the fidelity values provided by the surrogate, and only those will be reported
  - points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to serve this default configuration here.
- max_wallclock_time: float
- n_workers: int
- elapsed_time_attr: str
- metric: Union[str, List[str]]
- mode: Union[str, List[str]]
- blackbox_name: str
- dataset_name: str
- max_num_evaluations: Optional[int] = None
- surrogate: Optional[str] = None
- surrogate_kwargs: Optional[dict] = None
- add_surrogate_kwargs: Optional[dict] = None
- max_resource_attr: Optional[str] = None
- datasets: Optional[List[str]] = None
- fidelities: Optional[List[int]] = None
- points_to_evaluate: Optional[List[Dict[str, Any]]] = None
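As a sketch of how these fields fit together, the following hypothetical definition describes a tabulated benchmark. The blackbox, dataset, and metric names are assumptions for illustration; they must match entries actually served by your blackbox repository.

```python
from syne_tune.experiments.benchmark_definitions.common import (
    SurrogateBenchmarkDefinition,
)

# Hypothetical tabulated benchmark definition. The blackbox, dataset, and
# metric names below are placeholders, not values prescribed by Syne Tune.
benchmark = SurrogateBenchmarkDefinition(
    max_wallclock_time=6 * 3600,      # default stopping criterion (seconds)
    n_workers=4,                      # default number of parallel workers
    elapsed_time_attr="metric_elapsed_time",
    metric="metric_valid_error",      # single objective; pass a list for multi-objective
    mode="min",                       # list of the same size (or a scalar) if metric is a list
    blackbox_name="nasbench201",
    dataset_name="cifar10",
    max_resource_attr="epochs",       # internal name between backend and scheduler
)
```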
- class syne_tune.experiments.benchmark_definitions.common.RealBenchmarkDefinition(script, config_space, max_wallclock_time, n_workers, instance_type, metric, mode, max_resource_attr, framework, resource_attr=None, estimator_kwargs=None, max_num_evaluations=None, points_to_evaluate=None)[source]
Bases: object
Meta-data for real benchmark, given by code.
For a standard benchmark, metric and mode are scalars, and there is a single metric. For a multi-objective benchmark (e.g., constrained HPO, cost-aware HPO, sampling of Pareto front), metric must be a list with the names of the different objectives. In this case, mode is a list of the same size or a scalar.
Note
In Syne Tune experimentation, a benchmark is simply a tuning problem (training and evaluation code or blackbox, together with defaults). Benchmarks are useful beyond benchmarking (i.e., comparing different HPO methods with each other), in that many experimental studies compare setups with a single HPO method, but different variations of the tuning problem or the backend.
- Parameters:
  - script (Path) – Absolute filename of training script
  - config_space (Dict[str, Any]) – Default value for configuration space, must include max_resource_attr
  - max_wallclock_time (float) – Default value for stopping criterion
  - n_workers (int) – Default value for tuner
  - instance_type (str) – Default value for instance type
  - metric (str) – Name of metric reported (or list of several)
  - mode (str) – “max” or “min” (or list of several)
  - max_resource_attr (str) – Name of config_space entry
  - framework (str) – SageMaker framework to be used for script. Additional dependencies are listed in requirements.txt in script.parent
  - resource_attr (Optional[str]) – Name of attribute reported (required for multi-fidelity)
  - estimator_kwargs (Optional[dict]) – Additional arguments to SageMaker estimator, e.g. framework_version
  - max_num_evaluations (Optional[int]) – Default value for stopping criterion
  - points_to_evaluate (Optional[List[Dict[str, Any]]]) – Initial configurations to be suggested by the scheduler. If your benchmark training code suggests default values for the hyperparameters, it is good practice to serve this default configuration here.
- script: Path
- config_space: Dict[str, Any]
- max_wallclock_time: float
- n_workers: int
- instance_type: str
- metric: str
- mode: str
- max_resource_attr: str
- framework: str
- resource_attr: Optional[str] = None
- estimator_kwargs: Optional[dict] = None
- max_num_evaluations: Optional[int] = None
- points_to_evaluate: Optional[List[Dict[str, Any]]] = None
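For comparison, here is a hypothetical RealBenchmarkDefinition for a benchmark evaluated by actually running a training script. The script path, hyperparameter names, instance type, and estimator arguments are assumptions for illustration only.

```python
from pathlib import Path

from syne_tune.config_space import loguniform, randint
from syne_tune.experiments.benchmark_definitions.common import (
    RealBenchmarkDefinition,
)

# Hypothetical real benchmark: train.py is assumed to sit next to this file,
# accept the hyperparameters below, and report "accuracy" and "epoch".
benchmark = RealBenchmarkDefinition(
    script=Path(__file__).parent / "train.py",
    config_space={
        "epochs": 27,                            # entry named by max_resource_attr
        "learning_rate": loguniform(1e-5, 1e-1),
        "batch_size": randint(16, 256),
    },
    max_wallclock_time=3 * 3600,
    n_workers=4,
    instance_type="ml.g4dn.xlarge",              # default SageMaker instance type
    metric="accuracy",
    mode="max",
    max_resource_attr="epochs",
    framework="PyTorch",
    resource_attr="epoch",                       # required for multi-fidelity schedulers
    estimator_kwargs=dict(framework_version="1.13", py_version="py39"),
)
```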