syne_tune.backend.sagemaker_backend.sagemaker_backend module

class syne_tune.backend.sagemaker_backend.sagemaker_backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]

Bases: TrialBackend

This backend executes each trial evaluation as a separate SageMaker training job, using sm_estimator as estimator.

Checkpoints are written to and loaded from S3, using checkpoint_s3_uri of the estimator.

Compared to LocalBackend, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.

This backend lets you select the instance type and count for a trial evaluation by passing values in the configuration under the names ST_INSTANCE_TYPE and ST_INSTANCE_COUNT. If these are given in the configuration, they override the defaults in sm_estimator. This makes it possible to tune instance type and count along with the hyperparameter configuration.
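The override rule described above can be illustrated with a minimal sketch. It is not the backend's own code: the helper resolve_instance_settings is hypothetical, and the string values of the two keys are assumptions made for this example (real code should import the constants from Syne Tune instead of hard-coding them).

```python
from typing import Any, Dict, Tuple

# Stand-ins for the Syne Tune constants of the same name. The string values
# are assumptions made for this sketch only.
ST_INSTANCE_TYPE = "st_instance_type"
ST_INSTANCE_COUNT = "st_instance_count"


def resolve_instance_settings(
    config: Dict[str, Any], default_type: str, default_count: int
) -> Tuple[str, int]:
    """Return (instance_type, instance_count) for one trial.

    Values found in ``config`` take precedence over the estimator defaults.
    """
    instance_type = config.get(ST_INSTANCE_TYPE, default_type)
    instance_count = int(config.get(ST_INSTANCE_COUNT, default_count))
    return instance_type, instance_count


# The configuration sets the instance type but not the count, so the count
# falls back to the estimator default.
config = {"lr": 0.01, ST_INSTANCE_TYPE: "ml.g4dn.xlarge"}
print(resolve_instance_settings(config, "ml.m5.large", 1))
```

Adding the two keys to the configuration space lets a scheduler search over them like any other hyperparameter.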

Additional arguments on top of parent class TrialBackend:

Parameters:
  • sm_estimator (Framework) – SageMaker estimator for trial evaluations.

  • metrics_names (Optional[List[str]]) – Names of metrics passed to report, used to plot live curves in SageMaker (optional, only used for visualization).

  • s3_path (Optional[str]) – S3 base path used for checkpointing. The full path also involves the tuner name and the trial_id. The default base path is the S3 bucket associated with the SageMaker account

  • sagemaker_fit_kwargs – Extra arguments that are passed to sagemaker.estimator.Framework when fitting the job, for instance {'train': 's3://my-data-bucket/path/to/my/training/data'}.
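A construction call might look like the sketch below. The helper build_backend, the PyTorch estimator, the instance type, the framework version, the time limit, and the metric name "accuracy" are all assumptions chosen for illustration; any Framework estimator should work in the same position. Imports are placed inside the function so that it can be defined without the AWS libraries installed; calling it requires sagemaker, syne_tune, and valid AWS credentials.

```python
def build_backend(entry_point: str, role: str):
    """Create a SageMakerBackend for a training script (illustrative sketch).

    ``entry_point`` is the path of the training script and ``role`` is the
    ARN of a SageMaker execution role; both are supplied by the caller.
    """
    # Lazy imports: only needed when the function is actually called.
    from sagemaker.pytorch import PyTorch
    from syne_tune.backend import SageMakerBackend

    # Estimator settings below are example values, not recommendations.
    estimator = PyTorch(
        entry_point=entry_point,
        role=role,
        instance_type="ml.m5.large",
        instance_count=1,
        framework_version="1.7.1",
        py_version="py3",
        max_run=10 * 60,  # per-job time limit in seconds
    )
    return SageMakerBackend(
        sm_estimator=estimator,
        metrics_names=["accuracy"],  # hypothetical metric reported by the script
    )
```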

property sm_client

add_metric_definitions_to_sagemaker_estimator(metrics_names)[source]

busy_trial_ids()[source]

Returns the list of currently busy trials, each with its ID and status.

A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.

Return type:

List[Tuple[int, str]]

Returns:

List of (trial_id, status)
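The arithmetic for free capacity can be sketched as follows. The helper num_free_workers is hypothetical and only mirrors the n_workers - n rule stated above; the sample list imitates a return value of busy_trial_ids().

```python
from typing import List, Tuple


def num_free_workers(busy: List[Tuple[int, str]], n_workers: int) -> int:
    """Number of new jobs that may be started, given the busy trials."""
    # Clamp at zero in case more trials are busy than workers are configured.
    return max(n_workers - len(busy), 0)


# Two trials still occupy a worker: one running, one in the process of stopping.
busy = [(0, "in_progress"), (3, "stopping")]
print(num_free_workers(busy, n_workers=4))  # prints 2
```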

stdout(trial_id)[source]

Fetch stdout log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stdout)

stderr(trial_id)[source]

Fetch stderr log for trial

Parameters:

trial_id (int) – ID of trial

Return type:

List[str]

Returns:

Lines of the log of the trial (stderr)
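Since both methods return a list of lines, inspecting a failed trial can be as simple as the sketch below. The helper tail_logs and the FakeBackend stand-in are hypothetical; the helper relies only on the stdout(trial_id) and stderr(trial_id) signatures documented above.

```python
from typing import Dict, List


def tail_logs(backend, trial_id: int, n: int = 20) -> List[str]:
    """Return the last ``n`` lines of stderr, or of stdout if stderr is empty."""
    if n <= 0:
        return []
    lines = backend.stderr(trial_id) or backend.stdout(trial_id)
    return lines[-n:]


class FakeBackend:
    """In-memory stand-in for a backend, used only to exercise the helper."""

    def __init__(self, out: Dict[int, List[str]], err: Dict[int, List[str]]):
        self._out, self._err = out, err

    def stdout(self, trial_id: int) -> List[str]:
        return self._out.get(trial_id, [])

    def stderr(self, trial_id: int) -> List[str]:
        return self._err.get(trial_id, [])


fake = FakeBackend(
    out={0: ["epoch 1", "epoch 2", "epoch 3"]},
    err={1: ["Traceback (most recent call last):", "ValueError: bad lr"]},
)
print(tail_logs(fake, trial_id=0, n=2))  # last two stdout lines
print(tail_logs(fake, trial_id=1, n=1))  # last stderr line
```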

property source_dir: str | None

set_entrypoint(entry_point)[source]

Update the entrypoint.

Parameters:

entry_point (str) – New path of the entrypoint.

entrypoint_path()[source]

Return type:

Path

Returns:

Entrypoint path of script to be executed

initialize_sagemaker_session()[source]

copy_checkpoint(src_trial_id, tgt_trial_id)[source]

Copy the checkpoint folder from one trial to the other.

Parameters:
  • src_trial_id (int) – Source trial ID (copy from)

  • tgt_trial_id (int) – Target trial ID (copy to)

delete_checkpoint(trial_id)[source]

Removes checkpoint folder for a trial. It is OK for the folder not to exist.

Parameters:

trial_id (int) – ID of trial for which checkpoint files are deleted

set_path(results_root=None, tuner_name=None)[source]

For this backend, it is mandatory to call this method passing tuner_name before the backend is used. results_root is ignored here.

on_tuner_save()[source]

Called at the end of save().