syne_tune.backend.sagemaker_backend.sagemaker_backend module
- class syne_tune.backend.sagemaker_backend.sagemaker_backend.SageMakerBackend(sm_estimator, metrics_names=None, s3_path=None, delete_checkpoints=False, pass_args_as_json=False, **sagemaker_fit_kwargs)[source]
Bases: TrialBackend
This backend executes each trial evaluation as a separate SageMaker training job, using sm_estimator as the estimator.
Checkpoints are written to and loaded from S3, using the checkpoint_s3_uri of the estimator.
Compared to LocalBackend, this backend can run any number of jobs in parallel (given sufficient resources), and any instance type can be used.
This backend lets you select the instance type and count for a trial evaluation by passing values in the configuration under the names ST_INSTANCE_TYPE and ST_INSTANCE_COUNT. If these are given in the configuration, they override the defaults in sm_estimator. This makes it possible to tune instance type and count along with the hyperparameter configuration.
Additional arguments on top of parent class TrialBackend:
- Parameters:
sm_estimator (Framework) – SageMaker estimator for trial evaluations.
metrics_names (Optional[List[str]]) – Names of metrics passed to report, used to plot live curves in SageMaker (optional, only used for visualization).
s3_path (Optional[str]) – S3 base path used for checkpointing. The full path also involves the tuner name and the trial_id. The default base path is the S3 bucket associated with the SageMaker account.
sagemaker_fit_kwargs – Extra arguments that are passed to sagemaker.estimator.Framework when fitting the job, for instance {'train': 's3://my-data-bucket/path/to/my/training/data'}
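The per-trial override of instance type and count described above can be illustrated without any AWS calls. The following is a minimal sketch, not the backend's actual implementation: the string values assigned to ST_INSTANCE_TYPE and ST_INSTANCE_COUNT are assumed placeholders, and resolve_instance_settings is a hypothetical helper introduced only for this example.

```python
# Assumed placeholder values for the two reserved configuration keys.
# In practice, import the constants from Syne Tune rather than hard-coding them.
ST_INSTANCE_TYPE = "st_instance_type"
ST_INSTANCE_COUNT = "st_instance_count"


def resolve_instance_settings(config, default_type, default_count):
    """Hypothetical helper: values in the trial configuration take
    precedence over the defaults configured on the estimator."""
    instance_type = config.get(ST_INSTANCE_TYPE, default_type)
    instance_count = config.get(ST_INSTANCE_COUNT, default_count)
    return instance_type, instance_count


# A trial configuration that overrides the instance type but not the count
config = {"learning_rate": 0.01, ST_INSTANCE_TYPE: "ml.g4dn.xlarge"}
instance_type, instance_count = resolve_instance_settings(
    config, default_type="ml.m5.large", default_count=1
)
```

Because the two keys live in the same configuration as the hyperparameters, a search space can include them like any other entry.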
- property sm_client
- busy_trial_ids()[source]
Returns list of ids for currently busy trials
A trial is busy if its status is in_progress or stopping. If the execution setup is able to run n_workers jobs in parallel, then if this method returns a list of size n, the tuner may start n_workers - n new jobs.
- Return type:
List[Tuple[int, str]]
- Returns:
List of (trial_id, status)
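The relation between the returned list and the number of jobs that may be started can be sketched as follows. The helper free_worker_slots and the sample list are hypothetical; only the tuple shape (trial_id, status) and the two status strings come from the description above.

```python
from typing import List, Tuple


def free_worker_slots(busy: List[Tuple[int, str]], n_workers: int) -> int:
    """Number of new jobs that may be started: n_workers - n, floored at 0."""
    return max(n_workers - len(busy), 0)


# Stand-in for the return value of busy_trial_ids(); the entries are made up
busy = [(0, "in_progress"), (3, "stopping")]
slots = free_worker_slots(busy, n_workers=4)
```

Note that trials in status stopping still count as busy, so a slot is only freed once the job has actually terminated.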
- stdout(trial_id)[source]
Fetch stdout log for trial
- Parameters:
trial_id (int) – ID of trial
- Return type:
List[str]
- Returns:
Lines of the log of the trial (stdout)
- stderr(trial_id)[source]
Fetch stderr log for trial
- Parameters:
trial_id (int) – ID of trial
- Return type:
List[str]
- Returns:
Lines of the log of the trial (stderr)
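Since stdout and stderr both return the log as a list of lines, a common follow-up is to look only at the end of the log when a trial fails. A minimal sketch, assuming a hypothetical helper last_lines and made-up log content in place of a real call to stderr(trial_id):

```python
from typing import List


def last_lines(lines: List[str], n: int = 20) -> str:
    """Join the final n log lines into one printable string."""
    # Guard n <= 0 explicitly, because lines[-0:] would return the whole list
    return "\n".join(lines[-n:]) if n > 0 else ""


# Stand-in for backend.stderr(trial_id); the content is invented
stderr_lines = ["epoch 1", "epoch 2", "ValueError: bad lr"]
tail = last_lines(stderr_lines, n=2)
```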
- property source_dir: str | None
- set_entrypoint(entry_point)[source]
Update the entrypoint.
- Parameters:
entry_point (str) – New path of the entrypoint.
- copy_checkpoint(src_trial_id, tgt_trial_id)[source]
Copy the checkpoint folder from one trial to the other.
- Parameters:
src_trial_id (int) – Source trial ID (copy from)
tgt_trial_id (int) – Target trial ID (copy to)
- delete_checkpoint(trial_id)[source]
Removes checkpoint folder for a trial. It is OK for the folder not to exist.
- Parameters:
trial_id (int) – ID of trial for which checkpoint files are deleted
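The semantics of copy_checkpoint and delete_checkpoint can be illustrated with a local-filesystem analogue. This is only a sketch: the backend itself works on S3, and the folder layout in checkpoint_dir as well as the function names copy_checkpoint_local and delete_checkpoint_local are assumptions made for this example.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint_dir(root: Path, trial_id: int) -> Path:
    # Hypothetical layout: one checkpoint folder per trial under a common root
    return root / str(trial_id) / "checkpoints"


def copy_checkpoint_local(root: Path, src_trial_id: int, tgt_trial_id: int) -> None:
    """Copy the whole checkpoint folder of one trial to another trial."""
    src = checkpoint_dir(root, src_trial_id)
    tgt = checkpoint_dir(root, tgt_trial_id)
    shutil.copytree(src, tgt, dirs_exist_ok=True)


def delete_checkpoint_local(root: Path, trial_id: int) -> None:
    """Remove the checkpoint folder; a missing folder is not an error."""
    shutil.rmtree(checkpoint_dir(root, trial_id), ignore_errors=True)


root = Path(tempfile.mkdtemp())
src = checkpoint_dir(root, 0)
src.mkdir(parents=True)
(src / "model.bin").write_bytes(b"weights")

copy_checkpoint_local(root, src_trial_id=0, tgt_trial_id=1)
copied = (checkpoint_dir(root, 1) / "model.bin").read_bytes()

delete_checkpoint_local(root, trial_id=1)
delete_checkpoint_local(root, trial_id=1)  # second call: folder is already gone
still_there = checkpoint_dir(root, 1).exists()
```

The second delete call mirrors the statement that it is OK for the folder not to exist: deletion is idempotent, and the source trial's checkpoint is left untouched by the copy.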