---
title: aws_sagemaker_trainingjob
description: Datadog, the leading service for cloud-scale monitoring.
breadcrumbs: Docs > Infrastructure > Datadog Resource Catalog
---

# aws_sagemaker_trainingjob{% #aws_sagemaker_trainingjob %}

## `account_id`{% #account_id %}

**Type**: `STRING`

## `algorithm_specification`{% #algorithm_specification %}

**Type**: `STRUCT`**Provider name**: `AlgorithmSpecification`**Description**: Information about the algorithm used for training, and algorithm metadata.

- `algorithm_name`**Type**: `STRING`**Provider name**: `AlgorithmName`**Description**: The name of the algorithm resource to use for the training job. This must be an algorithm resource that you created or subscribe to on Amazon Web Services Marketplace.You must specify either the algorithm name to the `AlgorithmName` parameter or the image URI of the algorithm container to the `TrainingImage` parameter. Note that the `AlgorithmName` parameter is mutually exclusive with the `TrainingImage` parameter. If you specify a value for the `AlgorithmName` parameter, you can't specify a value for `TrainingImage`, and vice versa. If you specify values for both parameters, the training job might break; if you don't specify any value for both parameters, the training job might raise a `null` error.
- `container_arguments`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `ContainerArguments`**Description**: The arguments for a container used to run a training job. See [How Amazon SageMaker Runs Your Training Image](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html) for additional information.
- `container_entrypoint`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `ContainerEntrypoint`**Description**: The [entrypoint script for a Docker container](https://docs.docker.com/engine/reference/builder/) used to run a training job. This script takes precedence over the default train processing instructions. See [How Amazon SageMaker Runs Your Training Image](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo-dockerfile.html) for more information.
- `enable_sage_maker_metrics_time_series`**Type**: `BOOLEAN`**Provider name**: `EnableSageMakerMetricsTimeSeries`**Description**: To generate and save time-series metrics during training, set to `true`. The default is `false` and time-series metrics aren't generated except in the following cases:
  - You use one of the SageMaker built-in algorithms
  - You use one of the following [Prebuilt SageMaker Docker Images](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-containers-frameworks-deep-learning.html):
    - TensorFlow (version >= 1.15)
    - MXNet (version >= 1.6)
    - PyTorch (version >= 1.3)
  - You specify at least one [MetricDefinition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_MetricDefinition.html)
- `metric_definitions`**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `MetricDefinitions`**Description**: A list of metric definition objects. Each object specifies the metric name and regular expressions used to parse algorithm logs. SageMaker publishes each metric to Amazon CloudWatch.
  - `name`**Type**: `STRING`**Provider name**: `Name`**Description**: The name of the metric.
  - `regex`**Type**: `STRING`**Provider name**: `Regex`**Description**: A regular expression that searches the output of a training job and gets the value of the metric. For more information about using regular expressions to define metrics, see [Defining metrics and environment variables](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics-variables.html).
- `training_image`**Type**: `STRING`**Provider name**: `TrainingImage`**Description**: The registry path of the Docker image that contains the training algorithm. For information about docker registry paths for SageMaker built-in algorithms, see [Docker Registry Paths and Example Code](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html) in the Amazon SageMaker developer guide. SageMaker supports both `registry/repository[:tag]` and `registry/repository[@digest]` image path formats. For more information about using your custom training container, see [Using Your Own Algorithms with Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html).You must specify either the algorithm name to the `AlgorithmName` parameter or the image URI of the algorithm container to the `TrainingImage` parameter. For more information, see the note in the `AlgorithmName` parameter description.
- `training_image_config`**Type**: `STRUCT`**Provider name**: `TrainingImageConfig`**Description**: The configuration to use an image from a private Docker registry for a training job.
  - `training_repository_access_mode`**Type**: `STRING`**Provider name**: `TrainingRepositoryAccessMode`**Description**: The method that your training job will use to gain access to the images in your private Docker registry. For access to an image in a private Docker registry, set to `Vpc`.
  - `training_repository_auth_config`**Type**: `STRUCT`**Provider name**: `TrainingRepositoryAuthConfig`**Description**: An object containing authentication information for a private Docker registry containing your training images.
    - `training_repository_credentials_provider_arn`**Type**: `STRING`**Provider name**: `TrainingRepositoryCredentialsProviderArn`**Description**: The Amazon Resource Name (ARN) of an Amazon Web Services Lambda function used to give SageMaker access credentials to your private Docker registry.
- `training_input_mode`**Type**: `STRING`**Provider name**: `TrainingInputMode`
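The `metric_definitions` mechanics above can be sketched in plain Python: each `Regex` is applied to the algorithm's log output and its first capture group becomes the metric value published to CloudWatch. The log line and patterns below are illustrative, not taken from any specific algorithm.

```python
import re

# Illustrative metric definitions: a name plus a regex whose first
# capture group extracts the metric value from a log line.
metric_definitions = [
    {"Name": "train:loss", "Regex": r"loss=([0-9]+\.[0-9]+)"},
    {"Name": "train:accuracy", "Regex": r"accuracy=([0-9]+\.[0-9]+)"},
]

def parse_metrics(log_line, definitions):
    """Return {metric_name: value} for every definition that matches."""
    results = {}
    for d in definitions:
        match = re.search(d["Regex"], log_line)
        if match:
            results[d["Name"]] = float(match.group(1))
    return results

print(parse_metrics("epoch=3 loss=0.245 accuracy=0.912", metric_definitions))
# → {'train:loss': 0.245, 'train:accuracy': 0.912}
```

A definition whose regex matches nothing in a log line simply publishes no data point for that line.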

## `auto_ml_job_arn`{% #auto_ml_job_arn %}

**Type**: `STRING`**Provider name**: `AutoMLJobArn`**Description**: The Amazon Resource Name (ARN) of an AutoML job.

## `billable_time_in_seconds`{% #billable_time_in_seconds %}

**Type**: `INT32`**Provider name**: `BillableTimeInSeconds`**Description**: The billable time in seconds. Billable time refers to the absolute wall-clock time. Multiply `BillableTimeInSeconds` by the number of instances (`InstanceCount`) in your training cluster to get the total compute time SageMaker bills you if you run distributed training. The formula is as follows: `BillableTimeInSeconds * InstanceCount`. You can calculate the savings from using managed spot training using the formula `(1 - BillableTimeInSeconds / TrainingTimeInSeconds) * 100`. For example, if `BillableTimeInSeconds` is 100 and `TrainingTimeInSeconds` is 500, the savings is 80%.
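As a worked example of the two formulas above (the numbers are the ones from the description):

```python
def total_compute_seconds(billable_time_in_seconds, instance_count):
    # Total compute time billed across a distributed training cluster:
    # BillableTimeInSeconds * InstanceCount.
    return billable_time_in_seconds * instance_count

def spot_savings_percent(billable_time_in_seconds, training_time_in_seconds):
    # Savings from managed spot training, as a percentage:
    # (1 - BillableTimeInSeconds / TrainingTimeInSeconds) * 100.
    return (1 - billable_time_in_seconds / training_time_in_seconds) * 100

print(total_compute_seconds(100, 4))   # → 400 (seconds billed for 4 instances)
print(spot_savings_percent(100, 500))  # → 80.0
```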

## `checkpoint_config`{% #checkpoint_config %}

**Type**: `STRUCT`**Provider name**: `CheckpointConfig`

- `local_path`**Type**: `STRING`**Provider name**: `LocalPath`**Description**: (Optional) The local directory where checkpoints are written. The default directory is `/opt/ml/checkpoints/`.
- `s3_uri`**Type**: `STRING`**Provider name**: `S3Uri`**Description**: Identifies the S3 path where you want SageMaker to store checkpoints. For example, `s3://bucket-name/key-name-prefix`.

## `creation_time`{% #creation_time %}

**Type**: `TIMESTAMP`**Provider name**: `CreationTime`**Description**: A timestamp that indicates when the training job was created.

## `debug_hook_config`{% #debug_hook_config %}

**Type**: `STRUCT`**Provider name**: `DebugHookConfig`

- `collection_configurations`**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `CollectionConfigurations`**Description**: Configuration information for Amazon SageMaker Debugger tensor collections. To learn more about how to configure the `CollectionConfiguration` parameter, see [Use the SageMaker and Debugger Configuration API Operations to Create, Update, and Debug Your Training Job](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html).
  - `collection_name`**Type**: `STRING`**Provider name**: `CollectionName`**Description**: The name of the tensor collection. The name must be unique relative to other rule configuration names.
  - `collection_parameters`**Type**: `MAP_STRING_STRING`**Provider name**: `CollectionParameters`**Description**: Parameter values for the tensor collection. The allowed parameters are `"name"`, `"include_regex"`, `"reduction_config"`, `"save_config"`, `"tensor_names"`, and `"save_histogram"`.
- `hook_parameters`**Type**: `MAP_STRING_STRING`**Provider name**: `HookParameters`**Description**: Configuration information for the Amazon SageMaker Debugger hook parameters.
- `local_path`**Type**: `STRING`**Provider name**: `LocalPath`**Description**: Path to local storage location for metrics and tensors. Defaults to `/opt/ml/output/tensors/`.
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Path to Amazon S3 storage location for metrics and tensors.

## `debug_rule_configurations`{% #debug_rule_configurations %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `DebugRuleConfigurations`**Description**: Configuration information for Amazon SageMaker Debugger rules for debugging output tensors.

- `instance_type`**Type**: `STRING`**Provider name**: `InstanceType`**Description**: The instance type to deploy a custom rule for debugging a training job.
- `local_path`**Type**: `STRING`**Provider name**: `LocalPath`**Description**: Path to local storage location for output of rules. Defaults to `/opt/ml/processing/output/rule/`.
- `rule_configuration_name`**Type**: `STRING`**Provider name**: `RuleConfigurationName`**Description**: The name of the rule configuration. It must be unique relative to other rule configuration names.
- `rule_evaluator_image`**Type**: `STRING`**Provider name**: `RuleEvaluatorImage`**Description**: The Amazon Elastic Container Registry (ECR) image used for the managed rule evaluation.
- `rule_parameters`**Type**: `MAP_STRING_STRING`**Provider name**: `RuleParameters`**Description**: Runtime configuration for rule container.
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Path to Amazon S3 storage location for rules.
- `volume_size_in_gb`**Type**: `INT32`**Provider name**: `VolumeSizeInGB`**Description**: The size, in GB, of the ML storage volume attached to the processing instance.

## `debug_rule_evaluation_statuses`{% #debug_rule_evaluation_statuses %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `DebugRuleEvaluationStatuses`**Description**: Evaluation status of Amazon SageMaker Debugger rules for debugging on a training job.

- `last_modified_time`**Type**: `TIMESTAMP`**Provider name**: `LastModifiedTime`**Description**: Timestamp when the rule evaluation status was last modified.
- `rule_configuration_name`**Type**: `STRING`**Provider name**: `RuleConfigurationName`**Description**: The name of the rule configuration.
- `rule_evaluation_job_arn`**Type**: `STRING`**Provider name**: `RuleEvaluationJobArn`**Description**: The Amazon Resource Name (ARN) of the rule evaluation job.
- `rule_evaluation_status`**Type**: `STRING`**Provider name**: `RuleEvaluationStatus`**Description**: Status of the rule evaluation.
- `status_details`**Type**: `STRING`**Provider name**: `StatusDetails`**Description**: Details from the rule evaluation.

## `enable_inter_container_traffic_encryption`{% #enable_inter_container_traffic_encryption %}

**Type**: `BOOLEAN`**Provider name**: `EnableInterContainerTrafficEncryption`**Description**: To encrypt all communications between ML compute instances in distributed training, choose `True`. Encryption provides greater security for distributed training, but training might take longer. How long it takes depends on the amount of communication between compute instances, especially if you use a deep learning algorithm in distributed training.

## `enable_managed_spot_training`{% #enable_managed_spot_training %}

**Type**: `BOOLEAN`**Provider name**: `EnableManagedSpotTraining`**Description**: A Boolean indicating whether managed spot training is enabled (`True`) or not (`False`).

## `enable_network_isolation`{% #enable_network_isolation %}

**Type**: `BOOLEAN`**Provider name**: `EnableNetworkIsolation`**Description**: If you want to prevent inbound and outbound network calls, except for calls between peers within a training cluster for distributed training, choose `True`. If you enable network isolation for training jobs that are configured to use a VPC, SageMaker downloads and uploads customer data and model artifacts through the specified VPC, but the training container does not have network access.

## `environment`{% #environment %}

**Type**: `MAP_STRING_STRING`**Provider name**: `Environment`**Description**: The environment variables to set in the Docker container. Do not include any security-sensitive information, including account access IDs, secrets, or tokens, in any environment fields. As part of the shared responsibility model, you are responsible for any potential exposure, unauthorized access, or compromise of your sensitive data if caused by security-sensitive information included in the request environment variable or plain text fields.

## `experiment_config`{% #experiment_config %}

**Type**: `STRUCT`**Provider name**: `ExperimentConfig`

- `experiment_name`**Type**: `STRING`**Provider name**: `ExperimentName`**Description**: The name of an existing experiment to associate with the trial component.
- `run_name`**Type**: `STRING`**Provider name**: `RunName`**Description**: The name of the experiment run to associate with the trial component.
- `trial_component_display_name`**Type**: `STRING`**Provider name**: `TrialComponentDisplayName`**Description**: The display name for the trial component. If this key isn't specified, the display name is the trial component name.
- `trial_name`**Type**: `STRING`**Provider name**: `TrialName`**Description**: The name of an existing trial to associate the trial component with. If not specified, a new trial is created.

## `failure_reason`{% #failure_reason %}

**Type**: `STRING`**Provider name**: `FailureReason`**Description**: If the training job failed, the reason it failed.

## `final_metric_data_list`{% #final_metric_data_list %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `FinalMetricDataList`**Description**: A collection of `MetricData` objects that specify the names, values, and dates and times that the training algorithm emitted to Amazon CloudWatch.

- `metric_name`**Type**: `STRING`**Provider name**: `MetricName`**Description**: The name of the metric.
- `timestamp`**Type**: `TIMESTAMP`**Provider name**: `Timestamp`**Description**: The date and time that the algorithm emitted the metric.
- `value`**Type**: `FLOAT`**Provider name**: `Value`**Description**: The value of the metric.

## `hyper_parameters`{% #hyper_parameters %}

**Type**: `MAP_STRING_STRING`**Provider name**: `HyperParameters`**Description**: Algorithm-specific parameters.

## `infra_check_config`{% #infra_check_config %}

**Type**: `STRUCT`**Provider name**: `InfraCheckConfig`**Description**: Contains information about the infrastructure health check configuration for the training job.

- `enable_infra_check`**Type**: `BOOLEAN`**Provider name**: `EnableInfraCheck`**Description**: Enables an infrastructure health check.

## `input_data_config`{% #input_data_config %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `InputDataConfig`**Description**: An array of `Channel` objects that describes each data input channel.

- `channel_name`**Type**: `STRING`**Provider name**: `ChannelName`**Description**: The name of the channel.
- `compression_type`**Type**: `STRING`**Provider name**: `CompressionType`**Description**: If training data is compressed, the compression type. The default value is `None`. `CompressionType` is used only in Pipe input mode. In File mode, leave this field unset or set it to None.
- `content_type`**Type**: `STRING`**Provider name**: `ContentType`**Description**: The MIME type of the data.
- `data_source`**Type**: `STRUCT`**Provider name**: `DataSource`**Description**: The location of the channel data.
  - `file_system_data_source`**Type**: `STRUCT`**Provider name**: `FileSystemDataSource`**Description**: The file system that is associated with a channel.
    - `directory_path`**Type**: `STRING`**Provider name**: `DirectoryPath`**Description**: The full path to the directory to associate with the channel.
    - `file_system_access_mode`**Type**: `STRING`**Provider name**: `FileSystemAccessMode`**Description**: The access mode of the mount of the directory associated with the channel. A directory can be mounted either in `ro` (read-only) or `rw` (read-write) mode.
    - `file_system_id`**Type**: `STRING`**Provider name**: `FileSystemId`**Description**: The file system id.
    - `file_system_type`**Type**: `STRING`**Provider name**: `FileSystemType`**Description**: The file system type.
  - `s3_data_source`**Type**: `STRUCT`**Provider name**: `S3DataSource`**Description**: The S3 location of the data source that is associated with a channel.
    - `attribute_names`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `AttributeNames`**Description**: A list of one or more attribute names to use that are found in a specified augmented manifest file.
    - `hub_access_config`**Type**: `STRUCT`**Provider name**: `HubAccessConfig`**Description**: The configuration for a private hub model reference that points to a SageMaker JumpStart public hub model.
      - `hub_content_arn`**Type**: `STRING`**Provider name**: `HubContentArn`**Description**: The ARN of your private model hub content. This should be a `ModelReference` resource type that points to a SageMaker JumpStart public hub model.
    - `instance_group_names`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `InstanceGroupNames`**Description**: A list of names of instance groups that get data from the S3 data source.
    - `model_access_config`**Type**: `STRUCT`**Provider name**: `ModelAccessConfig`
      - `accept_eula`**Type**: `BOOLEAN`**Provider name**: `AcceptEula`**Description**: Specifies agreement to the model end-user license agreement (EULA). The `AcceptEula` value must be explicitly defined as `True` in order to accept the EULA that this model requires. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using a model.
    - `s3_data_distribution_type`**Type**: `STRING`**Provider name**: `S3DataDistributionType`**Description**: If you want SageMaker to replicate the entire dataset on each ML compute instance that is launched for model training, specify `FullyReplicated`. If you want SageMaker to replicate a subset of data on each ML compute instance that is launched for model training, specify `ShardedByS3Key`. If there are n ML compute instances launched for a training job, each instance gets approximately 1/n of the number of S3 objects. In this case, model training on each machine uses only the subset of training data. Don't choose more ML compute instances for training than available S3 objects. If you do, some nodes won't get any data and you will pay for nodes that aren't getting any training data. This applies in both File and Pipe modes. Keep this in mind when developing algorithms. In distributed training, where you use multiple ML compute EC2 instances, you might choose `ShardedByS3Key`. If the algorithm requires copying training data to the ML storage volume (when `TrainingInputMode` is set to `File`), this copies 1/n of the number of objects.
    - `s3_data_type`**Type**: `STRING`**Provider name**: `S3DataType`**Description**: If you choose `S3Prefix`, `S3Uri` identifies a key name prefix. SageMaker uses all objects that match the specified key name prefix for model training. If you choose `ManifestFile`, `S3Uri` identifies an object that is a manifest file containing a list of object keys that you want SageMaker to use for model training. If you choose `AugmentedManifestFile`, `S3Uri` identifies an object that is an augmented manifest file in JSON lines format. This file contains the data you want to use for model training. `AugmentedManifestFile` can only be used if the Channel's input mode is `Pipe`.
    - `s3_uri`**Type**: `STRING`**Provider name**: `S3Uri`**Description**: Depending on the value specified for the `S3DataType`, identifies either a key name prefix or a manifest. For example:
      - A key name prefix might look like this: `s3://bucketname/exampleprefix/`
      - A manifest might look like this: `s3://bucketname/example.manifest` A manifest is an S3 object which is a JSON file consisting of an array of elements. The first element is a prefix which is followed by one or more suffixes. SageMaker appends the suffix elements to the prefix to get a full set of `S3Uri`. Note that the prefix must be a valid non-empty `S3Uri` that precludes users from specifying a manifest whose individual `S3Uri` is sourced from different S3 buckets. The following code example shows a valid manifest format: `[ {"prefix": "s3://customer_bucket/some/prefix/"},` `"relative/path/to/custdata-1",` `"relative/path/custdata-2",` `…` `"relative/path/custdata-N"` `]` This JSON is equivalent to the following `S3Uri` list: `s3://customer_bucket/some/prefix/relative/path/to/custdata-1` `s3://customer_bucket/some/prefix/relative/path/custdata-2` `…` `s3://customer_bucket/some/prefix/relative/path/custdata-N` The complete set of `S3Uri` in this manifest is the input data for the channel for this data source. The object that each `S3Uri` points to must be readable by the IAM role that SageMaker uses to perform tasks on your behalf.
Your input bucket must be located in the same Amazon Web Services Region as your training job.
- `input_mode`**Type**: `STRING`**Provider name**: `InputMode`**Description**: (Optional) The input mode to use for the data channel in a training job. If you don't set a value for `InputMode`, SageMaker uses the value set for `TrainingInputMode`. Use this parameter to override the `TrainingInputMode` setting in an [AlgorithmSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html) request when you have a channel that needs a different input mode from the training job's general setting. To download the data from Amazon Simple Storage Service (Amazon S3) to the provisioned ML storage volume, and mount the directory to a Docker volume, use `File` input mode. To stream data directly from Amazon S3 to the container, choose `Pipe` input mode. To use a model for incremental training, choose `File` input mode.
- `record_wrapper_type`**Type**: `STRING`**Provider name**: `RecordWrapperType`**Description**: Specify RecordIO as the value when input data is in raw format but the training algorithm requires the RecordIO format. In this case, SageMaker wraps each individual S3 object in a RecordIO record. If the input data is already in RecordIO format, you don't need to set this attribute. For more information, see [Create a Dataset Using RecordIO](https://mxnet.apache.org/api/architecture/note_data_loading#data-format). In File mode, leave this field unset or set it to None.
- `shuffle_config`**Type**: `STRUCT`**Provider name**: `ShuffleConfig`**Description**: A configuration for a shuffle option for input data in a channel. If you use `S3Prefix` for `S3DataType`, this shuffles the results of the S3 key prefix matches. If you use `ManifestFile`, the order of the S3 object references in the `ManifestFile` is shuffled. If you use `AugmentedManifestFile`, the order of the JSON lines in the `AugmentedManifestFile` is shuffled. The shuffling order is determined using the `Seed` value. For Pipe input mode, shuffling is done at the start of every epoch. With large datasets, this ensures that the order of the training data is different for each epoch, which helps reduce bias and possible overfitting. In a multi-node training job, when `ShuffleConfig` is combined with an `S3DataDistributionType` of `ShardedByS3Key`, the data is shuffled across nodes so that the content sent to a particular node on the first epoch might be sent to a different node on the second epoch.
  - `seed`**Type**: `INT64`**Provider name**: `Seed`**Description**: Determines the shuffling order in `ShuffleConfig` value.
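The manifest expansion described under `s3_uri` can be sketched as follows: the leading prefix element is joined with each relative suffix to produce the full `S3Uri` list. The paths are the illustrative ones from the description.

```python
def expand_manifest(manifest):
    """Expand a SageMaker-style manifest (a prefix element followed by
    relative suffixes) into the full list of S3 URIs."""
    prefix = manifest[0]["prefix"]
    return [prefix + suffix for suffix in manifest[1:]]

manifest = [
    {"prefix": "s3://customer_bucket/some/prefix/"},
    "relative/path/to/custdata-1",
    "relative/path/custdata-2",
]

print(expand_manifest(manifest))
# → ['s3://customer_bucket/some/prefix/relative/path/to/custdata-1',
#    's3://customer_bucket/some/prefix/relative/path/custdata-2']
```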

## `labeling_job_arn`{% #labeling_job_arn %}

**Type**: `STRING`**Provider name**: `LabelingJobArn`**Description**: The Amazon Resource Name (ARN) of the SageMaker Ground Truth labeling job that created the transform or training job.

## `last_modified_time`{% #last_modified_time %}

**Type**: `TIMESTAMP`**Provider name**: `LastModifiedTime`**Description**: A timestamp that indicates when the status of the training job was last modified.

## `model_artifacts`{% #model_artifacts %}

**Type**: `STRUCT`**Provider name**: `ModelArtifacts`**Description**: Information about the Amazon S3 location that is configured for storing model artifacts.

- `s3_model_artifacts`**Type**: `STRING`**Provider name**: `S3ModelArtifacts`**Description**: The path of the S3 object that contains the model artifacts. For example, `s3://bucket-name/keynameprefix/model.tar.gz`.

## `output_data_config`{% #output_data_config %}

**Type**: `STRUCT`**Provider name**: `OutputDataConfig`**Description**: The S3 path where model artifacts that you configured when creating the job are stored. SageMaker creates subfolders for model artifacts.

- `compression_type`**Type**: `STRING`**Provider name**: `CompressionType`**Description**: The model output compression type. Select `None` to output an uncompressed model, recommended for large model outputs. Defaults to gzip.
- `kms_key_id`**Type**: `STRING`**Provider name**: `KmsKeyId`**Description**: The Amazon Web Services Key Management Service (Amazon Web Services KMS) key that SageMaker uses to encrypt the model artifacts at rest using Amazon S3 server-side encryption. The `KmsKeyId` can be any of the following formats:
  - KMS key ID: `"1234abcd-12ab-34cd-56ef-1234567890ab"`
  - Amazon Resource Name (ARN) of a KMS key: `"arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"`
  - KMS key alias: `"alias/ExampleAlias"`
  - Amazon Resource Name (ARN) of a KMS key alias: `"arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias"`
If you use a KMS key ID or an alias of your KMS key, the SageMaker execution role must include permissions to call `kms:Encrypt`. If you don't provide a KMS key ID, SageMaker uses the default KMS key for Amazon S3 for your role's account. For more information, see [KMS-Managed Encryption Keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) in the Amazon Simple Storage Service Developer Guide. If the output data is stored in Amazon S3 Express One Zone, it is encrypted with server-side encryption with Amazon S3 managed keys (SSE-S3); KMS keys are not supported for Amazon S3 Express One Zone. The KMS key policy must grant permission to the IAM role that you specify in your `CreateTrainingJob`, `CreateTransformJob`, or `CreateHyperParameterTuningJob` requests. For more information, see [Using Key Policies in Amazon Web Services KMS](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html) in the Amazon Web Services Key Management Service Developer Guide.
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Identifies the S3 path where you want SageMaker to store the model artifacts. For example, `s3://bucket-name/key-name-prefix`.
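The four accepted `KmsKeyId` formats can be told apart with a small classifier; the helper below is a hypothetical illustration, not an AWS validation routine, and the values are the examples from the description.

```python
def kms_key_id_format(key_id):
    """Classify a KmsKeyId string into one of the four accepted formats.
    A rough heuristic for illustration only."""
    if key_id.startswith("arn:aws:kms:"):
        return "alias ARN" if ":alias/" in key_id else "key ARN"
    if key_id.startswith("alias/"):
        return "key alias"
    return "key ID"

print(kms_key_id_format("1234abcd-12ab-34cd-56ef-1234567890ab"))  # → key ID
print(kms_key_id_format("arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"))  # → key ARN
print(kms_key_id_format("alias/ExampleAlias"))  # → key alias
print(kms_key_id_format("arn:aws:kms:us-west-2:111122223333:alias/ExampleAlias"))  # → alias ARN
```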

## `profiler_config`{% #profiler_config %}

**Type**: `STRUCT`**Provider name**: `ProfilerConfig`

- `disable_profiler`**Type**: `BOOLEAN`**Provider name**: `DisableProfiler`**Description**: Configuration to turn off Amazon SageMaker Debugger's system monitoring and profiling functionality. To turn it off, set to `True`.
- `profiling_interval_in_milliseconds`**Type**: `INT64`**Provider name**: `ProfilingIntervalInMilliseconds`**Description**: A time interval for capturing system metrics in milliseconds. Available values are 100, 200, 500, 1000 (1 second), 5000 (5 seconds), and 60000 (1 minute) milliseconds. The default value is 500 milliseconds.
- `profiling_parameters`**Type**: `MAP_STRING_STRING`**Provider name**: `ProfilingParameters`**Description**: Configuration information for capturing framework metrics. Available key strings for different profiling options are `DetailedProfilingConfig`, `PythonProfilingConfig`, and `DataLoaderProfilingConfig`. The following codes are configuration structures for the `ProfilingParameters` parameter. To learn more about how to configure the `ProfilingParameters` parameter, see [Use the SageMaker and Debugger Configuration API Operations to Create, Update, and Debug Your Training Job](https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-createtrainingjob-api.html).
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Path to Amazon S3 storage location for system and framework metrics.
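A minimal sketch of the `ProfilingIntervalInMilliseconds` constraint above; the helper is hypothetical, but the allowed value set and the 500 ms default come from the description.

```python
# Allowed capture intervals for system metrics, in milliseconds.
ALLOWED_PROFILING_INTERVALS_MS = {100, 200, 500, 1000, 5000, 60000}

def profiling_interval(requested_ms=None):
    """Return the requested interval if it is allowed, else the 500 ms default."""
    if requested_ms in ALLOWED_PROFILING_INTERVALS_MS:
        return requested_ms
    return 500

print(profiling_interval(5000))  # → 5000 (5 seconds)
print(profiling_interval())      # → 500 (the default)
```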

## `profiler_rule_configurations`{% #profiler_rule_configurations %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `ProfilerRuleConfigurations`**Description**: Configuration information for Amazon SageMaker Debugger rules for profiling system and framework metrics.

- `instance_type`**Type**: `STRING`**Provider name**: `InstanceType`**Description**: The instance type to deploy a custom rule for profiling a training job.
- `local_path`**Type**: `STRING`**Provider name**: `LocalPath`**Description**: Path to local storage location for output of rules. Defaults to `/opt/ml/processing/output/rule/`.
- `rule_configuration_name`**Type**: `STRING`**Provider name**: `RuleConfigurationName`**Description**: The name of the rule configuration. It must be unique relative to other rule configuration names.
- `rule_evaluator_image`**Type**: `STRING`**Provider name**: `RuleEvaluatorImage`**Description**: The Amazon Elastic Container Registry Image for the managed rule evaluation.
- `rule_parameters`**Type**: `MAP_STRING_STRING`**Provider name**: `RuleParameters`**Description**: Runtime configuration for rule container.
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Path to Amazon S3 storage location for rules.
- `volume_size_in_gb`**Type**: `INT32`**Provider name**: `VolumeSizeInGB`**Description**: The size, in GB, of the ML storage volume attached to the processing instance.

## `profiler_rule_evaluation_statuses`{% #profiler_rule_evaluation_statuses %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `ProfilerRuleEvaluationStatuses`**Description**: Evaluation status of Amazon SageMaker Debugger rules for profiling on a training job.

- `last_modified_time`**Type**: `TIMESTAMP`**Provider name**: `LastModifiedTime`**Description**: Timestamp when the rule evaluation status was last modified.
- `rule_configuration_name`**Type**: `STRING`**Provider name**: `RuleConfigurationName`**Description**: The name of the rule configuration.
- `rule_evaluation_job_arn`**Type**: `STRING`**Provider name**: `RuleEvaluationJobArn`**Description**: The Amazon Resource Name (ARN) of the rule evaluation job.
- `rule_evaluation_status`**Type**: `STRING`**Provider name**: `RuleEvaluationStatus`**Description**: Status of the rule evaluation.
- `status_details`**Type**: `STRING`**Provider name**: `StatusDetails`**Description**: Details from the rule evaluation.

## `profiling_status`{% #profiling_status %}

**Type**: `STRING`**Provider name**: `ProfilingStatus`**Description**: Profiling status of a training job.

## `remote_debug_config`{% #remote_debug_config %}

**Type**: `STRUCT`**Provider name**: `RemoteDebugConfig`**Description**: Configuration for remote debugging. To learn more about the remote debugging functionality of SageMaker, see [Access a training container through Amazon Web Services Systems Manager (SSM) for remote debugging](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-debugging.html).

- `enable_remote_debug`**Type**: `BOOLEAN`**Provider name**: `EnableRemoteDebug`**Description**: If set to True, enables remote debugging.

## `resource_config`{% #resource_config %}

**Type**: `STRUCT`**Provider name**: `ResourceConfig`**Description**: Resources, including ML compute instances and ML storage volumes, that are configured for model training.

- `instance_count`**Type**: `INT32`**Provider name**: `InstanceCount`**Description**: The number of ML compute instances to use. For distributed training, provide a value greater than 1.
- `instance_groups`**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `InstanceGroups`**Description**: The configuration of a heterogeneous cluster in JSON format.
  - `instance_count`**Type**: `INT32`**Provider name**: `InstanceCount`**Description**: Specifies the number of instances of the instance group.
  - `instance_group_name`**Type**: `STRING`**Provider name**: `InstanceGroupName`**Description**: Specifies the name of the instance group.
  - `instance_type`**Type**: `STRING`**Provider name**: `InstanceType`**Description**: Specifies the instance type of the instance group.
- `instance_type`**Type**: `STRING`**Provider name**: `InstanceType`**Description**: The ML compute instance type. SageMaker Training on Amazon Elastic Compute Cloud (EC2) P4de instances is in preview release starting December 9th, 2022. [Amazon EC2 P4de instances](http://aws.amazon.com/ec2/instance-types/p4/) (currently in preview) are powered by 8 NVIDIA A100 GPUs with 80GB high-performance HBM2e GPU memory, which accelerate the speed of training ML models that need to be trained on large datasets of high-resolution data. In this preview release, Amazon SageMaker supports ML training jobs on P4de instances (`ml.p4de.24xlarge`) to reduce model training time. The `ml.p4de.24xlarge` instances are available in the following Amazon Web Services Regions:
  - US East (N. Virginia) (us-east-1)
  - US West (Oregon) (us-west-2)
To request a quota limit increase and start using P4de instances, contact the SageMaker Training service team through your account team.
- `keep_alive_period_in_seconds`**Type**: `INT32`**Provider name**: `KeepAlivePeriodInSeconds`**Description**: The duration of time in seconds to retain configured resources in a warm pool for subsequent training jobs.
- `training_plan_arn`**Type**: `STRING`**Provider name**: `TrainingPlanArn`**Description**: The Amazon Resource Name (ARN) of the training plan to use for this resource configuration.
- `volume_kms_key_id`**Type**: `STRING`**Provider name**: `VolumeKmsKeyId`**Description**: The Amazon Web Services KMS key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) that run the training job. Certain Nitro-based instances include local storage, dependent on the instance type. Local storage volumes are encrypted using a hardware module on the instance. You can't request a `VolumeKmsKeyId` when using an instance type with local storage. For a list of instance types that support local instance storage, see [Instance Store Volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#instance-store-volumes). For more information about local instance storage encryption, see [SSD Instance Store Volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html). The `VolumeKmsKeyId` can be in any of the following formats:
  - KMS key ID: `"1234abcd-12ab-34cd-56ef-1234567890ab"`
  - Amazon Resource Name (ARN) of a KMS key: `"arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"`
- `volume_size_in_gb`**Type**: `INT32`**Provider name**: `VolumeSizeInGB`**Description**: The size of the ML storage volume that you want to provision. ML storage volumes store model artifacts and incremental states. Training algorithms might also use the ML storage volume for scratch space. If you want to store the training data in the ML storage volume, choose `File` as the `TrainingInputMode` in the algorithm specification. When using an ML instance with [NVMe SSD volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes), SageMaker doesn't provision Amazon EBS General Purpose SSD (gp2) storage. Available storage is fixed to the NVMe-type instance's storage capacity. SageMaker configures storage paths for training datasets, checkpoints, model artifacts, and outputs to use the entire capacity of the instance storage. For example, ML instance families with the NVMe-type instance storage include `ml.p4d`, `ml.g4dn`, and `ml.g5`. When using an ML instance with the EBS-only storage option and without instance storage, you must define the size of EBS volume through `VolumeSizeInGB` in the `ResourceConfig` API. For example, ML instance families that use EBS volumes include `ml.c5` and `ml.p2`. To look up instance types and their instance storage types and volumes, see [Amazon EC2 Instance Types](http://aws.amazon.com/ec2/instance-types/). To find the default local paths defined by the SageMaker training platform, see [Amazon SageMaker Training Storage Folders for Training Datasets, Checkpoints, Model Artifacts, and Outputs](https://docs.aws.amazon.com/sagemaker/latest/dg/model-train-storage.html).
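Putting these fields together, a `ResourceConfig` block as it might appear in a `CreateTrainingJob` request can be sketched as follows. The instance type, count, and sizes are illustrative values, not recommendations:

```python
# Sketch of a ResourceConfig block for a CreateTrainingJob request.
# All values here are illustrative placeholders.
resource_config = {
    "InstanceType": "ml.c5.2xlarge",   # EBS-only family: VolumeSizeInGB provisions the storage
    "InstanceCount": 2,                # a value greater than 1 enables distributed training
    "VolumeSizeInGB": 100,             # ML storage volume for artifacts, scratch space, and (in File mode) data
    "KeepAlivePeriodInSeconds": 600,   # retain instances in a warm pool for 10 minutes
    # VolumeKmsKeyId accepts either a bare key ID or a full key ARN; it must be
    # omitted on instance types with local NVMe storage (e.g. ml.p4d, ml.g4dn, ml.g5).
    "VolumeKmsKeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
}
```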

## `retry_strategy`{% #retry_strategy %}

**Type**: `STRUCT`**Provider name**: `RetryStrategy`**Description**: The number of times to retry the job when the job fails due to an `InternalServerError`.

- `maximum_retry_attempts`**Type**: `INT32`**Provider name**: `MaximumRetryAttempts`**Description**: The number of times to retry the job. When the job is retried, its `SecondaryStatus` is changed to `STARTING`.

## `role_arn`{% #role_arn %}

**Type**: `STRING`**Provider name**: `RoleArn`**Description**: The Amazon Web Services Identity and Access Management (IAM) role configured for the training job.

## `secondary_status`{% #secondary_status %}

**Type**: `STRING`**Provider name**: `SecondaryStatus`**Description**: Provides detailed information about the state of the training job. For detailed information on the secondary status of the training job, see `StatusMessage` under [SecondaryStatusTransition](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SecondaryStatusTransition.html). SageMaker provides primary statuses and secondary statuses that apply to each of them:

{% dl %}

{% dt %}
InProgress
{% /dt %}

{% dd %}

- `Starting` - Starting the training job.
- `Downloading` - An optional stage for algorithms that support `File` training input mode. It indicates that data is being downloaded to the ML storage volumes.
- `Training` - Training is in progress.
- `Interrupted` - The job stopped because the managed spot training instances were interrupted.
- `Uploading` - Training is complete and the model artifacts are being uploaded to the S3 location.

{% /dd %}

{% dt %}
Completed
{% /dt %}

{% dd %}

- `Completed` - The training job has completed.

{% /dd %}

{% dt %}
Failed
{% /dt %}

{% dd %}

- `Failed` - The training job has failed. The reason for the failure is returned in the `FailureReason` field of `DescribeTrainingJobResponse`.

{% /dd %}

{% dt %}
Stopped
{% /dt %}

{% dd %}

- `MaxRuntimeExceeded` - The job stopped because it exceeded the maximum allowed runtime.
- `MaxWaitTimeExceeded` - The job stopped because it exceeded the maximum allowed wait time.
- `Stopped` - The training job has stopped.

{% /dd %}

{% dt %}
Stopping
{% /dt %}

{% dd %}

- `Stopping` - Stopping the training job.

{% /dd %}

{% /dl %}
Valid values for `SecondaryStatus` are subject to change. We no longer support the following secondary statuses:
- `LaunchingMLInstances`
- `PreparingTraining`
- `DownloadingTrainingImage`



## `secondary_status_transitions`{% #secondary_status_transitions %}

**Type**: `UNORDERED_LIST_STRUCT`**Provider name**: `SecondaryStatusTransitions`**Description**: A history of all of the secondary statuses that the training job has transitioned through.

- `end_time`**Type**: `TIMESTAMP`**Provider name**: `EndTime`**Description**: A timestamp that shows when the training job transitioned out of this secondary status state into another secondary status state or when the training job has ended.
- `start_time`**Type**: `TIMESTAMP`**Provider name**: `StartTime`**Description**: A timestamp that shows when the training job transitioned to the current secondary status state.
- `status`**Type**: `STRING`**Provider name**: `Status`**Description**: Contains secondary status information from a training job. Status might be one of the following secondary statuses:
  {% dl %}
  
  {% dt %}
InProgress
  {% /dt %}

  {% dd %}

  - `Starting` - Starting the training job.
  - `Downloading` - An optional stage for algorithms that support `File` training input mode. It indicates that data is being downloaded to the ML storage volumes.
  - `Training` - Training is in progress.
  - `Uploading` - Training is complete and the model artifacts are being uploaded to the S3 location.

  {% /dd %}

  {% dt %}
Completed
  {% /dt %}

  {% dd %}

  - `Completed` - The training job has completed.

  {% /dd %}

  {% dt %}
Failed
  {% /dt %}

  {% dd %}

  - `Failed` - The training job has failed. The reason for the failure is returned in the `FailureReason` field of `DescribeTrainingJobResponse`.

  {% /dd %}

  {% dt %}
Stopped
  {% /dt %}

  {% dd %}

  - `MaxRuntimeExceeded` - The job stopped because it exceeded the maximum allowed runtime.
  - `Stopped` - The training job has stopped.

  {% /dd %}

  {% dt %}
Stopping
  {% /dt %}

  {% dd %}

  - `Stopping` - Stopping the training job.

  {% /dd %}

  {% /dl %}
We no longer support the following secondary statuses:
  - `LaunchingMLInstances`
  - `PreparingTrainingStack`
  - `DownloadingTrainingImage`
- `status_message`**Type**: `STRING`**Provider name**: `StatusMessage`**Description**: A detailed description of the progress within a secondary status. SageMaker provides secondary statuses and status messages that apply to each of them:
  {% dl %}
  
  {% dt %}
Starting
  {% /dt %}

  {% dd %}

  - Starting the training job.
  - Launching requested ML instances.
  - Insufficient capacity error from EC2 while launching instances, retrying!
  - Launched instance was unhealthy, replacing it!
  - Preparing the instances for training.

  {% /dd %}

  {% dt %}
Training
  {% /dt %}

  {% dd %}

  - Training image download completed. Training in progress.

  {% /dd %}

  {% /dl %}
Status messages are subject to change, so we recommend against using them in code that programmatically initiates actions. For example, don't branch on status messages in `if` statements. To get an overview of your training job's progress, view `TrainingJobStatus`, `SecondaryStatus`, and `StatusMessage` together in [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html). For example, at the start of a training job, you might see the following:
  - `TrainingJobStatus` - InProgress
  - `SecondaryStatus` - Training
  - `StatusMessage` - Downloading the training image
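Because status messages are free-form and subject to change, automation should branch on `SecondaryStatus` values instead of `StatusMessage`. A minimal sketch; the grouping of statuses into categories below is our own illustration, not part of the API:

```python
# Branch on SecondaryStatus values, never on the free-form StatusMessage.
IN_PROGRESS = {"Starting", "Downloading", "Training", "Interrupted", "Uploading"}
TERMINAL = {"Completed", "Failed", "Stopped", "MaxRuntimeExceeded", "MaxWaitTimeExceeded"}

def classify(secondary_status: str) -> str:
    """Map a SecondaryStatus value to a coarse category for control flow."""
    if secondary_status in TERMINAL:
        return "terminal"
    if secondary_status in IN_PROGRESS:
        return "in_progress"
    return "unknown"  # valid values are subject to change, so always keep a fallback
```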

## `stopping_condition`{% #stopping_condition %}

**Type**: `STRUCT`**Provider name**: `StoppingCondition`**Description**: Specifies a limit to how long a model training job can run. It also specifies how long a managed Spot training job has to complete. When the job reaches the time limit, SageMaker ends the training job. Use this API to cap model training costs. To stop a job, SageMaker sends the algorithm the `SIGTERM` signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost.

- `max_pending_time_in_seconds`**Type**: `INT32`**Provider name**: `MaxPendingTimeInSeconds`**Description**: The maximum length of time, in seconds, that a training or compilation job can be pending before it is stopped. When working with training jobs that use capacity from [training plans](https://docs.aws.amazon.com/sagemaker/latest/dg/reserve-capacity-with-training-plans.html), not all `Pending` job states count against the `MaxPendingTimeInSeconds` limit. The following scenarios do not increment the `MaxPendingTimeInSeconds` counter:
  - The plan is in a `Scheduled` state: Jobs queued (in `Pending` status) before a plan's start date (waiting for scheduled start time)
  - Between capacity reservations: Jobs temporarily back to `Pending` status between two capacity reservation periods
`MaxPendingTimeInSeconds` only increments when jobs are actively waiting for capacity in an `Active` plan.
- `max_runtime_in_seconds`**Type**: `INT32`**Provider name**: `MaxRuntimeInSeconds`**Description**: The maximum length of time, in seconds, that a training or compilation job can run before it is stopped. For compilation jobs, if the job does not complete during this time, a `TimeOut` error is generated. We recommend starting with 900 seconds and increasing as necessary based on your model. For all other jobs, if the job does not complete during this time, SageMaker ends the job. When `RetryStrategy` is specified in the job request, `MaxRuntimeInSeconds` specifies the maximum time for all of the attempts in total, not each individual attempt. The default value is 1 day. The maximum value is 28 days. The maximum time that a `TrainingJob` can run in total, including any time spent publishing metrics or archiving and uploading models after it has been stopped, is 30 days.
- `max_wait_time_in_seconds`**Type**: `INT32`**Provider name**: `MaxWaitTimeInSeconds`**Description**: The maximum length of time, in seconds, that a managed Spot training job has to complete. It is the amount of time spent waiting for Spot capacity plus the amount of time the job can run. It must be equal to or greater than `MaxRuntimeInSeconds`. If the job does not complete during this time, SageMaker ends the job. When `RetryStrategy` is specified in the job request, `MaxWaitTimeInSeconds` specifies the maximum time for all of the attempts in total, not each individual attempt.
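The limits above interact: for managed Spot training, `MaxWaitTimeInSeconds` must be greater than or equal to `MaxRuntimeInSeconds`, and `MaxRuntimeInSeconds` defaults to 1 day with a 28-day maximum. A small client-side validation sketch (the helper name and checks are ours, not part of any SageMaker API):

```python
def validate_stopping_condition(cond: dict) -> None:
    """Client-side sanity checks mirroring the StoppingCondition constraints above."""
    max_runtime = cond.get("MaxRuntimeInSeconds", 86400)  # API default: 1 day
    if max_runtime > 28 * 86400:
        raise ValueError("MaxRuntimeInSeconds may not exceed 28 days")
    max_wait = cond.get("MaxWaitTimeInSeconds")
    if max_wait is not None and max_wait < max_runtime:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")

# A Spot job allowed to run for 1 hour, with up to 1 more hour waiting for capacity:
validate_stopping_condition({"MaxRuntimeInSeconds": 3600, "MaxWaitTimeInSeconds": 7200})
```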

## `tags`{% #tags %}

**Type**: `UNORDERED_LIST_STRING`

## `tensor_board_output_config`{% #tensor_board_output_config %}

**Type**: `STRUCT`**Provider name**: `TensorBoardOutputConfig`

- `local_path`**Type**: `STRING`**Provider name**: `LocalPath`**Description**: Path to local storage location for TensorBoard output. Defaults to `/opt/ml/output/tensorboard`.
- `s3_output_path`**Type**: `STRING`**Provider name**: `S3OutputPath`**Description**: Path to Amazon S3 storage location for TensorBoard output.

## `training_end_time`{% #training_end_time %}

**Type**: `TIMESTAMP`**Provider name**: `TrainingEndTime`**Description**: Indicates the time when the training job ends on training instances. You are billed for the time interval between the value of `TrainingStartTime` and this time. For successful jobs and stopped jobs, this is the time after model artifacts are uploaded. For failed jobs, this is the time when SageMaker detects a job failure.

## `training_job_arn`{% #training_job_arn %}

**Type**: `STRING`**Provider name**: `TrainingJobArn`**Description**: The Amazon Resource Name (ARN) of the training job.

## `training_job_name`{% #training_job_name %}

**Type**: `STRING`**Provider name**: `TrainingJobName`**Description**: Name of the model training job.

## `training_job_status`{% #training_job_status %}

**Type**: `STRING`**Provider name**: `TrainingJobStatus`**Description**: The status of the training job. SageMaker provides the following training job statuses:

- `InProgress` - The training is in progress.
- `Completed` - The training job has completed.
- `Failed` - The training job has failed. To see the reason for the failure, check the `FailureReason` field in the response to a `DescribeTrainingJob` call.
- `Stopping` - The training job is stopping.
- `Stopped` - The training job has stopped.
For more detailed information, see `SecondaryStatus`.
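Because the five values above are the complete set of primary statuses, a polling loop can detect a finished job with a direct membership check. The helper below is an illustrative sketch:

```python
# The three primary statuses after which a training job cannot change state.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

def is_terminal(training_job_status: str) -> bool:
    """Return True once the training job can no longer change state."""
    return training_job_status in TERMINAL_STATUSES

# With boto3, the status would come from something like:
#   sm = boto3.client("sagemaker")
#   status = sm.describe_training_job(TrainingJobName=name)["TrainingJobStatus"]
assert not is_terminal("InProgress")
```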


## `training_start_time`{% #training_start_time %}

**Type**: `TIMESTAMP`**Provider name**: `TrainingStartTime`**Description**: Indicates the time when the training job starts on training instances. You are billed for the time interval between this time and the value of `TrainingEndTime`. The start time in CloudWatch Logs might be later than this time; the difference is due to the time it takes to download the training data and the size of the training container.

## `training_time_in_seconds`{% #training_time_in_seconds %}

**Type**: `INT32`**Provider name**: `TrainingTimeInSeconds`**Description**: The training time in seconds.

## `tuning_job_arn`{% #tuning_job_arn %}

**Type**: `STRING`**Provider name**: `TuningJobArn`**Description**: The Amazon Resource Name (ARN) of the associated hyperparameter tuning job if the training job was launched by a hyperparameter tuning job.

## `vpc_config`{% #vpc_config %}

**Type**: `STRUCT`**Provider name**: `VpcConfig`**Description**: A [VpcConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_VpcConfig.html) object that specifies the VPC that this training job has access to. For more information, see [Protect Training Jobs by Using an Amazon Virtual Private Cloud](https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html).

- `security_group_ids`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `SecurityGroupIds`**Description**: The VPC security group IDs, in the form `sg-xxxxxxxx`. Specify the security groups for the VPC that is specified in the `Subnets` field.
- `subnets`**Type**: `UNORDERED_LIST_STRING`**Provider name**: `Subnets`**Description**: The IDs of the subnets in the VPC to which you want to connect your training job or model. For information about the availability of specific instance types, see [Supported Instance Types and Availability Zones](https://docs.aws.amazon.com/sagemaker/latest/dg/instance-types-az.html).
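Together, the two fields form the `VpcConfig` structure. A minimal sketch with placeholder IDs:

```python
# Placeholder security group and subnet IDs; substitute your own.
# The security groups must belong to the VPC that contains the listed subnets.
vpc_config = {
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
    "Subnets": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
}
```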

## `warm_pool_status`{% #warm_pool_status %}

**Type**: `STRUCT`**Provider name**: `WarmPoolStatus`**Description**: The status of the warm pool associated with the training job.

- `resource_retained_billable_time_in_seconds`**Type**: `INT32`**Provider name**: `ResourceRetainedBillableTimeInSeconds`**Description**: The billable time in seconds used by the warm pool. Billable time refers to the absolute wall-clock time. Multiply `ResourceRetainedBillableTimeInSeconds` by the number of instances (`InstanceCount`) in your training cluster to get the total compute time SageMaker bills you if you run warm pool training. The formula is as follows: `ResourceRetainedBillableTimeInSeconds * InstanceCount`.
- `reused_by_job`**Type**: `STRING`**Provider name**: `ReusedByJob`**Description**: The name of the matching training job that reused the warm pool.
- `status`**Type**: `STRING`**Provider name**: `Status`**Description**: The status of the warm pool.
  - `InUse`: The warm pool is in use for the training job.
  - `Available`: The warm pool is available to reuse for a matching training job.
  - `Reused`: The warm pool moved to a matching training job for reuse.
  - `Terminated`: The warm pool is no longer available. Warm pools are unavailable if they are terminated by a user, terminated for a patch update, or terminated for exceeding the specified `KeepAlivePeriodInSeconds`.
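The billable-time formula under `resource_retained_billable_time_in_seconds` is a straight multiplication. A worked example with illustrative numbers:

```python
def warm_pool_billable_seconds(retained_seconds: int, instance_count: int) -> int:
    """Total compute time billed for warm-pool retention:
    ResourceRetainedBillableTimeInSeconds * InstanceCount."""
    return retained_seconds * instance_count

# e.g. a 2-instance training cluster kept warm for 600 billable seconds:
total = warm_pool_billable_seconds(600, 2)  # 1200 seconds of billed compute
```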
