AWS SageMaker

Overview

Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and directly deploy them into a production-ready hosted environment.

SageMaker provides:

An integrated Jupyter authoring notebook instance for easy access to your data sources for exploration and analysis, so you do not have to manage servers.
Common machine learning algorithms that are optimized to run efficiently against extremely large datasets in a distributed environment.

With native support for bring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows. Deploy a model into a secure and scalable environment by launching it with a single click from the Amazon SageMaker console. Training and hosting are billed by minutes of usage, with no minimum fees and no upfront commitments.

Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth helps you build high-quality training datasets by using human workers, assisted by machine learning, to create labeled datasets.

See GroundTruth metrics.

Amazon SageMaker Training

An Amazon SageMaker training job is an iterative process that teaches a model to make predictions by presenting examples from a training dataset. Typically, a training algorithm computes several metrics, such as training error and prediction accuracy. These metrics help diagnose whether the model is learning well and will generalize well for making predictions on unseen data. The training algorithm writes the values of these metrics to logs, which Amazon SageMaker monitors and sends to Amazon CloudWatch in real-time.
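As a minimal sketch of how metric values can be captured from training logs: a metric definition pairs a metric name with a regular expression whose capture group holds the value. The log format, metric names, and regular expressions below are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical metric definitions: each pairs a metric name with a regex
# whose first capture group holds the numeric value.
METRIC_DEFINITIONS = [
    {"Name": "train:error", "Regex": r"train_error=([0-9.]+)"},
    {"Name": "validation:accuracy", "Regex": r"val_accuracy=([0-9.]+)"},
]


def extract_metrics(log_lines, definitions=METRIC_DEFINITIONS):
    """Return {metric name: [values in log order]} for every regex match."""
    compiled = [(d["Name"], re.compile(d["Regex"])) for d in definitions]
    metrics = {name: [] for name, _ in compiled}
    for line in log_lines:
        for name, pattern in compiled:
            match = pattern.search(line)
            if match:
                metrics[name].append(float(match.group(1)))
    return metrics


# Hypothetical log output written by a training algorithm.
sample_log = [
    "epoch=1 train_error=0.42 val_accuracy=0.81",
    "epoch=2 train_error=0.31 val_accuracy=0.86",
]
print(extract_metrics(sample_log))
```

A falling training error alongside a rising validation accuracy, as in the sample lines, is the kind of trend these metrics are meant to expose.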

See Training metrics.

Amazon SageMaker Endpoint

An Amazon SageMaker endpoint is created from the endpoint configuration specified in the create request. Amazon SageMaker uses the endpoint to provision resources and deploy models.
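The request that ties an endpoint to its endpoint configuration can be sketched as follows. The helper name and the example values are hypothetical. The `EndpointName`, `EndpointConfigName`, and `Tags` keys follow the parameters of the SageMaker `CreateEndpoint` operation, and the actual call is shown only as a comment because it needs the `boto3` package and AWS credentials.

```python
def build_create_endpoint_request(endpoint_name, endpoint_config_name, tags=None):
    """Build keyword arguments for a CreateEndpoint call (hypothetical helper)."""
    if not endpoint_name or not endpoint_config_name:
        raise ValueError("endpoint name and endpoint configuration name are required")
    request = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": endpoint_config_name,
    }
    if tags:
        # Tags are passed as a list of Key/Value pairs.
        request["Tags"] = [{"Key": k, "Value": v} for k, v in sorted(tags.items())]
    return request


# Hypothetical names, for illustration only.
request = build_create_endpoint_request(
    "demo-endpoint", "demo-endpoint-config", tags={"team": "ml"}
)
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("sagemaker").create_endpoint(**request)
```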

Amazon SageMaker Transform Job

Use batch transform when you need to do the following:

Preprocess datasets to remove noise or bias that interferes with training or inference from your dataset.
Get inferences from large datasets.
Run inference when you do not need a persistent endpoint.
Associate input records with inferences to help interpret the results.
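The last item, associating input records with inferences, can be illustrated with a small conceptual sketch. It assumes one inference per input record, returned in the same order. The function name, the record fields, and the `inference` key are hypothetical, and the sketch does not reproduce how the service itself performs the join.

```python
def join_inputs_with_inferences(input_records, inferences, key="inference"):
    """Attach each inference to the input record that produced it."""
    if len(input_records) != len(inferences):
        raise ValueError("expected exactly one inference per input record")
    # Copy each record so the caller's input data is left unchanged.
    return [
        {**record, key: inference}
        for record, inference in zip(input_records, inferences)
    ]


# Hypothetical input records and model outputs.
records = [{"id": "a1", "feature": 0.2}, {"id": "b2", "feature": 0.9}]
predictions = [0.13, 0.87]

joined = join_inputs_with_inferences(records, predictions)
for row in joined:
    print(row)
```

Keeping an identifier such as `id` next to each prediction makes it possible to trace a result back to its source record without re-running inference.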

External reference

Amazon SageMaker

Setup

To set up the integration:

Select SageMaker GroundTruth in AWS Integration Discovery Profile to discover AWS SageMaker GroundTruth.
Select SageMaker Training in AWS Integration Discovery Profile to discover AWS SageMaker Training Job.
Select SageMaker EndPoint in AWS Integration Discovery Profile to discover AWS SageMaker Endpoint.
Select SageMaker Transform Job in AWS Integration Discovery Profile to discover AWS SageMaker Transform Job.

Event support

CloudTrail event support

Supported (SageMaker GroundTruth, Training, Endpoint, Transform Job)
Configurable in OpsRamp AWS Integration Discovery Profile.

CloudWatch alarm support

Not Supported

GroundTruth metrics

OpsRamp Metric	Description	Metric Display Name	Unit	Aggregation Type
aws_sagemaker_labelingjobs_ActiveWorkers	Number of workers on a private work team performing a labeling job.	ActiveWorkers	Count	Sum
aws_sagemaker_labelingjobs_JobsSucceeded	Number of labeling jobs that succeeded. To get the total number of labeling jobs that succeeded, use the Sum statistic.	JobsSucceeded	None	Sum
aws_sagemaker_labelingjobs_DatasetObjectsAutoAnnotated	Number of dataset objects auto-annotated in a labeling job.	DatasetObjectsAutoAnnotated	Count	Max
aws_sagemaker_labelingjobs_DatasetObjectsHumanAnnotated	Number of dataset objects annotated by a human in a labeling job.	DatasetObjectsHumanAnnotated	Count	Max
aws_sagemaker_labelingjobs_DatasetObjectsLabelingFailed	Number of dataset objects that failed labeling in a labeling job.	DatasetObjectsLabelingFailed	Count	Max
aws_sagemaker_labelingjobs_TotalDatasetObjectsLabeled	Number of dataset objects labeled successfully in a labeling job.	TotalDatasetObjectsLabeled	Count	Max
aws_sagemaker_labelingjobs_JobsStopped	Number of labeling jobs that were stopped.	JobsStopped	Count	Sum

Training metrics

OpsRamp Metric	Description	Metric Display Name	Unit	Aggregation Type
aws_sagemaker_trainingjobs_CPUUtilization	Percentage of CPU units used by the containers on an instance.	CPUUtilization	Percent	Average
aws_sagemaker_trainingjobs_MemoryUtilization	Percentage of memory used by the containers on an instance.	MemoryUtilization	Percent	Average
aws_sagemaker_trainingjobs_GPUUtilization	Percentage of GPU units used by the containers on an instance.	GPUUtilization	Percent	Average
aws_sagemaker_trainingjobs_GPUMemoryUtilization	Percentage of GPU memory used by the containers on an instance.	GPUMemoryUtilization	Percent	Average
aws_sagemaker_trainingjobs_DiskUtilization	Percentage of disk space used by the containers on an instance.	DiskUtilization	Percent	Average
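The metric tables suggest a naming pattern (`aws_sagemaker_<resource>_<display name>`) and three aggregation types. The sketch below illustrates both. The pattern is inferred from the table rows rather than taken from a documented OpsRamp API, and the function names and sample datapoints are hypothetical.

```python
from statistics import mean

# Aggregation types that appear in the metric tables.
AGGREGATORS = {"Sum": sum, "Max": max, "Average": mean}


def opsramp_metric_name(resource, display_name):
    """Compose a metric name following the pattern seen in the tables (inferred)."""
    return f"aws_sagemaker_{resource}_{display_name}"


def aggregate(datapoints, aggregation_type):
    """Reduce a list of datapoints using the named aggregation type."""
    if aggregation_type not in AGGREGATORS:
        raise ValueError(f"unsupported aggregation type: {aggregation_type}")
    if not datapoints:
        raise ValueError("no datapoints to aggregate")
    return AGGREGATORS[aggregation_type](datapoints)


# Hypothetical CPU utilization samples for a training job, in percent.
cpu_samples = [40.0, 60.0]
name = opsramp_metric_name("trainingjobs", "CPUUtilization")
print(name, aggregate(cpu_samples, "Average"))
```

Count-style metrics such as `JobsStopped` would use `"Sum"` in the same way, while per-job totals such as `TotalDatasetObjectsLabeled` would use `"Max"`.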