Amazon EMR is a managed cluster platform that simplifies running big data frameworks (such as Apache Hadoop and Apache Spark) on AWS to process and analyze vast amounts of data.
By using these frameworks and related open-source projects (such as Apache Hive and Apache Pig), you can:
- Process data for analytics purposes and business intelligence workloads.
- Transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
Use the OpsRamp AWS public cloud integration to discover Amazon EMR clusters and collect their metrics.
External reference
Setup
To set up the AWS integration and discover the EMR service, go to AWS Integration Discovery Profile and select EMR.
Event support
CloudTrail event support
- Supported
- Configurable in OpsRamp AWS Integration Discovery Profile.
CloudWatch alarm support
- Supported
- Configurable in OpsRamp AWS Integration Discovery Profile.
Supported metrics
OpsRamp Metric | AWS Metric | Metric Display Name | Unit | Aggregation Type |
---|---|---|---|---|
aws_elasticmapreduce_IsIdle Indicates that a cluster is no longer performing work, but is still alive and accruing charges. Set to 1 if no tasks and jobs are running; set to 0 otherwise. | IsIdle | IsIdle | Count | Average |
aws_elasticmapreduce_ContainerAllocated Number of resource containers allocated by the ResourceManager. | ContainerAllocated | ContainerAllocated | Count | Sum |
aws_elasticmapreduce_ContainerReserved Number of containers reserved. | ContainerReserved | ContainerReserved | Count | Sum |
aws_elasticmapreduce_ContainerPending Number of containers in the queue that have not yet been allocated. | ContainerPending | ContainerPending | Count | Sum |
aws_elasticmapreduce_AppsCompleted Number of applications submitted to YARN (Hadoop generation) that have completed. | AppsCompleted | AppsCompleted | Count | Sum |
aws_elasticmapreduce_AppsKilled Number of killed applications submitted to YARN (Hadoop generation). | AppsKilled | AppsKilled | Count | Sum |
aws_elasticmapreduce_AppsPending Number of applications submitted to YARN (Hadoop generation) that are in a pending state. | AppsPending | AppsPending | Count | Sum |
aws_elasticmapreduce_AppsRunning Number of applications submitted to YARN (Hadoop generation) that are running. | AppsRunning | AppsRunning | Count | Sum |
aws_elasticmapreduce_AppsSubmitted Number of applications submitted to YARN (Hadoop generation). | AppsSubmitted | AppsSubmitted | Count | Sum |
aws_elasticmapreduce_CapacityRemainingGB Amount of remaining HDFS disk capacity. | CapacityRemainingGB | CapacityRemainingGB | Bytes | Sum |
aws_elasticmapreduce_CoreNodesRunning Number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. | CoreNodesRunning | CoreNodesRunning | Count | Sum |
aws_elasticmapreduce_CoreNodesPending Number of core nodes waiting to be assigned. Not all requested core nodes may be immediately available; this metric reports the pending requests. | CoreNodesPending | CoreNodesPending | Count | Sum |
aws_elasticmapreduce_CorruptBlocks Number of blocks that HDFS reports as corrupted. | CorruptBlocks | CorruptBlocks | Count | Sum |
aws_elasticmapreduce_HDFSUtilization Percentage of HDFS storage currently used. | HDFSUtilization | HDFSUtilization | Percent | Average |
aws_elasticmapreduce_HDFSBytesRead Number of bytes read from HDFS. | HDFSBytesRead | HDFSBytesRead | Bytes | Sum |
aws_elasticmapreduce_HDFSBytesWritten Number of bytes written to HDFS. | HDFSBytesWritten | HDFSBytesWritten | Bytes | Sum |
aws_elasticmapreduce_LiveDataNodes Percentage of data nodes that are receiving work from Hadoop. | LiveDataNodes | LiveDataNodes | Percent | Average |
aws_elasticmapreduce_MRTotalNodes Number of nodes presently available to MapReduce jobs. | MRTotalNodes | MRTotalNodes | Count | Sum |
aws_elasticmapreduce_MRActiveNodes Number of nodes presently running MapReduce tasks or jobs. | MRActiveNodes | MRActiveNodes | Count | Sum |
aws_elasticmapreduce_MRLostNodes Number of nodes allocated to MapReduce that have been marked in a LOST state. | MRLostNodes | MRLostNodes | Count | Sum |
aws_elasticmapreduce_MRUnhealthyNodes Number of nodes available to MapReduce jobs marked in an UNHEALTHY state. | MRUnhealthyNodes | MRUnhealthyNodes | Count | Sum |
aws_elasticmapreduce_MRDecommissionedNodes Number of nodes allocated to MapReduce applications marked in a DECOMMISSIONED state. | MRDecommissionedNodes | MRDecommissionedNodes | Count | Sum |
aws_elasticmapreduce_MRRebootedNodes Number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. | MRRebootedNodes | MRRebootedNodes | Count | Sum |
aws_elasticmapreduce_S3BytesWritten Number of bytes written to Amazon S3. | S3BytesWritten | S3BytesWritten | Bytes | Sum |
aws_elasticmapreduce_S3BytesRead Number of bytes read from Amazon S3. | S3BytesRead | S3BytesRead | Bytes | Sum |
aws_elasticmapreduce_MissingBlocks Number of blocks in which HDFS has no replicas. These might be corrupt blocks. | MissingBlocks | MissingBlocks | Count | Sum |
aws_elasticmapreduce_TotalLoad Total number of concurrent data transfers. | TotalLoad | TotalLoad | Count | Sum |
aws_elasticmapreduce_MemoryTotalMB Total amount of memory in the cluster. | MemoryTotalMB | MemoryTotalMB | Bytes | Sum |
aws_elasticmapreduce_MemoryReservedMB Amount of memory reserved. | MemoryReservedMB | MemoryReservedMB | Bytes | Sum |
aws_elasticmapreduce_MemoryAvailableMB Amount of memory available to be allocated. | MemoryAvailableMB | MemoryAvailableMB | Bytes | Sum |
aws_elasticmapreduce_MemoryAllocatedMB Amount of memory allocated to the cluster. | MemoryAllocatedMB | MemoryAllocatedMB | Bytes | Sum |
aws_elasticmapreduce_PendingDeletionBlocks Number of blocks marked for deletion. | PendingDeletionBlocks | PendingDeletionBlocks | Count | Sum |
aws_elasticmapreduce_UnderReplicatedBlocks Number of blocks that need to be replicated one or more times. | UnderReplicatedBlocks | UnderReplicatedBlocks | Count | Sum |
aws_elasticmapreduce_dfs_FSNamesystem_PendingReplicationBlocks Status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | dfsPendingReplicationBlocks | dfs.FSNamesystem.PendingReplicationBlocks | Count | Average |
aws_elasticmapreduce_ContainerPendingRatio Ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. | ContainerPendingRatio | Container Pending Ratio | Count | Sum |
aws_elasticmapreduce_AppsFailed Number of applications submitted to YARN that have failed to complete. | AppsFailed | Apps Failed | Count | Sum |
aws_elasticmapreduce_YARNMemoryAvailablePercentage Percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. | YARNMemoryAvailablePercentage | YARN Memory Available Percentage | Percent | Average |
cloud.instance.state | n/a | Status/State | n/a | n/a |
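The integration collects these metrics for you. If you want to spot-check a value directly in CloudWatch, the Python sketch below builds a `GetMetricStatistics` request from the AWS Metric and Aggregation Type columns of the table. It assumes the `AWS/ElasticMapReduce` CloudWatch namespace, the `JobFlowId` dimension (the EMR cluster ID), a 300-second period, and that the Average and Sum aggregation types map directly to the CloudWatch statistics of the same name. The helper names and the `AGGREGATION` subset are hypothetical, and the sketch does not describe how OpsRamp itself polls CloudWatch.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical subset of the table above: AWS metric name -> aggregation type.
AGGREGATION = {
    "IsIdle": "Average",
    "HDFSUtilization": "Average",
    "YARNMemoryAvailablePercentage": "Average",
    "AppsFailed": "Sum",
    "ContainerPending": "Sum",
}


def build_emr_metric_request(cluster_id, metric_name, end=None,
                             lookback=timedelta(hours=1), period=300):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        # JobFlowId is the EMR cluster ID, for example j-XXXXXXXXXXXXX.
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - lookback,
        "EndTime": end,
        "Period": period,  # seconds
        # Fall back to Average for metrics not listed in AGGREGATION.
        "Statistics": [AGGREGATION.get(metric_name, "Average")],
    }


def fetch_emr_metric(cloudwatch_client, cluster_id, metric_name, **kwargs):
    """Return (timestamp, value) pairs, oldest first, from a CloudWatch client."""
    request = build_emr_metric_request(cluster_id, metric_name, **kwargs)
    response = cloudwatch_client.get_metric_statistics(**request)
    statistic = request["Statistics"][0]
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p[statistic]) for p in points]
```

With boto3 installed and AWS credentials configured, you could call `fetch_emr_metric(boto3.client("cloudwatch"), "j-EXAMPLE", "IsIdle")`, where `j-EXAMPLE` is a placeholder cluster ID. The client is passed in rather than created inside the helper so that the request builder can be checked without AWS access.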
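The two derived metrics in the table, ContainerPendingRatio and YARNMemoryAvailablePercentage, can be reproduced from their component metrics. The following is a minimal Python sketch; the function names are hypothetical, and scaling YARNMemoryAvailablePercentage by 100 is an assumption based on its Percent unit, since the table writes the formula as a plain ratio.

```python
def container_pending_ratio(container_pending: float, container_allocated: float) -> float:
    """ContainerPendingRatio = ContainerPending / ContainerAllocated.

    When ContainerAllocated is 0, the ratio is defined as ContainerPending itself.
    The result is a plain number, not a percentage.
    """
    if container_allocated == 0:
        return float(container_pending)
    return container_pending / container_allocated


def yarn_memory_available_percentage(memory_available_mb: float, memory_total_mb: float) -> float:
    """MemoryAvailableMB / MemoryTotalMB, scaled to 0-100.

    The factor of 100 is an assumption based on the metric's Percent unit.
    """
    if memory_total_mb <= 0:
        raise ValueError("memory_total_mb must be positive")
    return 100.0 * memory_available_mb / memory_total_mb


if __name__ == "__main__":
    print(container_pending_ratio(6, 3))                  # 2.0
    print(container_pending_ratio(4, 0))                  # 4.0 (no allocated containers)
    print(yarn_memory_available_percentage(2048, 8192))   # 25.0
```

The zero-allocation branch follows the definition in the table, so a cluster with pending containers and nothing allocated still reports a nonzero ratio that scaling rules can act on.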
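Because IsIdle is 1 only when no tasks or jobs are running, a run of consecutive 1 values indicates a cluster that is idle but still accruing charges. The Python sketch below checks whether the most recent datapoints form such a run. The function name and the idea of flagging a cluster after a fixed number of idle periods are illustrative assumptions, not OpsRamp behavior.

```python
from typing import Sequence


def idle_for_periods(is_idle_points: Sequence[float], periods: int) -> bool:
    """Return True if the last `periods` IsIdle datapoints all indicate idle.

    IsIdle is 1 when no tasks or jobs are running and 0 otherwise. With the
    Average aggregation, a datapoint reaches 1 only if every sample in its
    period was 1, so `point >= 1` means "idle for the whole period".
    """
    if periods <= 0:
        raise ValueError("periods must be positive")
    if len(is_idle_points) < periods:
        return False  # not enough history to decide
    recent = list(is_idle_points)[-periods:]
    return all(point >= 1 for point in recent)


if __name__ == "__main__":
    # Datapoints are ordered oldest first.
    print(idle_for_periods([0, 1, 1, 1], periods=3))  # True
    print(idle_for_periods([1, 0, 1, 1], periods=3))  # False
    print(idle_for_periods([1, 1], periods=3))        # False
```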