Introduction
The alert definition feature lets you set alerts on a metric using a PromQL query. Alert definitions are currently stored in the K8s ConfigMap named opsramp-alert-user-config, located in the namespace where the OpsRamp agent is installed.
The Kubernetes 2.0 agent computes the alert metric value from the PromQL expression and determines the alert state by comparing that value against the thresholds in the alert definition.
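The threshold comparison can be pictured with a short Python sketch. This illustrates the idea under stated assumptions and is not the agent's code: the function name alert_state, the "ok" state for values that cross neither threshold, and checking the critical condition before the warning condition are all assumptions. Thresholds are converted with float() because the ConfigMap quotes them (for example, '90'). The operator names are the ones OpsRamp lists under warnOperator & criticalOperator.

```python
import operator
from typing import Optional

# OpsRamp operator names mapped to Python comparison functions.
OPERATORS = {
    "GREATER_THAN_EQUAL": operator.ge,
    "GREATER_THAN": operator.gt,
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "LESS_THAN_EQUAL": operator.le,
    "LESS_THAN": operator.lt,
}


def alert_state(
    value: float,
    warn_op: Optional[str] = None,
    warn_threshold: Optional[str] = None,
    critical_op: Optional[str] = None,
    critical_threshold: Optional[str] = None,
) -> str:
    """Return 'critical', 'warning', or 'ok' for one metric value.

    Assumptions: critical is checked before warning, and thresholds are
    strings (as in the ConfigMap) that must be converted to numbers.
    """
    if critical_op and critical_threshold is not None:
        if OPERATORS[critical_op](value, float(critical_threshold)):
            return "critical"
    if warn_op and warn_threshold is not None:
        if OPERATORS[warn_op](value, float(warn_threshold)):
            return "warning"
    return "ok"


# k8s_pod_cpu_usage_percent: warning at >= 90, critical at >= 95.
print(alert_state(92.5, "GREATER_THAN_EQUAL", "90", "GREATER_THAN_EQUAL", "95"))  # warning
# k8s_deployment_status: ratios at or below 0.8 are critical.
print(alert_state(0.75, "LESS_THAN_EQUAL", "0.9", "LESS_THAN_EQUAL", "0.8"))  # critical
```

Checking critical first matters for rules that use LESS_THAN_EQUAL, such as k8s_deployment_status: a ratio of 0.75 satisfies both the 0.9 warning and the 0.8 critical condition, and should presumably be reported as critical.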
The OpsRamp agent raises the alert on the K8s resource identified by the labels in the metric streams that the PromQL query returns. If no resource matches those labels, the alert is raised on the cluster.
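A minimal Python sketch of this fallback follows. The mapping from resource type to identifying label (IDENTIFYING_LABEL) and the function alert_target are hypothetical; only k8s_cluster_name and k8s_namespace_name appear in this document's sample expressions, and the other label names are assumptions. A real matcher would need more than one label for namespaced resources, since pod names are only unique within a namespace.

```python
from typing import Dict, Optional, Set

# Hypothetical: the label that identifies a resource of each type.
# k8s_namespace_name appears in this document's sample expressions;
# k8s_pod_name and k8s_node_name are assumed for illustration.
IDENTIFYING_LABEL: Dict[str, str] = {
    "k8s_pod": "k8s_pod_name",
    "k8s_node": "k8s_node_name",
    "k8s_namespace": "k8s_namespace_name",
}


def alert_target(
    resource_type: str,
    series_labels: Dict[str, str],
    known_resources: Set[str],
    cluster_name: str,
) -> str:
    """Pick the resource an alert is raised on, falling back to the cluster."""
    label = IDENTIFYING_LABEL.get(resource_type)
    candidate: Optional[str] = series_labels.get(label) if label else None
    if candidate is not None and candidate in known_resources:
        return candidate
    # No resource matches the series labels: raise the alert on the cluster.
    return cluster_name


labels = {"k8s_cluster_name": "prod", "k8s_namespace_name": "payments"}
print(alert_target("k8s_namespace", labels, {"payments"}, "prod"))  # payments
print(alert_target("k8s_namespace", {"k8s_cluster_name": "prod"}, {"payments"}, "prod"))  # prod
```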
Alert Definition Template
Sample template to define a single alert:
- resourceType: k8s_resource_type
  rules:
    - name: alert_definition_name
      interval: alert_polling_time
      expr: promql_expression
      isAvailability: true
      warnOperator: operator_macro
      warnThreshold: str_threshold_value
      criticalOperator: operator_macro
      criticalThreshold: str_threshold_value
      alertSub: alert_subject
      alertBody: alert_description
Explanation of template fields:
resourceType: The type of K8s resource the alert applies to (e.g., k8s_pod). Possible values:
- k8s_pod
- k8s_node
- k8s_namespace
- k8s_service
- k8s_pv
- k8s_pvc
- k8s_deployment
- k8s_replicaset
- k8s_daemonset
- k8s_statefulset
rules: A list of rules for the alert definition. Each rule supports the following fields.
name: A unique name for the alert.
interval: The polling interval at which the alert definition runs, given in time-duration format (e.g., 1m, 5m, 1h).
expr: A valid PromQL expression that calculates the metric.
isAvailability: A Boolean. When true, the rule is also used to compute the resource's availability (see Configure Availability).
warnOperator & criticalOperator: The operators used to compare the metric value against the thresholds and compute the alert state. OpsRamp supports the following operators:
- GREATER_THAN_EQUAL
- GREATER_THAN
- EQUAL
- NOT_EQUAL
- LESS_THAN_EQUAL
- LESS_THAN
warnThreshold & criticalThreshold: The warning and critical threshold values, given as quoted strings (e.g., '90').
alertSub & alertBody: The subject and body of the alert. Both can contain the following macros, which are replaced with dynamic values:
- ${severity}
- ${metric.name}
- ${component.name}
- ${metric.value}
- ${threshold}
- ${resource.name}
- ${resource.uniqueid}
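To show how the macros fill in an alert subject, here is a small Python sketch that expands ${...} placeholders with a regular expression. The function render and the sample values are hypothetical; the real values (including how ${severity} is worded) come from the agent. Unknown macros are left untouched in this sketch.

```python
import re
from typing import Dict

# Matches ${name} where name may contain letters, dots, and underscores,
# e.g. ${metric.value} or ${resource.uniqueid}.
MACRO = re.compile(r"\$\{([A-Za-z_.]+)\}")


def render(template: str, values: Dict[str, str]) -> str:
    """Replace each ${macro} with its value; leave unknown macros as-is."""
    return MACRO.sub(lambda m: values.get(m.group(1), m.group(0)), template)


# alertSub as used in the example ConfigMap.
ALERT_SUB = "${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}."

# Made-up sample values for illustration only.
sample = {
    "severity": "Critical",
    "metric.name": "k8s_pod_memory_usage_percent",
    "resource.name": "web-1",
    "metric.value": "96.2",
}

print(render(ALERT_SUB, sample))
# Critical. k8s_pod_memory_usage_percent on the resource web-1 is 96.2.
```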
Configure Alert
OpsRamp provides basic alert definitions for resources like pods and nodes by default. Users can configure the alert definitions by editing the K8s ConfigMap below in the namespace where the agent is installed.
K8s ConfigMap Name: opsramp-alert-user-config
Access the ConfigMap:
To view or edit the ConfigMap, use the following commands:
kubectl get configmap opsramp-alert-user-config -n <agent-installed-namespace> -o yaml
kubectl edit configmap opsramp-alert-user-config -n <agent-installed-namespace>
Example ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: "opsramp-alert-user-config-tmp"
  namespace: {{ include "common.names.namespace" . | quote }}
  labels:
    app: "opsramp-alert-user-config"
data:
  alert-definitions.yaml: |
    alertDefinitions:
      - resourceType: k8s_pod
        rules:
          - name: k8s_pod_phase
            interval: 2m
            expr: (k8s_pod_phase == bool 2) OR (k8s_pod_phase == bool 3)
            isAvailability: true
            criticalOperator: EQUAL
            criticalThreshold: '0'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
          - name: k8s_pod_cpu_usage_percent
            interval: 2m
            expr: ((k8s_pod_cpu_limit_utilization_ratio*100 ) OR (k8s_pod_cpu_limit_utilization))
            isAvailability: true
            warnOperator: GREATER_THAN_EQUAL
            warnThreshold: '90'
            criticalOperator: GREATER_THAN_EQUAL
            criticalThreshold: '95'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
          - name: k8s_pod_memory_usage_percent
            interval: 2m
            expr: ((k8s_pod_memory_limit_utilization_ratio*100 ) OR (k8s_pod_memory_limit_utilization))
            isAvailability: true
            warnOperator: GREATER_THAN_EQUAL
            warnThreshold: '90'
            criticalOperator: GREATER_THAN_EQUAL
            criticalThreshold: '95'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
          - name: k8s_pod_memory_usage_mb
            interval: 2m
            expr: (k8s_pod_memory_usage_bytes/1000000)
            warnOperator: GREATER_THAN_EQUAL
            warnThreshold: '200'
            criticalOperator: GREATER_THAN_EQUAL
            criticalThreshold: '300'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_node
        rules:
          - name: k8s_node_condition
            interval: 2m
            expr: ((k8s_node_condition_ready == bool 1) * (k8s_node_condition_disk_pressure == bool 0) * (k8s_node_condition_memory_pressure == bool 0) * (k8s_node_condition_network_unavailable == bool 0) * (k8s_node_condition_pid_pressure == bool 0))
            isAvailability: true
            criticalOperator: EQUAL
            criticalThreshold: '0'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_namespace
        rules:
          - name: k8s_namespace_memory
            interval: 2m
            expr: >-
              (sum by (k8s_cluster_name, k8s_namespace_name)
              ((k8s_pod_memory_usage_bytes/1000000)))
            isAvailability: true
            warnOperator: GREATER_THAN_EQUAL
            warnThreshold: '800'
            criticalOperator: GREATER_THAN_EQUAL
            criticalThreshold: '1000'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_deployment
        rules:
          - name: k8s_deployment_status
            interval: 2m
            expr: (k8s_deployment_available/k8s_deployment_desired)
            isAvailability: true
            warnOperator: LESS_THAN_EQUAL
            warnThreshold: '0.9'
            criticalOperator: LESS_THAN_EQUAL
            criticalThreshold: '0.8'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_replicaset
        rules:
          - name: k8s_replicaset_status
            interval: 2m
            expr: (k8s_replicaset_available/k8s_replicaset_desired)
            isAvailability: true
            warnOperator: LESS_THAN_EQUAL
            warnThreshold: '0.9'
            criticalOperator: LESS_THAN_EQUAL
            criticalThreshold: '0.8'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_daemonset
        rules:
          - name: k8s_daemonset_status
            interval: 2m
            expr: >-
              (k8s_daemonset_current_scheduled_nodes/k8s_daemonset_desired_scheduled_nodes)
            isAvailability: true
            warnOperator: LESS_THAN_EQUAL
            warnThreshold: '0.9'
            criticalOperator: LESS_THAN_EQUAL
            criticalThreshold: '0.8'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
      - resourceType: k8s_statefulset
        rules:
          - name: k8s_statefulset_status
            interval: 2m
            expr: (k8s_statefulset_current_pods/k8s_statefulset_desired_pods)
            isAvailability: true
            warnOperator: LESS_THAN_EQUAL
            warnThreshold: '0.9'
            criticalOperator: LESS_THAN_EQUAL
            criticalThreshold: '0.8'
            alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
            alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
You can add or remove alert definitions in this list; the expr of each new rule must be a standard PromQL expression.
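Before adding a rule, a quick sanity check can catch common mistakes. The Python sketch below is hypothetical and not an OpsRamp tool. It treats name, interval, and expr as required (every sample rule sets them), expects each operator to be paired with a threshold, accepts only the six operators listed under warnOperator & criticalOperator, and assumes an interval is a single number followed by s, m, or h (as in 1m, 5m, 1h). It works on a rule that has already been parsed into a Python dict.

```python
import re
from typing import Any, Dict, List

# Operator names OpsRamp lists for warnOperator and criticalOperator.
SUPPORTED_OPERATORS = {
    "GREATER_THAN_EQUAL",
    "GREATER_THAN",
    "EQUAL",
    "NOT_EQUAL",
    "LESS_THAN_EQUAL",
    "LESS_THAN",
}

# Assumption: an interval is a positive integer followed by s, m, or h.
DURATION = re.compile(r"^[1-9][0-9]*[smh]$")


def validate_rule(rule: Dict[str, Any]) -> List[str]:
    """Return the problems found in one parsed rule (empty list if none)."""
    problems: List[str] = []
    # Assumption: every rule needs a name, an interval, and an expression.
    for key in ("name", "interval", "expr"):
        if not rule.get(key):
            problems.append(f"missing {key}")
    interval = rule.get("interval")
    if interval and not DURATION.match(str(interval)):
        problems.append(f"interval {interval!r} is not a duration such as 2m")
    for level in ("warn", "critical"):
        op = rule.get(f"{level}Operator")
        threshold = rule.get(f"{level}Threshold")
        # An operator without a threshold (or vice versa) cannot be evaluated.
        if (op is None) != (threshold is None):
            problems.append(f"{level}Operator and {level}Threshold must be set together")
        if op is not None and op not in SUPPORTED_OPERATORS:
            problems.append(f"unsupported {level}Operator {op!r}")
        if threshold is not None:
            try:
                float(threshold)
            except (TypeError, ValueError):
                problems.append(f"{level}Threshold {threshold!r} is not numeric")
    # Assumption: a rule with no operator at all can never raise an alert.
    if "warnOperator" not in rule and "criticalOperator" not in rule:
        problems.append("no warnOperator or criticalOperator set")
    return problems


# The k8s_pod_memory_usage_mb rule from the example ConfigMap passes.
good = {
    "name": "k8s_pod_memory_usage_mb",
    "interval": "2m",
    "expr": "(k8s_pod_memory_usage_bytes/1000000)",
    "warnOperator": "GREATER_THAN_EQUAL",
    "warnThreshold": "200",
    "criticalOperator": "GREATER_THAN_EQUAL",
    "criticalThreshold": "300",
}
print(validate_rule(good))  # []

# A hypothetical broken rule: bad interval, misspelled operator, no threshold.
bad = {
    "name": "my_rule",
    "interval": "2 minutes",
    "expr": "up",
    "criticalOperator": "GREATER_OR_EQUAL",
}
for problem in validate_rule(bad):
    print(problem)
```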
Configure Availability
To use an alert definition to compute a resource's availability, set its isAvailability key to true.
The same rule is then used to compute the availability of the resource. For example, to base Pod availability on pod memory usage, define the following rule with isAvailability set to true:
- resourceType: k8s_pod
  rules:
    - name: k8s_pod_memory_usage_percent
      interval: 2m
      expr: ((k8s_pod_memory_limit_utilization_ratio*100 ) OR (k8s_pod_memory_limit_utilization))
      isAvailability: true
      warnOperator: GREATER_THAN_EQUAL
      warnThreshold: '90'
      criticalOperator: GREATER_THAN_EQUAL
      criticalThreshold: '95'
      alertSub: '${severity}. ${metric.name} on the resource ${resource.name} is ${metric.value}.'
      alertBody: '${severity}. ${metric.name} on the resource: ${resource.name} is ${metric.value}.'
Here, if k8s_pod_memory_usage_percent is in the warning or critical state, the pod's availability is considered down; otherwise, it is up.
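The rule above can be sketched in Python. In the example ConfigMap, several pod rules set isAvailability: true (k8s_pod_phase, k8s_pod_cpu_usage_percent, and k8s_pod_memory_usage_percent). This section does not say how OpsRamp combines them, so the sketch assumes a resource is down if any of them is in the warning or critical state. The function name resource_availability and the "ok" state label are hypothetical.

```python
from typing import Dict

# States that mark a resource as down, per the rule described above.
DOWN_STATES = {"warning", "critical"}


def resource_availability(rule_states: Dict[str, str]) -> str:
    """Return "down" or "up" for one resource.

    rule_states maps the name of each rule with isAvailability: true to its
    current alert state. Treating the resource as down when ANY such rule is
    in warning or critical is an assumption, not documented behavior.
    """
    if any(state in DOWN_STATES for state in rule_states.values()):
        return "down"
    return "up"


# Hypothetical states for a pod's three availability rules.
pod_states = {
    "k8s_pod_phase": "ok",
    "k8s_pod_cpu_usage_percent": "ok",
    "k8s_pod_memory_usage_percent": "warning",
}
print(resource_availability(pod_states))  # down
```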
View Alerts in OpsRamp Portal
Users can view alerts in the OpsRamp portal under Command Center > Alerts.
- Default Alerts Screen: View your alerts in the Alerts UI.
- Alert Details: Click any alert entry to see its details.