To provide notification when there is a change in the availability of agent-monitored metrics, the agent self-monitoring feature generates alerts when agent-monitored metrics are interrupted and, again, when metric monitoring is restored.
This provides client visibility into the window when the monitored metrics might differ from what is expected.
Operation
Periodically, every six hours by default, the agent compares the metrics it is configured to collect with the metrics actually collected. If the set of collected metrics changes over the interval, fewer metrics than expected are collected, or full or partial metric collection is restored, the agent alerts on the change in status. If there is no change in status over the interval, an additional alert is not generated even if the same metrics continue to be unmonitored. If communication with the agent is interrupted, the agent sends the unmonitored metrics alert when communication is restored.
If metrics collection fails and is restored in the six-hour interval, an alert is not generated. You can change the self-monitoring interval to between one and 12 hours.
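The audit behavior described above can be sketched as a small state comparison. This is only an illustration of the described rules, not the agent's actual code: the function name `audit`, the metric names, and the use of sets are all assumptions made for the example.

```python
from typing import Optional, Set, Tuple


def audit(expected: Set[str], collected: Set[str],
          previous_missing: Set[str]) -> Tuple[Optional[str], Set[str]]:
    """Return (alert_type, missing) for one self-monitoring audit.

    alert_type is "critical", "healed", or None when the status is unchanged.
    All names here are hypothetical and exist only for illustration.
    """
    missing = expected - collected
    if missing == previous_missing:
        # No change since the last audit: no additional alert, even if
        # the same metrics are still unmonitored.
        return None, missing
    if missing:
        # Metrics are missing, or the set of missing metrics changed
        # (this also covers a partial restore).
        return "critical", missing
    # Every expected metric is collected again.
    return "healed", missing


expected = {"cpu", "disk", "memory"}
state: Set[str] = set()
results = []
for collected in [expected, {"cpu"}, {"cpu"}, {"cpu", "disk"}, expected]:
    alert, state = audit(expected, collected, state)
    results.append(alert)
print(results)  # [None, 'critical', None, 'critical', 'healed']
```

Because only the state at each audit is compared, a failure that starts and recovers between two audits leaves the missing set unchanged and produces no alert.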
Self-monitoring generates one alert per device or agent resource, not an alert per metric. The alert is one of the following types:
Alert type | Condition | Alert message |
---|---|---|
critical | missing metrics or a change in missing metrics since the last audit | Agent: no metric samples collected from some monitors |
healed | no missing metrics | Agent: metric samples collected from all monitors |
Critical- and warning-level alerts include a list of metrics that are expected but not currently monitored. Metrics are grouped by agent template categories. For example, all missing performance metrics are grouped in the Performance Monitoring category. Each application or custom monitor is a separate category with the application monitor name as the group name.
Monitoring agent template categories:
Performance Monitoring
Process Monitor
Windows Services Monitor
Other Monitors
- custom monitor name
- G2/application monitor name
The minimum metrics sampling interval is one hour.
Constraints
If a metric is defined for alert notification only and not for graph data, self-monitoring can generate a false alert because the agent does not send graph data for these metrics. This constraint primarily applies to custom monitors.
Self-monitoring works at the metric level, not the component level, so a metric is assumed to be working if any of its components gets data.
For example, for a disk.utilization metric with three components, C:, D:, and F:, if any one of the components can collect data and the other two components fail, self-monitoring does not send an alert. For KVM and Docker monitoring, the virtual machine or container is considered to be a component. Again, if any virtual machine or container is missing graph data, self-monitoring does not generate an alert.
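The metric-level rule can be pictured as an "any component has data" check. The following is a minimal sketch under that reading; the function name `metric_has_data` and the dictionary layout are hypothetical and are not part of the agent.

```python
from typing import Mapping


def metric_has_data(component_has_data: Mapping[str, bool]) -> bool:
    """Treat a metric as collected if at least one component reports data."""
    return any(component_has_data.values())


# disk.utilization with three components; only C: is collecting data.
disk_utilization = {"C:": True, "D:": False, "F:": False}
print(metric_has_data(disk_utilization))  # True, so no alert for this metric

# Only when every component fails is the metric treated as missing.
all_failed = {"C:": False, "D:": False, "F:": False}
print(metric_has_data(all_failed))  # False
```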
If the wrong template is applied, such as a Windows-based template applied to Linux or, conversely, a Linux-based template applied to Windows, self-monitoring sends an alert.
Metric monitoring status alert format
- Alert Type: MONITORING
- Sub-Alert Type: agent_self_monitoring_error
Field | Content | Description |
---|---|---|
Subject | Critical alert: Agent: no metric samples collected from some monitors. Warning alert: Healed alert: Agent: metric samples collected from all monitors. | Alert status |
Date Created | formatted time | Time alert created. |
Created Time At Source | formatted time | Time alert created at source. |
Description | Example: Performance Monitoring: CPU, DISK, FREEDISK, MEMORY; For each alert category: | This field contains an aggregated list of metrics not collected, categorized by agent template category. It is not an alert on a metric. |
You can also view the alert in the agent log.
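As a rough illustration of how the Description field aggregates missing metrics by agent template category, the sketch below builds a string in the style of the example in the table. The function name `format_description` is hypothetical, and the single-space separator between categories is an assumption, since the source shows only one category.

```python
from typing import Dict, List


def format_description(missing_by_category: Dict[str, List[str]]) -> str:
    """Build one 'Category: METRIC, METRIC;' entry per category with gaps."""
    parts = []
    for category, metrics in missing_by_category.items():
        if metrics:  # skip categories with nothing missing
            parts.append(f"{category}: {', '.join(metrics)};")
    # Assumption: categories are separated by a single space.
    return " ".join(parts)


missing = {
    "Performance Monitoring": ["CPU", "DISK", "FREEDISK", "MEMORY"],
    "Process Monitor": [],
}
print(format_description(missing))
# Performance Monitoring: CPU, DISK, FREEDISK, MEMORY;
```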
Enable agent self-monitoring when the client is created
- Navigate to Setup > Accounts > Clients.
- On the CLIENTS page, click + Add.
- In the Agent Monitoring Capabilities section, select Yes to enable agent self-monitoring. Agent self-monitoring is disabled by default.
All client agents are notified when self-monitoring is enabled.
Enable or disable agent self-monitoring
- Navigate to Setup > Accounts > Clients.
- On the CLIENTS page, select the client you want to change the agent self-monitoring status for.
- Click Edit.
- In the Agent Monitoring Capabilities section, select Yes to enable agent self-monitoring. Select No to disable agent self-monitoring.
- Click Finish.
All client agents are notified when self-monitoring is disabled.
Change the self-monitoring frequency
You can change the agent self-monitoring frequency, in minutes, at the device level in the agent configuration.
- default frequency: 360
- minimum frequency: 60
- maximum frequency: 720
You can set the self-monitoring frequency for each client agent, and each agent can be set to a different frequency:
- Open the configuration.properties file in the opsramp/agent/conf folder.
- In the Misc section, find the self_monitor_timer_min key.
- Change the value to the frequency you want, in minutes.
- Save the file.
- Restart the agent service.
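If you script this change, the edit can be validated against the documented 60 to 720 minute range before the file is saved. The sketch below works on the file contents as text and assumes the key is stored as a `key = value` line; the `[Misc]` header and `other_key` in the sample are illustrative guesses at the layout, not taken from an actual configuration.properties file, and the function name is hypothetical.

```python
import re

KEY = "self_monitor_timer_min"
MIN_MINUTES, MAX_MINUTES = 60, 720  # documented minimum and maximum


def set_self_monitor_interval(config_text: str, minutes: int) -> str:
    """Return config_text with the self-monitoring interval replaced."""
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"{KEY} must be between {MIN_MINUTES} and {MAX_MINUTES} minutes")
    # Assumption: the key appears on its own line as 'key = value'.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(KEY)}[ \t]*=[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{minutes}", config_text)
    if count == 0:
        raise KeyError(f"{KEY} not found in configuration text")
    return new_text


# Hypothetical file contents, for illustration only.
sample = "[Misc]\nself_monitor_timer_min = 360\nother_key = 1\n"
updated = set_self_monitor_interval(sample, 120)
print(updated)
```

The function only transforms text; reading and writing the file and restarting the agent service remain separate steps, as in the procedure above.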