Overview
Resource availability is the state of a resource. It is determined from the alerts on the resource's availability metrics and from when the resource was onboarded in OpsRamp.
OpsRamp continuously monitors resources and tracks all metric samples. When an availability metric reaches its critical threshold, an alert is raised, and based on that alert, the resource's availability state changes to DOWN.
Availability Calculation
Most OpsRamp out-of-the-box templates include one or more metrics for availability calculation. These metrics help identify the availability state of a resource.
How to Configure Availability
Follow these steps to configure availability:
- Select a template of your choice and edit it.
- Go to the Metrics section and select the metric that is most important for the resource.
- Select the Availability checkbox and save the template.
- Apply the template to the resources.
Note
You can select more than one metric so that availability is calculated from several metrics instead of one. Ideally, regardless of the number of templates, two or three key metrics are sufficient to identify the availability state of a resource.
Availability States
Availability State | Description | Color Indication |
---|---|---|
UP | No critical alert on availability metrics. | GREEN |
DOWN | Critical alert on availability metrics. | RED |
UNKNOWN | Data samples are not available for the availability metrics. | GREY |
UNDEFINED | No availability metric on the resource. | BROWN |
UNMONITORED | These resources are not supported for monitoring. | LIGHT TEAL |
Each onboarded resource of your client falls into one of the above states.
Availability Rules
When you apply a template, the availability rule is set to ALL by default. To change it to ANY, select the resource, click the Monitors tab on the right side, and then click Availability Rule.
Note
The availability rule applies only to resources with more than one availability metric. If a resource has only one availability metric, the ALL/ANY rule does not apply.
The availability rule has two options:
- ALL: If none of the availability metrics has a critical alert, the resource is considered UP (OK). If any availability metric has a critical alert, the resource is considered DOWN.
- ANY: If at least one availability metric has no critical alert, the resource is considered UP (OK). If all availability metrics have a critical alert, the resource is considered DOWN.
The two options are displayed as follows, and you can switch between them:
- Resource is UP, if ALL availability metrics are OK. Otherwise, the resource is DOWN.
- Resource is UP, if ANY availability metric is OK. Otherwise, the resource is DOWN.
Possible States for Availability Rule
The tables below show the availability state of a resource for every possible combination of collected samples and critical alerts, assuming the resource has two availability metrics.
How is the state calculated for the ALL rule?
Resource is UP if ALL availability metrics are OK. Otherwise, the resource is DOWN.
Resource | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability |
---|---|---|---|---|---|
Resource A | Collected | Collected | No | No | UP |
Resource A | Collected | Collected | Yes | Yes | DOWN |
Resource A | Collected | Collected | Yes | No | DOWN |
Resource A | Collected | Not collected | Yes | N/A | DOWN |
Resource A | Collected | Not collected | No | N/A | UNKNOWN |
Resource A | Not collected | Not collected | Yes | N/A | DOWN |
Resource A | Not collected | Not collected | N/A | N/A | UNKNOWN |
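The ALL table can be reproduced by a short decision procedure. The Python sketch below is one ordering that is consistent with every row of the table; it is inferred from the table, not taken from OpsRamp, and `MetricStatus` and `all_rule_state` are hypothetical names used only for illustration. An N/A alert cell is treated as "no critical alert", and a resource with no availability metric returns UNDEFINED, as in the Availability States table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricStatus:
    """Hypothetical view of one availability metric on a resource."""
    collected: bool  # True if a data sample was collected
    critical: bool   # True if the metric has a critical alert (N/A -> False)


def all_rule_state(metrics: List[MetricStatus]) -> str:
    """Availability under the ALL rule, ordered to match the table above."""
    if not metrics:
        return "UNDEFINED"  # no availability metric on the resource
    if any(m.critical for m in metrics):
        return "DOWN"       # a single critical alert makes the resource DOWN
    if not all(m.collected for m in metrics):
        return "UNKNOWN"    # a missing sample means "all OK" cannot be confirmed
    return "UP"             # every metric collected and none is critical


# Row 5 of the table: sample #1 OK, sample #2 not collected.
print(all_rule_state([MetricStatus(True, False), MetricStatus(False, False)]))  # UNKNOWN
```

Checking alerts before missing samples is what makes the sixth row DOWN even though neither sample was collected.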
How is the state calculated for the ANY rule?
Resource is UP if ANY availability metric is OK. Otherwise, the resource is DOWN.
Resource | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability |
---|---|---|---|---|---|
Resource A | Collected | Collected | No | No | UP |
Resource A | Collected | Collected | Yes | Yes | DOWN |
Resource A | Collected | Collected | Yes | No | UP |
Resource A | Collected | Not collected | Yes | N/A | UP |
Resource A | Collected | Not collected | No | N/A | UP |
Resource A | Not collected | Not collected | Yes | N/A | UNKNOWN |
Resource A | Not collected | Not collected | N/A | N/A | UNKNOWN |
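The ANY table can likewise be reproduced by a short decision procedure. The Python sketch below is one ordering consistent with every row; it is inferred from the table rather than from OpsRamp's implementation, and `MetricStatus` and `any_rule_state` are hypothetical names. N/A alert cells are treated as "no critical alert".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MetricStatus:
    """Hypothetical view of one availability metric on a resource."""
    collected: bool  # True if a data sample was collected
    critical: bool   # True if the metric has a critical alert (N/A -> False)


def any_rule_state(metrics: List[MetricStatus]) -> str:
    """Availability under the ANY rule, ordered to match the table above."""
    if not metrics:
        return "UNDEFINED"  # no availability metric on the resource
    if not any(m.collected for m in metrics):
        return "UNKNOWN"    # no samples at all, even if an alert is open
    if all(m.critical for m in metrics):
        return "DOWN"       # every availability metric is critical
    return "UP"             # at least one metric has no critical alert


# Row 4 of the table: sample #1 critical, sample #2 not collected.
print(any_rule_state([MetricStatus(True, True), MetricStatus(False, False)]))  # UP
```

Unlike the ALL rule, the missing-sample check comes first here, which is why the sixth row is UNKNOWN rather than DOWN.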
When should you use the ALL availability rule?
Use this rule if every availability metric matters to you and you expect all of them to stay healthy, that is, with metric samples below their critical thresholds.
With this rule, the resource is UP only when all availability metrics are below their critical thresholds.
When should you use the ANY availability rule?
Use this rule if you only need one of the availability metrics to be healthy, that is, with its metric samples below the critical threshold.
With this rule, the resource is UP as long as at least one availability metric is below its critical threshold.
Resource Availability Score
The resource availability score is calculated from the availability state of the resource over time.
Example: If a resource is DOWN for some time, its overall availability score is reduced.
How the Resource Availability State% Works
Example: A resource was onboarded in OpsRamp on January 1st. On January 2nd, a monitoring template was assigned to the resource, but no monitoring data was collected that day for various reasons. On January 3rd, OpsRamp started collecting data, and there were no critical alerts on the availability metrics that day. On January 4th, a critical alert was generated for the resource on an availability metric.
For the four days in this example, the resource Availability State% is:
UP: 25%
DOWN: 25%
UNKNOWN: 25%
UNDEFINED: 25%
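The 25% figures follow from dividing the time spent in each state by the length of the reporting window. The Python sketch below assumes the example maps to one full day per state: January 1st UNDEFINED (no availability metric yet), January 2nd UNKNOWN (template assigned, no samples), January 3rd UP, and January 4th DOWN (assuming the critical alert stays open for the whole day). `state_percentages` is a hypothetical helper, not an OpsRamp API, and OpsRamp's own calculation may differ in detail.

```python
from typing import Dict, List, Tuple


def state_percentages(periods: List[Tuple[str, float]]) -> Dict[str, float]:
    """Share of the reporting window spent in each availability state, in percent."""
    totals: Dict[str, float] = {}
    for state, hours in periods:
        totals[state] = totals.get(state, 0.0) + hours  # accumulate time per state
    window = sum(totals.values())
    if window == 0:
        raise ValueError("the reporting window is empty")
    return {state: 100.0 * hours / window for state, hours in totals.items()}


# January 1st to 4th from the example, one 24-hour day per state.
example = [("UNDEFINED", 24), ("UNKNOWN", 24), ("UP", 24), ("DOWN", 24)]
print(state_percentages(example))
# {'UNDEFINED': 25.0, 'UNKNOWN': 25.0, 'UP': 25.0, 'DOWN': 25.0}
```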
Generate Alerts for Resources with UNKNOWN Availability State
When does a resource go into the UNKNOWN availability state?
When a template with at least one availability metric is applied to a resource, the resource goes into the UNKNOWN state if no data samples are collected for the availability metric(s).
How long OpsRamp waits before moving a resource to the UNKNOWN state depends on the frequency of the availability metric in the monitoring template:
Availability Metric Frequency | Description |
---|---|
< 30 minutes | OpsRamp waits 30 minutes for data samples. If none arrive, the resource moves to the UNKNOWN state. |
Equal to 30 minutes | OpsRamp waits 30 + 30 = 60 minutes for data samples. If none arrive, the resource moves to the UNKNOWN state. |
> 30 minutes and <= 60 minutes | OpsRamp waits 60 + 30 = 90 minutes for data samples. If none arrive, the resource moves to the UNKNOWN state. |
> 60 minutes | OpsRamp does not track the resource for the UNKNOWN state. The resource does not go into the UNKNOWN state and is not included in the UNKNOWN alert. |
Note: To identify the UNKNOWN state of a resource, make sure the frequency of the availability metric(s) in the template is 60 minutes or less.
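The frequency table above reduces to a simple lookup. The Python sketch below encodes it as written; `unknown_wait_minutes` is a hypothetical helper for planning template frequencies, not an OpsRamp API, and it returns `None` for frequencies above 60 minutes because those resources never move to UNKNOWN.

```python
from typing import Optional


def unknown_wait_minutes(frequency_minutes: float) -> Optional[int]:
    """Minutes OpsRamp waits for samples before marking a resource UNKNOWN.

    Returns None when the frequency is above 60 minutes, because such
    resources never move to UNKNOWN and are excluded from the UNKNOWN alert.
    """
    if frequency_minutes <= 0:
        raise ValueError("frequency must be positive")
    if frequency_minutes < 30:
        return 30
    if frequency_minutes == 30:
        return 30 + 30
    if frequency_minutes <= 60:
        return 60 + 30
    return None


for freq in (5, 30, 45, 60, 120):
    print(freq, unknown_wait_minutes(freq))
```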
How will the user know if a resource goes into an UNKNOWN availability state?
If a resource's availability state changes to UNKNOWN, a client-level critical alert is generated every 30 minutes.
The critical alert contains a link to the list of resources with no monitoring data for the last 30 minutes. When you click the link, the Infrastructure > Search page is displayed, with a list of UNKNOWN resources.
The alert is automatically healed when all resources in the link have moved out of the UNKNOWN state.
The alert is generated on the metric system_resource_availability_state.
This alert is not shown on the resource, so check the Command Center > Alerts page.
This alerting option is disabled by default. You can enable or disable it from the Setup > Accounts > Clients page.
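The alert lifecycle described above can be modeled as one evaluation per 30-minute cycle. The Python sketch below is a simplified model of the documented behavior, not OpsRamp's alert engine: `evaluate_unknown_alert`, its status strings, and the resource names are hypothetical, and it heals the alert as soon as no resource is UNKNOWN.

```python
from typing import Dict, List, Tuple


def evaluate_unknown_alert(states: Dict[str, str], alert_open: bool) -> Tuple[str, List[str]]:
    """One 30-minute evaluation of the client-level UNKNOWN alert.

    Returns ("CRITICAL", names) when any resource is UNKNOWN, ("HEALED", [])
    when a previously open alert can be cleared, and ("OK", []) otherwise.
    """
    unknown = sorted(name for name, state in states.items() if state == "UNKNOWN")
    if unknown:
        return "CRITICAL", unknown  # the resources behind the Infrastructure > Search link
    if alert_open:
        return "HEALED", []  # every listed resource has left the UNKNOWN state
    return "OK", []


print(evaluate_unknown_alert({"web-01": "UP", "db-01": "UNKNOWN"}, alert_open=False))
print(evaluate_unknown_alert({"web-01": "UP", "db-01": "UP"}, alert_open=True))
```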
Availability Graph on the Resource
This graph is shown by default on all resources that support monitoring. It is not supported for resources in the UNMONITORED category on the Resource page.
You can see this graph in the Resource > Metrics tab.
Each Y-axis value represents an availability state:
Y axis value | Availability State |
---|---|
0 | UNDEFINED |
1 | UNKNOWN |
3 | DOWN |
4 | UP |
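If you have the graph's values as numbers, a lookup table is enough to turn them back into state names. The Python sketch below uses only the values listed in the table; `decode_availability` is a hypothetical helper, and how you obtain the values is outside the scope of this page. Because the table does not list 2 or any other value, the sketch rejects them rather than guessing.

```python
from typing import List

# Y-axis values of the availability graph, as listed in the table above.
Y_AXIS_STATES = {0: "UNDEFINED", 1: "UNKNOWN", 3: "DOWN", 4: "UP"}


def decode_availability(values: List[float]) -> List[str]:
    """Translate availability graph values into state names."""
    states = []
    for value in values:
        if value not in Y_AXIS_STATES:
            # 2 and any other value are not documented in the table.
            raise ValueError(f"undocumented availability value: {value}")
        states.append(Y_AXIS_STATES[value])
    return states


print(decode_availability([4, 4, 3, 1, 0]))
# ['UP', 'UP', 'DOWN', 'UNKNOWN', 'UNDEFINED']
```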
What happens to the availability of resources if agent or gateway communication with OpsRamp is down?
By default, the agent or gateway has an internal buffer that stores the last hour of metric data and the last 24 hours of alert data. If the agent or gateway stops sending data to the OpsRamp cloud because of network issues, the resources it monitors are shown as UNKNOWN. After communication is re-established and the agent or gateway sends the buffered data, OpsRamp recalculates availability every 6 hours and readjusts the availability states.
In case of communication issues, availability states are recalculated only for the last hour; any time before that last hour is considered UNKNOWN.
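The effect of the one-hour metric buffer on a long outage can be worked out with simple arithmetic. The Python sketch below assumes the buffer holds the final hour of the outage (the hour just before reconnection), so only that tail is recalculated and everything earlier stays UNKNOWN; `outage_breakdown` is a hypothetical helper, not an OpsRamp API.

```python
from typing import Tuple


def outage_breakdown(outage_minutes: float, buffer_minutes: float = 60) -> Tuple[float, float]:
    """Split an outage into (recalculated, unknown) minutes.

    Assumes the buffer holds the final `buffer_minutes` of the outage, so only
    that tail is recalculated after reconnection and the rest stays UNKNOWN.
    """
    if outage_minutes < 0:
        raise ValueError("outage duration cannot be negative")
    recalculated = min(outage_minutes, buffer_minutes)
    return recalculated, outage_minutes - recalculated


print(outage_breakdown(180))  # (60, 120)
print(outage_breakdown(45))   # (45, 0)
```

For a three-hour outage, one hour is recalculated from buffered data and the other two hours remain UNKNOWN.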
How to configure the resource availability state during scheduled maintenance?
By default, the resource's actual state is used, whether or not the resource is in scheduled maintenance.
You can change how your resources are treated during a scheduled maintenance period. Navigate to the Setup > Accounts > Clients page and select the required state.
Example: If you select UP, the resource availability is shown as UP during scheduled maintenance, even if the resource is powered off or restarted.
See Create a Client for more information.
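The maintenance setting acts as an override on the reported state. The Python sketch below models that behavior; `reported_state` and its `maintenance_state` parameter are hypothetical stand-ins for the Clients page setting, with `None` representing the default of reporting the actual state.

```python
from typing import Optional


def reported_state(actual: str, in_maintenance: bool,
                   maintenance_state: Optional[str] = None) -> str:
    """State shown for a resource, given a hypothetical maintenance setting.

    maintenance_state=None models the default: the actual state is reported
    whether or not the resource is in scheduled maintenance.
    """
    if in_maintenance and maintenance_state is not None:
        return maintenance_state  # the configured state overrides the actual one
    return actual


# A powered-off resource during maintenance, with UP selected for the client.
print(reported_state("DOWN", in_maintenance=True, maintenance_state="UP"))  # UP
```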