Overview

Resource availability is the state of a resource. It is determined from alerts on the resource's availability metrics once the resource is onboarded to OpsRamp.

OpsRamp continuously monitors resources and keeps track of all metric samples. Whenever an availability metric reaches its critical threshold limit, an alert is raised, and based on that alert, the resource’s availability state changes to DOWN.

Availability Calculation

Most OpsRamp out-of-the-box templates include at least one metric for availability calculation. These metrics help to identify the availability state of a resource.

How to Configure the Availability

Follow these steps to configure availability:

  1. Select a template of your choice and edit it.
  2. Go to the Metrics section and select the metric that is most important for the resource.
  3. Select the Availability checkbox and save the template.
  4. Apply the template to the resources.

Availability States

Availability State | Description                                                  | Color Indication
UP                 | No critical alert on availability metrics.                   | GREEN
DOWN               | Critical alert on availability metrics.                      | RED
UNKNOWN            | Data samples are not available for the availability metrics. | GREY
UNDEFINED          | No availability metric on the resource.                      | BROWN
UNMONITORED        | These resources are not supported for monitoring.            | LIGHT TEAL

Each onboarded resource of your client falls into one of the above states.

Availability Rules

When you apply a template, the ALL rule is selected by default, but you can change it to ANY if you prefer. To change it, select the resource, click the Monitors tab on the right side, and then click Availability Rule.

The availability calculation supports two rules:

  • ALL: If none of the availability metrics has a critical alert, the resource is considered UP (OK). If any availability metric has a critical alert, the resource is considered DOWN.
  • ANY: If at least one availability metric does not have a critical alert, the resource is considered UP (OK). If all availability metrics have a critical alert, the resource is considered DOWN.

You will find the following options, and you can switch between them:

  • Resource is UP, if ALL availability metrics are OK. Otherwise, the resource is DOWN.
  • Resource is UP, if ANY availability metric is OK. Otherwise, the resource is DOWN.

Possible States for Availability Rule

The tables below show the state of a resource for each possible combination of availability metrics, assuming the resource has two availability metrics.

How is the state calculated for the ALL rule?

Resource is UP, if ALL availability metrics are OK. Otherwise, the resource is DOWN.

Resource   | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability
Resource A | Collected       | Collected       | No                       | No                       | UP
Resource A | Collected       | Collected       | Yes                      | Yes                      | DOWN
Resource A | Collected       | Collected       | Yes                      | No                       | DOWN
Resource A | Collected       | Not collected   | Yes                      | N/A                      | DOWN
Resource A | Collected       | Not collected   | No                       | N/A                      | UNKNOWN
Resource A | Not collected   | Not collected   | Yes                      | N/A                      | DOWN
Resource A | Not collected   | Not collected   | N/A                      | N/A                      | UNKNOWN

How is the state calculated for the ANY rule?

Resource is UP, if ANY availability metric is OK. Otherwise, the resource is DOWN.

Resource   | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability
Resource A | Collected       | Collected       | No                       | No                       | UP
Resource A | Collected       | Collected       | Yes                      | Yes                      | DOWN
Resource A | Collected       | Collected       | Yes                      | No                       | UP
Resource A | Collected       | Not collected   | Yes                      | N/A                      | UP
Resource A | Collected       | Not collected   | No                       | N/A                      | UP
Resource A | Not collected   | Not collected   | Yes                      | N/A                      | UNKNOWN
Resource A | Not collected   | Not collected   | N/A                      | N/A                      | UNKNOWN
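
The two tables above can be condensed into a small decision function. The following is a minimal Python sketch inferred only from the rows shown in this section; it is not OpsRamp's implementation, and the names `availability` and `Metric` are hypothetical. It assumes that each availability metric is described by whether a sample was collected and whether a critical alert is open (`None` where the tables show N/A).

```python
from typing import Optional, Sequence, Tuple

# One availability metric: (sample_collected, critical_alert).
# critical_alert is None where the tables show N/A.
# Hypothetical representation, chosen only for this illustration.
Metric = Tuple[bool, Optional[bool]]


def availability(rule: str, metrics: Sequence[Metric]) -> str:
    """Return the availability state implied by the tables for the ALL or ANY rule."""
    if not metrics:
        return "UNDEFINED"  # no availability metric on the resource

    critical = [alert is True for _, alert in metrics]
    collected = [got for got, _ in metrics]

    if rule == "ALL":
        # Any critical alert brings the resource DOWN, even with missing samples.
        if any(critical):
            return "DOWN"
        # Without a critical alert, every metric needs a sample to be UP.
        return "UP" if all(collected) else "UNKNOWN"

    if rule == "ANY":
        # With no samples at all, the state cannot be determined.
        if not any(collected):
            return "UNKNOWN"
        # DOWN only when every metric has a critical alert.
        return "DOWN" if all(critical) else "UP"

    raise ValueError(f"unknown rule: {rule!r}")


# Row 5 of each table: sample 1 collected without an alert, sample 2 not collected.
row = [(True, False), (False, None)]
print(availability("ALL", row))  # UNKNOWN
print(availability("ANY", row))  # UP
```

The sketch makes the difference between the rules visible: under ALL a missing sample prevents the UP state, while under ANY a single collected sample without a critical alert on every metric is enough.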

When to use the ALL availability rule?

Use this rule if you care about all availability metrics and expect every one of them to stay healthy, that is, the metric samples stay below the critical threshold limits.
With this rule, the resource is in the UP state only when all availability metrics are below the critical threshold limit.

When to use the ANY availability rule?

Use this rule if it is enough for any one of the availability metrics to be healthy, that is, its metric sample is below the critical threshold limit.
With this rule, the resource is in the UP state as long as at least one of the availability metrics is below the critical threshold limit.

Resource Availability Score

The resource availability score is calculated based on the state of the availability metric.
Example: If a resource is DOWN for some time, the overall resource availability score is reduced.

How the resource Availability State% works

Example: A resource was onboarded in OpsRamp on January 1st. On January 2nd, a monitoring template was assigned to the resource, but no monitoring data was collected for one day for various reasons. On January 3rd, OpsRamp started collecting data, and there were no critical alerts on the availability metrics for one day. On January 4th, a critical alert was generated for the resource on the availability metric.

For this example, the resource Availability State% over the last four days is:

UP: 25% (January 3rd, data collected with no critical alert)

DOWN: 25% (January 4th, critical alert on the availability metric)

UNKNOWN: 25% (January 2nd, template assigned but no data collected)

UNDEFINED: 25% (January 1st, no availability metric on the resource yet)
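
The arithmetic behind this example can be checked with a short Python sketch. It assumes, for simplicity, that the resource holds exactly one state per day; the function name `state_percentages` is hypothetical, and this is only an illustration of the proportions, not the way OpsRamp computes the value.

```python
from collections import Counter
from typing import Dict, Sequence


def state_percentages(daily_states: Sequence[str]) -> Dict[str, float]:
    """Return the share of days spent in each availability state, in percent."""
    total = len(daily_states)
    if total == 0:
        return {}
    counts = Counter(daily_states)
    return {state: 100 * days / total for state, days in counts.items()}


# January 1st to 4th from the example: no template, no data, healthy, critical alert.
days = ["UNDEFINED", "UNKNOWN", "UP", "DOWN"]
print(state_percentages(days))
# {'UNDEFINED': 25.0, 'UNKNOWN': 25.0, 'UP': 25.0, 'DOWN': 25.0}
```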

Generate alert for the resources with unknown availability state


When does a resource go into an Unknown Availability State?

When a template that has at least one availability metric is applied to a resource, the resource goes into the UNKNOWN state if no data sample is collected for the metric(s).

The following table shows how long OpsRamp waits for data samples, based on the monitoring template frequency:

Availability Metric Frequency | Description
< 30 minutes | OpsRamp waits 30 minutes for the data samples. If there are no data samples, the resource is moved to the UNKNOWN state.
Equal to 30 minutes | OpsRamp waits 30 minutes + 30 minutes = 60 minutes for the data samples. If there are no data samples, the resource is moved to the UNKNOWN state.
> 30 minutes and <= 60 minutes | OpsRamp waits 60 minutes + 30 minutes = 90 minutes for the data samples. If there are no data samples, the resource is moved to the UNKNOWN state.
> 60 minutes | OpsRamp does not consider the resource, and it does not go to the UNKNOWN state. This resource is not part of the UNKNOWN alert.

Note: Ensure your availability metric(s) frequency in the template is less than or equal to 60 minutes to identify the UNKNOWN state of the resource.
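
The waiting periods in the frequency table can be written as a small lookup. This Python sketch only restates the table; the function name `unknown_wait_minutes` is hypothetical, and returning `None` for frequencies above 60 minutes is a convention chosen here to mean that the resource is never moved to UNKNOWN.

```python
from typing import Optional


def unknown_wait_minutes(frequency_minutes: float) -> Optional[int]:
    """Minutes OpsRamp waits without samples before marking a resource UNKNOWN.

    Returns None when the metric frequency is above 60 minutes, because such
    resources are not considered for the UNKNOWN state.
    """
    if frequency_minutes <= 0:
        raise ValueError("frequency must be positive")
    if frequency_minutes < 30:
        return 30
    if frequency_minutes == 30:
        return 60  # 30 minutes + 30 minutes
    if frequency_minutes <= 60:
        return 90  # 60 minutes + 30 minutes
    return None


for freq in (5, 30, 45, 60, 120):
    print(freq, unknown_wait_minutes(freq))
```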

How will the user know if a resource goes into an UNKNOWN availability state?

If a resource's availability state changes to UNKNOWN, a client-level critical alert is generated every 30 minutes.

The critical alert contains a link to the list of resources with no monitoring data for the last 30 minutes. When you click the link, the Infrastructure > Search page is displayed, with a list of UNKNOWN resources.

The alert heals automatically when all the resources in the provided link have moved out of the UNKNOWN state.

The alert is generated on the metric name system_resource_availability_state. This alert is not shown on the resource, so you have to check the Command Center > Alerts page.

This alerting option is disabled by default. You can enable or disable it from the Setup > Accounts > Clients page.

Availability graph on the resource

This graph is shown by default on all resources that support monitoring. If your resource is in the UNMONITORED category on the Resource page, the graph is not available.

You can see this graph in the Resource > Metrics tab.

Each number on the Y axis of the graph represents an availability state:

Y axis value | Availability State
0            | UNDEFINED
1            | UNKNOWN
3            | DOWN
4            | UP
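
If you read these Y axis values outside the graph, for example from exported data points, a simple mapping translates them back to state names. The Python sketch below only mirrors the table; the names `Y_AXIS_STATES` and `decode_state` are hypothetical, and values not listed in the table are rejected rather than guessed.

```python
# Y axis values and their states, copied from the table above.
Y_AXIS_STATES = {
    0: "UNDEFINED",
    1: "UNKNOWN",
    3: "DOWN",
    4: "UP",
}


def decode_state(y_value: int) -> str:
    """Translate a Y axis value from the availability graph into a state name."""
    try:
        return Y_AXIS_STATES[y_value]
    except KeyError:
        raise ValueError(f"no availability state is documented for {y_value!r}") from None


print([decode_state(v) for v in (4, 4, 3, 1)])  # ['UP', 'UP', 'DOWN', 'UNKNOWN']
```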

What happens to the availability of the resources if the agent or gateway communication to OpsRamp is down?

By default, the agent/gateway has an internal buffer that stores the last one hour of metric data and the last 24 hours of alerting data. If the agent/gateway stops sending data to the OpsRamp cloud due to network issues, the resources monitored by that agent/gateway are shown as UNKNOWN. Once communication is re-established and the agent/gateway sends the data stored in the buffer, the availability states are re-adjusted. OpsRamp recalculates the availability every 6 hours.

In case of communication issues, the availability states are recalculated only for the last hour. Anything before the last hour is considered UNKNOWN.
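
As an illustration of the one-hour buffer, the following Python sketch splits an outage into the part that stays UNKNOWN and the part that can be recalculated from buffered metric data. It assumes that "the last hour" means the hour immediately before communication was restored; the function name `split_outage` and the `BUFFER` constant are hypothetical and not part of OpsRamp.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

BUFFER = timedelta(hours=1)  # metric buffer size stated in the text

Window = Tuple[datetime, datetime]


def split_outage(start: datetime, end: datetime) -> Tuple[Optional[Window], Window]:
    """Return (unknown_window, recalculated_window) for an outage from start to end.

    unknown_window is None when the whole outage fits inside the buffer.
    """
    if end < start:
        raise ValueError("outage end is before outage start")
    cutoff = max(start, end - BUFFER)
    unknown = (start, cutoff) if cutoff > start else None
    return unknown, (cutoff, end)


# A three-hour outage: the first two hours stay UNKNOWN, the last hour is recalculated.
unknown, recalculated = split_outage(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 13, 0))
print(unknown)
print(recalculated)
```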

How to configure the resource availability state during scheduled maintenance?

By default, the resource’s actual state is used regardless of whether the resource is in scheduled maintenance.

You can change how your resources are treated during the scheduled maintenance period. Navigate to the Setup > Accounts > Clients page and select the state as per your requirement.

Example: If you select UP, the resource availability is shown as UP during scheduled maintenance, even if the resource is powered off or restarted.
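
The effect of this setting can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not OpsRamp code; the function name `reported_state` and its parameters are hypothetical, with `maintenance_state=None` standing for the default of using the actual state.

```python
from typing import Optional


def reported_state(
    actual_state: str,
    in_maintenance: bool,
    maintenance_state: Optional[str] = None,
) -> str:
    """Return the availability state shown for a resource.

    maintenance_state is the state selected in the client settings for
    scheduled maintenance; None keeps the default (the actual state).
    """
    if in_maintenance and maintenance_state is not None:
        return maintenance_state
    return actual_state


print(reported_state("DOWN", in_maintenance=True, maintenance_state="UP"))   # UP
print(reported_state("DOWN", in_maintenance=True))                           # DOWN
print(reported_state("DOWN", in_maintenance=False, maintenance_state="UP"))  # DOWN
```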

See Create a Client for more information.