The alert management training file is a CSV (comma-separated values) file that provides input data for machine learning to understand alert patterns and drive the following:
- First response
- Alert escalation
The data is stored in a tabular format: the training file is a plain-text file in which values are separated by commas. It plays a key role in alert management.
Download a sample training file
The downloadable sample training file contains sample metric names. Alerts triggered in the sequence given in the training file are correlated.
To download a sample training file:
- From All Clients, select the client.
- Go to Setup > Alerts > Alert Escalation or First Response.
- From the respective policies page, click +.
- In the Alert Escalation policy Import Dataset and Train Model section, click Alert_Escalation_Training_Example.csv to download a sample training file.
- In the First Response policy, select Suppress specific alerts or Snooze specific alerts for a given time.
- Click Alert_First_Response_Training_File_Example.csv to download a sample training file.
Create a training file
A training file can be created or modified from the sample training file.
- Open a spreadsheet.
- Enter the metric names. Alerts triggered in the specified sequence are correlated.
- Save the training file as CSV UTF-8 (Comma delimited).
An empty cell indicates all other values for that column.
When the machine-learning algorithm makes a prediction, it uses the row with the most exact matches. If two rows have the same number of matching values, the algorithm uses the first row it encounters.
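The steps above can also be scripted. The following is a minimal sketch, not a prescribed procedure: it builds a two-row training file with Python's csv module. The column names come from the sample alert escalation file; the client ID, resource type, and team values are hypothetical placeholders, and the utf-8-sig encoding is an assumption chosen to mimic the byte order mark that spreadsheet applications typically add when saving as CSV UTF-8.

```python
import csv
import io

# Column names taken from the sample alert escalation training file.
COLUMNS = [
    "clientUniqueId",
    "metric",
    "resource.generalInfo.resourceType",
    "component",
    "incident.assigneeGroup.name",
]

# Hypothetical rows. A missing or empty value leaves the cell empty,
# which stands for "all other" values of that column.
ROWS = [
    {
        "clientUniqueId": "client_123",
        "metric": "disk.utilization",
        "resource.generalInfo.resourceType": "Windows",
        "incident.assigneeGroup.name": "Windows Disk Management Support",
    },
    {
        "clientUniqueId": "client_123",
        "metric": "",
        "resource.generalInfo.resourceType": "Windows",
        "incident.assigneeGroup.name": "Windows Support",
    },
]


def training_csv_text(columns, rows):
    """Render the header and rows as comma-separated text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_training_file(path, columns, rows):
    """Write the training file as UTF-8 with a byte order mark (assumption)."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        handle.write(training_csv_text(columns, rows))
```

Calling write_training_file("training.csv", COLUMNS, ROWS) would produce a file that can then be imported from the policy page; the file name is arbitrary.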
Example of empty cell usage
A user wants to route alerts in the following manner:
- The disk.utilization metric that is triggered on Windows resources goes to the Windows Disk Management Support team.
- All other metrics for Windows resources go to the Windows Support team.
The following shows an empty metric cell on the second row which indicates all other metrics:
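The routing example can be expressed as two training rows, with the second row leaving the metric cell empty. The sketch below only illustrates the selection rule described in this section (most exact matches, first row wins on a tie); it is not the product's actual machine-learning algorithm. Two points are assumptions: the resource type value Windows, and the choice to skip a row whose non-empty cell conflicts with the alert.

```python
# Hypothetical training rows for the routing example. "inputs" holds the
# input cells; an empty string is an empty cell ("all other" values).
ROWS = [
    {
        "inputs": {
            "metric": "disk.utilization",
            "resource.generalInfo.resourceType": "Windows",
        },
        "team": "Windows Disk Management Support",
    },
    {
        "inputs": {
            "metric": "",
            "resource.generalInfo.resourceType": "Windows",
        },
        "team": "Windows Support",
    },
]


def count_exact_matches(row, alert):
    """Count exactly matching cells; return None if a non-empty cell conflicts."""
    matches = 0
    for column, expected in row["inputs"].items():
        if expected == "":
            continue  # empty cell: accepts any value, adds no exact match
        if alert.get(column) != expected:
            return None  # assumption: a conflicting cell rules the row out
        matches += 1
    return matches


def select_row(rows, alert):
    """Pick the row with the most exact matches; the first row wins a tie."""
    best_row, best_score = None, -1
    for row in rows:
        score = count_exact_matches(row, alert)
        # Strict ">" keeps the earlier row when scores are equal.
        if score is not None and score > best_score:
            best_row, best_score = row, score
    return best_row


disk_alert = {"metric": "disk.utilization",
              "resource.generalInfo.resourceType": "Windows"}
other_alert = {"metric": "cpu.utilization",
               "resource.generalInfo.resourceType": "Windows"}

print(select_row(ROWS, disk_alert)["team"])
print(select_row(ROWS, other_alert)["team"])
```

Under these assumptions, the disk alert matches two cells in the first row and is routed to the disk management team, while any other Windows metric falls through to the row with the empty metric cell.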
Example training file for alert escalation
The downloadable sample training file contains the fields clientUniqueId, metric, resource.generalInfo.resourceType, component, incident.assigneeGroup.name, and resource.dnsName.
Example training file for first response policy
The downloadable sample training file contains the fields clientUniqueId, metric, resource.generalInfo.resourceType, currentState, component, suppressed, snoozeDuration, and processIds.
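When a training file is derived from a sample, it can help to compare its header with the sample's field list, for example to catch a misspelled column name. The following is a small sketch of such a check. The field lists are the ones given for the two sample files; the helper names are hypothetical, and nothing here implies that every sample field is mandatory.

```python
import csv

# Field lists as given for the two sample training files.
ESCALATION_SAMPLE_FIELDS = [
    "clientUniqueId",
    "metric",
    "resource.generalInfo.resourceType",
    "component",
    "incident.assigneeGroup.name",
    "resource.dnsName",
]
FIRST_RESPONSE_SAMPLE_FIELDS = [
    "clientUniqueId",
    "metric",
    "resource.generalInfo.resourceType",
    "currentState",
    "component",
    "suppressed",
    "snoozeDuration",
    "processIds",
]


def header_fields(header_line):
    """Parse the first line of a CSV file into column names."""
    # Drop a leading byte order mark, which CSV UTF-8 files may carry.
    cleaned = header_line.lstrip("\ufeff")
    return [name.strip() for name in next(csv.reader([cleaned]))]


def missing_fields(header_line, sample_fields):
    """List sample fields that do not appear in the header."""
    present = set(header_fields(header_line))
    return [name for name in sample_fields if name not in present]


header = ("clientUniqueId,metric,resource.generalInfo.resourceType,"
          "currentState,component,suppressed,snoozeDuration")
print(missing_fields(header, FIRST_RESPONSE_SAMPLE_FIELDS))
```

Because attribute names are case-sensitive, the comparison is deliberately exact: a header such as ClientUniqueId would be reported as missing clientUniqueId.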
Key considerations
Alert, resource, and incident attributes and metric names are case-sensitive. The following table provides the metric name representation for a specific monitor:
Monitor Name | G2 Agent |
---|---|
Disk Utilization | DISK |
Memory Utilization | MEMORY |
Windows Service | WINDOWS_SERVICES |
Input column attributes
Two types of input columns are available: predefined attributes and custom attributes.
Predefined attributes
Predefined attributes are the default attributes supported for a training file. The attributes are derived from alert and resource attributes. Use the Get Alert API to retrieve alert attributes and the Get Resource API to retrieve resource attributes.
Non-collection type attributes up to three levels down from the alert are supported. For example, in the column name resource.generalInfo.resourceType, resource is the first-level attribute, generalInfo is the second-level attribute, and resourceType is the third-level attribute of the alert.
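The level rule can be checked mechanically by splitting a column name on its dots. The sketch below assumes that a level corresponds to one dot-separated segment, as in the resource.generalInfo.resourceType example; the function names are hypothetical. It cannot tell whether an attribute is a collection type, which depends on the alert and resource schema rather than on the name.

```python
MAX_LEVELS = 3  # attributes are supported up to three levels down from the alert


def attribute_levels(column_name):
    """Split a column name into its first-, second-, and third-level parts."""
    return column_name.split(".")


def within_supported_depth(column_name):
    """Return True if the name has one to three non-empty levels."""
    levels = attribute_levels(column_name)
    return len(levels) <= MAX_LEVELS and all(levels)


print(attribute_levels("resource.generalInfo.resourceType"))
print(within_supported_depth("metric"))
print(within_supported_depth("resource.generalInfo.hardware.model"))
```

The last name is a made-up four-level attribute used only to show a rejected case.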
The following table shows example column names for alert and alert resource attributes:
Alert attributes | Alert resource attributes |
---|---|
clientUniqueId | resource.state |
metric | resource.generalInfo.resourceType |
component | resource.generalInfo.make |
alertType | resource.generalInfo.osName |
currentState | resource.location.name |
status | resource.deviceGroup.name |
priority | resource.serviceGroup.name |
elapsedTimeString | resource.dnsName |
healedTimeString | - |
repeatCount | - |
The attributes resource.deviceGroup.name and resource.serviceGroup.name are exceptions: they are supported even though they are collection type attributes. These attributes are important in determining the incident routing and prioritization rules.
The attribute resource.generalInfo.name is not supported as an input column name. Using resource names is error-prone and can cause maintenance problems: every time a resource is added or removed, the training file must be updated. Consider using higher-level criteria such as resource.deviceGroup.name, resource.serviceGroup.name, or resource.generalInfo.resourceType to drive incident routing, prioritization, categorization, and cc.
Custom attributes
Custom attributes (tags) are user-defined attributes that drive alert escalation and suppression. For example, an organization divides its resources into three tiers, Tier 1, Tier 2, and Tier 3, where each tier represents a different level of importance. Tier 1 resources are critical and have a high business impact: if a Tier 1 resource is down, the business can be severely affected and the problem needs immediate attention. Assigning the resource and its related alerts to skilled personnel is therefore equally important. In this example, Tier is the custom attribute name and Tier 1, Tier 2, and Tier 3 are the custom attribute values.
Multiple tag names are allowed as separate columns.
To use a custom attribute:
- Create a training file.
- Specify a column with resource.tag.<tag_name>. For example, resource.tag.Tier.
- Add the values of the custom attributes. For example, Tier 1.
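As an illustration of these steps, the sketch below builds a small training file whose input column is the Tier tag from the example. The resource.tag.<tag_name> pattern and the tier values come from the text; the helper name and the assignee group names are hypothetical.

```python
import csv
import io


def resource_tag_column(tag_name):
    """Build the column name for a resource custom attribute (tag)."""
    return f"resource.tag.{tag_name}"


# One input column (the Tier tag) and one output column.
columns = [resource_tag_column("Tier"), "incident.assigneeGroup.name"]

# Hypothetical routing: each tier goes to a different (made-up) team.
rows = [
    ["Tier 1", "Critical Response Team"],
    ["Tier 2", "Operations Team"],
    ["Tier 3", "General Support"],
]

buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
training_text = buffer.getvalue()

print(training_text)
```

Additional tags would each get their own column built the same way, since multiple tag names are allowed as separate columns.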
The training file for Alert Escalation and First Response policies also accepts custom attributes from both resources and alerts as inputs, so custom attributes associated with resources and alerts can be used when configuring these policies.
To use a custom attribute for Alert Escalation:
- Create a training file.
- Specify a column with tag.name. For example, tag.environment.
- Add the values of the custom attributes.
To use a custom attribute for First Response:
- Create a training file.
- Specify a column with tag.name. For example, tag.environment.
- Add the values of the custom attributes.
Considerations
For adding resource group or service group attributes, consider the following:
- To specify more than one device group or service group in the column names resource.deviceGroup.name and resource.serviceGroup.name, list each group in its own row.
- If a group has parents, use the full path to specify the child group. For example, to specify a grandChild group, enter the full path as Parent > Child > grandChild. Similarly, to specify a child group, enter the path as Parent > Child in the training file.
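The two considerations above can be sketched as follows: one helper joins group names into a full path with the " > " separator shown in the example, and another emits one row per group. The helper names and the assignee group are hypothetical; only the path format and the one-group-per-row rule come from the text.

```python
GROUP_SEPARATOR = " > "  # separator used in the Parent > Child example


def group_path(*names):
    """Join group names, outermost parent first, into a full path."""
    return GROUP_SEPARATOR.join(names)


def rows_for_groups(group_paths, assignee_group):
    """Emit one training row per device group, as each group needs its own row."""
    return [
        {
            "resource.deviceGroup.name": path,
            "incident.assigneeGroup.name": assignee_group,
        }
        for path in group_paths
    ]


paths = [
    group_path("Parent", "Child"),
    group_path("Parent", "Child", "grandChild"),
]
for row in rows_for_groups(paths, "Example Support Team"):
    print(row)
```

The same approach would apply to resource.serviceGroup.name by swapping the column name.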
Output column attributes
Learned configurations have the following column names:
Alert Escalation:
incident.assigneeGroup.name
incident.category.name
incident.subCategory.name
incident.priority
incident.cc
incident.businessImpact.name
incident.urgency.name
incident.knowledgeArticleIds
incident.notifyRoster.uniqueId
First Response:
suppressed
snoozeDuration
processIds
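Given these lists, the header of a training file can be separated into input and output columns. The sketch below assumes that any column not named in the output list for the policy is an input column; the function name is hypothetical.

```python
# Output column names as listed for each policy type.
ALERT_ESCALATION_OUTPUTS = {
    "incident.assigneeGroup.name",
    "incident.category.name",
    "incident.subCategory.name",
    "incident.priority",
    "incident.cc",
    "incident.businessImpact.name",
    "incident.urgency.name",
    "incident.knowledgeArticleIds",
    "incident.notifyRoster.uniqueId",
}
FIRST_RESPONSE_OUTPUTS = {"suppressed", "snoozeDuration", "processIds"}


def split_columns(header, output_names):
    """Return (input_columns, output_columns), preserving header order."""
    inputs = [name for name in header if name not in output_names]
    outputs = [name for name in header if name in output_names]
    return inputs, outputs


# Header of the sample first response training file.
first_response_header = [
    "clientUniqueId",
    "metric",
    "resource.generalInfo.resourceType",
    "currentState",
    "component",
    "suppressed",
    "snoozeDuration",
    "processIds",
]
inputs, outputs = split_columns(first_response_header, FIRST_RESPONSE_OUTPUTS)
print("inputs:", inputs)
print("outputs:", outputs)
```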