Introduction
Archiving Server: Archiving Server enables you to archive IM communications and meeting content for compliance reasons. If you do not have legal compliance concerns, you do not need to deploy Archiving Server. An SQL Server Back End Server is required to implement archiving.
Monitoring Server: Monitoring Server collects data about the quality of your network media, in both Enterprise Voice calls and A/V conferences. It also collects call error records (CERs), which you can use to troubleshoot failed calls. Additionally, it collects usage information in the form of call detail records (CDRs) about various Lync Server features, including IM, conferencing, and Enterprise Voice, so that you can calculate the return on investment of your deployment and plan its future growth. Monitoring Server is typically collocated with the Microsoft Lync Server 2010 Archiving Server. An SQL Server Back End Server is required to implement a Monitoring Server.
Discovery with the agent
Collector Type: Agent
Category: Application Monitors
Application Name: Microsoft Lync Front End Servers
Global Template Name: Microsoft Lync Front End Servers DotNet v4
Prerequisites: The Lync monitors require Microsoft .NET Framework 4.
Collected Metrics
Metric Name | Display Name | Description |
---|---|---|
DBStoreQueueLatency | DBStoreQueueLatency | This component monitor returns the average time, in milliseconds, that a request is held in the queue of the BackEnd Database Server. If the topology is healthy, this counter averages less than 100 ms. |
DBStoreQueueDepth | DBStoreQueueDepth | The average number of database requests waiting to be executed. A high value indicates that the back end might be busy and unable to respond to requests quickly. This might be a temporary condition. If the problem persists, ensure that the hardware and software requirements for the server are met. |
MSMQ_TotalMessagesInAllQueues | MSMQ_TotalMessagesInAllQueues | The total number of messages in all Message Queuing (MSMQ) queues on the server. |
SIP_503ResponseRate | SIP_503ResponseRate | This component monitor returns the rate of 503 responses generated by the server, per second. The 503 code corresponds to the server being unavailable. On a healthy server, you should not receive this code at a steady rate. |
SIP_504ResponseRate | SIP_504ResponseRate | This component monitor returns the rate of 504 responses generated by the server, per second. A few 504 responses to clients (for clients disconnecting abruptly) are to be expected, but this counter mainly indicates connectivity issues with other servers. |
SIP_ConnectionsActive | SIP_ConnectionsActive | This component monitor returns the number of established connections that are currently active. A connection is considered established when peer credentials are verified (e.g. via MTLS), or the peer receives a 2xx response. |
SIP_TLSConnectionsActive | SIP_TLSConnectionsActive | This component monitor returns the number of established TLS connections that are currently active. A TLS connection is considered established when the peer certificate, and possibly the host name, are verified for a trust relationship. |
memory.committedbytes | Memory CommittedBytes | The amount of committed virtual memory, in bytes. |
memory.pagespersec | Memory PagesPersec | The rate at which pages are read from or written to disk to resolve hard page faults. |
SIP_SendsOutstanding | SIP_SendsOutstanding | This component monitor returns the number of messages that are currently present in the outgoing queues. If you receive error message 504, investigate the results from this counter. Doing so will indicate which servers are having problems. |
SIP_AvgOutgoingQueueDelay | SIP_AvgOutgoingQueueDelay | This component monitor returns the average time, in seconds, that messages have been delayed in outgoing queues. |
SIP_FlowControlledConnectionsDropped | SIP_FlowControlledConnectionsDropped | This component monitor returns the total number of connections dropped because of excessive flow control. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible. |
SIP_AvgFlowControlDelay | SIP_AvgFlowControlDelay | This component monitor returns the average delay, in seconds, in message processing when the socket is flow-controlled. You will need to baseline this counter by testing and monitoring the server's health. The returned value should be as low as possible. |
SIP_IncomingRequestRate | SIP_IncomingRequestRate | This component monitor returns the rate of received requests, per second. You will need to baseline this counter by testing and monitoring the user load. |
SIP_IncomingMessageRate | SIP_IncomingMessageRate | This component monitor returns the rate of received messages, per second. You will need to baseline this counter by testing and monitoring the user load. |
SIP_EventsInProcessing | SIP_EventsInProcessing | This component monitor returns the number of SIP transactions, or dialog state change events, that are currently being processed. You will need to baseline this counter by testing and monitoring the user load. |
SIP_500ResponseRate | SIP_500ResponseRate | This component monitor returns the rate of 500 responses generated by the server, per second. This can indicate that there is a server component that is not functioning correctly. |
SIP_AvgHoldingTimeForIncomingMessage | SIP_AvgHoldingTimeForIncomingMessage | This component monitor returns the average time that the server held the incoming messages currently being processed. If this counter is more than 10 seconds (12 seconds maximum), then the server goes into throttling mode. |
SIP_AddressSpaceUsage | SIP_AddressSpaceUsage | This component monitor returns the percentage of available address space currently in use by the server process. The returned value should be as low as possible. |
SIP_PageFileUsage | SIP_PageFileUsage | This component monitor returns the percentage of available page file space currently in use by the server process. The returned value should be as low as possible. |
SIP_IncomingMessagesTimedOut | SIP_IncomingMessagesTimedOut | The number of incoming messages currently being held by the server for processing for more than the maximum tracking interval. An elevated value indicates that the server is too busy to process user requests in a timely fashion. |
IM_NumberOfActiveConferences | IM_NumberOfActiveConferences | This component monitor returns the number of active instant messaging conferences. You will need to baseline this counter by testing and monitoring the user load. |
IM_NumberOfConnectedIMUsers | IM_NumberOfConnectedIMUsers | This component monitor returns the number of connected instant messaging users in all conferences. You will need to baseline this counter by testing and monitoring the user load. |
IM_WithThrottledSIPConnections | IM_WithThrottledSIPConnections | This component monitor returns the number of throttled SIP connections. A value greater than ten could indicate that the peer is not processing requests in a timely fashion. This can happen if the peer machine is overloaded. |
IMMCU_NumberOfConferences | IMMCU_NumberOfConferences | Number of instant messaging conferences. Ideally, conferences should be evenly distributed across all Front End Servers. |
IM_MCUHealthState | IM_MCUHealthState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. |
IM_MCUDrainingState | IM_MCUDrainingState | This component monitor returns the current draining status of the MCU. Possible values: 0 = Not requesting to drain. 1 = Requesting to drain. 2 = Draining. When a server is drained, it stops taking new connections and calls. |
User_services_DBStoreSprocLatency | User_services_DBStoreSprocLatency | This component monitor returns the average time, in milliseconds, it takes to execute a stored procedure call. A healthy state is considered to be less than 100 ms. Server health decreases as latency increases to 12 seconds, when server throttling begins. |
User_services_NumberOfFailedHTTPConnections | User_services_NumberOfFailedHTTPConnections | This component monitor returns the rate of connection attempt failures, per second. You will need to baseline this counter by testing and monitoring the server's health. |
Memory_PagesPerSec | Memory_PagesPerSec | If a page has to be retrieved from disk instead of from memory, performance suffers. The rate at which pages in memory are swapped with those on disk should stay below 500 pages per second. |
AVMCU_NumberofAudiovideoconferences | AVMCU_NumberofAudiovideoconferences | Number of audio/video conferences. Ideally, conferences should be evenly distributed across all Front End Servers. |
ASMCU_NumberOfApplicationSharingConferences | ASMCU_NumberOfApplicationSharingConferences | Number of application sharing conferences. Ideally, conferences should be evenly distributed across all Front End Servers. |
DATAMCU_HealthState | DATAMCU_HealthState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the data sharing MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavailable. |
DATAMCU_DrainingState | DATAMCU_DrainingState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the data sharing MCU. 0 = Not requesting to drain. 1 = Requesting to drain. 2 = Draining. |
DataMCU_EstimatedConferenceWorkitemsLoad | DataMCU_EstimatedConferenceWorkitemsLoad | The estimated time to process all pending items on the session queues measured in milliseconds. |
DataMCU_StateOfSessionQueues | DataMCU_StateOfSessionQueues | The state of the session queues. It indicates whether the Data MCU is overloaded. |
DATAMCU_NumberOfDataSharingConferences | DATAMCU_NumberOfDataSharingConferences | Number of data sharing conferences. Ideally, conferences should be evenly distributed across all Front End Servers. |
ApplicationSharingMCU_HealthState | ApplicationSharingMCU_HealthState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the application sharing MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavailable. |
ApplicationSharingMCU_DrainingState | ApplicationSharingMCU_DrainingState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the application sharing MCU. 0 = Not requesting to drain. 1 = Requesting to drain. 2 = Draining. |
AudioVideoMCU_HealthState | AudioVideoMCU_HealthState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current health of the audio/video MCU. 0 = Normal. 1 = Loaded. 2 = Full. 3 = Unavailable. |
AudioVideoMCU_DrainingState | AudioVideoMCU_DrainingState | The Multipoint Conferencing Units (MCU) health counters give an indication of the overall system health; these should be 0 at all times, indicating normal operation. The current draining status of the audio/video MCU. 0 = Not requesting to drain. 1 = Requesting to drain. 2 = Draining. |
AddressBook_SearchResponseTime | AddressBook_SearchResponseTime | The average processing time for an address book search request, in milliseconds. A high value could be due to back-end database performance issues. Verify the CPU load on the back-end database machine. Upgrade hardware if needed. |
AddressBook_SearchFailureRate | AddressBook_SearchFailureRate | The per-second rate of failed address book search requests. Failures could be due to back-end database performance issues. Verify that the back-end database is running and accessible. |
LS_AV_Auth_Edge_BadRequestsReceivedPerSecond | LS_AV_Auth_Edge_BadRequestsReceivedPerSecond | The number of bad requests received/sec. This error occurs when an unexpectedly high rate of invalid requests is received by the A/V Authentication Service. This could be the result of an attempt to misuse the A/V Authentication Service. |
PolicyDecisionPoint_ClientConnectionsAuthenticationTimeoutFailuresPerSecond | PolicyDecisionPoint_ClientConnectionsAuthenticationTimeoutFailuresPerSecond | The per-second rate of client connections timing out before receiving an authenticated message. A connection from a client timed out because it was not authenticated within the specified time. Check whether there are any certificate issues between the machines. |
PolicyDecisionPoint_ConnectionsTimedOutPerSecond | PolicyDecisionPoint_ConnectionsTimedOutPerSecond | The per-second rate of sessions that have timed out before the first packet arrived. No packets were received on the connection. This can happen when a client is trying to attack the server by creating connections and consuming bandwidth resources. |
PolicyDecisionPoint_ServerConnectionsAuthenticationTimeoutFailuresPerSecond | PolicyDecisionPoint_ServerConnectionsAuthenticationTimeoutFailuresPerSecond | The per-second rate of server connections timing out before receiving an authenticated message. A connection from a server timed out because it was not authenticated within the specified time. Check whether there are any certificate issues between the machines. |
SIP_ConnectionsRefusedDueToServerOverload | SIP_ConnectionsRefusedDueToServerOverload | The per-second rate of connections that were refused with a Service Unavailable response because the server was overloaded. If the problem persists, ensure that the hardware and software of this server meet the usage characteristics of your users. |
ExpandDistributionList_ResponseTimeInms | ExpandDistributionList_ResponseTimeInms | The average processing time, in milliseconds, for a successful request to complete. A high value can indicate Active Directory performance issues. |
ExpandDistributionList_SOAPExceptionRate | ExpandDistributionList_SOAPExceptionRate | The per-second rate of SOAP exceptions. |
AddressBookFileDownload_FailedRequestsPerSecond | AddressBookFileDownload_FailedRequestsPerSecond | The per-second rate of failed Address Book file requests. A high failure rate can be caused by authentication issues or network connectivity issues. |
LSCommunicatorWebApp_FailedDataCollaborationAuthenticationRequestsPerSecond | LSCommunicatorWebApp_FailedDataCollaborationAuthenticationRequestsPerSecond | The number of failed Data Collaboration authentication requests per second. Attempts to authenticate incoming client connections for data collaboration failed. This may indicate a network attack. |
LSCommunicatorWebApp_NumberOfDataCollaborationConnectionFailuresWithDataCollaborationServers | LSCommunicatorWebApp_NumberOfDataCollaborationConnectionFailuresWithDataCollaborationServers | The number of Data Collaboration connection failures with Data Collaboration servers. A connection can be closed by the local party, by the remote party, or because of network issues. Check the availability of the Web Conferencing Servers. |
LSCommunicatorWebApp_ThrottledClientDataCollaborationConnectionsPerSecond | LSCommunicatorWebApp_ThrottledClientDataCollaborationConnectionsPerSecond | The number of Data Collaboration client connections closed due to throttling per second. A client Data Collaboration connection was closed because the client failed to read data in a timely manner. This may indicate a network failure or an organized attack. |
CallPark_FailedCallParkRequests | CallPark_FailedCallParkRequests | The total number of park requests that failed. |
CallPark_FailedRequestsBecauseNoOrbitIsAvailable | CallPark_FailedRequestsBecauseNoOrbitIsAvailable | The total number of park requests that failed because no orbit was available. Consider adding more orbits by using the management console or the PowerShell commands that manage orbit ranges. |
CallPark_FailedTransfersToFallbackURI | CallPark_FailedTransfersToFallbackURI | The total number of failed fallback attempts. The fallback destination might not be reachable. |
AudioVideoConferencing_NumberOfOccasionsConferenceProcessingIsDelayed | AudioVideoConferencing_NumberOfOccasionsConferenceProcessingIsDelayed | Number of occasions conference processing is delayed. This issue may occur if the Audio Video Conferencing server is overloaded, or is not getting enough CPU resources to process audio in real time. |
SIP_MessagesPerSecondDroppedDueToUnknownDomain | SIP_MessagesPerSecondDroppedDueToUnknownDomain | The per-second rate of messages that could not be routed because the message domain is not configured and does not appear to belong to a federated partner. The Access Edge Server received SIP messages with an unknown domain. |
IMMCU_ThrottledSIPConnections | IMMCU_ThrottledSIPConnections | The number of throttled SIP connections. A nonzero value suggests that the peer is not processing requests in a timely fashion. This can happen if the peer machine is overloaded. |
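
Several descriptions above quote concrete limits (for example, 100 ms for DBStoreQueueLatency and 500 pages per second for Memory_PagesPerSec) and numeric codes for the MCU health and draining states. The following Python sketch shows one way such values might be interpreted once they have been collected. It is only an illustration: the names `UPPER_LIMITS`, `find_breaches`, and `describe_state` are hypothetical and are not part of the template or of any agent API, and the comparison is simplified to flag any value strictly above the quoted limit.

```python
from typing import Dict, List

# Limits quoted in the metric descriptions above. The dictionary layout is an
# assumption made for this sketch, not something the template defines.
UPPER_LIMITS: Dict[str, float] = {
    "DBStoreQueueLatency": 100.0,                  # milliseconds
    "User_services_DBStoreSprocLatency": 100.0,    # milliseconds
    "SIP_AvgHoldingTimeForIncomingMessage": 10.0,  # seconds, throttling above this
    "Memory_PagesPerSec": 500.0,                   # pages per second
    "IM_WithThrottledSIPConnections": 10.0,        # connections
}

# State codes as listed for the MCU health and draining counters.
HEALTH_STATES: Dict[int, str] = {
    0: "Normal", 1: "Loaded", 2: "Full", 3: "Unavailable",
}
DRAINING_STATES: Dict[int, str] = {
    0: "Not requesting to drain", 1: "Requesting to drain", 2: "Draining",
}


def find_breaches(sample: Dict[str, float]) -> List[str]:
    """Return, in sorted order, the metrics whose value is above its limit.

    Metrics without a quoted limit are ignored, because the descriptions
    say they must be baselined per deployment.
    """
    return sorted(
        name
        for name, value in sample.items()
        if name in UPPER_LIMITS and value > UPPER_LIMITS[name]
    )


def describe_state(code: int, states: Dict[int, str]) -> str:
    """Translate a numeric state code into its label, or mark it unknown."""
    return states.get(code, f"Unknown ({code})")


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise the functions.
    sample = {
        "DBStoreQueueLatency": 42.0,
        "Memory_PagesPerSec": 730.0,
        "SIP_AvgHoldingTimeForIncomingMessage": 11.5,
    }
    print(find_breaches(sample))
    print(describe_state(1, DRAINING_STATES))
```

Counters that the table says need a baseline (such as SIP_IncomingRequestRate) are deliberately left out of `UPPER_LIMITS`, since no fixed limit is given for them.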