The table below documents the SNMP interface. All headings map directly to the bind variables in the MIB file. Use the Event Source Type, Metric, Event Severity, and Event Message fields to inform SNMP development efforts. Some alerts (e.g., Num Good Proxy Errors, Number of GEMS User Events) are generated based on a number or percentage of events detected in the logs of the source system (e.g., Good Proxy logs). These are designed to capture repeated errors, high percentages of errors, percentages of users with errors, and other aggregations that may indicate a problem. When these aggregate alerts are generated, the resulting email and SNMP trap also contain a concatenated string of all of the events that triggered the alerts.
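The aggregation described above can be illustrated with a minimal Python sketch. It is not the Good MSM implementation: the `LogEvent` record, the `aggregate_alert` function, and the threshold values are all hypothetical names chosen for illustration. The sketch computes the percentage of error events in a batch, maps it to a severity, and builds the concatenated string of triggering events that would accompany the email and SNMP trap.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    # Hypothetical record for one parsed log line; not a Good MSM type.
    user: str
    message: str
    is_error: bool


def aggregate_alert(events, warning_pct, critical_pct):
    """Return (severity, detail); severity is None when below both thresholds."""
    if not events:
        return None, ""
    errors = [e for e in events if e.is_error]
    error_pct = 100.0 * len(errors) / len(events)
    if error_pct >= critical_pct:
        severity = "Critical"
    elif error_pct >= warning_pct:
        severity = "Warning"
    else:
        return None, ""
    # Concatenate every triggering event into one string for the notification.
    detail = "; ".join(f"{e.user}: {e.message}" for e in errors)
    return severity, detail


sample = [
    LogEvent("alice", "connection refused", True),
    LogEvent("bob", "ok", False),
    LogEvent("carol", "timeout", True),
    LogEvent("dave", "ok", False),
]
severity, detail = aggregate_alert(sample, warning_pct=25.0, critical_pct=75.0)
print(severity, "-", detail)
```

In this sketch the thresholds are passed in by the caller, which mirrors the way the alert messages below report a configurable warning or critical threshold.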
Sample Notification
Good Application Group
Total App Requests
Critical The <ApplicationName> application has not received any application requests for the last <numZeroRqstMinutes> minutes. This may indicate a decrease in app activity or an availability issue. This server normally receives <avgAppRequests> requests each 3 minute sample at this hour.
Possible Action: Ensure that network connectivity between the Good Proxy servers and external networks is functioning properly.
Warning The <ApplicationName> application has received fewer application requests than normal. This may indicate a decrease in app activity or an availability issue. This server normally receives <avgAppRequests> requests each 3 minute sample at this hour. The last 3 minute sample showed only <TotalAppRequests> requests.
Possible Action: Ensure that network connectivity between the Good Proxy and external networks is functioning properly.
Good Proxy Cluster Group
Total App Requests
Critical The Good Proxy cluster <GPSClusterName> has not received any application requests for the last <numZeroRqstMinutes> minutes. This may indicate a decrease in app activity or an availability issue. This cluster normally receives <avgAppRequests> requests each 3 minute sample at this hour.
Possible Action: Ensure that network connectivity between the Good Proxy servers and external networks is functioning properly. For Good Proxy servers configured for Direct Connect, please see the GP Direct Connect guide.
Warning The Good Proxy cluster <GPSClusterName> has received fewer application requests than normal for this hour in the last <numAnblRqstMinutes> minutes. This may indicate a decrease in app activity or an availability issue. This cluster normally receives <abnlAppRequests> requests each 3 minute sample at this hour. The last 3 minute sample showed only <TotalAppRequests> requests.
Warning 1 of <NumOfTotalGPS> servers in the proxy cluster <GPSClusterName> has been unavailable for <nMinutes> minutes. The affected server is <GPSList> . Please check the details of the affected server for possible actions.
Unavailable The only server in the Good Proxy cluster <GPSClusterName> has been Unavailable for <nMinutes> minutes. The affected server is <GPSList> . Please check the details of the affected server for possible actions.
Warning The Good Proxy cluster <GPSClusterName> does not contain any member servers.
Good Proxy Server Group
Proxy Server Host Availability
Critical Good Proxy host <GoodProxyServerHost> has not been reached for <numDownMinutes> minutes. This will increase the load on other Good Proxy servers and may impact response times and service quality. Possible Action: Ensure that the Good Proxy host is up and connected to the network. If the host is available, ensure that there are no network issues or firewall rules preventing Good MSM from reaching the host via WMI.
Proxy Server Window Service Availability
Critical Good Proxy Server is not running on <GoodProxyServerHost> . This will increase the load on other Good Proxy servers and may impact response times and service quality. Possible Action: Ensure that the Good Proxy Server service is running on the host. If the service is not running or does not stay running, check the Application log in Windows Event Viewer for errors related to the Good Proxy Server service. Ensure that anti-virus services are not preventing the Good Proxy Server service from starting.
Escalated_Proxy_Log_Errors
Critical Combined Log Events Failed App Server Connection In Percent
or exceeds the critical threshold of <criticalThreshold> %.
The application servers with the highest number of connection errors occurring through this Proxy Server are (Application Server, Error Count for Past 15 minutes, Last Error) : <App Server List> .
Possible Action: Have the application owner for the affected application servers check that they are running correctly. If the application servers are running correctly, check that network connectivity between the Good Proxy servers and application servers is working correctly.
Warning The Good Proxy <GoodProxyServerHost> has experienced a high percentage of application server connection errors for the last <numMinutes> minutes. These errors may result in app service errors for users connecting to these application servers. <FailedAppServerConnectionsInPercent> % of <TotalAppServerConnections> connection attempts have failed which meets or exceeds the warning threshold of <warningThreshold> %.
The application servers with the highest number of connection errors occurring through this Proxy Server are (Application Server, Error Count for Past 15 minutes, Last Error) : <App Server List> .
Possible Action: Have the application owner for the affected application servers check that they are running correctly. If the application servers are running correctly, check that network connectivity between the Good Proxy servers and application servers is working correctly.
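The threshold comparison in the connection-error messages above can be sketched as follows. This is an illustrative example only: the function name and its parameters are hypothetical, and the use of "greater than or equal" follows the "meets or exceeds" wording of the message text rather than any confirmed product logic.

```python
def classify_connection_errors(failed, total, warning_threshold, critical_threshold):
    """Return (severity, percent_failed); severity is None below both thresholds.

    Thresholds are percentages, matching <warningThreshold> and
    <criticalThreshold> in the message text.
    """
    if total == 0:
        # No connection attempts in the sample, so there is nothing to rate.
        return None, 0.0
    percent_failed = 100.0 * failed / total
    if percent_failed >= critical_threshold:
        return "Critical", percent_failed
    if percent_failed >= warning_threshold:
        return "Warning", percent_failed
    return None, percent_failed


# 30 failures out of 200 attempts is 15 %, between the two example thresholds.
print(classify_connection_errors(30, 200, warning_threshold=10.0, critical_threshold=25.0))
```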
TotalAppRequests
Critical The Good Proxy <GoodProxyServerHost> has received fewer application requests than normal for the last <numAnblRqstMinutes> minutes. This server normally receives an average of <abnlAppRequests> requests every 3 minute sample at this hour of the day. The last 3 minute sample showed only <TotalAppRequests> requests.
Possible Action: Ensure that network connectivity between the Good Proxy and external networks is functioning properly. For Good Proxy servers configured for Direct Connect, please refer to the GD Direct Connect documentation for detailed configuration guidance.
Warning The Good Proxy <GoodProxyServerHost> has received no application requests in the last <numZeroRqstMinutes> minutes. This server normally receives an average of <avgAppRequests> requests every 3 minute sample at this hour of the day. Possible Action: Ensure that network connectivity between the Good Proxy and external networks is functioning properly. For Good Proxy servers configured for Direct Connect, please refer to the GD Direct Connect documentation for detailed configuration guidance.
Most Critical Event
Combined User Events
Good for Domino Summary Group
High Availability/Monitor Primary GMM
Critical The GMM Server <GroupName> has failed over from the primary machine <GoodServerHost> to the standby machine <GoodStandbyServerHost>.
Primary Services Availability/NumServiceAvailabilityErrors
Most Critical Event
The Service Status for <GroupName> at <GoodServerHost>:
Standby Services Availability/NumServiceAvailabilityErrors
Most Critical Event
The Service Status for <GroupName> at <GoodStandbyServerHost>:
Escalated_GMM_Log_Errors
Most Critical Event
Combined Server Events
GDToHHFlows
Unavailable The number of mail flows from GOOD Server <GroupName> at host <GoodServerHost> to handheld has been 0 consecutively for at least <warningThreshold> samples, and also it has been below normal baseline range consecutively for at least <abnl_warningThrshld> samples, while the average amount of flows at this hour is <avgFlowPerHour>.
Critical The GMM Server <GroupName> has consistently been running at high RAM value ( <GMMS_RAM_ToDangerLevel> % of <GMMS_RAM_DangerLevel> MB) for the last <numMinutes> minutes. When GMMS is running at <GMMS_RAM_DangerLevel> MB, it is considered critical.
Standby GMMS Memory/GMMSProcTotalMem
Critical The GMM Server <GroupName> has consistently been running at high RAM value ( <GMMS_RAM_ToDangerLevel> % of <GMMS_RAM_DangerLevel> MB) for the last <numMinutes> minutes. When GMMS is running at <GMMS_RAM_DangerLevel> MB, it is considered critical.
Primary Host Availability/GoodServerHostAvailable
Unavailable The Good Server host <GoodServerHost> has not been reachable for <GoodServerHostDownCount> samples. The host may be unavailable, or a network issue may be preventing Good MSM from reaching the host.
Standby Host Availability/Good Server Host Available
Unavailable The Good Server host <GoodStandbyServerHost> has not been reachable for <StandbyHostDownCount> samples. The host may be unavailable, or a network issue may be preventing Good MSM from reaching the host.
Good for Exchange Summary Group
GDToHHFlows
Unavailable The GMMS <GroupName> at <GoodServerHost> is running and accessible, but Good MSM has detected no device sync activity for the last <thresholdMinutes> minutes. Normally <thresholdFlows> sync activities would be expected in this amount of time. For this hour of the day, Good MSM has learned that an average of <avgFlows> sync activities usually occur every 3 minutes. For an individual 3 minute sample period, Good MSM has observed that flows normally range from <abnlFlows> per sample up to <abnhFlows> per sample.
Critical The GMMS <GroupName> at <GoodServerHost> is running and accessible, but Good MSM has detected no device sync activity for the last <thresholdMinutes> minutes. Normally
GEMS User Errors
Most critical event
Combined User Events
GEMS Group
GEMSWindowServicesAvailable
Unavailable GEMS host <GEMSHost> has not been reached for <num_down_minutes> minutes. Possible Action: Ensure that the GEM Server host is up and connected to the network. If the host is available, ensure that there are no network issues or firewall rules preventing Good MSM from reaching the host via WMI.
Critical The Windows service <Caption> is not running on <GEMSHost> . Possible Action: Ensure that the service is running on the host. If the service is not running or does not stay running, check the Application log in Windows Event Viewer for errors related to this service. Ensure that anti-virus services are not preventing the service from starting.
CAS Events
CAS Summary Group
CAS Host Availability
Unavailable The server <CASHost> has not been reachable for <CASHostDownCount> samples. The host may be unavailable, or a network issue may be preventing MSM from reaching the host.
Escalated_CAS_
Warning The number of users on the ActiveSync Mailbox Exchange Server GroupName with service errors has exceeded the warning threshold of warningThreshold. Here is the list of errors and number of users with that error: <user list>
CASUserGroup
ActiveSync_User_ Errors
Most Critical Event
Combined User Events
BES Events
BES User Domino Support
Number of User Events Most Critical Event
Combined User Events
HHFlashFreeMB
Warning The user's BlackBerry smartphone is low on available memory (HHFlashFreeMemory / 1048576.0 of memory available). As a smartphone runs low on memory (approximately 1.4 MB of free memory), it may begin deleting out-of-date calendar entries, messages, and call logs. Low memory may also slow the performance and responsiveness of the BlackBerry smartphone. Memory is consumed by media files (e.g., pictures, music, video), applications, and your data (messages, calendar entries, and contacts). To free additional memory on the user's smartphone, instruct the user to transfer media files to a media card or delete any unneeded media files, remove any rarely used applications, and/or purge older messages and calendar entries. For step-by-step instructions, look up this user in the Good MSM Help Desk;
BES User Exchange Support
Number of User Events Most Critical Event
Combined User Events
HHFlashFreeMB
Warning The user's BlackBerry smartphone is low on available memory (HHFlashFreeMemory / 1048576.0 of memory available). As a smartphone runs low on memory (approximately 1.4 MB of free memory), it may begin deleting out-of-date calendar entries, messages, and call logs. Low memory may also slow the performance and responsiveness of the BlackBerry smartphone. Memory is consumed by media files (e.g., pictures, music, video), applications, and your data (messages, calendar entries, and contacts). To free additional memory on the user's smartphone, instruct the user to transfer media files to a media card or delete any unneeded media files, remove any rarely used applications, and/or purge older messages and calendar entries. For step-by-step instructions, look up this user in the Good MSM Help Desk;
UserName - UserName
EmailAddress - EmailAddress
WarningThreshold - UserAdjustables[tweakRow LOW_HH_FLASH_MEMORY_WARNING] MB
BES - BESName
MailServer - MailServerName
Carrier - CarrierName
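The expression HHFlashFreeMemory / 1048576.0 in the message above converts a raw value to megabytes. A minimal sketch of that arithmetic follows. It assumes that HHFlashFreeMemory is reported in bytes (implied by the divisor, which is 1024 * 1024) and that the warning fires when free memory is strictly below the configured threshold; both the function names and the strict comparison are assumptions, not confirmed product behavior.

```python
BYTES_PER_MB = 1048576.0  # 1024 * 1024, the divisor that appears in the message text


def flash_free_mb(hh_flash_free_memory):
    """Convert a free-memory value (assumed to be in bytes) to megabytes."""
    return hh_flash_free_memory / BYTES_PER_MB


def is_low_on_memory(hh_flash_free_memory, warning_threshold_mb):
    """Assumed rule: warn when free memory is strictly below the threshold in MB."""
    return flash_free_mb(hh_flash_free_memory) < warning_threshold_mb


# Example: 1,048,576 bytes is exactly 1 MB, which is below a 1.4 MB threshold.
print(flash_free_mb(1048576), is_low_on_memory(1048576, 1.4))
```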
BESDominoSummaryGroup
BESHostAvailable
Unavailable The BES host BESHost has not been reachable for 0 samples. The BES host may be unavailable or a network issue may be preventing Good MSM from reaching the BES host.
Critical The BES host BESHost has not been reachable for BESHostDownCnt> 1 samples. The BES host may be unavailable or a network issue may be preventing Good MSM from reaching the BES host.
Unavailable The Standby BES host StandbyBESHost has not been reachable for 0 samples. The Standby BES host may be unavailable or a network issue may be preventing Good MSM from reaching the Standby BES host.
Critical The Standby BES host StandbyBESHost has not been reachable for StandbyBESHostDownCnt <> 1 samples. The Standby BES host may be unavailable or a network issue may be preventing Good MSM from reaching the Standby BES host.
Warning The number of mail flows from BES <GroupName> at <ActiveBESHost> to handheld has been 0 consecutively for at least <warningThreshold> samples, and also it has been below normal baseline range consecutively for at least <abnl_warningThrshld> samples, while the average amount of flows at this hour is <avgFlowPerHour> .
Escalated_BES_Log_Errors Most Critical Event
Combined server events
Escalated_SRP_Log_Errors
Unavailable strDownErrs <tmpStr>
Critical strCriticalErrs <tmpStr>
Warning strWarnErrs <tmpStr>
Licenses Remaining
Critical You currently have <LicensesRemain> licenses remaining which is less than the Critical threshold of <criticalThrsld> . You currently are using <license_used> CALs from a total pool of <license_total> .
Warning You currently have <LicensesRemain> licenses remaining which is less than the Warning threshold of <warningThrsld> . You currently are using <license_used> CALs from a total pool of <license_total> .
NumBESMachines/High Availability
Warning For an HA BES deployment there should be two BES hosts. For non-HA there should be one primary BES host. However, we detected NumRows(HABESTable) BES hosts. The HA rule will not fire.
NumLogLinesPerUser
Abnormally large amounts of log lines may indicate that a problem is occurring on the BES that is causing excessive activity or error rates. Utilize the Good MSM consoles to investigate further.
Critical For the last num min minutes BES <GroupName> at <ActiveBESHost> has generated an abnormally high amount of log lines - more than <LoggingRateCritical> times the expected amount. In the last sample <NumOfLogLines> Good MSM monitored log lines have been generated at an average rate of <LogLinesPerUser> log lines per user. Abnormally large amounts of log lines may indicate that a problem is occurring on the BES that is causing excessive activity or error rates. Utilize the Good MSM consoles to investigate further.
Warning BES <GroupName> at <ActiveBESHost> has not generated any Good MSM monitored log lines for the last <thresholdMins> minutes. Normally <thresholdLogLines> monitored log lines would be expected in this amount of time. For this hour of the day, Good MSM has learned that an average of <totalLogLinesPerSample> monitored log lines usually occur every 3 minutes, and it can normally range from <abnlLogLinesPerSmpl> per sample up to <abnhLogLinesPerSmpl> per sample. Good MSM generally monitors for message flows, status indicators and errors but has not detected any of these expected log lines. This may indicate that the BES is not providing any service to its end users. Utilize the Good MSM consoles to investigate further.
NUmServiceAvailabilityErrors Most Critical Event
The Service Status for BES <GroupName> at <ActiveBESHost> :
NUmServiceAvailabilityErrors-Standby Services Availability
Most Critical Event
The Service Status for BES <GroupName> at <ActiveBESHost> :
NumUsersWithHungThread
Critical criticalThreshold or more users on BES ‘GroupName’ at ‘ActiveBESHost’ have hung threads for longer than AdjustableTable[0, HungThreadDurationThrshld] minutes. [These users are: UserList]
Warning The number of users on BES ‘GroupName’ whose Hung Thread Duration is greater than groupThreshold minutes is above the warning threshold of AdjustableTable[0, HungThreadCountWarning] users. [These users are: UserList]
PercentUsersWithMsgPendingCount
Critical <CurrUserCount> * <PercentUsersWithMsgPendingCount> / 100.0 of <CurrUserCount> total users on BES <Name> at <ActiveBESHost> with Message Pending Counts higher than <groupThreshold> is above the baselined threshold of <pvalue> percent
Warning <NumUsersWithHighPendingCount> of <CurrUserCount> total users on BES <Name> at <ActiveBESHost> with Message Pending Counts higher than AdjustableTable[0, <MsgPendingCntThrshld>] is above the baselined threshold of <pvalue> percent.
Critical <NumUsersWithHighPendingCount> of <CurrUserCount> total users on BES <Name> at <ActiveBESHost> with Message Pending Counts higher than <groupThreshold> is above the baselined threshold of 100.0 * <numUserPendMsgCntCritical> / <CurrUserCount> percent.
Warning <NumUsersWithHighPendingCount> of <CurrUserCount> total users on BES <GroupName> at <ActiveBESHost> with Message Pending Counts higher than <groupThreshold> is above the baselined threshold of 100.0 * <numUserPendMsgCntWarning> / <CurrUserCount> percent.
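The two arithmetic expressions embedded in the messages above convert between a user count and a percentage of users. The following sketch works through both. The function names are hypothetical, and the zero-user guard is an added assumption, since the source does not say how an empty BES is handled.

```python
def users_from_percent(curr_user_count, percent_users):
    """<CurrUserCount> * <PercentUsersWithMsgPendingCount> / 100.0"""
    return curr_user_count * percent_users / 100.0


def threshold_as_percent(num_user_threshold, curr_user_count):
    """100.0 * <numUserPendMsgCnt...> / <CurrUserCount>"""
    if curr_user_count == 0:
        # Assumption: with no users there is no meaningful percentage.
        return 0.0
    return 100.0 * num_user_threshold / curr_user_count


# Example: 12.5 % of 200 users is 25 users; a 30-user threshold is 15 % of 200.
print(users_from_percent(200, 12.5), threshold_as_percent(30, 200))
```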
TotalMessagesPending
Critical The total message pending for <BESHost> is regularly above the critical threshold of <TotalMsgsPendingCritical>. This could be an indicator of a wireless carrier failure (affecting many users at once), a RIM service failure, a network failure causing an SRP connect failure between the BES Server and RIM, a problem with the BES SQL Server database, or hung worker thread(s) causing delays in message delivery and eventually a BES Messaging Agent restart on the BES Server. Check (1) whether the RIM Service is up (e.g., ping srp.na.blackberry.net), (2) whether a large number of users on the same wireless carrier are down, (3) whether the SRP connection between the BES server and RIM and (4) the SQL connection are up, by going to the BlackBerry Server Configuration console, BlackBerry Router tab, and clicking the Test Network Connection button, and then the Database Connectivity tab and clicking the Test SQL Server Connection button, and (5) look in the BES server Messaging Agent logfile for log entries with the phrase No Response for a specific worker thread and a specific user name.
Warning The total message pending for <BESHost> is regularly above the critical threshold of <TotalMsgsPendingCritical>. This could be an indicator of a wireless carrier failure (affecting many