ProactiveWatch 1.5 Technical White Paper


I. Where Does ProactiveWatch Fit In?

VARs that want to become, or already have become, Managed Services Providers (MSPs) have a wide variety of platforms to choose from. Most Managed Services platforms are focused on the “pure play” MSP that has complete responsibility for the operation of customer systems and complete authority to make changes to those systems to keep them running. These MSP platforms are time-consuming to deploy and expensive to maintain. Worse still, they require the VAR to reinvent how it sells and delivers these services in order to recoup the hefty investment in the MSP platform.

ProactiveWatch is also a Managed Services platform that the VAR can offer its customers, but this is where the similarity with other Managed Services products ends. ProactiveWatch is unique in the following respects:

1. ProactiveWatch is easy to sell because it has a high customer value as a protective, preventive service and it can be sold for a reasonable price. The suggested retail price of $30 a month per monitored device can be layered on top of existing block time or time & materials service contracts already in place with your customers. ProactiveWatch is the only Managed Service offering that does not require the VAR to change how it currently sells and delivers support services.

2. The typical Managed Service is targeted at customers for whom the VAR charges a fixed price of $200 or more per managed device per month. While this option exists with ProactiveWatch as well, in reality often only a subset of the VAR’s customer base will buy into these services.

ProactiveWatch provides an attractive alternative that can be sold to nearly 100% of the VAR’s customer base.

3. All other Managed Services platforms assume the VAR must have full responsibility for the environment, and full authority to make changes to it without the customer’s knowledge. ProactiveWatch is a valuable platform on which to base a “total control” Managed Service. It is also the only managed services platform that allows the VAR and the customer to share IT responsibilities collaboratively. This is because ProactiveWatch offers a Customer Console that can be sold to customers who have their own IT staff but who do not want to dedicate the staff required to configure and manage a traditional on-site monitoring product.

4. ProactiveWatch is the easiest Managed Service for the VAR to purchase and implement.

ProactiveWatch is sold to the VAR as Software as a Service (SaaS) on a simple month-to-month subscription. ProactiveWatch does not require the VAR to purchase any licensed software, hardware infrastructure, or Windows or SQL Server licenses to support the Managed Services platform. It also does not require the VAR to make any kind of up-front financial commitment. Installation of the ProactiveWatch Explorer (the Console) at the VAR site takes 2 minutes. Implementation at a customer site takes 30 seconds or less per monitored device.


II. Architecture Advantages of ProactiveWatch

The technical architecture of the ProactiveWatch system is shown in the image below. The architecture of the system has several key components:

1. ProactiveWatch hosts the back-end application servers and database servers. VARs and Service Providers do not need to buy any licensed software from ProactiveWatch. They also do not need to procure or maintain any Windows or database server licenses or hardware to support the ProactiveWatch system.

2. For the purposes of monitoring servers and workstations, ProactiveWatch is an agent-based system. ProactiveWatch agents are very small, consuming less than 10 MB of virtual memory on a server. They are also very efficient, consuming 1% CPU (or less) each time they poll the host device. Because they are so efficient, the agents can collect data on every process running on the server every 10 seconds.

3. Agents are available for Windows 2000 Server and above, Windows XP Workstation and above, Red Hat Linux, and SUSE Linux.

4. For the purpose of supporting network devices such as switches, routers, and firewalls, any server agent can collect data from any network device that it can reach via SNMP. ProactiveWatch supports availability monitoring of any network device that responds to an ICMP ping. For performance and bandwidth monitoring, ProactiveWatch 1.5 uses SNMP MIB 2.0.

5. Agents open an outbound connection to the Gateway, and the Gateway opens port 443 (HTTPS) outbound to the PW back end. No inbound firewall ports need to be opened at the customer site for ProactiveWatch to function, which substantially reduces or eliminates security concerns.

6. The ProactiveWatch Explorer (Console) is a rich-client .NET 2.0 application that provides a robust user experience. The VAR installs it on a server or workstation at its site. It opens port 29443 outbound to the ProactiveWatch back end.

Figure: ProactiveWatch architecture. Agents (A) on the customer’s Windows, Citrix, and Linux servers and web servers report to a ProactiveWatch Gateway inside the firewall/router. The LAN also includes switches and internal users. A remote site has its own Gateway. Each Gateway connects over the Internet to the ProactiveWatch Back-end Server Farm via HTTPS (443), port open outbound only. The VAR Explorer / VAR Admin Console and the optional Customer Explorer connect to the back end on port 29443, outbound only.
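Components 5 and 6 share one idea: every connection that crosses a site boundary is opened from the inside. The sketch below models the connections named above as data and derives which ports each site's perimeter firewall must allow. It is illustrative only. The site labels are hypothetical. The Agent-to-Gateway port is left unspecified because the paper does not give it. Port 29443 for the Customer Explorer is an assumption read from the diagram labels.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Flow:
    """One TCP connection: the side that opens it, the side that accepts it, and the port."""
    source: str
    source_site: str
    target: str
    target_site: str
    port: Optional[int]  # None where the white paper does not state the port


# Connections from the architecture description. The site names are hypothetical, and the
# Customer Explorer port is an assumption based on the diagram, not a documented value.
FLOWS = [
    Flow("Agent", "customer", "Gateway", "customer", None),
    Flow("Gateway", "customer", "Back End", "pw_datacenter", 443),
    Flow("VAR Explorer", "var_office", "Back End", "pw_datacenter", 29443),
    Flow("Customer Explorer", "customer", "Back End", "pw_datacenter", 29443),
]


def inbound_ports(site: str) -> Set[Optional[int]]:
    """Ports a perimeter firewall at `site` must accept: flows that start elsewhere and end here."""
    return {f.port for f in FLOWS if f.target_site == site and f.source_site != site}


def outbound_ports(site: str) -> Set[Optional[int]]:
    """Ports a perimeter firewall at `site` must let out: flows that start here and end elsewhere."""
    return {f.port for f in FLOWS if f.source_site == site and f.target_site != site}


for site in ("customer", "var_office", "pw_datacenter"):
    print(f"{site:14} inbound={sorted(inbound_ports(site))} outbound={sorted(outbound_ports(site))}")
```

Only the hosted back end needs inbound rules. The customer and VAR sites need outbound rules only.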


III. Feature Advantages of ProactiveWatch

1. Unprecedented visibility is achieved through short polling intervals. ProactiveWatch uses lightweight, powerful agents to collect data from servers and workstations every 10 seconds. Agentless systems cannot collect data nearly this frequently, because doing so would put an unacceptable load on the network and on the WMI software on the servers. Agentless approaches therefore miss many of the transient issues that cause performance problems for users. Even other agent-based monitoring solutions do not come close to the near-real-time visibility of ProactiveWatch, since they may poll only once every 15 minutes or even less often.

2. Rich diagnostics and snapshots quickly pinpoint problems. ProactiveWatch agents collect CPU, memory, disk, handle, and thread usage data from every instance of every process on the server every 10 seconds. In addition, PW agents record every change to the installed software on the server, every application crash, and all Event Log entries. Finally, when an issue occurs, PW agents take a “snapshot” of the state of the system at that moment. They forward this detailed system state, along with the problem, to the back end for an alert. This saves you countless hours of frustrating investigation and helps to quickly pinpoint the root cause of the problem. Agentless systems cannot collect this level of detail, and no other agent-based system provides it either, which limits their usefulness in supporting servers that run business-critical applications.

3. ProactiveWatch all but eliminates false alarms. ProactiveWatch is the only Managed Services-oriented monitoring solution with Alarm Management features sophisticated enough to virtually eliminate false alarms. Other Managed Services platforms have been known to generate as many as 2,000 mostly useless alarms a week, or to require human backup to separate the good alarms from the bad ones. This is simply not necessary with ProactiveWatch. You don’t waste precious time (not to mention hours of sleep), and your team stays focused on real problems.

4. ProactiveWatch quickly shows you how a server or other device has changed. Robust integrity monitoring isolates changes, intentional and unintended, that are made to the monitored devices. With ProactiveWatch you can easily compare system and application settings and metrics across any number of monitored devices, or against a baseline taken at a specific date and time.
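Items 1 and 2 rest on simple arithmetic. At a 10-second interval an agent takes 360 samples an hour, while a 15-minute poller takes 4. The sketch below uses made-up data to show a one-minute CPU spike that 10-second polling catches and 15-minute polling misses entirely. It also keeps a rolling one-hour buffer of samples and captures it as a “snapshot” when the alert fires.

This is a simplification, not ProactiveWatch’s agent logic. It alerts on a single sample rather than after the default time period. Its snapshot holds only the metric history, not the full process and system state. All function names are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

SAMPLE_INTERVAL_S = 10                        # ProactiveWatch agent polling interval
SAMPLES_PER_HOUR = 3600 // SAMPLE_INTERVAL_S  # 360 samples = one hour of history
CPU_THRESHOLD_PCT = 95.0                      # default total-CPU threshold


def synthetic_cpu_trace(duration_s: int, spike_start_s: int, spike_len_s: int,
                        baseline: float = 20.0, peak: float = 98.0) -> List[float]:
    """Hypothetical per-second CPU % readings: a flat baseline with one transient spike."""
    return [peak if spike_start_s <= t < spike_start_s + spike_len_s else baseline
            for t in range(duration_s)]


def poll(trace: List[float], interval_s: int) -> List[Tuple[int, float]]:
    """The (time, value) pairs a poller running every `interval_s` seconds would observe."""
    return [(t, trace[t]) for t in range(0, len(trace), interval_s)]


def first_alert(samples: List[Tuple[int, float]], threshold: float,
                history: int = SAMPLES_PER_HOUR) -> Tuple[Optional[int], List[Tuple[int, float]]]:
    """Return the time of the first sample above `threshold` plus a snapshot of the
    samples buffered up to that moment (at most `history` of them)."""
    buffer: deque = deque(maxlen=history)  # oldest samples fall off automatically
    for t, value in samples:
        buffer.append((t, value))
        if value > threshold:
            return t, list(buffer)
    return None, []


trace = synthetic_cpu_trace(duration_s=3600, spike_start_s=400, spike_len_s=60)
fine_t, snapshot = first_alert(poll(trace, SAMPLE_INTERVAL_S), CPU_THRESHOLD_PCT)
coarse_t, _ = first_alert(poll(trace, 15 * 60), CPU_THRESHOLD_PCT)
print(f"10 s polling: alert at t={fine_t}s with {len(snapshot)} buffered samples")
print(f"15 min polling: alert at {coarse_t}")
```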

IV. Installing ProactiveWatch 1.5

ProactiveWatch 1.5 is the simplest managed services platform to install and bring up at customer sites. Since ProactiveWatch is delivered as a SaaS solution, the VAR does not have to go through the lengthy process of acquiring and building servers and installing unfamiliar monitoring software. Installing PW 1.5 at customer sites consists of the following two simple steps:


1. Install the Gateway on a server at the customer site.

2. Install Agents on servers. The only configuration agents need is the IP address of the server running the Gateway. Once agents are installed, they immediately appear in the VAR’s Explorer.

Once a Gateway and Agents are installed at a customer site, future versions of the software are distributed and installed automatically under VAR control (the VAR chooses which servers and customers to upgrade from within the ProactiveWatch Explorer).

V. The ProactiveWatch 1.5 Explorer

The ProactiveWatch Explorer is a rich-client .NET 2.0 application that the VAR installs on workstations or Citrix/Terminal Servers at the VAR’s site. This is the only piece of ProactiveWatch software that the VAR needs to install at its site. The basic ProactiveWatch subscription includes 10 concurrent instances of the Explorer. VARs therefore have the flexibility to keep a shared Explorer in the office and also to install copies on laptops that travel with technical personnel.

The Explorer displays the real-time status of all of the monitored servers, workstations, and network devices across all of the customers and sites monitored by ProactiveWatch. Each monitored device is a row in the Explorer, and each column represents the status of one or more monitors. Some monitors, such as CPU usage, clear when the condition that caused them goes away. These are shown with a red square while they are active and with a triangle once they have cleared. Events that do not clear automatically (such as Event Log entries) are shown as a red and yellow square.
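The display rules above amount to a small state machine per grid cell. Here is a hypothetical model of that state machine; it is not Explorer code. The class and enum names are invented. The paper does not say how a non-clearing event is acknowledged or reset, so that step is deliberately left out.

```python
from enum import Enum


class Glyph(Enum):
    BLANK = "no indicator"
    RED_SQUARE = "red square"                    # clearing monitor, condition currently active
    TRIANGLE = "triangle"                        # clearing monitor, condition has since cleared
    RED_YELLOW_SQUARE = "red and yellow square"  # non-clearing event, e.g. an Event Log entry


class MonitorCell:
    """One monitor's status in a device row (hypothetical model of the Explorer grid)."""

    def __init__(self, clears_automatically: bool) -> None:
        self.clears_automatically = clears_automatically
        self.active = False
        self.has_fired = False

    def update(self, condition_present: bool) -> None:
        if condition_present:
            self.active = True
            self.has_fired = True
        elif self.clears_automatically:
            # e.g. CPU usage dropped back below its threshold
            self.active = False
        # Non-clearing events stay latched; acknowledging or resetting them is not modeled.

    def glyph(self) -> Glyph:
        if not self.has_fired:
            return Glyph.BLANK
        if not self.clears_automatically:
            return Glyph.RED_YELLOW_SQUARE
        return Glyph.RED_SQUARE if self.active else Glyph.TRIANGLE


cpu, event_log = MonitorCell(True), MonitorCell(False)
for present in (True, False):
    cpu.update(present)
    event_log.update(present)
    print(cpu.glyph().value, "|", event_log.glyph().value)
```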


VI. Default (Out-of-the-Box) ProactiveWatch 1.5 Server Monitoring Functionality

Without any customization or configuration by the VAR or Service Provider, ProactiveWatch 1.5 provides broad and deep monitoring functionality. Monitors are configured in easy-to-use monitoring templates. The template that is automatically assigned to a Gateway is shown below.

Templates that implement the monitors described below are automatically assigned to server agents when they are installed. No configuration or customization is required to activate these monitors although the default thresholds listed below can be changed if needed.

1. Site Down Monitor – The PW back end constantly monitors each Gateway (GW) to make sure that the GW is communicating back to the PW back end on the required intervals. If the GW fails to check in within the required interval (by default 60 seconds), the PW back end will issue a site down alarm. The PW back end will generate an email to the main support account at the VAR, notifying the VAR of the outage. This notification does not rely upon any infrastructure at the customer or the VAR except the ability on the part of the VAR to receive an email.

2. Internet Down Monitor – The PW Gateway measures the response time of www.google.com and www.yahoo.com every 60 seconds. If both of these web requests fail, the GW sends an Internet Down alarm.


3. Server Down Monitor – the PW Gateway maintains a continuous connection with each monitored server. If that connection is broken, the Gateway sends a Server Down alarm to the PW back end, which generates an alarm and a notification as described above.

4. LAN Latency Monitor – The GW continuously checks the latency over the LAN between itself and the monitored servers. If the performance of the LAN degrades, an alarm is generated.

5. CPU Usage Monitor – If total CPU usage is above 95%, or usage by any single process is above 50% for the default time period, an alarm is issued.

6. Memory Usage Monitor – If Physical Memory usage is above 90% or Virtual Memory usage is above 70% for the default time period, an alarm is issued.

7. Disk Time Monitor – If Disk Time (the percentage of the last second in which the disk controller is actively accessing the hard disk) is above 50% for the default time period, an alarm is issued.

8. Disk Capacity Monitor – If the free space on any disk drive falls below 5%, an alarm is issued.

9. Thread and Handle Usage Monitor – If any single process uses more than the threshold number of threads or handles, an alarm is issued.

10. Event Log Monitoring – All entries written to the Application and System Event Logs are collected automatically. They can be browsed with the Event Log Viewer that is part of the PW 1.5 Explorer. Alerts for any combination of severities and logs can be turned on for any combination of servers or workstations with one mouse click.

11. Installed Programs Monitor – Any change to the set of installed programs is detected and alerted upon. Changes to the Registry sections that launch programs (Run, RunOnce) are also detected.

12. Application Crash Monitor – If any process on the server crashes, the crash is detected. Dr. Watson crash dump information is also collected if Dr. Watson is enabled on the server.

13. Windows Service Monitor – If any automatically started Windows Service goes down, an alarm is issued.

14. Reboot Monitor – If a server reboots, an alarm is issued. Normal reboots can be easily masked out with the Marked As Normal feature of the system.

15. System Profile Monitor – If the profile of a monitored server changes (for example, the IP address of a server changes), an alarm is issued.
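Two patterns recur in this list. The first is a heartbeat timeout (Site Down). The second is a threshold that must hold for the default time period (CPU, Memory, Disk Time). The sketch below shows one plausible way to express both; it is not ProactiveWatch’s implementation. The 60-second check-in interval and the 95% CPU threshold come from the list above. The 60-second duration is a placeholder, since the paper does not state the default time period. The function, class, and site names are hypothetical.

```python
from typing import Dict, List, Optional

SITE_DOWN_TIMEOUT_S = 60  # default Gateway check-in interval for the Site Down monitor


def sites_down(last_checkin: Dict[str, float], now: float,
               timeout_s: float = SITE_DOWN_TIMEOUT_S) -> List[str]:
    """Sites whose Gateway has not checked in within `timeout_s` seconds."""
    return sorted(site for site, ts in last_checkin.items() if now - ts > timeout_s)


class SustainedThreshold:
    """Fire only when a metric stays above `limit` for at least `duration_s` seconds."""

    def __init__(self, limit: float, duration_s: float) -> None:
        self.limit = limit
        self.duration_s = duration_s
        self._above_since: Optional[float] = None

    def observe(self, t: float, value: float) -> bool:
        if value <= self.limit:
            self._above_since = None  # condition cleared; restart the clock
            return False
        if self._above_since is None:
            self._above_since = t
        return t - self._above_since >= self.duration_s


# Hypothetical data: WINTER last checked in 100 s ago, SITE-B 50 s ago.
print(sites_down({"WINTER": 0.0, "SITE-B": 50.0}, now=100.0))

# 95% total CPU held for a placeholder 60 s "default time period", sampled every 10 s.
cpu = SustainedThreshold(limit=95.0, duration_s=60)
print([cpu.observe(t, 99.0) for t in range(0, 70, 10)])
```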

VII. Default (Out-of-the-Box) Network Device Monitoring


The default Network Device template is shown below with the default settings for network device monitoring. The monitors that are on by default are:

1. ICMP Ping Failure – The selected Server Agent pings the network device and issues an alarm if the device fails to respond.

2. If packet loss on the ping exceeds the threshold, an alarm will be issued.

3. If the response time on the ping exceeds a latency threshold, an alarm will be issued.

4. If the profile of the network device (for example the version of its installed software) changes, an alarm will be issued.

5. If utilization of any of the inbound or outbound interfaces exceeds the threshold, an alarm will be issued.

The Connected Network Device monitor is off by default since it will alarm whenever the set of devices connected to a switch or router changes. This can be a very valuable monitor in certain circumstances, but it will generate a large number of false alarms for routers and switches that support workstation and laptop computers.
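For the interface utilization monitor (item 5), a common approach uses SNMP MIB-II data. You poll the ifInOctets or ifOutOctets counter twice and divide the bits transferred by what the interface could carry in that time, using ifSpeed. The paper does not say exactly how ProactiveWatch computes utilization, so treat the sketch below as a generic illustration. The 80% threshold and the sample numbers are hypothetical. The `counter_delta` helper handles a 32-bit counter that wrapped once between polls. On fast links, the 64-bit ifHCInOctets/ifHCOutOctets counters from IF-MIB are generally used instead.

```python
COUNTER32_MODULUS = 2 ** 32  # MIB-II ifInOctets/ifOutOctets are 32-bit counters that wrap


def counter_delta(previous: int, current: int) -> int:
    """Octets counted between two Counter32 readings, tolerating one wrap-around."""
    return (current - previous) % COUNTER32_MODULUS


def utilization_pct(prev_octets: int, curr_octets: int,
                    interval_s: float, if_speed_bps: int) -> float:
    """Percent of interface capacity used over the polling interval (one direction)."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


UTILIZATION_THRESHOLD_PCT = 80.0  # hypothetical; the paper does not give the default

# 450,000,000 octets in 60 s on a 100 Mbit/s interface (ifSpeed = 100,000,000)
pct = utilization_pct(0, 450_000_000, interval_s=60, if_speed_bps=100_000_000)
print(f"{pct:.1f}% -> alarm: {pct > UTILIZATION_THRESHOLD_PCT}")
```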

VIII. Optional Monitors in ProactiveWatch 1.5

ProactiveWatch contains a wide variety of monitors that you can enable by simply selecting them in a monitoring template and turning them on. Some of these monitors require customer-specific configuration. The optional monitors are detailed below:


2. ICA Port Response Time Monitor – This monitor tests how long it takes a Citrix server to respond to a connect request on the ICA port from the agent making the request. Any agent can run this monitor against any Citrix server that is accessible from the computer on which the agent is running.

3. Port Monitor – This monitor allows the VAR to specify ports that must be present (80 and 443 on a web server), ports that are allowed to come and go (135 - the RPC port), and then either alert based on a specific “black list” of ports or, as is shown in the example to the right, alert on any port that is not required or specifically allowed.

4. Client2Server Monitor – This monitor tests the latency over a TCP/IP socket between any two sets of monitored devices. The C2S monitor is an excellent choice for watching the latency between the servers that make up the tiers of an application system (for example, from web servers to application servers to database servers).

5. Process and Service Down Monitors – ProactiveWatch can be configured to watch any specific process or service. This monitor is enabled by default in the Exchange Server template to watch store.exe. It can be enabled in any Base or Add-On Template in order to watch the processes or services that comprise any application or service of interest.

6. Total Handle and Total Thread Usage Monitors – These monitors watch the total number of threads and handles in use. Since the acceptable number is highly dependent upon the type of work that a server is doing, these monitors should be turned on within a monitoring template dedicated to a specific type of server.
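The Port Monitor’s rules (item 3) reduce to set arithmetic over the open ports on a server. The sketch below is a hypothetical rendering of those rules: the function name is invented, and ports 3389, 23, and 8080 are made-up examples. It covers both modes the text describes. In the first, only ports on a specific black list are flagged. In the second, any port that is neither required nor explicitly allowed is flagged.

```python
from typing import Iterable, Optional, Set, Tuple


def check_ports(open_ports: Iterable[int], required: Set[int], allowed: Set[int],
                blacklist: Optional[Set[int]] = None) -> Tuple[Set[int], Set[int]]:
    """Return (missing required ports, ports to alert on).

    With a blacklist, only blacklisted ports that are open are flagged; without one,
    any open port that is neither required nor allowed is flagged."""
    open_set = set(open_ports)
    missing = required - open_set
    if blacklist is not None:
        flagged = open_set & blacklist
    else:
        flagged = open_set - required - allowed
    return missing, flagged


# Web server example from the text: 80 and 443 required, 135 (RPC) may come and go.
missing, flagged = check_ports({80, 135, 3389}, required={80, 443}, allowed={135})
print("missing:", sorted(missing), "unexpected:", sorted(flagged))
```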


IX. Base and Add-On Templates in ProactiveWatch 1.5

ProactiveWatch 1.5 ships with two kinds of templates: Base Templates and Add-On Templates. Exactly one Base Template is assigned to each computer or network device, and as many Add-On Templates as desired can then be assigned to it. This makes it very easy to handle variation in server configurations (for example, different kinds of backup or anti-virus software): you put the monitors for those products in Add-On Templates and assign them as necessary. The Base Template and Add-On Templates that ship by default with ProactiveWatch 1.5 are shown below.
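The one-Base, many-Add-On rule can be sketched as a merge of monitor definitions. The template shape, monitor names, and service names below are hypothetical. The override order (later templates win on a name clash) is an assumption, since the paper does not say how conflicts between templates are resolved.

```python
from typing import Dict, List

# Hypothetical representation: monitor name -> settings.
Template = Dict[str, dict]


def effective_monitors(base_templates: List[Template], add_ons: List[Template]) -> Template:
    """Combine exactly one Base Template with any number of Add-On Templates.

    Assumption: monitors are merged by name and a later template overrides an earlier one."""
    if len(base_templates) != 1:
        raise ValueError("a device must have exactly one Base Template")
    merged: Template = dict(base_templates[0])
    for add_on in add_ons:
        merged.update(add_on)
    return merged


server_base = {"cpu_usage": {"total_pct": 95}, "disk_capacity": {"min_free_pct": 5}}
backup_add_on = {"backup_service": {"service": "hypothetical-backup-svc"}}
antivirus_add_on = {"antivirus_service": {"service": "hypothetical-av-svc"}}

monitors = effective_monitors([server_base], [backup_add_on, antivirus_add_on])
print(sorted(monitors))
```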


X. Assigning Monitoring Templates to Workstations, Servers and Network Devices

ProactiveWatch provides an easy, spreadsheet-like user interface for assigning monitoring templates to workstations, servers, and network devices. To assign a template to a device, simply double-click the cell where that device’s row intersects the template’s column. You can also copy and paste template assignments en masse to quickly assign a set of templates to a set of devices.

The Default Exclusions Template is automatically assigned to all devices, which gives the VAR one easy place to manage all of the alarms that are not desirable. See the Managing Alarms section of this white paper for more detail on this feature.

Notice that in the example below, the HP Insight Manager, Symantec BackupExec, and All Event Log Errors Add-On Templates are assigned to all of the servers. This shows how easy it is to configure ProactiveWatch for the different scenarios that a typical VAR encounters at its customer sites.

XI. Managing Alarms in ProactiveWatch 1.5

ProactiveWatch 1.5 is the only VAR-oriented Managed Services monitoring solution that can do all of the following: monitor every process on the server for usage of key resources; monitor the server for changes in the state of the installed software; monitor servers for changes in desirable and undesirable ports; monitor web and Citrix servers for URL and ICA response time; and monitor the Windows Event Logs with the granularity required to catch critical events in a wide variety of applications and services.

With this tremendous ability to monitor deeply and broadly comes the prospect of a significant number of false alarms. ProactiveWatch includes a four-tier management system for addressing false alarms:


excluded in the Default Exclusions template is the Windows Performance Logs and Alerts service going down.

2. Alarms may be Marked As Normal. Marked As Normal alarms are recognized within the Console as having occurred, and are shown in blue instead of red in the grid view. Alarms can be Marked As Normal for a specific time period. For example, the nightly reboot of a set of servers in a farm can be Marked As Normal if it occurs within +/- 30 minutes of 2 AM, while a reboot at any other time is still treated as an alarm.

3. Alarms and notifications are handled separately. Notification Rules (which cause email alerts) are defined separately from the alarms themselves. The VAR can therefore easily create a rule that sends an email immediately if a site or a server is down, but holds all other alarms for a summary email in the morning.

4. Resource alarms (CPU, Memory, Disk Time, Handles, Threads) can be excluded based on the process that caused them. For example, on an Exchange Server, store.exe often uses all of the available memory. Without this capability, if the threshold for a memory alarm is 90%, that alarm would always fire on an Exchange Server, since store.exe will always push total memory utilization above that point. ProactiveWatch allows you to define an Exclusion rule that masks out resource-utilization alarms caused by specific processes. Memory alarms caused by store.exe (and sqlservr.exe) then cease to be a problem.

ProactiveWatch is also unique in that false alarms can be masked before they occur, including on computers where they have never occurred. Furthermore, specific alarms can be generalized, then Excluded or Marked As Normal, and applied to any set of monitored devices.
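The sketch below combines tier 4 (exclusion by process) with tier 2 (Marked As Normal within a daily time window), using the examples from this section. It is a hypothetical model, not ProactiveWatch’s rule engine. The Alarm fields, rule tuples, device names, and function names are invented for illustration. Checking exclusions before Marked As Normal rules is an assumed order of evaluation. Because the rules are plain data, they can exist before a matching alarm ever occurs and can be applied to any set of devices.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Iterable, Set, Tuple

RESOURCE_KINDS = {"cpu", "memory", "disk_time", "handles", "threads"}


@dataclass
class Alarm:
    device: str
    kind: str          # e.g. "memory" or "reboot"
    at: datetime
    process: str = ""  # process blamed for a resource alarm, if any


def excluded_by_process(alarm: Alarm, processes: Set[str]) -> bool:
    """Tier 4: drop resource alarms caused by listed processes."""
    return alarm.kind in RESOURCE_KINDS and alarm.process.lower() in processes


def in_daily_window(at: datetime, center: time, tolerance: timedelta) -> bool:
    """True if `at` is within `tolerance` of `center` on the same, previous, or next day."""
    anchor = datetime.combine(at.date(), center)
    return any(abs(at - (anchor + timedelta(days=d))) <= tolerance for d in (-1, 0, 1))


NormalRule = Tuple[str, time, timedelta]  # (alarm kind, daily time, tolerance)


def classify(alarm: Alarm, excluded: Set[str], normal_rules: Iterable[NormalRule]) -> str:
    if excluded_by_process(alarm, excluded):
        return "excluded"  # treated as if it never happened
    for kind, center, tolerance in normal_rules:
        if alarm.kind == kind and in_daily_window(alarm.at, center, tolerance):
            return "marked as normal"  # shown in blue, not available for notifications
    return "alarm"


excluded = {"store.exe", "sqlservr.exe"}
rules = [("reboot", time(2, 0), timedelta(minutes=30))]
day = datetime(2024, 1, 15)  # arbitrary date for the example
for a in (Alarm("EXCH1", "memory", day.replace(hour=9), "store.exe"),
          Alarm("WEB1", "reboot", day.replace(hour=2, minute=20)),
          Alarm("WEB1", "reboot", day.replace(hour=14))):
    print(a.device, a.kind, a.at.time(), "->", classify(a, excluded, rules))
```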

XII. Notification Rules

The last layer in the system of deciding which alarms are “important” is to decide which ones should be the basis of email notifications. Note that Excluded alarms are masked out as never having occurred, and Marked As Normal alarms are noted in the Explorer, but are masked out from the set available for notifications.


In the example on the left below, a notification rule is set up that sends an email immediately if an Internet Down, Server Down, or Site Down alarm occurs on any server at the customer site named WINTER. In the example on the right below, a rule is created that collects all alarms into one summary email sent at 7 AM every morning. This is a great way to get a summary of what went wrong in the last 24 hours while reading your morning email.
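The two rules in this example can be sketched as a routing step plus a schedule. This is a hypothetical model rather than ProactiveWatch’s rule format: the Alarm fields, status strings, device names, and function names are invented. Following the previous paragraph, Excluded and Marked As Normal alarms never reach either rule.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Tuple

URGENT_KINDS = {"Internet Down", "Server Down", "Site Down"}


@dataclass
class Alarm:
    site: str
    device: str
    kind: str
    status: str = "active"  # "active", "marked as normal", or "excluded"


def route(alarms: List[Alarm], urgent_site: str = "WINTER") -> Tuple[List[Alarm], List[Alarm]]:
    """Split alarms into immediate emails and the morning summary.

    Excluded and Marked As Normal alarms never reach notification rules."""
    immediate, summary = [], []
    for alarm in alarms:
        if alarm.status != "active":
            continue
        if alarm.site == urgent_site and alarm.kind in URGENT_KINDS:
            immediate.append(alarm)
        summary.append(alarm)  # the summary rule collects all alarms
    return immediate, summary


def next_summary_time(now: datetime, send_at: time = time(7, 0)) -> datetime:
    """When the next summary email is due: 7 AM today, or tomorrow if that has passed."""
    candidate = datetime.combine(now.date(), send_at)
    return candidate if candidate > now else candidate + timedelta(days=1)


alarms = [Alarm("WINTER", "FS1", "Server Down"),
          Alarm("WINTER", "FS1", "Memory Usage"),
          Alarm("WINTER", "WEB1", "Reboot", status="marked as normal")]
now_emails, morning = route(alarms)
print(len(now_emails), "immediate,", len(morning), "in summary due",
      next_summary_time(datetime(2024, 1, 15, 22, 30)))
```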

XIII. Analytical Tools in the ProactiveWatch Explorer

All of the Tools discussed in this section are accessible from within the ProactiveWatch Explorer by either selecting a row (selecting a monitored device) and then pulling down the Tools menu, or right-clicking upon the selected row and choosing the tool.

Show Issues


View Metrics

Since ProactiveWatch collects data every 10 seconds, the last 360 samples represent the last hour of data for a monitored device. That data is presented in summary form in the View Metrics dialog shown below.

System Compare


Application Compare

Application Compare allows you to compare the installed software across any set of managed devices. You can also compare the currently installed set with the set that was installed when you took a snapshot. In the example to the right, PANTHRO2 has Windows Server 2003 SP1 and a number of security updates that are not present on PANTHRO. Because both of these servers are load-balanced web servers in a farm, they are supposed to be identical. However, you can also see (in red) that both have been updated to the newest version of the ProactiveWatch Agent.
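At its core, comparing installed software is a set difference. The same comparison works between two devices or between a device and an earlier snapshot of itself. The sketch below uses hypothetical inventories loosely modeled on the PANTHRO / PANTHRO2 example; the update names and the function name are placeholders, not real data.

```python
from typing import Dict, List, Set


def compare_installed(first: Set[str], second: Set[str]) -> Dict[str, List[str]]:
    """Programs present on one side only; works for two devices or a device vs. its snapshot."""
    return {"only_first": sorted(first - second), "only_second": sorted(second - first)}


# Hypothetical inventories modeled on the PANTHRO / PANTHRO2 example.
panthro = {"ProactiveWatch Agent (newest version)", "IIS"}
panthro2 = panthro | {"Windows Server 2003 SP1", "Security Update A", "Security Update B"}
diff = compare_installed(panthro, panthro2)
print(diff["only_second"])
```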

Event Log Analyzer


Distribution Graph

The Distribution Graph allows you to compare key metrics across servers and look at their average, minimum, or maximum values. This graph accesses up to seven days of historical data. An extremely useful application of this graph is to find the maximum number of concurrent users across a set of terminal servers in the last N days.
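The terminal-server use case is a per-server aggregation over a time window. The sketch below illustrates it under stated assumptions. The sample tuples, server names, and function name are hypothetical. The seven-day cap mirrors the history limit mentioned above, not a documented API.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Tuple

MAX_HISTORY_DAYS = 7  # the Distribution Graph reaches back up to seven days

Sample = Tuple[str, datetime, float]  # (server, timestamp, metric value)


def distribution(samples: Iterable[Sample], now: datetime, days: int = 7,
                 stat: str = "max") -> Dict[str, float]:
    """Per-server min, avg, or max of a metric over the last `days` days (capped at 7)."""
    cutoff = now - timedelta(days=min(days, MAX_HISTORY_DAYS))
    per_server: Dict[str, List[float]] = {}
    for server, ts, value in samples:
        if cutoff <= ts <= now:
            per_server.setdefault(server, []).append(value)
    reducer = {"min": min, "avg": mean, "max": max}[stat]
    return {server: reducer(values) for server, values in per_server.items()}


now = datetime(2024, 1, 15, 12, 0)
users = [("TS1", now - timedelta(days=1), 18), ("TS1", now - timedelta(days=3), 27),
         ("TS2", now - timedelta(days=2), 22), ("TS2", now - timedelta(days=9), 40)]
print(distribution(users, now, days=7, stat="max"))
```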

Trend Graph


Multi-User Impact Analysis

The Multi-User Impact Analysis graph is very useful for determining how Citrix and MS Terminal Servers run out of capacity as user load grows. The graph to the right collects the last 7 days of concurrent usage data, along with how key system metrics change as user load grows. This server is at 80% of physical memory at 23 concurrent users, so adding more users to it would not be a good idea.
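The conclusion drawn from the graph (80% of physical memory at 23 concurrent users) can be approximated from raw samples by grouping memory readings by user count. The sketch below does this with made-up readings. It is one simple interpretation of the analysis, not the algorithm behind the ProactiveWatch graph. The 80% limit mirrors the example rather than any product default, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def users_at_memory_limit(observations: Iterable[Tuple[int, float]],
                          limit_pct: float = 80.0) -> Optional[int]:
    """Smallest concurrent-user count at which physical memory reached `limit_pct`.

    `observations` are (concurrent users, physical memory %) pairs gathered over the
    window; the worst reading at each user count is used."""
    worst: Dict[int, float] = {}
    for users, mem_pct in observations:
        worst[users] = max(mem_pct, worst.get(users, 0.0))
    for users in sorted(worst):
        if worst[users] >= limit_pct:
            return users
    return None


# Hypothetical week of readings for one terminal server.
readings = [(20, 71.0), (21, 74.5), (22, 77.9), (23, 79.5), (23, 80.2), (24, 83.0)]
print(users_at_memory_limit(readings))
```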

About ProactiveWatch

ProactiveWatch provides breakthrough solutions to VARs who want to improve their relationships with their customers, radically improve the efficiency of their own businesses, and build new recurring revenue streams in the process. ProactiveWatch allows VARs to reap these benefits without having to reinvent themselves as Managed Services Providers. They also avoid the substantial investments in infrastructure and process changes that come with a transition to an MSP model.

ProactiveWatch is unique among remote monitoring and diagnostics solutions in the degree to which the focus of the solution is upon applications, as well as the infrastructure that supports the applications.

ProactiveWatch is headquartered in Atlanta, GA, and can be reached on the World Wide Web at