How a Proactive NOC Strategy Can Optimize Availability While Lowering IT Costs WHITE PAPER


Executive Summary

Entuity solutions are designed to lower the cost of IT and accelerate the introduction of new services that depend on IT. Nowhere is this more important than in the Network Operations Center (NOC).

Within any large enterprise or service provider, the NOC is responsible for managing the network infrastructure. Increasingly, the NOC’s function goes beyond this. It has a dual strategic role: to ensure the resilience of critical, often revenue-generating, business services to the organization as a whole, and to be in the front line of infrastructure cost containment.

With the explosive growth of the Internet, and the increased dependence that this has placed on network quality of service, the NOC has become a mission-critical business unit. The Internet Age has seen a shift away from centralized control over building business applications and deploying network resources.

Distributed systems have put untold pressure on management – often because the speed and variety of networks are rising exponentially.

Organizations spend large amounts of capital to design, build and deploy NOCs. The choice of management technology has been shown to have a direct impact on return on investment. This white paper seeks to answer some key questions:



How does a business measure a return on NOC investment?



How does an organization cut the operational expenditures usually associated with the deployment and maintenance of a new management platform?



How do companies maximize their most important asset – qualified technical people – in an environment that is increasingly chaotic?

A successful NOC follows three fundamental dictums:



Keep problem resolution time to a minimum.



Employ a proactive management model, which locates and provisions for the fixing of network problems before the end user community becomes aware of their existence.



Ensure that fault and performance problem resolution goes beyond the network itself and incorporates visibility of applications and servers (and their associated help desks) because the network end user community perceives service quality through the desktop applications they use.

This paper shows how Entuity’s network management technology meets these three challenges by integrating the three essential elements of infrastructure management (performance, availability and resource management) into one solution. This permits Entuity to deliver a low cost of ownership, strong performance and a wide range of capabilities when compared with other management suites.

The NOC today

The limitations of the NOC today are predominantly those imposed by the network management technology they use. Senior managers in most NOCs complain about the same three fundamental problems:



Lack of network visibility – particularly in layer 2 switched networks.



Too many alerts with too little information content.



Inability to correlate network alerts with help desk calls.

Most management tools deployed focus on availability monitoring: they locate outages but provide no insight into the types of network fault that cause applications to run slowly. The problem is compounded by the fact that these tools are unable to determine business impact, i.e. which PCs and servers are affected by particular problems. The NOC is left to operate in a vacuum, with little or no ability to share network management information with other IT departments.

Even the seemingly simple task of locating network end users has become time consuming. Because of the inherent limitations of legacy management technology, time to problem resolution in most NOCs is not decreasing, yet the cost of network ownership is constantly increasing.


Types of Network Fault

Broadly speaking, network faults can be categorized by:



Service outage – network outages that prevent desktop applications from communicating across the network.



Service degradation – network problems that cause desktop applications to run more slowly, e.g. brownouts, which cause packet loss and the need for retransmissions.

Within the NOC, most emphasis has been placed on deploying management technology to monitor the network for service outages, that is, on availability monitoring.

Unfortunately, most network end users complain not about service outages, but about degradation of service. “Why is my application running slowly?” is a question many NOC managers hear on a daily basis. Service outages are rare, particularly in today’s LAN environments. And because most Enterprise and Service Provider networks are designed with fault resilience in mind, an outage in a particular area of the network does not often prevent the end user community from continuing to use network applications.

Network services often show signs of degradation before they go down. Until the NOC monitors a wider spectrum of faults within the network infrastructure, its ability to operate in a proactive mode will be inherently restricted.

Stateless Event Management Model Limitations

“It’s useful to know about a network problem; it’s also useful to know when it’s gone away.”

A fundamental problem in the NOC today is how to determine when a particular network alert has cleared. Has someone been dispatched to fix an alarm that occurred at 2am this morning? Has the problem disappeared, or is it still occurring? Without the automated ability to clear down alerts when a problem has disappeared, the NOC is left with a serious event management problem.

The MoM (Manager of Manager)-type solutions deployed in the NOC today suffer from this scenario – namely their inability to automatically track problem states. This situation is rooted in two architectural weaknesses:



Any correlation models that can be deployed are typically left to the end user to configure.



Most events forwarded to the MoM are ‘stateless’, i.e. they have no equivalent ‘antievent(s)’ to cancel them out.

The stateless event is a considerable burden, and leaves the NOC with little more than the ability to search through historical event logs in a reactive manner when troubleshooting ongoing problems. Prime examples of stateless events are SNMP traps and syslog events.

Eye of the Storm Empowering the NOC

The recognition of the business criticality of the NOC, combined with the definition of what makes the NOC successful as a business service, underpins the design of Eye of the Storm® (EYE).

EYE's IT Service Delivery management offers total performance management that alerts you to degradation before service breakdowns occur. The toolset determines the quality of experience for all applications and services, so providers can deliver the best possible user experience, taking action to prevent downtime. This allows users to:



Manage performance in real time – identifies how every application, server, port and link is performing in real time so you can take action.



Respond to service degradation – Entuity’s exclusive Service Degradation Index™ technology scores the performance of every device against its maximum capability in real time so resources can be deployed to avoid ‘brownouts’ and service degradation.



Identify likely trouble spots – highlights high utilization and high congestion areas so bottlenecks can be fixed before they close down, and determines the risk level of every trunk or interlink connection.



Monitor Service Level Agreements – determine exact SLA conformance so you pay only for what you get.



Support decisions with full reporting – preconfigured reports provide a picture of the infrastructure according to business unit, technology type or geography. Flex Reports allows you to build your own reports to fit your particular requirements.

EYE monitors the spectrum of faults from degradation to service outages. With both a real-time alert based interface and a historical trending interface, EYE empowers the NOC with management visibility, particularly in complex layer 2 switched environments, where time to problem resolution is often high. By recognizing that knowledge of business impact is just as important as knowledge of a network fault, EYE unifies the NOC with the help desk, to reduce the number of open trouble tickets for a particular network problem.


‘Degradation of Service’ Monitoring

Entuity products go one step further than the tools deployed in the NOC today, by determining when, where and why the network is running slowly. Known as ‘brownouts’, these types of network faults are often the most difficult to locate, especially in layer 2 switched environments. Monitoring for them allows the NOC to locate the faulty cabling, faulty NICs and faulty transceivers that cause application response times to degrade.

Using Entuity’s Service Degradation Index™ technology, the burden of analyzing esoteric SNMP MIB variables is removed from the NOC, empowering them with usable information, not random data.
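To make the idea of turning raw SNMP counters into usable information more concrete, the following is a minimal sketch in Python. It is not Entuity’s Service Degradation Index, whose calculation this paper does not describe; it only illustrates the kind of arithmetic a NOC would otherwise perform by hand. The counter names in the comments are standard IF-MIB objects, while the `Sample` type, the `degradation_report` function and the 1% error threshold are assumptions made for illustration. Counter wrap and non-unicast traffic are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of a port's inbound counters (hypothetical container)."""
    timestamp: float   # seconds since some reference point
    in_octets: int     # IF-MIB ifInOctets
    in_packets: int    # IF-MIB ifInUcastPkts
    in_errors: int     # IF-MIB ifInErrors


def degradation_report(prev, curr, if_speed_bps, error_threshold=0.01):
    """Derive utilization and error rate from two consecutive polls.

    The threshold is an illustrative assumption, not a vendor default.
    """
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("samples must be in time order")

    octets = curr.in_octets - prev.in_octets
    packets = curr.in_packets - prev.in_packets
    errors = curr.in_errors - prev.in_errors

    # Fraction of the link's capacity used during the polling interval.
    utilization = (octets * 8) / (interval * if_speed_bps)

    # Errored packets as a fraction of all packets seen on the wire.
    total = packets + errors
    error_rate = errors / total if total else 0.0

    return {
        "utilization": utilization,
        "error_rate": error_rate,
        "degraded": error_rate > error_threshold,
    }


# Two polls, five minutes apart, on a 100 Mbit/s port.
prev = Sample(timestamp=0, in_octets=0, in_packets=0, in_errors=0)
curr = Sample(timestamp=300, in_octets=375_000_000,
              in_packets=98_000, in_errors=2_000)
report = degradation_report(prev, curr, if_speed_bps=100_000_000)
print(report)
```

In this example the port is only about 10% utilized, yet 2% of inbound packets are errored, which is the pattern a faulty cable or transceiver would produce and which an availability-only monitor would not report.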

Event Management and Presentation

An important requirement of the business oriented NOC is event management: knowing not only when a problem has occurred, but also when a problem has gone away. Entuity provides this functionality within the Bulletin Board event browser.

Bulletin Board provides two separate event panels - Logger and Tracker. The Logger panel provides a sliding window-based history of events from the Entuity platform; the Tracker panel contains current events.

Presentation of events is carefully managed, with a number of options that keep the display transparent and meaningful:



Event Correlation, matching alarm clear events with the original alarm, so displaying only those problems that are still occurring in the Tracker panel.

A simple example of this is the canceling effect that a ‘link up’ event has on a ‘link down’ event for a port within the network infrastructure. By automatically removing events from the Tracker panel when network problems have disappeared, the NOC has much clearer visibility into ongoing problems.



Event Aggregation, prevents events from the same source hiding other events.



Event Suppression, suppresses events from a particular source using a timeout value or until manually reactivated.



Event Ageout, sets how long an event remains in the short term tracker before deletion (events remain in the logger).



Event Prioritization, you decide the importance of an event type.



Event Annotation, add notes to events. This has real flexibility, e.g. add annotations against all events from a particular source, against just events of one type from a source or against an event regardless of source. Each annotated event is easily identifiable.



Event Configuration, event color-coding and sound notification.
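The correlation and ageout options above can be sketched in a few lines of Python. This is an illustration of the general technique, assuming a simple in-memory model; it is not Entuity’s implementation. The class name `BulletinBoardSketch`, the `CLEARED_BY` table and the one-hour ageout are hypothetical. The history list plays the role of the Logger panel and the dictionary of open problems plays the role of the Tracker panel.

```python
from dataclasses import dataclass

# Hypothetical table: clearing event kind -> the event kind it cancels.
CLEARED_BY = {"link up": "link down"}


@dataclass(frozen=True)
class Event:
    time: float    # seconds
    source: str    # e.g. "switchABC:2/3"
    kind: str      # e.g. "link down"


class BulletinBoardSketch:
    def __init__(self, ageout_seconds=3600.0):
        self.logger = []    # every event received, in arrival order
        self.tracker = {}   # (source, kind) -> Event, open problems only
        self.ageout_seconds = ageout_seconds

    def receive(self, event):
        self.logger.append(event)
        raised_kind = CLEARED_BY.get(event.kind)
        if raised_kind is not None:
            # A clearing event removes the matching open problem, if any.
            self.tracker.pop((event.source, raised_kind), None)
        else:
            # A repeat of the same problem replaces the earlier entry.
            self.tracker[(event.source, event.kind)] = event

    def age_out(self, now):
        # Drop open problems older than the ageout; history is untouched.
        expired = [key for key, ev in self.tracker.items()
                   if now - ev.time > self.ageout_seconds]
        for key in expired:
            del self.tracker[key]

    def open_problems(self):
        return sorted(self.tracker.values(), key=lambda ev: ev.time)


board = BulletinBoardSketch(ageout_seconds=3600)
board.receive(Event(0, "switchABC:2/3", "link down"))
board.receive(Event(10, "switchABC:2/4", "link down"))
board.receive(Event(60, "switchABC:2/3", "link up"))

current = [ev.source for ev in board.open_problems()]
print(current)             # ['switchABC:2/4']
print(len(board.logger))   # 3
```

Keying open problems by source and event kind is the design choice that makes the ‘link up’ lookup a single dictionary operation; a stateless feed has no such key and can only be searched after the fact.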

Viewing Problem History – Graphs and Event Logs

Searching through event history logs to ascertain whether there have been any problems within a particular area of the network is a time-consuming, labor-intensive activity. An example of this is trying to figure out why a server backup failed last night. EYE alleviates this problem through:



Historical fault information presented as time series graphs, as well as event based information. Graphs make the task of isolating the time of a fault, and measuring its severity over time, much easier for the NOC.



Event history views that can be sorted against various criteria, from the location of the problem to which PCs and servers were impacted by it, allowing for more rapid determination of the time of faults within particular areas of the network.



Event histories viewed in a consolidated form across groups of devices and VLANs to locate problems within spatial regions of the network. A good example of this is the ability that the technology provides to quickly determine whether there were any outages in the backbone VLAN(s) last night.

Business Impact Management: Unifying NOC and Help Desk

A major issue within both Enterprise and Service Provider networks today is that the NOC has deployed tools which can alarm on network infrastructure problems, but is unable to map those problems to the IT services, and hence the business units, impacted.

An alarm saying that a fault has been reported on “slot 2, port 3” of switch ABC has tangible meaning to the NOC, but cannot be correlated with a help desk call inquiring as to why a PC has lost network connectivity. The fact that the PC in question happens to be attached to the port reporting the fault has, until now, invariably gone unseen in the NOC.

EYE maps the connectivity of PCs and servers to the network, so when a fault is detected within the IT infrastructure, an indication of which PCs and servers are impacted by the problem is displayed along with the problem itself.
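As a rough illustration of this mapping, the Python sketch below attaches the impacted hosts to an alarm and then looks up the alarms relevant to a help desk caller. It assumes the connectivity map is already available as a simple dictionary; how EYE discovers and stores that map is not described here. The host names, `ATTACHED_HOSTS`, `annotate_alarm` and `alarms_for_host` are all hypothetical.

```python
# Hypothetical connectivity map: (switch, slot, port) -> attached hosts.
ATTACHED_HOSTS = {
    ("ABC", 2, 3): ["pc-finance-017"],
    ("ABC", 2, 4): ["srv-backup-01", "srv-mail-02"],
}


def annotate_alarm(switch, slot, port, description, attached=ATTACHED_HOSTS):
    """Return the alarm together with the hosts attached to the port."""
    hosts = attached.get((switch, slot, port), [])
    return {
        "location": f"switch {switch}, slot {slot}, port {port}",
        "description": description,
        "impacted_hosts": list(hosts),
    }


def alarms_for_host(host, alarms):
    """Answer the help desk question: is this caller hit by a known fault?"""
    return [alarm for alarm in alarms if host in alarm["impacted_hosts"]]


alarms = [
    annotate_alarm("ABC", 2, 3, "link down"),
    annotate_alarm("ABC", 2, 4, "high error rate"),
]
matches = alarms_for_host("pc-finance-017", alarms)
print(matches[0]["location"])   # switch ABC, slot 2, port 3
```

With the impacted hosts carried on the alarm itself, a call about one PC can be matched to the “slot 2, port 3” fault without anyone tracing cables.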


By providing visibility into the impact of network faults on the business, calls from various IT help desks to the NOC can be much more readily interpreted in relation to known problems in the network at that time. Entuity calls this functionality Heightened Awareness™, and it operates in real time.

Heightened Awareness is a combination of:



State Awareness – understanding what’s going on inside every device on the network.



Dependency Awareness – knowing exactly how each device relates to its parents, children and siblings. When a problem occurs, the NOC can identify every affected device, application and user.



Business Impact Awareness – it’s not just about what box has gone down (or slowed down), it’s about knowing what applications are running on that box, who uses them and how important they are. Entuity gives you the full picture.
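Dependency and business impact awareness can be pictured as a walk over a parent/child topology. The Python sketch below is a simplified illustration, assuming a strict tree with no redundant paths; a real network designed for resilience would also need alternative routes checked before a device is declared affected. The topology, the service table and the function names are hypothetical.

```python
from collections import deque

# Hypothetical topology: each device maps to the devices that depend on it.
CHILDREN = {
    "core-router": ["dist-switch-1", "dist-switch-2"],
    "dist-switch-1": ["access-switch-a", "access-switch-b"],
    "dist-switch-2": ["access-switch-c"],
    "access-switch-a": ["srv-payroll"],
    "access-switch-b": ["pc-101", "pc-102"],
    "access-switch-c": ["srv-web"],
}

# Hypothetical mapping of servers to the business services they host.
SERVICES = {"srv-payroll": ["Payroll"], "srv-web": ["Intranet"]}


def affected_by(failed, children=CHILDREN):
    """Breadth-first walk returning every device downstream of `failed`."""
    affected = []
    visited = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in visited:
                visited.add(child)
                affected.append(child)
                queue.append(child)
    return affected


def impacted_services(failed, children=CHILDREN, services=SERVICES):
    """Business services hosted on the failed device or anything below it."""
    devices = [failed] + affected_by(failed, children)
    return sorted({svc for dev in devices for svc in services.get(dev, [])})


downstream = affected_by("dist-switch-1")
print(downstream)
print(impacted_services("dist-switch-1"))   # ['Payroll']
```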

Sharing Management Information - Business Views

An ongoing problem within the NOC today, particularly within the Enterprise, is its inability to share network management information with other IT services, such as desktop support. Because EYE maps the network onto the desktop and server environment, network management information can now be shared between IT departments in a form that is usable. For example, the UNIX server support group can be presented with views that limit their visibility only to areas of the network to which UNIX servers are attached, allowing them to receive fault alarms, check configuration settings such as duplex and speed, and monitor traffic volume and utilization. This sharing of information results in a great reduction in the number of support calls that the NOC receives, freeing up time that may otherwise be spent proving that the network is not in fact the cause of a particular IT problem.

Making the NOC more aware of ‘Network Behavior’

An important exercise for the proactive NOC is the education of its staff in the understanding of the behavioral patterns of the particular network that they are managing. EYE enables historical ‘look back’ to determine which ports had high levels of degradation/utilization/traffic volume, etc. so NOC managers can prevent problems and catch intermittent faults.

Examples of this include knowing where there are bandwidth problems, which areas of the network are growing most rapidly in usage, and what was the impact of adding a new desktop application to a particular VLAN user group. EYE provides all this information by trending network statistics such as utilization, traffic volume, and ‘top talkers’ within VLANs.

A large number of user configurable reports allows the NOC to check, for example, which are the busiest trunk ports, which are the busiest WAN links, and which are the busiest VLANs, both during the day and during the night. As well as providing insights into why particular applications run slowly at particular times of the day, these reports also allow the NOC and network engineering to be more proactive in the provisioning of future bandwidth needs.

Managing the Network End User

The primary customer of the NOC is the network end user responsible for the generation of network traffic. Unfortunately, in many cases, they are the first to alert the NOC to particular network problems. Entuity technology allows the NOC to quickly locate where within the network infrastructure an end user is located - i.e. the switch or hub port to which they are attached. Armed with this information, the NOC is able to much more rapidly respond to network end user queries - from checking fault history information to checking port configuration settings such as duplex, speed and VLAN settings.

Another ongoing task within the NOC is the provisioning of network infrastructure for the addition of new users. The NOC is constantly in search of ‘spare network ports’ to which to add users, and often resorts to the procurement of yet more chassis cards in the drive to satisfy demand.

Entuity allows the NOC to achieve a much greater return on network investment by providing the NOC with spare port inventory reports that indicate the best candidate ports for reuse, based on the length of time since they were last active.
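The logic behind such a report can be sketched briefly in Python. This is an illustration only, assuming the last-active time of each port is already known; the 90-day idle threshold, the port names, the dates and the `spare_port_report` function are hypothetical and not taken from the product.

```python
from datetime import datetime, timedelta


def spare_port_report(last_active, now, min_idle=timedelta(days=90)):
    """Rank ports that have been idle for at least `min_idle`.

    `last_active` maps a port name to the datetime it was last seen
    active, or to None if it has never been seen active.
    """
    candidates = []
    for port, seen in last_active.items():
        idle = None if seen is None else now - seen
        if idle is None or idle >= min_idle:
            candidates.append((port, idle))

    def rank(item):
        _, idle = item
        # Never-active ports first, then the longest idle period.
        if idle is None:
            return (0, 0.0)
        return (1, -idle.total_seconds())

    return sorted(candidates, key=rank)


now = datetime(2024, 1, 1)
last_active = {
    "ABC:2/1": datetime(2023, 12, 30),   # recently used, not a candidate
    "ABC:2/2": datetime(2023, 6, 1),
    "ABC:2/5": None,                     # never seen active
    "ABC:3/1": datetime(2023, 9, 1),
}
report = spare_port_report(last_active, now)
for port, idle in report:
    print(port, "never active" if idle is None else f"idle {idle.days} days")
```

Ranking by idle time rather than simply listing ports that are currently down avoids reclaiming a port whose user is merely away for a few days.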


North America Headquarters

8 West 38th Street,

New York, NY 10018 Toll Free 1 800 926 5889 T : 212 489 5733

North America Regional

4 Mount Royal Avenue Suite 240

Marlborough, MA 01752 T : 508 357 6346

EMEA Headquarters

9a Devonshire Square London, EC2M 4YN T : +44 (0)20 7444 4800 F : +44 (0)20 7444 4808

Eye of the Storm Summary

The Entuity™ Eye of the Storm® (EYE) management solution delivers network control and predictability that enables enterprises and system integrators to manage network services and assets, meet service level commitments, and implement best practices in service delivery. EYE uniquely provides automated, continual discovery of network infrastructure inventory and connectivity to maintain an up-to-date knowledge base of the network from the core to the edge. Coupled with powerful integrated fault and performance management capabilities and real-time notifications of physical network changes, this means critical business initiatives can be effectively deployed and efficiently maintained. The rich historical information captured within the EYE integrated CMDB can also be a source to other management solutions, such as configuration, application, or systems management programs, participating in an end-to-end management solution.

Entuity’s customers include Global 2000 companies solving mission-critical business initiatives, leveraging complex and dynamic network environments. A sampling includes: ABB, ABN-AMRO, Astra Zeneca, Bloomberg, Clifford Chance, Cooperative Financial Services, Deutsche Bank, First Horizon, IBM Global Services, MAN Financial, Morgan Stanley, SAP, Sony, Square D, Switch & Data, TIAA-CREF, University of Minnesota, and Verizon.

Eye of the Storm Integrated Network Suite

EYE provides a succinct suite of the most important functionality for network management, presented in an easy to use, quick to deploy format. EYE delivers the best price-performance and strongest range of capabilities as the practical middle ground between single function solutions that are difficult to integrate, and heavily laden frameworks that are difficult to deploy, learn, use, and support. EYE enables companies to quickly and efficiently reach their business goals including:



Unification of business systems, processes, and infrastructures through cohesion of disparate networks following mergers or acquisitions; by automatically assessing the assets and connectivity of each network.



Optimization of the utilization and deployment of current IT assets; both to forestall capital expenditure and deliver improved service levels.



Reduction of implementation and recurring costs through ease of deployment, fast learning curve, and ease of use and administration, with an immediate ROI and fast time to value.



Improved productivity in the Network Operations Center (NOC) with real-time troubleshooting to isolate root cause problems and performance anomalies most likely to degrade service response.



Successful deployment of bandwidth-intensive applications without incident by determining network bandwidth headroom during feasibility studies, efficiently planning deployment, verifying performance during pilots, and monitoring service levels post deployment.



Mitigating risk and ensuring compliance for corporate security or configuration management initiatives by delivering a real-time, accurate, and detailed network inventory CMDB feed into compliance or configuration management applications.



Assuring the quality of delivered or received services of managed infrastructures through easily understandable reports of the performance, availability and resource levels detailed in service level agreements (SLA).



Successfully implementing network management best practices, such as ITIL Service Management and Service Delivery, ensuring alignment with corporate goals and objectives.