
Technical Challenges

[Figure: request/response exchanges over time between the network and NE 1, NE 2, and NE 3. (a) Sequential requests; (b) Concurrent requests]

Figure 1-13 Impact of Bulk Operations on Management Efficiency

Distribution and addressing: How to distribute processing across different systems so that additional hardware horsepower can be introduced when required, and how to provide location transparency and efficient addressing to shield application logic from such distribution. Again using the party analogy, when the number of guests unexpectedly exceeds a certain number, you would like to be able to increase your food preparation capacity. If you have only one caterer and one oven, you might be out of luck. To increase your cooking capacity, you would like to be able to add a new oven quickly and thus “distribute” the cooking across several ovens and pots and pans instead of having to upgrade to a larger oven and larger pots and pans, which becomes impractical beyond a certain size. Ideally, your caterer will be able to handle the increased capacity accordingly. If you had to add a second caterer, you would have to coordinate between the two and keep track of which caterer is responsible for what, which you would rather not do. In other words, you want the fact that distribution has occurred at all to remain transparent.

[Figure 1-13 panels: request/response exchanges with the management application over time. (a) Sequential incremental requests; (b) Bulk request]

One final word concerning how to measure scale: Most network management providers claim that their management applications are scalable. Statements such as “supports millions of objects” are often made. But what does that mean? Do those objects consist of a Boolean true/false flag, or do they represent entire devices in the network? Would they be synchronized with the network resources that they represent up to the minute or once per week? Does the application require a supercomputer to run on, or will a PC do? Clear metrics, such as those in the list that follows, are required. Of course, to be comparable, claims for scale must all be based on clearly defined hardware configuration and system load:

■ Management operations throughput (per time unit, with stated assumptions on the nature of the operations, the number and complexity of parameters, and the number of network elements involved)

■ Event throughput (per time unit, maximum throughput [a burst over a short period of time] and sustained, raw receipt of events; or including some kind of processing, again with a predefined scenario)

■ Network synchronization capacity (for example, how many network elements an application can synchronize with—that is, retrieve information from—in a given unit of time)
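To make the event throughput metric more concrete, the following is a minimal sketch of how sustained and burst throughput might be derived from a log of event receipt times. The function name `event_throughput`, the one-second burst window, and the sample timestamps are assumptions chosen for illustration; they are not taken from any particular management product.

```python
from bisect import bisect_left


def event_throughput(timestamps, burst_window=1.0):
    """Return (sustained, burst) throughput in events per second.

    timestamps: event receipt times in seconds (hypothetical sample data).
    burst_window: length in seconds of the window used for the burst figure.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two events")
    duration = ts[-1] - ts[0]
    if duration <= 0:
        raise ValueError("events must span a nonzero interval")

    # Sustained rate: all events averaged over the whole observation period.
    sustained = len(ts) / duration

    # Burst rate: the busiest window [t, t + burst_window) that starts at an event.
    peak = 0
    for i, start in enumerate(ts):
        end = bisect_left(ts, start + burst_window)
        peak = max(peak, end - i)
    return sustained, peak / burst_window


# Four events arrive within 0.3 s, followed by two stragglers.
sustained, burst = event_throughput([0.0, 0.1, 0.2, 0.3, 5.0, 10.0])
```

Reporting both figures, together with the hardware configuration and load under which they were measured, is what makes two scale claims comparable.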

As a side note, it should also be mentioned that, in addition to scale from a technical standpoint, service providers and enterprise IT departments expect a management system to realize economies of scale. This means that the incremental network management cost to introduce more capacity and network elements to the network should get smaller with the size of the deployment. On the flip side, not only large scale, but also small scale can be an issue. For instance, before going to large-scale network deployments, field trials of much smaller scale generally are conducted to verify the soundness of a network solution. For these scenarios, it is important that the cost of the management solution does not become prohibitive.

Cross-Section of Technologies

Building network management systems involves many different technical areas, each requiring its own specific subject matter expertise. Therefore, a firm grasp of a wide array of technologies is required to build effective nontrivial network management systems. This makes network management a technically demanding discipline because it requires a significant amount of breadth in technical expertise.

Let us take a look at some of the technologies that are typically used in network management.

Information Modeling

The centerpiece of any management application is how the application domain is modeled—that is, how network devices, cards, ports, connections, users, services, and dependencies and relationships among them are represented. The resulting models are abstractions of the real world that management algorithms and network managers have to operate on. Ideally, management applications are model driven to a certain extent. This makes them easier to extend and maintain, which is very important, given the constant technical evolution of networks and services that need to be managed.
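As an illustration of what such a model can look like in code, here is a deliberately small sketch covering devices, cards, ports, and connections. The class names, attributes, and the `is_up` rule are assumptions made for this example; real models defined by standards bodies are far richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Port:
    name: str
    oper_up: bool = True  # operational state as last learned from the network


@dataclass
class Card:
    slot: int
    ports: List[Port] = field(default_factory=list)


@dataclass
class Device:
    hostname: str
    cards: List[Card] = field(default_factory=list)

    def ports(self) -> List[Port]:
        """Flatten the containment hierarchy: device -> cards -> ports."""
        return [p for c in self.cards for p in c.ports]


@dataclass
class Connection:
    """A relationship between two ports, possibly on different devices."""
    a_end: Port
    z_end: Port

    def is_up(self) -> bool:
        # Assumed rule for this sketch: a connection is up only if both ends are.
        return self.a_end.oper_up and self.z_end.oper_up


# Two hypothetical devices joined by one link.
p1, p2 = Port("ge-0/0"), Port("ge-0/0")
dev_a = Device("edge-1", [Card(slot=0, ports=[p1, Port("ge-0/1")])])
dev_b = Device("edge-2", [Card(slot=0, ports=[p2])])
link = Connection(p1, p2)
```

Because the connection refers to port objects rather than copying their state, a status change on either port is reflected immediately in `link.is_up()`, which is the kind of dependency the model is meant to capture.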

Successful information modeling requires expertise with object-oriented analysis and design techniques and methodologies, such as the Unified Modeling Language (UML). To avoid reinventing the wheel, it is helpful to be familiar with the many models that industry consortia and standards bodies have previously defined so that they can be leveraged. Perhaps most important are good modeling heuristics and plain common modeling sense. Modeling, like design, is a creative activity. Often there is no objective “right” or “wrong” way to model, but models surely differ in how adequate they are for a particular problem domain, which greatly affects how effective management applications ultimately are, and at what cost. This requires good technical judgment and a good sense for design trade-offs.

Databases

Management systems typically require persistent storage. For instance, they need to store configuration information with which to provision the network and services. Often they also cache information from the network. This way, they avoid having to query the network element each time the information is requested, which improves management application performance and scalability. In many cases, management applications also need to augment the information from the network with application-specific data, such as customer information, that is not of interest to (and, therefore, not kept in) lower-level systems and network devices. Of course, management systems generally leverage existing database management systems instead of developing their own custom ones. In addition, modern development tools shield application developers to a certain degree from database intricacies. However, aspects such as performance tuning (disk I/O is frequently a bottleneck) and the efficient mapping of information models that are often object oriented onto databases that are usually relational (rather than object oriented) still require familiarity with database technology.
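The mapping of an object-oriented model onto relational tables can be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database. The table layout, the column names, and the helper functions `save_device` and `load_ports` are assumptions made for this example: the containment relationship between a device and its ports becomes a foreign key.

```python
import sqlite3

# Hypothetical schema: one row per device, one row per port.
SCHEMA = """
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    hostname TEXT UNIQUE NOT NULL
);
CREATE TABLE port (
    id INTEGER PRIMARY KEY,
    device_id INTEGER NOT NULL REFERENCES device(id),
    name TEXT NOT NULL,
    oper_up INTEGER NOT NULL
);
"""


def save_device(conn, hostname, ports):
    """Store one device and its ports; ports is a list of (name, oper_up)."""
    cur = conn.execute("INSERT INTO device (hostname) VALUES (?)", (hostname,))
    device_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO port (device_id, name, oper_up) VALUES (?, ?, ?)",
        [(device_id, name, int(up)) for name, up in ports],
    )
    conn.commit()
    return device_id


def load_ports(conn, hostname):
    """Rebuild the port list of a device from the cached rows."""
    rows = conn.execute(
        "SELECT port.name, port.oper_up FROM port "
        "JOIN device ON device.id = port.device_id "
        "WHERE device.hostname = ? ORDER BY port.id",
        (hostname,),
    ).fetchall()
    return [(name, bool(up)) for name, up in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_device(conn, "edge-1", [("ge-0/0", True), ("ge-0/1", False)])
```

Answering `load_ports(conn, "edge-1")` from the local store rather than from the device is the caching behavior described above; how and when the cached rows are refreshed is left out of this sketch.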

Distributed Systems

By definition, management applications are distributed applications because they involve systems that manage and systems that are being managed. In addition to that, to meet requirements for scale as well as requirements for reliability and availability, it is often required to allow the managing system to be distributed itself. For instance, if a server runs out of horsepower to support a network of a given size, it is desirable for additional hosts to be added to increase management capacity. Likewise, reliability and availability requirements often extend from the network to the management systems, requiring a capability to fail over between systems, resulting in graceful degradation instead of a sudden failure of management capabilities. Maintenance requirements might require that individual systems be taken out of service, allowing others to take over their management duties. Similar requirements exist for the support of global management operations that follow the sun, shifting the main management load, for instance, among operations centers in Los Angeles, California; Barcelona, Spain; and Bangalore, India.

None of these requirements can be addressed simply through hardware. For instance, a reliable server does not protect against an outage resulting from, say, flooding of the building in which it is located or a terrorist attack. Likewise, there is typically a limit to the scale that can be addressed simply by using larger servers. Instead, these issues need to be addressed through software. Therefore, many management applications need to be architected as distributed software systems that can distribute and reassign processing load between servers that can be geographically distributed.
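One possible way to distribute and reassign management load in software is sketched below. It uses rendezvous (highest-random-weight) hashing, a technique chosen for this illustration rather than one prescribed by the text, and the server and element names are hypothetical. Each network element is managed by whichever live server scores highest for it, so when a server is taken out of service only its own elements move.

```python
import hashlib


def _score(server, element):
    """Deterministic pseudo-random weight for a (server, element) pair."""
    digest = hashlib.sha256(f"{server}|{element}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(elements, servers):
    """Map each network element to the live server with the highest score."""
    if not servers:
        raise ValueError("no management servers available")
    return {e: max(servers, key=lambda s: _score(s, e)) for e in elements}


elements = [f"ne-{i}" for i in range(100)]

# Normal operation with three management servers.
before = assign(elements, ["mgmt-a", "mgmt-b", "mgmt-c"])

# mgmt-b fails or is taken down for maintenance.
after = assign(elements, ["mgmt-a", "mgmt-c"])

# Elements whose managing server changed as a result of the failover.
moved = [e for e in elements if before[e] != after[e]]
```

Every entry in `moved` was previously managed by `mgmt-b`; elements on the surviving servers stay where they are, which keeps the disruption of a failover proportional to the capacity that was lost.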

Communication Protocols

By definition, management applications communicate with other systems—the network elements they manage, as well as possibly other management applications. At least as far as network elements are involved, this communication occurs using management protocols. Management protocols define the rules by which the systems that are involved in management communicate with each other. The technical properties of those communication mechanisms and their impact need to be well understood because they can have a profound influence on how management applications should be built. For example, is communication reliable, or can pieces of information get lost? How are pieces of information in the device identified and retrieved? What information throughput can be achieved? As with other networking applications, communications trade-offs need to be well understood to arrive at a sound overall system design.

For example, an event-oriented communication paradigm in which the management application can rely on the network element to inform it of any relevant events and changes in the network has an impact on the required complexity of network elements. In this case, network elements have to be capable of storing and retransmitting events in case they cannot be sent at the moment, they are lost, or their receipt is not confirmed. This is considerably harder than having the network element merely try to send an event and then forget about it, not knowing or caring whether it ever reached its destination. On the other hand, if a management application cannot rely on being automatically informed by network devices when something important happens, it must poll the device whenever it needs information about the network and find out by itself what, if anything, has changed. This results in higher management communications overhead and has implications for the management application’s capability to scale. After all, in many cases, nothing will have changed, meaning that much of the communication is wasted.
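The overhead difference between the two paradigms can be estimated with a small calculation. The sketch below is a simplified model under stated assumptions: each poll costs one request and one response, each event costs one notification and one acknowledgment, and the change times, observation period, and polling interval are hypothetical values.

```python
def polling_cost(change_times, duration, interval):
    """Return (messages, wasted_polls) when polling every `interval` seconds."""
    polls = int(duration // interval)
    wasted = 0
    for k in range(1, polls + 1):
        lo, hi = (k - 1) * interval, k * interval
        # A poll is wasted if nothing changed since the previous poll.
        if not any(lo < t <= hi for t in change_times):
            wasted += 1
    return 2 * polls, wasted  # each poll is a request plus a response


def event_cost(change_times, duration):
    """Return messages when the element notifies and the manager acknowledges."""
    return 2 * sum(1 for t in change_times if 0 < t <= duration)


# One hour of observation, two changes, one poll per minute.
changes = [130, 1900]
poll_messages, wasted = polling_cost(changes, duration=3600, interval=60)
event_messages = event_cost(changes, duration=3600)
```

In this scenario polling exchanges 120 messages, 58 of the 60 polls find nothing new, and the event-driven exchange needs 4 messages. The price of the event-driven approach is the store-and-retransmit logic in the network element, which this model does not account for.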

User Interfaces

Last in this list, but not least, human factors need to be considered. Networks can be of enormous scale and complexity. Hence, vast amounts of management information need to be visualized and navigated in an efficient manner. Consideration must be given to how to make operators efficient in performing their tasks: The user interface needs to make the operator productive, as measured, for instance, in terms of the number of operations performed per time unit or the number of network elements that a single operator can safely monitor, while preventing operational errors. In addition to human factors, there is the technical aspect that the user interface back end on a server must scale well. In many cases, hundreds of operators need to be supported simultaneously, requiring large amounts of information to be exchanged between server and user interface clients, to keep information that is displayed to operators up-to-date.

Figure 1-14 depicts a typical screenshot for a network management application GUI. The network and its topology are depicted on a map, with icons color-coded to immediately give an overview of the overall health of the network. Different ways to navigate the map and zoom into different portions are provided, including a listing of what’s in the network that follows a file explorer metaphor. Tabs are used to switch between tasks, and subscreens provide the user with the most recent noteworthy events in the network or the status of management tasks that were recently issued.

Figure 1-14 A Typical Screenshot of a Network Management Application

Other Considerations

In addition to the technologies that are required to build a management system, a good understanding of the managed technology itself is required, that is, of the managed network and services. Specifically, an understanding of what aspects are unique about the network and services that need to be managed is required, along with an understanding of what aspects are fairly generic and might be common to other managed technologies. For example, management of a voice network and management of an optical transport network have many aspects in common: topologies need to be displayed on a map, devices must be monitored for alarms, and inventory must be tracked. Other aspects are completely different. For example, the voice network requires management of the dial plan that allows voice calls to be directed to their destination according to the phone number dialed, whereas management of the optical network might involve managing how optical links that carry different wavelengths of light can be cross-connected.

Finally, an understanding and appreciation of the network provider’s workflow are required, along with an understanding of how the management system fits into the overall operational structure, that is, what the management system is intended for in the first place. A thorough understanding of the system’s purpose and how it fits into the larger context of overall network operations is of tremendous value because it facilitates prioritization between requirements and provides guidance when trade-offs between certain system aspects are required.