
SMPTE 35th Advanced Motion Imaging Conference Capital Hilton, Washington, D.C.


ABSTRACT

Managing and Monitoring a Data Broadcast Network

Presented by Sheila Joyce, Geocast Network Systems

A successful data broadcast network is highly reliant on a network management system that offers effective failure prevention and immediate detection and resolution. Achieving this efficiency in a network of diverse servers and broadcast devices, each with its own operating and event reporting systems, presents a challenge. The challenge is compounded by the need to provide a centralized monitoring and control station.

This presentation will review our network design and explain how we selected and integrated the components of our network management system to achieve a 99.99% network reliability goal.


Introduction

With the advent of data broadcast networks, the broadcast world and data communications world are merging, and both are feeling the growing pains. The broadcast industry is fresh from its campaign to meld digital television into an already complex analog network. The data communications industry is not yet finished with stretching its intricate networks of routers, servers, and switches to include broadband and wireless devices. Suddenly both are expected to weave ATSC into their fabrics, with utmost care for the bottom line. And network management engineers designing for these hybrid networks are expected to create seamless, highly reliable systems managed and operated by professionals from both worlds.

Geocast Network Systems builds a data broadcast network that provides a programming service and a distribution-agnostic infrastructure, which can be used to broadcast content over terrestrial digital television (DTV), direct broadcast satellite (DBS), and cable television. This paper reviews our network design—the foundation of our platform—and discusses the system we have implemented to manage it. Our goal for the network management system was to create a single operations center from which we could perform remote fault detection and correction with maximum efficiency.

The Geocast Platform

Geocast Network Systems delivers a mix of multimedia content to users equipped with a GeoBox, supplementary hardware designed for personal computers and other home devices. The content is supplied by Geocast partners: traditional broadcasters, film distributors, Internet content producers, advertisers, and software developers. Our Grab, Bag, and Tag (GBT) system converts this content into Geocast objects, which are transmitted by satellite from a National Data Broadcast Center (NDBC) to Local Data Broadcast Centers (LDBCs) and Satellite Data Broadcast Centers (SDBCs).

The NDBC is the operational center where Geocast content is stored and scheduled for transmission by satellite to the LDBCs. Geocast distributes the content by means of partnerships with DTV broadcasters, who supply underutilized terrestrial bandwidth in their DTV signals for Geocast objects. Each partner station hosts an LDBC, where Geocast objects are inserted into the station's DTV transport streams. The GeoBox receives its content over a digital TV signal, extracts the content from the digital signal, and displays it in a Geocast content screen on the PC. Geocast recently extended its platform to a DBS partner, who will supply satellite spectrum so Geocast objects can be broadcast from Satellite Data Broadcast Centers (SDBCs) directly into homes. Since 310M will not be required for DBS or for the migration to cable, ATSC issues will be nonexistent.


Grab, Bag, and Tag Processing

Content preparation specialists, using specially developed software, receive, digitize, and process multimedia content, such as MP3 files, graphics, games, and software, from Geocast partners. During the GBT process, content is compressed, tagged with metadata, and converted into Geocast objects. These objects are stored in a database on a content storage server, then sent to the NDBC.

NDBC Functions

The Geocast NDBC consists of servers running proprietary Geocast software, Oracle databases, IP routers and switches, and an IP data encapsulator. At the NDBC, objects are received by the staging server, tagged with encryption and data transport tags, and sent to the scheduling server. The scheduling server queues objects to be sent to the IP data encapsulator, which incorporates them into an ASI stream. The ASI stream is sent by satellite to the LDBC.

Oracle databases at the NDBC store information about objects and object transmission. They also store information about how content will be viewed on the GeoBox.

Figure 2 illustrates the NDBC.


LDBC and SDBC Functions

The LDBC consists of a variety of traditional broadcast, data broadcast, and data networking devices, mounted in a dual or single rack configuration (see Figure 3). The configuration is pre-racked at our system integrator’s facility and then installed at a partner station’s facility.

Before an LDBC or SDBC is activated, the system must pass a rigorous signoff process involving station engineers, Geocast engineers, and system integrators. After installation, Geocast network operations engineers resolve any software and configuration issues, and system integrators resolve any hardware issues requiring field service.

Some LDBC equipment, such as the environmental manager, is not necessary to the network from an operational standpoint. It serves the specific purpose of allowing network operators to troubleshoot and manage devices and software at remote locations without dispatching a technician or calling station personnel for assistance.


Figure 3. LDBC rack configuration, showing a primary rack and a redundant rack. Each rack holds an environmental manager, integrated receiver and decoder, IP router, IP switch, scheduling server, disk storage array, and broadcast multiplexer (LDBC only). The figure also shows a PID monitor, video patch panel, failover switch, video distribution amplifier, and uninterruptible power supply.


Integrated Receiver and Decoder

When the satellite signal reaches the LDBC or SDBC, it is received by an integrated receiver and decoder (IRD), which converts the stream to IP packets. The IP packets are stored on a disk array until the scheduling server selects and queues them for the multiplexer.

Broadcast Multiplexer

At the LDBCs, a broadcast multiplexer is used to calculate the number of null packets in the DTV station’s 310M transport stream; then it takes data from the scheduling server, replacing the null packets in the 310M stream with Geocast IP packets at the pre-assigned Geocast PID. From here, the packets continue downstream to the station’s transmitter. At an SDBC, the system is the same but uses DVB-ASI instead of 310M.
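The null-packet replacement described above can be sketched in a few lines. This is a minimal illustration, not the multiplexer's actual implementation: it assumes standard 188-byte MPEG-2 transport stream packets with the reserved null PID 0x1FFF, and the function names and the example PID 0x0100 are hypothetical. Continuity counters and bit-rate pacing, which a real multiplexer must handle, are ignored here.

```python
TS_PACKET_SIZE = 188   # bytes per MPEG-2 transport stream packet
SYNC_BYTE = 0x47       # first byte of every transport stream packet
NULL_PID = 0x1FFF      # PID reserved for null (stuffing) packets


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID from a transport stream packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def make_packet(pid: int, payload: bytes = b"") -> bytes:
    """Build a payload-only packet for illustration (continuity counter left at 0)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[:TS_PACKET_SIZE - 4]
    return header + body + b"\xff" * (TS_PACKET_SIZE - 4 - len(body))


def insert_data(stream, data_packets):
    """Replace null packets in `stream` with queued data packets, in order.

    Returns the new packet list and the number of null packets replaced.
    Station packets are passed through untouched.
    """
    queue = list(data_packets)
    out, replaced = [], 0
    for packet in stream:
        if packet_pid(packet) == NULL_PID and queue:
            out.append(queue.pop(0))
            replaced += 1
        else:
            out.append(packet)
    return out, replaced


# Hypothetical PIDs: 0x31 video, 0x34 audio, 0x0100 data service.
station = [make_packet(0x31), make_packet(NULL_PID),
           make_packet(0x34), make_packet(NULL_PID)]
merged, count = insert_data(station, [make_packet(0x0100, b"object data")])
```

Because only null packets are overwritten, the station's own PIDs keep their positions in the stream; any null packets left over when the data queue is empty simply remain as stuffing.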

IP Router

An IP router provides connectivity between the network operations center (NOC) and the LDBC over a Frame Relay connection. The router is connected to an IP switch, and each LDBC device is connected to a switch port. This connection scheme provides remote telnet access to each LDBC device from the NOC.

Environmental Manager

Each LDBC device is also connected to a port on the environmental manager, which provides an alternate means of remote access in the event that access through the router and switch fails. The environmental manager also allows console access to LDBC devices from the NOC and, in extreme cases, hardware restarts of the remote device.

Uninterruptible Power Supply

Because it is essential that all LDBC devices continue to function during a power failure, we integrated an uninterruptible power supply (UPS) device at each LDBC. All LDBC devices are plugged into the UPS, and the UPS is connected to the regular power source. If power fails, devices begin receiving power from the UPS battery, so they remain unaffected by the failure, and the UPS immediately alerts the network management system of the power failure.


PID Monitor

The PID monitor in the LDBC detects whether the Geocast PID and the station PIDs are in the transport stream. This function is critical to ensuring that the Geocast PID does not affect the station PIDs. If the station PIDs or the Geocast PID are not present for any reason, Geocast learns immediately.

Geocast monitors the bandwidth for four PIDs: the video PID, the audio PID, the pre-assigned Geocast PID, and the Program Association Table (PAT). When the bandwidth for the Geocast PID drops to zero, PID monitor software performs three functions:

• Executes a program to ensure that the primary broadcast multiplexer fails over to the secondary broadcast multiplexer.

• Prompts the sendmail feature to notify Geocast engineers immediately. Sendmail writes to a log file, which immediately generates an alert on the operator console. If the bandwidth for the video PID or PAT drops to zero, sendmail notifies the chief engineer of the DTV station.

• Allows the NOC to monitor the data rate of each PID to check for compliance with the station's service agreement.

For additional remote manageability, Geocast implemented Internet Information Services (IIS), which allows the NOC to view PID monitor output from all LDBCs on a single web interface.

The most critical function performed by the PID monitor is to monitor the station PIDs. Our first priority is to make sure that the television station is on the air at all times. Our secondary priority is to monitor the Geocast PID, to ensure the delivery of Geocast content.
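The decision logic of the PID monitor can be summarized as a small function. The sketch below is an assumption-laden illustration rather than the monitor's real software: the function name, the action labels, and the example PID values are hypothetical, and the measured rates are supplied as a plain dictionary. The only outside fact it relies on is that the PAT is carried on PID 0 in an MPEG-2 transport stream.

```python
PAT_PID = 0  # the Program Association Table is carried on PID 0


def pid_monitor_actions(rates_bps, geocast_pid, video_pid):
    """Return the actions to take for one set of per-PID bit-rate readings.

    rates_bps maps PID -> measured bits per second; a missing PID counts as 0.
    """
    actions = []
    # Station PIDs first: keeping the station on the air is the top priority.
    if rates_bps.get(video_pid, 0) == 0 or rates_bps.get(PAT_PID, 0) == 0:
        actions.append("notify_station_chief_engineer")
    # Then the data service PID.
    if rates_bps.get(geocast_pid, 0) == 0:
        actions.append("fail_over_to_secondary_multiplexer")
        actions.append("notify_geocast_engineers")
    return actions


# Hypothetical PIDs: 0x31 video, 0x0100 data service.
healthy = pid_monitor_actions({0: 15_000, 0x31: 12_000_000, 0x0100: 2_000_000},
                              geocast_pid=0x0100, video_pid=0x31)
data_lost = pid_monitor_actions({0: 15_000, 0x31: 12_000_000, 0x0100: 0},
                                geocast_pid=0x0100, video_pid=0x31)
```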

Failover Switch

If the primary broadcast multiplexer at an LDBC fails, the failover switch sends an alert to the secondary multiplexer via its daughter board. The secondary multiplexer immediately takes over the functions of the primary multiplexer, ensuring that service is uninterrupted.

The Challenge

The challenge of Geocast network design and operation fell to the Network Management System (NMS) team and the Network Operations Center (NOC) team. In a data broadcast network, the NOC performs the function of Master Control: it is responsible for maintaining end-to-end connectivity on the network.

The challenge for the Geocast NOC was made more difficult by an ambitious service level agreement goal, to achieve 99.99% reliability in our network. To achieve this level of service, the


Requirements

To arrive at a management solution for our network, the network management engineers carefully defined the requirements for the network management system. Then we selected hardware and implemented a suite of software tools that would enable the NOC to perform at the required level of efficiency.

We identified several key requirements for the network management system:

• Remote management from the NOC

• Immediate fault detection

• Easy-to-use interface

• A single operator console

• Fault tolerant architecture

• Flexibility

• Scalability

• Fault tracking and history

These requirements influenced our hardware selections and the software we integrated into our tools suite. The details are explained in the sections that follow.

The Software Tools Suite

Because the Geocast network connects equipment that is geographically scattered across the continent, we developed a software tools suite to support a centralized network management system. Traditional NMS software is at the heart of our implementation.

SNMP Compliant Software

We chose HP Openview Network Node Manager (NNM) as our underlying software platform because it meets many criteria on our requirements list. NNM implements the Simple Network Management Protocol (SNMP), an industry-standard network management protocol that uses an agent-manager model for communication between devices. SNMP agent software on each device collects and stores relevant status and operations information, sending traps to the management station to report any exceptions.


Out-of-Box Functions

The NNM management station provides immediate fault detection by receiving and reporting events sent by SNMP agents on devices across the network. At configurable intervals, it polls the agents for availability and basic connection information, discovers new devices on the network, and places them in their accurate location on a geographic network map that is part of the NNM graphical user interface (GUI). It also tests device thresholds for conditions such as utilization patterns and errors, and presents alarms when thresholds are exceeded.
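Threshold testing of polled values can be illustrated with a short sketch. It does not use or imitate the NNM interface; the function name, metric names, device names, and limits are all hypothetical, and the polled samples are passed in as a dictionary rather than collected over SNMP.

```python
def threshold_alarms(samples, thresholds):
    """Compare polled values with configured limits.

    samples:    {device: {metric: value}} from the latest polling cycle
    thresholds: {metric: limit}; a value above its limit raises an alarm
    Returns a list of (device, metric, value, limit) tuples.
    """
    alarms = []
    for device in sorted(samples):
        for metric, value in sorted(samples[device].items()):
            limit = thresholds.get(metric)
            if limit is not None and value > limit:
                alarms.append((device, metric, value, limit))
    return alarms


# Hypothetical polling results and limits.
polled = {
    "ldbc1-router": {"if_utilization_pct": 91, "if_errors_per_min": 0},
    "ldbc1-switch": {"if_utilization_pct": 40, "if_errors_per_min": 12},
}
limits = {"if_utilization_pct": 80, "if_errors_per_min": 5}
alarms = threshold_alarms(polled, limits)
```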

The polling, thresholding, and mapping features can be used out-of-box. NNM software does not require a time-consuming configuration step after it is installed.

Downstream Suppression

Downstream event suppression was a key requirement for our software. When a single network device has a problem, it often affects several downstream devices. This single event results in numerous related events, some of which provide duplicate information, on the network management console. To direct operator attention quickly to the true source of a problem, NNM suppresses the related downstream events, which speeds up problem isolation so corrective actions can begin sooner.
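One simple way to picture downstream suppression is to keep only those failed devices that have no failed device upstream of them. The sketch below illustrates that idea under stated assumptions; it is not how NNM is implemented, and the function name, the topology map, and the device names are hypothetical.

```python
def root_cause_events(down_devices, upstream_of):
    """Return the failed devices that have no failed device upstream.

    down_devices: iterable of device names currently reported down
    upstream_of:  {device: next device toward the management station}
    """
    down = set(down_devices)
    roots = []
    for device in sorted(down):
        parent = upstream_of.get(device)
        visited = set()
        suppressed = False
        # Walk toward the management station; stop on a failed ancestor.
        while parent is not None and parent not in visited:
            if parent in down:
                suppressed = True
                break
            visited.add(parent)
            parent = upstream_of.get(parent)
        if not suppressed:
            roots.append(device)
    return roots


# Hypothetical LDBC topology: router -> switch -> individual devices.
topology = {"switch": "router", "ird": "switch", "multiplexer": "switch"}
roots = root_cause_events(["ird", "switch", "router"], topology)
```

With the router, switch, and IRD all unreachable, only the router is reported, which is where an operator would begin troubleshooting.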

Scalability

As our network grows, our network management system must accommodate growth gracefully. It must scale in its ability to minimize the demand for network bandwidth, and additional LDBCs and new devices must be manageable with as little operator intervention as possible. We feel confident that HP Openview NNM meets these criteria.

User Interface

The number of devices on our network could potentially generate an unmanageable number of status alerts on the network management console. So we need network management software that sifts through large numbers of event messages, creating clear, real-time pictures of the network that make any problem recognizable immediately. We chose Netcool OMNIbus to provide this function. Netcool software uses underlying NNM software and provides the NOC with an easy-to-use, highly customizable user interface.

Event Viewer

The key end-user component of the tools suite is the event viewer. In networks where many different devices are managed remotely, network operators frequently spend a great deal of time referring to several different management consoles. The Geocast implementation of the Netcool event viewer offers a single user interface from which all devices on the network can be monitored and managed. It displays a list of color-coded network events ranging from warnings


when resolution occurs; and event de-duplication, which consolidates multiple reports of the same event into a single report containing the first, last, and number of occurrences of the event. The event list displays a filtered view of alerts, which can be defined by the operator or administrator. This ensures that each operator sees only events that are relevant to his or her specific management tasks. Netcool has powerful rules files that can be modified to allow customization of any kind of event. The administrator is able to customize the severity, text, and actions that are associated with any type of event, so that the source and meaning of the event are clear and concise to the NOC.
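Event de-duplication as described here can be sketched as a merge keyed on the event's identity. The following is an illustration only, not Netcool's rules-file syntax or data model; the function name, the choice of (node, summary) as the identifying key, and the sample events are assumptions.

```python
def deduplicate(events):
    """Consolidate repeated reports of the same event.

    events: iterable of (timestamp, node, summary) tuples, in any order.
    Returns one record per (node, summary) pair with the first and last
    occurrence times and the number of occurrences.
    """
    merged = {}
    for timestamp, node, summary in events:
        key = (node, summary)
        record = merged.get(key)
        if record is None:
            merged[key] = {"node": node, "summary": summary,
                           "first": timestamp, "last": timestamp, "count": 1}
        else:
            record["first"] = min(record["first"], timestamp)
            record["last"] = max(record["last"], timestamp)
            record["count"] += 1
    return list(merged.values())


# Hypothetical raw events (timestamps in seconds).
raw = [
    (100, "ldbc1-router", "interface down"),
    (130, "ldbc1-router", "interface down"),
    (160, "ldbc1-router", "interface down"),
    (140, "ldbc2-ups", "on battery"),
]
rows = deduplicate(raw)
```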

Help Desk Software

Remedy software provides a single solution that meets our requirements for an asset database, trouble ticketing system, and change management system. The highly customizable fields in all our Remedy components made it possible to create tools that allow fast information entry and retrieval for remote devices.

Asset Database

The NOC needs accurate, up-to-date information about every device on the network to resolve problems, so a critical part of a network management solution is an asset database. When calling vendors for support, the NOC needs the serial number, vendor information, location of the device, and its IP address and hardware address. This information must be retrieved quickly and easily without onsite assistance at equipment locations.

Trouble Ticketing System

The trouble ticketing system plays a critical role in providing the NOC all the information they need to resolve problems successfully. Each time a new problem occurs, the NOC creates a trouble ticket to document the problem and steps taken to resolve it.

Trouble tickets are stored in a database. If the NOC needs to review the history of problems and problem resolution for a device, they enter the name or IP address of the device to retrieve all associated trouble tickets.

Trouble tickets also provide a description to the next operator who inherits the problem, so efforts are not duplicated. When a problem is escalated, the trouble ticket provides the point of escalation with all necessary details about the problem.

Change Management Software

Many network problems are prevented when formal change management procedures are followed. A key tool in successful network management is software that notifies the NOC of planned changes, testing, or maintenance occurring on the network, and tracks the event.


Although change management software can facilitate sound procedures, it is not effective on its own. It must be accompanied by a commitment from all levels of the organization that the change management process will be enforced when any testing, maintenance, removal or addition of devices, software upgrades, or other activity relevant to the network occurs. The importance of the process cannot be over-emphasized: it is a key to preventing unnecessary outages on the network, and to keeping the NOC informed of and prepared for as many network-related events as possible. When device outages occur, the NOC must know of past and current changes, because this information is often the key to understanding and resolving network problems.

Lightweight Monitoring Agent

Our network monitoring agent must accommodate changes as NOC requirements change, and as new types of equipment are added to our network. We also need a monitoring agent that accommodates non-SNMP devices and Geocast proprietary software. Instead of designing a Geocast-proprietary monitoring agent, we purchased SystemEDGE, a lightweight monitoring agent that is highly configurable.

A lightweight management agent consists of a small C program that is inexpensive, requires few system resources, and has built-in SNMP capabilities. SystemEDGE allows the NOC to monitor and manage our proprietary NDBC and LDBC processes by monitoring information and error messages written to server log files.

Out-of-box, SystemEDGE gives us built-in system level monitoring capabilities that enhance existing agents on UNIX and Windows devices in our network. SystemEDGE runs on all systems where memory, critical processes, and log files must be monitored:

• IP encapsulator, to ensure that the device and its critical processes are running.

• Integrated receiver and decoder, to monitor file systems and process level conditions.

• PID monitor, to read the system log error and send alarms to the NOC when the PID is not present.

• Scheduling and other servers, to monitor proprietary processes, log error conditions, and check CPU, swap, and file system usage.
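Log file monitoring of the kind listed above amounts to matching new log lines against configured patterns and raising an alarm for each match. The sketch below shows the idea in generic form. It is not SystemEDGE configuration syntax; the function name, alarm names, patterns, and log lines are hypothetical.

```python
import re


def scan_log(lines, patterns):
    """Match log lines against named patterns.

    lines:    iterable of log lines
    patterns: {alarm_name: regular expression}
    Returns a list of (alarm_name, line_number, line) for every match.
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, rx in compiled.items():
            if rx.search(line):
                hits.append((name, number, line.rstrip("\n")))
    return hits


# Hypothetical log content and alarm patterns.
log_lines = [
    "12:00:01 scheduler: queue depth 14\n",
    "12:00:07 pidmon: ERROR PID not present\n",
    "12:00:09 scheduler: ERROR cannot open object store\n",
]
alarm_patterns = {
    "pid_missing": r"pidmon: ERROR PID not present",
    "scheduler_error": r"scheduler: ERROR",
}
hits = scan_log(log_lines, alarm_patterns)
```

In a deployed agent each hit would be forwarded to the management station, for example as an SNMP trap, rather than returned in a list.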

Remote Display System

To access devices that are not accessible by using telnet, we implemented a shareware application called Virtual Network Computing (VNC). This small, simple program is platform-independent, and has the flexibility to be used on a variety of machine architectures. Using VNC, NOC staff can view from their desktop any device where VNC is installed. This capability has been invaluable for monitoring devices that are not otherwise accessible from a remote location, such as the IRD and broadcast multiplexer.


Database Monitoring Agent

The Geocast network includes several servers running databases: content servers, staging servers, and scheduling servers, for example. These databases have monitoring needs that require specialized software and monitoring agents.

As with the lightweight management agent, Geocast software engineers could have written proprietary database monitoring agents, but we decided to purchase and implement off-the-shelf software: BMC Patrol. We implemented a gateway between BMC Patrol and our Netcool software, so alerts from the database monitoring agents appear on the central network management console. No separate management console is needed to retrieve database status.

Automatic Paging Software

The Geocast NMS team chose the Telamon TelAlert system for an automated paging software solution. It integrates well with Remedy’s Trouble Ticketing system and with Netcool. In addition, TelAlert can pass messages to support personnel through an assortment of media, including pager, voice mail, and email. Recipients of these messages can respond remotely—and in some cases, resolve the problem—using a two-way pager or a telephone.

The paging system offers a large selection of configuration options for stipulating who receives which types of messages, which notification medium is used in different circumstances, and the escalation process to follow based on variables like area of expertise, priority, and schedule. Automatic paging is a tremendous timesaver, and offers the added advantage of enforcing our escalation procedures.

Tool Integration

The Geocast NMS team selected each software component to best meet the requirements of the NOC. However, these software components working in isolation were not enough to meet our efficiency targets; we needed to integrate these tools for NOC personnel to perform successfully. The objective for our integration design was to minimize reporting and tracking tasks so the NOC can focus on operational tasks.

Paging and Trouble Ticketing Systems

The NMS team integrated the automatic paging system with the trouble ticketing system, so that the paging system automatically takes action when a trouble ticket is opened.

The paging system acknowledges receiving the incident and escalates the problem. Using contact information entered on the trouble ticket from the asset database, the paging system determines whom to page at a particular stage of an outage. Any action taken by the paging system is logged into the ticket. This frees the NOC from making administrative decisions during escalations, when time is of the essence and service to our customers is at risk.
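The stage-based choice of whom to page can be pictured as a lookup into an ordered escalation chain. The sketch below is a deliberately simplified illustration, not TelAlert or Remedy configuration; the function name, the contact roles, and the fixed 15-minute interval per escalation level are assumptions.

```python
def who_to_page(escalation_chain, minutes_open, minutes_per_level):
    """Pick the contact for the current stage of an outage.

    escalation_chain:  contacts in escalation order (first is paged first)
    minutes_open:      whole minutes since the trouble ticket was opened
    minutes_per_level: whole minutes spent at each level before escalating
    The last contact in the chain is used once the chain is exhausted.
    """
    level = min(minutes_open // minutes_per_level, len(escalation_chain) - 1)
    return escalation_chain[level]


# Hypothetical chain taken from the contact fields of a trouble ticket.
chain = ["noc_operator", "noc_lead", "engineering_manager"]
first_contact = who_to_page(chain, minutes_open=0, minutes_per_level=15)
second_contact = who_to_page(chain, minutes_open=20, minutes_per_level=15)
final_contact = who_to_page(chain, minutes_open=120, minutes_per_level=15)
```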

Change Management, Asset Database, and Trouble Ticketing

Integration of the change management software, the asset database, and trouble ticketing system allows the NOC to pull up an asset record or trouble ticket for a device and see its change history.


can be reached if further information is needed. And change management software is more user friendly because it automatically pulls information about a device into the change record, so manual entries are not needed.

Trouble Ticketing and Event Viewer

The integration of the trouble ticketing system with the event viewer allows trouble tickets to be opened automatically when critical network events occur and closed automatically when the event is resolved. Because the trouble ticket system is integrated with the asset database, the ticket is automatically populated with all database information about the device associated with the alert, so the NOC operator is not required to look in a separate location or open a separate window for the information. When NOC operators open a new ticket, they simply select the device and the ticket is automatically populated with all the asset information for the device.
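The automatic open, populate, and close cycle can be sketched as a small event handler. This is an illustration under stated assumptions, not the Remedy or Netcool interface; the function name, the event and ticket fields, and the sample asset record are hypothetical.

```python
def apply_event(tickets, assets, event):
    """Open or close a trouble ticket in response to one network event.

    tickets: {node: ticket dict}, updated in place and returned
    assets:  {node: asset record} standing in for the asset database
    event:   {"node": ..., "severity": ..., "state": "raised" or "cleared"}
    """
    node = event["node"]
    current = tickets.get(node)
    if event["state"] == "raised" and event["severity"] == "critical":
        if current is None or current["status"] == "closed":
            # New ticket, pre-populated from the asset record for the device.
            tickets[node] = {"status": "open", "asset": dict(assets.get(node, {}))}
    elif event["state"] == "cleared" and current is not None:
        current["status"] = "closed"
    return tickets


# Hypothetical asset record (documentation-range IP address).
asset_db = {"ldbc1-router": {"ip": "192.0.2.1", "serial": "EXAMPLE-001"}}
open_tickets = {}
apply_event(open_tickets, asset_db,
            {"node": "ldbc1-router", "severity": "critical", "state": "raised"})
status_after_raise = open_tickets["ldbc1-router"]["status"]
apply_event(open_tickets, asset_db,
            {"node": "ldbc1-router", "severity": "critical", "state": "cleared"})
status_after_clear = open_tickets["ldbc1-router"]["status"]
```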

Event Viewer and Adjunct Tools

The event viewer is customized so that when an operator clicks on an event, adjunct tools and applications related to troubleshooting the event are available directly from the event viewer. For example, if an IP router interface fails, telnet, ping, and traceroute tools are automatically available in the event viewer.

Fault Tolerance

A fault tolerant system is critical to minimizing service interruptions and meeting the goal of 99.99% network uptime. If a network management system fails anywhere on the network, a means of failover to a redundant backup system must exist. The solution must include reliability features that ensure that no network problem goes unrecorded in the event of an NMS server failure.
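To make the 99.99% goal concrete, it helps to convert it into an allowed-downtime budget. The short calculation below is a worked example, assuming the target is measured over a 365-day year; the function name is hypothetical.

```python
def downtime_budget_minutes(availability, period_days=365):
    """Minutes of downtime allowed in the period at the given availability."""
    return (1.0 - availability) * period_days * 24 * 60


yearly = downtime_budget_minutes(0.9999)                   # about 52.56 minutes
monthly = downtime_budget_minutes(0.9999, period_days=30)  # about 4.32 minutes
```

A 99.99% target therefore leaves under an hour of total outage per year, which is why automated detection, failover, and escalation matter more than manual response.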

The Geocast implementation consists of a fully redundant server for the Netcool and HP Openview servers. In addition, at configurable checkpoint intervals, event data on the Netcool server is copied to disk to ensure integrity. With this scheme, the backup object server automatically has the latest event database information in the event of a failure.


Ongoing Challenges

Geocast network management engineers still face the following challenges:

• Keeping pace with changes in hardware technology and our proprietary software.

• Enhancing the monitoring infrastructure to include additional DBS equipment.

• Extending education and enlisting cooperation for change management procedures throughout Geocast.

• Continuing, as an NMS team, to develop our understanding of digital television broadcast technology, and keeping up to date with continuing developments in the technology.

• Accommodating customers’ special needs as they arise.

Conclusions

The Geocast NMS team has successfully created and implemented a network management system for a blend of devices that not only meets the stringent requirements of today’s data broadcasting networks, but is also cost effective. By selecting and configuring hardware with a clear definition of NOC requirements in mind, we have implemented a fault-tolerant system, co-located but safely isolated from the host broadcast network, that can be managed effectively from a central location. By integrating a disparate collection of software under one umbrella, we have created a single user interface that allows immediate fault detection and response, enforces sound network operating practices, and is proving economical to implement and operate.
