Validating the Accuracy of ISP Subscriber Traffic Usage Meters


Peter Sevcik, President, NetForecast, Inc., November 15, 2011

Most major broadband ISPs are in some phase of deploying a subscriber usage meter to support the ISP’s consumption management strategy. Subscribers, consumer groups, and regulatory agencies are questioning the accuracy of these meters, and there have been a number of incidents in which inquisitive users have uncovered real and/or perceived meter accuracy issues. This has generated bad press for ISPs. An independent audit of the ISP’s meter can assess and help assure meter accuracy, and provide a responsible and defensible way to address meter accuracy concerns.

Why Meter Accuracy Matters

Meters are being rolled out to support “excessive use” policies, bandwidth enforcement actions, and in some cases, usage-based billing. All of these measures are highly controversial with consumers and consumer advocacy groups, and they are beginning to garner the attention of regulatory bodies. Consumers are typically confused by the notion of usage for their broadband service. The typical user knows little about how much bandwidth his or her digital devices and applications consume. Additional variables such as the number of users in a home and their Internet behavior make understanding usage volumes even more challenging for consumers. The ISP usage meter is the primary information tool for a consumer to manage household bandwidth consumption. An inaccurate meter is misleading to the consumer, and erodes the legitimacy of any usage-related action by the ISP.

The Internet is built on an end-to-end principle that places control of a connection into consumer devices. Many consumers are smart and enjoy tinkering with the part of the Internet that resides in their homes, and savvy consumers will study the ISP’s meter. The meter must, therefore, withstand the scrutiny of smart network practitioners with simple measurement tools. ISPs must assume that any meter will be examined by watchdog individuals or groups. The watchdogs are often vocal, and leverage the Internet, blogs, and social media to publicly challenge ISPs.

ISPs have already encountered a public backlash due to inaccurate meters, in some cases causing the ISP to withdraw the meter. Simply denying the charges in a public relations response can worsen the problem. The best reaction is to acknowledge and correct any meter inaccuracy and publicly explain the correction. The wisest approach is to be transparent about how the meter works and provide solid evidence of its accuracy. Stonewalling critics with silence is the most dangerous approach because it can cause the problem to fester and explode into a public battle.

An example of a public maelstrom is documented in the March 29, 2011 online DSL Reports website publication entitled “Taking a Closer Look at AT&T's Inaccurate Usage Meters: AT&T Meter Is Wildly Over-Estimating Customer Usage” [1] and the subsequent coverage.

Measuring Bandwidth Usage Is Difficult

The Internet was designed to scale by providing only the most essential switching and transport functions to move datagrams from source to destination. Counting traffic is not in the Internet’s DNA. Adding detailed traffic counters to a large packet-switched network requires a complex overlay system of hardware and software technologies that must operate in real time and scale to support billions of transactions per hour. There are a variety of usage measurement approaches from in-band counters to out-of-band deep packet inspection. The systems must be built using components supplied by different vendors. In many cases, critical parts of the solution are still in development by the vendor, and are not ready for prime time. In other words, most systems are not yet mature enough to be problem free.

A meter system imposes a new set of requirements on the network management structure. The operational support system (OSS) is primarily a one-way command system from the operations center to the field that operates in “exception” mode (e.g., for use when things are to be installed or repaired). In contrast, a meter system generates a reverse flow of data and control parameters that move from the field to a central system. The staff and systems are not designed or sized to handle high volumes of data in the reverse direction.

Typical Meter System Architecture

This paper describes the typical meter system for cable-based broadband ISPs. However, all of the methodology can be applied to DSL and fiber-based ISPs even though they may use different basic traffic counters.

The typical cable-based ISP meter system follows an information flow from left to right as shown in the figure below. Implementations vary by ISP and by the vendors the ISP uses to support the steps in the flow. The intent, however, is the same: to show subscribers how much traffic they have consumed in the recent reporting period.

The flow from a CMTS to a meter report displayed on a web portal requires the following stages of information processing and refinement.

CMTS

Subscriber traffic is measured by the CMTS for each cable modem it serves. The CMTS keeps separate, incrementing counters for traffic traveling upstream (i.e., from the subscriber to the Internet) and downstream (i.e., from the Internet to the subscriber). It is critical to understand how each CMTS counts and assigns traffic to service flows. [2] The status and value of the counters are periodically reported in an Internet Protocol Detail Record (IPDR). Each CMTS periodically sends the IPDR for each cable modem (typically every 15 minutes, depending on the CMTS manufacturer).

IPDR Collector

IPDR records are sent by the CMTS to a collector in the first information processing step. Collectors receive IPDRs from several CMTS devices, often in a geographic region. The collectors perform initial IPDR data validation and serve as temporary storage for the records. The collector must be ready to accept the IPDRs sent by each CMTS in its region. There is no room for IPDRs to be missed when a CMTS starts transmitting.

IPDR Aggregator

The next stage is a set of servers that integrate the IPDR data from a group of collectors. Aggregation starts by converting the incremental counters sent by the CMTS into explicit traffic counts for each period represented by the difference in the IPDR time stamps. At this stage, the IPDR records with a variety of time stamps (based on each CMTS configuration and decision to checkpoint counters and send IPDRs)

[Figure: meter system information flow, left to right: CMTS (IPDR), IPDR Collection, IPDR Aggregation, Mediation Engine, Data Warehouse (Meter Record), Web Portal.]

contribute to the usage meter. These assignments may be complex and change as an ISP adds or modifies service offerings. [3]
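The conversion of incrementing counters into per-period traffic counts described for the aggregator can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the `CounterSample` type and `period_counts` function are hypothetical names, and treating a counter that moves backwards as a reset to zero is an assumption (a real aggregator may also need to handle counter wrap-around).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One cumulative counter reading taken from an IPDR (hypothetical shape)."""
    timestamp: int  # seconds, from the IPDR time stamp
    octets: int     # cumulative octet counter reported by the CMTS


def period_counts(samples):
    """Turn cumulative counter samples into (start, end, octets) per period."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    periods = []
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur.octets - prev.octets
        if delta < 0:
            # Assumption: a decreasing counter means the CMTS or modem reset,
            # so only the traffic counted since the reset is attributed.
            delta = cur.octets
        periods.append((prev.timestamp, cur.timestamp, delta))
    return periods


samples = [
    CounterSample(0, 1_000),
    CounterSample(900, 4_000),    # 15 minutes later
    CounterSample(1_800, 500),    # counter went backwards: treated as a reset
]
counts = period_counts(samples)
print(counts)
```

Traffic sent between the last sample before a reset and the reset itself is invisible to this approach, which is one way a reset can distort the meter value.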

Mediation Engine

In the next stage, a mediation engine associates hourly traffic data, often identified by the subscribers’ cable modem MAC address, with the subscribers’ account number. The hourly data is rolled up into daily and/or monthly traffic volumes. The mediation engine is the first place where the traffic volume can be checked against a usage policy.
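A minimal sketch of the mediation step, assuming hourly records arrive as `(mac, hour_start, octets)` tuples and that a lookup table from cable modem MAC address to account number is available. The function name, record layout, and account identifiers are illustrative assumptions only.

```python
from collections import defaultdict
from datetime import datetime


def roll_up_daily(hourly_records, mac_to_account):
    """Associate hourly traffic with accounts and roll it up into daily totals."""
    daily = defaultdict(int)
    unmatched = []
    for mac, hour_start, octets in hourly_records:
        account = mac_to_account.get(mac.lower())
        if account is None:
            # Keep records that cannot be tied to an account for later review
            # instead of silently dropping them.
            unmatched.append((mac, hour_start, octets))
            continue
        daily[(account, hour_start.date())] += octets
    return dict(daily), unmatched


mac_to_account = {"00:11:22:33:44:55": "ACCT-1001"}  # hypothetical account id
hourly_records = [
    ("00:11:22:33:44:55", datetime(2011, 11, 15, 13), 500),
    ("00:11:22:33:44:55", datetime(2011, 11, 15, 14), 700),
    ("00:11:22:33:44:55", datetime(2011, 11, 16, 0), 300),
    ("AA:BB:CC:DD:EE:FF", datetime(2011, 11, 15, 13), 900),  # unknown modem
]
daily, unmatched = roll_up_daily(hourly_records, mac_to_account)
print(daily)
print(unmatched)
```

Retaining unmatched records makes it possible to see when traffic was counted by the CMTS but never reached a subscriber's meter.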

Data Warehouse

After the meter information is finalized by the mediation engine, it is passed on to a “data warehouse” or some other general ISP business and operations database. The data warehouse stores the official meter records for use by other systems. This is a historic repository that may be used by engineering for planning, management for business intelligence data mining, etc.

Web Portal

The data warehouse feeds the usage information to a web portal from which subscribers view their current consumption statistics. This web application operates a secure subscriber interface that includes other account management functions. When the subscriber clicks on the meter page, the portal accesses recent meter records at the data warehouse and displays current usage, historic usage, and often a usage graphic. The meter data is often annotated to help consumers who are not familiar with terminology or how the usage meter works.

Internal ISP Meter Testing Often Falls Short

Getting a usage meter system to be accurate, operate reliably, and scale to cover all of the ISP’s network technologies is a formidable challenge. Even though ISPs manage these efforts with appropriate system engineering processes, meter errors and failures still occur.

As the above description shows, meter systems are complex, and they require substantial implementation effort. Multiple groups or contractors often work on each system element, and each element is subject to its own specification, development plan, and acceptance criteria. This approach inevitably leads to meter system implementations managed by a number of distinct organizational silos. To make matters worse, the six major functional elements described above are often managed separately and supplied by different vendors. If an ISP slices the meter effort into the maximum of six silos, then the risk of the system not working properly is much greater. In our experience, successful meter implementations have consolidated the effort into three different functional areas requiring only three project teams.

Each project team tests its portion of the overall system to prove it is working. Unfortunately, this approach creates a classic silo management trap because no one is responsible for testing the system end-to-end to ensure that the meter works from the customers’ point of view. The customer sends traffic through the cable modem and looks at the meter on the web portal. He or she has no idea that there are six subsystems involved between the two events.

We consistently hear major ISPs describe plans to test each subsystem separately, mistakenly believing that testing each subsystem separately will be sufficient to ensure the system works end to end. In our experience, a well-designed, end-to-end test is the only way to know if a meter system works properly. Of course subsystems must be tested individually, but once this is done, end-to-end testing is essential. We have encountered consistent problems when ISPs rush to launch a meter and leave insufficient time for final “real system” testing. This flawed approach is guaranteed to lead to embarrassing failure.


Meter Accuracy Validation Methodology

To fully assess and correct meter accuracy, it is important to test the meter system end to end. A proper meter accuracy validation project will assure a successful meter rollout. Meter accuracy validation should be done in several phases, each of which is best performed by an organization that is independent of the implementation groups. Validation testing is an audit function best done by an independent authority to ensure impartiality and avoid conflict of interest.

There are three basic meter validation phases. The first is the creation of a formal description of what is in the meter report that the subscriber will read. The description does not mention any of the meter system elements, but rather the relationship of the subscriber traffic and what will appear on the meter report. Next is an end-to-end test that validates actual meter reports to the meter description. Validation answers a simple question: “Does the subscriber meter report accurately match what was defined to be in the report?” Finally, the results are documented in a public report to educate subscribers and the press should questions arise once the meter is launched. Below we describe the three phases of a successful meter validation project.

Meter Specification

Broadband usage meters are complex and easily confuse consumers. When planning a usage meter implementation, it is critical to define key meter attributes. Following is a framework NetForecast developed to help ISPs specify and describe meter operation. We use this framework at the beginning of each meter accuracy validation project to specify what will be validated.

The meter specification, which is jointly developed by the auditor, developers, and other ISP stakeholders at the beginning of each project, defines the following meter characteristics:

Traffic Counted: Specific traffic is counted as defined here: subscriber payload by service class (up and down); protocol overhead (up and down) is identified; background traffic (non subscriber-generated traffic) is identified. The non-subscriber background traffic is less than x up and y down.

Granularity: The mathematical base of the meter value is defined and the value is presented to the subscriber within x units (digits) over a defined time period (e.g., hour, day, month).

Error Bounds: Usage data supplied to the subscriber is +/- x% accurate per the definition of what is counted within the defined granularity by the meter system.

Timeliness: The subscriber's view of their meter updates on the subscriber's "my account" portal within x hours of subscriber usage activity and represents traffic counted across a specific time span in a specified time reference (i.e., UT, local time).

Accessibility: The subscriber view of the meter is easy to find within x clicks from the ISP’s service home page.

Availability: The subscriber view of the meter is available x% of the time.

Clarity: The description of the usage meter on the "my account" portal is easy to understand by a typical subscriber with click-through to more information as appropriate.

[Figure: the meter system elements between Subscriber Traffic and the Subscriber View: CMTS, IPDR Collection, IPDR Aggregation, Mediation, Data Warehouse.]

The specification should be a public document that describes how an ISP’s meter works and how to interpret the usage data presented to the subscriber. The document provides key background information to a subscriber who wishes to test meter accuracy. What a subscriber may perceive as a meter inaccuracy may in fact be an incomplete understanding of how the meter works and how to appropriately compare their measurements with those recorded by the meter.

Independent Validation of the Meter Specification

The independent auditor validates whether the meter does what the meter specification says it will do. It is important to note that the auditor has no stake in what the meter specification says. If the specification states that the usage measurements will show up only once per month and may be in error +/- 20%, then that is what the ISP has committed to deliver. If the end-to-end validation of the meter agrees with the statement, then the meter is certified as meeting the ISP’s specification.
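The comparison against the specification's error bound can be illustrated with a short sketch. The function names are hypothetical; the sign convention follows this paper's later definitions (negative error is under-reporting, positive error is over-reporting).

```python
def percent_error(meter_bytes, reference_bytes):
    """Signed meter error relative to the auditor's reference measurement."""
    if reference_bytes <= 0:
        raise ValueError("reference traffic must be positive")
    return 100.0 * (meter_bytes - reference_bytes) / reference_bytes


def meets_specification(meter_bytes, reference_bytes, bound_pct):
    """True when the error falls within the +/- bound the ISP specified."""
    return abs(percent_error(meter_bytes, reference_bytes)) <= bound_pct


# A meter specified at +/- 20% passes with a 15% error and fails with 25%.
print(meets_specification(1_150, 1_000, bound_pct=20))
print(meets_specification(1_250, 1_000, bound_pct=20))
```

The check is deliberately indifferent to whether the bound itself is a good one: it only tests whether the meter delivers what the specification promises.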

The auditor’s end-to-end test of the meter system must be carried out using an independent system dedicated to the task. It must stress the meter system’s accuracy and therefore must be capable of a high degree of accuracy. It must also be able to generate traffic of sufficient volume to impact the values, or “move the meter,” on the ISP’s reports. The auditor’s system should reflect the architecture shown in the following figure.

[Figure: audit test architecture. Precise reference traffic transfers run between a test server on the Internet and the subscriber side of the ISP's cable link. The traffic observed by the auditor is compared with the traffic counted by the ISP meter system (CMTS, IPDR Collection, IPDR Aggregation, Mediation, Data Warehouse) as it appears in the meter record and meter report. The Meter Specification defines subscriber traffic in the meter report. The validation: do the ISP meter data (green) accurately match the independently measured traffic (red) as defined by the Meter Specification?]

The end-to-end tests transfer reference traffic measured to a degree of accuracy at least one order of magnitude greater than the ISP’s meter. The error analysis must account for differences in traffic counts that occur at different points of instrumentation. For example, the calculations must adjust for what is counted in an IPDR versus what a consumer’s PC will count. Comparison between the test results and the specification must cover each element of the specification, which will include more factors than the difference between tested and metered traffic counts. Extreme care must be taken to ensure that the test traffic is the only traffic on the subscriber’s link.
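As a rough illustration of adjusting for different points of instrumentation, the sketch below estimates on-the-wire bytes from application payload bytes. It rests on several assumptions that must be checked against the actual CMTS and meter specification: a bulk TCP transfer over IPv4 with no header options, full-size segments, and a counter that includes Ethernet framing. It also ignores acknowledgments flowing in the opposite direction.

```python
# Minimum header sizes: 20-byte IPv4 header + 20-byte TCP header, no options.
TCP_IPV4_HEADER_BYTES = 40
# Ethernet header (14 bytes) plus frame check sequence (4 bytes). Whether the
# counter includes this framing is an assumption, not a given.
ETHERNET_FRAMING_BYTES = 18


def expected_wire_bytes(payload_bytes, mss=1460,
                        per_packet_overhead=TCP_IPV4_HEADER_BYTES
                        + ETHERNET_FRAMING_BYTES):
    """Estimate bytes on the wire for a payload sent in segments of `mss` bytes."""
    packets = -(-payload_bytes // mss)  # ceiling division
    return payload_bytes + packets * per_packet_overhead


payload = 1_460_000  # what an application on the consumer's PC might report
wire = expected_wire_bytes(payload)
print(wire, f"{100.0 * (wire - payload) / payload:.2f}% overhead")
```

Under these assumptions the overhead is roughly 4%, which is larger than many error bounds, so an unadjusted comparison could look like a meter fault when it is only a difference in what is being counted.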

Each separate test of down and up traffic must be under the audit team’s control, be unknown to the ISP, and be randomly scheduled. There must be multiple tests performed in order to acquire sufficient samples to generate a statistically valid conclusion regarding meter system accuracy.
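One simple way to summarize many test results is a mean error with a confidence interval. The sketch below uses a normal approximation (z = 1.96 for roughly 95% confidence), which is an assumption on our part; for a small number of tests a t-distribution would be more appropriate. The sample error values are made up for illustration.

```python
import math
import statistics


def summarize_errors(errors_pct, z=1.96):
    """Return the mean error and an approximate confidence interval."""
    n = len(errors_pct)
    if n < 2:
        raise ValueError("at least two test results are needed")
    mean = statistics.fmean(errors_pct)
    std_dev = statistics.stdev(errors_pct)   # sample standard deviation
    half_width = z * std_dev / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


# Hypothetical per-test errors in percent (negative means under-reporting).
errors = [-0.2, -0.1, 0.0, 0.1, -0.3]
mean, (low, high) = summarize_errors(errors)
print(f"mean error {mean:.2f}%, interval {low:.2f}% to {high:.2f}%")
```

More tests narrow the interval, which is why a handful of transfers is not enough to support a conclusion about meter accuracy.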

Once the tests are complete and meter reports are acquired, a comprehensive error analysis will highlight small discrepancies in the ISP’s meter system. Of course at all times the audit system must check to ensure it is not introducing errors into the results. Multiple points of comparison between the ISP’s meter and the auditor’s measurement tools help assure that the test system is consistent. There is a limit to the number of locations that can be instrumented for the audit. But measurement locations must cover a sufficient variety of the ISP’s infrastructure to assure that any accuracy conclusions are defensible. Some aspects of the Meter specification do not lend themselves to technical measurements, but rather are best simply observed by the audit team (e.g., the clarity of the meter FAQs). All of the tests, analysis, and observations are referenced back to the meter specification. The goal is to validate that the specification is adhered to and not to change the meter system.

Public Meter Accuracy Report

The greatest value from a successful validation project is a report that documents the results. [4] This report should be publicly available to subscribers, and should cover more than the validation methodology. It should also provide information to the consumer about how traffic works on the Internet and within their ISP. An important aspect is to describe the differences between what a user might perform as his or her own validation test from home and what will appear in the meter. To date, the most often perceived usage meter accuracy issues have actually been misunderstandings of why the value “I see on my router” is different from the value shown in the usage report on the ISP’s customer portal.

Lessons Learned

We have been involved in independent meter accuracy validation projects for several major ISPs over the course of more than two years. In our experience, usage meters are generally not yet mature offerings for ISPs or the vendors that supply constituent parts to the ISPs. Following is a summary of the types of issues we have encountered thus far. They are presented here so ISPs and their vendors can investigate these topics before implementation and avoid costly remediation during meter deployment.

Some of the issues discussed below are a natural byproduct of how the network operates and cannot be mitigated. These causes of counting errors limit the ability of an ISP to promise an extremely accurate usage meter.

Under-reporting by the Meter

We define under-reporting as when the ISP meter shows less traffic than was actually sent over the subscriber line (negative error). This can occur under a variety of conditions, but the following have been observed quite often.

Collectors or aggregators cannot keep up with IPDR volume and drop either a few records, or in some cases, a batch of records from an incoming source.


Aggregation loses time synch with the IPDR time stamps creating time gaps in the meter data. The effect can create a large hourly value when the meter “catches up”. Or it may never catch up, in which case the traffic is never counted.

Packet loss on the CM-CMTS uplink during up tests will cause TCP to retransmit. A subscriber device that is counting traffic will see the packet that was lost along with the retransmitted packet. The net effect is that a lost packet is counted twice by the subscriber but only once by the CMTS.
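The effect of uplink packet loss on the apparent error can be worked through with simple arithmetic. The sketch assumes equal-size packets and that each lost packet is retransmitted exactly once and then arrives; the function name is illustrative.

```python
def uplink_loss_error_pct(packets_sent, packets_lost):
    """Apparent meter error when the subscriber counts retransmissions.

    The subscriber device counts every original packet plus every
    retransmission, while the CMTS counts only packets that arrive.
    """
    subscriber_count = packets_sent + packets_lost  # originals + retransmits
    cmts_count = packets_sent                       # each packet arrives once
    return 100.0 * (cmts_count - subscriber_count) / subscriber_count


# 1% uplink loss makes the meter appear to under-report by about 1%.
error = uplink_loss_error_pct(packets_sent=1000, packets_lost=10)
print(f"{error:.3f}%")
```

The meter is not wrong in this case: both counts are correct for the point where they were taken, which is why this cause of apparent under-reporting cannot be engineered away.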

Over-reporting by the Meter

We define over-reporting as when the ISP meter shows more traffic than was actually sent over the subscriber line (positive error). This can occur due to a variety of conditions, but the following have been observed.

Some part of the meter system is resending meter records. Some of the meter system information flow processes are asynchronous file updates with little or no positive feedback that a file was properly received. Under these conditions, a duplicate file will be added into the system.

Packet loss on the CM-CMTS downlink during down tests causes TCP to retransmit. A subscriber device that is counting traffic sees only the retransmitted packet, but the CMTS saw and counted both the lost and the retransmitted packets.

There is a low level of background traffic on the CM-CMTS link due to the way the network operates. In some cases it also picks up traffic that is broadcast by neighbors on the common Internet path into the ISP. If the background traffic is too high, it can materially impact the accuracy of the meter, which will show up as over-reporting.
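The duplicate-record problem described above can be guarded against by making ingestion idempotent. This is a minimal sketch, assuming each meter record can be identified by modem, period start, and direction; the field names are hypothetical and not taken from any IPDR schema.

```python
def ingest(records):
    """Split incoming meter records into first-seen and duplicate records."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        # Assumption: these three fields uniquely identify a meter record.
        key = (rec["mac"], rec["period_start"], rec["direction"])
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates


records = [
    {"mac": "00:11:22:33:44:55", "period_start": "2011-11-15T13:00",
     "direction": "down", "octets": 500},
    {"mac": "00:11:22:33:44:55", "period_start": "2011-11-15T13:00",
     "direction": "up", "octets": 40},
    # The same file delivered a second time:
    {"mac": "00:11:22:33:44:55", "period_start": "2011-11-15T13:00",
     "direction": "down", "octets": 500},
]
unique, duplicates = ingest(records)
print(sum(r["octets"] for r in unique), len(duplicates))
```

Without such a check, the resent record would add another 500 octets to the subscriber's total.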

Erratic Meter

An erratic meter is when a dramatically “wrong” value appears from time to time in the meter records. Most such events are difficult to diagnose but the following have been seen.

Errors in the CMTS traffic counters cause wide swings in the meter value (this may occur when a CMTS or a cable modem resets).

The mediation engine can have hourly, daily, or monthly roll-up errors. For example, there may be issues with time zone conversions and the effect of moving into or out of daylight saving time.
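The daylight saving time pitfall can be illustrated by rolling hourly UTC records into local calendar days. The sketch uses Python's standard `zoneinfo` module (which needs time zone data on the host); the `daily_rollup` function and the choice of the America/New_York zone are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def daily_rollup(hourly, tz_name):
    """Sum hourly (utc_hour, octets) records by the subscriber's local day."""
    tz = ZoneInfo(tz_name)
    totals = defaultdict(int)
    hours = defaultdict(int)
    for utc_hour, octets in hourly:
        local_day = utc_hour.astimezone(tz).date()
        totals[local_day] += octets
        hours[local_day] += 1
    return dict(totals), dict(hours)


# Four days of constant traffic spanning the March 13, 2011 US clock change.
start = datetime(2011, 3, 12, 0, tzinfo=timezone.utc)
hourly = [(start + timedelta(hours=h), 1_000_000) for h in range(96)]
totals, hours = daily_rollup(hourly, "America/New_York")

print(hours[date(2011, 3, 13)], totals[date(2011, 3, 13)])  # short day
print(hours[date(2011, 3, 14)], totals[date(2011, 3, 14)])  # normal day
```

A roll-up that assumes every day holds exactly 24 hourly records would misassign an hour of traffic on the days when clocks change.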

Late Meter

The meter system can take too long to show traffic events. This causes a failure to achieve the timeliness goal in the Meter Specification.

Summary

We have described many sources of measurement inaccuracies commonly encountered when ISPs deploy Internet traffic usage meters. The complete list is much longer. To avoid these inaccuracies, it is essential for ISPs to have a well-thought-out plan for end-to-end system testing that will enable inaccuracy problems to be exposed and corrected early in the deployment process. It is also critical to have an independent end-to-end system audit to validate that the usage meter is, in fact, accurate and performing within well-documented specifications. This protects an ISP’s reputation for fairness and technical competence, and engenders subscriber trust.

Based on NetForecast’s experience, if you are deploying a usage meter, there is no doubt that you will encounter problems. Most major broadband ISPs are in some phase of deploying a subscriber usage meter to support the ISP’s consumption management strategy. An independent audit of an ISP’s meter can help assure meter accuracy and provide a responsible and defensible way to address meter accuracy concerns.


ISP usage meters can be accurate, and independent testing has shown some to be very accurate. The figure below shows the distribution of errors measured across four different CMTS brands. The results are based on more than 1,000 measurements. It shows a bias towards under-reporting.

[Figure: distribution of measured meter errors. Horizontal axis from -1.0% to 1.0% in steps of 0.1%; vertical axis from 0% to 30%.]

Accurate usage meters shine a light on an unknown and misunderstood aspect of the digital age: bandwidth consumption. This allows consumers to become better informed, and better-informed consumers will help positively shape the Internet’s future.

About the Author

Peter Sevcik is President of NetForecast and is a leading authority on Internet traffic, performance, and technology. Peter has contributed to the design of more than 100 networks, including the Internet, and holds the patent on application response-time prediction. He can be reached at [email protected].

References

[1] “Taking A Closer Look At AT&T's Inaccurate Usage Meters: AT&T Meter Is Wildly Over-Estimating Customer Usage,” DSL Reports, March 29, 2011. http://www.dslreports.com/shownews/Taking-A-Closer-Look-At-ATTs-Inaccurate-Usage-Meters-113442

[2] “Understanding IPDR Service Flow Counters for Usage Metering Applications,” Active Broadband Networks, Inc., March 2010.

[3] “What Counts? Accurately Accounting for End-User Traffic with IPDR,” Andrew Sundelin, Applied Broadband, Inc., SCTE Cable-Tec Expo 2010.

[4] “Comcast Usage Meter Accuracy,” Peter Sevcik, NetForecast Report 5101, May 2010.