EXECUTION OF THE TRIAL - Closed Environment Testing of ISP Level Internet Content Filters. Repo

Chapter 3: Execution of the trial

Overview

Chapter 3 describes the methodologies used to evaluate the selected filter products for performance, effectiveness, scope and adaptability. In order to appreciate the relationship between the particular methodology followed in measuring performance and the real-world manner in which ISP networks operate, this chapter includes an outline of the architecture of a typical ISP network. The configuration of the test network used for measurement of performance and effectiveness is then described.

For measuring performance, this chapter describes how, in order to assess the extent to which a filter introduces any changes to the throughput of an ISP’s network, the trial collected data to enable comparison of the performance of the test network with no filter installed, with each filter product installed but not actually filtering content (passive mode) and with each filter product actively filtering content (active mode).

For measuring effectiveness, this chapter describes how, in order to assess the accuracy of a filter in identifying and blocking content from categories 1 and 2—while similarly identifying but allowing access to content from category 3—the trial collected data on whether each URL in the three indexes described in the previous chapter was correctly identified by each filter product, either for blocking or permitting access to the corresponding content.

For measuring scope and adaptability, this chapter describes how, in order to evaluate the capabilities of filters in filtering non-web traffic and in customising filtering policies in accordance with the specific requirements of an ISP or one of its customers, an expert review captured details of various capabilities of the selected filter products.

PERFORMANCE

Evaluating the performance impact of filters meant determining the extent to which the operation of a particular filter product in the test network introduced degradation in network performance. Before indicating how performance impact was measured in the trial, it is necessary to describe the typical architecture of an ISP’s network in order to appreciate where and how performance of the network can be affected.

Operation of an ISP’s network

Figure 3 illustrates a typical layout for the network of an ISP offering ADSL broadband services (it is not an actual representation of any particular ISP’s architecture).

The links among the various network elements are shown using lines of varying widths and colours. The thickness of each line is representative of the bandwidth of the link represented by it. Links within the ISP’s internal network are of higher bandwidths (and are accordingly shown with lines of greater thickness) than the link between the end user and the local exchange digital subscriber line aggregation module (DSLAM).

[Figure 3 diagram. Legend, line speeds: OC48: 2.488 Gbps; OC12: 622 Mbps; OC3: 155 Mbps; DS3: 44.736 Mbps; ADSL: 1.5 Mbps. Labelled elements: internet; internet gateway; core router; edge router; domain name server (DNS); billing server; news server; mail server; content filter; database server; central exchange multiplexer; local exchange DSLAM; end user. Region labels: ‘prime’ network; typical ISP internal network architecture (usually co-located in a central exchange); ‘access’ network.]

Figure 3: A typical ISP's architecture from the ISP to the end user

There are usually multiple users (in the order of hundreds to a few thousand) connected to a local exchange and multiple local exchanges (in the order of tens to a few hundred) connected to a central exchange. ISP networks are typically designed in this manner as it offers a scalable and cost-effective solution for the network demand likely to be generated by end users.

In this typical ISP network, an end user on an ADSL connection is connected via their local exchange[11] DSLAM, through their central exchange multiplexer[12], to their ISP, which then routes their traffic back and forth to the internet via an internet gateway. The bandwidth of the network links decreases as one gets further away from the internet gateway.

Figure 3 illustrates that the segment that is the ISP’s core network constitutes the ‘prime’[13] network, as connections among its respective network elements are assigned a high bandwidth. By contrast, the segment between the end user and the local exchange DSLAM constitutes the ‘access’[14] network. The peak network throughput to the end user is limited to the subscriber’s bandwidth, which is usually no more than a few megabits per second.

Consequently, a slowdown in performance on the access network does not necessarily indicate any end-to-end congestion in the network. For example, such degradation in network performance may be the result of a large demand on bandwidth, such as an end user downloading a video file, which exceeds the bandwidth of the access network.

[11] When dealing with internet traffic, a local exchange is often referred to as a Point of Presence or POP.

[12] A multiplexer is a telecommunications device used to break large bandwidth links into smaller links, while keeping them synchronised.

[13] In networking terms, this is often referred to as the ‘fast’ network.

[14] In networking terms, this is often referred to as the ‘slow’ network.

Similarly, any actual congestion in the prime network segment that is small or moderate in degree typically has a less pronounced effect on an end user on the access network. This is because there is a significantly larger amount of bandwidth on the ‘prime’ network than the access network can demand. In the example shown in Figure 3, the bandwidth of the access network is 1.5 megabits per second, whereas that of the prime network is 622 megabits per second, over 400 times greater. As a result, the access network, the segment between the ISP and the end user, has little bearing on how an ISP-level filter affects the overall scalability of the ISP’s network. Measurements conducted on the prime network provide a quantitative gauge of the effect on network performance of a filter within the ISP’s core network. These measured quantities express the number of transactions per second and the data rate within the network (measured in megabits per second) that the ISP’s network is capable of supporting. The effect of a filter on an ISP’s network is best reflected by the performance seen within the prime network. The measurements conducted in this trial focus on this area.

The network architecture applicable for ISPs offering connections other than ADSL—for example, dial-up, cable, satellite or mobile connections to the internet—is broadly similar to that described above, although the bandwidth available on the access network may differ significantly.

Network performance metrics

In order to understand the metrics used in the trial, a number of concepts that are central to network performance measurement are clarified below.

Networks are rated based on both:

● bandwidth: a measure of the potential rate at which data can be transmitted over a network[15]; for example, when an ISP advertises a 1.5 megabits per second internet service, it means that, under ideal conditions, the internet connection can transmit data at up to 1.5 megabits per second; and

● throughput: the actual speed at which data will be transferred from one point on the network to another at a particular time[16]; it can be regarded as the rate at which ‘useful’ data is transferred.

Users rarely experience throughput higher than 80 per cent of the rated bandwidth.[17] This is due to the inherent design of network protocols, the sets of rules by which data is transferred across networks. Various network protocols are in common use; for example, Ethernet (the IEEE 802.3 standard) and ATM.[18] Irrespective of the protocol that is used, data is split into ‘packets’ before transmission. Each packet is assembled into a pre-defined format (as specified in the protocol), called a ‘frame’, before being transmitted. A frame typically contains elements such as the following:

● the header or preamble, which defines the type of protocol being used;

● the start of frame delimiter, which indicates the start of the frame;

● the destination address: the address to which the frame is headed (in an Ethernet frame, a hardware (MAC) address rather than an IP address);

● the source address: the address from which the frame originates;

● the length of the packet, which allows the receiving device to correctly separate one packet from another;

● the data or payload: the actual useful information that needs to be transmitted, such as the contents of a web page;

● padding: any dummy bytes required to fulfil minimum frame size requirements; and

● the checksum, which is used for error-checking and correction.

[15] Tanenbaum, Andrew S. (2002), Computer Networks 4th Edition, Prentice Hall

[16] http://www.support.psi.com/support/common/networking/diff.html

[17] Spurgeon, Charles E. (2000), Ethernet: The Definitive Guide, O'Reilly

[18] ATM: Asynchronous Transfer Mode is a cell relay, packet-switching network and data link layer protocol that encodes data traffic into small (53 bytes; 48 bytes of data and 5 bytes of header information) fixed-sized cells. This differs from other technologies based on packet-switched networks (such as the Internet Protocol or Ethernet), in which variable sized packets (known as frames when referencing Layer 2) are used.

[Figure 4 diagram; legible field labels: SOF, Length]

Figure 4: The IEEE 802.3 Ethernet frame format[19]

Figure 4 shows the frame format for 802.3 Ethernet. The ‘SOF’ field denotes the ‘Start of Frame’ and is 1 byte. This sample frame format will use at least 27 bytes and up to 73 bytes in overhead; that is, the preamble, destination and source address, length and checksum.
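The frame layout described above can be illustrated with a short Python sketch that assembles and then unpacks a frame. It is a simplified illustration rather than a wire-accurate encoder: the field widths other than the 1-byte SOF (7-byte preamble, 6-byte addresses, 2-byte length, 46-byte minimum data field, 4-byte checksum) are commonly cited values assumed here, the preamble and SOF byte values are illustrative, and `zlib.crc32` stands in for the checksum. The names `build_frame` and `parse_payload` are hypothetical.

```python
import struct
import zlib

PREAMBLE = bytes([0xAA] * 7)  # assumed 7-byte preamble; byte value is illustrative
SOF = bytes([0xAB])           # 1-byte start-of-frame delimiter; byte value is illustrative
MIN_DATA = 46                 # assumed minimum data field; shorter payloads are padded
MAX_DATA = 1500               # assumed maximum data field per frame


def build_frame(dest: bytes, src: bytes, payload: bytes) -> bytes:
    """Assemble preamble, SOF, addresses, length, data, padding and checksum."""
    if len(dest) != 6 or len(src) != 6:
        raise ValueError("addresses must be 6 bytes each")
    if len(payload) > MAX_DATA:
        raise ValueError("payload does not fit in one frame")
    length = struct.pack("!H", len(payload))          # 2-byte length field
    padding = bytes(max(0, MIN_DATA - len(payload)))  # zero bytes up to the minimum
    body = dest + src + length + payload + padding
    checksum = struct.pack("!I", zlib.crc32(body))    # 4-byte checksum over the body
    return PREAMBLE + SOF + body + checksum


def parse_payload(frame: bytes) -> bytes:
    """Verify the checksum and return the payload without its padding."""
    body = frame[len(PREAMBLE) + len(SOF):-4]
    (stored,) = struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch")
    (length,) = struct.unpack("!H", body[12:14])
    return body[14:14 + length]
```

For example, `parse_payload(build_frame(b"\x01" * 6, b"\x02" * 6, b"hello"))` returns the original five bytes; the length field is what allows the receiver to discard the padding.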

There are two primary factors that affect network efficiency:

1. The amount of overhead, as seen in the above example.

2. The number of retransmissions required to transfer an error-free packet.

Considering Ethernet and the frame format illustrated in Figure 4, a regular MP3 file of approximately 4MB (or 4,194,304 bytes) would require a total of 2,797 frames, each of which would contain 27 bytes of overhead. This equates to a network efficiency of 98.23 per cent, assuming that the 4MB MP3 file is divided into 1500-byte packets and there are no retransmits[20] as a result of packet loss in transmission. In reality, however, data communications without retransmissions rarely occur. If the whole transmission had to be retransmitted once, twice as many bytes would be sent to deliver the same payload, and the efficiency would fall to 49.11 per cent. Routing devices attempt to balance overhead and number of retransmissions by adapting packet sizes, in order to obtain optimum network performance. As a result, the theoretical efficiency is rarely obtained.
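The arithmetic in this example can be checked with a few lines of Python. The sketch below assumes, as the text does, 1500-byte packets and 27 bytes of overhead per frame, and it models a retransmission as the entire transfer being sent again; the function names are hypothetical.

```python
import math


def frames_needed(payload_bytes: int, packet_bytes: int = 1500) -> int:
    """Number of frames required when the payload is split into fixed-size packets."""
    return math.ceil(payload_bytes / packet_bytes)


def efficiency(payload_bytes: int, packet_bytes: int = 1500,
               overhead_per_frame: int = 27, transmissions: int = 1) -> float:
    """Useful bytes divided by total bytes sent, as a fraction between 0 and 1."""
    frames = frames_needed(payload_bytes, packet_bytes)
    bytes_sent = transmissions * (payload_bytes + frames * overhead_per_frame)
    return payload_bytes / bytes_sent


MP3_BYTES = 4 * 1024 * 1024  # 4,194,304 bytes

if __name__ == "__main__":
    print(frames_needed(MP3_BYTES))
    print(f"{efficiency(MP3_BYTES):.4f}")
    print(f"{efficiency(MP3_BYTES, transmissions=2):.4f}")
```

With these assumptions the file needs 2,797 frames and the efficiency is about 0.9823 for a single transmission, halving to roughly 0.491 when the transfer is sent twice.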

As a result of the balancing of overhead and number of retransmissions, the throughput of networks increases with increasing network load until the network reaches a state of saturation; that is, the network is carrying as much traffic as its theoretical bandwidth. For Ethernet networks, this is about 80 per cent of the available bandwidth. Beyond this point, as network load increases, the network efficiency begins to plateau. This characteristic is illustrated in Figure 5. Similar characteristics have also been observed for other protocols.

[19] Tanenbaum, Andrew S. (2002), Computer Networks 4th Edition, Prentice Hall

[20] A retransmit is where the same packet is transmitted more than once to overcome a scenario where the original packet may have been lost when initially transmitted. It is analogous to repeating oneself in a conversation when the recipient fails to interpret one’s statements the first time.

[Figure 5 plot: % throughput (vertical axis, 0% to 100%) against % load applied (horizontal axis, 0% to 160%)]

Figure 5: Load versus throughput graph for various protocols[21]
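The relationship between applied load and throughput can be sketched in Python. The first function follows the definition of percentage load given in the footnote to Figure 5. The second is a deliberately idealised assumption, not the measured curve: it treats throughput as rising in step with load and then staying flat at a saturation point of about 80 per cent, the figure quoted above for Ethernet. Both function names are hypothetical.

```python
def applied_load_percent(machines: int, demand_mbps_each: float,
                         bandwidth_mbps: float) -> float:
    """Total demand from all machines as a percentage of the available bandwidth."""
    return 100.0 * machines * demand_mbps_each / bandwidth_mbps


def idealised_throughput_percent(load_percent: float,
                                 saturation_percent: float = 80.0) -> float:
    """Piecewise-linear simplification: throughput tracks load, then plateaus."""
    return max(0.0, min(load_percent, saturation_percent))
```

For instance, `applied_load_percent(20, 10, 100)` gives 200.0, matching the footnote's example of 20 machines each demanding 10Mbps on a 100Mbps network, and under the idealised model that load yields the plateau value of 80.0.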

The performance characteristics of a network are also influenced by the nature of the traffic being transmitted. Networks exhibit better efficiency when the traffic being transmitted is of a predictable nature; for example, streaming media. This is because routing and switching devices require less time to determine the optimum packet size. The performance of Ethernet traffic degrades as traffic becomes increasingly ‘bursty’; that is, where packet size becomes random.[22] Traffic generated by internet chat and games, where packet sizes vary without a predictable pattern, is an example of such traffic.

Network throughput is also measured in the number of transactions per second. A transaction is defined as a complete cycle of data exchange. For the purpose of this trial, a transaction starts with the initiation of a web request and ends when the requested web content is delivered.
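The two throughput measures used in this chapter, transactions per second and data rate in megabits per second, can be expressed as a minimal Python sketch. The function names are hypothetical, and a megabit is taken as 1,000,000 bits, which is an assumption about the convention in use.

```python
def transactions_per_second(completed_transactions: int, elapsed_seconds: float) -> float:
    """Completed request/response cycles divided by the measurement period."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_transactions / elapsed_seconds


def data_rate_mbps(total_bytes: int, elapsed_seconds: float) -> float:
    """Bytes delivered in the period, converted to megabits per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_bytes * 8 / 1_000_000 / elapsed_seconds
```

As an illustration with made-up numbers, 1,200 completed transactions delivering 7,500,000 bytes over 60 seconds correspond to 20 transactions per second and a data rate of 1 megabit per second.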

Test network and hardware

An isolated test network to simulate an ISP’s network was built to observe the effect of each filter product on network performance.

The network architecture is shown in Figure 6. The network architecture seen here is analogous to a Tier 3 ISP; that is, an ISP that purchases outbound transport from other networks in order to reach the internet (see Appendix B for details).

[21] The percentage load applied is a measure of the network demand placed on a network as a percentage of the total bandwidth available. An array of 20 machines each demanding 10Mbps equates to 200Mbps; such an array would place a network with an available bandwidth of 100Mbps under a load of 200 per cent.

[22] Mazraani, T.Y.; Parulkar, G.M. (1992), Performance analysis of the Ethernet under conditions of bursty traffic, Global Telecommunications Conference, 1992. Conference Record, GLOBECOM Communication for Global Users, IEEE, 6-9 Dec 1992, pp. 592-596, vol. 1

[Figure 6 diagram. Elements shown:
Web server: simulates internet content
WebBench controller: controls WebBench tests; compiles and collects test results
Vendor supplied content filter: filters internet content
Gigabit switch: acts as edge router
DNS: performs IP address lookup
Load Generation Array: six WebBench clients, each of which simulates load generated from web requests from 0 to 6 users
Legend: content return path; content request path; traffic statistics information; gigabit ethernet]

Figure 6: Test network for evaluating network performance of internet content filters

As the network was an isolated one, two core network functions were simulated:

1. The function of the internet as a source of content.

2. The function of end users requesting content.

Simulating the internet

The internet was simulated using a high-end web server. This web server hosted a range of content from both Category 2 and Category 3 indexes replicated from active web sites published on the internet. The nature of the content included, but was not limited to:

● static web content in the form of HTML documents; and

● images complementing web content in the form of GIF and JPEG files.

The web server acted as a target host for web requests generated by the array of client machines (described below), individually processing the requests and delivering the resultant content back to the requesting client.

Simulating end users

To measure the effect of individual filters on network performance, the function of end users requesting content was simulated using a tool called WebBench 5.0, a benchmarking and testing software program developed by VeriTest that measures the performance of web servers and networks under different load conditions.[23]

WebBench 5.0 operates using a client-server architecture. The controller manages the execution of the tests and compiles the statistics collected by the client machines at the end of a test cycle. The controller was connected to an array of client machines that generated the web requests. The client load-generation array comprised six machines, each running the WebBench client software in order to generate web requests to the web server within the controlled environment. Since the environment used was an isolated environment, the total network bandwidth remained in a controlled and stable state.

The entire network was connected and switched using a gigabit switch.

An array of automated load-generating clients, as described above, is a standard method of generating web requests within a closed environment for testing of this nature. A similar load-generation array was used in the previous trial conducted for NetAlert Ltd in 2005.

The hardware specifications for the web server, the WebBench controller and the WebBench clients are listed in Appendix D.

Test methodology

The performance testing involved a series of web requests generated by the client machines under the direction of the controller.

The controller instructed the clients to generate a defined sequence of transactions consisting of web requests varying in volume, as well as the interval at which they were generated.

The sequence of these transactions was as follows and is shown in Figure 7:

1. A web request was initiated from a client machine.

2. The web request was passed to the DNS server.

3. The DNS server returned a DNS lookup response.

4. The request was routed to the web server at the address specified in the DNS lookup response.

5. The web server responded and returned the requested content back through the content filter.

6. The filter responded by either blocking or permitting content through to the requesting client machine.

7. The client machine terminated the web request cycle.

[Figure 7 diagram: the test network elements (web server, WebBench controller, vendor supplied content filter, gigabit switch, WebBench client and DNS) annotated with Steps 1 to 7 of the sequence listed above]

Figure 7: Sequence of generation of a transaction
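The seven steps of a transaction can be traced in a small Python simulation. This is an illustrative sketch only, not the behaviour of WebBench or of any filter product: the DNS server, web server and filter are replaced by in-memory lookups, and every hostname, address, page body and name in the code is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the DNS server, the web server and the filter's list.
DNS_RECORDS = {
    "permitted.example": "192.0.2.10",
    "blocked.example": "192.0.2.10",
}
WEB_SERVER_PAGES = {
    "permitted.example": "<html>permitted page</html>",
    "blocked.example": "<html>blocked page</html>",
}
FILTER_BLOCKLIST = {"blocked.example"}


@dataclass
class Transaction:
    host: str
    server_ip: str
    blocked: bool
    content: Optional[str]


def run_transaction(host: str) -> Transaction:
    """Walk one web request through the seven steps of the sequence."""
    # Steps 1 and 2: the client initiates a request, which is passed to the DNS server.
    # Step 3: the DNS server returns the lookup response.
    server_ip = DNS_RECORDS[host]
    # Step 4: the request is routed to the web server at the returned address.
    page = WEB_SERVER_PAGES[host]
    # Step 5: the web server returns the content through the content filter.
    # Step 6: the filter either blocks the content or permits it through.
    blocked = host in FILTER_BLOCKLIST
    # Step 7: the client ends the request cycle with whatever it received.
    return Transaction(host, server_ip, blocked, None if blocked else page)
```

Calling `run_transaction("permitted.example")` returns a transaction carrying the page body, whereas `run_transaction("blocked.example")` returns one marked as blocked with no content.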

A mix is defined as a specified number of clients simulating a specified number of end users generating a specified number of transactions at a specified interval.


Each test set for an individual filter comprised nine ‘mixes’, where a mix consisted of six clients each simulating between zero and six users generating transactions at defined intervals. Each transaction generated in a mix followed the sequence below:

1. Upon receiving a message from the controller with a specified mix, each client initiated a web request directed at the web server. This generated web request was the beginning of a transaction. The initiating client tracked the elapsed time until it received a corresponding response back from the web server, which in turn signified the end of the transaction.

2. The elapsed time for the transaction was recorded by the client machine before it proceeded to the subsequent transaction.

3. At the end of a sequence of mixes, the data recorded for all the transactions by each of the clients was transmitted back to the controller.
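The timing loop described in these steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the WebBench implementation: the `Mix` fields mirror the definition of a mix given above, clients run one after another rather than concurrently, and `fetch`, `clock` and `sleep` are hypothetical injection points so that the loop can be exercised without a network.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Mix:
    clients: int                # number of client machines
    users_per_client: int       # simulated end users on each client
    transactions_per_user: int  # transactions each simulated user generates
    interval_seconds: float     # pause between successive transactions


def run_client(mix: Mix, fetch: Callable[[], None],
               clock: Callable[[], float] = time.perf_counter,
               sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Time each transaction on one client and return the elapsed times."""
    elapsed = []
    for _ in range(mix.users_per_client * mix.transactions_per_user):
        start = clock()                  # the web request begins the transaction
        fetch()                          # returns once the response has arrived
        elapsed.append(clock() - start)  # record elapsed time before moving on
        sleep(mix.interval_seconds)
    return elapsed


def run_mix(mix: Mix, fetch: Callable[[], None],
            clock: Callable[[], float] = time.perf_counter,
            sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Collect every client's timings, as the controller does at the end."""
    results: List[float] = []
    for _ in range(mix.clients):  # sequential here; the real clients ran in parallel
        results.extend(run_client(mix, fetch, clock, sleep))
    return results
```

A mix of six clients, two users per client and three transactions per user would produce 36 recorded times; passing a stub `fetch` and a zero interval allows the loop to be tested in isolation.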

It was necessary to structure the testing in this manner to bring the network into saturation. This is

In document Closed Environment Testing of ISP Level Internet Content Filters. Report to the Minister for Broadband, Communications and the Digital Economy (Page 31-89)