Model and Methodology
3.3 Web Polygraph
Machine/Interface eth0 eth1
Cache server 192.1680.1 158.36.91.196
Web server n/a 192.168.0.2
Client 128.39.75.101 192.168.0.3 Table 3.2: IP addresses
The web server machine which does not have a public IP address; it gets the Internet from via the cache server machine. The ssh traffic destined for the cache server machine’s port number 222 is forwarded to the web server. Here are the iptables rules which forward the traffic to and masquerades traffic from the web server machine.
iptables
iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 222 -j DNAT --to-destination 192.168.0.2:22
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
These iptables rules do not persist across system reboots. The following com-mands rewrite the iptables rules after each restart. The rules are stored in the file /etc/firewall.conf, and a shell script is run by the ifup command when the interface restarts, so that the rules are installed again. For this to work, port forwarding for the system must be enabled (first command below), and then the firewall configuration can be saved to a text file:
echo 1 > /proc/sys/net/ipv4/ip_forward iptables-save > /etc/firewall.conf
The simple shell scriptiptables, located in /etc/network/if-up, is used to restore the rules from firewall.conf on reboot:
/etc/network/if-up/iptables
#!/bin/sh
iptables-restore </etc/firewall .conf
3.3 Web Polygraph
Web Polygraph [6] is the main tool will be used for benchmarking in this work (referred to hereafter as simply Polygraph). Polygraph is a freely available tool which is used for performance testing. Polygraph is de-facto industry standard
3.3. WEB POLYGRAPH
for testing HTTP intermediaries. Testing of proxy caches, origin server accel-erator and L4/7 switches, content filters and other web intermediaries can be performed using this tool. Its most important features are
• High-performance HTTP clients and servers
• Realistic HTTP traffic generation
• Flexible content simulation
• A powerful configuration language
The plan for this research was to use an existing tool rather than writing such a tool from scratch. The main focus was to be on observing cache server behaviour rather than developing a sophisticated tool. As discussed in Chapter 2, Poly-graph is best and only choice for this purpose. It can generate the desired load patterns needed for this research. It is also generally up-to-date with current caching software, and it also supports more features than other software. This tool was chosen despite the fact that the significant start up overhead would be significant, which turned out to be even greater than initially anticipated. The difficulties and bugs with this tool are discussed in Chapter 5.
Polygraph can simulate a large number of clients and server processes provided that system resources support them. The main bottleneck is limited memory, as the process will die if it runs out of the memory. The other limited resources are CPU capacity, the number of TCP ports and network bandwidth. Polygraph use a custom configuration language called PGL which must be used to define workload parameters and characteristics.
Polygraph works by creating a set of client processes that it calls robots. Each robot (client) is able to behave like a real web user and can request a variety of web object content types with predefined distribution probabilities.
Polygraph runs two type of processes: polygraph-server and polygraph-client which simulate clients and servers on the network (respectively). The generated traffic from the polygraph-client is sent to the polygraph-server through the proxy.
3.3.1 Polygraph testing phases
Polygraph tests are accomplished via a series of phases. These phases are con-figured using PGL, and there are a variety of parameters available for doing do.
For example, each phase has its own begin and end request rate factors. After reaching to each phase’s goal Polygraph automatically moves to next phase.
The Polygraph phases warm, fill, inc, top1, dec, idle, top2 and cool are the phases used for reverse proxy benchmarking, as illustrated in Figure3.3.
3.3. WEB POLYGRAPH
Figure 3.3: Phases
The different phases are defined as follows:
• warm: In this warm-up phase, the client and server robots try to find each other and negotiate their configurations in order to detect any problems.
• fill In this the phase, the initially empty cache is filled by the client robots.
The time of filling varies depending on the size of the cache, the request rate, and the recurrence rate of requests. To obtain the best results, the goal of this phase is to fill twice the size of the cache in order to reach a steady state. In this experiment, the recurrence rate is set to 5% to speed up the filling.
• link: The purpose of this phase is to prevent a sudden increase in the request rate. In this phase, which links the fill phase and the first main testing period, the request rate is gradually increased to the desired rate for the following top1 phase (the peak rate for the test).
• top1: This key testing phase runs at the maximum specified request rate.
The rate is kept at the constant peak level during this phase. In order to focus on caching behavior, the recurrence of objects is set to a high value (95%) to prevent clients from generating new objects.
• dec: This phase decreases the request rate gradually down to the level selected for the following idle phase, which is typically 10% of the top1 peak rate.
• idle: During this phase, the request rate is set to a very low level in order to compare the performance under low and heavy traffic conditions. In this phase, the proxy server should have the time to do its regular off-line activities such as lazy garbage collection.
3.3. WEB POLYGRAPH
• inc: This phase is similar to the link phase. During the inc phase, the request rate is gradually increased from 10% to 100% of peak rate, in preparation for the following top2 phase.
• top2 This phase is the most important phase. Comparing the results from top1 and top2 shows if the proxy is truly in a steady state. When the test reaches this phase and the proxy server has reached a steady state, it has experienced both high and low request rates requests. The results from this phase is are used for comparison with the previous high request rate phaser, top1.
• cool Once the testing is finished, the robots and servers will cool down from the peak request rate to zero during this phase.
3.3.2 Command line options
Polygraph has a large number of command line options. Some of them can also be specified in the configuration files. When running Polygraph for this test, the following options are used:
• –config: Specify configuration file.
• –unique_world: Whether or not to use unique URLs.
• –local_rng_seed: Seed for the random number generator.
• –http_proxies: Proxy server.
• –verb_lvl: Verbosity level.
• –dump: Items to dump.
• –log: Log file location.
By default, Polygraph runs its tests using a unique load. That means that the URLs are unique, and their request sequence is different from run to run. By setting the command line option unique_world to off, then the URLs generated will not be unique, making Polygraph free to use the same objects for various tests. To further ensure that the runs are identical for both proxy servers, Poly-graph’s random number generators should be provided the same seed, using the command line option local_rng_seed. While there is no guarantee that the random number generator will generate exactly the same numbers, nevertheless the results would be nearly the same[6]. The combination of these two options makes it possible to reproduce essentially the same URL sequence for separate tests.
To run the test, the server and client processes are run separately. They can be
3.3. WEB POLYGRAPH
On the server side, the following command is used to start the Polygraph server for these tests:
server process
./polygraph-server --unique_world off --local_rng_seed 501 --config config.pg --verb_lvl 10 \ --dump errs --log results/log_server
One the client side, the following command is used to start the Polygraph client process:
client process
./polygraph-client --unique_world off --local_rng_seed 601 --config config.pg --verb_lvl 10 \ --dump errs --log results/log_client
3.3.3 Polygraph console output
The Polygraph console periodically reports the state of the test and the progress of its various phases. The phase duration (in minutes) and the fill size (in MB) as well as percentage completion of the current phase and the current frozen working set size are shown for each interval. Figure 3.4 shows an example of the client side console display.
Figure 3.4: The current phase state: client side console output
Runtime messages are also displayed, at more frequent intervals than the pre-ceding periodic status reports, in a tabular format. Any error messages are also included. Figure3.5 identifies the most important parts of these messages.