
6.2 Evaluation Toolset performance

6.2.2 Performance test setup

To rate the tools according to the metrics, we need a test setup that yields a sufficient amount of data. First, we elaborate on how the tools and the target web app are set up. Then, we explain the proxy that we use between the client and the server, and we show how the test automation script works. The complete test setup is illustrated in figure 6.1.

Figure 6.1: The complete test network setup using five tools and four environments (targets). It also indicates the two places where the time-difference measurements are computed.

Setup of tools and target web app

The four tools and the target web app are all run locally on a MacBook Pro from 2013 (macOS High Sierra). CompuRacer is tested twice for every setup: once with last-byte synchronisation enabled (CompuRacer+, CR+ or CR_lbs) and once with this function disabled (CompuRacer or CR). This difference allows us to test the effectiveness of this particular function as well, and results in five tools in total. In the results, we abbreviate Turbo Intruder to TI, Race the web to RTW and Sakurity Racer to SR.

As only CompuRacer and RTW support sending different requests in parallel, we only tested the parallel sending of the same request. This request is first added to the tool under test and is then sent 25 times in parallel. We record the time differences between requests both at the client and on the server (at the web server and inside the application). We repeat this process 15 times per tool. Between tests, we restart the application and reset its database to avoid any interference. In the next paragraphs, we discuss the process to add and send requests with the RTW, SR and TI tools.

Race the web For this tool, we have created a TOML (Tom’s Obvious, Minimal Language) configuration file for all setups. The file contains the parallel duplication amount and the request-content itself. We start the tool with this file as the only argument, and then it sends the parallel requests.

Sakurity Racer For this tool, there is no configuration file. The parallel duplication amount is hardcoded into the sending application. For every test, the Chrome extension has to be used to intercept the request of interest and forward it to the sending application. This application then immediately sends the requests in parallel to the target web app.

Turbo Intruder For this tool, we should first load it with a script that optimises the sending for triggering race conditions. In this script, the parallel duplication amount is also indicated. Then, the requests of interest should be intercepted using the Burp Suite proxy. From the proxy history, we send the request to the TI extension. The extension can then send this request at the press of a button.

Target functionality in web app We test the tools on the self-developed vulnerable voucher app. The most interesting redeem functionality is targeted: redeeming a multi-use coupon (code: COUPON3) multiple times in the ’insecure’ setting. This coupon can be redeemed 100 times. The ’insecure’ setting contains a TOCTOU (time-of-check to time-of-use) bug, but the race window is tiny, as there is almost no logic between the check transaction, which tests whether a voucher is usable, and the act transaction, which uses the voucher. As this coupon is redeemed 25 times, a run without a race condition leaves 75 usages. When more than 75 usages are left and no errors occurred, we know that a race condition has taken place in the voucher app.
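The detection rule above can be written down as a small check. This is a minimal sketch and not the thesis script: the function name and parameters are hypothetical, and it generalises the rule slightly by comparing the number of successful redemptions with the number of usages that were actually consumed.

```python
def race_condition_occurred(initial_uses: int, successes: int, remaining_uses: int) -> bool:
    """Return True when more redemptions succeeded than usages were consumed.

    Hypothetical helper: with 100 initial uses and 25 successful redemptions,
    anything above 75 remaining uses indicates a lost update.
    """
    consumed = initial_uses - remaining_uses
    return successes > consumed


# COUPON3 starts with 100 uses and is redeemed 25 times in parallel.
print(race_condition_occurred(100, 25, 75))  # False: every redemption was counted
print(race_condition_occurred(100, 25, 78))  # True: three redemptions were not counted
```

The check is only meaningful when all 25 requests returned a success code, which is why the error responses are recorded as well.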

Note about test interference We keep an eye on the CPU and RAM usage of the MacBook during the three local tests. This check is necessary because, as mentioned before, a race condition is exceedingly dependent on the exact timing of requests. Slow performance of the testing system influences both the tools and the test web app in unexpected ways, so we monitor it. When the CPU usage rises above 50% on average, the particular test is postponed or re-executed.

Setup of proxy and remote server

As mentioned in the metric notes above, a test that only consists of a local setup with a server and client on one system or one local network is not very realistic. In a real-life setup with the server in another city, country or continent, latency and packet loss become significant factors that we also have to include in the test. That is why, after the test is executed once with a local server, two further configurations are set up: one using a random-delay proxy called Toxiproxy (version 2.1.4) developed by Shopify (2019), and one using an Amazon EC2 microserver instance in Paris. Note that each test is executed from the west of the Netherlands and all relative latencies are also calculated from this location. More concretely, the remote server and the proxy add three tests, which brings the total to four test environments. In figure 6.1, the test network setup is illustrated and below, every environment is described in more detail:

1. Local server - A server on the same local network is simulated: a delay of 1 ms or less and jitter at microsecond-level is to be expected.

2. Remote server - An Ubuntu microserver in Paris is used: a delay of about 20 ms and jitter of at most 5 ms is to be expected for every packet up- and downstream as Reinheimer and Roberts (2019a) indicate. This results in a latency of 15 to 25 ms.

3. Proxy normal - A server in Los Angeles is simulated: a delay of 250 ms and jitter of at most 50 ms is added to every upstream request. So every request will have a latency of 200 to 300 ms as Reinheimer and Roberts (2019b) indicate.

4. Proxy slow - A horrible environment is simulated: a delay of 1500 ms and jitter of at most 500 ms. So every request will have a latency of 1000 to 2000 ms [2].

As exploiting race conditions is not expected to depend on downstream latency (server -> client), this latency is not included for the proxy setups. The following commands were used to start Toxiproxy, create the two different proxy setups to the testing web app (at 127.0.0.1:5005) and set the ’toxics’ (latency and jitter) of these proxy setups:

    $> toxiproxy-server &
    $> toxiproxy-cli create delay_normal -l 127.0.0.1:5006 -u 127.0.0.1:5005
    $> toxiproxy-cli create delay_high -l 127.0.0.1:5007 -u 127.0.0.1:5005
    $> toxiproxy-cli toxic add delay_normal -t latency -a latency=250 -a jitter=50 -u
    $> toxiproxy-cli toxic add delay_high -t latency -a latency=1500 -a jitter=500 -u
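The latency ranges quoted for the environments follow from the configured delay plus or minus the jitter. The sketch below reproduces that arithmetic; the class and field names are illustrative and are not taken from the thesis script.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Environment:
    """One test environment with its expected delay and jitter (illustrative)."""
    name: str
    delay_ms: int
    jitter_ms: int

    def latency_range_ms(self) -> Tuple[int, int]:
        # The jitter is applied in both directions around the base delay.
        return (self.delay_ms - self.jitter_ms, self.delay_ms + self.jitter_ms)


ENVIRONMENTS = [
    Environment("remote server", 20, 5),
    Environment("proxy normal", 250, 50),
    Environment("proxy slow", 1500, 500),
]

for env in ENVIRONMENTS:
    low, high = env.latency_range_ms()
    print(f"{env.name}: {low} to {high} ms")
```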

Setup of test automation script

As shown in figure 6.1, the test as a whole encompasses using five different tools to send 25 requests in parallel via four different (proxy) setups, and the complete test is repeated 15 times. This gives 20 different setups, each repeated 15 times, for a total of 300 individual tests. For every test, the time differences between requests have to be gathered from two different locations, and the number of used vouchers and the number of error responses have to be saved. In addition, the logging, database and application have to be reset between tests. As it would require significant effort to gather this data by hand, most aspects of this process have been automated using a Python script.
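The size of the test matrix can be checked by enumerating it. This is only a sketch of the counting argument; the short environment labels are chosen here for illustration.

```python
from itertools import product

TOOLS = ["CR", "CR+", "TI", "RTW", "SR"]
ENVIRONMENTS = ["local", "remote", "proxy normal", "proxy slow"]  # illustrative labels
REPETITIONS = 15

# Every tool is combined with every environment, and each pair is repeated.
setups = list(product(TOOLS, ENVIRONMENTS))
individual_tests = [
    (tool, env, run)
    for tool, env in setups
    for run in range(1, REPETITIONS + 1)
]

print(len(setups))            # 20
print(len(individual_tests))  # 300
```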

The source code of the script can be found in a GitHub repository of the author [3].

[2] Under normal circumstances, these characteristics are not found between any two servers on earth, but an awful network setup is still likely to yield these results.

The script runs four stages per test and repeats them 15 times until all results are gathered. Then, it creates a summary file with all time differences, statistics and redemption results. The script has to be re-run for every tool and every proxy configuration. The different stages are described below. The output of the script for one illustrative test is shown in figure 6.2.

Figure 6.2: The console output for running the test script once with illustrative data. All input is coloured green and all output from the Wireshark recorder and converter is coloured red.

Configuration stage The user first indicates the name of the tool and the proxy configuration that will be tested. For the proxy configuration, ’f’ (fast, no proxy), ’r’ (remote), ’n’ (normal proxy) and ’s’ (slow proxy) can be used. The local request recorder uses this information to filter out any requests from the proxy to the web app, so that only the time differences of requests from the tool to the proxy are measured. Then, the script creates the necessary folders for result storage.
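A configuration stage of this kind could look roughly as follows. This is a hedged sketch and not the author's code: the four codes come from the text, while the function name and the folder layout (base folder / tool / configuration) are assumptions made for illustration.

```python
from pathlib import Path

# Codes as described in the text; the folder names are a hypothetical choice.
PROXY_CONFIGURATIONS = {
    "f": "fast",    # local server, no proxy
    "r": "remote",  # remote server
    "n": "normal",  # normal proxy
    "s": "slow",    # slow proxy
}


def prepare_result_dir(base_dir: Path, tool_name: str, config_code: str) -> Path:
    """Validate the configuration code and create the result folder."""
    if config_code not in PROXY_CONFIGURATIONS:
        raise ValueError(f"unknown proxy configuration: {config_code!r}")
    result_dir = base_dir / tool_name / PROXY_CONFIGURATIONS[config_code]
    result_dir.mkdir(parents=True, exist_ok=True)
    return result_dir
```

Calling `prepare_result_dir(Path("results"), "TI", "n")` would create `results/TI/normal` under this assumed layout.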

Preparatory stage After the setup, several commands are executed on the server of the voucher app via SSH. These commands empty the log files of the web server and the application, restart the application and reset the database. Then, the script waits for an ENTER-press. During this time, the user should manually prepare the tool for sending parallel requests via the current proxy configuration.

Execution stage When the user presses ENTER, the script starts a Wireshark recorder that saves the request differences at the client for 8 seconds. The script indicates when the recording has started. Only after this indication can the user safely start the parallel sending of requests. Any request sent before this moment or after the 8 seconds is not recorded.

Evaluation stage After 8 seconds, the script stops the recorder, saves the results and again accesses the server of the voucher app via SSH to read the log files. It extracts the arrival timestamps of the 25 requests at the web server and the application. After this, it asks the user how many success codes were returned and how many vouchers were redeemed. This data is stored, and the script returns to the preparatory stage for the next test.
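Once the arrival timestamps have been extracted, the time differences can be derived from them. The sketch below assumes the timestamps are already parsed into seconds and defines a time difference as the gap between consecutive arrivals. The text does not fix this definition, so both the definition and the function names are assumptions.

```python
from statistics import mean
from typing import Dict, List


def time_differences_ms(arrivals_s: List[float]) -> List[float]:
    """Gaps in milliseconds between consecutive arrivals (assumed definition)."""
    ordered = sorted(arrivals_s)
    return [(later - earlier) * 1000.0 for earlier, later in zip(ordered, ordered[1:])]


def summarise(arrivals_s: List[float]) -> Dict[str, float]:
    """Basic statistics over one test; needs at least two timestamps."""
    if len(arrivals_s) < 2:
        raise ValueError("at least two arrival timestamps are required")
    gaps = time_differences_ms(arrivals_s)
    return {
        "requests": float(len(arrivals_s)),
        "spread_ms": (max(arrivals_s) - min(arrivals_s)) * 1000.0,
        "mean_gap_ms": mean(gaps),
        "max_gap_ms": max(gaps),
    }


# Three illustrative arrival times in seconds (not measured data).
print(summarise([10.000, 10.004, 10.001]))
```

A small spread means the requests arrived close together, which is what a tool needs to hit the tiny race window described earlier.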

In document Towards systematic black box testing for exploitable race conditions in web apps (Page 114-119)