4.2 SOLMI: A FRAMEWORK FOR XMLi TESTING
4.5.2 XML Test Data Generation
The following work targets functional testing rather than vulnerability testing. However, it shares one of our starting premises, namely the use of XML Schemas, and it also produces XML documents.
There is a large body of research on the generation of test data for XML-based web services [45, 46, 57, 58]. Havrikov et al. [45] proposed a search-based test generation technique for XML-based systems. Their tool, XMLMate, uses program structure together with existing XML schemas and inputs to generate new, valid XML test inputs. The tool uses genetic algorithms to evolve a random or sample initial population to achieve higher branch coverage. Experimental evaluation of XMLMate yielded good results in terms of finding unique failures in test subjects.
Xu et al. [58] proposed a testing methodology for XML-based communication in accordance with an XML Schema. Web services and applications are tested with respect to how well they validate input XML messages according to their XML Schemas. The method generates test cases from existing XML messages and schemas via schema perturbation operators. These test cases are then used to transmit invalid data to the web service or application. An evaluation on two web services revealed only 33% of the known faults. A similar approach was presented by Offutt et al. [57] where data perturbation is used for testing web services. Test cases are generated by modifying data values and their interaction based on the types defined in the XML schema.
The approaches discussed here for XML test data generation target the functional testing of web services. They do not focus on vulnerability testing. However, the output of their approaches might be used in SOLMI to start our generation procedure.
4.6. Summary
We discussed a taxonomy of all known XMLi types and proposed an effective testing approach for XMLi in web services based on constraint solving and input mutation. The effectiveness of a testing approach is highly influenced by the quality of test data generation. We have focused on a test generation strategy that generates attack payloads (malicious content) which satisfy the associated domain constraints defined using XML Schema Definition (or WSDL) and attack grammars (e.g., SQL injections in our experiments). The malicious content generation is automated using a constraint solver (Hampi). Test cases (i.e., malicious XML messages) are then generated by mutating existing XML messages and combining them with malicious content to generate nested attacks. As a result, generated tests are valid according to XSD constraints, yet malicious at the same time.
We have carried out an experimental evaluation to compare our proposed approach with a state-of-the-art tool based on fuzz testing and known attack patterns. Our subject is a real-world financial system with an XML gateway at the front-end that protects 44 back-end web services, with a total of 443 input parameters. Our approach (SOLMI), using constraint solving and input mutation, delivers promising results. Approximately 78% of the tests generated by SOLMI, which were all attacks with malicious content, were able to bypass the XML gateway and reach the target web services. Only 2.37% of the tests produced by ReadyAPI, a state-of-the-art commercial tool, could bypass the gateway. Furthermore, against expectations, these tests turned out to be non-malicious. In other words, our tool was able to find vulnerabilities in a professionally configured gateway whereas fuzz testing was misleading in suggesting it was secure. Despite using a constraint solver, the computing cost of using SOLMI is affordable in practice as it takes 0.92 seconds on average to generate each test case.
Testing Front-end Web Applications for XML Injections
This chapter focuses on the automated testing for XML injections (XMLi), a prominent family of attacks that aim at manipulating XML documents or messages to compromise XML-based applications. More specifically, we target the front-end web applications of SOA systems, i.e., front-end web applications are the systems under test (SUTs) in our context. Among other functionalities, they receive user inputs, produce XML messages, and send them to services for processing (e.g., as part of communications with SOAP and RESTful web services [10, 11]). Such user inputs must be properly validated to prevent XMLi attacks. However, in the context of large web applications with hundreds of distinct input forms, some input fields are usually not properly validated [59]. Moreover, full data validation (i.e., rejection/removal of all potentially dangerous characters) is not possible in some cases, as meta-characters like “<” could be valid, and ad-hoc, potentially faulty solutions need to be implemented. For example, if a form is used to input the message of a user, emoticons such as “<3” representing a “heart” can be quite common. As a consequence, front-end web applications can produce malicious XML messages when targeted by XMLi attacks, thus compromising services that consume these messages.
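To make the role of meta-characters concrete, the following minimal Python sketch contrasts a front-end that concatenates user input into an XML message with one that escapes it. The message layout (`<comment>`, `<text>`, `<role>`), the function names and the two inputs are illustrative assumptions and are not taken from the systems studied in this chapter.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_message_naive(user_text: str) -> str:
    # Hypothetical front-end code: the input is placed in the message as-is.
    return f"<comment><text>{user_text}</text></comment>"


def build_message_escaped(user_text: str) -> str:
    # Same message, but '&', '<' and '>' are escaped instead of rejected,
    # so legitimate inputs such as "<3" survive as plain character data.
    return f"<comment><text>{escape(user_text)}</text></comment>"


def is_well_formed(xml_message: str) -> bool:
    try:
        ET.fromstring(xml_message)
        return True
    except ET.ParseError:
        return False


benign = "I <3 this service"
attack = "hi</text><role>admin</role><text>"

# Naive construction: the benign emoticon breaks the message, while the
# attack yields a well-formed message with an extra <role> element.
print(is_well_formed(build_message_naive(benign)))
print(ET.fromstring(build_message_naive(attack)).find("role") is not None)

# Escaped construction: both inputs end up as the text of <text>.
print(ET.fromstring(build_message_escaped(benign)).find("text").text)
print(ET.fromstring(build_message_escaped(attack)).find("role") is None)
```

In this sketch, rejecting every "<" would also reject the benign emoticon, which is why a front-end in this situation needs escaping or another ad-hoc treatment rather than plain rejection.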
As described in Section 1.2, fuzz testing approaches (e.g., ReadyAPI [12], WSFuzzer [13]) are not capable of detecting XMLi vulnerabilities in web applications. They can generate only simple test cases using XML meta-characters (e.g., <), which are blocked by the applications. Furthermore, some attacks could be based on the combination of more than one input field, where each field in isolation could pass the validation filter unaltered.
Given a web application that communicates with web services through XML messages, we first identify a set of possible malicious XML messages (called test objectives, or TOs for brevity, in this dissertation) that the SUT can produce and send to those services. These TOs are identified using the fully automated tool SOLMI (described in the previous chapter), which creates malicious XML messages based on known XML attacks and the XML schemas of the web services under test. Then, we use a specifically tailored genetic algorithm to search the input space of the SUT (e.g., text data in HTML input forms) in an attempt to generate XML messages matching the generated TOs. The search is guided by an objective function that measures the difference between the actual SUT outputs (i.e., the XML messages toward the web services) and the targeted TOs. Our approach does not require access to the source code of the SUT and can, therefore, be applied in a black-box fashion on many different systems. The current chapter focuses on the generation of test inputs and is complementary to the automated solution for generating TOs that we presented in the previous chapter.
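The idea of a search guided by the difference between SUT outputs and a TO can be sketched as follows. This is a simplified illustration under stated assumptions, not the tailored genetic algorithm of this chapter: the difference is measured here with the Levenshtein edit distance, the search uses mutation and elitist selection only (no crossover), and `toy_sut`, `TEST_OBJECTIVE` and `ALPHABET` are hypothetical stand-ins for a real front-end, a TO produced by SOLMI and the input alphabet.

```python
import random
from functools import lru_cache

ALPHABET = "abcdefghijklmnopqrstuvwxyz<>/"  # assumed input alphabet
# Hypothetical test objective (malicious message the SUT should emit).
TEST_OBJECTIVE = "<user><name>x</name><role>admin</role><name>y</name></user>"


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


def toy_sut(name: str) -> str:
    # Hypothetical black-box SUT: no validation, fixed message template.
    return f"<user><name>{name}</name></user>"


@lru_cache(maxsize=None)
def fitness(candidate: str) -> int:
    # 0 means the SUT produced exactly the targeted malicious message.
    return edit_distance(toy_sut(candidate), TEST_OBJECTIVE)


def mutate(candidate: str, rng: random.Random) -> str:
    pos = rng.randrange(len(candidate) + 1)
    op = rng.choice(("insert", "delete", "replace"))
    if op == "insert" or not candidate:
        return candidate[:pos] + rng.choice(ALPHABET) + candidate[pos:]
    pos = min(pos, len(candidate) - 1)
    if op == "delete":
        return candidate[:pos] + candidate[pos + 1:]
    return candidate[:pos] + rng.choice(ALPHABET) + candidate[pos + 1:]


def search(generations: int = 40, pop_size: int = 8, seed: int = 0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(5))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        offspring = [mutate(parent, rng) for parent in population]
        # Elitist selection: keep the fittest of parents and offspring.
        population = sorted(population + offspring, key=fitness)[:pop_size]
        history.append(fitness(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = search()
    print("best input:", repr(best), "distance to TO:", history[-1])
```

Only the observable output of `toy_sut` is used, which mirrors the black-box setting: the search needs the inputs it submits and the XML messages it captures, but no knowledge of how the SUT transforms one into the other.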
Note that proper input validation in the front-end can prevent many of the possible security attacks. However, in the context of large web applications with hundreds of distinct input forms, some input fields are typically not properly validated as a result of time pressures, changes, and lack of security expertise.
Furthermore, some attacks could be based on the combination of more than one input field, where each field in isolation could pass the validation filter unaltered. In some cases, full data validation (i.e., rejection/removal of all potentially dangerous characters) is not possible, as meta-characters like < could be valid, and ad-hoc solutions need to be implemented (which could be faulty). For example, if a form is used to input the message of a user, emoticons like <3 representing a “heart” can be quite common.
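The point about combining input fields can be illustrated with a minimal Python sketch. The per-field filter, the `<address>` message template and the field values are all hypothetical and chosen only to show the mechanism: each value passes the filter on its own, but together they open and close an XML comment that swallows the element placed between the two fields.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical per-field filter: reject any value that contains a complete
# tag-like token such as <role>, </a> or <!-- ... -->.
COMPLETE_TOKEN = re.compile(r"<[^<>]*>")


def passes_filter(value: str) -> bool:
    return COMPLETE_TOKEN.search(value) is None


def build_message(line1: str, line2: str) -> str:
    # Hypothetical front-end: validates each field in isolation, then
    # concatenates both into a message with a server-set <verified> flag.
    for value in (line1, line2):
        if not passes_filter(value):
            raise ValueError(f"rejected input: {value!r}")
    return (f"<address><line>{line1}</line>"
            f"<verified>false</verified>"
            f"<line>{line2}</line></address>")


# Benign inputs: the flag set by the front-end reaches the service.
benign = ET.fromstring(build_message("12 Main St", "Apt 4"))
print(benign.find("verified") is not None)

# Split attack: the first field opens a comment, the second closes it.
first, second = "12 Main St<!--", "-->Apt 4"
print(passes_filter(first), passes_filter(second))
attacked = ET.fromstring(build_message(first, second))
print(attacked.find("verified") is None)
```

A test generator that mutates one field at a time would not find this case in the sketch, since neither value changes the message structure when submitted alone.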
We have carried out an extensive evaluation of the proposed approach on two case studies. The first study consists of 20 experiments on six web applications that simulate bank interactions with an industrial bank card processing service. These web applications have different levels of complexity in terms of the number of inputs, their data types and the validation technique. The second study includes a third-party application used for training purposes and an industrial web application having millions of registered users, with hundreds of thousands of visits per day. Results are promising, as our proposed search-based testing approach is effective at detecting XMLi vulnerabilities in both case studies, within practical execution time. The evaluation of our approach on such diverse systems, including a large industrial web application, is a sizable and useful empirical contribution.
The remainder of the chapter is structured as follows. Section 5.1 describes the testing context for this chapter. Section 5.2 presents our proposed approach and the tool that we developed for its evaluation. Section 5.3 reports and discusses our evaluation on two case studies including research questions, results and discussions. Section 5.4 discusses related work. Finally, Section 5.5 concludes the chapter.
5.1. Testing Context
A SOA system typically consists of a front-end web application that generates XML messages (e.g., toward SOAP and RESTful web services) upon incoming user inputs (as depicted in Figure 5.1). The front-end system often applies various transformations to the user inputs before generating the XML messages, e.g., encoding, validation or sanitisation. XML messages are consumed by various back-end systems or services, e.g., an SQL back-end, that are not directly accessible from the Internet. In this chapter, we focus on the front-end web application and aim to test whether it is vulnerable to XMLi attacks. We consider the web application as a black box. This makes our approach independent of the source code and the language in which it is written (e.g., Java, .NET, Node.js or PHP). Furthermore, this also helps broaden the applicability of our approach to systems whose source code is not easily available to the testers (e.g., external penetration testing teams). However, we assume that we can observe the output XML messages produced by the SUT upon user inputs. To satisfy this assumption, it is enough to set up a proxy to capture network traffic leaving the SUT, which is relatively easy in practice.
The security of the front-end plays a vital role in the overall system’s security as it directly interacts with the user. Consider, for instance, a point of sale (POS) as the front-end that creates and forwards XML messages to the bank card processors (back-end). If the POS system is vulnerable to XMLi attacks, it may produce and deliver manipulated XML messages to web services of the bank card processors. Depending on how the service components process the received XML messages, their security can be compromised, leading to data breaches or services being unavailable, for example.