Essential software testing dimensions


2.3.1 Essential software testing dimensions

In this section, we will discuss four major dimensions (or axes) of software testing, as shown in figure 2.3. We would like to stress that not every option along every axis is compatible with every option along a different axis. That is why, for every axis, the compatibility with the other axes is discussed. For instance, static performance testing and black-box functional testing are not feasible options.

Figure 2.3: The figure shows all aspects of the four major dimensions of software testing that are discussed in this section.

The first axis is based on the three different aspects of software quality that could be tested: functional, performance or security (Moilanen et al., 2015). One could also consider adding legal requirements on software quality as a fourth aspect, but we do not know of any such requirement that cannot be assigned to one of the three aspects already mentioned.

• Functional - Whether every aspect of the system is implemented correctly according to the requirements. This results in a system that works as expected during a typical run.

• Performance - Whether the system is robust enough to be able to reliably support the expected (peak) load without failure.

• Security - Whether the risk that the system can be made to function in an unintended way, resulting in a negative impact, is at an acceptable level.

Interestingly, improvements to any software based on functional and performance tests can also improve the security of the application, as functional bugs or performance issues could be misused by an attacker. However, this does not mean that the specific security aspect of software testing can be omitted. One reason for this is that the tester needs to adopt an adversarial mindset for most security tests. In other words, the tester needs to think like a hacker who wants to get in using any possible method (outside of the exact specification and design) instead of a developer who tends to check only conventional methods.

Although functional and performance tests do require manual labour to set up, after this phase they can often be run automatically. For security testing, we can automatically check for most known vulnerabilities in software and dependencies, but these checks are neither durable nor conclusive. On the one hand, this is because new security bugs keep being found, so any listing of potential vulnerabilities quickly becomes outdated. On the other hand, it is due to the very versatile and creative interaction with the system that is required to perform thorough security testing. Any determined hacker would perform such manual labour as well, and would need only one way in. In contrast, a security tester often tries to discover every way in, that is, the whole attack surface.

The second axis corresponds to the five software development levels where testing could take place: unit, module, integration, system and acceptance, as Ammann and Offutt (2016, chap. 1.1) show. One might be tempted to include regression testing as well, but this is not an additional level. It is the act of comparing the software product as a whole between different versions as a part of maintenance. An in-depth explanation of every level is given below and is adapted from Ammann and Offutt (2016, chap. 1.1):

• Unit testing - This often targets the smallest amount of code, at the level of functions and methods. The method is tested with different kinds of expected and unexpected input data to confirm a correct implementation.

• Module testing - This level targets a combination of methods and functions with the same purpose. In Object-Oriented (OO) programming, this is a class. Otherwise, it could refer to a single file. The test assesses whether the methods within the class work together properly and abide by the detailed design of the application.

• Integration testing - This encompasses testing a combination of classes that form a single subsystem of an application. Based on the subsystem design, the tester assesses whether the structure and behaviour of this part match the design.

• System testing - This level targets the assembly of the different subsystems and their connectors into the complete system. The tester evaluates whether the system matches the requirements to the system as a whole. This is the ‘architectural design’.

• Acceptance testing - This highest level of testing is also about the complete system. However, the tester now determines whether the needs of the actors, as described in the requirements, are fulfilled. It requires users with the appropriate domain knowledge, as it evaluates whether the system does what the users themselves want.

Both the functional and the security aspects of software quality, as mentioned above, can be tested at all levels. Performance tests, however, are often based on requirements that describe specific user activity and can, therefore, only be performed as system or acceptance testing.
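To make the difference between the two lowest levels concrete, the following minimal Python sketch contrasts a unit-level test with a module-level test. The `Account` class and its methods are hypothetical and serve only as an illustration; they are not taken from any of the cited works.

```python
import unittest


class Account:
    """Hypothetical class under test (an assumption for illustration)."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class UnitLevel(unittest.TestCase):
    """Unit level: a single method with expected and unexpected input."""

    def test_deposit_expected_input(self):
        account = Account()
        account.deposit(10)
        self.assertEqual(account.balance, 10)

    def test_deposit_unexpected_input(self):
        with self.assertRaises(ValueError):
            Account().deposit(-5)


class ModuleLevel(unittest.TestCase):
    """Module level: the methods of one class working together."""

    def test_deposit_then_withdraw(self):
        account = Account()
        account.deposit(10)
        account.withdraw(4)
        self.assertEqual(account.balance, 6)

    def test_rejected_withdrawal_leaves_balance_unchanged(self):
        account = Account(balance=3)
        with self.assertRaises(ValueError):
            account.withdraw(4)
        self.assertEqual(account.balance, 3)
```

Saved as a file, these tests can be run with `python -m unittest`. Integration, system and acceptance tests would follow the same pattern, but would exercise several classes, the assembled system, or the user-facing behaviour, respectively.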

The third axis is the access and knowledge level of the tester regarding the internal working of the system under test: black-box, white-box or grey-box (Acharya and Pandya, 2012). The exact definition of these terms does not seem to be used consistently throughout the literature. We have used the most common way to explain them. In most cases, regardless of this level, the tester does know the user requirements of the system and how it should behave.

1. Black-box - The tester knows next to nothing about the internal workings of the system (like statements and branches) and cannot view the source code. He does know how the system should behave according to its requirements, and he can run the system and interact with it from the outside to evaluate the consistency of this expected behaviour. In this case, the tester does not have significantly more information than a typical user or attacker of a system has. Thus, the tester might have, for example, a manual and a compiled version of a piece of software, or the Internet Protocol (IP) addresses of the web app servers.

2. White-box - In addition to the abilities and knowledge of the black-box tester, this tester also has full knowledge of the design and internal workings of the system and can view the source code and any related documentation. The test can, therefore, be executed from the inside of the application, either by a developer or by an independent tester. The advantage of this technique is that a tester can reliably test almost all execution paths of a system. This allows for the most thorough testing.

3. Grey-box - This is a hybrid form of the types described above. As Acharya and Pandya (2012) describe, in this case, the tester knows some things about the internal workings of the system and might have access to parts of the documentation or source code. Both the advantages and the disadvantages of the other types could apply to this technique, depending on the specific case.

Interestingly, in contrast to the work by Acharya and Pandya cited earlier, Ammann and Offutt (2016, chap. 1.4) claim that the test distinctions made above have become obsolete, as several new types of abstract software models are a more powerful way to describe this aspect of testing. However, to the best of our knowledge, these distinctions are still in common use, and we also deem them sufficient for explaining the topic of this thesis. That is why we have decided to stick to these terms.

Regarding the first axis, functional tests are often executed from a white-box perspective, while performance and security tests can be performed from either a black-box or a grey-box perspective. Any level of the second axis, as described before, can be tested using the white-box approach. In contrast, only system and acceptance testing seem to be possible using a black-box method. A grey-box approach might be able to target all levels, depending on the specific case.
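The practical effect of the knowledge level on test design can be sketched as follows. The function `shipping_cost`, its price bands and the requirement text quoted in the comments are hypothetical and exist only for this illustration. The black-box checks are derived from the stated requirements alone, whereas the white-box checks select inputs by reading the source code so that every branch and boundary is exercised.

```python
def shipping_cost(weight_kg):
    """Hypothetical system under test (an assumption for illustration)."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 2:
        return 5.0
    if weight_kg <= 20:
        return 5.0 + 1.5 * (weight_kg - 2)
    return 40.0


def black_box_checks():
    """Inputs chosen from the assumed requirement text only:
    'small parcels cost 5.0; heavier parcels cost 1.5 extra per kg above 2 kg'.
    """
    assert shipping_cost(1) == 5.0
    assert shipping_cost(10) == 17.0
    return 2  # number of checks performed


def white_box_checks():
    """Inputs chosen by reading the code: one per branch, at each boundary."""
    for weight, expected in [(2, 5.0), (20, 32.0), (21, 40.0)]:
        assert shipping_cost(weight) == expected
    try:
        shipping_cost(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a non-positive weight")
    return 4  # number of checks performed
```

In this sketch, the flat rate above 20 kg is not mentioned in the assumed requirement text, so only the white-box tester, who can see the third branch, is likely to test it deliberately.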

The fourth axis is the testing method: static, symbolic or dynamic (Godefroid et al., 2008). This indicates how the software is analysed to perform the tests. This analysis could be purely based on the source code itself, or it could require the tester to execute the software.

1. Static - This method could be considered the oldest form of testing. It requires access to the source code of the application and at least verifies whether the code abides by the syntax of the language itself. Any other checks that do not require knowledge of the value of variables or other execution-related behaviours can also be considered static testing. Often, these kinds of checkers are already built into Integrated Development Environments (IDEs) to quickly spot errors while developing an application.

2. Dynamic - This method runs the application using predefined or automatically generated input and tries to discover whether all reachable program states are valid and do not result in errors. Contrary to static analysis, it requires a certain knowledge of valid inputs and outputs of the program, which is not always desirable. It does, however, have the power to show that a system not only works in theory (static) but also in practice. As the system must be successfully compiled and executed before dynamic testing is possible, a dynamic test following and complementing a static test is a common procedure.

3. Symbolic - Just like dynamic testing, this method tries to infer the run-time behaviour of the application. However, it does not do this by simply running the application itself, but by assigning a symbolic value to each variable. Based on the requirements of the system, it can then calculate what ranges of different variable values will occur during program execution. It does not require knowledge of valid inputs like dynamic testing does. Unfortunately, exhaustive testing in large systems is often not possible due to ‘path explosion’. This is the name for the exponentially large number of paths through the program, or states, that have to be explored due to branches (like if, while, for and fork statements). Concolic testing tries to resolve this issue by combining symbolic analysis with concrete (dynamic) testing, using actual inputs to a running program to quickly rule out (prune) infeasible paths, thereby obtaining a more manageable state-space (Sen et al., 2005).
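The scale of path explosion can be illustrated with a small counting sketch. It assumes a deliberately simplified program model, namely a sequence of independent two-way branches without loops, and the function name `count_paths` is our own. It is not a symbolic execution engine; it only enumerates the combinations of branch outcomes that such an engine would have to consider in the worst case.

```python
from itertools import product


def count_paths(num_branches):
    """Count the paths through a program that consists of `num_branches`
    independent, sequential two-way branches (a simplifying assumption).

    Every path corresponds to one combination of branch outcomes.
    """
    return sum(1 for _ in product((True, False), repeat=num_branches))


if __name__ == "__main__":
    for n in (1, 2, 10, 20):
        # The count doubles with every additional branch: 2 ** n paths.
        print(n, count_paths(n))
```

Under this model, twenty branches already yield more than a million paths. Loops make the situation worse, because every additional iteration can add further branch decisions.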

Regarding the first and second axes, we estimate that they are compatible with all testing methods described above. However, with regard to the third axis, only the dynamic method seems to apply to black-box and grey-box testing. Static and symbolic testing require full access to the source code, which is only possible when the access and knowledge of the tester are at the white-box level.
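The difference between the static and the dynamic method can be sketched in a few lines of Python. The static check below only parses the source text with the standard `ast` module and never executes it, while the dynamic check executes the code with a concrete input. The sample function `average` and the helper names `static_check` and `dynamic_check` are assumptions made for this illustration.

```python
import ast

# Hypothetical code under test, kept as text so that it can be analysed
# without being executed.
SOURCE = '''
def average(values):
    return sum(values) / len(values)
'''


def static_check(source):
    """Return True if the source abides by the language syntax.

    The code is only parsed; no variable ever receives a value.
    """
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def dynamic_check(source, test_input):
    """Run `average` on a concrete input.

    Returns the name of the run-time error that occurred, or None.
    """
    namespace = {}
    exec(compile(source, "<sketch>", "exec"), namespace)
    try:
        namespace["average"](test_input)
    except Exception as error:
        return type(error).__name__
    return None
```

In this sketch, `static_check(SOURCE)` succeeds because the code is syntactically valid, yet `dynamic_check(SOURCE, [])` reports a `ZeroDivisionError`, which depends on the value of a variable at run time. Conversely, the dynamic check only finds this error if the tester thinks of supplying the empty list as input.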

In document Towards systematic black box testing for exploitable race conditions in web apps (Page 41-45)