State of the Art
2.1. Software evaluation
Software evaluations play an important role in different areas of Software Engineering, such as Software Measurement, Experimental Software Engineering or Software Testing.
According to the ISO 14598 standard [ISO/IEC, 1999], software evaluation is the systematic examination of the extent to which an entity is capable of fulfilling specified requirements, considering software not just as a set of computer programs but also as the produced procedures, documentation and data.
State of the art | Thesis contribution
Software evaluation and benchmarking | Research on the foundations of software evaluation and benchmarking
Table 2.1: Relationship between the state of the art and the thesis contributions.
Software evaluations can take place all along the software life cycle: they can be performed during the software development process by evaluating intermediate software products or when the development has finished.
Although evaluations are usually made inside the organization that develops the software, other groups of people who are independent of the organization, such as users or auditors, can also make them. The use of independent third parties in software evaluations can be very effective, but these evaluations are much more expensive for the organizations [Rakitin, 1997].
The goals of evaluating software depend on each specific case, but they can be summarised from [Basili et al., 1986, Park et al., 1996, Gediga et al., 2002] as follows:
To describe the software in order to understand it and to establish baselines for comparisons.
To assess the software with respect to some quality requirements or criteria and determine the degree of desired quality of the software product and its weaknesses.
To improve the software by finding opportunities for enhancing its quality. This improvement is measured by comparing the software with the baselines.
To compare alternative software products or different versions of the same product.
To control the software quality by ensuring that it meets the required level of quality.
To forecast in order to make decisions, establishing new goals and plans for accomplishing them.
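Some of these goals (assessing against requirements and measuring improvement against a baseline) can be illustrated with a minimal sketch. The attribute names, the numeric values and the convention that higher values are better are assumptions made only for this illustration; they are not taken from the cited works.

```python
def assess(measured: dict[str, float], required: dict[str, float]) -> dict[str, bool]:
    """Assess/control: check each required attribute against its minimum level.

    An attribute that was not measured is treated as not meeting the requirement.
    """
    return {
        name: measured.get(name, float("-inf")) >= minimum
        for name, minimum in required.items()
    }


def improvement(measured: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Improve: difference between the current measurements and the baseline."""
    return {
        name: measured[name] - baseline[name]
        for name in baseline
        if name in measured
    }


# Hypothetical measurements (assumed: higher is better for every attribute).
baseline = {"test_pass_rate": 0.5, "documented_api_ratio": 0.25}
current = {"test_pass_rate": 0.75, "documented_api_ratio": 0.25}
required = {"test_pass_rate": 0.75, "documented_api_ratio": 0.5}

print(assess(current, required))
print(improvement(current, baseline))
```

In this sketch the baseline plays the role described in the first goal: it is recorded once and later used as the reference point for the improvement goal.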
Software can be evaluated according to numerous quality attributes. Multiple software quality models have been defined after the first proposals made by Boehm [1976] and Calvano and McCall [1978] in the 1970s. In this thesis, these quality models are not described in detail; only one example is provided, namely one of the best-known frameworks for software product quality, the framework described in the ISO 9126 standard [ISO/IEC, 2001].
The ISO 9126 standard identifies three different views of software product quality:
Internal quality. Internal quality concerns the totality of the characteristics of the software product from an internal view. Details of software product quality can be improved during the implementation, review and test of the code; however, the fundamental nature of the software product quality represented by internal quality remains unchanged unless the software is redesigned.
External quality. External quality concerns the totality of the characteristics of the software product from an external view and refers to the quality of the software when it is executed; quality is typically measured and evaluated while testing the software in a simulated environment with simulated data using external metrics. During testing, most faults should be discovered and eliminated, but some faults may still remain afterwards. However, because it is difficult to correct the software architecture or other basic design aspects of the software, the fundamental design usually remains unchanged throughout testing.
Quality in use. Quality in use refers to the user’s view of the software product quality when this is used in a specific environment and in a specific context. It measures the extent to which users can achieve their goals in a particular environment, rather than the properties of the software itself.
The quality model for internal and external quality proposes six high-level software quality characteristics, which are decomposed into sets of subcharacteristics. These high-level characteristics, shown in figure 2.1, are the following:
Functionality. It is the capability of the software to provide functions that meet stated and implied needs when the software is used under specified conditions. Functionality can be decomposed into suitability, accuracy, interoperability, security, and functionality compliance.
Reliability. It is the capability of the software to maintain its level of performance when used under specified conditions. Reliability can be decomposed into maturity, fault tolerance, recoverability, and reliability compliance.
Usability. It is the capability of the software to be attractive and understood, learned, and used by the user, when it is employed under specified conditions. Usability can be decomposed into understandability, learnability, operability, attractiveness, and usability compliance.
Efficiency. It is the capability of the software to provide appropriate performance, relative to the amount of resources used, under stated conditions. Efficiency can be decomposed into time behaviour, resource utilisation, and efficiency compliance.
Maintainability. It is the capability of the software to be modified. Modifications may include corrections, improvements or adaptation of the software to changes in environment and in requirements and functional specifications. Maintainability can be decomposed into analysability, changeability, stability, testability, and maintainability compliance.
Portability. It is the capability of software to be transferred from one environment to another. Portability can be decomposed into adaptability, installability, co-existence, replaceability, and portability compliance.
Figure 2.1: Quality model for internal and external quality [ISO/IEC, 2001].
The quality model for quality in use proposes four software quality characteristics, shown in figure 2.2. These characteristics are the following:
Effectiveness. It is the capability of the software product to enable users to achieve specified goals with accuracy and completeness in a specified context of use.
Productivity. It is the capability of the software product to enable users to expend appropriate amounts of resources in relation to the effectiveness achieved in a specified context of use.
Safety. It is the capability of the software product to achieve acceptable levels of risk of harm to people, business, software, property or the environment in a specified context of use.
Satisfaction. It is the capability of the software product to satisfy users in a specified context of use.
Figure 2.2: Quality model for quality in use [ISO/IEC, 2001].
2.2. Benchmarking
In recent decades, the word benchmarking has become relevant within the business management community. The best-known definitions in this area are those of Camp [1989] and Spendolini [1992]. Camp defines benchmarking as the search for industry best practices that lead to superior performance, while Spendolini expands Camp’s definition by adding that benchmarking is a continuous, systematic process for evaluating the products, services, and work processes of organizations that are recognised as representing best practices for the purpose of organizational improvement. In this context, best practices are good practices that have worked well elsewhere, are proven and have produced successful results [Wireman, 2003]. These definitions highlight the two main benchmarking characteristics: continuous improvement and the search for best practices.
The Software Engineering community also uses the term benchmarking, though it does not share a common definition of it. Some of the most representative definitions used by the Software Engineering community are presented below:
Kitchenham [1996] and Weiss [2002] define benchmarking as a software evaluation method suitable for system comparisons. For Kitchenham, benchmarking is the process of running a number of standard tests using a number of alternative tools/methods and assessing the relative performance of the tools in those tests, whereas for Weiss, benchmarking is a method of measuring performance against a standard or a given set of standards.
Wohlin et al. [2002] adopt the business benchmarking definition, viewing benchmarking as a continuous improvement process that strives to be the best of the best through the comparison of similar processes in different contexts.
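Kitchenham's definition (running a number of standard tests on a number of alternative tools and assessing their relative performance) can be sketched as follows. This is only an illustration under stated assumptions: the two "tools" are hypothetical stand-ins, the test cases are invented, and ranking by number of passed tests and then by elapsed time is one possible assessment criterion among many.

```python
import time
from typing import Any, Callable


def run_benchmark(
    tools: dict[str, Callable[[Any], Any]],
    tests: list[tuple[Any, Any]],
) -> dict[str, dict[str, float]]:
    """Run every (input, expected output) test on every tool and record
    how many tests each tool passed and how long the whole run took."""
    results: dict[str, dict[str, float]] = {}
    for name, tool in tools.items():
        passed = 0
        start = time.perf_counter()
        for argument, expected in tests:
            if tool(argument) == expected:
                passed += 1
        elapsed = time.perf_counter() - start
        results[name] = {"passed": passed, "seconds": elapsed}
    return results


def rank(results: dict[str, dict[str, float]]) -> list[str]:
    """Order tools by passed tests (descending), then by time (ascending)."""
    return sorted(
        results,
        key=lambda name: (-results[name]["passed"], results[name]["seconds"]),
    )


# Hypothetical alternative tools: two ways of sorting a list of integers.
# The second one silently drops duplicates, so it fails one of the tests.
tools = {
    "builtin_sorted": sorted,
    "dedup_sorted": lambda values: sorted(set(values)),
}

# Hypothetical standard tests shared by all tools.
tests = [([3, 1, 2], [1, 2, 3]), ([2, 2, 1], [1, 2, 2]), ([], [])]

results = run_benchmark(tools, tests)
print(rank(results))
```

The essential point of the sketch is that all tools are run on the same set of tests, which is what makes their results comparable.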
2.2.1. Benchmarking vs evaluation
The reason for benchmarking software products instead of just evaluating them is to obtain several benefits that cannot be obtained from software evaluations. As figure 2.3 illustrates, software evaluation shows the weaknesses of the software or its compliance with quality requirements. If several software products are involved in the evaluation, we also obtain a comparative analysis of these products and recommendations for users. However, when benchmarking several software products, in addition to all the benefits mentioned above, we also gain continuous improvement of the products, recommendations for developers on the practices used when developing these products and, from these practices, those that can be considered best practices.
Figure 2.3: Benchmarking benefits.
2.2.2. Benchmarking classifications
This section presents two different classifications of benchmarking that, although they were created inside the business management community, can be applied to software benchmarking. One of the classifications focuses on the participants involved in the benchmarking, whereas the other is based on the nature of the objects under analysis.
The main benchmarking classification was presented by Camp [1989]. He categorises benchmarking depending on the kind of participants involved, and his classification has been adopted by authors such as Sole and Bist [1995], Ahmed and Rafiq [1998] and Fernandez et al. [2001]. The four categories identified by Camp are the following:
Internal benchmarking. It measures and compares the performance of activities, functions and processes within one organization.
Competitive benchmarking. In this case, the comparison is made with products, services, and/or business processes of a direct competitor.
Functional benchmarking (also called industry benchmarking). This category is similar to the previous one, competitive benchmarking, except that the comparison involves a larger and more broadly defined group of competitors in the same industry.
Generic benchmarking. Its aim is to search for general best practices, regardless of any specific industry.
Another classification categorises benchmarking according to the nature of the objects under analysis. This classification first appeared in Ahmed and Rafiq’s [1998] paper and complements Camp’s classification. A few years later, Lankford [2000] established a separate classification and identified the following types of benchmarking:
Process benchmarking. It involves comparisons between discrete work processes and systems.
Performance benchmarking. It involves comparison and scrutiny of performance attributes of products and services.
Strategic benchmarking. It involves comparison of the strategic issues or processes of an organization.