Measuring Software Product Quality

(1)

T +31 20 314 0950 info@sig.eu www.sig.eu

Measuring Software Product Quality

Eric Bouwers

(2)

2 I 54

Software Improvement Group

Who are we?

Highly specialized advisory company for cost, quality and risks of software

Independent and therefore able to give objective advice

What do we do?

Fact-based advice supported by our automated toolset for source code analysis

Analysis across technologies by use of technology-independent methods

Our mission:

(3)

3 I 54

Services

Software Risk Assessment

In-depth investigation of software quality and risks

Answers specific research questions

Software Monitoring

Continuous measurement, feedback, and decision support

Guard quality from start to finish

Software Product Certification

Five levels of technical quality

Evaluation by SIG, certification by TÜV Informationstechnik

Application Portfolio Analyses

Inventory of structure and quality of application landscape

(4)

4 I 54

Measuring Software Product Quality

(5)

5 I 54

Some definitions

Projects

Activity to create or modify software products

Design, Build, Test, Deploy

Product

Any software artifact produced to support business processes

Either built from scratch, from reusable components, or by customization of standard packages

Portfolio

Collection of software products in various phases of the lifecycle

Development, acceptance, operation, selected for decommissioning

[Diagram: a project creates and modifies products; many products together form a portfolio]

(6)

6 I 54

Our claim today

“Measuring the quality of software products is key to successful software projects and healthy software portfolios”

[Diagram: project, products and portfolio]

(7)

7 I 54

Measuring Software Product Quality

(8)

8 I 54

The ISO 25010 standard for software quality

Functional Suitability

Performance Efficiency

Compatibility

Reliability

Portability

Maintainability

Security

(9)

9 I 54

Why focus on maintainability?

Initiation

Build

Acceptance

Maintenance

(10)

10 I 54

The sub-characteristics of maintainability in ISO 25010

Maintain

Analyze

Reuse

Modify

Test

(11)

11 I 54

Measuring maintainability

Some requirements

Simple to understand

Allow root-cause analyses

Technology independent

Easy to compute

Heitlager et al., A Practical Model for Measuring Maintainability, QUATIC 2007

(12)

12 I 54

Measuring ISO 25010 maintainability using the SIG model

Properties: Volume, Duplication, Unit size, Unit complexity, Unit interfacing, Module coupling, Component balance, Component independence

Analysability: Volume, Duplication, Unit size, Component balance
Modifiability: Duplication, Unit complexity, Module coupling
Testability: Unit size, Unit complexity, Component independence
Modularity: Module coupling, Component balance, Component independence
Reusability: Unit size, Unit interfacing
(13)

13 I 54

Measuring maintainability

Different levels of measurement

System

Component

Module

(14)

14 I 54

Source code measurement

Volume

Lines of code

Not comparable between technologies

Function Point Analysis (FPA)

A.J. Albrecht - IBM - 1979

Counted manually

Slow, costly, fairly accurate

Backfiring

Capers Jones - 1995

Convert LOC to FPs

Based on statistics per technology
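To make the backfiring idea concrete, here is a minimal sketch (in Python) that converts per-technology LOC counts into an approximate function point total. The LOC-per-FP ratios and the helper name backfired_function_points are illustrative assumptions, not the published Capers Jones tables.

# Hypothetical backfiring sketch: convert LOC per technology into an
# approximate, technology-independent volume in function points.
LOC_PER_FP = {  # illustrative "LOC per function point" ratios, not calibrated values
    "java": 53,
    "cobol": 107,
    "sql": 21,
}

def backfired_function_points(loc_per_technology):
    """Sum LOC / (LOC per function point) over all technologies in a system."""
    return sum(loc / LOC_PER_FP[tech] for tech, loc in loc_per_technology.items())

print(backfired_function_points({"java": 120000, "sql": 8000}))  # roughly 2645 FP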

(15)

15 I 54

Source code measurement

Duplication

0: abc

1: def

2: ghi

3: jkl

4: mno

5: pqr

6: stu

7: vwx

8: yz

34: xxxxx

35: def

36: ghi

37: jkl

38: mno

39: pqr

40: stu

41: vwx

42: xxxxxx
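As an illustration of how duplication can be detected, the sketch below flags lines that occur in duplicated blocks of at least six identical lines, mirroring the example above. The six-line block length and the exact, whitespace-stripped matching are simplifying assumptions of this sketch, not necessarily the tool's precise rules.

from collections import defaultdict

BLOCK = 6  # minimum length of a duplicated block; an assumption of this sketch

def duplicated_lines(lines):
    """Return indices of lines that are part of a duplicated block of >= BLOCK lines."""
    windows = defaultdict(list)  # window content -> start indices
    for i in range(len(lines) - BLOCK + 1):
        windows[tuple(line.strip() for line in lines[i:i + BLOCK])].append(i)
    duplicated = set()
    for starts in windows.values():
        if len(starts) > 1:  # the same six-line block occurs more than once
            for start in starts:
                duplicated.update(range(start, start + BLOCK))
    return duplicated

code = ["abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "yz",
        "xxxxx", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx", "xxxxxx"]
print(sorted(duplicated_lines(code)))  # lines 1-7 and 10-16 form a seven-line duplicate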

(16)

16 I 54

Source code measurement

Component balance

Measure for number and relative size of architectural elements

CB = SBO × CSU

SBO = system breakdown optimality, computed as distance from ideal

CSU = component size uniformity, computed with Gini-coefficient

[Example: two alternative breakdowns of a system into components A, B, C, D with different component balance scores]

E. Bouwers, J.P. Correia, A. van Deursen, and J. Visser, A Metric for Assessing Component Balance of Software Architectures, in the proceedings of the 9th Working IEEE/IFIP Conference on Software Architecture (WICSA 2011)
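A minimal sketch of the component balance computation, assuming a simple distance-to-ideal for SBO and one minus the Gini coefficient for CSU; the exact functions used in the WICSA 2011 paper differ in detail, so treat the numbers as illustrative only.

def gini(sizes):
    """Gini coefficient of component sizes: 0 = perfectly uniform, near 1 = one component dominates."""
    ordered = sorted(sizes)
    n, total = len(ordered), sum(ordered)
    cumulative = sum((i + 1) * size for i, size in enumerate(ordered))
    return (2 * cumulative) / (n * total) - (n + 1) / n

def component_balance(sizes, ideal_components=6):
    """CB = SBO x CSU, with crude placeholder definitions for both factors."""
    sbo = max(0.0, 1.0 - abs(len(sizes) - ideal_components) / ideal_components)
    csu = 1.0 - gini(sizes)  # component size uniformity
    return sbo * csu

print(component_balance([100, 120, 90, 110]))  # balanced breakdown scores higher
print(component_balance([500, 20, 10, 5]))     # one dominant component scores lower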

(17)

17 I 54

From measurement to rating

A benchmark-based approach

Benchmark measurements (0.14, 0.96, 0.09, 0.84, ...) are sorted: 0.09, 0.14, ..., 0.84, 0.96

Note: example thresholds

Threshold   Score
0.9         ★★★★★
0.8         ★★★★☆
0.5         ★★★☆☆
0.3         ★★☆☆☆
0.1         ★☆☆☆☆

Example: a measured value of 0.34 rates ★★☆☆☆
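A small sketch of this rating step, using the example thresholds from the slide (which the slide itself marks as examples): the score is the highest star rating whose threshold the measured value still meets.

# Example thresholds copied from this slide (explicitly labelled as examples there):
# a measured value at or above the threshold earns the corresponding score.
EXAMPLE_THRESHOLDS = [(0.9, 5), (0.8, 4), (0.5, 3), (0.3, 2), (0.1, 1)]

def rate(value, thresholds=EXAMPLE_THRESHOLDS):
    """Map a benchmarked measurement to a 1-5 star rating."""
    for threshold, stars in thresholds:
        if value >= threshold:
            return stars
    return 1  # floor rating

print(rate(0.34))  # -> 2 stars, matching the worked example on the slide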

(18)

18 I 54

But what about the measurements on lower levels?

Properties: Volume, Duplication, Unit size, Unit complexity, Unit interfacing, Module coupling, Component balance, Component independence

Analysability: Volume, Duplication, Unit size, Component balance
Modifiability: Duplication, Unit complexity, Module coupling
Testability: Unit size, Unit complexity, Component independence
Modularity: Module coupling, Component balance, Component independence
Reusability: Unit size, Unit interfacing
(19)

19 I 54

Source code measurement

Logical complexity

T. McCabe, IEEE Transactions on Software Engineering, 1976

Academic: number of independent paths per method

Intuitive: number of decisions made in a method

Reality: the number of if statements (and while, for, ...)

[Example method shown on the slide, with McCabe complexity 4]
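In the spirit of the "reality" definition above, here is a rough text-based approximation of McCabe complexity for a Java-like method body. The keyword list and the regular-expression approach are simplifying assumptions of this sketch; a real implementation counts decision points on the parse tree.

import re

# Decision points counted by this rough approximation (an assumption of the sketch).
DECISION_PATTERN = re.compile(r"\b(?:if|while|for|case|catch)\b|&&|\|\|")

def approximate_mccabe(method_body):
    """McCabe complexity ~ number of decision points + 1 (text-based approximation)."""
    return len(DECISION_PATTERN.findall(method_body)) + 1

body = """
    if (a > 0) { x = 1; }
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0 && flag) { x += i; }
    }
"""
print(approximate_mccabe(body))  # 1 + if + for + if + && = 5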

(20)

20 I 54

How can we aggregate this?

(21)

21 I 54

Option 1: Summing

System            Total McCabe   Total LOC
Crawljax          1814           6972
GOAL              6560           25312
Checkstyle        4611           15994
Springframework   22937          79474

(22)

22 I 54

Option 2: Average

System            Average McCabe
Crawljax          1.87
GOAL              2.45
Checkstyle        2.46
Springframework   1.99

0" 200" 400" 600" 800" 1000" 1200" 1400" 1600" 1800" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10" 11" 12" 13" 14" 15" 17" 18" 19" 20" 21" 22" 23" 24" 25" 27" 29" 32"
(23)

23 I 54

Cyclomatic complexity   Risk category
1 - 5                   Low
6 - 10                  Moderate
11 - 25                 High
> 25                    Very high

Sum lines of code per category

Lines of code per risk category: Low 70 %, Moderate 12 %, High 13 %, Very high 5 %

[Stacked bar chart: lines of code per risk category for Crawljax, Goal, Checkstyle, Springframework]
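A sketch of this aggregation: each method contributes its lines of code to the risk category of its McCabe value, and the profile is the percentage of code per category. The method data in the example is made up.

# Risk categories from this slide: McCabe 1-5 low, 6-10 moderate, 11-25 high, > 25 very high.
CATEGORIES = [(5, "low"), (10, "moderate"), (25, "high")]

def risk_profile(methods):
    """methods: (lines_of_code, mccabe) per method -> percentage of LOC per risk category."""
    loc_per_category = {"low": 0, "moderate": 0, "high": 0, "very high": 0}
    for loc, mccabe in methods:
        for upper_bound, category in CATEGORIES:
            if mccabe <= upper_bound:
                loc_per_category[category] += loc
                break
        else:
            loc_per_category["very high"] += loc
    total = sum(loc_per_category.values())
    return {category: 100.0 * loc / total for category, loc in loc_per_category.items()}

print(risk_profile([(10, 2), (25, 3), (40, 8), (30, 14), (12, 30)]))  # made-up method data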

(24)

24 I 54

(25)

25 I 54

The formal six-step process

(26)

26 I 54

Visualizing the calculated metrics

(27)

27 I 54

Choosing a weight metric

(28)

28 I 54

Calculate for a benchmark of systems

Alves et al., Deriving Metric Thresholds from Benchmark Data, ICSM 2010

(29)

29 I 54

SIG Maintainability Model

Derivation of metric thresholds

1. Measure systems in benchmark
2. Summarize all measurements
3. Derive thresholds that bring out the metric’s variability
4. Round the thresholds

(A sketch of steps 2 and 3 follows after the excerpt below.)

Cyclomatic complexity   Risk category
1 - 5                   Low
6 - 10                  Moderate
11 - 25                 High
> 25                    Very high

[Excerpt from Alves et al., ICSM 2010, shown on the slide:]

Fig. 10: Unit size (method size in LOC) — (a) metric distribution, (b) box plot per risk category
Fig. 11: Unit interfacing (number of parameters) — (a) metric distribution, (b) box plot per risk category
Fig. 12: Module inward coupling (file fan-in) — (a) metric distribution, (b) box plot per risk category
Fig. 13: Module interface size (number of methods per file) — (a) metric distribution, (b) box plot per risk category

computed thresholds. For instance, the existence of unit test code, which contains very little complexity, will result in lower threshold values. On the other hand, the existence of generated code, which normally has very high complexity, will result in higher threshold values. Hence, it is extremely important to know which data is used for calibration. As previously stated, for deriving thresholds we removed both generated code and test code from our analysis.

VIII. THRESHOLDS FOR SIG’S QUALITY MODEL METRICS

Throughout the paper, the McCabe metric was used as a case study. To investigate the applicability of our methodology to other metrics, we repeated the analysis for the SIG quality model metrics. We found that our methodology can be successfully applied to derive thresholds for all these metrics.

Figures 10, 11, 12, and 13 depict the distribution and the box plot per risk category for unit size (method size in LOC), unit interfacing (number of parameters per method), module inward coupling (file fan-in), and module interface size (number of methods per file), respectively.

From the distribution plots, we can observe, as for McCabe, that for all metrics both the highest values and the variability between systems are concentrated in the last quantiles.

Table IV summarizes the quantiles used and the derived thresholds for all the metrics from the SIG quality model. As for the McCabe metric, we derived quality profiles for each metric using our benchmark in order to verify that the thresholds are representative of the chosen quantiles. The results are again similar. Except for the unit interfacing metric, the low risk category is centered around 70% of the code and all others are centered around 10%. For the unit interfacing metric, since the variability is relatively small until the 80% quantile, we decided to use the 80%, 90% and 95% quantiles to derive thresholds. For this metric, the low risk category is around 80%, the moderate risk is near 10% and the other two around 5%. Hence, from the box plots we can observe that the thresholds are indeed recognizing code around the defined quantiles.

IX. CONCLUSION

A. Contributions

We proposed a novel methodology for deriving software metric thresholds and a calibration of previously introduced metrics. Our methodology improves over others by fulfilling three fundamental requirements: i) it respects the statistical properties of the metric, such as metric scale and distribution; ii) it is based on data analysis from a representative set of systems (benchmark); iii) it is repeatable, transparent and straightforward to carry out. These requirements were achieved by aggregating measurements from different systems using relative size weighting. Our methodology was applied to a large set of systems and thresholds were derived by choosing specific percentages of overall code of the benchmark.

B. Discussion

Using a benchmark of 100 object-oriented systems (C# and Java), both proprietary and open-source, we explained

Derive & Round

(30)

30 I 54

Cyclomatic complexity   Risk category
1 - 5                   Low
6 - 10                  Moderate
11 - 25                 High
> 25                    Very high

Sum lines of code per category

Lines of code per risk category: Low 70 %, Moderate 12 %, High 13 %, Very high 5 %

[Stacked bar chart: lines of code per risk category for Crawljax, Goal, Checkstyle, Springframework]

(31)

31 I 54

(32)

32 I 54

How to rank quality profiles?

Unit Complexity profiles for 20 random systems

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

Which of the ratings ★★★★★, ★★★★☆, ★★★☆☆, ★★☆☆☆, ★☆☆☆☆ should each profile get?

(33)

33 I 54

Ordering by highest-category is not enough!

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

★★★★★  ★★★★☆  ★★★☆☆  ★★☆☆☆  ★☆☆☆☆

?

(34)

34 I 54

A better ranking algorithm

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

Order categories

Define thresholds of given systems

Find smallest possible thresholds
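One way to read this ordering idea as code: rank a risk profile by its cumulative volume in the riskiest categories first, so less code in "very high", then in "high or worse", then in "moderate or worse" means a better profile. This key function is an interpretation of the approach, not the paper's exact algorithm.

def ranking_key(profile):
    """Cumulative percentages from the riskiest category downwards; smaller is better."""
    very_high = profile["very high"]
    high_or_worse = very_high + profile["high"]
    moderate_or_worse = high_or_worse + profile["moderate"]
    return (very_high, high_or_worse, moderate_or_worse)

profiles = [  # made-up unit complexity profiles (percent of LOC per risk category)
    {"low": 70, "moderate": 12, "high": 13, "very high": 5},
    {"low": 80, "moderate": 5, "high": 5, "very high": 10},
    {"low": 85, "moderate": 10, "high": 5, "very high": 0},
]
for profile in sorted(profiles, key=ranking_key):  # best profile first
    print(profile)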

(35)

35 I 54

Which results in a more natural ordering

Alves et al., Benchmark-based Aggregation of Metrics to Ratings, IWSM / Mensura 2011

★★★★★

★★★★☆

★★★☆☆

★★☆☆☆

(36)

36 I 54

Second-level thresholds

Unit size example

(37)

37 I 54

SIG Maintainability Model

Mapping quality profiles to ratings

1. Calculate quality profiles for the systems in the benchmark
2. Sort quality profiles
3. Select thresholds based on a 5% / 30% / 30% / 30% / 5% distribution

Sort, then select thresholds:

★★★★★
★★★★☆
★★★☆☆
★★☆☆☆
★☆☆☆☆

(38)

38 I 54

SIG measurement model

Putting it all together

[Diagram: property measurements per technology (Java, JSP) are turned into property ratings per technology, combined into measurements and ratings for the system as a whole, and finally aggregated into sub-characteristic ratings. Some properties are marked N/A for one technology (unit interfacing, module coupling, component independence); Volume rates ★★★☆☆ and Component balance ★★★★★ at system level.]

Example system of 21 man-years:

Sub-characteristics: Analysability ★★☆☆☆, Modifiability ★★★☆☆, Testability ★★★★☆, Modularity ★★★★☆, Reusability ★★★☆☆

Maintainability rating: ★★★☆☆
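How the per-technology results might be combined is sketched below as a volume-weighted average of the per-technology ratings; the weighting and rounding are assumptions for illustration, not SIG's exact aggregation procedure.

def combine_property_rating(per_technology):
    """per_technology: technology -> (rating in stars, volume in LOC).
    Volume-weighted average of the per-technology ratings (illustrative only)."""
    total_loc = sum(loc for _, loc in per_technology.values())
    return sum(stars * loc for stars, loc in per_technology.values()) / total_loc

# Hypothetical example: a Java part rated 2 stars and a JSP part rated 1 star.
print(round(combine_property_rating({"Java": (2.0, 80000), "JSP": (1.0, 20000)}), 1))  # 1.8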

(39)

39 I 54

Does this work?

(40)

40 I 54

SIG Maintainability Model

Empirical validation

•  The Influence of Software Maintainability on Issue Handling, MSc thesis, Delft University of Technology

•  Indicators of Issue Handling Efficiency and their Relation to Software Maintainability, MSc thesis, University of Amsterdam

•  Faster Defect Resolution with Higher Technical Quality of Software, SQM 2010

16 projects (2.5 MLOC)

150 versions

50K issues

Internal: maintainability

External: issue handling

(41)

41 I 54

Empirical validation

(42)

42 I 54

Empirical validation

Defect resolution time

(43)

43 I 54

Empirical validation

Quantification

(44)

44 I 54

SIG Quality Model

Quantification

Resolution time for defects and enhancements

Faster issue resolution with higher quality

Between 2 stars and 4 stars, resolution

(45)

45 I 54

SIG Quality Model

Quantification

Productivity (resolved issues per developer per month)

Higher productivity with higher quality

Between 2 stars and 4 stars, productivity

(46)

46 I 54

Does this work?

Yes

Theoretically

(47)

47 I 54

But is it useful?

Your question …

(48)

48 I 54

Software Risk Assessment

Analysis in SIG Laboratory

Kick-off session → Strategy session → Technical session → Technical validation session → Risk validation session → Final presentation → Final Report

(49)

49 I 54

Example

Which system to use?

(50)

50 I 54

Should we accept delay and cost overrun, or cancel the project?

[Diagram: two candidate systems, each with a User Interface, Business Layer and Data Layer; one built as custom code, the other on top of a vendor framework]

(51)

51 I 54

Software Monitoring

Source code → Automated analysis → Updated Website → Interpret → Discuss → Manage → Act! (repeated each monitoring cycle)
(52)

52 I 54

Example

Very specific questions

★★★★★

Volume

★★★★★

Duplication

★★★★★

Unit size

★★★★★

Unit complexity

★★★★★

Unit interfacing

★★★★☆

Module coupling

★★★★☆

Component balance

★★★★☆

(53)

53 I 54

(54)

54 I 54

Summary

Properties: Volume, Duplication, Unit size, Unit complexity, Unit interfacing, Module coupling, Component balance, Component independence

Analysability: Volume, Duplication, Unit size, Component balance
Modifiability: Duplication, Unit complexity, Module coupling
Testability: Unit size, Unit complexity, Component independence
Modularity: Module coupling, Component balance, Component independence
Reusability: Unit size, Unit interfacing

Thank you!

Eric Bouwers

e.bouwers@sig.eu

[Recap: stacked bar chart of lines of code per risk category for Crawljax, Goal, Checkstyle, Springframework, and the five-star rating scale ★★★★★]
