Software Metrics: Roadmap
By Norman E. Fenton and Martin Neil
Authors (1/2)
- Norman Fenton is Professor of Computing at Queen Mary (University of London) and is also Chief Executive Officer of Agena, a company that specialises in risk management for critical systems. He is head of RADAR (Risk Assessment and Decision Analysis)
Authors (2/2)
- Martin Neil is a Reader in "Systems Risk" at the Department of Computer Science, Queen Mary, University of London, where he teaches decision and risk analysis and software engineering. Martin is also a joint founder and Chief Technology Officer of Agena Ltd (UK).
Plan
- Introduction
- Brief history of software metrics
- Weaknesses of traditional approaches
- Causal models
- Future work
- Comments on the article
Introduction (1/9)
- The car accidents example
  - “Data on car accidents in both the US and the UK reveal that January and February are the months when the fewest fatalities occur.”
Introduction (2/9)
- The car accidents example
  - “Thus, if you collect a database of fatalities organised by months and use this to build a regression model, your model would predict that it is safest to drive when weather is coldest and roads are at their most treacherous.”
Introduction (3/9)
- The car accidents example
  - “Such a conclusion is perfectly sensible given the data available, but intuitively we know it’s wrong.”
  - “The problem is that you do not have all the relevant data to make a sensible decision about the safest time to drive.”
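The point about missing data can be made concrete with a little arithmetic. The sketch below assumes that the missing variable is the number of journeys made each month; the slides do not name it here, and every figure is invented purely for illustration. It shows how the month with the fewest fatalities need not be the month with the lowest risk per journey.

```python
# Hypothetical monthly figures, invented for illustration only.
# journeys: millions of car journeys; fatalities: road deaths.
monthly = {
    "January": {"journeys": 50, "fatalities": 200},
    "July": {"journeys": 100, "fatalities": 300},
}


def fatality_rate(record):
    """Fatalities per million journeys (assumed exposure measure)."""
    return record["fatalities"] / record["journeys"]


# What the raw counts suggest: the month with the fewest deaths.
fewest_deaths = min(monthly, key=lambda m: monthly[m]["fatalities"])

# What the exposure-adjusted view suggests: the lowest risk per journey.
lowest_risk = min(monthly, key=lambda m: fatality_rate(monthly[m]))

print("Fewest fatalities:", fewest_deaths)
print("Lowest risk per journey:", lowest_risk)
```

With these made-up numbers the two answers differ, which is the trap a model built on fatality counts alone would fall into.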
Introduction (4/9)
- The car accidents example
Introduction (5/9)
- “So what has this got to do with software metrics? Well, software metrics has been dominated by statistical models, such as regression models, when what is really needed are causal models.”
Introduction (6/9)
- Software resource estimation
  - “Much software metrics has been driven by the need for resource prediction models.”
  - “Usually this work has involved models of the form”
    effort = f(size)
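As an illustration of what a model of the form effort = f(size) looks like in practice, here is a minimal sketch that fits a power law, effort = a * size^b, by least squares in log-log space. The power-law form is an assumption (it is the shape used by COCOMO-style models); the slides only state effort = f(size). The function names and the project data are hypothetical.

```python
import math


def fit_power_law(sizes, efforts):
    """Least-squares fit of effort = a * size**b in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept = log(a)
    return a, b


def predict_effort(a, b, size):
    """Point prediction only: no uncertainty, no causal factors."""
    return a * size ** b


# Hypothetical past projects: size in KLOC, effort in person-months.
sizes = [10, 20, 40, 80]
efforts = [24, 52, 110, 235]

a, b = fit_power_law(sizes, efforts)
print("Predicted effort for 50 KLOC:", predict_effort(a, b, 50))
```

Note that the fitted model returns a single number for a given size. It says nothing about staff quality, process, or schedule pressure, so it offers no handle for deciding what to change.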
Introduction (7/9)
- Problems with effort = f(size)
  - Size cannot cause effort.
  - Such models cannot be used for risk assessment because they lack an explanatory framework.
  - Managers can’t decide how to improve things from the model’s outputs.
Introduction (8/9)
- Solution: causal modeling
  - Provide an explanatory structure to explain events that can then be quantified.
  - Provide information to support quantitative managerial decision-making during the software lifecycle.
  - Provide support for risk assessment and reduction.
Introduction (9/9)
- Software resource estimation
History of metrics (1/13)
- Def.: Software metrics is a collective term used to describe the very wide range of activities concerned with measurement in software engineering.
History of metrics (2/13)
- These activities range from:
  - Producing numbers that characterize properties of software code
  - to building models that help predict software resource requirements and software quality
History of metrics (3/13)
- Software metrics have been in use since the mid-1960s
- At that time, Lines of Code (LOC) was used as a measure of productivity and effort
History of metrics (4/13)
- Problems using metrics:
  - Theory and practice have been out of step
  - Metrics are often misunderstood, misused, and even reviled
  - Industry is not convinced of the benefits of metrics
  - Metrics programs are used when things
History of metrics (5/13)
- The two components of software metrics:
  - The component concerned with defining the actual measures
  - The component concerned with how we collect, manage and use the measures
History of metrics (6/13)
History of metrics (7/13)
- Rationale for using metrics
  - The desire to assess or predict the effort/cost of development processes
  - The desire to assess or predict the quality of software products
History of metrics (8/13)
- “The key in both cases has been the assumption that product ‘size’ should drive any predictive models.”
History of metrics (9/13)
- LOC/programmer month as a productivity measure
- Regression-based resource prediction by Putnam and Boehm:
  effort = f(LOC)
- Program quality measurement (usually defects/KLOC)
History of metrics (10/13)
- In the mid-1970s, the drawbacks of using LOC as a measure for different notions of program size were recognized.
- LOC cannot be compared between high- and low-level programming languages
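A small worked example may help show why LOC does not compare across languages. The sketch below computes LOC per programmer-month for the same feature implemented twice; the figures and the names `assembly` and `high_level` are hypothetical, chosen only to expose the effect.

```python
def loc_per_programmer_month(loc, programmer_months):
    """Classic productivity measure: lines of code per programmer-month."""
    return loc / programmer_months


# Hypothetical: the same functionality delivered in two languages.
assembly = {"loc": 3000, "programmer_months": 3.0}
high_level = {"loc": 500, "programmer_months": 1.0}

prod_assembly = loc_per_programmer_month(**assembly)
prod_high_level = loc_per_programmer_month(**high_level)

print("Assembly:", prod_assembly, "LOC/programmer-month")
print("High-level:", prod_high_level, "LOC/programmer-month")
```

Under these assumed numbers the assembly team scores higher on the measure even though the high-level team delivered the same functionality in a third of the time.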
History of metrics (11/13)
- From the mid-1970s, interest grew in measures of software complexity and functional size (such as function points)
- The rationale for these metrics is still to assess quality and effort/cost
History of metrics (12/13)
- The study of software metrics has been dominated by defining specific measures and models.
- Much recent work has been concerned with collecting, managing, and using metrics in practice.
History of metrics (13/13)
- Most notable advances
  - Work on the mechanics of implementing metrics programs
    - Grady and Caswell: first company-wide software metrics program
    - Basili, Rombach: GQM
  - The use of metrics in empirical software engineering
    - Benchmarking and evaluating the effectiveness of software engineering methods, tools and technologies (Basili)
Weaknesses of traditional approaches (1/11)
- “The approaches to both quality prediction and resource prediction have remained fundamentally unchanged since the early 1980’s.”
Weaknesses of traditional approaches (2/11)
- These approaches have provided some extremely valuable empirical results, but they cannot be used effectively for quantitative management and risk analysis, which is the primary objective of metrics.
Weaknesses of traditional approaches (3/11)
- Regression-based model for quality prediction:
  defect density = f(complexity metric)
- Problems
  - Incapable of predicting defects accurately
Weaknesses of traditional approaches (4/11)
- A further empirical study (Fenton) showed:
  - Size metrics (while correlated with the gross number of defects) are poor indicators of defects
  - Static complexity metrics are not significantly better as predictors
  - Pre-release defect counts are a very bad indicator of quality
    - The lunch story
Weaknesses of traditional approaches (5/11)
Weaknesses of traditional approaches (6/11)
- These results invalidate models:
  - using pre-release faults as a measure of operational quality
  - using complexity metrics to predict which modules will be fault-prone post-release
    - Complexity metrics were judged ‘valid’ if correlated with pre-release fault density
Weaknesses of traditional approaches (7/11)
- Empirical phenomenon observed by Adams (1984):
  - “[…] most operational system failures are caused by a small proportion of the latent faults.”
- The fact that fault density (in terms of pre-release faults) was used as a measure of user perceived software
Weaknesses of traditional approaches (8/11)
- Explanations of the scatter plot
  - “Most of the modules that had high number of pre-release, low number of post-release faults just happened to be very well tested.”
  - A module that is never executed will never reveal latent faults (no matter how many), hence operational usage must be taken into account.
Weaknesses of traditional approaches (9/11)
- Other problems with regression-based models for resource prediction:
  - Lack causal factors to explain variation
  - Based on limited historical data
  - Resource constraints not modeled
  - Black box models
  - Cannot handle uncertainty
  - Little support for risk assessment and
Weaknesses of traditional approaches (10/11)
- The classic problem: “Is this system sufficiently reliable to ship?”
- Useful information:
  - Measurement data from testing (such as defects found in various testing phases)
  - Empirical data about the process and resources used
  - Subjective information about the process/resources
  - Very specific and important pieces of evidence (proof of correctness)
Weaknesses of traditional approaches (11/11)
- In practice, we only possess fragments of such information.
- The question is how to combine such diverse information and then how to use it to help solve a decision problem that involves risk.
Causal models (1/7)
- We need a model that takes account of the concepts missing from regression-based approaches:
  - Diverse process and product variables
  - Empirical evidence and expert judgement
  - Genuine cause and effect relationships
  - Uncertainty
  - Incomplete information
Causal models (2/7)
- Def.: A BBN (Bayesian belief network) is a graphical network together with an associated set of probability tables. The nodes represent uncertain variables and the arcs represent the causal/relevance relationships between the variables.
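The definition above can be sketched in a few lines of code: a graph whose nodes each carry a probability table conditioned on their parents, plus a way to ask for a probability given whatever evidence is available. The network below is a hypothetical toy, not the model from the article; the node names and all probabilities are invented for illustration, and inference is done by brute-force enumeration, which only suits very small networks.

```python
from itertools import product

# Each node maps to (parents, table), where the table gives
# P(node is True | parent values). All numbers are hypothetical.
NETWORK = {
    "many_faults": ((), {(): 0.3}),
    "well_tested": ((), {(): 0.5}),
    "many_pre_release": (
        ("many_faults", "well_tested"),
        {(True, True): 0.9, (True, False): 0.3,
         (False, True): 0.2, (False, False): 0.05},
    ),
    "many_post_release": (
        ("many_faults", "well_tested"),
        {(True, True): 0.1, (True, False): 0.8,
         (False, True): 0.02, (False, False): 0.1},
    ),
}


def joint_probability(assignment):
    """Probability of one full True/False assignment to every node."""
    p = 1.0
    for node, (parents, table) in NETWORK.items():
        p_true = table[tuple(assignment[parent] for parent in parents)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(target, evidence=None):
    """P(target is True | evidence), summing over all unobserved nodes."""
    evidence = evidence or {}
    nodes = list(NETWORK)
    numerator = 0.0
    denominator = 0.0
    for values in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, values))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # inconsistent with what has been observed
        p = joint_probability(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator


print("No evidence:", query("many_post_release"))
print("Many pre-release faults:",
      query("many_post_release", {"many_pre_release": True}))
print("Many pre-release faults, well tested:",
      query("many_post_release",
            {"many_pre_release": True, "well_tested": True}))
```

With these illustrative tables, the belief in many post-release faults falls below its prior once a module with many pre-release faults is also known to be well tested. Evidence can be supplied for any subset of nodes; unobserved nodes are simply summed out.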
Causal models (3/7)
Causal models (4/7)
- Building and executing realistic BBN models is now possible because of recent algorithms and software tools.
- Practical applications:
  - Medical diagnosis
  - Mechanical failure diagnosis
  - Help wizards in Microsoft Office
Causal models (5/7)
Causal models (6/7)
Causal models (7/7)
- Benefits of using BBNs:
  - Explicit modeling of ignorance and uncertainty
  - Combines diverse types of information
  - Makes assumptions explicit
  - Intuitive graphical format
  - Ability to forecast with missing data
  - Use of ‘what-if?’ analysis
  - Use of subjectively or objectively derived probability distributions
  - Rigorous mathematical semantics