Confirmation Bias as a Human Aspect in Software Engineering

(1)

Data Science Laboratory, Department of Mechanical and Industrial Engineering, Ryerson University

Confirmation Bias as a Human Aspect in Software Engineering

Gul Calikli, PhD

(2)

Why Human Aspects in Software Engineering?

 Enhance decision making under uncertainty, so that

managers can take decisions about efficient allocation of resources during any phase of the SDLC.

 Which parts of software should be prioritized for testing?

 Who should test/develop the most critical parts of software?

 Who should fix the bugs in the most problematic parts of the software?

 Who should/should not develop/

maintain the same source files?

 Who should we hire as a developer/

tester/analyst/designer?

(3)

Why Human Aspects in Software Engineering?

 People’s thought processes have a significant impact on software quality as software is analyzed, designed, tested, developed and managed by

people.

 While solving problems in daily life people use heuristics to solve

problems. When heuristics fail to produce a correct judgment, it results in a cognitive bias.

 Heuristics employed in daily software engineer- ing activities may also result in cognitive biases,

leading to defects.

 Some common cognitive bias types:



confirmation bias



anchoring and adjustment



availability



representativeness.

We focus on

confirmation

bias!

(4)

 Confirmation bias is defined as the tendency of people to seek evidence to verify a hypothesis rather than seeking evidence to refute that hypothesis.

Confirmation Bias in Software

Engineering

(5)

Confirmation Bias in Software Engineering

 Due to confirmation bias, developers tend to perform unit tests to make their program work rather than to break their code.

 During all levels of

software testing, we must

employ a testing strategy,

which includes adequate

attempts to fail the code to

reduce software defect

density.

(6)

Methodology to Quantify Confirmation Bias

Research Question 1:

How can we identify the measures

of confirmation bias in relation to

software development process?

(7)

Methodology to Quantify Confirmation Bias

Challenge: Quantifying confirmation bias to perform empirical analyses.

Proposed Solution:

Our methodology is an iterative

process and it mainly consists of the following steps:

1) Preparation of the confirmation bias test

2) Formation of the confirmation bias

metrics set

(8)

Confirmation Bias Test

 Confirmation bias test consists of the following:

 Interactive Test

 based on “Wason’s Rule Discovery Task”

 Written Test

 based on “Wason’s Selection Task”

Question Type No. of

Questions Abstract Questions 8 Thematic Questions 6 SW development/

testing questions

8 TOTAL 22

Written Test Content

(9)

Confirmation Bias Test

 Wason’s Rule Discovery Task

Initially, subject is given three numbers, which conform to a simple rule

Goal: Discover the correct rule

Experiment Protocol:

repeat until correct rule is announced

write down tree numbers &

reasons for choice;

receive feedback from tester;

if you are sure about the rule announce the rule;

end

if you want to terminate break; % terminated end

end

(10)

Confirmation Bias Test

 Wason’s Selection Task:

 Goal: To find out which of the four cards should be turned over to test the validity of the statement given below:

 “If there is a D on one side of the card, then it has a 3 on its other side.”

p

q p  q

p not- p

q not- q

(11)

Example: Wason’s Rule Discovery Task in Relation to Unit Testing

 Wason’s Rule Discovery Task:

 Subjects have a tendency to select many triples (i.e., test cases) that are consistent with their

hypotheses and few tests that are inconsistent with them.

 Observed Similarity with Functional (Black-box) Testing ³ :

 Program testers may select many test cases consistent with the program specifications (positive tests) and a few that are inconsistent with them (negative tests).

T: Triples conforming to the correct rule

H: Set of triples conforming to the hypotheses in subject’s mind.

(12)

Example: Wason’s Rule Selection Task in Relation to Unit Testing



Example

¹

: Suppose you want to make sure that a program avoids

dereferencing a null pointer by always checking before dereferencing.

Someone tells you there are only four sections of code to be tested, and they have determined the

following things about those sections:

If a pointer is

dereferenced, then it is checked for nullity.

 Section A checks whether the pointer is null. The pointer may or may not be dereferenced there.

 Section B does not check whether the pointer is null. The pointer may or may not be dereferenced there.

 Section C dereferences the pointer. The pointer may or may not have been checked for nullity.

 Section D does not dereference the pointer. The pointer may or may not have been checked for nullity.

Which sections need to be investigated further?

Stacy, W., & MacMillan, J. (1995). Cognitive bias in software engineering. Communication of the ACM, 38(6), 57–63.

(13)

Confirmation Bias Metrics Set

Interactive Test Metrics

Written Test Metrics

…. Next step in definition of the

metrics suite

(14)

Confirmation Bias Metrics Set: Some Practical Results

Test Se verity

Bins of Problem Solving Steps Falsifier Verifier Matcher None Interactive Test Outcome: Hypothesis

Testing Strategy

Written Test Outcome: Reich and Ruth’s Falsifier/Verifier/Matcher Classification

Group1*: Developers of a GSM/Telecommunications company (29 subjects)

Group 8*: Computer Engineering PhD candidates with minimum 2 years of development

experience (36 subjects)

(15)

Influence of Developers’ Confirmation Bias on Software Quality –Part 1

Research Question 2:

How do confirmation biases of developers affect software

quality?

(16)

Influence of Developers’ Confirmation Bias on Software Quality –Part 1

 Dataset:

 Steps of the Analysis:

 Formation of developer groups

 Estimation of developer groups confirmation bias metric values from individual values:

 Measurement of defect rate for each developer group

Analysis of the Pearson correlation between developer

groups’ confirmation bias metrics and defect rates

(17)

Influence of Developers’ Confirmation Bias on Software Quality- Part 1

 Estimation of the correlation between developer groups’ confirmation bias metrics (interactive test) and defect rates.

 Results:

Conventional effect

sizes as offered by Cohen

(Group1*) (Group8*)

(18)

Influence of Developers’ Confirmation Bias on Software Quality- Part 1

 Estimation of the correlation between developer groups’ confirmation bias metrics (written test) and defect rates.

 Results:

Conventional effect

sizes as offered by Cohen

(Group1*) (Group8*)

(19)

Influence of Developers’ Confirmation Bias on Software Quality – Part 2

Research Question 3:

How do measures of confirmation

bias perform in predicting defect

prone parts of software?

(20)

Defect Prediction Models

 Software quality is often measured by the number of defects in the software.

 Testing takes ~50% of overall time in Software Development Lifecycle (SDLC).

 Oracles/Predictors can be used to supplement testing activities for effective allocation of testing resources.

NASA Metrics Data Company Metrics Data Direct Usage of Metrics InfoGain/PCA

Equal Weighting Metrics Weighted Metrics

Decision Tree

Naïve Bayes Classification

(21)

Defect Prediction Models

• k-NN

• Naïve Bayes

• Bayesian Networks

• Neural Networks

• SVM

• Logistic Regression

• ……….

How can we enhance the performance of defect prediction models?

• under-sampling outperformed over-sampling.

• micro-sampling

Algorithms

Data Size

At the intersection of AI and SWE

• Design metrics

• File Dependency Graphs

• Churn Metrics

• Static Code Metrics

• CGBR

• Organizational metrics

• # of developers

• Developer experience

• Social Interaction nws

Data Content

Product/

Process- Related

People-related

(22)

Influence of Developers’ Confirmation Bias on Software Quality – Part 2

 Construction of the Prediction Model (also used in missing data problem)

 Algorithm: Naive Bayes

 Input data: static code, churn, confirmation bias metrics (models are constructed for each combination of these metrics)

 Preprocessing: undersampling

 10x10 cross validation

 Performance measures:

(23)

Influence of Developers’ Confirmation Bias on Software Quality – Part 2

Results

Dataset: ERP

Dataset: Telecom1

Dataset: Telecom2

Dataset: Telecom3

Dataset: Telecom4

(24)

Influence of Developers’ Confirmation Bias on Software Quality – Part 2

 Results Summary:

 Confirmation Bias is a single human aspect.

 Yet, using confirmation bias metrics led to comparable performance results in predicting defect prone parts of software.



The performance of defect prediction models built by using only confirmation bias metrics is comparable with the performance of the defect prediction models that use static code metrics and churn metrics.

 Therefore, we should further investigate other human

aspects…

(25)

Current Work: The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups:

 Problem: Predicting defect rates of developer groups for next releases

of a software product.

 Motivation:

Towards task assignment…

 Solution: Use Partial Least Regression (PLSR) and PCR (Principle

Component Regression).

 Methodology: Train the model with the releases 1, 2, i-1 and test it for the i^th release.

Defect Rate ?

( current & past releases)

unknown high low

REQUIRED!!!

Predict defect rates of developer groups

 use confirmation bias metrics

Avoid that group

Results (Dataset Telecom1) Results (Dataset ERP)

Results (Dataset Telecom2)

The group is ok

Residuals

Developer Group Indices Residual

range: [-0.08- 0.04]

(26)

Current Work: Dealing with Missing Data

 Problem: Collecting data (e.g. confirmation bias metrics) though interviews/tests might be challenging:



Tight schedule of developers



Evaluation apprehension



Lack of motivation



Staff turnover

 Solution: Use Expectation Maximization (EM) Algorithm to impute missing data.

 Methodology (Experimental Setup):



Form 2

^N

-2 different missing data configurations

 (N: Total number of developer groups)



Use EM to impute missing data



Build defect prediction models using imputed data



Compare obtained performance results with the performance of prediction models built using complete data.

All these result in

“missing data

problem”

(27)

Current Work: Dealing with Missing Data



Refer to previous work on defect prediction for the dataset, construction of the model and estimation of performance criteria.

 Experimental Results: Dataset: ERP Dataset: Telecom1

Dataset: Telecom2 Dataset: Telecom3

(28)

Current Work:

 Confirmation Bias Metrics: A new metrics suite proposal to measure the thought processes of developers

 We initially identified a confirmation metrics set.

 Our Current Goal: To complete the following To-Do list

 Form theoretical basis

 Refine existing metrics set



To empirically demonstrate the feasibility of our metrics.

 Formulate a single derived metric using the refined metrics set

 To analytically evaluate our metrics suite and single derived metric against the principles of “measurement theory”.

 Empirically validate the feasibility of the “single derived metric”.

Done!

We are here!

(29)

Current Work:

 Refine Existing Metrics Set:

 Criteria for the final metrics suite:



Metrics should not be highly correlated with each other



Check for the correlation between defect rates and the values of each metric



Metrics should be able to differentiate problematic software product from the rest.

Telecom1 Telecom2 ERP

• Metric Name: positiveCompatible

• χ2 : 84.9, df: 4

• Telecom1 is currently experiencing

serious post-release defects.

• Telecom2 and ERP are mission critical

Software as they include billing and

charging modules.

(30)

Current Work:

 Refine Existing Metrics Set ^(cont’d) :

 Criteria for the final metrics suite:



Metrics should be able to differentiate problematic software product from the rest.

• Metric Name: Ind

_ElimEnum^(Wason’s Eliminative/Enumerative Index)

• χ2 : 100, df: 6

• Telecom1 is currently experiencing

serious post-release defects.

• Telecom2 and ERP are mission critical

Software as they include billing and charging modules.

Telecom1 Telecom2 ERP

(31)

Current Work:

 Formulate a single derived metric

 In order to make the interpretation of the results much easier, we formulated a a single derived metric to quantify

confirmation bias level.

 Confirmation Bias Level:

Deviation of confirmation

bias metrics values from the

corresponding ideal metrics

values.

(32)

Current Work:



To analytically evaluate our metrics suite and single derived metric against the principles of “measurement theory”:

 According to “measurement theory”:



We begin with a set of objects, each of which has one or more common attributes, each of which in turn can be divided into exclusive and

exhaustive equivalence classes.



The objects and the relationship between them constitute an Empirical Relational System (ERS). In parallel, we construct a Numerical Relational System (NRS) comprising numbers and the relationships between them.



Example: Let M(x) be the value of the variable “length” for rod x, we assign numbers such that M(x) ≥M(y) if and only if x y , where represents

“not shorter than x”.  Establish a homomorphism from the ERS denoted by [A, ], where A represents the set of rows to the NRS denoted by [R,

≥]

Ê Ê

Ê

(33)

Current Work:



To analytically evaluate our metrics suite and single derived metric against the principles of “measurement theory”.



Question: Which concepts should be inherited from the “measurement theory” so that the following are prevented?



Lacking in desirable measurement properties



Being insufficiently generalized.



Some formulations of measurement fail for disciplines such as psychology (example: “concatenation” operation x o y = z )  such formulations should be identified. Appropriate ones should be inherited.



Goals:



Avoiding the criticism regarding the lack of theoretical base in the formation of a metrics set.



To form a formal methodology to define metrics sets for other cognitive aspects of people.

(34)

 Related Publications

 G. Calikli and A. Bener, “The Impact of Confirmation Bias on the Release-based Defect Prediction of Developer Groups”, the 25^th Conference on Software Engineering and Knowledge Engineering (SEKE 2013), Boston, USA, 2013. (submitted)

 G. Calikli and A. Bener, “Influence of Confirmation Biases of Developers on Software Quality: An Empirical Study“, Software Quality Journal, 2012

 G. Calikli, B. Caglayan, A. Tosun and A. Bener, “Modeling Human Aspects to Enhance Software Quality Management”, 2012 International Conference on Information Systems (ICIS 2012), Orlando Florida, USA, December, 2012.

 B. Caglayan, A. Tosun , G. Calikli, T. Aytac, A. Bener, and B. Turhan, “Dione: An Integrated Measurement and Defect Prediction Solution”, 20th International Symposium on Foundations of Software Engineering, Cary, North Carolina, USA, September, 2012.

 G. Calikli, and A. Bener, “Empirical Analyses of the Factors Affecting Confirmation Bias and the Effects of

Confirmation Bias on Software Developer/ Tester Performance”, Promise 2010, Tmişoara, Romania, September, 12-13, 2010.

 G. Calikli, and A. Bener, “Preliminary Analysis of the Effects of Confirmation Bias on Software Defect Density”, ESEM 2010, Bozen, Italy, September, 16-17, 2010.

 G. Calikli, B. Arslan and A. Bener, “Confirmation Bias in Software Development and Testing: An Analysis of the Effects of Company Size, Experience and Reasoning Skills”, 22nd Annual Psychology of Programming Interest Group Workshop, 19-21 September 2010.

 G. Calikli, A. Bener, and B. Arslan, “An Analysis of the Effects of Company Culture, Education and Experience on Confirmation Bias Levels of Software Developers and Testers”, ICSE 2010, May 2-8, Cape Town.

 G. Calikli, A. Tosun, A. Bener, and M. Celik, “The Effect of Granularity Level on Software Defect Prediction", Proceedings of the 24th International Symposium on Computer and Information Sciences (ISCIS 2009), pp.