Samiksha: Mining Issue Tracking System for Contribution and Performance Assessment


Ayushi Rastogi

IIIT-D

New Delhi, India

ayushir@iiitd.ac.in

Arpit Gupta

DTU

New Delhi, India

arpitg1991@gmail.com

Ashish Sureka

IIIT-D

New Delhi, India

ashish@iiitd.ac.in

ABSTRACT

Individual contribution and performance assessment is a standard practice conducted in organizations to measure the value addition by various contributors. Accurate measurement of individual contributions based on pre-defined objectives, roles and Key Performance Indicators (KPIs) is a challenging task. In this paper, we propose a contribution and performance assessment framework (called Samiksha) in the context of Software Maintenance. The focus of the study presented in this paper is Software Maintenance Activities (such as bug fixing and feature enhancement) performed by bug reporters, bug triagers, bug fixers, software developers, quality assurance and project managers, facilitated by an Issue Tracking System.

We present the results of a survey that we conducted to understand practitioners' perspectives and experience (specifically on the topic of contribution assessment for software maintenance professionals). We propose several performance metrics covering different aspects (such as the number of bugs fixed weighted by priority and the quality of bugs reported) and various roles (such as bug reporter and bug fixer). We conduct a series of experiments on Google Chromium Project data (extracted from the issue tracker for the Google Chromium Project) and present results demonstrating the effectiveness of our proposed framework.

Categories and Subject Descriptors

H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures; D.2.9 [Software Engineering]: Management—Productivity, Programming teams

General Terms

Algorithms, Experimentation, Measurement

Keywords

Mining Software Repositories, Software Maintenance, Developer Contribution Assessment, Issue Tracking System


1. RESEARCH MOTIVATION AND AIM

Contribution and performance assessment of employees is an important and routine activity performed within organizations to measure value addition based on pre-defined Key Performance Indicators (KPIs) [9] [11]. Measuring contribution and impact based on the job role and responsibilities of an employee is required for career advancement decisions, identification of strengths and areas of improvement of the employee, rewards and recognition, organizational efficiency and output improvement, and making informed business decisions [9] [11].

Software Maintenance, and particularly the defect-fixing process facilitated by an Issue Tracking System (the focus of the work presented in this paper), is a collaborative activity involving several roles such as bug reporter, bug triager, bug fixer, project or QA (quality assurance) manager and developers. Contribution and performance assessment of software maintenance professionals, as well as defect-fixing performance, is an area that has recently attracted several researchers' attention [1] [5] [8] [10] [12] [13] [4]. The work presented in this paper is motivated by the need to develop a novel framework and metrics to accurately and objectively measure the contribution of software maintenance professionals by mining data archived in Issue Tracking Systems.

The research aim of the work presented in this paper is the following:

1. To investigate models and metrics to accurately, reliably and objectively measure the contribution and performance of software maintenance professionals (bug reporters, issue triagers, bug fixers, software developers and contributors in issue resolution, quality assurance and project managers) involved in the defect fixing process.

2. To investigate role-based contribution metrics and Key Performance Indicators (KPIs) which can be easily computed by mining data (data-driven and evidence-based) in an Issue Tracking System.

3. To validate the proposed metrics with software maintenance professionals in Industry (experienced practitioners) and apply the proposed metrics on a real-world dataset (empirical analysis and experiments), demonstrating the effectiveness of the approach using a case-study.


2. RELATED WORK AND RESEARCH CONTRIBUTIONS

In this section, we discuss work closely related to the study presented in this paper and present our novel research contributions in the context of existing work. We conducted a literature survey in the area of mining software repositories for contribution and performance assessment (specifically in the context of Software Maintenance) and present our review of the relevant research.

The most closely related research to the study presented in this paper is the work by Nagwani et al. However, there are several differences between the work by Nagwani et al. and this paper. Nagwani et al. propose a team-member ranking technique for software defect archives [12]. Their technique consists of generating a ranked list of developers based on individual contributions such as the quantity and quality of bugs fixed, comment activity in discussion forums and bug reporting productivity [12].

Gousios et al. present an approach for measuring developer involvement and activity by mining software repositories such as the source code repository, document archives, mailing lists, discussion forums, defect tracking system, Wiki and IRC [5]. They present a list of pre-defined developer actions (easily measurable, with weights assigned based on importance) such as reporting a bug, starting a new wiki page, and committing code to the source-code repository, which are used as variables in a contribution factor function [5].

Ahsan et al. present an application of mining developer effort data from a bug-fix activity repository to build effort estimation models for the purpose of planning (such as scheduling a project release date) and cost estimation [1]. Their approach consists of measuring the time elapsed between the assigned and resolved status of a bug report and multiplying the time with weights (incorporating defect severity-level information) to compute a developer contribution measure [1].

Kidane et al. propose two productivity indices (for online communities of developers and users of open source projects such as Eclipse) in their study on correlating temporal communication patterns with performance and creativity: a creativity index and a performance index [10]. They define the creativity index as the ratio of the number of enhancements (change in the quantity of software) integrated in a pre-defined time-period to the number of bugs resolved in the same time-period [10]. The performance index is defined as the ratio of the number of bugs resolved in a pre-defined time-period to the number of bugs reported in the same time-period.

Kanij et al. mention that there are no well established and widely adopted performance metrics for software testers and motivate the need for a multi-faceted evaluation method for software testers [8]. They conduct a survey of Industry professionals and list five factors which are important for measuring performance of software testers: Number of bugs found, Severity of bugs, Quality of bug report, Ability of bug advocacy and Rigorousness of test planning and execution [8].

Rigby et al. present an approach to assess the personality of developers by mining mailing list archives [13]. They conduct experiments (using a psychometrically-based word count text analysis tool called Linguistic Inquiry and Word Count) on the Apache HTTPD server's mailing list to assess personality traits of Apache developers such as diligence, attitude, agreeableness, openness, neuroticism and extroversion [13].

Fernandes et al. mention that performance evaluation of teams is challenging in software development projects [4]. They propose an analytical model (Stochastic Automata Networks model) and conduct a practical case-study in the context of a distributed software development setting [4].

In the context of existing literature, the work presented in this paper makes the following novel contributions:

1. A framework (called Samiksha, a Hindi word which means review and critique) for contribution and performance assessment of Software Maintenance Professionals. We propose 11 metrics (Key Performance Indicators in the context of the activities and duties of the appraisee under evaluation) for four different roles. The novel contribution of this work is to assess a developer in terms of the various roles played by that developer. The metrics (data-driven and evidence-based) are computed by mining data in an Issue Tracking System. While there are some studies on the topic of contribution assessment for Software Maintenance Professionals, we believe (based on our literature survey) that the area is relatively unexplored, and in this paper we present a fresh perspective on the problem.

2. A survey conducted with experienced Software Maintenance Professionals on the topic of contribution and performance assessment for the bug reporter, bug owner, bug triager and contributor roles. We believe that there is a dearth of academic studies surveying the current practices and metrics used in Industry for contribution assessment of Software Maintenance Professionals. To the best of our knowledge, the work presented in this paper is the first study to present the framework, a survey of practitioners and an application of the proposed framework on a real-world publicly available dataset from a popular open-source project.

3. An empirical analysis of the proposed approach on the Google Chromium Issue Tracker dataset. We conduct a series of experiments on a real-world dataset, present the results of our experiments in the form of visualizations (from the perspective of an appraiser), and present our insights.

3. RESEARCH METHODOLOGY AND FRAMEWORK

The research work presented in this paper is motivated by the need to investigate solutions to common problems encountered by Software Maintenance Professionals. We believe that inputs from practitioners are needed to inform our research. Hence we conducted a survey and discussion with two experienced industry professionals (at project manager level, with more than a decade of experience in software development and maintenance) and present the results of the survey as well as our insights (refer to Section 4). We present 11 performance and contribution metrics for four roles: bug reporter, bug triager, bug owner and collaborator. We define each metric and describe its rationale, interpretation and formula (Section 5). The metrics are discussed at an abstract level and are not specific to a project or Issue Tracker.


Table 1: Defining the metadata of the survey

Attribute | Definition | Scale
PIK | Perceived Importance of Key Performance Indicators | 5: Highly influences (important) ... 1: Does not influence (least important)
CopKC | Current organizational practices to capture KPIs for Contribution & Performance Assessment | 1: Captured accurately (formula based, i.e. objective measure); 2: Captured accurately (based on subjective information, self-reporting and perception); 3: Unable to capture accurately; 4: Not relevant / Not considered

Table 2: Survey results for the role of Bug Reporter

Survey question | Company A PIK | Company A CopKC | Company B PIK | Company B CopKC
Number of bugs reported | 4 | 1 | 3 | 1
Number of Duplicate bugs reported | 5 | - | 4 | -
Number of non-Duplicate bugs reported | 3 | - | 5 | -
Number of Invalid bugs reported | 3 | - | 5 | -
Number of high Priority bugs reported | 5 | NULL | 4 | 1
Number of high Severity bugs reported | 5 | NULL | 5 | 1
Number of bugs reported that are later reopened | NULL | NULL | 4 | 1
Number of hours worked | - | 3 | - | -
Quality of reported bugs | 5 | 3 | 5 | 2
Followed bug reporting guidelines | 5 | 2 | 5 | 2
Correctly assigned bug area (like WebKit, BrowserUI etc.) | 5 | NULL | 4 | 3
Correctly assigned bug type (like Regression, Performance etc.) | 4 | NULL | 5 | 3
Reported bugs across multiple components (Diversity of Experience) | 4 | 3 | 2 | 3
Reported bugs belonging to a specific component (Specialization) | NULL | 3 | 4 | 3
Participation level delivered (responded to queries and clarifications) after reporting bug | - | 3 | - | 3

Table 3: Survey results for the role of Bug Owner

Survey question | Company A PIK | Company A CopKC | Company B PIK | Company B CopKC
Number of bugs assigned or owned | 5 | - | 4 | -
Number of bugs successfully resolved (from the set of bugs owned) | 5 | NULL | 5 | 1
Number of high Priority bugs owned | 5 | 1 | 4 | 3
Number of high Severity bugs owned | 5 | 3 | 4 | 1
Number of resolved bugs that get reopened | 5 | 1 | 4 | 1
Number of hours worked | - | 1 | - | 3
Participated in (facilitated) discussion through comments on Issue Tracker | 4 | - | 3 | -
Owned bugs across multiple components (Diversity of Experience) | 5 | 3 | 3 | 3
Owned bugs belonging to a specific component (Specialization) | 5 | 3 | 3 | 3
Average time taken to resolve bug | 5 | NULL | 4 | 2
Response time to a directly addressed comment | - | 3 | - | NULL

Table 4: Survey results for the role of Bug Collaborator

Survey question | Company A PIK | Company A CopKC | Company B PIK | Company B CopKC
Participation level delivered (responded to queries and clarifications) through online threaded discussions | 3 | NULL | 4 | 1
Response time to a directly addressed comment on Issue Tracker | 4 | NULL | 5 | 3
Collaborated in bugs across multiple components (Diversity of Experience) | 2 | 4 | 2 | 3
Collaborated in bugs belonging to a specific component (Specialization) | 3 | 2 | 2 | 3
Average time taken to resolve bug | 5 | 3 | 1 | 2
Number of times collaborated on high Priority bugs | 5 | NULL | 3 | 2
Number of times collaborated on high Severity bugs | 5 | NULL | 3 | 2
Number of hours worked | - | 3 | - | 3

Table 5: Survey results for the role of Bug Triager

Survey question | Company A PIK | Company A CopKC | Company B PIK | Company B CopKC
Number of bugs Triaged | 5 | NULL | 5 | 2
Number of hours worked | - | 2 | - | 3
Identified Duplicate bug reports | 5 | NULL | 4 | 2
Identified Invalid bug reports | 5 | NULL | 4 | 2
Identified exact bug area (like WebKit, BrowserUI etc.) | 5 | 3 | 5 | 3
Assigned best developer considering skills and workload | 5 | - | 5 | -
Correctly assigned owner (fixer) | - | 3 | - | 3
Correctly assigned bug type (like Regression, Performance etc.) | 4 | 2 | 5 | 2
Correctly assigned Priority/Severity | 5 | 3 | 5 | 3
Participated in (facilitated) discussion through comments on Issue Tracker | 4 | - | 4 | -
Response time to a directly addressed comment on Issue Tracker | - | 3 | - | 3


We conduct a case-study on the Google Chromium project (a widely used and popular open-source browser). The Issue Tracker for the Google Chromium project is publicly available and the data can be downloaded using the Issue Tracker API. We implement the proposed metrics and apply the approach on the Google Chromium dataset. We present the results of our experiments in the form of visualizations and graphs and interpret the results from the perspective of an appraiser (Section 7).

4. SURVEY OF SOFTWARE MAINTENANCE PROFESSIONALS

We conducted a survey to understand the relative importance of various Key Performance Indicators (KPIs) for the four roles: bug reporter, bug owner, bug triager and collaborator. We prepared a questionnaire consisting of four parts (refer to Tables 2, 3, 4, 5). Each part corresponds to one of the four roles. The questionnaire for each part consists of a question and two response fields. The question mentions a specific activity for a particular role. For example, for the role of bug owner the activity is: number of high priority bugs fixed, and for the role of bug triager the activity is: number of times the triager is able to correctly assign the bug report to the expert (the bug fixer who is the best person to resolve the given bug report).

One of the response fields, Perceived Importance of Key Performance Indicators (PIK), denotes the importance of the activity (for the given role) on a scale of 1 to 5 (5 being highest and 1 being lowest) for contribution assessment. The other response field, Current organizational practices to capture KPIs for Contribution and Performance Assessment (CopKC), denotes the extent to which these performance indicators are captured in practice for performance evaluation. We broadly divide the notion of the ability to capture KPIs for Contribution and Performance Assessment into two heads: non-relevant and relevant. A KPI is either considered non-relevant (for evaluation of developers) and hence not measured, or it is considered important. These relevant KPIs are either not captured (inability of the organization to measure them) or the organization is able to capture them. The captured KPIs can again be classified, based on the approach used to measure them, as objective or subjective. Objective measures are captured using statistics (based on formulas), while subjective measures are evaluations based on a perceived notion of contribution. This hierarchical classification completely defines the notion of measurement in an organization (refer to Table 1, which specifies the metadata of the survey).

We requested two experienced software maintenance professionals to provide their inputs. Both professionals have more than 10 years of experience in the software industry and are currently at the project manager level. One of them is working in a large and global IT (Information Technology) services company (more than 100 thousand employees) and the other is working in a small (about 100 employees) offshore product development services company. Tables 2, 3, 4 and 5 show the results of the survey. Some of the cells in the tables have the value '-' or 'NULL'. '-' means that at the time of the survey, the survey respondent was not asked this question, and 'NULL' implies that the survey respondent did not answer this question (left the field blank).

One of the interpretations of the survey results is the gap between the perceived importance of certain performance indicators and the extent to which such indicators are objectively and accurately measured in practice. We believe that the specified performance indicators (which are considered important in the context of contribution assessment) are not measured rigorously due to a lack of tool support. The results of the survey (refer to Tables 2, 3, 4 and 5) are evidence supporting the need to develop a contribution and performance assessment framework for Software Maintenance Professionals.

It is interesting to note that the two experienced survey respondents (from organizations of different sizes) gave higher or equal value to some KPIs, while for others the results varied considerably. Based on the survey results, we infer that the perceived value of an attribute varies across organizations. Therefore, the proposed metrics should address the perceived value of the attribute for the organization while measuring the contribution of a developer for a given role.

5. CONTRIBUTION EVALUATION METRICS

In this Section, we describe 11 performance and contribution assessment metrics for four different roles. The 11 proposed metrics are not an exhaustive list but represent a few important metrics valued by organizations. The following are a few introductory and relevant concepts which are common to several metric definitions; the Google Chromium project describes the bug lifecycle and reporting guidelines at http://www.chromium.org/for-testers/bug-reporting-guidelines.

A closed bug can have one of the status values: Fixed, Verified, Duplicate, Won'tFix, ExternalDependency, FixUnreleased, Invalid and Others (user-defined status values as permitted by the Google Chromium bug reporting guidelines). An issue can be assigned a priority value from 0 to 3 (0 is most urgent; 3 is least urgent).

Google Chromium issue reports have a field called Area (see http://www.chromium.org/for-testers/bug-reporting-guidelines/chromium-bug-labels). Area represents the product area to which an issue belongs. An issue can belong to multiple areas (for the Google Chromium Project) such as BuildTools, ChromeFrame, Compat-System, Compat-Web, Internals, Feature and WebKit. The product area BuildTools maps to Gyp, gclient, gcl, buildbots, trybots and test framework issues, and the product area WebKit maps to HTML, CSS, Javascript and HTML5 features.

5.1 Bug Owner Metrics

5.1.1 Priority Weighted Fixed Issues (PWFI)

We propose a contribution metric (for the bug owner) which incorporates the number of bugs fixed as well as the priority of each bug fixed. For each unique bug owner in the dataset, we count only Fixed or Verified issues and not Duplicate, Won'tFix, ExternalDependency, FixUnreleased or Invalid ones (because an Invalid bug or a Won'tFix bug is not a contribution by the bug owner to the project).

PWFI(d) = \sum_{i=0}^{|P|} W_{P_i} \times \frac{N^{d}_{P_i}}{N_{P_i}}    (1)

where W_{P_i} is the weight (a tuning parameter which is an indicator of importance, higher for urgent bugs and lowest for the least urgent bugs) or multiplication factor incorporating issue priority information. Due to the weights W_{P_i} (which sum to one), a bug owner's contribution is not just a function of the absolute number of bugs fixed but also depends on the type of bugs fixed. N_{P_i} is the total number of Fixed or Verified bugs in the dataset with priority P_i, and N^{d}_{P_i} is the number of priority P_i bugs Fixed or Verified by developer d in the dataset. The value of PWFI(d) lies between 0 and 1 (the higher the value, the greater the contribution).
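To make the computation concrete, the following is a minimal sketch of how PWFI could be derived from issue-tracker records. The record layout (a list of dictionaries with owner, status and priority fields) and the particular priority weights are illustrative assumptions, not part of the framework definition.

```python
from collections import defaultdict

# Assumed priority weights for Chromium's priorities 0-3 (they sum to one);
# more urgent priorities weigh more.
PRIORITY_WEIGHTS = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
FIXED_STATUSES = {"Fixed", "Verified"}

def pwfi(issues, developer):
    """Priority Weighted Fixed Issues for one bug owner (Equation 1)."""
    total_per_priority = defaultdict(int)  # N_{P_i}: all Fixed/Verified bugs of priority P_i
    dev_per_priority = defaultdict(int)    # N^d_{P_i}: those owned by the developer
    for issue in issues:
        if issue["status"] not in FIXED_STATUSES:
            continue  # Duplicate, Won'tFix, Invalid, etc. are not contributions
        total_per_priority[issue["priority"]] += 1
        if issue["owner"] == developer:
            dev_per_priority[issue["priority"]] += 1
    return sum(
        PRIORITY_WEIGHTS[p] * dev_per_priority[p] / total_per_priority[p]
        for p in total_per_priority
    )

# Toy usage: the resulting score lies between 0 and 1.
issues = [
    {"owner": "dev_a", "status": "Fixed", "priority": 0},
    {"owner": "dev_a", "status": "Verified", "priority": 2},
    {"owner": "dev_b", "status": "Fixed", "priority": 2},
    {"owner": "dev_b", "status": "Invalid", "priority": 1},
]
print(pwfi(issues, "dev_a"))  # 0.4 * 1/1 + 0.2 * 1/2 = 0.5
```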

5.1.2 Specialization and Breadth Index (SBI)

A developer can specialize and be an expert in a specific product area (component) or can have knowledge across several product areas. Let n be the total number of unique product areas and let d denote a developer. We represent B_n(d) as an index for breadth of expertise and 1 - B_n(d) as an index for specialization. p_i is the probability value (the probability of developer d working on component i) derived from the historical dataset. A lower value of B_n(d) means specialization and a higher value means breadth of expertise of a developer.

B_n(d) = - \sum_{i=1}^{n} p_i \log_n(p_i)    (2)

The B_n(d) value can vary from 0 (minimum) to 1 (maximum). If the probability of all Areas is the same (uniform distribution) then B_n(d) is maximal, because the p_i value is the same for all n areas. At the other extreme, if there is only one Area associated with a particular developer d, then B_n(d) becomes minimal (value of 0). The interpretation is that when B_n(d) is low for a specific developer, a small set of Product Areas is associated with that developer d.
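A small sketch of the entropy-style computation behind B_n(d); the list of per-issue product areas handled by the developer is an assumed pre-extracted input.

```python
import math
from collections import Counter

def sbi(areas_worked_on, n_areas):
    """Specialization and Breadth Index B_n(d) (Equation 2).

    areas_worked_on: product areas of the issues handled by developer d.
    n_areas: total number of unique product areas in the project (log base).
    """
    counts = Counter(areas_worked_on)
    total = sum(counts.values())
    # p_i: probability of developer d working on component i.
    return -sum(
        (c / total) * math.log(c / total, n_areas)
        for c in counts.values()
    )

# A developer active in a single area scores 0 (specialist) ...
print(sbi(["WebKit", "WebKit", "WebKit"], n_areas=7))
# ... while uniform activity across all 7 areas scores approximately 1.
print(sbi(["WebKit", "Internals", "BuildTools", "Feature",
           "ChromeFrame", "Compat-Web", "Compat-System"], n_areas=7))
```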

5.1.3 Deviation from Median Time to Repair (DMTTR)

Bugs reported in an Issue Tracking System can be categorized into four classes, namely adaptive, perfective, corrective and preventive (http://en.wikipedia.org/wiki/Software_maintenance), as part of Software Maintenance Activities. Depending on their class, these bugs have different urgency with which they must be resolved. For instance, a loophole reported in the security of a software product must be fixed prior to improving its Graphical User Interface (GUI).

According to the Google Chromium bug label guidelines, the priority of a bug is proportional to the urgency of the task: the more urgent a task is, the higher is its priority. Also, urgency is a measure of the time required to fix a bug. Therefore, high priority bugs should take less time to repair as compared to low priority bugs (more urgent tasks take less time to fix). To support this assertion, we conducted an experiment on the Google Chromium Issue Tracking System dataset to calculate the median time required to repair bugs with the same priority. The results of the experiment support the assertion (as shown in Table 6).

One of the key responsibilities for the role of bug owner is to fix bugs in time, where the time required may be influenced by various external factors. Bettenburg et al. pointed out that a poor quality bug report increases the effort and hence the time required to fix it [2] [3]. Irrespective of these factors, ensuring timely completion of the reported bugs is an indicator of positive contribution for the role of bug owner, and vice-versa.

This metric is defined for Closed bugs with status Fixed or Verified. It calculates the deviation of the time required to fix a bug from the median time required to repair bugs with the same priority.

DMTTR(o, p) = \frac{1}{T_{total}} \times \sum_{i=0}^{|P|} \sum_{\forall b_i} w_i \times \begin{cases} |T_{b_i} - MTTR_{P_i}| & \text{if } (T_{b_i} - MTTR_{P_i}) > 0 \\ 0 & \text{otherwise} \end{cases}    (3)

DMTTR(o, n) = \frac{1}{T_{total}} \times \sum_{i=0}^{|P|} \sum_{\forall b_i} w_i \times \begin{cases} |T_{b_i} - MTTR_{P_i}| & \text{if } (T_{b_i} - MTTR_{P_i}) < 0 \\ 0 & \text{otherwise} \end{cases}    (4)

The time required to fix a bug may be less than, greater than or equal to the median time required to fix bugs of the same priority. Less or equal time required to fix a bug is a measure of positive contribution, and vice-versa. We propose two variants of DMTTR(o), i.e. DMTTR(o,p) and DMTTR(o,n). For an owner o, DMTTR(o,p) measures positive contribution p (relatively less time to repair) and DMTTR(o,n) measures negative contribution n (relatively more time to repair). w_i is a normalized tuning parameter which weighs the deviation proportional to priority, i.e. less time required to fix a high priority bug adds more value (to the contribution of a bug owner) than less time required to fix a low priority bug. Here b_i is a bug report with priority P_i. T_{b_i} is the total time required to fix bug b_i and is measured as the difference between the timestamp at which the bug was reported as Fixed and the first reported comment with status Started or Assigned in the Mailing List. MTTR_{P_i} is the median of the time required to fix bugs with priority P_i (the median is robust to outliers). T_{total} is a normalizing parameter and is calculated as the summation of the time taken by each participating bug.
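The sketch below accumulates the two deviation sums separately as "faster than median" and "slower than median"; mapping them onto DMTTR(o, p) and DMTTR(o, n) follows the textual description above. The input layout (a list of dicts with owner, priority and repair_days fields, already restricted to Fixed or Verified bugs), the priority weights w_i, and taking T_total as the total repair time of the owner's bugs are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import median

# Assumed normalized priority weights; the paper leaves w_i as a tuning parameter.
PRIORITY_WEIGHTS = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}

def dmttr(bugs, owner):
    """Deviation from Median Time To Repair (Equations 3 and 4).

    Returns (faster_than_median, slower_than_median), both normalized
    by the total repair time of the owner's bugs (T_total).
    """
    # MTTR_{P_i}: median repair time per priority, computed over all owners.
    times_by_priority = defaultdict(list)
    for b in bugs:
        times_by_priority[b["priority"]].append(b["repair_days"])
    mttr = {p: median(t) for p, t in times_by_priority.items()}

    owned = [b for b in bugs if b["owner"] == owner]
    t_total = sum(b["repair_days"] for b in owned)
    if not owned or t_total == 0:
        return 0.0, 0.0

    faster = slower = 0.0
    for b in owned:
        deviation = b["repair_days"] - mttr[b["priority"]]
        weight = PRIORITY_WEIGHTS[b["priority"]]
        if deviation < 0:      # repaired faster than the median
            faster += weight * abs(deviation)
        elif deviation > 0:    # repaired slower than the median
            slower += weight * abs(deviation)
    return faster / t_total, slower / t_total
```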

5.2 Bug Reporter Metrics

In an Issue Tracking System, bugs are reported by multiple users (project members and non-project members). Each of these users may report bugs with different amounts and quality of information. This fact was highlighted by Schugerl et al., who note that the quality of reported bugs varies significantly [14]; a good quality bug report reduces the time required to fix it, and vice-versa [2].

Chromium, in its bug reporting guidelines, specifies the factors that ensure the quality of reported bugs. However, evaluating bug reports against these guidelines is a subjective process [14], and so is assessing the contribution of the bug reporter (a result of the online survey mentioned in Section 4).

5.2.1 Status of Reported bug Index (SRI)

A Closed bug may have any one of the status values defined in Equation 6. The fraction of Closed bugs reported by a developer is a measure of contribution for the role of bug reporter. However, the status with which these bugs were Closed is a measure of the quality of contribution. For instance, a bug Closed with status Fixed adds more value to the contribution of the bug reporter than a bug Closed as Invalid.

SRI(r) = \frac{N_r}{N} \times \sum_{s=1}^{|s|} w_s \times \frac{C^{r}_{s}}{C_s}    (5)

Let N be the total number of bugs reported and N_r be the total number of bugs reported by reporter r. w_s is a normalized tuning parameter (weight) defined for each C_s, where C_s is the count of bugs reported with status s and C^{r}_{s} is the count of bugs (reported by reporter r) with status s. s is defined as follows:

s \in \{Fixed, Verified, FixUnreleased, Invalid, WontFix, Duplicate, ExternalDependency, Other\}    (6)

The relative performance is a fraction calculated with respect to the baseline S_{Mode}, where S_{Mode} is the most frequently delivered performance.
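A minimal sketch of the SRI computation; the (reporter, status) tuples and the particular status weights w_s are illustrative assumptions.

```python
from collections import Counter

# Assumed normalized status weights: a Fixed closure is valued more
# than a Duplicate or Invalid one.
STATUS_WEIGHTS = {
    "Fixed": 0.35, "Verified": 0.25, "FixUnreleased": 0.15,
    "ExternalDependency": 0.10, "Duplicate": 0.05, "WontFix": 0.05,
    "Invalid": 0.03, "Other": 0.02,
}

def sri(closed_bugs, reporter):
    """Status of Reported bug Index (Equation 5).

    closed_bugs: list of (reporter, status) tuples for all Closed bugs
    in the appraisal period.
    """
    n_total = len(closed_bugs)
    n_reporter = sum(1 for r, _ in closed_bugs if r == reporter)
    count_by_status = Counter(s for _, s in closed_bugs)                      # C_s
    reporter_by_status = Counter(s for r, s in closed_bugs if r == reporter)  # C^r_s
    weighted = sum(
        STATUS_WEIGHTS.get(s, STATUS_WEIGHTS["Other"]) *
        reporter_by_status[s] / count_by_status[s]
        for s in count_by_status
    )
    return (n_reporter / n_total) * weighted
```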

5.2.2 Degree of Customer Eccentricity (DCE)

The Google Chromium Issue Tracking System defines the priority of a bug report in terms of the urgency of the task. A task is urgent if it affects business (http://sqa.fyicenter.com/art/Bug_Triage_Meeting.html). Among other factors, business is affected (as measured by priority) if a bug in the software influences a large fraction of the user base. Hence, reporting a large fraction of high priority bugs is an indication of the degree of customer eccentricity.

DCE(r) = \sum_{i=0}^{|P|} W_{P_i} \times \frac{N^{r}_{P_i}}{N_{P_i}}    (7)

where W_{P_i} is a normalized tuning parameter with value proportional to the priority, N_{P_i} is the total number of priority P_i bugs reported, and N^{r}_{P_i} is the total number of priority P_i bugs reported by bug reporter r. This metric is defined for Closed bugs with status Fixed or Verified.

5.3 Bug Triager Metrics

The Triage Best Practices document written for the Google Chromium project (http://www.chromium.org/for-testers/bug-reporting-guidelines/triage-best-practices) states that identifying the correct project owner is one of the roles of a bug triager. Guo et al. report that determining ownership is one of the five primary reasons for bug report reassignment [6]. Jeong et al., in their study on Mozilla and Eclipse, report that 37%-44% of reported bugs are reassigned, with an increase in time-to-correction [7].

A bug triager is supposed to conduct a quality check on bug reports (to make sure a report is not spam and contains the required information, such as steps to reproduce) and assign the bug report to a developer who is an available expert at resolving similar bugs. A bug triager should have good knowledge of the expertise, workload and availability of bug developers (fixers). Incorrectly assigned bugs (errors by the bug triager) cause multiple reassignments, which in turn increase the repair time (productivity loss and delays). We believe that computing the number of correct and incorrect reassignments is thus an important key performance indicator for the role of a bug triager.

The Google Chromium Issue Tracking System does not record the triager who initially assigned the project owner, but subsequent reassignments (through labels defined in comments in the Mailing List) are available.

5.3.1 Reassignment Effort Index (REI)

Reassignments are not always harmful and can be the result of five primary reasons: identifying the root cause, identifying the correct project owner, a poor quality bug report, difficulty in identifying a proper fix, and workload balancing [6]. In an attempt to identify the correct project owner, triagers make a large number of reassignments. This large number of reassignments made to correctly identify the project owner is a measure of the effort incurred by a bug triager.

REI(t) = \frac{N_t}{N} \times \sum_{\forall b} \frac{CR^{t}_{b}}{CR_b}    (8)

Let N be the total number of bug reports which were reassigned at least once and N_t be the total number of bug reports which were reassigned at least once by triager t. This metric is defined for Closed bugs with status Fixed or Verified. For a given bug report b, CR_b is the count of the total number of project owner reassignments and CR^{t}_{b} is the count of project owner reassignments by triager t.
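A sketch of REI over a list of reassigned bug reports; the 'reassigners' field (the ordered list of triagers who reassigned the project owner of a bug, e.g. recovered from owner-change labels in the bug comments) is an assumed pre-extracted input.

```python
def rei(reassigned_bugs, triager):
    """Reassignment Effort Index (Equation 8).

    reassigned_bugs: Closed (Fixed/Verified) bugs that were reassigned at
    least once; each dict holds 'reassigners', the ordered list of triagers
    who performed a project-owner reassignment on that bug.
    """
    n_total = len(reassigned_bugs)
    involved = [b for b in reassigned_bugs if triager in b["reassigners"]]
    if n_total == 0 or not involved:
        return 0.0
    effort = sum(
        b["reassigners"].count(triager) / len(b["reassigners"])  # CR^t_b / CR_b
        for b in involved
    )
    return (len(involved) / n_total) * effort
```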

5.3.2 Accuracy of Reassignment Index (ARI)

Correctly identifying the project owner is a challenging task. It may involve a large number of reassignments by multiple triagers, each of whom contributes through reassignments. The accuracy index (for the role of bug triager) is a measure of the number of times a triager correctly identifies the project owner in a bug with a large number of reassignments.

ARI(t) = \frac{1}{n \times CR_{max}} \sum_{\forall b} CR_b    (9)

where n is the total number of bug reports which were accurately reassigned (last reassigned) by triager t. For each such bug report b, CR_b measures the count of the total number of reassignments before it was accurately (finally) reassigned by triager t. CR_{max} measures the maximum number of reassignments before the final one. The value of ARI(t) lies between 0 and 1 (the higher the value, the greater the contribution).
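A companion sketch for ARI over the same assumed structure; the last entry of 'reassigners' is taken to be the final (accurate) reassignment.

```python
def ari(reassigned_bugs, triager):
    """Accuracy of Reassignment Index (Equation 9)."""
    # Bugs whose final (accurate) reassignment was made by this triager.
    finals = [b for b in reassigned_bugs if b["reassigners"][-1] == triager]
    n = len(finals)
    if n == 0:
        return 0.0
    # CR_b: reassignments made before the final, accurate one.
    prior_counts = [len(b["reassigners"]) - 1 for b in finals]
    # CR_max: maximum number of reassignments before the final one, over all bugs.
    cr_max = max(len(b["reassigners"]) - 1 for b in reassigned_bugs)
    if cr_max == 0:
        return 0.0
    return sum(prior_counts) / (n * cr_max)
```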

5.4 Bug Collaborator Metrics

A group of people who collectively (as a team) contribute to fixing a bug are termed bug collaborators. According to the user guide for the Project Hosting Issue Tracker, collaborators provide additional information (in the form of comments in the Mailing List) to fix bugs. These comments contribute to the timely resolution of the reported bug.

5.4.1 Cumulative Comment Score (CCS)

Introduction to Issue Tracker, a user guide for the Project Hosting Issue Tracker, states that developers add multiple comments to the Mailing List.


The number of comments entered in the Mailing List is a contribution of the developer and is used to measure performance.

CCS(c) = \frac{1}{n} \sum_{\forall b} \frac{N^{c}_{b}}{N_b} \ast K_b    (10)

Here, N_b is the total number of comments in bug report b, N^{c}_{b} is the total number of comments entered by collaborator c in bug report b, and n is the count of the total number of bug reports. K_b is a tuning parameter to account for the variable number of comments in bug reports. K_b is defined as the ratio of N_b and N_{Mode}, where N_{Mode} is the count of the number of comments most frequently entered by collaborators (except collaborator c) on a bug b. This metric is defined for Closed bugs with status Fixed or Verified.
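A sketch of CCS over per-bug comment streams. The 'comments' field (one commenter identifier per comment) is an assumed input, N_Mode is taken here as the modal per-person comment count among the other collaborators on the bug, and bugs on which no other collaborator commented are skipped because N_Mode is undefined there.

```python
from collections import Counter

def ccs(bugs, collaborator):
    """Cumulative Comment Score (Equation 10).

    bugs: Closed (Fixed/Verified) bug reports; each dict carries
    'comments', the list of commenter identifiers in posting order.
    """
    n = len(bugs)
    score = 0.0
    for bug in bugs:
        comments = bug["comments"]
        n_b = len(comments)                   # N_b: all comments on the bug
        n_bc = comments.count(collaborator)   # N^c_b: comments by collaborator c
        per_person = Counter(c for c in comments if c != collaborator)
        if n_b == 0 or not per_person:
            continue  # N_Mode undefined without other collaborators
        # N_Mode: most frequently occurring per-person comment count
        # among the other collaborators on this bug.
        n_mode = Counter(per_person.values()).most_common(1)[0][0]
        k_b = n_b / n_mode                    # K_b
        score += (n_bc / n_b) * k_b
    return score / n if n else 0.0
```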

5.4.2 Potential Contributor Index (PCI)

A developer can specialize and be an expert in one or multiple areas (as calculated in SBI(d)). Multiple developers (potential contributors) are added to the CC-list based on prior knowledge (historical data or past experience) of their domain of expertise.

PCI(c) = \sum_{\forall a} \frac{N^{cc}_{a}}{N_a} \times V_a    (11)

Given a bug report b, the PCI metric identifies the list of contributors who are valued to fix it by assigning them values in the range from 0 to 1. It is defined for Closed bugs with status Fixed or Verified and a defined CC-list. Here N_a is the total number of bugs reported in area a, N^{cc}_{a} is the number of times prospective collaborator c was mentioned in the CC-list of bugs reported in area a, and V_a is the importance of the area of the reported bug (as calculated in PWFI for area a).

5.4.3 Potential Contributor Index-1 (PCI-1)

PCI-1 extends the idea presented in PCI. In the bug fixing lifecycle, contributors are incrementally appended to the CC-list to meet additional support or expertise requirements.

PCI-1(c) = \sum_{\forall a} \frac{N^{appCC}_{a}}{N_a} \times V_a    (12)

With the rest of the parameters the same as explained in PCI, N^{appCC}_{a} is the count of the number of times collaborator c was incrementally appended to the CC-list of bugs with Area a.
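One sketch covers both indices, switching on whether only incremental CC-list additions are counted (PCI-1) or the full CC-list (PCI). The bug record fields and the area_value mapping (the V_a importance values, e.g. taken from the PWFI area weights) are assumptions for illustration.

```python
from collections import defaultdict

def pci(bugs, collaborator, area_value, appended_only=False):
    """Potential Contributor Index (Equation 11) and PCI-1 (Equation 12).

    bugs: Closed (Fixed/Verified) bug reports with a CC-list; each dict
    holds 'area', 'initial_cc' and 'appended_cc' (people added after the
    report was filed). area_value maps an area to its importance V_a.
    With appended_only=True the function computes PCI-1.
    """
    per_area_total = defaultdict(int)     # N_a: bugs reported in area a
    per_area_mentions = defaultdict(int)  # N^cc_a or N^appCC_a
    for bug in bugs:
        area = bug["area"]
        per_area_total[area] += 1
        cc = bug["appended_cc"] if appended_only else bug["initial_cc"] + bug["appended_cc"]
        if collaborator in cc:
            per_area_mentions[area] += 1
    return sum(
        (per_area_mentions[a] / per_area_total[a]) * area_value.get(a, 0.0)
        for a in per_area_total
    )

# pci(bugs, "dev_a", area_value)                      -> PCI   (Equation 11)
# pci(bugs, "dev_a", area_value, appended_only=True)  -> PCI-1 (Equation 12)
```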

5.4.4 Contribution Index (CI)

When a bug is reported, developers are added to the CC-list as potential or prospective contributors. However, only a few of the people mentioned in the CC-list contribute to fixing the bug through comments in the Mailing List.

CI(c, a) = \sum_{\forall a} \frac{N^{col}_{a}}{N^{cc}_{a}} \times V_a    (13)

CI(c, a) measures actual contribution relative to expected contribution for the role of bug collaborator. Here N^{cc}_{a} is the number of times collaborator c was added to the CC-list of bugs reported in area a, and N^{col}_{a} is the subset of N^{cc}_{a} in which the developer mentioned in the CC-list actually contributed to fixing the bug. V_a is the importance of the area of the reported bug (as calculated in PWFI for area a). The metric values contribution proportional to the importance of the area and normalizes it.
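A final sketch for CI; the 'cc_list' and 'commenters' fields and the area_value mapping are assumed inputs, as above.

```python
from collections import defaultdict

def ci(bugs, collaborator, area_value):
    """Contribution Index (Equation 13).

    bugs: Closed (Fixed/Verified) bug reports with 'area', 'cc_list'
    and 'commenters' fields; area_value maps an area to V_a.
    """
    cc_counts = defaultdict(int)   # N^cc_a: bugs in area a with c on the CC-list
    col_counts = defaultdict(int)  # N^col_a: of those, bugs where c actually commented
    for bug in bugs:
        if collaborator not in bug["cc_list"]:
            continue
        area = bug["area"]
        cc_counts[area] += 1
        if collaborator in bug["commenters"]:
            col_counts[area] += 1
    return sum(
        (col_counts[a] / cc_counts[a]) * area_value.get(a, 0.0)
        for a in cc_counts
    )
```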

6. EXPERIMENTAL DATASET

We conduct a series of experiments on the Google Chromium project dataset. Google Chromium is a popular and widely used open-source browser. The Issue Tracker for the Google Chromium project is publicly available and the data can be downloaded using the Google Issue Tracker API. We perform empirical analysis on a real-world and publicly available dataset so that our experiments can be replicated and our results can be benchmarked and compared by other researchers. Table 7 displays the experimental dataset details. We perform empirical analysis on a one year dataset (as the context is contribution and performance assessment, we select one year as a performance appraisal cycle) which consists of 24,743 bugs. In this one year reporting period, 6,765 developers reported bugs. These bugs were triaged by 500 developers. The triaged bug reports were owned by 391 bug owners. These bugs were finally fixed by 5,043 developers who collaborated to fix them. Finally, out of 24,743 reported bugs, 8,628 bug reports were Closed with status Fixed or Verified.

7. EXPERIMENTAL ANALYSIS

In this section, we analyze and interpret the results for the proposed metrics (defined for bug reporter, bug triager, bug owner and bug collaborator) calculated on the Google Chromium Issue Tracking System dataset. These metrics are defined for Closed bugs with status Fixed or Verified.

Figure 1a shows a scatter plot of the Priority Weighted Fixed Issues (PWFI) metric for the role of bug owner. Here the X-axis is the bug owners in the dataset (arranged in order of non-decreasing performance) and the Y-axis is the score scale. We notice that 3 out of 342 bug owners displayed exceptional performance on the score scale (as shown in Figure 1a). A software maintenance project manager or appraiser may use these results to identify different levels of performance and measure individual contribution in a data-driven, evidence-based and objective manner. For example, the 25th percentile of bug owners have a PWFI value above 0.00014 and the 75th percentile of bug owners have a PWFI value above 0.00273 (for more details refer to Table 8).

Figure 1b is a scatter plot of the Specialization and Breadth Index (SBI) metric defined for the bug reporter. A similar interpretation as in Figure 1a can be made for this metric, i.e. varying levels of performance, multiple levels and tiers of contribution, and very few exceptional performances in contrast to other members of the project team. This metric aids in understanding specialization and breadth of knowledge within the team, in addition to individual performance. This knowledge can be used by organizations in preparing strategies for training manpower and assigning responsibilities based on specialization and breadth of knowledge. For example, 1.7% of bug reporters demonstrate significant breadth of knowledge relative to other bug reporters (SBI score value greater than or equal to 0.5).

Figures 1c and 1d are bar charts for PCI (Potential Contributor Index) and PCI-1 (Potential Contributor Index-1), its extension, respectively. These metrics are defined for the role of bug collaborator. The X-axis in Figure 1c is the contribution by different collaborators for a given component and the Y-axis is the list of components for which the developer was added to the CC-list.


Table 6: Priority-Time correlation statistics (time duration in days)

Priority | Count (of bugs) | Maximum | 3/4 Quartile | Median | 1/4 Quartile | Minimum
P0 | 258 | 495.37 | 10.69 | 2.71 | 0.44 | 0.0002
P1 | 2,839 | 1120.37 | 44.10 | 11.04 | 2.03 | 0.0000
P2 | 17,923 | 1273.15 | 129.63 | 21.99 | 1.93 | 0.0000
P3 | 2,265 | 1180.56 | 206.02 | 55.90 | 5.03 | 0.0000

Table 7: Experimental dataset

Duration | 1 year
Start date of the first bug report in the dataset | 01/01/2009
End date of the last bug report in the dataset | 31/12/2009
Bug ID of the first bug report | 5,954
Bug ID of the last bug report | 31,403
Total number of Reported bugs | 25,383
Total number of Forbidden bugs | 640
Total number of Available bugs | (25,383 - 640) = 24,743

Table 8: Statistical results of proposed metrics (yearly evaluation)

Metric | Role | Count (total number of unique individuals) | Minimum | Maximum | Median | Mode | Percentile 25% | Percentile 75% | Range
PWFI | Bug Owner | 342 | 0.00004 | 0.06119 | 0.00045 | 0.00004 | 0.00014 | 0.00273 | 0.06116
SBI | Bug Owner | 405 | 0.00092 | 0.78418 | 0.01042 | 0.00099 | 0.00219 | 0.06615 | 0.78326
DMTTR | Bug Owner | 75 | 0.0 | 1.44123 | 0.09194 | 0.0 | 0.00772 | 0.46293 | 1.44123
SRI | Bug Reporter | 5830 | 0.0 | 0.03225 | 0.00002 | 0.00002 | 0.00002 | 0.00007 | 0.03225
DCE | Bug Reporter | 2050 | 0.00003 | 0.05267 | 0.00066 | 0.00003 | 0.00003 | 0.00024 | 0.05264
REI | Bug Triager | 61 | 0.00083 | 0.81832 | 0.00496 | 0.00165 | 0.00165 | 0.02643 | 0.8175
ARI | Bug Triager | 15 | 0.5 | 1 | 0.5 | 0.5 | 0.5 | 0.8125 | 0.5
CCS | Bug Collaborator | 1549 | 0.25 | 2.75 | 0.25 | 0.25 | 0.25 | 0.5 | 2.5
PCI | Bug Collaborator | 522 | 0.0 | 2.88489 | 0.01407 | 0.0 | 0.0 | 0.04925 | 2.88489
PCI-1 | Bug Collaborator | 307 | 0.0 | 0.91472 | 0.0 | 0.0 | 0.0 | 0.01407 | 0.91472
CI | Bug Collaborator | 9 | 0.00086 | 0.01861 | 0.00819 | - | 0.00534 | 0.01121 | 0.01775

In this graph, we note a non-uniform contributor-component distribution, i.e. a large number of people collaborate on some components as compared to others. It also shows the most likely potential contributors for bugs belonging to a specific component. This index can be used by an organization to identify the interests and knowledge domains of contributors. Figure 1d shows the inverted view of PCI. In this figure, the X-axis is the list of components and the Y-axis is the contribution by developers. This graph compares the contribution of developers on a component. For instance, 'a...@chromium.org' (shown in blue) contributed significantly in all the components.

Figure 1e is a thresholded score plot of the Status of Reported bug Index (SRI) metric. The X-axis is the bug reporters arranged in order of non-decreasing performance. The Y-axis is the score on a logarithmic scale. This graph shows a broad-level picture of the metric. We notice two visibly distinct classes separated by a threshold, where the threshold (mode of performance) is close to the low scoring bug reporters.

Figure 1f is a plot of the thresholded score for the Cumulative Comment Score (CCS) metric defined for the role of bug collaborator. Here the X-axis of the graph is the bug collaborators arranged in order of non-decreasing performance and the Y-axis is performance with a threshold at 1, where 1 is the value given to the performance delivered by a large fraction of bug collaborators. This line differentiates high performance developers from low performance developers. In this graph, we notice that out of 1549 bug collaborators (for more details refer to Table 8), only 53 collaborators crossed the threshold, thereby clearly distinguishing exceptional performances from the general.

Figures 1g and 1h show bar charts for Deviation from Median Time To Repair (DMTTR) and Contribution Index (CI). DMTTR is defined for the role of bug owner to measure positive and negative contribution. The X-axis of the graph is the performance score and the Y-axis is the bug owners. The bar chart compares positive and negative contribution, thus presenting the overall performance for the role. We notice in the graph that 'podivi...@chromium.org' always exceeded the median time required to fix bugs. In the chart, we observe that 'pinkerton@chromium.org' delivered more positive contribution as compared to negative contribution. Organizations can use these statistics to predict the expected time to get bugs resolved given that a bug is assigned to a specific bug owner.

Figure 1h shows a bar chart to measure the Contribution Index (CI) of bug collaborator 'a...@chromium.org'. Here the X-axis is the list of components for which 'a...@chromium.org' was mentioned in the CC-list and/or contributed. The Y-axis is the total number of bug reports. In this graph, we notice that 'a...@chromium.org' was mentioned 47 times in the CC-list of bugs reported in the WebKit component; however, the developer collaborated only once (refer to Table 8 for a more detailed description).


Figures 1i and 1j show histograms of the Reassignment Effort Index (REI) metric and the Degree of Customer Eccentricity (DCE) metric, defined for the bug triager and bug reporter respectively. The X-axis (on both graphs) is the performance score of developers. The Y-axis of the REI plot is the count (frequency) of bug triagers and the Y-axis of the DCE plot is the count (frequency) of bug reporters. The histogram plots classify developers into classes (as defined by the bin size). The graphs indicate the level of performance delivered and identify the fraction of developers in each class. The two graphs give information at different levels of abstraction, where the level of abstraction is defined in terms of the number and size of bins used to interpret the data. The graph in Figure 1i gives a broad view, stating that 49.17% of developers delivered performance greater than 0.009. A detailed view of the performance for the role of bug reporter is shown in Figure 1j. In this graph we notice that only 0.537% of the developers deliver performance greater than 0.008.

The 10 graphs in Figure 1 show different chart types defined for different roles. These chart types present information at different levels of abstraction with different interpretations (both from a contribution assessment perspective and an organizational perspective). A more detailed statistical description of the results for all 11 metrics is shown in Table 8.

8. THREATS TO VALIDITY AND LIMITATIONS

The survey has two respondents (Software Maintenance Professionals working in Industry) and can be influenced by bias (based on individual experiences and beliefs). In future, we plan to survey more professionals, which will reduce bias and help generalize our conclusions. The experiments are conducted on a dataset belonging to one project (the Google Chromium open-source browser) collected for a duration of one year (by design, as the appraisal cycle is set to one year). More experiments can be conducted on datasets belonging to diverse projects to investigate the generalizability of the conclusions. Due to limited space in the paper, we define 11 metrics for the four roles of a software developer; more such metrics can be defined covering other aspects and performance indicators. The study presented in this paper serves the objective of introducing the broad framework (which can be extended by adding more metrics), viewing it in relation to an industry survey, and validating it on a real-world dataset.

9. CONCLUSIONS

We propose a contribution and performance evaluation framework (consisting of several metrics for different types of roles) for Software Maintenance Professionals, motivated by the need to objectively, accurately and reliably measure individual value-addition to a project. We conducted a survey with experienced Software Maintenance Professionals working in Industry to throw light on the Key Performance Indicators and the extent to which such indicators are measured in practice. We implement the proposed metrics and apply them to a real-world dataset by conducting a case-study on the Google Chromium project. Experimental results demonstrate that the proposed metrics can be used to differentiate between contributors (based on their roles) and help organizations continuously monitor and gain visibility into individual contributions. We conclude that mining Issue Tracker data (through APIs or direct database access) to compute pre-defined performance metrics (incorporating role-based activities), combined with visualization, can provide an evidence-based and data-driven solution to the problem of accurate measurement of individual contributions in a software maintenance project (specifically for bug resolution).

10. ACKNOWLEDGEMENT

The work presented in this paper is supported by TCS Research Fellowship for PhD students awarded to the first author. The author would like to acknowledge Dr. Pamela Bhattacharya and survey respondents for providing their valuable inputs.

11. REFERENCES

[1] Syed Nadeem Ahsan, Muhammad Tanvir Afzal, Safdar Zaman, Christian Guetl, and Franz Wotawa. Mining effort data from the OSS repository of developer's bug fix activity. Journal of IT in Asia, 3:67-80, 2010.

[2] Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. Quality of bug reports in Eclipse. In Proceedings of the 2007 OOPSLA Workshop on Eclipse Technology eXchange, eclipse '07. ACM, 2007.

[3] Nicolas Bettenburg, Sascha Just, Adrian Schröter, Cathrin Weiss, Rahul Premraj, and Thomas Zimmermann. What makes a good bug report? In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16. ACM, 2008.

[4] Paulo Fernandes, Afonso Sales, Alan R. Santos, and Thais Webber. Performance evaluation of software development teams: a practical case study. Electron. Notes Theor. Comput. Sci., 275:73-92, September 2011.

[5] Georgios Gousios, Eirini Kalliamvakou, and Diomidis Spinellis. Measuring developer contribution from software repository data. In Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR '08, pages 129-132, New York, NY, USA, 2008. ACM.

[6] Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, and Brendan Murphy. "Not my bug!" and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW '11. ACM, 2011.

[7] Gaeul Jeong, Sunghun Kim, and Thomas Zimmermann. Improving bug triage with bug tossing graphs. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE '09. ACM, 2009.

[8] T. Kanij, R. Merkel, and J. Grundy. Performance assessment metrics for software testers. In Cooperative and Human Aspects of Software Engineering (CHASE), 2012 5th International Workshop on, pages 63-65, June 2012.

[9] Robin Kessler. Competency-Based Performance Reviews: How to Perform Employee Evaluations the Fortune 500 Way. Career Press, 2008.

[10] Yared H. Kidane and Peter A. Gloor. Correlating temporal communication patterns of the Eclipse open source community with performance and creativity. Comput. Math. Organ. Theory, 13(1):17-27, March 2007.

[11] Herwig Kressler. Motivate and Reward: Performance Appraisal and Incentive Systems for Business Success. Palgrave Macmillan, 2003.

[12] Naresh Nagwani and Shrish Verma. Rank-me: A Java tool for ranking team members in software bug repositories. Journal of Software Engineering and Applications, 5(4):255-261, 2012.

[13] Peter C. Rigby and Ahmed E. Hassan. What can OSS mailing lists tell us? A preliminary psychometric text analysis of the Apache developer mailing list. In Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR '07. IEEE Computer Society, 2007.

[14] Philipp Schugerl, Juergen Rilling, and Philippe Charland. Mining bug repositories - a quality assessment. In 2008 International Conference on Computational Intelligence for Modelling Control and Automation. IEEE Computer Society, 2008.
