Open information environments (OIEs) are environments in which new sources and uses
of information emerge and where the users of information, while having access to
different sources of information, often have no control over them (Parsons and Wand,
2014). The stakeholders in OIEs are usually information contributors (sources),
information consumers (users) and OIE sponsors (Parsons and Wand, 2014). OSS issue
repositories are OIEs in which the sources of information (issue data) are the developers
and non-developer users who report issues, and the users are the specific project
developers/maintainers who use the issue data for implementation purposes. The sponsors
could be the management team of web-based OSS environments (e.g., GitHub) and
organizations interested in OSS development (e.g., Red Hat), who may pay their
employees to participate in OSS development.
Parsons and Wand identify accommodating semantic diversity and ensuring information
quality as key requirements for OIEs to be successful. Different
users and sources are likely to have different views/interpretations of the information
generated in OIEs; these different views can lead to different meanings being assigned to
the data generated or data available, and views can change over time (Parsons and Wand,
2014). OIEs therefore need to have
mechanisms in place for accommodating the different and evolving views of sources and
users. The different contributors of OSS issue data (non-developer users and developers)
and users of this information (project developers, maintainers, and interested
readers) come from diverse backgrounds (e.g., different organizations and nationalities),
and are geographically distributed (e.g., Crowston et al., 2012) and, therefore, can be
expected to possess diverse views about the issues concerning OSS projects. For example,
individuals from different countries could have very different views about user interfaces
(Schmid, 2014).
Individuals can observe some characteristics of objects in a domain and form their
individual perceptions about what they have observed; they could form different
conceptualizations about the same characteristics of an object (diverse views), and these
conceptualizations could even change over time for an individual (evolving views)
(Lukyanenko and Parsons, 2015). Issue data submitted by an individual issue reporter can
be viewed as information about some phenomena in the application domain of the
particular OSS project. Multiple issue reporters could observe the same issue, form
diverse perceptions about it, and report their individual descriptions about the observed
issue. This could result in issue information from different reporters being perceived as
duplicates by the project developers/maintainers. An example of duplicate issue
information is provided below:
Three bug reports perceived by project developers as duplicates of one another (source: OpenOffice Bugzilla, https://bz.apache.org/ooo)
[1] “The rows are way too small (in fact I can’t see a thing). I had to upsize the fonts to 22 to get a
decent view of the sheet.”
[2] “The default row height is set to 0.0 for all cells when first starting. I have been unable to find a
[3] “Just installed 1.0 on redhat 7.2 with KDE 2.2. Open up a new spreadsheet. The rows are invisibly
tiny. I select all rows with ctrl-a, then go to menu format/row/height and the height is showing as 0.03 cm with the default checkbox checked on. I enter 0.5 cm in the edit box and the default checkbox turns itself of. I close the dialog and the rows are now large enough to type in. Bug: The rows should not open so small. Where did the default of 0.03 come from?”
In the above example, the three bug reports were submitted by different reporters
who observed the same issue, formed different individual conceptualizations of it (diverse
views) and reported their individual descriptions about the issue. Bettenburg et al. (2008)
report that OSS developers may not perceive duplicate issue information as a serious
problem; instead, duplicate reports may add useful information about an issue. Hence,
diverse individual descriptions about the same issue could potentially help enrich the
issue information content. Since different issue reporters may make observations about
some phenomenon related to an OSS project in different ways, some may make rich
observations/mental visualizations and subsequently provide rich, detailed issue
information, while others may end up providing incomplete or incorrect issue information
as perceived by the project developers/maintainers. Incompleteness and incorrectness are
commonly occurring problems with the issue data in OSS issue repositories (Bettenburg
et al., 2008). In the above example on duplicate bug reports, it can be seen that the third
bug report has detailed information content, whereas the first bug report has limited
information content, potentially illustrating the differences in the mental
conceptualizations of their reporters at the time of reporting.
To support diverse and evolving views, that is, to facilitate semantic diversity, a
desired property of OIE applications is that they should allow capturing and storing
information independent of any a priori classification (Parsons and
Wand, 2014; Lukyanenko and Parsons, 2015). Classification is a human-dependent
activity and can be greatly influenced by human characteristics such as experience and
knowledge (Lukyanenko et al., 2014b). The same thing may be classified differently by different individuals, or by the same individual at
different times (Lukyanenko et al., 2014b). For example, one individual may classify a passport as an identity document while another individual may classify it as a travel
document (Lukyanenko et al., 2014b). A priori classification presented in any IS artifact reflects fixed views that cannot easily accommodate the multiple and rapidly evolving
views that are commonplace in OIEs (Parsons and Wand, 2014). Fixed views imposed by
a priori classification can bias user-generated content to the views of a limited set of
contributors and prevent the inclusion of views of others (Lukyanenko and Parsons,
2015). This is because individuals can widely differ in their conceptualizations of objects
in some domain and individual conceptualizations can vary over time as well. As a result,
the views of many potential contributors may not match the limited view that an a
priori classification imposes (Lukyanenko and Parsons, 2015). When information
contributors are unfamiliar with the classes presented to them by an information system
artifact, the result is a forced choice that does not match the perceptions of the
information contributors (Lukyanenko et al., 2014b). This can have negative impacts such as lower quality of contributed information and information loss (Lukyanenko et al.,
2014b). As an example in the context of OSS issue repositories, comment 27 (Table 38) points out issues that are edge cases; for example, issues that are both a bug and an
enhancement. Other combinations are possible like bug-documentation or bug-
by an issue gathering interface such as Bugzilla cannot accommodate such diverse cases and
may result in the loss of such information.
In OSS issue repositories, this would mean that issue reporters should be able
to describe an issue as they conceive of it, without having to assign it to the a priori
classes/labels that an issue gathering interface provides. In other words, OSS issue
gathering interfaces should capture issue information from reporters without requiring
them to assign specific class labels while creating and submitting issues. This would
better support the diverse views of the many different issue reporters distributed across
the globe. For example, consider the issue type label in the Bugzilla issue reporting
interface. If a reporter chooses enhancement as the type of his or her issue, then to be
certain that it is indeed an enhancement, the reporter needs to be certain that the requested
characteristic is not already in the software, which requires good knowledge of the
software's current functionality and characteristics. In many cases, this is unlikely.
Consider also the priority label (severity is similar). Prioritization of requirements often
involves groups of requirements and stakeholders; for example, high priority would mean
that a requirement is likely to be implemented well before several other requirements, or
that it is more important to a group of stakeholders than several other requirements
(Firesmith, 2004). A lone issue reporter submitting a single issue at some point in time
may not have a clear idea of the priority and severity of that issue. This is also indicated
by the following comment (#1) from an OSS developer responding to the second survey:
“Reporters are very rarely able to accurately decide priority, severity or any of the other
established with a quick back and forth with the reporter.” Hence, by asking issue reporters to assign such labels to their issue descriptions, issue reporting interfaces such as
that of Bugzilla appear not to accommodate well the diversity in the views of issue
reporters (e.g., those who do not have enough knowledge to provide all
labels). As a result, many issue reporters may provide incorrect labels (e.g., see developer
comment 18, Table 37), and the issue description gets stored along with those incorrect
labels. Thus, enforcing a priori classification at the time issues are created is a potential
contributor to misclassification.
On the other hand, many OSS issue gathering interfaces (e.g., GitHub) provide a
simple interface that seeks to capture just the issue description from the issue reporter.
The issue reporters do not need to add any labels or classes to their issue information, and
the issue information is stored independent of any classes/labels. In GitHub, only
project developers/maintainers can assign labels to submitted issues
(https://help.github.com/articles/creating-an-issue/). Thus, issue reporters are not forced
to classify/label their issues at the time of creation, and the decision makers (project
developers/maintainers) can infer the labels/classes (e.g., whether an issue is an
enhancement, a feature request or a bug) from the issue description itself, provided the
description contains sufficient information (cf. Lukyanenko et al., 2014).
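Purely as an illustration, and not as a description of how either Bugzilla or GitHub is actually implemented, the contrast between the two capture approaches can be sketched as two minimal Python data models. The class names, the fixed set of issue types, the priority/severity values, and the maintainer check are hypothetical assumptions. The sketch only shows where classification happens (by the reporter at creation versus by maintainers afterwards) and how an edge case that is both a bug and an enhancement, like those noted in comment 27 (Table 38), fares under each.

```python
from dataclasses import dataclass, field

# Hypothetical fixed a priori class set standing in for an "issue type" field.
ISSUE_TYPES = {"bug", "enhancement"}


@dataclass
class ClassBasedIssue:
    """Bugzilla-style capture (simplified, hypothetical): the reporter must pick
    exactly one predefined type, plus priority and severity, at creation."""
    description: str
    issue_type: str
    priority: str
    severity: str

    def __post_init__(self) -> None:
        # A type outside the fixed class set is rejected at creation time,
        # so the reporter's own view must be forced into an existing class.
        if self.issue_type not in ISSUE_TYPES:
            raise ValueError(f"issue_type must be one of {sorted(ISSUE_TYPES)}")


@dataclass
class InstanceBasedIssue:
    """GitHub-style capture (simplified, hypothetical): only the description is
    required; labels start empty and are attached later by maintainers."""
    description: str
    labels: set[str] = field(default_factory=set)

    def add_label(self, label: str, is_maintainer: bool) -> None:
        # Only maintainers may classify; any number of labels can coexist.
        if not is_maintainer:
            raise PermissionError("only maintainers can assign labels")
        self.labels.add(label)


# Hypothetical edge-case report that is both a bug and an enhancement request.
text = "Rows open far too small by default; a configurable default height would help."

edge = InstanceBasedIssue(text)
edge.add_label("bug", is_maintainer=True)
edge.add_label("enhancement", is_maintainer=True)
print(sorted(edge.labels))  # ['bug', 'enhancement']

try:
    # A single-valued type field has no way to record "both".
    ClassBasedIssue(text, "bug+enhancement", priority="P3", severity="normal")
except ValueError as err:
    print("rejected:", err)
```

In the class-based model the description cannot be stored without a valid class, whereas in the instance-based model the description is stored first and maintainers can attach as many labels as the description supports.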
GitHub and Bugzilla issue gathering interfaces represent two popular, but
different, ways of capturing and storing issue information from reporters in the OSS domain.
these interfaces are widespread and need improvement. Specifically, the true goal is getting the problem fixed. Therefore both interfaces would benefit by having a prominent area to accelerate any fix….” Google Code, GitLab and CodePlex are examples of OSS development environments that use an issue reporting interface (shown in Appendix 4)
similar to that of GitHub, whereas Jira is an example of one similar to that of Bugzilla.
Differences in how an information system captures information from contributors
can influence the quality of that information; for example, putting restrictions on
contributors can result in information loss (Lukyanenko et al., 2014). Therefore, it is
important to investigate how the two different approaches to issue data gathering in OSS
issue repositories may affect the information quality of issue data. This becomes even
more important given the recent requests from some GitHub developers to GitHub
management for a more complex issue reporting interface similar to that of Bugzilla.3
In this third phase of the research, I take a qualitative approach to explore how the two
different issue gathering approaches in OSS development may contribute to the
misclassification problem (an information quality problem with OSS issue data) and what
can be done at the interface level for mitigating the misclassification problem.
The next section describes in greater detail the research methodology.
3 See, for example: https://github.com/dear-github/dear-github/issues/59; https://github.com/dear-github/dear-github/issues/72