Discussion - Design and validation of Software Requirements Specification evaluation checklist

Before the conference, a higher attendance was anticipated than the 18 participants who actually took part. Despite this, based on the indicated experience of the participants (Figure 6.1) and the prestige of Europe’s leading RE conference, the participants are regarded as experts in the field of RE, and confidence is placed in the quality and validity of their responses.

At the start of the live study session, several participants indicated that they were confused by the goal of the session and the materials that were provided. Their questions were answered before the session started, and the participants indicated having no further questions. Written feedback on the checklist sheets suggested, however, that some participants were under the impression that the checklist sheet was meant to validate the sample SRS rather than the instrument itself. As a consequence, the data resulting from the checklist sheets proved to be less valuable than anticipated.

The allocated time slots of the live study session did not allow each participant to analyse the entire instrument. To mitigate this, four different sets of the instrument were created, where each set contained the ‘core’ set of checks and a selection of the ‘non-core’ checks. This meant that no participant analysed the checklist in its entirety; instead, each participant assessed the quality of the checklist based on their selection of checks and a general impression of the instrument. To support that general impression, the non-core checks that were not included in a specific set were greyed out on the checklist sheet but remained readable. Each participant was therefore considered able to form a good general impression of the overall instrument. It did mean, however, that certain checks were closely inspected by only a quarter of the participants. The classification into ‘core’ and ‘non-core’ was based on the author’s intuition, which carries the risk that checks inadvertently categorised as ‘non-core’ did not receive the desired exposure. At the end of the session, no participant indicated requiring additional time. Eighteen (at least partly) completed checklist sheets were collected, whereas only 14 questionnaires were filled in. Some participants were seen leaving the session early, but their reasons for doing so are unclear.
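The exposure effect of splitting the instrument into four sets can be illustrated with a minimal sketch. It assumes that the non-core checks are divided over the sets without overlap (round-robin), that the check identifiers are hypothetical placeholders, and that the 18 participants were spread over the sets as 5, 5, 4 and 4; none of these details are stated in the study itself.

```python
from collections import Counter


def build_sets(core, non_core, n_sets=4):
    """Return n_sets lists: every core check plus a disjoint share of the non-core checks."""
    sets = [list(core) for _ in range(n_sets)]
    for i, check in enumerate(non_core):
        # Round-robin assignment (an assumption): each non-core check lands in exactly one set.
        sets[i % n_sets].append(check)
    return sets


def exposure(sets, participants_per_set):
    """Count how many participants closely inspect each check."""
    counts = Counter()
    for checks, n_participants in zip(sets, participants_per_set):
        for check in checks:
            counts[check] += n_participants
    return counts


# Hypothetical identifiers; the real number of checks is not given here.
core = ["C1", "C2", "C3"]
non_core = [f"N{i}" for i in range(1, 9)]

sets = build_sets(core, non_core)
# Assumed split of the 18 participants over the four sets.
counts = exposure(sets, participants_per_set=[5, 5, 4, 4])

for check in core + non_core:
    print(check, counts[check])
```

Under these assumptions every core check is inspected by all 18 participants, whereas each non-core check is inspected by only four or five of them, which is the reduced exposure described above.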

In analysing the quantitative results of the questionnaire, it was observed that in four instances not all of the 8 validation criteria were rated. In some cases a question mark was written instead. This suggests that some participants did not fully understand the definition of certain criteria, despite being provided with a short description in the form of a footnote. Even for the participants who did rate these criteria, their interpretation might differ from the researcher’s interpretation.

In analysing the qualitative results, the majority (12 out of 19) of the written Strengths could be attributed to ‘Comprehensiveness’. Due to this uneven distribution of Strengths and the small sample size, written Weaknesses that were categorised under other quality criteria had a relatively large influence on the net strength score. As a result, most quality criteria received a negative Mean score whilst receiving a decent net strength score (>5 out of 9). Furthermore, if one were to rank the criteria based on their net strengths, a single written statement could affect the ordering. Although the combined illustration provides a good general impression of the strengths and weaknesses of the instrument, the small sample size and the unequal distribution of the written Strengths and Weaknesses statements call for caution when interpreting the criteria’s net strength as well as the combined results.

Based on the quantitative results, the instrument is considered to have scored well (>6.5) for Comprehensiveness, Pertinence and Fairness, and decently (>5) for Concreteness, Applicability, Clarity and Ease of Use. Based on the qualitative results, the checklist scored high (>60) for Comprehensiveness, adequate (>0) for Pertinence and Concreteness, slightly below 0 for Applicability and Ease of Use, and well below 0 (-20) for Fairness and Parsimony. Comprehensiveness was rated highly in both the qualitative and the quantitative responses. Ease of Use and Clarity scored low in both types of responses, indicating room for improvement on these aspects. Fairness is a clear outlier: it is in the top 3 for the quantitative responses but at the very bottom for the qualitative responses.

Although the instrument’s comprehensiveness is seen as a positive aspect, it might also negatively impact the usability of the checklist. Written weaknesses included: ‘Depending on the project, this completeness can be a hazard: many subjects can be skipped’, ‘Some questions seem difficult to answer or to answer in sufficient time with enough confidence’, ‘I’m not at all sure that “one size fits all” here - might work on some programs and not on others’, and that the checklist is ‘too generic’, ‘too ambitious’, ‘too complex’ and ‘too long’. Despite the overall positive results in the post-use questionnaire, these statements raise concerns about the usability of the checklist and its applicability to a variety of contexts. What might be overly complicated and too long in one context might be too generic for another. Further research into the usability and applicability of the instrument in different contexts is deemed important, as is investigating ways to mitigate these concerns, e.g. by adapting the instrument to specific contexts.

In predicting how the artefact will perform in practice (RQ2), one has to consider possible differences between the applicants (the Expert Group) and the target audience (practitioners in the field). Although most applicants indicated having significant experience in research as well as in practice, a certain bias might exist in the social group that a conference attracts. As discussed in section 3.2.6, there are indications of differences between what researchers and practitioners find important in validating SRSs, but the difference in experience of the practitioner might also be a factor to consider. The Strong and Weak points that were expressed by the (highly experienced) participants might not correspond to the assessments of other, more novice, practitioners.

Recommendation l in 6.5 mentions that there are checks in this instrument that are not in the IEEE template and notes that the instrument goes beyond it. This is indeed the case, as the IEEE template mostly specifies what the SRS should contain. The author is of the opinion that someone evaluating an SRS has a duty to assert not only that the content of the SRS is not in conflict with any of the industry standards, but also (at least to some degree) to evaluate whether the document makes sense given the project setting and adequately reflects the stakeholders involved. Ultimately, it is left to the evaluator to decide whether he or she finds the contents of the SRS satisfactory.
