Research instrument - in software development

Assessment models used in CMMI (SEI, 2000) and SPICE (ISO 15504, 1998), both of which measure software development capability, seek to measure objectively an attribute that would normally be considered subjective. This is done by considering what objective evidence would normally demonstrate different achievement levels of the subjective attribute. The number of data points sought for each level of the attribute contributes to measurement repeatability and reproducibility. Empirical studies of both CMMI (Herbsleb et al., 1994; Goldenson and Gibson, 2003) and SPICE (Simon et al., 1997; Hunter and Jung, 2000; Jung et al., 2001) confirm their validity.

The questionnaire was based on assessment models and data collection methods used in SPICE assessments (see footnote 5). The questionnaire was made up of 49 questions developed by identifying the constructs needed for the research, then constructing questions to gather data for the constructs. The questions were then allocated to a question category that related to the subject area so that questions could be asked in logical groupings and the interview could deal with each subject area without having to return to it. The interview questions (Appendix B) were then linked to a research question so that the interview questions could be reported by research question or by construct. Listing them in the different ways (Appendix A and B) permitted checking that each construct and research question was adequately covered and that there was not an undue concentration in any one area. Some interview questions would provide information that related to more than one research question.

Footnote 5: I have participated in the development of the ISO standard for process capability assessments, ISO 15504 (Software Process Improvement and Capability dEtermination). I have also participated in the development of the OOSPICE process assessment model for object oriented software development. I am a trained SPICE assessor and have also participated in the SPICE trials by conducting several assessments of the software development processes used by my organization over a two year period. I have also assisted in an assessment of the software development section of a large Australian State Government department using the OOSPICE method. This experience was used in developing the research instrument to ensure that the data sought was as objective as possible and contributed to the attribute of interest.

A list of possible or probable responses was developed for each question. When the question sought an ordinal response, an ordinal scale of responses was developed. When the question sought nominal information, a list of the more expected responses was developed. This aided note-taking during the interview and added context to the question so that if the subject found the question ambiguous there was additional information available to clarify it. For example, the first question sought a measure of the size of the organization, an ordinal measure. A question on the type of system being developed listed the industry sectors for which, from my experience and knowledge of the local software development industry, software was developed (Table 23).

Table 23: Example questions from the structured interview script showing an ordinal scale of responses and a nominal list of potential responses.

Question 1: How large is the software development organization - number of personnel, number of divisions, number of locations?
Responses (ordinal scale): < 10 staff; 11 - 30; 31 - 120; >120 - 1000 single organization; > 1000 or Multinational

Question 6: What type of system is being developed?
Responses (nominal list): Financial, ERP, CRM, SRM Military Infrastructure Telecommunications Medical Transport Services Factory automation Other
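The structure of the interview script described above (a question, its checklist of responses, and its links to a category, constructs and research questions) can be sketched in code. The following is a minimal illustration only, not the instrument actually used: the class and function names, the category and construct labels, and the research question identifiers "RQ1" and "RQ2" are all hypothetical. The nominal example uses only those Table 23 items whose boundaries are unambiguous in the table.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Scale(Enum):
    ORDINAL = "ordinal"  # checklist order carries meaning (e.g. size bands)
    NOMINAL = "nominal"  # checklist is an unordered set of categories


@dataclass
class Question:
    number: int
    text: str
    scale: Scale
    responses: List[str]  # checklist used for note-taking
    category: str = ""    # subject-area grouping (hypothetical labels below)
    constructs: List[str] = field(default_factory=list)
    research_questions: List[str] = field(default_factory=list)

    def code(self, response: str) -> int:
        """Return the position of a response in the checklist.

        For an ordinal question the position doubles as the rank.
        Raises ValueError if the response is not on the checklist.
        """
        return self.responses.index(response)


def coverage(questions, attribute):
    """Count how many questions feed each construct or research question."""
    counts = Counter()
    for q in questions:
        counts.update(getattr(q, attribute))
    return counts


QUESTIONS = [
    Question(
        number=1,
        text=("How large is the software development organization - number "
              "of personnel, number of divisions, number of locations?"),
        scale=Scale.ORDINAL,
        responses=["< 10 staff", "11 - 30", "31 - 120",
                   ">120 - 1000 single organization",
                   "> 1000 or Multinational"],
        category="organization",           # hypothetical label
        constructs=["organization size"],  # hypothetical label
        research_questions=["RQ1"],        # hypothetical identifier
    ),
    Question(
        number=6,
        text="What type of system is being developed?",
        scale=Scale.NOMINAL,
        # Subset of the Table 23 list, chosen for illustration only.
        responses=["Military", "Infrastructure", "Telecommunications",
                   "Medical", "Factory automation", "Other"],
        category="project",                   # hypothetical label
        constructs=["system type"],           # hypothetical label
        research_questions=["RQ1", "RQ2"],    # hypothetical identifiers
    ),
]

print(QUESTIONS[0].code("31 - 120"))
print(coverage(QUESTIONS, "constructs"))
print(coverage(QUESTIONS, "research_questions"))
```

Tallying questions per construct and per research question in this way mirrors the check described above: a construct with a very low count is thinly covered, and a very high count suggests an undue concentration in one area.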


Specific terms used in this research do not necessarily mean the same thing to all project managers. For example, to one project manager the term “project monitoring” might mean only those activities that directly seek information, while to another it might mean all activities that yield information about the project. For this reason, project managers were not directly asked “How do you monitor your projects?” but were, instead, asked how they dealt with a situation. Their responses yielded information on project monitoring, project control and project coordination.

The total number of questions was constrained by the time that it was thought each interview would take. While most people are prepared to spend up to an hour on a research project such as this, asking for more than one hour of their time was thought likely to discourage participation.

4.3.1 Reviewing the research instrument

The research instrument was briefly presented at a PhD assessment conducted by Associate Professor Jie Lu, Associate Professor Barry Jay and Professor Chengqi Zhang of the University of Technology, Sydney. The objective of the doctoral assessment is to “ensure that the student has knowledge and skills to enable successful and timely completion of the research program” (UTS, 2005). To achieve this, I gave a presentation outlining the problem and its importance, the state of research known to date, the research approach and proposed research method. A member of the panel asked how many questionnaire questions dealt with “organizational distance”. At that time there was one question that sought information to classify the organization on a simple, four quadrant diagram of organizational distance. The panel member expressed a concern that one question would not elicit sufficient data for a construct so central to the proposed research. As a consequence, the questionnaire was modified to address that concern, and the adequacy of the information sought for all of the constructs to be used in the research was reviewed (see Section 4.8). The revised questionnaire was reviewed by the panel member. No formal pilot study was performed for the questionnaire. Instead, the initial interviews indicated that some questions needed clarification or augmenting. Consequently, some questions were revised after the initial interviews to clarify them, and some questions were added to seek information on requirements management and project management process capability. Neither of these topics was included in the research analysis, because of the depth that emerged in the narrower area of project management mechanisms.

After about ten interviews, it became obvious that project managers did not manage outsourced team members differently to their own team members, so the remaining subjects were simply asked whether or not they managed the outsourced team members differently. This obviated the need to spend 15 to 20 minutes answering questions that were the same as those previously asked, except that they dealt with outsourced development. If the project manager indicated that they did manage outsourced tasks differently to co-located projects, then all the questions were asked.
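The screening step described above amounts to a simple branch in the interview script. A minimal sketch follows, assuming the script is held as two lists of question texts. The function name, parameter names and the placeholder question texts are hypothetical and are not taken from the instrument itself.

```python
def questions_to_ask(core, outsourced, manages_differently):
    """Build the interview script for one subject.

    core: questions about the project manager's own team
    outsourced: the parallel questions about outsourced development
    manages_differently: the subject's answer to the screening question
    """
    script = list(core)  # copy so the shared question list is not modified
    if manages_differently:
        # Only subjects who manage outsourced work differently are asked
        # the parallel set of outsourced-development questions.
        script.extend(outsourced)
    return script


# Placeholder question texts, for illustration only.
CORE = ["core question A", "core question B"]
OUTSOURCED = ["outsourced question A", "outsourced question B"]

short_script = questions_to_ask(CORE, OUTSOURCED, manages_differently=False)
full_script = questions_to_ask(CORE, OUTSOURCED, manages_differently=True)
print(len(short_script), len(full_script))
```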

4.3.2 Reliability

Reliability is the consistency or repeatability of the measures (Trochim, 2001, p. 88). The structured interview questions and their response checklists were modelled on the ISO 15504 (SPICE) assessment model (ISO 15504, 1998), whose reliability was established during trials conducted between September 1996 and June 1998. The trials established that internal consistency was high enough for the model to be usable in practice.

Although a one-hour interview cannot be as rigorous as a five-day ISO 15504 assessment, the current research borrows from that method to convert a subjective judgement to one that is as objective as possible and thus achieve some measure of reliability.

An evaluation of the SPICE trials noted that there were two aspects to reliability: internal consistency and inter-rater agreement (Jung et al., 2001). The authors noted that internal consistency was affected by ambiguities in wording and inconsistencies in the interpretation of the wording. This research was minimally affected by ambiguities and interpretation since the structured interview questions were developed by the same person who conducted the interviews. Also, most of the questions were accompanied by a checklist of potential responses that clarified the intention of the question and provided a context for the responses. To avoid prejudicing their responses, the interviewees were not shown the checklists. While these factors reduced ambiguities, there was no formal independent test of the interview questions’ internal consistency.

Inter-rater consistency does not apply because there was only one “rater”. However, the question of consistent rating by the one “rater” should be addressed. Deciding which of several possible values on a nominal scale best represents the interview subject’s response is a subjective judgement. Lacking any constraints, such subjective judgements may reduce the survey’s repeatability. This can be prevented by ensuring that the listed responses are such that a subject’s response clearly and unambiguously fits only one item in the list.
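One way the consistency of a single rater could be quantified is to code the same interview notes twice, some time apart, and compare the two passes. The sketch below uses Cohen's kappa, a common chance-corrected agreement statistic for nominal codes. No such check is reported for this research; the function name and the example codes are hypothetical and are given only to illustrate the idea.

```python
from collections import Counter


def cohens_kappa(first_pass, second_pass):
    """Chance-corrected agreement between two coding passes.

    Each argument is a list of nominal codes, one per interview response,
    in the same order. Returns 1.0 for perfect agreement and about 0.0
    when agreement is no better than chance.
    """
    if not first_pass or len(first_pass) != len(second_pass):
        raise ValueError("both passes must be non-empty and of equal length")

    n = len(first_pass)
    # Proportion of responses given the same code in both passes.
    observed = sum(a == b for a, b in zip(first_pass, second_pass)) / n

    # Agreement expected by chance, from each pass's category frequencies.
    freq_first = Counter(first_pass)
    freq_second = Counter(second_pass)
    expected = sum(freq_first[c] * freq_second[c] for c in freq_first) / (n * n)

    if expected == 1.0:
        return 1.0  # every response received the same single code
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for four responses, coded on two occasions.
first = ["Medical", "Military", "Medical", "Other"]
second = ["Medical", "Military", "Other", "Other"]
print(round(cohens_kappa(first, second), 3))
```

A checklist whose items are mutually exclusive, as recommended above, would be expected to push this figure towards 1.0, since each response then fits only one code.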
