CHAPTER THREE METHOD
3.2. Semi-structured Interviews
3.3.7. Assessing the validity, reliability and acceptability of the questionnaire
3.3.7.1. Questionnaire validity and acceptability
3.3.7.1.1. Introduction
This part of the study was designed to improve the validity of the newly developed questionnaire.
Validity is defined as the degree to which a questionnaire reflects reality (Damato et al., 2005).
The term validation refers to the process by which any data collection instrument, including a questionnaire, is assessed for its dependability (Damato et al., 2005). There are several types of validity, including face validity, content validity and criterion validity, the last of which comprises predictive validity and concurrent validity. It was important to understand each of these types in order to decide which should be tested in this study. Face validity has been reported to refer to whether the questions appear to measure what they are intended to measure. Assessing it relies on knowledge of how people respond to survey questions and of the drawbacks that are common in questionnaire design. However, some researchers believe that face validity is not really validity at all, arguing that it refers only to the appearance of a questionnaire: is it carelessly or poorly constructed, or does it look "professional"? (Williams et al., 2006). Face validity is closely related to content validity (Burford and Bagnall, 2007).
Content validity refers to whether all important aspects of the construct are covered. In most cases, this form of validity is assessed subjectively by a panel of experts, who have to reach agreement (Ridley, 2005). Criterion validity comprises predictive validity, which refers to whether scores on the questionnaire successfully predict a specific criterion, and concurrent validity, which refers to whether the results of a new questionnaire are consistent with those of established measures.
Both face and content validity were evaluated in this part of the study. Face validity was evaluated because it has been reported to be an important consideration for both the pre-test and the final draft of a questionnaire, and professional-looking questionnaires are more likely to increase the response rate (Williams et al., 2006). Content validity was also tested, as it was important to establish that all important aspects of the research area had been covered.
3.3.7.1.2. Procedure
To improve the validity of the questionnaire, the researcher interviewed expert physiotherapists with the aim of reaching a consensus. The main purpose of the interviews was to obtain respondent feedback on the questionnaire. Interviews suited this purpose well because they allowed the researcher to hear the respondents’ comments on the questionnaire directly and to probe their exact meaning. They also allowed both the researcher and the interviewees to raise and explore other useful issues, such as how the response rate could be increased. To test and improve face validity, seven physiotherapists in the Regional Rehabilitation Unit at Northwick Park Hospital, London, UK were interviewed. The researcher used cognitive testing, a form of structured interviewing designed to improve the face validity of a questionnaire. The cognitive testing method was developed by Willis, Royston and Bercini (1991) and consists of three strategies. The first was the concurrent think-aloud technique, in which interviewees were asked to verbalise their thought processes as they responded to each question. The second was paraphrasing, in which the interviewee was asked to restate a particular question in their own words in response to the prompt “What does this question mean to you?” The third strategy was the use of probes: a set of questions the researcher used to prompt the interviewees to explain their responses further.
Examples of probe questions include: “Can you think of a better way to ask this question so that it would be clearer to other interviewees?” and “Are there any words in the question that other clients may find confusing or unclear?” (Willis et al., 1991).
Interviews were organised by the clinical specialist/principal physiotherapist at the Regional Rehabilitation Unit at Northwick Park Hospital, London, and were spread over two days: three interviews on the first day (physiotherapists at bands 7, 7 and 5) and four on the second day (physiotherapists at bands 6, 8A and 6, and the clinical specialist). Two days before their interviews, interviewees were given a copy of the questionnaire and a feedback sheet and were asked to complete the questionnaire and record their comments on and opinions about it on the feedback sheet (see Appendix 4.1). During the face-to-face interviews, the interviewees were given sufficient time to express their opinions and comments on each question of the questionnaire.
The first interview question asked the interviewees how long it had taken them to complete the questionnaire. This was important for checking that the completion time stated in the questionnaire introduction was accurate. Brent (2013) studied how much time respondents are willing to spend completing a survey and emphasised the importance of understanding the audience when constructing a survey, as this can inform decisions on survey length. He examined how the length of a survey (measured by the number of questions) affects the time respondents spend completing it. He reviewed a random sample of roughly 100,000 surveys of 1 to 30 questions and analysed the time respondents spent completing them. He found that the relationship between the number of questions and the time spent answering each question was not linear: the more questions a survey asks, the less time respondents spend on each one. On average, respondents spent just over a minute answering the first question of a survey (including the time spent reading the introduction) and then about 5 minutes in total answering the next 10 questions.
It has been reported that, to increase the response rate, the introduction should provide sufficient and concrete information about a study in as short a paragraph as possible. Interviewees were therefore asked for their opinion of the introduction. The researcher then moved on to the other sections and asked each interviewee whether they had any general concerns about a section before going through all the questions one by one. During the interviews, the participants indicated whether each of the 26 questions was clear or unclear. At the end of the interviews, the participants were also asked about questions they considered missing, irrelevant and/or confusing. This step aimed to identify unclear or redundant questions and to assess the respondents’ reactions to the questionnaire format and the ease of response. The primary rationale for this process was to rephrase questions that the study participants perceived as relatively unclear.
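The clear/unclear judgements collected for the 26 questions can be tallied per question to shortlist candidates for rephrasing. The sketch below is illustrative only: the rating format, the hypothetical `flag_unclear_questions` helper and the rule that a question is flagged once at least `min_unclear` interviewees mark it unclear are assumptions rather than the procedure used in this study, and the ratings are invented.

```python
from collections import Counter


def flag_unclear_questions(ratings, n_questions=26, min_unclear=1):
    """Count 'unclear' verdicts per question and return those reaching min_unclear.

    ratings: one dict per interviewee mapping question number -> "clear"/"unclear".
    The flagging rule (min_unclear) is an assumption for illustration.
    """
    unclear = Counter()
    for interviewee in ratings:
        for question, verdict in interviewee.items():
            if not 1 <= question <= n_questions:
                raise ValueError(f"unknown question number: {question}")
            if verdict not in ("clear", "unclear"):
                raise ValueError(f"unexpected verdict: {verdict!r}")
            if verdict == "unclear":
                unclear[question] += 1
    return {q: count for q, count in sorted(unclear.items()) if count >= min_unclear}


# Invented ratings for three interviewees and three of the 26 questions.
ratings = [
    {1: "clear", 2: "unclear", 3: "clear"},
    {1: "clear", 2: "unclear", 3: "unclear"},
    {1: "clear", 2: "clear", 3: "clear"},
]
print(flag_unclear_questions(ratings))  # {2: 2, 3: 1}
```

Raising `min_unclear` makes the shortlist stricter; with `min_unclear=2`, only question 2 would be flagged in this invented example.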
The interviews also served as the content validity judgement: physiotherapists were asked to give written comments on the content of each part of the questionnaire and then to rate its acceptability on a 100-point horizontal visual analogue scale, with a separate scale for each section.
3.3.7.1.3. Data analysis
Both quantitative and qualitative methods were applied. Qualitatively, the interviewed physiotherapists were asked to comment on the wording, clarity and meaning of each section, including suggestions for refinement and modification wherever necessary. Quantitatively, they were asked to rate the acceptability of each section of the questionnaire on a 100-point horizontal visual analogue scale (VAS). The lowest rating (score 0) corresponded to “the questionnaire’s section was not acceptable” and the highest rating (score 100) to “the questionnaire’s section was very acceptable”. The mean and 95% confidence interval (95% CI) of the VAS scores were calculated from all feedback. A mean score of 75 or higher (i.e. 75% of the scale maximum) was set as an adequate and acceptable level (Chung et al., 2007).
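As an illustration of the quantitative analysis, the following sketch computes, for one section, the mean VAS score, a t-based 95% CI and whether the mean reaches the 75-point threshold. It assumes the CI was derived in the usual way from the sample standard deviation and Student’s t (the method is not stated above), tabulates t only for small samples, and uses invented scores; `summarise_vas` is a hypothetical helper.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for df = 1..10
# (standard table values), enough for small panels such as seven raters.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

ACCEPTABLE_MEAN = 75  # threshold on the 0-100 scale (Chung et al., 2007)


def summarise_vas(scores):
    """Mean, t-based 95% CI and acceptability flag for one section's VAS scores."""
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("VAS scores must lie between 0 and 100")
    n = len(scores)
    if n - 1 not in T_975:
        raise ValueError("this sketch only tabulates t for 2 to 11 scores")
    mean = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    half_width = T_975[n - 1] * sem
    return {
        "mean": mean,
        "ci95": (mean - half_width, mean + half_width),
        "acceptable": mean >= ACCEPTABLE_MEAN,
    }


# Invented scores from seven raters for one section (not study data).
section_scores = [80, 85, 70, 90, 75, 88, 82]
result = summarise_vas(section_scores)
print(round(result["mean"], 1), result["acceptable"])  # 81.4 True
```

Note that the acceptability criterion is applied to the mean, so a section can pass even when the lower bound of its CI falls below 75.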
Additional questions were created from the qualitative comments and suggestions that participants made during the interviews. All comments and suggestions were considered in improving the questionnaire’s structure and questions. Full details of all changes made to the original draft of the questionnaire are given in the Results chapter.
3.3.7.2. Questionnaire reliability process
The stability of the final draft of the questionnaire was assessed in terms of intra-rater test-retest reliability. Agreement between two separate completions of the questionnaire by the same physiotherapist was estimated by calculating the point-to-point percentage of agreement at category level (Williams, 2003). Reliability testing focused specifically on the treatment activity section.
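Point-to-point percentage agreement is the number of items on which the two completions fall in the same category, divided by the number of items and multiplied by 100. A minimal sketch, assuming each completion is stored as an ordered list of category answers for the same items; the category labels and the `percentage_agreement` helper are hypothetical.

```python
def percentage_agreement(first, second):
    """Point-to-point percentage agreement between two completions by the
    same physiotherapist, compared item by item at category level."""
    if len(first) != len(second):
        raise ValueError("both completions must cover the same items")
    if not first:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)


# Hypothetical category answers for five treatment-activity items.
first_completion = ["often", "never", "sometimes", "often", "never"]
second_completion = ["often", "never", "often", "often", "never"]
print(percentage_agreement(first_completion, second_completion))  # 80.0
```

Because it counts only exact matches, this measure does not correct for agreement expected by chance, which is why it is usually reported alongside kappa.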
The questionnaire’s reliability was tested at Rookwood Hospital, Cardiff, UK. Seven physiotherapists working with patients with acquired brain injury (ABI) at the hospital were invited to participate in testing the questionnaire’s reliability. The questionnaire was sent to these physiotherapists to complete, and two weeks later the treatment activity section of the questionnaire was sent to the same physiotherapists again. Each questionnaire carried a unique code, which was linked to the physiotherapist’s name in a separate record so that both completions could be confirmed as coming from the same physiotherapist.
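The coding scheme above allows the two completions to be matched before agreement is analysed. The sketch below pairs completions by code, reports codes that appear in only one round, and computes unweighted Cohen’s kappa for each matched physiotherapist, labelled with the Landis and Koch bands listed in 3.3.7.2.1. It is not the SPSS 20 procedure used in the study and does not reproduce the weighted statistics described there; the codes, category labels, answers and helper names are hypothetical.

```python
from collections import Counter


def pair_by_code(first_round, second_round):
    """Match completions sharing a questionnaire code.

    Returns {code: (first_answers, second_answers)} plus a sorted list of
    codes that appear in only one round and so cannot be compared.
    """
    shared = sorted(first_round.keys() & second_round.keys())
    unmatched = sorted(first_round.keys() ^ second_round.keys())
    pairs = {code: (first_round[code], second_round[code]) for code in shared}
    return pairs, unmatched


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa for two categorical completions of the same items."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("completions must be non-empty and the same length")
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    # Chance agreement: sum over categories of the product of marginal proportions.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Label a kappa value with the Landis and Koch (1977) bands.

    Values between the printed bands (e.g. 0.205) go to the higher band,
    which is an assumption of this sketch.
    """
    if kappa < 0:
        return "poor agreement"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return f"{label} agreement"
    return "almost perfect agreement"


# Hypothetical coded completions of the treatment activity section.
first_round = {
    "PT01": ["often", "never", "sometimes", "often", "never"],
    "PT02": ["never", "never", "often", "sometimes", "often"],
    "PT03": ["often", "often", "never", "never", "sometimes"],
}
second_round = {
    "PT01": ["often", "never", "often", "often", "never"],
    "PT02": ["never", "never", "often", "sometimes", "often"],
}

pairs, unmatched = pair_by_code(first_round, second_round)
for code, (first, second) in pairs.items():
    kappa = cohen_kappa(first, second)
    # PT01 0.67 substantial agreement; PT02 1.0 almost perfect agreement
    print(code, round(kappa, 2), landis_koch(kappa))
print("unmatched codes:", unmatched)  # ['PT03']
```

Kappa is undefined when chance agreement equals 1, for example when both completions place every item in the same single category; the sketch raises an error in that case rather than returning a value.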
3.3.7.2.1. Data analysis
Kappa scores for the intra-rater test-retest reliability of individual physiotherapists were calculated using SPSS version 20 for Windows. Kappa values were interpreted as follows: <0.00 “poor agreement”, 0.00-0.20 “slight agreement”, 0.21-0.40 “fair agreement”, 0.41-0.60 “moderate agreement”, 0.61-0.80 “substantial agreement” and 0.81-1.00 “almost perfect agreement”. These benchmarks were originally proposed by Landis and Koch in 1977 (Williams, 2003). Although they are familiar and widely used, they can be over-simplistic if regarded as universally applicable, so the results were also expressed as percentage agreement. Weighted statistics were calculated to assess the agreement between the two ratings for each rater and for each category of the treatment activity. For each section of the treatment activities (treatment technique, treatment adjuncts, treatment position and task), overall kappa statistics across both completions were estimated with a 95% CI. Kappa was not calculated for each individual subcategory of the treatment activity list because most of the activities had at least one case where the value of the weighted variable