A test as a whole is only as good as the items it includes. The choice and number of items, as well as the item format, are crucial to the validity and reliability of the instrument. According to Schepers (1992), value scales like the VSM usually take one of two formats: questions or statements. Items 1 to 12 on the VSM-94 state the question in the format of “how important is it to…” followed by the item content. Questions 13 and 14 are asked as individual questions with response scales different from those in items 1 to 12. Statements, like items 15 through 20, can be positively or negatively stated and are usually responded to on a Likert scale, which presents the respondent with options ranging from agree/disagree to approve/disapprove (Gregory, 1996). Likert scales pose some problems: Schepers (1992) argues that the equal-interval quality of this scale declines when more than two of the points on the scale are anchored. Another problem is that statements with a strong positive or negative connotation will be endorsed without evaluation of their content. Such statements would not only increase response bias; Swart et al. (1999) note that their item response distributions might also tend towards a bimodal curve.
Validity and reliability of goniometric measurements were not verified in our study. This could have been done, for example, by an additional examiner performing another set of goniometric measurements, but we think this problem has been thoroughly discussed and verified in the literature [1-16], and we deliberately chose goniometric measurements as a reference in order to study visual estimation and the role of experience more precisely. Considering all measurements as variables to be verified would certainly have complicated the statistical analysis and weakened the conclusions. Based on literature statements and on our standardized technique for goniometric measurements, we can assume with some caution that these measurements were valid and reliable.
The goal of this study was to develop and validate a practical tool that allows clinicians to measure postural sway in a clinical setting with body-worn accelerometers. We call our tool the instrumented sway system (ISway). Our vision is that this tool will provide reliable, automatic analysis of sway that is sensitive, accurate, robust, and consistent, without the need for clinical experts to deal with the raw data. To achieve this objective, we carried out two studies in order to determine: i) the sensitivity and experimental concurrent validity of ACC compared to force-plate measures of postural sway; and ii) the test-retest reliability and clinical concurrent validity of ACC-based measures compared to the PIGD. From this information, we recommend a subset of the most sensitive, reliable, and valid ISway measures to characterize postural control in PD.
A limitation of this study is the absence of convergent and discriminant validity evidence, but in the Greek context no long-established, reliable measures exist for this purpose. However, preliminary evidence from the CFA used for construct validity is encouraging. Future research could build further on these initial findings of construct validity by adopting the modern, holistic view of construct validity (Messick, 1989), also adopted by research standards for education and psychology research (AERA, APA, & NCME, 1999). According to this view, validity is a unified construct centered on construct validity (Chan, 2014). A second limitation is the use of EM to fill in missing values that are not missing at random. Despite these limitations, the Greek version of the TESC is both a valid and a reliable measure for the evaluation of students’ conduct as perceived by their teachers. Finally, the validation of the Greek version of the TESC may boost research on school-related conduct problems in Greece in relation to rejection and acceptance theory (Rohner, 1975).
In this study, we aimed to create a standardized tool to assess basic surgical skills and to improve the overall process of early surgical education. In summary, the assessment scale we developed is valid and reliable. It is an analytical scoring system that contains observable and measurable components of surgical performance. It will help educators reduce the subjectivity of assessment and clearly convey to residents what is expected in order to attain competence. We hope this tool will provide a structured template for other residency programs to assess their residents’ basic surgical skills.
Measuring students’ perceptions of a specific area of interest influences the effectiveness of the feedback process (28). Besides having powerful effects on learning, this focused feedback could provide important information on possible flaws in anatomy education. Such information would reduce discrepancies between current understanding or performance and desired understanding or performance (28). Thus, gathering student perceptions through a specific, valid, and reliable tool is valuable for identifying the strengths, weaknesses, opportunities, and threats facing the anatomy department.
More than 90 items in the PROMIS-Fatigue item bank query the experience and impact of fatigue. Computer-adaptive testing (CAT) using fatigue bank items provides reasonably reliable and valid estimates with efficiency and low participant burden. While attractive for use in many settings, CATs require the availability of the internet, computers, or mobile technology (e.g., smartphones), and algorithms for real-time administration and scoring. Moreover, the use of CATs in international clinical trials would require that the complete item bank undergo translation and cultural validation into multiple languages. Thus, in clinical trials and across health systems, where internet access to support CAT platforms may be limited, the use of fixed-item short forms (SFs) may be preferable to ensure reliable collection of data.
Designing a valid and reliable tool to assess general and sport nutrition knowledge in an athletic population may provide the accurate information needed to advise better dietary choices and improve dietary intake. A recent systematic review highlighted 38 studies that have used a nutrition knowledge questionnaire, only one of which met the full validity and reliability criteria. Furthermore, with regard to comprehensiveness, the four questionnaires that scored highest on validity and reliability [14, 16–18] scored between 36 and 55% on the comprehensiveness rating. There is a clear need for a psychometrically validated nutrition knowledge measure that can investigate participants’ general and sport nutrition knowledge, and the aim of this research was to develop a valid and reliable general and sport nutrition knowledge questionnaire for athletes.
Student learning benefits from timely and well-designed feedback. With generally increasing class sizes and competing pressures (teaching, research, service…) on the time available for marking, implementing a rubric-based approach can help provide faster as well as valid and reliable feedback. The time savings can become substantial when technology-enhanced options are pursued. For example, L. Anglin and K. Anglin’s 2008 study demonstrated that computer-assisted grading rubrics could be completed 200-350% faster than more traditional approaches. This is supported by practices developed from the Idaho project, which through the use of online rubrics reduced assessment marking from an average of 20 minutes per student essay to less than 10 minutes per essay. Time saved by adopting technology-enhanced marking practices can then be used elsewhere, for example in student-teacher conferences or tutorial support. It can also be used to dramatically transform the marking process.
The results of the current study showed that the severity of the gag reflex can be predicted by a short survey, the Predictive Gagging Survey. The survey is valid: there is a moderately positive correlation between the score on our survey and the GSI. Furthermore, the survey is reliable: scores are very consistent over time. The survey takes no more than 5 minutes for a patient to complete, and it is also quick and easy for a dental health professional to score.
The reliability results of both this study and Coffey’s data using the current new tool have implications for its future use in both clinical practice and research. These implications relate both to which parameters may be selected and to who should perform the rating judgements. The data presented in this paper suggest that expert SLT ratings are the most reliable and hence the most useful in determining surgical outcomes and informing decisions about SVR longitudinal management within a multi-disciplinary environment. The data also suggest that expert SLTs can proceed to routinely use ten parameters (Overall Grade, Social
parametric statistics have been used. However, to analyze the standard error of measurement (SEM), the smallest real difference (SRD), and the effect size (ES), the mean values and standard deviations also had to be applied. All correlation coefficients (rho) were calculated using Spearman’s rank correlation, with a coefficient of < 0.5 considered low, 0.5–0.69 moderate, 0.7–0.89 high, and 0.9–1.0 very high. To analyze the agreement between the two repeated measurements at 12 months, the intraclass correlation coefficient (ICC) was applied. To check for systematic error between the two measurements, Wilcoxon’s signed rank test was used. Internal consistency of the OMAS was calculated using Cronbach’s alpha. The standard error of measurement (SEM) was defined by SEM = SD × √(1 − ICC), and SEM% by (SEM/mean) × 100, where the mean is the mean of all values from test sessions 1 and 2. The smallest real difference (SRD) was defined by SRD = 1.96 × SEM × √2, and SRD% by (SRD/mean) × 100, where the mean is the mean of all values from test sessions 1 and 2. An ‘error band’ around the mean difference of the two measurements, d̄, was defined by 95% SRD = d̄ ± SRD. The effect size of the OMAS between the six-month and twelve-month follow-ups was calculated as (mean value of measurement 2 − mean value of measurement 1)/SD of measurement 1. Significance was set at the alpha level of p < 0.05. Before the statistical analysis of the validity of the OMAS versus the five-grade rating scale, the subjects were divided into three groups. Those who had answered “very good” and “good” formed one group (Group 1), “fair” formed one group (Group 2), and those who had answered “poor” and “very poor” formed one group (Group 3). When comparing the results between the three groups, the Kruskal–Wallis test was used, and as a post hoc test between each pair of groups the Mann–Whitney U-test was applied.
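The agreement statistics described above can be sketched numerically. The following minimal Python example computes SEM, SEM%, SRD, SRD%, and the effect size from two sets of repeated scores; the score data and the ICC value are hypothetical illustrations, not values from the study.

```python
# Illustrative sketch of the SEM/SRD/ES formulas in the text.
# All data values below are hypothetical, not from the OMAS study.
import math
import statistics

session1 = [70, 75, 80, 85, 90, 95, 80, 85]   # hypothetical test scores
session2 = [72, 74, 82, 83, 91, 94, 79, 88]   # hypothetical retest scores
icc = 0.90                                    # assumed ICC (normally from an ICC model)

# Mean and SD pooled over all values from both sessions, as defined in the text
all_values = session1 + session2
mean_all = statistics.mean(all_values)
sd_all = statistics.stdev(all_values)

sem = sd_all * math.sqrt(1 - icc)        # SEM = SD * sqrt(1 - ICC)
sem_pct = sem / mean_all * 100           # SEM% = (SEM / mean) * 100
srd = 1.96 * sem * math.sqrt(2)          # SRD = 1.96 * SEM * sqrt(2)
srd_pct = srd / mean_all * 100           # SRD% = (SRD / mean) * 100

# Effect size: (mean of measurement 2 - mean of measurement 1) / SD of measurement 1
es = (statistics.mean(session2) - statistics.mean(session1)) / statistics.stdev(session1)

print(f"SEM={sem:.2f} ({sem_pct:.1f}%), SRD={srd:.2f} ({srd_pct:.1f}%), ES={es:.2f}")
```

A test-retest difference smaller than the SRD band cannot be distinguished from measurement error at the 95% level, which is why the SRD is reported alongside the SEM.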
As shown in the critique chapter of this research, several critiques of the Blue Ocean Strategy were presented. It was pointed out that the critical reviews of the theory did not focus on a single part of the Blue Ocean Strategy but highlighted various weaknesses, ranging from the theoretical background itself to the innovation-related understanding of the Blue Ocean Strategy. As a result, it can be said that the Blue Ocean Strategy has its weaknesses (Herman, 2008; Burke et al., 2009; Kraaijenbrink, 2012), as shown by several reviewers; on the other hand, the theory itself could still be considered “young”. Therefore, Kim and Mauborgne should take the weaknesses and critiques constructively and revise their theory, addressing weak points such as the definition of Blue Ocean markets. By disposing of these weak points, the Blue Ocean Strategy would become an even more complete and therefore more reliable theory.
covers the 10 important functional domains most frequently affected by MG. 3) The proportion of bulbar and respiratory items to the total number of items (4/10) is appropriate given the clinical importance of these domains. 4) The test items are appropriately weighted. For example, the maximum score for worst respiratory status is worth more points than the maximum score for worst eyelid strength. 5) The MGC is easy to administer, taking less than 5 minutes to complete, without the need for any equipment. 6) The MGC is easy to interpret, taking less than 10 seconds to calculate a total score. Also, the assessment of each of the 10 test items provides immediate insight into the status of that particular functional domain. 7) The MGC is reliable, as evidenced by the results of our test-retest assessment. 8) The MGC demonstrates concurrent and longitudinal construct validity in the MG practice care setting, based on the results of this prospective study conducted on > 150 patients at 11 centers (table 3, table 4, figure).20 It
Content validity of RAID. A total of 24 persons returned their questionnaire to give their opinion on the scale. These included five psychiatrists, one clinical psychologist, three community psychiatric nurses, five carers, nine staff nurses working with the elderly in wards and day hospitals, and one occupational therapist. Fourteen of them thought that all the items in the scale were important. One suggested that sleep disturbance may not be an important item in the scale. One individual suggested the inclusion of additional symptoms such as loss of appetite, aggression, and obsessive-compulsive symptoms as expressions of anxiety, as well as difficulty in coping with unfamiliar surroundings, and a separate section for the signs and symptoms of anxiety that do not fit into a specific category. The explanations given of phobias and panic attacks were considered unsatisfactory by seven individuals.
When teachers organize planned and systematic out-of-school learning activities, students can better understand abstract and complex terms and topics, and therefore meaningful and deeper learning can occur. Within this context, this study aims to develop a valid and reliable scale to determine teachers’ attitudes, behaviors, efficacy, and competence in using out-of-school learning environments to support in-class educational activities while teaching science. The scale was administered to 520 teachers to evaluate its validity and reliability. Expert opinion was sought for the face and content validity of the scale, and exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were conducted for construct validity. The results of the EFA showed that the scale includes 24 items under 4 factors. After the EFA, CFA was conducted to verify the structure of the scale. The fit indexes obtained from the CFA were acceptable. Therefore, it was inferred that the items of the scale were in accordance with the related models. In addition, the Cronbach’s alpha coefficient was calculated as .89. The analyses and obtained values revealed that the scale is valid and reliable.
The present study investigates the potential of the translation task as a testing method for measuring reading comprehension. To achieve this objective, two types of translation tests, open-ended and multiple-choice, and two types of reading comprehension tests, a multiple-choice reading comprehension test and an open-ended cloze test, were developed. The reliability of the tests was computed in order to estimate which translation test was more reliable and valid. Correlation coefficients were run to investigate whether translation tests work as reliable and valid measures of reading comprehension, and to examine the relationship between proficiency in reading comprehension and proficiency in translation. The results indicate that the open-ended translation test is more reliable and valid than the multiple-choice one; that translation has strong potential to serve as a reliable and valid tool for assessing reading comprehension; and that there is a high positive correlation between the participants’ proficiency in reading comprehension and their proficiency in translation. The findings of this study may have pedagogical implications for instructors, who may be justified in highlighting the role of translation tests and using them in their reading comprehension classes.
This study was the first step in developing the ReliROM, a questionnaire that aims to measure religion and spirituality (R/S) reliably and validly along multiple dimensions in Routine Outcome Monitoring. Based on theoretical considerations, 70 items from existing questionnaires measuring R/S were selected and filled in by 366 clinical and non-clinical respondents. The aim of the present study was to refine the item pool and generate a provisional version of the questionnaire. Principal component analysis identified two dimensions of R/S: intrinsic religiosity and divine struggle. Furthermore, assessment of the responsiveness of the scales showed Searching for Meaning, Anxiety, and Passivity to be the most sensitive for measuring change over a three-month period. Finally, a hierarchical cluster analysis differentiated five religious profiles for psychiatric patients, namely Highly religious, Moderately religious, Struggling with divine, Struggling with meaning, and Minimally religious. A MANOVA followed up by a simple contrast revealed that highly religious patients were more satisfied with their interpersonal relationships and functioned better in their work and leisure than patients struggling with meaning and minimally religious patients. It is suggested that items measuring the following three aspects of R/S need to be included in the ReliROM: 1) internalized, positively valued R/S, 2) negatively experienced R/S, and 3) searching for meaning.
The KDQOL-36™ is considered reliable and to have good reproducibility, as indicated by the high ICC value of >0.98 in all of the subscales. For test-retest reliability, an ICC of 0.70-0.86 demonstrated the stability of the scale over time. The Kappa Index of 0.68-1 and Weighted Kappa of 0.73-1 indicated substantial to perfect agreement across various items in test-retest reliability. The Cronbach’s alpha values suggested that the scale is internally reliable. The internal reliability of all of the subscales exceeded 0.65, with the exception of the PCS and MCS. As mentioned, the item selection for the PCS may need to be revised using the Hong Kong specific version to replace the standard version. The relatively low Cronbach’s alpha for the MCS could have been affected by the smaller number of items in the scale. Any instrument with more than 14 items may have a higher Cronbach’s alpha value even if the items reflect different underlying constructs.
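The sensitivity of Cronbach’s alpha to the number of items can be demonstrated with a short simulation. The following sketch uses simulated data (not the KDQOL-36 items): two item sets share the same weak common factor, yet the 20-item set yields a noticeably higher alpha than the 5-item set, illustrating why a long scale can look internally consistent even when its items are only weakly related.

```python
# Illustrative simulation: Cronbach's alpha rises with the number of items
# even when the average inter-item correlation stays fixed and weak.
# All data are simulated; nothing here is taken from the KDQOL-36 study.
import random
import statistics

def cronbach_alpha(items):
    """items: list of per-item score lists (each list holds one score per respondent)."""
    k = len(items)
    item_var_sum = sum(statistics.pvariance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]
    total_var = statistics.pvariance(totals)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def simulate_items(k_items, n=500, loading=0.5, seed=1):
    """Simulate n respondents on k_items items sharing one weak common factor."""
    rng = random.Random(seed)
    common = [rng.gauss(0, 1) for _ in range(n)]
    return [[loading * c + rng.gauss(0, 1) for c in common] for _ in range(k_items)]

alpha_5 = cronbach_alpha(simulate_items(5))
alpha_20 = cronbach_alpha(simulate_items(20))
print(f"alpha with  5 items: {alpha_5:.2f}")
print(f"alpha with 20 items: {alpha_20:.2f}")  # higher, with the same weak item correlations
```

With the loading of 0.5 used here, every pair of items correlates at about 0.2, so the jump in alpha from the 5-item to the 20-item set reflects scale length alone, not stronger item homogeneity.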