Inter-rater reliability

Inter-rater reliability of data elements from a prototype of the Paul Coverdell National Acute Stroke Registry

Table 3 shows the estimates of inter-rater reliability for nominal data related to the emergency department (ED) processes relevant to the decision to treat a patient with tissue plasminogen activator (tPA). These data include the documentation (i.e., presence or absence) of critical time points such as stroke onset time, ED arrival time, time first seen by a doctor, time of stroke team consult, and time of initial brain imaging. Only 2 items had excellent agreement (i.e., kappa ≥ 0.75): the presence of hemorrhage on initial brain image, and whether time (i.e., > 3 hours) was the reason that tPA was not given (Table 3). Several important variables had poor agreement (kappa < 0.40), including whether a stroke team was consulted, whether the time of the stroke team consultation was documented, and whether the time of initial brain imaging was documented. Data items relevant to understanding the onset of stroke symptoms showed only moderate reliability at best; whether the onset of symptoms was specific (i.e., known precisely) or estimated (i.e., identified as occurring within a 6-hour window) showed moderate (kappa 0.51) and poor (kappa 0.22) reliability, respectively. Finally, the sources of the onset time information (i.e., witnessed, patient self-report, or not documented) were all unreliable measures (kappa < 0.40). Most of the items shown in Table 3 had BIs that exceeded ± 0.10, indicating substantial systematic differences between hospital abstractors and the audit rater. For example, documentation that the date and time of stroke team consult was missing (non-documented) …
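
The kappa thresholds and the ± 0.10 criterion above can be made concrete with a small sketch. The following pure-Python example computes Cohen's kappa plus a simple bias index, on the assumption that the "BIs" reported are bias indices (differences in the raters' marginal rates); the abstractor-versus-auditor calls below are invented for illustration, not registry data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    over the same items (nominal categories)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # agreement expected if the raters' marginal rates were independent
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

def bias_index(rater_a, rater_b, positive="documented"):
    """Difference in how often each rater records the 'positive' category;
    values beyond roughly +/-0.10 suggest systematic disagreement."""
    n = len(rater_a)
    return rater_a.count(positive) / n - rater_b.count(positive) / n

# hypothetical abstractor-vs-auditor calls on one documentation item
abstractor = ["documented"] * 6 + ["missing"] * 2 + ["documented"] * 2
auditor    = ["documented"] * 4 + ["missing"] * 4 + ["documented"] * 2
print(round(cohens_kappa(abstractor, auditor), 3))  # 0.545
print(round(bias_index(abstractor, auditor), 3))    # 0.2
```

Here raw agreement is 80%, but kappa is only moderate once chance agreement is removed, and the bias index of 0.2 signals that the abstractor records "documented" systematically more often than the auditor.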

Inter-rater reliability and acceptance of the structured diagnostic interview for regulatory problems in infancy

The present findings indicate that the Baby-DIPS is a reliable and acceptable structured diagnostic interview for the assessment of RPs in infancy. Overall, inter-rater reliability was good to excellent for current and lifetime RPs. Importantly, a high inter-rater agreement was also found for the absence of RPs. Similarly, a strong agreement between the raters on the severity ratings of assessed RPs was found. It should be mentioned that inter-rater reliability was not assessed for feeding difficulties due to a low base rate (see Table 3). These findings cannot be compared to other interviews for RPs in infancy because the Baby-DIPS is the first structured diagnostic interview specifically for RPs adaptable to the first year of life. The Baby-DIPS showed similar levels of inter-rater agreement to the parent version of the Kinder-DIPS [37], which has good inter-rater agreement on lifetime major diagnostic categories (k = 0.94–0.97).

An instrument for quality assurance in work capacity evaluation: development, evaluation, and inter-rater reliability

In this study, 20 anonymous experts' reports, detailing the work capacity evaluation of disability pension claimants, were simultaneously assessed by all peers to determine inter-rater reliability and individual differences in peer judgments. In addition to these 20 reports, 240 experts' reports were evaluated by two peers each to characterize the range of different reliability coefficients. The results of this analysis are published elsewhere [38]. The reports were randomly selected and addressed medical problems from the three major medical indications: surgery/orthopaedics, internal medicine/general practice, and neurology/psychiatry. The reports must have been drawn up within the last 12 months. Further, the claimant should not have received medical rehabilitation in the year before the work capacity evaluation. Reports differ in length depending on the individual case and major indication. The evaluation included medical experts' reports from employed physicians as well as from external experts, who were required to comply with the published guidelines for writing reports [39].

Inter-rater reliability of the Functional Movement Screen (FMS) amongst NHS physiotherapists

A recent study conducted by Palmer, Cuff and Lindley (2017) questioned the intra-rater reliability of the FMS amongst NHS clinicians, both specialist musculoskeletal (MSK) and rotational physiotherapists [20]. Furthermore, the inter-rater reliability is unclear, with reports ranging from fair to excellent [16, 21], and has not been determined amongst UK public sector clinicians [20]. Currently, the inter-rater reliability of the FMS has not been determined amongst clinicians working specifically within the UK public health sector, limiting the generalisability of previous studies, in which raters have been athletic trainers, private physiotherapists or students. Additionally, only one study has investigated the reliability of the FMS amongst novice raters [22]; however, its 20-hour intensive rater training and use of only physiotherapy students limit application to clinical practice. The samples used in these studies to date are grossly homogeneous, utilising healthy physiotherapy students [19, 21], military recruits [22], or high-level athletes [16, 23], further limiting application to clinical practice and potentially affecting the reliability coefficients produced [18].

Inter-rater Reliability of the McKenzie System of Mechanical Diagnosis and Therapy in the Examination of the Knee

One system that has not been thoroughly tested for use with musculoskeletal pain in the extremity is the McKenzie System of Mechanical Diagnosis and Therapy (MDT). The MDT system of classification uses a non-pathoanatomically specific approach to classify patients based on their response to repeated end-range loading strategies. Although it has demonstrated good inter-rater reliability in the assessment of musculoskeletal spinal pain (Clare, Adams & Maher, 2005; Kilpikoski, Airaksinen, Kankaanpaa, Leminen, Videman & Alen, 2002; Razmjou, Kramer & Yamada, 2000), shoulder pain (Heider Abady, Rosedale, Overend, Chesworth & Rotondi, 2014) and the extremities (Kelly, May, & Ross, 2008; May & Ross, 2009), MDT has not been evaluated for use in the assessment of musculoskeletal knee pain.

Computer-aided surface estimation of pain drawings – intra- and inter-rater reliability

…2400 areas, and as a whole, the number of areas measured varied by only 3%. The intra-rater reliability was high, with intraclass correlation coefficients of 0.992 for Examiner A and 0.998 for Examiner B. The intra-individual absolute differences were small within patients, both within one examiner and between the two examiners. The inter-rater reliability was also high. Still, significant differences in the absolute mean areas (13%) were seen between the two examiners in the second to fourth measurement sessions, indicating that one of the examiners systematically measured less. The measurement error was ≤ 10%, indicating that use of the program would be advantageous both in clinical practice and in research, although repeated measurements should preferably be made by the same examiner. Since pain drawings with this method are digitized, high-quality data can be stored in electronic medical records without loss of information for later analysis, regarding both the precise location and the size of the pain area. We conclude that the computer program Quantify One is a reliable method to calculate the areas of pain drawings.
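
For context, intraclass correlation coefficients like those quoted here are commonly computed under a two-way random-effects, absolute-agreement, single-measures model, often written ICC(2,1). A minimal pure-Python sketch for a complete subjects × raters table follows; the area measurements are invented for illustration, not the study's data.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single
    measures. `data` is a complete table: one row per subject, one
    column per rater."""
    n, k = len(data), len(data[0])
    grand = sum(x for row in data for x in row) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ms_rows = ss_rows / (n - 1)            # between-subjects variance
    ms_cols = ss_cols / (k - 1)            # between-raters variance
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# a constant offset between examiners lowers absolute agreement
areas = [[1, 2], [2, 3], [3, 4], [4, 5]]  # hypothetical area measurements
print(round(icc_2_1(areas), 3))  # 0.769
```

The demo shows why an absolute-agreement ICC is the right choice when one examiner "measures systematically less": the raters rank subjects identically, yet the constant offset pulls the coefficient below 1.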

Intra-rater and inter-rater reliability of ultrasonographic measurements of acromion-greater tuberosity distance in patients with post-stroke hemiplegia

…shoulders and 94 shoulders, respectively. To our knowledge, this is the first report of intra-rater and inter-rater reliability of AGT distance measurements taken by a physiotherapist, following a short period of training, using portable ultrasound on patients with stroke older than 50 years. The excellent reliability of the measurements suggests that a physiotherapist with minimal training (4 hours) in diagnostic ultrasound is capable of undertaking reliable ultrasound measurements of AGT distance. These results are very encouraging for clinical applications, with the potential for immediate feedback on therapeutic choices.

Development of the Finnish neurological function testing battery for dogs and its intra- and inter-rater reliability

The demand for efficient, safe and evidence-based physiotherapy strategies is increasing in veterinary medicine as well, creating a need for sensitive validity and reliability testing instruments to assess the recovery from and effects of different interventions [13]. To the authors' knowledge, there are no functional testing batteries that evaluate the overall motor function of dogs with neurological disease. Overall motor function comprises functional everyday tasks like sitting and standing, transitions from lying or sitting to standing and from standing to walking, and ambulation at different speeds. No testing battery so far includes voluntary motor functions progressing towards more advanced locomotion and activities of daily living while also being convenient to use in both clinical practice and research. Objective outcome measures should be validated and evaluated for internal consistency and intra- and inter-rater reliability [12, 14]. Face validity considers whether users or experts agree that the instrument is measuring what it is intended to measure, and content validity is the degree to which all tasks in the measure assess the same domain of interest [12]. Internal consistency means that all tasks in the instrument measure the same attribute [12, 14]. Intra-rater reliability is the degree to which scores on the instrument obtained by one trained observer agree with the scores obtained when the same observer administers the measure on another occasion [12]. Inter-rater reliability is the degree to which scores on the instrument obtained by one trained observer agree with the scores obtained by another trained observer [12, 14].

Inter-rater reliability of nursing home quality indicators in the U.S.

Prior to and throughout the course of its implementation, the MDS was repeatedly tested for inter-rater reliability among trained nurse assessors in nursing homes, large and small, for-profit and voluntary, throughout the country. Results of these tests revealed adequate levels of reliability when the MDS was first implemented nationally in late 1990.[16] A modified version of the MDS was designed and retested in 1995 and was found to have improved reliability in those areas with less than adequate reliability while sustaining reasonably high reliability in other areas.[17-19] While testing under research conditions revealed adequate reliability, other studies found comparisons of research assessments with those in the facility chart to be less positive. One study of 30 facilities found discrepancies in 67% of the items compared across residents and facilities, but often the "errors" were miscodings into adjacent categories and the bias was not systematic (neither "up-coding" to exaggerate nor "down-coding" to minimize the condition). Indeed, when reliability was assessed using the weighted kappa statistic, the authors found that many items with poor absolute agreement rates did achieve adequate reliability.[20] The Office of the Inspector General undertook an audit of several facilities in 8 different states and also identified discrepancies between data in the chart and residents' conditions on independent assessment.[21] Analysis of the observed discrepancies did not differentiate between those within one category and those that differed by more than one category on an ordinal scale, suggesting that had a weighted kappa statistic been used, the results would have been more comparable with those reported by Morris and his colleagues.
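
The point about adjacent-category miscoding can be made concrete: a linearly weighted kappa credits near-misses on an ordinal scale, so an item with modest absolute agreement can still reach adequate reliability. A minimal sketch with invented ratings (not MDS data):

```python
def weighted_kappa(a, b, categories):
    """Linearly weighted Cohen's kappa for ordinal ratings: a one-step
    (adjacent-category) disagreement costs less than a distant one."""
    idx = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(a)
    # agreement weight: 1 on the diagonal, shrinking with distance
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed = sum(w[idx[x]][idx[y]] for x, y in zip(a, b)) / n
    pa = [a.count(c) / n for c in categories]
    pb = [b.count(c) / n for c in categories]
    expected = sum(w[i][j] * pa[i] * pb[j]
                   for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)

# hypothetical ordinal item: both disagreements fall in adjacent categories
research = [1, 2, 2, 3, 3, 4, 1, 2, 3, 4]
chart    = [1, 2, 3, 3, 4, 4, 1, 2, 3, 4]
print(round(weighted_kappa(research, chart, [1, 2, 3, 4]), 3))  # 0.833
```

Raw agreement here is only 80%, yet the weighted kappa reaches 0.83 because every disagreement is one category off, which is exactly the pattern the excerpt describes.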

Inter-rater reliability of the EPUAP pressure ulcer classification system using photographs

This means that selected photographs can be assessed in a reliable way based on the EPUAP classification. Account should be taken of the fact that the raters were experts with years of experience in the observation and classification of pressure ulcers who, insofar as this was necessary, received prior information on the classification. It is to be expected that carers with less experience will score less reliably. This seems to be indicated by a small study on distinguishing between different forms of redness of the skin by untrained nurses; the inter-rater reliability in that study was low (Pel-Littel, 2003). Further research is required as to whether …
Table 2: Classification of the photographs based on the assessments by the nine EPUAP trustees

Inter-rater reliability of the QuIS as an assessment of the quality of staff-inpatient interactions

Researchers using the QuIS to evaluate the quality of staff-inpatient interactions should check its suitability in new settings and (possibly as part of staff training) its inter-rater reliability. In practice such studies are likely to follow a protocol similar to that adopted by McClean et al.: involving the multiple observers to be employed in a subsequent main study, over a variety of wards similar to those planned for the main study, and preferably taking place at different times of day. We recommend that inter-rater reliability be estimated using our A4 weighting scheme and a random-effects meta-analytic approach to combining estimates over observation periods.
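
A random-effects combination of per-period reliability estimates can be sketched with the DerSimonian-Laird estimator, a common choice for this kind of pooling; the A4 weighting scheme itself is specific to the QuIS paper and is not reproduced here. All estimates and variances below are invented.

```python
def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling: estimate the
    between-period variance tau^2 from Cochran's Q, then combine
    the estimates with weights 1/(v_i + tau^2)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2

# hypothetical per-observation-period reliability estimates and variances
pooled, tau2 = dersimonian_laird([0.5, 0.7], [0.01, 0.01])
print(round(pooled, 3), round(tau2, 3))  # 0.6 0.01
```

When the per-period estimates disagree more than their sampling variances explain, tau-squared becomes positive and the pooled estimate's weights flatten, which is the behaviour a random-effects approach is chosen for.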

Measuring the morphological characteristics of thoracolumbar fascia in ultrasound images: an inter-rater reliability study

Methods: An exploratory analysis was performed using a fully crossed design of inter-rater reliability. Thirty observers were recruited, consisting of 21 medical doctors, 7 physiotherapists and 2 radiologists, with an average of 13.03 ± 9.6 years of clinical experience. All 30 observers independently rated the architectural disorganisation of the thoracolumbar fascia in 30 ultrasound scans, on a Likert-type scale with rankings from 1 = very disorganised to 10 = very organised. Internal consistency was assessed using Cronbach's alpha. Krippendorff's alpha was used to calculate the overall inter-rater reliability.
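
Krippendorff's alpha extends chance-corrected agreement to many raters: it is one minus the ratio of within-unit pairwise disagreement to the disagreement expected when all values are pooled. The simplified sketch below assumes a complete units × raters table and an interval-level difference function; the study's data were ordinal Likert ratings, and the full estimator also handles missing values. Ratings are invented.

```python
from itertools import combinations

def krippendorff_alpha_interval(units):
    """Krippendorff's alpha for a complete units x raters table with
    an interval metric, delta(x, y) = (x - y)^2. Observed disagreement
    averages pairs within each unit; expected disagreement averages
    pairs over all values pooled together."""
    delta = lambda x, y: (x - y) ** 2
    within = [delta(x, y) for unit in units
              for x, y in combinations(unit, 2)]
    pooled = [v for unit in units for v in unit]
    overall = [delta(x, y) for x, y in combinations(pooled, 2)]
    d_obs = sum(within) / len(within)
    d_exp = sum(overall) / len(overall)
    return 1 - d_obs / d_exp

# hypothetical organisation ratings: 4 scans, 2 raters each
scans = [[1, 2], [2, 2], [3, 3], [4, 4]]
print(round(krippendorff_alpha_interval(scans), 3))  # 0.889
```

A single one-point disagreement out of four scans still leaves alpha high, because the squared-difference metric compares it against the much larger spread of the pooled ratings.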

Inter-rater reliability of three standardized functional tests in patients with low back pain

Clinicians use a battery of different clinical tests to investigate muscular coordination of the lumbar spine [16-19]. These tests must be reliable and valid. Clinical tests such as pain provocation tests [19], segmental mobility tests and tests of functional muscular coordination [5,17-21] have been reported in the literature to classify patients with a lumbar instability diagnosis. In primary care, patients with non-specific low back pain, with or without associated radiating pain, are frequently encountered. Empirically, many of these patients have impaired function of the proximal lumbar muscles. To our knowledge, there are few standardized and evaluated functional tests examining functional muscular coordination of the lumbar spine which can be used clinically. The aim of the present study was therefore to standardize and examine the inter-rater reliability of three functional tests of muscular coordination of the lumbar spine in patients with low back pain (LBP).

A quantitative definition of scaphoid union: determining the inter-rater reliability of two techniques

Conclusions: This study describes two methods of quantifying and defining scaphoid union, both with high inter-rater reliability. This indicates that either method can be used reliably, making it an important tool both for clinical use and for research purposes in future studies of scaphoid fractures, particularly those using union or time to union as their endpoint.

Inter-rater reliability of the Foot Posture Index (FPI-6) in the assessment of the paediatric foot

The reliability of the FPI-6 has been tested in adults with excellent intra-rater results (ICC 0.92-0.93) but moderate inter-rater results (0.52-0.65) [7]. Two studies investigating the reliability of the index in a paediatric population have been identified, one of which evaluated the reliability of the older version of the index (FPI-8) [5]. This study looked at a number of measures of foot position in addition to the FPI-8, and following reliability analysis, ICC values of 0.80 for children and 0.91 for adolescents were presented. More recently, Cain et al. [8] investigated the intra-rater and inter-rater reliability of the refined FPI-6 on ten adolescents. This study reported excellent intra-rater reliability (ICC values ranged from 0.81 to 0.92) and good inter-rater reliability (ICC 0.69). However, consideration of the nature of the data generated by the FPI-6 suggests that analysis using ICCs would be incorrect for the present study unless logit-transformed scores are used. This is the process of changing raw FPI-6 scores into a data form suitable for parametric analysis, but large data sets are required for this [9]. Without transformation the index produces categorical data, and therefore raw scores should be analysed using kappa scores, particularly when the data are not normally distributed [10]. In clinical practice it is common for patient care to be shared amongst a team of clinicians; it is therefore vital that any tool used in the assessment of the child is repeatable between clinicians. There is limited evidence on the reliability of traditional measures of foot posture in children; however, initial research suggests that the FPI-6 is a reliable tool when used in the assessment of the child's foot. This study aims to investigate the inter-rater reliability of the FPI-6 when used by two experienced observers in the assessment of the paediatric foot.

Inter-rater reliability of the evaluation of muscular chains associated with posture alterations in scoliosis

Methods: Design: inter-rater reliability study. Fifty physical therapists (PTs) and two experts trained in GPR assessed the standing posture from photographs of five youths with idiopathic scoliosis using a posture analysis grid with 23 posture indices (PI). The PTs and experts indicated the muscular chain associated with posture alterations. The PTs were also divided into three groups according to their experience in GPR. The experts' results (after consensus) were used to verify agreement between PTs and experts for muscular chain and posture assessments. We used kappa coefficients (K) and the percentage of agreement (%A) to assess inter-rater reliability, and intra-class correlation coefficients (ICC) to determine agreement between PTs and experts.

Evaluating inter-rater reliability of indicators to assess performance of medicines management in health facilities in Uganda

Background: To build capacity in medicines management, the Uganda Ministry of Health introduced a nationwide supervision, performance assessment and recognition strategy (SPARS) in 2012. Medicines management supervisors (MMS) assess performance using 25 indicators to identify problems, focus supervision, and monitor improvement in medicines stock and storage management, ordering and reporting, and prescribing and dispensing. Although the indicators are well-recognized and used internationally, little was known about their reliability. An initial assessment of inter-rater reliability (IRR), which measures agreement among raters (i.e., MMS), showed poor IRR; subsequently, we implemented efforts to improve IRR. The aim of this study was to assess IRR for SPARS indicators at two subsequent time points to determine whether IRR increased following efforts to improve reproducibility. Methods: IRR was assessed in 2011 and again, after efforts to improve IRR, in 2012 and 2013. Efforts included targeted training, providing detailed guidelines and job aids, and refining indicator definitions and response categories. In the assessments, teams of three MMS measured 24 SPARS indicators in 26 facilities. We calculated IRR as a team agreement score (i.e., the percent of MMS teams in which all three MMS had the same score). Two-sample tests for proportions were used to compare IRR scores for each indicator, domain, and overall between the initial assessment and the following two assessments. We also compared the IRR scores for indicators classified as simple (binary) versus complex (multi-component). Logistic regression was used to identify supervisor group characteristics associated with domain-specific and overall IRR scores.
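
The team agreement score described here (the fraction of three-member MMS teams whose members all give the same score), together with a two-sample test for proportions, can be sketched as follows; the counts are invented, not SPARS results.

```python
from math import erf, sqrt

def team_agreement(team_scores):
    """Fraction of teams in which every rater gave an identical score."""
    agree = sum(len(set(scores)) == 1 for scores in team_scores)
    return agree / len(team_scores)

def two_proportion_z(p1, n1, p2, n2):
    """Two-sample z test for a difference in proportions using the
    pooled standard error; returns z and a two-sided p-value."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# hypothetical indicator scores from 4 teams of 3 MMS each
teams = [[3, 3, 3], [2, 2, 1], [4, 4, 4], [1, 1, 1]]
print(team_agreement(teams))  # 0.75

# did IRR improve from 50% to 80% agreement across 26 facilities?
z, p = two_proportion_z(0.80, 26, 0.50, 26)
print(round(z, 2))  # 2.27
```

Note that with only 26 facilities per round, even a 30-percentage-point improvement sits near the edge of conventional significance, which is one reason the study compares pooled domain and overall scores as well as individual indicators.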

Test-Retest and Inter-Rater Reliability Study of the Schedule for Oral-Motor Assessment in Persian Children

Skuse et al. (1995) introduced methods of validation showing that the test-retest and inter-rater reliability of the SOMA was excellent [18], which is not entirely consistent with our findings. It is noteworthy that some items of the SOMA are easier to analyse than others. For example, for items concerned with the oral-motor functions of the child rather than with discrete oral-motor behaviors, obtaining good agreement among raters may be hard. In practical terms, when SLPs were asked to judge an oral-motor functional unit such as the "biting" skill as a whole, they were not in total agreement, but when asked whether there was controlled, stable biting, they might well agree [12].

Inter-rater reliability of kinesthetic measurements with the KINARM robotic exoskeleton

A major issue in the field of neurorehabilitation is that there is generally a poor understanding of the characteristics and recovery of proprioceptive deficits after stroke. These deficits are often difficult to detect and measure clinically, and can even be mistaken for motor deficits [29]. Currently, there is no gold standard for evaluating proprioception or its sub-modalities (position sense, kinesthesia) [1] after stroke. Further, it is thought that many of the current clinical measures of sensory impairment are not sensitive enough to detect clinically meaningful changes in proprioceptive function over time [30, 31], necessitating the development of new, more sensitive measurement tools (e.g., robotics). Clinical tests, such as the Thumb Localizer Test, assess position sense directly, and subcomponents of other clinical tests (Rivermead Assessment of Somatosensory Performance, Nottingham Sensory Assessment, Fugl-Meyer) evaluate elements of proprioception. The inter-rater reliability of many of these measures has been reported as anywhere from poor to excellent [13, 32, 33]. Oftentimes, authors will limit an evaluation scale (0–2 vs 0–10), which generally leads to better reliability. However, this is clinically problematic because it typically produces a concomitant decrease in sensitivity to detect change.
