Similar literature
Found 20 similar articles (search time: 31 ms)
1.
This study evaluated the validity and reliability of a new coding system, the Family Therapist Behavior Scale (FTBS), that was designed to identify and study clinically relevant verbal behaviors of short-term, problem-oriented family therapists. Validity was assessed by testing the scale's ability to discriminate significant, predicted differences between the in-therapy behaviors of eight beginning family therapists conducting observed interviews and eight advanced family therapists conducting supervisory interviews. All of the sessions, which were initial interviews, were videotaped. Two coders rated three five-minute samples from each of the 16 tapes with the FTBS. The validity results supported over 50 percent of the 16 research hypotheses. The reliability analysis, based on the actual study data, indicated that the interrater reliability of the 19-category FTBS differed from chance at less than the .001 level of significance. The implications of these findings are examined and future research directions are identified.

2.
This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. These methods can be used by applied researchers to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. While estimates of interrater reliability are frequently used for these purposes, indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared to four methods of estimating interrater reliability (Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation). Subordinate and superior ratings of the performance of 100 managers were used in these analyses. The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high; however, interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone. This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August, 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.
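
The distinction this abstract draws between interrater reliability and interrater agreement can be illustrated with a minimal sketch. It does not reproduce the paper's analysis: the ratings are invented toy data, `pearson_r` and `agreement` are hypothetical helper names, and the agreement measure is a plain proportion of matching ratings, not the T index of Tinsley & Weiss.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two raters' scores (a reliability-style index)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def agreement(x, y, tolerance=0):
    """Proportion of ratees whose two ratings differ by at most `tolerance` points."""
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


# Toy case 1 (invented data): rater B is always exactly 2 points higher than rater A.
# The rank order is identical, so correlation is perfect, yet the raters never agree.
a1, b1 = [1, 2, 3, 4, 5], [3, 4, 5, 6, 7]
r1, agree1 = pearson_r(a1, b1), agreement(a1, b1)

# Toy case 2 (invented data): ratings confined to a narrow range.
# The raters are never more than 1 point apart, yet the correlation is zero.
a2, b2 = [4, 4, 5, 5], [4, 5, 4, 5]
r2, agree2 = pearson_r(a2, b2), agreement(a2, b2, tolerance=1)

print(f"case 1: r = {r1:.2f}, exact agreement = {agree1:.2f}")
print(f"case 2: r = {r2:.2f}, agreement within 1 point = {agree2:.2f}")
```

The second toy case is in the same direction as the subordinate-versus-superior pattern the abstract reports, where agreement was moderately high while reliability was very low.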

3.
Cooke DJ, Hart SD, Michie C. Psychological Assessment, 2004, 16(3): 335-339
Cross-national differences in the prevalence of psychopathy have been reported. This study examined whether rater effects could account for these differences. Psychopathy was assessed with the Psychopathy Checklist-Revised (PCL-R; R. D. Hare, 1991). Videotapes of 6 Scottish prisoners and 6 Canadian prisoners were rated by 10 Scottish and 10 Canadian raters. No significant main or interaction effects involving the nationality of raters were detected at the level of full scores or factor scores. Using a generalizability theory approach, it was demonstrated that the interrater reliability of total scores was good, that is, the proportion of variance in test scores attributable to raters was small. The interrater reliability of factor scores was lower, typically falling in the fair range. Overall, the results suggest that the reported cross-national differences are more likely to be in the expression of the disorder rather than in the eye of the beholder.

4.
Previous research on measurement error in job performance ratings estimated reliability using three coefficients: alpha, test–retest, and interrater correlation. None of these three coefficients controls for the four main sources of error in performance ratings. For this reason, the coefficient of equivalence and stability (CES) has been suggested as the ideal estimate of reliability. This article presents the estimates of CES for time intervals of 1, 2, and 3 years. The values obtained for a single rater were .51, .48, and .44, respectively. For two raters, the values were .59, .55, and .51. The findings suggest that previous reliability estimates based on alpha, test–retest, and interrater coefficients overestimated the reliability of job performance ratings. In the present study, the interrater coefficient overestimates reliability by 13.6–25.4% for time intervals of 1–3 years, as it does not control for transient error. Results also showed that the importance of transient error increases as the length of the interval between the measures increases. Based on the results, it is suggested that corrected validities based on interrater reliability underestimate the magnitude of the validity. The implications of these findings for future efforts to estimate criterion reliability and predictor validity are discussed.

5.
The present study aimed to test the reliability and validity of the Person Centred and Experiential Psychotherapy Scale–Young Person version (PCEPS-YP). This is a newly developed and adapted 9-item scale which aims to measure counsellor competences in, and adherence to, person-centred practice, when working with adolescents. Counselling practice was assessed for 19 counsellors by randomly selecting 20-min audio segments from 142 recorded counselling sessions. Audio material was independently rated by eight raters using the PCEPS-YP to produce an average adherence rating per counsellor. Scale reliability was assessed via interrater reliability and internal consistency testing. Convergent validity was tested using ratings from the observer-rated Barrett-Lennard Relationship Inventory (BLRI Obs 40), and the scale was subjected to exploratory factor analysis. Results showed a high degree of internal consistency within raters (α = 0.95), marginally acceptable reliability across grouped raters (α = 0.58) and weaker reliability between pairs of raters (α = 0.50). Exploratory factor analysis revealed one strong factor for the scale with no subscales. Small-to-moderate correlations existed between the PCEPS-YP and the BLRI subscales and mean total score (rs = .12 to .40). Our findings suggest that the PCEPS-YP has potential as an effective, reliable and valid tool for assessing competence and adherence in person-centred practice with young people, both for research and for clinical purposes. However, training procedures need to be established that can enhance interrater reliability, and more evidence of convergent validity is needed.

6.
The present study examines the predictive validity of dynamic risk factors for the prediction of sexual recidivism in a sample of pedosexual offenders (N = 135) released from the Austrian prison system between 2002 and 2005. Static-99 was used to rate static risk factors, and Stable-2000 and Stable-2007 were applied to measure dynamic risk factors. Results on interrater reliability are presented along with results on the predictive and incremental validity of the dynamic risk assessment. After a mean follow-up period of 5½ years, Static-99, Stable-2000 and Stable-2007 showed excellent interrater reliability and good predictive validity for the prediction of sexual recidivism. Furthermore, Stable-2007 showed better predictive accuracy than its predecessor and added incremental predictive validity beyond Static-99.

7.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.

8.
The assessment of multiliterate handwriting performance is rarely reported despite increased globalization. The present study describes the psychometric properties of a handwriting speed test developed for children who are biliterate in English and Chinese. This included interrater reliability, test-retest reliability, interitem correlation, construct validity, and concurrent validity. The test's reliabilities between two raters and over a 1-wk. interval were high with ICCs ranging from .89 to .99. Interitem correlation between the English and Chinese items was .87. The presence of age trends but not sex differences was a positive indicator of the test's validity. Correlations of .91 and 1.00 between the Chinese and the English items of the Handwriting Assessment Tool with the Chinese Handwriting Speed Test and Handwriting Speed Test, respectively, provided evidence of concurrent validity. These preliminary results showed the Handwriting Assessment Tool is reliable and is a potentially useful handwriting test for children biliterate in English and Chinese. The feasibility of assessing biliterate handwriting speed performance with the same set of scoring criteria for different writing systems was supported.

9.
Although a demand analysis is helpful for identifying potential establishing operations for the functional analysis (FA) demand condition, it may not always be practical due to time constraints. A potential alternative is the Negative Reinforcement Rating Scale (NRRS), an indirect assessment tool that may serve as a time efficient alternative to a demand analysis. The experimenter assessed the reliability and validity of the NRRS for 5 individuals with autism spectrum disorder who exhibited problem behavior. Multiple types of interrater reliability were assessed across 2 informants, and NRRS outcomes were compared to a subsequent demand analysis and FA to assess its validity. Reliability was high (M = 84%) for NRRS numerical ratings of categories but low (M = 32.9%) for specific examples provided. NRRS-identified highly aversive tasks yielded better correspondence with demand analysis outcomes than did NRRS-identified less aversive tasks.

10.
The Functional Assessment Rating Scale was developed as a measure of psychiatric symptomatology and psychosocial impairments. This study was designed to report estimates of reliability and validity with a population of schizophrenic patients. The scale showed very good interrater agreement, test-retest reliability, construct validity, and concurrent validity, so the scale seems a useful measure of psychopathology which may be used to assess and monitor patients displaying severe mental illnesses.

11.
The Self-inflicted Injury Severity Form (SIISF) was developed as an epidemiological research tool for identifying individuals in hospital emergency departments who have life-threatening self-inflicted injuries. Data were collected from 715 patients with self-inflicted injuries in two large hospitals. In 295 of these cases, a second set of data was independently collected for assessment of interrater reliability. Validity was assessed by comparing the SIISF results with simultaneously collected Risk-Rescue Ratings. Assessment of interrater reliability found that only 2.4% of physicians disagreed on the suicide method used. The kappa statistic for method used was .94, indicating excellent agreement. The SIISF was found to distinguish between severe and less severe injuries. Thus, it appears to provide a simple method to distinguish patients who have life-threatening self-inflicted injuries.
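
For readers unfamiliar with the kappa statistic this abstract reports, here is a minimal sketch of Cohen's kappa for two raters assigning categorical codes. The category labels and ratings are invented for illustration and are not SIISF data; `cohens_kappa` is a hypothetical helper name, and the abstract does not state which kappa variant the authors used.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected is
    the agreement expected if the raters coded independently with their own
    marginal category frequencies. Undefined when p_expected equals 1.
    """
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Invented example: two raters code the method for 10 hypothetical cases
# and disagree on exactly one of them (the sixth case).
codes_a = ["cut", "cut", "overdose", "overdose", "overdose",
           "other", "cut", "overdose", "other", "overdose"]
codes_b = ["cut", "cut", "overdose", "overdose", "overdose",
           "overdose", "cut", "overdose", "other", "overdose"]

kappa = cohens_kappa(codes_a, codes_b)
print(f"kappa = {kappa:.2f}")
```

Raw agreement in this toy example is 90%, but kappa is lower because some of that agreement would be expected by chance given how often each rater uses each category.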

12.
This study evaluated the reliability and validity of the Cleveland Scale for Activities of Daily Living (CSADL), a scale designed to measure in detail specific activities of daily living in individuals with dementia. Administered to knowledgeable informants by trained examiners, the CSADL demonstrated good reliability in terms of interrater agreement and internal consistency. The validity of CSADL total scores was shown by its sensitivity to degree of cognitive impairment: All comparisons between means of the healthy elderly group and three groups of AD patients differing in severity were statistically significant. The CSADL was highly correlated with the Blessed–Roth Dementia Scale (DS-ADL) and more highly correlated with Mini-Mental State Exam scores than was the DS-ADL.

13.
This article reports the validation of the Adolescent Psychotherapy Q-set (APQ), a newly developed instrument, adapted from the well-established Psychotherapy Q-Set (PQS) and the Child Psychotherapy Q-set (CPQ). The APQ aims to describe the psychotherapy process in the treatment of adolescents in a form suitable for quantitative comparison and analysis. The validation was conducted with the ratings of 70 audio-recorded youth psychotherapy sessions from a range of therapists, patients, and treatment stages, using two therapeutic approaches (Short-Term Psychoanalytic Psychotherapy and Cognitive Behavioral Therapy). Data analysis included intraclass correlation coefficients, Q-factor analysis, nonparametric mean differences, and Pearson correlations. Results suggest that the APQ has good levels of interrater reliability, is able to identify differences and similarities between the two therapeutic approaches, and shows good convergent and discriminant validity with a widely used measure of therapist behaviors (the Comparative Psychotherapy Process Scale). The APQ showed good levels of validity and reliability. It is hoped that it will contribute to new ways of investigating the mechanisms of therapeutic change for those working with adolescents.

14.
Based on the sentence completions of a psychiatric patient, ratings were assigned on a number of personality dimensions by six clinical psychologists and 10 graduate students. Independent ratings by a psychologist and a psychiatrist who had interviewed the patient and had access to all the clinical background material, including case history, autobiography, and other tests, served as the criterion. A high degree of validity and interrater reliability was obtained by both clinicians and students with insignificant differences between them. Confidence in judgment was associated with extreme ratings but not with higher validity.

15.
Human Behavior, 2013, 26(1): 71-88
Cross-job retraining is becoming a viable option for coping with increasingly rapid technological changes in the workplace. In this study, we used data collected from 836 supervisors in 43 U.S. Air Force enlisted jobs to compare global versus decomposed estimates of cross-job retraining time in terms of interrater reliability and convergent validity. Convergent validities of retraining time estimates were assessed in terms of their correlations with each other and with two additional determinants of retraining ease: learning difficulty of the new job, and old-versus-new job differences in aptitude requirements. In general, the reliabilities for the global and decomposed judgments were comparable. Additional correlational results supported the convergent validities of both the global and decomposed retraining time estimates.

16.
The authors describe the development of the Suicide Attempt Self-Injury Interview (SASII), an instrument designed to assess the factors involved in nonfatal suicide attempts and intentional self-injury. Using 4 cohorts of participants, the authors generated SASII items and evaluated them with factor and content analyses and internal consistency statistics. The final measure was assessed for reliability and validity with collateral measures. The SASII assesses variables related to method, lethality and impulsivity of the act, likelihood of rescue, suicide intent or ambivalence and other motivations, consequences, and habitual self-injury. The SASII was found to have very good interrater reliability and adequate validity.

17.
The purpose of this study was to assess the psychometric properties of the Zanarini Rating Scale for Borderline Personality Disorder (ZAN-BPD), the first clinician-administered scale for the assessment of change in DSM-IV borderline psychopathology. The questions for the measure were adapted from the BPD module of the Diagnostic Interview for DSM-IV Personality Disorders (DIPD-IV) to reflect a 1-week time frame and each of the nine criteria for BPD is rated on a five-point anchored rating scale of 0 to 4, yielding a total score of 0 to 36. Two diagnostic interviews that assess the presence of BPD were administered to 200 nonpsychotic patients: the BPD module of the DIPD-IV and the Revised Diagnostic Interview for Borderlines (DIB-R). The ZAN-BPD was also administered, blind to diagnostic information. In addition, each patient filled out a self-report measure of general psychopathology that is often used in borderline treatment studies, the Symptom Checklist 90 (SCL-90). The convergent validity of the ZAN-BPD and relevant scales of the SCL-90 and the DIB-R was assessed and found to be highly significant. The discriminant validity of the various scores of the ZAN-BPD was also found to be highly significant, easily discriminating the 139 patients who met the DSM-IV criteria for BPD from the 61 patients who did not. In addition, internal consistency of the ZAN-BPD was found to be high (Cronbach's alpha = 0.85). The interrater reliability of the ZAN-BPD was assessed using 32 conjoint interviews, while same day test-retest reliability was assessed in a separate sample of 40 patients. All reliability raters were blind to all previously collected information concerning each subject. All intraclass correlations were in the good to excellent range. Finally, the sensitivity of the ZAN-BPD to change was assessed using a third sample of 41 patients who were reinterviewed by a blind rater 7 to 10 days after the ZAN-BPD was first administered. The SCL-90 was also readministered at this time. The correlations between difference scores of the ZAN-BPD and difference scores of the SCL-90 were found to be significant, indicating that the ZAN-BPD measures change in a clinically meaningful manner. Taken together, the results of this study suggest that the ZAN-BPD is a promising clinician-administered scale for the assessment of change in borderline psychopathology over time.

18.
To address the lack of a simple and standardized instrument to assess overall illness severity of Tourette's disorder (TD), the authors developed and tested a 15-item scale to measure a broad range of common symptoms including tics, inattention, hyperactivity, obsessions, compulsions, aggression, and emotional symptoms. Independent investigators used the 15-item Tourette's Disorder Scale (TODS) to assess 60 TD patients who were taking part in a double-blind placebo-controlled multicenter 8-week treatment study. Interrater reliability, internal consistency, convergent and discriminant validity, and sensitivity to change were examined. The TODS was associated with good interrater reliability, excellent internal consistency, and favorable levels of validity and sensitivity to change. Individual TODS items showed good convergent and discriminant validity against other measures. The TODS is a simple, efficient way for clinicians and parents to rate the severity of multiple symptoms commonly found in patients with Tourette's disorder.

19.
20.
The purpose of this study was to determine the initial reliability and validity of a screening instrument developed to detect problematic interactions between infants and parents as part of a pediatric well-baby exam. Participants included 117 infant-mother dyads (57 preterms and 60 full terms) assessed when infants were 6 to 9 months old. Mothers and infants were observed playing an interactional game such as peek-a-boo during the course of the pediatric exam. The game was scored for degree of interactional reciprocity using the Pediatric Infant Parent Exam (PIPE). Acceptable levels of interrater reliability were achieved. As predicted, higher risk infants and their mothers exhibited more problematic interactions than lower risk infants and their mothers. Results indicated that the PIPE was a reliable means of screening for interactional difficulties that was sensitive to, but not synonymous with, neonatal health indices. © 2001 Michigan Association for Infant Mental Health.
