Similar literature
1.
This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. These methods can be used by applied researchers to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. While estimates of interrater reliability are frequently used for these purposes, indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared to four methods of estimating interrater reliability (Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation). Subordinate and superior ratings of the performance of 100 managers were used in these analyses. The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high; however, interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone. This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August, 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.

2.
The evidence-based services literature is continually growing, providing the field with rich and important sets of information regarding what works for treating different types of youth and families. Given this burgeoning of information, the PracticeWise Evidence-Based Services (PWEBS) Literature Database has been developed to aid in summarizing and delivering aggregated evidence-based treatment information to providers in the field. Meanwhile, the Child and Adolescent Needs and Strengths-Mental Health (CANS-MH) Scale is a youth mental health assessment tool that was developed by a separate team to assist with treatment planning. In the present study, we developed and tested a system for linking these two related ontological systems so that scientific knowledge can be more widely aggregated and made available to a wider set of audiences for enhanced mental health service delivery. Results revealed the following. First, a construct mapping comparison revealed that the CANS-MH and PWEBS ontologies share a strong core of overlapping content, particularly in the areas of Youth Behavioral/Emotional Needs, Youth Risk Behaviors, and Life Domain Functioning. Second, the CANS-MH areas could be used to reliably code the following components of published randomized treatment studies: (a) population sample characteristics (e.g., did the characteristics of the treatment study participant population relate to each CANS-MH area?), and (b) outcome measure targets (e.g., did the treatment study outcome measure target areas relate to each CANS-MH area?). The reliability achieved from this coding process supported the linkage between the CANS-MH areas and the PWEBS Literature Database information. Lastly, high agreement was achieved between an automated translation algorithm and the final ratings from the manual coding of published treatment studies using the CANS-MH scale.
The importance of such linkages for the communication of ideas, information, and evidence across differing subfields is discussed, as well as examples of achieving enhanced quality of mental health services by linking system ontologies.

3.
This study examines the clinical utility of behavior ratings made by nonclinician examiners during assessments of preschool children with Attention-Deficit/Hyperactivity Disorder (AD/HD). Matched samples of children with (n = 127) and without (n = 125) AD/HD were utilized to test the internal, convergent, concurrent, and incremental validity of ratings completed by examiners on the Hillside Behavior Rating Scale (HBRS). Results indicated that HBRS ratings were internally consistent, possessed sufficient interrater reliability, and were significantly associated with parent and teacher reports of AD/HD when controlling for age, gender, intelligence, and symptoms of other psychopathology. HBRS ratings also were significantly associated with other measures of functioning, and provided a significant increment in the prediction of impairment over parent and teacher report alone. These findings suggest that behavioral ratings during testing provide a unique source of clinical information that may be useful as a supplement to parent and teacher reports.

4.
While well-established attachment measures have been developed for infancy, early childhood, and adulthood, a "measurement gap" has been identified in middle childhood, where behavioral or representational measures are not yet sufficiently robust. This article documents the development of a new measure--the Child Attachment Interview (CAI)--which seeks to bridge this gap. The CAI is a semistructured interview, in which children are invited to describe their relationships with their primary caregivers. The coding system is informed by the Adult Attachment Interview and the Strange Situation Procedure, and produces 4 attachment categories along with a continuous measure of attachment security based on ratings of attachment-related dimensions. The main psychometric properties are presented, including interrater reliability, test-retest reliability, and concurrent and discriminant validities, both for normally developing children and for those referred for mental health treatment. The CAI correlates as expected with other attachment measures and predicts independently collected ratings of social functioning. The findings suggest that the CAI is a reliable, valid, and promising measure of child-parent attachment in middle childhood. Directions for improvements to the coding system are discussed.

5.
We evaluated the reliability and validity of the Dyadic Observed Communication Scale (DOCS) coding scheme, which was developed to capture a range of communication components between parents and adolescents. Adolescents and their caregivers were recruited from mental health facilities for participation in a large, multi-site family-based HIV prevention intervention study. Seventy-one dyads were randomly selected from the larger study sample and coded using the DOCS at baseline. Preliminary validity and reliability of the DOCS were examined using various methods, such as comparing results to self-report measures and examining interrater reliability. Results suggest that the DOCS is a reliable and valid measure of observed communication among parent-adolescent dyads that captures both verbal and nonverbal communication behaviors that are typical intervention targets. The DOCS is a viable coding scheme for use by researchers and clinicians examining parent-adolescent communication. Coders can be trained to reliably capture individual and dyadic components of communication for parents and adolescents, and this complex information can be obtained relatively quickly.

6.
This study estimated the validity and interrater reliability for the Devereux Child Behavior Rating Scale (DCB) when completed by classroom teachers. The behavior of 90 preadolescent males who were diagnosed as either normal, hyperactive, or emotionally disturbed was rated by two classroom teachers using the DCB. Interrater reliability estimates found between teachers' ratings were not significantly different from those reported with mental health professionals. A stepwise discriminant analysis was used to evaluate the utility of teachers' ratings in predicting group membership. Results indicated teachers' ratings on the DCB differentiated significantly between diagnostic categories.

7.
Early detection and treatment promote positive outcomes in mental health problems among infants. This study developed a simple and reliable screening inventory for infants’ mental health. Participants were 579 primary caregivers who had Japanese infants aged 2–6 years. Participants evaluated their children using the Mental Health Inventory for Infants (MHII; developed in this study), which contains 24 items. Exploratory factorial analysis identified the MHII factor structure; confirmatory factorial analysis examined its factorial validity. Internal consistency and criterion-related validity were also examined. Irritability (8 items), somatic symptoms (6 items), and signs of insecurity (4 items) were identified as factors in the MHII; each of these factors measures a critical aspect of infants’ mental health. The MHII’s internal consistency and scale homogeneity were acceptable; its criterion-related validity was supported. In this study, male infants exhibited greater irritability and less insecurity than females. The present results support the MHII’s reliability and validity; additionally, they indicate that caregivers may use the MHII to quickly screen for three critical aspects of infants’ mental health. We expect that the MHII will be used for early detection of mental health difficulties in infants to facilitate treatment.

8.
The BB-JuSt is a newly developed standardized instrument used in juvenile correctional settings to document the results of the initial assessment of treatment and educational needs of young offenders. It is made up of 23 items with 5-point rating scales (with the exception of the item "caring for a child") which refer to specific needs and responsivity factors relating to educational attainments and basic reading/writing and mathematics skills, alcohol/drug/gambling problems, criminogenic disposition (e.g., aggressiveness), psychological disorders, lifestyle and social environment (e.g., associates and family). To determine the interrater reliability of this instrument, 42 young offenders were classified by professional prison staff as usual, and additionally by 2 external researchers. Whereas excellent agreement between the external researchers was achieved on all items, the comparison between staff and researchers showed only moderate correlations. These results indicate that the BB-JuSt is a reliable instrument that can be used for treatment planning decisions and for research purposes, but extensive training is required for users.

9.
The test-retest reliability of the Spanish Diagnostic Interview Schedule for Children (DISC-IV) is presented. This version was developed in Puerto Rico in consultation with an international bilingual committee, sponsored by NIMH. The sample (N = 146) consisted of children recruited from outpatient mental health clinics and a drug residential treatment facility. Two different pairs of nonclinicians administered the DISC twice to the parent and child respondents. Results indicated fair to moderate agreement for parent reports on most diagnoses. Relatively similar agreement levels were observed for last month and last year time frames. Surprisingly, the inclusion of impairment as a criterion for diagnosis did not substantially change the pattern of results for specific disorders. Parents were more reliable when reporting on diagnoses of younger (4–10) than older children. Children 11–17 years old were reliable informants on disruptive and substance abuse/dependence disorders, but unreliable for anxiety and depressive disorders. Hence, parents were more reliable when reporting about anxiety and depressive disorders whereas children were more reliable than their parents when reporting about disruptive and substance disorders.

10.
The present study aimed to test the reliability and validity of the Person Centred and Experiential Psychotherapy Scale–Young Person version (PCEPS-YP). This is a newly developed and adapted 9-item scale which aims to measure counsellor competences in, and adherence to, person-centred practice, when working with adolescents. Counselling practice was assessed for 19 counsellors by randomly selecting 20-min audio segments from 142 recorded counselling sessions. Audio material was independently rated by eight raters using the PCEPS-YP to produce an average adherence rating per counsellor. Scale reliability was assessed via interrater reliability and internal consistency testing. Convergent validity was tested using ratings from the observer-rated Barrett-Lennard Relationship Inventory (BLRI Obs 40), and the scale was subjected to exploratory factor analysis. Results showed a high degree of internal consistency within raters (α = 0.95), marginally acceptable reliability across grouped raters (α = 0.58) and weaker reliability between pairs of raters (α = 0.50). Exploratory factor analysis revealed one strong factor for the scale with no subscales. Small-to-moderate correlations existed between the PCEPS-YP and the BLRI subscales and mean total score (rs = .12 to .40). Our findings suggest that the PCEPS-YP has potential as an effective, reliable and valid tool for assessing competence and adherence in person-centred practice with young people, both for research and for clinical purposes. However, training procedures need to be established that can enhance interrater reliability, and more evidence of convergent validity is needed.

11.
Efforts to determine the prevalence of serious emotional disturbance in preschool-aged children have been hampered by the lack of a validated measure. The Preschool and Early Childhood Functional Assessment Scale (PECFAS) is a multi-dimensional measure that assesses the psychosocial functioning of children aged 3–7 years. The concurrent validity and reliability of the PECFAS were assessed in a sample of 30 preschool-aged children in a large Head Start program in Ventura, California. PECFAS ratings based on in-depth interviews were significantly related to parental ratings that the children had mental health problems, psychiatric diagnoses, teacher ratings of the child's need for mental health evaluations, teacher ratings of behavior problems on a standardized screening inventory (DIAL-R), and actual referrals for mental health evaluations. Interrater reliability for the total PECFAS score was high (r = .90) as was internal consistency of the five subscales (alpha = .86). Using the PECFAS scores as a standard, the weighted prevalence of serious emotional disturbance in this West Coast Head Start program was 17%, at the lower end of the current estimated rate of SED for older children in low income samples (18–26%).

12.
The aims of this study were to investigate the reliability of ICD‐10 and DC 0–3 in the diagnostic classification of mental health problems in 1½‐year‐old children from the general population. The reliability study was conducted as a part of an epidemiological survey of psychopathology in 1½‐year‐old children from the general population. In this survey, the children were assessed and diagnosed according to the ICD‐10 and the DC 0–3 after a 2‐hr session including standardized and clinical methods and videorecordings. The case records and video material of 18 children were rediagnosed by the three child psychiatrists, who had diagnosed children in the epidemiological survey. In general, the reliability in diagnostic classification of mental health problems in 1½‐year‐old children was improved with the DC 0–3 compared to the ICD‐10. In the classification of psychopathology at Axis I, the interrater reliability and test‐retest reliability kappas were 0.66 and 0.57, respectively, with the ICD‐10, and 0.72 and 0.74, respectively, with the DC 0–3. The reliability of the classification of relationship disturbances at Axis II with the DC 0–3 was high, corresponding to κ = 1. A high agreement among raters in the differentiation between psychopathology and normal variations was found. Given experienced clinicians and standardized assessment methods, it is possible to reliably identify and diagnose psychopathology in 1½‐year‐old children from the general population.

13.
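Developing a mental health screening instrument suited to Chinese college students is of great importance. This study first constructed the scale's three screening levels and 22 dimension indicators through literature analysis, field investigation, and expert discussion, and developed specific items accordingly. The items were then examined and revised through a pilot test with 890 general college students and a clinical sample of 67 college students receiving counseling, a formal test with another 810 college students, and multiple rounds of expert evaluation, ultimately yielding the Chinese College Student Mental Health Screening Scale. The results showed that the scale's model structure was reasonable and fit well; the items had good discrimination, and reliability and validity met psychometric requirements; students who had sought counseling and those who had not differed significantly in their scores on the scale as a whole and on each dimension. The scale can therefore serve as a measurement instrument for mental health screening of Chinese college students.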

14.
There is emerging evidence that the performance of risk assessment instruments is weaker when used for clinical decision‐making than for research purposes. For instance, research has found lower agreement between evaluators when the risk assessments are conducted during routine practice. We examined the field interrater reliability of the Short‐Term Assessment of Risk and Treatability: Adolescent Version (START:AV). Clinicians in a Dutch secure youth care facility completed START:AV assessments as part of the treatment routine. Consistent with previous literature, interrater reliability of the items and total scores was lower than previously reported in non‐field studies. Nevertheless, moderate to good interrater reliability was found for final risk judgments on most adverse outcomes. Field studies provide insights into the actual performance of structured risk assessment in real‐world settings, exposing factors that affect reliability. This information is relevant for those who wish to implement structured risk assessment with a level of reliability that is defensible considering the high stakes.

15.
Halo effects in the assessment of ADHD and ODD were examined. Participants were 159 undergraduate college students who rated children described as showing disruptive behaviors. Bidirectional halo effects were found. Specifically, the presence of oppositionality artificially inflated ratings of inattention and hyperactivity, and the combined presence of inattention and hyperactivity artificially inflated ratings of oppositionality. Several specific items were found to be particularly susceptible to halo effects. Due to these halo effects, caution should be exercised when diagnosing multiple behavior disorders, especially with items found to be particularly susceptible. Clinical interviews conducted by mental health professionals may help distinguish between the true presence of multiple disorders and halo effects based on ratings. Future research should determine whether structured interviews conducted by mental health professionals are less susceptible to halo effects than rating scales.

16.
A videotape-administered role-play test of children's social skills was developed and its psychometric properties tested. Performance criteria for the test were derived from popular children's ratings of the effectiveness of different role-play responses. The test was administered to 157 fourth- and fifth-grade boys and girls who had been classified as popular, average, neglected, or rejected, on the basis of sociometric testing. The test evidenced good interrater, test-retest, and internal consistency reliabilities. Children's role-play performance correlated significantly with teacher ratings of social competence and with peer-liking ratings. Neglected children performed more poorly on the role-play test than popular children. When differences in intelligence among social status groups were statistically controlled, social status groups did not differ on the role-play test. Results of the discriminant analyses support the conclusion that teacher ratings are better than role-play tests for identifying rejected children, whereas role-play tests and measures of intelligence appear more accurate than teacher ratings for identifying neglected children.

17.
Strengths can have a potent effect in mitigating the impact of trauma on mental health needs and functioning. Yet, evidence is limited on the role that strengths may have in ameliorating trauma-related or mental health symptoms over time. Providing a comprehensive assessment that includes strengths, as well as needs, is an important step in making appropriate service recommendations for youth in child welfare. This study assessed 7,483 children and adolescents entering an intensive stabilization program through the Illinois child welfare system. The interaction of individual, child strengths in relation to complex trauma exposure, traumatic stress symptoms, risk behaviors, and other mental health needs were examined. Results indicated strengths are relatively stable over time and inversely associated with several negative outcomes, including risk behaviors (−.32, p < .001), emotional/behavioral needs (−.33, p < .001) and overall functioning (−.47, p < .001). Traumatic stress symptoms were also related to increases in these negative outcomes. Overall, strengths had a buffering effect on traumatic stress symptoms and outcomes over time. The role of strengths in relation to traumatic stress symptoms, however, was less consistent. Youth with histories of complex trauma exposure had significantly fewer useable strengths than youth without this exposure. However, strengths improved for both youth with and without complex trauma exposure over the course of stabilization services. These findings suggest that early identification and development of child strengths can mitigate risk-taking behaviors, mental health, and functional difficulties among youth in the child welfare system. Implications for more targeted trauma-informed and strengths-based assessment, and treatment/service planning are discussed.

18.
Much debate has centered on what are reasonable outcomes of the short-term intensive family preservation services (IFPS). However, little attention has been given to how therapists actually formulate outcomes in their practices. The files of 98 families who used IFPS were reviewed to determine how therapists formulated outcomes and whether formulated outcomes varied by service sector (child welfare or mental health) and child age. It was found that formulated outcomes in mental health were more likely than those in child welfare to have a child focus and an interpersonal locus. Variation in outcome formulation in child welfare by child age was found, with outcomes of younger children more likely to be parent-focused than were outcomes of older children. The issues pointed out by these findings are discussed. Since case records are a potential data source for researchers, the paper concludes with a discussion of the strengths and limitations of case record reviews for research purposes.

19.
Forty-six children with enuresis were given a psychiatric interview. The two writers made independent ratings of 10 behavioral variables. Seven of these showed satisfactory interrater reliability. No relationship was established between child psychiatric disturbance assessed in this way and estimates of disorder obtained from information given by mothers and teachers. Professor Max Hamilton, Department of Psychiatry, University of Leeds, kindly provided computing facilities. The work was supported by a grant from the Yorkshire Regional Health Authority.

20.
Although the positive psychology tradition emphasizes the importance of a balanced approach regarding individual strengths and weaknesses, there is no valid instrument to measure these phenomena in organizations. The purpose of the present studies is to develop and validate an instrument that measures four dimensions, namely perceived organizational support (POS) for strengths use, POS for deficit correction, strengths use behaviour, and deficit correction behaviour. In study 1 and 2, the Strengths Use and Deficit COrrection (SUDCO) questionnaire was developed and tested for its factor structure, reliability, and convergent and criterion validity in two samples of South African employees (N = 338 and N = 361, respectively). In study 3, the convergent and criterion validity of the SUDCO were examined in a sample of Dutch engineers (N = 133). Results indicated that the intended dimensions of strengths use and deficit correction can be measured reliably with 24 items and showed convergent validity. Moreover, POS for strengths use and strengths use behaviour correlated positively with self- and manager-ratings of job performance, supporting the criterion validity of these scales. As expected, POS for deficit correction and deficit correction behaviour were unrelated to the performance ratings.
