Similar articles
1.
Inter-rater reliability and accuracy are measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using intraclass correlation (2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
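The abstract above names intraclass correlation (2,1) as its agreement statistic for both reliability (rater versus rater) and accuracy (rater versus expert criterion). As a rough illustration only, the sketch below computes the Shrout and Fleiss ICC(2,1) (two-way random effects, single rater, absolute agreement) in plain Python. The function name `icc_2_1` and the small example matrix are assumptions made for illustration; the study's own data and software are not given in the abstract.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is a list of rows, one row per rated target (e.g. a session),
    one column per rater. This is an illustrative sketch, not the study's code.
    """
    n = len(ratings)        # number of rated targets
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-target mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical example: 6 targets rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2))
```

Under this reading, inter-rater reliability would use the two paired raters as columns, while accuracy would pair one rater's column with the criterion ratings; a composite rating would first average the two raters' scores per target.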

3.
《Behavior Therapy》2018,49(6):951-965
Self-help interventions for parents, which have a behavioral basis, are considered to be an effective treatment option for children with externalizing disorders. Nonbehavioral approaches are widely used but have little empirical evidence. The main objective of this trial was to compare the efficacy of a behavioral and a nonbehavioral guided self-help program for parents. Families of children (aged 4–11 years) diagnosed with attention-deficit/hyperactivity disorder (ADHD) or oppositional defiant disorder (ODD) were randomized to either a behavioral or a nonbehavioral guided self-help program including 8 parenting booklets and 10 counseling telephone calls. The analyses considered the ratings of 5 informants: blinded clinician, therapist, participant, (her or his) partner, and teacher. Of the 149 families randomized to treatment (intention-to-treat sample [ITT]), 110 parents completed the intervention (per-protocol sample [PP]). For the 4 primary outcome measures (blinded clinician- and participant-rated ADHD and ODD) at post-assessment, the analysis revealed a treatment advantage for the behavioral group in blinded clinician-rated ODD symptoms (ITT: d = 0.37; PP: d = 0.35). Further treatment differences, all in favor of the behavioral group (ITT and PP), were detected in therapist ratings (i.e., ODD) and participant ratings (e.g., parental self-efficacy [only PP], negative parenting behavior, parental stress). In both samples, no differences were found at post-assessment for ratings of the partner and the teacher, or at the 12-month follow-up (only participant ratings available). Behavioral guided self-help shows some treatment advantage in the short term. No superiority over nonbehavioral therapy was detected 12 months after treatment termination.
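The effect sizes reported above (d = 0.37 and d = 0.35) are standardized mean differences. The abstract does not say which variant of d was used, so the following is only a sketch of one common form, Cohen's d with a pooled standard deviation; the function name `cohens_d` and the sample scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance.

    Assumption: the classic pooled-SD form of Cohen's d; the trial itself
    may have used a different or model-based estimate.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical symptom scores for two small groups.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

By the usual rule of thumb, values near 0.2 are read as small and values near 0.5 as medium, which places the reported advantage between the two.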

4.
The shift of paradigm from psychodynamic therapy to behavior modification has changed the views of assessment and challenged traditional broad trait concepts. Behavioral assessment has used narrow, situation-specific trait tests, state self-report tests given in situations, and behavioral observations and performance ratings. Comparisons of these types of measures are reported from a study of fear reactions in three situations. Narrow trait measures are generally more predictive of behavior than broad trait measures. State measures are even more predictive when given just before the performance. But only a sampling of such state measures can be used to define a trait because of the lower reliability of states. It is argued that behavior in situations is only predictable when an adequate number of behavioral samples is used.

5.
The purpose of this study was to explore the efficacy of a teacher behavior rating instrument for identifying special needs students. Using a modified form of the Devereux Elementary School Behavior Rating (DESB) Scale, 35 kindergarten through grade 6 regular classroom teachers completed ratings on all of their 876 students. Subsequently, extensive aptitude, academic, social, and behavioral assessment was conducted and those students were identified who were in need of supportive educational programming to function adequately within the regular class setting. Analysis of the teacher behavior ratings indicated a highly significant difference between those students identified for special supportive services and their regular classroom peers on 9 of 11 behavioral factors. The findings lend clear support for the use of classroom teachers' behavior ratings in the identification process.

6.
We videotaped 216 twin children (average age: 7.6 years) hitting a 5-ft “Bobo clown”. Three behaviors in the Bobo clown situation showed adequate response characteristics, rater reliability, and test-retest reliability: number of hits, intensity of hits, and number of quadrants (into which the Bobo clown was hit). In terms of two “anchor” variables, height and weight, the twin correlations were representative of other studies in suggesting substantial hereditary influence. However, twin analyses of the three behavioral ratings during the Bobo clown session yielded no evidence of hereditary influence. Moreover, the results provided evidence of substantial environmental influence of the “between-family” variety. In other words, the family environment is the major source of individual differences for these measures.

7.
This study examines the clinical utility of behavior ratings made by nonclinician examiners during assessments of preschool children with Attention-Deficit/Hyperactivity Disorder (AD/HD). Matched samples of children with (n = 127) and without (n = 125) AD/HD were utilized to test the internal, convergent, concurrent, and incremental validity of ratings completed by examiners on the Hillside Behavior Rating Scale (HBRS). Results indicated that HBRS ratings were internally consistent, possessed sufficient interrater reliability, and were significantly associated with parent and teacher reports of AD/HD when controlling for age, gender, intelligence, and symptoms of other psychopathology. HBRS ratings also were significantly associated with other measures of functioning, and provided a significant increment in the prediction of impairment over parent and teacher report alone. These findings suggest that behavioral ratings during testing provide a unique source of clinical information that may be useful as a supplement to parent and teacher reports.

8.
As schools increasingly adopt universal social, emotional, and behavioral screening, more research is needed to examine the effects of between-teacher differences due to error and bias on students' teacher-rated screening scores. The current study examined predictors of between-teacher differences in students' teacher-rated risk across one global and three narrow domains of behavioral functioning. Participants included 2450 students (52.1% male, 54.2% White) and 160 teachers (92.1% female, 80.3% White) from four elementary schools in one Southeastern U.S. school district. Teachers rated student behavior on the Behavior Assessment System for Children (Third Edition) Behavioral and Emotional Screening System (BESS)-Teacher Form and completed a survey about their training and perspectives of common behavior problems. Results of multilevel linear regression found between-teacher effects to be greater for internalizing risk scores (intraclass correlation = 0.23) than for externalizing risk scores (intraclass correlation = 0.12) or adaptive behavior scores (intraclass correlation = 0.14). Statistically significant student predictors in most models included student grade, gender, race and/or ethnicity, office discipline referrals, and course grades. We also detected effects of several teacher-level variables in one or more of the models, including teacher gender, teacher ratings of problem severity and concern for hypothetical children displaying behavior problems, and the covariance of random teacher intercept and teacher random slopes for students' office discipline referrals. Although these factors explained some teacher-level variance in students' risk scores, a notable amount of variance between teachers remains unexplained. Future research is needed to fully understand, reduce, and account for differences between teacher ratings due to error and bias.
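The intraclass correlations in this abstract come from multilevel regression, where the statistic is usually read as the share of score variance that lies between teachers. A minimal sketch of that reading follows, assuming a random-intercept model with a between-teacher variance and a within-teacher (residual) variance; the function name and the variance values are hypothetical and are not taken from the study.

```python
def multilevel_icc(between_var, within_var):
    """Proportion of total variance attributable to the grouping level.

    Assumption: a two-level random-intercept model, with `between_var`
    the teacher-level intercept variance and `within_var` the
    student-level residual variance.
    """
    if between_var < 0 or within_var < 0:
        raise ValueError("variance components must be non-negative")
    return between_var / (between_var + within_var)


# Hypothetical variance components chosen so the ratio equals 0.23.
print(multilevel_icc(0.23, 0.77))
```

On this reading, an intraclass correlation of 0.23 for internalizing risk would mean that roughly a quarter of the variation in scores is tied to which teacher did the rating rather than to differences among students of the same teacher.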

9.
The present study evaluated the effectiveness of a behavioral group counseling program for parent members of the Association for Children with Learning Disabilities. Twenty-two mothers were assigned to two treatment groups (N=5 and N=6) and a control group (N=11). Treatment-group mothers received a series of eight weekly 1½-hour sessions in which they were taught basic principles and procedures of behavior modification which they could apply to specific childrearing problems. Multiple success criteria (maternal reports, direct observation, frequency counts, and attitudinal measures) were employed to provide a broad-based measurement of outcome. Results indicated that treatment ratings of children's conduct and disruption and parental postbehavioral observations of mother-child interactions showed improvement for the behavioral-counseling groups while control-group ratings and behavior observations remained the same. All treatment-group changes were maintained at 3-month follow-up. Consistency of treatment-group data across measures and over time suggests the effectiveness of this approach as a training method. Implications for future research were discussed.

10.
Students in a residential special school for children with emotional and behavioral disorders participated in a study designed to reduce their levels of inappropriate behavior. The residential care staff rated the students' behavioral problems and their class teachers rated their overt self-esteem pre- and post-intervention. In addition, the students completed self-ratings of their self-esteem. The students were divided into two groups, experimental and control. A multiple baseline across behaviors design was used to assess behavioral changes in the experimental group. Both groups received tangible rewards to the same level but only the experimental group received them contingent upon behaving appropriately. Results showed that the experimental group students made substantial reductions in their levels of inappropriate behavior, which were maintained at a three-month follow-up. Also, ratings of their behavioral problems by residential child care staff suggested that this improvement in behavior had generalized beyond the classroom to the residential setting. However, no significant differences were found between the pre- and post-intervention ratings of their self-esteem or teacher ratings of their overt self-esteem.

11.
It was hypothesized that siblings could function as effective behavior change agents for their behaviorally disturbed brothers and sisters within the home environment. Further, it was predicted that parents could be trained to be reliable observers of their children's performance under these circumstances. The results of the study supported both predictions with siblings in two separate families demonstrating their ability to work with their brother or sister within the context of an ABAB reversal design. Parents were also shown to obtain consistently high reliability ratings when compared to outside observers. The judicious use of siblings as behavior modification aides is recommended as a treatment procedure.

12.
This study predicted stable social maladjustment at ages 10, 11, and 12 from teacher behavioral ratings in kindergarten and a measure of family demographics. Kindergarten teachers rated 1,034 boys on hyperactivity, aggression, inattention, anxiety–withdrawal, and prosocial behavior. Sociodemographic information was collected from the parents. At ages 10, 11, and 12, teacher, parent, peer, and self-report behavior ratings were collected on 743 boys. School achievement was documented from school records. Boys whose average scores on each of the five behavioral ratings across ages 10, 11, and 12 were above the 90th percentile according to at least two informants were defined as having stable behavioral problems. From teacher ratings collected in kindergarten and family demographics, logistic regression analyses predicted stable social maladjustment. For each negative outcome there was a unique set of predictors. The results are discussed with reference to the early identification of children who are at risk. This research was supported by a grant from the Conseil Québécois de la Recherche Sociale. We would also like to thank H. Beauchesne, H. Boileau, P. Charlebois, L. David, L. Desmarais-Gervais, S. Larivée, and M. LeBlanc for their participation.

13.
14.
The relative efficacy of using ratings versus behavioral count measures to predict intelligence from mother-child interactions was investigated for 45 mother-child dyads who constituted a heterogeneous sample with respect to socioeconomic status. These dyads were observed when the children were 36 months of age; children were tested with standardized IQ tests at 36 and 60 months. Social interactions between mothers and children were coded from video tapes with two different systems by independent observers. The behavioral count system was used to code second-by-second the duration and frequency of behaviors during the session. The rating system was used to judge maternal behaviors on three scales following the session. A series of forced stepwise multiple regression analyses compared the predictive utility of the two systems both concurrently and over time. The ratings of maternal behavior yielded high correlations with child IQ both concurrently and over time and were not contributed to significantly by the behavioral count measures. The authors speculate that ratings proved more efficacious because the raters could make more subjective and intuitive judgments concerning maternal behavior.

15.
Soccer coaches and scouts typically assess in-game soccer performance to predict players’ future performance. However, there is hardly any research on the reliability and predictive validity of coaches’ and scouts’ performance assessments, or on strategies they can use to optimize their predictions. In the current study, we examined whether robust principles from psychological research on selection – namely structured information collection and mechanical combination of predictor information through a decision-rule – improve soccer coaches’ and scouts’ performance assessments. A total of n = 96 soccer coaches and scouts participated in an elaborate within-subjects experiment. Participants watched soccer players’ performance on video, rated their performance in both a structured and unstructured manner, and combined their ratings in a holistic and mechanical way. We examined the inter-rater reliability of the ratings and assessed the predictive validity by relating the ratings to players’ future market values. Contrary to our expectations, we did not find that ratings based on structured assessment paired with mechanical combination of the ratings showed higher inter-rater reliability and predictive validity. In contrast, unstructured-holistic ratings yielded the highest reliability and predictive validity, although differences were marginal. Overall, reliability was poor and predictive validities small-to-moderate, regardless of the approach used to rate players’ performance. The findings provide insights into the difficulty of predicting future performance in soccer.

16.
Three methods of personality assessment (behavior measures, behavior ratings, adjective ratings) were compared in 20 zoo-housed Great Apes: bonobos (Pan paniscus), chimpanzees (Pan troglodytes verus), gorillas (Gorilla gorilla gorilla), and orangutans (Pongo pygmaeus abelii). To test a new bottom-up approach, the studied trait constructs were systematically generated from the species’ behavioral repertoires. The assessments were reliable, temporally stable, and showed substantial cross-method coherence. In most traits, behavior ratings mediated the relations between adjective ratings and behavior measures. Results suggest that high predictability of manifest behavior is best achieved by behavior ratings, not by adjectives. Empirical evidence for trait constructs beyond current personality models points to the necessity of broad and systematic approaches for valid inferences on a species’ personality structure.

17.
We examined the psychometric properties of the Behavioral Inhibition Questionnaire (BIQ; Bishop, Spence, & McDonald, 2003), a rating scale for children's behavioral inhibition. Parent and teacher ratings, parent interviews, and laboratory observations were obtained for 495 preschoolers. Confirmatory factor analysis yielded 6 factors, each reflecting the BIQ's subscales, and all loading onto a second-order general dimension. Model fit was acceptable for parent ratings, but only marginal for teacher ratings. The convergent and discriminant validity of the BIQ was examined by using a multitrait-multimethod approach. Results indicate that the BIQ displays evidence of reliability and validity that can complement observational paradigms.

18.
The long-term effects of Spivack and Shure's social problem-solving training were assessed and compared to an attention-placebo control. Thirty-seven preschool-age children were involved in this year-long intervention project and six-month follow-up. All subjects received 46 sessions of intervention by specially trained assistants. Support was found for the cognitive effectiveness of social problem-solving training with aberrant children at post test in that they gained significantly in their ability to generate alternative solutions to interpersonal problems. This differential effect was not sustained at follow-up. Blind teacher ratings of behavioral adjustment and independent observers' ratings of behavior (using a naturalistic observation scale developed for this study) revealed no significant behavioral training effects at post test or at follow-up. Findings are discussed with the suggestion that behavior change in young children may not be mediated through a strictly cognitive intervention, and may more logically require an integration of behavioral and cognitive techniques.

19.
Normative data for the Conners Abbreviated (10-item) Teacher Rating Scale (CATRS-10) derived from 1,068 children in Brazil are presented. Ratings of boys were higher than ratings of girls, and younger children had higher ratings than older children. Test-retest reliability data indicate that the CATRS-10 has acceptable reliability in Brazil but only when the same teacher rates the child at both test and retest (interval of 1 to 3 months). This study found that ratings at retest were significantly lower than ratings at first test whether or not the same teacher rated the child on both occasions. The CATRS-10 was shown to be a valid instrument in Brazil since children with behavioral problems requiring medical or psychological treatment were rated higher than children without such problems.

20.
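A Field Study of the Assessment Validity of Structured Interviews for Middle Managers   Total citations: 2 (self-citations: 0; citations by others: 2)
Through a field study of the competency assessment of 43 middle managers randomly sampled from a listed company, this paper examines the reliability and validity of structured interviews. The research design was based on job analysis and critical incident analysis. Using three-person panel interviews, situational interviews and behavior description interviews were administered together to assess participants' overall job competency. The results show that both the internal consistency of raters' dimension ratings and the consistency among raters were relatively high, and that the ratings correlated significantly with supervisor-rated task performance and overall performance six months after the interview. Structured interviews therefore have relatively high reliability and predictive validity. A further comparison of situational interviews and behavior description interviews found that the two types of structured interview have similar reliability, but that behavior description interviews have higher validity.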
