Michael L. Feldstein, Henry T. Davis 《The British journal of mathematical and statistical psychology》1984,37(1):49-61
In this paper we discuss a method for assessing agreement among raters who are scoring the number of times a specified event occurs. In such cases, it seems reasonable to define agreement in terms of raters' behaviours in correctly identifying responses which have in fact occurred, and in their falsely counting responses which have not. We exploit the discrete nature of the response variable, and examine a class of models for mean response assuming an underlying Poisson distribution. Test statistics are given for deciding on the applicability of the models, and whether there is agreement with respect to correctly detecting responses, as well as with respect to falsely scoring responses.
A latent-class model of rater agreement is presented for which one of the model parameters can be interpreted as the proportion of systematic agreement. The latent classes of the model emerge from the factorial combination of the "true" category in which a target belongs and the ease with which raters are able to classify targets into the true category. Several constrained cases of the model are described, and the relations to other well-known agreement models and kappa-type summary coefficients are explained. The differential quality of the rating categories can be assessed on the basis of the model fit. The model is illustrated using data from diagnoses of psychiatric disorders and classifications of individuals in a persuasive communication study.
Genetically informative data can be used to address fundamental questions concerning the measurement of behavior in children. The authors illustrate this with longitudinal multiple-rater data on internalizing problems in twins. Valid information on the behavior of a child is obtained for behavior that multiple raters agree upon and for rater-specific perception of the child's behavior. Rater-disagreement variance σ²(rd) accounted for 35% of the individual differences in internalizing behavior. Up to 17% of this σ²(rd) was accounted for by rater-specific additive genetic variance σ²(Au). Thus, the disagreement should not be considered only to be bias/error but also as representing the unique feature of the relationships between that parent and the child. The longitudinal extension of this model helps to make a distinction between measurement error and the raters' unique perception of the child's behavior. For internalizing behavior, the results show large stability across time, which is accounted for by common additive genetic and common shared environmental factors. Rater-specific shared environmental factors show substantial influence on stability. This could mean that rater bias may be persistent and affect longitudinal studies.
Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance estimators for the multiple‐rater generalized π and AC1 statistics, whose validity does not depend upon the hypothesis of independence between raters. This is an improvement over existing alternative variances, which depend on the independence assumption. A Monte‐Carlo simulation study demonstrates the validity of these variance estimators for confidence interval construction, and confirms the value of AC1 as an improved alternative to existing inter‐rater reliability statistics.
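The kappa paradox mentioned in this abstract can be illustrated with a small sketch. It covers only the two-rater case and uses the commonly stated textbook forms of Cohen's κ, Scott's π and the AC1 coefficient; the multiple-rater generalizations and variance estimators proposed in the paper are not reproduced. The function name `agreement_coefficients` and the example table are assumptions made for illustration, not material from the paper.

```python
from typing import Dict, Sequence


def agreement_coefficients(table: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Two-rater agreement coefficients from a square contingency table.

    table[i][j] is the number of subjects that rater A put in category i
    and rater B put in category j. Assumes at least two categories.
    """
    q = len(table)                                   # number of categories
    n = sum(sum(row) for row in table)               # number of subjects

    # Observed proportion of agreement (diagonal of the table).
    p_obs = sum(table[k][k] for k in range(q)) / n

    # Marginal category proportions for each rater, and their average.
    row = [sum(table[k]) / n for k in range(q)]
    col = [sum(table[i][k] for i in range(q)) / n for k in range(q)]
    avg = [(row[k] + col[k]) / 2 for k in range(q)]

    # Chance-agreement terms: this is where the three coefficients differ.
    pe_kappa = sum(row[k] * col[k] for k in range(q))    # Cohen's kappa
    pe_pi = sum(a * a for a in avg)                      # Scott's pi
    pe_ac1 = sum(a * (1 - a) for a in avg) / (q - 1)     # AC1

    def chance_corrected(p_exp: float) -> float:
        return (p_obs - p_exp) / (1 - p_exp)

    return {
        "p_obs": p_obs,
        "kappa": chance_corrected(pe_kappa),
        "pi": chance_corrected(pe_pi),
        "ac1": chance_corrected(pe_ac1),
    }


if __name__ == "__main__":
    # Hypothetical, heavily skewed table: the raters agree on 90 of 100
    # subjects, but almost everything falls into the first category.
    skewed = [[90, 5],
              [5, 0]]
    for name, value in agreement_coefficients(skewed).items():
        print(f"{name}: {value:.4f}")
```

On this made-up table the raters agree on 90% of subjects, yet κ and π come out slightly negative (about -0.05) because the skewed marginals inflate their chance-agreement term, while AC1 stays near 0.89. That contrast is the kind of behaviour the abstract calls a paradox of kappa.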
E M Mason 《Perceptual and motor skills》1992,74(2):347-353
The purpose of this study was to investigate the interrater reliability of the visual-motor portion of the Copying subtest of the Stanford-Binet Intelligence Scale: Fourth Edition. Eight raters independently scored 11 protocols completed by children aged 5 through 10 years, using the scoring criteria and guidelines in the manual. The raters marked each of 10 items pass or fail and computed a total raw score for each protocol. Interrater reliability coefficients were obtained for each child's protocol, and the Kappa coefficient was computed for each item. Significant raters' reliability coefficients ranged from .82 to .91, which were low in comparison to test-retest reliability and Kuder-Richardson-20 coefficients for this and other subtests of the Stanford-Binet in the technical manual. Percent agreement among 8 raters also indicated weak reliability. Although the obtained results suggested some interrater reliability coefficients within acceptable levels, questions were raised about the scoring criteria for individual items. Caution is warranted in the use of cognitive measures which include subjective judgement of the examiner in applying scoring criteria.
This paper gives a method for determining a sample size that will achieve a prespecified bound on confidence interval width for the interrater agreement measure, κ. The same results can be used when a prespecified power is desired for testing hypotheses about the value of kappa. An example from the literature is used to illustrate the methods proposed here.
Three approaches to the assessment of DSM-III personality disorders are compared by examining the manner in which interrelationships between disorders are portrayed by the different techniques. Although a fair degree of convergence was noted among the three techniques, several differences were also observed. These comparisons, particularly the noted contrasts, have implications for future researchers using these techniques.
The study tested the hypothesis that with respect to the big five domains associated with temperament, agreement between self- and others' ratings is higher than with respect to other domains. The same was expected with respect to peer–peer agreement. There were two groups of subjects: self-raters (n=639) and peer-raters (n=1278). All subjects completed the Polish Adjective List (PAL), which consists of five scales: Dynamism, Conscientiousness, Agreeableness, Excitability and Intellect, which are Polish representations of the big five personality factors extracted in American lexical studies. Each target person completed one self-rating inventory and was assessed by two peer-raters. Domains associated with temperament (Dynamism and Excitability) elicited higher agreement between self- and peer-ratings than Agreeableness and Intellect, although in the case of Conscientiousness judges appeared to be as accurate as in the case of Excitability. The pattern was even less clear with respect to the peer–peer comparison. The other finding shows that in the case of female raters there was more agreement between self- and peer-ratings than in the case of male raters.
This study explores the importance of anticipated group discussion, the consensus decision rule, and rater motivation in determining how well rater teams identify ratee behaviors, i.e., behavioral accuracy. Results, based on 382 raters in 111 teams, suggest that the anticipation of group discussion can improve behavioral accuracy, but it appears that the benefits of discussion-only teams are limited to this anticipation effect. Furthermore, it also appears that rater motivation plays an important role in this type of team. Rater teams required to reach consensus, however, appear to show improved behavioral accuracy, regardless of whether raters can anticipate the consensus discussion and regardless of rater motivation levels. Implications, especially for assessment centers, are discussed.
The present study examined whether type of inflectional case (semantic or grammatical) and phonological and morphological transparency affect the processing of Finnish modifier-head agreement in reading. Readers' eye movement patterns were registered. In Experiment 1, an agreeing modifier condition (agreement was transparent) was compared with a no-modifier condition, and in Experiment 2, similar constructions with opaque agreement were used. In both experiments, agreement was found to affect the processing of the target noun with some delay. In Experiment 3, unmarked and case-marked modifiers were used. The results again demonstrated a delayed agreement effect, ruling out the possibility that the agreement effects observed in Experiments 1 and 2 reflect a mere modifier-presence effect. We concluded that agreement exerts its effect at the level of syntactic integration but not at the level of lexical access.
In a previous paper [Elashoff 1969], we derived optimal rater teams for a particular formulation of the dichotomous rater problem. Here, we describe a computer-based procedure for selecting good rater teams in practice; we apply the procedure to the selection of items for a psychological inventory.
This research was supported in part by the author's predoctoral fellowship from the National Institutes of Health and by National Science Foundation Grant GS-341, and National Institutes of Health Grants FR-3 and FR-122.
Paniagua C 《The Psychoanalytic quarterly》2008,77(1):219-250
The author argues that the technical advances stemming from Freud's (1923) introduction of the structural theory permit a more naturalistic and specific approach to analyzing unconscious conflict, thus facilitating id analysis. The earlier topographical technique underestimated the role of suggestion; often, it entailed interference with patients' capacity for self-observation, as well as with the exploration of their own drive derivatives. In order to illustrate the type of id material obtainable with a contemporary ego psychology approach, the author presents clinical vignettes and commentaries. It is recognized that clarifications, defense interpretations, and Gray's close-process interventions may need to be adapted to different cultural milieus.
In the present study we analyzed the processing of grammatically anomalous sentences like "The famous dancer were nervously preparing herself/themselves to face the crowd.", which contains two anomalies, one early and one late. We investigated how processing of the later anomaly (at the pronoun 'herself' or 'themselves') was affected by the processing of the early anomaly (at 'were'). We considered two processing scenarios involving the first anomaly: (1) The representation of the subject-verb number agreement error at the first verb is coerced to match the verb, rendering 'herself' anomalous; (2) The representation of the subject-verb agreement error is coerced to match the subject noun, rendering 'themselves' anomalous. Our dependent measure was event-related scalp potentials (ERPs). When the pronoun disagreed with the verb (and agreed with the subject), a P600 was recorded, while the opposite condition elicited no reliable effect. Our data suggest that interpretation of reflexive pronouns involves the reactivation of multiple lexical items, verbs included.
The purpose of this study was to compare audio and audiovisual techniques of analyzing stuttering behavior using a recently developed index of agreement (Young, 1975). Twenty speech pathologists identified moments of stuttering for ten adult stutterers using both audio and audiovisual methods. Although no statistically significant differences existed between the two conditions, the listeners had more difficulty identifying moments of stuttering during the audio condition when mild stutterers were used.
Janet Dixon Elashoff 《Psychometrika》1969,34(1):21-32
How can an investigator choose a good team of raters to use for measuring a continuous variable when each available rater produces only dichotomous responses? We formulate an underlying model, define an index of goodness for rater teams in terms of average mean square error of the estimate, develop a new estimator and derive the optimal rater teams. The optimal raters have characteristic curves which are linear in form and satisfy the requirements for a Guttman scale.