
1.

Poisson models for assessing rater agreement in discrete response studies

Michael L. Feldstein Henry T. Davis 《The British journal of mathematical and statistical psychology》1984,37(1):49-61

In this paper we discuss a method for assessing agreement among raters who are scoring the number of times a specified event occurs. In such cases, it seems reasonable to define agreement in terms of raters' behaviours in correctly identifying responses which have in fact occurred, and in their falsely counting responses which have not. We exploit the discrete nature of the response variable, and examine a class of models for mean response assuming an underlying Poisson distribution. Test statistics are given for deciding on the applicability of the models, and whether there is agreement with respect to correctly detecting responses, as well as with falsely scoring responses.

2.

Indexing systematic rater agreement with a latent-class model

Schuster C Smith DA 《Psychological Methods》2002,7(3):384-395

A latent-class model of rater agreement is presented for which 1 of the model parameters can be interpreted as the proportion of systematic agreement. The latent classes of the model emerge from the factorial combination of the "true" category in which a target belongs and the ease with which raters are able to classify targets into the true category. Several constrained cases of the model are described, and the relations to other well-known agreement models and kappa-type summary coefficients are explained. The differential quality of the rating categories can be assessed on the basis of the model fit. The model is illustrated using data from diagnoses of psychiatric disorders and classifications of individuals in a persuasive communication study.

3.

Twins and the study of rater (dis)agreement

Bartels M Boomsma DI Hudziak JJ van Beijsterveldt TC van den Oord EJ 《Psychological Methods》2007,12(4):451-466

Genetically informative data can be used to address fundamental questions concerning the measurement of behavior in children. The authors illustrate this with longitudinal multiple-rater data on internalizing problems in twins. Valid information on the behavior of a child is obtained for behavior that multiple raters agree upon and for rater-specific perception of the child's behavior. Rater-disagreement variance σ²(rd) accounted for 35% of the individual differences in internalizing behavior. Up to 17% of this σ²(rd) was accounted for by rater-specific additive genetic variance σ²(Au). Thus, the disagreement should not be considered only to be bias/error but also as representing the unique feature of the relationships between that parent and the child. The longitudinal extension of this model helps to make a distinction between measurement error and the raters' unique perception of the child's behavior. For internalizing behavior, the results show large stability across time, which is accounted for by common additive genetic and common shared environmental factors. Rater-specific shared environmental factors show substantial influence on stability. This could mean that rater bias may be persistent and affect longitudinal studies.

4.

Computing inter‐rater reliability and its variance in the presence of high agreement

Kilem Li Gwet 《The British journal of mathematical and statistical psychology》2008,61(1):29-48

Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. It is a fact that these coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC₁ coefficient. Also proposed are new variance estimators for the multiple‐rater generalized π and AC₁ statistics, whose validity does not depend upon the hypothesis of independence between raters. This is an improvement over existing alternative variances, which depend on the independence assumption. A Monte‐Carlo simulation study demonstrates the validity of these variance estimators for confidence interval construction, and confirms the value of AC₁ as an improved alternative to existing inter‐rater reliability statistics.
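
The contrast this abstract draws between kappa and AC₁ can be illustrated with a minimal two-rater sketch. It is not taken from the paper: the function name `agreement_coefficients` and the example ratings are hypothetical, and the code assumes the commonly cited two-rater definitions, in which Cohen's kappa takes chance agreement as the sum of products of the raters' marginal proportions and AC₁ takes it as the sum of π_k(1 - π_k) divided by (q - 1), where π_k is the average of the two raters' marginal proportions for category k and q is the number of categories. The paper's variance estimators and multiple-rater generalizations are not covered.

```python
from collections import Counter


def agreement_coefficients(ratings_a, ratings_b):
    """Return (observed agreement, Cohen's kappa, Gwet's AC1) for two raters.

    Illustrative sketch only; assumes nominal categories and the standard
    two-rater formulas described in the accompanying text.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(ratings_a)
    categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories must be observed")

    # Observed proportion of subjects on which the raters agree.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)

    # Cohen's kappa: chance agreement from the product of the marginals.
    pe_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # AC1: chance agreement from the averaged marginals pi_k.
    pis = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    pe_ac1 = sum(pi * (1 - pi) for pi in pis) / (q - 1)

    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return p_o, kappa, ac1


# Hypothetical data with a highly skewed prevalence: 90 subjects rated "yes"
# by both raters, 5 rated "yes" only by A, 5 rated "yes" only by B.
rater_a = ["yes"] * 90 + ["yes"] * 5 + ["no"] * 5
rater_b = ["yes"] * 90 + ["no"] * 5 + ["yes"] * 5

p_o, kappa, ac1 = agreement_coefficients(rater_a, rater_b)
print(f"observed agreement = {p_o:.3f}, kappa = {kappa:.3f}, AC1 = {ac1:.3f}")
```

With these made-up ratings the raters agree on 90% of subjects, yet kappa comes out slightly negative (about -0.05) while AC₁ is about 0.89, which is the kind of behaviour under high agreement that the abstract describes as a paradox of kappa.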

5.

Capturing rater policies for processing evaluation data

Zedeck S Kafry D 《Organizational behavior and human performance》1977,18(2):269-294

6.

Percent of agreement among raters and rater reliability of the copying subtest of the Stanford-Binet Intelligence Scale: Fourth Edition.

E M Mason 《Perceptual and motor skills》1992,74(2):347-353

The purpose of this study was to investigate the interrater reliability of the visual-motor portion of the Copying subtest of the Stanford-Binet Intelligence Scale: Fourth Edition. Eight raters independently scored 11 protocols completed by children aged 5 through 10 years, using the scoring criteria and guidelines in the manual. The raters marked each of 10 items pass or fail and computed a total raw score for each protocol. Interrater reliability coefficients were obtained for each child's protocol, and the Kappa coefficient was computed for each item. Significant raters' reliability coefficients ranged from .82 to .91, which were low in comparison to test-retest reliability and Kuder-Richardson-20 coefficients for this and other subtests of the Stanford-Binet in the technical manual. Percent agreement among 8 raters also indicated weak reliability. Although the obtained results suggested some interrater reliability coefficients within acceptable levels, questions were raised about the scoring criteria for individual items. Caution is warranted in the use of cognitive measures which include subjective judgement of the examiner in applying scoring criteria.

7.

Sample size determinations for the two rater kappa statistic

V. F. Flack A. A. Afifi P. A. Lachenbruch H. J. A. Schouten 《Psychometrika》1988,53(3):321-325

This paper gives a method for determining a sample size that will achieve a prespecified bound on confidence interval width for the interrater agreement measure, kappa. The same results can be used when a prespecified power is desired for testing hypotheses about the value of kappa. An example from the literature is used to illustrate the methods proposed here.

8.

A comparison of three personality disorder assessment approaches

Leslie C. Morey 《Journal of psychopathology and behavioral assessment》1986,8(1):25-30

Three approaches to the assessment of DSM-III personality disorders are compared by examining the manner in which interrelationships between disorders are portrayed by the different techniques. Although a fair degree of convergence was noted among the three techniques, several differences were also observed. These comparisons, particularly the noted contrasts, have implications for future researchers using these techniques.

9.

Big five domain and gender as determinants of rater agreement: a comparison based on self- and peer-rating on the Polish Adjective List

Piotr Szarota Bogdan Zawadzki Jan Strelau 《Personality and individual differences》2002,33(8)

The study tested the hypothesis that with respect to the big five domains associated with temperament, agreement between self- and others' ratings is higher than with respect to other domains. The same was expected with respect to peer–peer agreement. There were two groups of subjects: self-raters (n=639) and peer-raters (n=1278). All subjects completed the Polish Adjective List (PAL), which consists of five scales: Dynamism, Conscientiousness, Agreeableness, Excitability and Intellect, which are Polish representations of the big five personality factors extracted in American lexical studies. Each target person completed one self-rating inventory and was assessed by two peer-raters. Domains associated with temperament (Dynamism and Excitability) elicited higher agreement between self- and peer-ratings than Agreeableness and Intellect, although in the case of Conscientiousness judges appeared to be as accurate as in the case of Excitability. The pattern was even less clear with respect to the peer–peer comparison. The other finding shows that in the case of female raters there was more agreement between self- and peer-ratings than in the case of male raters.

10.

Comparative study of three stretching techniques

L E Holt T M Travis T Okita 《Perceptual and motor skills》1970,31(2):611-616

11.

Why convene rater teams: An investigation of the benefits of anticipated discussion, consensus, and rater motivation

Sylvia G. Roch 《Organizational behavior and human decision processes》2007

This study explores the importance of anticipated group discussion, the consensus decision rule, and rater motivation in determining how well rater teams identify ratee behaviors, i.e., behavioral accuracy. Results, based on 382 raters in 111 teams, suggest that the anticipation of group discussion can improve behavioral accuracy, but it appears that the benefits of discussion-only teams are limited to this anticipation effect. Furthermore, it also appears that rater motivation plays an important role in this type of team. Rater teams required to reach consensus, however, appear to show improved behavioral accuracy, regardless of whether raters can anticipate the consensus discussion and regardless of rater motivation levels. Implications, especially for assessment centers, are discussed.

12.

Multiple-Valued Logic mathematical approaches for multi-state system reliability analysis

Elena Zaitseva Vitaly Levashenko 《Journal of Applied Logic》2013,11(3):350-362

13.

Processing modifier-head agreement in reading: Evidence for a delayed effect of agreement

Vainio S Hyönä J Pajunen A 《Memory & cognition》2008,36(2):329-340

The present study examined whether type of inflectional case (semantic or grammatical) and phonological and morphological transparency affect the processing of Finnish modifier-head agreement in reading. Readers' eye movement patterns were registered. In Experiment 1, an agreeing modifier condition (agreement was transparent) was compared with a no-modifier condition, and in Experiment 2, similar constructions with opaque agreement were used. In both experiments, agreement was found to affect the processing of the target noun with some delay. In Experiment 3, unmarked and case-marked modifiers were used. The results again demonstrated a delayed agreement effect, ruling out the possibility that the agreement effects observed in Experiments 1 and 2 reflect a mere modifier-presence effect. We concluded that agreement exerts its effect at the level of syntactic integration but not at the level of lexical access.

14.

Optimal choice of rater teams II: Applications

Janet Dixon Elashoff Donald E. Spiegel 《Psychometrika》1969,34(1):33-44

In a previous paper [Elashoff 1969], we derived optimal rater teams for a particular formulation of the dichotomous rater problem. Here, we describe a computer-based procedure for selecting good rater teams in practice; we apply the procedure to the selection of items for a psychological inventory. This research was supported in part by the author's predoctoral fellowship from the National Institutes of Health and by National Science Foundation Grant GS-341, and National Institutes of Health Grants FR-3 and FR-122.

15.

Id analysis and technical approaches

Paniagua C 《The Psychoanalytic quarterly》2008,77(1):219-250

The author argues that the technical advances stemming from Freud's (1923) introduction of the structural theory permit a more naturalistic and specific approach to analyzing unconscious conflict, thus facilitating id analysis. The earlier topographical technique underestimated the role of suggestion; often, it entailed interference with patients' capacity for self-observation, as well as with the exploration of their own drive derivatives. In order to illustrate the type of id material obtainable with a contemporary ego psychology approach, the author presents clinical vignettes and commentaries. It is recognized that clarifications, defense interpretations, and Gray's close-process interventions may need to be adapted to different cultural milieus.

16.

Anaphoric agreement violation: an ERP analysis of its interpretation

Molinaro N Kim A Vespignani F Job R 《Cognition》2008,106(2):963-974

In the present study we analyzed the processing of grammatically anomalous sentences like "The famous dancer were nervously preparing herself/themselves to face the crowd.", which contains two anomalies, one early and one late. We investigated how processing of the later anomaly (at the pronoun 'herself' or 'themselves') was affected by the processing of the early anomaly (at 'were'). We considered two processing scenarios involving the first anomaly: (1) The representation of the subject-verb number agreement error at the first verb is coerced to match the verb, rendering 'herself' anomalous; (2) The representation of the subject-verb agreement error is coerced to match the subject noun, rendering 'themselves' anomalous. Our dependent measure was event-related scalp potentials (ERPs). When the pronoun disagreed with the verb (and agreed with the subject), a P600 was recorded, while the opposite condition elicited no reliable effect. Our data suggest that interpretation of reflexive pronouns involves the reactivation of multiple lexical items, verbs included.

17.

Word-by-word analysis of observer agreement utilizing audio and audiovisual techniques

Margaret MacDonald Coyle A.R. Mallard 《Journal of Fluency Disorders》1979,4(1):23-28

The purpose of this study was to compare audio and audiovisual techniques of analyzing stuttering behavior using a recently developed index of agreement (Young, 1975). Twenty speech pathologists identified moments of stuttering for ten adult stutterers using both audio and audiovisual methods. Although no statistically significant differences existed between the two conditions, the listeners had more difficulty identifying moments of stuttering during the audio condition when mild stutterers were used.

18.

Modification of chronic smoking behavior: A comparison of three approaches

T L Whitman 《Behaviour research and therapy》1969,7(3):257-263

19.

Optimal choice of rater teams I: Theory

Janet Dixon Elashoff 《Psychometrika》1969,34(1):21-32

How can an investigator choose a good team of raters to use for measuring a continuous variable when each available rater produces only dichotomous responses? We formulate an underlying model, define an index of goodness for rater teams in terms of average mean square error of the estimate, develop a new estimator and derive the optimal rater teams. The optimal raters have characteristic curves which are linear in form and satisfy the requirements for a Guttman scale.

20.

The effects of rater training, job analysis format and congruence of training on job evaluation ratings

Douglas F. Cellar John R. Curtis Jr. Kim Kohlepp Patricia Poczapski Sameena Mohiuddin 《Journal of business and psychology》1989,3(4):387-401
