
1.

Some Paradoxical Results for the Quadratically Weighted Kappa (total citations: 1; self-citations: 0; citations by others: 1)

Matthijs J. Warrens 《Psychometrika》2012,77(2):315-323

The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n, it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of the quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result questions the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
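The center-cell result can be checked numerically. The following is a minimal plain-Python sketch, not code from the paper: the 3 × 3 table and the helper names `weighted_kappa` and `table_with_center` are hypothetical, and the weights are the usual disagreement weights |i − j| (linear) and (i − j)² (quadratic). Rater B's column totals for categories 1 and 3 are equal, so only the quadratic kappa should ignore the center cell c.

```python
def weighted_kappa(table, weighting="quadratic"):
    """Cohen's weighted kappa for a square table of counts (rows: rater A, columns: rater B)."""
    n = len(table)
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]

    def w(i, j):
        # Disagreement weights; any constant rescaling cancels in the ratio below.
        d = abs(i - j)
        return d * d if weighting == "quadratic" else d

    observed = sum(table[i][j] * w(i, j) for i in range(n) for j in range(n))
    expected = sum(row_sums[i] * col_sums[j] * w(i, j)
                   for i in range(n) for j in range(n)) / total
    return 1 - observed / expected


def table_with_center(c):
    # Hypothetical 3 x 3 table. Column totals for categories 1 and 3 are both 15,
    # so rater B uses the same base rate for the two outer categories; the row
    # totals (17, 7 + c, 15) are deliberately not symmetric.
    return [[12, 3, 2],
            [2, c, 5],
            [1, 6, 8]]


for c in (0, 5, 50):
    t = table_with_center(c)
    # Quadratic column stays at 17/31 (about 0.5484) for every c; the linear column changes.
    print(c, round(weighted_kappa(t, "quadratic"), 4), round(weighted_kappa(t, "linear"), 4))
```

One way to see why: with scores centered on the middle category, rater B's mean is zero and the quadratic kappa reduces to 2Σ n_ij s_i s_j / Σ n_ij (s_i² + s_j²), in which the center cell (s_i = s_j = 0) contributes to neither sum.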

2.

Chance‐corrected measures for 2 × 2 tables that coincide with weighted kappa

Matthijs J. Warrens 《The British journal of mathematical and statistical psychology》2011,64(2):355-365

Cohen's kappa is presently a standard tool for the analysis of agreement in a 2 × 2 reliability study, and weighted kappa is a standard statistic for summarizing a 2 × 2 validity study. The special cases of weighted kappa, for example Cohen's kappa, are chance‐corrected measures of association. For various measures of 2 × 2 association it has been observed in the literature that, after correction for chance, they coincide with a special case of weighted kappa. This paper presents the general function, linear in both numerator and denominator, that becomes weighted kappa after correction for chance.

3.

A Note on the Linearly and Quadratically Weighted Kappa Coefficients

Pingke Li 《Psychometrika》2016,81(3):795-801

The linearly and quadratically weighted kappa coefficients are popular statistics for measuring inter-rater agreement on an ordinal scale. It has recently been demonstrated that the linearly weighted kappa is a weighted average of the kappa coefficients of the embedded 2 by 2 agreement matrices, while the quadratically weighted kappa is insensitive to agreement matrices that are row or column reflection symmetric. This note presents a rank-one matrix decomposition of the weighting schemes through which these phenomena can be demonstrated concisely.

4.

Resampling probability values for weighted kappa with multiple raters (total citations: 1; self-citations: 0; citations by others: 1)

Mielke PW, Berry KJ, Johnston JE 《Psychological reports》2008,102(2):606-613

A new procedure to compute weighted kappa with multiple raters is described. A resampling procedure to compute approximate probability values for weighted kappa with multiple raters is presented. Applications of weighted kappa are illustrated with an example analysis of classifications by three independent raters.

5.

A FORTRAN program for computing the exact variance of weighted kappa

Mielke PW, Berry KJ, Johnston JE 《Perceptual and motor skills》2005,101(2):468-472

An algorithm and associated FORTRAN program are provided for the exact variance of weighted kappa. Program VARKAP provides the weighted kappa test statistic, the exact variance of weighted kappa, a Z score, one-sided lower- and upper-tail N(0,1) probability values, and the two-tail N(0,1) probability value.

6.

An Alternative Interpretation of the Linearly Weighted Kappa Coefficients for Ordinal Data

Tarald O. Kvålseth 《Psychometrika》2018,83(3):618-627

When two (or more) observers are independently categorizing a set of observations, Cohen’s kappa has become the most notable measure of interobserver agreement. When the categories are ordinal, a weighted form of kappa becomes desirable. The two most popular weighting schemes are the quadratic weights and linear weights. Quadratic weights have been justified by the fact that the corresponding weighted kappa is asymptotically equivalent to an intraclass correlation coefficient. This paper deals with linear weights and shows that the corresponding weighted kappa is equivalent to the unweighted kappa when cumulative probabilities are substituted for probabilities. A numerical example is provided.

7.

An implicit enumeration method for an exact test of weighted kappa

Michael J. Brusco, Stephanie Stahl, Douglas Steinley 《The British journal of mathematical and statistical psychology》2008,61(2):439-452

The kappa coefficient is one of the most widely used measures for evaluating the agreement between two raters asked to assign N objects to one of K nominal categories. Weighted versions of kappa enable partial credit to be awarded for near agreement, most notably in the case of ordinal categories. An exact significance test for weighted kappa can be conducted by enumerating all rater agreement tables with the same fixed marginal frequencies as the observed table, and accumulating the probabilities for all tables that produce a weighted kappa index that is greater than or equal to the observed measure. Unfortunately, complete enumeration of all tables is computationally unwieldy for modest values of N and K. We present an implicit enumeration algorithm for conducting an exact test of weighted kappa, which can be applied to tables of non‐trivial size. The algorithm is particularly efficient for ‘good’ to ‘excellent’ values of weighted kappa that typically have very small p‐values. Therefore, our method is beneficial for situations where resampling tests are of limited value because the number of trials needed to estimate the p‐value tends to be large.
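For very small tables, the complete enumeration that the abstract calls unwieldy can still be written out directly, which makes the definition of the exact test concrete. The sketch below is that brute-force baseline, not Brusco, Stahl and Steinley's implicit enumeration algorithm; the helper names and the example table are hypothetical. It assumes the usual null distribution for tables with both margins fixed (the multiple hypergeometric) and uses exact `Fraction` arithmetic.

```python
from fractions import Fraction
from math import factorial, prod


def bounded_compositions(total, caps):
    """Yield lists x with 0 <= x[j] <= caps[j] and sum(x) == total."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in bounded_compositions(total - x, caps[1:]):
            yield [x] + rest


def tables_with_margins(rows, cols):
    """Yield every nonnegative integer table with the given row and column totals."""
    def fill(i, remaining):
        if i == len(rows) - 1:
            yield [list(remaining)]  # the last row is forced by what is left
            return
        for row in bounded_compositions(rows[i], remaining):
            left = [c - x for c, x in zip(remaining, row)]
            for rest in fill(i + 1, left):
                yield [row] + rest
    yield from fill(0, list(cols))


def table_probability(table, rows, cols):
    """Probability of a table under independence with both margins fixed."""
    num = prod(factorial(r) for r in rows) * prod(factorial(c) for c in cols)
    den = factorial(sum(rows)) * prod(factorial(x) for row in table for x in row)
    return Fraction(num, den)


def disagreement(table, weights):
    return sum(x * w for trow, wrow in zip(table, weights) for x, w in zip(trow, wrow))


def exact_weighted_kappa_p(observed, weights):
    """Exact one-sided p-value P(kappa_w >= observed kappa_w) by complete enumeration."""
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    # With fixed margins the chance-expected disagreement is the same for every
    # table, so kappa_w >= observed kappa_w exactly when disagreement <= d_obs.
    d_obs = disagreement(observed, weights)
    return sum((table_probability(t, rows, cols)
                for t in tables_with_margins(rows, cols)
                if disagreement(t, weights) <= d_obs), Fraction(0))


K = 3
linear = [[abs(i - j) for j in range(K)] for i in range(K)]
example = [[5, 1, 0],
           [1, 4, 1],
           [0, 1, 5]]
print(float(exact_weighted_kappa_p(example, linear)))
```

The number of tables grows very quickly with N and K, which is exactly the cost that an implicit enumeration scheme is designed to avoid.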

8.

Test-retest reliability of the Mirowsky-Ross 2 x 2 Index of the Sense of Control

Wolinsky FD, Wyrwich KW, Metz SM, Babu AN, Tierney WM, Kroenke K 《Psychological reports》2004,94(2):725-732

This study investigated the short-term stability of the 1991 Mirowsky-Ross 2 x 2 Index of the Sense of Control. From an ongoing longitudinal study, 304 subjects were randomly selected for test-retest interviews occurring 1 to 4 days after their regularly scheduled first follow-up interview. Test-retest reliability was assessed at the item level using percent agreement and weighted kappa. At the scale score level, reliability was assessed with the intraclass correlation coefficient (ICC). ICCs were also calculated within categories of demographic, socioeconomic, psychosocial, and functional status characteristics. There was moderate to substantial item-level agreement (mean weighted kappa = .51; weighted kappa range = .38 to .66). At the scale score level there was substantial agreement (ICC = .71). No appreciable differences in ICC values were found in the demographic, socioeconomic, psychosocial, and functional comparisons of status characteristics. Thus, this sense of control measure has acceptable test-retest reliability and is appropriate for use in longitudinal research.

9.

The exact variance of weighted kappa with multiple raters (total citations: 1; self-citations: 0; citations by others: 1)

Mielke PW, Berry KJ, Johnston JE 《Psychological reports》2007,101(2):655-660

Weighted kappa, described by Cohen in 1968, is widely used in psychological research to measure agreement between two independent raters. Everitt later provided the exact variance of weighted kappa for two raters. In this paper, Everitt's exact variance is extended to three or more raters.

10.

A Kraemer-type Rescaling that Transforms the Odds Ratio into the Weighted Kappa Coefficient

Matthijs J. Warrens 《Psychometrika》2010,75(2):328-330

This paper presents a simple rescaling of the odds ratio that transforms the association measure into the weighted kappa statistic for a 2×2 table.

11.

Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas (total citations: 1; self-citations: 0; citations by others: 1)

Matthijs J. Warrens 《Psychometrika》2011,76(3):471-486

An agreement table with n ≥ 3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
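The weighted-mean interpretation can be verified on any table. In the minimal sketch below (hypothetical helper names and a made-up 4 × 4 table), each cut k merges categories 1..k versus k+1..n, the 2×2 kappa is (p_o − p_e)/(1 − p_e), and the weights are the denominators 1 − p_e, as the abstract states. The identity holds because |i − j| counts how many cuts lie between categories i and j, so both the observed and the chance-expected linear disagreement split into per-cut sums.

```python
import math


def linear_weighted_kappa(t):
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    n, N = len(t), sum(map(sum, t))
    rows = [sum(r) / N for r in t]
    cols = [sum(c) / N for c in zip(*t)]
    d_obs = sum(t[i][j] / N * abs(i - j) for i in range(n) for j in range(n))
    d_exp = sum(rows[i] * cols[j] * abs(i - j) for i in range(n) for j in range(n))
    return 1 - d_obs / d_exp


def collapsed_kappas(t):
    """(kappa, 1 - p_e) for each 2x2 table that merges categories 1..k versus k+1..n."""
    n, N = len(t), sum(map(sum, t))
    out = []
    for k in range(1, n):
        low_low = sum(t[i][j] for i in range(k) for j in range(k)) / N
        high_high = sum(t[i][j] for i in range(k, n) for j in range(k, n)) / N
        row_low = sum(sum(t[i]) for i in range(k)) / N
        col_low = sum(t[i][j] for i in range(n) for j in range(k)) / N
        p_o = low_low + high_high
        p_e = row_low * col_low + (1 - row_low) * (1 - col_low)
        out.append(((p_o - p_e) / (1 - p_e), 1 - p_e))  # assumes 1 - p_e > 0
    return out


table = [[8, 3, 1, 0],
         [2, 9, 4, 1],
         [0, 3, 10, 2],
         [1, 0, 3, 7]]
pairs = collapsed_kappas(table)
weighted_mean = sum(k * w for k, w in pairs) / sum(w for _, w in pairs)
print(linear_weighted_kappa(table), weighted_mean)
print(math.isclose(linear_weighted_kappa(table), weighted_mean))  # True: the identity is exact
```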

12.

Exact and resampling probability values for weighted kappa

Berry KJ, Johnston JE, Mielke PW 《Psychological reports》2005,96(2):243-252

Permutation procedures to compute exact and resampling probability values for weighted kappa are described. Comparisons with asymptotic probability values demonstrate that exact permutation procedures are advantageous for sparse data sets, whereas resampling permutation procedures are appropriate for both sparse and nonsparse data sets.
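A resampling permutation procedure for two raters can be sketched in a few lines: shuffle one rater's codes (which keeps both marginal distributions fixed), recompute weighted kappa, and count how often the shuffled value reaches the observed one. This is a generic Monte Carlo permutation test written for illustration, not the authors' program; the function names, the `power` switch (1 for linear, 2 for quadratic weights) and the (hits + 1)/(trials + 1) convention are assumptions of this sketch.

```python
import random


def weighted_kappa_from_ratings(a, b, K, power=1):
    """Weighted kappa for two raters' category codes 0..K-1 (power=1 linear, 2 quadratic)."""
    N = len(a)
    pa = [a.count(k) / N for k in range(K)]
    pb = [b.count(k) / N for k in range(K)]
    d_obs = sum(abs(x - y) ** power for x, y in zip(a, b)) / N
    d_exp = sum(pa[i] * pb[j] * abs(i - j) ** power for i in range(K) for j in range(K))
    return 1 - d_obs / d_exp


def resampling_p_value(a, b, K, trials=10_000, power=1, seed=0):
    """Monte Carlo permutation p-value for H0: the two raters' codes are paired at random."""
    rng = random.Random(seed)
    observed = weighted_kappa_from_ratings(a, b, K, power)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # re-pairs the objects; both margins stay the same
        if weighted_kappa_from_ratings(a, shuffled, K, power) >= observed - 1e-12:
            hits += 1
    # Count the observed pairing as one of the permutations so that p is never 0.
    return observed, (hits + 1) / (trials + 1)


rater_a = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2, 1, 0]
rater_b = [0, 1, 1, 1, 2, 2, 1, 1, 0, 2, 2, 0]
print(resampling_p_value(rater_a, rater_b, K=3))
```

For sparse tables, the exact alternative enumerates every distinct pairing instead of sampling; the resampling version trades exactness for a cost that does not depend on the number of possible tables.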

13.

Concurrent validity in Taiwan of the Comprehensive Developmental Inventory for Infants and Toddlers who were full-term infants

Liao HF, Yao G, Wang TM 《Perceptual and motor skills》2008,107(1):29-44

This study investigated the concurrent validity of the Comprehensive Developmental Inventory for Infants and Toddlers (CDIIT) with the Bayley Scales of Infant Development-II (BSID-II) in full-term infants. 106 full-term infants ages 6 to 18 months (63 boys, 43 girls) were recruited as a convenience sample. One tester administered the CDIIT and BSID-II to all children. The Developmental Ages and Developmental Quotients of the motor and the mental scales from both tests were analyzed with Pearson correlations and quadratic weighted kappa tests. The results showed that correlation coefficients for Developmental Ages between both tests on cognitive and motor subtests were high (r = .91-.95) and for Developmental Quotients were moderate (r = .57-.67). Moderate classification agreement was found in the two scales (quadratic weighted kappa = .50-.53). Developmental Quotient classifications for the CDIIT tended to be slightly higher than for the BSID-II. It was concluded that although acceptable concurrent validity was found for the Motor and Cognitive subtests of the CDIIT, testers should be cautious when comparing Developmental Quotients obtained from the two tests in clinical or research settings.

14.

A comparison of self-report and interview diagnoses of DSM-III-R personality disorders

Jiri Modestin, Thomas Erni, Bernard Oberson 《European Journal of Personality》1998,12(6):445-455

A total of 73 psychiatric inpatients, all but two of whom fulfilled criteria for at least one specific personality disorder (PD) on the SCID-II PQ, were interviewed with the PDE. The self-report PD diagnosis was confirmed in 35 (48 per cent) patients. The diagnostic agreement between the two instruments was poor, yielding an overall weighted kappa of 0.22. Levelling off the PD base rates, either by raising the diagnostic threshold of the SCID-II PQ or by lowering that of the PDE, increased the overall weighted kappa to 0.38 in both instances. 70 per cent of SCID-II PQ but only 29 per cent of PDE personality disorders were of extensive type. The most frequent important co-occurrences were between individual PD types within cluster 2. On the whole, the results confirmed the relatively poor agreement between self-report and interview PD diagnoses. The use of self-report questionnaires in clinical practice remains a controversial issue. © 1998 John Wiley & Sons, Ltd.

15.

Hubert's multi-rater kappa revisited

Antonio Martín Andrés, María Álvarez Hernández 《The British journal of mathematical and statistical psychology》2020,73(1):1-22

There is a frequent need to measure the degree of agreement among R observers who independently classify n subjects within K nominal or ordinal categories. The most popular methods are usually kappa-type measurements. When R = 2, Cohen's kappa coefficient (weighted or not) is well known. When defined in the ordinal case while assuming quadratic weights, Cohen's kappa has the advantage of coinciding with the intraclass and concordance correlation coefficients. When R > 2, there are more discrepancies because the definition of the kappa coefficient depends on how the phrase ‘an agreement has occurred’ is interpreted. In this paper, Hubert's interpretation, that ‘an agreement occurs if and only if all raters agree on the categorization of an object’, is used, which leads to Hubert's (nominal) and Schuster and Smith's (ordinal) kappa coefficients. Formulae for the large-sample variances for the estimators of all these coefficients are given, allowing the latter to illustrate the different ways of carrying out inference and, with the use of simulation, to select the optimal procedure. In addition, it is shown that Schuster and Smith's kappa coefficient coincides with the intraclass and concordance correlation coefficients if the first coefficient is also defined assuming quadratic weights.
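For the two-rater case (R = 2), the coincidence of quadratically weighted kappa with the concordance correlation coefficient can be checked directly. The sketch below assumes equally spaced integer category scores and computes Lin's concordance correlation with 1/N (population) moments, the convention under which the equality is exact; the scores and function names are hypothetical. With these conventions the chance-expected squared disagreement equals var(x) + var(y) + (mean(x) − mean(y))², and the observed one equals the same quantity minus 2·cov(x, y).

```python
import math


def quadratic_weighted_kappa(x, y):
    """Cohen's quadratically weighted kappa for two raters' integer scores."""
    N = len(x)
    d_obs = sum((a - b) ** 2 for a, b in zip(x, y)) / N
    # Chance-expected disagreement: pair every score of rater 1 with every score of rater 2
    d_exp = sum((a - b) ** 2 for a in x for b in y) / N ** 2
    return 1 - d_obs / d_exp


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient with 1/N (population) moments."""
    N = len(x)
    mx, my = sum(x) / N, sum(y) / N
    vx = sum((a - mx) ** 2 for a in x) / N
    vy = sum((b - my) ** 2 for b in y) / N
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / N
    return 2 * sxy / (vx + vy + (mx - my) ** 2)


x = [1, 2, 3, 3, 4, 2, 5, 1]
y = [1, 3, 3, 4, 4, 2, 4, 2]
print(quadratic_weighted_kappa(x, y), concordance_correlation(x, y))
print(math.isclose(quadratic_weighted_kappa(x, y), concordance_correlation(x, y)))  # True
```

With unbiased (N − 1) variances in the concordance correlation the two values would no longer match exactly, which is one reason the variance convention matters when comparing published coefficients.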

16.

On the interdependence of temporal and spatial judgments

Yih Lerh Huang, Bill Jones 《Attention, perception & psychophysics》1982,32(1):7-14

Three experiments are reported on the tau and kappa effects, the dependence of judgments of distance upon duration (tau) and of judgments of duration upon distance (kappa). In Experiment 1, three lights in a horizontal sequence were used to define two temporal and two spatial intervals over a total duration of 160 msec. The subject was required to choose the shorter of either the two durations or the two distances. The results confirmed Collyer’s (1977) findings that the two effects are inconsistently observed across subjects when the display duration is brief. In Experiment 2, display duration was systematically manipulated from 160 to 1,500 msec. It is argued that relative temporal judgments should become easier as the total display duration is increased and that, hence, the kappa effect should become less marked. On the other hand, relative spatial judgments should become more difficult as the total duration of the display is increased, and the tau effect should become more marked. The data were in conformity with the hypothesis. In Experiment 3, data are presented for a tau experiment which fit the assumption that the effect depends upon a weighted average of distance and the expected distance which would be traversed in the given time at constant velocity.

17.

Interrater reliability of the DSM-III-R with preschool children

John V. Lavigne Ph.D., Richard Arend, Diane Rosenbaum, James Sinacore, Colleen Cicchetti, Helen J. Binns, Katherine Kaufer Christoffel, Jennifer R. Hayford, Patricia McGuire 《Journal of abnormal child psychology》1994,22(6):679-690

Little attention has been paid to evaluating the use of DSM-III-R with preschool children. Children (N = 510) ages 2 to 5 years who were screened at the time of a pediatric visit were selected to participate in an evaluation which included questionnaires, a semistructured interview, developmental testing, and a play observation. Following the evaluation, two clinical child psychologists independently assigned DSM-III-R diagnoses. For each diagnostic category, kappa and Y coefficients were calculated; Y coefficients are less sensitive to base rates of disorders. For overall agreement, the weighted mean kappa (.61) and mean Y (.66) were moderately high. Overall agreement that the child had at least one of the disruptive disorders was substantial (kappa = .64; Y = .65); agreement that there was at least one of the emotional disorders was moderate for kappa (.54), but substantial for Y (.70). Kappa coefficients were higher for major categories of disorder than for specific disorders; however, Y coefficients did not show a decline for specific disorders. Interrater reliability of DSM-III-R appears to be similar for preschoolers and older children.

This study was supported by grant MH46089 from the National Institute of Mental Health. A preliminary report was presented at the Fifth Annual NIMH International Research Conference on the Classification and Treatment of Mental Disorders in General Medical Settings, Bethesda, Maryland, September 1991. We gratefully acknowledge the members of the Pediatric Practice Research Group who participated in this study.
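The base-rate point can be illustrated with two 2 × 2 diagnostic tables. The abstract does not define Y; this sketch assumes it is Yule's coefficient of colligation, Y = (√(ad) − √(bc))/(√(ad) + √(bc)), which depends only on the odds ratio. The two tables are hypothetical and share an odds ratio of 16 but differ in how often the disorder is diagnosed; the function names are illustrative.

```python
import math


def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]] (rows: rater 1, columns: rater 2)."""
    n = a + b + c + d
    p_o = (a + d) / n
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_o - p_e) / (1 - p_e)


def yules_y(a, b, c, d):
    """Yule's coefficient of colligation (assumed meaning of the abstract's Y)."""
    s, t = math.sqrt(a * d), math.sqrt(b * c)
    return (s - t) / (s + t)


# Two hypothetical tables with the same odds ratio (16) but different base rates.
common = (40, 10, 10, 40)   # disorder diagnosed in half of the children
rare = (4, 4, 4, 64)        # disorder diagnosed in about one child in ten
for cells in (common, rare):
    # kappa drops from 0.6 to about 0.44 while Y stays at 0.6
    print(cells, round(cohen_kappa_2x2(*cells), 3), round(yules_y(*cells), 3))
```

Because kappa's chance term is built from the marginals, the rarer diagnosis lowers kappa even though the association between the raters, as measured by the odds ratio, is unchanged.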

18.

Computing Cohen’s kappa coefficients using SPSS MATRIX

Claude A. M. Valiquette, Alain D. Lesage, Mireille Cyr, Jean Toupin 《Behavior research methods》1994,26(1):60-61

This short paper proposes a general strategy for computing kappa coefficients with the SPSS MATRIX routine. The method rests on the following rationale. If the contingency table is treated as a square matrix, the observed proportions of agreement lie in the cells of the main diagonal, and their sum equals the trace of the matrix, whereas the proportions of agreement expected by chance are the products of the corresponding marginals. The generalization to weighted kappa, which requires an additional square matrix of disagreement weights of the same order, becomes possible through the Hadamard product, that is, the elementwise product of two matrices.
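The same matrix rationale translates directly into NumPy. This is an analogous sketch, not the authors' SPSS MATRIX code; the function name and the example counts are hypothetical. Unweighted kappa reads agreement off the trace of the observed and chance matrices; weighted kappa uses the Hadamard (elementwise) product of a disagreement-weight matrix with each of them.

```python
import numpy as np


def kappa_by_matrices(counts, disagreement_weights=None):
    """Cohen's kappa (unweighted, or weighted when a disagreement-weight matrix is given)."""
    P = np.asarray(counts, dtype=float)
    P = P / P.sum()                                  # observed proportions
    E = np.outer(P.sum(axis=1), P.sum(axis=0))       # chance proportions: products of marginals
    if disagreement_weights is None:
        p_o, p_e = np.trace(P), np.trace(E)          # agreement lies on the main diagonal
        return (p_o - p_e) / (1 - p_e)
    W = np.asarray(disagreement_weights, dtype=float)  # same order as P
    # Weighted case: Hadamard (elementwise) product with W, then sum over all cells
    return 1 - (W * P).sum() / (W * E).sum()


counts = [[20, 5, 1],
          [4, 15, 6],
          [2, 3, 14]]
K = len(counts)
linear_w = np.abs(np.subtract.outer(np.arange(K), np.arange(K)))  # |i - j|
print(kappa_by_matrices(counts), kappa_by_matrices(counts, linear_w))
```

Passing the weight matrix 1 − I (every off-diagonal cell counts as one unit of disagreement) reproduces the unweighted kappa, which is a convenient consistency check on the two code paths.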

19.

The developmental profile: preliminary results on interrater reliability and construct validity

Van HL, Ingenhoven TJ, van Foeken I, van 't Spijker A, Spinhoven P, Abraham RE 《Journal of personality disorders》2000,14(4):360-365

This study presents the preliminary results of research into the interrater reliability and construct validity of the Developmental Profile (DP). In the DP a number of developmental lines, such as Object-Relations, Self-Images, and Problem-Solving Capacities, are assessed and classified according to the level of functioning. A total of 108 profiles were assessed, drawn from three different categories of patients. The weighted kappa values for interrater reliability were sufficient. On the adaptive level, but also on the maladaptive levels Symbiosis and Resistance, significant differences were found between psychiatric patients, "normal controls" (dental patients) and somatic patients. No differences were recorded between the latter two groups. The conclusion is that the DP is a promising instrument whose reliability and validity must be investigated further before it can contribute scientific support to psychodynamic theory formation.

20.

Nonasymptotic significance tests for two measures of agreement

K L Berry, P W Mielke 《Perceptual and motor skills》2001,93(1):109-114

The kappa agreement coefficients of Cohen (1960) and of Brennan and Prediger (1981) are defined and compared. A FORTRAN program is described that computes Cohen's kappa and Brennan and Prediger's kappa and their associated probability values, based on Monte Carlo resampling and on the binomial distribution, respectively.
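The contrast between the two coefficients lies in the chance term: Cohen's kappa estimates it from the raters' marginals, whereas Brennan and Prediger's kappa fixes it at 1/K. A minimal Python sketch (not the FORTRAN program; names and ratings are hypothetical) pairs each coefficient with the kind of p-value the abstract mentions. It assumes a one-sided upper-tail test in both cases: for Brennan and Prediger, the number of agreements is taken as Binomial(N, 1/K) under the null; for Cohen, one rater's codes are reshuffled at random and the p-value uses the (hits + 1)/(trials + 1) convention.

```python
import random
from math import comb


def cohen_kappa(a, b, K):
    N = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / N
    p_e = sum(a.count(k) * b.count(k) for k in range(K)) / N ** 2  # from the marginals
    return (p_o - p_e) / (1 - p_e)


def brennan_prediger_kappa(a, b, K):
    p_o = sum(x == y for x, y in zip(a, b)) / len(a)
    return (p_o - 1 / K) / (1 - 1 / K)  # chance agreement fixed at 1/K


def brennan_prediger_p(a, b, K):
    """P(at least the observed number of agreements) when agreements ~ Binomial(N, 1/K)."""
    N, agree, p = len(a), sum(x == y for x, y in zip(a, b)), 1 / K
    return sum(comb(N, m) * p ** m * (1 - p) ** (N - m) for m in range(agree, N + 1))


def cohen_monte_carlo_p(a, b, K, trials=10_000, seed=0):
    """Share of random re-pairings (b shuffled, margins kept) with kappa >= observed."""
    rng = random.Random(seed)
    observed = cohen_kappa(a, b, K)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        hits += cohen_kappa(a, shuffled, K) >= observed - 1e-12
    return (hits + 1) / (trials + 1)


rater_1 = [0, 1, 2, 2, 1, 0, 0, 2, 1, 1, 2, 0]
rater_2 = [0, 1, 2, 1, 1, 0, 2, 2, 1, 0, 2, 0]
print(cohen_kappa(rater_1, rater_2, 3), cohen_monte_carlo_p(rater_1, rater_2, 3))
print(brennan_prediger_kappa(rater_1, rater_2, 3), brennan_prediger_p(rater_1, rater_2, 3))
```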