Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Some Paradoxical Results for the Quadratically Weighted Kappa   Total citations: 1 (self-citations: 0, citations by others: 1)
The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result questions the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
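
The center-cell property described in this abstract can be checked numerically. The following is a minimal sketch, not the paper's own code: the 3×3 table of counts is hypothetical, and the function name `quadratic_weighted_kappa` is an assumption introduced for illustration. The row rater's margins (15, 30, 15) give categories 1 and 3 the same base rate, so inflating the center cell should leave the statistic unchanged.

```python
def quadratic_weighted_kappa(table):
    """Quadratically weighted kappa for a square table of counts."""
    n = len(table)
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(n)) for j in range(n)]
    # Disagreement weights (i - j)^2; diagonal cells have weight 0.
    observed = sum((i - j) ** 2 * table[i][j]
                   for i in range(n) for j in range(n)) / total
    expected = sum((i - j) ** 2 * row_sums[i] * col_sums[j]
                   for i in range(n) for j in range(n)) / total ** 2
    return 1.0 - observed / expected


# Hypothetical table: row margins are 15, 30, 15, so the row rater
# uses the same base rate for categories 1 and 3.
base = [[10, 3, 2],
        [4, 20, 6],
        [1, 5, 9]]

# Same table with a much larger center cell (exact agreement on category 2).
inflated = [row[:] for row in base]
inflated[1][1] = 200

kappa_base = quadratic_weighted_kappa(base)
kappa_inflated = quadratic_weighted_kappa(inflated)
print(kappa_base, kappa_inflated)
```

Both printed values agree up to floating-point rounding, even though the second table contains far more exact agreement on the middle category.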

2.
The exact variance of weighted kappa with multiple raters   Total citations: 1 (self-citations: 0, citations by others: 1)
Weighted kappa, described by Cohen in 1968, is widely used in psychological research to measure agreement between two independent raters. Everitt then provided the exact variance for weighted kappa for two raters. In this paper, Everitt's exact variance is extended to three or more raters.

3.
Pi (π) and kappa (κ) statistics are widely used in the areas of psychiatry and psychological testing to compute the extent of agreement between raters on nominally scaled data. These coefficients occasionally yield unexpected results in situations known as the paradoxes of kappa. This paper explores the origin of these limitations, and introduces an alternative and more stable agreement coefficient referred to as the AC1 coefficient. Also proposed are new variance estimators for the multiple‐rater generalized π and AC1 statistics, whose validity does not depend upon the hypothesis of independence between raters. This is an improvement over existing alternative variances, which depend on the independence assumption. A Monte‐Carlo simulation study demonstrates the validity of these variance estimators for confidence interval construction, and confirms the value of AC1 as an improved alternative to existing inter‐rater reliability statistics.

4.
Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas   Total citations: 1 (self-citations: 0, citations by others: 1)
An agreement table with n ≥ 3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
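
The weighted-average interpretation can be illustrated with a short numerical check. This is a sketch under stated assumptions: the 4×4 table of counts is hypothetical, and the helper names (`margins`, `collapse`, `kappa_2x2_parts`) are introduced here for illustration only. Each 2×2 kappa is weighted by its own denominator, that is, one minus its chance agreement.

```python
def margins(table):
    """Row sums, column sums and grand total of a square table."""
    n = len(table)
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(n)) for j in range(n)]
    return rows, cols, sum(rows)


def linear_weighted_kappa(table):
    """Cohen's weighted kappa with linear disagreement weights |i - j|."""
    n = len(table)
    rows, cols, total = margins(table)
    observed = sum(abs(i - j) * table[i][j]
                   for i in range(n) for j in range(n)) / total
    expected = sum(abs(i - j) * rows[i] * cols[j]
                   for i in range(n) for j in range(n)) / total ** 2
    return 1.0 - observed / expected


def collapse(table, cut):
    """Merge categories below `cut` and from `cut` upward into a 2x2 table."""
    n = len(table)
    out = [[0, 0], [0, 0]]
    for i in range(n):
        for j in range(n):
            out[int(i >= cut)][int(j >= cut)] += table[i][j]
    return out


def kappa_2x2_parts(t):
    """Return (kappa, denominator), with denominator = 1 - chance agreement."""
    rows, cols, total = margins(t)
    p_obs = (t[0][0] + t[1][1]) / total
    p_exp = (rows[0] * cols[0] + rows[1] * cols[1]) / total ** 2
    return (p_obs - p_exp) / (1.0 - p_exp), 1.0 - p_exp


# Hypothetical agreement table with four ordered categories.
table = [[12, 4, 1, 0],
         [3, 10, 5, 2],
         [1, 4, 9, 3],
         [0, 2, 3, 11]]

# One collapsed 2x2 table per cut point between adjacent categories.
parts = [kappa_2x2_parts(collapse(table, cut)) for cut in range(1, len(table))]
weighted_mean = sum(k * d for k, d in parts) / sum(d for _, d in parts)
overall = linear_weighted_kappa(table)
print(weighted_mean, overall)
```

The two printed numbers coincide up to rounding, because the linear weight |i − j| counts how many cut points separate categories i and j.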

5.
The kappa coefficient is one of the most widely used measures for evaluating the agreement between two raters asked to assign N objects to one of K nominal categories. Weighted versions of kappa enable partial credit to be awarded for near agreement, most notably in the case of ordinal categories. An exact significance test for weighted kappa can be conducted by enumerating all rater agreement tables with the same fixed marginal frequencies as the observed table, and accumulating the probabilities for all tables that produce a weighted kappa index that is greater than or equal to the observed measure. Unfortunately, complete enumeration of all tables is computationally unwieldy for modest values of N and K. We present an implicit enumeration algorithm for conducting an exact test of weighted kappa, which can be applied to tables of non‐trivial size. The algorithm is particularly efficient for ‘good’ to ‘excellent’ values of weighted kappa that typically have very small p‐values. Therefore, our method is beneficial for situations where resampling tests are of limited value because the number of trials needed to estimate the p‐value tends to be large.

6.
The rater agreement literature is complicated by the fact that it must accommodate at least two different properties of rating data: the number of raters (two versus more than two) and the rating scale level (nominal versus metric). While kappa statistics are most widely used for nominal scales, intraclass correlation coefficients have been preferred for metric scales. In this paper, we suggest a dispersion-weighted kappa framework for multiple raters that integrates some important agreement statistics by using familiar dispersion indices as weights for expressing disagreement. These weights are applied to ratings identifying cells in the traditional inter-judge contingency table. Novel agreement statistics can be obtained by applying less familiar indices of dispersion in the same way. This revised article was published online in August 2005 with the PDF paginated correctly.

7.
Permutation procedures to compute exact and resampling probability values for weighted kappa are described. Comparisons with asymptotic probability values demonstrate that exact permutation procedures are advantageous for sparse data sets, whereas resampling permutation procedures are appropriate for both sparse and nonsparse data sets.
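
A resampling permutation test of this general kind can be sketched as follows. This is an illustration only and not the procedure from the article: the ratings are hypothetical, linear weights are an arbitrary choice, and the `(hits + 1) / (draws + 1)` estimate of the upper-tail probability is a common convention assumed here rather than taken from the source.

```python
import random


def linear_weighted_kappa(a, b, k):
    """Linearly weighted kappa for two rating lists with categories 0..k-1."""
    n = len(a)
    observed = sum(abs(x - y) for x, y in zip(a, b)) / n
    freq_a = [a.count(c) for c in range(k)]
    freq_b = [b.count(c) for c in range(k)]
    expected = sum(abs(i - j) * freq_a[i] * freq_b[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1.0 - observed / expected


def resampling_p_value(a, b, k, draws=999, seed=0):
    """Monte Carlo upper-tail p-value: shuffle rater B's labels, keep margins."""
    rng = random.Random(seed)
    observed = linear_weighted_kappa(a, b, k)
    shuffled = list(b)
    hits = 0
    for _ in range(draws):
        rng.shuffle(shuffled)
        # Small tolerance so ties with the observed value are counted.
        if linear_weighted_kappa(a, shuffled, k) >= observed - 1e-12:
            hits += 1
    # Assumed convention: the observed arrangement counts as one draw.
    return (hits + 1) / (draws + 1)


# Hypothetical ratings of 12 objects on a 3-point ordinal scale.
rater_a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
rater_b = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2]

p_value = resampling_p_value(rater_a, rater_b, k=3)
print(p_value)
```

Shuffling one rater's labels keeps both sets of marginal frequencies fixed, which is the same conditioning an exact permutation test uses; the exact version would enumerate all arrangements instead of sampling them.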

8.
There is a frequent need to measure the degree of agreement among R observers who independently classify n subjects within K nominal or ordinal categories. The most popular methods are usually kappa-type measurements. When R = 2, Cohen's kappa coefficient (weighted or not) is well known. When defined in the ordinal case while assuming quadratic weights, Cohen's kappa has the advantage of coinciding with the intraclass and concordance correlation coefficients. When R > 2, there are more discrepancies because the definition of the kappa coefficient depends on how the phrase ‘an agreement has occurred’ is interpreted. In this paper, Hubert's interpretation, that ‘an agreement occurs if and only if all raters agree on the categorization of an object’, is used, which leads to Hubert's (nominal) and Schuster and Smith's (ordinal) kappa coefficients. Formulae for the large-sample variances for the estimators of all these coefficients are given, allowing the latter to illustrate the different ways of carrying out inference and, with the use of simulation, to select the optimal procedure. In addition, it is shown that Schuster and Smith's kappa coefficient coincides with the intraclass and concordance correlation coefficients if the first coefficient is also defined assuming quadratic weights.

9.
In an attempt to discover the facial action units for affective states that occur during complex learning, this study adopted an emote-aloud procedure in which participants were recorded as they verbalised their affective states while interacting with an intelligent tutoring system (AutoTutor). Participants’ facial expressions were coded by two expert raters using Ekman's Facial Action Coding System and analysed using association rule mining techniques. The two expert raters received an overall kappa that ranged between .76 and .84. The association rule mining analysis uncovered facial actions associated with confusion, frustration, and boredom. We discuss these rules and the prospects of enhancing AutoTutor with non-intrusive affect-sensitive capabilities.

10.
Pingke Li. Psychometrika, 2016, 81(3): 795-801
The linearly and quadratically weighted kappa coefficients are popular statistics in measuring inter-rater agreement on an ordinal scale. It has been recently demonstrated that the linearly weighted kappa is a weighted average of the kappa coefficients of the embedded 2 by 2 agreement matrices, while the quadratically weighted kappa is insensitive to the agreement matrices that are row or column reflection symmetric. A rank-one matrix decomposition approach to the weighting schemes is presented in this note such that these phenomena can be demonstrated in a concise manner.

11.
When two (or more) observers are independently categorizing a set of observations, Cohen’s kappa has become the most notable measure of interobserver agreement. When the categories are ordinal, a weighted form of kappa becomes desirable. The two most popular weighting schemes are the quadratic weights and linear weights. Quadratic weights have been justified by the fact that the corresponding weighted kappa is asymptotically equivalent to an intraclass correlation coefficient. This paper deals with linear weights and shows that the corresponding weighted kappa is equivalent to the unweighted kappa when cumulative probabilities are substituted for probabilities. A numerical example is provided.

12.
A permutation algorithm and associated FORTRAN program are provided for weighted kappa. Program EWK provides the weighted kappa test statistic and the exact one-sided upper-tail probability values.

13.
Cohen's kappa is presently a standard tool for the analysis of agreement in a 2 × 2 reliability study, and weighted kappa is a standard statistic for summarizing a 2 × 2 validity study. The special cases of weighted kappa, for example Cohen's kappa, are chance‐corrected measures of association. For various measures of 2 × 2 association it has been observed in the literature that, after correction for chance, they coincide with a special case of weighted kappa. This paper presents the general function, linear in both numerator and denominator, that becomes weighted kappa after correction for chance.

14.
Agreement between Two Independent Groups of Raters   Total citations: 1 (self-citations: 0, citations by others: 1)
We propose a coefficient of agreement to assess the degree of concordance between two independent groups of raters classifying items on a nominal scale. This coefficient, defined on a population-based model, extends the classical Cohen’s kappa coefficient for quantifying agreement between two raters. Weighted and intraclass versions of the coefficient are also given and their sampling variance is determined by the Jackknife method. The method is illustrated on medical education data which motivated the research.

15.
This short paper proposes a general computing strategy to compute Kappa coefficients using the SPSS MATRIX routine. The method is based on the following rationale. If the contingency table is considered as a square matrix, then the observed proportions of agreement lie in the main diagonal’s cells, and their sum equals the trace of the matrix, whereas the proportions of agreement expected by chance are the joint product of marginals. The generalization to weighted kappa, which requires an additional square matrix of disagreement weights, both matrices having the same order, becomes possible by the use of the Hadamard product, that is, the elementwise direct product of two matrices.
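
The matrix rationale above translates directly into other languages. The sketch below uses plain Python instead of the SPSS MATRIX routine the paper describes, so it is a translation of the idea rather than the authors' program; the function names and the 2×2 table of counts are assumptions made for illustration.

```python
def hadamard(a, b):
    """Elementwise (Hadamard) product of two equally sized matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def total(m):
    return sum(sum(row) for row in m)


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def proportions(table):
    """Convert a table of counts into a table of proportions."""
    n = total(table)
    return [[cell / n for cell in row] for row in table]


def chance_matrix(p):
    """Outer product of the row and column marginals of p."""
    size = len(p)
    rows = [sum(r) for r in p]
    cols = [sum(p[i][j] for i in range(size)) for j in range(size)]
    return [[r * c for c in cols] for r in rows]


def cohen_kappa(table):
    """Unweighted kappa: observed and chance agreement are matrix traces."""
    p = proportions(table)
    e = chance_matrix(p)
    return (trace(p) - trace(e)) / (1.0 - trace(e))


def weighted_kappa(table, weights):
    """Weighted kappa from a matrix of disagreement weights."""
    p = proportions(table)
    e = chance_matrix(p)
    return 1.0 - total(hadamard(weights, p)) / total(hadamard(weights, e))


# Hypothetical 2x2 table of counts and 0/1 disagreement weights.
counts = [[20, 5],
          [10, 15]]
zero_one = [[0, 1],
            [1, 0]]

kappa_plain = cohen_kappa(counts)
kappa_weighted = weighted_kappa(counts, zero_one)
print(kappa_plain, kappa_weighted)
```

With 0/1 disagreement weights the weighted form reduces to the unweighted kappa, which gives a quick consistency check; linear or quadratic schemes only require a different `weights` matrix.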

16.
An algorithm and associated FORTRAN program are provided for the exact variance of weighted kappa. Program VARKAP provides the weighted kappa test statistic, the exact variance of weighted kappa, a Z score, one-sided lower- and upper-tail N(0,1) probability values, and the two-tail N(0,1) probability value.

17.
A unified treatment of the weighting problem   Total citations: 1 (self-citations: 0, citations by others: 1)
A general procedure is described for obtaining weighted linear combinations of variables. This includes as special cases multiple regression weights, canonical variate analysis, principal components, maximizing composite reliability, canonical factor analysis, and certain other well-known methods. The general procedure is shown to yield certain desirable invariance properties with respect to transformations of the variables. The author wishes to thank Dr. A. J. Cropley for preparing the necessary computer programs for this study.

18.
A two-step weighted least squares estimator for multiple factor analysis of dichotomized variables is discussed. The estimator is based on the first and second order joint probabilities. Asymptotic standard errors and a model test are obtained by applying the Jackknife procedure.

19.
A frequent problem for decision makers (DMs) analysing decisions involving multiple objectives is the identification and selection of the most preferred option from the set of non‐dominated solutions. Two techniques, weighted sum optimization and reference point optimization, have been developed to address this problem for multiobjective linear programming problems (MOLPs). In this paper, we examine the relationship between these two techniques. We demonstrate that the values of the dual variables associated with auxiliary constraints of the reference point technique are equal to the weight values used to compute the same non‐dominated solution via the weighted sum technique. This insight will enable the development of new interactive solution procedures for MOLPs which allow the DM to readily switch from one method to the other during the search for the most preferred non‐dominated solution. The advantages of the approach are discussed in the paper. Copyright © 1999 John Wiley & Sons, Ltd.

20.
This study investigated the short-term stability of the 1991 Mirowsky-Ross 2 x 2 Index of the Sense of Control. From an ongoing longitudinal study, 304 subjects were randomly selected for test-retest interviews occurring 1 to 4 days after their regularly scheduled first follow-up interview. Test-retest reliability was assessed at the item level using percent agreement and weighted kappa. At the scale score level, reliability was assessed with the intraclass correlation coefficient (ICC). ICCs were also calculated within categories of demographic, socioeconomic, psychosocial, and functional status characteristics. There was moderate to substantial item-level agreement (mean weighted kappa = .51; weighted kappa range = .38 to .66). At the scale score level there was substantial agreement (ICC = .71). No appreciable differences in ICC values were found in the demographic, socioeconomic, psychosocial, and functional comparisons of status characteristics. Thus, this sense of control measure has acceptable test-retest reliability and is appropriate for use in longitudinal research.
