Similar literature
Found 20 similar documents (search time: 15 ms)
1.
The exact variance of weighted kappa with multiple raters (total citations: 1; self-citations: 0; citations by others: 1)
Weighted kappa, described by Cohen in 1968, is widely used in psychological research to measure agreement between two independent raters. Everitt then provided the exact variance of weighted kappa for two raters. In this paper, Everitt's exact variance is extended to three or more raters.

2.
A permutation algorithm and associated FORTRAN program are provided for weighted kappa. Program EWK provides the weighted kappa test statistic and the exact one-sided upper-tail probability values.

3.
Permutation procedures to compute exact and resampling probability values for weighted kappa are described. Comparisons with asymptotic probability values demonstrate that exact permutation procedures are advantageous for sparse data sets, whereas resampling permutation procedures are appropriate for both sparse and nonsparse data sets.
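
As an illustration of the resampling idea, the following is a minimal Python sketch, not the procedure from the paper. It assumes two raters, categories coded 0 to k-1, linear weights, and at least two distinct categories in use; the function names, the number of trials, and the small tolerance in the comparison are all hypothetical choices. One rater's ratings are shuffled repeatedly, and the upper-tail probability is estimated as the share of shuffles whose weighted kappa reaches the observed value.

```python
import random

def weighted_kappa(a, b, k):
    """Linearly weighted kappa for two lists of ratings coded 0..k-1."""
    n = len(a)
    # Observed weighted agreement over the n rated objects.
    observed = sum(1 - abs(x - y) / (k - 1) for x, y in zip(a, b)) / n
    # Chance agreement implied by the two raters' marginal distributions.
    expected = sum(1 - abs(x - y) / (k - 1) for x in a for y in b) / (n * n)
    return (observed - expected) / (1 - expected)

def resampling_p_value(a, b, k, trials=5000, seed=0):
    """Estimate the upper-tail probability of kappa by shuffling rater b."""
    rng = random.Random(seed)
    observed = weighted_kappa(a, b, k)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks the pairing, keeps both marginals fixed
        if weighted_kappa(a, shuffled, k) >= observed - 1e-12:
            hits += 1
    return observed, hits / trials

# Hypothetical ratings of 12 objects on a 3-point ordinal scale.
rater_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1, 0, 2]
rater_b = [0, 1, 1, 1, 2, 2, 0, 1, 2, 2, 0, 2]
kappa, p_value = resampling_p_value(rater_a, rater_b, k=3)
```

Shuffling keeps both raters' marginal frequencies fixed, which is the same conditioning an exact permutation test uses; the resampling version only replaces full enumeration with random draws.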

4.
Resampling probability values for weighted kappa with multiple raters (total citations: 1; self-citations: 0; citations by others: 1)
A new procedure to compute weighted kappa with multiple raters is described. A resampling procedure to compute approximate probability values for weighted kappa with multiple raters is presented. Applications of weighted kappa are illustrated with an example analysis of classifications by three independent raters.

5.
Some Paradoxical Results for the Quadratically Weighted Kappa (total citations: 1; self-citations: 0; citations by others: 1)
The quadratically weighted kappa is the most commonly used weighted kappa statistic for summarizing interrater agreement on an ordinal scale. The paper presents several properties of the quadratically weighted kappa that are paradoxical. For agreement tables with an odd number of categories n it is shown that if one of the raters uses the same base rates for categories 1 and n, categories 2 and n−1, and so on, then the value of quadratically weighted kappa does not depend on the value of the center cell of the agreement table. Since the center cell reflects the exact agreement of the two raters on the middle category, this result questions the applicability of the quadratically weighted kappa to agreement studies. If one wants to report a single index of agreement for an ordinal scale, it is recommended that the linearly weighted kappa be used instead of the quadratically weighted kappa.
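
For readers unfamiliar with the two weighting schemes, here is a minimal Python sketch of weighted kappa computed from a square agreement table of counts. It assumes the usual agreement weights, 1 - |i-j|/(n-1) for linear and 1 - (i-j)²/(n-1)² for quadratic; the function name and the example table are hypothetical and do not come from the paper.

```python
def weighted_kappa(table, scheme="linear"):
    """Cohen's weighted kappa for a square agreement table of counts."""
    n = len(table)
    total = sum(sum(row) for row in table)
    # Convert counts to proportions and compute each rater's base rates.
    p = [[cell / total for cell in row] for row in table]
    row_marg = [sum(row) for row in p]
    col_marg = [sum(p[i][j] for i in range(n)) for j in range(n)]

    def weight(i, j):
        d = abs(i - j) / (n - 1)
        return 1 - d if scheme == "linear" else 1 - d * d

    observed = sum(weight(i, j) * p[i][j] for i in range(n) for j in range(n))
    expected = sum(weight(i, j) * row_marg[i] * col_marg[j]
                   for i in range(n) for j in range(n))
    return (observed - expected) / (1 - expected)

# Hypothetical 3 x 3 table: rows are rater 1, columns are rater 2.
example = [[11, 3, 1],
           [2, 9, 4],
           [0, 3, 12]]
linear = weighted_kappa(example, "linear")
quadratic = weighted_kappa(example, "quadratic")
```

With only two categories both schemes reduce to the unweighted kappa, because every off-diagonal cell then has weight 0.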

6.
The kappa coefficient is one of the most widely used measures for evaluating the agreement between two raters asked to assign N objects to one of K nominal categories. Weighted versions of kappa enable partial credit to be awarded for near agreement, most notably in the case of ordinal categories. An exact significance test for weighted kappa can be conducted by enumerating all rater agreement tables with the same fixed marginal frequencies as the observed table, and accumulating the probabilities for all tables that produce a weighted kappa index that is greater than or equal to the observed measure. Unfortunately, complete enumeration of all tables is computationally unwieldy for modest values of N and K. We present an implicit enumeration algorithm for conducting an exact test of weighted kappa, which can be applied to tables of non-trivial size. The algorithm is particularly efficient for ‘good’ to ‘excellent’ values of weighted kappa that typically have very small p-values. Therefore, our method is beneficial for situations where resampling tests are of limited value because the number of trials needed to estimate the p-value tends to be large.

7.
When two (or more) observers are independently categorizing a set of observations, Cohen’s kappa has become the most notable measure of interobserver agreement. When the categories are ordinal, a weighted form of kappa becomes desirable. The two most popular weighting schemes are the quadratic weights and linear weights. Quadratic weights have been justified by the fact that the corresponding weighted kappa is asymptotically equivalent to an intraclass correlation coefficient. This paper deals with linear weights and shows that the corresponding weighted kappa is equivalent to the unweighted kappa when cumulative probabilities are substituted for probabilities. A numerical example is provided.

8.
Cohen's kappa is presently a standard tool for the analysis of agreement in a 2 × 2 reliability study, and weighted kappa is a standard statistic for summarizing a 2 × 2 validity study. The special cases of weighted kappa, for example Cohen's kappa, are chance-corrected measures of association. For various measures of 2 × 2 association it has been observed in the literature that, after correction for chance, they coincide with a special case of weighted kappa. This paper presents the general function, linear in both numerator and denominator, that becomes weighted kappa after correction for chance.

9.
Each of N judges independently assigns K distinct ranks to K objects. A method is described which provides the exact point probability, exact one-sided P value, and exact two-sided P value of the observed sum of N ranks for a specified object.
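
One way to obtain such exact probabilities can be sketched in Python as follows. This is not the method described in the paper: it assumes that, under the null hypothesis, each judge's rank for the specified object is uniform on 1 to K and independent across judges, and it defines the two-sided value as twice the smaller tail, capped at 1. Both the function names and that two-sided convention are assumptions made for illustration.

```python
from fractions import Fraction

def rank_sum_distribution(n_judges, k_objects):
    """Exact null distribution of one object's rank sum, as {sum: probability}."""
    dist = {0: Fraction(1)}
    step = Fraction(1, k_objects)  # each rank 1..K is equally likely
    for _ in range(n_judges):
        new = {}
        for total, prob in dist.items():
            for rank in range(1, k_objects + 1):
                new[total + rank] = new.get(total + rank, Fraction(0)) + prob * step
        dist = new
    return dist

def exact_probabilities(observed_sum, n_judges, k_objects):
    """Return (point probability, one-sided P, two-sided P) as exact fractions."""
    dist = rank_sum_distribution(n_judges, k_objects)
    point = dist.get(observed_sum, Fraction(0))
    lower = sum(p for s, p in dist.items() if s <= observed_sum)
    upper = sum(p for s, p in dist.items() if s >= observed_sum)
    one_sided = min(lower, upper)
    # Assumed convention: double the smaller tail and cap at 1.
    two_sided = min(Fraction(1), 2 * one_sided)
    return point, one_sided, two_sided

# Hypothetical case: 3 judges, 4 objects, the object received rank 1 each time.
point, one_sided, two_sided = exact_probabilities(3, 3, 4)
```

Using `Fraction` keeps every probability exact, at the cost of speed for large N and K.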

10.
Pingke Li. Psychometrika, 2016, 81(3): 795-801
The linearly and quadratically weighted kappa coefficients are popular statistics in measuring inter-rater agreement on an ordinal scale. It has been recently demonstrated that the linearly weighted kappa is a weighted average of the kappa coefficients of the embedded 2 by 2 agreement matrices, while the quadratically weighted kappa is insensitive to the agreement matrices that are row or column reflection symmetric. A rank-one matrix decomposition approach to the weighting schemes is presented in this note such that these phenomena can be demonstrated in a concise manner.

11.
One of the main objectives in meta-analysis is to estimate the overall effect size by calculating a confidence interval (CI). The usual procedure consists of assuming a standard normal distribution and a sampling variance defined as the inverse of the sum of the estimated weights of the effect sizes. But this procedure does not take into account the uncertainty due to the fact that the heterogeneity variance (τ²) and the within-study variances have to be estimated, leading to CIs that are too narrow with the consequence that the actual coverage probability is smaller than the nominal confidence level. In this article, the performances of 3 alternatives to the standard CI procedure are examined under a random-effects model and 8 different τ² estimators to estimate the weights: the t distribution CI, the weighted variance CI (with an improved variance), and the quantile approximation method (recently proposed). The results of a Monte Carlo simulation showed that the weighted variance CI outperformed the other methods regardless of the τ² estimator, the value of τ², the number of studies, and the sample size.
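
The "usual procedure" that the abstract criticizes can be sketched in Python as below. The abstract does not say which heterogeneity estimator the standard procedure uses, so the DerSimonian-Laird estimator here is an assumption, as are the function names and the example numbers; at least two studies are assumed. The sketch shows only the standard normal CI, not the three alternatives the article compares.

```python
from math import sqrt
from statistics import NormalDist

def dersimonian_laird_tau2(effects, variances):
    """Heterogeneity variance by the DerSimonian-Laird moment estimator."""
    w = [1 / v for v in variances]  # fixed-effect weights
    mean_fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - mean_fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

def standard_ci(effects, variances, level=0.95):
    """Random-effects estimate with the standard normal-based CI."""
    tau2 = dersimonian_laird_tau2(effects, variances)
    w = [1 / (v + tau2) for v in variances]  # estimated random-effects weights
    estimate = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = sqrt(1 / sum(w))  # sampling variance = inverse of the sum of weights
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, estimate - z * se, estimate + z * se

# Hypothetical effect sizes and within-study variances from four studies.
est, lower, upper = standard_ci([0.10, 0.35, 0.20, 0.60], [0.04, 0.02, 0.05, 0.03])
```

Because the weights are treated as known constants, the interval ignores the estimation error in the heterogeneity variance, which is the source of the undercoverage the abstract describes.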

12.
The log-linear model for contingency tables expresses the logarithm of a cell frequency as an additive function of main effects, interactions, etc., in a way formally identical with an analysis of variance model. Exact statistical tests are developed to test hypotheses that specific effects or sets of effects are zero, yielding procedures for exploring relationships among qualitative variables which are suitable for small samples. The tests are analogous to Fisher's exact test for a 2 × 2 contingency table. Given a hypothesis, the exact probability of the obtained table is determined, conditional on fixed marginals or other functions of the cell frequencies. The sum of the probabilities of the obtained table and of all less probable ones is the exact probability to be considered in testing the null hypothesis. Procedures for obtaining exact probabilities are explained in detail, with examples given.
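
The rule of summing the probability of the obtained table and of all less probable ones can be illustrated for the 2 × 2 case, Fisher's exact test, which the abstract names as the analogue. The Python sketch below covers only that special case, not the log-linear extensions the paper develops; the function name and the relative tolerance used to treat equal probabilities as ties are hypothetical choices.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided exact probability for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)  # smallest top-left count the margins allow
    hi = min(row1, col1)      # largest top-left count the margins allow
    p_obs = prob(a)
    # Sum the obtained table and every table that is no more probable.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical small-sample table.
p_value = fisher_exact_2x2(3, 1, 1, 3)
```

Only the top-left cell varies because fixing all four margins of a 2 × 2 table leaves a single free count.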

13.
When both the variances and the sample sizes (N) are unequal in a two-group design, the probability of a Type I error shifts from the nominal 5% error rate. The probability is too liberal when the small cell has the larger variance and too conservative when the large cell has the larger variance. We present an algorithm to circumvent the problem when the smaller group has the larger variance and show, by simulation, that the algorithm brings the error rate back to the nominal value without sacrificing the ability to detect true effects.

14.
This study investigated the short-term stability of the 1991 Mirowsky-Ross 2 × 2 Index of the Sense of Control. From an ongoing longitudinal study, 304 subjects were randomly selected for test-retest interviews occurring 1 to 4 days after their regularly scheduled first follow-up interview. Test-retest reliability was assessed at the item level using percent agreement and weighted kappa. At the scale score level, reliability was assessed with the intraclass correlation coefficient (ICC). ICCs were also calculated within categories of demographic, socioeconomic, psychosocial, and functional status characteristics. There was moderate to substantial item-level agreement (mean weighted kappa = .51; weighted kappa range = .38 to .66). At the scale score level there was substantial agreement (ICC = .71). No appreciable differences in ICC values were found in the demographic, socioeconomic, psychosocial, and functional comparisons of status characteristics. Thus, this sense of control measure has acceptable test-retest reliability and is appropriate for use in longitudinal research.

15.
Little attention has been paid to evaluating the use of DSM-III-R with preschool children. Children (N = 510) ages 2 to 5 years who were screened at the time of a pediatric visit were selected to participate in an evaluation which included questionnaires, a semistructured interview, developmental testing, and a play observation. Following the evaluation, two clinical child psychologists independently assigned DSM-III-R diagnoses. For each diagnostic category, kappa and Y coefficients were calculated; Y coefficients are less sensitive to base rates of disorders. For overall agreement, the weighted mean kappa (.61) and mean Y (.66) were moderately high. Overall agreement that the child had at least one of the disruptive disorders was substantial (kappa = .64; Y = .65); agreement that there was at least one of the emotional disorders was moderate for kappa (.54), but substantial for Y (.70). Kappa coefficients were higher for major categories of disorder than for specific disorders; however, Y coefficients did not show a decline for specific disorders. Interrater reliability of DSM-III-R appears to be similar for preschoolers and older children. This study was supported by grant MH46089 from the National Institute of Mental Health. A preliminary report was presented at the Fifth Annual NIMH International Research Conference on the Classification and Treatment of Mental Disorders in General Medical Settings, Bethesda, Maryland, September 1991. We gratefully acknowledge the members of the Pediatric Practice Research Group who participated in this study.

16.
“Virtual Information Systems” (VIS), with probability measures outside the standard range [0,1], emerging at nonequilibrium phase transitions, may be the substrate mechanism underlying the human acquisition of information.

17.
Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas (total citations: 1; self-citations: 0; citations by others: 1)
An agreement table with n∈ℕ≥3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
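
The weighted-average interpretation can be checked numerically. The Python sketch below assumes linear agreement weights 1 - |i-j|/(n-1) and a hypothetical 3 × 3 table of counts; all function names are illustrative. Each 2 × 2 table is formed by splitting the ordered categories at one cut point, and each 2 × 2 kappa is weighted by its own denominator, 1 minus its chance agreement.

```python
def agreement_parts(table, weight):
    """Return (observed, expected) weighted agreement for a square count table."""
    n = len(table)
    total = sum(map(sum, table))
    rows = [sum(r) / total for r in table]
    cols = [sum(table[i][j] for i in range(n)) / total for j in range(n)]
    po = sum(weight(i, j) * table[i][j] / total for i in range(n) for j in range(n))
    pe = sum(weight(i, j) * rows[i] * cols[j] for i in range(n) for j in range(n))
    return po, pe

def linear_weighted_kappa(table):
    n = len(table)
    po, pe = agreement_parts(table, lambda i, j: 1 - abs(i - j) / (n - 1))
    return (po - pe) / (1 - pe)

def collapse(table, cut):
    """Merge categories 0..cut-1 and cut..n-1 into a 2 x 2 table."""
    out = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, cell in enumerate(row):
            out[i >= cut][j >= cut] += cell  # booleans index as 0 or 1
    return out

def average_of_collapsed_kappas(table):
    num = den = 0.0
    for cut in range(1, len(table)):
        po, pe = agreement_parts(collapse(table, cut), lambda i, j: float(i == j))
        num += po - pe  # equals (1 - pe) times the kappa of this 2 x 2 table
        den += 1 - pe   # the weight: the denominator of the 2 x 2 kappa
    return num / den

# Hypothetical table; the two values agree up to floating-point error.
example = [[11, 3, 1], [2, 9, 4], [0, 3, 12]]
direct = linear_weighted_kappa(example)
averaged = average_of_collapsed_kappas(example)
```

The identity holds because |i-j| counts exactly the cut points that separate categories i and j, so the linearly weighted disagreement is the mean of the n-1 collapsed disagreements.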

18.
This study investigated the concurrent validity of the Comprehensive Developmental Inventory for Infants and Toddlers (CDIIT) with the Bayley Scales of Infant Development-II (BSID-II) in full-term infants. A total of 106 full-term infants aged 6 to 18 months (63 boys, 43 girls) were recruited as a convenience sample. One tester administered the CDIIT and BSID-II to all children. The Developmental Ages and Developmental Quotients of the motor and the mental scales from both tests were analyzed with Pearson correlations and quadratic weighted kappa tests. The results showed that correlation coefficients for Developmental Ages between both tests on cognitive and motor subtests were high (r = .91-.95) and for Developmental Quotients were moderate (r = .57-.67). Moderate classification agreement was found in the two scales (quadratic weighted kappa = .50-.53). Developmental Quotients classification for the CDIIT tended to be a little higher than for the BSID-II. It was concluded that although acceptable concurrent validity was found for the Motor and Cognitive subtests of the CDIIT, the tester should be cautious when comparing Developmental Quotients obtained from the two tests in clinical or research settings.

19.
The kappa agreement coefficients of Cohen from 1960 and of Brennan and Prediger from 1981 are defined and compared. A FORTRAN program is described that computes Cohen's kappa and Brennan and Prediger's kappa and their associated probability values based on Monte Carlo resampling and the binomial distribution, respectively.
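
The two coefficients differ only in how chance agreement is defined, which the following Python sketch makes explicit. It is not a port of the FORTRAN program and does not compute probability values; the function name and the example table are hypothetical. Cohen's kappa takes chance agreement from the product of the two raters' marginal proportions, whereas Brennan and Prediger's kappa fixes it at 1/K for K categories.

```python
def cohen_and_bp_kappa(table):
    """Return (Cohen's kappa, Brennan-Prediger kappa) for a K x K count table."""
    k = len(table)
    total = sum(map(sum, table))
    # Observed proportion of exact agreement (the diagonal).
    po = sum(table[i][i] for i in range(k)) / total
    # Cohen: chance agreement from the raters' own marginal proportions.
    pe = sum(sum(table[i]) * sum(row[i] for row in table) for i in range(k)) / total ** 2
    cohen = (po - pe) / (1 - pe)
    # Brennan-Prediger: chance agreement fixed at 1/K.
    bp = (po - 1 / k) / (1 - 1 / k)
    return cohen, bp

# Hypothetical table with very unequal base rates.
cohen, bp = cohen_and_bp_kappa([[70, 10], [10, 10]])
```

When both raters use all categories equally often, the two definitions of chance agreement coincide and so do the coefficients.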

20.
Niemelä, P. Electrodermal responses as a function of quantified threat. Scand. J. Psychol., 1969, 10, 19–56. Threat was quantified by means of a 20-sec anticipation condition where the subject knew the exact time for the expected shock in advance, as well as the exact probability of receiving a shock. The probabilities used were 0.00, 0.25, 0.50, 0.75 and 1.00. The electrodermal responses (the amplitude of the skin resistance response, the amplitude of the skin potential response, and the number of electrodermal responses) during the anticipation period were found to vary systematically as functions of the probability of shock, and as functions of the time elapsed from the beginning of the experiment.
