期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Back to basics: Percentage agreement measures are adequate, but there are easier ways

Birkimer JC Brown JH 《Journal of applied behavior analysis》1979,12(4):535-543

Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, and on which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given to permit checking to see if obtained disagreements are unlikely due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary. If reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement. 相似文献

2.

A graphical judgmental aid which summarizes obtained and chance reliability data and helps assess the believability of experimental effects

Birkimer JC Brown JH 《Journal of applied behavior analysis》1979,12(4):523-533

Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not. 相似文献

3.

Applied behavior analysis and interobserver reliability: A commentary on two articles by Birkimer and Brown

Hawkins RP Fabry BD 《Journal of applied behavior analysis》1979,12(4):545-552

相似文献

4.

Proposed conventions for evaluating observer reliability: A commentary on two articles by Birkimer and Brown

Hopkins BL 《Journal of applied behavior analysis》1979,12(4):561-564

相似文献

5.

On the not so recent invention of interobserver reliability: A commentary on two articles by Birkimer and Brown

Hartmann DP Gardner W 《Journal of applied behavior analysis》1979,12(4):559-560

相似文献

6.

Just because it's reliable doesn't mean it's believable: A commentary on two articles by Birkimer and Brown

Kratochwill TR 《Journal of applied behavior analysis》1979,12(4):553-557

相似文献

7.

A cautionary note on the use of probability values to evaluate interobserver agreement

Hartmann DP Gardner W 《Journal of applied behavior analysis》1982,15(1):189-190

Proposed methods of assessing the statistical significance of interobserver agreements provide erroneous probability values when conducted on serially correlated data. Investigators who wish to evaluate interobserver agreements by means of statistical significance can do so by limiting the analysis to every k(th) interval of data, or by using Markovian techniques which accommodate serial correlations. 相似文献

8.

Evaluating interobserver reliability of interval data

Hopkins BL Hermann JA 《Journal of applied behavior analysis》1977,10(1):121-126

Previous recommendations to employ occurrence, nonoccurrence, and overall estimates of interobserver reliability for interval data are reviewed. A rationale for comparing obtained reliability to reliability that would result from a random-chance model is explained. Formulae and graphic functions are presented to allow for the determination of chance agreement for each of the three indices, given any obtained per cent of intervals in which a response is recorded to occur. All indices are interpretable throughout the range of possible obtained values for the per cent of intervals in which a response is recorded. The level of chance agreement simply changes with changing values. Statistical procedures that could be used to determine whether obtained reliability is significantly superior to chance reliability are reviewed. These procedures are rejected because they yield significance levels that are partly a function of sample sizes and because there are no general rules to govern acceptable significance levels depending on the sizes of samples employed. 相似文献

9.

Effects of the use of percentage agreement on behavioral observation reliabilities: A reassessment

Hoi K. Suen Patrick S. C. Lee 《Journal of psychopathology and behavioral assessment》1985,7(3):221-234

The percentage agreement index has been and continues to be a popular measure of interobserver reliability in applied behavior analysis and child development, as well as in other fields in which behavioral observation techniques are used. An algebraic method and a linear programming method were used to assess chance-corrected reliabilities for a sample of past observations in which the percentage agreement index was used. The results indicated that, had kappa been used instead of percentage agreement, between one-fourth and three-fourth of the reported observations could be judged as unreliable against a lenient criterion and between one-half and three-fourths could be judged as unreliable against a more stringent criterion. It is suggested that the continued use of the percentage agreement index has seriously undermined the reliabilities of past observations and can no longer be justified in future studies. 相似文献

10.

Why the "I've got a better agreement measure" literature continues to grow: A commentary on two articles by Birkimer and Brown

Cone JD 《Journal of applied behavior analysis》1979,12(4):571-571

相似文献

11.

A review of the observational data-collection and reliability procedures reported in The Journal of Applied Behavior Analysis

Kelly MB 《Journal of applied behavior analysis》1977,10(1):97-101

The research published in the Journal of Applied Behavior Analysis (1968 to 1975) was surveyed for three basic elements: data-collection methods, reliability procedures, and reliability scores. Three-quarters of the studies reported observational data. Most of these studies' observational methods were variations of event recording, trial scoring, interval recording, or time-sample recording. Almost all studies reported assessment of observer reliability, usually total or point-by-point percentage agreement scores. About half the agreement scores were consistently above 90%. Less than one-quarter of the studies reported that reliability was assessed at least once per condition. 相似文献

12.

Interobserver Agreement in Behavioral Research: Importance and Calculation

Marley W. Watkins Miriam Pacheco 《Journal of Behavioral Education》2000,10(4):205-212

Behavioral researchers have developed a sophisticated methodology to evaluate behavioral change which is dependent upon accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties, such as interobserver agreement, of observational measures to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that it is not the most psychometrically sound statistic to determine interobserver agreement due to its inability to take chance into account. Cohen's (1960) kappa has long been proposed as the more psychometrically sound statistic for assessing interobserver agreement. Kappa is described and computational methods are presented. 相似文献

13.

Percentage agreement and phi: A conversion table

Lewin LM Wakefield JA 《Journal of applied behavior analysis》1979,12(2):299-301

STUDIES IN APPLIED BEHAVIOR ANALYSIS HAVE USED TWO EXPRESSIONS OF RELIABILITY FOR HUMAN OBSERVATIONS: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice-versa, is presented. It is suggested that both expressions be reported in order to communicate reliability unambiguously and to facilitate comparison of the reliabilities from different studies. 相似文献

14.

The effect of witnessing consequences on the behavioral recordings of experimental observers

Harris FC Ciminero AR 《Journal of applied behavior analysis》1978,11(4):513-521

The cueing effects of interviewer praise contingent on a target behavior and expectation of behavior change were examined with six observers. Experiment I investigated the effect of cues in conjunction with expectation. Experiment II assessed the relative contributions of cues and expectation, and Experiment III examined the effect of cues in the absence of expectation. The frequencies of two behaviors, client eye contact and face touching, were held constant throughout a series of videotaped interviews between an "interviewer" and a "client". A within-subjects design was used in each experiment. During baseline conditions, praise did not follow eye contact by the client on the videotape. In all experimental conditions, praise statements from the interviewer followed each occurrence of eye contact with an equal number of praises delivered at random times when there was no eye contact. Three of the six observers dramatically increased their recordings of eye contact during the first experimental phase, but these increases were not replicated in a second praise condition. There were no systematic changes in recorded face touching. Witnessing the delivery of consequences, rather than expectation seemed to be responsible for the effect. This potential threat to the internal validity of studies using observational data may go undetected by interobserver agreement checks. 相似文献

15.

Reliability and expected loss: A unifying principle

Bruce Cooil Roland T. Rust 《Psychometrika》1994,59(2):203-216

We provide a unified, theoretical basis on which measures of data reliability may be derived or evaluated, for both quantitative and qualitative data. This approach evaluates reliability as the proportional reduction in loss (PRL) that is attained in a sample by an optimal estimator. The resulting measure is between 0 and 1, linearly related to expected loss, and provides a direct way of contrasting the measured reliability in the sample with the least reliable and most reliable data-generating cases. The PRL measure is a generalization of many of the commonly-used reliability measures.We show how the quantitative measures from generalizability theory can be derived as PRL measures (including Cronbach's alpha and measures proposed by Winer). For categorical data, we develop a new measure for the general case in which each of N judges assigns a subject to one of K categories and show that it is equivalent to a measure proposed by Perreault and Leigh for the case where N is 2.Bruce Cooil is an Associate Professor of Statistics, and Roland T. Rust is a Professor and area head for Marketing. The authors thank three anonymous reviewers and an Associate Editor for their helpful comments and suggestions. This work was supported in part by the Dean's Fund for Faculty Research of the Owen Graduate School of Management, Vanderbilt University. 相似文献

16.

Measures of interobserver agreement: Calculation formulas and distribution effects

Alvin Enis House Betty J. House Martha B. Campbell 《Journal of psychopathology and behavioral assessment》1981,3(1):37-57

Seventeen measures of association for observer reliability (interobserver agreement) are reviewed and computational formulas are given in a common notational system. An empirical comparison of 10 of these measures is made over a range of potential reliability check results. The effects on percentage and correlational measures of occurrence frequency, error frequency, and error distribution are examined. The question of which is the best measure of interobserver agreement is discussed in terms of critical issues to be considered 相似文献

17.

信度的再认识与信度概括化研究

关丹丹张厚粲《心理科学》2004,27(2):445-448

本文首先对信度概念进行了明确,指出信度是评价测验结果可靠与否的一个指标,而不是测验工具的不变属性。针对测验结果的信度估计的可变性,介绍了上世纪末Vacha-Haase提出的信度概括化研究方法．即一种用来探索得分信度估计的可变性、并对引起变异的预测源进行探讨的一种元分析方法。最后通过对信度概括化研究手段的分析,指出信度概念的再认识与信度概括化研究将会给心理测验工作者带来新的启示。相似文献

18.

The effects of instructions and calculation procedures on observers' accuracy, agreement, and calculation correctness

Boykin RA Nelson RO 《Journal of applied behavior analysis》1981,14(4):479-489

Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions. 相似文献

19.

A Breakdown of Reliability Coefficients by Test Type and Reliability Method,and the Clinical Implications of Low Reliability

Richard A. Charter 《The Journal of general psychology》2013,140(3):290-304

相似文献

20.

Testing the significance of interobserver agreement measures in the presence of autocorrelation: The jackknife procedure

Stephen V. Faraone Donald D. Dorfman 《Journal of psychopathology and behavioral assessment》1988,10(1):39-47

Users of interobserver agreement statistics have heretofore ignored the problem of autocorrelation in behavior sequences when testing the statistical significance of agreement measures. Due to autocorrelation traditional reliability tests based on the 2 × 2 contingency-table model (e.g., kappa, phi) are incorrect. Correct tests can be developed by using the bivariate time series as a statistical model. Seen from this perspective, testing the significance of interobserver agreement becomes formally equivalent to testing the significance of the lag-zero cross-correlation between two time series. The robust procedure known as the jackknife is suggested for this purpose. 相似文献