Similar Articles
20 similar articles found (search time: 15 ms)
1.
Percentage agreement measures of interobserver agreement, or "reliability," have traditionally been used to summarize observer agreement in studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether these percentage agreement measures should continue to be used, which ones to use, and what to do about chance agreement if they are retained. Much of the disagreement stems from the need to be reasonably certain that we do not accept, as evidence of true interobserver agreement, agreement levels that could quite plausibly arise from chance agreement between observers. The various percentage agreement measures are shown to be adequate to this task, but easier approaches are also discussed. Tables are provided for checking whether the obtained disagreements are unlikely to be due to chance. Particularly important is a simple rule that, when met, makes the tables unnecessary: if reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is very unlikely to be the result of chance agreement.
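A chance check of this kind can be sketched with a binomial model. The minimal Python sketch below assumes (our assumption, not necessarily the model behind the article's tables) that two observers score each occasion independently at fixed rates, so the chance probability of disagreeing on one occasion is p_a(1 - p_b) + (1 - p_a)p_b. The function names are hypothetical. Because the chance model may differ from the article's, the resulting tail probabilities need not match its tables or its 50-occasion, 10%-disagreement rule.

```python
from math import comb


def chance_disagreement_rate(p_a: float, p_b: float) -> float:
    """Probability that two independent observers disagree on one occasion.

    Assumption: observer A scores an occurrence with probability p_a and
    observer B with probability p_b, independently of each other.
    """
    return p_a * (1 - p_b) + (1 - p_a) * p_b


def prob_at_most_disagreements(d: int, n: int, q: float) -> float:
    """P(X <= d) for X ~ Binomial(n, q): chance of d or fewer disagreements."""
    return sum(comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(d + 1))


if __name__ == "__main__":
    n_occasions = 50
    max_disagreements = 5  # 10% of 50 occasions
    for rate in (0.1, 0.3, 0.5, 0.7, 0.9):
        q = chance_disagreement_rate(rate, rate)
        p = prob_at_most_disagreements(max_disagreements, n_occasions, q)
        print(f"behavior rate {rate:.1f}: P(<= {max_disagreements} chance disagreements) = {p:.2e}")
```

Under this independence model, the chance probability at the 10% and 90% rates is far larger than at 50%. This is one reason to treat the sketch as illustrative rather than as a reproduction of the article's rule.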

2.
Interval-by-interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored-interval reliability and its converse, unscored-interval reliability, however, vary with the target behavior rate even when the observer disagreement rate is held constant. These problems, together with the existence of "chance" values for each reliability that also vary as a function of response rate, can make observer agreement measures difficult for researchers and consumers to interpret. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and that its magnitude relative to occurrence and nonoccurrence agreements then be considered. This is easily done by graphing the disagreement range as a bandwidth around the reported rates of target behavior. Such a graph summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth makes it easy to determine whether true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: effects whose disagreement ranges do not overlap are probably believable; others are not.
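Here is a minimal Python sketch of the bandwidth idea, based on our own reading of the abstract. For one reliability check, the band runs from the proportion of intervals that both observers scored (lower edge) to the proportion that either observer scored (upper edge), so its width is the disagreement rate. All three reliabilities can then be read off the two edges. The class and function names are hypothetical, and plotting is omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class IntervalCounts:
    """Two-observer interval tallies from one reliability check."""
    both: int     # intervals both observers scored (occurrence agreements)
    only_a: int   # scored by observer A only (disagreement)
    only_b: int   # scored by observer B only (disagreement)
    neither: int  # intervals neither observer scored (nonoccurrence agreements)

    @property
    def total(self) -> int:
        return self.both + self.only_a + self.only_b + self.neither


def disagreement_band(c: IntervalCounts) -> Tuple[float, float]:
    """Lower edge = proportion scored by both; upper edge = proportion scored by either."""
    n = c.total
    return c.both / n, (c.both + c.only_a + c.only_b) / n


def reliabilities_from_band(lower: float, upper: float) -> Dict[str, float]:
    """Read the three reliabilities off the band edges (proportions, 0..1)."""
    disagreement = upper - lower
    return {
        "interval_by_interval": 1 - disagreement,
        # occurrence agreements / (occurrence agreements + disagreements)
        "scored": lower / upper if upper > 0 else math.nan,
        # nonoccurrence agreements / (nonoccurrence agreements + disagreements)
        "unscored": (1 - upper) / (1 - lower) if lower < 1 else math.nan,
    }


if __name__ == "__main__":
    # Same 10% disagreement rate at two different behavior rates.
    for counts in (IntervalCounts(5, 5, 5, 85), IntervalCounts(45, 5, 5, 45)):
        lo, hi = disagreement_band(counts)
        print(counts, (lo, hi), reliabilities_from_band(lo, hi))
```

The two example checks share a 10% disagreement rate, so both give interval-by-interval reliability of 0.90. Their scored-interval reliability, however, is about 0.33 versus 0.82, which is the rate dependence the abstract describes.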

3.
4.
5.
6.
7.
Proposed methods of assessing the statistical significance of interobserver agreement yield erroneous probability values when applied to serially correlated data. Investigators who wish to evaluate interobserver agreement by means of statistical significance can do so by limiting the analysis to every kth interval of data, or by using Markovian techniques that accommodate serial correlation.
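The every-kth-interval approach can be sketched in Python as follows. The sketch estimates the lag-k autocorrelation of one observer's 0/1 interval record and then thins both records to every kth interval. The rule for choosing k (the smallest lag whose autocorrelation magnitude falls below a threshold) and the threshold value are hypothetical choices of ours, not taken from the article. The Markovian alternative is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def lag_autocorrelation(x: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of one observer's 0/1 interval record at a given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant record")
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def choose_k(x: Sequence[float], threshold: float = 0.2) -> Optional[int]:
    """Hypothetical rule: smallest lag whose |autocorrelation| falls below threshold."""
    for lag in range(1, len(x) // 2 + 1):
        if abs(lag_autocorrelation(x, lag)) < threshold:
            return lag
    return None


def every_kth(obs_a: Sequence[int], obs_b: Sequence[int], k: int,
              start: int = 0) -> Tuple[List[int], List[int]]:
    """Keep every kth interval from two aligned interval records."""
    if len(obs_a) != len(obs_b):
        raise ValueError("records must cover the same intervals")
    if k < 1:
        raise ValueError("k must be at least 1")
    return list(obs_a[start::k]), list(obs_b[start::k])


if __name__ == "__main__":
    a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    b = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
    k = choose_k(a)
    print("k =", k, "thinned records:", every_kth(a, b, k))
```

Any significance test would then be run on the thinned records only, which trades sample size for approximate independence between intervals.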

8.
9.
Previous recommendations to use occurrence, nonoccurrence, and overall estimates of interobserver reliability for interval data are reviewed. A rationale for comparing obtained reliability with the reliability that would result from a random-chance model is explained. Formulas and graphic functions are presented for determining chance agreement on each of the three indices, given any obtained percentage of intervals in which a response is recorded as occurring. All three indices are interpretable across the full range of possible values for that percentage; the level of chance agreement simply changes as the percentage changes. Statistical procedures that could be used to determine whether obtained reliability is significantly greater than chance reliability are reviewed. These procedures are rejected because the significance levels they yield depend partly on sample size, and because there are no general rules for setting acceptable significance levels for different sample sizes.
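The abstract does not give its formulas. The Python sketch below derives chance values for the three indices from an independence assumption that we supply: each observer records an occurrence in any interval with a fixed probability, independently of the other observer. When only one percentage is given, both observers are assumed to record at that rate. The function name is hypothetical.

```python
import math
from typing import Dict, Optional


def chance_agreement(p_a: float, p_b: Optional[float] = None) -> Dict[str, float]:
    """Chance values of the three interval-reliability indices.

    Assumption: observers A and B record an occurrence in an interval
    independently, with probabilities p_a and p_b. If p_b is omitted,
    both observers are assumed to record at the same rate.
    """
    if p_b is None:
        p_b = p_a
    both = p_a * p_b                  # chance occurrence agreement
    neither = (1 - p_a) * (1 - p_b)   # chance nonoccurrence agreement
    return {
        "overall": both + neither,
        # occurrence agreements / (all intervals except nonoccurrence agreements)
        "occurrence": both / (1 - neither) if neither < 1 else math.nan,
        # nonoccurrence agreements / (all intervals except occurrence agreements)
        "nonoccurrence": neither / (1 - both) if both < 1 else math.nan,
    }


if __name__ == "__main__":
    for pct in (10, 30, 50, 70, 90):
        print(pct, chance_agreement(pct / 100))
```

At a 10% recording rate, for example, the model gives chance overall agreement of 0.82 but chance occurrence agreement of only about 0.05. This illustrates the abstract's point that chance levels shift with the recorded percentage.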

10.
The percentage agreement index has been, and continues to be, a popular measure of interobserver reliability in applied behavior analysis and child development, as well as in other fields that use behavioral observation techniques. An algebraic method and a linear programming method were used to assess chance-corrected reliabilities for a sample of past observations in which the percentage agreement index had been used. The results indicated that, had kappa been used instead of percentage agreement, between one-fourth and three-fourths of the reported observations could be judged unreliable against a lenient criterion, and between one-half and three-fourths against a more stringent criterion. It is suggested that the continued use of the percentage agreement index has seriously undermined the reliability of past observations and can no longer be justified in future studies.

11.
The research published in the Journal of Applied Behavior Analysis (1968 to 1975) was surveyed for three basic elements: data-collection methods, reliability procedures, and reliability scores. Three-quarters of the studies reported observational data. Most of these studies used observational methods that were variations of event recording, trial scoring, interval recording, or time-sample recording. Almost all studies reported an assessment of observer reliability, usually as total or point-by-point percentage agreement scores. About half the agreement scores were consistently above 90%. Fewer than one-quarter of the studies reported assessing reliability at least once per condition.
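The two agreement scores named above are computed differently, and the difference can matter. The Python sketch below uses the common textbook definitions, which we assume here because the abstract does not define them. Total agreement divides the smaller session count by the larger; point-by-point agreement compares the two records interval by interval. The function names and data are hypothetical.

```python
from typing import Sequence


def total_agreement(count_a: int, count_b: int) -> float:
    """Total (session-level) agreement: smaller count / larger count, as a percentage."""
    if count_a == count_b == 0:
        return 100.0
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


def point_by_point_agreement(rec_a: Sequence[int], rec_b: Sequence[int]) -> float:
    """Point-by-point agreement: agreements / (agreements + disagreements) * 100."""
    if len(rec_a) != len(rec_b) or not rec_a:
        raise ValueError("records must be non-empty and the same length")
    agreements = sum(a == b for a, b in zip(rec_a, rec_b))
    return 100.0 * agreements / len(rec_a)


if __name__ == "__main__":
    a = [1, 0, 1, 0, 1, 0]
    b = [0, 1, 0, 1, 0, 1]
    print(total_agreement(sum(a), sum(b)), point_by_point_agreement(a, b))
```

In the example, both observers record three occurrences, so total agreement is 100%. Yet they never score the same interval, so point-by-point agreement is 0%.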

12.
Behavioral researchers have developed a sophisticated methodology for evaluating behavioral change that depends on accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties of observational measures, such as interobserver agreement, to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that, because it does not take chance agreement into account, it is not the most psychometrically sound statistic for determining interobserver agreement. Cohen's (1960) kappa has long been proposed as a more psychometrically sound statistic for assessing interobserver agreement. Kappa is described, and computational methods are presented.
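Cohen's kappa compares observed agreement p_o with the agreement p_e expected from each observer's marginal proportions: kappa = (p_o - p_e) / (1 - p_e). Here is a minimal Python sketch of that standard formula for categorical codes. The function name and the example data are ours, not the article's.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rec_a: Sequence[Hashable], rec_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two observers' categorical codes of the same units."""
    if len(rec_a) != len(rec_b) or not rec_a:
        raise ValueError("records must be non-empty and the same length")
    n = len(rec_a)
    p_observed = sum(a == b for a, b in zip(rec_a, rec_b)) / n
    counts_a, counts_b = Counter(rec_a), Counter(rec_b)
    # Expected chance agreement from each observer's marginal proportions.
    categories = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    a = [1] * 9 + [0]
    b = [1] * 8 + [0, 1]
    print(cohens_kappa(a, b))  # 80% agreement, but kappa is slightly negative
```

In the example, both observers score occurrence in 90% of intervals, so chance alone predicts 82% agreement. The observed 80% agreement is therefore slightly worse than chance, and kappa is about -0.11.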

13.
Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice versa, is presented. It is suggested that both expressions be reported, to communicate reliability unambiguously and to facilitate comparison of reliabilities across studies.
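The article's conversion table is not reproduced here. The Python sketch below computes both expressions from the same 2x2 table of interval counts rather than converting one into the other, because the abstract does not state what else the conversion assumes. The function names and example counts are hypothetical.

```python
from math import sqrt


def percentage_agreement(a: int, b: int, c: int, d: int) -> float:
    """Overall percentage agreement from a 2x2 table of interval counts."""
    return 100.0 * (a + d) / (a + b + c + d)


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient for a 2x2 table.

    a = both observers scored occurrence, b = observer A only,
    c = observer B only, d = both scored nonoccurrence.
    """
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when any row or column total is zero")
    return (a * d - b * c) / denom


if __name__ == "__main__":
    table = (20, 5, 5, 70)
    print(percentage_agreement(*table), round(phi_coefficient(*table), 3))
```

Reporting both figures, as the abstract suggests, shows that 90% agreement here corresponds to a phi of only about 0.73.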

14.
Visual analysis is integral to the analysis of single-case experimental design (SCED) data. Previous studies have shown that many factors may influence the interrater agreement (IRA) of visual analysis. One factor that has received little direct attention is the impact of contextual information. In the current study, authors of recently published SCED studies were asked to judge whether functional relations were present in published datasets that met design-quality criteria. Respondents were randomly assigned to view graphs with or without contextual information, and the degree of interrater agreement was compared between the two groups. Results revealed that contextual information had no impact on IRA for decisions about functional relations. IRA was high in both groups for 6 of the 7 datasets examined. Implications and recommendations based on these results are discussed.

15.
The cueing effects of interviewer praise contingent on a target behavior, and of expectations of behavior change, were examined with six observers. Experiment I investigated the effect of cues in conjunction with expectation. Experiment II assessed the relative contributions of cues and expectation, and Experiment III examined the effect of cues in the absence of expectation. The frequencies of two behaviors, client eye contact and face touching, were held constant throughout a series of videotaped interviews between an "interviewer" and a "client." A within-subjects design was used in each experiment. During baseline conditions, praise did not follow the client's eye contact on the videotape. In all experimental conditions, a praise statement from the interviewer followed each occurrence of eye contact, and an equal number of praise statements was delivered at random times when there was no eye contact. Three of the six observers dramatically increased their recordings of eye contact during the first experimental phase, but these increases were not replicated in a second praise condition. There were no systematic changes in recorded face touching. Witnessing the delivery of consequences, rather than expectation, seemed to be responsible for the effect. This potential threat to the internal validity of studies using observational data may go undetected by interobserver agreement checks.

16.
We provide a unified theoretical basis on which measures of data reliability may be derived or evaluated, for both quantitative and qualitative data. This approach evaluates reliability as the proportional reduction in loss (PRL) that an optimal estimator attains in a sample. The resulting measure lies between 0 and 1, is linearly related to expected loss, and provides a direct way of contrasting the reliability measured in the sample with the least reliable and most reliable data-generating cases. The PRL measure generalizes many commonly used reliability measures. We show how the quantitative measures from generalizability theory, including Cronbach's alpha and measures proposed by Winer, can be derived as PRL measures. For categorical data, we develop a new measure for the general case in which each of N judges assigns a subject to one of K categories, and show that it is equivalent to a measure proposed by Perreault and Leigh for the case where N is 2.
Bruce Cooil is an Associate Professor of Statistics, and Roland T. Rust is a Professor and area head for Marketing. The authors thank three anonymous reviewers and an Associate Editor for their helpful comments and suggestions. This work was supported in part by the Dean's Fund for Faculty Research of the Owen Graduate School of Management, Vanderbilt University.
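Cronbach's alpha is one of the measures the abstract says can be derived as a PRL measure. The Python sketch below computes alpha with its usual variance formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). It does not go through the PRL derivation, which the abstract does not spell out. Rows are subjects and columns are items or judges; the function name is ours.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are subjects, columns are items (or judges)."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least two columns")
    # Population variances are used throughout; the n vs n - 1 choice cancels in the ratio.
    item_variance = sum(pvariance(col) for col in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - item_variance / total_variance)


if __name__ == "__main__":
    print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # perfectly consistent judges
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # partly inconsistent judges
```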

17.
Seventeen measures of association for observer reliability (interobserver agreement) are reviewed, and computational formulas are given in a common notation. An empirical comparison of 10 of these measures is made over a range of potential reliability check results. The effects of occurrence frequency, error frequency, and error distribution on percentage and correlational measures are examined. The question of which measure of interobserver agreement is best is discussed in terms of the critical issues to be considered.
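The effect of error distribution can be illustrated with a small Python sketch. It holds the number of errors fixed and changes only how they are split between observers. For illustration we chose Yule's Q, an association measure for 2x2 tables; the abstract does not list which seventeen measures are reviewed, so it is not claimed to be one of them. The function names and counts are hypothetical.

```python
def percentage_agreement(a: int, b: int, c: int, d: int) -> float:
    """Overall percentage agreement from a 2x2 table of interval counts."""
    return 100.0 * (a + d) / (a + b + c + d)


def yules_q(a: int, b: int, c: int, d: int) -> float:
    """Yule's Q = (ad - bc) / (ad + bc) for a 2x2 table.

    a = both scored occurrence, b = observer A only,
    c = observer B only, d = both scored nonoccurrence.
    """
    denom = a * d + b * c
    if denom == 0:
        raise ValueError("Yule's Q is undefined when ad + bc is zero")
    return (a * d - b * c) / denom


if __name__ == "__main__":
    one_sided = (20, 10, 0, 70)  # all 10 errors: observer A scores, B does not
    split = (20, 5, 5, 70)       # the same 10 errors split between observers
    for table in (one_sided, split):
        print(table, percentage_agreement(*table), round(yules_q(*table), 3))
```

Percentage agreement is 90% in both cases. Yule's Q, however, is 1.0 when all errors fall in one cell and about 0.965 when they are split, so the two kinds of measure respond differently to how errors are distributed.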

18.
Guan Dandan (关丹丹), Zhang Houcan (张厚粲). Psychological Science (《心理科学》), 2004, 27(2): 445-448
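This article first clarifies the concept of reliability, pointing out that reliability is an index for evaluating whether test results are dependable, not an invariant property of the test instrument. Because reliability estimates for test results vary, it introduces the reliability generalization method proposed by Vacha-Haase at the end of the last century: a meta-analytic method for exploring the variability of score reliability estimates and examining the predictors that give rise to that variability. Finally, through an analysis of reliability generalization research methods, it argues that rethinking the concept of reliability and conducting reliability generalization studies will offer new insights to psychological test practitioners.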

19.
A variable message sign (VMS), a key component of intelligent transportation systems, is frequently used in the management of urban roads and motorways to provide drivers with real-time information about traffic conditions on a road section or in an area. Nevertheless, there are no unified regulations for VMS design, which makes it difficult to determine accurately how understandable and legible VMS information is. In practice, inappropriate VMS designs are common, such as information overload and an excessive number of phases, particularly on urban roads in China. Building on our earlier survey findings, this study conducted field tests to assess the reliability of VMS information in terms of drivers' visual perception characteristics. For information to be fully perceived, what a VMS displays must be reliable; that is, drivers should be able to detect, obtain, and understand the messages they require easily. Two basic theories, information theory and visual perception theory, were used to build a quantification model of VMS information, select evaluation indices, and design the experimental process. A total of 24 drivers participated in the field experiments and questionnaires. Five indices were analyzed to evaluate their relationships to VMS information reliability: information obtainment rate, periodic validity, driver subjective scoring, and the corresponding UI (information intelligibility) and CE (information legibility). The results confirmed that the amount of information and the scrolling period (considering redundant distance) significantly affect the reliability of VMS information. For static VMSs, the information obtainment rate and the subjective scores decrease as the amount of information increases.
The drivers' recollection accuracy declines significantly when the amount of information shown increases to 90 bits, corresponding to an information obtainment rate below 0.8 and a UI of 0. For dynamic VMSs, information reliability deteriorates as the scrolling period shortens and the number of phases increases. Reliability is unacceptable when the periodic validity is below 0.9, i.e., when the actual scrolling period is more than 10% shorter than the calculated one, corresponding to a CE of 0. Information reliability was evaluated by combining the drivers' subjective scores with a data-based statistical analysis, taking driving safety into account. Accordingly, 90 bits is recommended as the maximum amount of information to be shown on a VMS, and 5 s as the preferred scrolling period for each phase of a dynamic VMS. These results support the goal of providing reliable information to drivers by addressing problems related to the amount of information and presentation time on VMSs, and they provide a basis for setting VMS information thresholds that promote practical, user-friendly VMS designs on urban roads.
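The two reported thresholds can be expressed as a simple design check in Python. Periodic validity is assumed here to be the actual scrolling period divided by the calculated one, which we infer from "more than 10% less than the calculated one." The abstract does not say how the information amount in bits or the calculated scrolling period is obtained, so both are plain inputs. The constant and function names are hypothetical.

```python
from typing import List, Optional

MAX_INFORMATION_BITS = 90.0   # recommended maximum information on one VMS
MIN_PERIODIC_VALIDITY = 0.9   # below this, reliability was judged unacceptable


def periodic_validity(actual_period_s: float, calculated_period_s: float) -> float:
    """Assumed definition: actual scrolling period / calculated scrolling period."""
    if calculated_period_s <= 0:
        raise ValueError("calculated period must be positive")
    return actual_period_s / calculated_period_s


def check_vms(information_bits: float,
              actual_period_s: Optional[float] = None,
              calculated_period_s: Optional[float] = None) -> List[str]:
    """Return a list of threshold violations (empty if none are found)."""
    issues = []
    if information_bits > MAX_INFORMATION_BITS:
        issues.append(f"{information_bits:g} bits exceeds the {MAX_INFORMATION_BITS:g}-bit maximum")
    if actual_period_s is not None and calculated_period_s is not None:
        validity = periodic_validity(actual_period_s, calculated_period_s)
        if validity < MIN_PERIODIC_VALIDITY:
            issues.append(f"periodic validity {validity:.2f} is below {MIN_PERIODIC_VALIDITY}")
    return issues


if __name__ == "__main__":
    print(check_vms(80))            # static sign within the limit
    print(check_vms(60, 4.0, 5.0))  # dynamic sign scrolling 20% too fast
```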

20.
Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with the agreement scores calculated by the experimenter, observers erroneously inflated the agreement scores for their own data and deflated those for the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation, whereas the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.
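The distinction between agreement and accuracy can be made concrete with a short Python sketch. Agreement compares an observer's record with a partner's; accuracy compares it with the criterion protocol. In keeping with the abstract's recommendation, the sketch assumes the experimenter computes both from the raw records. The function names and data are hypothetical.

```python
from typing import Dict, Sequence


def percent_match(x: Sequence[int], y: Sequence[int]) -> float:
    """Percentage of intervals on which two records match."""
    if len(x) != len(y) or not x:
        raise ValueError("records must be non-empty and the same length")
    return 100.0 * sum(a == b for a, b in zip(x, y)) / len(x)


def agreement_and_accuracy(observer: Sequence[int], partner: Sequence[int],
                           criterion: Sequence[int]) -> Dict[str, float]:
    """Agreement is measured against a partner; accuracy against the criterion protocol."""
    return {
        "agreement": percent_match(observer, partner),
        "accuracy": percent_match(observer, criterion),
    }


if __name__ == "__main__":
    criterion = [1, 0, 1, 0, 1]
    observer = [1, 1, 1, 1, 1]  # scores every interval
    partner = [1, 1, 1, 1, 1]   # makes the same errors
    print(agreement_and_accuracy(observer, partner, criterion))
```

Two observers who make the same errors reach 100% agreement while being only 60% accurate, which is why agreement alone can overstate data quality.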
