Similar Articles
20 similar articles found.
1.
Proposed methods of assessing the statistical significance of interobserver agreements provide erroneous probability values when conducted on serially correlated data. Investigators who wish to evaluate interobserver agreements by means of statistical significance can do so by limiting the analysis to every kth interval of data, or by using Markovian techniques which accommodate serial correlations.
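The first remedy can be sketched in a few lines (a minimal illustration; function and data names are mine, not the authors'):

```python
# Hypothetical sketch: thin serially correlated interval data by keeping
# only every k-th interval before testing agreement for significance.
def every_kth(records, k):
    """Keep records 0, k, 2k, ... (every k-th interval)."""
    return records[::k]

obs1 = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1]
obs2 = [1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

# With k = 3, only intervals 0, 3, 6, and 9 enter the significance test,
# weakening the serial correlation between retained intervals.
thin1 = every_kth(obs1, 3)
thin2 = every_kth(obs2, 3)
```

The cost of thinning is statistical power; the Markovian alternative mentioned in the abstract keeps all intervals but models the serial dependence instead.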

2.
Portable electronic data collection devices permit investigators to collect large amounts of observational data in a form ready for computer analysis. These devices are particularly efficient for gathering continuous data on multiple behavior categories. We expect that the increasing availability of these devices will lead to greater use of continuous data collection methods in observational research. This paper addresses the difficulties encountered when calculating traditional interobserver agreement statistics for continuous, multiple-code scoring. Two alternative strategies are described that yield interobserver agreement values based on the exact time of behavior code entries by the primary and secondary observers.

Work on this paper was supported in part by NICHD Grants P01HD15051 and R01HD17650 and Office of Special Education and Rehabilitation Services Grant G008302980.
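A sketch of the general idea behind such exact-time comparisons, assuming a simple greedy match of timestamped entries within a tolerance window (the tolerance, names, and matching rule are illustrative assumptions, not the paper's algorithm):

```python
# Hypothetical sketch: count an agreement when both observers enter the
# same behavior code within a small time window of each other.
def timed_agreement(primary, secondary, tol=2.0):
    """primary, secondary: lists of (time_in_seconds, behavior_code)."""
    unmatched = list(secondary)
    agreements = 0
    for t, code in primary:
        for entry in unmatched:
            t2, code2 = entry
            if code2 == code and abs(t2 - t) <= tol:
                agreements += 1
                unmatched.remove(entry)  # each secondary entry matches once
                break
    # agreements over all distinct events (each matched pair counted once)
    return agreements / (len(primary) + len(secondary) - agreements)

primary = [(1.0, "A"), (5.0, "B"), (9.0, "A")]
secondary = [(1.5, "A"), (5.5, "B"), (12.0, "A")]
print(timed_agreement(primary, secondary))  # 2 matches among 4 distinct events
```

The choice of tolerance window is consequential: too narrow and real agreements are scored as disagreements, too wide and distinct events are conflated.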

3.
How changes in the interobserver agreement and disagreement cells in the reliability matrix are reflected differently in eight commonly used reliability indices is shown graphically. Indices which take into account expected chance difference are compared to those indices which do not. Differences between indices which do and do not treat the agreement and the disagreement cells equally are also illustrated.

4.
Based on the conceptual framework outlined by Cone (1986) and Suen (1988), a practical decision tree is developed as an aid for the selection of observational reliability indices.

5.
Various statistics have been proposed as standard methods for calculating and reporting interobserver agreement scores. The advantages and disadvantages of each have been discussed in this journal recently but without resolution. A formula is presented that combines separate measures of occurrence and nonoccurrence percentages of agreement, with weight assigned to each measure, varying according to the observed rate of behavior. This formula, which is a modification of a formula proposed by Clement (1976), appears to reduce distortions due to "chance" agreement encountered with very high or low observed rates of behavior while maintaining the mathematical and conceptual simplicity of the conventional method for calculating occurrence and nonoccurrence agreement.
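The abstract does not reproduce the formula itself, so the sketch below uses one plausible rate-dependent weighting for illustration only; it should not be read as the published modification of Clement (1976):

```python
# Illustrative (assumed) weighting: occurrence agreement gets more weight
# when the behavior rate is low, nonoccurrence agreement when it is high.
def weighted_agreement(a, b, c, d):
    """a: both scored occurrence; d: both scored nonoccurrence;
    b, c: disagreements (one observer scored occurrence, the other not)."""
    n = a + b + c + d
    occ = a / (a + b + c) if (a + b + c) else 1.0      # occurrence agreement
    nonocc = d / (d + b + c) if (d + b + c) else 1.0   # nonoccurrence agreement
    rate = (2 * a + b + c) / (2 * n)                   # mean observed rate
    return (1 - rate) * occ + rate * nonocc

print(weighted_agreement(40, 5, 5, 50))
```

Whatever the exact weights, the intent matches the abstract: at extreme behavior rates the index that "chance" inflates (nonoccurrence agreement at low rates, occurrence agreement at high rates) is down-weighted.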

6.
Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether these percentage agreement measures should continue to be used, on which ones to use, and on what to do about chance agreements if their use continues. Much of the disagreement derives from the need to be reasonably certain that we do not accept, as evidence of true interobserver agreement, agreement levels that are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given that permit checking whether obtained disagreements are unlikely to be due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary: if reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.
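The closing rule of thumb can be checked against an explicit chance model: if two independent observers each score occurrence with probability p, they disagree on any interval with probability 2p(1-p), so the chance of d or fewer disagreements in n occasions is a binomial tail. A sketch (my model, not the paper's tables):

```python
import math

def p_chance_disagreements(n, d, p):
    """P(d or fewer disagreements in n intervals) if two independent
    observers each score occurrence with probability p (pure chance)."""
    q = 2 * p * (1 - p)          # chance probability of a disagreement
    return sum(math.comb(n, k) * q**k * (1 - q)**(n - k) for k in range(d + 1))

# 50 occasions, 5 disagreements (10%), behavior rate 50% (worst case for
# chance disagreement): the probability is vanishingly small.
print(p_chance_disagreements(50, 5, 0.5))
```

Rates between 10% and 90% only make q smaller than the p = 0.5 worst case, which is why the 50-occasion, 10%-disagreement rule holds across that whole range.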

7.
The cueing effects of interviewer praise contingent on a target behavior and expectation of behavior change were examined with six observers. Experiment I investigated the effect of cues in conjunction with expectation. Experiment II assessed the relative contributions of cues and expectation, and Experiment III examined the effect of cues in the absence of expectation. The frequencies of two behaviors, client eye contact and face touching, were held constant throughout a series of videotaped interviews between an "interviewer" and a "client". A within-subjects design was used in each experiment. During baseline conditions, praise did not follow eye contact by the client on the videotape. In all experimental conditions, praise statements from the interviewer followed each occurrence of eye contact with an equal number of praises delivered at random times when there was no eye contact. Three of the six observers dramatically increased their recordings of eye contact during the first experimental phase, but these increases were not replicated in a second praise condition. There were no systematic changes in recorded face touching. Witnessing the delivery of consequences, rather than expectation, seemed to be responsible for the effect. This potential threat to the internal validity of studies using observational data may go undetected by interobserver agreement checks.

8.
Two sources of variability must each be considered when examining change in level between two sets of data obtained by human observers; namely, variance within data sets (phases) and variability attributed to each data point (reliability). Birkimer and Brown (1979a, 1979b) have suggested that both chance levels and disagreement bands be considered in examining observer reliability and have made both methods more accessible to researchers. By clarifying and extending Birkimer and Brown's papers, a system is developed using observer agreement to determine the data point variability and thus to check the adequacy of obtained data within the experimental context.

9.
Seventeen measures of association for observer reliability (interobserver agreement) are reviewed and computational formulas are given in a common notational system. An empirical comparison of 10 of these measures is made over a range of potential reliability check results. The effects on percentage and correlational measures of occurrence frequency, error frequency, and error distribution are examined. The question of which is the best measure of interobserver agreement is discussed in terms of critical issues to be considered.

10.
Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice versa, is presented. It is suggested that both expressions be reported in order to communicate reliability unambiguously and to facilitate comparison of the reliabilities from different studies.
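Both expressions can be computed from the same 2x2 reliability table, which is what makes a conversion table between them possible; a minimal sketch (the cell naming is mine):

```python
import math

# a: both observers scored occurrence, d: both scored nonoccurrence,
# b and c: the two kinds of disagreement.
def percent_agreement(a, b, c, d):
    return 100 * (a + d) / (a + b + c + d)

def phi(a, b, c, d):
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Reporting both, as the abstract recommends:
a, b, c, d = 45, 5, 5, 45
print(percent_agreement(a, b, c, d), phi(a, b, c, d))
```

Note that the mapping between the two is not one-to-one in general: percentage agreement depends only on a + d, while phi also depends on how the marginals are distributed, which is why reporting both is more informative than either alone.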

11.
Behavioral researchers have developed a sophisticated methodology to evaluate behavioral change which is dependent upon accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties, such as interobserver agreement, of observational measures to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that it is not the most psychometrically sound statistic to determine interobserver agreement due to its inability to take chance into account. Cohen's (1960) kappa has long been proposed as the more psychometrically sound statistic for assessing interobserver agreement. Kappa is described and computational methods are presented.
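Cohen's kappa corrects observed agreement for the agreement expected by chance from the two observers' marginal rates. A minimal sketch for the 2x2 case:

```python
def cohens_kappa(a, b, c, d):
    """Cohen's (1960) kappa from a 2x2 agreement table: a = both observers
    record occurrence, d = both record nonoccurrence, b and c = the two
    kinds of disagreement."""
    n = a + b + c + d
    po = (a + d) / n                                      # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance-expected
    return (po - pe) / (1 - pe)

print(cohens_kappa(45, 5, 5, 45))  # ~0.8: strong agreement beyond chance
```

The same table gives percentage agreement of 90%, which illustrates the abstract's point: kappa is systematically lower than percent agreement because the chance-expected portion has been removed.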

12.
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
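The quantities compared above can all be computed from one reliability check summarized as a 2x2 table; a sketch showing how the three reliabilities rest on different bases while the disagreement rate does not (cell names are mine):

```python
# a: both observers scored the interval, d: neither did,
# b and c: the two kinds of disagreement.
def reliabilities(a, b, c, d):
    n = a + b + c + d
    return {
        "interval_by_interval": (a + d) / n,
        "scored_interval": a / (a + b + c) if (a + b + c) else 1.0,
        "unscored_interval": d / (d + b + c) if (d + b + c) else 1.0,
        "disagreement_rate": (b + c) / n,
    }

# Low behavior rate: interval-by-interval reliability looks inflated (0.92)
# while scored-interval reliability exposes the disagreement (0.20).
print(reliabilities(a=2, b=4, c=4, d=90))
```

The disagreement rate (here 0.08) is the same numerator all three indices share; graphing it as a bandwidth around the reported behavior rate, as the abstract suggests, lets a reader recover any of the three by eye.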

13.
Users of interobserver agreement statistics have heretofore ignored the problem of autocorrelation in behavior sequences when testing the statistical significance of agreement measures. Due to autocorrelation, traditional reliability tests based on the 2 × 2 contingency-table model (e.g., kappa, phi) are incorrect. Correct tests can be developed by using the bivariate time series as a statistical model. Seen from this perspective, testing the significance of interobserver agreement becomes formally equivalent to testing the significance of the lag-zero cross-correlation between two time series. The robust procedure known as the jackknife is suggested for this purpose.
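A sketch of the suggested procedure, assuming a delete-one jackknife on the paired interval series (the paper may jackknife in blocks; this is an illustration of the technique, not its exact method):

```python
import math

def lag_zero_r(x, y):
    """Pearson correlation of two equal-length series at lag zero.
    For binary interval records this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def jackknife_se(x, y):
    """Delete-one jackknife estimate and standard error of lag_zero_r."""
    n = len(x)
    full = lag_zero_r(x, y)
    loo = [lag_zero_r(x[:i] + x[i+1:], y[:i] + y[i+1:]) for i in range(n)]
    pseudo = [n * full - (n - 1) * r for r in loo]
    mean = sum(pseudo) / n
    var = sum((p - mean) ** 2 for p in pseudo) / (n * (n - 1))
    return mean, math.sqrt(var)

x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # primary observer, per interval
y = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1]   # secondary observer
est, se = jackknife_se(x, y)
```

Dividing the jackknife estimate by its standard error gives an approximate test statistic that, unlike the contingency-table tests, does not assume independent intervals.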

14.
Previous recommendations to employ occurrence, nonoccurrence, and overall estimates of interobserver reliability for interval data are reviewed. A rationale for comparing obtained reliability to reliability that would result from a random-chance model is explained. Formulae and graphic functions are presented to allow for the determination of chance agreement for each of the three indices, given any obtained per cent of intervals in which a response is recorded to occur. All indices are interpretable throughout the range of possible obtained values for the per cent of intervals in which a response is recorded. The level of chance agreement simply changes with changing values. Statistical procedures that could be used to determine whether obtained reliability is significantly superior to chance reliability are reviewed. These procedures are rejected because they yield significance levels that are partly a function of sample sizes and because there are no general rules to govern acceptable significance levels depending on the sizes of samples employed.
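Under the random-chance model — two observers independently recording occurrence with the obtained probability p — the chance levels of the three indices reduce to closed forms (a standard derivation; the notation is mine, and the paper's own formulae may be parameterized differently):

```python
def chance_levels(p):
    """Expected chance values of the three reliability indices when two
    independent observers each record occurrence with probability p."""
    return {
        "occurrence": p / (2 - p),           # p^2 / (p^2 + 2p(1-p))
        "nonoccurrence": (1 - p) / (1 + p),  # (1-p)^2 / ((1-p)^2 + 2p(1-p))
        "overall": p**2 + (1 - p)**2,
    }

# At a 10% behavior rate, nonoccurrence and overall agreement are high by
# chance alone, while chance occurrence agreement is only about 5%.
print(chance_levels(0.1))
```

This is the sense in which "the level of chance agreement simply changes with changing values": each index remains interpretable at any rate, provided it is read against its own chance level.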

15.
Estimates of observer agreement are necessary to assess the acceptability of interval data. A common method for assessing observer agreement, per cent agreement, includes several major weaknesses and varies as a function of the frequency of behavior recorded and the inclusion or exclusion of agreements on nonoccurrences. Also, agreements that might be expected to occur by chance are not taken into account. An alternative method for assessing observer agreement that determines the exact probability that the obtained number of agreements or better would have occurred by chance is presented and explained. Agreements on both occurrences and nonoccurrences of behavior are considered in the calculation of this probability.
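One way to make such an exact calculation concrete is to fix each observer's occurrence total and pair intervals at random, which makes the number of occurrence agreements hypergeometric; a hedged sketch (the published method may condition on the margins differently):

```python
import math

def p_agreements_by_chance(a, b, c, d):
    """Exact P(total agreements >= a + d) under a chance model in which
    each observer's occurrence total is fixed and intervals pair at random."""
    n = a + b + c + d
    r1, c1 = a + b, a + c            # occurrence totals, observers 1 and 2
    # With x occurrence agreements, total agreements = n - r1 - c1 + 2x,
    # so "a + d agreements or better" is exactly x >= a.
    prob = 0.0
    for x in range(a, min(r1, c1) + 1):
        prob += math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)
    return prob

print(p_agreements_by_chance(8, 1, 1, 10))  # small: chance is unlikely
```

Because occurrence and nonoccurrence agreements move together once the margins are fixed, a single tail probability covers both, matching the abstract's claim that the calculation considers agreements of both kinds.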

16.
17.
The meaning and properties of a commonly used index of reliability, S/L, were examined critically. It was found that the index does not reflect any conventional concept of reliability. When used for an identical behavioral observation session, it is not statistically correlated with other reliability indices. Within an observation session, the standardizing measure L is beyond the control of the investigator. Furthermore, the reason for the choice of L as the standard is unclear. The role of chance agreement in S/L is not known. The exact interpretation of the index depends on which observer reports L. Overall, the conceptual and mathematical meaning of S/L is dubious. It is suggested that the S/L index should not be used until its nature is shown to be a measure of reliability. Other approaches such as the intraclass correlations and generalizability coefficients should be used instead.

The authors are indebted to Johnny Matson for his critique of an earlier version of this paper.

18.
19.
A family of coefficients of relational agreement for numerical scales is proposed. The theory is a generalization to multiple judges of the Zegers and ten Berge theory of association coefficients for two variables and is based on the premise that the choice of a coefficient depends on the scale type of the variables, defined by the class of admissible transformations. Coefficients of relational agreement that denote agreement with respect to empirically meaningful relationships are derived for absolute, ratio, interval, and additive scales. The proposed theory is compared to intraclass correlation, and it is shown that the coefficient of additivity is identical to one measure of intraclass correlation.

The author thanks the Editor and anonymous reviewers for helpful suggestions.

20.
A Post Hoc Adjustment Model for Essay Scores Based on Holistic Impression Scoring
朱正才 (Zhu Zhengcai), 杨惠中 (Yang Huizhong). 《心理科学》 (Psychological Science), 2005, 28(6): 1459-1462
This paper gives a detailed description of essay scoring on the College English Test Bands 4 and 6 (CET-4/6), focusing on the principle and method of post hoc adjustment of essay scores, and presents a mathematical derivation based on linear equating. The model is essentially a statistical one that applies maximum likelihood estimation and the concept of the normal distribution. The development of scoring criteria, the use of reference sample essays to calibrate each rater's application of those criteria, and rater training and assessment form the foundation of CET-4/6 essay-score reliability. In the adjustment itself, statistical methods based on the random distribution of essay scripts, the correlation between objective-item scores and essay scores, and rater consistency over time correct, after the fact, the scores of raters showing systematic error. The paper further proposes that, given mean essay-score data from previous administrations, a weighted moving average can equate essay scores across administrations.
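The linear-equating principle described can be sketched as mapping one rater's scores onto a reference mean and standard deviation (my rendering of the general technique, not the operational CET-4/6 formula):

```python
import math

def linear_equate(scores, ref_mean, ref_sd):
    """Map one rater's essay scores onto a reference distribution by
    matching mean and standard deviation (linear equating)."""
    n = len(scores)
    m = sum(scores) / n
    sd = math.sqrt(sum((s - m) ** 2 for s in scores) / n)
    return [ref_mean + ref_sd * (s - m) / sd for s in scores]

# A systematically harsh rater (mean 7.2) adjusted toward a reference
# mean of 8.5 while preserving the rank order of the essays.
print(linear_equate([6, 7, 7, 8, 8], ref_mean=8.5, ref_sd=1.0))
```

In an operational setting the reference parameters would come from the calibration machinery the abstract describes (random script distribution, correlation with objective-item scores, rater consistency), not be chosen by hand as here.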
