Similar Articles
20 similar articles found (search time: 46 ms)
1.
Percentage agreement measures of interobserver agreement, or "reliability," have traditionally been used to summarize observer agreement in studies using interval-recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement stems from the need to be reasonably certain that we do not accept, as evidence of true interobserver agreement, agreement levels that are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier approaches are discussed. Tables are provided for checking whether obtained disagreements are unlikely to be due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary: if reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.
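As a minimal sketch of the computation this abstract discusses, the following shows interval-by-interval percentage agreement and the simple rule of thumb for ruling out chance agreement. Function names are our own illustration, not the article's tables:

```python
# A minimal sketch (our own function names, not the article's tables) of
# interval-by-interval percentage agreement and the simple rule of thumb
# described above for ruling out chance agreement.

def percent_agreement(obs_a, obs_b):
    """Agreements / total intervals * 100 for paired interval records."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same number of intervals")
    return 100.0 * sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)

def chance_unlikely(n_occasions, n_disagreements):
    """The abstract's shortcut: >= 50 occasions and <= 10% disagreements
    (stated for behavior rates from 10% through 90%)."""
    return n_occasions >= 50 and n_disagreements / n_occasions <= 0.10

primary = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
secondary = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(percent_agreement(primary, secondary))  # 90.0
```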

2.
Factors affecting interobserver agreement (reliability) with a comprehensive coding system in the naturalistic observation of children were examined. Data from 117 pairs of observations on 35 children and their families were examined with respect to reliability and three possible covariates: response frequency, observation complexity, and code definition clarity. Analysis of results strongly supported response frequency as a positive covariate of interobserver agreement. Complexity was found to covary negatively with interobserver agreement. The relationship between code clarity and reliability was in the predicted direction but failed to reach statistical significance. Implications for observer training and data collection in observational studies are discussed. An earlier version of this article was prepared for presentation at The Association for the Advancement of Behavior Therapy, Atlanta, Georgia, December 1977.

3.
The research published in the Journal of Applied Behavior Analysis (1968 to 1975) was surveyed for three basic elements: data-collection methods, reliability procedures, and reliability scores. Three-quarters of the studies reported observational data. Most of these studies' observational methods were variations of event recording, trial scoring, interval recording, or time-sample recording. Almost all studies reported assessment of observer reliability, usually total or point-by-point percentage agreement scores. About half the agreement scores were consistently above 90%. Less than one-quarter of the studies reported that reliability was assessed at least once per condition.

4.
Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.

5.
The development of the Behavioral Coding System (BCS) used by the Social Learning Project has encompassed approximately 8 years of clinical and research experience with naturalistic observation as a clinical assessment tool. The BCS, while originally designed to accomplish certain broad purposes, illustrates a solution to an assessment task that should be applicable to other research and clinical settings in which naturalistic observation of family interactions is needed. A variety of reliability analyses, ranging from traditional interobserver agreement among coders to generalizability analyses, have supported the measurement precision of the BCS scores for their intended purposes. In conducting this series of investigations, certain problems in the psychometric analysis of observation data have arisen and been documented. Most notably, the tradition of estimating reliability via interobserver agreement has been questioned, mainly on the grounds that behavioral complexity intrudes into such analyses in ways that suggest current observer reliability estimates may be substantially biased. The usefulness of generalizability theory is argued, particularly for observational data collected under varying assessment conditions which may influence behavioral scores. Three types of validity have been reported for BCS scores: content, concurrent, and construct validity. The BCS has favorably withstood these psychometric investigations, showing that the behavioral measures are justified on content grounds, that outside reports of behavior coincide satisfactorily with the BCS scores, and that expected behavioral changes following treatment are readily indexed by the BCS scores. Excerpts and abstracts are drawn from chapters by Jones, Reid, and Patterson (1975, pp. 42–95) and from Reid (1977). The assessment procedures were developed as part of an extended series of grants from the Section on Crime and Delinquency, National Institute of Mental Health. J. B. Reid (1977) has recently edited a manual which presents a much fuller report of the topics covered in this report, including a more extensive literature review, operational definitions of code categories, normative data, video training tapes, and procedures for training observers.

6.
Two sources of variability must each be considered when examining change in level between two sets of data obtained by human observers; namely, variance within data sets (phases) and variability attributed to each data point (reliability). Birkimer and Brown (1979a, 1979b) have suggested that both chance levels and disagreement bands be considered in examining observer reliability and have made both methods more accessible to researchers. By clarifying and extending Birkimer and Brown's papers, a system is developed using observer agreement to determine the data point variability and thus to check the adequacy of obtained data within the experimental context.

7.
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
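A small sketch (our naming, not the article's) of the three interval reliabilities this abstract contrasts, showing how they diverge at a low response rate:

```python
# A small sketch (our naming) of the three interval reliabilities the
# abstract contrasts, computed from paired interval records (1 = scored).

def interval_reliabilities(obs_a, obs_b):
    """Return the three indices as proportions (0-1)."""
    pairs = list(zip(obs_a, obs_b))
    both = sum(1 for a, b in pairs if a and b)             # occurrence agreements
    neither = sum(1 for a, b in pairs if not a and not b)  # nonoccurrence agreements
    disagree = len(pairs) - both - neither
    return {
        "interval_by_interval": (both + neither) / len(pairs),
        "scored_interval": both / (both + disagree) if both + disagree else 1.0,
        "unscored_interval": neither / (neither + disagree) if neither + disagree else 1.0,
    }

# Low-rate example: interval-by-interval agreement looks high (0.9) even
# though the observers agreed on only half of the scored intervals (0.5).
low_rate = interval_reliabilities([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                                  [1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```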

8.
The three algorithms most frequently selected by behavior-analytic researchers to compute interobserver agreement with continuous recording were used to assess the accuracy of data recorded from video samples on handheld computers by 12 observers. Rate and duration of responding were recorded for three samples each. Data files were compared with criterion records to determine observer accuracy. Block-by-block and exact agreement algorithms were susceptible to inflated agreement and accuracy estimates at lower rates and durations. The exact agreement method appeared to be overly stringent for recording responding at higher rates (23.5 responses per minute) and for higher relative duration (72% of session). Time-window analysis appeared to inflate accuracy assessment at relatively high but not at low response rate and duration (4.8 responses per minute and 8% of session, respectively).

9.
Portable electronic data collection devices permit investigators to collect large amounts of observational data in a form ready for computer analysis. These devices are particularly efficient for gathering continuous data on multiple behavior categories. We expect that the increasing availability of these devices will lead to greater use of continuous data collection methods in observational research. This paper addresses the difficulties encountered when calculating traditional interobserver agreement statistics for continuous, multiple-code scoring. Two alternative strategies are described that yield interobserver agreement values based on the exact time of behavior code entries by the primary and secondary observers. Work on this paper was supported in part by NICHD Grants P01HD15051 and R01HD17650 and Office of Special Education and Rehabilitation Services Grant G008302980.

10.
In the reliability analysis literature, little attention has been given to the various possible ways of creating a basis for the comparison required to compute observer agreement. One needs this comparison to turn a sequential list of behavioral records into a confusion matrix. It is shown that the way to do this depends on the research question one needs to answer. Four methods for creating a basis for comparison for the computation of observer agreement in observational data are presented. Guidelines are given for computing observer agreement in a way that fits one's goals. Finally, we discuss how these methods have been implemented in The Observer software. The Observer 4.1 supports all the methods that have been discussed. Most of these methods are not present in any other software package.

11.
Seventeen measures of association for observer reliability (interobserver agreement) are reviewed and computational formulas are given in a common notational system. An empirical comparison of 10 of these measures is made over a range of potential reliability check results. The effects of occurrence frequency, error frequency, and error distribution on percentage and correlational measures are examined. The question of which is the best measure of interobserver agreement is discussed in terms of critical issues to be considered.

12.
Two types of interobserver reliability values may be needed in treatment studies in which observers constitute the primary data-acquisition system: trial reliability and the reliability of the composite unit or score which is subsequently analyzed, e.g., daily or weekly session totals. Two approaches to determining interobserver reliability are described: percentage agreement and "correlational" measures of reliability. The interpretation of these estimates, factors affecting their magnitude, and the advantages and limitations of each approach are presented.

13.
Graphical and statistical indices employed to represent observer agreement in interval recording are described as "judgmental aids", stimuli to which the researcher and scientific community must respond when viewing observer agreement data. The advantages and limitations of plotting calibrating observer agreement data and reporting conventional statistical aids are discussed in the context of their utility for researchers and research consumers of applied behavior analysis. It is argued that plotting calibrating observer data is a useful supplement to statistical aids for researchers but is of only limited utility for research consumers. Alternatives to conventional per cent agreement statistics for research consumers include reporting special agreement estimates (e.g., per cent occurrence agreement and nonoccurrence agreement) and correlational statistics (e.g., Kappa and Phi).
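A short sketch of Cohen's kappa, one of the chance-corrected statistics this abstract names as an alternative to conventional per cent agreement, computed from two observers' interval records (1 = occurrence). Names are illustrative, not from the article:

```python
# Sketch (illustrative names) of Cohen's kappa for two observers'
# interval records: observed agreement corrected for chance agreement.

def cohens_kappa(obs_a, obs_b):
    n = len(obs_a)
    p_o = sum(a == b for a, b in zip(obs_a, obs_b)) / n  # observed agreement
    p_a = sum(obs_a) / n                                 # A's occurrence rate
    p_b = sum(obs_b) / n                                 # B's occurrence rate
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)              # chance agreement
    if p_e == 1.0:                   # degenerate case: no variability at all
        return 1.0 if p_o == 1.0 else 0.0
    return (p_o - p_e) / (1 - p_e)
```

Agreement at exactly the chance level yields kappa of 0, and perfect agreement yields 1, which is why kappa is less inflated than raw percent agreement at extreme response rates.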

14.
Three publication sources were reviewed to determine the recent conventions for collecting, and assessing the reliability of, academic permanent-product data (handwriting, examination papers, etc.) in applied behavior analysis. The primary source was the Journal of Applied Behavior Analysis (1968–1974). Secondary sources included conference proceedings titled A new direction in education: behavior analysis (E. Ramp and B. L. Hopkins, Eds., Lawrence, Kansas: Support and Development Center for Follow Through, Department of Human Development, University of Kansas, 1971), and Behavior Analysis and Education (G. Semb, Ed., Lawrence, Kansas: Support and Development Center for Follow Through, Department of Human Development, University of Kansas, 1972). Finally, as a test of the generality of the findings in the two applied behavior analysis sources, the current issue of each of 14 psychological and/or educational journals was reviewed. Thirty JABA studies reported academic permanent-product data, but only 14 reported reliability. Through 1973, increasingly more product data were reported, along with a greater proportion of authors reporting reliability. The review of the two conference proceedings revealed the same trend. In 1971, only three studies reported academic product data, none with reliability, while in 1972, 15 reported academic data, nine including reliability assessment. The review of 14 current education/psychology journal issues revealed four studies reporting academic data, none with reliability. Across all sources, about one-half of the studies reported reliability. Most of the studies reporting reliability described the frequency of reliability assessment, with approximately equal numbers of JABA studies reporting reliability for each paper or reliability for each session. The use of uninformed observers was reported in only three JABA studies and one conference study. Marks made on subjects' papers by either the teacher or the primary observer were reported as masked for reliability purposes by only two JABA and two conference studies. Reliability was calculated on a session-total basis in two JABA studies. Point-by-point agreement was given in nine JABA and three conference studies. Perfect reliability (mean agreement of 100%) was reported in only six JABA and three conference studies. Scores between 90 and 100% were reported in nine JABA and four conference studies. Scores below 80% were reported in three JABA studies. No other percentage agreement scores were reported, although one JABA study reported correlational reliability (Pearson r). In summary, recently more studies have dealt with academic data and, until 1974, a greater proportion of these studies reported reliability assessment, yet relatively few studies reported either replicable methods, 100% agreement, or controls for maintaining rater independence.

15.
This study examined the effect on observer agreement of switching from a system of overt reliability assessment to two successive systems of covert reliability measurement. A primary purpose was to see whether agreement obtained with the use of covert data checks would improve over time if observers were provided with accurate feedback regarding their level of agreement. Seventeen undergraduate psychology students served as subjects. They observed instructional interactions between preschool children and their teachers. During the two-week "overt" check period (Phase I), observers were aware of when their observations were being "checked" by a previously designated reliability assessor. In the subsequent covert phases (weeks 3–7) this information was not available to them during the observational sessions. When covert monitoring was implemented, agreement rates initially dropped significantly below the "overt" measurement phase. Gradually agreement rates improved until, in weeks 6 and 7, they were not significantly different from the initial overt measures. Experimenters should be aware that an overt check on observer agreement may not reflect the true reliability of an observational system. However, when observers are given accurate feedback on their level of agreement, they are able to significantly improve their vigilance and consistency in the use of the observational system.

16.
We reviewed all research articles in 10 recent volumes of the Journal of Applied Behavior Analysis (JABA): Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles reporting data on free-operant human behaviors. Three methods for reporting interobserver agreement (exact agreement, block-by-block agreement, and time-window analysis) were employed in more than 10 of the articles that reported continuous recording. Having identified these currently popular agreement computation algorithms, we explain them to assist researchers, software writers, and other consumers of JABA articles.
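Rough sketches, under assumed parameter choices (1-s bins, 10-s blocks, a ±1-s window), of the three agreement algorithms this review identifies, applied to timestamped response records (seconds from session start). Function and parameter names are ours, not the article's:

```python
# Rough sketches (assumed parameters: 1-s bins, 10-s blocks, +/-1-s window)
# of the three agreement algorithms named above, for timestamped records.

def _bin_counts(times, session_s, width_s):
    """Count responses falling in each consecutive bin of width_s seconds."""
    n = int(session_s / width_s)
    return [sum(width_s * i <= t < width_s * (i + 1) for t in times)
            for i in range(n)]

def exact_agreement(t1, t2, session_s, bin_s=1.0):
    """Percent of bins in which both observers recorded identical counts."""
    c1 = _bin_counts(t1, session_s, bin_s)
    c2 = _bin_counts(t2, session_s, bin_s)
    return 100.0 * sum(a == b for a, b in zip(c1, c2)) / len(c1)

def block_by_block_agreement(t1, t2, session_s, block_s=10.0):
    """Mean of per-block smaller/larger count ratios (1.0 when both zero)."""
    c1 = _bin_counts(t1, session_s, block_s)
    c2 = _bin_counts(t2, session_s, block_s)
    ratios = [1.0 if a == b == 0 else min(a, b) / max(a, b)
              for a, b in zip(c1, c2)]
    return 100.0 * sum(ratios) / len(ratios)

def time_window_agreement(t1, t2, window_s=1.0):
    """Match each primary record to at most one secondary record within
    +/- window_s; unmatched records on either side are disagreements."""
    unused = sorted(t2)
    matches = 0
    for t in sorted(t1):
        hit = next((u for u in unused if abs(u - t) <= window_s), None)
        if hit is not None:
            unused.remove(hit)
            matches += 1
    total = matches + (len(t1) - matches) + len(unused)
    return 100.0 * matches / total if total else 100.0
```

Note how the choice of base differs across the three: exact agreement compares per-bin counts, block-by-block averages ratios over coarser blocks, and time-window analysis matches individual responses, which is why they can disagree at extreme rates.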

17.
The McMaster Model of Family Functioning defines seven dimensions, which may be assessed either by an observer applying a Clinical Rating Scale (CRS) to a semi-structured interview of the family and/or by family members completing a questionnaire, the Family Assessment Device (FAD). The present article applied both methods of assessment, as well as the Dyadic Adjustment Scale (DAS), to a nonclinical sample (N = 105). Interobserver reliability on the CRS was highly significant. Parent (FAD) vs. observer (CRS) agreement was also highly significant, except for Affective Responsiveness and Behavior Control, for which agreement was barely significant. When families were labeled as "healthy" or "unhealthy" according to cut-offs, agreement between observers and parents was high (87%), and disagreements illuminated dynamics of individual families. Finally, the DAS (completed by mothers) was significantly correlated with both the CRS and the FAD, particularly for General Functioning.

18.
The need to train accurate, not necessarily agreeing, observers is discussed. Intraobserver consistency as an intermediate criterion in such training is proposed and contrasted with the more familiar criterion of interobserver agreement. Videotaped observations of social interactions between handicapped and nonhandicapped preschoolers provided the medium for examining the criterion agreement of four observers trained against each type of standard. Observers generally failed to show high levels of criterion agreement whether trained to a within- or to a between-observer agreement standard. The results varied somewhat with the frequency of behaviors, however. Correlations between interobserver agreement and intraobserver consistency were variable but somewhat higher when interobserver agreement was the training criterion than when intraobserver consistency was the criterion. Correlations between interobserver agreement and criterion agreement ranged from −.16 to .89 during interobserver agreement training. Correlations between intraobserver consistency and criterion agreement ranged from −.23 to .99 during intraobserver consistency training.

19.
Several factors thought to influence the representativeness of behavioral assessment data were examined in an analogue study using a multifactorial design. Systematic and unsystematic methods of observing group behavior were investigated using 18 male and 18 female observers. Additionally, valence properties of the observed behaviors were inspected. Observers' assessments of a videotape were compared to a criterion code that defined the population of behaviors. Results indicated that systematic observation procedures were more accurate than unsystematic procedures, though this factor interacted with gender of observer and valence of behavior. Additionally, males tended to sample more representatively than females. A third finding indicated that the negatively valenced behavior was overestimated, whereas the neutral and positively valenced behaviors were accurately assessed.

20.
This paper demonstrates and compares methods for estimating the interrater reliability and interrater agreement of performance ratings. These methods can be used by applied researchers to investigate the quality of ratings gathered, for example, as criteria for a validity study, or as performance measures for selection or promotional purposes. While estimates of interrater reliability are frequently used for these purposes, indices of interrater agreement appear to be rarely reported for performance ratings. A recommended index of interrater agreement, the T index (Tinsley & Weiss, 1975), is compared to four methods of estimating interrater reliability (Pearson r, coefficient alpha, mean correlation between raters, and intraclass correlation). Subordinate and superior ratings of the performance of 100 managers were used in these analyses. The results indicated that, in general, interrater agreement and reliability among subordinates were fairly high. Interrater agreement between subordinates and superiors was moderately high; however, interrater reliability between these two rating sources was very low. The results demonstrate that interrater agreement and reliability are distinct indices and that both should be reported. Reasons are discussed as to why interrater reliability should not be reported alone. This paper is based, in part, on a thesis submitted to East Carolina University by the second author. Portions of this study were presented at the American Psychological Association meeting in New Orleans, LA, August 1989. The authors would like to thank Michael Campion and two anonymous reviewers for their comments on earlier drafts of this paper.
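The distinction the abstract draws can be illustrated with invented data: a constant rating difference between two sources leaves Pearson r (a reliability index) perfect while absolute agreement is zero. The data and helper below are our illustration, not the study's:

```python
# Invented data illustrating the abstract's central point: interrater
# reliability (here Pearson r) and interrater agreement are distinct.
from math import sqrt

def pearson_r(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

subordinate = [3, 4, 5, 6, 7]
superior = [1, 2, 3, 4, 5]  # same rank order, uniformly 2 points lower

reliability = pearson_r(subordinate, superior)                  # perfect: 1.0
agreement = sum(a == b for a, b in zip(subordinate, superior))  # 0 exact matches
```

Reporting reliability alone would mask the systematic 2-point disagreement, which is the paper's argument for reporting both indices.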
