A standardized estimation of Rorschach interrater agreement is needed. Percentage agreement, although widely used, is found to be unsuitable. Forty-one protocols from adults in both a normal and a psychiatric sample were scored by two or three scorers, making 85 scoring pairs. Percentage agreement, correlations (phi and Pearson's r), and kappa were computed at the single-response, total-score, and category levels. Percentage agreement shows minimal variation. Even when exceeding 0.80, it can obscure major disagreements. Kappa and correlations both vary in a similar way with level of disagreement. The total-score level does not give additional information compared to the single-response and category levels. Kappa proved to be conservative and reliable and is therefore suggested as a standard estimate.
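
The point that percentage agreement above 0.80 can hide major disagreement can be illustrated with a small sketch. The 2x2 counts below are invented for illustration and are not taken from the study; the function name `agreement_indices` is likewise hypothetical. It computes percentage agreement, Cohen's kappa, and phi for a rarely assigned score.

```python
from math import sqrt

def agreement_indices(a, b, c, d):
    """Return (percentage agreement, kappa, phi) for a 2x2 agreement table.

    a = both scorers assign the score, b = only scorer 1 assigns it,
    c = only scorer 2 assigns it,    d = neither assigns it.
    """
    n = a + b + c + d
    row1, row0 = a + b, c + d          # scorer 1 marginals
    col1, col0 = a + c, b + d          # scorer 2 marginals
    if 0 in (row1, row0, col1, col0):
        raise ValueError("indices are undefined when a marginal total is zero")
    p_o = (a + d) / n                              # observed agreement
    p_e = (row1 * col1 + row0 * col0) / n ** 2     # agreement expected by chance
    kappa = (p_o - p_e) / (1 - p_e)
    phi = (a * d - b * c) / sqrt(row1 * row0 * col1 * col0)
    return p_o, kappa, phi

# Hypothetical counts: a rare score over 100 responses.
p_o, kappa, phi = agreement_indices(a=2, b=8, c=8, d=82)
print(f"agreement={p_o:.2f} kappa={kappa:.2f} phi={phi:.2f}")
```

With these invented counts the scorers agree on 84% of responses, yet kappa and phi are both about 0.11, because nearly all of the agreement comes from the many responses where neither scorer assigned the score.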

Studies of agreement commonly occur in psychiatric research. For example, researchers are often interested in the agreement among radiologists in their review of brain scans of elderly patients with dementia or in the agreement among multiple informant reports of psychopathology in children. In this paper, we consider the agreement between two raters when rating a dichotomous outcome (e.g., presence or absence of psychopathology). In particular, we consider logistic regression models that allow agreement to depend on both rater- and subject-level covariates. Logistic regression has been proposed as a simple method for identifying covariates that are predictive of agreement (Coughlin et al., 1992). However, this approach is problematic since it does not take account of agreement due to chance alone. As a result, a spurious association between the probability (or odds) of agreement and a covariate could arise due entirely to chance agreement. That is, if the prevalence of the dichotomous outcome varies among subgroups of the population, then covariates that identify the subgroups may appear to be predictive of agreement. In this paper we propose a modification to the standard logistic regression model in order to take proper account of chance agreement. An attractive feature of the proposed method is that it can be easily implemented using existing statistical software for logistic regression. The proposed method is motivated by data from the Connecticut Child Study (Zahner et al., 1992) on the agreement among parent and teacher reports of psychopathology in children. In this study, parents and teachers provide dichotomous assessments of a child's psychopathology and it is of interest to examine whether agreement among the parent and teacher reports is related to the age and gender of the child and to the time elapsed between parent and teacher assessments of the child.
The authors thank the Associate Editor and the referees for their helpful comments and suggestions. We also thank Gwen Zahner for use of data from the Connecticut Child Study, which was conducted under contract to the Connecticut Department of Children and Youth Services. This research was supported by grants HL 69800, AHRQ 10871, HL52329, HL61769, GM 29745, MH 54693 and MH 17119 from the National Institutes of Health.

This paper examines a model and defines reasonable assumptions underlying different measures of observer agreement for categorical data collected in free operant situations. It is assumed that two or more observers classify operant behaviors of subjects into occurrences and nonoccurrences through recognition of validated response classes (categories) such that the rates of false positives and observer biases are acceptably low. Thus errors are mostly omissions, i.e., failing to observe events that occur. Four alternative cases are derived, together with formulas for calculating significance tests, variances, and standard errors, three of which do not depend on knowledge of the proportion of time points at which the event does not occur. We wish to acknowledge NICHD Grant HD-10570, The Neuropharmacology of Developmental Disorders, George Breese, Ph.D., and C. T. Gualtieri, M.D., Principal Investigators; NIEHS Grant ES-01104; USPHS Grant HD-03110; and MCH Project 916 to the Division for Disorders of Development and Learning.

Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, and on which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given to permit checking to see if obtained disagreements are unlikely due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary. If reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.

Many variables that are analyzed by social scientists are nominal in nature. When missing data occur on these variables, optimal recovery of the analysis model's parameters is a challenging endeavor. One of the most popular methods to deal with missing nominal data is multiple imputation (MI). This study evaluated the capabilities of five MI methods that can be used to treat incomplete nominal variables: multiple imputation with chained equations (MICE) using polytomous regression as the elementary imputation method; MICE based on classification and regression trees (CART); MICE based on nested logistic regressions; the ranking procedure described by Allison (2002); and a joint modeling approach based on the general location model. We first motivate our inquiry with an applied example and then present the results of a Monte Carlo simulation study that compared the performance of the five imputation methods under conditions of varying sample size, percentage of missing data, and number of nominal response categories. We found that MICE with polytomous regression was the strongest performer while the Allison (2002) ranking procedure and MICE with CART performed poorly in most conditions.

This research reports asymmetries in self-other ratings of attachment style. Specifically, results showed that romantic partners hold relatively accurate perceptions of each other’s attachment styles with one exception: women’s ability to judge their male partner’s level of attachment-related anxiety was compromised compared with the other agreement indices measured. The effect was not moderated by acquaintanceship length or relationship satisfaction, but it was affected by men’s interpersonally oriented self-control. The findings appear to stem from men’s reluctance to appear anxious to their female partners and from the nature of the anxiety dimension of attachment. Anxiety (as compared with avoidance) has a less consistent interpersonal behavioral manifestation and thus is more concealable among those motivated to conceal it and capable of doing so.

This paper gives a method for determining a sample size that will achieve a prespecified bound on confidence interval width for the interrater agreement measure, kappa. The same results can be used when a prespecified power is desired for testing hypotheses about the value of kappa. An example from the literature is used to illustrate the methods proposed here.

Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice versa, is presented. It is suggested that both expressions be reported in order to communicate reliability unambiguously and to facilitate comparison of the reliabilities from different studies.
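
As a rough illustration of the percentage agreement variants named in this abstract, the sketch below computes overall, occurrence, and nonoccurrence agreement from two interval records. It assumes the conventional definitions (occurrence agreement ignores intervals in which neither observer scored the behavior; nonoccurrence agreement ignores intervals in which both did). The function name and sample records are hypothetical, and the article's conversion table to phi is not reproduced here.

```python
def interval_agreement(obs_a, obs_b):
    """Return overall, occurrence, and nonoccurrence agreement proportions.

    obs_a, obs_b: equal-length sequences with one truthy/falsy entry per
    interval (truthy = behavior scored as occurring).
    """
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    n = len(obs_a)
    both = sum(1 for a, b in zip(obs_a, obs_b) if a and b)
    neither = sum(1 for a, b in zip(obs_a, obs_b) if not a and not b)
    disagree = n - both - neither
    overall = (both + neither) / n
    # Each index is None when no interval is relevant to it.
    occurrence = both / (both + disagree) if both + disagree else None
    nonoccurrence = neither / (neither + disagree) if neither + disagree else None
    return overall, occurrence, nonoccurrence

# Hypothetical records for a low-rate behavior over 10 intervals.
observer_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
observer_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(interval_agreement(observer_1, observer_2))
```

For these invented records overall agreement is 0.80, while occurrence agreement is only 1/3 and nonoccurrence agreement is 7/9, which shows why the three figures can tell different stories for low-rate behavior.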

Two designs for comparing a judge's ratings with a known standard are presented and compared. Design A pertains to the situation where the judge is asked to categorize each of N subjects into one of r (known) classes with no knowledge of the actual number in each class. Design B is employed when the judge is given the actual number in each class and is asked to categorize the individuals subject to these constraints. The probability distribution of the total number of correct choices is developed in each case. A power comparison of the two procedures is undertaken.

Behavioral researchers have developed a sophisticated methodology to evaluate behavioral change which is dependent upon accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Consequently, researchers must attend to the psychometric properties, such as interobserver agreement, of observational measures to ensure reliable and valid measurement. Of the many indices of interobserver agreement, percentage of agreement is the most popular. Its use persists despite repeated admonitions and empirical evidence indicating that it is not the most psychometrically sound statistic to determine interobserver agreement due to its inability to take chance into account. Cohen's (1960) kappa has long been proposed as the more psychometrically sound statistic for assessing interobserver agreement. Kappa is described and computational methods are presented.

How changes in the interobserver agreement and disagreement cells in the reliability matrix are reflected differently in eight commonly used reliability indices is shown graphically. Indices which take into account expected chance difference are compared to those indices which do not. Differences between indices which do and do not treat the agreement and the disagreement cells equally are also illustrated.

Visual analysis is integral to the analysis of single-case experimental design (SCED) data. Previous studies have shown that many factors may influence the interrater agreement (IRA) of visual analysis. One factor that has received little direct attention is the impact of contextual information. In the current study, authors of recently published SCED studies were asked to make judgments regarding functional relations based on published datasets that met criteria for design quality. Respondents were randomly assigned to view graphs with or without contextual information and the degree of interrater agreement was compared. Results revealed that contextual information had no impact on IRA for decisions of a functional relation. IRA was high across both groups for 6 of the 7 datasets examined. Implications and recommendations based on these results are discussed.

Radical interpretation is used by Davidson in his linguistic theory not only as an interesting thought experiment but also as a general pattern that is believed to be able to give an essential and general account of linguistic interpretation. If the principle of charity is absolutely necessary to radical interpretation, it becomes, in this sense, a general methodological principle. However, radical interpretation is a local pattern that is suited only to exploring certain interpretations in specific cases, and consequently the principle of charity is applicable only within a limited scope. It is neither the case that every linguistic interpretation is in nature radical nor that the principle of charity is the primary and fundamental principle for all linguistic interpretation, as Davidson believes.

Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.

There is a frequent need to measure the degree of agreement among R observers who independently classify n subjects within K nominal or ordinal categories. The most popular methods are usually kappa-type measurements. When R = 2, Cohen's kappa coefficient (weighted or not) is well known. When defined in the ordinal case while assuming quadratic weights, Cohen's kappa has the advantage of coinciding with the intraclass and concordance correlation coefficients. When R > 2, there are more discrepancies because the definition of the kappa coefficient depends on how the phrase ‘an agreement has occurred’ is interpreted. In this paper, Hubert's interpretation, that ‘an agreement occurs if and only if all raters agree on the categorization of an object’, is used, which leads to Hubert's (nominal) and Schuster and Smith's (ordinal) kappa coefficients. Formulae for the large-sample variances for the estimators of all these coefficients are given, allowing the latter to illustrate the different ways of carrying out inference and, with the use of simulation, to select the optimal procedure. In addition, it is shown that Schuster and Smith's kappa coefficient coincides with the intraclass and concordance correlation coefficients if the first coefficient is also defined assuming quadratic weights.
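
The all-raters-agree interpretation quoted above can be sketched for nominal categories as follows. This is an illustration under stated assumptions, not the paper's estimator: it assumes chance agreement is the sum over categories of the product of each rater's marginal proportion, and it omits the variance formulae and the ordinal (weighted) case. The function name and ratings are hypothetical.

```python
def all_agree_kappa(ratings):
    """Kappa-type coefficient where agreement means ALL raters agree.

    ratings: list of tuples, one tuple per subject, one category per rater.
    Assumption: chance agreement is the sum over categories of the product
    of the raters' marginal proportions for that category.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one subject is required")
    r = len(ratings[0])
    if any(len(row) != r for row in ratings):
        raise ValueError("every subject must be rated by the same raters")
    # Observed agreement: subjects on which every rater chose one category.
    p_o = sum(len(set(row)) == 1 for row in ratings) / n
    categories = {c for row in ratings for c in row}
    p_e = 0.0
    for k in categories:
        joint = 1.0
        for j in range(r):
            joint *= sum(row[j] == k for row in ratings) / n
        p_e += joint
    if p_e == 1.0:
        raise ValueError("coefficient is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings: 6 subjects, 3 raters, 2 categories.
ratings = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0), (1, 0, 1), (0, 0, 1)]
print(round(all_agree_kappa(ratings), 3))
```

Under this assumption the coefficient reduces to unweighted Cohen's kappa when there are only two raters.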

The remarkably high agreement between observers using the SRIC, the TSBC, and the CFRS makes the observers interchangeable. This agreement is a product of the intensive and extensive training and monitoring of full-time observers, the use of categories rather than continua, the low degree of interpretation required by the procedures, the use of lay language, the immediate recording after short observational periods, and the familiar and standard setting. Other important features include use of rates, emphasis on time, concern for positive behavior, and close linkages between concepts and categories so that concepts are explicitly specified. Taken together, these characteristics make these systems powerful instruments for the guidance of treatment and the prediction of outcomes. The key to their success seems to be the recording of simple actions and action sequences (the basic phenomena of psychology) as the basic data from which a variety of useful indexes can be readily formed. Preparation of this article was supported by Grant MH-30654 from the National Institute of Mental Health to S. Duncan and D. Fiske. Presented at the 87th Annual Meetings of the American Psychological Association, New York City, September 1979, as part of a symposium on New assessment systems for residential treatment, management, research, and evaluation.

Does producing syntactic agreement rely on syntactic or memory-based retrieval processes? The present study investigated the extent to which syntactic processing deficits and working memory (WM) deficits predict susceptibility to agreement attraction [Bock, K., & Miller, C. A. (1991). Broken agreement. Cognitive Psychology, 23, 45–93], where speakers tend to erroneously produce plural agreement for a singular subject when another noun in the sentence is grammatically plural. Four brain-injured patients with varying degrees of grammatical and WM deficits completed sentences with local nouns that matched or mismatched in number with the head noun, and that were plausible or implausible subjects. Both aspects of grammatical deficits and the extent of WM deficits predicted the extent of agreement attraction effects. These data are consistent with the proposal that producing an agreeing verb involves a cue-based search in WM for an appropriate controlling noun, which is subject to interference from other elements in memory with similar properties [cf. Badecker, W., & Kuminiak, F. (2007). Morphology, agreement and working memory retrieval in sentence production: Evidence from gender and case in Slovak. Journal of Memory and Language, 56(1), 65–85. doi:10.1016/j.jml.2006.08.004].

Self/observer agreement on HEXACO-PI-R scale scores was examined as a function of observers’ subjective ratings of acquaintanceship. For each participant (N = 2199), personality self-reports were obtained along with observer reports from a friend. Each factor-level scale displayed a different pattern of upward accuracy (agreement) trends in personality judgment. Self/observer agreement for Extraversion, Emotionality, and Openness was noticeably stronger at lower acquaintanceship than that for Conscientiousness, Agreeableness, and Honesty-Humility. Conscientiousness showed a steep upward accuracy trend across acquaintanceship levels, reaching a level of accuracy comparable to that of Extraversion and Emotionality. Self/observer agreement for Honesty-Humility and Agreeableness showed slower upward trends than that of Conscientiousness. In several cases, facet-level traits within the same broad factor differed in their accuracy trends.

Three moderators of agreement in person perception (behavioral consistency, observability, and social desirability) were studied. The major hypothesis is that the moderators can be estimated using the standing of targets on traits; that is, that as targets vary on a given trait, they vary in how they are seen on the moderators. Using Korean (N = 135) and US (N = 81) samples, we tested this approach for 80 traits. Analyses revealed that moderators varied by the combination of trait and target standing in different ways for the two samples. In judgment of behavioral consistency over target standings, linear and curvilinear trends were stronger for the US sample than for the Korean sample. For observability, judgments were similar, although curvilinear trends were larger for the Korean sample. Furthermore, being extremely positive was perceived as less desirable by the Korean judges. These findings are discussed in terms of cultural differences. Moreover, a new approach to the study of moderators is proposed.

We examined articles with experiments published in the Journal of Applied Behavior Analysis and in Behavior Analysis in Practice from 2017 through 2021 to determine how frequently procedural fidelity was assessed. When procedural fidelity was assessed, we determined how often a measure of interobserver agreement for those fidelity data was provided. We also determined how often a measure of interobserver agreement for participants' behavior was provided. Across both journals and all years, 54.7% of relevant articles provided a measure of procedural fidelity. Of them, 17.7% provided a measure of interobserver agreement for procedural fidelity. In marked contrast, 96.4% provided interobserver agreement data for participants' behavior. It is unfortunate that applied behavior analysts frequently fail to provide procedural fidelity data and, when they do, often fail to provide interobserver agreement data for the fidelity data. Reviewers for, and editors of, behavior-analytic journals are encouraged to strongly consider the relative value of procedural fidelity and agreement on procedural fidelity measures when rendering recommendations on the suitability of a given submission.

