Found 20 similar articles; search took 15 ms
1.
Three factors characteristic of experimental settings were hypothesized to inflate artifactually the reliability of observational recordings: (a) knowledge by observers of when and by whom their reliability is being assessed, (b) the absence of the experimenter or a monitor to prevent cheating, and (c) computation of reliability within- (versus between-) observer group. Three groups of four observers used a standard nine-category observational code for disruptive behavior in recording from videotapes of a classroom for 22 days. Analyses revealed considerable increases in average occurrence reliability as a function of the main effects of each of the experimental factors. The specific increases in reliability associated with each of the 12 combinations of the experimental factors are presented for each category of behavior. The possible role of observer-training procedures and behavioral definitions as determiners of nonartifactual reliability is discussed.
2.
Graphical and statistical indices employed to represent observer agreement in interval recording are described as "judgmental aids", stimuli to which the researcher and scientific community must respond when viewing observer agreement data. The advantages and limitations of plotting calibrating observer agreement data and reporting conventional statistical aids are discussed in the context of their utility for researchers and research consumers of applied behavior analysis. It is argued that plotting calibrating observer data is a useful supplement to statistical aids for researchers but is of only limited utility for research consumers. Alternatives to conventional per cent agreement statistics for research consumers include reporting special agreement estimates (e.g., per cent occurrence agreement and nonoccurrence agreement) and correlational statistics (e.g., Kappa and Phi).
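The agreement estimates named in this abstract can be illustrated with a minimal sketch. It assumes each observer's record is a list of 0/1 values, one per interval, and uses the common textbook definitions of overall, occurrence, and nonoccurrence agreement and of Cohen's kappa. The function name `agreement_indices` and the data layout are hypothetical and are not taken from the article.

```python
def agreement_indices(obs_a, obs_b):
    """Agreement statistics for two observers' interval records (0/1 per interval)."""
    if len(obs_a) != len(obs_b) or len(obs_a) == 0:
        raise ValueError("records must be non-empty and of equal length")
    n = len(obs_a)
    # Intervals both observers scored, and intervals neither scored.
    both = sum(1 for a, b in zip(obs_a, obs_b) if a and b)
    neither = sum(1 for a, b in zip(obs_a, obs_b) if not a and not b)
    disagree = n - both - neither

    nan = float("nan")
    overall = (both + neither) / n
    # Occurrence agreement ignores intervals that neither observer scored.
    occurrence = both / (both + disagree) if (both + disagree) else nan
    # Nonoccurrence agreement ignores intervals that both observers scored.
    nonoccurrence = neither / (neither + disagree) if (neither + disagree) else nan

    # Cohen's kappa: overall agreement corrected for chance agreement.
    p_a = sum(1 for a in obs_a if a) / n
    p_b = sum(1 for b in obs_b if b) / n
    chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (overall - chance) / (1 - chance) if chance != 1 else nan

    return {
        "overall": overall,
        "occurrence": occurrence,
        "nonoccurrence": nonoccurrence,
        "kappa": kappa,
    }
```

With a low-rate behavior, overall agreement can look high while occurrence agreement and kappa are noticeably lower, which is the kind of contrast these alternative statistics are meant to expose.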
3.
The conventionally employed procedure for rating ischemic pain was found to produce a degree of response bias associated with the ceiling points of the scale used. A new approach permitting open-ended ratings followed by transformation of these ratings into a common decile scale provided far greater test-retest reliability. This was explained largely in terms of the attenuation of rating artifact. The new procedure also gave rise to consistently linear functions for ischemic pain. Implications are raised for the measurement of pain as well as other psychological continua.
4.
Kelly MB 《Journal of applied behavior analysis》1977,10(1):97-101
The research published in the Journal of Applied Behavior Analysis (1968 to 1975) was surveyed for three basic elements: data-collection methods, reliability procedures, and reliability scores. Three-quarters of the studies reported observational data. Most of these studies' observational methods were variations of event recording, trial scoring, interval recording, or time-sample recording. Almost all studies reported assessment of observer reliability, usually total or point-by-point percentage agreement scores. About half the agreement scores were consistently above 90%. Less than one-quarter of the studies reported that reliability was assessed at least once per condition.
5.
Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.
6.
Betty J. House Alvin E. House 《Journal of psychopathology and behavioral assessment》1979,1(2):149-165
Factors affecting interobserver agreement (reliability) with a comprehensive coding system in the naturalistic observation of children were examined. Data from 117 pairs of observations on 35 children and their families were examined with respect to reliability and three possible covariates: response frequency, observation complexity, and code definition clarity. Analysis of results strongly supported response frequency as a positive covariate of interobserver agreement. Complexity was found to negatively covary with interobserver agreement. The relationship between code clarity and reliability was in the predicted direction but failed to obtain statistical significance. Implications for observer training and data collection in observational studies are discussed. An earlier version of this article was prepared for presentation at The Association for the Advancement of Behavior Therapy, Atlanta, Georgia, December 1977.
7.
《Journal of Cognitive Psychology》2013,25(7):750-759
Concerns have been raised about the reliability of dot-probe tasks. The cued Visual Probe Task (cVPT) uses cues predicting locations of emotional stimuli, which appears to improve reliability. However, cVPT reliability could be affected by individual differences involving cue features. Here, we assessed specifically anticipatory reliability. Further, trial-to-trial carryover effects, previously found for stimulus-evoked biases, were tested. 82 participants were analysed, who performed an online procedure including a reversal of the cue mapping. Predicted stimulus categories were neutral and angry faces. Cue-Stimulus Intervals of 400 and 1000 ms were used. An overall anticipatory attentional bias, in terms of RT difference scores, towards threat was found. Reliability was around .4, similar to previous results despite the mapping reversal procedure. Carryover effects were found with a similar pattern as for non-cued threat-evoked bias. The results confirm a reasonably reliable outcome-focused bias towards threat, showing similar carryover effects as found for stimulus-evoked bias.
8.
In theory, the greatest lower bound (g.l.b.) to reliability is the best possible lower bound to the reliability based on single test administration. Yet the practical use of the g.l.b. has been severely hindered by sampling bias problems. It is well known that the g.l.b. based on small samples (even a sample of one thousand subjects is not generally enough) may severely overestimate the population value, and statistical treatment of the bias has been badly missing. The only results obtained so far are concerned with the asymptotic variance of the g.l.b. and of its numerator (the maximum possible error variance of a test), based on first order derivatives and the assumption of multivariate normality. The present paper extends these results by offering explicit expressions for the second order derivatives. This yields a closed form expression for the asymptotic bias of both the g.l.b. and its numerator, under the assumptions that the rank of the reduced covariance matrix is at or above the Ledermann bound, and that the nonnegativity constraints on the diagonal elements of the matrix of unique variances are inactive. It is also shown that, when the reduced rank is at its highest possible value (i.e., the number of variables minus one), the numerator of the g.l.b. is asymptotically unbiased, and the asymptotic bias of the g.l.b. is negative. The latter results are contrary to common belief, but apply only to cases where the number of variables is small. The asymptotic results are illustrated by numerical examples. This research was supported by grant DMI-9713878 from the National Science Foundation.
9.
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
10.
Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice-versa, is presented. It is suggested that both expressions be reported in order to communicate reliability unambiguously and to facilitate comparison of the reliabilities from different studies.
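As a rough illustration of how percentage agreement and phi can be read off the same data, the sketch below computes both from the four cells of a 2 × 2 observer-by-observer table, using the standard formula for the phi coefficient. It does not reproduce the article's conversion table; the function name `phi_and_percent_agreement` and the cell labels are hypothetical.

```python
import math


def phi_and_percent_agreement(both, a_only, b_only, neither):
    """Return (phi, percent agreement) from the cells of a 2x2 agreement table.

    both    : intervals scored by both observers
    a_only  : intervals scored only by observer A
    b_only  : intervals scored only by observer B
    neither : intervals scored by neither observer
    """
    n = both + a_only + b_only + neither
    if n == 0:
        raise ValueError("table is empty")
    percent_agreement = 100.0 * (both + neither) / n

    # Standard phi: (ad - bc) / sqrt of the product of the four marginal totals.
    denom = math.sqrt(
        (both + a_only) * (b_only + neither) * (both + b_only) * (a_only + neither)
    )
    # Phi is undefined when any marginal total is zero.
    phi = (both * neither - a_only * b_only) / denom if denom else float("nan")
    return phi, percent_agreement
```

Two tables with the same percentage agreement can yield different phi values when the behavior's base rate differs, which is one reason reporting both expressions is informative.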
11.
12.
Nathan A. Kimbrel Rosemery O. Nelson-Gray John T. Mitchell 《Personality and individual differences》2012,52(3):395-400
The goal of the present research was to test the hypothesis that cognitive biases for negative and threatening social information mediate the effects of behavioral inhibition system (BIS) and behavioral approach system (BAS) sensitivity on social anxiety. Participants completed self-report measures of BIS and BAS and then underwent a social-threat induction procedure in which they were told they would have to perform a speech. A battery of cognitive bias measures was then administered, followed by a battery of state anxiety measures. Audience members also rated participants’ anxiety during the speech. Structural equation modeling was used to test the hypothesized model. As predicted, the fully-mediated model showed the best fit to the data, and higher BIS and lower BAS were found to have significant indirect effects on social anxiety via cognitive bias.
13.
Karen C. Wells Robert J. McMahon Rex Forehand Douglas L. Griest 《Journal of psychopathology and behavioral assessment》1980,2(1):65-69
The purpose of the present study was to examine the effect of the presence of a reliability observer on the number of positive parent behaviors recorded by a primary observer during naturalistic parent-child interactions. Thirty parents and their young clinic-referred children served as subjects. After two initial home observations, a reliability observer was present to record data in observation session 3, but not session 4, for one-half the subjects. For the remaining subjects the reliability observer was present in session 4 but not session 3. The results of a 2 × 2 analysis of variance indicated a group by session interaction which resulted from an increase in maternal attention to the child in the presence of a reliability observer. Hypotheses to explain the finding are presented and implications of the results are discussed. This research was supported in part by NIMH Grant MH28859-01.
14.
Gregory J. Boyle Tania J. Lennon 《Journal of psychopathology and behavioral assessment》1994,16(3):173-187
The reliability, discriminant validity, and construct validity of the Personality Assessment Inventory (PAI), a multidimensional self-report measure of abnormal personality traits, were examined within the Australian context. Subjects included 151 normals, 30 alcoholics, and 30 schizophrenic patients. A subsample of 70 nonpsychiatric adults responded to the PAI items twice over a test-retest interval of 28 days. The resulting median retest coefficient was 0.7, indicating less than optimal stability. The median alpha (KR21) coefficient was 0.8, suggesting somewhat narrow measurement scales. A significant multivariate main effect was obtained across groups after the effects of age and gender were removed. Multiple comparisons for each of the PAI scales revealed significant differences between the respective groups, as discussed. A higher-order scale factoring did not strongly support the purported PAI structure. In reanalyses of the correlation matrices included in the Professional Manual, the purported PAI factor structure was unable to be replicated for the standardization clinical sample (N=1246), and a confirmatory factor analysis using the normative (validation) correlational data (N=1000) revealed poor fit indices, raising further concerns about construct validity.
15.
16.
Several factors thought to influence the representativeness of behavioral assessment data were examined in an analogue study using a multifactorial design. Systematic and unsystematic methods of observing group behavior were investigated using 18 male and 18 female observers. Additionally, valence properties of the observed behaviors were inspected. Observers' assessments of a videotape were compared to a criterion code that defined the population of behaviors. Results indicated that systematic observation procedures were more accurate than unsystematic procedures, though this factor interacted with gender of observer and valence of behavior. Additionally, males tended to sample more representatively than females. A third finding indicated that the negatively valenced behavior was overestimated, whereas the neutral and positively valenced behaviors were accurately assessed.
17.
Mark H. Licht Gordon L. Paul Christopher T. Power Kathryn L. Engel 《Journal of psychopathology and behavioral assessment》1980,2(3):175-206
The comparative effectiveness of two time-limited modes of training observers to code activity on the Staff-Resident Interaction Chronograph (SRIC) in residential treatment programs for mentally disabled adults was evaluated. The susceptibility of training procedures for consensual observer drift was also examined, as was the predictability of SRIC mastery from trainee characteristics. Two equated groups of undergraduate student trainees (N=15 each) participated in full-time training for 27 days, followed by two weeks of criterion testing in vivo and on videotapes. One group received training by experienced personnel using procedures known to be effective (original method). The other group received training via a previously untested set of written and videotaped procedures that do not rely on experienced personnel (package method). Multivariate and univariate analyses of variance found both methods to be equally effective in the degree of mastery achieved by trainees, without evidence of observer drift. No meaningful predictions of coding mastery were found, but conceptual mastery was predictable from individual characteristics. Differences were obtained for both groups between in vivo versus videotaped criterion tests. The results document procedures that are both efficient and resistant to invalidity for complex observational methodology as well as feasible for standardizing assessment of staff functioning across residential settings. This article is based on a thesis submitted to the Graduate College of the University of Illinois at Urbana-Champaign in partial fulfillment of the requirements for the Ph.D. degree in psychology by the first author under direction of the second author. The third and fourth authors also participated as supervisors. Appreciation is extended to other members of the thesis committee, Fred Kanfer, W. Robert Nay, Julian Rappaport, and James Wardrop, for their comments and recommendations.
This study was partially supported by Public Health Service Grants MH-25464 and MH-14257 from the National Institute of Mental Health, and by grants from The Joyce Foundation and the Illinois Department of Mental Health and Developmental Disabilities.
18.
Accuracy in the ability to detect truths and lies is important in a legal setting. It might be used as a tool in police investigations to eliminate potential suspects, to check the truthfulness of informants or to examine contradictory statements of witnesses and suspects in the same case. A consistent finding in the detection of deception literature is the truth bias: People's accuracy at detecting truths is usually higher than their accuracy at detecting lies. The present article examines whether the existence of a truth bias depends on the type of lie. It is argued that a truth bias may occur when people judge extensive statements (e.g. elaborations), but that a lie bias may occur when people judge statements which do not provide much verbal information (e.g. denials). Fifty participants (college students) were exposed to 20 video fragments of 20 people telling elaborations (10) or denials (10). Half of the elaborations and denials were truthful, the other half were deceptive. After each fragment, the participants were asked to indicate whether the person was lying or telling the truth and how confident they were in their decision making. As predicted, with regard to elaborations a truth bias was found and with regard to denials a lie bias was found. In other words, people have difficulty in accurately judging deceptive elaborations and truthful denials. The study further revealed individual differences in participants' confidence at detecting deceit. The more socially anxious/shy the participants reported themselves to be, the less confident they were in their ability to detect deceit. Also, the more extraverted they themselves reported to be, the more confident they were in their ability to detect deceit. The importance of confidence on improving people's ability to detect deceit will be discussed.
19.
Yelton AR 《Journal of applied behavior analysis》1979,12(4):565-569
Two sources of variability must each be considered when examining change in level between two sets of data obtained by human observers; namely, variance within data sets (phases) and variability attributed to each data point (reliability). Birkimer and Brown (1979a, 1979b) have suggested that both chance levels and disagreement bands be considered in examining observer reliability and have made both methods more accessible to researchers. By clarifying and extending Birkimer and Brown's papers, a system is developed using observer agreement to determine the data point variability and thus to check the adequacy of obtained data within the experimental context.
20.
The cueing effects of interviewer praise contingent on a target behavior and expectation of behavior change were examined with six observers. Experiment I investigated the effect of cues in conjunction with expectation. Experiment II assessed the relative contributions of cues and expectation, and Experiment III examined the effect of cues in the absence of expectation. The frequencies of two behaviors, client eye contact and face touching, were held constant throughout a series of videotaped interviews between an "interviewer" and a "client". A within-subjects design was used in each experiment. During baseline conditions, praise did not follow eye contact by the client on the videotape. In all experimental conditions, praise statements from the interviewer followed each occurrence of eye contact with an equal number of praises delivered at random times when there was no eye contact. Three of the six observers dramatically increased their recordings of eye contact during the first experimental phase, but these increases were not replicated in a second praise condition. There were no systematic changes in recorded face touching. Witnessing the delivery of consequences, rather than expectation, seemed to be responsible for the effect. This potential threat to the internal validity of studies using observational data may go undetected by interobserver agreement checks.