Similar Articles (20 found)
1.
Studies in applied behavior analysis have used two expressions of reliability for human observations: percentage agreement (including percentage occurrence and percentage nonoccurrence agreement) and correlational techniques (including the phi coefficient). The formal relationship between these two expressions is demonstrated, and a table for converting percentage agreement to phi, or vice versa, is presented. It is suggested that both expressions be reported in order to communicate reliability unambiguously and to facilitate comparison of the reliabilities from different studies.
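
The two expressions this abstract relates both have standard closed forms for interval recording. A minimal Python sketch, assuming the two observers' records are summarized in a 2x2 table whose cell names a–d are illustrative (not taken from the article):

```python
import math

def reliability_indices(a, b, c, d):
    """Percentage agreement and phi from a 2x2 interval-recording table.

    a: intervals both observers scored the behavior (occurrence agreements)
    b, c: intervals scored by only one observer (disagreements)
    d: intervals neither observer scored (nonoccurrence agreements)
    """
    n = a + b + c + d
    overall = 100.0 * (a + d) / n            # overall percentage agreement
    occurrence = 100.0 * a / (a + b + c)     # percentage occurrence agreement
    nonoccurrence = 100.0 * d / (b + c + d)  # percentage nonoccurrence agreement
    # phi is the Pearson correlation between the two observers' binary records
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    phi = (a * d - b * c) / denom if denom else float("nan")
    return overall, occurrence, nonoccurrence, phi

print(reliability_indices(a=40, b=5, c=5, d=50))  # 90.0, 80.0, ~83.3, phi ~0.80
```

As the example shows, a single session can yield a high percentage agreement and a noticeably lower phi, which is why reporting both is informative.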

2.
The need to train accurate, not necessarily agreeing, observers is discussed. Intraobserver consistency as an intermediate criterion in such training is proposed and contrasted with the more familiar criterion of interobserver agreement. Videotaped observations of social interactions between handicapped and nonhandicapped preschoolers provided the medium for examining the criterion agreement of four observers trained against each type of standard. Observers generally failed to show high levels of criterion agreement whether trained to a within- or to a between-observer agreement standard. The results varied somewhat with the frequency of behaviors, however. Correlations between interobserver agreement and intraobserver consistency were variable but somewhat higher when interobserver agreement was the training criterion than when intraobserver consistency was the criterion. Correlations between interobserver agreement and criterion agreement ranged from −.16 to .89 during interobserver agreement training. Correlations between intraobserver consistency and criterion agreement ranged from −.23 to .99 during intraobserver consistency training.

3.
Interval-by-interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored-interval reliability and its converse, unscored-interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and that its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable; others are not.
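
The disagreement bandwidth described here is straightforward to compute. A minimal sketch, assuming interval-recording counts (variable names are illustrative, not the authors'): the band runs from the rate both observers agree occurred up to the rate at least one observer reported, and the three reliabilities can be read from its limits.

```python
def disagreement_band(a, b, c, n):
    """Band around the reported behavior rate implied by disagreements.

    a: intervals both observers scored the behavior
    b, c: intervals only one observer scored it
    n: total intervals observed
    """
    lower = a / n            # rate both observers agree occurred
    upper = (a + b + c) / n  # rate at least one observer reported
    return lower, upper      # e.g. scored-interval reliability is lower / upper

print(disagreement_band(a=18, b=2, c=3, n=100))  # band from 0.18 to 0.23
```

Plotting such bands per session, rather than a single agreement percentage, preserves the full disagreement information; two conditions whose bands do not overlap support a believable effect in the sense the abstract describes.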

4.
Hattie confirms the poor reliabilities of the scales in the POI but notes that the POI has a clear factor structure and speculates that the poor scale reliabilities may simply be due to the generality of the concepts covered. It is noted in reply that the clinician would be scoring Shostrom's unreliable scales rather than Hattie's clear factors and it is shown that the poor reliabilities are not due to conceptual generality.

5.
The meaning and properties of a commonly used index of reliability, S/L, were examined critically. It was found that the index does not reflect any conventional concept of reliability. When used for an identical behavioral observation session, it is not statistically correlated with other reliability indices. Within an observation session, the standardizing measure L is beyond the control of the investigator. Furthermore, the reason for the choice of L as the standard is unclear. The role of chance agreement in S/L is not known. The exact interpretation of the index depends on which observer reports L. Overall, the conceptual and mathematical meaning of S/L is dubious. It is suggested that the S/L index should not be used until it is shown to be a measure of reliability. Other approaches, such as intraclass correlations and generalizability coefficients, should be used instead. (The authors are indebted to Johnny Matson for his critique of an earlier version of this paper.)
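
On the common reading of S/L, an assumption on my part rather than something stated in the abstract, S is the smaller and L the larger of the two observers' session totals. A sketch under that assumed reading, which also illustrates the critique:

```python
def s_over_l(total_obs1, total_obs2):
    """Smaller session total divided by the larger (assumed reading of S/L)."""
    s, l = sorted((total_obs1, total_obs2))
    return s / l if l else float("nan")

# The critique in a nutshell: matching totals yield S/L = 1.0 even when the
# two observers never scored the same individual occurrence of the behavior.
print(s_over_l(20, 20))  # 1.0
```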

6.
The test-retest reliability of the Spanish Diagnostic Interview Schedule for Children (DISC-IV) is presented. This version was developed in Puerto Rico in consultation with an international bilingual committee, sponsored by NIMH. The sample (N = 146) consisted of children recruited from outpatient mental health clinics and a residential drug treatment facility. Two different pairs of nonclinicians administered the DISC twice to the parent and child respondents. Results indicated fair to moderate agreement for parent reports on most diagnoses. Relatively similar agreement levels were observed for last-month and last-year time frames. Surprisingly, the inclusion of impairment as a criterion for diagnosis did not substantially change the pattern of results for specific disorders. Parents were more reliable when reporting on diagnoses of younger (ages 4–10) than of older children. Children 11–17 years old were reliable informants on disruptive and substance abuse/dependence disorders, but unreliable for anxiety and depressive disorders. Hence, parents were more reliable when reporting about anxiety and depressive disorders, whereas children were more reliable than their parents when reporting about disruptive and substance disorders.

7.
An index is proposed to measure the extent of agreement of the data of a sociometric test with another test made at an earlier time or on another test criterion. The index is used to define an index of concordance between the two tests. It is shown how the index may be used for either individuals or groups. Tests of the hypothesis that agreement is random are given for all cases and applied to an example. (Work done under the sponsorship of the Office of Naval Research.)

8.
9.
Previous confirmatory factor analytic research that has examined the factor structure of the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV) has endorsed either higher order models or oblique factor models that tend to amalgamate both general factor and index factor sources of systematic variance. An alternative model that has not yet been examined for the WAIS-IV is the bifactor model. Bifactor models allow all subtests to load onto both the general factor and their respective index factor directly. Bifactor models are also particularly amenable to the estimation of model-based reliabilities for both global composite scores (ω_h) and subscale/index scores (ω_s). Based on the WAIS-IV normative sample correlation matrices, a bifactor model that did not include any index factor cross loadings or correlated residuals was found to be better fitting than the conventional higher order and oblique factor models. Although the ω_h estimate associated with the full scale intelligence quotient (FSIQ) scores was respectably high (.86), the ω_s estimates associated with the WAIS-IV index scores were very low (.13 to .47). The results are interpreted in the context of the benefits of a bifactor modeling approach. Additionally, in light of the very low levels of unique internal consistency reliabilities associated with the index scores, it is contended that clinical index score interpretations are probably not justifiable.
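
The model-based reliabilities ω_h and ω_s have standard closed forms. A minimal sketch under the usual bifactor assumptions (standardized loadings, uncorrelated factors); the loading values below are illustrative only, not WAIS-IV estimates:

```python
import numpy as np

def omega_h(general, groups, uniquenesses):
    """Omega-hierarchical: general-factor variance over total composite
    variance, from a standardized bifactor solution."""
    total = (np.sum(general) ** 2
             + sum(np.sum(g) ** 2 for g in groups)
             + np.sum(uniquenesses))
    return np.sum(general) ** 2 / total

def omega_s(general_sub, group_sub, unique_sub):
    """Unique index/subscale reliability: group-factor variance over the
    subscale composite variance, excluding general-factor variance."""
    total = (np.sum(general_sub) ** 2 + np.sum(group_sub) ** 2
             + np.sum(unique_sub))
    return np.sum(group_sub) ** 2 / total

# Illustrative four-subtest index: strong general loadings, weak group loadings
print(omega_s(general_sub=[.7, .7, .6, .6], group_sub=[.3, .3, .2, .2],
              unique_sub=[.42, .42, .60, .60]))  # ~0.10
```

The toy example shows the mechanism behind the abstract's finding: when subtests load strongly on the general factor, little reliable variance remains unique to the index, so ω_s is low even though the index's conventional internal consistency may look acceptable.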

10.
Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, on which ones to use, and on what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given for checking whether obtained disagreements are unlikely to be due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary: if reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance.
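
One simple chance model makes such a rule checkable in code; this is a sketch of one possible model, not a reproduction of the authors' tables. If both observers independently scored each interval at the same behavior rate p, a disagreement occurs with probability 2p(1−p), and a binomial tail gives the chance of seeing so few disagreements:

```python
from math import comb

def p_chance_disagreements(n, k, rate):
    """P(k or fewer disagreements in n intervals) under a simple chance model:
    both observers score each interval independently at the same rate."""
    p_dis = 2 * rate * (1 - rate)  # chance that exactly one observer scores
    return sum(comb(n, i) * p_dis**i * (1 - p_dis)**(n - i)
               for i in range(k + 1))

# The abstract's rule of thumb: 50 occasions, at most 10% disagreements.
print(p_chance_disagreements(n=50, k=5, rate=0.5))  # ~2e-9: chance implausible
```

At midrange behavior rates, chance observers would disagree on roughly half the intervals, so observing 10% or fewer disagreements over 50 occasions is vanishingly unlikely under this model, which is the intuition behind the rule.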

11.
The Teamwork – Knowledge, Skills, and Ability (KSA) Test was developed by Stevens and Campion to operationalize their comprehensive taxonomy of teamwork competencies. The test is generally considered ‘valid’ and has been used frequently in organizations. Our review of the literature found an average criterion validity of .20 for the Teamwork-KSA Test, although there was considerable variability across studies. We could find no research on the item properties, factor structure, or subscale reliabilities, and no extensive investigations of the nomological net of this test. In our field sample, we found subscale reliabilities to be generally inadequate, no meaningful factor structure, and low predictiveness of employees' performance on team-related dimensions. Although the taxonomy it purports to measure is preeminent, the Teamwork-KSA Test itself may have serious limitations.

12.
The Emotional Stroop (ES) task (I. H. Gotlib & C. D. McCann, 1984) has been proposed as an experimental measure of the processing of emotion, or of the bias in attention to emotion-laden information. However, study results have not been consistent. To further examine its reliability for empirical research, the authors of this study administered the ES task to 33 participants on 2 occasions separated by 1 week. Results indicated that retest reliabilities for reaction times (RTs) derived from the 3 emotion conditions (manic, neutral, and depressive) across the 1-week interval were very high. However, consistent with previous research, the reliabilities were very low for the interference indices (manic and depressive). These low reliabilities reflect the very high intercorrelation between the RTs derived from the 3 conditions. The authors concluded that a better indicator of the reliability of this task is the individual RTs from each emotion condition.
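
The abstract's core point, that difference scores between highly correlated RTs lose reliability, is easy to state in code. A minimal sketch with illustrative function names:

```python
import numpy as np

def interference_index(rt_emotion, rt_neutral):
    """Interference index: emotion-condition RT minus neutral-condition RT."""
    return np.asarray(rt_emotion) - np.asarray(rt_neutral)

def retest_reliability(session1, session2):
    """Test-retest reliability as the Pearson correlation across sessions."""
    return np.corrcoef(session1, session2)[0, 1]

# When the condition RTs are almost perfectly intercorrelated, subtracting
# them removes most of the shared true-score variance and leaves mostly
# error, so the interference index can show poor retest reliability even
# though every raw RT is highly reliable across the two weeks.
```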

13.
14.
We conducted one of the few studies that have examined the reliability of the Structured Clinical Interview for DSM-III-R Axis I (SCID-I) with a mixed inpatient and outpatient population of adults 55 years old and over (range, 56–84 years; mean, 67.33 years). All SCID interviews were videotaped or audiotaped and were administered by Master's-level clinicians working toward their doctoral degrees in clinical psychology. Interrater reliability estimates (kappa and percentage agreement) were calculated for current major depressive episode (47% base rate) and the broad diagnostic categories of anxiety disorders (15% base rate) and somatoform disorders (12% base rate). Kappa values were .70, .77, and 1.0, respectively; percentage agreement was 85% for major depression, 94% for anxiety disorders, and 100% for somatoform disorders. Overall percentage agreement was 91%. We conclude that the SCID-I can be effectively administered by relatively inexperienced clinicians to diagnose older psychiatric patients reliably. Directions that future research might take are offered.
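
Both statistics reported here are standard. A minimal sketch computing Cohen's kappa and percentage agreement from a 2x2 present/absent diagnosis table; the counts in the example are illustrative, not the study's data:

```python
import numpy as np

def kappa_and_agreement(table):
    """Cohen's kappa and overall percentage agreement for two raters.

    table[i][j]: number of cases rater 1 assigned category i
    and rater 2 assigned category j.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    po = np.trace(t) / n                               # observed agreement
    pe = (t.sum(axis=0) * t.sum(axis=1)).sum() / n**2  # chance agreement
    return (po - pe) / (1 - pe), 100.0 * po

# Illustrative diagnosis near a 47% base rate: kappa ~.74, agreement 87%
print(kappa_and_agreement([[40, 7], [6, 47]]))
```

Kappa corrects the raw agreement for the agreement expected by chance given each rater's base rates, which is why the two statistics can diverge, especially for rare diagnoses.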

15.
16.
The just noticeable difference (jnd) unit of classical psychophysics is introduced as a new way to describe accuracy and agreement in observer evaluations of personality. A formula for estimating jnd's from typically available summary statistics is derived from Thurstone's law of comparative judgment. A study examining four traits judged in 10 samples of subjects, where the design permitted the calculation of jnd's by the method of paired comparisons, indicated that the formula predicted the empirically derived jnd's associated with the mean judge ratings with considerable precision. Jnd's of criterion measures were also predicted, and while the fit was somewhat less impressive in this case, there was still appreciable convergence between predicted and empirical values. The implications of jnd measures of agreement and accuracy are discussed. These implications include (a) possibilities for increased understanding of bias in observer judgments, (b) a new recognition that equal correlations to external criteria do not necessarily imply equal accuracy, and (c) alternative ways of describing the magnitude of effect in psychological research.

17.
The scoring protocol adopted by the MSCEIT V2 has been criticised since its development. The present study raises questions regarding the value of consensus scoring by analysing responses within the categorical subscales of Changes and Blends using the Optimal Scaling technique within Categorical Principal Components Analysis (CATPCA) in the Statistical Package for the Social Sciences (SPSS) (n = 206). On a number of occasions, there was no clear agreement as to the “correct” response to items within these categorical subscales. Such an issue seems integral to the application of the MSCEIT V2 and one that deserves more attention. On a more positive note, Optimal Scaling improved the reliabilities of the Changes and Blends subscales, though less so for Changes. Nevertheless, this raises the possibility of improving the reliabilities of other subscales in the MSCEIT V2 and in turn improving the power of subsequent statistical tests.

18.
The literature on diagnosis of head pain associated with psychological factors indicates that these diagnoses rely almost exclusively on self-report criteria. The reliability of self-report criteria for diagnosis of headache has not been previously reported. The present study investigated the reliability of headache diagnosis based on the criteria suggested by the Ad Hoc Committee on Classification of Headache. The results indicated modest rater agreement. It was concluded that the headache literature may be confounded by unreliable diagnostic procedures. Improved methods of classifying headache types using self-report, behavioral, and physiological measures during pain-free and headache states are required before adequate reliability of headache classification can be achieved. It is suggested that reliable and valid measurement and classification would eliminate much of the confusion currently existing in the headache literature.

19.
20.
A new form (VI) of the Sensation Seeking Scale (SSS) was developed which separates reports of past experiences from desired or intended future experiences on both Disinhibition (Dis) and Thrill and Adventure Seeking (TAS) factors. Factor analyses were used to select items for the scales. High internal reliabilities were found for the Experience-Dis, Intention-TAS, and Intention-Dis scales, but only moderate reliability was found for the Experience-TAS scale. Retest reliabilities were high for all scales. The Experience-TAS and -Dis scales were highly correlated for males but not for females. The Experience- and Intention-TAS scales were moderately correlated, and the Experience- and Intention-Dis scales were highly correlated for both sexes. Both the TAS and the Dis scales on form V were highly correlated with the corresponding Intention scales on form VI. Uses for the new SS scales in individual assessment are suggested. (A copy of SSS form VI may be obtained from the author.)
