Similar literature
20 similar records found.
2.
A basis for analyzing test-retest reliability (total citations: 4; self-citations: 0; citations by others: 4)
Three sources of variation in experimental results for a test are distinguished: trials, persons, and items. Unreliability is defined only in terms of variation over trials. This definition leads to a more complete analysis than the conventional one does, and it verifies Spearman's contention that the conventional approach, formulated by Yule, introduces unnecessary hypotheses. It is emphasized that at least two trials are necessary to estimate the reliability coefficient. This paper is devoted largely to developing lower bounds to the reliability coefficient that can be computed from but a single trial; these avoid the experimental difficulties of making two independent trials. Six different lower bounds are established, appropriate for different situations. Some of the bounds are easier to compute than conventional formulas, and all of them assume less than conventional formulas do. The terminology used is that of psychological and sociological testing, but the discussion actually provides a general analysis of the reliability of the sum of n variables. The writer is indebted to the members of his statistical seminar, to Professor Mark Kac, and to Professor Samuel A. Stouffer and his staff in the Research Branch, Information and Education Division, War Department, for their helpful comments on this paper.
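
Single-trial lower bounds of this kind are widely known as Guttman's lambda coefficients. The abstract does not spell out the six bounds. The sketch below therefore computes only the standard textbook forms of λ1, λ2 and λ3 from a persons-by-items score matrix. It is an illustration under that assumption and does not reproduce the paper's derivations. The function names and example scores are hypothetical.

```python
import math


def covariance_matrix(scores):
    """Sample covariance matrix of item scores (rows = persons, columns = items)."""
    n_persons, n_items = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n_persons for j in range(n_items)]
    return [
        [
            sum((row[a] - means[a]) * (row[b] - means[b]) for row in scores)
            / (n_persons - 1)
            for b in range(n_items)
        ]
        for a in range(n_items)
    ]


def guttman_lambdas(scores):
    """Return (lambda1, lambda2, lambda3) for the sum of the item scores."""
    if len(scores) < 2 or len(scores[0]) < 2:
        raise ValueError("need at least two persons and two items")
    cov = covariance_matrix(scores)
    n = len(cov)
    total_var = sum(sum(row) for row in cov)  # variance of the total (sum) score
    item_var = sum(cov[i][i] for i in range(n))  # sum of item variances
    off_diag_sq = sum(cov[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
    lam1 = 1 - item_var / total_var
    lam2 = lam1 + math.sqrt(n / (n - 1) * off_diag_sq) / total_var
    lam3 = n / (n - 1) * lam1  # algebraically equal to coefficient alpha
    return lam1, lam2, lam3


# Hypothetical scores of five persons on three items.
scores = [[1, 2, 2], [2, 2, 3], [3, 4, 3], [4, 3, 5], [5, 5, 4]]
print(guttman_lambdas(scores))
```

λ2 is never smaller than λ3, so it is the tightest of these three bounds. When every item is identical, λ2 and λ3 both reach 1, while λ1 stays at 1 - 1/n.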

3.
The test-retest reliability of qualitative items, such as occur in achievement tests, attitude questionnaires, public opinion surveys, and elsewhere, requires a different technique of analysis from that used for quantitative variables. Definitions appropriate to the qualitative case are given both for the reliability coefficient of an individual on an item and for the reliability coefficient of a population on the item. From but a single trial of a large population on the item, it is possible to compute a lower bound to the group reliability coefficient; two kinds of lower bounds are presented. From two experimentally independent trials of the population on the item, it is possible to compute an upper bound to the group reliability coefficient; two upper bounds are presented. The computations for the lower and upper bounds are all very simple. Numerical examples are given.

6.
In two studies, the construct (convergent and discriminant) validity and test-retest reliability of a date rape decision-latency measure were examined. In Study 1, 174 college men completed measures related to sexual aggression and listened to an audiotaped simulation of a date rape, during which cues of nonconsent and force gradually escalated over time. Participants were instructed to respond, by pressing a button that recorded the latency of their decision in seconds, if and when they believed the man depicted in the scenario should stop his sexual advances. Results demonstrated positive associations between prolonged decision latencies and sexually aggressive behavior, calloused sexual beliefs, acceptance of interpersonal violence, and sexual promiscuity. In Study 2, the initial results were cross-validated in a sample of 102 college men, and discriminant validity was established: decision latencies were unassociated with measures of social desirability, alcohol consumption, and drug use. Test-retest reliability assessed over a 2-week interval was .87. The authors wish to thank Alan Gross and Brian Marx for providing the audiotaped stimulus materials, Richard Marsh for writing the decision-latency computer program, Jason Hicks for assisting with programming, and the undergraduate research assistants for serving as experimenters.

7.
The Raven Colored Progressive Matrices was administered to a sample of 259 children in Lithuania and re-administered 2 years later. The test-retest reliability was .499.

8.
The baseline inter-rater reliability, test-retest reliability, follow-up inter-rater reliability, and follow-up longitudinal reliability (inter-rater reliability between generations of raters) of borderline symptoms and of the diagnosis of borderline personality disorder (BPD) were assessed using the Revised Diagnostic Interview for Borderlines (DIB-R). Excellent kappas (> .75) were found in each of these reliability substudies for the diagnosis of BPD itself. Excellent kappas were also found in each of the three inter-rater reliability substudies for the vast majority of borderline symptoms assessed by the DIB-R. Test-retest reliability for these symptoms was somewhat lower but still very good: one-third of the BPD symptoms assessed had a kappa in the excellent range, and the remaining two-thirds had a kappa in the fair-to-good range (.57-.73). The dimensional reliability of BPD symptom areas was somewhat higher than that of categorical measures of the subsyndromal phenomenology of BPD; all five dimensional measures of borderline psychopathology had intraclass correlation coefficients in the excellent range in all four reliability substudies. Taken together, the results suggest that both the borderline diagnosis and the symptoms of BPD can be assessed reliably with the DIB-R, and that excellent reliability, once achieved, can be maintained over time for both the syndromal and subsyndromal phenomenology of BPD.
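
The diagnostic results above are reported as kappas. As a point of reference, here is a minimal sketch of unweighted Cohen's kappa for two categorical ratings of the same cases, such as two raters or two interview occasions. The abstract does not say which kappa variant was used, so the unweighted two-rating form is an assumption. The example diagnoses are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two sets of categorical labels on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed proportion of cases on which the two ratings agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rating's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both ratings use a single category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical diagnoses of ten cases at two interviews.
time1 = ["BPD", "BPD", "no", "no", "BPD", "no", "no", "no", "BPD", "no"]
time2 = ["BPD", "BPD", "no", "no", "no", "no", "no", "no", "BPD", "no"]
print(round(cohens_kappa(time1, time2), 3))  # prints 0.783
```

In this toy case, raw agreement is 90%. Kappa is lower because 54% agreement would be expected by chance from the marginal frequencies alone.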

9.
Tod D, Morrison TG, Edwards C. Body Image, 2012, 9(3): 425-428
The current study assessed relationships among four commonly used drive for muscularity questionnaires, along with their 7- and 14-day test-retest reliability. Sample 1 comprised young British adult males (N = 272; mean age = 20.3) who completed the questionnaires once. Sample 2, a group of young British adult males (N = 54; mean age = 19.3), completed the questionnaires three times, spaced 7 and 14 days apart. Correlations in Sample 1 ranged from .20 to .82, providing evidence for concurrent and discriminant validity. Evidence for test-retest reliability emerged, with intraclass correlations ranging from .78 to .95 (p < .001) and generally nonsignificant t-tests (p > .05). Overall, the data support the psychometric properties of the drive for muscularity inventories; however, the shared variance (35-67%) suggests that refinement is possible.
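
The abstract reports intraclass correlations without naming the ICC form. Below is a sketch of one common choice: the two-way random-effects, absolute-agreement, single-measurement ICC(2,1) described by Shrout and Fleiss, computed from two-way ANOVA mean squares. Choosing ICC(2,1) is an assumption, and the example scores are hypothetical.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    Rows are participants, columns are occasions (e.g. baseline, day 7, day 14)."""
    n, k = len(data), len(data[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two participants and two occasions")
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    # Two-way ANOVA sums of squares: participants, occasions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores at baseline, day 7 and day 14 for five participants.
scores = [[30, 31, 29], [42, 40, 41], [25, 27, 26], [38, 37, 39], [33, 35, 34]]
print(round(icc_2_1(scores), 2))
```

The absolute-agreement form penalizes a systematic shift between occasions. For example, adding one point to every retest score lowers the ICC even though the rank order is unchanged.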

10.
Although driving while intoxicated (DWI) is a pervasive problem, reliable measures of this behavior have been elusive. In the present study, the Form 90, a widely utilized alcohol and substance use instrument, was adapted for measurement of DWI and related behaviors. Levels of reliability for the adapted instrument, the Form 90-DWI, were tested among a university sample of 60 undergraduate students who had consumed alcohol during the past 90 days. The authors administered the instrument once during an intake interview and again, 7-30 days later, to determine levels of test-retest reliability. Overall, the Form 90-DWI demonstrated high levels of reliability for many general drinking and DWI behaviors. Levels of reliability were lower for riding with an intoxicated driver and for variables involving several behavioral conjunctions, such as seat belt use and the presence of passengers when driving with a blood alcohol concentration above .08. Overall, the Form 90-DWI shows promise as a reliable measure of DWI behavior in research on treatment outcome and prevention.

11.
Two groups of students enrolled in a university physical activity course volunteered to complete Kolb's Learning Style Inventory at the beginning and at the end of a semester so that test-retest reliability could be estimated. A control group (n = 129) completed the inventory in its original form, while the experimental group (n = 124) completed the same test with modified instructions providing a more specific focus. Test-retest reliability, assessed using a Pearson product-moment correlation, improved for the group given instructions that specified a contextual focus.
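
The Pearson product-moment correlation named above can be sketched as follows for paired beginning-of-semester and end-of-semester scores. The function name and example scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between test and retest scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when one session has no variance")
    return s_xy / sqrt(s_xx * s_yy)


# A uniform 10-point shift between sessions still yields r = 1.0.
print(pearson_r([1, 2, 3, 4], [11, 12, 13, 14]))
```

As the usage line shows, r reflects only the consistency of rank and spacing across sessions. It does not detect a uniform change in level, which absolute-agreement intraclass correlations would penalize.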

12.
Forty-five Swedish couples (N = 90) independently completed a translation of Rothbart's Infant Behavior Questionnaire (IBQ) when their infants were 3 and 8 months of age. There was greater agreement between mothers and fathers at 8 than at 3 months, perhaps because fathers became more involved as their children grew older. At neither age was agreement as great as that reported by Rothbart (1981) in the USA. Parents did not agree on the dimensions Duration of Orienting and Soothability. In the eyes of both parents, there was significant, although modest, stability over time on most dimensions of infant temperament. Perceived stability was lowest for Distress to Approaching Stimuli (Fear). These results suggest that the IBQ (even in a Swedish translation) may be a reliable and valid way of measuring parental perceptions of infant temperament.

14.
Previous research efforts have developed and validated various scales potentially useful in evaluating service learning outcomes. The developmental efforts reported for the four scales examined in this study did not include the test-retest reliabilities that would provide assurance to service learning researchers of the long-term stability and therefore usefulness of these measures. Summary estimates of 13-wk. test-retest reliabilities for the scales Civic Participation, Self-efficacy Toward Service, Attitude Toward Helping Others, and College Education's Role in Addressing Social Issues provide service learning researchers with evidence of stability of the scales over the typical duration of service learning courses.

15.
The reliability of magnitude-estimation scaling as a measure of overall speech clarity was investigated. Forty subjects (M age = 19 yr.) provided magnitude-estimation responses for nine audiotaped versions of a nonsense sentence varying systematically in the number of correct consonant phonemes. There was no significant difference between the subjects' magnitude-estimation responses in two test sessions separated by one week. The analysis suggested that magnitude-estimation scaling is a reliable measure of speech clarity/intelligibility. This finding is discussed in relation to speech samples varying in aspects other than the number of correct consonant phonemes, and in relation to possible further clinical research applications.
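
The abstract does not name the test behind "no significant difference" between the two sessions. A paired-samples t-test is one conventional choice, and using it here is an assumption. The minimal sketch below returns only the t statistic and degrees of freedom. A p-value needs the t distribution, which `scipy.stats.ttest_rel` provides. The example ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(session1, session2):
    """Paired-samples t statistic and degrees of freedom for session2 - session1."""
    if len(session1) != len(session2) or len(session1) < 2:
        raise ValueError("need paired score lists of equal length (n >= 2)")
    diffs = [b - a for a, b in zip(session1, session2)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when every difference is identical")
    n = len(diffs)
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1


# Hypothetical ratings from four listeners in week 1 and week 2.
t, df = paired_t([1, 2, 3, 4], [2, 2, 4, 5])
print(t, df)  # prints 3.0 3
```

A nonsignificant t indicates no detectable mean shift between sessions. It does not by itself show that individual listeners ranked the stimuli consistently.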

17.
The conventionally employed procedure for rating ischemic pain was found to produce a degree of response bias associated with the ceiling points of the scale used. A new approach permitting open-ended ratings followed by transformation of these ratings into a common decile scale provided far greater test-retest reliability. This was explained largely in terms of the attenuation of rating artifact. The new procedure also gave rise to consistently linear functions for ischemic pain. Implications are raised for the measurement of pain as well as other psychological continua.

18.
An important aspect of human individual face recognition is the ability to discriminate unfamiliar individuals. Since many general processes contribute to explicit behavioural performance in individual face discrimination tasks, isolating a measure of unfamiliar individual face discrimination ability in humans is challenging. In recent years, fast periodic visual stimulation (FPVS) has provided objective (frequency-locked), implicit electrophysiological indices of individual face discrimination that are highly sensitive at the individual level within a few minutes of testing. Here we evaluate the test-retest reliability of this response across scalp electroencephalographic (EEG) recording sessions separated by more than two months, in the same 30 individuals. We found no overall test-retest difference across sessions in the amplitude or spatial distribution of the EEG individual face discrimination response. Moreover, with only 4 stimulation sequences, corresponding to 4 min of recording per session, the individual face discrimination response was highly reliable in terms of amplitude, spatial distribution, and shape. Together with previous observations, these results strengthen the diagnostic value of FPVS-EEG as an objective and rapid flag for specific difficulties with individual face recognition in the human population.
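
A frequency-locked response shows up as a spectral peak at the stimulation frequency. The sketch below quantifies such a peak as a signal-to-noise ratio: the amplitude at the target frequency bin divided by the mean amplitude of neighbouring bins. The abstract gives neither the stimulation frequencies nor the SNR procedure. The following are all assumptions: the 1.2 Hz oddball frequency (a value used in several FPVS face studies), the choice of 10 bins on each side while skipping the adjacent bin, and the hypothetical `synthetic_signal` helper. Real pipelines would use an FFT such as `numpy.fft.rfft` rather than this direct per-bin DFT.

```python
import cmath
import math
import random


def dft_amplitude(signal, k):
    """Single-sided amplitude of DFT bin k, computed directly (O(n) per bin)."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
    return 2 * abs(acc) / n


def bin_snr(signal, fs, freq, n_side=10, skip=1):
    """Amplitude at freq divided by the mean amplitude of n_side bins on each side,
    ignoring the `skip` bins immediately adjacent to the target bin."""
    n = len(signal)
    k = round(freq * n / fs)  # frequency resolution is fs / n
    offsets = range(skip + 1, skip + 1 + n_side)
    neighbours = [dft_amplitude(signal, k + s * d) for d in offsets for s in (1, -1)]
    return dft_amplitude(signal, k) / (sum(neighbours) / len(neighbours))


def synthetic_signal(fs, seconds, freq, amp, noise_sd, seed=0):
    """Hypothetical test signal: a sinusoid at freq plus Gaussian noise."""
    rng = random.Random(seed)
    n = int(round(fs * seconds))
    return [amp * math.sin(2 * math.pi * freq * i / fs) + rng.gauss(0.0, noise_sd)
            for i in range(n)]


# 20 s at 100 Hz gives 0.05 Hz resolution, so 1.2 Hz falls exactly on bin 24.
eeg = synthetic_signal(fs=100.0, seconds=20.0, freq=1.2, amp=0.5, noise_sd=1.0)
print(bin_snr(eeg, 100.0, 1.2), bin_snr(eeg, 100.0, 4.7))
```

The SNR is well above 1 at the tagged frequency and close to 1 at an untagged one. This contrast is what makes such a response usable as an objective index within a short recording.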

19.
The aim of this study was to assess the test-retest stability of the Spanish version of the Youth Self Report after 18 mo. in a sample of 357 Catalonian high school students (158 boys and 199 girls). At Time 2, the girls' scores increased on the Delinquent and Aggressive Behavior scales and therefore on Externalizing scores. At Time 2, the boys' scores increased on Attention Problems and Delinquent Behavior and decreased on the Anxious/Depressed, Social Problems, and Internalizing scales. No significant differences were observed on the remaining scales. The test-retest intraclass correlations ranged between .62 (Internalizing) and .68 (Externalizing) for the broad-band scales and between .37 and .67 for the narrow-band scales. The correlations for girls and boys were similar but slightly higher for girls on Anxious/Depressed and Thought Problems.

20.
The purpose of the present study was to identify reliable and clinically meaningful patterns of ability and achievement using the WISC-III and WIAT. Cluster analysis was used to group the 182 WISC-III and WIAT profiles (10 WISC-III subtests and 4 WIAT subtests) of children between the ages of 9 and 14 years. Theoretical and empirical considerations were used to identify a cluster solution, which involved comparison of several five-, six- and eight-cluster solutions. A five-cluster solution was selected as being representative of the data, which was well replicated across three hierarchical clustering methods (i.e., complete linkage, average linkage-within groups, and average linkage-between groups (UPGMA)). The clusters were labeled based on their most salient characteristics, which included a group of predominantly low ability and achievement, a group demonstrating a pattern of verbal processing deficits, a group demonstrating a pattern of visual spatial/processing speed deficits, a group with low ability and achievement with average processing speed, and a group with deficits consistent with an ACID pattern. The external validity of the five subtypes was assessed through an evaluation of the relationship between cluster membership and neuropsychological test data. Most predictions regarding neuropsychological performance were supported by the data, providing further evidence of the validity of the five-cluster solution. Clinical implications of the ability-achievement typology and suggestions for future research are discussed.
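
Of the three hierarchical methods listed, average linkage between groups (UPGMA) repeatedly merges the two clusters with the smallest mean pairwise distance between their members. The minimal sketch below implements only that method. Euclidean distance is an assumption, since the abstract does not state the distance measure. The two-subtest profiles are hypothetical, far smaller than the 14-subtest profiles analysed in the study.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(profiles, n_clusters):
    """Agglomerative clustering with average linkage between groups (UPGMA).
    Returns clusters as sorted lists of profile indices."""
    if not 1 <= n_clusters <= len(profiles):
        raise ValueError("n_clusters must be between 1 and the number of profiles")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            # Mean of all pairwise distances between members of the two clusters.
            d = sum(euclidean(profiles[p], profiles[q]) for p in a for q in b)
            d /= len(a) * len(b)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return sorted(clusters)


# Hypothetical two-subtest profiles for six children.
profiles = [[1, 1], [1.2, 0.9], [0.8, 1.1], [5, 5], [5.1, 4.8], [4.9, 5.2]]
print(upgma(profiles, 2))  # prints [[0, 1, 2], [3, 4, 5]]
```

For real data sets, `scipy.cluster.hierarchy.linkage` with `method="average"` computes the same linkage far more efficiently. Comparing its output with `method="complete"` mirrors the kind of cross-method replication check described above.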


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)