When the reliability of test scores must be estimated by an internal consistency method, partition of the test into just 2 parts may be the only way to maintain content equivalence of the parts. If the parts are classically parallel, the Spearman-Brown formula may be validly used to estimate the reliability of total scores. If the parts differ in their standard deviations but are tau equivalent, Cronbach's alpha is appropriate. However, if the 2 parts are congeneric, that is, they are unequal in functional length or they comprise heterogeneous item types, a less well-known estimate, the Angoff-Feldt coefficient, is appropriate. Guidelines in terms of the ratio of standard deviations are proposed for choosing among Spearman-Brown, alpha, and Angoff-Feldt coefficients.
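
The three two-part coefficients named above can be compared side by side. The following is a minimal sketch, assuming the usual textbook forms of the Spearman-Brown, two-part alpha, and Angoff-Feldt formulas; the function name and the sample scores are hypothetical and are not taken from the abstract.

```python
from statistics import mean


def _cov(x, y):
    """Sample covariance (n - 1 denominator); _cov(x, x) is the variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def two_part_reliability(part_a, part_b):
    """Return three total-score reliability estimates for a test split in two."""
    var_a = _cov(part_a, part_a)
    var_b = _cov(part_b, part_b)
    cov_ab = _cov(part_a, part_b)
    var_total = var_a + var_b + 2 * cov_ab      # variance of A + B
    r_ab = cov_ab / (var_a * var_b) ** 0.5      # correlation between the parts
    return {
        # Assumes classically parallel parts.
        "spearman_brown": 2 * r_ab / (1 + r_ab),
        # Assumes tau-equivalent parts (two-part Cronbach's alpha).
        "alpha": 4 * cov_ab / var_total,
        # Allows congeneric parts of unequal functional length.
        "angoff_feldt": 4 * cov_ab
        / (var_total - (var_a - var_b) ** 2 / var_total),
    }


# Hypothetical part scores for five examinees.
part_a = [1, 2, 3, 4, 5]
part_b = [4, 2, 8, 6, 10]   # same ordering pattern, twice the spread
print(two_part_reliability(part_a, part_b))
```

When the two parts have equal variances the three estimates coincide; as the variances diverge, alpha falls below the Angoff-Feldt value, which is the situation the proposed guidelines address.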

We examined 8 data sets to determine whether it is possible to attain acceptable levels of internal consistency (coefficient alpha) reliability for the 4 Object Relations and Social Cognition scales (ORSC; Westen, Lohr, Silk, Kerber, & Goodrich, 1989) for the Thematic Apperception Test (TAT; Murray, 1943) when cards are considered as items in a scale. Number of cards used in the data sets ranged from 4 to 10, and the Spearman-Brown prophecy formula was applied to estimate the number of cards that would be required to attain alpha levels of different magnitudes. The two more structural subscales (Complexity of Representations and Understanding of Social Causality) have somewhat higher consistencies than the two more affective ones (Affect Tone and Capacity for Emotional Investment and Moral Standards). The results suggest that the use of 10 to 12 cards provides internal consistencies of alpha ≥ .70 across each of the 4 ORSC scales.
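
The prophecy step described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the starting alpha of .55 for 4 cards is an invented input, not a value reported in the study.

```python
import math


def predicted_reliability(rel, k):
    """Spearman-Brown prophecy: reliability after lengthening a test k-fold."""
    return k * rel / (1 + (k - 1) * rel)


def cards_needed(rel, n_cards, target):
    """Smallest number of cards expected to reach the target reliability."""
    # Solve the prophecy formula for the lengthening factor k.
    k = target * (1 - rel) / (rel * (1 - target))
    # The small epsilon guards against floating-point overshoot before ceil.
    return math.ceil(k * n_cards - 1e-9)


# Hypothetical input: alpha = .55 observed with 4 cards, target alpha = .70.
n = cards_needed(0.55, 4, 0.70)
print(n, predicted_reliability(0.55, n / 4))
```

The formula assumes that added cards behave like the existing ones, so the result is best read as a planning estimate rather than a guarantee.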

This paper studies the asymptotic distributions of three reliability coefficient estimates: sample coefficient alpha, the reliability estimate of a composite score following a factor analysis, and the estimate of the maximal reliability of a linear combination of item scores following a factor analysis. Results indicate that the asymptotic distribution for each of the coefficient estimates, obtained based on a normal sampling distribution, is still valid within a large class of nonnormal distributions. Therefore, a formula for calculating the standard error of the sample coefficient alpha, recently obtained by van Zyl, Neudecker and Nel, applies to other reliability coefficients and can still be used even with skewed and kurtotic data such as are typical in the social and behavioral sciences. This research was supported by grants DA01070 and DA00017 from the National Institute on Drug Abuse and a University of North Texas faculty research grant. We would like to thank the Associate Editor and two reviewers for suggestions that helped to improve the paper.

Reliabilities of scores for experimental tasks are likely to differ from one study to another to the extent that the task stimuli change, the number of trials varies, the type of individuals taking the task changes, the administration conditions are altered, or the focal task variable differs. Given that reliabilities vary as a function of the design of these tasks and the characteristics of the individuals taking them, making inferences about the reliability of scores in an ongoing study based on reliability estimates from prior studies is precarious. Thus, it would be advantageous to estimate reliability based on data from the ongoing study. We argue that internal consistency estimates of reliability are underutilized for experimental task data and in many applications could provide this information using a single administration of a task. We discuss different methods for computing internal consistency estimates with a generalized coefficient alpha and the conditions under which these estimates are accurate. We illustrate use of these coefficients using data for three different tasks.

The theory of the estimation of test reliability
The theoretically best estimate of the reliability coefficient is stated in terms of a precise definition of the equivalence of two forms of a test. Various approximations to this theoretical formula are derived, with reference to several degrees of completeness of information about the test and to special assumptions. The familiar Spearman-Brown Formula is shown to be a special case of the general formulation of the problem of reliability. Reliability coefficients computed in various ways are presented for comparative purposes.

Osburn HG. Psychological Methods, 2000, 5(3): 343-355
The author studied the conditions under which coefficient alpha and 10 related internal consistency reliability coefficients underestimate the reliability of a measure. Simulated data showed that alpha, though reasonably robust when computed on n components in moderately heterogeneous data, can under certain conditions seriously underestimate the reliability of a measure. Consequently, alpha, when used in corrections for attenuation, can result in nontrivial overestimation of the corrected correlation. Most of the coefficients studied, including lambda 2, did not improve the estimate to any great extent when the data were heterogeneous. The exceptions were stratified alpha and maximal reliability, which performed well when the components were grouped into two subsets, each measuring a different factor, and maximized lambda 4, which provided the most consistently accurate estimate of the reliability in all simulations studied.
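
The link between an underestimated reliability and an overestimated corrected correlation follows from the classical correction for attenuation. A brief sketch, in which the symbols and the numbers are chosen here for illustration and do not come from the article: r_xy is the observed correlation, and r_xx and r_yy are the reliabilities of the two measures.

```latex
\hat{\rho}_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
\qquad\text{e.g.}\qquad
\frac{.40}{\sqrt{(.80)(.80)}} = .50
\quad\text{versus}\quad
\frac{.40}{\sqrt{(.64)(.80)}} \approx .56
```

Because the reliabilities sit in the denominator, substituting an alpha of .64 for a true reliability of .80 in this hypothetical case raises the corrected correlation from .50 to about .56.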

This paper reports on a study about the reliability and validity of a structured behavioral interview to assess private security personnel. Reliability was estimated using interrater coefficients. Two independent interviewers were used to rate each interviewee. Results show a reliability coefficient of .81 (N = 43) and .89 with Spearman-Brown correction for two raters. Validity was estimated using a content validation approach. This strategy was suggested by Lawshe (1975) to estimate the content validity of selection tests. So far, only two studies carried out by Schmitt and Ostroff (1986) and Carrier et al. (1990) have used Lawshe's strategy in the structured behavioral interview case. The interview consisted of seven questions and each was rated by 11 experts in the job. Results show a significant content validity ratio (CVR) for the majority of the questions in the interview and a content validity index (CVI) of .89. Implications of these findings for the practice of the structured behavioral interview are discussed and future research is suggested.

A method is presented for estimating reliability using structural equation modeling (SEM) that allows for nonlinearity between factors and item scores. Assuming the focus is on consistency of summed item scores, this method for estimating reliability is preferred to those based on linear SEM models and to the most commonly reported estimate of reliability, coefficient alpha.

In this study, we utilized reliability generalization procedures to examine internal consistency estimates across 3 scales measuring the belief in a just world. The distribution of reliability estimates for the measures suggests low to moderate ranges of internal consistency reliability coefficients. The Global Belief in a Just World Scale (Lipkus, 1991) produced the highest average reliability score (alpha = .81) compared to the Just World Scale (Rubin & Peplau, 1973; alpha = .64) and the Just World Scale Revised (Rubin & Peplau, 1975; alpha = .68).

We examined the structural validity, internal consistency (alpha and omega), and test-retest reliability of scores on the Cross Racial Identity Scale (CRIS; Vandiver et al., 2000; Worrell, Vandiver, & Cross, 2004), as well as the relationship between CRIS scores and several variables related to psychological adjustment. Participants consisted of several groups of African American college students (34 ≤ n ≤ 340) attending a predominantly White university in a Western state. Confirmatory factor analyses indicated an acceptable fit of the data to the theoretical model, and alpha and omega coefficients indicate that CRIS scores have moderate to high internal consistency. CRIS scores also demonstrated stability over periods between 2 and 20 months in ranges that suggest long-term stability of racial attitudes. As predicted by the expanded nigrescence model (Cross & Vandiver, 2001), only self-hatred attitudes had consistent, meaningful relationships with psychological adjustment.

Criterion-referenced (Livingston) and norm-referenced (Gilmer-Feldt) techniques were used to measure the internal consistency reliability of Folstein's Mini-Mental State Examination (MMSE) on a large sample (N = 418) of elderly medical patients. Two administration and scoring variants of the MMSE Attention and Calculation section (Serial 7s only and WORLD only) were investigated. Livingston reliability coefficients (rs) were calculated for a wide range of cutoff scores. As necessary for the calculation of the Gilmer-Feldt r, a factor analysis showed that the MMSE measures three cognitive domains. Livingston's r for the most widely used MMSE cutoff score of 24 was .803 for Serial 7s and .795 for WORLD. The Gilmer-Feldt internal consistency reliability coefficient was .764 for Serial 7s and .747 for WORLD. Item analysis showed that nearly all of the MMSE items were good discriminators, but 12 were too easy. True score confidence intervals should be applied when interpreting MMSE test scores.

This study examined the validity and reliability of a Turkish version of the Modified Moral Sensitivity Questionnaire for Student Nurses (MMSQSN). After obtaining permission to adapt the MMSQSN into Turkish, the translation/back-translation method was used with expert opinions to determine content validity. Factor analysis was conducted to examine the construct validity and test–retest was performed on the questionnaire to determine reliability. Cronbach’s alpha coefficients were calculated to assess for internal consistency. Participants included 272 baccalaureate degree student nurses who took ethics lessons prior to their clinical internship. The factor analysis revealed that even though the factor structure in the original scale was the same, relevant items were categorized with similar components, and factor loadings were sufficient. The correlation coefficient in the analyses of test–retest scores was .66 for the total scale (p < .05) and the Cronbach’s alpha was .73 for the total scale. The translated MMSQSN is a valid and reliable measure of ethical sensitivity in student nurses in Turkey.

The present study investigated people's variability across situations by getting ratings of 66 subjects on 14 bipolar dimensions from at least eight interactants, chosen for their diversity. The intercorrelation of single ratings yielded a mean coefficient of .221. The correlation of single ratings with the aggregate of the other ratings for a dimension resulted in a mean coefficient of .388. The correlation of two sets of aggregated ratings gave a mean coefficient of .550, or .710 with application of the Spearman-Brown correction. Finally, computation of Cronbach's alpha gave a mean coefficient of .735. The results provide a further demonstration of the coherence that can be revealed by aggregation. Correlations of aggregated ratings on each of the 14 dimensions with extraversion ranged up to .668, and correlations with neuroticism ranged up to .410. The study suggests that there is a dispositionality in the characteristics people display, and that the emphasis on variability (e.g., Mischel, 1968; Mischel & Peake, 1982) should be tempered.
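
As a check on the arithmetic, and assuming the correction was the standard two-part Spearman-Brown step-up applied to the mean correlation of .550 (the abstract does not spell this out), the reported corrected value is reproduced:

```latex
r_{SB} = \frac{2r}{1 + r} = \frac{2(.550)}{1 + .550} = \frac{1.100}{1.550} \approx .710
```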

When viewed in the aggregate, studies of the longitudinal consistency of intelligence, personality traits and self-opinion (self-esteem, life satisfaction etc.) show a hierarchy of consistency. Uncorrected retest coefficients over periods of 6 months to 50 yr are analyzed as the product of period-free reliability (R) and the true stability of the construct (s^n, where s is the coefficient of annual stability and n the number of years of the retest interval). The annual stabilities of intelligence, personality traits and self-opinions are estimated as 0.99, 0.98 and 0.94, respectively. While intelligence and personality may be regarded as relatively stable characteristics over the length of the adult lifespan, self-opinion has little stability over periods of more than 10 yr. The hierarchy of consistency should be taken into account in causal models of human development. Although self-opinion is not a longitudinally stable characteristic, it may still be predicted over long periods of time by higher-order constructs such as personality traits and intelligence.
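
The product model described above, retest coefficient = R × s^n, can be sketched in a few lines. The annual stabilities are the ones quoted in the abstract; the function name is hypothetical, and setting R to 1.0 is an assumption made here to isolate the true-stability term, since the abstract gives no values for R.

```python
def expected_retest(period_free_reliability, annual_stability, years):
    """Expected uncorrected retest coefficient under the model R * s**n."""
    return period_free_reliability * annual_stability ** years


# Annual stabilities quoted in the abstract; R = 1.0 is an assumed placeholder.
annual_stabilities = {
    "intelligence": 0.99,
    "personality traits": 0.98,
    "self-opinion": 0.94,
}

for construct, s in annual_stabilities.items():
    ten_year = expected_retest(1.0, s, 10)
    print(f"{construct}: 10-year true stability = {ten_year:.3f}")
```

Small differences in annual stability compound over the interval, which is why the 0.94 estimate for self-opinion implies much weaker stability over a decade than the 0.98 and 0.99 estimates do.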

Reliability generalization (RG) is a meta-analytic technique that allows for the systematic examination of variation in score reliability for different samples of test takers; this procedure is based on the recognition that reliability is not a stable property of a test but is sample dependent. As a demonstration of an RG analysis, I obtained 63 reliability coefficients for each of the MMPI-2 (Butcher et al., 2001) Personality Psychopathology 5 (Harkness, McNulty, & Ben-Porath, 1995) scales. The overall variability of alpha coefficients supports the argument that reliability is sample dependent and underscores the need for researchers to calculate reliability estimates based on their research samples rather than simply citing published alpha coefficients as evidence of score reliability. I observed statistically significant mean reliability differences for scores across the 5 scales, with the highest level of reliability observed for scores on the measure of Negative Emotionality and the lowest levels of reliability observed for scores on the measures of Aggression and Disconstraint. There was no evidence that the sex-composition of a sample was systematically related to score reliability, and there were no statistically significant differences in reliability between scores obtained with the English version of the test and those obtained with translated forms. However, reliability was consistently lower for scores on some scales when the data were obtained in nonclinical settings as opposed to clinical ones. Sample size was not significantly correlated with reliability estimates. RG methods have the potential for deepening the level of understanding about the role of reliability in the evaluation and use of personality tests.

KR-21 provides a lower limit for the computed value of KR-20. KR-20 is equivalent to coefficient alpha when a test is composed of dichotomous items scored 0 or 1. Therefore, KR-21 coefficients, computed from simple summary statistics, can be used in cases in which journal authors do not provide the test score reliability. Use of KR-21 in these cases will provide the reader with a lower limit for the value of KR-20.
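
The lower-bound relationship can be illustrated with a minimal sketch. It assumes the standard Kuder-Richardson formulas with population variances (the convention under which the bound holds exactly); the function names and the small response matrix are hypothetical.

```python
from statistics import mean, pvariance


def kr20(item_matrix):
    """KR-20 from a matrix of 0/1 item scores (rows = examinees)."""
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    var_total = pvariance(totals)
    # Proportion of examinees answering each item correctly.
    p = [mean(row[i] for row in item_matrix) for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def kr21(k, mean_total, var_total):
    """KR-21 from summary statistics only: item count, mean, variance."""
    return k / (k - 1) * (1 - mean_total * (k - mean_total) / (k * var_total))


# Hypothetical responses of five examinees to four dichotomous items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
print(kr20(responses), kr21(4, mean(totals), pvariance(totals)))
```

KR-21 treats all items as equally difficult, so it matches KR-20 only when item difficulties are identical and otherwise falls below it, as in this example.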

The separate questions on an essay test or the individual judges on a rater panel may constitute congeneric parts rather than tau-equivalent parts. Also, it may be necessary to infer the lengths of the congeneric parts from their variances and covariances, rather than from some obvious feature of each part, such as the range of possible scores. Cronbach's alpha coefficient applied to such part-tests data will underestimate total score reliability. Several reliability coefficients are developed for such instruments. They may be regarded as extensions of the coefficient developed by Kristof for a three-part test.

This pilot study examined test-retest and internal consistency reliabilities of original and modified formats of the Exercise Self-efficacy Scale in college-age women. Thirty women completed original and modified versions of the scale. Data from both tests, administered 1 wk. apart, were analyzed using the intraclass correlation coefficient (ICC) to assess test-retest reliability and Cronbach coefficient alpha for internal consistency. Scores for both versions correlated .96. Cronbach coefficients alpha for the original scale were .96 for Time 1 and .98 for Time 2. Cronbach coefficients alpha for the revised scale were .95 for Time 1 and .98 for Time 2. Test-retest reliability and internal consistency remained consistently high for both versions of the scales within this sample. Implications for use of this scale and recommendations for research are given.
