Similar Documents
20 similar documents found (search time: 15 ms)
1.
Several authors have suggested that prior to conducting a confirmatory factor analysis it may be useful to group items into a smaller number of item ‘parcels’ or ‘testlets’. The present paper shows mathematically that coefficient alpha based on these parcel scores will exceed alpha based on the entire set of items only if W, the ratio of the average covariance of items between parcels to the average covariance of items within parcels, is greater than unity. If W is less than unity, however, and errors of measurement are uncorrelated, then stratified alpha will be a better lower bound to the reliability of a measure than the other two coefficients. Stratified alpha is also equal to the true reliability of a test when items within parcels are essentially tau-equivalent, if one assumes that errors of measurement are not correlated.
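The W criterion above can be checked with a short numeric sketch. The helper functions below compute coefficient alpha, alpha on parcel scores, and stratified alpha directly from an item covariance matrix; the six-item, two-parcel covariance structure is hypothetical, chosen so that W (average between-parcel covariance over average within-parcel covariance) equals 0.2, well below unity.

```python
import numpy as np

def alpha_from_cov(S):
    """Coefficient alpha computed from a k x k covariance matrix."""
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())

def parcel_cov(S, parcels):
    """Covariance matrix of the parcel sum scores."""
    P = np.empty((len(parcels), len(parcels)))
    for a, ia in enumerate(parcels):
        for b, ib in enumerate(parcels):
            P[a, b] = S[np.ix_(ia, ib)].sum()
    return P

def stratified_alpha(S, parcels):
    """Stratified alpha: 1 - sum_j var_j * (1 - alpha_j) / var_total."""
    penalty = sum(S[np.ix_(ia, ia)].sum() * (1 - alpha_from_cov(S[np.ix_(ia, ia)]))
                  for ia in parcels)
    return 1 - penalty / S.sum()

# Hypothetical 6-item test in two 3-item parcels: item variances 1,
# within-parcel covariances 0.5, between-parcel covariances 0.1 (W = 0.2).
S = np.full((6, 6), 0.1)
S[:3, :3] = 0.5
S[3:, 3:] = 0.5
np.fill_diagonal(S, 1.0)
parcels = [[0, 1, 2], [3, 4, 5]]

a_items = alpha_from_cov(S)                         # ~ .678
a_parcels = alpha_from_cov(parcel_cov(S, parcels))  # ~ .261
a_strat = stratified_alpha(S, parcels)              # ~ .783
```

With W < 1 the parcel-level alpha falls below the item-level alpha, and stratified alpha is the largest of the three, consistent with the result stated above.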

2.
Transient errors are caused by variations in feelings, moods, and mental states over time. If these errors are present, coefficient alpha is an inflated estimate of reliability. A true-score model is presented that incorporates transient errors for test-retest data, and a reliability estimate is derived. This estimate, referred to as the test-retest alpha, is less than coefficient alpha if transient error is present and is less susceptible to effects due to item recall than a test-retest correlation. An assumption underlying the test-retest alpha is essential tau equivalency of items. A test-retest split-half coefficient is presented as an alternative to the test-retest alpha when this assumption is violated. The test-retest alpha is the mean of all possible test-retest split-half coefficients.

3.
The concept of test reliability is examined in terms of general, group, and specific factors among the items, and the stability of scores in these factors from trial to trial. Four essentially different definitions of reliability are distinguished, which may be called the hypothetical self-correlation, the coefficient of equivalence, the coefficient of stability, and the coefficient of stability and equivalence. The possibility of estimating each of these coefficients is discussed. The coefficients are not interchangeable and have different values in corrections for attenuation, standard errors of measurement, and other practical applications.

4.
When the reliability of test scores must be estimated by an internal consistency method, partition of the test into just 2 parts may be the only way to maintain content equivalence of the parts. If the parts are classically parallel, the Spearman-Brown formula may be validly used to estimate the reliability of total scores. If the parts differ in their standard deviations but are tau equivalent, Cronbach's alpha is appropriate. However, if the 2 parts are congeneric, that is, they are unequal in functional length or they comprise heterogeneous item types, a less well-known estimate, the Angoff-Feldt coefficient, is appropriate. Guidelines in terms of the ratio of standard deviations are proposed for choosing among Spearman-Brown, alpha, and Angoff-Feldt coefficients.
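A minimal sketch of the three choices, assuming two observed part scores. The two-part alpha and Spearman-Brown formulas are standard; for the Angoff-Feldt coefficient the form 4σ₁₂/(σ_X² − (σ₁² − σ₂²)²/σ_X²) is used, and the toy data (one part exactly twice the other, i.e., error-free congeneric parts) are hypothetical.

```python
import numpy as np

def two_part_reliabilities(x1, x2):
    """Split-half reliability estimates for a test divided into two parts."""
    x = x1 + x2
    v1 = np.var(x1, ddof=1)
    v2 = np.var(x2, ddof=1)
    vx = np.var(x, ddof=1)
    c12 = np.cov(x1, x2, ddof=1)[0, 1]
    r12 = c12 / np.sqrt(v1 * v2)
    spearman_brown = 2 * r12 / (1 + r12)                  # parallel parts
    alpha = 4 * c12 / vx                                  # tau-equivalent parts
    angoff_feldt = 4 * c12 / (vx - (v1 - v2) ** 2 / vx)   # congeneric parts
    return spearman_brown, alpha, angoff_feldt

# Error-free parts of unequal functional length: x2 is exactly 2 * x1.
x1 = np.array([1.0, 2.0, 4.0, 6.0, 7.0, 9.0])
x2 = 2 * x1
sb, a, af = two_part_reliabilities(x1, x2)
# sb = 1.0 and af = 1.0, while alpha = 8/9 underestimates here
```

Here alpha underestimates because the parts, though error-free, are not tau equivalent; the Angoff-Feldt coefficient corrects for the unequal part variances.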

5.
Recent work on reliability coefficients has largely focused on continuous items, including critiques of Cronbach’s alpha. Although two new model-based reliability coefficients have been proposed for dichotomous items (Dimitrov, 2003a,b; Green & Yang, 2009a), these approaches have yet to be compared to each other or to other popular estimates of reliability such as omega, alpha, and the greatest lower bound. We seek computational improvements to one of these model-based reliability coefficients and, in addition, conduct initial Monte Carlo simulations to compare coefficients using dichotomous data. Our results suggest that the proposed improvements to the model-based approach are warranted, and that the model-based approaches were generally superior.

6.
Assessing the reliability of situational judgment tests (SJTs) in high-stakes situations is problematic, as reliability is inappropriately measured by Cronbach's alpha when test items are heterogeneous. We computed the corrected, weighted mean alpha from 56 alpha coefficients, which produced a value of α = .46, and reviewed appropriate types of reliability to use with SJTs. In the current longitudinal study, SJT test–retest reliability was r = .82, compared with internal consistency, α = .46, and stratified alpha, α = .45, at Time 1, and α = .52 and stratified α = .51 at Time 2. We used a student sample (Time 1: n = 185; Time 2: n = 132) with items from a credentialing exam with ‘should do’ instructions. The SJT correlated significantly with cognitive ability, r = .30, and agreeableness, r = .24. In Study 2, we assessed test–retest reliability with Human Resource professionals (Time 1: n = 94; Time 2: n = 32) who had been recently credentialed and who participated in a pilot test of new SJT items with ‘most likely/least likely do’ response options. The SJT test–retest reliability was r = .66, compared with internal consistency, α = .43, and stratified α = .47, at Time 1, and α = .61 and stratified α = .67 at Time 2. We discuss the theoretical implications of the Study 1 results as well as the practical implications for use of SJTs in credentialing examinations.

7.
8.
9.
The purpose of this study is to develop a brief version of Sprecher and Fehr’s (2005) Compassionate Love Scale. This was accomplished by administering the 21-item scale to college student participants and subsequently selecting five items for a brief version. The five items were selected based on high correlations between individual item responses and the total score on the original 21-item scale, the results of factor analysis, and moderate item means with high standard deviations. The correlation between the original and brief versions is 0.96, and the internal reliability of the brief version, using Cronbach’s alpha, is 0.90.
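The item-total-correlation step of this kind of scale shortening can be sketched as below. The function and simulated data are hypothetical (five items driven by a common factor plus sixteen noise items); the cited study additionally weighed factor-analysis results and item means and standard deviations.

```python
import numpy as np

def shorten_scale(responses, n_keep=5):
    """Keep the n_keep items most correlated with the full-scale total.

    A sketch of the item-total-correlation criterion only.
    responses: (n_respondents, n_items) array.
    """
    total = responses.sum(axis=1)
    item_total_r = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                             for j in range(responses.shape[1])])
    keep = np.sort(np.argsort(item_total_r)[-n_keep:])
    brief = responses[:, keep].sum(axis=1)
    return keep.tolist(), float(np.corrcoef(brief, total)[0, 1])

# Hypothetical data: 21 "items", the first five driven by a common factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
signal = latent + 0.3 * rng.normal(size=(500, 5))
noise = rng.normal(size=(500, 16))
kept, r_brief_full = shorten_scale(np.hstack([signal, noise]))
```

On data like these, the five factor-driven items are the ones selected, and the brief-full correlation is substantial.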

10.
A coefficient derived from communalities of test parts has been proposed as a greatest lower bound to Guttman's immediate retest reliability. The communalities have at times been calculated from covariances between item sets, which tends to underestimate appreciably. When items are experimentally independent, a consistent estimate of the greatest defensible internal-consistency coefficient is obtained by factoring item covariances. In samples of modest size, this analysis capitalizes on chance; an estimate subject to less upward bias is suggested. For estimating alternate-forms reliability, communality-based coefficients are less appropriate than stratified alpha.
(I thank Edward Haertel for comments and suggestions, and Andrew Comrey for data.)

11.
A test theory using only ordinal assumptions is presented. It is based on the idea that the test items are a sample from a universe of items. The sum across items of the ordinal relations for a pair of persons on the universe items is analogous to a true score. Using concepts from ordinal multiple regression, it is possible to estimate the tau correlations of test items with the universe order from the taus among the test items. These in turn permit the estimation of the tau of total score with the universe. It is also possible to estimate the odds that the direction of a given observed score difference is the same as that of the true score difference. The estimates of the correlations between items and universe and between total score and universe are found to agree well with the actual values in both real and artificial data.
(Part of this paper was presented at the June 1989 meeting of the Psychometric Society. The authors wish to thank several reviewers for their suggestions. This research was mainly done while the second author was a University Fellow at the University of Southern California.)

12.
This paper discusses the limitations of Cronbach's alpha as a sole index of reliability, showing how Cronbach's alpha is analytically unable to capture important measurement errors and scale dimensionality, and how it is not invariant under variations of scale length, interitem correlation, and sample characteristics. In addition, the limitations and strengths of several recommendations on how to ameliorate these problems are critically reviewed. It is shown that reliance on Cronbach's alpha as a sole index of reliability is no longer sufficiently warranted. This requires that other indices of internal consistency be reported along with the alpha coefficient, and that when a scale is composed of a large number of items, factor analysis be performed and an appropriate internal consistency estimation method applied. This approach, if adopted, will largely minimize and guard against uncritical use of Cronbach's alpha coefficient.

13.
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e‐rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.

14.
Previous research on measurement error in job performance ratings estimated reliability using coefficients: alpha, test–retest, and interrater correlation. None of these three coefficients control for the four main sources of error in performance ratings. For this reason, the coefficient of equivalence and stability (CES) has been suggested as the ideal estimate of reliability. This article presents estimates of CES for time intervals of 1, 2, and 3 years. The values obtained for a single rater were .51, .48, and .44, respectively. For two raters, the values were .59, .55, and .51. The findings suggest that previous reliability estimates based on alpha, test–retest, and interrater coefficients overestimated the reliability of job performance ratings. In the present study, the interrater coefficient overestimates reliability by 13.6–25.4% for time intervals of 1–3 years, as it does not control for transient error. Results also showed that the importance of transient error increases as the length of the interval between the measures increases. Based on the results, it is suggested that corrected validities based on interrater reliability underestimate the magnitude of the validity. The implications of these findings for future efforts to estimate criterion reliability and predictor validity are discussed.

15.
In the theory of test validity it is assumed that error scores on two distinct tests, a predictor and a criterion, are uncorrelated. The expected-value concept of true score in the classical test-theory model as formulated by Lord and Novick, Guttman, and others implies mathematically, without further assumptions, that true scores and error scores are uncorrelated. This concept does not imply, however, that error scores on two arbitrary tests are uncorrelated, and an additional axiom of “experimental independence” is needed in order to obtain familiar results in the theory of test validity. The formulas derived in the present paper do not depend on this assumption and can be applied to all test scores. These more general formulas reveal some unexpected and anomalous properties of test validity and have implications for the interpretation of validity coefficients in practice. Under some conditions there is no attenuation produced by error of measurement, and the correlation between observed scores can sometimes exceed the correlation between true scores, so that the usual correction for attenuation may be inappropriate and misleading. Observed scores on two tests can be positively correlated even when true scores are negatively correlated, and the validity coefficient can exceed the index of reliability. In some cases of practical interest, the validity coefficient will decrease with increase in test length. These anomalies sometimes occur even when the correlation between error scores is quite small, and their magnitude is inversely related to test reliability. The elimination of correlated errors in practice will not enhance a test's predictive value, but will restore the properties of the validity coefficient that are familiar in the classical theory.
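The sign-reversal anomaly follows directly from covariance algebra, since the observed covariance is the sum of the true-score covariance and the error covariance. The numbers below are a hypothetical illustration (not taken from the paper), with reliability deliberately low at .20:

```python
import numpy as np

# Population covariance components for two tests X and Y.
var_tx, var_ty = 1.0, 1.0    # true-score variances
var_ex, var_ey = 4.0, 4.0    # error variances -> reliability 1/5 = .20
cov_t = -0.2                 # true scores correlate negatively
cov_e = 1.0                  # correlated errors (error correlation .25)

r_true = cov_t / np.sqrt(var_tx * var_ty)
r_obs = (cov_t + cov_e) / np.sqrt((var_tx + var_ex) * (var_ty + var_ey))
# r_true = -0.20, yet r_obs = +0.16
```

With higher reliability the same error covariance would matter less, matching the paper's observation that the magnitude of these anomalies is inversely related to test reliability.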

16.
This paper is a presentation of the statistical sampling theory of stepped-up reliability coefficients when a test has been divided into any number of equivalent parts. Maximum-likelihood estimators of the reliability are obtained and shown to be biased. Their sampling distributions are derived and form the basis of the definition of new unbiased estimators with known sampling distributions. These unbiased estimators have a smaller sampling variance than the maximum-likelihood estimators and are, because of this and some other favorable properties, recommended for general use. On the basis of the variances of the unbiased estimators the gain in accuracy in estimating reliability connected with further division of a test can be expressed explicitly. The limits of these variances and thus the limits of accuracy of estimation are derived. Finally, statistical small sample tests of the reliability coefficient are outlined. This paper also covers the sampling distribution of Cronbach's coefficient alpha.

17.
The quality of approximations to first and second order moments (e.g., statistics like means, variances, regression coefficients) based on latent ability estimates is discussed. The ability estimates are obtained using either the Rasch or the two-parameter logistic model. Straightforward use of such statistics to make inferences with respect to true latent ability is not recommended, unless we account for the fact that the basic quantities are estimates. In this paper true score theory is used to account for the latter, the counterpart of observed/true score being estimated/true latent ability. It is shown that statistics based on the true score theory are virtually unbiased if the number of items presented to each examinee is larger than fifteen. Three types of estimators are compared: maximum likelihood, weighted maximum likelihood, and Bayes modal. Furthermore, the (dis)advantages of the true score method and of direct modeling of latent ability are discussed.

18.
A survey of the literature was made to determine the skills involved in reading comprehension that are deemed most important by authorities. Multiple-choice test items were constructed to measure each of nine skills thus identified as basic. The intercorrelations of the nine skill scores were factored, each skill being weighted in the initial matrix roughly in proportion to its importance in reading comprehension, as judged by authorities. The principal components were rather readily interpretable in terms of the initial variables. Individual scores in components I and II are sufficiently reliable to warrant their use for practical purposes, and useful measures of other components could be provided by constructing the required number of additional items. The results also indicate need for workbooks to aid in improving students' use of basic reading skills. The study provides more detailed information regarding the skills measured by the Cooperative Reading Comprehension Tests than has heretofore been provided regarding the skills actually measured by any other widely used reading test. Statistical techniques for estimating the reliability coefficients of individual scores in principal-axes components, for determining whether component variances are greater than would be yielded by chance, and for calculating the significance of the differences between successive component variances are illustrated.
(On leave for military service.)

19.
We developed the German Adaptation of the Personality Assessment Inventory (PAI; Morey, 1991) under careful consideration of current adaptation literature and guidelines. The adaptation process included the translation of the 344 items into German, a back translation into English as well as the testing of the language equivalence using a bilingual sample. We then standardized the final German version of the PAI for the German population. We compared the American and German norm and reliability data. The observed differences in PAI scale means did not exceed 5 T scores. Internal consistency reliability showed a similar pattern in both language versions, although the German alpha coefficients were on average slightly lower than the American ones. Factor structure was similar in both versions. We discuss expectations about the German PAI and possible problems for its practical usefulness for the German-speaking population.

20.
Valid use of the traditional independent samples ANOVA procedure requires that the population variances are equal. Previous research has investigated whether variance homogeneity tests, such as Levene's test, are satisfactory as gatekeepers for identifying when to use or not to use the ANOVA procedure. This research focuses on a novel homogeneity of variance test that incorporates an equivalence testing approach. Instead of testing the null hypothesis that the variances are equal against an alternative hypothesis that the variances are not equal, the equivalence-based test evaluates the null hypothesis that the difference in the variances falls outside or on the border of a predetermined interval against an alternative hypothesis that the difference in the variances falls within the predetermined interval. Thus, with the equivalence-based procedure, the alternative hypothesis is aligned with the research hypothesis (variance equality). A simulation study demonstrated that the equivalence-based test of population variance homogeneity is a better gatekeeper for the ANOVA than traditional homogeneity of variance tests.
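One way to realize such an equivalence-based gatekeeper is a TOST-style pair of one-sided F tests. This is a sketch under added assumptions (normality, and a ratio-based equivalence interval (1/delta, delta), whereas the cited study frames equivalence in terms of the variance difference):

```python
import numpy as np
from scipy import stats

def variance_equivalence_test(x, y, delta=1.5, alpha=0.05):
    """Declare two normal variances 'equivalent' if the variance ratio
    can be shown to lie inside (1/delta, delta) at level alpha.

    Two one-sided F tests (TOST): reject H0 'ratio >= delta' and
    H0 'ratio <= 1/delta'. Returns (p_value, equivalent_flag).
    """
    f = np.var(x, ddof=1) / np.var(y, ddof=1)
    dist = stats.f(len(x) - 1, len(y) - 1)
    p_upper = dist.cdf(f / delta)   # evidence that ratio < delta
    p_lower = dist.sf(f * delta)    # evidence that ratio > 1/delta
    p = max(p_upper, p_lower)
    return p, p < alpha
```

With the roles of null and alternative arranged this way, a small p supports variance equality, aligning the alternative hypothesis with the research hypothesis as described above.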


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号