Similar literature
 20 similar articles found (search time: 31 ms)
1.
2.
With respect to the often-present covariance between error terms of correlated variables, D. W. Zimmerman and R. H. Williams's (1977) adjusted correction for attenuation estimates the strength of the pairwise correlation between true scores without assuming independence of error scores. This article focuses on the derivation and analysis of formulas that perform the same function for partial and part correlation coefficients. Values produced by these formulas lie closer to the actual true-score coefficient than do the observed-score coefficients or those obtained by using C. Spearman's (1904) correction for attenuation. The new versions of the formulas thus allow analysts to use hypothetical values for error-score correlations to estimate values for the partial and part correlations between true scores while disregarding the independence-of-errors assumption.
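
The pairwise case mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the classical model in which errors are uncorrelated with true scores but may correlate with each other. The function and parameter names are hypothetical, and the adjusted formula is written in the form that follows from that model, which is assumed to correspond to the adjusted correction the abstract cites. The partial and part correlation versions derived in the article are not reproduced here.

```python
import math


def spearman_correction(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation (assumes uncorrelated errors)."""
    return r_xy / math.sqrt(r_xx * r_yy)


def adjusted_correction(r_xy, r_xx, r_yy, r_ee):
    """Correction for attenuation allowing an error-score correlation r_ee.

    Follows from decomposing the observed covariance into a true-score part
    and an error part, with error variance equal to (1 - reliability) times
    the observed variance.
    """
    error_part = r_ee * math.sqrt((1 - r_xx) * (1 - r_yy))
    return (r_xy - error_part) / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed correlation .40, both reliabilities .80.
print(spearman_correction(0.40, 0.80, 0.80))
print(adjusted_correction(0.40, 0.80, 0.80, r_ee=0.50))
```

With a positive error correlation, part of the observed correlation is attributed to shared error, so the adjusted estimate is smaller than Spearman's; with `r_ee = 0` the two coincide.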

3.
Based on the test theory model for ordinal measurements proposed by Schulman and Haden, the present paper considers correlations between tests, attenuation, regressions involving true and observed scores, and prediction of test reliability. The population correlation between tests is shown to be related to the expected sample correlation for samples of size n1 and n2. Errors of estimation, measurement and prediction are found to be similar to their counterparts in interval test theory, while attenuation is identical to its counterpart. The bias in estimating population reliability from sample data is compared for Kendall's tau and Spearman's rho. The author wishes to thank the referees for their helpful comments on an earlier draft of this paper, and in particular, for the suggested alternative methods of establishing some of the presented results.

4.
This paper presents a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter. Observed variance-covariance matrices of such measurements are taken to follow a Wishart distribution. The familiar true score-and-error concept of classical test theory is employed. Upon formulation of the basic model it is shown that in a combination of such tests forming a “total” test, the signal-to-noise ratio of the components is additive and that the inverse of the population variance-covariance matrix of the component measures has all of its off-diagonal elements equal, regardless of distributional assumptions. This fact facilitates the subsequent derivation of a statistical sampling theory, there being at most m + 1 free parameters when m is the number of component tests. In developing the theory, the cases of known and unknown test lengths are treated separately. For both cases maximum-likelihood estimators of the relevant parameters are derived. It is argued that the resulting formulas will remain reasonable even if the distributional assumptions are too narrow. Under these assumptions, however, maximum-likelihood ratio tests of the validity of the model and of hypotheses concerning reliability and standard error of measurement of the total test are given. It is shown in each case that the maximum-likelihood equations possess precisely one acceptable solution under rather natural conditions. Application of the methods can be effected without the use of a computer. Two numerical examples are appended by way of illustration. This research was supported in part by The National Institute of Child Health and Human Development, under Research Grant 1 PO1 HDO1762.

5.
Several authors have suggested that prior to conducting a confirmatory factor analysis it may be useful to group items into a smaller number of item ‘parcels’ or ‘testlets’. The present paper mathematically shows that coefficient alpha based on these parcel scores will only exceed alpha based on the entire set of items if W, the ratio of the average covariance of items between parcels to the average covariance of items within parcels, is greater than unity. If W is less than unity, however, and errors of measurement are uncorrelated, then stratified alpha will be a better lower bound to the reliability of a measure than the other two coefficients. Stratified alpha is also equal to the true reliability of a test when items within parcels are essentially tau‐equivalent if one assumes that errors of measurement are not correlated.
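
The three coefficients compared in this abstract can be computed from an item covariance matrix as in the sketch below. The function names and the example matrix are hypothetical, chosen only for illustration; the formulas are the standard ones for coefficient alpha and stratified alpha, not code from the paper.

```python
def coefficient_alpha(cov):
    """Coefficient alpha from a square covariance matrix (list of lists)."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)       # variance of the sum score
    item_var = sum(cov[i][i] for i in range(k))    # sum of component variances
    return k / (k - 1) * (1 - item_var / total_var)


def parcel_covariance(cov, parcels):
    """Covariance matrix of parcel scores, each parcel a list of item indices."""
    return [[sum(cov[i][j] for i in a for j in b) for b in parcels]
            for a in parcels]


def stratified_alpha(cov, parcels):
    """Stratified alpha: 1 - sum_j var(parcel_j) * (1 - alpha_j) / var(total)."""
    total_var = sum(sum(row) for row in cov)
    unreliable = 0.0
    for idx in parcels:
        sub = [[cov[i][j] for j in idx] for i in idx]
        parcel_var = sum(sum(row) for row in sub)
        unreliable += parcel_var * (1 - coefficient_alpha(sub))
    return 1 - unreliable / total_var


# Hypothetical example: four unit-variance items in two parcels, with
# within-parcel covariance .6 and between-parcel covariance .3 (W = .5).
cov = [[1.0, 0.6, 0.3, 0.3],
       [0.6, 1.0, 0.3, 0.3],
       [0.3, 0.3, 1.0, 0.6],
       [0.3, 0.3, 0.6, 1.0]]
parcels = [[0, 1], [2, 3]]

print(coefficient_alpha(cov))                              # alpha on items
print(coefficient_alpha(parcel_covariance(cov, parcels)))  # alpha on parcels
print(stratified_alpha(cov, parcels))                      # stratified alpha
```

In this example W is below unity, so alpha on parcel scores falls below alpha on items, and stratified alpha is the largest of the three, in line with the ordering the abstract describes.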

6.
Interrater correlations are widely interpreted as estimates of the reliability of supervisory performance ratings, and are frequently used to correct the correlations between ratings and other measures (e.g., test scores) for attenuation. These interrater correlations do provide some useful information, but they are not reliability coefficients. There is clear evidence of systematic rater effects in performance appraisal, and variance associated with raters is not a source of random measurement error. We use generalizability theory to show why rater variance is not properly interpreted as measurement error, and show how such systematic rater effects can influence both reliability estimates and validity coefficients. We show conditions under which interrater correlations can either overestimate or underestimate reliability coefficients, and discuss reasons other than random measurement error for low interrater correlations.

7.
A new, short, and easily administered Risk Propensity Scale (RPS) is introduced that measures general risk‐taking tendencies. This paper investigates the reliability and discriminant validity of the RPS. The RPS provided scores that yielded a good internal reliability coefficient and adequate test–retest reliability, and the scores correlated moderately to well with those of the Everyday Risk Inventory and the short Sensation‐Seeking Scale. The correlation with the scores from other scales (Need for Cognition scale, Need for Structure scale, and 2 self‐esteem scales) was low to moderate, indicating good discriminant validity. The findings are discussed in relation to risk‐perception research using gambling experiments and in relation to their usefulness for risky decision‐making research.

8.
Formulas for the standard error of measurement of three measures of change (simple difference scores, residualized difference scores, and the measure introduced by Tucker, Damarin, and Messick) are derived. Equating these formulas by pairs yields additional explicit formulas which provide a practical guide for determining the relative error of the three measures in any pretest-posttest design. The functional relationship between the standard error of measurement and the correlation between pretest and posttest observed scores remains essentially the same for each of the three measures despite variations in other test parameters (reliability coefficients, standard deviations), even when pretest and posttest errors of measurement are correlated.
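
For the simplest of the three measures, the standard error of measurement can be written out directly from the classical model. The expression below is a sketch for the simple difference score D = Y - X only. The notation is introduced here and is not taken from the article: sigma_X and sigma_Y are the observed standard deviations, rho_XX' and rho_YY' the pretest and posttest reliabilities, and rho_E the correlation between pretest and posttest errors. The residualized and Tucker-Damarin-Messick formulas are not reproduced.

```latex
\mathrm{SEM}_D
  = \sqrt{\sigma_X^2\,(1-\rho_{XX'}) + \sigma_Y^2\,(1-\rho_{YY'})
          - 2\,\rho_E\,\sigma_X\,\sigma_Y\,
            \sqrt{(1-\rho_{XX'})\,(1-\rho_{YY'})}}
```

When the errors are uncorrelated (rho_E = 0) the last term vanishes and the error variances of pretest and posttest simply add.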

9.
Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates. Due to the bias of IRT ability estimates, the parallel-forms reliability coefficient is not generally equal to the squared-correlation reliability coefficient. It is shown algebraically that the parallel-forms reliability coefficient is expected to be greater than the squared-correlation reliability coefficient, but the difference would be negligible in a practical sense.

10.
11.
Classical reliability theory assumes that individuals have identical true scores on both testing occasions, a condition described as stable. If some individuals' true scores are different on different testing occasions, described as unstable, the estimated reliability can be misleading. A model called stable unstable reliability theory (SURT) frames stability or instability as an empirically testable question. SURT assumes a mixed population of stable and unstable individuals in unknown proportions, with w(i) the probability that individual i is stable. w(i) becomes i's test score weight which is used to form a weighted correlation coefficient r(w) which is reliability under SURT. If all w(i) = 1 then r(w) is the classical reliability coefficient; thus classical theory is a special case of SURT. Typically r(w) is larger than the conventional reliability r, and confidence intervals on true scores are typically shorter than conventional intervals. r(w) is computed with routines in a publicly available R package.
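
The idea of a weighted correlation coefficient can be illustrated with a generic weighted Pearson correlation. This is a sketch only: the abstract does not say how SURT estimates the weights w(i), so the weights below are supplied by hand, and the function name `weighted_correlation` is hypothetical and is not the interface of the R package mentioned above.

```python
import math


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y with nonnegative case weights w."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores on two testing occasions.
occasion_1 = [1.0, 2.0, 3.0, 4.0]
occasion_2 = [2.0, 1.0, 4.0, 3.0]

# With all weights equal to 1 this is the ordinary correlation.
print(weighted_correlation(occasion_1, occasion_2, [1, 1, 1, 1]))
# Down-weighting some individuals changes the estimate.
print(weighted_correlation(occasion_1, occasion_2, [1.0, 1.0, 0.2, 0.2]))
```

Setting every weight to 1 recovers the unweighted coefficient, which mirrors the abstract's point that classical reliability is the special case in which all w(i) = 1.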

12.
Previous research on measurement error in job performance ratings estimated reliability using three coefficients: alpha, test–retest, and interrater correlation. None of these three coefficients controls for all four main sources of error in performance ratings. For this reason, the coefficient of equivalence and stability (CES) has been suggested as the ideal estimate of reliability. This article presents the estimates of CES for a time interval of 1, 2, and 3 years. The values obtained for a single rater were .51, .48, and .44, respectively. For two raters, the values were .59, .55, and .51. The findings suggest that previous reliability estimates based on alpha, test–retest, and interrater coefficients overestimated the reliability of job performance ratings. In the present study, the interrater coefficient overestimates reliability by 13.6–25.4% for a time interval of 1–3 years, as it does not control for transient error. Results also showed that the importance of transient error increases as the length of the interval between the measures increases. Based on the results, it is suggested that corrected validities based on interrater reliability underestimate the magnitude of the validity. The implications of these findings for future efforts to estimate criterion reliability and predictor validity are discussed.

13.
Critics of Kinesthetic Aftereffect (KAE) recommend abandoning it as a personality measure largely because of poor test-retest reliability. Although no test can be valid if lacking true reliability, to discard a measure because of poor retest reliability is an oversimplification of validation procedures. This pitfall is exemplified here by a reexamination of KAE. KAE scores involve measures before (pretest) and after (test) aftereffect induction. Internal analysis of a KAE study showed: Differential bias is present; its locus is the second session pretest; its form makes second-session pretest scores functionally more similar to first- and second-session test scores and functionally more dissimilar to first-session pretest scores. Given this second session bias, the retest correlation tells us nothing about the true reliability of a one-session KAE score. However, if a measure possesses external validity, it must to some degree show true reliability. Based upon a literature review of one-session KAE validity studies, we conclude that one-session KAE scores are valid and hence show true reliability. KAE remains a promising personality measure.

14.
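Validity evaluation of a diagnostic scale of language learning ability for preschool children   Total citations: 1, self-citations: 0, citations by others: 1
Using the scale we developed as the instrument, a validity analysis was carried out on the collected data. The results show that each subtest correlates well with the full scale, indicating that the content validity of the scale is fairly high. Factor analysis was used to classify all variables systematically and to study the structure of the scale. The communalities of the great majority of subtests on the four extracted factors all exceed 0.70, and the correlation coefficients between the retained subtests and the factors to which they belong range from 0.53 to 0.84. These high loadings on the respective factors indicate that the scale has good construct validity. Judging from the validity analysis, the measurements produced by the scale should be accurate. In addition, the subtests were adjusted in the direction indicated by the factor analysis; the adjusted scale structure not only agrees closely with the hypothesized structure but is also better organized.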

15.
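Estimating test reliability: From coefficient α to internal consistency reliability   Total citations: 5, self-citations: 0, citations by others: 5
Wen Zhonglin, Ye Baojuan. Acta Psychologica Sinica, 2011, 43(7): 821-829
Following the classical definition of test reliability, this paper briefly introduces the relationship between reliability and coefficient α, and the limitations of coefficient α. In order to recommend reliability estimators that can replace coefficient α, it discusses in depth homogeneity reliability and internal consistency reliability, both of which are closely related to coefficient α. Under very general conditions, it is proved that neither coefficient α nor homogeneity reliability exceeds internal consistency reliability, which in turn does not exceed test reliability, showing that internal consistency reliability is relatively close to test reliability. A procedure for analyzing test reliability is summarized, indicating under which circumstances coefficient α still has reference value and under which it no longer applies, so that internal consistency reliability (often called composite reliability in the literature) should be used instead. Programs for computing homogeneity reliability and internal consistency reliability are provided, which applied researchers can use directly.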
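
The relation between coefficient α and composite reliability can be illustrated for the simplest case. The sketch below assumes a single-factor model with unit factor variance and uncorrelated errors; it is not the program provided in the paper, and the function names and loading values are hypothetical.

```python
def composite_reliability(loadings, error_vars):
    """Composite reliability of a sum score under a one-factor model."""
    common = sum(loadings) ** 2            # variance due to the common factor
    return common / (common + sum(error_vars))


def alpha_from_one_factor(loadings, error_vars):
    """Coefficient alpha implied by the same one-factor model."""
    k = len(loadings)
    item_var = sum(l * l + e for l, e in zip(loadings, error_vars))
    total_var = sum(loadings) ** 2 + sum(error_vars)
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical standardized loadings; error variance is 1 - loading^2.
loadings = [0.9, 0.7, 0.5]
errors = [1 - l * l for l in loadings]

print(composite_reliability(loadings, errors))
print(alpha_from_one_factor(loadings, errors))
```

With unequal loadings, α falls below composite reliability; with equal loadings the two coincide, which is consistent with the ordering stated in the abstract.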

16.
An empirical study of test scores shows the variance of the errors of measurement to be significantly associated with true score in each of four groups studied; it also shows the distribution of the errors of measurement to be significantly skewed in three of these four groups. The mathematical rationale underlying the statistical treatment is presented. Standard error formulas are given for making the necessary significance tests. This research was in part carried out under Contracts Nonr-2214(00) and Nonr-2752(00) with the Office of Naval Research, Department of the Navy.

17.
It is demonstrated that theorems in test theory have corresponding dual theorems which are obtained by exchanging true scores and error scores, as well as reliability coefficients and their complements, in both the hypothesis and the conclusion. A formula that does not conform to the principle cannot be an identity in the test-theory model, but must be based on additional assumptions in the hypothesis that perhaps are not immediately apparent. The usefulness of this principle is indicated, and its origin in the mathematical formalism underlying the theory is discussed.

18.
A hybrid procedure for number correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT and the total test scores are computed based on CTT. Thus, what makes the hybrid scoring method attractive is that this method accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

19.
A number of empirical investigations indicate that tests with a greater number of response options tend to yield better psychometric performance. We hypothesized that a version of the MMPI–2 with a polytomous response format would outperform the standard dichotomous format in terms of observed score reliability and validity. Two versions of the MMPI–2 RC scales were administered consecutively in counterbalanced order to 199 undergraduate students attending a large Midwestern university: the standard true–false version, and an experimental version containing 4 response options (very true, mainly true, slightly true, and false, not at all true). After participants completed both versions, 2 scales from the Multidimensional Personality Questionnaire (MPQ) were administered to assess differences in convergent validity. Results showed enhancements in reliability for all RC scale scores and increases in the convergent validity of scores. Directions for further investigation and potential implications for future test development are discussed.

20.
Johnson HG. Psychometrika, 1950, 15(2): 115-119
Evidence is cited to show that specificity, or lack of equivalence, in the comparable forms of tests has a tendency to lower the value of reliability coefficients but has no tendency to lower the value of observed trait coefficients. This implies that the greater the lack of equivalence, the higher will be coefficients corrected for attenuation. Errors of measurement are supposed to reduce the magnitude of observed trait coefficients. Since specificity does not lower the correlation between two tests and since the split-half and equivalent-form reliability coefficients treat specificity as error, it follows that these two coefficients cannot legitimately be used in Spearman's correction-for-attenuation formula.
