1.
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, in which an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. Estimating the bandwidth introduces additional variability that is currently not accounted for when calculating the standard errors of equating, which threatens the accuracy of those standard errors. In this study, the asymptotic variance of the bandwidth parameter estimator is derived, and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and to confirm the accuracy of the modified method across several sample sizes and test lengths, compared with the existing method and with Monte Carlo estimates of the standard error of equating. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.
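The continuization and equating steps described above can be sketched in Python. This is a minimal illustration, assuming the mean- and variance-preserving Gaussian kernel common in the kernel equating literature, no presmoothing, and a bandwidth chosen by minimizing a PEN1-style penalty over a hypothetical grid. The function names are illustrative, and the study's actual contribution (the asymptotic variance of the bandwidth estimator and the modified standard error of equating) is not reproduced.

```python
import math


def _phi(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _moments(scores, probs):
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    return mu, var


def _kernel_args(x, scores, probs, h):
    # Shrink each score toward the mean by a so that the continuized
    # distribution keeps the mean and variance of the discrete one.
    mu, var = _moments(scores, probs)
    a = math.sqrt(var / (var + h * h))
    return [(x - a * s - (1.0 - a) * mu) / (a * h) for s in scores], a


def kernel_cdf(x, scores, probs, h):
    zs, _ = _kernel_args(x, scores, probs, h)
    return sum(p * _Phi(z) for p, z in zip(probs, zs))


def kernel_pdf(x, scores, probs, h):
    zs, a = _kernel_args(x, scores, probs, h)
    return sum(p * _phi(z) for p, z in zip(probs, zs)) / (a * h)


def pen1(h, scores, probs):
    # Squared distance between the score probabilities and the
    # continuized density evaluated at the score points.
    return sum((p - kernel_pdf(s, scores, probs, h)) ** 2
               for s, p in zip(scores, probs))


def select_bandwidth(scores, probs, grid):
    # Pick the grid value with the smallest penalty.
    return min(grid, key=lambda h: pen1(h, scores, probs))


def kernel_quantile(u, scores, probs, h):
    # Invert the strictly increasing continuized CDF by bisection.
    _, var = _moments(scores, probs)
    pad = 10.0 * (math.sqrt(var) + h + 1.0)
    lo, hi = min(scores) - pad, max(scores) + pad
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, x_scores, x_probs, h_x, y_scores, y_probs, h_y):
    # Equipercentile equating on the continuized distributions:
    # e_Y(x) = G^{-1}(F(x)).
    u = kernel_cdf(x, x_scores, x_probs, h_x)
    return kernel_quantile(u, y_scores, y_probs, h_y)
```

In use, one would call `select_bandwidth` separately for each form and pass the chosen values as `h_x` and `h_y` to `kernel_equate`.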
2.
A method of IRT observed-score equating that uses chain equating through a third test, without equating coefficients, is presented under the three-parameter logistic model. The asymptotic standard errors of the equated scores for this method are obtained using results given by M. Liou and P.E. Cheng. Asymptotic standard errors are also provided for a method currently in use: IRT observed-score equating using a synthetic examinee group with equating coefficients. Numerical examples show that the standard errors of these observed-score equating methods are similar to those of the corresponding true-score equating methods, except in the low-score range. The author is indebted to Michael J. Kolen for access to the real data used in this article and to anonymous reviewers for their corrections and suggestions on this work.
3.
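Using the non-equivalent groups with anchor test data collection design, four equating methods based on classical test theory were compared. The data were taken from the TIMSS 1999 database; both the standard error of equating and cross-validation were used as criteria for comparing the methods, and the data were analyzed with the CIPE program. The results show that, for the equating situations set up in this study, linear equating outperformed equipercentile equating. Among the linear methods, the Tucker method performed somewhat better than the Levine observed-score linear method, and the Braun-Holland linear method is not recommended. Frequency-estimation equipercentile equating produced large equating errors and is likewise not advisable.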
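Of the four methods compared, the Tucker linear method can be sketched from the usual synthetic-population formulas for the anchor-test design. In this sketch, population 1 takes Form X and anchor V, population 2 takes Form Y and V, moments use divisor n, and the synthetic weight `w1` defaults to 0.5; these choices and the function name are assumptions, not details of the study or of the CIPE program.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _var(v):
    m = _mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)


def _cov(u, v):
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def tucker_linear(x1, v1, y2, v2, w1=0.5):
    """Return a function mapping Form X scores onto the Form Y scale.

    x1, v1: Form X and anchor scores for population 1.
    y2, v2: Form Y and anchor scores for population 2.
    w1: synthetic-population weight for population 1 (assumed 0.5).
    """
    w2 = 1.0 - w1
    g1 = _cov(x1, v1) / _var(v1)  # regression slope of X on V, population 1
    g2 = _cov(y2, v2) / _var(v2)  # regression slope of Y on V, population 2
    dmu = _mean(v1) - _mean(v2)
    dvar = _var(v1) - _var(v2)
    # Synthetic-population means and variances of X and Y.
    mu_x = _mean(x1) - w2 * g1 * dmu
    mu_y = _mean(y2) + w1 * g2 * dmu
    var_x = _var(x1) - w2 * g1 ** 2 * dvar + w1 * w2 * g1 ** 2 * dmu ** 2
    var_y = _var(y2) + w1 * g2 ** 2 * dvar + w1 * w2 * g2 ** 2 * dmu ** 2
    slope = math.sqrt(var_y / var_x)
    return lambda x: slope * (x - mu_x) + mu_y
```

When the two groups have identical anchor means and variances, the synthetic moments reduce to the observed group moments and the result is ordinary mean-sigma linear equating.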
4.
Some standard errors in item response theory (cited 2 times; self-citations: 0; citations by others: 2)
The mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations. It is shown that maximum likelihood estimation of a lower asymptote can wreak havoc with the accuracy of estimation of a location parameter. This indicates that if accurate estimates of location parameters are needed (say, for test linking/equating or computerized adaptive testing), the sample sizes required for acceptable accuracy may be unattainable in most applications. It is suggested that other estimation methods be used if the three-parameter model is applied in these situations. The research reported here was supported, in part, by contract #F41689-81-6-0012 from the Air Force Human Resources Laboratory to McFann-Gray & Associates, Benjamin A. Fairbank, Jr., Principal Investigator. Further support of Wainer's effort was supplied by the Educational Testing Service, Program Statistics Research Project.
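The kind of calculation described above can be sketched for a single 3PL item: build the expected Fisher information matrix for (a, b, c) and read asymptotic standard errors off the diagonal of its inverse. This assumes abilities are known, a scaling constant D = 1.7, and a hypothetical normal-shaped grid of 1000 examinees; the paper's exact setup may differ. Estimating c alongside b can never make the standard error of b smaller than when c is treated as known, and typically inflates it.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def info_3pl(a, b, c, thetas, counts):
    """Expected Fisher information matrix for (a, b, c) of one 3PL item,
    treating abilities as known: counts[k] examinees at thetas[k]."""
    info = [[0.0] * 3 for _ in range(3)]
    for t, n in zip(thetas, counts):
        L = 1.0 / (1.0 + math.exp(-D * a * (t - b)))
        P = c + (1.0 - c) * L
        grad = [
            (1.0 - c) * D * (t - b) * L * (1.0 - L),  # dP/da
            -(1.0 - c) * D * a * L * (1.0 - L),       # dP/db
            1.0 - L,                                  # dP/dc
        ]
        for i in range(3):
            for j in range(3):
                info[i][j] += n * grad[i] * grad[j] / (P * (1.0 - P))
    return info


def invert(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(info):
    # Asymptotic SEs: square roots of the diagonal of the inverse information.
    inv = invert(info)
    return [math.sqrt(inv[i][i]) for i in range(len(inv))]


if __name__ == "__main__":
    # Hypothetical setup: 1000 examinees spread over a normal-shaped grid.
    thetas = [-3.0 + 0.5 * k for k in range(13)]
    dens = [math.exp(-0.5 * t * t) for t in thetas]
    counts = [1000.0 * d / sum(dens) for d in dens]
    full = info_3pl(1.0, 0.0, 0.2, thetas, counts)
    print("SE(a, b, c), c estimated:", standard_errors(full))
    print("SE(a, b), c known:", standard_errors([row[:2] for row in full[:2]]))
```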
5.
6.
7.
Test equating is a statistical procedure to ensure that scores from different test forms can be used interchangeably. Several equating methodologies are available; some are based on the Classical Test Theory (CTT) framework and others on the Item Response Theory (IRT) framework. This article compares equating transformations originating from three different frameworks, namely IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios, including a novel data-generation procedure that allows test data to be simulated without relying on IRT parameters while still providing control over test score properties such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE might provide satisfactory results if a proper presmoothing solution can be found, and it is much faster than IRT methods. For routine applications, we recommend checking the sensitivity of the results to the equating method, bearing in mind the importance of good model fit and of meeting the assumptions of the framework.
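IRT observed-score equating needs the observed-score distributions implied by the fitted IRT model. A standard way to obtain them is the Lord-Wingersky recursion, sketched here under assumed 2PL items (D = 1.7) and a hypothetical quadrature over ability; the subsequent equipercentile step, the kernel variants, and the article's data-generation procedure are not shown.

```python
import math


def lord_wingersky(p_correct):
    """Number-correct score distribution at one ability value, given each
    item's probability of a correct response (local independence assumed)."""
    dist = [1.0]  # before any item: score 0 with probability 1
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for score, prob in enumerate(dist):
            new[score] += prob * (1.0 - p)  # item answered incorrectly
            new[score + 1] += prob * p      # item answered correctly
        dist = new
    return dist


def p_2pl(theta, a, b, D=1.7):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def marginal_score_dist(items, nodes, weights):
    """Average the conditional score distributions over ability quadrature.
    items: list of (a, b) pairs; weights should sum to 1."""
    total = [0.0] * (len(items) + 1)
    for theta, w in zip(nodes, weights):
        cond = lord_wingersky([p_2pl(theta, a, b) for a, b in items])
        for k, f in enumerate(cond):
            total[k] += w * f
    return total
```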
8.
For (0, 1)-scored multiple-choice tests, a formula giving test reliability as a function of the number of item options is derived, assuming the knowledge-or-random-guessing model, the parallelism of the new and old tests (apart from the guessing probability), and the assumptions of classical test theory. It is shown that the formula generalizes an equation by Lord and reduces to Lord's equation if the items are effectively parallel. Further, the formula is shown to be closely related to another formula derived from Lord's randomly parallel tests model.
9.
Dato N. M. de Gruijter 《Psychometrika》1984,49(2):269-272
In maximum likelihood estimation, the standard error of the location parameter of the three-parameter logistic model can be large because of inaccurate estimation of the lower asymptote. Thissen and Wainer, who demonstrated this effect, suggested that introducing a prior distribution for the lower asymptote might alleviate the problem. Here it is demonstrated in some detail that the standard error of the location parameter can be made acceptably small in this way. The author thanks Pieter Vijn for his helpful comments.
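The mechanism can be sketched in terms of information matrices. Assume, for illustration only (the abstract does not name the prior family), a Beta(α, β) prior on the lower asymptote c, and write ξ = (a, b, c) for the item parameters with Fisher information matrix I(ξ). The prior adds its negative second log-derivative to the c entry:

$$
\mathbf{I}_{\text{post}}(\xi) = \mathbf{I}(\xi) + \operatorname{diag}\!\left(0,\; 0,\; \frac{\alpha - 1}{c^{2}} + \frac{\beta - 1}{(1 - c)^{2}}\right),
\qquad
\operatorname{SE}(\hat b) \approx \sqrt{\left[\mathbf{I}_{\text{post}}(\xi)^{-1}\right]_{bb}}
$$

For α, β ≥ 1 the added matrix is positive semidefinite, so no diagonal element of the inverse can increase, and a more concentrated prior limits how much the uncertainty in c spills over into b.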
10.
In classical test theory, a high-reliability test always leads to precise measurement. When it comes to predicting test scores, however, this is not necessarily so. Using a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the means, variances, and covariances of the predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. For a new subject taking a new test, higher test reliability led to a larger variance in this study when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of prediction, and some suggestions are made.
11.
12.
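This study used an experimental approach to compare 4 equating methods based on classical test theory and 11 equating methods based on item response theory. The data came from operational HSK administrations, and relatively reliable evaluation criteria were adopted. The results show that in some situations equating is not the best option, and that some IRT methods are feasible for item bank construction. At least for the HSK data, IRT parameter-transformation equating methods produced large errors whether one-, two-, or three-parameter models were used and whether the ms or the mm method was applied, so none of them is advisable.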
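The ms and mm methods are presumably the mean/sigma and mean/mean methods for estimating the linear scale transformation θ_old = Aθ_new + B from common-item parameter estimates. The sketch below uses the standard formulas under that assumption; the argument names are hypothetical, and the study's own procedures are not reproduced.

```python
import statistics


def mean_sigma(b_new, b_old):
    # A from the ratio of difficulty standard deviations, B from the means.
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.fmean(b_old) - A * statistics.fmean(b_new)
    return A, B


def mean_mean(a_new, b_new, a_old, b_old):
    # A from the ratio of mean discriminations, B from mean difficulties.
    A = statistics.fmean(a_new) / statistics.fmean(a_old)
    B = statistics.fmean(b_old) - A * statistics.fmean(b_new)
    return A, B


def transform(a, b, A, B):
    # With theta_old = A * theta_new + B: a_old = a / A, b_old = A * b + B.
    return a / A, A * b + B
```

If the common-item estimates on the two scales were exact linear images of each other, both methods would recover the same A and B; with real estimation error they generally differ, which is one reason such comparisons are made.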
13.
14.
In theory, the greatest lower bound (g.l.b.) to reliability is the best possible lower bound to reliability based on a single test administration. Yet practical use of the g.l.b. has been severely hindered by sampling bias. It is well known that the g.l.b. based on small samples (even a sample of one thousand subjects is generally not enough) may severely overestimate the population value, and a statistical treatment of the bias has been lacking. The only results obtained so far concern the asymptotic variance of the g.l.b. and of its numerator (the maximum possible error variance of a test), based on first-order derivatives and the assumption of multivariate normality. The present paper extends these results by offering explicit expressions for the second-order derivatives. This yields a closed-form expression for the asymptotic bias of both the g.l.b. and its numerator, under the assumptions that the rank of the reduced covariance matrix is at or above the Ledermann bound and that the nonnegativity constraints on the diagonal elements of the matrix of unique variances are inactive. It is also shown that, when the reduced rank is at its highest possible value (i.e., the number of variables minus one), the numerator of the g.l.b. is asymptotically unbiased and the asymptotic bias of the g.l.b. is negative. The latter results are contrary to common belief but apply only to cases where the number of variables is small. The asymptotic results are illustrated by numerical examples. This research was supported by grant DMI-9713878 from the National Science Foundation.
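For reference, the g.l.b. is usually defined as an optimization over diagonal matrices Ψ of unique (error) variances, where Σ is the p × p item covariance matrix and 1ᵀΣ1 is the variance of the total score:

$$
\operatorname{glb} = 1 - \frac{\max_{\Psi} \operatorname{tr}(\Psi)}{\mathbf{1}^{\top}\Sigma\,\mathbf{1}}
\quad\text{subject to}\quad
\Psi = \operatorname{diag}(\psi_1,\dots,\psi_p),\;\; \psi_i \ge 0,\;\; \Sigma - \Psi \succeq 0
$$

The maximized trace is the maximum possible error variance (the numerator referred to in the abstract), and Σ − Ψ is the reduced covariance matrix whose rank and nonnegativity constraints the paper's assumptions concern.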
15.
Haruhiko Ogasawara 《The Japanese psychological research》2001,43(2):72-82
A method of estimating item response theory (IRT) equating coefficients under the common-examinee design, assuming the two-parameter logistic model, is provided. The method uses marginal maximum likelihood estimation, in which the individual ability parameters of the common-examinee group are numerically integrated out. The abilities of the common examinees are assumed to follow a normal distribution with an unknown mean and standard deviation on one of the two tests to be equated. These distribution parameters are estimated jointly with the equating coefficients. Further, the asymptotic standard errors of the estimated equating coefficients and ability-distribution parameters are given. Numerical examples are provided to show the accuracy of the method.
16.
Recent studies of an extended class of matched-pairs tests based on powers of ranks are discussed. Earlier questions regarding the asymptotic properties of this class of tests are clarified, and a generalization of the class is described. This generalization raises a previously unanticipated concern: whether the analytic comparisons produced by these tests correspond to an intuitive notion of what is being compared.
17.
Michelle Liou 《Psychometrika》1989,54(1):153-163
This research note proposes two reliability coefficients for tests with components of unknown functional lengths. The derived coefficients extend the techniques devised by Kristof and Feldt and do not require a reduction of test components into parts. A simulation study indicates that the new coefficients yield reasonably stable reliability estimates when the number of test components is small.
18.
19.
W. Alan Nicewander 《Psychometrika》1993,58(1):139-141
It is shown that IRT's information function for an item is functionally related to local versions of classical test theory's signal/noise ratio and reliability coefficient.
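The general item information I(θ) = P′(θ)² / [P(θ)(1 − P(θ))] can be read as the squared local slope of the expected item score (signal) over the conditional error variance of a 0/1 item score (noise). The sketch below checks this reading numerically against the closed-form 2PL information D²a²P(1 − P); D = 1.7 and the parameter values are assumptions, and the specific functional relation derived in the paper is not reproduced.

```python
import math

D = 1.7  # logistic scaling constant (assumed)


def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def info_closed_form(theta, a, b):
    # Standard 2PL item information: D^2 a^2 P (1 - P).
    p = p_2pl(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def info_signal_noise(theta, a, b, eps=1e-5):
    # Signal: squared slope of the expected item score (central difference).
    slope = (p_2pl(theta + eps, a, b) - p_2pl(theta - eps, a, b)) / (2.0 * eps)
    # Noise: conditional variance of a Bernoulli item score.
    p = p_2pl(theta, a, b)
    return slope ** 2 / (p * (1.0 - p))
```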
20.
David Jarjoura 《Psychometrika》1983,48(4):525-539
The problem of predicting universe scores for samples of examinees based on their responses to samples of items is treated. A general measurement procedure is described in which multiple test forms are developed from a table of specifications and each form is administered to a different sample of examinees. The measurement model categorizes items according to the cells of such a table, and the linear function derived for minimizing error variance in prediction uses responses to these categories. In addition, some distinctions are drawn between aspects of the approach taken here and the familiar regressed score estimates. The author thanks Robert L. Brennan, Michael J. Kolen, and Richard Sawyer for helpful comments and corrections, and anonymous reviewers for suggested improvements.
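The familiar regressed score estimate the abstract contrasts with is, in its single-test form, Kelley's formula, which shrinks an observed score toward the group mean in proportion to unreliability. A minimal sketch follows, with hypothetical function names; it does not implement the paper's table-of-specifications predictor, which weights responses by item category.

```python
import math


def kelley_regressed_score(observed, reliability, group_mean):
    # Regressed true-score estimate: rho * x + (1 - rho) * mean.
    return reliability * observed + (1.0 - reliability) * group_mean


def standard_error_of_estimate(sd_observed, reliability):
    # Standard error of the regressed estimate: sigma_X * sqrt(rho * (1 - rho)).
    return sd_observed * math.sqrt(reliability * (1.0 - reliability))
```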