
1.

Kseniia Marcq, Björn Andersson. Applied Psychological Measurement, 2022, 46(3): 200

In standardized testing, equating is used to ensure the comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, in which an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. Estimating the bandwidth introduces additional variability that is not currently accounted for when the standard errors of equating are calculated, which threatens the accuracy of those standard errors. In this study, the asymptotic variance of the bandwidth parameter estimator is derived, and a modified method for calculating the standard error of equating that accounts for bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and to confirm the accuracy of the modified method across several sample sizes and test lengths, compared with the existing method and with Monte Carlo estimates of the standard error of equating. The results show that the modified standard errors of equating are accurate under the conditions considered. Furthermore, the modified and existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.
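The continuization step described above can be illustrated with a minimal Python sketch of Gaussian kernel equating for the equivalent groups design. It uses the standard formulation in which the continuized score is X(h) = a(X + hV) + (1 - a)μ, with V standard normal and a chosen so that the mean and variance of X are preserved. The bandwidths are taken as given. The sketch does not estimate them (no penalty-function minimization) and computes no standard errors, so it shows only the transformation whose variability the article studies. Function names are illustrative.

```python
import math

def _phi_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF F_h(x) of a discrete score distribution.

    Uses X(h) = a*(X + h*V) + (1 - a)*mu with V ~ N(0, 1), where
    a = sqrt(var / (var + h^2)) keeps the mean and variance of X unchanged.
    """
    mu = sum(p * s for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(p * _phi_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
               for s, p in zip(scores, probs))

def kernel_equate(x, scores_x, r, scores_y, s, h_x, h_y, tol=1e-10):
    """Equated score e_Y(x) = G^{-1}(F(x)); G is inverted by bisection."""
    u = kernel_cdf(x, scores_x, r, h_x)
    lo, hi = min(scores_y) - 10.0, max(scores_y) + 10.0  # search bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores_y, s, h_y) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because continuization preserves the mean and variance, shifting every Form Y score up by one point while keeping the probabilities moves the equated score up by exactly one point. This gives a convenient sanity check. Equating a distribution to itself with equal bandwidths returns the identity.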

2.

Asymptotic standard errors of IRT observed-score equating methods

Haruhiko Ogasawara. Psychometrika, 2003, 68(2): 193-211

A method of IRT observed-score equating that uses chain equating through a third test without equating coefficients is presented under the assumption of the three-parameter logistic model. The asymptotic standard errors of the equated scores obtained by this method are derived using results given by M. Liou and P. E. Cheng. The asymptotic standard errors of the currently used IRT observed-score equating method, which uses a synthetic examinee group with equating coefficients, are also provided. Numerical examples show that the standard errors of these observed-score equating methods are similar to those of the corresponding true-score equating methods, except in the range of low scores. The author is indebted to Michael J. Kolen for access to the real data used in this article and to anonymous reviewers for their corrections and suggestions on this work.
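IRT observed-score equating needs the number-correct score distribution implied by the item parameters. A common way to obtain it is the Lord-Wingersky recursion. The Python sketch below assumes the 3PL model with scaling constant D = 1.7 and a user-supplied quadrature over ability. It is background only: it does not reproduce Ogasawara's chain-equating procedure or its asymptotic standard errors, and the function names are illustrative.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def lord_wingersky(theta, items):
    """Conditional number-correct distribution P(X = x | theta), x = 0..n."""
    dist = [1.0]                      # zero items: score 0 with probability 1
    for a, b, c in items:
        p = p3pl(theta, a, b, c)
        new = [0.0] * (len(dist) + 1)
        for x, f in enumerate(dist):
            new[x] += f * (1.0 - p)   # item answered incorrectly
            new[x + 1] += f * p       # item answered correctly
        dist = new
    return dist

def normal_grid(n=41, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized standard-normal weights."""
    nodes = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    w = [math.exp(-t * t / 2.0) for t in nodes]
    total = sum(w)
    return nodes, [v / total for v in w]

def marginal_score_dist(items, nodes, weights):
    """Marginal observed-score distribution by quadrature over theta."""
    out = [0.0] * (len(items) + 1)
    for t, w in zip(nodes, weights):
        for x, f in enumerate(lord_wingersky(t, items)):
            out[x] += w * f
    return out
```

For two forms placed on a common ability scale, the marginal distributions from `marginal_score_dist` can then be equated equipercentile-wise.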

3.

A Comparative Study of Four CTT-Based Equating Methods under the Anchor-Test Nonequivalent-Groups Design

Jiao Liya, Xin Tao. Psychological Development and Education, 2006, 22(1): 97-102

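Using the anchor-test nonequivalent-groups data collection design, four equating methods based on classical test theory were compared. The data were drawn from the TIMSS 1999 database. Both the standard error of equating and cross-validation served as criteria for comparing the methods, and the data were analyzed with the CIPE program. The results show that, for the equating situation set up in this study, linear equating outperformed equipercentile equating. Among the linear methods, the Tucker method performed somewhat better than the Levine observed-score linear method, and the Braun-Holland linear method is not recommended. Frequency-estimation equipercentile equating produced large equating errors and is likewise not advisable.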
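The Tucker method, which performed best in this comparison, can be written in a few lines. The Python sketch below follows the standard synthetic-population formulas for the nonequivalent-groups anchor-test design. It relies on Tucker's assumptions: the same linear regression of total score on anchor score, with the same residual variance, holds in both populations. The sketch uses population (divide-by-n) moments. The weight `w1` and all function names are illustrative assumptions, not settings taken from the study.

```python
def _mean(z):
    return sum(z) / len(z)

def _cov(u, w):
    # Population covariance (divide by n); _cov(z, z) is the variance.
    mu, mw = _mean(u), _mean(w)
    return sum((a - mu) * (b - mw) for a, b in zip(u, w)) / len(u)

def tucker_linear(x1, v1, y2, v2, w1=0.5):
    """Tucker linear equating of Form X onto the Form Y scale (NEAT design).

    x1, v1: total and anchor scores in population 1 (took Form X)
    y2, v2: total and anchor scores in population 2 (took Form Y)
    w1: synthetic-population weight for population 1 (w2 = 1 - w1)
    """
    w2 = 1.0 - w1
    g1 = _cov(x1, v1) / _cov(v1, v1)          # regression slope of X on V
    g2 = _cov(y2, v2) / _cov(v2, v2)          # regression slope of Y on V
    dmu = _mean(v1) - _mean(v2)               # anchor mean difference
    dvar = _cov(v1, v1) - _cov(v2, v2)        # anchor variance difference
    mu_x = _mean(x1) - w2 * g1 * dmu
    mu_y = _mean(y2) + w1 * g2 * dmu
    var_x = _cov(x1, x1) - w2 * g1 ** 2 * dvar + w1 * w2 * g1 ** 2 * dmu ** 2
    var_y = _cov(y2, y2) + w1 * g2 ** 2 * dvar + w1 * w2 * g2 ** 2 * dmu ** 2
    slope = (var_y / var_x) ** 0.5
    return lambda x: mu_y + slope * (x - mu_x)
```

If the two groups have identical anchor-score moments, the adjustment terms vanish and the result reduces to ordinary linear equating of the observed means and standard deviations.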

4.

Some standard errors in item response theory

David Thissen, Howard Wainer. Psychometrika, 1982, 47(4): 397-412

The mathematics required to calculate the asymptotic standard errors of the parameters of three commonly used logistic item response models is described and used to generate values for some common situations. It is shown that maximum likelihood estimation of a lower asymptote can wreak havoc with the accuracy of estimation of a location parameter. Consequently, when accurate estimates of location parameters are needed (say, for test linking/equating or computerized adaptive testing), the sample sizes required for acceptable accuracy may be unattainable in most applications. It is suggested that other estimation methods be used if the three-parameter model is applied in these situations. The research reported here was supported, in part, by contract #F41689-81-6-0012 from the Air Force Human Resources Laboratory to McFann-Gray & Associates, Benjamin A. Fairbank, Jr., Principal Investigator. Further support of Wainer's effort was supplied by the Educational Testing Service, Program Statistics Research Project.
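The effect described above can be explored numerically. The Python sketch below computes the expected Fisher information matrix for the (a, b, c) parameters of a single 3PL item. It assumes that abilities are known and standard-normally distributed (approximated on a grid) and that D = 1.7. It then compares the asymptotic standard error of b when c is estimated jointly with the value obtained when c is treated as fixed. This is an illustrative approximation with hypothetical parameter values, not the authors' calculation.

```python
import numpy as np

def item_info_3pl(a, b, c, thetas, weights, n, D=1.7):
    """Expected Fisher information for (a, b, c) of one 3PL item.

    Abilities are treated as known; `weights` is a discrete approximation
    to their distribution and `n` is the number of examinees.
    """
    info = np.zeros((3, 3))
    for t, w in zip(thetas, weights):
        L = 1.0 / (1.0 + np.exp(-D * a * (t - b)))    # logistic part
        P = c + (1.0 - c) * L                         # 3PL success probability
        grad = np.array([
            (1.0 - c) * L * (1.0 - L) * D * (t - b),  # dP/da
            -(1.0 - c) * L * (1.0 - L) * D * a,       # dP/db
            1.0 - L,                                  # dP/dc
        ])
        # Bernoulli information contribution: grad grad^T / (P (1 - P))
        info += w * np.outer(grad, grad) / (P * (1.0 - P))
    return n * info

# Hypothetical item and a standard-normal ability grid
thetas = np.linspace(-4.0, 4.0, 81)
weights = np.exp(-thetas ** 2 / 2.0)
weights /= weights.sum()
I = item_info_3pl(a=1.0, b=1.0, c=0.2, thetas=thetas, weights=weights, n=1000)

se_b_free_c = np.sqrt(np.linalg.inv(I)[1, 1])           # c estimated jointly
se_b_fixed_c = np.sqrt(np.linalg.inv(I[:2, :2])[1, 1])  # c treated as known
```

The standard error of b with c free comes from the inverse of the full 3 × 3 matrix, so it can never be smaller than the fixed-c value. How much larger it is depends on the item and the ability distribution.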

5.

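Application of Test Equating in the General-Education Secondary-School Entrance ("Shengzhong") Examination: The Case of Foshan City, Guangdong Province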

Zhang Minqiang, Li Guangming, Jiao Can. Studies of Psychology and Behavior, 2009, 7(1): 27-31

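Using the secondary-school entrance examination of Foshan City, Guangdong Province, as an example, this study analyzes how to select an appropriate equating design and method for converting scores across different regions in this examination. A nonrandom-groups anchor-test equating design was used to compare three classical test equating methods. The results show that the Tucker linear method performed best, followed by the Levine linear method. Equipercentile equating (frequency estimation) was unsuitable for this type of equating. An analysis of variance of the equating results showed an interaction between item type and equating method, indicating that different item types call for different equating methods.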

6.

An Analysis of the Reliability Perspective of Generalizability Theory (GT) and Its Problems

He Ning, Miao Danmin, Huo Yongquan. Chinese Journal of Applied Psychology, 2007, 13(1): 87-90

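Generalizability theory (GT) and classical test theory (CTT) are important theoretical sources for evaluating reliability under random measurement models. This article takes repeated measurement as an entry point for understanding how GT arose and how it is constructed. It analyzes GT's distinctive features in three respects: its test assumptions, the idea of the universe score, and its view of error. It then identifies several current problems and shortcomings of GT in theory and application.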
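The simplest concrete case of GT's treatment of error is the one-facet crossed design (persons × items). The Python sketch below estimates the variance components from two-way ANOVA mean squares and forms the relative generalizability coefficient. It assumes complete data with one observation per cell, and the function names are illustrative. For this design the coefficient coincides with Cronbach's alpha (Hoyt's result), and the second function makes that easy to check.

```python
def g_coefficient(scores):
    """Relative G coefficient for a crossed persons x items (p x i) design.

    scores[p][i] is person p's score on item i (complete data, one
    observation per cell). Variance components come from ANOVA:
      sigma2(pi,e) = MS_res,  sigma2(p) = (MS_p - MS_res) / n_i
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_res = (ss_tot - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))
    var_p = (ms_p - ms_res) / n_i      # universe-score variance
    var_rel_err = ms_res / n_i         # relative error variance for n_i items
    return var_p / (var_p + var_rel_err)

def cronbach_alpha(scores):
    """Cronbach's alpha from sample (n - 1) item and total-score variances."""
    n_i = len(scores[0])
    def svar(z):
        m = sum(z) / len(z)
        return sum((v - m) ** 2 for v in z) / (len(z) - 1)
    item_vars = [svar([row[j] for row in scores]) for j in range(n_i)]
    total_var = svar([sum(row) for row in scores])
    return n_i / (n_i - 1) * (1.0 - sum(item_vars) / total_var)
```

Unlike alpha, the GT version keeps the variance components separate. The same estimates can therefore be reused to project the coefficient for a different number of items.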

7.

Waldir Leôncio, Marie Wiberg, Michela Battauz. Applied Psychological Measurement, 2023, 47(2): 123

Test equating is a statistical procedure to ensure that scores from different test forms can be used interchangeably. Several methodologies are available for equating. Some are based on the Classical Test Theory (CTT) framework and others on the Item Response Theory (IRT) framework. This article compares equating transformations originating from three different frameworks: IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios. These include a novel data-generation procedure that simulates test data without relying on IRT parameters while still providing control over some test score properties, such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE might be able to provide satisfactory results if a proper presmoothing solution can be found, and it is also much faster than IRT methods. For daily applications, we recommend checking the sensitivity of the results to the equating method, while minding the importance of good model fit and of meeting the assumptions of the framework.

8.

Reliability as a function of the number of item options derived from the “knowledge or random guessing” model

Robert G. MacCann. Psychometrika, 2004, 69(1): 147-157

For (0, 1) scored multiple-choice tests, a formula giving test reliability as a function of the number of item options is derived, assuming the knowledge or random guessing model, the parallelism of the new and old tests (apart from the guessing probability), and the assumptions of classical test theory. It is shown that the formula is a more general case of an equation by Lord, and reduces to Lord's equation if the items are effectively parallel. Further, the formula is shown to be closely related to another formula derived from Lord's randomly parallel tests model.
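The knowledge-or-random-guessing model lends itself to a quick numerical illustration. The Python sketch below is not MacCann's formula. It is a direct classical-test-theory calculation under added assumptions that do not come from the article. Each examinee knows each item independently with probability π. The value of π varies across examinees with mean m and variance v. An unknown item is answered correctly with probability 1/k. Reliability is then true-score variance divided by observed-score variance.

```python
def krg_reliability(n_items, k, m, v):
    """Number-correct reliability under knowledge-or-random-guessing.

    Assumptions (illustrative, not from the article): items are known
    independently with probability pi given the examinee; pi has mean m
    and variance v across examinees; blind guesses succeed with 1/k.
    """
    g = 1.0 / k                                   # chance of a lucky guess
    # True score T = n * (g + (1 - g) * pi), so Var(T) = n^2 (1 - g)^2 v.
    var_true = n_items ** 2 * v * (1.0 - g) ** 2
    # Error variance = n * E[p (1 - p)] with p = g + (1 - g) * pi.
    e_pq = (1.0 - g) * (g - g * m + (1.0 - g) * m - (1.0 - g) * (v + m * m))
    var_err = n_items * e_pq
    return var_true / (var_true + var_err)

# Reliability of a hypothetical 40-item test for 2 to 5 options
for k in range(2, 6):
    print(k, round(krg_reliability(40, k, m=0.5, v=0.05), 3))
```

Under these assumptions, adding options raises reliability because guessing contributes error variance while shrinking the spread of true scores.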

9.

A comment on ‘some standard errors in item response theory’

Dato N. M. de Gruijter. Psychometrika, 1984, 49(2): 269-272

In maximum likelihood estimation, the standard error of the location parameter of the three-parameter logistic model can be large because of inaccurate estimation of the lower asymptote. Thissen and Wainer, who demonstrated this effect, suggested that introducing a prior distribution for the lower asymptote might alleviate the problem. Here it is demonstrated in some detail that the standard error of the location parameter can be made acceptably small in this way. The author thanks Pieter Vijn for his helpful comments.

10.

A Bayesian predictive analysis of test scores

Hidetoki Ishii, Hiroshi Watanabe. Japanese Psychological Research, 2001, 43(1): 25-36

In classical test theory, a high-reliability test always leads to precise measurement. When it comes to predicting test scores, however, this is not necessarily so. Using a Bayesian statistical approach, we predicted the distributions of test scores for a new subject, a new test, and a new subject taking a new test. Under some reasonable conditions, the means, variances, and covariances of the predicted scores were obtained and investigated. We found that high test reliability did not necessarily lead to small variances or covariances. For a new subject, higher test reliability led to larger predicted variances and covariances, because high test reliability enabled a more accurate prediction of test score variances. For a new subject taking a new test, higher test reliability led in this study to a larger variance when the sample size was smaller than half the number of tests. Classical test theory is reanalyzed from the viewpoint of prediction, and some suggestions are made.

11.

The attack of the psychometricians

Denny Borsboom. Psychometrika, 2006, 71(3): 425-440


12.

A Comparative Study of 15 Test Equating Methods


Xie Xiaoqing. Acta Psychologica Sinica, 2000, 32(2): 217-222

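This study used an experimental approach to compare 4 equating methods based on classical test theory and 11 based on item response theory. The data came from operational administrations of the HSK (Chinese Proficiency Test), and relatively reliable evaluation criteria were adopted. The results lead to three conclusions. First, in some situations, equating is not the best choice. Second, some IRT methods are feasible for item bank construction. Third, at least for the HSK data, IRT parameter-transformation equating methods produced large errors and are not advisable. This held regardless of whether the one-, two-, or three-parameter model was used and whether the ms or mm method was used.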
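The ms and mm parameter-transformation methods mentioned above are assumed here to be the mean/sigma and mean/mean methods. Both place Form X item parameters on the Form Y scale through θ_Y = Aθ_X + B, so that b_Y = Ab_X + B and a_Y = a_X / A. The Python sketch below computes A and B from common-item parameter estimates. It is illustrative only and does not reproduce the study's HSK analyses. Function names are hypothetical.

```python
import statistics as st

def mean_sigma(b_x, b_y):
    """Mean/sigma: slope A from the SDs of common-item difficulties."""
    A = st.pstdev(b_y) / st.pstdev(b_x)
    B = st.mean(b_y) - A * st.mean(b_x)
    return A, B

def mean_mean(a_x, a_y, b_x, b_y):
    """Mean/mean: slope A from the means of common-item discriminations."""
    A = st.mean(a_x) / st.mean(a_y)
    B = st.mean(b_y) - A * st.mean(b_x)
    return A, B

def to_y_scale(a, b, A, B):
    """Rescale Form X item parameters: a* = a / A, b* = A * b + B."""
    return [ai / A for ai in a], [A * bi + B for bi in b]
```

With error-free parameters that satisfy b_Y = 2b_X + 1 and a_Y = a_X / 2, both methods recover A = 2 and B = 1. With real estimates, the two methods generally disagree, and estimation error in the item parameters feeds directly into the equating error.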

13.

Stochastic order in dichotomous item response models for fixed, adaptive, and multidimensional tests

Wim J. van der Linden. Psychometrika, 1998, 63(3): 211-226


14.

The asymptotic bias of minimum trace factor analysis, with applications to the greatest lower bound to reliability

Alexander Shapiro, Jos M. F. ten Berge. Psychometrika, 2000, 65(3): 413-425

In theory, the greatest lower bound (g.l.b.) to reliability is the best possible lower bound to reliability based on a single test administration. Yet the practical use of the g.l.b. has been severely hindered by sampling bias problems. It is well known that the g.l.b. based on small samples (even a sample of one thousand subjects is not generally enough) may severely overestimate the population value, and a statistical treatment of the bias has been badly lacking. The only results obtained so far concern the asymptotic variance of the g.l.b. and of its numerator (the maximum possible error variance of a test), based on first-order derivatives and the assumption of multivariate normality. The present paper extends these results by offering explicit expressions for the second-order derivatives. This yields a closed-form expression for the asymptotic bias of both the g.l.b. and its numerator. The expression holds under two assumptions: the rank of the reduced covariance matrix is at or above the Ledermann bound, and the nonnegativity constraints on the diagonal elements of the matrix of unique variances are inactive. It is also shown that when the reduced rank is at its highest possible value (i.e., the number of variables minus one), the numerator of the g.l.b. is asymptotically unbiased and the asymptotic bias of the g.l.b. is negative. The latter results are contrary to common belief, but apply only to cases where the number of variables is small. The asymptotic results are illustrated by numerical examples. This research was supported by grant DMI-9713878 from the National Science Foundation.

15.

Marginal maximum likelihood estimation of item response theory (IRT) equating coefficients for the common-examinee design

Haruhiko Ogasawara. Japanese Psychological Research, 2001, 43(2): 72-82

A method of estimating item response theory (IRT) equating coefficients by the common-examinee design with the assumption of the two-parameter logistic model is provided. The method uses the marginal maximum likelihood estimation, in which individual ability parameters in a common-examinee group are numerically integrated out. The abilities of the common examinees are assumed to follow a normal distribution but with an unknown mean and standard deviation on one of the two tests to be equated. The distribution parameters are jointly estimated with the equating coefficients. Further, the asymptotic standard errors of the estimates of the equating coefficients and the parameters for the ability distribution are given. Numerical examples are provided to show the accuracy of the method.

16.

Asymptotic clarifications, generalizations, and concerns regarding an extended class of matched pairs tests based on powers of ranks

Paul W. Mielke Jr., Kenneth J. Berry. Psychometrika, 1983, 48(3): 483-485

Recent studies pertaining to an extended class of matched pairs tests based on powers of ranks are discussed. Previous questions regarding the asymptotic properties for this class of tests are clarified and a generalization of this class is described. This generalization raises a previously unanticipated concern about whether or not the analytic comparisons resulting from these tests correspond with an intuitive notion of what is being compared.

17.

A note on reliability estimation for a test with components of unknown functional lengths

Michelle Liou. Psychometrika, 1989, 54(1): 153-163

This research note proposes two reliability coefficients for tests with components of unknown functional lengths. The derived coefficients are extensions of the techniques devised by Kristof and Feldt and do not require a reduction of test components into parts. A simulation study indicates that the new coefficients yield reasonably stable reliability estimates when the number of test components is small.

18.

The New Generation of Test Theory: The Origins and Characteristics of Cognitive Diagnosis Theory

Liu Shengtao, Dai Haiqi, Zhou Jun. Psychological Exploration, 2006, 26(4): 73-77

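Cognitive diagnosis theory is regarded as the core of the new generation of test theory and is a product of combining cognitive psychology with modern measurement. Research on cognitive diagnosis has become an important focus of psychological research abroad and has attracted wide attention from scholars in China. This article briefly reviews the theory and techniques of cognitive diagnosis from seven aspects: its origins, concept, and characteristics, together with the foundations, framework, significance, and difficulties of research in the area. The aim is to advance cognitive diagnosis research in Chinese psychology.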
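One widely used cognitive diagnosis model is DINA (deterministic inputs, noisy "and" gate). The Python sketch below is an illustration and is not drawn from the article. It computes DINA response probabilities from a Q-matrix row. It then classifies a response pattern by maximum likelihood over all 2^K attribute profiles, which is equivalent to using a flat prior. Slip and guess values and function names are hypothetical.

```python
from itertools import product

def dina_prob(alpha, q_row, slip, guess):
    """P(correct | alpha) under DINA: 1 - slip if every attribute the item
    requires (q = 1) is mastered (alpha = 1), otherwise guess."""
    eta = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if eta else guess

def pattern_likelihood(x, alpha, Q, slips, guesses):
    """Likelihood of a 0/1 response vector x given attribute profile alpha."""
    lik = 1.0
    for xj, q, s, g in zip(x, Q, slips, guesses):
        p = dina_prob(alpha, q, s, g)
        lik *= p if xj == 1 else 1.0 - p
    return lik

def classify(x, Q, slips, guesses):
    """Maximum-likelihood attribute profile over all 2^K candidates."""
    K = len(Q[0])
    return max(product((0, 1), repeat=K),
               key=lambda a: pattern_likelihood(x, a, Q, slips, guesses))
```

Exhaustive search over profiles is only practical for a small number of attributes. In operational use, slip and guess parameters are usually estimated (for example, by EM) rather than fixed.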

19.

Some relationships between the information function of IRT and the signal/noise ratio and reliability coefficient of classical test theory

W. Alan Nicewander. Psychometrika, 1993, 58(1): 139-141

It is shown that IRT's information function for an item is functionally related to local versions of classical test theory's signal/noise ratio and reliability coefficient.

20.

Best linear prediction of composite universe scores

David Jarjoura. Psychometrika, 1983, 48(4): 525-539

The problem of predicting universe scores for samples of examinees based on their responses to samples of items is treated. A general measurement procedure is described in which multiple test forms are developed from a table of specifications and each form is administered to a different sample of examinees. The measurement model categorizes items according to the cells of such a table, and the linear function derived for minimizing error variance in prediction uses responses to these categories. In addition, some distinctions are drawn between aspects of the approach taken here and the familiar regressed score estimates. The author thanks Robert L. Brennan, Michael J. Kolen, and Richard Sawyer for helpful comments and corrections, and anonymous reviewers for suggested improvements.
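The familiar regressed score estimates mentioned in the abstract are typified by Kelley's formula. It shrinks an observed score toward the group mean in proportion to unreliability. The short Python sketch below implements only that baseline. Jarjoura's composite predictor, which weights responses by table-of-specifications categories, is not reproduced. Function names are illustrative.

```python
def kelley_regressed_score(x, reliability, group_mean):
    """Kelley's regressed true-score estimate: rho * x + (1 - rho) * mean."""
    return reliability * x + (1.0 - reliability) * group_mean

def regressed_scores(scores, reliability):
    """Apply Kelley's formula to every score, using the sample mean."""
    m = sum(scores) / len(scores)
    return [kelley_regressed_score(x, reliability, m) for x in scores]
```

With reliability 1 the observed score is returned unchanged, and with reliability 0 every examinee is assigned the group mean. Composite predictors of the kind described above generalize this by weighting separate item categories instead of a single total score.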