1.

Analytic smoothing for equipercentile equating under the common item nonequivalent populations design (total citations: 1; self-citations: 0; citations by others: 1)

Michael J. Kolen, David Jarjoura 《Psychometrika》1987,52(1):43-59

A cubic spline method for smoothing equipercentile equating relationships under the common item nonequivalent populations design is described. Statistical techniques based on bootstrap estimation are presented that are designed to aid in choosing an equating method/degree of smoothing. These include: (a) asymptotic significance tests that compare no equating and linear equating to equipercentile equating; (b) a scheme for estimating total equating error and for dividing total estimated error into systematic and random components. The smoothing technique and statistical procedures are explored and illustrated using data from forms of a professional certification test.

2.

The Missing Data Assumptions of the NEAT Design and their Implications for Test Equating

Sandip Sinharay, Paul W. Holland 《Psychometrika》2010,75(2):309-327

The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed score equating methods used with a NEAT design are the frequency estimation equipercentile equating (FEEE), the chain equipercentile equating (CEE), and the item-response-theory observed-score-equating (IRT OSE). These three methods each make different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately, and that the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. Then it describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications for equating are also discussed.

3.

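A Comparative Study of Four CTT-Based Equating Methods under the Non-Equivalent Groups with Anchor Test Design

Jiao Liya, Xin Tao 《Psychological Development and Education》2006,22(1):97-102

Using the non-equivalent groups with anchor test data collection design, this study compared four equating methods based on classical test theory. The data were drawn from the TIMSS 1999 database. The standard error of equating and cross-validation served jointly as the criteria for comparing the methods, and the experimental data were analyzed with the CIPE program. The results show that, for the equating situation set up in this study, linear equating outperformed equipercentile equating: the Tucker linear method was somewhat better than the Levine observed-score linear method, the Braun-Holland linear method is not recommended, and the frequency estimation equipercentile method produced relatively large equating error and is likewise not advisable.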

4.

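Exploring a Test Score Equating Method When No Anchor Items Are Available: The Constructed Anchor Test Method

Liu Yue, Liu Hongyun 《Journal of Psychological Science》2015,(6):1504-1512

This research explores how test scores can be equated when no anchor items are available, using the constructed anchor test method. Studies 1 and 2 examine, respectively, how errors in item difficulty ranking and differences in anchor item difficulty affect the constructed anchor test method. The results show that: (1) under equivalent-groups conditions, the equating error of the constructed anchor test method grows as the proportion of erroneous anchor items, the degree of difficulty-ranking error, and the difference in anchor item difficulty increase, whereas the equating error of the random equivalent groups method stays relatively stable; under non-equivalent-groups conditions, the equating error of the constructed anchor test method is consistently smaller than that of the random equivalent groups method; (2) for the constructed anchor test method under non-equivalent-groups conditions, the shorter the anchor test, the larger the equating error.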

5.

Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation

Kseniia Marcq, Björn Andersson 《Applied Psychological Measurement》2022,46(3):200

In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.
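
The continuization step named in this abstract can be illustrated with a minimal Python sketch. It assumes the Gaussian-kernel form commonly used in kernel equating, in which the continuized distribution keeps the mean and variance of the discrete one, and it treats the bandwidth as given rather than estimated. The function names, the bisection-based inverse, and the toy score probabilities and bandwidths are illustrative assumptions and are not taken from the paper.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF of a discrete score distribution.

    scores: possible score values; probs: their probabilities; h: bandwidth.
    """
    mu = sum(p * s for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    # Shrinkage factor that keeps the variance of the continuized
    # distribution equal to the variance of the discrete one.
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )

def kernel_equate(x, scores_x, probs_x, h_x, scores_y, probs_y, h_y, iters=200):
    """Equate x from form X to form Y as G_h^{-1}(F_h(x)).

    The inverse of the continuized Y CDF is found by bisection
    (an implementation choice made for this sketch).
    """
    target = kernel_cdf(x, scores_x, probs_x, h_x)
    span = (max(scores_y) - min(scores_y)) + 10.0 * h_y + 1.0
    lo, hi = min(scores_y) - span, max(scores_y) + span
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, scores_y, probs_y, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy example (hypothetical score probabilities and bandwidths).
scores = [0, 1, 2, 3, 4]
probs_x = [0.1, 0.2, 0.4, 0.2, 0.1]
probs_y = [0.2, 0.3, 0.3, 0.1, 0.1]
print(kernel_equate(2.0, scores, probs_x, 0.6, scores, probs_y, 0.6))
```

In this sketch the bandwidths `h_x` and `h_y` are fixed inputs; the variability that arises when they are estimated from data, which is the subject of the abstract, is not modelled.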

6.

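An Applied Study of Test Equating in General-Education "Shengzhong" (Secondary School Entrance) Examinations: The Case of Foshan City, Guangdong Province

Zhang Minqiang, Li Guangming, Jiao Can 《Studies of Psychology and Behavior》2009,7(1):27-31

Taking the "Shengzhong" examination of Foshan City, Guangdong Province as an example, this study analyzes and discusses how to choose an appropriate equating design and method to solve the problem of converting scores across districts in general-education "Shengzhong" examinations. Three classical test equating methods were compared under the non-random groups anchor test equating design. The results show that the Tucker linear equating method performed best, followed by the Levine linear equating method, while the equipercentile equating method (frequency estimation) is not suitable for this kind of equating. An analysis of variance on the equating results indicates an interaction between item type and equating method, which suggests that different item types call for different equating methods.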

7.

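Kernel Equating: A System of Observed-Score Equating

Wang Shaojie, Zhang Minqiang, Li Tuoyu, Liang Zhengyan 《Advances in Psychological Science》2020,28(5):855-870

The kernel equating procedure comprises presmoothing, estimating score probabilities, continuization, equating, and evaluating the equating results. The method combines the advantages of linear equating and equipercentile equating, and each of its steps is highly extensible and accommodating. Its smoothing and continuization steps can reduce random equating error, and concepts specific to the method, such as the standard error of equating difference, provide reliable tools for evaluating results. Factors such as the continuization method and the bandwidth selection method can affect its performance, and new methods built on kernel equating offer fresh perspectives for the development of equating. Future work could address expanding and refining the kernel equating system, updating its procedure, and combining and comparing equating methods.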

8.

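Equating Tests That Contain Testlets

Wu Rui, Ding Shuliang, Gan Dengwen 《Acta Psychologica Sinica》2010,42(3):434-442

Testlets appear increasingly often in all kinds of examinations. Equating a test that contains testlets with a standard IRT model may distort the equating results because the local dependence within testlets is ignored. To address this problem, we used the testlet-based 2PTM model together with the IRT characteristic curve equating method, took the size of the error in the estimated equating coefficients as the criterion and the Wilcoxon signed-rank test as the basis for comparison, and ran a large number of Monte Carlo simulations under several different conditions. The results show that the testlet model 2PTM, which accounts for local dependence, yielded smaller equating error than 2PLM in the great majority of cases, and the difference was statistically significant. In addition, six different equating criteria were applied to 2PTM equating, and their relative merits under different conditions were evaluated.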

9.

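The Effect of the Proportion of Anchor Items on Equating Accuracy

Cai Yan, Ding Shuliang, Tu Dongbo 《Psychological Exploration》2009,29(2):86-89

In the non-equivalent groups anchor test design, what proportion of the test length should the anchor items make up, and can this proportion change as the test gets longer? These questions are of great concern to practitioners and researchers. With the number of examinees and the test length held fixed, this paper examines how changes in the proportion of the test made up of anchor items (the anchor item proportion, for short) affect equating accuracy, and discusses how to balance equating accuracy against anchor item proportion in practical equating. Under the conditions of the simulation study, it also offers several indices that reflect actual equating accuracy.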

10.

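Population Invariance of Equating for Parallel Tests under Different Definitions (total citations: 1; self-citations: 0; citations by others: 1)

Liu Tiechuan, Dai Haiqi, Zhao Yu 《Psychological Exploration》2012,(1):67-71

Population invariance is an important assumption of equating: the equating function should be the same across different subpopulations of examinees. This study presents a theoretical analysis and a simulation study of the population invariance of linear equating under different definitions of parallel tests; in the simulation study, the REMSD index was computed with six different weighting schemes. The results show that, for strictly parallel tests, the REMSD index was larger when reliability was lower; that subpopulation mean differences and reliability differences had a clear interaction effect on REMSD; and that REMSD was largest when the expectation weights were equal and smallest when the score weights were proportional to subpopulation size. Finally, the results are discussed, and recommendations are given on the use of REMSD weights and on further research.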
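
As a rough illustration of the REMSD index discussed in this abstract, the following Python sketch computes one common form of the index for linear equating functions: the weighted squared difference between each subpopulation's equating function and the total-population function, averaged over the form X score distribution and divided by the standard deviation of form Y scores. The function names, the toy numbers, and the single set of subpopulation weights are assumptions made for illustration; the six weighting schemes compared in the paper are not reproduced here.

```python
import numpy as np

def linear_equating(mean_x, sd_x, mean_y, sd_y):
    """Return a linear equating function from form X to form Y."""
    def equate(x):
        return mean_y + (sd_y / sd_x) * (np.asarray(x, dtype=float) - mean_x)
    return equate

def remsd(x_scores, x_probs, subgroup_fns, weights, total_fn, sd_y_total):
    """Root expected mean square difference, standardized by sd_y_total.

    x_scores, x_probs: form X score points and their probabilities in the
    total population (used for the expectation).
    subgroup_fns, weights: one equating function and one weight per subgroup.
    """
    x = np.asarray(x_scores, dtype=float)
    r = np.asarray(x_probs, dtype=float)
    total = total_fn(x)
    # Weighted squared difference from the total-population function
    # at every score point.
    sq_diff = sum(w * (fn(x) - total) ** 2 for w, fn in zip(weights, subgroup_fns))
    return float(np.sqrt(np.sum(r * sq_diff)) / sd_y_total)

# Toy example (hypothetical moments): two subgroups whose equating
# functions sit half a point above and below the total-population function.
x_scores = np.arange(0, 5)
x_probs = [0.1, 0.2, 0.4, 0.2, 0.1]
total_fn = linear_equating(2.0, 1.0, 2.0, 1.0)
subgroup_fns = [
    linear_equating(2.0, 1.0, 2.5, 1.0),
    linear_equating(2.0, 1.0, 1.5, 1.0),
]
value = remsd(x_scores, x_probs, subgroup_fns, [0.5, 0.5], total_fn, 1.0)
```

A value of zero would indicate that every subgroup function coincides with the total-population function at all score points, that is, perfect population invariance under this index.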

11.

Asymptotic standard errors of IRT observed-score equating methods

Haruhiko Ogasawara 《Psychometrika》2003,68(2):193-211

A method of the IRT observed-score equating using chain equating through a third test without equating coefficients is presented with the assumption of the three-parameter logistic model. The asymptotic standard errors of the equated scores by this method are obtained using the results given by M. Liou and P.E. Cheng. The asymptotic standard errors of the IRT observed-score equating method using a synthetic examinee group with equating coefficients, which is a currently used method, are also provided. Numerical examples show that the standard errors by these observed-score equating methods are similar to those by the corresponding true score equating methods except in the range of low scores. The author is indebted to Michael J. Kolen for access to the real data used in this article and anonymous reviewers for their corrections and suggestions on this work.

12.

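A Study of Anchor Item Representativeness in Test Linking

Ye Meng, Xin Tao 《Journal of Psychological Science》2015,(1):209-215

Taking "anchor item representativeness" as its point of entry, this paper explores what relationship should hold, under the non-equivalent groups anchor test design, between the anchor items, which are the key vehicle for test linking, and the associated test forms and levels. The paper first points out that the concept of anchor item representativeness has different meanings in equating and in vertical scaling, and gives its meaning in vertical scaling. By examining existing research on anchor item representativeness in test linking and systematically summarizing its findings, the paper outlines possible ways to improve current practice in anchor item construction and analyzes future directions for research on anchor item representativeness.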

13.

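The Effects of Item Difficulty Distribution and Sample Size on the Results of Two CTT Equating Methods

Dai Buyun, Luo Zhaosheng 《Psychological Exploration》2012,32(3):246-251

The equating methods based on classical test theory (CTT) are mainly of two kinds: linear equating and equipercentile equating. Different equating methods produce different results in different situations. Using true score equating as the criterion and Monte Carlo simulation, this study comprehensively compared the results of the two CTT equating methods across a range of item difficulty distributions and sample sizes. The results show that: (1) the error of linear equating is strongly affected by the item difficulty distribution, whereas the error of equipercentile equating is almost unaffected by it; (2) the error of linear equating is almost unaffected by sample size, whereas the error of equipercentile equating is strongly affected by it; (3) whatever the item difficulty distribution, equipercentile equating performs better than linear equating provided the sample is large enough.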
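
The two CTT methods compared in this abstract can be sketched in Python as follows. This is a minimal illustration under an equivalent-groups assumption, with integer scores from 0 to a known maximum and the usual uniform continuization of each score over plus or minus half a point; it is not the simulation code of the study. The function names are hypothetical, and the equipercentile sketch assumes every score point on form Y occurs at least once so that the inverse is unique.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match the means and standard deviations of X and Y."""
    mx, sx = np.mean(scores_x), np.std(scores_x)
    my, sy = np.mean(scores_y), np.std(scores_y)
    return my + (sy / sx) * (x - mx)

def _continuized_cdf(scores, max_score):
    """Knots and values of the piecewise-linear CDF obtained by spreading
    each integer score uniformly over [score - 0.5, score + 0.5]."""
    counts = np.bincount(np.asarray(scores, dtype=int), minlength=max_score + 1)
    knots = np.arange(max_score + 2) - 0.5
    values = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    return knots, values

def equipercentile_equate(x, scores_x, max_x, scores_y, max_y):
    """Equipercentile equating: find the Y score with the same percentile
    rank that x has on form X."""
    kx, vx = _continuized_cdf(scores_x, max_x)
    ky, vy = _continuized_cdf(scores_y, max_y)
    p = np.interp(x, kx, vx)      # percentile rank of x on form X (0 to 1)
    return np.interp(p, vy, ky)   # inverse of the continuized Y CDF

# Toy data (hypothetical observed scores on a 0-4 scale).
form_x = np.array([0, 1, 1, 2, 2, 2, 3, 3, 4])
form_y = np.array([0, 0, 1, 1, 1, 2, 2, 3, 4])
print(linear_equate(2, form_x, form_y))
print(equipercentile_equate(2, form_x, 4, form_y, 4))
```

The linear function uses only two moments per form, while the equipercentile function uses the whole empirical distribution of each form, so it has many more quantities to estimate from the sample.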

14.

Tau-equivalence and equipercentile equating

Wendy M. Yen 《Psychometrika》1983,48(3):353-369

Test scores that are not perfectly reliable cannot be strictly equated unless they are strictly parallel [Lord, 1980]. This fact implies that tau-equivalence can be lost if an equipercentile equating is applied to observed scores that are not strictly parallel. Seventy-two simulated testing conditions are produced to simulate equating tests with different difficulties and discriminations. Number-correct and trait metrics are examined. When an equipercentile equating is applied to these data, locally biased (i.e., non-tau-equivalent) results are produced for tests of unequal difficulty. Differences between the criteria of tau-equivalence and equipercentile equivalence are discussed.

15.

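An Experimental Test of the Stability of Test Forms Automatically Generated by Computer from the HSK Item Bank (total citations: 1; self-citations: 0; citations by others: 1)

Xie Xiaoqing, Ren Jie 《Psychological Exploration》1999,(4)

Can test forms generated automatically by computer from an item bank maintain relatively stable difficulty? How large is the error of IRT-based equating? To answer these questions, this paper took common-group equating as the criterion and experimentally tested the error of IRT-based common-item equating. Measures were taken in the experiment to ensure examinees' motivation. The results show that the direction of the IRT equating adjustment was correct in every case. The equating adjustment was fairly satisfactory for three of the four subtests and less satisfactory for one. Scores on the computer-generated forms were quite consistent with scores on the original manually constructed forms, with a score correlation of 0.931, and certificate outcomes were also quite consistent.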

16.

Exploring a Source of Uneven Score Equity across the Test Score Range

Anne Corinne Huggins-Manley, Yuxi Qiu, Randall D. Penfield 《International Journal of Testing》2018,18(1):50-70

Score equity assessment (SEA) refers to an examination of population invariance of equating across two or more subpopulations of test examinees. Previous SEA studies have shown that score equity may be present for examinees scoring at particular test score ranges but absent for examinees scoring at other score ranges. No studies to date have performed research for the purpose of understanding why score equity can be inconsistent across the score range of some tests. The purpose of this study is to explore a source of uneven subpopulation score equity across the score range of a test. It is hypothesized that the difficulty of anchor items displaying differential item functioning (DIF) is directly related to the score location at which issues of score inequity are observed. The simulation study supports the hypothesis that the difficulty of DIF items has a systematic impact on the uneven nature of conditional score equity.

17.

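Factors Influencing Item Response Theory Observed-Score Kernel Equating

Wang Shaojie, Zhang Minqiang, Huang Feifei, Huang Lifang, Yuan Qiting 《Journal of Psychological Science》2022,45(4):988-997

This study investigates how bandwidth selection method, sample size, number of items, equating design, and data simulation method affect item response theory observed-score kernel equating. Research data were obtained through two data simulation methods, and local and global evaluation indices were computed. In the random groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had very little effect. In the non-equivalent groups design, the penalty method and Silverman's rule of thumb performed well; increasing the number of items reduced percent relative error and random error; increasing the sample size increased percent relative error and reduced random error. The data simulation method can affect the evaluation of equating. Future work should focus on evaluating the equating system.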

18.

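The Probability Distribution Equating Method and Its Application

Ding Shuliang, Wu Rui, Zhang Jielan, Xiong Jianhua 《Acta Psychologica Sinica》2008,40(1):101-108

Within the framework of item response theory and drawing on the existing literature, this paper proposes an approach to developing new test equating criteria: many criteria can be viewed as derived by transforming the probability distribution of responses to the anchor items. On this basis the paper reveals the connection between two well-known equating criteria, the Haebara method and the Stocking-Lord method, and derives a new one, the cosine equating criterion. A series of Monte Carlo simulation studies was carried out to examine how the cosine criterion behaves. The simulation results show that, under the polytomous model GPCM, the cosine criterion performed better than both the Haebara and Stocking-Lord methods, whereas under GRM and 2PLM it performed worse than Haebara but comparably to Stocking-Lord. This finding is a reminder that whether an equating criterion is appropriate depends not only on the range in which the equating coefficients fall but, even more closely, on the item response function (IRF).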

19.

Observed-score equating as a test assembly problem

Wim J. van der Linden, Richard M. Luecht 《Psychometrika》1998,63(4):401-418

A set of linear conditions on item response functions is derived that guarantees identical observed-score distributions on two test forms. The conditions can be added as constraints to a linear programming model for test assembly that assembles a new test form to have an observed-score distribution optimally equated to the distribution on an old form. For a well-designed item pool and items fitting the IRT model, use of the model results in observed-score pre-equating and prevents the necessity of post hoc equating by a conventional observed-score equating method. An empirical example illustrates the use of the model for an item pool from the Law School Admission Test. The authors are most indebted to Norman D. Verhelst for suggesting Proposition 4 and its proof, to the Law School Admission Council (LSAC) for making available the data set, and to Wim M. M. Tielen for his computational assistance.

20.

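Application of the Item Characteristic Curve Equating Method under the Graded Response Model in a Large-Scale Examination (total citations: 2; self-citations: 1; citations by others: 1)

Zhou Jun, Ou Dongming, Xu Shuyuan, Dai Haiqi, Qi Shuqing 《Acta Psychologica Sinica》2005,37(6):832-838

In the Economics Professional Qualification Examination, one of the largest qualification examinations in China, the item characteristic curve equating method under the graded response model of item response theory was applied with an anchor test equating design in order to ensure comparability across years, build an item bank, and prepare for computerized adaptive testing. Item and ability parameters from four years of examination data were equated, and an item bank for the economics profession was successfully assembled. On this basis, equating techniques were used to compare the cut scores of test forms from different years, providing empirical evidence for setting the examination's passing standard and ensuring its fairness.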