首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
采用锚测验非等组设计的数据收集方案,对4种基于经典测量理论的等值方法进行了比较研究。研究数据取自TIMSS1999数据库,兼用等值标准误和交叉验证方法作为各等值方法比较的检验标准,利用CIPE程序对实验数据进行分析。研究结果表明,针对本研究所设置的等值情境,线性等值优于等百分位等值,其中Tucker线性方法比Levine观察分数线性方法更好一些,Braun-Holland线性方法不宜采用,频数估计等百分位方法等值误差较大,亦不足取。  相似文献   

2.
Hongwen Guo 《Psychometrika》2010,75(3):438-453
After many equatings have been conducted in a testing program, equating errors can accumulate to a degree that is not negligible compared to the standard error of measurement. In this paper, the author investigates the asymptotic accumulative standard error of equating (ASEE) for linear equating methods, including chained linear, Tucker, and Levine, under the nonequivalent groups with anchor test (NEAT) design. A recursive formula for the ASEE is provided for a series of equatings that makes use of only historical summary statistics. This formula can serve as a new tool to measure the magnitude of equating errors that have accumulated over a series of equatings, and to help monitor and design testing programs.  相似文献   

3.
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are compared through the analysis of data sets famous in the equating literature. Also, the classical percentile-rank, linear, and mean equating models are each proven to be a special case of a Bayesian model under a highly-informative choice of prior distribution.  相似文献   

4.
罗莲 《心理学探新》2008,28(2):69-74
该文介绍了一种新的等值方法一核等值法。首先介绍了核等值法的研究过程、它的主要特点以及五个步骤(前平滑处理、估计分数概率、连续化、等值、计算等值标准误)。之后,介绍了核等值法与其他传统的观察分等值方法的差异,最后是对核等值法的评价。  相似文献   

5.
基于经典测验理论(CTT)的等值方法主要有线性等值和等百分位等值两种。在不同情境下,不同的等值方法会产生不同的等值结果。本研究以真分数等值为依据,用蒙特卡洛模拟研究方法,综合比较了各种题目难度分布条件下和各种样本容量条件下两种CTT等值方法的等值结果。研究结果表明:(1)线性等值的误差受题目难度分布影响较大,等百分位等值的误差几乎不受题目难度分布影响。(2)线性等值的误差几乎不受样本容量的影响,等百分位等值的误差受样本容量影响较大。(3)不论题目难度分布如何,只要样本容量足够大,等百分位等值的效果都比线性等值更好。  相似文献   

6.
不同定义平行测验等值的群体不变性   总被引:1,自引:0,他引:1  
群体不变性是等值的一个重要假设,即对不同的考生子群体等值函数一致。本研究对不同平行测验定义下线性等值的群体不变性进行了理论分析和模拟研究,模拟研究REMSD指标通过六种不同加权方式计算。结果显示,严格平行测验在信度较低时REMSD指标更大;子群体均值差异和信度差异对REMSD的影响存在明显的交互作用;REMSD指标在期望权重等权下的最大,在分数权重采用子群体比例加权最小。最后对结果进行了讨论,对REMSD权重使用及进一步研究给出了建议。  相似文献   

7.
探究带宽选择方法、样本量、题目数量、等值设计、数据模拟方式对项目反应理论观察分数核等值的影响。通过两种数据模拟方式,获得研究数据,并计算局部与全域评价指标。研究发现,在随机组设计中,带宽选择方法表现相似;考生样本量和题目数量影响甚微。在非等组设计中,惩罚法与Silverman经验准则表现优异;增加题目量可降低百分相对误差和随机误差;增加样本量导致百分相对误差变大,随机误差减小。数据模拟方式可影响等值评价。未来应重点关注等值系统评估。  相似文献   

8.
Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common objective of many writing prompts) can be used in place of a more traditional anchor. The linear-logistic equating method used in this article is a variant of the Tucker linear equating method appropriate for the limited score range typical of essays. The procedure is applied to historical data. Although the procedure only results in small improvements over identity equating (not equating prompts), it does produce a viable alternative, and a mechanism for checking that the identity equating is appropriate. This may be particularly useful for measuring rater drift or equating mixed format tests.  相似文献   

9.
In this paper, an overview of the observed-score equating (OSE) process is provided from the perspective of a unifying equating framework (von Davier in von Davier (Ed.), Statistical models for test equating, scaling, and linking, Springer, New York, pp. 1–17, 2011b). The framework includes all OSE approaches. Issues related to the test, common items, and sampling designs and their relationship to measurement and equating are discussed. Challenges to the equating process, model assumptions, and approaches to equating evaluation are also presented. The equating process is illustrated step-by-step with a real data example from a licensure test.  相似文献   

10.
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.  相似文献   

11.
吴锐  丁树良  甘登文 《心理学报》2010,42(3):434-442
题组越来越多地出现在各类考试中, 采用标准的IRT模型对有题组的测验等值, 可能因忽略题组的局部相依性导致等值结果的失真。为解决此问题, 我们采用基于题组的2PTM模型及IRT特征曲线法等值, 以等值系数估计值的误差大小作为衡量标准, 以Wilcoxon符号秩检验为依据, 在几种不同情况下进行了大量的Monte Carlo模拟实验。实验结果表明, 考虑了局部相依性的题组模型2PTM绝大部分情况下都比2PLM等值的误差小且有显著性差异。另外, 用6种不同等值准则对2PTM等值并评价了不同条件下等值准则之间的优劣。  相似文献   

12.
以广东省佛山市"升中"考试为例.分析和探讨如何选用合适的等值设计与方法来解决普教"升中"考试不同地区分数转换的问题.采用非随机组锚测验等值设计对三种经典测验等值方法进行比较.结果发现:Tuck-er线性等值方法最优,kvine线性等值方法次之,等百分位等值方法(频数估计)不适合此类等值.等值方差分析表明题型与等值方法具有交互作用,这说明不同的题型宜选用不同的等值方法来进行等值.  相似文献   

13.
在项目反应理论框架下,根据已有文献提出了开发新的测验等值准则的方法,即许多准则都可以看成是通过对锚题上作答反应概率分布进行变换而导出。据此揭示了两个著名的等值准则——Haebara方法和Stocking-Lord方法之间的联系,并且导出了一个新的等值准则——余弦等值准则。为了讨论余弦准则的行为表现,开展了一系列Monte-Carlo模拟研究。模拟结果表明,余弦准则在多级评分模型GPCM上表现比Haebara方法和Stocking--Lord方法都好,而对GRM和2PLM,其表现不如Haebara,但可以和Stocking-Lord方法相提并论。这一发现提醒我们等值准则的选用是否恰当,不仅与等值系数所落的范围有关,而且还与项目反应函数(IRF)有更密切的关系  相似文献   

14.
等值作为保证测验公平性的技术手段,一直是测验理论研究的重要方面。MIRT理论的发展证明了题目和测验是复杂的,传统的单维模型已经不能满足对人和题目/测验之间关系的探讨需求。目前MIRT等值研究主要有两种取向,其中一种取向是研究多维数据对IRT等值会产生什么样的影响;第二种取向是通过开发新的计算方法和计算工具研究MIRT等值过程。MIRT等值研究最重要的是对等值方法和过程实现的研究,目前已取得一些进展,在进行这些研究的过程中最重要的考虑因素是控制其误差影响因素。  相似文献   

15.
现在,等值越来越受到各考试测验机构及测量学研究人员的重视,特别是项目反应理论等值的优越性更使他们有了信心。然而,很多人却没有注意到被试能力分布形态可能给等值结果带来的影响效果及程度。本研究以项目反应理论两级记分模型的项目参数等值在不同被试能力分布形态下的结果差异作为重点,探讨被试抽样偏差可能给项目特征曲线等值带来的误差问题。研究结果表明,被试能力分布形态会显著地影响项目参数等值的系数,特别地,能力分布的偏态系数与等值方程的截距存在显著的线性相关关系,但能力分布形态的变化对等值方程中斜率的影响并不明显  相似文献   

16.
刘玥  刘红云 《心理科学》2015,(6):1504-1512
研究旨在探索无铆题情况下,使用构造铆测验法,实现测验分数等值。研究一和研究二分别探索题目难度排序错误、铆题难度差异对构造铆测验法的影响。结果表明:(1)等组条件下,随着错误铆题比例,难度排序错误程度,铆题难度差异增大,构造铆测验法的等值误差逐渐增大,随机等组法的等值误差较为稳定;不等组条件下,构造铆测验法的等值误差均小于随机等组法;(2)对于构造铆测验法,在不等组条件下,铆测验长度越短,等值误差越大。  相似文献   

17.
高慧健  辛涛  李峰 《心理科学》2011,34(4):957-964
传统锚题-非等组设计下的测验等值,等值要求的满足具有主观性,并且由于锚题失效或难以获得等因素的影响,则该方法的使用受到了限制。因此,本研究基于规则空间模型的Q矩阵理论,生成两个Q矩阵相同但无锚题的测验的共同受测者,使用共同组设计,利用同时性估计的方法对测验进行等值,并考虑了作答失误率和测验结构对等值稳定性的影响。结果表明:共同组设计同时估计方法的等值稳定性取得了优于或等于锚题-非等组同时估计方法;失误率的增大也会导致等值稳定性的下降;并且不同的测验结构也对等值稳定性产生了影响,其中直线型和收敛型结构稳定性较好,发散型和无结构型较差。  相似文献   

18.
Adapting educational tests from 1 language to others requires equating across the different language versions to be able to compare examinees from different language groups. Such equating is usually based on translated items considered to have similar content and psychometric characteristics in both source and target languages. However, because it is not possible to ascertain that these items are really similar in different languages, it is difficult to control and validate the equating outcome. The purpose of this study was to develop a method for evaluating cross-lingual equating and apply it to the Psychometric Entrance Test (PET) used for admission to Israeli universities. This test is written in Hebrew (the source-language, SL) and translated into 5 languages. A cross-lingual equating in a double-linking plan was performed in each of 12 forms translated to 1 target language (TL1), and each of 9 forms translated to another target language (TL2). The average difference between the equating results in the 2 links, indicating the overall instability incorporated in the equating process, was more than 10 times the standard error of equating in the TL1-SL process and about half this size in the TL2-SL process. The significance of the results and the differences found between the 2 TLs are discussed, as well as the potential displayed by the method for use as a general evaluative tool for cross-lingual equating.  相似文献   

19.
The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed score equating methods used with a NEAT design are the frequency estimation equipercentile equating (FEEE), the chain equipercentile equating (CEE), and the item-response-theory observed-score-equating (IRT OSE). These three methods each make different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately, and the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. Then it describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications on equating are also discussed.  相似文献   

20.
一种新的等值准则及其适用范围的探讨   总被引:3,自引:0,他引:3  
受假设检验方法的启发,该文引出了一种基于项目反应理论的新等值方法——平方根等值准则。它具有一些特点:定义式中答对、答错概率同时出现而不能互相替代;极易从0—1评分模式的版本转换到多级评分版本;它可以看成是Haebara等值准则的加权形式。以等值系数估计值的误差大小为衡量标准,以Wilcoxon符号秩检验为依据,大量的Monte Carlo模拟结果显示了一种有趣的现象,即等值方法的运用范围既与项目参数估计精度有关,又与等值系数A的范围有关,但与另一个等值系数B的范围无关。当项目参数估计精度较高或中等而A取值在0.9~1.3之间,新方法往往比Stocking_Lord方法和Haebara方法的估计误差小且有显著性差异,当项目参数估计精度较低时,而A从1.0~2.0时新方法都有优越性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号