
1.

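Wang Shaojie, Zhang Minqiang, Li Tuoyu, Liang Zhengyan. 《心理科学进展》 (Advances in Psychological Science), 2020, 28(5): 855-870

The kernel equating procedure comprises five steps: presmoothing, estimating score probabilities, continuization, equating, and evaluating the equating results. The method combines the advantages of linear equating and equipercentile equating, and each step is highly extensible and accommodating. Presmoothing and continuization reduce random equating error, and concepts specific to the method, such as the standard error of equating difference, provide reliable tools for evaluating results. Its performance is affected by factors such as the continuization method and the bandwidth selection method. New methods built on kernel equating offer fresh perspectives on the development of equating. Future work could address extending and refining the kernel equating framework, updating the procedure, and combining and comparing equating methods.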
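
The continuization and equating steps named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel in the form usually given in the kernel equating literature, skips presmoothing and the evaluation step, and uses an arbitrary bandwidth of 0.6. The function names and the toy score distributions are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrinkage factor a keeps the mean and variance of the
    continuized distribution equal to those of the discrete one.
    """
    mu = sum(s * p for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
               for s, p in zip(scores, probs))

def kernel_equate(x, x_scores, x_probs, y_scores, y_probs, h_x=0.6, h_y=0.6):
    """Equipercentile step: solve G_h(y) = F_h(x) for y by bisection."""
    target = continuized_cdf(x, x_scores, x_probs, h_x)
    lo, hi = min(y_scores) - 10.0, max(y_scores) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, y_scores, y_probs, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy data (assumed): form Y is form X shifted upward by two points.
x_scores = [0, 1, 2, 3, 4]
y_scores = [2, 3, 4, 5, 6]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
equated = kernel_equate(2, x_scores, probs, y_scores, probs)
```

Because the toy form Y is simply form X shifted by two points, the equated value of an X score of 2 should come out very close to 4, which makes a convenient sanity check.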

2.

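A New Equating Method Under the Item Response Theory Framework: Log-Contrast Equating  Total citations: 3 (self-citations: 2; citations by others: 1)

Ding Shuliang, Xiong Jianhua, Mao Mengmeng. 《心理学报》 (Acta Psychologica Sinica), 2003, 35(6): 835-841

Some polytomous item response theory models are expressed in division (ratio) form. When such models are equated with the Haebara method, the Stocking-Lord method, or the symmetric relative entropy method, the equating may fail because all three place high demands on the starting values. For this class of models we propose a new equating method, log-contrast equating. It converges quickly, places low demands on the starting values of the iteration, yields fairly accurate results, and can supply good starting values for other equating methods. The study also shows that log-contrast equating improves and generalizes the logit-transformation equating method for the dichotomously scored two-parameter logistic model.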

3.

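Factors Affecting Item Response Theory Observed-Score Kernel Equating

Wang Shaojie, Zhang Minqiang, Huang Feifei, Huang Lifang, Yuan Qiting. 《心理科学》 (Journal of Psychological Science), 2022, 45(4): 988-997

This study examines how the bandwidth selection method, sample size, number of items, equating design, and data simulation method affect item response theory observed-score kernel equating. Data were generated with two simulation methods, and local and global evaluation indices were computed. In the random groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had little effect. In the nonequivalent groups design, the penalty method and Silverman's rule of thumb performed best; increasing the number of items reduced both percent relative error and random error, whereas increasing the sample size increased percent relative error and reduced random error. The data simulation method can affect the evaluation of equating. Future research should focus on evaluating the equating system.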

4.

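Effects of Anchor Item Type and Equating Estimation Method on Equating

Dai Haiqi, Liu Qihui. 《心理学报》 (Acta Psychologica Sinica), 2002, 34(4): 37-40

The anchor test nonequivalent groups design is a very important equating design. The study shows that, under this design, the item type used in the anchor test, which serves as the equating medium, affects the equating results; the method used to estimate the equating relationship also affects the results; and there is a clear interaction between item type and estimation method. The study concludes that, given current item-writing and scoring technology, an anchor test composed entirely of objective items is best, and that, with the anchor test length held fixed, the frequency estimation method is the best choice for estimating the equating relationship.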

5.

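New Equating Methods Derived from Statistical Tests and a Discussion of Their Performance

Xiong Jianhua, Ding Shuliang, Lei Ningning. 《心理学探新》 (Psychological Exploration), 2007, 27(1): 70-74

Inspired by Berkson's use of test methods to estimate unknown parameters, this paper derives three new methods for obtaining equating coefficients from three goodness-of-fit statistics: the square root criterion (SQRTcrit), the symmetric relative entropy criterion (SREcrit), and the weighted criterion (Wcrit), which is a weighted form of the Haebara criterion. Although the three multinomial goodness-of-fit tests are asymptotically equivalent when the two distributions being tested are very close, Monte Carlo simulations show that the three new equating methods behave differently when used to obtain equating coefficients. The differences are closely related to the size of the random error, that is, to the precision of the item parameter estimates, and also to the range of the equating coefficient A.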

6.

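A Comparative Study of Equating Methods for the High School Graduation Examination (Huikao)

Zhang Guangxu, Yang Zhiming. 《心理学探新》 (Psychological Exploration), 1999, (4)

This study combined a random equivalent groups design with an anchor test design. It first verified that the means, variances, and distribution shapes of the two random equivalent groups did not differ significantly, and then used the equated scores from the random equivalent groups as the criterion for assessing the error of other equating methods. It then compared the errors of three linear equating methods in the anchor test design under different population weights, in order to select the equating method and population weight best suited to the high school graduation examination. The results indicate that Tucker observed-score linear equating with population weight W1 = 1 is appropriate for equating this examination.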
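
As an illustration of what the population weight does in the Tucker method, the sketch below follows the textbook form of Tucker observed-score linear equating for an anchor test design. It is not taken from the paper: the function name, the dictionary keys, and the summary statistics are hypothetical, and the formulas are the ones commonly given for a synthetic population with weights w1 and w2 = 1 - w1.

```python
import math

def tucker_linear(x, g1, g2, w1=1.0):
    """Tucker linear equating of form X score x onto the form Y scale.

    g1: group 1 statistics (took form X and anchor V)
    g2: group 2 statistics (took form Y and anchor V)
    Each dict holds: mean, var (total score), mean_v, var_v (anchor),
    and cov (covariance between total score and anchor).
    """
    w2 = 1.0 - w1
    gamma1 = g1["cov"] / g1["var_v"]   # regression slope of X on V in group 1
    gamma2 = g2["cov"] / g2["var_v"]   # regression slope of Y on V in group 2
    d_mean = g1["mean_v"] - g2["mean_v"]
    d_var = g1["var_v"] - g2["var_v"]

    # Synthetic-population means and variances
    mean_x = g1["mean"] - w2 * gamma1 * d_mean
    mean_y = g2["mean"] + w1 * gamma2 * d_mean
    var_x = g1["var"] - w2 * gamma1 ** 2 * d_var + w1 * w2 * gamma1 ** 2 * d_mean ** 2
    var_y = g2["var"] + w1 * gamma2 ** 2 * d_var + w1 * w2 * gamma2 ** 2 * d_mean ** 2

    return math.sqrt(var_y / var_x) * (x - mean_x) + mean_y

# Hypothetical summary statistics for the two groups
group1 = {"mean": 50.0, "var": 100.0, "mean_v": 10.0, "var_v": 16.0, "cov": 30.0}
group2 = {"mean": 52.0, "var": 121.0, "mean_v": 9.0, "var_v": 25.0, "cov": 40.0}
equated = tucker_linear(50.0, group1, group2, w1=1.0)
```

With w1 = 1 the synthetic population is simply group 1, so the form X mean and variance are used unchanged and only the form Y statistics are adjusted through the anchor.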

7.

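A New Equating Criterion and a Discussion of Its Range of Applicability  Total citations: 3 (self-citations: 0; citations by others: 3)

Ding Shuliang, Xiong Jianhua, Luo Fen, Wu Rui, Gan Xiaofang, Tu Bai. 《心理学报》 (Acta Psychologica Sinica), 2005, 37(5): 674-680

Inspired by hypothesis testing methods, this paper introduces a new equating method based on item response theory, the square root equating criterion. It has several features: the probabilities of a correct and an incorrect response both appear in its definition and cannot be substituted for each other; it converts very easily from the 0-1 scoring version to a polytomous version; and it can be viewed as a weighted form of the Haebara criterion. Taking the error of the estimated equating coefficients as the yardstick and the Wilcoxon signed-rank test as the basis for comparison, extensive Monte Carlo simulations reveal an interesting phenomenon: the range of applicability of an equating method depends on the precision of the item parameter estimates and on the range of the equating coefficient A, but not on the range of the other equating coefficient, B. When item parameter precision is high or moderate and A lies between 0.9 and 1.3, the new method often has significantly smaller estimation error than the Stocking-Lord and Haebara methods; when item parameter precision is low, the new method is superior for A from 1.0 to 2.0.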
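
For readers unfamiliar with the two baseline criteria named here, the sketch below shows simplified forms of the Haebara and Stocking-Lord loss functions for the dichotomous 2PL model, with equating coefficients A and B. It is an illustration under stated assumptions, not the paper's procedure: it uses the one-directional form of the Haebara criterion, a fixed grid of ability points instead of examinee estimates, and a coarse grid search instead of an iterative optimizer. The square root criterion itself is not reproduced because the abstract does not give its formula. All names and item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def to_base_scale(items, A, B):
    """Move new-form (a, b) pairs to the base scale: a/A and A*b + B."""
    return [(a / A, A * b + B) for a, b in items]

def haebara_loss(base_items, new_items, A, B, thetas):
    """Sum of squared differences between item characteristic curves."""
    moved = to_base_scale(new_items, A, B)
    return sum((p_2pl(t, a0, b0) - p_2pl(t, a1, b1)) ** 2
               for t in thetas
               for (a0, b0), (a1, b1) in zip(base_items, moved))

def stocking_lord_loss(base_items, new_items, A, B, thetas):
    """Sum of squared differences between test characteristic curves."""
    moved = to_base_scale(new_items, A, B)
    return sum((sum(p_2pl(t, a, b) for a, b in base_items)
                - sum(p_2pl(t, a, b) for a, b in moved)) ** 2
               for t in thetas)

def grid_search(loss, base_items, new_items, thetas):
    """Coarse grid search over A in [0.8, 1.6] and B in [-0.5, 0.5]."""
    candidates = [(0.8 + 0.1 * i, -0.5 + 0.1 * j)
                  for i in range(9) for j in range(11)]
    return min(candidates,
               key=lambda ab: loss(base_items, new_items, ab[0], ab[1], thetas))

# Hypothetical anchor items; the new-form parameters are generated
# without estimation error from true coefficients A = 1.2, B = 0.4.
thetas = [-3.0 + 0.5 * k for k in range(13)]
base_items = [(1.0, -0.5), (1.5, 0.3), (0.8, 1.0)]
new_items = [(a * 1.2, (b - 0.4) / 1.2) for a, b in base_items]
best_A, best_B = grid_search(haebara_loss, base_items, new_items, thetas)
```

With error-free item parameters both losses vanish at the true coefficients, so the grid search recovers A = 1.2 and B = 0.4. The abstract's point is that once the item parameters carry estimation error, the criteria no longer agree, and which one performs best depends on that error and on the range of A.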

8.

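Equating Tests That Contain Testlets

Wu Rui, Ding Shuliang, Gan Dengwen. 《心理学报》 (Acta Psychologica Sinica), 2010, 42(3): 434-442

Testlets appear in more and more examinations. Equating a test that contains testlets with a standard IRT model may distort the results because the local dependence within testlets is ignored. To address this, we used the testlet-based 2PTM model with IRT characteristic curve equating. Taking the error of the estimated equating coefficients as the yardstick and the Wilcoxon signed-rank test as the basis for comparison, we ran extensive Monte Carlo simulations under several conditions. The results show that the testlet model 2PTM, which accounts for local dependence, yields significantly smaller equating error than the 2PLM in the great majority of cases. In addition, six equating criteria were applied to 2PTM equating, and their relative merits under different conditions were evaluated.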

9.

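A Study of Item Characteristic Curve Equating Under the Graded Response Model  Total citations: 2 (self-citations: 0; citations by others: 2)

Dai Haiqi. 《心理学探新》 (Psychological Exploration), 2000, 20(3): 49-53

Building an item response theory item bank for tests that combine subjective and objective items requires equating item parameters under a polytomous model. This study derives the item characteristic curve equating method under the graded response model and applies it successfully in an actual equating trial.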

10.

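Vertical Equating: A Review of Applications and Recent Developments

Wang Yehui, Bian Yufang, Xin Tao. 《心理学探新》 (Psychological Exploration), 2011, 31(5): 472-476

Driven by practical needs, vertical equating methods have developed rapidly in recent years. Yet across the whole process, including the choice of vertical equating, the construction of the two-way specification table, the construction of the developmental scale, the choice of procedure, and the reporting of results, many problems remain unsolved. At the same time, as other measurement methods have advanced, vertical equating has been combined with them and further improved. Overall, the development and refinement of vertical equating depend on improvements and innovations in models and parameter estimation methods, and on researchers' deepening understanding of the nature of academic growth.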

11.

Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation

Kseniia Marcq, Björn Andersson. 《Applied Psychological Measurement》 2022, 46(3): 200

In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results, which suggests that the impact of bandwidth variability on the standard error of equating is minimal.

12.

A Bayesian Nonparametric Approach to Test Equating

George Karabatsos, Stephen G. Walker. 《Psychometrika》 2009, 74(2): 211-232

A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are compared through the analysis of data sets famous in the equating literature. Also, the classical percentile-rank, linear, and mean equating models are each proven to be a special case of a Bayesian model under a highly-informative choice of prior distribution.

13.

New Equating Methods and Their Relationships with Levine Observed Score Linear Equating Under the Kernel Equating Framework

Haiwen Chen, Paul Holland. 《Psychometrika》 2010, 75(3): 542-557

In this paper, we develop a new curvilinear equating method for the nonequivalent groups with anchor test (NEAT) design under the assumption of the classical test theory model, which we name curvilinear Levine observed score equating. In fact, by applying both the kernel equating framework and the mean preserving linear transformation of post-stratification equating, we obtain a family of observed score equipercentile equating functions, which also includes the classical Levine observed score linear equating and the Tucker linear equating as special cases.

14.

Using Automated Essay Scores as an Anchor When Equating Constructed Response Writing Tests

Russell G. Almond 《International Journal of Testing》2014,14(1):73-91

Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs as there are typically too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common objective of many writing prompts) can be used in place of a more traditional anchor. The linear-logistic equating method used in this article is a variant of the Tucker linear equating method appropriate for the limited score range typical of essays. The procedure is applied to historical data. Although the procedure only results in small improvements over identity equating (not equating prompts), it does produce a viable alternative, and a mechanism for checking that the identity equating is appropriate. This may be particularly useful for measuring rater drift or equating mixed format tests.

15.

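Applications of IRT and MIRT in Vertical Test Equating

Wang Yi, Tang Wenqing, Liu Jing, Zhang Minqiang, Li Ming, Li Guangming. 《心理科学进展》 (Advances in Psychological Science), 2014, 22(5): 881-888

Vertical test equating is the process of placing tests that measure the same psychological trait at different levels onto a common score scale. IRT and MIRT are the main methods for vertical equating. IRT does not require assumptions about the examinee ability distribution, and its parameter estimates do not depend on the sample, which makes it an effective way to build vertical scales, but its use is limited when a test violates the unidimensionality assumption. MIRT extends IRT by combining features of IRT and factor analysis; it estimates the item parameters and examinee ability parameters of multidimensional tests more effectively and has important applications in vertical equating. Existing studies mainly examine the applicability, calibration methods, and parameter estimation methods of IRT and MIRT in vertical equating, and compare the properties of the two approaches. Future studies should include more variables and conditions in such comparisons and extend the application of these methods.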

16.

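Population Invariance of Equating Under Different Definitions of Parallel Tests  Total citations: 1 (self-citations: 0; citations by others: 1)

Liu Tiechuan, Dai Haiqi, Zhao Yu. 《心理学探新》 (Psychological Exploration), 2012, (1): 67-71

Population invariance is an important assumption of equating: the equating function should be the same for different examinee subgroups. This study analyzes theoretically, and examines by simulation, the population invariance of linear equating under different definitions of parallel tests; in the simulation, the REMSD index was computed with six weighting schemes. The results show that for strictly parallel tests REMSD is larger when reliability is lower; subgroup mean differences and reliability differences interact clearly in their effect on REMSD; and REMSD is largest when the expectation weights are equal and smallest when the score weights are the subgroup proportions. The results are discussed, and recommendations are given for the choice of REMSD weights and for further research.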
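
To make the REMSD index concrete, the sketch below computes it in the form usually attributed to Dorans and Holland: the weighted root expected squared difference between subgroup equating functions and the overall equating function, divided by the standard deviation of the target form. The abstract does not specify its six weighting schemes, so the subgroup weights and score probabilities here are plain inputs; the function name and the toy values are hypothetical.

```python
import math

def remsd(scores, sub_probs, overall_fn, subgroup_fns, weights, sd_y):
    """Root expected mean square difference between equating functions.

    scores       : possible form X scores
    sub_probs    : one score-probability vector per subgroup
    overall_fn   : equating function for the whole population
    subgroup_fns : one equating function per subgroup
    weights      : subgroup weights (should sum to 1)
    sd_y         : standard deviation of form Y scores
    """
    total = 0.0
    for w, probs, e_j in zip(weights, sub_probs, subgroup_fns):
        # Expected squared difference over the subgroup's score distribution
        expected_sq = sum(p * (e_j(x) - overall_fn(x)) ** 2
                          for x, p in zip(scores, probs))
        total += w * expected_sq
    return math.sqrt(total) / sd_y

# Toy example (assumed): two subgroups whose linear equating functions
# sit one point above and one point below the overall function.
scores = [0, 1, 2]
sub_probs = [[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]]
value = remsd(scores, sub_probs,
              overall_fn=lambda x: x,
              subgroup_fns=[lambda x: x + 1.0, lambda x: x - 1.0],
              weights=[0.5, 0.5],
              sd_y=2.0)
```

A value of zero means that every subgroup equating function coincides with the overall one. Changing `weights` or the score probabilities is how different weighting schemes would enter a comparison like the one the abstract describes.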