16 similar articles found (search time: 265 ms)
1.
2.
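Equating methods based on classical test theory (CTT) are mainly of two kinds: linear equating and equipercentile equating. Different equating methods yield different results in different situations. Taking true-score equating as the criterion, this study used Monte Carlo simulation to compare the results of the two CTT equating methods across a range of item difficulty distributions and sample sizes. The results show that: (1) the error of linear equating is strongly affected by the item difficulty distribution, whereas the error of equipercentile equating is almost unaffected by it; (2) the error of linear equating is almost unaffected by sample size, whereas the error of equipercentile equating is strongly affected by it; (3) whatever the item difficulty distribution, equipercentile equating outperforms linear equating as long as the sample is large enough.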
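To make the two CTT methods named in this abstract concrete, here is a minimal sketch for an equivalent-groups setting. It is not the study's own procedure: the function names, the integer 0..`max_score` score scale, and the uniform (piecewise-linear) continuization of the score distributions are all assumptions made for illustration.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Linear equating: match the mean and standard deviation of form X to form Y."""
    mu_x, mu_y = np.mean(scores_x), np.mean(scores_y)
    sd_x, sd_y = np.std(scores_x), np.std(scores_y)
    return mu_y + (sd_y / sd_x) * (np.asarray(x, dtype=float) - mu_x)

def equipercentile_equate(x, scores_x, scores_y, max_score):
    """Equipercentile equating on integer scores 0..max_score (assumed scale).

    Each discrete distribution is continuized by spreading the mass of score k
    uniformly over [k - 0.5, k + 0.5], so the CDF is piecewise linear.
    Scores with zero frequency on form Y make the inverse non-unique.
    """
    knots = np.arange(max_score + 2) - 0.5  # -0.5, 0.5, ..., max_score + 0.5

    def cdf_at_knots(scores):
        freq = np.bincount(np.asarray(scores, dtype=int), minlength=max_score + 1)
        return np.concatenate(([0.0], np.cumsum(freq) / freq.sum()))

    p = np.interp(x, knots, cdf_at_knots(scores_x))     # percentile rank on form X
    return np.interp(p, cdf_at_knots(scores_y), knots)  # form Y score with that rank
```

The linear function uses only two moments per form, while the equipercentile function uses the whole empirical distribution, which is one way to read the abstract's finding that the latter is more sensitive to sample size.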
3.
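A comparative study of 15 test equating methods  Cited by: 20 (self-citations: 2, citations by others: 18)
This study experimentally compared 4 equating methods based on classical test theory and 11 based on item response theory. The data came from operational HSK administrations, and a relatively reliable evaluation criterion was used. The results show that in some cases equating is not the best choice; that some IRT methods are feasible for item bank construction; and that, at least for the HSK data, IRT parameter-transformation equating methods produce large errors and are not advisable, whether one-, two-, or three-parameter models are used and whether the ms or the mm method is used.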
4.
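This study combined a random equivalent groups design with an anchor test design. It first verified that the means, variances, and distribution shapes of the two random equivalent groups did not differ significantly, then used the equated scores from the random-groups equating as the criterion for evaluating the error of other equating methods, and finally compared the errors of three linear equating methods under the anchor test design (with different population weights) in order to select an equating method and population weight suited to the high school graduation examination (huikao). The study found that the Tucker observed-score linear equating method with population weight W1 = 1 is appropriate for equating this examination.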
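As an illustration of the Tucker method with a population weight, the sketch below follows the textbook form of Tucker observed-score linear equating for an anchor test design. The abstract does not give formulas, so the function name, argument names, and the exact synthetic-population expressions are assumptions here, not the authors' implementation.

```python
import numpy as np

def tucker_linear_equate(x, x1, v1, y2, v2, w1=1.0):
    """Tucker linear equating of form X scores onto the form Y scale.

    x1, v1: total and anchor scores of group 1 (took form X).
    y2, v2: total and anchor scores of group 2 (took form Y).
    w1:     weight of group 1 in the synthetic population (w2 = 1 - w1).
    """
    x1, v1, y2, v2 = (np.asarray(a, dtype=float) for a in (x1, v1, y2, v2))
    w2 = 1.0 - w1
    # Regression slopes of total score on anchor score within each group.
    g1 = np.cov(x1, v1, bias=True)[0, 1] / v1.var()
    g2 = np.cov(y2, v2, bias=True)[0, 1] / v2.var()
    d_mu = v1.mean() - v2.mean()   # group difference in anchor mean
    d_var = v1.var() - v2.var()    # group difference in anchor variance
    # Synthetic-population moments (textbook Tucker form, assumed here).
    mu_x = x1.mean() - w2 * g1 * d_mu
    mu_y = y2.mean() + w1 * g2 * d_mu
    var_x = x1.var() - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mu**2
    var_y = y2.var() + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mu**2
    return mu_y + np.sqrt(var_y / var_x) * (np.asarray(x, dtype=float) - mu_x)
```

With `w1=1.0`, the synthetic population is simply group 1, so the form X moments are used unadjusted and only the form Y moments are corrected through the anchor.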
5.
6.
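Testlets appear increasingly often in all kinds of examinations. Equating a test that contains testlets with a standard IRT model may distort the equating results because the local dependence within testlets is ignored. To address this problem, we used the testlet-based 2PTM model together with IRT characteristic curve equating, took the size of the error in the estimated equating coefficients as the criterion, relied on the Wilcoxon signed-rank test, and ran a large number of Monte Carlo simulations under several different conditions. The results show that the testlet model 2PTM, which accounts for local dependence, yields smaller equating error than 2PLM in the great majority of cases, and the difference is statistically significant. In addition, six different equating criteria were applied to 2PTM equating, and their relative merits under different conditions were evaluated.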
7.
8.
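The anchor test nonequivalent groups design is a very important equating design. The study shows that, under this design, the item types used in the anchor test that serves as the equating medium affect the equating results; the method used to estimate the equating relationship also affects the results; and there is a clear interaction between item type and estimation method. The study concludes that, given current item-writing and scoring techniques, an anchor test composed purely of objective items is best, and that, with anchor test length fixed, the frequency estimation method is the best choice for estimating the equating relationship.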
9.
10.
11.
Analytic smoothing for equipercentile equating under the common item nonequivalent populations design  Cited by: 1 (self-citations: 0, citations by others: 1)
A cubic spline method for smoothing equipercentile equating relationships under the common item nonequivalent populations design is described. Statistical techniques based on bootstrap estimation are presented that are designed to aid in choosing an equating method/degree of smoothing. These include: (a) asymptotic significance tests that compare no equating and linear equating to equipercentile equating; (b) a scheme for estimating total equating error and for dividing total estimated error into systematic and random components. The smoothing technique and statistical procedures are explored and illustrated using data from forms of a professional certification test.
12.
The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed score equating methods used with a NEAT design are the frequency estimation equipercentile equating (FEEE), the chain equipercentile equating (CEE), and the item-response-theory observed-score-equating (IRT OSE). These three methods each make different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately, and the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. Then it describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications on equating are also discussed.
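The chain equipercentile idea described in this abstract can be sketched as two equipercentile links composed through the anchor: test score to anchor score in the first group, then anchor score to test score in the second. The code below is a rough illustration only; the function names, integer score scales, and piecewise-linear continuization are assumptions, and no presmoothing is applied.

```python
import numpy as np

def _cdf_at_knots(scores, max_score):
    """Cumulative proportions at the knots -0.5, 0.5, ..., max_score + 0.5."""
    freq = np.bincount(np.asarray(scores, dtype=int), minlength=max_score + 1)
    return np.concatenate(([0.0], np.cumsum(freq) / freq.sum()))

def equipercentile(x, from_scores, to_scores, from_max, to_max):
    """Map x from one integer score scale to another by matching percentile ranks."""
    p = np.interp(x, np.arange(from_max + 2) - 0.5, _cdf_at_knots(from_scores, from_max))
    return np.interp(p, _cdf_at_knots(to_scores, to_max), np.arange(to_max + 2) - 0.5)

def chained_equipercentile(x, x1, v1, y2, v2, x_max, v_max, y_max):
    """Chain X -> V using group 1 data, then V -> Y using group 2 data."""
    v_equiv = equipercentile(x, x1, v1, x_max, v_max)     # link 1: group 1 only
    return equipercentile(v_equiv, v2, y2, v_max, y_max)  # link 2: group 2 only
```

Each link uses data from a single group, which reflects the CEE assumption stated above that the test-to-anchor equipercentile functions are the same in both groups.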
13.
In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.
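The continuization step mentioned in this abstract can be illustrated with a Gaussian kernel, a common choice in kernel equating. The abstract does not say which kernel is used, so the Gaussian form, the function name, and the argument names below are assumptions; the bandwidth `h` is simply passed in rather than estimated.

```python
import math
import numpy as np

def gaussian_kernel_cdf(x, score_points, probs, h):
    """Continuized CDF of a discrete score distribution at a scalar x.

    score_points: possible scores; probs: their probabilities (summing to 1);
    h: bandwidth (assumed given). The shrinkage factor `a` keeps the mean and
    variance of the continuized distribution equal to the discrete ones.
    """
    score_points = np.asarray(score_points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mu = np.sum(probs * score_points)
    var = np.sum(probs * (score_points - mu) ** 2)
    a = math.sqrt(var / (var + h ** 2))
    z = (x - a * score_points - (1.0 - a) * mu) / (a * h)
    # Standard normal CDF evaluated at each z via the error function.
    phi = np.array([0.5 * (1.0 + math.erf(zi / math.sqrt(2.0))) for zi in z])
    return float(np.sum(probs * phi))
```

A small `h` keeps the curve close to the discrete step function, while a large `h` pushes it toward a normal distribution with the same mean and variance, which is why the choice (or estimation) of the bandwidth matters for the resulting equating function.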
14.
Wendy M. Yen. Psychometrika, 1983, 48(3): 353-369
Test scores that are not perfectly reliable cannot be strictly equated unless they are strictly parallel [Lord, 1980]. This fact implies that tau-equivalence can be lost if an equipercentile equating is applied to observed scores that are not strictly parallel. Seventy-two simulated testing conditions are produced to simulate equating tests with different difficulties and discriminations. Number-correct and trait metrics are examined. When an equipercentile equating is applied to these data, locally biased (i.e., non-tau-equivalent) results are produced for tests of unequal difficulty. Differences between the criteria of tau-equivalence and equipercentile equivalence are discussed.
15.
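This study explores the use of the constructed anchor test method to equate test scores when no anchor items are available. Study 1 and Study 2 examine how errors in item difficulty ordering and differences in anchor item difficulty, respectively, affect the constructed anchor test method. The results show that: (1) with equivalent groups, the equating error of the constructed anchor test method gradually increases as the proportion of erroneous anchor items, the degree of difficulty-ordering error, and the anchor item difficulty difference increase, while the equating error of the random equivalent groups method remains fairly stable; with nonequivalent groups, the equating error of the constructed anchor test method is consistently smaller than that of the random equivalent groups method; (2) for the constructed anchor test method with nonequivalent groups, the shorter the anchor test, the larger the equating error.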
16.
In the design of common-item equating, two groups of examinees are administered separate test forms, and each test form contains a common subset of items. We consider test equating under this situation as an incomplete data problem, that is, examinees have observed scores on one test form and missing scores on the other. Through the use of statistical data-imputation techniques, the missing scores can be replaced by reasonable estimates, and consequently the forms may be directly equated as if both forms were administered to both groups. In this paper we discuss different data-imputation techniques that are useful for equipercentile equating; we also use empirical data to evaluate the accuracy of these techniques as compared with chained equipercentile equating. A paper presented at the European Meeting of the Psychometric Society, Barcelona, Spain, July, 1993.