19 similar documents found; search took 250 ms
1.
2.
3.
4.
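This study explores how to equate test scores when no anchor items are available by using the constructed anchor test method. Studies 1 and 2 examine, respectively, how errors in item difficulty ranking and differences in anchor item difficulty affect the constructed anchor test method. The results show that: (1) under the equivalent-groups condition, the equating error of the constructed anchor test method gradually increases as the proportion of erroneous anchor items, the degree of difficulty-ranking error, and the difference in anchor item difficulty increase, whereas the equating error of the random equivalent-groups method remains relatively stable; under the non-equivalent-groups condition, the equating error of the constructed anchor test method is always smaller than that of the random equivalent-groups method; (2) for the constructed anchor test method under the non-equivalent-groups condition, the shorter the anchor test, the larger the equating error.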
5.
6.
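A Study of Item Characteristic Curve Equating under the Graded Response Model  Total citations: 2, self-citations: 0, citations by others: 2
Building an item response theory item bank for tests that combine subjective and objective items requires equating item parameters under a polytomous model. This study derives an item characteristic curve equating method under the graded response model and applies it successfully in an actual equating trial.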
7.
8.
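Development of a Computerized Adaptive Puzzle Test for Conscription Candidates  Total citations: 1, self-citations: 0, citations by others: 1
Based on a literature review and reference materials from foreign militaries, an item bank was developed according to item response theory and theories of spatial ability testing. First, a pilot experiment in paper-and-pencil form explored the feasibility of developing a CAT puzzle test using IRT. Then, on the basis of the pilot, the items were revised and their number expanded to develop a computer-assisted test. The three-parameter logistic model was selected and an anchor-item equating design was adopted; 7 different test forms were administered to 55,777 conscription candidates during the national conscription psychological screening. Based on the test results, the items were analyzed and high-quality items were selected to form the CAT item bank. Stratified sampling on the a parameter was used to control exposure rates, and different test termination strategies were used to develop the CAT puzzle test. Finally, using the Block Design subtest of the WAIS intelligence test and examination scores in three school subjects as criteria, the validity of the CAT puzzle test was checked with 72 participants. The results show that the test meets the assumptions of the three-parameter logistic IRT model, that the item parameters are satisfactory, and that the test has good reliability and validity and can be used in practice for the psychological selection of conscription candidates.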
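For readers unfamiliar with the three-parameter logistic model named in the abstract above, a minimal sketch of its item response function follows. The function name `p_3pl`, the example parameter values, and the omission of the optional scaling constant D = 1.7 are assumptions made for illustration; none of them comes from the cited study.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item discrimination, b: item difficulty, c: pseudo-guessing parameter
    (No scaling constant D is applied here; that is an assumption.)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item, not taken from the study: a = 1.2, b = 0.0, c = 0.2.
# When theta equals b, the probability sits halfway between c and 1.
prob_at_difficulty = p_3pl(theta=0.0, a=1.2, b=0.0, c=0.2)
```

The a parameter is also the quantity the abstract stratifies on when controlling item exposure.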
9.
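A Study of Equating Error in Classical Test Theory  Total citations: 3, self-citations: 0, citations by others: 3
1 Introduction. Equating uses an anchor test or an anchor group of examinees as a bridge to establish a comparable relationship between the results of two tests measuring the same trait. Many factors affect the accuracy of equating; the error that examinee sampling introduces into equating is called equating sampling error. That is, because the examinee sample used for equating is drawn from its population with an unavoidable degree of bias, the equating relationship built on it is also biased to some degree, and this bias is the equating sampling error. If samples are drawn repeatedly from the population and equated with a method that fully fits the data conditions, then the mean of the distribution of equating results is the true equated score, and the standard deviation of that distribution is the sampling standard error of equating. This paper discusses the problem of equating sampling error. 2 Method 2 …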
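The repeated-sampling definition in the abstract above can be sketched as a small simulation. This is only an illustration under stated assumptions: a random-groups design, normally distributed scores with made-up population parameters, and linear (mean-sigma) equating standing in for "a method that fully fits the data conditions". The function names are hypothetical and are not taken from the paper.

```python
import random
import statistics

def linear_equate(x, sample_x, sample_y):
    """Map score x from form X onto the form Y scale by matching means and SDs."""
    mean_x, sd_x = statistics.mean(sample_x), statistics.stdev(sample_x)
    mean_y, sd_y = statistics.mean(sample_y), statistics.stdev(sample_y)
    return mean_y + (sd_y / sd_x) * (x - mean_x)

def equating_sampling_error(x, n=500, reps=200, seed=0):
    """Repeatedly sample examinees and equate x each time.

    Returns (mean of equated scores, standard deviation of equated scores).
    Assumed populations: form X ~ N(50, 10), form Y ~ N(55, 12).
    """
    rng = random.Random(seed)
    equated = []
    for _ in range(reps):
        sample_x = [rng.gauss(50, 10) for _ in range(n)]
        sample_y = [rng.gauss(55, 12) for _ in range(n)]
        equated.append(linear_equate(x, sample_x, sample_y))
    return statistics.mean(equated), statistics.stdev(equated)

mean_equated, standard_error = equating_sampling_error(60)
```

Under these assumed populations the population-level equated value of 60 is 55 + (12 / 10) × (60 − 50) = 67, so `mean_equated` should fall close to 67, while `standard_error` estimates the sampling standard error of equating at that score point.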
10.
11.
The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed score equating methods used with a NEAT design are the frequency estimation equipercentile equating (FEEE), the chain equipercentile equating (CEE), and the item-response-theory observed-score-equating (IRT OSE). These three methods each make different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately, and the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. Then it describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications on equating are also discussed.
12.
Dong‐In Kim Seung W. Choi Guemin Lee Kooghyang R. Um 《International Journal of Selection & Assessment》2008,16(2):83-92
We designed this study to evaluate several data collection and equating designs in the context of item response theory (IRT) equating. The random‐groups design and the common‐item design have been widely used for collecting data for IRT equating. In this study, we investigated four equating methods based upon these two data collection designs, using empirical data from a number of different testing programs. When the randomly equivalent group assumption was reasonably met, the four equating methods tended to produce highly comparable results. On the other hand, equating methods based upon either of the equating designs produced dissimilar results. Sample size can have differential effects on the equating results produced by the different equating methods. In practice, a common‐item equivalent‐groups design often produces unacceptably large differences in the group mean due to various anomalies such as context effects, poor quality of common items, or a very small number of common items. In such cases, a random‐groups design would produce more stable equating results.
13.
Karl Bang Christensen Jakob Bue Bjorner Svend Kreiner Jørgen Holm Petersen 《Psychometrika》2002,67(4):563-574
A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model two ways of testing this assumption against specific multidimensional alternatives are discussed. One, a marginal approach assuming a multidimensional parametric latent variable distribution, and, two, a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods. The authors would like to thank Niels Keiding, Klaus Larsen and the anonymous reviewers for valuable comments to a previous version of this paper. This research was supported by a grant from the Danish Research Academy and by a general research grant from Quality Metric, Inc.
14.
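This study examines the effects of bandwidth selection method, sample size, number of items, equating design, and data simulation approach on item response theory observed-score kernel equating. Research data were obtained through two data simulation approaches, and local and global evaluation indices were computed. In the random-groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had little effect. In the non-equivalent-groups design, the penalty method and Silverman's rule of thumb performed best; increasing the number of items reduced the percentage relative error and the random error; increasing the sample size increased the percentage relative error and decreased the random error. The data simulation approach can affect the evaluation of equating. Future work should focus on the systematic evaluation of equating.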
15.
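Testlets appear increasingly often in all kinds of examinations. Equating a test that contains testlets with a standard IRT model may distort the equating results because the local dependence within testlets is ignored. To address this problem, we used the testlet-based 2PTM model and the IRT characteristic curve method for equating, took the size of the error in the estimated equating coefficients as the criterion, relied on the Wilcoxon signed-rank test, and ran a large number of Monte Carlo simulation experiments under several different conditions. The results show that the testlet model 2PTM, which accounts for local dependence, yields smaller equating error than 2PLM in the vast majority of cases, and that the difference is significant. In addition, 2PTM equating was carried out with six different equating criteria, and the relative merits of these criteria under different conditions were evaluated.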
16.
David Andrich 《Psychometrika》2010,75(2):292-308
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.
17.
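Item response theory (IRT) models predict examinees' responses from the characteristics of items and examinees and are commonly used psychometric models. However, the effective use of IRT depends on how well the chosen IRT model fits the actual data (i.e., model-data fit, or goodness of fit). Only when the IRT model adopted fits the actual data well can the advantages and functions of IRT be truly realized (Orlando & Thissen, 2000). When the IRT model does not fit the data, or the wrong model is chosen, large errors arise in, for example, parameter estimation, test equating, and differential item functioning analysis (Kang, Cohen & Sung, 2009), with adverse consequences for practical work. Therefore, when using IRT for analysis, one should first thoroughly examine and test whether the chosen model matches the actual data (McKinley & Mills, 1985). The model-data fit statistics commonly used in IRT can be described and compared from two perspectives, item fit and test fit. This is an important topic in psychological and educational measurement and also a step that is easily overlooked in test analysis; no published article of this kind has yet appeared. Future research could pursue empirical comparisons of the statistics and their extension to cognitive diagnosis.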
18.
Item response theory (IRT) models are the central tools in modern measurement and advanced psychometrics. We offer a MATLAB IRT modeling (IRTm) toolbox that is freely available and that follows an explicit design matrix approach, giving the end user control and flexibility in building a model that goes beyond standard models, such as the Rasch model (Rasch, 1960) and the two-parameter logistic model. As such, IRTm allows for a large variety of unidimensional IRT models for binary responses, the incorporation of additional person and item information, and deviations from common model assumptions. An exclusive key feature of the toolbox is the inclusion of copula IRT models to handle local item dependencies. Two appendixes for this report, containing example code and information on the general copula IRT in IRTm, may be downloaded from brm.psychonomic-journals.org/content/supplemental.
19.
Damazo T. Kadengye Eva Ceulemans Wim Van den Noortgate 《Behavior research methods》2014,46(3):823-840
This article describes a generalized longitudinal mixture item response theory (IRT) model that allows for detecting latent group differences in item response data obtained from electronic learning (e-learning) environments or other learning environments that result in large numbers of items. The described model can be viewed as a combination of a longitudinal Rasch model, a mixture Rasch model, and a random-item IRT model, and it includes some features of the explanatory IRT modeling framework. The model assumes the possible presence of latent classes in item response patterns, due to initial person-level differences before learning takes place, to latent class-specific learning trajectories, or to a combination of both. Moreover, it allows for differential item functioning over the classes. A Bayesian model estimation procedure is described, and the results of a simulation study are presented that indicate that the parameters are recovered well, particularly for conditions with large item sample sizes. The model is also illustrated with an empirical sample data set from a Web-based e-learning environment.