Similar articles
 19 similar articles found.
1.
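The authors wrote 235 figural reasoning test items. Using an anchor-test equating design with 72 items from the Combined Raven's Test as anchor items, they tested 1,733 males at ability levels ranging from junior high school to university. The response data were analyzed with BILOG-MG 3.0 (marginal maximum likelihood estimation) under the three-parameter logistic model. After removing items that fit the model poorly and items whose maximum information was below 0.3, the final item bank contained 181 items. The bank can be used to screen out conscription candidates of low intelligence.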
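The information-based screening rule in this abstract can be illustrated with a minimal sketch. It assumes the standard three-parameter logistic item information function; the function names, the theta grid, and the scaling constant `D` are illustrative assumptions, not details reported by the study.

```python
import math

def p_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta, a, b, c, D=1.0):
    """3PL item information: (D*a)^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def max_info(a, b, c, D=1.0, lo=-4.0, hi=4.0, steps=801):
    """Largest information value on an evenly spaced theta grid (assumed range)."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(info_3pl(t, a, b, c, D) for t in grid)

def keep_item(a, b, c, threshold=0.3, D=1.0):
    """Keep an item only if its maximum information reaches the threshold."""
    return max_info(a, b, c, D) >= threshold
```

With `D = 1.0`, an item with a = 1, b = 0, c = 0 peaks at an information of 0.25 and would be dropped under the 0.3 cutoff, while a more discriminating item with a = 2 peaks at 1.0 and would be kept.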

2.
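In the non-equivalent groups anchor test design, what proportion of the test length should the anchor items make up, and can this proportion change as the test gets longer? These questions matter greatly to practitioners and researchers. With the number of examinees and the test length held fixed, this paper examines how changes in the proportion of anchor items relative to test length (the anchor proportion, for short) affect equating accuracy, and discusses how to balance equating accuracy against anchor proportion in operational equating. Under simulation conditions, it also gives several indices that reflect operational equating accuracy.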

3.
刘铁川  戴海琦  赵玉 《心理科学》2012,35(2):446-451
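Using anchor items to link different test forms is a common equating design. Because of exposure and other factors, however, the functioning of anchor items can change between administrations. This study used the Mantel-Haenszel (MH) test and logistic regression to examine the quality of the anchor items used in equating a large-scale examination in China. Twenty-two anchor items showed parameter drift, which may affect both the difficulty and the discrimination parameters; most of these items failed the model-fit test when used a second time. Leaving drifted anchor items in place leads to a sizable systematic equating error, so testing for anchor item parameter drift should be a required step in equating.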
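As a rough illustration of the MH approach mentioned in this abstract, the sketch below computes the Mantel-Haenszel common odds ratio for one anchor item across matched score levels and converts it to the ETS delta scale (MH D-DIF = -2.35 ln α). Treating the first administration as the reference group and the second as the focal group is an assumption made here for the drift setting; the function names and the table layout are hypothetical, and the study's actual procedure is not described in the abstract.

```python
import math

def mh_odds_ratio(tables):
    """Mantel-Haenszel common odds ratio for one item.

    tables: one (a, b, c, d) tuple per matched score level, where
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    return num / den

def mh_d_dif(tables):
    """MH D-DIF on the ETS delta scale; values near 0 suggest little drift."""
    return -2.35 * math.log(mh_odds_ratio(tables))
```

A value of `mh_odds_ratio` near 1 (MH D-DIF near 0) indicates that, among examinees with the same matching score, the item behaves similarly on both occasions.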

4.
刘玥  刘红云 《心理科学》2015,(6):1504-1512
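This research explores how to equate test scores when no anchor items are available by using the constructed anchor test method. Study 1 and Study 2 examine how errors in ranking item difficulty and differences in anchor item difficulty, respectively, affect the method. Results: (1) under equivalent groups, the equating error of the constructed anchor test method grows as the proportion of erroneous anchor items, the severity of difficulty-ranking errors, and the anchor difficulty difference increase, while the error of the random equivalent groups method stays fairly stable; under non-equivalent groups, the constructed anchor test method always has smaller equating error than the random equivalent groups method; (2) for the constructed anchor test method under non-equivalent groups, the shorter the anchor test, the larger the equating error.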

5.
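Application of the item characteristic curve equating method under the graded response model in a large-scale examination   Total citations: 2 (self-citations: 1, citations by others: 1)
The Economics Professional Qualification Examination is one of the largest qualification examinations in China. To keep examinations comparable across years, build an item bank, and prepare for computerized adaptive testing, the item characteristic curve equating method under the graded response model of item response theory was applied with an anchor-test equating design. Item and ability parameters from four years of examination data were equated, and an economics item bank was successfully assembled. On this basis, equating was used to compare the cut scores of papers from different years, providing empirical evidence for setting the passing standard and ensuring the fairness of the examination.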

6.
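A study of item characteristic curve equating under the graded response model   Total citations: 2 (self-citations: 0, citations by others: 2)
Building an item response theory item bank for tests that combine subjective and objective items requires equating item parameters under a polytomous model. This study derived the item characteristic curve equating method under the graded response model and applied it successfully in an actual equating trial.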

7.
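Test forms with polytomous scoring and testlet structure are widely used. This paper proposes the polytomous cognitive diagnostic testlet model (PCDTM), which can handle polytomously scored test data with testlets. The studies show: (1) the PCDTM is sound and effective and yields good parameter estimates under all conditions; (2) parameter estimation accuracy improves as sample size, item quality, and number of items increase; (3) ignoring testlet effects lowers examinee classification accuracy and item parameter precision and can even produce erratic results; (4) the PCDTM fits the empirical data better and has better ecological validity. When using the model, a sample size of at least 1,000 and at least 20 items are recommended.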

8.
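Development of a computerized adaptive puzzle test for conscription candidates   Total citations: 1 (self-citations: 0, citations by others: 1)
Drawing on a literature review and materials from foreign militaries, an item bank was developed on the basis of item response theory and theories of spatial ability testing. A paper-and-pencil pilot was run first to assess the feasibility of building a CAT puzzle test with IRT. The items were then revised and expanded in light of the pilot, and a computer-assisted test was compiled. Using the three-parameter logistic model and an anchor-item equating design, seven different test forms were administered to 55,777 candidates during the national conscription psychological screening. Items were analyzed from the results, and high-quality items were selected to form the CAT item bank; exposure was controlled by stratified sampling on the a parameter, and the CAT puzzle test was built with several test termination rules. Finally, validity was checked with 72 participants, using the Block Design subtest of the WAIS and examination scores in three school subjects as criteria. The results show that the test meets the assumptions of the three-parameter logistic IRT model, the item parameters are satisfactory, and the test has good reliability and validity and can be used in the psychological selection of conscription candidates.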

9.
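A study of equating error in classical test theory   Total citations: 3 (self-citations: 0, citations by others: 3)
1 Introduction. Equating uses an anchor test or an anchor group of examinees as a bridge to establish a comparative relationship between the results of two tests of the same trait. Many factors affect the accuracy of equating; the error that examinee sampling introduces is called equating sampling error. Because the examinee sample used for equating is drawn from its population with some unavoidable degree of bias, the equating relationship built on it is also biased to some degree, and this bias is the equating sampling error. If samples are drawn repeatedly from the population and equated with a method whose data assumptions are fully met, the mean of the distribution of equating results is the true equated score, and the standard deviation of that distribution is the standard error of equating. This paper examines the problem of equating sampling error. 2 Method 2 …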
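The repeated-sampling definition of the standard error of equating can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it substitutes bootstrap resampling of the observed samples for repeated draws from the population, and it uses simple linear (mean-sigma) equating under a random groups design as the equating method. The function names are hypothetical.

```python
import random
import statistics

def linear_equate(x, scores_x, scores_y):
    """Linear equating of score x from form X to the form Y scale."""
    mx, sx = statistics.mean(scores_x), statistics.pstdev(scores_x)
    my, sy = statistics.mean(scores_y), statistics.pstdev(scores_y)
    return my + (sy / sx) * (x - mx)

def bootstrap_equating_se(x, scores_x, scores_y, reps=500, seed=0):
    """Mean and standard deviation of the equated score over resamples.

    The standard deviation approximates the standard error of equating
    at score x; bootstrap resampling stands in for repeated population draws.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(reps):
        bx = rng.choices(scores_x, k=len(scores_x))  # resample form X group
        by = rng.choices(scores_y, k=len(scores_y))  # resample form Y group
        results.append(linear_equate(x, bx, by))
    return statistics.mean(results), statistics.stdev(results)
```

Calling `bootstrap_equating_se(20, scores_x, scores_y)` returns the average equated score and its estimated standard error at a raw score of 20; larger samples should shrink the second value.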

10.
陈平  李潇  任赫  辛涛 《心理科学》2023,(4):960-970
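To meet the high security requirements of assessment programs in China, a new cross-year equating design combining anchor persons and anchor items is proposed, and a simulation study based on empirical data examines how the equating method, the number of anchor persons, the way the anchor test is assembled, and ability differences between examinees in different test cycles affect equating accuracy. Results: all of these factors affect equating accuracy, and the equating method has the most prominent effect. Recommendations: (1) when anchor persons are few, use an equating method that requires scale transformation; (2) the anchor test assembly should match the computational characteristics of the equating method; (3) when ability differences between cycles are large, consider adding anchor persons or adjusting the anchor test assembly plan.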

11.
The Non-Equivalent groups with Anchor Test (NEAT) design involves missing data that are missing by design. Three nonlinear observed score equating methods used with a NEAT design are the frequency estimation equipercentile equating (FEEE), the chain equipercentile equating (CEE), and the item-response-theory observed-score-equating (IRT OSE). These three methods each make different assumptions about the missing data in the NEAT design. The FEEE method assumes that the conditional distribution of the test score given the anchor test score is the same in the two examinee groups. The CEE method assumes that the equipercentile functions equating the test score to the anchor test score are the same in the two examinee groups. The IRT OSE method assumes that the IRT model employed fits the data adequately, and the items in the tests and the anchor test do not exhibit differential item functioning across the two examinee groups. This paper first describes the missing data assumptions of the three equating methods. Then it describes how the missing data in the NEAT design can be filled in a manner that is coherent with the assumptions made by each of these equating methods. Implications on equating are also discussed.

12.
We designed this study to evaluate several data collection and equating designs in the context of item response theory (IRT) equating. The random‐groups design and the common‐item design have been widely used for collecting data for IRT equating. In this study, we investigated four equating methods based upon these two data collection designs, using empirical data from a number of different testing programs. When the randomly equivalent group assumption was reasonably met, the four equating methods tended to produce highly comparable results. On the other hand, equating methods based upon either of the equating designs produced dissimilar results. Sample size can have differential effects on the equating results produced by the different equating methods. In practice, a common‐item equivalent‐groups design often produces unacceptably large differences in the group mean due to various anomalies such as context effects, poor quality of common items, or a very small number of common items. In such cases, a random‐groups design would produce more stable equating results.

13.
A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model two ways of testing this assumption against specific multidimensional alternatives are discussed. One, a marginal approach assuming a multidimensional parametric latent variable distribution, and, two, a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods. The authors would like to thank Niels Keiding, Klaus Larsen and the anonymous reviewers for valuable comments to a previous version of this paper. This research was supported by a grant from the Danish Research Academy and by a general research grant from Quality Metric, Inc.

14.
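This study examines how the bandwidth selection method, sample size, number of items, equating design, and data simulation method affect IRT observed-score kernel equating. Study data were generated with two simulation approaches, and local and global evaluation indices were computed. In the random groups design, the bandwidth selection methods performed similarly, and examinee sample size and number of items had little effect. In the non-equivalent groups design, the penalty method and Silverman's rule of thumb performed best; adding items reduced percent relative error and random error; increasing sample size raised percent relative error but reduced random error. The data simulation method can affect the evaluation of equating. Future work should focus on the systematic evaluation of equating.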

15.
吴锐  丁树良  甘登文 《心理学报》2010,42(3):434-442
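Testlets appear more and more often in all kinds of examinations. Equating tests that contain testlets with standard IRT models may distort the equating results because the local dependence within testlets is ignored. To address this, we used the testlet-based 2PTM model and the IRT characteristic curve method for equating, took the error of the estimated equating coefficients as the criterion, relied on the Wilcoxon signed-rank test, and ran a large number of Monte Carlo simulations under several conditions. The results show that the testlet model 2PTM, which accounts for local dependence, produced smaller equating error than the 2PLM in the great majority of cases, and the differences were significant. In addition, 2PTM equating was carried out with six different equating criteria, and the relative merits of the criteria under different conditions were evaluated.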

16.
Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.

17.
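Item response theory (IRT) models, which predict examinees' responses from item and person characteristics, are widely used psychometric models. Their effective use, however, depends on how well the chosen IRT model matches the actual data (model-data goodness of fit). Only when the IRT model fits the data well can the advantages and functions of IRT be realized (Orlando & Thissen, 2000). When the model does not fit the data, or the wrong model is chosen, parameter estimation, test equating, and differential item functioning analysis can carry large errors (Kang, Cohen & Sung, 2009), with adverse consequences in practice. Before using IRT, therefore, one should thoroughly examine and test whether the chosen model fits the data (McKinley & Mills, 1985). The model-data fit statistics commonly used in IRT can be described and compared from two perspectives, item fit and test fit. This is an important topic in psychological and educational measurement and an easily neglected step in test analysis, and no published article of this kind has yet appeared. Future research could pursue empirical comparisons of the statistics and their extension to cognitive diagnosis.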

18.
Item response theory (IRT) models are the central tools in modern measurement and advanced psychometrics. We offer a MATLAB IRT modeling (IRTm) toolbox that is freely available and that follows an explicit design matrix approach, giving the end user control and flexibility in building a model that goes beyond standard models, such as the Rasch model (Rasch, 1960) and the two-parameter logistic model. As such, IRTm allows for a large variety of unidimensional IRT models for binary responses, the incorporation of additional person and item information, and deviations from common model assumptions. An exclusive key feature of the toolbox is the inclusion of copula IRT models to handle local item dependencies. Two appendixes for this report, containing example code and information on the general copula IRT in IRTm, may be downloaded from brm.psychonomic-journals.org/content/supplemental.

19.
This article describes a generalized longitudinal mixture item response theory (IRT) model that allows for detecting latent group differences in item response data obtained from electronic learning (e-learning) environments or other learning environments that result in large numbers of items. The described model can be viewed as a combination of a longitudinal Rasch model, a mixture Rasch model, and a random-item IRT model, and it includes some features of the explanatory IRT modeling framework. The model assumes the possible presence of latent classes in item response patterns, due to initial person-level differences before learning takes place, to latent class-specific learning trajectories, or to a combination of both. Moreover, it allows for differential item functioning over the classes. A Bayesian model estimation procedure is described, and the results of a simulation study are presented that indicate that the parameters are recovered well, particularly for conditions with large item sample sizes. The model is also illustrated with an empirical sample data set from a Web-based e-learning environment.
