首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
计算机形式的测验能够记录考生在测验中的题目作答时间(Response Time, RT),作为一种重要的辅助信息来源,RT对于测验开发和管理具有重要的价值,特别是在计算机化自适应测验(Computerized Adaptive Testing, CAT)领域。本文简要介绍了RT在CAT选题方面应用并作以简评,分析了这些技术在实践中的可行性。最后,探讨了当前RT应用于CAT选题存在的问题以及可以进一步开展的研究方向。  相似文献   

2.
自编235个图形推理测验题目。采用铆测验等值设计,以72个联合型瑞文测验题目为铆题,对初中到大学各能力层次的1733名男性进行了测验。使用BILOG MG3.0(边际极大似然估计)对实测数据进行了分析,采用Logsitic 3参数模型。剔除数据与模型拟合不好的题目以及信息函数最大值小于0.3的题目,最终建立一个包含181道题目的题库。该题库可以用于淘汰智力较低的应征青年  相似文献   

3.
本研究开发了两种新的适用于多级评分项目的多维计算机化自适应测验(PMCAT)的选题策略——修正的连续熵(RCEM)和修正的后验期望KL信息(MKB)方法,并与以往PMCAT的选题策略进行了对比研究。Monte Carlo实验结果表明:两种新开发的选题策略比原方法估计精度更高,并且RCEM方法在所有选题策略中曝光率最低。新开发的选题策略具有较理想的估计精度和曝光控制效果,为PMCAT在实践中的应用提供了新的方法支持。  相似文献   

4.
多级评分计算机化自适应测验动态综合选题策略   总被引:1,自引:0,他引:1  
罗芬  丁树良  王晓庆 《心理学报》2012,44(3):400-412
多级评分可以提供更多关于被试的信息, 是计算机化自适应测验的一个发展方向, 选题策略是计算机化自适应测验的研究重点。对于多级评分的等级反应模型, 本文拟用区间估计的思想改进近期提出的几种选题策略, 并且将两级评分b-STR和a-STR推广到多级评分以改进最大信息量选题策略。Monte Carlo模拟实验表明在达到或接近原有选题策略测验精度的基础上, 本文提出的几种新选题策略有的能够有效降低测验长度, 有的可以极大降低项目曝光率。  相似文献   

5.
针对双目标CD-CAT,将六种项目区分度(鉴别力D、一般区分度GDI、优势比OR、2PL的区分度a、属性区分度ADI、认知诊断区分度CDI)分别与IPA方法结合,得到新的选题策略。模拟研究比较了它们的表现,还考察了区分度分层在控制项目曝光的表现。结果发现:新方法都能明显提高知识状态的判准率和能力估计精度;分层选题均能很好地提高题库利用率。总体上,OR加权能显著提高测量精度;OR分层选题在保证测量精度条件下显著提高项目曝光均匀性。  相似文献   

6.
陈平 《心理学报》2016,48(9):1184-1198
在线标定技术由于具有诸多优点而被广泛应用于计算机化自适应测验(CAT)的新题标定。Method A是想法最直接、算法最简单的CAT在线标定方法, 但它具有明显的理论缺陷--在标定过程中将能力估计值视为能力真值。将全功能极大似然估计方法(FFMLE)与“利用充分性结果”估计方法(ECSE)的误差校正思路融入Method A (新方法分别记为FFMLE-Method A和ECSE-Method A), 从理论上对能力估计误差进行校正, 进而克服Method A的标定缺陷。模拟研究的结果表明:(1)在大多数实验条件下, 两种新方法较Method A总体上可以改进标定精度, 且在测验长度为10的短测验上的改进幅度最大; (2)当CAT测验长度较短或中等(10或20题)时, 两种新方法的表现与性能最优的MEM已非常接近。当测验长度较长(30题)时, ECSE-Method A的总体表现最好、优于MEM; (3)样本量越大, 各种方法的标定精度越高。  相似文献   

7.
题目属性的定义是实施认知诊断评价的关键步骤, 通过有丰富经验的领域专家对题目的属性进行定义是当前的主要方法, 然而该方法受到许多主观经验因素的影响。寻找客观的题目属性定义或验证方法可以为主观定义过程提供策略支持或对结果进行改进, 因此已经引起研究者们的关注。本研究构建了一种简单高效的题目属性定义方法, 研究使用似然比D2统计量从作答数据中估计题目属性的方法, 实现属性掌握模式、题目参数和题目属性向量的联合估计。模拟研究结果表明, 使用似然比D2统计量可以有效地识别题目的属性向量, 该方法一方面可以实现新编制题目属性向量的在线估计, 另一方面可以验证已经定义的题目属性向量的准确性。  相似文献   

8.
应用项目反应理论对瑞文测验联合型的分析   总被引:1,自引:0,他引:1  
使用BILOG-MG3.0软件,边际极大似然估计,3参数Logistic模型对354名不同能力水平的男性青年的瑞文测验联合型数据进行了分析。结果显示:大多数瑞文测验联合型的题目都适合3参数Logistic模型(有6道题不适合)。整个测验的信息函数峰值的位置在难度量表的-3到-2之间,其值为16.82。共有18道题的信息函数峰值在0.2以下。从区分度来看,72道题目的区分度均大于0.5,比较理想。难度参数显示所有题目均较低,绝大部分都在0以下,最高的只有1.01。题目的难度主要由所需的操作水平决定。伪猜测参数在0.07-0.24之间。综合分析表明瑞文测验联合型对正常青年的智力评价精度较差。  相似文献   

9.
Lord developed an approximation for the bias function for the maximum likelihood estimate in the context of the three-parameter logistic model. Using Taylor's expansion of the likelihood equation, he obtained an equation that includes the conditional expectation, given true ability, of the discrepancy between the maximum likelihood estimate and true ability. All terms of orders higher thann ?1 are ignored wheren indicates the number of items. Lord assumed that all item and individual parameters are bounded, all item parameters are known or well-estimated, and the number of items is reasonably large. In the present paper, an approximation for the bias function of the maximum likelihood estimate of the latent trait, or ability, will be developed using the same assumptions for the more general case where item responses are discrete. This will include the dichotomous response level, for which the three-parameter logistic model has been discussed, the graded response level and the nominal response level. Some observations will be made for both dichotomous and graded response levels.  相似文献   

10.
詹沛达 《心理科学》2019,(1):170-178
随着心理与教育测量研究的发展和科技的进步,计算机化(大规模)测验逐渐受到人们的关注。为探究在计算机化多维测验中如何利用作答时间数据来辅助评估多维潜在能力,以及为我国义务教育阶段教育质量监测提供数据分析方法上的理论支持。本研究以2012年和2015年国际学生能力评估(PISA)计算机化数学测验数据为例,提出了一种可同时利用作答时间和作答精度数据的联合作答与时间的多维Rasch模型。根据新模型对PISA数据的分析结果,表明引入作答时间数据,不仅有助于提高模型参数的估计精度,还有助于数据分析者利用被试的作答时间信息来做进一步的决策和干预(e.g., 对异常作答行为或预备知识的诊断)。  相似文献   

11.
Psychometric models for item-level data are broadly useful in psychology. A recurring issue for estimating item factor analysis (IFA) models is low-item endorsement (item sparseness), due to limited sample sizes or extreme items such as rare symptoms or behaviors. In this paper, I demonstrate that under conditions characterized by sparseness, currently available estimation methods, including maximum likelihood (ML), are likely to fail to converge or lead to extreme estimates and low empirical power. Bayesian estimation incorporating prior information is a promising alternative to ML estimation for IFA models with item sparseness. In this article, I use a simulation study to demonstrate that Bayesian estimation incorporating general prior information improves parameter estimate stability, overall variability in estimates, and power for IFA models with sparse, categorical indicators. Importantly, the priors proposed here can be generally applied to many research contexts in psychology, and they do not impact results compared to ML when indicators are not sparse. I then apply this method to examine the relationship between suicide ideation and insomnia in a sample of first-year college students. This provides an important alternative for researchers who may need to model items with sparse endorsement.  相似文献   

12.
目前参数估计多采用统计方法,存在耗时长、要求被试样本容量大和项目数多等缺点。本文将BP神经网络和降维法相结合,对GRM的项目参数和考生能力参数进行估计。蒙特卡洛模拟结果显示:(1)不管是人多题少还是题多人少,该网络设计下的参数估计精度都较高;(2)可以应用到多个不同等级评分的参数估计中,甚至是超过15个等级的项目参数,估计精度也较高,这是其他参数估计方法所不可比拟的;(3)运行的时长和统计估计方法相比大大缩减。  相似文献   

13.
Item response curves for a set of binary responses are studied from a Bayesian viewpoint of estimating the item parameters. For the two-parameter logistic model with normally distributed ability, restricted bivariate beta priors are used to illustrate the computation of the posterior mode via the EM algorithm. The procedure is illustrated by data from a mathematics test.This work was supported under Contract No. N00014-85-K-0113, NR 150-535, from Personnel and Training Research Programs, Psychological Sciences Division, Office of Naval Research. The authors wish to thank Mark D. Reckase for providing the ACT data used in the illustration and Michael J. Soltys for computational assistance. They also wish to thank the editor and four anonymous reviewers for many valuable suggestions.  相似文献   

14.
Probabilistic reasoning skills are important in various contexts. The aim of the present study was to develop a new instrument (the Probabilistic Reasoning Scale – PRS) to accurately measure low levels of probabilistic reasoning ability in order to identify people with difficulties in this domain. Item response theory was applied to construct the scale, and to investigate differential item functioning (i.e., whether the items were invariant) across genders, educational levels, and languages. Additionally, we tested the validity of the scale by investigating the relationships between the PRS and several other measures. The results revealed that the items had a low level of difficulty. Nonetheless, the discriminative measures showed that the items can discriminate between individuals with different trait levels, and the test information function showed that the scale accurately assesses low levels of probabilistic reasoning ability. Additionally, through investigating differential item functioning, the measurement equivalence of the scale at the item level was confirmed for gender, educational status, and language (i.e., Italian and English). Concerning validity, the results showed the expected correlations with numerical skills, math‐related attitudes, statistics achievement, IQ, reasoning skills, and risky choices both in the Italian and British samples. In conclusion, the PRS is an ideal instrument for identifying individuals who struggle with basic probabilistic reasoning, and who could be targeted by specific interventions. Copyright © 2017 John Wiley & Sons, Ltd.  相似文献   

15.
本文对多级计分认知诊断测验的DIF概念进行了界定,并通过模拟实验以及实证研究对四种常见的多级计分DIF检验方法的适用性进行理论以及实践性的探索。研究结果表明:四种方法均能对多级计分认知诊断中的DIF进行有效的检验,且各方法的表现受模型的影响不大;相较于以总分为匹配变量,以KS为匹配变量时更利于DIF的检测;以KS为匹配变量的LDFA方法以及以KS为匹配变量的曼特尔检验方法在检测DIF题目时有着最高的检验力。  相似文献   

16.
The three-parameter logistic model is widely used to model the responses to a proficiency test when the examinees can guess the correct response, as is the case for multiple-choice items. However, the weak identifiability of the parameters of the model results in large variability of the estimates and in convergence difficulties in the numerical maximization of the likelihood function. To overcome these issues, in this paper we explore various shrinkage estimation methods, following two main approaches. First, a ridge-type penalty on the guessing parameters is introduced in the likelihood function. The tuning parameter is then selected through various approaches: cross-validation, information criteria or using an empirical Bayes method. The second approach explored is based on the methodology developed to reduce the bias of the maximum likelihood estimator through an adjusted score equation. The performance of the methods is investigated through simulation studies and a real data example.  相似文献   

17.
项目难度与被试能力分布最优匹配的模拟研究   总被引:2,自引:1,他引:1  
李金波  王权 《心理学报》1998,31(2):197-203
该文运用蒙特卡罗方法对被测试能力分布与测验项目难度分布的匹配问题进行模拟分析,分析表明当能力分布为正态分布正偏态分布和负偏态分布时分别与测验项目难度分布与为正态分布,正偏态分布和负偏态分布匹配,比别的匹配有更高测验期望信息值,测验最大信息测验 系数,并且测验信息曲线最大值的能力点与能力分布的众数愈相一致,测验项目参数估计值性真实值的相关也更高。  相似文献   

18.
认知诊断计算机化自适应测验(Cognitive Diagnosis Computerized Adaptive Testing, CD-CAT)是认知诊断评估和计算机化自适应测验两者的结合,兼具认知诊断和自适应测验的特点。目前,针对CD-CAT的研究几乎都集中在0-1二级计分的数据。然而,在教育和心理评估的实际应用中,存在大量的多级计分的数据。因此,本研究探讨了多级计分CD-CAT(Polytomous CD-CAT, PCD-CAT)的实现技术,并提出了2种新的选题方法。通过模拟实验比较了新选题方法和传统选题方法在PCD-CAT的效果,结果表明:在定长PCD-CAT条件下,2种新选题方法的模式分类准确率是最高的,而在非定长PCD-CAT条件下,2种新方法的测验效率也是最高的。  相似文献   

19.
The simultaneous and nonparametric estimation of latent abilities and item characteristic curves is considered. The asymptotic properties of ordinal ability estimation and kernel smoothed nonparametric item characteristic curve estimation are investigated under very general assumptions on the underlying item response theory model as both the test length and the sample size increase. A large deviation probability inequality is stated for ordinal ability estimation. The mean squared error of kernel smoothed item characteristic curve estimates is studied and a strong consistency result is obtained showing that the worst case error in the item characteristic curve estimates over all items and ability levels converges to zero with probability equal to one.  相似文献   

20.
In item response models of the Rasch type (Fischer & Molenaar, 1995), item parameters are often estimated by the conditional maximum likelihood (CML) method. This paper addresses the loss of information in CML estimation by using the information concept of F-information (Liang, 1983). This concept makes it possible to specify the conditions for no loss of information and to define a quantification of information loss. For the dichotomous Rasch model, the derivations will be given in detail to show the use of the F-information concept for making comparisons for different estimation methods. It is shown that by using CML for item parameter estimation, some information is almost always lost. But compared to JML (joint maximum likelihood) as well as to MML (marginal maximum likelihood) the loss is very small. The reported efficiency in the use of information of CML to JML and to MML in several comparisons is always larger than 93%, and in tests with a length of 20 items or more, larger than 99%.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号