Similar Literature
Found 20 similar documents (search time: 250 ms)
1.
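Building on a review of the concepts and methods of insufficient effort responding (IER), this paper uses real test data from a psychological scale to show how different methods can be combined to identify IER. It mainly (1) explores the sensitivity of different indices to IER patterns and discusses why multiple indices are needed and how to choose among them; (2) analyzes the negative effects of IER on the computed indices of the measurement instrument; and (3) summarizes methods and caveats for cleaning IER data, offering data-cleaning recommendations for improving the quality of scale data.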
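One screening index often discussed in the IER literature is the long-string index, the longest run of identical consecutive answers. The abstract does not say which indices the paper uses, so the sketch below is only an illustration: the function names, the toy data, and the cutoff of 6 are all assumptions.

```python
from itertools import groupby


def longest_string(responses):
    """Length of the longest run of identical consecutive responses."""
    return max((sum(1 for _ in run) for _, run in groupby(responses)), default=0)


def flag_long_string(data, cutoff):
    """Indices of respondents whose longest run reaches the (assumed) cutoff."""
    return [i for i, row in enumerate(data) if longest_string(row) >= cutoff]


# Hypothetical 5-point Likert responses, one row per respondent.
data = [
    [1, 3, 2, 4, 5, 2, 1, 3],  # varied responding
    [3, 3, 3, 3, 3, 3, 3, 3],  # straight-lining
    [2, 2, 4, 4, 4, 1, 5, 3],  # short runs only
]
flagged = flag_long_string(data, cutoff=6)
print(flagged)
```

A single index of this kind catches only one response pattern (straight-lining), which is consistent with the abstract's point that several indices should be used together.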

2.
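Peida Zhan (詹沛达). Journal of Psychological Science (《心理科学》), 2019, (1): 170-178
With the development of psychological and educational measurement research and advances in technology, computerized (large-scale) testing has attracted growing attention. To explore how response time data can support the assessment of multidimensional latent abilities in computerized multidimensional tests, and to provide methodological support for data analysis in monitoring the quality of compulsory education in China, this study takes the computerized mathematics test data from the 2012 and 2015 Programme for International Student Assessment (PISA) as an example and proposes a joint responses-and-times multidimensional Rasch model that uses response time and response accuracy data simultaneously. Analysis of the PISA data with the new model indicates that introducing response time data not only helps improve the precision of model parameter estimates but also helps analysts use examinees' response time information for further decisions and interventions (e.g., diagnosing aberrant response behavior or prior knowledge).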

3.
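Questionnaire surveys are a very common data collection method in psychology and education, but careless responding by participants can distort questionnaire data. A review of existing research shows that: (a) careless responding can be defined in terms of either the observable response pattern or the underlying cause; (b) common ex ante control methods fall into two broad types, reducing task difficulty and increasing participants' motivation to respond; and (c) post hoc identification methods fall into three broad types: embedded detection scales, response pattern identification, and response time identification. Future research should optimize and develop control methods based on studies of response mechanisms, test the cross-context applicability of identification methods and develop new ones, and examine the identification and handling of partial careless responding in greater depth.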

4.
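Peida Zhan (詹沛达), Hong Jiao, Kaiwen Man. Acta Psychologica Sinica (《心理学报》), 2020, 52(9): 1132-1142
In psychological and educational measurement, latent processing speed reflects how efficiently students apply their latent abilities to solve problems. To explore the multidimensionality of latent processing speed in multidimensional tests and to estimate its parameters, this study proposes a multidimensional lognormal response time model. Empirical data analysis and simulation results show that: (1) latent processing speed has a multidimensional structure that matches that of latent ability; (2) the new model accurately estimates individual-level multidimensional latent processing speed and the item parameters related to response time; and (3) the negative impact of redundantly specifying latent processing speed as multidimensional is smaller than that of ignoring its multidimensionality.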

5.
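Against the background of a large-scale academic achievement test, a between-groups design was used in which experts were divided into five groups by category. In the data feedback phase, each expert group was randomly given either the unadjusted real examinee response data or data adjusted upward or downward by 0.5 or 1 standard deviation. One-way ANOVA and the two-parameter item response theory model were then used to examine how the expert groups referred to and used item response data in the Angoff standard-setting method. The results show that feeding back examinees' item response data clearly affects the Angoff standard-setting results: feeding back the unadjusted real data has a relatively large effect; feeding back adjusted data higher than the real data has a relatively small effect, whereas feeding back data lower than the real data has a relatively large effect.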

6.
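In psychological and educational testing, computerized administration is increasingly common, making it ever easier to collect process data on examinees' responses. The hierarchical model provides a basic modeling framework for the joint analysis of response times and responses, and it has gradually become the most popular approach. Although the hierarchical model is widely used, the relationships among its parameters alone cannot adequately explain the relationship between response times and responses. Researchers have therefore proposed a series of improved models, but some shortcomings remain. From the new perspective of the bifactor model, this paper treats a test's response times and responses as measuring two specific factors, the examinee's speed and ability respectively, while both also jointly measure a general ability or global factor that reflects the examinee's speed-accuracy tradeoff. On this basis, a bifactor hierarchical model is proposed to examine the dependence between response times and responses. A simulation study found that Mplus can effectively estimate the parameters of the bifactor hierarchical model, whereas parameter estimates from the hierarchical model that ignores this dependence are clearly biased. In the empirical data analysis, the bifactor hierarchical model showed better model fit indices than the hierarchical model. In addition, the dependence between response times and responses differs across examinees and items, and thus affects examinees' response accuracy and time differently.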

7.
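As computer-based testing becomes widespread, it has become possible to collect, record, and analyze examinees' item response time data, and a growing number of researchers use these data to study the detection of cheating on examinations. However, such studies are common abroad but very rare in China. This paper proposes a "two kinds, three categories" classification standard for cheating behaviors, reviews response-time-based cheating research along two dimensions (parametric and nonparametric modeling methods), evaluates the application and detection performance of these methods for each category of cheating behavior, and offers an outlook on future research directions.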

8.
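This study proposes four longitudinal item response time (RT) models for tracking changes in latent processing speed. The four share the same measurement model and differ mainly in the structural model that describes how latent processing speed changes over time. Empirical results show that all four are practically applicable and yield highly consistent data analysis results. Simulation results show good parameter recovery for all four under different simulated conditions. In sum, the four longitudinal RT models proposed here are applicable and have good psychometric properties; they not only enrich the methods for analyzing longitudinal RT data but also extend the range of application of longitudinal latent variable models.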

9.
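Joint analysis of multimodal data is a principal way to improve outcome evaluation and strengthen comprehensive evaluation. To address the limitation that probabilistic cognitive diagnosis models (CDMs) can analyze only item response accuracy (RA), this paper proposes three probabilistic joint CDMs that jointly analyze RA and item response time (RT), based on the joint-hierarchical modeling framework and the joint-cross-loading modeling framework. Simulation and empirical results show that: (1) the new models have good parameter recovery, and additionally introducing RT helps improve the precision of parameter estimates and provides a measure of individuals' processing speed; (2) models built on the joint-cross-loading framework are more compatible with different testing contexts than models built on the joint-hierarchical framework; and (3) probabilistic attributes reflect individuals' mastery of attributes in finer detail than deterministic attributes.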

10.
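Using process data such as response time information in cognitive diagnostic assessment can further improve diagnostic accuracy. By building a regression model between examinees' probability of a correct response and the individual speed parameter, a more parsimonious new model was developed: the RRT-DINA model. Empirical and simulation studies were used to examine the new model's performance in comparison with the JRT-DINA model. The study of PISA 2012 data shows that the RRT-DINA model fits better. The simulation results show that: (1) the parameters of the RRT-DINA model can be estimated with an MCMC algorithm, with high estimation precision. (2) When data are generated from RRT-DINA, the item parameter estimates of RRT-DINA are more precise than those of JRT-DINA; when data are generated from JRT-DINA, the item parameter estimates of JRT-DINA are slightly more precise than those of RRT-DINA. (3) When data are generated from RRT-DINA, the correct classification rate of RRT-DINA is better than that of JRT-DINA; when data are generated from JRT-DINA, the correct classification rate of JRT-DINA is slightly better than that of RRT-DINA, and the gap is small.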

11.
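Computer-based testing makes it possible to collect response time information, and the effective use of this information has had a major impact on both theoretical research and practical applications in psychological and educational testing. First, the paper summarizes five major advantages of using response time information in tests. Second, it introduces typical response time models under four different approaches, together with their characteristics, and evaluates each. Third, it reviews fairly systematically how response time models are applied in practice, so that readers can understand the role response time information plays in testing. Finally, it discusses several directions for future research on applying response times in psychological and educational measurement.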

12.
Abstract

For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As a large number of major testing programs are moving or have already moved to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuable information on the processes associated with nonresponse. Bringing together research on item omissions with approaches for modeling response time data, we propose a framework for simultaneously modeling response behavior and omission behavior utilizing timing information for both. As such, the proposed model makes it possible (a) to gain a deeper understanding of response and nonresponse behavior in general and, in particular, of the processes underlying item omissions in LSAs, (b) to model the processes determining the time examinees require to generate a response or to omit an item, and (c) to account for nonignorable item omissions. Parameter recovery of the proposed model is studied within a simulation study. An illustration of the model by means of an application to real data is provided.

13.
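Multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to construct each module and panel under given constraints. However, if some latent factor is overlooked during test construction and local item dependence (LID) arises among items, MST results can be harmed. To examine the harm of LID to MST, this study first introduces MST, LID, and related concepts. It then investigates the question through a simulation study; the results show that LID affects the precision of examinee ability estimation, although the estimation bias remains small, and that the harm is not limited to any particular routing rule. To eliminate the harm, the testlet response model was then used as the analysis model during MST administration; the results show that this approach removes part of the harm but its effect is limited. On the one hand, this indicates that the harm LID does to the precision of ability estimation in MST deserves attention; on the other hand, it indicates that methods for eliminating the harm caused by LID in MST warrant further study.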

14.
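Item information in the graded response model of item response theory   Total citations: 7, self-citations: 1, citations by others: 6
The information function is an important concept in item response theory and is of great practical value in item and test analysis and in guiding test construction. Its application is central in computerized adaptive testing, where it has received the most attention. However, there has been relatively little research on the properties of information functions for polytomously scored items. This study simulated examinee trait-level parameters and item parameters: 121 trait-level points were generated for examinees, and four batches of item parameters with different discrimination parameters were generated, each batch containing 126 items with different combinations of difficulty-grade parameters and each item having 5 difficulty grades. The analysis found that the trait level at which a graded response model item provides maximum information corresponds to a group of several mutually adjacent difficulty grades of that item; it corresponds neither to a single difficulty grade alone nor necessarily to all difficulty grades. This study calls this regularity "dominance of adjacent difficulty grades." The finding has important guiding significance for test quality analysis and test construction, including the construction of computerized adaptive tests.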
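To make the quantity under study concrete, the following is a minimal sketch of item information under Samejima's graded response model, evaluated on a grid of trait levels. The abstract gives no parameter values, so the discrimination `a`, the five threshold values, and the grid from -3 to 3 in steps of 0.05 (which happens to give 121 points) are assumptions chosen only for illustration.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Item information of a graded response model item at trait level theta.

    a: discrimination parameter; thresholds: ordered boundary difficulties.
    """
    # Cumulative probabilities P*(X >= k), padded with 1 (lowest) and 0 (highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]  # category probability
        # Derivative of the category probability with respect to theta.
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info


# Assumed illustration values, not taken from the study.
a = 1.5
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
thetas = [-3.0 + 0.05 * i for i in range(121)]
peak_theta = max(thetas, key=lambda t: grm_item_information(t, a, thresholds))
print(round(peak_theta, 2))
```

Comparing `peak_theta` with the individual thresholds for differently spaced threshold sets is one simple way to explore where the information peak falls relative to the difficulty grades.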

15.
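Using a cognitive ability test for children aged 4 to 5 as an example, this study examines how to analyze the measurement invariance of longitudinal data within the IRT framework. The analysis used the between-item multidimensional item response theory model (between-item MIRT model) and the within-item multidimensional two-tier model (within-item MIRT model); the participants were 882 children aged 48 months from across China, and the instrument was a self-developed cognitive ability test for children aged 4 to 5. Test-level and item-level analyses showed that: (1) the method used here to analyze measurement invariance of longitudinal data is reasonable and effective; (2) the test satisfies partial measurement invariance across the two time points, and its latent structure is stable; (3) both the discrimination and difficulty parameters of the "orientation item" changed, and the difficulty parameters of another 4 items drifted; and (4) children's cognitive ability develops rapidly overall between ages 4 and 5, with significant ability growth.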

16.
Computerized classification testing (CCT) commonly chooses items maximizing information at the cut score, which yields the most information for decision-making. However, a corollary problem is that all examinees will be given the same set of items, resulting in a high test overlap rate and unbalanced item bank usage, which threatens test security. Moreover, another pivotal issue for CCT is time control. Since both extremely long response times (RTs) and large RT variability across examinees intensify time-induced anxiety, it is crucial to reduce the number of examinees exceeding the time limit and the differences between examinees' test-taking times. To satisfy these practical needs, this paper proposes the novel idea of stage adaptiveness to tailor the item selection process to the decision-making requirement in each step and generate fresh insight into the existing response time selection method. Results indicate that balanced item usage as well as short and stable test times across examinees can be achieved via the new methods.

17.
This paper deals with optimal partitioning of limited testing time in order to achieve maximum total test score. Nonlinear optimization theory was used to analyze this problem. A general case using a generic item response model is first presented. A special case that applies a response time model proposed by Wang and Hanson (2005) is also presented. Theoretical properties of the optimal solution are derived. Their practical implications for optimal test-taking strategies are also discussed. The theoretical properties are in general agreement with the conventional advice to examinees on pacing strategy.
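The kind of problem described can be written schematically as a constrained maximization. The notation below is an assumption introduced only for illustration and is not taken from the paper: $P_j(t_j)$ stands for the probability of answering item $j$ correctly when $t_j$ units of time are spent on it, and $T$ is the total time available.

```latex
\begin{aligned}
&\max_{t_1,\dots,t_n}\ \sum_{j=1}^{n} P_j(t_j)
\quad \text{subject to} \quad \sum_{j=1}^{n} t_j = T,\qquad t_j \ge 0, \\[4pt]
&\mathcal{L}(t,\lambda) = \sum_{j=1}^{n} P_j(t_j) - \lambda\Bigl(\sum_{j=1}^{n} t_j - T\Bigr), \\[4pt]
&P_j'(t_j^{*}) = \lambda \ \text{ if } t_j^{*} > 0,
\qquad
P_j'(0) \le \lambda \ \text{ if } t_j^{*} = 0 .
\end{aligned}
```

Under these assumptions, the first-order conditions say that every item receiving a positive share of time has the same marginal gain in expected score per extra unit of time, while an item that receives no time offers a marginal gain no larger than that common value.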

18.
There are a growing number of item response theory (IRT) studies that calibrate different patient-reported outcome (PRO) measures, such as anxiety, depression, physical function, and pain, on common, instrument-independent metrics. In the case of depression, it has been reported that there are considerable mean score differences when scoring on a common metric from different, previously linked instruments. Ideally, those estimates should be the same. We investigated to what extent those differences are influenced by different scoring methods that take into account several levels of uncertainty, such as measurement error (through plausible value imputation) and item parameter uncertainty (through full Bayesian IRT modeling). Depression estimates from different instruments were more similar, and their corresponding confidence/credible intervals were larger when plausible value imputation or Bayesian modeling was used, compared to the direct use of expected a posteriori (EAP) estimates. Furthermore, we explored the use of Bayesian IRT models to update item parameters based on newly collected data.

19.
In real testing, examinees may manifest different types of test-taking behaviours. In this paper we focus on two types that appear to be among the more frequently occurring behaviours: solution behaviour and rapid guessing behaviour. Rapid guessing usually happens in high-stakes tests when there is insufficient time, and in low-stakes tests when there is lack of effort. These two qualitatively different test-taking behaviours, if ignored, will lead to violation of the local independence assumption and, as a result, yield biased item/person parameter estimation. We propose a mixture hierarchical model to account for differences among item responses and response time patterns arising from these two behaviours. The model is also able to identify the specific behaviour an examinee engages in when answering an item. A Monte Carlo expectation maximization algorithm is proposed for model calibration. A simulation study shows that the new model yields more accurate item and person parameter estimates than a non-mixture model when the data indeed come from two types of behaviour. The model also fits real, high-stakes test data better than a non-mixture model, and therefore the new model can better identify the underlying test-taking behaviour an examinee engages in on a certain item.

20.
Computerized adaptive testing under nonparametric IRT models   Total citations: 1, self-citations: 0, citations by others: 1
Nonparametric item response models have been developed as alternatives to the relatively inflexible parametric item response models. An open question is whether it is possible and practical to administer computerized adaptive testing with nonparametric models. This paper explores the possibility of computerized adaptive testing when using nonparametric item response models. A central issue is that the derivatives of item characteristic curves may not be estimated well, which eliminates the availability of the standard maximum Fisher information criterion. As alternatives, procedures based on Shannon entropy and Kullback–Leibler information are proposed. For a long test, these procedures, which do not require the derivatives of the item characteristic curves, become equivalent to the maximum Fisher information criterion. A simulation study is conducted to study the behavior of these two procedures, compared with random item selection. The study shows that the procedures based on Shannon entropy and Kullback–Leibler information perform similarly in terms of root mean square error, and perform much better than random item selection. The study also shows that item exposure rates need to be addressed for these methods to be practical. The authors would like to thank Hua Chang for his help in conducting this research.
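A derivative-free selection rule of the Shannon entropy kind can be sketched as follows: keep a posterior over a discrete ability grid and pick the item whose answer is expected to leave the least posterior entropy. This is a generic illustration, not the paper's exact procedure; the tabulated item characteristic curves, the three-point grid, and the function names are all assumptions.

```python
import math


def entropy(dist):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def expected_posterior_entropy(posterior, icc):
    """Expected entropy of the ability posterior after one dichotomous item.

    posterior: probabilities over the ability grid.
    icc: tabulated P(correct) at each grid point (no derivatives needed).
    """
    total = 0.0
    for probs in (icc, [1 - p for p in icc]):  # correct, then incorrect
        joint = [w * p for w, p in zip(posterior, probs)]
        marginal = sum(joint)  # probability of this response
        if marginal > 0:
            total += marginal * entropy([j / marginal for j in joint])
    return total


def select_item(posterior, iccs, administered):
    """Index of the unused item that minimizes expected posterior entropy."""
    candidates = [j for j in range(len(iccs)) if j not in administered]
    return min(candidates, key=lambda j: expected_posterior_entropy(posterior, iccs[j]))


# Hypothetical tabulated curves on a three-point ability grid.
uniform = [1 / 3, 1 / 3, 1 / 3]
iccs = [
    [0.5, 0.5, 0.5],  # flat curve: carries no information about ability
    [0.1, 0.5, 0.9],  # increasing curve: informative
]
chosen = select_item(uniform, iccs, administered=set())
print(chosen)
```

Because the rule always favors the most informative items, a sketch like this would overuse them, which matches the abstract's remark that item exposure rates still need to be addressed.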
