Similar Literature
 Found 20 similar documents (search time: 171 ms)
1.
康春花 《心理科学》2003,26(5):887-890
1. Introduction. The Linear Logistic Test Model (LLTM) explains item difficulty, examinees' response probabilities, and examinee ability in terms of stimulus features; in particular, it replaces the single, undifferentiated difficulty parameter of earlier models with a linear combination, so that difficulty can be interpreted from multiple angles and dimensions. LLTM tests, however, are administered like conventional tests: a single administration yields a single global ability value, which precludes a deeper account of examinee ability and, in particular, cannot explain ability values or distinct psychological traits in terms of cognitive processing. For example, examinees with the same ability value have the same probability of answering any item correctly, so their individual differences remain invisible. If instead the cognitive process were decomposed into several psychological components, constructing sub…
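The linear decomposition described above, in which an item's Rasch difficulty is a weighted sum of cognitive-operation difficulties, can be sketched as follows. This is a minimal illustration; the Q-matrix row and basic parameters η are hypothetical values, not taken from the paper:

```python
import math

def lltm_difficulty(q_row, eta):
    """Item difficulty as a linear combination of operation difficulties."""
    return sum(q * e for q, e in zip(q_row, eta))

def lltm_prob(theta, q_row, eta):
    """Probability of a correct response under the Rasch-form LLTM."""
    b = lltm_difficulty(q_row, eta)
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical example: two cognitive operations with basic difficulties eta;
# the item requires the first operation twice and the second once.
eta = [0.5, -0.3]    # basic-parameter (operation) difficulties
q_row = [2, 1]       # how often each operation is required by the item
b = lltm_difficulty(q_row, eta)   # 2*0.5 + 1*(-0.3) = 0.7
p = lltm_prob(1.0, q_row, eta)
```

Because difficulty is built from the η values, each operation's contribution to difficulty can be interpreted separately, which is the multi-dimensional interpretation the abstract refers to.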

2.
In psychological and educational measurement, dichotomous items and testlet items are two common item types, and tests mixing the two have important practical applications. Because an examinee's latent ability often does not match item difficulty, aberrant responses frequently occur, and these degrade the accuracy of latent trait estimation in IRT. Simulation experiments show that, when outliers are present, a robust estimation method for the mixed dichotomous-and-testlet IRT model estimates examinees' latent traits more accurately than maximum likelihood estimation and can meet the needs of practical testing.

3.
Item selection has long been a central question in research on metacognitive control. Using arithmetic problems of varying difficulty and point value as materials, two experiments examined the psychological reality of learning rate and its effect on item selection. In Experiment 1, under untimed conditions, participants completed arithmetic problems of different difficulty and assigned them point values. In Experiment 2, under timed conditions, three item types with different learning rates were constructed by varying difficulty and point value, and participants could choose only one type to work on so as to earn a higher score. Results: first, as the time an item required increased, the points participants assigned to it increased, while the ratio of time (difficulty) to points remained constant, that is, the learning rate was the same; second, participants tended to choose items with higher learning rates, and when learning rates were equal they preferred high-value difficult items. The study confirms the psychological reality of learning rate and establishes it as the main basis for item selection.

4.
As computer-based testing becomes increasingly common in psychological and educational measurement, process data from examinees' responses are ever easier to collect. The hierarchical model provides a basic framework for jointly modeling response times and responses and has become the most popular approach. Although widely used, the hierarchical model cannot fully explain the relationship between response times and responses through parameter correlations alone. Researchers have therefore proposed a series of extended models, but these still have shortcomings. From the new perspective of the bifactor model, this paper treats response times and responses as two local factors measuring examinee speed and ability, respectively, while together they measure a general (global) factor representing the examinee's speed-accuracy trade-off. On this basis, a bifactor hierarchical model is proposed to investigate the dependence between response times and responses. Simulation studies show that Mplus can effectively estimate the parameters of the bifactor hierarchical model, whereas the standard hierarchical model, which ignores the dependence between response times and responses, yields clearly biased parameter estimates. In an empirical analysis, the bifactor hierarchical model outperformed the hierarchical model on all model fit indices. Moreover, the dependence between response time and response differs across examinees and items, affecting response accuracy and response time in different ways.

5.
余嘉元 《心理学报》1990,23(2):95-100
This study used Monte Carlo simulation to generate examinee response matrices under 80 different experimental conditions, then estimated examinee ability with signal detection theory (SDT) and with two item response theory (IRT) models, the Rasch model and the three-parameter logistic model. The results show that the ability estimates obtained by all three methods correlate highly with the true ability values.
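The simulation-and-estimation loop described above can be sketched for the 3PL case. The item parameters and the grid-search maximum likelihood estimator below are hypothetical stand-ins for the study's actual conditions and estimation routines:

```python
import math
import random

def p3pl(theta, a, b, c):
    """Three-parameter logistic item response function."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

def simulate(theta, items, rng):
    """Generate a 0/1 response vector for one examinee."""
    return [1 if rng.random() < p3pl(theta, *it) else 0 for it in items]

def mle_theta(resp, items):
    """Crude maximum likelihood estimate of ability via grid search."""
    def loglik(t):
        return sum(math.log(p3pl(t, *it)) if u else math.log(1 - p3pl(t, *it))
                   for u, it in zip(resp, items))
    grid = [i / 100 for i in range(-400, 401)]
    return max(grid, key=loglik)

rng = random.Random(1)
items = [(1.2, b / 4, 0.2) for b in range(-8, 9)]   # hypothetical 3PL items
resp = simulate(0.5, items, rng)                    # true ability = 0.5
theta_hat = mle_theta(resp, items)
```

Repeating this over many simulated examinees and correlating the estimates with the generating abilities is the kind of recovery check the study reports.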

6.
To examine how differential item functioning (DIF) affects the estimation accuracy of cognitive diagnostic tests, a simulation study crossed 3 proportions of DIF items with 3 DIF magnitudes and examined the effect of DIF on the accuracy of ability and item parameter estimates in 4 cognitive diagnostic tests. Results: (1) DIF has a sizable effect on ability estimation accuracy for the focal group; (2) increasing either the proportion of DIF items or the magnitude of DIF lowers the focal group's ability estimation accuracy; (3) non-uniform DIF harms ability estimation accuracy more than uniform DIF; (4) only the parameter estimates of the DIF items themselves lose accuracy; (5) as the magnitude of DIF grows, the parameter estimation accuracy of DIF items drops further, but it is unaffected by the proportion of DIF items.

7.
One important aim of psychological research is to discover ways and means of psychological intervention. To date, however, effective means of intervening in human causal reasoning are scarce, and their effects are unstable. Using a completely randomized design, two experiments examined whether frequency trees affect undergraduates' performance on causal reasoning problems posed as counterfactual questions and as causal power questions. Results: (1) Both experiments showed a clear graphical facilitation effect: most participants used the PPC value to estimate causal strength when aided by frequency trees that made the nested-set relations explicit (rather than trees that concealed them). (2) Frequency tree type and question format jointly shaped participants' causal strength estimation patterns; the combination of explicit nested-set frequency trees with counterfactual questions led the most participants to use PPC. Conclusions: making the nested-set relations among the data explicit greatly raises the probability that participants estimate causal strength with PPC, and attending to focal-set information helps participants grasp those nested-set relations.

8.
Individuals commonly use multiple strategies when completing many kinds of cognitive tasks, and the difficulty of solving an item differs across strategies. Common measurement models ignore this fact; this study therefore developed the Mix-DINA model based on mixture-distribution item response models. Its main advantages are: (1) it simultaneously reports examinees' knowledge states and their propensity to use each strategy; (2) item parameters are estimated freely for each strategy, which better matches mainstream psychological views. Simulated data verified the effectiveness of the estimation program written for the Mix-DINA model in analyzing various kinds of multi-strategy responses; the results also show that Mix-DINA is reasonably robust when analyzing single-strategy responses. Finally, limitations are discussed and suggestions are offered for further research on multi-strategy cognitive diagnosis.

9.
Aims: to examine the relationship between response time and intelligence and, through model comparison, the properties of the 4PLRT model. Method: taking Raven's Standard Progressive Matrices as an example, the 4PLRT model was compared with the 3PLRT model. Conclusions: (1) The item time parameter correlates positively with the difficulty parameter, and the Raven test can be divided into three major parts according to the time parameters. (2) Comparison of ability estimates: because of the item parameters and examinee speed parameters, examinees with equal ability under CTT differ between timed and untimed conditions. (3) Answering strategy: examinees weigh time against accuracy in light of their own ability. The comparison of ability values under the two models likewise shows that time is an important factor in assessing intelligence.

10.
Estimation Precision of Item Parameters in the Two-Parameter Logistic Model
The precision of item parameter estimates is vital for test construction, and especially for item banking. Most existing research on estimation precision, at home and abroad, proceeds by fixing true item parameter values, generating new estimates with various estimation methods, and comparing bias (BIAS) and root mean square error (RMSE) against the truth to demonstrate a method's effectiveness. This approach, however, cannot reveal how estimation error varies across different true parameter values. To remedy this, the paper studies estimation precision from the perspective of the item parameter estimation information function. Taking the two-parameter logistic model as the object of study, it first defines the estimation information function for item parameters and then, in a completely randomized simulation design with 2×3×2 conditions, explores the factors affecting estimation precision. Results: (1) the estimation precision of both item parameters (a, b) improves as the examinee sample size grows; (2) the examinee ability distribution strongly affects the precision of the difficulty parameter but has relatively little effect on the discrimination parameter; (3) the precision of both the difficulty and the discrimination parameter is jointly affected by parameters a and b.
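The information-function perspective can be illustrated with the standard 2PL Fisher information for the item parameters, accumulated over an examinee sample. The formulas below are the textbook 2PL derivatives; the ability sample is a hypothetical stand-in for the study's simulated conditions:

```python
import math

def p2pl(theta, a, b):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_param_information(thetas, a, b):
    """Fisher information for (a, b), summed over examinees.

    With P = P(theta) and Q = 1 - P, the per-examinee entries are
      I_aa = (theta-b)^2 * P * Q,  I_bb = a^2 * P * Q,
      I_ab = -a * (theta-b) * P * Q.
    """
    I_aa = I_bb = I_ab = 0.0
    for t in thetas:
        p = p2pl(t, a, b)
        pq = p * (1 - p)
        I_aa += (t - b) ** 2 * pq
        I_bb += a ** 2 * pq
        I_ab += -a * (t - b) * pq
    return I_aa, I_bb, I_ab

thetas = [i / 10 for i in range(-30, 31)]   # hypothetical ability sample
info = item_param_information(thetas, a=1.0, b=0.0)
```

Doubling the sample doubles every information entry, which mirrors finding (1): estimation precision improves with sample size. The dependence of each entry on both a and the gap (theta - b) mirrors finding (3).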

11.
Computer-based tests can record the time an examinee spends on each item (response time, RT). As an important source of auxiliary information, RT has considerable value for test development and administration, especially in computerized adaptive testing (CAT). This paper briefly introduces and comments on applications of RT to item selection in CAT, analyzes the practical feasibility of these techniques, and finally discusses open problems in applying RT to CAT item selection along with directions for further research.

12.
This study investigates using response times (RTs) with item responses in a computerized adaptive test (CAT) setting to enhance item selection and ability estimation and control for differential speededness. Using van der Linden’s hierarchical framework, an extended procedure for joint estimation of ability and speed parameters for use in CAT is developed following van der Linden; this is called the joint expected a posteriori estimator (J-EAP). It is shown that the J-EAP estimate of ability and speededness outperforms the standard maximum likelihood estimator (MLE) of ability and speededness in terms of correlation, root mean square error, and bias. It is further shown that under the maximum information per time unit item selection method (MICT)—a method which uses estimates for ability and speededness directly—using the J-EAP further reduces average examinee time spent and variability in test times between examinees above the resulting gains of this selection algorithm with the MLE while maintaining estimation efficiency. Simulated test results are further corroborated with test parameters derived from a real data example.

13.
Can Shao, Jun Li, Ying Cheng. Psychometrika, 2016, 81(4): 1118-1141
Change-point analysis (CPA) is a well-established statistical method to detect abrupt changes, if any, in a sequence of data. In this paper, we propose a procedure based on CPA to detect test speededness. This procedure is not only able to classify examinees into speeded and non-speeded groups, but also identify the point at which an examinee starts to speed. Identification of the change point can be very useful. First, it informs decision makers of the appropriate length of a test. Second, by removing the speeded responses, instead of the entire response sequence of an examinee suspected of speededness, ability estimation can be improved. Simulation studies show that this procedure is efficient in detecting both speeded examinees and the speeding point. Ability estimation is dramatically improved by removing speeded responses identified by our procedure. The procedure is then applied to a real dataset for illustration purposes.
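The change-point idea can be sketched on a raw 0/1 response sequence: scan every split point and keep the one that maximizes a weighted gap between the correct-rates before and after the split. This is a minimal illustration of the CPA principle, not the paper's exact statistic:

```python
def change_point(seq):
    """Locate the most likely change point in a 0/1 response sequence by
    maximizing the weighted gap in correct-rates before and after each split."""
    n = len(seq)
    best_k, best_stat = None, 0.0
    total = sum(seq)
    head = 0
    for k in range(1, n):
        head += seq[k - 1]
        p1 = head / k                  # correct-rate before the split
        p2 = (total - head) / (n - k)  # correct-rate after the split
        stat = abs(p1 - p2) * (k * (n - k) / n) ** 0.5   # CUSUM-like weight
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_k, best_stat

# A hypothetical speeded examinee: accurate early, near-random at the end.
responses = [1] * 15 + [0, 1, 0, 0, 0]
k, stat = change_point(responses)    # k = 15: speeding starts at item 16
```

Removing the responses after the detected point, rather than the whole sequence, is exactly the ability-estimation improvement the abstract describes.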

14.
Current parameter estimation mostly relies on statistical methods, which are time-consuming and require large examinee samples and many items. This paper combines a BP neural network with a dimensionality reduction method to estimate GRM item parameters and examinee ability parameters. Monte Carlo simulations show: (1) whether there are many examinees and few items or many items and few examinees, estimation precision under this network design is high; (2) the approach applies to items scored in many different numbers of grades, achieving high precision even for items with more than 15 grades, which other estimation methods cannot match; (3) running time is greatly reduced compared with statistical estimation methods.

15.
The purpose of this paper is to present a hypothesis testing and estimation procedure, Crossing SIBTEST, for detecting crossing DIF. Crossing DIF exists when the difference in the probabilities of a correct answer for the two examinee groups changes signs as ability level is varied. In item response theory terms, crossing DIF is indicated by two crossing item characteristic curves. Our new procedure, denoted as Crossing SIBTEST, first estimates the matching subtest score at which crossing occurs using least squares regression analysis. A Crossing SIBTEST statistic then is used to test the hypothesis of crossing DIF. The performance of Crossing SIBTEST is evaluated in this study. This research was partially supported by a grant from the Law School Admission Council and by National Science Foundation Mathematics Grant NSF-DMS-94-04327. The research reported here is collaborative in every respect and the order of authorship is alphabetical. The authors thank Jeff Douglas and Louis Roussos for their useful comments and discussions.

16.
In a test setting with known item parameters and examinee scores, ability was estimated under weighted two-, three-, and four-parameter logistic models. The ability step sizes between adjacent score levels were found to be evenly spaced, so examinee scores reflect the score-weighting of polytomously scored items well. Under the weighted two-parameter logistic model, examinee parameter estimates are perturbed: guessing leads to overestimated ability, and slips lead to underestimated ability. Under the weighted three-parameter logistic model with a c parameter, ability overestimation is absent or slight; under the version with a γ parameter, ability underestimation is absent or slight. Under the weighted four-parameter logistic model, neither overestimation nor underestimation appears or is pronounced, making it a comparatively robust method for estimating examinee ability.

17.
A Bayesian random effects model for testlets
Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g. a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design. The authors wish to thank Robert Mislevy, Andrew Gelman and Donald B. Rubin for their helpful suggestions and comments, Ida Lawrence and Miriam Feigenbaum for providing us with the SAT data analyzed in section 5, and the two anonymous referees for their careful reading and thoughtful suggestions on an earlier draft. We are also grateful to the Educational Testing Service for providing the resources to do this research.
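The testlet random effect can be sketched as an additive shift shared by every item nested in the same testlet. This is a minimal 2PL-style illustration; the sign convention (gamma subtracted from theta - b) and all parameter values are assumptions for the sketch, not the paper's exact parameterization:

```python
import math

def p_testlet(theta, a, b, gamma):
    """2PL response probability with an additive testlet effect gamma that is
    shared by all items nested in the same testlet."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b - gamma)))

# Hypothetical examinee and testlet: one draw of gamma shifts every item in
# the passage-based testlet the same way, inducing within-testlet dependence
# that a conditional-independence model would miss.
theta, gamma = 0.0, 0.4
items = [(1.0, -0.5), (1.2, 0.0), (0.8, 0.5)]   # (a, b) for one testlet
probs = [p_testlet(theta, a, b, gamma) for a, b in items]
indep = [p_testlet(theta, a, b, 0.0) for a, b in items]   # gamma ignored
```

Because the same gamma enters every item, responses within a testlet move together across examinees; ignoring it (the `indep` case) is what leads to the overstated precision the abstract warns about.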

18.
The causal attributions of learning-disabled (LD) and normally achieving (NA) children in grades 3 through 8 were compared. Attributions were measured by two scales that asked children to attribute hypothetical academic failure situations to factors that were either within (e.g., insufficient effort) or beyond (e.g., insufficient ability, blaming others) their control. Consistent with a learned helplessness hypothesis, LD girls, regardless of age, were more likely than NA children to attribute their failures to factors beyond their control. In contrast, LD boys' explanations for their failures paralleled those of NA children. That is, with increasing age the LD boys were more likely to attribute their failures to insufficient effort. Explanations and implications of sex differences in developmental patterns of LD children's causal attributions are discussed. The authors wish to thank Ruth Dusseault and Betty Wallace for their help in conducting this research. We also wish to thank the teachers, children, and administrators from the Leon County Schools for their cooperation.

19.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
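The IRF-as-weighted-sum-of-ICRFs definition can be made concrete for the graded response model: cumulative boundary curves give the ICRFs by differencing, and the IRF is the score-weighted sum (the expected item score). The discrimination and threshold values below are hypothetical:

```python
import math

def grm_icrfs(theta, a, thresholds):
    """Item category response functions for the graded response model.

    thresholds are ordered b_1 < ... < b_m; the item is scored 0..m.
    """
    def pstar(b):
        # cumulative probability of scoring in category k or above
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))
    cum = [1.0] + [pstar(b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def irf(theta, a, thresholds):
    """IRF as the score-weighted sum of the ICRFs (expected item score)."""
    return sum(k * p for k, p in enumerate(grm_icrfs(theta, a, thresholds)))

probs = grm_icrfs(0.0, 1.5, [-1.0, 0.0, 1.0])
expected_score = irf(0.0, 1.5, [-1.0, 0.0, 1.0])
```

With symmetric thresholds and theta at their center, the expected score is exactly half the maximum score, which is a quick sanity check on the differencing.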

20.
Item Selection Strategies for Computerized Adaptive Testing under the Graded Response Model
陈平  丁树良  林海菁  周婕 《心理学报》2006,38(3):461-467
Item selection strategies in computerized adaptive testing (CAT) have long drawn the attention of researchers at home and abroad, yet studies on item selection for polytomously scored CAT are rarely reported. Using computer simulation, this study examined four item selection strategies for CAT under the Graded Response Model. The results show that the strategy matching graded difficulty values to the current ability estimate received the best overall evaluation; adding a "shadow item bank" to a selection strategy markedly improves the evenness of item usage; and both the distribution of item parameters and the ability estimation method affect the CAT evaluation indices.
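The best-rated strategy above, matching graded difficulty values to the current ability estimate, can be sketched as picking the unused item whose thresholds lie closest to the estimate. This is an illustrative simplification (averaging the thresholds); the item bank is hypothetical:

```python
def select_item(theta_hat, bank, used):
    """Pick the unused item whose average graded difficulty is closest to the
    current ability estimate (a simplified difficulty-matching strategy)."""
    best, best_gap = None, float("inf")
    for idx, thresholds in enumerate(bank):
        if idx in used:
            continue
        gap = abs(sum(thresholds) / len(thresholds) - theta_hat)
        if gap < best_gap:
            best, best_gap = idx, gap
    return best

bank = [[-2.0, -1.0], [-0.5, 0.5], [1.0, 2.0]]   # hypothetical GRM thresholds
first = select_item(0.0, bank, used=set())        # item centered on theta_hat
second = select_item(0.0, bank, used={first})
```

A "shadow item bank", as the abstract notes, would constrain which items are eligible at each step so that exposure is spread more evenly across the bank.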


Copyright©北京勤云科技发展有限公司  京ICP备09084417号