Similar literature
 Found 20 similar articles; search time: 31 ms
1.
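Meng Xiangbin 《心理科学》(Journal of Psychological Science) 2016,39(3):727-734
In recent years, modeling item response time data has become one of the most active directions in psychological and educational measurement. To address the shortcomings of the lognormal model and the Box-Cox normal model for response times, this paper builds a log-linear model for response times based on the skew-normal distribution within van der Linden's hierarchical modeling framework, and gives a Markov chain Monte Carlo (MCMC) algorithm for estimating the model parameters. Results of a simulation study and an empirical analysis both show that, compared with the lognormal and Box-Cox normal models, the log-skew-normal model fits better and is more flexible and more widely applicable.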

2.
With computerized testing, it is possible to record both the responses of test takers to test questions (i.e., items) and the amount of time spent by a test taker in responding to each question. Various models have been proposed that take into account both test-taker ability and working speed, with many of these models assuming a constant working speed throughout the test. The constant working speed assumption may be inappropriate for various reasons. For example, a test taker may need to adjust the pace due to time mismanagement, or a test taker who started out working too fast may reduce the working speed to improve accuracy. A model is proposed here that allows for variable working speed. An illustration of the model using the Amsterdam Chess Test data is provided.

3.
Current modeling of response times on test items has been strongly influenced by the paradigm of experimental reaction-time research in psychology. For instance, some of the models have a parameter structure that was chosen to represent a speed-accuracy tradeoff, while others equate speed directly with response time. Also, several response-time models seem to be unclear as to the level of parametrization they represent. A hierarchical framework for modeling speed and accuracy on test items is presented as an alternative to these models. The framework allows a “plug-and-play approach” with alternative choices of models for the response and response-time distributions as well as the distributions of their parameters. Bayesian treatment of the framework with Markov chain Monte Carlo (MCMC) computation facilitates the approach. Use of the framework is illustrated for the choice of a normal-ogive response model, a lognormal model for the response times, and multivariate normal models for their parameters with Gibbs sampling from the joint posterior distribution. This study received funding from the Law School Admission Council (LSAC). The opinions and conclusions contained in this paper are those of the author and do not necessarily reflect the policy and position of LSAC. The author is indebted to the American Institute of Certified Public Accountants for the data set in the empirical example and to Rinke H. Klein Entink for his computational assistance.
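
As a rough illustration of the hierarchical framework described in this abstract, the following Python sketch simulates responses from a normal-ogive model and response times from a lognormal model, with ability and speed drawn from a bivariate normal distribution. The function names, the parametrization a(θ − b), and all numeric values are illustrative assumptions, not taken from the paper; the Gibbs sampler used for estimation is not shown.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, used as the normal-ogive link."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def draw_person(rng, rho=0.5, sd_theta=1.0, sd_tau=0.3):
    """Draw (ability, speed) from a zero-mean bivariate normal.

    rho, sd_theta and sd_tau are hypothetical second-level values.
    """
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    theta = sd_theta * z1
    tau = sd_tau * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return theta, tau


def simulate_person(theta, tau, items, rng):
    """Simulate one test taker's responses and response times.

    items: list of (a, b, alpha, beta) tuples, where a, b are the
    discrimination and difficulty of the response model and alpha, beta
    are the precision and time intensity of the response-time model.
    """
    responses, times = [], []
    for a, b, alpha, beta in items:
        # First level, responses: P(correct) = Phi(a * (theta - b)).
        p_correct = normal_cdf(a * (theta - b))
        responses.append(1 if rng.random() < p_correct else 0)
        # First level, times: log T is normal with mean beta - tau
        # and standard deviation 1 / alpha.
        log_time = rng.gauss(beta - tau, 1.0 / alpha)
        times.append(math.exp(log_time))
    return responses, times


if __name__ == "__main__":
    rng = random.Random(1)
    items = [(1.0, -0.5, 2.0, 4.0), (1.2, 0.0, 1.5, 4.2), (0.8, 0.7, 2.5, 3.9)]
    theta, tau = draw_person(rng)
    print(simulate_person(theta, tau, items, rng))
```

Swapping in a different response or response-time model only requires changing the two first-level lines inside `simulate_person`, which is the sense in which the framework is "plug-and-play".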

4.
Pohl, Steffi; Ulitzsch, Esther; von Davier, Matthias 《Psychometrika》2019,84(3):892-920

Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers’ working speed. As testing moves to computer-based assessment, response times become available, making it possible to model speed and ability simultaneously. Integrating research on response time modeling with research on modeling missing responses, we propose using response times to model missing values due to time limits. We identify similarities between approaches used to account for not-reached items (Rose et al. in ETS Res Rep Ser 2010:i–53, 2010) and the speed-accuracy (SA) model for joint modeling of effective speed and effective ability as proposed by van der Linden (Psychometrika 72(3):287–308, 2007). In a simulation, we show (a) that the SA model can recover parameters in the presence of missing values due to time limits and (b) that the response time model, using item-level timing information rather than a count of not-reached items, results in person parameter estimates that differ from missing data IRT models applied to not-reached items. We propose using the SA model to model the missing data process and using both ability and speed to describe the performance of test takers. We illustrate the application of the model in an empirical analysis.


5.
In order to identify aberrant response-time patterns on educational and psychological tests, it is important to be able to separate the speed at which the test taker operates from the time the items require. A lognormal model for response times with this feature was used to derive a Bayesian procedure for detecting aberrant response times. In addition, a combination of the response-time model with a regular response model in a hierarchical framework was used in an alternative procedure for the detection of aberrant response times, in which collateral information on the test takers’ speed is derived from their response vectors. The procedures are illustrated using a data set for the Graduate Management Admission Test® (GMAT®). A power study was also conducted using simulated cheating behavior on an adaptive test.
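
A minimal sketch of the underlying idea, assuming the lognormal parameters are already known: under a lognormal model, the standardized log-time residual is approximately standard normal, so unusually large absolute values can be flagged. This is a simplified stand-in, not the Bayesian procedure the abstract refers to; the function names and the 1.96 cutoff are assumptions made for illustration.

```python
import math


def log_time_residuals(times, alphas, betas, tau):
    """Standardized residuals alpha_i * (ln t_i - (beta_i - tau)).

    alphas: item precision parameters, betas: item time intensities,
    tau: the test taker's speed (all assumed known here).
    """
    return [
        alpha * (math.log(t) - (beta - tau))
        for t, alpha, beta in zip(times, alphas, betas)
    ]


def flag_aberrant(times, alphas, betas, tau, cutoff=1.96):
    """Return indices of items whose residual exceeds the cutoff in size."""
    residuals = log_time_residuals(times, alphas, betas, tau)
    return [i for i, z in enumerate(residuals) if abs(z) > cutoff]
```

Because the item time intensity `beta` and the person speed `tau` enter the residual separately, a fast test taker is not flagged merely for being fast overall, only for times that are unusual given that speed.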

6.
In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantially different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed.

7.
Two new tests for a model for the response times on pure speed tests by Rasch (1960) are proposed. The model is based on the assumption that the test response times are approximately gamma distributed, with known index parameters and unknown rate parameters. The rate parameters are decomposed into a subject ability parameter and a test difficulty parameter. By treating the ability as a gamma distributed random variable, maximum marginal likelihood (MML) estimators for the test difficulty parameters and the parameters of the ability distribution are easily derived. The model tests proposed here also fall within the MML framework. Two tests or modification indices are proposed. The first one is focused on the assumption of local stochastic independence, the second one on the assumption of the test characteristic functions. The tests are based on Lagrange multiplier statistics, and can therefore be computed using the parameter estimates under the null model. As a result, model violations for all items and pairs of items can be assessed as a by-product of one single estimation run. Power studies and applications to real data are included as numerical examples.

8.
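Zhan Peida 《心理科学》(Journal of Psychological Science) 2019,(1):170-178
With the development of psychological and educational measurement research and advances in technology, computerized (large-scale) testing has drawn increasing attention. To explore how response time data can be used to help assess multidimensional latent abilities in computerized multidimensional tests, and to provide theoretical support, in terms of data analysis methods, for monitoring the quality of compulsory education in China, this study takes the computerized mathematics test data from the 2012 and 2015 Programme for International Student Assessment (PISA) as an example and proposes a joint response and response-time multidimensional Rasch model that uses response time and response accuracy data simultaneously. The analysis of the PISA data with the new model shows that introducing response time data not only helps improve the precision of the model parameter estimates, but also helps analysts use test takers' response time information for further decisions and interventions (e.g., diagnosing aberrant response behavior or prerequisite knowledge).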

9.
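This paper extends the promising HO-DINA model, generalizing it from dichotomously (0-1) scored items to polytomously scored items. The model parameters were estimated with an MCMC algorithm, and the performance of the new model was studied. The findings were: (1) The polytomous HO-DINA model developed here achieves high parameter estimation precision and high diagnostic accuracy. (2) The more attributes the polytomous HO-DINA model diagnoses, the less precise the estimates of the attribute parameters and the s parameters and the lower the attribute diagnostic accuracy (MMR and PRM), whereas the estimates of the ability parameters and the g parameters become more precise. (3) Under the current conditions, to keep the attribute pattern correct classification rate above 80%, it is recommended that no more than 7 attributes be diagnosed.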

10.
A new method is proposed for estimating factor means and factor covariances in a group of individuals selected on their observed scores. The selection variable is, for example, the total score on an admissions test. Given a factor model for the test items based on the group of test takers, we may be interested in the factor structure for those in the top quartile. The differences in factor means and covariances between this selected group and the full group give useful information both on successful test performance and on test validity. The new method draws on the classic Pearson-Lawley selection formulas. It avoids the fallacy of factor analysis on the selected group, which would lead to incorrect estimates. The new method is applied to a simple factor structure model for the GMAT test. Although the majority of the GMAT items test verbal skills, it is found that a quantitative factor shows the greatest change in moving from average to top quartile test takers.

11.
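Most current IRT models that incorporate response times apply only to dichotomously (0-1) scored data, which greatly limits the practical use of response-time IRT models. Building on traditional response-time IRT models for dichotomous scoring, this paper develops a polytomous response-time model. Within a hierarchical modeling framework, the generalized partial credit model (GPCM) and the lognormal model are used to build a polytomous IRT model that incorporates response times (denoted JRT-GPCM here), and a fully Bayesian MCMC algorithm is used to estimate the parameters of the new model. To verify the feasibility of the JRT-GPCM and its practical application, two studies were conducted: Study 1 was a simulation study, and Study 2 applied the new model to the Neuroticism subscale of a Big Five personality inventory. Study 1 showed that the JRT-GPCM yields precise estimates and is fairly robust. Study 2 showed that test takers' latent trait is positively correlated with response speed to some degree, and that the results support the "distance-difficulty hypothesis" proposed by Ferrando and Lorenzo-Seva (2007), namely that the closer a test taker's latent trait is to an item's difficulty threshold, the more time the test taker spends responding to the item. Overall, this study provides new methodological support for extending the use of response time information in psychological measurement and education.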

12.
The purpose of this study is to explore patterns in model-data fit related to subgroups of test takers from a large-scale writing assessment. Using data from the SAT, a calibration group was randomly selected to represent test takers who reported that English was their best language (EBL) from the total population of test takers (N = 322,011). A reference scale for the items was constructed based on EBL responses. Response behaviors of test takers who reported that English was not their best language (ENBL) were examined in relationship to this reference scale. This study illustrates the use of differential subgroup analyses to identify patterns related to person misfit within subgroups, as well as subsets of items, that may affect the validity of writing scores for ENBL test takers. The methodology described here offers an approach that can be used to explore, understand, and improve the validity of scores obtained from ENBL test takers in large-scale writing assessments.

13.
Marginal maximum‐likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first‐level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

14.
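Cognitive diagnostic computerized adaptive testing that allows answer revision (Reviewable Cognitive Diagnostic Computerized Adaptive Testing, RCD-CAT) helps diagnose test takers' knowledge states more accurately. The item pocket (IP) method gives test takers the opportunity to set items aside and answer or revise them later, and the modified item pocket (MIP) method rescores items revised within the pocket. A simulation study compared the performance of IP, MIP, stocking Ⅰ, and stocking Ⅱ in RCD-CAT. The results showed that the stocking designs performed best, with stocking Ⅱ slightly better than stocking Ⅰ; that the correct classification rates of the IP and MIP methods were lower than those of conventional CD-CAT; and that the stocking designs show good promise for application in RCD-CAT.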

15.
The study investigates empirical properties of reasoning speed, which is conceived as the fluency of solving reasoning problems. Responses and response times in reasoning tasks are modeled jointly to clarify the covariance structure of reasoning speed and reasoning ability. To determine underlying abilities, the predictive validities of two cognitive covariates, namely perceptual and executive attention, are investigated. A sample of N = 230 test takers completed a reasoning test, Advanced Progressive Matrices (APM), and attention tests indicating perceptual and executive attention. The two-parameter normal ogive model was applied to the responses and the two-parameter lognormal model to the response times. Results suggest that reasoning speed is a unidimensional construct representing significant individual differences, and that reasoning speed and ability are negatively correlated but clearly distinguishable constructs. Perceptual and executive attention showed differential effects on reasoning speed and reasoning ability, i.e., reasoning speed is explained by executive attention only, while reasoning ability is explained by both covariates. Implications for the assessment of reasoning are discussed.

16.
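Although multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to build each module and panel under given constraints, local item dependence (LID) among items, which arises when latent factors are overlooked during test construction, can also harm MST results. To investigate the harm LID does to MST, this study first introduces MST, LID, and related concepts, and then examines the issue in a simulation study. The results show that LID reduces the precision of ability estimates while the estimation bias remains small, and that this harm is not limited to any particular routing rule. To remove the harm, a testlet response model was then used as the analysis model during MST administration; the results show that this approach removes part of the harm but that its effect is limited. This indicates, on the one hand, that the harm LID does to the precision of ability estimation in MST deserves attention and, on the other, that methods for eliminating the harm caused by LID in MST need further study.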

17.
This paper examines the effect of (1) delay between learning and test and (2) associative interference on memory retrieval speed. The speed-accuracy tradeoff methodology, which interrupts the retrieval process at various times (0.3, 0.7, 1.0, 1.5, 2.0, and 3.0 sec) after presentation of the test item, provides a means of separating retrieval speed effects from effects on overall memory strength. Performance at short processing times is an index of retrieval speed. Performance given ample processing time is a measure of asymptotic accuracy, or memory strength. Increasing the delay between learning and test or introducing interference relations lowered memory trace strength, as reflected in asymptotic accuracy. Items tested shortly (about 3 sec) after learning showed a significant speedup in retrieval relative to items tested at a longer (several-minute) delay. Further analysis suggested that the delay effect on retrieval was primarily the result of immediate repetition, or testing of the last-learned item. The interference manipulation showed a slight and nonsignificant tendency toward slowing of memory retrieval. The implications of these results for various models of retrieval are explored via simulations. The results of all the simulations suggested a direct-access retrieval process where associations are processed largely in parallel. Contradiction or mismatch information in recognizing new items was important because it provided an explanation for a slight slowing in retrieval due to interference even with a parallel-processing assumption. Faster retrieval for the last-learned item may be the result of residual activation following active processing.

18.
Klotzke, Konrad; Fox, Jean-Paul 《Psychometrika》2019,84(3):649-672

A multivariate generalization of the log-normal model for response times is proposed within an innovative Bayesian modeling framework. A novel Bayesian Covariance Structure Model (BCSM) is proposed, where the inclusion of random-effect variables is avoided, while their implied dependencies are modeled directly through an additive covariance structure. This makes it possible to jointly model complex dependencies due to, for instance, the test format (e.g., testlets, complex constructs), time limits, or features of digitally based assessments. A class of conjugate priors is proposed for the random-effect variance parameters in the BCSM framework. They give support to testing the presence of random effects, reduce boundary effects by allowing non-positive (co)variance parameters, and support accurate estimation even for very small true variance parameters. The conjugate priors under the BCSM lead to efficient posterior computation. Bayes factors and the Bayesian Information Criterion are discussed for the purpose of model selection in the new framework. In two simulation studies, satisfactory performance of the MCMC algorithm and of the Bayes factor is shown. In comparison with parameter expansion through a half-Cauchy prior, estimates of variance parameters close to zero show no bias and undercoverage of credible intervals is avoided. An empirical example showcases the utility of the BCSM for response times to test the influence of item presentation formats on the test performance of students in a Latin square experimental design.


19.
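A comparison of two Markov chain Monte Carlo methods for handling missing data in the 2PL model   Total citations: 1, self-citations: 0, citations by others: 1
Zeng Li  Xin Tao  Zhang Shumei 《心理学报》(Acta Psychologica Sinica) 2009,41(3):276-282
Markov chain Monte Carlo (MCMC) is a typical approach to handling missing data in item response theory. Through a simulation study, this article compares the parameter estimation accuracy of two MCMC methods (M-H within Gibbs and DA-T Gibbs) under different numbers of test takers, numbers of items, and missing-data proportions, and complements the comparison with an empirical study. The results show that the two methods differ: for both, item parameter estimation is strongly affected by the number of test takers and less affected by the missing-data proportion. With large samples and small missing-data proportions, M-H within Gibbs yields a slightly smaller root mean square error (RMSE) of the parameter estimates; as the sample size decreases or the missing-data proportion increases, DA-T Gibbs gradually outperforms M-H within Gibbs.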
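
For reference, the RMSE criterion used in simulation comparisons of this kind can be computed as in the generic sketch below. The function name is illustrative and no numbers from the study are reproduced.

```python
import math


def rmse(estimates, true_values):
    """Root mean square error between estimated and true parameters."""
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))
```

In a simulation study, `true_values` would be the generating item parameters and `estimates` the posterior means returned by each sampler, so a smaller value indicates better recovery.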

20.
For detecting differential item functioning (DIF) between two or more groups of test takers in the Rasch model, their item parameters need to be placed on the same scale. Typically this is done by choosing a set of so-called anchor items based on statistical tests or heuristics. Here the authors suggest an alternative strategy: By means of an inequality criterion from economics, the Gini Index, the item parameters are shifted to an optimal position where the item parameter estimates of the groups best overlap. Several toy examples, extensive simulation studies, and two empirical application examples are presented to illustrate the properties of the Gini Index as an anchor point selection criterion and compare its properties to those of the criterion used in the alignment approach of Asparouhov and Muthén. In particular, the authors show that, in addition to the globally optimal position for the anchor point, the criterion plot contains valuable additional information and may help discover unaccounted DIF-inducing multidimensionality. They further provide mathematical results that enable an efficient sparse grid optimization and make it feasible to extend the approach, for example, to multiple group scenarios.
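
The following Python sketch illustrates the general idea under stated assumptions: the focal group's item parameters are shifted by a constant, the Gini index of the absolute between-group differences is computed, and the shift with the largest index is kept (high inequality meaning that most differences are near zero and only a few are large). Restricting the candidate shifts to the per-item differences is a simplification standing in for the sparse grid mentioned in the abstract. The function names and the two-group setup are hypothetical, and whether the published criterion is applied in exactly this way is an assumption.

```python
def gini_index(values):
    """Gini index of non-negative values (0 = all equal, near 1 = unequal)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # All differences are zero: no inequality to measure.
        return 0.0
    pairwise = sum(abs(x - y) for x in values for y in values)
    return pairwise / (2.0 * n * total)


def best_shift(ref_params, focal_params):
    """Pick the shift of the focal scale that maximizes the Gini index.

    Candidate shifts are those that make one item's difference exactly
    zero (an assumed, simplified search grid).
    """
    candidates = [r - f for r, f in zip(ref_params, focal_params)]

    def criterion(shift):
        diffs = [abs(r - (f + shift)) for r, f in zip(ref_params, focal_params)]
        return gini_index(diffs)

    return max(candidates, key=criterion)
```

With reference parameters `[-1.0, 0.0, 1.0, 2.0]` and focal parameters `[-1.5, -0.5, 0.5, 3.0]`, three items line up after the same shift while the fourth does not, which is the kind of pattern the criterion is meant to reward.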
