首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
A pplications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models – a testlet partial credit model (PCM), multilevel PCM, and PCM – in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.  相似文献   

2.
等级反应模型下计算机化自适应测验选题策略   总被引:7,自引:3,他引:4  
陈平  丁树良  林海菁  周婕 《心理学报》2006,38(3):461-467
计算机化自适应测验(CAT)中的选题策略,一直是国内外相关学者关注的问题。然而对多级评分的CAT的选题策略的研究却很少报导。本研究采用计算机模拟程序对等级反应模型(Graded Response Model)下CAT的四种选题策略进行研究。研究表明:等级难度值与当前能力估计值匹配选题策略的综合评价最高;在选题策略中增设 “影子题库”可以明显提高项目调用的均匀性;并且不同的项目参数分布或不同的能力估计方法都对CAT评价指标有影响  相似文献   

3.
An item response theory (IRT) model is used as a measurement error model for the dependent variable of a multilevel model. The dependent variable is latent but can be measured indirectly by using tests or questionnaires. The advantage of using latent scores as dependent variables of a multilevel model is that it offers the possibility of modelling response variation and measurement error and separating the influence of item difficulty and ability level. The two‐parameter normal ogive model is used for the IRT model. It is shown that the stochastic EM algorithm can be used to estimate the parameters which are close to the maximum likelihood estimates. This algorithm is easily implemented. The estimation procedure will be compared to an implementation of the Gibbs sampler in a Bayesian framework. Examples using real data are given.  相似文献   

4.
蔡艳  苗莹  涂冬波 《心理学报》2016,48(10):1338-1346
本文在0-1评分的CD-CAT基础上, 拓展出了适合多级评分CD-CAT (psCD-CAT)的认知诊断模型及选题策略, 为实现多级评分CD-CAT提供了方法支持。Monte Carlo模拟实验结果表明:本文拓展的多级评分CD-CAT具有较理想的属性诊断正确率及测验效率和题库安全性, 可以用于多级评分数据的CD-CAT; 模拟实验还表明, 整体来看PS-PWKL和PS-HKL两种选题策略具有较高属性判准率、题库安全性和高测验效率, 且均优于PS-KL选题策略。总之, 本研究对于进一步拓展CD-CAT在实践中的应用提供了认知诊断模型与选题策略等。  相似文献   

5.
Psychometric models for item-level data are broadly useful in psychology. A recurring issue for estimating item factor analysis (IFA) models is low-item endorsement (item sparseness), due to limited sample sizes or extreme items such as rare symptoms or behaviors. In this paper, I demonstrate that under conditions characterized by sparseness, currently available estimation methods, including maximum likelihood (ML), are likely to fail to converge or lead to extreme estimates and low empirical power. Bayesian estimation incorporating prior information is a promising alternative to ML estimation for IFA models with item sparseness. In this article, I use a simulation study to demonstrate that Bayesian estimation incorporating general prior information improves parameter estimate stability, overall variability in estimates, and power for IFA models with sparse, categorical indicators. Importantly, the priors proposed here can be generally applied to many research contexts in psychology, and they do not impact results compared to ML when indicators are not sparse. I then apply this method to examine the relationship between suicide ideation and insomnia in a sample of first-year college students. This provides an important alternative for researchers who may need to model items with sparse endorsement.  相似文献   

6.
To clarify the relationship between visual long-term memory (VLTM) and online visual processing, we investigated whether and how VLTM involuntarily affects the performance of a one-shot change detection task using images consisting of six meaningless geometric objects. In the study phase, participants observed pre-change (Experiment 1), post-change (Experiment 2), or both pre- and post-change (Experiment 3) images appearing in the subsequent change detection phase. In the change detection phase, one object always changed between pre- and post-change images and participants reported which object was changed. Results showed that VLTM of pre-change images enhanced the performance of change detection, while that of post-change images decreased accuracy. Prior exposure to both pre- and post-change images did not influence performance. These results indicate that pre-change information plays an important role in change detection, and that information in VLTM related to the current task does not always have a positive effect on performance.  相似文献   

7.
刘玥  刘红云 《心理学报》2012,44(2):263-275
题组模型可以解决传统IRT模型由于题目间局部独立性假设违背时所导致的参数估计偏差。为探讨题组随机效应模型的适用范围, 采用Monte Carlo模拟研究, 分别使用2-PL贝叶斯题组随机效应模型(BTRM)和2-PL贝叶斯模型(BM)对数据进行拟合, 考虑了题组效应、题组长度、题目数量和局部独立题目比例的影响。结果显示:(1) BTRM不受题组效应和题组长度影响, BM对参数估计的误差随题组效应和题组长度增加而增加。(2) BTRM具有一定的普遍性, 且当题组效应大, 题组长, 题目数量大时使用该模型能减少估计误差, 但是当题目数量较小时, 两个模型得到的能力估计误差都较大。(3)当局部独立题目的比例较大时, 两种模型得到的参数估计差异不大。  相似文献   

8.
We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). In these models, the scores that result from a scoring rule that incorporates both the speed and accuracy of item responses are modeled. Our generalization is similar to that of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable type I error rates for the residuals. Finally, the methods were applied to two real data sets. It was found that the two-parameter SARM showed improved fit compared to the one-parameter SARM in both data sets.  相似文献   

9.
梁莘娅  杨艳云 《心理科学》2016,39(5):1256-1267
结构方程模型已被广泛应用于心理学、教育学、以及社会科学领域的统计分析中。结构方程模型分析中最常用的估计方法是基于正 态分布的估计量,比如极大似然估计法。这些方法需要满足两个假设。第一, 理论模型必须正确地反映变量与变量之间的关系,称为结构假 设。第二,数据必须符合多元正态分布,称为分布假设。如果这些假设不满足,基于正态分布的估计量就有可能导致不正确的卡方指数、不 正确的拟合度、以及有偏差的参数估计和参数估计的标准误。在实际应用中,几乎所有的理论模型都不能准确地解释变量与变量之间的关系, 数据也常常呈非多元正态分布。为此,一些新的估计方法得以发展。这些方法要么在理论上不要求数据呈多元正态分布,要么对因数据呈非 正态分布而导致的不正确结果进行纠正。当前较为流行的两种方法是稳健极大似然估计和贝叶斯估计。稳健极大似然估计是应用 Satorra and Bentler (1994) 的方法对不正确的卡方指数和参数估计的标准误进行调整,而参数估计和用极大似然方法得出的完全等同。贝叶斯估计方法则是 基于贝叶斯定理,其要点是:参数的后验分布是由参数的先验分布和数据似然值相乘而得来。后验分布常用马尔科夫蒙特卡洛算法来进行模拟。 对于稳健极大似然估计和贝叶斯估计这两种方法之间的优劣比较,先前的研究只局限于理论模型是正确的情境。而本研究则着重于理论模型 是错误的情境,同时也考虑到数据呈非正态分布的情境。本研究所采用的模型是验证性因子模型,数据全部由计算机模拟而来。数据的生成 取决于三个因素:8 类因子结构,3 种变量分布,和3 组样本量。这三个因素产生72 个模拟条件(72=8x3x3)。每个模拟条件下生成2000 个 数据组,每个数据组都拟合两个模型,一个是正确模型、一个是错误模型。每个模型都用两种估计方法来拟合:稳健极大似然估计法和贝叶 斯估计方法。贝叶斯估计方法中所使用的先验分布是无信息先验分布。结果分析主要着重于模型拒绝率、拟合度、参数估计、和参数估计的 标准误。研究的结果表明:在样本量充足的情况下,两种方法得出的参数估计非常相似。当数据呈非正态分布时,贝叶斯估计法比稳健极大 似然估计法更好地拒绝错误模型。但是,当样本量不足且数据呈正态分布时,贝叶斯估计在拒绝错误模型和参数估计上几乎没有优势,甚至 在一些条件下,比稳健极大似然法要差。  相似文献   

10.
多维题组效应Rasch模型   总被引:2,自引:0,他引:2  
首先, 本文诠释了“题组”的本质即一个存在共同刺激的项目集合。并基于此, 将题组效应划分为项目内单维题组效应和项目内多维题组效应。其次, 本文基于Rasch模型开发了二级评分和多级评分的多维题组效应Rasch模型, 以期较好地处理项目内多维题组效应。最后, 模拟研究结果显示新模型有效合理, 与Rasch题组模型、分部评分模型对比研究后表明:(1)测验存在项目内多维题组效应时, 仅把明显的捆绑式题组效应进行分离而忽略其他潜在的题组效应, 仍会导致参数的偏差估计甚或高估测验信度; (2)新模型更具普适性, 即便当被试作答数据不存在题组效应或只存在项目内单维题组效应, 采用新模型进行测验分析也能得到较好的参数估计结果。  相似文献   

11.
多级评分计算机化自适应测验动态综合选题策略   总被引:1,自引:0,他引:1  
罗芬  丁树良  王晓庆 《心理学报》2012,44(3):400-412
多级评分可以提供更多关于被试的信息, 是计算机化自适应测验的一个发展方向, 选题策略是计算机化自适应测验的研究重点。对于多级评分的等级反应模型, 本文拟用区间估计的思想改进近期提出的几种选题策略, 并且将两级评分b-STR和a-STR推广到多级评分以改进最大信息量选题策略。Monte Carlo模拟实验表明在达到或接近原有选题策略测验精度的基础上, 本文提出的几种新选题策略有的能够有效降低测验长度, 有的可以极大降低项目曝光率。  相似文献   

12.
Five adult humans palpated silicone breast models in a lump-detection task. The effects on detection of several lump and model characteristics were studied in three phases, using both discrete trial (restricted search procedure) and “free response” (free search procedure) psychophysical methods. Size and depth of fixed steel lumps were varied in Phase 1. Depth and hardness of uniformly sized, fixed lumps were varied in Phase 2. The presence and depth of simulated breast nodularity were also studied in Phases 1 and 2. In Phase 3, all breast models were uniformly nodular and lumps varied along dimensions of size, depth, hardness, and fixation. In Phase 1, lump detection was greatest with maximum lump size and minimum depth within the model. Neither lump hardness nor depth differentially influenced detection of the fixed lumps in Phase 2. When breast lumps were mobile, in Phase 3, size and hardness were major stimulus dimensions determining detection. These results suggest physical parameters for realistic breast models and lumps to be used in training effective breast self-examination and breast lump detection.  相似文献   

13.
Statistical methods for identifying aberrances on psychological and educational tests are pivotal to detect flaws in the design of a test or irregular behavior of test takers. Two approaches have been taken in the past to address the challenge of aberrant behavior detection, which are (1) modeling aberrant behavior via mixture modeling methods, and (2) flagging aberrant behavior via residual based outlier detection methods. In this paper, we propose a two-stage method that is conceived of as a combination of both approaches. In the first stage, a mixture hierarchical model is fitted to the response and response time data to distinguish normal and aberrant behaviors using Markov chain Monte Carlo (MCMC) algorithm. In the second stage, a further distinction between rapid guessing and cheating behavior is made at a person level using a Bayesian residual index. Simulation results show that the two-stage method yields accurate item and person parameter estimates, as well as high true detection rate and low false detection rate, under different manipulated conditions mimicking NAEP parameters. A real data example is given in the end to illustrate the potential application of the proposed method.  相似文献   

14.
In a latent class IRT model in which the latent classes are ordered on one dimension, the class specific response probabilities are subject to inequality constraints. The number of these inequality constraints increase dramatically with the number of response categories per item, if assumptions like monotonicity or double monotonicity of the cumulative category response functions are postulated. A Markov chain Monte Carlo method, the Gibbs sampler, can sample from the multivariate posterior distribution of the parameters under the constraints. Bayesian model selection can be done by posterior predictive checks and Bayes factors. A simulation study is done to evaluate results of the application of these methods to ordered latent class models in three realistic situations. Also, an example of the presented methods is given for existing data with polytomous items. It can be concluded that the Bayesian estimation procedure can handle the inequality constraints on the parameters very well. However, the application of Bayesian model selection methods requires more research.  相似文献   

15.
Most item response theory (IRT) models for dichotomous responses are based on probit or logit link functions which assume a symmetric relationship between the probability of a correct response and the latent traits of individuals taking a test. This assumption restricts the use of those models to the case in which all items behave symmetrically. On the other hand, asymmetric models proposed in the literature impose that all the items in a test behave asymmetrically. This assumption is inappropriate for great majority of tests which are, in general, composed of both symmetric and asymmetric items. Furthermore, a straightforward extension of the existing models in the literature would require a prior selection of the items' symmetry/asymmetry status. This paper proposes a Bayesian IRT model that accounts for symmetric and asymmetric items in a flexible but parsimonious way. That is achieved by assigning a finite mixture prior to the skewness parameter, with one of the mixture components being a point mass at zero. This allows for analyses under both model selection and model averaging approaches. Asymmetric item curves are designed through the centred skew normal distribution, which has a particularly appealing parametrization in terms of parameter interpretation and computational efficiency. An efficient Markov chain Monte Carlo algorithm is proposed to perform Bayesian inference and its performance is investigated in some simulated examples. Finally, the proposed methodology is applied to a data set from a large-scale educational exam in Brazil.  相似文献   

16.
Abstract

This paper evaluated multilevel reliability measures in two-level nested designs (e.g., students nested within teachers) within an item response theory framework. A simulation study was implemented to investigate the behavior of the multilevel reliability measures and the uncertainty associated with the measures in various multilevel designs regarding the number of clusters, cluster sizes, and intraclass correlations (ICCs), and in different test lengths, for two parameterizations of multilevel item response models with separate item discriminations or the same item discrimination over levels. Marginal maximum likelihood estimation (MMLE)-multiple imputation and Bayesian analysis were employed to evaluate the accuracy of the multilevel reliability measures and the empirical coverage rates of Monte Carlo (MC) confidence or credible intervals. Considering the accuracy of the multilevel reliability measures and the empirical coverage rate of the intervals, the results lead us to generally recommend MMLE-multiple imputation. In the model with separate item discriminations over levels, marginally acceptable accuracy of the multilevel reliability measures and empirical coverage rate of the MC confidence intervals were found in a limited condition, 200 clusters, 30 cluster size, .2 ICC, and 40 items, in MMLE-multiple imputation. In the model with the same item discrimination over levels, the accuracy of the multilevel reliability measures and the empirical coverage rate of the MC confidence intervals were acceptable in all multilevel designs we considered with 40 items under MMLE-multiple imputation. We discuss these findings and provide guidelines for reporting multilevel reliability measures.  相似文献   

17.
郭磊  刘伟 《心理科学》2018,(1):189-195
Zhang(2013)提出了序贯监测程序(SMP)用以检测CAT中的题目在作答过程中是否发生泄漏。然而,该方法会出现虚报且未关注在题目泄漏后,对能力估计精度产生的影响。本研究在SMP基础上引入个人拟合指标,提出SMP_PFI方法,拟在给定的置信度上核实被SMP标记的题目是否真正泄漏,并探查SMP_PFI方法对能力估计精度与被封存题目数量关系的影响。实验结果表明:新方法能够有效降低SMP单独运行时的一类错误。通过控制CPFI值能够平衡能力估计精度与被封存题目数量之间的关系。  相似文献   

18.
Cognitive diagnostic models provide a framework for classifying individuals into latent proficiency classes, also known as attribute profiles. Recent research has examined the implementation of a Pólya-gamma data augmentation strategy binary response model using logistic item response functions within a Bayesian Gibbs sampling procedure. In this paper, we propose a sequential exploratory diagnostic model for ordinal response data using a logit-link parameterization at the category level and extend the Pólya-gamma data augmentation strategy to ordinal response processes. A Gibbs sampling procedure is presented for efficient Markov chain Monte Carlo (MCMC) estimation methods. We provide results from a Monte Carlo study for model performance and present an application of the model.  相似文献   

19.
The most commonly employed item selection rule in a computerized adaptive test (CAT) is that of selecting the item with the maximum Fisher information for the estimated trait level. This means a highly unbalanced distribution of item‐exposure rates, a high overlap rate among examinees and, for item bank management, strong pressure to replace items with a high discrimination parameter in the bank. An alternative for mitigating these problems involves, at the beginning of the test, basing item selection mainly on randomness. As the test progresses, the weight of information in the selection increases. In the present work we study, for two selection rules, the progressive methods ( Revuelta & Ponsoda, 1998 ) and the proportional method ( Segall, 2004a ), different functions that define the weight of the random component according to the position in the test of the item to be administered. The functions were tested in simulated item banks and in an operative bank. We found that both the progressive and the proportional methods tolerate a high weight of the random component with minimal or zero loss of accuracy, while bank security and maintenance are improved.  相似文献   

20.
Yeh YY  Yang CT 《Acta psychologica》2008,127(1):114-128
People often fail to detect a change between two visual scenes, a phenomenon referred to as change blindness. This study investigates how a post-change object's similarity to the pre-change object influences memory of the pre-change object and affects change detection. The results of Experiment 1 showed that similarity lowered detection sensitivity but did not affect the speed of identifying the pre-change object, suggesting that similarity between the pre- and post-change objects does not degrade the pre-change representation. Identification speed for the pre-change object was faster than naming the new object regardless of detection accuracy. Similarity also decreased detection sensitivity in Experiment 2 but improved the recognition of the pre-change object under both correct detection and detection failure. The similarity effect on recognition was greatly reduced when 20% of each pre-change stimulus was masked by random dots in Experiment 3. Together the results suggest that the level of pre-change representation under detection failure is equivalent to the level under correct detection and that the pre-change representation is almost complete. Similarity lowers detection sensitivity but improves explicit access in recognition. Dissociation arises between recognition and change detection as the two judgments rely on the match-to-mismatch signal and mismatch-to-match signal, respectively.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号