首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Multidimensional item response theory (MIRT) models for response style (e.g., Bolt, Lu, & Kim, 2014, Psychological Methods, 19, 528; Falk & Cai, 2016, Psychological Methods, 21, 328) provide flexibility in accommodating various response styles, but often present difficulty in isolating the effects of response style(s) from the intended substantive trait(s). In the presence of such measurement limitations, we consider several ways in which MIRT models are nevertheless useful in lending insight into how response styles may interfere with measurement for a given test instrument. Such a study can also inform whether alternative design considerations (e.g., anchoring vignettes, self-report items of heterogeneous content) that seek to control for response style effects may be helpful. We illustrate several aspects of an MIRT approach using real and simulated analyses.  相似文献   

2.
Multidimensional item response theory (MIRT) is widely used in assessment and evaluation of educational and psychological tests. It models the individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that involve either direct numerical approximations to the integrals or Monte Carlo simulations. However, these methods are known to be computationally demanding in high dimensions and rely on sampling data points from a posterior distribution. We propose a new Gaussian variational expectation--maximization (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. In addition, the proposed algorithm can be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies are conducted to demonstrate the computational efficiency and estimation precision of the new GVEM algorithm compared to the popular alternative Metropolis–Hastings Robbins–Monro algorithm. In addition, theoretical results are presented to establish the consistency of the estimator from the new GVEM algorithm.  相似文献   

3.
The use of multidimensional forced-choice (MFC) items to assess non-cognitive traits such as personality, interests and values in psychological tests has a long history, because MFC items show strengths in preventing response bias. Recently, there has been a surge of interest in developing item response theory (IRT) models for MFC items. However, nearly all of the existing IRT models have been developed for MFC items with binary scores. Real tests use MFC items with more than two categories; such items are more informative than their binary counterparts. This study developed a new IRT model for polytomous MFC items based on the cognitive model of choice, which describes the cognitive processes underlying humans' preferential choice behaviours. The new model is unique in its ability to account for the ipsative nature of polytomous MFC items, to assess individual psychological differentiation in interests, values and emotions, and to compare the differentiation levels of latent traits between individuals. Simulation studies were conducted to examine the parameter recovery of the new model with existing computer programs. The results showed that both statement parameters and person parameters were well recovered when the sample size was sufficient. The more complete the linking of the statements was, the more accurate the parameter estimation was. This paper provides an empirical example of a career interest test using four-category MFC items. Although some aspects of the model (e.g., the nature of the person parameters) require additional validation, our approach appears promising.  相似文献   

4.
5.
The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify latent traits probed by test items of a multidimensional test. In this paper the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and each latent trait has an item that is only related to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two-parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms the EM-based L1 regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.  相似文献   

6.
Applications of item response theory, which depend upon its parameter invariance property, require that parameter estimates be unbiased. A new method, weighted likelihood estimation (WLE), is derived, and proved to be less biased than maximum likelihood estimation (MLE) with the same asymptotic variance and normal distribution. WLE removes the first order bias term from MLE. Two Monte Carlo studies compare WLE with MLE and Bayesian modal estimation (BME) of ability in conventional tests and tailored tests, assuming the item parameters are known constants. The Monte Carlo studies favor WLE over MLE and BME on several criteria over a wide range of the ability scale.  相似文献   

7.
Wendy M. Yen 《Psychometrika》1985,50(4):399-410
When the three-parameter logistic model is applied to tests covering a broad range of difficulty, there frequently is an increase in mean item discrimination and a decrease in variance of item difficulties and traits as the tests become more difficult. To examine the hypothesis that this unexpected scale shrinkage effect occurs because the items increase in complexity as they increase in difficulty, an approximate relationship is derived between the unidimensional model used in data analysis and a multidimensional model hypothesized to be generating the item responses. Scale shrinkage is successfully predicted for several sets of simulated data.The author is grateful to Robert Mislevy for kindly providing a copy of his computer program, RESOLVE.  相似文献   

8.
当前国内外大部分认知诊断计算机化自适应测验(CD-CAT)主要采用PWKL作为选题策略进行研究。PWKL结合后验分布信息对KL指标进行加权,提高了判准率,但该方法仅利用个体层面信息加权,忽视了项目本身能够提供的信息,属于单源指标。本研究结合认知诊断中的项目区分度信息,对PWKL进行修正,提出了4种新的多源选题策略:GIDPWKL、AIDPWKL、CIDPWKL和KLEDPWKL方法,并在加入曝光控制下与PWKL和互信息法(MIM)进行比较。模拟研究结果表明:(1)在定长测验情景下的绝大多数实验结果表明,测验长度越短,新方法的判准率越高。平均属性/模式判准率最高的是GIDPWKL,之后是AIDPWKL,而CIDPWKL、KLEDPWKL和MIM方法的优势随实验条件不同而不同。(2)在定长测验情景下的绝大多数实验结果表明,题目质量越高,新方法的优势越明显。(3)Q矩阵结构的复杂性会影响不同选题策略的表现。(4)在变长测验情景下,4种新方法和MIM的平均测验长度均要低于PWKL方法,表现最好的是GIDPWKL方法。因此,若实际测验情景与本研究的模拟情景相似,推荐GIDPWKL方法。  相似文献   

9.
Multidimensional adaptive testing   总被引:5,自引:0,他引:5  
Maximum likelihood and Bayesian procedures for item selection and scoring of multidimensional adaptive tests are presented. A demonstration using simulated response data illustrates that multidimensional adaptive testing (MAT) can provide equal or higher reliabilities with about one-third fewer items than are required by one-dimensional adaptive testing (OAT). Furthermore, holding test-length constant across the MAT and OAT approaches, substantial improvements in reliability can be obtained from multidimensional assessment. A number of issues relating to the operational use of multidimensional adaptive testing are discussed.  相似文献   

10.
针对双目标CD-CAT,将六种项目区分度(鉴别力D、一般区分度GDI、优势比OR、2PL的区分度a、属性区分度ADI、认知诊断区分度CDI)分别与IPA方法结合,得到新的选题策略。模拟研究比较了它们的表现,还考察了区分度分层在控制项目曝光的表现。结果发现:新方法都能明显提高知识状态的判准率和能力估计精度;分层选题均能很好地提高题库利用率。总体上,OR加权能显著提高测量精度;OR分层选题在保证测量精度条件下显著提高项目曝光均匀性。  相似文献   

11.
詹沛达  Hong Jiao  Kaiwen Man 《心理学报》2020,52(9):1132-1142
在心理与教育测量中,潜在加工速度反映学生运用潜在能力解决问题的效率。为在多维测验中探究潜在加工速度的多维性并实现参数估计,本研究提出多维对数正态作答时间模型。实证数据分析及模拟研究结果表明:(1)潜在加工速度具有与潜在能力相匹配的多维结构;(2)新模型可精确估计个体水平的多维潜在加工速度及与作答时间有关的题目参数;(3)冗余指定潜在加工速度具有多维性带来的负面影响低于忽略其多维性所带来的。  相似文献   

12.
汪文义  宋丽红  丁树良 《心理学报》2016,48(12):1612-1624
介绍多维项目反应理论模型下分类准确性和分类一致性指标, 采用蒙特卡罗方法实现复杂决策规则下指标计算, 并从数学上证明分类准确性指标两类估计量在均匀先验和相同决策规则条件下依概率收敛于同一真值。研究结果表明:分类准确性指标可以比较准确地评价分类结果的准确性; 分类一致性指标可以较好地评价分类结果的重测一致性; 在一定条件下, 基于能力量尺的指标优于基于原始总分的指标; 纵使测验维度增加, 估计精度仍比较好; 随着测验长度和维度间相关增加, 分类准确性和分类一致性更高。指标可以用来评价标准参照测验或计算机分类测验的多种决策规则下分类信度和效度。  相似文献   

13.
Owen (1975) proposed an approximate empirical Bayes procedure for item selection in computerized adaptive testing (CAT). The procedure replaces the true posterior by a normal approximation with closed-form expressions for its first two moments. This approximation was necessary to minimize the computational complexity involved in a fully Bayesian approach but is no longer necessary given the computational power currently available for adaptive testing. This paper suggests several item selection criteria for adaptive testing which are all based on the use of the true posterior. Some of the statistical properties of the ability estimator produced by these criteria are discussed and empirically characterized.Portions of this paper were presented at the 60th annual meeting of the Psychometric Society, Minneapolis, Minnesota, June, 1995. The author is indebted to Wim M. M. Tielen for his computational support.  相似文献   

14.
There has recently been much interest in computerized adaptive testing (CAT) for cognitive diagnosis. While there exist various item selection criteria and different asymptotically optimal designs, these are mostly constructed based on the asymptotic theory assuming the test length goes to infinity. In practice, with limited test lengths, the desired asymptotic optimality may not always apply, and there are few studies in the literature concerning the optimal design of finite items. Related questions, such as how many items we need in order to be able to identify the attribute pattern of an examinee and what types of initial items provide the optimal classification results, are still open. This paper aims to answer these questions by providing non‐asymptotic theory of the optimal selection of initial items in cognitive diagnostic CAT. In particular, for the optimal design, we provide necessary and sufficient conditions for the Q ‐matrix structure of the initial items. The theoretical development is suitable for a general family of cognitive diagnostic models. The results not only provide a guideline for the design of optimal item selection procedures, but also may be applied to guide item bank construction.  相似文献   

15.
16.
毛秀珍  刘欢  唐倩 《心理科学》2019,(1):187-193
双因子模型假设测验考察一个一般因子和多个组因子,符合很多教育和心理测验的因素结构。“维度缩减”方法将参数估计中多维积分计算化简为多个迭代二维积分,是双因子模型的重要特征。本文针对考察多级评分项目的计算机化自适应测验,首先推导双因子等级反应模型下Fisher信息量的计算,然后推导“维度缩减”方法在项目选择方法中的应用,最后在低、中、高双因子模式题库中比较D-优化方法、后验加权Fisher信息D优化方法(PDO)、后验加权Kullback-Leibler方法(PKL)、连续熵(CEM)和互信息(MI)方法在能力估计的相关、均方根误差、绝对值偏差和欧氏距离的表现。模拟研究表明:(1)双因子模式越强,即一般因子和组因子在项目上的区分度的差异越小,一般因子估计精度降低,组因子估计精度增加,整体能力的估计精度提高;(2)相同实验条件下,连续熵方法的测量精度最高,PKL方法的能力估计精度最低,其它方法的测量精度没有显著差异。  相似文献   

17.
18.
任赫  陈平 《心理学报》2021,53(9):1044-1058
计算机化分类测验(Computerized Classification Testing, CCT)由于具备分类的功能, 目前在职业资格考试、健康与护理问卷等以分类为目的的测验中得到广泛应用。作为CCT的重要组成部分, 终止规则不仅决定测验停止的条件而且直接影响分类准确率及测验效率。然而, 目前少有研究对多维CCT (Mulitidimensional CCT, MCCT)的终止规则进行探索。针对已有MCCT终止规则的不足, 提出两种新的MCCT终止规则(即基于马氏距离的多维序贯似然比规则Mahalanobis-SPRT和随机缩减的多维广义似然比规则M-SCGLR), 并开展模拟研究在不同实验条件下(比如, 不同的题库结构、能力维度间相关及分界函数)考查它们的表现。结果表明:(1)在使用补偿性分界函数的条件下, Mahalanobis-SPRT规则具有较高的分类精度和与同类方法相近的测验长度; (2)在几乎所有实验条件下, M-SCGLR规则不仅在测验精度上大幅优于已有的多维随机缩减规则, 而且具有较短的测验长度。  相似文献   

19.
解释性项目反应理论模型(Explanatory Item Response Theory Models, EIRTM)是指基于广义线性混合模型和非线性混合模型构建的项目反应理论(Item Response Theory, IRT)模型。EIRTM能在IRT模型的基础上直接加入预测变量, 从而解决各类测量问题。首先介绍EIRTM的相关概念和参数估计方法, 然后展示如何使用EIRTM处理题目位置效应、测验模式效应、题目功能差异、局部被试依赖和局部题目依赖, 接着提供实例对EIRTM的使用进行说明, 最后对EIRTM的不足之处和应用前景进行讨论。  相似文献   

20.
Personality constructs, attitudes and other non-cognitive variables are often measured using rating or Likert-type scales, which does not come without problems. Especially in low-stakes assessments, respondents may produce biased responses due to response styles (RS) that reduce the validity and comparability of the measurement. Detecting and correcting RS is not always straightforward because not all respondents show RS and the ones who do may not do so to the same extent or in the same direction. The present study proposes the combination of a multidimensional IRTree model with a mixture distribution item response theory model and illustrates the application of the approach using data from the Programme for the International Assessment of Adult Competencies (PIAAC). This joint approach allows for the differentiation between different latent classes of respondents who show different RS behaviours and respondents who show RS versus respondents who give (largely) unbiased responses. We illustrate the application of the approach by examining extreme RS and show how the resulting latent classes can be further examined using external variables and process data from computer-based assessments to develop a better understanding of response behaviour and RS.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号