Similar Literature
 17 similar documents found (search time: 93 ms)
1.
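Implementing parameter estimation programs for item response theory has long concerned researchers in modern measurement theory. This study examines, at the theoretical level, how parameter estimation for polytomous models can be implemented. It simulated five batches of item and examinee parameters that differ in number and distribution, generated the corresponding raw score matrices, and rigorously compared and validated a self-developed program against internationally popular programs. The validation results show that the program is accurate and stable, and that too small a number of examinees impairs the accuracy and stability of parameter estimation.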

2.
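Warm's Weighted Maximum Likelihood Estimation of the Latent Trait Parameter in the Four-Parameter Logistic Model   Total citations: 1, self-citations: 0, citations by others: 1
Meng Xiangbin (孟祥斌), Tao Jian (陶剑), Chen Shali (陈莎莉). Acta Psychologica Sinica (《心理学报》), 2016, (8): 1047-1056
This paper takes the four-parameter logistic (4PL) model as its object of study. Following Warm's weighted maximum likelihood estimation technique, it proposes a weighted maximum likelihood estimation method for the latent trait parameter of the 4PL model and verifies the properties of the estimator through simulation studies. The results show that, compared with the usual maximum likelihood estimator and the expected a posteriori estimator, the weighted maximum likelihood estimator has markedly smaller bias and good parameter recovery. Moreover, when the test is short and item discrimination is low, the weighted maximum likelihood estimator retains good statistical properties and shows an even more pronounced advantage.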
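
As a rough illustration of the model named in this abstract, the 4PL item response function adds an upper asymptote to the familiar three-parameter curve. The sketch below is a minimal example under assumed notation (a: discrimination, b: difficulty, c: lower asymptote, d: upper asymptote). The function name `p_4pl` and the parameter values are hypothetical, no scaling constant is applied, and the code does not implement the weighted estimator proposed in the paper.

```python
import math

def p_4pl(theta, a, b, c, d):
    """Probability of a correct response under the 4PL model (assumed notation)."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: at theta == b the curve sits midway between c and d.
print(p_4pl(0.0, a=1.2, b=0.0, c=0.2, d=0.95))
```

Setting d = 1 recovers the three-parameter logistic model, which is one way to check the function against existing 3PL code.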

3.
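Song Zhilin (宋枝璘), Guo Lei (郭磊), Zheng Tianpeng (郑天鹏). Acta Psychologica Sinica (《心理学报》), 2022, 54(4): 426-440
Missing data occur frequently in testing, and cognitive diagnostic assessment is no exception; missing data bias the diagnostic results. First, a simulation study compared commonly used missing-data handling methods under a variety of experimental conditions. The results show that: (1) missing data reduce estimation accuracy; as the numbers of examinees and items decrease, the missing rate increases, and item quality declines, the PCCR of every method falls while the absolute Bias and the RMSE rise. (2) For estimating item parameters, EM performs best, followed by MI; FIML and ZR perform unstably. (3) For estimating examinees' knowledge states, EM and FIML perform best; MI and ZR perform unstably. Second, the performance of the methods was further explored in PISA 2015 empirical data. Taking the simulation and empirical results together, EM or FIML is recommended for handling missing data.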
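
The three evaluation indices mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming that PCCR denotes the pattern-wise correct classification rate (the share of examinees whose whole attribute pattern is recovered) and that Bias and RMSE are computed over parameter estimates in the usual way. The function names and data are hypothetical and are not taken from the paper.

```python
import math

def bias(estimates, truths):
    """Mean signed difference between estimates and true values."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)

def rmse(estimates, truths):
    """Root mean squared error of the estimates."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))

def pccr(est_patterns, true_patterns):
    """Share of examinees whose full attribute pattern is classified correctly."""
    hits = sum(1 for e, t in zip(est_patterns, true_patterns) if tuple(e) == tuple(t))
    return hits / len(true_patterns)

# Hypothetical estimates: signed errors cancel in bias but not in RMSE.
print(bias([1.0, 3.0], [2.0, 2.0]), rmse([1.0, 3.0], [2.0, 2.0]))
print(pccr([[1, 0], [0, 1]], [[1, 0], [1, 1]]))
```

Reporting the absolute value of the bias alongside the RMSE, as the abstract does, guards against positive and negative errors cancelling out.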

4.
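Tian Wei (田伟), Xin Tao (辛涛), Kang Chunhua (康春花). Advances in Psychological Science (《心理科学进展》), 2014, 22(6): 1036-1046
In psychological and educational measurement, parameter estimation methods for item response theory (IRT) models are basic tools for theoretical research and practical application. Recently, because IRT models keep being extended and the EM (expectation-maximization) algorithm has inherent problems of its own, improving and developing parameter estimation methods has become particularly important. This paper reviews the development of marginal maximum likelihood estimation in IRT models and proposes that it has passed through three stages: a joint maximum likelihood estimation stage, a deterministic latent trait "imputation" stage, and a stochastic latent trait "imputation" stage. It focuses on the idea of latent trait "imputation" (data augmentation). The EM algorithm and the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm, as different latent trait "imputation" methods, each mark a conceptual leap in marginal maximum likelihood estimation. Parameter estimation methods based on latent trait "imputation" are still being developed and refined.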

5.
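Chen Ping (陈平). Acta Psychologica Sinica (《心理学报》), 2016, 48(9): 1184-1198
Because of its many advantages, online calibration is widely used to calibrate new items in computerized adaptive testing (CAT). Method A is the most straightforward and algorithmically simplest CAT online calibration method, but it has an obvious theoretical flaw: during calibration it treats ability estimates as true abilities. The error-correction ideas of the full functional maximum likelihood estimator (FFMLE) and the "estimator using the consequences of sufficiency" (ECSE) were incorporated into Method A (the new methods are denoted FFMLE-Method A and ECSE-Method A) to correct for ability estimation error in theory and thereby overcome the calibration flaw of Method A. The simulation results show that: (1) under most experimental conditions, the two new methods improve calibration precision overall relative to Method A, with the largest improvement on the short test of 10 items; (2) when the CAT is short or of medium length (10 or 20 items), the two new methods perform very close to MEM, the best-performing method, and when the test is long (30 items), ECSE-Method A performs best overall and outperforms MEM; (3) the larger the sample, the higher the calibration precision of every method.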

6.
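Testlets are a common item type in many tests. Because their items depend on one another to some degree, they violate the local independence assumption, and estimating parameters with an item response model then produces considerable bias. Testlet response theory incorporates the interaction between examinee and testlet into the model, which resolves the dependence among items. The author reviews the development, basic principles, and related research of testlet response theory and applies it to a secondary school English examination. Compared with item response theory, the results show that: (1) parameter estimates from the testlet response model and the item response model are strongly correlated, especially for the ability and difficulty parameters; (2) the confidence intervals from the testlet response model are narrower than those from the item response model for every parameter, that is, the testlet response model estimates more precisely than the item response model.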

7.
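Multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to build each module and panel under specified constraints. Even so, if local item dependence (LID) arises among items because some latent factor was overlooked during test construction, it can harm MST results. To investigate the harm that LID does to MST, this study first introduces MST, LID, and related concepts. It then examines the question through a comparative simulation study; the results show that LID impairs the precision of examinee ability estimates, although the estimation bias remains small, and that the harm is not limited to any particular routing rule. To remove the harm, a testlet response model was then used as the analysis model during MST administration; the results show that this removes part of the harm, but with limited effect. On the one hand, this shows that the harm LID does to the precision of ability estimation in MST deserves attention; on the other, it shows that methods for removing LID-induced harm in MST merit further study.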

8.
A method of estimating item response theory (IRT) equating coefficients by the common-examinee design with the assumption of the two-parameter logistic model is provided. The method uses marginal maximum likelihood estimation, in which individual ability parameters in a common-examinee group are numerically integrated out. The abilities of the common examinees are assumed to follow a normal distribution but with an unknown mean and standard deviation on one of the two tests to be equated. The distribution parameters are jointly estimated with the equating coefficients. Further, the asymptotic standard errors of the estimates of the equating coefficients and the parameters for the ability distribution are given. Numerical examples are provided to show the accuracy of the method.
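
To make the role of equating coefficients concrete, the sketch below shows the standard linear scale transformation for the two-parameter logistic model: with slope A and intercept B, ability and difficulty are mapped linearly and discrimination is divided by A, which leaves every response probability unchanged. The names `p_2pl`, `rescale`, A, and B and all numeric values are illustrative assumptions; the code does not reproduce the marginal maximum likelihood estimation of the coefficients described in the abstract.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic response probability (no scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def rescale(theta, a, b, A, B):
    """Move ability and item parameters to the target scale (slope A, intercept B)."""
    return A * theta + B, a / A, A * b + B

# Hypothetical values: the probability is invariant under the transformation.
theta, a, b = 0.4, 1.3, -0.2
theta_s, a_s, b_s = rescale(theta, a, b, A=1.1, B=0.5)
print(abs(p_2pl(theta, a, b) - p_2pl(theta_s, a_s, b_s)) < 1e-12)
```

This invariance is why the coefficients cannot be identified from one test alone and must be estimated from data that link the two scales, such as a group of common examinees.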

9.
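Item Information in the Graded Response Model of Item Response Theory   Total citations: 6, self-citations: 1, citations by others: 6
The information function is an important concept in item response theory and has great practical value in item and test analysis and in guiding test construction. It matters most, and receives the most attention, in computerized adaptive testing. However, the properties of the information function for polytomously scored items have been little studied. This study simulated examinee trait-level parameters and item parameters: 121 trait-level points were generated for examinees, and four batches of item parameters with different discrimination parameters were generated, each batch containing 126 items with different combinations of difficulty-level parameters and each item having five difficulty levels. The analysis showed that the trait level at which a graded response model item provides maximum information corresponds to several mutually adjacent difficulty levels of that item; it corresponds neither to a single difficulty level alone nor necessarily to all difficulty levels. The study calls this regularity "dominance of adjacent difficulty levels". The finding offers important guidance for test quality analysis and test construction, including the construction of computerized adaptive tests.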
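
The kind of computation behind this abstract can be sketched with the usual graded response model formulas: boundary curves are logistic, category probabilities are differences of adjacent boundary curves, and item information is the sum over categories of the squared derivative divided by the category probability. The code below is a minimal sketch under those assumptions. The thresholds, the discrimination value, and the grid from -3 to 3 are hypothetical (the abstract gives only the number of trait-level points), and no scaling constant is applied.

```python
import math

def boundary(theta, a, b):
    """Probability of responding at or above a boundary with threshold b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def cumulative(theta, a, thresholds):
    # Pad with 1 (lowest category or above) and 0 (above the highest category).
    return [1.0] + [boundary(theta, a, b) for b in thresholds] + [0.0]

def grm_category_probs(theta, a, thresholds):
    cum = cumulative(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

def grm_item_information(theta, a, thresholds):
    cum = cumulative(theta, a, thresholds)
    slopes = [a * p * (1.0 - p) for p in cum]  # derivative of each boundary curve
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]
        dp_k = slopes[k] - slopes[k + 1]
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info

# Hypothetical item with five ordered thresholds, scanned over an assumed grid.
thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]
grid = [-3.0 + 0.05 * i for i in range(121)]
best_theta = max(grid, key=lambda t: grm_item_information(t, 1.5, thresholds))
print(round(best_theta, 2))
```

Moving the thresholds closer together or further apart and rerunning the scan is a quick way to see how the location of maximum information depends on groups of neighbouring thresholds rather than on any single one.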

10.
Conditional Covariance Theory and Detect for Polytomous Items   Total citations: 1, self-citations: 0, citations by others: 1
This paper extends the theory of conditional covariances to polytomous items. It has been proven that under some mild conditions, commonly assumed in the analysis of response data, the conditional covariance of two items, dichotomously or polytomously scored, given an appropriately chosen composite is positive if, and only if, the two items measure similar constructs besides the composite. The theory provides a theoretical foundation for dimensionality assessment procedures based on conditional covariances or correlations, such as DETECT and DIMTEST, so that the performance of these procedures is theoretically justified when applied to response data with polytomous items. Various estimators of conditional covariances are constructed, and special attention is paid to the case of complex sampling data, such as those from the National Assessment of Educational Progress (NAEP). As such, the new version of DETECT can be applied to response data sets not only with polytomous items but also with missing values, either by design or at random. DETECT is then applied to analyze the dimensional structure of the 2002 NAEP reading samples of grades 4 and 8. The DETECT results show that the substantive test structure based on the purposes for reading is consistent with the statistical dimensional structure for either grade. This research was supported by the Educational Testing Service and the National Assessment of Educational Progress (Grant R902F980001), US Department of Education. The opinions expressed herein are solely those of the author and do not necessarily represent those of the Educational Testing Service. The author would like to thank Ting Lu, Paul Holland, Shelby Haberman, and Feng Yu for their comments and suggestions. Requests for reprints should be sent to Jinming Zhang, Educational Testing Service, MS 02-T, Rosedale Road, Princeton, NJ 08541, USA. E-mail: jzhang@ets.org

11.
A method is proposed for constructing indices as linear functions of variables such that the reliability of the compound score is maximized. Reliability is defined in the framework of latent variable modeling [i.e., item response theory (IRT)] and optimal weights of the components of the index are found by maximizing the posterior variance relative to the total latent variable variance. Three methods for estimating the weights are proposed. The first is a likelihood-based approach, that is, marginal maximum likelihood (MML). The other two are Bayesian approaches based on Markov chain Monte Carlo (MCMC) computational methods. One is based on an augmented Gibbs sampler specifically targeted at IRT, and the other is based on a general purpose Gibbs sampler such as implemented in OpenBugs and Jags. Simulation studies are presented to demonstrate the procedure and to compare the three methods. Results are very similar, so practitioners may prefer the latter method, which is easily accessible. A real-data set pertaining to the 28-joint Disease Activity Score is used to show how the methods can be applied in a complex measurement situation with multiple time points and mixed data formats.

12.
A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.

13.
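Application of the Item Characteristic Curve Equating Method under the Graded Response Model in a Large-Scale Examination   Total citations: 2, self-citations: 1, citations by others: 1
In the Economics Professional Qualification Examination, one of the largest qualification examinations in China, the item characteristic curve equating method under the graded response model of item response theory was applied with an anchor-test equating design, in order to ensure comparability of examinations across years, build an item bank, and prepare for computerized adaptive testing. Item and ability parameters from four years of examination data were equated, and an item bank for the economics profession was successfully assembled. On this basis, equating was used to compare the cut scores of papers from different years, providing empirical evidence for setting the passing standard of the examination and ensuring its fairness.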

14.
It is often considered desirable to have the same ordering of the items by difficulty across different levels of the trait or ability. Such an ordering is an invariant item ordering (IIO). An IIO facilitates the interpretation of test results. For dichotomously scored items, earlier research surveyed the theory and methods of an invariant ordering in a nonparametric IRT context. Here the focus is on polytomously scored items, and both nonparametric and parametric IRT models are considered. The absence of the IIO property in two nonparametric polytomous IRT models is discussed, and two nonparametric models are discussed that imply an IIO. A method is proposed that can be used to investigate whether empirical data imply an IIO. Furthermore, only two parametric polytomous IRT models are found to imply an IIO. These are the rating scale model (Andrich, 1978) and a restricted rating scale version of the graded response model (Muraki, 1990). Well-known models, such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969), do not imply an IIO.

15.
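Yu Jiayuan (余嘉元). Acta Psychologica Sinica (《心理学报》), 2002, 34(5): 80-86
The cascade-correlation model from connectionism was used to estimate item parameters and examinee abilities for a continuous-scoring item response theory (IRT) model under small-sample conditions. The response matrix of a group of examinees to a set of items served as the input of the cascade-correlation model, and the abilities θ of those examinees or the parameters a, b, and c of those items served as its output; the neural network was trained until it could estimate θ, a, b, or c. Computer simulation experiments show that if a small number of the items in a test are drawn from an item bank, the connectionist method can estimate IRT parameters and examinee abilities well.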

16.
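An Item Response Theory Analysis of the Combined Raven's Test   Total citations: 1, self-citations: 0, citations by others: 1
Combined Raven's Test data from 354 young men of differing ability levels were analyzed with BILOG-MG 3.0, using marginal maximum likelihood estimation and the three-parameter logistic model. The results show that most items of the Combined Raven's Test fit the three-parameter logistic model (six items do not). The peak of the test information function lies between -3 and -2 on the difficulty scale, with a value of 16.82. Eighteen items have information function peaks below 0.2. All 72 items have discrimination greater than 0.5, which is fairly satisfactory. The difficulty parameters are low for all items: most are below 0, and the highest is only 1.01. Item difficulty is determined mainly by the level of operation required. The pseudo-guessing parameters range from 0.07 to 0.24. Taken together, the analysis indicates that the Combined Raven's Test measures the intelligence of normal young people with poor precision.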
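
The item and test information values reported in this abstract follow from the three-parameter logistic model. The sketch below uses the standard 3PL item information formula and sums it over items to obtain test information. It is a minimal illustration in the plain logistic metric: whether the original analysis applied a scaling constant is not stated in the abstract, the function names are hypothetical, and the three items are invented values chosen only to fall inside the ranges the abstract reports.

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic response probability (no scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Item information for the 3PL model."""
    p = p_3pl(theta, a, b, c)
    return a ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def total_information(theta, items):
    """Test information: the sum of item information over (a, b, c) triples."""
    return sum(info_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical easy items: information is higher at low than at high trait levels.
items = [(1.2, -2.5, 0.10), (0.9, -1.8, 0.20), (1.5, -3.0, 0.07)]
print(total_information(-2.5, items) > total_information(1.0, items))
```

A test whose information is concentrated far below the average trait level, as described in the abstract, measures examinees of average or high ability with large standard errors, since the standard error is the reciprocal of the square root of the information.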

17.
Recent work on reliability coefficients has largely focused on continuous items, including critiques of Cronbach’s alpha. Although two new model-based reliability coefficients have been proposed for dichotomous items (Dimitrov, 2003a,b; Green & Yang, 2009a), these approaches have yet to be compared to each other or other popular estimates of reliability such as omega, alpha, and the greatest lower bound. We seek computational improvements to one of these model-based reliability coefficients and, in addition, conduct initial Monte Carlo simulations to compare coefficients using dichotomous data. Our results suggest that such improvements to the model-based approach are warranted, while model-based approaches were generally superior.
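
For reference, Cronbach's alpha, one of the comparison coefficients named in this abstract, can be computed from an examinee-by-item score matrix as sketched below. This is a minimal illustration using the textbook formula with sample variances; the function names and the tiny dichotomous data sets are hypothetical, and the code does not implement the model-based coefficients that the abstract studies.

```python
def variance(xs):
    """Sample variance with an n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):
    """scores: list of examinee rows, each a list of item scores."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Hypothetical dichotomous data: two identical items, then two unrelated items.
print(cronbach_alpha([[0, 0], [1, 1], [0, 0], [1, 1]]))
print(cronbach_alpha([[0, 0], [0, 1], [1, 0], [1, 1]]))
```

The two toy data sets mark the extremes: items that always agree give the largest possible alpha, while items with zero covariance give an alpha of zero.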
