
1.

A Generalized Speed–Accuracy Response Model for Dichotomous Items

Peter W. van Rijn Usama S. Ali 《Psychometrika》2018,83(1):109-131

We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). In these models, the scores that result from a scoring rule that incorporates both the speed and accuracy of item responses are modeled. Our generalization is similar to that of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable type I error rates for the residuals. Finally, the methods were applied to two real data sets. It was found that the two-parameter SARM showed improved fit compared to the one-parameter SARM in both data sets.
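As a rough illustration of the step from the one-parameter logistic (Rasch) model to the two-parameter logistic (Birnbaum) model mentioned in the abstract above, the following Python sketch contrasts the two item response functions. It does not implement the SARM or its scoring rule; the function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def irf_1pl(theta: float, b: float) -> float:
    """Rasch (1PL) probability of a correct response for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def irf_2pl(theta: float, a: float, b: float) -> float:
    """Birnbaum (2PL) probability; a is an item-specific discrimination (hypothetical value)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# With a = 1 the 2PL reduces to the 1PL; a larger a gives a steeper curve around b.
for theta in (-1.0, 0.0, 1.0):
    print(theta, irf_1pl(theta, 0.0), irf_2pl(theta, 2.0, 0.0))
```

The only difference between the two functions is the slope parameter `a`, which is fixed across items in the one-parameter case and free to vary by item in the two-parameter case.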

2.

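Theoretical Construction and Simulation Analysis of the Logistic Weighted Model

简小珠 戴步云 戴海琦 《心理学报》2016,48(12):1625-1630

Item difficulty and the weight reflecting the importance of what an item assesses are two basic attributes of polytomously scored items, so they need to be represented by different parameters in the IRT item characteristic function. Previous polytomous scoring models describe the difficulty of a polytomously scored item with multiple difficulty parameters and cannot effectively express the weighting role of the item's score. From the perspective of score weighting in polytomously scored items, this paper proposes the Logistic weighted model and discusses the ideas behind its theoretical construction. An EM algorithm for estimating item parameters under the Logistic weighted model is derived, and a corresponding parameter estimation program was written. Simulated tests under the Logistic weighted model show that item parameter recovery is good.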

3.

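The Multiple-Attempt, Multiple-Item (MAMI) Test Model and Its Applications

丁树良 罗芬 戴海琦 朱玮 《心理学报》2007,39(4):730-736

Within the IRT framework, a unidimensional two-parameter logistic multiple-attempt, multiple-item (MAMI) test model is established for 0-1 scoring. Compared with the multiple-attempt, single-item (MASI) model given by Spray, MAMI is a more refined model with a wider range of application, and its parameter estimation method also differs: item parameters are obtained with the EM algorithm. Monte Carlo simulation results show that, compared with correspondingly increasing the test length, applying the MAMI test model gives the same precision of ability estimates but higher precision of item parameter estimates. Compared with correspondingly increasing the number of examinees, the precision of item parameter estimates is the same, but MAMI gives higher precision of ability estimates. This finding indicates that, under certain conditions, if examinees are allowed to revise their answers and cumulative scoring is used, then even with the test length unchanged the precision of ability estimation can match that obtained by doubling the test length, and the precision of item parameter estimation also improves. These findings are of reference value not only for skill assessment and cognitive ability assessment but also for how data are processed.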

4.

Analysis of residuals for the multinomial item response model

Mark Reiser 《Psychometrika》1996,61(3):509-528

Using the item response model as developed on the multinomial distribution, asymptotic variances are obtained for residuals associated with response patterns and first- and second-order marginal frequencies of manifest variables. When the model does not fit well, an examination of these residuals may reveal the source of the poor fit. Finally, a limited-information test of fit for the model is developed by using residuals defined for the first- and second-order marginals. Model evaluation based on residuals for these marginals is particularly useful when the response pattern frequencies are sparse. The author would like to thank Yasuo Amemiya and Joseph Lucke for helpful suggestions. This research was supported by a Research Incentive Grant from Arizona State University.

5.

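Personality Measurement: From Cumulative Models to Unfolding Models

谢晶 方平 姜媛 《心理学探新》2011,31(5):455-458

Most current personality measures use the cumulative response model approach, which assumes that an examinee's test score increases as his or her ability or trait level increases. With the continuing development of personality measurement techniques, however, the effectiveness of this model has been questioned, and researchers have begun to turn to unfolding models. An unfolding model holds that an examinee's response depends on how closely the examinee's ability matches the item threshold: when the two match exactly, the probability of an affirmative response reaches its maximum, called the "ideal point". The aim of an unfolding model is to locate the examinee's ideal point and thereby find his or her true attitude strength or personality trait level. GGUM, a relatively mature unfolding model, has begun to be applied in various areas of personality measurement, but large-scale pilot testing is still needed to accumulate experience with evaluation and predictive validity, to establish psychometric standards accepted by the field, and to continue exploring and developing corresponding psychometric theory and easy-to-use statistical programs.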
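The contrast between cumulative and unfolding response models described in the abstract above can be sketched in a few lines of Python. This is a toy illustration only: the unfolding function below is a simple squared-distance kernel chosen for clarity, not the GGUM, and all names and values are assumptions introduced here.

```python
import math


def cumulative_prob(theta: float, threshold: float) -> float:
    """Cumulative model: endorsement probability rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-(theta - threshold)))


def unfolding_prob(theta: float, location: float) -> float:
    """Toy ideal-point model: endorsement peaks where theta matches the item location."""
    return math.exp(-((theta - location) ** 2))


# The cumulative curve keeps increasing, while the unfolding curve is single-peaked.
for theta in (-2.0, 0.0, 2.0):
    print(theta, cumulative_prob(theta, 0.0), unfolding_prob(theta, 0.0))
```

Under the toy unfolding function, a respondent far above the item location is as unlikely to endorse the item as one equally far below it, which is the defining feature of an ideal-point process.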

6.

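Application of the Random Intercept Factor Analysis Model in Controlling Item Wording Effects

韦嘉 张春雨 赵永萍 张进辅 《心理科学》2016,39(4):1005-1010

This study used the Chinese revised version of the Rosenberg Self-Esteem Scale (RSES-R) to examine how the random intercept factor analysis model performs in controlling item wording effects. A scale composed of the RSES-R and the Over-Claiming Questionnaire was administered to 621 secondary school students. The results show that the random intercept model had good fit indices and reasonable factor variances and loadings, and that self-esteem factor scores correlated very highly with RSES-R total scores, indicating that the model can effectively separate trait and wording effects in RSES-R scores. The separated wording-effect factor scores had a significant but weak correlation with respondents' level of self-enhancement, indicating that the wording effect and respondents' social desirability share a common component.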

7.

Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation

Sun-Joo Cho Paul De Boeck Susan Embretson Sophia Rabe-Hesketh 《Psychometrika》2014,79(1):84-104

An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category levels with random residuals at both levels. The AMIS model is useful for explanation purposes and also for prediction purposes as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.

8.

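An Attribute Hierarchy Method Based on the Graded Response Model (total citations: 3; self-citations: 2; citations by others: 1)

祝玉芳 丁树良 《心理学报》2009,41(3):267-275

An attribute hierarchy method (AHM) based on the graded response model, abbreviated GRM-AHM, is presented, together with a corresponding method for determining the full set of expected item response patterns for GRM-AHM and a new classification method, LL. Monte Carlo simulation experiments compare the classification accuracy of several GRM-AHM classification methods (the attribute pattern classification rate and the average correct classification rate for individual attributes). The results show that the accuracy of the new classification method is about the same as that of Method A in AHM but much higher than that of Method B; as the examinees' response slip rate increases, the accuracy of all methods declines somewhat. In terms of both classification accuracy and simplicity, GRM-AHM performs better than the polytomous Fusion Model proposed by Bolt et al. (2004).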

9.

A comparison of the efficiency and accuracy of BILOG and LOGIST

Wendy M. Yen 《Psychometrika》1987,52(2):275-291

Comparisons are made between BILOG version 2.2 and LOGIST 5.0 Version 2.5 in estimating the item parameters, traits, item characteristic functions (ICFs), and test characteristic functions (TCFs) for the three-parameter logistic model. Data analyzed are simulated item responses for 1000 simulees and one 10-item test, four 20-item tests, and four 40-item tests. LOGIST usually was faster than BILOG in producing maximum likelihood estimates. BILOG almost always produced more accurate estimates of individual item parameters. In estimating ICFs and TCFs BILOG was more accurate for the 10-item test, and the two programs were about equally accurate for the 20- and 40-item tests. I am grateful to Robert J. Mislevy, Martha L. Stocking, and Marilyn S. Wingersky for many helpful comments on an earlier version of this paper. I would also like to thank Hamid Kamrani and Bongmyoung Park for getting LOGIST and BILOG running and keeping them running under changing computer systems at CTB/McGraw-Hill.
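For readers unfamiliar with the quantities compared in the abstract above, here is a minimal Python sketch of an item characteristic function (ICF) under the three-parameter logistic model and the test characteristic function (TCF) as the sum of ICFs over items. It is not code from BILOG or LOGIST; the function names, the example item parameters, and the default scaling constant of 1.7 are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def icf_3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """3PL probability of a correct response.

    a: discrimination, b: difficulty, c: lower asymptote (guessing),
    d: scaling constant (1.7 is a common convention, assumed here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def tcf(theta: float, items: Iterable[Tuple[float, float, float]]) -> float:
    """Test characteristic function: expected number-correct score at theta."""
    return sum(icf_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical three-item test given as (a, b, c) triples.
example_items = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.25), (0.8, 0.7, 0.15)]
print(tcf(0.0, example_items))
```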

10.

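Comparison and Application of Polytomous Cognitive Diagnosis Models under Different Link Functions

苗莹 蔡艳 史双双 张晓 涂冬波 《心理科学》2019,(2):437-445

Cognitive diagnostic tests are receiving increasing attention because they offer a diagnostic function that traditional tests lack. In the current development of polytomous cognitive diagnosis models, researchers have used different link functions to develop different models. Based on the idea of the local or adjacent categories link function, this study develops a polytomous cognitive diagnosis model, LC-DINA. Combining a Monte Carlo simulation study with an empirical application, the study compares the newly developed model with existing models and applies it to the international mathematics and science assessment TIMSS, providing a reference for practitioners.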

11.

Use of true-score theory to predict moments of univariate and bivariate observed-score distributions

Frederic M. Lord 《Psychometrika》1960,25(4):325-342

Formulas are derived for using the available item statistics and score statistics on a test to estimate the moments of the score distribution of a lengthened (or shortened) form of the same test. Other formulas are derived for estimating the bivariate moments of the scatterplot between two parallel test forms using only the data available on either form alone. An empirical study is made showing in each case satisfactory agreement between the theoretical values predicted from the formulas and the values actually observed. These results suggest the utility of the true-score model used in deriving the formulas. This work was supported by contract Nonr-2752(00) between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part for any purpose of the United States Government is permitted.

12.

Loglinear Rasch model tests (total citations: 1; self-citations: 0; citations by others: 1)

Hendrikus Kelderman 《Psychometrika》1984,49(2):223-245

Existing statistical tests for the fit of the Rasch model have been criticized, because they are only sensitive to specific violations of its assumptions. Contingency table methods using loglinear models have been used to test various psychometric models. In this paper, the assumptions of the Rasch model are discussed and the Rasch model is reformulated as a quasi-independence model. The model is a quasi-loglinear model for the incomplete subgroup × score × item 1 × item 2 × ... × item k contingency table. Using ordinary contingency table methods the Rasch model can be tested generally or against less restrictive quasi-loglinear models to investigate specific violations of its assumptions.

13.

A response model for multiple choice items

David Thissen Lynne Steinberg 《Psychometrika》1984,49(4):501-519


14.

On measurement properties of continuation ratio models

Bas T. Hemker L. Andries van der Ark Klaas Sijtsma 《Psychometrika》2001,66(4):487-506

Three classes of polytomous IRT models are distinguished. These classes are the adjacent category models, the cumulative probability models, and the continuation ratio models. So far, the latter class has received relatively little attention. The class of continuation ratio models includes logistic models, such as the sequential model (Tutz, 1990), and nonlogistic models, such as the acceleration model (Samejima, 1995) and the nonparametric sequential model (Hemker, 1996). Four measurement properties are discussed. These are monotone likelihood ratio of the total score, stochastic ordering of the latent trait by the total score, stochastic ordering of the total score by the latent trait, and invariant item ordering. These properties have been investigated previously for the adjacent category models and the cumulative probability models, and for the continuation ratio models this is done here. It is shown that stochastic ordering of the total score by the latent trait is implied by all continuation ratio models, while monotone likelihood ratio of the total score and stochastic ordering on the latent trait by the total score are not implied by any of the continuation ratio models. Only the sequential rating scale model implies the property of invariant item ordering. Also, we present a Venn-diagram showing the relationships between all known polytomous IRT models from all three classes.
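As a reading aid for the three model classes named in the abstract above, the quantities that each class models as a function of the latent trait can be written as follows. The notation is an assumption introduced here (item score X taking values 0, 1, ..., m and latent trait θ) and may differ from the paper's own; the expressions apply for x = 1, ..., m.

```latex
\begin{align*}
\text{adjacent category:} \quad
  & \frac{P(X = x \mid \theta)}{P(X = x - 1 \mid \theta) + P(X = x \mid \theta)} \\[4pt]
\text{cumulative probability:} \quad
  & P(X \ge x \mid \theta) \\[4pt]
\text{continuation ratio:} \quad
  & \frac{P(X \ge x \mid \theta)}{P(X \ge x - 1 \mid \theta)}
    = P(X \ge x \mid X \ge x - 1, \theta)
\end{align*}
```

In the logistic members of each class, the displayed quantity is set equal to a logistic function of θ; the classes differ only in which conditional probability is modeled this way.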

15.

The unique correspondence of the item response function and item category response functions in polytomously scored item response models

Hua-Hua Chang John Mazzeo 《Psychometrika》1994,59(3):391-404

The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
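The definition of the IRF as a weighted sum of ICRFs can be made concrete with a small Python sketch. It assumes the weights are the category scores 0, 1, ..., m (so the IRF is the expected item score) and uses the partial credit model for the ICRFs; the function names and step parameters are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def pcm_icrfs(theta: float, deltas: Sequence[float]) -> List[float]:
    """Item category response functions under the partial credit model.

    deltas holds the m step parameters; the result has m + 1 probabilities,
    one for each score 0..m.
    """
    exponents = [0.0]  # score 0 has exponent 0 by convention
    for delta in deltas:
        exponents.append(exponents[-1] + (theta - delta))
    shift = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def irf(theta: float, deltas: Sequence[float]) -> float:
    """Item response function: category scores weighted by their probabilities."""
    return sum(score * p for score, p in enumerate(pcm_icrfs(theta, deltas)))


print(pcm_icrfs(0.5, [-1.0, 0.0, 1.0]), irf(0.5, [-1.0, 0.0, 1.0]))
```

With a single step parameter the item is dichotomous, and the IRF computed this way coincides with the Rasch probability of a correct response.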

16.

Some improved diagnostics for failure of the Rasch model

Ivo W. Molenaar 《Psychometrika》1983,48(1):49-72

Although several goodness of fit tests have been developed for the Rasch model for dichotomous items, most of them are of a global, asymptotic, and confirmatory type. This paper, based on ideas from a recent thesis by Van den Wollenberg, offers some suggestions for local, small sample, and exploratory techniques: difficulty plots for person groups scoring right and wrong on a specific item, a slope test per item based on a binomial distribution per score group, and a unidimensionality check based on an extended hypergeometric distribution per score group. This paper owes much to the inspiring and pioneering work of Arnold Van den Wollenberg, of which only minor aspects are criticized. Thanks go to Charles Lewis for stimulating discussions and for solutions to some programming problems.

17.

Comparing score tests and other local dependence diagnostics for the graded response model

Yang Liu David Thissen 《The British journal of mathematical and statistical psychology》2014,67(3):496-513

Score tests for identifying locally dependent item pairs have been proposed for binary item response models. In this article, both the bifactor and the threshold shift score tests are generalized to the graded response model. For the bifactor test, the generalization is straightforward; it adds one secondary dimension associated only with one pair of items. For the threshold shift test, however, multiple generalizations are possible: in particular, conditional, uniform, and linear shift tests are discussed in this article. Simulation studies show that all of the score tests have accurate Type I error rates given large enough samples, although their small-sample behaviour is not as good as that of Pearson's Χ² and M₂ as proposed in other studies for the purpose of local dependence (LD) detection. All score tests have the highest power to detect the LD which is consistent with their parametric form, and in this case they are uniformly more powerful than Χ² and M₂; even wrongly specified score tests are more powerful than Χ² and M₂ in most conditions. An example using empirical data is provided for illustration.

18.

Assessing Item Fit for Unidimensional Item Response Theory Models Using Residuals from Estimated Item Response Functions

Shelby J. Haberman Sandip Sinharay Kyong Hee Chon 《Psychometrika》2013,78(3):417-440

Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.

19.

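Cognitive Diagnostic Computerized Adaptive Testing for Polytomous Scoring

蔡艳 苗莹 涂冬波 《心理学报》2016,48(10):1338-1346

Building on CD-CAT with 0-1 scoring, this paper develops cognitive diagnosis models and item selection strategies suited to polytomously scored CD-CAT (psCD-CAT), providing methodological support for implementing polytomously scored CD-CAT. Monte Carlo simulation results show that the polytomously scored CD-CAT developed here achieves satisfactory attribute classification accuracy, test efficiency, and item bank security, and can be used for CD-CAT with polytomously scored data. The simulations also show that, overall, the PS-PWKL and PS-HKL item selection strategies have high attribute classification accuracy, item bank security, and test efficiency, and both outperform the PS-KL strategy. In sum, this study provides cognitive diagnosis models and item selection strategies for further extending the practical application of CD-CAT.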

20.

Bayesian estimation of a multilevel IRT model using Gibbs sampling (total citations: 3; self-citations: 0; citations by others: 3)

Jean-Paul Fox Cees A. W. Glas 《Psychometrika》2001,66(2):271-288

In this article, a two-level regression model is imposed on the ability parameters in an item response theory (IRT) model. The advantage of using latent rather than observed scores as dependent variables of a multilevel model is that it offers the possibility of separating the influence of item difficulty and ability level and modeling response variation and measurement error. Another advantage is that, contrary to observed scores, latent scores are test-independent, which offers the possibility of using results from different tests in one analysis where the parameters of the IRT model and the multilevel model can be concurrently estimated. The two-parameter normal ogive model is used for the IRT measurement model. It will be shown that the parameters of the two-parameter normal ogive model and the multilevel model can be estimated in a Bayesian framework using Gibbs sampling. Examples using simulated and real data are given.
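The data structure described in the abstract above, abilities that follow a two-level model and responses that follow a two-parameter normal ogive model, can be sketched as a small Python simulation. This is only a data-generating sketch under stated assumptions and does not implement the Gibbs sampler: the parameterization Φ(a·θ − b), the intercept-only group model, and all names and values are illustrative choices made here.

```python
import math
import random
from typing import List


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_ogive(theta: float, a: float, b: float) -> float:
    """Two-parameter normal ogive probability, assuming the form Phi(a * theta - b)."""
    return normal_cdf(a * theta - b)


def simulate_abilities(n_groups: int, n_per_group: int,
                       between_sd: float, within_sd: float,
                       seed: int = 0) -> List[List[float]]:
    """Draw abilities from a two-level, intercept-only model.

    Each group gets a random mean (level 2); each person deviates from
    the group mean (level 1).
    """
    rng = random.Random(seed)
    abilities = []
    for _ in range(n_groups):
        group_mean = rng.gauss(0.0, between_sd)
        abilities.append([group_mean + rng.gauss(0.0, within_sd)
                          for _ in range(n_per_group)])
    return abilities


def simulate_response(theta: float, a: float, b: float, rng: random.Random) -> int:
    """Draw a 0/1 item response for one person and one item."""
    return 1 if rng.random() < normal_ogive(theta, a, b) else 0


groups = simulate_abilities(n_groups=3, n_per_group=4, between_sd=0.5, within_sd=1.0)
rng = random.Random(1)
responses = [[simulate_response(theta, 1.0, 0.0, rng) for theta in group]
             for group in groups]
print(responses)
```

Data simulated this way could serve as input for checking an estimation routine, since the generating parameters are known.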