All Journals (期刊界): academic journal search

1.

Refining the two‐parameter testlet response model by introducing testlet discrimination parameters

Jian Tao, Bao Xu, Ning‐Zhong Shi, Hong Jiao 《The Japanese psychological research》2013,55(3):284-291

For testlet response data, traditional item response theory (IRT) models are often not appropriate because of the local dependence present among items within a common testlet. Several testlet‐based IRT models have been developed to model examinees' responses. In this paper, a new two‐parameter normal ogive testlet response theory (2PNOTRT) model for dichotomous items is proposed by introducing testlet discrimination parameters. A Bayesian model parameter estimation approach via a data augmentation scheme is developed. Simulations are conducted to evaluate the performance of the proposed 2PNOTRT model. The results indicated that the estimation of item parameters is satisfactory overall from the viewpoint of convergence. Finally, the proposed 2PNOTRT model is applied to a set of real testlet data.

2.

Marginal likelihood inference for a model for item responses and response times

Cees A. W. Glas, Wim J. van der Linden 《The British journal of mathematical and statistical psychology》2010,63(3):603-626

Marginal maximum‐likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first‐level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

3.

Generalized full-information item bifactor analysis

Cai L, Yang JS, Hansen M 《Psychological Methods》2011,16(3):221-248

Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than 1 group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker's (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood requires only 2-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy.

4.

Polytomous IRT models and monotone likelihood ratio of the total score

Bas T. Hemker, Klaas Sijtsma, Ivo W. Molenaar, Brian W. Junker 《Psychometrika》1996,61(4):679-693

In a broad class of item response theory (IRT) models for dichotomous items the unweighted total score has monotone likelihood ratio (MLR) in the latent trait. In this study, it is shown that for polytomous items MLR holds for the partial credit model and a trivial generalization of this model. MLR does not necessarily hold if the slopes of the item step response functions vary over items, item steps, or both. MLR holds neither for Samejima's graded response model, nor for nonparametric versions of these three polytomous models. These results are surprising in the context of Grayson's and Huynh's results on MLR for nonparametric dichotomous IRT models, and suggest that establishing stochastic ordering properties for nonparametric polytomous IRT models will be much harder. Hemker's research was supported by the Netherlands Research Council, Grant 575-67-034. Junker's research was supported in part by the National Institutes of Health, Grant CA54852, and by the National Science Foundation, Grant DMS-94.04438.

5.

A general diagnostic model applied to language testing data

Matthias von Davier 《The British journal of mathematical and statistical psychology》2008,61(2):287-307

Probabilistic models with one or more latent variables are designed to report on a corresponding number of skills or cognitive attributes. Multidimensional skill profiles offer additional information beyond what a single test score can provide, if the reported skills can be identified and distinguished reliably. Many recent approaches to skill profile models are limited to dichotomous data and have made use of computationally intensive estimation methods such as Markov chain Monte Carlo, since standard maximum likelihood (ML) estimation techniques were deemed infeasible. This paper presents a general diagnostic model (GDM) that can be estimated with standard ML techniques and applies to polytomous response variables as well as to skills with two or more proficiency levels. The paper uses one member of a larger class of diagnostic models, a compensatory diagnostic model for dichotomous and partial credit data. Many well‐known models, such as univariate and multivariate versions of the Rasch model and the two‐parameter logistic item response theory model, the generalized partial credit model, as well as a variety of skill profile models, are special cases of this GDM. In addition to an introduction to this model, the paper presents a parameter recovery study using simulated data and an application to real data from the field test for TOEFL® Internet‐based testing.

6.

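Theoretical construction and simulation analysis of the Logistic weighted model

简小珠, 戴步云, 戴海琦 《心理学报》 (Acta Psychologica Sinica) 2016,48(12):1625-1630

Item difficulty and the weight given to an item according to the importance of what it assesses are two basic attributes of polytomously scored items, so they need to be represented by different parameters in the IRT item characteristic function. Previous polytomous models describe the difficulty of a polytomously scored item with multiple difficulty parameters and cannot effectively express the score-weighting role of such items. From the perspective of score weighting in polytomously scored items, this paper proposes the Logistic weighted model and discusses the ideas behind its theoretical construction. An EM algorithm for item parameter estimation under the Logistic weighted model is derived, and a corresponding parameter estimation program is written. Simulated tests under the Logistic weighted model show that item parameter recovery is good.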

7.

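Selection and application of testlet models in a high school English reading test

马洁, 刘红云 《心理科学》 (Journal of Psychological Science) 2018,(6):1374-1381

Using empirical data from a high school English reading test, this study compares the two-parameter logistic model (2PL-IRT) with two-parameter logistic testlet models (2PL-TRT) that include different numbers of testlets, in order to examine how the number of testlets affects parameter estimation and model fit. The results show that: (1) under the 2PL-IRT model, ability estimates are considerably biased for examinees with ability between -1.50 and 0.50; (2) treating testlets whose testlet effect exceeds 0.50 as locally independent items leads to underestimation of the discrimination parameters of some items and overestimation of the difficulty parameters of most items; (3) the larger the testlet effect, the larger the bias in item parameter estimates when the testlet is treated as a set of locally independent items.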

8.

A Unified Nonparametric IRT Model for d-Dimensional Psychological Test Data (d-ISOP)

Hartmann Scheiblechner 《Psychometrika》2007,72(1):43-67

The (univariate) isotonic psychometric (ISOP) model (Scheiblechner, 1995) is a nonparametric IRT model for dichotomous and polytomous (rating scale) psychological test data. A weak subject independence axiom W1 postulates that the subjects are ordered in the same way except for ties (i.e., similarly or isotonically) by all items of a psychological test. A weak item independence axiom W2 postulates that the order of the items is similar for all subjects. Local independence (LI or W3) is assumed in all models. With these axioms, sample-free unidimensional ordinal measurements of items and subjects become feasible. A cancellation axiom (Co) gives, as a result, the additive isotonic psychometric (ADISOP) model and interval scales for subjects and items, and an independence axiom (W4) gives the completely additive isotonic psychometric (CADISOP) model with an interval scale for the response variable (Scheiblechner, 1999). The d-ISOP, d-ADISOP, and d-CADISOP models are generalizations to d-dimensional dependent variables (e.g., speed and accuracy of response). The author would like to thank an Associate Editor and two anonymous referees and also Professor H.H. Schulze for their very valuable suggestions and corrections.

9.

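Attribute hierarchy method based on the graded response model

祝玉芳, 丁树良 《心理学报》 (Acta Psychologica Sinica) 2009,41(3):267-275

An attribute hierarchy method (AHM) based on the graded response model, abbreviated GRM-AHM, is presented, together with a corresponding method for determining the full set of expected item response patterns for GRM-AHM and a new classification method, LL. Monte Carlo simulation experiments compare the correct classification rates (the attribute pattern correct classification rate and the average correct classification rate for individual attributes) of several classification methods for GRM-AHM. The results show that the correct classification rate of the new method is about the same as that of Method A in AHM but much higher than that of Method B; as the examinees' response slip rate increases, the rates of all methods decline. In both classification accuracy and simplicity, GRM-AHM performs better than the polytomous Fusion Model proposed by Bolt et al. (2004).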

10.

Latent growth curve analysis with dichotomous items: Comparing four approaches

Feifei Ye 《The British journal of mathematical and statistical psychology》2016,69(1):43-61

A Monte Carlo study was used to compare four approaches to growth curve analysis of subjects assessed repeatedly with the same set of dichotomous items: a two‐step procedure first estimating latent trait measures using MULTILOG and then using a hierarchical linear model to examine the changing trajectories with the estimated abilities as the outcome variable; a structural equation model using modified weighted least squares (WLSMV) estimation; and two approaches in the framework of multilevel item response models, including a hierarchical generalized linear model using Laplace estimation, and Bayesian analysis using Markov chain Monte Carlo (MCMC). These four methods have similar power in detecting the average linear slope across time. MCMC and Laplace estimates perform relatively better on the bias of the average linear slope and corresponding standard error, as well as the item location parameters. For the variance of the random intercept, and the covariance between the random intercept and slope, all estimates are biased in most conditions. For the random slope variance, only Laplace estimates are unbiased when there are eight time points.

11.

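A polytomous-scoring extension of the reparameterized polytomous-attribute DINA model, based on the graded response model

王立君, 赵少勇, 昌维, 唐芳, 詹沛达 《心理科学》 (Journal of Psychological Science) 2022,(1):195-203

Cognitive diagnosis models (CDMs) with polytomous attributes provide more detailed diagnostic feedback than traditional CDMs with dichotomous attributes, but most existing polytomous-attribute CDMs cannot directly analyze polytomously scored (or mixed-format) data. Based on the graded response model, this paper extends the reparameterized polytomous-attribute DINA model to polytomous scoring, developing a graded response polytomous-attribute DINA model that can handle polytomously scored data. The practical applicability of the new model is first illustrated through an empirical data analysis; its parameter recovery is then examined in a simulation study. The results show that the new model meets the practical need to handle polytomous attributes and polytomously scored data simultaneously and has good psychometric properties, although it places certain requirements on test quality (e.g., high item quality and a complete test Qp matrix).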

12.

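BP neural network parameter estimation based on the GRM

熊建华, 戴虹, 罗芬, 丁树良, 汪文义 《心理科学》 (Journal of Psychological Science) 2014,37(6):1485-1490

At present, parameter estimation mostly relies on statistical methods, which have drawbacks such as long running times and the need for large examinee samples and many items. This paper combines a BP neural network with a dimension-reduction method to estimate the item parameters and examinee ability parameters of the GRM. Monte Carlo simulation results show that: (1) whether there are many examinees and few items or many items and few examinees, parameter estimation accuracy under this network design is high; (2) the approach can be applied to parameter estimation for items with differing numbers of score categories, and accuracy remains high even for items with more than 15 categories, which other parameter estimation methods cannot match; (3) running time is greatly reduced compared with statistical estimation methods.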

13.

Random Item IRT Models

Paul De Boeck 《Psychometrika》2008,73(4):533-559

It is common practice in IRT to consider items as fixed and persons as random. Both continuous and categorical person parameters are most often random variables, whereas for items only continuous parameters are used and they are commonly of the fixed type, although exceptions occur. It is shown in the present article that random item parameters make sense theoretically, and that in practice the random item approach is promising to handle several issues, such as the measurement of persons, the explanation of item difficulties, and troubleshooting with respect to DIF. In correspondence with these issues, three parts are included. All three rely on the Rasch model as the simplest model to study, and the same data set is used for all applications. First, it is shown that the Rasch model with fixed persons and random items is an interesting measurement model, both in theory and for its goodness of fit. Second, the linear logistic test model with an error term is introduced, so that the explanation of the item difficulties based on the item properties does not need to be perfect. Finally, two more models are presented: the random item profile model (RIP) and the random item mixture model (RIM). In the RIP, DIF is not considered a discrete phenomenon, and when a robust regression approach based on the RIP difficulties is applied, quite good DIF identification results are obtained. In the RIM, no prior anchor sets are defined, but instead a latent DIF class of items is used, so that posterior anchoring is realized (anchoring based on the item mixture). It is shown that both approaches are promising for the identification of DIF.

14.

Item factor analysis: current approaches and future directions

Wirth RJ, Edwards MC 《Psychological Methods》2007,12(1):58-79

The rationale underlying factor analysis applies to continuous and categorical variables alike; however, the models and estimation methods for continuous (i.e., interval or ratio scale) data are not appropriate for item-level data that are categorical in nature. The authors provide a targeted review and synthesis of the item factor analysis (IFA) estimation literature for ordered-categorical data (e.g., Likert-type response scales) with specific attention paid to the problems of estimating models with many items and many factors. Popular IFA models and estimation methods found in the structural equation modeling and item response theory literatures are presented. Following this presentation, recent developments in the estimation of IFA parameters (e.g., Markov chain Monte Carlo) are discussed. The authors conclude with considerations for future research on IFA, simulated examples, and advice for applied researchers.

15.

On the relationship between item response theory and factor analysis of discretized variables

Yoshio Takane, Jan de Leeuw 《Psychometrika》1987,52(3):393-408

Equivalence of marginal likelihood of the two-parameter normal ogive model in item response theory (IRT) and factor analysis of dichotomized variables (FA) was formally proved. The basic result on the dichotomous variables was extended to multicategory cases, both ordered and unordered categorical data. Pair comparison data arising from multiple-judgment sampling were discussed as a special case of the unordered categorical data. A taxonomy of data for the IRT and FA models was also attempted. The work reported in this paper has been supported by Grant A6394 to the first author from the Natural Sciences and Engineering Research Council of Canada.
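For orientation, the correspondence can be written out in the simplest unidimensional, dichotomous case. This is a textbook-style sketch rather than the paper's own derivation; the symbols (loading \lambda_j, threshold \tau_j, discrimination a_j, difficulty b_j) and the sign convention are assumptions introduced here.

```latex
% Factor-analytic side: a latent continuous response is dichotomized at a threshold.
y^{*}_{ij} = \lambda_j \theta_i + \varepsilon_{ij}, \qquad
\theta_i \sim N(0,1), \quad \varepsilon_{ij} \sim N\!\left(0,\, 1-\lambda_j^{2}\right), \qquad
y_{ij} = \mathbf{1}\!\left[\, y^{*}_{ij} > \tau_j \,\right]

% Conditional on \theta_i this gives a two-parameter normal ogive item response function:
P\left(y_{ij}=1 \mid \theta_i\right)
  = \Phi\!\left(\frac{\lambda_j \theta_i - \tau_j}{\sqrt{1-\lambda_j^{2}}}\right)
  = \Phi\!\left(a_j(\theta_i - b_j)\right),
\qquad
a_j = \frac{\lambda_j}{\sqrt{1-\lambda_j^{2}}}, \quad
b_j = \frac{\tau_j}{\lambda_j}
```

Under these assumptions, integrating \theta_i out of either form yields the same marginal probability for every response pattern, which is the sense in which the two models coincide.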

16.

A Bayesian random effects model for testlets

Eric T. Bradlow, Howard Wainer, Xiaohui Wang 《Psychometrika》1999,64(2):153-168

Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design. The authors wish to thank Robert Mislevy, Andrew Gelman and Donald B. Rubin for their helpful suggestions and comments, Ida Lawrence and Miriam Feigenbaum for providing us with the SAT data analyzed in section 5, and the two anonymous referees for their careful reading and thoughtful suggestions on an earlier draft. We are also grateful to the Educational Testing Service for providing the resources to do this research.
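To make the idea of "an additional random effect for items nested within the same testlet" concrete, here is a minimal simulation sketch. It assumes a probit link with success probability Φ(a_j(θ_i − b_j − γ_{i,d(j)})), where γ is a person-by-testlet effect; this is one common way to write such a model, and the function names, parameter names, and distributional choices below are illustrative assumptions, not the authors' code. It only generates data; it does not implement the Gibbs sampler.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_testlet_responses(n_persons, testlet_of_item, a, b, testlet_sd, seed=0):
    """Simulate 0/1 responses with a person-by-testlet random effect.

    testlet_of_item[j] is the testlet index of item j, or None for a
    stand-alone item (no testlet effect). All names are hypothetical.
    """
    rng = random.Random(seed)
    n_testlets = 1 + max((d for d in testlet_of_item if d is not None), default=-1)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # examinee proficiency (assumed standard normal)
        # One random effect per testlet for this examinee.
        gamma = [rng.gauss(0.0, testlet_sd) for _ in range(n_testlets)]
        row = []
        for j, d in enumerate(testlet_of_item):
            effect = gamma[d] if d is not None else 0.0
            p = normal_cdf(a[j] * (theta - b[j] - effect))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# Two testlets of sizes 3 and 2, plus one independent item.
testlets = [0, 0, 0, 1, 1, None]
responses = simulate_testlet_responses(
    200, testlets, a=[1.0] * 6, b=[0.0] * 6, testlet_sd=1.0
)
```

Setting `testlet_sd` to 0 recovers a standard conditionally independent model, which gives a simple way to see how larger testlet variance induces dependence among items that share a stimulus.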

17.

Analysis of the Reliability of the Leadership Practices Inventory in the Item Response Theory Framework

Hugo Zagorsek, Stanley J. Stough, Marko Jaklic 《International Journal of Selection & Assessment》2006,14(2):180-191

The paper examines the psychometric properties of the leadership practices inventory (LPI) in the framework of item response theory (IRT). The LPI assesses five dimensions (i.e., leadership practices) of transformational leadership and consists of 30 items. IRT is a model‐based theory that relates the characteristics of questionnaire items (item parameters) and characteristics of individuals (latent variables) to the probability of choosing each of the response categories. The theory does not assume that the instrument is equally reliable for all levels of the latent variable examined. Samejima's graded response model was used to estimate LPI item characteristics, such as item difficulty and item discrimination power. The results show that some items are redundant in the sense that they contribute little to the overall precision of the instrument. Moreover, the LPI seems to be most precise and reliable for respondents with low to medium leadership competence, whereas it becomes increasingly unreliable for high‐quality leaders. These findings suggest that the LPI is best used for training and development purposes, but not for leader selection purposes.
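As an illustration of how a graded response model relates item parameters and a latent trait to "the probability of choosing each of the response categories", the sketch below computes category probabilities for a single item. It assumes the usual logistic form, in which the cumulative probability of responding in category k or higher is logistic(a(θ − b_k)); the function name and the example parameter values are hypothetical and are not taken from the LPI analysis.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under a logistic graded response model.

    thresholds must be strictly increasing; K thresholds give K + 1 categories.
    """
    # Cumulative probabilities P(X >= k), padded with 1 (lowest category)
    # and 0 (beyond the highest category).
    cum = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - bk))) for bk in thresholds]
        + [0.0]
    )
    # Each category probability is the difference of adjacent cumulative values.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical item with discrimination 1.5 and three thresholds (four categories).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

A larger `a` makes the category probabilities change more sharply with `theta`, which is what the abstract means by item discrimination power; items with small `a` contribute little precision.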

18.

Detecting differential item functioning with confirmatory factor analysis and item response theory: toward a unified strategy

Stark S, Chernyshenko OS, Drasgow F 《The Journal of applied psychology》2006,91(6):1292-1306

In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.

19.

Logical relations and comprehension in conversation

Samuel Vuchinich 《Journal of psycholinguistic research》1980,9(5):473-501

This study examines how logical relations (e.g., causality and identity) in spoken discourse affect comprehension. Research on cohesion, which shows that specific unit template structures link discourse and text together, is used to build a model of language comprehension that places template structures at the base of a context comparison operation. Subjects were engaged in ordinary conversation with a confederate trained to produce specific types of logical utterances unobtrusively. The comprehension model predicted that systematically different latencies, topical response, and remedial response of subjects would follow the test items produced by the confederate. The data support the predictions. It is shown that comprehension occurs via one processing path if there is a direct tie between the target item and the immediately prior item in discourse, and a separate processing path if the tie is between the target item and the earlier context. Subject response in conversation is shown to display useful evidence on the nature of comprehension achieved. The findings specify and extend the recent research on the integration of new information into a textual structure.

20.

Flexible Bayesian modelling in dichotomous item response theory using mixtures of skewed item curves

Flávio B. Gonçalves, Juliane Venturelli S. L., Rosangela H. Loschi 《The British journal of mathematical and statistical psychology》2023,76(1):69-86

Most item response theory (IRT) models for dichotomous responses are based on probit or logit link functions which assume a symmetric relationship between the probability of a correct response and the latent traits of individuals taking a test. This assumption restricts the use of those models to the case in which all items behave symmetrically. On the other hand, asymmetric models proposed in the literature impose that all the items in a test behave asymmetrically. This assumption is inappropriate for the great majority of tests which are, in general, composed of both symmetric and asymmetric items. Furthermore, a straightforward extension of the existing models in the literature would require a prior selection of the items' symmetry/asymmetry status. This paper proposes a Bayesian IRT model that accounts for symmetric and asymmetric items in a flexible but parsimonious way. That is achieved by assigning a finite mixture prior to the skewness parameter, with one of the mixture components being a point mass at zero. This allows for analyses under both model selection and model averaging approaches. Asymmetric item curves are designed through the centred skew normal distribution, which has a particularly appealing parametrization in terms of parameter interpretation and computational efficiency. An efficient Markov chain Monte Carlo algorithm is proposed to perform Bayesian inference and its performance is investigated in some simulated examples. Finally, the proposed methodology is applied to a data set from a large-scale educational exam in Brazil.