1.

Assessing Item Fit for Unidimensional Item Response Theory Models Using Residuals from Estimated Item Response Functions

Shelby J. Haberman Sandip Sinharay Kyong Hee Chon 《Psychometrika》2013,78(3):417-440

Residual analysis (e.g., Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
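
For orientation, the classical interval-based standardized residual that the abstract uses as its comparison baseline can be sketched as follows. This is a minimal illustration, not the residual proposed by the authors: it assumes a two-parameter logistic item response function, treats ability values as known, and averages the model probability within each ability interval. The function names `irf_2pl` and `standardized_residuals` are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standardized_residuals(thetas, responses, a, b, edges):
    """One residual per ability interval [edges[k], edges[k+1]).

    Residual = (observed proportion correct - expected proportion)
               / sqrt(expected * (1 - expected) / n).
    Empty intervals yield None.
    """
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, t in enumerate(thetas) if lo <= t < hi]
        if not idx:
            out.append(None)
            continue
        n = len(idx)
        observed = sum(responses[i] for i in idx) / n
        expected = sum(irf_2pl(thetas[i], a, b) for i in idx) / n
        se = math.sqrt(expected * (1.0 - expected) / n)
        out.append((observed - expected) / se)
    return out
```

Residuals that are large in absolute value in several intervals would point to misfit of the item under the assumed model; the abstract's proposal replaces this construction with a comparison of two estimates of the item characteristic curve.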

2.

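Commonly Used Statistics for Testing Model-Data Fit in Item Response Theory

单昕彤 谭辉晔 刘永 吴方文 涂冬波 《心理科学进展》 (Advances in Psychological Science) 2014,22(8):1350-1362

Item response theory (IRT) models predict examinees' response performance from the characteristics of items and examinees, and are commonly used psychometric models. The effective use of IRT, however, depends on how well the chosen IRT model matches the actual data (i.e., model-data fit, or goodness of fit). Only when the chosen IRT model fits the data well can the advantages and functions of IRT be fully realized (Orlando & Thissen, 2000). When the model does not fit the data, or the wrong model is chosen, parameter estimation, test equating, and differential item functioning analysis are subject to large errors (Kang, Cohen & Sung, 2009), with adverse consequences for practical work. Therefore, before applying IRT, one should first thoroughly examine and test whether the chosen model fits the actual data (McKinley & Mills, 1985). The model-data fit statistics commonly used in IRT can be described and compared from two perspectives, item fit and test fit. This is an important topic in psychological and educational measurement and a step that is easily overlooked in test analysis, and no published article of this kind has appeared so far. Future research could pursue empirical comparisons of the statistics and their extension to cognitive diagnosis.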

3.

A Doubly Latent Space Joint Model for Local Item and Person Dependence in the Analysis of Item Response Data

Jin Ick Hoon Jeon Minjeong 《Psychometrika》2019,84(1):236-260

Item response theory (IRT) is one of the most widely utilized tools for item response analysis; however, local item and person independence, which is a critical assumption for IRT, is often violated in real testing situations. In this article, we propose a new type of analytical approach for item response data that does not require standard local independence assumptions. By adapting a latent space joint modeling approach, our proposed model can estimate pairwise distances to represent the item and person dependence structures, from which item and person clusters in latent spaces can be identified. We provide an empirical data analysis to illustrate an application of the proposed method. A simulation study is provided to evaluate the performance of the proposed method in comparison with existing methods.

4.

Specifying Ability Growth Models Using a Multidimensional Item Response Model for Repeated Measures Categorical Ordinal Item Response Data

Insu Paek Zhen Li Hyun-Jeong Park 《Multivariate behavioral research》2016,51(4):569-580

When categorical ordinal item response data are collected over multiple timepoints from a repeated measures design, an item response theory (IRT) modeling approach whose unit of analysis is an item response is suitable. This study proposes a few longitudinal IRT models and illustrates how a popular compensatory multidimensional IRT model can be utilized to formulate such longitudinal IRT models, which permits an investigation of ability growth at both individual and population levels. The equivalence of an existing multidimensional IRT model and those longitudinal IRT models is also elaborated so that one can make use of an existing multidimensional IRT model to implement the longitudinal IRT models.

5.

An Introduction to Item Response Theory Using the Need for Cognition Scale

Michael C. Edwards 《Social and Personality Psychology Compass》2009,3(4):507-529

This paper provides an introduction to two commonly used item response theory (IRT) models (the two-parameter logistic model and the graded response model). Throughout the paper, the Need for Cognition Scale (NCS) is used to help illustrate different features of the IRT model. After introducing the IRT models, I explore the assumptions these models make as well as ways to assess the extent to which those assumptions are plausible. Next, I describe how adopting an IRT approach to measurement can change how one thinks about scoring, score precision, and scale construction. I briefly introduce the advanced topics of differential item functioning and computerized adaptive testing before concluding with a summary of what was learned about IRT generally, and the NCS specifically.
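
The two models named in this abstract can be written down in a few lines. The sketch below uses the standard textbook forms, with a discrimination parameter `a`, a difficulty `b` for the two-parameter logistic model, and increasing category thresholds for the graded response model; the function names are illustrative and are not taken from the paper.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of endorsing a binary item."""
    return logistic(a * (theta - b))

def grm_category_probs(theta, a, thresholds):
    """Graded response model for an item with len(thresholds) + 1 ordered categories.

    thresholds must be increasing. Each category probability is the difference
    between adjacent cumulative probabilities P(X >= k).
    """
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```

With a single threshold the graded response model reduces to the two-parameter logistic model, which is one way to see how the two are related.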

6.

Item Response Modeling With BILOG-MG and MULTILOG for Windows

《International Journal of Testing》2013,13(4):365-384

Item response theory (IRT) has become one of the most popular scoring frameworks for measurement data. IRT models are used frequently in computerized adaptive testing, cognitively diagnostic assessment, and test equating. This article reviews two of the most popular software packages for IRT model estimation, BILOG-MG (Zimowski, Muraki, Mislevy, & Bock, 1996) and MULTILOG (Thissen, 1991), which are for the first time available on a single CD-ROM with new features. Most prominently, the number of items to be calibrated and examinees to be scored is now limited only by memory capacities of the hardware, MULTILOG has an interactive Windows-oriented process for creating basic command file syntax, and both BILOG-MG and MULTILOG come with a new graphics interface that displays numerous curves relevant to IRT analyses in a professional format. This article reviews the models that are and are not estimable with these programs and describes the fundamental ideas of the underlying estimation algorithms without providing detailed derivations. Moreover, the user-friendliness of both programs is assessed with a user in mind who is interested in easy-to-use IRT estimation programs within a Windows point-and-click environment. Both programs fulfill such an expectation to a large degree; yet, this review also points out some obstacles that someone relatively unfamiliar to IRT or syntax programming might have to overcome to obtain meaningful results.

7.

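A Review of the Development and Application of Multilevel IRT

刘慧 简小珠 张敏强 熊悦欣 《心理科学进展》 (Advances in Psychological Science) 2012,20(4):627-632

Hierarchical linear models are an advanced statistical method for handling hierarchically structured data, and item response theory is a modern measurement theory for measuring examinee ability precisely. Multilevel item response theory combines the two by nesting the item response model within the hierarchical linear model. This makes it possible to estimate item parameters and ability parameters at different levels, and yields more precise estimates of regression coefficients and error variances. The authors outline the development of multilevel item response theory, review its applications in psychological and educational measurement with respect to differential item functioning, test equating, and school effectiveness research, summarize its value, and discuss future research trends.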

8.

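Mixture IRT Latent Models and the Trajectory of Their Applications

王霞 谭国华 王旭 张敏强 骆聪 《心理科学进展》 (Advances in Psychological Science) 2014,22(3):540-548

Item response theory is a modern measurement theory for measuring examinees' latent traits, and latent class analysis is a model-based technique for classifying latent traits. Mixture item response theory combines the two, so that examinees can be classified and their latent traits quantified at the same time. After explaining the concepts and principles of mixture item response theory, the article introduces several common mixture models, including MRM, mNRM, and mPCM, together with their parameter estimation methods, and reviews how their applications in psychological testing have developed with respect to the classification of psychological and behavioral characteristics, the detection of differential item functioning, and the evaluation of test validity.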

9.

Automated Item Generation with Recurrent Neural Networks

Matthias von Davier 《Psychometrika》2018,83(4):847-857

Utilizing technology for automated item generation is not a new idea. However, test items used in commercial testing programs or in research are still predominantly written by humans, in most cases by content experts or professional item writers. Human experts are a limited resource and testing agencies incur high costs in the process of continuous renewal of item banks to sustain testing programs. Using algorithms instead holds the promise of providing unlimited resources for this crucial part of assessment development. The approach presented here deviates in several ways from previous attempts to solve this problem. In the past, automatic item generation relied either on generating clones of narrowly defined item types such as those found in language free intelligence tests (e.g., Raven’s progressive matrices) or on an extensive analysis of task components and derivation of schemata to produce items with pre-specified variability that are hoped to have predictable levels of difficulty. It is somewhat unlikely that researchers utilizing these previous approaches would look at the proposed approach with favor; however, recent applications of machine learning show success in solving tasks that seemed impossible for machines not too long ago. The proposed approach uses deep learning to implement probabilistic language models, not unlike what Google brain and Amazon Alexa use for language processing and generation.

10.

Modeling Rule-Based Item Generation

Hanneke Geerlings Cees A. W. Glas Wim J. van der Linden 《Psychometrika》2011,76(2):337-359

An application of a hierarchical IRT model for items in families generated through the application of different combinations of design rules is discussed. Within the families, the items are assumed to differ only in surface features. The parameters of the model are estimated in a Bayesian framework, using a data-augmented Gibbs sampler. An obvious application of the model is computerized algorithmic item generation. Such algorithms have the potential to increase the cost-effectiveness of item generation as well as the flexibility of item administration. The model is applied to data from a non-verbal intelligence test created using design rules. In addition, results from a simulation study conducted to evaluate parameter recovery are presented.

11.

Model Selection of Nested and Non-Nested Item Response Models Using Vuong Tests

Lennart Schneider R. Philip Chalmers Rudolf Debelak Edgar C. Merkle 《Multivariate behavioral research》2020,55(5):664-684

In this paper, we apply Vuong’s general approach of model selection to the comparison of nested and non-nested unidimensional and multidimensional item response theory (IRT) models. Vuong’s approach of model selection is useful because it allows for formal statistical tests of both nested and non-nested models. However, only the test of non-nested models has been applied in the context of IRT models to date. After summarizing the statistical theory underlying the tests, we investigate the performance of all three distinct Vuong tests in the context of IRT models using simulation studies and real data. In the non-nested case we observed that the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, we observed that the tests typically perform as well as or sometimes better than the traditional likelihood ratio test. Based on these results, we argue that Vuong’s approach provides a useful set of tools for researchers and practitioners to effectively compare competing nested and non-nested IRT models.
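
As a rough illustration of the non-nested case only, the usual Vuong z statistic can be computed from casewise log-likelihoods of two fitted models. The sketch assumes those per-respondent log-likelihoods have already been obtained from some IRT software; the function name `vuong_nonnested_z` and its inputs are hypothetical, and the distinguishability test and the nested test discussed in the abstract are not covered.

```python
import math
from statistics import NormalDist, fmean, pstdev

def vuong_nonnested_z(loglik_1, loglik_2):
    """Vuong z statistic and two-sided p-value for two non-nested models.

    loglik_1 and loglik_2 hold the log-likelihood contribution of each
    respondent under model 1 and model 2 (same respondents, same order).
    Positive z favors model 1, negative z favors model 2.
    """
    if len(loglik_1) != len(loglik_2):
        raise ValueError("both models need one log-likelihood per respondent")
    d = [x - y for x, y in zip(loglik_1, loglik_2)]
    sd = pstdev(d)  # population standard deviation of the casewise differences
    if sd == 0:
        raise ValueError("casewise differences have zero variance")
    z = math.sqrt(len(d)) * fmean(d) / sd
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value
```

The statistic is only meaningful once the two models have been judged distinguishable, which is why the full procedure described in the abstract begins with a separate test.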

12.

Log-Multiplicative Association Models as Item Response Models

Carolyn J. Anderson Hsiu-Ting Yu 《Psychometrika》2007,72(1):5-23

Log-multiplicative association (LMA) models, which are special cases of log-linear models, have interpretations in terms of latent continuous variables. Two theoretical derivations of LMA models based on item response theory (IRT) arguments are presented. First, we show that Anderson and colleagues (Anderson & Vermunt, 2000; Anderson & Böckenholt, 2000; Anderson, 2002), who derived LMA models from statistical graphical models, made the equivalent assumptions as Holland (1990) when deriving models for the manifest probabilities of response patterns based on an IRT approach. We also present a second derivation of LMA models where item response functions are specified as functions of rest-scores. These various connections provide insights into the behavior of LMA models as item response models and point out philosophical issues with the use of LMA models as item response models. We show that even for short tests, LMA and standard IRT models yield very similar to nearly identical results when data arise from standard IRT models. Log-multiplicative association models can be used as item response models and do not require numerical integration for estimation.

13.

Is signal detection theory fundamentally flawed? A response to Balakrishnan (1998a, 1998b, 1999)

Treisman M 《Psychonomic bulletin & review》2002,9(4):845-857

For nearly 50 years, signal detection theory (SDT; Green & Swets, 1966; Macmillan & Creelman, 1991) has been of central importance in the development of psychophysics and other areas of psychology. The theory has recently been challenged by Balakrishnan (1998b), who argues that, within SDT, an alternative index is “better justified” than d’ and who claims to show (1998a, 1999) that SDT is fundamentally flawed and should be rejected. His evidence is based on new nonparametric measures that he has introduced and applied to experimental data. He believes his results show that basic assumptions of SDT are not supported—in particular, that payoff and probability manipulations do not affect the position of the decision criterion. In view of the importance of SDT in psychology, these claims deserve careful examination. They are critically reviewed here. It appears that it is Balakrishnan’s arguments that fail, and not SDT.
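
For readers unfamiliar with the index under dispute, the sensitivity measure d’ and the criterion location are conventionally computed from hit and false-alarm rates under the equal-variance Gaussian model. The sketch below shows that standard computation only; it does not implement Balakrishnan's alternative measures, and the function name `sdt_indices` is illustrative.

```python
from statistics import NormalDist

def sdt_indices(hit_rate, false_alarm_rate):
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    d_prime   = z(H) - z(F)
    criterion = -(z(H) + z(F)) / 2
    Both rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)
```

In this framework, payoff and probability manipulations are expected to move the criterion while leaving d’ unchanged, which is the assumption the abstract says Balakrishnan disputes.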

14.

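Sampling Error in Item Characteristic Curve Equating

罗照盛 熊建华 漆书青 戴海琦 丁树良 《心理学报》 (Acta Psychologica Sinica) 2007,39(4):723-729

Equating is receiving growing attention from testing agencies and measurement researchers, and the advantages of item response theory equating in particular have given them confidence in it. Many, however, have overlooked how, and how strongly, the shape of the examinee ability distribution may affect equating results. This study focuses on how the equating of item parameters under dichotomously scored item response theory models differs across shapes of the examinee ability distribution, and examines the error that examinee sampling bias may introduce into item characteristic curve equating. The results show that the shape of the ability distribution significantly affects the coefficients of item parameter equating. In particular, the skewness of the ability distribution has a significant linear correlation with the intercept of the equating equation, whereas changes in the shape of the ability distribution have no clear effect on its slope.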
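
The slope and intercept mentioned above are the coefficients A and B of a linear scale transformation. The sketch below shows that transformation for two-parameter logistic items and a test characteristic curve loss in the spirit of characteristic curve equating methods. It is an illustration under assumed notation, not the procedure used in the study; all names and the example item parameters are hypothetical.

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def to_target_scale(items, A, B):
    """Apply theta* = A * theta + B to (a, b) pairs: a* = a / A, b* = A * b + B."""
    return [(a / A, A * b + B) for a, b in items]

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def tcc_loss(A, B, source_items, target_items, grid):
    """Squared distance between the target curve and the transformed source curve."""
    moved = to_target_scale(source_items, A, B)
    return sum((tcc(t, target_items) - tcc(t, moved)) ** 2 for t in grid)

# Hypothetical common items, with the target scale built from a known A and B.
source = [(1.0, -0.5), (1.5, 0.0), (0.8, 1.0)]
target = to_target_scale(source, A=1.2, B=0.5)
grid = [x / 2 for x in range(-8, 9)]  # ability points from -4 to 4
```

Here `tcc_loss(1.2, 0.5, source, target, grid)` is zero, while other choices of A and B give a positive loss. In practice the coefficients would be found by minimizing such a loss over estimated item parameters, which is where sampling error in the examinee group enters.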

15.

The person response function as a tool in person-fit research

Klaas Sijtsma Rob R. Meijer 《Psychometrika》2001,66(2):191-207

Item responses that do not fit an item response theory (IRT) model may cause the latent trait value to be inaccurately estimated. In the past two decades several statistics have been proposed that can be used to identify nonfitting item score patterns. These statistics all yield scalar values. Here, the use of the person response function (PRF) for identifying nonfitting item score patterns was investigated. The PRF is a function and can be used for diagnostic purposes. First, the PRF is defined in a class of IRT models that imply an invariant item ordering. Second, a person-fit method proposed by Trabin & Weiss (1983) is reformulated in a nonparametric IRT context assuming invariant item ordering, and statistical theory proposed by Rosenbaum (1987a) is adapted to test locally whether a PRF is nonincreasing. Third, a simulation study was conducted to compare the use of the PRF with the person-fit statistic ZU3. It is concluded that the PRF can be used as a diagnostic tool in person-fit research. The authors are grateful to Coen A. Bernaards for preparing the figures used in this article, and to Wilco H.M. Emons for checking the calculations.

16.

The Assessment of Dimensionality for Use in Item Response Theory

《Multivariate behavioral research》2013,48(4):765-792

The application of item response theory (IRT) models requires the identification of the data's dimensionality. A popular method for determining the number of latent dimensions is the factor analysis of a correlation matrix. Unlike factor analysis, which is based on a linear model, IRT assumes a nonlinear relationship between item performance and ability. Because multidimensional scaling (MDS) assumes a monotonic relationship this method may be useful for the assessment of a data set's dimensionality for use with IRT models. This study compared MDS, exploratory and confirmatory factor analysis (EFA and CFA, respectively) in the assessment of the dimensionality of data sets which had been generated to be either one- or two-dimensional. In addition, the data sets differed in the degree of interdimensional correlation and in the number of items defining a dimension. Results showed that MDS and CFA were able to correctly identify the number of latent dimensions for all data sets. In general, EFA was able to correctly identify the data's dimensionality, except for data whose interdimensional correlation was high.

17.

Item analysis and differential item functioning of a brief conduct problem screen

Wu J King KM Witkiewitz K Racz SJ McMahon RJ; Conduct Problems Prevention Research Group 《Psychological Assessment》2012,24(2):444-454

Research has shown that boys display higher levels of childhood conduct problems than girls, and Black children display higher levels than White children, but few studies have tested for scalar equivalence of conduct problems across gender and race. The authors conducted a 2-parameter item response theory (IRT) model to examine item characteristics of the Authority Acceptance scale from the Teacher Observation of Classroom Adaptation-Revised (AA-TOCA-R; L. Larsson-Werthamer, S. G. Kellam, & L. Wheeler, 1991) in 8,820 kindergarten children and estimated the degree of differential item functioning (DIF) by gender and race/urban status. The mean level of latent conduct problems was best represented by behaviors such as being stubborn, breaking rules, and being disobedient, whereas breaking things and taking others' property best represented the construct at one standard deviation above the mean. DIF by gender was detected, such that at equivalent levels of latent conduct problems, males received more endorsements of overt behaviors from teachers, whereas females received more endorsements of nonphysical behaviors. Moreover, overt behaviors were better discriminators of latent conduct problems for males, and nonphysical behaviors were better discriminators of latent conduct problems for females. Differences across race/urban status were not found to be conceptually meaningful. The authors' analyses also suggest that the item scaling of the AA-TOCA-R may be best represented by 5e categories instead of 6. These findings provide support for the use of IRT modeling to examine item characteristics of conduct problem scales and DIF to test for scalar equivalence across diverse subpopulations.

18.

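Application of Multilevel Item Response Theory Models in Test Development

刘红云 骆方 《心理学报》 (Acta Psychologica Sinica) 2008,40(1):92-100

The authors briefly introduce the multilevel item response model, examine how multilevel item response theory relates to conventional item response theory, derive the relationship between the parameters of the multilevel item response model and those of the conventional item response model, and discuss extensions of the multilevel model. Using a real example, they apply the multilevel item response model to analyze the characteristics of test items, to test the effects of individual-level and group-level predictors on the ability parameter, and to analyze differential item functioning. The article closes with a discussion of the strengths and weaknesses of multilevel item response theory models.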

19.

A differential item functioning analysis of the PSDQ with Turkish and New Zealand/Australian adolescents

F. Hülya Aşçı Richard B. Fletcher Emine Çağlar 《Psychology of sport and exercise》2009,10(1):12-18

20.

A Note on the Reliability Coefficients for Item Response Model-Based Ability Estimates

Seonghoon Kim 《Psychometrika》2012,77(1):153-162

Assuming item parameters on a test are known constants, the reliability coefficient for item response theory (IRT) ability estimates is defined for a population of examinees in two different ways: as (a) the product-moment correlation between ability estimates on two parallel forms of a test and (b) the squared correlation between the true abilities and estimates. Due to the bias of IRT ability estimates, the parallel-forms reliability coefficient is not generally equal to the squared-correlation reliability coefficient. It is shown algebraically that the parallel-forms reliability coefficient is expected to be greater than the squared-correlation reliability coefficient, but the difference would be negligible in a practical sense.
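
The two definitions can be stated compactly. The notation below is an assumption made for illustration and is not taken from the paper: θ is the true ability, and θ̂ and θ̂′ are the ability estimates from two parallel forms.

```latex
% (a) parallel-forms reliability: correlation between estimates on two parallel forms
\rho_{\mathrm{PF}} = \operatorname{Corr}\bigl(\hat{\theta}, \hat{\theta}'\bigr)

% (b) squared-correlation reliability: squared correlation of true ability and estimate
\rho_{\mathrm{SC}} = \bigl[\operatorname{Corr}\bigl(\theta, \hat{\theta}\bigr)\bigr]^{2}
```

If the estimates were unbiased with errors uncorrelated across forms and with true ability, as in classical test theory, the two quantities would coincide; the abstract attributes the gap between them to the bias of IRT ability estimates.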