首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.  相似文献   

2.
Abstract

Differential item functioning (DIF) is a pernicious statistical issue that can mask true group differences on a target latent construct. A considerable amount of research has focused on evaluating methods for testing DIF, such as using likelihood ratio tests in item response theory (IRT). Most of this research has focused on the asymptotic properties of DIF testing, in part because many latent variable methods require large samples to obtain stable parameter estimates. Much less research has evaluated these methods in small sample sizes despite the fact that many social and behavioral scientists frequently encounter small samples in practice. In this article, we examine the extent to which model complexity—the number of model parameters estimated simultaneously—affects the recovery of DIF in small samples. We compare three models that vary in complexity: logistic regression with sum scores, the 1-parameter logistic IRT model, and the 2-parameter logistic IRT model. We expected that logistic regression with sum scores and the 1-parameter logistic IRT model would more accurately estimate DIF because these models yielded more stable estimates despite being misspecified. Indeed, a simulation study and empirical example of adolescent substance use show that, even when data are generated from / assumed to be a 2-parameter logistic IRT, using parsimonious models in small samples leads to more powerful tests of DIF while adequately controlling for Type I error. We also provide evidence for minimum sample sizes needed to detect DIF, and we evaluate whether applying corrections for multiple testing is advisable. Finally, we provide recommendations for applied researchers who conduct DIF analyses in small samples.  相似文献   

3.
Log-Multiplicative Association Models as Item Response Models   总被引:1,自引:0,他引:1  
Log-multiplicative association (LMA) models, which are special cases of log-linear models, have interpretations in terms of latent continuous variables. Two theoretical derivations of LMA models based on item response theory (IRT) arguments are presented. First, we show that Anderson and colleagues (Anderson &; Vermunt, 2000; Anderson &; Böckenholt, 2000; Anderson, 2002), who derived LMA models from statistical graphical models, made the equivalent assumptions as Holland (1990) when deriving models for the manifest probabilities of response patterns based on an IRT approach. We also present a second derivation of LMA models where item response functions are specified as functions of rest-scores. These various connections provide insights into the behavior of LMA models as item response models and point out philosophical issues with the use of LMA models as item response models. We show that even for short tests, LMA and standard IRT models yield very similar to nearly identical results when data arise from standard IRT models. Log-multiplicative association models can be used as item response models and do not require numerical integration for estimation.  相似文献   

4.
Using SAS PROC NLMIXED to fit item response theory models   总被引:1,自引:0,他引:1  
Researchers routinely construct tests or questionnaires containing a set of items that measure personality traits, cognitive abilities, political attitudes, and so forth. Typically, responses to these items are scored in discrete categories, such as points on a Likert scale or a choice out of several mutually exclusive alternatives. Item response theory (IRT) explains observed responses to items on a test (questionnaire) by a person’s unobserved trait, ability, or attitude. Although applications of IRT modeling have increased considerably because of its utility in developing and assessing measuring instruments, IRT modeling has not been fully integrated into the curriculum of colleges and universities, mainly because existing general purpose statistical packages do not provide built-in routines with which to perform IRT modeling. Recent advances in statistical theory and the incorporation of those advances into general purpose statistical software such as the Statistical Analysis System (SAS) allow researchers to analyze measurement data by using a class of models known as generalized linear mixed effects models (McCulloch & Searle, 2001), which include IRT models as special cases. The purpose of this article is to demonstrate the generality and flexibility of using SAS to estimate IRT model parameters. With real data examples, we illustrate the implementations of a variety of IRT models for dichotomous, polytomous, and nominal responses. Since SAS is widely available in educational institutions, it is hoped that this article will contribute to the spread of IRT modeling in quantitative courses.  相似文献   

5.
迫选(forced-choice,FC)测验由于可以控制传统李克特方法带来的反应偏差,被广泛应用于非认知测验中,而迫选测验的传统计分方式会产生自模式数据,这种数据由于不适合于个体间的比较,一直备受批评。近年来,多种迫选IRT模型的发展使研究者能够从迫选测验中获得接近常模性的数据,再次引起了研究者与实践人员对迫选IRT模型的兴趣。首先,依据所采纳的决策模型和题目反应模型对6种较为主流的迫选IRT模型进行分类和介绍。然后,从模型构建思路、参数估计方法两个角度对各模型进行比较与总结。其次,从参数不变性检验、计算机化自适应测验(computerized adaptive testing, CAT)和效度研究3个应用研究方面进行述评。最后提出未来研究可以在模型拓展、参数不变性检验、迫选CAT测验和效度研究4个方向深入。  相似文献   

6.
企业人才甄选情境下求职者很容易在人格测验中作假。至今有关作假的研究已包含作假的内涵、来源和识别等多个方面,也诞生了多种心理模型尝试解释作假产生的心理机制,如作假动机与作假能力交互作用理论、作假计划行为理论、作假整合模型、一般作假行为模型以及作假的VIE模型,为后续理论研究点明方向。此外,作假应用领域中新兴的网络人格测验作假受到关注,在此介绍网络与纸笔测验两种形式下,人格测验作假行为、作假意向的不同。  相似文献   

7.
In a broad class of item response theory (IRT) models for dichotomous items the unweighted total score has monotone likelihood ratio (MLR) in the latent trait. In this study, it is shown that for polytomous items MLR holds for the partial credit model and a trivial generalization of this model. MLR does not necessarily hold if the slopes of the item step response functions vary over items, item steps, or both. MLR holds neither for Samejima's graded response model, nor for nonparametric versions of these three polytomous models. These results are surprising in the context of Grayson's and Huynh's results on MLR for nonparametric dichotomous IRT models, and suggest that establishing stochastic ordering properties for nonparametric polytomous IRT models will be much harder.Hemker's research was supported by the Netherlands Research Council, Grant 575-67-034. Junker's research was supported in part by the National Institutes of Health, Grant CA54852, and by the National Science Foundation, Grant DMS-94.04438.  相似文献   

8.
This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person’s responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category (“neither agree nor disagree”) is taken to be qualitatively similar to the other categories, and is taken to provide information about the person’s endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person’s willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.  相似文献   

9.
阶层线性模型是处理阶层结构数据的高级统计方法, 项目反应理论是精确测量被试能力的现代测量理论。多水平项目反应理论将阶层线性模型和项目反应理论相结合, 将项目反应模型嵌套在阶层线性模型内, 实现了项目参数和不同水平能力参数的估计, 对回归系数和误差项变异的估计也更加精确。作者概述了多水平项目反应理论的发展历程, 并从项目功能差异、测验等值、学校效能研究等方面评述了多水平项目反应理论在心理与教育测量中的应用, 总结了多水平项目反应理论的价值, 同时展望了今后的研究趋势。  相似文献   

10.
项目反应理论是测量被试潜在特质的现代测量理论, 潜在类别分析是基于模型的潜在特质分类技术。混合项目反应理论将项目反应理论与潜在类别分析相结合, 能够同时对被试分类并量化其潜在特质。在阐述混合项目反应理论概念、原理的基础上, 介绍了MRM、mNRM和mPCM等几种常见混合模型及其参数估计方法, 并从心理与行为特征分类、项目功能差异检测、测验效度评价等方面评述了其在心理测验中的应用发展轨迹。  相似文献   

11.
Item response theory (IT) models are now in common use for the analysis of dichotomous item responses. This paper examines the sampling theory foundations for statistical inference in these models. The discussion includes: some history on the stochastic subject versus the random sampling interpretations of the probability in IRT models; the relationship between three versions of maximum likelihood estimation for IRT models; estimating versus estimating -predictors; IRT models and loglinear models; the identifiability of IRT models; and the role of robustness and Bayesian statistics from the sampling theory perspective.A presidential address can serve many different functions. This one is a report of investigations I started at least ten years ago to understand what IRT was all about. It is a decidedly one-sided view, but I hope it stimulates controversy and further research. I have profited from discussions of this material with many people including: Brian Junker, Charles Lewis, Nicholas Longford, Robert Mislevy, Ivo Molenaar, Donald Rock, Donald Rubin, Lynne Steinberg, Martha Stocking, William Stout, Dorothy Thayer, David Thissen, Wim van der Linden, Howard Wainer, and Marilyn Wingersky. Of course, none of them is responsible for any errors or misstatements in this paper. The research was supported in part by the Cognitive Science Program, Office of Naval Research under Contract No. Nooo14-87-K-0730 and by the Program Statistics Research Project of Educational Testing Service.  相似文献   

12.
When categorical ordinal item response data are collected over multiple timepoints from a repeated measures design, an item response theory (IRT) modeling approach whose unit of analysis is an item response is suitable. This study proposes a few longitudinal IRT models and illustrates how a popular compensatory multidimensional IRT model can be utilized to formulate such longitudinal IRT models, which permits an investigation of ability growth at both individual and population levels. The equivalence of an existing multidimensional IRT model and those longitudinal IRT models is also elaborated so that one can make use of an existing multidimensional IRT model to implement the longitudinal IRT models.  相似文献   

13.
This article analyzes latent variable models from a cognitive psychology perspective. We start by discussing work by Tuerlinckx and De Boeck (2005), who proved that a diffusion model for 2-choice response processes entails a 2-parameter logistic item response theory (IRT) model for individual differences in the response data. Following this line of reasoning, we discuss the appropriateness of IRT for measuring abilities and bipolar traits, such as pro versus contra attitudes. Surprisingly, if a diffusion model underlies the response processes, IRT models are appropriate for bipolar traits but not for ability tests. A reconsideration of the concept of ability that is appropriate for such situations leads to a new item response model for accuracy and speed based on the idea that ability has a natural zero point. The model implies fundamentally new ways to think about guessing, response speed, and person fit in IRT. We discuss the relation between this model and existing models as well as implications for psychology and psychometrics.  相似文献   

14.
Most item response theory (IRT) models for dichotomous responses are based on probit or logit link functions which assume a symmetric relationship between the probability of a correct response and the latent traits of individuals taking a test. This assumption restricts the use of those models to the case in which all items behave symmetrically. On the other hand, asymmetric models proposed in the literature impose that all the items in a test behave asymmetrically. This assumption is inappropriate for great majority of tests which are, in general, composed of both symmetric and asymmetric items. Furthermore, a straightforward extension of the existing models in the literature would require a prior selection of the items' symmetry/asymmetry status. This paper proposes a Bayesian IRT model that accounts for symmetric and asymmetric items in a flexible but parsimonious way. That is achieved by assigning a finite mixture prior to the skewness parameter, with one of the mixture components being a point mass at zero. This allows for analyses under both model selection and model averaging approaches. Asymmetric item curves are designed through the centred skew normal distribution, which has a particularly appealing parametrization in terms of parameter interpretation and computational efficiency. An efficient Markov chain Monte Carlo algorithm is proposed to perform Bayesian inference and its performance is investigated in some simulated examples. Finally, the proposed methodology is applied to a data set from a large-scale educational exam in Brazil.  相似文献   

15.
Item response theory (IRT) is supplanting classical test theory as the basis for measures development. This study demonstrated the utility of IRT for evaluating DSM-IV diagnostic criteria. Data on alcohol, cannabis, and cocaine symptoms from 372 adult clinical participants interviewed with the Composite International Diagnostic Interview--Expanded Substance Abuse Module (CIDI-SAM) were analyzed with Mplus (B. Muthen & L. Muthen, 1998) and MULTILOG (D. Thissen, 1991) software. Tolerance and legal problems criteria were dropped because of poor fit with a unidimensional model. Item response curves, test information curves, and testing of variously constrained models suggested that DSM-IV criteria in the CIDI-SAM discriminate between only impaired and less impaired cases and may not be useful to scale case severity. IRT can be used to study the construct validity of DSM-IV diagnoses and to identify diagnostic criteria with poor performance.  相似文献   

16.
刘红云  骆方 《心理学报》2008,40(1):92-100
作者简要介绍了多水平项目反应模型,对多水平项目反应理论与通常项目反应理论之间的关系进行了探讨,得到了多水平项目反应模型参数与通常项目反应模型参数之间的关系,并讨论了多水平项目反应模型的推广模型。通过一个实际例子,用多水平项目反应模型对测验中项目的特征进行分析;检验个体水平和组水平预测变量对能力参数的影响;对项目功能差异进行分析。最后文章就多水平项目反应理论模型的优势与不足进行了讨论  相似文献   

17.
Equivalence of marginal likelihood of the two-parameter normal ogive model in item response theory (IRT) and factor analysis of dichotomized variables (FA) was formally proved. The basic result on the dichotomous variables was extended to multicategory cases, both ordered and unordered categorical data. Pair comparison data arising from multiple-judgment sampling were discussed as a special case of the unordered categorical data. A taxonomy of data for the IRT and FA models was also attempted.The work reported in this paper has been supported by Grant A6394 to the first author from the Natural Sciences and Engineering Research Council of Canada.  相似文献   

18.
Generating items during testing: Psychometric issues and models   总被引:2,自引:0,他引:2  
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented.This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas. —Editor  相似文献   

19.
Chen  Yunxiao  Li  Xiaoou  Zhang  Siliang 《Psychometrika》2019,84(1):124-146

Joint maximum likelihood (JML) estimation is one of the earliest approaches to fitting item response theory (IRT) models. This procedure treats both the item and person parameters as unknown but fixed model parameters and estimates them simultaneously by solving an optimization problem. However, the JML estimator is known to be asymptotically inconsistent for many IRT models, when the sample size goes to infinity and the number of items keeps fixed. Consequently, in the psychometrics literature, this estimator is less preferred to the marginal maximum likelihood (MML) estimator. In this paper, we re-investigate the JML estimator for high-dimensional exploratory item factor analysis, from both statistical and computational perspectives. In particular, we establish a notion of statistical consistency for a constrained JML estimator, under an asymptotic setting that both the numbers of items and people grow to infinity and that many responses may be missing. A parallel computing algorithm is proposed for this estimator that can scale to very large datasets. Via simulation studies, we show that when the dimensionality is high, the proposed estimator yields similar or even better results than those from the MML estimator, but can be obtained computationally much more efficiently. An illustrative real data example is provided based on the revised version of Eysenck’s Personality Questionnaire (EPQ-R).

  相似文献   

20.
It is often considered desirable to have the same ordering of the items by difficulty across different levels of the trait or ability. Such an ordering is an invariant item ordering (IIO). An IIO facilitates the interpretation of test results. For dichotomously scored items, earlier research surveyed the theory and methods of an invariant ordering in a nonparametric IRT context. Here the focus is on polytomously scored items, and both nonparametric and parametric IRT models are considered.The absence of the IIO property in twononparametric polytomous IRT models is discussed, and two nonparametric models are discussed that imply an IIO. A method is proposed that can be used to investigate whether empirical data imply an IIO. Furthermore, only twoparametric polytomous IRT models are found to imply an IIO. These are the rating scale model (Andrich, 1978) and a restricted rating scale version of the graded response model (Muraki, 1990). Well-known models, such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969), do no imply an IIO.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号