Similar Articles
20 similar articles retrieved (search time: 31 ms)
1.
When the observed indicator variables are dichotomous, traditional factor analysis methods are no longer appropriate. The authors briefly review factor analysis models for categorical data within the SEM framework and models relating test items to latent ability within the IRT framework, and summarize the main parameter estimation methods used in each framework. Two simulation studies compare the GLSc and MGLSc estimators under the SEM framework with the MML/EM estimator under the IRT framework. The results show that (1) among the three methods, GLSc yields the most biased parameter estimates, whereas MGLSc and MML/EM differ little; (2) as sample size increases, the precision of all item parameter estimates improves; (3) the precision of item factor loading and difficulty estimates is affected by test length; (4) the precision of item factor loading and discrimination estimates is affected by the magnitude of the population factor loadings (discriminations); (5) the distribution of item thresholds affects estimation precision, with item discrimination affected most; and (6) overall, item parameters are estimated more precisely under the SEM framework than under the IRT framework. The article also offers suggestions on issues to note when applying the two approaches in practice.
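As background for the SEM-IRT comparison above, the categorical factor analysis parameters (loading λ_j, threshold τ_j) and the normal-ogive IRT parameters (discrimination a_j, difficulty b_j) of a dichotomous item are linked by a standard reparameterization, assuming a standardized latent response variable and a standard-normal trait (textbook background, not a formula quoted from the article):

    a_j = \frac{\lambda_j}{\sqrt{1-\lambda_j^{2}}}, \qquad b_j = \frac{\tau_j}{\lambda_j}

High loadings therefore correspond to high discriminations, which is what allows the two frameworks' estimates to be compared on a common footing.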

2.
The authors discuss the applicability of nonparametric item response theory (IRT) models to the construction and psychometric analysis of personality and psychopathology scales, and they contrast these models with parametric IRT models. They describe the fit of nonparametric IRT to the Depression content scale of the Minnesota Multiphasic Personality Inventory--2 (J. N. Butcher, W. G. Dahlstrom, J. R. Graham, A. Tellegen, & B. Kaemmer, 1989). They also show how nonparametric IRT models can easily be applied and how misleading results from parametric IRT models can be avoided. They recommend the use of nonparametric IRT modeling prior to using parametric logistic models when investigating personality data.

3.
Liu Hongyun, Luo Fang, Wang Yue, Zhang Yu. Acta Psychologica Sinica, 2012, 44(1), 121-132
The authors briefly review categorical confirmatory factor analysis (CCFA) models within the SEM framework and models relating test items to latent ability within the MIRT framework, and summarize the main parameter estimation methods used in each framework. A simulation study compares the WLSc and WLSMV estimators under the SEM framework with the MLR and MCMC estimators under the MIRT framework. The results show that (1) WLSc yields the most biased parameter estimates and suffers from convergence problems; (2) as sample size increases, the precision of all item parameter estimates improves, the precision of WLSMV and MLR estimates differs very little, and in most conditions is no worse than that of MCMC; (3) except for WLSc, estimation precision increases as the number of items per dimension increases; (4) the number of test dimensions has a larger effect on the discrimination and difficulty parameters and a relatively smaller effect on item factor loadings and thresholds; (5) estimation precision depends on the number of dimensions an item measures, with parameters of items measuring a single dimension estimated more precisely. The article also offers suggestions on issues to note when applying the two approaches in practice.

4.
Item factor analysis: current approaches and future directions
The rationale underlying factor analysis applies to continuous and categorical variables alike; however, the models and estimation methods for continuous (i.e., interval or ratio scale) data are not appropriate for item-level data that are categorical in nature. The authors provide a targeted review and synthesis of the item factor analysis (IFA) estimation literature for ordered-categorical data (e.g., Likert-type response scales) with specific attention paid to the problems of estimating models with many items and many factors. Popular IFA models and estimation methods found in the structural equation modeling and item response theory literatures are presented. Following this presentation, recent developments in the estimation of IFA parameters (e.g., Markov chain Monte Carlo) are discussed. The authors conclude with considerations for future research on IFA, simulated examples, and advice for applied researchers.

5.
The application of psychological measures often results in item response data that arguably are consistent with both unidimensional (a single common factor) and multidimensional latent structures (typically caused by parcels of items that tap similar content domains). As such, structural ambiguity leads to seemingly endless "confirmatory" factor analytic studies in which the research question is whether scale scores can be interpreted as reflecting variation on a single trait. An alternative to the more commonly observed unidimensional, correlated traits, or second-order representations of a measure's latent structure is a bifactor model. Bifactor structures, however, are not well understood in the personality assessment community and thus rarely are applied. To address this, herein we (a) describe issues that arise in conceptualizing and modeling multidimensionality, (b) describe exploratory (including Schmid-Leiman [Schmid & Leiman, 1957] and target bifactor rotations) and confirmatory bifactor modeling, (c) differentiate between bifactor and second-order models, and (d) suggest contexts where bifactor analysis is particularly valuable (e.g., for evaluating the plausibility of subscales, determining the extent to which scores reflect a single variable even when the data are multidimensional, and evaluating the feasibility of applying a unidimensional item response theory (IRT) measurement model). We emphasize that the determination of dimensionality is a related but distinct question from either determining the extent to which scores reflect a single individual difference variable or determining the effect of multidimensionality on IRT item parameter estimates. Indeed, we suggest that in many contexts, multidimensional data can yield interpretable scale scores and be appropriately fitted to unidimensional IRT models.
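To make the exploratory bifactor tooling mentioned above concrete, here is a minimal Python sketch of the Schmid-Leiman transformation; the loading values are invented for illustration and the code is not taken from the article:

    import numpy as np

    # First-order loadings: items x group factors (hypothetical values)
    lambda1 = np.array([
        [0.7, 0.0],
        [0.6, 0.0],
        [0.0, 0.8],
        [0.0, 0.5],
    ])
    # Loadings of the group factors on the second-order general factor (hypothetical)
    gamma = np.array([0.6, 0.7])

    # Schmid-Leiman: general-factor loadings are the first-order loadings
    # projected through the second-order loadings
    general = lambda1 @ gamma
    # Residualized group-factor loadings are shrunk by the part of each group
    # factor that is not explained by the general factor
    group = lambda1 * np.sqrt(1 - gamma**2)

    print("General factor loadings:", np.round(general, 3))
    print("Group factor loadings:\n", np.round(group, 3))

Comparing the general-factor column with the residualized group-factor columns is one way of judging how much of the common variance a single dimension accounts for.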

6.
Traditional scoring of forced-choice tests produces ipsative data, which do not permit conventional reliability and validity analyses, factor analysis, or analysis of variance. In recent years, researchers have proposed IRT-based scoring models, such as the Thurstonian IRT model and the MUPP model, that avoid the drawbacks of ipsative data. The Thurstonian IRT model allows convenient parameter estimation and flexible model specification, whereas the MUPP model is less extensible and its estimation methods need improvement. On the other hand, researchers have already developed fake-resistant forced-choice tests based on the MUPP model, an application from which the Thurstonian IRT model is still relatively far. Moreover, the applicability and validity of both models await further empirical research.
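For orientation, the Thurstonian IRT model for a pairwise forced choice between items i and k is usually sketched as follows (a standard formulation from the literature, not quoted from this article): each item has a latent utility t_i = \mu_i + \lambda_i \eta + \varepsilon_i, the respondent prefers i over k when t_i > t_k, and hence

    P(y_{ik}=1 \mid \eta) = \Phi\!\left(\frac{-\gamma_{ik} + \lambda_i\eta_a - \lambda_k\eta_b}{\sqrt{\psi_i^{2} + \psi_k^{2}}}\right),

where \gamma_{ik} is a pairwise threshold, \eta_a and \eta_b are the traits tapped by the two items, and \psi^{2} are uniqueness variances.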

7.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item means substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.
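As an illustration of the simplest class of methods named above, here is a minimal Python sketch of two-way mean imputation for an incomplete item-score matrix (toy data; roughly speaking, the corrected and multiple-imputation variants examined in the article add an error term and repeat the draw):

    import numpy as np

    # Persons x items score matrix with missing entries coded as NaN (toy data)
    X = np.array([
        [1.0, 0.0, 1.0, np.nan],
        [0.0, np.nan, 1.0, 1.0],
        [1.0, 1.0, np.nan, 0.0],
    ])

    person_mean = np.nanmean(X, axis=1, keepdims=True)  # row (person) means
    item_mean = np.nanmean(X, axis=0, keepdims=True)    # column (item) means
    grand_mean = np.nanmean(X)

    # Two-way imputation: person mean + item mean - grand mean
    imputed = person_mean + item_mean - grand_mean
    X_complete = np.where(np.isnan(X), imputed, X)

    # For dichotomous items the imputed values may be rounded to 0/1
    X_binary = np.where(np.isnan(X), (imputed >= 0.5).astype(float), X)
    print(X_binary)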

8.
9.
An exploratory item-level full-information factor analysis was performed on the normative sample for the MMPI-2 (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). This method of factor analysis, developed by Schilling and Bock (Bock & Schilling, 1997) and based on item response theory, works directly with the response patterns and avoids the artifacts associated with phi coefficients and tetrachoric coefficients. Promax rotation of the factor solution organizes the clinical scale items into 10 factors that we labeled Distrust, Self-Doubt, Fitness, Serenity, Rebelliousness, Instrumentality, Irritability, Artistry, Sociability, and Self-Reliance. A comparison was made to the results of Johnson, Butcher, Null, and Johnson (1984), who performed a principal-component analysis on an item set of 550 items from the previous version of the MMPI (Hathaway & McKinley, 1943). Along with version changes and sampling differences, the essential differences between Johnson et al.'s results and ours may be attributed to differences between the Schilling and Bock method, which uses all information in the item responses, and the principal-component analysis, which uses the partial information contained in pairwise correlation coefficients. This study included 518 of the complete 567 items of the MMPI-2, versus Johnson et al.'s retention of 309 of the initially included 550 items of the previous MMPI. The full-information analysis retained all 518 initially included items and more evenly distributed the items over the 10 resulting factors, all sharply defined by their highest loading items and easy to interpret. Sampling effects and factor label considerations are discussed, along with recommendations for research that would validate the clinical utility of the implied scales for describing normal personality profiles. The full-information procedure provides for Bayes estimation of scores on these scales.

11.
Liu Hongyun, Luo Fang. Acta Psychologica Sinica, 2008, 40(1), 92-100
The authors briefly introduce multilevel item response models, examine the relationship between multilevel IRT and conventional IRT, derive the relationship between the parameters of multilevel item response models and those of conventional IRT models, and discuss extensions of the multilevel model. Using a real-data example, they apply a multilevel item response model to analyze item characteristics, test the effects of individual-level and group-level predictors on the ability parameters, and analyze differential item functioning. The article concludes with a discussion of the strengths and limitations of multilevel IRT models.
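For readers unfamiliar with the multilevel formulation, a two-level Rasch-type model with person- and group-level predictors can be sketched as (standard notation, not reproduced from the article)

    \mathrm{logit}\,P(y_{pij}=1) = \theta_{pj} - b_i, \qquad \theta_{pj} = \beta_{0j} + \beta_{1}x_{pj} + r_{pj}, \qquad \beta_{0j} = \gamma_{00} + \gamma_{01}W_j + u_{0j},

where p indexes persons, j groups, and i items; x_{pj} is a person-level predictor, W_j a group-level predictor, and r_{pj} and u_{0j} are person- and group-level residuals.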

12.
Over the past decade, Mokken scale analysis (MSA) has rapidly grown in popularity among researchers from many different research areas. This tutorial provides researchers with a set of techniques and a procedure for their application, such that the construction of scales that have superior measurement properties is further optimized, taking full advantage of the properties of MSA. First, we define the conceptual context of MSA, discuss the two item response theory (IRT) models that constitute the basis of MSA, and discuss how these models differ from other IRT models. Second, we discuss dos and don'ts for MSA; the don'ts include misunderstandings we have frequently encountered with researchers in our three decades of experience with real‐data MSA. Third, we discuss a methodology for MSA on real data that consist of a sample of persons who have provided scores on a set of items that, depending on the composition of the item set, constitute the basis for one or more scales, and we use the methodology to analyse an example real‐data set.
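A central quantity in MSA is Loevinger's scalability coefficient H. As a rough illustration of how it is computed for dichotomous items (a generic sketch with simulated data; in practice dedicated software such as the R package mokken would be used):

    import numpy as np
    from itertools import combinations

    def scalability_H(X):
        """Loevinger's H for a persons x items matrix of 0/1 scores."""
        n, k = X.shape
        p = X.mean(axis=0)                    # item popularities
        F_total = E_total = 0.0
        for i, j in combinations(range(k), 2):
            # order the pair so that `easy` is the more popular item
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)
            # Guttman error: failing the easy item while passing the hard one
            F = np.sum((X[:, easy] == 0) & (X[:, hard] == 1))
            E = n * (1 - p[easy]) * p[hard]   # expected errors under independence
            F_total += F
            E_total += E
        return 1 - F_total / E_total

    rng = np.random.default_rng(0)
    theta = rng.normal(size=500)
    b = np.linspace(-1.5, 1.5, 6)
    prob = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))  # Rasch-type toy data
    X = (rng.uniform(size=prob.shape) < prob).astype(int)
    print("Scale H:", round(scalability_H(X), 3))

By a common rule of thumb in the MSA literature, H >= 0.5 indicates a strong scale, 0.4-0.5 a medium scale, and 0.3-0.4 a weak scale.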

13.
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
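To illustrate the general idea of residual-based item fit, here is a minimal Python sketch in the spirit of the Hambleton-style standardized residual (not the ratio estimator proposed in the article; the data and parameters are simulated, and ability is treated as known for simplicity):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    theta = rng.normal(size=n)                    # abilities (treated as known here)
    a, b = 1.2, 0.3                               # 2PL item parameters
    p = 1 / (1 + np.exp(-a * (theta - b)))
    x = (rng.uniform(size=n) < p).astype(int)     # simulated item responses

    # Group examinees into ability intervals and compare the observed proportion
    # correct with the model-implied probability at the group's mean ability.
    edges = np.quantile(theta, np.linspace(0, 1, 11))
    groups = np.digitize(theta, edges[1:-1])      # 10 ability groups
    for g in range(10):
        mask = groups == g
        obs = x[mask].mean()
        exp = 1 / (1 + np.exp(-a * (theta[mask].mean() - b)))
        z = (obs - exp) / np.sqrt(exp * (1 - exp) / mask.sum())  # standardized residual
        print(f"group {g + 1}: observed={obs:.3f} expected={exp:.3f} z={z:+.2f}")

Large absolute residuals that persist across ability groups would flag the item as misfitting.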

14.
Using SAS PROC NLMIXED to fit item response theory models
Researchers routinely construct tests or questionnaires containing a set of items that measure personality traits, cognitive abilities, political attitudes, and so forth. Typically, responses to these items are scored in discrete categories, such as points on a Likert scale or a choice out of several mutually exclusive alternatives. Item response theory (IRT) explains observed responses to items on a test (questionnaire) by a person’s unobserved trait, ability, or attitude. Although applications of IRT modeling have increased considerably because of its utility in developing and assessing measuring instruments, IRT modeling has not been fully integrated into the curriculum of colleges and universities, mainly because existing general purpose statistical packages do not provide built-in routines with which to perform IRT modeling. Recent advances in statistical theory and the incorporation of those advances into general purpose statistical software such as the Statistical Analysis System (SAS) allow researchers to analyze measurement data by using a class of models known as generalized linear mixed effects models (McCulloch & Searle, 2001), which include IRT models as special cases. The purpose of this article is to demonstrate the generality and flexibility of using SAS to estimate IRT model parameters. With real data examples, we illustrate the implementations of a variety of IRT models for dichotomous, polytomous, and nominal responses. Since SAS is widely available in educational institutions, it is hoped that this article will contribute to the spread of IRT modeling in quantitative courses.
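The article works through SAS code; as a language-neutral analogue of the same idea (an IRT model treated as a nonlinear mixed model with the latent trait integrated out numerically), here is a minimal Python sketch that fits a Rasch model by marginal maximum likelihood with Gauss-Hermite quadrature. The data are simulated and the code is not the authors':

    import numpy as np
    from numpy.polynomial.hermite import hermgauss
    from scipy.optimize import minimize
    from scipy.special import expit

    rng = np.random.default_rng(2)
    n_persons, n_items = 500, 8
    true_b = np.linspace(-1.5, 1.5, n_items)
    theta = rng.normal(size=n_persons)
    X = (rng.uniform(size=(n_persons, n_items)) < expit(theta[:, None] - true_b)).astype(int)

    # Gauss-Hermite nodes and weights, rescaled to integrate over a N(0, 1) trait
    nodes, weights = hermgauss(21)
    quad_theta = np.sqrt(2.0) * nodes
    quad_w = weights / np.sqrt(np.pi)

    def neg_marginal_loglik(b):
        p = expit(quad_theta[:, None] - b[None, :])              # P(correct) at each node
        loglik_q = X @ np.log(p).T + (1 - X) @ np.log(1 - p).T   # persons x nodes
        marginal = np.exp(loglik_q) @ quad_w                     # integrate the trait out
        return -np.sum(np.log(marginal))

    fit = minimize(neg_marginal_loglik, x0=np.zeros(n_items), method="BFGS")
    print("Estimated difficulties:", np.round(fit.x, 2))
    print("True difficulties:     ", np.round(true_b, 2))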

15.
Vertical equating (vertical scaling) refers to placing tests that measure the same psychological trait at different levels onto a single score scale. IRT and MIRT are the main approaches to vertical equating. IRT does not require assumptions about the examinees' ability distribution and its parameter estimates do not depend on the sample, making it an effective method for constructing vertical scales, but its application is limited when the test does not satisfy the unidimensionality assumption. MIRT extends IRT by combining features of IRT and factor analysis, allowing more effective estimation of item and ability parameters for multidimensional tests, and it therefore plays an important role in vertical equating. Existing research has focused on the applicability of IRT and MIRT to vertical equating, on calibration and parameter estimation methods, and on comparing the properties of the two approaches. Future research should compare the methods under a wider range of conditions and extend their applications.
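Putting the separately calibrated levels onto a common metric usually comes down to a linear transformation of the latent scale (a standard IRT linking result rather than a formula from this review): if θ* = Aθ + B maps a level's scale onto the base scale, then for a 2PL item

    a^{*} = \frac{a}{A}, \qquad b^{*} = A\,b + B,

and the constants A and B are obtained from the common (anchor) items, for example by mean/sigma or characteristic-curve methods.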

16.
In a broad class of item response theory (IRT) models for dichotomous items the unweighted total score has monotone likelihood ratio (MLR) in the latent trait. In this study, it is shown that for polytomous items MLR holds for the partial credit model and a trivial generalization of this model. MLR does not necessarily hold if the slopes of the item step response functions vary over items, item steps, or both. MLR holds neither for Samejima's graded response model, nor for nonparametric versions of these three polytomous models. These results are surprising in the context of Grayson's and Huynh's results on MLR for nonparametric dichotomous IRT models, and suggest that establishing stochastic ordering properties for nonparametric polytomous IRT models will be much harder. Hemker's research was supported by the Netherlands Research Council, Grant 575-67-034. Junker's research was supported in part by the National Institutes of Health, Grant CA54852, and by the National Science Foundation, Grant DMS-94.04438.
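For reference, the partial credit model for an item with score categories x = 0, ..., m can be written as (standard notation, not reproduced from the article)

    P(X_i = x \mid \theta) = \frac{\exp\sum_{v=1}^{x}(\theta - \delta_{iv})}{\sum_{c=0}^{m}\exp\sum_{v=1}^{c}(\theta - \delta_{iv})},

with the empty sum for c = 0 set to zero. Monotone likelihood ratio of the total score X_+ in θ means that P(X_+ = s \mid \theta)/P(X_+ = r \mid \theta) is nondecreasing in θ for every s > r, which is what licenses ordering persons on θ by their total scores.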

17.
Cognition and Instruction, 2013, 31(4), 503-521
Simple arithmetic word problems are often featured in elementary school education. One type of problem, "compare with unknown reference set," ranks among the most difficult to solve. Differences in item difficulty for compare problems with unknown reference set are observed depending on the direction of the relational statement (more than vs. less than). Various cognitive models have been proposed to account for these differences. We employed item response theory (IRT) to compare competing cognitive models of student performance. The responses of 100 second-grade students to a series of compare problems with unknown reference set, along with other measures of individual differences, were fit to IRT models. Results indicated that the construction integration model (Kintsch, 1988, 1998) provided the best fit to the data. We discuss the potential contribution of psychometric approaches to the study of thinking.

18.
Reise SP, Henson JM. Assessment, 2000, 7(4), 347-364
This study asks, how well does an item response theory (IRT) based computerized adaptive NEO PI-R work? To explore this question, real-data simulations (N = 1,059) were used to evaluate a maximum information item selection computerized adaptive test (CAT) algorithm. Findings indicated satisfactory recovery of full-scale facet scores with the administration of around four items per facet scale. Thus, the NEO PI-R could be reduced in half with little loss in precision by CAT administration. However, results also indicated that the CAT algorithm was not necessary. We found that for many scales, administering the "best" four items per facet scale would have produced similar results. In the conclusion, we discuss the future of computerized personality assessment and describe the role IRT methods might play in such assessments.
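The core of maximum-information item selection is compact; here is a minimal Python sketch for 2PL items (hypothetical item parameters, not the NEO PI-R pool, and the response/ability-update step is only indicated):

    import numpy as np
    from scipy.special import expit

    def item_information(theta, a, b):
        """Fisher information of 2PL items at ability theta."""
        p = expit(a * (theta - b))
        return a**2 * p * (1 - p)

    rng = np.random.default_rng(3)
    a = rng.uniform(0.8, 2.0, size=30)      # discriminations (hypothetical pool)
    b = rng.normal(size=30)                 # difficulties (hypothetical pool)
    administered = []
    theta_hat = 0.0                         # provisional ability estimate

    for _ in range(4):                      # about four items per scale, as in the study
        info = item_information(theta_hat, a, b)
        info[administered] = -np.inf        # never re-administer an item
        nxt = int(np.argmax(info))
        administered.append(nxt)
        # ... present item nxt, score the response, update theta_hat (e.g. by EAP)

    print("Items selected at theta = 0:", administered)

Because theta_hat is not updated in this sketch, it simply returns the four most informative items at theta = 0, which corresponds to the fixed "best four items" comparison condition described above.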

19.
The application of item response theory (IRT) models requires the identification of the data's dimensionality. A popular method for determining the number of latent dimensions is the factor analysis of a correlation matrix. Unlike factor analysis, which is based on a linear model, IRT assumes a nonlinear relationship between item performance and ability. Because multidimensional scaling (MDS) assumes a monotonic relationship this method may be useful for the assessment of a data set's dimensionality for use with IRT models. This study compared MDS, exploratory and confirmatory factor analysis (EFA and CFA, respectively) in the assessment of the dimensionality of data sets which had been generated to be either one- or two-dimensional. In addition, the data sets differed in the degree of interdimensional correlation and in the number of items defining a dimension. Results showed that MDS and CFA were able to correctly identify the number of latent dimensions for all data sets. In general, EFA was able to correctly identify the data's dimensionality, except for data whose interdimensional correlation was high.
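As a rough illustration of the MDS route to dimensionality assessment (a generic sketch, not the procedure used in the study): turn the inter-item correlation matrix into dissimilarities, embed it in a low-dimensional space, and inspect the configuration and the stress value.

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical two-dimensional structure: items 0-4 load on one factor, items 5-9 on another
    loadings = np.zeros((10, 2))
    loadings[:5, 0] = 0.7
    loadings[5:, 1] = 0.7
    R = loadings @ loadings.T + np.diag(1 - np.sum(loadings**2, axis=1))

    dissim = np.sqrt(2 * (1 - R))           # a common correlation-to-distance transform
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(dissim)
    print("Stress:", round(mds.stress_, 3))
    print(np.round(coords, 2))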

20.
Although multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to assemble each module and panel under specified constraints, local item dependence (LID) that arises when certain latent factors are overlooked during test construction can also harm MST results. To investigate the harm LID does to MST, this study first introduces MST, LID, and related concepts, and then examines the problem through a comparative simulation study. The results show that LID reduces the precision of examinees' ability estimates, although the estimation bias remains small, and that this harm is not limited to any particular routing rule. To remove the harm, a testlet response model was then used as the analysis model during MST administration; the results show that although this approach removes part of the harm, its effect is limited. These findings indicate, on the one hand, that the harm LID does to the precision of ability estimation in MST deserves attention and, on the other hand, that methods for removing the harm caused by LID in MST merit further investigation.
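The testlet response model used above to absorb LID can be sketched as (a standard two-parameter testlet formulation, not reproduced from the article)

    \mathrm{logit}\,P(X_{pi}=1) = a_i\bigl(\theta_p - b_i - \gamma_{p\,d(i)}\bigr),

where d(i) is the testlet containing item i and \gamma_{p\,d(i)} is a zero-mean person-specific testlet effect; fixing its variance at zero recovers the ordinary 2PL, so the extra variance parameter is what soaks up the local dependence.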
