Similar articles
20 similar articles found (search time: 15 ms)
1.
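Item response theory (IRT) models predict examinees' responses from the characteristics of items and examinees, and they are widely used psychometric models. However, the effective use of IRT depends on how well the chosen IRT model fits the actual data (i.e., model-data fit, or goodness of fit). Only when the chosen IRT model fits the data well can the advantages and functions of IRT be fully realized (Orlando & Thissen, 2000). When the chosen model does not fit the data, or the wrong model is selected, parameter estimation, test equating, and differential item functioning analysis can carry large errors (Kang, Cohen & Sung, 2009), with adverse consequences for practical work. Therefore, before any IRT analysis, one should first examine and test whether the chosen model matches the actual data (McKinley & Mills, 1985). Commonly used model-data fit statistics in IRT can be described and compared from two perspectives: item fit and test fit. This is an important topic in psychological and educational measurement and a step that is easily overlooked in test analysis, yet no published article of this kind has appeared so far. Future research could advance empirical comparisons of these statistics and their extension to cognitive diagnosis.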

2.
3.
There has been renewed interest in Barton and Lord’s (An upper asymptote for the three-parameter logistic item response model (Tech. Rep. No. 80-20). Educational Testing Service, 1981) four-parameter item response model. This paper presents a Bayesian formulation that extends Béguin and Glas (MCMC estimation and some model fit analysis of multidimensional IRT models. Psychometrika, 66 (4):541–561, 2001) and proposes a model for the four-parameter normal ogive (4PNO) model. Monte Carlo evidence is presented concerning the accuracy of parameter recovery. The simulation results support the use of less informative uniform priors for the lower and upper asymptotes, which is an advantage over prior research. Monte Carlo results provide some support for using the deviance information criterion and \(\chi^{2}\) index to choose among models with two, three, and four parameters. The 4PNO is applied to 7491 adolescents’ responses to a bullying scale collected under the 2005–2006 Health Behavior in School-Aged Children study. The results support the value of the 4PNO to estimate lower and upper asymptotes in large-scale surveys.
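To make the role of the lower and upper asymptotes concrete, the response function of a four-parameter normal ogive item can be sketched as below. This is only an illustration, not the paper's estimation code. The slope-intercept form `a * theta - b` and the names `p_4pno`, `a`, `b`, `c`, `d` are assumptions chosen for the example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_4pno(theta: float, a: float, b: float, c: float, d: float) -> float:
    """Probability of a keyed response under a 4PNO item (illustrative).

    Assumed parametrization (not taken from the abstract):
        P(theta) = c + (d - c) * Phi(a * theta - b)
    where c is the lower asymptote and d is the upper asymptote.
    """
    if not 0.0 <= c < d <= 1.0:
        raise ValueError("asymptotes must satisfy 0 <= c < d <= 1")
    return c + (d - c) * normal_cdf(a * theta - b)


if __name__ == "__main__":
    # Hypothetical item: the curve rises from c = 0.2 towards d = 0.9.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(p_4pno(theta, a=1.0, b=0.0, c=0.2, d=0.9), 3))
```

Setting `d = 1` gives the three-parameter form and additionally setting `c = 0` gives the two-parameter form, which is why the three models can be compared as a family.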

4.
We consider the identification of a semiparametric multidimensional fixed effects item response model. Item response models are typically estimated under parametric assumptions about the shape of the item characteristic curves (ICCs), and existing results suggest difficulties in recovering the distribution of individual characteristics under nonparametric assumptions. We show that if the shape of the ICCs is unrestricted, but the shape is common across individuals and items, the individual characteristics are identified. If the shape of the ICCs is allowed to differ over items, the individual characteristics are identified in the multidimensional linear compensatory case but only identified up to a monotonic transformation in the unidimensional case. Our results suggest the development of two new semiparametric estimators for the item response model.

5.
6.
The comparative format used in ranking and paired comparisons tasks can significantly reduce the impact of uniform response biases typically associated with rating scales. Thurstone's (1927, 1931) model provides a powerful framework for modeling comparative data such as paired comparisons and rankings. Although Thurstonian models are generally presented as scaling models, that is, stimuli-centered models, they can also be used as person-centered models. In this article, we discuss how Thurstone's model for comparative data can be formulated as item response theory models so that respondents' scores on underlying dimensions can be estimated. Item parameters and latent trait scores can be readily estimated using a widely used statistical modeling program. Simulation studies show that item characteristic curves can be accurately estimated with as few as 200 observations and that latent trait scores can be recovered to a high precision. Empirical examples are given to illustrate how the model may be applied in practice and to recommend guidelines for designing ranking and paired comparisons tasks in the future.

7.
The application of item response theory (IRT) models requires the identification of the data's dimensionality. A popular method for determining the number of latent dimensions is the factor analysis of a correlation matrix. Unlike factor analysis, which is based on a linear model, IRT assumes a nonlinear relationship between item performance and ability. Because multidimensional scaling (MDS) assumes a monotonic relationship, this method may be useful for the assessment of a data set's dimensionality for use with IRT models. This study compared MDS, exploratory and confirmatory factor analysis (EFA and CFA, respectively) in the assessment of the dimensionality of data sets that had been generated to be either one- or two-dimensional. In addition, the data sets differed in the degree of interdimensional correlation and in the number of items defining a dimension. Results showed that MDS and CFA were able to correctly identify the number of latent dimensions for all data sets. In general, EFA was able to correctly identify the data's dimensionality, except for data whose interdimensional correlation was high.

8.

Item response theory (IRT) was applied to evaluate the psychometric properties of the Spiritual Assessment Inventory (SAI; Hall & Edwards, 1996, 2002). The SAI is a 49-item self-report questionnaire designed to assess five aspects of spirituality: Awareness of God, Disappointment (with God), Grandiosity (excessive self-importance), Realistic Acceptance (of God), and Instability (in one's relationship to God). IRT analysis revealed that for several scales: (a) two or three items per scale carry the psychometric workload and (b) measurement precision is peaked for all five scales, such that one end of the scale, and not the other, is measured precisely. We considered how sample homogeneity and the possible quasi-continuous nature of the SAI constructs may have affected our results and, in light of this, made suggestions for SAI revisions, as well as for measuring spirituality in general.

9.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).

10.
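When the observed indicator variables are dichotomous categorical data, traditional factor analysis methods no longer apply. The authors briefly review factor analysis models for categorical data within the SEM framework and models of the relationship between test items and latent ability within the IRT framework, and summarize the main parameter estimation methods used under the two frameworks. Two simulation studies compared the GLSc and MGLSc estimation methods under the SEM framework with the MML/EM estimation method under the IRT framework. The results show that: (1) of the three methods, GLSc yields the largest bias in parameter estimates, while MGLSc and MML/EM differ little; (2) as sample size increases, the precision of all item parameter estimates improves; (3) the precision of item factor loading and difficulty estimates is affected by test length; (4) the precision of item factor loading and discrimination estimates is affected by the level of the population factor loadings (discrimination); (5) the distribution of thresholds across test items affects the precision of parameter estimates, with item discrimination affected most; (6) overall, item parameter estimates under the SEM framework are more precise than those under the IRT framework. In addition, the article offers some suggestions on issues to note when applying the two approaches in practice.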

11.
In this paper, we apply Vuong’s general approach of model selection to the comparison of nested and non-nested unidimensional and multidimensional item response theory (IRT) models. Vuong’s approach of model selection is useful because it allows for formal statistical tests of both nested and non-nested models. However, only the test of non-nested models has been applied in the context of IRT models to date. After summarizing the statistical theory underlying the tests, we investigate the performance of all three distinct Vuong tests in the context of IRT models using simulation studies and real data. In the non-nested case we observed that the tests can reliably distinguish between the graded response model and the generalized partial credit model. In the nested case, we observed that the tests typically perform as well as or sometimes better than the traditional likelihood ratio test. Based on these results, we argue that Vuong’s approach provides a useful set of tools for researchers and practitioners to effectively compare competing nested and non-nested IRT models.
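As a rough illustration of the non-nested case only, the classical Vuong z statistic can be computed from per-respondent log-likelihood contributions of two fitted models. The sketch below assumes those contributions are already available (for example, exported from whatever IRT software fitted the models). The function name `vuong_nonnested_z` is hypothetical, no AIC/BIC-style correction is applied, and the distinguishability and nested tests discussed in the abstract are not covered.

```python
import math
from typing import Sequence, Tuple


def vuong_nonnested_z(
    loglik_a: Sequence[float], loglik_b: Sequence[float]
) -> Tuple[float, float]:
    """Return (z, two-sided p) for the uncorrected non-nested Vuong test.

    loglik_a[i] and loglik_b[i] are the log-likelihood contributions of
    respondent i under models A and B. Positive z favours model A.
    """
    n = len(loglik_a)
    if n != len(loglik_b) or n < 2:
        raise ValueError("need two equally long sequences with n >= 2")

    # Pointwise log-likelihood ratios m_i = l_A(i) - l_B(i).
    diffs = [la - lb for la, lb in zip(loglik_a, loglik_b)]
    mean_diff = sum(diffs) / n
    # Variance of the m_i with divisor n (not n - 1).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / n
    if var_diff == 0.0:
        raise ValueError("zero variance: the statistic is undefined")

    z = math.sqrt(n) * mean_diff / math.sqrt(var_diff)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Made-up contributions for four respondents, for illustration only.
    z, p = vuong_nonnested_z([-1.0, -2.0, -1.5, -0.5], [-1.5, -2.0, -2.5, -1.0])
    print(f"z = {z:.3f}, p = {p:.4f}")
```

In practice the statistic is only meaningful after the two models have been shown to be distinguishable on the data, which is what the separate variance test in Vuong's framework addresses.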

12.
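Item selection strategies for computerized adaptive testing under the graded response model   Total citations: 4, self-citations: 3, citations by others: 4
Chen Ping, Ding Shuliang, Lin Haijing, Zhou Jie. Acta Psychologica Sinica (《心理学报》), 2006, 38(3): 461-467
Item selection strategies in computerized adaptive testing (CAT) have long been a concern of researchers in China and abroad. However, research on item selection strategies for polytomously scored CAT has rarely been reported. This study used computer simulation programs to investigate four item selection strategies for CAT under the graded response model. The results show that the strategy of matching category difficulty values to the current ability estimate received the highest overall evaluation; adding a "shadow item pool" to the selection strategy markedly improves the evenness of item usage; and different item parameter distributions or different ability estimation methods both affect the CAT evaluation indices.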

13.
Jin, Ick Hoon; Jeon, Minjeong. Psychometrika, 2019, 84(1): 236-260

Item response theory (IRT) is one of the most widely utilized tools for item response analysis; however, local item and person independence, which is a critical assumption for IRT, is often violated in real testing situations. In this article, we propose a new type of analytical approach for item response data that does not require standard local independence assumptions. By adapting a latent space joint modeling approach, our proposed model can estimate pairwise distances to represent the item and person dependence structures, from which item and person clusters in latent spaces can be identified. We provide an empirical data analysis to illustrate an application of the proposed method. A simulation study is provided to evaluate the performance of the proposed method in comparison with existing methods.


14.
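An information utilization and effectiveness prediction model for personnel decisions   Total citations: 1, self-citations: 0, citations by others: 1
Using a questionnaire survey of 122 personnel managers, this study analyzed the current state of information utilization in enterprise personnel decisions and the relationship between information-utilization indicators and effectiveness. The results showed that: (1) enterprise personnel decisions place relatively high weight on ability-requirement information, organizational-environment information, and personal file materials, and neglect information on psychological characteristics; (2) non-procedural information and person-job fit information have relatively strong direct predictive power for the effectiveness indicators, whereas the other indicators have weaker predictive power; (3) information such as ability requirements, appraisal results, and personal file materials influences the effectiveness indicators through the use of person-job fit information. The article also constructs an effectiveness prediction model for personnel decisions and discusses the theoretical and practical implications of the results for diagnosing and developing enterprise personnel decisions.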

15.
There is a growing use of noncognitive assessments around the world, and recent research has posited an ideal point response process underlying such measures. A critical issue is whether the typical use of dominance approaches (e.g., average scores, factor analysis, and Samejima's graded response model) in scoring such measures is adequate. This study examined the performance of an ideal point scoring approach (e.g., the generalized graded unfolding model) as compared to the typical dominance scoring approaches in detecting curvilinear relationships between a scored trait and an external variable. Simulation results showed that when data followed the ideal point model, the ideal point approach generally exhibited more power and provided more accurate estimates of curvilinear effects than the dominance approaches. No substantial difference was found between ideal point and dominance scoring approaches in terms of Type I error rate and bias across different sample sizes and scale lengths, although skewness in the distributions of the trait and the external variable can potentially reduce statistical power. For dominance data, the ideal point scoring approach exhibited convergence problems in most conditions and failed to perform as well as the dominance scoring approaches. Practical implications for scoring responses to Likert-type surveys to examine curvilinear effects are discussed.

16.
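Equating with the item characteristic curve method under the graded response model   Total citations: 2, self-citations: 0, citations by others: 2
Building an item response theory item bank for tests that combine subjective and objective items requires equating item parameters under a polytomous model. This study derived the item characteristic curve equating method under the graded response model and applied it successfully in an actual equating trial.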

17.
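Bayesian network models provide a convenient and intuitive framework for representing relationships among variables and are well suited to modeling the content of educational assessment in diagnostic tests. This study comprehensively compares two Bayesian network classification models with the sequential polytomous diagnostic model S-GDINA. It examines how well the two Bayesian network classifiers and S-GDINA classify examinees when the Q-matrix is correctly specified and when it contains a certain proportion (25%, 30%) of errors, and it applies the Bayesian network classifiers to empirical data to illustrate their classification process and performance. The results show that when the Q-matrix is correctly specified by experts, the naive Bayes classifier performs about as well as the S-GDINA model and likewise achieves very good classification, and the tree-augmented naive Bayes classifier also achieves good classification performance. The empirical results further indicate that using Bayesian network classifiers as diagnostic classification tools in educational measurement is advantageous and feasible, especially when the test data fit the chosen diagnostic model poorly, the Q-matrix contains errors, or the test data contain considerable noise.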

18.
Recent years have shown increased awareness of the importance of personality tests in educational, clinical, and occupational settings, and developing faking-resistant personality tests is a very pragmatic issue for achieving more precise measurement. Inspired by Stark (2002) and Stark, Chernyshenko, and Drasgow (2005), we develop a pairwise preference-based personality test that aims to measure multidimensional personality traits using a large-scale statement bank. An experiment compares the resistance of the developed personality test to faking with that of rating scale-based personality tests in the item response theory model framework. Results show that latent traits estimated from the personality test based on the rating scale method are severely biased, and that the faking effect can be pragmatically ignored in the personality test developed based on the pairwise preference method.

19.
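Item information under the graded response model of item response theory   Total citations: 6, self-citations: 1, citations by others: 6
The information function is an important concept in item response theory and has great practical value in item and test analysis and in guiding test construction. Its application is even more central in computerized adaptive testing, where it has received the most attention. However, research on the properties of information functions for polytomously scored items remains scarce. This study simulated examinee trait-level parameters and item parameters: 121 trait-level points were generated for examinees, and four batches of item parameters with different discrimination parameters were generated, with each batch containing 126 items that differ in their combination of category difficulty parameters and each item having 5 difficulty levels. The analysis showed that the trait level at which a graded response model item provides maximum information corresponds to a group of several adjacent category difficulties of that item; it corresponds neither to a single category difficulty alone nor necessarily to all of them. The study calls this regularity "dominance of adjacent category difficulties". This finding offers important guidance for test quality analysis and test construction, including the construction of computerized adaptive tests.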
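The kind of computation described above can be sketched with the standard logistic form of Samejima's graded response model. This is a minimal illustration under stated assumptions, not the study's simulation code. The function names, the example item parameters, and the trait grid from -3 to 3 in steps of 0.05 (which happens to give 121 points) are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def _cumulative_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Boundary curves P*_k(theta), padded with 1.0 and 0.0 at the ends."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P_k = P*_k - P*_{k+1} (thresholds increasing)."""
    cum = _cumulative_probs(theta, a, thresholds)
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def grm_item_information(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Item information I(theta) = sum_k (dP_k/dtheta)^2 / P_k."""
    cum = _cumulative_probs(theta, a, thresholds)
    info = 0.0
    for k in range(len(cum) - 1):
        p = cum[k] - cum[k + 1]
        # Derivative of a logistic boundary curve is a * P* * (1 - P*).
        dp = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p > 0.0:
            info += dp * dp / p
    return info


def theta_of_max_information(a: float, thresholds: Sequence[float],
                             grid: Sequence[float]) -> float:
    """Grid point at which the item information is largest."""
    return max(grid, key=lambda t: grm_item_information(t, a, thresholds))


if __name__ == "__main__":
    grid = [-3.0 + 0.05 * i for i in range(121)]  # assumed trait grid
    item_thresholds = [-1.8, -0.4, 0.1, 0.5, 2.2]  # hypothetical 5 thresholds
    best = theta_of_max_information(1.5, item_thresholds, grid)
    print("theta with maximum information:", round(best, 2))
```

Running the sketch with thresholds that cluster in one region, as in the hypothetical item above, is a quick way to see where the information peak falls relative to individual thresholds.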

20.
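A comparison of two Markov chain Monte Carlo methods for handling missing data in the 2PL model   Total citations: 1, self-citations: 0, citations by others: 0
Zeng Li, Xin Tao, Zhang Shumei. Acta Psychologica Sinica (《心理学报》), 2009, 41(3): 276-282
Markov chain Monte Carlo (MCMC) is a typical method for handling missing data in item response theory. Through a simulation study, this article compared the accuracy of parameter estimation for two MCMC methods (M-H within Gibbs and DA-T Gibbs) under different numbers of examinees, numbers of items, and proportions of missing data, and complemented the simulation with an empirical study. The results show that the two methods differ: item parameter estimation in both is strongly affected by the number of examinees and relatively less affected by the missing proportion. When the sample is large and the missing proportion is small, the root mean square error (RMSE) of the M-H within Gibbs estimates is slightly smaller; as the sample size decreases or the missing proportion increases, DA-T Gibbs gradually outperforms M-H within Gibbs.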
