Similar articles
20 similar articles found.
1.
2.

Item response theory (IRT) was applied to evaluate the psychometric properties of the Spiritual Assessment Inventory (SAI; Hall & Edwards, 1996, Journal of Psychology and Theology, 24: 233–246; 2002, Journal for the Scientific Study of Religion, 41: 341–357). The SAI is a 49-item self-report questionnaire designed to assess five aspects of spirituality: Awareness of God, Disappointment (with God), Grandiosity (excessive self-importance), Realistic Acceptance (of God), and Instability (in one's relationship to God). IRT analysis revealed that for several scales: (a) two or three items per scale carry the psychometric workload, and (b) measurement precision is peaked for all five scales, such that one end of the scale, but not the other, is measured precisely. We considered how sample homogeneity and the possibly quasi-continuous nature of the SAI constructs may have affected our results and, in light of this, made suggestions for revising the SAI and for measuring spirituality in general.

3.
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.
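As a minimal, hypothetical illustration of residual-style fit checking (not the authors' exact statistic), one can compare the observed proportion correct in a narrow ability group with the probability a 2PL model predicts at that ability; under a fitting model the ratio is roughly standard normal for large groups:

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standardized_residual(n_correct, n_total, theta, a, b):
    """Observed proportion correct minus the model-implied probability,
    scaled by the binomial standard error."""
    p_hat = n_correct / n_total
    p = p_2pl(theta, a, b)
    return (p_hat - p) / math.sqrt(p * (1.0 - p) / n_total)

# Hypothetical example: 55 of 100 examinees near theta = 0 answer an
# item with a = 1.2, b = -0.2 correctly.
z = standardized_residual(55, 100, 0.0, 1.2, -0.2)
print(round(z, 3))
```

A value of |z| well above 2 for many ability groups would flag the item as misfitting.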

4.
In multidimensional item response theory (MIRT), it is possible for the estimate of a subject's ability in some dimension to decrease after the subject has answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that, in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.
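One way to explore such effects numerically is a crude grid-search ML estimator for a hypothetical two-dimensional compensatory 2PL test; comparing the two printed estimates shows how flipping a single response moves each dimension (all item parameters below are invented for illustration, and this small example need not reproduce the guaranteed paradox):

```python
import math
from itertools import product

def p_comp(theta, a, b):
    """Linearly compensatory 2PL: P(correct) = logistic(a . theta - b)."""
    s = sum(ai * ti for ai, ti in zip(a, theta)) - b
    return 1.0 / (1.0 + math.exp(-s))

def loglik(theta, items, resp):
    ll = 0.0
    for (a, b), x in zip(items, resp):
        p = p_comp(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def grid_mle(items, resp, lo=-4.0, hi=4.0, step=0.05):
    """Crude grid-search ML estimate of (theta1, theta2)."""
    grid = [lo + step * k for k in range(int(round((hi - lo) / step)) + 1)]
    return max(product(grid, grid), key=lambda t: loglik(t, items, resp))

# Hypothetical cross-loading items: ((a1, a2), b) per item.
items = [((1.5, 0.2), 0.0), ((0.2, 1.5), 0.0), ((1.0, 1.0), 0.0)]
base = grid_mle(items, (1, 0, 0))     # item 2 answered incorrectly
flipped = grid_mle(items, (1, 1, 0))  # item 2 flipped to correct
print(base, flipped)
```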

5.
This article first reviews the limitations of classical test theory, then explains the measurement principles of item response theory and its basic models in terms of two core concepts, latent trait theory and the item characteristic curve, and introduces several basic IRT models. Finally, it briefly surveys seven current lines of work in which IRT guides the construction of large item banks and the development of various new types of tests.

6.
Liu, Yang; Yang, Ji Seung; Maydeu-Olivares, Alberto. Psychometrika (2019) 84(2): 529–553
Psychometrika - In item response theory (IRT), it is often necessary to perform restricted recalibration (RR) of the model: A set of (focal) parameters is estimated holding a set of (nuisance)...

7.
8.
In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys’ prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.

9.
An application of item response theory to the analysis of the Combined Raven's Test
Using the BILOG-MG 3.0 software, marginal maximum likelihood estimation, and the three-parameter logistic (3PL) model, we analyzed Combined Raven's Test data from 354 young adult males of varying ability levels. Most of the test's items fit the 3PL model (6 items did not). The peak of the test information function lies between -3 and -2 on the difficulty scale, with a value of 16.82, and 18 items have information-function peaks below 0.2. Discrimination is satisfactory: all 72 items have discrimination parameters above 0.5. The difficulty parameters are uniformly low, mostly below 0, with a maximum of only 1.01; item difficulty is determined mainly by the level of operations an item requires. The pseudo-guessing parameters range from 0.07 to 0.24. Overall, the analysis indicates that the Combined Raven's Test measures the intelligence of normal young adults with poor precision.
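The quantities reported above, 3PL item characteristic curves and information-function peaks, can be sketched as follows. The item parameters here are invented, not the Raven calibration, and the plain logistic is used without the 1.7 scaling constant:

```python
import math

def p_3pl(theta, a, b, c):
    """Three-parameter logistic model with pseudo-guessing floor c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p

# Hypothetical easy items (a, b, c); locate the test-information peak
# on a theta grid, as the abstract does for the real test.
items = [(0.8, -2.5, 0.15), (1.0, -1.8, 0.10), (0.6, -2.2, 0.20)]
grid = [x / 10.0 for x in range(-40, 41)]
test_info = {t: sum(info_3pl(t, a, b, c) for a, b, c in items) for t in grid}
peak = max(test_info, key=test_info.get)
print(peak, round(test_info[peak], 3))
```

Because all the invented difficulties sit near -2, the information peak lands at the low end of the scale, mirroring the pattern the abstract reports.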

10.
Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

11.
Maximum likelihood and Bayesian ability estimation in multidimensional item response models can lead to paradoxical results, as proven by Hooker, Finkelman, and Schwartzman (Psychometrika 74(3): 419–442, 2009): changing a correct response on one item into an incorrect response may produce a higher ability estimate in one dimension. Furthermore, the conditions under which this paradox arises are very general and may in fact be fulfilled by many of the multidimensional scales currently in use. This paper emphasizes and extends the generality of the results of Hooker et al. by (1) considering the paradox in a generalized class of IRT models, (2) giving a weaker sufficient condition for the occurrence of the paradox, with relations to an important concept of statistical association, and (3) providing additional specific results for linearly compensatory models, with special emphasis on the factor analysis model.

12.
Usually, methods for detection of differential item functioning (DIF) compare the functioning of items across manifest groups. However, the manifest groups with respect to which the items function differentially may not necessarily coincide with the true source of the bias. It is expected that DIF detection under a model that includes a latent DIF variable is more sensitive to this source of bias. In a simulation study, it is shown that a mixture item response theory model, which includes a latent grouping variable, performs better in identifying DIF items than DIF detection methods using manifest variables only. The difference between manifest and latent DIF detection increases as the correlation between the manifest variable and the true source of the DIF becomes smaller. Different sample sizes, relative group sizes, and significance levels are studied. Finally, an empirical example demonstrates the detection of heterogeneity in a minority sample using a latent grouping variable. Manifest and latent DIF detection methods are applied to a Vocabulary test of the General Aptitude Test Battery (GATB).
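For contrast with the latent-class approach, one widely used manifest DIF statistic, the Mantel-Haenszel common odds ratio, can be sketched as follows (the counts below are hypothetical, and the paper does not necessarily use this particular manifest method):

```python
import math

def mantel_haenszel(strata):
    """strata: one (ref_correct, ref_wrong, focal_correct, focal_wrong)
    tuple per matching-score level. Returns the MH common odds ratio;
    values far from 1 flag manifest DIF."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical item answered comparably by both groups across three
# score strata: the common odds ratio stays near 1.
strata = [(40, 10, 38, 12), (30, 20, 29, 21), (15, 35, 14, 36)]
alpha = mantel_haenszel(strata)
delta = -2.35 * math.log(alpha)  # ETS delta scale
print(round(alpha, 3), round(delta, 3))
```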

13.
戴海崎, 简小珠. Psychological Science (心理科学) (2005) 28(6): 1433–1436
Ability parameter estimation is one of the most important techniques in applied item response theory research. Under an idealized testing scenario, this study examines how chance in examinees' responses affects their ability estimates. Two chance situations were designed: (1) an examinee answers correctly, by luck, one item whose difficulty exceeds the examinee's ability; (2) an examinee answers incorrectly, by accident, one or several items whose difficulty is below the examinee's ability. We examine the effect of each situation on the ability estimate and propose methods for removing the influence such chance responses introduce.
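A minimal Rasch-model sketch (hypothetical item difficulties, not the authors' procedure) shows the direction of both effects: a lucky hit on a hard item inflates the ML ability estimate, and a careless miss on an easy item deflates it:

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def theta_mle(resp, bs, lo=-6.0, hi=6.0, tol=1e-8):
    """ML ability estimate under the Rasch model via bisection on the
    score equation sum(x_i - P_i(theta)) = 0, which is decreasing in theta."""
    def score(t):
        return sum(x - rasch_p(t, b) for x, b in zip(resp, bs))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

bs = [-2.0, -1.0, 0.0, 1.0, 2.0]          # hypothetical difficulties
honest = theta_mle([1, 1, 1, 0, 0], bs)   # misses the two hardest items
lucky = theta_mle([1, 1, 1, 0, 1], bs)    # lucky hit on the hardest item
slip = theta_mle([0, 1, 1, 0, 0], bs)     # careless miss on the easiest item
print(round(honest, 2), round(lucky, 2), round(slip, 2))
```

Because raw score is a sufficient statistic in the Rasch model, a single lucky hit or careless slip moves the estimate by the full weight of one item, which is the distortion the abstract studies.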

14.
Tijmstra, Jesper; Bolsinova, Maria. Psychometrika (2019) 84(3): 846–869

The assumption of latent monotonicity is made by all common parametric and nonparametric polytomous item response theory models and is crucial for establishing an ordinal level of measurement of the item score. Three forms of latent monotonicity can be distinguished: monotonicity of the cumulative probabilities, of the continuation ratios, and of the adjacent-category ratios. Observable consequences of these different forms of latent monotonicity are derived, and Bayes factor methods for testing these consequences are proposed. These methods allow for the quantification of the evidence both in favor and against the tested property. Both item-level and category-level Bayes factors are considered, and their performance is evaluated using a simulation study. The methods are applied to an empirical example consisting of a 10-item Likert scale to investigate whether a polytomous item scoring rule results in item scores that are of ordinal level measurement.


15.
Bounds are established for log odds ratios (log cross-product ratios) involving pairs of items for item response models. First, expressions for bounds on log odds ratios are provided for one-dimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are also illustrated through an example from a study of model-checking procedures. The bounds obtained can provide an elementary basis for assessing the goodness of fit of these models.
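The quantity being bounded can be computed directly in a toy setting: the marginal log cross-product ratio of two Rasch items over a discrete ability distribution (all parameters below are invented; the paper's analytic bounds are not reproduced here):

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def population_log_odds_ratio(b1, b2, thetas, weights):
    """Marginal 2x2 table of two Rasch items, mixing over a discrete
    ability distribution; returns the log cross-product ratio."""
    p11 = p10 = p01 = p00 = 0.0
    for t, w in zip(thetas, weights):
        p1, p2 = rasch_p(t, b1), rasch_p(t, b2)
        p11 += w * p1 * p2
        p10 += w * p1 * (1 - p2)
        p01 += w * (1 - p1) * p2
        p00 += w * (1 - p1) * (1 - p2)
    return math.log(p11 * p00 / (p10 * p01))

# Hypothetical five-point ability distribution and two items.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.1, 0.2, 0.4, 0.2, 0.1]
lor = population_log_odds_ratio(-0.5, 0.8, thetas, weights)
print(round(lor, 3))
```

Mixing conditionally independent monotone items over an ability distribution induces positive association, so the log odds ratio is nonnegative, which is the kind of elementary constraint the bounds formalize.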

16.
Despite recent technological advances in mass survey administration, surveys seeking to measure personality traits still employ procedures identical to the traditional paper-and-pencil scales, in which every respondent is asked exactly the same items, regardless of his or her trait level. We present an empirical application of personality measurement in which the number and sequence of scale items are tailored to the trait level of each respondent and show that the total number of questions asked of each respondent could be reduced substantially, with a measurable and controllable increase in the standard error of measurement.
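The core of such tailoring is selecting, at each step, the unanswered item that is most informative at the respondent's current trait estimate. A sketch under a hypothetical 2PL item pool (not the authors' instrument):

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta_hat, pool, used):
    """Pick the unused item with maximum information at the current
    trait estimate -- the basic adaptive-testing selection rule."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *pool[i]))

# Hypothetical pool of (a, b) items; starting estimate theta = 0.
pool = [(1.0, -2.0), (1.2, -0.5), (0.9, 0.0), (1.5, 0.6), (1.1, 2.0)]
first = next_item(0.0, pool, set())
print(first)
```

After each response the trait estimate is updated and the rule is applied again, so items far from the respondent's level are simply never asked, which is how the item count shrinks.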

17.
Using the Mokken scale from nonparametric item response theory and its scale-construction program MSP, this study exploratorily analyzes the latent dimensionality of 40 items from the listening, grammar, and reading sections of the HSK (elementary-intermediate), and uses the results to appraise the strengths and weaknesses of the method. The analysis shows that the item set is multidimensional: the reading items have the strongest discrimination and homogeneity and cluster effectively into one class, the listening items come next, and the grammar items are weakest. The method also has notable shortcomings, particularly the interference of item discrimination with classification and the problem of setting the criterion that delimits the classification stage.

18.
This paper provides an introduction to two commonly used item response theory (IRT) models (the two-parameter logistic model and the graded response model). Throughout the paper, the Need for Cognition Scale (NCS) is used to help illustrate different features of the IRT model. After introducing the IRT models, I explore the assumptions these models make as well as ways to assess the extent to which those assumptions are plausible. Next, I describe how adopting an IRT approach to measurement can change how one thinks about scoring, score precision, and scale construction. I briefly introduce the advanced topics of differential item functioning and computerized adaptive testing before concluding with a summary of what was learned about IRT generally, and the NCS specifically.
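The second model discussed, the graded response model, can be sketched for a hypothetical five-category Likert item (parameters invented, not NCS estimates): category probabilities are differences of adjacent cumulative logistic curves.

```python
import math

def grm_probs(theta, a, thresholds):
    """Graded response model: cumulative P(X >= k) = logistic(a(theta - b_k));
    category probabilities are differences of adjacent cumulatives.
    thresholds must be ordered for all probabilities to be positive."""
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 5-category item: discrimination 1.4, four ordered thresholds.
probs = grm_probs(0.3, 1.4, [-1.5, -0.5, 0.4, 1.2])
print([round(p, 3) for p in probs])
```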

19.
The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, and the corresponding intervals are therefore no longer confidence intervals in terms of the actual definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.

20.
Wang, Chun; Xu, Gongjun; Zhang, Xue. Psychometrika (2019) 84(3): 673–700
Psychometrika - When latent variables are used as outcomes in regression analysis, a common approach that is used to solve the ignored measurement error issue is to take a multilevel perspective on...


Copyright©北京勤云科技发展有限公司  京ICP备09084417号