20 similar articles found.
1.
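The author briefly introduces the multilevel item response model, explores the relationship between multilevel item response theory and conventional item response theory, derives the relationship between the parameters of the multilevel item response model and those of the conventional item response model, and discusses extensions of the multilevel model. Using a real example, the multilevel item response model is applied to analyze the characteristics of the test items, to test the effects of individual-level and group-level predictors on the ability parameter, and to analyze differential item functioning. Finally, the article discusses the strengths and limitations of multilevel item response theory models.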
3.
Todd W. Hall, Steven P. Reise, Mark G. Haviland 《The International journal for the psychology of religion》2013,23(2):157-178
Item response theory (IRT) was applied to evaluate the psychometric properties of the Spiritual Assessment Inventory (SAI; Hall & Edwards, 1996, 2002). The SAI is a 49-item self-report questionnaire designed to assess five aspects of spirituality: Awareness of God, Disappointment (with God), Grandiosity (excessive self-importance), Realistic Acceptance (of God), and Instability (in one's relationship to God). IRT analysis revealed that for several scales: (a) two or three items per scale carry the psychometric workload and (b) measurement precision is peaked for all five scales, such that one end of the scale, and not the other, is measured precisely. We considered how sample homogeneity and the possible quasi-continuous nature of the SAI constructs may have affected our results and, in light of this, made suggestions for SAI revisions, as well as for measuring spirituality, in general.
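The two findings are statements about item and test information. A few steep items carry most of the information, and the information curve is "peaked." The sketch below shows how those two patterns arise. It assumes a dichotomous 2PL model and invented item parameters; the SAI itself uses multi-category items, so this illustrates the idea only and is not the study's analysis.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing (scoring 1 on) a 2PL item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def scale_information(theta, items):
    """Scale (test) information is the sum of the item informations."""
    return sum(item_information(theta, a, b) for a, b in items)

# Hypothetical (a, b) parameters: two steep items and three flat ones,
# all located above the trait mean, so precision peaks at the high end.
ITEMS = [(2.5, 1.0), (2.2, 1.2), (0.6, 0.8), (0.5, 1.5), (0.7, 0.9)]

for theta in (-2.0, 0.0, 1.0, 2.0):
    info = scale_information(theta, ITEMS)
    print(f"theta={theta:+.1f}  info={info:.2f}  SE={1 / math.sqrt(info):.2f}")
```

The standard error is 1/sqrt(information), so a scale whose items all sit on one side of the trait measures that side precisely and the other side poorly.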
4.
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
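The abstract uses the standardized residual of Hambleton et al. (1991) as its comparison baseline. The sketch below implements that baseline only, not the authors' new ratio-based residual. It makes three assumptions:

- The item follows a 2PL model.
- Examinees are grouped by sorting their ability estimates into equal-size groups.
- Each group's expected proportion is the model probability at the group's mean ability.

```python
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standardized_residuals(thetas, responses, a, b, n_groups=5):
    """Standardized residuals for one 2PL item.

    Examinees are sorted by ability estimate and split into n_groups groups
    of roughly equal size (the last group absorbs any remainder). In each
    group the observed proportion correct is compared with the model
    probability at the group's mean ability, scaled by its binomial SE.
    """
    pairs = sorted(zip(thetas, responses))
    size = len(pairs) // n_groups
    residuals = []
    for g in range(n_groups):
        start = g * size
        end = len(pairs) if g == n_groups - 1 else start + size
        group = pairs[start:end]
        n = len(group)
        mean_theta = sum(t for t, _ in group) / n
        observed = sum(u for _, u in group) / n
        expected = p_2pl(mean_theta, a, b)
        residuals.append((observed - expected) / math.sqrt(expected * (1.0 - expected) / n))
    return residuals

# Hypothetical misfitting item: responses follow a hard step at theta = 0,
# which a 2PL with a = 1 cannot reproduce at the extremes.
thetas = [-3.0 + 6.0 * k / 99 for k in range(100)]
responses = [1 if t > 0 else 0 for t in thetas]
print([round(r, 2) for r in standardized_residuals(thetas, responses, a=1.0, b=0.0)])
```

Misfit shows up as negative residuals in the low groups and positive residuals in the high groups.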
5.
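Equating is receiving growing attention from testing organizations and measurement researchers, and the advantages of item response theory (IRT) equating in particular have given them confidence in it. However, many have overlooked how, and to what extent, the shape of the examinees' ability distribution can affect equating results. This study focuses on how the results of item parameter equating under dichotomous IRT models differ across ability-distribution shapes, and examines the error that examinee sampling bias can introduce into item characteristic curve equating. The results show that the shape of the ability distribution significantly affects the item parameter equating coefficients. In particular, the skewness of the ability distribution is significantly linearly correlated with the intercept of the equating equation, whereas changes in the shape of the distribution have no clear effect on its slope.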
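The "slope" and "intercept" of the equating equation are the constants A and B of the linear transformation that places one calibration's parameters on another's scale: theta* = A·theta + B, b* = A·b + B, a* = a/A. The study examines characteristic-curve equating. The sketch below uses the simpler mean-sigma method instead, with hypothetical anchor-item difficulties, to show what A and B are.

```python
import statistics

def mean_sigma(b_new, b_ref):
    """Mean-sigma linking constants from anchor-item difficulties.

    Returns (A, B) such that theta_ref = A * theta_new + B.
    """
    A = statistics.pstdev(b_ref) / statistics.pstdev(b_new)
    B = statistics.mean(b_ref) - A * statistics.mean(b_new)
    return A, B

def rescale_items(items, A, B):
    """Put 2PL (a, b) parameters on the reference scale: a* = a / A, b* = A * b + B."""
    return [(a / A, A * b + B) for a, b in items]

# Hypothetical anchor items calibrated separately on two forms.
b_new = [-1.0, 0.0, 1.0]
b_ref = [-1.5, 0.5, 2.5]
A, B = mean_sigma(b_new, b_ref)
print(A, B)
print(rescale_items([(1.0, 1.0), (0.8, -0.5)], A, B))
```

Because A and B are computed from calibrations on particular samples, anything that distorts those calibrations propagates into both constants. That is the channel through which the ability-distribution effects reported above would enter.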
7.
In multidimensional item response theory (MIRT), it is possible for the estimate of a subject’s ability in some dimension to decrease after they have answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that, in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.
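The mechanism behind the paradox can be seen in a first-order approximation. At the old maximum likelihood estimate the score is zero. Flipping item k from incorrect to correct adds the item's discrimination vector a_k to the score. One Newton step therefore moves the estimate by I(theta)^{-1} a_k.

In a compensatory model every a_k is nonnegative, so the information matrix has a nonnegative off-diagonal and its inverse has a nonpositive one. An item that loads mainly on dimension 2 can therefore push dimension 1 down.

The sketch below assumes a two-dimensional compensatory 2PL with hypothetical items. It evaluates the information at theta = (0, 0) purely for illustration. It is a one-step approximation, not the exact re-estimated MLE that the paper analyzes.

```python
import math

def info_matrix(theta, items):
    """Fisher information (i11, i12, i22) of a 2D compensatory 2PL at theta."""
    i11 = i12 = i22 = 0.0
    for (a1, a2), d in items:
        p = 1.0 / (1.0 + math.exp(-(a1 * theta[0] + a2 * theta[1] + d)))
        w = p * (1.0 - p)
        i11 += w * a1 * a1
        i12 += w * a1 * a2
        i22 += w * a2 * a2
    return i11, i12, i22

def newton_shift(theta, items, k):
    """Approximate change in the estimate when item k flips from 0 to 1.

    Assumes theta is the old MLE (score zero there); the flip adds a_k to
    the score, so one Newton step moves the estimate by I(theta)^{-1} a_k.
    """
    i11, i12, i22 = info_matrix(theta, items)
    det = i11 * i22 - i12 * i12
    (a1, a2), _ = items[k]
    # Inverse of [[i11, i12], [i12, i22]] applied to (a1, a2).
    return ((i22 * a1 - i12 * a2) / det, (-i12 * a1 + i11 * a2) / det)

# Hypothetical items ((a1, a2), d); item 3 loads mostly on dimension 2.
ITEMS = [((1.0, 1.0), 0.0), ((1.0, 1.0), 0.0), ((1.0, 0.2), 0.0), ((0.2, 1.0), 0.0)]
print(newton_shift((0.0, 0.0), ITEMS, 3))
```

With these parameters, answering item 3 correctly raises the dimension-2 estimate but lowers the dimension-1 estimate.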
8.
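This article first analyzes the limitations of classical test theory. It then explains the measurement principles of item response theory (IRT) and its basic models, building on two core concepts, latent trait theory and the item characteristic curve, and introduces several basic IRT models. Finally, it briefly describes seven current active topics in applying IRT to guide the construction of large item banks and the development of new types of tests.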
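The most common basic IRT models are the one-, two- and three-parameter logistic models. The sketch below writes out their item characteristic curves as nested special cases. It assumes the conventional scaling constant D = 1.7; some texts use D = 1, and the Rasch form is often written without D.

```python
import math

D = 1.7  # scaling constant that makes the logistic approximate the normal ogive

def icc_3pl(theta, a, b, c):
    """3PL: lower asymptote c, discrimination a, difficulty b."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def icc_2pl(theta, a, b):
    """2PL: the 3PL with no guessing (c = 0)."""
    return icc_3pl(theta, a, b, 0.0)

def icc_1pl(theta, b):
    """1PL: the 2PL with a common discrimination (fixed at 1 here)."""
    return icc_3pl(theta, 1.0, b, 0.0)

for theta in (-2.0, 0.0, 2.0):
    print(theta, icc_1pl(theta, 0.0), icc_2pl(theta, 1.5, 0.0), icc_3pl(theta, 1.5, 0.0, 0.2))
```

At theta = b the 1PL and 2PL curves pass through 0.5, while the 3PL passes through (1 + c)/2.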
9.
Dylan Molenaar, Daniel Oberski, Jeroen Vermunt, Paul De Boeck 《Multivariate behavioral research》2016,51(5):606-626
Current approaches to model responses and response times to psychometric tests solely focus on between-subject differences in speed and ability. Within subjects, speed and ability are assumed to be constants. Violations of this assumption are generally absorbed in the residual of the model. As a result, within-subject departures from the between-subject speed and ability level remain undetected. These departures may be of interest to the researcher as they reflect differences in the response processes adopted on the items of a test. In this article, we propose a dynamic approach for responses and response times based on hidden Markov modeling to account for within-subject differences in responses and response times. A simulation study is conducted to demonstrate acceptable parameter recovery and acceptable performance of various fit indices in distinguishing between different models. In addition, both a confirmatory and an exploratory application are presented to demonstrate the practical value of the modeling approach.
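The computational core of any hidden Markov model is the forward algorithm, which sums over all hidden-state paths in linear time. The sketch below is a strong simplification of the article's model:

- There are two hidden states with state-specific constants: a probability correct and lognormal RT parameters.
- Response and RT are independent given the state.
- All parameter values are hypothetical.

The article's model also conditions on person ability and speed and on item parameters, which this sketch omits.

```python
import itertools
import math

def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal RT: ln t ~ N(mu, sigma^2)."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def emission(state, u, t, params):
    """Joint density of response u (0/1) and RT t given the hidden state."""
    p_correct, mu, sigma = params[state]
    p_u = p_correct if u == 1 else 1.0 - p_correct
    # Assumed: response and RT are independent given the state.
    return p_u * lognormal_pdf(t, mu, sigma)

def forward_likelihood(obs, init, trans, params):
    """Likelihood of a sequence of (response, RT) pairs via the forward algorithm."""
    states = range(len(init))
    u0, t0 = obs[0]
    alpha = [init[s] * emission(s, u0, t0, params) for s in states]
    for u, t in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emission(s, u, t, params)
                 for s in states]
    return sum(alpha)

def brute_force_likelihood(obs, init, trans, params):
    """Same quantity by summing over every hidden-state path (for checking only)."""
    total = 0.0
    for path in itertools.product(range(len(init)), repeat=len(obs)):
        prob = init[path[0]]
        for k, (u, t) in enumerate(obs):
            if k > 0:
                prob *= trans[path[k - 1]][path[k]]
            prob *= emission(path[k], u, t, params)
        total += prob
    return total

# Hypothetical model: state 0 = slow, careful responding;
# state 1 = fast responding with a lower success rate.
INIT = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
PARAMS = [(0.75, math.log(30.0), 0.4), (0.30, math.log(6.0), 0.5)]
OBS = [(1, 28.0), (0, 6.5), (1, 35.0), (0, 5.0)]
print(forward_likelihood(OBS, INIT, TRANS, PARAMS))
```

The brute-force version grows exponentially with test length and is included only to confirm that the forward recursion computes the same likelihood.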
10.
Psychometrika - In item response theory (IRT), it is often necessary to perform restricted recalibration (RR) of the model: A set of (focal) parameters is estimated holding a set of (nuisance)...
11.
A New Development in Test Theory: Multidimensional Item Response Theory (cited 3 times: 0 self-citations, 3 by others)
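Multidimensional item response theory (MIRT) is a new test theory that developed out of two traditions: factor analysis and unidimensional item response theory. Depending on how an examinee's multiple abilities interact when completing a task, multidimensional item response models fall into two classes: compensatory and noncompensatory models. After systematically introducing the compensatory models in common use, this article argues that future research should focus on four areas: polytomous and high-dimensional multidimensional models, the integration of compensatory and noncompensatory models, the development of parameter estimation software, and equating of multidimensional tests.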
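The compensatory/noncompensatory distinction comes down to how the abilities enter the response function. The sketch below shows the two standard logistic forms with hypothetical parameters:

- A compensatory model applies one logistic to a weighted sum of abilities.
- A noncompensatory (partially compensatory) model multiplies per-dimension logistics.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_compensatory(theta, a, d):
    """One logistic of a weighted sum: a high theta on one dimension
    can offset a low theta on another."""
    return logistic(sum(ak * tk for ak, tk in zip(a, theta)) + d)

def p_noncompensatory(theta, a, b):
    """Product of per-dimension logistics: the probability can never
    exceed the weakest component, so strengths cannot fully offset weaknesses."""
    prob = 1.0
    for ak, tk, bk in zip(a, theta, b):
        prob *= logistic(ak * (tk - bk))
    return prob

# Hypothetical examinee: strong on dimension 1, weak on dimension 2.
theta = (3.0, -3.0)
print(p_compensatory(theta, (1.0, 1.0), 0.0))            # strengths and weaknesses cancel
print(p_noncompensatory(theta, (1.0, 1.0), (0.0, 0.0)))  # limited by the weak dimension
```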
13.
Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
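The difference between the two kinds of interval is easiest to see with a single parameter. The sketch below estimates one examinee's ability with 2PL item parameters assumed known and hypothetical. Because theta is then the only unknown, the profile likelihood is just the likelihood, and the PL CI reduces to the likelihood-ratio interval.

The two intervals are computed as follows:

- The Wald CI is theta_hat ± 1.96·SE, with SE taken from the Fisher information.
- The likelihood-based CI collects all theta whose log-likelihood lies within chi2(1, 0.95)/2 of the maximum.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 0.95 quantile of chi-square(1)
Z_975 = 1.959963984540054        # 0.975 quantile of N(0, 1)

def loglik(theta, items, responses):
    """2PL log-likelihood, using log P = -log1p(e^-z), log(1-P) = -log1p(e^z)."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        z = a * (theta - b)
        total -= math.log1p(math.exp(-z)) if u == 1 else math.log1p(math.exp(z))
    return total

def score(theta, items, responses):
    """Derivative of the log-likelihood: sum of a * (u - P)."""
    return sum(a * (u - 1.0 / (1.0 + math.exp(-a * (theta - b))))
               for (a, b), u in zip(items, responses))

def information(theta, items):
    total = 0.0
    for a, b in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ability_intervals(items, responses):
    """MLE, likelihood-ratio 95% CI and Wald 95% CI (needs a mixed response pattern)."""
    theta_hat = bisect(lambda t: score(t, items, responses), -6.0, 6.0)
    ll_hat = loglik(theta_hat, items, responses)
    drop = lambda t: 2.0 * (ll_hat - loglik(t, items, responses)) - CHI2_1DF_95
    lr = (bisect(drop, theta_hat - 10.0, theta_hat), bisect(drop, theta_hat, theta_hat + 10.0))
    se = 1.0 / math.sqrt(information(theta_hat, items))
    wald = (theta_hat - Z_975 * se, theta_hat + Z_975 * se)
    return theta_hat, lr, wald

ITEMS = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (2.0, 1.0)]
RESP = [1, 1, 1, 0, 0]
theta_hat, lr, wald = ability_intervals(ITEMS, RESP)
print(theta_hat, lr, wald)
```

The Wald interval is symmetric by construction. The likelihood-based interval follows the actual shape of the log-likelihood and need not be symmetric.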
14.
Jung Aa Moon, Sandip Sinharay, Madeleine Keehner, Irvin R. Katz 《International Journal of Testing》2020,20(2):122-145
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.
16.
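Computer-based tests can record the time examinees spend answering each item (Response Time, RT). As an important source of auxiliary information, RT is valuable for test development and administration, especially in Computerized Adaptive Testing (CAT). This article briefly reviews and comments on applications of RT to item selection in CAT and analyzes the practical feasibility of these techniques. Finally, it discusses open problems in using RT for CAT item selection and directions for further research.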
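One family of RT-based selection rules divides an item's information by how long the item is expected to take, often called "maximum information per time unit." The sketch below assumes three things; none of them comes from the article:

- Responses follow a 2PL model.
- Response times follow a lognormal RT model in which ln T ~ N(beta - tau, 1/alpha^2) for an item with time intensity beta and an examinee with speed tau. The mean RT is therefore exp(beta - tau + 1/(2·alpha^2)).
- The item pool is hypothetical.

The article's review may cover other criteria.

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def expected_rt(tau, alpha, beta):
    """Mean RT when ln T ~ N(beta - tau, 1 / alpha^2)."""
    return math.exp(beta - tau + 1.0 / (2.0 * alpha * alpha))

def select_item(theta, tau, pool, administered):
    """Index of the unused item with the most information per expected second.

    pool holds hypothetical (a, b, alpha, beta) tuples; theta and tau are the
    current ability and speed estimates.
    """
    best, best_value = None, -math.inf
    for idx, (a, b, alpha, beta) in enumerate(pool):
        if idx in administered:
            continue
        value = info_2pl(theta, a, b) / expected_rt(tau, alpha, beta)
        if value > best_value:
            best, best_value = idx, value
    return best

# Items 0 and 1 are equally informative at theta = 0, but item 1 is faster.
POOL = [(1.5, 0.0, 2.0, 4.0), (1.5, 0.0, 2.0, 3.0), (0.5, 2.0, 2.0, 2.0)]
print(select_item(0.0, 0.0, POOL, administered=set()))
```

A practical concern with such rules is that they favor quick items regardless of content, so operational CATs would typically combine them with exposure control and content constraints.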
17.
Alexis D. Abernethy, Seong-Hyeon Kim 《The International journal for the psychology of religion》2013,23(4):240-256
In an attempt to measure understudied dimensions of spirituality, recent efforts have focused on the transcendent dimension of spirituality. The Spiritual Transcendence Index (STI) was developed to assess a perceived experience of the sacred that affects one’s ability to transcend life’s difficulties. The main focus of the current study was to investigate the psychometric properties of the STI by utilizing the microscopic item-level examination tools unique in item response theory (IRT), as well as its scale-level exploration devices for psychometric properties of an assessment measure. IRT analyses were conducted to investigate the STI’s psychometric properties across samples (N = 712) including how well the measure assesses the latent construct, spiritual transcendence, from the low to high range of the construct. The findings confirm that the 8-item index is a single factor that assesses the latent construct, spiritual transcendence. Instead of the original 6-category version, these findings support a 4-category response version; the 3 categories of disagreement may be collapsed into a single category. These findings not only inform the refinement of the STI but also highlight an important psychometric approach for the refinement of spirituality/religiousness measures, especially those with ceiling effects.
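Collapsing categories is a recode applied before refitting the polytomous model. The sketch below assumes the 6-point scale is coded 1 to 6, with 1 to 3 as the three disagreement categories. That coding and the sample responses are hypothetical, not taken from the study.

```python
from collections import Counter

# Assumed coding of the original 6-point scale:
# 1 strongly disagree, 2 disagree, 3 slightly disagree,
# 4 slightly agree, 5 agree, 6 strongly agree.
COLLAPSE_6_TO_4 = {1: 1, 2: 1, 3: 1, 4: 2, 5: 3, 6: 4}

def collapse(responses):
    """Merge the three disagreement categories into one, keeping the rest in order."""
    return [COLLAPSE_6_TO_4[r] for r in responses]

raw = [6, 5, 6, 4, 2, 6, 5, 1, 6, 3]  # hypothetical responses to one item
print(Counter(raw))
print(Counter(collapse(raw)))
```

With a ceiling effect, the disagreement categories are sparsely used, which makes their category thresholds hard to estimate. Merging them pools those observations into one category.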
19.
Maximum likelihood and Bayesian ability estimation in multidimensional item response models can lead to paradoxical results, as proven by Hooker, Finkelman, and Schwartzman (Psychometrika 74(3): 419–442, 2009): changing a correct response on one item into an incorrect response may produce a higher ability estimate in one dimension. Furthermore, the conditions under which this paradox arises are very general and may in fact be fulfilled by many of the multidimensional scales currently in use. This paper tries to emphasize and extend the generality of the results of Hooker et al. by (1) considering the paradox in a generalized class of IRT models, (2) giving a weaker sufficient condition for the occurrence of the paradox in relation to an important concept of statistical association, and (3) providing some additional specific results for linearly compensatory models, with special emphasis on the factor analysis model.
20.
In most item response theory applications, model parameters need to be first calibrated from sample data. Latent variable (LV) scores calculated using estimated parameters are thus subject to sampling error inherited from the calibration stage. In this article, we propose a resampling-based method, namely bootstrap calibration (BC), to reduce the impact of the carryover sampling error on the interval estimates of LV scores. BC modifies the quantile of the plug-in posterior, i.e., the posterior distribution of the LV evaluated at the estimated model parameters, to better match the corresponding quantile of the true posterior, i.e., the posterior distribution evaluated at the true model parameters, over repeated sampling of calibration data. Furthermore, to achieve better coverage of the fixed true LV score, we explore the use of BC in conjunction with Jeffreys’ prior. We investigate the finite-sample performance of BC via Monte Carlo simulations and apply it to two empirical data examples.
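The starting point for BC is the plug-in posterior interval: estimated item parameters are treated as if they were true, and quantiles are read off the resulting posterior for the LV. The sketch below computes only that plug-in interval, not BC itself. It assumes:

- a 2PL model;
- a standard normal prior (rather than the Jeffreys' prior the article also explores);
- a grid approximation;
- hypothetical item estimates.

BC would then adjust the quantile levels using bootstrap resamples of the calibration data, a step not shown here.

```python
import math

def plug_in_interval(responses, items, level=0.95, grid_size=2001, limit=6.0):
    """Equal-tailed interval from the plug-in posterior of theta.

    items are *estimated* 2PL (a, b) pairs treated as true values; the prior
    is N(0, 1); the posterior is evaluated on an evenly spaced grid.
    """
    step = 2.0 * limit / (grid_size - 1)
    grid = [-limit + k * step for k in range(grid_size)]
    log_post = []
    for t in grid:
        lp = -0.5 * t * t  # log N(0, 1) prior, up to a constant
        for (a, b), u in zip(items, responses):
            z = a * (t - b)
            lp -= math.log1p(math.exp(-z)) if u == 1 else math.log1p(math.exp(z))
        log_post.append(lp)
    top = max(log_post)  # subtract the max before exponentiating for stability
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w / total
        cdf.append(running)
    tail = (1.0 - level) / 2.0
    lower = next(t for t, c in zip(grid, cdf) if c >= tail)
    upper = next(t for t, c in zip(grid, cdf) if c >= 1.0 - tail)
    return lower, upper

# Hypothetical calibrated items and one examinee's responses.
ITEMS_HAT = [(1.1, -0.8), (0.9, 0.0), (1.4, 0.6), (1.0, 1.2)]
print(plug_in_interval([1, 1, 0, 0], ITEMS_HAT))
```

Because this interval ignores the uncertainty in the item estimates, its coverage can fall short of the nominal level when the calibration sample is small. That is the problem BC is designed to address.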