1.

Paradoxical Results in Multidimensional Item Response Theory

Giles Hooker Matthew Finkelman Armin Schwartzman 《Psychometrika》2009,74(3):419-442

In multidimensional item response theory (MIRT), it is possible for the estimate of a subject’s ability in some dimension to decrease after they have answered a question correctly. This paper investigates how and when this type of paradoxical result can occur. We demonstrate that many response models and statistical estimates can produce paradoxical results and that in the popular class of linearly compensatory models, maximum likelihood estimates are guaranteed to do so. In light of these findings, the appropriateness of multidimensional item response methods for assigning scores in high-stakes testing is called into question.
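A minimal numerical sketch of such a paradoxical result, assuming a two-dimensional linearly compensatory logistic model with P(correct) = sigmoid(a1*theta1 + a2*theta2 - b). The item parameters, the response patterns, and the function names are hypothetical, chosen only for illustration and not taken from the paper. Four items load on both dimensions and three load on the first dimension only; changing one answer on a first-dimension item from incorrect to correct lowers the maximum likelihood estimate of the second ability.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik(theta, items, responses):
    """Log-likelihood of a response pattern at the ability vector theta."""
    total = 0.0
    for (a1, a2, b), y in zip(items, responses):
        p = sigmoid(a1 * theta[0] + a2 * theta[1] - b)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def mle(items, responses, max_iter=100, tol=1e-12):
    """Maximum likelihood ability estimate: Newton's method with step halving."""
    t = (0.0, 0.0)
    for _ in range(max_iter):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for (a1, a2, b), y in zip(items, responses):
            p = sigmoid(a1 * t[0] + a2 * t[1] - b)
            w = p * (1.0 - p)
            g1 += a1 * (y - p)          # gradient of the log-likelihood
            g2 += a2 * (y - p)
            h11 += a1 * a1 * w          # negative Hessian (information matrix)
            h12 += a1 * a2 * w
            h22 += a2 * a2 * w
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det    # Newton direction
        d2 = (h11 * g2 - h12 * g1) / det
        step, base = 1.0, log_lik(t, items, responses)
        while step > 1e-8 and log_lik(
            (t[0] + step * d1, t[1] + step * d2), items, responses
        ) < base:
            step /= 2.0                 # halve the step if it does not improve
        t = (t[0] + step * d1, t[1] + step * d2)
        if abs(step * d1) + abs(step * d2) < tol:
            break
    return t

# Hypothetical test: (a1, a2, b) per item. Assumed values, not from the paper.
ITEMS = [
    (1.0, 1.0, -1.0), (1.0, 1.0, -0.5), (1.0, 1.0, 0.5), (1.0, 1.0, 1.0),
    (1.0, 0.0, -1.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0),
]
BEFORE = [1, 1, 0, 0, 1, 0, 0]   # sixth item answered incorrectly
AFTER = [1, 1, 0, 0, 1, 1, 0]    # same pattern, sixth item now correct

theta_before = mle(ITEMS, BEFORE)
theta_after = mle(ITEMS, AFTER)
print("before:", theta_before)
print("after: ", theta_after)
```

In this design the items that load on both dimensions pin down the sum theta1 + theta2, so the extra correct answer raises the first estimate and the second one must fall to compensate.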

2.

Generalizations of Paradoxical Results in Multidimensional Item Response Theory

Pascal Jordan Martin Spiess 《Psychometrika》2012,77(1):127-152

Maximum likelihood and Bayesian ability estimation in multidimensional item response models can lead to paradoxical results as proven by Hooker, Finkelman, and Schwartzman (Psychometrika 74(3): 419–442, 2009): Changing a correct response on one item into an incorrect response may produce a higher ability estimate in one dimension. Furthermore, the conditions under which this paradox arises are very general, and may in fact be fulfilled by many of the multidimensional scales currently in use. This paper tries to emphasize and extend the generality of the results of Hooker et al. by (1) considering the paradox in a generalized class of IRT models, (2) giving a weaker sufficient condition for the occurrence of the paradox with relations to an important concept of statistical association, and by (3) providing some additional specific results for linearly compensatory models with special emphasis on the factor analysis model.

3.

Paradoxical Results and Item Bundles

Giles Hooker Matthew Finkelman 《Psychometrika》2010,75(2):249-271

Hooker, Finkelman, and Schwartzman (Psychometrika, 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to occur leads to the undesirable possibility of a subject’s best answer being detrimental to them. This paper considers the existence of paradoxical results in tests composed of item bundles when compensatory models are used. We demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject. However, when these nuisance parameters are modeled as random effects, or used in a Bayesian analysis, it is possible to design tests comprised of many short bundles that avoid paradoxical results and we provide an algorithm for doing so. We also examine alternative models for handling dependence between item bundles and show that using fixed dependency effects is always guaranteed to avoid paradoxical results.

4.

A Proposed Number Correct Scoring Procedure Based on Classical True-Score Theory and Multidimensional Item Response Theory

《International Journal of Testing》2013,13(2):131-141

A hybrid procedure for number correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT and the total test scores are computed based on CTT. Thus, what makes the hybrid scoring method attractive is that this method accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

5.

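New Developments in Test Theory: Multidimensional Item Response Theory (cited 3 times in total: 0 self-citations, 3 by others)

Kang Chunhua (康春花) Xin Tao (辛涛) 《心理科学进展》 (Advances in Psychological Science) 2010,18(3):530-536

Multidimensional item response theory (MIRT) is a new test theory that grew out of two foundations: factor analysis and unidimensional item response theory. Depending on how a subject's multiple abilities interact while completing a task, MIRT models fall into two classes: compensatory and noncompensatory. After systematically introducing the compensatory models in common use today, this paper suggests that future research should address four areas: multidimensional models for polytomous scoring and high-dimensional spaces, the integration of compensatory and noncompensatory models, the development of parameter estimation programs, and the equating of multidimensional tests.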
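The distinction between compensatory and noncompensatory models can be sketched with two common textbook forms, which are an assumption here and not necessarily the parameterizations used in the paper: a compensatory model passes a weighted sum of abilities through one logistic function, while a noncompensatory model multiplies one logistic term per dimension. All parameter values and function names below are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def compensatory(theta, a, d):
    """P(correct) = sigmoid(sum_k a_k * theta_k + d).

    A high ability on one dimension can offset a low ability on another,
    because only the weighted sum matters.
    """
    return sigmoid(sum(ak * tk for ak, tk in zip(a, theta)) + d)

def noncompensatory(theta, a, b):
    """P(correct) = prod_k sigmoid(a_k * (theta_k - b_k)).

    Every dimension must be adequate: one low ability caps the product.
    """
    p = 1.0
    for ak, tk, bk in zip(a, theta, b):
        p *= sigmoid(ak * (tk - bk))
    return p

# Hypothetical item with equal loadings on two dimensions.
a = (1.0, 1.0)
balanced = (0.0, 0.0)     # average on both dimensions
lopsided = (2.0, -2.0)    # strong on the first, weak on the second

for name, theta in (("balanced", balanced), ("lopsided", lopsided)):
    print(name,
          round(compensatory(theta, a, d=0.0), 3),
          round(noncompensatory(theta, a, b=(0.0, 0.0)), 3))
```

Under the compensatory form the two subjects have the same success probability, since their weighted ability sums are equal; under the noncompensatory form the lopsided subject is penalized for the weak dimension.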

6.

Detecting Curvilinear Relationships: A Comparison of Scoring Approaches Based on Different Item Response Models

Mengyang Cao Q. Chelsea Song Louis Tay 《International Journal of Testing》2018,18(2):178-205

There is a growing use of noncognitive assessments around the world, and recent research has posited an ideal point response process underlying such measures. A critical issue is whether the typical use of dominance approaches (e.g., average scores, factor analysis, and Samejima's graded response model) in scoring such measures is adequate. This study examined the performance of an ideal point scoring approach (e.g., the generalized graded unfolding model) as compared to the typical dominance scoring approaches in detecting curvilinear relationships between the scored trait and an external variable. Simulation results showed that when data followed the ideal point model, the ideal point approach generally exhibited more power and provided more accurate estimates of curvilinear effects than the dominance approaches. No substantial difference was found between ideal point and dominance scoring approaches in terms of Type I error rate and bias across different sample sizes and scale lengths, although skewness in the distributions of the trait and the external variable can potentially reduce statistical power. For dominance data, the ideal point scoring approach exhibited convergence problems in most conditions and failed to perform as well as the dominance scoring approaches. Practical implications for scoring responses to Likert-type surveys to examine curvilinear effects are discussed.

7.

Specifying Ability Growth Models Using a Multidimensional Item Response Model for Repeated Measures Categorical Ordinal Item Response Data

Insu Paek Zhen Li Hyun-Jeong Park 《Multivariate behavioral research》2016,51(4):569-580

When categorical ordinal item response data are collected over multiple timepoints from a repeated measures design, an item response theory (IRT) modeling approach whose unit of analysis is an item response is suitable. This study proposes a few longitudinal IRT models and illustrates how a popular compensatory multidimensional IRT model can be utilized to formulate such longitudinal IRT models, which permits an investigation of ability growth at both individual and population levels. The equivalence of an existing multidimensional IRT model and those longitudinal IRT models is also elaborated so that one can make use of an existing multidimensional IRT model to implement the longitudinal IRT models.

8.

Reporting of Subscores Using Multidimensional Item Response Theory

Shelby J. Haberman Sandip Sinharay 《Psychometrika》2010,75(2):209-227

Recently, there has been increasing interest in reporting subscores. This paper examines reporting of subscores using multidimensional item response theory (MIRT) models (e.g., Reckase in Appl. Psychol. Meas. 21:25–36, 1997; C.R. Rao and S. Sinharay (Eds), Handbook of Statistics, vol. 26, pp. 607–642, North-Holland, Amsterdam, 2007; Beguin & Glas in Psychometrika, 66:471–488, 2001). A MIRT model is fitted using a stabilized Newton–Raphson algorithm (Haberman in The Analysis of Frequency Data, University of Chicago Press, Chicago, 1974; Sociol. Methodol. 18:193–211, 1988) with adaptive Gauss–Hermite quadrature (Haberman, von Davier, & Lee in ETS Research Rep. No. RR-08-45, ETS, Princeton, 2008). A new statistical approach is proposed to assess when subscores using the MIRT model have any added value over (i) the total score or (ii) subscores based on classical test theory (Haberman in J. Educ. Behav. Stat. 33:204–229, 2008; Haberman, Sinharay, & Puhan in Br. J. Math. Stat. Psychol. 62:79–95, 2008). The MIRT-based methods are applied to several operational data sets. The results show that the subscores based on MIRT are slightly more accurate than subscore estimates derived by classical test theory.

9.

Additive Multilevel Item Structure Models with Random Residuals: Item Modeling for Explanation and Item Generation

Sun-Joo Cho Paul De Boeck Susan Embretson Sophia Rabe-Hesketh 《Psychometrika》2014,79(1):84-104

An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category levels with random residuals at both levels. The AMIS model is useful for explanation purposes and also for prediction purposes as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.

10.

On the Complexity of Item Response Theory Models

Wes Bonifay Li Cai 《Multivariate behavioral research》2017,52(4):465-484

11.

Latent Variable Selection for Multidimensional Item Response Theory Models via $L_{1}$ Regularization

Jianan Sun Yunxiao Chen Jingchen Liu Zhiliang Ying Tao Xin 《Psychometrika》2016,81(4):921-939

We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies latent traits probed by items of a multidimensional test. Its basic strategy is to impose an $L_{1}$ penalty term on the log-likelihood. The computation is carried out by the expectation–maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator provides an effective way of correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.
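To illustrate how an $L_{1}$ penalty combined with coordinate descent sets some coefficients exactly to zero, here is a minimal sketch on a penalized least-squares problem. This is a deliberately simpler stand-in: the paper penalizes an item response log-likelihood inside an expectation–maximization loop, which is not reproduced here. The data, the penalty weight `lam`, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize 0.5 * sum_i (y_i - x_i . beta)^2 + lam * sum_j |beta_j|.

    Each coordinate update has a closed form: soft-threshold the correlation
    of column j with the partial residual, then divide by the squared norm
    of column j. Assumes no column of X is entirely zero.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            norm = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)   # partial residual correlation
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm
    return beta

# Hypothetical data: the first predictor matters, the second barely does.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.3, 2.0, 0.1]
beta = lasso_coordinate_descent(X, y, lam=1.0)
print(beta)
```

The weak second coefficient is removed entirely while the first is only shrunk, which is the selection behavior that makes an $L_{1}$ penalty suitable for deciding which loadings are nonzero.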

12.

A Partially Confirmatory Approach to the Multidimensional Item Response Theory with the Bayesian Lasso

Jinsong Chen 《Psychometrika》2020,85(3):738-774

For test development in the setting of multidimensional item response theory, the exploratory and confirmatory approaches lie on two ends of a continuum in terms of the loading and...

13.

Restricted Recalibration of Item Response Theory Models

Yang Liu Ji Seung Yang Alberto Maydeu-Olivares 《Psychometrika》2019,84(2):529-553

In item response theory (IRT), it is often necessary to perform restricted recalibration (RR) of the model: A set of (focal) parameters is estimated holding a set of (nuisance)...

14.

Identifying the Source of Misfit in Item Response Theory Models

Yang Liu Alberto Maydeu-Olivares 《Multivariate behavioral research》2013,48(4):354-371

When an item response theory model fails to fit adequately, the items for which the model provides a good fit and those for which it does not must be determined. To this end, we compare the performance of several fit statistics for item pairs with known asymptotic distributions under maximum likelihood estimation of the item parameters: (a) a mean and variance adjustment to bivariate Pearson's X², (b) a bivariate subtable analog to Reiser's (1996) overall goodness-of-fit test, (c) a z statistic for the bivariate residual cross product, and (d) Maydeu-Olivares and Joe's (2006) M₂ statistic applied to bivariate subtables. The unadjusted Pearson's X² with heuristically determined degrees of freedom is also included in the comparison. For binary and ordinal data, our simulation results suggest that the z statistic has the best Type I error and power behavior among all the statistics under investigation when the observed information matrix is used in its computation. However, if one has to use the cross-product information, the mean and variance adjusted X² is recommended. We illustrate the use of pairwise fit statistics in 2 real-data examples and discuss possible extensions of the current research in various directions.

15.

Profile-likelihood Confidence Intervals in Item Response Theory Models

R. Philip Chalmers Jolynn Pek Yang Liu 《Multivariate behavioral research》2017,52(5):533-550

Confidence intervals (CIs) are fundamental inferential devices which quantify the sampling variability of parameter estimates. In item response theory, CIs have been primarily obtained from large-sample Wald-type approaches based on standard error estimates, derived from the observed or expected information matrix, after parameters have been estimated via maximum likelihood. An alternative approach to constructing CIs is to quantify sampling variability directly from the likelihood function with a technique known as profile-likelihood confidence intervals (PL CIs). In this article, we introduce PL CIs for item response theory models, compare PL CIs to classical large-sample Wald-type CIs, and demonstrate important distinctions among these CIs. CIs are then constructed for parameters directly estimated in the specified model and for transformed parameters which are often obtained post-estimation. Monte Carlo simulation results suggest that PL CIs perform consistently better than Wald-type CIs for both non-transformed and transformed parameters.
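The contrast between Wald-type and likelihood-based intervals can be sketched on a one-parameter binomial model, which is an assumed stand-in and not an item response model. With a single parameter there are no nuisance parameters to maximize over, so the profile likelihood coincides with the ordinary likelihood. The interval endpoints are the values where twice the drop in log-likelihood from its maximum equals the 95% chi-square critical value with one degree of freedom. The data (2 successes in 20 trials) and the function names are hypothetical.

```python
import math

def log_lik(p, y, n):
    """Binomial log-likelihood up to an additive constant."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def wald_ci(y, n, z=1.959964):
    """Estimate plus or minus z standard errors; symmetric by construction."""
    p_hat = y / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def likelihood_ci(y, n, crit=3.841459):
    """Likelihood-based interval; requires 0 < y < n."""
    p_hat = y / n
    target = log_lik(p_hat, y, n) - crit / 2.0

    def f(p):
        return log_lik(p, y, n) - target

    def bisect(lo, hi):
        # f changes sign exactly once between lo and hi
        for _ in range(200):
            mid = (lo + hi) / 2.0
            if (f(mid) > 0) == (f(lo) > 0):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    eps = 1e-12
    return bisect(eps, p_hat), bisect(p_hat, 1.0 - eps)

y, n = 2, 20   # hypothetical data
print("Wald:      ", wald_ci(y, n))
print("Likelihood:", likelihood_ci(y, n))
```

With these data the Wald interval extends below zero, an impossible value for a probability, while the likelihood-based interval stays inside (0, 1) and is asymmetric around the estimate.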

16.

Item Response Models for Forced-Choice Questionnaires: A Common Framework

Anna Brown 《Psychometrika》2016,81(1):135-160

17.

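Development and Application of Higher-Order Item Response Models

Chen Feipeng (陈飞鹏) Zhan Peida (詹沛达) Wang Lijun (王立君) Chen Chunxiao (陈春晓) Cai Mao (蔡毛) 《心理科学进展》 (Advances in Psychological Science) 2015,23(1):150-157

When measuring latent traits that have a hierarchical structure, standard item response models estimate both item parameters and ability parameters inefficiently. Multidimensional item response models estimate first-order traits efficiently but do not account for the hierarchy among traits, so they are not suited to hierarchically structured traits. Higher-order item response models, by contrast, not only estimate item and ability parameters efficiently and accurately but also yield the higher-order and lower-order traits simultaneously. Existing higher-order item response models include the higher-order DINA model, the higher-order two-parameter normal ogive hierarchical model, the higher-order logistic model, higher-order item response models for polytomous scoring, and the higher-order testlet model. Future research on higher-order item response models should attend to multilevel higher-order item response models, higher-order item response models with within-item multidimensionality, and higher-order cognitive diagnosis models.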

18.

Hidden Markov Item Response Theory Models for Responses and Response Times

Dylan Molenaar Daniel Oberski Jeroen Vermunt Paul De Boeck 《Multivariate behavioral research》2016,51(5):606-626

Current approaches to model responses and response times to psychometric tests solely focus on between-subject differences in speed and ability. Within subjects, speed and ability are assumed to be constants. Violations of this assumption are generally absorbed in the residual of the model. As a result, within-subject departures from the between-subject speed and ability level remain undetected. These departures may be of interest to the researcher as they reflect differences in the response processes adopted on the items of a test. In this article, we propose a dynamic approach for responses and response times based on hidden Markov modeling to account for within-subject differences in responses and response times. A simulation study is conducted to demonstrate acceptable parameter recovery and acceptable performance of various fit indices in distinguishing between different models. In addition, both a confirmatory and an exploratory application are presented to demonstrate the practical value of the modeling approach.

19.

Dimensionality of the Latent Structure and Item Selection Via Latent Class Multidimensional IRT Models

F. Bartolucci G. E. Montanari S. Pandolfi 《Psychometrika》2012,77(4):782-802

With reference to a questionnaire aimed at assessing the performance of Italian nursing homes on the basis of the health conditions of their patients, we investigate two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire. The approach is based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities. This model represents the health status of a patient by latent variables having a discrete distribution and, therefore, it may be seen as a constrained version of the latent class model. On the basis of the adopted model, we implement a hierarchical clustering algorithm aimed at assessing the actual number of dimensions measured by the questionnaire. These dimensions correspond to disjoint groups of items. Once the number of dimensions is selected, we also study the discriminating power of every item, so that it is possible to select the subset of these items which is able to provide an amount of information close to that of the full set. We illustrate the proposed approach on the basis of the data collected on 1,051 elderly people hosted in a sample of Italian nursing homes.

20.

Confirmatory Analyses of Componential Test Structure Using Multidimensional Item Response Theory

《Multivariate behavioral research》2013,48(2):245-268

The componential structure of synonym tasks is investigated using confirmatory multidimensional two-parameter IRT models. It was hypothesized that an open synonym task is decomposable into generating synonym candidates and evaluating these candidate words with respect to their synonymy with the stimulus word. Two subtasks were constructed to identify these two components. Different confirmatory models were estimated both with TESTMAP and with NOHARM. The componential hypothesis was supported, but it was found that the generation subtask also involved some evaluation and that generation and evaluation were highly correlated.