Similar literature
 Found 20 similar articles; search took 31 ms
1.
Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

2.
This paper investigates the dichotomous Mokken nonparametric item response theory (IRT) axioms and properties under incomparabilities among latent trait values and items. Generalized equivalents of the unidimensional nonparametric IRT axioms and properties are formulated for nonlinear (quasi-ordered) person and indicator spaces. It is shown that monotone likelihood ratio (MLR) for the total score variable and nonlinear latent trait implies stochastic ordering (SO) of the total score variable, but may fail to imply SO of the nonlinear latent trait. The reason for this and conditions under which the implication holds are specified, based on a new, simpler proof of the fact that in the unidimensional case MLR implies SO. The approach is applied in knowledge space theory (KST), a combinatorial test theory. This leads to a (tentative) Mokken-type nonparametric axiomatization in the currently parametric theory of knowledge spaces. The nonparametric axiomatization is compared with the assumptions of the parametric basic local independence model which is fundamental in KST. It is concluded that this paper may provide a first step toward a basis for a possible fusion of the two split directions of psychological test theories IRT and KST.

3.
A Bayesian random effects model for testlets   Cited by: 4 (self-citations: 0, by others: 4)
Standard item response theory (IRT) models fit to dichotomous examination responses ignore the fact that sets of items (testlets) often come from a single common stimulus (e.g., a reading comprehension passage). In this setting, all items given to an examinee are unlikely to be conditionally independent (given examinee proficiency). Models that assume conditional independence will overestimate the precision with which examinee proficiency is measured. Overstatement of precision may lead to inaccurate inferences such as prematurely ending an examination in which the stopping rule is based on the estimated standard error of examinee proficiency (e.g., an adaptive test). To model examinations that may be a mixture of independent items and testlets, we modified one standard IRT model to include an additional random effect for items nested within the same testlet. We use a Bayesian framework to facilitate posterior inference via a Data Augmented Gibbs Sampler (DAGS; Tanner & Wong, 1987). The modified and standard IRT models are both applied to a data set from a disclosed form of the SAT. We also provide simulation results that indicate that the degree of precision bias is a function of the variability of the testlet effects, as well as the testlet design. The authors wish to thank Robert Mislevy, Andrew Gelman and Donald B. Rubin for their helpful suggestions and comments, Ida Lawrence and Miriam Feigenbaum for providing us with the SAT data analyzed in section 5, and the two anonymous referees for their careful reading and thoughtful suggestions on an earlier draft. We are also grateful to the Educational Testing Service for providing the resources to do this research.

4.
A definition of essential independence is proposed for sequences of polytomous items. For items satisfying the reasonable assumption that the expected amount of credit awarded increases with examinee ability, we develop a theory of essential unidimensionality which closely parallels that of Stout. Essentially unidimensional item sequences can be shown to have a unique (up to change-of-scale) dominant underlying trait, which can be consistently estimated by a monotone transformation of the sum of the item scores. In more general polytomous-response latent trait models (with or without ordered responses), an M-estimator based upon maximum likelihood may be shown to be consistent for the latent trait under essentially unidimensional violations of local independence and a variety of monotonicity/identifiability conditions. A rigorous proof of this fact is given, and the standard error of the estimator is explored. These results suggest that ability estimation methods that rely on the summation form of the log likelihood under local independence should generally be robust under essential independence, but standard errors may vary greatly from what is usually expected, depending on the degree of departure from local independence. An index of departure from local independence is also proposed. This work was supported in part by Office of Naval Research Grant N00014-87-K-0277 and National Science Foundation Grant NSF-DMS-88-02556. The author is grateful to William F. Stout for many helpful comments, and to an anonymous reviewer for raising the questions addressed in section 2. A preliminary version of section 6 appeared in the author's Ph.D. thesis.

5.
This study explored the application of latent variable measurement models to the Social Anhedonia Scale (SAS; Eckblad, Chapman, Chapman, & Mishlove, 1982), a widely used and influential measure in schizophrenia-related research. Specifically, we applied unidimensional and bifactor item response theory (IRT) models to data from a community sample of young adults (n = 2,227). Ordinal factor analyses revealed that identifying a coherent latent structure in the 40-item SAS data was challenging due to (a) the presence of multiple small content clusters (e.g., doublets); (b) modest relations between those clusters, which, in turn, implies a general factor of only modest strength; (c) items that shared little variance with the majority of items; and (d) cross-loadings in bifactor solutions. Consequently, we conclude that SAS responses cannot be modeled accurately by either unidimensional or bifactor IRT models. Although the application of a bifactor model to a reduced 17-item set met with better success, significant psychometric and substantive problems remained. Results highlight the challenges of applying latent variable models to scales that were not originally designed to fit these models.

6.
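Testlets are a common item format in many tests. Because the items within a testlet are to some degree dependent, they violate the local independence assumption, and estimating parameters with an item response model can produce substantial bias. Testlet response theory incorporates the interaction between examinee and testlet into the model, which addresses the dependence among items. The author reviews the development, basic principles, and related research of testlet response theory and applies it to a secondary-school English examination. Compared with item response theory, the results show that (1) parameter estimates from the testlet response model and the item response model are strongly correlated, especially for the ability and difficulty parameters; and (2) confidence intervals under the testlet response model are narrower than those under the item response model for every parameter, that is, the testlet response model estimates parameters more precisely than the item response model.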

7.
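Multidimensional testlet-effect Rasch model   Cited by: 2 (self-citations: 0, by others: 2)
First, this paper interprets the essence of a "testlet" as a set of items that share a common stimulus. On this basis, testlet effects are divided into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Second, building on the Rasch model, the paper develops multidimensional testlet-effect Rasch models for dichotomous and polytomous scoring, with the aim of better handling within-item multidimensional testlet effects. Finally, simulation results show that the new model is effective and reasonable. Comparisons with the Rasch testlet model and the partial credit model indicate that (1) when a test contains within-item multidimensional testlet effects, separating out only the obvious bundled testlet effects while ignoring other latent testlet effects still leads to biased parameter estimates and may even overestimate test reliability; and (2) the new model is more general: even when the response data contain no testlet effects, or only within-item unidimensional testlet effects, analyzing the test with the new model still yields good parameter estimates.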

8.
The application of psychological measures often results in item response data that arguably are consistent with both unidimensional (a single common factor) and multidimensional latent structures (typically caused by parcels of items that tap similar content domains). As such, structural ambiguity leads to seemingly endless "confirmatory" factor analytic studies in which the research question is whether scale scores can be interpreted as reflecting variation on a single trait. An alternative to the more commonly observed unidimensional, correlated traits, or second-order representations of a measure's latent structure is a bifactor model. Bifactor structures, however, are not well understood in the personality assessment community and thus rarely are applied. To address this, herein we (a) describe issues that arise in conceptualizing and modeling multidimensionality, (b) describe exploratory (including Schmid-Leiman [Schmid & Leiman, 1957] and target bifactor rotations) and confirmatory bifactor modeling, (c) differentiate between bifactor and second-order models, and (d) suggest contexts where bifactor analysis is particularly valuable (e.g., for evaluating the plausibility of subscales, determining the extent to which scores reflect a single variable even when the data are multidimensional, and evaluating the feasibility of applying a unidimensional item response theory (IRT) measurement model). We emphasize that the determination of dimensionality is a related but distinct question from either determining the extent to which scores reflect a single individual difference variable or determining the effect of multidimensionality on IRT item parameter estimates. Indeed, we suggest that in many contexts, multidimensional data can yield interpretable scale scores and be appropriately fitted to unidimensional IRT models.

9.
10.
The (univariate) isotonic psychometric (ISOP) model (Scheiblechner, 1995) is a nonparametric IRT model for dichotomous and polytomous (rating scale) psychological test data. A weak subject independence axiom W1 postulates that the subjects are ordered in the same way except for ties (i.e., similarly or isotonically) by all items of a psychological test. A weak item independence axiom W2 postulates that the order of the items is similar for all subjects. Local independence (LI or W3) is assumed in all models. With these axioms, sample-free unidimensional ordinal measurements of items and subjects become feasible. A cancellation axiom (Co) gives, as a result, the additive isotonic psychometric (ADISOP) model and interval scales for subjects and items, and an independence axiom (W4) gives the completely additive isotonic psychometric (CADISOP) model with an interval scale for the response variable (Scheiblechner, 1999). The d-ISOP, d-ADISOP, and d-CADISOP models are generalizations to d-dimensional dependent variables (e.g., speed and accuracy of response). The author would like to thank an Associate Editor and two anonymous referees and also Professor H.H. Schulze for their very valuable suggestions and corrections.

11.
Jin, Ick Hoon; Jeon, Minjeong. Psychometrika, 2019, 84(1): 236-260

Item response theory (IRT) is one of the most widely utilized tools for item response analysis; however, local item and person independence, which is a critical assumption for IRT, is often violated in real testing situations. In this article, we propose a new type of analytical approach for item response data that does not require standard local independence assumptions. By adapting a latent space joint modeling approach, our proposed model can estimate pairwise distances to represent the item and person dependence structures, from which item and person clusters in latent spaces can be identified. We provide an empirical data analysis to illustrate an application of the proposed method. A simulation study is provided to evaluate the performance of the proposed method in comparison with existing methods.


12.
Constant latent odds-ratios models and the Mantel-Haenszel null hypothesis   Cited by: 1 (self-citations: 0, by others: 1)
In the present paper, a new family of item response theory (IRT) models for dichotomous item scores is proposed. Two basic assumptions define the most general model of this family. The first assumption is local independence of the item scores given a unidimensional latent trait. The second assumption is that the odds-ratios for all item-pairs are constant functions of the latent trait. Since the latter assumption is characteristic of the whole family, the models are called constant latent odds-ratios (CLORs) models. One nonparametric special case and three parametric special cases of the general CLORs model are shown to be generalizations of the one-parameter logistic Rasch model. For all CLORs models, the total score (the unweighted sum of the item scores) is shown to be a sufficient statistic for the latent trait. In addition, conditions under the general CLORs model are studied for the investigation of differential item functioning (DIF) by means of the Mantel-Haenszel procedure. This research was supported by the Dutch Organization for Scientific Research (NWO), grant number 400-20-026.
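
As background for the Mantel-Haenszel procedure mentioned above, the following is a minimal sketch of the standard Mantel-Haenszel common odds-ratio estimate, computed over 2x2 tables that are stratified by total score. It is not taken from the paper; the function name `mantel_haenszel_odds_ratio` and the cell labels are assumptions chosen for illustration. Under the Mantel-Haenszel null hypothesis (no DIF), the common odds ratio equals 1.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel estimate of a common odds ratio.

    strata: iterable of (a, b, c, d) counts, one 2x2 table per
            total-score level, where (labels assumed for illustration)
            a = reference group correct,  b = reference group incorrect,
            c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: every b*c product is zero")
    return numerator / denominator


# Two score levels in which both groups have identical odds of success.
print(mantel_haenszel_odds_ratio([(10, 10, 10, 10), (20, 5, 20, 5)]))
```

Stratifying on the total score is natural in this setting because, for models in which the total score is sufficient for the latent trait, conditioning on it stands in for conditioning on the trait itself.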

13.
A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model two ways of testing this assumption against specific multidimensional alternatives are discussed. The first is a marginal approach assuming a multidimensional parametric latent variable distribution; the second is a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods. The authors would like to thank Niels Keiding, Klaus Larsen and the anonymous reviewers for valuable comments to a previous version of this paper. This research was supported by a grant from the Danish Research Academy and by a general research grant from Quality Metric, Inc.

14.
Examinee-selected item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set, always yields incomplete data (i.e., when only the selected items are answered, data are missing for the others) that are likely non-ignorable in likelihood inference. Standard item response theory (IRT) models become infeasible when ESI data are missing not at random (MNAR). To solve this problem, the authors propose a two-dimensional IRT model that posits one unidimensional IRT model for observed data and another for nominal selection patterns. The two latent variables are assumed to follow a bivariate normal distribution. In this study, the mirt freeware package was adopted to estimate parameters. The authors conduct an experiment to demonstrate that ESI data are often non-ignorable and to determine how to apply the new model to the data collected. Two follow-up simulation studies are conducted to assess the parameter recovery of the new model and the consequences for parameter estimation of ignoring MNAR data. The results of the two simulation studies indicate good parameter recovery of the new model and poor parameter recovery when non-ignorable missing data were mistakenly treated as ignorable.

15.
We use classical test theory (CTT) and item response theory (IRT) methodologies to examine the psychometric and measurement properties of an instrument designed to assess sexual orientation harassment among military personnel (N = 71,989). CTT analyses indicated that items were unidimensional and exhibited adequate levels of reliability. IRT analyses demonstrated that the items functioned similarly and exhibited appropriate levels of item discrimination. However, the analyses also suggested that the sensitivity of the items may be limited. Differential test functioning analyses provided evidence of the measurement equivalence of the instrument across male and female respondents. The findings provide support for the psychometric properties and measurement equivalence of the instrument for measuring sexual orientation harassment among male and female military personnel. We discuss the implications of our findings for future research on sexual orientation harassment in the workplace.

16.
The present study examined measurement equivalence of the Satisfaction with Life Scale between American and Chinese samples using multigroup structural equation modeling (SEM), the multiple indicators multiple causes (MIMIC) model, and item response theory (IRT). Whereas SEM and MIMIC identified only one biased item across cultures, the IRT analysis revealed that four of the five items had differential item functioning. According to IRT, Chinese whose latent life satisfaction scores were quite high did not endorse items such as “So far I have gotten the important things I want in life” and “If I could live my life over, I would change almost nothing.” The IRT analysis also showed that even when the unbiased items were weighted more heavily than the biased items, the latent mean life satisfaction score of Chinese was substantially lower than that of Americans. The differences among SEM, MIMIC, and IRT are discussed.

17.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric forms, and thus are not sufficiently flexible to describe complex interaction among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate, all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.

18.
A conventional way to analyze item responses in multiple tests is to apply unidimensional item response models separately, one test at a time. This unidimensional approach, which ignores the correlations between latent traits, yields imprecise measures when tests are short. To resolve this problem, one can use multidimensional item response models that use correlations between latent traits to improve measurement precision of individual latent traits. The improvements are demonstrated using 2 empirical examples. It appears that the multidimensional approach improves measurement precision substantially, especially when tests are short and the number of tests is large. To achieve the same measurement precision, the multidimensional approach needs less than half of the comparable items required for the unidimensional approach.

19.
Item response theory (IRT) methods were applied to items from the 80-item Psychological Inventory of Criminal Thinking Styles (PICTS; G. D. Walters, 1995) to determine how well they measure the latent trait of criminal thinking in a group of 2,872 male medium security prison inmates. Preliminary analyses revealed that the 64 PICTS thinking style items, 32 PICTS proactive criminal thinking items, and 24 PICTS reactive criminal thinking items were sufficiently unidimensional to meet the local independence requirements of IRT. The PICTS was fitted to a 2-parameter logistic-graded response IRT model, the results of which showed that the 8 items measuring denial of harm (Sentimentality) displayed weak discrimination (a < 0.5), whereas most of the proactive and reactive items displayed moderate to good discrimination (a > 1.0). Information function analysis revealed that all 3 components of a hierarchical model of criminal thinking (PICTS total scale, PICTS proactive factor, and PICTS reactive factor) displayed greater precision at higher rather than lower levels of the trait dimension. The study findings indicate that items from the PICTS Sentimentality scale do a poor job of measuring general criminal thinking, whereas items from the other 7 PICTS thinking style scales provide their most precise estimates at the upper end of the trait dimension.
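
To make the link between discrimination and precision concrete, here is a minimal sketch of the item information function for a dichotomous two-parameter logistic item. The study itself fitted a graded response model, so this simpler dichotomous case is a stand-in assumed here for illustration, and the name `item_information_2pl` is hypothetical.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of a dichotomous 2PL item at trait level theta.

    a: discrimination, b: difficulty (location).
    Information is a^2 * P * (1 - P), which peaks where theta equals b.
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


# A weakly discriminating item (a = 0.4) versus a stronger one (a = 1.2),
# both evaluated at their own location.
weak = item_information_2pl(theta=0.0, a=0.4, b=0.0)
strong = item_information_2pl(theta=0.0, a=1.2, b=0.0)
print(weak, strong)
```

Because information grows with the square of `a`, an item with a below 0.5 contributes little precision anywhere on the scale, and an item located at a high `b` is most informative at the upper end of the trait.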

20.
A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.
