Among current state-of-the-art estimation methods for multilevel IRT models, the two-stage divide-and-conquer strategy has practical advantages, such as clearer definition of factors, convenience for secondary data analysis, convenience for model calibration and fit evaluation, and avoidance of improper solutions. However, various studies have shown that, under the two-stage framework, ignoring measurement error in the dependent variable in stage II leads to incorrect statistical inferences. To this end, we proposed a novel method to correct both measurement bias and measurement error of latent trait estimates from stage I in the stage II estimation. In this paper, the HO-IRT model is considered as the measurement model, and a linear mixed effects model on overall (i.e., higher-order) abilities is considered as the structural model. The performance of the proposed correction method is illustrated and compared via a simulation study and a real data example using the National Educational Longitudinal Survey data (NELS 88). Results indicate that structural parameters can be recovered better after correcting measurement biases and errors.

It is often considered desirable to have the same ordering of the items by difficulty across different levels of the trait or ability. Such an ordering is an invariant item ordering (IIO). An IIO facilitates the interpretation of test results. For dichotomously scored items, earlier research surveyed the theory and methods of an invariant ordering in a nonparametric IRT context. Here the focus is on polytomously scored items, and both nonparametric and parametric IRT models are considered. The absence of the IIO property in two nonparametric polytomous IRT models is discussed, and two nonparametric models are discussed that imply an IIO. A method is proposed that can be used to investigate whether empirical data imply an IIO. Furthermore, only two parametric polytomous IRT models are found to imply an IIO. These are the rating scale model (Andrich, 1978) and a restricted rating scale version of the graded response model (Muraki, 1990). Well-known models, such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969), do not imply an IIO.
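
The contrast between the rating scale model and the partial credit model can be illustrated with a small numerical sketch. It is not the method proposed in the abstract (which concerns empirical data); it only compares model-implied expected item scores on a grid of trait values. The function names, the threshold values, and the grid are all assumptions chosen for illustration.

```python
import math

def expected_item_score(theta, thresholds):
    """Expected score of one polytomous item under a partial-credit-type model.

    Category k has logit sum_{j<=k} (theta - delta_j); category 0 has logit 0.
    """
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return sum(k * w for k, w in enumerate(weights)) / total

def has_invariant_ordering(items, grid, tol=1e-9):
    """True if no pair of expected-score curves changes order on the grid."""
    curves = [[expected_item_score(t, item) for t in grid] for item in items]
    for i in range(len(curves)):
        for j in range(i + 1, len(curves)):
            diffs = [a - b for a, b in zip(curves[i], curves[j])]
            if any(d > tol for d in diffs) and any(d < -tol for d in diffs):
                return False
    return True

grid = [-4 + 0.1 * step for step in range(81)]

# Rating scale structure: item location plus category offsets shared by all items.
shared_offsets = [-0.5, 0.5]                      # hypothetical values
rsm_items = [[loc + tau for tau in shared_offsets] for loc in (-1.0, 0.0, 1.0)]

# Partial credit structure: each item has its own thresholds (hypothetical values).
pcm_items = [[-2.0, 2.0], [0.0, 0.0]]

rsm_invariant = has_invariant_ordering(rsm_items, grid)
pcm_invariant = has_invariant_ordering(pcm_items, grid)
```

With these illustrative values, the rating scale curves never cross, whereas the two partial credit items swap order around the middle of the trait scale, so `pcm_invariant` is false. A grid check of this kind can show a violation but cannot prove invariance in general.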

Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and for realistic sample size, full‐information test statistics such as Pearson's X² and the likelihood ratio statistic G² suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited‐information fit statistics such as Maydeu‐Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied Maydeu‐Olivares and Joe's (2006, Psychometrika, 71, 713) M₂ statistic to diagnostic classification models. Through a series of simulation studies, we found that M₂ is well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q‐matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, M₂ was largely insensitive to misspecifications in the distribution of higher‐order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of the overall model goodness of fit using M₂, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic X²_LD for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The X²_LD statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising due to specific model misspecifications are illustrated. Finally, we used the M₂ and X²_LD statistics to evaluate a diagnostic model fit to data from the Trends in Mathematics and Science Study, drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).

A Bayesian procedure to estimate the three-parameter normal ogive model and a generalization of the procedure to a model with multidimensional ability parameters are presented. The procedure is a generalization of a procedure by Albert (1992) for estimating the two-parameter normal ogive model. The procedure supports analyzing data from multiple populations and incomplete designs. It is shown that restrictions can be imposed on the factor matrix for testing specific hypotheses about the ability structure. The technique is illustrated using simulated and real data. The authors would like to thank Norman Verhelst for his valuable comments and ACT, CITO group and SweSAT for the use of their data.

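Psychological and educational measurement generally reaches only the ordinal scale level, and the relationship between the measurement data and the measured factors is not simply linear. Item factor analysis is a class of statistical models for describing the nonlinear relationship between test items and factors. There are two main approaches to item factor analysis, one based on structural equation modeling and one based on item response theory. The two approaches are closely related and can even be regarded as two forms of the same model. This paper describes that relationship in detail and analyzes and compares the two approaches in terms of parameter estimation, model fit indices, tests of measurement invariance, and supporting software, so that researchers can choose the approach best suited to their research.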

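Explanatory item response theory models (EIRTM) are item response theory (IRT) models built on generalized linear mixed models and nonlinear mixed models. EIRTM can add predictors directly to an IRT model and thereby address a variety of measurement problems. This paper first introduces the concepts and parameter estimation methods of EIRTM, then shows how to use EIRTM to handle item position effects, test mode effects, differential item functioning, local person dependence, and local item dependence, then illustrates the use of EIRTM with an empirical example, and finally discusses the limitations and application prospects of EIRTM.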
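
As a rough illustration of what adding predictors directly to an IRT model can look like, one common formulation writes a Rasch-type model as a generalized linear mixed model with item-level covariates. The notation below is an assumption for illustration and is not taken from the abstract: \(Y_{pi}\) is the response of person \(p\) to item \(i\), \(\theta_p\) is a random person effect, \(X_{ik}\) is the value of item \(i\) on item property \(k\), and \(\beta_k\) is the weight of that property.

```latex
\[
\operatorname{logit} P(Y_{pi} = 1 \mid \theta_p)
  = \theta_p - \sum_{k=1}^{K} \beta_k X_{ik},
\qquad
\theta_p \sim N(0, \sigma_\theta^2)
\]
```

Under this sketch, an item position effect could be examined by letting one of the \(X_{ik}\) code the position at which the item was administered, while the other item covariates describe the item itself.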

Over the past decade, Mokken scale analysis (MSA) has rapidly grown in popularity among researchers from many different research areas. This tutorial provides researchers with a set of techniques and a procedure for their application, such that the construction of scales that have superior measurement properties is further optimized, taking full advantage of the properties of MSA. First, we define the conceptual context of MSA, discuss the two item response theory (IRT) models that constitute the basis of MSA, and discuss how these models differ from other IRT models. Second, we discuss dos and don'ts for MSA; the don'ts include misunderstandings we have frequently encountered with researchers in our three decades of experience with real‐data MSA. Third, we discuss a methodology for MSA on real data that consist of a sample of persons who have provided scores on a set of items that, depending on the composition of the item set, constitute the basis for one or more scales, and we use the methodology to analyse an example real‐data set.

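Liu Hongyun, Li Chong, Zhang Pingping, Luo Fang. Acta Psychologica Sinica (《心理学报》), 2012, 44(8): 1124-1136
Measurement invariance of an instrument is a prerequisite for multi-group comparison. The main methods for testing measurement invariance are multi-group comparison based on CFA and DIF testing based on IRT. This article compared the CCFA-based DIFFTEST method with the IRT-based IRT-LR method for unidimensional tests, and DIFFTEST with the MIRT-based chi-square test for multidimensional tests. A simulation study compared the power and Type I error of these methods, taking into account total sample size, the balance of sample sizes across groups, test length, the size of threshold differences, and the correlation between dimensions. The results showed the following. (1) For unidimensional tests, IRT-LR is a stricter test than DIFFTEST. For multidimensional tests, MIRT-MG detects differences in item thresholds more readily than DIFFTEST when the test is long and the dimensions are highly correlated, whereas the power of DIFFTEST is slightly higher than that of MIRT-MG when the test is short and the dimensions are weakly correlated. (2) As the threshold difference increases, the power of DIFFTEST, IRT-LR, and MIRT-MG all increase; when the threshold difference is medium or large, all three methods can effectively detect non-invariance of the thresholds. (3) As total sample size increases, the power of DIFFTEST, IRT-LR, and MIRT-MG all increase; with total sample size held constant, the power of all three methods is higher when the two groups are balanced than when they are unbalanced. (4) With the number of non-invariant items held constant, the power of DIFFTEST decreases as the test gets longer, whereas the power of IRT-LR and MIRT-MG increases. (5) The mean Type I error rate of DIFFTEST is close to the nominal value of 0.05, whereas the mean Type I error rates of IRT-LR and MIRT-MG are far below 0.05.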

This study demonstrates the potential utility of the Behavioural Style Observational System (BSOS) as a new observational measure of children's behavioural style. The BSOS is an objective, short and easy to use measure that can be readily adapted to a variety of home and laboratory situations. In the present study, 160 mother–child dyads from the Concordia Longitudinal Risk Project (CLRP) were observed during an 11‐min behavioural sample. Videotaped interactions were coded using the BSOS for children's mood, activity level, vocal reactivity, approach to toys, mood consistency and adaptability. Comparisons between the BSOS observational ratings and mothers' ratings of the child on the EAS Temperament Survey (EAS) provided support for modest congruence between these two measurement systems, and revealed a differential predictive pattern of children's functioning. Specifically, the observation‐based BSOS predicted children's cognitive performance and adaptive behaviour during testing, whereas the mother‐rated EAS predicted maternal ratings of children's internalizing and externalizing behaviour problems. Both measures were found to independently predict mothers' ratings of parenting stress. Overall, the findings imply that neither observational measures nor maternal ratings alone are sufficient to understand children's behavioural style, and that comprehensive evaluations of children's temperament should optimally include both types of measures. Copyright © 2004 John Wiley & Sons, Ltd.

Using an item‐response theory‐based approach (i.e. likelihood ratio test with an iterative procedure), we examined the equivalence of the Rosenberg Self‐Esteem Scale (RSES) in a sample of US and Chinese college students. Results from the differential item functioning (DIF) analysis showed that the RSES was not fully equivalent at the item level, as well as at the scale level. The two cultural groups did not use the scale comparably, with the US students showing more extreme responses than the Chinese students. Moreover, we evaluated the practical impact of DIF and found that cultural differences in average self‐esteem scores disappeared after the DIF was taken into account. In the present study, we discuss the implications of our findings for cross‐cultural research and provide suggestions for future studies using the RSES in China.

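A learning progression describes how students' thinking about a topic develops continuously and becomes increasingly sophisticated over a period of time. It is built through an iterative process that starts from a hypothetical learning progression and proceeds to collecting evidence to validate it. Psychometric models make it possible to link learning progressions with assessment, both providing evidence for the validity of a learning progression and yielding diagnoses of students. The psychometric models currently applied to learning progressions are unidimensional item response models, multidimensional item response models, and cognitive diagnosis models. Learning progressions can also offer new research perspectives for vertical scaling and adaptive learning, although issues such as differential item functioning need attention.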

Self‐regulation presumably rests upon multiple processes that include an awareness of ongoing self‐experience, enduring self‐knowledge and self‐control. The present investigation tested this multi‐process model using the Five‐Facet Mindfulness Questionnaire (FFMQ) and the Integrative Self‐Knowledge and Brief Self‐Control Scales. Using a sample of 1162 Iranian university students, we confirmed the five‐factor structure of the FFMQ in Iran and documented its factorial invariance across males and females. Self‐regulatory variables correlated negatively with Perceived Stress, Depression, and Anxiety and positively with Self‐Esteem and Satisfaction with Life. Partial mediation effects confirmed that self‐regulatory measures ameliorated the disturbing effects of Perceived Stress. Integrative Self‐Knowledge and Self‐Control interacted to partially mediate the association of Perceived Stress with lower levels of Satisfaction with Life. Integrative Self‐Knowledge, alone or in interaction with Self‐Control, was the only self‐regulation variable to display the expected mediation of Perceived Stress associations with all other measures. Self‐Control failed to be implicated in self‐regulation only in the mediation of Anxiety. These data confirmed the need to further examine this multi‐process model of self‐regulation.

In the context of the additive multi‐criteria value model, this paper investigates how the set of criteria weights (weight‐set hereafter) can be determined according to the preference orders of alternatives given by the decision maker. A construction method is proposed for the weight‐set for different intervals of β, where β is a differential amount of value between the preference information on two alternatives. The results of this paper are important for sensitivity analysis in multi‐criteria decision making (MCDM) problems and multi‐criteria group decision analysis. Copyright © 2000 John Wiley & Sons, Ltd.
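
One way to picture the weight-set idea is to test whether a candidate weight vector reproduces a stated preference order under an additive value model. The sketch below is not the paper's construction method. It assumes, as one possible reading of the abstract, that β is the minimum value gap required between each pair of consecutively ranked alternatives. All function names and numbers are hypothetical.

```python
def additive_value(weights, scores):
    """Overall value of one alternative: weighted sum of its criterion scores."""
    return sum(w * s for w, s in zip(weights, scores))

def weights_respect_order(weights, ranked_alternatives, beta=0.0):
    """True if each alternative exceeds the next-ranked one by at least beta.

    ranked_alternatives lists criterion-score tuples from most to least preferred.
    """
    values = [additive_value(weights, scores) for scores in ranked_alternatives]
    return all(values[i] - values[i + 1] >= beta for i in range(len(values) - 1))

# Hypothetical scores on two criteria, ranked A > B > C by the decision maker.
ranked = [(0.9, 0.2), (0.5, 0.5), (0.1, 0.7)]

in_set_small_beta = weights_respect_order((0.7, 0.3), ranked, beta=0.1)
in_set_large_beta = weights_respect_order((0.7, 0.3), ranked, beta=0.2)
in_set_other_weights = weights_respect_order((0.2, 0.8), ranked, beta=0.0)
```

Under this reading, the weight-set for a given β is the set of all weight vectors for which the check succeeds, and raising β shrinks that set. This is why the same weights can pass for a small β and fail for a larger one.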

Non‐suicidal self‐injury (NSSI) is an increasing health concern. Despite the potential benefits of disclosing the behaviour, many decide not to do so because of the fear of negative social reactions. In this review, we examined the existing research on reported and perceived reactions to NSSI disclosure with the aim of identifying how an individual who discloses their NSSI perceives others' responses to this disclosure, with the ultimate goal of understanding how these reactions may impact those who disclose their NSSI. Among the initial 275 studies, 10 fit the inclusion criteria. Three studies reported perceived responses by individuals who had disclosed their NSSI; six studies examined self‐reported responses by others; one study focused on disclosures online. Individuals who disclosed their NSSI often received negative responses, which caused them to withdraw from seeking further help. On the other hand, recipients' reactions to NSSI disclosure varied based on NSSI characteristics such as its perceived cause and/or underlying motivation. Results highlight the importance of providing support rather than searching for the underlying drives of NSSI.

A covariance structure analysis method for testing time‐invariance in reliability in multiwave, multiple‐indicator models is outlined. The approach accounts for observed variable specificity and permits, in addition, estimation of reliability in terms of ‘pure’ measurement error variance. The proposed procedure is developed within a confirmatory factor analysis framework and illustrated with data from a cognitive intervention study.

There is a growing number of item response theory (IRT) studies that calibrate different patient-reported outcome (PRO) measures, such as anxiety, depression, physical function, and pain, on common, instrument-independent metrics. In the case of depression, it has been reported that there are considerable mean score differences when scoring on a common metric from different, previously linked instruments. Ideally, those estimates should be the same. We investigated to what extent those differences are influenced by different scoring methods that take into account several levels of uncertainty, such as measurement error (through plausible value imputation) and item parameter uncertainty (through full Bayesian IRT modeling). Depression estimates from different instruments were more similar, and their corresponding confidence/credible intervals were larger when plausible value imputation or Bayesian modeling was used, compared to the direct use of expected a posteriori (EAP) estimates. Furthermore, we explored the use of Bayesian IRT models to update item parameters based on newly collected data.
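
The difference between an EAP estimate and plausible values can be sketched for a single respondent. The example assumes a two-parameter logistic model with a standard normal prior evaluated on a grid. The item parameters, response pattern, grid, and seed are hypothetical and are not taken from the study.

```python
import math
import random

def grid_posterior(responses, items, grid):
    """Normalized posterior weights over a theta grid (2PL items, N(0, 1) prior)."""
    weights = []
    for theta in grid:
        log_w = -0.5 * theta * theta  # standard normal prior, up to a constant
        for x, (a, b) in zip(responses, items):
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            log_w += math.log(p if x == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    return [w / total for w in weights]

grid = [-4 + 0.1 * step for step in range(81)]
items = [(1.2, -1.0), (1.0, 0.0), (0.8, 0.5), (1.5, 1.0)]  # hypothetical (a, b) pairs
responses = [1, 1, 0, 0]

posterior = grid_posterior(responses, items, grid)

# EAP: a single point estimate, the posterior mean.
eap = sum(t * w for t, w in zip(grid, posterior))
posterior_sd = math.sqrt(sum((t - eap) ** 2 * w for t, w in zip(grid, posterior)))

# Plausible values: several random draws from the same posterior.
rng = random.Random(0)
plausible_values = rng.choices(grid, weights=posterior, k=5)
```

Analyses based on the EAP alone treat the point estimate as if it were error-free, whereas repeating an analysis over each set of plausible values and pooling the results carries the posterior spread forward. That is consistent with the wider intervals reported in the abstract.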

A new multilevel latent state graded response model for longitudinal multitrait–multimethod (MTMM) measurement designs combining structurally different and interchangeable methods is proposed. The model allows researchers to examine construct validity over time and to study the change and stability of constructs and method effects based on ordinal response variables. We show how Bayesian estimation techniques can address a number of important issues that typically arise in longitudinal multilevel MTMM studies and facilitate the estimation of the model presented. Estimation accuracy and the impact of between‐ and within‐level sample sizes as well as different prior specifications on parameter recovery were investigated in a Monte Carlo simulation study. Findings indicate that the parameters of the model presented can be accurately estimated with Bayesian estimation methods in the case of low convergent validity with as few as 250 clusters and more than two observations within each cluster. The model was applied to well‐being data from a longitudinal MTMM study, assessing the change and stability of life satisfaction and subjective happiness in young adults after high‐school graduation. Guidelines for empirical applications are provided and advantages and limitations of a Bayesian approach to estimating longitudinal multilevel MTMM models are discussed.

