
1.

Robust Measurement via A Fused Latent and Graphical Item Response Theory Model

Yunxiao Chen Xiaoou Li Jingchen Liu Zhiliang Ying 《Psychometrika》2018,83(3):538-562

Item response theory (IRT) plays an important role in psychological and educational measurement. Unlike classical test theory, IRT models aggregate the item level information, yielding more accurate measurements. Most IRT models assume local independence, an assumption not likely to be satisfied in practice, especially when the number of items is large. Results in the literature and simulation studies in this paper reveal that misspecifying the local independence assumption may result in inaccurate measurements and differential item functioning. To provide more robust measurements, we propose an integrated approach by adding a graphical component to a multidimensional IRT model that can offset the effect of unknown local dependence. The new model contains a confirmatory latent variable component, which measures the targeted latent traits, and a graphical component, which captures the local dependence. An efficient proximal algorithm is proposed for the parameter estimation and structure learning of the local dependence. This approach can substantially improve the measurement, given no prior information on the local dependence structure. The model can be applied to measure both a unidimensional latent trait and multidimensional latent traits.

2.

Empirically indistinguishable multidimensional IRT and locally dependent unidimensional item response models

Edward Haksing Ip 《The British journal of mathematical and statistical psychology》2010,63(2):395-416

Multidimensionality is a core concept in the measurement and analysis of psychological data. In personality assessment, for example, constructs are mostly theoretically defined as unidimensional, yet responses collected from the real world are almost always determined by multiple factors. Significant research efforts have concentrated on the use of simulated studies to evaluate the robustness of unidimensional item response models when applied to multidimensional data with a dominant dimension. In contrast, in the present paper, I report the result from a theoretical investigation that a multidimensional item response model is empirically indistinguishable from a locally dependent unidimensional model, of which the single dimension represents the actual construct of interest. A practical implication of this result is that multidimensional response data do not automatically require the use of multidimensional models. Circumstances under which the alternative approach of locally dependent unidimensional models may be useful are discussed.

3.

Bifactor models and rotations: exploring the extent to which multidimensional data yield univocal scale scores

Reise SP Moore TM Haviland MG 《Journal of personality assessment》2010,92(6):544-559

The application of psychological measures often results in item response data that arguably are consistent with both unidimensional (a single common factor) and multidimensional latent structures (typically caused by parcels of items that tap similar content domains). As such, structural ambiguity leads to seemingly endless "confirmatory" factor analytic studies in which the research question is whether scale scores can be interpreted as reflecting variation on a single trait. An alternative to the more commonly observed unidimensional, correlated traits, or second-order representations of a measure's latent structure is a bifactor model. Bifactor structures, however, are not well understood in the personality assessment community and thus rarely are applied. To address this, herein we (a) describe issues that arise in conceptualizing and modeling multidimensionality, (b) describe exploratory (including Schmid-Leiman [Schmid & Leiman, 1957] and target bifactor rotations) and confirmatory bifactor modeling, (c) differentiate between bifactor and second-order models, and (d) suggest contexts where bifactor analysis is particularly valuable (e.g., for evaluating the plausibility of subscales, determining the extent to which scores reflect a single variable even when the data are multidimensional, and evaluating the feasibility of applying a unidimensional item response theory (IRT) measurement model). We emphasize that the determination of dimensionality is a related but distinct question from either determining the extent to which scores reflect a single individual difference variable or determining the effect of multidimensionality on IRT item parameter estimates. Indeed, we suggest that in many contexts, multidimensional data can yield interpretable scale scores and be appropriately fitted to unidimensional IRT models.

4.

Measuring change for a multidimensional test using a generalized explanatory longitudinal item response model

Sun‐Joo Cho Michele Athay Kristopher J. Preacher 《The British journal of mathematical and statistical psychology》2013,66(2):353-381

Even though many educational and psychological tests are known to be multidimensional, little research has been done to address how to measure individual differences in change within an item response theory framework. In this paper, we suggest a generalized explanatory longitudinal item response model to measure individual differences in change. New longitudinal models for multidimensional tests and existing models for unidimensional tests are presented within this framework and implemented with software developed for generalized linear models. In addition to the measurement of change, the longitudinal models we present can also be used to explain individual differences in change scores for person groups (e.g., learning disabled students versus non‐learning disabled students) and to model differences in item difficulties across item groups (e.g., number operation, measurement, and representation item groups in a mathematics test). An empirical example illustrates the use of the various models for measuring individual differences in change when there are person groups and multiple skill domains which lead to multidimensionality at a time point.

5.

Multidimensional modeling with unidimensional approximations

J. Douglas Carroll Michael V. Levine 《Journal of mathematical psychology》2007,51(4):207-228

This paper advances nonparametric multidimensional item response theory by reporting experimental results on the use of nonmetric multidimensional scaling (MDS) to synthesize a multidimensional model from several approximating one-dimensional models. A two-dimensional simulation data set contains items in which the two-component traits combine linearly (dominance model items) and items in which the two-component traits combine quadratically (ideal point items). Several unidimensional approximations of the two-dimensional model were obtained by running unidimensional estimation software on the simulated data set. The graphs reconstructed from MDS of the unidimensional approximations at selected points clearly separate dominance items from ideal point items, and also various types of dominance or ideal point models. MDS also succeeded in determining the dimensionality of the simulation model items from the observable item responses.

6.

Robustness of Parameter Estimation to Assumptions of Normality in the Multidimensional Graded Response Model

Chun Wang Shiyang Su David J. Weiss 《Multivariate behavioral research》2018,53(3):403-418

A central assumption that is implicit in estimating item parameters in item response theory (IRT) models is the normality of the latent trait distribution, whereas a similar assumption made in categorical confirmatory factor analysis (CCFA) models is the multivariate normality of the latent response variables. Violation of the normality assumption can lead to biased parameter estimates. Although previous studies have focused primarily on unidimensional IRT models, this study extended the literature by considering a multidimensional IRT model for polytomous responses, namely the multidimensional graded response model. Moreover, this study is one of few studies that specifically compared the performance of full-information maximum likelihood (FIML) estimation versus robust weighted least squares (WLS) estimation when the normality assumption is violated. The research also manipulated the number of nonnormal latent trait dimensions. Results showed that FIML consistently outperformed WLS when there were one or multiple skewed latent trait distributions. More interestingly, the bias of the discrimination parameters was non-ignorable only when the corresponding factor was skewed. Having other skewed factors did not further exacerbate the bias, whereas biases of boundary parameters increased as more nonnormal factors were added. The item parameter standard errors recovered well with both estimation algorithms regardless of the number of nonnormal dimensions.

7.

Analysis of response time data from between-item multidimensional tests: Modeling the effects among latent trait speeds

郭小军 罗照盛 严娟 《心理科学》(Journal of Psychological Science) 2022,45(5):1222-1229

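As computer-based testing becomes widespread, examinees' response times on psychological and educational tests are increasingly easy to collect. To make full use of item response time information, unidimensional and multidimensional response time models have been proposed in succession. However, in between-item multidimensional response time data, the latent trait speeds may share a common relationship (e.g., a hierarchical relationship), in which case existing response time models do not apply. This study therefore proposes a higher-order lognormal response time model and a bifactor lognormal response time model. In the simulation study, all parameters of both models were estimated accurately. For response time data from three sets of items of Raven's Standard Progressive Matrices, the bifactor lognormal response time model fit better, and several statistics indicated that local and global latent trait speeds both need to be included. Hence, when analyzing response time data from between-item multidimensional tests, it is necessary to account for the common effect among the multidimensional latent trait speeds.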
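The lognormal response time family named in this abstract can be sketched as follows. The first line is the standard lognormal response time model for person j and item i (time intensity \beta_i, speed \tau_j, time discrimination \alpha_i). The second line is only an assumed illustration of how a bifactor structure might split speed into a global part \tau_j^{(g)} and a local part \tau_{j,k(i)}^{(s)} for the dimension k(i) that item i belongs to; the abstract does not give the authors' parameterization, so these symbols are hypothetical.

```latex
\begin{align}
  % Standard lognormal response time model
  \log T_{ij} &= \beta_i - \tau_j + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N\!\left(0, \alpha_i^{-2}\right) \\
  % Assumed bifactor form: global speed plus dimension-specific (local) speed
  \log T_{ij} &= \beta_i - \left(\tau_j^{(g)} + \tau_{j,k(i)}^{(s)}\right) + \varepsilon_{ij}
\end{align}
```

Under this reading, a higher-order variant would instead let each dimension-specific speed depend on a single higher-order speed rather than adding the two parts directly.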

8.

Cognitive diagnostic assessment using a confirmatory compensatory multidimensional IRT model

詹沛达 陈平 边玉芳 《心理学报》(Acta Psychologica Sinica) 2016,48(10):1347-1356

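With growing demand for fine-grained test feedback, measurement methods with a cognitive diagnostic function have attracted increasing attention. While cognitive diagnosis models (CDMs) are in the spotlight, multidimensional IRT models (MIRTMs), another class of models that can provide fine-grained feedback on a continuous scale, appear to have been somewhat neglected. To explore the potential cognitive diagnostic function of MIRTMs, this paper takes a compensatory perspective and focuses on the multidimensional two-parameter logistic model (M2PLM), which belongs to the MIRTMs, and the linear logistic model (LLM), which belongs to the CDMs. To make the two comparable, a confirmatory matrix (Q-matrix) is introduced into the compensatory M2PLM to define the relationship between items and dimensions, yielding the confirmatory compensatory M2PLM (CC-M2PLM); each latent trait is then split at a cut point into a "cross-boundary" attribute so that the CC-M2PLM can exhibit the cognitive diagnostic function it should have. A pilot study shows that 0 on the logistic scale is a relatively reasonable cut point. A simulation study then compares the cognitive diagnostic function of the CC-M2PLM and the LLM; the results show that the CC-M2PLM can be used to analyze diagnostic test data and that its diagnostic performance is comparable to using the LLM directly. Finally, two empirical data sets illustrate the feasibility of the CC-M2PLM in the analysis of real diagnostic tests.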
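A minimal Python sketch of the idea described in this abstract, under stated assumptions: the item response function is the usual compensatory M2PL with an intercept `d`, the Q-matrix row `q` simply zeroes out slopes on dimensions the item does not measure, and a trait strictly above the cut point 0 is treated as a mastered attribute. The function names and the intercept parameterization are hypothetical and are not taken from the paper.

```python
import math

def cc_m2pl_prob(theta, a, d, q):
    """Probability of a correct response to one item.

    theta: latent traits of one person (one value per dimension)
    a:     item slopes (one value per dimension)
    d:     item intercept (assumed parameterization)
    q:     Q-matrix row for the item (1 = dimension measured, 0 = not)
    """
    # Compensatory form: masked slopes times traits are summed, so a high
    # trait on one measured dimension can offset a low trait on another.
    logit = d + sum(a_k * q_k * t_k for a_k, q_k, t_k in zip(a, q, theta))
    return 1.0 / (1.0 + math.exp(-logit))

def classify_attributes(theta, cut=0.0):
    """Dichotomize continuous traits into binary attributes at the cut point."""
    return [1 if t > cut else 0 for t in theta]
```

For example, `cc_m2pl_prob([1.0, 5.0], [1.0, 2.0], 0.0, [1, 0])` ignores the second dimension because the Q-matrix row masks it, and `classify_attributes([0.5, -0.2])` labels only the first attribute as mastered.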

9.

Loglinear Rasch models for the analysis of stability and change

Thorsten Meiser 《Psychometrika》1996,61(4):629-645

Loglinear unidimensional and multidimensional Rasch models are considered for the analysis of repeated observations of polytomous indicators with ordered response categories. Reparameterizations and parameter restrictions are provided which facilitate specification of a variety of hypotheses about latent processes of change. Models of purely quantitative change in latent traits are proposed as well as models including structural change. A conditional likelihood ratio test is presented for the comparison of unidimensional and multiple scales Rasch models. In the context of longitudinal research, this renders possible the statistical test of homogeneity of change against subject-specific change in latent traits. Applications to two empirical data sets illustrate the use of the models. The author is greatly indebted to Ulf Böckenholt, Rolf Langeheine, and several anonymous reviewers for many helpful suggestions.

10.

Normative Scoring of Multidimensional Pairwise Preference Personality Scales Using IRT: Empirical Comparisons With Other Formats

Oleksandr S. Chernyshenko Stephen Stark Matthew S. Prewett Ashley A. Gray Frederick R. Stilson Matthew D. Tuttle 《人类行为》2013,26(2):105-127

In this article, we offer some suggestions as to why tetrads and pentads have become the dominant formats for administering multidimensional forced choice (MFC) items but, in turn, raise questions regarding the underlying psychometric model and means of addressing item quality and scoring accuracy. We then focus our attention on multidimensional pairwise preference (MDPP) items and present an item response theory–based approach to constructing and modeling MDPP responses directly, assessing information at the item and scale levels, and a way of computing standard errors for trait scores and estimating scale reliability. To demonstrate the viability of this method for applied use, we show the correspondence between MDPP scores derived from direct modeling and those obtained using single statement and unidimensional pairwise preference measures administered in a laboratory setting. Trait score correlations and criterion related validities are compared across testing formats and rating sources (i.e., self and other), and the usefulness of our model-based approach is further demonstrated by some illustrative results involving computerized adaptive tests (CAT).

11.

Latent trait models in the study of intelligence

Susan E. Whitely 《Intelligence》1980,4(2):97-132

This article examines the potential contribution of latent trait models to the study of intelligence. Nontechnical introductions to both unidimensional and multidimensional latent trait models are given, and possible research applications are considered. Latent trait models are shown to resolve several measurement problems in studies of intellectual change, including ability modification studies and life-span development studies. Furthermore, under certain conditions, latent trait models are found useful for construct validation research, since they can represent an individual differences model of cognitive processing on ability test items. Multidimensional latent trait models are shown to be especially useful as processing models, because they can be used to test alternative multiple component theories of test item processing. Furthermore, multidimensional models can be used to decompose test item difficulty into component contributions and estimate individual differences in processing abilities.

12.

A multidimensional testlet-effect Rasch model

詹沛达 王文中 王立君 李晓敏 《心理学报》(Acta Psychologica Sinica) 2014,46(8):1208-1222

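First, this paper explains that a "testlet" is essentially a set of items that share a common stimulus and, on that basis, divides testlet effects into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Second, building on the Rasch model, it develops multidimensional testlet-effect Rasch models for dichotomous and polytomous scoring in order to handle within-item multidimensional testlet effects. Finally, simulation results show that the new model is valid and reasonable, and comparisons with the Rasch testlet model and the partial credit model indicate that: (1) when a test has within-item multidimensional testlet effects, separating out only the obvious bundled testlet effects while ignoring other latent testlet effects still leads to biased parameter estimates and may even overestimate test reliability; (2) the new model is more general: even when the response data contain no testlet effect or only within-item unidimensional testlet effects, analyzing the test with the new model still yields good parameter estimates.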

13.

On the need for negative local item dependence

Brian Habing Louis A. Roussos 《Psychometrika》2003,68(3):435-451

While negative local item dependence (LID) has been discussed in numerous articles, its occurrence and effects often go unrecognized. This is due in part to confusion over what unidimensional latent trait is being utilized in evaluating the LID of multidimensional testing data. This article addresses this confusion by using an appropriately chosen latent variable to condition on. It then provides a proof that negative LID must occur when unidimensional ability estimates (such as number right score) are obtained from data which follow a very general class of multidimensional item response theory models. The importance of specifying what unidimensional latent trait is used, and its effect on the sign of the LIDs are shown to have implications in regard to a variety of foundational theoretical arguments, to the simulation of LID data sets, and to the use of testlet scoring for removing LID. This paper is based in part on a chapter in the first author's doctoral dissertation, written at the University of Illinois at Urbana-Champaign under the supervision of William Stout. Part of this research has been presented at the annual meeting of the National Council on Measurement in Education, San Diego, California, April 14–16, 1998. The research of the first author was partially supported by a Harold Gulliksen Psychometric fellowship through Educational Testing Service and by a Research and Productive Scholarship award from the University of South Carolina.

14.

Joint analysis of response time and response accuracy data in computerized multidimensional tests

詹沛达 《心理科学》(Journal of Psychological Science) 2019,(1):170-178

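With advances in psychological and educational measurement research and in technology, computerized (large-scale) tests have drawn growing attention. This study explores how response time data can be used to assist the assessment of multidimensional latent abilities in computerized multidimensional tests, and aims to provide methodological support for monitoring the quality of compulsory education in China. Using computerized mathematics test data from the 2012 and 2015 Programme for International Student Assessment (PISA) as examples, it proposes a joint responses-and-times multidimensional Rasch model that uses response time and response accuracy data simultaneously. The analysis of the PISA data with the new model indicates that introducing response time data not only helps improve the precision of model parameter estimates but also helps data analysts use examinees' response time information for further decisions and interventions (e.g., diagnosing aberrant response behavior or prior knowledge).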

15.

Psychometric Modeling of response speed and accuracy with mixed and conditional regression

Gerard J. P. Van Breukelen 《Psychometrika》2005,70(2):359-376

Human performance in cognitive testing and experimental psychology is expressed in terms of response speed and accuracy. Data analysis is often limited to either speed or accuracy, and/or to crude summary measures like mean response time (RT) or the percentage of correct responses. This paper proposes the use of mixed regression for the psychometric modeling of response speed and accuracy in testing and experiments. Mixed logistic regression of response accuracy extends logistic item response theory modeling to multidimensional models with covariates and interactions. Mixed linear regression of response time extends mixed ANOVA to unbalanced designs with covariates and heterogeneity of variance. Related to mixed regression is conditional regression, which requires no normality assumption, but is limited to unidimensional models. Mixed and conditional methods are both applied to an experimental study of mental rotation. Univariate and bivariate analyses show how within-subject correlation between response and RT can be distinguished from between-subject correlation, and how latent traits can be detected, given careful item design or content analysis. It is concluded that both response and RT must be recorded in cognitive testing, and that mixed regression is a versatile method for analyzing test data. I am grateful to Rogier Donders for putting his data at my disposal.

16.

Gaussian variational estimation for multidimensional item response theory

April E. Cho Chun Wang Xue Zhang Gongjun Xu 《The British journal of mathematical and statistical psychology》2021,74(Z1):52-85

Multidimensional item response theory (MIRT) is widely used in assessment and evaluation of educational and psychological tests. It models the individual response patterns by specifying a functional relationship between individuals' multiple latent traits and their responses to test items. One major challenge in parameter estimation in MIRT is that the likelihood involves intractable multidimensional integrals due to the latent variable structure. Various methods have been proposed that involve either direct numerical approximations to the integrals or Monte Carlo simulations. However, these methods are known to be computationally demanding in high dimensions and rely on sampling data points from a posterior distribution. We propose a new Gaussian variational expectation-maximization (GVEM) algorithm which adopts variational inference to approximate the intractable marginal likelihood by a computationally feasible lower bound. In addition, the proposed algorithm can be applied to assess the dimensionality of the latent traits in an exploratory analysis. Simulation studies are conducted to demonstrate the computational efficiency and estimation precision of the new GVEM algorithm compared to the popular alternative Metropolis–Hastings Robbins–Monro algorithm. In addition, theoretical results are presented to establish the consistency of the estimator from the new GVEM algorithm.
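The "computationally feasible lower bound" mentioned in this abstract can be illustrated with the generic variational bound on a marginal likelihood. The display below is the textbook evidence lower bound obtained from Jensen's inequality, written for one respondent with response vector \mathbf{y}, latent traits \boldsymbol{\theta}, and a Gaussian variational density q; it is a sketch of the general idea, not the specific bound or update equations derived in the paper, and the symbols \boldsymbol{\mu} and \boldsymbol{\Sigma} are notation assumed here.

```latex
\begin{align}
  \log p(\mathbf{y})
    &= \log \int p(\mathbf{y} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta} \\
    &\ge \mathbb{E}_{q}\!\left[\log p(\mathbf{y}, \boldsymbol{\theta})\right]
       - \mathbb{E}_{q}\!\left[\log q(\boldsymbol{\theta})\right],
  \qquad q(\boldsymbol{\theta}) = N(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\end{align}
```

The gap between the two sides is the Kullback-Leibler divergence from q to the true posterior of \boldsymbol{\theta}, so an EM-style scheme can alternate between tightening the bound over (\boldsymbol{\mu}, \boldsymbol{\Sigma}) and maximizing it over the item parameters.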

17.

The Rosenberg Self-Esteem Scale: A Bifactor Answer to a Two-Factor Question?

Michael T. McKay Daniel Boduszek Séamus A. Harvey 《Journal of personality assessment》2014,96(6):654-660

Despite its long-standing and widespread use, disagreement remains regarding the structure of the Rosenberg Self-Esteem Scale (RSES). In particular, concern remains regarding the degree to which the scale assesses self-esteem as a unidimensional or multidimensional (positive and negative self-esteem) construct. Using a sample of 3,862 high school students in the United Kingdom, 4 models were tested: (a) a unidimensional model, (b) a correlated 2-factor model in which the 2 latent variables are represented by positive and negative self-esteem, (c) a hierarchical model, and (d) a bifactor model. The totality of results including item loadings, goodness-of-fit indexes, reliability estimates, and correlations with self-efficacy measures all supported the bifactor model, suggesting that the 2 hypothesized factors are better understood as “grouping” factors rather than as representative of latent constructs. Accordingly, this study supports the unidimensionality of the RSES and the scoring of all 10 items to produce a global self-esteem score.

18.

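Methods for testing measurement invariance with categorical data and their comparison: Testing between-group differences in item threshold (difficulty) parameters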

刘红云 李冲 张平平 骆方 《心理学报》(Acta Psychologica Sinica) 2012,44(8):1124-1136

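Measurement invariance of an instrument is a prerequisite for multi-group comparison. Methods for testing measurement invariance fall into two main classes: CFA-based multi-group comparison and IRT-based DIF testing. This article compares the CCFA-based DIFFTEST method with the IRT-LR method based on IRT models in unidimensional tests, and compares DIFFTEST with the MIRT-based chi-square test in multidimensional tests. A simulation study compared the power and Type I error of these methods while considering total sample size, balance of sample sizes between groups, test length, size of the threshold difference, and the correlation between dimensions. The results show that: (1) In unidimensional tests, IRT-LR is a stricter test than DIFFTEST; in multidimensional tests, when the test is long and the dimensions are highly correlated, MIRT-MG detects item threshold differences more readily than DIFFTEST, whereas when the test is short and the dimensions are weakly correlated, the power of DIFFTEST is slightly higher than that of MIRT-MG. (2) As the threshold difference increases, the power of DIFFTEST, IRT-LR, and MIRT-MG all increase; when the threshold difference is moderate or large, all three methods effectively detect threshold non-invariance. (3) As total sample size increases, the power of DIFFTEST, IRT-LR, and MIRT-MG all increase; with total sample size fixed, the power of all three methods is higher when the two groups are balanced than when they are unbalanced. (4) With the number of non-invariant items fixed, the power of DIFFTEST decreases as the test gets longer, whereas the power of IRT-LR and MIRT-MG increases. (5) The mean Type I error rate of DIFFTEST is close to the nominal value of 0.05, whereas the mean Type I error rates of IRT-LR and MIRT-MG are far below 0.05.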

19.

A multidimensional lognormal response time model: Exploring the multidimensionality of latent processing speed

詹沛达 Hong Jiao Kaiwen Man 《心理学报》(Acta Psychologica Sinica) 2020,52(9):1132-1142

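In psychological and educational measurement, latent processing speed reflects how efficiently students apply their latent abilities to solve problems. To explore the multidimensionality of latent processing speed in multidimensional tests and to enable parameter estimation, this study proposes a multidimensional lognormal response time model. Empirical data analysis and simulation results show that: (1) latent processing speed has a multidimensional structure that matches that of latent ability; (2) the new model accurately estimates individual-level multidimensional latent processing speeds and the item parameters related to response time; (3) redundantly specifying latent processing speed as multidimensional has a smaller negative impact than ignoring its multidimensionality.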

20.

Log-Multiplicative Association Models as Item Response Models

Carolyn J. Anderson Hsiu-Ting Yu 《Psychometrika》2007,72(1):5-23

Log-multiplicative association (LMA) models, which are special cases of log-linear models, have interpretations in terms of latent continuous variables. Two theoretical derivations of LMA models based on item response theory (IRT) arguments are presented. First, we show that Anderson and colleagues (Anderson & Vermunt, 2000; Anderson & Böckenholt, 2000; Anderson, 2002), who derived LMA models from statistical graphical models, made assumptions equivalent to those of Holland (1990) when deriving models for the manifest probabilities of response patterns based on an IRT approach. We also present a second derivation of LMA models where item response functions are specified as functions of rest-scores. These various connections provide insights into the behavior of LMA models as item response models and point out philosophical issues with the use of LMA models as item response models. We show that even for short tests, LMA and standard IRT models yield very similar to nearly identical results when data arise from standard IRT models. Log-multiplicative association models can be used as item response models and do not require numerical integration for estimation.