Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
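
The definition in this abstract (an IRF as a score-weighted sum of ICRFs) can be illustrated with a minimal sketch. It assumes the standard partial credit parameterization with step difficulties; the function names `pcm_category_probs` and `item_response_function` are hypothetical and do not come from the paper.

```python
import math


def pcm_category_probs(theta, deltas):
    """ICRFs under an assumed partial credit model: P(score = k | theta), k = 0..m.

    `deltas` holds the m step difficulties (an illustrative parameterization).
    """
    # Cumulative sums of (theta - delta_j); the empty sum for k = 0 is 0.
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta - d))
    # Subtract the maximum before exponentiating for numerical stability.
    mx = max(cum)
    weights = [math.exp(c - mx) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]


def item_response_function(theta, deltas):
    """IRF: expected item score, the sum over k of k * P(score = k | theta)."""
    probs = pcm_category_probs(theta, deltas)
    return sum(k * p for k, p in enumerate(probs))
```

With a single step difficulty the sketch reduces to the dichotomous Rasch item, so `item_response_function(0.0, [0.0])` is the probability of a correct response at the item's difficulty.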

2.
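刘红云, 骆方, 王玥, 张玉. 《心理学报》 (Acta Psychologica Sinica), 2012, 44(1): 121-132
The authors briefly review the categorical data factor analysis (CCFA) model under the SEM framework and the models relating test items to latent abilities under the MIRT framework, and summarize the main parameter estimation methods in the two frameworks. A simulation study compares the WLSc and WLSMV estimators under the SEM framework with the MLR and MCMC estimators under the MIRT framework. The results show that: (1) WLSc yields the largest bias in parameter estimates and has convergence problems; (2) as sample size increases, the precision of all item parameter estimates improves, the precision of WLSMV and MLR differs very little, and in most cases neither is worse than MCMC; (3) except for WLSc, estimation precision rises as the number of items per dimension increases; (4) the number of test dimensions has a large effect on the discrimination and difficulty parameters but a relatively small effect on item factor loadings and thresholds; (5) the precision of item parameter estimates depends on the number of dimensions an item measures, with items that measure only one dimension estimated more precisely. The article also offers some recommendations on issues to watch for when applying the two approaches in practice.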

3.
Maximum likelihood estimation of the linear factor model for continuous items assumes normally distributed item scores. We consider deviations from normality by means of a skew-normally distributed factor model or a quadratic factor model. We show that the item distributions under a skew-normal factor are equivalent to those under a quadratic model up to third-order moments. The reverse only holds if the quadratic loadings are equal to each other and within certain bounds. We illustrate that observed data which follow any skew-normal factor model can be so well approximated with the quadratic factor model that the models are empirically indistinguishable, and that the reverse does not hold in general. The choice between the two models to account for deviations from normality is illustrated by an empirical example from clinical psychology.

4.
Single-case designs are a class of repeated measures experiments used to evaluate the effects of interventions for small or specialized populations, such as individuals with low-incidence disabilities. There has been growing interest in systematic reviews and syntheses of evidence from single-case designs, but there remains a need to further develop appropriate statistical models and effect sizes for data from the designs. We propose a novel model for single-case data that exhibit nonlinear time trends created by an intervention that produces gradual effects, which build up and dissipate over time. The model expresses a structural relationship between a pattern of treatment assignment and an outcome variable, making it appropriate for both treatment reversal and multiple baseline designs. It is formulated as a generalized linear model so that it can be applied to outcomes measured as frequency counts or proportions, both of which are commonly used in single-case research, while providing readily interpretable effect size estimates such as log response ratios or log odds ratios. We demonstrate the gradual effects model by applying it to data from a single-case study and examine the performance of proposed estimation methods in a Monte Carlo simulation of frequency count data.

5.
To prevent response bias, personality questionnaires may use comparative response formats. These include forced choice, where respondents choose among a number of items, and quantitative comparisons, where respondents indicate the extent to which items are preferred to each other. The present article extends Thurstonian modeling of binary choice data to “proportion-of-total” (compositional) formats. Following the seminal work of Aitchison, compositional item data are transformed into log ratios, conceptualized as differences of latent item utilities. The mean and covariance structure of the log ratios is modeled using confirmatory factor analysis (CFA), where the item utilities are first-order factors, and personal attributes measured by a questionnaire are second-order factors. A simulation study with two sample sizes, N = 300 and N = 1,000, shows that the method provides very good recovery of true parameters and near-nominal rejection rates. The approach is illustrated with empirical data from N = 317 students, comparing model parameters obtained with compositional and Likert-scale versions of a Big Five measure. The results show that the proposed model successfully captures the latent structures and person scores on the measured traits.
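
The log-ratio step described in this abstract can be sketched as follows. The abstract does not say which log-ratio transform the article uses, so this sketch assumes Aitchison's additive log-ratio with the last item as the reference; the function name `additive_log_ratios` is hypothetical.

```python
import math


def additive_log_ratios(shares):
    """Map a composition (strictly positive parts) to log ratios against the last part.

    Assumption: the last item in the block serves as the reference item.
    """
    if any(s <= 0 for s in shares):
        raise ValueError("all parts must be strictly positive")
    ref = shares[-1]
    # Each log ratio is log(part) - log(reference), a difference on the log scale.
    return [math.log(s / ref) for s in shares[:-1]]
```

Because only ratios enter, a respondent who splits 100 points as 50/25/25 and one who reports proportions 0.50/0.25/0.25 produce the same log ratios.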

6.
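简小珠, 戴步云, 戴海琦. 《心理学报》 (Acta Psychologica Sinica), 2016, 48(12): 1625-1630
Item difficulty and the weight an item carries according to the importance of what it assesses are two basic attributes of a polytomously scored item, so the IRT item characteristic function needs separate parameters to represent them. Earlier polytomous models describe the difficulty of a polytomously scored item with several difficulty parameters and cannot effectively express the role of the item's score weight. Starting from the score-weighting role of polytomously scored items, this paper proposes the Logistic weighted model and discusses the ideas behind its theoretical construction. An EM algorithm for item parameter estimation under the Logistic weighted model is derived, and a corresponding estimation program was written. Test simulations under the Logistic weighted model show good recovery of the item parameters.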

7.
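高旭亮, 汪大勋, 王芳, 蔡艳, 涂冬波. 《心理学报》 (Acta Psychologica Sinica), 2019, 51(12): 1386-1397
Building on the partial credit model, this paper proposes a General Partial Credit Diagnostic Model (GPCDM). Compared with the existing polytomous models in the international literature that follow the partial credit approach, GDM (von Davier, 2008) and PC-DINA (de la Torre, 2012), the GPCDM allows a more flexible definition of the Q-matrix and places fewer constraints on the item parameters. A Monte Carlo study shows that the RMSE of the GPCDM parameter estimates lies in [0.015, 0.043], indicating acceptable estimation precision. An application to TIMSS (2007) data shows that the GPCDM fits these data better than GDM and PC-DINA and also gives better diagnostic results. In sum, this study provides a polytomous cognitive diagnosis model with fewer constraints and greater capability.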

8.
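When the observed indicator variables are dichotomous categorical data, traditional factor analysis methods no longer apply. The authors briefly review the categorical data factor analysis model under the SEM framework and the model relating test items to latent ability under the IRT framework, and summarize the main parameter estimation methods used in the two frameworks. Two simulation studies compare the GLSc and MGLSc estimators under the SEM framework with the MML/EM estimator under the IRT framework. The results show that: (1) of the three methods, GLSc yields the largest bias in parameter estimates, while MGLSc and MML/EM differ little; (2) as sample size increases, the precision of all item parameter estimates improves; (3) the precision of item factor loading and difficulty estimates is affected by test length; (4) the precision of item factor loading and discrimination estimates is affected by the level of the population factor loadings (discrimination); (5) the distribution of thresholds across test items affects estimation precision, with item discrimination affected the most; (6) overall, item parameter estimates are more precise under the SEM framework than under the IRT framework. The article also offers some recommendations on issues to watch for when applying the two approaches in practice.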

9.
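孟祥斌. 《心理科学》 (Journal of Psychological Science), 2016, 39(3): 727-734
In recent years, modeling item response time data has been one of the popular directions in psychological and educational measurement. To address the shortcomings of the lognormal model and the Box-Cox normal model for response times, this paper builds a log-linear model for response times based on the skew-normal distribution within van der Linden's hierarchical modeling framework, and gives a Markov chain Monte Carlo (MCMC) algorithm for estimating the model parameters. The results of a simulation study and a real-data analysis both show that, compared with the lognormal and Box-Cox normal models, the log-skew-normal model fits better and is more flexible and more widely applicable.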

10.
Statistical methods are presented to facilitate a more complete analysis of results obtained when a scaling model is applied to data from two or more groups. These methods can be used to (a) compare the corresponding estimated latent distributions obtained using the scaling model applied to the different groups, (b) compare the corresponding estimated item reliabilities (or item response error rates) for the different groups, and (c) test whether the scaling model applied to the several groups can be replaced by a more parsimonious scaling model that includes various homogeneity constraints (i.e., constraints that describe which parameters in the model are the same for the several groups). Various kinds of scaling models are considered here in the multiple-group context. Support for this research was provided in part by the National Science Foundation, to Clogg by Grant No. SES-7823759 and to Goodman by Grant No. SES-8303838. Clogg and Goodman were Fellows at the Center for Advanced Study in the Behavioral Sciences when part of the research was done, with financial support provided in part by National Science Foundation grant BNS-8011494 to the Center. The authors are indebted to Mark P. Becker and James W. Shockey for helpful comments.

11.
When categorical ordinal item response data are collected over multiple timepoints from a repeated measures design, an item response theory (IRT) modeling approach whose unit of analysis is an item response is suitable. This study proposes a few longitudinal IRT models and illustrates how a popular compensatory multidimensional IRT model can be utilized to formulate such longitudinal IRT models, which permits an investigation of ability growth at both individual and population levels. The equivalence of an existing multidimensional IRT model and those longitudinal IRT models is also elaborated so that one can make use of an existing multidimensional IRT model to implement the longitudinal IRT models.

12.
Conjunctive item response models are introduced such that (a) sufficient statistics for latent traits are not necessarily additive in item scores; (b) items are not necessarily locally independent; and (c) existing compensatory (additive) item response models including the binomial, Rasch, logistic, and general locally independent model are special cases. Simple estimates and hypothesis tests for conjunctive models are introduced and evaluated as well. Conjunctive models are also identified with cognitive models that assume the existence of several individually necessary component processes for a global ability. It is concluded that conjunctive models and methods may show promise for constructing improved tests and uncovering conjunctive cognitive structure. It is also concluded that conjunctive item response theory may help to clarify the relationships between local dependence, multidimensionality, and item response function form. I appreciate the many helpful suggestions that were given by the reviewers and Ivo Molenaar.

13.
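Hierarchical linear modeling is an advanced statistical method for handling hierarchically structured data, and item response theory is a modern measurement theory for measuring examinee ability precisely. Multilevel item response theory combines the two by nesting an item response model within a hierarchical linear model. This makes it possible to estimate item parameters and ability parameters at different levels, and it also gives more precise estimates of regression coefficients and error variance. The authors outline the development of multilevel item response theory, review its applications in psychological and educational measurement in terms of differential item functioning, test equating, and school effectiveness research, summarize its value, and look ahead to future research trends.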

14.
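汪文义, 丁树良. 《心理科学》 (Journal of Psychological Science), 2012, 35(2): 452-456
Existing research has shown that the reachability matrix plays an important role in constructing cognitive diagnostic tests, but so far this has not attracted general attention. This paper examines how the accuracy of online calibration of the attribute vectors of raw (not yet calibrated) items is affected when the item bank lacks some of the item classes that correspond to the reachability matrix. A simulation with an independent structure of six attributes shows that if the item bank does not meet the "sufficient and necessary" condition, the accuracy of attribute calibration for raw items suffers, and that items in the bank outside the reachability matrix partly compensate for this in calibration. The results indirectly confirm that the reachability matrix plays a very important role in cognitive diagnostic item banks.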
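
As a rough illustration of the reachability matrix discussed in this abstract, the sketch below derives it from an attribute adjacency matrix by transitive closure (Warshall's algorithm). This is a generic construction assumed here for illustration, not the authors' program; the function name `reachability_matrix` is hypothetical.

```python
def reachability_matrix(adjacency):
    """Return the reachability matrix of an attribute hierarchy as 0/1 rows.

    adjacency[i][j] == 1 means attribute i is a direct prerequisite of attribute j.
    Every attribute is treated as reaching itself (an assumption of this sketch).
    """
    n = len(adjacency)
    reach = [[bool(adjacency[i][j]) or i == j for j in range(n)] for i in range(n)]
    # Warshall's algorithm: allow paths that pass through intermediate attribute k.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return [[int(v) for v in row] for row in reach]
```

For an independent structure such as the six-attribute case in the abstract, the adjacency matrix is all zeros, so the sketch returns the 6 x 6 identity matrix.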

15.
This article demonstrates the use of mixed-effects logistic regression (MLR) for conducting sequential analyses of binary observational data. MLR is a special case of the mixed-effects logit modeling framework, which may be applied to multicategorical observational data. The MLR approach is motivated in part by the advances of G. A. Dagne, G. W. Howe, C. H. Brown, and B. O. Muthén (2002) in general linear mixed models for sequential analyses of observational data in the form of contingency table frequency counts. The advantage of the MLR approach is that it circumvents obstacles in the estimation of random sampling error encountered using Dagne and colleagues' approach. This article demonstrates the MLR model in an analysis of observed sequences of communication in a sample of young adult same-sex peer dyads. The results obtained using MLR are compared with those of a parallel analysis using Dagne and colleagues' linear mixed model for binary observational data in the form of log odds ratios. Similarities and differences between the results of the 2 approaches are discussed. Implications for the use of linear mixed models versus mixed-effects logit models for sequential analyses are considered.

16.
Despite the importance of both response probability and response time for testing models of choice, there is a dearth of chronometric studies examining systematic asymmetries that occur over time- and space-orders in the method of paired comparisons. In this study, systematic asymmetries in discriminating the magnitude of paired visual stimuli are examined by way of log odds ratios of binary responses as well as by signed response speed. Hierarchical Bayesian modeling is used to map response probabilities and response speed onto constituent psychological processes, and processing capacity is also assessed using response time distribution hazard functions. The findings include characteristic order effects that change systematically in magnitude and direction with changes in the magnitude and separation of the stimuli. After Hellström (1979, 2000), sensation weighting (SW) model analyses show that such order effects are reflected in the weighted accumulation of noisy information about the difference between stimulus values over time, and interindividual differences in weighting asymmetries are related to the relative processing capacity of participants. An account of SW based on the use of reference level information and maximization of signal-to-noise ratios is posited, which finds support from theoretically driven analyses of behavioral data.

17.
The authors review the common methods for measuring strength of contingency between 2 behaviors in a behavioral sequence, the binomial z score and the adjusted cell residual, and point out a number of limitations of these approaches. They present a new approach using log odds ratios and empirical Bayes estimation in the context of hierarchical modeling, an approach not constrained by these limitations. A series of hierarchical models is presented to test the stationarity of behavioral sequences, the homogeneity of sequences across a sample of episodes, and whether covariates can account for variation in sequences across the sample. These models are applied to observational data taken from a study of the behavioral interactions of 254 couples to illustrate their use.
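
The log odds ratio used here as a measure of contingency can be sketched for a single 2 x 2 table of antecedent-by-consequent counts. The cell layout, the default 0.5 continuity correction (the Haldane-Anscombe adjustment), and the function name `log_odds_ratio` are assumptions of this sketch; the abstract does not describe how the authors handle zero cells.

```python
import math


def log_odds_ratio(a, b, c, d, correction=0.5):
    """Log odds ratio for a 2 x 2 table of behavior transition counts.

    a: antecedent present, consequent present
    b: antecedent present, consequent absent
    c: antecedent absent,  consequent present
    d: antecedent absent,  consequent absent
    `correction` is added to every cell so that zero counts stay finite
    (an assumption of this sketch; pass 0 for the raw log odds ratio).
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return math.log((a * d) / (b * c))
```

A value of 0 indicates no contingency between the two behaviors; positive values indicate that the consequent is more likely after the antecedent than otherwise.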

18.
Constant latent odds-ratios models and the Mantel-Haenszel null hypothesis   (Cited by: 1 in total; 0 self-citations; 1 by others)
In the present paper, a new family of item response theory (IRT) models for dichotomous item scores is proposed. Two basic assumptions define the most general model of this family. The first assumption is local independence of the item scores given a unidimensional latent trait. The second assumption is that the odds-ratios for all item-pairs are constant functions of the latent trait. Since the latter assumption is characteristic of the whole family, the models are called constant latent odds-ratios (CLORs) models. One nonparametric special case and three parametric special cases of the general CLORs model are shown to be generalizations of the one-parameter logistic Rasch model. For all CLORs models, the total score (the unweighted sum of the item scores) is shown to be a sufficient statistic for the latent trait. In addition, conditions under the general CLORs model are studied for the investigation of differential item functioning (DIF) by means of the Mantel-Haenszel procedure. This research was supported by the Dutch Organization for Scientific Research (NWO), grant number 400-20-026.
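
The constant-odds-ratio property can be checked numerically for the Rasch model, which the abstract names as the special case that the CLORs family generalizes. The sketch below assumes the usual logistic form with item difficulty b; the helper names are hypothetical. Under that form the odds of success on an item are exp(θ − b), so the odds ratio for an item pair equals exp(b_j − b_i) at every θ.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def item_pair_odds_ratio(theta, b_i, b_j):
    """Odds of success on item i divided by odds of success on item j at theta."""
    return odds(rasch_prob(theta, b_i)) / odds(rasch_prob(theta, b_j))
```

Evaluating `item_pair_odds_ratio` at several values of θ with fixed difficulties returns the same value up to floating-point error, which is the defining property the abstract describes.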

19.
Finite sample inference procedures are considered for analyzing the observed scores on a multiple choice test with several items, where, for example, the items are dissimilar, or the item responses are correlated. A discrete p-parameter exponential family model leads to a generalized linear model framework and, in a special case, a convenient regression of true score upon observed score. Techniques based upon the likelihood function, Akaike's information criterion (AIC), an approximate Bayesian marginalization procedure based on conditional maximization (BCM), and simulations for exact posterior densities (importance sampling) are used to facilitate finite sample investigations of the average true score, individual true scores, and various probabilities of interest. A simulation study suggests that, when the examinees come from two different populations, the exponential family can adequately generalize Duncan's beta-binomial model. Extensions to regression models, the classical test theory model, and empirical Bayes estimation problems are mentioned. The Duncan, Keats, and Matsumura data sets are used to illustrate potential advantages and flexibility of the exponential family model, and the BCM technique. The authors wish to thank Ella Mae Matsumura for her data set and helpful comments, Frank Baker for his advice on item response theory, Hirotugu Akaike and Taskin Atilgan for helpful discussions regarding AIC, Graham Wood for his advice concerning the class of all binomial mixture models, Yiu Ming Chiu for providing useful references and information on tetrachoric models, and the Editor and two referees for suggesting several references and alternative approaches.

20.
A model for multiple-choice exams is developed from a signal-detection perspective. A correct alternative in a multiple-choice exam can be viewed as being a signal embedded in noise (incorrect alternatives). Examinees are assumed to have perceptions of the plausibility of each alternative, and the decision process is to choose the most plausible alternative. It is also assumed that each examinee either knows or does not know each item. These assumptions together lead to a signal detection choice model for multiple-choice exams. The model can be viewed, statistically, as a mixture extension, with random mixing, of the traditional choice model, or similarly, as a grade-of-membership extension. A version of the model with extreme value distributions is developed, in which case the model simplifies to a mixture multinomial logit model with random mixing. The approach is shown to offer measures of item discrimination and difficulty, along with information about the relative plausibility of each of the alternatives. The model, parameters, and measures derived from the parameters are compared to those obtained with several commonly used item response theory models. An application of the model to an educational data set is presented.
