1.

Investigating latent constructs with item response models: A MATLAB IRTm toolbox

Johan Braeken, Francis Tuerlinckx 《Behavior research methods》2009,41(4):1127-1137

Item response theory (IRT) models are the central tools in modern measurement and advanced psychometrics. We offer a MATLAB IRT modeling (IRTm) toolbox that is freely available and that follows an explicit design matrix approach, giving the end user control and flexibility in building a model that goes beyond standard models, such as the Rasch model (Rasch, 1960) and the two-parameter logistic model. As such, IRTm allows for a large variety of unidimensional IRT models for binary responses, the incorporation of additional person and item information, and deviations from common model assumptions. An exclusive key feature of the toolbox is the inclusion of copula IRT models to handle local item dependencies. Two appendixes for this report, containing example code and information on the general copula IRT in IRTm, may be downloaded from brm.psychonomic-journals.org/content/supplemental.

2.

Using the testlet response model to mitigate the harm of local item dependence in multistage testing

詹沛达, 高椿雷, 边玉芳, 罗照盛 《心理科学》 (Journal of Psychological Science) 2017,40(1):216-223

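Although multistage testing (MST) retains the advantages of adaptive testing while allowing test developers to assemble each module and panel under specified constraints, local item dependence (LID) that arises among items because some latent factor was overlooked during test assembly can still harm MST results. To examine the harm that LID does to MST, this study first introduces MST, LID, and related concepts. A simulation study then shows that LID reduces the precision of examinee ability estimates, although the estimation bias remains small, and that this harm is not limited to any particular routing rule. To remove the harm, a testlet response model was then used as the analysis model during MST administration; the results show that this approach removes part of the harm but that its effect is limited. These findings indicate that the harm LID does to the precision of ability estimation in MST deserves attention, and that methods for removing LID-induced harm in MST merit further study.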

3.

Limits on Log Odds Ratios for Unidimensional Item Response Theory Models

Shelby J. Haberman, Paul W. Holland, Sandip Sinharay 《Psychometrika》2007,72(4):551-561

Bounds are established for log odds ratios (log cross-product ratios) involving pairs of items for item response models. First, expressions for bounds on log odds ratios are provided for one-dimensional item response models in general. Then, explicit bounds are obtained for the Rasch model and the two-parameter logistic (2PL) model. Results are also illustrated through an example from a study of model-checking procedures. The bounds obtained can provide an elementary basis for assessment of goodness of fit of these models. Any opinions expressed in this publication are those of the authors and not necessarily those of the Educational Testing Service. The authors thank Dan Eignor, Matthias von Davier, Lydia Gladkova, Brian Junker, and the three anonymous reviewers for their invaluable advice. The authors gratefully acknowledge the help of Kim Fryer with proofreading.

4.

Sufficiency and Conditional Estimation of Person Parameters in the Polytomous Rasch Model

David Andrich 《Psychometrika》2010,75(2):292-308

Rasch models are characterised by sufficient statistics for all parameters. In the Rasch unidimensional model for two ordered categories, the parameterisation of the person and item is symmetrical and it is readily established that the total scores of a person and item are sufficient statistics for their respective parameters. In contrast, in the unidimensional polytomous Rasch model for more than two ordered categories, the parameterisation is not symmetrical. Specifically, each item has a vector of item parameters, one for each category, and each person only one person parameter. In addition, different items can have different numbers of categories and, therefore, different numbers of parameters. The sufficient statistic for the parameters of an item is itself a vector. In estimating the person parameters in presently available software, these sufficient statistics are not used to condition out the item parameters. This paper derives a conditional, pairwise, pseudo-likelihood and constructs estimates of the parameters of any number of persons which are independent of all item parameters and of the maximum scores of all items. It also shows that these estimates are consistent. Although Rasch’s original work began with equating tests using test scores, and not with items of a test, the polytomous Rasch model has not been applied in this way. Operationally, this is because the current approaches, in which item parameters are estimated first, cannot handle test data where there may be many scores with zero frequencies. A small simulation study shows that, when using the estimation equations derived in this paper, such a property of the data is no impediment to the application of the model at the level of tests. This opens up the possibility of using the polytomous Rasch model directly in equating test scores.
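
The sufficiency property mentioned for the two-category case can be checked numerically. The following is a minimal sketch, covering only the dichotomous Rasch model and not the paper's pairwise pseudo-likelihood; the function names and the item difficulties in `BETAS` are hypothetical. It shows that, once the total score is fixed, the probability of each response pattern no longer depends on the person parameter.

```python
import math
from itertools import product


def rasch_prob(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def pattern_prob(x, theta, betas):
    """Joint probability of response pattern x, assuming local independence."""
    prob = 1.0
    for xi, b in zip(x, betas):
        p = rasch_prob(theta, b)
        prob *= p if xi == 1 else 1.0 - p
    return prob


def conditional_pattern_probs(theta, betas, score):
    """P(pattern | total score) for every pattern with the given total score."""
    patterns = [x for x in product((0, 1), repeat=len(betas)) if sum(x) == score]
    joint = {x: pattern_prob(x, theta, betas) for x in patterns}
    total = sum(joint.values())
    return {x: p / total for x, p in joint.items()}


# Hypothetical item difficulties, chosen only for illustration.
BETAS = [-1.0, 0.0, 0.5, 1.5]

# Two very different abilities give the same conditional distribution.
low = conditional_pattern_probs(-1.0, BETAS, score=2)
high = conditional_pattern_probs(2.0, BETAS, score=2)
```

Because `low` and `high` agree up to rounding, the total score carries all the information about the person parameter, which is what makes conditional estimation possible.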

5.

Testing the Rasch model by means of the mixture fit index

《The British journal of mathematical and statistical psychology》2006,59(1):89-95

Rudas, Clogg, and Lindsay (RCL) proposed a new index of fit for contingency table analysis. Using the overparametrized two‐component mixture, where the first component with weight 1 − w represents the model to be tested and the second component with weight w is unstructured, the mixture index of fit was defined to be the smallest w compatible with the saturated two‐component mixture. This index of fit, which is insensitive to sample size, is applied to the problem of assessing the fit of the Rasch model. In this application, use is made of the equivalence of the semi‐parametric version of the Rasch model to specifically restricted latent class models. Therefore, the Rasch model can be represented by the structured component of the RCL mixture, with this component itself consisting of two or more subcomponents corresponding to the classes, and the unstructured component capturing the discrepancies between the data and the model. An empirical example demonstrates the application of this approach. Based on four‐item data, the one‐ and two‐class unrestricted latent class models and the one‐ to three‐class models restricted according to the Rasch model are considered, with respect to both their chi‐squared statistics and their mixture fit indices.
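
The definition in the abstract can be written compactly. The notation below is an assumption made for illustration: P is the observed table distribution, M the set of distributions satisfying the model under test, Φ the structured component, and Ψ the unstructured component.

```latex
P = (1 - w)\,\Phi + w\,\Psi, \qquad \Phi \in M,\ \Psi \text{ unrestricted},
\qquad
\pi^{*} = \inf\{\, w \in [0, 1] : P = (1 - w)\,\Phi + w\,\Psi \,\}.
```

Read this way, the index is the smallest fraction of the population that must fall outside the model for the table to be reproduced exactly.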

6.

Applying the Rasch model to test and analyze children's school readiness

刘昊, 刘肖岑, 冯晓霞 《心理科学》 (Journal of Psychological Science) 2013,36(2):484-488

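The aim of this study was to apply the Rasch model to develop and analyze a mathematics school readiness test, and thereby to examine the validity and advantages of the Rasch model. A self-developed mathematics school readiness test was administered to 150 children with a mean age of 6.6 years, and the Rasch model was used to revise the items and scoring categories and to analyze the results. The revised test showed good reliability and validity and fit the Rasch model well; the scoring categories were set appropriately, and the overall difficulty of the test was relatively low. Children's Rasch scores were unrelated to gender but were affected by age and family socioeconomic status. Compared with classical test theory, applying the Rasch model to the development and analysis of a school readiness test offers advantages.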

7.

Diagnosing item score patterns on a test using item response theory-based person-fit statistics

Meijer RR 《Psychological Methods》2003,8(1):72-87

Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken's (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics.

8.

A relation between a between-item multidimensional IRT model and the mixture Rasch model

Frank Rijmen, Paul De Boeck 《Psychometrika》2005,70(3):481-496

Two generalizations of the Rasch model are compared: the between-item multidimensional model (Adams, Wilson, and Wang, 1997), and the mixture Rasch model (Mislevy & Verhelst, 1990; Rost, 1990). It is shown that the between-item multidimensional model is formally equivalent with a continuous mixture of Rasch models for which, within each class of the mixture, the item parameters are equal to the item parameters of the multidimensional model up to a shift parameter that is specific for the dimension an item belongs to in the multidimensional model. In a simulation study, the relation between both types of models also holds when the number of classes of the mixture is as small as two. The relation is illustrated with a study on verbal aggression. Frank Rijmen was supported by the Fund for Scientific Research Flanders (FWO). This research is also funded by the GOA/2000/02 granted from the KU Leuven. We would like to thank Kristof Vansteelandt for providing the data of the study on verbal aggression.

9.

A multidimensional testlet-effect Rasch model

詹沛达, 王文中, 王立君, 李晓敏 《心理学报》 (Acta Psychologica Sinica) 2014,46(8):1208-1222

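First, this article interprets a testlet as, in essence, a set of items that share a common stimulus, and on that basis divides testlet effects into within-item unidimensional testlet effects and within-item multidimensional testlet effects. Second, building on the Rasch model, it develops multidimensional testlet-effect Rasch models for dichotomous and polytomous scoring, with the aim of handling within-item multidimensional testlet effects. Finally, simulation results show that the new model is valid and reasonable. Comparison with the Rasch testlet model and the partial credit model shows that: (1) when within-item multidimensional testlet effects are present in a test, separating out only the obvious bundled testlet effects while ignoring other latent testlet effects still leads to biased parameter estimates and may even overestimate test reliability; (2) the new model is more general: even when the response data contain no testlet effect or only within-item unidimensional testlet effects, analyzing the test with the new model still yields good parameter estimates.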

10.

Using the open-source statistical language R to analyze the dichotomous Rasch model

Li Y 《Behavior research methods》2006,38(3):532-541

R, an open-source statistical language and data analysis tool, is gaining popularity among psychologists currently teaching statistics. R is especially suitable for teaching advanced topics, such as fitting the dichotomous Rasch model, a topic that involves transforming complicated mathematical formulas into statistical computations. This article describes R’s use as a teaching tool and a data analysis software program in the analysis of the Rasch model in item response theory. It also explains the theory behind, as well as an educator’s goals for, fitting the Rasch model with joint maximum likelihood estimation. This article also summarizes the R syntax for parameter estimation and the calculation of fit statistics. The results produced by R are compared with the results obtained from MINISTEP and the output of a conditional logit model. The use of R is encouraged because it is free, supported by a network of peer researchers, and covers both basic and advanced topics in statistics frequently used by psychologists.
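
To make the idea of joint maximum likelihood (JML) estimation concrete, here is a minimal sketch in Python rather than the article's R syntax, which is not reproduced in the abstract. Everything in it is an assumption made for illustration: the function names, the alternating Newton updates with step clipping, the sum-to-zero constraint on the item difficulties, and the toy data in `DATA`.

```python
import math


def rasch_prob(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def fit_rasch_jml(X, n_iter=500, max_step=1.0):
    """Alternate Newton updates of person abilities and item difficulties.

    X is a complete 0/1 matrix (list of rows). Persons or items with
    extreme raw scores have no finite JML estimate and are rejected.
    """
    n_persons, n_items = len(X), len(X[0])
    r = [sum(row) for row in X]                              # person raw scores
    s = [sum(row[i] for row in X) for i in range(n_items)]   # item raw scores
    if any(v in (0, n_items) for v in r) or any(v in (0, n_persons) for v in s):
        raise ValueError("extreme raw scores have no finite JML estimate")

    theta = [0.0] * n_persons
    beta = [0.0] * n_items

    def clip(step):
        # Limit the Newton step size to keep early iterations stable.
        return max(-max_step, min(max_step, step))

    for _ in range(n_iter):
        # Update each ability: gradient is (raw score - expected score).
        for v in range(n_persons):
            p = [rasch_prob(theta[v], b) for b in beta]
            info = sum(q * (1.0 - q) for q in p)
            theta[v] += clip((r[v] - sum(p)) / info)
        # Update each difficulty: gradient is (expected score - raw score).
        for i in range(n_items):
            p = [rasch_prob(t, beta[i]) for t in theta]
            info = sum(q * (1.0 - q) for q in p)
            beta[i] += clip((sum(p) - s[i]) / info)
        # Fix the scale origin: mean item difficulty equals zero.
        shift = sum(beta) / n_items
        beta = [b - shift for b in beta]
        theta = [t - shift for t in theta]
    return theta, beta


# Toy data (7 persons x 4 items), invented for illustration only.
DATA = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
theta_hat, beta_hat = fit_rasch_jml(DATA)
```

At convergence the expected raw score of every person and every item equals the observed raw score, which is the defining property of the JML solution. Dedicated software typically adds bias corrections and fit statistics that this sketch omits.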

11.

Scale Alignment in the Between-Item Multidimensional Partial Credit Model

Leah Feuerstahler, Mark Wilson 《Applied Psychological Measurement》2021,45(4):268

In between-item multidimensional item response models, it is often desirable to compare individual latent trait estimates across dimensions. These comparisons are only justified if the model dimensions are scaled relative to each other. Traditionally, this scaling is done using approaches such as standardization, that is, fixing the latent mean and standard deviation to 0 and 1 for all dimensions. However, approaches such as standardization do not guarantee that Rasch model properties hold across dimensions. Specifically, for between-item multidimensional Rasch family models, the unique ordering of items holds within dimensions, but not across dimensions. Previously, Feuerstahler and Wilson described the concept of scale alignment, which aims to enforce the unique ordering of items across dimensions by linearly transforming item parameters within dimensions. In this article, we extend the concept of scale alignment to the between-item multidimensional partial credit model and to models fit using incomplete data. We illustrate this method in the context of the Kindergarten Individual Development Survey (KIDS), a multidimensional survey of kindergarten readiness used in the state of Illinois. We also present simulation results that demonstrate the effectiveness of scale alignment in the context of polytomous item response models and missing data.

12.

A goodness of fit test for the Rasch model

Erling B. Andersen 《Psychometrika》1973,38(1):123-140

The Rasch model is an item analysis model with logistic item characteristic curves of equal slope, i.e., with constant item discriminating powers. The proposed goodness of fit test is based on a comparison between difficulties estimated from different score groups and over-all estimates. Based on the within score group estimates and the over-all estimates of item difficulties a conditional likelihood ratio is formed. It is shown that −2 times the logarithm of this ratio is χ²-distributed when the Rasch model is true. The power of the proposed goodness of fit test is discussed for alternative models with logistic item characteristic curves, but unequal discriminating items from a scholastic aptitude test.
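
The test statistic described in words above can be written out as follows. The notation is an assumption made for illustration: L_c is the conditional likelihood, G the number of score groups, k the number of items, and hats denote conditional maximum likelihood estimates. The degrees of freedom are as usually stated in the literature for this test and are not given in the abstract.

```latex
\lambda = \frac{L_c(\hat{\beta})}{\prod_{g=1}^{G} L_c^{(g)}\bigl(\hat{\beta}^{(g)}\bigr)},
\qquad
Z = -2 \ln \lambda
  = 2\left[\, \sum_{g=1}^{G} \ln L_c^{(g)}\bigl(\hat{\beta}^{(g)}\bigr) - \ln L_c(\hat{\beta}) \right]
  \;\overset{a}{\sim}\; \chi^{2}_{(G-1)(k-1)}.
```

A large value of Z indicates that item difficulties estimated within score groups differ from the overall estimates by more than the Rasch model allows.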

13.

Cognitive Complexity in the Remote Association Test - Chinese Version

Su-Pin Hung, Po-Sheng Huang, Hsueh-Chih Chen 《Creativity Research Journal》2016,28(4):442-449

The remote association test (RAT) has been applied in various fields; however, evidence of construct validity for the original version and subsequent extensions of the RAT remains limited. This study aimed to elucidate the dimensionality and the relationship between item features and item difficulties for the Chinese version of the RAT (RAT-C) using the Rasch model and the linear logistic test model (LLTM). The revised 30-item RAT-C was administered to 475 undergraduates (263 women and 212 men) in 8 universities in Taiwan. Item features (including types of associations among stimulus words, and frequency and concreteness of target words) were recoded. The analysis found that the RAT-C measured a single latent construct, with all 30 items conforming to the Rasch model’s expectation. Furthermore, according to the LLTM analysis, most item features predicted Rasch item difficulty, suggesting that these features can explain why some items were more difficult than others and can be used to create new items with known item difficulty to tailor the difficulty level for different groups of participants in the future.
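
For readers unfamiliar with the LLTM, its usual textbook form is shown below; the symbols are assumptions made for illustration and do not come from the abstract. The model keeps the Rasch response function but constrains each item difficulty to be a weighted sum of feature effects, where q_{ij} codes feature j for item i, η_j is the effect of feature j, and c is a normalization constant.

```latex
P(X_{vi} = 1 \mid \theta_v) = \frac{\exp(\theta_v - \beta_i)}{1 + \exp(\theta_v - \beta_i)},
\qquad
\beta_i = \sum_{j=1}^{p} q_{ij}\,\eta_j + c.
```

Once the feature effects are estimated, the difficulty of a new item can be predicted from its feature coding alone, which is the use the authors suggest.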

14.

Loglinear Rasch model tests

Hendrikus Kelderman 《Psychometrika》1984,49(2):223-245

Existing statistical tests for the fit of the Rasch model have been criticized, because they are only sensitive to specific violations of its assumptions. Contingency table methods using loglinear models have been used to test various psychometric models. In this paper, the assumptions of the Rasch model are discussed and the Rasch model is reformulated as a quasi-independence model. The model is a quasi-loglinear model for the incomplete subgroup × score × item 1 × item 2 × ... × item k contingency table. Using ordinary contingency table methods the Rasch model can be tested generally or against less restrictive quasi-loglinear models to investigate specific violations of its assumptions.

15.

Rasch analysis of the Hospital Anxiety and Depression Scale in spinal cord injury

R Müller, A Cieza, S Geyh 《Rehabilitation psychology》2012,57(3):214-223

Purpose: The purpose of this study was to evaluate the psychometric properties of the Hospital Anxiety and Depression Scale (HADS), applied among persons with spinal cord injury (SCI), using Rasch analysis. Methods: A cross-sectional multicenter study was conducted and the data of 102 people with SCI were analyzed. Rasch analyses were performed to assess dimensionality, overall and individual item fit, response scale structure, targeting, and differential item functioning. Results: The anxiety and depression subscales showed unidimensionality, that is, model and item fit. The two subscales are reliable (r = .72, .82) in SCI. No disordered response scale structure or differential item functioning by age, gender, education, relationship status, or level of spinal lesion was found. Stepwise deletion of the misfitting items did not produce a total score that fulfilled the statistical criteria for unidimensionality. Conclusions: The results of the Rasch analyses support the use of the anxiety and depression subscales among people with SCI. However, further research is needed to confirm these findings and examine sensitivity to change of the HADS in SCI, which would support its use in longitudinal observational and intervention studies.

16.

Bayesian item fit analysis for unidimensional item response theory models

《The British journal of mathematical and statistical psychology》2006,59(2):429-449

Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model‐checking method, a popular Bayesian model‐checking tool, to examine item fit for the above‐mentioned models. An item fit plot, comparing the observed and predicted proportion‐correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p‐values (which are natural Bayesian p‐values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above‐mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.

17.

A comparison of the four-parameter logistic model and traditional models in fitting examinee responses

刘玥, 刘红云 《心理学探新》 (Psychological Exploration) 2018,(3):228-235

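The four-parameter logistic model can be used to analyze data affected by the "sleeping" phenomenon, in which high-ability examinees answer easy items incorrectly. This study selected real data from psychological and achievement tests, fitted them with traditional models and with the four-parameter logistic model, and compared the fit indices and parameter estimates of the different models. The results show that the four-parameter logistic model improves fit, increases the accuracy of the estimates, and effectively corrects the underestimation of high-ability examinees' ability. Using the four-parameter logistic model for data analysis is recommended where necessary.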
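
The role of the fourth parameter can be seen from the item response function itself. The sketch below uses the standard four-parameter logistic form; the function name and parameter names (`a` discrimination, `b` difficulty, `c` lower asymptote, `d` upper asymptote) are conventional choices rather than anything specified in the abstract.

```python
import math


def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Four-parameter logistic item response function.

    With c = 0 and d = 1 this reduces to the two-parameter logistic model;
    an upper asymptote d < 1 lets even very able examinees miss the item.
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


# A very able examinee (theta = 4) on an easy item (b = 0):
p_2pl = irf_4pl(4.0, a=1.0, b=0.0)          # close to 1
p_4pl = irf_4pl(4.0, a=1.0, b=0.0, d=0.9)   # capped below 0.9
```

Under a model with d fixed at 1, an error on such an item is treated as strong evidence of low ability; with d < 1 the same error is less surprising, which is one way to read the reported correction of underestimated abilities.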

18.

The many null distributions of person fit indices

Ivo W. Molenaar, Herbert Hoijtink 《Psychometrika》1990,55(1):75-106

This paper deals with the situation of an investigator who has collected the scores of n persons on a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution.
The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
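
The first of the three specifications, treating the ability as known, can be sketched for a short test. This is an illustrative sketch only: the function names, the item difficulties in `BETAS`, and the ability `THETA` are hypothetical, and the log-likelihood of the response pattern is used as the person fit statistic. The exact null distribution is obtained by enumerating all response patterns, and its first two moments are compared with their closed-form values.

```python
import math
from itertools import product


def rasch_prob(theta, beta):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def loglik_fit(x, theta, betas):
    """Log-likelihood of response pattern x, used as a person fit statistic."""
    total = 0.0
    for xi, b in zip(x, betas):
        p = rasch_prob(theta, b)
        total += math.log(p) if xi == 1 else math.log(1.0 - p)
    return total


def null_moments(theta, betas):
    """Closed-form mean and variance of the statistic when the model holds."""
    mean = 0.0
    var = 0.0
    for b in betas:
        p = rasch_prob(theta, b)
        q = 1.0 - p
        mean += p * math.log(p) + q * math.log(q)
        var += p * q * math.log(p / q) ** 2
    return mean, var


def exact_null_distribution(theta, betas):
    """All (statistic value, probability) pairs, sorted by statistic value."""
    dist = []
    for x in product((0, 1), repeat=len(betas)):
        value = loglik_fit(x, theta, betas)
        dist.append((value, math.exp(value)))  # pattern probability = exp(loglik)
    return sorted(dist)


# Hypothetical item difficulties and ability, for illustration only.
BETAS = [-1.5, -0.5, 0.0, 0.5, 1.5]
THETA = 0.3

null_dist = exact_null_distribution(THETA, BETAS)
expected_pattern = loglik_fit((1, 1, 1, 0, 0), THETA, BETAS)   # easy right, hard wrong
aberrant_pattern = loglik_fit((0, 0, 0, 1, 1), THETA, BETAS)   # easy wrong, hard right
```

Enumeration is only feasible for short tests (2^k patterns), which is consistent with the abstract's mention of Monte Carlo estimates and moment-based approximations for longer ones. Changing `THETA` changes the whole null distribution, which is the paper's central point.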

19.

A Generalized Speed–Accuracy Response Model for Dichotomous Items

Peter W. van Rijn, Usama S. Ali 《Psychometrika》2018,83(1):109-131

We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). In these models, the scores that result from a scoring rule that incorporates both the speed and accuracy of item responses are modeled. Our generalization is similar to that of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable type I error rates for the residuals. Finally, the methods were applied to two real data sets. It was found that the two-parameter SARM showed improved fit compared to the one-parameter SARM in both data sets.
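
The analogy drawn in the abstract refers to the following standard pair of IRT models, shown here for orientation only; these are not the SARM equations, and the symbols (ability θ, item difficulty b_i, item discrimination a_i) are conventional assumptions.

```latex
\text{1PL (Rasch):}\quad
P(X_i = 1 \mid \theta) = \frac{\exp(\theta - b_i)}{1 + \exp(\theta - b_i)},
\qquad
\text{2PL (Birnbaum):}\quad
P(X_i = 1 \mid \theta) = \frac{\exp\bigl(a_i(\theta - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta - b_i)\bigr)}.
```

The generalization adds one discrimination parameter per item, so the one-parameter model is recovered when all a_i are equal.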

20.

Psychological Test Calibration Using the Rasch Model—Some Critical Suggestions on Traditional Approaches

《International Journal of Testing》2013,13(4):377-394

In this article, we emphasize that the Rasch model is not only very useful for psychological test calibration but is also necessary if the number of solved items is to be used as an examinee's score. Simplified proof that the Rasch model implies specific objective parameter comparisons is given. Consequently, a model check per se is possible. For data and item pools that fail to fit the Rasch model, various reasons are listed. For instance, the two-parameter logistic or three-parameter logistic models would probably be more suitable. Several suggestions are given for controlling the overall Type I risk, for including a power analysis (i.e., taking the Type II risk into account), for disclosing artificial model check results, and for the deletion of Rasch model misfitting examinees. These suggestions are empirically founded and may serve in the establishment of certain rough state-of-the-art standards. However, a degree of statistical elaboration is needed; and forthcoming test authors will still suffer from the fact that no standard software exists that offers all of the given approaches as a package.