Similar Articles
20 similar articles found (search time: 109 ms)
1.
Item responses that do not fit an item response theory (IRT) model may cause the latent trait value to be inaccurately estimated. In the past two decades several statistics have been proposed that can be used to identify nonfitting item score patterns. These statistics all yield scalar values. Here, the use of the person response function (PRF) for identifying nonfitting item score patterns was investigated. The PRF is a function and can be used for diagnostic purposes. First, the PRF is defined in a class of IRT models that imply an invariant item ordering. Second, a person-fit method proposed by Trabin & Weiss (1983) is reformulated in a nonparametric IRT context assuming invariant item ordering, and statistical theory proposed by Rosenbaum (1987a) is adapted to test locally whether a PRF is nonincreasing. Third, a simulation study was conducted to compare the use of the PRF with the person-fit statistic ZU3. It is concluded that the PRF can be used as a diagnostic tool in person-fit research. The authors are grateful to Coen A. Bernaards for preparing the figures used in this article, and to Wilco H.M. Emons for checking the calculations.

2.
Answer copying is the most difficult form of cheating to detect on examinations. Answer-copying statistics (ACS) and person-fit statistics (PFS) are the two main statistical approaches for detecting copying. ACS identify copiers from the probability that a suspected copier's observed score pattern matches that of the source examinee. PFS compare an observed item-score pattern against a measurement model to test whether an examinee's responses conform to the pattern the model predicts. Because PFS are subject to confounding factors when flagging aberrant score patterns, their results admit multiple interpretations and they are used less often for this purpose. ACS are designed specifically for detecting copying, and research shows they achieve higher detection rates. ACS indices are already widely used in the SAT and in several certification examinations in the United States.

3.
We illustrate the usefulness of person-fit methodology for personality assessment. For this purpose, we use person-fit methods from item response theory. First, we give a nontechnical introduction to existing person-fit statistics. Second, we analyze data from Harter's (1985) Self-Perception Profile for Children in a sample of children ranging from 8 to 12 years of age (N = 611) and argue that for some children, the scale scores should be interpreted with caution. Combined information from person-fit indexes and from observation, interviews, and self-concept theory showed that similar score profiles may have different interpretations. For some children in the sample, item scores did not adequately reflect their trait level. Based on teacher interviews, this was found to be due most likely to a less developed self-concept and/or problems understanding the meaning of the questions. We recommend investigating the scalability of score patterns when using self-report inventories to help the researcher interpret respondents' behavior correctly.

4.
Person-fit statistics test whether the likelihood of a respondent's complete vector of item scores on a test is low given the hypothesized item response theory model. This binary information may be insufficient for diagnosing the cause of a misfitting item-score vector. The authors propose a comprehensive methodology for person-fit analysis in the context of nonparametric item response theory. The methodology (a) includes H. Van der Flier's (1982) global person-fit statistic U3 to make the binary decision about fit or misfit of a person's item-score vector, (b) uses kernel smoothing (J. O. Ramsay, 1991) to estimate the person-response function for the misfitting item-score vectors, and (c) evaluates unexpected trends in the person-response function using a new local person-fit statistic (W. H. M. Emons, 2003). An empirical data example shows how to use the methodology for practical person-fit analysis.
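Van der Flier's global U3 statistic mentioned in step (a) normalizes a weighted score against its Guttman extremes. A minimal sketch of one common textbook formulation (the function name and sample proportions are assumptions; degenerate all-correct or all-incorrect vectors are not handled):

```python
import numpy as np

def u3(x, p):
    """Van der Flier's U3 person-fit statistic, in a common textbook
    form. x: 0/1 item scores; p: item proportions-correct in the
    sample. U3 = 0 for a perfect Guttman pattern (easiest items
    correct) and 1 for the fully reversed pattern."""
    x, p = np.asarray(x, float), np.asarray(p, float)
    order = np.argsort(-p)           # easiest item first
    x, p = x[order], p[order]
    logit = np.log(p / (1.0 - p))
    r = int(x.sum())                 # number-correct score
    w = (x * logit).sum()            # observed weighted score
    w_max = logit[:r].sum()          # r easiest items correct
    w_min = logit[-r:].sum()         # r hardest items correct
    return (w_max - w) / (w_max - w_min)

p = [0.9, 0.8, 0.6, 0.4, 0.2]
guttman = u3([1, 1, 1, 0, 0], p)    # perfect Guttman pattern -> 0.0
reversed_ = u3([0, 0, 1, 1, 1], p)  # fully reversed pattern  -> 1.0
```

Large U3 values flag a vector as misfitting; the kernel-smoothed person-response function in steps (b) and (c) then localizes where along the difficulty scale the misfit occurs.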

5.
The Everyday Discrimination Scale (EDS), a widely used measure of daily perceived discrimination, is purported to be unidimensional, to function well among African Americans, and to have adequate construct validity. Two separate studies and data sources were used to examine and cross-validate the psychometric properties of the EDS. In Study 1, an exploratory factor analysis was conducted on a sample of African American law students (N = 589), providing strong evidence of local dependence, or nuisance multidimensionality within the EDS. In Study 2, a separate nationally representative community sample (N = 3,527) was used to model the identified local dependence in an item factor analysis (i.e., bifactor model). Next, item response theory (IRT) calibrations were conducted to obtain item parameters. A five-item, revised-EDS was then tested for gender differential item functioning (in an IRT framework). Based on these analyses, a summed score to IRT-scaled score translation table is provided for the revised-EDS. Our results indicate that the revised-EDS is unidimensional, with minimal differential item functioning, and retains predictive validity consistent with the original scale.

6.
王昭, 郭庆科, 韩丹. 《心理科学》 (Psychological Science), 2012, 35(5): 1225-1232
Person-fit indices are a new method for examining aberrant score patterns in psychological tests. This study examined the effects of eight person-fit indices (G, C, MCI, U3, U, W, ECI6, and L) on the reliability and validity of the Eysenck Personality Questionnaire, as well as the correlation of each index with the number of items answered inconsistently between positively and negatively worded forms. Results showed that after removing different proportions of poorly fitting respondents, test reliability and validity improved markedly. Person-fit statistics could also detect acquiescent response bias in the personality test. Among the indices, L improved test reliability and validity most effectively.

7.
In this article, the authors developed a common strategy for identifying differential item functioning (DIF) items that can be implemented in both the mean and covariance structures method (MACS) and item response theory (IRT). They proposed examining the loadings (discrimination) and the intercept (location) parameters simultaneously using the likelihood ratio test with a free-baseline model and Bonferroni corrected critical p values. They compared the relative efficacy of this approach with alternative implementations for various types and amounts of DIF, sample sizes, numbers of response categories, and amounts of impact (latent mean differences). Results indicated that the proposed strategy was considerably more effective than an alternative approach involving a constrained-baseline model. Both MACS and IRT performed similarly well in the majority of experimental conditions. As expected, MACS performed slightly worse in dichotomous conditions but better than IRT in polytomous cases where sample sizes were small. Also, contrary to popular belief, MACS performed well in conditions where DIF was simulated on item thresholds (item means), and its accuracy was not affected by impact.

8.
When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).

9.
When analysts evaluate performance assessments, they often use modern measurement theory models to identify raters who frequently give ratings that are different from what would be expected, given the quality of the performance. To detect problematic scoring patterns, two rater fit statistics, the infit and outfit mean square error (MSE) statistics, are routinely used. However, the interpretation of these statistics is not straightforward. A common practice is that researchers employ established rule-of-thumb critical values to interpret infit and outfit MSE statistics. Unfortunately, prior studies have shown that these rule-of-thumb values may not be appropriate in many empirical situations. Parametric bootstrapped critical values for infit and outfit MSE statistics provide a promising alternative approach to identifying item and person misfit in item response theory (IRT) analyses. However, researchers have not examined the performance of this approach for detecting rater misfit. In this study, we illustrate a bootstrap procedure that researchers can use to identify critical values for infit and outfit MSE statistics, and we used a simulation study to assess the false-positive and true-positive rates of these two statistics. We observed that the false-positive rates were highly inflated, and the true-positive rates were relatively low. Thus, we proposed an iterative parametric bootstrap procedure to overcome these limitations. The results indicated that using the iterative procedure to establish 95% critical values of infit and outfit MSE statistics had better-controlled false-positive rates and higher true-positive rates compared to using the traditional parametric bootstrap procedure and rule-of-thumb critical values.

10.
This article proposes a general mixture item response theory (IRT) framework that allows for classes of persons to differ with respect to the type of processes underlying the item responses. Through the use of mixture models, nonnested IRT models with different structures can be estimated for different classes, and class membership can be estimated for each person in the sample. If researchers are able to provide competing measurement models, this mixture IRT framework may help them deal with some violations of measurement invariance. To illustrate this approach, we consider a two-class mixture model, where a person’s responses to Likert-scale items containing a neutral middle category are either modeled using a generalized partial credit model, or through an IRTree model. In the first model, the middle category (“neither agree nor disagree”) is taken to be qualitatively similar to the other categories, and is taken to provide information about the person’s endorsement. In the second model, the middle category is taken to be qualitatively different and to reflect a nonresponse choice, which is modeled using an additional latent variable that captures a person’s willingness to respond. The mixture model is studied using simulation studies and is applied to an empirical example.

11.
An item response theory (IRT) approach to test linking based on summed scores is presented and demonstrated by calibrating a modified 23-item version of the Center for Epidemiologic Studies Depression Scale (CES-D) to the standard 20-item CES-D. Data are from the Depression Patient Outcomes Research Team, II, which used a modified CES-D to measure risk for depression. Responses (N = 1,120) to items on both the original and modified versions were calibrated simultaneously using F. Samejima's (1969, 1997) graded IRT model. The 2 scales were linked on the basis of derived summed-score-to-IRT-score translation tables. The established cut score of 16 on the standard CES-D corresponded most closely to a summed score of 20 on the modified version. The IRT summed-score approach to test linking is a straightforward, valid, and practical method that can be applied in a variety of situations.
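The linking step this abstract describes can be illustrated with toy translation tables: carry the cut score's IRT-scaled value from one form to the other, and pick the summed score whose scaled value is closest. All table values below are made-up placeholders, not the CES-D calibration results:

```python
def link_cut_score(cut_a, table_a, table_b):
    """Map a cut score on form A to form B through summed-score ->
    IRT-score translation tables (dicts: summed score -> theta)."""
    theta = table_a[cut_a]  # IRT-scaled value of the form-A cut score
    # Form-B summed score whose scaled value is closest to theta.
    return min(table_b, key=lambda s: abs(table_b[s] - theta))

# Hypothetical translation tables for two forms of a scale.
table_a = {15: 0.41, 16: 0.50, 17: 0.58}
table_b = {19: 0.40, 20: 0.52, 21: 0.70}
linked = link_cut_score(16, table_a, table_b)  # -> 20
```

In practice the tables themselves come from the simultaneous graded-model calibration of both forms; only the table-lookup logic is shown here.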

12.
Various item response theory (IRT) models can be used in educational and psychological measurement to analyze test data. One of the major drawbacks of these models is that efficient parameter estimation can only be achieved with very large data sets. Therefore, it is often worthwhile to search for designs of the test data that in some way optimize the parameter estimates. Results from the statistical theory of optimal design can be applied for efficient estimation of the parameters. A major problem in finding an optimal design for IRT models is that the designs are only optimal for a given set of parameters, that is, they are locally optimal. Locally optimal designs can be constructed with a sequential design procedure. In this paper minimax designs are proposed for IRT models to overcome the problem of local optimality. Minimax designs are compared to sequentially constructed designs for the two-parameter logistic model, and the results show that minimax designs can be nearly as efficient as sequentially constructed designs.
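The local-optimality problem described above follows directly from the two-parameter logistic (2PL) item information function, which peaks where item difficulty matches the unknown ability. A minimal sketch, assuming the standard 2PL parameterization (function and variable names are illustrative):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Information is maximal when item difficulty b equals theta, so a
# design tuned to one theta value is only locally optimal.
peak = info_2pl(0.0, a=1.0, b=0.0)   # a^2 / 4 = 0.25
off = info_2pl(0.0, a=1.0, b=1.5)    # smaller, mismatched difficulty
```

A minimax design instead chooses items to maximize the worst-case information over a plausible range of parameter values, avoiding the dependence on a single assumed theta.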

13.
A nonlinear mixed model framework for item response theory
Mixed models take the dependency between observations based on the same cluster into account by introducing 1 or more random effects. Common item response theory (IRT) models introduce latent person variables to model the dependence between responses of the same participant. Assuming a distribution for the latent variables, these IRT models are formally equivalent with nonlinear mixed models. It is shown how a variety of IRT models can be formulated as particular instances of nonlinear mixed models. The unifying framework offers the advantage that relations between different IRT models become explicit and that it is rather straightforward to see how existing IRT models can be adapted and extended. The approach is illustrated with a self-report study on anger.

14.
An instrument's sensitivity to detect individual-level change is an important consideration for both psychometric and clinical researchers. In this article, we develop a cognitive problems measure and evaluate its sensitivity to detect change from an item response theory (IRT) perspective. After illustrating assumption checking and model fit assessment, we detail 4 features of IRT modeling: (a) the scale information curve and its relation to the bandwidth of measurement precision, (b) the scale response curve and how it is used to link the latent trait metric with the raw score metric, (c) content-based versus norm-based score referencing, and (d) the level of measurement of the latent trait scale. We conclude that IRT offers an informative, alternative framework for understanding an instrument's psychometric properties and recommend that IRT analyses be considered prior to investigations of change, growth, or the effectiveness of clinical interventions.

15.
This paper provides an introduction to two commonly used item response theory (IRT) models (the two-parameter logistic model and the graded response model). Throughout the paper, the Need for Cognition Scale (NCS) is used to help illustrate different features of the IRT model. After introducing the IRT models, I explore the assumptions these models make as well as ways to assess the extent to which those assumptions are plausible. Next, I describe how adopting an IRT approach to measurement can change how one thinks about scoring, score precision, and scale construction. I briefly introduce the advanced topics of differential item functioning and computerized adaptive testing before concluding with a summary of what was learned about IRT generally, and the NCS specifically.
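The two-parameter logistic model that this tutorial introduces has a one-line response function: the probability of endorsing an item rises with the trait level, governed by a discrimination and a difficulty parameter. A minimal sketch (names are illustrative, not from the paper):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of a positive
    response at trait level theta, for an item with discrimination a
    and difficulty (location) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

at_difficulty = p_2pl(0.0, a=1.2, b=0.0)   # exactly 0.5 at theta = b
above = p_2pl(2.0, a=1.2, b=0.0)           # > 0.5 above the difficulty
```

The graded response model extends this idea to ordered categories by modeling a set of such curves, one per category threshold.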

16.
It is shown that measurement error in predictor variables can be modeled using item response theory (IRT). The predictor variables, which may be defined at any level of a hierarchical regression model, are treated as latent variables. The normal ogive model is used to describe the relation between the latent variables and dichotomous observed variables, which may be responses to tests or questionnaires. It is shown that the multilevel model with measurement error in the observed predictor variables can be estimated in a Bayesian framework using Gibbs sampling. In this article, handling measurement error via the normal ogive model is compared with alternative approaches using the classical true score model. Examples using real data are given. This paper is part of the dissertation by Fox (2001) that won the 2002 Psychometric Society Dissertation Award.

17.
It is often considered desirable to have the same ordering of the items by difficulty across different levels of the trait or ability. Such an ordering is an invariant item ordering (IIO). An IIO facilitates the interpretation of test results. For dichotomously scored items, earlier research surveyed the theory and methods of an invariant ordering in a nonparametric IRT context. Here the focus is on polytomously scored items, and both nonparametric and parametric IRT models are considered. The absence of the IIO property in two nonparametric polytomous IRT models is discussed, and two other nonparametric models that do imply an IIO are presented. A method is proposed that can be used to investigate whether empirical data imply an IIO. Furthermore, only two parametric polytomous IRT models are found to imply an IIO. These are the rating scale model (Andrich, 1978) and a restricted rating scale version of the graded response model (Muraki, 1990). Well-known models, such as the partial credit model (Masters, 1982) and the graded response model (Samejima, 1969), do not imply an IIO.

18.
Missing data, such as item responses in multilevel data, are ubiquitous in educational research settings. Researchers in the item response theory (IRT) context have shown that ignoring such missing data can create problems in the estimation of the IRT model parameters. Consequently, several imputation methods for dealing with missing item data have been proposed and shown to be effective when applied with traditional IRT models. Additionally, a nonimputation direct likelihood analysis has been shown to be an effective tool for handling missing observations in clustered data settings. This study investigates the performance of six simple imputation methods, which have been found to be useful in other IRT contexts, versus a direct likelihood analysis, in multilevel data from educational settings. Multilevel item response data were simulated on the basis of two empirical data sets, and some of the item scores were deleted, such that they were missing either completely at random or at random. An explanatory IRT model was used for modeling the complete, incomplete, and imputed data sets. We showed that direct likelihood analysis of the incomplete data sets produced unbiased parameter estimates that were comparable to those from a complete data analysis. Multiple-imputation approaches of the two-way mean and corrected item mean substitution methods displayed varying degrees of effectiveness in imputing data that in turn could produce unbiased parameter estimates. The simple random imputation, adjusted random imputation, item mean substitution, and regression imputation methods seemed to be less effective in imputing missing item scores in multilevel data settings.
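Of the simple methods compared above, two-way mean substitution is the easiest to state: impute person mean + item mean − grand mean. A minimal sketch, assuming missing scores are coded as NaN (the helper name is an assumption, not from the study):

```python
import numpy as np

def two_way_imputation(scores):
    """Two-way mean substitution for missing item scores (NaN):
    imputed value = person mean + item mean - grand mean,
    computed from the observed entries only."""
    x = np.asarray(scores, float)
    person_mean = np.nanmean(x, axis=1, keepdims=True)
    item_mean = np.nanmean(x, axis=0, keepdims=True)
    grand_mean = np.nanmean(x)
    filled = person_mean + item_mean - grand_mean
    return np.where(np.isnan(x), filled, x)

# Person 0 is missing item 1; observed entries are kept as-is.
out = two_way_imputation([[1.0, np.nan],
                          [0.0, 1.0]])
```

The multiple-imputation variants in the study add a stochastic component to such deterministic fills so that imputation uncertainty propagates into the IRT parameter estimates.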

19.
Woods CM. 《心理学方法》 (Psychological Methods), 2006, 11(3): 253-270
Popular methods for fitting unidimensional item response theory (IRT) models to data assume that the latent variable is normally distributed in the population of respondents, but this can be unreasonable for some variables. Ramsay-curve IRT (RC-IRT) was developed to detect and correct for this nonnormality. The primary aims of this article are to introduce RC-IRT less technically than it has been described elsewhere; to evaluate RC-IRT for ordinal data via simulation, including new approaches for model selection; and to illustrate RC-IRT with empirical examples. The empirical examples demonstrate the utility of RC-IRT for real data, and the simulation study indicates that when the latent distribution is skewed, RC-IRT results can be more accurate than those based on the normal model. Along with a plot of candidate curves, the Hannan-Quinn criterion is recommended for model selection.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.). 京ICP备09084417号