1.
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
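A minimal numerical sketch of such a residual, under assumed values: examinees are grouped on the latent trait, the observed proportion correct in each group serves as the ratio estimate of the item characteristic curve, and its standardized difference from the model-implied 2PL curve is approximately N(0, 1) when the model holds. The grouping scheme, the 2PL parameterization, and the binomial standard error below are illustrative choices, not necessarily the authors' exact formulation.

import numpy as np

rng = np.random.default_rng(0)

def icc_2pl(theta, a, b):
    # Model-implied probability of a correct response under a 2PL item.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Simulate one item for 5000 examinees (true parameters are assumed values).
a_true, b_true = 1.2, 0.3
theta = rng.normal(size=5000)
y = rng.binomial(1, icc_2pl(theta, a_true, b_true))

# Ratio estimate of the ICC: observed proportion correct within ability groups.
edges = np.quantile(theta, np.linspace(0, 1, 11))
groups = np.clip(np.digitize(theta, edges[1:-1]), 0, 9)

for g in range(10):
    mask = groups == g
    n_g = mask.sum()
    p_obs = y[mask].mean()                                  # ratio estimate
    p_model = icc_2pl(theta[mask], a_true, b_true).mean()   # model-implied group proportion
    se = np.sqrt(p_model * (1 - p_model) / n_g)             # binomial standard error
    z = (p_obs - p_model) / se                              # roughly N(0, 1) if the model fits
    print(f"group {g}: n = {n_g}, residual z = {z:+.2f}")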
2.
Jeffrey N. Rouder, Jun Lu, Dongchu Sun, Paul Speckman, Richard Morey, & Moshe Naveh-Benjamin. Psychometrika, 2007, 72(4), 621-642
The theory of signal detection is convenient for measuring mnemonic ability in recognition memory paradigms. In these paradigms, randomly selected participants are asked to study randomly selected items. In practice, researchers aggregate data across items or participants or both. The signal detection model is nonlinear; consequently, analysis with aggregated data is not consistent. In fact, mnemonic ability is underestimated, even in the large-sample limit. We present two hierarchical Bayesian models that simultaneously account for participant and item variability. We show how these models provide for accurate estimation of participants’ mnemonic ability as well as the memorability of items. The model is benchmarked with a simulation study and applied to a novel data set.
This research is supported by NSF grants SES-0095919 and SES-0351523, NIH grant R01-MH071418, a University of Missouri Research Leave grant and fellowships from the Spanish Ministry of Education and the University of Leuven, Belgium.
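The aggregation bias described in this abstract can be reproduced with a small simulation: when sensitivity varies over participants and items, the d' computed from pooled hit and false-alarm rates falls below the average of the true per-cell d' values. The equal-variance Gaussian model, the unbiased criterion, and the variance components below are assumptions chosen only for the illustration.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_sub, n_item = 200, 100

# Equal-variance signal detection: per-cell sensitivity varies over
# participants and items (variance components are illustrative assumptions).
d_mean = 1.5
d_sub = rng.normal(0, 0.8, n_sub)[:, None]
d_item = rng.normal(0, 0.8, n_item)[None, :]
d_true = d_mean + d_sub + d_item
criterion = 0.5 * d_true   # unbiased criterion placement, also an assumption

hit_p = norm.cdf(d_true - criterion)   # studied items
fa_p = norm.cdf(-criterion)            # new items
hits = rng.binomial(1, hit_p)
fas = rng.binomial(1, fa_p)

# Aggregate over participants and items, then compute a single d'.
d_aggregated = norm.ppf(hits.mean()) - norm.ppf(fas.mean())
print(f"mean of true d' values : {d_true.mean():.2f}")
print(f"d' from aggregated data: {d_aggregated:.2f}  (systematically smaller)")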
3.
Differential item functioning (DIF), referring to between-group variation in item characteristics above and beyond the group-level disparity in the latent variable of interest, has long been regarded as an important item-level diagnostic. The presence of DIF impairs the fit of the single-group item response model being used, and calls for either model modification or item deletion in practice, depending on the mode of analysis. Methods for testing DIF with continuous covariates, rather than categorical grouping variables, have been developed; however, they are restrictive in parametric form, and thus are not sufficiently flexible to describe complex interactions among latent variables and covariates. In the current study, we formulate the probability of endorsing each test item as a general bivariate function of a unidimensional latent trait and a single covariate, which is then approximated by a two-dimensional smoothing spline. The accuracy and precision of the proposed procedure are evaluated via Monte Carlo simulations. If anchor items are available, we propose an extended model that simultaneously estimates item characteristic functions (ICFs) for anchor items, ICFs conditional on the covariate for non-anchor items, and the latent variable density conditional on the covariate—all using regression splines. A permutation DIF test is developed, and its performance is compared to the conventional parametric approach in a simulation study. We also illustrate the proposed semiparametric DIF testing procedure with an empirical example.
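The permutation logic behind such a DIF test can be sketched with a deliberately simplified fit statistic: compute the statistic on the observed data, recompute it under many random permutations of the covariate, and compare. The statistic below (the correlation between the covariate and item residuals after coarse conditioning on the rest score) is a stand-in for illustration only, not the spline-based statistic proposed in the paper.

import numpy as np

rng = np.random.default_rng(2)

def conditional_residual(item, rest_score):
    # Item residuals after coarse conditioning on the rest score
    # (a simple stand-in for conditioning on the latent trait).
    counts = np.bincount(rest_score)
    sums = np.bincount(rest_score, weights=item)
    means = np.zeros_like(sums)
    means[counts > 0] = sums[counts > 0] / counts[counts > 0]
    return item - means[rest_score]

def dif_statistic(resid, covariate):
    # Toy DIF statistic: absolute correlation between covariate and residuals.
    return abs(np.corrcoef(resid, covariate)[0, 1])

# Fake data: 1000 examinees, one studied item whose difficulty drifts with the covariate.
n = 1000
theta = rng.normal(size=n)
x = rng.uniform(-1, 1, n)                                      # continuous covariate
rest_score = rng.binomial(20, 1 / (1 + np.exp(-theta)))        # score on the remaining items
item = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.8 * x))))   # item with DIF along x

resid = conditional_residual(item, rest_score)
observed = dif_statistic(resid, x)
null = np.array([dif_statistic(resid, rng.permutation(x)) for _ in range(2000)])
print(f"observed statistic = {observed:.3f}, permutation p-value = {(null >= observed).mean():.3f}")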
4.
In multidimensional item response models, paradoxical scoring effects can arise, wherein correct answers are penalized and incorrect answers are rewarded. For the most prominent class of IRT models, the class of linearly compensatory models, a general derivation of paradoxical scoring effects based on the geometry of item discrimination vectors is given, which furthermore corrects an error in an established theorem on paradoxical results. This approach highlights the very counterintuitive way in which item discrimination parameters (and also factor loadings) have to be interpreted in terms of their influence on the latent ability estimate. It is proven that, despite the error in the original proof, the key result concerning the existence of paradoxical effects remains true—although the actual relation to the item parameters is shown to be a more complicated function than previous results suggested. The new proof enables further insights into the actual mathematical causation of the paradox and generalizes the findings within the class of linearly compensatory models.
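One concrete way to see the paradox is to compute ability estimates for two response patterns that differ only on a single item and check whether getting that item right lowers one of the estimated abilities. The sketch below uses MAP estimation in a two-dimensional linearly compensatory 2PL with an independent standard normal prior; the item parameters are arbitrary assumptions, and the paradox is checked numerically rather than asserted.

import numpy as np
from scipy.optimize import minimize

# Linearly compensatory 2PL: P(y=1) = 1 / (1 + exp(-(a1*th1 + a2*th2 - b))).
A = np.array([[2.0, 0.5],
              [2.0, 0.5],
              [0.0, 2.0]])   # discrimination vectors (assumed values)
b = np.zeros(3)

def neg_log_posterior(theta, y):
    eta = A @ theta - b
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    logprior = -0.5 * np.sum(theta ** 2)   # independent standard normal prior (assumption)
    return -(loglik + logprior)

def map_estimate(y):
    return minimize(neg_log_posterior, x0=np.zeros(2), args=(np.asarray(y),)).x

# The two patterns agree on items 1-2; only item 3 (loading purely on dimension 2) changes.
theta_wrong = map_estimate([1, 1, 0])
theta_right = map_estimate([1, 1, 1])
print("MAP estimate, item 3 wrong:", np.round(theta_wrong, 3))
print("MAP estimate, item 3 right:", np.round(theta_right, 3))
print("paradox on dimension 1 (estimate drops after a correct answer):",
      theta_right[0] < theta_wrong[0])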
5.
This paper presents an explanatory multidimensional multilevel random item response model and its application to reading data with multilevel item structure. The model includes multilevel random item parameters that allow consideration of variability in item parameters at both item and item group levels. Item-level random item parameters were included to model unexplained variance remaining when item related covariates were used to explain variation in item difficulties. Item group-level random item parameters were included to model dependency in item responses among items having the same item stem. Using the model, this study examined the dimensionality of a person’s word knowledge, termed lexical representation, and how aspects of morphological knowledge contributed to lexical representations for different persons, items, and item groups.
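The two levels of random item parameters described here can be made concrete with a small simulation in which item difficulties are built from a covariate effect, an item-group deviation shared by items with a common stem, and an item-level deviation for what the covariate leaves unexplained. The sketch is simplified to a unidimensional Rasch-type model, and all numerical values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

n_groups, items_per_group = 10, 4
n_items = n_groups * items_per_group
group_of_item = np.repeat(np.arange(n_groups), items_per_group)

# Item covariate (e.g., a morphological feature) explaining part of the difficulty.
x_item = rng.binomial(1, 0.5, n_items)
beta0, beta1 = 0.0, 0.6

# Item-group-level random effect: shared by items with the same stem.
u_group = rng.normal(0, 0.5, n_groups)
# Item-level random effect: difficulty variance left over after the covariate.
u_item = rng.normal(0, 0.3, n_items)

difficulty = beta0 + beta1 * x_item + u_group[group_of_item] + u_item

# Rasch-type responses for 500 persons with a single latent trait.
theta = rng.normal(0, 1, 500)[:, None]
p_correct = 1.0 / (1.0 + np.exp(-(theta - difficulty[None, :])))
responses = rng.binomial(1, p_correct)
print("response matrix shape (persons x items):", responses.shape)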
6.
7.
8.
International Journal of Testing, 2013, 13(4), 365-384
Item response theory (IRT) has become one of the most popular scoring frameworks for measurement data. IRT models are used frequently in computerized adaptive testing, cognitively diagnostic assessment, and test equating. This article reviews two of the most popular software packages for IRT model estimation, BILOG-MG (Zimowski, Muraki, Mislevy, & Bock, 1996) and MULTILOG (Thissen, 1991), which are for the first time available on a single CD-ROM with new features. Most prominently, the number of items to be calibrated and examinees to be scored is now limited only by the memory capacity of the hardware, MULTILOG has an interactive Windows-oriented process for creating basic command file syntax, and both BILOG-MG and MULTILOG come with a new graphics interface that displays numerous curves relevant to IRT analyses in a professional format. This article reviews the models that are and are not estimable with these programs and describes the fundamental ideas of the underlying estimation algorithms without providing detailed derivations. Moreover, the user-friendliness of both programs is assessed with a user in mind who is interested in easy-to-use IRT estimation programs within a Windows point-and-click environment. Both programs fulfill such an expectation to a large degree; yet, this review also points out some obstacles that someone relatively unfamiliar with IRT or syntax programming might have to overcome to obtain meaningful results.
9.
With reference to a questionnaire aimed at assessing the performance of Italian nursing homes on the basis of the health conditions of their patients, we investigate two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire. The approach is based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities. This model represents the health status of a patient by latent variables having a discrete distribution and, therefore, it may be seen as a constrained version of the latent class model. On the basis of the adopted model, we implement a hierarchical clustering algorithm aimed at assessing the actual number of dimensions measured by the questionnaire. These dimensions correspond to disjoint groups of items. Once the number of dimensions is selected, we also study the discriminating power of every item, so that it is possible to select the subset of these items which is able to provide an amount of information close to that of the full set. We illustrate the proposed approach on the basis of the data collected on 1,051 elderly people hosted in a sample of Italian nursing homes.
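The constrained-latent-class view taken here can be sketched by computing the marginal probability of a response pattern when the latent traits take values on a small discrete support: a two-parameter logistic conditional response model combined with class weights over the support points. The support points, weights, and item parameters below are assumptions made only for illustration.

import numpy as np

# Discrete latent distribution: support points (rows) on two health dimensions
# with class weights (all values are illustrative assumptions).
support = np.array([[-1.0, -1.0],
                    [ 0.0,  0.0],
                    [ 1.0,  1.0],
                    [ 1.0, -1.0]])
weights = np.array([0.3, 0.4, 0.2, 0.1])

# 2PL parameterization: each item loads on exactly one dimension.
a = np.array([1.2, 0.8, 1.5, 1.0])    # discriminations
b = np.array([0.0, -0.5, 0.3, 0.8])   # difficulties
dim = np.array([0, 0, 1, 1])          # dimension measured by each item

def pattern_probability(y):
    # P(y) = sum_c w_c * prod_j P(y_j | theta_c) under the discrete latent model.
    theta_j = support[:, dim]                        # classes x items
    p = 1.0 / (1.0 + np.exp(-a * (theta_j - b)))     # conditional 2PL probabilities
    cond = np.prod(np.where(y == 1, p, 1 - p), axis=1)
    return float(weights @ cond)

print(pattern_probability(np.array([1, 0, 1, 1])))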
10.
Sonya K. Sterba, Ruth E. Mathiowetz, & Daniel J. Bauer. Multivariate Behavioral Research, 2013, 48(4), 658-659
Conventional growth models assume that the random effects describing individual trajectories are conditionally normal. In practice, this assumption may often be unrealistic. As an alternative, Nagin (2005) suggested a semiparametric group-based approach (SPGA) which approximates an unknown, continuous distribution of individual trajectories with a mixture of group trajectories. Prior simulations (Brame, Nagin, & Wasserman, 2006; Nagin, 2005) indicated that SPGA could generate nearly unbiased estimates of means and variances of a nonnormal distribution of individual trajectories, as functions of group-trajectory estimates. However, these studies used few random effects—usually only a random intercept. Based on the analytical relationship between SPGA and adaptive quadrature, we hypothesized that SPGA's ability to approximate (a) random effect variances/covariances and (b) effects of time-invariant predictors of growth should deteriorate as the dimensionality of the random effects distribution increases. We expected this problem to be mitigated by correlations among the random effects (highly correlated random effects functioning as fewer dimensions) and sample size (larger N supporting more groups). We tested these hypotheses via simulation, varying the number of random effects (1, 2, or 3), correlation among the random effects (0 or .6), and N (250, 500). Results indicated that, as the number of random effects increased, SPGA approximations remained acceptable for fixed effects, but became increasingly negatively biased for random effect variances. Whereas correlated random effects and larger N reduced this underestimation, correlated random effects sometimes distorted recovery of predictor effects. To illustrate this underestimation, Figure 1 depicts SPGA's approximation of the intercept variance from a three-correlated-random-effect generating model (N = 500). These results suggest SPGA approximations are inadequate for the nonnormal, high-dimensional distributions of individual trajectories often seen in practice.
Figure 1 (caption): SPGA-approximated intercept variance from a three-correlated-random-effect generating model. The dashed horizontal lines denote ±10% bias; the solid horizontal line denotes the population-generating parameter value; * denotes the best-BIC-selected number of groups; vertical bars denote 90% confidence intervals.
11.
Collin Rice. Noûs, 2015, 49(3), 589-615
A prominent approach to scientific explanation and modeling claims that for a model to provide an explanation it must accurately represent at least some of the actual causes in the event's causal history. In this paper, I argue that many optimality explanations present a serious challenge to this causal approach. I contend that many optimality models provide highly idealized equilibrium explanations that do not accurately represent the causes of their target system(s). Furthermore, in many contexts, it is in virtue of their independence of causes that optimality models are able to provide a better explanation than competing causal models. Consequently, our account of explanation and modeling must expand beyond the causal approach.
12.
13.
Composite links and exploded likelihoods are powerful yet simple tools for specifying a wide range of latent variable models. Applications considered include survival or duration models, models for rankings, small area estimation with census information, models for ordinal responses, item response models with guessing, randomized response models, unfolding models, latent class models with random effects, multilevel latent class models, models with log-normal latent variables, and zero-inflated Poisson models with random effects. Some of the ideas are illustrated by estimating an unfolding model for attitudes to female work participation.
We wish to thank The Research Council of Norway for a grant supporting our collaboration.
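The composite link device can be illustrated in a few lines with the ordinal-response case mentioned above: the vector of category probabilities is a known linear combination (a differencing matrix) of inverse-link-transformed quantities. The thresholds and linear predictor below are assumed values, and the example is only one instance of the general construction.

import numpy as np

def inv_logit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Ordinal item with 4 categories via a composite link: category probabilities are
# differences of cumulative inverse-link values, mu = C g^{-1}(eta).
tau = np.array([-1.0, 0.0, 1.2])   # ordered thresholds (assumed values)
eta = 0.4                          # person's linear predictor (assumed value)

gamma = np.concatenate(([1.0], inv_logit(eta - tau), [0.0]))   # P(Y > k) for k = 0..4
C = -np.diff(np.eye(5), axis=0)    # 4x5 differencing matrix: row k yields gamma[k] - gamma[k+1]
p = C @ gamma
print(p, p.sum())                  # category probabilities, summing to 1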
14.
This Monte Carlo study examined the impact of misspecifying the 𝚺 matrix in longitudinal data analysis under both the multilevel model and mixed model frameworks. Under the multilevel model approach, under-specification and general misspecification of the 𝚺 matrix usually resulted in overestimation of the variances of the random effects (e.g., τ00, τ11) and standard errors of the corresponding growth parameter estimates (e.g., SEβ0, SEβ1). Overestimates of the standard errors led to lower statistical power in tests of the growth parameters. An unstructured 𝚺 matrix under the mixed model framework generally led to underestimates of standard errors of the growth parameter estimates. Underestimates of the standard errors led to inflation of the Type I error rate in tests of the growth parameters. Implications of the compensatory relationship between the random effects of the growth parameters and the longitudinal error structure for model specification were discussed.
15.
In item response theory modeling of responses and response times, it is commonly assumed that the item responses have the same characteristics across the response times. However, heterogeneity might arise in the data if subjects resort to different response processes when solving the test items. These differences may be within-subject effects, that is, a subject might use a certain process on some of the items and a different process with different item characteristics on the other items. If the probability of using one process over the other process depends on the subject’s response time, within-subject heterogeneity of the item characteristics across the response times arises. In this paper, the method of response mixture modeling is presented to account for such heterogeneity. Contrary to traditional mixture modeling where the full response vectors are classified, response mixture modeling involves classification of the individual elements in the response vector. In a simulation study, the response mixture model is shown to be viable in terms of parameter recovery. In addition, the response mixture model is applied to a real dataset to illustrate its use in investigating within-subject heterogeneity in the item characteristics across response times.
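The element-level classification described here can be sketched as follows: for a single item response, the posterior probability of the faster process is computed from a mixing weight that depends on the response time and from two process-specific item response functions. The logistic mixing function and the two sets of item parameters below are assumptions for illustration, not the paper's exact specification.

import numpy as np

def irf(theta, a, b):
    # 2PL item response function.
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def element_posterior(y, log_rt, theta, item):
    # Mixing weight: probability of the 'fast' process, decreasing in response time
    # (the logistic form and its coefficients are illustrative assumptions).
    pi_fast = 1.0 / (1.0 + np.exp(-(2.0 - 1.5 * log_rt)))
    p_fast = irf(theta, a_fast[item], b_fast[item])
    p_slow = irf(theta, a_slow[item], b_slow[item])
    lik_fast = pi_fast * (p_fast if y == 1 else 1 - p_fast)
    lik_slow = (1 - pi_fast) * (p_slow if y == 1 else 1 - p_slow)
    return lik_fast / (lik_fast + lik_slow)   # posterior P(fast process | y, rt)

# Assumed item characteristics under the two response processes.
a_fast, b_fast = np.array([0.6, 0.7, 0.5]), np.array([0.5, 0.0, -0.3])
a_slow, b_slow = np.array([1.5, 1.8, 1.2]), np.array([0.0, -0.2, 0.4])

# One correct response to item 0, given quickly (log RT = 0.5), by a person with theta = 0.2.
print(element_posterior(y=1, log_rt=0.5, theta=0.2, item=0))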
16.
Randomized response (RR) is a well-known method for measuring sensitive behavior. Yet this method is not often applied, for three reasons: (i) its lower efficiency and the resulting need for larger sample sizes make applications of RR costly; (ii) despite its privacy-protection mechanism, the RR design may not be followed by every respondent; and (iii) there is an incorrect belief that RR yields estimates only of aggregate-level behavior and that these estimates cannot be linked to individual-level covariates. This paper addresses the efficiency problem by applying item randomized-response (IRR) models for the analysis of multivariate RR data. In these models, a person parameter is estimated based on multiple measures of a sensitive behavior under study, which allows for more powerful analyses of individual differences than are available from univariate RR data. Response behavior that does not follow the RR design is approached by introducing mixture components in the IRR models, with one component consisting of respondents who answer truthfully and another component consisting of respondents who do not provide truthful responses. An analysis of data from two large-scale Dutch surveys conducted among recipients of invalidity insurance benefits shows that the willingness of a respondent to answer truthfully is related to the educational level of the respondents and the perceived clarity of the instructions. A person is more willing to comply when the expected benefits of noncompliance are minor and social control is strong.
The authors are grateful to the reviewers whose suggestions helped to improve the clarity of the paper substantially. The authors also wish to thank the Dutch Ministry of Social Affairs and Employment for making the reported data available. This research was supported in part by grants from the Social Sciences and Humanities Research Council of Canada and the Canada Foundation for Innovation.
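The efficiency issue raised in point (i) can be seen directly in a forced-response randomized response design: the observed probability of a 'yes' is a known mixture of the sensitive trait and the randomizer, so prevalence is recovered by inverting that mixture, at the cost of extra variance. The design probabilities below are common textbook choices, not necessarily those used in the Dutch surveys.

import numpy as np

rng = np.random.default_rng(4)

# Forced-response design: with probability p_truth the respondent answers
# truthfully; otherwise the randomizer forces 'yes' with probability p_forced_yes.
p_truth, p_forced_yes = 0.75, 0.5
true_prevalence = 0.20
n = 2000

sensitive = rng.binomial(1, true_prevalence, n)
forced = rng.binomial(1, p_forced_yes, n)
answer_truthfully = rng.binomial(1, p_truth, n)
observed_yes = np.where(answer_truthfully == 1, sensitive, forced)

# Moment estimator: P(yes) = p_truth * pi + (1 - p_truth) * p_forced_yes.
pi_hat = (observed_yes.mean() - (1 - p_truth) * p_forced_yes) / p_truth
# Variance inflation relative to a direct question explains the efficiency loss.
var_rr = observed_yes.mean() * (1 - observed_yes.mean()) / (n * p_truth ** 2)
var_direct = true_prevalence * (1 - true_prevalence) / n
print(f"estimated prevalence: {pi_hat:.3f} (true value {true_prevalence})")
print(f"variance ratio, RR vs direct question: {var_rr / var_direct:.1f}")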
17.
Lesa Hoffman. Multivariate Behavioral Research, 2013, 48(4), 609-629
Heterogeneity of variance may be more than a statistical nuisance—it may be of direct interest as a result of individual differences. In studies of short-term fluctuation, individual differences may relate to the magnitude of within-person variation as well as to level of an outcome or its covariation with other processes. Although models for heterogeneous variances have been utilized in group contexts (i.e., dispersion models), they are not usually applied in examinations of intraindividual variation. This work illustrates how an extension of the multilevel model for heterogeneous variances can be used to examine individual differences in level, between- and within-person covariation, and magnitude of within-person variation of daily positive and negative mood in persons with dementia.
18.
Scott C. Roesch, Arianna A. Aldridge, Stephanie N. Stocking, Feion Villodas, Queenie Leung, & Carrie E. Bartley. Multivariate Behavioral Research, 2013, 48(5), 767-789
This study used multilevel modeling of daily diary data to model within-person (state) and between-person (trait) components of coping variables. This application included the introduction of multilevel factor analysis (MFA) and a comparison of the predictive ability of these trait/state factors. Daily diary data were collected on a large (n = 366) multiethnic sample over the course of 5 days. Intraclass correlation coefficients for the derived factors suggested approximately equal amounts of variability in coping usage at the state and trait levels. MFAs showed that Problem-Focused Coping and Social Support emerged as stable factors at both the within-person and between-person levels. Other factors (Minimization, Emotional Rumination, Avoidance, Distraction) were specific to the within-person or between-person levels but not both. Multilevel structural equation modeling (MSEM) showed that the prediction of daily positive and negative affect differed as a function of outcome and level of coping factor. The Discussion section focuses primarily on a conceptual and methodological understanding of modeling state and trait coping using daily diary data with MFA and MSEM to examine covariation among coping variables and predict outcomes of interest.
19.
Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions
Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.
20.
The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, and the corresponding intervals are therefore no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
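The test-inversion idea can be sketched for a single examinee under the Rasch model: a Wald interval uses the normal approximation around the maximum likelihood estimate, while an inverted likelihood-ratio test keeps every θ whose log-likelihood lies within half a chi-square quantile of the maximum. The item difficulties and response pattern below are arbitrary assumptions, and neither interval is the specific optimal construction proposed in the paper.

import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import chi2, norm

# Rasch model with fixed, known item difficulties and one observed response pattern
# (both are illustrative assumptions).
b = np.array([-1.5, -0.5, 0.0, 0.5, 1.0, 1.5])
y = np.array([1, 1, 1, 0, 1, 0])

def loglik(theta):
    p = 1.0 / (1.0 + np.exp(-(theta - b)))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# ML estimate and Wald interval from the observed test information.
opt = minimize_scalar(lambda t: -loglik(t), bounds=(-6, 6), method="bounded")
theta_hat = opt.x
p_hat = 1.0 / (1.0 + np.exp(-(theta_hat - b)))
info = np.sum(p_hat * (1 - p_hat))
wald = (theta_hat - norm.ppf(0.975) / np.sqrt(info),
        theta_hat + norm.ppf(0.975) / np.sqrt(info))

# Likelihood-ratio inversion: keep theta with 2*(loglik(theta_hat) - loglik(theta)) <= chi2_.95(1).
cut = loglik(theta_hat) - 0.5 * chi2.ppf(0.95, df=1)
lower = brentq(lambda t: loglik(t) - cut, -6, theta_hat)
upper = brentq(lambda t: loglik(t) - cut, theta_hat, 6)

print(f"theta_hat = {theta_hat:.2f}")
print(f"Wald 95% CI: ({wald[0]:.2f}, {wald[1]:.2f})")
print(f"LR-inversion 95% CI: ({lower:.2f}, {upper:.2f})")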