Similar Articles
20 similar articles found (search time: 281 ms)
1.
The new R package flirt is introduced for flexible item response theory (IRT) modeling of psychological, educational, and behavioral assessment data. flirt integrates a generalized linear and nonlinear mixed modeling framework with graphical model theory. The graphical model framework allows for efficient maximum likelihood estimation. The key feature of flirt is its modular approach, which facilitates convenient and flexible model specification. Researchers can construct customized IRT models by simply selecting modeling modules, such as parametric forms, number of dimensions, item and person covariates, person groups, and link functions. In this paper, we describe the major features of flirt and provide examples to illustrate how flirt works in practice.
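flirt itself is an R package, so its actual calls are not reproduced here. As a language-neutral sketch of the modular idea it describes (swapping pieces of the linear predictor moves you between IRT models), consider the following; the function names are purely illustrative, not flirt's API:

```python
import math

def rasch_prob(theta, b):
    """Rasch model: P(correct) = logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def two_pl_prob(theta, a, b):
    """2PL model: adding a discrimination 'module' a to the linear predictor."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Swapping a module changes the model, not the surrounding machinery:
p_rasch = rasch_prob(0.5, 0.0)        # Rasch (a fixed at 1)
p_2pl = two_pl_prob(0.5, 1.7, 0.0)    # 2PL with a = 1.7
```

With a = 1 the 2PL collapses back to the Rasch model, which is exactly the kind of nesting a modular specification exploits.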

2.

When measuring psychological traits, one has to consider that respondents often show content-unrelated response behavior in answering questionnaires. To disentangle the target trait and two such response styles, extreme responding and midpoint responding, Böckenholt (2012a) developed an item response model based on a latent processing tree structure. We propose a theoretically motivated extension of this model to also measure acquiescence, the tendency to agree with both regular and reversed items. Substantively, our approach builds on multinomial processing tree (MPT) models, which are used in cognitive psychology to disentangle qualitatively distinct processes. Accordingly, the new model for response styles assumes a mixture distribution of affirmative responses, which are determined either by the underlying target trait or by acquiescence. To estimate the model parameters, we rely on Bayesian hierarchical estimation of MPT models. In simulations, we show that the model provides unbiased estimates of response styles and the target trait, and we compare the new model and Böckenholt's model in a recovery study. An empirical example from personality psychology is used for illustrative purposes.

3.
The application of multidimensional item response theory models to repeated observations has shown great promise in developmental research. It allows researchers to take into account both the characteristics of item responses and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the latent growth curve (LGC) model. The purpose of this study is to demonstrate the potential of Bayesian methods and the utility of a comprehensive modeling framework, one that combines a measurement model (e.g., a multidimensional graded response model, MGRM) with a structural model (e.g., an associative latent growth curve analysis, ALGC). All analyses are implemented in WinBUGS 1.4.3 (Spiegelhalter, Thomas, Best, & Lunn, 2003), which allows researchers to use Markov chain Monte Carlo simulation methods to fit complex statistical models and circumvent intractable analytic or numerical integrations. The utility of this MGRM-ALGC modeling framework was investigated with both simulated and empirical data, and promising results were obtained. As the results indicate, being a flexible multivariate multilevel model, the MGRM-ALGC model not only produces readily interpretable item parameter estimates but also estimates the corresponding covariation in the developmental dimensions. In terms of substantive interpretation, as adolescents perceived themselves as more socially isolated, the probability that they engaged with delinquent peers became markedly larger. Generally, boys had a higher initial exposure extent than girls; however, there was no gender difference in the other latent growth parameters.

4.
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate the item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone and Chave's (1929) criterion of irrelevance, a graphical, exploratory method for evaluating the "relevance" of dichotomous attitude items. We generalize this criterion to graded response items and quantify relevance by fitting a unimodal smoother. The resulting goodness of fit is used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values are proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present two applications of the OCM method. First, we apply it to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions about capital punishment.

5.
The process-component approach has become quite popular for examining many psychological concepts. A typical example is the model with internal restrictions on item difficulty (MIRID) described by Butter (1994) and Butter, De Boeck, and Verhelst (1998). This study proposes a hierarchical generalized random-situation random-weight MIRID. The proposed model is more flexible for formulating endogenous latent variables within a multilevel framework, allowing the analysis of polytomous data with complex models (e.g., including item discriminations, random situations, random weights, and heteroskedasticity). The parameters of the proposed model can be estimated with the computer program WinBUGS, which uses Markov chain Monte Carlo algorithms. To illustrate the application of the proposed model, a real data set about guilt is analyzed and MIRIDs are compared across various conditions.

6.
There has been renewed interest in Barton and Lord's (An upper asymptote for the three-parameter logistic item response model (Tech. Rep. No. 80-20), Educational Testing Service, 1981) four-parameter item response model. This paper presents a Bayesian formulation that extends Béguin and Glas (MCMC estimation and some model fit analysis of multidimensional IRT models, Psychometrika, 66(4):541–561, 2001) to the four-parameter normal ogive (4PNO) model. Monte Carlo evidence is presented concerning the accuracy of parameter recovery. The simulation results support the use of less informative uniform priors for the lower and upper asymptotes, an advantage over prior research. Monte Carlo results provide some support for using the deviance information criterion and the \(\chi ^{2}\) index to choose among models with two, three, and four parameters. The 4PNO is applied to 7,491 adolescents' responses to a bullying scale collected in the 2005–2006 Health Behavior in School-Aged Children study. The results support the value of the 4PNO for estimating lower and upper asymptotes in large-scale surveys.
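Under one common parameterization (an assumption here; the paper's exact parameterization may differ in details such as the sign of the intercept), the 4PNO success probability is c + (d − c)Φ(aθ − b), so the curve is squeezed between a lower asymptote c and an upper asymptote d rather than 0 and 1. A minimal sketch:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_4pno(theta, a, b, c, d):
    """4PNO success probability: discrimination a, intercept b,
    lower asymptote c, upper asymptote d (one common parameterization)."""
    return c + (d - c) * norm_cdf(a * theta - b)

# For very low/high ability the probability approaches c and d, not 0 and 1:
lo = p_4pno(-8.0, 1.2, 0.0, 0.15, 0.90)   # close to c = 0.15
hi = p_4pno( 8.0, 1.2, 0.0, 0.15, 0.90)   # close to d = 0.90
```

Setting c = 0 and d = 1 recovers the two-parameter normal ogive, which is why the uniform priors on c and d discussed above matter only near the tails.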

7.

Purpose

This study examined whether demographic question placement affects demographic and non-demographic question completion rates, non-demographic item means, and blank questionnaire rates using a web-based survey of Veterans Health Administration employees.

Methodology

Data were taken from the 2010 Voice of the Veterans Administration Survey (VoVA), a voluntary, confidential, web-based survey offered to all VA employees. Participants were given one of two versions of the questionnaire: one version placed demographic questions at the beginning, and the other placed them at the end.

Findings

Results indicated that placing demographic questions at the beginning of a questionnaire increased item response rate for demographic items without affecting the item response rate for non-demographic items or the average of item mean scores.

Implications

In addition to validity issues, a goal for surveyors is to maximize response rates and minimize the number of missing responses. It is therefore important to determine which questionnaire characteristics affect these values. The results of this study suggest that demographic question placement is one such factor.

Originality/Value

There are various opinions about the most advantageous location of demographic questions in questionnaires; however, the issue has rarely been examined empirically. This study uses an experimental design and a large sample size to examine the effects of demographic placement on survey response characteristics.

8.
In a recent empirical study, Starns, Hicks, Brown, and Martin (Memory & Cognition, 36, 1–8, 2008) collected source judgments for old items that participants had claimed to be new and found residual source discriminability depending on the old-new response bias. The authors interpreted their finding as evidence in favor of the bivariate signal-detection model, but against the two-high-threshold model of item/source memory. According to the latter, new responses follow only from the state of old-new uncertainty, in which no source discrimination is possible, and the probability of entering this state is independent of the old-new response bias. However, when missed old items were presented for source discrimination, the participants could infer that the items had been previously studied. To test whether this implicit feedback led to second retrieval attempts and thus to source memory for presumably unrecognized items, we replicated Starns et al.'s finding and compared their procedure to a procedure without such feedback. Our results challenge the conclusion that discrete-state processing in source memory should be abandoned; source memory for unrecognized items is probably an artifact of the procedure, by which implicit feedback prompts participants to reconsider their recognition judgment when asked to rate the source of old items in the absence of item memory.

9.
We present a semi-parametric approach to estimating item response functions (IRFs) that is useful when the true IRF does not strictly follow commonly used functional forms. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial and includes the regular generalized partial credit model as the lowest-order polynomial. It extends Liang's (A semi-parametric approach to estimate IRFs, unpublished doctoral dissertation, 2007) method for dichotomous item responses to polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock–Aitkin EM algorithm, thereby facilitating the multiple-group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and to other non-parametric and semi-parametric alternatives.
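The baseline generalized partial credit model can be sketched directly; the semi-parametric extension described above would replace the linear step term a(θ − b_v) below with a monotonic polynomial (not shown here). A minimal sketch of the standard GPCM category probabilities:

```python
import math

def gpcm_probs(theta, a, bs):
    """GPCM category probabilities for categories 0..len(bs).
    a is the discrimination; bs are the step parameters b_1..b_m."""
    # Cumulative linear predictors; category 0 conventionally gets 0
    z = [0.0]
    for b in bs:
        z.append(z[-1] + a * (theta - b))
    m = max(z)  # subtract the max before exponentiating, for numerical stability
    expz = [math.exp(v - m) for v in z]
    total = sum(expz)
    return [v / total for v in expz]
```

As ability increases, probability mass shifts toward the higher categories, which is the monotonicity the polynomial replacement must preserve.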

10.
Following the line of difference models in IRT (Thissen & Steinberg, 1986), this article proposes a new cognitive diagnostic model for graded/polytomous data based on the deterministic inputs, noisy "and" gate (DINA) model (Haertel, 1989; Junker & Sijtsma, 2001), named the DINA model for graded data (DINA-GD). We investigated the performance of a fully Bayesian estimation of the proposed model. In the simulation, classification accuracy and item parameter recovery for the DINA-GD model were investigated. The results indicated that the proposed model achieved an acceptable rate of correct attribute classification for examinees and good item parameter recovery. In addition, a real-data example was used to illustrate the application of the new model to graded data, or polytomously scored items.

11.
Residual analysis (e.g., Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method for assessing the fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large-sample distribution of the residual is proved to be standard normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing item fit for unidimensional IRT models.
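For orientation, the classic Hambleton-style standardized residual that the paper compares against can be sketched as follows; this is the generic Pearson-type residual for one ability group, not the authors' new ratio-based residual:

```python
import math

def standardized_residual(n_correct, n_total, p_model):
    """Pearson-type standardized residual for one ability group:
    (observed proportion - model probability) / binomial standard error."""
    p_obs = n_correct / n_total
    se = math.sqrt(p_model * (1.0 - p_model) / n_total)
    return (p_obs - p_model) / se
```

If the model fits, these residuals should be approximately standard normal across ability groups, so values far outside roughly ±2 flag item misfit.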

12.
We propose a generalization of the speed–accuracy response model (SARM) introduced by Maris and van der Maas (Psychometrika 77:615–633, 2012). These models describe the scores that result from a scoring rule incorporating both the speed and the accuracy of item responses. Our generalization is analogous to that of the one-parameter logistic (or Rasch) model to the two-parameter logistic (or Birnbaum) model in item response theory. An expectation–maximization (EM) algorithm for estimating model parameters and standard errors was developed. Furthermore, methods to assess model fit are provided in the form of generalized residuals for item score functions and saddlepoint approximations to the density of the sum score. The presented methods were evaluated in a small simulation study, the results of which indicated good parameter recovery and reasonable Type I error rates for the residuals. Finally, the methods were applied to two real data sets. The two-parameter SARM showed improved fit compared to the one-parameter SARM in both data sets.

13.
14.
In this article, we propose a simplified version of the maximum information per time unit method (MIT; Fan, Wang, Chang, & Douglas, Journal of Educational and Behavioral Statistics 37:655–670, 2012), called MIT-S, for computerized adaptive testing. Unlike the original MIT method, the proposed MIT-S method does not require fitting a response time model to individual-level response time data, and it is computationally efficient. The performance of the MIT-S method was compared against that of the maximum information (MI) method in terms of measurement precision, testing time saved, and item pool usage under various item response theory (IRT) models. The results indicated that when the underlying IRT model is the two- or three-parameter logistic model, the MIT-S method maintains measurement precision and saves testing time. It performs similarly to the MI method in exposure control; both result in highly skewed item exposure distributions, due to heavy reliance on highly discriminating items. If the underlying model is the one-parameter logistic (1PL) model, the MIT-S method maintains measurement precision and saves a considerable amount of testing time; however, its heavy reliance on time-saving items leads to a highly skewed item exposure distribution. This weakness can be ameliorated by using randomesque exposure control, which successfully balances item pool usage. Overall, the MIT-S method with randomesque exposure control is recommended for achieving better testing efficiency while maintaining measurement precision and balanced item pool usage when the underlying IRT model is the 1PL.
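The core idea of information-per-time selection can be sketched generically: at the current ability estimate, pick the item maximizing Fisher information divided by its expected response time. This is a simplified illustration under the 2PL (the MIT-S details, such as how expected times are obtained without an individual-level response time model, differ from this sketch):

```python
import math

def p_2pl(theta, a, b):
    """2PL success probability."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, items):
    """items: list of (a, b, expected_time_seconds).
    Return the index maximizing information per time unit."""
    return max(range(len(items)),
               key=lambda j: info_2pl(theta, items[j][0], items[j][1]) / items[j][2])
```

Between two items with equal information, the faster one wins; between two equally fast items, the more informative one wins, which is why the method over-selects highly discriminating, quick items unless exposure control is added.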

15.
Lihua Yao, Psychometrika, 2012, 77(3), 495–523
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability, or reduce test length, compared with unidimensional CAT or with a paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation, varying the structure of the item pools, the population distribution of the simulees, the number of items selected, and the content area. The existing procedures, Volume (Segall in Psychometrika, 61:331–354, 1996), Kullback–Leibler information (Veldkamp & van der Linden in Psychometrika 67:575–588, 2002), minimizing the error variance of the linear combination (van der Linden in J. Educ. Behav. Stat. 24:398–412, 1999), and Minimum Angle (Reckase in Multidimensional item response theory, Springer, New York, 2009), are compared to a new procedure, minimizing the error variance of the composite score with the optimized weight, proposed for the first time in this study. The intent is to find an item selection procedure that yields higher precision for both the domain and composite abilities and a higher percentage of items selected from the item pool. The comparison is performed by examining absolute bias, correlation, test reliability, time used, and item usage. Three sets of item pools are used, with item parameters estimated from live CAT data. Results show that Volume and Minimum Angle performed similarly, balancing information across all content areas, while the other three procedures performed similarly, with high precision for both domain and overall scores when selecting the required number of items for each domain. The new item selection procedure has the highest percentage of item usage.
Moreover, for the overall score, it produces similar or even better results than the method that selects items favoring the general dimension using the general model (Segall in Psychometrika 66:79–97, 2001); the general-dimension method has low precision for the domain scores. In addition to the simulation study, the mathematical theories for certain procedures are derived. The theories are confirmed by the simulation applications.

16.
With a few exceptions, the problem of linking item response model parameters from different item calibrations has been conceptualized as an instance of the problem of test equating scores on different test forms. This paper argues, however, that the use of item response models does not require any test score equating. Instead, it involves the necessity of parameter linking due to a fundamental problem inherent in the formal nature of these models—their general lack of identifiability. More specifically, item response model parameters need to be linked to adjust for the different effects of the identifiability restrictions used in separate item calibrations. Our main theorems characterize the formal nature of these linking functions for monotone, continuous response models, derive their specific shapes for different parameterizations of the 3PL model, and show how to identify them from the parameter values of the common items or persons in different linking designs.
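For the 2PL, a linear indeterminacy of the latent scale (θ* = Aθ + B) induces the familiar linking function a* = a/A, b* = Ab + B, and A and B can be estimated from common items. A minimal sketch using the classical mean-sigma method (one of several standard linking methods; the paper's theorems are more general):

```python
import statistics as st

def mean_sigma_link(b_ref, b_new):
    """Estimate linking constants A, B so that A*b_new + B matches b_ref
    (mean-sigma method, using common-item difficulties)."""
    A = st.pstdev(b_ref) / st.pstdev(b_new)
    B = st.mean(b_ref) - A * st.mean(b_new)
    return A, B

def link_2pl(a, b, A, B):
    """Map 2PL item parameters onto the reference scale."""
    return a / A, A * b + B
```

When the new calibration is an exact linear relabeling of the reference scale, this recovers A and B exactly, which illustrates the point that linking adjusts for identifiability restrictions rather than equating scores.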

17.
This study employed a multifaceted model assessment approach to investigate the dimensionality and nomological network of a popular measure of trait reactance, the Hong Psychological Reactance Scale (HPRS; Hong & Page, 1989). To address confusion regarding the scoring and modeling of the HPRS, as well as its limited external validity evidence, we tested competing factor models, diagnosed model–data misfit, examined relationships between competing factor models and key personality traits, and cross-validated the results. Confirmatory factor analytic results supported modeling the HPRS via a bifactor model; when this model was applied, trait reactance was negatively related to agreeableness, conscientiousness, and conformity, and positively related to entitlement, as expected. However, we also demonstrated the consequences of championing a one-factor model by highlighting differences in relationships with external variables. Specifically, although modeling the HPRS scores with the bifactor model resulted in better model–data fit than the one-factor model, relationships with external variables based on the two models differed negligibly. Moreover, bifactor statistical indexes indicated that scores were essentially unidimensional, providing some support for treating HPRS scores as unidimensional in structure. Implications for using and scoring the HPRS are discussed.

18.
The Psychopathy Checklist–Revised (PCL–R; Hare, 2003) is one of the most commonly used measures of psychopathy. Scores range from 0 to 40, and legal and mental health professionals sometimes rely on a cut score or threshold to classify individuals as psychopaths. This practice, among other things, assumes that all items contribute equally to the overall raw score. Results from an item response theory analysis (Bolt, Hare, Vitale, & Newman, 2004), however, indicate that PCL–R items differ in the amount of information they provide about psychopathy. We examined the consequences of these item differences for using a cut score, detailing the consequences for a previously applied cut score of 30 as an example. Results indicated that there are more than 8.5 million different response combinations that equal 30 and more than 14.2 million that equal 30 or more. This raw score, like others, corresponds to a broad range of PCL–R-defined psychopathy, indicating that applying cut scores to this measure results in imprecise quantifications of psychopathy. We show that by using the item parameters along with an individual's particular scores on the PCL–R items, it is possible to arrive at a more precise understanding of an individual's level of psychopathy on this instrument.
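The reported combination counts can be reproduced with a short dynamic program, assuming the standard PCL–R structure of 20 items each scored 0, 1, or 2 (this is a reconstruction of the counting, not the authors' code):

```python
def count_score_combinations(n_items=20, max_item_score=2, target=30):
    """Count distinct item-response patterns (each item scored 0..max_item_score)
    whose total equals `target`, and whose total is >= `target`,
    via dynamic programming over the running sum."""
    counts = [1] + [0] * (n_items * max_item_score)
    for _ in range(n_items):
        new = [0] * len(counts)
        for s, c in enumerate(counts):
            if c:
                for x in range(max_item_score + 1):
                    new[s + x] += c
        counts = new
    return counts[target], sum(counts[target:])

eq30, ge30 = count_score_combinations()
# eq30 = 8,533,660 patterns summing to exactly 30;
# ge30 = 14,279,415 patterns summing to 30 or more.
```

These totals are consistent with the abstract's "more than 8.5 million" and "more than 14.2 million," and they make concrete why a single raw cut score aggregates over an enormous variety of item-level profiles.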

19.
Bifactor latent structures were introduced over 70 years ago, but only recently has bifactor modeling been rediscovered as an effective approach to modeling construct-relevant multidimensionality in a set of ordered categorical item responses. I begin by describing the Schmid–Leiman bifactor procedure (Schmid & Leiman, 1957) and highlight its relations with correlated-factors and second-order exploratory factor models. After describing limitations of the Schmid–Leiman procedure, two newer methods of exploratory bifactor modeling are considered, namely, analytic bifactor (Jennrich & Bentler, 2011) and target bifactor rotations (Reise, Moore, & Maydeu-Olivares, 2011). I then discuss limited- and full-information estimation approaches to confirmatory bifactor models that have emerged from the item response theory and factor analysis traditions, respectively. Comparison of the confirmatory bifactor model to alternative nested confirmatory models and establishing parameter invariance for the general factor are also discussed. Finally, important applications of bifactor models are reviewed. These applications demonstrate that bifactor modeling potentially provides a solid foundation for conceptualizing psychological constructs, constructing measures, and evaluating a measure's psychometric properties. However, some applications of the bifactor model may be limited by its restrictive assumptions.

20.

Purpose

This research advances understanding of empirical time modeling techniques in self-regulated learning research. We intuitively explain several such methods by situating their use in the extant literature. Further, we note key statistical and inferential assumptions of each method while making clear the inferential consequences of inattention to such assumptions.

Design/Methodology/Approach

Using a population model derived from a recent large-scale review of the training and work learning literature, we employ a Monte Carlo simulation fitting six variations of linear mixed models, seven variations of latent common factor models, and a single latent change score model to 1500 simulated datasets.

Findings

The latent change score model outperformed all six linear mixed models and all seven latent common factor models with respect to (1) estimation precision for average learner improvement, (2) correctly rejecting a false null hypothesis about such average improvement, and (3) correctly failing to reject a true null hypothesis about between-learner differences (i.e., random slopes) in average improvement.

Implications

The latent change score model is a more flexible method of modeling time in self-regulated learning research, particularly for learner processes consistent with twenty-first-century workplaces. Consequently, defaulting to linear mixed or latent common factor modeling methods may have adverse inferential consequences for better understanding self-regulated learning in twenty-first-century work.

Originality/Value

Ours is the first study to critically, rigorously, and empirically evaluate self-regulated learning modeling methods and to provide a more flexible alternative consistent with modern self-regulated learning knowledge.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)