Similar Articles
20 similar articles found.
1.
Residual analysis (e.g. Hambleton & Swaminathan, Item response theory: principles and applications, Kluwer Academic, Boston, 1985; Hambleton, Swaminathan, & Rogers, Fundamentals of item response theory, Sage, Newbury Park, 1991) is a popular method to assess fit of item response theory (IRT) models. We suggest a form of residual analysis that may be applied to assess item fit for unidimensional IRT models. The residual analysis consists of a comparison of the maximum-likelihood estimate of the item characteristic curve with an alternative ratio estimate of the item characteristic curve. The large sample distribution of the residual is proved to be standardized normal when the IRT model fits the data. We compare the performance of our suggested residual to the standardized residual of Hambleton et al. (Fundamentals of item response theory, Sage, Newbury Park, 1991) in a detailed simulation study. We then calculate our suggested residuals using data from an operational test. The residuals appear to be useful in assessing the item fit for unidimensional IRT models.
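For context, a minimal sketch of the kind of grouped standardized residual described by Hambleton et al. (1991), which serves as the baseline in the comparison above: examinees are binned by ability estimate, and the observed proportion correct in each bin is compared with the model-implied probability under a 3PL curve. The function names, the 3PL parameterization with the 1.7 scaling constant, and the binning scheme are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    # Three-parameter logistic item characteristic curve (1.7 scaling constant assumed).
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def grouped_standardized_residuals(theta_hat, responses, a, b, c, n_groups=10):
    """One residual per ability group for a single item:
    (observed proportion correct - model-implied probability) / binomial SE."""
    order = np.argsort(theta_hat)
    groups = np.array_split(order, n_groups)
    residuals = []
    for g in groups:
        p_obs = responses[g].mean()                    # observed proportion correct in the group
        p_exp = icc_3pl(theta_hat[g], a, b, c).mean()  # model-implied probability in the group
        se = np.sqrt(p_exp * (1.0 - p_exp) / len(g))   # binomial standard error
        residuals.append((p_obs - p_exp) / se)
    return np.array(residuals)
```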

2.
With a few exceptions, the problem of linking item response model parameters from different item calibrations has been conceptualized as an instance of the problem of test equating scores on different test forms. This paper argues, however, that the use of item response models does not require any test score equating. Instead, it involves the necessity of parameter linking due to a fundamental problem inherent in the formal nature of these models—their general lack of identifiability. More specifically, item response model parameters need to be linked to adjust for the different effects of the identifiability restrictions used in separate item calibrations. Our main theorems characterize the formal nature of these linking functions for monotone, continuous response models, derive their specific shapes for different parameterizations of the 3PL model, and show how to identify them from the parameter values of the common items or persons in different linking designs.
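One concrete instance of such a linking function, for the usual 3PL parameterization, is the familiar linear transformation of the θ scale. The sketch below estimates the linking constants with the mean–sigma method from common-item difficulties and then maps the remaining parameters; it illustrates the general idea rather than the paper's derivation, and all names are illustrative.

```python
import numpy as np

def mean_sigma_constants(b_new, b_old):
    # Linear linking constants A, B estimated from the common items' difficulty
    # estimates on the two separately calibrated scales (mean-sigma method).
    A = np.std(b_old, ddof=1) / np.std(b_new, ddof=1)
    B = np.mean(b_old) - A * np.mean(b_new)
    return A, B

def link_3pl(a, b, c, A, B):
    # Under theta_old = A * theta_new + B the 3PL probabilities are preserved when
    # a -> a / A, b -> A * b + B, and the lower asymptote c is left unchanged.
    return a / A, A * b + B, c
```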

3.
For trinary partial credit items the shape of the item information and the item discrimination function is examined in relation to the item parameters. In particular, it is shown that these functions are unimodal if \(\delta_{2}-\delta_{1} < 4\ln 2\) and bimodal otherwise. The locations and values of the maxima are derived. Furthermore, it is demonstrated that the value of the maximum is decreasing in \(\delta_{2}-\delta_{1}\). Consequently, the maximum of a unimodal item information function is always larger than the maximum of a bimodal one, and similarly for the item discrimination function. The work reported herein was partially supported under the National Assessment of Educational Progress (Grant No. R999G30002; CFDA No. 84.999G) as administered by the Office of Educational Research and Improvement, US Department of Education.
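The unimodality condition can be checked numerically. The sketch below computes the information function of a trinary partial credit item (the conditional variance of the item score) under the standard unit-discrimination parameterization with step parameters delta1 and delta2, and counts its interior peaks for a gap below and a gap above 4 ln 2 ≈ 2.77; the grid and parameter values are illustrative assumptions.

```python
import numpy as np

def pcm_trinary_information(theta, delta1, delta2):
    # Category log-weights for scores 0, 1, 2 under the partial credit model.
    z1 = theta - delta1
    z2 = (theta - delta1) + (theta - delta2)
    logits = np.stack([np.zeros_like(theta), z1, z2])
    p = np.exp(logits - logits.max(axis=0))   # stabilize before normalizing
    p /= p.sum(axis=0)
    scores = np.array([0.0, 1.0, 2.0])[:, None]
    mean = (scores * p).sum(axis=0)
    # Item information equals the conditional variance of the score.
    return ((scores - mean) ** 2 * p).sum(axis=0)

theta = np.linspace(-6, 6, 2001)
for d1, d2 in [(-1.0, 1.0), (-2.0, 2.0)]:     # gaps of 2.0 and 4.0 around 4*ln(2) ~ 2.77
    info = pcm_trinary_information(theta, d1, d2)
    n_peaks = np.sum((info[1:-1] > info[:-2]) & (info[1:-1] > info[2:]))
    print(f"delta2 - delta1 = {d2 - d1}: {n_peaks} peak(s)")
```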

4.
Lihua Yao, Psychometrika, 2012, 77(3): 495–523
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure of item pools, the population distribution of the simulees, the number of items selected, and the content area. The existing procedures such as Volume (Segall in Psychometrika, 61:331–354, 1996), Kullback–Leibler information (Veldkamp & van der Linden in Psychometrika 67:575–588, 2002), Minimize the error variance of the linear combination (van der Linden in J. Educ. Behav. Stat. 24:398–412, 1999), and Minimum Angle (Reckase in Multidimensional item response theory, Springer, New York, 2009) are compared to a new procedure, Minimize the error variance of the composite score with the optimized weight, proposed for the first time in this study. The intent is to find an item selection procedure that yields higher precision for both the domain and composite abilities and a higher percentage of selected items from the item pool. The comparison is performed by examining the absolute bias, correlation, test reliability, time used, and item usage. Three sets of item pools are used with the item parameters estimated from live operational CAT data. Results show that Volume and Minimum Angle performed similarly, balancing information for all content areas, while the other three procedures performed similarly, with a high precision for both domain and overall scores when selecting items with the required number of items for each domain. The new item selection procedure has the highest percentage of item usage. Moreover, for the overall score, it produces similar or even better results compared to those from the method that selects items favoring the general dimension using the general model (Segall in Psychometrika 66:79–97, 2001); the general dimension method has low precision for the domain scores. In addition to the simulation study, the mathematical theories for certain procedures are derived. The theories are confirmed by the simulation applications.
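As a point of reference for the procedures compared above, the Volume criterion of Segall (1996) can be read as a D-optimality rule: choose the unused item whose Fisher information matrix, added to the information accumulated so far, maximizes the determinant. The sketch below implements that idea for a compensatory multidimensional 2PL pool, ignoring content constraints, priors, and guessing; function and variable names are illustrative.

```python
import numpy as np

def m2pl_prob(theta, a, d):
    # Compensatory multidimensional 2PL: P(correct) = logistic(a' theta + d).
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

def item_information(theta, a, d):
    # Fisher information matrix contributed by one M2PL item at theta.
    p = m2pl_prob(theta, a, d)
    return p * (1.0 - p) * np.outer(a, a)

def select_by_volume(theta, a_pool, d_pool, administered, info_so_far):
    # Pick the unused item that maximizes the determinant ("volume") of the
    # accumulated information matrix.
    best_j, best_det = None, -np.inf
    for j in range(len(d_pool)):
        if j in administered:
            continue
        det = np.linalg.det(info_so_far + item_information(theta, a_pool[j], d_pool[j]))
        if det > best_det:
            best_j, best_det = j, det
    return best_j
```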

5.
杨向东, 《心理科学进展》, 2010, 18(8): 1349–1358
From the perspective of the cognitive processes involved in solving test items, this article analyzes the basic assumptions of the measurement models used under different test-theory frameworks, and argues that a measurement model is a concrete representation of the test developer's theoretical hypotheses about how examinees respond to test items, as well as a statistical framework for systematically testing measurement hypotheses and processes. However, whether in classical test theory, generalizability theory, or early item response theory models, these assumptions are overly simplified and lack support from substantive theory. By contrast, cognitive measurement models emphasize correspondence with the cognitive processes, cognitive strategies, and knowledge structures that individuals engage during item responding; they make it possible to define measurement constructs, design test items, and conduct modeling and interpretation on the basis of substantive theory, laying a foundation for integrating an increasingly marginalized psychometrics with mainstream psychological research.

6.
The Wisconsin Schizotypy Scales are widely used for assessing schizotypy in nonclinical and clinical samples. However, they were developed using classical test theory (CTT) and have not had their psychometric properties examined with more sophisticated measurement models. The present study employed item response theory (IRT) as well as traditional CTT to examine psychometric properties of four of the schizotypy scales on the item and scale level, using a large sample of undergraduate students (n = 6,137). In addition, we investigated differential item functioning (DIF) for sex and ethnicity. The analyses revealed many strengths of the four scales, but some items had low discrimination values and many items had high DIF. The results offer useful guidance for applied users and for future development of these scales.

7.
There has been renewed interest in Barton and Lord’s (An upper asymptote for the three-parameter logistic item response model (Tech. Rep. No. 80-20). Educational Testing Service, 1981) four-parameter item response model. This paper presents a Bayesian formulation that extends Béguin and Glas (MCMC estimation and some model fit analysis of multidimensional IRT models. Psychometrika, 66(4):541–561, 2001) and proposes a model for the four-parameter normal ogive (4PNO) model. Monte Carlo evidence is presented concerning the accuracy of parameter recovery. The simulation results support the use of less informative uniform priors for the lower and upper asymptotes, which is an advantage over prior research. Monte Carlo results provide some support for using the deviance information criterion and \(\chi^{2}\) index to choose among models with two, three, and four parameters. The 4PNO is applied to 7491 adolescents’ responses to a bullying scale collected under the 2005–2006 Health Behavior in School-Aged Children study. The results support the value of the 4PNO to estimate lower and upper asymptotes in large-scale surveys.
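For readers unfamiliar with the 4PNO, the item response function adds an upper asymptote d (slipping) to the usual lower asymptote c (guessing) of a normal-ogive curve. A minimal sketch, using the a(θ − b) parameterization as an assumption:

```python
import numpy as np
from scipy.stats import norm

def p_4pno(theta, a, b, c, d):
    # Four-parameter normal ogive: lower asymptote c keeps the floor above 0,
    # upper asymptote d keeps the ceiling below 1.
    return c + (d - c) * norm.cdf(a * (theta - b))

theta = np.linspace(-4, 4, 9)
print(p_4pno(theta, a=1.2, b=0.0, c=0.15, d=0.95))   # illustrative parameter values
```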

8.
9.

The Informant Questionnaire on Cognitive Decline (IQCODE) is a formal informant report instrument, originally developed by Jorm and Jacomb (1989, Psychological Medicine, 19(4), 1015–1022). The goal of the present study was to evaluate the range of cognitive decline in which the IQCODE is most sensitive, using item response theory (IRT). Existing data (N = 740) from a sample of community-dwelling older adults were used for this purpose. A 2-parameter model estimating item difficulty and discrimination fit the data best. Additionally, the IQCODE provided the most psychometric information in the range of −0.5 < θ < 1.5, with peak information obtained at approximately θ = 0.4. Based on individuals' latent score (θ) estimates, items on the IQCODE are adequate for use as a screening tool for dementia. Results of the item calibration may be useful for targeted assessment needs, such as the development of short forms.
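The notion of "peaked" measurement precision comes from the test information function, which sums the item information curves and shows where on the latent trait the instrument measures most precisely. The sketch below illustrates this for a handful of dichotomous 2PL items; the parameter values are made up for illustration and are not the IQCODE estimates reported in the study.

```python
import numpy as np

def info_2pl(theta, a, b):
    # Item information for a 2PL item: a^2 * P * (1 - P).
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

# Illustrative (made-up) item parameters.
a = np.array([1.8, 1.2, 2.1, 0.9])
b = np.array([0.2, 0.5, 0.4, 0.6])

theta = np.linspace(-3, 3, 601)
test_info = sum(info_2pl(theta, ai, bi) for ai, bi in zip(a, b))
print("Test information peaks at theta =", round(theta[np.argmax(test_info)], 2))
```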

10.
Hooker, Finkelman, and Schwartzman (Psychometrika, 2009, in press) defined a paradoxical result as the attainment of a higher test score by changing answers from correct to incorrect and demonstrated that such results are unavoidable for maximum likelihood estimates in multidimensional item response theory. The potential for these results to occur leads to the undesirable possibility of a subject’s best answer being detrimental to them. This paper considers the existence of paradoxical results in tests composed of item bundles when compensatory models are used. We demonstrate that paradoxical results can occur when bundle effects are modeled as nuisance parameters for each subject. However, when these nuisance parameters are modeled as random effects, or used in a Bayesian analysis, it is possible to design tests comprised of many short bundles that avoid paradoxical results and we provide an algorithm for doing so. We also examine alternative models for handling dependence between item bundles and show that using fixed dependency effects is always guaranteed to avoid paradoxical results.

11.
The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the latent growth curve (LGC) model. The purpose of this study is to demonstrate the potential of Bayesian methods and the utility of a comprehensive modeling framework, one combining a measurement model (e.g., a multidimensional graded response model, MGRM) with a structural model (e.g., an associative latent growth curve analysis, ALGC). All analyses are implemented in WinBUGS 1.4.3 (Spiegelhalter, Thomas, Best, & Lunn, 2003), which allows researchers to use Markov chain Monte Carlo simulation methods to fit complex statistical models and circumvent intractable analytic or numerical integrations. The utility of this MGRM-ALGC modeling framework was investigated with both simulated and empirical data, and promising results were obtained. As the results indicate, being a flexible multivariate multilevel model, the MGRM-ALGC model not only produces item parameter estimates that are readily estimable and interpretable but also estimates the corresponding covariation in the developmental dimensions. In terms of substantive interpretation, as adolescents perceive themselves as more socially isolated, the chance that they engage with delinquent peers becomes substantially larger. Generally, boys have a higher initial exposure extent than girls. However, there is no gender difference associated with other latent growth parameters.

12.
This paper discusses two forms of separability of item and person parameters in the context of response time (RT) models. The first is separate sufficiency: the existence of sufficient statistics for the item (person) parameters that do not depend on the person (item) parameters. The second is ranking independence: the likelihood of the item (person) ranking with respect to RTs does not depend on the person (item) parameters. For each form, a theorem stating sufficient conditions is proved. The two forms of separability are shown to include several (special cases of) models from the psychometric and biometric literature. Ranking independence imposes restrictions not on the general distribution form but on its parametrization. An estimation procedure based upon ranks and pseudolikelihood theory is discussed, as well as the relation of ranking independence to the concept of double monotonicity. I am indebted to Wim van der Linden for bringing Thissen's (1983) paper to my notice, and to Martijn Berger, Frans Tan, and the anonymous reviewers for their constructive comments on earlier drafts of this paper.

13.

Purpose

This study examined whether demographic question placement affects demographic and non-demographic question completion rates, non-demographic item means, and blank questionnaire rates using a web-based survey of Veterans Health Administration employees.

Methodology

Data were taken from the 2010 Voice of the Veterans Administration Survey (VoVA), a voluntary, confidential, web-based survey offered to all VA employees. Participants were given one of two versions of the questionnaire: one placed the demographic questions at the beginning, the other at the end.

Findings

Results indicated that placing demographic questions at the beginning of a questionnaire increased item response rate for demographic items without affecting the item response rate for non-demographic items or the average of item mean scores.

Implications

In addition to validity issues, a goal for surveyors is to maximize response rates and to minimize the number of missing responses. It is therefore important to determine which questionnaire characteristics affect these values. Results of this study suggest demographic placement is an important factor.

Originality/Value

There are various opinions about the most advantageous location of demographic questions in questionnaires; however, the issue has rarely been examined empirically. This study uses an experimental design and a large sample size to examine the effects of demographic placement on survey response characteristics.

14.
We develop a latent variable selection method for multidimensional item response theory models. The proposed method identifies the latent traits probed by the items of a multidimensional test. Its basic strategy is to impose an \(L_{1}\) penalty term on the log-likelihood. The computation is carried out by the expectation–maximization algorithm combined with the coordinate descent algorithm. Simulation studies show that the resulting estimator provides an effective way of correctly identifying the latent structures. The method is applied to a real dataset involving the Eysenck Personality Questionnaire.
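To make the strategy concrete, the sketch below writes down an L1-penalized marginal log-likelihood for a compensatory multidimensional 2PL, approximated on a fixed quadrature grid with a standard normal prior over two latent traits. It only shows the objective; the EM-plus-coordinate-descent maximization used in the paper is omitted, and the model form, grid, and names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def penalized_loglik(responses, A, d, lam, nodes, weights):
    """L1-penalized marginal log-likelihood (quadrature approximation).
    responses : (persons, items) 0/1 matrix
    A         : (items, traits) discrimination/loading matrix
    d         : (items,) intercepts
    lam       : weight of the L1 penalty on the entries of A
    """
    logits = nodes @ A.T + d                          # (nodes, items)
    p = 1.0 / (1.0 + np.exp(-logits))
    # log-likelihood of each person's response pattern at each quadrature node
    ll_nodes = responses @ np.log(p).T + (1 - responses) @ np.log(1 - p).T
    marginal = np.log((np.exp(ll_nodes) * weights).sum(axis=1))
    return marginal.sum() - lam * np.abs(A).sum()

# Quadrature grid over two latent traits with a standard normal prior.
grid = np.linspace(-4, 4, 21)
xx, yy = np.meshgrid(grid, grid)
nodes = np.column_stack([xx.ravel(), yy.ravel()])
weights = multivariate_normal(mean=[0.0, 0.0]).pdf(nodes)
weights /= weights.sum()
```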

15.
In this article we propose a model-free diagnostic for single-peakedness (unimodality) of item responses. Presuming a unidimensional unfolding scale and a given item ordering, we approximate the item response functions of all items based on ordered conditional means (OCM). The proposed OCM methodology is based on Thurstone and Chave's (1929) criterion of irrelevance, which is a graphical, exploratory method for evaluating the “relevance” of dichotomous attitude items. We generalized this criterion to graded response items and quantified the relevance by fitting a unimodal smoother. The resulting goodness-of-fit was used to determine item fit and aggregated scale fit. Based on a simulation procedure, cutoff values were proposed for the measures of item fit. These cutoff values showed high power rates and acceptable Type I error rates. We present two applications of the OCM method. First, we apply the OCM method to personality data from the Developmental Profile; second, we analyze attitude data collected by Roberts and Laughlin (1996) concerning opinions of capital punishment.
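A rough stand-in for the OCM curves can be computed by ordering respondents on a proxy for the latent dimension (for example, a rest score), binning them, and taking the conditional item means within bins; single-peakedness can then be checked crudely. This is only a sketch of the idea, not the authors' unimodal-smoother fit statistic or their cutoff procedure, and the names are illustrative.

```python
import numpy as np

def ordered_conditional_means(item, rest, n_bins=10):
    # Conditional means of one item's responses after ordering persons by the rest score.
    order = np.argsort(rest)
    bins = np.array_split(order, n_bins)
    return np.array([item[b].mean() for b in bins])

def looks_single_peaked(means):
    # Crude unimodality check: the sequence of conditional means should
    # change direction (rise then fall) at most once.
    diffs = np.diff(means)
    signs = np.sign(diffs[diffs != 0])
    return np.sum(signs[1:] != signs[:-1]) <= 1
```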

16.
Using a random sample of children in grades one through six as well as a sample of children referred for social or emotional problems, the item level validity of the Devereux Elementary School Behavior Rating Scale was assessed. Results indicated that Devereux responses for normal children tended to be skewed, with the item means falling near the positive end of the response continuum and with narrow standard deviations. Item means in the referred sample were closer to the center of the distribution and tended to have larger standard deviations. When item-to-subscale correlations were considered, the Devereux items, in general, were significantly correlated with their home subscale. Thirty-two of the 45 items on the scale had home scale correlations that were significantly higher than any other subscale correlation for that item. K-R 20 coefficients for subscales ranged from .74 to .89. Because the 9 items that had very poor subscale correlations were clustered on 4 of the 11 original subscales, it was recommended that these subscales be eliminated. The result would be a 31-item scale that measures seven different factors. Authorship is equal.

17.
Mael's taxonomy of biodata item attributes is applied to two biodata instruments in order to investigate the relationship between item attributes and (1) validity and (2) socially desirable responding. The results show a strong relationship between item attributes and validity, with items that are more historical, external, objective, discrete, and verifiable and less job relevant displaying stronger validities. Weaker relationships are observed between the item attributes and socially desirable responding. Implications of these results for building a clearinghouse for documentation of objective biodata items are discussed.

18.
Traditional testing procedures typically utilize unidimensional item response theory (IRT) models to provide a single, continuous estimate of a student’s overall ability. Advances in psychometrics have focused on measuring multiple dimensions of ability to provide more detailed feedback for students, teachers, and other stakeholders. Diagnostic classification models (DCMs) provide multidimensional feedback by using categorical latent variables that represent distinct skills underlying a test that students may or may not have mastered. The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of a unidimensional IRT model and a DCM where the categorical latent variables represent misconceptions instead of skills. In addition to an estimate of ability along a latent continuum, the SICM model provides multidimensional, diagnostic feedback in the form of statistical estimates of probabilities that students have certain misconceptions. Through an empirical data analysis, we show how this additional feedback can be used by stakeholders to tailor instruction for students’ needs. We also provide results from a simulation study that demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions.

19.
Generating items during testing: Psychometric issues and models
On-line item generation is becoming increasingly feasible for many cognitive tests. Item generation seemingly conflicts with the well established principle of measuring persons from items with known psychometric properties. This paper examines psychometric principles and models required for measurement from on-line item generation. Three psychometric issues are elaborated for item generation. First, design principles to generate items are considered. A cognitive design system approach is elaborated and then illustrated with an application to a test of abstract reasoning. Second, psychometric models for calibrating generating principles, rather than specific items, are required. Existing item response theory (IRT) models are reviewed and a new IRT model that includes the impact on item discrimination, as well as difficulty, is developed. Third, the impact of item parameter uncertainty on person estimates is considered. Results from both fixed content and adaptive testing are presented. This article is based on the Presidential Address Susan E. Embretson gave on June 26, 1999 at the 1999 Annual Meeting of the Psychometric Society held at the University of Kansas in Lawrence, Kansas. —Editor
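The idea of calibrating generating principles rather than specific items can be pictured as an IRT model in which both difficulty and discrimination are predicted from an item's design features. The sketch below shows one such structure in the spirit of the model described above; the exact functional form, the feature coding, and all parameter values are illustrative assumptions, not the published model.

```python
import numpy as np

def generated_item_prob(theta, q, tau, eta):
    """Response probability for a generated item whose 2PL parameters are
    predicted from its design-feature vector q (the generating 'radicals').
    tau : feature weights for difficulty
    eta : feature weights for log-discrimination (exp keeps discrimination positive)
    """
    b = q @ tau
    a = np.exp(q @ eta)
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# One hypothetical generated item described by three binary design features.
q = np.array([1.0, 0.0, 1.0])
tau = np.array([0.8, -0.3, 0.5])
eta = np.array([0.1, 0.2, -0.05])
print(generated_item_prob(theta=0.5, q=q, tau=tau, eta=eta))
```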

20.

Item response theory (IRT) was applied to evaluate the psychometric properties of the Spiritual Assessment Inventory (SAI; Hall & Edwards, 1996, 2002). The SAI is a 49-item self-report questionnaire designed to assess five aspects of spirituality: Awareness of God, Disappointment (with God), Grandiosity (excessive self-importance), Realistic Acceptance (of God), and Instability (in one's relationship to God). IRT analysis revealed that for several scales: (a) two or three items per scale carry the psychometric workload and (b) measurement precision is peaked for all five scales, such that one end of the scale, and not the other, is measured precisely. We considered how sample homogeneity and the possible quasi-continuous nature of the SAI constructs may have affected our results and, in light of this, made suggestions for SAI revisions, as well as for measuring spirituality, in general.
