
An IRT model based on the Rasch model is proposed for composite tasks, that is, tasks that are decomposed into subtasks of different kinds. There is one subtask for each component that is discerned in the composite tasks. A component is a generic kind of subtask of which the subtasks resulting from the decomposition are specific instantiations with respect to the particular composite tasks under study. The proposed model constrains the difficulties of the composite tasks to be linear combinations of the difficulties of the corresponding subtask items, which are estimated together with the weights used in the linear combinations, one weight for each kind of subtask. Although the model does not belong to the exponential family, its parameters can be estimated using conditional maximum likelihood estimation. The approach is demonstrated with an application to spelling tasks. We thank Eric Maris for his helpful comments.
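The linear constraint described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' specification: it assumes the usual Rasch form P = 1 / (1 + exp(-(theta - beta))) and a composite difficulty equal to the weighted sum of the subtask difficulties with no intercept. The function names and the numbers in the usage note are hypothetical, and the conditional maximum likelihood estimation step is not shown.

```python
import math


def rasch_prob(theta, beta):
    """Probability of success under the Rasch model (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def composite_difficulty(subtask_betas, weights):
    """Composite difficulty as a linear combination of subtask difficulties.

    One weight per kind of subtask (component); the absence of an
    intercept term is an assumption made for this sketch.
    """
    if len(subtask_betas) != len(weights):
        raise ValueError("need exactly one weight per subtask difficulty")
    return sum(w * b for w, b in zip(weights, subtask_betas))
```

For instance, with hypothetical subtask difficulties 0.5 and -1.0 and weights 1.0 and 2.0, `composite_difficulty([0.5, -1.0], [1.0, 2.0])` gives the composite difficulty that would then be passed to `rasch_prob`.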

7.

Bayesian item fit analysis for unidimensional item response theory models

《The British journal of mathematical and statistical psychology》2006,59(2):429-449

Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model‐checking method, a popular Bayesian model‐checking tool, to examine item fit for the above‐mentioned models. An item fit plot, comparing the observed and predicted proportion‐correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p‐values (which are natural Bayesian p‐values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above‐mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.

8.

Specifying optimum examinees for item parameter estimation in item response theory

Martha L. Stocking 《Psychometrika》1990,55(3):461-475

Information functions are used to find the optimum ability levels and maximum contributions to information for estimating item parameters in three commonly used logistic item response models. For the three- and two-parameter logistic models, examinees who contribute maximally to the estimation of item difficulty contribute little to the estimation of item discrimination. This suggests that in applications that depend heavily upon the veracity of individual item parameter estimates (e.g. adaptive testing or test construction), better item calibration results may be obtained (for fixed sample sizes) from examinee calibration samples in which ability is widely dispersed. This work was supported by Contract No. N00014-83-C-0457, project designation NR 150-520, from Cognitive Science Program, Cognitive and Neural Sciences Division, Office of Naval Research and Educational Testing Service through the Program Research Planning Council. Reproduction in whole or in part is permitted for any purpose of the United States Government. The author wishes to acknowledge the invaluable assistance of Maxine B. Kingston in carrying out this study, and to thank Charles Lewis for his many insightful comments on earlier drafts of this paper.
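The trade-off stated in this abstract can be illustrated with a minimal sketch for the two-parameter logistic model. It assumes the parameterization P(theta) = 1 / (1 + exp(-a(theta - b))), which the listing does not spell out. Under that assumption, a single examinee at ability theta contributes a^2 * P * Q to the Fisher information about the difficulty b and (theta - b)^2 * P * Q to the information about the discrimination a, where Q = 1 - P. The function names are illustrative and do not come from the paper.

```python
import math


def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def info_difficulty(theta, a, b):
    """Information one examinee at theta contributes about difficulty b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def info_discrimination(theta, a, b):
    """Information one examinee at theta contributes about discrimination a."""
    p = p_2pl(theta, a, b)
    return (theta - b) ** 2 * p * (1.0 - p)
```

In this sketch an examinee located exactly at theta = b contributes the most information about b and none about a, which is consistent with the abstract's recommendation to calibrate on samples with widely dispersed ability.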

9.

Solving the measurement invariance anchor item problem in item response theory

Meade AW Wright NA 《The Journal of applied psychology》2012,97(5):1016-1031

The efficacy of tests of differential item functioning (measurement invariance) has been well established. It is clear that when properly implemented, these tests can successfully identify differentially functioning (DF) items when they exist. However, an assumption of these analyses is that the metric for different groups is linked using anchor items that are invariant. In practice, however, it is impossible to be certain which items are DF and which are invariant. This problem of anchor items, or referent indicators, has long plagued invariance research, and a multitude of suggested approaches have been put forth. Unfortunately, the relative efficacy of these approaches has not been tested. This study compares 11 variations on 5 qualitatively different approaches from recent literature for selecting optimal anchor items. A large-scale simulation study indicates that for nearly all conditions, an easily implemented 2-stage procedure recently put forth by Lopez Rivas, Stark, and Chernyshenko (2009) provided optimal power while maintaining nominal Type I error. With this approach, appropriate anchor items can be easily and quickly located, resulting in more efficacious invariance tests. Recommendations for invariance testing are illustrated using a pedagogical example of employee responses to an organizational culture measure. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

10.

Examining differential item functioning due to item difficulty and alternative attractiveness

Paul Westers Henk Kelderman 《Psychometrika》1992,57(1):107-118

A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

11.

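Item selection strategies for dual-objective CD-CAT based on item discrimination

He Jie, Mao Xiuzhen, Tang Qian, Wang Xia 《Journal of Psychological Science》2022,(1):204-212

For dual-objective CD-CAT, six item discrimination indices (discrimination power D, general discrimination index GDI, odds ratio OR, the 2PL discrimination parameter a, attribute discrimination index ADI, and cognitive diagnostic discrimination index CDI) were each combined with the IPA method to obtain new item selection strategies. A simulation study compared their performance and also examined how stratification by discrimination performs in controlling item exposure. The results showed that the new methods all clearly improved the correct classification rate of knowledge states and the precision of ability estimation, and that stratified item selection substantially improved item bank utilization. Overall, OR weighting significantly improved measurement precision, and OR-stratified item selection significantly improved the uniformity of item exposure while maintaining measurement precision.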

12.

The unique correspondence of the item response function and item category response functions in polytomously scored item response models

Hua-Hua Chang John Mazzeo 《Psychometrika》1994,59(3):391-404

The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
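The definition of the IRF as a weighted sum of ICRFs can be sketched for a partial credit item. This is a hedged illustration, not the authors' notation: it assumes the weights are the category scores 0, 1, ..., m, and that category k has probability proportional to exp of the sum of (theta - delta_j) over the first k steps. The names `pcm_icrfs`, `irf`, and `deltas` are hypothetical.

```python
import math


def pcm_icrfs(theta, deltas):
    """Category response probabilities for a partial credit item (assumed form).

    deltas holds the step parameters; category 0 corresponds to the empty sum.
    """
    logits = [0.0]
    for d in deltas:
        logits.append(logits[-1] + (theta - d))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def irf(theta, deltas):
    """Item response function as the score-weighted sum of the ICRFs."""
    return sum(k * p for k, p in enumerate(pcm_icrfs(theta, deltas)))
```

With a single step parameter the sketch reduces to the dichotomous Rasch case, so `irf(theta, [delta])` is simply the probability of a correct response.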

13.

Use of item ratings to examine personality test item cognitive response processes

Eric D. Gordon Ronald R. Holden 《Personality and individual differences》1996,21(6):897-905

This research examines the processes respondents use to answer personality test items. A total of 158 true/false items from four scales of the Personality Research Form and the California Psychological Inventory were used as stimuli. University students (N = 120) responded to each item and indicated one of nine strategies used in deciding on a response. Obtained response strategy ratings for items were reliable and their frequencies corresponded closely to previous findings with other items. Subsequently, the associations between item response strategy frequencies and item-total correlations were computed. Congruent with previous research, better items avoided behaviours or experiences and evoked responding based on traits and on referring to the statements of others. The associations between item response strategies and other indices of item quality are discussed and implications regarding scale development are offered.

14.

Comparing item characteristic curves

Paul R. Rosenbaum 《Psychometrika》1987,52(2):217-233

Test items are often evaluated and compared by contrasting the shapes of their item characteristic curves (ICCs) or surfaces. The current paper develops and applies three general (i.e., nonparametric) comparisons of the shapes of two item characteristic surfaces: (i) proportional latent odds, (ii) uniform relative difficulty, and (iii) item sensitivity. Two items may be compared in these ways while making no assumption about the shapes of item characteristic surfaces for other items, and no assumption about the dimensionality of the latent variable. Also studied is a method for comparing the relative shapes of two item characteristic curves in two examinee populations. The author is grateful to Paul Holland, Robert Mislevy, Tue Tjur, Rebecca Zwick, the editor and reviewers for valuable comments on the subject of this paper, to Mari A. Pearlman for advice on the pairing of items in the examples, and to Dorothy Thayer for assistance with computing.

15.

A semiparametric approach for item response function estimation to detect item misfit

Carmen Köhler Alexander Robitzsch Katharina Fährmann Matthias von Davier Johannes Hartig 《The British journal of mathematical and statistical psychology》2021,74(Z1):157-175

When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).

16.

Spoken-word processing in aphasia: effects of item overlap and item repetition

Janse E 《Brain and language》2008,105(3):185-198

Two studies were carried out to investigate the effects of presentation of primes showing partial (word-initial) or full overlap on processing of spoken target words. The first study investigated whether time compression would interfere with lexical processing so as to elicit aphasic-like performance in non-brain-damaged subjects. The second study was designed to compare effects of item overlap and item repetition in aphasic patients of different diagnostic types. Time compression did not interfere with lexical deactivation for the non-brain-damaged subjects. Furthermore, all aphasic patients showed immediate inhibition of co-activated candidates. These combined results show that deactivation is a fast process. Repetition effects, however, seem to arise only at the longer term in aphasic patients. Importantly, poor performance on diagnostic verbal STM tasks was shown to be related to lexical decision performance in both overlap and repetition conditions, which suggests a common underlying deficit.

17.

Full-information item bi-factor analysis

Robert D. Gibbons Donald R. Hedeker 《Psychometrika》1992,57(3):423-436

A plausible s-factor solution for many types of psychological and educational tests is one that exhibits a general factor and s − 1 group or method related factors. The bi-factor solution results from the constraint that each item has a nonzero loading on the primary dimension and at most one of the s − 1 group factors. This paper derives a bi-factor item-response model for binary response data. In marginal maximum likelihood estimation of item parameters, the bi-factor restriction leads to a major simplification of likelihood equations and (a) permits analysis of models with large numbers of group factors; (b) permits conditional dependence within identified subsets of items; and (c) provides more parsimonious factor solutions than an unrestricted full-information item factor analysis in some cases. Supported by the Cognitive Science Program, Office of Naval Research, under grant #N00014-89-J-1104. We would like to thank Darrell Bock for several helpful suggestions.

18.

Aggregate item response analysis (total citations: 1, self-citations: 0, citations by others: 1)

Gordon G. Bechtel Chezy Ofir 《Psychometrika》1988,53(1):93-107

A stochastic postulate is given for the multiple-item, successive-intervals scaling of populations. The logistic equivalent of this postulate provides an aggregate item response model in which a unidimensional submodel may be nested. This reduction provides a subtractive conjoint measurement of several items and stimuli on the same latent scale. Generalized-least-squares methods are used to estimate and test the multiple-item model, and its unidimensional reduction, on aggregate survey responses. The entire procedure is illustrated with an analysis of semantic-differential attitude data. This analysis exhibits an item selection procedure that is applicable to various social constructs. The authors dedicate this paper to the memory and contributions of Clyde Coombs. The programming and data analyses for the present paper were carried out by José Ventura of the Department of Industrial and Systems Engineering, and Jerry Meiten of the Department of Statistics, University of Florida. The study was also supported by the College of Business Administration, University of Florida, and the Faculty of Social Sciences, Hebrew University of Jerusalem.

19.

Priorities in item recognition

Vicki P. Raeburn 《Memory & cognition》1974,2(4):663-669

The Sternberg paradigm was used to examine item recognition in the experiments reported here. Functions relating reaction time to positive set size and relating reaction time to the serial position of positive targets were discussed within the context of Sternberg’s (1969) recognition model. The experiments were designed to test (a) the hypothesis that certain members of the positive set receive preferential processing and (b) the hypothesis that some members of the negative set are compared to the target in memory. The results of both experiments supported the first hypothesis, and the results of Experiment I supported the second hypothesis. Models generally consistent with these results were discussed.

20.

Diagnosing item score patterns on a test using item response theory-based person-fit statistics

Meijer RR 《Psychological Methods》2003,8(1):72-87

Person-fit statistics have been proposed to investigate the fit of an item score pattern to an item response theory (IRT) model. The author investigated how these statistics can be used to detect different types of misfit. Intelligence test data were analyzed using person-fit statistics in the context of the G. Rasch (1960) model and R. J. Mokken's (1971, 1997) IRT models. The effect of the choice of an IRT model to detect misfitting item score patterns and the usefulness of person-fit statistics for diagnosis of misfit are discussed. Results showed that different types of person-fit statistics can be used to detect different kinds of person misfit. Parametric person-fit statistics had more power than nonparametric person-fit statistics.