Similar Documents
20 similar documents found (search time: 31 ms).
1.
Under assumptions that will hold for the usual test situation, it is proved that test reliability and variance increase (a) as the average inter-item correlation increases, and (b) as the variance of the item difficulty distribution decreases. As the average item variance increases, the test variance will increase, but the test reliability will not be affected. (It is noted that as the average item variance increases, the average item difficulty approaches .50.) In this development, no account is taken of the effect of chance success, or of the possible effect of different item difficulty distributions on student attitude. In order to maximize the reliability and variance of a test, the items should have high intercorrelations, all items should be of the same difficulty level, and that level should be as near to 50% as possible. The desirability of determining this relationship has been indicated by previous writers. Work on the present paper arose out of some problems raised by Dr. Herbert S. Conrad in connection with an analysis of aptitude tests. On leave for Government war research from the Psychology Department, University of Chicago.
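The direction of the inter-item-correlation effect can be illustrated numerically. The sketch below (an illustration of the general principle, not the paper's derivation) simulates dichotomous responses from a single latent trait and computes KR-20 reliability; shrinking the item-specific noise raises the inter-item correlations and, with them, the reliability. All parameter values are hypothetical.

```python
import random

def kr20(data):
    """Kuder-Richardson 20 reliability for 0/1 item scores."""
    n, k = len(data), len(data[0])
    totals = [sum(row) for row in data]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    pq = 0.0  # sum of item variances p * (1 - p)
    for j in range(k):
        p = sum(row[j] for row in data) / n
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / var_t)

def simulate(noise_sd, n=2000, k=20, seed=7):
    """Each item passes when latent trait + item noise exceeds 0 (difficulty near .50)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.gauss(0, 1)
        data.append([1 if theta + rng.gauss(0, noise_sd) > 0 else 0 for _ in range(k)])
    return data

rel_strong = kr20(simulate(noise_sd=0.5))  # high inter-item correlation
rel_weak = kr20(simulate(noise_sd=2.0))    # low inter-item correlation
```

With less item-specific noise the items correlate more strongly and KR-20 rises, matching conclusion (a) above.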

2.
When scaling data using item response theory, valid statements based on the measurement model are only permissible if the model fits the data. Most item fit statistics used to assess the fit between observed item responses and the item responses predicted by the measurement model show significant weaknesses, such as the dependence of fit statistics on sample size and number of items. In order to assess the size of misfit and to thus use the fit statistic as an effect size, dependencies on properties of the data set are undesirable. The present study describes a new approach and empirically tests it for consistency. We developed an estimator of the distance between the predicted item response functions (IRFs) and the true IRFs by semiparametric adaptation of IRFs. For the semiparametric adaptation, the approach of extended basis functions due to Ramsay and Silverman (2005) is used. The IRF is defined as the sum of a linear term and a more flexible term constructed via basis function expansions. The group lasso method is applied as a regularization of the flexible term, and determines whether all parameters of the basis functions are fixed at zero or freely estimated. Thus, the method serves as a selection criterion for items that should be adjusted semiparametrically. The distance between the predicted and semiparametrically adjusted IRF of misfitting items can then be determined by describing the fitting items by the parametric form of the IRF and the misfitting items by the semiparametric approach. In a simulation study, we demonstrated that the proposed method delivers satisfactory results in large samples (i.e., N ≥ 1,000).
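The core quantity here, a distance between the model-implied IRF and a more flexible "true" IRF, can be sketched as a root-mean-square deviation over an ability grid. This is only an illustration of the distance being estimated; the paper's basis-function expansion and group-lasso selection are not reproduced, and the "true" IRF below is hand-picked.

```python
import math

def irf_2pl(theta, a, b):
    """Parametric (2PL) item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_flexible(theta):
    """A misfitting 'true' IRF with a lower asymptote of .20 (hypothetical)."""
    return 0.20 + 0.80 * irf_2pl(theta, a=1.2, b=0.0)

# RMSD between predicted and 'true' IRF over an ability grid from -4 to 4
grid = [i / 10 for i in range(-40, 41)]
rmsd = math.sqrt(
    sum((irf_2pl(t, a=1.2, b=0.0) - irf_flexible(t)) ** 2 for t in grid) / len(grid)
)
```

A fitting item would yield an RMSD near zero; here the ignored lower asymptote produces a clearly nonzero distance, the kind of effect-size-like quantity the abstract describes.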

3.
The item response function (IRF) for a polytomously scored item is defined as a weighted sum of the item category response functions (ICRF, the probability of getting a particular score for a randomly sampled examinee of ability θ). This paper establishes the correspondence between an IRF and a unique set of ICRFs for two of the most commonly used polytomous IRT models (the partial credit models and the graded response model). Specifically, a proof of the following assertion is provided for these models: If two items have the same IRF, then they must have the same number of categories; moreover, they must consist of the same ICRFs. As a corollary, for the Rasch dichotomous model, if two tests have the same test characteristic function (TCF), then they must have the same number of items. Moreover, for each item in one of the tests, an item in the other test with an identical IRF must exist. Theoretical as well as practical implications of these results are discussed. This research was supported by Educational Testing Service Allocation Projects No. 79409 and No. 79413. The authors wish to thank John Donoghue, Ming-Mei Wang, Rebecca Zwick, and Zhiliang Ying for their useful comments and discussions. The authors also wish to thank three anonymous reviewers for their comments.
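For the graded response model, this correspondence can be made concrete: the ICRFs are differences of adjacent cumulative logistic curves, and the IRF is their score-weighted sum. A minimal sketch with hypothetical parameters:

```python
import math

def grm_icrfs(theta, a, bs):
    """Category response functions of the graded response model.
    bs: ordered category thresholds; returns P(score = k) for k = 0..len(bs)."""
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def irf(theta, a, bs):
    """IRF = weighted sum of ICRFs, with category scores as weights."""
    return sum(k * p for k, p in enumerate(grm_icrfs(theta, a, bs)))

probs = grm_icrfs(0.0, a=1.5, bs=[-1.0, 0.0, 1.0])
```

The uniqueness result says that this mapping from ICRFs to the IRF cannot collapse two different category structures onto the same expected-score curve.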

4.
Since item values obtained by item analysis procedures are not always stable from one situation to another, it follows that selection of items for validity or difficulty is sometimes useless. An application of Chi Square to testing homogeneity of item values is made, in the case of the UL method, and illustrative data are presented. A method of applying sampling theory to Horst's maximizing function is outlined, as illustrative of the author's observation that the results of item analysis by any of various methods may be similarly tested.
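The idea of testing whether an item statistic is stable across situations can be sketched with a 2×2 chi-square on one item's pass/fail counts in two administrations. This is a generic homogeneity test with made-up counts, not the paper's specific UL-method application.

```python
def chi2_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square for homogeneity of one item's difficulty in two samples."""
    n = pass_a + fail_a + pass_b + fail_b
    num = n * (pass_a * fail_b - fail_a * pass_b) ** 2
    den = ((pass_a + fail_a) * (pass_b + fail_b)
           * (pass_a + pass_b) * (fail_a + fail_b))
    return num / den

# hypothetical: item passed by 60/100 in sample A but only 45/100 in sample B
stat = chi2_2x2(60, 40, 45, 55)
unstable = stat > 3.84  # chi-square(1) critical value at alpha = .05
```

A significant statistic flags an item whose difficulty value should not be trusted to carry over to a new group.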

5.
Item difficulty effects in skill learning were examined by giving participants extensive training with repeated alphabet arithmetic problems that varied in addend size (e.g., C + D = ? is easy; C + J = ? is harder). Recognition memory for the items, as measured by interpolated recognition tests, was acquired early in training and was unaffected by item difficulty. Memory for the solutions to items, as measured by the participants’ strategy reports that they had retrieved, rather than computed, the solution, was acquired later and was affected by item difficulty. Solutions to easier items were learned earlier in training for both young adults (18–24 years) and older adults (60–75 years), superimposed on an overall lower level of solution learning in older participants. The results suggest that the formation of associations between problems and their solutions is effortful and shares limited processing resources with the computational demands of the problem.

6.
Conjunctive item response models are introduced such that (a) sufficient statistics for latent traits are not necessarily additive in item scores; (b) items are not necessarily locally independent; and (c) existing compensatory (additive) item response models including the binomial, Rasch, logistic, and general locally independent model are special cases. Simple estimates and hypothesis tests for conjunctive models are introduced and evaluated as well. Conjunctive models are also identified with cognitive models that assume the existence of several individually necessary component processes for a global ability. It is concluded that conjunctive models and methods may show promise for constructing improved tests and uncovering conjunctive cognitive structure. It is also concluded that conjunctive item response theory may help to clarify the relationships between local dependence, multidimensionality, and item response function form. I appreciate the many helpful suggestions that were given by the reviewers and Ivo Molenaar.

7.
For dual-objective CD-CAT, six item discrimination indices (discrimination power D, the general discrimination index GDI, the odds ratio OR, the 2PL discrimination parameter a, the attribute discrimination index ADI, and the cognitive diagnostic index CDI) were each combined with the IPA method to obtain new item selection strategies. A simulation study compared their performance and also examined how discrimination-stratified selection performed in controlling item exposure. The results showed that all the new methods markedly improved the accuracy of knowledge-state classification and the precision of ability estimation, and that stratified selection substantially improved item pool utilization. Overall, OR weighting significantly improved measurement precision, and OR-stratified selection significantly improved the uniformity of item exposure while maintaining measurement precision.

8.
A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF, for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.
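The flavor of such a likelihood-ratio test can be illustrated with a G² statistic comparing two groups' choice frequencies over an item's alternatives. This is a plain homogeneity test on invented counts, not the paper's incomplete latent class model.

```python
import math

def g_squared(obs_a, obs_b):
    """Likelihood-ratio G^2 for homogeneity of alternative-choice frequencies
    across two groups (lists of counts per alternative)."""
    tot_a, tot_b = sum(obs_a), sum(obs_b)
    tot = tot_a + tot_b
    g = 0.0
    for oa, ob in zip(obs_a, obs_b):
        col = oa + ob  # pooled count for this alternative
        if oa:
            g += 2 * oa * math.log(oa / (col * tot_a / tot))
        if ob:
            g += 2 * ob * math.log(ob / (col * tot_b / tot))
    return g

# hypothetical counts over three alternatives in a reference and a focal group
stat = g_squared([50, 30, 20], [30, 30, 40])
# df = number of alternatives - 1 = 2; chi-square(2) critical value at .05 is about 5.99
```

Here the focal group is drawn toward the third alternative, the kind of distractor-level DIF the abstract targets.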

9.
A plausible s-factor solution for many types of psychological and educational tests is one that exhibits a general factor and s − 1 group or method related factors. The bi-factor solution results from the constraint that each item has a nonzero loading on the primary dimension and at most one of the s − 1 group factors. This paper derives a bi-factor item-response model for binary response data. In marginal maximum likelihood estimation of item parameters, the bi-factor restriction leads to a major simplification of likelihood equations and (a) permits analysis of models with large numbers of group factors; (b) permits conditional dependence within identified subsets of items; and (c) provides more parsimonious factor solutions than an unrestricted full-information item factor analysis in some cases. Supported by the Cognitive Science Program, Office of Naval Research, under grant #N00014-89-J-1104. We would like to thank Darrell Bock for several helpful suggestions.
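The bi-factor restriction is easy to state as a check on a loading pattern: every item loads on the general factor (first column) and on at most one group factor. A toy pattern with made-up loadings:

```python
def is_bifactor(loadings):
    """True when each row has a nonzero general loading (column 0)
    and at most one nonzero group-factor loading (remaining columns)."""
    return all(
        row[0] != 0 and sum(1 for x in row[1:] if x != 0) <= 1
        for row in loadings
    )

pattern = [
    [0.7, 0.4, 0.0, 0.0],  # item 1: general + group 1
    [0.6, 0.3, 0.0, 0.0],  # item 2: general + group 1
    [0.5, 0.0, 0.5, 0.0],  # item 3: general + group 2
    [0.6, 0.0, 0.0, 0.2],  # item 4: general + group 3
]
violating = [[0.6, 0.3, 0.2, 0.0]]  # loads on two group factors
```

It is exactly this pattern that lets the likelihood factor so that only two-dimensional integration is ever needed, however many group factors there are.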

10.
The relation between item difficulty distributions and the validity and reliability of tests is computed through use of normal correlation surfaces for varying numbers of items and varying degrees of item intercorrelation. Optimal or near-optimal item difficulty distributions are thus identified for the various combinations considered. The results indicate that, if a test is of conventional length, is homogeneous as to content, and has a symmetrical distribution of item difficulties, correlation with a normally distributed perfect measure of the attribute common to the items does not vary appreciably with variation in the item difficulty distribution. Greater variation was evident in correlation with a second duplicate test (reliability). The general implications of these findings and their particular significance for evaluating techniques aimed at increasing reliability are considered.

11.
This study presents a psychometric evaluation of the Expanded Cognitive Reflection Test (CRT7) based on item response theory. The participants (N = 1204) completed the CRT7 and provided self-reported information about their cognitive styles through the Preference for Intuition and Deliberation Scale (PID). A two-parameter logistic model was fitted to the data to obtain the item difficulty and discrimination parameters of the CRT7. The results showed that the items had good discriminatory power (αs = .80 to 2.92), but the range of difficulty was restricted (βs ranged from −.60 to .32). Moreover, the CRT7 showed a pattern of correlations with the PID which was similar to that of the original CRT. When taken together, these results are evidence of the adequacy of the CRT7 as an expanded tool for measuring cognitive reflection; however, one of the newer items (the pig item) was consistently problematic across analyses, and so it is recommended that in future studies it should be removed from the CRT7.

12.
Assessing item fit for unidimensional item response theory models for dichotomous items has always been an issue of enormous interest, but there exists no unanimously agreed item fit diagnostic for these models, and hence there is room for further investigation of the area. This paper employs the posterior predictive model‐checking method, a popular Bayesian model‐checking tool, to examine item fit for the above‐mentioned models. An item fit plot, comparing the observed and predicted proportion‐correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p‐values (which are natural Bayesian p‐values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in the above‐mentioned item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques seem to have adequate power and reasonable Type I error rate, and psychometricians will find them promising.
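The raw ingredient of such an item fit plot, the observed proportion correct on one item within each raw-score group, can be sketched directly. The data below are simulated under a Rasch model with hypothetical difficulties; the posterior predictive machinery and the Orlando-Thissen statistics themselves are not reproduced.

```python
import math
import random
from collections import defaultdict

def simulate_rasch(n, difficulties, seed=11):
    """Simulate dichotomous responses under the Rasch model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.gauss(0, 1)
        data.append([
            1 if rng.random() < 1.0 / (1.0 + math.exp(-(theta - b))) else 0
            for b in difficulties
        ])
    return data

def prop_correct_by_raw_score(data, item):
    """Observed proportion correct on `item` within each raw-score group."""
    groups = defaultdict(lambda: [0, 0])
    for row in data:
        g = groups[sum(row)]
        g[0] += row[item]
        g[1] += 1
    return {score: c / n for score, (c, n) in sorted(groups.items())}

bs = [0.4 * i - 1.8 for i in range(10)]  # ten difficulties from -1.8 to 1.8
observed = prop_correct_by_raw_score(simulate_rasch(4000, bs), item=4)
```

Plotting `observed` against the same quantity computed from posterior predictive replicates gives the suggested diagnostic: a fitting item's observed curve stays inside the predictive band.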

13.
The purpose of this study is to find a formula that describes the relationship between item exposure parameters and item parameters in computerized adaptive tests by using genetic programming (GP) – a biologically inspired artificial intelligence technique. Based on the formula, item exposure parameters for new parallel item pools can be predicted without conducting additional iterative simulations. Results show that an interesting formula between item exposure parameters and item parameters in a pool can be found by using GP. The item exposure parameters predicted based on the found formula were close to those observed from the Sympson and Hetter (1985) procedure and performed well in controlling item exposure rates. Similar results were observed for the Stocking and Lewis (1998) multinomial model for item selection and the Sympson and Hetter procedure with content balancing. The proposed GP approach has provided a knowledge‐based solution for finding item exposure parameters.

14.
The fluency of information encoding has frequently been discussed as a major determinant of predicted memory performance indicated by judgements of learning (JOLs). Previous studies established encoding fluency effects on JOLs. However, it is largely unknown whether fluency takes effect above and beyond the effects of item difficulty. We therefore tested whether encoding fluency still affects JOLs when numerous additional cues indicating the difficulty of an item are available as well. In three experiments, participants made JOLs for another participant while observing his or her self-paced study phase. However, study times were swapped in one experimental condition, so that items with short study times (indicating high fluency) were presented for long durations, whereas items with long study times (indicating low fluency) were presented for short durations. Results showed that both item difficulty and encoding fluency affected JOLs. Thus, encoding fluency in itself is indeed an important cue for JOLs that does not become redundant when difficulty information is available in addition. This observation lends considerable support to the ease-of-processing hypothesis.

15.
Marginal maximum‐likelihood procedures for parameter estimation and testing the fit of a hierarchical model for speed and accuracy on test items are presented. The model is a composition of two first‐level models for dichotomous responses and response times along with multivariate normal models for their item and person parameters. It is shown how the item parameters can easily be estimated using Fisher's identity. To test the fit of the model, Lagrange multiplier tests of the assumptions of subpopulation invariance of the item parameters (i.e., no differential item functioning), the shape of the response functions, and three different types of conditional independence were derived. Simulation studies were used to show the feasibility of the estimation and testing procedures and to estimate the power and Type I error rate of the latter. In addition, the procedures were applied to an empirical data set from a computerized adaptive test of language comprehension.

16.
A new method, with an application program in Matlab code, is proposed for testing item performance models on empirical databases. This method uses data intraclass correlation statistics as expected correlations to which one compares simple functions of correlations between model predictions and observed item performance. The method rests on a data population model whose validity for the considered data is suitably tested and has been verified for three behavioural measure databases. Contrary to usual model selection criteria, this method provides an effective way of testing under-fitting and over-fitting, answering the usually neglected question "does this model suitably account for these data?"

17.
Generalized full-information item bifactor analysis
Cai L, Yang JS, Hansen M. Psychological Methods, 2011, 16(3), 221–248.
Full-information item bifactor analysis is an important statistical method in psychological and educational measurement. Current methods are limited to single-group analysis and inflexible in the types of item response models supported. We propose a flexible multiple-group item bifactor analysis framework that supports a variety of multidimensional item response theory models for an arbitrary mixing of dichotomous, ordinal, and nominal items. The extended item bifactor model also enables the estimation of latent variable means and variances when data from more than 1 group are present. Generalized user-defined parameter restrictions are permitted within or across groups. We derive an efficient full-information maximum marginal likelihood estimator. Our estimation method achieves substantial computational savings by extending Gibbons and Hedeker's (1992) bifactor dimension reduction method so that the optimization of the marginal log-likelihood requires only 2-dimensional integration regardless of the dimensionality of the latent variables. We use simulation studies to demonstrate the flexibility and accuracy of the proposed methods. We apply the model to study cross-country differences, including differential item functioning, using data from a large international education survey on mathematics literacy.

18.
The Informant Questionnaire on Cognitive Decline (IQCODE) is a formal informant report instrument, originally developed by Jorm and Jacomb (1989; Psychological Medicine, 19(4), 1015). The goal of the present study was to evaluate the range of cognitive decline in which the IQCODE is most sensitive, using item response theory (IRT). Existing data (N = 740) from a sample of community-dwelling older adults were used for this purpose. A 2-parameter model estimating item difficulty and discrimination fit the data best. Additionally, the IQCODE provided the most psychometric information in the range of −0.5 < θ < 1.5, with peak information obtained at approximately θ = 0.4. Based on individuals' latent score (θ) estimates, items on the IQCODE are adequate for use as a screening tool for dementia. Results of the item calibration may be useful for targeted assessment needs, such as the development of short forms.
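The notion of "peak information" can be illustrated with the 2PL item information function, I(θ) = a²P(θ)(1 − P(θ)), which is maximal at θ = b; summing over items gives the test information curve whose peak the abstract reports. The parameters below are hypothetical, not the IQCODE calibration.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P), maximal at theta = b."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def total_info(theta, items):
    """Test information = sum of item informations over (a, b) pairs."""
    return sum(item_info(theta, a, b) for a, b in items)

items = [(1.8, 0.2), (1.2, 0.4), (2.0, 0.6)]  # hypothetical (a, b) pairs
```

With difficulties clustered near θ = 0.4, the test information curve peaks there and falls off at high ability, mirroring the restricted measurement range reported above.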

19.
Although paper and pencil tests of employee honesty are becoming increasingly widespread in industry, a paucity of research exists regarding them. In a recent review of this literature, Sackett and Harris (1984) noted that scant psychometric evidence is available as to their merits or weaknesses. The aim of this paper is to report on the factor and item analysis of one such test. A principal axis solution and item response theory model (1-parameter) were used to examine the data. The factor analysis revealed four readily interpretable factors. With regard to the item analysis, the results indicated that on the whole most of the 40 items showed a reasonable fit to the model. The implications of this research are addressed.

20.
The repeated-testing paradigm is used to study both retroactive interference and hypermnesia (the improvement in memory across repeated tests). Considerable theoretical progress has been made by separately analyzing the 2 components of hypermnesia: the recovery of previously unrecalled items on later tests (item gains) and the forgetting of previously recalled items on later tests (item losses). Item gains increase with increases in item-specific processing, whereas item losses decrease with increases in relational processing. The authors suggest that separate analysis of item gains and losses in retroactive interference research may also prove fruitful. Three experiments showed that an interpolated list affects item gains but not losses, whereas processing similarity between the target and interpolated lists affects losses but not gains. These results are interpreted within the relational-item-specific processing framework.
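Counting the two components is straightforward set arithmetic over successive recall protocols; the sketch below uses invented recall data.

```python
def gains_and_losses(earlier, later):
    """Item gains: recalled on the later test but not the earlier one.
    Item losses: recalled earlier but not later.
    Net change (hypermnesia when positive) is gains minus losses."""
    gains = later - earlier
    losses = earlier - later
    return len(gains), len(losses), len(gains) - len(losses)

test1 = {"dog", "tree", "lamp"}          # items recalled on test 1
test2 = {"dog", "tree", "rose", "coin"}  # items recalled on test 2
g, l, net = gains_and_losses(test1, test2)
```

Analyzing `g` and `l` separately, rather than only their net, is what lets the two processing accounts above be dissociated.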
