Similar Articles
20 similar articles found (search time: 46 ms)
1.
The implications of Richardson's article on item analysis in the March 1936 issue of Psychometrika are examined in the light of multiple factor theory. It is shown that item analysis is a necessary, but not a sufficient, condition for the construction of a test that measures a single trait. The intercorrelations of certain items selected by a method of item analysis are examined and found to contain many zero and some negative correlations. Multiple factor analysis showed that eight traits were measured by items that had been asserted to measure only one.

2.
Assessing item fit for unidimensional item response theory models for dichotomous items has long been an issue of enormous interest, but no unanimously agreed-upon item fit diagnostic exists for these models, so there is room for further investigation of the area. This paper employs the posterior predictive model-checking method, a popular Bayesian model-checking tool, to examine item fit for the above-mentioned models. An item fit plot, comparing the observed and predicted proportion-correct scores of examinees with different raw scores, is suggested. This paper also suggests how to obtain posterior predictive p-values (which are natural Bayesian p-values) for the item fit statistics of Orlando and Thissen that summarize numerically the information in these item fit plots. A number of simulation studies and a real data application demonstrate the effectiveness of the suggested item fit diagnostics. The suggested techniques appear to have adequate power and a reasonable Type I error rate, and psychometricians should find them promising.
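A minimal sketch in Python of the kind of item fit plot described above: the observed proportion correct on one item is grouped by raw score and compared with a posterior predictive band. Everything here is simulated for illustration, a 2PL model is assumed, and the perturbed "posterior draws" merely stand in for output from a real Bayesian sampler; this is not the authors' code.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_persons, n_items, n_draws = 1000, 20, 200

# Simulate dichotomous 2PL data to play the role of the observed responses
theta = rng.normal(size=n_persons)
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(size=n_items)
x = rng.binomial(1, 1 / (1 + np.exp(-a * (theta[:, None] - b))))

item = 0
raw = x.sum(axis=1)
scores = np.arange(1, n_items)  # skip the uninformative all-wrong/all-right groups
obs = [x[raw == s, item].mean() if np.any(raw == s) else np.nan for s in scores]

# Stand-ins for posterior draws: the generating values plus noise. In a real
# analysis these would come from an MCMC sampler for the 2PL model.
theta_d = theta + rng.normal(0, 0.2, (n_draws, n_persons))
a_d = a[item] + rng.normal(0, 0.05, n_draws)
b_d = b[item] + rng.normal(0, 0.05, n_draws)

# Posterior predictive step: re-generate the item response under each draw and
# recompute the proportion correct within each raw-score group
pred = np.full((n_draws, len(scores)), np.nan)
for d in range(n_draws):
    x_rep = rng.binomial(1, 1 / (1 + np.exp(-a_d[d] * (theta_d[d] - b_d[d]))))
    for j, s in enumerate(scores):
        mask = raw == s
        if mask.any():
            pred[d, j] = x_rep[mask].mean()

lo, mid, hi = np.nanpercentile(pred, [5, 50, 95], axis=0)
plt.plot(scores, obs, "ko-", label="observed")
plt.plot(scores, mid, "b-", label="predictive median")
plt.fill_between(scores, lo, hi, alpha=0.2, label="90% predictive band")
plt.xlabel("raw score")
plt.ylabel("proportion correct on the item")
plt.legend()
plt.show()
```

An observed curve wandering outside the predictive band signals item misfit; the posterior predictive p-values mentioned in the abstract summarize such discrepancies numerically.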

3.
For item responses fitting the Rasch model, the assumptions underlying the Mokken model of double monotonicity are met. This makes non-parametric item response theory a natural starting-point for Rasch item analysis. This paper studies scalability coefficients based on Loevinger's H coefficient, which summarizes the number of Guttman errors in the data matrix. These coefficients are shown to yield efficient tests of the Rasch model, with p-values computed by Markov chain Monte Carlo methods. The power of the tests of unequal item discrimination, and their ability to distinguish between local dependence and unequal item discrimination, are discussed. The methods are illustrated and motivated using a simulation study and a real data example.
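A small sketch, assuming dichotomous data, of the Loevinger H coefficient that these scalability tests build on: Guttman errors (passing a harder item while failing an easier one) are counted and compared with their expectation under independence. The Markov chain Monte Carlo p-values discussed in the abstract would compare such a coefficient against its distribution under the Rasch model; only the coefficient itself is shown here.

```python
import numpy as np

def loevinger_H(X):
    """Overall scalability: H = 1 - (observed / expected Guttman errors)."""
    n, k = X.shape
    X = X[:, np.argsort(-X.mean(axis=0))]   # reorder: easiest (most popular) item first
    F = E = 0.0
    for i in range(k - 1):
        for j in range(i + 1, k):
            # Guttman error: failing the easier item i while passing the harder item j
            F += np.sum((X[:, i] == 0) & (X[:, j] == 1))
            E += n * (1 - X[:, i].mean()) * X[:, j].mean()  # expectation under independence
    return 1.0 - F / E

# Rasch-generated data should give a clearly positive H
rng = np.random.default_rng(0)
theta = rng.normal(size=500)
b = np.linspace(-1.5, 1.5, 8)
X = (rng.random((500, 8)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)
print(f"H = {loevinger_H(X):.3f}")
```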

4.
Item responses that do not fit an item response theory (IRT) model may cause the latent trait value to be inaccurately estimated. In the past two decades several statistics have been proposed that can be used to identify nonfitting item score patterns. These statistics all yield scalar values. Here, the use of the person response function (PRF) for identifying nonfitting item score patterns was investigated. The PRF is a function and can be used for diagnostic purposes. First, the PRF is defined in a class of IRT models that imply an invariant item ordering. Second, a person-fit method proposed by Trabin and Weiss (1983) is reformulated in a nonparametric IRT context assuming invariant item ordering, and statistical theory proposed by Rosenbaum (1987a) is adapted to test locally whether a PRF is nonincreasing. Third, a simulation study was conducted to compare the use of the PRF with the person-fit statistic ZU3. It is concluded that the PRF can be used as a diagnostic tool in person-fit research. The authors are grateful to Coen A. Bernaards for preparing the figures used in this article, and to Wilco H.M. Emons for checking the calculations.
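A rough sketch of an estimated person response function in the spirit of Trabin and Weiss (1983): items are ordered by difficulty, binned into subsets, and the person's proportion correct is recorded per subset. Under a model with an invariant item ordering the estimated PRF should be nonincreasing; the function and variable names below are illustrative assumptions, not taken from the article.

```python
import numpy as np

def person_response_function(responses, difficulty, n_bins=5):
    """Proportion correct per difficulty-ordered item subset (easiest bin first)."""
    ordered = responses[np.argsort(difficulty)]
    return np.array([chunk.mean() for chunk in np.array_split(ordered, n_bins)])

rng = np.random.default_rng(2)
difficulty = np.linspace(-2, 2, 20)
# A fitting person with theta = 0 under a Rasch-type model ...
fitting = (rng.random(20) < 1 / (1 + np.exp(difficulty))).astype(int)
# ... and an aberrant pattern: fails the easy items, passes the hard ones
aberrant = np.r_[np.zeros(10, int), np.ones(10, int)]

print(person_response_function(fitting, difficulty))   # roughly nonincreasing
print(person_response_function(aberrant, difficulty))  # increasing: flagged as nonfitting
```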

5.
The Campbell Development Surveys™ constitute an integrated battery of five surveys covering organizational satisfaction, leadership characteristics, interests and skills, team morale, and community life. All of the surveys have a common set of features: identical item formats (although different contents), homogeneous scoring scales, standard T scores, procedural checks, and similar profile reports. Each survey is described; the normative process is reported, and illustrative data for a variety of samples are presented. Comments are made on the survey development process, and applications of these surveys are discussed.

6.
C. T. Fan, Psychometrika, 1954, 19(3): 231–237
This paper describes the construction of a new item analysis table for the high-low-27-per-cent group method. The table provides a ready means of translating the observed proportions of success in the two extreme groups (p_H, p_L) into measures of item difficulty and item discrimination (p, Δ, and r). The tabled values of both the difficulty index, p, and the discrimination index, r, have been derived from Karl Pearson's tables of the normal bivariate surface.
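A hedged sketch of the quantities behind the high-low-27-per-cent method: the observed proportions correct in the two extreme groups give p_H and p_L, their average estimates the difficulty index p, and p_H − p_L gives a simple discrimination index. The Δ scale shown (13 + 4z) is the familiar ETS convention, an assumption of this sketch; the r values in Fan's table come from Pearson's normal bivariate surface and are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def high_low_27(item_scores, total_scores):
    cut = round(0.27 * len(total_scores))
    order = np.argsort(total_scores)
    p_L = item_scores[order[:cut]].mean()    # proportion correct in the low 27%
    p_H = item_scores[order[-cut:]].mean()   # proportion correct in the high 27%
    p = (p_H + p_L) / 2                      # difficulty index
    delta = 13 + 4 * norm.ppf(1 - p)         # ETS delta scale (sketch assumption)
    D = p_H - p_L                            # simple discrimination index
    return p_H, p_L, p, delta, D

rng = np.random.default_rng(3)
total = rng.normal(size=400)
item = (rng.random(400) < norm.cdf(total)).astype(int)  # item correlated with total
print(high_low_27(item, total))
```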

7.
Cluster Analysis for Cognitive Diagnosis: Theory and Applications (total citations: 3; self-citations: 0; citations by others: 3)
Latent class models for cognitive diagnosis often begin with the specification of a matrix that indicates which attributes or skills are needed for each item. By imposing restrictions that take this matrix into account, along with a theory governing how subjects interact with items, parametric formulations of item response functions are derived and fitted. Cluster analysis provides an alternative approach that does not require specifying an item response model, although it still requires an item-by-attribute matrix. After summarizing the data with a particular vector of sum-scores, K-means cluster analysis or hierarchical agglomerative cluster analysis can be applied to cluster subjects who possess the same skills. Asymptotic classification accuracy results are given, along with simulations comparing the effects of test length and method of clustering. An application to a language examination illustrates how the methods can be implemented in practice.

8.
The Occupational Value with Predefined Items scale is a Swedish tool, originally comprising 26 items, used to assess the values people find in their everyday doings. The present study validated this scale on a Turkish sample and described the values that Turks perceived in their daily doings. The participants were a convenience sample of 446 adults with a mean age of 26 (SD = 7.3). Initial item analysis, followed by principal component analysis (Promax rotation) and internal reliability analyses of the components, was conducted. The analyses yielded a 19-item solution distributed across four factors. Cronbach's alpha was .86, indicating good reliability. Confirming earlier applications of the scale in European and American samples, factors related to recuperation, goal direction, and social interaction emerged. In addition, another occupational value subfactor, conservation, emerged that had not appeared in the Swedish and American data analyses.

9.
10.
A method of Guttman scalogram analysis is presented that does not involve sorting and rearranging the entries in the item response matrix. The method requires dichotomous items. Formulas are presented for estimating the reproducibility of the scale and for estimating the expected value of the chance reproducibility. An index of consistency is suggested for evaluating the reproducibility. An illustrative example is presented in detail, and the logical basis of the method is discussed. Finally, several methods are suggested for dealing with non-dichotomous items. Lois K. Anderson assisted the author materially in the many computations required for this paper. The research reported in this paper was supported in part by the Department of Economics and Social Sciences at M.I.T. and in part, jointly, by the Army, Navy and Air Force under contract with the Massachusetts Institute of Technology.
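A brief sketch of the reproducibility coefficient itself, for orientation (the paper's contribution is computing it without sorting and rearranging the matrix, which this sketch does not attempt): order items by popularity, compare each response pattern with its ideal Guttman pattern, and count deviations. Error-counting conventions vary, so the exact value depends on the convention chosen.

```python
import numpy as np

def reproducibility(X):
    n, k = X.shape
    X = X[:, np.argsort(-X.mean(axis=0))]    # order items from most to least popular
    raw = X.sum(axis=1)
    # Ideal Guttman pattern for raw score s: pass the s easiest items, fail the rest
    ideal = (np.arange(k) < raw[:, None]).astype(int)
    return 1 - np.sum(X != ideal) / (n * k)

X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0]])                 # the last pattern breaks the scale
print(reproducibility(X))                    # 1 - 2/12 = 0.833...
```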

11.
To prevent response bias, personality questionnaires may use comparative response formats. These include forced choice, where respondents choose among a number of items, and quantitative comparisons, where respondents indicate the extent to which items are preferred to each other. The present article extends Thurstonian modeling of binary choice data to "proportion-of-total" (compositional) formats. Following the seminal work of Aitchison, compositional item data are transformed into log ratios, conceptualized as differences of latent item utilities. The mean and covariance structure of the log ratios is modeled using confirmatory factor analysis (CFA), where the item utilities are first-order factors, and personal attributes measured by a questionnaire are second-order factors. A simulation study with two sample sizes, N = 300 and N = 1,000, shows that the method provides very good recovery of true parameters and near-nominal rejection rates. The approach is illustrated with empirical data from N = 317 students, comparing model parameters obtained with compositional and Likert-scale versions of a Big Five measure. The results show that the proposed model successfully captures the latent structures and person scores on the measured traits.
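A minimal sketch of the log-ratio step described above, following Aitchison: each "proportion-of-total" block is mapped to additive log ratios against the last item in the block, which the model then treats as differences of latent item utilities. The zero-handling floor is an assumption of this sketch, not a prescription from the article.

```python
import numpy as np

def additive_log_ratio(P, floor=0.01):
    P = np.clip(P, floor, None)
    P = P / P.sum(axis=1, keepdims=True)          # renormalize after flooring zeros
    return np.log(P[:, :-1]) - np.log(P[:, -1:])  # k-1 log ratios per block

# Two respondents allocating 100% across one three-item block
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6]])
print(additive_log_ratio(P))   # rows enter the CFA as differences of item utilities
```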

12.
A plausible s-factor solution for many types of psychological and educational tests is one that exhibits a general factor and s − 1 group or method related factors. The bi-factor solution results from the constraint that each item has a nonzero loading on the primary dimension and at most one of the s − 1 group factors. This paper derives a bi-factor item-response model for binary response data. In marginal maximum likelihood estimation of item parameters, the bi-factor restriction leads to a major simplification of likelihood equations and (a) permits analysis of models with large numbers of group factors; (b) permits conditional dependence within identified subsets of items; and (c) provides more parsimonious factor solutions than an unrestricted full-information item factor analysis in some cases. Supported by the Cognitive Science Program, Office of Naval Research, under grant #N00014-89-J-1104. We would like to thank Darrell Bock for several helpful suggestions.

13.
Replenishing item pools for on-line ability testing requires innovative and efficient data collection designs. By generating local D-optimal designs for selecting individual examinees, and by consistently estimating item parameters in the presence of error in the design points, sequential procedures are efficient for on-line item calibration. The estimation error in the on-line ability values is accounted for with an item parameter estimate studied by Stefanski and Carroll. Locally D-optimal n-point designs are derived using the branch-and-bound algorithm of Welch. In simulations, the overall sequential designs appear to be considerably more efficient than random seeding of items. This report was prepared under the Navy Manpower, Personnel, and Training R&D Program of the Office of the Chief of Naval Research under Contract N00014-87-0696. The authors wish to acknowledge the valuable advice and consultation given by Ronald Armstrong, Charles Davis, Bradford Sympson, Zhaobo Wang, Ing-Long Wu and three anonymous reviewers.

14.
Computerized adaptive testing (CAT) was originally proposed to measure θ, usually a latent trait, with greater precision by sequentially selecting items according to the student's responses to previously administered items. Although the application of CAT is promising for many educational testing programs, most current CAT systems were not designed to provide diagnostic information. This article discusses item selection strategies specifically tailored for cognitive diagnostic tests. Our goal is to identify an effective item selection algorithm that not only estimates θ efficiently, but also classifies the student's knowledge status α accurately. A single-stage item selection method with a dual purpose is introduced. The main idea is to treat diagnostic criteria as constraints: using the maximum priority index method to meet these constraints, the CAT system can generate cognitive diagnostic feedback in a fairly straightforward fashion. Different priority functions are proposed; some are based on information measures, such as Kullback–Leibler information, while others use only the information provided by the Q-matrix. An extensive simulation study is conducted, and the results indicate that the information-based method not only yields higher classification rates for cognitive diagnosis, but also achieves more accurate θ estimation. Other constraint controls, such as item exposure rates, are also considered for all the competing methods.
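A simplified sketch of priority-index item selection in the spirit described above: each eligible item's information is down-weighted by the remaining slack of the content constraint it belongs to, so constrained areas fill gradually. Fisher information at the current θ estimate stands in for the diagnostic (e.g., Kullback–Leibler) measures the article studies, and one constraint per item is assumed for brevity; none of these choices is taken from the article itself.

```python
import numpy as np

def next_item(theta_hat, a, b, administered, constraint_of, quota, given):
    p = 1 / (1 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1 - p)                # 2PL Fisher information at theta_hat
    slack = (quota - given) / quota          # remaining room per content constraint
    priority = info * slack[constraint_of]   # down-weight items in nearly full areas
    priority[list(administered)] = -np.inf   # never re-administer an item
    return int(np.argmax(priority))

a = np.ones(6)
b = np.linspace(-1, 1, 6)
sel = next_item(0.0, a, b, administered={2},
                constraint_of=np.array([0, 0, 0, 1, 1, 1]),  # items 0-2 area A, 3-5 area B
                quota=np.array([3, 3]), given=np.array([1, 0]))
print(sel)  # picks an informative item from the less-filled content area
```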

15.
While the Angoff (1971) procedure is a commonly used cut score method, critics (Berk, 1996; Impara & Plake, 1997) argue that it places excessively high cognitive demands on raters. In response, a number of modifications to the method have been proposed, including using an iterative rating process, presenting judges with normative data about item performance, reducing the rating judgment to a Yes/No decision, assigning relative weights to dimensions within a test, and using item response theory to set cut scores. In this study, subject matter expert raters were provided with a 'difficulty anchored' rating scale to use while making Angoff ratings; this scale can be viewed as a variation of the Angoff normative data modification. The rating scale presented test items having known p-values as anchors, and served as a simple means of providing normative information to guide the Angoff rating process. Results are discussed regarding the reliability of the mean Angoff rating (.73) and the correlation of mean Angoff ratings with item difficulty (observed r ranges from .65 to .73).

16.
Content balancing is often required in the development and implementation of computerized adaptive tests (CATs). In the current study, we propose a modified a-stratified method, the a-stratified method with content blocking. As a further refinement of a-stratified CAT designs, the new method incorporates content specifications into item pool stratification. Simulation studies were conducted to compare the new method with three previous item selection methods: the a-stratified method; the a-stratified with b-blocking method; and the maximum Fisher information method with Sympson-Hetter exposure control. The results indicated that the refined a-stratified design performed well in reducing item overexposure rates, balancing item usage within the pool, and maintaining measurement precision, in a situation where all four procedures were forced to balance content.

17.
A method is proposed for the detection of item bias with respect to observed or unobserved subgroups. The method uses quasi-loglinear models for the incomplete subgroup × test score × item 1 × ... × item k contingency table. If subgroup membership is unknown, the models are Haberman's incomplete-latent-class models. The (conditional) Rasch model is formulated as a quasi-loglinear model. The parameters in this loglinear model that correspond to the main effects of the item responses are the conditional estimates of the parameters in the Rasch model. Item bias can then be tested by comparing the quasi-loglinear Rasch model with models that contain parameters for the interaction of item responses and the subgroups. The author thanks Wim J. van der Linden and Gideon J. Mellenbergh for comments and suggestions and Frank Kok for empirical data.

18.
This paper proposes a structural analysis for generalized linear models when some explanatory variables are measured with error and the measurement error variance is a function of the true variables. The focus is on latent variables investigated on the basis of questionnaires and estimated using item response theory models. Latent variable estimates are then treated as observed measures of the true variables. This leads to a two-stage estimation procedure which constitutes an alternative to a joint model for the outcome variable and the responses given to the questionnaire. Simulation studies explore the effect of ignoring the true error structure and the performance of the proposed method. Two illustrative examples concern achievement data of university students. Particular attention is given to the Rasch model.

19.
The test-retest reliability of qualitative items, such as occur in achievement tests, attitude questionnaires, public opinion surveys, and elsewhere, requires a different technique of analysis from that of quantitative variables. Definitions appropriate to the qualitative case are made both for the reliability coefficient of an individual on an item and for the reliability coefficient of a population on the item. From but a single trial of a large population on the item, it is possible to compute a lower bound to the group reliability coefficient. Two kinds of lower bounds are presented. From two experimentally independent trials of the population on the item, it is possible to compute an upper bound to the group reliability coefficient. Two upper bounds are presented. The computations for the lower and upper bounds are all very simple. Numerical examples are given.

20.
This article describes the functions of a SAS macro and an SPSS syntax file that produce common statistics for conventional item analysis, including Cronbach's alpha, the item difficulty index (p-value or item mean), and item discrimination indices (the D-index, point-biserial and biserial correlations for dichotomous items, and the item-total correlation for polytomous items). These programs improve on the existing SAS and SPSS item analysis routines in completeness and user-friendliness. To promote routine evaluation of item quality in instrument development, the programs are available at no charge to interested users. The program code, along with a brief user's manual containing instructions and examples, is downloadable from suen.ed.psu.edu/~pwlei/plei.htm.
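The SAS macro and SPSS syntax themselves are not reproduced here; as a rough Python equivalent, the sketch below computes three of the reported statistics for dichotomous items: Cronbach's alpha, the item difficulty index (p-value), and a corrected point-biserial discrimination (each item correlated with the total of the remaining items).

```python
import numpy as np

def item_analysis(X):
    n, k = X.shape
    total = X.sum(axis=1)
    alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1))
    p = X.mean(axis=0)                        # item difficulty (p-value)
    # Corrected point-biserial: each item against the total of the remaining items
    r_pb = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(k)])
    return alpha, p, r_pb

rng = np.random.default_rng(4)
theta = rng.normal(size=300)
diffs = np.linspace(-1, 1, 10)
X = (rng.random((300, 10)) < 1 / (1 + np.exp(-(theta[:, None] - diffs)))).astype(int)
alpha, p, r_pb = item_analysis(X)
print(f"alpha = {alpha:.2f}")
```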
