1.

Analysis of distractor difficulty in multiple-choice items

Javier Revuelta 《Psychometrika》2004,69(2):217-234

Two psychometric models are presented for evaluating the difficulty of the distractors in multiple-choice items. They are based on the criterion of rising distractor selection ratios, which facilitates interpretation of the subject and item parameters. Statistical inferential tools are developed in a Bayesian framework: modal a posteriori estimation by application of an EM algorithm and model evaluation by monitoring posterior predictive replications of the data matrix. An educational example with real data is included to exemplify the application of the models and compare them with the nominal categories model. This research was supported by the DGI grant BSO2002-01485. I would like to thank Eric Maris and Vicente Ponsoda for their advice, Juan Botella for providing the data for the empirical application, and three anonymous reviewers for their comments that were essential for improving the quality of the paper.

2.

Gaussian model‐based partitioning using iterated local search

Michael J. Brusco Emilie Shireman Douglas Steinley Susan Brudvig J. Dennis Cradit 《The British journal of mathematical and statistical psychology》2017,70(1):1-24

The emergence of Gaussian model‐based partitioning as a viable alternative to K‐means clustering fosters a need for discrete optimization methods that can be efficiently implemented using model‐based criteria. A variety of alternative partitioning criteria have been proposed for more general data conditions that permit elliptical clusters, different spatial orientations for the clusters, and unequal cluster sizes. Unfortunately, many of these partitioning criteria are computationally demanding, which makes the multiple‐restart (multistart) approach commonly used for K‐means partitioning less effective as a heuristic solution strategy. As an alternative, we propose an approach based on iterated local search (ILS), which has proved effective in previous combinatorial data analysis contexts. We compared multistart, ILS and hybrid multistart–ILS procedures for minimizing a very general model‐based criterion that assumes no restrictions on cluster size or within‐group covariance structure. This comparison, which used 23 data sets from the classification literature, revealed that the ILS and hybrid heuristics generally provided better criterion function values than the multistart approach when all three methods were constrained to the same 10‐min time limit. In many instances, these differences in criterion function values reflected profound differences in the partitions obtained.

3.

Mixed-effects analyses of rank-ordered data

Ulf Böckenholt 《Psychometrika》2001,66(1):45-62

4.

A comparison of three simple test theory models

J. O. Ramsay 《Psychometrika》1989,54(3):487-499

In very simple test theory models such as the Rasch model, a single parameter is used to represent the ability of any examinee or the difficulty of any item. Simple models such as these provide very important points of departure for more detailed modeling when a substantial amount of data are available, and are themselves of real practical value for small or even medium samples. They can also serve a normative role in test design. As an alternative to the Rasch model, or the Rasch model with a correction for guessing, a simple model is introduced which characterizes strength of response in terms of the ratio of ability and difficulty parameters rather than their difference. This model provides a natural account of guessing, and has other useful things to contribute as well. It also offers an alternative to the Rasch model with the usual correction for guessing. The three models are compared in terms of statistical properties and fits to actual data. The goal of the paper is to widen the range of minimal models available to test analysts. This research was supported by grant AP320 from the Natural Sciences and Engineering Research Council of Canada. The author is grateful for discussions with M. Abrahamowicz, I. Molenaar, D. Thissen, and H. Wainer.
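To make the difference-versus-ratio contrast concrete, here is a minimal sketch. The Rasch curve is the standard logistic form. The abstract does not state the functional form of the ratio model, so `ratio_prob` is an assumed illustration only: a curve that depends on ability divided by difficulty and falls to the blind-guessing rate when ability is zero. The parameter `n_distractors` is likewise a hypothetical name.

```python
import math


def rasch_prob(theta, b):
    """Rasch model: success probability depends on the difference theta - b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ratio_prob(theta, d, n_distractors):
    """Assumed ratio-based curve: depends on theta / d (theta >= 0, d > 0).

    At theta = 0 it equals 1 / (n_distractors + 1), the blind-guessing rate.
    This form is an illustration, not taken from the abstract.
    """
    z = math.exp(theta / d)
    return z / (n_distractors + z)
```

In this sketch the ratio curve has a built-in floor at the guessing rate, whereas the Rasch curve tends to zero for very low ability unless a separate guessing correction is added.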

5.

Model based clustering of large data sets: Tracing the development of spelling ability

Herbert Hoijtink Annelise Notenboom 《Psychometrika》2004,69(3):481-498

There are two main theories with respect to the development of spelling ability: the stage model and the model of overlapping waves. In this paper exploratory model based clustering will be used to analyze the responses of more than 3500 pupils to subsets of 245 items. To evaluate the two theories, the resulting clusters will be ordered along a developmental dimension using an external criterion. Solutions for three statistical problems will be given: (1) an algorithm that can handle large data sets and only renders non-degenerate clusters; (2) a goodness of fit test that is not affected by the fact that the number of possible response vectors by far outweighs the number of observed response vectors; and (3) a new technique, data expunction, that can be used to evaluate goodness-of-fit tests if the missing data mechanism is known. Research supported by a grant (NWO 411-21-006) of the Dutch Organization for Scientific Research.

6.

The many null distributions of person fit indices (total citations: 1; self-citations: 0; citations by others: 1)

Ivo W. Molenaar Herbert Hoijtink 《Psychometrika》1990,55(1):75-106

This paper deals with the situation of an investigator who has collected the scores of n persons to a set of k dichotomous items, and wants to investigate whether the answers of all respondents are compatible with the one parameter logistic test model of Rasch. Contrary to the standard analysis of the Rasch model, where all persons are kept in the analysis and badly fitting items may be removed, this paper studies the alternative model in which a small minority of persons has an answer strategy not described by the Rasch model. Such persons are called anomalous or aberrant. From the response vectors consisting of k symbols each equal to 0 or 1, it is desired to classify each respondent as either anomalous or as conforming to the model. As this model is probabilistic, such a classification will possibly involve false positives and false negatives. Both for the Rasch model and for other item response models, the literature contains several proposals for a person fit index, which expresses for each individual the plausibility that his/her behavior follows the model. The present paper argues that such indices can only provide a satisfactory solution to the classification problem if their statistical distribution is known under the null hypothesis that all persons answer according to the model. This distribution, however, turns out to be rather different for different values of the person's latent trait value. This value will be called ability parameter, although our results are equally valid for Rasch scales measuring other attributes. As the true ability parameter is unknown, one can only use its estimate in order to obtain an estimated person fit value and an estimated null hypothesis distribution. The paper describes three specifications for the latter: assuming that the true ability equals its estimate, integrating across the ability distribution assumed for the population, and conditioning on the total score, which is in the Rasch model the sufficient statistic for the ability parameter. Classification rules for aberrance will be worked out for each of the three specifications. Depending on test length, item parameters and desired accuracy, they are based on the exact distribution, its Monte Carlo estimate and a new and promising approximation based on the moments of the person fit statistic. Results for the likelihood person fit statistic are given in detail; the methods could also be applied to other fit statistics. A comparison of the three specifications results in the recommendation to condition on the total score, as this avoids some problems of interpretation that affect the other two specifications. The authors express their gratitude to the reviewers and to many colleagues for comments on an earlier version.
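The recommended specification, conditioning on the total score, can be sketched for a short test by brute-force enumeration. This is a hedged illustration, not the authors' procedure. It relies on a standard Rasch property: given total score r, the probability of a response pattern is proportional to exp(-sum of the difficulties of the items answered correctly), so the ability parameter drops out and that sum is the only pattern-dependent part of the log-likelihood. The function names and the item difficulties `b` are hypothetical.

```python
import math
from itertools import combinations


def conditional_null(b, r):
    """Return [(statistic, probability)] over all patterns with total score r.

    The statistic is minus the summed difficulty of the correct items;
    under the Rasch model the conditional probability is proportional
    to exp(statistic).
    """
    stats = [-sum(b[i] for i in idx) for idx in combinations(range(len(b)), r)]
    weights = [math.exp(s) for s in stats]
    total = sum(weights)
    return [(s, w / total) for s, w in zip(stats, weights)]


def person_fit_p_value(x, b):
    """Probability, given the total score, of a pattern at least as unlikely as x."""
    r = sum(x)
    observed = -sum(bi for xi, bi in zip(x, b) if xi == 1)
    # Small tolerance so patterns tied with the observed one are counted.
    return sum(p for s, p in conditional_null(b, r) if s <= observed + 1e-12)
```

A respondent who answers only the hardest items correctly receives a small value, while one who answers only the easiest items correctly receives a value of 1. Enumeration grows combinatorially with test length, which is consistent with the abstract's mention of Monte Carlo estimates and moment-based approximations for longer tests.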

7.

Identifiability and Equivalence of GLLIRM Models

Javier Revuelta 《Psychometrika》2009,74(2):257-272

The generalized logit–linear item response model (GLLIRM) is a linearly constrained nominal categories model (NCM) that computes the scale and intercept parameters for categories as a weighted sum of basic parameters. This paper addresses the problems of the identifiability of the basic parameters and the equivalence between different GLLIRM models. It is shown that the identifiability of the basic parameters depends on the size and rank of the coefficient matrix of the linear functions. Moreover, two models are observationally equivalent if the product of the respective coefficient matrices has full column rank. Finally, the paper also explores the relations between the parameters of nested models. I would like to express my gratitude to the editor and three anonymous reviewers for their helpful suggestions on earlier versions of the paper. This work was supported by the Comunidad de Madrid (Spain) grant: CCG07-UAM/ESP-1615.

8.

Polytomous multilevel testlet models for testlet‐based assessments with complex sampling designs

Hong Jiao Yuan Zhang 《The British journal of mathematical and statistical psychology》2015,68(1):65-83

Applications of standard item response theory models assume local independence of items and persons. This paper presents polytomous multilevel testlet models for dual dependence due to item and person clustering in testlet‐based assessments with clustered samples. Simulation and survey data were analysed with a multilevel partial credit testlet model. This model was compared with three alternative models, namely a testlet partial credit model (PCM), a multilevel PCM, and the PCM, in terms of model parameter estimation. The results indicated that the deviance information criterion was the fit index that always correctly identified the true multilevel testlet model based on the quantified evidence in model selection, while the Akaike and Bayesian information criteria could not identify the true model. In general, the estimation model and the magnitude of item and person clustering impacted the estimation accuracy of ability parameters, while only the estimation model and the magnitude of item clustering affected the item parameter estimation accuracy. Furthermore, ignoring item clustering effects produced higher total errors in item parameter estimates but did not have much impact on the accuracy of ability parameter estimates, while ignoring person clustering effects yielded higher total errors in ability parameter estimates but did not have much effect on the accuracy of item parameter estimates. When both clustering effects were ignored in the PCM, item and ability parameter estimation accuracy was reduced.

9.

An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic

Hotaka Maeda Bo Zhang 《International Journal of Testing》2017,17(1):55-73

The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected copied responses through probability sampling and bootstrapping. In doing so, the bias in copier ability estimation will be determined and used to update the ability estimate for calculating the modified omega (ω^m), a new statistic based on the ω. The performance of ω^m and ω were compared in a Monte Carlo simulation study under 40 typical testing conditions (2 test lengths × 4 sample sizes × 5 levels of copying). In almost all conditions, the ω^m had the same or better controlled Type I error and higher power than ω. The increase in power was particularly evident when the source's estimated ability was higher than the copier's and when 20% or 30% of items were copied. These findings support the use of the ω^m as a replacement for ω to detect answer copying in multiple choice exams.
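For orientation, the sketch below shows ω in the form it is usually defined: the number of answers the copier shares with the source, standardized by its model-implied expectation and variance. The abstract does not give the formula, so treat this as an assumption. The per-item match probabilities `p_match`, which would normally come from an item response model evaluated at the copier's estimated ability, are taken as given inputs with a hypothetical name. The proposed ω^m algorithm itself (sampling, bootstrapping, bias correction) is not reproduced.

```python
import math


def omega(copier, source, p_match):
    """Standardized excess of matching answers between copier and source.

    p_match[i] is the assumed probability that the copier would select the
    source's answer on item i without copying.
    """
    observed = sum(1 for c, s in zip(copier, source) if c == s)
    expected = sum(p_match)
    variance = sum(p * (1.0 - p) for p in p_match)
    return (observed - expected) / math.sqrt(variance)
```

Because `p_match` depends on the copier's ability estimate, an estimate inflated by copied answers raises the expected number of matches and shrinks ω, which is the contamination problem the abstract describes.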

10.

Flexible Computerized Adaptive Tests to Detect Misconceptions and Estimate Ability Simultaneously

Yu Bao Yawei Shen Shiyu Wang Laine Bradshaw 《Applied Psychological Measurement》2021,45(1):3

The Scaling Individuals and Classifying Misconceptions (SICM) model is an advanced psychometric model that can provide feedback on examinees’ misconceptions and a general ability simultaneously. These two types of feedback are represented by a discrete and a continuous latent variable, respectively, in the SICM model. The complex structure of the SICM model brings difficulties in estimating both misconception profile and ability efficiently in a linear test. To overcome this challenge, this study proposes a flexible computerized adaptive test (FCAT) design as a new test delivery method to increase test efficiency by administering an individualized test to examinees. We propose three item selection methods and two transition criteria to determine adaptive steps based on the needs of estimating one or two latent variables. Through two simulation studies, we demonstrate how to select an appropriate item selection method for an adaptive step and what transition criterion should be used between two adaptive steps. Results reveal that the combination of the item selection method and the transition criterion could improve the estimation accuracy of a specific latent variable to a different extent and thus provide further guidance in designing an FCAT.

11.

Discrepancy Risk Model Selection Test theory for comparing possibly misspecified or nonnested models

R. M. Golden 《Psychometrika》2003,68(2):229-249

12.

Bayesian analysis of order-statistics models for ranking data (total citations: 1; self-citations: 0; citations by others: 1)

Philip L. H. Yu 《Psychometrika》2000,65(3):281-299

In this paper, a class of probability models for ranking data, the order-statistics models, is investigated. We extend the usual normal order-statistics model into one where the underlying random variables follow a multivariate normal distribution. A Bayesian approach and the Gibbs sampling technique are used for parameter estimation. In addition, methods to assess the adequacy of model fit are introduced. Robustness of the model is studied by considering a multivariate-t distribution. The proposed method is applied to analyze the presidential election data of the American Psychological Association (APA). The author is grateful to K. Lam, K.F. Lam, the Editor, an associate editor, and three reviewers for their valuable comments and suggestions. This research was substantially supported by the CRCG grant 335/017/0015 of the University of Hong Kong and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 7169/98H). Upon completion of this paper, I became aware that similar work had been done independently by K.G. Yao and U. Böckenholt (1999).

13.

Latent variable selection in multidimensional item response theory models using the expectation model selection algorithm

Ping-Feng Xu Laixu Shang Qian-Zhen Zheng Na Shan Man-Lai Tang 《The British journal of mathematical and statistical psychology》2022,75(2):363-394

The aim of latent variable selection in multidimensional item response theory (MIRT) models is to identify latent traits probed by test items of a multidimensional test. In this paper the expectation model selection (EMS) algorithm proposed by Jiang et al. (2015) is applied to minimize the Bayesian information criterion (BIC) for latent variable selection in MIRT models with a known number of latent traits. Under mild assumptions, we prove the numerical convergence of the EMS algorithm for model selection by minimizing the BIC of observed data in the presence of missing data. For the identification of MIRT models, we assume that the variances of all latent traits are unity and each latent trait has an item that is only related to it. Under this identifiability assumption, the convergence of the EMS algorithm for latent variable selection in the multidimensional two-parameter logistic (M2PL) models can be verified. We give an efficient implementation of the EMS for the M2PL models. Simulation studies show that the EMS outperforms the EM-based L₁ regularization in terms of correctly selected latent variables and computation time. The EMS algorithm is applied to a real data set related to the Eysenck Personality Questionnaire.

14.

Consistent estimation in the Rasch model based on nonparametric margins

Dean Follmann 《Psychometrika》1988,53(4):553-562

Consider the class of two parameter marginal logistic (Rasch) models, for a test of m True-False items, where the latent ability is assumed to be bounded. Using results of Karlin and Studden, we show that this class of nonparametric marginal logistic (NML) models is equivalent to the class of marginal logistic models where the latent ability assumes at most (m + 2)/2 values. This equivalence has two implications. First, estimation for the NML model is accomplished by estimating the parameters of a discrete marginal logistic model. Second, consistency for the maximum likelihood estimates of the NML model can be shown (when m is odd) using the results of Kiefer and Wolfowitz. An example is presented which demonstrates the estimation strategy and contrasts the NML model with a normal marginal logistic model. This research was supported by NIMH training grant 2 T32 MH 15758-06 and by ONR contract N00014-84-K-0588. The author would like to thank Diane Lambert, John Rolph, and Stephen Fienberg for their assistance. Also, the comments of the referees helped to substantially improve the final version of this paper.

15.

Robust estimation of ability in the Rasch model

Howard Wainer Benjamin D. Wright 《Psychometrika》1980,45(3):373-391

Estimating ability parameters in latent trait models in general, and in the Rasch model in particular, is almost always hampered by noise in the data. This noise can be caused by guessing, inattention to easy questions, and other factors which are unrelated to ability. In this study several alternative formulations which attempt to deal with these problems without a reparameterization are tested through a Monte Carlo simulation. It was found that although no one of the tested schemes is uniformly superior to all others, a modified jackknife stood out as the best one in general; it was also super efficient (more efficient than the asymptotically optimal estimator) for tests with forty or fewer items. It is proposed that this sort of jackknifing scheme for estimating ability be considered for practical work. This research was funded through a grant from the Law Enforcement Assistance Administration (78-NI-AX-0047) to the Bureau of Social Science Research, Howard Wainer, Principal Investigator. We would like to thank Ronald Mead, Anne Morgan and James Ramsay for kind, generous, and invaluable help at various stages of the project.

16.

Model selection for minimum‐diameter partitioning

Michael J. Brusco Douglas Steinley 《The British journal of mathematical and statistical psychology》2014,67(3):471-495

The minimum‐diameter partitioning problem (MDPP) seeks to produce compact clusters, as measured by an overall goodness‐of‐fit measure known as the partition diameter, which represents the maximum dissimilarity between any two objects placed in the same cluster. Complete‐linkage hierarchical clustering is perhaps the best‐known heuristic method for the MDPP and has an extensive history of applications in psychological research. Unfortunately, this method has several inherent shortcomings that impede the model selection process, such as: (1) sensitivity to the input order of the objects, (2) failure to obtain a globally optimal minimum‐diameter partition when cutting the tree at K clusters, and (3) the propensity for a large number of alternative minimum‐diameter partitions for a given K. We propose that each of these problems can be addressed by applying an algorithm that finds all of the minimum‐diameter partitions for different values of K. Model selection is then facilitated by considering, for each value of K, the reduction in the partition diameter, the number of alternative optima, and the partition agreement among the alternative optima. Using five examples from the empirical literature, we show the practical value of the proposed process for facilitating model selection for the MDPP.
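The partition diameter defined in this abstract is simple to compute directly. The following is a minimal sketch of the criterion only; the function name, the square dissimilarity matrix `dissim`, and the label list `labels` are illustrative assumptions, and the search for all minimum-diameter partitions is not attempted here.

```python
def partition_diameter(dissim, labels):
    """Largest dissimilarity between two objects that share a cluster label."""
    n = len(labels)
    diameter = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                diameter = max(diameter, dissim[i][j])
    return diameter
```

Because the criterion depends only on the single worst within-cluster pair, many different partitions can share the same diameter, which is the source of the alternative optima the abstract mentions.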

17.

Simultaneous analysis of multivariate polytomous variates in several groups

Sik-Yum Lee Wai-Yin Poon P. M. Bentler 《Psychometrika》1989,54(1):63-73

18.

Bayes factors: Prior sensitivity and model generalizability

Charles C. Liu Murray Aitkin 《Journal of mathematical psychology》2008,52(6):362-375

Model selection is a central issue in mathematical psychology. One useful criterion for model selection is generalizability; that is, the chosen model should yield the best predictions for future data. Some researchers in psychology have proposed that the Bayes factor can be used for assessing model generalizability. An alternative method, known as the generalization criterion, has also been proposed for the same purpose. We argue that these two methods address different levels of model generalizability (local and global), and will often produce divergent conclusions. We illustrate this divergence by applying the Bayes factor and the generalization criterion to a comparison of retention functions. The application of alternative model selection criteria will also be demonstrated within the framework of model generalizability.

19.

Testing unidimensionality in polytomous Rasch models

Karl Bang Christensen Jakob Bue Bjorner Svend Kreiner Jørgen Holm Petersen 《Psychometrika》2002,67(4):563-574

A fundamental assumption of most IRT models is that items measure the same unidimensional latent construct. For the polytomous Rasch model two ways of testing this assumption against specific multidimensional alternatives are discussed. One, a marginal approach assuming a multidimensional parametric latent variable distribution, and, two, a conditional approach with no distributional assumptions about the latent variable. The second approach generalizes the Martin-Löf test for the dichotomous Rasch model in two ways: to polytomous items and to a test against an alternative that may have more than two dimensions. A study on occupational health is used to motivate and illustrate the methods. The authors would like to thank Niels Keiding, Klaus Larsen and the anonymous reviewers for valuable comments on a previous version of this paper. This research was supported by a grant from the Danish Research Academy and by a general research grant from Quality Metric, Inc.

20.

Choice, contingency discrimination, and foraging theory (total citations: 17; self-citations: 16; citations by others: 1)

Baum W Schwendiman J Bell K 《Journal of the experimental analysis of behavior》1999,71(3):355-373

Four pigeons were trained on eight or nine pairs of independent concurrent variable-interval schedules. The range of reinforcement ratios included extreme ratios (up to 532 to 1). Large samples of stable performance were gathered. Contrary to the findings of Davison and Jones (1995), the generalized matching law described choice more accurately than a contingency-discriminability model. Taking small samples (5 to 10 sessions) and applying a more liberal stability criterion used by Davison and Jones only increased the unsystematic variance in the data and in estimates of generalized-matching-law sensitivity. Because changing to dependent scheduling and inserting a changeover delay had no systematic effect, the deviations from generalized matching reported by Davison and Jones probably arose from imperfectly discriminated stimuli. Analysis of visits revealed that visits to the nonpreferred alternative were brief and approximately constant. When choice between the preferred (rich) and nonpreferred (lean) alternatives, regardless of position, was analyzed according to the generalized matching law, sensitivities approximated 1.0, with bias in favor of the lean alternative. This bias, which arose from an excessive frequency of visits to the lean alternative, explains undermatching as the result of fitting one line to a choice relation that consists of two displaced lines, both with a slope of 1.0. The pattern of deviation from the generalized matching line confirmed this account. The findings suggest an alternative analysis of choice that focuses on probability of visiting the lean alternative as the dependent variable. This probability was directly proportional to ratio of reinforcement. Matching, undermatching, and overmatching may all be explained by a view of concurrent performance based on foraging theory, in which responding occurs primarily at the rich alternative and is occasionally interrupted by brief visits to the lean alternative.
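The sensitivity and bias discussed in this abstract are the slope and intercept of the generalized matching law in its usual log-ratio form, log(B1/B2) = s · log(R1/R2) + log(b). The sketch below fits that line by ordinary least squares. It is an illustration under that standard formulation, not the authors' analysis; the function name and the example numbers are hypothetical and carry no empirical meaning.

```python
import math


def fit_generalized_matching(behavior_ratios, reinforcer_ratios):
    """Least-squares fit of log10(B1/B2) = s * log10(R1/R2) + log10(b).

    Returns (s, b): sensitivity (slope) and bias (antilog of the intercept).
    """
    xs = [math.log10(r) for r in reinforcer_ratios]
    ys = [math.log10(b) for b in behavior_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sensitivity = sxy / sxx
    bias = 10 ** (mean_y - sensitivity * mean_x)
    return sensitivity, bias
```

A fitted slope below 1.0 is what the abstract calls undermatching. Its argument is that a single line fitted across two displaced unit-slope lines will produce such a slope even when sensitivity within each line is about 1.0.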