A theoretical model is given for dealing with omitted responses. Two special cases are investigated. This work was supported in part by contract N00014-80-C-0402, project designation NR 150-453, between the Office of Naval Research and Educational Testing Service. Reproduction in whole or in part is permitted for any purpose of the United States Government.

A non-forced choice model is developed that describes subject behavior on repeat trial discrimination tests of the pick 1 of k form. The model is developed from the Dirichlet distribution, and it allows for the derivation of individual true scores and of sampling properties for various constructs of interest. These results permit the analysis and comparison of test designs. The model is applied to issues such as forced vs. non-forced choice formats, the best number of alternatives at a choice point, and the selection of expert panels.

Van der Linden's (2007, Psychometrika, 72, 287) hierarchical model for responses and response times in tests has numerous applications in psychological assessment. The success of these applications requires the parameters of the model to have been estimated without bias. The data used for model fitting, however, are often contaminated, for example, by rapid guesses or lapses of attention. This distorts the parameter estimates. In the present paper, a novel estimation approach is proposed that is robust against contamination. The approach consists of two steps. In the first step, the response time model is fitted on the basis of a robust estimate of the covariance matrix. In the second step, the item response model is extended to a mixture model, which allows for a proportion of irregular responses in the data. The parameters of the mixture model are then estimated with a modified marginal maximum likelihood estimator. The modified marginal maximum likelihood estimator downweights responses of test-takers with unusual response time patterns. As a result, the estimator is resistant to several forms of data contamination. The robustness of the approach is investigated in a simulation study. An application of the estimator is demonstrated with real data.

A model is proposed that describes subject behavior on repeat paired comparison preference tests. The model extends prior work in this area in that it explicitly allows for abstentions and permits the derivation of individual true scores for discrimination ability as well as conditional estimates of proportionate preference. With these results, the properties of a paired comparison test can be thoroughly explored. An empirical example is presented, and test design issues are considered. In particular, repeat paired comparison preference tests are shown to be inherently less efficient discrimination tests than are pick 1 of 2 tests.

An extension is described to a product testing model to account for misinformation among subjects. A misinformed subject is one who associates the taste of product A with product B and vice versa; thus, the subject would tend to perform incorrectly on pick 1 of 2 tests. A likelihood ratio test for the presence of misinformation is described. The model is applied to a data set, and misinformation is found to exist. Biases due to model misspecification and other implications for product testing are discussed. The first author is currently on leave from Carnegie Mellon University.

Suboptimal effort is a major threat to valid score-based inferences. While the effects of such behavior have been frequently examined in the context of mean group comparisons, minimal research has considered its effects on individual score use (e.g., identifying students for remediation). Focusing on the latter context, this study addressed two related questions via simulation and applied analyses. First, we investigated how much including noneffortful responses in scoring using a three-parameter logistic (3PL) model affects person parameter recovery and classification accuracy for noneffortful responders. Second, we explored whether improvements in these individual-level inferences were observed when employing the Effort Moderated IRT (EM-IRT) model under conditions in which its assumptions were met and violated. Results demonstrated that including 10% noneffortful responses in scoring led to average bias in ability estimates and misclassification rates by as much as 0.15 SDs and 7%, respectively. These results were mitigated when employing the EM-IRT model, particularly when model assumptions were met. However, once model assumptions were violated, the EM-IRT model’s performance deteriorated, though still outperforming the 3PL model. Thus, findings from this study show that (a) including noneffortful responses when using individual scores can lead to potential unfounded inferences and potential score misuse, and (b) the negative impact that noneffortful responding has on person ability estimates and classification accuracy can be mitigated by employing the EM-IRT model, particularly when its assumptions are met.
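The contrast between scoring with and without noneffortful responses can be illustrated with a minimal sketch. It assumes the standard 3PL item response function and treats effort moderation as simply dropping flagged responses from the ability likelihood; the function names, the grid search, and the item parameters are illustrative assumptions, not the study's implementation.

```python
import math

def p_3pl(theta, a, b, c):
    """Standard 3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items, effortful=None):
    """Log-likelihood of a 0/1 response pattern at ability theta.

    If `effortful` flags are supplied, responses flagged False are skipped
    (an assumed simplification of effort-moderated scoring).
    """
    total = 0.0
    for i, (u, (a, b, c)) in enumerate(zip(responses, items)):
        if effortful is not None and not effortful[i]:
            continue
        p = p_3pl(theta, a, b, c)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def estimate_theta(responses, items, effortful=None):
    """Crude grid-search ML estimate of ability on [-4, 4] (illustrative only)."""
    grid = [g / 100.0 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items, effortful))

# Hypothetical five-item test; the last two answers are flagged as rapid guesses.
items = [(1.0, b, 0.2) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
responses = [1, 1, 1, 0, 0]
flags = [True, True, True, False, False]
theta_all = estimate_theta(responses, items)
theta_effortful = estimate_theta(responses, items, flags)
```

In this toy case the estimate that ignores the flagged wrong answers is higher than the estimate that scores them, which is the direction of bias the abstract describes for noneffortful responders.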

A modified beta binomial model is presented for use in analyzing random guessing multiple choice tests and certain forms of taste tests. Detection probabilities for each item are distributed beta across the population of subjects. Properties for the observable distribution of correct responses are derived. Two concepts of true score estimates are presented. One, analogous to Duncan's empirical Bayes posterior mean score, is appropriate for assessing the subject's performance on that particular test. The second is more suitable for predicting outcomes on similar tests. This research was made possible by a grant from the Center for Food Policy Research, Graduate School of Business, Columbia University.

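Hu Wei, Lü Yong. Psychological Exploration (《心理学探新》), 2011, (4): 326-331
Determining whether knowledge is implicit has long been a central issue in implicit learning research, and properly distinguishing implicit participants from explicit participants is crucial for such studies. This paper reviews the history of research on assessing the implicitness of knowledge, from the original subjective and objective criteria to later refined methods. It argues that, as theories of the relationship between implicit and explicit learning have developed, research methods need to keep pace, and it proposes a new method for distinguishing implicit from explicit participants. Unlike traditional methods, the new method is guided by the view that implicit and explicit knowledge complement each other and that "any learning involves both implicit and explicit learning"; it judges a participant's level of implicitness from his or her guessing level and sets a criterion for identifying "pure" implicit participants. By taking a developmental view of implicit learning and its relationship to explicit learning, the method helps researchers study implicit learning in greater depth and with greater precision.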

Subjects’ decisions in multiple-choice tests are an interesting domain for the analysis of decision making under uncertainty. When the test is graded using a rule that penalizes wrong answers, each item can be viewed as a lottery where a rational examinee would choose whether to omit (sure reward) or answer (take the lottery) depending on risk aversion and level of knowledge. We formalize students as heterogeneous decision makers with different risk attitudes and levels of knowledge. Building on IRT, we compute the optimal penalty given students’ optimal behavior and the trade-off between bias and measurement error. Although MCQ examinations are frequently used, there is no consensus as to whether a penalty for wrong answers should be used or not. For example, examinations for medical licensing in some countries include MCQ sections with penalty while in others there is no penalty for wrong answers. We contribute to this discussion with a formal analysis of the effects of penalties; our simulations indicate that the optimal penalty is relatively high for perfectly rational students but also when they are not fully rational: even though penalty discriminates against risk averse students, this effect is small compared with the measurement error that it prevents.

Estimating ability parameters in latent trait models in general, and in the Rasch model in particular, is almost always hampered by noise in the data. This noise can be caused by guessing, inattention to easy questions, and other factors which are unrelated to ability. In this study several alternative formulations which attempt to deal with these problems without a reparameterization are tested through a Monte Carlo simulation. It was found that although no one of the tested schemes is uniformly superior to all others, a modified jackknife stood out as the best one in general; it was also super efficient (more efficient than the asymptotically optimal estimator) for tests with forty or fewer items. It is proposed that this sort of jackknifing scheme for estimating ability be considered for practical work. This research was funded through a grant from the Law Enforcement Assistance Administration (78-NI-AX-0047) to the Bureau of Social Science Research, Howard Wainer, Principal Investigator. We would like to thank Ronald Mead, Anne Morgan and James Ramsay for kind, generous, and invaluable help at various stages of the project.
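For orientation, the baseline that such robust schemes are compared against can be sketched as the ordinary maximum likelihood ability estimate under the Rasch model, computed by Newton-Raphson. This is not the modified jackknife the abstract recommends; the function names and the example difficulties are hypothetical.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def rasch_mle(responses, difficulties, tol=1e-8, max_iter=100):
    """Plain ML ability estimate via Newton-Raphson (no robustness correction)."""
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("MLE is not finite for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                      # score function
        information = sum(p * (1.0 - p) for p in probs)  # test information
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical three-item test with two correct answers.
theta_hat = rasch_mle([1, 1, 0], [-1.0, 0.0, 1.0])
```

At the solution the expected number-right score equals the observed raw score, so a single lucky guess or careless error shifts the estimate directly; that sensitivity is the noise problem the abstract addresses.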

The guessing of answers in multiple choice tests adds random error to the variance of the test scores, lowering their reliability. Formula scoring rules that penalize for wrong guesses are frequently used to solve this problem. This paper uses prospect theory to analyze scoring rules from a decision-making perspective and focuses on the effects of framing on the tendency to guess. In three experiments participants were presented with hypothetical test situations and were asked to indicate the degree of certainty that they thought was required for them to answer a question. In accordance with the framing hypothesis, participants tended to guess more when they anticipated a low grade and therefore considered themselves to be in the loss domain, or when the scoring rule caused the situation to be framed as entailing potential losses. The last experiment replicated these results with a task that resembles an actual test. Copyright © 2002 John Wiley & Sons, Ltd.
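A small sketch may help make the scoring rule concrete. It assumes the conventional correction-for-guessing formula, in which each wrong answer on a k-option item costs 1/(k-1) points and omissions score zero; the abstract does not state which rule the experiments used, and the function names are illustrative.

```python
def formula_score(n_right, n_wrong, n_options):
    """Conventional formula score: rights minus wrongs / (options - 1)."""
    return n_right - n_wrong / (n_options - 1)

def expected_gain_from_answering(p_correct, n_options):
    """Expected score change from answering rather than omitting one item."""
    return p_correct - (1.0 - p_correct) / (n_options - 1)

# 30 right, 8 wrong, 2 omitted on a 40-item test with 5 options per item.
score = formula_score(30, 8, 5)

# A blind guess on a 4-option item has zero expected gain under this rule.
blind_guess = expected_gain_from_answering(0.25, 4)
```

Under this rule a purely random guess is worth nothing in expectation, so whether an examinee answers depends on partial knowledge and risk attitude, which is where framing effects of the kind studied here can enter.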

Five different ability estimators (maximum likelihood [MLE], weighted likelihood [WLE], Bayesian modal [BME], expected a posteriori [EAP], and the standardized number-right score [Z]) were used as scores for conventional, multiple-choice tests. The bias, standard error and reliability of the five ability estimators were evaluated using Monte Carlo estimates of the unknown conditional means and variances of the estimators. The results indicated that ability estimates based on BME, EAP or WLE were reasonably unbiased for the range of abilities corresponding to the difficulty of a test, and that their standard errors were relatively small. Also, they were as reliable as the old standby, the number-right score.

A goodness of fit test presented by Andersen is shown to be incorrect. The correct test is described and a re-analysis of Andersen's data is provided.

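An Experimental Study of the Cognitive Law of Knowledge Points
Sui Guangyuan. Psychological Science (《心理科学》), 2003, 26(2): 308-311
The knowledge point is the unit of human cognition: to acquire knowledge, people must learn it one knowledge point at a time. Using concept learning as an example, this experiment examined the difference in performance between learning one knowledge point at a time and learning two knowledge points simultaneously. The results showed that (1) provided the number of information chunks in a concept does not exceed short-term memory capacity, when one concept is learned at a time the number of chunks making up the concept has no significant effect on mastery of the concept; and (2) under the same conditions, performance is better when one knowledge point is learned at a time than when two are learned simultaneously, which supports the cognitive law of knowledge points.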

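A Scoring Method for Physics Concept-Map Test Items
This paper describes in detail a method for scoring physics concept-map test items using knowledge coding, and examines the difficulty, discrimination, reliability, and validity of such items. It also computes the reliability of a final physics examination composed of concept-map items together with traditional physics item types. The results indicate that scoring physics concept-map items on the basis of knowledge coding is an effective method, and that doing so makes the concept-map item a high-quality test item.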

The recognition heuristic postulates that individuals should choose a recognized object more often than an unrecognized one whenever recognition is related to the criterion. This behavior has been described as a one-cue, noncompensatory decision-making strategy. This claim and other assumptions were tested in four experiments using paired-comparison tasks with cities and other geographical objects. The main results were (1) that the recognized object was chosen more often than the unrecognized one when the recognition cue was valid; (2) that participants' behavior did not reflect the recognition validity of their own knowledge; (3) that a less-is-more effect (i.e., better performance with less knowledge) was either absent or of only small size; and (4) that judgments were influenced by further knowledge, which could even compensate for the recognition cue. In sum, the recognition cue represents an important piece of knowledge in paired comparisons, but apparently not the only one. Copyright © 2006 John Wiley & Sons, Ltd.

John-Michael Kuczynski says the "paradox of analysis" can be resolved with the proper definition of "partial knowledge." He says that this definition will not do: (K) S has partial knowledge of x =df S knows some, but not all, of x's parts. He offers an alternative account of incomplete or partial knowledge. I argue here that: (a) Kuczynski's chief criticisms of (K) are defective; (b) his proposed solution to the paradox of analysis has no clear application to the paradox in its familiar forms; and (c) his solution may not avoid the puzzle about partial knowledge it was designed to resolve.

Book Review
Book reviewed in this article:
Hunter Brown, William James on Radical Empiricism and Religion

Suppose a collection of standard tests is given to all subjects in a random sample, but a different new test is given to each group of subjects in nonoverlapping subsamples. A simple method is developed for displaying the information that the data set contains about the correlational structure of the new tests. This is possible to some extent, even though each subject takes only one new test. The method uses plausible values of the partial correlations among the new tests given the standard tests in order to generate plausible simple correlations among the new tests and plausible multiple correlations between composites of the new tests and the standard tests. The real data example included suggests that the method can be useful in practical problems.

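Implementing parameter estimation programs for item response theory has long been a concern of researchers in modern measurement theory. This study explores, at a theoretical level, how parameter estimation for polytomous models can be implemented. It simulates five batches of item and person parameters that differ in number and distribution, generates the corresponding raw score matrices, and rigorously compares a self-written program against internationally popular programs. The results show that the self-written program is accurate and stable, and that too small a sample of examinees impairs the accuracy and stability of parameter estimation.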
