Similar Documents
20 similar documents found.
1.
On a multiple-choice test in which each item has $k$ alternative responses, the test taker is permitted to choose any subset which he believes contains the one correct answer. A scoring system is devised that depends on the size of the subset and on whether or not the correct answer is eliminated. The mean and variance of the score per item are obtained. Methods are derived for determining the total number of items that should be included on the test so that the average score on all items can be regarded as a good measure of the subject's knowledge. Efficiency comparisons between the conventional and the subset-selection scoring procedures are made. The analogous problem of $r > 1$ correct answers for each item (with $r$ fixed and known) is also considered. The authors are grateful to M. Aitkin, C. Coombs, F. Lord, and the reviewers for their comments and suggestions.
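The abstract does not state the scoring rule itself. To make "mean and variance of the score per item" concrete, here is a minimal sketch under an assumed rule, which is hypothetical and not necessarily the authors': a subset of size $s$ earns $(k - s)/(k - 1)$ if it keeps the correct answer and $-s/(k - 1)$ if it eliminates it. Under this rule a blind guesser has an expected score of zero. The item count uses a simple assumption, not the paper's method: require the standard error of the average score, $\sqrt{\mathrm{Var}/n}$, to fall below a chosen target.

```python
from fractions import Fraction
import math


def subset_score_moments(k, s, p):
    """Mean and variance of one item's score under a hypothetical subset rule.

    k: number of alternatives (k >= 2); s: size of the chosen subset;
    p: probability that the chosen subset contains the correct answer.
    Assumed rule: +(k - s)/(k - 1) if the correct answer is kept,
    -s/(k - 1) if it was eliminated.
    """
    hit = Fraction(k - s, k - 1)
    miss = Fraction(-s, k - 1)
    mean = p * hit + (1 - p) * miss
    second_moment = p * hit**2 + (1 - p) * miss**2
    return mean, second_moment - mean**2


def items_needed(variance, target_se):
    """Smallest number of independent items whose average score has
    standard error <= target_se, i.e. variance / n <= target_se**2."""
    return math.ceil(variance / target_se**2)


# A blind guesser on a 4-choice item who keeps a single alternative (p = 1/4).
mean, var = subset_score_moments(k=4, s=1, p=Fraction(1, 4))
print(mean, var)                            # 0 1/3
print(items_needed(var, Fraction(1, 10)))   # 34
```

Exact fractions are used so that the zero mean for random guessing shows up exactly rather than as a rounding residue.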

2.
A multiple-answer multiple-choice test item has a certain number of alternatives, any number of which might be keyed. The examinee is also allowed to mark any number of alternatives. This increased flexibility over the one-keyed-alternative case is useful in practice but raises questions about appropriate scoring rules. In this article a certain class of item scoring rules called the binary class is considered. The concepts of standard scoring rules and equivalence among these scoring rules are introduced in the misinformation model, of which the traditional knowledge model is a special case. The examinee's strategy with respect to a scoring rule is examined. The critical role of a quantity called the scoring ratio is emphasized. In the case of examinee uncertainty about the number of correct alternatives on an item, a Bayes and a minimax strategy for the examinee are developed. An appropriate response by the examiner to the minimax strategy is also outlined. Research partially supported under Grant N00014-67-A-0314-0022 from the Office of Naval Research and Grants GS-32514 and MPS 75-07539 from the National Science Foundation.
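The idea of a Bayes strategy can be illustrated with a simple additive rule. This rule is an assumption and is not claimed to be the paper's binary class or its scoring ratio: each marked keyed alternative earns `reward`, each marked unkeyed alternative costs `penalty`, and unmarked alternatives score zero. A risk-neutral examinee with subjective probability $q$ that an alternative is keyed should then mark it exactly when $q > \text{penalty}/(\text{reward} + \text{penalty})$. In this sketch the penalty-to-reward ratio alone sets the decision threshold.

```python
def bayes_marks(beliefs, reward, penalty):
    """Indices of alternatives a risk-neutral examinee should mark.

    beliefs: subjective probability that each alternative is keyed.
    Hypothetical additive rule: +reward per marked keyed alternative,
    -penalty per marked unkeyed alternative, 0 for unmarked ones.
    Marking alternative i is worth q*reward - (1 - q)*penalty in expectation,
    so it is marked exactly when q > penalty / (reward + penalty).
    """
    threshold = penalty / (reward + penalty)
    return [i for i, q in enumerate(beliefs) if q > threshold]


def expected_score(beliefs, marks, reward, penalty):
    """Expected item score of a given set of marks under the same rule."""
    return sum(beliefs[i] * reward - (1 - beliefs[i]) * penalty for i in marks)


beliefs = [0.9, 0.6, 0.3, 0.1]
print(bayes_marks(beliefs, reward=1.0, penalty=1.0))   # [0, 1]
print(bayes_marks(beliefs, reward=1.0, penalty=0.25))  # [0, 1, 2]
```

Because the assumed rule is additive over alternatives, each mark can be decided independently. Rules that are not additive would require searching over subsets instead.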

3.
The paper deals with certain problems connected with the assumption that choice probabilities $p_s(x, y)$ depend on the subject $s$. A set of postulates is given which implies the existence of sequences of “classification standards”, i.e., sequences $\{z_j\}$ such that whenever $0 < p_{s_0}(x, z_i) < 1$ for some $s_0$ and $i$, then $p_s(z_{i+k}, x) = p_s(x, z_{i-k}) = 1$ for all $s$ and all $k \ge 1$. Elements of any such sequence $\{z_j\}$ can serve as boundaries between successive categories of classification based on the following rule: assign $x$ to the $j$th category if you feel it is “to the right” of $z_j$ and “to the left” of $z_{j+1}$. Under the condition stated above this rule is unambiguous, and the resulting classification has the property that every element is assigned to one of the two neighboring categories, regardless of who performs the classification. Next, the postulates are enriched so as to imply the existence of “tightest” sequences $\{z_j\}$ of this kind, hence leading to a classification with the largest number of categories.
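The category-assignment rule itself is mechanical once the standards are fixed. Here is a minimal sketch. It assumes that stimuli can be placed on a numeric scale where "to the right of" means "greater than". That is an illustrative simplification; the paper works with subject-dependent choice probabilities rather than known scale values.

```python
from bisect import bisect_left


def classify(x, boundaries):
    """Category index j such that boundaries[j] < x < boundaries[j + 1].

    boundaries: increasing sequence of classification standards z_0 < z_1 < ...
    Returns -1 below z_0, len(boundaries) - 1 above the last standard, and
    None when x coincides with a standard (the rule does not decide that case).
    """
    j = bisect_left(boundaries, x)
    if j < len(boundaries) and boundaries[j] == x:
        return None
    return j - 1


print(classify(5, [0, 10, 20]))   # 0
print(classify(15, [0, 10, 20]))  # 1
```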

4.
Considered in this paper is a decision task which has been employed to study multistage betting behavior. When the task commences, a decision maker (DM) is provided with some capital $x$ ($x > 0$) which he is required to allocate over $m$ ($m > 1$) mutually exclusive and collectively exhaustive alternatives, each of which occurs with probability $p_i$ ($p_i > 0$, $i = 1, \ldots, m$; $\sum_{i=1}^{m} p_i = 1$). If the amount $y_i$ is allocated to alternative $i$ ($y_i \ge 0$, $\sum_{i=1}^{m} y_i = x$) and alternative $i$ obtains, DM's capital for the next stage of the game becomes $y_i r_i$, where $r_i$ ($r_i > 0$) is the return per unit allocated to alternative $i$. The task consists of $N$ stages. Defining risk in terms of the mean and variance of DM's bets, and assuming that the minimization of risk is DM's objective, decision policies satisfying this objective are derived in closed form and their testable properties are briefly discussed.
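Given the definitions above, the next-stage capital is the random quantity $y_i r_i$ with probability $p_i$. Its mean is $\sum_i p_i y_i r_i$ and its variance is $\sum_i p_i (y_i r_i)^2$ minus the squared mean. The sketch below computes those two moments for a single stage. How the paper combines them into a risk measure is not given in the abstract, so no particular risk criterion is assumed.

```python
def next_stage_moments(p, r, y):
    """Mean and variance of next-stage capital for one allocation.

    p: probabilities of the m alternatives (positive, summing to 1)
    r: returns per unit bet on each alternative (positive)
    y: amounts allocated (nonnegative, summing to the current capital x)
    If alternative i obtains, capital becomes y[i] * r[i].
    """
    if not (len(p) == len(r) == len(y)):
        raise ValueError("p, r and y must have the same length")
    if abs(sum(p) - 1.0) > 1e-9 or min(p) <= 0:
        raise ValueError("p must be positive and sum to 1")
    if min(y) < 0:
        raise ValueError("allocations must be nonnegative")
    outcomes = [yi * ri for yi, ri in zip(y, r)]
    mean = sum(pi * c for pi, c in zip(p, outcomes))
    var = sum(pi * c * c for pi, c in zip(p, outcomes)) - mean**2
    return mean, var


p, r = [0.5, 0.25, 0.25], [2.0, 4.0, 4.0]
print(next_stage_moments(p, r, [50, 25, 25]))  # (100.0, 0.0): fully hedged
print(next_stage_moments(p, r, [100, 0, 0]))   # (100.0, 10000.0): all on one bet
```

With these returns, every allocation has the same mean. Choosing $y_i \propto 1/r_i$ makes every outcome equal, so the variance drops to zero.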

5.
A technique is indicated by which approximations to the factor loadings of a new test may be obtained if the factor loadings of a given group of tests and the correlations of the new test with the other tests are known. The technique is applicable to any orthogonal system and is especially adapted to cases in which $a_{ji} a_{jk} = 0$ when $i \neq k$. Application is also made to the simultaneous determination of the factor weights of a group of tests in which no additional common factor is present. The technique is useful in adding tests to a completed factorial solution and in using factorial solutions involving errors to give results which are approximately correct.
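One way to see why the condition $a_{ji} a_{jk} = 0$ for $i \neq k$ helps: with orthogonal factors, the correlation of the new test with test $j$ is modeled as $\sum_m a_{jm} b_m$. When each test loads on at most one factor, the least-squares normal equations for $b$ decouple. Each factor can then be handled separately: $b_m = \sum_j a_{jm} r_j / \sum_j a_{jm}^2$. This is a least-squares sketch only, not necessarily the paper's exact technique.

```python
def new_test_loadings(loadings, correlations):
    """Least-squares loadings of a new test on existing orthogonal factors.

    loadings: n x q matrix (list of rows) of the given tests' factor loadings,
        assumed to satisfy a_ji * a_jk = 0 for i != k (each test loads on at
        most one factor), so the normal equations decouple factor by factor.
    correlations: correlations of the new test with each of the n tests.
    Fits r_j ~= sum_m a_jm * b_m and returns the estimated loadings b.
    """
    q = len(loadings[0])
    b = []
    for m in range(q):
        num = sum(row[m] * rj for row, rj in zip(loadings, correlations))
        den = sum(row[m] ** 2 for row in loadings)
        b.append(num / den if den else 0.0)
    return b


A = [[0.8, 0.0], [0.6, 0.0], [0.0, 0.7], [0.0, 0.5]]
print(new_test_loadings(A, [0.4, 0.3, 0.28, 0.2]))  # approximately [0.5, 0.4]
```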

6.
A model for multiple-choice exams is developed from a signal-detection perspective. A correct alternative in a multiple-choice exam can be viewed as being a signal embedded in noise (incorrect alternatives). Examinees are assumed to have perceptions of the plausibility of each alternative, and the decision process is to choose the most plausible alternative. It is also assumed that each examinee either knows or does not know each item. These assumptions together lead to a signal detection choice model for multiple-choice exams. The model can be viewed, statistically, as a mixture extension, with random mixing, of the traditional choice model, or similarly, as a grade-of-membership extension. A version of the model with extreme value distributions is developed, in which case the model simplifies to a mixture multinomial logit model with random mixing. The approach is shown to offer measures of item discrimination and difficulty, along with information about the relative plausibility of each of the alternatives. The model, parameters, and measures derived from the parameters are compared to those obtained with several commonly used item response theory models. An application of the model to an educational data set is presented.
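The phrase "mixture multinomial logit model with random mixing" can be made concrete with a small sketch. The parameterization is hypothetical and not taken from the paper. With probability `know_prob` the examinee knows the item, and the keyed alternative's plausibility is shifted up by `d`. Otherwise the choice follows a plain multinomial logit over the baseline plausibilities. The observed choice probabilities are the mixture of the two.

```python
import math


def softmax(values):
    """Multinomial logit probabilities (max subtracted for numerical stability)."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probabilities(plausibility, correct, d, know_prob):
    """Mixture of two multinomial logits for one multiple-choice item.

    plausibility: baseline plausibility v_j of each alternative (hypothetical)
    correct: index of the keyed alternative
    d: shift added to the keyed alternative when the examinee knows the item
    know_prob: mixing probability that the examinee knows the item
    """
    known = list(plausibility)
    known[correct] += d
    p_know = softmax(known)
    p_guess = softmax(plausibility)
    return [know_prob * a + (1 - know_prob) * b for a, b in zip(p_know, p_guess)]


probs = choice_probabilities([0.0, 0.5, -0.5, 0.0], correct=0, d=3.0, know_prob=0.6)
print([round(v, 3) for v in probs])
```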

7.
郭磊 (Guo Lei), 刘伟 (Liu Wei). 《心理科学》 (Journal of Psychological Science), 2018, (1): 189–195
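Zhang (2013) proposed the sequential monitoring procedure (SMP) to detect whether items in computerized adaptive testing (CAT) become compromised (leaked) during administration. However, the method produces false alarms, and it does not consider how item compromise affects the precision of ability estimation. Building on the SMP, this study introduces a person-fit index and proposes the SMP_PFI method. The method aims to verify, at a given confidence level, whether items flagged by the SMP have actually been compromised. The study also examines how SMP_PFI affects the relationship between ability-estimation precision and the number of items withdrawn from the pool. The experimental results show that the new method effectively reduces the Type I error rate of the SMP used alone, and that controlling the CPFI value makes it possible to balance ability-estimation precision against the number of withdrawn items.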
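The abstract does not say which person-fit index is used. As an illustration only, here is the widely used standardized log-likelihood index $l_z$ for dichotomous items. Strongly negative values flag response patterns that are unlikely under the model, for example an examinee who answers hard items correctly and misses easy ones. How such an index is combined with the SMP, and what the CPFI threshold is, are not reproduced here.

```python
import math


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit index l_z.

    responses: 0/1 item scores; probs: model probabilities of a correct
    response at the examinee's estimated ability (each strictly in (0, 1)).
    Large negative values indicate poor person fit.
    """
    l0 = expected = variance = 0.0
    for u, p in zip(responses, probs):
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)


probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(round(lz_statistic([1, 1, 1, 0, 0, 0], probs), 2))  # consistent pattern
print(round(lz_statistic([0, 0, 0, 1, 1, 1], probs), 2))  # aberrant pattern
```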

8.
Let $\mathbf{X} = \langle X, \geqq, R_1, R_2, \ldots \rangle$ be a relational structure, where $\langle X, \geqq \rangle$ is a Dedekind complete, totally ordered set, and let $n$ be a nonnegative integer. $\mathbf{X}$ is said to satisfy $n$-point homogeneity if and only if for each $x_1, \ldots, x_n, y_1, \ldots, y_n$ such that $x_1 > x_2 > \cdots > x_n$ and $y_1 > y_2 > \cdots > y_n$, there exists an automorphism $\alpha$ of $\mathbf{X}$ such that $\alpha(x_i) = y_i$ for $i = 1, \ldots, n$. $\mathbf{X}$ is said to satisfy $n$-point uniqueness if and only if for all automorphisms $\beta$ and $\gamma$ of $\mathbf{X}$, if $\beta$ and $\gamma$ agree at $n$ distinct points of $X$, then $\beta$ and $\gamma$ are identical. It is shown that if $\mathbf{X}$ satisfies $n$-point homogeneity and $n$-point uniqueness, then $n \le 2$; for the case $n = 1$, $\mathbf{X}$ is ratio scalable, and for the case $n = 2$, interval scalable. This result is very general and may in part provide an explanation of why so few scale types have arisen in science. The cases of 0-point homogeneity and infinite-point homogeneity are also discussed.

9.
In the multistage betting game (MBG), a decision maker (DM) is provided with some capital $x$ which he is required to bet over $m$ ($m > 1$) mutually exclusive and collectively exhaustive alternatives, each of which occurs with probability $p_i$ ($p_i > 0$, $i = 1, \ldots, m$; $\sum_{i=1}^{m} p_i = 1$). If $y_i$ is bet on alternative $i$ ($y_i \ge 0$, $\sum_{i=1}^{m} y_i = x$) and alternative $i$ obtains, the DM's capital for the next stage is $y_i r_i$ ($r_i > 0$). The MBG lasts until either the DM loses his capital or $N$ stages elapse, whichever comes first. Each of six subjects participated in six sessions consisting of several hundred 3-alternative MBG stages. A within-subject design assigned negative expected value (EV) bets to the first three sessions and positive-EV bets to the remaining three sessions. Significant effects were found due to return rate, capital size, homogeneous runs of either wins or losses, and individual differences. Four expected-utility-maximization models and two risk-minimization models were presented and tested. A modified logarithmic utility model, which provides the best fit to the data, is proposed. The implications of the results and directions for further research are briefly discussed.
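The abstract does not describe the modification in its best-fitting logarithmic model, so the sketch below uses plain log utility as a baseline. When all capital must be bet, maximizing $\sum_i p_i \log(y_i r_i)$ subject to $\sum_i y_i = x$ gives proportional betting, $y_i = p_i x$, because the $\log r_i$ terms are constants. The simulator follows the game as defined above, ending after $N$ stages or as soon as capital reaches zero. The returns and seed in the example are arbitrary.

```python
import random


def log_utility_allocation(p, x):
    """Allocation maximizing sum_i p_i * log(y_i * r_i) subject to sum_i y_i = x.

    Because every unit of capital must be bet, log(r_i) only adds a constant,
    and the maximizer is proportional betting y_i = p_i * x.
    """
    return [pi * x for pi in p]


def play_mbg(p, r, x, n_stages, policy, rng):
    """Simulate one multistage betting game and return the capital path.

    policy(p, x) must return an allocation summing to x. The game ends after
    n_stages stages or as soon as the capital reaches zero.
    """
    path = [x]
    for _ in range(n_stages):
        if x <= 0:
            break
        y = policy(p, x)
        i = rng.choices(range(len(p)), weights=p)[0]  # alternative that obtains
        x = y[i] * r[i]
        path.append(x)
    return path


p, r = [0.5, 0.3, 0.2], [2.2, 3.0, 4.5]
path = play_mbg(p, r, 100.0, 10, log_utility_allocation, random.Random(1))
print([round(c, 1) for c in path])
```

Proportional betting never loses all capital, since every alternative receives a positive stake. A policy that stakes everything on one alternative can end the game early.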

10.
Estimation of the reliability of ratings (cited 9 times; self-citations: 0; citations by others: 9)
A procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated. This procedure, based upon analysis of variance, may be applied both in the special case where a complete set of ratings from each of $k$ sources is available for each of $n$ subjects, and in the general case where $k_1, k_2, \ldots, k_n$ ratings are available for each of the $n$ subjects. It may be used to obtain either a point estimate or a confidence interval for the reliability of either the component ratings or their averages. The relations of this procedure to others intended to serve the same purpose are considered algebraically and illustrated numerically. The writer wishes to acknowledge the helpful comments and suggestions of Professors E. E. Cureton, Harold Gulliksen, and E. F. Lindquist.
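For the complete case ($k$ ratings for each of $n$ subjects), the familiar one-way analysis-of-variance estimates can be sketched as follows. The reliability of a single rating is $(MS_B - MS_W)/(MS_B + (k - 1) MS_W)$ and the reliability of the $k$-rating average is $(MS_B - MS_W)/MS_B$. This sketch covers only that case. It does not handle the unequal $k_i$ of the general case or the confidence intervals, and whether rater effects should be removed, as in a two-way layout, depends on the design.

```python
def rating_reliability(ratings):
    """One-way ANOVA reliability estimates for n subjects x k ratings each.

    ratings: list of n rows, each holding the k ratings of one subject.
    Returns (reliability of a single rating, reliability of the k-rating mean),
    i.e. (MSB - MSW) / (MSB + (k - 1) MSW) and (MSB - MSW) / MSB.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, means) for x in row)
    msb = ss_between / (n - 1)
    msw = ss_within / (n * (k - 1))
    single = (msb - msw) / (msb + (k - 1) * msw)
    average = (msb - msw) / msb
    return single, average


print(rating_reliability([[1, 2], [3, 4], [5, 6]]))  # approximately (0.882, 0.9375)
```

The two estimates are linked by the Spearman-Brown formula: average $= k \cdot$ single $/ (1 + (k - 1) \cdot$ single$)$.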

11.
It is well known that coefficient alpha can be used to estimate the reliability of a test even when the test is split into several parts. It is also known that alpha can severely underestimate test reliability when the parts have unequal numbers of items. A generalization of alpha, $\beta_k$, is proposed to correct this defect. Several properties of $\beta_k$ are also presented. The author gratefully acknowledges the assistance of Dr. Leonard Feldt for reviewing an earlier draft of this paper, and Ms. Rita Karwacki Bode and Mr. Dave Mansell for the analysis of the experimental data reported here. The comments of an unknown referee, which contributed substantially to the clarity of the presentation, are also gratefully acknowledged.
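For reference, coefficient alpha for a test split into $k$ parts is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, where $\sigma_i^2$ is the variance of part $i$ and $\sigma_X^2$ is the variance of the total score. The sketch computes this standard alpha only. The formula for the proposed $\beta_k$ is not given in the abstract and is not reproduced.

```python
from statistics import pvariance


def coefficient_alpha(part_scores):
    """Coefficient alpha for a test split into k parts.

    part_scores: k lists, each holding one part's scores for the same examinees.
    alpha = k / (k - 1) * (1 - sum of part variances / total-score variance).
    """
    k = len(part_scores)
    totals = [sum(s) for s in zip(*part_scores)]
    part_var = sum(pvariance(p) for p in part_scores)
    return k / (k - 1) * (1 - part_var / pvariance(totals))


print(coefficient_alpha([[1, 2, 3], [1, 2, 3]]))        # identical parts: 1.0
print(coefficient_alpha([[1, 2, 3, 4], [2, 1, 1, 2]]))  # uncorrelated parts: 0.0
```

The same variance convention (population variance here) must be used for the parts and the total. The ratio is unaffected by the choice as long as it is consistent.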

12.
Image theory for the structure of quantitative variates (cited 1 time; self-citations: 0; citations by others: 1)
A universe of infinitely many quantitative variables is considered, from which a sample of $n$ variables is arbitrarily selected. Only linear least-squares regressions are considered, based on an infinitely large population of individuals or respondents. In the sample of variables, the predicted value of a variable $x$ from the remaining $n - 1$ variables is called the partial image of $x$, and the error of prediction is called the partial anti-image of $x$. The predicted value of $x$ from the entire universe, or the limit of its partial images as $n \to \infty$, is called the total image of $x$, and the corresponding error is called the total anti-image. Images and anti-images can be used to explain why any two variables $x_j$ and $x_k$ are correlated with each other, or to reveal the structure of the intercorrelations of the sample and of the universe. It is demonstrated that image theory is related to common-factor theory but has greater generality than common-factor theory, being able to deal with structures other than those describable in a Spearman-Thurstone factor space. A universal computing procedure is suggested, based upon the inverse of the correlation matrix. This paper introduces one of three new structural theories, each of which generalizes common-factor analysis in a different direction. Nodular theory extends common-factor analysis to qualitative data and to data with curvilinear regressions (6). Order-factor theory introduces the notions of order among the observed variables and of separable factors (7). The present image theory is relevant also to the other two. Attention may be called to empirical results published since this paper was written: Louis Guttman, Two new approaches to factor analysis, Annual Technical Report on contract Nonr-731(00). The present research was aided by an uncommitted grant-in-aid from the Ford Foundation.
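The link to the inverse correlation matrix can be shown directly for partial images. For standardized variables with correlation matrix $R$, the residual (partial anti-image) variance of variable $j$ regressed on the other $n - 1$ variables is $1/(R^{-1})_{jj}$. The partial image variance is therefore the squared multiple correlation $1 - 1/(R^{-1})_{jj}$. This sketch, written without external libraries, computes only these diagonal quantities. It does not implement the paper's full procedure or the total images.

```python
def invert(matrix):
    """Gauss-Jordan inverse (with partial pivoting) of a nonsingular matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def image_variances(corr):
    """Partial image and anti-image variances of standardized variables.

    Anti-image variance of variable j is 1 / (R^-1)_jj; image variance is the
    squared multiple correlation 1 - 1 / (R^-1)_jj.
    """
    inv = invert(corr)
    anti = [1.0 / inv[j][j] for j in range(len(corr))]
    return [1.0 - a for a in anti], anti


print(image_variances([[1.0, 0.6], [0.6, 1.0]]))  # images ~0.36, anti-images ~0.64
```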

13.
Suppose $P_j^{(i)}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) give the locations of $mn$ points in $p$-dimensional space. Collectively these may be regarded as $m$ configurations, or scalings, each of $n$ points in $p$ dimensions. The problem is investigated of translating, rotating, reflecting and scaling the $m$ configurations to minimize the goodness-of-fit criterion $\sum_{i=1}^{m} \sum_{j=1}^{n} \Delta^2(P_j^{(i)}, G_j)$, where $G_j$ is the centroid of the $m$ points $P_j^{(i)}$ ($i = 1, 2, \ldots, m$). The rotated positions of each configuration may be regarded as individual analyses, with the centroid configuration representing a consensus, and this relationship with individual scaling analysis is discussed. A computational technique is given, the results of which can be summarized in analysis-of-variance form. The special case $m = 2$ corresponds to classical Procrustes analysis, but the choice of a criterion that fits each configuration to the common centroid configuration avoids difficulties that arise when one set is fitted to the other, regarded as fixed.
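The centroid criterion can be illustrated with a deliberately reduced sketch: two dimensions, translation and rotation only. Scaling and reflection, which the paper also allows, are omitted. Each configuration is centered, then repeatedly rotated onto the current centroid configuration $G$. In 2-D the least-squares rotation angle has the closed form $\theta = \operatorname{atan2}\big(\sum (x t_y - y t_x), \sum (x t_x + y t_y)\big)$. This is a simple alternating scheme, not necessarily the paper's computational technique.

```python
import math


def center(config):
    """Translate a 2-D configuration so that its centroid is at the origin."""
    cx = sum(x for x, _ in config) / len(config)
    cy = sum(y for _, y in config) / len(config)
    return [(x - cx, y - cy) for x, y in config]


def rotate_onto(config, target):
    """Least-squares rotation of a centered 2-D configuration onto target."""
    num = sum(x * ty - y * tx for (x, y), (tx, ty) in zip(config, target))
    den = sum(x * tx + y * ty for (x, y), (tx, ty) in zip(config, target))
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in config]


def centroid_config(configs):
    """Point-by-point centroid G_j of the m configurations."""
    m = len(configs)
    return [(sum(c[j][0] for c in configs) / m, sum(c[j][1] for c in configs) / m)
            for j in range(len(configs[0]))]


def procrustes_consensus(configs, iterations=50):
    """Translation-and-rotation-only generalized Procrustes fit in 2-D.

    Returns the fitted configurations, the centroid configuration G and the
    criterion sum_i sum_j |P_j^(i) - G_j|^2.
    """
    fitted = [center(c) for c in configs]
    for _ in range(iterations):
        g = centroid_config(fitted)
        fitted = [rotate_onto(c, g) for c in fitted]
    g = centroid_config(fitted)
    loss = sum((x - gx) ** 2 + (y - gy) ** 2
               for c in fitted for (x, y), (gx, gy) in zip(c, g))
    return fitted, g, loss
```

For two configurations that differ only by a rigid motion, the criterion falls to (numerically) zero, which matches the $m = 2$ Procrustes case.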

14.
A hybrid procedure for number correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT and the total test scores are computed based on CTT. Thus, what makes the hybrid scoring method attractive is that this method accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that all scoring methods in this study generated estimated and true scores that were highly correlated. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

15.
The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of $d^2 + v^2$, where $d$ measures item unreliability and $v$ measures the spread of item difficulties. When this variance, $d^2 + v^2$, is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, eta increases up to a certain point and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater overall validity, and superior validity for most cutting scores, compared with a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest $F$ per cent of the men tested, items should on average be located at or above the threshold for men whose true ability is at the $F$th percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.

16.
Assume that an X-linked gene with two alleles mediates performance on field dependence-independence tests such as the rod-and-frame test. Only the recessive gene, with relative frequency $q$, facilitates field independence; other genotypes lead to field dependence. Under a simple genetic model, field dependence-independence may be viewed as outcomes of a discrete random variable $B$ with field-independent and field-dependent probabilities $\pi_i q$ and $1 - \pi_i q$ for men, and $\pi_i q^2$ and $1 - \pi_i q^2$ for women, respectively. The parameter $\pi_i$ is a maturational, age-indexed parameter, $0 < \pi_i \le 1$, monotonically increasing with development until maturity, when $\pi_k = 1$. Observations of performance are made on a random variable $W$ of the form $W = B + N$, where $N$ is normally distributed independently of $B$; $N$ represents a composite of influences including error. The model implies testable age-related between- and within-sex predictions regarding $E(W)$ and $\mathrm{Var}(W)$, predictions which appear to coincide with major empirical findings; it also generates novel predictions. For instance, $W$ has a mixture-of-normals distribution. The model is briefly evaluated in two data sets.
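The moment predictions follow directly from $W = B + N$ once $B$ is coded numerically. This sketch assumes $B = 1$ for field independence and $B = 0$ for field dependence, which is an illustrative coding, and $N \sim \mathrm{Normal}(\mu, \sigma^2)$. Then $E(W) = \mu + P(B = 1)$ and $\mathrm{Var}(W) = P(B = 1)(1 - P(B = 1)) + \sigma^2$, with $P(B = 1) = \pi_i q$ for men and $\pi_i q^2$ for women. The sampler shows why $W$ is a two-component mixture of normals.

```python
import random


def field_independence_moments(pi_age, q, sigma, mu=0.0):
    """E(W) and Var(W) for men and women under W = B + N.

    Assumes B = 1 (field independent) or 0 (field dependent), with
    P(B = 1) = pi_age * q for men and pi_age * q**2 for women, and
    N ~ Normal(mu, sigma**2) independent of B.
    """
    moments = {}
    for sex, prob in (("men", pi_age * q), ("women", pi_age * q * q)):
        moments[sex] = (mu + prob, prob * (1 - prob) + sigma**2)
    return moments


def sample_w(prob, mu, sigma, rng):
    """One draw of W: a Bernoulli(prob) outcome plus independent normal noise,
    so W follows a two-component mixture of normals."""
    b = 1 if rng.random() < prob else 0
    return b + rng.gauss(mu, sigma)


print(field_independence_moments(pi_age=1.0, q=0.5, sigma=1.0))
# {'men': (0.5, 1.25), 'women': (0.25, 1.1875)}
```

Since $q \ge q^2$, this coding predicts a higher mean for men than for women at any given $\pi_i$.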

17.
Multiple‐choice response formats are troublesome, as an item is often scored as solved simply because the examinee may be lucky at guessing the correct option. Instead of pertinent Item Response Theory models, which take guessing effects into account, this paper considers a psycho‐technological approach to re‐conceptualizing multiple‐choice response formats. The free‐response format is compared with two different multiple‐choice formats: a traditional format with a single correct response option and five distractors (‘1 of 6’), and another with five response options, three of them being distractors and two of them being correct (‘2 of 5’). For the latter format, an item is scored as mastered only if both correct response options and none of the distractors are marked. After the exclusion of a few items, the Rasch model analyses revealed appropriate fit for 188 items altogether. The resulting item‐difficulty parameters were used for comparison. The multiple‐choice format ‘1 of 6’ differs significantly from the multiple‐choice format ‘2 of 5’, while the latter does not differ significantly from the free‐response format. The lower difficulty of items ‘1 of 6’ suggests guessing effects.
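A quick calculation shows why the ‘2 of 5’ format resists guessing. This assumes a blind guesser who knows how many options are correct and marks exactly that many. Such a guesser hits the keyed set with probability $1/\binom{m}{c}$ for a "$c$ of $m$" item. That gives 1/6 for ‘1 of 6’ but only 1/10 for ‘2 of 5’.

```python
from fractions import Fraction
from math import comb


def blind_guess_probability(n_options, n_correct):
    """Chance that a blind guesser marking exactly n_correct of n_options
    options hits the keyed set (all correct options, no distractors)."""
    return Fraction(1, comb(n_options, n_correct))


print(blind_guess_probability(6, 1))  # 1/6
print(blind_guess_probability(5, 2))  # 1/10
```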

18.
Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in Psychometrika 75:454–473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential concern in attending to distractors is the possibility that distractor selection reflects a different trait/ability than that underlying the correct response. This paper illustrates a multidimensional extension of a nested logit item response model that can be used to evaluate such distinctions and also defines a new framework for incorporating collateral information from distractor selection when differences exist. The approach is demonstrated in application to questions faced by a university testing center over whether to incorporate distractor selection into the scoring of its multiple-choice tests. Several empirical examples are presented.
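The nested structure can be sketched in two levels. The upper level is a 2PL-type model for choosing the keyed response. The lower level is a multinomial logit over the distractors, conditional on an incorrect response. The optional `theta2` argument is a hypothetical way to let distractor choice depend on a separate trait, in the spirit of the multidimensional extension described above. Parameter names are illustrative, not the paper's notation.

```python
import math


def nested_logit_probs(theta, a, b, slopes, intercepts, theta2=None):
    """Response-category probabilities for one item with a nested-logit structure.

    Returns [P(correct), P(distractor 1), ..., P(distractor D)].
    Upper level: P(correct) = 1 / (1 + exp(-a * (theta - b))).
    Lower level: given an incorrect response, distractor v is chosen with
    probability proportional to exp(intercepts[v] + slopes[v] * eta), where
    eta = theta, or a separate trait theta2 in the multidimensional variant.
    """
    eta = theta if theta2 is None else theta2
    p_correct = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    z = [c + s * eta for s, c in zip(slopes, intercepts)]
    z_max = max(z)  # subtract the max for numerical stability
    w = [math.exp(v - z_max) for v in z]
    total = sum(w)
    return [p_correct] + [(1.0 - p_correct) * wi / total for wi in w]


print([round(v, 3) for v in nested_logit_probs(0.5, 1.2, 0.0, [0.8, -0.4, 0.1], [0.2, 0.0, -0.3])])
```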

19.
Four experiments addressed the relevance of the eigenvalues $I_k$ of the inertia tensor for perceiving length by dynamic touch. Experiments 1–2 focused on the consequences of limiting variation in the minimum eigenvalue $I_3$. Both revealed that perceived length is a function of $I_k$. Whether the contribution of $I_3$ is detected, however, depends on the range of values that characterize a particular object set. Experiments 3–4 considered the relationship between an independent index of a rod’s diameter, which does not affect $I_k$, and actual manipulation of a rod’s diameter, which does affect $I_k$. Whereas the former appeared as satisfaction of implicit instructions to alter reports of perceived length, the latter entailed actual differences in perceived length in accordance with $I_k$. Results are discussed with respect to the links among actual length, perceived length, and $I_k$, as well as, in particular, how these links guarantee that perceived length is in the range of actual lengths.
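To see why changing a rod's diameter changes $I_k$, consider an idealized case. This is an assumption: the experimental rods, grips and rotation points may differ. For a uniform solid cylinder of mass $m$, length $L$ and radius $R$, rotated about a point on its axis at one end, the principal moments are $I_1 = I_2 = m(L^2/3 + R^2/4)$ and $I_3 = mR^2/2$. For a rod of fixed material, mass also grows with $R^2$, so all three eigenvalues increase with diameter.

```python
import math


def rod_inertia_eigenvalues(length, radius, density):
    """Principal moments of inertia of a uniform solid cylinder about a point
    on its long axis at one end, returned as (I1, I2, I3).

    I1 = I2 = m (L^2 / 3 + R^2 / 4) for the two axes perpendicular to the rod,
    I3 = m R^2 / 2 about the rod's own axis; mass m = density * pi * R^2 * L.
    For rod-like shapes (L > R), I1 = I2 > I3, so I3 is the minimum eigenvalue.
    """
    m = density * math.pi * radius**2 * length
    perp = m * (length**2 / 3 + radius**2 / 4)
    axial = m * radius**2 / 2
    return perp, perp, axial


thin = rod_inertia_eigenvalues(1.0, 0.005, 7800.0)   # hypothetical steel rod, SI units
thick = rod_inertia_eigenvalues(1.0, 0.010, 7800.0)
print([round(t / s, 2) for t, s in zip(thick, thin)])  # ratios, thick vs thin
```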

20.
Birnbaum's three-parameter logistic model for the multiple-choice item in latent trait theory is considered with respect to the item response information function and the unique maximum condition. It is shown that, with models of a knowledge-or-random-guessing nature, which include the three-parameter logistic model, the unique maximum condition is not satisfied for the correct answer, and the item response information function is negative on the interval $(-\infty, \theta_g)$. It is suggested that $\theta_g$ be used as a criterion in selecting optimal items for a specified group of examinees, so that the possibility of non-unique maxima of the likelihood function for the response pattern of an examinee in the group can be practically avoided. The work described in this paper was partially done while the author was at the University of New Brunswick, Canada, in 1968–1970, supported by NRC Grant APA-345 from the National Research Council of Canada.
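For the 3PL model $P(\theta) = c + (1 - c)\,\psi(\theta)$, with $\psi(\theta) = 1/(1 + e^{-Da(\theta - b)})$, the item response information for a correct answer is $-\frac{d^2}{d\theta^2}\log P(\theta) = (P'^2 - P P'')/P^2$. Setting the numerator to zero gives $\psi = \sqrt{c}/(1 + \sqrt{c})$. Solving for $\theta$ yields $\theta_g = b + \ln(c)/(2Da)$, below which the information is negative. The sketch checks this numerically. It assumes the conventional scaling constant $D = 1.7$ and requires $c > 0$.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption here)


def correct_response_information(theta, a, b, c):
    """-d^2/dtheta^2 log P(theta) for a correct answer under the 3PL model."""
    psi = 1 / (1 + math.exp(-D * a * (theta - b)))
    p = c + (1 - c) * psi
    dp = (1 - c) * D * a * psi * (1 - psi)
    d2p = (1 - c) * (D * a) ** 2 * psi * (1 - psi) * (1 - 2 * psi)
    return (dp**2 - p * d2p) / p**2


def theta_g(a, b, c):
    """Ability below which the correct-response information is negative (c > 0)."""
    return b + math.log(c) / (2 * D * a)


tg = theta_g(1.0, 0.0, 0.2)
print(round(tg, 3))  # -0.473
print(correct_response_information(tg - 1, 1.0, 0.0, 0.2) < 0)  # True
```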


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP No. 09084417