
1.

A subset selection technique for scoring items on a multiple choice test

Jean D. Gibbons Ingram Olkin Milton Sobel 《Psychometrika》1979,44(3):259-270

On a multiple-choice test in which each item has k alternative responses, the test taker is permitted to choose any subset which he believes contains the one correct answer. A scoring system is devised that depends on the size of the subset and on whether or not the correct answer is eliminated. The mean and variance of the score per item are obtained. Methods are derived for determining the total number of items that should be included on the test so that the average score on all items can be regarded as a good measure of the subject's knowledge. Efficiency comparisons between the conventional and the subset selection scoring procedures are made. The analogous problem of r > 1 correct answers for each item (with r fixed and known) is also considered. The authors are grateful to M. Aitkin, C. Coombs, F. Lord, and the reviewers for their comments and suggestions.
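
The abstract does not state the scoring rule itself, so the sketch below assumes a hypothetical elimination-style rule (each discarded wrong option earns +1; discarding the correct option costs k − 1), which depends only on the subset size s and on whether the correct answer was eliminated. It shows how the per-item mean and variance follow from such a rule, and how a normal-approximation bound could suggest a number of items. The names `item_score`, `score_mean_var`, and `items_needed` are illustrative, not taken from the paper.

```python
import math

def item_score(k: int, s: int, contains_correct: bool) -> int:
    """Hypothetical elimination-style rule (not necessarily the paper's):
    +1 for each wrong option left out of the subset, -(k - 1) if the
    correct option is left out. Net score: k - s if the subset keeps the
    correct answer, -s otherwise."""
    if not 1 <= s <= k:
        raise ValueError("subset size s must satisfy 1 <= s <= k")
    return k - s if contains_correct else -s

def score_mean_var(k: int, s: int, p: float) -> tuple[float, float]:
    """Mean and variance of one item's score when the chosen subset of
    size s contains the correct answer with probability p."""
    hit = item_score(k, s, True)
    miss = item_score(k, s, False)
    mean = p * hit + (1 - p) * miss
    var = p * (1 - p) * (hit - miss) ** 2  # variance of a two-point distribution
    return mean, var

def items_needed(k: int, s: int, p: float, half_width: float, z: float = 1.96) -> int:
    """Smallest number of items n with z * sd / sqrt(n) <= half_width for the
    average score, treating items as independent and identically scored."""
    _, var = score_mean_var(k, s, p)
    return math.ceil(z * z * var / half_width ** 2)
```

Under this assumed rule a blind guesser, who keeps the correct answer with probability p = s/k, has an expected score of exactly zero, which is one reason rules of this kind are attractive.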

2.

Multiple-answer multiple-choice test items: Responding and scoring through Bayes and minimax strategies

George T. Duncan E. O. Milton 《Psychometrika》1978,43(1):43-57

A multiple-answer multiple-choice test item has a certain number of alternatives, any number of which might be keyed. The examinee is also allowed to mark any number of alternatives. This increased flexibility over the one-keyed-alternative case is useful in practice but raises questions about appropriate scoring rules. In this article a certain class of item scoring rules called the binary class is considered. The concepts of standard scoring rules and equivalence among these scoring rules are introduced in the misinformation model, of which the traditional knowledge model is a special case. The examinee's strategy with respect to a scoring rule is examined. The critical role of a quantity called the scoring ratio is emphasized. In the case of examinee uncertainty about the number of correct alternatives on an item, a Bayes and a minimax strategy for the examinee are developed. Also, an appropriate response for the examiner to the minimax strategy is outlined. Research partially supported under Grants N00014-67-A-0314-0022 from the Office of Naval Research and GS-32514 and MPS 75-07539 from the National Science Foundation.

3.

On linear classifications under varying choice probabilities

Robert Bartoszyński 《Journal of mathematical psychology》1978,18(3):249-259

The paper deals with certain problems connected with the assumption that choice probabilities $p_s(x, y)$ depend on the subject $s$. A set of postulates is given which implies the existence of sequences of “classification standards”, i.e., sequences $\{z_j\}$ such that whenever $0 < p_{s_0}(x, z_i) < 1$ for some $s_0$ and $i$, then $p_s(z_{i+k}, x) = p_s(x, z_{i-k}) = 1$ for all $s$ and all $k \ge 1$. Elements of any such sequence $\{z_j\}$ can serve as boundaries between successive categories of classification based on the following rule: assign $x$ to the $j$th category if you feel it is “to the right” of $z_j$ and “to the left” of $z_{j+1}$. Under the condition stated above this rule is unambiguous, and the resulting classification has the property that every element is assigned to one of two neighboring categories, regardless of who performs the classification. Next, the postulates are enriched so as to imply the existence of a “tightest” among such sequences $\{z_j\}$, hence leading to a classification with the largest number of categories.

4.

Decision policies minimizing risk in a multistage betting game

Amnon Rapoport William E. Stein 《Journal of mathematical psychology》1974,11(1):42-58

Considered in this paper is a decision task which has been employed to study multistage betting behavior. When the task commences, a decision maker (DM) is provided with some capital $x$ ($x > 0$) which he is required to allocate over $m$ ($m > 1$) mutually exclusive and collectively exhaustive alternatives, each of which occurs with probability $p_i$ ($p_i > 0$, $i = 1, \ldots, m$; $\sum_{i=1}^{m} p_i = 1$). If the amount $y_i$ is allocated to alternative $i$ ($y_i \ge 0$, $\sum_{i=1}^{m} y_i = x$) and alternative $i$ obtains, DM's capital for the next stage of the game becomes $y_i r_i$, where $r_i$ ($r_i > 0$) is the return per unit allocated to alternative $i$. The task consists of $N$ stages. Defining risk in terms of the mean and variance of DM's bets, and assuming that the minimization of risk is DM's objective, decision policies satisfying this objective are derived in closed form and their testable properties are briefly discussed.
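
As an illustration of the quantities involved (not of Rapoport and Stein's derived policies), the sketch below computes the mean and variance of next-stage capital for a given allocation y, using the update rule stated above; treating risk as the variance of next-stage capital is an assumption made here. It also shows one allocation that trivially minimizes that variance: choosing y_i proportional to 1/r_i makes y_i r_i identical for every outcome. The function names are hypothetical.

```python
def next_capital_moments(p, r, y):
    """Mean and variance of next-stage capital y_i * r_i, where alternative i
    occurs with probability p[i] and y[i] was allocated to it."""
    outcomes = [yi * ri for yi, ri in zip(y, r)]
    mean = sum(pi * o for pi, o in zip(p, outcomes))
    var = sum(pi * (o - mean) ** 2 for pi, o in zip(p, outcomes))
    return mean, var

def equalizing_allocation(x, r):
    """Allocate all of x so that y_i * r_i is the same for every i.
    Next-stage capital is then certain, so its variance is zero."""
    c = x / sum(1.0 / ri for ri in r)  # common (guaranteed) next-stage capital
    return [c / ri for ri in r]
```

The guaranteed capital is x / Σ 1/r_i, so this full hedge preserves capital only when Σ 1/r_i ≤ 1; with unfavourable returns, lower variance is bought with a certain loss.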

5.

The determination of the factor loadings of a given test from the known factor loadings of other tests

Paul S. Dwyer 《Psychometrika》1937,2(3):173-178

A technique is indicated by which approximations to the factor loadings of a new test may be obtained if the factor loadings of a given group of tests and the correlations of the new test with the other tests are known. The technique is applicable to any orthogonal system and is especially adapted to cases in which $a_{ji} a_{jk} = 0$ when $i \neq k$. Application is also made to the simultaneous determination of the factor weights of a group of tests in which no additional common factor is present. The technique is useful in adding tests to a completed factorial solution and in using factorial solutions involving errors to give results which are approximately correct.
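
Reading the condition $a_{ji} a_{jk} = 0$ for $i \neq k$ as "each existing test j loads on at most one factor" (an interpretation), orthogonality gives $r_{nj} = a_{nk} a_{jk}$ for the factor k of test j, so each loading of the new test n can be estimated by least squares. This is a minimal sketch of that idea, not necessarily Dwyer's exact computing scheme; `new_test_loadings` is a hypothetical name.

```python
def new_test_loadings(loadings, r_new):
    """Estimate a new test's loadings on orthogonal factors.

    loadings[j][f]: known loading of existing test j on factor f, with each
    test loading on at most one factor (a_jf * a_jg = 0 for f != g).
    r_new[j]: correlation of the new test with existing test j.
    Under this structure r_new[j] = a_new_f * a_jf for the factor f of test j,
    so a_new_f has the least-squares solution sum(a_jf * r_j) / sum(a_jf ** 2).
    """
    for row in loadings:
        if sum(1 for v in row if v != 0) > 1:
            raise ValueError("each existing test must load on at most one factor")
    estimates = []
    for f in range(len(loadings[0])):
        num = sum(row[f] * r for row, r in zip(loadings, r_new))
        den = sum(row[f] ** 2 for row in loadings)
        estimates.append(num / den if den > 0 else 0.0)  # 0 if no test defines f
    return estimates
```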

6.

A Signal Detection Model for Multiple-Choice Exams

Lawrence T. DeCarlo 《Applied Psychological Measurement》2021,45(6):423

A model for multiple-choice exams is developed from a signal-detection perspective. A correct alternative in a multiple-choice exam can be viewed as being a signal embedded in noise (incorrect alternatives). Examinees are assumed to have perceptions of the plausibility of each alternative, and the decision process is to choose the most plausible alternative. It is also assumed that each examinee either knows or does not know each item. These assumptions together lead to a signal detection choice model for multiple-choice exams. The model can be viewed, statistically, as a mixture extension, with random mixing, of the traditional choice model, or similarly, as a grade-of-membership extension. A version of the model with extreme value distributions is developed, in which case the model simplifies to a mixture multinomial logit model with random mixing. The approach is shown to offer measures of item discrimination and difficulty, along with information about the relative plausibility of each of the alternatives. The model, parameters, and measures derived from the parameters are compared to those obtained with several commonly used item response theory models. An application of the model to an educational data set is presented.
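
A minimal sketch of the "knows / does not know" mixture of multinomial logits described above, under a hypothetical parameterization: each alternative j has a plausibility b_j, an examinee who knows the item perceives the correct alternative with an extra shift d, and λ is the probability of knowing the item. DeCarlo's model is built from extreme-value perception distributions and has its own parameterization; the names here (`choice_probs`, `b`, `d`, `lam`) are assumptions for illustration.

```python
import math

def softmax(scores):
    """Multinomial logit choice probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choice_probs(b, d, correct, lam):
    """Mixture multinomial logit sketch.

    b[j]: plausibility of alternative j; d: extra plausibility of the correct
    alternative for examinees who know the item; correct: index of the keyed
    alternative; lam: probability that the examinee knows the item."""
    known = softmax([bj + (d if j == correct else 0.0) for j, bj in enumerate(b)])
    unknown = softmax(b)  # no signal: choice driven by plausibility alone
    return [lam * k + (1 - lam) * u for k, u in zip(known, unknown)]
```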

7.

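A technique for monitoring item bank quality in CAT that combines a Bayesian approach with the sequential monitoring procedure

Guo Lei, Liu Wei 《Journal of Psychological Science》2018,(1):189-195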

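Zhang (2013) proposed the sequential monitoring procedure (SMP) for detecting whether items in a CAT have been compromised (leaked) during test administration. However, the method produces false alarms, and it does not address how item compromise affects the precision of ability estimation. Building on SMP, this study incorporates person-fit indices and proposes the SMP_PFI method, which aims to verify, at a given confidence level, whether items flagged by SMP have actually been compromised, and examines how SMP_PFI affects the relationship between ability-estimation precision and the number of items retired from the bank. Experimental results show that the new method effectively reduces the Type I error rate of SMP run alone, and that controlling the CPFI value makes it possible to balance ability-estimation precision against the number of retired items.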

8.

On the scales of measurement

Louis Narens 《Journal of mathematical psychology》1981,24(3):249-275

Let $\mathfrak{X} = \langle X, \geqq, R_1, R_2, \ldots \rangle$ be a relational structure, $\langle X, \geqq \rangle$ be a Dedekind complete, totally ordered set, and $n$ be a nonnegative integer. $\mathfrak{X}$ is said to satisfy n-point homogeneity if and only if for each $x_1, \ldots, x_n, y_1, \ldots, y_n$ such that $x_1 > x_2 > \cdots > x_n$ and $y_1 > y_2 > \cdots > y_n$, there exists an automorphism $\alpha$ of $\mathfrak{X}$ such that $\alpha(x_i) = y_i$ for $i = 1, \ldots, n$. $\mathfrak{X}$ is said to satisfy n-point uniqueness if and only if for all automorphisms $\beta$ and $\gamma$ of $\mathfrak{X}$, if $\beta$ and $\gamma$ agree at $n$ distinct points of $\mathfrak{X}$, then $\beta$ and $\gamma$ are identical. It is shown that if $\mathfrak{X}$ satisfies n-point homogeneity and n-point uniqueness, then $n \leqq 2$; for the case $n = 1$, $\mathfrak{X}$ is ratio scalable, and for the case $n = 2$, interval scalable. This result is very general and may in part provide an explanation of why so few scale types have arisen in science. The cases of 0-point homogeneity and infinite-point homogeneity are also discussed.

9.

How one gambles if one must: Effects of differing return rates on multistage betting decisions

Amnon Rapoport Sandra G. Funk Jay R. Levinsohn Lyle V. Jones 《Journal of mathematical psychology》1977,15(2):169-198

In the multistage betting game (MBG), a decision maker (DM) is provided with some capital $x$ which he is required to bet over $m$ ($m > 1$) mutually exclusive and collectively exhaustive alternatives, each of which occurs with probability $p_i$ ($p_i > 0$, $i = 1, \ldots, m$; $\sum_{i=1}^{m} p_i = 1$). If $y_i$ is bet on alternative $i$ ($y_i \ge 0$, $\sum_{i=1}^{m} y_i = x$) and alternative $i$ obtains, the DM's capital for the next stage is $y_i r_i$ ($r_i > 0$). The MBG lasts until either the DM loses his capital or $N$ stages elapse, whichever comes first. Each of six subjects participated in six sessions consisting of several hundred 3-alternative MBG stages. A within-subject design assigned negative expected value (EV) bets to the first three sessions and positive EV bets to the remaining three sessions. Significant effects were found due to return rate, capital size, homogeneous runs of either wins or losses, and individual differences. Four expected-utility-maximization models and two risk-minimization models were presented and tested. A modified logarithmic utility model is proposed, which provides the best fit to the data. The implications of the results and directions for further research are briefly discussed.
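
The game dynamics described above (bet all capital, keep $y_i r_i$ for the alternative that obtains, stop at zero capital or after N stages) can be simulated in a few lines. This sketch assumes a fixed proportional policy $y_i = f_i x$ chosen purely for illustration; it is not one of the six models the authors tested, and `play_mbg` is a hypothetical name.

```python
import random

def play_mbg(x, p, r, fractions, n_stages, seed=0):
    """Simulate the MBG under a fixed proportional policy
    y_i = fractions[i] * current capital. Returns the capital trajectory,
    which stops early once capital reaches 0."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("all capital must be bet: fractions must sum to 1")
    rng = random.Random(seed)
    history = [x]
    for _ in range(n_stages):
        if history[-1] <= 0:
            break  # DM has lost the capital; game over
        i = rng.choices(range(len(p)), weights=p)[0]  # alternative that obtains
        history.append(fractions[i] * history[-1] * r[i])
    return history
```

With "fair" returns r_i = 1/p_i and f_i = p_i, capital never changes, which makes a convenient sanity check for a simulator of this kind.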

10.

Estimation of the reliability of ratings

Robert L. Ebel 《Psychometrika》1951,16(4):407-424

A procedure for estimating the reliability of sets of ratings, test scores, or other measures is described and illustrated. This procedure, based upon analysis of variance, may be applied both in the special case where a complete set of ratings from each of k sources is available for each of n subjects, and in the general case where $k_1, k_2, \ldots, k_n$ ratings are available for each of the n subjects. It may be used to obtain either a unique estimate or a confidence interval for the reliability of either the component ratings or their averages. The relations of this procedure to others intended to serve the same purpose are considered algebraically and illustrated numerically. The writer wishes to acknowledge the helpful comments and suggestions of Professors E. E. Cureton, Harold Gulliksen, and E. F. Lindquist.
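
For the special case of a complete n × k table, a common analysis-of-variance formulation is the one-way intraclass correlation: $(MS_b - MS_w)/(MS_b + (k-1)MS_w)$ for a single rating and $(MS_b - MS_w)/MS_b$ for the average of k ratings, where $MS_b$ is the between-subjects and $MS_w$ the within-subjects mean square. The sketch below assumes this form, which treats all within-subject variation (including rater differences) as error; Ebel's paper also discusses the choice of error term and the unequal-$k_i$ case, which this does not cover. `rating_reliability` is a hypothetical name.

```python
def rating_reliability(ratings):
    """ratings[s][j]: rating of subject s by source j (complete n x k table).
    Returns one-way ANOVA reliabilities (single rating, mean of k ratings)."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with n >= 2 subjects and k >= 2 sources")
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    ss_between = k * sum((m - grand) ** 2 for m in means)
    ss_within = sum((v - m) ** 2 for row, m in zip(ratings, means) for v in row)
    ms_b = ss_between / (n - 1)          # between-subjects mean square
    ms_w = ss_within / (n * (k - 1))     # within-subjects (error) mean square
    single = (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)
    average = (ms_b - ms_w) / ms_b
    return single, average
```

The two values are linked by the Spearman-Brown formula: stepping the single-rating value up by a factor of k gives the reliability of the average.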

11.

A generalization of coefficient alpha

Nambury S. Raju 《Psychometrika》1977,42(4):549-565

It is well known that coefficient alpha can be used to estimate the reliability of a test even when the test is split into several parts. It is also known that alpha can severely underestimate test reliability when the several parts have an unequal number of items. A generalization of alpha, $\beta_k$, is proposed to correct this defect. Several properties of $\beta_k$ are also presented. The author gratefully acknowledges the assistance of Dr. Leonard Feldt for reviewing an earlier draft of this paper, and Ms. Rita Karwacki Bode and Mr. Dave Mansell for the analysis of the experimental data reported here. The comments of an unknown referee which contributed substantially to the clarity of the presentation are also gratefully acknowledged.
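
As it is usually stated, Raju's coefficient weights each part by its share of the items, $\lambda_j = n_j / n$: $\beta_k = (\sigma_X^2 - \sum_j \sigma_j^2) / ((1 - \sum_j \lambda_j^2)\,\sigma_X^2)$. With equal parts, $1 - \sum_j \lambda_j^2 = (k-1)/k$ and the expression reduces to alpha. The sketch below checks that reduction; treat the formula as a summary to verify against the paper rather than a quotation of it. Function names are illustrative.

```python
def pvariance(xs):
    """Population variance; any consistent choice of divisor cancels in the ratios."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def raju_beta(part_scores, part_lengths):
    """part_scores[j][i]: score of examinee i on part j;
    part_lengths[j]: number of items in part j."""
    totals = [sum(scores) for scores in zip(*part_scores)]
    var_total = pvariance(totals)
    sum_part_var = sum(pvariance(part) for part in part_scores)
    n_items = sum(part_lengths)
    sum_lambda_sq = sum((nj / n_items) ** 2 for nj in part_lengths)
    return (var_total - sum_part_var) / ((1 - sum_lambda_sq) * var_total)

def coefficient_alpha(part_scores):
    """Cronbach's alpha treating each part as one component."""
    k = len(part_scores)
    totals = [sum(scores) for scores in zip(*part_scores)]
    sum_part_var = sum(pvariance(part) for part in part_scores)
    return k / (k - 1) * (1 - sum_part_var / pvariance(totals))
```

Because $\sum_j \lambda_j^2$ is smallest when the parts are equal, unequal part lengths shrink the denominator and raise the estimate above alpha, which is the direction of correction the abstract describes.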

12.

Image theory for the structure of quantitative variates

Louis Guttman 《Psychometrika》1953,18(4):277-296

A universe of infinitely many quantitative variables is considered, from which a sample of n variables is arbitrarily selected. Only linear least-squares regressions are considered, based on an infinitely large population of individuals or respondents. In the sample of variables, the predicted value of a variable x from the remaining n − 1 variables is called the partial image of x, and the error of prediction is called the partial anti-image of x. The predicted value of x from the entire universe, or the limit of its partial images as $n \to \infty$, is called the total image of x, and the corresponding error is called the total anti-image. Images and anti-images can be used to explain why any two variables $x_j$ and $x_k$ are correlated with each other, or to reveal the structure of the intercorrelations of the sample and of the universe. It is demonstrated that image theory is related to common-factor theory but has greater generality, being able to deal with structures other than those describable in a Spearman-Thurstone factor space. A universal computing procedure is suggested, based upon the inverse of the correlation matrix. This paper introduces one of three new structural theories, each of which generalizes common-factor analysis in a different direction. Nodular theory extends common-factor analysis to qualitative data and to data with curvilinear regressions (6). Order-factor theory introduces the notions of order among the observed variables and of separable factors (7). The present image theory is relevant also to the other two. Attention may be called to empirical results published since this paper was written: Louis Guttman, Two new approaches to factor analysis, Annual Technical Report on contract Nonr-731(00). The present research was aided by an uncommitted grant-in-aid from the Ford Foundation.
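
The link to "the inverse of the correlation matrix" can be illustrated for the partial images of standardized variables: if $r^{jj}$ is the j-th diagonal element of $R^{-1}$, the partial anti-image of $x_j$ has variance $1/r^{jj}$ and the partial image has variance $1 - 1/r^{jj}$, the squared multiple correlation of $x_j$ on the other n − 1 variables. The sketch below computes these with a small Gauss-Jordan inverse so that it needs only the standard library; it is not Guttman's full computing procedure, and the function names are illustrative.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(aug[row_idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def image_anti_image_variances(R):
    """Partial image and anti-image variances of standardized variables.
    anti-image: 1 / (R^-1)_jj; image: 1 - 1 / (R^-1)_jj (squared multiple correlation)."""
    inv = invert(R)
    anti = [1.0 / inv[j][j] for j in range(len(R))]
    return [1.0 - a for a in anti], anti
```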

13.

Generalized procrustes analysis

J. C. Gower 《Psychometrika》1975,40(1):33-51

Suppose $P_j^{(i)}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) give the locations of $mn$ points in $p$-dimensional space. Collectively these may be regarded as $m$ configurations, or scalings, each of $n$ points in $p$ dimensions. The problem is investigated of translating, rotating, reflecting and scaling the $m$ configurations to minimize the goodness-of-fit criterion $\sum_{i=1}^{m} \sum_{j=1}^{n} \Delta^2(P_j^{(i)}, G_j)$, where $G_j$ is the centroid of the $m$ points $P_j^{(i)}$ ($i = 1, 2, \ldots, m$). The rotated positions of each configuration may be regarded as individual analyses with the centroid configuration representing a consensus, and this relationship with individual scaling analysis is discussed. A computational technique is given, the results of which can be summarized in analysis of variance form. The special case $m = 2$ corresponds to Classical Procrustes analysis, but the choice of criterion that fits each configuration to the common centroid configuration avoids difficulties that arise when one set is fitted to the other, regarded as fixed.
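
To make the criterion concrete, here is a deliberately reduced sketch of generalized Procrustes analysis for two-dimensional configurations: it centers each configuration and alternately rotates each one toward the current centroid, recomputing the centroid until $\sum_i \sum_j \Delta^2(P_j^{(i)}, G_j)$ stops decreasing. Reflection and scaling, which Gower's method also allows, are omitted, and the closed-form 2-D rotation angle stands in for the general p-dimensional solution. Function names are illustrative.

```python
import math

def center(config):
    """Translate a configuration so that its centroid is at the origin."""
    mx = sum(x for x, _ in config) / len(config)
    my = sum(y for _, y in config) / len(config)
    return [(x - mx, y - my) for x, y in config]

def rotate(config, theta):
    """Rotate every point counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in config]

def centroid(configs):
    """G_j: mean position of point j over all configurations."""
    m = len(configs)
    return [(sum(cfg[j][0] for cfg in configs) / m,
             sum(cfg[j][1] for cfg in configs) / m)
            for j in range(len(configs[0]))]

def residual(configs, g):
    """Criterion: sum over i, j of the squared distance from P_j^(i) to G_j."""
    return sum((x - gx) ** 2 + (y - gy) ** 2
               for cfg in configs
               for (x, y), (gx, gy) in zip(cfg, g))

def gpa_2d(configs, tol=1e-12, max_iter=100):
    """Translation + rotation GPA for 2-D configurations (no reflection or scaling)."""
    configs = [center(cfg) for cfg in configs]
    g = centroid(configs)
    loss = residual(configs, g)
    for _ in range(max_iter):
        rotated = []
        for cfg in configs:
            # Angle maximizing sum_j (R p_j) . g_j, i.e. the best rotation onto g.
            num = sum(x * gy - y * gx for (x, y), (gx, gy) in zip(cfg, g))
            den = sum(x * gx + y * gy for (x, y), (gx, gy) in zip(cfg, g))
            rotated.append(rotate(cfg, math.atan2(num, den)))
        configs = rotated
        g = centroid(configs)
        new_loss = residual(configs, g)
        converged = loss - new_loss < tol
        loss = new_loss
        if converged:
            break
    return configs, g, loss
```

Each rotation step can only lower the distance to the fixed centroid, and recomputing the centroid lowers it again, so the criterion decreases monotonically.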

14.

A Proposed Number Correct Scoring Procedure Based on Classical True-Score Theory and Multidimensional Item Response Theory

《International Journal of Testing》2013,13(2):131-141

A hybrid procedure for number-correct scoring is proposed. The proposed scoring procedure is based on both classical true-score theory (CTT) and multidimensional item response theory (MIRT). Specifically, the hybrid scoring procedure uses test item weights based on MIRT, and the total test scores are computed based on CTT. The hybrid method is thus attractive because it accounts for the dimensionality of the test items while test scores remain easy to compute. Further, the hybrid scoring does not require large sample sizes once the item parameters are known. Monte Carlo techniques were used to compare and contrast the proposed hybrid scoring method with three other scoring procedures. Results indicated that, for all scoring methods in this study, estimated scores were highly correlated with true scores. However, the hybrid scoring procedure had significantly smaller error variances between the estimated and true scores relative to the other procedures.

15.

Efficiency of multiple-choice tests as a function of spread of item difficulties

Lee J. Cronbach Willard G. Warrington 《Psychometrika》1952,17(2):127-147

The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of $\sigma_d^2 + \sigma_v^2$, where $\sigma_d$ measures item unreliability and $\sigma_v$ measures the spread of item difficulties. When this variance is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, η increases up to a certain point, and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater over-all validity, and superior validity for most cutting scores, compared to a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest F per cent of the men tested, items should on the average be located at or above the threshold for men whose true ability is at the Fth percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.

16.

A strong developmental theory of field dependence-independence

Hoben Thomas 《Journal of mathematical psychology》1982,26(2):169-178

Assume an X-linked gene in two alleles mediates performance on field dependent-independent tests such as the rod-and-frame test. Only the recessive gene, with relative frequency q, facilitates field independence; other genotypes lead to field dependence. Under a simple genetic model, field dependence-independence may be viewed as outcomes of a discrete random variable B with field-independent and field-dependent probabilities $\pi_i q$ and $1 - \pi_i q$ for men, and $\pi_i q^2$ and $1 - \pi_i q^2$ for women, respectively. The parameter $\pi_i$ is a maturational, age-indexed parameter, $0 < \pi_i \le 1$, monotonically increasing with development until maturity, when $\pi_k = 1$. Observations of performance are made on a random variable W of the form W = B + N, where N is normally distributed independently of B; N represents a composite of influences including error. The model implies testable age-related between- and within-sex predictions regarding E(W) and Var(W), predictions which appear to coincide with major empirical findings; it also generates novel predictions. For instance, W has a mixture-of-normals distribution. The model is briefly evaluated in two data sets.
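
The moment predictions follow directly from W = B + N. Coding field independence as B = 1 and field dependence as B = 0 (an assumption; the abstract does not state the coding), with N normal with mean μ and standard deviation σ, gives $E(W) = \theta + \mu$ and $\mathrm{Var}(W) = \theta(1-\theta) + \sigma^2$, where θ is $\pi_i q$ for men and $\pi_i q^2$ for women. The sketch also evaluates the two-component normal mixture density of W. Names are illustrative.

```python
import math

def independence_prob(pi_age: float, q: float, sex: str) -> float:
    """P(B = 1) under the X-linked recessive model: pi*q for men, pi*q**2 for women."""
    if sex == "male":
        return pi_age * q
    if sex == "female":
        return pi_age * q ** 2
    raise ValueError("sex must be 'male' or 'female'")

def w_moments(pi_age, q, sex, mu, sigma):
    """E(W) and Var(W) for W = B + N, with B ~ Bernoulli(theta) and
    N ~ Normal(mu, sigma**2) independent of B (B coded 1 = field independent)."""
    theta = independence_prob(pi_age, q, sex)
    return theta + mu, theta * (1 - theta) + sigma ** 2

def w_density(w, theta, mu, sigma):
    """Density of W: a normal mixture, the N density shifted by 1 with weight theta."""
    def normal_pdf(z):
        return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return theta * normal_pdf(w - 1) + (1 - theta) * normal_pdf(w)
```

For 0 < q < 1, q² < q, so under this coding the sketch reproduces the between-sex prediction of a higher mean for men at the same maturational level.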

17.

On Minimizing Guessing Effects on Multiple‐Choice Items: Superiority of a two solutions and three distractors item format to a one solution and five distractors item format

Klaus D. Kubinger Stefana Holocher‐Ertl Manuel Reif Christine Hohensinn Martina Frebort 《International Journal of Selection & Assessment》2010,18(1):111-115

Multiple‐choice response formats are troublesome, as an item is often scored as solved simply because the examinee may be lucky at guessing the correct option. Instead of pertinent Item Response Theory models, which take guessing effects into account, this paper considers a psycho‐technological approach to re‐conceptualizing multiple‐choice response formats. The free‐response format is compared with two different multiple‐choice formats: a traditional format with a single correct response option and five distractors (‘1 of 6’), and another with five response options, three of them being distractors and two of them being correct (‘2 of 5’). For the latter format, an item is scored as mastered only if both correct response options and none of the distractors are marked. After the exclusion of a few items, the Rasch model analyses revealed appropriate fit for 188 items altogether. The resulting item‐difficulty parameters were used for comparison. The multiple‐choice format ‘1 of 6’ differs significantly from the multiple‐choice format ‘2 of 5’, while the latter does not differ significantly from the free‐response format. The lower difficulty of items ‘1 of 6’ suggests guessing effects.
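
The guessing advantage of the ‘1 of 6’ format can be made concrete with a back-of-the-envelope calculation, assuming a purely guessing examinee and the scoring stated above (an item counts as solved only if exactly the correct options are marked). If the examinee knows how many options are correct, blind guessing succeeds with probability 1/6 for ‘1 of 6’ and 1/C(5, 2) = 1/10 for ‘2 of 5’; if each option is marked at random, the ‘2 of 5’ rate falls to 1/2^5 = 1/32. These are illustrative figures, not results from the study, and `blind_guess_prob` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def blind_guess_prob(n_options: int, n_correct: int, n_ruled_out: int = 0,
                     knows_n_correct: bool = True) -> Fraction:
    """Probability that a guessing examinee has an item scored as solved,
    where 'solved' means marking exactly the correct options and no distractor.
    n_ruled_out: distractors the examinee can eliminate with certainty."""
    if not 0 <= n_ruled_out <= n_options - n_correct:
        raise ValueError("can only rule out existing distractors")
    remaining = n_options - n_ruled_out
    if knows_n_correct:
        # picks uniformly among the subsets of the right size
        return Fraction(1, comb(remaining, n_correct))
    # marks each remaining option independently with probability 1/2
    return Fraction(1, 2 ** remaining)
```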

18.

Application of a Multidimensional Nested Logit Model to Multiple-Choice Test Items

Daniel M. Bolt James A. Wollack Youngsuk Suh 《Psychometrika》2012,77(2):339-357

Nested logit models have been presented as an alternative to multinomial logistic models for multiple-choice test items (Suh and Bolt in Psychometrika 75:454–473, 2010) and possess a mathematical structure that naturally lends itself to evaluating the incremental information provided by attending to distractor selection in scoring. One potential concern in attending to distractors is the possibility that distractor selection reflects a different trait/ability than that underlying the correct response. This paper illustrates a multidimensional extension of a nested logit item response model that can be used to evaluate such distinctions and also defines a new framework for incorporating collateral information from distractor selection when differences exist. The approach is demonstrated in application to questions faced by a university testing center over whether to incorporate distractor selection into the scoring of its multiple-choice tests. Several empirical examples are presented.

19.

Inertial eigenvalues, rod density, and rod diameter in length perception by dynamic touch

Claudia Carello Paula Fitzpatrick Ittai Flascher M. T. Turvey 《Attention, perception & psychophysics》1998,60(1):89-100

Four experiments addressed the relevance of the eigenvalues $I_k$ of the inertia tensor for perceiving length by dynamic touch. Experiments 1–2 focused on the consequences of limiting variation in the minimum eigenvalue $I_3$. Both revealed that perceived length is a function of $I_k$. Whether the contribution of $I_3$ is detected, however, depends on the range of values that characterize a particular object set. Experiments 3–4 considered the relationship between an independent index of a rod’s diameter, which does not affect $I_k$, and actual manipulation of a rod’s diameter, which does affect $I_k$. Whereas the former appeared as satisfaction of implicit instructions to alter reports of perceived length, the latter entailed actual differences in perceived length in accordance with $I_k$. Results are discussed with respect to the links among actual length, perceived length, and $I_k$, as well as, in particular, how these links guarantee that perceived length is in the range of actual lengths.

20.

A comment on Birnbaum's three-parameter logistic model in the latent trait theory

Fumiko Samejima 《Psychometrika》1973,38(2):221-233

Birnbaum's three-parameter logistic model for the multiple-choice item in the latent trait theory is considered with respect to the item response information function and the unique maximum condition. It is clarified that with models of knowledge or random guessing nature, which include the three-parameter logistic model, the unique maximum condition is not satisfied for the correct answer, and the item response information function is negative for the interval $(-\infty, \theta_g)$. It is suggested that we should use $\theta_g$ as a criterion in selecting optimal items for a specified group of examinees, so that we can practically avoid the possibility of non-unique maxima of the likelihood function on the response pattern given by an examinee in the group. The work described in this paper was partially done while the author was at the University of New Brunswick, Canada, in 1968–1970, supported by NRC Grant APA-345 from the National Research Council of Canada.
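
For the three-parameter logistic model $P(\theta) = c + (1-c)/(1 + e^{-Da(\theta-b)})$, the information of the correct answer, $I(\theta) = -\partial^2 \log P(\theta)/\partial\theta^2$, changes sign where $e^{Da(\theta-b)} = \sqrt{c}$, that is at $\theta_g = b + \log c / (2Da)$ (derived here from the model; check it against the paper's notation). The sketch evaluates $I(\theta)$ in closed form and can be compared with a finite-difference second derivative. The scaling constant D = 1.7 is the usual convention and an assumption here; function names are illustrative.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed)

def p3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def info_correct(theta, a, b, c):
    """Item response information of the correct answer, -d^2/dtheta^2 log P(theta).
    Closed form: -(1-c)(Da)^2 psi(1-psi) [c - 2c psi - (1-c) psi^2] / P^2,
    where psi is the logistic part of the model."""
    psi = 1 / (1 + math.exp(-D * a * (theta - b)))
    p = c + (1 - c) * psi
    bracket = c - 2 * c * psi - (1 - c) * psi ** 2  # positive => negative information
    return -(1 - c) * (D * a) ** 2 * psi * (1 - psi) * bracket / p ** 2

def theta_g(a, b, c):
    """Point below which the correct-answer information is negative (0 < c < 1)."""
    return b + math.log(c) / (2 * D * a)
```

With c = 0 the bracket reduces to −ψ² and the information is the familiar positive two-parameter value $(Da)^2 \psi(1-\psi)$; any c > 0 pushes part of the ability scale into negative information.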