1.

Empirical Bayes estimates of domain scores under binomial and hypergeometric distributions for test scores

Miao-Hsiang Lin Chao A. Hsiung 《Psychometrika》1994,59(3):331-359

We introduce two simple empirical approximate Bayes estimators (EABEs) for estimating domain scores under binomial and hypergeometric distributions, respectively. Both EABEs (derived from corresponding marginal distributions of observed test score x without relying on knowledge of prior domain score distributions) have been proven to hold -asymptotic optimality in Robbins' sense of convergence in mean. We found that, where and are the monotonized versions of and under Van Houwelingen's monotonization method, respectively, the convergence rate of the overall expected loss of Bayes risk in either or depends on test length, sample size, and ratio of test length to size of domain items. In terms of conditional Bayes risk, and outperform their maximum likelihood counterparts over the middle range of domain scales. In terms of mean-squared error, we also found that: (a) given a unimodal prior distribution of domain scores, performs better than both and a linear EBE of the beta-binomial model when domain item size is small or when test items reflect a high degree of heterogeneity; (b) performs as well as when prior distribution is bimodal and test items are homogeneous; and (c) the linear EBE is extremely robust when a large pool of homogeneous items plus a unimodal prior distribution exists. The authors are indebted to both anonymous reviewers, especially Reviewer 2, and the Editor for their invaluable comments and suggestions. Thanks are also due to Yuan-Chin Chang and Chin-Fu Hsiao for their help with our simulation and programming work.
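The abstract above compares its estimators against "a linear EBE of the beta-binomial model" without spelling that estimator out. As a rough illustration only, the sketch below uses one common linear form: a Kelley-type regression of the observed proportion toward the group mean, weighted by the KR-21 reliability. Whether this matches the exact linear EBE used by Lin and Hsiung is an assumption, and the function names `kr21` and `linear_eb_domain_scores` are hypothetical.

```python
def kr21(scores, n_items):
    """KR-21 reliability computed from number-correct scores on an n_items test."""
    n_people = len(scores)
    m = sum(scores) / n_people
    var = sum((x - m) ** 2 for x in scores) / n_people  # population variance
    if var == 0:
        raise ValueError("scores have zero variance; KR-21 is undefined")
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * var))


def linear_eb_domain_scores(scores, n_items):
    """Shrink each observed proportion toward the group mean proportion.

    Assumed form: estimate = rel * (x / n) + (1 - rel) * (mean / n),
    with rel the KR-21 reliability clamped to the interval [0, 1].
    """
    m = sum(scores) / len(scores)
    rel = min(1.0, max(0.0, kr21(scores, n_items)))
    return [rel * (x / n_items) + (1 - rel) * (m / n_items) for x in scores]


if __name__ == "__main__":
    # Three examinees on a 10-item test (made-up numbers for illustration).
    print(linear_eb_domain_scores([2, 5, 8], 10))
```

The weight `rel` is estimated from the observed scores alone, which is what makes the estimate "empirical": no prior distribution of domain scores has to be specified in advance.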

2.

Empirical Bayes point estimates of latent trait scores without knowledge of the trait distribution

William Meredith Jack Kearns 《Psychometrika》1973,38(4):533-554

In this paper, recent developments in empirical Bayes procedures are tied in with current work in mental test theory. Point estimators of true scores are derived for the binomial and Rasch test models. These estimators are shown to be asymptotically optimal. Smoothing and an empirical study of the behavior of empirical Bayes estimates are taken up in the final section. This research was supported by the National Science Foundation, Division of Biological and Medical Sciences, Program in Psycho-Biology, Grant No. NSF GB-30779.

3.

Comparison of the three-decision and conventional multiple-choice tests

C S Bernhardson 《Psychological reports》1967,20(3):695-698

4.

Comparison of intercorrelations of scale scores from the opinions about mental illness scale

J Fracchia J Pintyr J Crovello C Sheppard S Merlis 《Psychological reports》1972,30(1):149-150

5.

Significant intercorrelations among measures of overinclusive thinking

R J Craig 《Psychological reports》1970,26(2):571-574

6.

Memorial consequences of multiple-choice testing on immediate and delayed tests

Lisa K. Fazio Pooja K. Agarwal Elizabeth J. Marsh Henry L. Roediger 《Memory & cognition》2010,38(4):407-418

Multiple-choice testing has both positive and negative consequences for performance on later tests. Prior testing increases the number of questions answered correctly on a later test but also increases the likelihood that questions will be answered with lures from the previous multiple-choice test (Roediger & Marsh, 2005). Prior research has shown that the positive effects of testing persist over a delay, but no one has examined the durability of the negative effects of testing. To address this, subjects took multiple-choice and cued recall tests (on subsets of questions) both immediately and a week after studying. Although delay reduced both the positive and negative testing effects, both still occurred after 1 week, especially if the multiple-choice test had also been delayed. These results are consistent with the argument that recollection underlies both the positive and negative testing effects.

7.

The effect of type and timing of feedback on learning from multiple-choice tests

Butler AC Karpicke JD Roediger HL 《Journal of experimental psychology. Applied》2007,13(4):273-281

Two experiments investigated how the type and timing of feedback influence learning from a multiple-choice test. First, participants read 12 prose passages, which covered various general knowledge topics (e.g., The Sun) and ranged between 280 and 300 words in length. Next, they took an initial six-alternative, multiple-choice test on information contained in the passages. Feedback was given immediately for some of the multiple-choice items or after delay for other items. Participants were either shown the correct answer as feedback (standard feedback) or were allowed to keep answering until the correct answer was discovered (answer-until-correct feedback). Learning from the test was assessed on a delayed cued-recall test. The results indicated that delayed feedback led to superior final test performance relative to immediate feedback. However, type of feedback did not matter: discovering the correct answer through answer-until-correct feedback produced equivalent performance relative to standard feedback. This research suggests that delaying the presentation of feedback after a test is beneficial to learning because of the spaced presentation of information.

8.

The role of instructions in the variability of sex-related differences in multiple-choice tests

Gerardo Prieto Ana R. Delgado 《Personality and individual differences》1999,27(6):415

When both experts and lay people interpret data on sex-related differences, they usually forget that the instruments for data collection might be provoking such differences. This experiment, carried out on 240 participants, focused on the effects of four instruction/scoring conditions on sex effect size in two computerized tests (vocabulary and mental rotation), for which sex-related differences had been shown to be, respectively, small (favoring females) and large (favoring males). Given the caution which seems to characterize female performance, our general hypothesis predicted that, under instructions encouraging guessing, effect sizes favoring males would augment and effect sizes favoring females would diminish. The opposite results were expected under instructions discouraging guessing. Some supporting evidence was found.

9.

The relation of the reliability of multiple-choice tests to the distribution of item difficulties

Frederic M. Lord 《Psychometrika》1952,17(2):181-194

Under certain assumptions an expression, in terms of item difficulties and intercorrelations, is derived for the curvilinear correlation of test score on the ability underlying the test, this ability being defined as the common factor of the item tetrachoric intercorrelations corrected for guessing. It is shown that this curvilinear correlation is equal to the square root of the test reliability. Numerical values for these curvilinear correlations are presented for a number of hypothetical tests, defined in terms of their item parameters. These numerical results indicate that the reliability and the curvilinear correlation will be maximized by (1) minimizing the variability of item difficulty and (2) making the level of item difficulty somewhat easier than the halfway point between a chance percentage of correct answers and 100 per cent correct answers.
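To make the "halfway point" in recommendation (2) concrete, here is a small arithmetic sketch. It assumes that the chance percentage for an item with k equally attractive options is 1/k, which the abstract does not state explicitly; the function name `halfway_difficulty` is hypothetical.

```python
from fractions import Fraction


def halfway_difficulty(n_options):
    """Proportion correct halfway between chance (assumed 1/k) and 100 per cent."""
    if n_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    chance = Fraction(1, n_options)
    return (chance + 1) / 2


if __name__ == "__main__":
    for k in (2, 4, 5):
        print(k, halfway_difficulty(k))
```

The abstract says only that items should be "somewhat easier" than this point, meaning a proportion correct somewhat above it. It does not say by how much, so the sketch computes the reference point and nothing more.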

10.

Attitudes towards the use of tests and test scores

Gellman E Guarino AJ Witte JE 《Psychological reports》2001,89(3):669-671

This exploratory study compared the perceived use of tests and test scores of 43 adult education teachers and 130 teachers in the K-12 system tested earlier.

11.

Efficiency of multiple-choice tests as a function of spread of item difficulties

Lee J. Cronbach Willard G. Warrington 《Psychometrika》1952,17(2):127-147

The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of σ_d² + σ_v², where σ_d measures item unreliability and σ_v measures the spread of item difficulties. When this variance is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, eta increases up to a certain point, and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater over-all validity, and superior validity for most cutting scores, compared to a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest F per cent of the men tested, items should on the average be located at or above the threshold for men whose true ability is at the Fth percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.

12.

A note on the computation of a table of intercorrelations

TUCKER LR 《Psychometrika》1948,13(4):245-250

Outlined is the method used at present by the Educational Testing Service for computing intercorrelations from basic summations. This procedure is adapted to the use of high speed calculators in performing the calculations, and much of its value lies in the complete system of checks that is a part of the method. Besides the correlations that are the object of the procedure, covariances, means, standard deviations, and the number of cases are also recorded on the completed form to be available for further statistical steps.
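The general idea of computing a correlation from basic summations can be sketched as follows. This uses the standard raw-score formulas for one pair of variables; it is not a reconstruction of Tucker's worksheet and omits the system of checks the abstract mentions, whose details are not given here. The function names are hypothetical.

```python
import math


def basic_summations(xs, ys):
    """Accumulate N, sum x, sum y, sum x^2, sum y^2 and sum xy for one pair of variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two cases")
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "syy": sum(y * y for y in ys),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def stats_from_summations(s):
    """Derive means, standard deviations, covariance and r from the summations only."""
    n = s["n"]
    # N-scaled sums of squares and cross-products: N*sum(x^2) - (sum x)^2, etc.
    cxx = n * s["sxx"] - s["sx"] ** 2
    cyy = n * s["syy"] - s["sy"] ** 2
    cxy = n * s["sxy"] - s["sx"] * s["sy"]
    if cxx <= 0 or cyy <= 0:
        raise ValueError("a variable has zero variance; r is undefined")
    return {
        "n": n,
        "mean_x": s["sx"] / n,
        "mean_y": s["sy"] / n,
        "sd_x": math.sqrt(cxx) / n,   # population standard deviation
        "sd_y": math.sqrt(cyy) / n,
        "cov": cxy / n ** 2,          # population covariance
        "r": cxy / math.sqrt(cxx * cyy),
    }


if __name__ == "__main__":
    sums = basic_summations([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(stats_from_summations(sums))
```

Because every statistic is derived from the same six totals, the raw data need to be passed over only once, which is the property that made this style of computation suitable for desk calculators.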

13.

Interpretation of the standard error of measurement when true scores and error scores on mental tests are not independent

D W Zimmerman R H Williams 《Psychological reports》1966,19(2):611-617

14.

Decision making under internal uncertainty: the case of multiple-choice tests with different scoring rules

Bereby-Meyer Y Meyer J Budescu DV 《Acta psychologica》2003,112(2):207-220

This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT.

15.

Not all errors are created equal: metacognition and changing answers on multiple-choice tests. Total citations: 2 (self-citations: 0, citations by others: 2)

Philip A Higham Catherine Gerrard 《Revue canadienne de psychologie expérimentale》2005,59(1):28-34

Two experiments investigated the role of metacognition in changing answers to multiple-choice, general-knowledge questions. Both experiments revealed qualitatively different errors produced by speeded responding versus confusability amongst the alternatives; revision completely corrected the former, but had no effect on the latter. Experiment 2 also demonstrated that a pretest, designed to make participants' actual experience with answer changing either positive or negative, affected the tendency to correct errors. However, this effect was not apparent in the proportion of correct responses; it was only discovered when the metacognitive component to answer changing was isolated with a Type 2 signal-detection measure of discrimination. Overall, the results suggest that future research on answer changing should more closely consider the metacognitive factors underlying answer changing, using Type 2 signal-detection theory to isolate these aspects of performance.

16.

Note on intercorrelations of scales of the Myers-Briggs type indicator

H G Richek 《Psychological reports》1969,25(1):28-30

17.

Duration estimates of two information processing components

Timothy A. Salthouse 《Acta psychologica》1982,52(3):213-226

Adult humans attempted to make quick responses to the first of two sequentially presented visual stimuli. At short interstimulus intervals (less than about 100 msec) accuracy was impaired by a different second stimulus and this was hypothesized to reflect the activity of an information processing component concerned with stimulus registration. At longer interstimulus intervals (up to approximately 350 msec) reaction time was inhibited by a different second stimulus and this was assumed to reflect the activity of a second component concerned with decision. The stimulus registration component was insensitive to variations in the complexity of the task, while the decision component was found to be greater for a task requiring recognition (is the current stimulus the same as an earlier one?) than for one merely requiring choice (what is the current stimulus?). This functional independence and the sizeable difference in the temporal range of susceptibility led to the conclusion that two distinct information processing components were involved.

18.

Some comments on confounded correlations among Rorschach scores

WITTENBORN JR 《Journal of consulting psychology》1959,23(1):75-77

19.

Empirical tests of philosophical intuitions

Woolfolk RL 《Consciousness and cognition》2011,20(2):415-416

Experimental philosophy seeks to examine empirically various factual issues that, either explicitly or implicitly, lie at the foundations of philosophical positions. A study of this genre (Miller & Feltz, 2011) was critiqued. Questions about the study were raised and broader issues pertaining to the field of experimental philosophy were discussed.

20.

The comparability of WAIS and WISC subtest scores and IQ estimates. Total citations: 2 (self-citations: 0, citations by others: 2)

M Y Quereshi 《The Journal of psychology》1968,68(1):73-82
