1.
We introduce two simple empirical approximate Bayes estimators (EABEs)—
and
—for estimating domain scores under binomial and hypergeometric distributions, respectively. Both EABEs (derived from corresponding marginal distributions of observed test score x without relying on knowledge of prior domain score distributions) have been proven to hold -asymptotic optimality in Robbins' sense of convergence in mean. We found that, where
and
are the monotonized versions of
and
under Van Houwelingen's monotonization method, respectively, the convergence rate of the overall expected loss of Bayes risk in either
or
depends on test length, sample size, and ratio of test length to size of domain items. In terms of conditional Bayes risk,
and
outperform their maximum likelihood counterparts over the middle range of domain scales. In terms of mean-squared error, we also found that: (a) given a unimodal prior distribution of domain scores,
performs better than both
and a linear EBE of the beta-binomial model when domain item size is small or when test items reflect a high degree of heterogeneity; (b)
performs as well as
when the prior distribution is bimodal and test items are homogeneous; and (c) the linear EBE is extremely robust when a large pool of homogeneous items plus a unimodal prior distribution exists. The authors are indebted to both anonymous reviewers, especially Reviewer 2, and the Editor for their invaluable comments and suggestions. Thanks are also due to Yuan-Chin Chang and Chin-Fu Hsiao for their help with our simulation and programming work.
2.
3.
In this paper, recent developments in empirical Bayes procedures are tied in with current work in mental test theory. Point estimators of true scores are derived for the binomial and Rasch test models. These estimators are shown to be asymptotically optimal. Smoothing and an empirical study of the behavior of empirical Bayes estimates are taken up in the final section. This research was supported by the National Science Foundation, Division of Biological and Medical Sciences, Program in Psycho-Biology, Grant No. NSF GB-30779.
4.
5.
6.
7.
Lisa K. Fazio Pooja K. Agarwal Elizabeth J. Marsh Henry L. Roediger 《Memory & cognition》2010,38(4):407-418
Multiple-choice testing has both positive and negative consequences for performance on later tests. Prior testing increases
the number of questions answered correctly on a later test but also increases the likelihood that questions will be answered
with lures from the previous multiple-choice test (Roediger & Marsh, 2005). Prior research has shown that the positive effects
of testing persist over a delay, but no one has examined the durability of the negative effects of testing. To address this,
subjects took multiple-choice and cued recall tests (on subsets of questions) both immediately and a week after studying.
Although delay reduced both the positive and negative testing effects, both still occurred after 1 week, especially if the
multiple-choice test had also been delayed. These results are consistent with the argument that recollection underlies both
the positive and negative testing effects.
8.
Two experiments investigated how the type and timing of feedback influence learning from a multiple-choice test. First, participants read 12 prose passages, which covered various general knowledge topics (e.g., The Sun) and ranged between 280 and 300 words in length. Next, they took an initial six-alternative, multiple-choice test on information contained in the passages. Feedback was given immediately for some of the multiple-choice items or after delay for other items. Participants were either shown the correct answer as feedback (standard feedback) or were allowed to keep answering until the correct answer was discovered (answer-until-correct feedback). Learning from the test was assessed on a delayed cued-recall test. The results indicated that delayed feedback led to superior final test performance relative to immediate feedback. However, type of feedback did not matter: discovering the correct answer through answer-until-correct feedback produced equivalent performance relative to standard feedback. This research suggests that delaying the presentation of feedback after a test is beneficial to learning because of the spaced presentation of information.
9.
When both experts and lay people interpret data on sex-related differences, they usually forget that the instruments for data collection might be provoking such differences. This experiment, carried out on 240 participants, focused on the effects of four instruction/scoring conditions on sex effect size in two computerized tests, vocabulary and mental rotation, for which sex-related differences had been shown to be, respectively, small (favoring females) and large (favoring males). Given the caution which seems to characterize female performance, our general hypothesis predicted that, under instructions encouraging guessing, effect sizes favoring males would increase and effect sizes favoring females would diminish. The opposite results were expected under instructions discouraging guessing. Some supporting evidence was found.
10.
Jeri L. Little 《Journal of Cognitive Psychology》2018,30(5-6):520-531
Answering multiple-choice questions improves access to otherwise difficult-to-retrieve knowledge tested by those questions. Here, I examine whether multiple-choice questions can also improve accessibility to related knowledge that is not explicitly tested. In two experiments, participants first answered challenging general knowledge (trivia) multiple-choice questions containing competitive incorrect alternatives and then took a final cued-recall test with those previously tested questions and new related questions for which a previously incorrect answer was the correct answer. In Experiment 1, participants correctly answered related questions more often and faster when they had taken a multiple-choice test than when they had not. In Experiment 2, I showed that the more accurate and faster responses were not simply a result of previous exposure to those alternatives. These findings have practical implications for potential benefits of multiple-choice testing and implications for the processes that occur when individuals answer multiple-choice questions.
11.
Frederic M. Lord 《Psychometrika》1952,17(2):181-194
Under certain assumptions an expression, in terms of item difficulties and intercorrelations, is derived for the curvilinear correlation of test score on the ability underlying the test, this ability being defined as the common factor of the item tetrachoric intercorrelations corrected for guessing. It is shown that this curvilinear correlation is equal to the square root of the test reliability. Numerical values for these curvilinear correlations are presented for a number of hypothetical tests, defined in terms of their item parameters. These numerical results indicate that the reliability and the curvilinear correlation will be maximized by (1) minimizing the variability of item difficulty and (2) making the level of item difficulty somewhat easier than the halfway point between a chance percentage of correct answers and 100 per cent correct answers.
12.
This exploratory study compared the perceived use of tests and test scores among 43 adult education teachers with that among 130 teachers in the K-12 system who had been tested earlier.
13.
The validity of a univocal multiple-choice test is determined for varying distributions of item difficulty and varying degrees of item precision. Validity is a function of $d^2 + v^2$, where $d$ measures item unreliability and $v$ measures the spread of item difficulties. When this variance is very small, validity is high for one optimum cutting score, but the test gives relatively little valid information for other cutting scores. As this variance increases, eta increases up to a certain point, and then begins to decrease. Screening validity at the optimum cutting score declines as this variance increases, but the test becomes much more flexible, maintaining the same validity for a wide range of cutting scores. For items of the type ordinarily used in psychological tests, the test with uniform item difficulty gives greater over-all validity, and superior validity for most cutting scores, compared to a test with a range of item difficulties. When a multiple-choice test is intended to reject the poorest F per cent of the men tested, items should on the average be located at or above the threshold for men whose true ability is at the Fth percentile. This research was performed under contract Nop 536 with the Bureau of Naval Personnel, and received additional support from the Bureau of Research and Service, College of Education, University of Illinois.
14.
Rüdiger F. Pohl 《Journal of Behavioral Decision Making》2006,19(3):251-271
The recognition heuristic postulates that individuals should choose a recognized object more often than an unrecognized one whenever recognition is related to the criterion. This behavior has been described as a one‐cue, noncompensatory decision‐making strategy. This claim and other assumptions were tested in four experiments using paired‐comparison tasks with cities and other geographical objects. The main results were (1) that the recognized object was chosen more often than the unrecognized one when the recognition cue was valid; (2) that participants' behavior did not reflect the recognition validity of their own knowledge; (3) that a less‐is‐more effect (i.e., better performance with less knowledge) was either absent or of only small size; and (4) that judgments were influenced by further knowledge, which could even compensate for the recognition cue. In sum, the recognition cue represents an important piece of knowledge in paired comparisons, but apparently not the only one. Copyright © 2006 John Wiley & Sons, Ltd.
15.
TUCKER LR 《Psychometrika》1948,13(4):245-250
Outlined is the method used at present by the Educational Testing Service for computing intercorrelations from basic summations. This procedure is adapted to the use of high speed calculators in performing the calculations, and much of its value lies in the complete system of checks that is a part of the method. Besides the correlations that are the object of the procedure, covariances, means, standard deviations, and the number of cases are also recorded on the completed form to be available for further statistical steps.
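As a rough illustration of what "computing intercorrelations from basic summations" can look like, the following Python sketch derives the mean, standard deviation, covariance, and Pearson correlation for two score lists from the sums Σx, Σy, Σx², Σy², and Σxy alone. It is not a reconstruction of the Educational Testing Service procedure: the function names are hypothetical, dividing by N (rather than N − 1) is an assumption, and the single check sum shown is only one example of the kind of arithmetic check such a method might include.

```python
import math


def basic_summations(xs, ys):
    """Accumulate the basic summations for two paired score lists (hypothetical helper)."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "syy": sum(y * y for y in ys),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def stats_from_summations(s):
    """Derive means, SDs, covariance, and correlation from the summations only.

    Assumption: variances and covariance divide by N, not N - 1.
    """
    n = s["n"]
    mean_x = s["sx"] / n
    mean_y = s["sy"] / n
    cov = s["sxy"] / n - mean_x * mean_y
    sd_x = math.sqrt(s["sxx"] / n - mean_x ** 2)
    sd_y = math.sqrt(s["syy"] / n - mean_y ** 2)
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sd_x,
        "sd_y": sd_y,
        "cov": cov,
        "r": cov / (sd_x * sd_y),
    }


def check_sum_ok(xs, ys, s, tol=1e-9):
    """Illustrative check: sum of (x + y)^2 must equal sxx + 2*sxy + syy."""
    direct = sum((x + y) ** 2 for x, y in zip(xs, ys))
    return abs(direct - (s["sxx"] + 2 * s["sxy"] + s["syy"])) <= tol


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 6, 8, 10]
    s = basic_summations(xs, ys)
    print(check_sum_ok(xs, ys, s), stats_from_summations(s))
```

Working from the summations means the raw scores are read only once; everything recorded afterwards (covariances, means, standard deviations, number of cases) follows from six accumulated totals per pair of variables.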
16.
17.
This paper assesses framing effects on decision making with internal uncertainty, i.e., partial knowledge, by focusing on examinees' behavior in multiple-choice (MC) tests with different scoring rules. In two experiments participants answered a general-knowledge MC test that consisted of 34 solvable and 6 unsolvable items. Experiment 1 studied two scoring rules involving Positive (only gains) and Negative (only losses) scores. Although answering all items was the dominating strategy for both rules, the results revealed a greater tendency to answer under the Negative scoring rule. These results are in line with the predictions derived from Prospect Theory (PT) [Econometrica 47 (1979) 263]. The second experiment studied two scoring rules, which allowed respondents to exhibit partial knowledge. Under the Inclusion-scoring rule the respondents mark all answers that could be correct, and under the Exclusion-scoring rule they exclude all answers that might be incorrect. As predicted by PT, respondents took more risks under the Inclusion rule than under the Exclusion rule. The results illustrate that the basic process that underlies choice behavior under internal uncertainty and especially the effect of framing is similar to the process of choice under external uncertainty and can be described quite accurately by PT.
18.
Not all errors are created equal: metacognition and changing answers on multiple-choice tests.
Two experiments investigated the role of metacognition in changing answers to multiple-choice, general-knowledge questions. Both experiments revealed qualitatively different errors produced by speeded responding versus confusability amongst the alternatives; revision completely corrected the former, but had no effect on the latter. Experiment 2 also demonstrated that a pretest, designed to make participants' actual experience with answer changing either positive or negative, affected the tendency to correct errors. However, this effect was not apparent in the proportion of correct responses; it was only discovered when the metacognitive component to answer changing was isolated with a Type 2 signal-detection measure of discrimination. Overall, the results suggest that future research on answer changing should more closely consider the metacognitive factors underlying answer changing, using Type 2 signal-detection theory to isolate these aspects of performance.
19.
Timothy A. Salthouse 《Acta psychologica》1982,52(3):213-226
Adult humans attempted to make quick responses to the first of two sequentially presented visual stimuli. At short interstimulus intervals (less than about 100 msec) accuracy was impaired by a different second stimulus and this was hypothesized to reflect the activity of an information processing component concerned with stimulus registration. At longer interstimulus intervals (up to approximately 350 msec) reaction time was inhibited by a different second stimulus and this was assumed to reflect the activity of a second component concerned with decision. The stimulus registration component was insensitive to variations in the complexity of the task, while the decision component was found to be greater for a task requiring recognition (is the current stimulus the same as an earlier one?) than for one merely requiring choice (what is the current stimulus?). This functional independence and the sizeable difference in the temporal range of susceptibility led to the conclusion that two distinct information processing components were involved.
20.
Woolfolk RL 《Consciousness and cognition》2011,20(2):415-416
Experimental philosophy seeks to examine empirically various factual issues that, either explicitly or implicitly, lie at the foundations of philosophical positions. A study of this genre (Miller & Feltz, 2011) was critiqued. Questions about the study were raised and broader issues pertaining to the field of experimental philosophy were discussed.