
1.

The effect of type and timing of feedback on learning from multiple-choice tests

Butler AC Karpicke JD Roediger HL 《Journal of experimental psychology. Applied》2007,13(4):273-281

Two experiments investigated how the type and timing of feedback influence learning from a multiple-choice test. First, participants read 12 prose passages, which covered various general knowledge topics (e.g., The Sun) and ranged between 280 and 300 words in length. Next, they took an initial six-alternative, multiple-choice test on information contained in the passages. Feedback was given immediately for some of the multiple-choice items or after delay for other items. Participants were either shown the correct answer as feedback (standard feedback) or were allowed to keep answering until the correct answer was discovered (answer-until-correct feedback). Learning from the test was assessed on a delayed cued-recall test. The results indicated that delayed feedback led to superior final test performance relative to immediate feedback. However, type of feedback did not matter: discovering the correct answer through answer-until-correct feedback produced equivalent performance relative to standard feedback. This research suggests that delaying the presentation of feedback after a test is beneficial to learning because of the spaced presentation of information.

2.

Verbal comprehension: The lexical decomposition strategy to define unfamiliar words

《Intelligence》1987,11(1):1-20

Two experiments were conducted to study one facet of verbal intelligence: the ability of adolescents and adults to use a lexical decomposition strategy to define prefixed words and pseudowords. In the first experiment, subjects in grades 8, 10, 11, and college were given a multiple-choice vocabulary test. The test measured subjects' abilities to use word-part meanings when defining words. Subjects' metacognitive knowledge of both the words and their prefixes and stems was assessed through a pair of rating tasks. Adults performed better than the adolescents on the vocabulary test. Evidence for a lexical decomposition strategy was found for both the adolescents and adults on subsets of the most familiar items. Measures of metacognitive knowledge were related significantly to measures of vocabulary performance for the adolescents and adults. In the second experiment, college subjects were given one of four multiple-choice vocabulary tests that included decomposable (prefixed) and nondecomposable known (words) and unknown (pseudowords) stimuli. Evidence for use of the lexical decomposition strategy was strong even though there were small effects of two performance factors predicted to affect the use of the strategy. Results are discussed in relation to a theory on the use of internal context in verbal comprehension.

3.

The memorial consequences of multiple-choice testing

Marsh EJ Roediger HL Bjork RA Bjork EL 《Psychonomic bulletin & review》2007,14(2):194-199

The present article addresses whether multiple-choice tests may change knowledge even as they attempt to measure it. Overall, taking a multiple-choice test boosts performance on later tests, as compared with nontested control conditions. This benefit is not limited to simple definitional questions, but holds true for SAT II questions and for items designed to tap concepts at a higher level in Bloom’s (1956) taxonomy of educational objectives. Students, however, can also learn false facts from multiple-choice tests; testing leads to persistence of some multiple-choice lures on later general knowledge tests. Such persistence appears due to faulty reasoning rather than to an increase in the familiarity of lures. Even though students may learn false facts from multiple-choice tests, the positive effects of testing outweigh this cost.

4.

Recollecting events associated with victimization

R K Guenther C Frey 《Psychological reports》1990,67(1):207-217

Subjects first completed social desirability and anxiety personality scales and then read a story about a woman meeting her brother for lunch. Some subjects were then told that the story involved sexual abuse. A week later all subjects took a multiple-choice memory test over the story. The results indicated that subjects categorized as repressors based on the personality scales had a lower proportion of negative than positive errors than did the nonrepressors, but only when they believed the story was about sexual abuse. However, repressors answered correctly as many items as did nonrepressors. The results were consistent with the idea that repressors remember as much about victimization experiences as do nonrepressors but are more likely to fill in the missing details of victimization experiences with positive reconstructions designed to reduce the overall negative quality associated with victimization.

5.

Simplified formulas for item selection and construction

Dorothy C. Adkins Herbert A. Toops 《Psychometrika》1937,2(3):165-171

The formula for the Pearson correlation coefficient of a dichotomous variable with a multiple-categoried variable is simplified for computational purposes by effecting in the multiple-categoried variable two types of arbitrary distributions: (1) rectangular and (2) proportional to binomial expansion coefficients. The formulas which result are convenient for the selection of test items and are applicable to the objective estimation of the comparative merits of the alternatives in multiple-choice test items. It is shown that the authoritative answer should have a high positive criterion coefficient, while the omissions and several wrong-answer alternatives should each have low (algebraic) negative criterion coefficients.
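The idea of a criterion coefficient per alternative can be illustrated with a minimal sketch. It uses the ordinary Pearson correlation between a 1/0 "chose this alternative" indicator and a criterion score, not the simplified formulas derived in the paper, which the abstract does not give. The function names and the eight-examinee data set are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


def alternative_coefficients(choices, criterion):
    """Correlate 'chose this alternative' (1/0) with the criterion score."""
    coefficients = {}
    for alternative in sorted(set(choices)):
        indicator = [1 if c == alternative else 0 for c in choices]
        coefficients[alternative] = pearson(indicator, criterion)
    return coefficients


# Hypothetical data: one item, eight examinees; "A" is the keyed answer.
choices = ["A", "A", "B", "C", "A", "B", "C", "A"]
criterion = [9, 8, 3, 4, 7, 2, 5, 9]
coeffs = alternative_coefficients(choices, criterion)
for alternative, r in coeffs.items():
    print(alternative, round(r, 3))
```

With these made-up scores the keyed answer gets a positive coefficient and both wrong alternatives get negative ones, which is the pattern the abstract describes as desirable.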

6.

Test-enhanced learning in the classroom: long-term improvements from quizzing

Roediger HL Agarwal PK McDaniel MA McDermott KB 《Journal of experimental psychology. Applied》2011,17(4):382-395

Three experiments examined whether quizzing promotes learning and retention of material from a social studies course with sixth grade students from a suburban middle school. The material used in the experiments was the course material students were to learn and some of the dependent measures were the actual tests on which students received grades. In within-subject designs, students received three low-stakes multiple-choice quizzes in Experiments 1 and 2 and performance on quizzed items was compared to that on items that were presented twice (Experiment 2) or items that were not presented on the initial quizzes (Experiments 1 and 2). We found that students' performance on both chapter exams and semester exams improved following quizzing relative to either not being quizzed or relative to the twice-presented items. In Experiment 3, students were given one multiple-choice quiz in class and encouraged to quiz themselves outside of class using a Web-based system. The assessment in this experiment was a short answer test in which students had to produce answers, but we also used multiple-choice tests. Once again, we found that quizzing of material produced a positive effect on chapter and semester exams. These results show the robustness of retrieval practice via testing as a learning mechanism in a classroom setting using the subject matter of the course and (in most cases) the tests on which students received grades as the dependent measures. Our results add to a growing body of evidence that retrieval practice in the classroom can boost academic performance.

7.

Effects of college students' learning styles and gender on their test preparation strategies

Carol Speth Robert Brown 《Applied cognitive psychology》1990,4(3):189-202

This study investigated the effects of approach to studying, gender and type of examination on test preparation strategies. Educational psychology students completed the Approaches to Studying Inventory (Entwistle and Ramsden, 1983) regarding their general learning characteristics, and thus were assigned to four approach groups. Students also answered questions about how they might study for either an essay or a multiple-choice examination. Factor analysis of those items yielded several study strategy subscales. When scores on the time-effort, integration, selection and cognitive monitoring subscales were used as dependent variables in a 4×2×2 (cluster × gender × type of test) MANCOVA, a significant three-way interaction suggested that male and female students using different approaches react differently to multiple-choice or essay tests, and the patterns differ by strategy.

8.

A Latent Trait Analysis of the MMPI

《Multivariate behavioral research》2013,48(4):385-407

Commonly used techniques for analyzing the structure of the MMPI scales were discussed and the use of a latent trait model was suggested as an alternative. The items on each scale of the MMPI were calibrated using a discrimination statistic. The item calibration statistics obtained from a replication sample were highly correlated with those obtained in the first sample. Poor fitting items were identified, and possible reasons for poor fits were discussed. The scales generally had few poor fits. The poor fitting items were generally those identified by Wiener (1956) as comprising the "subtle" subscales of the test.

9.

General ability measurement: An application of multidimensional item response theory

Daniel O. Segall 《Psychometrika》2001,66(1):79-97


10.

Item characteristics and answer-changing behaviors

Ballance CT 《Psychological reports》2006,98(1):205-208

Difficulty and discrimination indices on 113 multiple-choice test items were compared with points gained or lost from students' changing answers. Difficulty of items was not significantly correlated with score gain from changing answers, a result consistent with previous research. Item discrimination had a low correlation with test score gain from changing answers, a result not consistent with previous research.
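For readers unfamiliar with the two indices, here is a minimal sketch of one common way to compute them. It assumes the classical definitions (difficulty as proportion correct, discrimination as the upper-group minus lower-group proportion correct with 27% groups); the abstract does not say which formulas the study used. The function names and the response matrix are hypothetical.

```python
def item_difficulty(responses, item):
    """Classical difficulty index: proportion of examinees answering correctly."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=0.27):
    """Upper-lower discrimination index.

    Examinees are ranked by total score; the index is the proportion correct
    in the top group minus the proportion correct in the bottom group.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical scored responses: rows are examinees, columns are items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

for item in range(4):
    print(item, item_difficulty(responses, item), item_discrimination(responses, item))
```

In this made-up data the last item is answered correctly equally often by the top and bottom groups, so its discrimination index is zero even though its difficulty is moderate.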

11.

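A multivariate generalizability theory study of the 2007-2010 Psychology Foundations Comprehensive Examination

Guan Dandan Wang Bo Che Hongsheng 《Psychological Science (心理科学)》2011,34(4):950-956

Abstract: This study used multivariate generalizability theory to analyze the 2007-2010 Psychology Foundations Comprehensive Examination. The results showed that: (1) in terms of subject content, psychological statistics and measurement and general psychology were measured with relatively high precision, whereas developmental and educational psychology and experimental psychology were measured with relatively low precision; (2) in terms of item type, multiple-answer items were measured with relatively low precision and the other item types with relatively high precision; reducing the number of single-answer items and increasing the number of multiple-answer items could substantially raise the measurement precision of the multiple-answer items while maintaining the precision of the paper as a whole; (3) the measurement precision of the whole paper was very high, and the papers from different years can be regarded as "parallel" in subject content and item-type structure.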

12.

Estimating Difficulty from Polytomous Categorical Data

Javier Revuelta 《Psychometrika》2010,75(2):331-350

A comprehensive analysis of difficulty for multiple-choice items requires information at different levels: the test, the items, and the alternatives. This paper introduces a new parameterization of the nominal categories model (NCM) for analyzing difficulty at these three levels. The new parameterization is referred to as the NE–NCM and is statistically equivalent to the NCM. The NE–NCM is applied to a sample of responses from a logical analysis test. The results suggest that the individuals execute a self-terminated response process that is mostly determined by working memory load.

13.

Polytomous Latent Scales for the Investigation of the Ordering of Items

Rudy Ligtvoet L. Andries van der Ark Wicher P. Bergsma Klaas Sijtsma 《Psychometrika》2011,76(2):200-216

We propose three latent scales within the framework of nonparametric item response theory for polytomously scored items. Latent scales are models that imply an invariant item ordering, meaning that the order of the items is the same for each measurement value on the latent scale. This ordering property may be important in, for example, intelligence testing and person-fit analysis. We derive observable properties of the three latent scales that can each be used to investigate in real data whether the particular model adequately describes the data. We also propose a methodology for analyzing test data in an effort to find support for a latent scale, and we use two real-data examples to illustrate the practical use of this methodology.

14.

A multiple change score comparison of traditional and behavioral college teaching procedures

Alba E Pennypacker HS 《Journal of applied behavior analysis》1972,5(2):121-124

Seventy-six students in a college-level course in human development were divided into an experimental and a control group of approximately equal size. Both groups were given a pretest composed of fill-in and multiple-choice items. The control group was exposed to conventional educational practices while the experimental group was treated in a manner similar to that described by Johnston and Pennypacker (1971), performing only on fill-in items. Post-test results showed significantly greater changes in the experimental group, regardless of the type of test item, although the difference was greater in the case of the fill-in items. The results are discussed in terms of their implications for both future research and tactics in the development of improved teaching technologies.

15.

Computer programs to facilitate detailed analysis of how people study text passages

Keith A. Wollen Robert S. Cone Matthew G. Margres Bruce P. Wollen 《Behavior research methods》1985,17(3):371-378

Three computer programs are described for an IBM Personal Computer; these programs enable the gathering of detailed data on reading and study processes. The programs (A) present any text of the experimenter’s choosing and permit subjects to study the lines and pages of that text in any order; (B) allow for the subject’s use of a light pen to record study time per line, to answer questions, and to indicate lines that are, for instance, important or difficult by “highlighting” them (i.e., changing them to reverse video); (C) allow for multiple study sessions and a review session; (D) present and score multiple-choice questions; (E) permit the use of fill-in test items; (F) calculate 176 values, including detailed data on highlighting, quality of study, testing, and times and frequencies on 18 different categories of lines in study and in review; and (G) make a printout of the raw data and the 176 summary statistics. Also described is one example of a 28-page text and a test that have been used in conjunction with the programs. Data are provided to illustrate the usefulness of the programs.

16.

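Development of a vocabulary comprehension ability scale for junior middle school

Cao Yiwei 《Acta Psychologica Sinica (心理学报)》1999,32(2):215-221

Item response theory was applied to construct vocabulary comprehension tests for each junior middle school grade, comprising 143 multiple-choice vocabulary items in total. After repeated pretesting and large-scale formal testing, the scales of the three tests were confirmed to fit the 2PL model: items whose item characteristic curves fit well made up more than 90% of all items, and the unidimensionality of ability was also confirmed. After equating, the mean discrimination for each grade was 0.61 (first year), 0.59 (second year), and 0.55 (third year), and the mean difficulty was -1.61, -1.30, and -0.56, respectively.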
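As a reminder of what the 2PL model in this abstract looks like, the sketch below evaluates the standard two-parameter logistic item characteristic curve, P(θ) = 1 / (1 + exp(-a(θ - b))), at the mean discrimination and difficulty values reported for the three grades. It assumes the plain logistic form with no scaling constant (some treatments multiply a by 1.7); the abstract does not say which convention the author used, and the function name is hypothetical.

```python
from math import exp


def icc_2pl(theta, a, b):
    """Two-parameter logistic ICC: probability of a correct response.

    theta: examinee ability, a: item discrimination, b: item difficulty.
    Assumes no scaling constant D (i.e. D = 1).
    """
    return 1.0 / (1.0 + exp(-a * (theta - b)))


# Mean (discrimination, difficulty) per grade as reported in the abstract.
grade_means = {
    "first year": (0.61, -1.61),
    "second year": (0.59, -1.30),
    "third year": (0.55, -0.56),
}

# Probability that an examinee of average ability (theta = 0) answers
# a "typical" item of each grade's test correctly.
for grade, (a, b) in grade_means.items():
    print(grade, round(icc_2pl(0.0, a, b), 3))
```

By construction the curve passes through 0.5 where ability equals the difficulty parameter, so the negative mean difficulties imply that an average examinee would answer a typical item correctly more often than not.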

17.

An examination of factors contributing to a reduction in subgroup differences on a constructed-response paper-and-pencil test of scholastic achievement

Edwards BD Arthur W 《The Journal of applied psychology》2007,92(3):794-801

The authors investigated subgroup differences on a multiple-choice and constructed-response test of scholastic achievement in a sample of 197 African American and 258 White test takers. Although both groups had lower mean scores on the constructed-response test, the results showed a 39% reduction in subgroup differences compared with the multiple-choice test. The results demonstrate that the lower subgroup differences were explained by more favorable test perceptions for African Americans on the constructed-response test. In addition, the two test formats displayed comparable levels of criterion-related validity. The results suggest that the constructed-response test format may be a viable alternative to the traditional multiple-choice test format in efforts to simultaneously use valid predictors of performance and minimize subgroup differences in high-stakes testing.

18.

Examining differential item functioning due to item difficulty and alternative attractiveness

Paul Westers Henk Kelderman 《Psychometrika》1992,57(1):107-118

A method for analyzing test item responses is proposed to examine differential item functioning (DIF) in multiple-choice items through a combination of the usual notion of DIF for correct/incorrect responses and information about DIF contained in each of the alternatives. The proposed method uses incomplete latent class models to examine whether DIF is caused by the attractiveness of the alternatives, difficulty of the item, or both. DIF with respect to either known or unknown subgroups can be tested by a likelihood ratio test that is asymptotically distributed as a chi-square random variable.

19.

Can Item Format (Multiple Choice vs. Open-Ended) Account for Gender Differences in Mathematics Achievement?

Beller Michal Gafni Naomi 《Sex roles》2000,42(1-2):1-21

The purpose of this study was to investigate differential performance of boys and girls on open-ended (OE) and multiple-choice (MC) items on the 1988 and 1991 International Assessment of Educational Progress (IAEP) mathematics test. In the 1988 mathematics assessment, a representative sample of approximately 1,000 13-year-olds in each of the six participating countries was assessed. In the 1991 mathematics assessment, a representative sample of 9- and 13-year-olds (approximately 1,650 from each age group) in some 20 participating countries was assessed. Analyses of both assessments yielded results that indicated that boys generally performed better than girls in mathematics. In the 1988 assessment, gender effects were larger on MC items than on OE items, corresponding to results of earlier studies. However, the 1991 IAEP assessment produced contrary results: gender effects tended to be larger for OE items than for MC items. These inconsistent results challenge the assertion that girls perform relatively better on OE test items, and suggest that item format alone cannot account for gender differences in mathematics performance. Further investigation of the data revealed that the inconsistent patterns of gender effects with regard to item format were related to the difficulty level of the items, regardless of item format. Correlations between item difficulty and item gender effect size were computed for age 13 in the 1988 assessment and for ages 9 and 13 in the 1991 assessment. The correlations obtained were 0.26, 0.47, and 0.53, respectively, suggesting that the more difficult the items, the better boys perform relative to girls.

20.

Test expectancy and question answering in prose processing

Richard B. May Janny M. Thompson 《Applied cognitive psychology》1989,3(3):261-269

Two experiments were conducted to study the effects of expectancies about test format (recall versus recognition) upon the retention of information from prose. In each study subjects expecting recall recalled better than those expecting a multiple-choice test. Serial position analysis in Experiment 1 suggested differential use of study time in groups expecting different types of test. Examination of study time use in Experiment 2 indicated that subjects expecting multiple-choice showed greater variability in the use of time spent reading prose segments. They were also more likely to employ idiosyncratic orders of reading segments. In general the results seem compatible with the theoretical model of Gillund and Shiffrin (1984) emphasizing the ratio of two types of coding.