1.

MULTIPLE-CHOICE AND CONSTRUCTED RESPONSE TESTS OF ABILITY: RACE-BASED SUBGROUP PERFORMANCE DIFFERENCES ON ALTERNATIVE PAPER-AND-PENCIL TEST FORMATS (total citations: 1; self-citations: 1; citations by others: 0)

WINFRED ARTHUR JR. BRYAN D. EDWARDS GERALD V. BARRETT 《Personnel Psychology》2002,55(4):985-1008

We present an example of an innovative constructed response test format (a write-in/mark-in paper-and-pencil test) as an alternative to the traditional multiple-choice paper-and-pencil test, with the potential for reducing subgroup differences. We present subgroup differences data on these 2 paper-and-pencil test formats on an operational promotional exam in a sample of African American and White firefighters. The tests were designed to measure the same content domain. Using within-subjects data that compared the performance of 13 African American and 14 White fire captains, and between-subjects data that compared the performance of 21 African American and 49 White fire captains, several results were in the predicted direction such that subgroup differences were reduced on the constructed response test. However, these results did not reach statistical significance. Therefore, the study points to the need for additional research to further evaluate the promise of the constructed response test format.

2.

History Assessments of Thinking: A Validity Study

Mark Smith Joel Breakstone Sam Wineburg 《Cognition and Instruction》2019,37(1):118-144

This article reports a validity study of History Assessments of Thinking (HATs), which are short, constructed-response assessments of historical thinking. In particular, this study focuses on aspects of cognitive validity, which is an examination of whether assessments tap the intended constructs. Think-aloud interviews with 26 high school students were used to examine the thinking elicited by 8 HATs and multiple-choice versions of these tasks. Results showed that although both HATs and multiple-choice items tapped historical thinking processes, HATs better reflected student proficiency in historical thinking than their multiple-choice counterparts. Item format also influenced the thinking elicited, with multiple-choice items eliciting more instances of construct-irrelevant reasoning than the constructed-response versions. Implications for history assessment are discussed.

3.

Analyzing Components of Reading On Performance Assessments: An Expanded Simple View

Evelyn S. Johnson Joseph R. Jenkins Mark Jewell 《Reading Psychology》2013,34(3):267-283

This research examined the validity of the theoretical model of reading outlined by the Simple View of Reading when measuring reading ability with a performance-based reading test. Participants were 95 fourth-grade students randomly sampled from four schools in an urban district. The test we studied employed a mixture of traditional (multiple-choice) and performance assessment approaches (constructed-response items that required written responses). Our findings indicated that writing ability emerged as an important source of individual differences in explaining overall reading ability, even when we deconstructed the test into a multiple-choice-only score.

4.

Sources of errors on visuoperceptual tasks: role of education, literacy, and search strategy

Byrd DA Jacobs DM Hilton HJ Stern Y Manly JJ 《Brain and cognition》2005,58(3):251-257

The current study explored possible sources of demographic effects through analyses of errors from modified formats of the Benton Visual Retention Test (BVRT) completed by African American elders. Results indicate that: (1) reading level was a stronger predictor of BVRT performance than years of education; (2) on the single-item matching format of the task, individuals with lower reading levels disproportionately produced errors on items that differed in geometric, rather than spatial features; and (3) on a multiple-choice matching format, individuals with lower reading levels committed more errors on items where the target was located in the lower half of a 2 x 2 matrix.

5.

Individual differences in text comprehension as a function of test anxiety and prior knowledge

Minnaert AE 《Psychological reports》1999,84(1):167-177

This study investigated the relationship between reading comprehension and comprehension monitoring with undergraduates (223 women, 69 men). Further, the effect of test anxiety and of prior knowledge on reading comprehension and on comprehension monitoring was examined in groups of students of equal intellectual ability. Students with high scores on reading comprehension performed better on a comprehension monitoring task as well. Individual differences in reading comprehension with a multiple-choice response format emerged as a function of the interaction between test anxiety and prior knowledge. Students with low prior knowledge and high test anxiety performed worst of all. We found a far less detrimental effect of test anxiety and prior knowledge on monitoring comprehension than on reading comprehension.

6.

Vocabulary test format and differential relations to age

Bowles RP Salthouse TA 《Psychology and aging》2008,23(2):366-376

Although vocabulary tests are generally considered interchangeable, regardless of format, different tests can have different relations to age and to other cognitive abilities. In this study, 4 vocabulary test formats were examined: multiple-choice synonyms, multiple-choice antonyms, produce the definition, and picture identification. Results indicated that, although they form a single coherent vocabulary knowledge factor, the formats have different relations to age. In earlier adulthood, picture identification had the strongest growth, and produce the definition had the weakest. In later adulthood, picture identification had the strongest decline, and multiple-choice synonyms had the least. The formats differed in their relation to other cognitive variables, including reasoning, spatial visualization, memory, and speed. After accounting for the differential relations to other cognitive variables, differences in relation to age were eliminated with the exception of differences for the picture identification test. No theory of the aging of vocabulary knowledge fully explains these findings. These results suggest that using a single indicator of vocabulary may yield incomplete and somewhat misleading results about the aging of vocabulary knowledge.

7.

Test expectancy and question answering in prose processing

Richard B. May Janny M. Thompson 《Applied cognitive psychology》1989,3(3):261-269

Two experiments were conducted to study the effects of expectancies about test format (recall versus recognition) upon the retention of information from prose. In each study, subjects expecting recall recalled better than those expecting a multiple-choice test. Serial position analysis in Experiment 1 suggested differential use of study time in groups expecting different types of test. Examination of study time use in Experiment 2 indicated that subjects expecting multiple-choice showed greater variability in the use of time spent reading prose segments. They were also more likely to employ idiosyncratic orders of reading segments. In general, the results seem compatible with the theoretical model of Gillund and Shiffrin (1984) emphasizing the ratio of two types of coding.

8.

The Three‐option Format for Knowledge and Ability Multiple‐choice Tests: A case for why it should be more commonly used in personnel testing

Bryan D. Edwards Winfred Arthur Jr Leonardis L. Bruce 《International Journal of Selection & Assessment》2012,20(1):65-81

Multiple‐choice (MC) tests are arguably the most widely used testing format in applied settings. In the psychometric and education literatures, research on the optimal number of options for knowledge and ability MC tests has revealed that three‐option tests are psychometrically equivalent and, in some cases, superior to five‐option tests. In addition, there are a number of practical, economic, and administrative advantages associated with the use of three‐option MC tests. Yet, despite its advantages, the three‐option format is underutilized in personnel selection. Across two studies, we compared test‐taker perceptions, criterion‐related validity, and sex‐based subgroup differences, and in Study 1, we compared race‐based subgroup differences on three‐ and five‐option tests. Participants in the two studies completed a three‐ or five‐option version of ACT. Test perceptions, criterion‐related validity, and race‐ and sex‐based subgroup differences were similar across test formats. The implications for the expanded use of three‐option tests in applied settings and future directions for research are discussed.

9.

Case-Mixing Effects on Spelling Recognition: The Importance of Test Format

Burt JS Hutchinson BJ 《Journal of psycholinguistic research》2000,29(4):433-451

In a multiple-choice spelling recognition test, 56 university students were more accurate on more regular than irregular words, and on lower-case than mixed-case words, with the case mixing effect greater for irregular than regular words. In Experiment 2, the same words were presented singly in correct or incorrect spellings and distortion of word shape was achieved by case mixing (32 subjects) or by alternating the size of lower-case letters within a word (32 subjects). The main effects of regularity and distortion were replicated and the effect of distortion was greater for incorrect than correct stimuli, with correctly spelled words suffering a decrement in accuracy of less than 5 percentage points. Case mixing had a greater effect than size mixing on response latencies. In Experiment 3, with comparable test procedures, case mixing interacted with regularity in the subjects analysis for the multiple choice format, but not the single presentation format. This result indicates that comparisons based on visual configuration may be an artifact of multiple-choice tests.

10.

Can Item Format (Multiple Choice vs. Open-Ended) Account for Gender Differences in Mathematics Achievement?

Beller Michal Gafni Naomi 《Sex roles》2000,42(1-2):1-21

The purpose of this study was to investigate differential performance of boys and girls on open-ended (OE) and multiple-choice (MC) items on the 1988 and 1991 International Assessment of Educational Progress (IAEP) mathematics test. In the 1988 mathematics assessment, a representative sample of approximately 1,000 13-year-olds in each of the six participating countries was assessed. In the 1991 mathematics assessment, a representative sample of 9- and 13-year-olds (approximately 1,650 from each age group) in some 20 participating countries was assessed. Analyses of both assessments yielded results that indicated that boys generally performed better than girls in mathematics. In the 1988 assessment, gender effects were larger on MC items than on OE items, corresponding to results of earlier studies. However, the 1991 IAEP assessment produced contrary results: gender effects tended to be larger for OE items than for MC items. These inconsistent results challenge the assertion that girls perform relatively better on OE test items, and suggest that item format alone cannot account for gender differences in mathematics performance. Further investigation of the data revealed that the inconsistent patterns of gender effects with regard to item format were related to the difficulty level of the items, regardless of item format. Correlations between item difficulty and item gender effect size were computed for age 13 in the 1988 assessment and for ages 9 and 13 in the 1991 assessment. The correlations obtained were 0.26, 0.47, and 0.53, respectively, suggesting that the more difficult the items, the better boys perform relative to girls.
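
The item-level correlation described in this abstract can be illustrated with a minimal sketch. It assumes a plain Pearson correlation between per-item difficulty and per-item gender effect size; the abstract does not say which coefficient the authors used. The function name `pearson_r` and all data values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical illustration data, NOT values from the IAEP assessments.
# Difficulty: proportion of examinees answering the item incorrectly.
item_difficulty = [0.20, 0.35, 0.50, 0.65, 0.80]
# Gender effect: standardized boys-minus-girls difference per item.
gender_effect = [0.02, 0.05, 0.04, 0.10, 0.12]

r = pearson_r(item_difficulty, gender_effect)
print(f"r = {r:.2f}")
```

A positive `r` in this setup would mean that the boys-minus-girls gap tends to widen on harder items, which is the direction the abstract reports.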

11.

"None of the above" as a correct and incorrect alternative on a multiple-choice test: implications for the testing effect

Odegard TN Koen JD 《Memory (Hove, England)》2007,15(8):873-885

Both positive and negative testing effects have been demonstrated with a variety of materials and paradigms (Roediger & Karpicke, 2006b). The present series of experiments replicate and extend the research of Roediger and Marsh (2005) with the addition of a "none-of-the-above" response option. Participants (n=32 in both experiments) read a set of passages, took an initial multiple-choice test, completed a filler task, and then completed a final cued-recall test (Experiment 1) or multiple-choice test (Experiment 2). Questions were manipulated on the initial multiple-choice test by adding a "none-of-the-above" response alternative (choice "E") that was incorrect ("E" Incorrect) or correct ("E" Correct). The results from both experiments demonstrated that the positive testing effect was negated when the "none-of-the-above" alternative was the correct response on the initial multiple-choice test, but was still present when the "none-of-the-above" alternative was an incorrect response.

12.

Evaluation of a networked self-testing program

Clarke DE 《Psychological reports》2000,86(1):127-128

The use of a computerized, multiple-choice test bank to present practice and assessment tests on a network was evaluated with 46 men and 119 women from a first-year class in psychology. A correlation of .65 (p < .001) between scores on a traditional paper-and-pencil test and scores on a computerized test provided some validity for the computerized assessment. Regression analysis showed that ability (previous academic performance) and motivation (number of practice tests taken) accounted for 73% of the explained variance in computerized test scores. Sex differences did not enter the regression equation significantly.

13.

Exploring the Evocation of Verbal Perspective Taking Using a Linguistic Relational Triangulation Questionnaire (RTQ-MST9)

Guinther Paul M. Vlachodimos Vasileios Stewart Ian 《The Psychological record》2022,72(3):429-447

The present article exhibits the use of a linguistic multiple-choice questionnaire format for evoking relational triangulation performances while examining whether...

14.

Depictions of motion devised by a blind person

Kennedy JM Merkas CE 《Psychonomic bulletin & review》2000,7(4):700-706

A blind man (E.A.) was asked to draw pictures suggesting wheels in various kinds of motion. Six pictures were drawn by E.A. The pictures were shown to sighted subjects, who were asked to assign labels to the pictures, in a multiple-choice format. The labels were assigned at a rate above chance. We argue that the pictures are metaphoric and that pictorial metaphor relies on common properties of the static picture and the kinetic referent.

15.

Impression formation of tests: retrospective judgments of performance are higher when easier questions come first

Abigail Jackson Robert L. Greene 《Memory & cognition》2014,42(8):1325-1332

Four experiments are reported on the importance of retrospective judgments of performance (postdictions) on tests. Participants answered general knowledge questions and estimated how many questions they answered correctly. They gave higher postdictions when easy questions preceded difficult questions. This was true when time to answer each question was equalized and constrained, when participants were instructed not to write answers, and when questions were presented in a multiple-choice format. Results are consistent with the notion that first impressions predominate in overall perception of test difficulty.

16.

Patterns of memory performance in children with controlled epilepsy on the CVLT-C.

J Williams T Phillips M L Griebel G B Sharp B Lange T Edgar P Simpson 《Child neuropsychology》2001,7(1):15-20

Decreased memory skills have been reported in children with epilepsy. However, standardized instruments to evaluate learning and memory in children have been unavailable until recently. The present study was designed to assess memory patterns in children with epilepsy based on the California Verbal Learning Test-Children's Version (CVLT-C). The test was administered to 44 children with complex partial seizures and 21 children with generalized seizures between 8 and 13 years of age. Children in the study had been treated for epilepsy for at least 6 months, had well-controlled seizures on monotherapy, and had no evidence of anticonvulsant toxicity. Children with head injuries, learning disabilities, or hyperactivity were excluded. Test results did not reflect differences in memory performance based on seizure type. Scores for the entire sample indicated intact new learning, decreased intrusions and perseverative responses, and better short-term than long-term delayed recall. Recognition skills were stronger than long-term delayed recall skills and suggested that memory performance may be improved for these children when a multiple-choice format is available in academic settings.

17.

Influence of Question Format and Text Availability on the Assessment of Expository Text Comprehension

Yasuhiro Ozuru Rachel Best Courtney Bell Amy Witherspoon Danielle S. McNamara 《Cognition and Instruction》2013,31(4):399-438

This study examines how passage availability and reading comprehension question format (open-ended vs. multiple-choice) influence question answering. In two experiments, college undergraduates read an expository passage and answered open-ended and multiple-choice versions of text-based, local, and global bridging inference questions. Half the participants were allowed to refer to the passage when answering the questions and half were not. Participants' prior domain knowledge relating to the text contents was assessed using multiple-choice and open-ended questions. Correlation-based analyses in the two experiments indicated: (a) a decline in the relationship between prior domain knowledge and comprehension when the passage was available during question answering; and (b) a high correlation between multiple-choice and open-ended question answering performance when the passage was not available for reference. Overall the results indicate that the nature of the reading comprehension assessment is influenced by the specific task with which comprehension is assessed.

18.

Patterns of Memory Performance in Children with Controlled Epilepsy on the CVLT-C

Jane Williams Tonya Phillips May L. Griebel Gregory B. Sharp Bernadette Lange Terence Edgar 《Child neuropsychology》2013,19(1):15-20

Decreased memory skills have been reported in children with epilepsy. However, standardized instruments to evaluate learning and memory in children have been unavailable until recently. The present study was designed to assess memory patterns in children with epilepsy based on the California Verbal Learning Test-Children's Version (CVLT-C). The test was administered to 44 children with complex partial seizures and 21 children with generalized seizures between 8 and 13 years of age. Children in the study had been treated for epilepsy for at least 6 months, had well-controlled seizures on monotherapy, and had no evidence of anticonvulsant toxicity. Children with head injuries, learning disabilities, or hyperactivity were excluded. Test results did not reflect differences in memory performance based on seizure type. Scores for the entire sample indicated intact new learning, decreased intrusions and perseverative responses, and better short-term than long-term delayed recall. Recognition skills were stronger than long-term delayed recall skills and suggested that memory performance may be improved for these children when a multiple-choice format is available in academic settings.

19.

Reliability of Pass and Fail Decisions on Tests Employing Cut Scores

Gautam Puhan Leanne Gall 《Psychological studies》2012,57(3):273-282

The study evaluated the reliability of pass and fail classifications for several teacher certification tests. Since these tests are used in the context of a cut score to classify examinees as pass and fail, evaluating the accuracy and consistency of these classifications is important. The classification accuracy and consistency statistics were estimated using the RELCLASS software. Results indicated the following. (1) The 29 teacher certification tests that were examined had a relatively high classification accuracy (0.827 to 0.999) and consistency (0.760 to 0.999). (2) Both classification accuracy and consistency increased as the difference between the mean and cut score increased. (3) Classification accuracy and consistency were higher for multiple-choice (MC) tests than for tests consisting of only constructed-response (CR) items or a combination of CR and MC items.
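
As a rough illustration of what classification consistency means at a cut score, the sketch below computes the proportion of examinees who receive the same pass/fail decision on two parallel test forms. This is a simplified two-administration view under stated assumptions. It is not the RELCLASS procedure, which estimates such statistics from a single administration, and the function names, scores, and cut score are all hypothetical.

```python
def classify(scores, cut_score):
    """Return True (pass) for each score at or above the cut score."""
    return [score >= cut_score for score in scores]


def agreement(decisions_a, decisions_b):
    """Proportion of examinees given the same pass/fail decision in both lists."""
    if len(decisions_a) != len(decisions_b) or not decisions_a:
        raise ValueError("need two non-empty decision lists of equal length")
    same = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return same / len(decisions_a)


# Hypothetical scores for six examinees on two parallel forms (not study data).
form_a = [52, 61, 70, 75, 88, 93]
form_b = [55, 58, 73, 69, 90, 95]
cut = 70  # hypothetical cut score

consistency = agreement(classify(form_a, cut), classify(form_b, cut))
print(f"classification consistency = {consistency:.3f}")
```

In this toy data the only examinee whose decision changes is the one scoring closest to the cut score, which fits the abstract's finding that consistency rises as the score distribution moves away from the cut.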

20.

Psychometric evaluation of the Geriatric Depression Scale and the Zung Self-Rating Depression Scale using an elderly community sample

V K Dunn W P Sacco 《Psychology and aging》1989,4(1):125-126

In this study the psychometric properties of the Geriatric Depression Scale (GDS) and the Zung Self-Rating Depression Scale (SDS) were evaluated and compared, using a relatively large elderly community sample. The GDS generally performed well, replicating earlier findings from a different population. Also, as hypothesized, the SDS, which has a multiple-choice format, had a higher non-completion rate than the GDS, which has a true-false format. Finally, no significant differences between the responses of young-old and old-old subjects were observed.