Similar articles
1.
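Drawing on existing draw-a-person test materials from China and abroad, and building on earlier use of the draw-a-person intelligence test, this study redesigned the scoring and proposed a draw-a-person scoring standard containing 80 scoring items. The new standard was first trialed and revised within the research group, then used to train 9 standard raters, and finally pretested on a total of 258 primary school children aged 6 to 12 years, with systematic item analysis. Throughout this process the items were repeatedly scrutinized and revised, ultimately yielding a draw-a-person scoring standard for use in establishing norms for a draw-a-person intelligence test. The standard has four dimensions (presence, detail, proportion, and bonus) and is divided by body part into fourteen major categories: head, hair, eyes, ears, nose, mouth, neck, trunk, upper limbs, hands, lower limbs, feet, connections, and clothing. Each category contains 4 to 8 scoring points, for a total of 75 scoring points.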

2.
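Compared with traditional paper-and-pencil tests (Paper And Pencil Based Test, P&P), computerized adaptive testing (Computerized Adaptive Testing, CAT) selects items adaptively according to the examinee's responses; this not only shortens the test but also greatly improves measurement precision. However, the vast majority of CATs do not allow examinees to revise their answers, mainly because researchers worry that answer revision would reduce the effectiveness of CAT. Allowing answer revision matches examinees' usual test-taking habits, and revised scores better reflect examinees' true ability, which can further promote the practical use of CAT. Existing research has proposed methods for controlling reviewable CAT from three directions: (1) test design; (2) improved item selection strategies; and (3) model construction. Future research should compare and combine these methods, and should study reviewable cognitive diagnostic CAT (Cognitive Diagnostic CAT, CD-CAT).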

3.
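Ding Shuliang, Luo Fen, Dai Haiqi, Zhu Wei. Acta Psychologica Sinica (《心理学报》), 2007, 39(4): 730-736
Within the IRT framework, a unidimensional two-parameter logistic multiple-attempt multiple-item (MAMI) test model was established for dichotomous (0-1) scoring. Compared with Spray's multiple-attempt single-item (MASI) model, MAMI is more refined, has a wider range of application, and uses a different parameter estimation method: item parameters are estimated with the EM algorithm. Monte Carlo simulations showed that, compared with correspondingly increasing the number of items, the MAMI model gives the same ability estimation precision but higher item parameter estimation precision. Compared with correspondingly increasing the number of examinees, MAMI gives the same item parameter estimation precision but higher ability estimation precision. These results indicate that, under certain conditions, if answer revision is allowed and cumulative scoring is used, ability can be estimated as precisely as with a test twice as long, without adding items, while item parameter estimation precision also improves. The findings are relevant to skill assessment and cognitive ability assessment, as well as to how such data are processed.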
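For readers less familiar with the model family named above, a minimal sketch of the standard two-parameter logistic (2PL) item response function and a grid-search maximum-likelihood ability estimate is shown below, assuming item parameters (a, b) are already known. This is generic 2PL machinery only; it does not reproduce the MAMI model, its cumulative scoring, or the EM estimation of item parameters described in the abstract, and the function names are hypothetical.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability of a correct (1) response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0-1 response pattern given (a, b) item parameters."""
    ll = 0.0
    for u, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        ll += u * math.log(p) + (1 - u) * math.log(1.0 - p)
    return ll

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate on [lo, hi]."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))
```

An all-correct pattern has no finite maximum, so the grid estimate sits at the upper bound; real applications usually add a prior or cap the estimate.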

4.
Recent research has demonstrated that a more liberal response criterion is used when people make judgments about bizarre items than about common items in old-new tests of recognition. The present study was designed to test 2 possible explanations of the bizarre response bias. The bizarre-relations explanation suggests that the bizarre response bias is triggered by the bizarre relations depicted in test items. The target-constituent explanation suggests that the bizarre response bias is the result of a sense of familiarity with constituents of bizarre test items. These explanations were tested by examining the influence of lure manipulations on memory discrimination and response bias for common and bizarre hand-drawn pictures. The results indicated support for the target-constituent explanation by reversing the response bias (obtaining a common response bias) in a recognition test that used common lures containing constituents from bizarre target items and bizarre lures containing constituents from common target items. The results also indicated that increased verbal elaboration enhanced memory discrimination and reduced response bias for both common and bizarre stimuli. The implications of these results are discussed with regard to the false memory controversy.
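"Memory discrimination" and "response criterion" in old-new recognition are commonly quantified with the signal-detection indices d' and c. The sketch below computes both from a 2x2 table of responses, assuming a log-linear correction (0.5 added to each cell) to avoid infinite z-scores; the abstract does not state which indices or corrections the authors used, and the function name is hypothetical. A negative c corresponds to a liberal criterion (more "old" responses).

```python
from statistics import NormalDist

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for an old-new recognition test.

    A log-linear correction (add 0.5 to each cell) keeps rates away from 0 and 1.
    Negative c indicates a liberal criterion; positive c a conservative one.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)   # discrimination
    c = -0.5 * (z(hit_rate) + z(fa_rate))  # response criterion
    return d_prime, c
```

Comparing c for bizarre versus common items computed this way is one way to express a "bizarre response bias".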

6.
We explored college students' discrimination of complex visual stimuli that involved multiple-item displays. The items in each of the displays could be all the same, all different, or diverse mixtures of some same and some different items. The participants had to learn which of two arbitrary responses was correct for each of the displays without being told about the sameness or differentness of the stimuli. We observed a general improvement in discrimination performance (a rise in choice accuracy and a fall in reaction time) as the number of icons in the display was increased, even when the participants had been trained from the outset with displays containing different numbers of items and when smaller numbers of items were not randomly distributed but grouped in the center of the display. The participants' discrimination behavior also depended on the mixture of same and different items in the displays. Striking individual differences in the participants' discrimination behavior disclosed that people sometimes respond as do pigeons and baboons trained with a similar task. This and previous related research suggest that variability discrimination may lie at the root of same-different categorization behavior.
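One common way to put "variability" of a multi-item display on a single scale is the Shannon entropy of the distribution of item types: an all-same display has entropy 0, and a display of n all-different items has entropy log2(n). The abstract does not name a metric, so treating entropy as the relevant measure here is an assumption, and `display_entropy` is a hypothetical helper.

```python
import math
from collections import Counter

def display_entropy(items):
    """Shannon entropy (in bits) of the distribution of item types in a display."""
    counts = Counter(items)
    n = len(items)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

For example, 16 identical icons give 0.0 bits, 16 distinct icons give 4.0 bits, and mixtures of same and different items fall in between.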

7.
Recognition without identification (RWI) is old-new discrimination among recognition test items that go unidentified. Recently, the effect has been shown in situations that require pre-experimental connections between unidentified studied items and their test cues, such as when the test cues are general knowledge questions and the unidentified studied items are their answers, or when the test cues are pictures of celebrities and the unidentified studied items are their names. In these cases, RWI demonstrates a peculiar relationship with tip-of-the-tongue (TOT) experiences: Participants give higher recognition ratings when in a TOT state than when not, even though studying an item does not increase the probability of a TOT state for that item. The present study extends these findings to the recognition of scene information. We demonstrate an RWI effect for scenes whose names cannot be retrieved, and replicate the previously reported relationship between TOT states and RWI. In addition, we show that the relationship between RWI and reported TOT states also occurs between RWI and reported déjà vu states with the test scenes.

8.
Choices were presented to 9 individuals with developmental disabilities using a two-choice format. Each pair of items, selected based on prior preference assessment, was presented to each participant in three conditions (actual items, pictures of the items, and spoken-name presentation) using a reversal design. The evaluation was conducted using food items, and was then repeated using nonfood items. The participants were also given a test to measure their skills on discrimination tasks ranging in difficulty from simple to conditional discriminations. The participants' abilities to make consistent choices with food and nonfood items were predicted, with 94% accuracy, by their discrimination skills. The findings suggest that presentation methods can affect the accuracy of a choice assessment, and that the systematic assessment of basic discrimination skills can be used to predict the effectiveness of different presentation methods in this population.

9.
The current study investigated the impact of requiring respondents to elaborate on their answers to a biodata measure on mean scores, the validity of the biodata item composites, subgroup mean differences, and correlations with social desirability. Results of this study indicate that elaborated responses result in scores that are much lower than nonelaborated responses to the same items by an independent sample. Despite the lower mean score on elaborated items, it does not appear that elaboration affects the size of the correlation between social desirability and responses to biodata items or that it affects criterion-related validity or subgroup mean differences in a practically significant way.

10.
The Wonderlic Personnel Test (1983) was administered twice over a 3-week period under conditions in which the activity of the second test was experimentally manipulated. Data from 302 undergraduates were analyzed. The standard test-retest reliability coefficient, .872, was not significantly different from the coefficients obtained from three other groups that, on the second test, were each given specific instructions: (a) to reason out the answers (pure reassess condition); (b) to use reasoning, memory of their initial responses, or both (reassess and memory); or (c) to take an alternate form of the test (parallel). However, the standard test-retest reliability coefficient was higher (p < .10) than the coefficient obtained from a condition (pure memory) in which subjects were instructed to duplicate their previous responses, using only memory. Although the subjects in the test-retest and combined reassess and memory conditions reported recalling previous answers for 20-25% of the items on the second test, it was concluded that conscious repetition of specific responses did not seriously inflate the estimate of test-retest reliability.
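A test-retest reliability coefficient such as the .872 above is the Pearson correlation between the two administrations' scores. The sketch below computes it, plus the usual Fisher z statistic for comparing correlations from two independent groups. The abstract does not say which significance test the authors used, so the comparison function is an assumption, and both names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., test and retest)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    return (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
```

With hypothetical data, `pearson_r([20, 25, 30], [22, 24, 31])` gives the retest correlation for three examinees, and `fisher_z_diff` can then compare that coefficient with one from another condition.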

11.
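Reviewable cognitive diagnostic computerized adaptive testing (Reviewable Cognitive Diagnostic Computerized Adaptive Testing, RCD-CAT), in which examinees may revise their answers, can diagnose examinees' knowledge states more accurately. The item pocket method (Item Pocket, IP) gives examinees a place to temporarily hold items and revise their answers later, and the modified item pocket method (Modified IP, MIP) rescores items revised within the pocket. A simulation study compared the performance of IP, MIP, Stocking I, and Stocking II in RCD-CAT. The results showed that the Stocking designs performed best, with Stocking II slightly better than Stocking I; the classification accuracy of IP and MIP was lower than that of traditional CD-CAT. The Stocking designs therefore show good prospects for use in RCD-CAT.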
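Classification accuracy in CD-CAT simulations is often reported at two levels: the pattern-level rate (the whole estimated attribute profile matches the true one) and the attribute-level rate (each attribute judged separately). The abstract does not specify which rate was used, so this sketch, with the hypothetical function name `classification_rates`, assumes binary attribute profiles stored as tuples and returns both.

```python
def classification_rates(true_profiles, est_profiles):
    """Return (pattern_rate, attribute_rate) for binary attribute profiles.

    pattern_rate: share of examinees whose whole profile is recovered exactly.
    attribute_rate: share of individual attribute classifications that are correct.
    """
    n = len(true_profiles)
    k = len(true_profiles[0])
    pattern_hits = sum(t == e for t, e in zip(true_profiles, est_profiles))
    attr_hits = sum(
        ti == ei
        for t, e in zip(true_profiles, est_profiles)
        for ti, ei in zip(t, e)
    )
    return pattern_hits / n, attr_hits / (n * k)
```

The pattern-level rate is the stricter of the two, since a single misclassified attribute makes the whole profile count as wrong.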

12.
Numerous studies have found a null list strength effect (LSE) for recognition sensitivity: Strengthening memory traces associated with some studied items does not impair recognition of nonstrengthened studied items. In Experiment 1, the author found an LSE using receiver operating characteristic-based measures of recognition sensitivity. To account for the discrepancy between this and prior research, the author (a) argues that an LSE occurs for recollection but not for discrimination based on familiarity, and (b) presents self-report data consistent with this hypothesis. Experiment 2 tested the dual-process hypothesis more directly, using switched-plurality (SP) lures to isolate the contribution of recollection. There was a significant LSE for comparisons involving SP lures; the LSE for discrimination of studied items and nominally unrelated lures (which can be supported by familiarity) was not significant.

13.
Taking a social psychological approach to metacognitive judgments, this study analyzed the difference in realism (validity) in confidence and frequency judgments (i.e., estimates of overall accuracy) between one's own and another person's answers to general knowledge questions. Experiment 1 showed that when judging their own answers, compared with another's answers, the participants exhibited higher overconfidence, better ability to discriminate correct from incorrect answers, lower accuracy, and lower confidence. However, the overconfidence effect could be attributable to the lowest level of confidence. Furthermore, when heeding additional information about another's answers the participants showed higher confidence and better discrimination ability. The overconfidence effect of Experiment 1 was not found in Experiment 2. However, the results of Experiment 2 were consistent with Experiment 1 in terms of discrimination ability, confidence, and accuracy. Finally, in both experiments the participants gave lower frequency judgments of their own overall accuracy compared with their frequency judgments of another person's overall accuracy.

14.
Cravings for food and other substances can impair cognition. We extended previous research by testing the effects of caffeine cravings on cued-recall and recognition memory tasks, and on the accuracy of judgements of learning (JOLs; predicted future recall) and feeling-of-knowing (FOK; predicted future recognition for items that cannot be recalled). Participants (N = 55) studied word pairs (POND-BOOK) and completed a cued-recall test and a recognition test. Participants made JOLs prior to the cued-recall test and FOK judgements prior to the recognition test. Participants were randomly allocated to a craving or control condition; we manipulated caffeine cravings via a combination of abstinence, cue exposure, and imagery. Cravings impaired memory performance on the cued-recall and recognition tasks. Cravings also impaired resolution (the ability to distinguish items that would be remembered from those that would not) for FOK judgements but not JOLs, and reduced calibration (correspondence between predicted and actual accuracy) for JOLs but not FOK judgements. Additional analysis of the cued-recall data suggested that cravings also reduced participants’ ability to monitor the likely accuracy of answers during the cued-recall test. These findings add to prior research demonstrating that memory strength manipulations have systematically different effects on different types of metacognitive judgements.
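Resolution and calibration, as defined in the abstract, are frequently operationalized as the Goodman-Kruskal gamma between item-level judgments and 0/1 outcomes (resolution) and as mean predicted accuracy minus actual accuracy (calibration bias). The abstract does not state the exact indices used, so these are assumptions; the sketch also assumes judgments are on a 0-1 scale (a 0-100 JOL would be divided by 100 first), and the function names are hypothetical.

```python
def goodman_kruskal_gamma(judgments, outcomes):
    """Gamma correlation between item-level judgments and 0/1 outcomes.

    Pairs tied on either variable are ignored, as is standard for gamma.
    """
    concordant = discordant = 0
    n = len(judgments)
    for i in range(n):
        for j in range(i + 1, n):
            product = (judgments[i] - judgments[j]) * (outcomes[i] - outcomes[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)

def calibration_bias(judgments, outcomes):
    """Mean predicted probability minus proportion correct (positive = overconfident)."""
    return sum(judgments) / len(judgments) - sum(outcomes) / len(outcomes)
```

A gamma of 1.0 means every remembered item received a higher judgment than every forgotten item; a bias near 0 means predictions matched overall performance.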

15.
Two experiments investigated the role of metacognition in changing answers to multiple-choice, general-knowledge questions. Both experiments revealed qualitatively different errors produced by speeded responding versus confusability amongst the alternatives; revision completely corrected the former, but had no effect on the latter. Experiment 2 also demonstrated that a pretest, designed to make participants' actual experience with answer changing either positive or negative, affected the tendency to correct errors. However, this effect was not apparent in the proportion of correct responses; it was only discovered when the metacognitive component to answer changing was isolated with a Type 2 signal-detection measure of discrimination. Overall, the results suggest that future research on answer changing should more closely consider the metacognitive factors underlying answer changing, using Type 2 signal-detection theory to isolate these aspects of performance.
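A Type 2 signal-detection measure treats the first-order outcome (correct or incorrect) as the "signal" and a metacognitive response as the "yes" decision: a Type 2 hit is a "yes" on a correct answer, a Type 2 false alarm is a "yes" on an error. The abstract does not say how hits and false alarms were defined for answer changing, so the sketch leaves the metacognitive flag generic (for example, keeping an answer or rating it high-confidence); the log-linear correction and the name `type2_d_prime` are assumptions.

```python
from statistics import NormalDist

def type2_d_prime(meta_yes, correct):
    """Type 2 d' from per-item metacognitive flags and 0/1 correctness.

    Type 2 hit: meta_yes on a correct answer; false alarm: meta_yes on an error.
    A log-linear correction (add 0.5 per cell) avoids infinite z-scores.
    """
    z = NormalDist().inv_cdf
    n_correct = sum(correct)
    n_error = len(correct) - n_correct
    hits = sum(1 for m, c in zip(meta_yes, correct) if m and c)
    false_alarms = sum(1 for m, c in zip(meta_yes, correct) if m and not c)
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_error + 1)
    return z(hit_rate) - z(fa_rate)
```

Because this index conditions on correctness, it can reveal metacognitive differences that a simple proportion-correct score hides, which is the point the abstract makes.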

16.
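A review of item selection strategies in computerized adaptive testing   Total citations: 2; self-citations: 0; citations by others: 2
Mao Xiuzhen, Xin Tao. Advances in Psychological Science (《心理科学进展》), 2011, 19(10): 1552-1562
Computerized adaptive testing (computerized adaptive testing, CAT) is a mode of testing built on measurement theory and computer technology. It adaptively selects test items according to the examinee's responses. The item selection strategy is one of the key components of CAT and bears on important issues such as measurement efficiency, test security, and test reliability and validity. This review classifies and introduces item selection strategies for traditional CAT and for cognitive diagnostic CAT according to whether they impose non-statistical constraints. Future research should further improve the overall performance of item selection strategies and investigate in depth item selection strategies for polytomously scored items and for cognitive diagnostic CAT.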
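The simplest purely statistical strategy in this family is maximum Fisher information selection: at the current ability estimate, administer the unused item that is most informative. The sketch assumes a 2PL item bank stored as (a, b) pairs; it ignores non-statistical constraints such as exposure control or content balancing, which the review treats separately, and the function names are hypothetical.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, bank, administered):
    """Index of the unadministered item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    if not candidates:
        raise ValueError("item bank exhausted")
    return max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))
```

Under the 2PL, an item is most informative when its difficulty b is close to the examinee's ability, which is why this rule tends to pick items matched to the current estimate (and why it overexposes highly discriminating items without further constraints).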

17.
Previous studies have shown that punishing people through a large penalty for volunteering incorrect information typically leads them to withhold more information (metacognitive response bias), but it does not appear to influence their ability to distinguish between their own correct and incorrect answers (metacognitive accuracy discrimination). The goal of the current study was to demonstrate that punishing people for volunteering incorrect information, rather than rewarding them for volunteering correct information, produces more effective metacognitive accuracy discrimination. All participants completed three different general-knowledge tests: a reward test (high points for correct volunteered answers), a baseline test (equal points/penalties for volunteered correct/incorrect answers) and a punishment test (high penalty for incorrect volunteered answers). Participants were significantly better at distinguishing between their own correct and incorrect answers on the punishment than reward test, which has implications for situations requiring effective accuracy monitoring.

18.
This paper examines psychometric properties of scores derived from calibration curves (overconfidence, calibration, resolution, and slope) and an analogue of overconfidence that is based on a posttest estimate of the proportion of correctly solved items. Four tests from the theory of fluid and crystallized intelligence were used, and two of these tests employed both sequential and simultaneous methods of item presentation. The results indicate that the overconfidence score not only has the highest reliability, but is the only score with a reliability normally considered adequate for use in individual differences research. There is some, albeit weak, difference in subjects' level of overconfidence between sequential and simultaneous methods of item presentation. Correlational evidence confirms our previous findings that overconfidence scores from perceptual and ‘knowledge’ tasks define the same factor. In agreement with the results of Gigerenzer, Hoffrage and Kleinbölting (1991), subjects' post-test estimates of their performance showed lower levels of overconfidence than did the traditional measures based on subjects' confidence judgment responses to individual items. Also, after controlling for the actual test performances, the post-test performance estimates and average confidence ratings were only slightly positively correlated, suggesting that different psychological processes may underlie these two measures. Finally, our results suggest that average confidence over all items in the test may be a more useful measure in individual differences research than scores derived from calibration curves.

19.
People remember information better if they generate the information while studying rather than read the information. However, prior research has not investigated whether this generation effect extends to related but unstudied items and has not been conducted in classroom settings. We compared third graders’ success on studied and unstudied multiplication problems after they spent a class period generating answers to problems or reading the answers from a calculator. The effect of condition interacted with prior knowledge. Students with low prior knowledge had higher accuracy in the generate condition, but as prior knowledge increased, the advantage of generating answers decreased. The benefits of generating answers may extend to unstudied items and to classroom settings, but only for learners with low prior knowledge.

20.
Objective tests of personality typically include a number of items or trials; the total score on the test is the sum of the subject's “correct” responses across all such trials. Normally, the trials are varied systematically across various facets of the test design, so that the total score represents a composite measure of accuracy averaged across these test facets. However, since only one score is computed for each subject, some potentially important kinds of individual differences (namely all those associated with each particular variation in the test design) are treated solely as measurement unreliability. Such a psychometric stance may serve to obscure more differentiated types of individual differences, with the result that composite scores from trials based on one type of experimental design may not be highly related to such scores from trials using a somewhat different design. The present paper presents a general procedure for scoring objective tests more analytically. To illustrate this general rationale, and to demonstrate its potential utility, data have been reanalyzed from two previous studies, one using the Rod-and-Frame test, the other the Müller-Lyer illusion. In both cases, the traditional global accuracy score did not correlate significantly with other theoretically related variables, while a number of component scores were quite highly related.
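The contrast between a global score and component scores can be illustrated by tallying the same trials two ways: once as a single total of correct responses, and once broken down by design facet. This is only a minimal sketch of that idea, not the paper's general procedure; the facet labels (for example, frame tilt direction in a Rod-and-Frame test) and the name `component_scores` are hypothetical.

```python
from collections import defaultdict

def component_scores(trials):
    """Score (facet_label, correct) trials globally and per facet.

    Returns (total_correct, {facet: proportion correct within that facet}).
    """
    total = 0
    tallies = defaultdict(lambda: [0, 0])  # facet -> [n_correct, n_trials]
    for facet, correct in trials:
        total += correct
        tallies[facet][0] += correct
        tallies[facet][1] += 1
    per_facet = {f: hits / n for f, (hits, n) in tallies.items()}
    return total, per_facet
```

Two subjects with the same total can have very different per-facet profiles, and it is these component scores that can then be correlated with other variables.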


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP Filing No. 09084417