Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Simons R, Goddard R, Patton W. Assessment, 2002, 9(3): 292-300
Despite the comprehensive treatment of test validity in most technical manuals, test authors appear to routinely assume that clients and professionals will score their instruments without error. Recently Allard and Faust challenged this assumption by suggesting that error rates "may not be rare or benign" and demonstrated that tests with more complex scoring procedures were associated with a greater number of scoring errors. This study investigated error rates that resulted from hand scoring seven psychometric tests commonly employed in psychological practice. Significant and serious error rates were identified for both psychologist and client scorers across all tests investigated. Scoring complexity was found to predict the base rate of scorer errors. The findings suggest that greater development in and attention to test-scoring procedures is required to restrict the likelihood of scorer error.

2.
The current study evaluated whether a computer‐based training program could improve observers' accuracy in scoring discrete instances of problem behavior at 5x normal speed using a multiple‐baseline design across subjects. During pretraining and posttraining, observers attempted to score multiple examples of problem behavior at 5.0x without feedback. During training, participants scored multiple examples of problem behavior at 5.0x with automated feedback. Researchers measured omission (missing problem behavior) and commission (scoring other behavior as problem behavior) errors and the total duration of scoring time to determine the observers' accuracy and efficiency, respectively. After training, all participants scored instances of problem behavior with less than 11% error using 5.0x. The time required to score the videos across 90‐min observations was reduced by 66%. Results extend previous evaluations of fast forwarding by demonstrating that the training program could be used to teach observers to accurately score problem behavior using a speed faster than 3.5x.

3.
The Comprehensive System (CS; Exner, 1974, 1978) for scoring Rorschach responses is the most widely taught and most widely accepted system in use today. The complexity and labor-intensive nature of the CS make the issue of scoring accuracy a central concern. Twenty-one graduate psychology students and 12 professionals scored 20 Rorschach responses drawn from normal and clinical protocols. In general, accuracy scores for both students and professionals were below acceptable levels. Accuracy scores were clearly better for the code categories of Location, DQ, Pairs, Popular, and Z than for Determinants, FQ, Content, and Special Scores. Responses from clinical protocols were subject to more error. The results suggest that high levels of scoring errors may exist in the field use of the CS. Training standards may need to be devised to ensure scoring competence.

4.
Pigeons obtained food by making four responses on three keys in a specified sequence, e.g., left, right, center, right. Under the "tandem-learning" condition, all three keys were the same color throughout the response sequence, and the sequence was changed from session to session. After total errors per session (overall accuracy) and within-session error reduction (learning) had stabilized, the effects of varying doses of phenobarbital and chlordiazepoxide were assessed. For comparison, the drug tests were also conducted under a "tandem-performance" condition, in which the response sequence was the same from session to session, and under corresponding "chain-learning" and "chain-performance" conditions, where different colored keylights were associated with the response sequence. Under all four baseline conditions, the largest dose of each drug impaired overall accuracy. Under the two learning conditions, the error rate decreased across trials within each session, but the degree of negative acceleration was less in the drug sessions than in the control sessions. In contrast, under the two performance conditions, the error rate was relatively constant across trials, but was higher in the drug sessions than in the control sessions. Of the four baselines, the chain-learning condition was the most sensitive to the drug effects.

5.
6.
In this study, we compared the utility of three instruments, the Personality Assessment Inventory (PAI; Morey, 1991), the Structured Inventory of Malingered Symptomatology (Smith & Burger, 1997), and the Structured Interview of Reported Symptoms (SIRS; Rogers, Bagby, & Dickens, 1992) to detect malingering among prisoners. We examined 4 inmate samples: (a) prisoners instructed to malinger, (b) "suspected malingerers" identified by psychiatric staff, (c) general population control inmates, and (d) psychiatric patients. Intercorrelations among the measures for the total sample (N = 115) were quite high, and receiver operating characteristic analyses suggested similar rates of overall predictive accuracy across the measures. Despite this, commonly recommended cut scores for these measures resulted in widely differing rates of sensitivity and specificity across the subsamples. Moreover, although all instruments performed well in the nonpsychiatric samples (i.e., simulators and controls), classification accuracy was noticeably poorer when attempting to differentiate between psychiatric patients and suspected malingerers, with only 2 PAI indicators significantly discriminating between them.

7.
We examined the occurrence of faking on a rating situational judgment test (SJT) by comparing SJT scores and response styles of the same individuals across two naturally occurring situations. An SJT for medical school selection was administered twice to the same group of applicants (N = 317) under low‐stakes (T1) and high‐stakes (T2) circumstances. The SJT was scored using three different methods that were differentially affected by response tendencies. Applicants used significantly more extreme responding on T2 than T1. Faking (higher SJT score on T2) was only observed for scoring methods that controlled for response tendencies. Scoring methods that do not control for response tendencies introduce systematic error into the SJT score, which may lead to inaccurate conclusions about the existence of faking.

8.
Studies of graduate students learning to administer the Wechsler scales have generally shown that training is not associated with the development of scoring proficiency. Many studies report on the reduction of aggregated administration and scoring errors, a strategy that does not highlight the reduction of errors on subtests identified as most prone to error. This study evaluated the development of scoring proficiency specifically on the Wechsler (WISC-IV and WAIS-III) Vocabulary, Comprehension, and Similarities subtests during training by comparing a set of 'early test administrations' to 'later test administrations.' Twelve graduate students enrolled in an intelligence-testing course participated in the study. Scoring errors (e.g., incorrect point assignment) were evaluated on the students' actual practice administration test protocols. Errors on all three subtests declined significantly when scoring errors on 'early' sets of Wechsler scales were compared to those made on 'later' sets. However, correcting these subtest scoring errors did not cause significant changes in subtest scaled scores. Implications for clinical instruction and future research are discussed.

9.
The responding of rats was reinforced on one key after a 1-sec auditory stimulus and on a second key after a 5-sec stimulus. With errors punished by a short timeout, all subjects achieved a high level of accuracy. A chain of responses during the stimuli mediated the performance so that when the auditory signals were omitted accuracy decreased only slightly. Response-independent aversive stimulation superimposed upon this procedure both suppressed the total amount of behavior and reduced the accuracy of the discriminative performance, the intensity of the stimulus determining the error rate. The increase in errors under these conditions may have depended in part upon differential suppression of members of the response chain, but such suppression was not necessary, since error rate increased even in its absence. Furthermore, the locus of response disruption within the chain was not consistent from day to day either for any individual animal or across animals.

10.
The constructive nature of memory is generally adaptive, allowing us to efficiently store, process and learn from life events, and simulate future scenarios to prepare ourselves for what may come. However, the cost of a flexibly constructive memory system is the occasional conjunction error, whereby the components of an event are authentic, but the combination of those components is false. Using a novel recombination paradigm, it was demonstrated that details from one autobiographical memory (AM) may be incorrectly incorporated into another, forming AM conjunction errors that elude typical reality monitoring checks. The factors that contribute to the creation of these conjunction errors were examined across two experiments. Conjunction errors were more likely to occur when the corresponding details were partially rather than fully recombined, likely due to increased plausibility and ease of simulation of partially recombined scenarios. Brief periods of imagination increased conjunction error rates, in line with the imagination inflation effect. Subjective ratings suggest that this inflation is due to similarity of phenomenological experience between conjunction and authentic memories, consistent with a source monitoring perspective. Moreover, objective scoring of memory content indicates that increased perceptual detail may be particularly important for the formation of AM conjunction errors.

11.
The identification-production framework suggests that aging is associated with a decline in production forms of repetition priming, particularly under test conditions that maximize response competition. The present study examined this prediction by testing young and healthy older adults in a single-encoding version of the verb generation task in which some items had one dominant verb response (low competition) or had no such dominant response (high competition). Further analyses examined whether priming and error rates were related to performance on neuropsychological tests purported to measure frontal lobe functioning. Priming was invariant across age groups and was not related to frontal lobe status in older adults, but frontal lobe status did predict task performance: low-frontal older adults made more errors than high-frontal older adults, particularly for high-competition items and items with high association strength. These results are not consistent with the identification-production framework, but are consistent with the conclusions that (a) aging is associated with invariance in the processes that support repetition priming in the verb generation task and (b) frontal lobe status in aging is related to verb generation performance.

12.
Word-finding difficulties observed in some patients with anomia have been attributed to an insufficient activation of phonology by semantics. There are, however, few direct tests of this hypothesis. This paper reports the case of FR, who presented with anomic aphasia following temporal lobe epilepsy and a cavernoma in the left superior temporal lobe. His anomic deficit was characterized by: (1) no apparent associated semantic impairment; (2) item consistency for accuracy and errors across different administrations; (3) accuracy strongly correlated with word frequency; and (4) a partial, albeit weak, knowledge of the gender of unnamed items. We conducted a naming experiment in which target pictures were implicitly primed by briefly presented masked words. Results showed that the prior presentation of the written target name improved accuracy. When compared with unprimed trials, the presence of the primes also increased phonological errors and decreased semantic errors. We argue that automatic phonological activation derived directly from the implicit written primes interacted with the remaining phonological input from the picture's semantic representation leading to increased accuracy and a change in the balance of error types.

13.
Technological advances have allowed professionals to obtain extended recordings of caregiver–client interactions in natural settings, but scoring recorded video at normal speed to identify instances of low‐rate problem behavior is impractical in terms of scoring time. Fast forwarding is a continuous measurement system in which all seconds of an observation are viewed at a speed faster than normal. In Study 1, we evaluated whether three groups of five observers could discriminate problem behavior at three fast‐forwarding speeds across 10‐min observations. We analyzed the efficiency of using fast forwarding compared to continuous scoring, and interobserver agreement across the fast‐forwarding speeds. In Study 2, we compared the accuracy, efficiency, and social acceptability of fast forwarding (3.5x) and momentary time sampling (3.5 s) across 90‐min observations. Results support the use of 3.5x fast forwarding as a viable measurement system for improving the practicality of scoring problem behavior from video.

14.
In the task-switching paradigm, the latency switch-cost score—the difference in mean reaction time between switch and nonswitch trials—is the traditional measure of task-switching ability. However, this score does not reflect accuracy, where switch costs may also emerge. In two experiments that varied in response deadlines (unlimited vs. limited time), we evaluated the measurement properties of two traditional switch-cost scoring methods (the latency switch-cost score and the accuracy switch-cost score) and three alternatives (a rate residual score, a bin score, and an inverse efficiency score). Scores from the rate residual, bin score, and inverse efficiency methods had comparable reliability for latency switch-cost scores without response deadlines but were more reliable than latency switch-cost scores when higher error rates were induced with a response deadline. All three alternative scoring methods appropriately accounted for differences in accuracy switch costs when higher error rates were induced, whereas pure latency switch-cost scores did not. Critically, only the rate residual and bin score methods were more valid indicators of task-switching ability; they demonstrated stronger relationships with performance on an independent measure of executive functioning (the antisaccade analogue task), and they allowed the detection of larger effect sizes when examining within-task congruency effects. All of the three alternative scoring methods provide researchers with a better measure of task-switching ability than do traditional scoring methods, because they each simultaneously account for latency and accuracy costs. Overall, the three alternative scoring methods were all superior to the traditional latency switch-cost scoring method, but the strongest methods were the rate residual and bin score methods.
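As a rough illustration of the scores named in this abstract, the sketch below computes a latency switch cost, an accuracy switch cost, and an inverse-efficiency switch cost for one participant. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `switch_costs` and the trial tuple layout are hypothetical, mean reaction time is taken over correct trials only (an assumption the abstract does not state), and inverse efficiency is taken in its common form of mean reaction time divided by proportion correct. The rate residual and bin scores are not sketched because the abstract does not define them.

```python
from statistics import mean


def switch_costs(trials):
    """Compute illustrative switch-cost scores for one participant.

    `trials` is a list of (is_switch, rt_ms, correct) tuples; this layout
    is a hypothetical convention chosen for the example.
    """

    def summarize(subset):
        # Assumption: mean RT is computed over correct trials only.
        mean_rt = mean(rt for _, rt, ok in subset if ok)
        accuracy = sum(1 for _, _, ok in subset if ok) / len(subset)
        # Inverse efficiency in its common form: mean RT / proportion correct.
        return mean_rt, accuracy, mean_rt / accuracy

    switch = [t for t in trials if t[0]]
    nonswitch = [t for t in trials if not t[0]]
    s_rt, s_acc, s_ies = summarize(switch)
    n_rt, n_acc, n_ies = summarize(nonswitch)

    return {
        "latency_cost": s_rt - n_rt,      # slower on switch trials -> positive
        "accuracy_cost": n_acc - s_acc,   # less accurate on switch -> positive
        "ies_cost": s_ies - n_ies,        # combines latency and accuracy
    }


example_trials = [
    (True, 800, True), (True, 1000, True), (True, 700, False), (True, 750, False),
    (False, 600, True), (False, 600, True), (False, 600, True), (False, 600, True),
]
example_costs = switch_costs(example_trials)
```

In this made-up data set the switch trials are both slower and less accurate, so the inverse-efficiency cost is much larger than the latency cost alone, which is the kind of difference the abstract attributes to scores that account for latency and accuracy together.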

15.
Measures of perceptual speed ability have been shown to be an important part of assessment batteries for predicting performance on tasks and jobs that require a high level of speed and accuracy. However, traditional measures of perceptual speed ability sometimes have limited cost-effectiveness because of the requirements for administration and scoring of paper-and-pencil tests. There have also been concerns about the validity of previous computer approaches to administering perceptual speed tests (e.g., see Mead & Drasgow, 1993). The authors developed two sets of computerized perceptual speed tests, with touch-sensitive monitors, that were designed to parallel several paper-and-pencil tests. The reliability and validity of the tests were explored across three empirical studies (N = 167, 160, and 117, respectively). The final study included two criterion tasks with 4.67 and 10 hours of time-on-task practice, respectively. Results indicated that these new measures provide both high levels of reliability and substantial validity for performance on the two skill-learning tasks. Implications for research and application for computerized perceptual speed tests are discussed.

16.
Four different age groups (8-9-year-olds, 11-12-year-olds, 13-15-year-olds and young adults) performed a spatial rule-switch task in which the sorting rule had to be detected on the basis of feedback or on the basis of switch cues. Performance errors were examined on the basis of a recently introduced method of error scoring for the Wisconsin Card Sorting Task (WCST; Barcelo & Knight, 2002). This method allowed us to differentiate between errors due to failure-to-maintain-set (distraction errors) and errors due to failure-to-switch-set (perseverative errors). The anticipated age differences in performance errors were most pronounced for perseverative errors between 8-9 years and 11-12 years, but for distraction errors adult levels were not reached until 13-15 years. These findings were interpreted to support the notion that set switching and set maintenance follow distinct developmental trajectories.

17.
Greene E, Frawley W. Perception, 2005, 34(11): 1339-1352
In previous studies, we have found that the accuracy in judging collinearity of lines or dots varies considerably from one subject to another as a function of the relative angle of the stimulus elements. A model of errors generally shows large excursions across several subranges of angular position. These do not appear to be motor errors, at least not ones that are well separated from perceptual mechanisms. The errors are most likely generated at primary visual cortex, or beyond. We examined and modeled accuracy in judging collinearity of dot pairs, varying the angular position of the dots through 360 degrees, the distance between the dots (stimulus span), and the distance at which the subject was required to respond (response span). Subjects manifested idiosyncratic profiles of error across angular positions, as reported previously. But across the tested range of spans, from 4 to 8 deg, the errors tended to be the same, irrespective of stimulus or response span. This suggests that the judgments are based on a radial (angular) measure of spatial position. We discuss these results in the context of proposals that the brain maps spatial position using rotation coordinates. These new data are consistent with the hypothesis that subjects use the z-axis coordinates as a mental protractor for judging angular position and collinearity.

18.
Three procedures for correcting errors made during discrimination training were examined: error statement (saying ‘no’), modeling the correct response, and no feedback. Six children with autism (age 3–7 years) were taught to match words to pictures with each of the three procedures, and the number of trials to mastery was compared across conditions. Results varied across participants. Two participants performed as well with no feedback as they did with an error correction procedure; two acquired skills slightly more quickly with an error correction procedure than with no feedback, but showed no difference between error correction procedures; one did best with error statement; and one did best with modeling. Results indicate that the choice of error correction procedure can have a large effect on rate of skill acquisition but that the optimal procedure may vary across individuals. Copyright © 2006 John Wiley & Sons, Ltd.

19.
Tests of accuracy in interpersonal perception take many forms. Often, such tests use designs and scoring methods that produce overall accuracy levels that cannot be directly compared across tests. Therefore, progress in understanding accuracy levels has been hampered. The present article employed several techniques for achieving score equivalency. Mean accuracy was converted to a common metric, pi [Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample multiple-choice-type data: Design, analysis, and meta-analysis. Psychological Bulletin, 106, 332–337] in a database of 109 published results representing tests that varied in terms of scoring method (proportion accuracy versus correlation), content (e.g., personality versus affect), number of response options, item preselection, cue channel (e.g., face versus voice), stimulus duration, and dynamism. Overall, accuracy was midway between guessing level and a perfect score, with accuracy being higher for tests based on preselected than unselected stimuli. When item preselection was held constant, accuracy was equivalent for judging affect and judging personality. However, comparisons must be made with caution due to methodological variations between studies and gaps in the literature.
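The conversion to pi mentioned in this abstract can be sketched as follows. The formula used here is the proportion index as it is commonly stated for Rosenthal and Rubin (1989), pi = P(k - 1) / (1 + P(k - 2)), where P is the raw proportion correct and k is the number of response options; the function name `proportion_index` is a hypothetical label for this example, and the abstract itself does not spell out the computation, so treat this as an illustration rather than the article's exact procedure.

```python
def proportion_index(p_correct: float, n_options: int) -> float:
    """Rescale raw proportion correct on a k-option test to a two-option scale.

    On the resulting scale, 0.5 corresponds to guessing and 1.0 to a perfect
    score, regardless of how many response options the original test had.
    """
    if n_options < 2:
        raise ValueError("a multiple-choice item needs at least two options")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a proportion between 0 and 1")
    return p_correct * (n_options - 1) / (1 + p_correct * (n_options - 2))


# Chance performance on a four-option test (P = 0.25) maps to 0.5.
chance_four_options = proportion_index(0.25, 4)
# A two-option test is left unchanged.
two_options = proportion_index(0.7, 2)
# A perfect score stays at 1.0 for any number of options.
perfect_five_options = proportion_index(1.0, 5)
```

Because chance always maps to 0.5 and a perfect score to 1.0, results from tests with different numbers of response options land on one scale, which is what makes a statement such as "midway between guessing level and a perfect score" comparable across tests.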

20.
Homophone confusion errors were examined in a series of 6 experiments. Across a variety of tasks, readers consistently made more errors on homophone trials than on control trials. These effects were established in Experiment 1 using a semantic-decision task in which participants judged whether pairs of words were related or unrelated. For both related and unrelated trials, error rates were higher for homophones as compared with controls. Results such as these have previously been taken as evidence for the role of phonology in lexical access and reading. However, differences in orthographic knowledge (more specifically, knowledge of spelling-to-meaning correspondences) across participants and homophone items significantly predicted homophone errors across all tasks. In addition, spelling tasks and multiple-choice questionnaires revealed differences in orthographic knowledge across participants and homophone items. Although these results do not rule out a role for phonology in lexical access, they indicate that homophone confusion errors may also be due to factors other than phonology.
