Similar literature
1.
Williams JE  Weed NC 《Assessment》2004,11(4):316-329
There are eight commercially available computer-based test interpretations (CBTIs) for the Minnesota Multiphasic Personality Inventory-2 (MMPI-2), of which few have been empirically evaluated. Prospective users of these programs have little scientific data to guide choice of a program. This study compared ratings of these eight CBTIs. Test users were randomly assigned to rate either a single authentic CBTI report on one of their clients or a single CBTI report generated from a modal MMPI-2 profile for their clinical setting. In all, 257 authentic and modal CBTI reports were rated by 41 clinicians on 10 dimensions. Each of the authentic reports received substantially higher ratings than the modal reports, with ratings of perceived accuracy and opinion confirmation best differentiating between authentic and modal reports. Automated Assessment Associates' report received the highest overall ratings; reports published by Western Psychological Services, Pearson Assessments, and the Caldwell Report were also distinguished on one or more ratings dimensions.

2.
A prototypic experiment for validating computer-based test interpretations (CBTIs) was conducted. Undergraduates (N = 63) completed the Comprehensive Personality Profile Compatibility Questionnaire (CPPCQ; Craft, 1987). One treatment group rated real CBTIs for relative accuracy, and another group rated bogus CBTIs. A significant main effect for differences in ratings indicated that the real CPPCQ profiles were rated as 74.5% accurate whereas the bogus CBTIs were rated as 57.9% accurate. Several covariate effects were tested, but none were significant.

3.
To determine the reliability with which untrained raters could identify stress in the speech of a single person, two forms of the same material, (1) speech broken into short utterances and (2) speech in its conversational context, were presented to 40 linguistically naive psychology students who were asked to underline those syllables that they perceived as stressed. High reliabilities were obtained from both interrater measures (r=0.96 for each treatment) and a test-retest estimate (r=0.88). However, significantly larger total stress scores were recorded under the short utterance presentation than under the context condition. It was suggested that this result occurred because each of the few syllables in short utterances received greater attention than did the stream of syllables in context. Subsequent regression analysis led to the prediction that, for a short passage to attain a mean score equal to that which it would receive if rated in context, it should contain approximately 40 syllables.

4.
Reliability and validity of some specific fear questionnaires
Normative psychometric data on a Swedish translation of fear questionnaires concerning snakes, spiders, public speaking, and mutilation given to 223 college students are presented. High internal consistencies were found for all four questionnaires, and low intercorrelations among the four inventories emerged. In a separate study the inventories were administered to a clinical sample of spider and snake phobics. Phobics scored higher on their respective phobic scale than the college group but did not differ from controls on mutilation or public speaking scales. A significant negative correlation emerged between snake and spider scores among the phobics. One year test-retest reliabilities in the phobic sample were high. Finally, snake and spider phobics viewed and rated phobic and nonphobic slides. Snake phobics rated snake slides as more aversive than spider slides, whereas the reverse was true for spider phobics. The correlation between fear ratings and questionnaire scores was significant. Use of these scales in evaluating therapeutic changes is encouraged.

5.
We replicated the essential results of a prior study on the capacity of the BAROMAS scales to reflect stress in medical school as perceived by students. As before, subjective stress was high at the start of medical school, and when facing the exams prerequisite to entry into clinical clerkships. On most measures, stress was lowest when the second year began (i.e. after having passed the first). Once again, most test-retest reliabilities (significant rs ranged from 0.24 to 0.66 for confidence ratings at 12- and 20-months after entry) were moderate.

6.
The ability to perform movement imagery has been shown to influence motor performance and learning in sports and rehabilitation. Self-report questionnaires have been developed to assess movement imagery ability in adults, such as the Movement Imagery Questionnaire 3 (MIQ-3); however, there is a dearth of developmentally appropriate measures for use with children. To address this gap, the focus of this research was to develop an imagery ability questionnaire for children. This process involved adaptation of the MIQ-3 via: i) cognitive interviewing with twenty children, ii) validation with 206 children by examining its factor structure via multitrait-multimethod confirmatory factor analysis, and iii) examination of test-retest reliability with 23 children. The findings of Study 1 led to changes to the wording of the questionnaire and modifications of the instructions to successfully adapt the MIQ-3 for children aged 7–12 years. The validation undertaken in Study 2 found that a correlated-traits correlated-uniqueness model provided the best fit to the data. Finally, test-retest reliabilities varied from fair (for external visual imagery) to substantial (for kinesthetic imagery). With respect to ease of imaging, no significant gender or age-group differences were noted. However, significant differences were found among the three imagery modalities (p < .001), with external visual imagery rated as easiest to image and kinesthetic imagery rated as the most difficult. Taken together, findings support the use of the MIQ-C for examining movement imagery ability with children.

7.
We studied 205 low-income families, using the Family Needs Scale (FNS). Factor analysis of the FNS data resulted in a 7-factor solution with high internal consistency within the various subscales. We provide normative scores based on the factor structure of the FNS. A total of 53 parents completed the FNS on two occasions with an average of four weeks between these two ratings. In general, the test-retest reliabilities were low to moderate. A total of 61 pairs of parents independently rated their families with the FNS. Again, agreement between raters was low to moderate. Several factors that may have detracted from better test-retest and interrater reliability were identified. Our data point to the need for more psychometric studies with the FNS.

8.
Students in a residential special school for children with emotional and behavioral disorders participated in a study designed to reduce their levels of inappropriate behavior. The residential care staff rated the students' behavioral problems and their class teachers rated their overt self-esteem pre and post intervention. In addition, the students completed self-ratings of their self-esteem. The students were divided into two groups, experimental and control. A multiple baseline across behaviors design was used to assess behavioral changes in the experimental group. Both groups received tangible rewards to the same level but only the experimental group received them contingent upon behaving appropriately. Results showed that the experimental group students made substantial reductions in their levels of inappropriate behavior, which were maintained at a three-month followup. Also, ratings of their behavioral problems by residential child care staff suggested that this improvement in behavior had generalized beyond the classroom to the residential setting. However, no significant differences were found between the pre- and post-intervention ratings of their self-esteem or teacher ratings of their overt self-esteem.

9.
The Diabetes Care Profile (DCP) is frequently used to assess diabetes-related quality of life. The social support scales have demonstrated internal consistency but test-retest reliability has not been established. Sixty-three type 2 diabetes patients participated in a telephone coaching study designed to improve diabetes control. The DCP was filled out at pre- and post-test. The test-retest reliabilities of three social support scales (Global, GET, GET-WANT) were calculated. At a mean retest interval of 6.5 months, the support scales showed reliabilities of 0.48 (Global and GET), and 0.38 (GET-WANT). The social support scales of the DCP show adequate long-term test-retest reliability.

10.
Job analysts who collect occupational information for the Dictionary of Occupational Titles observed and interviewed job incumbents representing 20 diverse occupations and rated each occupation on a wide variety of characteristics following standard United States Employment Services procedures. On the basis of four ratings, the large majority of 70 scales were found to have coefficient alpha (or KR-20) reliabilities in excess of .80, and 25 scales had reliabilities ranging from .90 to .98; a variance ratio procedure yielded largely consistent estimates. Reliabilities were similar to those found in an earlier study using different procedures and were similar to those from a well-developed, occupationally anchored scale of "Job Complexity," developed for this study. Scales representing broad, abstract job characteristics tended to have higher reliabilities than scales representing more concrete job characteristics.
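
As an illustration of the coefficient alpha reliabilities mentioned in this abstract, the following is a minimal sketch of the standard Cronbach's alpha formula, treating each of the four ratings as one "item". The function name and the example data are hypothetical and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Coefficient alpha for a table of ratings.

    ratings: list of rows, one row per rated object (e.g. an occupation),
    each row holding k scores (e.g. one score per rating).
    """
    k = len(ratings[0])
    # Sample variance of each column (each rating treated as an item).
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the row totals.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 occupations, each rated 4 times on one scale.
example = [
    [3, 4, 3, 4],
    [1, 2, 1, 1],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]
alpha = cronbach_alpha(example)
```

For dichotomous (0/1) scores the same formula reduces to KR-20. The sketch assumes the row totals are not all identical, since the total variance appears in the denominator.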

11.
Meta-interpretive reliability is a new method to evaluate the accuracy with which personality trait scores are communicated via interpretive statements in a computer-based test interpretation (CBTI). The prototypic experimental design is based on a two-way repeated measures analysis of variance (ANOVA); the two effects are personality traits and randomly chosen CBTI protocols. In this application, 101 psychologists read four examples of the Karson Clinical Report (KCR, Karson & O'Dell, 1975) and estimated the original trait scores from the Sixteen Personality Factor Questionnaire (16PF; Cattell, Eber, & Tatsuoka, 1970) on which the KCR is based. Estimated trait score variance was significantly related to the Trait × Protocol interaction and the main effects for personality trait and differences among protocols (ω² = .55). The total effect size corresponded to a multiple correlation of .74, suggesting that the KCR had acceptable meta-interpretive reliability. The protocol effect denoted a context effect created by the juxtaposition of several interpretive statements. Additional analyses showed that individual differences among raters contributed to less than 1% of the estimated standard ten (sten) score variance. Meta-interpretive reliability is proposed as an index of the upper limit of validity for CBTIs.
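
The step from the reported effect size to the multiple correlation can be checked with a short calculation. This sketch assumes that the abstract treats omega squared as a proportion of explained variance, so that the corresponding multiple correlation is its square root; the variable names are illustrative.

```python
import math

omega_squared = 0.55  # total effect size reported in the abstract
multiple_r = math.sqrt(omega_squared)  # correlation implied by that proportion of variance

print(round(multiple_r, 2))  # 0.74
```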

12.
Wayne JH  Cordeiro BL 《Sex Roles》2003,49(5-6):233-246
In this study, we examined perceptions of the citizenship behaviors of male and female employees who took leave to care for a newborn, a sick child, a sick parent, or who did not take leave. In a 2 (employee gender) × 4 (reason for leave) × 2 (participant gender) experimental design, 242 undergraduate students read a mock personnel file and rated the employee on altruism and generalized compliance. Female employees were not rated differently whether they took leave or not. Male employees who took leave for birth or eldercare were rated less likely to be altruistic at work than their male counterparts who did not take leave and their female counterparts who took leave. There was also a bias against male leave takers for generalized compliance ratings, especially by male evaluators. Future research ideas and implications for organizational practice are discussed.

13.
This paper investigated three factors, two related to accountability and one an individual difference factor, which may influence rating level: (1) identifiability, (2) to whom one feels accountable (audience), and (3) conscientiousness. In study 1, results from students who rated their instructors indicated that not only did raters relatively high in conscientiousness report feeling more accountability but also that identifiability and conscientiousness interacted in predicting rating level. Raters relatively low in conscientiousness provided higher ratings when identified but raters relatively high in conscientiousness did not provide higher ratings. Rating audience did not influence rating level. Study 2 replicated the findings from study 1: Low conscientious raters assigned higher ratings when identified than when anonymous but high conscientious raters did not assign higher ratings when identified. Implications are discussed.

14.
In prior studies, Shapiro and Goldberg (1986, 1990) failed to find a relationship between in-vivo ratings by children of treatment acceptability and treatment effectiveness. These studies involved the use of interdependent and dependent group contingencies designed to improve the spelling performance of sixth grade students. To investigate whether the failure to link treatment acceptability and effectiveness may have been due to the subjects' inability to understand the differences in treatment conditions, this study utilized a pre-intervention training package to enhance salient differences between two types of group contingencies. Results of this study showed that both group contingencies were successful at improving the spelling performance of students, particularly the poorer spellers. Prior to treatment, students preferred the interdependent condition, with the higher-achieving students expressing the strongest preference. After implementation of the training package, both conditions were now rated as equally acceptable. Pre- and post-test acceptability ratings of each condition tended to be significantly correlated but correlations between acceptability ratings and treatment effectiveness were negligible at all points in the study. Limitations of the present study and suggestions for further research are discussed.

15.
The present study explored the psychometric properties of Turner and Engle’s (1989) operation span task, a widely used measure of working memory capacity. We administered the task three times to 33 college students, using equivalent test materials. The interval between the first and second administrations was 3 weeks, with 6–7 weeks between the second and third administrations. Alpha coefficients were all .75 or more. Recall accuracy decreased as operation set size increased. Raw test-retest correlations ranged from .67 to .81, the corrected reliability was .88, and stability scores ranged from .76 to .92. Performance improved from the first to the second test. Relative to reported reliabilities of other tasks used to assess individual differences in working memory capacity, the operation span task appears to have several statistical advantages.

16.
This study explored the effects of a social studies peer-teaching intervention on student perceptions of class environment, adjustment, and academic performance. There were 45 students in the experimental group (E) and 46 controls (C) from four fifth-grade classes in a suburban, predominantly white, middle-class school. The Classroom Environment Scale (CES) and a School Opinion Survey were used to assess student views of the classroom. Students completed self-esteem and peer sociometric rating measures and teachers submitted adjustment ratings for all pupils. Report card and average monthly grades were recorded in social studies. After the intervention, Es compared to Cs came to see their classes as more Involved, Orderly and Organized, and Competitive, and reported being happier in class and enjoying aspects of their school work more. The groups did not differ in change in self-esteem. Both groups improved directionally in peer liking, though Cs did so more than Es. Teachers rated Es as having increased competence and decreased in problems after the program. Es did significantly better than Cs both on report cards and monthly social studies grades due primarily to the substantial improvement of Es with initially low academic status.

17.
Inter-rater reliability and accuracy are measures of rater performance. Inter-rater reliability is frequently used as a substitute for accuracy despite conceptual differences and literature suggesting important differences between them. The aims of this study were to compare inter-rater reliability and accuracy among a group of raters, using a treatment adherence scale, and to assess for factors affecting the reliability of these ratings. Paired undergraduate raters assessed therapist behavior by viewing videotapes of 4 therapists' cognitive behavioral therapy sessions. Ratings were compared with expert-generated criterion ratings and between raters using intraclass correlation (2,1). Inter-rater reliability was marginally higher than accuracy (p = 0.09). The specific therapist significantly affected inter-rater reliability and accuracy. The frequency and intensity of the therapists' ratable behaviors of criterion ratings correlated only with rater accuracy. Consensus ratings were more accurate than individual ratings, but composite ratings were not more accurate than consensus ratings. In conclusion, accuracy cannot be assumed to exceed inter-rater reliability or vice versa, and both are influenced by multiple factors. In this study, the subject of the ratings (i.e. the therapist and the intensity and frequency of rated behaviors) was shown to influence inter-rater reliability and accuracy. The additional resources needed for a composite rating, a rating based on the average score of paired raters, may be justified by improved accuracy over individual ratings. The additional time required to arrive at a consensus rating, a rating generated following discussion between 2 raters, may not be warranted. Further research is needed to determine whether these findings hold true with other raters and treatment adherence scales.
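
The intraclass correlation (2,1) named in this abstract can be sketched as follows. This is a minimal illustration of the Shrout and Fleiss ICC(2,1) formula (two-way random effects, absolute agreement, single rater), assuming a complete table with every target scored by every rater. The function name and the example scores are hypothetical and do not come from the study.

```python
def icc_2_1(scores):
    """ICC(2,1) for a complete targets-by-raters table.

    scores: list of rows, one row per rated target (e.g. a session),
    each row holding one score per rater.
    """
    n = len(scores)      # number of targets
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)             # between-targets mean square
    jms = ss_cols / (k - 1)             # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical adherence scores: 5 sessions, 2 raters.
pair_scores = [[4, 5], [2, 2], [5, 4], [3, 3], [1, 2]]
reliability = icc_2_1(pair_scores)

# A composite rating, as the abstract defines it, is the average of the paired raters.
composite = [sum(row) / len(row) for row in pair_scores]
```

Because ICC(2,1) measures absolute agreement, a rater who is consistently one point higher than a partner lowers the coefficient even when the two rank every session identically. The same function could be applied to a table pairing one rater with criterion scores to express accuracy on the same scale, although the abstract does not say exactly how the accuracy comparison was computed.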

18.
This study compared how adults assess the credibility of children who either: (1) experienced a misleading suggestive interview, (2) were coached to lie or (3) experienced a non-misleading interview. Preschool children (N = 24) were interviewed about a game they had played. One third of them spontaneously reported the truth, one third lied in response to coaching and one third spontaneously reported misinformation from a prior misleading suggestive interview. One hundred and twenty-nine college students watched videotaped interviews of these children describing their alleged play activities and assessed their credibility. Children who had experienced misleading suggestive questioning were rated as less credible than those who were telling the truth and those who were lying. Adults could accurately detect truth-telling children above chance, whereas accuracy was below chance detecting both lying children and children who had been misinformed. Adults were most confident of their ratings of truth-telling children. Copyright © 2009 John Wiley & Sons, Ltd.

19.
The aim of this study is to assess the test-retest stability of the Spanish version of Youth Self Report after 18 mo. for a sample of 357 Catalonian high school students (158 boys and 199 girls). At Time 2 the girls' scores increased on Delinquent and Aggressive Behavior scales and, therefore, on Externalizing scores. At Time 2 the boys' scores increased on Attention Problems and Delinquent Behavior and decreased on Anxious/Depressed, Social Problems, and Internalizing scales. Significant differences in the remaining scales were not observed. The test-retest intraclass correlations for the broad-band scales ranged between .62 (Internalizing) and .68 (Externalizing) and for the narrow-band scales between .37 and .67. The correlations for girls and boys were similar but slightly higher for girls on Anxious/Depressed and Thought Problems.

20.
In an investigation of students’ prejudicial biases against instructors who smoke, 61 female and 16 male undergraduates watched and listened to a 20-min lecture about parasomnias, completed a survey asking for instructor evaluation ratings and ratings of perceived learning, and completed a lecture-retention test with multiple-choice questions to assess actual learning. In a between-subjects design, the lecture was given by either a man or woman, who was portrayed as a smoker or nonsmoker. The instructors’ sex and smoking status did not affect the students’ perceived or actual learning (all p’s > .05). However, a significant interaction on the instructor evaluation ratings revealed that students rated the female instructors equivalently (p = .78), but rated the smoker male instructor significantly lower than the nonsmoker male instructor (p = .01). These findings suggest that students hold prejudicial biases against male instructors who smoke, but that these biases do not affect student learning.
