Similar articles
20 similar articles found (search time: 31 ms)
1.
While the Angoff (1971) method is a commonly used cut-score method, critics (Berk, 1996; Impara & Plake, 1997) argue that it places excessive cognitive demands on raters. In response to these criticisms, a number of modifications to the method have been proposed. Suggested modifications include using an iterative rating process, presenting judges with normative data about item performance, revising the rating judgment into a Yes/No decision, assigning relative weights to dimensions within a test, and using item response theory in setting cut scores. In this study, subject matter expert raters were provided with a ‘difficulty anchored’ rating scale to use while making Angoff ratings; this scale can be viewed as a variation of the Angoff normative data modification. The rating scale presented test items having known p-values as anchors, and served as a simple means of providing normative information to guide the Angoff rating process. Results are discussed regarding reliability of the mean Angoff rating (.73) and the correlation of mean Angoff ratings with item difficulty (observed r ranges from .65 to .73).
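The two quantities reported in this abstract can be illustrated with a minimal sketch. It assumes the standard Angoff convention, which the abstract does not spell out: each judge estimates, for each item, the probability that a minimally competent examinee answers correctly, and the cut score is the sum of the per-item mean ratings. The function names are hypothetical, and no data from the study are reproduced.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return (per-item mean ratings, cut score).

    ratings[j][i] is judge j's estimated probability that a minimally
    competent examinee answers item i correctly (assumed convention).
    """
    n_items = len(ratings[0])
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # The cut score is the expected number-correct score of the borderline examinee.
    return item_means, sum(item_means)


def pearson_r(x, y):
    """Pearson correlation, e.g. between mean Angoff ratings and item p-values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Correlating the first return value of `angoff_cut_score` with observed item p-values gives the kind of rating-difficulty correlation the abstract reports.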

2.
Creativity in teaching is a significant and complex construct. However, in the local educational context, creativity in teaching has received little attention. This study aimed to investigate the validity, practicality, and benefits of applying a modified consensual assessment technique (CAT) to assess creativity in teaching design. Four hundred and eighty-five written teaching designs were collected from 167 in-service and pre-service primary school teachers in Hong Kong. Instead of expert teachers, “supportive” peers, who had shown support, interest, and initiative in creative teaching, were recruited as judges. A warm-up exercise, with no definition of creativity, was given to the judges before they began their assessments. The results indicated overall consistency in the judges' assessments of creativity, and that a creativity factor could be distinguished from pedagogical skills and other technical factors. Most of the peer judges reported personal gains in creative teaching from engaging in the assessment process. On average, each judge spent only about 2 minutes rating each written teaching design. The findings confirm that the modified CAT is a valid and economical assessment method with learning benefits for the judges. The special values and implications of using supportive peer judges in consensual assessment are further discussed.

3.
In this study, we used a quasi-experimental pretest–posttest mixed design to assess the effect of association instruction on students' poetic creativity. Creativity was judged using the consensual assessment technique. A total of 64 fourth-grade students from two intact classes participated in the study. One class was assigned to the experimental group (n = 34) and the other to the control group (n = 30). Weekly for 5 weeks, the experimental group received 30 minutes of instruction in forming associations, and then each student composed a Chinese free verse based on a given association theme. The control group received traditional writing lectures prior to composing Chinese free verses. Three groups of judges assessed the completed poems (a total of 320 poems), evaluating their creativity on 14 dimensions. The judges included three expert teachers with at least 10 years of teaching experience in Chinese, three teachers who had won awards in nationwide Chinese writing contests, and three professors of children's literature; the overall inter-rater reliability was .85. The experimental group showed greater creativity than the control group in number association (d = 1.09), picture association (d = 0.62), and free association (d = 1.07). This article also discusses how to select judges, assessment criteria for children's poetic creativity, and techniques for association instruction to enhance children's poetic creativity.
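The effect sizes in this abstract are reported as d. The abstract does not say which variant was computed, so the following is only a sketch of the common pooled-standard-deviation form of Cohen's d; the function name and any scores passed to it are hypothetical.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Each argument is a list of scores with at least two values.
    """
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5
```

Under the usual rule of thumb, values near 0.5 are read as medium effects and values near 0.8 or above as large ones.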

4.
5.
The Consensual Assessment Technique (CAT) holds that the most valid judgments of creativity are the combined opinions of experts in the field. Yet who exactly qualifies as an expert to evaluate a creative product such as a short story? This study examines both novice and expert judgments of student short fiction. Results indicate a need for caution in using non-expert raters. Although there was only a small (but statistically significant) difference between experts' and novices' mean ratings, the correlation between the two sets of ratings was just .71. Experts were also far more consistent in their ratings than novices, whose level of inter-rater reliability was potentially problematic.

6.
The judge or jury makes a subjective determination of when an expert is credible. However, no published measure exists for assessment of the credibility of expert witnesses. The current study addressed this gap by developing and cross-validating the Witness Credibility Scale (WCS). Drawing on the narrative literature, we hypothesized that credibility was a product of four factors: “likeability,” “believability,” “trustworthiness,” and “intelligence.” A 41-item measure was initially constructed based on successive iterations of ratings by a panel of judges using items from the Osgood Semantic Differential measure and was subsequently administered to 264 undergraduates. A factor analysis of the data yielded a structure of four factors labeled “knowledge,” “likeability,” “trustworthiness,” and “confidence.” The final version of the WCS used 20 adjectives with four subscales of five items, each subscale reflecting high loadings on the respective factors. The scale was then tested in five additional studies, in which it successfully differentiated between groups of videotaped experts testifying in manipulated conditions. The empirical data from these studies provide a foundation for comparing outcome data in future research investigations. Copyright © 2010 John Wiley & Sons, Ltd.

7.
The 13-item self-rated creativity scale (SRCS), initially developed for supervisory rating of employees' creativity, was modified by some researchers and used as a self-report of creativity. However, it is not clear whether the modified SRCS is psychometrically sound. The present research addressed this gap in three studies (N = 1,033). The exploratory factor analysis (Study 1) revealed a two-factor solution after Item 9 was removed because of its low factor loading. Confirmatory factor analysis was then used in Study 2 to examine and compare the conceptual one-factor model with 13 items (Model 1), a one-factor model with 12 items (Model 2), a two-factor model with 12 items (Model 3), and a 12-item bifactor model with one general factor and two specific factors (Model 4). The results indicated that Model 4 is superior to all the competing models. Study 3 further confirmed the bifactor model, supported its reliability and convergent validity, and found partial metric invariance across Chinese and Malay undergraduates. Taken together, the modified (12-item) SRCS is a psychometrically sound tool for self-rated creativity in the Malaysian context.

8.
This study sought to understand the effect of problem finding and creativity style on the creative musical product. Participants (N = 32) were categorized by creativity style (adaptor or innovator) using the Kirton Adaption-Innovation Inventory. The participants completed two musical composition problems involving two different degrees of problem-finding behaviors: an open (ill-defined) and a closed (more defined) problem. The resulting products were scored for creativity by three judges using a modified version of Amabile's “consensual assessment technique.” A repeated measures analysis of variance (ANOVA) was used to analyze the data. The independent variables were composition problem type and creativity style, and the dependent variable was the creativity score on the open and closed problems. No significant differences due to problem type, creativity style, or the interaction of the two factors were found. This research supports the assertion of Kirton that adaption-innovation theory is a measure of creativity style rather than creativity level, but calls into question its use in individual creativity style.

9.
Two studies were designed to compare (a) the rated creativity of artworks created by American and Chinese college students, and (b) the criteria used by American and Chinese judges to evaluate these artworks. The studies demonstrated that the two groups of students differed in their artistic creativity. American participants produced more creative and aesthetically pleasing artworks than did their Chinese counterparts, and this difference in performance was recognized by both American and Chinese judges. The difference between the use of criteria by American and Chinese judges was small, and consisted mainly of the American judges' use of stricter standards in evaluating overall creativity. Moreover, in general, there was a greater consensus among Chinese judges regarding what constitutes creativity than among American judges. The studies also revealed, albeit preliminarily, that the artistic creativity of Chinese students was more likely to be reduced as a function of restrictive task constraints or of the absence of explicit instructions to be creative. The results seem to support the hypothesis that an independent self-oriented culture is more encouraging of the development of artistic creativity than is an interdependent self-oriented culture. Other possible explanations, such as differences in people's attitudes toward and motivation for engaging in art activities, or socioeconomic factors, might also account for differences in people's artistic creativity.

10.
Although previous studies have used raters with different levels of experience to rate product creativity with the Consensual Assessment Technique (CAT), the validity of replacing CAT with another measurement tool has not been adequately tested. This study aimed to compare raters with different levels of experience (expert vs. nonexpert raters) using both CAT and the product creativity measurement instrument (PCMI) to assess the product creativity of 56 design works from a design competition. The results showed that nonexpert raters who used either CAT or PCMI had higher inter-rater reliability than expert raters. Using PCMI was found to result in higher correlation than using CAT for the expert and nonexpert raters, although the correlations for the CAT and PCMI methods were not significantly different. Regression analysis showed that all PCMI items had higher explanatory power for the creativity scores obtained using CAT and, moreover, that the nonexpert raters had higher explanatory power than the expert raters. Based on these results, the use of both nonexpert raters and PCMI is recommended as an alternative way of enhancing the flexibility of product creativity assessment.

11.
Ratings on a creativity rating scale of students' designs of a hands-free mobile phone holder were compared for two sets of raters: experts (professional art teachers) and novices (visual art students). Reliabilities of total creativity scores were high for both groups, and interjudge consistency on total creativity scores, as well as on grades, was high among novices, but not as high among experts. Correlations between grades and total functional creativity scores within and across groups of raters (experts and novices) were highly significant. Scores on the scale resembled those yielded by assessments using grades, and the scale did not yield better consistency among judges than conventional grades. Nonetheless, it provided a differentiated assessment of products that made it possible to explain the basis of experts' opinions and the reasons for disagreement, and to discuss the strengths and weaknesses of students' designs in a systematic and differentiated way.

12.
Research on teachers' creativity-fostering behavior has been much neglected in spite of the important role teachers play in developing student creativity. One possible reason for this is the lack of a suitable measure of teachers' creativity-fostering behavior. A 45-item self-rating scale based on nine creativity-fostering behaviors identified by Cropley (1997) was developed and validated against a self-describing adjective checklist. Analysis shows adequate construct and concurrent validity. Specific creativity-fostering behaviors of teachers were found to correlate with sex and ethnicity. Further work is suggested.

13.
The Expagg questionnaire was developed to measure a subject's view of their own aggression as a relatively instrumental or relatively expressive act. Two issues have been raised pertaining to the dimensional structure of the questionnaire: the use of principal components analysis on dichotomous responses and the possibility that instrumental and expressive representations might be independent dimensions rather than opposite ends of a single continuum. In study 1, dichotomous Expagg data from 405 subjects were subjected to microfact, principal components, and factor analysis. Each produced a first general factor, and the correlations between the item loadings were in excess of r = .99. In study 2, a 40-item Likert scale version of Expagg was given to 295 subjects. Principal components analysis, paired item correlations, and subscale correlations suggested partial independence of instrumental and expressive items. Two new 8-item scales measuring instrumental and expressive representations were constructed that maximise their independence. Potential uses of these revised scales are discussed. Aggr. Behav. 25:435–444, 1999. © 1999 Wiley-Liss, Inc.

14.
For item responses fitting the Rasch model, the assumptions underlying the Mokken model of double monotonicity are met. This makes non-parametric item response theory a natural starting point for Rasch item analysis. This paper studies scalability coefficients based on Loevinger's H coefficient, which summarizes the number of Guttman errors in the data matrix. These coefficients are shown to yield efficient tests of the Rasch model, with p-values computed by Markov chain Monte Carlo methods. The power of the tests of unequal item discrimination, and their ability to distinguish between local dependence and unequal item discrimination, are discussed. The methods are illustrated and motivated using a simulation study and a real data example.
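As a rough illustration of the statistic named in this abstract, the sketch below computes Loevinger's scale-level H for dichotomous (0/1) data as one minus the ratio of observed Guttman errors to the number expected under marginal independence. It covers only the coefficient, not the Markov chain Monte Carlo tests the paper describes, and the function name is hypothetical.

```python
from itertools import combinations


def loevinger_h(data):
    """Scale-level Loevinger's H for a list of 0/1 response rows.

    A Guttman error for an item pair is a respondent who fails the
    easier item but passes the harder one. Every item needs at least
    one 0 and one 1, otherwise the expected error count is zero.
    """
    n = len(data)
    k = len(data[0])
    # Item popularity: the proportion of respondents scoring 1.
    p = [sum(row[i] for row in data) / n for i in range(k)]
    observed = 0.0
    expected = 0.0
    for i, j in combinations(range(k), 2):
        easy, hard = (i, j) if p[i] >= p[j] else (j, i)
        observed += sum(1 for row in data if row[easy] == 0 and row[hard] == 1)
        # Expected errors if the two items were statistically independent.
        expected += n * (1 - p[easy]) * p[hard]
    return 1 - observed / expected
```

A perfect Guttman pattern yields H = 1, and independent items yield H near 0.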

15.
Background: Although there have been numerous studies conducted on the psychometric properties of Biggs' Learning Process Questionnaire (LPQ), these have involved the use of traditional omnibus measures of scale quality such as corrected item-total correlations, internal consistency estimates of reliability, and factor analysis. However, these omnibus measures of scale quality are sample dependent and fail to model item responses as a function of trait level. Because the item-trait relationship is typically nonlinear, traditional factor analytic methods are inappropriate. Aims: The purpose of this study was to identify a unidimensional subset of LPQ items and examine the effectiveness of these items and their options in discriminating between changes in the underlying trait level. In addition to assessing item quality, we were interested in assessing overall scale quality with non-sample dependent measures. Method: The sample was split into two nearly equal halves, and a unidimensional subset of items was identified in one of these samples and cross-validated in the other. The nonlinear relationship between the probability of endorsing an item option and the underlying trait level was modelled using a nonparametric latent trait technique known as kernel smoothing and implemented with the program TestGraf. After item and scale quality were established, maximum likelihood estimates of participants' trait level were obtained and used to examine grade and gender differences. Results: A unidimensional subset of 16 deep and achieving items was identified. Slightly more than half of these items needed some of their options combined so that the probability of endorsing an item option as a function of increasing trait level corresponded to the ideal rank ordering of the item options. With this adjustment, scale quality as measured by the information function and standard error function was found to be good. However, no statistically significant gender differences were observed and, although statistically significant grade differences were observed, they were not substantively meaningful. Conclusions: The use of nonparametric kernel-smoothing techniques is advocated over parametric latent trait methods for the analysis of attitudinal and psychological measures involving polychotomous ordered-response categories. It is also suggested that latent trait methods are more appropriate than traditional test-based measures for studying differential item functioning both within and between cultures. Nonparametric kernel-smoothing techniques hold particular promise in identifying and understanding cross-cultural differences in student approaches to learning at both the item and scale level.
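The kernel-smoothing idea in this abstract can be sketched as a Nadaraya-Watson estimate of an option characteristic curve: at each trait value on a grid, the endorsement probability is a weighted average of 0/1 indicators, with weights that shrink as respondents' trait estimates move away from that grid point. This is a generic illustration with a Gaussian kernel and trait estimates taken as given. It is not a description of how TestGraf is implemented, and the function name and bandwidth are assumptions.

```python
import math


def option_curve(theta, endorsed, grid, bandwidth):
    """Kernel-smoothed probability of endorsing one item option.

    theta     -- trait estimates, one per respondent (assumed given)
    endorsed  -- 1 if the respondent chose the option, else 0
    grid      -- trait values at which to evaluate the curve
    bandwidth -- Gaussian kernel width (a tuning choice)
    """
    curve = []
    for t in grid:
        # Gaussian weights: respondents near t count most.
        weights = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in theta]
        numerator = sum(w * y for w, y in zip(weights, endorsed))
        curve.append(numerator / sum(weights))
    return curve
```

A smaller bandwidth follows the data more closely at the cost of a noisier curve; a larger one smooths more aggressively.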

16.
This study describes the development of a screening tool for gaming addiction in adolescents – the Gaming Addiction Identification Test (GAIT). Its development was based on the research literature on gaming and addiction. An expert panel comprising professional raters (n = 7), experiential adolescent raters (n = 10), and parent raters (n = 10) estimated the content validity of each item (I-CVI) as well as of the whole scale (S-CVI/Ave), and participated in a cognitive interview about the GAIT scale. The mean scores for both I-CVI and S-CVI/Ave ranged between 0.97 and 0.99, compared with the lowest recommended I-CVI value of 0.78 and S-CVI/Ave value of 0.90. There were no sex differences and no differences between expert groups in content validity ratings. No differences in the overall evaluation of the scale emerged in the cognitive interviews. Our conclusions were that GAIT showed good content validity in capturing gaming addiction. The GAIT needs further investigation into its psychometric properties of construct validity (convergent and divergent validity) and criterion-related validity, as well as its reliability in both clinical settings and in community settings with adolescents.
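The two content validity indices in this abstract are simple proportions, sketched below. The sketch assumes the common convention of a 4-point relevance scale on which ratings of 3 or 4 count as "relevant"; the abstract does not state the scale used, and the function names are hypothetical. The thresholds 0.78 and 0.90 are the recommended minimums quoted in the abstract.

```python
def item_cvi(ratings, relevant_from=3):
    """I-CVI: share of raters who judged the item relevant.

    ratings holds one relevance rating per rater; a rating counts as
    relevant when it is >= relevant_from (assumed 4-point scale).
    """
    return sum(1 for r in ratings if r >= relevant_from) / len(ratings)


def scale_cvi_ave(rating_matrix, relevant_from=3):
    """S-CVI/Ave: the mean of the item-level I-CVI values.

    rating_matrix[i] is the list of rater ratings for item i.
    Returns (list of I-CVIs, S-CVI/Ave).
    """
    i_cvis = [item_cvi(item, relevant_from) for item in rating_matrix]
    return i_cvis, sum(i_cvis) / len(i_cvis)


def meets_recommended(i_cvis, s_cvi_ave, min_item=0.78, min_scale=0.90):
    """Check both indices against the minimums cited in the abstract."""
    return min(i_cvis) >= min_item and s_cvi_ave >= min_scale
```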

17.
This study assessed the relationships between characteristics of biographical items from the Armed Services Applicant Profile and the items' validity in predicting the retention of enlisted military personnel. Item characteristics were appraised with ratings by expert judges and test takers, word and alternative counts, and response latencies. Item content was also appraised with ratings by expert judges. The more valid items involved overt behavior or experiences, dealt with discrete behavior or experiences, and had heterogeneous content. After controlling for item content, only the latter characteristic was related to validity. Item characteristics and item content interacted in several instances.

18.
The relation between narcissism and other-derogation has been examined primarily in the context of ego threat. In three studies, we investigated whether narcissistic individuals derogate others in the absence of ego threat. In Study 1, 79 judges watched four videotaped dyadic interactions and rated the personality of the same four people. In Study 2, 66 judges rated the personality of a friend. In Study 3, 72 judges considered the average Northeastern University student and rated the personality of this hypothetical person. Across the three studies, targets' personality characteristics were described on the 100-item California Adult Q-Sort (CAQ; Block, 2008). Judges' ratings of targets were compared to a CAQ prototype of the optimally adjusted person to assess target-derogation. Judges' narcissism and other-derogation were positively related in Studies 1 and 2. Narcissism positively predicted and self-esteem negatively predicted target-derogation after controlling for each other in Study 3. Narcissistic individuals derogate others more than non-narcissistic individuals regardless of whether ego threat is present or absent.

19.
Several studies have found an association between frequency of dream recall and creativity. We tested the hypothesis that training individuals to increase dream recall by means of a daily dream log would increase scores on the Torrance Test of Creative Thinking (TTCT). One hundred twenty-five participants completed a baseline measure of creativity (TTCT, figural version) as well as of dream recall, dissociation, thinness of psychological boundaries, mindful-attention awareness, and well-being. Participants were randomly allocated to two groups: the experimental group (n = 55) received a daily dream log, while the control group (n = 32) received a similarly phrased log registering memories of a vivid episode from the previous day. After 27 days, all participants completed follow-up measurements identical to those at baseline. A non-randomized non-intervention group (n = 35) was used to test for practice effects on the TTCT. There was a significant selective increase in the “creative strengths” component, which was observed only in the experimental group. There were significant correlations between creativity and dissociation as well as between creativity and thinness of psychological boundaries. Enhanced dream recall through daily dream logging fosters aspects of creativity. Associations between creativity, dissociation, and thinness of boundaries suggest that increased awareness of dreams increases creativity through a “loosening” of stereotyped thinking patterns.

20.
Daubert required judges to base their decisions about the admissibility of expert witness testimony in large part on the reliability and validity of empirical observations. Because judges have a wide array of duties and may not be equipped to understand the complexities of statistical analysis, some jurists have recommended that court-appointed experts assist judges in their gatekeeping function. To assist such experts in scrutinizing empirical papers, we propose a Structured Statistical Judgement (SSJ) that takes advantage of advances in the various statistical methods – such as effect sizes that adjust for error – which have allowed researchers to report increasingly reliable and valid observations. We also include supplementary materials that court-appointed experts can use both as a codebook to operationalize the SSJ and as a quick reference that will aid consultation with judges. An initial application of the SSJ examined all 93 empirical articles published in Psychology, Public Policy, and Law and Law and Human Behavior in 2015 and resulted in excellent interrater reliability (π = 0.83; π = 0.95; π = 0.97); at the same time, it indicated that a majority of the articles fail to include the comprehensive and transparent statistical analysis that would be most useful to courts.
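The abstract reports interrater reliability as π without naming the statistic. Assuming it denotes Scott's pi for two raters assigning nominal codes, which is an assumption and not something the abstract confirms, the coefficient can be sketched as observed agreement corrected for the agreement expected from the pooled category proportions. The function name is hypothetical.

```python
from collections import Counter


def scotts_pi(rater_a, rater_b):
    """Scott's pi for two raters coding the same items (assumed statistic).

    Undefined (division by zero) when both raters use a single category.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement uses category proportions pooled over both raters.
    pooled = Counter(rater_a) + Counter(rater_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate agreement well beyond chance, and values near 0 indicate agreement no better than chance.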
