Similar literature
1.
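Application of the many-facet Rasch model in subjective scoring   Cited by: 1 (self-citations: 1, other citations: 0)
Inconsistency in subjective scoring lowers its reliability. The many-facet Rasch model, which is based on item response theory, can be used to identify and remove rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, describes typical applications from outside China, and discusses the conditions under which the model can be applied.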
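
For orientation, the many-facet Rasch model is often written in the three-facet form below. This is the commonly cited parameterization, given here as an assumption; the abstract does not state which facets or which notation the paper uses.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here $P_{nijk}$ is the probability that rater $j$ gives examinee $n$ category $k$ on item $i$, $B_n$ is the examinee's ability, $D_i$ the item's difficulty, $C_j$ the rater's severity, and $F_k$ the difficulty of step $k$ relative to step $k-1$. Rater effects are identified through the estimated $C_j$ terms.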

2.
Subjects (N=225) drawn from ten subpopulations were asked to rate nine different concepts on 39 evaluative semantic differential scales plus the “strong-weak” and “fast-slow” scales. Nineteen separate factor analyses (principal factoring with iteration followed by Varimax rotation) were performed for: (1) all subjects rating all concepts, (2–10) all subjects rating each concept, (11) adult males rating all concepts, (12) adult females rating all concepts, (13) college students rating all concepts, (14) nonstudents rating all concepts, (15) fifth grade students rating all concepts, (16) all males rating “Liz Taylor,” (17) all females rating “Liz Taylor,” (18) all males rating “your father,” and (19) all females rating “your father.” Rotated factor loadings greater than .50 are displayed for all analyses. There is little apparent similarity among the factor structures. In addition, oblique (Oblimin) and Quartimax rotations are performed for two of the analyses (all subjects rating “LBJ” and “U.S. Government”). The three rotated factor structures for each concept differ radically.

3.
THE CONTROL OF BIAS IN RATINGS: A THEORY OF RATING   Cited by: 2 (self-citations: 0, other citations: 2)
Based on several years of research and a careful analysis of the rating process, Wherry developed a theory of rating. An accurate rating is seen as a function of three major components: performance of the ratee, observation of that performance by the rater, and recall of those observations by the rater. Cast in the mold of classical psychometric theory, each of these components is seen as consisting of a systematic portion and a random portion. The systematic portion of each component is further broken down. The performance of the ratee is a combination of true ability or aptitude for the job and the influence of the environment. What the rater observes is a function of the performance of the ratee and bias of observation, and what the rater recalls is a result of those observations combined with a bias of recall. The development of the theory of rating unfolds by defining the various factors that affect each of these components in a series of linear equations. Various theorems and corollaries are proposed which should maximize the true ability component of the ratee and minimize environmental influence and the bias and error components. The theorems and corollaries suggest testable hypotheses for the researcher in performance evaluation.

4.
EFFECTS OF TRAINING AND RATING SCALES ON RATING ERRORS   Cited by: 1 (self-citations: 0, other citations: 1)
Ninety business students were randomly assigned to one of three conditions where they used behavioral observation scales (BOS), behavioral expectation scales (BES), or trait scales in observing people on videotape. Half the individuals received four hours of training to minimize rating errors. Rating errors were reduced significantly regardless of the rating scale that was used. However, behavioral criteria were more resistant to rating errors than trait scales. There was no significant difference between BOS and BES on this dimension. With regard to practicality, BOS were evaluated as significantly better than BES and trait scales. BES and trait scales did not differ significantly on this measure.

5.
For comparative evaluation of the subjective effects of 50 mg chlorpromazine, 0.10 g amobarbital and 10 mg amphetamine (phenopromine. sulf.), two types of formalized rating procedures (a verbal check list and graphic rating scales) were administered to 187 university students. Repeated self-ratings were performed 45, 90 and 180 minutes after oral intake. By the check-list method the expected difference between amphetamine and chlorpromazine was significantly established in all three ratings. Only in the 45-minute rating was a significant difference obtained between amobarbital and amphetamine. The graphic rating scales were clearly less efficient as judged from the greater proportion reporting 'no change'.

6.
This paper argues that a construct-oriented approach to test validation is likely to enhance scientific understanding of our predictor measures, performance criteria, and links between them. In particular, examining relationships between relatively homogeneous predictors and criteria tapping specific performance areas operationalizes earlier conceptual statements made by Guion and Dunnette about test validation for scientific understanding. Two demonstrations are offered to show how measures of predictor constructs have predictably different patterns of correlations with different criteria. In a study of Navy recruiters (N = 267), individual personality scales had significantly different relationships with three different rating criteria; in a second study, with Army enlisted soldiers (N = 8,642), cognitive ability and personality construct measures also showed predictable patterns of correlations with rating criteria measuring three different performance areas. The paper discusses scientific and practical implications of this construct-oriented approach to test validation.

7.
School psychologists have traditionally experienced difficulty in assessing children referred to them for behavior disorders. Given this reported difficulty, a behavioral assessment model is proposed which specifies three types of assessment information: direct observations, rating scale data, and interview data. Characteristics of these three types of assessment information are discussed, along with recommendations for their use. Two psychological models are suggested to guide school psychologists through the assessment process. Bergan's behavioral consultation model is recommended for securing valid and reliable interview and observational data, and Campbell and Fiske's multitrait-multimethod model is proposed as a means of logically integrating behavioral assessment information. The notion of convergence or agreement between dissimilar assessment methods is discussed.

8.
The polytomous unidimensional Rasch model with equidistant scoring, also known as the rating scale model, is extended in such a way that the item parameters are linearly decomposed into certain basic parameters. The extended model is denoted as the linear rating scale model (LRSM). A conditional maximum likelihood estimation procedure and a likelihood-ratio test of hypotheses within the framework of the LRSM are presented. Since the LRSM is a generalization of both the dichotomous Rasch model and the rating scale model, the present algorithm is suited for conditional maximum likelihood estimation in these submodels as well. The practicality of the conditional method is demonstrated by means of a dichotomous Rasch example with 100 items, of a rating scale example with 30 items and 5 categories, and in the light of an empirical application to the measurement of treatment effects in a clinical study. Work supported in part by the Fonds zur Förderung der Wissenschaftlichen Forschung under Grant No. P6414.
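
The structure described in this abstract can be illustrated with a minimal sketch: item parameters built as linear combinations of basic parameters, then plugged into a rating scale model with equidistant scoring. The weight matrix `W`, the basic parameters `alpha`, the category parameters `omegas`, and the sign convention (`theta - beta`, treating `beta` as a difficulty) are all assumptions for illustration. The sketch only evaluates category probabilities and does not implement the conditional maximum likelihood estimation presented in the paper.

```python
import math

def item_parameters(weights, basic_params):
    """Linear decomposition: beta_i = sum_j w_ij * alpha_j."""
    return [sum(w * a for w, a in zip(row, basic_params)) for row in weights]

def category_probabilities(theta, beta, omegas):
    """Rating scale model with equidistant scoring.

    P(X = h) is proportional to exp(h * (theta - beta) + omega_h)
    for h = 0..m, with omega_0 fixed at 0 by convention.
    """
    logits = [h * (theta - beta) + omega for h, omega in enumerate(omegas)]
    peak = max(logits)  # subtract the maximum for numerical stability
    expo = [math.exp(v - peak) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]

# Hypothetical example: 3 items built from 2 basic parameters, 5 categories.
W = [[1, 0], [0, 1], [1, 1]]
alpha = [0.5, -0.2]
betas = item_parameters(W, alpha)
probs = category_probabilities(theta=0.3, beta=betas[2],
                               omegas=[0.0, 0.4, 0.1, -0.6, -1.5])
```

With only two categories and both category parameters set to 0, `category_probabilities` reduces to the dichotomous Rasch model, which mirrors the abstract's remark that the LRSM generalizes that submodel.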

9.
Jürgen Rost. Psychometrika, 1988, 53(3): 327-348
A general approach for analyzing rating data with latent class models is described, which parallels rating models in the framework of latent trait theory. A general rating model as well as a two-parameter model with location and dispersion parameters, analogous to Andrich's Dislocmodel, are derived, including parameter estimation via the EM algorithm. Two examples illustrate the application of the models and their statistical control. Model restrictions through equality constraints are discussed and multiparameter generalizations are outlined.

10.
Problem solving tasks of different informational content were presented for free choice, and the attractiveness of the situation was measured by repeated ratings. The hypothesis of an inverted U-shaped relationship between uncertainty and attractiveness of the situation was supported. The value of rating as a method for this purpose, and some results indicating the importance of stimulus patterning are discussed.

11.
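A Rasch-based experimental analysis of scoring in HSK subjective tests   Cited by: 1 (self-citations: 0, other citations: 1)
Inconsistency in subjective scoring lowers its reliability. The many-facet Rasch model, which is based on item response theory, can be used to identify and remove rater effects and thereby improve the reliability of subjective scoring. This paper introduces the theory and application framework of the many-facet Rasch model, designs a model-based application framework for quality control of scoring in HSK subjective tests, and validates it experimentally using HSK essay scoring data.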

12.
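黎光明 (Li Guangming), 蒋欢 (Jiang Huan). 《心理科学》 (Psychological Science), 2019, (3): 731-738
Tests that include a rater facet usually do not conform to any single generalizability theory design. From the perspective of generalizability theory, the data from such tests should therefore be treated as missing data, and the structure of the missingness is determined by the test's rating plan. Data under three rating plans were simulated in R, and the estimation performance of the traditional method (传统法), the evaluation method (评价法), and the splitting method (拆分法) was compared under each plan. The results show that: (1) the traditional method estimates relatively inaccurately; (2) when inter-rater consistency is high, the evaluation method is appropriate; (3) the splitting method gives the most accurate estimates, and only under the fixed-rater plan does the ratio of raters to examinees need attention, with estimates being fairly accurate when this ratio is at most 0.0047.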

13.
The consistency and loci of leniency, halo, and range restriction effects in performance ratings were investigated in a longitudinal study. Ratings were provided by approximately 90 supervisors in a metropolitan police department, who rated approximately 350 police-rank subordinates on five occasions over a three and one-half year period. Rating effects were computed separately as rater- and ratee-based statistics, and intercorrelated among the five rating periods. The nature of the data set made it possible to hold either raters or ratees constant for each analysis, thus permitting inferences regarding the sources of reliable variance in effects as due to raters or ratees. It was concluded that reliable variance in mean ratings is partly attributable to ratees, but mainly introduced by raters. Reliable halo variance is attributable to raters, and range restriction is a product of stable group performance variability within intact ratee groups. Implications of these results for future rating process research are discussed.

14.
Research into the perceived restorativeness of environments tends to focus on the Kaplans' Attention Restoration Theory at the expense of the affective considerations of Ulrich's psychoevolutionary model. To better understand the role of emotion, this study used contextual text‐based primers (newspaper articles) to manipulate participants' affective state (positive or negative) prior to them rating different environments using the Restorative Components Scale. Sixty‐nine participants completed the web‐based study, being pseudo‐randomly allocated to the positive‐ or negative‐prime condition before rating three natural and three urban environments. Natural environments were rated as more restorative than urban, with negative‐priming giving higher mean ratings for all environments. This effect was overall statistically significant for two components (Being Away and Fascination), but only Fascination showed a significant interaction of affective‐prime and environment, a larger effect being seen for natural environments. Results are discussed in terms of current understanding of the interrelationship between attentional and affective processes.

15.
Judgments of attitude statements with the method of equal-appearing intervals have been found to vary as a function of the judges' attitudes. In this paper explanations of the relationship between judges' attitudes and judgments of attitude statements in terms of models of psychophysical judgment are discussed. It is argued that psychophysical models such as adaptation-level theory, the range-frequency model, and the ‘rubber-band’ model and its derivations, cannot account satisfactorily for judges' performance of the attitude rating task in a great number of studies. The reason for this failure, it is argued, is that the stimulus series employed in the psychophysical judgment research on which these models are based typically varied only on the dimension being judged. The sets of statements judged in attitude rating studies, however, vary not only on the dimension of interest (favourability-unfavourability) but also on a number of other dimensions. It is suggested that this incidental stimulus variation of attitude statements may account for the failure of psychophysical models to predict accurately the performance of judges in the attitude rating task. It is argued that if principles which could account for the effects of this incidental stimulus variation on attitude ratings could be incorporated into psychophysical models, the predictive qualities of these models could be improved considerably. One such model is discussed.

16.
17.
Research has consistently identified poor interrater agreement among multiple assessments of managerial performance. Three alternative sources of dissensus in the effectiveness ratings were examined: rating errors, selective perceptions, and variations in criteria type or weight. As the available empirical evidence and theoretical analysis show, all three causes provide plausible reasons, though in varying degrees, for the low agreement coefficient. However, an empirical study designed to test three specific hypotheses on criterion type and criterion weights found consensus in the effectiveness models of superiors, subordinates, and peers. Consensus among different raters was high on both the role behaviors and on the personal traits of the managers as criteria for effectiveness. While these findings supported Biddle's role theory (1979), disagreement on the relative weights of these criteria was evident. These observations underscore the need for further conceptualization on the preference functions of raters as a primary source of the low convergent validity coefficients among multiple raters. Further research is also desirable on contextual and cognitive factors that may lead to shifts in criterion type and criterion weight, as well as on actual rating error tendencies among different raters.

18.
The purpose of this study was to determine the extent to which direct judgments of similarity by supervisors and incumbents could provide the same job classification results as a more elaborate job analysis procedure involving measures of task overlap among jobs. To accomplish this, 8 foreman jobs in a chemical processing plant were analyzed and compared on 237 task statements. In addition, 15 foreman incumbents and 17 supervisors evaluated the similarities among the same 8 foreman jobs in a paired comparisons rating task. The task-oriented job analysis required hundreds of man-hours to complete; the rating task took 15 minutes. Results using hierarchical cluster analysis and multidimensional scaling analysis revealed that the global judgments and the task-oriented data led to identical conclusions. Also, it was found that incumbent ratings produced the same results as ratings from supervisors. Uses, advantages, and disadvantages of the procedure are outlined.

19.
In general, correlations between assessment centre (AC) ratings and personality inventories are low. In this paper, we examine three method factors that may be responsible for these low correlations: differences in (i) rating source (other versus self), (ii) rating domain (general versus specific), and (iii) rating format (multi‐ versus single‐item). This study tests whether these three factors diminish correlations between AC exercise ratings and external indicators of similar dimensions. Ratings of personality and performance were combined in an analytical framework following a 2 × 2 × 2 (source, domain, format) completely crossed, within‐subjects design. Results showed partial support for the influence of each of the three method factors. Implications for future research are discussed. Copyright © 2004 John Wiley & Sons, Ltd.

20.
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent of rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as a measure of metacognitive sensitivity need to control the primary task criterion and rating criteria.
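
To make the quantity under discussion concrete, the following minimal sketch fits P(correct) = sigmoid(b0 + b1 * rating) by Newton-Raphson and reports the slope b1. The function name `fit_logistic_slope` and the toy trial data are hypothetical, chosen only for illustration. This is not the authors' analysis, and it does not address the dependence on task and rating criteria that the abstract warns about.

```python
import math

def fit_logistic_slope(ratings, correct, iterations=25):
    """Fit P(correct) = sigmoid(b0 + b1 * rating); return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the gradient (g) and the negated Hessian (h) of the
        # log-likelihood over all trials.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(ratings, correct):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system h * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: 10 trials at each confidence rating from 1 to 4,
# with accuracy rising as the rating rises.
ratings = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10
correct = ([1] * 5 + [0] * 5 + [1] * 6 + [0] * 4 +
           [1] * 8 + [0] * 2 + [1] * 9 + [0] * 1)
b0, b1 = fit_logistic_slope(ratings, correct)
```

A positive `b1` means higher ratings go with a higher probability of a correct response. Per the abstract, its size should be interpreted only when the primary task criterion and rating criteria are controlled.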
